Download

Diachronic corpus

The corpus covers the following time spans:

Corpus from the period of 1850 – 1880: ➥ Corpus download (with metadata)
Corpus from the period of 1881 – 1910: ➥ Corpus download (with metadata)
Corpus from the period of 1911 – 1930: ➥ Corpus download (with metadata)
Corpus from the period of 1931 – 1950: ➥ Corpus download (with metadata)
Corpus from the period of 1951 – 1990: ➥ Corpus download (with metadata)
Corpus from the period of 1991 – 2021: ➥ Corpus download (with metadata)

Open parallel corpora

The following precompiled subcorpora are available for download:

Administrative corpus of official EU documents – parallel, in 23 languages, with the largest corpora in English, German, Romanian, Polish and Greek.
Journalistic corpus from SETimes.com – parallel, in 9 Balkan languages (Bulgarian, Romanian, Macedonian, Serbian, Albanian, Greek, Turkish, Croatian, Bosnian) and English.
Popular Science from Wikipedia – in Bulgarian.
Administrative/Science corpus with medical texts from the EMEA – parallel, in 23 languages.

At present, the corpora are provided in plain text format; an annotated version may be made available upon request. For more details, please contact us: bulnc@dcl.bas.bg.

Requests for subcorpora extraction

Requests can be fulfilled for texts that may be distributed in compliance with copyright law. The main corpora offered for download are listed above.

Requests can be based on one or more of the following criteria:

Style, domain and/or genre;
Period of time;
Language(s);
Size of the text samples and of the corpus, etc.

For more details, please contact us: bulnc@dcl.bas.bg.