Bulgarian Semantically Annotated Corpus
The Bulgarian Sense-Annotated Corpus (BulSemCor) is a structured corpus of texts in Bulgarian in which all words are assigned an appropriate sense from the Bulgarian WordNet. BulSemCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences. Language: Bulgarian. Type: general monolingual text corpus enriched with…

Bulgarian-English Sentence- and Clause-Aligned Corpus
The Bulgarian-English Sentence- and Clause-Aligned Corpus (BulEnAC) is a parallel corpus of aligned Bulgarian and English sentences and clauses, annotated with the syntactic relations between clauses. The corpus was developed at the Department of Computational Linguistics of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin” at the Bulgarian Academy of Sciences. Languages: Bulgarian, English. Type:…

Bulgarian Brown Corpus
The Bulgarian Brown Corpus is a general static representative sample corpus of Bulgarian compiled at the Department of Computational Linguistics at the Institute for Bulgarian Language. It follows the methodology presented by Brown University, Providence, Rhode Island, USA and applied in the compilation of the famous Brown Corpus (Brown University Standard Corpus of Present-Day…

PARSEME Corpus with Annotated Verb Multiword Expressions
The PARSEME-bg corpus comprises 21,599 sentences (480,413 tokens), including 6,721 annotated verb multiword expressions (VMWEs). Annotation was performed in two phases: phase 1.0 (2017) and phase 1.1 (2018). The distribution of semantic types of VMWEs is shown below. We use the following types of VMWEs: (a) verb idioms (VID) with non-compositional meaning…
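A type distribution like the one described for PARSEME-bg can be tallied directly from the annotated files. The sketch below is a minimal illustration and not the project's own tooling. It assumes the data is in the tab-separated .cupt layout used by the PARSEME shared task, where the last column holds the VMWE annotation. In that column, `*` marks a token outside any VMWE and `_` marks an unannotated token. Otherwise it holds `;`-separated codes such as `1:LVC.full` on the first token of a VMWE and a bare `1` on its remaining tokens. The function name and the two sample sentences are hypothetical.

```python
from collections import Counter
from typing import Iterable


def vmwe_type_distribution(lines: Iterable[str]) -> Counter:
    """Count VMWE categories in .cupt-style lines (assumed format).

    The category label is expected only on the first token of each VMWE
    (e.g. "1:VID"), so each labelled code corresponds to one VMWE.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence boundaries and comment lines such as "# source_sent_id = ...".
        if not line or line.startswith("#"):
            continue
        mwe_col = line.split("\t")[-1]
        # "*" = token belongs to no VMWE; "_" = annotation not provided.
        if mwe_col in ("*", "_"):
            continue
        for code in mwe_col.split(";"):
            # "1:LVC.full" starts a VMWE; a bare "1" continues one already counted.
            if ":" in code:
                _, category = code.split(":", 1)
                counts[category] += 1
    return counts


# Hypothetical two-sentence sample: 11 columns, only the last one is used here.
sample = [
    "# source_sent_id = hypothetical-1",
    "1\tТой\t_\t_\t_\t_\t_\t_\t_\t_\t*",
    "2\tвзе\t_\t_\t_\t_\t_\t_\t_\t_\t1:LVC.full",
    "3\tрешение\t_\t_\t_\t_\t_\t_\t_\t_\t1",
    "",
    "# source_sent_id = hypothetical-2",
    "1\tТя\t_\t_\t_\t_\t_\t_\t_\t_\t*",
    "2\tсе\t_\t_\t_\t_\t_\t_\t_\t_\t1:IRV",
    "3\tсмее\t_\t_\t_\t_\t_\t_\t_\t_\t1",
    "",
]

print(vmwe_type_distribution(sample))
```

To apply it to a real file, pass an open handle, for example `with open(path, encoding="utf-8") as f: vmwe_type_distribution(f)`. The variable `path` is a placeholder. Phase 1.0 data may use a different column layout, so the column index would need checking first.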