
Bulgarian Semantically Annotated Corpus
The Bulgarian Sense-Annotated Corpus (BulSemCor) is a structured corpus of texts in Bulgarian in which all words are assigned an appropriate sense from the Bulgarian WordNet. BulSemCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences. Language: Bulgarian. Type: general monolingual text corpus enriched with…

Bulgarian-English Sentence- and Clause-Aligned Corpus
The Bulgarian-English Sentence- and Clause-Aligned Corpus (BulEnAC) is a parallel corpus of aligned Bulgarian and English sentences and clauses with annotation of the syntactic relation between clauses. The corpus was developed at the Department of Computational Linguistics of the Institute of Bulgarian Language “Prof. Lyubomir Andreychin” at the Bulgarian Academy of Sciences. Languages: Bulgarian, English. Type:…

PARSEME Corpus with Annotated Verb Multiword Expressions
The PARSEME-bg corpus covers 21,599 sentences amounting to 480,413 tokens, including 6,721 annotated verb multiword expressions (VMWEs). Annotation was performed in two phases: phase 1.0 (2017) and phase 1.1 (2018). The distribution of semantic types of VMWEs is shown below. We use the following types of VMWEs: (a) verb idioms (VID) with non-compositional meaning…

Hydra
Hydra is an OS-independent system designed for wordnet development, validation and exploration. The program enables users to browse and edit any number of monolingual wordnets at a time. The individual wordnets are synchronised, so that equivalent synonym sets, or synsets, may be viewed and explored in parallel. Fig. 1. Hydra’s Synset view with the Bulgarian WordNet and the…

Chooser
Chooser is an OS-independent, multi-functional system for linguistic annotation that can be adapted to different linguistic levels and different annotation schemata. Chooser’s features are discussed below in relation to semantic annotation. The basic annotation functionalities implemented in Chooser are: fast and easy-to-perform annotation; run-time access to detailed information for the annotation candidates through the associated wordnet senses with…

A Web-Based Infrastructure for Bulgarian Data Processing
The Bulgarian Language Processing Chain (developed in 2011–2012) includes the following types of text processing and linguistic annotation:
• Sentence segmentation;
• Tokenisation;
• POS tagging and grammatical annotation;
• Lemmatisation.
BgTagger
The Bulgarian POS tagger (BgTagger) marks up each word with the most probable part of speech and unambiguous morphosyntactic information among the set of…
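The order of the processing stages can be illustrated with a minimal Python sketch. It is not the actual processing chain or BgTagger. All function names are hypothetical, sentence segmentation and tokenisation are naive regular-expression rules, and the tagger and lemmatiser are placeholder stubs that only show where trained components would plug in.

```python
import re

# Hypothetical stand-ins for the four stages; the real chain uses trained
# components (e.g. BgTagger) that this sketch does not reproduce.

def segment_sentences(text: str) -> list[str]:
    # Naive rule: split on whitespace that follows ".", "!" or "?".
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence: str) -> list[str]:
    # Word tokens (\w matches Cyrillic letters too) or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Placeholder tagger: "PUNCT" for punctuation, "X" (unknown) otherwise.
    return [(t, "PUNCT" if re.fullmatch(r"[^\w\s]", t) else "X") for t in tokens]

def lemmatise(tagged: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    # Placeholder lemmatiser: uses the lowercased word form as the lemma.
    return [(form, tag, form.lower()) for form, tag in tagged]

def process(text: str) -> list[list[tuple[str, str, str]]]:
    """Apply the stages in order: segment, tokenise, tag, lemmatise."""
    return [lemmatise(pos_tag(tokenise(s))) for s in segment_sentences(text)]
```

For example, `process("Това е изречение. Второ!")` yields two sentences, the first starting with `("Това", "X", "това")`. Because each stage consumes only the previous stage's output, a trained tagger or lemmatiser could replace a stub without changes to the other stages.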