The Bulgarian WordNet was started within the EU-funded project BalkaNet: a Multilingual Semantic Network of the Balkan Languages, which aimed to build synchronized semantic databases for the Balkan languages Bulgarian, Greek, Romanian, Serbian and Turkish, and to expand the Czech lexical-semantic network. After BalkaNet’s completion, development of the Bulgarian WordNet continued within the nationally funded projects BulNet: a Lexical-semantic Network of Bulgarian (2005-2010) and Language E-resources and Processing Tools (2011-2013); the latter is co-funded under the project CESAR: Central and South-East European Resources (Information and Communication Technologies Policy Support Programme Call: CIP ICT-PSP-2010-4).
As of January 21, 2013, the Bulgarian WordNet comprises 49,189 synonym sets (synsets) distributed across nine parts of speech: nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, particles and interjections. The words included in the Bulgarian WordNet have been selected according to several criteria. The main ones are frequency analysis of word occurrences in large text corpora (counting occurrences of citation forms rather than of individual wordforms), the inclusion of synsets already present in the wordnets of other languages, and the inclusion of synsets that correspond to high-frequency word senses found in parallel corpora.
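The difference between counting citation forms and counting wordforms can be illustrated with a minimal Python sketch. The lookup table `LEMMAS` and the function `citation_form_frequencies` are hypothetical names introduced only for this illustration; they are not part of the BulNet tooling, and a real pipeline would rely on a morphological dictionary or lemmatizer rather than a hand-written table.

```python
from collections import Counter

# Hypothetical wordform -> citation form table (illustration only).
# A real system would use a morphological dictionary or a lemmatizer.
LEMMAS = {
    "книга": "книга",    # "book"
    "книги": "книга",    # "books"
    "книгата": "книга",  # "the book"
    "чете": "чета",      # "reads"
    "четат": "чета",     # "(they) read"
}


def citation_form_frequencies(tokens, lemmas):
    """Count occurrences per citation form.

    Tokens are lowercased first; tokens missing from the table
    are counted under their own lowercased form.
    """
    return Counter(lemmas.get(tok.lower(), tok.lower()) for tok in tokens)


tokens = ["Книгата", "книги", "чете", "книга", "четат"]
freqs = citation_form_frequencies(tokens, LEMMAS)

# All three wordforms of "книга" contribute to a single entry.
for lemma, count in freqs.most_common():
    print(lemma, count)
```

Counting by citation form means that a word is not penalized for having many inflected variants, each of which may be individually rare in the corpus.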
Each synonym set (SYNSET) encodes the relation of equivalence between a number of lexical items (LITERALs), at least one of which must be explicitly represented in the SYNSET. Each literal has a unique meaning (specified by the value of SENSE), and all literals in a synset pertain to one and the same part of speech (specified as the value of POS) and represent one and the same lexical meaning (specified as the value of DEF). Each synset is linked to its counterpart in the Princeton WordNet (PWN) 3.0 by means of a unique identification number (ID). Synsets that are common to the Balkan languages are marked as common concept subsets (BCS). In a monolingual database, a synset should be linked to at least one other synset through an intralingual relation. Optional information may also be encoded, such as examples of usage, stylistic peculiarities, morphological or syntactic properties, and author and last-edit details.
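The synset structure described above can be sketched as a small Python data model. This is an illustration under stated assumptions, not the actual BulNet storage format: the class names, the attribute names, the placeholder ID strings, the relation name "hypernym" and the helper `validate` are all hypothetical. Only the field labels SYNSET, LITERAL, SENSE, POS, DEF, ID and BCS, and the two constraints being checked, come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Literal:
    word: str   # LITERAL: the lexical item
    sense: int  # SENSE: identifies the particular meaning of the word


@dataclass
class Relation:
    type: str       # relation name, e.g. "hypernym" (assumed label)
    target_id: str  # ID of the synset the relation points to


@dataclass
class Synset:
    id: str                 # ID: link to the corresponding PWN 3.0 synset
    pos: str                # POS: part of speech shared by all literals
    definition: str         # DEF: the shared lexical meaning
    literals: List[Literal]
    relations: List[Relation] = field(default_factory=list)
    bcs: Optional[str] = None                       # BCS marker, if any
    usage: List[str] = field(default_factory=list)  # optional examples


def validate(synset: Synset) -> List[str]:
    """Return the violated constraints (empty list if none)."""
    problems = []
    if not synset.literals:
        problems.append("synset must contain at least one literal")
    if not synset.relations:
        problems.append("synset must be linked to at least one other synset")
    return problems


# Placeholder IDs and definition text: not real BulNet or PWN data.
dog = Synset(
    id="PLACEHOLDER-0001-n",
    pos="n",
    definition="placeholder definition",
    literals=[Literal(word="куче", sense=1)],
    relations=[Relation(type="hypernym", target_id="PLACEHOLDER-0002-n")],
)
orphan = Synset(id="PLACEHOLDER-0003-n", pos="n",
                definition="placeholder definition", literals=[])

print(validate(dog))     # no violations expected
print(validate(orphan))  # both constraints violated
```

Keeping the optional information in fields with defaults mirrors the distinction in the description between obligatory and non-obligatory data.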
More detailed information about the Bulgarian WordNet, as well as current data on the number and distribution of synsets by part of speech, is available on BulNet’s webpage: http://dcl.bas.bg/BulNet/wordnet_en.html.