BulNet

Bulgarian WordNet



General

   The Bulgarian wordnet (BulNet) is a lexical-semantic network of Bulgarian that was launched within BalkaNet, a project for the development of a lexical-semantic network of the Balkan languages. The Bulgarian database is integrated into BalkaNet and into EuroWordNet, the network of the European languages, through unique interlingual indexes (ILIs) that unambiguously mark the counterparts in the different languages. After the completion of the BalkaNet project, the construction of the Bulgarian wordnet has continued within the nationally funded projects "BulNet - a Lexical-semantic Network of Bulgarian" and "Electronic resources and processing tools", co-funded alongside the project CESAR: Central and South-East European Resources, which is funded under the Information and Communication Technologies Policy Support Programme (call CIP ICT-PSP-2010-4).
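   The role of the ILIs can be illustrated with a minimal sketch. It assumes a toy record layout (language, ILI, literals); the identifiers, the entries and the function names are hypothetical and do not reflect the actual BulNet or BalkaNet data format.

```python
from collections import defaultdict

# Hypothetical records: (language code, ILI identifier, literals).
# The identifiers and entries are illustrative, not taken from BulNet.
RECORDS = [
    ("bg", "ili-0001", ["куче", "пес"]),
    ("en", "ili-0001", ["dog", "domestic dog"]),
    ("bg", "ili-0002", ["котка"]),
    ("en", "ili-0002", ["cat"]),
    ("en", "ili-0003", ["bird"]),  # no Bulgarian counterpart in this toy data
]


def build_ili_index(records):
    """Group the synsets of all languages under their shared ILI."""
    index = defaultdict(dict)
    for lang, ili, literals in records:
        index[ili][lang] = literals
    return index


def counterparts(index, literal, source, target):
    """Return the target-language literals that share an ILI with `literal`."""
    found = []
    for by_lang in index.values():
        if literal in by_lang.get(source, []):
            found.extend(by_lang.get(target, []))
    return found


index = build_ili_index(RECORDS)
print(counterparts(index, "куче", "bg", "en"))
```

   Because every language points to the same index, adding a new wordnet requires no pairwise mapping to the existing ones.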

  BulNet is developed following the Princeton WordNet (PWN) framework, a subtype of the traditional semantic networks whose structure consists of nodes and relations between the nodes. The nodes are synonym sets (synsets) that contain words or compounds (literals). Arcs connecting the nodes express semantic, derivational and extralinguistic relations between the objects in the nodes. Literals (senses) and synsets (meanings) encode language-independent concepts. The semantics of lexical units in the wordnet is expressed implicitly, by the synonymy relations between the literals in a synset and by the relations to other nodes in the network, and explicitly, through the explanatory definition and usage examples.
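  The node-and-arc structure described above can be sketched as a small data model. This is an illustration under stated assumptions: the class names, field names, identifiers and English glosses are hypothetical and are not the BulNet storage format; only the relation name "hypernym" follows common WordNet usage.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str                # hypothetical identifier
    pos: str                      # part of speech, e.g. "n" for noun
    literals: list                # words or compounds that share one meaning
    definition: str               # explicit semantics: explanatory definition
    examples: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (relation name, target id)


class Network:
    """Nodes are synsets; arcs are named relations between synsets."""

    def __init__(self):
        self.synsets = {}

    def add(self, synset):
        self.synsets[synset.synset_id] = synset

    def relate(self, source_id, relation, target_id):
        self.synsets[source_id].relations.append((relation, target_id))

    def related(self, source_id, relation):
        """Follow all arcs with the given name from one node."""
        arcs = self.synsets[source_id].relations
        return [self.synsets[t] for r, t in arcs if r == relation]

    def synonyms(self, literal):
        """Implicit semantics: literals that share a synset with `literal`."""
        found = set()
        for synset in self.synsets.values():
            if literal in synset.literals:
                found.update(l for l in synset.literals if l != literal)
        return sorted(found)


net = Network()
net.add(Synset("n1", "n", ["куче", "пес"], "a domesticated canine"))
net.add(Synset("n2", "n", ["животно"], "a living organism that can move"))
net.relate("n1", "hypernym", "n2")

print(net.synonyms("куче"))
print([s.literals for s in net.related("n1", "hypernym")])
```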

  As of January 21, 2013, the Bulgarian wordnet consists of 49,189 synonym sets distributed across nine parts of speech: nouns, verbs, adjectives and adverbs (open-class words), and pronouns, prepositions, conjunctions, particles and interjections (closed-class words). Each synonym set is supplied with an explanatory definition that represents the common referential meaning of all its members.

  The synonym sets are linked to each other by means of semantic, morpho-semantic and extralinguistic relations that connect the words in a language. A sense-annotated corpus of Bulgarian (BulSemCor) has also been developed.

  The ongoing tasks of the project are:
     Further expansion of the Bulgarian WordNet with new synsets.
     Editing of the existing data.
     Development and application of tests for completeness and consistency of the database.
     Design and implementation of a system for automatic word sense disambiguation (WSD).
     Dissemination of the project's results.
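
  One of the tasks listed above, testing the database for completeness and consistency, might look like the following minimal sketch. The dictionary layout, the identifiers and the particular checks are assumptions made for illustration; they are not the project's actual test suite.

```python
def check_consistency(synsets):
    """Return (synset id, problem) pairs for a dict of synset records.

    Each record is assumed to be a dict with the hypothetical keys
    'literals', 'definition' and 'relations' (a list of
    (relation name, target id) pairs).
    """
    problems = []
    for sid, record in synsets.items():
        # Completeness: every synset needs at least one literal.
        if not record.get("literals"):
            problems.append((sid, "no literals"))
        # Completeness: every synset needs an explanatory definition.
        if not record.get("definition", "").strip():
            problems.append((sid, "missing definition"))
        # Consistency: every arc must point to an existing, different node.
        for relation, target in record.get("relations", []):
            if target not in synsets:
                problems.append((sid, f"dangling {relation} -> {target}"))
            elif target == sid:
                problems.append((sid, f"self-referential {relation}"))
    return problems


# Toy data with deliberate errors; not taken from BulNet.
toy = {
    "s1": {"literals": ["куче"], "definition": "a domesticated canine",
           "relations": [("hypernym", "s2")]},
    "s2": {"literals": ["животно"], "definition": "", "relations": []},
    "s3": {"literals": [], "definition": "placeholder",
           "relations": [("hypernym", "s9")]},
}

for sid, problem in check_consistency(toy):
    print(sid, problem)
```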

