Corpus-Extracted MWE Lists
We adopt the classification of multiword expressions (MWEs) developed by Baldwin et al. (Baldwin, T., C. Bannard, T. Tanaka, D. Widdows. An Empirical Model of Multiword Expression Decomposability. In: Proceedings of the ACL Workshop on Multiword Expressions: Analysis, Acquisition and Treatment. 2003), who distinguish between non-decomposable, idiosyncratically decomposable and simple decomposable MWEs. Further, we divide simple decomposable MWEs into categories…
Bulgarian-X language Parallel Corpus
The Bulgarian-X language Parallel Corpus (Bul-X-Cor) is part of the Bulgarian National Corpus (BulNC). The Bulgarian National Corpus is designed as a uniform framework for texts of different modality (written – spoken), period (synchronic – diachronic), and number of languages (monolingual – parallel, where one of the counterparts is Bulgarian). Any X-language in the corpus is treated equally with…
Bulgarian National Corpus
The Bulgarian National Corpus was created at the Institute for Bulgarian Language „Prof. L. Andreychin” by research associates from the Department of Computational Linguistics and the Department of Bulgarian Lexicology and Lexicography. It incorporates several individual electronic corpora developed in the period 2001-2009 for the purposes of the two departments. The corpus is constantly enlarged with new texts. The Bulgarian…
Multiword Expression Dictionary for Bulgarian
The Bulgarian dictionary of MWEs includes 27,744 MWEs in total, divided into 13 categories according to their idiomaticity, which is evaluated with respect to the following features:
• whether the MWE is a named entity;
• whether the MWE contains a reference to a named entity;
• the degree to which the meaning of the MWE is compositional and transparent.
The…
Classification of Verbs in BulNet
Practical results from Stage 1
The results mainly consist of combined semantic descriptions from various sources (WordNet, FrameNet, VerbNet), together with a unified semantic description and classification of verb and noun synsets.
A. Verb synsets with assigned general verb classes from VerbNet and frames from FrameNet: VERBS
B. Recommended changes to the resources: DATA CHANGES
File    Contents
01      Assigning new hypernyms…
New semantic relations based on predicate-argument structure
Results from Stage 2
The results concern the newly defined semantic relations based on conceptual frames, and their representation in WordNet through links between verb synsets and the classes of nouns that satisfy the selectional requirements of frame elements.
A. Synsets with assigned frames from FrameNet (5 025 are manually verified and labelled with 0++): Verbs with assigned…
PARSEME: PARSing and Multi-word Expressions
PARSEME Corpus 1.1 (2018): GitLab | LINDAT/CLARIN
PARSEME Corpus 1.0 (2017): GitLab
Annotation notes
Official Annotation Guidelines
Annotation examples
PARSEME shared task: Phase 2
Phase 2: Second annotation by two independent annotators with feedback on guidelines, categories and language-specific features
Last edited: 22/03/2016
PARSEME Guidelines v. 5
➥ Results from Phase 2
➥ Problematic cases and hesitation in annotators’ decisions
➥ Updated classification of vMWEs
Results from Phase 2
200 sentences annotated by two different annotators
ANNOTATOR 1: 301 vMWEs
ANNOTATOR 2: 306 vMWEs
22 vMWEs…
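As an illustration of how two independent annotations of the same sentences might be compared, the following is a minimal sketch in Python. It is not the procedure used in the shared task: the representation of a vMWE as a sentence id plus a set of token indices, the function name `agreement`, and the toy data are all assumptions made for this example, and only exact span matches are counted as agreement.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical representation: one vMWE = (sentence id, token indices it covers).
Annotation = Tuple[int, FrozenSet[int]]


def agreement(a1: Set[Annotation], a2: Set[Annotation]) -> Dict[str, float]:
    """Exact-match agreement between two annotators.

    Annotator 1 is treated as the reference, so precision and recall
    describe annotator 2 relative to annotator 1.
    """
    shared = a1 & a2  # vMWEs marked identically by both annotators
    precision = len(shared) / len(a2) if a2 else 0.0
    recall = len(shared) / len(a1) if a1 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "shared": len(shared),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (invented for illustration, not taken from the Phase 2 results).
annotator_1 = {
    (1, frozenset({2, 3})),
    (2, frozenset({0, 4})),
    (3, frozenset({1, 2})),
}
annotator_2 = {
    (1, frozenset({2, 3})),      # same span as annotator 1
    (3, frozenset({1, 2, 5})),   # overlapping but not identical span
    (4, frozenset({0, 1})),      # found only by annotator 2
}

result = agreement(annotator_1, annotator_2)
print(result)
```

Exact matching is the strictest choice. A partial-match variant that credits overlapping token sets would score the disagreement in sentence 3 more leniently.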