PARSEME Corpus with Annotated Verb Multiword Expressions


The PARSEME-bg corpus comprises 21 599 sentences (480 413 tokens), including 6 721 annotated verb multiword expressions (VMWEs). Annotation was performed in two phases: phase 1.0 (2017) and phase 1.1 (2018). The distribution of semantic types of VMWEs is shown below.

We use the following types of VMWEs:

(a) verb idioms (VID) with non-compositional meaning (e.g., гушвам букета, обирам си крушите);

(b) light verb constructions (LVC) with two subtypes: true LVCs (LVC.full; e.g., имам възможност) and causative LVCs (LVC.cause; e.g., давам възможност);

(c) inherently reflexive verbs (IRV) with the obligatory reflexive particle se / si (e.g., усмихвам се, спомням си);

(d) inherently adpositional verbs (IAV; e.g., заставам зад 'stand behind' = подкрепям 'support').
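
The category labels above can be tallied directly from the annotation files. The following is a minimal sketch, assuming the corpus is available in the tab-separated `.cupt` format of the PARSEME shared task (edition 1.1), where the last column marks the first token of each expression as `id:CATEGORY` and its remaining tokens as a bare `id`. The description above does not state the distribution format, so this is an assumption; the names `count_vmwe_categories` and `row`, and the toy sentences, are hypothetical and used only for illustration.

```python
from collections import Counter


def count_vmwe_categories(lines):
    """Count VMWEs per category from .cupt-style lines (assumed format)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: the MWE annotation is the last tab-separated column.
        mwe_field = line.split("\t")[-1]
        if mwe_field in ("*", "_"):
            continue  # token is not part of any annotated VMWE
        # A token may belong to several VMWEs, separated by ';'.
        for code in mwe_field.split(";"):
            # Only the first token of a VMWE carries 'id:CATEGORY',
            # so each expression is counted exactly once.
            if ":" in code:
                _, category = code.split(":", 1)
                counts[category] += 1
    return counts


def row(token_id, form, mwe):
    """Build a toy 11-column line; unused columns are filled with '_'."""
    return "\t".join([str(token_id), form] + ["_"] * 8 + [mwe])


# Toy data for illustration only, not taken from the corpus.
sample = [
    "# text = Той се усмихва.",
    row(1, "Той", "*"),
    row(2, "се", "1:IRV"),
    row(3, "усмихва", "1"),
    row(4, ".", "*"),
    "",
    "# text = Имам възможност.",
    row(1, "Имам", "1:LVC.full"),
    row(2, "възможност", "1"),
    row(3, ".", "*"),
]

category_counts = count_vmwe_categories(sample)
for category, n in sorted(category_counts.items()):
    print(category, n)
```

On a real file, the same function could be applied to the open file object, for example `count_vmwe_categories(open(path, encoding="utf-8"))`, where `path` is a placeholder for the location of a corpus file.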

Copyright © 2015-2022 Department of Computational Linguistics. All rights reserved.