The PARSEME-bg corpus covers 21 599 sentences amounting to 480 413 tokens, including 6 721 annotated verbal multiword expressions (VMWEs). Annotation was performed in two phases: phase 1.0 (2017) and phase 1.1 (2018). The distribution of semantic types of VMWEs is shown below.
We use the following types of VMWEs:
(a) verbal idioms (VID) with non-compositional meaning (e.g. гушвам букета 'to die', lit. 'to hug the bouquet'; обирам си крушите 'to clear off', lit. 'to pick one's pears');
(b) light verb constructions (LVC) with two subtypes: true LVCs (LVC.full; e.g. имам възможност 'to have an opportunity') and causative LVCs (LVC.cause; e.g. давам възможност 'to give an opportunity');
(c) inherently reflexive verbs (IRV) with a compulsory particle se / si (e.g. усмихвам се 'to smile', спомням си 'to remember');
(d) inherently adpositional verbs (IAV; e.g. заставам зад 'to stand behind' = подкрепям 'to support').
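A distribution over these types can be recomputed from the annotated files. The sketch below is a minimal illustration, not the project's own tooling. It assumes the data are in the tab-separated .cupt layout used for PARSEME 1.1, where the last column marks the first token of a VMWE as ID:CATEGORY, later tokens of the same VMWE as a bare ID, tokens outside any VMWE as *, and multiple annotations on one token are separated by ";". The function name count_vmwe_types and the sample sentence are hypothetical and made up for the example.

```python
from collections import Counter


def count_vmwe_types(cupt_lines):
    """Count VMWE categories in .cupt-style lines (assumed layout, see lead-in)."""
    counts = Counter()
    for line in cupt_lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: the MWE annotation is the last tab-separated column.
        mwe_field = line.split("\t")[-1]
        if mwe_field in ("*", "_"):
            continue
        for code in mwe_field.split(";"):
            # Only the first token of a VMWE carries "ID:CATEGORY",
            # so each expression is counted exactly once.
            if ":" in code:
                _, category = code.split(":", 1)
                counts[category] += 1
    return counts


# Made-up sample sentence (not taken from the corpus).
sample = [
    "# text = Той се усмихва.",
    "1\tТой\tтой\tPRON\t_\t_\t3\tnsubj\t_\t_\t*",
    "2\tсе\tсе\tPRON\t_\t_\t3\texpl\t_\t_\t1:IRV",
    "3\tусмихва\tусмихвам\tVERB\t_\t_\t0\troot\t_\t_\t1",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\t*",
    "",
]

counts = count_vmwe_types(sample)
for category, n in counts.most_common():
    print(category, n)
```

Counting only the ID:CATEGORY codes avoids counting a multi-token expression once per token. The 1.0 release may use a different file layout, so the column assumption should be checked against the files actually downloaded.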
PARSEME Corpus 1.1 (2018): GitLab | LINDAT/CLARIN
PARSEME Corpus 1.0 (2017): GitLab
Agata Savary, Marie Candito, Verginica Barbu Mititelu, Eduard Bejček, Fabienne Cap, Slavomir Čéplö, Silvio Ricardo Cordeiro, Gülşen Eryiğit, Voula Giouli, Maarten van Gompel, Yaakov HaCohen-Kerner, Jolanta Kovalevskaitė, Simon Krek, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Lonneke van der Plas, Behrang QasemiZadeh, Carlos Ramisch, Federico Sangati, Ivelina Stoyanova, Veronika Vincze. PARSEME multilingual corpus of verbal multiword expressions. In: Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (Eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop. ISBN-13: 978-3-96110-123-8.
Koeva, S., C. Krstev, D. Vitas, T. Kyriacopoulou, C. Martineau, T. Dimitrova. Semantic and Syntactic Patterns of Multiword Names (a Cross-language Study). In: Manfred Sailer, Stella Markantonatou (Eds.) Multiword Expressions: Insights from a Multi-lingual Perspective, Language Science Press, 2018, 31-62. ISBN:978-3-96110-063-7 (Digital) 978-3-96110-064-4 (Hardcover).
Barbu Mititelu, V., S. Leseva. Derivation and Multiword Expressions. Multiword Expressions: Insights from a Multi-lingual Perspective, Language Science Press, 2018, ISBN:978-396-110-064-4, DOI:10.5281/zenodo.1182583, 215-246.
Barbu Mititelu, V., I. Stoyanova, S. Leseva, M. Mitrofan, T. Dimitrova, M. Todorova. Hear about Verbal Multiword Expressions in the Bulgarian and the Romanian Wordnets Straight from the Horse’s Mouth. Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Association for Computational Linguistics, 2019, ISBN:978-1-950737-26-0, DOI:10.18653/v1/W19-5102, 2-12.
Stoyanova, I., S. Leseva, V. Barbu Mititelu, M. Todorova, M. Cristescu. Wrapping our Heads Around VMWEs and their Derivatives. Proceedings of the 14th International Conference on Linguistic Resources and Tools for Natural Language Processing, Cluj-Napoca, 18-20 November 2019, Editura Universității „Alexandru Ioan Cuza” din Iași, 2019, ISSN:1843-911X, 153-165.
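Лесева, Св., Цв. Димитрова, М. Тодорова, Ив. Стоянова. Описание на отношенията между глаголните несвободни фрази и техните деривати в лингвистични ресурси [Description of the relations between verbal multiword expressions and their derivatives in linguistic resources]. Сборник от конференцията Юбилейни Паисиеви четения 2018: 45 години филологии в Пловдивския университет [Proceedings of the conference Jubilee Paisii Readings 2018: 45 Years of Philologies at Plovdiv University], accepted for publication: 2019, 1-17.
Тодорова, М. Аргументност и фразеологизация [Argumenthood and phraseologisation]. Рада и приятели. Сборник в чест на 65-годишнината на проф. д-р Радка Влахова [Rada and Friends. A collection in honour of the 65th birthday of Prof. Dr. Radka Vlahova], Университетско издателство „Св. Климент Охридски“ [St. Kliment Ohridski University Press], 2019, ISBN: 987-954-07-4700-2, 143-156.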
Leseva, S., Barbu Mititelu, V., Stoyanova, I. It Takes Two to Tango – Towards a Multilingual MWE Resource. Proceedings of the Fourth International Conference “Computational Linguistics in Bulgaria” (CLIB 2020), Institute for Bulgarian Language, 2020, ISSN:2367-5675, 101-111.