Uncategorized « Department of Computational Linguistics

Penny Labropoulou

Penny Labropoulou is a Principal Applications Researcher at the Institute for Language and Speech Processing/R.C. “Athena”, working mainly in the following areas: metadata models for the documentation of language resources and technologies and cultural resources; infrastructures for sharing and exploiting language resources and language technologies; licensing issues and legal metadata for language resources; computational lexicography and lexicology (design and implementation…

Georg Rehm

Prof. Dr. Georg Rehm works as a Principal Researcher in the Speech and Language Technology Lab at the German Research Center for Artificial Intelligence (DFKI), in Berlin. Currently, Georg Rehm is the Coordinator of QURATOR (BMBF, 2018-2022) and European Language Grid (ELG; EU, 2019-2022). Furthermore, he is the Co-coordinator of European Language Equality (ELE; EU, 2021-2022) and involved, as a…

Post-workshops Survey (ELG / MIC 21 Workshops)

➥ Information about the event ➥ Download pdf

Pre-workshops Survey (ELG / MIC 21 Workshops): Results

➥ Information about the event ➥ Download pdf

Registration (ELG / MIC 21 Workshops)

➥ Information about the event Thank you for your interest in the First Bulgarian dissemination event of the European Language Grid (ELG). The event will be streamed live on the YouTube channel of the Department of Computational Linguistics. Participants can use the chat for comments and questions, which will be addressed during the Q&A sessions. Please take part in our…

CulNet 2020

List of synsets in the Bulgarian Culinary WordNet 2020

Wiki1000+ corpus with annotated MWEs

General description: Wiki1000+ is a corpus of Wikipedia articles compiled for the study of multiword expressions (MWEs) in Bulgarian. Wiki1000+ contains 6311 text samples and 13.4 million tokens. The corpus is part of the Bulgarian National Corpus. Compilation: The corpus is collected automatically by a web crawler which crawls all pages in the Bulgarian…

N-grams on Bulgarian National Corpus

The BgNgrams lists are extracted from the current version of the Bulgarian National Corpus (with a core Bulgarian part containing over 1.2 billion words). The n-grams cover both lemmas (n-gram lemma) and word forms (n-gram word form) and range from 1-grams to 5-grams. The n-gram language models (orders 1 to 5) are provided in the standard ARPA text format and in binary format.
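As a rough illustration of what lists of 1-grams to 5-grams contain, the sketch below counts n-grams of each order over a tokenized sentence. It is a minimal example under stated assumptions, not the procedure used to build BgNgrams: the function name `extract_ngrams` and the toy English sentence are hypothetical, and real lists would be computed over lemmas or word forms from the corpus.

```python
from collections import Counter


def extract_ngrams(tokens, max_order=5):
    """Count all n-grams of order 1..max_order in a list of tokens.

    Hypothetical helper for illustration only; not part of BgNgrams.
    """
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Toy sentence (an assumption, not taken from the corpus).
tokens = "the cat sat on the mat".split()
counts = extract_ngrams(tokens)

# Print the most frequent n-gram of each order with its count.
for order in sorted(counts):
    ngram, freq = counts[order].most_common(1)[0]
    print(order, " ".join(ngram), freq)
```

The same function can be applied to a list of lemmas instead of word forms to obtain lemma n-grams.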

Frequency Dictionaries

General overview: The Frequency Dictionaries are derived from the Bulgarian National Corpus (BulNC), which is the largest systematically created and representative corpus of Bulgarian. The Frequency Dictionaries reflect the frequency of occurrence of lexical items in the corpus (BulNC version: December 2011). The classification of the BulNC samples is based on their style, domain and genre. Texts are divided into…

Bulgarian WordNet

Multilingual Image Corpus

Bulgarian National Corpus

Dictionary of Bulgarian Language, online implementation by DCL

META-SHARE – network of repositories of language data, tools and related web services

System for business intelligence, language resources provided by DCL.