PROGRAMME | Computational Linguistics in Bulgaria (CLIB-2024)

9 September 2024

House of Europe (124 G. Rakovski St.)

8:30 – Registration

9:00 – 9:15 – Conference Opening

Neural Networks, Large Language Models and Language Modelling

9:15 – 10:00 – Plenary Talk: Dr. Veselin Stoyanov (TOME AI, USA): Large Language Models for the Real World: Explorations of Sparse, Cross-lingual Understanding and Instruction-Tuned LLMs

10:00 – 11:15 – Session 1: Large Language Models and Language Learning | Chair: Stoyan Mihov (Institute of Information and Communication Technologies, BAS)

10:00 – 10:25 – Radu Ion, Verginica Barbu Mititelu, Vasile Pais, Elena Irimia, Valentin Badea: A Cross–model Study on Learning Romanian Parts of Speech with Transformer Models

10:25 – 10:50 – Ekaterina Goliakova, David Langlois: What do BERT Word Embeddings Learn about the French Language?

10:50 – 11:15 – Camille Lavigne, Alex Stasica: Whisper–TAD: A General Model for Transcription, Alignment and Diarization of Speech

11:15 – 11:30 – Coffee Break

11:30 – 12:45 – Session 2: Large Language Models in Analysis and Generation | Chair: Ivan Koychev (Sofia University St. Kliment Ohridski)

11:30 – 11:55 – Iglika Nikolova–Stoupak, Gael Lejeune, Eva Schaeffer–Lacroix: Contemporary LLMs and Literary Abridgement: An Analytical Inquiry

11:55 – 12:20 – Milica Ikonic Nesic, Sasa Petalinkar, Mihailo Skoric, Ranka Stankovic, Biljana Rujevic: Advancing Sentiment Analysis in Serbian Literature: A Zero and Few–Shot Learning Approach Using the Mistral Model

12:20 – 12:45 – Lyuboslav Karev, Ivan Koychev: Generating Phonetic Embeddings for Bulgarian Words with Neural Networks

12:45 – 13:45 – Lunch and Poster Session

13:45 – 14:30 – Plenary Talk: Prof. Joakim Nivre (Uppsala University and RISE, Sweden): Ten Years of Universal Dependencies

14:30 – 15:45 – Session 3: Treebanks and Parsers in Universal Dependencies | Chair: Milena Dobreva (University of Strathclyde)

14:30 – 14:55 – Nelda Kote, Rozana Rushiti, Anila Cepani, Alba Haveriku, Evis Trandafili, Elinda Kajo Mece, Elsa Skenderi Rakipllari, Lindita Xhanari, Albana Deda: Universal Dependencies Treebank for Standard Albanian: A New Approach

14:55 – 15:20 – Verginica Barbu Mititelu, Tudor Voicu: Function Multiword Expressions Annotated with Discourse Relations in the Romanian Reference Treebank

15:20 – 15:45 – Atanas Atanasov: Dependency Parser for Bulgarian

15:45 – 16:00 – Coffee Break

16:00 – 17:40 – Session 4: Modeling Multiword Expressions | Chair: Aleksandra Bagasheva (Sofia University St. Kliment Ohridski)

16:00 – 16:25 – Madalina Chitez, Ana–Maria Bucur, Andreea Dinca, Roxana Rogobete: Towards a Romanian Phrasal Academic Lexicon

16:25 – 16:50 – Laura Rituma, Gunta Nespore–Berzkalne, Agute Klints, Ilze Lokmane, Madara Stade, Peteris Paikens: Classifying Multi–Word Expressions in the Latvian Monolingual Electronic Dictionary Tezaurs.lv

16:50 – 17:15 – Laura Occhipinti: Complex Word Identification for Italian Language: A Dictionary–based Approach

17:15 – 17:40 – Ivana Brac, Matea Birtic: Verbal Multiword Expressions in the Croatian Verb Lexicon

10 September 2024

Datasets, Corpora and Lexical-semantic Resources

9:00 – 9:45 – Plenary Talk: Prof. Vito Pirrelli (NRC, Institute for Computational Linguistics, Pisa, Italy): Written Text Processing and the Adaptive Reading Hypothesis

9:45 – 10:35 – Session 5: Language Technologies and Language Acquisition | Chair: Mariana Damova (Mozaika Ltd.)

9:45 – 10:10 – Alessandro Lento, Andrea Nadalini, Marcello Ferro, Claudia Marzi, Vito Pirrelli, Tsvetana Dimitrova, Hristina Kukova, Valentina Stefanova, Maria Todorova, Svetla Koeva: Assessing Reading Literacy of Bulgarian Pupils with Finger–tracking

10:10 – 10:35 – Denitza Charkova: Educational Horizons: Mapping the Terrain of Artificial Intelligence Integration in Bulgarian Educational Settings

10:35 – 11:25 – Session 6: Corpus–based Studies: Part 1 | Chair: Svilena Georgieva (DG Translation, EU)

10:35 – 11:00 – Ekaterina Tarpomanova: Evidential Auxiliaries as Non–reliability Markers in Bulgarian Parliamentary Speech

11:00 – 11:25 – Iglika Nikolova–Stoupak, Eva Schaeffer–Lacroix, Gael Lejeune: Extended Context at the Introduction of Complex Vocabulary in Abridged Literary Texts

11:25 – 11:40 – Coffee Break

11:40 – 12:55 – Session 6: Corpus–based Studies: Part 2 | Chair: Elitza Horozova (Translation Agency Sofita)

11:40 – 12:05 – Junya Morita: Corpus–based Research into Derivational Morphology: A Comparative Study of Japanese and English Verbalization

12:05 – 12:30 – Ivan Derzhanski, Olena Siruk: The Verbal Category of Conditionality in Bulgarian and Its Ukrainian Correspondences

12:30 – 12:55 – Natalia Dankova: Lexical Richness of French and Quebec Journalistic Texts

12:55 – 13:55 – Lunch and Poster Session

13:55 – 15:10 – Session 7: Language Resources and Datasets | Chair: Irina Temnikova (GATE Institute)

13:55 – 14:20 – Maria Khokhlova, Mikhail Koryshev: A Corpus of Liturgical Texts in German: Towards Multilevel Text Annotation

14:20 – 14:45 – Valentin Zmiycharov, Ivan Koychev, Todor Tsonkov: EurLexSummarization – A New Text Summarization Dataset on EU Legislation in 24 Languages with GPT Evaluation

14:45 – 15:10 – Petya Osenova: On a Hurtlex Resource for Bulgarian

15:10 – 15:30 – Coffee Break

15:30 – 17:10 – Session 8: WordNets, FrameNets and Ontologies | Chair: Tinko Tinchev (Sofia University St. Kliment Ohridski)

15:30 – 15:55 – Ivelina Stoyanova: Semantic Features in the Automatic Analysis of Verbs of Creation in Bulgarian and English

15:55 – 16:20 – Svetlozara Leseva: A ‘Dip-dive’ into Motion: Exploring Lexical Resources towards a Comprehensive Semantic and Syntactic Description

16:20 – 16:45 – Ivelina Stoyanova, Hristina Kukova, Maria Todorova, Tsvetana Dimitrova: Multilingual Corpus of Illustrative Examples on Activity Predicates

16:45 – 17:10 – Svetla Koeva: Large Language Models in Linguistic Research: The Pilot and the Copilot

17:10 – 17:30 – Conference Closing

POSTER SESSION

Chair: Svetlozara Leseva (Institute for Bulgarian Language, BAS)

The Poster Session will take place during the lunch breaks on 9 and 10 September.

The posters are listed in alphabetical order of the first authors’ surnames.

Fabio Maion, Tsvetana Dimitrova, Andrej Bojadziev: A Unified Annotation of the Stages of the Bulgarian Language. First Steps

Amal Haddad Haddad, Damith Premasiri: ChatGPT: Detection of Spanish Terms Based on False Friends

Jordan Kralev: Deep Learning Framework for Identifying Future Market Opportunities from Textual User Reviews

Ruslana Margova, Bastiaan Bruinsma: Look Who’s Talking: The Most Frequently Used Words in the Bulgarian Parliament 1990–2024

Sabrina Mennella, Maria Di Maro, Martina Di Bratto: Estimating Commonsense Knowledge from a Linguistic Analysis on Information Distribution

Georgi Pashev, Silvia Gaftandzhieva: Pondera: A Personalized AI–Driven Weight Loss Mobile Companion with Multidimensional Goal Fulfillment Analytics

Stanislav Penkov: Mitigating Hallucinations in Large Language Models via Semantic Enrichment of Prompts: Insights from BioBERT and Ontological Integration

Maria Todorova: Commercially Minor Languages and Localization