Assoc. Prof. Svetla Boytcheva (Institute of Information and Communication Technologies) | Computational Linguistics in Bulgaria (CLIB-2026)

Short bio

Dr. Svetla Boytcheva holds a PhD in Computer Science and an MSc in Mathematics from Sofia University St. Kliment Ohridski, Bulgaria. Her PhD thesis is in the field of machine learning and NLP. She has taught computer science courses for more than 25 years at leading universities in Bulgaria: Sofia University, the American University in Bulgaria, the University of Library Studies and Information Technologies, and New Bulgarian University. She gained leadership and management experience as Vice Dean of Academic Affairs and head of the graduate program in Artificial Intelligence at Sofia University, as well as by supervising successfully defended PhD research and more than 30 undergraduate and graduate thesis projects.

Her current research interests include different aspects of Artificial Intelligence and Biomedical Informatics: machine learning, data mining, big data analytics, natural language processing, health informatics, and e-learning. She has participated in several EU-funded and national research projects, including EC FP7 (PSIP+, AcomIn, SISTER); EC FP6 (TENCompetence, KALEIDOSCOPE); INCO-Copernicus (LarFlast, ILPnet2); SOCRATES/ERASMUS (ETN-DEC); BMBF Germany (BIMDANUBE); and the Bulgarian National Science Fund (EVTIMA, IZIDA, DemoSem). She is the responsible person on the Bulgarian team for the H2020 projects InnoRate, ExaMode, and theFMS. Currently, she also participates in several governmental projects and operational program projects co-financed by the EU: the National Scientific Programmes eHealth and Information and Communication Technologies for a Single Digital Market in Science, Education and Security; the Centre for Advanced Computing and Data Processing; and the Ministry of Health projects Development of National Electronic Health Register for Diabetes Mellitus Diseases and Analysis of the morbidity, prevalence, and treatment assessment of Diabetes Mellitus and Cardiovascular Diseases. In 2011 she received the Rolf Hansen Memorial Award of the European Federation for Medical Informatics.

Currently, she is also an associate professor of computer science in the Department of Linguistic Modelling and Knowledge Processing at the Institute of Information and Communication Technologies of the Bulgarian Academy of Sciences. At Sirma AI (trading as Ontotext) she is Senior Research Lead, responsible for conducting research and developing prototypes for the company's scientific projects. She has authored 10 books and more than 90 scientific papers. She is also an author of several textbooks in Computer Science and Information Technologies for middle and secondary schools in Bulgaria.


Talk abstract

Clinical Natural Language Processing in Bulgarian

Healthcare is a data-intensive domain. A large amount of patient data is generated daily, yet more than 80% of this information is stored in an unstructured format, as clinical texts. Clinical narratives are usually written in telegraph-style sentences and contain ambiguous abbreviations, many typographical errors, missing punctuation, concatenated words, and so on. The Bulgarian context adds a further difficulty: medical texts mix terminology in Bulgarian, in Latin, and in Latin transliterated into Cyrillic, which makes text analytics more challenging. With the recent improvement in the quality of natural language processing (NLP), it is increasingly recognized as the most useful tool for extracting clinical information from free text in scientific medical publications and clinical records. NLP of non-English clinical text is quite a challenge because of the lack of resources and NLP tools. International medical ontologies such as SNOMED, MeSH (Medical Subject Headings), and the UMLS (Unified Medical Language System) are not yet available in most languages. This necessitates the development of new methods for processing clinical information and for semi-automatically generating medical language resources. The task is not easy, because sufficiently accessible repositories of medical records are lacking: the content contains a lot of personal data, and access to it is subject to specific regulations.

This talk discusses the multilingual aspects of automated information extraction from clinical narratives in the Bulgarian language. This is a very important task for medical informatics because it enables the automatic structuring of patient information and the generation of databases that can then be queried to search for complex relationships. The results can help improve systems for clinical decision support, diagnosis, and treatment support.
