
Prof. Galya Angelova (Institute of Information and Communication Technologies)

Short bio

Galya Angelova is Professor in Computer Science and Doctor of Sciences, director of the Institute of Information and Communication Technologies (IICT) at the Bulgarian Academy of Sciences. She studied Mathematics and Informatics at Sofia University “St. Kliment Ohridski” and received her PhD from MTA SZTAKI (the Computer and Automation Institute of the Hungarian Academy of Sciences). Her major fields of research are: knowledge-based natural language processing (information extraction from text, automatic acquisition of conceptual information from text, analysis of clinical patient records in Bulgarian, analysis of image tags and automatic tag sense recognition); big data analytics and visualization; and digitization and intelligent management of digital content.

Prof. Angelova has published more than 150 scientific publications in journals, book chapters, and edited conference volumes. She has been the coordinator or principal investigator of more than 25 projects with international or national funding. In 2012-2016 she coordinated the project AComIn “Advanced Computing for Innovation”, a €3.2 million FP7 Capacities grant from the European Commission, which the Commission included in its book “Achievements of FP7: examples that make us proud”. Prof. Angelova received the Big Award PITAGOR of the Bulgarian Ministry of Education and Science in the category “Successful leader of international projects for 2015”. She acts as a reviewer and evaluator for the European Commission.

In 2002-2013, Prof. Angelova was a member of the Editorial Board of the International Conference on Conceptual Structures (ICCS), whose series of peer-reviewed proceedings is published by Springer in Lecture Notes in Artificial Intelligence. Since 2001, she has been the Chair of the Organising Committee of the International Conference RANLP (Recent Advances in Natural Language Processing), held biennially in Bulgaria. Since 2009, the RANLP proceedings have been uploaded to the ACL Anthology, and since 2007 they have been indexed by Scopus, with a current SJR rank of 0.143.

➥ Back to Plenary Talks

Talk abstract

Tag Sense Disambiguation in Large Image Collections: Is It Possible?

Automatic identification of intended tag meanings is a challenge in large annotated image collections, where human authors assign tags inspired by emotional or professional motivations. This task can be viewed as part of the AI-complete problem of integrating language and vision. Algorithms for automatic Tag Sense Disambiguation (TSD) need “golden” collections of manually created tags to establish baselines for accuracy assessment. In this talk the TSD task will be presented with its background, complexity and possible solutions. An approach using WordNet senses and the Lesk algorithm proved successful, but the evaluation was done manually for a small number of tags. Another experiment, with the MIRFLICKR-25000 image collection, will be presented as well. Word embeddings provide a specific baseline against which the results can be compared. The accuracy achieved in this exercise is 78.6%.

By improving TSD and obtaining high-quality synsets for the image tags, we effectively support the machine translation of large annotated image collections into languages other than English.
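The Lesk idea underlying the approach above can be sketched in a few lines: for an ambiguous tag, pick the sense whose gloss overlaps most with the tags co-occurring on the same image. The two-sense inventory below is invented for illustration, not data from the talk:

```python
def simplified_lesk(tag, context_tags, sense_glosses):
    """Pick the sense whose gloss shares the most words with the
    co-occurring tags (a simplified Lesk heuristic)."""
    best_sense, best_overlap = None, -1
    context = set(t.lower() for t in context_tags)
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical sense inventory for the tag "mouse" (not from the talk)
senses = {
    "mouse.animal": "small rodent with a pointed snout and long tail",
    "mouse.device": "hand operated pointing device for a computer screen",
}
print(simplified_lesk("mouse", ["computer", "screen", "keyboard"], senses))
# → mouse.device
```

Real TSD maps tags to WordNet synsets and uses full glosses plus related synsets, but the overlap-counting core is the same.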

➥ Back to Plenary Talks

Assoc. Prof. Svetla Boytcheva (Institute of Information and Communication Technologies)

Short bio

Dr. Svetla Boytcheva has a PhD in Computer Science and an MSc in Mathematics from Sofia University St. Kliment Ohridski, Bulgaria. Her PhD thesis is in the field of Machine Learning and NLP. She has taught computer science courses for more than 25 years at some of the top universities in Bulgaria: Sofia University, the American University in Bulgaria, the University of Library Studies and Information Technologies, and New Bulgarian University. She gained leadership and management skills as Vice Dean of Academic Affairs and head of the graduate programme in Artificial Intelligence at Sofia University, as well as supervisor of a successfully defended PhD student and of more than 30 undergraduate and graduate student thesis projects.

Her current research interests include various aspects of Artificial Intelligence and Biomedical Informatics: machine learning, data mining, big data analytics, natural language processing, health informatics, and e-learning. She has gained experience in several EU-funded and national research projects, including EC FP7 (PSIP+, AComIn, SISTER), EC FP6 (TENCompetence, KALEIDOSCOPE), INCO-Copernicus (LarFlast, ILPnet2), SOCRATES/ERASMUS (ETN-DEC), BMBF Germany (BIMDANUBE), and the Bulgarian National Science Fund (EVTIMA, IZIDA, DemoSem). She is the responsible person from the Bulgarian team for the H2020 projects InnoRate, ExaMode, and theFMS. Currently, she also participates in several governmental projects and operational-programme projects co-financed by the EU: the eHealth National Scientific Programme; Information and Communication Technologies for a United Digital Market in Science, Education and Security; the Centre for Advanced Computing and Data Processing; and the Ministry of Health projects Development of a National Electronic Health Register for Diabetes Mellitus Diseases and Analysis of the Morbidity, Prevalence, and Treatment Assessment of Diabetes Mellitus and Cardiovascular Diseases. In 2011 she received the Rolf Hansen Memorial Award of the European Federation for Medical Informatics.

Currently, she is also an associate professor of computer science in the Department of Linguistic Modelling and Knowledge Processing at the Institute of Information and Communication Technologies of the Bulgarian Academy of Sciences. Her current position at Sirma AI (trading as Ontotext) is Senior Research Lead, which includes responsibility for conducting research and developing prototypes for the company’s scientific projects. She has authored 10 books and more than 90 scientific papers. She is also an author of several textbooks in Computer Science and Information Technologies for middle and secondary schools in Bulgaria.

➥ Back to Plenary Talks

Talk abstract

Clinical Natural Language Processing in Bulgarian

Healthcare is a data-intense domain, and a large amount of patient data is generated daily. However, more than 80% of this information is stored in an unstructured format – as clinical texts. Clinical narratives typically feature telegraph-style sentences, ambiguous abbreviations, many typographical errors, lack of punctuation, concatenated words, etc. In the Bulgarian context especially, medical texts contain terminology in Bulgarian, in Latin, and in Latin transliterated into Cyrillic, which makes text analytics even more challenging. Recently, with the improvement of the quality of natural language processing (NLP), it has become increasingly recognized as the most useful tool for extracting clinical information from free text in scientific medical publications and clinical records. NLP of non-English clinical text is quite a challenge because of the lack of resources and NLP tools. International medical ontologies such as SNOMED, MeSH (Medical Subject Headings), and the UMLS (Unified Medical Language System) are not yet available in most languages. This necessitates the development of new methods for processing clinical information and for semi-automatically generating medical language resources. This is not an easy task because of the lack of sufficiently accessible repositories of medical records, owing to the specific nature of their content, which contains a great deal of personal data and is subject to specific access regulations.

This talk will discuss the multilingual aspects of automatic information extraction from clinical narratives in the Bulgarian language. This is a very important task for medical informatics because it allows the automatic structuring of patient information and the generation of databases that can be further mined for complex relationships. The results can help improve clinical decision support, diagnosis, and treatment support systems.
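As a small illustration of the mixed-script problem described above, even a regular expression can pull out Latin-alphabet terminology embedded in a Cyrillic clinical narrative (the sample record is invented, not from a real corpus):

```python
import re

def extract_latin_terms(text):
    """Return maximal runs of Latin letters (allowing internal spaces,
    dots, and hyphens), e.g. Latin medical terms inside Cyrillic text."""
    return re.findall(r"[A-Za-z](?:[A-Za-z .\-]*[A-Za-z])?", text)

# Invented clinical fragment: Cyrillic narrative with a Latin diagnosis
record = "Диагноза: diabetes mellitus тип 2, артериална хипертония."
print(extract_latin_terms(record))  # → ['diabetes mellitus']
```

Latin terminology transliterated into Cyrillic cannot be caught this way, of course; it needs dictionaries or trained models, which is part of what makes the task hard.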

➥ Back to Plenary Talks

Dr. Preslav Nakov (Qatar Computing Research Institute)

Short bio

Dr. Preslav Nakov is a Principal Scientist at the Qatar Computing Research Institute (QCRI), HBKU. His research interests include computational linguistics, “fake news” detection, fact-checking, machine translation, question answering, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. He received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant), and he has been a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and research staff at the Bulgarian Academy of Sciences.

At QCRI, he leads the Tanbih project, developed in collaboration with MIT, which aims to limit the effect of “fake news”, propaganda and media bias by making users aware of what they are reading. Dr. Nakov is the Secretary of ACL SIGLEX and of ACL SIGSLAV, and a member of the EACL advisory board. He is a member of the editorial boards of TACL, C&SL, NLE, AI Communications, and Frontiers in AI, as well as of the Language Science Press book series on Phraseology and Multiword Expressions. He co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals.

Dr. Nakov received the Young Researcher Award at RANLP’2011. He was also the first to receive the Bulgarian President’s John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov’s research has been featured by over 100 news outlets, including Forbes, the Boston Globe, Al Jazeera, Defense One, Business Insider, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.

➥ Back to Plenary Talks

Talk abstract

Detecting the Fake News at Its Source, Media Literacy, and Regulatory Compliance

Given the recent proliferation of disinformation online, there has also been growing research interest in automatically debunking rumors, false claims, and “fake news”. A number of fact-checking initiatives have been launched so far, both manual and automatic, but the whole enterprise remains in a state of crisis: by the time a claim is finally fact-checked, it could have reached millions of users, and the harm caused could hardly be undone. An arguably more promising direction is to focus on fact-checking entire news outlets, which can be done in advance. Then we could fact-check news articles before they are even written: by checking how trustworthy the outlets that publish them are.

We will show how we do this in the Tanbih news aggregator, which aims to limit the effect of “fake news”, propaganda and media bias by making users aware of what they are reading. The project’s primary aim is to promote media literacy and critical thinking, which are arguably the best way to address disinformation and “fake news” in the long run. In particular, we develop media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, stance with respect to various claims and topics, as well as audience reach and audience bias in social media. We further offer explainability by automatically detecting and highlighting instances of the use of specific propaganda techniques in the news.

Finally, we will show how this research can support broadcasters and content owners with their regulatory measures and compliance processes. This is a direction we recently explored as part of our TM Forum & IBC 2019 award-winning Media-Telecom Catalyst project on AI Indexing for Regulatory Compliance, which QCRI developed in partnership with Al Jazeera, Associated Press, RTE Ireland, Tech Mahindra, V-Nova, and Metaliquid.

➥ Back to Plenary Talks

Linguistic Intelligence: Computers vs. Humans (Abstract)

Prof. Dr. Ruslan Mitkov, University of Wolverhampton

Computers are ubiquitous – they are present and used everywhere. But how good are computers at understanding and producing natural languages (e.g. English or Bulgarian)? In other words, what is the level of their linguistic intelligence? This presentation will examine the linguistic intelligence of computers and will look at the challenges ahead…


Ruslan Mitkov (University of Wolverhampton)

Short bio

Prof. Dr. Ruslan Mitkov has been working in Natural Language Processing (NLP), Computational Linguistics, Corpus Linguistics, Machine Translation, Translation Technology and related areas since the early 1980s. Whereas Prof. Mitkov is best known for his seminal contributions to the areas of anaphora resolution and automatic generation of multiple-choice tests, his extensively cited research (more than 240 publications, including 14 books, 35 journal articles and 36 book chapters) also covers topics such as machine translation, translation memory and translation technology in general, bilingual term extraction, automatic identification of cognates and false friends, natural language generation, automatic summarisation, computer-aided language processing, centering, evaluation, corpus annotation, NLP-driven corpus-based study of translation universals, text simplification, NLP for people with language disabilities and computational phraseology.

Mitkov is the author of the monograph Anaphora Resolution (Longman) and the editor of the most successful Oxford University Press handbook – The Oxford Handbook of Computational Linguistics. His current prestigious roles include Executive Editor of the Journal of Natural Language Engineering, published by Cambridge University Press, and Editor-in-Chief of the Natural Language Processing book series of John Benjamins. Dr. Mitkov is also working on the forthcoming Oxford Dictionary of Computational Linguistics (Oxford University Press, co-authored with Patrick Hanks) and the forthcoming second, substantially revised edition of the Oxford Handbook of Computational Linguistics.

Prof. Mitkov has been an invited keynote speaker at a number of international conferences, including conferences on translation and translation technology. He has acted as Programme Chair of various international conferences on Natural Language Processing (NLP), Machine Translation, Translation Technology, Translation Studies, Corpus Linguistics and Anaphora Resolution. He is asked on a regular basis to review for leading international funding bodies and organisations and to act as a referee for applications for Professorships both in North America and Europe. Ruslan Mitkov is regularly asked to review for leading journals, publishers and conferences and to serve as a member of Programme Committees or Editorial Boards. Prof. Mitkov has been an external examiner of many doctoral theses and curricula in the UK and abroad, including Master’s programmes related to NLP, Translation and Translation Technology.

Dr. Mitkov has considerable external funding to his credit (more than €20,000,000) and is currently acting as Principal Investigator of several large projects, some funded by UK research councils and by the EC, as well as by companies and users from the UK and USA. Ruslan Mitkov received his MSc from the Humboldt University in Berlin and his PhD from the Technical University in Dresden, and worked as a Research Professor at the Institute of Mathematics, Bulgarian Academy of Sciences, Sofia.

Mitkov is Professor of Computational Linguistics and Language Engineering at the University of Wolverhampton which he joined in 1995 and where he set up the Research Group in Computational Linguistics. His Research Group has emerged as an internationally leading unit in applied Natural Language Processing and members of the group have won awards in different NLP/shared-task competitions. In addition to being Head of the Research Group in Computational Linguistics, Prof. Mitkov is also Director of the Research Institute in Information and Language Processing. The Research Institute consists of the Research Group in Computational Linguistics and the Research Group in Statistical Cybermetrics, which is another top performer internationally. Ruslan Mitkov is Vice President of ASLING, an international Association for promoting Language Technology. Dr. Mitkov is a Fellow of the Alexander von Humboldt Foundation, Germany and was invited as Distinguished Visiting Professor at the University of Franche-Comté in Besançon, France; he also serves as Vice-Chair for the prestigious EC funding programme ‘Future and Emerging Technologies’.

In recognition of his outstanding professional/research achievements, Prof. Mitkov was awarded the title of Doctor Honoris Causa at Plovdiv University in November 2011. At the end of October 2014 Dr. Mitkov was also conferred Professor Honoris Causa at Veliko Tarnovo University.

Talk abstract

With a Little Help from NLP: My Language Technology Applications with Impact on Society

The talk will present three original methodologies developed by the speaker, underpinning implemented Language Technology tools that are already having an impact on the following areas of society: e-learning, translation and interpreting, and care for people with language disabilities.

The first part of the presentation will introduce an original methodology and tool for generating multiple-choice tests from electronic textbooks. The application draws on a variety of Natural Language Processing (NLP) techniques, including term extraction, semantic computing and sentence transformation. The presentation will include an evaluation of the tool, which demonstrates that generating multiple-choice test items with its help is almost four times faster than manual construction, while the quality of the test items is not compromised. This application benefits e-learning users (both teachers and students) and is an example of how NLP can have a positive societal impact, in which the speaker passionately believes.
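The pipeline described above (select a key term, blank it out, add distractors) can be illustrated in miniature. The sentence, key term and distractors below are invented for illustration; the real system’s term extraction and sentence transformation are far more involved:

```python
import random

def make_mcq(sentence, key_term, distractors, seed=0):
    """Turn a sentence into a gap-fill multiple-choice item (toy sketch):
    blank out the key term and shuffle it in among the distractors."""
    stem = sentence.replace(key_term, "_____")
    options = distractors + [key_term]
    random.Random(seed).shuffle(options)  # deterministic for a fixed seed
    return {"stem": stem, "options": options, "answer": key_term}

item = make_mcq(
    "Anaphora resolution determines the antecedent of a pronoun.",
    "antecedent",
    ["hypernym", "collocate", "lemma"],
)
print(item["stem"])
```

A real generator would also verify that the distractors are plausible but wrong, which is where the semantic computing mentioned above comes in.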

The talk will go on to outline two other original recent projects which are also related to the application of NLP beyond academia. First, a project, whose objective is to develop next-generation translation memory tools for translators and, in the near future, for interpreters, will be briefly presented. Finally, an original methodology and system will be outlined which helps users with autism to read and better understand texts.


Short bio

Dr. Sujith Ravi, a Staff Research Scientist and Manager at Google, leads the company’s large-scale graph-based machine learning platform that powers natural language understanding and image recognition for products used by millions of people every day in Search, Gmail, Photos, Android, YouTube, and Allo. The machine learning technology enables features such as Smart Reply, which automatically suggests replies to incoming e-mails or chat messages in Inbox and Allo; Photos search for anything, from “hugs” to “dogs”, with the latest image recognition system; and smart messaging directly from Android Wear smartwatches powered by on-device machine learning.

Dr. Ravi has authored more than 50 scientific publications and patents in top-tier machine learning and natural language processing conferences, and his work won the ACM SIGKDD Best Research Paper Award in 2014. He organizes machine learning symposia/workshops and regularly serves as Area Chair and PC of top-tier machine learning and natural language processing conferences.

Talk abstract

Neural Graph Learning

Recent machine learning advances have enabled us to build intelligent systems that understand semantics from speech, natural language text and images. While great progress has been made in many AI fields, building scalable intelligent systems from “scratch” still remains a daunting challenge for many applications. To overcome this, we exploit the power of graph algorithms, since they offer a simple, elegant way to express different types of relationships observed in data and can concisely encode the structure underlying a problem. In this talk I will focus on the question: “How can we combine the flexibility of graphs with the power of machine learning?”

I will describe how we address these challenges and design efficient algorithms by employing graph-based machine learning as a computing mechanism to solve real-world prediction tasks. Our graph-based machine learning framework can operate at large scale and easily handle massive graphs (containing billions of vertices and trillions of edges) and make predictions over billions of output labels while achieving O(1) space complexity per vertex. In particular, we combine graph learning with deep neural networks to power a number of machine intelligence applications, including Smart Reply, image recognition and video summarization to tackle complex language understanding and computer vision problems. l will also introduce some of our latest research and share results on “neural graph learning”, a new joint optimization framework for combining graph learning with deep neural network models.
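As a minimal sketch of the graph-learning idea (not Google’s actual implementation), the following pure-Python label propagation spreads a few seed labels over a graph until every vertex has a predicted class; the graph and seeds are a toy example:

```python
def propagate_labels(adj, seeds, n_iter=50, alpha=0.8):
    """Spread seed label distributions over a graph: each vertex mixes the
    average of its neighbours' distributions with its own seed (if any)."""
    n, k = len(adj), len(seeds[0])
    labels = [row[:] for row in seeds]
    for _ in range(n_iter):
        new = []
        for i in range(n):
            deg = max(len(adj[i]), 1)
            avg = [sum(labels[j][c] for j in adj[i]) / deg for c in range(k)]
            new.append([alpha * avg[c] + (1 - alpha) * seeds[i][c]
                        for c in range(k)])
        labels = new
    return [max(range(k), key=lambda c: row[c]) for row in labels]

# Toy 4-vertex chain: vertex 0 is seeded with class 0, vertex 3 with class 1
adj = [[1], [0, 2], [1, 3], [2]]          # adjacency lists
seeds = [[1, 0], [0, 0], [0, 0], [0, 1]]  # one-hot seeds, zeros = unlabelled
print(propagate_labels(adj, seeds))       # → [0, 0, 1, 1]
```

The O(1)-space-per-vertex property claimed in the talk comes from exactly this local structure: each vertex only ever reads its neighbours’ current distributions.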


Short bio

After leading and managing the AWS Deep Learning group at Amazon, which was responsible for building and solving natural language processing and dialog applications (2016–2017), Dr. Zornitsa Kozareva took a managerial position at Google in December 2017. From 2014 to 2016 she was a Senior Manager at Yahoo!, leading the Query Processing group that powered Mobile Search and Advertisement. Earlier, during the period 2009–2014, Dr. Kozareva wore an academic hat as Research Professor in the University of Southern California’s CS Department, with an affiliation to the Information Sciences Institute, where she spearheaded research funded by DARPA and IARPA on learning to read, interpreting metaphors, and building knowledge bases from the Web.

Dr. Kozareva regularly serves as Area Chair and PC member of top-tier NLP conferences. She has organized four SemEval scientific challenges and has published over 80 research papers. Dr. Kozareva is a recipient of the John Atanasoff Award, given by the President of the Republic of Bulgaria in 2016 for her contributions and impact in science, education, and industry; the Yahoo! Labs Excellence Award in 2014; and the RANLP Young Scientist Award in 2011.

Talk abstract

Building Conversational Assistants using Deep Learning

Over the years there has been a paradigm shift in how humans interact with machines. Today’s users are no longer satisfied with seeing a list of relevant web pages, instead they want to complete tasks and take actions. This raises the questions: “How do we teach machines to become useful in a human-centered environment?” and “How do we build machines that help us organize our daily schedules, arrange our travel and be aware of our preferences and habits?”. In this talk, I will describe these challenges in the context of conversational assistants. Then, I will delve into deep learning algorithms for entity extraction, user intent prediction and question answering. Finally, I will highlight findings on user intent prediction from shopping, movies, restaurant and sport domains.
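At its simplest, the user intent prediction mentioned above can be framed as scoring an utterance against each candidate intent. The keyword-overlap toy below is only a stand-in for the deep learning models the talk covers, and the intents and keywords are invented:

```python
def predict_intent(utterance, intent_keywords):
    """Score each intent by keyword overlap with the utterance and
    return the best-scoring one (toy bag-of-words sketch)."""
    tokens = set(utterance.lower().split())
    scores = {intent: len(tokens & kw) for intent, kw in intent_keywords.items()}
    return max(scores, key=scores.get)

# Hypothetical intents and trigger words, not from the talk
intents = {
    "shopping": {"buy", "order", "price", "cart"},
    "restaurants": {"table", "reserve", "dinner", "menu"},
    "movies": {"ticket", "showtimes", "film", "cinema"},
}
print(predict_intent("reserve a table for dinner tonight", intents))
# → restaurants
```

A neural intent model replaces the keyword sets with learned utterance embeddings, but the task framing (utterance in, intent label out) is the same.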


Gold sponsor of CLIB 2016:



Prof. Dragomir Radev (Department of Electrical Engineering and Computer Science, University of Michigan)

Short bio

Dragomir Radev is a Professor of Computer Science and Engineering, Information, and Linguistics at the University of Michigan. He also has an appointment in the Michigan Institute for Data Science (MIDAS).

Dragomir grew up in Bulgaria and got interested in Computational Linguistics in high school when he participated in a number of contests in mathematical linguistics. Dragomir has a PhD in Computer Science from Columbia University, where he currently holds a Visiting Professor title. Dragomir’s research is in Natural Language Processing, Applied Machine Learning, and Information Retrieval. He works in the fields of text summarization, lexical semantics, sentiment analysis, open domain question answering, and the application of NLP to other areas such as Bioinformatics and Political Science.

Dragomir is the past secretary of the Association for Computational Linguistics (ACL). Dragomir is also co-founder of the North American Computational Linguistics Olympiad (NACLO) and the coach of the US team for the International Linguistics Olympiad (IOL). Dragomir has close to 200 international publications as well as three patents. He is the co-author (with Rada Mihalcea) of the book “Graph-based Natural Language Processing and Information Retrieval” and the editor of two volumes of “Puzzles in Logic, Languages and Computation”.

Dragomir has worked for or consulted for IBM, Yahoo, Microsoft, AT&T, and other companies. In 2013, Dragomir received the University of Michigan’s Distinguished Faculty Award. He is an associate editor of the Journal of Artificial Intelligence Research (JAIR). Dragomir also teaches introduction to Natural Language Processing on Coursera. Dragomir became an Association for Computing Machinery (ACM) Fellow in 2015.

Talk abstract

Natural Language Processing for Collective Discourse

Natural Language Processing (NLP) has become very popular in recent years thanks to new technologies like IBM’s Watson, Apple’s Siri, Google Translate, and Yahoo’s text summarization system. One of the fundamental challenges in NLP is to automatically recognize similar words and sentences. I will talk about research done in the Computational Linguistics And Information Retrieval lab (CLAIR) on graph-based methods for similarity recognition and its applications to NLP tasks. These projects are related to Collective Discourse (text collections produced by large numbers of users) and its inherent properties such as centrality and diversity. In the first project we team up with the New Yorker magazine. Each week a captionless cartoon is published in the magazine and thousands of readers try to come up with funny captions for it. In our work, we try to uncover the topics of the jokes in the submitted captions. The second project is about analysing a corpus of word clues used in New York Times crossword puzzles. We compare different clustering methods for word sense disambiguation using these crossword clues. The third project is about the automatic generation of citation-based summaries of research articles. These summaries describe what readers of the papers find most important in the cited papers. If there is time, I will also briefly mention some applications to bioinformatics, political science, and social network analysis.
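The graph-based centrality idea behind these projects can be sketched as a random walk over a sentence-similarity graph, in the spirit of Radev’s LexRank. The similarity values below are made up for illustration:

```python
def centrality(sim, n_iter=50, d=0.85):
    """Power-iteration centrality over a sentence-similarity graph:
    a sentence is central if similar sentences are themselves central."""
    n = len(sim)
    # Row-normalize similarities into transition probabilities
    trans = []
    for row in sim:
        s = sum(row)
        trans.append([x / s if s else 1.0 / n for x in row])
    rank = [1.0 / n] * n
    for _ in range(n_iter):
        rank = [
            (1 - d) / n + d * sum(rank[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return rank

# Toy similarity matrix for three sentences (invented numbers)
sim = [[0.0, 0.8, 0.6], [0.8, 0.0, 0.2], [0.6, 0.2, 0.0]]
rank = centrality(sim)
print(max(range(3), key=lambda i: rank[i]))  # sentence 0 is most central
```

In a summarizer, the top-ranked sentences form the summary; the same centrality machinery extends to captions, crossword clues, or citation sentences.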

Dr. Preslav Nakov (Qatar Computing Research Institute, HBKU)

Short bio

This year our invited speaker will be Dr. Preslav Nakov of the Qatar Computing Research Institute, HBKU. His primary research interests include computational linguistics, machine translation, question answering, lexical semantics, Web as a corpus, and biomedical text processing.

Preslav Nakov holds a PhD degree in Computer Science from the University of California at Berkeley and an MSc degree from Sofia University. He was a Research Fellow at the National University of Singapore (2008-2011), an honorary lecturer at Sofia University (2008), a researcher at the Bulgarian Academy of Sciences (2008), and a visiting researcher at the University of Southern California’s Information Sciences Institute (2005).

Preslav Nakov is a co-author of a book on Semantic Relations between Nominals, two books on computer algorithms, and over 100 research papers, including over 40 in top-tier conferences and journals.

He received the Young Researcher Award at the Recent Advances in Natural Language Processing Conference 2011 (RANLP’2011) and was the first to receive the Bulgarian President’s John Atanasoff award. His research in machine translation won competitions in the Seventh and the Ninth Workshop on Statistical Machine Translation (WMT’12 and WMT’14), as well as in the 10th International Workshop on Spoken Language Translation (IWSLT’13).

Preslav Nakov is an Associate Editor of the AI Communications journal and an elected member of the ACL SIGLEX board (since 2013). He served on the programme committees of the major conferences and workshops in computational linguistics, including as a co-chair of SemEval 2014-2016, and as an area chair of *SEM’13 and EMNLP’16.

Talk abstract

Exposing Paid Opinion Manipulation Trolls in News Community Forums

The practice of using opinion manipulation trolls has been a reality since the rise of the Internet and of community forums. It has been shown that user opinions about products, companies and politics can be influenced by posts from other users in online forums and social networks. This makes it easy for companies and political parties to gain popularity by paying for “reputation management” to people or companies that post fake opinions from fake profiles in discussion forums and social networks.

During the 2013-2014 Bulgarian protests against the Oresharski cabinet, social networks and news community forums became the main “battle grounds” between supporters and opponents of the government. In that period, there was a very notable presence and activity of government supporters in Web forums. In a series of documents leaked to the independent Bulgarian media outlet Bivol, it was alleged that the ruling Socialist party was paying Internet trolls with EU Parliament money. Allegedly, these trolls were hired by a PR agency and were given specific instructions on what to write.

A natural question is whether such trolls can be found and exposed automatically. This is a very hard task, as there is not enough data to train a classifier; yet it is possible to obtain some test data, as these trolls are sometimes caught and widely exposed (e.g., by Bivol). One still needs training data, however. We solve this problem by assuming that a user who is called a troll by several different people is likely to be one, while a user who has never been called a troll is unlikely to be one. We compare the profiles of (i) paid trolls, (ii) “mentioned” trolls, and (iii) non-trolls, and we further show that a classifier trained to distinguish (ii) from (iii) also does quite well at telling apart (i) from (iii).
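The weak-labelling assumption described above can be made concrete: users accused of being trolls by at least k distinct other users become positive training examples, never-accused users become negatives, and the ambiguous middle is left unlabelled. The user names below are invented:

```python
def weak_labels(mentions, k=2):
    """Derive training labels from troll accusations: a user called a troll
    by at least k distinct accusers is a likely ("mentioned") troll, a
    never-accused user is a non-troll, everyone else stays unlabelled."""
    labels = {}
    for user, accusers in mentions.items():
        distinct = set(accusers)
        if len(distinct) >= k:
            labels[user] = "mentioned-troll"
        elif not distinct:
            labels[user] = "non-troll"
        # users with fewer than k accusers are ambiguous and left out
    return labels

mentions = {
    "userA": ["u1", "u2", "u3"],  # accused by three distinct users
    "userB": [],                  # never accused
    "userC": ["u1"],              # accused once -> left unlabelled
}
print(weak_labels(mentions))
```

The resulting labels can then train an ordinary profile-based classifier, which is the (ii)-vs-(iii) setup evaluated against the known paid trolls in (i).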