Author Archive: Svetlozara Lesseva

Prof. Iryna Gurevych (Technical University of Darmstadt, Germany)

Short bio

Iryna Gurevych is a German computer scientist. She is Professor at the Department of Computer Science of the Technical University of Darmstadt and Director of Ubiquitous Knowledge Processing Lab. She has a strong background in information extraction, semantic text processing, machine learning and innovative applications of NLP to social sciences and humanities.

Iryna Gurevych has authored over 300 publications in international conferences and journals and is a member of the programme and conference committees of more than 50 high-level conferences and workshops (ACL, EACL, NAACL, etc.).

She is the holder of several awards, including the Lichtenberg-Professorship Career Award and the Emmy-Noether Career Award (both in 2007). In 2021 she received the first LOEWE-professorship of the LOEWE programme. She was selected as an ACL Fellow in 2020 for her outstanding work in natural language processing and machine learning and has been the Vice-president-elect of the ACL since 2021.

Talk abstract

Detect – Verify – Communicate: Combating Misinformation with More Realistic NLP

Dealing with misinformation is a grand challenge of the information society directed at equipping computer users with effective tools for identifying and debunking misinformation. Current Natural Language Processing (NLP), including its fact-checking research, fails to meet the expectations of real-life scenarios. In this talk, we show why the past work on fact-checking has not yet led to truly useful tools for managing misinformation, and discuss our ongoing work on more realistic solutions. NLP systems are expensive in terms of financial cost, computation, and manpower needed to create data for the learning process. With that in mind, we are pursuing research on detection of emerging misinformation topics to focus human attention on the most harmful, novel examples. Automatic methods for claim verification rely on large, high-quality datasets. To this end, we have constructed two corpora for fact checking, considering larger evidence documents and pushing the state of the art closer to the reality of combating misinformation. We further compare the capabilities of automatic, NLP-based approaches to what human fact checkers actually do, uncovering critical research directions for the future. To correct false beliefs, we are collaborating with cognitive scientists and psychologists to automatically detect and respond to attitudes of vaccine hesitancy, encouraging anti-vaxxers to change their minds with effective communication strategies.

Prof. Shuly Wintner (University of Haifa, Israel)

Short bio

Shuly Wintner is professor of computer science at the University of Haifa, Israel. His research spans across various areas of computational linguistics and natural language processing, including formal grammars, morphology, syntax, language resources, translation, and multilingualism.

He served as the editor-in-chief of Springer’s Research on Language and Computation, a program co-chair of EACL-2006, and the general chair of EACL-2014. He was among the founders, and twice (6 years) the chair, of ACL SIG Semitic. He is currently the Chair of the EACL.

Talk abstract

The Hebrew Essay Corpus

The Hebrew Essay Corpus is an annotated corpus of Hebrew language argumentative essays authored by prospective higher-education students. The corpus includes both essays by native speakers, written as part of the psychometric exam that is used to assess their future success in academic studies; and essays authored by non-native speakers, with three different native languages, that were written as part of a language aptitude test. The corpus is uniformly encoded and stored. The non-native essays were annotated with target hypotheses whose main goal is to make the texts amenable to automatic processing (morphological and syntactic analysis).

I will describe the corpus and the error correction and annotation schemes used in its analysis. In addition, I will discuss some of the challenges involved in identifying and analyzing non-native language use in general, and propose various ways for dealing with these challenges. Then, I will present classifiers that can accurately distinguish between native and non-native authors; determine the mother tongue of the non-natives; and predict the proficiency level of non-native Hebrew learners. This is important for practical (mainly educational) applications, but the endeavor also sheds light on the features that support the classification, thereby improving our understanding of learner language in general, and transfer effects from Arabic, French, and Russian on non-native Hebrew in particular.

Linguistic Intelligence: Computers vs. Humans (Abstract)

Prof. Dr. Ruslan Mitkov, University of Wolverhampton

Computers are ubiquitous – they are used everywhere. But how good are computers at understanding and producing natural languages (e.g. English or Bulgarian)? In other words, what is the level of their linguistic intelligence? This presentation will examine the linguistic intelligence of computers and will look at the challenges ahead…

Prof. Ruslan Mitkov (University of Wolverhampton)

Short bio

Prof. Dr. Ruslan Mitkov has been working in Natural Language Processing (NLP), Computational Linguistics, Corpus Linguistics, Machine Translation, Translation Technology and related areas since the early 1980s. Whereas Prof. Mitkov is best known for his seminal contributions to the areas of anaphora resolution and automatic generation of multiple-choice tests, his extensively cited research (more than 240 publications including 14 books, 35 journal articles and 36 book chapters) also covers topics such as machine translation, translation memory and translation technology in general, bilingual term extraction, automatic identification of cognates and false friends, natural language generation, automatic summarisation, computer-aided language processing, centering, evaluation, corpus annotation, NLP-driven corpus-based study of translation universals, text simplification, NLP for people with language disabilities and computational phraseology.

Mitkov is author of the monograph Anaphora resolution (Longman) and Editor of the most successful Oxford University Press Handbook – The Oxford Handbook of Computational Linguistics. Current prestigious projects include his role as Executive Editor of the Journal of Natural Language Engineering published by Cambridge University Press and Editor-in-Chief of the Natural Language Processing book series of John Benjamins publishers. Dr. Mitkov is also working on the forthcoming Oxford Dictionary of Computational Linguistics (Oxford University Press, co-authored with Patrick Hanks) and the forthcoming second, substantially revised edition of the Oxford Handbook of Computational Linguistics.

Prof. Mitkov has been invited as a keynote speaker at a number of international conferences including conferences on translation and translation technology. He has acted as Programme Chair of various international conferences on Natural Language Processing (NLP), Machine Translation, Translation Technology, Translation Studies, Corpus Linguistics and Anaphora Resolution. He is asked on a regular basis to review for leading international funding bodies and organisations and to act as a referee for applications for Professorships both in North America and Europe. Ruslan Mitkov is regularly asked to review for leading journals, publishers and conferences and serve as a member of Programme Committees or Editorial Boards. Prof. Mitkov has been an external examiner of many doctoral theses and curricula in the UK and abroad, including Master’s programmes related to NLP, Translation and Translation Technology.

Dr. Mitkov has considerable external funding to his credit (more than € 20,000,000) and is currently acting as Principal Investigator of several large projects, some of which are funded by UK research councils, by the EC as well as by companies and users from the UK and USA. Ruslan Mitkov received his MSc from the Humboldt University in Berlin, his PhD from the Technical University in Dresden and worked as a Research Professor at the Institute of Mathematics, Bulgarian Academy of Sciences, Sofia.

Mitkov is Professor of Computational Linguistics and Language Engineering at the University of Wolverhampton which he joined in 1995 and where he set up the Research Group in Computational Linguistics. His Research Group has emerged as an internationally leading unit in applied Natural Language Processing and members of the group have won awards in different NLP/shared-task competitions. In addition to being Head of the Research Group in Computational Linguistics, Prof. Mitkov is also Director of the Research Institute in Information and Language Processing. The Research Institute consists of the Research Group in Computational Linguistics and the Research Group in Statistical Cybermetrics, which is another top performer internationally. Ruslan Mitkov is Vice President of ASLING, an international Association for promoting Language Technology. Dr. Mitkov is a Fellow of the Alexander von Humboldt Foundation, Germany and was invited as Distinguished Visiting Professor at the University of Franche-Comté in Besançon, France; he also serves as Vice-Chair for the prestigious EC funding programme ‘Future and Emerging Technologies’.

In recognition of his outstanding professional/research achievements, Prof. Mitkov was awarded the title of Doctor Honoris Causa at Plovdiv University in November 2011. At the end of October 2014 Dr. Mitkov was also awarded the title of Professor Honoris Causa at Veliko Tarnovo University.

Talk abstract

With a Little Help from NLP: My Language Technology Applications with Impact on Society

The talk will present three original methodologies developed by the speaker, underpinning implemented Language Technology tools which are already having an impact on the following areas of society: e-learning, translation and interpreting and care for people with language disabilities.

The first part of the presentation will introduce an original methodology and tool for generating multiple-choice tests from electronic textbooks. The application draws on a variety of Natural Language Processing (NLP) techniques which include term extraction, semantic computing and sentence transformation. The presentation will include an evaluation of the tool which demonstrates that generation of multiple-choice test items with the help of this tool is almost four times faster than manual construction and the quality of the test items is not compromised. This application benefits e-learning users (both teachers and students) and is an example of how NLP can have a positive societal impact, in which the speaker passionately believes.

The talk will go on to outline two other original recent projects which are also related to the application of NLP beyond academia. First, a project, whose objective is to develop next-generation translation memory tools for translators and, in the near future, for interpreters, will be briefly presented. Finally, an original methodology and system will be outlined which helps users with autism to read and better understand texts.


Short bio

Dr. Sujith Ravi, a Staff Research Scientist and Manager at Google, leads the company’s large-scale graph-based machine learning platform that powers natural language understanding and image recognition for products used by millions of people every day in Search, Gmail, Photos, Android, YouTube, and Allo. The machine learning technology enables features such as Smart Reply that automatically suggests replies to incoming e-mails or chat messages in Inbox and Allo; Photos that searches for anything, from “hugs” to “dogs,” with the latest image recognition system; and smart messaging directly from Android Wear smartwatches powered by on-device machine learning.

Dr. Ravi has authored more than 50 scientific publications and patents in top-tier machine learning and natural language processing conferences, and his work won the ACM SIGKDD Best Research Paper Award in 2014. He organizes machine learning symposia/workshops and regularly serves as Area Chair and PC of top-tier machine learning and natural language processing conferences.

Talk abstract

Neural Graph Learning

Recent machine learning advances have enabled us to build intelligent systems that understand semantics from speech, natural language text and images. While great progress has been made in many AI fields, building scalable intelligent systems from “scratch” still remains a daunting challenge for many applications. To overcome this, we exploit the power of graph algorithms since they offer a simple, elegant way to express different types of relationships observed in data and can concisely encode structure underlying a problem. In this talk I will focus on “How can we combine the flexibility of graphs with the power of machine learning?”

I will describe how we address these challenges and design efficient algorithms by employing graph-based machine learning as a computing mechanism to solve real-world prediction tasks. Our graph-based machine learning framework can operate at large scale and easily handle massive graphs (containing billions of vertices and trillions of edges) and make predictions over billions of output labels while achieving O(1) space complexity per vertex. In particular, we combine graph learning with deep neural networks to power a number of machine intelligence applications, including Smart Reply, image recognition and video summarization to tackle complex language understanding and computer vision problems. I will also introduce some of our latest research and share results on “neural graph learning”, a new joint optimization framework for combining graph learning with deep neural network models.


Short bio

After leading and managing the AWS Deep Learning group at Amazon that was responsible for building and solving natural language processing and dialog applications (2016–2017), as of December 2017 Dr. Zornitsa Kozareva has taken a managerial position at Google. From 2014 to 2016 she was a Senior Manager at Yahoo! leading the Query Processing group that powered Mobile Search and Advertisement. Earlier, during the period 2009–2014, Dr. Kozareva wore an academic hat as Research Professor at the University of Southern California CS Department with affiliation to Information Sciences Institute where she spearheaded research funded by DARPA and IARPA on learning to read, interpreting metaphors and building knowledge bases from the Web.

Dr. Kozareva regularly serves as Area Chair and PC of top-tier NLP conferences. She has organized four SemEval scientific challenges and has published over 80 research papers. Dr. Kozareva is a recipient of the John Atanasoff Award given by the President of the Republic of Bulgaria in 2016 for her contributions and impact in science, education, and industry; the Yahoo! Labs Excellence Award in 2014; and the RANLP Young Scientist Award in 2011.

Talk abstract

Building Conversational Assistants using Deep Learning

Over the years there has been a paradigm shift in how humans interact with machines. Today’s users are no longer satisfied with seeing a list of relevant web pages; instead, they want to complete tasks and take actions. This raises the questions: “How do we teach machines to become useful in a human-centered environment?” and “How do we build machines that help us organize our daily schedules, arrange our travel and be aware of our preferences and habits?”. In this talk, I will describe these challenges in the context of conversational assistants. Then, I will delve into deep learning algorithms for entity extraction, user intent prediction and question answering. Finally, I will highlight findings on user intent prediction from shopping, movies, restaurant and sport domains.


Gold sponsor of CLIB 2016:



Prof. Dragomir Radev (Department of Electrical Engineering and Computer Science, University of Michigan)

Short bio

Dragomir Radev is a Professor of Computer Science and Engineering, Information, and Linguistics at the University of Michigan. He also has an appointment in the Michigan Institute for Data Science (MIDAS).

Dragomir grew up in Bulgaria and got interested in Computational Linguistics in high school when he participated in a number of contests in mathematical linguistics. Dragomir has a PhD in Computer Science from Columbia University, where he currently holds a Visiting Professor title. Dragomir’s research is in Natural Language Processing, Applied Machine Learning, and Information Retrieval. He works in the fields of text summarization, lexical semantics, sentiment analysis, open domain question answering, and the application of NLP to other areas such as Bioinformatics and Political Science.

Dragomir is the past secretary of the Association for Computational Linguistics (ACL). Dragomir is also co-founder of the North American Computational Linguistics Olympiad (NACLO) and the coach of the US team for the International Linguistics Olympiad (IOL). Dragomir has close to 200 international publications as well as three patents. He is the co-author (with Rada Mihalcea) of the book “Graph-based Natural Language Processing and Information Retrieval” and the editor of two volumes of “Puzzles in Logic, Languages and Computation”.

Dragomir has worked for or consulted for IBM, Yahoo, Microsoft, AT&T, and other companies. In 2013, Dragomir received the University of Michigan’s Distinguished Faculty Award. He is an associate editor of the Journal of Artificial Intelligence Research (JAIR). Dragomir also teaches introduction to Natural Language Processing on Coursera. Dragomir became an Association for Computing Machinery (ACM) Fellow in 2015.

Talk abstract

Natural Language Processing for Collective Discourse

Natural Language Processing (NLP) has become very popular in recent years thanks to new technologies like IBM’s Watson, Apple’s Siri, Google Translate, and Yahoo’s text summarization system. One of the fundamental challenges in NLP is to automatically recognize similar words and sentences. I will talk about research done in the Computational Linguistics And Information Retrieval lab (CLAIR) on graph-based methods for similarity recognition and its applications to NLP tasks. These projects are related to Collective Discourse (text collections produced by large numbers of users) and its inherent properties such as centrality and diversity. In the first project we team up with the New Yorker magazine. Each week a captionless cartoon is published in the magazine and thousands of readers try to come up with funny captions for it. In our work, we try to uncover the topics of the jokes in the submitted captions. The second project is about analysing a corpus of word clues used in New York Times crossword puzzles. We compare different clustering methods for word sense disambiguation using these crossword clues. The third project is about the automatic generation of citation-based summaries of research articles. These summaries describe what readers of the papers find most important in the cited papers. If there is time, I will also briefly mention some applications to bioinformatics, political science, and social network analysis.

Dr. Preslav Nakov (Qatar Computing Research Institute, HBKU)

Short bio

This year our invited speaker will be Dr. Preslav Nakov of the Qatar Computing Research Institute, HBKU. His primary research interests include computational linguistics, machine translation, question answering, lexical semantics, Web as a corpus, and biomedical text processing.

Preslav Nakov holds a PhD degree in Computer Science from the University of California at Berkeley and an MSc degree from Sofia University. He was a Research Fellow at the National University of Singapore (2008-2011), an honorary lecturer at Sofia University (2008), a researcher at the Bulgarian Academy of Sciences (2008), and a visiting researcher at the University of Southern California, Information Sciences Institute (2005).

Preslav Nakov is a co-author of a book on Semantic Relations between Nominals, two books on computer algorithms, and over 100 research papers, including over 40 in top-tier conferences and journals.

He received the Young Researcher Award at the Recent Advances in Natural Language Processing Conference 2011 (RANLP’2011) and was the first to receive the Bulgarian President’s John Atanasoff award. His research in machine translation won competitions in the Seventh and the Ninth Workshop on Statistical Machine Translation (WMT’12 and WMT’14), as well as in the 10th International Workshop on Spoken Language Translation (IWSLT’13).

Preslav Nakov is an Associate Editor of the AI Communications journal and an elected member of the ACL SIGLEX board (since 2013). He served on the programme committees of the major conferences and workshops in computational linguistics, including as a co-chair of SemEval 2014-2016, and as an area chair of *SEM’13 and EMNLP’16.

Talk abstract

Exposing Paid Opinion Manipulation Trolls in News Community Forums

The practice of using opinion manipulation trolls has been a reality since the rise of the Internet and community forums. It has been shown that user opinions about products, companies and politics can be influenced by posts by other users in online forums and social networks. This makes it easy for companies and political parties to gain popularity by paying for “reputation management” to people or companies that post fake opinions from fake profiles in discussion forums and social networks.

During the 2013-2014 Bulgarian protests against the Oresharski cabinet, social networks and news community forums became the main “battle grounds” between supporters and opponents of the government. In that period, there was a very notable presence and activity of government supporters in Web forums. In a series of leaked documents published by the independent Bulgarian media outlet Bivol, it was alleged that the ruling Socialist party was paying Internet trolls with EU Parliament money. Allegedly, these trolls were hired by a PR agency and were given specific instructions on what to write.

A natural question is whether such trolls can be found and exposed automatically. This is a very hard task, as there is not enough data to train a classifier; yet, it is possible to obtain some test data, as these trolls are sometimes caught and widely exposed (e.g., by Bivol). Yet, one still needs training data. We solve the problem by assuming that a user who is called a troll by several different people is likely to be one, and one who has never been called a troll is unlikely to be such. We compare the profiles of (i) paid trolls vs. (ii) “mentioned” trolls vs. (iii) non-trolls, and we further show that a classifier trained to distinguish (ii) from (iii) does quite well also at telling apart (i) from (iii).
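
The labelling assumption described in this abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name `weak_labels`, the input format (pairs of accuser and accused) and the threshold `min_accusers=3` standing in for "several different people" are all assumptions made here for illustration.

```python
from collections import defaultdict


def weak_labels(accusations, all_users, min_accusers=3):
    """Assign weak training labels from troll accusations.

    accusations: iterable of (accuser, accused) pairs (hypothetical format).
    all_users: iterable of every user seen in the forum.
    min_accusers: assumed stand-in for "several different people".
    """
    # Collect the set of distinct accusers per accused user, so that
    # repeated accusations by the same person count only once.
    accusers = defaultdict(set)
    for accuser, accused in accusations:
        if accuser != accused:
            accusers[accused].add(accuser)

    labels = {}
    for user in all_users:
        count = len(accusers.get(user, ()))
        if count >= min_accusers:
            labels[user] = "mentioned-troll"
        elif count == 0:
            labels[user] = "non-troll"
        # Users accused by only a few people are ambiguous and left unlabelled.
    return labels


example = weak_labels(
    accusations=[("a", "x"), ("b", "x"), ("c", "x"), ("a", "x"), ("a", "y")],
    all_users=["x", "y", "z"],
)
print(example)
```

In this toy input, user `x` is accused by three distinct people, `y` by only one and `z` by none, so `y` receives no label. A classifier would then be trained on the labelled users only; the feature set and the classifier itself are outside the scope of this sketch.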

Plenary Talks