Category Archive: Invited Speakers

Prof. Iryna Gurevych (Technical University of Darmstadt, Germany)

Short bio

Iryna Gurevych is a German computer scientist. She is a Professor in the Department of Computer Science at the Technical University of Darmstadt and Director of the Ubiquitous Knowledge Processing Lab. She has a strong background in information extraction, semantic text processing, machine learning, and innovative applications of NLP to the social sciences and humanities.

Iryna Gurevych has published over 300 papers in international conferences and journals and is a member of the programme and conference committees of more than 50 high-level conferences and workshops (ACL, EACL, NAACL, etc.). She holds several awards, including the Lichtenberg-Professorship Career Award and the Emmy Noether Career Award (both in 2007). In 2021 she received the first LOEWE professorship of the LOEWE programme. She was selected as an ACL Fellow in 2020 for her outstanding work in natural language processing and machine learning, and she has been the Vice-President-Elect of the ACL since 2021.

Talk Abstract

Detect – Verify – Communicate: Combating Misinformation with More Realistic NLP

Dealing with misinformation is a grand challenge of the information society, directed at equipping computer users with effective tools for identifying and debunking misinformation. Current Natural Language Processing (NLP), including its fact-checking research, fails to meet the expectations of real-life scenarios. In this talk, we show why past work on fact-checking has not yet led to truly useful tools for managing misinformation, and we discuss our ongoing work on more realistic solutions. NLP systems are expensive in terms of the financial cost, computation, and manpower needed to create data for the learning process. With that in mind, we are pursuing research on the detection of emerging misinformation topics, in order to focus human attention on the most harmful, novel examples. Automatic methods for claim verification rely on large, high-quality datasets. To this end, we have constructed two corpora for fact-checking, considering larger evidence documents and pushing the state of the art closer to the reality of combating misinformation. We further compare the capabilities of automatic, NLP-based approaches to what human fact-checkers actually do, uncovering critical research directions for the future. To dispel false beliefs, we are collaborating with cognitive scientists and psychologists to automatically detect and respond to attitudes of vaccine hesitancy, encouraging anti-vaxxers to change their minds with effective communication strategies.

Prof. Shuly Wintner (University of Haifa, Israel)

Short bio

Shuly Wintner is a professor of computer science at the University of Haifa, Israel. His research spans various areas of computational linguistics and natural language processing, including formal grammars, morphology, syntax, language resources, translation, and multilingualism.

He served as the editor-in-chief of Springer's Research on Language and Computation, as a program co-chair of EACL-2006, and as the general chair of EACL-2014. He was among the founders, and twice (for six years) the chair, of ACL SIG Semitic. He is currently the Chair of the EACL.

Talk abstract

The Hebrew Essay Corpus

The Hebrew Essay Corpus is an annotated corpus of Hebrew-language argumentative essays authored by prospective higher-education students. The corpus includes both essays by native speakers, written as part of the psychometric exam used to assess their future success in academic studies, and essays by non-native speakers with three different native languages, written as part of a language aptitude test. The corpus is uniformly encoded and stored. The non-native essays were annotated with target hypotheses whose main goal is to make the texts amenable to automatic processing (morphological and syntactic analysis).

I will describe the corpus and the error correction and annotation schemes used in its analysis. In addition, I will discuss some of the challenges involved in identifying and analyzing non-native language use in general, and propose various ways of dealing with these challenges. Then, I will present classifiers that can accurately distinguish between native and non-native authors, determine the mother tongue of the non-natives, and predict the proficiency level of non-native Hebrew learners. This is important for practical (mainly educational) applications, but the endeavor also sheds light on the features that support the classification, thereby improving our understanding of learner language in general, and of transfer effects from Arabic, French, and Russian on non-native Hebrew in particular.
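The abstract does not specify the models behind these classifiers. As a rough illustration only, the following minimal Python sketch shows one standard recipe for this kind of task: character n-gram features with a linear classifier, a common baseline in native language identification. The example essays, labels, and test sentence are invented placeholders, not corpus data.

# A minimal sketch, not the talk's actual pipeline: character n-grams
# with logistic regression, a common baseline for native language
# identification. All essays below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

essays = [
    "טקסט לדוגמה של כותב ילידי",   # placeholder native-speaker essay
    "טקסט לדוגמה של לומד עברית",  # placeholder learner essay
]
labels = ["native", "non-native"]

clf = make_pipeline(
    # Character n-grams capture spelling and morphology patterns that
    # often reflect transfer effects from the author's first language.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(essays, labels)
print(clf.predict(["טקסט חדש לסיווג"]))  # placeholder unseen essay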

Dr. Georg Rehm (German Research Center for Artificial Intelligence)

Short bio

Dr. Georg Rehm works as a Principal Researcher in the Speech and Language Technology Lab at the German Research Center for Artificial Intelligence (DFKI) in Berlin. He is the General Secretary of META-NET, an EU/EC-funded Network of Excellence consisting of 60 research centers from 34 countries, dedicated to building the technological foundations of a multilingual European information society. Currently, Georg Rehm is the Coordinator of the BMBF-funded project QURATOR (Curation Technologies, 2018-2021) and the EU-funded project European Language Grid (ELG, 2019-2021). Furthermore, he is involved as a Principal Investigator in the European project Lynx: Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe (2017-2020) and in the BMWi-funded project SPEAKER (2020-2023).

In October 2018, Georg Rehm was awarded the honorary appointment as a DFKI Research Fellow for outstanding scientific achievements and special accomplishments in technology transfer.

Between 2015 and 2017, Georg Rehm was the coordinator of the BMBF-funded project Digitale Kuratierungstechnologien (Digital Curation Technologies). He was also the Coordinator of the EU/EC-funded project CRACKER (2015-2017), which initiated, among other things, the emerging European federation Cracking the Language Barrier. From 2010 to 2013 he was the project manager of T4ME, the original META-NET project.

Since 2013, Georg Rehm has been the Head of the German/Austrian Office of the World Wide Web Consortium (W3C), hosted at DFKI in Berlin. He is especially involved in the areas of Digital Publishing, Web of Things, the Data Activity, and the Credible Web. Also in relation to ICT and standardisation, he is a member of the DIN Presidential Committee FOCUS.ICT.

Georg Rehm holds an M.A. in Computational Linguistics and Artificial Intelligence, Linguistics, and Computer Science from the University of Osnabrück. After completing his PhD in Computational Linguistics at the University of Giessen, he worked at the University of Tübingen, leading projects on the sustainability of language resources and technologies. After being responsible for language technology development at an award-winning internet startup in Berlin, he joined DFKI in early 2010.

Georg Rehm has authored, co-authored, or edited more than 180 research publications and co-edited, together with Hans Uszkoreit, the META-NET White Paper Series Europe's Languages in the Digital Age as well as the META-NET Strategic Research Agenda for Multilingual Europe 2020. He is also one of the editors of Language Technologies for Multilingual Europe: Towards a Human Language Project – Strategic Research and Innovation Agenda.

Talk abstract

Demonstration of the European Language Grid

With 24 official EU languages and many additional ones, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). The European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, the European LT business is also fragmented – by nation states, languages, verticals, and sectors – which significantly holds back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform that provides, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects, the first of which recently closed. The presentation will give an overview of the European Language Grid project and will also include a demonstration of the emerging ELG technology platform.

Prof. Galya Angelova (Institute of Information and Communication Technologies)

Short bio

Galya Angelova is a Professor in Computer Science and Doctor of Sciences, and the director of the Institute of Information and Communication Technologies (IICT) at the Bulgarian Academy of Sciences. She studied Mathematics and Informatics at Sofia University "St. Kliment Ohridski" and received her PhD from MTA SZTAKI (the Computer and Automation Research Institute of the Hungarian Academy of Sciences). Her major fields of research are: knowledge-based natural language processing (information extraction from text, automatic acquisition of conceptual information from text, analysis of clinical patient records in Bulgarian, and analysis of image tags and automatic tag sense recognition); big data analytics and visualization; and digitization and intelligent management of digital content.

Prof. Angelova has published more than 150 scientific publications in journals, book chapters, and edited conference volumes. She has been the coordinator or principal investigator of more than 25 projects with international or national funding. In 2012-2016 she coordinated the project AComIn ("Advanced Computing for Innovation"), a 3.2 million euro grant from the European Commission under FP7 Capacities, which the Commission included in the book "Achievements of FP7: examples that make us proud". Prof. Angelova received the PITAGOR Big Award of the Bulgarian Ministry of Education and Science in the category "Successful leader of international projects for 2015". She acts as a reviewer and evaluator for the European Commission.

In 2002-2013, Prof. Angelova was a member of the Editorial Board of the International Conference on Conceptual Structures (ICCS), with a series of peer-reviewed proceedings published by Springer in Lecture Notes in Artificial Intelligence. Since 2001, she has been the Chair of the Organising Committee of the International Conference RANLP (Recent Advances in Natural Language Processing), held biennially in Bulgaria. Since 2009, the RANLP proceedings have been uploaded to the ACL Anthology, and since 2007 they have been indexed by Scopus, with a current SJR rank of 0.143.

Talk abstract

Tag Sense Disambiguation in Large Image Collections: Is It Possible?

Automatic identification of intended tag meanings is a challenge in large annotated image collections, where human authors assign tags inspired by emotional or professional motivations. This task can be viewed as part of the AI-complete problem of integrating language and vision. Algorithms for automatic Tag Sense Disambiguation (TSD) need gold-standard collections of manually created tags to establish baselines for accuracy assessment. In this talk, the TSD task will be presented along with its background, complexity, and possible solutions. An approach using WordNet senses and the Lesk algorithm proved successful, but the evaluation was done manually for a small number of tags. Another experiment, with the MIRFLICKR-25000 image collection, will be presented as well. Word embeddings provide a specific baseline against which the results can be compared. The accuracy achieved in this exercise is 78.6%.
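For illustration, here is a minimal Python sketch of the WordNet-plus-Lesk idea, using NLTK's simplified Lesk implementation; the image and its tag set are invented examples, not drawn from the evaluation data.

# A minimal sketch of WordNet-based tag sense disambiguation with
# NLTK's simplified Lesk algorithm; the tags are invented examples.
# Requires: import nltk; nltk.download("wordnet")
from nltk.wsd import lesk

# The co-occurring tags of one hypothetical image act as the context.
image_tags = ["bank", "river", "water", "tree", "reflection"]

for tag in image_tags:
    # lesk() returns the WordNet synset whose gloss shares the most
    # words with the context, or None if the tag is not in WordNet.
    sense = lesk(image_tags, tag, pos="n")
    if sense is not None:
        print(f"{tag:12s} -> {sense.name():18s} {sense.definition()}")

Here the surrounding tags would, ideally, pull "bank" towards its river-bank sense rather than the financial one; as the abstract notes, evaluating such output at scale requires gold-standard tag annotations.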

By improving TSD and obtaining high-quality synsets for the image tags, we are in effect supporting the machine translation of large annotated image collections into languages other than English.

Assoc. Prof. Svetla Boytcheva (Institute of Information and Communication Technologies)

Short bio

Dr. Svetla Boytcheva has a PhD in Computer Science and an MSc in Mathematics from Sofia University "St. Kliment Ohridski", Bulgaria. Her PhD thesis is in the field of Machine Learning and NLP. She has taught computer science courses for more than 25 years at some of the top universities in Bulgaria: Sofia University, the American University in Bulgaria, the University of Library Studies and Information Technologies, and New Bulgarian University. She gained leadership and management skills as Vice Dean of Academic Affairs and head of the graduate programme in Artificial Intelligence at Sofia University, as well as by supervising a successfully defended PhD student and more than 30 undergraduate and graduate thesis projects.

Her current research interests include different aspects of Artificial Intelligence and Biomedical Informatics – machine learning, data mining, big data analytics, natural language processing, health informatics, and e-learning. She gained experience by participating in several EU-funded and national research projects, including EC FP7 – PSIP+, AComIn, SISTER; EC FP6 – TENCompetence, KALEIDOSCOPE; INCO-Copernicus – LarFlast, ILPnet2; SOCRATES/ERASMUS – ETN-DEC; BMBF Germany – BIMDANUBE; and the Bulgarian National Science Fund – EVTIMA, IZIDA, DemoSem. She is the responsible person for the Bulgarian team in the H2020 projects InnoRate, ExaMode, and theFMS. Currently, she also participates in several governmental projects and operational-programme projects co-financed by the EU: the eHealth National Scientific Programmes; Information and Communication Technologies for a United Digital Market in Science, Education and Security; the Centre for Advanced Computing and Data Processing; and the Ministry of Health projects Development of a National Electronic Health Register for Diabetes Mellitus Diseases and Analysis of the Morbidity, Prevalence, and Treatment Assessment of Diabetes Mellitus and Cardiovascular Diseases. In 2011 she received the Rolf Hansen Memorial Award of the European Federation for Medical Informatics.

Currently, she is also an associate professor of computer science in the Department of Linguistic Modelling and Knowledge Processing at the Institute of Information and Communication Technologies of the Bulgarian Academy of Sciences. Her current position at Sirma AI (trading as Ontotext) is Senior Research Lead, which includes responsibility for conducting research and developing prototypes for the company's scientific projects. She has authored 10 books and more than 90 scientific papers. She is also the author of several textbooks in Computer Science and Information Technologies for middle and secondary schools in Bulgaria.

Talk abstract

Clinical Natural Language Processing in Bulgarian

Healthcare is a data-intensive domain, and a large amount of patient data is generated daily. However, more than 80% of this information is stored in an unstructured format – as clinical texts. Clinical narratives usually contain telegraph-style sentences, ambiguous abbreviations, many typographical errors, a lack of punctuation, concatenated words, and so on. In the Bulgarian context especially, medical texts contain terminology in Bulgarian, in Latin, and in transliterated Latin written in Cyrillic, which makes text analytics even more challenging. Recently, with improvements in its quality, natural language processing (NLP) has been increasingly recognized as the most useful tool for extracting clinical information from free text in scientific medical publications and clinical records. NLP of non-English clinical text is quite a challenge because of the lack of resources and NLP tools: international medical ontologies such as SNOMED, MeSH (Medical Subject Headings), and the UMLS (Unified Medical Language System) are not yet available in most languages. This necessitates the development of new methods for processing clinical information and for semi-automatically generating medical language resources. That is not an easy task, given the lack of sufficiently accessible repositories of medical records: the content contains a great deal of personal data, and access to it is strictly regulated.
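As a small illustration of the script-mixing problem described above (not a method from the talk), the following Python sketch flags tokens in a clinical note by alphabet; the sample note is invented.

# A minimal sketch for flagging Cyrillic, Latin, and mixed-script
# tokens in clinical text; the sample note below is invented.
import re

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z]")

def token_script(token: str) -> str:
    """Label a token as cyrillic, latin, mixed, or other."""
    has_cyr = bool(CYRILLIC.search(token))
    has_lat = bool(LATIN.search(token))
    if has_cyr and has_lat:
        return "mixed"
    if has_cyr:
        return "cyrillic"
    if has_lat:
        return "latin"
    return "other"

# Invented example: Bulgarian text with embedded Latin terminology.
note = "Диагноза: diabetes mellitus тип 2, status post холецистектомия"
for token in note.split():
    print(f"{token:18s} {token_script(token)}")

A pass like this could, for instance, route Latin-script terms to a different normalization step than Cyrillic ones before further analysis.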

This talk will discuss the multilingual aspects of automatic information extraction from clinical narratives in Bulgarian. This is a very important task for medical informatics, because it allows patient information to be structured automatically and databases to be generated that can then be mined for complex relationships. The results can help improve systems for clinical decision support, diagnosis, and treatment.

Dr. Preslav Nakov (Qatar Computing Research Institute)

Short bio

Dr. Preslav Nakov is a Principal Scientist at the Qatar Computing Research Institute (QCRI), HBKU. His research interests include computational linguistics, "fake news" detection, fact-checking, machine translation, question answering, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. He received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant), and he has been a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and a research staff member at the Bulgarian Academy of Sciences.

At QCRI, he leads the Tanbih project, developed in collaboration with MIT, which aims to limit the effect of "fake news", propaganda, and media bias by making users aware of what they are reading. Dr. Nakov is the Secretary of ACL SIGLEX and of ACL SIGSLAV, and a member of the EACL advisory board. He is a member of the editorial boards of TACL, C&SL, NLE, AI Communications, and Frontiers in AI, and also of the Language Science Press book series on Phraseology and Multiword Expressions. He co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals.

Dr. Nakov received the Young Researcher Award at RANLP 2011. He was also the first recipient of the Bulgarian President's John Atanasoff Award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research has been featured in over 100 news outlets, including Forbes, the Boston Globe, Al Jazeera, Defense One, Business Insider, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget, among others.

Talk abstract

Detecting the Fake News at Its Source, Media Literacy, and Regulatory Compliance

Given the recent proliferation of disinformation online, there has also been growing research interest in automatically debunking rumors, false claims, and "fake news". A number of fact-checking initiatives have been launched so far, both manual and automatic, but the whole enterprise remains in a state of crisis: by the time a claim is finally fact-checked, it could have reached millions of users, and the harm caused could hardly be undone. An arguably more promising direction is to focus on fact-checking entire news outlets, which can be done in advance. Then we could fact-check the news before it is even written: by checking how trustworthy the outlets that will publish it are.

We will show how we do this in the Tanbih news aggregator (https://www.tanbih.org/), which aims to limit the effect of "fake news", propaganda, and media bias by making users aware of what they are reading. The project's primary aim is to promote media literacy and critical thinking, which are arguably the best way to address disinformation and "fake news" in the long run. In particular, we develop media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, stance with respect to various claims and topics, as well as audience reach and audience bias in social media. We further offer explainability by automatically detecting and highlighting instances of specific propaganda techniques in the news (https://www.tanbih.org/propaganda).

Finally, we will show how this research can support broadcasters and content owners with their regulatory measures and compliance processes. This is a direction we recently explored as part of our TM Forum & IBC 2019 award-winning Media-Telecom Catalyst project on AI Indexing for Regulatory Compliance, which QCRI developed in partnership with Al Jazeera, Associated Press, RTE Ireland, Tech Mahindra, V-Nova, and Metaliquid.
