
Dr. Veselin Stoyanov

Posted by Svetlozara Lesseva On March 27th, 2024

Short bio

Dr. Veselin Stoyanov is a researcher with a track record of innovating in AI and NLP to solve real-world problems. He is currently the Head of AI at Tome, where he builds practical applications of LLMs. He was previously at Facebook AI, where he led the development of industry-standard large language model methods such as RoBERTa, XLM-R, and MultiRay and their application to improving online experiences, e.g., reducing the prevalence of hate speech and bullying posts. He holds a Ph.D. from Cornell University and a Post-Doctoral degree from Johns Hopkins University.

Talk abstract

Large Language Models for the Real World: Explorations of Sparse, Cross-lingual Understanding and Instruction-Tuned LLM

Large language models (LLMs) have revolutionized NLP and the use of Natural Language in products. Nonetheless, there are challenges to the wide adoption of LLMs. In this talk, I will describe my explorations into addressing some of those challenges. I will cover work on sparse models addressing high computational costs, multilingual LLMs addressing the need to handle many languages, and work on instruction finetuning addressing the alignment between model outputs and human needs.

Prof. Vito Pirrelli

Posted by Svetlozara Lesseva On February 29th, 2024

Short bio

Prof. Vito Pirrelli has been Research Manager at the National Research Council Institute for Computational Linguistics “Antonio Zampolli” since 2003. He is head of the Laboratory for Communication Physiology and co-editor-in-chief of The Mental Lexicon and Lingue e Linguaggio. His main research interests focus on fundamental issues of language architecture and physiology, lying at the interdisciplinary crossroads of cognitive linguistics, psycholinguistics, neuroscience and information science.

Over the last 20 years, he has been leading a data-driven research program that uses artificial neural networks, language models and information and communication technologies to investigate language as a holistic dynamic system, emerging from interrelated patterns of sensory experience, communicative and social interaction and psychological and neurobiological mechanisms. This research program went beyond the fragmentation of mainstream NLP technologies of the early 21st century, allowing innovation to come out of research labs and address societal needs. Using portable devices and cloud computing to collect ecological multimodal language data, the Comphys Lab currently offers a battery of tools, resources and protocols that support language teaching and education assessment, cultural integration and early diagnosis and intervention of language and cognitive disorders.

In 2021, following a peer review by the relevant Class Committee, he was elected a member of the Academia Europaea.

Talk abstract

Written Text Processing and the Adaptive Reading Hypothesis

Oral reading requires the fine coordination of eye movements and articulatory movements. The eye provides access to the input stimuli needed for voice articulation to unfold at a relatively constant rate, while control of articulation provides internal feedback to oculomotor control, so that eye movements are directed when and where a decoding problem arises.

A factor that makes coordination of the eye and the voice particularly hard to manage is their asynchrony. Eye movements are faster than voice articulation and are much freer to scan a written text forwards and backwards. As a result, given a certain time window, the eye can typically fixate more words than the voice can articulate.

According to most scholars, readers compensate for this functional asynchrony by using their phonological buffer, a working memory stack of limited temporal capacity where fixated words can be maintained temporarily, until they are read out loud. The capacity of the phonological buffer thus puts an upper limit on the distance between the position of the voice and the position of the eye during oral text reading, known as the eye-voice span.

In my talk, I will discuss recent reading evidence showing that the eye-voice span is the “elastic” outcome of an optimally adaptive viewing strategy, interactively modulated by individual reading skills and the lexical and structural features of a text. The eye-voice span not only varies across readers depending on their rate of articulation, but also varies within each reader, getting larger when a larger structural unit is processed. This suggests that skilled readers can optimally coordinate articulation and fixation times for text processing, adaptively using their phonological memory buffer to process linguistic structures of different size and complexity.

Prof. Joakim Nivre

Posted by Svetlozara Lesseva On February 29th, 2024

Short bio

Prof. Joakim Nivre is Professor of Computational Linguistics at Uppsala University and Senior Researcher at RISE (Research Institutes of Sweden). He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University.

His research focuses on data-driven methods for natural language processing, in particular for morphosyntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the widely used MaltParser system, and one of the founders of the Universal Dependencies project, which aims to develop cross-linguistically consistent treebank annotation for many languages and currently involves nearly 150 languages and over 500 researchers around the world. He has produced over 300 scientific publications and has over 42,000 citations according to Google Scholar (February, 2024). He is a fellow of the Association for Computational Linguistics and was the president of the association in 2017.

Talk abstract

Ten Years of Universal Dependencies

Universal Dependencies (UD) is a project developing cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. Since UD was launched almost ten years ago, it has grown into a large community effort involving over 500 researchers around the world, together producing treebanks for 148 languages and enabling new research directions in both NLP and linguistics. In this talk, I will review the history and development of UD and discuss challenges that we need to face when bringing UD into the future.

Jose Manuel Gomez-Perez

Posted by Svetlozara Lesseva On August 15th, 2022

Short bio

Jose Manuel Gomez-Perez is the Director of Language Technology Research at expert.ai. He works at the intersection of several areas of artificial intelligence, combining structured knowledge and neural models to enable machine understanding of unstructured data as a process analogous to human comprehension. Jose Manuel collaborates with organizations like the European Space Agency and has advised several tech startups. A former Marie Curie fellow, he holds a Ph.D. in Computer Science and Artificial Intelligence based on his work during Project Halo, an initiative of Microsoft co-founder Paul Allen to create a Digital Aristotle for life and physical sciences.

He regularly publishes in areas of AI, natural language processing and knowledge graphs, and has given invited seminars at different universities in Europe and the USA. Recently, he published the book A Practical Guide to Hybrid Natural Language Processing. Magazines like Nature and Scientific American, as well as newspapers like El País, have collected his views on AI, language and vision understanding, and their applications.

Talk abstract

Towards AI that Reasons with Scientific Text and Images

Reading a textbook in a particular discipline and being able to answer the questions at the end of each chapter is one of the grand challenges of artificial intelligence, which requires advances in language, vision, problem-solving, and learning theory. Such challenges are best illustrated in the scientific domain, where complex information is presented over a variety of modalities involving not only language but also visual information, like diagrams and figures.

In this talk, we will analyze the specific challenges entailed in understanding scientific documents and share some of the recent advances in the area that enable the development of AI systems capable of answering scientific questions. In addition, we will reflect on what new developments will be required to address the next grand challenge: to create an AI system that can make major scientific discoveries by itself.

Prof. Bolette Sandford Pedersen

Posted by Svetlozara Lesseva On July 6th, 2022

Short bio

Bolette Sandford Pedersen is Professor of Computational Linguistics, Deputy Head of the Department of Nordic Studies and Linguistics, and Centre Leader of the Centre for Language Technology. Her main research interests include computational lexicography, lexical semantics and linguistic ontologies.

Bolette Sandford Pedersen was coordinator of the Nordic NORFA network SPINN on harmonisation of language resources in the Nordic countries, coordinator of the Danish Senseval2 participation on sense tagging, project manager of DanNet, package leader of lexical resources in DK-CLARIN (2008-2011), Danish coordinator of the EU project CLARA (Common Language Resources and their Applications), a Marie Curie Initial Training Network (2011-2014), and of the EU project META-NORD (2011-2013), and project co-leader of the project Semantic Processing Across Domains financed by the Danish Research Council (2013-2016).

She has been a member of scientific committees at ACL, COLING, the Global WordNet Conference, the Euralex Congress, LREC, and OntoLex, among others.

Talk abstract

Lexical Conceptual Resources in the Era of Neural Language Models

Lexical conceptual resources such as wordnets, framenets, terminologies and ontologies have been compiled for many languages over recent decades in order to provide NLP systems with formally expressed information about the semantics of words and phrases, and about how they refer to the world. In recent years, neural language models, based as they are solely on text from large corpora, have become a game-changer in the NLP field. It is time we ask ourselves: What is the role of lexical conceptual resources in the era of neural language models? The claim of my talk is that they still play a crucial role, since NLP systems based on textual distribution alone will always be to some extent insufficient and biased. Through my own work, which has over the years taken place in close collaboration with leading lexicographers in Denmark, I will illustrate how such conceptual resources can be compiled based on existing high-quality and continuously updated lexicographical resources and how they can be further curated by examining the distributional patterns captured in word embeddings.

Dr. Hristo Tanev (Joint Research Centre, EC, Italy)

Posted by Svetlozara Lesseva On June 30th, 2022

Short bio

Hristo Tanev is a project officer and researcher at the Joint Research Centre of the European Commission. His research spans various areas of computational linguistics and natural language processing, including event extraction, text classification, question answering, social media mining, lexical learning, language resources, and multilingualism.

He is a co-organizer of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. He has carried out research in three research institutions: University of Plovdiv Paisii Hilendarski (Bulgaria), ITC-irst (now Fondazione Bruno Kessler), Trento, Italy, and the Joint Research Centre of the European Commission, Ispra, Italy. He is among the founders of SIG SLAV (Special Interest Group of Slavic Language Processing) at ACL.

Demo abstract

Ontopopulis, a System for Learning Semantic Classes

Ontopopulis is a multilingual terminology learning system which implements several weakly supervised algorithms for terminology learning. The main algorithm in the system is a weakly supervised one which takes as input a set of seed terms for the semantic category under consideration and an unannotated text corpus. The algorithm learns additional terms which belong to this category. For example, for the category “environment disasters” in Bulgarian, the input seed set is: замърсяване на водите (water pollution), изменение на климата (climate change), суша (drought). The highest-ranked new terms which the system learns for this semantic class are: опустиняване (desertification), обезлесяване (deforestation), озонова дупка (ozone hole) and so on.

In the demo session we are going to show how the system learns different semantic classes in Bulgarian and English.
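The seed-based learning described in this abstract can be illustrated with a minimal sketch. This is not the Ontopopulis algorithm, whose details the abstract does not give; it is a generic distributional expansion that assumes single-token terms and a pre-tokenized corpus. The function names, the context window and the overlap score are all illustrative assumptions.

```python
from collections import Counter


def context_features(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    features = {}
    for tokens in sentences:
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            contexts = features.setdefault(token, Counter())
            for j in range(lo, hi):
                if j != i:
                    contexts[tokens[j]] += 1
    return features


def expand_seeds(sentences, seeds, window=2, top_k=3):
    """Rank non-seed tokens by how much of their context is shared with the seeds.

    Hypothetical scoring, chosen for illustration only: the fraction of a
    candidate's context occurrences that also occur around the seed terms.
    """
    features = context_features(sentences, window)
    seed_profile = Counter()
    for seed in seeds:
        seed_profile.update(features.get(seed, {}))
    scores = {}
    for term, contexts in features.items():
        if term in seeds:
            continue
        overlap = sum(min(n, seed_profile[c]) for c, n in contexts.items())
        if overlap:
            scores[term] = overlap / sum(contexts.values())
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy unannotated corpus (invented for this example).
corpus = [
    "drought causes severe crop losses".split(),
    "flooding causes severe crop losses".split(),
    "deforestation causes severe soil erosion".split(),
    "the committee approved the budget".split(),
]
seeds = {"drought", "flooding"}
print(expand_seeds(corpus, seeds, top_k=1))
```

On this toy corpus the top-ranked new term is "deforestation", because it occurs in the same contexts as the two seeds. Lower-ranked candidates are noisier, which is one reason practical systems use more careful ranking than this sketch does.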

Protected: MIC MM Object Detection Challenges 2022

Posted by Svetlozara Lesseva On May 27th, 2022

Linguistic Intelligence: Computers vs. Humans (Abstract)

Posted by Svetlozara Lesseva On April 27th, 2018

Prof. Dr. Ruslan Mitkov, University of Wolverhampton

Computers are ubiquitous: they are, and are used, everywhere. But how good are computers at understanding and producing natural languages (e.g. English or Bulgarian)? In other words, what is the level of their linguistic intelligence? This presentation will examine the linguistic intelligence of computers and will look at the challenges ahead…


Prof. Ruslan Mitkov (University of Wolverhampton)

Posted by Svetlozara Lesseva On October 30th, 2017

Short bio

Prof. Dr. Ruslan Mitkov has been working in Natural Language Processing (NLP), Computational Linguistics, Corpus Linguistics, Machine Translation, Translation Technology and related areas since the early 1980s. Whereas Prof. Mitkov is best known for his seminal contributions to the areas of anaphora resolution and automatic generation of multiple-choice tests, his extensively cited research (more than 240 publications including 14 books, 35 journal articles and 36 book chapters) also covers topics such as machine translation, translation memory and translation technology in general, bilingual term extraction, automatic identification of cognates and false friends, natural language generation, automatic summarisation, computer-aided language processing, centering, evaluation, corpus annotation, NLP-driven corpus-based study of translation universals, text simplification, NLP for people with language disabilities and computational phraseology.

Mitkov is author of the monograph Anaphora resolution (Longman) and Editor of the most successful Oxford University Press Handbook – The Oxford Handbook of Computational Linguistics. Current prestigious projects include his role as Executive Editor of the Journal of Natural Language Engineering published by Cambridge University Press and Editor-in-Chief of the Natural Language Processing book series of John Benjamins publishers. Dr. Mitkov is also working on the forthcoming Oxford Dictionary of Computational Linguistics (Oxford University Press, co-authored with Patrick Hanks) and the forthcoming second, substantially revised edition of the Oxford Handbook of Computational Linguistics.

Prof. Mitkov has been invited as a keynote speaker at a number of international conferences including conferences on translation and translation technology. He has acted as Programme Chair of various international conferences on Natural Language Processing (NLP), Machine Translation, Translation Technology, Translation Studies, Corpus Linguistics and Anaphora Resolution. He is asked on a regular basis to review for leading international funding bodies and organisations and to act as a referee for applications for Professorships both in North America and Europe. Ruslan Mitkov is regularly asked to review for leading journals, publishers and conferences and serve as a member of Programme Committees or Editorial Boards. Prof. Mitkov has been an external examiner of many doctoral theses and curricula in the UK and abroad, including Master’s programmes related to NLP, Translation and Translation Technology.

Dr. Mitkov has considerable external funding to his credit (more than €20,000,000) and is currently acting as Principal Investigator of several large projects, some of which are funded by UK research councils, by the EC as well as by companies and users from the UK and USA. Ruslan Mitkov received his MSc from the Humboldt University in Berlin, his PhD from the Technical University in Dresden and worked as a Research Professor at the Institute of Mathematics, Bulgarian Academy of Sciences, Sofia.

Mitkov is Professor of Computational Linguistics and Language Engineering at the University of Wolverhampton which he joined in 1995 and where he set up the Research Group in Computational Linguistics. His Research Group has emerged as an internationally leading unit in applied Natural Language Processing and members of the group have won awards in different NLP/shared-task competitions. In addition to being Head of the Research Group in Computational Linguistics, Prof. Mitkov is also Director of the Research Institute in Information and Language Processing. The Research Institute consists of the Research Group in Computational Linguistics and the Research Group in Statistical Cybermetrics, which is another top performer internationally. Ruslan Mitkov is Vice President of ASLING, an international Association for promoting Language Technology. Dr. Mitkov is a Fellow of the Alexander von Humboldt Foundation, Germany and was invited as Distinguished Visiting Professor at the University of Franche-Comté in Besançon, France; he also serves as Vice-Chair for the prestigious EC funding programme ‘Future and Emerging Technologies’.

In recognition of his outstanding professional/research achievements, Prof. Mitkov was awarded the title of Doctor Honoris Causa at Plovdiv University in November 2011. At the end of October 2014 Dr. Mitkov was also conferred Professor Honoris Causa at Veliko Tarnovo University.

Talk abstract

With a Little Help from NLP: My Language Technology Applications with Impact on Society

The talk will present three original methodologies developed by the speaker, underpinning implemented Language Technology tools which are already having an impact on the following areas of society: e-learning, translation and interpreting and care for people with language disabilities.

The first part of the presentation will introduce an original methodology and tool for generating multiple-choice tests from electronic textbooks. The application draws on a variety of Natural Language Processing (NLP) techniques which include term extraction, semantic computing and sentence transformation. The presentation will include an evaluation of the tool which demonstrates that generation of multiple-choice tests items with the help of this tool is almost four times faster than manual construction and the quality of the test items is not compromised. This application benefits e-learning users (both teachers and students) and is an example of how NLP can have a positive societal impact, in which the speaker passionately believes.

The talk will go on to outline two other original recent projects which are also related to the application of NLP beyond academia. First, a project, whose objective is to develop next-generation translation memory tools for translators and, in the near future, for interpreters, will be briefly presented. Finally, an original methodology and system will be outlined which helps users with autism to read and better understand texts.

Dr. Sujith Ravi (Google)

Posted by Svetlozara Lesseva On September 23rd, 2017

Short bio

Dr. Sujith Ravi, a Staff Research Scientist and Manager at Google, leads the company’s large-scale graph-based machine learning platform that powers natural language understanding and image recognition for products used by millions of people every day in Search, Gmail, Photos, Android, YouTube, and Allo. The machine learning technology enables features such as Smart Reply, which automatically suggests replies to incoming e-mails or chat messages in Inbox and Allo; Photos, which searches for anything, from “hugs” to “dogs,” with the latest image recognition system; and smart messaging directly from Android Wear smartwatches powered by on-device machine learning.

Dr. Ravi has authored more than 50 scientific publications and patents in top-tier machine learning and natural language processing conferences, and his work won the ACM SIGKDD Best Research Paper Award in 2014. He organizes machine learning symposia/workshops and regularly serves as Area Chair and PC of top-tier machine learning and natural language processing conferences.

Talk abstract

Neural Graph Learning

Recent machine learning advances have enabled us to build intelligent systems that understand semantics from speech, natural language text and images. While great progress has been made in many AI fields, building scalable intelligent systems from “scratch” still remains a daunting challenge for many applications. To overcome this, we exploit the power of graph algorithms, since they offer a simple, elegant way to express different types of relationships observed in data and can concisely encode the structure underlying a problem. In this talk I will focus on the question “How can we combine the flexibility of graphs with the power of machine learning?”

I will describe how we address these challenges and design efficient algorithms by employing graph-based machine learning as a computing mechanism to solve real-world prediction tasks. Our graph-based machine learning framework can operate at large scale and easily handle massive graphs (containing billions of vertices and trillions of edges) and make predictions over billions of output labels while achieving O(1) space complexity per vertex. In particular, we combine graph learning with deep neural networks to power a number of machine intelligence applications, including Smart Reply, image recognition and video summarization, to tackle complex language understanding and computer vision problems. I will also introduce some of our latest research and share results on “neural graph learning”, a new joint optimization framework for combining graph learning with deep neural network models.
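As a rough illustration of what graph-based learning can look like, the sketch below implements classic label propagation on a tiny weighted graph. It is a textbook technique swapped in for illustration, not the framework described in the abstract: it keeps a full label distribution at every vertex, so it does not achieve the O(1) space per vertex mentioned above. The function name, the graph and the labels are all hypothetical.

```python
def propagate_labels(edges, seeds, iterations=20):
    """Spread seed labels over a weighted graph by repeated neighbour averaging.

    edges: dict mapping every node to a dict of {neighbour: weight}.
    seeds: dict mapping a few nodes (all present in `edges`) to known labels.
    Returns a dict mapping every node to its highest-scoring label.
    """
    labels = sorted(set(seeds.values()))
    scores = {node: {label: 0.0 for label in labels} for node in edges}
    for node, label in seeds.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, neighbours in edges.items():
            if node in seeds:
                # Seed nodes stay clamped to their known label.
                updated[node] = scores[node]
                continue
            total = sum(neighbours.values())
            updated[node] = {
                label: (sum(w * scores[n][label] for n, w in neighbours.items()) / total
                        if total else 0.0)
                for label in labels
            }
        scores = updated
    return {node: max(dist, key=dist.get) for node, dist in scores.items()}


# Toy chain graph a - b - c - d with labelled endpoints (invented data).
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
known = {"a": "dog", "d": "cat"}
print(propagate_labels(graph, known))
```

Here the unlabelled vertices "b" and "c" each take the label of the nearer seed. Every update uses only a vertex's direct neighbours, which is the property that allows this family of methods to be distributed over very large graphs.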