
Dr. Hristo Tanev (Joint Research Centre, EC, Italy)

Short bio

Hristo Tanev is a project officer and researcher at the Joint Research Centre of the European Commission. His research spans several areas of computational linguistics and natural language processing, including event extraction, text classification, question answering, social media mining, lexical learning, language resources, and multilingualism.

He is a co-organizer of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. He has carried out research at three institutions: the University of Plovdiv Paisii Hilendarski (Bulgaria); ITC-irst (now Fondazione Bruno Kessler), Trento, Italy; and the Joint Research Centre of the European Commission, Ispra, Italy. He is among the founders of SIG SLAV (Special Interest Group of Slavic Language Processing) at ACL.


Demo abstract

Ontopopulis, a System for Learning Semantic Classes

Ontopopulis is a multilingual terminology learning system that implements several weakly supervised algorithms. Its main algorithm takes as input a set of seed terms for a given semantic category and an unannotated text corpus, and learns additional terms that belong to the same category. For example, for the category “environmental disasters” in Bulgarian, the input seed set is: замърсяване на водите (water pollution), изменение на климата (climate change), суша (drought). The highest-ranked new terms that the system learns for this semantic class include опустиняване (desertification), обезлесяване (deforestation), and озонова дупка (ozone hole).
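The general idea of growing a semantic class from seed terms can be illustrated with a minimal sketch. This is not the Ontopopulis algorithm, whose details are not given in this abstract. It is a simple context-overlap heuristic chosen for illustration, and it assumes single-token terms and a pre-tokenized corpus. The function names `context_features` and `expand_seeds`, the `window` and `top_k` parameters, and the toy English corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def context_features(sentences, window=2):
    """Map each token to a Counter of tokens seen within `window` positions of it."""
    contexts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[tok][tokens[j]] += 1
    return contexts


def expand_seeds(sentences, seeds, window=2, top_k=5):
    """Rank non-seed tokens by how much of their context overlaps the seeds' context.

    Illustrative heuristic only (an assumption, not the Ontopopulis method).
    """
    contexts = context_features(sentences, window)
    seed_set = set(seeds)

    # Pool the context words of all seed terms into one profile.
    seed_profile = Counter()
    for seed in seed_set:
        seed_profile.update(contexts.get(seed, Counter()))

    scores = {}
    for term, ctx in contexts.items():
        if term in seed_set:
            continue
        shared = sum(min(count, seed_profile[feat])
                     for feat, count in ctx.items() if feat in seed_profile)
        if shared:
            # Fraction of the candidate's context mass shared with the seeds.
            scores[term] = shared / sum(ctx.values())

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy corpus (hypothetical), already tokenized on whitespace.
corpus = [
    "the region suffers from drought every summer".split(),
    "the region suffers from flooding every spring".split(),
    "the country suffers from deforestation every year".split(),
    "the minister visited the region in spring".split(),
]

print(expand_seeds(corpus, ["drought"], top_k=2))
```

On a corpus this small, the ranking is noisy beyond the first few candidates. A practical system would also need multi-word term handling, as the Bulgarian examples above show, and a more robust scoring scheme.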

In the demo session, we will show how the system learns different semantic classes in Bulgarian and English.
