
Enriching the Semantic Network WordNet with Conceptual Frames



Duration: 2021-2023

Type of project: collective

Funding: National Science Research Fund



Principal Investigator: Prof. S. Koeva, Ph.D.

Participants: Prof. S. Koeva, Prof. Mila Dimitrova-Valchanova (Norwegian University of Science and Technology, Trondheim), Prof. Tinko Tinchev (Department of Mathematics and Informatics, Sofia University), Assist. Prof. S. Leseva, Assist. Prof. T. Dimitrova, Assist. Prof. M. Todorova, Assist. Prof. Valentina Stefanova, I. Stoyanova, M. Yalamov, K. Belev, H. Kukova, V. Petrova.

Summary:

The proposed theoretical research aims to devise a semantic description and a typology of verb predicates belonging to the basic conceptual apparatus. The semantic description will be based on developing a system of abstract conceptual frames that represent the semantic structure of verbs from the basic vocabulary (including the vocabulary of children in a particular age group), and on integrating these conceptual frames into the structure of the semantic network WordNet.

The fundamental objective of the proposed research is to achieve an abstract representation of the range of conceptual frames that describe the set of semantic relations between verb predicates and the noun classes realised as (mandatory or optional) components of predicate structures.

Our hypothesis is that, insofar as conceptualisation reflects the world around us and enables cross-language communication, such an abstract and (to a large extent) language-independent description is possible.

In order to achieve our goal, we define the following objectives:

  • Development of a system of conceptual frames representing the semantic structure of verbs from the basic vocabulary (including the vocabulary of children of a certain age group).
  • Detailed ontological presentation of the semantic classes of nouns in WordNet, participating in the semantic structure of verbs from the basic vocabulary.
  • Integrating the system of conceptual frames into the structure of the WordNet semantic network.
  • Derivation of theoretical generalisations for the ontological description of the semantic classes of nouns, for the system of conceptual frames, as well as for the complex presentation of semantic information.

A prerequisite for achieving the main goal and the specific objectives is the use of automatic procedures for analysing large volumes of text in order to identify verb predicates and their contexts, as well as the implementation and use of an online system for creating, editing and visualising conceptual frames.

This aspect of the study rests on the understanding that reliable theoretical conclusions can be drawn from quantitative and distributional analysis performed with modern language technologies.

In fulfilling the outlined objectives, another fundamental goal will be achieved: identifying the main similarities and differences in the models of conceptualisation, lexicalisation and grammaticalisation of different semantic classes of predicates in Modern Bulgarian. This will highlight language-specific and language-independent semantic characteristics that have general theoretical significance (or are valid for a large group of languages).

The plan for implementing and disseminating the results aims to provide wide public access to the new knowledge under non-exclusive and non-discriminatory conditions: free online access, distribution of the created semantic resources under a CC-BY-SA license, wide popularisation through participation in national and international scientific forums, integration of the results into teaching, and organisation of events aimed at the general public.

1. Providing online access to project results

A web page will be developed on which up-to-date information about the project and its progress will be published. Through this page, the project team will make the scientific results of the project available. Access to the project results will also be provided through the website of the Institute for Bulgarian Language. To increase their visibility, the project results (databases and the publications describing them) will be published on the META-SHARE page on the website of the Institute for Bulgarian Language: or on other similar platforms.

This will ensure wide access to the project results, including: (a) the developed semantic resources, freely available under a CC-BY-SA license: a collection of verb synonym sets that form part of the basic vocabulary; an ontology of the semantic classes of nouns in WordNet; a system of conceptual frames representing the semantic structure of verbs from the basic vocabulary; a database of verb synonym sets and their conceptual frames; and a system of conceptual frames integrated into the WordNet structure; (b) software for creating, editing and visualising conceptual frames; (c) freely available publications and presentations of reports at scientific forums (in the form of multimedia presentations, video recordings, etc.).

2. Publications

The project envisages the preparation and publication of at least 6 papers, at least two of which will appear in editions with an impact factor (Web of Science) and impact rank (SCOPUS). The papers will be published in refereed and indexed journals such as Lingvisticae Investigationes: International Journal of Linguistics and Language Resources, as well as in the journals published by the Institute for Bulgarian Language (Balgarski ezik (Bulgarian Language) and Balkansko ezikoznanie (Balkan Linguistics) (indexed in SCOPUS), and Proceedings of the Institute of Bulgarian Language “Prof. L. Andreychin”); in peer-reviewed thematic collections of papers; and in the proceedings of prestigious international conferences in which members of the project team will participate.

Some of the publications will be submitted to open-access journals. The most significant theoretical and experimental results will be published in a dedicated collective monograph comprising peer-reviewed studies. To ensure maximum visibility of the project results, the volume will be included in relevant databases of refereed and indexed publications (e.g. SCOPUS, Web of Science, ERIH, EBSCO Publishing) and/or distributed via portals for sharing scientific literature (ResearchGate, Academia, etc.).

3. Participation in scientific forums

The main channel for disseminating the results will be at least 6 presentations at international scientific forums in the fields of computational linguistics, semantic networks and ontologies, together with publications in prestigious international editions. Project participants will submit papers to national and international scientific forums such as the Language Resources and Evaluation Conference (LREC’2022), the Conference on Semantics (SEMANTICS), the Conference on Lexical and Computational Semantics (*SEM) and their accompanying workshops, the Grammar Forum organised by the Institute for Bulgarian Language (Sofia, 2021 and 2022), and the International Annual Conference of the Institute for Bulgarian Language (Sofia, 2022 and 2023). Free access to the reports (multimedia presentations, videos and/or other formats) is envisaged.

4. Lectures to the scientific community, students and teachers

The results of the project and the analyses carried out will be presented to students who take part in linguistics and computational linguistics competitions, and will be integrated into the relevant courses in which team members teach or are regularly invited as lecturers: the linguistic seminar of the PhD school at Sofia University, and the courses Contemporary Syntactic Theories and Formal Description of Natural Languages in the Master’s Programme “Computational Linguistics. Internet Technologies in the Humanities”.

5. Dissemination of the project results to the general public

As a practical application of the future theoretical and applied research, and of the development and enrichment of the resources created (WordNet and FrameNet), it is envisaged that educational games will be developed for students of different ages and for the general public. These games will integrate conceptual knowledge of predicates' compatibility and of semantic relations in a way that encourages a research-oriented approach to solving the language tasks set.

The publication of the results in renowned refereed and indexed publications, participation with scientific papers at scientific forums, the organisation of the Special Session on Wordnet and Ontologies, and scientific seminars will contribute to the dissemination of the results, to the development of scientific cooperation with Bulgarian and international teams, and to the establishment, maintenance and development of international scientific networks (e.g. the European Network of Excellence META-NET, the European Federation of National Institutions for Language, etc.), in line with the result indicators of this proposal.



Copyright © 2015 Department of computational linguistics. All rights reserved.