
Enriching the Semantic Network WordNet with Conceptual Frames



Duration: 2021-2023

Type of project: collective

Funding: National Science Research Fund



Principal Investigator: Prof. S. Koeva, Ph.D.

Participants: Prof. S. Koeva, Prof. Mila Dimitrova-Valchanova (Norwegian University of Science and Technology, Trondheim), Prof. Tinko Tinchev (Department of Mathematics and Informatics, Sofia University), Assist. Prof. S. Leseva, Assist. Prof. T. Dimitrova, Assist. Prof. M. Todorova, Assist. Prof. Valentina Stefanova, I. Stoyanova, M. Yalamov, K. Belev, H. Kukova, V. Petrova.

Summary:

The proposed theoretical research is aimed at devising a semantic description and a typology of verb predicates belonging to the basic conceptual apparatus. The semantic description will be based on the elaboration of a system of abstract conceptual frames representing the semantic structure of verbs belonging to the basic vocabulary (including the vocabulary of children of a certain age group) and on the integration of the conceptual frames into the structure of the semantic network WordNet.

The fundamental objective of the proposed research is to achieve an abstract representation of the range of conceptual frames which describe the set of semantic relations between verb predicates and noun classes realised as (mandatory or optional) components of predicate structures.

Our hypothesis is that, insofar as conceptualisation reflects the world around us and allows cross-language communication, such an abstract and (to a large extent) language-independent description is possible.

In order to achieve our goal, we define the following objectives:

  • Development of a system of conceptual frames representing the semantic structure of verbs from the basic vocabulary (including the vocabulary of children of a certain age group).
  • Detailed ontological presentation of the semantic classes of nouns in WordNet, participating in the semantic structure of verbs from the basic vocabulary.
  • Integrating the system of conceptual frames into the structure of the WordNet semantic network.
  • Derivation of theoretical generalisations for the ontological description of the semantic classes of nouns, for the system of conceptual frames, as well as for the complex presentation of semantic information.
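The relation between the three resources named in these objectives (verb synsets, conceptual frames and noun semantic classes) can be illustrated with a minimal sketch. It is not the project's actual data model: the class names, the role labels, the frame "Ingestion", the synset identifier and the small class hierarchy below are all hypothetical and serve only to show how a frame element might restrict the nouns that can fill it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FrameElement:
    """One component of a conceptual frame (illustrative, not the project's schema)."""
    role: str               # hypothetical role label, e.g. "Agent"
    noun_class: str         # semantic class of nouns allowed in this role
    mandatory: bool = True  # mandatory vs optional component of the predicate structure


@dataclass
class ConceptualFrame:
    name: str
    elements: list[FrameElement] = field(default_factory=list)

    def mandatory_roles(self) -> list[str]:
        return [e.role for e in self.elements if e.mandatory]


@dataclass
class VerbSynset:
    synset_id: str          # hypothetical identifier, not a real WordNet ID
    lemmas: list[str]
    frame: Optional[ConceptualFrame] = None


def accepts(frame: ConceptualFrame, role: str, noun_class: str,
            parent_of: dict[str, str]) -> bool:
    """Return True if noun_class equals, or is a subclass of, the class required for role."""
    required = next((e.noun_class for e in frame.elements if e.role == role), None)
    if required is None:
        return False
    seen: set[str] = set()
    current: Optional[str] = noun_class
    while current is not None and current not in seen:
        if current == required:
            return True
        seen.add(current)                  # guard against cycles in the hierarchy
        current = parent_of.get(current)   # move one step up the class hierarchy
    return False


# Hypothetical toy data: a tiny noun-class hierarchy (child -> parent) and one frame.
PARENT_OF = {"person": "animate being", "animate being": "entity",
             "fruit": "food", "food": "entity"}

eat_frame = ConceptualFrame("Ingestion", [
    FrameElement("Agent", "animate being"),
    FrameElement("Theme", "food"),
    FrameElement("Instrument", "artifact", mandatory=False),
])
eat = VerbSynset("example-eat-v", ["eat"], frame=eat_frame)
```

In this sketch a noun class fits a role when it is the required class or one of its descendants, so "person" is accepted as the Agent of the frame while it is rejected as its Theme.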

A prerequisite for achieving the main goal and the specific objectives is the use of automatic procedures for the analysis of large volumes of text, with a view to identifying verb predicates and their contexts, as well as the implementation and use of an online system for the creation, editing and visualisation of conceptual frames.

At the heart of this aspect of the study is the understanding that reliable theoretical conclusions can be drawn on the basis of quantitative and distributional analysis performed using modern language technologies.

In fulfilling the outlined objectives, another fundamental goal will be achieved: differentiation of the main similarities and differences in the models for conceptualisation, lexicalisation and grammaticalisation of different semantic classes of predicates in the modern Bulgarian language. This will highlight language-specific and language-independent semantic characteristics that have general theoretical significance (or are valid for a large group of languages).

The project activities are organised in work packages.

The plan for realisation and dissemination of the results is oriented towards providing wide public access to the newly acquired knowledge under non-exclusive and non-discriminatory conditions: free online access, distribution of the created semantic resources under a CC-BY-SA license, wide popularisation through participation in national and international scientific forums, integration of the results into teaching, and organisation of events aimed at the general public.

1. Providing online access to project results

A web page will be developed where up-to-date information about the project and its progress will be published. Through this page the project team will make available the scientific results of the project. Access to the project results will also be provided through the page of the Institute for Bulgarian Language. To increase their visibility, the project results (databases and publications describing them) will be published on the META-SHARE page on the website of the Institute for Bulgarian Language or on other similar platforms.

This will ensure wide access to the project results, including: a) the developed semantic resources with free access (license CC-BY-SA): a collection of verb synonym sets that form a part of the basic vocabulary; an ontology of the semantic classes of nouns in WordNet; a system of conceptual frames representing the semantic structure of verbs from the basic vocabulary; a database of verb synonym sets and their conceptual frames; a system of conceptual frames integrated into the WordNet structure; b) software for creation, editing and visualisation of conceptual frames; c) freely available publications and presentations of reports at scientific forums (in the form of multimedia presentations, video recordings, etc.).

2. Publications

The project envisages the preparation and publication of at least 6 papers in refereed and indexed journals, such as Lingvisticae Investigationes: International Journal of Linguistics and Language Resources, as well as in the journals published by the Institute for Bulgarian Language: Balgarski ezik (Bulgarian Language) and Balkansko ezikoznanie (Balkan Linguistics) (indexed in SCOPUS), and the Proceedings of the Institute of Bulgarian Language “Prof. L. Andreychin”. At least two of the papers will appear in editions with impact factor (Web of Science) and impact rank (SCOPUS). Papers will also be published in peer-reviewed thematic collections and in the proceedings of prestigious international conferences in which members of the project team will participate.

Some of the publications will be submitted to open-access journals. The most significant theoretical and experimental results will be published in a dedicated collective monograph whose studies will be accepted after peer review. To ensure maximum visibility of the project results, the volume will be included in relevant databases of refereed and indexed publications (e.g. SCOPUS, Web of Science, ERIH, EBSCO Publishing), and/or will be distributed via portals for sharing scientific literature (ResearchGate, Academia, etc.).

3. Participation in scientific forums

The main channel for the future dissemination of the results is participation in international scientific forums in the field of computational linguistics, semantic networks and ontologies, with at least 6 scientific presentations, together with publications in prestigious international editions. Project participants will submit papers to national and international scientific forums, such as the Language Resources and Evaluation Conference (LREC’2022), the Conference on Semantics (SEMANTICS), the Conference on Lexical and Computational Semantics (*SEM) and their accompanying workshops, the Grammar Forum organized by the Institute for Bulgarian Language (Sofia, 2021 and 2022), and the International Annual Conference of the Institute for Bulgarian Language (Sofia, 2022 and 2023). Free access to the reports will be provided in the form of multimedia presentations, videos and/or other materials.

4. Lectures to the scientific community, students and teachers

The results of the project and the accompanying analysis will be presented to students who participate in Linguistics and Computational Linguistics competitions and will be integrated into the relevant courses in the Master’s programmes in Computational Linguistics where some of the team members teach or are regularly invited as lecturers: the linguistic seminar of the PhD school at Sofia University, and the courses in Contemporary Syntactic Theories and Formal Description of Natural Languages in the Master’s programme “Computational Linguistics. Internet Technologies in the Humanities”.

5. Dissemination of the project results to the general public

As a practical application of the future theoretical and applied research and of the development and enrichment of the elaborated resources (WordNet and FrameNet), educational games are envisaged for students of different ages and for the general public. The games will integrate conceptual knowledge of predicates' compatibility and of semantic relations in a way that encourages a research approach to solving the language tasks set.

The publication of the results in renowned refereed and indexed publications, the presentation of papers at scientific forums, the organisation of the Special Session on Wordnet and Ontologies, and the scientific seminars will contribute to the dissemination of the results, to the development of scientific cooperation with Bulgarian and international teams, and to the establishment, maintenance and development of international scientific networks (e.g. the European Network of Excellence META-NET, the European Federation of National Language Institutes, etc.), in accordance with the result indicators of this proposal.



All resources produced by the project are distributed in an appropriate format under the free Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA 4.0).


Task 2.1. A semantic resource: a collection of 5074 verb synsets, chosen according to a special methodology that combines quantitative and qualitative criteria to verify that they belong to the basic vocabulary.
To download (pdf)


Task 2.1. A semantic resource: a subset of 269 verbs chosen to test how well they are known by children in the initial stage of education.
To download (pdf)


Task 2.1. Language tasks used in language games to test how well the verbs are known by children in the initial stage of education.
To tasks (online access)


Task 2.2. A semantic resource: a system of conceptual frames describing the semantic structure of verbs from the basic vocabulary. Upon acceptance, the system will be open for review by users.

Link to the system (online access: bulframe-editor@dcl.bas.bg | admin)


Task 2.2. A semantic resource: semantic frames from FrameNet automatically assigned to verbs from the basic vocabulary.
To download (pdf)


Task 2.3. A semantic resource: an ontology of the semantic classes of nouns mapped to noun synsets in WordNet that correspond to the types in the semantic structure of the verbs from the basic vocabulary.
To download (pdf)



Task 3.1. Technical specification of a system for creation, editing and visualisation of conceptual frames.
To download (pdf)


Task 3.2. A software system for creation, editing and visualisation of conceptual frames.

Link to the system (online access)


Task 3.3. An online system for creation, editing and visualisation of conceptual frames.

Link to the system (online access)

Koeva, Sv. Towards a typological analysis of complements in Bulgarian. Proceedings of the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin” (Sofia, 2021). Svetla Koeva, Maksim Stamenov (eds.), vol. 2, Sofia: Prof. Marin Drinov Publishing House of the Bulgarian Academy of Sciences, 2021, ISSN 2683-118X (print); ISSN 2683-1198 (online), 13-27. In Bulgarian. (pdf)

Todorova, M., Tsv. Dimitrova, V. Stefanova. A study of the basic conceptual apparatus and verb vocabulary of pupils in the initial stage of education. Pedagogika, 94, 7, 2022, ISSN 1314–8540 (online); ISSN 0861–3982 (print), DOI:10.53656/ped2022-7.06, 896-913. In Bulgarian. (pdf)

Leseva, Sv., Stoyanova, Iv. Semantic description of verbs for change and hierarchical organization of conceptual frames. Proceedings of the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”. Svetla Koeva, Maksim Stamenov (eds.), Prof. Marin Drinov Publishing House of the Bulgarian Academy of Sciences, 2021, ISSN 2683-118X, DOI:10.7546/ConfIBL2021.II.31, 76-85. In Bulgarian. (pdf)

Koeva, Sv., Doychev, E. BulFrame: a system for creating and editing conceptual frames. Proceedings of the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin” (Sofia, 2022). Svetla Koeva, Maksim Stamenov (eds.), ISSN 2683-118X (print), ISSN 2683-1198 (online), 544-553. In Bulgarian. (pdf)

Dimitrova-Vulchanova, M., V. Vulchanov. Arguments for good and bad: factors influencing the process of active comprehension of free verb phrases and verb idioms with equal support. Balgarski ezik (Bulgarian Language), Supplement (Papers from the Eighth Forum “Bulgarian Grammar”), 69 (2022), 23-41. In Bulgarian. (pdf)

Leseva, Sv., Stoyanova, Iv. Linked Resources towards Enhancing the Conceptual Description of General Lexis Verbs Using Syntactic Information. Proceedings of the Fifth International Conference Computational Linguistics in Bulgaria (CLIB 2022), 2022, ISSN:2367-5675, 214-223. Indexed in Scopus. (pdf)

Koeva, S., E. Doychev. Ontology Supported Frame Classification. Proceedings of the Fifth International Conference Computational Linguistics in Bulgaria (CLIB 2022), 2022, ISSN:2367-5675, 214-223. Indexed in Scopus. (pdf)

Conference presentations

Svetla Koeva on: Towards a typological analysis of complements in Bulgarian, a talk at the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, 15.05.2021.

Svetlozara Leseva and Ivelina Stoyanova on: Semantic description of verbs for change and hierarchical organization of conceptual frames, a talk at the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, 15.05.2021.

Svetla Koeva and Emil Doychev on: BulFrame – a system for creating and editing conceptual frames, a talk at the International Annual Conference of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, 15.05.2022.

Mila Dimitrova-Vulchanova and Valentin Vulchanov on: Arguments for good and bad: factors influencing the process of active comprehension of free verb phrases and verb idioms with equal support, a talk at the Eighth Forum “Bulgarian Grammar”, organized by the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, October 21 and 22, 2021.

Svetlozara Leseva and Ivelina Stoyanova on: Linked Resources towards Enhancing the Conceptual Description of General Lexis Verbs Using Syntactic Information, a talk at the Fifth International Conference Computational Linguistics in Bulgaria (CLIB’2022), September 2022.

Svetla Koeva and Emil Doychev on: Ontology Supported Frame Classification, a talk at the Fifth International Conference Computational Linguistics in Bulgaria (CLIB’2022), September 2022.

Scientific seminars

Seminar on October 27, 2021, at the Institute for Bulgarian Language “Prof. Lyubomir Andreychin”

11 a.m. – 1 p.m.: Seminar on Work package 2. A formal description of the semantic structure of verb synsets belonging to the basic vocabulary. Participants: Prof. Svetla Koeva, Prof. Mila Dimitrova-Vulchanova (head of the work package), Prof. Valentin Vulchanov, Assist. Prof. Svetlozara Leseva, Assist. Prof. Maria Todorova, Assist. Prof. Valentina Stefanova, Assist. Prof. Ivelina Stoyanova, Assist. Prof. Tsvetana Dimitrova, Assist. Prof. Hristina Kukova. The following activities were carried out at the seminar: review and evaluation of the results of Task 2.1. Selection of verb synsets that form a part of the basic vocabulary (Month 1 to Month 6); discussion of the current work on Task 2.2. Definition of the conceptual frames sufficient for the description of the selected verb synsets (Month 7 to Month 18).

3 – 5 p.m.: Discussion of the experiments to determine the basic vocabulary of verbs in different age groups and their conceptual frames: thematic areas; criteria for selecting the elements in the experiments; degree of complexity; effect of order of language tasks.

Participants: Prof. Mila Dimitrova-Vulchanova, Prof. Valentin Vulchanov, Assist. Prof. Maria Todorova, Assist. Prof. Valentina Stefanova, Assist. Prof. Tsvetana Dimitrova, Assist. Prof. Hristina Kukova.

Seminar on October 28, 2021, Institute for Bulgarian Language “Prof. Lyubomir Andreychin”

10 – 11 a.m. Lecture by Prof. Mila Dimitrova-Vulchanova (Norwegian University of Science and Technology, Trondheim) on: Aspect in the mind of the speaker, at the Autumn Linguistic Seminar of the Institute for Bulgarian Language

2 – 4 p.m. Discussion of a joint publication reflecting the results so far. Participants: Prof. Svetla Koeva, Prof. Mila Dimitrova-Vulchanova, Prof. Valentin Vulchanov, Assist. Prof. Svetlozara Leseva, Assist. Prof. Maria Todorova, Assist. Prof. Ivelina Stoyanova

Seminar on October 29, 2021, Institute for Bulgarian Language “Prof. Lyubomir Andreychin”

10 a.m. – 12 noon: Discussion of the main goals and activities within Task 2.3. Definition of an ontology of the semantic classes of noun synsets in WordNet (Month 13 to Month 18). As a result of the analysis in Task 2.2, the semantic classes selected for the conceptual frames of the verb synsets will be determined. Participants: Prof. Svetla Koeva, Prof. Mila Dimitrova-Vulchanova, Prof. Valentin Vulchanov, Prof. Tinko Tinchev, Assist. Prof. Svetlozara Leseva, Assist. Prof. Maria Todorova, Assist. Prof. Valentina Stefanova, Assist. Prof. Ivelina Stoyanova, Assist. Prof. Tsvetana Dimitrova.

2 – 4 p.m.: Discussion between Prof. Mila Dimitrova-Vulchanova, Prof. Valentin Vulchanov and all participants: a summary of the results achieved so far, an overview of the tasks that remain to be completed by the end of the first stage, and identification of specific subtasks and the conditions for their implementation.

Copyright © 2015-2022 Department of computational linguistics. All rights reserved.