Web applications for editing Bulgarian texts

 

The goal of the project is to develop various Web services allowing users to edit and correct the spelling and grammar of Bulgarian texts, to look up words in various dictionaries (Thesaurus, Bulgarian-English, etc.), as well as to use several facilities for editing Bulgarian texts quickly and easily.
The specific objectives of the project can be defined as follows:

  • Creating Web applications (Web services, Web components and Web applications) to correct Bulgarian texts (spelling and grammar), to detect errors and to generate the most appropriate replacement suggestions.
  • Creating Web applications (Web services, Web components and Web applications) to speed up working with Bulgarian texts (correct hyphenation, spelling autocorrect, automatic insertion or replacement of user-defined symbols and text).
  • Creating Web applications (Web services, Web components and Web applications) to improve the quality of working with Bulgarian texts (looking up words in a thesaurus, looking up words in a bilingual (Bulgarian-English) dictionary, etc.).
    Description

     

    The development of Web based applications that assist work with Bulgarian texts is motivated, on the one hand, by the wider use of the Internet in everyday communication of various types (work, education, administration, media), and on the other hand, by the lack of modern Web based linguistic applications for Bulgarian.
    The advantages of Web based linguistic applications can be summarized as follows: they are more accessible because they are not tied to any particular operating system or Web browser. The wider use of the Internet not only as an environment for communication but also as a working environment, which includes text creation and editing, increases the importance of the project outcomes.
    During the first stage of the project the work was concentrated on the creation of Web based services (Web services, Web components and Web applications) to correct Bulgarian texts (spelling and grammar), to detect errors and to generate the most appropriate replacement suggestions. Various tasks were performed to achieve these results. All the tasks may be grouped as follows:

  • Providing large and consistent electronic linguistic resources.
    
The tasks performed include enlarging (adding, verifying and editing entries in) the various types of inflectional dictionaries: grammatical, proper names, thematic, abbreviations and multiword expressions.
  • Representation of language grammar rules at various language levels by means of formal grammars.
    
Low-level linguistic rules, such as the regular expressions used for detecting text units and determining their type, were improved through analysis, verification and editing. Context rules for grammar checking were formulated. Support systems were used for the automated extraction of linguistic rules of different ranks and for operating with these rules.
  • Providing programs for processing Bulgarian which are effective with respect to speed, coverage, and precision.
    The programs for detecting text units (tokenizer) and for assigning the correct grammatical characteristics in a given context (tagger and lemmatizer) were improved through analysis, testing and tuning of their coverage and precision. A program was created (conceptual modelling, programming and testing) for formulating linguistic rules; it compiles the rules into a finite state transducer and applies the transducer to the text.
  • Creation of advanced linguistic applications for assisting the work with texts, finding the highest number of requested categories (spelling and grammar errors) and offering the most appropriate replacement suggestions.
    The first stage of the project resulted in two programs for Bulgarian texts: Est, which detects spelling errors and generates replacement suggestions, and Est+, which detects grammatical errors and generates replacement suggestions. The Spelling Checker stores its dictionary as a minimal acyclic deterministic automaton and generates replacement suggestions by means of Levenshtein automata. The Grammar Checker compiles a previously formulated grammar of grammatical errors into a finite state transducer. Both programs are also available as Windows applications (WinEst and WinEst+), accessible from the project webpage, and the Spelling Checker is additionally offered as an application for Mac OS.
  • The creation of modern Web based linguistic applications (Web services, Web components and Web applications) which allow effective work regardless of the operating system, text processing application or browser.
    The Spelling Checker is available as a Web service, which supports both online spelling checking and integration into other applications. WebEst allows users to check and correct Bulgarian texts on the Internet. The Spelling Checker Web service can be used in blogs, chat forums, online shops, media, and anywhere else Internet content is created, to support the correct writing of Bulgarian texts.
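As an illustration of the low-level rules mentioned in the task list above (regular expressions that detect text units and determine their type), here is a minimal sketch of a regex-driven tokenizer. The token classes and patterns are hypothetical and chosen only for this example; the project's actual tokenizer rules are not shown in this document.

```javascript
// Hypothetical token classes for illustration only; not the project's real rules.
const TOKEN_RULES = [
  // A word: letters of any script, optionally joined by hyphens.
  { type: "word", pattern: /^\p{L}+(?:-\p{L}+)*/u },
  // A number: ASCII digits with an optional decimal part.
  { type: "number", pattern: /^\d+(?:[.,]\d+)?/ },
  // Punctuation: any single character that is not a letter, digit or space.
  { type: "punctuation", pattern: /^[^\p{L}\p{N}\s]/u },
];

// Split text into typed tokens by trying each rule at the current position.
function tokenize(text) {
  const tokens = [];
  let rest = text;
  while (rest.length > 0) {
    const space = rest.match(/^\s+/);
    if (space) {
      rest = rest.slice(space[0].length); // skip whitespace
      continue;
    }
    const rule = TOKEN_RULES.find((r) => r.pattern.test(rest));
    if (rule) {
      const value = rest.match(rule.pattern)[0];
      tokens.push({ type: rule.type, value });
      rest = rest.slice(value.length);
    } else {
      // Fallback: consume one code point so the loop always advances.
      const value = Array.from(rest)[0];
      tokens.push({ type: "other", value });
      rest = rest.slice(value.length);
    }
  }
  return tokens;
}
```

For example, `tokenize("Ябълка, 3.5 кг.")` yields a word, a punctuation mark, a number, a word and a final punctuation mark. Rules are tried in order, so more specific classes should be listed first.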
    Check

     

    Web application of the Bulgarian Spelling Checker service.

    Bulgarian Spelling Checker online

    Web application of the Bulgarian grammatical dictionary.

    Bulgarian Dictionary online

    Publications

     

    Oliva K. i Sv. Koeva – Sintaksis na nevazmozhnoto, Balgarski ezik, 3, 7-17, 2009. ISSN 0005-4283.

    Koeva Sv. Syntactic Annotation in Bulgarian National Corpus. In: Proceedings from the seventh international conference Formal Approaches to South Slavic and Balkan Languages, Dubrovnik, 35-41, 2010. ISBN 978-953-55375-2-6.

    Integration

     

    Spelling Checker Web Service - description

    The Web service WebEst allows the Bulgarian Spelling Checker to be integrated into various types of Web applications.
    The Web service can be integrated into a Web application through a JavaScript component, which sends requests to the Web service and handles communication with the user interface. The integrated application sends direct queries to the Web service to check the spelling of words and to generate suggestions for correcting misspelled words.
    The Web service is compatible with the popular jQuery component jquery-spellchecker, which allows direct integration with a number of Web components for text editing, such as MarkItUp, jHtmlArea, WYMeditor, YUI, Dojo, NicEdit, etc.

    The Web service offers: checking for misspelled words and suggestions for a replacement.

    Checking for misspelled words

    Request:
        http://dcl.bas.bg/est/checkspelling.php?engine=dcl
        Content type: application/x-www-form-urlencoded;charset=UTF-8
        Data: text="a chunk of text without any punctuation"

    Response:
        Content type: application/json
        Example response: ["ябалка","кориер","ношница"]

    Suggestions for word correction

    Request:
        http://dcl.bas.bg/est/checkspelling.php?engine=dcl
        Content type: application/x-www-form-urlencoded;charset=UTF-8
        Data: suggest="ношница"

    Response:
        Content type: application/json
        Example response: ["нощница","ножница","кошница"]
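The two requests described above can be sketched without jQuery as follows. The endpoint, the `engine` parameter and the `text` and `suggest` fields come from the description above; the use of the POST method is an assumption (the description gives only the content type and the data), and the helper names are hypothetical. Because the service expects text without punctuation, the sketch strips punctuation first.

```javascript
// Endpoint and field names are taken from the request description above.
const SERVICE_URL = "http://dcl.bas.bg/est/checkspelling.php?engine=dcl";

// Replace punctuation with spaces, since the service expects plain words.
function stripPunctuation(text) {
  return text.replace(/[^\p{L}\p{N}\s]+/gu, " ").replace(/\s+/g, " ").trim();
}

// Build a form-encoded request. ASSUMPTION: the service accepts POST.
function buildRequest(field, value) {
  const body = new URLSearchParams();
  body.set(field, value);
  return {
    url: SERVICE_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
      },
      body: body.toString(),
    },
  };
}

// Request the list of misspelled words in a chunk of text.
function checkRequest(text) {
  return buildRequest("text", stripPunctuation(text));
}

// Request replacement suggestions for a single misspelled word.
function suggestRequest(word) {
  return buildRequest("suggest", word);
}

// Send a prepared request and parse the JSON array in the response.
async function callService(request) {
  const response = await fetch(request.url, request.options);
  if (!response.ok) {
    throw new Error(`Spelling service returned HTTP ${response.status}`);
  }
  return response.json();
}
```

With these helpers, `callService(suggestRequest("ношница"))` would resolve to an array of suggestions in the format of the example response above. A page served from another origin would also need the service to permit cross-origin requests, which this document does not state.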

    Application with the jquery-spellchecker

    To integrate the Spelling Checker service WebEst, download the jQuery JavaScript Spelling Checker and include the following in the head section of your HTML document:

    <link rel="stylesheet" type="text/css" media="screen" href="css/spellchecker.css" />
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js"></script>
    <script type="text/javascript" src="js/jquery.spellchecker.js"></script>
      		

    Default usage:

    		
    $("textarea#text-content")
    .spellchecker({
            url: 'http://dcl.bas.bg/est/checkspelling.php',
            lang: "bg",
            engine: "dcl"
    })
    .spellchecker("check");
    		

    All options:

    		
    $("textarea#text-content")
    .spellchecker({
            url: 'http://dcl.bas.bg/est/checkspelling.php',       // url of the Spelling Checker Web service
            lang: "bg",       // Bulgarian language 
            engine: "dcl",       // dcl spell engine
            addToDictionary: false,       // currently not supported
            wordlist: {
                    action: "after",       // jquery dom insert action
                    element: $("#text-content")       // element the insert action is applied to
            },      
            suggestBoxPosition: "below",       // position of the box with suggestions; above or below the highlighted word
            innerDocument: false       //"true" will highlight the misspelled words
    });
    

    Downloads

     

    General description: WinEst, the spelling checker for Microsoft Office, detects and marks incorrectly written words in a text and suggests the most probable candidates for correcting them. WinEst offers the full potential of contemporary spelling correction: an expertly compiled dictionary containing over a million and a half words, and replacement suggestions ordered by probability. WinEst is based on the Electronic Grammar Dictionary of Bulgarian, developed at the Department of Computational Linguistics, which contains over 85 000 words. It includes logic for detecting careless mistakes (a wrong key pressed, swapped letters, skipped letters or extra letters), identifies errors of ignorance, and integrates with the dictionaries used in Microsoft Office. WinEst uses an extremely fast and effective method for searching and detecting the correct words regardless of the text size. The functionality is implemented with minimal acyclic deterministic automata and Levenshtein automata, which allow maximum speed, precision and coverage. A distinctive feature of WinEst is that it is easy to install and uninstall, and no system restart is required.
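The four kinds of careless mistake listed above (a wrong key, swapped letters, a skipped letter, an extra letter) correspond to the substitution, transposition, deletion and insertion operations of an edit distance. The sketch below is not the WinEst implementation: it swaps the automata for a plain dynamic-programming distance (the optimal string alignment variant) over a small hypothetical word list, and it ranks candidates by distance only, whereas the product ranks them by probability.

```javascript
// Edit distance counting substitutions, insertions, deletions and
// transpositions of adjacent letters (optimal string alignment variant).
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  const d = [];
  for (let i = 0; i <= s.length; i++) {
    d.push(new Array(t.length + 1).fill(0));
    d[i][0] = i; // delete all i letters of s
  }
  for (let j = 0; j <= t.length; j++) d[0][j] = j; // insert all j letters of t
  for (let i = 1; i <= s.length; i++) {
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // skipped letter
        d[i][j - 1] + 1, // extra letter
        d[i - 1][j - 1] + cost // wrong key (or a match)
      );
      // Swapped adjacent letters count as a single edit.
      if (i > 1 && j > 1 && s[i - 1] === t[j - 2] && s[i - 2] === t[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[s.length][t.length];
}

// Return dictionary words within maxDistance edits, closest first.
// The sort is stable, so equally distant words keep dictionary order.
function suggest(word, dictionary, maxDistance = 1) {
  return dictionary
    .map((candidate) => ({ candidate, distance: editDistance(word, candidate) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map((entry) => entry.candidate);
}
```

Scanning every dictionary word this way is linear in the dictionary size, which is why a dictionary of this scale calls for automata, as the description above notes.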

    Advantages: WinEst offers the full potential of contemporary spelling checking and correction. Using its expertly compiled dictionary, the product finds replacement suggestions and ranks them by probability.

    Representativeness: covers the basic wordstock of Bulgarian.

    Precision: all words are checked by experts.

    Convenience: the replacement candidates are ranked by probability.

    A module for Cyrillic layout: WinEst works with both the standard BDS layout and the various phonetic layouts.

    Requirements:

    • Architecture: x86
    • Operating system: Microsoft Windows XP or Microsoft Windows 7
    • Office: Microsoft Office 2007 / 2010 - 32-bit version
    • Cyrillic layout on the keyboard. WinEst works with the standard BDS as well as the phonetic layouts.

    WinEst is a 32-bit module and thus requires a 32-bit Microsoft Office. The table below shows the operational compatibility of WinEst with the various versions of Microsoft Windows and Microsoft Office.

    A table of compatibility of WinEst with versions of Microsoft Windows and the different packages of Microsoft Office

    Operating System / Office   Windows XP (32 bit)   Windows XP (64 bit)   Windows 7 (32 bit)   Windows 7 (64 bit)
    Office 2007 (32 bit)        Works                 Works                 Works                Works
    Office 2007 (64 bit)        Incompatible          Does not work         Incompatible         Does not work
    Office 2010 (32 bit)        Works                 Works                 Works                Works
    Office 2010 (64 bit)        Incompatible          Does not work         Incompatible         Does not work
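The compatibility table above reduces to one rule: WinEst works whenever Microsoft Office is the 32-bit version, on either listed version of Windows. A minimal sketch of that rule follows; the function name and the "Not listed" result are hypothetical additions for combinations the table does not cover.

```javascript
// Windows and Office versions that appear in the compatibility table.
const SUPPORTED_WINDOWS = ["XP", "7"];
const SUPPORTED_OFFICE = ["2007", "2010"];

// Return the table entry for a given Windows/Office combination.
function winEstCompatibility(windows, windowsBits, office, officeBits) {
  if (!SUPPORTED_WINDOWS.includes(windows) || !SUPPORTED_OFFICE.includes(office)) {
    return "Not listed"; // hypothetical label: combination absent from the table
  }
  if (officeBits === 32) {
    return "Works"; // WinEst is a 32-bit module and needs 32-bit Office
  }
  // 64-bit Office: the table distinguishes 32-bit and 64-bit Windows.
  return windowsBits === 32 ? "Incompatible" : "Does not work";
}
```

For instance, `winEstCompatibility("7", 64, "2010", 32)` returns "Works", while the same system with 64-bit Office returns "Does not work".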

    Notes:

    • Once installed, WinEst overrides any other spelling checkers for Bulgarian available for Microsoft Office; they remain inactive until WinEst is uninstalled.
    • Once installed, WinEst does not need to be reinstalled after an upgrade or downgrade of Microsoft Office (as long as the new version is listed in the table above).

    Download:

    News

     

  • The Bulgarian Spelling and Grammar Checkers are available as Windows applications (WinEst and WinEst+); the Bulgarian Spelling Checker is also available as a Mac OS application (MacEst).
  • The Spelling Checker Web service WebEst can be used in webmail programs, blogs, chat rooms, forums, the editorial pages of online magazines, and anywhere else online content is created, to support correct Bulgarian spelling.
    Contacts

     

    The Department of Computational Linguistics carries out theoretical and applied research in the field of natural language processing (NLP). A main goal of the Department is the development of efficient theoretical models and language technologies implemented within state-of-the-art computer applications and systems. The Department of Computational Linguistics is part of the Institute of Bulgarian Language "Professor Lyubomir Andreychin" at the Bulgarian Academy of Sciences. The research team of the Department was granted the First Place Scholarly Achievements Award and a diploma by the National Science Fund for attaining significant results in the development of the research project Verb Semantics - problems of the Interface (28 December 2005).
    1113 Sofia, 52, Shipchenski prohod blvd., bl. 17
    tel.: ++ 359 2 979 2969, ++ 359 2 979 2971; fax.: ++ 359 2 872 23 02, e-mail:  dcl@dcl.bas.bg

    Funding

     

    The project Web applications for editing Bulgarian texts, developed by the Department of Computational Linguistics, Institute for Bulgarian Language, BAS, is funded by: