bgMWE – a tool for MWE recognition « Department of Computational Linguistics

bgMWE is a tool for corpus processing and for the recognition and tagging of multiword expressions (MWEs), created in 2012. It is developed in Java and is therefore platform-independent. bgMWE comprises a set of modules that can be applied to particular NLP tasks. It is largely language-independent and can run either in a resource-light mode or with additional lexical resources that boost its performance. The system includes the following modules:

Web crawler for Wikipedia;
Extraction of lexical data – lists of words and MWEs;
Converter between formats – vertical format, XML, etc.;
Pre-processing module – applying a chunker, a tagger, etc.;
Collection of frequency data;
MWE recognition and tagging.
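
To make the last two modules more concrete, the following is a minimal Java sketch of collecting frequency data and of lexicon-based MWE tagging. It is not bgMWE's actual code or API: the class name `MweSketch`, the methods `bigramCounts` and `tagMwes`, the underscore-joined output and the English sample data are all assumptions made for illustration. The tagging shown is a simple greedy longest-match lookup, which is one common way to use a lexical resource and not necessarily the method bgMWE implements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and behaviour are not taken from bgMWE.
class MweSketch {

    /** Counts how often each pair of adjacent tokens (bigram) occurs. */
    static Map<String, Integer> bigramCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Greedy longest-match tagging: at each position, tries the longest
     * token span (up to maxLen tokens) found in the lexicon and joins it
     * with '_'. Tokens that start no lexicon entry are copied unchanged.
     */
    static List<String> tagMwes(List<String> tokens, Set<String> lexicon, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1; // default: a single token, no MWE
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (lexicon.contains(candidate)) {
                    matched = len;
                    break;
                }
            }
            out.add(String.join("_", tokens.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("he", "kicked", "the", "bucket", "in", "new", "york");
        Set<String> lexicon = Set.of("kicked the bucket", "new york");
        // prints [he, kicked_the_bucket, in, new_york]
        System.out.println(tagMwes(tokens, lexicon, 3));
        System.out.println(bigramCounts(tokens));
    }
}
```

In a resource-light setting, where no lexicon is available, counts such as those returned by `bigramCounts` could instead feed a statistical association measure to propose MWE candidates.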

Further improvement of bgMWE is planned in the following directions:

improving efficiency;
implementing various methods for MWE recognition;
developing a visualisation module or integrating existing open source visualisation methods;
adding a module for extensive evaluation.
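
As an illustration of what an evaluation module might compute, the sketch below scores a set of recognised MWEs against a gold-standard set using precision, recall and F1. The class `MweEvalSketch`, the method `precisionRecallF1` and the sample data are hypothetical; the source does not say which measures the planned module will use.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of bgMWE.
class MweEvalSketch {

    /** Returns {precision, recall, F1} of the predicted MWEs against the gold set. */
    static double[] precisionRecallF1(Set<String> gold, Set<String> predicted) {
        // MWEs that were both predicted and present in the gold standard.
        Set<String> correct = new HashSet<>(predicted);
        correct.retainAll(gold);

        double precision = predicted.isEmpty() ? 0.0 : (double) correct.size() / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) correct.size() / gold.size();
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("kick the bucket", "new york", "red tape", "by and large");
        Set<String> predicted = Set.of("kick the bucket", "new york", "the bucket");
        double[] scores = precisionRecallF1(gold, predicted);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", scores[0], scores[1], scores[2]);
    }
}
```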