bgMWE is a tool for corpus processing and for the recognition and tagging of multiword expressions (MWEs). It is written in Java and is therefore platform-independent. bgMWE comprises a set of modules that can be applied to specific NLP tasks. It is largely language-independent and can run either in a resource-light mode or with lexical resources that boost its performance. The system includes the following modules:
- Web crawler for Wikipedia;
- Extraction of lexical data – lists of words and MWEs;
- Converter between formats – vertical format, XML, etc.;
- Pre-processing module – applying a chunker, a tagger, etc.;
- Collection of frequency data;
- MWE recognition and tagging.
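As an illustration of what frequency collection over vertical-format data can involve, the following is a minimal sketch. It is not bgMWE's actual code or API. It assumes a common vertical layout: one token per line, tab-separated annotation columns with the word form first, and a blank line between sentences. The class `BigramCounter` and method `countBigrams` are hypothetical names. The sketch counts adjacent word pairs as simple MWE candidates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, not part of bgMWE: counts adjacent word pairs
 * (bigram MWE candidates) in vertical-format input. Assumes one token per
 * line, tab-separated columns with the word form first, and blank lines
 * as sentence boundaries.
 */
public class BigramCounter {

    /** Counts "w1 w2" bigrams over the lowercased word-form column. */
    public static Map<String, Integer> countBigrams(List<String> verticalLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String previous = null; // word form of the preceding token in the current sentence
        for (String line : verticalLines) {
            if (line.isBlank()) {
                // Sentence boundary: never pair tokens across sentences.
                previous = null;
                continue;
            }
            String word = line.split("\t")[0].toLowerCase(Locale.ROOT);
            if (previous != null) {
                counts.merge(previous + " " + word, 1, Integer::sum);
            }
            previous = word;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two short sentences in the assumed vertical format (word<TAB>tag).
        List<String> input = List.of(
                "New\tA", "York\tN", "",
                "New\tA", "York\tN", "is\tV");
        System.out.println(countBigrams(input)); // prints {new york=2, york is=1}
    }
}
```

Raw bigram counts like these are typically only a starting point; recognition methods usually combine them with association measures or lexical resources.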
Further improvement of bgMWE is planned in the following directions:
- improving efficiency;
- implementing additional methods for MWE recognition;
- developing a visualisation module or integrating existing open-source visualisation tools;
- adding a module for extensive evaluation.
The tool is distributed as open-source software under the Creative Commons Attribution-NonCommercial 3.0 Unported License.
bgMWE is available for download.