bgMWE – a tool for MWE recognition

bgMWE is a tool for corpus processing and for recognising and tagging multiword expressions (MWEs), created in 2012. It is written in Java and is therefore platform independent. bgMWE comprises a set of modules that can be applied to particular NLP tasks. It is largely language independent and can run in a resource-light mode, or its performance can be boosted by lexical resources. The system includes the following modules:

  • Web crawler for Wikipedia;
  • Extraction of lexical data – lists of words and MWEs;
  • Converter between formats – vertical format, XML, etc.;
  • Pre-processing module – applying a chunker, a tagger, etc.;
  • Collection of frequency data;
  • MWE recognition and tagging.
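
To make the last two modules more concrete, the following is a minimal Java sketch of how frequency collection and lexicon-based MWE tagging could look. It is not bgMWE's actual code or API: the class MweSketch, the methods bigramCounts and tagMwes, the B-MWE / I-MWE / O tag set, the greedy longest-match strategy and the English sample data are all assumptions made for illustration. The output imitates a vertical format with one token and its tag per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from bgMWE.
public class MweSketch {

    /** Counts adjacent token pairs (bigrams), a simple form of frequency data. */
    public static Map<String, Integer> bigramCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Tags tokens with B-MWE / I-MWE / O by greedy longest match against a
     * lexicon of space-separated MWEs (assumed tag set and strategy).
     * Returns one "token<TAB>tag" line per token.
     */
    public static List<String> tagMwes(List<String> tokens, Set<String> lexicon, int maxLen) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate first, down to two tokens.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (lexicon.contains(candidate)) {
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                // No lexicon entry starts here: the token is outside any MWE.
                lines.add(tokens.get(i) + "\tO");
                i++;
            } else {
                // Mark the first token as the beginning, the rest as inside.
                for (int j = 0; j < matched; j++) {
                    lines.add(tokens.get(i + j) + "\t" + (j == 0 ? "B-MWE" : "I-MWE"));
                }
                i += matched;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("he", "kicked", "the", "bucket", "in", "new", "york");
        Set<String> lexicon = new HashSet<>(Arrays.asList("kicked the bucket", "new york"));
        System.out.println(bigramCounts(tokens));
        tagMwes(tokens, lexicon, 3).forEach(System.out::println);
    }
}
```

In this sketch the lexicon stands in for the lexical resources mentioned above; a resource-light variant would have to derive candidates from the frequency data alone, for example by thresholding an association score over the bigram counts.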

Further improvement of bgMWE is planned in the following directions:

  • improving efficiency;
  • implementing various methods for MWE recognition;
  • developing a visualisation module or integrating existing open source visualisation methods;
  • adding a module for extensive evaluation.


The tool is distributed as open-source software under the Creative Commons Attribution-NonCommercial 3.0 Unported License.


bgMWE is available for download from here.

Contact person: Ivelina Stoyanova

Copyright © 2015-2022 Department of computational linguistics. All rights reserved.