Corpus collocation service

General description

The Corpus Collocations Service is a web service for searching collocations and extracting various statistics from the Bulgarian National Corpus, including its parallel component, the Bulgarian-X Language Parallel Corpus. It is built on NoSketchEngine, a corpus processing system that combines Manatee and Bonito.


The Collocation Service is a RESTful web service that supports complex queries over HTTP. A query returns the collocations of a given word in the NoSketchEngine format. The service also accepts all arguments supported by NoSketchEngine, each with a default value, as well as an optional language identifier.
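As a rough illustration of how a query with a word and an optional language identifier might be assembled on the client side, consider the Python sketch below. The endpoint address and the parameter names "word" and "lang" are assumptions made for this example only; they are not taken from the service and should be replaced with the actual values.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real service address is not given in this description.
BASE_URL = "http://example.org/collocations"


def build_query_url(word, lang=None, base_url=BASE_URL, **extra_args):
    """Build a GET URL for a collocation query.

    The parameter names "word" and "lang" are assumptions made for this sketch.
    Any additional keyword arguments are passed through unchanged, standing in
    for further NoSketchEngine arguments.
    """
    params = {"word": word}
    if lang is not None:
        # Omitting the language leaves the choice to the service default.
        params["lang"] = lang
    params.update(extra_args)
    # urlencode percent-encodes non-ASCII text (e.g. Cyrillic) as UTF-8.
    return base_url + "?" + urlencode(params)


general_url = build_query_url("XXXX")
english_url = build_query_url("XXXX", lang="en")
```

Leaving out the language argument corresponds to a general query, while passing a language identifier restricts the statistics to that language.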

The following types of queries are allowed, where XXXX denotes the query word:
(1) General query with no language specified (default language is Bulgarian)
(2) Query restricted to a specific language (for example, English)

The output is a JSON (JavaScript Object Notation) array containing all collocations of the given word. JSON can be viewed in the browser with extensions such as JSONView (Firefox, Chrome) or JsonViewer (Opera). It is easy to process, and libraries are available for C, C++, C#, Java, Python, and many other languages (https://www.json.org/).
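A minimal Python sketch of reading such a response with the standard json module follows. The response text here is a placeholder rather than real service output, and the array items are treated as opaque values because their layout is determined by NoSketchEngine and is not described here.

```python
import json

# Placeholder response body: a JSON array with three entries.
# Real items follow the NoSketchEngine format, which is not assumed here.
response_text = '["item 1", "item 2", "item 3"]'


def parse_collocations(text):
    """Decode a response body and check that it is a JSON array."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of collocations")
    return data


collocations = parse_collocations(response_text)
print(len(collocations))  # number of entries in the array
```

Checking that the decoded value is a list makes an unexpected response (for example, an error object) fail early instead of being processed as if it were collocation data.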



Access credentials
Username: bulnc
Password: bulnc




Collocations have numerous applications in corpus linguistics and computational linguistics, particularly in machine translation, text generation, and summary generation, among other tasks. The Collocation Service lets users observe the frequency of words and language constructions and generate frequency lists and language models.


Copyright © 2015 Department of Computational Linguistics. All rights reserved.