The Corpus Collocations Service is a web service for collocation search and for extracting various statistics from the Bulgarian National Corpus, including its parallel component, the Bulgarian-X Language Parallel Corpus. It is built on NoSketchEngine, a corpus processing system that combines Manatee and Bonito.
The Collocation Service is a RESTful web service that supports complex queries over HTTP. A query returns the collocations of a given word in the NoSketchEngine format. The service also accepts all arguments supported by NoSketchEngine, each with a default value, plus an optional language identifier.
The following types of queries are allowed, where XXXX denotes the query word:
(1) General query with no language specified (default language is Bulgarian)
(2) The following example restricts the statistics to English:
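As a rough client-side sketch, the two query types might be assembled as shown here. The endpoint `https://example.org/collocations` and the parameter names `word` and `lang` are hypothetical placeholders, not the service's documented interface. The sketch only illustrates the behavior described above: the language identifier is optional, and additional NoSketchEngine arguments can be passed through.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the real service address and parameter names may differ.
BASE_URL = "https://example.org/collocations"


def build_query(word: str, lang: Optional[str] = None, **extra: str) -> str:
    """Build a collocation query URL for `word`.

    `lang` is left out when None, so the service would fall back to its
    default language (Bulgarian, according to the description above).
    `extra` stands in for any pass-through NoSketchEngine arguments.
    """
    params = {"word": word}  # "word" is an assumed parameter name
    if lang is not None:
        params["lang"] = lang  # "lang" is an assumed parameter name
    params.update(extra)
    # urlencode percent-encodes non-ASCII input such as Cyrillic query words.
    return f"{BASE_URL}?{urlencode(params)}"


# (1) General query, no language specified.
general = build_query("XXXX")

# (2) Query restricted to English.
english = build_query("XXXX", lang="en")
```

Keeping the language identifier optional in the helper mirrors the service's default behavior, so a caller only names a language when querying one of the parallel corpora.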
Collocations have numerous applications in corpus linguistics and computational linguistics, notably in machine translation, text generation and summarization, among others. The Collocation Service lets users observe the frequency of words and language constructions, and generate frequency lists and language models.