
PARSEME shared task: Phase 1

Phase 1: First pilot annotation with feedback on guidelines, adding language-specific features

Last edited: 14/03/2016

Currently working on: ➥ PHASE 2 Guidelines

Principles of classification and annotation

The classification we propose uses the initial categories (LVC, VPC, SENT, ID, OTH) and also encodes some idiomaticity features:

  1. The subclasses of ID (ID1, ID2, ID3 and ID4) reflect the degree of decomposability, represented by how fixed the vMWE components are:
  • ID1 includes frozen expressions (except for the verb forms)
  • ID2 comprises vMWEs which allow only limited forms of any of the components
  • ID3 allows the addition of modifiers to the components
  • ID4 is the category of internalised (but open) argument position
  2. The classification scheme covers the main categories of vMWEs in Bulgarian (for which there are examples in the Bulgarian Dictionary of MWEs), but can be extended to cover more categories
  3. Some conventions for Bulgarian differ from the proposed pilot scheme in PARSEME:
  • vMWEs with internal subject (the subject is part of the MWE) – we include these in the SENT category as they are not prototypical vMWEs (head is not a verb)

излиза  ми                име

appears for-me.DAT name  ‘a name sticks (for/to me)’

чашата     на   търпението ми         прелива  

glass.DEF of  patience      my.POSS  overflows  ‘my patience runs out’
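As an illustration only, the fixedness criteria behind ID1 to ID4 can be read as a small decision procedure. The sketch below is not part of the guidelines: the `Fixedness` feature names and the precedence order (open slot, then modifiers, then limited form variation) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixedness:
    """Hypothetical feature bundle for the non-verbal components of a vMWE."""
    limited_form_variation: bool = False  # e.g. definite vs. indefinite noun form
    allows_modifiers: bool = False        # e.g. an optional adjective or adverb
    open_argument_slot: bool = False      # e.g. a possessor filled in from context


def id_subclass(f: Fixedness) -> str:
    """Return the least fixed ID subclass licensed by the observed features.

    The precedence order is an assumption of this sketch.
    """
    if f.open_argument_slot:
        return "ID4"
    if f.allows_modifiers:
        return "ID3"
    if f.limited_form_variation:
        return "ID2"
    return "ID1"  # frozen apart from the verb forms


print(id_subclass(Fixedness()))                       # frozen expression
print(id_subclass(Fixedness(allows_modifiers=True)))  # modifier slot available
```

Under these assumptions a fully frozen expression falls into ID1, and each additional degree of freedom moves the expression to a higher-numbered subclass.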

 


 

Classification

1. Lexicalised combinations of verbs with reflexive and reciprocal particles and dative and accusative pronominal clitics

Within which category: OTH

  • OTH1 with ‘се’

смея/V   се

laugh   REFL.ACC

състезавам/V   се

compete  REFL.ACC

 

  • OTH2 with ‘си’

въобразявам/V си

imagine   REFL.DAT

 

  • OTH3 Accusativa tantum

мързи/V ме

is_lazy   me.ACC  ‘I am lazy’, ‘it feels lazy to me’

 

  • OTH4 Dativa tantum

нагарча/V  ми

is_bitterish me.DAT

 

  • OTH2/OTH4 and other combinations

гади/V ми се

it_feels_sick me.DAT  REFL.ACC    ‘I feel sick’

 

2. Lexicalised combinations of verb and preposition

Within which category: VPC

вярвам/V в/P

believe      in

разчитам/V на/P

rely      on

3. Light verb constructions

Within which category: LVC

  • LVC-NP

чета/V доклад/N

read     report      ‘to present/to make a presentation’

вземам/V решение/N

take          decision   ‘to make a decision’

вземам/V   важно/A решение/N

take          important   decision

 

  • LVC-PP

държа/V под/P контрол/N

keep under control

 

(LVCs usually allow modifications of components)

 

4. ID1: Non-decomposable vMWEs with frozen components (except the verb)

Within which category: ID

  • ID1-NP

ритам/V камбаната/N

kick        bell.DEF   ‘to kick the bucket’

давам/V зелена/A улица/N

give        green       street     ‘to give the green light’

 

  • ID1-PP

бия/V на/P очи/N

hit    on   eyes  ‘to catch the eye’

вземам на мушка

take      on  target   ‘to draw a bead on’

удрям в гръб

hit      in  back  ‘to stab (s.b.) in the back’

изтривам от лицето      на земята

wipe       from face.DET of earth.DET   ‘to wipe (s.b.) off the face of the earth’

 

  • ID1-NPPP

имам  жълто около  устата

have   yellow  around mouth.DET   ‘to be wet behind the ears’

 

  • ID1-PPPP

прочитам/V от/P корица/N до/P корица/N

read             from   cover      to     cover

 

  • ID1-AdvP

изваждам/V    наяве/ADV

expose            in_the_open   ‘to bring to light’

имам/V предвид/ADV

have     into_account   ‘to take into account’

 

  • ID1-AP

съм/V ясен/A като/P бял/A ден/N

be     clear     as         white  day    ‘to be as clear as day’

 

  • ID1-NPSC vMWEs with objectival small clause

държа очите      си                     отворени

keep  eyes.DET   my/your/.POSS  open    ‘to keep one’s eyes open’

 

5. ID2: Non-decomposable/semi-decomposable vMWEs with semi-fixed components (except the verb)

Within which category: ID

  • ID2-NP

броя/V звезди/N

count   stars      ‘to count the stars’

броя/V звездите/N

count    stars.DET

 

вдигам/V бяло/А знаме/N  

raise    white    flag      ‘to wave a white flag’

вдигам/V бялото/А знаме/N  

raise    white.DEF   flag      ‘to wave the white flag’

 

  • ID2-PP

хващам/V в/P капан/N

catch        in    trap    ‘catch in a trap’

хващам/V в/P  капана/N

catch      in       trap.DEF ‘catch in the trap’

 

6. ID3: Decomposable vMWEs which allow modifier slots (XP_YP labels the possible modifier YP of the component XP of the MWE)

Within which category: ID

  • ID3-NP_AdvP

виждам (много) зор

see         (much)   hardship    ‘to find (s.th.) very tough’

 

  • ID3-NP_AP

правя (първи/малки) стъпки

make   (first/small)      steps     ‘to make first steps’

давам (добър) урок

give    (good)   lesson   ‘to give s.b. a lesson’

забърквам (голяма) каша

mix-up (big) mess        ‘to make a (big) mess’

 

  • ID3-PP_AP

гълтам с (жадни) очи

swallow with  (thirsty)  eyes    ‘to take it all in with greedy eyes’

 

  • ID3-PP_AdvP

стигам до много сърца

reach    to  many   hearts ‘to reach many hearts’

 

  • ID3-NP_APPP

хвърлям/V пари/N на/P вятъра/N

throw       money  on  wind.DEF  ‘throw money around’

хвърлям (големи/всички) пари на вятъра

throw     (big/all)               money  on  wind.DEF

 

  • ID3-AdvP_AdvP  

гледам/V    отвисоко/ADV

look             from_above            ‘to look down (at s.o.)’

гледам/V много/ADV    отвисоко/ADV

look           much  from_above            ‘to look very much down (at s.o.)’

 

  • ID3-_AdvPNP (the AdvP modifies the verb, but is still internal, i.e. within the metaphor)

вдигам  летвата    високо

raise     stick.DET   high   ‘to raise the bar high’
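The label notation used in this classification (category, then the shape of the lexicalised components, then an optional modifier slot after the underscore) can be split mechanically. A minimal sketch, assuming labels follow the pattern CATEGORY-XP_YP; the `SubtypeLabel` type and the `parse_subtype` function are hypothetical names introduced here, not part of any PARSEME tool.

```python
from typing import NamedTuple, Optional


class SubtypeLabel(NamedTuple):
    category: str            # e.g. "ID3"
    component: str           # XP: shape of the lexicalised component(s); may be empty
    modifier: Optional[str]  # YP: everything after the underscore, None if absent


def parse_subtype(label: str) -> SubtypeLabel:
    """Split a label such as 'ID3-NP_AP' into category, component and modifier."""
    category, dash, shape = label.partition("-")
    if not dash:
        raise ValueError(f"no '-' separating category and shape in {label!r}")
    component, underscore, modifier = shape.partition("_")
    return SubtypeLabel(category, component, modifier if underscore else None)


print(parse_subtype("ID3-NP_AP"))    # component NP with an AP modifier slot
print(parse_subtype("ID3-_AdvPNP"))  # empty component: the modifier precedes the shape
print(parse_subtype("ID1-PP"))       # no modifier slot at all
```

The sketch keeps the part after the underscore as an unanalysed string, since labels such as ID3-NP_APPP combine a modifier with further component material.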

 

7. ID4: Decomposable  vMWEs with possessive slot within the NP/PP 

  • ID4-NP_наNP

копая/V гроба/N PPos{на/P някого/N} possessive prepositional expression

копая нечий гроб

dig  grave.DET of someone ‘dig one’s grave’

 

  • ID4-PP_наNP

ходя/V по/P нервите/N  PPos{на/P  някого/N}  possessive prepositional expression

ходя по нечии нерви

walk  on  nerves.DET  of somebody     ‘get on s.o.’s nerves’

 

8. ID5: vMWEs with an open slot NP or PP

  • ID5-_NSCs  vMWEs with small clause in agreement with the object slot 

изкарвам някого/го      чист

make        s.b./him.ACC  clean    ‘to make s.b. as innocent as a babe unborn’

одирам някого          жив

skin      s.b./him.ACC  alive   ‘to skin s.b. alive’

 

9. MWEs with sentence features

Within which category: SENT

  • SENT1: proverbs

който        не работи,  не трябва да яде

who.DET  not work      not should to  eat

‘He who does not work, neither shall he eat.’ (2 Thess. 3:10, NT; popular biblical proverb)

 

  • SENT2: frozen clausal expressions

както и да е

as it is ‘whatever’

каквото и да става

whatever to happen       ‘no matter what happens’

 

  • SENT3: MWEs with lexicalised subject

излиза     ми              име

appears  for-me.DAT  name ‘a name sticks (for/to me)’

чашата     на търпението ми          прелива  

glass.DEF of  patience      my.POSS  overflows  ‘my patience runs out’

 

  • SENT4: vMWEs with small clause in agreement with the subject slot

имам думата      последен

have  word.DET last        ‘to have the last word’

излизам/V сух/A от/P водата/N

come_out    dry    from  water.DEF  ‘come off clean’

 

10. Nominal, Adverbial or Participial MWEs derived from vMWEs

From Guidelines (among the types of vMWEs the following type is described): Syntactic nominal, participial and other variants of prototypical VMWEs maintaining their idiomatic reading, e.g. decisions which we made, decision making, heart-breaking.

ORIG_CAT/NEW_CAT where NEW_CAT can be one of: NOM (NP), PART (AP with participle), TRANS (AdvP with transgressive, corresponding to English present participle used to describe simultaneous actions)
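A derived label of this form can be checked mechanically. The following is a minimal sketch, assuming the label is a plain string with a single trailing `/NEW_CAT` part; the names `DERIVED_CATEGORIES` and `split_derived_label` are hypothetical and introduced only for illustration.

```python
from typing import Tuple

# NEW_CAT values and their phrase types, as listed in the text above.
DERIVED_CATEGORIES = {
    "NOM": "NP",
    "PART": "AP with participle",
    "TRANS": "AdvP with transgressive",
}


def split_derived_label(label: str) -> Tuple[str, str]:
    """Split 'ORIG_CAT/NEW_CAT' and check that NEW_CAT is a known derived category."""
    orig_cat, slash, new_cat = label.rpartition("/")
    if not slash or not orig_cat:
        raise ValueError(f"expected ORIG_CAT/NEW_CAT, got {label!r}")
    if new_cat not in DERIVED_CATEGORIES:
        raise ValueError(f"unknown derived category {new_cat!r} in {label!r}")
    return orig_cat, new_cat


orig_cat, new_cat = split_derived_label("ID2-NP/NOM")
print(orig_cat, new_cat, DERIVED_CATEGORIES[new_cat])
```

Splitting on the last slash keeps the original category intact even when it contains hyphens or underscores.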

Types of transformation:

  • NP from the vMWE 
  • Actor
  • Transgressive (adverbial participle, Bulgarian деепричастие)
  • NP turns into head

 

  • ID1-PP/NOM 

удрям в гръб (vMWE)

hit in back ‘to stab (s.b.) in the back’

удар в гърба

blow in back ‘blow in the back’

 

  • ID2-NP/NOM (NP turns into PP_на)

броя звезди (vMWE)

count stars ‘to count stars’

броене на звезди

counting of stars ‘(the act of) counting stars’

 

играя карти (vMWE)

play cards

игра на карти

game of cards ‘card game’

играч на карти

player of cards  ‘card player’

 

  • ID2-NP/TRANS

броя звездите (vMWE)

count stars ‘to count stars’

броейки звездите

counting stars ‘(while) counting stars’

 

  • ID3-_AdvPNP/NOM

вдигам  летвата    високо (vMWE)

raise     stick.DEF   high ‘to raise the bar’

високо вдигната летва

highly   raised     stick ‘high bar’

 

  • ID2-NP/PART

броя звезди (vMWE)

count stars ‘to count stars’

броящ звезди (allows forms of the participle – gender, number, definiteness)

counting stars ‘who counts stars’

 

броя минутите (vMWE)

count minutes

броени минути

counted minutes ‘counted minutes’

 

  • ID5-_NSCs/PART

одирам някого жив

skin s.b./him.ACC alive ‘to skin s.b. alive’

одран жив

skinned alive ‘skinned alive’

 


 

Sample annotation

We performed automatic annotation (using a dictionary of vMWEs) on a selection of 200 sentences, comprising a total of 7,234 tokens. The texts were annotated with POS, grammatical features and lemma. There are 318 annotated verbal and sentential MWEs.

Note: Manual verification of annotation, as well as dictionary expansion and description, are still ongoing.
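The dictionary-based annotation step can be pictured roughly as follows. This is a sketch under stated assumptions, not the tool actually used: it assumes each sentence is available as a list of lemmas, that dictionary entries are lemma sequences, and that components occur in dictionary order with at most `max_gap` intervening tokens (to leave room for modifiers). The names `VMWE_DICT`, `match_entry` and `annotate` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical mini-dictionary: lemma sequence -> category label.
# Entries are taken from the examples in the classification above.
VMWE_DICT: Dict[Tuple[str, ...], str] = {
    ("вземам", "решение"): "LVC-NP",
    ("държа", "под", "контрол"): "LVC-PP",
    ("хвърлям", "пари", "на", "вятър"): "ID3-NP_APPP",
}


def match_entry(lemmas: Sequence[str], entry: Sequence[str],
                max_gap: int = 2) -> Optional[List[int]]:
    """Return the token positions of the first in-order match, or None."""
    for start, lemma in enumerate(lemmas):
        if lemma != entry[0]:
            continue
        positions = [start]
        for target in entry[1:]:
            prev = positions[-1]
            # Look at most max_gap tokens past the immediately following token.
            window = range(prev + 1, min(len(lemmas), prev + 2 + max_gap))
            nxt = next((i for i in window if lemmas[i] == target), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            return positions
    return None


def annotate(lemmas: Sequence[str]) -> List[Tuple[str, List[int]]]:
    """Label every dictionary entry found in the lemmatised sentence."""
    hits = []
    for entry, label in VMWE_DICT.items():
        positions = match_entry(lemmas, entry)
        if positions is not None:
            hits.append((label, positions))
    return hits


# "Той взема важно решение" ('He makes an important decision'), lemmatised.
print(annotate(["той", "вземам", "важен", "решение"]))  # [('LVC-NP', [1, 3])]
```

Matching on lemmas rather than word forms is what lets a single entry cover inflected verb forms. A greedy matcher like this one can over-generate and miss reordered components, which is why manual verification of the output remains necessary.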

➥ Link to vMWE annotation (Google spreadsheet, read only)

➥ Link to vMWE dictionary (4,268 vMWEs and sentential MWEs)

Copyright © 2015-2022 Department of computational linguistics. All rights reserved.