Phase 1: First pilot annotation with feedback on guidelines, adding language-specific features
Last edited: 14/03/2016
Currently working on: ➥ PHASE 2 Guidelines
Principles of classification and annotation
The classification we propose uses the initial categories (LVC, VPC, SENT, ID, OTH), which also reflect some idiomaticity features:
- The subclasses of ID (ID1, ID2, ID3 and ID4) reflect the degree of decomposability represented by the degree of vMWE components’ fixedness:
- ID1 includes frozen expressions (except for the verb forms)
- ID2 comprises vMWEs which allow only limited forms of any of the components
- ID3 allows the addition of modifiers to the components
- ID4 is the category of internalised (but open) argument position
- The classification scheme covers the main categories of vMWEs in Bulgarian (for which there are examples in the Bulgarian Dictionary of MWEs), but is extendable to cover more categories
- Some conventions for Bulgarian differ from the pilot scheme proposed in PARSEME:
- vMWEs with internal subject (the subject is part of the MWE) – we include these in the SENT category as they are not prototypical vMWEs (head is not a verb)
излиза ми име
appears for-me.DAT name ‘a name sticks (for/to me)’
чашата на търпението ми прелива
glass.DEF of patience my.POS overflows ‘my patience runs out’
Classification
1. Lexicalised combinations of verbs with reflexive and reciprocal particles and dative and accusative pronominal clitics
Within which category: OTH
- OTH1 with ‘се’
смея/V се
laugh REFL.ACC
състезавам/V се
compete REFL.ACC
- OTH2 with ‘си’
въобразявам/V си
imagine REFL.DAT
- OTH3 Accusativa tantum
мързи/V ме
is_lazy me.ACC ‘I am lazy’, ‘it feels lazy to me’
- OTH4 Dativa tantum
нагарча/V ми
is_bitterish me.DAT
- OTH2/OTH4 and other combinations
гади/V ми се
it_feels_sick me.DAT REFL.ACC ‘I feel sick’
2. Lexicalised combinations of verb and preposition
Within which category: VPC
вярвам/V в/P
believe in
разчитам/V на/P
rely on
3. Light verb constructions
Within which category: LVC
- LVC-NP
чета/V доклад/N
read report ‘to present/to make a presentation’
вземам/V решение/N
take decision ‘to make a decision’
вземам/V важно/A решение/N
take important decision
- LVC-PP
държа/V под/P контрол/N
keep under control
(LVCs usually allow modifications of components)
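The word/TAG notation used in the examples above, and the remark that LVCs allow modification, can be illustrated with a minimal sketch. It assumes whitespace-separated tokens with one tag after the last slash, and it treats only A-tagged tokens as permitted interveners, which is a simplification. The names `parse_tagged` and `matches_with_modifiers` are hypothetical and not part of any existing annotation tool.

```python
from typing import FrozenSet, List, Tuple

Token = Tuple[str, str]  # (word form or lemma, POS tag)


def parse_tagged(example: str) -> List[Token]:
    """Split a string such as 'вземам/V решение/N' into (form, tag) pairs."""
    tokens = []
    for item in example.split():
        form, _, tag = item.rpartition("/")
        if not form:
            raise ValueError(f"token without a tag: {item!r}")
        tokens.append((form, tag))
    return tokens


def matches_with_modifiers(
    entry: List[Token],
    observed: List[Token],
    modifier_tags: FrozenSet[str] = frozenset({"A"}),  # assumption: adjectives only
) -> bool:
    """True if `observed` is `entry` with only modifier-tagged tokens inserted
    between its components."""
    i = 0  # index of the next entry component we expect
    for token in observed:
        if i < len(entry) and token == entry[i]:
            i += 1
        elif token[1] in modifier_tags and 0 < i < len(entry):
            continue  # an inserted modifier between two components
        else:
            return False
    return i == len(entry)


entry = parse_tagged("вземам/V решение/N")
print(matches_with_modifiers(entry, parse_tagged("вземам/V важно/A решение/N")))
```

With the dictionary entry вземам/V решение/N, the modified variant вземам/V важно/A решение/N is accepted, while a sequence with a non-modifier token between the components is rejected.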
4. ID1: Non-decomposable vMWEs with frozen components (except the verb)
Within which category: ID
- ID1-NP
ритам/V камбаната/N
kick bell.DEF ‘to kick the bucket’
давам/V зелена/A улица/N
give green street ‘to give the green light’
- ID1-PP
бия/V на/P очи/N
hit on eyes ‘to catch the eye’
вземам на мушка
take on target ‘to draw a bead on’
удрям в гръб
hit in back ‘to stab (s.b.) in the back’
изтривам от лицето на земята
wipe from face.DET of earth.DET ‘to wipe (s.b.) off the face of the earth’
- ID1-NPPP
имам жълто около устата
have yellow around mouth.DET ‘to be wet behind the ears’
- ID1-PPPP
прочитам/V от/P корица/N до/P корица/N
read from cover to cover
- ID1-AdvP
изваждам/V наяве/ADV
expose in_the_open ‘to bring to light’
имам/V предвид/ADV
have into_account ‘to take into account’
- ID1-AP
съм/V ясен/A като/P бял/A ден/N
be clear as white day ‘to be as clear as day’
- ID1-NPSC vMWEs with objectival small clause
държа очите си отворени
keep eyes.DET my/your/.POSS open ‘to keep one’s eyes open’
5. ID2: Non-decomposable/semi-decomposable vMWEs with semi-fixed components (except the verb)
Within which category: ID
- ID2-NP
броя/V звезди/N
count stars ‘to count the stars’
броя/V звездите/N
count stars.DET
вдигам/V бяло/A знаме/N
hold white flag ‘to wave a white flag’
вдигам/V бялото/A знаме/N
hold white.DEF flag ‘to wave the white flag’
- ID2-PP
хващам/V в/P капан/N
catch in trap ‘catch in a trap’
хващам/V в/P капана/N
catch in trap.DEF ‘catch in the trap’
6. ID3: Decomposable vMWEs which allow modifier slots (the label XP_YP marks YP as a possible modifier of the component XP of the MWE)
Within which category: ID
- ID3-NP_AdvP
виждам (много) зор
see (much) hardship ‘to find (s.th.) very tough’
- ID3-NP_AP
правя (първи/малки) стъпки
make (first/small) steps ‘to make first steps’
давам (добър) урок
give (good) lesson ‘to give s.b. a lesson’
забърквам (голяма) каша
mix-up (big) mess ‘to make a (big) mess’
- ID3-PP_AP
гълтам с (жадни) очи
swallow with (thirsty) eyes ‘to take it all in with greedy eyes’
- ID3-PP_AdvP
стигам до много сърца
reach to many hearts ‘to reach many hearts’
- ID3-NP_APPP
хвърлям/V пари/N на/P вятъра/N
throw money on wind.DEF ‘throw money around’
хвърлям (големи/всички) пари на вятъра
throw (big/all) money on wind.DEF
- ID3-AdvP_AdvP
гледам/V отвисоко/ADV
look from_above ‘to look down (at s.o.)’
гледам/V много/ADV отвисоко/ADV
look much from_above ‘to look very much down (at s.o.)’
- ID3-_AdvPNP (the modifier attaches to the verb, but is still internal, i.e. within the metaphor)
вдигам летвата високо
raise stick.DET high ‘to raise the bar high’
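The parenthesised modifier slots in the ID3 examples, such as правя (първи/малки) стъпки, can be expanded mechanically into the surface variants they stand for. The following is a minimal sketch, assuming at most one slot per pattern and alternatives separated by a slash; the function name `expand_optional_slot` is hypothetical.

```python
import re
from typing import List


def expand_optional_slot(pattern: str) -> List[str]:
    """Expand one '(a/b)' slot into the bare form plus one variant per alternative."""
    match = re.search(r"\(([^()]*)\)", pattern)
    if match is None:
        return [pattern]  # no slot: the pattern is its own only variant
    alternatives = [""] + match.group(1).split("/")
    variants = []
    for alt in alternatives:
        text = pattern[:match.start()] + alt + pattern[match.end():]
        variants.append(" ".join(text.split()))  # collapse doubled spaces
    return variants


for variant in expand_optional_slot("правя (първи/малки) стъпки"):
    print(variant)
```

The first variant is always the bare expression without a modifier; the listed alternatives are only the ones given in the example and do not exhaust the modifiers an ID3 expression may take.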
7. ID4: Decomposable vMWEs with possessive slot within the NP/PP
- ID4-NP_наNP
копая/V гроба/N PPos{на/P някого/N} possessive prepositional expression
копая нечий гроб
dig grave.DET of someone ‘dig one’s grave’
- ID4-PP_наNP
ходя/V по/P нервите/N PPos{на/P някого/N} possessive prepositional expression
ходя по нечии нерви
walk on nerves.DET of somebody ‘get on s.o.’s nerves’
8. ID5: vMWEs with an open slot NP or PP
- ID5-_NSCs vMWEs with small clause in agreement with the object slot
изкарвам някого/го чист
make s.b./him.ACC clean ‘to make s.b. as innocent as a babe unborn’
одирам някого жив
skin s.b./him.ACC alive ‘to skin s.b. alive’
9. MWEs with sentence features
Within which category: SENT
- SENT1: proverbs
който не работи, не трябва да яде
who.DET not work not should to eat
‘He who does not work, neither shall he eat.’ (2 Thess. 3:10, NT; popular biblical proverb)
- SENT2: frozen clausal expressions
както и да е
as it is ‘whatever’
каквото и да става
whatever to happen ‘no matter what happens’
- SENT3: MWEs with lexicalised subject
излиза ми име
appears for-me.DAT name ‘a name sticks (for/to me)’
чашата на търпението ми прелива
glass.DET of patience my.POS overflows ‘my patience runs out’
- SENT4: vMWEs with small clause in agreement with the subject slot
имам думата последен
have word.DET last ‘to have the last word’
излизам/V сух/A от/P водата/N
come_out dry from water.DEF ‘come off clean’
10. Nominal, Adverbial or Participial MWEs derived from vMWEs
From Guidelines (among the types of vMWEs the following type is described): Syntactic nominal, participial and other variants of prototypical VMWEs maintaining their idiomatic reading, e.g. decisions which we made, decision making, heart-breaking.
ORIG_CAT/NEW_CAT where NEW_CAT can be one of: NOM (NP), PART (AP with participle), TRANS (AdvP with transgressive, corresponding to English present participle used to describe simultaneous actions)
Types of transformation:
- NP from the vMWE
- Actor
- Transgressive (deeprichastie)
- NP turns into head
- ID1-PP/NOM
удрям в гръб (vMWE)
hit in back ‘to stab (s.b.) in the back’
удар в гърба
blow in back ‘blow in the back’
- ID2-NP/NOM (NP turns into PP_на)
броя звезди (vMWE)
count stars ‘to count stars’
броене на звезди
counting of stars ‘(the act of) counting stars’
играя карти (vMWE)
play cards
игра на карти
game of cards ‘card game’
играч на карти
player of cards ‘card player’
- ID2-NP/TRANS
броя звездите (vMWE)
count stars ‘to count stars’
броейки звездите
counting stars ‘(while) counting stars’
- ID3-_AdvPNP/NOM
вдигам летвата високо (vMWE)
raise stick.DEF high ‘to raise the bar’
високо вдигната летва
highly raised stick.DET ‘high bar’
- ID2-NP/PART
броя звезди (vMWE)
count stars ‘to count stars’
броящ звезди (allows forms of the participle – gender, number, definiteness)
counting stars ‘who counts stars’
броя минутите (vMWE)
count minutes
броени минути
counted minutes ‘counted minutes’
- ID5-_NSCs/PART
одирам някого жив
skin s.b./him.ACC alive ‘to skin s.b. alive’
одран жив
skinned alive ‘skinned alive’
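The labels used throughout the classification (e.g. LVC-PP, SENT3, ID3-_AdvPNP/NOM) follow a regular shape: a category, an optional subclass number, an optional structure after a hyphen, and an optional ORIG_CAT/NEW_CAT suffix. A minimal sketch of a parser for that shape is given below. The names `Label` and `parse_label` are hypothetical, and combined labels such as OTH2/OTH4 are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# NEW_CAT values named in the guidelines for derived MWEs.
DERIVED = {"NOM", "PART", "TRANS"}


class Label(NamedTuple):
    category: str             # e.g. 'ID', 'LVC', 'SENT'
    subclass: Optional[int]   # e.g. 3 for 'ID3'
    structure: Optional[str]  # e.g. 'NP_AP', '_AdvPNP'
    derived: Optional[str]    # 'NOM', 'PART', 'TRANS' or None


_LABEL_RE = re.compile(
    r"^(?P<cat>[A-Z]+)(?P<sub>\d+)?(?:-(?P<struct>[^/]+))?(?:/(?P<new>[A-Z]+))?$"
)


def parse_label(label: str) -> Label:
    """Split a label such as 'ID3-_AdvPNP/NOM' into its parts."""
    m = _LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    new = m.group("new")
    if new is not None and new not in DERIVED:
        raise ValueError(f"unknown derived category: {new!r}")
    sub = m.group("sub")
    return Label(m.group("cat"), int(sub) if sub else None, m.group("struct"), new)


print(parse_label("ID3-_AdvPNP/NOM"))
```

Keeping the derived category as a separate field makes it possible to group a derived MWE with the vMWE class it comes from.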
Sample annotation
We performed automatic annotation (using a dictionary of vMWEs) on a selection of 200 sentences, comprising a total of 7,234 tokens. The texts were annotated with POS, grammatical features and lemma. There are 318 annotated verbal and sentential MWEs.
Note: Manual verification of annotation, as well as dictionary expansion and description, are still ongoing.
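For orientation, the sample figures above can be turned into simple density measures. This is only worked arithmetic on the reported counts; since the annotation is automatic and still being verified, the derived values are provisional as well. The variable names are illustrative.

```python
# Counts reported for the sample annotation.
sentences = 200
tokens = 7234
mwes = 318

tokens_per_sentence = tokens / sentences     # average sentence length in tokens
mwes_per_sentence = mwes / sentences         # annotated MWEs per sentence
mwes_per_1000_tokens = 1000 * mwes / tokens  # annotated MWEs per thousand tokens

print(f"{tokens_per_sentence:.2f} tokens per sentence")
print(f"{mwes_per_sentence:.2f} MWEs per sentence")
print(f"{mwes_per_1000_tokens:.2f} MWEs per 1000 tokens")
```

This gives roughly 36 tokens and 1.6 annotated MWEs per sentence, or about 44 annotated MWEs per thousand tokens.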