Phase 2: Second annotation by two independent annotators with feedback on guidelines, categories and language-specific features
Last edited: 22/03/2016
➥ Results from Phase 2
➥ Problematic cases and hesitation in annotators’ decisions
➥ Updated classification of vMWEs
Results from Phase 2
200 sentences annotated by two different annotators
ANNOTATOR 1: 301 vMWEs
ANNOTATOR 2: 306 vMWEs
- 22 vMWEs annotated by only 1 annotator
- 287 vMWEs annotated by both annotators
- 249 vMWEs annotated in the same way
- 38 vMWEs (13.24%) annotated with a different type by the two annotators
Differences in annotators’ decisions (according to PARSEME’s scheme)
Types assigned by the two annotators | Number of vMWEs |
ID/LVC | 30 |
ID/OTH | 7 |
IPrepV/OTH | 1 |
TOTAL | 38 |
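The percentages implied by these counts can be re-derived with a few lines of Python. This is only a minimal sketch: the numbers are copied from the results above, while the variable names are illustrative and not part of any PARSEME tooling.

```python
# Counts copied from the Phase 2 results above.
both_annotators = 287   # vMWEs annotated by both annotators
same_type = 249         # of those, annotated with the same type
type_disagreements = {"ID/LVC": 30, "ID/OTH": 7, "IPrepV/OTH": 1}

different_type = sum(type_disagreements.values())

# Sanity check: same-type and different-type cases partition the shared vMWEs.
assert same_type + different_type == both_annotators

disagreement_rate = 100 * different_type / both_annotators
agreement_rate = 100 * same_type / both_annotators

print(f"different type: {different_type} ({disagreement_rate:.2f}%)")
print(f"same type: {same_type} ({agreement_rate:.2f}%)")
for pair, count in type_disagreements.items():
    share = 100 * count / different_type
    print(f"{pair}: {share:.1f}% of the type disagreements")
```

The figures cover only the 287 vMWEs identified by both annotators. This is plain observed agreement on the type label, not a chance-corrected measure.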
Problematic cases and hesitation in annotators’ decisions
The cases of vMWEs classified differently by the annotators raise general questions valid for Bulgarian and, we believe, (most of the) other languages.
1. LVC and ID
In general, hesitation over whether to annotate a vMWE as an LVC or an ID arises when the annotator is not sure whether all the obligatory LVC tests apply, or cannot come up with an example for checking the vMWE against a given test. For instance, one annotator was not sure whether Test 10 applies to the vMWE поставям въпрос (pose a question).
In such cases, applying additional tests (e.g. Test 13, the possibility of substitution with a verb from the same stem as the noun) may help decide whether the MWE is an LVC or an ID. vMWEs that cannot be replaced by a single verb of the same stem as the noun are more likely to be tagged ID rather than LVC, even if semantically and structurally similar MWEs with a single-word counterpart are tagged LVC, e.g. слагам край (на) ‘put an end to’ (there is no verb from the same stem as the noun край (end), but there are synonymous one-word verbs: приключвам, прекратявам, свършвам ‘to finish, to end’). On the whole, contexts for applying Test 10 are not always easy to think of, which leaves the annotators unsure whether it is applicable. Taking this into account, the candidate LVCs in which the noun has its regular (non-idiomatic) sense may be divided into two groups:
(a) Those with a single-verb counterpart from the same stem as the noun: if Tests 7, 8, 9, 11, and possibly 12 apply, Test 10 does not, and Test 13 applies > LVC.
(b) Those without a single-verb counterpart from the same stem (Test 13 does not apply) > apply the remaining additional tests.
In case (b), one more test may be introduced: try to replace the vMWE with a single verb formed from a different (but synonymous) stem, e.g. слагам край (приключвам, прекратявам = put an end to), or with a predicative containing a derivationally related adjective/adverb, e.g. имам право (съм прав = am right).
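The split into cases (a) and (b) can be pictured as a small decision helper. The sketch below is a reading aid, not the guidelines' procedure. The function name, the argument names and every returned label except "LVC" are hypothetical. Tests 10 and 12 are left out on purpose, because the text treats them as uncertain or optional. Treating a synonymous one-word verb as evidence for LVC is an assumption about how the proposed extra test would be used.

```python
from typing import Dict


def lvc_or_id(tests: Dict[int, bool], synonym_verb_exists: bool = False) -> str:
    """Suggest a tag for a candidate LVC whose noun keeps its regular sense.

    `tests` maps a guideline test number to True if the test applies.
    """
    core_ok = all(tests.get(n, False) for n in (7, 8, 9, 11))
    if not core_ok:
        return "not covered by this sketch"
    if tests.get(13, False):
        # Case (a): a verb from the same stem as the noun exists.
        return "LVC"
    # Case (b): no same-stem verb, so fall back on the additional tests.
    if synonym_verb_exists:
        # ASSUMPTION: a synonymous one-word verb counts in favour of LVC.
        return "LVC (weaker evidence)"
    return "ID more likely"


# слагам край: Test 13 does not apply, but synonymous verbs exist
# (приключвам, прекратявам). That tests 7, 8, 9 and 11 apply is
# assumed here purely for illustration.
print(lvc_or_id({7: True, 8: True, 9: True, 11: True, 13: False},
                synonym_verb_exists=True))
```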
The following examples are cases of ambiguity of the type LVC/ID, where ambiguity means that they were annotated differently by the annotators: влизам в сила (take effect), завоювам нови позиции (gain new ground), встъпвам в длъжност (assume office).
2. ID and OTH
Hesitation between ID and OTH arises when the vMWE has sentential properties (e.g. a lexicalised subject: ребрата (на някого) му се броят – someone’s ribs can be counted (someone is so skinny)) or when it embeds or requires a subordinate clause, e.g. съм на път да – be on the way to do s.th., правя се на дръж ми шапката – to behave as if I don’t know the rules.
The following tests are applicable:
(a) If the verb’s subject is free, the vMWE is marked as ID: правя се на дръж ми шапката – to behave as if I don’t know the rules.
(b) If the subject is lexicalised but there are other free elements, it is marked as ID (ID6): ребрата (на някого) му се броят – someone’s ribs can be counted (someone is so skinny).
(c) If there are no free elements, it is a frozen expression (marked as OTH).
- If it can be used as a sentence, it is of type OTH1 (e.g., proverbs, exclamations): жив(а) да не бях! – I can’t believe it, то ще си покаже – we’ll see.
- If it cannot be used on its own, it is of type OTH2: де се е чуло и видяло – has it ever been heard or seen.
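Tests (a) to (c) above form a short decision tree, sketched here in Python. The function and parameter names are made up for illustration. The three yes/no answers still have to come from the annotator, so the code only fixes the order in which the questions are asked.

```python
def id_or_oth(subject_is_free: bool,
              has_other_free_elements: bool,
              can_stand_as_sentence: bool) -> str:
    """Return the tag suggested by tests (a)-(c) for a sentence-like vMWE."""
    if subject_is_free:
        return "ID"        # test (a): the subject is free
    if has_other_free_elements:
        return "ID6"       # test (b): lexicalised subject, other free slots
    # test (c): nothing is free, so the expression is frozen
    return "OTH1" if can_stand_as_sentence else "OTH2"


examples = {
    "правя се на дръж ми шапката": (True, False, False),
    "ребрата (на някого) му се броят": (False, True, False),
    "то ще си покаже": (False, False, True),
    "де се е чуло и видяло": (False, False, False),
}
for expression, answers in examples.items():
    print(expression, "->", id_or_oth(*answers))
```

The answers encoded in `examples` follow the classifications given in the text. For the first example only the free subject matters, so the other two values are placeholders.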
3. vMWE and non-MWE
Hesitation as to whether an expression is a vMWE arises in the following cases:
(A) When a phrase is headed by to be (or another linking verb), e.g. съм следствие – be a consequence.
A possible test is to:
- Check if the predicative noun/NP contributes a regular meaning. If YES, the expression is not marked as an MWE; if NO, the expression is likely to be an MWE.
(B) LVC or non-vMWE: покривам цената – cover the expenses, правя промени – make changes, давам ефект – cause effect.
Apply the tests for LVC.
(C) When a verb-headed phrase embeds an MWE: виждам [черно на бяло] – see (sth) in [black and white]/in writing, виждам [в нова светлина] – see [in a new light], купувам [на едро] – buy [in bulk].
Possible (non-definitive) tests for deciding whether the verb-headed phrase is a vMWE or a combination of a verb and an MWE are the following:
- Check if the embedded MWE is found with other verbs, e.g. възприемам [в нова светлина] – perceive [in a new light], разглеждам [в нова светлина] – view [in a new light], etc.; if the embedded MWE combines with a given set of verbs and the verb is in one of its regular meanings, then the expression is not marked as an MWE.
- Check if the verb and the embedded MWE can be substituted separately, e.g. възприемам [в нова светлина] – perceive [differently] (unlike LVCs, which may be substituted by a single verb). If NO, the expression may be an MWE. (This test may depend on the presence of lexicalisation. The lack of lexicalisation may be due to a lexical gap.)
(D) When the candidate vMWE is built on a regular metaphorical meaning of any of its components, the expression is not marked as an MWE: пазя в тайна – keep in secret, прохождам в бизнеса – to start off in business.
(E) Frequently used frozen phrases that have lost their verbal meaning are annotated as OTH: разбира/V се/REFL – of course, не щеш ли – unexpectedly.
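The first test under (C), whether the embedded MWE also occurs with other verbs, lends itself to a simple count over observed verb + MWE pairs. The sketch below uses a toy list built from the examples in the text, not corpus data. The function name and the threshold of two verbs are assumptions. The code does not check the second half of the test (that the verb is used in one of its regular meanings), which remains a manual judgement.

```python
from collections import defaultdict


def verbs_per_embedded_mwe(pairs):
    """Group the verbs observed with each embedded MWE."""
    verbs = defaultdict(set)
    for verb, embedded in pairs:
        verbs[embedded].add(verb)
    return dict(verbs)


# Toy observations taken from the examples above (not corpus evidence).
observed = [
    ("виждам", "в нова светлина"),
    ("възприемам", "в нова светлина"),
    ("разглеждам", "в нова светлина"),
    ("купувам", "на едро"),
]

by_mwe = verbs_per_embedded_mwe(observed)
for embedded, verbs in by_mwe.items():
    # ASSUMPTION: two or more distinct verbs suggest "verb + MWE".
    hint = "likely verb + MWE" if len(verbs) >= 2 else "check further"
    print(embedded, sorted(verbs), "->", hint)
```

A single verb in such a list says nothing by itself. It only means that the sample contains no other combination, so the second test under (C) still has to be applied.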
4. Language-specific categories in Bulgarian
(1) vMWEs with a lexicalised subject are now categorised as a separate type of ID (ID6), as they allow only certain transformations and are relatively frequent in Bulgarian.
(2) Additionally, we keep the subdivision of vMWEs described in the classification for Bulgarian (below).
Updated Classification of vMWEs
1. Light verb constructions, PARSEME category: LVC
- LVC-NP
чета/V доклад/N
read report ‘to present/to make a presentation’
вземам/V решение/N
take decision ‘to make a decision’
вземам/V важно/A решение/N
take important decision
- LVC-PP
държа/V под/P контрол/N
keep under control
(LVCs usually allow modifications of components)
2. Proper vMWEs with a different degree of idiomaticity, PARSEME category: ID
2.1. ID1: Non-decomposable vMWEs with frozen components (except the verb)
- ID1-NP
ритам/V камбаната/N
kick bell.DEF ‘to kick the bucket’
давам/V зелена/A улица/N
give green street ‘to give the green light’
- ID1-PP
бия/V на/P очи/N
hit on eyes ‘to catch the eye’
вземам на мушка
take on target ‘to draw a bead on’
удрям в гръб
hit in back ‘to stab (s.b.) in the back’
изтривам от лицето на земята
wipe from face.DET of earth.DET ‘to wipe (s.b.) off the face of the earth’
- ID1-NPPP
имам жълто около устата
have yellow around mouth.DET ‘to be wet behind the ears’
- ID1-PPPP
прочитам/V от/P корица/N до/P корица/N
read from cover to cover
- ID1-AdvP
изваждам/V наяве/ADV
expose in_the_open ‘to bring to light’
- ID1-AP
съм/V ясен/A като/P бял/A ден/N
be clear as white day ‘to be as clear as day’
- ID1-NPSC vMWEs with objectival small clause
държа очите си отворени
keep eyes.DET my/your/.POSS open ‘to keep one’s eyes open’
2.2. ID2: Non-decomposable/semi-decomposable vMWEs with semi-fixed components (except the verb)
- ID2-NP
броя/V звезди/N
count stars ‘to count the stars’
броя/V звездите/N
count stars.DET
вдигам/V бяло/A знаме/N
raise white flag ‘to wave a white flag’
вдигам/V бялото/A знаме/N
raise white flag ‘to wave the white flag’
- ID2-PP
хващам/V в/P капан/N
catch in trap ‘catch in a trap’
хващам/V в/P капана/N
catch in trap.DEF ‘catch in the trap’
2.3. ID3: Decomposable vMWEs which allow modifier slots (XP_YP is labelling the possible modifier YP of the component XP of the MWE)
- ID3-NP_AdvP
виждам (много) зор
see (much) hardship ‘to find (s.th.) very tough’
- ID3-NP_AP
правя (първи/малки) стъпки
make (first/small) steps ‘to make first steps’
давам (добър) урок
give (good) lesson ‘to give s.b. a lesson’
забърквам (голяма) каша
mix-up (big) mess ‘to make a (big) mess’
- ID3-PP_AP
гълтам с (жадни) очи
swallow with (thirsty) eyes ‘to take it all with greedy eyes’
- ID3-PP_AdvP
стигам до (много) сърца
reach to (many)/the hearts ‘to reach many hearts’
- ID3-NP_APPP
хвърлям (големи/всички) пари на вятъра
throw (big/all) money on wind.DEF ‘throw money around’
- ID3-AdvP_AdvP
гледам/V (много) отвисоко/ADV
look (much) from_above ‘to look very much down (at s.o.)’
- ID3-_AdvPNP (the modifier attaches to the verb, but is still internal, i.e. within the metaphor)
вдигам летвата (високо)
raise stick.DET high ‘to raise the bar high’
2.4. ID4: Decomposable vMWEs with possessive slot within the NP/PP
- ID4-NP_наNP
копая/V гроба/N PPos{на/P някого/N} possessive prepositional expression
копая нечий гроб
dig grave.DET of someone ‘dig one’s grave’
- ID4-PP_наNP
ходя/V по/P нервите/N PPos{на/P някого/N} possessive prepositional expression
ходя по нечии нерви
walk on nerves.DET of somebody ‘get on s.o.’s nerves’
2.5. ID5: vMWEs with an open slot
- ID5-_NSCs vMWEs with small clause in agreement with the direct object slot
изкарвам някого/го чист
make s.b./him.ACC clean ‘to make s.b. as innocent as a babe unborn’
одирам някого жив
skin s.b./him.ACC alive ‘to skin s.b. alive’
- ID5 with small clause in agreement with the subject slot
имам думата последен
have word.DET last ‘to have the last word’
излизам/V сух/A от/P водата/N
come_out dry from water.DEF ‘come off clean’
2.6. ID6: vMWEs with a lexicalised subject
излиза ми име
appears for-me.DAT name ‘a name sticks (for/to me)’
чашата на търпението ми прелива
glass.DET of patience my.POS overflows ‘my patience runs out’
3. Lexicalised combinations of verbs with pronouns, PARSEME category: IPronV
3.1. With reflexive particles
- IPronV1
смея/V се
laugh REFL.ACC
състезавам/V се
compete REFL.ACC
3.2. With reciprocal particles
- IPronV2
въобразявам/V си
imagine REFL.DAT
3.3. With accusative pronominal clitics
- IPronV3 Accusativa tantum
мързи/V ме
is_lazy me.ACC ‘I am lazy’, ‘it feels lazy to me’
3.4. With dative pronominal clitics
- IPronV4 Dativa tantum
нагарча/V ми
is_bitterish me.DAT
3.5. Combinations of the above
- IPronV1/IPronV4 and other combinations
гади/V ми се
it_feels_sick me.DAT REFL.ACC ‘I feel sick’
4. Lexicalised combinations of verb and preposition, PARSEME category: IPrepV
вярвам/V в/P
believe in
разчитам/V на/P
rely on
5. MWEs with sentence features, PARSEME category: OTH
- OTH1: proverbs
който не работи, не трябва да яде
who.DET not work not should to eat
‘He who does not work, neither shall he eat.’ (2 Thess. 3:10, NT; popular biblical proverb)
- OTH2: frozen clausal expressions
както и да е
as it is ‘whatever’
каквото и да става
whatever to happen ‘no matter what happens’
6. Nominal, Adverbial or Participial MWEs derived from vMWEs
From Guidelines (among the types of vMWEs the following type is described): Syntactic nominal, participial and other variants of prototypical VMWEs maintaining their idiomatic reading, e.g. decisions which we made, decision making, heart-breaking.
ORIG_CAT/NEW_CAT where NEW_CAT can be one of: NOM (NP), PART (AP with participle), TRANS (AdvP with transgressive, corresponding to English present participle used to describe simultaneous actions)
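A label in the ORIG_CAT/NEW_CAT format can be split mechanically, as in the sketch below. The names `Label`, `DERIVED` and `parse_label` are hypothetical and do not come from any PARSEME tool. The slash is treated as a derivation marker only when it is followed by NOM, PART or TRANS, because this classification also uses a slash for combined pronominal types such as IPronV1/IPronV4.

```python
from typing import NamedTuple, Optional

# The three NEW_CAT values listed above.
DERIVED = {"NOM", "PART", "TRANS"}


class Label(NamedTuple):
    orig_cat: str            # category of the source vMWE, e.g. "ID1-PP"
    new_cat: Optional[str]   # "NOM", "PART", "TRANS", or None if not derived


def parse_label(label: str) -> Label:
    """Split 'ORIG_CAT/NEW_CAT'; leave other labels untouched."""
    head, sep, tail = label.rpartition("/")
    if sep and tail in DERIVED:
        return Label(head, tail)
    return Label(label, None)


for text in ("ID1-PP/NOM", "ID5-_NSCs/PART", "IPronV1/IPronV4", "LVC-NP"):
    print(text, "->", parse_label(text))
```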
We need to describe in detail the possible transformations.
Types of transformation:
- NP from the vMWE
- Actor
- Adverbial participles
- NP turns into head
- ID1-PP/NOM
удрям в гръб (vMWE)
hit in back ‘to stab (s.b.) in the back’
удар в гръб
blow in back ‘blow in the back’
- ID2-NP/NOM (NP turns into PP_на)
броя звезди (vMWE)
count stars ‘to count stars’
броене на звезди
counting stars ‘(the act of) counting stars’
играя карти (vMWE)
play cards
игра на карти
game of cards ‘card game’
играч на карти
player of cards ‘card player’
- ID2-NP/TRANS
броя звездите (vMWE)
count stars ‘to count stars’
броейки звездите
counting stars ‘(while) counting stars’
- ID3-_AdvPNP/NOM
вдигам летвата високо (vMWE)
raise stick.DEF high ‘to raise the bar’
високо вдигната летва
highly raised stick.DET ‘high bar’
- ID2-NP/PART
броя звезди (vMWE)
count stars ‘to count stars’
броящ звезди (allows forms of the participle – gender, number, definiteness)
counting stars ‘who counts stars’
броя минутите (vMWE)
count minutes
броени минути
counted minutes ‘counted minutes’
- ID5-_NSCs/PART
одирам някого жив
skin s.b./him.ACC alive ‘to skin s.b. alive’
одран жив
skinned alive ‘skinned alive’
Sample annotation – phase 2
Annotation is performed by 2 annotators independently.
➥ Link to vMWE dictionary (4,268 vMWEs and sentential MWEs)