PARSEME shared task: Phase 2

Phase 2: Second annotation by two independent annotators with feedback on guidelines, categories and language-specific features

Last edited: 22/03/2016

PARSEME Guidelines v. 5


➥ Results from Phase 2
➥ Problematic cases and hesitation in annotators’ decisions
➥ Updated classification of vMWEs


Results from Phase 2

200 sentences annotated by two different annotators

ANNOTATOR 1: 301 vMWEs

ANNOTATOR 2: 306 vMWEs

  • 22 vMWEs annotated by only 1 annotator
  • 287 vMWEs annotated by both annotators
    • 249 vMWEs annotated in the same way
    • 38 vMWEs (13.24%) annotated by each annotator with a different type

Differences in annotators’ decisions (according to PARSEME’s scheme)

ID/LVC 30
ID/OTH 7
IPrepV/OTH 1
 TOTAL 38
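
The percentages implied by these counts can be re-derived with a few lines of Python. This is only a sketch over the figures reported above; the variable names and the `share` helper are ours and not part of the PARSEME tooling.

```python
# Counts reported above for the 200 doubly annotated sentences.
both = 287        # vMWEs marked by both annotators
same_type = 249   # of those, assigned the same category by both
disagreements = {"ID/LVC": 30, "ID/OTH": 7, "IPrepV/OTH": 1}

different_type = sum(disagreements.values())
# The per-pair counts must add up to the reported total of 38.
assert different_type == both - same_type == 38


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


type_disagreement = share(different_type, both)  # 13.24
type_agreement = share(same_type, both)          # 86.76
# Share of each category pair within the 38 disagreements.
by_pair = {pair: share(n, different_type) for pair, n in disagreements.items()}
```

Here `by_pair` shows that roughly four out of five category disagreements are of the ID/LVC type, which is why that pair is discussed first below.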


Problematic cases and hesitation in annotators’ decisions

The cases of vMWEs classified differently by the annotators raise general questions valid for Bulgarian and, we believe, (most of the) other languages.

1. LVC and ID

In the general case, hesitation over whether to annotate a vMWE as an LVC or an ID arises when the annotator is not sure whether all the obligatory LVC tests apply, or cannot come up with an example to check the vMWE against a given test. For example, one annotator was not sure whether Test 10 applies to the vMWE поставям въпрос (pose a question).

In such cases, applying additional tests (e.g. Test 13, the possibility of substitution with a verb from the same base as the noun) may settle whether the MWE is an LVC or an ID. A vMWE that cannot be replaced by a single verb of the same stem as the noun in the MWE is more likely to be tagged ID than LVC, even if semantically and structurally similar MWEs with a single-word correspondence are tagged LVC. An example is слагам край (на) ‘put an end to’: there is no separate verb from the same stem as the noun край (end), but there are synonymous one-word verbs (приключвам, прекратявам, свършвам ‘to finish, to end’). On the whole, contexts for applying Test 10 are not always easy to think of, which leaves the annotators unsure whether it is applicable. Taking this into account, the candidate LVCs in which the noun has its regular (non-idiomatic) sense may be divided into two groups:

(a) Those with a single-verb correspondence from the same stem as the noun: if Tests 7, 8, 9, 11 and possibly 12 apply, Test 10 doesn’t, and Test 13 applies > LVC.

(b) Those without a single-verb correspondence from the same stem (Test 13 doesn’t apply) > apply the remaining additional tests.

In the case of (b) one more test may be introduced: try to replace the vMWE with a single verb formed from a different (but synonymous) stem, e.g. слагам край (приключвам, прекратявам = put an end to), or with a predicative containing a derivationally related adjective/adverb, e.g. имам право (съм прав = am right).
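
The split into groups (a) and (b) can be written down as a small triage function. The sketch below is an illustration under stated assumptions, not part of the guidelines: the class, field names and return strings are hypothetical, the boolean flags stand for judgements the annotator has already made, and treating a positive result of the extra test as evidence for LVC is our reading (the text above only proposes the test).

```python
from dataclasses import dataclass


@dataclass
class LvcCandidate:
    """Annotator judgements for a candidate LVC whose noun has its regular sense."""
    expression: str
    core_tests_apply: bool        # Tests 7, 8, 9, 11 (and possibly 12) apply
    same_stem_verb: bool          # Test 13: a single verb from the noun's stem exists
    synonym_or_predicative: bool  # proposed extra test for group (b)


def triage(c: LvcCandidate) -> str:
    """Return a suggested next step; the labels are hypothetical."""
    if not c.core_tests_apply:
        return "not covered here: run the obligatory LVC tests first"
    if c.same_stem_verb:
        return "LVC"  # group (a)
    if c.synonym_or_predicative:
        # Assumption: a positive extra test is counted as evidence for LVC.
        return "group (b): leaning LVC, apply remaining additional tests"
    # Without a same-stem verb, ID is described above as the more likely tag.
    return "group (b): leaning ID, apply remaining additional tests"


# слагам край: no same-stem verb, but synonymous verbs exist (приключвам, ...).
# The value of core_tests_apply is illustrative only.
put_an_end = LvcCandidate("слагам край", True, False, True)
result = triage(put_an_end)
```

For group (b) the function never returns a final tag, since the remaining additional tests still have to be applied by the annotator.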

The following examples are cases of ambiguity of the type LVC/ID, where ambiguity means that they were annotated differently by the annotators: влизам в сила (take effect), завоювам нови позиции (gain new ground), встъпвам в длъжност (assume office).

2. ID and OTH

Hesitation between ID and OTH arises when the vMWE has sentential properties (e.g. a lexicalised subject: ребрата (на някого) му се броят – someone’s ribs can be counted (someone is so skinny)) or embeds or requires a subordinate clause, e.g. съм на път да – be on the way to do s.th., правя се на дръж ми шапката – to behave as if I don’t know the rules.

The following tests are applicable:

(a) If the verb’s subject is free, the vMWE is marked as ID: правя се на дръж ми шапката – to behave as if I don’t know the rules.

(b) If the subject is lexicalised but there are other free elements, it is marked as ID (ID6): ребрата (на някого) му се броят – someone’s ribs can be counted (someone is so skinny).

(c) If there are no free elements, it is a frozen expression (marked as OTH).

  • If it can be used as a sentence, it is of type OTH1 (e.g., proverbs, exclamations): жив(а) да не бях! – I can’t believe it, то ще си покаже – we’ll see.
  • If it cannot be used on its own, it is of type OTH2: де се е чуло и видяло – has it ever been heard or seen.
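
Tests (a) to (c) form a short decision cascade, sketched below in Python. The function and parameter names are hypothetical; the three booleans are the annotator's answers to the questions above, and the example expressions with their outcomes are the ones given in this section.

```python
def classify_sentential(subject_is_free: bool,
                        has_other_free_elements: bool,
                        can_stand_alone: bool) -> str:
    """Apply tests (a), (b) and (c) in order and return the category."""
    if subject_is_free:
        return "ID"    # test (a)
    if has_other_free_elements:
        return "ID6"   # test (b): lexicalised subject, other free elements
    # test (c): no free elements, so the expression is frozen
    return "OTH1" if can_stand_alone else "OTH2"


# Arguments: (subject_is_free, has_other_free_elements, can_stand_alone).
# Values that the cascade never inspects are set to False.
examples = {
    "правя се на дръж ми шапката": classify_sentential(True, False, False),
    "ребрата (на някого) му се броят": classify_sentential(False, True, False),
    "жив(а) да не бях!": classify_sentential(False, False, True),
    "де се е чуло и видяло": classify_sentential(False, False, False),
}
```

The order of the checks matters: the question of whether the expression can stand alone as a sentence is only asked once no free elements remain.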

3. vMWE and non-MWE

Hesitation over whether an expression is a vMWE arises in the following cases:

(A) When a phrase is headed by to be (or another linking verb), e.g. съм следствие – be a consequence.

A possible test is to:

  1. Check if the predicative noun/NP contributes a regular meaning. If YES, the expression is not marked as an MWE; if NO, the expression is likely to be an MWE.

(B) LVC or non-vMWE: покривам цената – cover the expenses, правя промени – make changes, давам ефект – cause effect.

Apply the tests for LVC.

(C) When a verb-headed phrase embeds an MWE: виждам [черно на бяло] – see (sth) in [black and white]/in writing, виждам [в нова светлина] – see [in a new light], купувам [на едро] – buy [in bulk].

Possible (non-definitive) tests to decide whether the verb-headed phrase is a vMWE or a combination of a verb and an MWE are the following:

  1. Check if the embedded MWE is found with other verbs, e.g. възприемам [в нова светлина] – perceive [in a new light], разглеждам [в нова светлина] – view [in a new light], etc. If the embedded MWE combines with a given set of verbs and the verb is in one of its regular meanings, then the expression is not marked as an MWE.
  2. Check if the verb and the embedded MWE can be substituted separately, e.g. възприемам [в нова светлина] – perceive [differently] (unlike LVCs, which may be substituted by a single verb). If NO, the expression may be an MWE. (This test may depend on the presence of lexicalisation. The lack of lexicalisation may be due to a lexical gap.)
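
The two tests for case (C) could be supported by a simple lookup over observed verb and embedded-MWE pairs. The following is a rough sketch: the `OBSERVED` list is a toy sample built from the examples above (not corpus data), all names are hypothetical, and the two boolean arguments are annotator judgements whose values in the usage lines are placeholders rather than linguistic claims.

```python
# Toy (verb, embedded MWE) observations taken from the examples above.
OBSERVED = [
    ("виждам", "в нова светлина"),
    ("възприемам", "в нова светлина"),
    ("разглеждам", "в нова светлина"),
    ("виждам", "черно на бяло"),
    ("купувам", "на едро"),
]


def verbs_for(embedded: str, observed=OBSERVED) -> set:
    """All verbs seen together with the given embedded MWE."""
    return {verb for verb, mwe in observed if mwe == embedded}


def check_embedding(verb: str, embedded: str,
                    verb_has_regular_meaning: bool,
                    separately_substitutable: bool,
                    observed=OBSERVED) -> str:
    """Apply Test 1, then Test 2; both are non-definitive."""
    other_verbs = verbs_for(embedded, observed) - {verb}
    if other_verbs and verb_has_regular_meaning:
        return "not marked as MWE"   # Test 1
    if not separately_substitutable:
        return "may be a vMWE"       # Test 2 answered NO
    return "undecided"               # the tests give no verdict


first = check_embedding("виждам", "в нова светлина", True, True)
second = check_embedding("купувам", "на едро", True, False)  # placeholder flags
```

A positive answer to Test 2 is reported as "undecided" because the text above only states what follows from a negative answer.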

(D) When the candidate vMWE is built on a regular metaphorical meaning of any of its components, e.g. пазя в тайна – keep in secret, прохождам в бизнеса – to start off in business, the expression is not marked as an MWE.

(E) Frequently used frozen phrases that have lost their verbal meaning, e.g. разбира/V се/REFL – of course, не щеш ли – unexpectedly, are annotated as OTH.

4. Language-specific categories in Bulgarian

(1) vMWEs with a lexicalised subject are now categorised as a separate type of ID (ID6), as they allow only certain transformations and are relatively frequent in Bulgarian.

(2) Additionally, we keep the subdivision of vMWEs described in the classification for Bulgarian (below).


Updated Classification of vMWEs

1. Light verb constructions, PARSEME category: LVC

  • LVC-NP


    чета/V доклад/N
    read report ‘to present/to make a presentation’

    вземам/V решение/N
    take decision ‘to make a decision’
    вземам/V важно/A решение/N
    take important decision

  • LVC-PP


    държа/V под/P контрол/N
    keep under control

(LVCs usually allow modifications of components)

2. Proper vMWEs with a different degree of idiomaticity, PARSEME category: ID

2.1. ID1: Non-decomposable vMWEs with frozen components (except the verb)

  • ID1-NP


    ритам/V камбаната/N
    kick bell.DEF ‘to kick the bucket’

    давам/V зелена/A улица/N
    give green street ‘to give the green light’

  • ID1-PP


    бия/V на/P очи/N
    hit on eyes ‘to catch the eye’

    вземам на мушка
    take on target ‘to draw a bead on’

    удрям в гръб
    hit in back ‘to stab (s.b.) in the back’

    изтривам от лицето на земята
    wipe from face.DET of earth.DET ‘to wipe (s.b.) off from the face of the earth’

  • ID1-NPPP


    имам жълто около устата
    have yellow around mouth.DET ‘to be wet behind the ears’

  • ID1-PPPP


    прочитам/V от/P корица/N до/P корица/N
    read from cover to cover

  • ID1-AdvP


    изваждам/V наяве/ADV
    expose in_the_open ‘to bring to light’

  • ID1-AP


    съм/V ясен/A като/P бял/A ден/N
    be clear as white day ‘to be as clear as day’

  • ID1-NPSC vMWEs with objectival small clause


    държа очите си отворени
    keep eyes.DET my/your.POSS open ‘to keep one’s eyes open’

2.2. ID2: Non-decomposable/semi-decomposable vMWEs with semi-fixed components (except the verb)

  • ID2-NP


    броя/V звезди/N
    count stars ‘to count the stars’
    броя/V звездите/N
    count stars.DET

    вдигам/V бяло/A знаме/N
    raise white flag ‘to wave a white flag’
    вдигам/V бялото/A знаме/N
    raise white.DET flag ‘to wave the white flag’

  • ID2-PP


    хващам/V в/P капан/N
    catch in trap ‘catch in a trap’
    хващам/V в/P капана/N
    catch in trap.DEF ‘catch in the trap’

2.3. ID3: Decomposable vMWEs which allow modifier slots (XP_YP is labelling the possible modifier YP of the component XP of the MWE)

  • ID3-NP_AdvP


    виждам (много) зор
    see (much) hardship ‘to find (s.th.) very tough’

  • ID3-NP_AP


    правя (първи/малки) стъпки
    make (first/small) steps ‘to make first steps’

    давам (добър) урок
    give (good) lesson ‘to give s.b. a lesson’

    забърквам (голяма) каша
    mix-up (big) mess ‘to make a (big) mess’

  • ID3-PP_AP


    гълтам с (жадни) очи
    swallow with (thirsty) eyes ‘to take it all with greedy eyes’

  • ID3-PP_AdvP


    стигам до (много) сърца
    reach to (many)/the hearts ‘to reach many hearts’

  • ID3-NP_APPP


    хвърлям (големи/всички) пари на вятъра
    throw (big/all) money on wind.DEF ‘to throw money around’

  • ID3-AdvP_AdvP


    гледам/V (много) отвисоко/ADV
    look (much) from_above ‘to look very much down (at s.o.)’

  • ID3-_AdvPNP (the AdvP is a modifier to the verb, but is still internal, i.e. within the metaphor)


    вдигам летвата (високо)
    raise stick.DET high ‘to raise the bar high’

2.4. ID4: Decomposable vMWEs with possessive slot within the NP/PP

  • ID4-NP_наNP


    копая/V гроба/N PPos{на/P някого/N} possessive prepositional expression
    копая нечий гроб
    dig grave.DET of someone ‘dig one’s grave’

  • ID4-PP_наNP


    ходя/V по/P нервите/N PPos{на/P някого/N} possessive prepositional expression
    ходя по нечии нерви
    walk on nerves.DET of somebody ‘get on s.o.’s nerves’

2.5. ID5: vMWEs with an open slot

  • ID5-_NSCs vMWEs with small clause in agreement with the direct object slot


    изкарвам някого/го чист
    make s.b./him.ACC clean ‘to make s.b. as innocent as a babe unborn’

    одирам някого жив
    skin s.b./him.ACC alive ‘to skin s.b. alive’

  • ID5 with small clause in agreement with the subject slot


    имам думата последен
    have word.DET last ‘to have the last word’

    излизам/V сух/A от/P водата/N
    come_out dry from water.DEF ‘come off clean’

2.6. ID6: vMWEs with lexicalised subject


излиза ми име
appears for-me.DAT name ‘a name sticks (for/to me)’

чашата на търпението ми прелива
glass.DET of patience my.POSS overflows ‘my patience runs out’

3. Lexicalised combinations of verbs with pronouns, PARSEME category: IPronV

3.1. With reflexive particles

  • IPronV1


    смея/V се
    laugh REFL.ACC

    състезавам/V се
    compete REFL.ACC

3.2. With reciprocal particles

  • IPronV2


    въобразявам/V си
    imagine REFL.DAT

3.3. With accusative pronominal clitics

  • IPronV3 Accusativa tantum


    мързи/V ме
    is_lazy me.ACC ‘I am lazy’, ‘it feels lazy to me’

3.4. With dative pronominal clitics

  • IPronV4 Dativa tantum


    нагарча/V ми
    is_bitterish me.DAT

3.5. Combinations of the above

  • IPronV1/IPronV4 and other combinations


    гади/V ми се
    it_feels_sick me.DAT REFL.ACC ‘I feel sick’

4. Lexicalised combinations of verb and preposition, PARSEME category: IPrepV


вярвам/V в/P
believe in

разчитам/V на/P
rely on

5. MWEs with sentence features, PARSEME category: OTH

  • OTH1: proverbs


    който не работи, не трябва да яде
    who.DET not work not should to eat
    ‘He who does not work, neither shall he eat.’ (2 Thess. 3:10, NT; popular biblical proverb)

  • OTH2: frozen clausal expressions


    както и да е
    as it is ‘whatever’

    каквото и да става
    whatever to happen ‘no matter what happens’

6. Nominal, Adverbial or Participial MWEs derived from vMWEs

From Guidelines (among the types of vMWEs the following type is described): Syntactic nominal, participial and other variants of prototypical VMWEs maintaining their idiomatic reading, e.g. decisions which we made, decision making, heart-breaking.

ORIG_CAT/NEW_CAT where NEW_CAT can be one of: NOM (NP), PART (AP with participle), TRANS (AdvP with transgressive, corresponding to English present participle used to describe simultaneous actions)
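
A label in the ORIG_CAT/NEW_CAT format can be split and validated mechanically. The sketch below assumes only what is stated above, namely the three NEW_CAT values; the function and type names are hypothetical, and ORIG_CAT is kept as an opaque string because its inventory is defined by the classification above rather than by this helper.

```python
from typing import NamedTuple

# NEW_CAT values and their glosses as listed above.
NEW_CATS = {
    "NOM": "NP",
    "PART": "AP with participle",
    "TRANS": "AdvP with transgressive",
}


class DerivedLabel(NamedTuple):
    orig_cat: str  # category of the source vMWE, e.g. "ID2-NP"
    new_cat: str   # one of the keys of NEW_CATS


def parse_derived_label(label: str) -> DerivedLabel:
    """Split 'ORIG_CAT/NEW_CAT' at the last slash and check NEW_CAT."""
    orig_cat, sep, new_cat = label.rpartition("/")
    if not sep or not orig_cat:
        raise ValueError(f"expected ORIG_CAT/NEW_CAT, got {label!r}")
    if new_cat not in NEW_CATS:
        raise ValueError(f"unknown NEW_CAT {new_cat!r} in {label!r}")
    return DerivedLabel(orig_cat, new_cat)


parsed = parse_derived_label("ID3-_AdvPNP/NOM")
```

Splitting at the last slash leaves the ORIG_CAT part untouched, and a combination label such as IPronV1/IPronV4 is rejected because IPronV4 is not a NEW_CAT value.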

We need to describe in detail the possible transformations.

Types of transformation:

  • NP from the vMWE
  • Actor
  • Adverbial participles
  • NP turns into head

  • ID1-PP/NOM


    удрям в гръб (vMWE)
    hit in back ‘to stab (s.b.) in the back’
    удар в гръб
    blow in back ‘blow in the back’

  • ID2-NP/NOM (NP turns into PP_на)


    броя звезди (vMWE)
    count stars ‘to count stars’

    броене на звезди
    counting stars ‘(the act of) counting stars’

    играя карти (vMWE)
    play cards
    игра на карти
    game of cards ‘card game’
    играч на карти
    player of cards ‘card player’

  • ID2-NP/TRANS


    броя звездите (vMWE)
    count stars ‘to count stars’
    броейки звездите
    counting stars ‘(while) counting stars’

  • ID3-_AdvPNP/NOM


    вдигам летвата високо (vMWE)
    raise stick.DEF high ‘to raise the bar’
    високо вдигната летва
    highly raised stick.DET ‘high bar’

  • ID2-NP/PART


    броя звезди (vMWE)
    count stars ‘to count stars’
    броящ звезди (allows forms of the participle – gender, number, definiteness)
    counting stars ‘who counts stars’

    броя минутите (vMWE)
    count minutes
    броени минути
    counted minutes ‘counted minutes’

  • ID5-_NSCs/PART


    одирам някого жив
    skin s.b./him.ACC alive ‘to skin s.b. alive’
    одран жив
    skinned alive ‘skinned alive’


Sample annotation – phase 2

 

Annotation is performed by 2 annotators independently.

➥ Link to vMWE annotation with DIFF on differences in annotation by the two annotators (Google spreadsheet, read only)

➥ Link to vMWE dictionary (4,268 vMWEs and sentential MWEs)

➥ Link to Phase 1 notes


Copyright © 2015-2022 Department of computational linguistics. All rights reserved.