VU Amsterdam Metaphor Corpus


Gerard J Steen; Aletta G Dorst; J Berenike Herrmann; Anna A Kaal; Tina Krennmayr


Available for non-commercial use on condition that the terms of the BNC Licence are observed and that this header is included in its entirety with any copy distributed.

Editorial Practice

Encoding format: TEI P5 XML

Linguistic corpora

Linguistic analysis (Linguistics)

The corpus was annotated between September 2005 and August 2010.

The corpus is a small subset of BNC Baby, composed of fragments of BNC Baby texts.

The resource contains a selection of excerpts from BNC Baby files that have been annotated for metaphor. It covers four registers of about 50,000 words each: academic texts, news texts, fiction, and conversations. Words are separately labelled when they participate in multi-word expressions (about 1.5%) or have been discarded for metaphor analysis (0.02%). The main categories are words that are related to metaphor (MRW), words that signal metaphor (MFlag), and words that are not related to metaphor. Metaphor-related words are subdivided into clear cases of metaphor and borderline cases (WIDLII, When In Doubt, Leave It In). A further parameter distinguishes direct, indirect, and implicit metaphor among metaphor-related words.
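To illustrate how these categories might be read from a TEI P5 file, here is a minimal Python sketch that tallies words by category. The markup in it is an assumption, not taken from this header: it supposes each word is a `<w>` element whose annotation sits in a nested `<seg>` with `function="mrw"` or `function="mFlag"`, that `type` values `met`, `lit`, and `impl` stand for indirect, direct, and implicit metaphor, and that borderline cases carry `subtype="WIDLII"`. The sample sentence is invented for illustration. Check the attribute names against the distributed files before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace

# ASSUMPTION: this mapping of type values to metaphor kinds is illustrative
# and is not documented in the header itself.
KIND_BY_TYPE = {"met": "indirect", "lit": "direct", "impl": "implicit"}

# ASSUMPTION: hypothetical sentence and markup, invented for this sketch.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <w>He</w>
  <w><seg function="mrw" type="met">attacked</seg></w>
  <w>my</w>
  <w>argument</w>
  <w><seg function="mFlag">like</seg></w>
  <w>a</w>
  <w><seg function="mrw" type="lit" subtype="WIDLII">bulldozer</seg></w>
</s>"""


def classify(word):
    """Return (category, kind, borderline) for one <w> element."""
    seg = word.find(f"{{{TEI_NS}}}seg")
    function = seg.get("function") if seg is not None else None
    if function == "mrw":
        kind = KIND_BY_TYPE.get(seg.get("type"), "unknown")
        return ("MRW", kind, seg.get("subtype") == "WIDLII")
    if function == "mFlag":
        return ("MFlag", None, False)
    # Words without a recognised annotation count as not metaphor-related.
    return ("non-MRW", None, False)


def tally(xml_text):
    """Count words per main category in a TEI fragment."""
    root = ET.fromstring(xml_text)
    return Counter(classify(w)[0] for w in root.iter(f"{{{TEI_NS}}}w"))


counts = tally(SAMPLE)
print(dict(counts))
```

The same `classify` function could be applied per register to compare the proportion of metaphor-related words across academic, news, fiction, and conversation texts, provided the files mark register in a way the script can read.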