VU Amsterdam Metaphor Corpus


Gerard J Steen; Aletta G Dorst; J Berenike Herrmann; Anna A Kaal; Tina Krennmayr


Available for non-commercial use on condition that the terms of the BNC Licence are observed and that this header is included in its entirety with any copy distributed.




Editorial Practice

Encoding format: TEI P5 XML

OTA keywords

Linguistic corpora

LC keywords

Linguistic analysis (Linguistics)

  • designation: CollectionText
  • size: files: ca. 33 MB

Creation Date

The corpus was annotated between September 2005 and August 2010.

Source Description

The corpus is a small subset of BNC Baby, composed of fragments of BNC Baby texts.

The corpus is also available for online searching at


Title proper taken from OTA Catalogue Form

The resource contains a selection of excerpts from BNC Baby files that have been annotated for metaphor. It covers four registers of about 50,000 words each: academic texts, news texts, fiction, and conversations. Words are separately labelled as participating in multi-word expressions (about 1.5%) or as discarded for metaphor analysis (0.02%). The main categories are words that are related to metaphor (MRW), words that signal metaphor (MFlag), and words that are not related to metaphor. Metaphor-related words are further divided into clear cases of metaphor and borderline cases (WIDLII: When In Doubt, Leave It In). A second parameter distinguishes direct, indirect, and implicit metaphor among metaphor-related words.
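The annotation categories described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names, and the sample tokens are assumptions made for this example and do not reproduce the element or attribute names used in the corpus's TEI P5 XML files, nor any text from the corpus itself.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Main categories named in the description (labels here are hypothetical)."""
    MRW = "metaphor-related word"
    MFLAG = "metaphor signal"
    NONE = "not related to metaphor"


class MetaphorType(Enum):
    """Second parameter for metaphor-related words."""
    DIRECT = "direct"
    INDIRECT = "indirect"
    IMPLICIT = "implicit"


@dataclass(frozen=True)
class Token:
    text: str
    relation: Relation = Relation.NONE
    metaphor_type: Optional[MetaphorType] = None  # only meaningful for MRW
    borderline: bool = False  # True for WIDLII (borderline) cases

    def __post_init__(self):
        # The subdivisions apply to metaphor-related words only.
        if self.relation is not Relation.MRW and (self.metaphor_type or self.borderline):
            raise ValueError("metaphor_type and borderline apply only to MRW tokens")


def count_relations(tokens):
    """Tally tokens by main category."""
    return Counter(t.relation for t in tokens)


# Invented example tokens with illustrative labels, not taken from the corpus.
sample = [
    Token("He"),
    Token("fought"),
    Token("like", Relation.MFLAG),
    Token("a"),
    Token("lion", Relation.MRW, MetaphorType.DIRECT),
]
counts = count_relations(sample)
```

Keeping the clear/borderline distinction and the direct/indirect/implicit distinction as separate fields mirrors the description, which presents them as independent parameters of metaphor-related words.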