VU Amsterdam Metaphor Corpus


Gerard J Steen; Aletta G Dorst; J Berenike Herrmann; Anna A Kaal; Tina Krennmayr


Available for non-commercial use on condition that the terms of the BNC Licence are observed

Editorial Practice

Encoding format: TEI P5 XML

Linguistic corpora

Linguistic analysis (Linguistics)

Creation Date


Source Description

The corpus is a small subset of BNC Baby, composed of fragments of BNC Baby texts.

The corpus is also available for online searching at


The resource contains a selection of excerpts from BNC Baby files that have been annotated for metaphor. It covers four registers of about 50,000 words each: academic texts, news texts, fiction, and conversations. Words have been separately labelled as participating in multi-word expressions (about 1.5%) or as discarded for metaphor analysis (0.02%). The main categories are words that are related to metaphor (MRW), words that signal metaphor (MFlag), and words that are not related to metaphor. Metaphor-related words are further subdivided into clear cases of metaphor and borderline cases (WIDLII: When In Doubt, Leave It In). A second parameter distinguishes direct, indirect, and implicit metaphor among metaphor-related words.
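As a rough illustration of how these categories might be tallied, the sketch below counts MRW, MFlag, WIDLII, and non-metaphor words in a tiny inline TEI-style sample. The sample sentence and the markup used to carry the labels (a `seg` element inside each `w`, with `function` and `subtype` attributes) are assumptions made for this example only; the actual encoding should be checked against the corpus files before adapting the code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: the sentence, the <seg> wrapper, and the attribute
# names "function" and "subtype" are assumptions for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="he">He</w>
    <w lemma="attack"><seg function="mrw">attacked</seg></w>
    <w lemma="my">my</w>
    <w lemma="argument">argument</w>
    <w lemma="like"><seg function="mFlag">like</seg></w>
    <w lemma="a">a</w>
    <w lemma="lion"><seg function="mrw">lion</seg></w>
    <w lemma="defend"><seg function="mrw" subtype="WIDLII">defended</seg></w>
  </s></p></body></text>
</TEI>"""


def label_word(w):
    """Map one <w> element to a coarse annotation category."""
    seg = w.find(f"{{{TEI_NS}}}seg")
    if seg is None:
        # No annotation wrapper: treated as not related to metaphor.
        return "non-MRW"
    function = seg.get("function", "").lower()
    if function == "mflag":
        return "MFlag"
    if function == "mrw":
        # Borderline cases carry the assumed WIDLII subtype.
        if seg.get("subtype") == "WIDLII":
            return "MRW (WIDLII)"
        return "MRW (clear)"
    return "other"


def count_labels(xml_text):
    """Count annotation categories over all <w> elements in the document."""
    root = ET.fromstring(xml_text)
    return Counter(label_word(w) for w in root.iter(f"{{{TEI_NS}}}w"))


counts = count_labels(SAMPLE)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

The same counting function could be pointed at a real file by reading its contents into a string first, provided the attribute names are adjusted to match the corpus's actual markup.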

Publications based on the data include:

  • Gerard J. Steen, Aletta G. Dorst, J. Berenike Herrmann, Anna A. Kaal, Tina Krennmayr, Trijntje Pasma (2010). A method for linguistic metaphor identification: From MIP to MIPVU. Amsterdam/Philadelphia: John Benjamins Publishing Company.
  • Tina Krennmayr (2011). Metaphor in Newspapers. Amsterdam: Vrije Universiteit. Available at
  • Aletta G. Dorst (2011). 'Personification in Discourse: Linguistic forms, conceptual structures and communicative functions', Language and Literature, 20 (2):113-135.

