GLBCC (Giessen - Long Beach Chaplin Corpus)


Jucker, Andreas H.; Müller, Simone; Smith, Sara


Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.



Editorial Practice

Documentation files in PDF.

Data files in plain text.

OTA keywords

Linguistic corpora

LC keywords

Discourse analysis

  • designation: CollectionText
  • size: 5 files, ca. 2.27 MB

Creation Date


Source Description

no source record


The Giessen - Long Beach Chaplin Corpus (GLBCC) consists of transcribed interactions among native speakers of English, speakers of English as a second language (ESL), and speakers of English as a foreign language (EFL). Pairs of students took part in the experiment in California (English as a native and second language) and in Giessen (English as a foreign language). Both participants were first asked to watch the first part of a silent Charlie Chaplin movie. One participant (speaker A) was then asked to retell in a monologue what he or she had seen so far, while the other participant (speaker B) watched the rest of the movie and told his or her partner the second part of the movie (dialogue). Finally, the two participants discussed several aspects of the movie on the basis of a few written prompts.

A total of 108 sessions were recorded, involving 191 speakers. In some cases only one speaker (a C-speaker) took part in a session and retold the entire movie in a monologue. There are 83 A-speakers and 90 B-speakers; the A-roles were not recorded in the first 7 recordings in California. Altogether, the corpus comprises 35 American, 4 British, and 2 Australian native speakers. Of the non-native speakers, 77 are German; the others have a variety of linguistic backgrounds, including Hispanic, Japanese, and Korean. The transcripts average 2472 words each.
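
The participant figures in the description can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the corpus documentation. The variable names are invented for this example, and the derived values rest on assumptions that the description does not state outright: that every speaker who is neither an A- nor a B-speaker is a C-speaker, that each session contains exactly one B-speaker or one C-speaker, and that the native and non-native groups together cover all 191 speakers.

```python
# Figures quoted in the corpus description.
total_speakers = 191
a_speakers = 83
b_speakers = 90
sessions = 108
unrecorded_a_roles = 7          # first 7 California recordings
native_speakers = 35 + 4 + 2    # American + British + Australian
german_speakers = 77

# Assumption: speakers who are neither A nor B are the C-speakers.
implied_c_speakers = total_speakers - a_speakers - b_speakers

# Assumption: each session has exactly one B-speaker or one C-speaker.
implied_sessions = b_speakers + implied_c_speakers

# Assumption: every B-speaker had a partner, but 7 A-roles went unrecorded.
implied_a_speakers = b_speakers - unrecorded_a_roles

# Assumption: native and non-native speakers together make up all 191.
implied_non_native = total_speakers - native_speakers
implied_other_backgrounds = implied_non_native - german_speakers

print("Implied C-speakers:", implied_c_speakers)
print("Session count consistent:", implied_sessions == sessions)
print("A-speaker count consistent:", implied_a_speakers == a_speakers)
print("Implied non-native speakers:", implied_non_native)
print("Implied non-German non-native speakers:", implied_other_backgrounds)
```

Under these assumptions the quoted session and A-speaker counts agree with each other, and the number of C-speakers, which the description does not give, follows from the totals. The derived figures are inferences and should be checked against the PDF documentation before being cited.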

Permanent URL