The Lancaster Speech, Writing and Thought Presentation Spoken Corpus

Title

The Lancaster Speech, Writing and Thought Presentation Spoken Corpus

Author

Short, Mick; Semino, Elena; McEnery, Tony; Heywood, John; McIntyre, Dan

Availability

Freely available for non-commercial use provided that the terms of the BNC user licence are observed for the files derived from the BNC, and that this header is included in its entirety with any copy distributed

Download: zip

Languages

English

Editorial Practice

Encoding format: TEI XML

The following editorial policies were applied in creating The Lancaster Speech, Writing and Thought Presentation Spoken Corpus.

The original transcripts were revised to ensure an accurate orthographic transcription. Non-fluency features were transcribed. Where it was necessary to revise the transcription, no punctuation was added except for full stops. These were added where it was felt a sentence boundary would exist in written data, and were included primarily to make the texts easier to read.

S-unit tags were removed from the BNC texts in order to increase readability of the files. Records of the s-unit numbers that were removed can be found in the individual file headers.

OTA keywords

Linguistic corpora
Corpus

LC keywords

Linguistics
Conversation
Discourse analysis
Speech
Interviews
English language--Spoken English
Oral history

Extent

  • designation: CollectionSound
  • size: 283 files : ca. 2.56 GB (offline - not available for download)
  • designation: CollectionText
  • size: 196 files : ca. 6.23 MB (online - available for download)

Creation Date

09/2001-05/2005

Source Description

no source record

Notes

The four major objectives of the project were: i) to establish an electronic corpus of (a) conversations from the British National Corpus (BNC) and (b) oral narratives from Lancaster's Centre for North Western Regional Studies (CNWRS) oral history archive; ii) to annotate the corpus manually for categories of Speech, Thought and Writing Presentation (ST and WP); iii) to conduct systematic quantitative and qualitative analyses of the annotated corpus; and iv) to compare the findings with those of Short and Semino's 1994-7 study of ST and WP in a corpus of written fictional and non-fictional British English narratives (BA LRG M-AN2314/APN/3489).

A full description of the corpus can be found on the project website at: http://www.ling.lancs.ac.uk/stwp/default.htm

McIntyre, D., C. Bellard-Thomson, J. Heywood, A. McEnery, E. Semino and M. Short. 2003. 'The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English.' In Archer, D., P. Rayson, A. Wilson and A. McEnery (eds). Proceedings of the Corpus Linguistics 2003 Conference. Lancaster University: UCREL Technical Papers 16. 513-23.

McIntyre, D., C. Bellard-Thomson, J. Heywood, A. McEnery, E. Semino and M. Short. 2004. 'Investigating the presentation of speech, thought and writing presentation in spoken British English.' ICAME Journal.

The digital audio is not available for download because permissions for the distribution and reuse were not acquired by the project.