The Lancaster Corpus of Mandarin Chinese


McEnery, A.M. (ed.); Xiao, Richard (ed.)


Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


Mandarin Chinese

Editorial Practice

Encoding format: XML

OTA keywords

Linguistic corpora

LC keywords

Componential analysis (Linguistics)
Linguistic analysis (Linguistics)
Chinese language--Modern Chinese, 1919-

  • designation: Text data
  • size: 30 files : ca. 42.8 MB
Creation Date


Source Description

no source record


The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora of modern British and American English. It is suitable both for monolingual research into modern Mandarin Chinese and for cross-linguistic contrast between Chinese and British/American English. The corpus samples 15 written text categories, including news, literary texts, academic prose and official documents, published in P.R. China in the early 1990s. LCMC uses the same sampling frame and sampling period as FLOB/FROWN. The corpus is encoded in Unicode (UTF-8) and marked up in XML.
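
Because the files are UTF-8 XML, a first look at their markup can be taken with a standard XML parser. The following is a minimal sketch in Python that counts how often each element tag occurs in a document. The sample string and its tag names (text, s, w, c) are hypothetical placeholders chosen for illustration, not a statement of the LCMC schema, and the function name count_tags is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_bytes: bytes) -> Counter:
    """Count how often each element tag occurs in a UTF-8 XML document."""
    root = ET.fromstring(xml_bytes)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


# Hypothetical sample: the tag names are illustrative only and are
# not taken from the LCMC documentation.
sample = "<text><s><w>我们</w><w>学习</w><c>。</c></s></text>".encode("utf-8")

tag_counts = count_tags(sample)
print(tag_counts.most_common())
```

To inspect a real corpus file, the bytes read from that file (opened in binary mode) could be passed to count_tags in place of the sample, which would list the element names actually used before any further processing is written.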

Permanent URL