The Chambers-Rostand Corpus of Journalistic French

  • The Chambers-Rostand Corpus of Journalistic French
  • Le Corpus Chambers-Rostand du français journalistique

Chambers, Angela; Rostand, Séverine; University of Limerick, Ireland


Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.




Editorial Practice

The texts have been checked for spelling and format errors; however, only basic errors have been corrected, such as missing letters, missing accents on 'e', accidental repetition of words (e.g. "repetition of of words") and missing spaces (often caused by uploading and downloading glitches). Potential or controversial "errors" concerning accents, hyphenation, capital letters, etc. were not corrected, as they may be considered part of the evolution of a language.
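The record does not say how repeated words were found. Purely as a sketch, assuming Python's standard `re` module, accidental repetitions such as "of of" could be flagged for review like this. The names `DOUBLED`, `LEGITIMATE` and `find_doubled_words` are hypothetical and do not describe the editors' actual procedure.

```python
import re

# Two identical words separated only by whitespace, e.g. "of of".
# In Python 3, \w also matches accented letters such as é or à.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Hypothetical allowlist: French pronominal verbs legitimately repeat
# these pronouns ("nous nous sommes", "vous vous trompez").
LEGITIMATE = {"nous", "vous"}


def find_doubled_words(text: str) -> list[str]:
    """Return suspected accidental repetitions, for manual review only."""
    return [
        m.group(1)
        for m in DOUBLED.finditer(text)
        if m.group(1).lower() not in LEGITIMATE
    ]


print(find_doubled_words("repetition of of words"))    # ['of']
print(find_doubled_words("Nous nous sommes trompés"))  # []
```

The sketch only reports candidates rather than rewriting them, since some French repetitions are grammatical. That is in the same conservative spirit as the decision to leave controversial "errors" untouched.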

Tagging of the texts is done at paragraph level only.

OTA keywords

Linguistic corpora

LC keywords

Electronic publications
French language--Written French
Language acquisition--Databases
Languages, Modern--Study and teaching
Discourse analysis
Language and languages--Computer-assisted instruction

  • designation: CollectionText
  • size: 5169 files : ca. 40 MB
Creation Date


Source Description

Le Monde [CD-ROM]: 2002. Paris: Le Monde, 2002.

Le Monde [CD-ROM]: April 2002 - March 2004. Paris: Le Monde, 2004.


L'Humanité

Articles taken from the website of L'Humanité in September/October 2004.

La Dépêche du Midi

Articles taken from the website of La Dépêche du Midi in November 2004.


Mode of access: Online. Application to OTA

This corpus contains 979,831 words in 1723 articles taken from three French daily newspapers:

  • Le Monde (576 articles / 355,046 words)
  • L'Humanité (576 articles / 367,486 words)
  • La Dépêche du Midi (570 articles / 257,299 words)
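The per-newspaper word counts add up exactly to the stated total. A small Python tally makes that arithmetic explicit and also derives an average article length and word share for each title. `COUNTS` is simply a transcription of the figures listed above.

```python
# (articles, words) per newspaper, transcribed from the corpus description.
COUNTS = {
    "Le Monde": (576, 355_046),
    "L'Humanité": (576, 367_486),
    "La Dépêche du Midi": (570, 257_299),
}

total_words = sum(words for _, words in COUNTS.values())
print(f"Total words: {total_words:,}")  # Total words: 979,831

for paper, (articles, words) in COUNTS.items():
    mean_length = words / articles         # average words per article
    share = 100 * words / total_words      # percentage of corpus words
    print(f"{paper}: {mean_length:.0f} words/article, {share:.1f}% of words")
```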

The articles were published in 2002 and 2003. They belong to one of six categories: editorial, cultural, sports, national news, international news, finance.

The articles were taken from the newspapers on the 4th, 12th, 20th and 28th of each month. If no article was available in a given category (or in any category) on one of these days, the article from the following day was taken. If none was available on that day either, the article from the preceding day was taken, and so on.
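The sampling rule can be sketched in Python. Four days a month over the 24 months of 2002 and 2003 gives 96 sampling days, and 96 days × 6 categories = 576 slots, which matches the article counts listed for Le Monde and L'Humanité. The fallback order is an assumption: "and so on" is read as alternating outward (+1, -1, +2, -2, ...) and allowed to cross month boundaries. `target_dates`, `candidate_dates`, `pick_date`, the `available` callback and the `max_offset` cap are all hypothetical names, not part of the corpus.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Optional

SAMPLE_DAYS = (4, 12, 20, 28)
CATEGORIES = ("editorial", "cultural", "sports",
              "national news", "international news", "finance")


def target_dates(years: tuple[int, ...] = (2002, 2003)) -> list[date]:
    """Nominal sampling days: the 4th, 12th, 20th and 28th of every month."""
    return [date(y, m, d) for y in years for m in range(1, 13) for d in SAMPLE_DAYS]


def candidate_dates(target: date, max_offset: int = 3) -> Iterator[date]:
    """Yield the target day, then +1, -1, +2, -2, ... up to max_offset days.

    Assumption: "and so on" alternates outward and may cross a month
    boundary; max_offset is a hypothetical cap on the search.
    """
    yield target
    for k in range(1, max_offset + 1):
        yield target + timedelta(days=k)  # day after first
        yield target - timedelta(days=k)  # then day before


def pick_date(target: date, category: str,
              available: Callable[[date, str], bool],
              max_offset: int = 3) -> Optional[date]:
    """Return the first candidate day with an article in this category, or None."""
    for d in candidate_dates(target, max_offset):
        if available(d, category):
            return d
    return None


slots = target_dates()
print(len(slots), len(slots) * len(CATEGORIES))  # 96 576


# Hypothetical availability: no sports article on 12 May 2002.
def no_sports_on_12_may(d: date, category: str) -> bool:
    return not (category == "sports" and d == date(2002, 5, 12))


print(pick_date(date(2002, 5, 12), "sports", no_sports_on_12_may))  # 2002-05-13
```

Passing availability in as a callback keeps the date logic separate from however the newspaper archives would actually be queried.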
