The Chambers-Rostand Corpus of Journalistic French
| Title |
|
| Author | Chambers, Angela; Rostand, Séverine; University of Limerick, Ireland |
| Availability | This resource is freely available for download. |
| Languages | French |
| Editorial Practice |
The texts have been checked for spelling and formatting errors. Only basic errors have been corrected, however: missing letters, missing accents on 'e', accidental repetition of words (e.g. "repetition of of words"), and missing spaces, often caused by uploading and downloading hiccups. Potential or controversial "errors" concerning accents, hyphenation, capital letters, etc. were not corrected, as they may be considered part of the evolution of the language. The texts are tagged at paragraph level only. |
| OTA keywords |
Linguistic corpora; Corpus |
| LC keywords | Electronic publications |
| Extent |
|
| Creation Date | 2005 |
| Source Description |
Le Monde [CD-Rom]: 2002. Le Monde, Paris: 2002.
Le Monde [CD-Rom]: April 2002 - March 2004. Le Monde, Paris: 2004.
L'Humanité: articles taken from the website of L'Humanité (www.humanite.fr) in September/October 2004.
La Dépêche du Midi: articles taken from the website of La Dépêche du Midi (www.ladepeche.fr) in November 2004. |
| Notes |
Mode of access: Online. Application to OTA
This corpus contains 979,831 words, made up of 1723 articles taken from three daily French newspapers: Le Monde, L'Humanité and La Dépêche du Midi.
The articles were published in 2002 and 2003. Each belongs to one of six categories: editorial, cultural, sports, national news, international news, finance. The articles were taken from the newspapers on the 4th, 12th, 20th and 28th of each month. If, in one or all categories, no article was available on a particular day, the article from the following day was taken. If no article was available on that day either, the article from the day before was taken, and so on. |
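The sampling rule described in the notes can be sketched in code. The following Python is a minimal illustration, not the compilers' actual procedure. The function names, the `has_article` callback and the `max_offset` limit are all hypothetical. The search order beyond the first two fallbacks (day after, day before, then two days after, two days before, and so on) is one reading of "and so on", which the record does not spell out.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Optional

# Sampling days stated in the notes: the 4th, 12th, 20th and 28th of each month.
SAMPLE_DAYS = (4, 12, 20, 28)


def target_dates(years=(2002, 2003)) -> List[date]:
    """List every nominal sampling date in the given publication years."""
    return [date(y, m, d) for y in years for m in range(1, 13) for d in SAMPLE_DAYS]


def candidate_dates(target: date, max_offset: int = 3) -> Iterator[date]:
    """Yield the target day, then alternate: day after, day before, and so on.

    ASSUMPTION: offsets keep alternating (+1, -1, +2, -2, ...). The record
    only names the first two fallbacks explicitly. max_offset is a
    hypothetical cut-off, not a figure from the record.
    """
    yield target
    for offset in range(1, max_offset + 1):
        yield target + timedelta(days=offset)
        yield target - timedelta(days=offset)


def pick_date(
    target: date,
    has_article: Callable[[date], bool],
    max_offset: int = 3,
) -> Optional[date]:
    """Return the first candidate date with an article, or None if there is none.

    has_article is a hypothetical callback reporting whether an article in
    the category of interest exists for the given day.
    """
    for day in candidate_dates(target, max_offset):
        if has_article(day):
            return day
    return None
```

Under these assumptions, `target_dates()` gives 96 nominal dates (four per month over two years), and `pick_date` would be called once per date and category, with `has_article` backed by whatever index of the newspaper archive is available.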
