The Emille Corpus (Beta Release Version)
| Title | The Emille Corpus (Beta Release Version) [Electronic resource] |
| Editor | McEnery, A.M. (ed.); Baker, Paul (ed.); Hardie, Andrew (ed.) |
| Availability | This resource is freely available; you should be able to download it now. |
| Languages | English; Gujarati; Tamil; Hindi; Panjabi; Urdu; Bengali |
| Editorial Practice | Encoding format: SGML |
| OTA keywords | Linguistic corpora; Corpus |
| LC keywords | South Asia--Languages |
| Extent |
|
| Creation Date | 2003 |
| Source Description | |
| Notes | Mode of access: Online. OTA website. The collection consists of: thirty million words of monolingual written data from news website articles (Gujarati, Tamil, Hindi, Punjabi); 600,000 words of monolingual spoken data from radio broadcasts (Hindi, Urdu, Punjabi, Bengali, Gujarati); and 120,000 words of parallel data from U.K. government leaflets in each of English, Hindi, Urdu, Punjabi, Bengali and Gujarati. Further information is available at: http://www.emille.lancs.ac.uk/home.htm |
