FROM CORPUS TO LANGUAGE CURRICULUM: A DATA-BASED OR DATA-DRIVEN EXERCISE?
Hong Kong University of Science and Technology (HONG KONG)
About this paper:
Appears in: EDULEARN11 Proceedings
Publication year: 2011
Pages: 519-525
ISBN: 978-84-615-0441-1
ISSN: 2340-1117
Conference name: 3rd International Conference on Education and New Learning Technologies
Dates: 4-6 July, 2011
Location: Barcelona, Spain
Abstract:
It is almost inconceivable nowadays that a new English as a second language (ESL) curriculum could be developed without reference to information from large-scale corpora of the language. Advances in natural language databases appear, at first sight at least, to provide principled ways of selecting language items for teaching. For decades, West’s (1953) General Service List of English Words (GSL) has served as a reliable reference point for ESL curricula and materials for the early stages of language learning. However, since the 1990s, multi-million word corpora such as the British National Corpus (BNC) have provided curriculum developers with much more detailed information about English, at least in terms of word frequency and collocation. More recently, specialized corpora have become available for English as a lingua franca (ELF), for example the Vienna-Oxford International Corpus of English (VOICE), and for academic English, such as the Michigan Corpus of Academic Spoken English (MICASE) and the British Academic Written English (BAWE) corpus. These might be more appropriate sources for ESL curriculum development, at least for advanced learners.

This paper examines some of the practical issues which arise when drawing on corpus data in curriculum development. Reference is made to a collaborative project, conducted with the Hong Kong Education Bureau (EDB), that set out to develop an English vocabulary curriculum for the twelve years of compulsory education in Hong Kong. The first phase aimed to identify sets of words which students might be expected to know by the end of each of the four Key Stages (Years 3, 6, 9 and 12). Potential vocabulary items were selected according to their frequency of occurrence in the BNC and the GSL, then subjected to scrutiny by teacher representatives of the four Key Stages. Each Key Stage was represented by teachers from about 25% of Hong Kong schools. The teachers took part in a computer-based vocabulary decision-making task, in which they were asked to consider selections of words and judge their suitability for learners at the Key Stage they represented. Quantitative data generated by the decision-making tasks provided a basis for including items in the curriculum. However, the final selection of items adopted a more qualitative approach and involved reference to the topics and themes recommended in the official curriculum guides, the vocabulary content of approved English textbooks and some guiding principles established at the outset. Although frequency data about English words provided a helpful starting point for selecting vocabulary content, the teacher representatives rejected more of the potential items than the research team had anticipated. It was concluded that high-frequency items in a corpus of native English may not be useful or relevant to learners in South-East Asia. The construction of ELF corpora, particularly if these include both written and spoken English, is likely to be of direct benefit to ESL curriculum and materials developers.
Keywords:
Corpus, curriculum, ESL.