S.H. Ng1, H.P. Zhang2, C.S. Zhao1, H.H. Goh1

1Nanyang Technological University (SINGAPORE)
2Northeast Normal University (CHINA)
In the past, most ethnic Chinese children in Singapore came from Chinese-speaking homes. However, according to statistics released by the Singapore Ministry of Education (2010), the proportion of ethnic Chinese Primary One children with English as their most frequently used home language rose from 28% in 1991 to 59% in 2010. This cohort, and probably future cohorts, will have different Chinese language learning needs, and the currently implemented Chinese language curriculum may no longer suit them. To create a more realistic curriculum that is closer to daily life, one first needs to identify the level of language competence that children need to meet their daily needs and their subsequent reading needs. On this basis, this study investigates the daily lexicon of written Chinese media at the Singapore primary level, which will indicate the level of Chinese language competence needed for daily living and communication as well as for future information attainment.

This article will report the findings of the study, which is significant in several respects. Above all, the corpus will generate the first Singapore primary-level written Chinese wordlists as resources for curriculum development and teacher training. In addition, the corpus constructed in this study will be a scientific and reliable basis for the development of courseware, source materials and ICT platforms for the teaching and learning of Mandarin.

This study was carried out by constructing a specialized corpus that drew data from two broad categories of sources, namely newspapers and non-newspaper media. To keep the corpus unbiased, stratified sampling and systematic sampling were adopted during data collection. The newspaper data came from two popular student newspapers, “Thumbs Up” and “Comma”. They were drawn from headline news, entertainment information and leisure information, and account for 25% of the main corpus. The non-newspaper media account for the remaining 75% of the data, drawn from storybooks, drama scripts, teaching materials, magazines, song lyrics, comics, internet information, notices and other written texts. According to a media engagement survey carried out by the research team, the collected data are expected to be representative and homogeneous, reflecting the written Chinese that Singapore primary school children engage with. In total, the corpus contains approximately 1.2 million Chinese character-tokens.
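
The combination of stratified and systematic sampling described above can be illustrated with a minimal sketch. It is not the research team's actual procedure: the function names, the toy text identifiers and the quota of 20 texts are hypothetical, and the 25%/75% split is applied here to the number of texts, whereas the corpus proportions reported above may well be measured in character-tokens.

```python
def systematic_sample(items, n, start=0):
    """Pick n items at a fixed interval from an ordered list."""
    if n <= 0 or not items:
        return []
    if n >= len(items):
        return list(items)
    step = len(items) // n  # sampling interval
    return items[start::step][:n]


def stratified_systematic_sample(strata, shares, total):
    """Give each stratum a quota by its share, then sample it systematically."""
    sample = {}
    for name, items in strata.items():
        quota = round(total * shares[name])
        sample[name] = systematic_sample(items, quota)
    return sample


# Hypothetical candidate texts for each stratum (identifiers only).
strata = {
    "newspaper": [f"news_{i}" for i in range(40)],
    "non_newspaper": [f"other_{i}" for i in range(90)],
}
# Shares assumed from the 25% / 75% split reported for the corpus.
shares = {"newspaper": 0.25, "non_newspaper": 0.75}

sample = stratified_systematic_sample(strata, shares, total=20)
print(len(sample["newspaper"]), len(sample["non_newspaper"]))
print(sample["newspaper"])
```

Fixing the quota per stratum keeps the intended proportions, while the fixed interval spreads the selected texts evenly across each source instead of clustering them at the start.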

As the study is due to be completed by the end of this year, the research team will generate wordlists from the two corpora. The wordlists will be compiled in three layers. The first layer will consist of wordlists for the different genres, e.g. headline news, entertainment information, etc. The second layer will consist of frequency wordlists and alphabetical wordlists (ordered by Hanyu Pinyin) for each genre, e.g. a headline news frequency wordlist, a headline news alphabetical wordlist, etc. The third layer will consist of high-frequency wordlists, for instance the top 500 or top 1000 words. At the end of the study, the various wordlists will be analysed to provide insight into the daily lexicon of Singapore primary-level written Chinese.
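
The three layers described above can be sketched as follows. This is only an illustration under stated assumptions: the texts are taken to be already segmented into words, the genre names and tokens are invented, and `PINYIN` is a hypothetical toy lookup table (without tone marks) standing in for a real Hanyu Pinyin resource.

```python
from collections import Counter

# Hypothetical pre-segmented tokens for each genre.
corpus = {
    "headline_news": ["学校", "学生", "老师", "学校", "比赛", "学校", "学生"],
    "entertainment": ["电影", "歌手", "电影", "比赛"],
}

# Toy Hanyu Pinyin lookup (assumption, for illustration only).
PINYIN = {
    "学校": "xuexiao", "学生": "xuesheng", "老师": "laoshi",
    "比赛": "bisai", "电影": "dianying", "歌手": "geshou",
}


def genre_counts(corpus):
    """Layer 1: one word count table per genre."""
    return {genre: Counter(tokens) for genre, tokens in corpus.items()}


def frequency_wordlist(counts):
    """Layer 2a: words by descending frequency, ties broken by Pinyin."""
    return sorted(counts.items(),
                  key=lambda kv: (-kv[1], PINYIN.get(kv[0], kv[0])))


def alphabetical_wordlist(counts):
    """Layer 2b: words ordered by their Pinyin spelling."""
    return sorted(counts, key=lambda w: PINYIN.get(w, w))


def top_n(corpus, n):
    """Layer 3: the n most frequent words across all genres."""
    total = Counter()
    for tokens in corpus.values():
        total.update(tokens)
    return frequency_wordlist(total)[:n]


counts = genre_counts(corpus)
print(frequency_wordlist(counts["headline_news"]))
print(alphabetical_wordlist(counts["headline_news"]))
print(top_n(corpus, 2))
```

In practice, word segmentation and Pinyin conversion would each need a dedicated tool or dictionary; the sketch only shows how the same count tables can feed all three layers.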