DIGITAL LIBRARY
AUTOMATIC WORD QUIZ CONSTRUCTION USING REGULAR AND SIMPLE ENGLISH WIKIPEDIA
Waseda University (JAPAN)
About this paper:
Appears in: INTED2016 Proceedings
Publication year: 2016
Pages: 8032-8040
ISBN: 978-84-608-5617-7
ISSN: 2340-1079
doi: 10.21125/inted.2016.0889
Conference name: 10th International Technology, Education and Development Conference
Dates: 7-9 March, 2016
Location: Valencia, Spain
Abstract:
Many effective tools exist for the automated construction of vocabulary test questions (Aist 2001; Brown et al. 2005; Kunichika et al. 2003; Lee et al. 2013). They can generate a variety of question types, and scores on the resulting tests correlate well with scores from manually constructed tests. Most tools take a reading text as the primary input, determine keywords from the text, and generate test items. However, these systems are not compatible with one traditional and common method of vocabulary teaching: regular (e.g., weekly) vocabulary lists (Brown and Perry 1991; Khoii and Sharififar 2013; Sagarra and Alba 2006). In this method, the list itself is the input, and learners are expected to distinguish among the words in each list. For this purpose, many existing tools are difficult to use without some adaptation.

Word Quiz Constructor (WQC; Rose 2014) was developed to fit this niche. WQC uses the Coxhead Academic Word List (AWL; Coxhead 2000) as its input and constructs custom-made vocabulary quizzes en masse. WQC produces several question types, including multiple-choice fill-in-the-blank items, definition items, and open-blank items. Carrier sentences used to provide contexts for the target AWL test words are drawn from either on-line or off-line corpora; specifically, Wikipedia and the British Academic Written English corpus (BAWE; Gardner and Nesi 2012). Carrier sentences are filtered using tri-gram frequencies to find high-frequency contexts for test items and by evaluating their automated readability index (ARI; Smith and Senter 1967).
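To illustrate the ARI-based filtering step, the following is a minimal sketch in Python. The function names and the readability threshold are hypothetical and are not taken from WQC itself; only the ARI formula follows Smith and Senter (1967).

```python
import re


def automated_readability_index(sentence: str) -> float:
    """Automated Readability Index (Smith and Senter 1967) for one sentence.

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    Because the unit here is a single carrier sentence, the sentence count
    is 1, so the second term reduces to 0.5 * words.
    """
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    if not words:
        return float("inf")  # unusable candidate
    characters = sum(len(w) for w in words)
    return 4.71 * (characters / len(words)) + 0.5 * len(words) - 21.43


def filter_carrier_sentences(candidates, max_ari=10.0):
    """Keep candidate carrier sentences at or below a target ARI.

    The 10.0 threshold is illustrative only; the abstract does not state
    WQC's actual cut-off values.
    """
    return [s for s in candidates if automated_readability_index(s) <= max_ari]
```

In WQC this readability check is combined with tri-gram frequency filtering, which is not shown in the sketch above.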

Previous work on WQC (Rose 2014) showed that it creates test items comparable to manually produced items in terms of facility, discrimination, and distractor efficiency. However, some problems with WQC were also revealed. One key problem concerned Wikipedia as a source of context sentences: the academic writing style of regular Wikipedia articles yields many carrier sentences suitable for difficult items (i.e., high ARI) but few carrier sentences suitable for easier items (i.e., low ARI). The present research attempts to solve this problem in WQC by using Simple English Wikipedia (SEW), an alternative edition of regular English Wikipedia (EW) that aims to use more basic English vocabulary and simpler sentence structure.

In order to carry out this evaluation, 800 quiz items (400 using EW and 400 using SEW) were produced with WQC. The mean context sentence length was 16.8 words for SEW versus 19.6 words for EW, a significant difference [t(797)=6.9, p<0.001]. The difference in character length was also significant: 80.9 characters for SEW versus 96.4 for EW [t(795)=8.1, p<0.001]. The mean ARI of SEW items was 9.9 (approximately a US 10th-grade reading level) versus 11.8 (approximately 12th grade) for EW items [t(786)=7.7, p<0.001].
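As an indication of how such comparisons could be reproduced, a minimal sketch follows; scipy is used here purely for illustration, and the use of Welch's correction is an assumption, since the abstract does not specify which t-test variant was applied.

```python
from scipy import stats


def compare_sew_ew(sew_values, ew_values):
    """Independent-samples t-test between SEW-based and EW-based items.

    The inputs would be per-item measures such as carrier sentence length
    in words, length in characters, or ARI. equal_var=False applies Welch's
    correction (which yields non-integer degrees of freedom); this choice is
    an assumption, not a detail reported in the abstract.
    """
    result = stats.ttest_ind(sew_values, ew_values, equal_var=False)
    return result.statistic, result.pvalue
```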

There was some repetition of items: 397 unique items were generated using EW, but only 387 unique items using SEW. This may be because far fewer articles are available in SEW (116k) than in EW (802M) at the Wikipedia web sites. However, this disadvantage may be offset by production time: the mean time to generate a SEW item was 23.0 seconds, while EW items took 48.9 seconds on average, more than twice as long.

Preliminary results suggest that SEW may provide a better source of context sentences than EW with respect to several measures. Further evidence will be presented together with information about future development plans for WQC.
Keywords:
Language assessment, vocabulary, automated quiz construction, on-line corpora.