ESL GENIE: AN INTERACTIVE, INTELLIGENT SYSTEM FOR SECOND LANGUAGE LEARNING AND TEACHING ON THE WEB
University of California, San Diego (UNITED STATES)
About this paper:
Appears in: INTED2012 Proceedings
Publication year: 2012
Pages: 1300-1305
ISBN: 978-84-615-5563-5
ISSN: 2340-1079
Conference name: 6th International Technology, Education and Development Conference
Dates: 5-7 March, 2012
Location: Valencia, Spain
Abstract:
In a classroom setting, accommodating students with diverse backgrounds and levels of knowledge can be pedagogically challenging, especially for topics whose knowledge space contains complex dependencies. Intelligent Tutoring Systems (ITS) attempt to address this challenge by providing computer-based learning environments that adapt to each student's level, facilitating self-paced and personalized learning trajectories. Traditionally, the knowledge domains of these systems are modeled explicitly from expert knowledge, which may not match the learning dependencies a novice actually experiences, rather than being derived from a corpus of user data with machine learning techniques. In second-language learning, this domain-mapping problem is exacerbated by the wide variety of factors that shape the underlying knowledge space, such as the learner's first language and previous exposure to the target language. We are developing a web-based system for hybrid English as a Second Language (ESL) learning that takes a data-driven approach to domain mapping: it assumes only a very high-level organization of the domain knowledge and induces the actual structure of the student model from a large corpus of student response data.

We use a Bayesian belief network with a layer of unobserved "knowledge component" (KC) nodes that represent aspects of a learner’s knowledge of the target language. The meaning of these nodes is not specified in advance, but one might imagine them corresponding to beliefs such as "subjects need to agree in number with verbs" or "adjectives should precede the nouns they modify". Directed edges between these nodes represent dependencies between knowledge components and are learned from user responses. A number of learner parameters, such as overall proficiency and language background (e.g. age of exposure and the learner’s native language), set a prior over the values of the KC nodes. The learner's proficiency in each KC then specifies a distribution over their responses.
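The structure described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the ESL Genie implementation: the three KCs, the hard-coded edges (which the described system learns from data), the logistic form of the prior, the native-language bias term `l1_bias`, and the slip/guess response model (borrowed from DINA-style cognitive diagnosis models) are all hypothetical. It enumerates every knowledge state, conditions on observed answers, and predicts the probability of a correct answer to a question the learner has not yet seen.

```python
import itertools
import math

# Toy model (all names and numbers are hypothetical): three binary latent KCs.
KCS = ["kc0", "kc1", "kc2"]
# Directed KC dependencies; in the described system these edges are learned from data.
PARENTS = {"kc0": [], "kc1": ["kc0"], "kc2": ["kc0", "kc1"]}
# Hypothetical questions, each listing the KCs it requires.
QUESTIONS = {"q_agree": ["kc0"], "q_adj_order": ["kc1"], "q_mixed": ["kc1", "kc2"]}
SLIP, GUESS = 0.1, 0.25  # assumed DINA-style response noise


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def kc_prob(kc, state, learner):
    """P(kc = 1 | parent KCs, learner parameters), using an assumed logistic form."""
    z = -1.0 + 2.0 * learner["proficiency"]        # overall proficiency shifts every KC
    z += learner["l1_bias"].get(kc, 0.0)            # native-language effect on this KC
    z += 1.5 * sum(state[p] for p in PARENTS[kc])   # mastered parents raise the probability
    return sigmoid(z)


def joint_prior(learner):
    """Enumerate all 2^K knowledge states with their prior probabilities."""
    states = []
    for bits in itertools.product([0, 1], repeat=len(KCS)):
        state = dict(zip(KCS, bits))
        p = 1.0
        for kc in KCS:
            pk = kc_prob(kc, state, learner)
            p *= pk if state[kc] else 1.0 - pk
        states.append((state, p))
    return states


def p_correct(question, state):
    """Response model: high accuracy only if every required KC is mastered."""
    mastered = all(state[kc] for kc in QUESTIONS[question])
    return 1.0 - SLIP if mastered else GUESS


def posterior(learner, answers):
    """Condition the prior on observed answers, given as {question: was_correct}."""
    weighted = []
    for state, p in joint_prior(learner):
        for question, correct in answers.items():
            pc = p_correct(question, state)
            p *= pc if correct else 1.0 - pc
        weighted.append((state, p))
    total = sum(p for _, p in weighted)
    return [(s, p / total) for s, p in weighted]


def predict(learner, answers, question):
    """P(correct answer to an unanswered question | answers so far)."""
    return sum(p * p_correct(question, s) for s, p in posterior(learner, answers))


# Usage: a mid-proficiency learner whose (hypothetical) L1 makes kc1 harder.
learner = {"proficiency": 0.5, "l1_bias": {"kc1": -0.5}}
before = predict(learner, {}, "q_mixed")
after = predict(learner, {"q_adj_order": True}, "q_mixed")
print(f"P(correct on q_mixed): {before:.3f} before, {after:.3f} after q_adj_order")
```

Answering `q_adj_order` correctly raises the belief that `kc1` is mastered, and because `q_mixed` also depends on `kc1`, the predicted probability for the unanswered question rises with it. Exact enumeration is exponential in the number of KCs, so a network of realistic size would need approximate inference.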

One important application of the model is predicting learner responses to quiz questions that have not yet been answered, which matters for both assessment and pedagogy. For assessment, we want to estimate a learner’s knowledge state as efficiently as possible, which means maximizing the information supplied by each question the student answers. Once we specify which quantity we want to learn about (e.g. the learner’s first language vs. their overall proficiency), we present the learner with questions that maximize the expected decrease in entropy of that quantity. For pedagogy, on the other hand, we want to choose questions that hit the sweet spot of Krashen’s "i+1" hypothesis: questions that we predict the learner may get wrong, but that a slight shift in knowledge state would allow the learner to answer correctly. Students (and their teachers or coaches) can inspect the student model and are directed to resources targeting their weak areas. We believe that a data-driven approach to student knowledge modeling, facilitated by the wide reach of free online learning systems and the explosive growth of internet access in developing countries, can drive a revolution in personalized learning.
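The two question-selection criteria can be sketched over a small, explicitly enumerated belief about the learner's knowledge state. Everything here is a hypothetical illustration rather than the authors' method: the uniform prior, the question-to-KC map, the slip/guess values, and the operationalization of "i+1" as "wrong under the most probable knowledge state, but right if exactly one more required KC were mastered" are assumptions. The `target` argument is any function of the knowledge state and stands in for the quantity (e.g. proficiency or first language) whose entropy we want to reduce.

```python
import itertools
import math

# Hypothetical setup: 3 binary KCs, uniform prior over all 8 knowledge states.
K = 3
STATES = list(itertools.product([0, 1], repeat=K))
PRIOR = {s: 1.0 / len(STATES) for s in STATES}
QUESTIONS = {"q0": [0], "q1": [1], "q2": [1, 2]}  # indices of required KCs (assumed)
SLIP, GUESS = 0.1, 0.25


def p_correct(q, s):
    return 1.0 - SLIP if all(s[k] for k in QUESTIONS[q]) else GUESS


def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def target_marginal(belief, target):
    """Distribution of target(state), the quantity we want to learn about."""
    out = {}
    for s, p in belief.items():
        out[target(s)] = out.get(target(s), 0.0) + p
    return out


def update(belief, q, correct):
    """Bayesian update of the belief after observing one answer."""
    new = {s: p * (p_correct(q, s) if correct else 1.0 - p_correct(q, s))
           for s, p in belief.items()}
    z = sum(new.values())
    return {s: p / z for s, p in new.items()}


def expected_entropy_drop(belief, q, target):
    """H(target) minus the expected H(target) after observing the answer to q."""
    pc = sum(p * p_correct(q, s) for s, p in belief.items())
    h_after = (pc * entropy(target_marginal(update(belief, q, True), target))
               + (1.0 - pc) * entropy(target_marginal(update(belief, q, False), target)))
    return entropy(target_marginal(belief, target)) - h_after


def best_assessment_question(belief, target, asked=()):
    """Assessment: pick the unasked question with the largest expected entropy drop."""
    candidates = [q for q in QUESTIONS if q not in asked]
    return max(candidates, key=lambda q: expected_entropy_drop(belief, q, target))


def i_plus_one_questions(belief):
    """Pedagogy: questions wrong under the most probable state but one KC away from right."""
    map_state = max(belief, key=belief.get)
    picks = []
    for q, required in QUESTIONS.items():
        missing = [k for k in required if not map_state[k]]
        if len(missing) == 1:
            picks.append(q)
    return picks


# Usage: if we only care about KC 0, q0 is the only informative question.
print(best_assessment_question(PRIOR, target=lambda s: s[0]))  # -> q0
# A learner who gets q0 and q1 right but q2 wrong most likely lacks only KC 2.
belief = update(update(update(PRIOR, "q0", True), "q1", True), "q2", False)
print(i_plus_one_questions(belief))  # -> ['q2']
```

The two criteria pull in different directions: the assessment rule favors questions whose answer is most uncertain with respect to the chosen target, while the i+1 rule deliberately favors questions just beyond the learner's current estimated state.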
Keywords:
Computer-Assisted Language Learning (CALL), Intelligent Tutoring Systems (ITS), English as a Second Language (ESL), Second Language Acquisition (SLA).