Radboud University Nijmegen (NETHERLANDS)
About this paper:
Appears in: ICERI2021 Proceedings
Publication year: 2021
Pages: 6188-6195
ISBN: 978-84-09-34549-6
ISSN: 2340-1095
doi: 10.21125/iceri.2021.1393
Conference name: 14th annual International Conference of Education, Research and Innovation
Dates: 8-9 November, 2021
Location: Online Conference
Reading is a learned skill that children acquire through instruction and practice. A desirable feature of that practice is that children can read aloud under the guidance of a teacher. Unfortunately, this is not always possible to a sufficient extent because of general time constraints in teacher-fronted education. For this reason, experts have been looking at Automatic Speech Recognition (ASR) technology as a possible alternative to the "listening ear" that is usually lent by teachers. For reading in English especially, the contribution of ASR technology has been investigated from various perspectives, and this research has even led to relatively successful commercial products. In general, the aim of the ASR technology in these products is to follow children as they read aloud and to provide some form of support when they hesitate.

An important requirement for this kind of research and for these applications is that ASR of child speech is of sufficient quality. In turn, this requires that the ASR algorithms be trained on large amounts of child speech recordings, which are generally difficult to obtain.

A compounding problem with child speech recognition is that children, as a group, display an enormous amount of variation in speech characteristics, which is in part related to their variable physical characteristics. This means that large amounts of speech recordings are needed for each age cohort, which makes achieving high-quality ASR performance even more difficult.

In fact, research and applications have so far mostly addressed English because larger amounts of child speech recordings could be obtained for English than for other languages. As a way of circumventing the data sparsity problem, speech technologists have investigated child speech recognition by applying several techniques that had been developed for low-resource languages.

In general, research has shown that the younger the children, the more difficult it is to achieve good ASR performance. At the same time, ASR-based applications for learning to read are likely to be needed most by young children.

In this paper we report on research aimed at developing and testing a Dutch ASR-based reading tutor. Innovative features of this system are that it is intended for young children in the earlier stages of learning to read, when they start developing decoding skills, and that it addresses reading in Dutch. A preliminary study, conducted during the pandemic when children were primarily working from home, showed encouraging results. In the present paper we report on a recent, larger study in which children used the system in the classroom. After collecting over 574 thousand words from 265 first graders from 44 different schools, we analyzed the children's successive attempts at reading words and sentences before and after they received various types of ASR-based feedback. The results indicate that successive attempts were characterized by improved reading accuracy and fluency. After the pupils were provided with feedback, a significant improvement in accuracy was observed in accuracy exercises, and an improvement in fluency was observed in fluency exercises. We also present the words that were most commonly mispronounced by the pupils during practice. We discuss these results in relation to those of previous studies and consider possible avenues for future research.
Keywords: Reading tutor, child speech, individualized feedback, ASR, autonomous learning, learning environments.