TAILORING COMPUTER-ASSISTED PRONUNCIATION TEACHING: MIXING AND MATCHING THE MODE AND MANNER OF FEEDBACK TO LEARNERS
1 University of Aizu (JAPAN)
2 Peter the Great St. Petersburg Polytechnic University (RUSSIAN FEDERATION)
About this paper:
Appears in: INTED2022 Proceedings
Publication year: 2022
Pages: 767-773
ISBN: 978-84-09-37758-9
ISSN: 2340-1079
doi: 10.21125/inted.2022.0263
Conference name: 16th International Technology, Education and Development Conference
Dates: 7-8 March, 2022
Location: Online Conference
Abstract:
Computer-assisted pronunciation teaching tools aim to provide meaningful feedback to learners. StudyIntonation (www.studyintonation.org) is a computer-assisted prosody training environment comprising a digital signal processing core, speech processing components, and pitch visualization and evaluation algorithms, along with interactive mobile tools. Our goal is to provide multimodal feedback tailored to learner preferences. Such feedback includes evaluative and actionable components. Instructive audio and visual feedback is tailored using personalized features so that learners can better understand where their pronunciation is inappropriate and what to do to improve it.

Representing speech visually as pitch contours of the model and the learner has a positive effect on the learner’s pronunciation, which is an important part of language proficiency. For tonal languages, such as Chinese and Vietnamese, correct intonation is important at both the phrasal and morpheme levels, since conveying the correct meaning is tightly connected to appropriate and accurate tone articulation. Even for non-tonal languages, such as English or Japanese, adequate modeling of tone movements within an utterance helps learners connect to the basic cognitive mechanisms of language.

We have developed a learning environment that provides feedback on pronunciation exercises based on signal processing algorithms. These algorithms are used to construct pitch graphs displayed on a mobile screen, supported by an audio-visual content repository and an extensible course developer’s toolkit.

Each pronunciation exercise consists of a model audio recording made by native speakers, its text, and the plotted model pitch contour with its rhythmic portrait, all of which are presented to the user. The app enables learners to record their own attempts with a view to replicating the pitch and rhythm of the model. Learner attempts are plotted alongside the model to show how closely they match it.

The visual feedback is supported by a metric of the distance between the graphs, based on a dynamic time warping (DTW) algorithm that makes the estimate tempo-invariant. Though DTW provides an objective primary estimate, we are working on matching the mode and manner of feedback so that it is tailored to meet or exceed learner expectations.
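The tempo invariance of a DTW-based distance can be illustrated with a minimal sketch. The code below is not the StudyIntonation implementation; it is a textbook DTW over two pitch contours, assuming each contour is a plain list of pitch values in Hz and that the local cost is the absolute pitch difference. The function name `dtw_distance` and the sample contours are hypothetical.

```python
import math


def dtw_distance(model, attempt):
    """Return the cumulative DTW alignment cost between two pitch contours.

    Assumption: both contours are non-empty lists of pitch values (Hz),
    and the local cost is the absolute difference between two values.
    """
    n, m = len(model), len(attempt)
    if n == 0 or m == 0:
        raise ValueError("both contours must be non-empty")

    # cost[i][j] is the cheapest alignment of model[:i] with attempt[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(model[i - 1] - attempt[j - 1])
            # Allow a match, or a stretch of either contour in time.
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both contours
                cost[i - 1][j],      # attempt sample repeats
                cost[i][j - 1],      # model sample repeats
            )
    return cost[n][m]


# Hypothetical rise-fall contour, the same shape spoken at half tempo,
# and a monotone attempt with no pitch movement.
model = [100, 120, 140, 120, 100]
slow_attempt = [100, 100, 120, 120, 140, 140, 120, 120, 100, 100]
flat_attempt = [100, 100, 100, 100, 100]

print(dtw_distance(model, slow_attempt))
print(dtw_distance(model, flat_attempt))
```

In this sketch the slower attempt aligns with the model at no cost because only the timing differs, while the monotone attempt is penalized for the missing pitch movement. A production system would typically normalize the pitch scale and the path length before comparing speakers.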

In particular, multiple attempts by the user can be processed, played back, and displayed on the same screen to demonstrate learner progress more clearly. Segmenting and highlighting the parts of the pitch contour that correspond to relatively independent segments (e.g. tones in syllables, stressed words, intonation variability, etc.) can help users focus on particular aspects. All this can give learners a better understanding of their progress. However, such graphs still lack innate corrective or instructive personalized value. Therefore, we need metrics that enable the estimation of progress and prosody production. Time-frequency features, which are fairly well suited to automatic classification, are poorly suited to capturing synchronization and coupling effects during the attempts. Approaches based on non-linear dynamics theory, such as recurrence quantification analysis, can contribute to producing an instructive and more personalized analysis of synchronization between the initial model, the learner, and the referential native speaker’s attempts.
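As a rough illustration of the recurrence-based idea, the sketch below computes a cross-recurrence matrix between two pitch contours and its recurrence rate, one of the simplest recurrence quantification measures. It is a simplification under stated assumptions and not the analysis used in the paper: the contours are plain lists of pitch values in Hz, no time-delay embedding is applied, and the threshold `eps` and the function names are hypothetical.

```python
def cross_recurrence_matrix(x, y, eps):
    """Return a 0/1 matrix where entry (i, j) is 1 if x[i] and y[j] are close.

    Assumption: closeness is an absolute pitch difference of at most eps,
    with no time-delay embedding of the contours.
    """
    return [[1 if abs(a - b) <= eps else 0 for b in y] for a in x]


def recurrence_rate(matrix):
    """Return the fraction of recurrent points in a cross-recurrence matrix."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one entry")
    return sum(sum(row) for row in matrix) / total


# Hypothetical model and learner contours (Hz) and a 5 Hz tolerance.
model = [100, 120, 140]
learner = [100, 121, 160]

matrix = cross_recurrence_matrix(model, learner, eps=5)
for row in matrix:
    print(row)
print(recurrence_rate(matrix))
```

Here the learner stays close to the model on the first two samples and diverges on the third, so recurrent points appear only at the start of the main diagonal. Fuller recurrence quantification analysis would also examine diagonal line structures, which reflect how long the two contours stay synchronized.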
Keywords:
Computer-assisted prosody training, L2 education, mobile technology, pitch graph, multimodality, tailored feedback, speech processing.