1 Universitat Autònoma de Barcelona (SPAIN)
2 Radboud Universiteit Nijmegen (NETHERLANDS)
About this paper:
Appears in: EDULEARN14 Proceedings
Publication year: 2014
Pages: 3696-3705
ISBN: 978-84-617-0557-3
ISSN: 2340-1117
Conference name: 6th International Conference on Education and New Learning Technologies
Dates: 7-9 July, 2014
Location: Barcelona, Spain
In a companion paper (Carranza et al.) submitted to this conference, we discuss the importance of collecting specific L1-L2 speech corpora for developing effective Computer-Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point in more depth by reporting on a study that compiled and analyzed such a corpus in order to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that makes use of Automatic Speech Recognition (ASR). In particular, we discuss some of the results obtained in the analyses of this corpus and some of the methodological issues we had to deal with.

The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of L2 Spanish. The speech data was segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic levels using Praat (Boersma & Weenink, 2009).

We adopted the SAMPA phonemic inventory for the phonemic transcription in Spanish (Llisterri and Mariño, 1993) and added 11 new symbols and 7 diacritics taken from X-SAMPA (Wells, 1994) for the narrow-phonetic transcription. Non-linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech (TEI, 2013; Gibbon et al., 1998), as well as the error encoding system used in the project, will be addressed. Up to 13,410 errors were segmented, aligned with the canonical-phonemic tier and the narrow-phonetic tier, and annotated following this encoding system.

Mispronunciations were annotated using an encoding system that specifies the type of error (substitution, insertion or deletion), the affected phone, and the preceding and following phonemic contexts in which the error occurred. We then carried out additional analyses to check the accuracy of the transcriptions by asking other annotators to transcribe a subset of the speech material, and calculated intra-transcriber and inter-transcriber agreement coefficients.
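
As an illustration of the encoding and the agreement check described above, the following Python sketch stores one annotation per mispronunciation and compares the error-type labels assigned by two transcribers. It is a minimal sketch under stated assumptions: the field names, the "#" word-boundary mark and the toy annotations are hypothetical, and Cohen's kappa is assumed as the agreement coefficient because the abstract does not name the one that was used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAnnotation:
    """One annotated mispronunciation (field names are hypothetical)."""
    error_type: str     # "substitution", "insertion" or "deletion"
    phone: str          # affected phone, written in SAMPA
    left_context: str   # preceding phoneme ("#" marks a word boundary here)
    right_context: str  # following phoneme


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Toy annotations of the same six segments by two transcribers (invented data).
transcriber_a = [
    ErrorAnnotation("substitution", "r", "a", "o"),
    ErrorAnnotation("substitution", "l", "#", "a"),
    ErrorAnnotation("deletion", "s", "e", "#"),
    ErrorAnnotation("insertion", "u", "k", "t"),
    ErrorAnnotation("substitution", "x", "#", "u"),
    ErrorAnnotation("deletion", "d", "a", "#"),
]
types_a = [e.error_type for e in transcriber_a]
types_b = ["substitution", "deletion", "deletion",
           "insertion", "substitution", "substitution"]

kappa = cohen_kappa(types_a, types_b)
print(f"kappa = {kappa:.3f}")
```

The same function could be applied to two passes by a single transcriber to obtain an intra-transcriber figure.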

The data was automatically extracted with Praat scripts and statistically analyzed with R. The frequency ratios obtained for the most frequent errors and the most frequent contexts of appearance were tested for statistical significance.
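
The kind of frequency analysis described above can be sketched as follows. This is only an illustration in Python, not the R analysis used in the study: the abstract does not say which significance test was applied, so a chi-square goodness-of-fit test against equal expected counts is assumed, and the counts are made up. The closed-form p-value used here holds only for 2 degrees of freedom, which is the case for three error types.

```python
import math
from collections import Counter


def relative_frequencies(labels):
    """Map each label to its share of all observations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees
    of freedom, where the survival function reduces to exp(-x / 2)."""
    return math.exp(-statistic / 2)


# Made-up counts for illustration only; they are not results from the corpus.
toy_counts = {"substitution": 60, "insertion": 25, "deletion": 15}
stat = chi_square_uniform(list(toy_counts.values()))
p = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```
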
We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to detect these errors reliably. Furthermore, we suggest possible exercises that may be included in the training to remedy the errors identified.
Keywords: Non-native speech corpus, Japanese L1 - Spanish L2, automatic speech recognition.