About this paper

Appears in: EDULEARN14 Proceedings
Pages: 3696-3705
Publication year: 2014
ISBN: 978-84-617-0557-3
ISSN: 2340-1117

Conference name: 6th International Conference on Education and New Learning Technologies
Dates: 7-9 July, 2014
Location: Barcelona, Spain

A CORPUS-BASED STUDY OF SPANISH L2 MISPRONUNCIATIONS BY JAPANESE SPEAKERS

M. Carranza Díez1, C. Cucchiarini2, J. Llisterri1, M. J. Machuca1, A. Ríos1

1Universitat Autònoma de Barcelona (SPAIN)
2Radboud Universiteit Nijmegen (NETHERLANDS)
In a companion paper submitted to this conference (Carranza et al.), we discuss the importance of collecting specific L1-L2 speech corpora for developing effective Computer-Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point in more depth by reporting on a study aimed at compiling and analyzing such a corpus in order to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that uses Automatic Speech Recognition (ASR). In particular, we discuss some of the results obtained from the analyses of this corpus and some of the methodological issues we had to deal with.

The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of Spanish L2. The speech data was segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic levels using Praat (Boersma & Weenink, 2009).

We adopted the SAMPA phonemic inventory for the phonemic transcription in Spanish (Llisterri and Mariño, 1993) and added 11 new symbols and 7 diacritics taken from X-SAMPA (Wells, 1994) for the narrow-phonetic transcription. Non-linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech (TEI, 2013; Gibbon et al., 1998), as well as the error encoding system used in the project, will be addressed. Up to 13410 errors were segmented, aligned with the canonical-phonemic tier and the narrow-phonetic tier, and annotated following this encoding system.

Mispronunciations were annotated using an encoding system that specifies the type of error (substitution, insertion or deletion), the affected phone, and the preceding and following phonemic contexts in which the error occurred. We then carried out additional analyses to check the accuracy of the transcriptions by asking other annotators to transcribe a subset of the speech material. We calculated intratranscriber and intertranscriber agreement coefficients.
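
The abstract does not say which agreement coefficient was used. As an illustration only, the following Python sketch computes Cohen's kappa, one common choice, for two transcribers who labelled the same segments. The function name and the label sequences are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Proportion of segments on which both transcribers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each transcriber's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both transcribers used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical phone labels assigned to six segments (invented data).
transcriber_1 = ["s", "s", "T", "r", "l", "s"]
transcriber_2 = ["s", "T", "T", "r", "r", "s"]
kappa = cohen_kappa(transcriber_1, transcriber_2)
```

The same function could serve for intratranscriber agreement by passing two passes of one annotator over the same material.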

The data was retrieved automatically with Praat scripts and statistically analyzed with R. The frequency ratios obtained for the most frequent errors and their most frequent contexts of appearance were then tested for statistical significance.
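
The study itself used Praat scripts and R. Purely as an illustration, the Python sketch below shows one possible way to hold an error annotation with the fields named in the encoding system (error type, affected phone, preceding and following context) and to turn a list of such records into relative frequencies. The class, field and function names are hypothetical, and the records are invented, not results from the corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    error_type: str  # "substitution", "insertion" or "deletion"
    phone: str       # affected phone, in SAMPA
    left: str        # preceding phonemic context
    right: str       # following phonemic context ("#" marks a boundary here)


def relative_frequencies(records, key):
    """Share of records per category, most frequent category first."""
    counts = Counter(key(r) for r in records)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.most_common()}


# Invented example records, for illustration only.
records = [
    ErrorRecord("substitution", "l", "a", "o"),
    ErrorRecord("substitution", "l", "e", "a"),
    ErrorRecord("insertion", "u", "s", "#"),
    ErrorRecord("deletion", "s", "e", "#"),
]

by_type = relative_frequencies(records, lambda r: r.error_type)
by_context = relative_frequencies(records, lambda r: (r.phone, r.left, r.right))
```

Grouping by a key function keeps the same routine usable for error types, affected phones or full contexts. Significance testing is left out of the sketch.
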
We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to detect these errors reliably. Furthermore, we suggest possible exercises that may be included in the training to remedy the errors identified.
@InProceedings{CARRANZADIEZ2014ACO,
author = {Carranza D{\'{i}}ez, M. and Cucchiarini, C. and Llisterri, J. and Machuca, M. J. and R{\'{i}}os, A.},
title = {A CORPUS-BASED STUDY OF SPANISH L2 MISPRONUNCIATIONS BY JAPANESE SPEAKERS},
series = {6th International Conference on Education and New Learning Technologies},
booktitle = {EDULEARN14 Proceedings},
isbn = {978-84-617-0557-3},
issn = {2340-1117},
publisher = {IATED},
location = {Barcelona, Spain},
month = {7-9 July, 2014},
year = {2014},
pages = {3696-3705}}
TY - CONF
AU - M. Carranza Díez
AU - C. Cucchiarini
AU - J. Llisterri
AU - M. J. Machuca
AU - A. Ríos
TI - A CORPUS-BASED STUDY OF SPANISH L2 MISPRONUNCIATIONS BY JAPANESE SPEAKERS
SN - 978-84-617-0557-3/2340-1117
PY - 2014
Y1 - 7-9 July, 2014
CI - Barcelona, Spain
JO - 6th International Conference on Education and New Learning Technologies
JA - EDULEARN14 Proceedings
SP - 3696
EP - 3705
ER -
M. Carranza Díez, C. Cucchiarini, J. Llisterri, M. J. Machuca, A. Ríos (2014) A CORPUS-BASED STUDY OF SPANISH L2 MISPRONUNCIATIONS BY JAPANESE SPEAKERS, EDULEARN14 Proceedings, pp. 3696-3705.