Polytechnic of Milan (ITALY)
About this paper:
Appears in: ICERI2014 Proceedings
Publication year: 2014
Pages: 4742-4749
ISBN: 978-84-617-2484-0
ISSN: 2340-1095
Conference name: 7th International Conference of Education, Research and Innovation
Dates: 17-19 November, 2014
Location: Seville, Spain
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application that addresses the problem of learning English as a foreign language for native speakers of Italian (or of other transparent Romance languages) with dyslexia. We aim to enrich traditional learning-based methods and to leverage a multi-sensory, emotional approach.
The basic idea is to invite the student to imitate the pronunciation and prosody of a native English speaker, while giving the student real-time visual and auditory feedback and an overall evaluation.

Of course, a perfect imitation of the native speaker is impossible but, as in a musical performance, imitation is a natural and efficient way to stimulate the subject’s abilities (in our case, modulating and controlling her/his oral linguistic production).
Two types of activities are defined: a Prosodic Session and a Pronunciation Session. Both sessions are conducted in a room equipped with a sound system and a smart board.

The Prosodic Session will consist of a simple, real-time visualization (similar to "karaoke") of the most important vocal parameters: pitch, amplitude, silences, timbre evaluation, and harmonicity. The subject will imitate the speaker’s voice, leveraging the vocal parameter graphs generated by the system.
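As a rough illustration of how two of these parameters (pitch and amplitude, with a simple silence check) could be computed for one audio frame, here is a minimal Python sketch. It is not the KaSPAR implementation: the function name `frame_features`, the autocorrelation pitch estimator, the 75-500 Hz search range, and the -40 dB silence threshold are all assumptions chosen for the example.

```python
import numpy as np

def frame_features(frame, sample_rate, fmin=75.0, fmax=500.0, silence_db=-40.0):
    """Return (pitch_hz or None, rms_db) for one mono frame of samples in [-1, 1].

    Hypothetical helper: the thresholds and search range are illustrative only.
    """
    frame = np.asarray(frame, dtype=float)

    # Amplitude: root-mean-square level in dB relative to full scale.
    rms = np.sqrt(np.mean(frame ** 2))
    rms_db = 20.0 * np.log10(max(rms, 1e-12))

    # Silence: frames below the (assumed) threshold carry no pitch value.
    if rms_db < silence_db:
        return None, rms_db

    # Pitch: strongest autocorrelation peak within the allowed lag range.
    centered = frame - frame.mean()
    ac = np.correlate(centered, centered, mode="full")[len(centered) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(ac) - 1)
    if lag_max <= lag_min:
        return None, rms_db  # frame too short for the requested pitch range
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag, rms_db


# Usage: a 0.1 s synthetic tone at 200 Hz, sampled at 16 kHz.
sr = 16000
t = np.arange(int(0.1 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 200.0 * t)
pitch_hz, level_db = frame_features(tone, sr)
silent_pitch, silent_db = frame_features(np.zeros(1600), sr)
```

Applying such a function to successive short frames would yield pitch and amplitude curves that can be drawn next to the reference speaker's curves. A real-time system would likely need a more robust pitch tracker than plain autocorrelation.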

The Pronunciation Session is dedicated to the pronunciation of vowels, consonant groups, and syllabic groups belonging to specific words (a typical difficulty for people with dyslexia). Subjects pronounce specific words containing sensitive linguistic patterns (such as “dad” / “did”), and the system generates a graph showing the position of the phoneme of interest within a reference schema (for example, the vowel triangle). The smart board will allow the student to interact with the graph and listen to the correct pronunciation of each phoneme.
Analysis protocols, based on data collected from the sessions, will assess improvements in the subject’s speech abilities and the impact on her/his specific issues.

The project is based on the extraction of acoustic features already known in the Music Information Retrieval field and in MPEG-7 encoding. For the prosodic session, both on-line and off-line processing will be carried out, with specific attention to making the final parameters independent of equipment and environmental conditions. For the pronunciation session, individual phonemes will be recognized by means of Gaussian Markov Models; multi-dimensional feature vectors, collected from native English speakers, will be used to train these models. To reduce computational complexity, the initial list of features will be reduced by means of PCA and SVD.

The design of the mathematical models and software functionalities has been completed; implementation is in progress. Testing and validation will take place at La Musa, a new joint applied research laboratory of Politecnico di Milano and Fondazione Sequeri Esagramma, a non-profit organization providing support and rehabilitation programs to children and adults with cognitive or mental problems.
Keywords: Dyslexia, learning disabilities, pronunciation, prosody, second language, English learning, features, Gaussian Markov Models, music information retrieval.