REALIZING A DYNAMIC SIGN LANGUAGE RECOGNITION FOR DEAF-MUTE’S EDUCATION
Northwest Normal University (CHINA)
About this paper:
Appears in: INTED2021 Proceedings
Publication year: 2021
Pages: 3675-3681
ISBN: 978-84-09-27666-0
ISSN: 2340-1079
doi: 10.21125/inted.2021.0762
Conference name: 15th International Technology, Education and Development Conference
Dates: 8-9 March, 2021
Location: Online Conference
Abstract:
Background & Motivation:
Sign language is the primary way for deaf-mutes to communicate with hearing people. However, classroom teaching for the deaf-mute faces communication difficulties because few teachers have mastered sign language. Automatically recognizing sign language with artificial intelligence technology (AIT) can help teachers understand the expressions of deaf-mute students and thus teach them more effectively. In this way, AIT can alleviate the shortage of sign language teachers in deaf-mute education.

Methods:
This paper proposes a multimodal dynamic sign language recognition method based on a 3D convolutional neural network with a Time-Space Pyramid Pooling block, named TSPP-C3DNet. The method consists of two stages. In the first stage, we train two models end-to-end on RGB and depth video, respectively, to reduce the time and space complexity of the overall network computation. In the second stage, the models trained in the first stage are used to extract RGB and depth video features, and the overall model is trained on the multimodal features by joint fine-tuning.
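The abstract does not include code, but the core idea of a Time-Space Pyramid Pooling block can be illustrated: features from a 3D convolutional backbone are pooled over a pyramid of temporal-spatial grids and concatenated, so that clips of any length (16, 32, 48 or 64 frames) yield a fixed-length vector. The sketch below is an illustrative NumPy implementation; the pyramid levels and feature shapes are assumptions, not taken from the paper.

```python
import numpy as np

def adaptive_avg_pool3d(x, out_shape):
    """Average-pool a (C, T, H, W) feature volume down to (C, t, h, w),
    splitting each axis into roughly equal bins (PyTorch-style adaptive pooling)."""
    c, T, H, W = x.shape
    t, h, w = out_shape
    out = np.zeros((c, t, h, w))
    for i in range(t):
        t0, t1 = (i * T) // t, ((i + 1) * T + t - 1) // t  # floor/ceil bin bounds
        for j in range(h):
            h0, h1 = (j * H) // h, ((j + 1) * H + h - 1) // h
            for k in range(w):
                w0, w1 = (k * W) // w, ((k + 1) * W + w - 1) // w
                out[:, i, j, k] = x[:, t0:t1, h0:h1, w0:w1].mean(axis=(1, 2, 3))
    return out

def tspp(x, levels=((1, 1, 1), (2, 2, 2), (4, 2, 2))):
    """Time-Space Pyramid Pooling: pool the feature volume at several
    (t, h, w) grid resolutions and concatenate into one fixed-length vector.
    The `levels` here are illustrative, not the paper's configuration."""
    return np.concatenate([adaptive_avg_pool3d(x, lv).ravel() for lv in levels])
```

With 64 feature channels, both a 16-frame and a 32-frame clip produce the same output length (64 channels x (1 + 8 + 16) bins = 1600 values), which is what lets one classifier head handle variable-length sign language videos.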

Results:
The method was tested at different temporal scales (16, 32, 48 and 64 video frames) and achieved high recognition accuracy on 100 vocabularies of the Chinese Sign Language (CSL) dataset. In particular, TSPP-C3D (32-frame) reached an accuracy of 88.63%, which was 11.03% higher than the best result of C3D (32-frame). The experimental results further demonstrate the method's effectiveness, and TSPP-C3DNet can effectively recognize complex sign language from longer video sequences.

Conclusion:
This paper proposes a multimodal dynamic sign language recognition method that recognizes dynamic sign language more accurately, thereby helping teachers understand deaf-mute students' expressions and easing communication problems in classroom teaching for the deaf-mute.
Keywords:
Deaf-mutes education, dynamic sign language recognition, 3D convolution, Time-Space Pyramid Pooling, multimodal data.