1 University Politehnica of Bucharest (ROMANIA)
2 IT Center for Science and Technology (ROMANIA)
3 M.A., Izriis Institute, Ljubljana (SLOVENIA)
About this paper:
Appears in: EDULEARN21 Proceedings
Publication year: 2021
Pages: 6342-6351
ISBN: 978-84-09-31267-2
ISSN: 2340-1117
doi: 10.21125/edulearn.2021.1285
Conference name: 13th International Conference on Education and New Learning Technologies
Dates: 5-6 July, 2021
Location: Online Conference
Given the rapid development of robotic technologies and the fast growth of the ageing population, assistive robots have become an important means of supporting elderly people. Not surprisingly, human-robot interaction plays a crucial role in the booming market for intelligent personal services.

Most elderly people have difficulty using recent technological advances, as they adapt to change less readily. It is therefore crucial for a robotic framework to offer an easy way of interacting with people. There are two main ways of performing human-robot interaction: gestures and voice commands. It has been shown that a robot's transparency increases the patient's trust in it. For example, when a robot explains its actions out loud, the user perceives it as more trustworthy. This paper presents a platform that helps elderly people learn how to interact with robotic platforms. To this end, we investigated different gestures and a set of voice commands in order to make human-robot interaction easier to learn.

It is worth noting that the technical literature contains few efforts on gesture-based interaction between robots and the elderly. There are, of course, many devices nowadays (e.g. VR headsets) that incorporate high-performance hand gesture recognition. However, these are mostly designed for young people, which makes them less suitable for the elderly. The use of convolutional neural networks (CNNs) for gesture recognition was introduced more recently. A CNN-based gesture recognition system was recently proposed and compared with the standard hand tracking provided by the Kinect sensors. Five categories of gestures are recognized. The solution consists of training a CNN on 125 images of size 28×28×1 for each gesture category. The original images were captured from a distance of 1.5–1.7 m, and the region of interest containing the hand is selected using a depth map threshold. The reported accuracy is 95.53%. We tested these mechanisms on public specialized datasets, as well as on datasets that we recorded specifically for this purpose.
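
The depth-threshold step described above can be illustrated with a minimal sketch. It assumes the hand is the object closest to the sensor, that the depth map is given in metres with 0 marking missing readings, and that a simple nearest-neighbour resampling is acceptable. The function name `extract_hand_patch` and the threshold parameter `max_depth_m` are hypothetical and are not taken from the cited system.

```python
import numpy as np


def extract_hand_patch(gray, depth, max_depth_m, out_size=28):
    """Crop the pixels closer than max_depth_m and resample to out_size x out_size x 1.

    gray  : 2-D uint8 intensity image
    depth : 2-D depth map in metres, same shape as gray (0 = no reading)
    Returns a float32 array scaled to [0, 1], or None if nothing is close enough.
    """
    # Keep only valid depth readings in front of the threshold (assumed: the hand).
    mask = (depth > 0) & (depth < max_depth_m)
    if not mask.any():
        return None

    # Bounding box of the thresholded region.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    # Zero out background pixels inside the box, then crop.
    crop = np.where(mask, gray, 0)[top:bottom, left:right]

    # Nearest-neighbour resampling to the CNN input size.
    r_idx = np.arange(out_size) * crop.shape[0] // out_size
    c_idx = np.arange(out_size) * crop.shape[1] // out_size
    patch = crop[np.ix_(r_idx, c_idx)].astype(np.float32) / 255.0

    # Add the single channel axis: (out_size, out_size, 1).
    return patch[..., np.newaxis]
```

With a subject standing 1.5–1.7 m from the sensor and the hand stretched towards it, a threshold somewhat below the body distance would isolate the hand. The appropriate value depends on the setup and is not stated in the abstract.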

To perform human-robot interaction using voice commands, we integrated a voice module composed of five components: Audio Preprocessing, Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialog Management (DM) and Text-to-Speech. The platform was used for the NLU component. Different intents and entities were created for each language. The text produced by the ASR component is sent to the platform, which extracts the intent and the entity or entities. The extracted information is then sent to the DM component. Tests were performed for English and Romanian.
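
The flow of data through the five components can be sketched as follows. This is only an illustration of the chaining described above: the class names, the toy intent `fetch_object` and the stand-in functions are all hypothetical, and real ASR, NLU and text-to-speech engines would replace the placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class NluResult:
    """Intent and entities extracted from one utterance."""
    intent: str
    entities: dict = field(default_factory=dict)


@dataclass
class VoicePipeline:
    """Chains the five components in the order given in the text."""
    preprocess: Callable[[bytes], bytes]        # Audio Preprocessing
    asr: Callable[[bytes], str]                 # Automatic Speech Recognition
    nlu: Callable[[str], NluResult]             # Natural Language Understanding
    dialog_manager: Callable[[NluResult], str]  # Dialog Management
    tts: Callable[[str], bytes]                 # Text-to-Speech

    def handle(self, raw_audio: bytes) -> bytes:
        clean = self.preprocess(raw_audio)
        text = self.asr(clean)
        parsed = self.nlu(text)
        reply = self.dialog_manager(parsed)
        return self.tts(reply)


# Toy stand-ins (assumptions): "audio" is simply UTF-8 text padded with zero bytes.
def toy_nlu(text: str) -> NluResult:
    words = text.lower().split()
    if "bring" in words:
        return NluResult("fetch_object", {"object": words[-1]})
    return NluResult("unknown")


def toy_dialog_manager(result: NluResult) -> str:
    # The reply states the planned action out loud.
    if result.intent == "fetch_object":
        return f"I am going to bring the {result.entities['object']}."
    return "Sorry, I did not understand."


pipeline = VoicePipeline(
    preprocess=lambda audio: audio.strip(b"\x00"),
    asr=lambda audio: audio.decode("utf-8"),
    nlu=toy_nlu,
    dialog_manager=toy_dialog_manager,
    tts=lambda text: text.encode("utf-8"),
)
```

Keeping each component behind a plain callable means that a language-specific NLU model (for example one for English and one for Romanian) could be swapped in without touching the rest of the chain.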

Human-robot interaction based on gestures and voice commands was tested on the TIAGo robot. The tests involved 30 elderly people. Based on the experiments, we identified the gestures and voice commands that were both recognised with high accuracy and learned more easily by the elderly participants. A survey of the 30 participants also revealed their preferences regarding the type of interaction and their opinion on which one is easiest to use. In the future, we plan to extend the sets of gestures and voice commands used for human-robot interaction.
Keywords: Human-robot interaction, gesture, voice, user friendly, TIAGo robot.