University of Málaga (SPAIN)
About this paper:
Appears in: ICERI2014 Proceedings
Publication year: 2014
Pages: 4836-4845
ISBN: 978-84-617-2484-0
ISSN: 2340-1095
Conference name: 7th International Conference of Education, Research and Innovation
Dates: 17-19 November, 2014
Location: Seville, Spain
Robotics has become a common subject in many engineering degrees and postgraduate programs. While undergraduate students are introduced to basic theoretical concepts and tools, postgraduate courses need to offer a broader view of the robotics field, and therefore must cover more complex topics.

One of those advanced subjects is Cognitive Robotics, which covers aspects such as automatic symbolic reasoning, decision-making, task planning and machine learning. In particular, decision-making algorithms involve mechanisms that represent the cognitive processes capable of selecting a sequence of actions that leads near-optimally to a specific outcome. Reinforcement Learning (RL) is a machine learning and decision-making methodology that does not require a model of the environment in which the robot operates; it compensates for the missing model by making observations. In short, the robot starts from a state s, executes an action a, and observes which state s' it has reached and which associated reward R(s'|s,a) it has obtained.
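
The observation cycle described above can be illustrated with a minimal sketch. The corridor world, the `step` function and the `collect_experience` helper are hypothetical and are not taken from the paper; they only show that the learner sees (s, a, s', r) tuples and never the model that produced them.

```python
import random

# Hypothetical 1-D corridor: states 0..4, with the goal at state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Toy stand-in for the real world: returns (next_state, reward).

    The learner never inspects this function; it only sees its outputs.
    """
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def collect_experience(n_steps, seed=0):
    """Execute random actions and record (s, a, s', r) tuples."""
    rng = random.Random(seed)
    state = 0
    experience = []
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        experience.append((state, action, next_state, reward))
        # Restart from the left end once the goal is reached.
        state = 0 if next_state == N_STATES - 1 else next_state
    return experience
```

On a physical robot, `step` would be replaced by sending a motor command and reading the sensors, which is where the practical difficulties of a real platform appear.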

To get the greatest educational benefit, RL theory should be complemented with a hands-on RL task. Of course, this could be done by programming the RL algorithm in some language or software, like Matlab, but, since we are in the robotics realm, the problem to be addressed should include a real robot. In this way, students get a complete view of the robotic learning problem, as well as first-hand experience of the problems that arise when dealing with a physical robotic platform.

Since 2008 we have been using the Lego Mindstorms NXT robot as an educational tool in several subjects related to real-time systems, control engineering, and robotics, at both undergraduate and graduate levels. With this background, we claim that these robots perform well in teaching tasks, and that they offer many more advantages (software/firmware robustness, variety of sensors, PC and robot-to-robot communication links, different programming languages...) than drawbacks (weak mechanical structure, lack of sensory precision, computational limits...). Combining RL techniques and Lego robots would be a suitable way of helping students to understand some core topics in real robotics applications.

Q-learning is a simple, effective and well-known RL algorithm that can be found in some Lego NXT applications. However, it has a major difficulty: the learning process of this algorithm is often incomplete or falls into local maxima if its parameters are not well tuned; in a real scenario, this problem is worsened by real-time constraints. Thus, setting the Q-learning parameters in a real application is quite different from tuning them in simulation, and it is quite common that parameters must be re-tuned for each task/system combination.
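
For reference, the parameters in question appear in the standard tabular Q-learning update, sketched below. The abstract does not name the parameters or give their values; the names `alpha` (learning rate), `gamma` (discount factor) and `epsilon` (exploration rate) are the usual textbook ones, and the helper functions are illustrative assumptions rather than the authors' implementation.

```python
import random


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


def epsilon_greedy(q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])


# Usage: a 2-state, 2-action table stored as a list of lists.
q_table = [[0.0, 0.0], [0.0, 2.0]]
rng = random.Random(0)
action = epsilon_greedy(q_table, 1, epsilon=0.1, rng=rng)
q_update(q_table, 0, 1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

A small `alpha` slows learning, while an `epsilon` that is too low can leave parts of the state-action space unexplored, which is one way the incomplete learning and local maxima mentioned above can arise.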

In this paper we present a minimalistic implementation of a Q-learning method on a small mobile robot such as the Lego NXT, focused on simplicity and applicability, and flexible enough to be adapted to other tasks. Starting from a simple wandering problem, we design an off-line model of the learning process in which the Q-learning parameters are studied. After that, we implement this solution on the robot, gradually enlarging the number of states and actions of the problem. The final result of this thorough design and analysis work is a teaching framework for developing practical Q-learning activities, which will improve our teaching in several robotics subjects.
Keywords: Reinforcement learning, Q-learning, robotics, Lego Mindstorms NXT.