ADAPTIVE MICRO-SUPPORTS (“NUDGES”) FOR STUDENT MOTIVATION: A REINFORCEMENT LEARNING APPROACH ACROSS THE SEMESTER

J.P. Machado da Costa; A.S. Coelho; O.M. Dourado Martins; R. Nacheva

doi:10.21125/inted.2026.0380

¹ IEES - Instituto Europeu de Estudos Superiores (PORTUGAL)
² Instituto Politécnico de Bragança (PORTUGAL)
³ Informatics Department, University of Economics - Varna (BULGARIA)

About this paper:

Appears in: INTED2026 Proceedings
Publication year: 2026
Article: 0380
ISBN: 978-84-09-82385-7
ISSN: 2340-1079
doi: 10.21125/inted.2026.0380

Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain

Abstract:

This study tests whether sequencing and personalising short, behaviourally informed messages (nudges) over a 14-week semester reduces procrastination, sustains motivation, and improves retention in higher education. The field deployment took place at ESTF–IEES across two undergraduate programmes (LTGSI n=168; LTGSILG n=130; N=298 consented of 312 eligible). Two institutional Learning Management Systems (LMS), Moodle and InforEstudante, delivered three nudge families: Planning (implementation prompts, checklists, deep-links), Social Accountability (peer commitment/feedback), and Loss/Gain Framing (deadline-proximal salience). A contextual bandit policy (Thompson Sampling) selected the weekly nudge conditional on a risk score derived from LMS logs (first-action latency, activity, attendance history). Primary outcomes combined LMS behavioural metrics (latency, on-time submission, delay) and brief Motivated Strategies for Learning Questionnaire (MSLQ) subscales (self-efficacy, task value). Evaluation used mixed-effects models, quantile regression for the distribution tails, Cox proportional hazards models and Kaplan–Meier curves for retention, and off-policy evaluation with inverse propensity scoring (IPS) and doubly robust (DR) estimators, using logged action propensities to benchmark the adaptive policy against fixed policies and the status quo.
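As an illustration of the selection step described above, the following is a minimal sketch of Thompson Sampling over the three nudge families. It is not the deployed policy. It assumes, hypothetically, that the risk score is discretised into strata, that the weekly reward is binary (for example, an on-time submission), and that each (stratum, nudge) pair has an independent Beta(1, 1) prior. The class name, the stratum labels, and the Monte Carlo propensity estimate are likewise assumptions made for the example.

```python
import random

# Nudge families named in the abstract; the identifiers are illustrative.
NUDGES = ("planning", "social_accountability", "loss_gain_framing")


class StratifiedThompsonSampler:
    """Beta-Bernoulli Thompson Sampling, one posterior per (risk stratum, nudge).

    Assumption: binary rewards and discretised risk strata (not from the paper).
    """

    def __init__(self, strata, nudges=NUDGES, seed=0):
        self.nudges = tuple(nudges)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [alpha, beta] for every stratum/nudge pair.
        self.params = {(s, a): [1.0, 1.0] for s in strata for a in self.nudges}

    def _draw_best(self, stratum):
        # Sample one success rate per nudge and return the nudge with the highest draw.
        draws = {a: self.rng.betavariate(*self.params[(stratum, a)])
                 for a in self.nudges}
        return max(draws, key=draws.get)

    def select(self, stratum, n_mc=2000):
        """Return (nudge, propensity) so the propensity can be logged for later audit."""
        counts = {a: 0 for a in self.nudges}
        for _ in range(n_mc):
            counts[self._draw_best(stratum)] += 1
        chosen = self._draw_best(stratum)
        # Monte Carlo estimate of P(chosen | stratum); add-one smoothing keeps it > 0.
        propensity = (counts[chosen] + 1) / (n_mc + len(self.nudges))
        return chosen, propensity

    def update(self, stratum, nudge, reward):
        # Conjugate update: reward 1 increments alpha, reward 0 increments beta.
        alpha, beta = self.params[(stratum, nudge)]
        self.params[(stratum, nudge)] = [alpha + reward, beta + (1 - reward)]


if __name__ == "__main__":
    sampler = StratifiedThompsonSampler(strata=("low", "medium", "high"))
    nudge, propensity = sampler.select("high")
    sampler.update("high", nudge, reward=1)
    print(nudge, round(propensity, 3))
```

Logging the propensity alongside each selection is what allows the policy to be re-evaluated offline; a model with continuous context features would replace the per-stratum Beta posteriors.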

Results show 72% and 61% open rates in Moodle and InforEstudante, respectively, and average exposure of 9.8 nudges/student (SD=2.1). The adaptive policy reduced procrastination (−18% in mean first-action latency; −26% at Q0.90), decreased submission delay (−12.4 minutes), increased on-time submissions (+8.9 p.p.), and improved retention (Cox HR=0.72). Motivational gains were modest but meaningful (MSLQ self-efficacy d=0.32; task value d=0.27). Learned trajectories aligned with theory: Planning in weeks 1–5, Accountability in weeks 6–9, and Loss/Gain in weeks 10–14, with heterogeneous effects by weekly risk strata—evidence that “who needs what, when” is pivotal. DR/IPS estimates confirmed superiority over fixed policies after weight truncation and stratified bootstrap, supporting internal validity in a non-stationary environment; findings were robust to alternative specifications and sensitivity checks.
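The off-policy comparison with weight truncation mentioned above can be sketched as follows. This is a generic illustration of truncated IPS and DR estimators rather than the authors' evaluation code. The log format `(context, action, reward, propensity)`, the function names, and the default truncation threshold of 10 are assumptions, and `reward_model` stands in for any fitted outcome model.

```python
def truncated_ips(logs, target_prob, max_weight=10.0):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: iterable of (context, action, reward, logged_propensity) tuples.
    target_prob(context, action): probability the target policy picks `action`.
    max_weight: truncation cap on importance weights (hypothetical default).
    """
    logs = list(logs)
    total = 0.0
    for context, action, reward, propensity in logs:
        weight = min(target_prob(context, action) / propensity, max_weight)
        total += weight * reward
    return total / len(logs)


def doubly_robust(logs, target_prob, reward_model, actions, max_weight=10.0):
    """Doubly robust estimate: model-based prediction plus a weighted residual correction.

    reward_model(context, action): predicted reward from any fitted outcome model.
    """
    logs = list(logs)
    total = 0.0
    for context, action, reward, propensity in logs:
        # Direct-method term: expected predicted reward under the target policy.
        direct = sum(target_prob(context, b) * reward_model(context, b)
                     for b in actions)
        weight = min(target_prob(context, action) / propensity, max_weight)
        # Correction term: importance-weighted residual of the logged action.
        total += direct + weight * (reward - reward_model(context, action))
    return total / len(logs)


if __name__ == "__main__":
    # Toy log: uniform logging over two nudges, evaluated for an "always planning" policy.
    toy_logs = [("high", "planning", 1, 0.5), ("high", "loss_gain", 0, 0.5)]
    always_planning = lambda context, action: 1.0 if action == "planning" else 0.0
    print(truncated_ips(toy_logs, always_planning))
```

Truncation trades a small bias for lower variance when logged propensities are small; the DR estimate remains consistent if either the propensities or the outcome model are correct.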

Contributions include:
(i) a dynamic, semester-long personalisation mechanism linking message type to temporal context and risk;
(ii) operational behavioural metrics of procrastination from LMS logs;
(iii) a transparent audit trail (logged propensities) enabling reproducible off-policy evaluation.

Practically, the approach is low-cost, compliant with the General Data Protection Regulation (GDPR), and scalable within existing LMS workflows (weekly ETL, risk scoring, policy deployment, dashboards). Future work should explore fairness-constrained bandits, multi-objective rewards (motivation/efficiency/retention), and hybrid designs that combine A/B/n tests with bandits to accumulate causal evidence while learning online.

Keywords:

Educational Nudging, Contextual Bandits, Learning Analytics, Procrastination, IPS/DR, Moodle, MSLQ.
