Reinforcement Learning-Based Curriculum Sequencing under Heterogeneous Student Cognitive States


👤 Christianto Hernando
🏢 Department of Information Systems, Faculty of AI and Data Science, Universitas Pelita Harapan, Indonesia
👤 Nicholas Felix Chandra
🏢 Department of Information Systems, Faculty of AI and Data Science, Universitas Pelita Harapan, Indonesia

This paper proposes a constraint-aware actor-critic framework for curriculum sequencing under heterogeneous student cognitive states inferred from interaction traces. The study evaluates N = 480 students across 25,600 sessions (decision horizon T = 20) on a 60-item prerequisite-structured curriculum. Relative to a topological baseline, the proposed policy increases normalized learning gain from 0.241 to 0.312, reduces time-to-mastery from 14.8 to 11.6 items, and improves load-adjusted return from 0.212 to 0.286. Cognitive sustainability improves concurrently: mean cumulative session load decreases from 8.6 to 7.4, load variability decreases from 1.2 to 0.9, and the fraction of sessions exceeding the individualized load budget falls from 27.8% to 12.5%. Stratified analysis confirms that the benefits concentrate in cognitively challenging regimes. In the high-load volatile stratum, learning gain reaches 0.301 under the proposed policy, compared with 0.176 (topological), 0.214 (mastery-threshold), and 0.162 (contextual bandit), while completion improves to 86.7%, compared with 74.5%, 78.6%, and 72.9%, respectively. Behavioral stability also improves: mean stall events drop to 1.5 per session versus 3.1 under topological sequencing, indicating fewer unproductive remediation cycles. Safety outcomes remain stable under constraints, with prerequisite violations limited to 0.3%, compared with 1.1% for contextual bandit selection. Ablation results attribute the performance to the integrated modeling choices. Removing cognitive state inference reduces learning gain from 0.312 to 0.254 and increases cumulative load from 7.4 to 8.5. Removing the load penalty raises learning gain to 0.322 but sharply increases load to 9.6 and budget exceedance to 33.4%, confirming that sustainability requires explicit optimization rather than conservative pacing.
Overall, the findings support constrained, belief-informed sequencing as a robust mechanism for improving learning effectiveness and cognitive stability in realistic adaptive-learning settings.
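
To make the three ingredients named in the abstract concrete (a prerequisite constraint, an individualized load budget, and a load penalty), the following is a minimal illustrative sketch. It is not the paper's actor-critic policy: it is a greedy one-step rule, and the function names, the `expected_gain` and `expected_load` estimates, and the `load_penalty` weight are all hypothetical assumptions introduced for illustration.

```python
from typing import Dict, List, Optional, Set


def eligible_items(prereqs: Dict[str, Set[str]], mastered: Set[str]) -> List[str]:
    """Return unmastered items whose prerequisites are all mastered."""
    return sorted(
        item
        for item, needs in prereqs.items()
        if item not in mastered and needs <= mastered
    )


def select_next_item(
    prereqs: Dict[str, Set[str]],
    mastered: Set[str],
    expected_gain: Dict[str, float],   # hypothetical per-item gain estimate
    expected_load: Dict[str, float],   # hypothetical per-item load estimate
    load_so_far: float,
    load_budget: float,
    load_penalty: float = 0.5,         # hypothetical trade-off weight
) -> Optional[str]:
    """Greedy one-step choice: mask prerequisite violations, drop items that
    would exceed the load budget, then maximize gain minus penalized load."""
    candidates = [
        item
        for item in eligible_items(prereqs, mastered)
        if load_so_far + expected_load[item] <= load_budget
    ]
    if not candidates:
        return None  # nothing can be presented without breaking a constraint
    return max(
        candidates,
        key=lambda item: expected_gain[item] - load_penalty * expected_load[item],
    )


if __name__ == "__main__":
    # Toy four-item curriculum; all numbers are made up for the example.
    prereqs = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    gain = {"a": 0.2, "b": 0.30, "c": 0.35, "d": 0.5}
    load = {"a": 0.1, "b": 0.2, "c": 0.6, "d": 0.9}
    print(select_next_item(prereqs, {"a"}, gain, load, 7.0, 8.0))
```

In this toy setup, setting `load_penalty=0.0` makes the rule prefer the higher-gain, higher-load item, which loosely mirrors the trade-off the ablation describes when the load penalty is removed.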

Hernando, C., & Chandra, N. F. (2026). Reinforcement Learning-Based Curriculum Sequencing under Heterogeneous Student Cognitive States. Adaptive Learning, 2(1), 25–47. Retrieved from https://al.mbicore.com/index.php/al/article/view/4
