Reinforcement Learning-Based Curriculum Sequencing under Heterogeneous Student Cognitive States

Christianto Hernando; Nicholas Felix Chandra

Published: May 18, 2026

Christianto Hernando

Department of Information Systems, Faculty of AI and Data Science, Universitas Pelita Harapan, Indonesia

Nicholas Felix Chandra

Department of Information Systems, Faculty of AI and Data Science, Universitas Pelita Harapan, Indonesia

This paper proposes a constraint-aware actor-critic framework for curriculum sequencing under heterogeneous student cognitive states inferred from interaction traces. The study evaluates N=480 students across 25,600 sessions (decision horizon T=20) on a 60-item prerequisite-structured curriculum. Relative to a topological baseline, the proposed policy increases normalized learning gain from 0.241 to 0.312, reduces time-to-mastery from 14.8 to 11.6 items, and improves load-adjusted return from 0.212 to 0.286. Cognitive sustainability improves concurrently: mean cumulative session load decreases from 8.6 to 7.4, load variability decreases from 1.2 to 0.9, and the fraction of sessions exceeding the individualized load budget falls from 27.8% to 12.5%. Stratified analysis shows that the benefits concentrate in cognitively challenging regimes. In the high-load volatile stratum, learning gain reaches 0.301 under the proposed policy, compared with 0.176 (topological), 0.214 (mastery-threshold), and 0.162 (contextual bandit), while completion reaches 86.7%, compared with 74.5%, 78.6%, and 72.9%, respectively. Behavioral stability also improves: mean stall events drop to 1.5 per session versus 3.1 under topological sequencing, indicating fewer unproductive remediation cycles. Safety outcomes remain stable under constraints, with prerequisite violations limited to 0.3%, compared with 1.1% for contextual bandit selection. Ablation results attribute performance to the integrated modeling choices. Removing cognitive state inference reduces learning gain from 0.312 to 0.254 and increases cumulative load from 7.4 to 8.5. Removing the load penalty raises learning gain to 0.322 but sharply increases load to 9.6 and budget exceedance to 33.4%, confirming that sustainability requires explicit optimization rather than conservative pacing.
Overall, the findings support constrained, belief-informed sequencing as a robust mechanism for improving learning effectiveness and cognitive stability in realistic adaptive-learning settings.
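
The two constraints named in the abstract, prerequisite feasibility and a penalty on load above an individualized budget, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the linear penalty form, the `penalty` weight, and the four-item toy curriculum are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Set


def feasible_items(prereqs: Dict[int, Set[int]], mastered: Set[int]) -> List[int]:
    """Return unmastered items whose prerequisites are all mastered.

    A sequencing policy restricted to this list cannot select an item
    before its prerequisites (hypothetical masking scheme).
    """
    return sorted(
        item
        for item, required in prereqs.items()
        if item not in mastered and required <= mastered
    )


def load_penalized_reward(
    gain: float,
    cumulative_load: float,
    load_budget: float,
    penalty: float = 0.5,  # assumed weight, not taken from the paper
) -> float:
    """Learning gain minus a linear penalty on load above the budget."""
    excess = max(0.0, cumulative_load - load_budget)
    return gain - penalty * excess


# Hypothetical toy curriculum: item 3 requires items 1 and 2,
# which both require item 0.
toy_prereqs = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
```

With this toy graph, `feasible_items(toy_prereqs, {0})` offers only items 1 and 2, and the reward is reduced only when `cumulative_load` exceeds `load_budget`, which is one simple way to trade learning gain against load.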

Hernando, C., & Chandra, N. F. (2026). Reinforcement Learning-Based Curriculum Sequencing under Heterogeneous Student Cognitive States. Adaptive Learning, 2(1), 25–47. Retrieved from https://al.mbicore.com/index.php/al/article/view/4

Distributed Under Creative Commons CC-BY 4.0

Issue

Vol. 2 No. 1 (2026): Regular Issue February 2026

Section

Articles
