Real-Time Emotion-Aware Adaptive Learning System Using Multimodal Facial and Voice Recognition for Affective Personalization in Digital Instruction
This study proposes and evaluates a Real-Time Emotion-Aware Adaptive Learning System that integrates facial expression recognition and voice-based affect modeling into an online instructional workflow. The system captured emotional signals from 42 participants during 25-minute learning sessions via webcam and microphone streams. The CNN-based facial model achieved its highest accuracy for happiness (92%) and neutral affect (88%), but accuracy fell to 75%, 72%, and 69% for sadness, anger, and fear, respectively. The Bi-LSTM voice model showed a similar pattern, with precision of 0.89 for happiness and 0.85 for neutrality, dropping to 0.68, 0.72, and 0.65 for sadness, anger, and fear, respectively. A multimodal fusion mechanism raised overall recognition accuracy to 88%, a gain of 9–13% over the single-channel models. Adaptive interventions triggered by the detected emotional states produced measurable behavioral improvements: reducing difficulty during confusion increased task completion by 17%, extending time during anxiety lowered the error rate by 11%, encouragement prompts during frustration improved retry behavior by 22%, and gamified stimulation during boredom increased engagement duration by 26%. Overall, the results indicate that emotional adaptivity doubled learning effectiveness, reduced the accumulation of negative affect, and integrated real-time personalization into instruction without disrupting its flow. The study concludes that multimodal affect monitoring is a viable and necessary mechanism for next-generation intelligent tutoring systems.
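The abstract does not specify how the facial and voice outputs are combined or how detected states are turned into interventions. The sketch below shows one common option, weighted late fusion of per-class probabilities, together with a lookup from learning state to the intervention named in the abstract. The `face_weight` parameter, the function names, and the intervention identifiers are hypothetical illustrations rather than the authors' implementation. Because the abstract also leaves open how the five recognized emotions relate to the learning states (confusion, anxiety, frustration, boredom), the two steps are kept as separate functions here.

```python
from typing import Dict, Optional

# Emotion classes reported for both unimodal models in the abstract.
EMOTIONS = ("happiness", "neutral", "sadness", "anger", "fear")

# Learning state -> intervention, following the abstract's pairings.
# The identifier strings are hypothetical labels, not from the study.
INTERVENTIONS = {
    "confusion": "reduce_difficulty",
    "anxiety": "extend_time",
    "frustration": "show_encouragement",
    "boredom": "gamified_stimulation",
}


def fuse_late(face_probs: Dict[str, float],
              voice_probs: Dict[str, float],
              face_weight: float = 0.5) -> Dict[str, float]:
    """Weighted late fusion of two per-class probability distributions.

    face_weight is a hypothetical tuning parameter (assumption), not a
    value reported in the study; the voice channel gets 1 - face_weight.
    Classes missing from either input are treated as probability 0.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    fused = {
        e: face_weight * face_probs.get(e, 0.0)
        + (1.0 - face_weight) * voice_probs.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    # Renormalize so the fused scores again sum to 1 (if any mass exists).
    return {e: p / total for e, p in fused.items()} if total > 0 else fused


def dominant_emotion(probs: Dict[str, float]) -> str:
    """Return the class with the highest fused score."""
    return max(probs, key=probs.get)


def select_intervention(learning_state: str) -> Optional[str]:
    """Return the intervention for a detected learning state, or None."""
    return INTERVENTIONS.get(learning_state)


if __name__ == "__main__":
    face = {"happiness": 0.6, "neutral": 0.3, "sadness": 0.1}
    voice = {"happiness": 0.2, "neutral": 0.7, "sadness": 0.1}
    fused = fuse_late(face, voice)
    print(dominant_emotion(fused))            # neutral
    print(select_intervention("boredom"))     # gamified_stimulation
```

In this example the facial channel favors happiness and the voice channel favors neutrality; with equal weights the fused distribution settles on neutral. Late fusion is shown only because it keeps the two unimodal models independent. Feature-level fusion or a learned fusion layer would be equally plausible readings of "multimodal fusion mechanism."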