Fine-Grained Learning Style Detection Using Multimodal Deep Learning and Log Data
This study proposes a multimodal deep-learning framework for fine-grained detection of learning styles that integrates behavioral log data, textual reflections, and visual interaction signals from learners in online learning environments. Using a dataset of 200 learner profiles and more than 50,000 interaction events, the model combines Bi-LSTM-based sequence encoding of logs, BERT-based semantic feature extraction from reflections, and CNN-based processing of visual behavior within a cross-modal attention architecture. Descriptive statistics show substantial variation across behavioral and cognitive indicators: mean time-on-task is 47.82 seconds (SD 21.45), learners average 2.31 quiz attempts per item, reflection lengths range from 12 to 402 tokens, and cursor travel distances span 92 to 1,560 pixels. The full multimodal model achieves an overall accuracy of 0.84 and an F1-score of 0.83 across the axes of the Felder–Silverman Learning Style Model (FSLSM), outperforming all unimodal baselines. Ablation studies show that removing log data reduces accuracy from 0.84 to 0.76, whereas removing text or visual data lowers it to 0.79 and 0.81, respectively, indicating that sequential behavior is the strongest single predictive signal. The model's fine-grained outputs yield mean learning-style scores near the midpoint of each axis (0.51–0.58), reflecting blended rather than polar tendencies across the population. These findings indicate that multimodal deep learning enables more accurate, interpretable, and nuanced learning-style detection than traditional questionnaires or unimodal analytics, advancing the potential for adaptive, data-driven personalization in online learning systems.
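The abstract names the components but not how they are combined. As an illustration only, the sketch below shows one way per-modality embeddings could be fused by attention and mapped to continuous per-axis scores. It is a simplified stand-in, not the authors' architecture. Each encoder (Bi-LSTM, BERT, CNN) is replaced by a random pooled vector, and a single query attends over one embedding per modality rather than performing token-level cross-attention. All names (`attention_fuse`, `axis_scores`), dimensions, and weights are hypothetical. Only the Python standard library is used.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]

# The four FSLSM axes; each head outputs a score in (0, 1) for one axis.
FSLSM_AXES = ["active_reflective", "sensing_intuitive", "visual_verbal", "sequential_global"]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query: Vector, modalities: Dict[str, Vector]) -> Tuple[Vector, Dict[str, float]]:
    """Scaled dot-product attention of one query over one pooled embedding per modality.

    Hypothetical simplification of cross-modal attention: returns the fused
    vector (a weighted average of the modality embeddings) and the weight
    assigned to each modality.
    """
    names = list(modalities)
    d = len(query)
    scores = [dot(query, modalities[n]) / math.sqrt(d) for n in names]
    weights = softmax(scores)
    fused = [sum(w * modalities[n][i] for w, n in zip(weights, names)) for i in range(d)]
    return fused, dict(zip(names, weights))


def axis_scores(fused: Vector, heads: Dict[str, Tuple[Vector, float]]) -> Dict[str, float]:
    """One sigmoid head per axis: a continuous score instead of a hard label."""
    return {axis: 1.0 / (1.0 + math.exp(-(dot(w, fused) + b))) for axis, (w, b) in heads.items()}


# --- Demo with hypothetical, randomly generated parameters (not trained values) ---
rng = random.Random(0)
DIM = 8
embeddings = {
    "log": [rng.gauss(0, 1) for _ in range(DIM)],     # placeholder for a Bi-LSTM log encoding
    "text": [rng.gauss(0, 1) for _ in range(DIM)],    # placeholder for a BERT reflection embedding
    "visual": [rng.gauss(0, 1) for _ in range(DIM)],  # placeholder for a CNN visual-behavior embedding
}
query = [rng.gauss(0, 1) for _ in range(DIM)]
heads = {axis: ([rng.gauss(0, 0.3) for _ in range(DIM)], 0.0) for axis in FSLSM_AXES}

fused, weights = attention_fuse(query, embeddings)
scores = axis_scores(fused, heads)

# Ablation-style variant: drop the log modality; attention renormalizes over the rest.
ablated = {name: vec for name, vec in embeddings.items() if name != "log"}
fused_no_log, weights_no_log = attention_fuse(query, ablated)

print({k: round(v, 3) for k, v in weights.items()})
print({k: round(v, 3) for k, v in scores.items()})
```

Because the sigmoid heads return values in (0, 1) rather than class labels, a score near 0.5 reads as a blended tendency on that axis, which is one way to interpret fine-grained outputs such as the reported 0.51–0.58 means. Removing a modality from the dictionary mimics an ablation structurally; in a trained model the resulting accuracy change would be measured on held-out data, not read off the attention weights.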