A real-time multi-modal deep learning framework for student attentiveness assessment in online learning environments
Rajasekaran Mariswamy, P.V. Praveen Sundar
Abstract
The rapid growth of online learning platforms has increased the need for intelligent systems that monitor student attentiveness in real time to improve learning effectiveness and support adaptive instruction. This paper proposes a multi-modal deep learning framework for attentiveness assessment that integrates visual, behavioral, and temporal information extracted from online classroom interactions. The proposed system consists of four major components: data acquisition, preprocessing and normalization, deep feature extraction with temporal learning, and attentiveness evaluation with analytics generation. Visual and spatial characteristics are learned by a convolutional neural network (CNN), while temporal behavioral patterns are captured by a long short-term memory (LSTM) network that models sequential engagement dynamics. The framework operates in both real-time and offline modes, enabling live monitoring during virtual classes as well as post-session analysis of recorded lectures. The computational pipeline is optimized through fixed-point processing, parallel convolution execution, and latency-aware temporal modeling, making it suitable for field-programmable gate array (FPGA)-based and embedded implementations with constrained computational resources. Experimental evaluation on an in-house dataset shows that the proposed framework achieves 92.9% classification accuracy and a 91.9% F1-score while maintaining strong generalization on cross-dataset benchmarks. Furthermore, latency analysis shows an average processing time of 31.6 ms per frame, enabling near real-time inference at approximately 30 frames per second.
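As an illustration of two quantities mentioned above, the following minimal Python sketch relates per-frame latency to throughput and shows one common form of fixed-point quantization. The abstract does not state the word length or fractional precision used, so the signed 16-bit format with 8 fractional bits is an assumption chosen only for illustration, and the function names are hypothetical rather than part of the proposed framework.

```python
def frames_per_second(latency_ms: float) -> float:
    """Throughput when frames are processed one at a time, back to back."""
    return 1000.0 / latency_ms


def to_fixed_point(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer, saturating on overflow.

    The 16-bit word with 8 fractional bits is an assumed format; the
    abstract does not specify the one actually used.
    """
    scale = 1 << frac_bits                 # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))          # most negative representable code
    hi = (1 << (total_bits - 1)) - 1       # most positive representable code
    return max(lo, min(hi, round(x * scale)))


def from_fixed_point(q: int, frac_bits: int = 8) -> float:
    """Map a fixed-point integer code back to a real value."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    # Reported average latency of 31.6 ms per frame.
    print(f"{frames_per_second(31.6):.1f} frames per second")

    # Round trip of an example activation value through the assumed format.
    q = to_fixed_point(0.7312)
    print(q, from_fixed_point(q))
```

Under these assumptions, a latency of 31.6 ms corresponds to a little under 32 frames per second, which is consistent with the approximate figure of 30 frames per second reported above. The quantization error of the assumed format is bounded by half a step, that is, 1/512 for 8 fractional bits, as long as the value does not saturate.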