Machine learning based education data mining through student session streams

ABSTRACT


INTRODUCTION
The wide usage of the internet and the growth of information technology have changed the way academia and industry learn: learning has moved from the conventional offline mode to the online mode, namely the e-learning platform [1]. Especially during the COVID-19 pandemic, all classes moved to an online model, highlighting the significance of the e-learning platform. However, significant challenges remain in building a reliable and accurate model to predict student performance [2]. Designing an effective assessment model for understanding student behavior from session streams of the e-learning platform will aid in improving students' academic performance by enabling personalized content.
Int J Reconfigurable & Embedded Syst, ISSN: 2089-4864, Vol. 13, No. 2, July 2024: 383-394

Personalized content delivery for improving student performance according to individual behavior in the e-learning platform is a major challenge of the current century [3]. Adaptive personalization techniques for understanding learner profiles have been emphasized [4], [5]. Recently, data mining (DM) and machine learning (ML) have been used for building student performance prediction models. DM has been used to extract useful insight from student session stream data of the e-learning platform, as shown in Figure 1, and it improves decision-making by establishing behavior patterns from data [6]-[9]. Both ML and DM methodologies are promising in different fields such as business and network security, including education. Recently, a new field has emerged, namely educational data mining (EDM), for enhancing learning styles, understanding behavior, and improving student performance [10]-[13]. EDM data is composed of different information such as administration data, student session stream activity, and student academic performance data, collected from different databases and e-learning systems. Prior work has constructed different ML models and an ensemble learning mechanism for predicting student performance during a course, and the outcomes show that the ensemble model outperforms the other models in terms of prediction accuracy [14]-[16]. However, when data is imbalanced, these models fail to establish the features affecting the predictive model, yielding poor classification accuracy. The objective of this paper is to build an effective model for predicting student grades during the course through an ensemble-based ML model that works well for student session stream e-learning data [17]-[19]. Existing models construct ensemble learning by combining multiple ML models. However, these models are effective for binary classification problems; under multi-label classification with data imbalance, they exhibit poor accuracy [20], [21]. These limitations motivate this research to develop an improved student performance prediction model through an improved ensemble methodology [22], [23].

This paper presents effective student performance prediction through an improved ensemble-based ML model. First, the paper details the ensemble algorithm XGBoost. Then, it discusses the limitation of standard XGBoost when data is imbalanced. To address this limitation, a modified XGBoost based student performance prediction model is presented [24], [25]. The modified XGBoost (MXGB) encompasses an improved cross-validation mechanism for establishing the features affecting the accuracy of the student performance prediction model. Finally, an ensemble-based ML model is constructed for building an effective student performance predictive model. The research significance is as follows: i) the proposed model employs an efficient ensemble-based predictive model through MXGB, which works well even when data is imbalanced; ii) the MXGB encompasses an improved cross-validation mechanism to study which features impact the accuracy of the student prediction model; and iii) the proposed model achieves better receiver operating characteristic (ROC) performance, in terms of accuracy, sensitivity, specificity, precision, and F-measure, in comparison with state-of-the-art student performance prediction models. Section 2 presents the ML model for EDM of student session streams. Section 3 presents the outcomes achieved using the proposed MXGB-based student performance prediction model over the existing ensemble-based student performance prediction models. The last section discusses the significance of the MXGB-based student performance prediction model over the existing ensemble-based student performance prediction models.

MACHINE LEARNING MODEL FOR EDM OF STUDENT SESSION STREAMS
This section presents an improved ML model, namely MXGB, for EDM of student session streams. MXGB improves the standard XGBoost by incorporating an effective feature selection mechanism. The standard EDM dataset is defined as (1):

D = \{(x_i, y_i)\}, \quad i = 1, 2, 3, \ldots, n \quad (1)

where n outlines the number of rows considered, y_i \in \{-1, 1\} defines the ith row output, and x_i defines the m-dimensional vector of independent features observed for row i. In general, EDM data has diverse, multi-dimensional features but comparatively few rows n. Thus, the student performance prediction model \hat{y}, forecasting an estimate of the actual value y, is designed as (2):

\hat{y} = f(x) \quad (2)
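As a quick orientation, the dataset layout of equation (1) can be sketched in Python (the paper's implementation language). The sizes and values below are hypothetical, chosen only to illustrate the shape of the data:

```python
import random

# Hypothetical toy illustration of the dataset layout in equation (1):
# n rows (student sessions), an m-dimensional feature vector x_i per row,
# and labels y_i in {-1, 1}. All values are synthetic.
random.seed(0)
n, m = 8, 5                                                      # few rows, multi-dimensional features
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]   # x_i vectors
y = [random.choice([-1, 1]) for _ in range(n)]                   # y_i in {-1, 1}

# Equation (2) then asks for a predictor y_hat(x) estimating the true y.
```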

XGBoost prediction algorithm
The XGBoost algorithm is an improved version of the gradient boosting algorithm [25], in which weaker classifiers are combined to construct a strong classifier that attains a better classification outcome. Consider student session stream data D = \{(x_i, y_i); \; i = 1, \ldots, n, \; x_i \in \mathbb{R}^m, \; y_i \in \mathbb{R}\}, composed of n samples with m features. Let \hat{y}_i be the outcome predicted by the model as (3):

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \quad (3)

where f_k defines a distinct regression tree and f_k(x_i) defines the prediction outcome provided by the kth tree for the ith sample. The regression trees f_k can be learned by minimizing the objective in (4):

\mathcal{L} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) \quad (4)
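The additive prediction in (3) can be sketched as follows; the stump thresholds and leaf values below are made up for illustration and are not learned from data:

```python
# Sketch of equation (3): the ensemble prediction y_hat is the sum of the
# outputs of K regression trees f_k. Each "tree" here is a toy depth-1 stump.

def stump(threshold, left, right, feature=0):
    """A minimal regression tree f_k: one split on a single feature."""
    return lambda x: left if x[feature] < threshold else right

# K = 3 weak learners f_1..f_3 with hand-picked (not learned) leaf values
trees = [stump(0.5, -0.2, 0.3), stump(1.0, 0.1, 0.4), stump(-0.5, -0.3, 0.2)]

def predict(x):
    # y_hat(x) = sum over k of f_k(x)
    return sum(f(x) for f in trees)
```

In real gradient boosting, each successive stump would be fit to the gradients of the loss rather than fixed by hand, but the summation structure is the same.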
In (4), l defines the training loss function measuring the difference between the predicted value \hat{y}_i and the actual value y_i. To avoid over-fitting, the term \Omega penalizes the complexity of the predictive model as (5):

\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2} \quad (5)

where \gamma and \lambda define the regularization parameters, T defines the number of leaves, and w defines the scores of the leaves. The ensemble of trees is constructed through an additive process. Let \hat{y}_i^{(t-1)} define the prediction of the ith sample after t-1 iterations; the tree f_t is added to minimize (6):

\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \quad (6)

Equation (6) is simplified by eliminating the constant terms through a second-order Taylor expansion as (7):

\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \big[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i)\big] + \Omega(f_t) \quad (7)

where g_i defines the first-order gradient of the loss with respect to \hat{y}_i^{(t-1)} as (8):

g_i = \partial_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big) \quad (8)

and h_i defines the second-order gradient as (9):

h_i = \partial^{2}_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big) \quad (9)

Therefore, expanding \Omega, the predictive model objective is expressed using (10):

\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \big[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i)\big] + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \quad (10)
The simplified representation of (10) is given as (11):

\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[\Big(\sum_{i \in I_j} g_i\Big) w_j + \tfrac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^{2}\Big] + \gamma T \quad (11)

where I_j defines the sample set of leaf j, represented as (12), so that (11) can be rewritten compactly as (13):

I_j = \{\, i \mid q(x_i) = j \,\} \quad (12)

\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \big[G_j w_j + \tfrac{1}{2}(H_j + \lambda) w_j^{2}\big] + \gamma T \quad (13)

where T, the size of the tree, is fixed, and the optimal weight w_j^{*} of leaf j is obtained through (14):

w_j^{*} = -\frac{G_j}{H_j + \lambda} \quad (14)
In addition, the corresponding optimal objective value is obtained as (15):

\tilde{\mathcal{L}}^{*} = -\tfrac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T \quad (15)

where G_j is represented as (16):

G_j = \sum_{i \in I_j} g_i \quad (16)

and, similarly, H_j is represented as (17):

H_j = \sum_{i \in I_j} h_i \quad (17)
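To make (14)-(17) concrete, a small numeric sketch follows; the per-sample gradient statistics are made up for illustration:

```python
# Numeric sketch of equations (14)-(17). Each leaf j holds the (g_i, h_i)
# pairs of the samples routed to it; the per-leaf sums G_j, H_j yield the
# optimal weight w_j* = -G_j/(H_j + lambda) and the structure score.
lam, gamma = 1.0, 0.1
leaves = [
    [(0.5, 1.0), (-0.2, 0.8), (0.3, 1.2)],   # I_1: (g_i, h_i) pairs of leaf 1
    [(-0.4, 0.9), (-0.1, 1.1)],              # I_2: (g_i, h_i) pairs of leaf 2
]

G = [sum(g for g, _ in leaf) for leaf in leaves]     # equation (16)
H = [sum(h for _, h in leaf) for leaf in leaves]     # equation (17)
w_star = [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]  # equation (14)
obj_star = (-0.5 * sum(Gj ** 2 / (Hj + lam) for Gj, Hj in zip(G, H))
            + gamma * len(leaves))                   # equation (15)
```

A smaller `obj_star` indicates a better tree structure, which is exactly the criterion XGBoost uses when scoring candidate splits.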
The value \tilde{\mathcal{L}}^{*} measures the quality of the tree structure q, where a smaller value indicates a better tree structure. Though XGBoost is efficient at obtaining high prediction accuracy, poor feature selection under unknown environments, or when data is imbalanced, degrades its prediction accuracy. To address this research problem, an effective feature selection within the training data is modeled in the next sub-section.

Modified XGBoost prediction algorithm
In this work, the feature selection process of standard XGBoost is modified by establishing better feature importance outcomes to achieve an improved prediction scheme. The feature selection process is improved by optimizing the cross-validation with minimal validation error. The K-fold cross-validation scheme is used for optimizing the predictive model: the dataset is randomly divided into K subsets of equal size; then, K-1 subsets are used for constructing the student prediction model and the remaining subset is used for estimating its prediction error. Lastly, the mean of the prediction errors over the different combinations is used as the cross-validation error. After that, a grid of candidate parameter values is searched to obtain the prediction that minimizes the cross-validation error considering feature importance, and the student prediction model with the minimal cross-validation error is chosen. The proposed cross-validation scheme with effective feature selection is composed of two phases. In the first phase, the main features are selected from feature subsets. In the second phase, the features chosen in the first phase are utilized for constructing an effective student performance prediction model. The traditional single-fold cross-validation error is computed as (18):

CV_k(\hat{\lambda}) = \frac{1}{n_k} \sum_{i \in F_k} L\big(y_i, \hat{y}^{(-k)}(x_i)\big) \quad (18)

However, (18) does not identify which features affect the accuracy of the predictive model. To address this, an effective cross-validation that selects the features of high importance affecting prediction accuracy is modeled as (19):

CV(\hat{\lambda}) = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in F_k} L\big(y_i, \hat{y}^{(-k)}(x_i)\big) \quad (19)

and the ideal \hat{\lambda} optimizing the student prediction model is attained as (20):

\hat{\lambda}^{*} = \arg\min_{\hat{\lambda}} \; CV(\hat{\lambda}) \quad (20)
In (19), n defines the size of the training dataset, L(\cdot) defines the loss function, and \hat{y}^{(-k)}(\cdot) defines the model fitted with the kth fold held out. Equation (19) is executed iteratively to construct the best student performance prediction model: the training error is optimized in the first phase, and the resulting parameters are passed to the second phase to update the feature importance characteristics of the predictive model. The effective features are obtained by minimizing the objective function through a gradient descent mechanism. The effective features are selected using the ranking method R(\cdot) to construct the student performance prediction model through (21); the per-fold feature subsets are constructed as (22); and the ideal features with maximum score across the varied K folds are obtained as (23):

r_j^{(k)} = R(f_j), \quad j = 1, \ldots, m \quad (21)

S_k = \operatorname{top}_s\big(r_1^{(k)}, \ldots, r_m^{(k)}\big), \quad k = 1, \ldots, K \quad (22)
S = \{R(f_1), R(f_2), \ldots, R(f_s)\} \quad (23)

Then, the number of occurrences c_j with which a particular feature f_j appears among the K maximum-score feature subsets is computed, and the final feature subset is obtained as (24):

F' = \{\, f_j \mid c_j \ge n_{th} \,\} \quad (24)

Equation (24) generates the subset of m' selected features, where n_{th} is a threshold on how many times a feature must be selected. The educational process mining (EPM) training data utilized is then reduced to the selected features for building an effective student prediction model. To reduce randomness during the training process, the K folds are rebuilt by iterating the first phase a number of times. In the second phase, a subset of features is selected to reduce variance. Therefore, the proposed MXGB-based student performance prediction model significantly improves overall prediction accuracy in comparison with state-of-the-art ML-based student performance prediction schemes.
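The two-phase selection described above can be sketched as follows. Note the hedge: the ranking function R used here (absolute difference of class-conditional feature means) is a stand-in assumption, not the paper's exact ranking method, and all data is synthetic:

```python
import random

# Hedged sketch of the selection in equations (21)-(24). Per fold, a ranking
# function scores every feature; the top-s features of each fold form K
# subsets, and the final subset keeps features chosen at least n_th times.
random.seed(1)
n, m, K, s, n_th = 200, 6, 5, 3, 3

X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y = [1 if row[0] + 0.8 * row[1] > 0 else -1 for row in X]     # features 0, 1 informative

idx = list(range(n))
random.shuffle(idx)
folds = [idx[k::K] for k in range(K)]                         # K equal-size folds

counts = [0] * m                                              # c_j: occurrences per feature
for fold in folds:
    held = set(fold)
    train = [i for i in range(n) if i not in held]            # K-1 folds for ranking
    def score(j):                                             # stand-in for R(.), eq. (21)
        pos = [X[i][j] for i in train if y[i] == 1]
        neg = [X[i][j] for i in train if y[i] == -1]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    for j in sorted(range(m), key=score)[-s:]:                # top-s this fold, eq. (22)
        counts[j] += 1

selected = [j for j in range(m) if counts[j] >= n_th]         # final subset F', eq. (24)
```

On this synthetic data the two informative features (0 and 1) are selected in every fold, while the noise features rarely cross the occurrence threshold.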

RESULT AND ANALYSIS
In this section, student performance prediction using the proposed MXGB and the existing ML-based student prediction methods is studied [22]. The e-learning dataset from [22] is used for the performance analysis; this dataset is selected to enable a direct comparison with [22]. The ML model performing student performance prediction is implemented using the Python 3 framework. The ROC performance metrics accuracy, sensitivity, specificity, precision, and F-measure are used for validating the student performance prediction model. The accuracy is computed as (26):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (26)

where TP defines true positive, FP defines false positive, TN defines true negative, and FN defines false negative. The sensitivity is computed as (27):

Sensitivity = \frac{TP}{TP + FN} \quad (27)
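These standard metrics can be computed directly from the confusion-matrix counts; the counts below are toy values for illustration only:

```python
# Sketch of the ROC metrics in equations (26)-(27) plus the remaining measures,
# computed from confusion-matrix counts: TP (true positive), FP (false
# positive), TN (true negative), and FN (false negative).
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)        # equation (26)
    sensitivity = tp / (tp + fn)                      # equation (27), a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f_measure

# Toy confusion counts, for illustration only
acc, sen, spe, pre, f1 = metrics(tp=40, fp=5, tn=45, fn=10)
```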

Predictive model performance evaluation
In this section, different ML-based student performance prediction models are studied in terms of specificity and sensitivity. Figure 3 shows the specificity outcomes achieved using different student performance prediction models: random forest (RF)-based, logistic regression (LR)-based, ensemble-based [22], XGBoost-based, and the proposed MXGB-based. The RF-based model attains a specificity of 0.875, the LR-based model 0.75, the ensemble-based model 0.857, the XGBoost-based model 0.8502, and the proposed MXGB-based model 0.946. A specificity closer to 1 indicates a good prediction model. Thus, the proposed MXGB-based student performance prediction model is much more efficient than the other ML-based models in terms of specificity. Figure 4 shows the sensitivity outcomes achieved using the RF-based, LR-based, ensemble-based, XGBoost-based, and proposed MXGB-based models. The RF-based model attains a sensitivity of 1, the LR-based model 0.857, the ensemble-based model 0.857, the XGBoost-based model 0.9449, and the proposed MXGB-based model 1. A sensitivity closer to 1 indicates a good prediction model. Thus, the proposed MXGB-based student performance prediction model is much more efficient than the other ML-based models in terms of sensitivity. Moreover, the MXGB-based model balances the tradeoff between high sensitivity and high specificity, thus attaining much better student performance prediction accuracy. Further, performance is validated considering different ROC metrics, namely specificity, recall, accuracy, precision, and F-measure, for the different predictive models. Figure 5 shows the ROC performance of the different ML-based student performance prediction models; from it, the proposed MXGB-based predictive model achieves much better performance in comparison with the XGBoost-based and ensemble-based predictive models.

Feature importance performance
Figure 6 shows a graphical representation of the feature importance parameter obtained using the XGBoost-based and proposed MXGB-based predictive models. From it, we can see that MXGB gives higher importance to features in comparison with XGBoost. Further, the MXGB-based predictive model ranks the session-stream features in the order KS, WT, MM, MW, MLC, MRC, and MWC, whereas the XGBoost-based predictive model ranks them in the order WT, KS, MW, MM, MRC, MLC, and MWC. In both cases, MWC is given very little importance; selecting the right features in this way aids in improving the overall classification accuracy of the proposed MXGB-based predictive model. Figure 11 shows the graphical representation of the feature ranking score attained using the XGBoost-based and MXGB-based student performance prediction models for session 2. From the result, it can be stated that the XGBoost-based model gives a higher score to MW and a lesser score to MRC, whereas the MXGB-based model gives a higher score to WT and a lesser score to MRC. Figure 12 shows the corresponding result for session 3: both the XGBoost-based and MXGB-based models give higher scores to MM and lesser scores to MWC; however, the MXGB-based model gives much higher feature importance than the XGBoost-based model. Figure 13 shows the corresponding result for session 4. From the result, it can be stated that the XGBoost-based model gives a higher score to MW and lesser scores to KS, MWC, and MRC, whereas the MXGB-based model gives a higher score to MM and a lesser score to MWC. Figure 14 shows the corresponding result for session 5. From the result, it can be stated that the XGBoost-based model gives higher scores to KS and WT and lesser scores to MW, MM, and MWC, whereas the MXGB-based model gives a higher score to KS and a lesser score to MWC.

CONCLUSION
Predicting the performance of a student by analyzing the student session stream is a challenging task. ML algorithms have been used by various existing student performance prediction models to achieve improved prediction outcomes. However, these models tend to achieve high accuracy on specific student data, and when adapted to new data they exhibit poor performance. To address such issues, recent work has used an ensemble-based ML model to choose the best model for the prediction task. However, when data is imbalanced, existing ensemble-based models exhibit poor performance. This paper presented an efficient ensemble machine-learning model, built by modifying XGBoost, that works well even when the training data is imbalanced. An effective cross-validation scheme is presented to identify which features impact the accuracy of the prediction model; the scheme employs an effective feature ranking mechanism to improve prediction accuracy by optimizing the prediction error. The experiment is conducted using standard student session stream data. The proposed MXGB model significantly improves accuracy, sensitivity, specificity, precision, and F-measure in comparison with the RF-based, LR-based, ensemble-based, and XGBoost-based student performance prediction models. In future work, the performance of the MXGB model will be tested using more diverse datasets; alongside this, we would consider reducing training error by considering multi-class classification.

Figure 1. General design of student performance prediction through ML models

In this work, the feature selection process during the training of XGBoost is modified through minimization of the objective function, and an effective student performance prediction model is designed as shown in Figure 2.

Figure 2. Proposed ML model for EDM of student session streams

Figure 3. Specificity performance of different ML algorithms for predicting student performance

Figure 4. Sensitivity performance of different ML algorithms for predicting student performance

Figure 6. Feature ranking score graphical representation

Figure 15 shows the graphical representation of the feature ranking score attained using the XGBoost-based and MXGB-based student performance prediction models for session 6. From the result, it can be stated that the XGBoost-based model gives a higher score to KS and lesser scores to MLC and MWC, whereas the MXGB-based model gives a higher score to KS and a lesser score to MWC. The graphical representations in Figures 11 to 15 show that the MXGB-based model gives higher importance to features in comparison with the XGBoost-based student performance prediction model, aiding the MXGB-based model in achieving higher accuracy than the ensemble-based and XGBoost-based student performance prediction models.