A COMPARISON OF TWO TEXT-BASED FEATURE SETS FOR PREDICTING STUDENT PERFORMANCE: AN INITIAL EXPLORATION
University of Thessaly (GREECE)
About this paper:
Appears in: ICERI2021 Proceedings
Publication year: 2021
Pages: 4806-4814
ISBN: 978-84-09-34549-6
ISSN: 2340-1095
doi: 10.21125/iceri.2021.1101
Conference name: 14th annual International Conference of Education, Research and Innovation
Dates: 8-9 November, 2021
Location: Online Conference
Abstract:
Introduction:
To date, scholarly research on Learning Analytics (LA) has focused mainly on using various feature sets to predict student performance, most notably on identifying students on the verge of academic failure. One category of features that has not been systematically explored in the field of LA is student-generated text.

Study Focus:
This study examines the predictive power of student-generated short texts, such as summaries. More specifically, two large feature sets are compared:
(a) raw text feature set (i.e. intact summaries) and
(b) engineered feature set (i.e. extracted features). The following research questions are addressed:
RQ1: Which feature set results in higher classification accuracy?
RQ2: Which Machine Learning (ML) algorithms are the top performing ones with each feature set in terms of classification accuracy?

Method:
Forty-two first-year student teachers at a Greek university voluntarily participated in the study and were compensated with 2 credit points.
The participants watched six video lectures hosted on a customized Moodle-based LMS in a single three-hour session. The lecture topics related to digital media.
After viewing each video, the participants were asked to write a short summary of the main concepts. Their comprehension of each video’s concepts was measured using a 10-item knowledge test.

Data Analysis:
For the purposes of this study, we used the Python ML ecosystem, specifically the pandas, scikit-learn, and spaCy libraries.
In the case of the raw text feature set, the respective vector representations were created using either plain token counts or token counts with normalized weights.
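These two representations map naturally onto scikit-learn's CountVectorizer and TfidfVectorizer; the following is a minimal sketch under that assumption, with placeholder summary texts:

    # Raw text feature set: two vector representations of the summaries.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    summaries = ["first student summary ...", "second student summary ..."]  # placeholders

    X_counts = CountVectorizer().fit_transform(summaries)  # (a) plain token counts
    X_tfidf = TfidfVectorizer().fit_transform(summaries)   # (b) counts with normalized (tf-idf) weights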
In the case of the engineered feature set, spaCy was used for: (a) extracting linguistic features (e.g. number of words, sentences, nouns, verbs, adjectives per student summary) and (b) computing the semantic similarity of each student summary with the respective video lecture transcript.
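A minimal sketch of this extraction step follows; the model choice and feature names are assumptions (the summaries were presumably written in Greek, so a Greek spaCy pipeline with word vectors, e.g. el_core_news_md, is used here):

    # Engineered feature set: linguistic counts plus summary-transcript similarity.
    import spacy

    nlp = spacy.load("el_core_news_md")  # assumed model; similarity requires word vectors

    def engineered_features(summary_text, transcript_text):
        doc, transcript = nlp(summary_text), nlp(transcript_text)
        return {
            "n_words": sum(not t.is_punct for t in doc),
            "n_sentences": len(list(doc.sents)),
            "n_nouns": sum(t.pos_ == "NOUN" for t in doc),
            "n_verbs": sum(t.pos_ == "VERB" for t in doc),
            "n_adjectives": sum(t.pos_ == "ADJ" for t in doc),
            "similarity": doc.similarity(transcript),  # cosine of averaged vectors
        }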

Using the median student performance for each video as a threshold, new binary variables were created: low and high performance.
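For example, with the per-video scores in a pandas DataFrame (column names hypothetical), the median split can be expressed as:

    # Binarize scores around each video's median (toy data; whether scores at
    # the median count as low or high is a design choice not specified above).
    import pandas as pd

    df = pd.DataFrame({"video": [1, 1, 1, 2, 2, 2],
                       "score": [4, 7, 9, 3, 5, 8]})
    medians = df.groupby("video")["score"].transform("median")
    df["performance"] = (df["score"] > medians).map({True: "high", False: "low"})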

The most common classification ML algorithms were used to categorize student performance into low and high. We followed the standard 10-fold cross-validation procedure, using 90% of the data for training and 10% for testing in each fold. Additionally, GridSearchCV was used to exhaustively search a large number of hyperparameter combinations per ML algorithm.
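As an illustration of this setup for a single classifier (the algorithm and grid values below are assumptions, not the paper's actual search space):

    # 10-fold cross-validated, exhaustive hyperparameter search.
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
    search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
    # search.fit(X, y)   # X: either feature set; y: low/high labels
    # print(search.best_params_, search.best_score_)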

Results and Discussion:
Regarding RQ1, the results indicate that in approximately 50% of all comparisons there was no difference in accuracy between the two feature sets. In 30% of all cases the raw text feature set resulted in higher classification accuracy, while the inverse was true in the remaining 20% of cases, where the engineered feature set led to higher accuracy rates.

Regarding RQ2, the results show that, with a single exception, in every feature set comparison at least one algorithm achieved the highest level of classification accuracy. Across all six video lectures, the average classification accuracy was 0.90 for the raw text feature set and 0.78 for the engineered feature set.

The paper concludes with a discussion of the significant implications that this line of research opens up for tracing student learning in online environments.
Keywords:
Machine learning, text similarity, feature engineering, natural language processing.