IREAD: INTERPRETABLE RECOGNITION AND AUTOMATED DECONSTRUCTION OF SEMANTICS IN WRITTEN ENGLISH
PES University (INDIA)
About this paper:
Appears in: EDULEARN23 Proceedings
Publication year: 2023
Pages: 2266-2275
ISBN: 978-84-09-52151-7
ISSN: 2340-1117
doi: 10.21125/edulearn.2023.0668
Conference name: 15th International Conference on Education and New Learning Technologies
Dates: 3-5 July, 2023
Location: Palma, Spain
Abstract:
Multiple studies have examined the importance of grammar in learning a second language. Wang likens English grammar to the framework or foundation of a house, which must be sound for the house to be solid [1]. Topics such as subject and object clauses, tenses, tone, similes and voice are common in written English and feature in school curricula. These topics have been studied extensively in Natural Language Processing (NLP), for instance to detect parts of speech and semantic relations in text. However, none of this work is available in a format that is easily accessible to the layperson. Moreover, because such models do not explain how they arrive at their results, their accountability and explainability are open to question.

This paper presents Machine Learning and Natural Language Processing techniques that deconstruct passages and identify the literary devices and grammatical constructs they contain, namely Subject-Predicate-Object, Tense, Tone, Alliteration, Simile, Rhyme Scheme, Voice and Metaphor. The NLP layer is divided into three levels according to the difficulty of the topics as they are taught in school:
1) Easy covers topics such as SPO triplets, tense (past, present, future) and tone, which are common throughout the 5th to 7th grade.
2) Intermediate covers devices such as alliteration, personification and simile. It addresses the aesthetics of the language and targets students in the 7th to 9th grade.
3) Advanced involves concepts such as voice and metaphor, which are covered in the 9th and 10th grade.
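The abstract does not spell out the detection rules, so the following is only a minimal Python sketch, assuming simple letter-based heuristics, of how two of the listed devices could be detected with a human-readable explanation attached. `LEVELS` mirrors the three levels described above; `STOPWORDS`, `find_alliteration` and `rhyme_scheme` are hypothetical names, and matching on spelling rather than sound is a simplification (a practical rhyme detector would normally consult a pronunciation dictionary).

```python
import re

# Topic levels as listed in the abstract (grade ranges taken from the text).
LEVELS = {
    "easy": {"grades": (5, 7), "topics": ["SPO triplets", "tense", "tone"]},
    "intermediate": {"grades": (7, 9), "topics": ["alliteration", "personification", "simile"]},
    "advanced": {"grades": (9, 10), "topics": ["voice", "metaphor"]},
}

# Hypothetical stopword list: function words are skipped so that
# "a peck of pickled peppers" still counts as one alliterative run.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "at", "for"}


def find_alliteration(sentence, min_run=2):
    """Return (run, explanation) pairs for runs of consecutive content words
    sharing a first letter. Letter-based, so it ignores pronunciation."""
    words = [w for w in re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
             if w not in STOPWORDS]
    runs, current = [], []
    for word in words:
        if current and word[0] == current[-1][0]:
            current.append(word)  # same first letter: extend the current run
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = [word]  # start a new run
    if len(current) >= min_run:
        runs.append(current)
    return [(run, f"{len(run)} consecutive words begin with '{run[0][0]}'")
            for run in runs]


def rhyme_scheme(lines, suffix_len=2):
    """Label lines A, B, C, ... by the last `suffix_len` letters of their final
    word: a spelling-based stand-in for true (phonetic) rhyme detection."""
    labels, seen = [], {}
    for line in lines:
        words = re.findall(r"[a-z]+", line.lower())
        key = words[-1][-suffix_len:] if words else ""
        if key not in seen:
            seen[key] = chr(ord("A") + len(seen))  # next unused letter
        labels.append(seen[key])
    return "".join(labels)


print(find_alliteration("Peter Piper picked a peck of pickled peppers"))
print(rhyme_scheme(["The cat sat on the mat", "He wore a hat",
                    "The dog ran in the fog", "It sat on a log"]))
```

Returning an explanation string alongside each detection reflects the paper's emphasis on explaining why a construct was identified, rather than only reporting that it was.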

While most NLP-based techniques focus on improving classification accuracy, the techniques in this work focus on providing insight into the rationale behind the deconstruction. This is achieved through (1) an intuitive explanation that spells out the rules underlying each grammatical construct and (2) identification of the features most relevant to the machine classification, as a way to explain model outcomes. For instance, tone detection is accompanied by a model-based explanation produced with eli5, a popular eXplainable Artificial Intelligence library, which analyses individual predictions to explain the model's local behaviour.
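For a linear model such as the Logistic Regression tone classifier, a per-prediction explanation of the kind eli5 displays amounts to an additive breakdown: each feature contributes its weight times its value, a bias term is added, and the resulting log-odds score is passed through the logistic function. The sketch below reproduces that idea in plain Python rather than calling eli5 itself; the binary positive/negative setup, the vocabulary and the `WEIGHTS`/`BIAS` values are hypothetical stand-ins for a trained model.

```python
import math
import re

# Hypothetical weights for a binary positive-vs-negative tone classifier
# over bag-of-words counts. A real model would learn these from data.
WEIGHTS = {"wonderful": 2.1, "happy": 1.8, "love": 1.5,
           "sad": -1.9, "terrible": -2.4, "never": -0.6}
BIAS = 0.1


def explain_prediction(text, weights=WEIGHTS, bias=BIAS, top_k=5):
    """Return P(positive) and the top_k features ranked by |weight * count|,
    i.e. the additive contributions that make up the linear score."""
    counts = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in weights:
            counts[tok] = counts.get(tok, 0) + 1
    contributions = {tok: weights[tok] * n for tok, n in counts.items()}
    contributions["<BIAS>"] = bias
    score = sum(contributions.values())      # log-odds of the positive class
    prob = 1.0 / (1.0 + math.exp(-score))    # logistic (sigmoid) function
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prob, ranked[:top_k]


prob, top = explain_prediction("What a wonderful day, I love it")
for feature, value in top:
    print(f"{value:+.2f}  {feature}")
print(f"P(positive) = {prob:.3f}")
```

Because the contributions sum exactly to the score, a learner can see which words pushed the prediction towards one tone or the other.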

We validate the algorithms underlying the learning aid on manually annotated datasets drawn from widely prescribed high school English grammar textbooks such as Wren and Martin. The results are promising, with the strongest performance from the Rhyme Scheme, Alliteration and Voice components, which achieve accuracies of 97%, 95.04% and 94.8% respectively. Tense and Simile reach accuracies of 85.9% and 83.26% respectively, while Metaphor reaches 66.67%. The Tone detection algorithm, which uses Logistic Regression, achieves an F1 score of 0.69, and Subject-Predicate-Object detection using AllenNLP’s OpenIE algorithm achieves an F1 score of 0.62.
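The abstract reports accuracy for most components and F1 for Tone and SPO detection. As a reminder of how these figures are computed, here is a small sketch with hypothetical function names; F1 is the harmonic mean 2PR/(P+R) of precision P and recall R, and the triple scorer assumes an extracted triple counts as correct only when it exactly matches a gold triple, which may be stricter than the matching criterion used in the paper.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def triple_f1(gold_triples, predicted_triples):
    """Precision, recall and F1 over (subject, predicate, object) triples,
    assuming exact-match scoring of each predicted triple."""
    gold, pred = set(gold_triples), set(predicted_triples)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy usage with made-up labels and triples (not data from the paper).
print(accuracy(["past", "present", "future", "past"],
               ["past", "present", "past", "past"]))
print(triple_f1([("cat", "chased", "mouse"), ("dog", "ate", "bone")],
                [("cat", "chased", "mouse"), ("dog", "ate", "food")]))
```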

These components are integrated into a simple web application, iREAD, which serves as a one-stop shop to help learners improve their English grammar skills. Although it is based on school curricula and can support K-12 students as a learning aid, iREAD is particularly useful for adult learners studying English as a second language.

The data and code have been made available so that the results in this paper can be reproduced and the algorithms benchmarked.

[1] Wang, F. (2010). The necessity of grammar teaching. English Language Teaching, 3, 78–81.
Keywords:
Natural Language Processing, English Grammar Rules, eXplainable Artificial Intelligence.