WEB SYSTEM FOR TEXT ANALYSIS OF QUESTIONNAIRE DATA IN SCIENCE TEACHING
Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México (MEXICO)
About this paper:
Conference name: 12th annual International Conference of Education, Research and Innovation
Dates: 11-13 November, 2019
Location: Seville, Spain
Abstract:
Text mining has been used in the analysis of scientific texts to identify, among other things, the main subject of a text, the concepts that appear in documents, and trends in the stance taken on an issue.
In this work, text mining is proposed for the automatic analysis of the written answers to a questionnaire designed to characterize the diverse types of external representations that high school students hold on the topic of genetics, and thus to determine the level of conceptual understanding the students have achieved. For this purpose, a web system was developed that supports researchers in the analysis of questionnaire answers.
To develop the system, different lines of research in computational linguistics were reviewed, in particular those concerning the analysis of written texts, in order to determine the specific procedure for analyzing the students' answers.
Specifically, the web application supports researchers in the analysis of frequent concepts and relationships between them, as well as in the automatic evaluation of responses.
The web system aids researchers in the following activities:
1. Answer preprocessing. Students' answers are cleaned to facilitate and optimize their processing. The words that can be eliminated are identified, as are the terms that are equivalent to one another.
2. Analysis of frequent concepts and relationships between them. From the implicit relationships between the terms that appear in the students' answers, text mining makes it possible to identify concepts and the relationships between them. The web system helps visualize this information as a network of the most important concepts.
3. Automatic evaluation. The web system can automatically evaluate the level of knowledge integration in a questionnaire answer. This process has three stages: training, carried out using answers evaluated by an expert; validation of the training; and use of the trained model to assess unevaluated answers. In the training stage, a sample of the evaluated answers is used; these responses were graded using a modified five-level Wilson's rubric. From this sample, a computational model (a decision tree) is constructed that can determine the level of integration to which an answer belongs. In the validation stage, the model is checked for the expected behavior using a different sample, called the test set, which consists of already evaluated answers that were not used in the training stage. If the grades the model assigns are similar to the grades the test set already has, the model is considered valid.
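As an illustration of activities 1 and 2 above, the following is a minimal sketch in Python, not the system's actual implementation. It assumes that concepts are related when they appear in the same answer, which is only one possible reading of "implicit relationships". The names `STOPWORDS`, `EQUIVALENTS`, `preprocess`, `concept_network`, and the `min_count` threshold are hypothetical, as are the word lists themselves.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical word lists; the lists used by the actual system are not given.
STOPWORDS = {"the", "a", "of", "is", "are", "in", "and", "to", "from"}
EQUIVALENTS = {"genes": "gene", "chromosomes": "chromosome"}


def preprocess(answer):
    """Lowercase, tokenize, drop stopwords, and unify equivalent terms."""
    tokens = re.findall(r"[a-záéíóúñü]+", answer.lower())
    return [EQUIVALENTS.get(t, t) for t in tokens if t not in STOPWORDS]


def concept_network(answers, min_count=2):
    """Build a simple co-occurrence network from a list of answers.

    Returns (term_counts, edges). term_counts maps each term to the number
    of answers containing it. edges maps each alphabetically ordered pair
    of terms to the number of answers containing both, keeping only pairs
    that occur in at least min_count answers (an assumed threshold).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for answer in answers:
        terms = sorted(set(preprocess(answer)))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_count}
    return term_counts, edges


if __name__ == "__main__":
    sample = [
        "Genes are in the chromosomes",
        "The gene is a part of DNA",
        "Chromosomes carry genes",
    ]
    counts, edges = concept_network(sample)
    print(counts.most_common(3))
    print(edges)
```

The resulting `edges` dictionary can be passed to any graph-drawing tool to display the network; raising `min_count` keeps only the more frequent relationships.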
The proposed web system was tested with question one of the genetics questionnaire mentioned above. In the tests conducted, the best result achieved by the model was 74%. To analyze the validity of the automatic evaluation process, the concept networks constructed from a set of automatically graded answers were compared against the networks of the complete set of answers graded by the experts. According to the researchers, both concept networks are very similar.
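The two checks described above could be quantified along the following lines. This is a sketch under stated assumptions, not the procedure reported here: the abstract does not say which metric the 74% figure refers to, and the network comparison was made by the researchers' own judgment. The function names `agreement_rate` and `edge_jaccard` are hypothetical. The first computes the fraction of test-set answers where the model's grade matches the expert's grade; the second uses the Jaccard index of the two edge sets as one possible similarity measure.

```python
def agreement_rate(predicted, expert):
    """Fraction of answers where the model grade equals the expert grade."""
    if len(predicted) != len(expert):
        raise ValueError("grade lists must have the same length")
    if not expert:
        raise ValueError("grade lists must not be empty")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


def edge_jaccard(edges_a, edges_b):
    """Jaccard index of two sets of undirected concept-concept edges.

    Each edge is a pair of concept names; order within a pair is ignored.
    Returns 1.0 when both networks are empty.
    """
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Made-up grades on a five-level scale, for illustration only.
    print(agreement_rate([1, 2, 3, 4], [1, 2, 0, 4]))
    auto_net = {("gene", "dna"), ("gene", "chromosome")}
    expert_net = {("dna", "gene"), ("gene", "protein")}
    print(edge_jaccard(auto_net, expert_net))
```

A Jaccard index close to 1 would mean the two networks share most of their relationships; it says nothing about edge weights, which would need a separate comparison.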
This result indicates that automatic evaluation can help to quickly analyze the answers to this type of questionnaire, since the concept networks that the web system builds are meaningful without the need to grade all the answers.
Keywords:
Science education, Educational data mining, Learning analytics.