MAKING LEARNING MATERIAL SMARTER (WITHOUT GENERATIVE AI)
1 LINK srl - Roma (ITALY)
2 Sapienza, Università di Roma (ITALY)
3 National Technical University of Athens, NTUA (GREECE)
About this paper:
Appears in: INTED2025 Proceedings
Publication year: 2025
Pages: 1519-1528
ISBN: 978-84-09-70107-0
ISSN: 2340-1079
doi: 10.21125/inted.2025.0469
Conference name: 19th International Technology, Education and Development Conference
Dates: 3-5 March, 2025
Location: Valencia, Spain
Abstract:
In a sequence of EU-funded projects, culminating in Erasmus+ We-Collab, we developed a toolkit for text analysis and applied it to the analysis and enrichment of “traditional” scientific and educational documents. The analysis mainly addresses lexicon level and lexicon variety, sentence structure and text complexity, the contexts in which words are used, the “cohesion” of contiguous paragraphs, and the “coherence” of entire documents. Document enrichment tools support the management of multilingual glossaries and the annotation of texts with links to glossary entries and to online lexico-semantic resources.

The text analysis toolkit was developed as a collection of open-source libraries, mostly available as Python packages on GitHub, and its functions have been tested for seven languages. Multilingual support, with a clear, common architecture, is based mainly on spaCy, a modern natural language processing library; spaCy includes statistical language models for some 20 languages, much smaller than those of fashionable generative AI systems but more focused on specific language features.
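As a minimal illustration (not the toolkit's actual code), two of the metrics mentioned above, lexical variety (type/token ratio) and mean sentence length, can be computed with spaCy. A blank pipeline plus the rule-based "sentencizer" component avoids downloading a statistical model; the real toolkit would presumably load a full language model instead.

```python
import spacy

def text_metrics(text: str, lang: str = "en") -> dict:
    """Hypothetical helper: basic lexical and sentence statistics."""
    nlp = spacy.blank(lang)       # tokenizer only, no model download
    nlp.add_pipe("sentencizer")   # rule-based sentence segmentation
    doc = nlp(text)
    words = [t.text.lower() for t in doc if t.is_alpha]
    sents = list(doc.sents)
    return {
        "tokens": len(words),
        "types": len(set(words)),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "mean_sentence_length": len(words) / max(len(sents), 1),
    }

m = text_metrics("The cat sat. The cat slept on the mat.")
```

The same function works unchanged for any language spaCy can tokenize, which is the kind of common multilingual architecture the abstract describes.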

We collected, adapted and extended additional linguistic resources to support functions at a higher level than those natively provided by spaCy, and integrated them with an interactive interface that extends our learning platform. Target users include curriculum designers, content creators, teachers and students: teachers may want to assess the suitability of text materials for a target audience or an educational goal, while a learner could use some functions to understand a text better, for example to find or guess the meaning of unknown terms.
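The kind of enrichment described here, linking terms in a text to glossary entries, can be sketched as follows. This is a hypothetical helper, not the toolkit's real API: it annotates the first occurrence of each glossary term with a link to its entry, matching longer terms before shorter ones.

```python
import re

def annotate(text: str, glossary: dict[str, str]) -> str:
    """glossary maps each term to the URL of its glossary entry."""
    # Longest terms first, so "discourse analysis" wins over "discourse".
    for term in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        text = pattern.sub(
            lambda m: f'<a href="{glossary[term]}">{m.group(0)}</a>',
            text,
            count=1,  # annotate only the first occurrence
        )
    return text

html = annotate(
    "Discourse analysis studies language in use.",
    {"discourse analysis": "https://example.org/glossary#da"},
)
```

A real implementation would match lemmas rather than surface strings, so that inflected forms of a term are also recognized.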

Experimentation with a few teachers was conducted on curricular materials in different disciplinary areas, such as marketing and linguistics, using Spanish, English, Croatian and Italian documents. We converted to a common standard format the multilingual glossaries in Spanish, English and Croatian already developed by teachers at partner universities, and we enriched a lengthy article in Spanish and an entire textbook in Croatian with references to glossaries and to BabelNet nodes. BabelNet is the largest multilingual encyclopedic dictionary available online: it integrates, in a single lexico-semantic network, information from many other open resources, such as WordNet, Wikipedia, Wikidata and VerbAtlas.

The Spanish article was annotated with references to a trilingual glossary on Discourse Analysis and to BabelNet nodes tagged with the domains “Language and Linguistics”, “Law and Crime” and “Philosophy and Psychology”. The chapters of the Croatian marketing textbook were pre-processed and stored as the elements of a “corpus”; they were then annotated with references to a bilingual glossary on marketing and to BabelNet nodes tagged with six relevant domains; finally, the lexicon used in a sequence of chapters was compared, to compute the rate of occurrence of new terms.
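The chapter-by-chapter comparison can be sketched as follows (assumed logic, not the project's code): for each chapter, the rate of new terms is the share of its vocabulary not seen in any earlier chapter.

```python
def new_term_rates(chapters: list[list[str]]) -> list[float]:
    """For each chapter (a list of terms), return the fraction of its
    vocabulary that did not occur in any preceding chapter."""
    seen: set[str] = set()
    rates: list[float] = []
    for terms in chapters:
        vocab = set(terms)
        new = vocab - seen          # terms not seen in earlier chapters
        rates.append(len(new) / max(len(vocab), 1))
        seen |= vocab
    return rates

rates = new_term_rates([
    ["market", "price", "demand"],
    ["price", "demand", "brand", "segment"],
])
```

A falling rate across chapters suggests the text keeps reusing its established terminology, while a persistently high rate signals a steady influx of new terms for the learner to absorb.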

Experimentation with other teachers and students has started. Teachers are expected to design new learning activities that exploit the smarter materials, while students will be guided to use the functions that facilitate text comprehension, with a focus on technical terms and their contexts of use. Students could also exploit the links to powerful but little-known resources like BabelNet to better organize domain ontologies and terminologies in their minds and relate them to their mother tongues.
Keywords:
Text Analysis Toolkit, Multilingual Glossaries, Document Cohesion, spaCy NLP Library, BabelNet Integration, Educational Document Enrichment.