University of Minho (PORTUGAL)
About this paper:
Appears in: EDULEARN21 Proceedings
Publication year: 2021
Pages: 11395-11401
ISBN: 978-84-09-31267-2
ISSN: 2340-1117
doi: 10.21125/edulearn.2021.2374
Conference name: 13th International Conference on Education and New Learning Technologies
Dates: 5-6 July, 2021
Location: Online Conference
The Book of Properties, a voluminous 17th-century manuscript containing a detailed inventory of the rustic and urban properties of the archbishops of Braga, holds abundant information on numerous areas of knowledge. Research on its more than two thousand pages of edited text yields generous results even when done manually, by reading the text or entering search terms, particularly in a classroom context. Discovering all the toponyms and anthroponyms that populate the book, and learning the hundreds of first names, surnames and nicknames that were common at the time and are traditional in Portugal, is just one of the possible lines of research. Because the wealth of the Archbishopric of Braga comprised countless properties throughout the north of the country, extending up to Galicia and reaching the bishopric of Porto and Santarém, the abundance of data is undeniable. This heritage was inventoried with rigorous measurement and characterization of houses and lands, including architectural information of enormous interest, agricultural data, and fauna and flora. By recording how each property adjoins the surrounding lands and houses (its "confrontations"), and by identifying the owners of those neighbouring properties, the book offers students and researchers an important database for a more detailed knowledge of the country and of the Portuguese language. An automatic text annotation (tagging) system makes it possible to extract and use this data to great effect, instantly presenting large amounts of data organized either alphabetically or in the order in which they appear in the codex. It deepens students' knowledge, making it more concrete and dynamic, and is a powerful tool for exploring labels such as names of regions, cities, towns and villages; anthroponyms; types of land and houses; materials; products; food; animals; and trees and plants in the landscape.
Although text annotation can be performed manually, automating it is highly desirable, since manual annotation by specialized personnel such as linguists is expensive in both time and money. Automatic text annotation allows relationships among annotated tags to be established without human intervention, which reduces the time needed for text research and analysis and improves the quality of the results. Over the last two years, we have been developing a document management system designed specifically to hold the information in this codex. To improve text research and analysis and to reveal the relationships between specific pieces of information, we incorporated into the system a set of mechanisms for annotating the document database. These mechanisms make it possible to create a set of relevant, indexed tags, both discovered and established, based on anthroponyms, degrees of kinship, toponyms, properties and their locations, etc. In this way, we were able to maintain a base of tags that indexes the most relevant information contained in the codex. Additionally, based on the specification of the created tags, the annotation mechanisms analyse all the documents contained in the system and, by similarity, suggest a global annotation strategy for these tags, as well as generate a map of tag relationships for discovering similar content. In this work, we present the annotation system and demonstrate its utility by explaining the process of annotating a text and its practical use in research.
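As a rough illustration of the three mechanisms described above (a base of tags, similarity-based annotation suggestions, and a map of tag relationships), the following minimal Python sketch may help. It is not the system's actual implementation, which the abstract does not detail. The tag base, the sample passages, the function names, and the use of plain string similarity (`difflib`) and document-level co-occurrence are all assumptions made for illustration only.

```python
import re
from collections import defaultdict
from difflib import get_close_matches
from itertools import combinations

# Hypothetical tag base: category -> known terms (illustrative, not taken from the codex).
TAG_BASE = {
    "toponym": ["Braga", "Porto"],
    "anthroponym": ["Gonçalves"],
    "property": ["casa", "vinha"],
}

# Invented sample passages standing in for documents in the store.
DOCUMENTS = {
    "doc1": "Casa em Braga que parte com vinha de Pedro Gonçalves",
    "doc2": "Vinha no termo do Porto de Maria Gonsalves",
}

WORD = re.compile(r"\w+")


def annotate(text, tag_base=TAG_BASE):
    """Return exact (case-insensitive) matches as a set of (category, term) pairs."""
    words = {w.lower() for w in WORD.findall(text)}
    return {
        (category, term)
        for category, terms in tag_base.items()
        for term in terms
        if term.lower() in words
    }


def suggest(text, tag_base=TAG_BASE, cutoff=0.8):
    """Suggest tags for words that resemble, but do not equal, a known term.

    Assumption: similarity is approximated by difflib's string ratio,
    which can catch spelling variants of the same name.
    """
    suggestions = set()
    for word in WORD.findall(text):
        for category, terms in tag_base.items():
            lowered = {t.lower(): t for t in terms}
            if word.lower() in lowered:
                continue  # already an exact match, nothing to suggest
            for match in get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff):
                suggestions.add((word, category, lowered[match]))
    return suggestions


def relationship_map(documents, tag_base=TAG_BASE):
    """Map each pair of tags to the set of documents in which both occur."""
    pairs = defaultdict(set)
    for doc_id, text in documents.items():
        for a, b in combinations(sorted(annotate(text, tag_base)), 2):
            pairs[(a, b)].add(doc_id)
    return dict(pairs)
```

Under these assumptions, `annotate(DOCUMENTS["doc1"])` yields the tags found verbatim in the first passage, `suggest(DOCUMENTS["doc2"])` proposes the variant spelling "Gonsalves" as a candidate for the anthroponym tag "Gonçalves", and `relationship_map(DOCUMENTS)` links tags that appear in the same document, a simple stand-in for the tag relationship map mentioned in the text.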
Teaching and Global Research, Annotation Systems, Information Tagging, Textual and Document Stores, Linguistic, Geographic and Sociocultural Research.