DATA DISCOVERY AND DISTRIBUTED REPRESENTATION FOR BETTER CULTURAL HERITAGE OBSERVATION AND LEARNING

D. Paneva-Marinova1, J. Stoikov1, L. Pavlova2, A. Nikolova1

1Institute of Mathematics and Informatics, Bulgarian Academy of Sciences (BULGARIA)
2Laboratory of Telematics, Bulgarian Academy of Sciences (BULGARIA)
The objective of this paper is to present a new learning approach that supports the educational process by facilitating more effective exploration of learning content, using automatic data discovery methods that link synonymous learning concepts. The study thus focuses on the content synthesizing activity, striving to deliver solutions for an enhanced learning experience in systems for digital cultural assets, which are not the usual environments for learning activities but preserve a valuable collection of cultural treasures and knowledge for humanity. Automatic data discovery methods, and in particular the method of distributed representation, could produce the required improvements to the users’ learning experience by facilitating the retrieval and analysis of specific data curation representations. The identification of both semantic and syntactic parallels in complex datasets from various data sources is crucial for meeting the requirements for data purity and unambiguity. The concept of this study is to improve data quality by automatically detecting data pairs that represent the same entity and choosing the correct one. This is achieved by using the computational model of distributed representations, which aims to support the meaningful presentation of educational data. Identifying a duplicate value between two tables is an example of Entity Resolution, the task of identifying tuple pairs that represent the same entity. For example, the tuples ⟨Alexander The Great, King of Macedonia⟩ and ⟨Alexander III the Great, Basileus of Macedonia⟩ refer to the same person. In such a scenario the aim is to overcome the need for labeled data when identifying both semantic and syntactic parallels, by utilizing distributed representations of tuples.
As a result, the solution will help the learner to understand unambiguously the data about cultural objects, which, being derived from disparate sources, can be expressed with many synonyms. Moreover, the actions that are part of the presented model are diverse enough to be interpreted differently in a specific context and can be combined freely to support more personalized systems, ultimately increasing users’ satisfaction with their interaction with the digital environment.
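
The entity resolution idea described above can be illustrated with a minimal sketch. It is not the model used in this study: it assumes a hashed character-trigram vector as a simple stand-in for a learned distributed representation, and the names `embed_tuple`, `cosine`, `best_match` and the constant `DIM` are hypothetical. Such a stand-in needs no labeled data and captures only syntactic parallels (shared substrings such as "Alexander" and "Macedonia"); recognising semantic parallels such as King and Basileus would require embeddings learned from a corpus.

```python
import hashlib
import math

DIM = 1024  # assumed vector size; any reasonably large value works


def _bucket(ngram: str) -> int:
    """Map a character n-gram to a stable index in [0, DIM)."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DIM


def embed_tuple(values, n=3):
    """Return a unit-length vector built from the character n-grams
    of every attribute value in the tuple."""
    vec = [0.0] * DIM
    for value in values:
        text = f" {value.lower()} "  # pad so word boundaries count
        for i in range(len(text) - n + 1):
            vec[_bucket(text[i:i + n])] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query, candidates):
    """Return the candidate tuple most similar to the query tuple."""
    q = embed_tuple(query)
    return max(candidates, key=lambda c: cosine(q, embed_tuple(c)))


if __name__ == "__main__":
    query = ("Alexander The Great", "King of Macedonia")
    candidates = [
        ("Alexander III the Great", "Basileus of Macedonia"),
        ("Cleopatra VII", "Queen of Egypt"),
    ]
    print(best_match(query, candidates))
```

In practice a similarity threshold would decide whether the best candidate is a true duplicate or merely the least dissimilar record; the choice of that threshold is left open here.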