INTEGRATING TEXT MINING AND CITATION ANALYSIS IN THE DECISION-MAKING PROCESS FOR LIBRARY COLLECTIONS

L. Illescas1, D. Sucozhanay2, L. Siguenza-Guzman3

1Universidad de Cuenca, Facultad de Filosofía (ECUADOR)
2Universidad de Cuenca, Facultad de Ciencias Económicas y Administrativas y Departamento de Espacio y Población (ECUADOR)
3Universidad de Cuenca, Departamento de Ciencias de la Computación (ECUADOR)

In recent years, scientific production in Ecuador has increased considerably as a result of government policies designed to improve the quality of education. Higher Education Institutions (HEIs) have also tried to raise the standard of research and scientific production, under pressure to publish in high-impact journals. However, research and scientific production can flourish only in an environment where scientific knowledge is readily accessible. Consequently, Ecuadorian universities have increased their budgets approximately fivefold in order to provide access to digital databases and other electronic resources. Unfortunately, because of the high cost of subscriptions to scientific journals, these efforts have not achieved even the minimum expected level of access to knowledge. Decision making in library collection development is therefore a very important process that deserves careful attention.

At the University of Cuenca, funds for library collection development are generally allocated by faculty; each faculty decides which resources to subscribe to or cancel, usually on the basis of historical spending patterns, electronic journal usage data and, in some cases, its own finances and priorities. Nevertheless, these indicators have been the subject of recurring debate because their relation to the current and future information needs of library users is unclear. More research is required to construct accurate indicators of library collection performance and of the growing needs of collection development.

The aim of this article is to gain a deep insight into the local use of the collection, based on the references cited in scientific articles published by authors affiliated with the University of Cuenca. To achieve this goal, a set of publications from the last 10 years was analysed. The full text and reference list of each article were extracted using text mining methods. Text parsing and text filtering techniques were used to extract data from each text corpus. Each word was classified within a text tree in which, through the recognition of entities and the extraction of relationships, a data structure was constructed. This structure allowed the application of data mining techniques such as clustering, decision trees and classification methods.
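
The extraction and counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the assumption that each reference list starts at a line reading "References" with entries numbered "[1]", "[2]", and so on, and the journal titles and sample text (all invented) are hypothetical choices made only for illustration.

```python
import re
from collections import Counter

# Assumption: the reference list starts at a line containing only this heading.
REFERENCES_HEADING = re.compile(
    r"^[ \t]*(?:references|bibliography)[ \t]*$", re.IGNORECASE | re.MULTILINE
)
# Assumption: each entry starts on its own line with a marker such as "[12]".
ENTRY_MARKER = re.compile(r"^\s*\[\d+\]\s*", re.MULTILINE)


def extract_reference_entries(full_text):
    """Return the reference entries found after the heading, one string each."""
    heading = REFERENCES_HEADING.search(full_text)
    if heading is None:
        return []
    section = full_text[heading.end():]
    parts = ENTRY_MARKER.split(section)
    # Collapse line breaks and repeated spaces inside each entry.
    return [" ".join(part.split()) for part in parts if part.strip()]


def count_cited_journals(entries, journal_titles):
    """Count how many entries mention each journal title (case-insensitive)."""
    counts = Counter()
    for entry in entries:
        lowered = entry.lower()
        for title in journal_titles:
            if title.lower() in lowered:
                counts[title] += 1
    return counts


# Invented sample text and journal titles, used only to exercise the sketch.
SAMPLE_TEXT = """Introduction
Some body text citing prior work [1], [2].

References
[1] A. Author, "First paper," Fictional Journal of Water Studies, vol. 1, 2015.
[2] B. Writer, "Second paper," Imaginary Review of Economics, vol. 2, 2016.
[3] C. Scholar, "Third paper," Fictional Journal of Water Studies,
    vol. 3, 2017.
"""
JOURNAL_TITLES = ["Fictional Journal of Water Studies", "Imaginary Review of Economics"]

entries = extract_reference_entries(SAMPLE_TEXT)
citation_counts = count_cited_journals(entries, JOURNAL_TITLES)
print(citation_counts.most_common())
```

Matching against a fixed list of titles keeps the sketch simple; real reference lists vary in citation style and journal abbreviations, so a production pipeline would need more robust parsing and title normalisation.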

By integrating text mining and citation analysis into the decision-making process for library collections, the authors aim to provide a dynamic solution that helps library managers make economic decisions based on a perspective of users' needs that is "as realistic as possible".