THE ROAD TO THE LIBRARY OF BABEL OR HOW TO MANAGE LARGE DIGITAL COLLECTIONS
Metropolitan Autonomous University (MEXICO)
About this paper:
Appears in: ICERI2018 Proceedings
Publication year: 2018
Pages: 1481-1486
ISBN: 978-84-09-05948-5
ISSN: 2340-1095
doi: 10.21125/iceri.2018.1337
Conference name: 11th annual International Conference of Education, Research and Innovation
Dates: 12-14 November, 2018
Location: Seville, Spain
Abstract:
We present a tool and a model for digital archiving management that can be deployed in any situation involving the use of large digital catalogs. This approach applies not only to scientific digital repositories, but also to any organization that must manage collections encompassing a massive number of digital resources.

DSpace is an open source repository software package typically used to create open access repositories for scholarly and/or published digital content. As a digital archiving system, it serves a specific need: the long-term storage, access and preservation of digital content.

The Babel Storage System is a dependable, scalable and flexible software-defined storage system. Its main features include several types of data redundancy, a careful decoupling of data and metadata, a middleware layer that enforces metadata consistency, and its own load-balancing and allocation procedure, which adapts to the number and capacities of the supporting storage devices. It is fully hardware-agnostic and can be deployed over different hardware platforms.
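
To illustrate what an allocation procedure that adapts to the number and capacities of the devices might look like, here is a minimal sketch. It is not Babel's actual procedure, which the abstract does not describe; the function `pick_device`, its parameters and the "least full after the write" rule are all assumptions made for illustration.

```python
from typing import Dict


def pick_device(capacity: Dict[str, int], used: Dict[str, int], size: int) -> str:
    """Hypothetical rule: choose the device that ends up least full, as a fraction.

    capacity and used map device names to bytes; size is the object size in bytes.
    """
    # Keep only devices that can hold the object, scored by fill ratio after the write.
    candidates = {
        name: (used.get(name, 0) + size) / cap
        for name, cap in capacity.items()
        if cap > 0 and used.get(name, 0) + size <= cap
    }
    if not candidates:
        raise RuntimeError("no device has enough free space")
    # The lowest resulting fill ratio wins, so larger devices absorb more data.
    return min(candidates, key=candidates.get)
```

Because the score is a ratio rather than an absolute byte count, adding or removing a device changes the outcome without any change to the rule itself.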

We have built an interface that connects the DSpace server with Babel. Preserving digital catalogs is the core business of any repository. Nevertheless, in our view, the amount of information to be preserved imposes a structure that divides storage into two main categories: primary and secondary. Primary storage is where documents are initially allocated; these devices are required to offer low transfer latency. In contrast, secondary storage supports long-term archiving; these devices must be able to accommodate a massive volume of information.

Depending on the storage policies set by the organization, documents migrate from primary to secondary storage at some point in their lifetimes. This migration prevents the primary from being overloaded, limits its size and allows the use of devices such as solid-state drives, which have very low latencies but do not offer large capacities. We propose using Babel as an alternative for secondary storage. In our solution, each document received at the primary DSpace server is automatically and transparently backed up to Babel. The metadata that describes the collection is also backed up regularly, enabling disaster recovery procedures. Should a document that has been removed from the primary server be required, it is automatically and transparently recovered from Babel.
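
The deposit, migration and recovery workflow described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `TieredRepository` and its methods are hypothetical, the two dictionaries stand in for the DSpace server and Babel, and no real DSpace or Babel interface is used.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TieredRepository:
    """Toy model of a primary store backed by a secondary archive (hypothetical)."""

    primary: Dict[str, bytes] = field(default_factory=dict)    # stands in for the DSpace server
    secondary: Dict[str, bytes] = field(default_factory=dict)  # stands in for Babel

    def deposit(self, doc_id: str, content: bytes) -> None:
        # Every document received at the primary is also copied to the secondary.
        self.primary[doc_id] = content
        self.secondary[doc_id] = content

    def evict(self, doc_id: str) -> None:
        # Policy-driven migration: drop the primary copy, keep the archived one.
        self.primary.pop(doc_id, None)

    def fetch(self, doc_id: str) -> bytes:
        # Serve from the primary when possible; otherwise recover from the secondary.
        if doc_id not in self.primary:
            if doc_id not in self.secondary:
                raise KeyError(doc_id)
            self.primary[doc_id] = self.secondary[doc_id]
        return self.primary[doc_id]
```

A caller only ever uses `fetch`, so whether a document was served from the primary or recovered from the archive is invisible to it, which is the sense in which the recovery is transparent.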

We call this new organization "the closed library model" because, as happens in some libraries, users are not allowed to interact directly with the entire collection, which remains closed to the public for security reasons. Instead, an authorized clerk stores and retrieves documents from the shelves. We built an automatic clerk, which is the only agent authorized to interact with the library of Babel.

We consider that our model offers several advantages for repository management: 1) it supports an agile service for a large number of concurrent users, 2) different collections and catalogs can be managed on the same server, 3) originals are never delivered, and 4) decoupling primary and secondary storage offers high availability and scalability in a cost-effective way.
Keywords:
Digital repositories, archiving, massive storage.