Fedora Commons With Apache Hadoop: A Research Study
The Digital Collections repository at the University of Maryland Libraries is growing and needs a new backend storage system to replace its current filesystem storage. Although Apache Hadoop is not a traditional storage management system, we chose to evaluate it because of its large and growing community and software ecosystem. Additionally, Hadoop’s capabilities for distributed computation could prove useful in providing new kinds of digital object services and maintenance for ever-increasing amounts of data. We tested storage of Fedora Commons data in the Hadoop Distributed File System (HDFS) using an early development version of the Akubra-HDFS interface created by Frank Asseg. This article examines the findings of our research study, which evaluated Fedora-Hadoop integration in the areas of performance, ease of access, security, disaster recovery, and costs.