Implementing and Managing a Data Curation Workflow in the Cloud

Rios, Fernando and Ly, Chun (2021) Implementing and Managing a Data Curation Workflow in the Cloud. Journal of eScience Librarianship, 10 (3). ISSN 2161-3974

Abstract

Objective: To increase data quality and ensure compliance with appropriate policies, many institutional data repositories curate data that is deposited into their systems. Here, we present our experience as an academic library implementing and managing a semi-automated, cloud-based data curation workflow for a recently launched institutional data repository. Based on our experiences, we then present management observations intended for data repository managers and technical staff looking to move some or all of their curation services to the cloud.

Methods: We implemented tooling for our curation workflow in a service-oriented manner, making significant use of our data repository platform’s application programming interface (API). With an eye towards sustainability, a guiding development philosophy has been to automate processes following industry best practices while avoiding solutions with high resource needs (e.g., maintenance) and minimizing the risk of becoming locked in to specific tooling.

Results: The initial barrier for implementing a data curation workflow in the cloud was high in comparison to on-premises curation, mainly due to the need to develop in-house cloud expertise. However, compared to the cost for on-premises servers and storage, infrastructure costs have been substantially lower. Furthermore, in our particular case, once the foundation had been established, a cloud approach resulted in increased agility allowing us to quickly automate our workflow as needed.

Conclusions: Workflow automation has put us on a path toward scaling the service, and a cloud-based approach has helped reduce initial costs. However, because cloud-based workflows and automation come with a maintenance overhead, it is important to build tooling that follows software development best practices and can be decoupled from curation workflows to avoid lock-in.

Item Type: Article
Subjects: Open Digi Academic > Multidisciplinary
Depositing User: Unnamed user with email support@opendigiacademic.com
Date Deposited: 16 Feb 2023 10:54
Last Modified: 13 Jun 2024 13:32
URI: http://publications.journalstm.com/id/eprint/183
