CSV Validation for Metadata Wrangling

Date

2015-06-04

Abstract

This lightning talk describes a Python script that validates CSV files against arbitrary sets of rules specified in a schema file. The motivation for creating the tool was that CSV (comma-separated values) files have become a de facto standard for moving data between systems and for batch ingest processes of all kinds. But CSV data can be messy, and problems often appear only when the data are being loaded, after they have passed out of the hands of the librarians who created them and into the hands of systems staff. The tool is intended to empower data creators to validate CSV files against the requirements of the systems for which the data are being prepared, so that they can correct any problems themselves before sending the data along the pipeline.
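The talk does not describe the schema file format or the script's interface, so the following is only a minimal sketch of the general idea, not the csv-validate implementation. It assumes a hypothetical schema expressed as a Python dict, where each column may carry three illustrative rule types: `required` (value must be non-empty), `pattern` (value must fully match a regular expression) and `allowed` (value must come from a fixed set). The names `SCHEMA`, `validate_csv`, `SAMPLE` and the column names are invented for illustration; only the standard library `csv`, `io` and `re` modules are used.

```python
import csv
import io
import re

# Hypothetical schema (NOT the csv-validate schema format): each column
# maps to a dict of illustrative rules.
SCHEMA = {
    "identifier": {"required": True, "pattern": r"[a-z]+:\d+"},
    "title": {"required": True},
    "date": {"pattern": r"\d{4}(-\d{2}){0,2}"},
    "type": {"allowed": {"Image", "Text", "Sound"}},
}


def validate_csv(text, schema):
    """Return a list of (record_number, column, message) problems.

    Record numbers count the header as record 1, so the first data
    row is record 2.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []
    # Report columns that the schema expects but the header lacks.
    for column in schema:
        if column not in header:
            problems.append((1, column, "missing column"))
    for record_number, row in enumerate(reader, start=2):
        for column, rules in schema.items():
            if column not in header:
                continue  # already reported as a missing column
            # Short rows yield None for absent fields; treat as empty.
            value = (row.get(column) or "").strip()
            if not value:
                if rules.get("required"):
                    problems.append((record_number, column, "required value is empty"))
                continue
            pattern = rules.get("pattern")
            if pattern and not re.fullmatch(pattern, value):
                problems.append((record_number, column, f"{value!r} does not match {pattern}"))
            allowed = rules.get("allowed")
            if allowed is not None and value not in allowed:
                problems.append((record_number, column, f"{value!r} is not an allowed value"))
    return problems


# Illustrative input: record 2 is clean, record 3 has three problems.
SAMPLE = """identifier,title,date,type
umd:1,Map of College Park,1924,Image
umd:2,,June 1924,Film
"""

if __name__ == "__main__":
    for record_number, column, message in validate_csv(SAMPLE, SCHEMA):
        print(f"record {record_number}, column {column}: {message}")
```

Run on `SAMPLE`, the sketch flags the empty title, the free-text date and the unlisted type in record 3. Collecting every problem in one pass, rather than stopping at the first, fits the aim stated above of letting data creators correct issues themselves before the file moves down the pipeline.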

Notes

A lightning talk delivered at the Library Research and Innovative Practice Forum, McKeldin Library, June 4, 2015. The tool described is available at http://www.github.com/jwestgard/csv-validate/.