CSV Validation for Metadata Wrangling
Westgard, Joshua A.
This lightning talk describes a Python script that validates CSV files against arbitrary sets of rules specified in a schema file. The tool was motivated by the fact that CSV (comma-separated values) files have become a de facto standard for moving data between systems and for batch ingest processes of all kinds. But CSV data can be messy, and problems often surface only when the data are being loaded, after they have passed out of the hands of the librarians who created them and into the hands of systems staff. The tool is intended to empower data creators to validate CSV files against the requirements of the target systems, so that they can correct any problems themselves before sending the data along the pipeline.
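
To make the idea concrete, here is a minimal sketch of rule-based CSV validation in Python. It is not the script presented in the talk: the abstract does not describe the schema format, so the `SCHEMA` dictionary layout, the rule keys `required` and `pattern`, the column names, the sample data, and the function name `validate_csv` are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical schema: column name -> rules. The layout and the rule keys
# ("required", "pattern") are assumptions, not the format used by the tool.
SCHEMA = {
    "title": {"required": True},
    "date": {"required": True, "pattern": r"\d{4}(-\d{2}){0,2}"},
    "identifier": {"required": False, "pattern": r"[a-z]+:\d+"},
}

# Made-up sample data with three deliberate problems in rows 3 and 4.
SAMPLE = (
    "title,date,identifier\n"
    "Annual Report,2015-06,umd:1234\n"
    ",20150601,umd:12\n"
    "Field Notes,2014,bad id\n"
)


def validate_csv(handle, schema):
    """Check each row against the schema; return (row, column, message) tuples."""
    errors = []
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []

    # Report required columns that are absent from the header (row 1).
    absent = [col for col in schema if col not in header]
    for col in absent:
        if schema[col].get("required"):
            errors.append((1, col, "required column missing from header"))

    # Data rows start at 2, assuming no field contains an embedded newline.
    for row_number, row in enumerate(reader, start=2):
        for col, rule in schema.items():
            if col in absent:
                continue
            value = (row.get(col) or "").strip()
            if not value:
                if rule.get("required"):
                    errors.append((row_number, col, "required value is empty"))
                continue
            pattern = rule.get("pattern")
            if pattern and not re.fullmatch(pattern, value):
                errors.append(
                    (row_number, col, f"value {value!r} does not match {pattern}")
                )
    return errors


if __name__ == "__main__":
    for row_number, col, message in validate_csv(io.StringIO(SAMPLE), SCHEMA):
        print(f"row {row_number}, column {col}: {message}")
```

Returning a list of row and column locations, rather than stopping at the first failure, fits the stated goal: a data creator can see every problem in one pass and fix the file before handing it to systems staff.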