Breve lets you assign a data type to each column and immediately validates the values against those types. This is excellent! However, once the dataset has been cleaned, the only output seems to be the cleaned CSV. I believe this tool would be even more useful if the type information created through Breve were recorded using JSON Table Schema and the data exported as a Tabular Data Package. Likewise, on import, the column types could be set automatically from the schema and validation rules expressed in the Data Package format.
A Data Package provides a minimal "container" for transporting any kind of data. It is designed for extension to allow publishers to add additional constraints on the format and type of data and metadata.
Concretely, you can create a Data Package by placing a specially formatted file, datapackage.json, in the directory containing the files that comprise your dataset. Given a dataset called dataset.csv that looks like this:
a,b,c
1,2,3
4,5,6
A very simple datapackage.json to accompany the unaltered CSV would look like this:
{
  "name": "my-first-dataset",
  "title": "My First Dataset",
  "resources": [
    {
      "path": "dataset.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "a",
            "type": "integer"
          },
          {
            "name": "b",
            "type": "integer"
          },
          {
            "name": "c",
            "type": "integer"
          }
        ]
      }
    }
  ]
}
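To make the export side concrete, here is a minimal sketch of how a descriptor like the one above could be generated from per-column type assignments. The function name `build_descriptor` and the `column_types` mapping are hypothetical; I'm assuming Breve can expose its column types as a simple name-to-type mapping, which may not match how it stores them internally.

```python
import json


def build_descriptor(name, title, csv_path, column_types):
    """Build a minimal Data Package descriptor for a single CSV resource.

    column_types is an assumed input: an ordered mapping of
    column name -> JSON Table Schema type (e.g. "integer", "string").
    """
    fields = [{"name": col, "type": typ} for col, typ in column_types.items()]
    return {
        "name": name,
        "title": title,
        "resources": [
            {"path": csv_path, "format": "csv", "schema": {"fields": fields}}
        ],
    }


descriptor = build_descriptor(
    "my-first-dataset",
    "My First Dataset",
    "dataset.csv",
    {"a": "integer", "b": "integer", "c": "integer"},
)

# Serialize; writing this string to datapackage.json next to the CSV
# would complete the package.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Saving `descriptor_json` as datapackage.json alongside the cleaned CSV would be enough to turn the existing export into a Data Package.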
The data types you support would all be expressible via the JSON Table Schema language using a combination of type, format, and constraints per field:
http://specs.frictionlessdata.io/json-table-schema/#field-descriptors
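As a rough illustration of how type, format, and constraints combine in a field descriptor, and how they could drive validation on import, here is a small sketch. The field names (`age`, `code`, `contact`) and the `check_value` helper are made up for this example, and the checker covers only a tiny subset of the spec (integer parsing plus the `required`, `minimum`, `maximum`, and `pattern` constraints); `format` is shown in a descriptor but not checked.

```python
import re

# Hypothetical field descriptors using type, format, and constraints.
age_field = {
    "name": "age",
    "type": "integer",
    "constraints": {"required": True, "minimum": 0, "maximum": 150},
}
code_field = {
    "name": "code",
    "type": "string",
    "constraints": {"pattern": "[A-Z]{3}"},
}
contact_field = {"name": "contact", "type": "string", "format": "email"}


def check_value(field, raw):
    """Return a list of problems for one raw CSV cell.

    Illustrative only: handles the integer type and the required,
    minimum, maximum, and pattern constraints; ignores format.
    """
    constraints = field.get("constraints", {})
    if raw == "":
        # An empty cell is only a problem when the field is required.
        return ["missing required value"] if constraints.get("required") else []
    problems = []
    if field.get("type") == "integer":
        try:
            value = int(raw)
        except ValueError:
            return ["not an integer"]
        if "minimum" in constraints and value < constraints["minimum"]:
            problems.append("below minimum")
        if "maximum" in constraints and value > constraints["maximum"]:
            problems.append("above maximum")
    if "pattern" in constraints and not re.fullmatch(constraints["pattern"], raw):
        problems.append("does not match pattern")
    return problems


print(check_value(age_field, "42"))
print(check_value(age_field, "-1"))
print(check_value(code_field, "abc"))
```

In practice one of the existing Data Package libraries would be a better fit than hand-rolled checks like these; the point is only that each of Breve's column types could map to one such descriptor.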
We're building an ecosystem of tools and integrations for reading Data Packages in tools already in use today: http://frictionlessdata.io/about/. We can definitely assist with this integration.

