A library for working with Table Schema.
Table class for working with data and schema
Schema class for working with schemas
Field class for working with schema fields
validate function for validating schema descriptors
infer function that creates a schema based on a data sample
To use the library with webpack, please replicate the node configuration from webpack.config.js: https://github.com/frictionlessdata/tableschema-js/blob/master/webpack.config.js
The package uses semantic versioning, which means that major versions could include breaking changes. It's highly recommended to specify a tableschema version range in your package.json file, e.g. tableschema: ^1.0, which will be added by default by npm install --save.
$ npm install tableschema
<script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>
Let's start with a simple example for Node.js:
const {Table} = require('tableschema')

const main = async () => {
  const table = await Table.load('data.csv')
  await table.infer() // infer a schema
  await table.read({keyed: true}) // read the data
  await table.schema.save('schema.json') // save the schema
  await table.save('data.csv') // save the data
}

main()
And for the browser: after the script is registered, the library is available as a global variable tableschema:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>tableschema-js</title>
</head>
<body>
<script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>
<script>
const main = async () => {
const table = await tableschema.Table.load('https://raw.githubusercontent.com/frictionlessdata/datapackage-js/master/data/data.csv')
const rows = await table.read()
document.body.innerHTML += `<div>${table.headers}</div>`
for (const row of rows) {
document.body.innerHTML += `<div>${row}</div>`
}
}
main()
</script>
</body>
</html>
A table is a core concept in the tabular data world. It represents data with metadata (Table Schema). Let's see how we can use it in practice.
Suppose we have a local CSV file. The data could also be inline or a remote link; all of these are supported by the Table class (except local files for in-browser usage, of course). But let's say it's data.csv for now:
city,location
london,"51.50,-0.11"
paris,"48.85,2.30"
rome,N/A
Let's create and read a table. We use the static Table.load method and the table.read method with the keyed option to get an array of keyed rows:
const table = await Table.load('data.csv')
table.headers // ['city', 'location']
await table.read({keyed: true})
// [
// {city: 'london', location: '51.50,-0.11'},
// {city: 'paris', location: '48.85,2.30'},
// {city: 'rome', location: 'N/A'},
// ]
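The keyed option only changes the shape of each row. As a rough illustration, here is a plain-JavaScript sketch of the three row shapes documented for table.read (base, keyed and extended). toKeyed and toExtended are hypothetical helpers, not part of tableschema, and the extended row numbering (starting at 2 because row 1 holds the headers) is an assumption of this sketch:

```javascript
// Hypothetical helpers (not part of tableschema) that reshape base rows
// into the keyed and extended forms described for table.read().

// Base row [value1, value2] -> keyed row {header1: value1, header2: value2}
function toKeyed(headers, row) {
  const keyed = {}
  headers.forEach((header, index) => {
    keyed[header] = row[index]
  })
  return keyed
}

// Base row -> extended row [rowNumber, [header1, header2], [value1, value2]]
function toExtended(headers, row, rowNumber) {
  return [rowNumber, headers.slice(), row.slice()]
}

const headers = ['city', 'location']
const rows = [
  ['london', '51.50,-0.11'],
  ['paris', '48.85,2.30'],
  ['rome', 'N/A'],
]

const keyedRows = rows.map((row) => toKeyed(headers, row))
// Assumption: row 1 is the header row, so the first data row is number 2
const extendedRows = rows.map((row, index) => toExtended(headers, row, index + 2))
```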
As we can see, our locations are just strings, but they should be geopoints. Also, Rome's location is not available, yet it's just an N/A string instead of JavaScript null. First we have to infer a Table Schema:
await table.infer()
table.schema.descriptor
// { fields:
// [ { name: 'city', type: 'string', format: 'default' },
// { name: 'location', type: 'geopoint', format: 'default' } ],
// missingValues: [ '' ] }
await table.read({keyed: true})
// Fails with a data validation error
Let's fix the unavailable location. There is a missingValues property in the Table Schema specification. As a first try, we set missingValues to N/A in table.schema.descriptor. The schema descriptor can be changed in place, but all changes should be committed with table.schema.commit():
table.schema.descriptor['missingValues'] = 'N/A'
table.schema.commit()
table.schema.valid // false
table.schema.errors
// Error: Descriptor validation error:
// Invalid type: string (expected array)
// at "/missingValues" in descriptor and
// at "/properties/missingValues/type" in profile
As good citizens, we decided to check the schema descriptor's validity. And it's not valid! We should use an array for the missingValues property. Also, don't forget to include an empty string as a missing value:
table.schema.descriptor['missingValues'] = ['', 'N/A']
table.schema.commit()
table.schema.valid // true
All good. It looks like we're ready to read our data again:
await table.read({keyed: true})
// [
// {city: 'london', location: [51.50,-0.11]},
// {city: 'paris', location: [48.85,2.30]},
// {city: 'rome', location: null},
// ]
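Conceptually, the location column is now handled in two steps: values listed in missingValues become null, and the rest are parsed into a pair of numbers. Below is a minimal sketch of that idea, assuming the default geopoint format is a comma-separated pair of numbers. castGeopoint is a hypothetical helper, not the library's code, and it skips any coordinate range checks a real implementation may perform:

```javascript
// Hypothetical sketch (not the tableschema implementation): missing values
// become null, and an "x,y" string becomes a pair of numbers kept in the
// order they appear in the string.
function castGeopoint(value, missingValues) {
  if (missingValues.includes(value)) {
    return null
  }
  const parts = value.split(',').map((part) => part.trim())
  if (parts.length !== 2 || parts.some((part) => part === '')) {
    throw new Error(`Cannot cast "${value}" to geopoint`)
  }
  const pair = parts.map(Number)
  if (pair.some((number) => !Number.isFinite(number))) {
    throw new Error(`Cannot cast "${value}" to geopoint`)
  }
  return pair
}

const missingValues = ['', 'N/A']
const locations = ['51.50,-0.11', '48.85,2.30', 'N/A'].map((value) =>
  castGeopoint(value, missingValues))
```

With the original missingValues of [''] only, castGeopoint('N/A', ['']) throws, which mirrors why the first read failed.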
Now we see that locations are arrays of numbers, and Rome's missing location is a native JavaScript null.
And because there were no errors while reading the data, we can be sure that our data is valid against our schema. Let's save it:
await table.schema.save('schema.json')
await table.save('data.csv')
Our data.csv looks the same because it has been stringified back to CSV format. But now we also have schema.json:
{
"fields": [
{
"name": "city",
"type": "string",
"format": "default"
},
{
"name": "location",
"type": "geopoint",
"format": "default"
}
],
"missingValues": [
"",
"N/A"
]
}
If we decide to improve it even more, we can update the schema file and then open the table again, this time providing a schema path and iterating through the data using Node streams:
const table = await Table.load('data.csv', {schema: 'schema.json'})
const stream = await table.iter({stream: true})
stream.on('data', (row) => {
// handle row ['london', [51.50,-0.11]] etc
// keyed/extended/cast supported in a stream mode too
})
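Node Readable streams are also async iterables, so a row stream like the one above can presumably be consumed with for await...of instead of a 'data' handler. The sketch below uses Readable.from from Node's stream module as a stand-in for table.iter({stream: true}) (an assumption made so the example runs without a CSV file); collectCities is a hypothetical helper:

```javascript
const {Readable} = require('stream')

// Stand-in for the stream returned by table.iter({stream: true}); it is
// built from in-memory rows so the sketch runs without tableschema.
const rowStream = Readable.from([
  ['london', [51.5, -0.11]],
  ['paris', [48.85, 2.3]],
  ['rome', null],
])

// Readable streams are async iterable, so rows can be consumed with
// for await...of instead of stream.on('data', ...).
async function collectCities(stream) {
  const cities = []
  for await (const row of stream) {
    cities.push(row[0])
  }
  return cities
}
```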
That was only a basic introduction to the Table class. To learn more, take a look at the Table class API reference.
A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source: a URL to a JSON file or a JSON object. The schema is validated on creation (see validate below). By default, validation errors are stored in schema.errors, but in strict mode they are raised immediately.
Let's create a blank schema. It's not valid because the descriptor.fields property is required by the Table Schema specification:
const schema = await Schema.load({})
schema.valid // false
schema.errors
// Error: Descriptor validation error:
// Missing required property: fields
// at "" in descriptor and
// at "/required/0" in profile
To avoid writing a schema descriptor by hand, we use the schema.infer method to infer the descriptor from the given data:
schema.infer([
['id', 'age', 'name'],
['1','39','Paul'],
['2','23','Jimmy'],
['3','36','Jane'],
['4','28','Judy'],
])
schema.valid // true
schema.descriptor
//{ fields:
// [ { name: 'id', type: 'integer', format: 'default' },
// { name: 'age', type: 'integer', format: 'default' },
// { name: 'name', type: 'string', format: 'default' } ],
// missingValues: [ '' ] }
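To give an idea of what inference involves, here is a deliberately simplified sketch that only distinguishes integer from string columns; the real inference considers many more types and formats. inferDescriptor is a hypothetical function, not the library's algorithm:

```javascript
// Hypothetical, simplified inference (not the tableschema algorithm): a column
// is 'integer' if every non-missing value looks like an integer, else 'string'.
function inferDescriptor(rows, missingValues = ['']) {
  const [headers, ...data] = rows
  const fields = headers.map((name, index) => {
    const values = data
      .map((row) => row[index])
      .filter((value) => !missingValues.includes(value))
    const isInteger = values.length > 0 &&
      values.every((value) => /^[-+]?\d+$/.test(value))
    return {name, type: isInteger ? 'integer' : 'string', format: 'default'}
  })
  return {fields, missingValues}
}

const descriptor = inferDescriptor([
  ['id', 'age', 'name'],
  ['1', '39', 'Paul'],
  ['2', '23', 'Jimmy'],
  ['3', '36', 'Jane'],
  ['4', '28', 'Judy'],
])
```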
Now we have an inferred schema and it's valid. We can cast a data row against our schema: we provide string input, and the output is cast accordingly:
schema.castRow(['5', '66', 'Sam'])
// [ 5, 66, 'Sam' ]
But if we try to provide a missing value for the age field, the cast will fail, because for now the only allowed missing value is an empty string. Let's update our schema:
schema.castRow(['6', 'N/A', 'Walt'])
// Cast error
schema.descriptor.missingValues = ['', 'N/A']
schema.commit()
schema.castRow(['6', 'N/A', 'Walt'])
// [ 6, null, 'Walt' ]
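The effect of missingValues on row casting can be sketched in plain JavaScript: a value listed in missingValues becomes null before any type casting is attempted. The castRow function and casters map below are hypothetical (they are not the Schema method) and cover only the integer and string types:

```javascript
// Hypothetical sketch of row casting (not the tableschema implementation).
const castInteger = (value) => {
  if (!/^[-+]?\d+$/.test(value)) {
    throw new Error(`Cannot cast "${value}" to integer`)
  }
  return Number(value)
}

const casters = {
  integer: castInteger,
  string: (value) => String(value),
}

// Values listed in missingValues become null; others are cast by field type
function castRow(descriptor, row) {
  return descriptor.fields.map((field, index) => {
    const value = row[index]
    if (descriptor.missingValues.includes(value)) {
      return null
    }
    return casters[field.type](value)
  })
}

const descriptor = {
  fields: [
    {name: 'id', type: 'integer'},
    {name: 'age', type: 'integer'},
    {name: 'name', type: 'string'},
  ],
  missingValues: [''],
}
```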
We can save the schema to a local file and continue working at any time by loading it from that file:
await schema.save('schema.json')
// later, e.g. in a new session, load it back from the file:
const restoredSchema = await Schema.load('schema.json')
That was only a basic introduction to the Schema class. To learn more, take a look at the Schema class API reference.
The Field class represents a field in the schema.
Data values can be cast to native JavaScript types. Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema.
const {Field} = require('tableschema')

const field = new Field({
  name: 'birthday',
  type: 'date',
  format: 'default',
  constraints: {
    required: true,
    minimum: '2015-05-30',
  },
})
The following code will not raise an exception, even though our date is earlier than the field's minimum constraint, because we do not check the constraints of the field descriptor:
var dateType = field.castValue('2014-05-29')
The following example will raise an exception, because we set the 'skip constraints' flag to false and our date is earlier than allowed by the field's minimum constraint. An exception will also be raised when trying to cast values that are not in date format, or empty values:
try {
var dateType = field.castValue('2014-05-29', false)
} catch(e) {
// uh oh, something went wrong
}
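What a minimum constraint means for a date field can be sketched as follows: the value is cast first and then compared against the cast minimum. castDate is a hypothetical helper, not the library's code; it handles only the default YYYY-MM-DD format and ignores the required constraint:

```javascript
// Hypothetical sketch of a date cast with a minimum constraint (not the
// tableschema implementation). Only the default YYYY-MM-DD format is handled.
function castDate(value, constraints = {}) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value)
  if (!match) {
    throw new Error(`Cannot cast "${value}" to date`)
  }
  const [, year, month, day] = match.map(Number)
  const date = new Date(Date.UTC(year, month - 1, day))
  // Reject impossible dates such as 2015-02-30, which Date would roll over
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    throw new Error(`Cannot cast "${value}" to date`)
  }
  if (constraints.minimum !== undefined && date < castDate(constraints.minimum)) {
    throw new Error(`Date ${value} is less than minimum ${constraints.minimum}`)
  }
  return date
}
```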
Values that can't be cast will raise an Error exception.
Casting a value that doesn't meet the constraints will also raise an Error exception.
Available types, formats and resultant value of the cast:
| Type | Formats | Casting result |
| --- | --- | --- |
| any | default | Any |
| array | default | Array |
| boolean | default | Boolean |
| date | default, any, \<PATTERN> | Date |
| datetime | default, any, \<PATTERN> | Date |
| duration | default | moment.Duration |
| geojson | default, topojson | Object |
| geopoint | default, array, object | [Number, Number] |
| integer | default | Number |
| number | default | Number |
| object | default | Object |
| string | default, uri, email, binary | String |
| time | default, any, \<PATTERN> | Date |
| year | default | Number |
| yearmonth | default | [Number, Number] |
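To make a few rows of this table concrete, here is a sketch of casting for the boolean, year and yearmonth types. The helpers are hypothetical, not the library's code; the boolean true/false values are, to our understanding, the Table Schema defaults (trueValues and falseValues), and only the default formats are handled:

```javascript
// Hypothetical casters (not the tableschema code) for three rows of the table.
const TRUE_VALUES = ['true', 'True', 'TRUE', '1']
const FALSE_VALUES = ['false', 'False', 'FALSE', '0']

function castBoolean(value) {
  if (TRUE_VALUES.includes(value)) return true
  if (FALSE_VALUES.includes(value)) return false
  throw new Error(`Cannot cast "${value}" to boolean`)
}

// year: four digits -> Number
function castYear(value) {
  if (!/^\d{4}$/.test(value)) throw new Error(`Cannot cast "${value}" to year`)
  return Number(value)
}

// yearmonth: YYYY-MM -> [Number, Number]
function castYearmonth(value) {
  const match = /^(\d{4})-(\d{2})$/.exec(value)
  if (!match || Number(match[2]) < 1 || Number(match[2]) > 12) {
    throw new Error(`Cannot cast "${value}" to yearmonth`)
  }
  return [Number(match[1]), Number(match[2])]
}
```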
validate() checks whether a schema is a valid Table Schema according to the specification. It does not validate data against a schema.
Given a schema descriptor, validate returns a Promise that resolves to a validation object:
const {validate} = require('tableschema')

const main = async () => {
  const {valid, errors} = await validate('schema.json')
  for (const error of errors) {
    // inspect Error objects
  }
}

main()
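The real validate checks a descriptor against the full Table Schema profile. Purely as an illustration of the {valid, errors} result shape, here is a hypothetical checkDescriptor that covers only the two mistakes seen earlier in this document (a missing fields property and a non-array missingValues); its messages are modelled on those examples, not produced by the library:

```javascript
// Hypothetical, heavily simplified descriptor check (not tableschema's
// validate). It returns the same {valid, errors} shape.
function checkDescriptor(descriptor) {
  const errors = []
  if (!('fields' in descriptor)) {
    errors.push(new Error('Missing required property: fields'))
  } else if (!Array.isArray(descriptor.fields)) {
    errors.push(new Error(`Invalid type: ${typeof descriptor.fields} (expected array)`))
  }
  if ('missingValues' in descriptor && !Array.isArray(descriptor.missingValues)) {
    errors.push(new Error(`Invalid type: ${typeof descriptor.missingValues} (expected array)`))
  }
  return {valid: errors.length === 0, errors}
}
```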
Given a data source and headers, infer returns a Table Schema as a JSON object based on the data values.
Given the data file, example.csv:
id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy
Call infer with headers and values from the data file:
const {infer} = require('tableschema')
const descriptor = await infer('example.csv') // inside an async function
The descriptor variable is now a JSON object:
{
fields: [
{
name: 'id',
title: '',
description: '',
type: 'integer',
format: 'default'
},
{
name: 'age',
title: '',
description: '',
type: 'integer',
format: 'default'
},
{
name: 'name',
title: '',
description: '',
type: 'string',
format: 'default'
}
]
}
Table representation
table.headers ⇒ Array.<string>
Headers
Returns:
Array.<string> - data source headers
table.schema ⇒ Schema
Schema
Returns:
Schema - table schema instance
table.iter(keyed, extended, cast, forceCast, relations, stream) ⇒ AsyncIterator | Stream
Iterate through the table data and emit rows cast based on the table schema (async for loop). With the stream flag, a Node stream is returned instead of an async iterator. Data casting can be disabled.
Returns:
AsyncIterator |
Stream - async iterator/stream of rows:
[value1, value2] - base
{header1: value1, header2: value2} - keyed
[rowNumber, [header1, header2], [value1, value2]] - extended
Throws:
TableSchemaError raises any error that occurs in this process
| Param | Type | Description |
| --- | --- | --- |
| keyed | boolean | iterate keyed rows |
| extended | boolean | iterate extended rows |
| cast | boolean | disable data casting if false |
| forceCast | boolean | instead of raising on the first row with a cast error, return an error object in place of the failed row. This allows iterating over the whole data file even if it's not compliant with the schema. Example of output stream: [['val1', 'val2'], TableSchemaError, ['val3', 'val4'], ...] |
| relations | Object | object of foreign key references in the form of {resource1: [{field1: value1, field2: value2}, ...], ...}. If provided, foreign key fields will be checked and resolved to their references |
| stream | boolean | return a Node Readable Stream of table rows |
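With forceCast, a consumer has to tell rows and error objects apart. A minimal sketch, assuming the error objects are Error instances (plain Error objects stand in for TableSchemaError here); splitRows is a hypothetical helper:

```javascript
// Hypothetical sketch: with forceCast, failed rows are replaced by error
// objects, so a consumer can separate good rows from errors.
function splitRows(rows) {
  const valid = []
  const errors = []
  for (const row of rows) {
    if (row instanceof Error) {
      errors.push(row)
    } else {
      valid.push(row)
    }
  }
  return {valid, errors}
}

// Plain Error objects stand in for TableSchemaError in this sketch
const output = [['val1', 'val2'], new Error('Row 3: cast error'), ['val3', 'val4']]
const {valid, errors} = splitRows(output)
```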
table.read(limit) ⇒ Array.<Array> | Array.<Object>
Read the table data into memory. The API is the same as table.iter, except for:
Returns:
Array.<Array> |
Array.<Object> - list of rows:
[value1, value2] - base
{header1: value1, header2: value2} - keyed
[rowNumber, [header1, header2], [value1, value2]] - extended
| Param | Type | Description |
| --- | --- | --- |
| limit | integer | limit of rows to read |
table.infer(limit) ⇒ Object
Infer a schema for the table. It infers a Table Schema from the table data and sets it as table.schema.
Returns:
Object - Table Schema descriptor
| Param | Type | Description |
| --- | --- | --- |
| limit | number | limit on the sample size (number of rows) |
table.save(target) ⇒ Boolean
Save the data source to a local file in CSV format with a , (comma) delimiter.
Returns:
Boolean - true on success
Throws:
TableSchemaError raises an error if there is a problem saving
| Param | Type | Description |
| --- | --- | --- |
| target | string | path where to save the table data |
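For reference, writing rows as comma-delimited CSV mostly comes down to quoting. The sketch below is a hypothetical serializer, not the code table.save uses: it quotes values that contain a comma, double quote or line break, doubles embedded quotes as described in RFC 4180, and uses CRLF line endings (the library's own line endings may differ):

```javascript
// Hypothetical minimal CSV serializer with a comma delimiter.
function toCsvLine(values) {
  return values
    .map((value) => {
      const text = value === null || value === undefined ? '' : String(value)
      // Quote values containing a delimiter, quote or line break
      return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
    })
    .join(',')
}

function toCsv(headers, rows) {
  return [headers, ...rows].map((row) => toCsvLine(row)).join('\r\n') + '\r\n'
}
```

For example, toCsv(['city', 'location'], [['london', '51.50,-0.11']]) quotes the location because it contains a comma, just as in data.csv.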
Table.load(source, schema, strict, headers, parserOptions) ⇒ Table
Factory method to instantiate the Table class. This method is async, so it should be used with the await keyword or as a Promise. If the references argument is provided, foreign keys will be checked on any reading operation.
Returns:
Table - data table class instance
Throws:
TableSchemaError raises any error that occurs in the table creation process
| Param | Type | Description |
| --- | --- | --- |
| source | string \| Array.<Array> \| Stream \| function | data source (one of): - local CSV file (path) - remote CSV file (url) - array of arrays representing the rows - readable stream with CSV file contents - function returning readable stream with CSV file contents |
| schema | string \| Object | data schema in all forms supported by the Schema class |
| strict | boolean | strictness option to pass to the Schema constructor |
| headers | number \| Array.<string> | data source headers (one of): - row number containing headers (source should contain a header row) - array of headers (source should NOT contain a header row) |
| parserOptions | Object | options to be used by the CSV parser. All options are listed at https://csv.js.org/parse/options/. By default, ltrim is true according to the CSV Dialect spec. |
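The two forms of the headers option can be illustrated with a hypothetical splitHeaders helper (not part of tableschema). A number selects the 1-based row holding the headers; an array supplies headers for a source without a header row. The default of 1 is borrowed from schema.infer, and skipping rows above the header row is an assumption of this sketch:

```javascript
// Hypothetical helper (not part of tableschema) mirroring the documented
// headers option.
function splitHeaders(rows, headers = 1) {
  if (Array.isArray(headers)) {
    // Headers supplied explicitly: every row is data
    return {headers, data: rows}
  }
  // 1-based header row; rows above it are skipped (an assumption)
  return {headers: rows[headers - 1], data: rows.slice(headers)}
}
```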
Schema representation
schema.valid ⇒ Boolean
Validation status. It is always true in strict mode.
Returns:
Boolean - returns validation status
schema.errors ⇒ Array.<Error>
Validation errors. It is always empty in strict mode.
Returns:
Array.<Error> - returns validation errors
schema.descriptor ⇒ Object
Descriptor
Returns:
Object - schema descriptor
schema.primaryKey ⇒ Array.<string>
Primary Key
Returns:
Array.<string> - schema primary key
schema.foreignKeys ⇒ Array.<Object>
Foreign Keys
Returns:
Array.<Object> - schema foreign keys
schema.fields ⇒ Array.<Field>
Fields
Returns:
Array.<Field> - schema fields
schema.fieldNames ⇒ Array.<string>
Field names
Returns:
Array.<string> - schema field names
schema.getField(fieldName) ⇒ Field | null
Return a field
Returns:
Field |
null - field instance if exists
| Param | Type |
| --- | --- |
| fieldName | string |
schema.addField(descriptor) ⇒ Field
Add a field
Returns:
Field - added field instance
| Param | Type |
| --- | --- |
| descriptor | Object |
schema.removeField(name) ⇒ Field | null
Remove a field
Returns:
Field |
null - removed field instance if exists
| Param | Type |
| --- | --- |
| name | string |
schema.castRow(row, failFast) ⇒ Array.<Array>
Cast row based on field types and formats.
Returns:
Array.<Array> - cast data row
| Param | Type | Description |
| --- | --- | --- |
| row | Array.<Array> | data row as an array of values |
| failFast | boolean | |
schema.infer(rows, headers) ⇒ Object
Infer and set schema.descriptor based on a data sample.
Returns:
Object - Table Schema descriptor
| Param | Type | Description |
| --- | --- | --- |
| rows | Array.<Array> | array of arrays representing rows |
| headers | integer \| Array.<string> | data sample headers (one of): - row number containing headers (rows should contain a header row) - array of headers (rows should NOT contain a header row) - defaults to 1 |
schema.commit(strict) ⇒ Boolean
Update schema instance if there are in-place changes in the descriptor.
Returns:
Boolean - returns true on success and false if not modified
Throws:
TableSchemaError raises any error that occurs in the process
| Param | Type | Description |
| --- | --- | --- |
| strict | boolean | alter strict mode for further work |
Example
const descriptor = {fields: [{name: 'field', type: 'string'}]}
const schema = await Schema.load(descriptor)
schema.getField('field').type // string
schema.descriptor.fields[0].type = 'number'
schema.getField('field').type // string
schema.commit()
schema.getField('field').type // number
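The commit() behaviour in this example can be modelled with a small hypothetical class, SketchSchema, which is not the tableschema implementation: derived field objects are rebuilt from the descriptor only on construction and on commit(), and commit() returns false when nothing has changed, matching the documented return value:

```javascript
// Hypothetical sketch of the descriptor/commit pattern (not the tableschema
// Schema class): in-place descriptor edits stay invisible until commit().
class SketchSchema {
  constructor(descriptor) {
    this.descriptor = descriptor
    this._snapshot = JSON.stringify(descriptor)
    this._build()
  }

  // Rebuild derived field objects from the current descriptor
  _build() {
    this._fields = this.descriptor.fields.map((field) => ({...field}))
  }

  getField(name) {
    return this._fields.find((field) => field.name === name) || null
  }

  // Returns true if the descriptor changed since the last build, else false
  commit() {
    const current = JSON.stringify(this.descriptor)
    if (current === this._snapshot) {
      return false
    }
    this._snapshot = current
    this._build()
    return true
  }
}

const schema = new SketchSchema({fields: [{name: 'field', type: 'string'}]})
schema.descriptor.fields[0].type = 'number'
```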
schema.save(target) ⇒ boolean
Save schema descriptor to target destination.
Returns:
boolean - returns true on success
Throws:
TableSchemaError raises any error that occurs in the process
| Param | Type | Description |
| --- | --- | --- |
| target | string | path where to save the descriptor |
Schema.load(descriptor, strict) ⇒ Schema
Factory method to instantiate the Schema class. This method is async, so it should be used with the await keyword or as a Promise.
Returns:
Schema - returns schema class instance
Throws:
TableSchemaError raises any error that occurs in the process
| Param | Type | Description |
| --- | --- | --- |
| descriptor | string \| Object | schema descriptor: - local path - remote url - object |
| strict | boolean | flag to alter validation behaviour: - if false, errors will not be raised and all errors will be collected in schema.errors - if true, any validation error will be raised immediately |
Field representation
new Field(descriptor, missingValues)
Constructor to instantiate the Field class.
Returns:
Field - returns field class instance
Throws:
TableSchemaError raises any error that occurs in the process
| Param | Type | Description |
| --- | --- | --- |
| descriptor | Object | schema field descriptor |
| missingValues | Array.<string> | an array of strings representing missing values |
field.name ⇒ string
Field name
field.type ⇒ string
Field type
field.format ⇒ string
Field format
field.required ⇒ boolean
Return true if the field is required
field.constraints ⇒ Object
Field constraints
field.descriptor ⇒ Object
Field descriptor
field.castValue(value, constraints) ⇒ any
Cast value
Returns:
any - cast value
| Param | Type | Description |
| --- | --- | --- |
| value | any | value to cast |
| constraints | Object \| false | |
field.testValue(value, constraints) ⇒ boolean
Check if value can be cast
| Param | Type | Description |
| --- | --- | --- |
| value | any | value to test |
| constraints | Object \| false | |
validate(descriptor) ⇒ Object
This function is async, so it has to be used with the await keyword or as a Promise.
Returns:
Object - returns a {valid, errors} object
| Param | Type | Description |
| --- | --- | --- |
| descriptor | string \| Object | schema descriptor (one of): - local path - remote url - object |
infer(source, headers, options) ⇒ Object
This function is async, so it has to be used with the await keyword or as a Promise.
Returns:
Object - returns schema descriptor
Throws:
TableSchemaError raises any error that occurs in the process
| Param | Type | Description |
| --- | --- | --- |
| source | string \| Array.<Array> \| Stream \| function | data source (one of): - local CSV file (path) - remote CSV file (url) - array of arrays representing the rows - readable stream with CSV file contents - function returning readable stream with CSV file contents |
| headers | Array.<string> | array of headers |
| options | Object | any Table.load options |
Base class for all DataPackage/TableSchema errors.
If there is more than one error, you can get additional information from the error object:
try {
  // some lib action
} catch (error) {
  console.log(error) // you have N cast errors (see error.errors)
  if (error.multiple) {
    // a distinct name avoids referencing the loop variable in its own initializer
    for (const nestedError of error.errors) {
      console.log(nestedError) // cast error M is ...
    }
  }
}
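The multiple/errors pattern can be reproduced with a small hypothetical error class. AggregatedError and castAll below are illustrations, not the library's error classes; here multiple is simply derived from whether any nested errors were collected:

```javascript
// Hypothetical error class (not the library's) that aggregates nested errors
// and exposes multiple/errors properties like those described above.
class AggregatedError extends Error {
  constructor(message, errors = []) {
    super(message)
    this.name = 'AggregatedError'
    this.errors = errors
  }

  get multiple() {
    return this.errors.length > 0
  }
}

// Casts every value to a number, collecting failures instead of stopping early
function castAll(values) {
  const errors = []
  const results = values.map((value) => {
    const number = Number(value)
    if (value === '' || Number.isNaN(number)) {
      errors.push(new Error(`Cannot cast "${value}" to number`))
      return null
    }
    return number
  })
  if (errors.length > 0) {
    throw new AggregatedError(`There are ${errors.length} cast errors (see error.errors)`, errors)
  }
  return results
}
```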
Create an error
| Param | Type | Description |
| --- | --- | --- |
| message | string | |
| errors | Array.<Error> | nested errors |
error.multiple ⇒ boolean
Whether it's nested
error.errors ⇒ Array.<Error>
List of errors
Base class for all TableSchema errors.
The project follows the Open Knowledge International coding standards. The common commands for working with the project are:
$ npm install
$ npm run test
$ npm run build
Only breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted commit history.
Fix bug:
Improved behaviour:
New API added:
forceCast flag to the
table.iter/read methods
Improved behaviour:
string and
geojson types
infer function
New API added:
format option to the
Table constructor
encoding option to the
Table constructor
Improved behaviour:
infer functions support formats inferring
New API added:
error.rowNumber if available
error.columnNumber if available
New API added:
Table.load and
infer now accept Node Stream as a
source argument
New API added:
Table.load and
infer now accept
parserOptions
This version includes various big changes, including a move to asynchronous inference.
First stable version of the library.