FakeIt Data Generator

Date: 2018-01-25

Utility that generates fake data in json, yaml, yml, cson, or csv formats based on models that are defined in YAML. Data can be generated using any combination of FakerJS, ChanceJS, or custom functions.

Generated data can be output in the following formats and destinations:

json

yaml

yml

cson

csv

Zip archive of json, yaml, yml, cson, or csv files

Couchbase Server

Couchbase Sync Gateway Server

Install

npm install fakeit --save-dev
npm install fakeit --global

CLI Usage

Usage: fakeit [command] [<file|directory|glob> ...]

Commands:
  console [options]                                           outputs the result to the console
  couchbase [options]                                         This will output to couchbase
  sync-gateway [options]                                      no idea
  directory|folder [options] [<dir|file.zip>] [<models...>]   Output the file(s) into a directory
  help

Options:
  -h, --help           output usage information
  -V, --version        output the version number
  --root <directory>   Defines the root directory from which paths are resolve from (process.cwd())
  --babel <glob>       The location to the babel config (+(.babelrc|package.json))
  -c, --count <n>      Overrides the number of documents to generate specified by the model. Defaults to model defined count
  -v, --verbose        Enables verbose logging mode (false)
  -S, --no-spinners    Disables progress spinners
  -L, --no-log         Disables all logging except for errors
  -T, --no-timestamp   Disables timestamps from logging output
  -f, --format <type>  this determines the output format to use. Supported formats: json, csv, yaml, yml, cson. (json)
  -n, --spacing <n>    the number of spaces to use for indention (2)
  -l, --limit <n>      limit how many files are output at a time (100)
  -x, --seed <seed>    The global seed to use for repeatable data

Models

All data is generated from one or more YAML files. Models are defined similarly to how models are defined in Swagger, with the addition of a few more properties that are used for data generation:

The following keys are used at the root of a model. Any key that isn't marked as required is optional.

name (required)

The name of the model

type

The data type of the model to be generated. This needs to be set at the top level, as well as on a per property/items basis. It determines the starting data type, and how the result of the build loop will be converted once complete.

Note: If type isn't set it defaults to 'null'.

Available types

types | starting data type | description
number, long, integer | 0 | Converts result to number using parseInt
double, float | 0 | Converts result to number using parseFloat
string | '' | Converts result to a string using result.toString()
boolean, bool | false | Converts result to a boolean if it's not already. If result is a string and is 'false', '0', 'undefined', or 'null' it will return false
array | [] | Returns the result from the build loop
object, structure | {} | Returns the result from the build loop
null, undefined, * (anything else) | null | Returns the result from the build loop
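The conversion rules in the table can be sketched in plain JavaScript. This is only an illustration of the documented behavior, not FakeIt's actual implementation; the function name convertResult is hypothetical.

```javascript
// Hypothetical helper: convertResult is NOT part of FakeIt's API.
// It mirrors the conversion rules listed in the types table.
function convertResult(type, result) {
  switch (type) {
    case 'number':
    case 'long':
    case 'integer':
      return parseInt(result, 10);
    case 'double':
    case 'float':
      return parseFloat(result);
    case 'string':
      return result.toString();
    case 'boolean':
    case 'bool':
      if (typeof result === 'string') {
        // These four strings are the documented "false" values.
        return !['false', '0', 'undefined', 'null'].includes(result);
      }
      return Boolean(result);
    default:
      // array, object, structure, null, undefined, and anything else:
      // the result of the build loop is returned untouched.
      return result;
  }
}

console.log(convertResult('integer', '42.9'));
console.log(convertResult('float', '3.14'));
console.log(convertResult('bool', 'false'));
console.log(convertResult('string', 7));
```

Note that parseInt truncates rather than rounds, so a build function that returns a decimal for an integer property loses its fractional part.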

Places where it can be set

name: Types example
type: object
key:
  build: faker.random.uuid()
properties:
  foo:
    type: object
    properties:
      bar:
        type: string
        data:
          value: FakeIt ftw
  bar:
    type: array
    items:
      type: string
      data:
        min: 1
        max: 10
        build: faker.random.word()

data

This is the main data object. It uses the same properties in several different situations.

min: The minimum number of documents to generate

max: The maximum number of documents to generate

count: A fixed number of documents to generate. If this is defined then min and max are ignored. If min, max, and count aren't defined, count defaults to 1

pre_run: A function that runs before the documents are generated

pre_build: A function to be run before each document is generated

value: Returns a value (can't be a function).

build: The function to be run when the property is built. Only runs if value isn't defined

fake: A template string to be used by Faker, e.g. "{{name.firstName}}". This will only run if build and value aren't defined.

post_build: A function to be run after each document is generated

post_run: A function that runs after all the documents are generated for that model
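The precedence between value, build, and fake described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: resolveProperty and fakeTemplate are hypothetical names, and fakeTemplate is a stand-in for Faker's template engine so the example runs without any dependency.

```javascript
// Hypothetical resolver: shows the documented precedence only.
// value wins, then build, then fake.
function resolveProperty(data, context, fakeTemplate) {
  if (data.value !== undefined) return data.value;
  if (typeof data.build === 'function') return data.build.call(context);
  if (typeof data.fake === 'string') return fakeTemplate(data.fake);
  return null;
}

// Stand-in for Faker's template engine (assumption: it only knows one token).
const fakeTemplate = (template) =>
  template.replace(/\{\{\s*name\.firstName\s*\}\}/g, 'Sandy');

const fromValue = resolveProperty(
  { value: 'FakeIt ftw', build() { return 'ignored'; } },
  {},
  fakeTemplate
);
const fromBuild = resolveProperty(
  { build() { return `user_${this.user_id}`; }, fake: '{{name.firstName}}' },
  { user_id: 1 },
  fakeTemplate
);
const fromFake = resolveProperty({ fake: '{{name.firstName}}' }, {}, fakeTemplate);

console.log(fromValue, fromBuild, fromFake);
```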

The following keys can only be defined in the top level data object

dependencies: An array of file paths to the dependencies of the current model. Paths can be relative to the model or absolute. Don't worry about the order; all dependencies are resolved automatically.

inputs: An object or string of input(s) required for this model to run. If it's a string, the file name is used as the key. The key is what you reference when you want to get data (aka this.inputs[key]). The value is the file path to the input's location. It can be relative to the model or an absolute path.
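To make the inputs naming rule concrete, here is a small sketch of how a string input could be turned into a keyed object. It assumes the key is the file name without its extension, which is consistent with the Countries example later in this document (inputs.countries for countries.csv); the helper name inputKeys is hypothetical and is not part of FakeIt.

```javascript
// Hypothetical helper: normalizes the `inputs` option into { key: path }.
// Assumption: for a string, the key is the file name without its extension.
function inputKeys(inputs) {
  if (typeof inputs === 'string') {
    const file = inputs.split(/[\\/]/).pop(); // e.g. 'countries.csv'
    const key = file.replace(/\.[^.]+$/, ''); // e.g. 'countries'
    return { [key]: inputs };
  }
  // An object already maps keys to paths, so it is copied as-is.
  return { ...inputs };
}

const fromString = inputKeys('../inputs/countries.csv');
const fromObject = inputKeys({ geo: '../inputs/countries.csv' });

console.log(fromString, fromObject);
```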

key (required)

This determines the name of the document that's being generated. It only needs to be defined once per document. This is a reference to a generated property and is used for the filename or Document ID. If the key is an object it needs the data option defined above. It will only work with value, build, and fake, since this already runs after the document has been built. If the key is a string, the string value is used to find the value in the document that was just built (using the lodash get method).

Examples of setting a key

In this example, after each document is built it will look for the _id property and return its result (aka user_1, user_2, etc.)

name: Key String Example
type: object
key: _id
data:
  pre_run: |
    globals.user_counter = 0;
properties:
  _id:
    type: string
    description: The document id
    data:
      post_build: "`user_${this.user_id}`"
  user_id:
    type: integer
    description: The users id
    data:
      build: ++globals.user_counter

In this example the key will be 'user_' + the current user_id (aka user_1, user_2, etc.)

name: Key Object Example
type: object
key:
  data:
    build: "`user_${this.user_id}`"
data:
  pre_run: |
    globals.user_counter = 0;
properties:
  user_id:
    type: integer
    description: The users id
    data:
      build: ++globals.user_counter
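The two key styles can be sketched in JavaScript as shown here. This is an illustration only: resolveKey is a hypothetical name, and the small get function is a simplified stand-in for the lodash get method (dotted paths only), used so the example has no dependencies.

```javascript
// Simplified stand-in for lodash's get: supports dotted paths only.
function get(object, path) {
  return path
    .split('.')
    .reduce((value, part) => (value == null ? undefined : value[part]), object);
}

// Hypothetical resolver for the model's `key` option.
function resolveKey(key, doc) {
  // String key: look the value up in the document that was just built.
  if (typeof key === 'string') return get(doc, key);
  // Object key: use its data options, with `this` bound to the document.
  const data = key.data || {};
  if (data.value !== undefined) return data.value;
  if (typeof data.build === 'function') return data.build.call(doc);
  return undefined;
}

const doc = { _id: 'user_1', user_id: 1 };

const stringKey = resolveKey('_id', doc);
const objectKey = resolveKey(
  { data: { build() { return `user_${this.user_id}`; } } },
  doc
);

console.log(stringKey, objectKey);
```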

seed

If a seed is defined, the documents are generated with repeatable results. If you have a model with a data range of 2-10, a random number between 2 and 10 documents will be created no matter what the seed is. Say 4 documents are generated the first time you run the model; each of those documents will be completely different from the next (as expected). Later you generate the data again, and this time it might generate 6 documents. The first 4 documents generated the second time will be exactly the same as the first time you generated the data. The seed can be a number or a string.

This only works if you use faker and chance to generate your random fake data. Repeatable results can also be produced with other fake data generation libraries if they support seeds.

faker.date functions will not produce the same fake data each time.
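The repeatability property described in this section can be demonstrated with any seeded random number generator. The sketch below uses mulberry32, a small general-purpose seeded generator; it is not the generator used by Faker or Chance, it only accepts numeric seeds, and the generate function is a hypothetical stand-in for a model run.

```javascript
// mulberry32: a small seeded pseudo-random number generator.
// Assumption: used here only to illustrate seeding; FakeIt relies on
// the seeding support built into Faker and Chance instead.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for generating `count` documents with a seed.
function generate(seed, count) {
  const random = mulberry32(seed);
  return Array.from({ length: count }, (_, index) => ({
    index,
    score: Math.floor(random() * 1000),
  }));
}

const firstRun = generate(123, 4);  // first run happens to produce 4 documents
const secondRun = generate(123, 6); // second run happens to produce 6 documents

// The first 4 documents of the second run match the first run exactly.
const sameStart =
  JSON.stringify(secondRun.slice(0, 4)) === JSON.stringify(firstRun);
console.log(sameStart);
```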

Functions

For any function defined above, be sure to use | for multi-line functions and NOT >. For an in-depth explanation, see this issue.

Each of these functions is passed the following variables, which can be used at the time of its execution:

documents - An object containing a key for each model whose value is an array of each document that has been generated

globals - An object containing any global variables that may have been set by any of the run or build functions

inputs - An object containing a key for each input file used whose value is the deserialized version of the file's data

faker - A reference to FakerJS

chance - A reference to ChanceJS

document_index - This is a number that represents the currently generated document's position in the run order

require - This is the node require function; it allows you to require your own packages. For better performance, require and set them in the pre_run function.

For pre_run and post_run, the this context refers to the current model. For pre_build, build, and post_build, the this context refers to the object currently being generated. If a nested object is being created (in an array, for example), this refers to the closest object, not the outer object/array.
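The variables and this contexts described above can be pictured with the following sketch. It assumes that each function body is compiled into a JavaScript function whose parameters are the documented variable names; that compilation step, the compile helper, and the explicit return statement are assumptions made for illustration, and how FakeIt handles single-expression bodies such as ++globals.user_counter is not shown here.

```javascript
// Assumption: function bodies receive the documented variables as parameters.
const PARAMS = [
  'documents', 'globals', 'inputs', 'faker', 'chance', 'document_index', 'require',
];

// Hypothetical helper: turns a function body (a string) into a function.
function compile(body) {
  return new Function(...PARAMS, body);
}

const globals = {};
const model = { name: 'Users', data: { count: 1 } };

// pre_run: `this` is the model, so it can change the model's own settings.
const preRun = compile('globals.user_counter = 0; this.data.count = 2;');
preRun.call(model, {}, globals, {}, null, null, 0, null);

// build: `this` is the document currently being generated.
const build = compile('return ++globals.user_counter;');

const documents = [];
for (let index = 0; index < model.data.count; index++) {
  const doc = {};
  doc.user_id = build.call(doc, {}, globals, {}, null, null, index, null);
  documents.push(doc);
}

console.log(documents);
```

Because pre_run runs once with the model as this, it is the natural place to initialize counters on globals before any document is built.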

Example users.yaml Model

name: Users
type: object
key:
  data:
    build: "`user_${this.user_id}`"
data:
  min: 200
  max: 500
  pre_run: |
    globals.user_counter = 0;
properties:
  user_id:
    description: The users id
    data:
      build: faker.random.uuid()
  name:
    description: The users first name
    data:
      fake: '{{name.firstName}}'
  last_name:
    description: The users last name
    data:
      fake: '{{name.lastName}}'
  username:
    description: The users username
    data:
      fake: '{{internet.userName}}'
  password:
    description: The users password
    data:
      fake: '{{internet.password}}'
  email:
    description: The users email address
    data:
      fake: '{{internet.email}}'
  phone:
    description: The users mobile phone
    data:
      fake: '{{phone.phoneNumber}}'
      post_build: this.phone.replace(/x[0-9]+$/, '')

Results in the following

{
  "user_id": "4d9ec95c-f45d-42f4-9d32-4ac81d83f95b",
  "name": "Sandy",
  "last_name": "Turner",
  "username": "Zella61",
  "password": "gi7NVXsUoARHhyU",
  "email": "Buck_Cormier@hotmail.com",
  "phone": "715.612.8609"
}
{
  "user_id": "7f513d5b-f944-4a80-b52a-4876627368b7",
  "name": "Duane",
  "last_name": "VonRueden",
  "username": "Mafalda92",
  "password": "3uXo4hFZJTdf1hp",
  "email": "Rickie_Braun@hotmail.com",
  "phone": "(356) 009-7477 "
}
...etc

properties

This is used to define the properties of an object.

Each key inside of properties will be a part of the generated object. Each of the keys uses the following properties to build its value.

type: The data type of the property. Values can be: string, object, structure, number, integer, double, long, float, array, boolean, bool

description: A description of the property. This is just extra notes for the developer and doesn't affect the data.

data: The same data options as defined above

name: test
key:
  build: faker.random.uuid()
type: object
properties:
  id:
    data:
      build: faker.random.uuid()
  title:
    type: string
    description: The main title to use
    data:
      build: |
        faker.random.word()
  phone:
    type: object
    properties:
      home:
        type: string
        data:
          build: faker.phone.phoneNumber().replace(/x[0-9]+$/, '')
      work:
        type: string
        data:
          build: "chance.bool({ likelihood: 35 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null"

This will return an object like this

{
  "id": "4ce4da5c-0614-47d3-8fd6-3614c5461830",
  "title": "alliance",
  "phone": {
    "home": "(949) 194-3347",
    "work": "314-939-0541"
  }
}
{
  "id": "a649bbec-d629-4594-8fc8-ae34d97811a2",
  "title": "Unbranded",
  "phone": {
    "home": "012-296-9810",
    "work": null
  }
}
etc...

items

This is used to define how each item in an array is built. It uses the same structure as properties does, but it will return an array of values.

name: Array example
key:
  data:
    build: faker.random.uuid()
type: object
properties:
  keywords:
    type: array
    description: An array of keywords
    items:
      type: string
      data:
        min: 3
        max: 10
        build: faker.random.word()
  phones:
    type: array
    description: An array of phone numbers
    items:
      type: object
      data:
        min: 1
        max: 3
      properties:
        cell:
          type: string
          data:
            build: faker.phone.phoneNumber().replace(/x[0-9]+$/, '')
        home:
          type: string
          data:
            build: "chance.bool({ likelihood: 45 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null"
        work:
          type: string
          data:
            build: "chance.bool({ likelihood: 10 }) ? faker.phone.phoneNumber().replace(/x[0-9]+$/, '') : null"

{
  "keywords": [
    "GB",
    "Sports",
    "redundant",
    "Plastic"
  ],
  "phones": [
    {
      "cell": "(555) 555 - 5555",
      "home": "(666) 666 - 6666",
      "work": null
    },
    {
      "cell": "(777) 777 - 7777",
      "home": null,
      "work": "(888) 888 - 8888"
    }
  ]
}

Model References

It can be beneficial to create definitions that can be referenced one or more times throughout a model. This can be accomplished by using the $ref: property. Consider the following example:

contacts.yaml

name: Contacts
type: object
key: contact_id
data:
  min: 1
  max: 4
properties:
  contact_id:
    data:
      build: "chance.guid()"
  details:
    schema:
      $ref: '#/definitions/Details'
  phones:
    type: array
    items:
      $ref: '#/definitions/Phone'
      data:
        min: 1
        max: 4
  emails:
    type: array
    items:
      $ref: '#/definitions/Email'
      data:
        min: 0
        max: 3
  addresses:
    type: array
    items:
      $ref: '#/definitions/Address'
      data:
        min: 0
        max: 3
definitions:
  Email:
    data:
      build: "faker.internet.email()"
  Phone:
    type: object
    properties:
      phone_type:
        data:
          build: "faker.random.arrayElement([ 'Home', 'Work', 'Mobile', 'Main', 'Other' ])"
      phone_number:
        data:
          build: "faker.phone.phoneNumber().replace(/x[0-9]+$/, '')"
      extension:
        data:
          build: "chance.bool({ likelihood: 20 }) ? chance.integer({min: 1000, max: 9999}).toString() : ''"
  Address:
    type: object
    properties:
      address_type:
        data:
          build: "faker.random.arrayElement([ 'Home', 'Work', 'Other' ]);"
      address_1:
        data:
          build: "`${faker.address.streetAddress()} ${faker.address.streetSuffix()}`"
      address_2:
        data:
          build: "chance.bool({ likelihood: 35 }) ? faker.address.secondaryAddress() : ''"
      city:
        data:
          build: "faker.address.city()"
      state:
        data:
          build: "faker.address.stateAbbr()"
      postal_code:
        data:
          build: "faker.address.zipCode()"
      country:
        data:
          build: "faker.address.countryCode()"
  Details:
    type: object
    properties:
      first_name:
        data:
          fake: "{{name.firstName}}"
      last_name:
        data:
          build: "return chance.bool({ likelihood: 70 }) ? faker.name.lastName() : ''"
      company:
        type: string
        description: The contacts company
        data:
          build: "return chance.bool({ likelihood: 30 }) ? faker.company.companyName() : ''"
      job_title:
        type: string
        description: The contacts job_title
        data:
          build: "return chance.bool({ likelihood: 30 }) ? faker.name.jobTitle() : ''"

For this model we used 4 references:

$ref: '#/definitions/Details'

$ref: '#/definitions/Phone'

$ref: '#/definitions/Email'

$ref: '#/definitions/Address'

These could have been defined inline, but that would make it more difficult to see our model definition, and each of these definitions can be reused. References are processed and included before a model is run and its documents are generated.
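One way to picture how references are processed before a model runs is the sketch below. It is an assumption-laden illustration, not FakeIt's resolver: resolveRefs is a hypothetical name, only local '#/...' references are handled, sibling keys are assumed to be merged over the referenced definition (with data merged one level deep), and circular references are not handled.

```javascript
// Hypothetical resolver for local references such as '#/definitions/Email'.
function resolveRefs(node, root) {
  if (Array.isArray(node)) return node.map((item) => resolveRefs(item, root));
  if (node === null || typeof node !== 'object') return node;

  if (typeof node.$ref === 'string') {
    // '#/definitions/Email' -> root.definitions.Email
    const target = node.$ref
      .replace(/^#\//, '')
      .split('/')
      .reduce((value, part) => value[part], root);
    const { $ref, ...rest } = node;
    // Assumption: keys next to $ref override the definition,
    // except `data`, which is merged one level deep.
    const merged = { ...target, ...rest };
    if (target.data || rest.data) merged.data = { ...target.data, ...rest.data };
    return resolveRefs(merged, root);
  }

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = resolveRefs(value, root);
  }
  return out;
}

const model = {
  properties: {
    emails: {
      type: 'array',
      items: { $ref: '#/definitions/Email', data: { min: 0, max: 3 } },
    },
  },
  definitions: {
    Email: { data: { build: 'faker.internet.email()' } },
  },
};

const resolved = resolveRefs(model.properties, model);
console.log(JSON.stringify(resolved, null, 2));
```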

Overriding Model Defaults

The model defaults can be overwritten at run time by executing the pre_run function. The this keyword in both the pre_run and post_run functions is the processed model. Below are some examples of changing the number of documents the model should generate before the generation process starts.

name: Users
type: object
key: _id
data:
  pre_run: |
    this.data.count = 100

This becomes beneficial if you are providing input data and want to generate a fixed number of documents. Take the following command for example:

Here we want to generate a countries model but we might not necessarily know the exact amount of data being provided by the input. We can reference the input data in our model's pre_run function and set the number to generate based on the input array.

name: Countries
type: object
key: _id
data:
  inputs: '../inputs/countries.csv'
  pre_run: |
    this.data.count = inputs.countries.length;
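The interaction between pre_run and the count, min, and max options can be sketched as follows. This is a minimal illustration under stated assumptions: resolveCount is a hypothetical helper that follows the documented rules (count wins over min and max, and the default is 1), the fallback used when only one of min or max is set is an assumption, and the min and max on the sample model are added purely to show that count overrides them.

```javascript
// Hypothetical helper that follows the documented count rules.
function resolveCount(data, random = Math.random) {
  // A fixed count means min and max are ignored.
  if (data.count !== undefined) return data.count;
  if (data.min !== undefined || data.max !== undefined) {
    // Assumption: a missing min falls back to 0, a missing max to min.
    const min = data.min ?? 0;
    const max = data.max ?? min;
    return min + Math.floor(random() * (max - min + 1));
  }
  // Nothing defined: count defaults to 1.
  return 1;
}

// Deserialized input data, as it would be exposed through `inputs`.
const inputs = {
  countries: [{ code: 'US' }, { code: 'CA' }, { code: 'MX' }],
};

// min and max are illustrative only; the model in the text does not set them.
const model = { name: 'Countries', data: { min: 1, max: 4 } };

// Equivalent of the pre_run body: `this` is the model.
(function preRun() {
  this.data.count = inputs.countries.length;
}).call(model);

const total = resolveCount(model.data);
console.log(total);
```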

JS API

If you don't want to use the CLI version of this app you can always use the JS API.

import Fakeit from 'fakeit'

const fakeit = new Fakeit()

fakeit.generate('glob/to/models/**/*.yaml')
  .then((data) => {
    console.log(data)
  })

Fakeit Options

Below are the default options that are used unless overwritten.

import Fakeit from 'fakeit'

const fakeit = new Fakeit({
  root: process.cwd(),
  babel_config: '+(.babelrc|package.json)',
  seed: 0,
  log: true,
  verbose: false,
  timestamp: true,
})

const models = 'glob/to/models/**/*.yaml'

fakeit.generate(models, {
  format: 'json',
  spacing: 2,
  output: 'return',
  limit: 100,
  highlight: true,
  archive: '',
  server: '127.0.0.1',
  bucket: 'default',
  username: '',
  password: '',
  timeout: 5000,
})
  .then((data) => {
    data = JSON.parse(data)
  })

Examples

To see more examples of some of the things you can do take a look at the test cases that are in this repo

