elasticdump

Date: 2022-02-12

Tools for moving and saving indices.





Version Warnings!

Version 1.0.0 of Elasticdump changes the format of the files created by the dump. Files created with version 0.x.x of this tool are unlikely to work with later versions. To learn more about the breaking changes, visit the release notes for version 1.0.0. If you receive an "out of memory" error, this is most likely the cause.

Installing

(local)

npm install elasticdump
./bin/elasticdump

(global)

npm install elasticdump -g
elasticdump

Use

Standard Install

Elasticdump works by sending an input to an output. Both can be either an Elasticsearch URL or a file.

Elasticsearch:

format: {protocol}://{host}:{port}/{index}

example: http://127.0.0.1:9200/my_index

File:

format: {FilePath}

example: /Users/evantahler/Desktop/dump.json

Stdio:

format: stdin / stdout

format: $
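As a rough illustration of the three endpoint kinds above, the sketch below classifies an --input/--output value. `classifyEndpoint` is a hypothetical helper written for this README, not part of elasticdump, and it only covers the Elasticsearch URL, file, and stdio forms described here.

```javascript
// Hypothetical helper, not elasticdump's own code: classify an
// --input/--output value into the three kinds described above.
function classifyEndpoint(value) {
  if (value === '$') {
    return { kind: 'stdio' };
  }
  if (/^https?:\/\//.test(value)) {
    const url = new URL(value);
    // The first path segment is taken as the index (standard install only).
    const [index = null] = url.pathname.split('/').filter(Boolean);
    return {
      kind: 'elasticsearch',
      protocol: url.protocol.replace(':', ''),
      host: url.hostname,
      port: url.port,
      index,
    };
  }
  // Anything else is treated as a file path in this sketch.
  return { kind: 'file', path: value };
}

console.log(classifyEndpoint('http://127.0.0.1:9200/my_index'));
console.log(classifyEndpoint('/Users/evantahler/Desktop/dump.json'));
console.log(classifyEndpoint('$'));
```

The sketch deliberately ignores other transports and sub-directory installs, which need extra handling.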

You can then do things like:

Copy an index from production to staging with analyzer and mapping:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

Back up index mapping and data to files:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --type=data

Back up an index to a gzip file using stdout:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

Back up the results of a query to a file:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}"

Specify the searchBody from a file:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody=@/data/searchbody.json

Copy a single shard's data:

elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --input-params="{\"preference\":\"_shards:0\"}"

Back up aliases to a file:

elasticdump \
  --input=http://es.com:9200/index-name/alias-filter \
  --output=alias.json \
  --type=alias

Import aliases into ES:

elasticdump \
  --input=./alias.json \
  --output=http://es.com:9200 \
  --type=alias

Back up templates to a file:

elasticdump \
  --input=http://es.com:9200/template-filter \
  --output=templates.json \
  --type=template

Import templates into ES:

elasticdump \
  --input=./templates.json \
  --output=http://es.com:9200 \
  --type=template

Split the output into multiple files:

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --fileSize=10mb

Import data from S3 into ES:

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input "s3://${bucket_name}/${file_name}.json" \
  --output=http://production.es.com:9200/my_index

Export ES data to S3:

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json"

Import data from MinIO (S3 compatible) into ES:

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input "s3://${bucket_name}/${file_name}.json" \
  --output=http://production.es.com:9200/my_index \
  --s3ForcePathStyle true \
  --s3Endpoint https://production.minio.co

Export ES data to MinIO (S3 compatible):

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json" \
  --s3ForcePathStyle true \
  --s3Endpoint https://production.minio.co

Import data from a CSV file into ES:

elasticdump \
  --input "csv:///data/cars.csv" \
  --output=http://production.es.com:9200/my_index \
  --csvSkipRows 1 \
  --csvDelimiter ";"

Non-Standard Install

If Elasticsearch is not being served from the root directory, the --input-index and --output-index options are required. If they are not provided, the additional sub-directories will be parsed for index and type.

Elasticsearch:

format: {protocol}://{host}:{port}/{sub}/{directory...}

example: http://127.0.0.1:9200/api/search

elasticdump \
  --input=http://es.com:9200/api/search \
  --input-index=my_index \
  --output=http://es.com:9200/api/search \
  --output-index=my_index \
  --type=mapping

elasticdump \
  --input=http://es.com:9200/api/search \
  --input-index=my_index/my_type \
  --output=http://es.com:9200/api/search \
  --output-index=my_index \
  --type=mapping

Docker install

If you prefer to run elasticdump with Docker, you can pull the image from Docker Hub:

docker pull elasticdump/elasticsearch-dump

Then run it with docker run --rm -ti elasticdump/elasticsearch-dump

To read or write dump files, mount your file storage directory into the container with -v <your dumps dir>:<your mount point>

Example:

Copy an index from production to staging with mappings:

docker run --rm -ti elasticdump/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
docker run --rm -ti elasticdump/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

Back up index data to a file:

docker run --rm -ti -v /data:/tmp elasticdump/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=/tmp/my_index_mapping.json \
  --type=data

If you need to run using localhost as your ES host:

docker run --net=host --rm -ti elasticdump/elasticsearch-dump \
  --input=http://staging.es.com:9200/my_index \
  --output=http://localhost:9200/my_index \
  --type=data

Dump Format

The file format generated by this tool is line-delimited JSON. The dump file as a whole is not valid JSON, but each line is. We do this so that dump files can be streamed and appended without worrying about whole-file parser integrity.

For example, if you wanted to parse every line, you could do:

while read LINE; do jsonlint-py "${LINE}"; done < dump.data.json
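The same line-by-line idea can be sketched in Node.js. `parseDump` below is a hypothetical helper, and the `_id`/`_source` fields in the sample are assumptions about what a dumped document looks like rather than a specification of the format.

```javascript
// Parse a line-delimited JSON dump held in a string: one document per line.
// Blank lines are skipped; a malformed line is reported with its line number.
function parseDump(text) {
  const docs = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === '') return;
    try {
      docs.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`Invalid JSON on line ${i + 1}: ${err.message}`);
    }
  });
  return docs;
}

// Illustrative two-line dump (field names are assumed, not taken from a real dump).
const sample =
  '{"_id":"1","_source":{"user":"a"}}\n' +
  '{"_id":"2","_source":{"user":"b"}}\n';

console.log(parseDump(sample).length); // 2
```

For large dumps, reading the file through a stream and handling one line at a time would avoid loading the whole file into memory.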

Options

Elasticsearch's Scroll API

Elasticsearch provides a scroll API to fetch all documents of an index starting from (and keeping) a consistent snapshot in time, which we use under the hood. This method is safe to use for large exports since it will maintain the result set in cache for the given period of time.

NOTE: only works for --output

Bypassing self-signed certificate errors

Set the environment variable NODE_TLS_REJECT_UNAUTHORIZED=0 before running elasticdump:

NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump --input="https://localhost:9200" --output myfile

MultiElasticDump

This package also ships with a second binary, multielasticdump. It is a wrapper around the normal elasticdump binary that provides a limited option set but runs elasticdump in parallel across many indexes at once. It runs a process which forks into n subprocesses running elasticdump, where n defaults to the number of CPUs on the host.
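The bounded-parallelism idea can be sketched with a small promise pool. This is not multielasticdump's implementation: `runWithLimit` is a hypothetical helper, and plain async functions stand in for the forked elasticdump subprocesses.

```javascript
const os = require('os');

// Run async tasks with at most `limit` in flight at once. The default limit
// mirrors the "number of CPUs" default described above.
async function runWithLimit(tasks, limit = os.cpus().length) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps pulling the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in tasks: one per index name (names are made up for the example).
const indexNames = ['a', 'b', 'c'];
runWithLimit(indexNames.map((name) => async () => `dumped ${name}`), 2)
  .then((out) => console.log(out));
```

In a real wrapper each task would presumably start a child process and resolve when it exits; that part is omitted here.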

The limited option set includes:

If the --direction is dump, which is the default, --input MUST be a URL for the base location of an Elasticsearch server (e.g. http://localhost:9200) and --output MUST be a directory. Each index that matches will have a data, mapping, and analyzer file created.

For loading files that you have dumped with multielasticdump, --direction should be set to load, --input MUST be a directory containing a multielasticdump dump and --output MUST be an Elasticsearch server URL.

--parallel is how many forks should be run simultaneously and --match is used to filter which indexes should be dumped/loaded (regex).

--ignoreType allows a type to be ignored from the dump/load. Six types are supported: data,mapping,analyzer,alias,settings,template. Multiple types can be given as a comma-separated list. The analyzer and alias types are ignored by default.

--interval controls the interval for spawning a dump/load for a new index. For small indices this can be set to 0 to reduce delays and optimize performance.

--includeType allows a type to be included in the dump/load. Six types are supported: data,mapping,analyzer,alias,settings,template.

--ignoreChildError allows multielasticdump to continue if a child throws an error.

--prefix adds a prefix to the name of the index being created, e.g. es6-${index}, and --suffix adds a suffix, e.g. ${index}-backup-2018-03-13. --order accepts asc or desc and sorts the indexes/aliases before processing.

Usage Examples

Back up every index to /tmp/es_backup:

multielasticdump \
  --direction=dump \
  --match='^.*$' \
  --input=http://production.es.com:9200 \
  --output=/tmp/es_backup

Back up only indexes whose names end in -index, skipping the mapping, settings, and template types:

multielasticdump \
  --direction=dump \
  --match='^.*-index$' \
  --input=http://production.es.com:9200 \
  --ignoreType='mapping,settings,template' \
  --output=/tmp/es_backup
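To make the interplay of --match, --order, --prefix and --suffix concrete, here is a minimal sketch. `planIndices` and its defaults are assumptions made for illustration; they are not taken from multielasticdump's source.

```javascript
// Hypothetical sketch: filter index names by a regex, sort them, and derive
// the target index name from a prefix and suffix.
function planIndices(names, { match = '^.*$', order = 'asc', prefix = '', suffix = '' } = {}) {
  const pattern = new RegExp(match);
  const selected = names.filter((name) => pattern.test(name)).sort();
  if (order === 'desc') selected.reverse();
  return selected.map((name) => ({
    source: name,
    target: `${prefix}${name}${suffix}`,
  }));
}

// Index names below are made up for the example.
console.log(planIndices(['logs-index', 'users', 'orders-index'], {
  match: '^.*-index$',
  prefix: 'es6-',
  order: 'desc',
}));
```

With these inputs only the two names ending in -index are kept, sorted in descending order, and each target name gains the es6- prefix.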

Module Transform

When specifying the transform option, prefix the value with @ (a curl convention) to load the top-level function which is called with the document and the parsed arguments to the module.

Arguments to the module are specified in a pseudo-URL format. Given:

elasticdump --transform='@./transforms/my-transform?param1=value&param2=another-value'

with a module at ./transforms/my-transform.js containing:

module.exports = function (doc, options) {
  // do something to doc
};

elasticdump will load the module ./transforms/my-transform.js and execute the function with doc and options = {"param1": "value", "param2": "another-value"}.

An example transform for anonymizing data on-the-fly can be found in the transforms folder.
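The pseudo-URL convention can be sketched as follows. `parseTransformSpec` and `myTransform` are hypothetical names; the sketch assumes that each document carries its fields under `_source` and that the transform mutates `doc` in place, neither of which is stated above.

```javascript
// Split a --transform value into a module path and an options object.
// Hypothetical helper, not elasticdump's own parser.
function parseTransformSpec(spec) {
  const [modulePath, query = ''] = spec.replace(/^@/, '').split('?');
  const options = Object.fromEntries(new URLSearchParams(query));
  return { modulePath, options };
}

// What ./transforms/my-transform.js might export (assumed document shape).
function myTransform(doc, options) {
  doc._source = doc._source || {};
  doc._source.tag = options.param1; // copy a parsed argument onto the document
}

const { modulePath, options } = parseTransformSpec(
  '@./transforms/my-transform?param1=value&param2=another-value'
);
const doc = { _id: '1', _source: { user: 'admin' } };
myTransform(doc, options);
console.log(modulePath, options, doc);
```

In a real module the function would be assigned to module.exports instead of being called directly.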

How Elasticdump handles Nested Data in CSV

Elasticdump is capable of reading/writing nested data, but in an opinionated way. This is to reduce complexity while parsing/saving CSVs. The format flattens all nesting to a single level (an example is shown below).

{
  "elasticdump": {
    "version": "6.51.0",
    "formats": ["json", "csv"]
  },
  "contributors": [{ "name": "ferron", "id": 3 }],
  "year": 2020
}

Output format

{
  "elasticdump": "{\"version\":\"6.51.0\",\"formats\":[\"json\",\"csv\"]}",
  "contributors": "{\"contributors\":[{\"name\":\"ferron\",\"id\":3}]}",
  "year": 2020
}

Notice that the data is flattened to one level. Object keys are used for headers and values as row data. This might not work with existing nested data formats, but it was chosen for elasticdump because of its simplicity. This handling is disabled by default; to enable it, use the --csvHandleNestedData flag.
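A minimal sketch of one-level flattening follows. It is a simplification written for illustration: `flattenRow` is a hypothetical helper that stringifies each nested value on its own, and the exact strings elasticdump writes may differ.

```javascript
// Flatten a document to a single level: nested objects and arrays become
// JSON strings, scalar values pass through unchanged.
function flattenRow(doc) {
  const row = {};
  for (const [key, value] of Object.entries(doc)) {
    row[key] = value !== null && typeof value === 'object'
      ? JSON.stringify(value)
      : value;
  }
  return row;
}

const row = flattenRow({
  elasticdump: { version: '6.51.0', formats: ['json', 'csv'] },
  contributors: [{ name: 'ferron', id: 3 }],
  year: 2020,
});

console.log(Object.keys(row)); // the keys become the CSV headers
console.log(row);
```

Reading such a row back would apply JSON.parse to the columns that hold serialized values.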

Notes

This tool is likely to require Elasticsearch version 1.0.0 or higher

Elasticdump (and Elasticsearch in general) will create indices if they don't exist upon import

When exporting from Elasticsearch, you can export an entire index ( --input="http://localhost:9200/index" ) or a type of object from that index ( --input="http://localhost:9200/index/type" ). This requires Elasticsearch 1.2.0 or higher

If the path to your Elasticsearch installation is in a sub-directory, the index and type must be provided with a separate argument ( --input="http://localhost:9200/sub/directory" --input-index=index/type ). Using --input-index=/ will include all indices and types.

We use the put method to write objects. This means new objects will be created and old objects with the same ID will be updated

The file transport will not overwrite any existing files by default; it will throw an exception if the file already exists. Use --overwrite to overwrite existing files instead.

If you need basic http auth, you can use it like this: --input=http://name:password@production.es.com:9200/my_index

If you choose a stdio output ( --output=$ ), you can also request a more human-readable output with --format=human

If you choose a stdio output ( --output=$ ), all logging output will be suppressed

If you are using Elasticsearch version 6.0.0 or higher the offset parameter is no longer allowed in the scrollContext

ES 6.x.x & higher no longer support the template property for _template. All templates prior to ES 6.0 have to be upgraded to use index_patterns

ES 7.x.x & higher no longer support the type property. All templates prior to ES 6.0 have to be upgraded to remove the type property

ES 5.x.x ignores the offset (from) parameter in the search body. All records will be returned

ES 6.x.x from parameter can no longer be used in the search request body when initiating a scroll

Index templates have been deprecated and will be replaced by the composable templates introduced in Elasticsearch 7.8.

Ensure JSON in the searchBody is properly escaped to avoid parsing issues: https://www.freeformatter.com/json-escape.html

Dropped support for Node.JS 8 in Elasticdump v6.32.0. Node.JS 10+ is now required.

Elasticdump v6.42.0 added support for CSV import/export using the fast-csv library

Elasticdump v6.68.0 added support for specifying a file containing the searchBody
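As an alternative to escaping the searchBody by hand, the JSON can be generated programmatically. The sketch below is only an illustration: it builds the argument list and prints it, and assumes the caller would then pass the arguments as an array to a process launcher so that no shell quoting is involved.

```javascript
// Build the --searchBody value from an object instead of escaping by hand.
const query = { query: { term: { username: 'admin' } } };
const searchBody = JSON.stringify(query);

// Arguments kept as an array: each element reaches the program unchanged,
// so the JSON needs no extra backslash escaping. The URL and file name are
// the ones used in the examples earlier in this README.
const args = [
  '--input=http://production.es.com:9200/my_index',
  '--output=query.json',
  `--searchBody=${searchBody}`,
];

console.log(args);
```

Storing the same JSON in a file and passing it with --searchBody=@/data/searchbody.json avoids the escaping question entirely.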

