Date: 2016-03-14
Like underscore for Node streams (streams2 and up).
Functions for iterating over object mode streams: forEach, map, reduce, filter, mapKey
fromArray, toArray, fromAsync
through / thru, writable, readable, duplex, combine, devnull, cap, clone
fork, match, merge, forkMerge, matchMerge, parallel
pipe, head, tail, pipeline
isStream, isReadable, isWritable, isDuplex
npm install --save pipe-iterators
Preamble:
var pi = require('pipe-iterators');
v1.3.0: Updated dependencies to more recent versions - thanks @asgoth!
v1.2.0: added pi.fromAsync(callable)
v1.1.0:
merge,
forkMerge,
matchMerge and
parallel functions.
pipeline.
The iterator functions closely follow the native
Array.* iteration API (e.g.
forEach,
map,
filter), but the functions return object mode streams instead of operating on arrays.
pi.forEach(callback, [thisArg])
Returns a duplex stream which calls a function for each element in the stream.
callback is invoked with two arguments -
obj (the element value) and
index (the element index). The return value from the callback is ignored.
If
thisArg is provided, it is available as
this within the callback.
pi.fromArray(['a', 'b', 'c'])
.pipe(pi.forEach(function(obj) { console.log(obj); }));
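To make the callback contract concrete, here is a minimal sketch of the same behaviour written against Node's core stream module only. The name forEachSketch is hypothetical, and this is not the pipe-iterators implementation; it also assumes that elements are passed through unchanged, and that the Node version provides Readable.from.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for pi.forEach, built on core streams only.
function forEachSketch(callback, thisArg) {
  let index = 0;
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      callback.call(thisArg, obj, index++); // the return value is ignored
      done(null, obj);                      // assumption: the element passes through unchanged
    }
  });
}

const seen = [];
Readable.from(['a', 'b', 'c'])
  .pipe(forEachSketch((obj, index) => seen.push(index + ':' + obj)))
  .on('data', () => {})                  // keep the stream flowing
  .on('end', () => console.log(seen));   // [ '0:a', '1:b', '2:c' ]
```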
pi.map(callback, [thisArg])
Returns a duplex stream which produces a new stream of values by mapping each value in the stream through a transformation callback. The callback is invoked with two arguments,
obj (the element value) and
index (the element index). The return value from the callback is written back to the stream.
If
thisArg is provided, it is available as
this within the callback.
Note: if you return
null from your map function, core streams will interpret this as EOF for the stream.
pi.fromArray([{ a: 'a' }, { b: 'b' }, { c: 'c' }])
.pipe(pi.map(function(obj) { return _.defaults(obj, { foo: 'bar' }); }));
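The note about null is easier to see in a sketch of how a map stream could be built on a core Transform. mapSketch is a hypothetical name, not the library's code, and Object.assign stands in for the underscore call in the example above.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for pi.map, built on core streams only.
function mapSketch(callback, thisArg) {
  let index = 0;
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      // In core streams, push(null) signals end-of-stream, which is why a
      // map callback should never return null for an ordinary element.
      this.push(callback.call(thisArg, obj, index++));
      done();
    }
  });
}

Readable.from([{ a: 'a' }, { b: 'b' }])
  .pipe(mapSketch((obj) => Object.assign({ foo: 'bar' }, obj)))
  .on('data', (obj) => console.log(obj));
```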
pi.reduce(callback, [initialValue])
Reduce returns a duplex stream which boils down a stream of values into a single value.
initialValue is the initial value of the reduction, and each successive step of it should be returned by the
callback. The callback is called with three arguments:
prev (the accumulator value),
curr (the current value) and
index (the index).
When the input stream ends, the stream emits the value in the accumulator.
If
initialValue is not provided, then
prev will be equal to the first value in the stream and
curr will be equal to the second on the first call.
pi.fromArray(['a', 'b', 'c'])
.pipe(pi.reduce(function(posts, post) { return posts.concat(post); }, []));
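The accumulator rules (seeding from the first element when no initialValue is given, emitting once at the end) can be sketched with a core Transform that only pushes in its flush step. reduceSketch is a hypothetical name and the code is an illustration, not the pipe-iterators source.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for pi.reduce, built on core streams only.
function reduceSketch(callback, initialValue) {
  let hasAcc = arguments.length > 1; // was an initial value supplied?
  let acc = initialValue;
  let index = 0;
  return new Transform({
    objectMode: true,
    transform(curr, _enc, done) {
      if (!hasAcc) {
        acc = curr;        // no initial value: the first element seeds the accumulator
        hasAcc = true;
      } else {
        acc = callback(acc, curr, index);
      }
      index++;
      done();              // nothing is pushed until the input ends
    },
    flush(done) {
      if (hasAcc) this.push(acc); // emit the single reduced value
      done();
    }
  });
}

Readable.from([1, 2, 3, 4])
  .pipe(reduceSketch((prev, curr) => prev + curr))
  .on('data', (total) => console.log(total)); // 10
```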
pi.filter(callback, [thisArg])
Returns a duplex stream which writes all values that pass (return
true for) the test implemented by the provided
callback function.
The callback is invoked with two arguments,
obj (the element value) and
index (the element index). If the callback returns
true, the element is written to the next stream, otherwise the element is filtered out.
pi.filter(function(post) { return !post.draft; })
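As a rough illustration of the test-then-forward behaviour, the following sketch builds a filter on a core Transform. filterSketch is a hypothetical name, and it treats any truthy return value as a pass, which may be looser than the library's own check.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for pi.filter, built on core streams only.
function filterSketch(callback, thisArg) {
  let index = 0;
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      // Forward the element only when the test passes; otherwise drop it.
      if (callback.call(thisArg, obj, index++)) this.push(obj);
      done();
    }
  });
}

Readable.from([{ title: 'one', draft: true }, { title: 'two', draft: false }])
  .pipe(filterSketch((post) => !post.draft))
  .on('data', (post) => console.log(post.title)); // two
```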
pi.mapKey(key, callback, [thisArg])
pi.mapKey(hash, [thisArg])
Returns a duplex stream which produces a new stream of values by mapping a single key (when given
key and
callback) or multiple keys (when given
hash) through a transformation callback. The callback is invoked with three arguments:
value (the value
element[key]),
obj (the element itself) and
index (the element index). The return value from the callback is set on the element, and the element itself is written back to the stream.
If
thisArg is provided, it is available as
this within the callback.
pi.fromArray([{ path: '/a/a' }, { path: '/a/b' }, { path: '/a/c' }])
.pipe(pi.mapKey('path', function(p) { return p.replace('/a/', '/some/'); }));
You can also call the
mapKey with a hash:
pi.mapKey({
a: function(value) { /* ... */},
b: 'str',
c: true
})
Each key in the hash is replaced with the return value of the function in the hash. When the value in the hash is not a function, it is simply assigned as the new value for that key.
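Both call styles can be folded into one sketch: a single key plus callback is treated as a one-entry hash, and non-function hash values are assigned directly. mapKeySketch is a hypothetical name and the code below uses only Node's core stream module; it is not the library's implementation.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for pi.mapKey, built on core streams only.
// Accepts (key, callback, thisArg) or (hash, thisArg).
function mapKeySketch(keyOrHash, callback, thisArg) {
  const single = typeof keyOrHash === 'string';
  const hash = single ? { [keyOrHash]: callback } : keyOrHash;
  const ctx = single ? thisArg : callback;
  let index = 0;
  return new Transform({
    objectMode: true,
    transform(obj, _enc, done) {
      for (const key of Object.keys(hash)) {
        const entry = hash[key];
        // Functions receive (value, obj, index); other values are assigned as-is.
        obj[key] = typeof entry === 'function' ? entry.call(ctx, obj[key], obj, index) : entry;
      }
      index++;
      done(null, obj); // the (mutated) element itself is written back
    }
  });
}

Readable.from([{ path: '/a/a' }, { path: '/a/b' }])
  .pipe(mapKeySketch('path', (p) => p.replace('/a/', '/some/')))
  .on('data', (obj) => console.log(obj.path));
```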
These utility functions make it easy to provide input into a stream or capture output from a stream.
pi.fromArray(arr)
Returns a readable stream given an array. The stream will emit one item for each item in the array, and then emit end.
pi.toArray(callback)
pi.toArray(array)
Returns a writable stream which buffers the input it receives into an array. When the stream emits
end, the
callback is called with one parameter - the array which contains the input elements written to the stream.
You can also pass an instance of an array instead of a callback. The array's contents will be updated with the elements from the stream when the writable stream emits
finish.
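The two toArray call styles can be illustrated with a small core Writable that buffers its input and hands it over on finish. toArraySketch is a hypothetical name, Readable.from plays the role of fromArray, and appending to the passed-in array is an assumption about what "updated" means.

```javascript
const { Readable, Writable } = require('stream');

// Hypothetical stand-in for pi.toArray, built on core streams only.
// `target` is either a callback or an existing array.
function toArraySketch(target) {
  const items = [];
  const sink = new Writable({
    objectMode: true,
    write(obj, _enc, done) { items.push(obj); done(); }
  });
  sink.on('finish', () => {
    if (typeof target === 'function') target(items);
    else target.push(...items); // assumption: elements are appended to the array
  });
  return sink;
}

Readable.from(['a', 'b', 'c'])
  .pipe(toArraySketch((result) => console.log(result))); // [ 'a', 'b', 'c' ]
```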
pi.fromAsync(fn)
Returns a readable stream given an async function. (since
v1.2.0)
The async function should accept one argument,
onDone, which is a
function(err, results). The function is called once - the first time someone reads from the stream. It should pass either a single result or an array of results to onDone.
The stream will emit one item for each item in the result (the single result, or each array item individually), and then emit end.
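A sketch of the lazy, call-once behaviour, using a core Readable whose read method triggers the async function on first use. fromAsyncSketch is a hypothetical name, and forwarding err by destroying the stream is an assumption rather than documented library behaviour.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for pi.fromAsync, built on core streams only.
function fromAsyncSketch(fn) {
  let started = false;
  return new Readable({
    objectMode: true,
    read() {
      if (started) return; // the async function runs once, on the first read
      started = true;
      fn((err, results) => {
        if (err) { this.destroy(err); return; } // assumption: surface err as an 'error' event
        const items = Array.isArray(results) ? results : [results];
        for (const item of items) this.push(item);
        this.push(null); // then signal end
      });
    }
  });
}

fromAsyncSketch((onDone) => setTimeout(() => onDone(null, ['x', 'y']), 10))
  .on('data', (item) => console.log(item));
```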
These functions make creating readable, writable and transform streams a bit less boilerplatey.
pi.thru([options], [transformFn], [flushFn]);
pi.thru.obj([transformFn], [flushFn]);
pi.thru.ctor([options], [transformFn], [flushFn]);
Returns a Transform stream given a set of
options, a
transformFn and
flushFn. You can call this function as
pi.through or
pi.thru. This uses the
through2 module, so you should take a look at the documentation for that module. In short:
options hash is passed to
stream.Transform to construct the stream. See the core docs.
transformFn has the signature:
function (chunk, encoding, onDone) {}. See the core docs for details.
flushFn has the signature
function(onDone). See the core docs for details.
thru.obj(fn) is a convenience wrapper around
thru({ objectMode: true }, fn).
thru.ctor() returns a constructor for a custom Transform. This is useful when you want to use the same transform logic in multiple instances.
BTW, if you need parallel execution but with the same API as a
thru stream, check out
parallel in the control flow section.
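For readers unfamiliar with the transformFn and flushFn signatures, here is a small sketch using the core Transform constructor directly, so it runs without through2. thruObjSketch is a hypothetical helper name; the pass-through default for a missing transformFn is an assumption.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical object-mode helper in the spirit of thru.obj, core streams only.
function thruObjSketch(transformFn, flushFn) {
  return new Transform({
    objectMode: true,
    // Assumption: with no transformFn, chunks pass through untouched.
    transform: transformFn || ((chunk, _enc, onDone) => onDone(null, chunk)),
    flush: flushFn
  });
}

let count = 0;
const upper = thruObjSketch(
  function (chunk, _enc, onDone) {   // called once per chunk
    count++;
    this.push(String(chunk).toUpperCase());
    onDone();
  },
  function (onDone) {                // called once, after the last chunk
    this.push('total: ' + count);
    onDone();
  }
);

Readable.from(['a', 'b']).pipe(upper).on('data', (v) => console.log(v));
```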
pi.writable([options], writeFn)
pi.writable.obj(writeFn)
pi.writable.ctor([options], writeFn)
Returns a Writable stream given a set of
options and a
writeFn.
Has the same options as
thru:
options hash is passed to
stream.Writable to construct the stream. See the core docs.
writeFn has the signature:
function(chunk, encoding, callback) {}. See the core docs for details.
writable.obj() is a convenience wrapper for
writable({ objectMode: true }).
writable.ctor() returns a constructor for the writable stream.
pi.readable([options], [readFn])
pi.readable.obj([readFn])
pi.readable.ctor([options], [readFn])
Returns a Readable stream given a set of
options and a
readFn.
Has the same options as
thru:
options hash is passed to
stream.Readable to construct the stream. See the core docs.
readFn has the signature:
function(size) {}. See the core docs for details.
readable.obj() is a convenience wrapper for
readable({ objectMode: true }).
readable.ctor() returns a constructor for the readable stream.
pi.duplex([options], writeFn, readFn)
pi.duplex.obj(writeFn, readFn)
pi.duplex.ctor([options], writeFn, readFn)
Returns a Duplex stream given a set of
options, a
writeFn and a
readFn.
Has the same options as
thru:
options hash is passed to
stream.Duplex to construct the stream. See the core docs.
writeFn has the signature:
function(chunk, encoding, callback) {}. See the core docs for details.
readFn has the signature:
function(size) {}. See the core docs for details.
duplex.obj() is a convenience wrapper for
duplex({ objectMode: true }).
duplex.ctor() returns a constructor for the duplex stream.
pi.combine(writableStream, readableStream)
Takes a writable stream and a readable stream and returns a duplex stream.
Note: the two streams ARE NOT piped together. If you want to construct a pipeline with multiple streams, you can, but you need to perform the pipe operations yourself (or use the
.pipeline function instead). This makes
.combine work with streams where the connection is not via a pipe mechanism, like with
child_process.spawn:
var child = require('child_process').spawn('wc', ['-c']);
pi.fromArray(['a', 'b', 'c'])
.pipe(pi.combine(child.stdin, child.stdout))
.pipe(process.stdout);
Listeners for the
error event will receive errors that are emitted in either stream, or that are emitted as a result of piping into the duplex stream.
pi.devnull()
Returns a writable stream which consumes any input and produces no output. Useful for consuming output from duplex streams when prototyping or when you want to run the processing but discard the final output.
pi.cap(duplex)
Returns a writable stream given a duplex stream. Any input written into the stream is written to the duplex stream.
pi.clone()
Returns a duplex stream. Inputs written to the stream are cloned and then written out. This is useful if you need to ensure that concurrent modifications to objects written into multiple streams do not influence each other.
These functions allow you to write more advanced streams, going from one linear sequence of transformation steps to multiple pipelines.
pi.fork(stream1, [stream2], [...])
pi.fork([ stream1, stream2, ... ])
Returns a duplex stream. Inputs written to the stream are written to all of the streams passed as arguments to
fork.
Every forked stream receives a clone of the original input object. Cloning prevents annoying issues that might occur when one fork stream modifies an object that is shared among multiple forked streams.
Also accepts a single array of streams as the first parameter.
Listeners for the
error event on the stream returned from
fork will receive errors that are emitted in any of the streams passed to the function.
pi.match(condition1, stream1, [condition2], [stream2], [...], [rest])
pi.match([ condition1, stream1, condition2, stream2, ..., rest ])
Allows you to construct
if-else style conditionals which split a stream into multiple substreams based on a condition.
Returns a writable stream given a series of
condition function and
stream pairs. When elements are written to the stream, they are matched against each condition function in order.
The
condition function is called with two arguments -
obj (the element value) and
index (the element index). If the condition returns
true, the element is written to the associated stream and no further matches are performed.
The last argument,
rest is optional. It should be a writable stream (without a preceding condition function). Any elements not matching the other conditions will be written into it.
Listeners for the
error event on the stream returned from
match will receive errors that are emitted in any of the streams passed to the function.
pi.fromArray([
{ url: '/people' },
{ url: '/posts/1' }, { url: '/posts' },
{ url: '/comments/2' }])
.pipe(pi.match(
function(req) { return /^\/people.*$/.test(req.url); },
pi.pipeline(
pi.forEach(function(obj) { console.log('person!', obj); }),
pi.devnull()
),
function(req) { return /^\/posts.*$/.test(req.url); },
pi.pipeline(
pi.forEach(function(obj) { console.log('post!', obj); }),
pi.devnull()
),
pi.pipeline(
pi.forEach(function(obj) { console.log('other:', obj); }),
pi.devnull()
)
));
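The first-match-wins routing can be sketched with a core Writable that walks the condition/stream pairs in order. matchSketch and sink are hypothetical names; the sketch ignores backpressure and error forwarding, both of which a real implementation would need to handle.

```javascript
const { Readable, Writable } = require('stream');

// Hypothetical stand-in for pi.match, built on core streams only.
// Arguments are (condition, stream) pairs plus an optional trailing "rest" stream.
function matchSketch(...args) {
  const rest = args.length % 2 === 1 ? args.pop() : null;
  const targets = args.filter((_, k) => k % 2 === 1).concat(rest || []);
  let index = 0;
  return new Writable({
    objectMode: true,
    write(obj, _enc, done) {
      const i = index++;
      for (let k = 0; k < args.length; k += 2) {
        if (args[k](obj, i)) {
          args[k + 1].write(obj); // first match wins; backpressure is ignored here
          return done();
        }
      }
      if (rest) rest.write(obj);  // nothing matched: fall through to rest, if given
      done();
    },
    final(done) {
      targets.forEach((t) => t.end()); // close every branch when the input ends
      done();
    }
  });
}

// Small object-mode sink that records what it receives under a label.
function sink(label, log) {
  return new Writable({
    objectMode: true,
    write(obj, _enc, done) { log.push(label + ' ' + obj.url); done(); }
  });
}

const log = [];
Readable.from([{ url: '/people' }, { url: '/posts/1' }, { url: '/comments/2' }])
  .pipe(matchSketch(
    (req) => /^\/people/.test(req.url), sink('person!', log),
    (req) => /^\/posts/.test(req.url), sink('post!', log),
    sink('other:', log)
  ))
  .on('finish', () => console.log(log));
```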
pi.merge(stream1, [stream2], [...])
pi.merge([ stream1, stream2, ... ])
Takes multiple readable streams and merges them into one stream. Accepts any number of readable streams and returns a duplex stream.
pi.forkMerge(stream1, [stream2], [...])
pi.forkMerge([ stream1, stream2, ... ])
Fork followed by merge on a set of streams. Accepts any number of duplex streams; returns a duplex stream that:
forks each input, writes each input into the streams,
merges the inputs from the streams and writes them out
Useful if you need to concurrently apply different operations on a single input but want to produce a single merged output.
/ to-html() \
read .md() - to-pdf() - write-to-disk()
\ to-rtf() /
For example, imagine converting a set of Markdown files into the HTML, PDF and RTF formats - the same file goes in, each of the processing operations is applied, but at the end there are three objects (binary files in the different formats) that go into the same "write to disk" pipeline.
pi.matchMerge(condition1, stream1, [condition2], [stream2], [...], [rest])
pi.matchMerge([ condition1, stream1, condition2, stream2, ..., rest ])
Match followed by merge on a set of streams. Accepts any number of duplex streams; returns a duplex stream that:
matches conditions, selects the correct stream and writes to that stream
merges the inputs from each of the streams and writes them out
Useful if you want to conditionally process some elements differently, while sharing the same downstream pipeline.
For example, if you want to first check a cache and skip some processing for items that hit in the cache, you could do something like
pi.matchMerge(checkCache, getResultFromCache, performFullProcessing) (where
checkCache is a function and the other two are through streams).
pi.parallel(limit, [transformFn], [flushFn])
Returns an object-mode Transform stream given a
limit, a
transformFn and
flushFn. Works like a
through.obj stream but:
transformFn can be launched multiple times in parallel, with up to
limit tasks running at the same time
flushFn is only called after both 1) the thru-stream is instructed to end AND 2) all the tasks have been completed.
The stream also emits two additional events:
"done": emitted after each
transformFn execution completes
"empty": emitted when the execution queue becomes empty
The usual thru-stream conventions apply:
transformFn has the signature:
function (chunk, encoding, onDone) {}. See the core docs for details.
flushFn has the signature
function(onDone). See the core docs for details.
Both
transformFn and
flushFn are optional. If the transformFn is not provided, then it defaults to:
function(task, enc, done) { task.call(this, done); }
which works nicely if the items in your stream are something like:
pi.fromArray([
function(done) { this.push(1); done(); },
function(done) { this.push(2); done(); }
])
.pipe(pi.parallel(2))
.pipe(pi.toArray(function(result) {
assert.deepEqual(result.sort(), [1, 2]);
}));
Note how each task runs with
this set to the
parallel stream, which means you can push results out. Similar to normal core streams, the
done function can return one argument -
err. If you need to process the other arguments, define your own
transformFn.
Of course, you don't have to use callback functions just to get parallel processing - any task, even a basic thru stream like:
pi.parallel(16, function(filename, enc, done) {
var self = this;
fs.stat(filename, function(err, result) {
self.push(result); done();
})
});
will execute up to 16 stat calls at a time with
parallel.
Note that you can safely call
this.write() from within the transform function to add more tasks to run - this can be useful if your task processing causes more tasks to need to run. If you need the new payloads to go through some upstream processing, you might consider writing to another stream that precedes
parallel, provided you haven't ended that stream yet.
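One way to picture the concurrency limit is a Transform that acknowledges each chunk immediately while fewer than limit tasks are running, and otherwise holds the acknowledgement until a task finishes. The sketch below is a hypothetical illustration on core streams (parallelSketch is not a library name); it does not emit the "done" and "empty" events and is not how pipe-iterators is implemented.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical limited-concurrency transform, core streams only.
function parallelSketch(limit, transformFn) {
  let running = 0;
  let resume = null; // held write callback; calling it lets the next chunk in
  let finish = null; // held flush callback; called once all tasks have drained
  return new Transform({
    objectMode: true,
    transform(chunk, enc, done) {
      running++;
      transformFn.call(this, chunk, enc, (err) => {
        running--;
        if (err) { this.destroy(err); return; }
        if (resume) { const r = resume; resume = null; r(); }
        if (running === 0 && finish) { const f = finish; finish = null; f(); }
      });
      // Accept more input right away while there is spare capacity.
      if (running < limit) done(); else resume = done;
    },
    flush(done) {
      // Only end the stream after every in-flight task has completed.
      if (running === 0) done(); else finish = done;
    }
  });
}

Readable.from(['a', 'b', 'c'])
  .pipe(parallelSketch(2, function (item, _enc, done) {
    setTimeout(() => { this.push(item.toUpperCase()); done(); }, 10);
  }))
  .on('data', (v) => console.log(v));
```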
These functions apply
pipe in various ways to make it easier to go from an array of streams to a pipeline.
pi.pipe(stream1, [stream2], [...])
pi.pipe([ stream1, stream2, ...])
Given a series of streams, calls
.pipe() for each stream in sequence and returns an array which contains all the streams. Used by
head() and
tail().
Also accepts a single array of streams as the first parameter.
pi.head(stream1, [stream2], [...])
pi.head([ stream1, stream2, ... ])
Given a series of streams, calls
.pipe() for each stream in sequence and returns the first stream in the series.
Also accepts a single array of streams as the first parameter.
Similar to
a.pipe(b).pipe(c), but
.head() returns the first stream (
a) rather than the last stream (
c).
pi.tail(stream1, [stream2], [...])
pi.tail([ stream1, stream2, ... ])
Given a series of streams, calls
.pipe() for each stream in sequence and returns the last stream in the series.
Also accepts a single array of streams as the first parameter.
Just like calling
a.pipe(b).pipe(c).
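The difference between the three helpers is only in what they return, which a short sketch on core PassThrough streams makes visible. pipeSketch, headSketch and tailSketch are hypothetical names, not the library's functions, and the sketch assumes at least one stream is passed.

```javascript
const { PassThrough } = require('stream');

// Hypothetical stand-ins for pi.pipe, pi.head and pi.tail, core streams only.
function pipeSketch(...streams) {
  const list = Array.isArray(streams[0]) ? streams[0] : streams;
  list.reduce((prev, next) => prev.pipe(next)); // same as a.pipe(b).pipe(c)
  return list;                                  // all streams, in order
}
function headSketch(...streams) {
  return pipeSketch(...streams)[0];             // first stream: write here
}
function tailSketch(...streams) {
  const list = pipeSketch(...streams);
  return list[list.length - 1];                 // last stream: read here
}

const make = () => new PassThrough({ objectMode: true });
const [a, b, c] = [make(), make(), make()];
console.log(headSketch(a, b, c) === a);   // true
const [x, y, z] = [make(), make(), make()];
console.log(tailSketch([x, y, z]) === z); // true
```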
pi.pipeline(stream1, stream2, ...)
pi.pipeline([ stream1, stream2, ... ])
Constructs a pipeline from a series of streams. Always returns a single stream object, which is either duplex or writable, depending on the last stream in the series:
Given a pipeline that starts with a duplex stream and ends with a duplex stream,
pipeline returns a single duplex stream in which any writes go to the first stream and any reads/pipes etc. are done from the last stream.
Normally, when just manually applying
pipe you have to pick whether to return the first stream or the last stream in the pipeline. Returning the first stream has the benefit that writes to it will correctly go into the pipeline, but of course any reads/pipes from it will skip the rest of the pipeline. Returning the last stream has the opposite problem: you can read from the pipeline but cannot pipe to the first stream anymore. With
pipeline you don't need to choose, since the return result works as you would expect.
Given a pipeline that starts with a duplex stream and ends with a writable stream,
pipeline returns a single writable stream in which any writes go to the first stream. Since the last stream in the pipeline is writable but not readable, the pipeline is also only writable but not readable. This helps stop errors where you accidentally pipe the first stream of a pipeline out (which will not work as expected since outputs do not pass through the whole pipeline).
With
.pipeline(), writes into the return value go to the first stream but reads (and pipe calls) are applied to the last stream in the pipeline:
module.exports = function() {
return pi.pipeline(a, a2, a3);
};
works as expected and
input.pipe(myPipeline).pipe(b) writes to
a but reads from
a3.
Listeners for the
error event on the stream returned from
pipeline will receive errors that are emitted in any of the streams passed to the function.
These functions are like rvagg/isstream, but they work correctly on Node 0.8. The main differences are that 1) the 0.8 core streams from things like fs and child_process are correctly detected and 2) the functions use duck typing (checking for conformance to an API) rather than
instanceof checks which can be problematic in a browser environment or when using modules that are compatible from an API perspective but do not descend from the native
stream.
pi.isStream(obj)
Returns true if a stream provides either the Readable stream interface or the Writable stream interface.
pi.isReadable(obj)
Returns true if a stream provides the Readable stream interface.
pi.isWritable(obj)
Returns true if a stream provides the Writable stream interface.
pi.isDuplex(obj)
Returns true if a stream provides both the Readable and Writable stream interfaces.
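Duck typing here means checking for methods rather than for a prototype chain. The sketch below shows the idea with hypothetical names and hypothetical criteria (read and pipe for readable, write and end for writable); the library's actual checks, in particular those that detect Node 0.8 streams, may differ.

```javascript
const { Readable, Writable, PassThrough } = require('stream');

// Hypothetical duck-typing checks; the method lists are assumptions.
function isReadableSketch(obj) {
  return !!obj && typeof obj.read === 'function' && typeof obj.pipe === 'function';
}
function isWritableSketch(obj) {
  return !!obj && typeof obj.write === 'function' && typeof obj.end === 'function';
}
function isDuplexSketch(obj) {
  return isReadableSketch(obj) && isWritableSketch(obj);
}
function isStreamSketch(obj) {
  return isReadableSketch(obj) || isWritableSketch(obj);
}

console.log(isDuplexSketch(new PassThrough()));                 // true
console.log(isReadableSketch(new Writable({ write() {} })));    // false
console.log(isStreamSketch({ write() {}, end() {} }));          // true (no instanceof involved)
```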
Meh,
through2 streams already make writing async iteration quite easy.
Best handled by something that can do that in an efficient manner, such as binary-split.