Date: 2021-06-08
At LogDNA, consuming log files and making them searchable is what we do!
It all starts with the ability to efficiently watch log files on a local
host and send new lines up to the LogDNA service. This Node.js class provides
functionality like Unix's
tail -f command, and we use it in our
agents to get the job done. Of course, anything needing
tail functionality
in Node.js could also benefit from using this.
It is implemented as a Readable stream, which is efficient
and flexible: its output can be piped
to other streams or consumed via events.
npm install @logdna/tail-file
Create an instance by passing the full path of a file to tail.
This will return a stream that can be piped to other streams or consumed
via
data events. To begin the tailing, call the
start method.
Example consuming data events
const TailFile = require('@logdna/tail-file')

const tail = new TailFile('/path/to/your/logfile.txt', {encoding: 'utf8'})
  .on('data', (chunk) => {
    console.log(`Received a utf8 character chunk: ${chunk}`)
  })
  .on('tail_error', (err) => {
    console.error('TailFile had an error!', err)
  })
  .on('error', (err) => {
    console.error('A TailFile stream error was likely encountered', err)
  })

// start() returns a Promise, so call it separately to keep `tail` bound
// to the TailFile instance rather than to the Promise
tail.start().catch((err) => {
  console.error('Cannot start. Does the file exist?', err)
})
Example using pipe
This example is more realistic. It pipes the output to a transform stream
which breaks the data up by newlines, emitting its own
data event for
every line.
const TailFile = require('@logdna/tail-file')
const split2 = require('split2') // A common and efficient line splitter

const tail = new TailFile('/path/to/your/logfile.txt')

tail
  .on('tail_error', (err) => {
    console.error('TailFile had an error!', err)
    throw err
  })
  .start()
  .catch((err) => {
    console.error('Cannot start. Does the file exist?', err)
    throw err
  })

// Data won't start flowing until piping
tail
  .pipe(split2())
  .on('data', (line) => {
    console.log(line)
  })
Example using readline
This is an easy way to get a "line splitter" by using Node.js core modules.
For tailing files with high throughput, an official
Transform stream is
recommended since it will edge out
readline slightly in performance.
const readline = require('readline')
const TailFile = require('@logdna/tail-file')

async function startTail() {
  const tail = new TailFile('./somelog.txt')
    .on('tail_error', (err) => {
      console.error('TailFile had an error!', err)
    })

  try {
    await tail.start()
    const linesplitter = readline.createInterface({
      input: tail
    })

    linesplitter.on('line', (line) => {
      console.log(line)
    })
  } catch (err) {
    console.error('Cannot start. Does the file exist?', err)
  }
}

startTail().catch((err) => {
  process.nextTick(() => {
    throw err
  })
})
TailFile is a
Readable stream, so it can emit any events from that
superclass. Additionally, it will emit the following custom events.
'flush'
This event is emitted when the underlying stream is done being read.
If backpressure is in effect, then
_read() may be called multiple
times until it's flushed, so this event signals the end of the process.
It is used primarily in shutdown to make sure the data is exhausted,
but users may listen for this event if the relative "read position" in the
file is of interest. For example, the
lastReadPosition may be persisted to memory
or database for resuming
tail-file on a separate execution without missing
any lines or duplicating them.
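The resume idea above could be sketched roughly as follows. The helper names (loadPosition, trackPosition) and the JSON state file are hypothetical, and the sketch assumes the flush event payload carries a lastReadPosition property; check the payload in your installed version before relying on it.

```javascript
const fs = require('fs')

// Hypothetical helper: returns the saved position, or null (meaning "tail
// from EOF") when no usable state file exists.
function loadPosition(stateFile) {
  try {
    const {lastReadPosition} = JSON.parse(fs.readFileSync(stateFile, 'utf8'))
    return Number.isInteger(lastReadPosition) ? lastReadPosition : null
  } catch (err) {
    return null
  }
}

// Hypothetical helper: persists the position every time `tail` emits 'flush'.
// Assumes the event payload looks like {lastReadPosition: <Number>}.
function trackPosition(tail, stateFile) {
  tail.on('flush', ({lastReadPosition}) => {
    fs.writeFileSync(stateFile, JSON.stringify({lastReadPosition}))
  })
  return tail
}
```

On startup, the value from loadPosition() could be passed as startPos, and trackPosition() applied to the new instance before calling start(). Writing synchronously on every flush keeps the sketch short, but it may be too slow for busy logs.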
'renamed'
<Object>
This event is emitted when a file with the same name is found, but has a different inode than the previous poll. Commonly, this happens during a log rotation.
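For illustration, the inode comparison described above might look like the following minimal sketch. The helper name wasRotated is hypothetical, and this is not the library's internal code; it only shows the idea of comparing fs.Stats values between polls.

```javascript
const fs = require('fs')

// Hypothetical helper: true when `filename` now points at a different inode
// than the one recorded in `previousStat` (an fs.Stats from an earlier poll).
function wasRotated(filename, previousStat) {
  const currentStat = fs.statSync(filename)
  return currentStat.ino !== previousStat.ino
}
```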
'retry'
<Object>
If a file that was successfully being tailed goes away,
TailFile will
retry polling the file up to
maxPollFailures times. Each retry emits
this event for informational purposes. Typically, this could happen
if log rolling is occurring manually, or is timed such that the poll happens
before the "new" file has been created.
'tail_error'
<Error>
When an error happens that is specific to
TailFile, it cannot emit an
error event
without causing the main stream to end (because it's a
Readable implementation).
Therefore, if an error happens in a place such as reading the underlying file
resource, a
tail_error event will be emitted instead.
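The difference between the two error paths can be seen with a bare Readable from Node.js core standing in for TailFile. This is only a sketch of the reasoning above, with hypothetical helper names, and says nothing about how the library is written internally.

```javascript
const {Readable} = require('stream')

// A bare Readable standing in for TailFile (illustration only).
function makeStream() {
  return new Readable({read() {}})
}

// Reporting a problem through a custom event leaves the stream usable.
function reportSoftError(stream, err) {
  stream.emit('tail_error', err)
  return stream.destroyed // still false
}

// Reporting it through the standard error path tears the stream down.
function reportFatalError(stream, err) {
  stream.destroy(err) // the stream will emit 'error' and stop
  return stream.destroyed // now true
}
```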
'truncated'
<Object>
If a file is shortened or truncated without moving or renaming the file,
TailFile will assume it to be a new file, and it will start consuming
lines from the beginning of the file. This event is emitted for informational
purposes about that behavior.
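The truncation rule above amounts to a simple comparison between the last consumed offset and the current file size. A minimal sketch, using a hypothetical helper name and return shape:

```javascript
// Hypothetical helper: decide where the next read should begin.
// `lastPosition` is the byte offset already consumed; `currentSize` is the
// file size reported by the latest poll.
function nextReadStart(lastPosition, currentSize) {
  const truncated = currentSize < lastPosition
  return {truncated, start: truncated ? 0 : lastPosition}
}
```

A file that is truncated and then grows past the old offset before the next poll would not be caught by a size check alone, which is one reason this is only a sketch.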
(Any Readable event)
TailFile implements a
Readable
stream, so it may also emit these events. The most common ones are
close
(when
TailFile exits), or
data events from the stream.
new TailFile(filename[, options])
filename
<String> - The filename to tail.
Poll errors do not happen until
start is called.
options
<Object> - Optional
pollFileIntervalMs
<Number> - How often to poll
filename for changes.
Default:
1000ms
pollFailureRetryMs
<Number> - After a polling error (such as ENOENT), how long to
wait before retrying. Default:
200ms
maxPollFailures
<Number> - The number of times to retry a failed poll
before exiting/erroring. Default:
10 times.
readStreamOpts
<Object> - Options to pass to the
fs.createReadStream function. This is used for reading bytes
that have been added to
filename between every poll.
startPos
<Number> - An integer representing the initial read position in
the file. Useful for reading from
0. Default:
null (start tailing from EOF)
Any additional options are passed to the Readable superclass constructor of TailFile
Throws: <TypeError>|<RangeError> if parameter validation fails
Returns: TailFile, which is a Readable stream
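To make the documented defaults concrete, here is a small sketch of how options might be merged and checked. The default values come from the list above; the helper name resolveOptions and the specific validation rules are assumptions for illustration and may not match what the library actually enforces.

```javascript
// Defaults as listed in the options above.
const DEFAULTS = Object.freeze({
  pollFileIntervalMs: 1000,
  pollFailureRetryMs: 200,
  maxPollFailures: 10,
  startPos: null
})

// Hypothetical validator: the rules below are assumed, not documented.
function resolveOptions(filename, options = {}) {
  if (typeof filename !== 'string' || filename.length === 0) {
    throw new TypeError('filename must be a non-empty string')
  }
  const resolved = {...DEFAULTS, ...options}
  const numeric = ['pollFileIntervalMs', 'pollFailureRetryMs', 'maxPollFailures']
  for (const key of numeric) {
    if (!Number.isInteger(resolved[key])) {
      throw new TypeError(`${key} must be an integer`)
    }
    if (resolved[key] < 0) {
      throw new RangeError(`${key} must not be negative`)
    }
  }
  return resolved
}
```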
Instantiating
TailFile will return a readable stream, but nothing will happen
until
start() is called. After that, follow the standard Node.js procedure to
get the stream into flowing mode. Typically, this means using
pipe or attaching
data listeners to the readable stream.
As the underlying
filename is polled for changes, TailFile calls
fs.createReadStream to efficiently read the bytes added since the last poll.
To control the options of that stream, the key-values in
readStreamOpts will
be passed to
fs.createReadStream. Similarly, options for
controlling
TailFile's own stream can be passed in via
options, and they will
get passed through to the
Readable's
super() constructor.
Useful settings such as
encoding: 'utf8' can be used this way.
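The option pass-through described above can be pictured with a stand-in class. TailSketch below is hypothetical and is not the real TailFile; it only shows one way a constructor could separate readStreamOpts from the options handed to the Readable superclass.

```javascript
const {Readable} = require('stream')

// Stand-in class (not the real TailFile) showing one way to split options.
class TailSketch extends Readable {
  constructor(filename, options = {}) {
    const {readStreamOpts = {}, ...streamOpts} = options
    super(streamOpts) // e.g. {encoding: 'utf8'} configures the Readable
    this.filename = filename
    this.readStreamOpts = readStreamOpts // kept for fs.createReadStream later
  }

  _read() {} // no-op: this sketch never produces data
}

const sketch = new TailSketch('/path/to/your/logfile.txt', {
  encoding: 'utf8',
  readStreamOpts: {highWaterMark: 64 * 1024}
})
```

Here sketch.readableEncoding reports the encoding that reached the Readable constructor, while readStreamOpts is held back for the file reads.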
tail.start()
Returns: <Promise> - Resolves after the file is polled successfully
Rejects: if filename is not found
Calling
start() begins the polling of
filename to watch for added/changed bytes.
start() may be called before or after data is set up to be consumed with a
data listener or a
pipe. Standard Node.js stream rules apply:
data will not flow through the stream until it is consumed.
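The stream rule mentioned above can be observed on any Readable from Node.js core. The snippet below uses a bare Readable rather than TailFile, purely to show the state change when a consumer is attached.

```javascript
const {Readable} = require('stream')

const stream = new Readable({read() {}})
const before = stream.readableFlowing // null: no consumer attached yet

stream.on('data', () => {}) // attaching a 'data' listener starts the flow
const after = stream.readableFlowing // true: the stream is in flowing mode
```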
tail.quit()
Returns: undefined
Emits: close when the parent
Readable stream is ended.
This function closes all streams and exits cleanly. The parent
TailFile stream will be
properly ended by pushing
null, therefore an
end event may be emitted as well.
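One common use of quit() is shutting down on a signal. The sketch below is a hypothetical helper, not part of the package: it assumes only that the object passed as tail has a quit() method and emits close once it has finished, as described above.

```javascript
// Hypothetical helper: `tail` is anything with quit() that emits 'close';
// `proc` defaults to the current process so the helper can be tested with
// a plain EventEmitter.
function quitOnSignal(tail, onClosed, proc = process, signal = 'SIGINT') {
  proc.once(signal, () => {
    tail.once('close', onClosed) // run the callback after streams are closed
    tail.quit()
  })
}
```

A typical call might be quitOnSignal(tail, () => process.exit(0)), which waits for the close event before exiting.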
"File watcher" events don't always work across different operating systems,
so the most effective way to "tail" a file is to continuously poll
it for changes and read those changes when they're detected.
Even Unix's
tail -f command works similarly.
Once
start() is called,
TailFile will begin this polling process. As changes
are detected through a
.size comparison, it uses
fs.createReadStream to
efficiently read to the end of the file using async/await iterators.
This allows backpressure to be supported throughout the process.
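A single polling step based on a size comparison might look like the sketch below. It is a synchronous simplification using fs.statSync and fs.readSync, chosen to keep the example short; the text above describes the library as reading through a stream with async iteration instead. The helper name pollOnce and its return shape are hypothetical.

```javascript
const fs = require('fs')

// One polling step: compare the file size to the last known position and
// return whatever was appended since then.
function pollOnce(filename, lastPosition) {
  const {size} = fs.statSync(filename)
  if (size <= lastPosition) {
    return {chunk: Buffer.alloc(0), position: lastPosition} // nothing new
  }

  const chunk = Buffer.alloc(size - lastPosition)
  const fd = fs.openSync(filename, 'r')
  try {
    const bytesRead = fs.readSync(fd, chunk, 0, chunk.length, lastPosition)
    return {
      chunk: chunk.subarray(0, bytesRead),
      position: lastPosition + bytesRead
    }
  } finally {
    fs.closeSync(fd)
  }
}
```

Calling pollOnce() on a timer and feeding each returned position into the next call gives the basic tail loop, without any of the backpressure handling a stream provides.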
TailFile keeps a
FileHandle open for the
filename, which is attached to an
inode. If log rolling happens,
TailFile uses the
FileHandle to read the rest of the
"old" file before starting the process from the beginning of the newly-created file.
This ensures that no data is lost due to the rolling/renaming of
filename.
This functionality assumes that
filename is re-created with the same name.
If
filename does not reappear, an error is emitted.
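The reason an open handle survives a log roll is that, on POSIX systems, a descriptor stays attached to the inode even after the file is renamed. The sketch below shows that with a plain file descriptor and a hypothetical drain helper; it assumes POSIX rename semantics and is not the library's own code.

```javascript
const fs = require('fs')

// Read everything from `position` to EOF on an already-open descriptor.
// Because the descriptor follows the inode, this still reads the "old"
// file after it has been renamed.
function drain(fd, position) {
  const parts = []
  const buf = Buffer.alloc(64 * 1024)
  let bytesRead
  while ((bytesRead = fs.readSync(fd, buf, 0, buf.length, position)) > 0) {
    parts.push(Buffer.from(buf.subarray(0, bytesRead))) // copy before reuse
    position += bytesRead
  }
  return Buffer.concat(parts)
}
```

After draining the old descriptor, a tailer would open the re-created filename and start again from position 0.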
Because
TailFile won't be consumed until it is in a reading mode,
this may cause backpressure to be enacted. In other words, if
.start() is called,
but
pipe or data events are not immediately set up,
TailFile may encounter
backpressure if its
push() calls exceed the high water mark.
Backpressure can also happen if
TailFile becomes unpiped.
In these cases,
TailFile will stop polling and wait until data is flowing before
polling resumes.
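The high water mark behavior described above can be seen with a bare Readable from Node.js core. The tiny highWaterMark here is chosen only to make the effect visible; it is not a recommended setting, and the snippet does not use TailFile itself.

```javascript
const {Readable} = require('stream')

// Tiny high water mark so the effect is easy to see. No consumer is
// attached, so pushed data accumulates in the internal buffer.
const stream = new Readable({highWaterMark: 4, read() {}})

const first = stream.push('ab') // true: 2 bytes buffered, below the mark
const second = stream.push('cdef') // false: 6 bytes buffered, mark reached
```

A false return value from push() is the signal a producer uses to stop reading, which corresponds to the pause in polling described above.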
While polling is paused during backpressure,
TailFile can handle
a single log roll or rename. If
the log is renamed more than once in that window, data will most likely be lost, because
no polling for changes is taking place.
This is an extremely unlikely edge case; however, we recommend consuming the
TailFile
stream almost immediately upon creation.