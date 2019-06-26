Handle uncaught exceptions.
Supports Node.js 0.10 only. Designed for robustness and guaranteed eventual termination of the process.
var uncaughtHandler = require('uncaught-exception');

var myLogger = {
    fatal: function fatal(message, metaObj, callback) {
        // must call the callback once logged
    }
}
var myStatsd = {
    immediateIncrement: function inc(key, count, callback) {
        // call the callback once incremented.
    }
}

var onError = uncaughtHandler({
    logger: myLogger,
    statsd: myStatsd,
    meta: { 'hostname': require('os').hostname() },
    abortOnUncaught: true, // opt into aborting on uncaught
    backupFile: '/path/to/uncaught-handler.log',
    gracefulShutdown: function (callback) {
        // perform some graceful shutdown here.
        // for example synchronize state of your app to redis
        // for example communicate to master process in cluster
        // and ask for a new worker to be started
        // must call callback once gracefully shutdown
        // after you call the callback the process will shutdown
    }
})

process.on('uncaughtException', onError)
For Node 0.10 and critical services we recommend that you:
Pass statsd and set up alerts & metrics on "uncaught exceptions".
Pass logger and pay attention to any "fatal" logs.
Pass backupFile; this is gold if your logger has an internal uncaught exception or bug.
Set abortOnUncaught to false. Uncaught exceptions are bad, but no availability is worse. Where possible you want to avoid the absolute worst case of "spin crashing" and continue on exceptions for partial availability, even if that means data corruption.
Generally the Node.js documentation recommends against continuing in an undefined state, as it can cause cascading failures.
This is completely correct. However, having the system restart itself automatically causes a different failure mode that is worse. Generally, partial availability trumps all.
However, uncaught exceptions are serious. You should page the on-call, even in the middle of the night. The on-call is responsible for taking an action, whether that's mitigation, failover or restarting the worker. Whatever else is done, the on-call must ALWAYS restart the worker.
If you want to partially automate the restarting of workers, then I would recommend you implement something like:
Start workers in ABORT mode, where abortOnUncaught is true.
After a worker aborts, restart it in NOT_ABORT mode for the next 24 hours.
By implementing an improved supervisor you can "throttle" the number of process aborts that happen. This allows a single restart in the common case without manual intervention and maintains partial availability during a "spin crash" or "high volume uncaught" scenario.
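The throttling policy can be sketched as a small decision helper that a supervisor consults before starting a worker. This is only an illustration: createAbortThrottle, recordAbort and nextMode are hypothetical names, not part of uncaught-exception, and how the chosen mode reaches the worker (for example through an environment variable) is left as an assumption.

```javascript
// Hypothetical supervisor helper; not part of uncaught-exception.
var DAY_MS = 24 * 60 * 60 * 1000;

function createAbortThrottle(windowMs) {
    var lastAbortAt = null;

    return {
        // Record that a worker aborted at time `now` (ms since epoch).
        recordAbort: function recordAbort(now) {
            lastAbortAt = now;
        },
        // Decide which mode the next worker should start in.
        nextMode: function nextMode(now) {
            if (lastAbortAt !== null && now - lastAbortAt < windowMs) {
                return 'NOT_ABORT';
            }
            return 'ABORT';
        }
    };
}

var throttle = createAbortThrottle(DAY_MS);
var firstMode = throttle.nextMode(0);               // no abort seen yet
throttle.recordAbort(1000);
var modeAfterAbort = throttle.nextMode(2000);       // inside the window
var modeNextDay = throttle.nextMode(1000 + DAY_MS); // window has elapsed
```

A worker started this way would set abortOnUncaught to true only when the mode it was given is 'ABORT'.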
gracefulShutdown handler
The gracefulShutdown handler is really a pre-abort asynchronous hook used to coordinate with your routing infrastructure. For example, you may want HAProxy to remove you from the pool, or you may want to more gently leave a ringpop instance.
See docs.mli for type definitions
var onError = uncaughtHandler(options)
uncaught-exception/uncaught := (options: {
    logger: {
        fatal: (String, Object, Callback) => void
    },
    statsd: {
        immediateIncrement: (String, Number, Callback) => void
    },
    meta?: Object,
    statsdKey?: String,
    statsdWaitPeriod?: Number,
    backupFile?: "stdout" | "stderr" | String,
    abortOnUncaught?: Boolean,
    loggerTimeout?: Number,
    statsdTimeout?: Number,
    shutdownTimeout?: Number,
    gracefulShutdown?: (Callback) => void,
    preAbort?: () => void
}) => onError: (Error) => void
uncaughtHandler takes an options object and returns an error handling function that can be registered as the 'uncaughtException' listener of the process.
You must pass the uncaughtHandler a logger with a fatal() method.
When abortOnUncaught is true, the uncaughtHandler will exit your process once it's done logging the error.
options.logger
options.logger is a logger object used to log the exception. It's expected to have a fatal() method that takes a string, an error object and a callback.
The logger should invoke the callback once it has flushed the log to all the logging backends you support (e.g. disk, Sentry, etc.).
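As a rough illustration of that contract, a minimal logger might look like the sketch below. createLogger and the in-memory fakeStream are hypothetical names introduced for the example; the sketch assumes the second argument is JSON-serializable and that the stream's write() accepts a completion callback.

```javascript
// Hypothetical logger; any object with a fatal(message, meta, callback)
// method satisfies the contract described above.
function createLogger(stream) {
    return {
        fatal: function fatal(message, meta, callback) {
            var line = JSON.stringify({
                level: 'fatal',
                message: message,
                meta: meta
            }) + '\n';
            // Invoke the callback only once the write has been flushed.
            stream.write(line, function onFlushed(err) {
                callback(err);
            });
        }
    };
}

// In-memory stand-in for a writable stream, for illustration only.
var written = [];
var fakeStream = {
    write: function write(chunk, cb) {
        written.push(chunk);
        cb();
    }
};

var flushed = false;
createLogger(fakeStream).fatal('uncaught', { hostname: 'example' }, function () {
    flushed = true;
});
```

The important property is that the callback fires after the write completes, not when fatal() returns.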
options.statsd
options.statsd is a statsd object used to increment counters. It's expected to have an immediateIncrement() method that takes a string, a number and a callback.
The statsd should invoke the callback once it has flushed the increment to the stats service.
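A statsd object that meets this contract could be sketched as follows. createStatsd and sendPacket are hypothetical: sendPacket(text, callback) stands for whatever transport you use (for instance a thin wrapper around a UDP socket), and a fake transport is used here so the example is self-contained.

```javascript
// Hypothetical statsd client; `sendPacket(text, callback)` is an assumed
// transport, e.g. a thin wrapper around a UDP socket's send().
function createStatsd(sendPacket) {
    return {
        immediateIncrement: function inc(key, count, callback) {
            // statsd counter line format: <key>:<count>|c
            sendPacket(key + ':' + count + '|c', callback);
        }
    };
}

// Fake transport that records packets instead of sending them.
var packets = [];
var statsd = createStatsd(function fakeSend(text, callback) {
    packets.push(text);
    callback();
});

var incremented = false;
statsd.immediateIncrement('service-crash', 1, function () {
    incremented = true;
});
```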
options.meta
options.meta allows you to configure the meta object that is logged when an uncaught exception occurs. You might want to put the os.hostname() in the meta object.
options.statsdKey
options.statsdKey allows you to configure which statsd key we increment when we have an uncaught exception.
The key defaults to "service-crash".
options.statsdWaitPeriod
options.statsdWaitPeriod is a configurable waiting period.
The Node implementation of UDP, which the statsd client will probably use, invokes the callback too early. If you abort() synchronously there is no guarantee that we've actually sent the statsd packet out of the process.
To work around this we have an "arbitrary" waiting period after we get the statsd callback.
options.statsdWaitPeriod defaults to 1500 milliseconds.
options.backupFile
options.backupFile is a file path that will be appended to synchronously in case anything goes wrong inside the uncaught exception handler.
It's highly recommended you pass a backup file path in case your logger fails.
Inspecting the backupFile and looking at the core dump will give you a deep insight into exactly what happened at the end of your node process.
You may also pass the string literal "stdout" or "stderr" as the options.backupFile property. If you set it to either "stdout" or "stderr" then it will synchronously write to process.stdout or process.stderr respectively.
Caveat: If you are running Windows and have set options.backupFile to "stdout" or "stderr" then it's not guaranteed to be synchronous. On Windows any writes to process.stdout when process.stdout is a PIPE will be asynchronous, i.e. node foo.js | tee file will involve asynchronous writing to the backupFile.
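To make the "appended to synchronously" behaviour concrete, here is a minimal sketch of such a write. writeBackup is a hypothetical helper, not the module's actual implementation, and the temporary file path is made up for the example.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper, not the module's actual implementation.
function writeBackup(backupFile, line) {
    if (backupFile === 'stdout') {
        process.stdout.write(line);
    } else if (backupFile === 'stderr') {
        process.stderr.write(line);
    } else {
        // appendFileSync does not return until the write call completes,
        // so the line is not lost if the process aborts right after.
        fs.appendFileSync(backupFile, line);
    }
}

// Example path for illustration; use your own log location.
var backupPath = path.join(
    os.tmpdir(), 'uncaught-handler-example-' + process.pid + '.log'
);
writeBackup(backupPath, JSON.stringify({ message: 'logger failed' }) + '\n');
var backupContents = fs.readFileSync(backupPath, 'utf8');
fs.unlinkSync(backupPath);
```

A synchronous append is used because asynchronous writes may never complete if the process aborts immediately afterwards.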
options.abortOnUncaught
If options.abortOnUncaught is set to true, the uncaught handler will call graceful shutdown and process.abort() for you.
If this is set to undefined or false, the uncaught handler will not call graceful shutdown and it will not call process.abort().
options.loggerTimeout
The uncaughtHandler will assume that your logger might fail or hang, so it times out the fatal logging call.
The default timeout is 30 seconds. You can pass loggerTimeout if you want to override it.
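The idea of timing out a callback-based call, which applies equally to statsdTimeout and shutdownTimeout, can be sketched with a small wrapper. withTimeout is a hypothetical helper written for illustration and is not the module's actual implementation.

```javascript
// Hypothetical helper, not the module's actual implementation: invoke
// `done` exactly once, either when `task` calls back or after `ms`.
function withTimeout(task, ms, done) {
    var finished = false;

    var timer = setTimeout(function onTimeout() {
        finish(new Error('timed out after ' + ms + 'ms'));
    }, ms);

    function finish(err) {
        if (finished) {
            return; // ignore late or duplicate callbacks
        }
        finished = true;
        clearTimeout(timer);
        done(err);
    }

    task(finish);
}

var results = [];
// A task that calls back immediately finishes before the timer fires.
withTimeout(function task(cb) { cb(); }, 30000, function (err) {
    results.push(err ? 'timeout' : 'ok');
});
```

Guarding with the finished flag means a logger that calls back after the timeout has already fired cannot trigger the next step twice.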
options.statsdTimeout
The uncaughtHandler will assume that your statsd might fail or hang, so it times out the statsd increment call.
The default timeout is 5 seconds. You can pass statsdTimeout if you want to override it.
options.gracefulShutdown
The uncaught-exception module supports doing a graceful shutdown. Normally when an uncaught exception happens you want to close any servers that are open and wait for all sockets to exit cleanly.
This function only gets called if abortOnUncaught is set to true.
Ideally you want to empty the event loop and do a full graceful shutdown.
You may also want to communicate to the master process if you are running under cluster.
For more information on proper error handling see the node domain documentation.
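One possible shape for a gracefulShutdown function is sketched below: it closes every open server and calls back once all of them have finished. createGracefulShutdown and fakeServer are hypothetical names; the sketch assumes each server exposes close(callback), as net and http servers do, and ignores any close errors for brevity.

```javascript
// Hypothetical factory: `servers` is any list of objects with a
// close(callback) method, such as net or http servers.
function createGracefulShutdown(servers) {
    return function gracefulShutdown(callback) {
        var pending = servers.length;
        if (pending === 0) {
            return callback();
        }
        servers.forEach(function (server) {
            // close() stops accepting new connections and calls back
            // once existing connections have ended.
            server.close(function onClosed() {
                pending -= 1;
                if (pending === 0) {
                    callback();
                }
            });
        });
    };
}

// Stand-in servers for illustration only.
var closed = [];
function fakeServer(name) {
    return {
        close: function close(cb) {
            closed.push(name);
            cb();
        }
    };
}

var shutdownDone = false;
createGracefulShutdown([fakeServer('http'), fakeServer('admin')])(function () {
    shutdownDone = true;
});
```

The returned function has the (Callback) => void shape expected for options.gracefulShutdown.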
options.shutdownTimeout
The uncaughtHandler will assume that your gracefulShutdown might fail or hang, so it times out the graceful shutdown call.
The default timeout is 30 seconds. You can pass shutdownTimeout if you want to override it.
options.preAbort
You can specify your own preAbort handler that MUST be a synchronous function.
This function only gets called if abortOnUncaught is set to true.
The main use case is to invoke your own exit strategy instead of the default exit strategy, which is calling process.abort().
For example, you may want to process.exit(1) here instead.
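As an options fragment, that exit strategy might be written like this. It is a sketch only: the required logger and statsd options are omitted, and the variable name options is just for illustration.

```javascript
// Options fragment only; logger and statsd are omitted for brevity.
var options = {
    abortOnUncaught: true,
    preAbort: function preAbort() {
        // Must be synchronous. Exits with status 1 instead of
        // letting the handler call process.abort().
        process.exit(1);
    }
};
```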
npm install uncaught-exception
npm test