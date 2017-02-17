A bridge between node and PhantomJS.
Working with PhantomJS in node is a bit cumbersome since you need to spawn a new PhantomJS process for every single task. Spawning a new process is quite expensive, though, and can slow down your application significantly.
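phridge provides an api to easily spawn new PhantomJS processes, run functions with arguments inside PhantomJS, return results from PhantomJS to node and manage long-running PhantomJS instances.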
Unlike other node-PhantomJS bridges, phridge provides a way to run code directly inside PhantomJS instead of turning every call and assignment into an async operation.
phridge uses PhantomJS' stdin and stdout for inter-process communication. It stringifies the given function, passes it to PhantomJS via stdin, executes it in the PhantomJS environment and passes back the results via stdout. Thus you can write your PhantomJS scripts inside your node modules in a clean and synchronous way.
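To make the round trip above more concrete, here is a minimal sketch in plain node. It assumes a request is a single JSON message carrying the function's source text and its arguments; this is not phridge's actual wire format, and buildRequest and executeRequest are hypothetical names.

```javascript
// Hypothetical sketch: serialize a function plus its arguments into one
// JSON line, the way a bridge could write it to a child's stdin.
function buildRequest(fn, args) {
    return JSON.stringify({ src: fn.toString(), args: args });
}

// Hypothetical receiving side: rebuild the function from its source text
// and call it. `new Function` evaluates the source in the global scope,
// so the rebuilt function cannot see the sender's local variables.
function executeRequest(line) {
    const request = JSON.parse(line);
    const fn = new Function("return (" + request.src + ");")();
    const result = fn.apply(null, request.args);
    // The result travels back as JSON as well.
    return JSON.stringify({ result: result });
}

const line = buildRequest(function (a, b) { return a + b; }, [2, 3]);
console.log(executeRequest(line)); // {"result":5}
```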
Instead of ...
phantom.addCookie("cookie_name", "cookie_value", "localhost", function () {
    phantom.createPage(function (page) {
        page.set("customHeaders.Referer", "http://google.com", function () {
            page.set(
                "settings.userAgent",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
                function () {
                    page.open("http://localhost:9901/cookie", function (status) {
                        page.evaluate(function (selector) {
                            return document.querySelector(selector).innerText;
                        }, function (text) {
                            console.log("The element contains the following text: " + text);
                        }, "h1");
                    });
                }
            );
        });
    });
});
... you can write ...
// node
phantom.run("h1", function (selector, resolve) {
    // this code runs inside PhantomJS
    phantom.addCookie("cookie_name", "cookie_value", "localhost");
    var page = webpage.create();
    page.customHeaders = {
        Referer: "http://google.com"
    };
    page.settings = {
        userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)"
    };
    page.open("http://www.google.com", function () {
        var text = page.evaluate(function (selector) {
            return document.querySelector(selector).innerText;
        }, selector);
        // resolve the promise and pass 'text' back to node
        resolve(text);
    });
}).then(function (text) {
    // inside node again
    console.log("The element contains the following text: " + text);
});
Please note that the phantom object provided by phridge is completely different from the phantom object inside PhantomJS. The same goes for the page object. Check out the api for further information.
npm install phridge
var phridge = require("phridge");

phridge.spawn({
    proxyAuth: "john:1234",
    loadImages: false,
    // passing CLI-style options also works
    "--remote-debugger-port": 8888
}).then(function (phantom) {
    // phantom is now a reference to a specific PhantomJS process
});
phridge.spawn() takes an object which will be passed as config to PhantomJS. Check out the PhantomJS documentation for a detailed overview of options. CLI-style options are added as they are, so be sure to escape any space characters.
Please note: due to known issues in PhantomJS, some config options are only supported in CLI style.
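One possible way to handle the two option styles is to keep the camelCase options in a config object and pass CLI-style keys through untouched. The sketch below assumes that approach; the helper splitOptions is hypothetical and not part of phridge. PhantomJS can read such a config object from a JSON file via its --config flag.

```javascript
// Hypothetical sketch: separate CLI-style keys ("--...") from regular
// config keys. This is one possible approach, not phridge's implementation.
function splitOptions(options) {
    const config = {};
    const cliArgs = [];
    for (const key of Object.keys(options)) {
        if (key.startsWith("--")) {
            // CLI-style options are used as they are: "--key=value"
            cliArgs.push(key + "=" + options[key]);
        } else {
            // camelCase options stay in an object that could be written to
            // a JSON file and handed to PhantomJS via --config
            config[key] = options[key];
        }
    }
    return { config: config, cliArgs: cliArgs };
}

const result = splitOptions({
    proxyAuth: "john:1234",
    loadImages: false,
    "--remote-debugger-port": 8888
});
console.log(result.cliArgs); // [ '--remote-debugger-port=8888' ]
console.log(JSON.stringify(result.config)); // {"proxyAuth":"john:1234","loadImages":false}
```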
phantom.run(function () {
    console.log("Hi from PhantomJS");
});
phridge stringifies the given function, sends it to PhantomJS and evals it there. Hence you can't use variables from the surrounding scope:
var someVar = "hi";

phantom.run(function () {
    console.log(someVar); // throws a ReferenceError
});
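The ReferenceError can be reproduced in plain node by rebuilding a function from its source text, which is roughly what happens to a stringified function. This is an illustration only; the helper rebuild is hypothetical. Passing the value as an argument, as described next, avoids the problem.

```javascript
// Illustration only: rebuilding a function from its source text drops
// its closure, which is what happens to a stringified function.
function rebuild(fn) {
    // Evaluate the source in the global scope, as a remote process would.
    return new Function("return (" + fn.toString() + ");")();
}

function demo() {
    const someVar = "hi";

    function usesClosure() {
        return someVar; // captured from demo()'s scope
    }

    function usesArgument(value) {
        return value; // receives the value explicitly
    }

    let closureError = null;
    try {
        rebuild(usesClosure)();
    } catch (err) {
        closureError = err;
    }

    return {
        original: usesClosure(),                              // "hi"
        rebuiltFails: closureError instanceof ReferenceError, // true
        rebuiltWithArg: rebuild(usesArgument)(someVar)        // "hi"
    };
}

console.log(demo());
```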
You can also pass arguments to the function that runs inside PhantomJS:
phantom.run("hi", 2, {}, function (string, number, object) {
    console.log(string, number, object); // 'hi', 2, [object Object]
});
Arguments are stringified by JSON.stringify(), so be sure to use JSON-valid objects.
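A quick way to check what survives is to round-trip a value through JSON.stringify() and JSON.parse() in node. The roundTrip helper is hypothetical; the behaviour shown (functions and undefined dropped from objects and turned into null in arrays, Dates turned into ISO strings) is standard JSON.stringify() behaviour.

```javascript
// Illustration: what survives a JSON.stringify()/JSON.parse() round trip,
// the same treatment arguments receive on their way to PhantomJS.
function roundTrip(value) {
    return JSON.parse(JSON.stringify(value));
}

const sent = {
    text: "hi",
    count: 2,
    nested: { list: [1, 2, 3] },
    when: new Date(0),        // becomes the string "1970-01-01T00:00:00.000Z"
    callback: function () {}, // dropped from objects
    missing: undefined        // dropped from objects
};

console.log(roundTrip(sent));
console.log(roundTrip([undefined, function () {}])); // [ null, null ]
```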
The given function can run either sync or async. However, the run() method itself always runs async, as it needs to wait for the process to respond.
Sync
phantom.run(function () {
    return Math.PI;
}).then(function (pi) {
    console.log(pi === Math.PI); // true
});
Async
phantom.run(function (resolve) {
    setTimeout(function () {
        resolve("after 500 ms");
    }, 500);
}).then(function (msg) {
    console.log(msg); // 'after 500 ms'
});
Results are also stringified by JSON.stringify(), so returning application objects with functions, such as a PhantomJS page, won't work.
phantom.run(function () {
    var page = webpage.create();
    // doesn't work because page is not a JSON-valid object
    return page;
});
Errors can be returned by using the throw keyword or by calling the reject function. Either way, the promise returned by run() will be rejected.
Sync
phantom.run(function () {
    throw new Error("An unknown error occurred");
}).catch(function (err) {
    console.log(err.message); // 'An unknown error occurred'
});
Async
phantom.run(function (resolve, reject) {
    setTimeout(function () {
        reject(new Error("An unknown error occurred"));
    }, 500);
}).catch(function (err) {
    console.log(err.message); // 'An unknown error occurred'
});
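Because errors travel through the same JSON channel, they need special treatment: message and stack are non-enumerable own properties of an Error, so JSON.stringify(new Error("x")) produces "{}". Below is a minimal sketch of carrying an error across, assuming hypothetical serializeError and deserializeError helpers (not phridge's api).

```javascript
// Hypothetical helpers: copy the non-enumerable fields into a plain
// object before stringifying, then rebuild an Error on the other side.
function serializeError(err) {
    return JSON.stringify({ message: err.message, stack: err.stack });
}

function deserializeError(json) {
    const data = JSON.parse(json);
    const err = new Error(data.message);
    // Keep the remote stack trace for debugging purposes.
    err.stack = data.stack;
    return err;
}

console.log(JSON.stringify(new Error("An unknown error occurred"))); // {}

const received = deserializeError(serializeError(new Error("An unknown error occurred")));
console.log(received instanceof Error); // true
console.log(received.message); // An unknown error occurred
```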
resolve and reject are just appended to the regular arguments:
phantom.run(1, 2, 3, function (one, two, three, resolve, reject) {
    // resolve and reject follow the three regular arguments
});
Since the function passed to phantom.run() can't declare variables in the global scope, there would otherwise be no way to maintain state in PhantomJS. That's why phantom.run() calls all functions on the same context object, which lets you easily store state variables.
phantom.run(function () {
    this.message = "Hello from the first call";
}).then(function () {
    phantom.run(function () {
        console.log(this.message); // 'Hello from the first call'
    });
});
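The shared context can be sketched as a single object that every submitted function is applied to. createRunner and its run function are hypothetical and only illustrate the idea; they do not reproduce phridge's argument order or its inter-process communication.

```javascript
// Hypothetical sketch: one context object per runner; every submitted
// function is applied to it, so properties set on `this` persist.
function createRunner() {
    const context = {};
    return function run(fn, args) {
        return fn.apply(context, args || []);
    };
}

const run = createRunner();

run(function () {
    this.message = "Hello from the first call";
});

const message = run(function () {
    return this.message;
});
console.log(message); // Hello from the first call
```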
For further convenience, all PhantomJS modules are already available in the global scope.
phantom.run(function () {
    console.log(webpage);       // [object Object]
    console.log(system);        // [object Object]
    console.log(fs);            // [object Object]
    console.log(webserver);     // [object Object]
    console.log(child_process); // [object Object]
});
Most of the time it's more useful to work in a specific webpage context. This is done by creating a page via phantom.createPage(), which internally calls require("webpage").create(). The returned page wrapper will then execute all functions bound to a PhantomJS webpage instance.
var page = phantom.createPage();

page.run(function (resolve, reject) {
    // `this` is now a webpage instance
    var webPage = this;

    webPage.open("http://example.com", function (status) {
        if (status !== "success") {
            // use the captured reference; `this` inside this callback is not guaranteed to be the page
            return reject(new Error("Cannot load " + webPage.url));
        }
        resolve();
    });
});
And for the busy ones: you can just call phantom.openPage(url), which does basically the same as above:
phantom.openPage("http://example.com").then(function (page) {
    console.log("Example loaded");
});
If you don't need a particular page anymore, just call:
page.dispose().then(function () {
    console.log("page disposed");
});
This will clean up all page references inside PhantomJS.
If you don't need the whole process anymore, call
phantom.dispose().then(function () {
    console.log("process terminated");
});
which will terminate the process cleanly by calling phantom.exit(0) internally. You don't need to dispose of all pages manually before calling phantom.dispose().
In contrast, calling
phridge.disposeAll().then(function () {
    console.log("All processes created by phridge.spawn() have been terminated");
});
will terminate all processes spawned by phridge.
I strongly recommend calling phridge.disposeAll() when the node process exits, as this is the only way to ensure that all child processes terminate as well. Since disposeAll() is async, it is not safe to call it in a process.on("exit") handler. It is better to call it on SIGINT and SIGTERM and within your regular exit flow.
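A minimal sketch of that wiring, assuming the only phridge call involved is phridge.disposeAll() as documented here. To keep it runnable without phridge, disposeAll and exit are passed in; installShutdownHandlers is a hypothetical helper.

```javascript
// Hypothetical sketch: dispose all PhantomJS processes on SIGINT/SIGTERM.
// `disposeAll` would wrap phridge.disposeAll(); `exit` defaults to
// process.exit and is injectable so the wiring can be tried safely.
function installShutdownHandlers(disposeAll, exit) {
    exit = exit || function (code) { process.exit(code); };

    function onSignal() {
        // disposeAll() is async, which is why it cannot run in an "exit"
        // listener: only synchronous work completes there.
        return disposeAll().then(
            function () { exit(0); },
            function () { exit(1); }
        );
    }

    process.once("SIGINT", onSignal);
    process.once("SIGTERM", onSignal);
    return onSignal;
}

// Usage with phridge (not executed here):
// var phridge = require("phridge");
// installShutdownHandlers(function () { return phridge.disposeAll(); });
```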
Spawns a new PhantomJS process with the given config. Read the PhantomJS documentation for all available config options. Use camelCase style for option names. The promise will be fulfilled with an instance of Phantom.
Terminates all PhantomJS processes that have been spawned. The promise will be fulfilled when all child processes have emitted an exit event.
Destination stream that PhantomJS' clean stdout will be piped to. Set it to null if you don't want it. Changing the value does not affect processes that have already been spawned.
Destination stream that PhantomJS' stderr will be piped to. Set it to null if you don't want it. Changing the value does not affect processes that have already been spawned.
A reference to the ChildProcess instance.
phridge extends the ChildProcess instance with a new stream called cleanStdout. This stream is piped to process.stdout by default and provides all output not intended for phridge. Output is considered to be intended for phridge when a line starts with the classifier string "message to node: ".
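The line classification can be sketched as a small splitter that buffers partial lines and separates prefixed lines from everything else. OutputSplitter is hypothetical, and the format of the payload after the prefix is an assumption; only the classifier string itself comes from phridge.

```javascript
const PREFIX = "message to node: ";

// Hypothetical sketch: split raw stdout text into messages for the bridge
// (lines starting with PREFIX) and clean output (everything else).
class OutputSplitter {
    constructor() {
        this.buffer = ""; // holds an incomplete trailing line between chunks
    }

    write(chunk) {
        this.buffer += chunk;
        const lines = this.buffer.split("\n");
        // The last element is an incomplete line (or "") and stays buffered.
        this.buffer = lines.pop();
        const messages = [];
        const clean = [];
        for (const line of lines) {
            if (line.startsWith(PREFIX)) {
                messages.push(line.slice(PREFIX.length));
            } else {
                clean.push(line);
            }
        }
        return { messages: messages, clean: clean };
    }
}

const splitter = new OutputSplitter();
const first = splitter.write("Hi from PhantomJS\nmessage to node: done\npart");
console.log(first.clean);    // [ 'Hi from PhantomJS' ]
console.log(first.messages); // [ 'done' ]
```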
Stringifies fn, sends it to PhantomJS and executes it there. args... are stringified using JSON.stringify() and passed to fn. fn may simply return a result or throw an error or, if it is asynchronous, call resolve() or reject() respectively. phridge compares fn.length with the given number of arguments to determine whether fn is sync or async. The returned promise will be resolved with the result or rejected with the error.
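The sync/async rule based on fn.length can be sketched as follows. invoke is a hypothetical helper; it ignores the shared context object and the process boundary and only demonstrates the dispatch.

```javascript
// Hypothetical sketch of the dispatch rule: if fn declares more parameters
// than arguments were given, the extra ones receive resolve/reject and fn
// is treated as async; otherwise fn is treated as sync.
function invoke(fn, args) {
    return new Promise(function (resolve, reject) {
        if (fn.length > args.length) {
            // async: resolve and reject are appended to the regular arguments
            fn.apply(null, args.concat([resolve, reject]));
            return;
        }
        // sync: a returned value resolves; a thrown error rejects, because
        // exceptions thrown inside a Promise executor reject the promise
        resolve(fn.apply(null, args));
    });
}

invoke(function (a, b) { return a + b; }, [1, 2]).then(function (sum) {
    console.log(sum); // 3
});
```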
Creates a wrapper to execute code in the context of a specific PhantomJS webpage.
Calls phantom.createPage(), then page.open(url, cb) inside PhantomJS and resolves when cb is called. If the status passed to cb is not "success", the promise will be rejected.
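The status check can be sketched by wrapping a callback-style open(url, cb) in a promise. openAsPromise and fakeOpen are hypothetical; fakeOpen merely stands in for a PhantomJS page's open method, whose callback receives "success" or "fail".

```javascript
// Hypothetical sketch: turn a callback-style open(url, cb) into a promise
// that rejects unless the status is "success".
function openAsPromise(open, url) {
    return new Promise(function (resolve, reject) {
        open(url, function (status) {
            if (status !== "success") {
                reject(new Error("Cannot load " + url));
                return;
            }
            resolve(status);
        });
    });
}

// A fake open() for trying the sketch: fails for any URL containing "bad".
function fakeOpen(url, cb) {
    setTimeout(function () {
        cb(url.indexOf("bad") === -1 ? "success" : "fail");
    }, 10);
}

openAsPromise(fakeOpen, "http://example.com").then(function (status) {
    console.log(status); // success
});
```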
Calls phantom.exit(0) inside PhantomJS and resolves when the child process emits an exit event.
Will be emitted when PhantomJS exits without a call to phantom.dispose() or when one of its std streams emits an error event. On some operating systems, this event may also be fired when the process group receives a SIGINT or SIGTERM (see #35).
When an unexpectedExit event is encountered, the phantom instance is unusable and will therefore be disposed automatically. Usually you don't need to listen for this event.
A reference to the parent Phantom instance.
Calls fn in the context of a PhantomJS page object. See phantom.run() for further information.
Cleans up this page instance by calling page.close().
From opening a bug report to creating a pull request: every contribution is appreciated and welcome. If you're planning to implement a new feature or change the api, please create an issue first. This way we can ensure that your precious work is not in vain.
All pull requests should have 100% test coverage (with notable exceptions) and need to pass all tests.
Call npm test to run the unit tests
Call npm run coverage to check the test coverage (using istanbul)
Unlicense