From source:
git clone git://github.com/mikeal/spider.git
cd spider
npm link ../spider
var spider = require('spider');
var s = spider();
The options object can have the following fields:
maxSockets - Integer containing the maximum number of sockets in the pool. Defaults to 4.
userAgent - The User-Agent string to be sent to the remote server along with our request. Defaults to Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7 (a Chrome User-Agent string).
cache - The Cache object to be used as the cache. Defaults to NoCache; see the code for implementation details if you want to write a new Cache object.
pool - A hash object containing the agents for the requests. If omitted, the requests will use the global pool, which is set to maxSockets.
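As a rough illustration of the fields above, an options object might look like the sketch below. The values are made up for the example, and the commented-out call assumes the object is passed as the argument to spider(), which this section implies but does not show.

```javascript
// Hypothetical options object using only fields listed above.
// Both values are illustrative, not recommended settings.
var options = {
  maxSockets: 2,                 // override the default of 4
  userAgent: 'my-crawler/0.1'    // hypothetical User-Agent string
};

// Assumption: the options object is handed to spider() at creation time.
// Requires the spider module to be installed (see "From source").
// var s = require('spider')(options);
```

Fields that are left out (cache, pool) would fall back to the defaults described above.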
Where the params are the following:
hosts - A string, or an array of strings, representing the host part of the targeted URL(s).
pattern - The pattern against which spider tries to match the remaining part (pathname + search + hash) of the URL(s).
cb - A function of the form function(window, $) where
  this - Will be a variable referencing the Routes.match return object/value with some other goodies added from spider. For more info see https://github.com/aaronblohowiak/routes.js
  window - Will be a variable referencing the document's window.
  $ - Will be the variable referencing the jQuery Object.
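To make the split between hosts and pattern concrete, the sketch below breaks a URL into the two pieces described above using Node's built-in URL class. The example URL is made up, and this only illustrates the terminology; it is not necessarily how spider parses URLs internally.

```javascript
// Split a (hypothetical) URL into the part that `hosts` refers to
// and the part that `pattern` is matched against.
var target = new URL('http://www.example.com/articles/42?page=2#comments');

// The host part, compared against `hosts`.
var host = target.host;

// The remaining part: pathname + search + hash, compared against `pattern`.
var rest = target.pathname + target.search + target.hash;

console.log(host); // www.example.com
console.log(rest); // /articles/42?page=2#comments
```

Under this reading, a route for the host www.example.com would have its pattern tested against the string /articles/42?page=2#comments.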
spider.get(url) where url is the URL to fetch.
Currently the MemoryCache must provide the following methods:
get(url, cb) - Returns url's body field via the cb callback/continuation if it exists. Returns null otherwise.
  cb - Must be of the form function(retval) {...}
getHeaders(url, cb) - Returns url's headers field via the cb callback/continuation if it exists. Returns null otherwise.
  cb - Must be of the form function(retval) {...}
set(url, headers, body) - Sets/Saves url's headers and body in the cache.
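The three methods above can be sketched as a small in-memory cache. This is a minimal illustration of the interface as described here, not spider's own MemoryCache; the name SimpleCache and the Map-based storage are assumptions made for the example.

```javascript
// SimpleCache is a hypothetical name; it is not part of spider.
// Entries are stored in a Map keyed by URL.
function SimpleCache() {
  this.entries = new Map(); // url -> { headers: ..., body: ... }
}

// Passes the cached body for `url` to cb, or null if nothing is cached.
SimpleCache.prototype.get = function (url, cb) {
  var entry = this.entries.get(url);
  cb(entry ? entry.body : null);
};

// Passes the cached headers for `url` to cb, or null if nothing is cached.
SimpleCache.prototype.getHeaders = function (url, cb) {
  var entry = this.entries.get(url);
  cb(entry ? entry.headers : null);
};

// Saves the headers and body for `url`, replacing any earlier entry.
SimpleCache.prototype.set = function (url, headers, body) {
  this.entries.set(url, { headers: headers, body: body });
};
```

An object like this could presumably be supplied through the cache option. The callbacks here are invoked synchronously for simplicity; a cache backed by disk or a network store would call them later.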
spider.log(level) - Where level is a string that can be any of "debug", "info", "error".