Date: 2021-12-24

Parse through a sitemap's XML to get all the URLs for your crawler.

Version 2

Installation

npm install sitemapper --save

Simple Example

const Sitemapper = require('sitemapper');
const sitemap = new Sitemapper();
sitemap.fetch('https://wp.seantburke.com/sitemap.xml').then(function (sites) {
  console.log(sites);
});

Examples in ES6

import Sitemapper from 'sitemapper';

(async () => {
  const Google = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000,
  });
  try {
    const { sites } = await Google.fetch();
    console.log(sites);
  } catch (error) {
    console.log(error);
  }
})();

const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url: ${url}`, 'sites:', sites))
  .catch(error => console.log(error));
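
The ES6 examples above destructure a `sites` array from the value that fetch() resolves to. A crawler will often want to narrow that list before queueing it, and the sketch below shows one way to do so. `filterSites` and the example.com URLs are hypothetical and not part of sitemapper; the input is hard-coded so the snippet runs without network access.

```javascript
// Hypothetical helper (not part of sitemapper): keep only the URLs under a
// given path prefix and drop duplicates before handing them to a crawler.
function filterSites(sites, pathPrefix) {
  const seen = new Set();
  const result = [];
  for (const site of sites) {
    let parsed;
    try {
      parsed = new URL(site); // WHATWG URL, available as a global in Node.js
    } catch (error) {
      continue; // skip entries that are not valid absolute URLs
    }
    if (!parsed.pathname.startsWith(pathPrefix)) continue;
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    result.push(parsed.href);
  }
  return result;
}

// Example input shaped like the `sites` array in the examples above.
const exampleSites = [
  'https://example.com/blog/first-post',
  'https://example.com/blog/first-post',
  'https://example.com/about',
  'not a url',
];
console.log(filterSites(exampleSites, '/blog/'));
```

In real use, the array passed in would be the `sites` value obtained from fetch() rather than a literal.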

Options

You can pass options to the Sitemapper constructor when instantiating it.

requestHeaders: (Object) - Additional Request Headers (e.g. User-Agent)

timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)

url: (String) - Sitemap URL to crawl

debug: (Boolean) - Enables/Disables debug console logging. Default: False

concurrency: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10

retries: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0

rejectUnauthorized: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: True

const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  rejectUnauthorized: true,
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  }
});

An example using most of the available options (all except rejectUnauthorized):

const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  },
  debug: true,
  concurrency: 2,
  retries: 1,
});
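
The defaults listed under Options can be gathered into a single object, which makes it explicit what configuration applies when only some options are given. This is a sketch only: `DOCUMENTED_DEFAULTS` and `withDefaults` are hypothetical names, and the values are copied from the list above rather than read from the library.

```javascript
// Defaults as documented in the Options list (not read from the library).
const DOCUMENTED_DEFAULTS = Object.freeze({
  timeout: 15000, // ms for a single URL
  debug: false,
  concurrency: 10,
  retries: 0,
  rejectUnauthorized: true,
});

// Hypothetical helper: merge caller options over the documented defaults so
// the full configuration is visible before it is passed to the constructor.
function withDefaults(options = {}) {
  return { ...DOCUMENTED_DEFAULTS, ...options };
}

const options = withDefaults({
  url: 'https://art-works.community/sitemap.xml',
  retries: 1,
});
console.log(options);
```

The merged object could then be passed along as `new Sitemapper(options)`; options the caller sets take precedence over the documented defaults.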

Examples in ES5

var Sitemapper = require('sitemapper');

var Google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000
});
Google.fetch()
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });

var sitemapper = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });
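
When a crawler needs URLs from several sitemaps, the promise returned by fetch() can be chained in the same ES5 style. The sketch below assumes only what the examples above show: that fetch(url) returns a promise which resolves to an object with a `sites` array, or rejects on error. `fetchAll` is a hypothetical helper, and the stub stands in for a real Sitemapper instance so the snippet runs without network access.

```javascript
// Hypothetical helper: fetch sitemaps one at a time and collect their URLs,
// recording failures instead of stopping at the first rejected promise.
function fetchAll(sitemapper, sitemapUrls) {
  var summary = { sites: [], failed: [] };
  return sitemapUrls
    .reduce(function (chain, url) {
      return chain.then(function () {
        return sitemapper.fetch(url).then(
          function (data) {
            summary.sites = summary.sites.concat(data.sites);
          },
          function () {
            summary.failed.push(url);
          }
        );
      });
    }, Promise.resolve())
    .then(function () {
      return summary;
    });
}

// Stub with the same fetch(url) shape as in the examples above.
var stubSitemapper = {
  fetch: function (url) {
    if (url.indexOf('broken') !== -1) {
      return Promise.reject(new Error('request failed'));
    }
    return Promise.resolve({
      url: url,
      sites: [url.replace('sitemap.xml', 'page')]
    });
  }
};

fetchAll(stubSitemapper, [
  'https://example.com/sitemap.xml',
  'https://example.com/broken.xml'
]).then(function (summary) {
  console.log(summary);
});
```

The requests run sequentially on purpose, since nothing above states whether one instance can safely serve overlapping fetch() calls.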

Version 1

npm install sitemapper@1.1.1 --save

Simple Example