Siteshooter

Date: 2019-05-15

Automate full website screen shots and PDF generation with multiple view port support

Features

Crawls specified host and generates a sitemap.xml on the fly

Generates entire website screen shots based on sitemap.xml

Define multiple view ports

Automated PDF generation

Includes crawled meta data in generated PDF

Reports on broken website links (HTTP 404 responses)

Supports HTTP basic authentication

Supports Microsoft Online 3-step authentication

Supports Salesforce Visualforce 3-step authentication

Supports site maps with HTTP, HTTPS, and FTP protocol URLs

Follows HTTP 301 redirects

Custom JavaScript inject file - injects into page prior to screen shooting

Trigger page events by passing querystring values to custom inject.js file

Getting Started

Dependencies

Install the following prerequisite on your development machine:

Notable npm Modules

Quick Start

$ npm install siteshooter

If siteshooter is installed, make sure you have the latest version by running:

$ npm update siteshooter

You may need to run these commands with elevated privileges, e.g. sudo; you will be prompted to do so if needed.

Installing with the --global flag makes the siteshooter command available on your machine's command line at any path.

Read more about the --global flag in the npm documentation.

Create a Siteshooter Configuration File

$ siteshooter --init

View the full siteshooter.yml example

Inside siteshooter.yml, add additional options.

All Simple Web Crawler options can be added to sitecrawler_options and will pass through to the crawler process.

Generated screenshot image files are optimized using the imagemin and imagemin-pngquant modules, which reduce the overall size of generated PDFs. To adjust the image quality, update the image_quality option in your siteshooter.yml file.

domain:
  name: https://www.devopsgroup.io
  auth:
    user:
    pwd:
pdf_options:
  excludeMeta: true
screenshot_options:
  delay: 2000
  image_quality: '60-80'
  transparent_background: false
sitecrawler_options:
  exclude:
    - "pdf"
  stripQuerystring: false
  ignoreInvalidSSL: true
viewports:
  - viewport: desktop-large
    width: 1600
    height: 1200
  - viewport: tablet-landscape
    width: 1024
    height: 768
  - viewport: iPhone5
    width: 320
    height: 568
  - viewport: iPhone6
    width: 375
    height: 667
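The image_quality value above is a "min-max" string. As a minimal sketch, assuming it follows the 0 to 100 quality range used by pngquant, a value could be sanity-checked before it goes into siteshooter.yml. The parseQualityRange helper is hypothetical and is not part of Siteshooter.

```javascript
'use strict';

// Hypothetical helper (not part of Siteshooter): validates an image_quality
// value such as '60-80'. Assumes the pngquant-style "min-max" range, 0 to 100.
function parseQualityRange(value) {
  const match = /^(\d{1,3})-(\d{1,3})$/.exec(String(value).trim());
  if (!match) {
    throw new Error('Expected "min-max", for example "60-80": ' + value);
  }
  const min = Number(match[1]);
  const max = Number(match[2]);
  if (min > max || max > 100) {
    throw new Error('Expected 0 <= min <= max <= 100: ' + value);
  }
  return { min: min, max: max };
}

console.log(parseQualityRange('60-80')); // { min: 60, max: 80 }
```

A lower range generally means smaller screenshots and PDFs at the cost of visible colour banding.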

CLI Options

$ siteshooter --help

Usage: siteshooter [options]

OPTIONS
_______________________________________________________________________________________
-c --config        Show configuration
-C --cwd           Set working directory, which will load a siteshooter.yml file in the specified path
-e --debug         Output exceptions
-h --help          Print this help
-i --init          Create siteshooter.yml template file in working directory
-p --pdf           Generate PDFs, by defined view ports, based on screen shots created via Siteshooter
-q --quiet         Only return final output
-s --screenshots   Generate screen shots, by view ports, based on sitemap.xml file
-S --sitemap       Crawl domain name specified in siteshooter.yml file and generate a local sitemap.xml file
-v --version       Print version number
-V --verbose       Verbose output
-w --website       Report on website information based on Siteshooter crawled results
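Because --screenshots works by view port and by sitemap URL, the number of screen shots grows as URLs multiplied by view ports. The sketch below only illustrates that relationship; planScreenshots and the /about path are hypothetical, and this is not Siteshooter's internal code.

```javascript
'use strict';

// Hypothetical sketch: one screenshot job per (URL, view port) pair.
function planScreenshots(urls, viewports) {
  const jobs = [];
  urls.forEach(function (url) {
    viewports.forEach(function (vp) {
      jobs.push({ url: url, viewport: vp.viewport, width: vp.width, height: vp.height });
    });
  });
  return jobs;
}

// Two view ports taken from the sample siteshooter.yml.
const viewports = [
  { viewport: 'desktop-large', width: 1600, height: 1200 },
  { viewport: 'iPhone5', width: 320, height: 568 }
];

// The second URL is a made-up path used only for illustration.
const urls = ['https://www.devopsgroup.io/', 'https://www.devopsgroup.io/about'];

console.log(planScreenshots(urls, viewports).length); // 4
```

Trimming the viewports list, or excluding paths through sitecrawler_options, is the most direct way to shorten a long run.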

When running a siteshooter command without any options, the following options will run in order by default:

--sitemap

--screenshots

--pdf
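The default order above can be pictured as a small rule. This is a hypothetical sketch, not Siteshooter's source: it assumes that passing one or more of the three long-form task options runs only those tasks, still in the default order. Short flags such as -p are not handled here.

```javascript
'use strict';

// Hypothetical sketch of the default-order rule; not Siteshooter's source.
const DEFAULT_TASKS = ['sitemap', 'screenshots', 'pdf'];

function resolveTasks(args) {
  // Keep only the task options that were requested, in default order.
  const requested = DEFAULT_TASKS.filter(function (task) {
    return args.indexOf('--' + task) !== -1;
  });
  // No task option given: fall back to the full default sequence.
  return requested.length > 0 ? requested : DEFAULT_TASKS.slice();
}

console.log(resolveTasks([]));                   // [ 'sitemap', 'screenshots', 'pdf' ]
console.log(resolveTasks(['--pdf', '--quiet'])); // [ 'pdf' ]
```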

Custom JavaScript Inject File

To manipulate the DOM prior to the screen shot process, add an inject.js file in the same working directory as siteshooter.yml.

Example: inject.js file

console.log('JavaScript injected into page.');

if (typeof (jQuery) !== "undefined") {
  jQuery(document).ready(function () {
    console.log('jQuery loaded.');
  });
}

Trigger JavaScript Events

When using the optional inject.js file, events can be triggered by adding the pevent querystring parameter to a URL in sitemap.xml:

<url>
  <loc>https://www.devopsgroup.io?pevent=open-privacy-overlay</loc>
  <changefreq>weekly</changefreq>
</url>
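If sitemap entries like this one are written by a script rather than by hand, a small Node.js helper can add the pevent parameter and escape the result for XML. This is a sketch only: withPageEvent, escapeXml and toSitemapEntry are hypothetical names, not Siteshooter functions, and the code is meant to run in Node.js rather than inside inject.js.

```javascript
'use strict';

// Hypothetical helper: adds (or replaces) the pevent parameter on a page URL.
// The WHATWG URL parser normalises an empty path to '/'.
function withPageEvent(pageUrl, eventName) {
  const url = new URL(pageUrl);
  url.searchParams.set('pevent', eventName);
  return url.toString();
}

// Escapes characters that are not allowed verbatim in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds a single sitemap <url> element for the given location.
function toSitemapEntry(loc) {
  return '<url><loc>' + escapeXml(loc) + '</loc><changefreq>weekly</changefreq></url>';
}

const loc = withPageEvent('https://www.devopsgroup.io', 'open-privacy-overlay');
console.log(loc); // https://www.devopsgroup.io/?pevent=open-privacy-overlay
console.log(toSitemapEntry(loc));
```

Escaping matters once a URL carries more than one parameter, because a bare & is not valid inside a loc element.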

Example: Event detection & triggering

function getQueryVariable(variable) {
  var query = window.location.search.substring(1);
  var vars = query.split('&');
  for (var i = 0; i < vars.length; i++) {
    var pair = vars[i].split('=');
    if (decodeURIComponent(pair[0]) == variable) {
      return decodeURIComponent(pair[1]);
    }
  }
}

if (typeof (jQuery) !== "undefined") {
  jQuery(document).ready(function () {
    var pageName = window.location.pathname.replace('/', ''),
      pageEvent = getQueryVariable('pevent');

    console.log('document ready.');
    console.log('userAgent', navigator.userAgent);
    console.log('Page: ', pageName);
    console.log('Event: ', pageEvent);

    switch (pageName) {
      case '':
        switch (pageEvent) {
          case 'open-privacy-overlay':
            jQuery('a[data-target~="#modal-privacy"]').trigger('click');
            break;
        }
        break;
    }
  });
}
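As the number of pages and events grows, the nested switch can become hard to read. One possible alternative, sketched here as an assumption rather than as Siteshooter's recommended pattern, is a lookup table keyed by page name and then by event name. The dispatchPageEvent function and the handlers object are hypothetical names, and the code is kept to ES5 because inject.js runs inside PhantomJS.

```javascript
// Hypothetical alternative to the nested switch: looks up a handler by
// page name, then by event name, and calls it if one is registered.
function dispatchPageEvent(handlers, pageName, pageEvent) {
  var has = Object.prototype.hasOwnProperty;
  if (!has.call(handlers, pageName) || !has.call(handlers[pageName], pageEvent)) {
    return false; // nothing registered for this page/event pair
  }
  handlers[pageName][pageEvent]();
  return true;
}

var handlers = {
  // Home page: pathname '/' with the leading slash removed.
  '': {
    'open-privacy-overlay': function () {
      jQuery('a[data-target~="#modal-privacy"]').trigger('click');
    }
  }
};

// Inside the jQuery ready callback, the switch would then reduce to:
//   dispatchPageEvent(handlers, pageName, pageEvent);
```

The return value makes it easy to log when a pevent value in sitemap.xml has no matching handler.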

Tests

Tests are written with Mocha and can be run with npm test.

Troubleshooting

If you're having issues with Siteshooter, submit a GitHub Issue.

Make sure you have a siteshooter.yml file in your working directory and that the YAML file is well formatted.

Experiencing font-loading issues? Try increasing the delay setting in your siteshooter.yml file.

screenshot_options:
  delay: 2000

Trying to take a screenshot of a page with a video? Unfortunately, PhantomJS does not support videos. As such, here's one approach to showing a video's poster image.

// Replace each video with its own poster image (PhantomJS cannot render video).
jQuery('video').each(function () {
  var video = jQuery(this);
  video.parent().prepend('<img src="' + video.attr('poster') + '"/>');
  video.remove();
});

SimpleCrawler TypeError: The header content contains invalid characters? Try setting the acceptCookies option to false.

sitecrawler_options:
  acceptCookies: false

