Webster

Overview

Webster is a powerful and extensible web crawling framework for Node.js applications. You can use Webster to crawl websites and extract structured data from their pages.

What sets Webster apart from other crawling frameworks is that it can scrape content rendered in the browser by client-side JavaScript and AJAX requests.

Docker quick start

Pull and run the example Docker image:

docker pull zhuyingda/webster-demo
docker run -it zhuyingda/webster-demo

Here is a simple demo that crawls a sample site (originally used as a demo by the Scrapy framework):

node demo_producer.js
env MOD=debug node demo_consumer.js
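The demo is split into a producer script and a consumer script, and Redis is listed as a requirement. A common way to structure such a split is a shared task queue: the producer enqueues URLs, and one or more consumers dequeue and crawl them. The sketch below illustrates that general pattern in plain Node.js, assuming an in-memory array as a stand-in for Redis. `createQueue`, `produce`, `consume`, and the `fetchPage` callback are hypothetical names for illustration only; they are not Webster's API.

```javascript
// Illustrative producer/consumer split (hypothetical, not Webster's API).
// An in-memory array stands in for the Redis-backed queue (assumption).

// Hypothetical FIFO task queue: push on the tail, pop from the head.
function createQueue() {
  const items = [];
  return {
    push(task) { items.push(task); },
    pop() { return items.length > 0 ? items.shift() : null; },
    size() { return items.length; },
  };
}

// Producer: turns seed URLs into crawl tasks and enqueues them.
function produce(queue, urls) {
  for (const url of urls) {
    queue.push({ url });
  }
}

// Consumer: drains the queue, passing each task's URL to a crawl callback.
// `fetchPage` is a placeholder for a real browser-based crawl.
async function consume(queue, fetchPage) {
  const results = [];
  let task;
  while ((task = queue.pop()) !== null) {
    results.push(await fetchPage(task.url));
  }
  return results;
}

// Usage: a fake fetcher that reports each URL's length.
const queue = createQueue();
produce(queue, ["https://example.com/a", "https://example.com/b"]);
consume(queue, async (url) => ({ url, length: url.length }))
  .then((results) => console.log(results));
```

Keeping the producer and consumer as separate processes that share only the queue lets you run several consumers in parallel; in this in-memory sketch they share a single process, which is where an external store such as Redis would come in.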

Requirements

Node.js 10.x+, Redis

Works on Linux and macOS.

You can also deploy it with Docker.

Install

npm install webster

Usage on Raspbian Platform

sudo apt install chromium-browser chromium-codecs-ffmpeg
env MOD=debug EXE_PATH=/usr/bin/chromium-browser node demo_consumer.js

Architecture overview

Documentation

See the project documentation for more details.

License

GPL-V3

Copyright (c) 2017-present, Yingda (Sugar) Zhu