Osmosis

Date: 2020-11-25

HTML/XML parser and web scraper for Node.js.

Features

Uses native libxml C bindings

Clean promise-like interface

Supports CSS 3.0 and XPath 1.0 selector hybrids

Sizzle selectors, Slick selectors, and more

No large dependencies like jQuery, cheerio, or jsdom

Compose deep and complex data structures

HTML parser features:

    Fast parsing
    Very fast searching
    Small memory footprint

HTML DOM features:

    Load and search ajax content
    DOM interaction and events
    Execute embedded and remote scripts
    Execute code in the DOM

HTTP request features:

    Logs urls, redirects, and errors
    Cookie jar and custom cookies/headers/user agent
    Login/form submission, session cookies, and basic auth
    Single proxy or multiple proxies and handles proxy failure
    Retries and redirect limits
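The example in this README uses selector strings such as 'time@datetime', '#map@data-latitude', and '@href', where the part after the "@" appears to name an attribute of the matched element. The sketch below only illustrates that reading of the shorthand. The splitSelector helper is hypothetical, is not part of Osmosis, and makes no claim about how Osmosis parses selectors internally.

```javascript
// splitSelector is a hypothetical helper for illustration only; it is not
// part of the Osmosis API and does not reproduce Osmosis's own parsing.
function splitSelector(input) {
  const text = input.trim();
  // Assumption: a trailing "@name" refers to an attribute of the match.
  // An "@" inside an XPath predicate such as a[@href] is left untouched
  // because the text after it does not run cleanly to the end of the string.
  const match = /^(.*?)@([\w:-]+)$/.exec(text);
  if (!match) {
    return { selector: text, attribute: null };
  }
  return { selector: match[1].trim(), attribute: match[2] };
}

const examples = ['time@datetime', '#map@data-latitude', '@href', 'p > a'];
for (const text of examples) {
  console.log(text, '->', JSON.stringify(splitSelector(text)));
}
```

An empty selector, as in '@href', is read here as "the attribute of the current element", which matches how the example passes '@href' to .follow() right after a .find() call.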



Example

    var osmosis = require('osmosis');

    osmosis
    .get('www.craigslist.org/about/sites')
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    .find('header + div + div li > a')
    .set('category')
    .follow('@href')
    .paginate('.totallink + a.button.next:first')
    .find('p > a')
    .follow('@href')
    .set({
        'title':        'section > h2',
        'description':  '#postingbody',
        'subcategory':  'div.breadbox > span[4]',
        'date':         'time@datetime',
        'latitude':     '#map@data-latitude',
        'longitude':    '#map@data-longitude',
        'images':       ['img@src']
    })
    .data(function (listing) {
        // each scraped listing object is passed to this callback
    })
    .log(console.log)
    .error(console.log)
    .debug(console.log)
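The .data() callback in the example receives one object per listing, with keys taken from the .set() calls: location, category, title, description, subcategory, date, latitude, longitude, and images. A minimal sketch of post-processing such an object follows. It assumes that scraped values arrive as strings and that images is an array; the sample values and the normalizeListing and toNumberOrNull helpers are invented for illustration and are not part of Osmosis.

```javascript
// Hypothetical sample of what .data() might receive. The keys mirror the
// .set() calls in the example; the values are invented for illustration.
const sampleListing = {
  location: 'seattle',
  category: 'bicycles',
  title: 'Road bike',
  description: 'Lightly used.',
  subcategory: 'bikes - by owner',
  latitude: '47.6062',
  longitude: '-122.3321',
  images: ['https://example.com/1.jpg']
};

// Convert a scraped string to a number, or null when it is missing or
// not numeric. Hypothetical helper, not part of Osmosis.
function toNumberOrNull(value) {
  const n = Number.parseFloat(value);
  return Number.isNaN(n) ? null : n;
}

// Return a copy of the listing with numeric coordinates and a guaranteed
// images array. Hypothetical helper, not part of Osmosis.
function normalizeListing(listing) {
  return {
    ...listing,
    latitude: toNumberOrNull(listing.latitude),
    longitude: toNumberOrNull(listing.longitude),
    images: Array.isArray(listing.images) ? listing.images : []
  };
}

const normalized = normalizeListing(sampleListing);
console.log(normalized.latitude, normalized.longitude, normalized.images.length);
```

A function like normalizeListing could be called from inside the .data() callback before the listing is stored, so that pages lacking a map or images still yield objects with a consistent shape.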

Documentation

For documentation and examples check out https://rchipka.github.io/node-osmosis/global.html

Dependencies

libxmljs-dom - DOM wrapper for libxmljs C bindings

needle - Lightweight HTTP wrapper

