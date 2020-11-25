openbase logo
osmosis

by rchipka
1.1.10

Web scraper for NodeJS

Popularity

  • Downloads/wk: 1.7K
  • GitHub Stars: 4K

Maintenance

  • Last Commit: 1yr ago
  • Contributors: 10

Package

  • Dependencies: 2
  • License: MIT
  • Type Definitions: DefinitelyTyped
  • Tree-Shakeable: No?

Readme

Osmosis

HTML/XML parser and web scraper for NodeJS.

Features

  • Uses native libxml C bindings

  • Clean promise-like interface

  • Supports CSS 3.0 and XPath 1.0 selector hybrids

  • Sizzle selectors, Slick selectors, and more

  • No large dependencies like jQuery, cheerio, or jsdom

  • Compose deep and complex data structures

  • HTML parser features

    • Fast parsing
    • Very fast searching
    • Small memory footprint

  • HTML DOM features

    • Load and search AJAX content
    • DOM interaction and events
    • Execute embedded and remote scripts
    • Execute code in the DOM

  • HTTP request features

    • Logs URLs, redirects, and errors
    • Cookie jar and custom cookies/headers/user agent
    • Login/form submission, session cookies, and basic auth
    • Single proxy or multiple proxies, with handling of proxy failure
    • Retries and redirect limits
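
The retry and redirect limits listed above are typically set through configuration. The following is a minimal sketch, assuming Osmosis exposes a `config(option, value)` method and that the option names `tries`, `follow`, `concurrency`, and `user_agent` are valid; check them against the documentation before relying on them. The helper `applyRequestDefaults` and the user agent string are hypothetical and not part of Osmosis.

```javascript
// Sketch only: the option names below are assumptions to verify
// against the Osmosis documentation.
function applyRequestDefaults(osmosis, overrides = {}) {
    const defaults = {
        tries: 3,        // assumed: attempts per request before giving up
        follow: 5,       // assumed: maximum number of redirects to follow
        concurrency: 2,  // assumed: number of simultaneous requests
        user_agent: 'example-scraper/1.0' // hypothetical value
    };

    // Caller-supplied values win over the defaults.
    const options = Object.assign({}, defaults, overrides);

    // Apply each option through the (assumed) config(option, value) call.
    for (const [name, value] of Object.entries(options)) {
        osmosis.config(name, value);
    }
    return options;
}

// Usage (requires the osmosis package to be installed):
// const osmosis = require('osmosis');
// applyRequestDefaults(osmosis, { tries: 2 });
```

Keeping the settings in one plain object makes it easy to see which limits a scraper runs with and to override them per job.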

Example

var osmosis = require('osmosis');

osmosis
    .get('www.craigslist.org/about/sites')
    // Follow the link for each location
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    // Follow the link for each category within that location
    .find('header + div + div li > a')
    .set('category')
    .follow('@href')
    // Step through the result pages and open every listing
    .paginate('.totallink + a.button.next:first')
    .find('p > a')
    .follow('@href')
    // Extract fields from the listing page; 'selector@attr' reads an attribute
    .set({
        'title':        'section > h2',
        'description':  '#postingbody',
        'subcategory':  'div.breadbox > span[4]',
        'date':         'time@datetime',
        'latitude':     '#map@data-latitude',
        'longitude':    '#map@data-longitude',
        'images':       ['img@src']
    })
    .data(function(listing) {
        // do something with listing data
    })
    .log(console.log)
    .error(console.log)
    .debug(console.log);
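
The `.data` callback in the example is where each scraped listing arrives. As a rough sketch of what "do something with listing data" might look like, the code below cleans up one listing and collects it in an array. The field names come from the `.set({...})` call in the example; the helpers `normalizeListing` and `collectListing` are hypothetical, and the assumption that scraped values arrive as strings should be checked against real output.

```javascript
// Hypothetical helper: tidy one listing object produced by the example's .set() call.
function normalizeListing(listing) {
    // Convert a scraped string to a number, or null when it is missing or invalid.
    const toNumber = (value) => {
        const n = parseFloat(value);
        return Number.isNaN(n) ? null : n;
    };

    return {
        ...listing,
        title: (listing.title || '').trim(),
        latitude: toNumber(listing.latitude),
        longitude: toNumber(listing.longitude),
        // Always expose images as an array, even when one or none was found.
        images: [].concat(listing.images || [])
    };
}

// Hypothetical collector that could be passed to .data(...).
const listings = [];
function collectListing(listing) {
    listings.push(normalizeListing(listing));
}

// Usage (requires the osmosis package to be installed):
// osmosis.get(...).set({ ... }).data(collectListing);
```

Normalizing inside the callback keeps the selector chain focused on extraction, while type conversion and defaults live in one plain function that can be tested without any network access.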

Documentation

For documentation and examples check out https://rchipka.github.io/node-osmosis/global.html

Donate

Please consider a donation if you depend on web scraping and Osmosis makes your job a bit easier. Your contribution allows me to spend more time making this the best web scraper for Node.
