
dryscrape

[not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages.

by Niklas Baumstark

Version: 1.0, License: MIT

Overview

Author: Niklas Baumstark

dryscrape is a lightweight web scraping library for Python. It uses a headless WebKit instance to evaluate JavaScript on the visited pages. This enables painless scraping of plain web pages as well as JavaScript-heavy “Web 2.0” applications like Facebook.

It is built on the shoulders of capybara-webkit's webkit-server. A big thanks goes to thoughtbot, inc. for building this excellent piece of software!
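A minimal sketch of a typical session, modeled on the usage pattern shown in dryscrape's documentation. It assumes dryscrape 1.0 and its webkit-server backend are installed and that the target page is reachable; `fetch_links` is a hypothetical helper name, and calling `start_xvfb()` only on Linux reflects the assumption that no X display is available there.

```python
import sys


def fetch_links(url):
    """Load `url` in a headless WebKit session and return (hrefs, html).

    Hypothetical helper; assumes dryscrape 1.0 and webkit-server are installed.
    """
    import dryscrape  # imported lazily so this module loads without dryscrape

    if "linux" in sys.platform:
        # webkit-server needs an X display; start a virtual one (requires Xvfb).
        # In a long-running program, do this once per process, not per call.
        dryscrape.start_xvfb()

    sess = dryscrape.Session()
    # Skip image downloads to speed up page loads
    sess.set_attribute("auto_load_images", False)
    sess.visit(url)  # JavaScript on the page runs as in a normal browser

    # Query the rendered DOM, not just the raw HTML the server sent
    hrefs = [node.get_attr("href") for node in sess.xpath("//a[@href]")]
    return hrefs, sess.body()
```

Because the XPath query runs against the DOM after scripts have executed, links inserted by JavaScript are included, which a plain HTTP fetch would miss.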

Changelog

  • 1.0: Added Python 3 support and small performance fixes; header names are now properly normalized. Also added the function dryscrape.start_xvfb() to easily start Xvfb.
  • 0.9.1: Changed semantics of the headers function in a backwards-incompatible way: It now returns a list of (key, value) pairs instead of a dictionary.
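Returning a list of (key, value) pairs instead of a dictionary preserves repeated headers such as Set-Cookie, which a dictionary would collapse into a single entry. The sketch below shows one way to look up values in such a list; `header_values` is a hypothetical helper, not part of dryscrape, and the sample pairs are made up for illustration. Names are compared case-insensitively because HTTP header names are case-insensitive.

```python
def header_values(pairs, name):
    """Return every value whose header name matches `name` (case-insensitive).

    Hypothetical helper; `pairs` is a list of (key, value) tuples, the shape
    returned by the headers function since 0.9.1.
    """
    wanted = name.lower()
    return [value for key, value in pairs if key.lower() == wanted]


# Illustrative header list containing a repeated header
headers = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
print(header_values(headers, "set-cookie"))  # ['a=1', 'b=2']
```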

Supported Platforms

The library has been confirmed to work on the following platforms:

  • Mac OS X 10.9 Mavericks and 10.10 Yosemite
  • Ubuntu Linux
  • Arch Linux

Other Unix-like systems should work as well.

Windows is not officially supported, although dryscrape should work with Cygwin.
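On headless Linux servers, webkit-server needs a virtual X display, which is what dryscrape.start_xvfb() provides. The sketch below decides whether to call it; `should_start_xvfb` is a hypothetical helper, and its rules are assumptions: macOS is taken to need no X server, Windows is skipped because it is unsupported, and an already-set DISPLAY variable is taken to mean an X server is reachable.

```python
import os
import sys


def should_start_xvfb(platform=sys.platform, environ=os.environ):
    """Decide whether to call dryscrape.start_xvfb() (hypothetical helper)."""
    if platform in ("darwin", "win32"):
        # Assumption: macOS renders without X11; native Windows is unsupported
        return False
    # Linux and other Unix-like systems: start Xvfb only if no display is set
    return not environ.get("DISPLAY")


# Usage, assuming dryscrape is installed:
#   if should_start_xvfb():
#       dryscrape.start_xvfb()
```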

Installation, Usage, API Docs

Documentation can be found at dryscrape's ReadTheDocs page.

Quick installation instructions:

# pip install dryscrape
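After installing, a quick check that the packages can be found on the import path may save debugging later. This is a sketch using only the standard library; `check_install` is a hypothetical helper, and listing `webkit_server` assumes the webkit-server dependency is importable under that module name.

```python
import importlib.util


def check_install(modules=("dryscrape", "webkit_server")):
    """Report which of the given top-level modules can be found (hypothetical helper)."""
    # find_spec locates a module without importing it; None means "not found"
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(check_install())
```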

Contact, Bugs, Contributions

If you have any problems with this software, don't hesitate to open an issue or a pull request on GitHub, or to send an email to niklas baumstark at Gmail.
