spatula is a modern Python library for writing maintainable web scrapers.
- Page-oriented design: Encourages writing understandable & maintainable scrapers.
- Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
- Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
- Flexible Data Model Support: Compatible with pydantic, or bring your own data model classes for storing & validating your scraped data.
- CLI Tools: Offers several CLI utilities that can help streamline the development & testing cycle.
- Fully Typed: Makes full use of Python 3 type annotations.
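To make the page-oriented idea concrete, here is a minimal, framework-free sketch using only the standard library. It is an illustration of the pattern, not spatula's API: every class and method name below (`ListPage`, `JsonListPage`, `CsvListPage`, `process_item`, `Legislator`, and so on) is hypothetical. Each format handler decides how a raw response body becomes records, each page subclass holds the logic for one kind of page, and results are stored in a plain dataclass standing in for a pydantic model or your own data class.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class Legislator:
    """Hypothetical data model; a pydantic model or your own class could fill this role."""

    name: str
    party: str


class ListPage:
    """Hypothetical base class: one subclass per kind of page (not spatula's API)."""

    def __init__(self, body: str) -> None:
        self.body = body

    def parse(self) -> list[dict[str, Any]]:
        # Format handlers override this to turn the raw body into records.
        raise NotImplementedError

    def process_item(self, item: dict[str, Any]) -> Any:
        # Page-specific subclasses override this to build model objects.
        raise NotImplementedError

    def process(self) -> Iterator[Any]:
        # Shared driver: parse once, then convert each record.
        for item in self.parse():
            yield self.process_item(item)


class JsonListPage(ListPage):
    # Hypothetical JSON handler: the body is a JSON array of objects.
    def parse(self) -> list[dict[str, Any]]:
        return json.loads(self.body)


class CsvListPage(ListPage):
    # Hypothetical CSV handler: the first row is the header.
    def parse(self) -> list[dict[str, Any]]:
        return list(csv.DictReader(io.StringIO(self.body)))


class LegislatorJson(JsonListPage):
    # Page-specific logic stays in one small, testable class.
    def process_item(self, item: dict[str, Any]) -> Legislator:
        return Legislator(name=item["name"], party=item["party"])


class LegislatorCsv(CsvListPage):
    # Same data model, different source format.
    def process_item(self, item: dict[str, Any]) -> Legislator:
        return Legislator(name=item["name"].strip(), party=item["party"].strip())


if __name__ == "__main__":
    print(list(LegislatorJson('[{"name": "Ada", "party": "Independent"}]').process()))
    print(list(LegislatorCsv("name,party\nGrace,Green\n").process()))
```

Because format handling and page logic live in separate classes, a page can be unit-tested by passing it a saved response body, and supporting a new format means writing one small handler class. For spatula's actual page classes and handlers, consult its documentation.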