trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
htmldate
Fast and robust date extraction from web pages, with Python or on the command line
data-extractor
Combine XPath, CSS selectors, and JSONPath for web data extraction.
lassie
Web Content Retrieval for Humans™
goose3
A Python 3 compatible version of goose: http://goose3.readthedocs.io/en/latest/index.html
gazpacho
🥫 The simple, fast, and modern web scraping library
goose-extractor
HTML content / article extractor, web scraping library in Python
newspaper
News, full-text, and article metadata extraction in Python 3.
twint
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.