Scrapio is a lightweight and user-friendly web crawling and scraping library. The main goal of the project is to make scraping large amounts of similar data from the web easy and user-friendly. It can be useful for a wide range of applications, such as data mining, data processing, and archiving. Over time, I plan to turn it into a standalone service that works as an API.
At the moment, Scrapio works as a library that can be used to crawl and scrape data from the web. To install it:
go get github.com/koshqua/scrapio
The crawler is easy to use. You just need to specify a starting URL, and it will crawl all the URLs on that host.
// Initialize a new crawler and give it a start URL; it doesn't have to be the root URL of the host.
cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
// Start crawling.
// Over time I plan to add more configuration options here, like max results, etc.
cr.Crawl()
// Do something with the result; it's up to you.
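For example, a minimal way to see what the crawler collected is to print the struct itself; this sketch deliberately avoids assuming any particular field name, since the crawler's exported fields aren't documented in this snippet.

// Inspect the crawler's exported fields after crawling finishes.
// %+v prints field names and values for any struct, so no specific field is assumed.
fmt.Printf("%+v\n", cr)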
The scraper uses the data structure produced by the crawler. Before initiating a scraper, you need to create a few selectors and assign them to it. Selectors are simple CSS-like selectors.
// Create the selectors for the elements you want to scrape.
h2 := scraper.NewSelector("h2", true, true, true)
img := scraper.NewSelector("img", true, true, true)
p := scraper.NewSelector("p:first-of-type", true, true, true)
// Initialize a new scraper with the given selectors.
// The scraper depends on the crawler from the previous snippet.
// It takes the crawled pages and builds a new structure holding the selectors and the scraping results.
sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
// And just start scraping.
err := sc.Scrap()
if err != nil {
	log.Fatalln(err)
}
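Putting it all together, here is a minimal end-to-end sketch. The import paths github.com/koshqua/scrapio/crawler and github.com/koshqua/scrapio/scraper are assumptions based on the module path above, and the JSON dump at the end is just one generic way to inspect whatever exported fields the scraper exposes; it is not a documented output format of the library.

package main

import (
	"encoding/json"
	"fmt"
	"log"

	// Assumed subpackage paths under the module github.com/koshqua/scrapio.
	"github.com/koshqua/scrapio/crawler"
	"github.com/koshqua/scrapio/scraper"
)

func main() {
	// Crawl every URL reachable from the start URL on the same host.
	cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
	cr.Crawl()

	// Define CSS-like selectors for the elements to extract.
	h2 := scraper.NewSelector("h2", true, true, true)
	img := scraper.NewSelector("img", true, true, true)
	p := scraper.NewSelector("p:first-of-type", true, true, true)

	// Run the scraper over the crawled pages.
	sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
	if err := sc.Scrap(); err != nil {
		log.Fatalln(err)
	}

	// Dump the scraper's exported fields as JSON to inspect the results.
	out, err := json.MarshalIndent(sc, "", "  ")
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(string(out))
}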