Instamancer

Date: 2021-02-22

Scrape Instagram's API with Puppeteer.

Notice: Instagram's Web UI and API now require users to be logged in to access hashtag and account endpoints through a browser. As Instamancer is designed to access publicly available data, it currently does not work as intended. Given that this change is unlikely to be reversed, Instamancer will remain unsupported and unmaintained indefinitely. Please use this pinned issue to discuss.

Instamancer is a new type of scraping tool that leverages Puppeteer's ability to intercept requests made by a webpage to an API.
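The interception idea can be sketched roughly as follows. This is a minimal illustration, not Instamancer's actual implementation: `ResponseLike` mirrors only the `url()` and `json()` methods of Puppeteer's `HTTPResponse`, and the `/graphql/query` URL fragment, `isApiResponse`, and `collectApiPayloads` are hypothetical names chosen for the example.

```typescript
// Minimal stand-in for the subset of Puppeteer's HTTPResponse used here.
interface ResponseLike {
    url(): string;
    json(): Promise<unknown>;
}

// Hypothetical filter: treat any URL containing this fragment as an API call.
const API_FRAGMENT = "/graphql/query";

function isApiResponse(response: ResponseLike): boolean {
    return response.url().includes(API_FRAGMENT);
}

// Keep the JSON payloads of API responses and ignore everything else
// (images, scripts, stylesheets) that the page loads.
async function collectApiPayloads(responses: ResponseLike[]): Promise<unknown[]> {
    const payloads: unknown[] = [];
    for (const response of responses) {
        if (isApiResponse(response)) {
            payloads.push(await response.json());
        }
    }
    return payloads;
}
```

With a real Puppeteer page, a filter like this would typically run inside a handler registered with `page.on("response", ...)`, so the scraper reads the same API data the page itself receives instead of parsing the rendered HTML.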

Read more about how Instamancer works here.

Features

Scrape hashtags, users' posts, and individual posts

Download images, albums, and videos

Output JSON, CSV

Batch scraping

Search hashtags, users, and locations

API response validation

Upload files to S3 and depot

Plugins

Data

Metadata that Instamancer is able to gather from posts:

Text

Timestamps

Tagged users

Accessibility captions

Like counts

Comment counts

Images (Thumbnails, Dimensions, URLs)

Videos (URL, View count, Duration)

Comments (Timestamp, Text, Like count, User)

User (Username, Full name, Profile picture, Profile privacy)

Location (Name, Street, Zip code, City, Region, Country)

Sponsored status

Gating information

Fact checking information
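Several of these fields are nested (a user or a location has its own sub-fields), so they need flattening before they fit into CSV columns. The following is a minimal sketch of one way to do that, assuming dotted column names; the field names in the usage example are hypothetical and this is not necessarily how Instamancer writes its CSV output.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten nested objects into dotted column names,
// e.g. { user: { username } } becomes the column "user.username".
function flatten(value: Json, prefix: string, out: Record<string, string>): void {
    if (Array.isArray(value)) {
        // Arrays are serialised as JSON so each record stays on one row.
        out[prefix] = JSON.stringify(value);
    } else if (value !== null && typeof value === "object") {
        for (const key of Object.keys(value)) {
            flatten(value[key], prefix ? `${prefix}.${key}` : key, out);
        }
    } else {
        out[prefix] = value === null ? "" : String(value);
    }
}

// Quote a cell when it contains a comma, a double quote, or a newline.
function escapeCell(cell: string): string {
    return /[",\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
}

function toCsv(records: Json[]): string {
    const rows: Record<string, string>[] = [];
    const columns: string[] = [];
    for (const record of records) {
        const row: Record<string, string> = {};
        flatten(record, "", row);
        // Columns appear in the order they are first seen.
        for (const column of Object.keys(row)) {
            if (columns.indexOf(column) === -1) columns.push(column);
        }
        rows.push(row);
    }
    const lines = [columns].concat(
        rows.map((row) => columns.map((column) => (column in row ? row[column] : ""))),
    );
    return lines.map((line) => line.map(escapeCell).join(",")).join("\n");
}

// Hypothetical records; real post objects have many more fields.
console.log(toCsv([
    { id: "a1", likes: 3, user: { username: "x,y" } },
    { id: "b2", likes: 0, user: { username: "z" }, tags: ["t"] },
]));
```

Records that lack a column get an empty cell, which keeps every row the same width even when posts carry different metadata.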

Install

Linux

Enable user namespace cloning:

sysctl -w kernel.unprivileged_userns_clone=1

Or run without a sandbox:

export NO_SANDBOX=true

See Puppeteer troubleshooting

Without downloading chromium

To install Instamancer without downloading Chromium, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installation:

export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

From NPM

npm install -g instamancer

If you're using root to install globally, use the following command so that the Puppeteer dependency is installed as well:

sudo npm install -g instamancer --unsafe-perm=true

From NPX

npx instamancer

From this repository

git clone https://github.com/ScriptSmith/instamancer.git
cd instamancer
npm install
npm run build
npm install -g

Usage

Command Line

$ instamancer

Usage: instamancer <command> [options]

Commands:
  instamancer hashtag [id]       Scrape a hashtag
  instamancer user [id]          Scrape a users posts
  instamancer post [ids]         Scrape a comma-separated list of posts
  instamancer search [query]     Perform a search of users, tags and places
  instamancer batch [batchfile]  Read newline-separated arguments from a file

Configuration
  --count, -c    Number of posts to download (0 for all)   [number] [default: 0]
  --full, -f     Retrieve full post data              [boolean] [default: false]
  --sleep, -s    Seconds to sleep between interactions     [number] [default: 2]
  --graft, -g    Enable grafting                       [boolean] [default: true]
  --browser, -b  Browser path. Defaults to the puppeteer version        [string]
  --sameBrowser  Use a single browser when grafting   [boolean] [default: false]

Download
  --download, -d      Save images from posts          [boolean] [default: false]
  --downdir           Download path       [default: "downloads/[endpoint]/[id]"]
  --video, -v         Download videos (requires full) [boolean] [default: false]
  --sync              Force download between requests [boolean] [default: false]
  --threads, -k       Parallel download / depot threads    [number] [default: 4]
  --waitDownload, -w  Download media after scraping   [boolean] [default: false]

Upload
  --bucket  Upload files to an AWS S3 bucket                            [string]
  --depot   Upload files to a URL with a PUT request (depot)            [string]

Output
  --file, -o       Output filename. '-' for stdout    [string] [default: "[id]"]
  --type, -t       Filetype   [choices: "csv", "json", "both"] [default: "json"]
  --mediaPath, -m  Add filepaths to _mediaPath        [boolean] [default: false]

Display
  --visible    Show browser on the screen             [boolean] [default: false]
  --quiet, -q  Disable progress output                [boolean] [default: false]

Logging
  --logging, -l  [choices: "none", "error", "info", "debug"] [default: "none"]
  --logfile      Log file name             [string] [default: "instamancer.log"]

Validation
  --strict  Throw an error on response type mismatch  [boolean] [default: false]

Plugins
  --plugin, -p  Use a plugin from the plugins directory    [array] [default: []]

Options:
  --help     Show help                                                 [boolean]
  --version  Show version number                                       [boolean]

Examples:
  instamancer hashtag instagood -fvd        Download all the available posts,
                                            and their media from #instagood
  instamancer user arianagrande --type=csv  Download Ariana Grande's posts to a
  --logging=info --visible                  CSV file with a non-headless
                                            browser, and log all events

Source code available at https://github.com/ScriptSmith/instamancer
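The batch command reads newline-separated arguments from a file. As a rough sketch of what that could look like, the snippet below splits batch text into one argument list per line. It assumes that each non-empty line holds the arguments for one run, separated by whitespace and never quoted; `parseBatch` is a hypothetical helper, not part of Instamancer, and the real parsing rules may differ.

```typescript
// Split batch-file text into one argument list per non-empty line.
// Assumption: arguments are separated by whitespace and never quoted.
function parseBatch(text: string): string[][] {
    return text
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .map((line) => line.split(/\s+/));
}

// Two runs taken from the examples in the help text, with a blank line between them.
const jobs = parseBatch("hashtag instagood -fvd\n\nuser arianagrande --type=csv\n");
for (const args of jobs) {
    console.log(["instamancer"].concat(args).join(" "));
}
```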

Module

ES2018 TypeScript example:

import {createApi, IOptions} from "instamancer";

const options: IOptions = {
    total: 10
};
const hashtag = createApi("hashtag", "beach", options);

(async () => {
    for await (const post of hashtag.generator()) {
        console.log(post);
    }
})();
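Because the module exposes posts through an async generator, a caller can stop consuming at any point. The sketch below shows a small helper that collects at most a given number of items from any async iterable. `fakePosts` is a stand-in generator used so the example runs on its own, and `take` is a hypothetical helper; neither is part of the instamancer API.

```typescript
// Stand-in async generator; a real one would yield scraped posts.
async function* fakePosts(): AsyncGenerator<{ id: number }> {
    for (let id = 1; ; id++) {
        yield { id };
    }
}

// Collect at most `limit` items, then stop iterating.
async function take<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
    const items: T[] = [];
    if (limit <= 0) {
        return items;
    }
    for await (const item of source) {
        items.push(item);
        if (items.length >= limit) {
            break; // leaving the loop also asks the generator to finish
        }
    }
    return items;
}

// Logs the ids of the first three stand-in posts.
take(fakePosts(), 3).then((posts) => console.log(posts.map((post) => post.id)));
```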

Generator functions

import {createApi} from "instamancer";

createApi("hashtag", id, options);
createApi("user", id, options);
createApi("post", ids, options);
createApi("search", query, options);

Options

const options: Instamancer.IOptions = {
    total: number,
    headless: boolean,
    logger: winston.Logger,
    silent: boolean,
    sleepTime: number,
    strict: boolean,
    hibernationTime: number,
    enableGrafting: boolean,
    fullAPI: boolean,
    proxyURL: string,
    executablePath: string,
    validator: Type<unknown>,
    plugins: IPlugin[]
}
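As an illustration of how a caller might fill in only some of these fields, here is a minimal sketch that overlays partial options on a set of defaults. Everything in it is an assumption made for the example: `ScrapeOptions` is a local subset of the fields above rather than the library's `IOptions`, the default values are borrowed from the CLI help text, and the pairing of CLI flags with option names (for instance `--visible` as the opposite of `headless`) is a guess rather than documented behaviour.

```typescript
// Local subset of the option fields listed above; not imported from instamancer.
interface ScrapeOptions {
    total: number;
    headless: boolean;
    sleepTime: number;
    strict: boolean;
    enableGrafting: boolean;
    fullAPI: boolean;
}

// Assumed defaults, borrowed from the CLI help text (count 0, sleep 2,
// graft true, full false, strict false, visible false).
const ASSUMED_DEFAULTS: ScrapeOptions = {
    total: 0,
    headless: true,
    sleepTime: 2,
    strict: false,
    enableGrafting: true,
    fullAPI: false,
};

// Overlay caller-supplied fields on the assumed defaults.
function withDefaults(overrides: Partial<ScrapeOptions>): ScrapeOptions {
    return { ...ASSUMED_DEFAULTS, ...overrides };
}

const exampleOptions = withDefaults({ total: 10, fullAPI: true });
console.log(exampleOptions);
```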

Comparison

A comparison of Instagram scraping tools. Please suggest more tools and criteria through a pull request.

To see a speed comparison, visit this page.