Date: 2019-02-16
node-metainspector

by Gabriel Cebrian
2.0.0

An npm package for Node.js for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, an array with all the links, all the images in it, etc. Inspired by the metainspector Ruby gem.

Popularity
Downloads/wk: 254
GitHub Stars: 123

Maintenance
Last Commit: 3yrs ago
Contributors: 15

Package
Dependencies: 5
License: MIT
Type Definitions: DefinitelyTyped
Tree-Shakeable: No?

Categories: Node.js Meta Tags

Readme

Node-Metainspector

MetaInspector is an npm package for web scraping purposes. You give it a URL, and it lets you easily get its title, links, images, description, keywords, meta tags, and more.

Metainspector is inspired by the Metainspector gem by jaimeiniesta.

This version requires node v6 or higher, as some dependencies make use of various bits of ES6 functionality. The 1.x.x versions are compatible with v0.x - v4 releases of node, and should be used instead for older applications.

Scraped data

client.url                  # URL of the page
client.scheme               # Scheme of the page (http, https)
client.host                 # Hostname of the page (e.g. markupvalidator.com, without the scheme)
client.rootUrl              # Root URL (scheme + host, e.g. http://simple.com/)
client.title                # title of the page, as string
client.links                # array of strings, with every link found on the page as an absolute URL
client.author               # page author, as string
client.keywords             # keywords from meta tag, as array
client.charset              # page charset from meta tag, as string
client.description          # returns the meta description, or the first long paragraph if no meta description is found
client.image                # Most relevant image, if defined with og:image
client.images               # array of strings, with every img found on the page as an absolute URL
client.feeds                # RSS or Atom feed links found in the page metadata, as array
client.ogTitle              # opengraph title
client.ogDescription        # opengraph description
client.ogType               # Open Graph Object Type
client.ogUpdatedTime        # Open Graph Updated Time
client.ogLocale             # Open Graph Locale - for languages
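
As a small illustration of how these properties might be combined, the sketch below builds a link-preview object that prefers the Open Graph values and falls back to the plain title and description. `buildPreview` is a hypothetical helper, not part of node-metainspector; it assumes only the property names listed above, and the plain object at the end stands in for a fetched client.

```javascript
// Hypothetical helper (not part of node-metainspector): build a link-preview
// object from a fetched client, preferring Open Graph values when present.
function buildPreview(client) {
  return {
    url: client.url,
    title: client.ogTitle || client.title || '',
    description: client.ogDescription || client.description || '',
    image: client.image || null
  };
}

// Stand-in for a fetched client; a real one would be read inside the
// "fetch" event handler.
const fakeClient = {
  url: 'http://simple.com/',
  title: 'Simple',
  ogTitle: 'Simple (Open Graph)',
  description: 'A simple page'
};
console.log(buildPreview(fakeClient));
```

Falling back with `||` also treats empty strings as missing, which is usually what a preview wants.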

Options

timeout - Time in ms that Metainspector will wait for the URL to respond
maxRedirects - Maximum number of redirects Metainspector will follow
limit - Maximum number of bytes Metainspector will download when querying a site
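
One way to keep these three options consistent across calls is to merge caller overrides onto a fixed set of values and validate them first. This is only a sketch: `buildOptions` and `EXAMPLE_DEFAULTS` are hypothetical names, and the default values are assumptions chosen for illustration, not the library's documented defaults.

```javascript
// Illustrative values only; these are NOT the library's documented defaults.
const EXAMPLE_DEFAULTS = {
  timeout: 5000,       // ms to wait for the URL to respond
  maxRedirects: 5,     // number of redirects to follow
  limit: 1024 * 1024   // maximum number of bytes to download
};

// Hypothetical helper: merge overrides onto the example defaults and
// reject values that are not non-negative integers.
function buildOptions(overrides) {
  const options = Object.assign({}, EXAMPLE_DEFAULTS, overrides);
  ['timeout', 'maxRedirects', 'limit'].forEach(function (key) {
    if (!Number.isInteger(options[key]) || options[key] < 0) {
      throw new RangeError(key + ' must be a non-negative integer');
    }
  });
  return options;
}

console.log(buildOptions({ timeout: 2000 }));
```

The resulting object could then be passed as the second constructor argument, in the same position as `{ timeout: 5000 }` in the usage example.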

Usage

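var MetaInspector = require('node-metainspector');
// Create a client for the target URL, waiting at most 5000 ms for a response
var client = new MetaInspector("http://www.google.com", { timeout: 5000 });

// "fetch" fires once the scraped data is available on the client
client.on("fetch", function(){
    console.log("Description: " + client.description);

    console.log("Links: " + client.links.join(","));
});

// "error" fires if the request fails
client.on("error", function(err){
    console.log(err);
});

// Start the request
client.fetch();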
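
If promise-based code is preferred, the event-based flow above could be wrapped in a Promise. The sketch below is an assumption-laden illustration rather than part of the package: `fetchOnce` is a hypothetical helper that relies only on the `on("fetch")`, `on("error")` and `fetch()` calls shown in the usage example.

```javascript
// Hypothetical wrapper (not part of node-metainspector): resolve with the
// client when "fetch" fires, reject with the error when "error" fires.
function fetchOnce(client) {
  return new Promise(function (resolve, reject) {
    client.on('fetch', function () { resolve(client); });
    client.on('error', reject);
    client.fetch();
  });
}

// Usage sketch (requires the node-metainspector package to be installed):
// var MetaInspector = require('node-metainspector');
// fetchOnce(new MetaInspector('http://www.google.com', { timeout: 5000 }))
//   .then(function (client) { console.log('Title: ' + client.title); })
//   .catch(function (err) { console.log(err); });
```

A Promise settles only once, so each client should be wrapped and fetched a single time with this helper.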

TO DO

Add an absolutify URL function to return all URLs as absolute URLs

Finish implementing the properties below:

client.internal_links       # array of strings, with every internal link found on the page as an absolute URL
client.external_links       # array of strings, with every external link found on the page as an absolute URL
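
The two TO DO items could be approached along the following lines. This is a sketch, not the project's planned implementation: `absolutify` and `splitLinks` are hypothetical helpers, they assume a Node version that ships the WHATWG `URL` class in the `url` module, and "internal" is taken to mean "same host as the page". Links with non-HTTP schemes such as `mailto:` end up in the external bucket under that definition.

```javascript
const { URL } = require('url');

// Resolve a possibly relative href against the page URL.
// Returns null when the href cannot be parsed.
function absolutify(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (err) {
    return null;
  }
}

// Split hrefs into internal (same host as the page) and external links,
// returning every link as an absolute URL.
function splitLinks(hrefs, pageUrl) {
  const pageHost = new URL(pageUrl).host;
  const result = { internal: [], external: [] };
  hrefs.forEach(function (href) {
    const absolute = absolutify(href, pageUrl);
    if (absolute === null) return; // skip unparsable hrefs
    const bucket = new URL(absolute).host === pageHost ? 'internal' : 'external';
    result[bucket].push(absolute);
  });
  return result;
}

console.log(splitLinks(['/about', 'http://other.com/x'], 'http://simple.com/'));
```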

ZOMG Fork! Thank you!

You're welcome to fork this project and send pull requests. Just remember to include tests.

Copyright (c) 2009-2012 Gabriel Cebrian, released under the MIT license

Alternatives

url-metadata: Request an http url and scrape its metadata. (GitHub Stars: 96, Weekly Downloads: 4K, User Rating: 4.0/5)
metascraper: Easily scrape data from websites using Open Graph, HTML metadata & fallbacks. (GitHub Stars: 2K, Weekly Downloads: 23K)
metafetch: NodeJS package that fetches a given URL's title, description, images, links etc. (GitHub Stars: 20, Weekly Downloads: 620)
metaget: A Node.js module to fetch HTML meta tags (including Open Graph) from a remote URL. (GitHub Stars: 20, Weekly Downloads: 77)
lets-get-meta: Extract meta tags from an HTML string in Node.js (not browsers). (GitHub Stars: 7, Weekly Downloads: 495)