Date: 2021-03-12

Beautiful-dom is a lightweight library that mirrors the parts of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from them. It is written in TypeScript and can be used as a CommonJS library.

What you get

The ability to parse HTML documents as if you were working with them in a live browser.

Fast queries that return essential data from HTML nodes.

Results that keep HTML nodes in their original document order after searching and parsing.

Complex queries with CSS selectors.

How to use

npm install --save beautiful-dom

    const BeautifulDom = require('beautiful-dom');

    // Sample HTML to parse (a plain string; no browser required)
    const document = `
    <p class="paragraph highlighted-text">
      My name is <b> Ajah, C.S. </b> and I am a <span class="work"> software developer </span>
    </p>
    <div class="container" id="container">
      <b> What is the name of this module </b>
      <p> What is the name of this library </p>
      <a class="myWebsite" href="https://www.ajah.xyz"> My website </a>
    </div>
    <form>
      <label for="name"> What's your name? </label>
      <input type="text" id="name" name="name" />
    </form>
    `;

    // Build the queryable DOM from the HTML string
    const dom = new BeautifulDom(document);

API

Methods on the document object (the BeautifulDom instance, named dom in the examples)

document.getElementsByTagName()

document.getElementsByClassName()

document.getElementsByName()

document.getElementById()

document.querySelectorAll()

document.querySelector()

Methods on the HTML node object

node.getElementsByClassName()

node.getElementsByTagName()

node.querySelector()

node.querySelectorAll()

node.getAttribute()

Properties of the HTML node object

node.outerHTML

node.innerHTML

node.textContent

node.innerText

These methods and properties are used the same way as their counterparts in an actual HTML DOM, with the same method parameters.
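
Because the API is meant to behave like the browser DOM, a lookup that matches nothing is worth guarding against. The sketch below assumes, as in a browser, that querySelector() yields null (or undefined) when no node matches; this README does not state that, so treat it as an assumption. textOf and fakeDom are hypothetical names used only for illustration, and fakeDom stands in for a real BeautifulDom instance so the snippet runs on its own.

```javascript
// Hypothetical helper: not part of beautiful-dom.
// `root` is any object exposing querySelector(), such as a BeautifulDom instance.
function textOf(root, selector, fallback = '') {
  const node = root.querySelector(selector);
  // Assumption: a failed lookup yields null (as in browsers) or undefined.
  if (node == null) {
    return fallback;
  }
  return node.textContent;
}

// Stand-in object so the sketch runs without the library installed.
const fakeDom = {
  querySelector(selector) {
    return selector === 'a.myWebsite' ? { textContent: ' My website ' } : null;
  },
};

console.log(textOf(fakeDom, 'a.myWebsite'));      // text of the matched node
console.log(textOf(fakeDom, 'a#missing', 'n/a')); // falls back to 'n/a'
```

With the library installed, the same helper could be called as textOf(dom, 'a.myWebsite').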

Examples for document object

    let paragraphNodes = dom.getElementsByTagName('p');
    let nodesWithSpecificClass = dom.getElementsByClassName('work');
    let nodeWithSpecificId = dom.getElementById('container');
    let complexQueryNodes = dom.querySelectorAll('p.paragraph b');
    let nodesWithSpecificName = dom.getElementsByName('name');

    // The sample anchor has class="myWebsite" (no id), so select it by class
    let linkNode = dom.querySelector('a.myWebsite');
    let linkHref = linkNode.getAttribute('href');
    let linkInnerHTML = linkNode.innerHTML;
    let linkTextContent = linkNode.textContent;
    let linkInnerText = linkNode.innerText;
    let linkOuterHTML = linkNode.outerHTML;
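
The sample HTML pads its text with spaces, so values read from textContent may need tidying before they are stored. A minimal sketch of that step follows, assuming textContent keeps the source whitespace as it does in a browser. cleanText and fakeLink are hypothetical names, and fakeLink is a stand-in for a node returned by the library.

```javascript
// Hypothetical helper: not part of beautiful-dom.
function cleanText(node) {
  // Collapse runs of whitespace (spaces, tabs, newlines) and trim both ends.
  return String(node.textContent).replace(/\s+/g, ' ').trim();
}

// Stand-in for a node such as the one returned by dom.querySelector().
const fakeLink = { textContent: '  My \n  website  ' };

console.log(cleanText(fakeLink)); // prints "My website"
```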

Examples for a node object

    let paragraphNodes = dom.getElementsByTagName('p');

    // Queries on a node search only within that node
    let nodesWithSpecificClass = paragraphNodes[0].getElementsByClassName('work');
    let complexQueryNodes = paragraphNodes[0].querySelectorAll('span.work');

    // The sample anchor has class="myWebsite" (no id), so select it by class
    let linkNode = dom.querySelector('a.myWebsite');
    let linkHref = linkNode.getAttribute('href');
    let linkInnerHTML = linkNode.innerHTML;
    let linkTextContent = linkNode.textContent;
    let linkInnerText = linkNode.innerText;
    let linkOuterHTML = linkNode.outerHTML;
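
Node methods and properties can be combined to turn a query result into plain records, which is the usual goal when crawling. The following is a sketch only: collectLinks and stubRoot are hypothetical names, the stub mimics just the three members used here (querySelectorAll, getAttribute, textContent), and it assumes the query result is an array or array-like collection.

```javascript
// Hypothetical helper: not part of beautiful-dom.
// `root` is any object exposing querySelectorAll(), e.g. a document or node.
function collectLinks(root) {
  // Array.from accepts both arrays and array-like collections.
  const anchors = Array.from(root.querySelectorAll('a'));
  return anchors.map((anchor) => ({
    href: anchor.getAttribute('href'),
    text: anchor.textContent.trim(),
  }));
}

// Stand-in root so the sketch runs without the library installed.
const stubRoot = {
  querySelectorAll(selector) {
    if (selector !== 'a') {
      return [];
    }
    return [
      {
        textContent: ' My website ',
        getAttribute(name) {
          return name === 'href' ? 'https://www.ajah.xyz' : null;
        },
      },
    ];
  },
};

console.log(collectLinks(stubRoot));
```

With the library installed, collectLinks(dom) would be the equivalent call on the parsed sample document.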

Contributing

If you have ideas, features you would like to see included, or bug fixes, you can send a PR.

(Requires Node v6 or above)

Clone the repo