Date: 2021-03-12
Beautiful-dom is a lightweight library that mirrors the parts of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from them. It is written in TypeScript and can be used as a CommonJS library.
npm install --save beautiful-dom
const BeautifulDom = require('beautiful-dom');
const document = `
<p class="paragraph highlighted-text" >
My name is <b> Ajah, C.S. </b> and I am a <span class="work"> software developer </span>
</p>
<div class = "container" id="container" >
<b> What is the name of this module </b>
<p> What is the name of this library </p>
<a class="myWebsite" href="https://www.ajah.xyz" > My website </a>
</div>
<form>
<label for="name"> What's your name? </label>
<input type="text" id="name" name="name" />
</form>
`;
const dom = new BeautifulDom(document);
The API has three parts: methods on the document object, methods on the HTML node object, and properties of the HTML node object.
They are used as they would be in an actual HTML DOM, with the same method parameters.
let paragraphNodes = dom.getElementsByTagName('p');
// returns a list of node objects with node name 'p'
let nodesWithSpecificClass = dom.getElementsByClassName('work');
// returns a list of node objects with class name 'work'
let nodeWithSpecificId = dom.getElementById('container');
// returns a node with id 'container'
let complexQueryNodes = dom.querySelectorAll('p.paragraph b');
// returns a list of nodes that satisfy the complex query of CSS selectors
let nodesWithSpecificName = dom.getElementsByName('name');
// returns a list of nodes with the specific 'name'
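The list-returning queries above are usually followed by a step that pulls the text out of each node. Below is a minimal sketch of that step. `collectText` and `fakeParagraphs` are hypothetical names, not part of beautiful-dom, and the sketch assumes the returned list is an array or array-like whose nodes expose `textContent`.

```javascript
// Hypothetical helper (not part of beautiful-dom): collect the trimmed
// text of every node in a query result.
function collectText(nodes) {
  // Array.from accepts both arrays and array-like lists; a missing
  // result is treated as an empty list.
  return Array.from(nodes || [], (node) => String(node.textContent || '').trim());
}

// Stand-in nodes so the sketch runs without the library installed.
const fakeParagraphs = [
  { textContent: ' What is the name of this library ' },
  { textContent: '\n My name is Ajah, C.S. \n' },
];

console.log(collectText(fakeParagraphs));
```

With the library installed, the same helper could presumably be called as `collectText(dom.getElementsByTagName('p'))`.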
let linkNode = dom.querySelector('a.myWebsite');
// returns a node object that matches the CSS selector ('myWebsite' is a class, so it is selected with '.')
let linkHref = linkNode.getAttribute('href');
// returns the value of the attribute, e.g. 'https://www.ajah.xyz'
let linkInnerHTML = linkNode.innerHTML;
// returns the innerHTML of a node object, e.g. ' My website '
let linkTextContent = linkNode.textContent;
// returns the textContent of a node object, e.g. ' My website '
let linkInnerText = linkNode.innerText;
// returns the innerText of a node object, e.g. ' My website '
let linkOuterHTML = linkNode.outerHTML;
// returns the outerHTML of a node object, i.e. '<a class="myWebsite" href="https://www.ajah.xyz" > My website </a>'
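When crawling real pages, a single-node lookup may match nothing, and calling `getAttribute` on the missing result would throw. The sketch below guards against that. `summarizeLink` and `fakeLink` are hypothetical names, not part of beautiful-dom, and it is an assumption that a failed lookup yields `null` or `undefined`, as it does in a browser DOM.

```javascript
// Hypothetical helper (not part of beautiful-dom): summarize a link node,
// tolerating a query that matched nothing.
function summarizeLink(node) {
  if (!node) {
    return null; // assumption: a failed lookup yields null or undefined
  }
  return {
    href: node.getAttribute('href'),
    text: String(node.textContent || '').trim(),
  };
}

// Stand-in node so the sketch runs without the library installed.
const fakeLink = {
  textContent: ' My website ',
  getAttribute: (name) => (name === 'href' ? 'https://www.ajah.xyz' : null),
};

console.log(summarizeLink(fakeLink));
console.log(summarizeLink(null));
```

With the library installed, the helper could presumably be called on the result of `dom.querySelector(...)` directly.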
If you have ideas, features you would like included, or bug fixes, you can send a PR.
(Requires Node v6 or above)
git clone https://github.com/ChukwuEmekaAjah/beautiful-dom.git