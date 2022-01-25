This package parses MHTML files into multiple HTML files.

It aims to handle URL resolution edge cases correctly. It uses a hand-written HTML parser that does not build a tree or track parsing context, which makes it roughly 10 times faster than htmlparser2 (and about 15 times faster than parse5).
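To show why skipping the tree pays off, here is a minimal sketch of a tree-less scanner. It walks the HTML once and reports every src and href attribute value, and it keeps no element stack or context. This is not fast-mhtml's actual parser. The name findUrlAttributes is hypothetical, and the sketch simplifies on purpose: it does not skip comments or script contents, and a ">" inside a quoted value ends the tag early.

```javascript
// Hypothetical sketch: report src/href attribute values in a single pass,
// without building a DOM tree or tracking parser context.
const URL_ATTRS = new Set(["src", "href"]);

function findUrlAttributes(html) {
  const results = [];
  // name="value", name='value' or name=value inside one tag
  const attrRe = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/g;
  let i = 0;
  while ((i = html.indexOf("<", i)) !== -1) {
    const end = html.indexOf(">", i);
    if (end === -1) break; // unterminated tag: stop scanning
    const tag = html.slice(i, end);
    attrRe.lastIndex = 0;
    let m;
    while ((m = attrRe.exec(tag)) !== null) {
      const name = m[1].toLowerCase();
      if (URL_ATTRS.has(name)) {
        // Exactly one of the three value groups matched.
        results.push({ name, value: m[2] ?? m[3] ?? m[4] });
      }
    }
    i = end + 1;
  }
  return results;
}
```

Because nothing is kept between tags, the memory footprint stays constant and each tag is inspected exactly once.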

It rewrites local URLs so that links inside the page keep working.
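One way to picture this rewriting is a sketch like the following. It resolves each link against the page's own URL with the WHATWG URL constructor, which covers cases such as "../a.css" and "//cdn/x.js". If the target was captured in the archive, the link is pointed at the local copy, and any fragment is kept. toLocalUrl and the localNames map are hypothetical names, and fast-mhtml's own resolution rules may differ.

```javascript
// Hypothetical sketch: resolve a link against the page URL and, if the
// target was captured in the archive, point it at the local copy.
function toLocalUrl(href, baseUrl, localNames) {
  // Fragment-only links ("#top") stay within the current document.
  if (href.startsWith("#")) return href;
  let resolved;
  try {
    resolved = new URL(href, baseUrl); // handles "../a.css", "//cdn/x.js", etc.
  } catch {
    return href; // not a resolvable URL: leave untouched
  }
  const hash = resolved.hash; // keep "#section" for in-page anchors
  resolved.hash = ""; // look up the resource without its fragment
  const local = localNames.get(resolved.href);
  return local === undefined ? resolved.href : local + hash;
}
```

Links to resources that are not in the archive come back as absolute URLs, so they still work once the page is served from a different location.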

It provides a Parser class that lets you view the extracted files, and an npm run serve script for serving files locally. The script expects the files to be in the demos directory (for example, try opening http://localhost:8080/hn.mhtml).

Installation

npm i fast-mhtml

MHTML Parser

Parses MHTML files.

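```javascript
const fs = require("fs");
const filenamify = require("filenamify"); // needs a CommonJS release of filenamify
const { Parser } = require("fast-mhtml");

const mhtmlFileContents = fs.readFileSync("page.mhtml"); // example path
const p = new Parser({
  rewriteFn: filenamify,
});
const result = p.parse(mhtmlFileContents)
  .rewrite()
  .spit();
```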
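For context, an .mhtml file is a MIME multipart/related document (RFC 2557). It starts with a header block that declares a boundary string. Parts follow, each with its own headers (Content-Type, Content-Location, Content-Transfer-Encoding, and so on) and a body. The sketch below illustrates the splitting step only. It is not fast-mhtml's implementation. splitMhtml is a hypothetical name, folded header lines inside parts are not handled, and bodies are returned still transfer-encoded.

```javascript
// Hypothetical sketch: split an MHTML (MIME multipart/related) document
// into parts. Bodies are returned still transfer-encoded.
function splitMhtml(text) {
  const m = /boundary="?([^";\r\n]+)"?/i.exec(text);
  if (!m) throw new Error("no multipart boundary found");
  const delimiter = "--" + m[1];
  const parts = [];
  // The first chunk is the top-level header block; the chunk after the
  // closing delimiter ("--boundary--") starts with "--".
  for (const chunk of text.split(delimiter).slice(1)) {
    if (chunk.startsWith("--")) break; // closing delimiter reached
    const body = chunk.replace(/^\r?\n/, ""); // line break after the delimiter
    const sep = body.search(/\r?\n\r?\n/); // blank line ends the part headers
    const headerText = sep === -1 ? body : body.slice(0, sep);
    const content = sep === -1 ? "" : body.slice(sep).replace(/^\r?\n\r?\n/, "");
    const headers = {};
    for (const line of headerText.split(/\r?\n/)) {
      const idx = line.indexOf(":");
      if (idx > 0) {
        headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
      }
    }
    // The line break before the next delimiter belongs to the delimiter.
    parts.push({ headers, content: content.replace(/\r?\n$/, "") });
  }
  return parts;
}
```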

API

Processor

The Processor provides a convenience method for converting an .mhtml file into multiple files.

It exposes a single static method, convert:

```javascript
const { Processor } = require("fast-mhtml");

Processor.convert("one.mhtml");
```
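As a rough illustration of what the final write step of such a conversion involves, the sketch below writes already-extracted parts to an output directory, one file per part. It is not Processor's actual code, and this README does not document where convert writes its output. writeParts and the { name, content } shape are hypothetical. The path.basename call is a design choice that keeps a careless rewriteFn result such as "../x.css" from escaping the output directory.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch: write extracted parts to outDir, one file per part.
// `parts` is an array of { name, content } whose names came from a rewriteFn.
function writeParts(outDir, parts) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  for (const { name, content } of parts) {
    // basename() stops names like "../../etc/passwd" from leaving outDir.
    const target = path.join(outDir, path.basename(name));
    fs.writeFileSync(target, content);
    written.push(target);
  }
  return written;
}
```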

new Parser([config])

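Creates a new MHTML parser. The optional config object can supply a rewriteFn, which is called with each resource URL and returns the name that URL is rewritten to (filenamify by default). Examples: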

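```javascript
const { Parser } = require("fast-mhtml");

// Default configuration: URLs are rewritten with filenamify
const parser = new Parser({});

// Replace every "/" in a URL with "_"
const parser2 = new Parser({
  rewriteFn(url) {
    return url.replace(/\//g, "_");
  },
});

// Record every URL the parser rewrites, leaving it unchanged
const links = [];
const parser3 = new Parser({
  rewriteFn(url) {
    links.push(url);
    return url;
  },
});
```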
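The parser2 example above can map different URLs to the same name: "a/b_c" and "a_b/c" both become "a_b_c". A hypothetical alternative rewriteFn that avoids this is sketched below. It keeps a sanitized stem for readability, appends a short SHA-1 prefix of the full URL so that names stay distinct, and keeps the file extension so that a static server can still infer the content type. safeRewriteFn is not part of fast-mhtml; it is only an example of a function you could pass in.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical rewriteFn: maps a URL to a flat, filesystem-safe file name.
function safeRewriteFn(url) {
  // Short hash of the full URL keeps distinct URLs distinct after sanitizing.
  const hash = crypto.createHash("sha1").update(url).digest("hex").slice(0, 8);
  let ext = "";
  try {
    // Keep the extension (e.g. ".css") so a static server can infer the type.
    ext = path.extname(new URL(url).pathname).replace(/[^a-zA-Z0-9.]/g, "");
  } catch {
    // Not an absolute URL: no extension is kept.
  }
  const stem = url
    .split(/[?#]/)[0] // ignore query string and fragment
    .replace(/^[a-zA-Z][a-zA-Z0-9+.-]*:(\/\/)?/, "") // drop the scheme
    .replace(/[^a-zA-Z0-9_-]+/g, "_") // collapse unsafe runs to "_"
    .slice(0, 80); // keep names reasonably short
  return `${stem}_${hash}${ext}`;
}
```

It would be passed in like the other examples: new Parser({ rewriteFn: safeRewriteFn }).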

Parser.prototype.parse(data: Buffer | string)

Parses the given file. Expects the contents of an .mhtml file produced by Google Chrome, as a string or a Buffer.

```javascript
const fs = require("fs");
const { Parser } = require("fast-mhtml");

const data = fs.readFileSync("./demos/nested-iframe.mhtml");
const parser = new Parser({});
parser.parse(data);
parser.spit();
```
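Chrome's single-file MHTML output typically stores text parts such as HTML and CSS with Content-Transfer-Encoding: quoted-printable, and binary parts such as images as base64. A parser therefore has to undo these encodings before it can scan or rewrite anything. Below is a minimal quoted-printable decoder following RFC 2045, section 6.7. decodeQuotedPrintable is a hypothetical helper, not part of fast-mhtml's documented API. It returns a Buffer so that multi-byte UTF-8 sequences survive decoding.

```javascript
// Hypothetical helper: decode a quoted-printable body (RFC 2045, 6.7) to a Buffer.
function decodeQuotedPrintable(text) {
  // Soft line breaks ("=" at end of line) only wrap long lines; remove them.
  const unwrapped = text.replace(/=\r?\n/g, "");
  const bytes = [];
  for (let i = 0; i < unwrapped.length; i++) {
    const ch = unwrapped[i];
    const hex = unwrapped.slice(i + 1, i + 3);
    if (ch === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=3D" -> 0x3D, i.e. "="
      i += 2;
    } else {
      // Other characters are literal ASCII in a valid quoted-printable body.
      bytes.push(ch.charCodeAt(0) & 0xff);
    }
  }
  return Buffer.from(bytes);
}
```

For example, decodeQuotedPrintable('<a href=3D"x">').toString("utf8") yields the original markup with a plain "=".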

Parser.prototype.rewrite()

Rewrites all links in the parsed MHTML file so that they refer to the other extracted files, passing each URL through the parser's rewriteFn (filenamify by default).

This keeps links between the extracted files working after the MHTML file has been split.

In typical usage, rewriteFn determines the name under which each linked resource is saved, on your server or locally.

Note: assumes Parser.prototype.parse has already been called; throws an error otherwise.

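```javascript
const fs = require("fs");
const { Parser } = require("fast-mhtml");

const data = fs.readFileSync("./demos/nested-iframe.mhtml");
const parser = new Parser({});
parser.parse(data); // must come before rewrite(), which throws otherwise
parser.rewrite();
parser.spit();
```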
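The rewrite step can be pictured as a pass over each HTML part that swaps archived URLs for their rewritten names. The sketch below is a simplified, regex-based stand-in; fast-mhtml itself uses its own hand-written parser. rewriteLinks and archivedUrls are hypothetical names. The sketch only handles quoted src/href attributes preceded by whitespace, and it leaves links to resources outside the archive untouched.

```javascript
// Hypothetical sketch of a rewrite pass: every quoted src/href value that
// resolves to an archived resource is replaced by rewriteFn(absoluteUrl).
function rewriteLinks(html, baseUrl, archivedUrls, rewriteFn) {
  return html.replace(
    /(?<=\s)(src|href)(\s*=\s*)(["'])(.*?)\3/gi,
    (match, attr, eq, quote, value) => {
      let absolute;
      try {
        absolute = new URL(value, baseUrl).href;
      } catch {
        return match; // unparseable URL: keep as-is
      }
      if (!archivedUrls.has(absolute)) return match; // external link: keep
      return `${attr}${eq}${quote}${rewriteFn(absolute)}${quote}`;
    }
  );
}
```

Resolving to an absolute URL before the lookup means that "style.css", "./style.css" and "/style.css" on the same page all map to one archived resource.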

Benchmarks

Run with npm run benchmark: