Safely remove repeating whitespace from HTML text.

Using \s to normalize HTML whitespace strips out characters that a web browser actually renders. That is a lossy change and produces a different visual result. This package collapses runs of whitespace characters down to a single space, while leaving the following characters untouched:

\u00a0 (non-breaking space)

\ufeff (zero-width non-breaking space)

…as well as these lesser-known ones:

\u1680 (Ogham space mark)

\u180e (Mongolian vowel separator)

\u2000 (en quad)

\u2001 (em quad)

\u2002 (en space)

\u2003 (em space)

\u2004 (three-per-em space)

\u2005 (four-per-em space)

\u2006 (six-per-em space)

\u2007 (figure space)

\u2008 (punctuation space)

\u2009 (thin space)

\u200a (hair space)

\u2028 (line separator)

\u2029 (paragraph separator)

\u202f (narrow non-breaking space)

\u205f (medium mathematical space)

\u3000 (ideographic space)

For the sake of completeness, the following character, which is not matched by \s, is also left untouched:

\u200b (zero-width space)

Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
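
The difference between \s and a narrower character class can be sketched as follows. This is an illustration only, not the package's actual source: the helper name collapseAsciiWhitespace and the character class it uses (tab, line feed, form feed, carriage return, space) are assumptions chosen for the example.

```javascript
// Illustration only; NOT the package's actual implementation.
// Sample text node content: two non-breaking spaces, then ordinary whitespace runs.
const text = "10\u00a0\u00a0km  away\n\tfrom here";

// Naive approach: \s also matches \u00a0, so the non-breaking spaces are lost.
const naive = text.replace(/\s+/g, " ");

// Hypothetical helper (assumed character class): collapse only tab, line feed,
// form feed, carriage return, and the ordinary space.
const collapseAsciiWhitespace = (str) => str.replace(/[\t\n\f\r ]+/g, " ");
const careful = collapseAsciiWhitespace(text);

console.log(naive.includes("\u00a0"));   // false: the non-breaking spaces were replaced
console.log(careful.includes("\u00a0")); // true: the non-breaking spaces survive

// \u200b is not matched by \s in the first place.
console.log(/\s/.test("\u200b"));        // false
```

Both results collapse the ordinary whitespace runs, but only the narrower class keeps the characters a browser would still render.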

Installation

Node.js >= 8 is required. Type this at the command line:

npm install normalize-html-whitespace

Usage