Date: 2019-05-04
Safely remove repeating whitespace from HTML text.
Using \s to normalize HTML whitespace will strip out characters that are actually rendered by a web browser. That is a lossy change and produces a different visual result. This package collapses runs of whitespace characters down to a single space, while ignoring the following characters:
\u00a0 or &nbsp; (non-breaking space)
\ufeff or &#xfeff; (zero-width non-breaking space)
…as well as these lesser-known ones:
\u1680 or &#x1680; (Ogham space mark)
\u180e or &#x180e; (Mongolian vowel separator)
\u2000 or &#x2000; (en quad)
\u2001 or &#x2001; (em quad)
\u2002 or &#x2002; (en space)
\u2003 or &#x2003; (em space)
\u2004 or &#x2004; (three-per-em space)
\u2005 or &#x2005; (four-per-em space)
\u2006 or &#x2006; (six-per-em space)
\u2007 or &#x2007; (figure space)
\u2008 or &#x2008; (punctuation space)
\u2009 or &#x2009; (thin space)
\u200a or &#x200a; (hair space)
\u2028 or &#x2028; (line separator)
\u2029 or &#x2029; (paragraph separator)
\u202f or &#x202f; (narrow non-breaking space)
\u205f or &#x205f; (medium mathematical space)
\u3000 or &#x3000; (ideographic space)
For the sake of completeness, the following character, which is not part of \s, is also left unaffected:
\u200b or &#x200b; (zero-width breaking space)
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
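The behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's actual source: the helper name collapseAsciiWhitespace and the exact character class (tab, line feed, vertical tab, form feed, carriage return, space) are hypothetical choices made for the example.

```javascript
// Hypothetical helper for illustration; NOT the package's real implementation.
// Assumption: only these ASCII whitespace characters are collapsed.
const ASCII_WHITESPACE_RUN = /[\t\n\v\f\r ]+/g;

// Replace every run of ASCII whitespace with one space. Characters such as
// \u00a0 (non-breaking space) or \u2003 (em space) are not in the class,
// so they pass through unchanged.
function collapseAsciiWhitespace(text) {
  return text.replace(ASCII_WHITESPACE_RUN, ' ');
}

const sample = 'a \n\t b\u00a0\u00a0c\u2003 d';

// Only the ASCII run between "a" and "b" is collapsed here.
const safe = collapseAsciiWhitespace(sample);

// For contrast: \s also swallows the non-breaking and em spaces.
const lossy = sample.replace(/\s+/g, ' ');

console.log(safe === 'a b\u00a0\u00a0c\u2003 d'); // true
console.log(lossy === 'a b c d'); // true
```

The contrast with /\s+/g is the point of the sketch: a narrow, explicit character class keeps the rendered spacing characters intact, whereas \s matches them and the result is lossy.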
Node.js >= 8 is required. To install, type this at the command line:
npm install normalize-html-whitespace
const normalizeWhitespace = require('normalize-html-whitespace');

// Runs of repeated whitespace collapse to a single space
normalizeWhitespace('  foo bar     baz ');
//-> ' foo bar baz '