Notice: This is a Node module that depends on cheerio, so it can only run on Node.js. If you need a browser version, consider truncate or nodejs-html-truncate.

const truncate = require('truncate-html')
truncate('<p><img src="xxx.jpg">Hello from earth!</p>', 2, { byWords: true })

Installation

npm install truncate-html

or

yarn add truncate-html

Try it online

Click https://npm.runkit.com/truncate-html to try.

API

truncate(html, [length], [options])
truncate.setup(options)

Default options

{
  byWords: false,
  stripTags: false,
  ellipsis: '...',
  decodeEntities: false,
  keepWhitespaces: false,
  excludes: '',
  reserveLastWord: false
}

You can change the default options with truncate.setup, e.g.

truncate.setup({ stripTags: true, length: 10 })
truncate('<p><img src="xxx.jpg">Hello from earth!</p>')

or use an existing cheerio instance:

import * as cheerio from 'cheerio'
truncate.setup({ stripTags: true, length: 10 })
const $ = cheerio.load('<p><img src="xxx.jpg">Hello from earth!</p>', { decodeEntities: true })
truncate($)
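As a rough model of the calling conventions above, the sketch below shows one way the trailing arguments could be resolved: a numeric argument is treated as the length, an object argument is treated as options, and anything not supplied falls back to the defaults changed through setup. The names defaults, setup, and resolveOptions are hypothetical and are not part of truncate-html; the library's real internals may differ.

```javascript
// Hypothetical model of option resolution; not truncate-html's real code.
let defaults = {
  byWords: false,
  stripTags: false,
  ellipsis: '...',
  decodeEntities: false,
  keepWhitespaces: false,
  excludes: '',
  reserveLastWord: false
}

// Mimics truncate.setup: the given options override the current defaults.
function setup(options) {
  defaults = { ...defaults, ...options }
}

// Accepts (length, options), (options), or nothing at all.
function resolveOptions(lengthOrOptions, maybeOptions) {
  if (typeof lengthOrOptions === 'number') {
    // In this model, a positional length overrides any length in the defaults.
    return { ...defaults, ...maybeOptions, length: lengthOrOptions }
  }
  return { ...defaults, ...lengthOrOptions }
}

setup({ stripTags: true, length: 10 })
const a = resolveOptions()                              // length 10 comes from setup
const b = resolveOptions(3, { byWords: true })          // length 3, stripTags still true
const c = resolveOptions({ length: 20, ellipsis: '~' }) // everything passed in one object
console.log(a.length, b.length, c.length)
```

Keeping the defaults in one object and merging per call means a setup call affects every later call without changing the call sites.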

Notice

TypeScript support

This library is written in TypeScript and ships with a definition file. If you encounter typing errors, you may need to update your tsconfig.json by adding "esModuleInterop": true to the compilerOptions, see #19.

{
  "compilerOptions": {
    "esModuleInterop": true
  }
}

About final string length

If the length of the HTML string's content is shorter than options.length, no ellipsis is appended to the final HTML string. If it is longer, the final string length will be options.length + options.ellipsis.length. If you set reserveLastWord to true or to a non-zero number, the final string length will vary.
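The length rule can be written as a small calculation. This is a sketch of the rule as described here, not the library's code: expectedTextLength is a hypothetical helper, it assumes character-based truncation (byWords and reserveLastWord both off), and it assumes that content exactly as long as options.length is left untouched, which the text above does not spell out.

```javascript
// Hypothetical helper modelling the rule above; not part of truncate-html.
function expectedTextLength(contentLength, length, ellipsis = '...') {
  // Content that fits is returned as-is, with no ellipsis.
  if (contentLength <= length) return contentLength
  // Otherwise: the kept characters plus the ellipsis.
  return length + ellipsis.length
}

const text = 'Hello from earth!'                       // 17 characters of text content
console.log(expectedTextLength(text.length, 20))       // 17
console.log(expectedTextLength(text.length, 10))       // 13
console.log(expectedTextLength(text.length, 10, '~'))  // 11
```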

All HTML comments <!-- xxx --> will be removed.

About dealing with non-alphabetic languages

Non-alphabetic languages such as Chinese, Japanese, and Korean do not separate words with whitespace, so the options byWords and reserveLastWord only work well with alphabetic languages.

In addition, cheerio, the only dependency of this project, has an issue when dealing with non-alphabetic languages, see Known issues for details.
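The word-splitting limitation is easy to see with plain string operations. The sketch below splits on whitespace, which is an assumption about how word boundaries are found rather than a description of the library's internals; countWords is a hypothetical helper.

```javascript
// Hypothetical helper: counts whitespace-separated words.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length
}

console.log(countWords('Hello from earth'))        // 3
// No whitespace anywhere, so the whole sentence counts as a single "word".
console.log(countWords('你好，世界！这是一个测试'))  // 1
```

With only one "word" to work with, a word-based limit cannot cut the CJK sentence at a sensible point.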

Using existing cheerio instance

If you use an existing cheerio instance, the truncate option decodeEntities has no effect; set it on your own cheerio instance instead:

var html = '<p><img src="abc.png">This is a string</p> for test.'
const $ = cheerio.load(`${html}`, { decodeEntities: true })
truncate($, 10)

Examples

var truncate = require('truncate-html')

// truncate by characters
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10)

// text containing emoji
var string = '<p>poo 💩💩💩💩💩<p>'
truncate(string, 6)

// strip tags
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { stripTags: true })

// truncate by words
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 3, { byWords: true })

// keep whitespaces
var html = '<p>         <img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { keepWhitespaces: true })

// pass the length inside the options object
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, { length: 10, stripTags: true })

// custom ellipsis
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, { length: 10, ellipsis: '~' })

// exclude an element by selector
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, { length: 10, ellipsis: '~', excludes: 'img' })

// exclude several elements by selector
var html = '<p><img src="abc.png">This is a string</p><div class="something-unwanted"> unwanted string inserted ( ´•̥̥̥ω•̥̥̥` ）</div> for test.'
truncate(html, { length: 20, stripTags: true, ellipsis: '~', excludes: ['img', '.something-unwanted'] })

// encoded characters, with and without decodeEntities
var html = '<p> test for &lt;p&gt; encoded string</p>'
truncate(html, { length: 20, decodeEntities: true })

var html = '<p> test for &lt;p&gt; encoded string</p>'
truncate(html, { length: 20, decodeEntities: false })

// encoded characters mixed with CJK text
var html = '<p> test for &lt;p&gt; 中文 string</p>'
truncate(html, { length: 20, decodeEntities: true })

For more usage examples, check truncate.spec.ts.

Known issues

There is a known issue with handling CJK (Chinese/Japanese/Korean) characters when the option decodeEntities is set to true.

When decodeEntities is true, encoded HTML entities are decoded automatically, so &amp; is treated as a single character. This is probably what you want. However, if the HTML string contains CJK characters, they will be replaced by characters like ö (each still counted as one character when truncating) in the final HTML you get, which is confusing.

To fix this, you have two choices:

1. Keep the option decodeEntities false, but then &amp; will be treated as five characters.

2. Modify cheerio's source code: find the function getInverse in the file ./node_modules/cheerio/node_modules/entities/lib/decode.js and comment out its last line, .replace(re_nonASCII, singleCharReplacer);.
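The trade-off in the first choice comes down to how the text is counted. The sketch below uses a deliberately tiny, hypothetical decodeBasicEntities helper that covers only a handful of named entities; it illustrates the counting difference and is not how cheerio or truncate-html decode entities.

```javascript
// Hypothetical, minimal decoder for illustration only.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"' }

function decodeBasicEntities(text) {
  return text.replace(/&(amp|lt|gt|quot);/g, (match, name) => NAMED[name])
}

const raw = 'Tom &amp; Jerry'
console.log(raw.length)                       // 15: the entity counts as five characters
console.log(decodeBasicEntities(raw).length)  // 11: the decoded "&" counts as one character
```

With decoding off, a length limit is reached four characters sooner for every such entity in the text.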

Credits

