chardet

Chardet is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

Packed size is only 22 KB

Works in all environments: Node / Browser / Native

Works on all platforms: Linux / Mac / Windows

No dependencies

No native code / bindings

100% written in TypeScript

Extensive code coverage

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

const chardet = require('chardet');
chardet.detect(Buffer.from('hello there!'));
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
chardet.detectFileSync('/path/to/file');

To return the full list of possible encodings, use the analyse method:

const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence value in descending order:

[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
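A list like this usually still needs a decision rule on the caller's side. The following is a minimal sketch, assuming the result shape shown above. `pickEncoding` and its `minConfidence` parameter are hypothetical names used for illustration and are not part of chardet.

```javascript
// Hypothetical helper (not part of chardet): return the name of the
// top-ranked match, or null when its confidence is below a threshold.
function pickEncoding(matches, minConfidence = 50) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...matches].sort((a, b) => b.confidence - a.confidence);
  const best = sorted[0];
  return best && best.confidence >= minConfidence ? best.name : null;
}

// Sample data in the shape shown above.
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(matches));     // UTF-8
console.log(pickEncoding(matches, 95)); // null
```

Returning null instead of a low-confidence guess lets the caller choose its own fallback, for example a default encoding or a prompt to the user.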

Working with large data sets

When the data set is large and you want to trade some accuracy for performance, you can sample only the first N bytes of the buffer:

chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then(encoding => console.log(encoding));
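When the data is already in memory, the same idea can be applied by hand. This is a minimal sketch of taking the first N bytes of a Buffer. `takeSample` is a hypothetical helper, and it is not necessarily how chardet implements `sampleSize` internally.

```javascript
// Hypothetical helper (not part of chardet): keep only the first
// `sampleSize` bytes of a buffer before handing it to a detector.
function takeSample(buffer, sampleSize) {
  // subarray returns a view on the same memory, so nothing is copied.
  // If sampleSize exceeds the buffer length, the whole buffer is returned.
  return buffer.subarray(0, sampleSize);
}

const data = Buffer.from('hello there! '.repeat(1000));
const sample = takeSample(data, 32);
console.log(sample.length); // 32
// The sample could then be passed to chardet.detect(sample).
```

A small sample can cut a multi-byte character in half at the boundary, which is one reason accuracy may drop as the sample shrinks.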

Supported Encodings:

UTF-8

UTF-16 LE

UTF-16 BE

UTF-32 LE

UTF-32 BE

ISO-2022-JP

ISO-2022-KR

ISO-2022-CN

Shift_JIS

Big5

EUC-JP

EUC-KR

GB18030

ISO-8859-1

ISO-8859-2

ISO-8859-5

ISO-8859-6

ISO-8859-7

ISO-8859-8

ISO-8859-9

windows-1250

windows-1251

windows-1252

windows-1253

windows-1254

windows-1255

windows-1256

KOI8-R

Currently only these encodings are supported.
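Once an encoding name is known, the bytes still have to be decoded. The sketch below uses the standard `TextDecoder` API and assumes the name is one the runtime accepts as a label. Not every name in the list above is guaranteed to be accepted by every runtime, so the hypothetical `decodeWith` helper returns null rather than throwing.

```javascript
// Hypothetical helper (not part of chardet): decode bytes with a given
// encoding name, returning null if the name is not recognized by the
// runtime or the bytes are invalid for that encoding.
function decodeWith(bytes, encodingName) {
  try {
    // fatal: true makes invalid byte sequences throw instead of being
    // silently replaced with U+FFFD.
    return new TextDecoder(encodingName, { fatal: true }).decode(bytes);
  } catch (err) {
    return null;
  }
}

const bytes = Buffer.from('hello there!');
console.log(decodeWith(bytes, 'UTF-8'));           // hello there!
console.log(decodeWith(bytes, 'no-such-charset')); // null
```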

TypeScript type definitions are included.

