Tesseract OCR for Node.js

Date: 2021-05-12

Installation

First, you need to install the Tesseract project. Instructions for installing Tesseract for all platforms can be found on the project site. On Debian/Ubuntu:

apt-get install tesseract-ocr

After you've installed Tesseract, install the npm package:

npm install node-tesseract-ocr

Usage

const tesseract = require("node-tesseract-ocr")

const config = {
  lang: "eng",
  oem: 1,
  psm: 3,
}

tesseract
  .recognize("image.jpg", config)
  .then((text) => {
    console.log("Result:", text)
  })
  .catch((error) => {
    console.log(error.message)
  })
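The same call can also be written with async/await. The sketch below is only an illustration: `readText` is a hypothetical helper, not part of the package, and it takes the recognize function as an argument so it can be tried with a stub before Tesseract is installed. It assumes the recognized text comes back as a string, possibly with trailing whitespace.

```javascript
// Hypothetical helper (not part of node-tesseract-ocr).
// `recognize` is any function (image, config) => Promise<string>.
async function readText(recognize, image, config) {
  try {
    const text = await recognize(image, config)
    // Assumption: the output may end with trailing whitespace; trim it.
    return text.trim()
  } catch (error) {
    // Same handling as the promise-based example: log the message.
    console.log(error.message)
    return null
  }
}

// Usage with the real package (assumes node-tesseract-ocr is installed):
// const tesseract = require("node-tesseract-ocr")
// const config = { lang: "eng", oem: 1, psm: 3 }
// readText((img, cfg) => tesseract.recognize(img, cfg), "image.jpg", config)
//   .then((text) => console.log("Result:", text))
```

Returning `null` on failure is a design choice made for this sketch; rethrowing the error may suit callers that need to distinguish failure from empty text.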

You can also pass a Buffer:

const fs = require("fs")

// Read the image into a Buffer and pass it instead of a file path
const img = fs.readFileSync("image.jpg")

tesseract
  .recognize(img, config)
  .then((text) => {
    console.log("Result:", text)
  })
  .catch((error) => {
    console.log(error.message)
  })

Or a URL:

const img = "https://tesseract.projectnaptha.com/img/eng_bw.png"

tesseract
  .recognize(img, config)
  .then((text) => {
    console.log("Result:", text)
  })
  .catch((error) => {
    console.log(error.message)
  })

If you want to process multiple images in a single run, then pass an array:

const images = ["./test/samples/file1.png", "./test/samples/file2.png"]

tesseract
  .recognize(images, config)
  .then((text) => {
    console.log("Result:", text)
  })
  .catch((error) => {
    console.log(error.message)
  })
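The array form above hands a single `text` value to the callback. If you need to know which text came from which file, one option is to issue one call per image. The sketch below is an assumption-laden illustration: `recognizeEach` is a hypothetical helper, not part of the package, and it receives the recognize function as an argument so it can be run with a stub.

```javascript
// Hypothetical helper (not part of node-tesseract-ocr).
// Runs one recognize call per image and pairs each path with its text.
async function recognizeEach(recognize, images, config) {
  // Promise.all keeps results in the same order as the input array.
  const texts = await Promise.all(
    images.map((image) => recognize(image, config))
  )
  return images.map((image, i) => ({ image, text: texts[i] }))
}

// Usage with the real package (assumes node-tesseract-ocr is installed):
// const tesseract = require("node-tesseract-ocr")
// const images = ["./test/samples/file1.png", "./test/samples/file2.png"]
// recognizeEach((img, cfg) => tesseract.recognize(img, cfg), images, { lang: "eng" })
//   .then((results) => results.forEach((r) => console.log(r.image, r.text)))
```

Note that this starts all calls at once and rejects as soon as any one of them fails; for large batches a sequential loop or a concurrency limit is likely a better fit.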

In the config object you can pass any OCR options. You can also pass any control parameters there, or use ready-made sets of config files (like hocr):