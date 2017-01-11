A simple wrapper for the Tesseract OCR package for node.js
This module has a hard dependency on the Tesseract project, which must be installed separately. You can find installation instructions for various platforms on the project site. For Homebrew users, a single command is enough:
brew install tesseract --with-all-languages
The above installs every available language package. If you don't need them all, drop the
--with-all-languages flag and install the languages you need manually: download their .traineddata files to your local machine, then set the
TESSDATA_PREFIX environment variable to their location:
export TESSDATA_PREFIX=~/Downloads/
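If OCR fails with a missing-language error, it can help to confirm that the .traineddata file is where Tesseract will look. The sketch below uses a hypothetical helper, findTrainedData, that is not part of node-tesseract. Tesseract releases have differed on whether TESSDATA_PREFIX names the tessdata folder itself or its parent, so the sketch checks both locations rather than assuming one.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of node-tesseract): return the path of a
// language's .traineddata file under a TESSDATA_PREFIX-style directory,
// or null if it cannot be found. Both <prefix>/<lang>.traineddata and
// <prefix>/tessdata/<lang>.traineddata are checked, because Tesseract
// versions disagree on which of the two the prefix should point at.
function findTrainedData(lang, prefix = process.env.TESSDATA_PREFIX) {
  if (!prefix) return null;
  // A shell expands "~" in `export`, but a literal "~" passed from code is
  // not expanded by Node, so handle "~" and "~/..." here.
  const base = prefix === '~' || prefix.startsWith('~/')
    ? path.join(os.homedir(), prefix.slice(1))
    : prefix;
  const candidates = [
    path.join(base, `${lang}.traineddata`),
    path.join(base, 'tessdata', `${lang}.traineddata`),
  ];
  return candidates.find((file) => fs.existsSync(file)) || null;
}

console.log(findTrainedData('deu'));
```

With TESSDATA_PREFIX unset, the call above simply prints null.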
Then install the Node module, which exposes the JavaScript API:
npm install node-tesseract
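Because of the hard dependency on Tesseract, and because the binary option in the second example below suggests the module runs the Tesseract executable, a quick startup check can give a clearer error than a failed OCR call. This is a minimal sketch using a hypothetical helper, isBinaryAvailable, built on Node's child_process.execFile; it assumes your Tesseract build accepts the --version flag.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper (not part of node-tesseract): resolve to true if the
// executable can be launched with "--version" and exits without error,
// false otherwise (for example ENOENT when it is not installed or not on PATH).
function isBinaryAvailable(binary = 'tesseract') {
  return new Promise((resolve) => {
    execFile(binary, ['--version'], (err) => resolve(!err));
  });
}

isBinaryAvailable().then((ok) => {
  if (!ok) {
    console.error('Tesseract not found; install it before using node-tesseract.');
  }
});
```

Pass a full path such as '/usr/local/bin/tesseract' if the binary is not on your PATH.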
var tesseract = require('node-tesseract');
// Recognize text in an image using the default settings (Tesseract defaults to English)
tesseract.process(__dirname + '/path/to/image.jpg', function(err, text) {
    if (err) {
        console.error(err);
    } else {
        console.log(text);
    }
});
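If the rest of your code uses async/await, the callback-style call can be wrapped in a Promise. The sketch below uses a hypothetical helper, recognize, and assumes process accepts an options object as its second argument, as in the German example. It calls tesseract.process as a method so that any internal use of `this` is preserved.

```javascript
// Hypothetical helper (not part of node-tesseract): wrap a callback-style
// process(image, options, callback) call in a Promise so it can be awaited.
function recognize(tesseract, image, options = {}) {
  return new Promise((resolve, reject) => {
    tesseract.process(image, options, (err, text) => {
      if (err) reject(err);
      else resolve(text);
    });
  });
}

// Usage, assuming node-tesseract is installed:
// const tesseract = require('node-tesseract');
// const text = await recognize(tesseract, __dirname + '/path/to/image.jpg', { l: 'deu' });
```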
// Recognize German text in a single uniform block of text and set the binary path
var options = {
    l: 'deu',
    psm: 6,
    binary: '/usr/local/bin/tesseract'
};
tesseract.process(__dirname + '/path/to/image.jpg', options, function(err, text) {
    if (err) {
        console.error(err);
    } else {
        console.log(text);
    }
});
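The psm value is a bare number, which is easy to get wrong. The sketch below names the page segmentation modes after Tesseract's PageSegMode enum (values 0 to 10, as in Tesseract 3.x; newer releases add more) and builds the options object from a language list. The helper buildOptions is hypothetical, and joining languages with "+" relies on the Tesseract command line's -l syntax (for example eng+deu); we assume node-tesseract passes the l value through unchanged.

```javascript
// Page segmentation modes, named after Tesseract's PageSegMode enum (3.x values).
const PSM = {
  OSD_ONLY: 0,               // orientation and script detection only
  AUTO_OSD: 1,               // automatic segmentation with OSD
  AUTO_ONLY: 2,              // automatic segmentation, no OSD or OCR
  AUTO: 3,                   // fully automatic segmentation, no OSD (Tesseract's default)
  SINGLE_COLUMN: 4,          // single column of text of variable sizes
  SINGLE_BLOCK_VERT_TEXT: 5, // single uniform block of vertically aligned text
  SINGLE_BLOCK: 6,           // single uniform block of text
  SINGLE_LINE: 7,            // single text line
  SINGLE_WORD: 8,            // single word
  CIRCLE_WORD: 9,            // single word in a circle
  SINGLE_CHAR: 10,           // single character
};

// Hypothetical helper: build an options object for tesseract.process from
// language codes and a named segmentation mode.
function buildOptions({ languages = ['eng'], mode = 'AUTO', binary } = {}) {
  // hasOwnProperty avoids accepting inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(PSM, mode)) {
    throw new RangeError(`Unknown page segmentation mode: ${mode}`);
  }
  const options = { l: languages.join('+'), psm: PSM[mode] };
  if (binary) options.binary = binary;
  return options;
}

console.log(buildOptions({ languages: ['deu'], mode: 'SINGLE_BLOCK', binary: '/usr/local/bin/tesseract' }));
```

The call above produces the same object as the options in the German example.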