Date: 2018-11-24

A JavaScript implementation of a Japanese morphological analyzer. This is a pure JavaScript port of Kuromoji.

You can see how kuromoji.js works on the demo site.

Directory

The directory tree is as follows:

build/
  kuromoji.js
demo/
dict/
example/
src/
test/

Usage

You can tokenize sentences with only 5 lines of code. If you need working examples, you can see the files under the demo or example directory.

Install with npm package manager:

npm install kuromoji

Load this library as follows:

var kuromoji = require("kuromoji");

You can prepare a tokenizer like this:

kuromoji.builder({ dicPath: "path/to/dictionary/dir/" }).build(function (err, tokenizer) {
  var path = tokenizer.tokenize("すもももももももものうち");
  console.log(path);
});
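The callback receives an err argument, so it is worth checking it before using the tokenizer. One way to do that is to wrap the builder in a Promise. The following is a minimal sketch, not part of kuromoji.js: buildTokenizer is a hypothetical helper name, and the library module is passed in as a parameter so the helper can be exercised without loading a dictionary.

```javascript
// Hypothetical helper (not part of kuromoji.js): wraps the callback-style
// builder().build() call shown above in a Promise.
// `kuromoji` is the loaded library module; `dicPath` is the dictionary directory.
function buildTokenizer(kuromoji, dicPath) {
  return new Promise(function (resolve, reject) {
    kuromoji.builder({ dicPath: dicPath }).build(function (err, tokenizer) {
      if (err) {
        // For example, the dictionary files could not be loaded.
        reject(err);
        return;
      }
      resolve(tokenizer);
    });
  });
}

// Usage (assumes the kuromoji package is installed):
//
// buildTokenizer(require("kuromoji"), "path/to/dictionary/dir/")
//   .then(function (tokenizer) {
//     console.log(tokenizer.tokenize("すもももももももものうち"));
//   })
//   .catch(function (err) {
//     console.error("Failed to build tokenizer:", err);
//   });
```

Building the tokenizer loads the dictionary files, so it usually makes sense to build it once and reuse the resulting tokenizer for every sentence.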

Browser

You only need the build/kuromoji.js and dict/*.dat.gz files.

Install with Bower package manager:

bower install kuromoji

Or you can use the kuromoji.js file and dictionary files from the GitHub repository.

In your HTML:

<script src="url/to/kuromoji.js"></script>

In your JavaScript:

kuromoji.builder({ dicPath: "/url/to/dictionary/dir/" }).build(function (err, tokenizer) {
  var path = tokenizer.tokenize("すもももももももものうち");
  console.log(path);
});

API

The tokenize() function returns a JSON array like this:

[ {
  word_id: 509800,
  word_type: 'KNOWN',
  word_position: 1,
  surface_form: '黒文字',
  pos: '名詞',
  pos_detail_1: '一般',
  pos_detail_2: '*',
  pos_detail_3: '*',
  conjugated_type: '*',
  conjugated_form: '*',
  basic_form: '黒文字',
  reading: 'クロモジ',
  pronunciation: 'クロモジ'
} ]

(This is defined in src/util/IpadicFormatter.js)
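Because the result is a plain array of objects, it can be processed with ordinary array methods. The sketch below uses the sample token shown above; surfacesByPos and toReading are hypothetical helper names, and falling back to surface_form when reading is missing or '*' is an assumption made for this example, not documented behavior.

```javascript
// Sample token copied from the tokenize() output shown above.
var tokens = [{
  word_id: 509800,
  word_type: 'KNOWN',
  word_position: 1,
  surface_form: '黒文字',
  pos: '名詞',
  pos_detail_1: '一般',
  pos_detail_2: '*',
  pos_detail_3: '*',
  conjugated_type: '*',
  conjugated_form: '*',
  basic_form: '黒文字',
  reading: 'クロモジ',
  pronunciation: 'クロモジ'
}];

// Hypothetical helper: surface forms of all tokens with the given part of speech.
function surfacesByPos(tokens, pos) {
  return tokens
    .filter(function (t) { return t.pos === pos; })
    .map(function (t) { return t.surface_form; });
}

// Hypothetical helper: concatenate the readings of all tokens.
// Assumption: when `reading` is missing or '*', use the surface form instead.
function toReading(tokens) {
  return tokens
    .map(function (t) {
      return t.reading && t.reading !== '*' ? t.reading : t.surface_form;
    })
    .join('');
}

console.log(surfacesByPos(tokens, '名詞')); // [ '黒文字' ]
console.log(toReading(tokens));             // クロモジ
```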

See also the JSDoc page for details.