mimir: Bag-Of-Words and TF-IDF

2017-12-10

Mimir knows a lot about words

mimir is a JavaScript micro-module that builds a vocabulary of words from a set of texts and produces a vector representation of any text against that vocabulary. It also performs basic TF-IDF analysis.

In natural language processing (NLP) and information retrieval (IR), a bag-of-words model represents a piece of text as a vector. In JavaScript that vector is simply an array of integers, with one entry per vocabulary word. A vector is the indispensable starting point for most machine learning and classification algorithms.
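To make the idea concrete, here is a minimal sketch of a bag-of-words encoder in plain JavaScript. It is not mimir's implementation. The helper names `tokenize`, `buildVocabulary` and `bagOfWords` are hypothetical, and the sketch assumes lowercasing, splitting on anything that is not a letter or digit, and raw counts as vector entries.

```javascript
// Hypothetical helpers illustrating bag-of-words; not mimir's actual code.

// Lowercase the text and keep only runs of ASCII letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Collect every distinct token across all texts, in order of first appearance.
function buildVocabulary(texts) {
  const vocabulary = [];
  const seen = new Set();
  for (const text of texts) {
    for (const token of tokenize(text)) {
      if (!seen.has(token)) {
        seen.add(token);
        vocabulary.push(token);
      }
    }
  }
  return vocabulary;
}

// Count how often each vocabulary word occurs in `text`.
// Words that are not in the vocabulary are ignored.
function bagOfWords(text, vocabulary) {
  const index = new Map(vocabulary.map((word, i) => [word, i]));
  const vector = new Array(vocabulary.length).fill(0);
  for (const token of tokenize(text)) {
    if (index.has(token)) {
      vector[index.get(token)] += 1;
    }
  }
  return vector;
}

const vocabulary = buildVocabulary(["I like chocolate", "Chocolate is great"]);
console.log(vocabulary); // [ 'i', 'like', 'chocolate', 'is', 'great' ]
console.log(bagOfWords("I really like chocolate, chocolate!", vocabulary)); // [ 1, 1, 2, 0, 0 ]
```

The vector length always equals the vocabulary size, so every text encoded against the same vocabulary can be compared position by position. "really" is dropped because it never appeared in the texts the vocabulary was built from.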

mimir disregards all grammar and non-alphanumeric characters.
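A tokenizer that ignores grammar and non-alphanumeric characters can be as small as a single regular expression. The sketch below is one plausible reading of that rule, not mimir's code. The function name `tokenize` is hypothetical, and the Unicode-aware character class (so that accented letters count as alphanumeric) is an assumption.

```javascript
// Hypothetical tokenizer: one plausible reading of "disregards all grammar
// and non-alphanumeric characters". Not mimir's actual code.
function tokenize(text) {
  // \p{L} matches any Unicode letter and \p{N} any number character (requires the u flag).
  // Everything else (punctuation, dashes, quotes, whitespace) separates tokens.
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

console.log(tokenize("I like --boar ragu'")); // [ 'i', 'like', 'boar', 'ragu' ]
console.log(tokenize("I don't like artichokes")); // [ 'i', 'don', 't', 'like', 'artichokes' ]
```

This approach has a side effect on contractions: "don't" splits into `don` and `t`. The text does not say whether mimir treats apostrophes the same way.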

Once your text is a vector, you can feed it to trained classifiers such as an Artificial Neural Network (ANN) or a Support Vector Machine (SVM).
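Training an ANN or SVM does not fit in a short example. The sketch below therefore uses a much simpler classifier instead: nearest neighbour by cosine similarity. Its only purpose is to show that bag-of-words vectors can be passed straight into a classifier. The labelled vectors are made-up toy data over a hypothetical five-word vocabulary, and `cosine` and `classify` are illustrative names rather than part of mimir.

```javascript
// Cosine similarity between two equal-length count vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the label of the training example most similar to `vector`.
// On a tie, the example listed first wins.
function classify(vector, examples) {
  let best = null;
  let bestScore = -Infinity;
  for (const { vector: v, label } of examples) {
    const score = cosine(vector, v);
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Toy vectors over a hypothetical vocabulary:
// [ 'like', 'chocolate', 'great', 'artichokes', 'not' ]
const examples = [
  { vector: [1, 1, 0, 0, 0], label: "positive" },
  { vector: [0, 1, 1, 0, 0], label: "positive" },
  { vector: [1, 0, 0, 1, 1], label: "negative" },
];

console.log(classify([1, 1, 1, 0, 0], examples)); // positive
console.log(classify([1, 0, 0, 1, 1], examples)); // negative
```

Cosine similarity compares the direction of two count vectors and ignores their length. As a result, a long text and a short text that use words in the same proportions count as similar.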

Usage

BOW

var mimir = require('./index'),
  bow = mimir.bow,
  dict = mimir.dict;

// Build the vocabulary from a few sample texts full of punctuation.
var texts = [
    "I like , : ; chocolate",
    "Chocolate; is great",
    "I like --boar ragu'",
    "I don't like artichokes"
  ],
  voc = dict(texts);

// Encode new sentences as vectors against that vocabulary.
console.log(bow("boar like chocolate", voc), bow("Ragu is great and I like it", voc));

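TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) scores how important a word is to one document within a collection. A word scores high when it occurs often in that document but rarely in the others, and low when it is common across the whole collection.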
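The section does not give a formula, so here is a minimal sketch of one common textbook variant. Let tf(t, d) be the raw count of term t in document d. Let idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The score is tf × idf. mimir may use a different weighting or smoothing. The `tfidf` function and its pre-tokenized input are assumptions made for illustration.

```javascript
// One common TF-IDF variant (raw term count × log(N / df)); not necessarily mimir's formula.
// `docs` is an array of token arrays; tokenization is assumed to be done already.
function tfidf(term, doc, docs) {
  // Term frequency: raw count of `term` in this document.
  const tf = doc.filter((token) => token === term).length;
  // Document frequency: how many documents contain `term` at least once.
  const df = docs.filter((d) => d.includes(term)).length;
  if (tf === 0 || df === 0) return 0;
  // Inverse document frequency (natural log): rarer terms get larger weights.
  const idf = Math.log(docs.length / df);
  return tf * idf;
}

const docs = [
  ["i", "like", "chocolate"],
  ["chocolate", "is", "great"],
  ["i", "like", "boar", "ragu"],
  ["i", "don", "t", "like", "artichokes"],
];

console.log(tfidf("chocolate", docs[0], docs).toFixed(3)); // 0.693
console.log(tfidf("boar", docs[2], docs).toFixed(3)); // 1.386
console.log(tfidf("like", docs[0], docs).toFixed(3)); // 0.288
```

"boar" appears in only one of the four documents, so it outweighs "chocolate" (two documents) and "like" (three). A word present in every document would get idf = log(1) = 0 and contribute nothing.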