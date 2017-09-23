openbase logo
lda

by Kory Becker
0.2.0

LDA topic modeling for node.js

LDA

Latent Dirichlet allocation (LDA) topic modeling in JavaScript for Node.js. LDA is a machine learning algorithm that extracts topics and their related keywords from a collection of documents.

In LDA, a document may contain several different topics, each with its own related terms. The algorithm fits a probabilistic model with the number of topics you specify and extracts the keywords related to each topic. For example, a document may contain topics that could be classified as beach-related and weather-related. The beach topic may contain related words, such as sand, ocean, and water. Similarly, the weather topic may contain related words, such as sun, temperature, and clouds.

See http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
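
The idea that one document can mix several topics can be illustrated with a small sketch. This is not the LDA algorithm and not how the lda package works internally: the topics, their weights, and the topicMixture helper are hypothetical and written by hand, and the function only counts which hand-picked topic words appear in a text.

```javascript
// Hypothetical, hand-written topics: each maps a term to an illustrative weight.
var topics = {
    beach: { sand: 0.4, ocean: 0.3, water: 0.3 },
    weather: { sun: 0.4, temperature: 0.3, clouds: 0.3 }
};

// Count how many words of the text belong to each topic,
// then normalize the counts so the shares sum to 1.
function topicMixture(text, topics) {
    var words = text.toLowerCase().match(/[a-z]+/g) || [];
    var counts = {};
    var total = 0;

    Object.keys(topics).forEach(function (name) {
        counts[name] = 0;
        words.forEach(function (word) {
            if (topics[name].hasOwnProperty(word)) {
                counts[name]++;
                total++;
            }
        });
    });

    var mixture = {};
    Object.keys(counts).forEach(function (name) {
        mixture[name] = total === 0 ? 0 : counts[name] / total;
    });
    return mixture;
}

var mixture = topicMixture('The sun warmed the sand while clouds drifted over the ocean.', topics);
console.log(mixture); // { beach: 0.5, weather: 0.5 }
```

A real LDA run learns both the topics and the per-document mixtures from the data instead of taking the topics as given.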

$ npm install lda

Usage

var lda = require('lda');

// Example document.
var text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';

// Extract sentences.
var documents = text.match( /[^\.!\?]+[\.!\?]+/g );

// Run LDA to get terms for 2 topics (5 terms each).
var result = lda(documents, 2, 5);

The above example produces a result like the following, with two topics (topic 1 is "cat-related", topic 2 is "dog-related"). The number after each term is its probability within the topic:

Topic 1
cats (0.21)
dogs (0.19)
small (0.1)
mice (0.1)
chase (0.1)

Topic 2
dogs (0.21)
cats (0.19)
big (0.11)
eat (0.1)
bones (0.1)
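
The sentence-splitting step in the usage example leaves leading whitespace on every match after the first, and String.prototype.match returns null when the text contains no terminating punctuation. A minimal sketch of a more defensive splitter follows; splitSentences is a hypothetical helper, not part of the lda package.

```javascript
// Hypothetical helper (not part of the lda package): split text into trimmed sentences.
function splitSentences(text) {
    // Same pattern as the usage example (escapes inside a character class are optional).
    // match() returns null when nothing matches, so fall back to an empty array.
    var matches = text.match(/[^.!?]+[.!?]+/g) || [];
    return matches.map(function (sentence) {
        return sentence.trim();
    });
}

var documents = splitSentences('Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.');
console.log(documents);
// [ 'Cats are small.', 'Dogs are big.', 'Cats like to chase mice.', 'Dogs like to eat bones.' ]

console.log(splitSentences('no punctuation here')); // []
```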

Output

LDA returns an array of topics, each containing an array of terms. The result has the following format:

[ [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'small', probability: 0.1 },
    { term: 'mice', probability: 0.1 },
    { term: 'chase', probability: 0.1 } ],
  [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'bones', probability: 0.11 },
    { term: 'eat', probability: 0.1 },
    { term: 'big', probability: 0.099 } ] ]

The result can be traversed as follows:

var result = lda(documents, 2, 5);

// For each topic.
for (var i = 0; i < result.length; i++) {
    var row = result[i];
    console.log('Topic ' + (i + 1));

    // For each term, print its probability (a fraction between 0 and 1, not a percentage).
    for (var j = 0; j < row.length; j++) {
        var term = row[j];
        console.log(term.term + ' (' + term.probability + ')');
    }

    console.log('');
}
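
Because the result is a plain array of arrays, it can also be post-processed with the standard array methods. The sketch below works on a hard-coded copy of the sample output shown earlier, so it runs without the package installed; describeTopics and its minProbability threshold are hypothetical names introduced for illustration.

```javascript
// Sample result copied from the output format shown in this README.
var result = [
    [ { term: 'dogs', probability: 0.2 },
      { term: 'cats', probability: 0.2 },
      { term: 'small', probability: 0.1 },
      { term: 'mice', probability: 0.1 },
      { term: 'chase', probability: 0.1 } ],
    [ { term: 'dogs', probability: 0.2 },
      { term: 'cats', probability: 0.2 },
      { term: 'bones', probability: 0.11 },
      { term: 'eat', probability: 0.1 },
      { term: 'big', probability: 0.099 } ]
];

// Hypothetical helper: keep terms at or above minProbability
// and join them into one label per topic.
function describeTopics(result, minProbability) {
    return result.map(function (topic, index) {
        var terms = topic
            .filter(function (entry) { return entry.probability >= minProbability; })
            .map(function (entry) { return entry.term; });
        return 'Topic ' + (index + 1) + ': ' + terms.join(', ');
    });
}

var labels = describeTopics(result, 0.1);
console.log(labels.join('\n'));
// Topic 1: dogs, cats, small, mice, chase
// Topic 2: dogs, cats, bones, eat
```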

Additional Languages

LDA uses stop-words to ignore common terms in the text (for example: this, that, it, we). By default, the English stop-words list is used. To use other languages, specify an array of language ids as the fourth argument, as follows:

// Use English (this is the default).
result = lda(documents, 2, 5, ['en']);

// Use German.
result = lda(documents, 2, 5, ['de']);

// Use English + German.
result = lda(documents, 2, 5, ['en', 'de']);

To add a new language-specific stop-words list, create a file /lda/lib/stopwords_XX.js, where XX is the id of the language. For example, a French stop-words list would be named "stopwords_fr.js". The contents of the file should follow the format of an existing stop-words list, as follows:

exports.stop_words = [
    'cette',
    'que',
    'une',
    'il'
];
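
To show what combining several stop-words lists amounts to, here is a minimal sketch. It is an illustration under stated assumptions, not the package's implementation: stopWordLists and removeStopWords are hypothetical names, and the list entries are only the example words quoted in this section.

```javascript
// Hypothetical stop-words lists keyed by language id.
// The entries are the example words from this section, not the package's full lists.
var stopWordLists = {
    en: ['this', 'that', 'it', 'we'],
    fr: ['cette', 'que', 'une', 'il']
};

// Illustrative only: merge the lists for the requested language ids,
// then drop every word that appears in the merged set.
function removeStopWords(words, languages) {
    var stopWords = {};
    languages.forEach(function (id) {
        (stopWordLists[id] || []).forEach(function (word) {
            stopWords[word] = true;
        });
    });
    return words.filter(function (word) {
        return !stopWords.hasOwnProperty(word.toLowerCase());
    });
}

var words = ['We', 'like', 'cette', 'plage'];
console.log(removeStopWords(words, ['en']));       // [ 'like', 'cette', 'plage' ]
console.log(removeStopWords(words, ['en', 'fr'])); // [ 'like', 'plage' ]
```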

Setting a Random Seed

A specific random seed can be used to reproduce the same terms and probabilities on subsequent runs. Pass the seed as the seventh argument, as follows:

// Use the random seed 123.
result = lda(documents, 2, 5, null, null, null, 123);
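
Why a fixed seed gives repeatable output can be seen with any seeded pseudo-random generator. The sketch below uses a small 32-bit generator in the style of mulberry32; this is an assumption chosen for illustration, since this README does not say which generator the package uses. createRandom and sample are hypothetical names.

```javascript
// Generic seeded generator in the style of mulberry32.
// Assumption for illustration only: not necessarily the generator lda uses.
function createRandom(seed) {
    var state = seed >>> 0;
    return function () {
        state = (state + 0x6D2B79F5) | 0;
        var t = Math.imul(state ^ (state >>> 15), state | 1);
        t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
        // Scale the unsigned 32-bit value into the range [0, 1).
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Draw `count` numbers from a generator created with the given seed.
function sample(seed, count) {
    var random = createRandom(seed);
    var values = [];
    for (var i = 0; i < count; i++) {
        values.push(random());
    }
    return values;
}

var first = sample(123, 3);
var second = sample(123, 3);

// The same seed always yields the same sequence.
console.log(first.join() === second.join()); // true
```

An algorithm that draws all of its random numbers from such a generator makes the same choices on every run with the same seed, which is what makes the resulting terms and probabilities repeatable.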

Author

Kory Becker http://www.primaryobjects.com

Based on the original JavaScript implementation: https://github.com/awaisathar/lda.js
