node-minhash

MinHash algorithm in Node

A simple command-line tool for comparing text files using the MinHash algorithm and contrasting the result with the Jaccard index.

Installation

If you have just cloned this repository, run the following:

npm install
npm link

Or, if you would like to install globally:

npm install https://github.com/sjhorn/node-minhash -g

Command line tool usage

Using node

minhash file1.txt file2.txt

minhash https://file.com/page1.html https://file.com/page2.html

Using the library

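var minhash = require('node-minhash');

// string1 and string2 are the two texts to compare (example values)
var string1 = 'the quick brown fox jumps over the lazy dog';
var string2 = 'the quick brown fox leaps over the lazy dog';
minhash.summary(string1, string2);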

Methods

.summary(file1, file2)

Compare two text strings using both MinHash and the Jaccard index, and print a summary.

.compare(file1, file2)

Compare two text strings using both MinHash and the Jaccard index.
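
To illustrate the MinHash side of this comparison, here is a minimal sketch of the general technique, not this package's actual code. It assumes each text has already been reduced to a list of integer hashes (for example CRC-32 values of its shingles). The names makeCoefficients, signature and estimateSimilarity, the number of hash functions (128) and the seed are all hypothetical choices made for the example.

```javascript
// Sketch only: a generic MinHash estimator, not node-minhash's implementation.
const PRIME = 4294967311n; // smallest prime above 2^32

// Deterministic generator for hash coefficients, so runs are repeatable.
function makeCoefficients(count, seed) {
  let state = BigInt(seed);
  const next = () => {
    state = (state * 6364136223846793005n + 1442695040888963407n) % (1n << 64n);
    return (state >> 32n) % (PRIME - 1n) + 1n; // value in [1, PRIME - 1]
  };
  const coefficients = [];
  for (let i = 0; i < count; i++) {
    coefficients.push({ a: next(), b: next() });
  }
  return coefficients;
}

// Signature: for each hash function (a * x + b) mod PRIME,
// keep the minimum value seen over all members of the set.
function signature(hashes, coefficients) {
  return coefficients.map(({ a, b }) => {
    let min = PRIME;
    for (const h of hashes) {
      const value = (a * BigInt(h) + b) % PRIME;
      if (value < min) min = value;
    }
    return min;
  });
}

// Estimated similarity: the fraction of signature positions that agree.
function estimateSimilarity(sigA, sigB) {
  let matches = 0;
  for (let i = 0; i < sigA.length; i++) {
    if (sigA[i] === sigB[i]) matches++;
  }
  return matches / sigA.length;
}

const coefficients = makeCoefficients(128, 42);
const hashesA = [1, 2, 3, 4, 5, 6, 7, 8];
const hashesB = [1, 2, 3, 4, 5, 6, 9, 10];
const estimate = estimateSimilarity(
  signature(hashesA, coefficients),
  signature(hashesB, coefficients)
);
// The exact Jaccard index of these two sets is 6 / 10 = 0.6;
// the printed value is an approximation of it.
console.log(estimate);
```

More hash functions give a tighter estimate at the cost of more work per document. The signatures are small and fixed in size, which is why MinHash is useful when comparing many large documents.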

.shingles(string, words_per_single=2)

Convert a string to a set of shingles, using a default of 2 words per shingle, after tokenising it with the natural library's default tokeniser.
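
As a rough illustration of word shingling, the sketch below builds the set of overlapping word pairs from a string. It is not the package's code: the tokenise helper is a hypothetical stand-in that splits on non-word characters, whereas the package uses the natural library's tokeniser, so the exact tokens may differ.

```javascript
// Sketch only: a stand-in tokeniser that lower-cases the text and splits on
// non-word characters. The package itself delegates this to the natural library.
function tokenise(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Build the set of word shingles: every run of `wordsPerShingle`
// consecutive tokens, joined with a single space.
function shingles(text, wordsPerShingle = 2) {
  const tokens = tokenise(text);
  const result = new Set();
  for (let i = 0; i + wordsPerShingle <= tokens.length; i++) {
    result.add(tokens.slice(i, i + wordsPerShingle).join(' '));
  }
  return result;
}

console.log([...shingles('The quick brown fox')]);
// [ 'the quick', 'quick brown', 'brown fox' ]
```

A text with fewer words than the shingle size yields an empty set in this sketch.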

.jaccardIndex(string1, string2)

Compare two strings by tokenising them into shingles and then taking the ratio of the size of the shingle intersection to the size of the shingle union.
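
The intersection-over-union calculation can be sketched as follows. This hypothetical helper, jaccardOfSets, takes two sets that have already been shingled, unlike the package method, which takes strings. Returning 1 for two empty sets is a convention chosen for this sketch.

```javascript
// Sketch only: Jaccard index of two sets = |A ∩ B| / |A ∪ B|.
function jaccardOfSets(setA, setB) {
  // Convention for this sketch: two empty sets count as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

const a = new Set(['the quick', 'quick brown', 'brown fox']);
const b = new Set(['the quick', 'quick red', 'red fox']);
console.log(jaccardOfSets(a, b)); // 0.2 (1 shared shingle out of 5 distinct)
```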

.shingleHashList(set)

Convert a set of shingles to a set of CRC-32 hashes.
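
For illustration, the sketch below hashes each shingle with a hand-written, table-driven CRC-32 (the standard reflected polynomial 0xEDB88320) and collects the results in a set. The helper names crc32 and hashShingles are hypothetical, and the package may well rely on an existing CRC-32 module rather than code like this.

```javascript
// Sketch only: precompute the CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// CRC-32 of a string's UTF-8 bytes, returned as an unsigned 32-bit integer.
function crc32(text) {
  const bytes = Buffer.from(text, 'utf8');
  let crc = 0xFFFFFFFF;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Map every shingle in the set to its CRC-32 value.
function hashShingles(shingleSet) {
  return new Set([...shingleSet].map((shingle) => crc32(shingle)));
}

console.log(crc32('123456789').toString(16)); // cbf43926 (the standard CRC-32 check value)
console.log(hashShingles(new Set(['the quick', 'quick brown'])).size);
```

Working with 32-bit integers instead of strings keeps the later MinHash arithmetic cheap, at the cost of a small chance that two different shingles collide on the same hash.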
