Overview

Wuzzy was created to provide a small collection of similarity identification utilities. Several similarity identification algorithm implementations are provided, including:

Jaccard similarity coefficient

Tanimoto coefficient

Pearson correlation

N-gram edit distance

Levenshtein distance

Jaro-Winkler distance

Fuzzy wuzzy was a bear, fuzzy wuzzy had no hair, fuzzy wuzzy wasn't very fuzzy, was he? Well, if you aren't sure maybe this library can help! :)

Installing

Wuzzy can be installed via npm (npm install wuzzy).

Examples

Some examples of using Wuzzy can be found in the real-wuzzy repository.

Methods

All bad jokes aside, below is a listing of the available functions. Have fun!

Computes the Jaro-Winkler distance for two given strings or arrays.

NOTE: this implementation is based on the one found in the Lucene Java library.

wuzzy.jarowinkler(['D', 'W', 'A', 'Y', 'N', 'E'], ['D', 'U', 'A', 'N', 'E']);

or

wuzzy.jarowinkler('DWAYNE', 'DUANE');

String|Array a - the first string/array to compare

String|Array b - the second string/array to compare

Number t - the threshold for adding

Number returns the jaro-winkler distance for a and b
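To make the role of the threshold concrete, here is a minimal standalone sketch of the textbook Jaro-Winkler measure. It is not Wuzzy's source code: the name jaroWinklerSketch, the default threshold of 0.7, the 0.1 prefix scaling factor, and the 4-item prefix cap are all assumptions taken from the common formulation, and Wuzzy's Lucene-based implementation may differ in detail.

```javascript
// Illustrative only: a textbook Jaro-Winkler, not the Wuzzy implementation.
// Works on strings and arrays because both support indexing and .length.
function jaroSketch(a, b) {
  if (a.length === 0 && b.length === 0) return 1;
  // Items only count as matching when they sit within this window of each other.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk the matched items in order; out-of-order pairs are half-transpositions.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// threshold and scaling defaults are assumptions, not values read from Wuzzy.
function jaroWinklerSketch(a, b, threshold = 0.7, scaling = 0.1) {
  const jaro = jaroSketch(a, b);
  if (jaro < threshold) return jaro; // below the threshold, no prefix bonus
  const limit = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < limit && a[prefix] === b[prefix]) prefix++;
  return jaro + prefix * scaling * (1 - jaro);
}
```

With this sketch, jaroWinklerSketch('DWAYNE', 'DUANE') comes out at roughly 0.84: four matching letters, no transpositions, and a shared prefix of one letter.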

Calculates the Levenshtein distance for the two provided strings or arrays and returns the normalized distance.

wuzzy.levenshtein(['D', 'W', 'A', 'Y', 'N', 'E'], ['D', 'U', 'A', 'N', 'E']);

or

wuzzy.levenshtein('DWAYNE', 'DUANE');

String|Array a - the first string/array to compare

String|Array b - the second string/array to compare

Object w - (optional) a set of key/value pairs

Number returns the levenshtein distance for a and b
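As a rough illustration of what "normalized" can mean here, the sketch below computes the classic unit-cost edit distance and then rescales it to the range 0 to 1. The function names and the normalization formula (1 minus distance divided by the longer length) are assumptions for illustration; Wuzzy's own normalization and its optional weights argument are not reproduced.

```javascript
// Illustrative only: unit-cost Levenshtein with a two-row table.
function levenshteinDistanceSketch(a, b) {
  // prev[j] holds the distance between the first i-1 items of a and the first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function normalizedLevenshteinSketch(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - levenshteinDistanceSketch(a, b) / longest;
}
```

For 'DWAYNE' and 'DUANE' the raw distance is 2 (substitute W with U, delete Y), so this normalization gives 1 - 2/6, about 0.667.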

Computes the n-gram edit distance for any n (defaults to 2).

NOTE: this implementation is based on the one found in the Lucene Java library.

wuzzy.ngram(['D', 'W', 'A', 'Y', 'N', 'E'], ['D', 'U', 'A', 'N', 'E']);

or

wuzzy.ngram('DWAYNE', 'DUANE');

String|Array a - the first string/array to compare

String|Array b - the second string/array to compare

Number ng - (optional) the n-gram size to work with (defaults to 2)

Number returns the ngram distance for a and b

Calculates a Pearson correlation score for two given objects (compares the values of keys present in both).

wuzzy.pearson({a: 2.5, b: 3.5, c: 3.0, d: 3.5, e: 2.5, f: 3.0}, {a: 3.0, b: 3.5, c: 1.5, d: 5.0, e: 3.5, f: 3.0, g: 5.0}); // -> 0.396

or

wuzzy.pearson({a: 2.5, b: 1}, {o: 3.5, e: 6.0}); // -> 1.0

Object a - the first object to compare

Object b - the second object to compare

Number returns the pearson correlation for a and b
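The two example results can be reproduced by hand with the standard Pearson formula restricted to shared keys. The sketch below is a standalone illustration, not Wuzzy's source; pearsonSketch is a hypothetical name, and returning 1 when no keys are shared is an assumption chosen to agree with the second example above.

```javascript
// Illustrative only: Pearson correlation over the keys present in both objects.
function pearsonSketch(a, b) {
  const shared = Object.keys(a).filter((key) =>
    Object.prototype.hasOwnProperty.call(b, key)
  );
  const n = shared.length;
  // Assumption: no shared keys yields 1, matching the documented example output.
  if (n === 0) return 1;

  let sumA = 0;
  let sumB = 0;
  let sumASq = 0;
  let sumBSq = 0;
  let sumAB = 0;
  for (const key of shared) {
    const x = a[key];
    const y = b[key];
    sumA += x;
    sumB += y;
    sumASq += x * x;
    sumBSq += y * y;
    sumAB += x * y;
  }

  const numerator = sumAB - (sumA * sumB) / n;
  const denominator = Math.sqrt(
    (sumASq - (sumA * sumA) / n) * (sumBSq - (sumB * sumB) / n)
  );
  // A zero (or numerically invalid) denominator means one side has no variance.
  if (!(denominator > 0)) return 0;
  return numerator / denominator;
}
```

For the first example the shared keys are a through f (g is ignored), the numerator works out to 1.0 and the denominator to the square root of 6.375, which gives approximately 0.396.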

Calculates the Jaccard index for the two provided strings or arrays.

wuzzy.jaccard(['a', 'b', 'c', 'd', 'e', 'f'], ['a', 'e', 'f']);

or

wuzzy.jaccard('abcdef', 'aef');

or

wuzzy.jaccard(['abe', 'babe', 'cabe', 'dabe', 'eabe', 'fabe'], ['babe']);

String|Array a - the first string/array to compare

String|Array b - the second string/array to compare

Number returns the jaccard index for a and b
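For reference, the Jaccard index is the size of the intersection divided by the size of the union. The standalone sketch below shows that definition using Set; jaccardSketch is a hypothetical name and this is not Wuzzy's source, so edge cases (such as two empty inputs, treated here as identical) may be handled differently by the library.

```javascript
// Illustrative only: Jaccard index = |A ∩ B| / |A ∪ B| over distinct items.
// Strings are treated as sets of characters, arrays as sets of elements.
function jaccardSketch(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  // Assumption: two empty inputs are considered identical.
  if (setA.size === 0 && setB.size === 0) return 1;

  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (setA.size + setB.size - shared);
}
```

Under this definition, 'abcdef' and 'aef' share 3 of 6 distinct characters, giving 0.5, and the six-word array compared with ['babe'] shares 1 of 6 distinct words.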

Calculates the Tanimoto distance (weighted Jaccard index).

wuzzy.tanimoto(['a', 'b', 'c', 'd', 'd', 'e', 'f', 'f'], ['a', 'e', 'f']);

or

wuzzy.tanimoto('abcddeff', 'aef');

or

wuzzy.tanimoto(['abe', 'babe', 'cabe', 'dabe', 'eabe', 'fabe', 'fabe'], ['babe']);

String|Array a - the first string/array to compare

String|Array b - the second string/array to compare
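One common way to read "weighted Jaccard index" is to count how often each item occurs and compare the counts, so that duplicates matter. The sketch below follows that reading (sum of per-item minimum counts divided by sum of per-item maximum counts). This is an assumption about the intended definition, not a copy of Wuzzy's implementation, and the names countItemsSketch and tanimotoSketch are hypothetical.

```javascript
// Illustrative only: multiset (weighted) Jaccard, one reading of "Tanimoto".
function countItemsSketch(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  return counts;
}

function tanimotoSketch(a, b) {
  const countsA = countItemsSketch(a);
  const countsB = countItemsSketch(b);
  const keys = new Set([...countsA.keys(), ...countsB.keys()]);

  let minSum = 0; // weighted intersection
  let maxSum = 0; // weighted union
  for (const key of keys) {
    const x = countsA.get(key) || 0;
    const y = countsB.get(key) || 0;
    minSum += Math.min(x, y);
    maxSum += Math.max(x, y);
  }
  // Assumption: two empty inputs are considered identical.
  return maxSum === 0 ? 1 : minSum / maxSum;
}
```

Under this reading, 'abcddeff' compared with 'aef' gives 3/8 = 0.375: the repeated d and f enlarge the weighted union from 6 to 8, whereas a plain Jaccard index over distinct characters would give 0.5.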