Date: 2015-02-28
A simple-but-useful kNN library for Node.js that compares JSON objects using Euclidean distance and returns the k closest objects.
Supports normalized, weighted Euclidean distances: attributes can be weighted individually and normalized by their standard deviation.
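To make the idea of a normalized, weighted distance concrete, here is a minimal sketch in plain JavaScript. It is not Alike's source code: the function names `stdDev` and `normalizedWeightedDistance` are hypothetical, and it assumes a population standard deviation, a default weight of 1 for unweighted attributes, and a squared distance (no square root, which does not change the ranking).

```javascript
// Sketch only: not Alike's source. All function names are hypothetical.

// Population standard deviation of one attribute across all objects.
function stdDev(objects, attr) {
  const values = objects.map((o) => o[attr]);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// Weighted squared distance, with each difference divided by the
// attribute's standard deviation. Assumptions: attributes without a weight
// get weight 1, and a zero deviation is treated as 1 to avoid dividing by 0.
function normalizedWeightedDistance(subject, object, weights, stdDevs) {
  let total = 0;
  for (const attr of Object.keys(subject)) {
    const scale = stdDevs[attr] || 1;
    const weight = attr in weights ? weights[attr] : 1;
    const diff = (subject[attr] - object[attr]) / scale;
    total += weight * diff * diff;
  }
  return total;
}

// Two attributes on very different scales.
const objects = [
  { height: 150, age: 1 },
  { height: 170, age: 3 },
  { height: 190, age: 5 },
];
const stdDevs = {
  height: stdDev(objects, 'height'),
  age: stdDev(objects, 'age'),
};

// After normalization both attributes contribute equally (about 1.5 each),
// even though the raw height difference is ten times the age difference.
const distance = normalizedWeightedDistance(
  { height: 170, age: 3 },
  objects[0],
  {},
  stdDevs
);
```

Without the division by the standard deviation, the height difference (20) would swamp the age difference (2), which is the situation normalization is meant to address.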
Features
- key and filter attributes to do the data assembly for you, Lisp style!
subject: vantage point object - will consider each attribute present in this object as a feature
objects: array of objects that should all have at least the attributes of subject
options:
- k: (default = unlimited) specifies how many objects to return
- standardize: (default = false) if true, standardizes every attribute by its standard deviation - set this to true if your attributes do not share the same scale
- weights: (default = {}) a hash describing the weights of each attribute
- key: (default = none) a key function mapped over objects, for when the attributes to compare are nested inside each object
e.g. if subject is {a: 0} and objects are [{x: {a: 0}}, {x: {a: 2}}], then provide key: function(o) {return o.x}
- filter: (default = none) a filter function that returns true for items to be considered
e.g. to only consider objects with non-negative a: function(o) {return o.a >= 0}
- debug: (default = false) if true, for every object will return distances of individual attributes as well as the overall distance from the subject under a property called 'debug'
e.g. if subject is {a:0, b:0} and object is {a:3, b:4}, the returned object will be {a: 3, b: 4, debug: {distance:25, details: {a: 9, b: 16}}}
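The way k, key, filter and debug fit together can be sketched in a few lines of plain JavaScript. This is an illustration under stated assumptions, not Alike's implementation: `nearestSketch` is a hypothetical name, weights and standardize are left out for brevity, the filter is assumed to receive the original object, and the distance is the sum of squared differences (which matches the numbers in the debug example above).

```javascript
// Sketch only: mirrors the documented options, not Alike's actual source.
function nearestSketch(subject, objects, options = {}) {
  const { k, key, filter, debug = false } = options;
  const attrs = Object.keys(subject);

  // Assumption: the filter sees the original (un-keyed) object.
  const candidates = filter ? objects.filter(filter) : objects;

  const scored = candidates.map((object) => {
    // The key function extracts the part of the object holding the features.
    const features = key ? key(object) : object;
    const details = {};
    let distance = 0;
    for (const attr of attrs) {
      details[attr] = (subject[attr] - features[attr]) ** 2;
      distance += details[attr];
    }
    return { object, distance, details };
  });

  // Closest first; keep everything when k is not given.
  scored.sort((x, y) => x.distance - y.distance);
  const top = k === undefined ? scored : scored.slice(0, k);

  return top.map(({ object, distance, details }) =>
    debug ? { ...object, debug: { distance, details } } : object
  );
}

const result = nearestSketch(
  { a: 0, b: 0 },
  [{ a: 3, b: 4 }, { a: 1, b: 1 }, { a: -5, b: 0 }],
  { k: 2, filter: (o) => o.a >= 0, debug: true }
);
// result[0] is { a: 1, b: 1, debug: { distance: 2, details: { a: 1, b: 1 } } }
// result[1] is { a: 3, b: 4, debug: { distance: 25, details: { a: 9, b: 16 } } }
```

For nested data, `nearestSketch({ a: 0 }, [{ x: { a: 2 } }, { x: { a: 0 } }], { key: (o) => o.x })` puts `{ x: { a: 0 } }` first, since the key function is used only for the comparison and the original objects are returned.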
Given John Foo's taste for movies:
| Attribute  | Value | Weight |
|------------|-------|--------|
| explosions | 8     | 10%    |
| romance    | 3     | 30%    |
| length     | 5     | 5%     |
| humour     | 6     | 5%     |
| pigeons    | 10    | 50%    |
John Foo would like to rent a movie tonight that most closely matches his movie tastes. He collected a DB of movies with numerical values ranging from 1 to 10 for each of the 5 attributes listed above (don't ask how).
John Foo loves his pigeons. They are the most important attribute to him, so they carry 50% of the weight. He does not like romance and wants to make sure that he avoids sappy movies. He also likes mid-length, semi-funny movies with explosions, but he doesn't care as much about those, as long as the movie features peaceful pigeons.
Perfect case for Alike!
To install and add it to your package.json:
    $ npm install alike --save
Now you can load up the module and use it like so:
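    const knn = require('alike');
    // Return the 10 closest movies, weighting each attribute as in the table.
    const options = {
      k: 10,
      weights: {
        explosions: 0.1,
        romance: 0.3,
        length: 0.05,
        humour: 0.05,
        pigeons: 0.5
      }
    };
    // John Foo's taste is the subject that every movie is compared against.
    const movieTaste = {
      explosions: 8,
      romance: 3,
      length: 5,
      humour: 6,
      pigeons: 10
    };
    const topMovies = knn(movieTaste, movies, options);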
Here movies is an array of objects that have at least those 5 attributes. The call returns the top 10 movies from the array. Enjoy! :)
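To see why the weights matter, here is a small worked example in plain JavaScript. The two movies and the helper `weightedSquaredDistance` are hypothetical, only two of John's attributes are used for brevity, and the sketch assumes each weight multiplies the squared difference of its attribute; Alike's own formula may differ in detail.

```javascript
// Hypothetical data for illustration; restricted to two of John's attributes.
const taste = { explosions: 8, pigeons: 10 };
const weights = { explosions: 0.1, pigeons: 0.5 };
const movies = [
  { title: 'Blast Radius', explosions: 8, pigeons: 4 },
  { title: 'Coo', explosions: 2, pigeons: 10 },
];

// Assumption: each weight multiplies the squared difference of its attribute.
function weightedSquaredDistance(subject, object, attrWeights) {
  return Object.keys(subject).reduce(
    (sum, attr) => sum + attrWeights[attr] * (subject[attr] - object[attr]) ** 2,
    0
  );
}

const ranked = movies
  .map((movie) => ({
    title: movie.title,
    distance: weightedSquaredDistance(taste, movie, weights),
  }))
  .sort((x, y) => x.distance - y.distance);
// ranked[0].title is 'Coo': its distance is about 3.6, versus 18 for 'Blast Radius'.
```

Unweighted, both movies would be equally far from John's taste (a squared distance of 36 each). With the weights applied, missing out on pigeons costs five times as much as missing out on explosions, so the pigeon movie wins.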
Alike is written in CoffeeScript in the coffee/ folder. You may use make coffee to compile and watch for changes. Unit tests are in the coffee/test/ folder. You can run the tests with npm test or, if you are developing, you may use make watch-test to watch while you TDD. :)
Run the benchmark with coffee benchmark/. It takes about 1 minute on a MacBook Air.
The benchmarks are designed to reflect realistically sized sets of data. They don't ship with the npm package, to keep things light.
Alike is licensed under the terms of the GNU Lesser General Public License, known as the LGPL.