The NGram class extends the Python 'set' class with efficient fuzzy search for members by means of an N-gram similarity measure. It also has static methods to compare a pair of strings.
The N-grams are character-based, not word-based, and the class does not implement a language model; it merely searches for members by string similarity.
See the documentation_, which includes a tutorial and release notes.
Use the `GitHub issue tracker`_ to report issues.
To install python-ngram from PyPI_ using pip::

    pip install ngram

The set stores arbitrary items, but for non-string items a ``key`` function
(such as ``str``) must be specified to provide a string representation. The key
function can also be used to normalise string items (e.g. lower-casing) prior
to N-gram indexing.
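
The role of the key function can be illustrated with a minimal pure-Python sketch. It does not use the python-ngram API; the helper names ``ngrams`` and ``index_items`` and the sample data are hypothetical, and padding with ``N - 1`` copies of ``$`` is an assumption made for illustration.

.. code-block:: python

    from collections import defaultdict

    def ngrams(text, n=3, pad_char="$"):
        """Return the set of overlapping n-grams of the padded text."""
        padding = pad_char * (n - 1)  # assumed padding length: n - 1
        padded = padding + text + padding
        return {padded[i:i + n] for i in range(len(padded) - n + 1)}

    def index_items(items, key=str, n=3):
        """Map each n-gram of key(item) to the items that contain it."""
        index = defaultdict(set)
        for item in items:
            for gram in ngrams(key(item), n):
                index[gram].add(item)
        return index

    # Non-string items: the key extracts the name and lower-cases it.
    people = [(1, "Alice"), (2, "ALICIA"), (3, "Bob")]
    index = index_items(people, key=lambda person: person[1].lower())
    matches = index["ali"]  # both (1, "Alice") and (2, "ALICIA")

Because the key lower-cases each name before indexing, items that differ only in case share the same N-grams.
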
To index a string, the class pads it with a specified dummy character, splits it into overlapping substrings of N characters (default N=3), and associates each N-gram with the items that use it.
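
The indexing step described above might look roughly like the following sketch. It is not the library's implementation: the function names ``pad``, ``ngrams`` and ``build_index`` are hypothetical, and padding both ends with ``N - 1`` copies of ``$`` is an assumption.

.. code-block:: python

    from collections import defaultdict

    def pad(text, n=3, pad_char="$"):
        """Pad both ends with n - 1 dummy characters (assumed length)."""
        padding = pad_char * (n - 1)
        return padding + text + padding

    def ngrams(text, n=3, pad_char="$"):
        """Split the padded text into overlapping substrings of length n."""
        padded = pad(text, n, pad_char)
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def build_index(items, n=3):
        """Associate each n-gram with the set of items that use it."""
        index = defaultdict(set)
        for item in items:
            for gram in ngrams(item, n):
                index[gram].add(item)
        return index

    grams = ngrams("ham")  # ['$$h', '$ha', 'ham', 'am$', 'm$$']
    index = build_index(["spam", "span", "eggs"])

With this padding, characters at the start and end of a string contribute to as many N-grams as characters in the middle.
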
To find items similar to a query string, it splits the query into N-grams, collects all items sharing at least one N-gram with the query, and ranks the items by score based on the ratio of shared to unshared N-grams between strings.
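
A search along these lines can be sketched as follows. The scoring here is an assumption: it divides the number of shared N-grams by the number of distinct N-grams in either string (a Jaccard-style ratio), which may differ from the formula the library uses. The names ``ngrams`` and ``search`` are hypothetical.

.. code-block:: python

    from collections import defaultdict

    def ngrams(text, n=3, pad_char="$"):
        """Return the set of overlapping n-grams of the padded text."""
        padding = pad_char * (n - 1)  # assumed padding length: n - 1
        padded = padding + text + padding
        return {padded[i:i + n] for i in range(len(padded) - n + 1)}

    def search(query, items, n=3):
        """Rank items that share at least one n-gram with the query."""
        # Build the n-gram -> items index.
        index = defaultdict(set)
        for item in items:
            for gram in ngrams(item, n):
                index[gram].add(item)

        # Collect every item sharing at least one n-gram with the query.
        query_grams = ngrams(query, n)
        candidates = set()
        for gram in query_grams:
            candidates |= index.get(gram, set())

        # Score each candidate: shared n-grams over all distinct n-grams.
        results = []
        for item in candidates:
            item_grams = ngrams(item, n)
            shared = len(query_grams & item_grams)
            total = len(query_grams | item_grams)
            results.append((item, shared / total))

        # Highest score first; ties broken alphabetically.
        return sorted(results, key=lambda pair: (-pair[1], pair[0]))

    results = search("spa", ["spam", "span", "eggs", "spa"])
    # [('spa', 1.0), ('spam', 0.375), ('span', 0.375)]

Items with no N-gram in common with the query, such as ``"eggs"`` here, are never scored, which is what keeps the lookup cheap on large sets.
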
In 2007, Michel Albert (exhuma) wrote the python-ngram module based on Perl's
`String::Trigram`_ module by Tarek Ahmed, and committed the code for 2.0.0b2 to
the Sourceforge_ subversion repo.
Since late 2008, Graham Poulter has maintained python-ngram, initially refactoring
it to build on the ``set`` class, and also adding features, documentation, tests,
performance improvements and Python 3 support.
Development takes place on GitHub_. After checking out the repo, run ``tox`` to
build the Sphinx documentation and run the tests. Run ``pip install -e .`` to
install the module in editable mode, inside a virtualenv.

.. _documentation: https://python-ngram.readthedocs.io/en/latest/
.. _GitHub: http://github.com/gpoulter/python-ngram
.. _GitHub issue tracker: https://github.com/gpoulter/python-ngram/issues
.. _PyPI: http://pypi.python.org/pypi/ngram
.. _String::Trigram: http://search.cpan.org/dist/String-Trigram/
.. _Sourceforge: https://sourceforge.net/projects/python-ngram/