napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification that focuses on implementing various methods for Probabilistic Label Trees. It allows training a classifier on very large datasets in just a few lines of code with minimal resources.
Right now, napkinXC implements the following features both in Python and C++:
Please note that this library is still under development and also serves as a base for experiments. The API may not be compatible between releases, and some experimental features may not be documented. Do not hesitate to open an issue if you have a question or a problem!
napkinXC is distributed under the MIT license. All contributions to the project are welcome!
Install via pip:
pip install napkinxc
We provide precompiled wheels for many Linux distros, macOS, and Windows for Python 3.7+. If there is no wheel for your OS, the package will be compiled from source during installation. Compiling from source requires a modern C++17 compiler, CMake, Git, and Python 3.7+.
The latest (master) version can be installed directly from the GitHub repository (not recommended):
pip install git+https://github.com/mwydmuch/napkinXC.git
A minimal example of usage:
from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.measures import precision_at_k

X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")

plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)

Y_pred = plt.predict(X_test, top_k=1)
print(precision_at_k(Y_test, Y_pred, k=1))
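To make the reported metric concrete, here is a minimal pure-Python sketch of what precision at k measures: the fraction of the top-k predicted labels that are relevant, averaged over examples. The function name `precision_at_k_sketch` and the toy data are hypothetical and for illustration only; this is not napkinXC's implementation, which may differ in input handling and details.

```python
def precision_at_k_sketch(y_true, y_pred, k=1):
    """Mean fraction of the top-k predicted labels that are relevant.

    y_true: list of lists with the true labels of each example.
    y_pred: list of lists with predicted labels, ranked best first.
    """
    total = 0.0
    for true_labels, ranked in zip(y_true, y_pred):
        relevant = set(true_labels)
        # Count how many of the first k predictions are true labels
        hits = sum(1 for label in ranked[:k] if label in relevant)
        total += hits / k
    return total / len(y_true)


# Toy data (hypothetical): three examples with ranked predictions
y_true = [[1, 5], [2], [3, 4]]
y_pred = [[5, 7], [9, 2], [3, 8]]

p_at_1 = precision_at_k_sketch(y_true, y_pred, k=1)
p_at_2 = precision_at_k_sketch(y_true, y_pred, k=2)
print(p_at_1, p_at_2)
```

With `k=1`, only the single highest-ranked label per example counts, which is the setting used in the example above.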
napkinXC can also be used as an executable to train and evaluate models using data in LIBSVM format. See the documentation for more details.
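As a rough illustration of what a multi-label LIBSVM-style line looks like, the sketch below parses a single line into labels and sparse features. It assumes the common convention of a comma-separated label list followed by `index:value` pairs; the helper name `parse_multilabel_libsvm_line` is hypothetical, and napkinXC's own loader may handle header lines and other details differently, so check the documentation for the exact format it expects.

```python
def parse_multilabel_libsvm_line(line):
    """Parse 'l1,l2 idx:val idx:val ...' into (labels, features).

    Assumes comma-separated integer labels and index:value feature pairs.
    """
    tokens = line.split()
    labels = []
    # A first token without ':' is treated as the label list;
    # otherwise the example is assumed to have no labels.
    if tokens and ":" not in tokens[0]:
        labels = [int(label) for label in tokens[0].split(",")]
        tokens = tokens[1:]
    features = {}
    for token in tokens:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return labels, features


labels, features = parse_multilabel_libsvm_line("3,17 1:0.5 42:1.25")
print(labels, features)
```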
This library implements methods from the following papers (see the experiments directory for scripts to replicate the results):