SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.












BSD 3 clause



SciKit-Learn Laboratory


This Python package provides command-line utilities that make it easier to run machine learning experiments with scikit-learn. One of the project's primary goals is to let you run scikit-learn experiments without writing any code beyond what you used to generate or extract the features.


You can install SKLL with either ``pip`` or ``conda``. See details `here <>`__.

Requirements

-  Python 3.7, 3.8, or 3.9
-  `beautifulsoup4 <>`__
-  `gridmap <>`__ (only required if you plan to run things in parallel on a DRMAA-compatible cluster)
-  `joblib <>`__
-  `pandas <>`__
-  `ruamel.yaml <>`__
-  `scikit-learn <>`__
-  `seaborn <>`__
-  `tabulate <>`__

Command-line Interface

The main utility we provide is called ``run_experiment``. It can be used to
easily run a series of learners on datasets specified in a configuration file like:

.. code:: ini

  [General]
  experiment_name = Titanic_Evaluate_Tuned
  # valid tasks: cross_validate, evaluate, predict, train
  task = evaluate

  [Input]
  # these directories could also be absolute paths
  # (and must be if you're not running things in local mode)
  train_directory = train
  test_directory = dev
  # Can specify multiple sets of feature files that are merged together automatically
  featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
  # List of scikit-learn learners to use
  learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
  # Column in CSV containing labels to predict
  label_col = Survived
  # Column in CSV containing instance IDs (if any)
  id_col = PassengerId

  [Tuning]
  # Should we tune parameters of all learners by searching provided parameter grids?
  grid_search = true
  # Function to maximize when performing grid search
  objectives = ['accuracy']

  [Output]
  # Also compute the area under the ROC curve as an additional metric
  metrics = ['roc_auc']
  # The following can also be absolute paths
  logs = output
  results = output
  predictions = output
  probability = true
  models = output
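
If you want to sanity-check a configuration like the one above before launching
an experiment, the following is a minimal sketch that reads it with Python's
standard library only. It is not how SKLL itself parses configuration files.
It assumes the fields are grouped under ``[General]``, ``[Input]``, ``[Tuning]``,
and ``[Output]`` section headers, and the names ``CONFIG_TEXT`` and ``summarize``
are made up for this illustration.

.. code:: python

  import ast
  import configparser

  # A shortened copy of the example configuration (assumed section layout).
  CONFIG_TEXT = """
  [General]
  experiment_name = Titanic_Evaluate_Tuned
  task = evaluate

  [Input]
  train_directory = train
  test_directory = dev
  featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
  learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
  label_col = Survived
  id_col = PassengerId

  [Tuning]
  grid_search = true
  objectives = ['accuracy']
  """


  def summarize(config_text):
      """Return a small dict describing what the configuration asks for."""
      parser = configparser.ConfigParser(interpolation=None)
      parser.read_string(config_text)

      # List-valued fields are written as Python-style literals in the example,
      # so ast.literal_eval is enough to turn them into real lists here.
      learners = ast.literal_eval(parser["Input"]["learners"])
      featuresets = ast.literal_eval(parser["Input"]["featuresets"])

      return {
          "task": parser["General"]["task"],
          "learners": learners,
          "featuresets": featuresets,
          "grid_search": parser["Tuning"].getboolean("grid_search"),
          "objectives": ast.literal_eval(parser["Tuning"]["objectives"]),
          # Number of learner/featureset pairs listed in the file.
          "combinations": len(learners) * len(featuresets),
      }


  summary = summarize(CONFIG_TEXT)
  print(summary["task"], summary["combinations"])

Here the four feature files form a single featureset, so the summary reports
four learner/featureset combinations.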

For more information about getting started with ``run_experiment``, please check
out `our tutorial <>`__, or
`our config file specs <>`__.

You can also follow this `interactive Jupyter tutorial <>`__.

We also provide utilities for:

-  `converting between machine learning toolkit formats <>`__
   (e.g., ARFF, CSV)
-  `filtering feature files <>`__
-  `joining feature files <>`__
-  `other common tasks <>`__

Python API

If you just want to avoid writing a lot of boilerplate learning code, you can
use our simple Python API, which also supports pandas DataFrames.
The main way to use the API is through the ``Learner`` and ``Reader`` classes.
For more details on our API, see `the documentation <>`__.
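
As a rough sketch of what working with these two classes can look like, the
example below trains one learner on a feature file and evaluates it on another.
The function name, file paths, and column names are illustrative assumptions,
and the method signatures may differ between SKLL versions, so treat the API
documentation as the authority.

.. code:: python

  def train_and_evaluate(train_path, test_path,
                         label_col="Survived", id_col="PassengerId"):
      """Train a random forest on one feature file and evaluate it on another.

      The column names default to those of the Titanic example; adjust them
      for your own data.
      """
      # Imported inside the function so this sketch can be loaded even on a
      # machine where SKLL is not installed.
      from skll.data import Reader
      from skll.learner import Learner

      # Pick a reader based on the file extension and load each file
      # into a feature set.
      train_fs = Reader.for_path(train_path, label_col=label_col, id_col=id_col).read()
      test_fs = Reader.for_path(test_path, label_col=label_col, id_col=id_col).read()

      # Wrap a scikit-learn estimator by name and train it without grid search.
      learner = Learner("RandomForestClassifier")
      learner.train(train_fs, grid_search=False)

      # Return whatever evaluation results this SKLL version provides.
      return learner.evaluate(test_fs)

A call might look like ``train_and_evaluate("train/family.csv", "dev/family.csv")``,
where both paths are hypothetical and each file is assumed to contain the label
and ID columns.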

While our API can be broadly useful, the command-line utilities are intended
as the primary way of using SKLL. The API is just a nice side effect of our
developing the utilities.

A Note on Pronunciation

.. image:: doc/skll.png
   :alt: SKLL logo
   :align: right

.. container:: clear

  .. image:: doc/spacer.png

SciKit-Learn Laboratory (SKLL) is pronounced "skull": that's where the learning happens.


-  *Simpler Machine Learning with SKLL 1.0*, Dan Blanchard, PyData NYC 2014 (`video <>`__ | `slides <>`__)
-  *Simpler Machine Learning with SKLL*, Dan Blanchard, PyData NYC 2013 (`video <>`__ | `slides <>`__)


If you are using SKLL in your work, you can cite it as follows: "We used scikit-learn (Pedregosa et al., 2011) via the SKLL toolkit ("


SKLL is featured in `Data Science at the Command Line <>`__
by `Jeroen Janssens <>`__.


See `GitHub releases <>`__.


Thank you for your interest in contributing to SKLL! See ` <>`__ for instructions on how to get started.
