cargo install bandit


Bandit Algorithms in Rust

by Michael Bohn

Version: 0.12.4, License: GPL-3.0

Multi-armed bandit algorithms in Rust



bandit = "0.12.3"

Description and Scope

This library currently implements only the annealing softmax bandit algorithm. Other bandit algorithm variants may be added in the future (pull requests are welcome).

Much of the inspiration for this project comes from the book Bandit Algorithms for Website Optimization by John Myles White. Copyright 2013 John Myles White, 978-1-449-34133-6

Usage and Configuration

First, you need to create a bandit with three parameters:

let bandit = AnnealingSoftmax::new(arms, DEFAULT_BANDIT_CONFIG.clone(), DEFAULT_CONFIG);

The first parameter is the list of arms the bandit will draw from. An arm can be anything that implements the Clone + Hash + Eq + Identifiable traits. You will probably always derive the first three traits, but the last one, Identifiable, is special.

pub trait Identifiable {
    fn ident(&self) -> String;
}

Well, not very special: it should be very easy to implement. The reason for this additional trait is that the full bandit state can be persisted to disk (in JSON format), so that the algorithm can later continue from exactly the same state. Unfortunately, Hash makes no guarantee about the hashing algorithm used, and that algorithm may change between Rust versions. Since we want to be able to load states regardless of the Rust version, we require a trait that returns an easily serialisable String as a unique identifier for the arm.
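As a minimal sketch of how an arm type might satisfy these bounds: the trait is redefined locally so the example compiles on its own, and the Variant struct, its fields, the "variant:{id}" identifier format and the idents_are_unique helper are all hypothetical names that are not part of the crate.

```rust
use std::collections::HashSet;

/// Same shape as the trait shown above; redefined here only so that
/// this sketch is self-contained.
pub trait Identifiable {
    fn ident(&self) -> String;
}

/// Hypothetical arm type (not part of the crate): one variant of a page.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct Variant {
    pub id: u32,
    pub label: String,
}

impl Identifiable for Variant {
    /// Builds the identifier from a field that stays the same across
    /// program runs and Rust versions, unlike a Hash value.
    fn ident(&self) -> String {
        format!("variant:{}", self.id)
    }
}

/// Hypothetical helper: returns true if no two arms share an identifier.
pub fn idents_are_unique<A: Identifiable>(arms: &[A]) -> bool {
    let mut seen = HashSet::new();
    // HashSet::insert returns false when the value was already present.
    arms.iter().all(|arm| seen.insert(arm.ident()))
}
```

Deriving the identifier from a stable key rather than from a display label means a stored state should still match its arm after the label text changes.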

The next parameter is the general bandit configuration:

#[derive(Debug, PartialEq, Clone)]
pub struct BanditConfig {
    /// Log file for logging details about the bandit algorithm
    /// run. What will be logged depends on the bandit algorithm
    /// implementation.
    pub log_file: Option<PathBuf>,
}

This lets you optionally supply a path to a log file, to which every step of the algorithm is logged. The log file is a simple CSV file that records arm draw states and updates to the rewards.


The last parameter is a special configuration for the annealing softmax algorithm:

#[derive(Debug, PartialEq, Copy, Clone, Serialize, Deserialize)]
pub struct AnnealingSoftmaxConfig {
    /// The higher the value, the faster the algorithm tends toward selecting
    /// the arm with the highest reward. Should be a number in the range [0, 1.0)
    pub cooldown_factor: f64,
}

It currently has only one option: the cooldown_factor, a float in the range [0, 1.0). This factor controls how fast the annealing happens. The higher the cooldown_factor, the sooner the algorithm stops exploring new arms and sticks with the arm with the highest reward discovered so far. You will probably have to experiment with this factor to find the best value for your particular setup (there are also tools to help you with that, see below).
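To make the effect of cooling concrete, here is a rough sketch of temperature-based softmax selection probabilities. It is not the crate's implementation: the function names and the schedule (1 - cooldown_factor)^step with a lower bound of 1e-6 are assumptions chosen only to illustrate that a higher factor moves probability mass to the best arm sooner.

```rust
/// Softmax selection probabilities at a given temperature.
/// Subtracting the maximum reward before exponentiating keeps `exp`
/// from overflowing, whatever the magnitude of the rewards.
pub fn softmax_probabilities(rewards: &[f64], temperature: f64) -> Vec<f64> {
    let max = rewards.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = rewards
        .iter()
        .map(|r| ((r - max) / temperature).exp())
        .collect();
    let total: f64 = weights.iter().sum();
    weights.iter().map(|w| w / total).collect()
}

/// Hypothetical cooling schedule (an assumption, NOT taken from the crate):
/// the temperature shrinks geometrically with each step and is clamped to
/// a small positive floor so the division above stays well defined.
pub fn cooled_temperature(step: u64, cooldown_factor: f64) -> f64 {
    (1.0 - cooldown_factor).powf(step as f64).max(1e-6)
}
```

Under this assumed schedule the temperature is 1.0 at step 0, so all arms keep a noticeable share of the probability; after enough steps the temperature reaches the floor and almost all of the probability sits on the arm with the highest reward.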

After constructing and configuring your bandit, you can start selecting arms:

let arm = bandit.select_arm();

and update the reward for arms:

let reward = ...; // some f64 value
bandit.update(arm, reward);

That's basically it. At some point, after enough rewards (and depending on your chosen cooldown_factor), the system will have cooled off completely and will always select the arm with the highest reward. It is safe to let the system run for a very long time, continuously selecting arms and updating rewards, without fear of overflow errors or inconsistent bandit states (this was found out the hard way in a unit test, hooray for unit testing).
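The README does not say how rewards are aggregated internally. One common approach, shown here purely as an assumption, is an incremental running mean per arm: it never stores a growing sum, so the stored value stays within the range of the observed rewards however long the bandit runs. ArmStats and its fields are hypothetical names.

```rust
/// Hypothetical per-arm statistics (not the crate's internal type).
#[derive(Debug, Clone, Copy, Default)]
pub struct ArmStats {
    /// Number of rewards recorded so far.
    pub pulls: u64,
    /// Running mean of all rewards recorded so far.
    pub mean_reward: f64,
}

impl ArmStats {
    /// Folds one reward into the running mean without keeping a total:
    /// new_mean = old_mean + (reward - old_mean) / n
    pub fn update(&mut self, reward: f64) {
        // saturating_add avoids an integer overflow panic on extreme run lengths.
        self.pulls = self.pulls.saturating_add(1);
        self.mean_reward += (reward - self.mean_reward) / self.pulls as f64;
    }
}
```

Feeding the rewards 1.0, 2.0, 3.0 and 4.0 into a fresh ArmStats leaves pulls at 4 and mean_reward at 2.5.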

Saving and Restoring states

A bandit can save itself into a file:

bandit.save_bandit(<path to save file>);

A saved bandit can be loaded again through the concrete implementation:

let arms = ...; // list of arms, as in the initial creation
let bandit_config = ...; // BanditConfig, as in the second parameter of the initial creation
let bandit_loaded = AnnealingSoftmax::load_bandit(arms, bandit_config, <path to save file>);

The arms supplied do not necessarily have to match the arms that are restored from the file. If a stored arm is no longer supplied, it is dropped after loading, and its stored reward is lost once the bandit is saved again. If a new arm is added, it starts with a zero reward.
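The reconciliation rule described above can be sketched as a small standalone function. This is an illustration under assumptions, not the crate's loading code: reconcile_rewards is a hypothetical name, and the state is simplified to a map from arm identifier to stored reward.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not the crate's API): merges the rewards restored
/// from a state file with the arms supplied at load time.
/// - Arms present in both keep their stored reward.
/// - Arms only in the file are dropped.
/// - Arms only in the supplied list start with a reward of 0.0.
pub fn reconcile_rewards(
    supplied_idents: &[String],
    stored: &HashMap<String, f64>,
) -> HashMap<String, f64> {
    supplied_idents
        .iter()
        .map(|id| (id.clone(), stored.get(id).copied().unwrap_or(0.0)))
        .collect()
}
```

With stored rewards for "a" and "b" and supplied arms "a" and "c", the result holds "a" with its stored reward and "c" with 0.0, while "b" is gone.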

Visualising Bandit Arm Selection Data

The bandit tools application allows you to analyse the stored state file and log file. Details are described in the separate repo bandit_tool. You can find the web application here:
