offline-evaluation

Implementations and examples of common offline policy evaluation methods in Python.


Offline policy evaluation


Implementations and examples of common offline policy evaluation methods in Python. For more information on offline policy evaluation see this tutorial.

Installation

pip install offline-evaluation

Usage

import pandas as pd
from ope.methods import doubly_robust

Get some historical logs generated by a previous policy:

df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
])

Define a function that computes P(action | context) under the new policy:

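def action_probabilities(context):
    # Probability of taking the non-preferred action.
    epsilon = 0.10
    # Mostly block when the fraud probability exceeds 10%.
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    # Otherwise mostly allow.
    return {"allowed": 1 - epsilon, "blocked": epsilon}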
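Since the evaluation treats the returned values as probabilities, it can help to check that the function returns a valid distribution for a few representative contexts. The following is a minimal sketch and not part of the package: the helper name `check_policy` and the two test contexts are assumptions made for illustration.

```python
import math


def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


def check_policy(policy, contexts, actions=("allowed", "blocked")):
    """Hypothetical helper: verify that policy(context) is a distribution over actions."""
    for context in contexts:
        probs = policy(context)
        if set(probs) != set(actions):
            raise ValueError(f"unexpected actions for {context}: {sorted(probs)}")
        if any(p < 0.0 or p > 1.0 for p in probs.values()):
            raise ValueError(f"probability outside [0, 1] for {context}")
        if not math.isclose(sum(probs.values()), 1.0):
            raise ValueError(f"probabilities do not sum to 1 for {context}")
    return True


# One context on each side of the 0.10 threshold.
policy_ok = check_policy(action_probabilities, [{"p_fraud": 0.01}, {"p_fraud": 0.40}])
```
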

Conduct the evaluation:

doubly_robust.evaluate(df, action_probabilities)
> {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}

According to this estimate, the new policy would perform much worse than the logging policy (an expected reward of -28.47 versus 3.33). Instead of A/B testing this new policy online, it would be better to test some other policies offline first.
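To see where such a gap can come from, the simpler inverse propensity scoring estimate can be worked out by hand on the same six log rows. The sketch below uses plain Python and is an illustration of the general technique, not the package's implementation; the function name `ips_estimate` is an assumption.

```python
logs = [
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]


def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


def ips_estimate(logs, policy):
    """Hypothetical helper: mean of rewards reweighted by pi_new / pi_logging."""
    weighted_rewards = []
    for row in logs:
        new_prob = policy(row["context"])[row["action"]]
        weight = new_prob / row["action_prob"]
        weighted_rewards.append(weight * row["reward"])
    return sum(weighted_rewards) / len(weighted_rewards)


# Plain average of the logged rewards: 20 / 6, about 3.33.
logging_value = sum(row["reward"] for row in logs) / len(logs)

# Reweighted average under the new policy: -140 / 6, about -23.33.
new_value = ips_estimate(logs, action_probabilities)
```

Almost all of the negative value comes from the fifth row: the logging policy allowed that transaction with probability 0.10, the new policy would allow it with probability 0.90, so its reward of -20 is weighted by 9. The figure differs from the doubly robust result because it is a different estimator.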

See examples for more detailed tutorials.

Supported methods

  • Inverse propensity scoring
  • Direct method
  • Doubly robust (paper)
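The methods differ in how they use the logs: inverse propensity scoring reweights the logged rewards, the direct method relies on a model of the reward, and doubly robust adds an importance-weighted correction of the model's error to the direct method. The sketch below illustrates the last two in plain Python. It is not the package's implementation: the reward model here is simply the mean logged reward per action, a toy assumption chosen to keep the example short, and the function names are hypothetical.

```python
from collections import defaultdict

logs = [
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]


def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


def fit_mean_reward_model(logs):
    """Toy reward model (assumption): mean logged reward per action, ignoring context."""
    rewards_by_action = defaultdict(list)
    for row in logs:
        rewards_by_action[row["action"]].append(row["reward"])
    return {action: sum(r) / len(r) for action, r in rewards_by_action.items()}


def direct_method(logs, policy, reward_model):
    """Average the model's predicted reward under the new policy."""
    values = []
    for row in logs:
        probs = policy(row["context"])
        values.append(sum(p * reward_model[a] for a, p in probs.items()))
    return sum(values) / len(values)


def doubly_robust_estimate(logs, policy, reward_model):
    """Direct-method term plus an importance-weighted correction of the model error."""
    values = []
    for row in logs:
        probs = policy(row["context"])
        model_value = sum(p * reward_model[a] for a, p in probs.items())
        weight = probs[row["action"]] / row["action_prob"]
        correction = weight * (row["reward"] - reward_model[row["action"]])
        values.append(model_value + correction)
    return sum(values) / len(values)


reward_model = fit_mean_reward_model(logs)
dm_value = direct_method(logs, action_probabilities, reward_model)
dr_value = doubly_robust_estimate(logs, action_probabilities, reward_model)
```

With a different reward model the doubly robust value changes, so this sketch should not be expected to reproduce the package's output exactly.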
