Python partial dependence plot toolbox
Updated for versions:
xgboost==1.3.3 matplotlib==3.1.1 sklearn==0.23.1
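A quick way to compare your environment against these pins is a small script built on the standard library's `importlib.metadata` (Python 3.8+). This is only a sketch: `check_pins` is a hypothetical helper, not part of PDPbox, and it assumes the `sklearn` pin refers to the `scikit-learn` distribution on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins listed above. Assumption: "sklearn" is published on PyPI as "scikit-learn",
# so it is looked up under that distribution name.
PINS = {"xgboost": "1.3.3", "matplotlib": "3.1.1", "scikit-learn": "0.23.1"}


def check_pins(pins):
    """Return {package: (pinned, installed_or_None, matches)} for each pin."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (pinned, installed, installed == pinned)
    return report


if __name__ == "__main__":
    for name, (pinned, installed, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: pinned {pinned}, installed {installed} -> {status}")
```

A `MISMATCH` only means the installed version differs from the one listed here, not necessarily that anything will break.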
This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm (it now supports all scikit-learn algorithms).
When using black-box machine learning algorithms such as random forests and boosting, it is hard to understand the relationship between the predictors and the model outcome.
For example, with a random forest all we get is the feature importance. Although the importance calculation tells us which features significantly influence the outcome, it is frustrating that it does not tell us in which direction they influence it. And in most real cases, the effect is non-monotonic.
We need powerful tools to help us understand the complex relationship between the predictors and the model prediction.
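Partial dependence and ICE (individual conditional expectation) curves, the plots that ICEbox and PDPbox draw, are tools of exactly this kind: they show how the prediction moves as one feature changes. The sketch below computes both from scratch with NumPy for any model exposed as a `predict(X)` callable. `ice_and_pdp` and the toy `black_box` model are hypothetical names for illustration, and the evenly spaced grid is an assumption, not PDPbox's own grid choice.

```python
import numpy as np


def ice_and_pdp(predict, X, feature, grid):
    """Return (ice, pdp) for column `feature` of X evaluated at each grid value.

    ice has shape (n_samples, n_grid): one curve per row of X.
    pdp has shape (n_grid,): the average of the ICE curves.
    """
    X = np.asarray(X, dtype=float)
    ice = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force every row to this feature value
        ice[:, j] = predict(X_mod)
    return ice, ice.mean(axis=0)


def black_box(X):
    """Hypothetical model: U-shaped in feature 0, linear in feature 1."""
    return (X[:, 0] - 0.5) ** 2 + 0.1 * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
grid = np.linspace(0, 1, 11)
ice, pdp = ice_and_pdp(black_box, X, feature=0, grid=grid)
print(np.round(pdp, 3))
```

The averaged curve bottoms out at 0.5 and rises on both sides, recovering the non-monotonic effect that a single importance score cannot show. A fitted scikit-learn estimator can be passed as `ice_and_pdp(model.predict, X, feature, grid)`.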
Install through pip (latest stable version: 0.2.1):
pip install pdpbox
Install through git (latest development version):
git clone https://github.com/SauceCat/PDPbox.git
cd PDPbox
python setup.py install
PDPbox can be tested using tox. Install it with
pip install tox tox-venv
and then run
tox
inside the PDPbox clone directory. This will run the tests with Python 3.7.
To test the documentation, run
tox -e docs
The documentation should open in your browser if it is built successfully.
Otherwise, the problem with the documentation will be reported in the output of the command.
PDP: PDP for a single feature
PDP: PDP for multi-class classification
PDP Interact: PDP Interact for two features with contour plot
PDP Interact: PDP Interact for two features with grid plot
PDP Interact: PDP Interact for multi-class classification
Information plot: target plot for a single feature
Information plot: target interact plot for two features
Information plot: actual prediction plot for a single feature
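For a multi-class model, partial dependence is computed per class by averaging predicted class probabilities instead of a single prediction. A minimal NumPy sketch, assuming the model exposes a scikit-learn style `predict_proba(X)` that returns one column per class; `multiclass_pdp` and `toy_proba` are hypothetical names, not PDPbox functions.

```python
import numpy as np


def multiclass_pdp(predict_proba, X, feature, grid):
    """Partial dependence for every class: array of shape (n_grid, n_classes)."""
    X = np.asarray(X, dtype=float)
    rows = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # force every row to this feature value
        rows.append(predict_proba(X_mod).mean(axis=0))  # average class probabilities
    return np.vstack(rows)


def toy_proba(X):
    """Hypothetical 3-class softmax model whose logits depend only on feature 0."""
    logits = np.column_stack([-X[:, 0], np.zeros(len(X)), X[:, 0]])
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
grid = np.linspace(-2, 2, 5)
pd_by_class = multiclass_pdp(toy_proba, X, feature=0, grid=grid)
print(pd_by_class.shape)  # (5, 3)
```

Each column is one class's curve, so a multi-class PDP is simply one panel per column; here class 2 rises and class 0 falls as feature 0 increases.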
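The two-feature interaction plots rest on two-way partial dependence: fix a pair of features at every combination of grid values and average the predictions, which gives a surface that can be drawn as contours or as a grid of cells. A from-scratch NumPy sketch, assuming a `predict(X)` callable and evenly spaced grids; `pdp_2d` and `black_box` are hypothetical names, not PDPbox's API.

```python
import numpy as np


def pdp_2d(predict, X, features, grid_a, grid_b):
    """Two-way partial dependence: array of shape (len(grid_a), len(grid_b))."""
    X = np.asarray(X, dtype=float)
    fa, fb = features
    out = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, fa] = a  # fix the first feature
            X_mod[:, fb] = b  # fix the second feature
            out[i, j] = predict(X_mod).mean()
    return out


def black_box(X):
    """Hypothetical model with a pure interaction between features 0 and 1."""
    return X[:, 0] * X[:, 1] + X[:, 2]


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))
grid = np.linspace(-1, 1, 5)
surface = pdp_2d(black_box, X, (0, 1), grid, grid)
```

With matplotlib, `plt.contourf(grid, grid, surface)` gives a contour-style view (rows of `surface` map to the vertical axis, i.e. feature 0 here) and `plt.imshow(surface, origin="lower")` a grid-style one. Neither single-feature PDP would reveal the saddle shape of this interaction.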
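Unlike partial dependence, the information plots summarize the data itself: the target plot shows the average actual target within buckets of a feature, and the actual prediction plot summarizes the model's predictions within the same buckets. A sketch assuming quantile buckets and plain means; `binned_means` is a hypothetical helper, and PDPbox's own bucketing and plotted statistics may differ.

```python
import numpy as np


def binned_means(x, y, n_bins=4):
    """Split x into quantile buckets; return (edges, counts, mean of y per bucket)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Use interior edges only so the minimum and maximum land in the first/last bucket.
    idx = np.digitize(x, edges[1:-1], right=True)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    return edges, counts, sums / np.maximum(counts, 1)  # guard against empty buckets


x = np.arange(8.0)                        # one feature, values 0..7
y = np.array([0, 0, 1, 0, 1, 1, 1, 1])    # actual binary target
pred = 1.0 / (1.0 + np.exp(-(x - 3.5)))   # hypothetical model probabilities

# Target plot: average actual target per bucket.
edges, counts, target_mean = binned_means(x, y)
# Actual prediction plot: average prediction per bucket.
_, _, pred_mean = binned_means(x, pred)
print(counts, target_mean)
```

Plotting `target_mean` and `pred_mean` against the bucket edges, with `counts` as bars, shows whether the model's predictions follow the observed target across the feature's range.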