Gradient Boosting Decision Trees Algorithms (GBDT)
Author: Jiang Chen
GBDT is a high-performance, full-featured C++ implementation of Jerome H. Friedman's Gradient Boosting Decision Trees algorithm and its modern offspring. It offers high efficiency, a low memory footprint, a collection of loss functions, and built-in mechanisms for handling categorical features and missing values.
When is GBDT good for you?
- You are looking beyond linear models.
- Gradient Boosting Decision Trees is one of the best off-the-shelf ML algorithms, with built-in capabilities for non-linear transformation and feature crossing.
- Your data is too big to load into memory with existing ML packages.
- GBDT reduces its memory footprint dramatically through feature bucketization. On some tested datasets, it used 1/7 of the memory of its counterpart and trained in half the time. See docs/PERFORMANCE_BENCHMARK.md for more details.
- You want better handling of categorical features and missing values.
- GBDT has built-in mechanisms to figure out how to split categorical features and place missing values in the trees.
- You want to try different loss functions.
- GBDT implements various pointwise, pairwise, and listwise loss functions, including mse, logloss, huberized hinge loss, pairwise logloss, GBRank and LambdaMART. It also makes it easy to add your own custom loss functions.
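To give a feel for why feature bucketization saves memory, here is a minimal Python sketch. It is not GBDT's actual implementation: the function names `quantile_boundaries` and `bucketize`, the quantile-style choice of cut points, and the 256-bucket limit are all assumptions made for illustration. The idea is that each raw floating-point value is replaced by a small bucket id, so a feature column stored as 8-byte doubles shrinks to 1 byte per value.

```python
from array import array
from bisect import bisect_left


def quantile_boundaries(values, max_buckets=256):
    """Pick at most max_buckets - 1 cut points from the sorted distinct values.

    Hypothetical helper: the cut-point strategy is an assumption, not GBDT's.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_buckets:
        # Few distinct values: every value gets its own bucket.
        return distinct[:-1]
    # Otherwise take evenly spaced cut points over the distinct values.
    step = len(distinct) / float(max_buckets)
    return [distinct[int(step * i) - 1] for i in range(1, max_buckets)]


def bucketize(values, boundaries):
    """Map each raw value to a one-byte bucket id (0 .. len(boundaries))."""
    return array("B", (bisect_left(boundaries, v) for v in values))


# One feature column stored as 8-byte doubles.
raw = array("d", [0.5, 3.2, 3.2, 7.1, 0.5, 9.9])
boundaries = quantile_boundaries(raw)
buckets = bucketize(raw, boundaries)

print(list(buckets))                    # bucket id per example
print(raw.itemsize, buckets.itemsize)   # bytes per value before and after
```

Because tree splits only depend on the ordering of feature values, a tree learner can search for split points over bucket ids instead of raw values, which is what makes this kind of compression compatible with training.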
Installation (Python 2.7, Linux x86_64 or OS X x86_64):
- Install the latest stable version:
pip install gbdt
- Install the latest development version:
pip install git+https://github.com/yarny/gbdt.git