A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines algorithm,
in the style of scikit-learn. The py-earth package implements Multivariate Adaptive Regression Splines using Cython and provides an interface that is compatible with scikit-learn's Estimator, Predictor, Transformer, and Model interfaces. For more information about
Multivariate Adaptive Regression Splines, see the references below.
Now With Missing Data Support!
The py-earth package now supports missingness in its predictors. Just set
allow_missing=True when constructing an Earth object.
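
As a rough illustration of the allow_missing option, the sketch below builds a predictor matrix with some entries removed and wraps the fitting step in a helper. The helper names add_missing and fit_with_missing are hypothetical, and the sketch assumes that NaN is the marker py-earth treats as missing when allow_missing=True is set.

```python
import numpy

def add_missing(X, fraction=0.1, seed=0):
    """Return a float copy of X with roughly `fraction` of its entries set to NaN."""
    rng = numpy.random.RandomState(seed)
    X_missing = numpy.array(X, dtype=float)
    mask = rng.uniform(size=X_missing.shape) < fraction
    X_missing[mask] = numpy.nan
    return X_missing

def fit_with_missing(X, y):
    """Fit an Earth model on predictors that may contain NaN (assumed missing marker)."""
    # Imported lazily so that add_missing can be used without py-earth installed.
    from pyearth import Earth
    model = Earth(allow_missing=True)
    model.fit(X, y)
    return model

# Fake data in the same style as the usage example in this README
rng = numpy.random.RandomState(0)
X = 80 * rng.uniform(size=(1000, 10)) - 40
y = numpy.abs(X[:, 6] - 4.0) + rng.normal(size=1000)

# Knock out about 10% of the predictor values
X_missing = add_missing(X, fraction=0.1)
```

With py-earth installed, calling fit_with_missing(X_missing, y) should return a fitted model. Without allow_missing=True, fitting on data that contains NaN is expected to fail.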
If there are other features or improvements you'd like to see in py-earth, please send me an email or open or comment on an issue. In particular, please let me know if any of the following are important to you:
- Improved speed
- Exporting models to additional formats
- Support for shared memory multiprocessing during fitting
- Support for cyclic predictors (such as time of day)
- Better support for categorical predictors
- Better support for large data sets
- Iterative reweighting during fitting
Make sure you have numpy and scikit-learn installed. Then do the following:
git clone git://github.com/scikit-learn-contrib/py-earth.git
cd py-earth
sudo python setup.py install
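import numpy
from pyearth import Earth
from matplotlib import pyplot
# Create some fake data: y depends only on column 6 of X, plus noise
m = 1000
n = 10
X = 80*numpy.random.uniform(size=(m,n)) - 40
y = numpy.abs(X[:,6] - 4.0) + 1*numpy.random.normal(size=m)
# Fit an Earth model (fit must be called before predict)
model = Earth()
model.fit(X, y)
# Plot the data (red) and the model's predictions (blue) against column 6
y_hat = model.predict(X)
pyplot.plot(X[:,6], y, 'r.')
pyplot.plot(X[:,6], y_hat, 'b.')
pyplot.title('Simple Earth Example')
pyplot.show()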
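
To give a feel for what kind of model fits this data, here is a minimal sketch that uses only numpy and does not call py-earth. A Multivariate Adaptive Regression Splines model is a linear combination of hinge functions of the form max(0, x - t) and max(0, t - x). The target above, |x - 4|, is exactly the sum of such a mirrored pair at t = 4. The sketch fixes the knot at 4.0 by hand and solves for the coefficients by ordinary least squares. This is an assumption made for illustration only: the real algorithm searches for knots and variables adaptively, and the function name hinge is hypothetical.

```python
import numpy

def hinge(x, knot, sign=1.0):
    """Hinge basis function: max(0, sign * (x - knot))."""
    return numpy.maximum(0.0, sign * (x - knot))

# One-dimensional version of the fake data from the example above
rng = numpy.random.RandomState(0)
m = 1000
x = 80 * rng.uniform(size=m) - 40
y = numpy.abs(x - 4.0) + rng.normal(size=m)

# Design matrix: intercept plus a mirrored pair of hinges at a hand-picked knot
B = numpy.column_stack([
    numpy.ones(m),
    hinge(x, 4.0),             # max(0, x - 4)
    hinge(x, 4.0, sign=-1.0),  # max(0, 4 - x)
])

# Ordinary least squares for the basis function coefficients
coef, _, _, _ = numpy.linalg.lstsq(B, y, rcond=None)
y_hat = B.dot(coef)
```

Both hinge coefficients should come out close to 1 and the intercept close to 0, since |x - 4| = max(0, x - 4) + max(0, 4 - x).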
I am aware of the following implementations of Multivariate Adaptive Regression Splines:
- The R package earth (coded in C by Stephen Milborrow): http://cran.r-project.org/web/packages/earth/index.html
- The R package mda (coded in Fortran by Trevor Hastie and Robert Tibshirani): http://cran.r-project.org/web/packages/mda/index.html
- The Orange data mining library for Python (uses the C code from the R package earth): http://orange.biolab.si/
- The xtal package (uses Fortran code written in 1991 by Jerome Friedman): http://www.ece.umn.edu/users/cherkass/ee4389/xtalpackage.html
- MARSplines by StatSoft: http://www.statsoft.com/textbook/multivariate-adaptive-regression-splines/
- MARS by Salford Systems (also uses Friedman's code): http://www.salford-systems.com/products/mars
- ARESLab (written in Matlab by Gints Jekabsons): http://www.cs.rtu.lv/jekabsons/regression.html
The R package earth was most useful to me in understanding the algorithm, particularly because of Stephen Milborrow's
thorough and easy to read vignette (http://www.milbo.org/doc/earth-notes.pdf).
1. Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics,
19(1), 1–67. http://www.jstor.org/stable/10.2307/2241837
2. Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani.
(2012). earth: Multivariate Adaptive Regression Spline Models. R package
version 3.2-3. http://CRAN.R-project.org/package=earth
3. Friedman, J. (1993). Fast MARS. Stanford University Department of Statistics, Technical Report No 110.
4. Friedman, J. (1991). Estimating functions of mixed ordinal and categorical variables using adaptive splines.
Stanford University Department of Statistics, Technical Report No 108.
5. Stewart, G.W. Matrix Algorithms, Volume 1: Basic Decompositions. (1998). Society for Industrial and Applied
Mathematics.
6. Bjorck, A. Numerical Methods for Least Squares Problems. (1996). Society for Industrial and Applied
Mathematics.
7. Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning (2nd Edition). (2009).
Springer Series in Statistics.
8. Golub, G., & Van Loan, C. Matrix Computations (3rd Edition). (1996). Johns Hopkins University Press.
References 7, 2, 1, 3, and 4 contain discussions likely to be useful to users of py-earth. References 1, 2, 6, 5,
8, 3, and 4 were useful during the implementation process.