PySR: parallel symbolic regression built on Julia, with a Python interface.
Uses regularized evolution, simulated annealing, and gradient-free optimization.
(pronounced like py as in python, and then sur as in surface)
Check out SymbolicRegression.jl for the pure-Julia backend of this package.
Symbolic regression is a very interpretable machine learning algorithm for low-dimensional problems: these tools search equation space to find algebraic relations that approximate a dataset.
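To make "searching equation space" concrete, here is a toy sketch that enumerates a handful of candidate expressions and keeps the one with the lowest mean squared error. It is only an illustration of the idea, not PySR's algorithm: the dataset, the building blocks in `unary`, and the `constants` list are all hypothetical, and the search is brute force rather than evolutionary.

```python
import random
import math

# Toy dataset from a hidden relation y = x*x - 2 (an assumption for this demo).
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(50)]
ys = [x * x - 2 for x in xs]

# Hypothetical building blocks; a real tool composes and mutates these at scale.
unary = {"x": lambda x: x, "cos(x)": math.cos, "x**2": lambda x: x * x}
constants = [-2.0, 0.0, 2.0]


def mse(f):
    """Mean squared error of candidate function f on the toy dataset."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Enumerate every expression of the form u(x) + c.
candidates = []
for name, u in unary.items():
    for c in constants:
        candidates.append((f"{name} + {c}", lambda x, u=u, c=c: u(x) + c))

# Keep the candidate that best approximates the data.
best_expr, best_f = min(candidates, key=lambda item: mse(item[1]))
print(best_expr, mse(best_f))
```

A real search also penalizes complexity, so that a slightly less accurate but much simpler expression can win.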
One can also extend these approaches to higher-dimensional spaces by using a neural network as a proxy, as explained in arXiv:2006.11287, where we apply it to N-body problems. Here, one essentially uses symbolic regression to convert a neural net to an analytic equation. Thus, these tools simultaneously present an explicit and powerful way to interpret deep models.
Previously, we used Eureqa, which is a very efficient and user-friendly tool. However, Eureqa is GUI-only, doesn't allow for user-defined operators, has no distributed capabilities, and has become proprietary (and has recently been merged into an online service). Thus, the goal of this package is to have an open-source symbolic regression tool as efficient as Eureqa, while also exposing a configurable Python interface.
PySR uses both Julia and Python, so you need to have both installed.
You can install PySR with:
pip install pysr
The first launch will automatically install the required Julia packages. Most common issues at this stage are solved by tweaking the Julia package server to use up-to-date packages.
Here is some demo code (also found in
import numpy as np
from pysr import pysr, best

# Dataset
X = 2 * np.random.randn(100, 5)
y = 2 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2

# Learn equations
equations = pysr(
    X,
    y,
    niterations=5,
    binary_operators=["+", "*"],
    unary_operators=[
        "cos",
        "exp",
        "sin",  # Pre-defined library of operators (see docs)
        "inv(x) = 1/x",  # Define your own operator! (Julia syntax)
    ],
)

# ... (you can use Ctrl-C to exit early)

print(best(equations))
x0**2 + 2.000016*cos(x3) - 1.9999845
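As a quick sanity check, one can compare the printed equation against the formula used to build the demo dataset. The sketch below uses only the standard library; the function names `true_fn` and `recovered_fn` are made up for illustration, and the sample points are drawn from the same distribution as the demo inputs (normal with standard deviation 2).

```python
import math
import random


def true_fn(x0, x3):
    """Ground-truth relation used to generate y in the demo."""
    return 2 * math.cos(x3) + x0 ** 2 - 2


def recovered_fn(x0, x3):
    """Equation printed by the demo run above."""
    return x0 ** 2 + 2.000016 * math.cos(x3) - 1.9999845


random.seed(0)
# Sample (x0, x3) pairs with standard deviation 2, as in the demo dataset.
points = [(random.gauss(0, 2), random.gauss(0, 2)) for _ in range(1000)]

max_abs_err = max(abs(true_fn(a, b) - recovered_fn(a, b)) for a, b in points)
print(f"max |difference| over {len(points)} points: {max_abs_err:.2e}")
```

The two expressions differ only in their fitted constants, so the difference stays bounded by the sum of the constant offsets regardless of the input.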
One can also use best_tex to get the LaTeX form, or best_callable to get a function you can call.
The best function selects an equation using a score which balances complexity and error; however, one can see the full list of equations by printing the equations object returned by pysr.
This is a pandas table, with additional columns:
MSE - the mean squared error of the formula.
score - a metric akin to Occam's razor; you should use this to help select the "true" equation.
sympy_format - the equation as a sympy expression.
lambda_format - a lambda function for that equation, which you can pass values through.
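The text does not spell out how the score is computed. One plausible way to balance complexity against error, assumed here purely for illustration, is to reward the drop in log-MSE per unit of added complexity relative to the previous, simpler equation. The rows, the `complexity` values, and the `add_scores` helper below are hypothetical, not PySR output.

```python
import math

# Hypothetical (complexity, MSE) rows, ordered by increasing complexity.
rows = [
    {"complexity": 1, "mse": 4.0},
    {"complexity": 3, "mse": 3.5},
    {"complexity": 5, "mse": 0.01},
    {"complexity": 9, "mse": 0.009},
]


def add_scores(rows):
    """Attach score = -d(log MSE) / d(complexity) relative to the previous row."""
    scored = []
    prev = None
    for row in rows:
        if prev is None:
            score = 0.0  # No simpler equation to compare against.
        else:
            d_log_mse = math.log(row["mse"]) - math.log(prev["mse"])
            d_complexity = row["complexity"] - prev["complexity"]
            score = -d_log_mse / d_complexity
        scored.append({**row, "score": score})
        prev = row
    return scored


scored = add_scores(rows)
best_row = max(scored, key=lambda r: r["score"])
for r in scored:
    print(r["complexity"], r["mse"], round(r["score"], 3))
print("selected complexity:", best_row["complexity"])
```

Under this assumed definition, the equation at complexity 5 wins: it is where the error falls sharply, while the jump to complexity 9 buys almost no extra accuracy.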