|build| |Documentation Status| |pkgVersion| |pyVersions| |Maintainability| |Coverage Status|
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort. This includes checks related to various types of issues, such as model performance, data integrity, distribution mismatches, and more.
This README refers to the Tabular version of deepchecks.
Check out the `Deepchecks for Computer Vision & Images subpackage <deepchecks/vision>`__ for more details about deepchecks for CV, currently in beta release.

To install the tabular version with pip:
.. code:: bash

   pip install deepchecks -U --user
.. note::

   To install deepchecks together with the Computer Vision submodule, which is
   currently in beta release, replace ``deepchecks`` with ``"deepchecks[vision]"``
   as follows:
.. code:: bash

   pip install "deepchecks[vision]" -U --user
To install with conda:

.. code:: bash

   conda install -c conda-forge deepchecks
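As a quick sanity check, you can verify the installation from Python (``__version__`` is the standard attribute that deepchecks exposes):

.. code:: python

   import deepchecks

   # Print the installed version to confirm the setup succeeded
   print(deepchecks.__version__)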
🏃‍♀️ See It in Action
----------------------
Head over to one of the following quickstart tutorials and have deepchecks running in your environment in less than 5 minutes:
- `Train-Test Validation Quickstart (loans data) <https://docs.deepchecks.com/stable/user-guide/tabular/auto_quickstarts/plot_quick_train_test_validation.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=try_it_out>`__
- `Data Integrity Quickstart (avocado sales data) <https://docs.deepchecks.com/stable/user-guide/tabular/auto_quickstarts/plot_quick_data_integrity.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=try_it_out>`__
- `Full Suite (many checks) Quickstart (iris data) <https://docs.deepchecks.com/en/stable/user-guide/tabular/auto_quickstarts/plot_quickstart_in_5_minutes.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=try_it_out>`__
Recommended: download the code and run it locally on the built-in dataset and (optional) model, or replace them with your own.

Play with some of the existing checks in our `Interactive Checks Demo <https://checks-demo.deepchecks.com/?check=No+check+selected&utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=try_it_out>`__, and see how they work on various datasets with custom corruptions injected.
A `Suite <#suite>`_ runs a collection of `Checks <#check>`_ with optional `Conditions <#condition>`_ added to them.

Example for running a suite on given `datasets`_ and with a `supported model`_:
.. code:: python

   from deepchecks.tabular.suites import model_evaluation

   suite = model_evaluation()
   result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)
   result.save_as_html()  # replace this with result.show() or result.show_in_window() to see results inline or in a window
This will produce a report summarizing the results of all the checks in the suite.
.. note::

   Some of the suites (e.g. ``data_integrity``, ``train_test_validation``) don't
   require a model as part of the input. See the `full code tutorials here`_.
.. _full code tutorials here: https://docs.deepchecks.com/dev/user-guide/tabular/auto_quickstarts/index.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=try_it_out
.. _datasets: https://docs.deepchecks.com/en/stable/user-guide/tabular/dataset_object.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=running_a_suite
.. _supported model: https://docs.deepchecks.com/en/stable/user-guide/supported_models.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=running_a_suite
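The ``train_dataset`` and ``test_dataset`` arguments in the suite example above are deepchecks ``Dataset`` objects, which wrap a DataFrame together with metadata such as the label column and categorical features (see the `datasets`_ guide). A minimal construction sketch, in which the column names are placeholders for your own schema:

.. code:: python

   from deepchecks.tabular import Dataset

   # Wrap raw DataFrames with their metadata; 'target' and 'city' are
   # placeholder column names used here for illustration only.
   train_dataset = Dataset(train_df, label='target', cat_features=['city'])
   test_dataset = Dataset(test_df, label='target', cat_features=['city'])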
In the following section you can see an example of how the output of a single check without a condition may look. To run a specific single check, all you need to do is import it and run it with the required (check-dependent) input parameters. More details about the existing checks and the parameters they can receive can be found in our `API Reference`_.
.. _API Reference: https://docs.deepchecks.com/en/stable/api/index.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=running_a_check
.. code:: python

   import pandas as pd

   from deepchecks.tabular.checks import TrainTestFeatureDrift

   train_df = pd.read_csv('train_data.csv')
   test_df = pd.read_csv('test_data.csv')

   TrainTestFeatureDrift().run(train_df, test_df)
This will produce output of the following type:
.. raw:: html

   <h4>Train Test Drift</h4>
   <p>The Drift score is a measure for the difference between two distributions,
   in this check - the test and train distributions. <br>
   The check shows the drift score and distributions for the features,
   sorted by feature importance and showing only the top 5 features, according to feature importance.
   If available, the plot titles also show the feature importance (FI) rank.</p>
   <p align="left">
     <img src="docs/source/_static/images/general/train-test-drift-output.png">
   </p>
Use deepchecks while you're in the research phase, when you want to validate your data, find potential methodological problems, and validate and evaluate your model.
See more about typical usage scenarios and the built-in suites in the `docs <https://docs.deepchecks.com/stable/getting-started/welcome.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=what_do_you_need_in_order_to_start_validating#when-should-you-use-deepchecks>`__.
Each check enables you to inspect a specific aspect of your data and models. Checks are the basic building block of the deepchecks package, covering all kinds of common issues such as model performance, data integrity, distribution mismatches, and `many more checks`_.

.. _many more checks: https://docs.deepchecks.com/stable/checks_gallery/tabular.html?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=key_concepts__check
Each check can have two types of results: a visual result meant for display (e.g. a figure or a table), and a return value that can be used for validating the expected check results.
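For example, the return value of the drift check shown earlier can be inspected programmatically. A short sketch, assuming the ``Dataset`` objects constructed above (the exact structure of ``value`` depends on the check):

.. code:: python

   from deepchecks.tabular.checks import TrainTestFeatureDrift

   result = TrainTestFeatureDrift().run(train_dataset, test_dataset)
   result.show()         # the visual result, rendered inline
   print(result.value)   # the raw return value, usable for validation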
A condition is a function that can be added to a Check, which returns a pass ✓, fail ✖, or warning ! result, intended for validating the Check's return value. An example of adding a condition:
.. code:: python

   from deepchecks.tabular.checks import BoostingOverfit

   BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05)
This will return a check failure when run if there is a difference of more than 5% between the best score achieved on the test set during the boosting iterations and the score achieved in the last iteration (the model's "original" score on the test set).
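Putting it together, here is a sketch of running the check with its condition attached, assuming the ``Dataset`` objects and fitted ``model`` from the suite example above:

.. code:: python

   from deepchecks.tabular.checks import BoostingOverfit

   # add_condition_* returns the check itself, so the calls can be chained
   check = BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05)
   result = check.run(train_dataset, test_dataset, model)
   result.show()  # the report includes the condition's pass/fail status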
An ordered collection of checks that can have conditions added to them. The Suite enables displaying a concluding report for all of the Checks that ran.
See the list of `predefined existing suites`_ for tabular data to learn more about the suites you can work with directly, and to see a code example demonstrating how to build your own custom suite.

The existing suites include default conditions added for most of the checks. You can edit the preconfigured suites or build a suite of your own with a collection of checks and optional conditions, as in the sketch below.

.. _predefined existing suites: deepchecks/tabular/suites
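As an illustration, a custom suite can be composed directly from individual checks. A sketch using the checks shown earlier (the suite name is arbitrary):

.. code:: python

   from deepchecks.tabular import Suite
   from deepchecks.tabular.checks import BoostingOverfit, TrainTestFeatureDrift

   # A suite is an ordered collection of checks, with or without conditions
   custom_suite = Suite(
       'My Custom Suite',
       TrainTestFeatureDrift(),
       BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05),
   )
   result = custom_suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)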
- The deepchecks package installed
- JupyterLab or Jupyter Notebook, or any Python IDE
Depending on your phase and what you wish to validate, you'll need a subset of the following:
- Raw data (before pre-processing such as OHE, string processing, etc.), with optional labels
- The model's training data with labels
- Test data (which the model isn't exposed to) with labels
- A `supported model`_ (e.g. scikit-learn models, XGBoost, any model implementing the ``predict`` method in the required format; see the sketch below)
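For models outside the built-in frameworks, it is enough to expose the expected inference API. A minimal sketch of such a wrapper (the class name and return values are hypothetical, for illustration only):

.. code:: python

   import numpy as np

   class MyCustomModel:
       """Hypothetical model exposing a scikit-learn-style inference API."""

       def predict(self, X):
           # Return one prediction per row; a constant is used here only
           # to keep the illustration self-contained.
           return np.zeros(len(X))

       def predict_proba(self, X):
           # Classification models should also return per-class probabilities
           return np.tile([1.0, 0.0], (len(X), 1))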
The package currently supports tabular data and is in beta release for the `Computer Vision subpackage <deepchecks/vision>`__.
- `https://docs.deepchecks.com/ <https://docs.deepchecks.com/?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=documentation>`__
- `https://docs.deepchecks.com/en/latest <https://docs.deepchecks.com/en/latest/?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=documentation>`__
- `Slack Community <https://join.slack.com/t/deepcheckscommunity/shared_invite/zt-y28sjt1v-PBT50S3uoyWui_Deg5L_jg>`__ to connect with the maintainers and follow users and interesting discussions
- `GitHub Issues <https://github.com/deepchecks/deepchecks/issues>`__ to suggest improvements, open an issue, or share feedback

.. |build| image:: https://github.com/deepchecks/deepchecks/actions/workflows/build.yml/badge.svg
.. |Documentation Status| image:: https://readthedocs.org/projects/deepchecks/badge/?version=stable
   :target: https://docs.deepchecks.com/?utm_source=github.com&utm_medium=referral&utm_campaign=readme&utm_content=badge
.. |pkgVersion| image:: https://img.shields.io/pypi/v/deepchecks
.. |pyVersions| image:: https://img.shields.io/pypi/pyversions/deepchecks
.. |Maintainability| image:: https://api.codeclimate.com/v1/badges/970b11794144139975fa/maintainability
   :target: https://codeclimate.com/github/deepchecks/deepchecks/maintainability
.. |Coverage Status| image:: https://coveralls.io/repos/github/deepchecks/deepchecks/badge.svg?branch=main
   :target: https://coveralls.io/github/deepchecks/deepchecks?branch=main