whylogs

Profile and monitor your ML data pipeline end-to-end , Join us in slack @ http://join.slack.whylabs.ai/

Showing:

Popularity

Downloads/wk

0

GitHub Stars

552

Maintenance

Last Commit

4d ago

Contributors

21

Package

Dependencies

19

License

Apache-2.0

Categories

Readme

whylogs: A Data and Machine Learning Logging Standard

License PyPI version Coverage Status Code style: black CII Best Practices PyPi Downloads CI Maintainability

whylogs is an open source standard for data and ML logging

whylogs logging agent is the easiest way to enable logging, testing, and monitoring in an ML/AI application. The lightweight agent profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.

whylogs can be installed in any Python, Java or Spark environment; it can be deployed as a container and run as a sidecar; or invoked through various ML tools (see integrations).

whylogs is designed by data scientists, ML engineers and distributed systems engineers to log data in the most cost-effective, scalable and accurate manner. No sampling. No post-processing. No manual configurations.

whylogs is released under the Apache 2.0 open source license. It supports many languages and is easy to extend. This repo contains the whylogs CLI, language SDKs, and individual libraries are in their own repos.

This is a Python implementation of whylogs. The Java implementation can be found here.

If you have any questions, comments, or just want to hang out with us, please join our Slack channel.

Getting started

Using pip

Install whylogs using the pip package manager by running

pip install whylogs

From the source

  • Download the source code by cloning the repository or by pressing [Download ZIP](https://github.com/whylabs/whylogs-python/archive/master.zip) on this page.

  • You'll need to install poetry in order to install dependencies using the lock file in this project. Follow [their docs](https://python-poetry.org/docs/) to get it set up.

  • Run the following comand at the root of the source code:

make install
make

Quickly Logging Data

whylogs is easy to get up and runnings

from whylogs import get_or_create_session
import pandas as pd

session = get_or_create_session()

df = pd.read_csv("path/to/file.csv")

with session.logger(dataset_name="my_dataset") as logger:
    
    #dataframe
    logger.log_dataframe(df)

    #dict
    logger.log({"name": 1})

    #images
    logger.log_images("path/to/image.png")

whylogs collects approximate statistics and sketches of data on a column-basis into a statistical profile. These metrics include:

  • Simple counters: boolean, null values, data types.
  • Summary statistics: sum, min, max, median, variance.
  • Unique value counter or cardinality: tracks an approximate unique value of your feature using HyperLogLog algorithm.
  • Histograms for numerical features. whyLogs binary output can be queried to with dynamic binning based on the shape of your data.
  • Top frequent items (default is 128). Note that this configuration affects the memory footprint, especially for text features.

Multiple Profile Plots

To view your logger profiles you can use, methods within whylogs.viz:

vizualization = ProfileVisualizer()
vizualization.set_profiles([profile_day_1, profile_day_2])
figure= vizualization.plot_distribution("<feature_name>")
figure.savefig("/my/image/path.png")

Individual profiles are saved to disk, AWS S3, or WhyLabs API, automatically when loggers are closed, per the configuration found in the Session configuration.

Current profiles from active loggers can be loaded from memory with:

profile = logger.profile()

Profile Viewer

You can also load a local profile viewer, where you upload the json summary file. The default path for the json files is set as output/{dataset_name}/{session_id}/json/dataset_profile.json.

from whylogs.viz import profile_viewer
profile_viewer()

This will open a viewer on your default browser where you can load a profile json summary, using the Select JSON profile button: Once the json is selected you can view your profile's features and associated and statistics.

Documentation

The [documentation](https://docs.whylabs.ai/docs/) of this package is generated automatically.

Features

  • Accurate data profiling: whylogs calculates statistics from 100% of the data, never requiring sampling, ensuring an accurate representation of data distributions
  • Lightweight runtime: whylogs utilizes approximate statistical methods to achieve minimal memory footprint that scales with the number of features in the data
  • Any architecture: whylogs scales with your system, from local development mode to live production systems in multi-node clusters, and works well with batch and streaming architectures
  • Configuration-free: whylogs infers the schema of the data, requiring zero manual configuration to get started
  • Tiny storage footprint: whylogs turns data batches and streams into statistical fingerprints, 10-100MB uncompressed
  • Unlimited metrics: whylogs collects all possible statistical metrics about structured or unstructured data

Data Types

Whylogs supports both structured and unstructured data, specifically:

Data typeFeaturesNotebook Example
Structured DataDistribution, cardinality, schema, counts, missing values[Getting started with structured data](https://github.com/whylabs/whylogs-examples/blob/mainline/python/GettingStarted.ipynb)
Imagesexif metadata, derived pixels features, bounding boxes[Getting started with images](https://github.com/whylabs/whylogs-examples/blob/mainline/python/Logging_Images.ipynb)
VideoIn development[Github Issue #214](https://github.com/whylabs/whylogs/issues/214)
Tensorsderived 1d features (more in developement)[Github Issue #216](https://github.com/whylabs/whylogs/issues/216)
Texttop k values, counts, cardinality[String Features](https://github.com/whylabs/whylogs/blob/mainline/examples/String_Features.ipynb)
AudioIn developement[Github Issue #212](https://github.com/whylabs/whylogs/issues/212)

Integrations

current integration | Integration | Features | Resources | | --- | --- | --- | | Spark | Run whylogs in Apache Spark environment|

  • [Code Example](https://github.com/whylabs/whylogs-examples/blob/mainline/scala/src/main/scala/WhyLogsDemo.scala)
| | Pandas | Log and monitor any pandas dataframe |
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/blob/mainline/python/logging_example.ipynb)
  • [whylogs: Embrace Data Logging](https://whylabs.ai/blog/posts/whylogs-embrace-data-logging)
| | Kafka | Log and monitor Kafka topics with whylogs|
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/blob/mainline/python/Kafka.ipynb)
  • [Integrating whylogs into your Kafka ML Pipeline](https://whylabs.ai/blog/posts/integrating-whylogs-into-your-kafka-ml-pipeline)
| | MLflow | Enhance MLflow metrics with whylogs: |
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/blob/mainline/python/MLFlow%20Integration%20Example.ipynb)
  • [Streamlining data monitoring with whylogs and MLflow](https://whylabs.ai/blog/posts/on-model-lifecycle-and-monitoring)
| | Github actions | Unit test data with whylogs and github actions|
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/tree/mainline/github-actions)
| | RAPIDS | Use whylogs in RAPIDS environment |
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/blob/mainline/python/RAPIDS%20GPU%20Integration%20Example.ipynb)
  • [Monitoring High-Performance Machine Learning Models with RAPIDS and whylogs](https://whylabs.ai/blog/posts/monitoring-high-performance-machine-learning-models-with-rapids-and-whylogs)
| | Java | Run whylogs in Java environment|
  • [Notebook Example](https://github.com/whylabs/whylogs-examples/blob/mainline/java/demo1/src/main/java/com/whylogs/examples/WhyLogsDemo.java)
| | Docker | Run whylogs as in Docker |
  • [Rest Container](https://docs.whylabs.ai/docs/integrations-rest-container)
| | AWS S3 | Store whylogs profiles in S3 |
  • [S3 example](https://github.com/whylabs/whylogs-examples/blob/mainline/python/S3%20example.ipynb)

Examples

For a full set of our examples, please check out [whylogs-examples](https://github.com/whylabs/whylogs-examples).

Check out our example notebooks with Binder: [Binder](https://mybinder.org/v2/gh/whylabs/whylogs-examples/HEAD)

  • [Getting Started notebook](https://github.com/whylabs/whylogs-examples/blob/mainline/python/GettingStarted.ipynb)
  • [Logging Example notebook](https://github.com/whylabs/whylogs-examples/blob/mainline/python/logging_example.ipynb)
  • [Logging Images](https://github.com/whylabs/whylogs-examples/blob/mainline/python/Logging_Images.ipynb)
  • [MLflow Integration](https://github.com/whylabs/whylogs-examples/blob/mainline/python/MLFlow%20Integration%20Example.ipynb)

Roadmap

whylogs is maintained by [WhyLabs](https://whylabs.ai).

Community

If you have any questions, comments, or just want to hang out with us, please join [our Slack channel](http://join.slack.whylabs.ai/).

Contribute

We welcome contributions to whylogs. Please see our [contribution guide](https://github.com/whylabs/whylogs/blob/mainline/CONTRIBUTING.md) and our [developement guide](https://github.com/whylabs/whylogs/blob/mainline/DEVELOPMENT.md) for details.

Rate & Review

Great Documentation0
Easy to Use0
Performant0
Highly Customizable0
Bleeding Edge0
Responsive Maintainers0
Poor Documentation0
Hard to Use0
Slow0
Buggy0
Abandoned0
Unwelcoming Community0
100