sdgym

Benchmarking synthetic data generation methods.

21 Versions

0.4.1

0.4.1.dev2

0.4.1.dev1

0.4.1.dev0

0.4.0

This release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym. It also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate metrics, and adds the normalized score to the benchmark results.

New Features
  • Add normalized score to benchmark results - Issue #102 by @katxiao
  • Add max rows and max columns args - Issue #96 by @katxiao
  • Automatically detect number of workers - Issue #97 by @katxiao
  • Add summary function and command - Issue #92 by @amontanez24
  • Allow jobs list/JSON to be passed - Issue #93 by @fealho
  • Add ydata to sdgym - Issue #90 by @fealho
  • Add dockerfile for sdgym - Issue #88 by @katxiao
  • Add Gretel to SDGym synthesizer - Issue #87 by @amontanez24

0.4.0.dev1

0.4.0.dev0

0.3.1

This release adds features to store results and cache contents in an S3 bucket, as well as a script to collect results from a cache directory and compile them into a single results CSV file.

Issues closed
  • Collect cached results from s3 bucket - Issue #85 by @katxiao
  • Store cache contents into an S3 bucket - Issue #81 by @katxiao
  • Store SDGym results into an S3 bucket - Issue #80 by @katxiao
  • Add a way to collect cached results - Issue #79 by @katxiao
  • Allow reading datasets from private s3 bucket - Issue #74 by @katxiao
  • Typos in the sdgym.run function docstring documentation - Issue #69 by @sbrugman

0.3.1.dev2

0.3.1.dev1

0.3.1.dev0

0.3.0

Major rework of the SDGym functionality to support a collection of new features:

  • Add relational and timeseries model benchmarking.
  • Use SDMetrics for model scoring.
  • Update datasets format to match SDV metadata based storage format.
  • Centralize default datasets collection in the sdv-datasets S3 bucket.
  • Add options to download and use datasets from different S3 buckets.
  • Rename synthesizers to baselines and adapt to the new metadata format.
  • Add model execution and metric computation time logging.
  • Add optional synthetic data and error traceback caching.

0.3.0.dev0

0.2.2

This version reworks the benchmark function and adds a few new synthesizers.

New Features
  • New CLI with run, make-leaderboard and make-summary commands
  • Parallel execution via Dask or Multiprocessing
  • Download datasets without executing the benchmark
  • Support for Python 3.6 to 3.8
New Synthesizers
  • sdv.tabular.CTGAN
  • sdv.tabular.CopulaGAN
  • sdv.tabular.GaussianCopulaOneHot
  • sdv.tabular.GaussianCopulaCategorical
  • sdv.tabular.GaussianCopulaCategoricalFuzzy

0.2.2.dev0

0.2.1

Updated leaderboard and minor improvements.

New Features
  • Add parameters for PrivBNSynthesizer - Issue #37 by @csala

0.2.1.dev0

0.2.0

New benchmark API and substantially improved documentation.

New Features
  • The benchmark function now returns a complete leaderboard instead of only one score
  • Synthesizer classes can be passed directly to the benchmark function
Bug Fixes
  • One-hot encoding errors in the Independent, VEEGAN and Medgan Synthesizers.
  • Proper usage of the eval mode during sampling.
  • Fix improperly configured datasets.

0.2.0.dev1

0.2.0.dev0

0.1.0

First release to PyPI.