Also consider using dvc to version control and share your data within your team. Read this blog post to learn how to work efficiently with JupyterLab notebooks when using a data science project structure like this.
A data science project created with this extension ends up with the following directory structure:
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md         <- Guidelines for contributing to this project.
├── Dockerfile              <- Build a docker container with `docker build .`.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or to build `tox -e build`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual PYTHON_PKG, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- [DEPRECATED] Use `python setup.py develop` to install for
│                              development or `python setup.py bdist_wheel` to build.
├── src
│   └── PYTHON_PKG          <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
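To illustrate how the pieces fit together: scripts under `scripts` are meant to stay thin and import their actual functionality from the package under `src`. A minimal sketch of such a script, assuming a hypothetical package name `my_ds_project` and hypothetical helpers `load_processed` and `train_and_save` (neither is generated by the template):

# scripts/train_model.py -- hypothetical sketch, not part of the generated project
from my_ds_project.data import load_processed      # hypothetical helper in src/my_ds_project/data.py
from my_ds_project.models import train_and_save    # hypothetical helper in src/my_ds_project/models.py

def main():
    # read the canonical data set and fit a model, persisting it under models/
    features, labels = load_processed("data/processed/train.csv")
    train_and_save(features, labels, "models/baseline.pkl")

if __name__ == "__main__":
    main()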
Just install this package with:
conda install -c conda-forge pyscaffoldext-dsproject
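If you do not work with conda, installing from PyPI with pip should work as well (an assumption based on PyScaffold's standard release setup, not something stated here):

pip install pyscaffoldext-dsproject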
Note that `putup -h` now shows a new option, `--dsproject`.
Creating a data science project is then as easy as:
putup --dsproject my_ds_project
Note that `--dsproject` additionally comprises the functionality of several other PyScaffold flags; run `putup -h` to see the full list of options.
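After scaffolding, a typical next step is to create the conda environment from the generated environment.yml and install the package in development mode, as the comments in the directory tree above suggest. A sketch, assuming the generated environment is named after the project (the actual name is defined in environment.yml):

cd my_ds_project
conda env create -f environment.yml
conda activate my_ds_project
pip install -e .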
This project uses pre-commit; please make sure to install it before making any changes:
conda install pre-commit
cd pyscaffoldext-dsproject
pre-commit install
It is a good idea to update the hooks to the latest version:
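pre-commit autoupdate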
Please also check PyScaffold's contribution guidelines.