Official Python Pachyderm client.
This library provides the autogenerated gRPC/protobuf code for Pachyderm, along with a higher-level and more pythonic
See the API docs.
pip install python-pachyderm
Here's an example that creates a repo and adds a file:
    import python_pachyderm

    # Connects to a pachyderm cluster on localhost:30650.
    # For other options, see the API docs.
    client = python_pachyderm.Client()

    # Create a pachyderm repo called `test`
    client.create_repo("test")

    # Create a file in `(repo="test", branch="master")` at `/dir_a/data.txt`
    # Similar to `pachctl put file test@master:/dir_a/data.txt`
    with client.commit("test", "master") as commit:
        client.put_file_bytes(commit, "/dir_a/data.txt", b"DATA")

    # Get back the file
    f = client.get_file(("test", "master"), "/dir_a/data.txt")
    print(f.read())  # >>> b"DATA"
How to load a CSV file into a Pandas dataframe
    import pandas as pd

    f = client.get_file(("my_repo", "my_branch"), "/path_to/my_data.csv")
    df = pd.read_csv(f)
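If pandas is not available, the same bytes can be parsed with the standard library. The following is a minimal sketch, not part of python-pachyderm: `read_csv_rows` is a hypothetical helper, and an `io.BytesIO` object stands in for the file returned by `client.get_file(...)` so the example runs without a cluster. It assumes only that the file object's `read()` returns bytes, as in the first example.

```python
import csv
import io


def read_csv_rows(f, encoding="utf-8"):
    """Parse CSV bytes from a file-like object into a list of dicts.

    `f` is any object whose read() returns bytes. This helper is
    hypothetical and is not provided by python-pachyderm.
    """
    text = f.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for the object returned by client.get_file(...), so that the
# sketch runs without a Pachyderm cluster.
fake_file = io.BytesIO(b"name,value\na,1\nb,2\n")
rows = read_csv_rows(fake_file)
print(rows)
```

This reads the whole file into memory before parsing, so it is best suited to small files.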
For more sophisticated examples, see the examples directory.
Prior to python-pachyderm 2.0, this library's versioning synced with pachyderm's core versioning; e.g. version 1.8.5 of this library synced with 1.8.5 of pachyderm core. python-pachyderm 2.0 onwards uses semver instead, so versions are not tied to pachyderm core. This was done for two reasons:
However, if you need to know which version of pachyderm core a given version of python-pachyderm was built with, consult CHANGELOG.md. As a broad rule of thumb, we recommend working with the latest version of both pachyderm core and python-pachyderm where possible.
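As an illustration of the versioning split described above, here is a small sketch that tells whether a python-pachyderm version string predates the switch to semver. Both function names are hypothetical, and the parser assumes plain dotted numeric versions such as `1.8.5`; it does not handle pre-release suffixes.

```python
def parse_version(version):
    """Split a dotted numeric version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def tracks_core_version(version):
    """Return True for pre-2.0 releases, whose numbers mirrored pachyderm core."""
    return parse_version(version) < (2, 0)


print(tracks_core_version("1.8.5"))
print(tracks_core_version("2.0.0"))
```

For releases where this returns `False`, the version number alone says nothing about pachyderm core, so CHANGELOG.md remains the place to look.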
This driver is co-maintained by Pachyderm and the community. If you're looking to contribute to the project, this is a fantastic place to get involved. Take a look at the contributing guide for more info (including testing instructions).