Dataset SDK
The main class for datasets is Dataset.
from dioptra.lake.datasets import Dataset as LMLDataset
class LMLDataset()
Create a dataset
To create a dataset, use the create method.
def create(
    self,
    name: str
)
name: the name of the dataset
Alternatively, use the get_or_create method, which returns the dataset if it already exists and creates it otherwise.
def get_or_create(
    self,
    name: str
)
name: the name of the dataset
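The difference between the two methods can be illustrated with a small in-memory stand-in. This is only a sketch of the behavior described above, not the Dioptra SDK: FakeDataset, its _registry dictionary, and the dataset_id attribute are hypothetical, and the real class talks to the Dioptra backend. How the real create method treats a name that already exists is not stated here, so the stand-in simply registers a new record.

```python
import itertools


class FakeDataset:
    """Hypothetical in-memory stand-in for a dataset client (not the Dioptra SDK)."""

    _registry = {}               # name -> dataset id; stands in for the backend
    _ids = itertools.count(1)    # source of fresh ids (assumption for the sketch)

    def __init__(self):
        self.name = None
        self.dataset_id = None

    def create(self, name: str) -> "FakeDataset":
        # Register a new record under `name` and bind this object to it.
        self.name = name
        self.dataset_id = next(FakeDataset._ids)
        FakeDataset._registry[name] = self.dataset_id
        return self

    def get_or_create(self, name: str) -> "FakeDataset":
        # Reuse the existing record if there is one, otherwise create it.
        if name in FakeDataset._registry:
            self.name = name
            self.dataset_id = FakeDataset._registry[name]
            return self
        return self.create(name)


first = FakeDataset().get_or_create("road-signs")
second = FakeDataset().get_or_create("road-signs")
print(first.dataset_id == second.dataset_id)  # True: the second call reuses the record
```

Calling get_or_create is the safer choice in scripts that may run more than once, since a rerun picks up the dataset made by the previous run.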
Retrieve an existing dataset
You can retrieve a dataset by its name. The version retrieved is the current head.
Parameter: the name of the dataset to retrieve
Add data to a dataset
You can add datapoints to a dataset by their UUIDs. The dataset head then moves to an uncommitted version (a dirty head).
Parameter: list of ids of the datapoints to add to the dataset
Remove data from a dataset
You can remove datapoints from a dataset by their UUIDs. The dataset head then moves to an uncommitted version (a dirty head).
Parameter: list of ids of the datapoints to remove from the dataset
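The idea of a dirty head can be made concrete with a toy model. The sketch below is not the SDK's implementation: WorkingDataset, its method names add_datapoints and remove_datapoints, and the dirty flag are all assumptions chosen for illustration. It only shows that any add or remove leaves the head in an uncommitted state.

```python
class WorkingDataset:
    """Toy model of a dataset head that tracks uncommitted changes (hypothetical)."""

    def __init__(self, committed_ids=()):
        self.current = set(committed_ids)  # datapoint ids visible at the head
        self.dirty = False                 # True once the head is uncommitted

    def add_datapoints(self, datapoint_ids):  # method name is an assumption
        self.current.update(datapoint_ids)
        self.dirty = True                      # head now points at a dirty version

    def remove_datapoints(self, datapoint_ids):  # method name is an assumption
        self.current.difference_update(datapoint_ids)
        self.dirty = True


ds = WorkingDataset(["a", "b"])
print(ds.dirty)                       # False: nothing changed since the last commit
ds.add_datapoints(["c"])
ds.remove_datapoints(["a"])
print(sorted(ds.current), ds.dirty)   # ['b', 'c'] True
```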
Download a dataset
To download the datapoints in a dataset, call the download_datapoints method.
Commit a dataset
You can commit a new version of a dataset. This commits the dirty head.
Parameter: the commit message
Checkout a dataset
You can check out a previous version of a dataset. This sets the head of the dataset to that version for all consumers of the dataset.
Parameter: the id of the commit to check out
Get a dataset commit history
To get the history of all commits of a dataset, call the history method.
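The commit, checkout, and history behavior described above resembles a simple linear version log. The following is a minimal in-memory sketch under stated assumptions, not the Dioptra implementation: VersionedDataset, Commit, the add helper, the oldest-first ordering of history, and the choice to discard uncommitted changes on checkout are all hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    commit_id: str
    message: str
    datapoint_ids: frozenset


class VersionedDataset:
    """Toy model of dataset versioning (hypothetical, in memory only)."""

    def __init__(self):
        self._commits = []    # all commits, oldest first (ordering is an assumption)
        self.head = None      # id of the commit the head points at
        self.working = set()  # datapoint ids at the (possibly dirty) head
        self.dirty = False

    def add(self, datapoint_ids):
        self.working.update(datapoint_ids)
        self.dirty = True

    def commit(self, message):
        # Freeze the dirty head into a new version and move the head to it.
        new = Commit(uuid.uuid4().hex, message, frozenset(self.working))
        self._commits.append(new)
        self.head = new.commit_id
        self.dirty = False
        return new.commit_id

    def checkout(self, commit_id):
        # Move the head to an earlier version; here uncommitted changes are dropped.
        for c in self._commits:
            if c.commit_id == commit_id:
                self.head = commit_id
                self.working = set(c.datapoint_ids)
                self.dirty = False
                return
        raise KeyError(f"unknown commit id: {commit_id}")

    def history(self):
        return [(c.commit_id, c.message) for c in self._commits]


ds = VersionedDataset()
ds.add(["a"])
v1 = ds.commit("first version")
ds.add(["b"])
v2 = ds.commit("add b")
ds.checkout(v1)
print(sorted(ds.working))                     # ['a']
print([msg for _, msg in ds.history()])       # ['first version', 'add b']
```

In this model a checkout changes only where the head points, so later commits stay in the history and can be checked out again.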
Delete a dataset
You can delete a dataset using the delete method. THIS IS NOT REVERSIBLE. It is the equivalent of deleting a Git repository.
Listing all datasets
To list all datasets, use the list_datasets method.