Dataset SDK
The main class for datasets is Dataset, imported below under the alias LMLDataset.
from dioptra.lake.datasets import Dataset as LMLDataset
class LMLDataset()
Create a dataset
To create a dataset, use the create method.
def create(
    self,
    name: str
)
name: name of the dataset to create
Alternatively, use the get_or_create method, which creates the dataset if it doesn't exist and retrieves it otherwise.
def get_or_create(
    self,
    name: str
)
name: name of the dataset to get or create
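As a rough usage sketch for the two methods above, the helper below instantiates the class and calls get_or_create on it. It assumes, as the signatures suggest, that the call binds the instance to the named dataset. The helper name open_dataset and the dataset name in the comment are hypothetical, and the class is passed in as a parameter so the sketch does not depend on the SDK being installed.

```python
from typing import Any, Callable


def open_dataset(dataset_factory: Callable[[], Any], name: str) -> Any:
    """Instantiate a dataset object and bind it to `name`, creating it if needed.

    `dataset_factory` is expected to be the SDK's Dataset class (or anything
    with a no-argument constructor and a `get_or_create` method).
    """
    dataset = dataset_factory()
    # Assumption: get_or_create acts on the instance in place, so any return
    # value is ignored here.
    dataset.get_or_create(name)
    return dataset


# With the SDK installed, usage might look like this (hypothetical name):
#   from dioptra.lake.datasets import Dataset as LMLDataset
#   dataset = open_dataset(LMLDataset, "my-dataset")
```

Calling create instead would follow the same shape, with the difference that get_or_create also covers the case where the dataset already exists.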
Retrieve an existing dataset
You can retrieve a dataset by its name. The version retrieved is the current head.
def get_from_name(
    self,
    name: str
)
name: the name of the dataset to retrieve
Add data to a dataset
You can add datapoints to a dataset using their UUIDs. The dataset's head then moves to an uncommitted version (a dirty head).
def add_datapoints(
    self,
    datapoint_ids: List[str]
)
datapoint_ids: list of ids of the datapoints to add to the dataset
Remove data from a dataset
You can remove datapoints from a dataset using their UUIDs. The dataset's head then moves to an uncommitted version (a dirty head).
def remove_datapoints(
    self,
    datapoint_ids: List[str]
)
datapoint_ids: list of ids of the datapoints to remove from the dataset
Download a dataset
To download the datapoints in the dataset, call the download_datapoints method.
def download_datapoints(self)
Commit a dataset
You can commit a new version of a dataset. This commits the dirty head.
def commit(
    self,
    message: str
)
message: the commit message
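The add, remove, and commit steps described above can be chained into one small helper. This is a minimal sketch, not part of the SDK: the name stage_and_commit and its parameters are hypothetical, and it only relies on the add_datapoints, remove_datapoints, and commit signatures shown in this page.

```python
from typing import Any, List, Optional


def stage_and_commit(
    dataset: Any,
    message: str,
    add_ids: Optional[List[str]] = None,
    remove_ids: Optional[List[str]] = None,
) -> None:
    """Stage additions and removals on the dirty head, then commit them."""
    if not add_ids and not remove_ids:
        # Refuse to commit when there is nothing to stage.
        raise ValueError("nothing to stage: pass add_ids and/or remove_ids")
    if add_ids:
        dataset.add_datapoints(add_ids)
    if remove_ids:
        dataset.remove_datapoints(remove_ids)
    # Turn the uncommitted (dirty) head into a new committed version.
    dataset.commit(message)


# With the SDK installed, usage might look like this (hypothetical values):
#   from dioptra.lake.datasets import Dataset as LMLDataset
#   dataset = LMLDataset()
#   dataset.get_from_name("my-dataset")
#   stage_and_commit(dataset, "add first batch", add_ids=["<datapoint-uuid>"])
```

Grouping the edits before a single commit keeps each committed version meaningful, much like batching related changes into one git commit.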
Checkout a dataset
You can check out a previous version of a dataset. This sets the head of the dataset to that version for all consumers of the dataset.
def checkout(
    self,
    commit_id: str
)
commit_id: id of the commit to check out
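One way checkout might be combined with download_datapoints is to fetch the datapoints of a specific committed version. The sketch below assumes that download_datapoints acts on whatever version is currently the head. Its return value is not described in this page, so the helper simply passes it through. The name download_at_commit is hypothetical.

```python
from typing import Any


def download_at_commit(dataset: Any, commit_id: str) -> Any:
    """Move the dataset head to `commit_id`, then download its datapoints."""
    # Note: per the description above, this moves the head for every
    # consumer of the dataset, not only for this process.
    dataset.checkout(commit_id)
    # Assumption: the download reflects the version that is now the head.
    return dataset.download_datapoints()


# With the SDK installed, usage might look like this (hypothetical values):
#   from dioptra.lake.datasets import Dataset as LMLDataset
#   dataset = LMLDataset()
#   dataset.get_from_name("my-dataset")
#   datapoints = download_at_commit(dataset, "<commit-id>")
```

Because the checkout is visible to all consumers, a helper like this is better suited to deliberate rollbacks than to casual inspection of old versions.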
Get a dataset commit history
To get the history of all commits of a dataset, call the history method.
def history(self)
Delete a dataset
You can delete a dataset using the delete method. THIS IS NOT REVERSIBLE. It is the equivalent of deleting a git repository.
def delete(self)
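Since deletion cannot be undone, callers may want a guard in front of it. The following is a minimal sketch of one such guard, not something the SDK provides: the helper delete_dataset and its confirm_name parameter are hypothetical, and it assumes get_from_name binds the instance to the named dataset before delete is called.

```python
from typing import Any


def delete_dataset(dataset: Any, name: str, confirm_name: str) -> None:
    """Delete the dataset called `name` only if the caller repeats its name."""
    if confirm_name != name:
        # Abort before touching the dataset at all.
        raise ValueError(
            f"confirmation {confirm_name!r} does not match dataset name {name!r}"
        )
    # Assumption: get_from_name binds this instance to the named dataset.
    dataset.get_from_name(name)
    # Irreversible: removes the dataset and its whole commit history.
    dataset.delete()


# With the SDK installed, usage might look like this (hypothetical name):
#   from dioptra.lake.datasets import Dataset as LMLDataset
#   delete_dataset(LMLDataset(), "old-dataset", confirm_name="old-dataset")
```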
Listing all datasets
To list all datasets, use the list_datasets function.
from dioptra.lake.datasets import list_datasets
def list_datasets()