Dataset SDK

The main class for datasets is Dataset, imported here under the alias LMLDataset.

from dioptra.lake.datasets import Dataset as LMLDataset
class LMLDataset()

Create a dataset

To create a dataset, use the create method.

def create(
    self,
    name: str
)
Parameter    Description

name         The name of the dataset to create.

Alternatively, use the get_or_create method, which returns the dataset with the given name if it already exists and creates it otherwise.

def get_or_create(
    self,
    name: str
)
Parameter    Description

name         The name of the dataset to get or create.
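
The difference between create and get_or_create can be pictured with a small in-memory model. This is only a sketch: ToyDataset and ToyRegistry are hypothetical names invented for illustration, not part of the Dioptra SDK, and the choice to raise an error when create is called with an existing name is an assumption that the source does not confirm.

```python
from typing import Dict


class ToyDataset:
    """Hypothetical stand-in for a dataset record; not the Dioptra class."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyRegistry:
    """In-memory registry illustrating create vs. get_or_create semantics."""

    def __init__(self) -> None:
        self._by_name: Dict[str, ToyDataset] = {}

    def create(self, name: str) -> ToyDataset:
        # Assumption: creating a name that already exists is an error.
        if name in self._by_name:
            raise ValueError(f"dataset {name!r} already exists")
        dataset = ToyDataset(name)
        self._by_name[name] = dataset
        return dataset

    def get_or_create(self, name: str) -> ToyDataset:
        # Return the existing dataset if there is one, otherwise create it.
        existing = self._by_name.get(name)
        if existing is not None:
            return existing
        return self.create(name)


registry = ToyRegistry()
first = registry.get_or_create("training-set")   # not present yet, so created
second = registry.get_or_create("training-set")  # already present, so fetched
print(first is second)  # True: both calls refer to the same dataset
```

In this model get_or_create is safe to call repeatedly, which makes it the more convenient choice in scripts that may run more than once.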

Retrieve an existing dataset

You can retrieve a dataset by its name. The version retrieved is the current head.

Parameter
Description

the name of the dataset to retrieve

Add data to a dataset

You can add datapoints to a dataset using their UUIDs. The dataset's version head then moves to an uncommitted version (a dirty head).

Parameter
Description

list of ids of datapoints to be added to the dataset

Remove data from a dataset

You can remove datapoints from a dataset using their UUIDs. The dataset's version head then moves to an uncommitted version (a dirty head).

Parameter
Description

list of ids of datapoints to be removed from the dataset

Download a dataset

To download the datapoints in the dataset, call the download_datapoints method.

Commit a dataset

You can commit a new version of a dataset. This commits the dirty head.

Parameter
Description

commit message

Checkout a dataset

You can check out a previous version of a dataset. This sets the head of the dataset to that version for all consumers of the dataset.

Parameter
Description

commit id to checkout

Get a dataset commit history

To get the history of all commits of a dataset, call the history method.
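
The versioning behaviour described in the sections above (a dirty head after adding or removing datapoints, commit, checkout, and history) can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the Dioptra implementation: ToyCommit and ToyVersionedDataset, their method names other than history, and all of their fields are hypothetical and exist only for illustration. The model also assumes that any add or remove call marks the head as dirty, and that checking out a commit restores exactly the datapoints recorded in that commit.

```python
import uuid
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class ToyCommit:
    """Immutable snapshot of the datapoint ids at commit time."""

    commit_id: str
    message: str
    datapoints: FrozenSet[str]


@dataclass
class ToyVersionedDataset:
    """Hypothetical in-memory model; not the Dioptra implementation."""

    name: str
    commits: List[ToyCommit] = field(default_factory=list)
    head: Optional[str] = None   # commit id the head currently points to
    working: Set[str] = field(default_factory=set)
    dirty: bool = False          # True while there are uncommitted changes

    def add(self, ids: List[str]) -> None:
        # Adding datapoints moves the head to an uncommitted (dirty) state.
        self.working |= set(ids)
        self.dirty = True

    def remove(self, ids: List[str]) -> None:
        # Removing datapoints also leaves the head dirty.
        self.working -= set(ids)
        self.dirty = True

    def commit(self, message: str) -> str:
        # Snapshot the working set and make it the new clean head.
        commit = ToyCommit(uuid.uuid4().hex, message, frozenset(self.working))
        self.commits.append(commit)
        self.head = commit.commit_id
        self.dirty = False
        return commit.commit_id

    def checkout(self, commit_id: str) -> None:
        # Point the head at an earlier commit and restore its datapoints.
        for commit in self.commits:
            if commit.commit_id == commit_id:
                self.head = commit_id
                self.working = set(commit.datapoints)
                self.dirty = False
                return
        raise KeyError(f"unknown commit {commit_id!r}")

    def history(self) -> List[ToyCommit]:
        # Oldest commit first.
        return list(self.commits)


ds = ToyVersionedDataset("training-set")
a, b, c = (str(uuid.uuid4()) for _ in range(3))

ds.add([a, b])
v1 = ds.commit("initial datapoints")

ds.add([c])
ds.remove([a])
was_dirty = ds.dirty            # True: changes not yet committed
v2 = ds.commit("swap a for c")

ds.checkout(v1)                 # head and datapoints return to the first commit
print([commit.message for commit in ds.history()])
```

In this toy model the head is local to one object. The checkout described above is stronger: it moves the head for all consumers of the dataset, so it is better thought of as changing shared state than as switching a local working copy.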

Delete a dataset

You can delete a dataset using the delete method. THIS IS NOT REVERSIBLE. It is the equivalent of deleting a git repository.

List all datasets

To list all datasets, use the list_datasets method.
