Dataset SDK

The main class for datasets is Dataset. The import below aliases it as LMLDataset.

from dioptra.lake.datasets import Dataset as LMLDataset
class LMLDataset()

Create a dataset

To create a dataset, use the create method.

def create(
    self,
    name: str
)
Parameter | Description
name | name of the dataset to create

Alternatively, you can use the get_or_create method, which retrieves the dataset if it already exists and creates it otherwise.

def get_or_create(
    self,
    name: str
)
Parameter | Description
name | name of the dataset to retrieve or create
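As a rough illustration of how get_or_create might be wrapped, the sketch below defines a hypothetical helper, open_dataset, that takes the dataset class as an argument. It assumes the class has a no-argument constructor and that get_or_create binds the handle in place; neither is confirmed by this page. FakeDataset is an in-memory stand-in invented for the example so that it runs without the SDK.

```python
def open_dataset(dataset_cls, name):
    """Return a dataset handle for `name`, creating the dataset if needed."""
    if not name:
        raise ValueError("dataset name must be a non-empty string")
    dataset = dataset_cls()      # assumption: no-argument constructor
    dataset.get_or_create(name)  # documented method; assumed to bind the handle in place
    return dataset


class FakeDataset:
    """Hypothetical in-memory stand-in, used only to exercise the helper offline."""

    store = set()  # names of the datasets that "exist"

    def __init__(self):
        self.name = None

    def get_or_create(self, name):
        FakeDataset.store.add(name)  # adding to a set is a no-op if the name exists
        self.name = name


# Calling the helper twice with the same name does not create a second dataset.
first = open_dataset(FakeDataset, "road-signs")
second = open_dataset(FakeDataset, "road-signs")
```

With the SDK installed, the same helper would presumably be called as open_dataset(Dataset, "road-signs") after importing Dataset from dioptra.lake.datasets.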

Retrieve an existing dataset

You can retrieve a dataset by its name. The version retrieved is the current head.

def get_from_name(
    self,
    name: str
)
Parameter | Description
name | the name of the dataset to retrieve

Add data to a dataset

You can add datapoints to a dataset using their UUIDs. The dataset version head then moves to an uncommitted version (dirty head).

def add_datapoints(
    self,
    datapoint_ids: List[str]
)
Parameter | Description
datapoint_ids | list of ids of datapoints to be added to the dataset

Remove data from a dataset

You can remove datapoints from a dataset using their UUIDs. The dataset version head then moves to an uncommitted version (dirty head).

def remove_datapoints(
    self,
    datapoint_ids: List[str]
)
Parameter | Description
datapoint_ids | list of ids of datapoints to be removed from the dataset
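One way to combine add_datapoints and remove_datapoints is to compute the difference between the ids a dataset currently holds and the ids it should hold, then issue one call for each direction. The sketch below is only an illustration: plan_sync and apply_sync are hypothetical helpers, the caller is assumed to already know the current ids (this page does not document a way to list them), and RecordingDataset is a stand-in that records calls instead of talking to the service.

```python
from typing import Iterable, List, Tuple


def plan_sync(current_ids: Iterable[str], wanted_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (ids to add, ids to remove) needed to turn current into wanted."""
    current, wanted = set(current_ids), set(wanted_ids)
    return sorted(wanted - current), sorted(current - wanted)


def apply_sync(dataset, current_ids, wanted_ids):
    """Apply the plan using the two documented methods; skip empty calls."""
    to_add, to_remove = plan_sync(current_ids, wanted_ids)
    if to_add:
        dataset.add_datapoints(to_add)
    if to_remove:
        dataset.remove_datapoints(to_remove)
    return to_add, to_remove


class RecordingDataset:
    """Hypothetical stand-in that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add_datapoints(self, datapoint_ids):
        self.calls.append(("add", list(datapoint_ids)))

    def remove_datapoints(self, datapoint_ids):
        self.calls.append(("remove", list(datapoint_ids)))


recorder = RecordingDataset()
added, removed = apply_sync(recorder, current_ids=["a", "b"], wanted_ids=["b", "c"])
```

Because either call leaves the dataset on a dirty head, a commit would normally follow once the sync is complete.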

Download a dataset

To download the datapoints in the dataset, call the download_datapoints method.

def download_datapoints(self)

Commit a dataset

You can commit a new version of a dataset. This commits the dirty head, that is, the uncommitted changes made by adding or removing datapoints.

def commit(
    self,
    message: str
)
Parameter | Description
message | commit message
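A typical sequence is to stage datapoints and then commit them with a message. The following sketch shows one possible wrapper, add_and_commit, which is a hypothetical helper rather than part of the SDK. It calls only the documented add_datapoints and commit methods, and it is exercised here against a stand-in object (StubDataset) so it can run offline.

```python
def add_and_commit(dataset, datapoint_ids, message):
    """Stage datapoints, then commit the resulting dirty head.

    Returns False without calling the dataset when there is nothing to stage.
    """
    if not message.strip():
        raise ValueError("commit message must not be empty")
    ids = list(datapoint_ids)
    if not ids:
        return False  # nothing staged, so there is no dirty head to commit
    dataset.add_datapoints(ids)  # head moves to an uncommitted version
    dataset.commit(message)      # dirty head becomes a new committed version
    return True


class StubDataset:
    """Hypothetical stand-in that logs method calls in order."""

    def __init__(self):
        self.log = []

    def add_datapoints(self, datapoint_ids):
        self.log.append(("add_datapoints", tuple(datapoint_ids)))

    def commit(self, message):
        self.log.append(("commit", message))


stub = StubDataset()
committed = add_and_commit(stub, ["id-1", "id-2"], "add first labeled batch")
skipped = add_and_commit(stub, [], "nothing to add")
```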

Checkout a dataset

You can check out a previous version of a dataset. This sets the head of the dataset to that version for all consumers of the dataset.

def checkout(
    self,
    commit_id: str
)
Parameter | Description
commit_id | commit id to check out

Get a dataset commit history

To get the history of all commits of a dataset, call the history method.

def history(self)
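To make the relationship between the dirty head, commit, checkout, and history more concrete, here is a small in-memory model. It is a toy written for this page, not the SDK implementation: ToyVersionedDataset, its attributes, the commit ids it generates, and the shape of the value returned by history are all assumptions.

```python
import uuid


class ToyVersionedDataset:
    """Hypothetical in-memory model of dataset versioning."""

    def __init__(self):
        self._commits = {}     # commit_id -> (message, frozenset of datapoint ids)
        self._order = []       # commit ids, oldest first
        self._working = set()  # datapoint ids at the current head
        self.head = None       # id of the last committed or checked-out version
        self.dirty = False     # True while there are uncommitted changes

    def add_datapoints(self, datapoint_ids):
        self._working |= set(datapoint_ids)
        self.dirty = True

    def remove_datapoints(self, datapoint_ids):
        self._working -= set(datapoint_ids)
        self.dirty = True

    def commit(self, message):
        commit_id = uuid.uuid4().hex
        self._commits[commit_id] = (message, frozenset(self._working))
        self._order.append(commit_id)
        self.head = commit_id
        self.dirty = False
        return commit_id

    def checkout(self, commit_id):
        _, ids = self._commits[commit_id]  # raises KeyError for an unknown id
        self._working = set(ids)
        self.head = commit_id
        self.dirty = False

    def history(self):
        return [(cid, self._commits[cid][0]) for cid in self._order]

    def datapoint_ids(self):
        return sorted(self._working)


ds = ToyVersionedDataset()
ds.add_datapoints(["a", "b"])
v1 = ds.commit("initial import")
ds.remove_datapoints(["a"])
v2 = ds.commit("drop mislabeled point")
ds.checkout(v1)  # head moves back; the contents of v1 are visible again
```

In this model, checking out v1 leaves both commits in the history and only moves the head, which mirrors the git analogy used elsewhere on this page.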

Delete a dataset

You can delete a dataset using the delete method. THIS IS NOT REVERSIBLE: deleting a dataset is the equivalent of deleting a git repository.

def delete(self)
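Because deletion cannot be undone, calling code may want a guard in front of it. The sketch below shows one possible pattern, where the caller must retype the dataset name before delete is invoked. delete_dataset and FakeDataset are hypothetical names introduced for the example; only the delete method itself comes from this page.

```python
def delete_dataset(dataset, dataset_name, confirmation):
    """Delete `dataset` only if the caller retyped its name exactly."""
    if confirmation != dataset_name:
        raise ValueError(
            f"confirmation {confirmation!r} does not match {dataset_name!r}"
        )
    dataset.delete()  # documented method; irreversible


class FakeDataset:
    """Hypothetical stand-in that only remembers whether delete was called."""

    def __init__(self):
        self.deleted = False

    def delete(self):
        self.deleted = True


target = FakeDataset()
refused = ""
try:
    delete_dataset(target, "road-signs", "road-sign")  # typo in the confirmation
except ValueError as err:
    refused = str(err)
survived = not target.deleted  # the mismatched confirmation left the dataset alone
delete_dataset(target, "road-signs", "road-signs")
```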

Listing all datasets

To list all datasets, use the list_datasets function.

from dioptra.lake.datasets import list_datasets
def list_datasets()
