Ingestion basics

In katiML, data is ingested as a list of records. Each record is centered on a Datapoint. Tags, Ground Truths and Predictions are attachments to that Datapoint.

# The Datapoint
{
    'metadata': {
        'uri': 's3://my_bucket/myfile.jpg'
    },
    'type': 'IMAGE',
    # The tags
    'tags': {
        'source': 'stanford_dogs'
    },
    # The ground truths
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    # The predictions
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}
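
Before ingesting, it can help to sanity-check a record locally. The sketch below is not part of the katiML SDK: check_record is a hypothetical helper, and the rules it applies (a record needs either an 'id' or a metadata 'uri', every attachment needs a 'task_type', and 'class_names' must line up with 'logits') are assumptions inferred from the example record above, not documented validation rules.

```python
def check_record(record):
    """Return a list of problems found in a record dict (empty if none).

    Hypothetical helper: the checks are assumptions based on the example
    record, not katiML's own validation.
    """
    problems = []
    # Assumption: a new datapoint carries metadata['uri'], an update carries 'id'.
    if 'id' not in record and 'uri' not in record.get('metadata', {}):
        problems.append("record has neither 'id' nor metadata['uri']")
    for i, gt in enumerate(record.get('groundtruths', [])):
        if 'task_type' not in gt:
            problems.append(f"groundtruths[{i}] is missing 'task_type'")
    for i, pred in enumerate(record.get('predictions', [])):
        if 'task_type' not in pred:
            problems.append(f"predictions[{i}] is missing 'task_type'")
        names = pred.get('class_names', [])
        logits = pred.get('logits', [])
        # Each class name should have exactly one logit.
        if len(names) != len(logits):
            problems.append(
                f"predictions[{i}] has {len(names)} class names "
                f"but {len(logits)} logits"
            )
    return problems


record = {
    'metadata': {'uri': 's3://my_bucket/myfile.jpg'},
    'type': 'IMAGE',
    'tags': {'source': 'stanford_dogs'},
    'groundtruths': [{'task_type': 'CLASSIFICATION', 'class_name': 'chihuahua'}],
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4],
    }],
}

print(check_record(record))  # prints []
```

A check like this catches malformed records, such as a missing logit, before they are sent for ingestion.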

Creating a Datapoint

To create a Datapoint with its attachments, build a record and ingest it. Ingestion creates the Datapoint and attaches the Tags, Ground Truths and Predictions listed in the record.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'metadata': {
        'uri': 'https://dioptra-demo.s3.us-east-2.amazonaws.com/stanford-dogs-dataset/n02085620-Chihuahua/n02085620_8578.jpg'
    },
    'type': 'IMAGE',
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    'tags': {
        'source': 'stanford_dogs'
    }
}])

wait_for_upload(upload_id)
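
Since ingestion takes a list of records, several Datapoints can be prepared in one go. The following is a minimal sketch, assuming the labels are available as (uri, class_name) pairs. build_image_records is a hypothetical helper and the URIs are placeholders. Only the record layout comes from the example above.

```python
def build_image_records(labeled_uris, source):
    """Turn (uri, class_name) pairs into a list of IMAGE records.

    Hypothetical helper: it only assembles dicts in the layout shown in
    the example above and does not call the katiML SDK.
    """
    return [
        {
            'metadata': {'uri': uri},
            'type': 'IMAGE',
            'groundtruths': [{
                'task_type': 'CLASSIFICATION',
                'class_name': class_name,
            }],
            'tags': {'source': source},
        }
        for uri, class_name in labeled_uris
    ]


# Placeholder URIs, for illustration only.
records = build_image_records(
    [
        ('s3://my_bucket/dog_001.jpg', 'chihuahua'),
        ('s3://my_bucket/dog_002.jpg', 'pug'),
    ],
    source='stanford_dogs',
)

print(len(records))  # prints 2
```

The resulting list can be passed as the records argument of upload_to_lake, in the same way as the single-record list above.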

Updating a Datapoint

A Datapoint can be updated to add more attachments or to change its fields. To do so, ingest a record that carries the id of the Datapoint you want to update.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho', # the datapoint id
    # This will add a new prediction.
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])

wait_for_upload(upload_id)

NOTE: A Datapoint can have only one Ground Truth per task type, and only one Prediction per combination of task type and model name. Ingesting a conflicting value replaces the previous attachment.
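
The uniqueness rule in the note can be pictured as a dictionary keyed by (task type, model name), where a later value replaces an earlier one with the same key. The sketch below is a local simulation, not katiML's implementation: merge_predictions is a hypothetical function, and the 'model_name' field name and the model names are assumptions made for illustration.

```python
def merge_predictions(existing, incoming):
    """Simulate the override rule for predictions on one datapoint.

    Hypothetical illustration: predictions sharing the same
    (task_type, model_name) key replace each other, later entries win.
    """
    merged = {}
    for pred in existing + incoming:
        # 'model_name' is an assumed field name; a missing one maps to None.
        key = (pred['task_type'], pred.get('model_name'))
        merged[key] = pred
    return list(merged.values())


existing = [{
    'task_type': 'CLASSIFICATION',
    'model_name': 'resnet_v1',
    'class_names': ['chihuahua', 'pug'],
    'logits': [0.9, 1.4],
}]
incoming = [
    {   # Same task type and model name: replaces the existing prediction.
        'task_type': 'CLASSIFICATION',
        'model_name': 'resnet_v1',
        'class_names': ['chihuahua', 'pug'],
        'logits': [2.1, 0.3],
    },
    {   # Different model name: kept alongside the first one.
        'task_type': 'CLASSIFICATION',
        'model_name': 'resnet_v2',
        'class_names': ['chihuahua', 'pug'],
        'logits': [1.0, 1.2],
    },
]

result = merge_predictions(existing, incoming)
print(len(result))  # prints 2
```

The same picture applies to Ground Truths, with the task type alone as the key.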

Updating an Attachment

Similarly, an attachment can be updated by ingesting a record that includes the attachment's id. The ingested values replace the previous attachment.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho', # the datapoint id
    'predictions': [{
        # This will update the prediction 'bohuih-156njb-nnouho'
        'id': 'bohuih-156njb-nnouho', # the prediction id
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])

wait_for_upload(upload_id)


