Ingestion basics
In katiML, data is ingested as a list of records. Each record is centered on a Datapoint; Tags, Ground Truths, and Predictions are attachments to that Datapoint.
# The Datapoint
{
    'metadata': {
        'uri': 's3://my_bucket/myfile.jpg'
    },
    'type': 'IMAGE',
    # The tags
    'tags': {
        'source': 'stanford_dogs'
    },
    # The ground truths
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    # The predictions
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}
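In the prediction above, `class_names` and `logits` are parallel lists, so each logit scores the class at the same position. The sketch below illustrates that pairing locally by applying a standard softmax to the logits. The helper names `softmax` and `top_class` are hypothetical and not part of the dioptra package, and reading the logits through a softmax is an assumption about how they are meant to be interpreted.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(prediction):
    """Return (class_name, probability) for the highest-scoring class.

    Hypothetical helper: assumes 'class_names' and 'logits' are aligned lists.
    """
    class_names = prediction['class_names']
    logits = prediction['logits']
    if len(class_names) != len(logits):
        raise ValueError('class_names and logits must have the same length')
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


prediction = {
    'task_type': 'CLASSIFICATION',
    'class_names': ['chihuahua', 'pug'],
    'logits': [0.9, 1.4]
}
name, prob = top_class(prediction)
print(name, round(prob, 3))
```

Checking that both lists have the same length before building a record is a cheap way to catch misaligned predictions before they are ingested.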
Creating a Datapoint
To create a Datapoint with its attachments, build a record and ingest it. Ingestion creates the Datapoint and attaches the Tags, Ground Truths, and Predictions that the record contains.
import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'
from dioptra.lake.utils import upload_to_lake, wait_for_upload
upload_id = upload_to_lake(records=[{
    'metadata': {
        'uri': 'https://dioptra-demo.s3.us-east-2.amazonaws.com/stanford-dogs-dataset/n02085620-Chihuahua/n02085620_8578.jpg'
    },
    'type': 'IMAGE',
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    'tags': {
        'source': 'stanford_dogs'
    }
}])
wait_for_upload(upload_id)
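Since records are ingested as a list, several Datapoints can be prepared in one pass. The following is a minimal sketch of building that list from (uri, label) pairs. The helper `build_records` and the example URIs are hypothetical placeholders; only the record layout comes from the example above.

```python
def build_records(samples, source):
    """Build one IMAGE record per (uri, class_name) pair.

    Hypothetical helper: mirrors the record layout used in the example above.
    """
    records = []
    for uri, class_name in samples:
        records.append({
            'metadata': {'uri': uri},
            'type': 'IMAGE',
            'groundtruths': [{
                'task_type': 'CLASSIFICATION',
                'class_name': class_name
            }],
            'tags': {'source': source}
        })
    return records


# Placeholder URIs for illustration only.
samples = [
    ('s3://my_bucket/dogs/chihuahua_001.jpg', 'chihuahua'),
    ('s3://my_bucket/dogs/pug_001.jpg', 'pug'),
]
records = build_records(samples, source='stanford_dogs')
print(len(records))
```

The resulting list has the same shape as the `records` argument passed to `upload_to_lake` above.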
Updating a Datapoint
A Datapoint can be updated to add more attachments or to change its fields. To do so, ingest a record that contains the id of the Datapoint you want to update.
import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'
from dioptra.lake.utils import upload_to_lake, wait_for_upload
upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho',  # the datapoint id
    # This will add a new prediction.
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])
wait_for_upload(upload_id)
NOTE: A Datapoint can have only one ground truth per task type, and only one prediction per task type and model name. Ingesting a conflicting value overrides the previous attachment.
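The uniqueness rule in the note can be pictured as a keyed merge: ground truths are keyed by task type, and predictions by task type plus model name. The sketch below is a local simulation of that rule, not katiML's implementation. The helper names are hypothetical, and the `model_name` field name is an assumption, since the note mentions a model name without showing the field.

```python
def attachment_key(kind, attachment):
    """Return the key that must be unique for this kind of attachment."""
    if kind == 'groundtruths':
        return (attachment['task_type'],)
    # Assumed field name: 'model_name' is not shown in the examples above.
    return (attachment['task_type'], attachment.get('model_name'))


def merge_attachments(existing, incoming, kind):
    """Simulate ingestion: an incoming attachment replaces any existing one with the same key."""
    merged = {attachment_key(kind, a): a for a in existing}
    for attachment in incoming:
        merged[attachment_key(kind, attachment)] = attachment
    return list(merged.values())


# A second CLASSIFICATION ground truth overrides the first.
groundtruths = merge_attachments(
    [{'task_type': 'CLASSIFICATION', 'class_name': 'chihuahua'}],
    [{'task_type': 'CLASSIFICATION', 'class_name': 'pug'}],
    'groundtruths'
)

# Predictions from different models coexist; the same model overrides itself.
predictions = merge_attachments(
    [{'task_type': 'CLASSIFICATION', 'model_name': 'model_a', 'logits': [0.9, 1.4]}],
    [
        {'task_type': 'CLASSIFICATION', 'model_name': 'model_b', 'logits': [0.2, 0.3]},
        {'task_type': 'CLASSIFICATION', 'model_name': 'model_a', 'logits': [2.0, 0.1]},
    ],
    'predictions'
)
print(len(groundtruths), len(predictions))
```

Under this reading, re-ingesting a prediction for the same task type and model name replaces the stored one, while a prediction from a different model is kept alongside it.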
Updating an Attachment
Similarly, an attachment can be updated by ingesting a record that includes the attachment's id. The new values override the previous attachment.
import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'
from dioptra.lake.utils import upload_to_lake, wait_for_upload
upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho',  # the datapoint id
    'predictions': [{
        # This will update the prediction 'bohuih-156njb-nnouho'
        'id': 'bohuih-156njb-nnouho',  # the prediction id
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])
wait_for_upload(upload_id)