Dioptra Documentation

Ingestion basics

In KatiML, data is ingested as a list of records. Each record is centered on a single Datapoint; Tags, Ground Truths and Predictions are attachments to that Datapoint.

# The Datapoint
{
    'metadata': {
        'uri': 's3://my_bucket/myfile.jpg'
    },
    'type': 'IMAGE',
    # The tags
    'tags': {
        'source': 'stanford_dogs'
    },
    # The ground truths
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    # The predictions
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}
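To make the record structure explicit, the shape of the example can be written down as Python type hints. This is a minimal sketch inferred only from the example record on this page, not an official schema from the Dioptra SDK: the TypedDict names, the choice to mark every key optional (total=False), and the assumption that tag values are strings are all illustrative.

```python
from typing import Dict, List, TypedDict


class Metadata(TypedDict, total=False):
    uri: str  # location of the raw file, e.g. an S3 or HTTPS URL


class GroundTruth(TypedDict, total=False):
    task_type: str   # e.g. 'CLASSIFICATION'
    class_name: str  # the labeled class


class Prediction(TypedDict, total=False):
    task_type: str
    class_names: List[str]  # paired element-wise with logits
    logits: List[float]


class Record(TypedDict, total=False):
    id: str               # only set when updating an existing Datapoint
    metadata: Metadata
    type: str             # e.g. 'IMAGE'
    tags: Dict[str, str]  # assumption: string tag values
    groundtruths: List[GroundTruth]
    predictions: List[Prediction]


# The example record from above, now checked by a static type checker.
record: Record = {
    'metadata': {'uri': 's3://my_bucket/myfile.jpg'},
    'type': 'IMAGE',
    'tags': {'source': 'stanford_dogs'},
    'groundtruths': [{'task_type': 'CLASSIFICATION', 'class_name': 'chihuahua'}],
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4],
    }],
}
```

TypedDicts add no runtime checks; they only help an editor or a type checker such as mypy catch misspelled keys like 'groundturths' before ingestion.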

Creating a Datapoint

To create a Datapoint along with its attachments, build a record and ingest it. Ingestion creates the Datapoint and links the attachments to it.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'metadata': {
        'uri': 'https://dioptra-demo.s3.us-east-2.amazonaws.com/stanford-dogs-dataset/n02085620-Chihuahua/n02085620_8578.jpg'
    },
    'type': 'IMAGE',
    'groundtruths': [{
        'task_type': 'CLASSIFICATION',
        'class_name': 'chihuahua'
    }],
    'tags': {
        'source': 'stanford_dogs'
    }
}])

wait_for_upload(upload_id)
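Because upload_to_lake takes a list of records, several Datapoints can be created in one call. The sketch below builds one IMAGE record per (uri, class_name) pair, mirroring the record above. The helper build_classification_records is hypothetical (not part of the dioptra package) and the S3 URIs are placeholders.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_classification_records(
    items: Iterable[Tuple[str, str]], source: str
) -> List[Dict[str, Any]]:
    """Build one IMAGE record per (uri, class_name) pair.

    Hypothetical helper: not part of the dioptra package.
    """
    records = []
    for uri, class_name in items:
        records.append({
            'metadata': {'uri': uri},
            'type': 'IMAGE',
            'groundtruths': [{
                'task_type': 'CLASSIFICATION',
                'class_name': class_name,
            }],
            'tags': {'source': source},
        })
    return records


# Placeholder URIs for illustration only.
records = build_classification_records(
    [
        ('s3://my_bucket/dog_1.jpg', 'chihuahua'),
        ('s3://my_bucket/dog_2.jpg', 'pug'),
    ],
    source='stanford_dogs',
)

# The whole list can then be sent in a single call, as in the example above:
# upload_id = upload_to_lake(records=records)
# wait_for_upload(upload_id)
```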

Updating a Datapoint

A Datapoint can be updated to add more attachments or to change its fields. To do so, ingest a record that contains the id of the Datapoint you want to update.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho', # the datapoint id
    # This will add a new prediction.
    'predictions': [{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])

wait_for_upload(upload_id)
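An update record is an ordinary record that carries the Datapoint id plus whatever you want to add or change. The following is a minimal sketch of a hypothetical helper, make_update_record (not part of the dioptra package), that includes only the fields you pass in; the datapoint id reuses the placeholder from the example above.

```python
from typing import Any, Dict, List, Optional


def make_update_record(
    datapoint_id: str,
    predictions: Optional[List[Dict[str, Any]]] = None,
    groundtruths: Optional[List[Dict[str, Any]]] = None,
    tags: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a record that targets an existing Datapoint by its id.

    Hypothetical helper (not part of the dioptra package). Only the
    fields that are passed in are added to the record.
    """
    record: Dict[str, Any] = {'id': datapoint_id}
    if predictions is not None:
        record['predictions'] = predictions
    if groundtruths is not None:
        record['groundtruths'] = groundtruths
    if tags is not None:
        record['tags'] = tags
    return record


update = make_update_record(
    'bohuih-156njb-nnouho',  # placeholder datapoint id
    predictions=[{
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4],
    }],
)
# upload_id = upload_to_lake(records=[update])
```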

NOTE: A Datapoint can have only one ground truth per task type, and only one prediction per task type and model name. Ingesting a conflicting attachment overrides the previous one.
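The uniqueness rule in the note can be illustrated locally. This sketch is only a model of the documented behavior, not the KatiML server implementation: ground truths are keyed by task_type, predictions by (task_type, model name), and an incoming attachment with the same key replaces the existing one. The field name 'model_name' and the values 'v1'/'v2' are assumptions made for illustration; this page does not name the model field.

```python
from typing import Any, Dict, List, Optional, Tuple


def merge_attachments(
    existing: List[Dict[str, Any]],
    incoming: List[Dict[str, Any]],
    is_prediction: bool,
) -> List[Dict[str, Any]]:
    """Local illustration of the one-attachment-per-key rule (not server code)."""

    def key(att: Dict[str, Any]) -> Tuple[Optional[str], ...]:
        if is_prediction:
            # Assumption: the model is identified by a 'model_name' field.
            return (att.get('task_type'), att.get('model_name'))
        return (att.get('task_type'),)

    merged: Dict[Tuple[Optional[str], ...], Dict[str, Any]] = {
        key(att): att for att in existing
    }
    for att in incoming:
        merged[key(att)] = att  # same key: the newer attachment overrides
    return list(merged.values())


old = [{'task_type': 'CLASSIFICATION', 'model_name': 'v1', 'logits': [0.9, 1.4]}]
new = [
    {'task_type': 'CLASSIFICATION', 'model_name': 'v1', 'logits': [2.0, 0.1]},
    {'task_type': 'CLASSIFICATION', 'model_name': 'v2', 'logits': [0.3, 0.7]},
]
merged = merge_attachments(old, new, is_prediction=True)
```

Here merged holds two predictions: the v1 prediction with its new logits, and the newly added v2 prediction.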

Updating an Attachment

Similarly, an attachment can be updated by ingesting a record in which the attachment carries its own id (alongside the Datapoint id). The new values override the previous attachment.

import os
os.environ['DIOPTRA_API_KEY'] = 'my_api_key'

from dioptra.lake.utils import upload_to_lake, wait_for_upload

upload_id = upload_to_lake(records=[{
    'id': 'bohuih-156njb-nnouho', # the datapoint id
    'predictions': [{
        # This will update the prediction 'bohuih-156njb-nnouho'
        'id': 'bohuih-156njb-nnouho', # the prediction id
        'task_type': 'CLASSIFICATION',
        'class_names': ['chihuahua', 'pug'],
        'logits': [0.9, 1.4]
    }]
}])

wait_for_upload(upload_id)
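The same pattern can be wrapped in a small function that attaches a prediction id to new prediction values. This is a sketch only: make_prediction_update is a hypothetical helper (not part of the dioptra package), and the ids are placeholders. It deep-copies the input so the caller's prediction dict is left untouched.

```python
import copy
from typing import Any, Dict


def make_prediction_update(
    datapoint_id: str, prediction_id: str, prediction: Dict[str, Any]
) -> Dict[str, Any]:
    """Build a record that overrides one existing prediction, found by its id.

    Hypothetical helper (not part of the dioptra package).
    """
    updated = copy.deepcopy(prediction)  # avoid mutating the caller's dict
    updated['id'] = prediction_id
    return {'id': datapoint_id, 'predictions': [updated]}


new_values = {
    'task_type': 'CLASSIFICATION',
    'class_names': ['chihuahua', 'pug'],
    'logits': [0.9, 1.4],
}
record = make_prediction_update('<datapoint_id>', '<prediction_id>', new_values)
# upload_id = upload_to_lake(records=[record])
```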

Last updated 1 year ago

