Query basics

KatiML is powered by a SQL engine, so its filtering and joining operations use a SQL-like syntax.

Building a filter

All filters are defined by a common structure:

{
    'left': 'STRING',   // the left field
    'op': 'STRING',     // the operator, one of: =, !=, >, <, >=, <=, in, not in, like, not like
    'right': 'STRING'   // the right field or value
}

filters = [{
    'left': 'groundtruths.class_name',
    'op': '=',
    'right': 'chihuahua'
 }]
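
Since every filter shares the same three keys, a small wrapper can catch a mistyped operator before the query is sent. The sketch below is an illustration only: make_filter and ALLOWED_OPS are hypothetical names, not part of the KatiML SDK, and the operator set is copied from the structure above.

```python
# Hypothetical helper, not part of the KatiML SDK: builds one filter dict
# and rejects operators outside the list documented above.
ALLOWED_OPS = {'=', '!=', '>', '<', '>=', '<=', 'in', 'not in', 'like', 'not like'}


def make_filter(left, op, right):
    """Return a filter dict in the {'left', 'op', 'right'} shape."""
    if op not in ALLOWED_OPS:
        raise ValueError(f'unsupported operator: {op!r}')
    return {'left': left, 'op': op, 'right': right}


filters = [
    make_filter('tags.value', '=', 'stanford_dogs'),
    make_filter('groundtruths.class_name', '=', 'chihuahua'),
]
```

The resulting list has the same shape as the hand-written filters and can be passed wherever a filters argument is expected.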

NOTE: filters must be written relative to the object being queried. For example, when querying datapoints, filtering on groundtruths.class_name=chihuahua keeps only the datapoints that have an attachment with groundtruths.class_name=chihuahua, but it does not filter the ground truths themselves.

Query endpoints

KatiML has three query endpoints, one each for Datapoints, Predictions and Ground Truths.

Query Datapoints

To query datapoints, use the select_datapoints endpoint. You can specify which fields to retrieve; * retrieves all fields of an object type. For example, fields=['*', 'groundtruths.*'] returns all datapoint fields and all ground truth fields.

NOTE: filters apply to datapoints in a left-join fashion. This means that filtering by groundtruths.class_name=chihuahua removes the datapoints that have no ground truth equal to chihuahua, but returns all ground truths for the datapoints that remain.

from dioptra.lake.utils import select_datapoints

select_datapoints(
    filters=[{
        'left': 'tags.value',
        'op': '=',
        'right': 'stanford_dogs'
        },{
        'left': 'groundtruths.class_name',
        'op': '=',
        'right': 'chihuahua'
     }],
    fields=['*', 'groundtruths.*'])
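
The left-join behaviour described in the note can be illustrated without the SDK. The sketch below is an in-memory imitation under stated assumptions: plain dicts stand in for stored records, the sample ids and the class names other than chihuahua are made up, and keep_datapoints is a hypothetical name, not a KatiML function.

```python
# In-memory illustration only: plain dicts stand in for stored records,
# and keep_datapoints is a hypothetical name, not a KatiML function.
datapoints = [
    {'id': 'dp1', 'groundtruths': [{'class_name': 'chihuahua'},
                                   {'class_name': 'person'}]},
    {'id': 'dp2', 'groundtruths': [{'class_name': 'beagle'}]},
    {'id': 'dp3', 'groundtruths': []},
]


def keep_datapoints(datapoints, class_name):
    """Keep a datapoint if any of its ground truths matches class_name.

    A kept datapoint is returned with all of its ground truths,
    including the ones that do not match.
    """
    return [
        dp for dp in datapoints
        if any(gt['class_name'] == class_name for gt in dp['groundtruths'])
    ]


result = keep_datapoints(datapoints, 'chihuahua')
print([dp['id'] for dp in result])                             # ['dp1']
print([gt['class_name'] for gt in result[0]['groundtruths']])  # ['chihuahua', 'person']
```

Only dp1 survives the filter, yet its non-matching ground truth is still returned. To get only the matching ground truths, query them separately as shown in the sections that follow.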

Query Predictions and Ground Truths

To query Predictions and Ground Truths, use the select_predictions and select_groundtruths endpoints.

NOTE: to filter by datapoint id, use the datapoints field.

from dioptra.lake.utils import select_predictions

# Fetch the predictions attached to one datapoint whose class is chihuahua.
select_predictions(
    filters=[{
        'left': 'datapoints',
        'op': '=',
        'right': 'iuhiug-3168byb78g8-noiuh'
        },{
        'left': 'prediction.class_name',
        'op': '=',
        'right': 'chihuahua'
     }])

Joining Datapoints and their attachments

A typical pattern for creating a dataset is to select the Datapoints using various filters, then fetch the Ground Truths or Predictions needed for the task, and finally join them. Doing this in separate steps makes it possible to apply fine-grained filters to both the datapoints and their attachments.

For example, the following fetches the Datapoints that carry a given tag, then gets all of their ground truths of type segmentation:

from dioptra.lake.utils import select_datapoints, select_groundtruths, join_on_datapoints

datapoints = select_datapoints(filters=[
    {'left': 'tags.name', 'op': '=', 'right': 'data_split'},
    {'left': 'tags.value', 'op': '=', 'right': 'validation'},
])

groundtruths = select_groundtruths(
    filters=[
        {'left': 'datapoint', 'op': 'in', 'right': list(datapoints['id'])},
        {'left': 'task_type', 'op': '=', 'right': 'SEGMENTATION'}
    ],
    fields=['datapoint', 'task_type', 'encoded_segmentation_class_mask', 'class_names']
)
my_dataset = join_on_datapoints(
    datapoints=datapoints,
    groundtruths=groundtruths)
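
To make the join step concrete, here is a minimal in-memory sketch of what such a join could do. It is not the implementation of join_on_datapoints, whose return type is not documented here. It assumes records are plain dicts, that each ground truth references its datapoint through a 'datapoint' key as in the fields list above, and it uses a hypothetical function name and made-up sample ids.

```python
from collections import defaultdict


def join_by_datapoint_id(datapoints, groundtruths):
    """Attach to each datapoint the ground truths that reference its id.

    Hypothetical stand-in for a join step, written for plain dicts.
    """
    # Group ground truths by the id of the datapoint they belong to.
    by_datapoint = defaultdict(list)
    for gt in groundtruths:
        by_datapoint[gt['datapoint']].append(gt)
    # Datapoints without any ground truth get an empty list.
    return [
        {**dp, 'groundtruths': by_datapoint.get(dp['id'], [])}
        for dp in datapoints
    ]


sample_datapoints = [{'id': 'dp1'}, {'id': 'dp2'}]
sample_groundtruths = [
    {'datapoint': 'dp1', 'task_type': 'SEGMENTATION'},
]
joined = join_by_datapoint_id(sample_datapoints, sample_groundtruths)
```

Because the datapoints and the ground truths were filtered independently, each side of the join contains only the records that passed its own filters.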
