Query basics
KatiML is powered by a SQL engine, so filtering and joining operations use a SQL-like syntax.
Building a filter
All filters are defined by a common structure:
{
    'left': 'STRING',   // the left field
    'op': 'STRING',     // the operator: one of =, !=, >, <, >=, <=, in, not in, like, not like
    'right': 'STRING'   // the right field or value (a list of values for in / not in)
}
For example:
filters = [{
    'left': 'groundtruths.class_name',
    'op': '=',
    'right': 'chihuahua'
}]
NOTE: filters must be written relative to the object being queried. For example, when querying datapoints, the filter groundtruths.class_name=chihuahua keeps the datapoints that have an attached ground truth with class_name=chihuahua, but it does not filter the ground truths themselves.
Query endpoints
KatiML has three query endpoints, one each for Datapoints, Predictions and Ground Truths.
Query Datapoints
To query datapoints, use the select_datapoints endpoint. You can specify which fields to retrieve; * retrieves all fields of an object type. For example, fields=['*', 'groundtruths.*'] returns all datapoint fields and all ground truth fields.
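NOTE: filters apply to datapoints in a left-join fashion. Filtering by groundtruths.class_name=chihuahua removes the datapoints that have no ground truth equal to chihuahua, but returns all ground truths of the remaining datapoints.
The example below keeps the datapoints tagged stanford_dogs that have a chihuahua ground truth, and returns all of their fields and ground truths: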
from dioptra.lake.utils import select_datapoints
select_datapoints(
    filters=[{
        'left': 'tags.value',
        'op': '=',
        'right': 'stanford_dogs'
    },{
        'left': 'groundtruths.class_name',
        'op': '=',
        'right': 'chihuahua'
    }],
    fields=['*', 'groundtruths.*'])
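The left-join behavior described in the NOTE above can be reproduced on plain Python data to see exactly what comes back. The sketch below is a local simulation, not a call to KatiML: the record layout (a datapoint dict holding a `groundtruths` list of dicts with a `class_name` key) and the function name `filter_datapoints_by_gt_class` are hypothetical, chosen for illustration.

```python
# Local simulation of how a groundtruths.class_name filter applies to datapoints.
# The record layout below is a hypothetical stand-in for real KatiML results.
def filter_datapoints_by_gt_class(datapoints, class_name):
    """Keep datapoints with at least one ground truth of the given class.

    Kept datapoints are returned with ALL of their ground truths,
    not only the ones that matched the filter.
    """
    return [
        dp for dp in datapoints
        if any(gt['class_name'] == class_name for gt in dp['groundtruths'])
    ]


datapoints = [
    {'id': 'dp-1', 'groundtruths': [{'class_name': 'chihuahua'}, {'class_name': 'beagle'}]},
    {'id': 'dp-2', 'groundtruths': [{'class_name': 'beagle'}]},
    {'id': 'dp-3', 'groundtruths': []},
]

kept = filter_datapoints_by_gt_class(datapoints, 'chihuahua')
print([dp['id'] for dp in kept])                             # ['dp-1']
print([gt['class_name'] for gt in kept[0]['groundtruths']])  # ['chihuahua', 'beagle']
```

Here dp-2 and dp-3 are dropped, while dp-1 still carries its beagle ground truth. Narrowing the ground truths themselves would instead mean querying them as the object of the query, with select_groundtruths.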
Query Predictions and Ground Truths
To query Predictions and Ground Truths, use the select_predictions and select_groundtruths endpoints.
NOTE: to filter on a Datapoint id, use the datapoint field.
from dioptra.lake.utils import select_predictions
# Predictions attached to one datapoint and predicted as chihuahua.
# Filters are relative to the prediction, so class_name has no prefix.
select_predictions(
    filters=[{
        'left': 'datapoint',
        'op': '=',
        'right': 'iuhiug-3168byb78g8-noiuh'
    },{
        'left': 'class_name',
        'op': '=',
        'right': 'chihuahua'
    }])
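When predictions or ground truths are needed for several datapoints at once, a single `in` filter on the datapoint field covers all of them, as the select_groundtruths example in the next section does. Repeating `=` filters would not work if, as assumed here, filters in a list are combined with AND, since one record cannot match two different IDs. `datapoint_id_filter` is a hypothetical helper, not part of `dioptra`, and the field name `datapoint` follows that select_groundtruths example.

```python
# Hypothetical helper (not part of dioptra): builds ONE filter that matches
# records attached to any of the given datapoint IDs.
def datapoint_id_filter(datapoint_ids):
    ids = list(dict.fromkeys(datapoint_ids))  # drop duplicates, keep first-seen order
    if not ids:
        raise ValueError('at least one datapoint ID is required')
    if len(ids) == 1:
        # A single ID can use plain equality
        return {'left': 'datapoint', 'op': '=', 'right': ids[0]}
    # Several IDs: one 'in' filter instead of several ANDed '=' filters
    return {'left': 'datapoint', 'op': 'in', 'right': ids}


print(datapoint_id_filter(['id-a', 'id-b', 'id-a']))
# {'left': 'datapoint', 'op': 'in', 'right': ['id-a', 'id-b']}
```

The returned dict can be placed in the filters list passed to select_predictions or select_groundtruths alongside any other filters.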
Joining Datapoints and their attachments
A typical pattern for creating a dataset is to select Datapoints with various filters, then fetch the Ground Truths or Predictions the task needs and join them. Doing this in sequence makes it possible to apply fine-grained filters to both the datapoints and their attachments.
For example, the following fetches the Datapoints tagged data_split=validation and then all of their segmentation ground truths:
from dioptra.lake.utils import select_datapoints, select_groundtruths, join_on_datapoints
# 1. Select the validation datapoints (tag data_split=validation)
datapoints = select_datapoints(filters=[
    {'left': 'tags.name', 'op': '=', 'right': 'data_split'},
    {'left': 'tags.value', 'op': '=', 'right': 'validation'},
])
# 2. Fetch the segmentation ground truths attached to those datapoints
groundtruths = select_groundtruths(
    filters=[
        {'left': 'datapoint', 'op': 'in', 'right': list(datapoints['id'])},
        {'left': 'task_type', 'op': '=', 'right': 'SEGMENTATION'}
    ],
    fields=['datapoint', 'task_type', 'encoded_segmentation_class_mask', 'class_names']
)
# 3. Join the ground truths back onto their datapoints
my_dataset = join_on_datapoints(
    datapoints=datapoints,
    groundtruths=groundtruths)
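Conceptually, the join attaches each ground truth to the datapoint whose ID appears in its datapoint field. The pure-Python sketch below shows that idea on lists of dicts. `join_by_datapoint_id` is a hypothetical stand-in; the exact output format of join_on_datapoints is not described here, so the nested `groundtruths` list and the record layout are assumptions.

```python
from collections import defaultdict


def join_by_datapoint_id(datapoints, groundtruths):
    """Attach to each datapoint the ground truths whose 'datapoint' field equals its 'id'.

    Datapoints without a matching ground truth get an empty list (left join).
    Input dicts are not modified; new dicts are returned.
    """
    by_datapoint = defaultdict(list)
    for gt in groundtruths:
        by_datapoint[gt['datapoint']].append(gt)
    return [{**dp, 'groundtruths': by_datapoint.get(dp['id'], [])} for dp in datapoints]


datapoints = [{'id': 'dp-1'}, {'id': 'dp-2'}]
groundtruths = [
    {'datapoint': 'dp-1', 'task_type': 'SEGMENTATION'},
    {'datapoint': 'dp-1', 'task_type': 'SEGMENTATION'},
]
dataset = join_by_datapoint_id(datapoints, groundtruths)
print([len(row['groundtruths']) for row in dataset])  # [2, 0]
```

Keeping unmatched datapoints such as dp-2 (a left join) is a choice made for this sketch; an inner join would drop them instead. Grouping the ground truths by datapoint ID first keeps the join linear in the number of records.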