Matching local data with katiML IDs
A common usage pattern in katiML is matching your local data to the corresponding datapoints stored in katiML.
In katiML, each datapoint is identified by a unique id. This id is returned by a query call.
Matching with metadata URI
One way to match katiML ids with local ids is through the uri stored in the datapoint metadata.
Take the following example:
from dioptra.lake.utils import select_datapoints

df_1 = select_datapoints(
    filters=[{
        'left': 'tags.value',
        'op': '=',
        'right': 'stanford_dogs'}])
df_1
In this example, the query returned two datapoints. Their uri is stored in the metadata column as a JSON field. Now suppose you have another dataframe with uri and groundtruth columns and need to match each row with the id from the lake.

To do this, add a uri column to each of the two dataframes, join them on that column, and write the output as JSON.
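The join described above can be sketched as follows. This is only an illustration under stated assumptions: df_lake stands in for the dataframe returned by select_datapoints, the metadata values are assumed to be either dicts or JSON strings with a uri key, and the bucket paths, ids, and groundtruth labels are made up.

```python
import json

import pandas as pd

# Stand-in for the dataframe returned by select_datapoints.
# Assumed columns: 'id' (katiML id) and 'metadata' (dict or JSON string with a 'uri' key).
df_lake = pd.DataFrame({
    'id': ['lake-id-1', 'lake-id-2'],
    'metadata': [
        {'uri': 's3://bucket/dogs/1.jpg'},
        '{"uri": "s3://bucket/dogs/2.jpg"}',
    ],
})

# Hypothetical local dataframe holding the groundtruth, keyed by uri.
df_local = pd.DataFrame({
    'uri': ['s3://bucket/dogs/1.jpg', 's3://bucket/dogs/2.jpg'],
    'groundtruth': ['beagle', 'pug'],
})


def extract_uri(metadata):
    """Return the uri from a metadata value that is a dict or a JSON string."""
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    return metadata.get('uri')


# Pull the uri out of the metadata into its own column.
df_lake['uri'] = df_lake['metadata'].apply(extract_uri)

# Join on uri so every local row picks up its katiML id.
matched = df_local.merge(df_lake[['id', 'uri']], on='uri', how='inner')

# Serialize the matched rows as a JSON list of records.
records_json = matched[['id', 'groundtruth']].to_json(orient='records')
print(records_json)
```

An inner join silently drops local rows whose uri has no counterpart in the lake. Switching to how='left' keeps those rows with an empty id, which makes unmatched data easier to spot.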
Matching with tags
Another way to match local ids is to use tags.
You can tag a datapoint with a custom tag that represents your local id, for example datapoint_id.

To connect your tags.datapoint_id with the katiML id, first explode the tags column, then select the tags with the name datapoint_id. The value of each selected tag is your datapoint_id, and its datapoint field is the katiML datapoint id.
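A minimal sketch of these steps, assuming the tags column holds a list of dicts with name, value, and datapoint keys. The exact shape returned by select_datapoints may differ, and the ids and the data_source tag name below are invented for illustration.

```python
import pandas as pd

# Stand-in for a query result. The layout of the 'tags' column is an assumption:
# a list of dicts, each with 'name', 'value' and 'datapoint' keys.
df_lake = pd.DataFrame({
    'id': ['lake-id-1', 'lake-id-2'],
    'tags': [
        [
            {'name': 'data_source', 'value': 'stanford_dogs', 'datapoint': 'lake-id-1'},
            {'name': 'datapoint_id', 'value': 'local-001', 'datapoint': 'lake-id-1'},
        ],
        [
            {'name': 'datapoint_id', 'value': 'local-002', 'datapoint': 'lake-id-2'},
        ],
    ],
})

# One row per tag; datapoints with an empty tag list become NaN and are dropped.
exploded = df_lake['tags'].explode().dropna()

# Expand each tag dict into 'name', 'value' and 'datapoint' columns.
tags = pd.json_normalize(exploded.tolist())

# Keep only the tags that carry the local id.
id_tags = tags[tags['name'] == 'datapoint_id']

# Map the local id (tag value) to the katiML id (tag datapoint).
mapping = (
    id_tags
    .rename(columns={'value': 'local_id', 'datapoint': 'katiml_id'})
    [['local_id', 'katiml_id']]
    .reset_index(drop=True)
)
print(mapping)
```

The resulting mapping dataframe can then be merged with a local dataframe on local_id, in the same way as the uri join.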
