A common usage pattern in katiML is matching local data to the corresponding datapoints in katiML.
In katiML, each datapoint is identified by a unique id. This id is returned by a query call.
Matching with metadata URI
One way to match katiML ids with local ids is through the uri stored in the datapoint metadata.
Let's take the following example:
from dioptra.lake.utils import select_datapoints

df_1 = select_datapoints(filters=[{'left': 'tags.value', 'op': '=', 'right': 'stanford_dogs'}])
df_1
In this example, the query returned two datapoints. Their uri is stored in the metadata column as a JSON field. What if you have another dataframe with uri and groundtruth columns and need to match it with the ids from the lake?
To do this, add a uri column to each dataframe, join the two dataframes on that column, and write the output as JSON.
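The join described above could look roughly like the following sketch. It uses plain pandas and a hand-built stand-in for the query result, so the column names (id, metadata), the uri key inside metadata, the local dataframe, and all sample values are assumptions for illustration rather than the exact shape returned by select_datapoints.

```python
import json

import pandas as pd

# Hypothetical stand-in for the dataframe returned by select_datapoints.
# Column names and values are assumptions made for this sketch.
lake_df = pd.DataFrame({
    "id": ["lake-id-1", "lake-id-2"],
    "metadata": [
        {"uri": "s3://my-bucket/dogs/1.jpg"},
        {"uri": "s3://my-bucket/dogs/2.jpg"},
    ],
})

# Hypothetical local dataframe holding the uri and the groundtruth.
local_df = pd.DataFrame({
    "uri": ["s3://my-bucket/dogs/2.jpg", "s3://my-bucket/dogs/1.jpg"],
    "groundtruth": ["beagle", "husky"],
})


def extract_uri(metadata):
    """Return the uri from a metadata value stored as a dict or a JSON string."""
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    return metadata["uri"]


# Create a top-level uri column on the lake side so both frames share a key.
lake_df["uri"] = lake_df["metadata"].apply(extract_uri)

# Inner join on uri: each local row is paired with its lake id.
matched = lake_df[["id", "uri"]].merge(local_df, on="uri", how="inner")

# Serialize the matched rows as a JSON list of records.
records_json = matched.to_json(orient="records")
print(records_json)
```

An inner join keeps only the rows whose uri appears on both sides; switching to how="left" would instead keep every lake datapoint and leave groundtruth empty where no local row matches.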
Matching with tags
Another way to match to local ids is to use tags.
You can tag a datapoint with a custom tag that represents your local id, for example datapoint_id.
To connect your tags.datapoint_id with the katiML id, first explode the tags column, then select the tags whose name is datapoint_id. In each selected tag, value is your datapoint_id and datapoint is the katiML datapoint id.
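As a rough illustration of the explode-and-filter steps, here is a minimal pandas sketch. The dataframe is built by hand, and the assumption that each tag is a dictionary with name, value, and datapoint keys is inferred from the description above, not taken from a confirmed katiML schema.

```python
import pandas as pd

# Hypothetical stand-in for a query result with a tags column.
# Each tag is assumed to be a dict with name, value, and datapoint keys.
lake_df = pd.DataFrame({
    "id": ["lake-id-1", "lake-id-2"],
    "tags": [
        [
            {"name": "data_source", "value": "stanford_dogs", "datapoint": "lake-id-1"},
            {"name": "datapoint_id", "value": "local-42", "datapoint": "lake-id-1"},
        ],
        [
            {"name": "datapoint_id", "value": "local-43", "datapoint": "lake-id-2"},
        ],
    ],
})

# Explode the tags column so that each row holds a single tag.
exploded = lake_df.explode("tags", ignore_index=True)

# Expand the tag dicts into columns; datapoints without tags are dropped.
tags_df = pd.json_normalize(exploded["tags"].dropna().tolist())

# Keep only the tags that carry the local id.
id_tags = tags_df[tags_df["name"] == "datapoint_id"]

# Map each local id (tag value) to its lake id (tag datapoint).
local_to_lake = dict(zip(id_tags["value"], id_tags["datapoint"]))
print(local_to_lake)
```

The resulting dictionary can then be used with a pandas map call on a local dataframe to attach the lake id to each local row.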