Matching local data with katiML IDs
A common usage pattern in katiML is matching local data to the data stored in katiML.
In katiML, each datapoint is identified by a unique id. This id is returned by a query call.
One way to match katiML ids with local ids is with the uri.
Let's take the following example:
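As an illustration, a query result of this shape could be represented as a pandas dataframe like the one below. The column names id and metadata come from the text; the ids, the uris, and the use of pandas are assumptions made for this sketch, not actual katiML output.

```python
import json

import pandas as pd

# Hypothetical stand-in for a query result: one row per datapoint.
# The ids and uris are invented for illustration.
query_df = pd.DataFrame(
    {
        "id": ["dp-001", "dp-002"],
        # The uri is stored inside the metadata column as a JSON field.
        "metadata": [
            json.dumps({"uri": "s3://bucket/images/cat.jpg"}),
            json.dumps({"uri": "s3://bucket/images/dog.jpg"}),
        ],
    }
)

print(query_df)
```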
In this example, the query returned two datapoints. Their uri is in the metadata column as a JSON field. So what if you have another dataframe with uri and groundtruth and need to match it with the id from the lake?
To do this, you'd create a uri column in each of the two dataframes, then join them on that column and write the output as JSON.
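The steps above can be sketched with pandas as follows. This is a minimal sketch under assumptions: query_df and local_df are hypothetical dataframes with invented values, and the metadata column is assumed to hold either a JSON string or an already-parsed dict.

```python
import json

import pandas as pd

# Hypothetical query result (ids and uris are invented for illustration).
query_df = pd.DataFrame(
    {
        "id": ["dp-001", "dp-002"],
        "metadata": [
            json.dumps({"uri": "s3://bucket/images/cat.jpg"}),
            json.dumps({"uri": "s3://bucket/images/dog.jpg"}),
        ],
    }
)

# Hypothetical local dataframe holding the uri and the groundtruth.
local_df = pd.DataFrame(
    {
        "uri": ["s3://bucket/images/dog.jpg", "s3://bucket/images/cat.jpg"],
        "groundtruth": ["dog", "cat"],
    }
)


def extract_uri(metadata):
    """Return the uri field from a metadata value (JSON string or dict)."""
    parsed = metadata if isinstance(metadata, dict) else json.loads(metadata)
    return parsed["uri"]


# Pull the uri out of the JSON metadata into its own column.
query_df["uri"] = query_df["metadata"].apply(extract_uri)

# Join on uri so each local row picks up the matching datapoint id.
matched = query_df[["id", "uri"]].merge(local_df, on="uri", how="inner")

# Write the result as JSON, one record per matched datapoint.
matched_json = matched.to_json(orient="records")
print(matched_json)
```

An inner join keeps only the uris present in both dataframes; switching to how="left" would instead keep every queried datapoint and leave groundtruth empty where no local row matches.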
Another way to match to local ids is to use tags.
You can tag a datapoint with a custom tag that represents your local id, for example datapoint_id.
To connect your tags.datapoint_id with the katiML id, you'd first explode the tags column, then select the tags with the name datapoint_id. The value of each selected tag is your datapoint_id, and its datapoint is the katiML datapoint id.
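A minimal pandas sketch of the tag-based approach is shown below. The tag structure is an assumption: each entry in the tags column is taken to be a dict with name, value and datapoint fields, as the description suggests, and all ids are invented for illustration.

```python
import pandas as pd

# Hypothetical query result where each datapoint carries a list of tags.
query_df = pd.DataFrame(
    {
        "id": ["dp-001", "dp-002"],
        "tags": [
            [
                {"name": "datapoint_id", "value": "local-17", "datapoint": "dp-001"},
                {"name": "split", "value": "train", "datapoint": "dp-001"},
            ],
            [
                {"name": "datapoint_id", "value": "local-42", "datapoint": "dp-002"},
            ],
        ],
    }
)

# Explode the tags column so that each row holds a single tag.
exploded = query_df.explode("tags", ignore_index=True)

# Expand the tag dicts into name / value / datapoint columns,
# skipping datapoints that had no tags at all.
tags = pd.json_normalize(exploded["tags"].dropna().tolist())

# Keep only the tags that carry the local id.
local_id_tags = tags[tags["name"] == "datapoint_id"]

# Map each local id (tag value) to its datapoint id.
id_map = dict(zip(local_id_tags["value"], local_id_tags["datapoint"]))
print(id_map)
```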