Matching local data with Kati ML IDs
A common usage pattern in Kati ML is matching local data to the corresponding datapoints in Kati ML.
In Kati ML, datapoints are identified by a unique `id`, which is returned by query calls.
Matching with metadata URI
One way to match Kati ML ids with local ids is through the datapoint `uri`.
Let's take the following example:
```python
from dioptra.lake.utils import select_datapoints

# Select the datapoints tagged with the value 'stanford_dogs'
df_1 = select_datapoints(
    filters=[{
        'left': 'tags.value',
        'op': '=',
        'right': 'stanford_dogs'}])
df_1
```

In this example, the query returned two datapoints. Their `uri` is stored in the `metadata` column as a JSON field. What if you have another dataframe with `uri` and groundtruths and need to match each of its rows to the `id` from the lake?
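The `uri` extraction can be tried on its own with a small mock dataframe standing in for `df_1`. This is a minimal sketch. The ids, URIs and the layout of `metadata` are hypothetical. It also assumes `metadata` may hold either a dict or a JSON-encoded string, so it decodes strings before reading `uri`.

```python
import json

import pandas as pd

# Hypothetical stand-in for a select_datapoints() result
df_1 = pd.DataFrame({
    'id': ['dp-1', 'dp-2'],
    'metadata': [
        {'uri': 's3://bucket/dogs/001.jpg'},
        '{"uri": "s3://bucket/dogs/002.jpg"}',  # assumption: metadata stored as a JSON string
    ],
})


def extract_uri(metadata):
    """Return metadata['uri'], decoding the metadata first if it is a JSON string."""
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    return metadata['uri']


df_1['uri'] = df_1['metadata'].apply(extract_uri)
print(df_1[['id', 'uri']])
```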

To match the two dataframes, extract the `uri` into a new column in each of them, join them on that column, and write the output as JSON:
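```python
import json

import pandas as pd
from dioptra.lake.utils import select_datapoints

# Query all datapoints from Kati ML (empty filter list)
datapoints_df = select_datapoints([])

# Load the local data: records with a metadata.uri and groundtruths
with open('my_file_with_new_data.json', 'r') as f:
    my_data = json.load(f)
my_data_df = pd.DataFrame(my_data)

# Pull the uri out of the metadata JSON field in both dataframes
my_data_df['uri'] = my_data_df['metadata'].apply(lambda x: x['uri'])
datapoints_df['uri'] = datapoints_df['metadata'].apply(lambda x: x['uri'])

# Join on uri, keeping only the needed columns so the two 'metadata' columns don't clash;
# the inner join keeps only uris present on both sides
df_new = datapoints_df[['uri', 'id']].set_index('uri').join(
    my_data_df[['uri', 'groundtruths']].set_index('uri'), how='inner')

# Write the Kati ML id and the local groundtruths as JSON
results = df_new[['id', 'groundtruths']].to_dict(orient='records')
with open('matched_data.json', 'w') as f:  # example output file name
    json.dump(results, f)
```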
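A join by itself does not tell you which local records failed to find a Kati ML datapoint. As a variation on the join above, the sketch below uses `pd.merge` with `indicator=True` instead of `DataFrame.join`. It keeps every local record and flags the ones without a match, so they can be inspected before writing results. The dataframes are hypothetical mocks, already reduced to the `id`, `uri` and `groundtruths` columns.

```python
import pandas as pd

# Hypothetical Kati ML datapoints, already reduced to id + uri
datapoints_df = pd.DataFrame({
    'id': ['dp-1', 'dp-2'],
    'uri': ['s3://bucket/a.jpg', 's3://bucket/b.jpg'],
})

# Hypothetical local data with groundtruths keyed by uri
my_data_df = pd.DataFrame({
    'uri': ['s3://bucket/a.jpg', 's3://bucket/c.jpg'],
    'groundtruths': [['dog'], ['cat']],
})

# Right merge keeps every local record; the _merge column says whether a Kati ML id was found
merged = pd.merge(datapoints_df, my_data_df, on='uri', how='right', indicator=True)

matched = merged[merged['_merge'] == 'both'][['id', 'groundtruths']]
unmatched_uris = merged.loc[merged['_merge'] == 'right_only', 'uri'].tolist()

print(matched.to_dict(orient='records'))
print('No Kati ML id for:', unmatched_uris)
```

`'left_only'` rows would be Kati ML datapoints with no local counterpart. The right merge drops those, since the goal here is to label local records.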
Matching with tags
Another way to match local ids is to use tags.
You can tag a datapoint with a custom tag that represents your local id, for example `datapoint_id`.
```python
from dioptra.lake.utils import select_datapoints

my_df = select_datapoints([], fields=['id', 'tags.*'])
```

To connect your `tags.datapoint_id` with the Kati ML id, first explode the `tags` column, then select the tags named `datapoint_id`. In the resulting rows, `value` is your local `datapoint_id` and `datapoint` is the Kati ML datapoint id.
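```python
import pandas as pd

# One row per tag, with each tag expanded into its own columns (name, value, datapoint)
exploded_df = my_df.explode('tags')['tags'].apply(pd.Series)
# Keep the datapoint_id tags: value = local id, datapoint = Kati ML id
mapping = exploded_df[exploded_df['name'] == 'datapoint_id'][['value', 'datapoint']]
```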
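The explode step above can be tried on mock data. The sketch below assumes each entry of `tags` is a dict with `name`, `value` and `datapoint` keys, as described in the previous paragraph. All ids and values are hypothetical. It then turns `mapping` into a plain dict from local id to Kati ML id and uses it to attach Kati ML ids to a local dataframe. The `katiml_id` column name is an illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in for select_datapoints([], fields=['id', 'tags.*'])
my_df = pd.DataFrame({
    'id': ['dp-1', 'dp-2'],
    'tags': [
        [{'name': 'datapoint_id', 'value': 'local-42', 'datapoint': 'dp-1'},
         {'name': 'split', 'value': 'train', 'datapoint': 'dp-1'}],
        [{'name': 'datapoint_id', 'value': 'local-43', 'datapoint': 'dp-2'}],
    ],
})

# One row per tag, with the tag dict expanded into name/value/datapoint columns
exploded_df = my_df.explode('tags')['tags'].apply(pd.Series)
mapping = exploded_df[exploded_df['name'] == 'datapoint_id'][['value', 'datapoint']]

# local datapoint_id -> Kati ML datapoint id
local_to_katiml = dict(zip(mapping['value'], mapping['datapoint']))

# Attach Kati ML ids to a (hypothetical) local dataframe keyed by datapoint_id
local_df = pd.DataFrame({'datapoint_id': ['local-43', 'local-42'], 'label': ['cat', 'dog']})
local_df['katiml_id'] = local_df['datapoint_id'].map(local_to_katiml)
print(local_df)
```

Local ids without a matching tag end up as `NaN` in `katiml_id`, which makes unmatched rows easy to filter out.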
