Managing Datapoints with Tags
In katiML, tags are name-value pairs that can be attached to datapoints. They help you select and manage your metadata.
Tags are created and updated with a dictionary named tags in the datapoint object. Because dictionary keys are unique, this ensures that each tag name (tags.name) is unique for a datapoint (datapoints.id):
from dioptra.lake.utils import upload_to_lake

upload_to_lake({
    "id": ...,
    "tags": {
        # Add or update a tag with name "foo"
        "foo": "bar",
        # Set to None (null) to delete the tag with name "baz"
        "baz": None,
        # ...
    },
    "predictions": [...]
})

Tags structure
Tags are a child table of datapoints. As such, you can retrieve them through the fields argument and use them to filter datapoints through the filters argument:
datapoints_dataframe = select_datapoints(filters=[...], fields=[...])

tags.name
The name of the tag. Unique for a datapoint.
tags.value
The value of the tag.
tags.datapoint
The datapoint id this tag is attached to.
Tags Usage
Tags can be used anywhere you use datapoint filters. For example, the following filters will select all datapoints with tags source: stanford_dogs AND dataset: train.
We'll illustrate the usage of tags with the following code:
Retrieving the list of tags
Assuming you went through the Quick Start and Ingestion Basics, let's review the following line:
The dataframe returned by select_datapoints contains the selected datapoints and a column named tags. This column corresponds to the requested child table tags.* and holds the tags attached to each selected datapoint.
We want to explode the datapoints dataframe along the tags column to get a flat list of tags. We then turn each tag dictionary into a row with .apply(pd.Series).
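The explode-and-expand step can be sketched with plain pandas. This is a minimal illustration, not the original tutorial code: the sample dataframe below is made up and only assumes that the tags column holds a list of tag dictionaries with name, value and datapoint keys, as described in the structure section.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by select_datapoints.
# Assumption: each cell of "tags" is a list of tag dictionaries.
datapoints_df = pd.DataFrame({
    "id": ["dp-1", "dp-2"],
    "tags": [
        [
            {"name": "source", "value": "stanford_dogs", "datapoint": "dp-1"},
            {"name": "dataset", "value": "train", "datapoint": "dp-1"},
        ],
        [
            {"name": "source", "value": "stanford_dogs", "datapoint": "dp-2"},
        ],
    ],
})

# explode() yields one row per tag dictionary; dropna() skips datapoints
# that have no tags at all.
exploded = datapoints_df["tags"].explode().dropna()

# apply(pd.Series) expands each dictionary into name / value / datapoint columns.
tags_df = exploded.apply(pd.Series).reset_index(drop=True)

print(tags_df)
```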
The terminal then prints a dataframe of tags, with one row per tag.
Grouping Tags by Name and Value
Next we'll group the tags by name and value so we can select groups of datapoints.
Here we use the pandas grouping operators to aggregate the datapoint column into a list of datapoints by unique value of tags name and value.
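As a rough sketch of this grouping, assuming the tags have already been flattened into a dataframe with name, value and datapoint columns (the rows below are invented sample data):

```python
import pandas as pd

# Hypothetical flat tags dataframe, one row per tag.
tags_df = pd.DataFrame([
    {"name": "source", "value": "stanford_dogs", "datapoint": "dp-1"},
    {"name": "dataset", "value": "train", "datapoint": "dp-1"},
    {"name": "source", "value": "stanford_dogs", "datapoint": "dp-2"},
])

# Group by (name, value) and collect the datapoint ids of each group into a list.
grouped = (
    tags_df.groupby(["name", "value"])["datapoint"]
    .agg(list)
    .reset_index()
)

print(grouped)
```

Each row of grouped then describes one unique tag (name and value) together with the list of datapoints that carry it.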
Selecting Datapoints Based on Tag values
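One possible way to do this locally, shown here as a pandas-only sketch rather than the katiML filter syntax: the helper datapoints_with_tags and the sample rows are hypothetical, and the code assumes a flat tags dataframe with name, value and datapoint columns. It keeps the datapoints that carry every requested tag, for example source: stanford_dogs AND dataset: train.

```python
import pandas as pd

# Hypothetical flat tags dataframe, one row per tag.
tags_df = pd.DataFrame([
    {"name": "source", "value": "stanford_dogs", "datapoint": "dp-1"},
    {"name": "dataset", "value": "train", "datapoint": "dp-1"},
    {"name": "source", "value": "stanford_dogs", "datapoint": "dp-2"},
])


def datapoints_with_tags(tags, required):
    """Return the sorted ids of datapoints carrying every (name, value) pair."""
    matches = None
    for name, value in required.items():
        mask = (tags["name"] == name) & (tags["value"] == value)
        ids = set(tags.loc[mask, "datapoint"])
        # Intersect so that all requested tags must be present (logical AND).
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


selected = datapoints_with_tags(
    tags_df, {"source": "stanford_dogs", "dataset": "train"}
)
print(selected)
```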