Dioptra Documentation

Miners SDK


Last updated 1 year ago


You can use several types of miners with Dioptra, depending on the type of model issue you are trying to resolve.

Miner catalog

Entropy mining

More details about when to use this miner.

from dioptra.miners.entropy_miner import EntropyMiner
class EntropyMiner(
    display_name: str,
    size: int,
    model_name: str,
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool]
):
Parameter | Description
display_name | Name of the miner.
size | Number of datapoints to sample.
model_name | Name of the model to get the prediction from. If used with EMBEDDINGS, the model name should include the layer name, in the format model_name:layer_name.
select_filters | A Dioptra-style list of filters that selects the data to mine from.
select_limit | A limit on the number of datapoints to sample from.
select_order_by | Name of a field to order by when selecting the data to mine from. Combine it with select_limit to control which datapoints are selected. For example, the last 1000 datapoints ordered by timestamp.
select_desc | Whether to order descending or ascending. Should be used with select_order_by.
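To make the idea behind entropy mining concrete, here is a minimal, self-contained sketch of entropy-based selection in general. It does not call the Dioptra SDK and is not Dioptra's implementation: the function names, the datapoint ids, and the probability vectors are all hypothetical, and the sketch assumes the model outputs a normalized class distribution per datapoint.

```python
import math
from typing import Dict, List


def shannon_entropy(probabilities: List[float]) -> float:
    """Entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def select_most_uncertain(predictions: Dict[str, List[float]], size: int) -> List[str]:
    """Return the ids of the `size` datapoints with the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda dp_id: shannon_entropy(predictions[dp_id]),
        reverse=True,
    )
    return ranked[:size]


# Hypothetical datapoint ids mapped to softmax outputs (illustration only).
predictions = {
    "dp-1": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "dp-2": [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    "dp-3": [0.70, 0.20, 0.10],
}

print(select_most_uncertain(predictions, size=2))
```

Here `size` plays the same role as the miner's size parameter: it caps how many of the most uncertain datapoints are returned.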

Activation mining

from dioptra.miners.activation_miner import ActivationMiner
class ActivationMiner(
    display_name: str,
    size: int,
    model_name: str,
    embeddings_field: str,
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool]
):
Parameter | Description
display_name | Name of the miner.
size | Number of datapoints to sample.
model_name | Name of the model to get the prediction from. If used with EMBEDDINGS, the model name should include the layer name, in the format model_name:layer_name.
embeddings_field | The field to use. Can be EMBEDDINGS or LOGITS.
select_filters | A Dioptra-style list of filters that selects the data to mine from.
select_limit | A limit on the number of datapoints to sample from.
select_order_by | Name of a field to order by when selecting the data to mine from. Combine it with select_limit to control which datapoints are selected. For example, the last 1000 datapoints ordered by timestamp.
select_desc | Whether to order descending or ascending. Should be used with select_order_by.
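The interaction between select_limit, select_order_by and select_desc can be illustrated with a small local sketch. This is only a model of the behavior described in the table, run on plain dictionaries; the function `select_pool`, the `timestamp` field, and the sample records are hypothetical, and the real selection happens inside Dioptra.

```python
from typing import Any, Dict, List, Optional


def select_pool(
    datapoints: List[Dict[str, Any]],
    order_by: Optional[str] = None,
    desc: Optional[bool] = None,
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Order the datapoints first, then keep at most `limit` of them."""
    pool = list(datapoints)
    if order_by is not None:
        pool.sort(key=lambda dp: dp[order_by], reverse=bool(desc))
    if limit is not None:
        pool = pool[:limit]
    return pool


# Hypothetical records; only the ordering field matters here.
datapoints = [
    {"id": "dp-1", "timestamp": 100},
    {"id": "dp-2", "timestamp": 300},
    {"id": "dp-3", "timestamp": 200},
]

# "The last 2 datapoints ordered by timestamp": newest first, then limit.
latest = select_pool(datapoints, order_by="timestamp", desc=True, limit=2)
print([dp["id"] for dp in latest])
```

Ordering before limiting is what makes the limit meaningful: without an order, which datapoints survive the limit is arbitrary.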

KNN mining

from dioptra.miners.knn_miner import KNNMiner
class KNNMiner(
    display_name: str,
    size: int,
    model_name: str,
    metric: Optional[str] = 'euclidean',
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool],
    select_reference_filters: List[object],
    select_reference_limit: Optional[int],
    select_reference_order_by: Optional[str],
    select_reference_desc: Optional[bool]
):
Parameter | Description
display_name | Name of the miner.
size | Number of datapoints to sample.
model_name | Name of the model to get the prediction from. The model name should include the layer name, in the format model_name:layer_name.
metric | The metric used to assess similarity. Can be euclidean or cosine.
select_filters | A Dioptra-style list of filters that selects the data to mine from.
select_limit | A limit on the number of datapoints to sample from.
select_order_by | Name of a field to order by when selecting the data to mine from. Combine it with select_limit to control which datapoints are selected. For example, the last 1000 datapoints ordered by timestamp.
select_desc | Whether to order descending or ascending. Should be used with select_order_by.
select_reference_filters | Same as select_filters, but selects the reference data to compute similarity against.
select_reference_limit | Same as select_limit, but for the reference data.
select_reference_order_by | Same as select_order_by, but for the reference data.
select_reference_desc | Same as select_desc, but for the reference data.
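As an illustration of similarity-based mining with the two metrics named above, the sketch below ranks candidate embeddings by their distance to the closest reference embedding. It is a generic nearest-neighbor sketch, not Dioptra's implementation: scoring each candidate by its nearest reference is an assumption, and every name and vector in it is hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean_distance,
    "cosine": cosine_distance,
}


def nearest_to_reference(
    candidates: Dict[str, Vector],
    references: List[Vector],
    size: int,
    metric: str = "euclidean",
) -> List[str]:
    """Return the ids of the `size` candidates closest to any reference."""
    distance = METRICS[metric]
    # Assumption: a candidate's score is its distance to the nearest reference.
    scores = {
        dp_id: min(distance(vec, ref) for ref in references)
        for dp_id, vec in candidates.items()
    }
    return sorted(scores, key=scores.get)[:size]


# Hypothetical 2-D embeddings (illustration only).
candidates = {"a": [1.0, 0.0], "b": [5.0, 5.0], "c": [1.0, 0.5]}
references = [[1.0, 1.0]]

print(nearest_to_reference(candidates, references, size=2, metric="euclidean"))
print(nearest_to_reference(candidates, references, size=2, metric="cosine"))
```

The two metrics can rank the same data differently: euclidean distance favors points that are close in space, while cosine distance favors points in the same direction regardless of magnitude, which is why `b` ranks first only under cosine.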

Coreset mining

from dioptra.miners.coreset_miner import CoresetMiner
class CoresetMiner(
    display_name: str,
    size: int,
    model_name: str,
    metric: Optional[str] = 'euclidean',
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool],
    select_reference_filters: List[object],
    select_reference_limit: Optional[int],
    select_reference_order_by: Optional[str],
    select_reference_desc: Optional[bool]
):
Parameter | Description
display_name | Name of the miner.
size | Number of datapoints to sample.
model_name | Name of the model to get the prediction from. The model name should include the layer name, in the format model_name:layer_name.
metric | The metric used to assess similarity. Can be euclidean or cosine.
select_filters | A Dioptra-style list of filters that selects the data to mine from.
select_limit | A limit on the number of datapoints to sample from.
select_order_by | Name of a field to order by when selecting the data to mine from. Combine it with select_limit to control which datapoints are selected. For example, the last 1000 datapoints ordered by timestamp.
select_desc | Whether to order descending or ascending. Should be used with select_order_by.
select_reference_filters | Same as select_filters, but selects the data that is already in the training dataset for the coreset.
select_reference_limit | Same as select_limit, but for the data that is already in the training dataset.
select_reference_order_by | Same as select_order_by, but for the data that is already in the training dataset.
select_reference_desc | Same as select_desc, but for the data that is already in the training dataset.
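One common way to build a coreset is greedy k-center selection: repeatedly pick the candidate that is farthest from everything already covered, starting from the data already in the training set. The sketch below shows that general technique on toy embeddings. Whether Dioptra's CoresetMiner uses exactly this procedure is not stated on this page, so treat it as an illustration; the function name and the vectors are hypothetical, and only the euclidean metric is shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def k_center_greedy(
    candidates: Dict[str, Vector],
    already_in_training: List[Vector],
    size: int,
) -> List[str]:
    """Greedily pick up to `size` candidates that best cover the embedding space."""
    # Distance from each candidate to its nearest already-covered point.
    min_dist = {
        dp_id: min(
            (math.dist(vec, ref) for ref in already_in_training),
            default=math.inf,
        )
        for dp_id, vec in candidates.items()
    }
    selected: List[str] = []
    while min_dist and len(selected) < size:
        # The least-covered candidate is the one farthest from all covered points.
        pick = max(min_dist, key=min_dist.get)
        selected.append(pick)
        del min_dist[pick]
        # The picked point now counts as covered; refresh the remaining distances.
        for dp_id in min_dist:
            min_dist[dp_id] = min(
                min_dist[dp_id], math.dist(candidates[dp_id], candidates[pick])
            )
    return selected


# Hypothetical 2-D embeddings (illustration only).
candidates = {
    "a": [0.0, 0.0],
    "b": [0.1, 0.0],
    "c": [10.0, 0.0],
    "d": [5.0, 0.0],
}
already_in_training = [[0.0, 0.0]]

print(k_center_greedy(candidates, already_in_training, size=2))
```

The reference set matters here: candidates `a` and `b` sit next to a point that is already in the training data, so they add little coverage and are picked last.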

BADGE mining

from dioptra.miners.badge_miner import BadgeMiner
class BadgeMiner(
    display_name: str,
    size: int,
    model_name: str,
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool],
):
Parameter | Description
display_name | Name of the miner.
size | Number of datapoints to sample.
model_name | Name of the model to get the prediction from. The model name should include the layer name, in the format model_name:layer_name.
select_filters | A Dioptra-style list of filters that selects the data to mine from.
select_limit | A limit on the number of datapoints to sample from.
select_order_by | Name of a field to order by when selecting the data to mine from. Combine it with select_limit to control which datapoints are selected. For example, the last 1000 datapoints ordered by timestamp.
select_desc | Whether to order descending or ascending. Should be used with select_order_by.

Weighted Entropy mining

from dioptra.miners.weighted_entropy_miner import WeightedEntropyMiner
class WeightedEntropyMiner(
    display_name: str,
    size: int,
    model_name: str,
    select_filters: List[object],
    select_limit: Optional[int],
    select_order_by: Optional[str],
    select_desc: Optional[bool]
):

Getting an existing miner

To get an existing miner, create a BaseMiner and set its miner_id.

from dioptra.miners.base_miner import BaseMiner

existing_miner = BaseMiner()
existing_miner.miner_id = 'my_miner_uuid'

Listing all miners

To get the list of all miners, use the list_miners utility.

from dioptra.miners import list_miners
def list_miners()

Running a miner

Once a miner is created, you can run it.

def run(self)

Getting Status

While the miner runs, you can get its status. It can be SUCCESS, PROCESSING, or FAILURE.

def get_status(self)

Get Results

After a run is finished, you can retrieve the results. The results are a list of the UUIDs of the selected datapoints. You can get the datapoints themselves using the select_datapoints utility method.

def get_results(self)
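The run, get_status and get_results methods are typically combined into a polling loop. The sketch below shows one way to do that. To stay runnable without credentials it uses `FakeMiner`, a hypothetical stand-in that only mimics the three method names from this page; `wait_for_results`, its parameters, and the assumption that get_status returns the status as a plain string are illustrative and not part of the SDK.

```python
import time
from typing import List


class FakeMiner:
    """Hypothetical stand-in exposing run / get_status / get_results."""

    def __init__(self, polls_until_done: int, results: List[str]):
        self._remaining = polls_until_done
        self._results = results

    def run(self) -> None:
        pass  # a real miner would start the mining job here

    def get_status(self) -> str:
        # Report PROCESSING a fixed number of times, then SUCCESS.
        if self._remaining > 0:
            self._remaining -= 1
            return "PROCESSING"
        return "SUCCESS"

    def get_results(self) -> List[str]:
        return list(self._results)


def wait_for_results(miner, poll_seconds: float = 5.0, timeout_seconds: float = 600.0) -> List[str]:
    """Poll the miner until it succeeds, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = miner.get_status()
        if status == "SUCCESS":
            return miner.get_results()
        if status == "FAILURE":
            raise RuntimeError("miner run failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("miner did not finish before the timeout")
        time.sleep(poll_seconds)


miner = FakeMiner(polls_until_done=2, results=["uuid-1", "uuid-2"])
miner.run()
selected_uuids = wait_for_results(miner, poll_seconds=0.01)
print(selected_uuids)
```

A timeout keeps a stuck job from blocking the caller forever, and raising on FAILURE avoids reading results from a run that never produced any.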

Get Config

To get the config of a miner, call the get_config method.

def get_config(self)

Delete

To delete a miner, call the delete method.

def delete(self)

Reset

You can reset a miner to clear its results.

def reset(self)
