⛏️Miners SDK
Dioptra supports several types of miners. Choose one based on the type of model issue you are trying to resolve.
Miner catalog
Entropy mining
More details about when to use this miner are available here.
from dioptra.miners.entropy_miner import EntropyMiner
class EntropyMiner(
display_name: str,
size: int,
model_name: str,
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool]
):
display_name
name of the miner
size
number of datapoints to sample
model_name
name of the model to get the prediction from. If used with EMBEDDINGS, the model name should include the layer name in the format model_name:layer_name
select_filters
a Dioptra-style list of filters to select the data to mine from
select_limit
a limit to the number of datapoints to sample from
select_order_by
the name of a field to order by when selecting the data to mine from. This is useful together with select_limit to control which datapoints are selected, for example the last 1000 datapoints ordered by timestamp
select_desc
whether to sort in descending or ascending order. Use together with select_order_by
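Entropy mining is a form of uncertainty sampling: presumably, the datapoints whose predicted class distribution is closest to uniform are the ones selected. The sketch below is a local illustration of that idea, not the Dioptra implementation. It assumes you already have raw logits per datapoint in a dict keyed by uuid; the helper names softmax, entropy, and entropy_mine are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_mine(logits_by_id: Dict[str, Sequence[float]], size: int) -> List[str]:
    """Return the ids of the `size` datapoints with the highest predictive entropy."""
    scores = {uid: entropy(softmax(logits)) for uid, logits in logits_by_id.items()}
    # Most uncertain first; ties broken by id so the result is deterministic.
    ranked = sorted(scores, key=lambda uid: (-scores[uid], uid))
    return ranked[:size]


if __name__ == "__main__":
    preds = {
        "a": [4.0, 0.0, 0.0],  # confident prediction
        "b": [1.0, 1.0, 1.0],  # uniform, maximally uncertain
        "c": [2.0, 2.0, 0.0],  # split between two classes
    }
    print(entropy_mine(preds, size=2))  # ['b', 'c']
```

Here size plays the same role as the miner's size parameter; the select_* parameters would decide which datapoints end up in logits_by_id in the first place.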
Activation mining
More details about when to use this miner are available here.
from dioptra.miners.activation_miner import ActivationMiner
class ActivationMiner(
display_name: str,
size: int,
model_name: str,
embeddings_field: str,
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool]
):
display_name
name of the miner
size
number of datapoints to sample
model_name
name of the model to get the prediction from. If used with EMBEDDINGS, the model name should include the layer name in the format model_name:layer_name
embeddings_field
the field to use: EMBEDDINGS or LOGITS
select_filters
a Dioptra-style list of filters to select the data to mine from
select_limit
a limit to the number of datapoints to sample from
select_order_by
the name of a field to order by when selecting the data to mine from. This is useful together with select_limit to control which datapoints are selected, for example the last 1000 datapoints ordered by timestamp
select_desc
whether to sort in descending or ascending order. Use together with select_order_by
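This section does not say how activations are scored. Purely as an illustration, and assuming a norm-based score (our assumption, not documented Dioptra behavior), the sketch below ranks datapoints by the L2 norm of the vector read from embeddings_field (EMBEDDINGS or LOGITS). The function names and the highest switch are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in vector))


def activation_mine(
    activations_by_id: Dict[str, Sequence[float]],
    size: int,
    highest: bool = True,
) -> List[str]:
    """Rank datapoints by the L2 norm of an embedding or logit vector.

    The scoring rule and the `highest` switch are assumptions for illustration,
    not documented Dioptra behavior.
    """
    norms = {uid: l2_norm(vec) for uid, vec in activations_by_id.items()}
    sign = -1.0 if highest else 1.0
    # Sort by signed norm, breaking ties by id for a deterministic result.
    ranked = sorted(norms, key=lambda uid: (sign * norms[uid], uid))
    return ranked[:size]


if __name__ == "__main__":
    acts = {"a": [3.0, 4.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(activation_mine(acts, size=2))                 # ['a', 'c']
    print(activation_mine(acts, size=1, highest=False))  # ['b']
```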
KNN mining
More details about when to use this miner are available here.
from dioptra.miners.knn_miner import KNNMiner
class KNNMiner(
display_name: str,
size: int,
model_name: str,
metric: Optional[str] = 'euclidean',
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool],
select_reference_filters: List[object],
select_reference_limit: Optional[int],
select_reference_order_by: Optional[str],
select_reference_desc: Optional[bool]
):
display_name
name of the miner
size
number of datapoints to sample
model_name
name of the model to get the prediction from. The model name should include the layer name in the format model_name:layer_name
metric
the metric used to assess similarity: euclidean (the default) or cosine
select_filters
a Dioptra-style list of filters to select the data to mine from
select_limit
a limit to the number of datapoints to sample from
select_order_by
the name of a field to order by when selecting the data to mine from. This is useful together with select_limit to control which datapoints are selected, for example the last 1000 datapoints ordered by timestamp
select_desc
whether to sort in descending or ascending order. Use together with select_order_by
select_reference_filters
same as select_filters, but selects the reference data that similarity is computed against
select_reference_limit
same as select_limit, but for the reference data
select_reference_order_by
same as select_order_by, but for the reference data
select_reference_desc
same as select_desc, but for the reference data
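To make the roles of the two selections concrete, here is a minimal local sketch of one common KNN-style selection rule: score every candidate (from select_*) by its distance to the nearest reference embedding (from select_reference_*) and keep the size closest ones. This is an assumption about the general technique, not Dioptra's implementation; the metric names mirror the documented euclidean and cosine options, and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    # 1 - cosine similarity; zero vectors are treated as orthogonal (distance 1.0).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "cosine": cosine_distance,
}


def knn_mine(
    candidates: Dict[str, Vector],
    references: List[Vector],
    size: int,
    metric: str = "euclidean",
) -> List[str]:
    """Return the ids of the `size` candidates nearest to any reference embedding.

    `references` must be non-empty.
    """
    dist = METRICS[metric]
    # Distance from each candidate to its single nearest reference point.
    nearest = {uid: min(dist(vec, ref) for ref in references) for uid, vec in candidates.items()}
    # Closest first; ties broken by id for a deterministic result.
    return sorted(nearest, key=lambda uid: (nearest[uid], uid))[:size]


if __name__ == "__main__":
    pool = {"a": [1.0, 0.0], "b": [5.0, 5.0], "c": [0.0, 2.0]}
    print(knn_mine(pool, references=[[0.0, 0.0]], size=2))  # ['a', 'c']
```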
Coreset mining
More details about when to use this miner are available here.
from dioptra.miners.coreset_miner import CoresetMiner
class CoresetMiner(
display_name: str,
size: int,
model_name: str,
metric: Optional[str] = 'euclidean',
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool],
select_reference_filters: List[object],
select_reference_limit: Optional[int],
select_reference_order_by: Optional[str],
select_reference_desc: Optional[bool]
):
display_name
name of the miner
size
number of datapoints to sample
model_name
name of the model to get the prediction from. The model name should include the layer name in the format model_name:layer_name
metric
the metric used to assess similarity: euclidean (the default) or cosine
select_filters
a Dioptra-style list of filters to select the data to mine from
select_limit
a limit to the number of datapoints to sample from
select_order_by
the name of a field to order by when selecting the data to mine from. This is useful together with select_limit to control which datapoints are selected, for example the last 1000 datapoints ordered by timestamp
select_desc
whether to sort in descending or ascending order. Use together with select_order_by
select_reference_filters
same as select_filters, but selects the data already in the training dataset, which the coreset selection takes into account
select_reference_limit
same as select_limit, but for the training data
select_reference_order_by
same as select_order_by, but for the training data
select_reference_desc
same as select_desc, but for the training data
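A standard way to build a coreset on top of data you already have is greedy k-center selection: each step takes the candidate farthest from everything already covered (the training data plus earlier picks), so the result spreads out instead of clustering. The sketch below shows that technique locally; whether Dioptra's CoresetMiner uses exactly this greedy rule is an assumption, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    # 1 - cosine similarity; zero vectors are treated as orthogonal (distance 1.0).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "cosine": cosine_distance,
}


def coreset_mine(
    candidates: Dict[str, Vector],
    references: List[Vector],
    size: int,
    metric: str = "euclidean",
) -> List[str]:
    """Greedy k-center: repeatedly take the candidate farthest from everything
    already covered (the reference/training data plus earlier picks)."""
    dist = METRICS[metric]
    # Distance from each candidate to its nearest covered point (inf if nothing is covered yet).
    min_dist = {
        uid: min((dist(vec, ref) for ref in references), default=math.inf)
        for uid, vec in candidates.items()
    }
    selected: List[str] = []
    while min_dist and len(selected) < size:
        # Farthest remaining candidate; iterating sorted ids breaks ties toward the smallest id.
        uid = max(sorted(min_dist), key=min_dist.__getitem__)
        selected.append(uid)
        del min_dist[uid]
        # The new pick now covers its neighbors, so shrink their distances.
        for other in min_dist:
            min_dist[other] = min(min_dist[other], dist(candidates[other], candidates[uid]))
    return selected


if __name__ == "__main__":
    pool = {"a": [1.0, 0.0], "b": [10.0, 0.0], "c": [9.0, 0.0], "d": [0.0, 5.0]}
    print(coreset_mine(pool, references=[[0.0, 0.0]], size=2))  # ['b', 'd']
```

In the example, "c" is nearly as far from the training point as "b", but once "b" is picked it adds little coverage, so "d" is chosen instead.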
BADGE mining
More details about when to use this miner are available here.
from dioptra.miners.badge_miner import BadgeMiner
class BadgeMiner(
display_name: str,
size: int,
model_name: str,
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool],
):
display_name
name of the miner
size
number of datapoints to sample
model_name
name of the model to get the prediction from. The model name should include the layer name in the format model_name:layer_name
select_filters
a Dioptra-style list of filters to select the data to mine from
select_limit
a limit to the number of datapoints to sample from
select_order_by
the name of a field to order by when selecting the data to mine from. This is useful together with select_limit to control which datapoints are selected, for example the last 1000 datapoints ordered by timestamp
select_desc
whether to sort in descending or ascending order. Use together with select_order_by
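BADGE (Batch Active learning by Diverse Gradient Embeddings) combines uncertainty and diversity: it maps each datapoint to the last-layer gradient of the cross-entropy loss, using the model's own predicted label, and then picks a batch with k-means++ seeding over those gradients. The sketch below illustrates that general technique locally; it assumes you have logits plus a penultimate-layer embedding (the layer named in model_name:layer_name) per datapoint, and it is not Dioptra's implementation. Function names are hypothetical, and the first center is chosen deterministically as the largest-norm gradient, whereas plain k-means++ would pick it at random.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_embedding(logits: Sequence[float], penultimate: Sequence[float]) -> List[float]:
    """Cross-entropy gradient w.r.t. last-layer weights, using the argmax
    prediction as the label: block k is (p_k - [k == pred]) * h."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    grad: List[float] = []
    for k, p in enumerate(probs):
        coeff = p - (1.0 if k == pred else 0.0)
        grad.extend(coeff * h for h in penultimate)
    return grad


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def badge_mine(
    logits_by_id: Dict[str, Sequence[float]],
    embeddings_by_id: Dict[str, Sequence[float]],
    size: int,
    seed: int = 0,
) -> List[str]:
    """k-means++ seeding over gradient embeddings (uncertain and diverse picks)."""
    rng = random.Random(seed)
    grads = {uid: gradient_embedding(logits_by_id[uid], embeddings_by_id[uid]) for uid in logits_by_id}
    ids = sorted(grads)
    if not ids or size <= 0:
        return []
    # First center: the gradient embedding with the largest norm.
    first = max(ids, key=lambda uid: sum(g * g for g in grads[uid]))
    selected = [first]
    # Squared distance from each remaining point to its nearest chosen center.
    d2 = {uid: sq_dist(grads[uid], grads[first]) for uid in ids if uid != first}
    while d2 and len(selected) < size:
        remaining = sorted(d2)
        weights = [d2[uid] for uid in remaining]
        if sum(weights) == 0:
            pick = remaining[0]  # every remaining point coincides with a center
        else:
            # k-means++ step: sample proportionally to squared distance.
            pick = rng.choices(remaining, weights=weights, k=1)[0]
        selected.append(pick)
        del d2[pick]
        for uid in d2:
            d2[uid] = min(d2[uid], sq_dist(grads[uid], grads[pick]))
    return selected


if __name__ == "__main__":
    logits = {"a": [5.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.2], "d": [3.0, 0.0]}
    embeddings = {uid: [1.0, 1.0] for uid in logits}
    picks = badge_mine(logits, embeddings, size=3)
    print(picks[0])  # b (least confident prediction, largest gradient)
```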
Weighted Entropy mining
More details about when to use this miner are available here.
from dioptra.miners.weighted_entropy_miner import WeightedEntropyMiner
class WeightedEntropyMiner(
display_name: str,
size: int,
model_name: str,
select_filters: List[object],
select_limit: Optional[int],
select_order_by: Optional[str],
select_desc: Optional[bool]
):
Getting an existing miner
To get an existing miner, set the miner_id of a BaseMiner:
from dioptra.miners.base_miner import BaseMiner
existing_miner = BaseMiner()
existing_miner.miner_id = 'my_miner_uuid'
Listing all miners
To get the list of all miners, use the list_miners utility:
from dioptra.miners import list_miners
def list_miners()
Running a miner
Once a miner is created, you can run it by calling the run method:
def run(self)
Getting Status
While the miner runs, you can get its status. It can be SUCCESS, PROCESSING, or FAILURE.
def get_status(self)
Get Results
After a run finishes, you can retrieve the results: a list of the uuids of the selected datapoints. To get the datapoints themselves, use the select_datapoints utility method.
def get_results(self)
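Putting run, get_status, and get_results together, a typical pattern is to start a miner and poll until it leaves PROCESSING. The helper below is a minimal sketch that relies only on those three documented method names. It assumes get_status returns the plain strings "SUCCESS", "PROCESSING", or "FAILURE" (the exact return type is not documented here) and that run may return before mining completes. wait_for_results, FakeMiner, and the polling and timeout values are hypothetical, chosen for illustration.

```python
import time
from typing import Any, List


def wait_for_results(miner: Any, poll_interval: float = 5.0, timeout: float = 3600.0) -> List[str]:
    """Run a miner, poll get_status() until it finishes, then return get_results().

    Assumes get_status() returns the strings "SUCCESS", "PROCESSING" or "FAILURE".
    """
    miner.run()
    deadline = time.monotonic() + timeout
    while True:
        status = miner.get_status()
        if status == "SUCCESS":
            return miner.get_results()  # list of datapoint uuids
        if status == "FAILURE":
            raise RuntimeError("miner run failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"miner still {status} after {timeout} s")
        time.sleep(poll_interval)


class FakeMiner:
    """Hypothetical stand-in exposing the same method names, for trying the helper offline."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self._polls_until_done = polls_until_done
        self._polls = 0

    def run(self) -> None:
        self._polls = 0

    def get_status(self) -> str:
        self._polls += 1
        return "SUCCESS" if self._polls >= self._polls_until_done else "PROCESSING"

    def get_results(self) -> List[str]:
        return ["uuid-1", "uuid-2"]


if __name__ == "__main__":
    print(wait_for_results(FakeMiner(), poll_interval=0.0))  # ['uuid-1', 'uuid-2']
```

With a real miner you would pass the miner object instead of FakeMiner, then hand the returned uuids to select_datapoints.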
Get Config
To get the config of a miner, call the get_config method.
def get_config(self)
Delete
To delete a miner, call the delete method.
def delete(self)
Reset
You can reset a miner to clear its results.
def reset(self)