Dioptra Documentation

Ingestion SDK

Ingesting data from local

You can upload metadata from your local environment directly using the upload_to_lake utility.

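from dioptra.lake.utils import upload_to_lake

# Signature
def upload_to_lake(
    records: List[object]
) -> object
# Returns a dict with an `id` property.

Parameter    Description
records      A list of JSON objects, each describing a datapoint. The objects should follow the format described in Supported fields.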
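As an illustration of a local upload, here is a minimal sketch. The helper `upload_records` and the field name `example_field` are hypothetical and not part of the SDK; the real record format is described in Supported fields. The upload function is passed in as an argument so the sketch can run without the SDK installed.

```python
import json
from typing import Any, Callable, Dict, List


def upload_records(
    records: List[Dict[str, Any]],
    upload_fn: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Check that every record is JSON-serializable, then upload the list.

    Hypothetical helper: `upload_fn` is expected to behave like
    upload_to_lake, i.e. accept a list of records and return a dict
    with an `id` property.
    """
    if not records:
        raise ValueError("records must be a non-empty list")
    for record in records:
        json.dumps(record)  # raises TypeError if a record cannot be serialized
    return upload_fn(records)


# Hypothetical records; the real field names are listed under Supported fields.
example_records = [
    {"example_field": "value-1"},
    {"example_field": "value-2"},
]

# With the SDK installed and configured (not executed here):
#   from dioptra.lake.utils import upload_to_lake
#   response = upload_records(example_records, upload_to_lake)
#   upload_id = response["id"]
```

Checking serializability before the call is a design choice of this sketch, meant to surface malformed records locally rather than after the request has been sent.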

Ingesting data located in an object store

If the data is in an object store, you can ingest it directly. The call creates a signed URL for the object, so the object store credentials must be configured on the machine making the call. It then submits an upload request to Dioptra with that signed URL.

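from dioptra.lake.utils import upload_to_lake_from_bucket

# Signature
def upload_to_lake_from_bucket(
    bucket_name: str,
    object_name: str
) -> object
# Returns a dict with an `id` property.

Parameter      Description
bucket_name    Name of the bucket where the NDJSON file is located.
object_name    The path to the newline-delimited JSON file. The objects in the file should follow the format described in Supported fields.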
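The object referenced by object_name is a newline-delimited JSON file, with one record per line. The sketch below shows one way such a file could be produced before it is copied to the bucket. The helpers `to_ndjson` and `write_ndjson`, and the bucket and path names in the comments, are hypothetical and not part of the SDK.

```python
import json
from typing import Any, Dict, Iterable


def to_ndjson(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def write_ndjson(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write the records to a local NDJSON file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_ndjson(records))


# After copying the file to your bucket with your object store tooling
# (not executed here; bucket and object names are placeholders):
#   from dioptra.lake.utils import upload_to_lake_from_bucket
#   response = upload_to_lake_from_bucket("my-bucket", "uploads/records.ndjson")
#   upload_id = response["id"]
```

json.dumps never emits raw newlines inside a serialized object with its default settings, so each record stays on a single line.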

Wait for your upload to finish

Each upload_* method returns a dict whose `id` property is the upload id. To wait for the upload to finish, pass that id to the wait_for_upload method.

def wait_for_upload(upload_id: str) -> object
# Returns a dict with details of an upload
Parameter    Description
upload_id    The id of the upload created by a call to upload_to_lake*
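Chaining an upload with the wait step could look like the sketch below. The helper `wait_for_response` is hypothetical. The import path for wait_for_upload is not stated on this page, so the commented usage assumes it lives alongside the upload functions in dioptra.lake.utils.

```python
from typing import Any, Callable, Dict


def wait_for_response(
    upload_response: Dict[str, Any],
    wait_fn: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Take the `id` from an upload_* response and wait for that upload.

    Hypothetical helper: `wait_fn` is expected to behave like
    wait_for_upload, i.e. accept an upload id and return a dict
    with details of the upload.
    """
    upload_id = upload_response["id"]
    return wait_fn(upload_id)


# With the SDK installed and configured (not executed here).
# The import location of wait_for_upload is an assumption:
#   from dioptra.lake.utils import upload_to_lake, wait_for_upload
#   response = upload_to_lake(records)
#   details = wait_for_response(response, wait_for_upload)
```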


