Dioptra Documentation

Dataset basics

In katiML, datasets are version-controlled collections of datapoints. In addition to adding datapoints to a dataset and removing them from it, you can commit, check out, and diff datasets, much as in git.

NOTE: Unlike git, katiML version control is centralized. There are no local repositories, and all users share the same versions of each dataset. Modifying a dataset produces a shared, uncommitted version that everyone sees. Similarly, checking out a version of a dataset makes that version current for everyone.
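
The centralized model in the note can be illustrated with a small sketch. This is a toy model in plain Python, not the katiML SDK: `SharedDataset`, `Client`, and the datapoint ID `dp-1` are hypothetical names chosen for illustration. It only shows that, when every user works against one shared state, a change made by one user is immediately visible to the others.

```python
class SharedDataset:
    """Toy stand-in for a centrally stored dataset (hypothetical, not the katiML API)."""

    def __init__(self):
        # The single uncommitted working state that every user sees.
        self.current = set()


class Client:
    """Toy user handle: holds a reference to the shared state, never a local copy."""

    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset

    def add(self, datapoint_id):
        self.dataset.current.add(datapoint_id)

    def datapoints(self):
        return sorted(self.dataset.current)


central = SharedDataset()
alice = Client("alice", central)
bob = Client("bob", central)

alice.add("dp-1")          # Alice modifies the dataset...
print(bob.datapoints())    # ...and Bob sees the same uncommitted change: ['dp-1']
```

In git, by contrast, each user would hold a separate working copy, and Bob would see nothing until Alice pushed and he pulled.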

Version Control commands

Commit

Creates a new version of a dataset from its uncommitted modifications.

View

Shows a previous version of the dataset. A committed version cannot be modified. To build on a previous commit, check it out, modify the dataset, and commit the result as a new version.

Checkout

Checks out (or rolls back to) a given commit and makes it current. The dataset can then be modified and the changes committed to create a new version.
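
The commit, view, and checkout semantics described above can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not the katiML SDK: the class `ToyVersionedDataset`, its method names, and the use of integer version numbers are all hypothetical.

```python
class ToyVersionedDataset:
    """Toy model of commit / view / checkout (hypothetical, not the katiML API)."""

    def __init__(self):
        self.commits = []     # list of (message, frozenset of datapoint IDs)
        self.current = set()  # uncommitted working state

    def add(self, *datapoint_ids):
        self.current.update(datapoint_ids)

    def remove(self, *datapoint_ids):
        self.current.difference_update(datapoint_ids)

    def commit(self, message):
        # Freeze the working state so the committed version cannot be modified.
        self.commits.append((message, frozenset(self.current)))
        return len(self.commits) - 1  # version number of the new commit

    def view(self, version):
        # Read-only access to a committed version.
        return self.commits[version][1]

    def checkout(self, version):
        # Make a committed version the current working state (copied, so the
        # commit itself stays untouched by later modifications).
        self.current = set(self.commits[version][1])


ds = ToyVersionedDataset()
ds.add("dp-1", "dp-2")
v0 = ds.commit("initial")
ds.add("dp-3")
v1 = ds.commit("add dp-3")

ds.checkout(v0)      # roll back to the first commit
ds.remove("dp-2")    # modify the checked-out state
v2 = ds.commit("drop dp-2")

print(sorted(ds.view(v0)))  # ['dp-1', 'dp-2']  (unchanged by later edits)
print(sorted(ds.view(v1)))  # ['dp-1', 'dp-2', 'dp-3']
print(sorted(ds.view(v2)))  # ['dp-1']
```

Storing each commit as a `frozenset` is one simple way to make committed versions immutable in this sketch. How katiML stores versions internally is not described here.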

Diff

Compares two versions of a dataset. The diff shows which datapoints were inserted or deleted, as well as the change in the distribution of ground truths and tags.
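
As a rough sketch of what such a diff computes, the function below compares two toy versions, each represented as a mapping from datapoint ID to a single ground-truth label. The representation, the function name `diff_versions`, and the output keys are assumptions made for illustration. They do not reflect the katiML SDK or its actual diff format, and tags are left out for brevity.

```python
from collections import Counter


def diff_versions(old, new):
    """Compare two toy dataset versions given as {datapoint_id: ground_truth_label}."""
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())

    # Change in the ground-truth distribution: per-label count in new minus old.
    old_counts = Counter(old.values())
    new_counts = Counter(new.values())
    labels = sorted(set(old_counts) | set(new_counts))
    change = {label: new_counts[label] - old_counts[label] for label in labels}

    return {"inserted": inserted, "deleted": deleted, "ground_truth_change": change}


v1 = {"dp-1": "cat", "dp-2": "dog", "dp-3": "dog"}
v2 = {"dp-1": "cat", "dp-3": "dog", "dp-4": "cat", "dp-5": "bird"}

result = diff_versions(v1, v2)
print(result["inserted"])             # ['dp-4', 'dp-5']
print(result["deleted"])              # ['dp-2']
print(result["ground_truth_change"])  # {'bird': 1, 'cat': 1, 'dog': -1}
```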

Difference between katiML Version Control and DVC

The main difference between katiML Version Control and DVC is the level at which version control happens.

With DVC, version control is done at the file level. DVC has no understanding of a file's contents, so a diff can tell you which files changed but not how the dataset changed.

With katiML, version control is done at the datapoint level. This means that katiML can tell exactly how the dataset changed over time and produce meaningful diffs.
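
The contrast between the two levels can be sketched in a few lines. Both halves are simplified stand-ins: the content fingerprint illustrates file-level tracking in general rather than DVC's internals, and `datapoint_changes` is a hypothetical helper, not a katiML function. The `relabeled` category is likewise an assumption added for the example.

```python
import hashlib
import json


def file_fingerprint(records):
    """File-level view: hash the serialized dataset as one opaque blob."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def datapoint_changes(old, new):
    """Datapoint-level view: report what changed, record by record."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "relabeled": sorted(k for k in shared if old[k] != new[k]),
    }


old = {"dp-1": "cat", "dp-2": "dog"}
new = {"dp-1": "cat", "dp-2": "wolf", "dp-3": "dog"}

# File level: we learn only that the file is different.
file_changed = file_fingerprint(old) != file_fingerprint(new)
print(file_changed)  # True

# Datapoint level: we learn which datapoints changed and how.
changes = datapoint_changes(old, new)
print(changes)  # {'added': ['dp-3'], 'removed': [], 'relabeled': ['dp-2']}
```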
