Creating your first capability

A tutorial on how to create your first capability, add it to an agent and run it in an assessment process

Overview

The following tutorial will walk you through creating an Assessment Capability in Highlighter. Each step in this tutorial can be carried out in your Highlighter account. Although we will be building a toy example, the tutorial will link to other explanations and references as needed.

Prerequisites

Labelled data: For this tutorial we will be using a Street Number detection dataset. To import it into your account click here. Otherwise you can just follow along and fill-in-the-blanks with your own data.

What Is An Assessment Capability

Assessment tasks are sub-tasks of a larger assessment process that an organisation wishes to carry out.

More information on Capabilities here

Steps

Identify The Taxonomy and create as needed
Create A Model that outputs the desired taxa
Create An Experiment to track your model training
Configure Training Run and click Train

Identify The Taxonomy

When working in an established account you may have an existing taxonomy defined. In this case you may be able to skip to the Create A Model step.

For this tutorial we will be creating a street number detector. The detector will take images as input and return an enum attribute indicating the digit (0-9) and a pixel location attribute. For example:

street-number-example-data

For this we will need to create 10 Object Classes in Highlighter, one for each digit.

In the Highlighter UI, click the Develop tab in the top ribbon
Click the Taxonomy tab on the Highlighter side bar
Click the New Object Class button
Fill in the form (using Zero as an example):
- Name: Zero
- Description: The digit zero
- Color: YOU CHOOSE
- Include in projects by default?: ☐ UNCHECK!
Click Save Object Class
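
The other nine Object Classes follow the same pattern as Zero. As a planning aid, the form values for all ten digits can be listed with a short script. This is only a sketch: the helper name object_class_form_values is hypothetical, the descriptions simply copy the wording used for Zero, and nothing here calls a Highlighter API. You still enter the values in the UI by hand.

```python
# Hypothetical planning helper: lists the values to type into the
# New Object Class form for each digit. It does not talk to Highlighter.
DIGIT_NAMES = [
    "Zero", "One", "Two", "Three", "Four",
    "Five", "Six", "Seven", "Eight", "Nine",
]


def object_class_form_values():
    """Return one dict of form values per digit, in order 0-9."""
    return [
        {
            "name": name,
            "description": f"The digit {name.lower()}",
            # Matches the unchecked "Include in projects by default?" box.
            "include_in_projects_by_default": False,
        }
        for name in DIGIT_NAMES
    ]


if __name__ == "__main__":
    for values in object_class_form_values():
        print(f"{values['name']}: {values['description']}")
```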

Create A Model

A Model in Highlighter represents a transformation from some input to some output. Importantly, it does not specify how the transformation is performed. For the programmers among you, a Model serves as an Interface.
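
To make the interface analogy concrete, here is a minimal Python sketch. The names DigitDetector, AlwaysZeroDetector and run are invented for illustration and are not part of Highlighter. The point is only that the caller depends on the declared input and output, not on how the transformation is implemented.

```python
from typing import Dict, List, Protocol


class DigitDetector(Protocol):
    """Illustrative interface: declares input and output, not the method."""

    def detect(self, image: bytes) -> List[Dict[str, object]]:
        ...


class AlwaysZeroDetector:
    """Toy implementation that always reports one 'zero' in a fixed box."""

    def detect(self, image: bytes) -> List[Dict[str, object]]:
        return [{"object_class": "zero", "pixel_location": (0, 0, 10, 10)}]


def run(detector: DigitDetector, image: bytes) -> List[Dict[str, object]]:
    # The caller relies only on the interface, so any implementation
    # with a matching detect() method can be swapped in.
    return detector.detect(image)


if __name__ == "__main__":
    print(run(AlwaysZeroDetector(), b""))
```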

Create a new Model. For now, you'll need to do this in the Highlighter admin.

Log in to Highlighter admin
Capabilities->Models->New Model
Name: Street Number Detector
Description: Detects the location of street numbers within an image and classifies each digit from 0-9
Leave the rest of the fields as is. We are currently in the process of refactoring this part of Highlighter and need to do some cleaning up.
Click Create Model
Stay on this page for the next steps

model-created-page

Model Inputs

The Model's inputs serve two functions depending on where they're being used. At training time they inform the Dataset manipulations needed to convert the Highlighter Dataset Submissions into a form the model Trainer can use. At inference time they tell the Pipeline what data to pass to the model. For more information about Model Inputs see here.

In our case the Model we're creating is a Detector. It takes images as an input. At the moment an input image is assumed. Also, a Detector expects a full image so no filtering is required.

Model Output

The Model's outputs also serve two functions depending on where they're being used. At training time they are used in conjunction with the Model Inputs to convert the Highlighter Dataset Submissions into a form the model Trainer can use. At inference time they tell the Pipeline what attributes the Pipeline Element is producing.

In our case we're creating a single-headed Model, with head 0 producing 10 Object Class attributes (one for each digit 0-9) and a pixel location attribute representing the bounding box containing each digit.

model-heads

At the moment the pixel_location attribute is inferred within the code and has not been made explicit, so we need only add the Object Class attributes to the Model Outputs.

Add Model Outputs

From the Model page you landed on in the previous step, do the following once for each digit 0-9:
In the Actions box on the right, click Manage Model Outputs->New Model Output
Head: 0
Position: 0 (increment for each digit 0-9)
Entity Attribute: object_class
Entity Attribute Enum: zero (select the appropriate object class)
Default Threshold: 0.5
Click Create Model Output
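
Because the form is filled in ten times with only the Position and Entity Attribute Enum changing, it can help to write out the ten rows first. The sketch below is a checklist generator, not a Highlighter API call. The function name model_output_rows is hypothetical, and the enum names other than zero are assumed to match whatever you named your Object Classes.

```python
# Assumed enum names; replace them with the Object Class names in your account.
DIGIT_ENUMS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def model_output_rows(head=0, default_threshold=0.5):
    """Return the form values for each of the ten Model Outputs."""
    return [
        {
            "head": head,
            "position": position,  # 0 for zero, 1 for one, and so on
            "entity_attribute": "object_class",
            "entity_attribute_enum": enum,
            "default_threshold": default_threshold,
        }
        for position, enum in enumerate(DIGIT_ENUMS)
    ]


if __name__ == "__main__":
    for row in model_output_rows():
        print(row)
```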

Create An Experiment

Highlighter uses Experiments to track model training. Typically you will want to improve a model iteratively over several training runs. Highlighter requires you to group Experiments under a Research Plan. The Research Plan provides a place to record high-level objectives and aggregate performance metrics, whereas Experiments provide a place for more detailed investigations.

We recommend that an Experiment contains a single Training Run. This avoids confusion when inspecting performance metrics.

Research Plan: (Street Number Detection)
- Experiment00: (Baseline: train model using default settings)
  - TrainingRun00 (Model Template: Detector DETR Resnet50)
  - Metrics ...
- Experiment01: (V1: Increased batch size to reduce loss noise)
  - TrainingRun01 (Model Template: Detector DETR Resnet50)
  - Metrics ...
- ...

The following steps are carried out in the Highlighter frontend UI.

Create Research Plan

In the Highlighter UI, click the Develop tab in the top ribbon
Research Plans->New Research Plan
- Title: Street Number Detector
- Description: Tutorial
- Objective: To develop a model capable of locating street numbers in images and classifying the digit (0-9)
- Evaluation Process: ToDo
- Assigned To: YOU
- Add Metrics: ToDo
- Click Save Research Plan

Create Experiment

From the Street Number Detector Research Plan page
Experiments->New Experiment
- Title: Baseline
- Description: Train model using default settings
- Hypothesis: 🤷
- Assigned To: YOU
- Click Save Experiment
- Note the experiment id in the url: https://demo.highlighter.ai/research_plans/123/experiments/456
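
If you want to pull the ids out of the URL programmatically, a small parser along these lines would work. It assumes the URL has the /research_plans/<id>/experiments/<id> shape shown above. The function name parse_experiment_url is hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches paths of the form /research_plans/123/experiments/456
_EXPERIMENT_PATH = re.compile(r"^/research_plans/(\d+)/experiments/(\d+)/?$")


def parse_experiment_url(url):
    """Return (research_plan_id, experiment_id) as integers."""
    path = urlparse(url).path
    match = _EXPERIMENT_PATH.match(path)
    if match is None:
        raise ValueError(f"Not an experiment URL: {url}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    example = "https://demo.highlighter.ai/research_plans/123/experiments/456"
    plan_id, experiment_id = parse_experiment_url(example)
    print(plan_id, experiment_id)
```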

Configure Training Run

We're on the home stretch!!!

The Training Run is where you select the Model Template to use and configure the various parameters. Some parameters are common to several templates and some are unique to a specific template. See the Model Template reference for the specifics of each.

Select a Model Config Template

Model Config Templates are .json files with placeholders for some variables. Creating a Model Config Template is outside the scope of this tutorial; for more information see the Creating a Model Config Template tutorial.

For now, we will simply use Detector DETR Resnet50. This template configures the Trainer to train a DETR model. For more information on DETR see End-to-End Object Detection with Transformers.

Configure Training Run

In the Highlighter UI, click the Develop tab in the top ribbon
Training->Train New Model
Name: Street Number Detector 00
Model: Street Number Detector
Add Dataset:
- Purpose: train
- Dataset*: street-numbers-demo-train-0.8
Add Dataset:
- Purpose: dev
- Dataset*: street-numbers-demo-test-0.2
Add Dataset:
- Purpose: test
- Dataset*: street-numbers-demo-test-0.2
Model Template: Detector DETR Resnet50 (you can view the full template by clicking Detail)
Config Override: (see JSON blob below). The Config Override dict is merged with the Model Template prior to training.
- num_classes: Set the number of classes to 10
- max_epochs: Set the number of epochs to train for; for brevity, let's say 5
Click Train

{
  "model": {
    "bbox_head": {
      "num_classes": 10
    }
  },
  "train_cfg": {
    "max_epochs": 5
  }
}
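
To build intuition for how an override like this combines with a template, the sketch below performs a recursive dictionary merge in Python. It is an assumption about the general idea, not Highlighter's actual merge code, which may treat lists or missing keys differently. The template fragment is a made-up stand-in with placeholder values, not the real Detector DETR Resnet50 template.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where values in override win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up stand-in for a template; the real one has many more keys.
template = {
    "model": {"bbox_head": {"num_classes": 80, "other_setting": "kept"}},
    "train_cfg": {"max_epochs": 150},
}

# The Config Override from this tutorial.
override = {
    "model": {"bbox_head": {"num_classes": 10}},
    "train_cfg": {"max_epochs": 5},
}

if __name__ == "__main__":
    print(deep_merge(template, override))
```

Under this assumed behaviour, only the keys named in the override change, and sibling keys in the template are left untouched.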

Optional: Override Default Highlighter Dataset Preprocessing Operations

Highlighter Datasets typically contain more attributes than a specific model needs at training time. For example, we may be training a cat and dog detector while the dataset also contains trees and cars. To perform this filtering we have a DatasetPreprocessor class that can run a sequence of operations (ops). See the hl_train/highlighter_dataset/ops.py reference for all the common operations.

You can set your own preprocessing ops by setting the HIGHLIGHTER_DATASET_PREPROCESSING key in the Model Config Overrides. The configured ops can be told to run before or after the default ops, or you can delete the default ops and use only the ops you configure manually.
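
The three orderings can be pictured with a small Python sketch. This is not the DatasetPreprocessor implementation: the function resolve_ops and the op names are invented for illustration, and of the mode strings only "before" appears in this tutorial. "after" and "replace" are assumed names for the other two behaviours.

```python
def resolve_ops(default_ops, custom_ops, when):
    """Return the ordered list of ops to run for a given mode."""
    if when == "before":
        # Custom ops run first, then the defaults.
        return list(custom_ops) + list(default_ops)
    if when == "after":
        # Assumed mode name: defaults run first, then the custom ops.
        return list(default_ops) + list(custom_ops)
    if when == "replace":
        # Assumed mode name: the defaults are dropped entirely.
        return list(custom_ops)
    raise ValueError(f"Unknown mode: {when}")


if __name__ == "__main__":
    defaults = ["DefaultOpA", "DefaultOpB"]  # placeholder op names
    custom = ["DropImages"]
    for mode in ("before", "after", "replace"):
        print(mode, resolve_ops(defaults, custom, mode))
```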

For example, to drop some corrupt images from the Dataset before the default ops are run, you could add the following:

{
  "HIGHLIGHTER_DATASET_PREPROCESSING": {
    "when": "before",
    "ops": [
      {"type": "DropImages", "image_ids": [12345, 67890]}
    ]
  }