Download Datasets From Highlighter
The Highlighter SDK allows you to download your Datasets from your Highlighter account and save it in some common formats.
When converting to common formats sucs as Coco or Yolo things like entity_id
will not be preserved. Only the information nessessary for training will end up
in the resulting saved dataset. If you want to save a dataset localled and not
loose this information you must use the hdf
or json
format.
CLI
hl dataset read --help
Usage: hl dataset read [OPTIONS] COMMAND [ARGS]...
Options:
-i, --dataset-ids TEXT integet <id> or <id>:<split>
--page-size INTEGER [default: 200]
--help Show this message and exit.
Commands:
coco
hdf
yolo
Use --help
to see the format specific cli options
For example, the following will:
- download dataset 123 and 456
- save the images to
/my/image/cache/
- save the annotations as a
coco
dataset tomy_dataset/
hl dataset read -i 123:train -i 456:test coco --annotations-dir my_dataset/ --data-file-dir /my/image/cache/
ls my_dataset/
> test.json train.json
Use --help
to see the format specific cli Options
hl dataset read -i 123 yolo --help
Python API
The following will do same download and conversion as in the CLI example
from pathlib import Path
from highlighter.datasets import Dataset
from highlighter.datasets.formats.coco import CocoWriter
from highlighter import HLClient
client = HLClient.from_env()
train_ds = Dataset.read_highlighter_dataset_assessments(
client, 123
)
train_ds.data_files_df.loc[:, "split"] = "train"
test_ds = Dataset.read_highlighter_dataset_assessments(
client, 456
)
test_ds.data_files_df.loc[:, "split"] = "test"
combined_ds = Dataset.combine([train_ds, test_ds])
annotations_dir = Path("my_dataset/")
writer = CocoWriter(annotations_dir)
writer.write(combined_ds)
images_dir = Path("/my/image/cache/")
Dataset.download_dataset_files(
client,
images_dir,
combined_ds.data_files_df,
)