Download Datasets From Highlighter
The Highlighter SDK lets you download Datasets from your Highlighter account and save them in several common formats.
When converting to common formats such as COCO or YOLO, fields like entity_id
are not preserved. Only the information necessary for training ends up
in the saved dataset. If you want to save a dataset locally without
losing this information, you must use the hdf or json format.
CLI
hl dataset read --help
Usage: hl dataset read [OPTIONS] COMMAND [ARGS]...
Options:
-i, --dataset-ids TEXT integer <id> or <id>:<split>
--page-size INTEGER [default: 200]
--help Show this message and exit.
Commands:
coco
hdf
yolo
Use --help to see the format specific cli options
For example, the following will:
- download datasets 123 and 456
- save the images to /my/image/cache/
- save the annotations as a COCO dataset to my_dataset/
hl dataset read -i 123:train -i 456:test coco --annotations-dir my_dataset/ --data-file-dir /my/image/cache/
ls my_dataset/
> test.json train.json
Use --help on a format subcommand to see its format-specific options, for example:
hl dataset read -i 123 yolo --help
Python API
The following performs the same download and conversion as the CLI example:
from pathlib import Path

from highlighter import HLClient
from highlighter.datasets import Dataset
from highlighter.datasets.formats.coco import CocoWriter

# Build a client from the environment
client = HLClient.from_env()

# Read dataset 123 and mark every file as part of the "train" split
train_ds = Dataset.read_highlighter_dataset_assessments(
    client, 123
)
train_ds.data_files_df.loc[:, "split"] = "train"

# Read dataset 456 and mark every file as part of the "test" split
test_ds = Dataset.read_highlighter_dataset_assessments(
    client, 456
)
test_ds.data_files_df.loc[:, "split"] = "test"

# Merge both datasets so they are written together
combined_ds = Dataset.combine([train_ds, test_ds])

# Write the annotations in COCO format
annotations_dir = Path("my_dataset/")
writer = CocoWriter(annotations_dir)
writer.write(combined_ds)

# Download the image files referenced by the combined dataset
images_dir = Path("/my/image/cache/")
Dataset.download_dataset_files(
    client,
    images_dir,
    combined_ds.data_files_df,
)
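
Once the annotations are written, a quick sanity check of the output can be useful. The following is a minimal sketch that uses only the Python standard library. It assumes the written files follow the standard COCO layout (top-level "images", "annotations" and "categories" keys). The helper name summarise_coco is hypothetical and is not part of the Highlighter SDK.

```python
import json
from collections import Counter
from pathlib import Path


def summarise_coco(annotation_file):
    """Return basic counts for a COCO-style annotation file.

    Hypothetical helper, not part of the Highlighter SDK. Assumes the
    standard COCO top-level keys: images, annotations, categories.
    """
    with Path(annotation_file).open() as f:
        coco = json.load(f)

    # .get() guards against a file that omits one of the keys
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])
    categories = coco.get("categories", [])

    # Map category ids to names so the per-category counts are readable
    id_to_name = {c["id"]: c["name"] for c in categories}
    per_category = Counter(
        id_to_name.get(a["category_id"], "unknown") for a in annotations
    )

    return {
        "images": len(images),
        "annotations": len(annotations),
        "per_category": dict(per_category),
    }
```

For instance, calling summarise_coco("my_dataset/train.json") and summarise_coco("my_dataset/test.json") should show whether each split contains the number of images and annotations you expect.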