- defines a unified contract for datasets for purposes such as training, visualization, and exploration, via `DatasetManifest`, `ImageDataManifest`, etc.
- provides many commonly used dataset operations, such as sampling a dataset by categories, sampling a few-shot sub-dataset, sampling by ratios, train-test splitting, merging datasets, etc. (See [here](#oom))
- `image_text_matching`: each image is associated with a collection of texts describing the image, and whether each text description matches the image or not.
- `image_matting`: each image has a pixel-wise annotation, where each pixel is labeled as 'foreground' or 'background'.
- `image_caption`: each image is labeled with a few texts describing the image.
- `text_2_image_retrieval`: each image is labeled with a number of text queries describing the image. Optionally, an image is associated with one label.
`multitask` type is a composition type, where one set of images has multiple sets of annotations available for different tasks, and each task can be of any basic type.
`key_value_pair` type is a generalized type, where a sample consists of one or multiple images with optional text, labeled with key-value pairs. The keys and values are defined by a schema. Note that all of the basic types above can be defined as this type with specific schemas.
| S | `DatasetManifest` | wraps the information about a dataset, including the labelmap, images (width, height, path to image), and annotations. Information about each image is stored in an `ImageDataManifest`. <br>For a multitask dataset, the labels stored in each `ImageDataManifest` are a dict mapping from task name to that task's labels. The labelmap stored in `DatasetManifest` is also a dict mapping from task name to that task's labels. |
| S,M | `ImageDataManifest` | encapsulates image-specific information, such as image id, path, labels, and width/height. One thing to note here is that the image path can be:<br> 1. a local path (absolute `c:\images\1.jpg` or relative `images\1.jpg`), <br> 2. a local path in a **non-compressed** zip file (absolute `c:\images.zip@1.jpg` or relative `images.zip@1.jpg`), or <br> 3. a URL. <br>All three kinds of paths can be loaded by `VisionDataset`. |
| S | `ImageLabelManifest` | encapsulates a single image-level annotation |
| S | `CategoryManifest` | encapsulates the information about a category, such as its name and super category, if applicable |
| M | `MultiImageLabelManifest` | an abstract class that encapsulates one annotation associated with one or multiple images; each image is referenced by its index. |
| M | `DatasetManifestWithMultiImageLabel` | supports annotations associated with one or multiple images. Each annotation is represented by a `MultiImageLabelManifest`, and each image by an `ImageDataManifest`. |
| M | `KeyValuePairDatasetManifest` | inherits from `DatasetManifestWithMultiImageLabel`; a dataset in which each sample has a `KeyValuePairLabelManifest` label. The dataset is also associated with a schema that defines the expected keys and values. |
| M | `KeyValuePairLabelManifest` | inherits from `MultiImageLabelManifest`; encapsulates the label information of a `KeyValuePairDatasetManifest`. Each label has fields `img_ids` (associated images), `text` (associated text input), and `fields` (a dictionary of the field keys and values of interest). |
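As a rough sketch of how these classes relate, the loop below walks an already-created single-image manifest. The attribute names (`images`, `img_path`, `width`, `height`, `labels`) are assumptions inferred from the descriptions above; verify them against the class definitions.

```python
# Sketch only: `dataset_manifest` is an already-created DatasetManifest (see the
# sections below for how to create one). Attribute names are assumptions inferred
# from the table above, not a verbatim API reference.
for image in dataset_manifest.images:   # each entry is an ImageDataManifest
    print(image.img_path, image.width, image.height)
    for label in image.labels:          # each entry is an ImageLabelManifest
        print(label)
```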
In addition to loading a serialized `DatasetManifest` for instantiation, this repo currently supports two formats of data that can instantiate a `DatasetManifest`:
`DatasetInfo`, as the first arg in the arg list, wraps the meta info about the dataset, such as the name of the dataset, the locations of the images, the annotation files, etc. See examples in the sections below.
Once a `DatasetManifest` is created, you can create a `VisionDataset` for accessing the data in the dataset, especially the image data, for training, visualization, etc.
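A minimal sketch, assuming `VisionDataset` takes the `DatasetInfo` and the `DatasetManifest` as its first two arguments (check the constructor for optional arguments such as the coordinate format):

```python
from vision_datasets.common import VisionDataset

# `dataset_info` and `dataset_manifest` are assumed to have been created earlier,
# e.g. from a DatasetInfo definition and one of the manifest adaptors described below.
dataset = VisionDataset(dataset_info, dataset_manifest)
print(len(dataset))  # number of samples in the dataset
```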
You can use `CocoManifestAdaptorFactory` to create the manifest from COCO format data and a schema. A COCO data example can be found in `COCO_DATA_FORMAT.md`, and a schema example (a dictionary) in `DATA_PREPARATION.md`.
```python
from vision_datasets.common import CocoManifestAdaptorFactory, DatasetTypes

schema = {}  # fill in with your schema; check the schema dictionary example from `DATA_PREPARATION.md`
# create an adaptor for the key_value_pair type with the schema
# (check `CocoManifestAdaptorFactory` for the exact creation API if this call differs in your version)
adaptor = CocoManifestAdaptorFactory.create(DatasetTypes.KEY_VALUE_PAIR, schema=schema)
key_value_pair_dataset_manifest = adaptor.create_dataset_manifest(coco_file_path_or_url='test.json', url_or_root_dir='data/')  # image paths in test.json are relative to url_or_root_dir
```
COCO annotation format details w.r.t. `image_classification_multiclass/label`, `image_object_detection`, `image_caption`, `image_text_matching`, `key_value_pair`, and `multitask` can be found in `COCO_DATA_FORMAT.md`.
Iris format is a legacy format which can be found in `IRIS_DATA_FORMAT.md`. Only `multiclass/label_classification`, `object_detection` and `multitask` are supported.
Once you have multiple datasets, it is more convenient to have all the `DatasetInfo` in one place and instantiate a `DatasetManifest`, or even a `VisionDataset`, by just using the dataset name, usage (train, val, or test), and version.
This repo offers the class `DatasetHub` for this purpose. Once it is instantiated with a json including the `DatasetInfo` for all datasets, you can retrieve a `VisionDataset` by dataset name, version, and usage.
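A hedged sketch of that flow; the `DatasetHub` constructor arguments (json text, optional blob container URL, optional local cache directory) and the creation method name are assumptions to verify against the class:

```python
import pathlib

from vision_datasets.common import DatasetHub, Usages

# Assumption: 'datasets.json' holds the DatasetInfo entries for all your datasets, and
# the container URL / local directory tell the hub where to download and cache data.
dataset_infos_json = pathlib.Path('datasets.json').read_text()
hub = DatasetHub(dataset_infos_json, 'https://your.blob.container/url_with_sas', './data')

# 'my-dataset' is a placeholder name defined in datasets.json; usage selects the split.
vision_dataset = hub.create_vision_dataset('my-dataset', version=1, usage=Usages.TRAIN)
```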
Operations on manifests, such as split, merge, and sample, are supported for different data types. You can run
`vision_list_supported_operations -d {DATA_TYPE}`
to see the supported operations for a specific data type. You can use the factory classes in `vision_datasets.common.factory` to create operations for a certain data type.
```python
from vision_datasets.common import DatasetTypes, SplitFactory, SplitConfig
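
# The lines below are a sketch (not verbatim from this repo's docs): create a split
# operation for a data type via the factory with a SplitConfig, then run it on an
# existing manifest. Verify the exact SplitFactory.create and run signatures.
data_manifest = ...  # an existing DatasetManifest, e.g. created by a CocoManifestAdaptor

splitter = SplitFactory.create(DatasetTypes.IMAGE_CLASSIFICATION_MULTICLASS, SplitConfig(ratio=0.7))
train_manifest, val_manifest = splitter.run(data_manifest)
```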
Training with PyTorch is easy. After instantiating a `VisionDataset`, simply pass it into `vision_datasets.common.dataset.TorchDataset` together with a `transform`, and you are good to go with the PyTorch `DataLoader` for training.
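A hedged sketch of that wiring; the `TorchDataset` import path and constructor are assumptions based on the module path mentioned above and should be verified against the installed package:

```python
from torch.utils.data import DataLoader

from vision_datasets.common.dataset import TorchDataset  # path as referenced above; verify for your version

# `vision_dataset` is an already-created VisionDataset (see the earlier sections).
# Assumption: TorchDataset accepts the dataset plus an optional transform; check whether
# the transform receives the image only or an (image, target) pair in your version.
transform = None  # plug in your preprocessing here
torch_dataset = TorchDataset(vision_dataset, transform=transform)
data_loader = DataLoader(torch_dataset, batch_size=32, shuffle=True)  # a collate_fn may be needed to batch PIL images
```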
There are a few commands that come with this repo once installed, such as dataset check and download, conversion of a detection dataset to a classification dataset, and so on; check [`UTIL_COMMANDS.md`](./UTIL_COMMANDS.md) for details.