torchgeo

Contents

Datasets

Benchmark vs. Generic
Curated vs. Uncurated
Image vs. Target
Chip vs. Tile vs. Region
Geo vs. Vision
STAC vs. non-STAC
GeoDataset

Samplers

GeoSampler
RandomGeoSampler
GridGeoSampler (SequentialGeoSampler? CheckerboardGeoSampler?)
PreChippedGeoSampler

Transforms
Models

Datasets

There are many different ways in which we can classify our datasets. This classification allows us to create abstract base classes to ensure a uniform API for all subclasses. It's also important for organizing the documentation.

Benchmark vs. Generic

Benchmark: contains both images and targets (e.g. COWC, VHR-10, CV4A Kenya)
Generic: contains only images or targets (e.g. Landsat, Sentinel, CDL, Chesapeake)

We want to be able to combine two or more "generic" datasets to get a single "benchmark" dataset. For example, we need a way for users to specify an image source (e.g. Landsat, Sentinel) and a target source (e.g. CDL, Chesapeake). We can only combine "generic" datasets if they contain geospatial information. Is this always the case?

Curated vs. Uncurated

Alternative names for benchmark vs. generic; the categories are otherwise identical.

Image vs. Target

Image: contains raw images
Target: contains ground truth targets

This makes it easy to combine "image" and "target" datasets into a single supervised learning problem, but what about datasets that contain both images and targets? Do we want to allow people to swap image or target sources in these kinds of datasets?

Chip vs. Tile vs. Region

Chip/Patch: pre-defined chips/patches (e.g. COWC, VHR-10, DOTA)
Tile/Swath: possibly-overlapping tiles we need to sample chips/patches from (e.g. Landsat, Sentinel, CV4A Kenya)
Map/Region: static maps of stitched-together data (e.g. CDL, Chesapeake Land Cover, static Google Earth imagery)

Again, we need to be able to combine datasets from different categories into a single data loader.

Geo vs. Vision

Geo: contains lat/lon/proj/crs information; time is optional (e.g. Landsat, CDL)
Vision: no lat/lon information, only pre-defined images/targets

Any kind of geospatial dataset can be combined with another. It doesn't matter whether they use chips, tiles, or regions, as long as we have the lat/lon info.

Vision datasets can be combined to increase the size of the dataset. This creates a ConcatDataset just like in PyTorch/torchvision. Geo datasets can be combined to create a UnionDataset or IntersectionDataset.
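As a rough illustration of what a UnionDataset or IntersectionDataset would need to compute, the sketch below derives the joint extent of two geo datasets from their bounds. It assumes both sets of bounds are already expressed in the same CRS; BBox, intersection, and union_bounds are hypothetical names, not an existing API.

```python
from typing import NamedTuple, Optional


class BBox(NamedTuple):
    """Axis-aligned bounds in a shared CRS (hypothetical helper type)."""

    minx: float
    miny: float
    maxx: float
    maxy: float


def intersection(a: BBox, b: BBox) -> Optional[BBox]:
    """Region covered by both datasets, or None if they do not overlap."""
    minx, miny = max(a.minx, b.minx), max(a.miny, b.miny)
    maxx, maxy = min(a.maxx, b.maxx), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return BBox(minx, miny, maxx, maxy)


def union_bounds(a: BBox, b: BBox) -> BBox:
    """Smallest box that encloses both datasets."""
    return BBox(
        min(a.minx, b.minx),
        min(a.miny, b.miny),
        max(a.maxx, b.maxx),
        max(a.maxy, b.maxy),
    )


imagery = BBox(0.0, 0.0, 100.0, 100.0)   # e.g. an image source
labels = BBox(50.0, 20.0, 150.0, 80.0)   # e.g. a target source
overlap = intersection(imagery, labels)  # where image and target both exist
extent = union_bounds(imagery, labels)   # where at least one of them exists
```

An intersection suits the image-plus-target case, since a supervised sample needs both; a union suits merging several sources of the same kind.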

STAC vs. non-STAC

Some datasets use the STAC API. Presumably there are competitors. Datasets stored using the STAC API may allow for a nice base class.

GeoDataset

Calculate the intersection of the bounds of the subdatasets to find where to sample from (easy)
Choose which coordinate system to use for the joint dataset, i.e. what coordinate system the resulting chips will be in (could be an argument)
Choose which spatial resolution to use, i.e. how big each pixel in the resulting chips will be (could be an argument)
Calculate the chips that you will sample in terms of geographic coordinates (not too bad)
This makes the chip_size argument a bit weird -- it is much more natural to specify this in "pixels" so there should be some conversion here
In __getitem__ you now need to crop chips from each dataset and resample/warp to a common grid (not horrible, but needs to be as fast as possible)
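The pixel-to-geographic conversion mentioned for chip_size could look roughly like the sketch below. It assumes square pixels and a north-up grid; chip_bounds, source_pixels, and the example coordinates are made up for illustration.

```python
import math
from typing import Tuple


def chip_bounds(
    minx: float, maxy: float, chip_size_px: int, res: float
) -> Tuple[float, float, float, float]:
    """Bounds (minx, miny, maxx, maxy) of a square chip with top-left corner (minx, maxy).

    `res` is the target resolution in CRS units per pixel.
    """
    side = chip_size_px * res  # chip edge length in CRS units
    return (minx, maxy - side, minx + side, maxy)


def source_pixels(chip_size_px: int, res: float, src_res: float) -> int:
    """Pixels to read along one edge of a source whose native resolution is `src_res`."""
    return math.ceil(chip_size_px * res / src_res)


# A 256 x 256 chip on a 30 m grid covers 7680 m on each side.
bounds = chip_bounds(minx=500000.0, maxy=4100000.0, chip_size_px=256, res=30.0)
# A 10 m source needs a 768 x 768 read to cover the same ground.
n_src = source_pixels(chip_size_px=256, res=30.0, src_res=10.0)
```

Users keep specifying chip_size in pixels, and each subdataset works out how much of its own raster it has to read before resampling to the common grid.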

Samplers

Many datasets involve large tiles or maps of data, too large to pass directly to a PyTorch model. Instead, we'll need to load several small chips/patches of the imagery in each batch. PyTorch DataLoaders allow you to specify a Sampler class that provides the indices of those chips.

Most of these will need to return a tuple of (lat, long, width, height, proj, crs, time, etc.) instead of an int index. This tuple is then passed to the dataset's __getitem__.
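A minimal sketch of tuple-based indexing is shown below. Map-style PyTorch datasets may be indexed by non-integer keys, so a DataLoader can hand whatever the sampler yields straight to __getitem__. Query and ToyGeoDataset are hypothetical stand-ins, and the fields in Query are one possible choice rather than a settled design.

```python
from typing import Any, Dict, NamedTuple


class Query(NamedTuple):
    """Hypothetical index type: a box in the dataset's CRS plus a time range."""

    minx: float
    miny: float
    maxx: float
    maxy: float
    mint: float
    maxt: float


class ToyGeoDataset:
    """Stand-in for a GeoDataset; it returns metadata only, no pixels."""

    def __init__(self, bounds: Query) -> None:
        self.bounds = bounds

    def __getitem__(self, query: Query) -> Dict[str, Any]:
        b = self.bounds
        inside = (
            b.minx <= query.minx and query.maxx <= b.maxx
            and b.miny <= query.miny and query.maxy <= b.maxy
            and b.mint <= query.mint and query.maxt <= b.maxt
        )
        if not inside:
            raise IndexError(f"{query} is outside the dataset bounds {b}")
        # A real implementation would read and warp raster data here.
        # The query is echoed back so results can be georeferenced later.
        return {"image": None, "bounds": query}


dataset = ToyGeoDataset(Query(0, 0, 1000, 1000, 0, 10))
queries = [Query(0, 0, 256, 256, 0, 10), Query(256, 0, 512, 256, 0, 10)]
samples = [dataset[q] for q in queries]  # mirrors one dataset[index] call per sampled index
```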

Idea: use rasterio.vrt.WarpedVRT to combine multiple tiles in each dataset. Then use rasterio.windows.Window to return a single smaller patch.

Problem: Window needs col_offset, row_offset, width, height in pixel coordinates, but we don't know those ahead of time. Could we? The dataset could have total width and height params and be indexed in pixels instead of lat/long. But then what about reprojections? If combining multiple datasets in different CRS/proj, we need a common way to index them.
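One possible answer is to index in geographic coordinates and convert to a pixel window at read time, once the raster's origin and resolution are known. The sketch below does this for a north-up raster with square pixels that is already in the common CRS (for example after wrapping it in a WarpedVRT). window_from_bounds is a hypothetical pure-Python helper; rasterio ships a comparable function, rasterio.windows.from_bounds, which is not used here to keep the example dependency-free.

```python
from typing import Tuple


def window_from_bounds(
    minx: float,
    miny: float,
    maxx: float,
    maxy: float,
    origin_x: float,
    origin_y: float,
    res: float,
) -> Tuple[int, int, int, int]:
    """Return (col_off, row_off, width, height) in pixels.

    (origin_x, origin_y) is the top-left corner of the raster and `res`
    is the pixel size in CRS units.
    """
    col_off = round((minx - origin_x) / res)
    row_off = round((origin_y - maxy) / res)  # rows count downward from the top edge
    width = round((maxx - minx) / res)
    height = round((maxy - miny) / res)
    return col_off, row_off, width, height


# Raster with top-left corner at (500000, 4100000) and 30 m pixels.
window = window_from_bounds(
    minx=503000.0, miny=4089320.0, maxx=510680.0, maxy=4097000.0,
    origin_x=500000.0, origin_y=4100000.0, res=30.0,
)
```

With this split, samplers only ever deal in geographic coordinates, and each dataset privately maps them to its own pixel grid.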

__getitem__ must return geospatial information, otherwise we can't save results to a file or stitch together results.

PyTorch data loaders have two parameters: sampler and batch_sampler. Most of the time, we only need to use a custom Sampler, and the default BatchSampler will work. We may need to use a custom BatchSampler when the batch axis is replaced by a time axis, for example. Another example is a batch sampler that returns seasonal positive/negative pairs for Seasonal Contrast (https://arxiv.org/abs/2103.16607).
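To make the time-axis case concrete, here is a sketch of a batch sampler in which every batch holds one location at several timestamps. A DataLoader's batch_sampler only needs to be an iterable that yields a list of indices per batch, so the class below is written without torch; TimeSeriesBatchSampler and its index layout are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

Box = Tuple[float, float, float, float]               # (minx, miny, maxx, maxy)
TimedBox = Tuple[float, float, float, float, float]   # box plus a timestamp


class TimeSeriesBatchSampler:
    """Yields one batch per location; the batch axis doubles as the time axis."""

    def __init__(self, locations: Sequence[Box], timestamps: Sequence[float]) -> None:
        self.locations = list(locations)
        self.timestamps = list(timestamps)

    def __iter__(self) -> Iterator[List[TimedBox]]:
        for minx, miny, maxx, maxy in self.locations:
            # Same footprint, one index per acquisition time.
            yield [(minx, miny, maxx, maxy, t) for t in self.timestamps]

    def __len__(self) -> int:
        return len(self.locations)


batch_sampler = TimeSeriesBatchSampler(
    locations=[(0.0, 0.0, 256.0, 256.0), (256.0, 0.0, 512.0, 256.0)],
    timestamps=[2019.0, 2020.0, 2021.0],
)
batches = list(batch_sampler)
```

A sampler for seasonal positive/negative pairs could follow the same pattern, yielding pairs of indices that share a footprint but differ in time.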

GeoSampler

Base class for the following samplers. Uses a tuple instead of an int as the index passed to __getitem__ of GeoDataset. Not intended for VisionDataset.

RandomGeoSampler

Randomly sample chips from the region of interest. Useful for training.
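A random sampler along these lines might look like the sketch below. It is written without torch so that it runs on its own; in practice it would presumably subclass the GeoSampler base class. The name RandomChipSampler, the constructor arguments, and the choice to express size in CRS units are all assumptions.

```python
import random
from typing import Iterator, Optional, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class RandomChipSampler:
    """Yields `length` randomly placed square chips that lie fully inside `roi`."""

    def __init__(self, roi: Box, size: float, length: int, seed: Optional[int] = None) -> None:
        minx, miny, maxx, maxy = roi
        if size > maxx - minx or size > maxy - miny:
            raise ValueError("chip size is larger than the region of interest")
        self.roi = roi
        self.size = size      # chip edge length in CRS units
        self.length = length  # number of chips per epoch
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[Box]:
        minx, miny, maxx, maxy = self.roi
        for _ in range(self.length):
            # Draw the lower-left corner so the whole chip stays inside the ROI.
            x = self.rng.uniform(minx, maxx - self.size)
            y = self.rng.uniform(miny, maxy - self.size)
            yield (x, y, x + self.size, y + self.size)

    def __len__(self) -> int:
        return self.length


sampler = RandomChipSampler(roi=(0.0, 0.0, 1000.0, 1000.0), size=256.0, length=5, seed=0)
chips = list(sampler)
```

Because random chips have no natural count, the epoch length has to be supplied by the caller.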

GridGeoSampler (SequentialGeoSampler? CheckerboardGeoSampler?)

Takes arguments like stride and chip size, and returns possibly overlapping chips. If stride >= chip size, the chips do not overlap; if stride > chip size, there are gaps between them. Useful for prediction.
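A grid sampler could be sketched as follows, again without torch and with hypothetical names. Size and stride are given in CRS units here, and chips that would extend past the region of interest are dropped; padding the last row and column instead is an equally plausible design.

```python
from typing import Iterator, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class GridChipSampler:
    """Yields chips on a regular grid, row by row, starting at the lower-left corner."""

    def __init__(self, roi: Box, size: float, stride: float) -> None:
        minx, miny, maxx, maxy = roi
        if size > maxx - minx or size > maxy - miny:
            raise ValueError("chip size is larger than the region of interest")
        self.roi = roi
        self.size = size
        self.stride = stride
        # Number of chip positions that fit completely inside the ROI.
        self.cols = int((maxx - minx - size) // stride) + 1
        self.rows = int((maxy - miny - size) // stride) + 1

    def __iter__(self) -> Iterator[Box]:
        minx, miny, _, _ = self.roi
        for r in range(self.rows):
            for c in range(self.cols):
                x = minx + c * self.stride
                y = miny + r * self.stride
                yield (x, y, x + self.size, y + self.size)

    def __len__(self) -> int:
        return self.rows * self.cols


# stride < size, so neighbouring chips overlap by 10 units.
grid = GridChipSampler(roi=(0.0, 0.0, 100.0, 100.0), size=40.0, stride=30.0)
grid_chips = list(grid)
```

Overlapping chips are handy at prediction time because the overlapping margins can be averaged or cropped when stitching outputs back together.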

PreChippedGeoSampler

What if chips are already defined in the dataset? In that case, we will want to use those and index normally.

Transforms

Torchvision uses PIL, which isn't compatible with multi-spectral imagery. Although some of our imagery isn't multi-spectral, we don't want to have to implement the same transforms for every possible data structure. Instead, we should probably standardize on torch Tensors. This also has the benefit that transforms can be run on the GPU. Does this mean we need to use nn.Module? See https://discuss.pytorch.org/t/state-of-the-art-for-torchvision-datasets-transforms-models-design/123625 for discussion on this.

Models

Radiant Earth and Planetary Computer are planning to distribute pre-trained models; we should support these.