Clarify RasterDataset documentation for is_image and dtype (#1811)

* Change DEMs from mask to image (is_image=True)

* fix to revert to upstream file

* fix unused type: ignore comment

* Update torchgeo/datasets/geo.py

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* Update documentation to explain is_image and dtype. Update asterdem to override dtype.

* fix linting errors

* Made comment for is_image more succinct.

* change asterdem dtype back to float32 (same as RasterDataset)

* removed integer images from documentation

* change Digital Elevation Model to DEM

* Clarify is_image and dtype.
Revert DEMs to masks

* Finish reverting DEMs to masks

* address review comments

* Changed Aster Global DEM and EU-DEM Dataset types to "DEM"

* Reorganize some information

* Use better formatting

---------

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
David Meaux 2024-03-02 21:33:51 +01:00 committed by GitHub
Parent 6d2e9a483b
Commit 1eaade2747
No key matching this signature was found
GPG key ID: B5690EEEBB952194
3 changed files: 46 additions and 17 deletions


@@ -2,7 +2,7 @@ Dataset,Type,Source,License,Size (px),Resolution (m)
 `Aboveground Woody Biomass`_,Masks,"Landsat, LiDAR","CC-BY-4.0","40,000x40,000",30
 `AgriFieldNet`_,"Imagery, Masks",Sentinel-2,"CC-BY-4.0","256x256",10
 `Airphen`_,Imagery,Airphen,-,"1,280x960",0.047--0.09
-`Aster Global DEM`_,Masks,Aster,"public domain","3,601x3,601",30
+`Aster Global DEM`_,DEM,Aster,"public domain","3,601x3,601",30
 `Canadian Building Footprints`_,Geometries,Bing Imagery,"ODbL-1.0",-,-
 `Chesapeake Land Cover`_,"Imagery, Masks",NAIP,"CC-BY-4.0",-,1
 `Global Mangrove Distribution`_,Masks,"Remote Sensing, In Situ Measurements","public domain",-,3
@@ -10,7 +10,7 @@ Dataset,Type,Source,License,Size (px),Resolution (m)
 `EDDMapS`_,Points,Citizen Scientists,-,-,-
 `EnviroAtlas`_,"Imagery, Masks","NAIP, NLCD, OpenStreetMap","CC-BY-4.0",-,1
 `Esri2020`_,Masks,Sentinel-2,"CC-BY-4.0",-,10
-`EU-DEM`_,Masks,"Aster, SRTM, Russian Topomaps","CSCDA-ESA",-,25
+`EU-DEM`_,DEM,"Aster, SRTM, Russian Topomaps","CSCDA-ESA",-,25
 `EuroCrops`_,Geometries,EU Countries,"CC-BY-SA-4.0",-,-
 `GBIF`_,Points,Citizen Scientists,"CC0-1.0 OR CC-BY-4.0 OR CC-BY-NC-4.0",-,-
 `GlobBiomass`_,Masks,Landsat,"CC-BY-4.0","45,000x45,000",100



@@ -329,7 +329,11 @@
 "\n",
 "### `is_image`\n",
 "\n",
-"If your data only contains image files, as is the case with Sentinel-2, use `is_image = True`. If your data only contains segmentation masks, use `is_image = False` instead.\n",
+"If your data only contains model inputs (such as images), use `is_image = True`. If your data only contains ground truth model outputs (such as segmentation masks), use `is_image = False` instead.\n",
+"\n",
+"### `dtype`\n",
+"\n",
+"Defaults to float32 for `is_image == True` and long for `is_image == False`. This is what you want for 99% of datasets, but can be overridden for tasks like pixel-wise regression (where the target mask should be float32).\n",
 "\n",
 "### `separate_files`\n",
 "\n",
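
The default described in this hunk, and the override it mentions for pixel-wise regression, can be sketched without TorchGeo installed. This is a hypothetical stand-in rather than the library's code: the `Toy*` class names, the `sample_key` property, and the use of plain strings in place of `torch.dtype` objects are all assumptions made for illustration.

```python
class ToyRasterDataset:
    """Hypothetical stand-in for RasterDataset (not TorchGeo code)."""

    # True for model inputs (images), False for ground truth outputs (masks).
    is_image = True

    @property
    def dtype(self) -> str:
        # Default rule from the documentation: float32 for images, long for masks.
        # Strings are used here only so the sketch runs without torch.
        return "float32" if self.is_image else "long"

    @property
    def sample_key(self) -> str:
        # Hypothetical helper: samples use "image" if is_image, else "mask".
        return "image" if self.is_image else "mask"


class ToyLandCoverMask(ToyRasterDataset):
    """Segmentation mask: keeps the default long dtype."""

    is_image = False


class ToyElevationTarget(ToyRasterDataset):
    """Pixel-wise regression target: a mask whose dtype is overridden."""

    is_image = False

    @property
    def dtype(self) -> str:
        return "float32"


for cls in (ToyRasterDataset, ToyLandCoverMask, ToyElevationTarget):
    ds = cls()
    print(f"{cls.__name__}: key={ds.sample_key}, dtype={ds.dtype}")
```

In a real subclass the same pattern would presumably return `torch.float32` from the `dtype` property, since the actual property is annotated as returning `torch.dtype`.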


@@ -55,9 +55,11 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
 based on latitude/longitude. This allows users to do things like:
 * Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
 * Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)
 These combinations require that all queries are present in *both* datasets,
 and can be combined using an :class:`IntersectionDataset`:
@@ -69,9 +71,9 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
 Users may also want to:
 * Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
 * Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)
 These combinations require that all queries are present in *at least one* dataset,
 and can be combined using a :class:`UnionDataset`:
@@ -108,7 +110,7 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
 def __init__(
 self, transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None
 ) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new GeoDataset instance.
 Args:
 transforms: a function/transform that takes an input sample
@@ -344,7 +346,14 @@ class RasterDataset(GeoDataset):
 #: ``start`` and ``stop`` groups.
 date_format = "%Y%m%d"
-#: True if dataset contains imagery, False if dataset contains mask
+#: True if the dataset only contains model inputs (such as images). False if the
+#: dataset only contains ground truth model outputs (such as segmentation masks).
+#:
+#: The sample returned by the dataset/data loader will use the "image" key if
+#: *is_image* is True, otherwise it will use the "mask" key.
+#:
+#: For datasets with both model inputs and outputs, a custom
+#: :func:`~RasterDataset.__getitem__` method must be implemented.
 is_image = True
 #: True if data is stored in a separate file for each band, else False.
@@ -363,6 +372,10 @@ class RasterDataset(GeoDataset):
 def dtype(self) -> torch.dtype:
 """The dtype of the dataset (overrides the dtype of the data file via a cast).
+Defaults to float32 if :attr:`~RasterDataset.is_image` is True, else long.
+Can be overridden for tasks like pixel-wise regression where the mask should be
+float32 instead of long.
 Returns:
 the dtype of the dataset
@@ -382,7 +395,7 @@ class RasterDataset(GeoDataset):
 transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
 cache: bool = True,
 ) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new RasterDataset instance.
 Args:
 paths: one or more root directories to search or files to load
@@ -605,7 +618,7 @@ class VectorDataset(GeoDataset):
 transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
 label_name: Optional[str] = None,
 ) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new VectorDataset instance.
 Args:
 paths: one or more root directories to search or files to load
@@ -873,9 +886,11 @@ class IntersectionDataset(GeoDataset):
 This allows users to do things like:
 * Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
 * Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)
 These combinations require that all queries are present in *both* datasets,
 and can be combined using an :class:`IntersectionDataset`:
@@ -896,7 +911,12 @@ class IntersectionDataset(GeoDataset):
 ] = concat_samples,
 transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
 ) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new IntersectionDataset instance.
+When computing the intersection between two datasets that both contain model
+inputs (such as images) or model outputs (such as masks), the default behavior
+is to stack the data along the channel dimension. The *collate_fn* parameter
+can be used to change this behavior.
 Args:
 dataset1: the first dataset
@@ -1026,9 +1046,9 @@ class UnionDataset(GeoDataset):
 This allows users to do things like:
 * Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
 * Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)
 These combinations require that all queries are present in *at least one* dataset,
 and can be combined using a :class:`UnionDataset`:
@@ -1049,7 +1069,12 @@ class UnionDataset(GeoDataset):
 ] = merge_samples,
 transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
 ) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new UnionDataset instance.
+When computing the union between two datasets that both contain model inputs
+(such as images) or model outputs (such as masks), the default behavior is to
+merge the data to create a single image/mask. The *collate_fn* parameter can be
+used to change this behavior.
 Args:
 dataset1: the first dataset
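
The two default behaviors documented in the `IntersectionDataset` and `UnionDataset` docstrings can be contrasted with a small toy sketch. It uses nested lists laid out as `[channel][row][col]` in place of tensors so that it runs with the standard library alone. The function names `stack_channels` and `merge_elementwise` are hypothetical and are not the library's `concat_samples` or `merge_samples`. The docstring only says that a union merges the data into a single image/mask, so the element-wise maximum used here (letting non-zero values fill zero-valued gaps) is an assumption.

```python
Raster = list[list[list[float]]]  # toy layout: [channel][row][col]


def stack_channels(a: Raster, b: Raster) -> Raster:
    """Intersection-style combination: keep every channel of both rasters."""
    return a + b


def merge_elementwise(a: Raster, b: Raster) -> Raster:
    """Union-style combination: one raster with the same channel count.

    Assumption: overlapping pixels are resolved by taking the maximum.
    """
    return [
        [[max(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


# Two single-channel 2x2 rasters covering the same area (e.g., imagery and a DEM).
imagery: Raster = [[[1.0, 2.0], [3.0, 4.0]]]
elevation: Raster = [[[10.0, 20.0], [30.0, 40.0]]]
stacked = stack_channels(imagery, elevation)
print(len(stacked))  # number of channels after stacking

# Two single-channel tiles with complementary coverage (0.0 stands for no data).
left: Raster = [[[1.0, 0.0], [0.0, 0.0]]]
right: Raster = [[[0.0, 2.0], [0.0, 3.0]]]
merged = merge_elementwise(left, right)
print(merged)
```

Stacking grows the channel dimension, which suits combining different kinds of data for the same location. Merging keeps the channel count fixed, which suits treating two sources as equivalent coverage of different areas.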