* Better README

* Update description
This commit is contained in:
Adam J. Stewart 2022-06-29 22:25:59 -07:00 коммит произвёл GitHub
Родитель a3d6e9fcec
Коммит 627473c609
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
8 изменённых файлов: 84 добавлений и 60 удалений

Просмотреть файл

@ -17,6 +17,6 @@ preferred-citation:
given-names: "Arindam"
journal: "arXiv preprint arXiv:2111.08872"
month: 11
title: "TorchGeo: deep learning with geospatial data"
title: "TorchGeo: Deep Learning With Geospatial Data"
url: "https://github.com/microsoft/torchgeo"
year: 2021

133
README.md
Просмотреть файл

@ -1,25 +1,23 @@
<img src="https://raw.githubusercontent.com/microsoft/torchgeo/main/logo/logo-color.svg" width="400" alt="TorchGeo"/>
<img src="https://raw.githubusercontent.com/microsoft/torchgeo/main/logo/logo-color.svg" width="400" alt="TorchGeo logo"/>
TorchGeo is a [PyTorch](https://pytorch.org/) domain library, similar to [torchvision](https://pytorch.org/vision), that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
TorchGeo is a [PyTorch](https://pytorch.org/) domain library, similar to [torchvision](https://pytorch.org/vision), providing datasets, samplers, transforms, and pre-trained models specific to geospatial data.
The goal of this library is to make it simple:
1. for machine learning experts to use geospatial data in their workflows, and
2. for remote sensing experts to use their data in machine learning workflows.
1. for machine learning experts to work with geospatial data, and
2. for remote sensing experts to explore machine learning solutions.
See our [installation instructions](#installation), [documentation](#documentation), and [examples](#example-usage) to learn how to use TorchGeo.
External links:
[![docs](https://readthedocs.org/projects/torchgeo/badge/?version=latest)](https://torchgeo.readthedocs.io/en/latest/?badge=latest)
Testing:
[![docs](https://readthedocs.org/projects/torchgeo/badge/?version=latest)](https://torchgeo.readthedocs.io/en/stable/)
[![style](https://github.com/microsoft/torchgeo/actions/workflows/style.yaml/badge.svg)](https://github.com/microsoft/torchgeo/actions/workflows/style.yaml)
[![tests](https://github.com/microsoft/torchgeo/actions/workflows/tests.yaml/badge.svg)](https://github.com/microsoft/torchgeo/actions/workflows/tests.yaml)
[![codecov](https://codecov.io/gh/microsoft/torchgeo/branch/main/graph/badge.svg?token=oa3Z3PMVOg)](https://codecov.io/gh/microsoft/torchgeo)
Packaging:
[![pypi](https://badge.fury.io/py/torchgeo.svg)](https://pypi.org/project/torchgeo/)
[![conda](https://anaconda.org/conda-forge/torchgeo/badges/version.svg)](https://anaconda.org/conda-forge/torchgeo)
[![spack](https://img.shields.io/spack/v/py-torchgeo)](https://spack.readthedocs.io/en/latest/package_list.html#py-torchgeo)
Tests:
[![style](https://github.com/microsoft/torchgeo/actions/workflows/style.yaml/badge.svg)](https://github.com/microsoft/torchgeo/actions/workflows/style.yaml)
[![tests](https://github.com/microsoft/torchgeo/actions/workflows/tests.yaml/badge.svg)](https://github.com/microsoft/torchgeo/actions/workflows/tests.yaml)
## Installation
The recommended way to install TorchGeo is with [pip](https://pip.pypa.io/):
@ -28,55 +26,34 @@ The recommended way to install TorchGeo is with [pip](https://pip.pypa.io/):
$ pip install torchgeo
```
For [conda](https://docs.conda.io/) and [spack](https://spack.io/) installation instructions, see the [documentation](https://torchgeo.readthedocs.io/en/latest/user/installation.html).
For [conda](https://docs.conda.io/) and [spack](https://spack.io/) installation instructions, see the [documentation](https://torchgeo.readthedocs.io/en/stable/user/installation.html).
## Documentation
You can find the documentation for TorchGeo on [ReadTheDocs](https://torchgeo.readthedocs.io).
You can find the documentation for TorchGeo on [ReadTheDocs](https://torchgeo.readthedocs.io). This includes API documentation, contributing instructions, and several [tutorials](https://torchgeo.readthedocs.io/en/stable/tutorials/getting_started.html). For more details, check out our [paper](https://arxiv.org/abs/2111.08872) and [blog](https://pytorch.org/blog/geospatial-deep-learning-with-torchgeo/).
## Example Usage
The following sections give basic examples of what you can do with TorchGeo. For more examples, check out our [tutorials](https://torchgeo.readthedocs.io/en/latest/tutorials/getting_started.html).
The following sections give basic examples of what you can do with TorchGeo.
First we'll import various classes and functions used in the following sections:
```python
from pytorch_lightning import Trainer
from torch.utils.data import DataLoader
from torchgeo.datasets import CDL, COWCDetection, Landsat7, Landsat8, stack_samples
from torchgeo.datamodules import InriaAerialImageLabelingDataModule
from torchgeo.datasets import CDL, Landsat7, Landsat8, VHR10, stack_samples
from torchgeo.samplers import RandomGeoSampler
from torchgeo.trainers import SemanticSegmentationTask
```
### Benchmark datasets
### Geospatial datasets and samplers
TorchGeo includes a number of [*benchmark*](https://torchgeo.readthedocs.io/en/latest/api/datasets.html#non-geospatial-datasets) datasets, datasets that include both input images and target labels. This includes datasets for tasks like image classification, regression, semantic segmentation, object detection, instance segmentation, change detection, and more.
Many remote sensing applications involve working with [*geospatial datasets*](https://torchgeo.readthedocs.io/en/stable/api/datasets.html#geospatial-datasets)—datasets with geographic metadata. These datasets can be challenging to work with due to the sheer variety of data. Geospatial imagery is often multispectral with a different number of spectral bands and spatial resolution for every satellite. In addition, each file may be in a different coordinate reference system (CRS), requiring the data to be reprojected into a matching CRS.
If you've used [torchvision](https://pytorch.org/vision) before, these datasets should seem very familiar. In this example, we'll create a dataset for the Cars Overhead With Context (COWC) car detection dataset. This dataset can be automatically downloaded, checksummed, and extracted, just like with torchvision.
<img src="https://raw.githubusercontent.com/microsoft/torchgeo/main/images/geodataset.png" alt="Example application in which we combine Landsat and CDL and sample from both"/>
```python
dataset = COWCDetection(root="...", split="train", download=True, checksum=True)
```
This dataset can then be passed to a PyTorch data loader.
```python
dataloader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)
```
The only difference between a benchmark dataset in TorchGeo and a similar dataset in torchvision is that each dataset returns a dictionary with keys for each PyTorch Tensor.
```python
for batch in dataloader:
image = batch["image"]
label = batch["label"]
# train a model, or make predictions using a pre-trained model
```
### Geospatial datasets
Many remote sensing applications involve working with [*generic*](https://torchgeo.readthedocs.io/en/latest/api/datasets.html#geospatial-datasets) geospatial data. This data can be challenging to work with due to the sheer variety of data. Geospatial imagery is often multispectral with a different number of spectral bands and spatial resolution for every satellite. In addition, each file may be in a different coordinate reference system (CRS), requiring the data to be reprojected into a matching CRS.
In this example, we show how easy it is to work with geospatial data and to sample small image patches from a combination of Landsat and Cropland Data Layer (CDL) data using TorchGeo. First, we assume that the user has Landsat 7 and 8 imagery downloaded. Since Landsat 8 has more spectral bands than Landsat 7, we'll only use the bands that both satellites have in common. We'll create a single dataset including all images from both Landsat 7 and 8 data by taking the union between these two datasets.
In this example, we show how easy it is to work with geospatial data and to sample small image patches from a combination of [Landsat](https://www.usgs.gov/landsat-missions) and [Cropland Data Layer (CDL)](https://data.nal.usda.gov/dataset/cropscape-cropland-data-layer) data using TorchGeo. First, we assume that the user has Landsat 7 and 8 imagery downloaded. Since Landsat 8 has more spectral bands than Landsat 7, we'll only use the bands that both satellites have in common. We'll create a single dataset including all images from both Landsat 7 and 8 data by taking the union between these two datasets.
```python
landsat7 = Landsat7(root="...")
@ -84,14 +61,14 @@ landsat8 = Landsat8(root="...", bands=Landsat8.all_bands[1:-2])
landsat = landsat7 | landsat8
```
Next, we take the intersection between this dataset and the Cropland Data Layer (CDL) dataset. We want to take the intersection instead of the union to ensure that we only sample from regions that have both Landsat and CDL data. Note that we can automatically download and checksum CDL data. Also note that each of these datasets may contain files in different coordinate reference systems (CRS) or resolutions, but TorchGeo automatically ensures that a matching CRS and resolution is used.
Next, we take the intersection between this dataset and the CDL dataset. We want to take the intersection instead of the union to ensure that we only sample from regions that have both Landsat and CDL data. Note that we can automatically download and checksum CDL data. Also note that each of these datasets may contain files in different coordinate reference systems (CRS) or resolutions, but TorchGeo automatically ensures that a matching CRS and resolution is used.
```python
cdl = CDL(root="...", download=True, checksum=True)
dataset = landsat & cdl
```
This dataset can now be used with a PyTorch data loader. Unlike benchmark datasets, geospatial datasets often include very large images. For example, the CDL dataset consists of a single image covering the entire continental United States. In order to sample from these datasets using geospatial coordinates, TorchGeo defines a number of [*samplers*](https://torchgeo.readthedocs.io/en/latest/api/samplers.html). In this example, we'll use a random sampler that returns 256x256 pixel images and an epoch length of 10,000 images. We also use a custom collation function to combine each sample dictionary into a mini-batch of samples.
This dataset can now be used with a PyTorch data loader. Unlike benchmark datasets, geospatial datasets often include very large images. For example, the CDL dataset consists of a single image covering the entire continental United States. In order to sample from these datasets using geospatial coordinates, TorchGeo defines a number of [*samplers*](https://torchgeo.readthedocs.io/en/stable/api/samplers.html). In this example, we'll use a random sampler that returns 256 x 256 pixel images and 10,000 samples per epoch. We also use a custom collation function to combine each sample dictionary into a mini-batch of samples.
```python
sampler = RandomGeoSampler(dataset, size=256, length=10000)
@ -108,10 +85,58 @@ for batch in dataloader:
# train a model, or make predictions using a pre-trained model
```
### Train and test models using our PyTorch Lightning-based training script
Many applications involve intelligently composing datasets based on geospatial metadata like this. For example, users may want to:
We provide a script, `train.py` for training models using a subset of the datasets. We do this with the PyTorch Lightning `LightningModule`s and `LightningDataModule`s implemented under the `torchgeo.trainers` namespace.
The `train.py` script is configurable via the command line and/or via YAML configuration files. See the [conf/](conf/) directory for example configuration files that can be customized for different training runs.
* Combine datasets for multiple image sources and treat them as equivalent (e.g., Landsat 7 and 8)
* Combine datasets for disparate geospatial locations (e.g., Chesapeake NY and PA)
These combinations require that all queries are present in at least one dataset, and can be created using a `UnionDataset`. Similarly, users may want to:
* Combine image and target labels and sample from both simultaneously (e.g., Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion (e.g., Landsat and Sentinel)
These combinations require that all queries are present in both datasets, and can be created using an `IntersectionDataset`. TorchGeo automatically composes these datasets for you when you use the intersection (`&`) and union (`|`) operators.
### Benchmark datasets
TorchGeo includes a number of [*benchmark datasets*](https://torchgeo.readthedocs.io/en/stable/api/datasets.html#non-geospatial-datasets)—datasets that include both input images and target labels. This includes datasets for tasks like image classification, regression, semantic segmentation, object detection, instance segmentation, change detection, and more.
If you've used [torchvision](https://pytorch.org/vision) before, these datasets should seem very familiar. In this example, we'll create a dataset for the Northwestern Polytechnical University (NWPU) very-high-resolution ten-class ([VHR-10](https://github.com/chaozhong2010/VHR-10_dataset_coco)) geospatial object detection dataset. This dataset can be automatically downloaded, checksummed, and extracted, just like with torchvision.
```python
dataset = VHR10(root="...", download=True, checksum=True)
dataloader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)
for batch in dataloader:
image = batch["image"]
label = batch["label"]
# train a model, or make predictions using a pre-trained model
```
<img src="https://raw.githubusercontent.com/microsoft/torchgeo/main/images/vhr10.png" alt="Example predictions from a Mask R-CNN model trained on the NWPU VHR-10 dataset"/>
All TorchGeo datasets are compatible with PyTorch data loaders, making them easy to integrate into existing training workflows. The only difference between a benchmark dataset in TorchGeo and a similar dataset in torchvision is that each dataset returns a dictionary with keys for each PyTorch `Tensor`.
### Reproducibility with PyTorch Lightning
In order to facilitate direct comparisons between results published in the literature and further reduce the boilerplate code needed to run experiments with datasets in TorchGeo, we have created PyTorch Lightning [*datamodules*](https://torchgeo.readthedocs.io/en/stable/api/datamodules.html) with well-defined train-val-test splits and [*trainers*](https://torchgeo.readthedocs.io/en/stable/api/trainers.html) for various tasks like classification, regression, and semantic segmentation. These datamodules show how to incorporate augmentations from the kornia library, include preprocessing transforms (with pre-calculated channel statistics), and let users easily experiment with hyperparameters related to the data itself (as opposed to the modeling process). Training a semantic segmentation model on the [Inria Aerial Image Labeling](https://project.inria.fr/aerialimagelabeling/) dataset is as easy as a few imports and four lines of code.
```python
from pytorch_lightning import Trainer
from torchgeo.datamodules import InriaAerialImageLabelingDataModule
from torchgeo.trainers import SemanticSegmentationTask
datamodule = InriaAerialImageLabelingDataModule(root_dir="...", batch_size=64, num_workers=6)
task = SemanticSegmentationTask(segmentation_model="unet", encoder_weights="imagenet", learning_rate=0.1)
trainer = Trainer(gpus=1, default_root_dir="...")
trainer.fit(model=task, datamodule=datamodule)
```
<img src="https://raw.githubusercontent.com/microsoft/torchgeo/main/images/inria.png" alt="Building segmentations produced by a U-Net model trained on the Inria Aerial Image Labeling dataset"/>
In our GitHub repo, we provide `train.py` and `evaluate.py` scripts to train and evaluate the performance of models using these datamodules and trainers. These scripts are configurable via the command line and/or via YAML configuration files. See the [conf](https://github.com/microsoft/torchgeo/blob/main/conf) directory for example configuration files that can be customized for different training runs.
```console
$ python train.py config_file=conf/landcoverai.yaml
@ -119,13 +144,13 @@ $ python train.py config_file=conf/landcoverai.yaml
## Citation
If you use this software in your work, please cite [our paper](https://arxiv.org/abs/2111.08872):
If you use this software in your work, please cite our [paper](https://arxiv.org/abs/2111.08872):
```
@article{Stewart_TorchGeo_deep_learning_2021,
author = {Stewart, Adam J. and Robinson, Caleb and Corley, Isaac A. and Ortiz, Anthony and Lavista Ferres, Juan M. and Banerjee, Arindam},
journal = {arXiv preprint arXiv:2111.08872},
month = {11},
title = {{TorchGeo: deep learning with geospatial data}},
title = {{TorchGeo}: Deep Learning With Geospatial Data},
url = {https://github.com/microsoft/torchgeo},
year = {2021}
}
@ -133,8 +158,6 @@ If you use this software in your work, please cite [our paper](https://arxiv.org
## Contributing
This project welcomes contributions and suggestions. If you would like to submit a pull request, see our [Contribution Guide](https://torchgeo.readthedocs.io/en/latest/user/contributing.html) for more information.
This project welcomes contributions and suggestions. If you would like to submit a pull request, see our [Contribution Guide](https://torchgeo.readthedocs.io/en/stable/user/contributing.html) for more information.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

Двоичные данные
images/geodataset.png Normal file

Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 701 KiB

Двоичные данные
images/inria.png Normal file

Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 1.5 MiB

Двоичные данные
images/vhr10.png Normal file

Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 718 KiB

Просмотреть файл

@ -27,7 +27,7 @@ color_output = true
[tool.mypy]
ignore_missing_imports = true
show_error_codes = true
exclude = "(build|data|dist|docs/src|logo|logs|output)/"
exclude = "(build|data|dist|docs/src|images|logo|logs|output)/"
# Strict
warn_unused_configs = true

Просмотреть файл

@ -4,7 +4,7 @@ name = torchgeo
version = attr: torchgeo.__version__
author = Adam J. Stewart
author_email = ajstewart426@gmail.com
description = TorchGeo: datasets, transforms, and models for geospatial data
description = TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/microsoft/torchgeo
@ -21,7 +21,7 @@ classifiers =
Operating System :: OS Independent
Topic :: Scientific/Engineering :: Artificial Intelligence
Topic :: Scientific/Engineering :: GIS
keywords = pytorch, deep learning, machine learning, remote sensing, satellite imagery, geospatial
keywords = pytorch, deep learning, machine learning, remote sensing, satellite imagery, earth observation, geospatial
[options]
install_requires =
@ -134,6 +134,7 @@ extend-ignore =
exclude =
# TorchGeo
data/,
images/,
logo/,
logs/,
output/,

Просмотреть файл

@ -1,7 +1,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
"""TorchGeo: datasets, transforms, and models for geospatial data.
"""TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data.
This library is part of the `PyTorch <http://pytorch.org/>`_ project. PyTorch is an open
source machine learning framework.