torchgeo/tests/data
Isaac Corley 4cc0cbb327
Add ChaBuD Dataset (#1259)
* add chabud dataset and datamodule

* add clarifying comment for min/max

* fix wrong channel plotting

* cast image as float

* sort uuids

* update docs

* update test data

* fix order of operations

* update chabud config

* update per suggestions

* fix mypy error

* update to new test config format

* update version added

* use DatasetNotFoundError

* fix tests

* updates per suggestions

* Matching bands/mean/std

---------

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
2023-11-25 15:19:20 -06:00
advance Add ADVANCE dataset (#133) 2021-09-19 23:00:56 +00:00
agb_live_woody_density Fix Download Link AGB Live Woody Biomass dataset (#1713) 2023-11-05 14:59:07 -06:00
astergdem Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
bigearthnet BigEarthNet Splits (#221) 2021-11-05 16:58:25 +00:00
biomassters Add BioMassters Dataset (#1560) 2023-09-29 09:48:16 -05:00
cbf Make scripts executable 2022-08-06 22:00:59 +00:00
cdl CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
chabud Add ChaBuD Dataset (#1259) 2023-11-25 15:19:20 -06:00
chesapeake extract_archive: support deflate64-compressed zip files (#282) 2022-01-14 23:14:51 -06:00
cms_mangrove_canopy Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
cowc_counting Run linters on tests/data (#356) 2022-01-13 19:16:10 +00:00
cowc_detection Run linters on tests/data (#356) 2022-01-13 19:16:10 +00:00
cyclone Pytorch lightning based training framework (#42) 2021-07-17 16:57:18 -07:00
deepglobelandcover Make scripts executable 2022-08-06 22:00:59 +00:00
dfc2022 Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
eddmaps Add EDDMapS dataset (#533) 2022-05-14 21:29:47 -05:00
enviroatlas Drop Python 3.8 support (#1246) 2023-04-15 20:27:51 -05:00
esri2020 Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
etci2021 Fix float rounding issues (#736) 2022-09-02 12:08:31 -07:00
eudem Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
eurosat Adding splits to RESISC45 and EuroSat (#218) 2021-11-02 22:26:39 -05:00
fair1m Update FAIR1M dataset and datamodule (#1275) 2023-04-26 07:00:54 -05:00
fire_risk CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
forestdamage Make scripts executable 2022-08-06 22:00:59 +00:00
gbif Add GBIF dataset (#507) 2022-05-06 11:16:08 -05:00
gid15 Add datamodule for GID-15 dataset (#928) 2022-12-30 11:31:00 -06:00
globbiomass Bump black[jupyter] from 22.12.0 to 23.1.0 in /requirements (#1080) 2023-02-01 22:26:54 +00:00
idtrees Add IDTReeS dataset (#201) 2021-12-05 22:38:50 +00:00
inaturalist Add iNaturalist dataset (#532) 2022-05-14 21:29:34 -05:00
inria Added functionality for validation split (#1540) 2023-09-29 11:02:19 -05:00
l7irish L7 Irish: update for new dataset format (#1355) 2023-05-24 21:23:10 -05:00
l8biome L8 Biome: update for new dataset format (#1356) 2023-05-24 21:24:01 -05:00
landcoverai LandCover.ai: add data.py script (#643) 2022-07-02 12:25:29 -05:00
landsat8 Various fixes to GeoDataset 2021-08-10 10:06:00 -05:00
levircd Add plot method to Levir, and change directory path (#335) 2021-12-30 12:05:42 -06:00
loveda Add LoveDA dataset (#270) 2021-12-09 14:47:11 -06:00
mapinwild Add MapInWild dataset (#1131) 2023-09-29 10:52:08 +00:00
millionaid Make scripts executable 2022-08-06 22:00:59 +00:00
naip Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
nasa_marine_debris NASA Marine Debris dataset (#269) 2021-12-10 17:57:38 -06:00
nccm Adding Northeastern China Crop Map Dataset (#1666) 2023-11-17 21:25:07 +00:00
nlcd CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
nongeoclassification Rename VisionDataset to NonGeoDataset (#627) 2022-07-09 18:28:24 -07:00
openbuildings Make scripts executable 2022-08-06 22:00:59 +00:00
oscd Run linters on tests/data (#356) 2022-01-13 19:16:10 +00:00
pastis PASTIS dataset (#315) 2023-08-03 11:46:49 -07:00
patternnet Add plot method to PatternNet dataset (#314) 2021-12-31 11:00:15 -06:00
potsdam Add Potsdam Segmentation (#247) 2021-11-16 09:13:41 -08:00
raster Fix reprojection issues (#1344) 2023-05-18 21:33:47 -07:00
ref_african_crops_kenya_02 Add unit tests for CV4AKenyaCropType dataset 2021-06-24 17:20:23 +00:00
ref_cloud_cover_detection_challenge_v1 CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
reforestree Drop Python 3.8 support (#1246) 2023-04-15 20:27:51 -05:00
resisc45 Adding RESISC45 trainer with augmentations (#225) 2021-11-07 05:17:57 +00:00
rwanda_field_boundary Adding the RwandaFieldBoundary dataset (#1574) 2023-09-29 10:34:51 +00:00
seasonet adding seasonet dataset + tests + doc (#1466) 2023-09-24 14:17:50 +00:00
seco SeCo newer version bug fix (#1235) 2023-04-14 14:39:00 -05:00
sen12ms Run linters on tests/data (#356) 2022-01-13 19:16:10 +00:00
sentinel1 Drop Python 3.8 support (#1246) 2023-04-15 20:27:51 -05:00
sentinel2 Drop Python 3.8 support (#1246) 2023-04-15 20:27:51 -05:00
skippd Move SKIPPD to HF and add forecast task (#1548) 2023-09-24 12:56:54 -05:00
so2sat Add multiple versions of the So2Sat dataset (#1283) 2023-04-25 22:03:29 +00:00
spacenet CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
ssl4eo SSL4EO-L: add download support (#1424) 2023-06-20 11:54:10 -05:00
ssl4eo_benchmark_landsat CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
sustainbench_crop_yield CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
ts_cashew_benin Add unit tests 2021-06-24 10:48:41 -05:00
ucmerced Add train/val/test splits to UCMerced (#216) 2021-11-01 10:09:36 -05:00
usavars Sentinel-2: add support for files downloaded from USGS EarthExplorer (#754) 2022-09-05 14:18:57 -05:00
vaihingen Add Vaihingen Segmentation (#248) 2021-11-16 02:02:51 -06:00
vector Allow multilabels in VectorDataset (#862) 2022-10-26 11:42:13 -05:00
vhr10 CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
western_usa_live_fuel_moisture CDL/NLCD/SSL4EO: allow selection of classes (#1392) 2023-06-04 08:21:05 -07:00
xview2 Add xView2 Dataset (#236) 2021-11-15 08:45:57 -06:00
xview3 Add custom RasterDataset notebook (#283) 2021-12-21 15:29:15 -08:00
zuericrop Make scripts executable 2022-08-06 22:00:59 +00:00
README.md Better example for writing fake test data (#1315) 2023-05-07 22:37:53 -05:00

README.md

This directory contains fake data used to test torchgeo. Depending on the type of dataset, fake data can be created in multiple ways:

GeoDataset

GeoDataset data can be created like so. We first open an existing data example and use it to copy the driver/CRS/transform to the fake data.

Raster data

import os

import numpy as np
import rasterio as rio

ROOT = "data/landsat8"
FILENAME = "LC08_L2SP_023032_20210622_20210629_02_T1_SR_B1.TIF"
SIZE = 64

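# Open an existing file so its profile (driver, CRS, transform) can be reused
with rio.open(os.path.join(ROOT, FILENAME), "r") as src:
    dtype = src.profile["dtype"]
    # Random pixel values; note that randint's upper bound is exclusive
    Z = np.random.randint(np.iinfo(dtype).max, size=(SIZE, SIZE), dtype=dtype)
    with rio.open(FILENAME, "w", **src.profile) as dst:
        # Band indexes live on the dataset object, not on the profile dict
        for i in dst.indexes:
            dst.write(Z, i)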

Optionally, if the dataset has a colormap, this can be copied like so:

cmap = src.colormap(1)
dst.write_colormap(1, cmap)

Vector data

import os
from collections import OrderedDict

import fiona

ROOT = "data/cbf"
FILENAME = "Ontario.geojson"

# A single unit-square polygon with no attribute fields
rec = {
    "type": "Feature",
    "id": "0",
    "properties": OrderedDict(),
    "geometry": {
        "type": "Polygon",
        "coordinates": [[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]],
    },
}
with fiona.open(os.path.join(ROOT, FILENAME), "r") as src:
    # Reuse the driver/CRS/schema of the original, but drop all attribute fields
    meta = src.meta
    meta["schema"]["properties"] = OrderedDict()
    with fiona.open(FILENAME, "w", **meta) as dst:
        dst.write(rec)
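
If the fake file is GeoJSON and the driver, CRS, and schema of an existing file do not need to be copied, a similar single-polygon file could be written with the standard library alone. This is a minimal sketch, not what the repository does: the output name `fake.geojson` and the temporary directory are hypothetical, and the result is a bare FeatureCollection whose coordinates are interpreted as WGS 84 longitude/latitude, as the GeoJSON specification (RFC 7946) prescribes.

```python
import json
import os
import tempfile

# Hypothetical output location; a temporary directory keeps the sketch side-effect free
out_path = os.path.join(tempfile.mkdtemp(), "fake.geojson")

# One unit-square polygon with no attributes; the exterior ring is
# counterclockwise and closed (first and last positions are equal)
feature = {
    "type": "Feature",
    "id": "0",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

with open(out_path, "w") as f:
    json.dump(collection, f)
```

The fiona approach above remains preferable when the test depends on the original file's CRS or driver, since this sketch copies neither.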

NonGeoDataset

NonGeoDataset data can be created like so.

RGB images

import numpy as np
from PIL import Image

DTYPE = np.uint8
SIZE = 64

arr = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE, 3), dtype=DTYPE)
img = Image.fromarray(arr)
img.save("01.png")
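
The snippet above draws a different array on every run, and because the upper bound of `np.random.randint` is exclusive, the maximum value of the dtype (255 for `uint8`) never appears. If reproducible fake data or full value coverage matters for a test, a seeded NumPy `Generator` is one option. The sketch below assumes a fixed seed of 0, which is an arbitrary choice and not something the repository prescribes.

```python
import numpy as np

DTYPE = np.uint8
SIZE = 64
SEED = 0  # hypothetical seed; any fixed integer gives repeatable output

# A seeded Generator yields the same array on every run
rng = np.random.default_rng(SEED)

# endpoint=True makes the upper bound inclusive, so 255 can occur for uint8
arr = rng.integers(
    np.iinfo(DTYPE).min,
    np.iinfo(DTYPE).max,
    size=(SIZE, SIZE, 3),
    dtype=DTYPE,
    endpoint=True,
)
```

The resulting `arr` can be passed to `Image.fromarray` exactly as before.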

Grayscale images

import numpy as np
from PIL import Image

DTYPE = np.uint8
SIZE = 64

arr = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE), dtype=DTYPE)
img = Image.fromarray(arr)
img.save("02.jpg")

Audio wav files

import numpy as np
from scipy.io import wavfile

audio = np.random.randn(1).astype(np.float32)
wavfile.write("01.wav", rate=22050, data=audio)
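
Where SciPy is not available, a comparably tiny file could be produced with Python's built-in `wave` module. This is a sketch under stated assumptions: it writes 16-bit integer PCM rather than the 32-bit float samples used above, and the file name `01_pcm.wav` and the temporary output directory are hypothetical.

```python
import os
import random
import struct
import tempfile
import wave

RATE = 22050  # samples per second, matching the SciPy example
NUM_SAMPLES = 1  # a single sample keeps the file as small as possible

# Hypothetical output location inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "01_pcm.wav")

# Random 16-bit signed PCM samples
samples = [random.randint(-32768, 32767) for _ in range(NUM_SAMPLES)]

with wave.open(out_path, "wb") as f:
    f.setnchannels(1)  # mono
    f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    f.setframerate(RATE)
    # Pack the samples as little-endian signed shorts
    f.writeframes(struct.pack(f"<{NUM_SAMPLES}h", *samples))
```

A dataset whose loader expects floating-point audio would still need the SciPy version, since the sample format differs.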

HDF5 datasets

import h5py
import numpy as np

DTYPE = np.uint8
SIZE = 64
NUM_CLASSES = 10

images = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE, 3), dtype=DTYPE)
masks = np.random.randint(NUM_CLASSES, size=(SIZE, SIZE), dtype=DTYPE)
with h5py.File("data.hdf5", "w") as f:
    f.create_dataset("images", data=images)
    f.create_dataset("masks", data=masks)

LAS Point Cloud files

import laspy
import numpy as np

num_points = 4

las = laspy.read("0.las")
las.points = las.points[:num_points]

points = np.random.randint(low=0, high=100, size=(num_points,), dtype=las.x.dtype)
las.x = points
las.y = points
las.z = points

if hasattr(las, "red"):
    colors = np.random.randint(low=0, high=10, size=(num_points,), dtype=las.red.dtype)
    las.red = colors
    las.green = colors
    las.blue = colors

las.write("0.las")