torchgeo/tests/data
Nils Lehmann b9a09f5711
Add Digital typhoon dataset (#1748)
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
2024-08-29 11:29:04 +02:00
advance Add ADVANCE dataset (#133) 2021-09-19 23:00:56 +00:00
agb_live_woody_density Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
agrifieldnet AgriFieldNet: fix dataset length (#2087) 2024-05-25 20:31:50 +02:00
airphen Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
astergdem Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
bigearthnet BigEarthNet Splits (#221) 2021-11-05 16:58:25 +00:00
biomassters Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
cabuar Add CaBuAr dataset (#2235) 2024-08-28 15:57:58 +02:00
cbf Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
cdl Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
chabud Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
chesapeake Chesapeake: update to 2022 edition (#2214) 2024-08-17 10:10:02 +02:00
cms_mangrove_canopy Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
cowc_counting Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
cowc_detection Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
cropharvest Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
cv4a_kenya_crop_type CV4A Kenya Crop Type: radiant mlhub -> source cooperative (#2090) 2024-07-10 17:39:31 +02:00
cyclone Tropical Cyclone: radiant mlhub -> source cooperative (#2068) 2024-07-10 10:35:11 +02:00
deepglobelandcover Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
dfc2022 Ruff: enable ruff-specific rules (#2218) 2024-08-19 15:07:21 +02:00
digital_typhoon Add Digital typhoon dataset (#1748) 2024-08-29 11:29:04 +02:00
eddmaps Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
enviroatlas Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
esri2020 Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
etci2021 Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
eudem Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
eurocrops Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
eurosat Adding splits to RESISC45 and EuroSat (#218) 2021-11-02 22:26:39 -05:00
fair1m Update FAIR1M dataset and datamodule (#1275) 2023-04-26 07:00:54 -05:00
fire_risk Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
forestdamage Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
gbif Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
geonrw Add GeoNRW dataset (#2209) 2024-08-27 15:57:14 +02:00
gid15 Add datamodule for GID-15 dataset (#928) 2022-12-30 11:31:00 -06:00
globbiomass Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
idtrees Add IDTReeS dataset (#201) 2021-12-05 22:38:50 +00:00
inaturalist Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
inria Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
iobench Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
l7irish Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
l8biome Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
landcoverai Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
landsat8 Various fixes to GeoDataset 2021-08-10 10:06:00 -05:00
levircd LEVIRCD: data module tests without download (#2231) 2024-08-17 15:01:40 +02:00
loveda Add LoveDA dataset (#270) 2021-12-09 14:47:11 -06:00
mapinwild Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
millionaid Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
naip Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
nasa_marine_debris NASA Marine Debris: radiant mlhub -> source coop (#2183) 2024-07-27 09:24:56 +02:00
nccm Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
nlcd Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
nongeoclassification Rename VisionDataset to NonGeoDataset (#627) 2022-07-09 18:28:24 -07:00
openbuildings Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
oscd Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
pastis Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
patternnet Add plot method to PatternNet dataset (#314) 2021-12-31 11:00:15 -06:00
potsdam Add Potsdam Segmentation (#247) 2021-11-16 09:13:41 -08:00
prisma Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
quakeset Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
raster Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
ref_cloud_cover_detection_challenge_v1 Cloud Cover: radiant mlhub -> source cooperative (#2117) 2024-07-10 17:39:57 +02:00
reforestree Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
resisc45 Redistribute NWPU datasets on Hugging Face (#2210) 2024-08-17 17:01:44 +02:00
rwanda_field_boundary Rwanda Field Boundary: radiant mlhub -> source cooperative (#2118) 2024-07-10 17:40:23 +02:00
seasonet Ruff: enable ruff-specific rules (#2218) 2024-08-19 15:07:21 +02:00
seco SeCo newer version bug fix (#1235) 2023-04-14 14:39:00 -05:00
sen12ms Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
sentinel1 Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
sentinel2 Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
skippd Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
skyscript SkyScript: add new dataset (#2253) 2024-08-27 16:38:45 +02:00
so2sat Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
south_africa_crop_type South Africa Crop Type: fix dataset length (#2088) 2024-05-25 20:31:10 +02:00
south_america_soybean Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
spacenet SpaceNet: add SpaceNet 8, radiant mlhub -> aws (#2203) 2024-08-17 20:49:48 +02:00
ssl4eo Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
ssl4eo_benchmark_landsat Ruff: ensure all functions have type hints (#2217) 2024-08-12 14:32:31 +02:00
sustainbench_crop_yield Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
technoserve-cashew-benin Benin Cashews: radiant mlhub -> source cooperative (#2116) 2024-07-10 10:35:55 +02:00
ucmerced Add train/val/test splits to UCMerced (#216) 2021-11-01 10:09:36 -05:00
usavars Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
vaihingen Add Vaihingen Segmentation (#248) 2021-11-16 02:02:51 -06:00
vector Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
vhr10 Redistribute NWPU datasets on Hugging Face (#2210) 2024-08-17 17:01:44 +02:00
western_usa_live_fuel_moisture WesternUSALiveFuelMoisture: radiant mlhub -> source coop (#2206) 2024-08-05 11:11:23 +02:00
xview2 Add xView2 Dataset (#236) 2021-11-15 08:45:57 -06:00
xview3 Add custom RasterDataset notebook (#283) 2021-12-21 15:29:15 -08:00
zuericrop Ruff: prefer single quotes over double quotes (#2001) 2024-05-03 19:30:14 +02:00
README.md Update small mistake of attributeerror (#2162) 2024-07-12 14:28:33 +02:00

README.md

This directory contains fake data used to test torchgeo. Depending on the type of dataset, fake data can be created in multiple ways:

GeoDataset

Fake data for a GeoDataset can be created as follows. We first open an existing sample file and copy its driver, CRS, and transform to the fake data.

Raster data

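import os

import numpy as np
import rasterio as rio

ROOT = "data/landsat8"
FILENAME = "LC08_L2SP_023032_20210622_20210629_02_T1_SR_B1.TIF"
SIZE = 64

# Open the existing file to reuse its profile (driver, dtype, CRS, transform, ...).
# Note that src.profile also carries the source width and height.
with rio.open(os.path.join(ROOT, FILENAME), "r") as src:
    dtype = src.profile["dtype"]
    # np.iinfo only supports integer dtypes; randint's upper bound is exclusive
    Z = np.random.randint(np.iinfo(dtype).max, size=(SIZE, SIZE), dtype=dtype)
    # Write the fake file to the current directory using the copied profile
    with rio.open(FILENAME, "w", **src.profile) as dst:
        # Fill every band with the same random array
        for i in dst.indexes:
            dst.write(Z, i)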

Optionally, if the dataset has a colormap, it can be copied for band 1 inside the inner with block, while both src and dst are still open:

cmap = src.colormap(1)
dst.write_colormap(1, cmap)

Vector data

import os
from collections import OrderedDict

import fiona

ROOT = "data/cbf"
FILENAME = "Ontario.geojson"

# A single unit-square polygon with no attributes
rec = {
    "type": "Feature",
    "id": "0",
    "properties": OrderedDict(),
    "geometry": {
        "type": "Polygon",
        "coordinates": [[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]],
    },
}
with fiona.open(os.path.join(ROOT, FILENAME), "r") as src:
    # Reuse the driver and CRS of the existing file, but drop all attribute fields
    meta = src.meta
    meta["schema"]["properties"] = OrderedDict()
    with fiona.open(FILENAME, "w", **meta) as dst:
        dst.write(rec)
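
When no existing vector file is available to copy metadata from, a comparable GeoJSON file can be written with only the standard library. This is a minimal sketch, not the method used in this repository: the output name `fake.geojson` is hypothetical, and nothing is copied from a real file, so the coordinates are simply interpreted under GeoJSON's default CRS (WGS 84). The polygon matches the one in the fiona snippet above.

```python
import json

FILENAME = "fake.geojson"  # hypothetical output name, chosen for illustration

# The same unit-square polygon as in the fiona example, with no attributes
feature = {
    "type": "Feature",
    "id": "0",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Write the collection to the current directory
with open(FILENAME, "w") as f:
    json.dump(collection, f)
```

The fiona approach is preferable when the test must exercise the same driver and CRS as the real dataset.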

NonGeoDataset

Fake data for a NonGeoDataset can be created as follows, depending on the file format.

RGB images

import numpy as np
from PIL import Image

DTYPE = np.uint8
SIZE = 64

arr = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE, 3), dtype=DTYPE)
img = Image.fromarray(arr)
img.save("01.png")

Grayscale images

import numpy as np
from PIL import Image

DTYPE = np.uint8
SIZE = 64

arr = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE), dtype=DTYPE)
img = Image.fromarray(arr)
img.save("02.jpg")
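
Both image snippets above draw pixels with `np.random.randint`, whose upper bound is exclusive, so the largest value (255 for `uint8`) never occurs, and the result differs on every run. If reproducible fake data is wanted, one option is a seeded NumPy `Generator`. The sketch below is an assumption-laden variant, not what the README prescribes: the helper name `fake_array` and the seed value are hypothetical.

```python
import numpy as np

DTYPE = np.uint8
SIZE = 64
SEED = 0  # hypothetical fixed seed, chosen only for illustration


def fake_array(shape: tuple[int, ...], seed: int = SEED) -> np.ndarray:
    """Return a reproducible random integer array covering the full DTYPE range."""
    rng = np.random.default_rng(seed)
    info = np.iinfo(DTYPE)
    # endpoint=True makes the upper bound inclusive, so info.max can be drawn
    return rng.integers(info.min, info.max, size=shape, dtype=DTYPE, endpoint=True)


rgb = fake_array((SIZE, SIZE, 3))  # shape expected for an RGB image
gray = fake_array((SIZE, SIZE))    # shape expected for a grayscale image
```

Either array can then be passed to `Image.fromarray` exactly as in the snippets above.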

Audio wav files

import numpy as np
from scipy.io import wavfile

audio = np.random.randn(1).astype(np.float32)
wavfile.write("01.wav", rate=22050, data=audio)
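
If SciPy is not installed, the standard library's `wave` module can write a similar file. This is only a sketch of an alternative: it stores 16-bit integer PCM rather than the 32-bit float samples that the SciPy snippet writes, so it may not suit a dataset whose loader expects float audio. The helper name `write_fake_wav` and the output name `01_pcm16.wav` are hypothetical.

```python
import random
import struct
import wave

RATE = 22050     # same sample rate as the SciPy snippet
NUM_SAMPLES = 1  # a single sample, matching the SciPy snippet


def write_fake_wav(path: str, num_samples: int = NUM_SAMPLES, rate: int = RATE) -> None:
    """Write random 16-bit mono PCM samples to `path`."""
    samples = [random.randint(-32768, 32767) for _ in range(num_samples)]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(rate)
        # Pack the samples as little-endian signed 16-bit integers
        f.writeframes(struct.pack(f"<{num_samples}h", *samples))


write_fake_wav("01_pcm16.wav")
```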

HDF5 datasets

import h5py
import numpy as np

DTYPE = np.uint8
SIZE = 64
NUM_CLASSES = 10

images = np.random.randint(np.iinfo(DTYPE).max, size=(SIZE, SIZE, 3), dtype=DTYPE)
masks = np.random.randint(NUM_CLASSES, size=(SIZE, SIZE), dtype=DTYPE)
with h5py.File("data.hdf5", "w") as f:
    f.create_dataset("images", data=images)
    f.create_dataset("masks", data=masks)

LAS Point Cloud files

import laspy
import numpy as np

num_points = 4

# Read an existing file and keep only the first few points
las = laspy.read("0.las")
las.points = las.points[:num_points]

points = np.random.randint(low=0, high=100, size=(num_points,), dtype=las.x.dtype)
las.x = points
las.y = points
las.z = points

if hasattr(las, "red"):
    colors = np.random.randint(low=0, high=10, size=(num_points,), dtype=las.red.dtype)
    las.red = colors
    las.green = colors
    las.blue = colors

las.write("0.las")