mirror of https://github.com/microsoft/torchgeo.git
SSL4EO-L: add reproducibility instructions (#1416)
* Create landsat subdirectory
* Python scripts in superdirectory now
* Add README
Parent 2ba9207647
Commit 1ad8caa754
@@ -0,0 +1,99 @@
# SSL4EO-L Instructions

This README describes the steps to recreate the datasets and reproduce the results of the SSL4EO-L project.

## Sampling

The first step in creating the SSL4EO-L pre-training and benchmarking datasets is to choose locations from which to sample. The following scripts can be run to choose non-overlapping locations to sample.

```console
$ bash sample_30.sh  # for TM, ETM+, OLI/TIRS
$ bash sample_60.sh  # only for MSS
$ bash sample_conus.sh  # for benchmark datasets
```

The first section of these scripts includes user-specific parameters that can be modified to change the behavior of the scripts. Of particular importance are:

* `SAVE_PATH`: where the sampling location CSV is saved
* `START_INDEX`: index to start from (usually 0, can be increased to append more locations)
* `END_INDEX`: index to stop at (start with ~500K)

These scripts will download world city data and write `sampled_locations.csv` files to be used for downloading.
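
To illustrate how `START_INDEX` and `END_INDEX` can be used to append locations across runs, here is a minimal sketch. It is not the actual sampling code: `append_locations`, the `index, lon, lat` column layout, the half-open index range, and the uniform random coordinates are all assumptions made for illustration (the real scripts sample around world cities and reject overlapping locations).

```python
import csv
import random
from pathlib import Path


def append_locations(csv_path, start_index, end_index, seed=0):
    """Append one row per index in [start_index, end_index) to csv_path.

    Hypothetical helper, not part of the SSL4EO-L scripts.
    """
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps rows written by earlier runs, so a later call with a
    # larger start index extends the file instead of overwriting it.
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        for index in range(start_index, end_index):
            # Placeholder coordinates, seeded per index so reruns are repeatable.
            rng = random.Random(seed + index)
            lon = rng.uniform(-180.0, 180.0)
            lat = rng.uniform(-90.0, 90.0)
            writer.writerow([index, f"{lon:.5f}", f"{lat:.5f}"])
```

Under these assumptions, calling `append_locations(path, 0, 10)` and later `append_locations(path, 10, 20)` leaves a single CSV with indices 0 through 19.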

## Downloading

Next, download the data.

```console
$ bash download_mss_raw.sh
$ bash download_tm_toa.sh
$ bash download_etm_toa.sh
$ bash download_etm_sr.sh
$ bash download_oli_tirs_toa.sh
$ bash download_oli_sr.sh
```

These scripts contain the following variables you may want to modify:

* `ROOT_DIR`: root directory containing all subdirectories
* `SAVE_PATH`: where the downloaded data is saved
* `MATCH_FILE`: the CSV created in the previous step
* `NUM_WORKERS`: number of parallel workers
* `START_INDEX`: index from which to start downloading
* `END_INDEX`: index at which to stop downloading

These scripts are designed for downloading the pre-training datasets. Each script can be easily modified to instead download the benchmarking datasets by changing the `MATCH_FILE`, `YEAR`, and `--dates` passed to the download script. For ETM+ TOA, you'll also want to set a `--default-value`, since SLC-off scenes contain nodata pixels that need to be included.
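
As a rough picture of how `NUM_WORKERS`, `START_INDEX`, and `END_INDEX` interact, the sketch below runs a per-location function over a slice of rows with a thread pool. This is an assumption about the general pattern, not the actual download script: `download_range` and `fetch` are hypothetical names, and the real script may use processes or a different range convention.

```python
from concurrent.futures import ThreadPoolExecutor


def download_range(rows, start_index, end_index, num_workers, fetch):
    """Apply fetch to rows[start_index:end_index] using num_workers threads.

    Hypothetical helper: `fetch` stands in for whatever downloads one location.
    """
    selected = rows[start_index:end_index]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in input order, even if tasks finish out of order.
        return list(pool.map(fetch, selected))
```

Splitting a large CSV into several index ranges like this makes it possible to resume an interrupted run by raising the start index rather than starting over.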

## Parallel corpus

For each TOA and SR product, we want to create a parallel corpus. This can be done by running:

```console
$ bash delete_mismatch.sh
```

You may want to modify `ROOT_DIR`.
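
The idea behind a parallel corpus is that every product keeps only the locations that all products share. The following sketch shows that set logic under the assumption that each product directory contains one subdirectory per location; `find_mismatched` is a hypothetical helper, not the interface of `delete_mismatch.py`, and it only reports mismatches rather than deleting anything.

```python
from pathlib import Path


def find_mismatched(roots):
    """Map each root to the sorted subdirectory names not shared by every root.

    Requires at least one root. Nothing is deleted; the result is only a report.
    """
    names = {
        str(root): {p.name for p in Path(root).iterdir() if p.is_dir()}
        for root in roots
    }
    # Locations present in every product directory form the parallel corpus.
    common = set.intersection(*names.values())
    return {root: sorted(found - common) for root, found in names.items()}
```

Reviewing such a report before removing anything is a cheap safeguard, since deleting downloaded scenes is expensive to undo.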

## Compression

The final step in dataset creation is to convert float32 values to uint8 and create compressed COG files. This can be done by running:

```console
$ bash compress_tm_toa.sh
$ bash compress_etm_toa.sh
$ bash compress_etm_sr.sh
$ bash compress_oli_tirs_toa.sh
$ bash compress_oli_sr.sh
```

You may want to modify `ROOT_DIR` or `NUM_WORKERS`.
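
One common way to convert float32 values to uint8 is linear min-max scaling with clipping, using a per-band minimum and maximum. The sketch below assumes that approach; it is not taken from `compress_dataset.py`, and `to_uint8`, `vmin`, and `vmax` are illustrative names.

```python
def to_uint8(values, vmin, vmax):
    """Linearly rescale floats from [vmin, vmax] to integers in [0, 255]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    scale = 255.0 / (vmax - vmin)
    result = []
    for value in values:
        clipped = min(max(value, vmin), vmax)  # clamp outliers into range
        result.append(int(round((clipped - vmin) * scale)))
    return result
```

Reflectance and thermal bands typically have different value ranges, which is why a separate minimum and maximum per band is useful in a scheme like this.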

## Chipping

For the benchmark datasets, there is one additional step required. You should download NLCD and CDL files from the same years as the benchmark datasets, either manually or using TorchGeo. Then you should run:

```console
$ python3 chip_landsat_benchmark.py ...
```

This will create patches of NLCD and CDL data with the same locations and dimensions as the Landsat images you downloaded. Valid options can be found by passing `--help`.
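
Cutting a mask patch that lines up with a Landsat image comes down to converting the image's bounds into a pixel window of the larger raster. The sketch below shows that arithmetic, assuming a north-up raster with square pixels in the same coordinate reference system as the bounds; `bounds_to_window` is a hypothetical helper and not how `chip_landsat_benchmark.py` is necessarily implemented.

```python
def bounds_to_window(bounds, origin_x, origin_y, pixel_size):
    """Convert (minx, miny, maxx, maxy) to (col_off, row_off, width, height).

    origin_x and origin_y are the coordinates of the raster's top-left corner.
    """
    minx, miny, maxx, maxy = bounds
    col_off = round((minx - origin_x) / pixel_size)
    row_off = round((origin_y - maxy) / pixel_size)  # rows count down from the top
    width = round((maxx - minx) / pixel_size)
    height = round((maxy - miny) / pixel_size)
    return col_off, row_off, width, height
```

With 30 m pixels, bounds spanning 7920 m on each side give a 264 x 264 window, matching the `SIZE=264` used by the sampling and download scripts.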

## Running Experiments

Using either the newly created datasets or the datasets downloaded from Hugging Face, you can run each experiment with:

```console
$ python3 ../../../train.py config_file=...
```

The config files to be passed can be found in the `../../../conf/` directory. Feel free to tweak any hyperparameters you see in these files. The default values are the optimal hyperparameters we found.

## Plotting

The following scripts can be run to generate the plots in our paper:

```console
$ python3 plot_landsat_bands.py RBV MSS ETM --fig-height=3  # only TM, ETM+, OLI/TIRS
$ python3 plot_landsat_bands.py  # all bands
$ python3 plot_landsat_timeline.py
```
experiments/ssl4eo/chip_landsat_downstream.py → experiments/ssl4eo/landsat/chip_landsat_benchmark.py
Normal file → Executable file
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SRC_DIR="$ROOT_DIR/ssl4eo-l7-l2"
-DST_DIR="$ROOT_DIR/ssl4eo-l7-l2-v2"
+SRC_DIR="$ROOT_DIR/ssl4eo_l_etm_sr"
+DST_DIR="$ROOT_DIR/ssl4eo_l_etm_sr_v2"
 NUM_WORKERS=40

 # Satellite-specific parameters
@@ -21,7 +21,7 @@ MIN=$R_MIN
 MAX=$R_MAX

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/compress_dataset.py" \
     "$SRC_DIR" \
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SRC_DIR="$ROOT_DIR/ssl4eo-l7-l1"
-DST_DIR="$ROOT_DIR/ssl4eo-l7-l1-v2"
+SRC_DIR="$ROOT_DIR/ssl4eo_l_etm_toa"
+DST_DIR="$ROOT_DIR/ssl4eo_l_etm_toa_v2"
 NUM_WORKERS=40

 # Satellite-specific parameters
@@ -25,7 +25,7 @@ MIN=($R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $T_MIN $T_MIN $R_MIN $R_MIN)
 MAX=($R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $T_MAX $T_MAX $R_MAX $R_MAX)

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/compress_dataset.py" \
     "$SRC_DIR" \
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SRC_DIR="$ROOT_DIR/ssl4eo-l8-l2"
-DST_DIR="$ROOT_DIR/ssl4eo-l8-l2-v2"
+SRC_DIR="$ROOT_DIR/ssl4eo_l_oli_sr"
+DST_DIR="$ROOT_DIR/ssl4eo_l_oli_sr_v2"
 NUM_WORKERS=40

 # Satellite-specific parameters
@@ -21,7 +21,7 @@ MIN=$R_MIN
 MAX=$R_MAX

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/compress_dataset.py" \
     "$SRC_DIR" \
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SRC_DIR="$ROOT_DIR/ssl4eo-l8-l1"
-DST_DIR="$ROOT_DIR/ssl4eo-l8-l1-v2"
+SRC_DIR="$ROOT_DIR/ssl4eo_l_oli_tirs_toa"
+DST_DIR="$ROOT_DIR/ssl4eo_l_oli_tirs_toa_v2"
 NUM_WORKERS=40

 # Satellite-specific parameters
@@ -25,7 +25,7 @@ MIN=($R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $T_MIN $T_MI
 MAX=($R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $T_MAX $T_MAX)

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/compress_dataset.py" \
     "$SRC_DIR" \
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SRC_DIR="$ROOT_DIR/ssl4eo-l5-l1"
-DST_DIR="$ROOT_DIR/ssl4eo-l5-l1-v2"
+SRC_DIR="$ROOT_DIR/ssl4eo_l_tm_toa"
+DST_DIR="$ROOT_DIR/ssl4eo_l_tm_toa_v2"
 NUM_WORKERS=40

 # Satellite-specific parameters
@@ -25,7 +25,7 @@ MIN=($R_MIN $R_MIN $R_MIN $R_MIN $R_MIN $T_MIN $R_MIN)
 MAX=($R_MAX $R_MAX $R_MAX $R_MAX $R_MAX $T_MAX $R_MAX)

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/compress_dataset.py" \
     "$SRC_DIR" \
@@ -7,15 +7,14 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-L5_L1="$ROOT_DIR/ssl4eo-l5-l1/imgs"
-L7_L1="$ROOT_DIR/ssl4eo-l7-l1/imgs"
-L7_L2="$ROOT_DIR/ssl4eo-l7-l2/imgs"
-L8_L1="$ROOT_DIR/ssl4eo-l8-l1/imgs"
-L8_L2="$ROOT_DIR/ssl4eo-l8-l2/imgs"
+L5_L1="$ROOT_DIR/ssl4eo_l_tm_toa/imgs"
+L7_L1="$ROOT_DIR/ssl4eo_l_etm_toa/imgs"
+L7_L2="$ROOT_DIR/ssl4eo_l_etm_sr/imgs"
+L8_L1="$ROOT_DIR/ssl4eo_l_oli_tirs_toa/imgs"
+L8_L2="$ROOT_DIR/ssl4eo_l_oli_sr/imgs"

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)

 time python3 "$SCRIPT_DIR/delete_mismatch.py" "$L7_L1" "$L7_L2" --delete-different-locations --delete-different-dates
 time python3 "$SCRIPT_DIR/delete_mismatch.py" "$L8_L1" "$L8_L2" --delete-different-locations --delete-different-dates
 time python3 "$SCRIPT_DIR/delete_mismatch.py" "$L5_L1" "$L7_L1" "$L7_L2" "$L8_L1" "$L8_L2" --delete-different-locations
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l7-l2"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-30/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_etm_sr"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_30/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -25,7 +25,7 @@ NEW_RESOLUTIONS=30
 DEFAULT_VALUE=0

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l7-l1"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-30/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_etm_toa"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_30/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -24,7 +24,7 @@ ORIGINAL_RESOLUTIONS=(30 30 30 30 30 60 60 30 15)
 NEW_RESOLUTIONS=30

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l5-raw"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-60/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_mss_raw"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_60/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -24,7 +24,7 @@ ORIGINAL_RESOLUTIONS=(60 60 60 30)
 NEW_RESOLUTIONS=60

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l8-l2"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-30/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_oli_sr"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_30/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -25,7 +25,7 @@ NEW_RESOLUTIONS=30
 DEFAULT_VALUE=0

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l8-l1"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-30/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_oli_tirs_toa"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_30/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -25,7 +25,7 @@ ORIGINAL_RESOLUTIONS=(30 30 30 30 30 30 30 15 30 30 30)
 NEW_RESOLUTIONS=30

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -7,8 +7,8 @@ set -euo pipefail

 # User-specific parameters
 ROOT_DIR=data
-SAVE_PATH="$ROOT_DIR/ssl4eo-l5-l1"
-MATCH_FILE="$ROOT_DIR/ssl4eo-l-30/sampled_locations.csv"
+SAVE_PATH="$ROOT_DIR/ssl4eo_l_tm_toa"
+MATCH_FILE="$ROOT_DIR/ssl4eo_l_30/sampled_locations.csv"
 NUM_WORKERS=40
 START_INDEX=0
 END_INDEX=10
@@ -24,7 +24,7 @@ ORIGINAL_RESOLUTIONS=(30 30 30 30 30 30 30)
 NEW_RESOLUTIONS=30

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 CLOUD_PCT=20
 SIZE=264
 DTYPE=float32
@@ -6,12 +6,12 @@
 set -euo pipefail

 # User-specific parameters
-SAVE_PATH=data/ssl4eo-l-30
+SAVE_PATH=data/ssl4eo_l_30
 START_INDEX=0
 END_INDEX=10

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 RES=30
 SIZE=264
 NUM_CITIES=10000
@@ -6,12 +6,12 @@
 set -euo pipefail

 # User-specific parameters
-SAVE_PATH=data/ssl4eo-l-60
+SAVE_PATH=data/ssl4eo_l_60
 START_INDEX=0
 END_INDEX=10

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 RES=60
 SIZE=264
 NUM_CITIES=10000
@@ -6,12 +6,12 @@
 set -euo pipefail

 # User-specific parameters
-SAVE_PATH=data/ssl4eo-l-conus
+SAVE_PATH=data/ssl4eo_l_conus
 START_INDEX=0
-END_INDEX=1000
+END_INDEX=10

 # Generic parameters
-SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
+SCRIPT_DIR=$(cd $(dirname $(dirname "${BASH_SOURCE[0]}")) && pwd)
 SIZE=264
 RES=30