This commit is contained in:
Siyu Yang 2020-09-29 15:16:56 -07:00
Parent f19568dc53
Commit f86a0f6a67
21 changed files with 1866 additions and 202 deletions

View file

@ -1,5 +1,81 @@
# Land cover mapping the Orinoquía region
# Contributing
In this project, we worked with the Wildlife Conservation Society Colombia ([WCS Colombia](https://colombia.wcs.org/en-us/)) to create up-to-date land cover maps of the [Orinoquía](https://colombia.wcs.org/en-us/Wild-Places/Orinoquia.aspx) region in Colombia. This natural region encompasses a range of ecosystems, from seasonally flooded savanna to rainforest. In recent years, new agricultural production practices have put pressure on these ecosystems. It is therefore critically important to present information on land use to policy makers so that the need for local communities to develop can be balanced with conserving the biodiversity and ecological functions of the area.
Specifically, we used a land use and land cover (LULC) map that was manually produced using satellite imagery and field data from 2010-2012 to train a semantic segmentation model for 12 land cover classes. The model can be applied to composites of Landsat 8 imagery collected in subsequent years to enable ecological analysis.
This repo contains the scripts and configuration files used to train the land cover model using a median composite of Landsat 8 images from 2013-2014. We will be evaluating the result of applying the model to 2019-2020 imagery and releasing the resulting maps in the coming months.
## Installation
At the root directory of this repo, use `environment.yml` to create a conda virtual environment called `wcs`:
```bash
conda env create --file environment.yml
```
If you need additional packages, add them in `environment.yml` and update the environment:
```bash
conda env update --name wcs --file environment.yml --prune
```
### `ai4eutils`
You need to add the [`ai4eutils`](https://github.com/microsoft/ai4eutils) repo to the `PYTHONPATH`, as we make use of the `geospatial` module there.
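As a quick sanity check that the module can be imported (a minimal sketch; the clone path below is a placeholder for wherever you keep `ai4eutils`):
```python
# Minimal sketch: make ai4eutils importable for this session and import the geospatial module.
# AI4EUTILS_PATH is a placeholder; point it at your local clone, or set PYTHONPATH instead.
import os
import sys

AI4EUTILS_PATH = os.path.expanduser('~/repos/ai4eutils')
if AI4EUTILS_PATH not in sys.path:
    sys.path.append(AI4EUTILS_PATH)

from geospatial.enums import ExperimentConfigMode  # used by the training script
print(ExperimentConfigMode)
```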
### Setting up the Solaris package
We need to set up a separate conda environment for using Solaris. Instructions are available on https://github.com/CosmiQ/solaris.
We do not want to install the Solaris pip package inside the `wcs` environment because it requires versions of PyTorch/TensorFlow that we might not want.
There are two ways to set up Solaris:
1. To install Solaris using their published package on PyPI, first create a conda environment called `solaris`, which will make the next pip installation step go smoothly:
```
conda env create --file environment_solaris.yml
conda activate solaris
pip install solaris==0.2.2
```
2. If installing from source:
Once we clone the repo, add some more dependencies to its `environment.yml` so that Jupyter Notebooks run from our `wcs` environment can use the Solaris kernel to run certain notebooks:
```
- nb_conda_kernels
- ipykernel
- humanfriendly
```
Our fork of Solaris is here: https://github.com/yangsiyu007/solaris
Upstream: https://github.com/CosmiQ/solaris
### Setting up an environment for the newer version of GDAL
It seems that in our existing environment, `rasterio` is not compatible with `GDAL>=3.1`, which is required for creating Cloud Optimized GeoTIFFs (COGs). We create a separate environment for GDAL to run such commands in:
```bash
conda create --name gdalnew --channel conda-forge python gdal==3.1.2
```
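For reference, a COG can then be produced via GDAL's Python bindings in that environment (a sketch with placeholder file names, not a command taken from this repo):
```python
# Sketch: write a Cloud Optimized GeoTIFF with the COG driver available in GDAL >= 3.1.
# Run this from the `gdalnew` environment; the file names are placeholders.
from osgeo import gdal

gdal.Translate(
    'prediction_cog.tif',   # output COG (placeholder)
    'prediction.tif',       # input GeoTIFF (placeholder)
    format='COG',
    creationOptions=['COMPRESS=LZW', 'BLOCKSIZE=512'],
)
```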
## Terminology
See [Satellite data terminology](https://github.com/microsoft/ai4eutils/tree/master/geospatial#satellite-data-terminology).
## Related repositories
https://github.com/microsoft/landcover
https://github.com/microsoft/ai4eutils
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
@ -12,3 +88,7 @@ provided by the bot. You will only need to do this once across all repos using o
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## License
This repository is licensed with the MIT license. See [LICENSE](./LICENSE).

commands.sh — new file, 75 lines
View file

@ -0,0 +1,75 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#@IgnoreInspection BashAddShebang
# A record of often-used bash commands.
# Add ai4eutils to PYTHONPATH
export PYTHONPATH=${PYTHONPATH}:/home/<username>/wcs/repos/ai4eutils
# Tiling and masking the 2013-2014 set
python data/3_tile_and_mask.py \
--image_path /home/<username>/wcs/mnt/wcs-orinoquia/images_sr_median/2013_2014 \
--label_path /home/<username>/wcs/mnt/wcs-orinoquia/provided_labels/Landuse_shape/derived_20200421/landuse.shp \
--label_property Landuse_WC \
--out_dir /home/<username>/wcs/mnt/wcs-orinoquia/tiles/full_sr_median_2013_2014 \
--region_outline /home/<username>/wcs/mnt/wcs-orinoquia/misc/outline/Orinoquia_outline.shp
# Training locally on a DSVM
conda activate wcs
export PYTHONPATH="${PYTHONPATH}:/home/<username>/wcs/pycharm"
cd training
sh run_training_local.sh
# Scoring
# modify selection of tiles at the top of inference.py
checkpoint_path=/disk/wcs/wcs_coarse_baseline_0_wrong_val_viz/outputs/wcs_coarse_baseline/checkpoints/model_best.pth.tar
out_dir=/home/<username>/wcs/mnt/wcs-orinoquia/delivered/20200715/results_coarse_baseline_201920
python training/inference.py --config_module_path training/experiments/coarse_baseline/coarse_baseline_config_refactored.py --checkpoint_path ${checkpoint_path} --out_dir ${out_dir} --output_softmax
# Mounting blob container
sh mount_wcs_containers.sh
# OR, mounting manually:
sudo mkdir /mnt/blobfusetmp-wcs-orinoquia
sudo chown USERNAME /mnt/blobfusetmp-wcs-orinoquia
blobfuse /home/<username>/wcs/mnt/wcs-orinoquia --tmp-path=/mnt/blobfusetmp-wcs-orinoquia --config-file=wcs-orinoquia.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 -o allow_other
# Copy data to disk for faster access
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles_masks?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles_masks_coarse?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
# Elevation data
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/images_srtm?SAS_KEY" /disk/wcs_data --recursive

View file

@ -1,3 +1,6 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
bands_use = {
# reference: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/
# surface reflectance product: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR

View file

@ -1,3 +1,6 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/* Landsat 8 Surface Reflectance imagery */
// imports

View file

@ -1,3 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Execute this script in QGIS to add the queried Sentinel 2 imagery as an XYZ tile to the project.
"""

View file

@ -1,6 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
make_data_shards.py

View file

@ -1,3 +1,12 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Tile scenes of imagery (Landsat 8 only) and corresponding polygon labels, and create masks for the latter
for training segmentation models. This uses the Solaris package and the conda env defined in `environment_solaris.yml`.
This is the script version of the notebook `data/3_tile_and_mask.ipynb`.
"""
import argparse
import os
import pickle

environment_solaris.yml — new file, 38 lines
View file

@ -0,0 +1,38 @@
# Reference: https://github.com/CosmiQ/solaris
name: solaris
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- python>=3.6
- pip>=19.0.3
- affine>=2.3.0
- albumentations=0.4.3
- fiona>=1.7.13
- gdal=3.0.3
- geopandas>=0.7.0
- matplotlib>=3.1.2
- networkx>=2.4
- numpy>=1.17.3
- opencv>=4.1
- pandas>=0.25.3
- pyproj>=2.1
- pytorch>=1.3.1
- pyyaml=5.2
- rasterio>=1.0.23
- requests=2.22.0
# - rio-cogeo>=1.1.6
- rtree>=0.9.3
- scikit-image>=0.16.2
- scipy>=1.3.2
- shapely>=1.6.4
- tensorflow=1.13.1
- torchvision>=0.5.0
- tqdm>=4.40.0
- urllib3>=1.25.7
- nb_conda_kernels
- ipykernel
- humanfriendly
- pip:
- git+https://github.com/Toblerity/shapely.git@master#egg=shapely-1.7.1dev # temporary, 1.8dev required for numpy array support

training_wcs/README.md — new file, 57 lines
View file

@ -0,0 +1,57 @@
# Model training
## Data
### Batching
Each tile is chipped in order (x then y: a full row of chips is finished before moving to the next row) before moving on to the next tile.
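A minimal sketch of that traversal order (chip size and tile size match the values used in this project, but the snippet is only illustrative):
```python
# Sketch of the chip traversal order within one tile: columns (x) vary fastest,
# so a full row of chips is produced before moving down to the next row.
import math

chip_size = 256
tile_width = tile_height = 2000  # tiles are roughly 2000 x 2000 pixels

num_cols = math.ceil(tile_width / chip_size)
num_rows = math.ceil(tile_height / chip_size)

chip_windows = []
for row_idx in range(num_rows):        # next row only after a full row of chips
    for col_idx in range(num_cols):    # x varies fastest
        # rasterio-style window: (col_off, row_off, width, height)
        chip_windows.append((col_idx * chip_size, row_idx * chip_size, chip_size, chip_size))

print(len(chip_windows), chip_windows[:3])
```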
### Folder structure and splits
The splits are stored in a JSON file, as a dict with `train` and `val` fields, pointing to lists of TIF image names, e.g. `wcs_orinoquia_trial_region_201301_201512_-69.74_3.742.tif`.
The splits are over tiles.
The image tiles and their label masks are stored together (not in separate folders for each split). Splitting is done by the datasets and loaders.
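A minimal sketch of reading such a splits file (the path shown is the one used by the baseline config; adjust it to your checkout):
```python
# Sketch: load the train/val split, which lists the tile file names belonging to each split.
import json

splits_file = './constants/splits/full_sr_median_2013_2014_splits.json'
with open(splits_file) as f:
    splits = json.load(f)

train_tiles = splits['train']
val_tiles = splits['val']
print(f'Train set has {len(train_tiles)} tiles; val set has {len(val_tiles)} tiles.')
```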
### Chip size
Using 256 for the width and height of the chips.
## Experiments
### `200211_mini_baseline`
We use the bands that seem most helpful, judging by what each band has typically been used for and by visualizing each band.
Bands used:
2, 3, 4, 5, 6, 7
Bands 6 and 7 have fairly different pixel value distributions, so we include both.
Bands 4 and 5 are combined to give the NDVI, so the channels are
2, 3, 6, 7, NDVI
(stacked in this order)
We keep the values as float32, the dtype rasterio reads them in as, to retain the original amount of information. We apply clipping, normalization, and gamma correction according to the parameters specified in the experiment config file (max, min, and gamma values; in this experiment gamma is 1, so there is no gamma correction), all the while keeping the data in float32 (`return_array` set to `True` returns float32 instead of uint8).
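A sketch of this preprocessing, assuming float32 band arrays already read from a tile and the clip values from the baseline config (illustrative, not the exact code used):
```python
# Sketch: clip each band to its [min, max], scale to [0, 1], optionally gamma-correct,
# then stack bands 2, 3, 6, 7 and NDVI as float32 channels.
import numpy as np

def normalize_band(band, band_min, band_max, gamma=1.0):
    band = np.clip(band.astype(np.float32), band_min, band_max)
    band = (band - band_min) / (band_max - band_min)  # now in [0, 1]
    return band ** gamma                              # gamma = 1 means no gamma correction

def ndvi(band4, band5):
    # NDVI = (NIR - red) / (NIR + red); band 5 is NIR and band 4 is red
    return (band5 - band4) / (band5 + band4 + 1e-6)

# band2 ... band7 would be float32 arrays read from the tile; clip maxes are from the config
# chip = np.stack([
#     normalize_band(band2, 0, 700),
#     normalize_band(band3, 0, 1500),
#     normalize_band(band6, 0, 5000),
#     normalize_band(band7, 0, 3000),
#     ndvi(band4, band5),
# ])  # shape (5, H, W), dtype float32
```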
### Data augmentation techniques
- Undersample chips that only contain the most common classes (a sketch of one possible scheme follows)
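One possible way to implement this undersampling (an illustrative assumption, not necessarily the exact scheme used; the common class ID comes from the baseline config):
```python
# Sketch: keep only a small random fraction of chips whose label mask contains
# nothing but the most common classes; keep all other chips.
import random
import numpy as np

COMMON_CLASSES = {12}   # common_classes in the baseline config
KEEP_PROB = 0.1         # illustrative keep probability

def keep_chip(chip_label: np.ndarray) -> bool:
    classes_present = set(np.unique(chip_label)) - {0}  # 0 is the empty / no-label class
    if classes_present and classes_present.issubset(COMMON_CLASSES):
        return random.random() < KEEP_PROB
    return True
```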
### Which bands are most helpful
Band 8 (panchromatic) needs to be downsampled; bands 10 and 11 need to be enlarged.
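A sketch of resampling a band onto the 30 m grid with rasterio's `out_shape` read (paths and sizes are placeholders):
```python
# Sketch: read a band resampled to the target grid size so that all bands share one shape.
import rasterio
from rasterio.enums import Resampling

target_height = target_width = 2000  # size of the 30 m bands in the tile (placeholder)

with rasterio.open('landsat8_band8_pan.tif') as src:       # placeholder path; 15 m band
    band8 = src.read(1, out_shape=(target_height, target_width), resampling=Resampling.average)

with rasterio.open('landsat8_band10_thermal.tif') as src:  # placeholder path; coarser band
    band10 = src.read(1, out_shape=(target_height, target_width), resampling=Resampling.bilinear)
```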
## Flooding and seasonality
Water has very low NDVI values since it does not reflect much NIR, so taking the greenest pixel should be okay for flooded savanna.

View file

@ -1,200 +0,0 @@
"""
Configurations for the 20200505_mini_baseline experiment
"""
import json
import os
import torch
import numpy as np
from viz_utils import VizUtils
from training_wcs.scripts.models.unet.unet import Unet # models.unet.unet import Unet
experiment_name = 'wcs_baseline_202005'
eval_mode = True
# I/O -------------------------------------------------------------------------------------------------
if not eval_mode:
aml_data_ref = os.environ.get('AZUREML_DATAREFERENCE_wcsorinoquia', '')
assert len(aml_data_ref) > 0, 'Reading aml_data_ref from environment vars resulted in empty string.'
data_dir = os.path.join(aml_data_ref, 'tiles', 'full_sr_median_2013_2014')
assert 'tiles' in os.listdir(data_dir)
assert 'tiles_masks' in os.listdir(data_dir)
# a dir with experiment_name will be created in here and checkpoints are saved here
# set as './outputs' for AML to stream to this Run's folder
out_dir = '/boto_disk_0/wcs/20190518_feature_scale_1/outputs' # './outputs'
os.makedirs(out_dir, exist_ok=True)
# TF events go here. Set it as './logs' if using AML so they can be streamed
log_dir = '/boto_disk_0/wcs/20190518_feature_scale_1/logs' # './logs'
# train/val splits are stored in
# on AML, this needs to be relative to the source_directory level
splits_file = './constants/splits/full_sr_median_2013_2014_splits.json' # '../training_wcs/scripts/constants/splits/full_sr_median_2013_2014_splits.json'
with open(splits_file) as f:
splits = json.load(f)
train_split = splits['train']
val_split = splits['val']
print(f'Train set has {len(train_split)} tiles; val set has {len(val_split)} tiles.')
# Training ----------------------------------------------------------------------------------------------
evaluate_only = False # Only evaluate the model on the val set once
# this is the *total* epoch; if restarting from a checkpoint, be sure to add the additional number of epochs
# to fine-tune on top of the original value of this var
total_epochs = 1000
print_every = 100 # how many steps between prints of the minibatch loss and accuracy
assert print_every >= 1, 'print_every needs to be greater than or equal to 1'
starting_checkpoint_path = None
init_learning_rate = 1e-4
batch_size = 24
# probability a chip is kept in the sample while sampling train and val chips at the start of training
# this should be smaller if we now have more training examples
# prob_keep_chip = 0.006
# 133 training tiles * 64 chips per tile = 8512 chips. Should keep every 177 if visualizing 48, which is 0.0056
keep_every = 30 # a balance between not covering all training tiles vs iterating through val tiles too many times
num_chips_to_viz = 48
# Hardware and framework --------------------------------------------------------------------------------
dtype = torch.float32
# Model -------------------------------------------------------------------------------------------------
num_classes = 34 # empty plus the 33 WCS classes; this is the number of output nodes
num_in_channels = 5 # 2, 3, 6, 7, NDVI
# the smallest number of filters is 64 when feature_scale is 1, and it is 32 when feature_scale is 2
feature_scale = 1
is_deconv = True # True to use transpose convolution filters to learn upsampling; otherwise upsampling is not learnt
is_batchnorm = True
model = Unet(feature_scale=feature_scale,
n_classes=num_classes,
in_channels=num_in_channels,
is_deconv=is_deconv,
is_batchnorm=is_batchnorm)
# Data ---------------------------------------------------------------------------------------------------
common_classes = [
12
]
less_common_classes = [
32, 33
]
weights = []
for i in range(num_classes):
if i in common_classes:
weights.append(1)
elif i in less_common_classes:
weights.append(2)
else:
weights.append(10)
loss_weights = torch.FloatTensor(weights) # None if no weighting for classes
print('Weights on loss per class used:')
print(loss_weights)
# how many subprocesses to use for data loading
# None for now - need to modify datasets.py to use
data_loader_num_workers = None
# data_loader_shuffle = True # True to have the data reshuffled at every epoch; not available with IterableDataset
chip_size = 256
# based on min and max values from the sample tile
# wcs_orinoquia_sr_median_2013_2014-0000007424-0000007424_-71.347_4.593.tif in the training set
# bands 4 and 5 are combined to get the NDVI, so the normalization params for 4 and 5 are
# not used during training data generation, only for visualization (actually not yet used for viz either).
bands_normalization_params = {
# these are the min and max to clip to for the band
'min': {
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0
},
'max': {
2: 700,
3: 1500,
4: 1500,
5: 5000,
6: 5000,
7: 3000
},
'gamma': { # all the same in this experiment with value 1 which means no effect
2: 1.0,
3: 1.0,
4: 1.0,
5: 1.0,
6: 1.0,
7: 1.0
}
}
viz_util = VizUtils()
def get_chip(tile_reader, chip_window):
"""
Returns:
A numpy array of dims (5, H, W)
"""
normal_bands = [2, 3, 6, 7] # bands to be used without calculating other indices e.g. NDVI
bands_to_stack = []
for b in normal_bands:
# getting one band at a time because min, max and gamma may be different
band = viz_util.show_landsat8_tile(
tile_reader,
bands=[b], # pass in a list to get the batch dimension in the results
window=chip_window,
band_min=bands_normalization_params['min'][b],
band_max=bands_normalization_params['max'][b],
gamma=bands_normalization_params['gamma'][b],
return_array=True
)
bands_to_stack.append(band) # band is 2D (h, w), already squeezed, dtype is float 32
ndvi = viz_util.show_landsat8_ndvi(tile_reader, window=chip_window) # 2D, dtype is float32
bands_to_stack.append(ndvi)
stacked = np.stack(bands_to_stack)
if stacked.shape != (5, chip_size, chip_size):
# default pad constant value is 0
stacked = np.pad(stacked,
[(0, 0), (0, chip_size - stacked.shape[1]), (0, chip_size - stacked.shape[2])])
assert stacked.shape == (5, chip_size, chip_size), f'Landsat chip has wrong shape: {stacked.shape}, should be (5, h, w)'
# prepare the chip for display
chip_for_display = viz_util.show_landsat8_tile(tile_reader,
window=chip_window,
band_max=3000, # what looks good for RGB
gamma=0.5,
return_array=True)
return stacked, chip_for_display

View file

@ -1,3 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
First experiment using 6 channels (2, 3, 6, 7, NDVI, elevation) with the 13 + 1 coarse categories
mapped June 29 2020.

View file

@ -0,0 +1,386 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.core.interactiveshell import InteractiveShell\n",
"InteractiveShell.ast_node_interactivity = 'all' # default is last_expr\n",
"\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from shutil import copy\n",
"\n",
"import azureml.core\n",
"from azureml.core import (Workspace, Experiment, Datastore, Dataset, \n",
" ContainerRegistry, ScriptRunConfig, RunConfiguration, \n",
" Run)\n",
"from azureml.train.dnn import PyTorch\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"from azureml.tensorboard import Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Version of AML: {}'.format(azureml.core.__version__))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run training on AML\n",
"We used this notebook to run experiments on [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-services/) earlier in the project, and since switched to running experiments on [Data Science Virtual Machines](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). This notebook is no longer maintained but is kept here as an example of using the AML Python SDK. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Provide credentials\n",
"\n",
"Provide the account name and the key to the storage account"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"storage_account_name = os.environ.get('STORAGE_ACCOUNT_NAME')\n",
"storage_account_key = os.environ.get('STORAGE_ACCOUNT_KEY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to the AML workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.get(name='<workspace_name>', \n",
" subscription_id='<subscription_id>', \n",
" resource_group='<resource_group_name>')\n",
"print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"compute_target = ws.compute_targets['gpu-nc6']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to datastore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datastore_name = 'wcsorinoquia'\n",
"container_name = 'wcs-orinoquia'\n",
"\n",
"datastore = None\n",
"for name, ds in ws.datastores.items():\n",
" if name == datastore_name:\n",
" datastore = ds\n",
" \n",
"if datastore is None:\n",
" datastore = Datastore.register_azure_blob_container(\n",
" workspace=ws, \n",
" datastore_name=datastore_name, \n",
" container_name=container_name,\n",
" account_name=storage_account_name, \n",
" account_key=storage_account_key,\n",
" create_if_not_exists=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_ref = DataReference(datastore=datastore,\n",
" data_reference_name=datastore_name,\n",
" mode='mount')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"str(data_ref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an AML experiment and run configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'wcs_baseline_20200506'\n",
"\n",
"exp_folder = './scripts_and_config'\n",
"\n",
"tags = {\n",
" 'model': 'unet, feature scale 2',\n",
" \n",
" 'starting_from': 'None',\n",
" \n",
" 'init_learning_rate': str(1e-4),\n",
" \n",
" 'loss_weights': 'all the same, set to 1',\n",
" \n",
" 'batch_size': '32',\n",
" \n",
" 'imagery': 'full_sr_median_2013_2014',\n",
" \n",
" 'bands': '2, 3, 6, 7, NDVI'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.getcwd()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"copy('../viz_utils.py', exp_folder)\n",
"\n",
"# copytree requires that the destination folder must not already exist\n",
"os.makedirs(os.path.join(exp_folder, 'constants'), exist_ok=True)\n",
"os.makedirs(os.path.join(exp_folder, 'constants', 'class_lists'), exist_ok=True)\n",
"os.makedirs(os.path.join(exp_folder, 'constants', 'splits'), exist_ok=True)\n",
"\n",
"copy('../constants/landsat_bands_info.py', os.path.join(exp_folder, 'constants'))\n",
"copy('../constants/class_lists/lulc_wcs_label_maps.json', os.path.join(exp_folder, 'constants', 'class_lists'))\n",
"copy('../constants/splits/full_sr_median_2013_2014_splits.json', os.path.join(exp_folder, 'constants', 'splits'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PyTorch.get_supported_versions()\n",
"PyTorch.DEFAULT_VERSION"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\n",
" '--config_module_path': 'experiments.baseline.baseline_config'\n",
"}\n",
"\n",
"pt_est = PyTorch(\n",
" source_directory=exp_folder,\n",
" script_params=script_params,\n",
" entry_script='train.py', # relative to source_directory\n",
" \n",
" inputs=[data_ref],\n",
" \n",
" compute_target=compute_target,\n",
" node_count=1,\n",
" use_gpu=True,\n",
" \n",
" # framework_version='1.3.1', # this version gets used, but can't specify it\n",
" \n",
" pip_packages=['pillow==6.1', 'tensorflow==1.14.0', \n",
" 'numpy', 'pandas', 'matplotlib', \n",
" 'geopandas', 'rasterio', 'scikit-image',\n",
" ]\n",
"\n",
"# Both of the following did not work (using conda does not work) \n",
"# - couldn't import rasterio.windows.Window if using conda_packages instead of pip_packages\n",
"\n",
"# conda_dependencies_file_path='training_environment.yml'\n",
"\n",
"# conda_packages=['numpy', 'pandas', 'matplotlib', \n",
"# 'geopandas', 'rasterio', 'scikit-image',\n",
"# 'tensorflow==1.14.0', 'pillow==6.1']\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(pt_est, tags=tags)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_details()['runId']\n",
"run.get_status()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### To archive an Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name='name_of_exp_to_archive')\n",
"exp.archive()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start TensorBoard\n",
"\n",
"https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-monitor-tensorboard\n",
"\n",
"We wrote logs to ./logs, which AML uploads to Artifact Service and makes available to a TensorBoard instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# recover the Run object if needed\n",
"run_id = 'wcs_baseline_<run_id>' # get the run_id from above cell or from Azure Portal\n",
"run = Run(exp, run_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes an array of runs\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# when done, call the stop() method of the Tensorboard object, or it will stay running even after your job completes.\n",
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The TensorBoard stops loading after a little while and needs to be restarted..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:wcs] *",
"language": "python",
"name": "conda-env-wcs-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View file

@ -0,0 +1,26 @@
#!/usr/bin/env bash
# run from training_wcs/
rm -rf scripts_and_config/constants
rm -rf scripts_and_config/experiment_config
mkdir scripts_and_config/constants
mkdir scripts_and_config/constants/class_lists
mkdir scripts_and_config/constants/splits
cp ../constants/landsat_bands_info.py scripts_and_config/constants
cp ../constants/class_lists/wcs_fine_coarse_label_maps.json scripts_and_config/constants/class_lists
mkdir scripts_and_config/experiment_config
cp ../viz_utils.py scripts_and_config
# MODIFY here and below at the `python` command to specify which experiment config to use
cp ./experiments/coarse_baseline/coarse_baseline_config.py scripts_and_config/experiment_config
# MODIFY for new experiment config file
# in specifying the config_module_path, remember only the config file itself is copied to folder experiment_config
# without the experiment grouping level folder present in the repo
/anaconda/envs/wcs/bin/python scripts_and_config/train.py --config_module_path scripts_and_config/experiment_config/coarse_baseline_config.py

View file

@ -0,0 +1,290 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Scores a list of tiles specified at the top of this script.
Scoring is not batched, i.e. batch size is 1. Some squeeze() calls would need to be removed to support larger batch sizes.
"""
import argparse
import importlib
import json
import math
import os
import sys
from datetime import datetime
import numpy as np
import rasterio
import torch
# SPECIFY input_tiles as a list of absolute paths to imagery .tif files
# with open('/home/boto/wcs/pycharm/constants/splits/full_sr_median_2013_2014_splits.json') as f:
# input_tiles_json = json.load(f)
# val_tiles = input_tiles_json['val']
# input_tiles = [os.path.join('/boto_disk_0/wcs_data/tiles/full_sr_median_2013_2014/tiles', i) for i in
# val_tiles]
# all 2019 - 2020 tiles
tiles_dir = '/home/boto/wcs/mnt/wcs-orinoquia/images_sr_median/2019_202004'
input_tiles = [os.path.join(tiles_dir, i) for i in os.listdir(tiles_dir)]
def write_colormap(tile_writer, config):
"""Write the experiment config's RasterLabelVisualizer's colormap to a TIFF file"""
tile_writer.write_colormap(1, config.label_viz.get_tiff_colormap()) # 1 for band 1
def main():
parser = argparse.ArgumentParser(description='Inference on tiles')
parser.add_argument(
'--config_module_path',
required=True,
help="Path to the .py file containing the experiment's configurations"
)
parser.add_argument(
'--checkpoint_path',
required=True,
help='Path to the checkpoint .tar file to use for scoring'
)
parser.add_argument(
'--out_dir',
default='./outputs',
help='Path to a dir to put the prediction tiles.'
)
parser.add_argument(
'--output_softmax',
action='store_true',
help='Outputs the softmax probability scores as tiff tiles too, and its visualization as RGB tiles.'
)
args = parser.parse_args()
assert os.path.exists(args.checkpoint_path), f'Checkpoint at {args.checkpoint_path} does not exist.'
out_dir = args.out_dir
os.makedirs(out_dir, exist_ok=True)
# config for the training run that generated the model checkpoint
try:
module_name = 'config'
spec = importlib.util.spec_from_file_location(module_name, args.config_module_path)
config = importlib.util.module_from_spec(spec)
sys.modules[module_name] = config
spec.loader.exec_module(config)
except Exception as e:
print(f'Failed to import the configurations. Exception: {e}')
sys.exit(1)
# device configuration
print(f'Using PyTorch version {torch.__version__}.')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}.')
# scoring configuration
chip_size = config.chip_size
prediction_window_size = config.prediction_window_size if config.prediction_window_size else 128
prediction_window_offset = int((chip_size - prediction_window_size) / 2)
print((f'Using chip_size {chip_size} and window_size {prediction_window_size}. '
f'So window_offset is {prediction_window_offset}'))
start_time = datetime.now()
# instantiate the model and then load its state from the given checkpoint
model = config.model
checkpoint = torch.load(args.checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['state_dict'])
print(f"Using checkpoint at epoch {checkpoint['epoch']}, step {checkpoint['step']}, "
f"val accuracy is {checkpoint.get('val_acc', 'Not Available')}")
model = model.to(device=device)
model.eval() # eval mode: norm or dropout layers will work in eval mode instead of training mode
with torch.no_grad(): # with autograd engine deactivated
for i_file, input_tile_path in enumerate(input_tiles):
out_path_hardmax = os.path.join(out_dir, 'res_' + os.path.basename(input_tile_path))
if os.path.exists(out_path_hardmax):
print(f'Skipping already scored tile {out_path_hardmax}')
continue
print(f'Scoring input tile {i_file} out of {len(input_tiles)}, {input_tile_path}')
# dict_scores = {} # dict of window tuple to numpy array of scores
# load entire tile into memory
tile_reader = rasterio.open(input_tile_path)
# use the get_chip function (to normalize the bands in a way that's consistent with training)
# but get a chip that's the size of the tile - all at once
whole_tile_window = (0, 0, tile_reader.width, tile_reader.height)
data_array: np.ndarray = config.get_chip(tile_reader, whole_tile_window, chip_for_display=False)
# pad by mirroring at the edges to facilitate predicting only on the center crop
data_array = np.pad(data_array,
[
(0, 0), # only pad height and width
(prediction_window_offset, prediction_window_offset), # height / rows
(prediction_window_offset, prediction_window_offset) # width / cols
],
mode='symmetric')
# set up hardmax prediction output tile
tile_writer_hardmax = rasterio.open(
out_path_hardmax,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=1, # only 1 "band", the hardmax predicted label at the pixel
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
# reads and writes are most efficient when the windows match the dataset's own block structure
blockysize=prediction_window_size
)
write_colormap(tile_writer_hardmax, config)
# set up the softmax output tile
if args.output_softmax:
out_path_softmax = os.path.join(out_dir, 'prob_' + os.path.basename(input_tile_path))
# probabilities projected into RGB for intuitive viewing
out_path_softmax_viz = os.path.join(out_dir, 'prob_viz_' + os.path.basename(input_tile_path))
tile_writer_softmax = rasterio.open(
out_path_softmax,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=config.num_classes, # as many "bands" as there are classes to house the softmax probabilities
# quantize probabilities scores so each can be stored as one byte instead of 4-byte float32
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
blockysize=prediction_window_size
)
tile_writer_softmax_viz = rasterio.open(
out_path_softmax_viz,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=3, # RGB
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
blockysize=prediction_window_size
)
# score the tile in windows
num_rows = math.ceil(tile_reader.height / prediction_window_size)
num_cols = math.ceil(tile_reader.width / prediction_window_size)
for col_idx in range(num_cols):
col_start = col_idx * prediction_window_size
col_end = col_start + chip_size
for row_idx in range(num_rows):
row_start = row_idx * prediction_window_size
row_end = row_start + chip_size
chip = data_array[:, row_start:row_end, col_start: col_end]
# pad to (chip_size, chip_size)
chip = np.pad(chip,
[(0, 0), (0, chip_size - chip.shape[1]), (0, chip_size - chip.shape[2])])
# processing it as the dataset loader _get_chip does
chip = np.nan_to_num(chip, nan=0.0, posinf=1.0, neginf=-1.0)
sat_mask = chip[0].squeeze() > 0.0 # mask out DEM data where there's no satellite data
chip = chip * sat_mask
chip = np.expand_dims(chip, axis=0)
chip = torch.FloatTensor(chip).to(device=device)
try:
scores = model(chip) # these are scores before the final softmax
except Exception as e:
print(f'Exception in scoring loop model() application: {e}')
print(f'Chip has shape {chip.shape}')
sys.exit(1)
_, preds = scores.max(1)
softmax_scores = torch.nn.functional.softmax(scores, dim=1)
softmax_scores = softmax_scores.cpu().numpy() # (batch_size, num_classes, H, W)
preds = preds.cpu().numpy().astype(np.uint8)
assert np.max(preds) < config.num_classes
# model output needs to be cropped to the window so they can be written correctly into the tile
# same order as rasterio window: (col_off x, row_off y, width delta_x, height delta_y)
valid_window_tup = (
col_start,
row_start,
min(prediction_window_size, tile_reader.width - col_idx * prediction_window_size),
min(prediction_window_size, tile_reader.height - row_idx * prediction_window_size)
)
# preds has a batch dim here
preds1 = preds[:,
prediction_window_offset:prediction_window_offset + valid_window_tup[3],
prediction_window_offset:prediction_window_offset + valid_window_tup[
2]] # last dim is the inner most, x, width
# debug - print(f'col is {col_idx}, row is {row_idx}, valid_window_tup is {valid_window_tup}, preds shape: {preds.shape}, preds1 shape: {preds1.shape}')
window = rasterio.windows.Window(valid_window_tup[0], valid_window_tup[1],
valid_window_tup[2], valid_window_tup[3])
tile_writer_hardmax.write(preds1, window=window)
if args.output_softmax:
# cropping, e.g. from (1, 14, 256, 256) to (1, 14, 128, 128)
softmax_scores = softmax_scores[:, :,
prediction_window_offset:prediction_window_offset + valid_window_tup[3],
prediction_window_offset:prediction_window_offset + valid_window_tup[2]]
# get rid of batch dim. First dim for TIFF writer needs to be number of bands to write
softmax_scores = softmax_scores.squeeze()
# quantize using 256 buckets so each probability value is written in 1 byte instead of 4 (float32)
softmax_scores_quantized = (softmax_scores * 255).astype(np.uint8)
tile_writer_softmax.write(softmax_scores_quantized, window=window)
# softmax_scores_proj is of dims ((batch_size), H, W, 3)
# make sure to not use the quantized version!
softmax_scores_proj = config.label_viz.visualize_softmax_predictions(softmax_scores)
softmax_scores_proj = softmax_scores_proj.squeeze() # assume batch_size is 1 TODO
# we need it to be (3, H, W) to write to TIFF
softmax_scores_proj = np.transpose(softmax_scores_proj, axes=(2, 0, 1)).astype(np.uint8)
# debug - print(f'softmax_scores_proj min is {np.min(softmax_scores_proj)}, max is {np.max(softmax_scores_proj)}')
tile_writer_softmax_viz.write(softmax_scores_proj, window=window)
tile_writer_hardmax.close()
tile_writer_softmax.close()
tile_writer_softmax_viz.close()
duration = datetime.now() - start_time
print(f'Inference finished in {duration}, which is {duration / len(input_tiles)} per tile')
if __name__ == '__main__':
main()

View file

@ -0,0 +1,321 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Training script that trains a model instantiated in the configuration file specified.
Adapted from https://github.com/pytorch/examples/blob/master/imagenet/main.py
"""
import argparse
import importlib
import logging
import os
import shutil
import sys
import math
from datetime import datetime
from shutil import copytree
from time import localtime, strftime
import torch
import torch.nn as nn
import torch.optim as optim
from utils.logger import Logger
from utils.train_utils import AverageMeter, log_sample_img_gt, render_prediction
# ai4eutils needs to be on the PYTHONPATH
from geospatial.enums import ExperimentConfigMode
# from azureml.core.run import Run # importing this would make all the logging.info not output to stdout
logging.basicConfig(stream=sys.stdout,
level=logging.INFO,
format='{asctime} {levelname} {message}',
style='{',
datefmt='%Y-%m-%d %H:%M:%S')
logging.info(f'Using PyTorch version {torch.__version__}.')
parser = argparse.ArgumentParser(description='UNet training')
parser.add_argument(
'--config_module_path',
help="Path to the .py file containing the experiment's configurations")
args = parser.parse_args()
# config for the run
try:
module_name = 'config'
spec = importlib.util.spec_from_file_location(module_name, args.config_module_path)
config = importlib.util.module_from_spec(spec)
sys.modules[module_name] = config
spec.loader.exec_module(config)
except Exception as e:
logging.error(f'Failed to import the configurations. Exception: {e}')
sys.exit(1)
# device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
logging.info(f'Using device: {device}.')
torch.set_default_dtype(config.dtype)
def get_sample_images(which_set='train'):
"""
Get a deterministic set of images in the specified set (train or val) by using the dataset and
not the dataloader. Only works if the dataset is not IterableDataset.
Args:
which_set: one of 'train' or 'val'
Returns:
samples: a dict with keys 'chip' and 'chip_label', pointing to torch Tensors of
dims (num_chips_to_visualize, channels, height, width) and (num_chips_to_visualize, height, width)
respectively
"""
assert which_set == 'train' or which_set == 'val'
dataset = dset_train if which_set == 'train' else dset_val
num_to_skip = 5 # first few chips might be mostly blank
assert len(dataset) > num_to_skip + config.num_chips_to_viz
keep_every = math.floor((len(dataset) - num_to_skip) / config.num_chips_to_viz)
sample_idx = range(num_to_skip, len(dataset), keep_every)
samples = dataset[sample_idx]
return samples
def visualize_result_on_samples(model, sample_images, logger, step, split='train'):
model.eval()
with torch.no_grad():
sample_images = sample_images.to(device=device)
scores = model(sample_images) # these are scores before the final softmax
# TODO Juan suggests that we also visualize the second or third most confident classes predicted
_, preds = scores.max(1)
preds = preds.cpu().numpy()
images_li = []
for i in range(preds.shape[0]):
preds_i = preds[i, :, :].squeeze()
im, buf = render_prediction(preds_i)
images_li.append((im, buf))
logger.image_summary(split, 'result', images_li, step)
def weights_init(m):
if isinstance(m, nn.Conv2d):
logging.info('Initializing weights with Kaiming uniform')
nn.init.kaiming_uniform_(m.weight)
m.bias.data.zero_()
def train(loader, model, criterion, optimizer, epoch, step, logger_train):
for batch_idx, data in enumerate(loader):
# put model in training mode; we put it in eval mode in visualize_result_on_samples every print_every steps
model.train()
step += 1
x = data['chip'].to(device=device) # move to device, e.g. GPU
y = data['chip_label'].to(device=device) # y is not an int value here; it is also an image (the label mask)
# forward pass on this batch
scores = model(x)
loss = criterion(scores, y)
# backward pass
optimizer.zero_grad()
loss.backward() # compute gradients
optimizer.step() # update parameters
# TensorBoard logging and print a line to stdout; note that the accuracy is wrt the current mini-batch only
if step % config.print_every == 1:
_, preds = scores.max(1) # equivalent to preds = scores.argmax(1)
accuracy = (y == preds).float().mean()
info = {'minibatch_loss': loss.item(), 'minibatch_accuracy': accuracy.item()}
for tag, value in info.items():
logger_train.scalar_summary(tag, value, step)
logging.info(
'Epoch {}, step {}, train minibatch_loss is {}, train minibatch_accuracy is {}'.format(
epoch, step, info['minibatch_loss'], info['minibatch_accuracy']))
return step
def evaluate(loader, model, criterion):
"""Evaluate the model on dataset of the loader"""
losses = AverageMeter()
accuracies = AverageMeter()
model.eval() # put model to evaluation mode
with torch.no_grad():
for t, data in enumerate(loader):
x = data['chip'].to(device=device) # move to device, e.g. GPU
y = data['chip_label'].to(device=device, dtype=torch.long)
scores = model(x)
loss = criterion(scores, y)
# DEBUG logging.debug('Val loss = %.4f' % loss.item())
_, preds = scores.max(1)
accuracy = (y == preds).float().mean()
losses.update(loss.item(), x.size(0))
accuracies.update(accuracy.item(), 1) # average already taken for accuracy for each pixel
return losses.avg, accuracies.avg
def save_checkpoint(state, is_best, checkpoint_dir='../checkpoints'):
"""
checkpoint_dir is used to save the best checkpoint if this checkpoint is the best one so far
"""
checkpoint_path = os.path.join(checkpoint_dir,
f"checkpoint_epoch{state['epoch'] - 1}_"
f"{strftime('%Y-%m-%d-%H-%M-%S', localtime())}.pth.tar")
torch.save(state, checkpoint_path)
if is_best:
shutil.copyfile(checkpoint_path, os.path.join(checkpoint_dir, 'model_best.pth.tar'))
def main():
out_dir = './outputs' if config.out_dir is None else config.out_dir
# if running locally, copy current version of scripts and config to output folder as a record
if out_dir != './outputs':
scripts_copy_dir = os.path.join(out_dir, 'repo_copy')
cwd = os.getcwd()
logging.info(f'cwd is {cwd}')
if 'scripts' not in cwd:
cwd = os.path.join(cwd, 'scripts')
copytree(cwd, scripts_copy_dir) # scripts_copy_dir cannot already exist
logging.info(f'Copied over scripts to output dir at {scripts_copy_dir}')
# create checkpoint dir
checkpoint_dir = os.path.join(out_dir, config.experiment_name, 'checkpoints')
os.makedirs(checkpoint_dir, exist_ok=True)
# model
model = config.model
model = model.to(device=device) # move the model parameters to CPU/GPU
if config.loss_weights is not None:
assert isinstance(config.loss_weights, torch.Tensor), \
'config.loss_weight needs to be of Tensor type'
assert len(config.loss_weights) == config.num_classes, \
f'config.loss_weight has length {len(config.loss_weights)} but needs to equal to num_classes'
criterion = nn.CrossEntropyLoss(weight=config.loss_weights).to(device=device)
optimizer = optim.Adam(model.parameters(), lr=config.init_learning_rate)
# resume from a checkpoint if provided
starting_checkpoint_path = config.starting_checkpoint_path
if starting_checkpoint_path and os.path.isfile(starting_checkpoint_path):
logging.info('Loading checkpoint from {}'.format(starting_checkpoint_path))
checkpoint = torch.load(starting_checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['state_dict'])
# don't load the optimizer settings so that a newly specified lr can take effect
# optimizer.load_state_dict(checkpoint['optimizer'])
starting_epoch = checkpoint['epoch'] # we incremented epoch before saving it, so can just start here
step = checkpoint.get('step', 0)
best_acc = checkpoint.get('best_acc', 0.0)
logging.info(f'Loaded checkpoint, starting epoch is {starting_epoch}, step is {step}, '
f'best accuracy is {best_acc}')
else:
logging.info('No valid checkpoint is provided. Start to train from scratch...')
model.apply(weights_init)
starting_epoch = 0
best_acc = 0.0
step = 0
# data sets and loaders, which will be added to the global scope for easy access in other functions
global dset_train, loader_train, dset_val, loader_val
dset_train = config.dset_train
loader_train = config.loader_train
dset_val = config.dset_val
loader_val = config.loader_val
logging.info('Getting sample chips from val and train set...')
samples_val = get_sample_images(which_set='val')
samples_train = get_sample_images(which_set='train')
# logging
# run = Run.get_context()
aml_run = None
logger_train = Logger('train', config.log_dir, config.batch_size, aml_run)
logger_val = Logger('val', config.log_dir, config.batch_size, aml_run)
log_sample_img_gt(logger_train, logger_val,
samples_train, samples_val)
logging.info('Logged image samples')
if config.config_mode == ExperimentConfigMode.EVALUATION:
val_loss, val_acc = evaluate(loader_val, model, criterion)
logging.info(f'Evaluated on val set, loss is {val_loss}, accuracy is {val_acc}')
return
for epoch in range(starting_epoch, config.total_epochs):
logging.info(f'\nEpoch {epoch} of {config.total_epochs}')
# train for one epoch
# we need the `step` concept for TensorBoard logging only
train_start_time = datetime.now()
step = train(loader_train, model, criterion, optimizer, epoch, step, logger_train)
train_duration = datetime.now() - train_start_time
# evaluate on val set
logging.info('Evaluating model on the val set at the end of epoch {}...'.format(epoch))
eval_start_time = datetime.now()
val_loss, val_acc = evaluate(loader_val, model, criterion)
eval_duration = datetime.now() - eval_start_time
logging.info(f'\nEpoch {epoch}, step {step}, val loss is {val_loss}, val accuracy is {val_acc}\n')
logger_val.scalar_summary('val_loss', val_loss, step)
logger_val.scalar_summary('val_acc', val_acc, step)
# visualize results on both train and val images
visualize_result_on_samples(model, samples_train['chip'], logger_train, step, split='train')
visualize_result_on_samples(model, samples_val['chip'], logger_val, step, split='val')
# log values and gradients of the parameters (histogram summary)
for tag, value in model.named_parameters():
tag = tag.replace('.', '/')
logger_train.histo_summary(tag, value.data.cpu().numpy(), step)
logger_train.histo_summary(tag + '/grad', value.grad.data.cpu().numpy(), step)
# record the best accuracy; save checkpoint for every epoch
is_best = val_acc > best_acc
best_acc = max(val_acc, best_acc)
logging.info(
f'Iterated through {step * config.batch_size} examples. Saved checkpoint for epoch {epoch}. '
f'Is it the highest accuracy checkpoint so far: {is_best}\n')
save_checkpoint({
# add 1 so when we restart from it, we can just read it and proceed
'epoch': epoch + 1,
'step': step,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
'val_acc': val_acc,
'best_acc': best_acc
}, is_best, checkpoint_dir)
# log execution time for this epoch
logging.info((f'epoch training duration is {train_duration.total_seconds()} seconds; '
f'evaluation duration is {eval_duration.total_seconds()} seconds'))
logger_val.scalar_summary('epoch_duration_train', train_duration.total_seconds(), step)
logger_val.scalar_summary('epoch_duration_val', eval_duration.total_seconds(), step)
if __name__ == '__main__':
main()

View file

@ -0,0 +1,81 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Transforms to be performed on the loaded chip: ToTensor, RandomHorizontalFlip, and RandomVerticalFlip.
Referencing: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
We could use https://github.com/mlagunas/pytorch-nptransforms/ instead of using torch.flip (Caleb suggested)
"""
import random
import torch
class ToTensor(object):
"""Convert ndarrays in sample to Tensors."""
def __call__(self, sample):
return {
# 'chip_id': sample['chip_id'],
# some tiles become float64 doubles for some reason if just using from_numpy (?)
'chip': torch.from_numpy(sample['chip']).to(dtype=torch.float32),
# torch.long is required to compute nn.CrossEntropyLoss
'chip_label': torch.from_numpy(sample['chip_label']).to(dtype=torch.long),
# 'chip_for_display': torch.from_numpy(sample['chip_for_display'])
}
class RandomHorizontalFlip(object):
""" Horizontally flip chip with probability 0.5
Horizontal flip - flip along width dim
Can use https://pytorch.org/docs/master/generated/torch.fliplr.html instead when it's available in > 1.5.1
chip is of dim (channel, height, width)
chip_for_display is of dim (height, width, channel)
chip_label is of dim (height, width)
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, sample):
if random.random() < self.prob:
return {
# 'chip_id': sample['chip_id'],
'chip': torch.flip(sample['chip'], [2]),
'chip_label': torch.flip(sample['chip_label'], [1]),
# 'chip_for_display': torch.flip(sample['chip_for_display'], [1])
}
else:
return sample
class RandomVerticalFlip(object):
""" Vertically flip chip with probability 0.5
Vertical flip - flip along height dim
Can use https://pytorch.org/docs/master/generated/torch.flipud.html instead when it's available in > 1.5.1
chip is of dim (channel, height, width)
chip_for_display is of dim (height, width, channel)
chip_label is of dim (height, width)
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, sample):
if random.random() < self.prob:
return {
# 'chip_id': sample['chip_id'],
'chip': torch.flip(sample['chip'], [1]),
'chip_label': torch.flip(sample['chip_label'], [0]),
# 'chip_for_display': torch.flip(sample['chip_for_display'], [0])
}
else:
return sample
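# Example usage (a sketch, assuming torchvision is available): each transform takes and returns
# the sample dict, so they can be chained with torchvision.transforms.Compose, e.g.
# from torchvision import transforms
# train_transform = transforms.Compose([ToTensor(), RandomHorizontalFlip(), RandomVerticalFlip()])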

View file

@ -0,0 +1,271 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
This script contains two classes used for iterating chips in the dataset:
- LandsatDataset: reads one chip at a time from tiles on disk (so very slow)
- SingleShardChipsDataset: loads an entire shard of pre-made chips into memory, so iteration is fast
"""
import logging
import math
import os
from random import shuffle
import numpy as np
import rasterio
from PIL import Image
from torch.utils.data.dataset import IterableDataset, Dataset
LABELS_TO_USE = 'tiles_masks_coarse' # 'tiles_masks' - which set of labels to use
class LandsatDataset(IterableDataset):
"""Class representing a dataset of Landsat image chips and label masks, such as a training_wcs set.
It reads one chip at a time from tiles stored in disk.
make_chip_shards.py iterate through an instance of this class for train and val set and store the chips
in numpy arrays for faster iteration during training_wcs.
"""
def __init__(self, data_dir, tile_names, get_chip_func,
get_chip_mask_func=None, repeat_chip_func=None,
chip_size=256, tile_size=2000, transform=None):
"""
Args:
data_dir (string): Directory containing the subfolders tiles and tiles_label_masks
tile_names (list of strings): Names of tiles in the split corresponding to this Dataset
get_chip_func (function): A function that returns (chip, chip_for_display) given the arguments
(rasterio reader, chip_window)
get_chip_mask_func (function): A function that returns chip_mask given the arguments
(rasterio reader, chip_window for PIL image)
transform (callable, optional): Optional transform to be applied
on a sample.
"""
super(LandsatDataset).__init__()
assert len(tile_names) > 0, 'tile_names passed to LandsatDataset is empty'
self.data_dir = data_dir
self.get_chip_func = get_chip_func
self.get_chip_mask_func = get_chip_mask_func
self.repeat_chip_func = repeat_chip_func
self.chip_size = chip_size
self.tile_names = tile_names
self.num_tiles = len(tile_names)
# how many rows and columns of chips can fit on this tile
# pixels per side / chip_size of 256
self.num_rows = math.ceil(tile_size / chip_size)
self.num_cols = self.num_rows # only support square tiles and chips
print(f'Number of rows or columns in a tile for making chips is {self.num_rows}')
self.transform = transform
self._shuffle_tiles = False
@property
def shuffle_tiles(self):
return self._shuffle_tiles
@shuffle_tiles.setter
def shuffle_tiles(self, value):
assert isinstance(value, bool)
self._shuffle_tiles = value
def __iter__(self):
# shuffle the tiles if the shuffle_tiles option is on (each epoch will get a new iterator instance, I think)
if self._shuffle_tiles:
print('Returning iterator instance with the tiles shuffled!')
shuffle(self.tile_names)
else:
print('Returning iterator instance with the tiles NOT shuffled!')
return self._get_generator()
@staticmethod
def _default_get_chip_mask(tile_label, chip_window):
"""
Args:
tile_label: PIL Image of the tile's label mask
chip_window: PIL window (left x, upper y, right x2, lower y2)
Returns:
chip_label as a numpy array of dtype np.uint8
"""
chip_label = tile_label.crop(box=chip_window)
chip_label = np.array(chip_label, dtype=np.uint8)
return chip_label
def _get_chip(self, tile_reader, tile_label, col_idx, row_idx):
"""Call the config supplied functions to get the imagery chip and optionally the label mask chip
referencing the col_idx and row_idx specified.
This function is responsible for
- observing entirely empty chips and masking out imagery pixels corresponding to label masks where
there is no label (category of 0).
- masking out label mask where there is no valid imagery available
(i.e. both imagery and label chips should be empty / of value 0 in the same pixels)
Args:
tile_reader: rasterio dataset object of the imagery tile
tile_label: PIL Image of the tile's label mask
col_idx: index of the column in the tile this call should reference
row_idx: index of the row in the tile this call should reference
Returns:
(chip, chip_label_masked, chip_for_display)
"""
# torch dimensions: batch, channel, height, width
# rasterio window is (col_off x, row_off y, width, height)
chip_window = (
col_idx * self.chip_size,
row_idx * self.chip_size,
self.chip_size,
self.chip_size
)
chip, chip_for_display = self.get_chip_func(tile_reader, chip_window) # dim is (C, H, W)
# where cloud is masked out on GEE, there are NaN values
# pixel values are normalized to [0, 1]; NDVI is [-1, 1]
chip = np.nan_to_num(chip, nan=0.0, posinf=1.0, neginf=-1.0)
# return None if satellite imagery in this chip is empty / not available
# sometimes the SWIR has non-zero values while all other bands are zero, so
# using band 2 (visible blue), which is the first in the stack, to mask out labels
if chip[0].max() == chip[0].min():
logging.debug('_get_chip, Empty imagery chip!')
return None, None, None
# mask for where data is available in the satellite chip,
# again using band 2 (visible blue) to determine if the satellite chip is available
# as opposed to summing across all bands
# also avoids the problem that the elevation layer has no gaps
sat_mask = chip[0].squeeze() > 0.0
# PIL window is (left x, upper y, right x2, lower y2)
mask_chip_window = (
col_idx * self.chip_size,
row_idx * self.chip_size,
col_idx * self.chip_size + self.chip_size,
row_idx * self.chip_size + self.chip_size
)
if self.get_chip_mask_func is None:
chip_label = LandsatDataset._default_get_chip_mask(tile_label, chip_window=mask_chip_window)
else:
chip_label = self.get_chip_mask_func(tile_label, chip_window=mask_chip_window)
if chip_label is None:
print('_get_chip, Skipped due to label mask not meeting criteria')
return None, None, None
# the label mask can also be empty on some places where we have imagery data
if chip_label.max() == 0 and chip_label.min() == 0:
logging.debug('_get_chip, Empty label mask chip!')
return None, None, None
label_available_mask = chip_label > 0
# we only want to keep pixels that have both data and label
data_label_mask = np.logical_and(sat_mask, label_available_mask)
chip = chip * data_label_mask
chip = chip * sat_mask # masking out the elevation and other bands that may not be zero where the blue band is
if chip_for_display.shape != (self.chip_size, self.chip_size, 3):
# default pad constant value is 0
chip_for_display = np.pad(chip_for_display,
[(0, self.chip_size - chip_for_display.shape[0]),
(0, self.chip_size - chip_for_display.shape[1]),
(0, 0)])
chip_for_display = chip_for_display * np.expand_dims(data_label_mask, axis=2)  # the (H, W) mask does not broadcast against (H, W, 3), so expand it to (H, W, 1)
chip_label_masked = chip_label * data_label_mask
# these are without the batch dimension, which will be added by the default collate fn as the outermost/first dim
return chip, chip_label_masked, chip_for_display
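# Toy illustration of the mutual masking above (made-up 2x2 values, not real data):
#   chip[0]    = [[0.2, 0.0], [0.5, 0.3]]  -> sat_mask             = [[T, F], [T, T]]
#   chip_label = [[3,   2  ], [0,   7  ]]  -> label_available_mask = [[T, T], [F, T]]
#   data_label_mask = AND of the two       =                          [[T, F], [F, T]]
# after multiplying by data_label_mask, imagery and label are zero in exactly the same pixels.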
def _get_generator(self):
num_chips = 0
for tile_idx, tile_name in enumerate(self.tile_names):
# print(f'Now accessing tile number {tile_idx}, {tile_name}')
tile_path = os.path.join(self.data_dir, 'tiles', tile_name)
parts = tile_name.split('.tif')[0].split('_')
lon = parts[-2]
lat = parts[-1]
mask_name = f'mask_{lon}_{lat}.tif'
mask_path = os.path.join(self.data_dir, LABELS_TO_USE, mask_name)
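# e.g. a tile named 'tile_-71.5_4.2.tif' (hypothetical name; only the trailing lon/lat parts matter)
# yields lon='-71.5', lat='4.2' and the corresponding mask name 'mask_-71.5_4.2.tif'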
tile_reader = rasterio.open(tile_path)
tile_label = Image.open(mask_path)
for row_idx in range(self.num_rows):
for col_idx in range(self.num_cols):
chip, chip_label, chip_for_display = self._get_chip(tile_reader, tile_label, col_idx, row_idx)
if chip is None:
continue # completely empty chip
sample = {
'chip_id': f'{tile_name}_col{col_idx}_row{row_idx}',
'chip': chip,
'chip_label': chip_label,
'chip_for_display': chip_for_display
}
if self.transform:
sample = self.transform(sample)
num_chips += 1
yield sample
# if repeat_chip_func is provided and flags this chip (e.g. to oversample chips containing
# rare classes), yield the same sample a second time
if self.repeat_chip_func is not None and self.repeat_chip_func(chip_label):
    yield sample
print(f'End of generator... Number of chips is {num_chips}')
class SingleShardChipsDataset(Dataset):
"""
A class for iterating over chips created using the make_chip_shards.py script. The entire shard is loaded
into memory as a numpy array, so iteration is fast.
Only supports a single shard currently. To support multiple shards effectively we probably need to
implement IterableDataset.
"""
def __init__(self, data_shard_dir, shard_prefix='train', channels=None, transform=None):
# available channels in this round are (2, 3, 6, 7, NDVI, elevation)
# so if you want all the channels to be used, set channels=(0, 1, 2, 3, 4, 5)
super().__init__()
self.transform = transform
self.chips = np.load(os.path.join(data_shard_dir, f'{shard_prefix}_chips_0.npy'))
if channels is not None:
assert max(channels) < self.chips.shape[1]
assert min(channels) >= 0
self.chips = self.chips[:, channels, :, :]
self.labels = np.load(os.path.join(data_shard_dir, f'{shard_prefix}_labels_0.npy'))
print(f'For prefix {shard_prefix}, loaded chips of dims {self.chips.shape}, labels of dims {self.labels.shape}')
def __len__(self):
return len(self.chips)
def __getitem__(self, idx):
sample = {
'chip': self.chips[idx, :, :, :],
'chip_label': self.labels[idx, :, :]
}
if self.transform:
sample = self.transform(sample)
return sample
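# Minimal usage sketch (illustrative only: the shard directory, channel selection, and DataLoader
# settings below are assumptions, not the values used elsewhere in this repo):
if __name__ == '__main__':
    from torch.utils.data import DataLoader

    dataset = SingleShardChipsDataset('/path/to/shards', shard_prefix='train', channels=(0, 1, 2, 3))
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
    for batch in loader:
        print(batch['chip'].shape, batch['chip_label'].shape)  # (B, C, H, W) and (B, H, W)
        break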

View file

@@ -0,0 +1,90 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
The Logger class logs metrics (scalars, images, and histograms) to an events file for display in TensorBoard.
Adapted from https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/04-utils/tensorboard/logger.py
Reference https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
"""
import os
import torch
import tensorflow as tf
import numpy as np
class Logger(object):
def __init__(self, split, log_dir, batch_size, aml_run=None):
"""Create a summary writer logging to log_dir.
Assumes that the batch size is constant throughout the run. We log
step * batch_size to be consistent across runs of different batch size.
"""
log_dir = os.path.join(log_dir, split)
self.writer = tf.compat.v1.summary.FileWriter(log_dir)
self.split = split
self.batch_size = batch_size
self.aml_run = aml_run
def scalar_summary(self, tag, value, step):
"""Log a scalar variable."""
step = step * self.batch_size
summary = tf.compat.v1.Summary(value=[tf.compat.v1.Summary.Value(tag=tag, simple_value=value)])
self.writer.add_summary(summary, step)
if self.aml_run:
self.aml_run.log(f'{self.split}/{tag}', value)
def image_summary(self, split, tag, images_and_buffers, step):
"""Log a list of images."""
step = step * self.batch_size
img_summaries = []
for i, (pil_im, buf) in enumerate(images_and_buffers):
# Build a TF Summary.Image proto from the already-encoded PNG bytes;
# PIL Image.size is (width, height), while tensors / numpy arrays are (height, width, ...)
if hasattr(pil_im, 'shape'):
    h, w = pil_im.shape[0], pil_im.shape[1]
else:
    w, h = pil_im.size
img_summary = tf.compat.v1.Summary.Image(encoded_image_string=buf.getvalue(),
                                          height=h,
                                          width=w)
# Create a Summary value
img_summaries.append(tf.compat.v1.Summary.Value(tag=f'{split}_{i}/{tag}', image=img_summary))
# Create and write Summary
summary = tf.compat.v1.Summary(value=img_summaries)
self.writer.add_summary(summary, step)
def histo_summary(self, tag, values, step, bins=1000):
"""Log a histogram of the tensor of values."""
step = step * self.batch_size
# Create a histogram using numpy
counts, bin_edges = np.histogram(values, bins=bins)
# Fill the fields of the histogram proto
hist = tf.compat.v1.HistogramProto()
hist.min = float(np.min(values))
hist.max = float(np.max(values))
hist.num = int(np.prod(values.shape))
hist.sum = float(np.sum(values))
hist.sum_squares = float(np.sum(values ** 2))
# Drop the start of the first bin
bin_edges = bin_edges[1:]
# Add bin edges and counts
for edge in bin_edges:
hist.bucket_limit.append(edge)
for c in counts:
hist.bucket.append(c)
# Create and write Summary
summary = tf.compat.v1.Summary(value=[tf.compat.v1.Summary.Value(tag=tag, histo=hist)])
self.writer.add_summary(summary, step)
self.writer.flush()
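# Minimal usage sketch (illustrative; the log directory, metric values, and step are made up):
if __name__ == '__main__':
    logger_train = Logger('train', './logs_example', batch_size=32)
    logger_train.scalar_summary('loss', 0.85, step=10)  # recorded at global step 10 * 32
    logger_train.histo_summary('sample_values', np.random.randn(1000), step=10)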

View file

@@ -0,0 +1,131 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from io import BytesIO
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from skimage import img_as_ubyte
from PIL import Image
from viz_utils import VizUtils
class AverageMeter(object):
"""Computes and stores the average and current value
https://github.com/pytorch/examples/blob/master/imagenet/main.py
"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
"""
Args:
val: mini-batch loss or accuracy value
n: mini-batch size
"""
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
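# Worked example of the running average (values are made up): two mini-batches of size 32 with
# losses 0.9 and 0.7 give avg = (0.9 * 32 + 0.7 * 32) / 64 = 0.8
#
#   loss_meter = AverageMeter()
#   loss_meter.update(0.9, n=32)
#   loss_meter.update(0.7, n=32)
#   loss_meter.avg  # 0.8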
viz_util = VizUtils()
ndvi_normalizer = mcolors.Normalize(vmin=-1, vmax=1)
def log_sample_img_gt(logger_train, logger_val,
samples_train, samples_val):
"""
Logs sample chips and ground truth labels to TensorBoard
Args:
logger_train: Logger object for the training loop, used every print_every iterations
logger_val: Logger object for the validation loop, used at the end of every epoch
samples_train: dict of tensors of chips and labels for the training set
samples_val: dict of tensors of chips and labels for the val set
Returns:
None
"""
for split, samples, logger in [
('train', samples_train, logger_train),
('val', samples_val, logger_val)
]:
chips = samples['chip']
labels = samples['chip_label']
print()
print(f'{split} chips has shape {chips.shape}')
tag = 'image bands 6, 3, 2'
band_combo = (2, 1, 0) # bands 6, 3, 2 in the baseline case
images_and_buffers = []
for i, sample in enumerate(chips):
im_np = sample[band_combo, :, :]
im_np = np.transpose(im_np, axes=(1, 2, 0))
# sanity check: img_as_ubyte below requires float input within [-1, 1]
# (without this print, values outside that range were occasionally observed; cause not yet identified)
print(f'{i}th sample shape is {im_np.shape}, min: {im_np.min()}, max: {im_np.max()}')
im = Image.fromarray(img_as_ubyte(im_np))
buf = BytesIO()
im.save(buf, format='png')
images_and_buffers.append((im_np, buf)) # image_summary reads the first two dims for height and width
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the NDVI of the sample chips
tag = 'ndvi'
ndvi_band = 4 # bands are (2, 3, 6, 7, NDVI, elevation) in baseline
images_and_buffers = []
for sample in chips:
ndvi = sample[ndvi_band, :, :].squeeze()
# TODO move to viz_utils as well
plt.figure(figsize=(4, 4))
plt.imshow(ndvi, cmap='viridis', norm=ndvi_normalizer)
buf = BytesIO()
plt.savefig(buf, format='png')
plt.close()
buf.seek(0)
im = Image.open(buf)
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the elevation of the sample chips if available; elevation is available in the tensor input as a channel
# note that we do not pass a normalizer to imshow() here, so the color scale is stretched to this
# chip's own min and max values, not to an absolute range across all elevation values
if chips[0].shape[0] == 6:
tag = 'elevation'
images_and_buffers = []
for sample in chips:
elevation = sample[5, :, :].squeeze()
im, buf = viz_util.show_single_band(elevation, size=(4, 4))
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the labels
tag = 'label'
images_and_buffers = []
for label_mask in labels:
im, buf = viz_util.show_label_raster(label_mask, size=(4, 4))
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, 0)
def render_prediction(hardmax):
im, buf = viz_util.show_label_raster(hardmax)
return im, buf
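# Minimal usage sketch for log_sample_img_gt (illustrative; train_loader / val_loader and the two
# Logger instances are assumed to be created elsewhere, and each batch dict is expected to contain
# 'chip' of shape (B, C, H, W) and 'chip_label' of shape (B, H, W)):
#
#   samples_train = next(iter(train_loader))
#   samples_val = next(iter(val_loader))
#   log_sample_img_gt(logger_train, logger_val, samples_train, samples_val)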

Binary data
visuals/in_the_tool.png Executable file

Binary file not shown (size: 2.7 MiB).

Binary data
visuals/unet.png Normal file

Binary file not shown (size: 118 KiB).