This commit is contained in:
Siyu Yang 2020-09-29 15:16:56 -07:00
Parent f19568dc53
Commit f86a0f6a67
21 changed files with 1866 additions and 202 deletions

View file

@ -1,5 +1,81 @@
# Land cover mapping the Orinoquía region
# Contributing
In this project, we worked with the Wildlife Conservation Society Colombia ([WCS Colombia](https://colombia.wcs.org/en-us/)) to create up-to-date land cover maps of the [Orinoquía](https://colombia.wcs.org/en-us/Wild-Places/Orinoquia.aspx) region in Colombia. This natural region encompasses a range of ecosystems, from seasonally flooded savanna to rainforest. In recent years, new agricultural production practices have put pressure on these ecosystems. It is therefore critically important to present information on land use to policy makers so that the need for local communities to develop can be balanced with conserving the biodiversity and ecological functions of the area.
Specifically, we used a land use and land cover (LULC) map that was manually produced using satellite imagery and field data from 2010-2012 to train a semantic segmentation model for 12 land cover classes. The model can be applied to composites of Landsat 8 imagery collected in subsequent years to enable ecological analysis.
This repo contains the scripts and configuration files used to train the land cover model using a median composite of Landsat 8 images from 2013-2014. We will be evaluating the result of applying the model to 2019-2020 imagery and releasing the resulting maps in the coming months.
## Installation
At the root directory of this repo, use `environment.yml` to create a conda virtual environment called `wcs`:
```bash
conda env create --file environment.yml
```
If you need additional packages, add them in `environment.yml` and update the environment:
```bash
conda env update --name wcs --file environment.yml --prune
```
### `ai4eutils`
You need to add the [`ai4eutils`](https://github.com/microsoft/ai4eutils) repo to the `PYTHONPATH`, as we make use of the `geospatial` module there.
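As a quick sanity check that the module can be imported (a minimal sketch; the clone path below is a placeholder for wherever you keep `ai4eutils`):
```python
# Minimal sketch: make ai4eutils importable for this session and import the geospatial module.
# AI4EUTILS_PATH is a placeholder; point it at your local clone, or set PYTHONPATH instead.
import os
import sys

AI4EUTILS_PATH = os.path.expanduser('~/repos/ai4eutils')
if AI4EUTILS_PATH not in sys.path:
    sys.path.append(AI4EUTILS_PATH)

from geospatial.enums import ExperimentConfigMode  # used by the training script
print(ExperimentConfigMode)
```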
### Setting up the Solaris package
We need to set up a separate conda environment for using Solaris. Instructions are available on https://github.com/CosmiQ/solaris.
We do not want to install the Solaris pip package inside the `wcs` environment because it requires versions of PyTorch/TensorFlow that we might not want.
There are two ways to set up Solaris:
1. To install Solaris using their published package on PyPI, first create a conda environment called `solaris`, which will make the next pip installation step go smoothly:
```
conda env create --file environment_solaris.yml
conda activate solaris
pip install solaris==0.2.2
```
2. If installing from source:
Once we clone the repo, add some more dependencies to its `environment.yml` so that Jupyter Notebooks run from our `wcs` environment can use the Solaris kernel to run certain notebooks:
```
- nb_conda_kernels
- ipykernel
- humanfriendly
```
Our fork of Solaris is here: https://github.com/yangsiyu007/solaris
Upstream: https://github.com/CosmiQ/solaris
### Setting up an environment for the newer version of GDAL
It seems that in our existing environment, `rasterio` is not compatible with `GDAL>=3.1`, which is required for creating Cloud Optimized GeoTIFFs (COGs). We create a separate environment for GDAL to run such commands in:
```bash
conda create --name gdalnew --channel conda-forge python gdal==3.1.2
```
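For reference, a COG can then be produced via GDAL's Python bindings in that environment (a sketch with placeholder file names, not a command taken from this repo):
```python
# Sketch: write a Cloud Optimized GeoTIFF with the COG driver available in GDAL >= 3.1.
# Run this from the `gdalnew` environment; the file names are placeholders.
from osgeo import gdal

gdal.Translate(
    'prediction_cog.tif',   # output COG (placeholder)
    'prediction.tif',       # input GeoTIFF (placeholder)
    format='COG',
    creationOptions=['COMPRESS=LZW', 'BLOCKSIZE=512'],
)
```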
## Terminology
See [Satellite data terminology](https://github.com/microsoft/ai4eutils/tree/master/geospatial#satellite-data-terminology).
## Related repositories
https://github.com/microsoft/landcover
https://github.com/microsoft/ai4eutils
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
@ -12,3 +88,7 @@ provided by the bot. You will only need to do this once across all repos using o
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## License
This repository is licensed with the MIT license. See [LICENSE](./LICENSE).

commands.sh — new file, 75 lines
View file

@ -0,0 +1,75 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#@IgnoreInspection BashAddShebang
# A record of often-used bash commands.
# Add ai4eutils to PYTHONPATH
export PYTHONPATH=${PYTHONPATH}:/home/<username>/wcs/repos/ai4eutils
# Tiling and masking the 2013-2014 set
python data/3_tile_and_mask.py \
--image_path /home/<username>/wcs/mnt/wcs-orinoquia/images_sr_median/2013_2014 \
--label_path /home/<username>/wcs/mnt/wcs-orinoquia/provided_labels/Landuse_shape/derived_20200421/landuse.shp \
--label_property Landuse_WC \
--out_dir /home/<username>/wcs/mnt/wcs-orinoquia/tiles/full_sr_median_2013_2014 \
--region_outline /home/<username>/wcs/mnt/wcs-orinoquia/misc/outline/Orinoquia_outline.shp
# Training locally on a DSVM
conda activate wcs
export PYTHONPATH="${PYTHONPATH}:/home/<username>/wcs/pycharm"
cd training
sh run_training_local.sh
# Scoring
# modify selection of tiles at the top of inference.py
checkpoint_path=/disk/wcs/wcs_coarse_baseline_0_wrong_val_viz/outputs/wcs_coarse_baseline/checkpoints/model_best.pth.tar
out_dir=/home/<username>/wcs/mnt/wcs-orinoquia/delivered/20200715/results_coarse_baseline_201920
python training/inference.py --config_module_path training/experiments/coarse_baseline/coarse_baseline_config_refactored.py --checkpoint_path ${checkpoint_path} --out_dir ${out_dir} --output_softmax
# Mounting blob container
sh mount_wcs_containers.sh
# OR, mounting manually:
sudo mkdir /mnt/blobfusetmp-wcs-orinoquia
sudo chown USERNAME /mnt/blobfusetmp-wcs-orinoquia
blobfuse /home/<username>/wcs/mnt/wcs-orinoquia --tmp-path=/mnt/blobfusetmp-wcs-orinoquia --config-file=wcs-orinoquia.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 -o allow_other
# Copy data to disk for faster access
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles_masks?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/tiles/full_sr_median_2013_2014/tiles_masks_coarse?SAS_KEY" /disk/wcs_data/tiles/full_sr_median_2013_2014 --recursive
# Elevation data
azcopy cp "https://geospatialmldata.blob.core.windows.net/wcs-orinoquia/images_srtm?SAS_KEY" /disk/wcs_data --recursive

View file

@ -1,3 +1,6 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
bands_use = {
# reference: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/
# surface reflectance product: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR

View file

@ -1,3 +1,6 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/* Landsat 8 Surface Reflectance imagery */
// imports

View file

@ -1,3 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Execute this script in QGIS to add the queried Sentinel 2 imagery as an XYZ tile to the project.
"""

View file

@ -1,6 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
make_data_shards.py

View file

@ -1,3 +1,12 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Tile scenes of imagery (Landsat 8 only) and corresponding polygon labels, and create masks for the latter
for training segmentation models. This uses the Solaris package and the conda env defined in `environment_solaris.yml`.
This is the script version of the notebook `data/3_tile_and_mask.ipynb`.
"""
import argparse
import os
import pickle

environment_solaris.yml — new file, 38 lines
View file

@ -0,0 +1,38 @@
# Reference: https://github.com/CosmiQ/solaris
name: solaris
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- python>=3.6
- pip>=19.0.3
- affine>=2.3.0
- albumentations=0.4.3
- fiona>=1.7.13
- gdal=3.0.3
- geopandas>=0.7.0
- matplotlib>=3.1.2
- networkx>=2.4
- numpy>=1.17.3
- opencv>=4.1
- pandas>=0.25.3
- pyproj>=2.1
- pytorch>=1.3.1
- pyyaml=5.2
- rasterio>=1.0.23
- requests=2.22.0
# - rio-cogeo>=1.1.6
- rtree>=0.9.3
- scikit-image>=0.16.2
- scipy>=1.3.2
- shapely>=1.6.4
- tensorflow=1.13.1
- torchvision>=0.5.0
- tqdm>=4.40.0
- urllib3>=1.25.7
- nb_conda_kernels
- ipykernel
- humanfriendly
- pip:
- git+https://github.com/Toblerity/shapely.git@master#egg=shapely-1.7.1dev # temporary, 1.8dev required for numpy array support

training_wcs/README.md — new file, 57 lines
View file

@ -0,0 +1,57 @@
# Model training
## Data
### Batching
Each tile is chipped in order (x then y: a full row of chips is finished before moving to the next row) before moving on to the next tile.
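A minimal sketch of that traversal order (chip size and tile size match the values used in this project, but the snippet is only illustrative):
```python
# Sketch of the chip traversal order within one tile: columns (x) vary fastest,
# so a full row of chips is produced before moving down to the next row.
import math

chip_size = 256
tile_width = tile_height = 2000  # tiles are roughly 2000 x 2000 pixels

num_cols = math.ceil(tile_width / chip_size)
num_rows = math.ceil(tile_height / chip_size)

chip_windows = []
for row_idx in range(num_rows):        # next row only after a full row of chips
    for col_idx in range(num_cols):    # x varies fastest
        # rasterio-style window: (col_off, row_off, width, height)
        chip_windows.append((col_idx * chip_size, row_idx * chip_size, chip_size, chip_size))

print(len(chip_windows), chip_windows[:3])
```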
### Folder structure and splits
The splits are stored in a JSON file, as a dict with `train` and `val` fields, pointing to lists of TIF image names, e.g. `wcs_orinoquia_trial_region_201301_201512_-69.74_3.742.tif`.
The splits are over tiles.
The image tiles and their label masks are stored together (not in separate folders for each split). Splitting is done by the datasets and loaders.
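A minimal sketch of reading such a splits file (the path shown is the one used by the baseline config; adjust it to your checkout):
```python
# Sketch: load the train/val split, which lists the tile file names belonging to each split.
import json

splits_file = './constants/splits/full_sr_median_2013_2014_splits.json'
with open(splits_file) as f:
    splits = json.load(f)

train_tiles = splits['train']
val_tiles = splits['val']
print(f'Train set has {len(train_tiles)} tiles; val set has {len(val_tiles)} tiles.')
```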
### Chip size
Using 256 for the width and height of the chips.
## Experiments
### `200211_mini_baseline`
We use the bands that seem most helpful, judging by what each band has typically been used for and by visualizing each band.
Bands used:
2, 3, 4, 5, 6, 7
Bands 6 and 7 have fairly different pixel value distributions, so we include both.
Bands 4 and 5 are combined to give the NDVI, so the channels are
2, 3, 6, 7, NDVI
(stacked in this order)
We keep the values as float32, the dtype rasterio reads them in as, to retain the original amount of information. We apply clipping, normalization, and gamma correction according to the parameters specified in the experiment config file (max, min, and gamma values; in this experiment gamma is 1, so there is no gamma correction), all the while keeping the data in float32 (`return_array` set to `True` returns float32 instead of uint8).
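A sketch of this preprocessing, assuming float32 band arrays already read from a tile and the clip values from the baseline config (illustrative, not the exact code used):
```python
# Sketch: clip each band to its [min, max], scale to [0, 1], optionally gamma-correct,
# then stack bands 2, 3, 6, 7 and NDVI as float32 channels.
import numpy as np

def normalize_band(band, band_min, band_max, gamma=1.0):
    band = np.clip(band.astype(np.float32), band_min, band_max)
    band = (band - band_min) / (band_max - band_min)  # now in [0, 1]
    return band ** gamma                              # gamma = 1 means no gamma correction

def ndvi(band4, band5):
    # NDVI = (NIR - red) / (NIR + red); band 5 is NIR and band 4 is red
    return (band5 - band4) / (band5 + band4 + 1e-6)

# band2 ... band7 would be float32 arrays read from the tile; clip maxes are from the config
# chip = np.stack([
#     normalize_band(band2, 0, 700),
#     normalize_band(band3, 0, 1500),
#     normalize_band(band6, 0, 5000),
#     normalize_band(band7, 0, 3000),
#     ndvi(band4, band5),
# ])  # shape (5, H, W), dtype float32
```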
### Data augmentation techniques
- Undersample chips that only contain the most common classes (a sketch of one possible scheme follows)
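One possible way to implement this undersampling (an illustrative assumption, not necessarily the exact scheme used; the common class ID comes from the baseline config):
```python
# Sketch: keep only a small random fraction of chips whose label mask contains
# nothing but the most common classes; keep all other chips.
import random
import numpy as np

COMMON_CLASSES = {12}   # common_classes in the baseline config
KEEP_PROB = 0.1         # illustrative keep probability

def keep_chip(chip_label: np.ndarray) -> bool:
    classes_present = set(np.unique(chip_label)) - {0}  # 0 is the empty / no-label class
    if classes_present and classes_present.issubset(COMMON_CLASSES):
        return random.random() < KEEP_PROB
    return True
```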
### Which bands are most helpful
Band 8 (panchromatic) needs to be downsampled; bands 10 and 11 need to be enlarged.
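A sketch of resampling a band onto the 30 m grid with rasterio's `out_shape` read (paths and sizes are placeholders):
```python
# Sketch: read a band resampled to the target grid size so that all bands share one shape.
import rasterio
from rasterio.enums import Resampling

target_height = target_width = 2000  # size of the 30 m bands in the tile (placeholder)

with rasterio.open('landsat8_band8_pan.tif') as src:       # placeholder path; 15 m band
    band8 = src.read(1, out_shape=(target_height, target_width), resampling=Resampling.average)

with rasterio.open('landsat8_band10_thermal.tif') as src:  # placeholder path; coarser band
    band10 = src.read(1, out_shape=(target_height, target_width), resampling=Resampling.bilinear)
```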
## Flooding and seasonality
Water has very low NDVI values since it does not reflect much NIR, so taking the greenest pixel should be okay for flooded savanna.

View file

@ -1,200 +0,0 @@
"""
Configurations for the 20200505_mini_baseline experiment
"""
import json
import os
import torch
import numpy as np
from viz_utils import VizUtils
from training_wcs.scripts.models.unet.unet import Unet # models.unet.unet import Unet
experiment_name = 'wcs_baseline_202005'
eval_mode = True
# I/O -------------------------------------------------------------------------------------------------
if not eval_mode:
aml_data_ref = os.environ.get('AZUREML_DATAREFERENCE_wcsorinoquia', '')
assert len(aml_data_ref) > 0, 'Reading aml_data_ref from environment vars resulted in empty string.'
data_dir = os.path.join(aml_data_ref, 'tiles', 'full_sr_median_2013_2014')
assert 'tiles' in os.listdir(data_dir)
assert 'tiles_masks' in os.listdir(data_dir)
# a dir with experiment_name will be created in here and checkpoints are saved here
# set as './outputs' for AML to stream to this Run's folder
out_dir = '/boto_disk_0/wcs/20190518_feature_scale_1/outputs' # './outputs'
os.makedirs(out_dir, exist_ok=True)
# TF events go here. Set it as './logs' if using AML so they can be streamed
log_dir = '/boto_disk_0/wcs/20190518_feature_scale_1/logs' # './logs'
# train/val splits are stored in
# on AML, this needs to be relative to the source_directory level
splits_file = './constants/splits/full_sr_median_2013_2014_splits.json' # '../training_wcs/scripts/constants/splits/full_sr_median_2013_2014_splits.json'
with open(splits_file) as f:
splits = json.load(f)
train_split = splits['train']
val_split = splits['val']
print(f'Train set has {len(train_split)} tiles; val set has {len(val_split)} tiles.')
# Training ----------------------------------------------------------------------------------------------
evaluate_only = False # Only evaluate the model on the val set once
# this is the *total* epoch; if restarting from a checkpoint, be sure to add the additional number of epochs
# to fine-tune on top of the original value of this var
total_epochs = 1000
print_every = 100 # how many steps between prints of the minibatch loss and accuracy
assert print_every >= 1, 'print_every needs to be greater than or equal to 1'
starting_checkpoint_path = None
init_learning_rate = 1e-4
batch_size = 24
# probability a chip is kept in the sample while sampling train and val chips at the start of training
# this should be smaller if we now have more training examples
# prob_keep_chip = 0.006
# 133 training tiles * 64 chips per tile = 8512 chips. Should keep every 177 if visualizing 48, which is 0.0056
keep_every = 30 # a balance between not covering all training tiles vs iterating through val tiles too many times
num_chips_to_viz = 48
# Hardware and framework --------------------------------------------------------------------------------
dtype = torch.float32
# Model -------------------------------------------------------------------------------------------------
num_classes = 34 # empty plus the 33 WCS classes; this is the number of output nodes
num_in_channels = 5 # 2, 3, 6, 7, NDVI
# the smallest number of filters is 64 when feature_scale is 1, and it is 32 when feature_scale is 2
feature_scale = 1
is_deconv = True # True to use transpose convolution filters to learn upsampling; otherwise upsampling is not learnt
is_batchnorm = True
model = Unet(feature_scale=feature_scale,
n_classes=num_classes,
in_channels=num_in_channels,
is_deconv=is_deconv,
is_batchnorm=is_batchnorm)
# Data ---------------------------------------------------------------------------------------------------
common_classes = [
12
]
less_common_classes = [
32, 33
]
weights = []
for i in range(num_classes):
if i in common_classes:
weights.append(1)
elif i in less_common_classes:
weights.append(2)
else:
weights.append(10)
loss_weights = torch.FloatTensor(weights) # None if no weighting for classes
print('Weights on loss per class used:')
print(loss_weights)
# how many subprocesses to use for data loading
# None for now - need to modify datasets.py to use
data_loader_num_workers = None
# data_loader_shuffle = True # True to have the data reshuffled at every epoch; not available with IterableDataset
chip_size = 256
# based on min and max values from the sample tile
# wcs_orinoquia_sr_median_2013_2014-0000007424-0000007424_-71.347_4.593.tif in the training set
# bands 4 and 5 are combined to get the NDVI, so the normalization params for 4 and 5 are
# not used during training data generation, only for visualization (actually not yet used for viz either).
bands_normalization_params = {
# these are the min and max to clip to for the band
'min': {
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0
},
'max': {
2: 700,
3: 1500,
4: 1500,
5: 5000,
6: 5000,
7: 3000
},
'gamma': { # all the same in this experiment with value 1 which means no effect
2: 1.0,
3: 1.0,
4: 1.0,
5: 1.0,
6: 1.0,
7: 1.0
}
}
viz_util = VizUtils()
def get_chip(tile_reader, chip_window):
"""
Returns:
A numpy array of dims (5, H, W)
"""
normal_bands = [2, 3, 6, 7] # bands to be used without calculating other indices e.g. NDVI
bands_to_stack = []
for b in normal_bands:
# getting one band at a time because min, max and gamma may be different
band = viz_util.show_landsat8_tile(
tile_reader,
bands=[b], # pass in a list to get the batch dimension in the results
window=chip_window,
band_min=bands_normalization_params['min'][b],
band_max=bands_normalization_params['max'][b],
gamma=bands_normalization_params['gamma'][b],
return_array=True
)
bands_to_stack.append(band) # band is 2D (h, w), already squeezed, dtype is float 32
ndvi = viz_util.show_landsat8_ndvi(tile_reader, window=chip_window) # 2D, dtype is float32
bands_to_stack.append(ndvi)
stacked = np.stack(bands_to_stack)
if stacked.shape != (5, chip_size, chip_size):
# default pad constant value is 0
stacked = np.pad(stacked,
[(0, 0), (0, chip_size - stacked.shape[1]), (0, chip_size - stacked.shape[2])])
assert stacked.shape == (5, chip_size, chip_size), f'Landsat chip has wrong shape: {stacked.shape}, should be (5, h, w)'
# prepare the chip for display
chip_for_display = viz_util.show_landsat8_tile(tile_reader,
window=chip_window,
band_max=3000, # what looks good for RGB
gamma=0.5,
return_array=True)
return stacked, chip_for_display

View file

@ -1,3 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
First experiment using 6 channels (2, 3, 6, 7, NDVI, elevation) with the 13 + 1 coarse categories
mapped June 29 2020.

View file

@ -0,0 +1,386 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.core.interactiveshell import InteractiveShell\n",
"InteractiveShell.ast_node_interactivity = 'all' # default is last_expr\n",
"\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from shutil import copy\n",
"\n",
"import azureml.core\n",
"from azureml.core import (Workspace, Experiment, Datastore, Dataset, \n",
" ContainerRegistry, ScriptRunConfig, RunConfiguration, \n",
" Run)\n",
"from azureml.train.dnn import PyTorch\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"from azureml.tensorboard import Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Version of AML: {}'.format(azureml.core.__version__))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run training on AML\n",
"We used this notebook to run experiments on [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-services/) earlier in the project, and since switched to running experiments on [Data Science Virtual Machines](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). This notebook is no longer maintained but is kept here as an example of using the AML Python SDK. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Provide credentials\n",
"\n",
"Provide the account name and the key to the storage account"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"storage_account_name = os.environ.get('STORAGE_ACCOUNT_NAME')\n",
"storage_account_key = os.environ.get('STORAGE_ACCOUNT_KEY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to the AML workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.get(name='<workspace_name>', \n",
" subscription_id='<subscription_id>', \n",
" resource_group='<resource_group_name>')\n",
"print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"compute_target = ws.compute_targets['gpu-nc6']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to datastore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datastore_name = 'wcsorinoquia'\n",
"container_name = 'wcs-orinoquia'\n",
"\n",
"datastore = None\n",
"for name, ds in ws.datastores.items():\n",
" if name == datastore_name:\n",
" datastore = ds\n",
" \n",
"if datastore is None:\n",
" datastore = Datastore.register_azure_blob_container(\n",
" workspace=ws, \n",
" datastore_name=datastore_name, \n",
" container_name=container_name,\n",
" account_name=storage_account_name, \n",
" account_key=storage_account_key,\n",
" create_if_not_exists=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_ref = DataReference(datastore=datastore,\n",
" data_reference_name=datastore_name,\n",
" mode='mount')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"str(data_ref)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an AML experiment and run configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'wcs_baseline_20200506'\n",
"\n",
"exp_folder = './scripts_and_config'\n",
"\n",
"tags = {\n",
" 'model': 'unet, feature scale 2',\n",
" \n",
" 'starting_from': 'None',\n",
" \n",
" 'init_learning_rate': str(1e-4),\n",
" \n",
" 'loss_weights': 'all the same, set to 1',\n",
" \n",
" 'batch_size': '32',\n",
" \n",
" 'imagery': 'full_sr_median_2013_2014',\n",
" \n",
" 'bands': '2, 3, 6, 7, NDVI'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.getcwd()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"copy('../viz_utils.py', exp_folder)\n",
"\n",
"# copytree requires that the destination folder must not already exist\n",
"os.makedirs(os.path.join(exp_folder, 'constants'), exist_ok=True)\n",
"os.makedirs(os.path.join(exp_folder, 'constants', 'class_lists'), exist_ok=True)\n",
"os.makedirs(os.path.join(exp_folder, 'constants', 'splits'), exist_ok=True)\n",
"\n",
"copy('../constants/landsat_bands_info.py', os.path.join(exp_folder, 'constants'))\n",
"copy('../constants/class_lists/lulc_wcs_label_maps.json', os.path.join(exp_folder, 'constants', 'class_lists'))\n",
"copy('../constants/splits/full_sr_median_2013_2014_splits.json', os.path.join(exp_folder, 'constants', 'splits'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PyTorch.get_supported_versions()\n",
"PyTorch.DEFAULT_VERSION"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\n",
" '--config_module_path': 'experiments.baseline.baseline_config'\n",
"}\n",
"\n",
"pt_est = PyTorch(\n",
" source_directory=exp_folder,\n",
" script_params=script_params,\n",
" entry_script='train.py', # relative to source_directory\n",
" \n",
" inputs=[data_ref],\n",
" \n",
" compute_target=compute_target,\n",
" node_count=1,\n",
" use_gpu=True,\n",
" \n",
" # framework_version='1.3.1', # this version gets used, but can't specify it\n",
" \n",
" pip_packages=['pillow==6.1', 'tensorflow==1.14.0', \n",
" 'numpy', 'pandas', 'matplotlib', \n",
" 'geopandas', 'rasterio', 'scikit-image',\n",
" ]\n",
"\n",
"# Both of the following did not work (using conda does not work) \n",
"# - couldn't import rasterio.windows.Window if using conda_packages instead of pip_packages\n",
"\n",
"# conda_dependencies_file_path='training_environment.yml'\n",
"\n",
"# conda_packages=['numpy', 'pandas', 'matplotlib', \n",
"# 'geopandas', 'rasterio', 'scikit-image',\n",
"# 'tensorflow==1.14.0', 'pillow==6.1']\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(pt_est, tags=tags)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_details()['runId']\n",
"run.get_status()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### To archive an Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name='name_of_exp_to_archive')\n",
"exp.archive()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start TensorBoard\n",
"\n",
"https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-monitor-tensorboard\n",
"\n",
"We wrote logs to ./logs, which AML uploads to Artifact Service and makes available to a TensorBoard instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# recover the Run object if needed\n",
"run_id = 'wcs_baseline_<run_id>' # get the run_id from above cell or from Azure Portal\n",
"run = Run(exp, run_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes an array of runs\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# when done, call the stop() method of the Tensorboard object, or it will stay running even after your job completes.\n",
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The TensorBoard stops loading after a little while and needs to be restarted..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:wcs] *",
"language": "python",
"name": "conda-env-wcs-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View file

@ -0,0 +1,26 @@
#!/usr/bin/env bash
# run from training_wcs/
rm -rf scripts_and_config/constants
rm -rf scripts_and_config/experiment_config
mkdir scripts_and_config/constants
mkdir scripts_and_config/constants/class_lists
mkdir scripts_and_config/constants/splits
cp ../constants/landsat_bands_info.py scripts_and_config/constants
cp ../constants/class_lists/wcs_fine_coarse_label_maps.json scripts_and_config/constants/class_lists
mkdir scripts_and_config/experiment_config
cp ../viz_utils.py scripts_and_config
# MODIFY here and below at the `python` command to specify which experiment config to use
cp ./experiments/coarse_baseline/coarse_baseline_config.py scripts_and_config/experiment_config
# MODIFY for new experiment config file
# in specifying the config_module_path, remember only the config file itself is copied to folder experiment_config
# without the experiment grouping level folder present in the repo
/anaconda/envs/wcs/bin/python scripts_and_config/train.py --config_module_path scripts_and_config/experiment_config/coarse_baseline_config.py

View file

@ -0,0 +1,290 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Scores a list of tiles specified at the top of this script.
Scoring is not batched, i.e. batch size is 1. Some squeeze() calls would need to be removed to support larger batch sizes.
"""
import argparse
import importlib
import json
import math
import os
import sys
from datetime import datetime
import numpy as np
import rasterio
import torch
# SPECIFY input_tiles as a list of absolute paths to imagery .tif files
# with open('/home/boto/wcs/pycharm/constants/splits/full_sr_median_2013_2014_splits.json') as f:
# input_tiles_json = json.load(f)
# val_tiles = input_tiles_json['val']
# input_tiles = [os.path.join('/boto_disk_0/wcs_data/tiles/full_sr_median_2013_2014/tiles', i) for i in
# val_tiles]
# all 2019 - 2020 tiles
tiles_dir = '/home/boto/wcs/mnt/wcs-orinoquia/images_sr_median/2019_202004'
input_tiles = [os.path.join(tiles_dir, i) for i in os.listdir(tiles_dir)]
def write_colormap(tile_writer, config):
"""Write the experiment config's RasterLabelVisualizer's colormap to a TIFF file"""
tile_writer.write_colormap(1, config.label_viz.get_tiff_colormap()) # 1 for band 1
def main():
parser = argparse.ArgumentParser(description='Inference on tiles')
parser.add_argument(
'--config_module_path',
required=True,
help="Path to the .py file containing the experiment's configurations"
)
parser.add_argument(
'--checkpoint_path',
required=True,
help='Path to the checkpoint .tar file to use for scoring'
)
parser.add_argument(
'--out_dir',
default='./outputs',
help='Path to a dir to put the prediction tiles.'
)
parser.add_argument(
'--output_softmax',
action='store_true',
help='Outputs the softmax probability scores as tiff tiles too, and its visualization as RGB tiles.'
)
args = parser.parse_args()
assert os.path.exists(args.checkpoint_path), f'Checkpoint at {args.checkpoint_path} does not exist.'
out_dir = args.out_dir
os.makedirs(out_dir, exist_ok=True)
# config for the training run that generated the model checkpoint
try:
module_name = 'config'
spec = importlib.util.spec_from_file_location(module_name, args.config_module_path)
config = importlib.util.module_from_spec(spec)
sys.modules[module_name] = config
spec.loader.exec_module(config)
except Exception as e:
print(f'Failed to import the configurations. Exception: {e}')
sys.exit(1)
# device configuration
print(f'Using PyTorch version {torch.__version__}.')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}.')
# scoring configuration
chip_size = config.chip_size
prediction_window_size = config.prediction_window_size if config.prediction_window_size else 128
prediction_window_offset = int((chip_size - prediction_window_size) / 2)
print((f'Using chip_size {chip_size} and window_size {prediction_window_size}. '
f'So window_offset is {prediction_window_offset}'))
start_time = datetime.now()
# instantiate the model and then load its state from the given checkpoint
model = config.model
checkpoint = torch.load(args.checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['state_dict'])
print(f"Using checkpoint at epoch {checkpoint['epoch']}, step {checkpoint['step']}, "
f"val accuracy is {checkpoint.get('val_acc', 'Not Available')}")
model = model.to(device=device)
model.eval() # eval mode: norm or dropout layers will work in eval mode instead of training mode
with torch.no_grad(): # with autograd engine deactivated
for i_file, input_tile_path in enumerate(input_tiles):
out_path_hardmax = os.path.join(out_dir, 'res_' + os.path.basename(input_tile_path))
if os.path.exists(out_path_hardmax):
print(f'Skipping already scored tile {out_path_hardmax}')
continue
print(f'Scoring input tile {i_file} out of {len(input_tiles)}, {input_tile_path}')
# dict_scores = {} # dict of window tuple to numpy array of scores
# load entire tile into memory
tile_reader = rasterio.open(input_tile_path)
# use the get_chip function (to normalize the bands in a way that's consistent with training)
# but get a chip that's the size of the tile - all at once
whole_tile_window = (0, 0, tile_reader.width, tile_reader.height)
data_array: np.ndarray = config.get_chip(tile_reader, whole_tile_window, chip_for_display=False)
# pad by mirroring at the edges to facilitate predicting only on the center crop
data_array = np.pad(data_array,
[
(0, 0), # only pad height and width
(prediction_window_offset, prediction_window_offset), # height / rows
(prediction_window_offset, prediction_window_offset) # width / cols
],
mode='symmetric')
# set up hardmax prediction output tile
tile_writer_hardmax = rasterio.open(
out_path_hardmax,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=1, # only 1 "band", the hardmax predicted label at the pixel
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
# reads and writes are most efficient when the windows match the dataset's own block structure
blockysize=prediction_window_size
)
write_colormap(tile_writer_hardmax, config)
# set up the softmax output tile
if args.output_softmax:
out_path_softmax = os.path.join(out_dir, 'prob_' + os.path.basename(input_tile_path))
# probabilities projected into RGB for intuitive viewing
out_path_softmax_viz = os.path.join(out_dir, 'prob_viz_' + os.path.basename(input_tile_path))
tile_writer_softmax = rasterio.open(
out_path_softmax,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=config.num_classes, # as many "bands" as there are classes to house the softmax probabilities
# quantize probabilities scores so each can be stored as one byte instead of 4-byte float32
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
blockysize=prediction_window_size
)
tile_writer_softmax_viz = rasterio.open(
out_path_softmax_viz,
'w',
driver='GTiff',
height=tile_reader.height,
width=tile_reader.width,
count=3, # RGB
dtype=np.uint8,
crs=tile_reader.crs,
transform=tile_reader.transform,
nodata=0,
compress='lzw',
blockxsize=prediction_window_size,
blockysize=prediction_window_size
)
# score the tile in windows
num_rows = math.ceil(tile_reader.height / prediction_window_size)
num_cols = math.ceil(tile_reader.width / prediction_window_size)
for col_idx in range(num_cols):
col_start = col_idx * prediction_window_size
col_end = col_start + chip_size
for row_idx in range(num_rows):
row_start = row_idx * prediction_window_size
row_end = row_start + chip_size
chip = data_array[:, row_start:row_end, col_start: col_end]
# pad to (chip_size, chip_size)
chip = np.pad(chip,
[(0, 0), (0, chip_size - chip.shape[1]), (0, chip_size - chip.shape[2])])
# processing it as the dataset loader _get_chip does
chip = np.nan_to_num(chip, nan=0.0, posinf=1.0, neginf=-1.0)
sat_mask = chip[0].squeeze() > 0.0 # mask out DEM data where there's no satellite data
chip = chip * sat_mask
chip = np.expand_dims(chip, axis=0)
chip = torch.FloatTensor(chip).to(device=device)
try:
scores = model(chip) # these are scores before the final softmax
except Exception as e:
print(f'Exception in scoring loop model() application: {e}')
print(f'Chip has shape {chip.shape}')
sys.exit(1)
_, preds = scores.max(1)
softmax_scores = torch.nn.functional.softmax(scores, dim=1)
softmax_scores = softmax_scores.cpu().numpy() # (batch_size, num_classes, H, W)
preds = preds.cpu().numpy().astype(np.uint8)
assert np.max(preds) < config.num_classes
# model output needs to be cropped to the window so they can be written correctly into the tile
# same order as rasterio window: (col_off x, row_off y, width delta_x, height delta_y)
valid_window_tup = (
col_start,
row_start,
min(prediction_window_size, tile_reader.width - col_idx * prediction_window_size),
min(prediction_window_size, tile_reader.height - row_idx * prediction_window_size)
)
# preds has a batch dim here
preds1 = preds[:,
prediction_window_offset:prediction_window_offset + valid_window_tup[3],
prediction_window_offset:prediction_window_offset + valid_window_tup[
2]] # last dim is the inner most, x, width
# debug - print(f'col is {col_idx}, row is {row_idx}, valid_window_tup is {valid_window_tup}, preds shape: {preds.shape}, preds1 shape: {preds1.shape}')
window = rasterio.windows.Window(valid_window_tup[0], valid_window_tup[1],
valid_window_tup[2], valid_window_tup[3])
tile_writer_hardmax.write(preds1, window=window)
if args.output_softmax:
# cropping, e.g. from (1, 14, 256, 256) to (1, 14, 128, 128)
softmax_scores = softmax_scores[:, :,
prediction_window_offset:prediction_window_offset + valid_window_tup[3],
prediction_window_offset:prediction_window_offset + valid_window_tup[2]]
# get rid of batch dim. First dim for TIFF writer needs to be number of bands to write
softmax_scores = softmax_scores.squeeze()
# quantize using 256 buckets so each probability value is written in 1 byte instead of 4 (float32)
softmax_scores_quantized = (softmax_scores * 255).astype(np.uint8)
tile_writer_softmax.write(softmax_scores_quantized, window=window)
# softmax_scores_proj is of dims ((batch_size), H, W, 3)
# make sure to not use the quantized version!
softmax_scores_proj = config.label_viz.visualize_softmax_predictions(softmax_scores)
softmax_scores_proj = softmax_scores_proj.squeeze() # assume batch_size is 1 TODO
# we need it to be (3, H, W) to write to TIFF
softmax_scores_proj = np.transpose(softmax_scores_proj, axes=(2, 0, 1)).astype(np.uint8)
# debug - print(f'softmax_scores_proj min is {np.min(softmax_scores_proj)}, max is {np.max(softmax_scores_proj)}')
tile_writer_softmax_viz.write(softmax_scores_proj, window=window)
tile_writer_hardmax.close()
tile_writer_softmax.close()
tile_writer_softmax_viz.close()
duration = datetime.now() - start_time
print(f'Inference finished in {duration}, which is {duration / len(input_tiles)} per tile')
if __name__ == '__main__':
main()

View file

@ -0,0 +1,321 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Training script that trains a model instantiated in the configuration file specified.
Adapted from https://github.com/pytorch/examples/blob/master/imagenet/main.py
"""
import argparse
import importlib
import logging
import os
import shutil
import sys
import math
from datetime import datetime
from shutil import copytree
from time import localtime, strftime
import torch
import torch.nn as nn
import torch.optim as optim
from utils.logger import Logger
from utils.train_utils import AverageMeter, log_sample_img_gt, render_prediction
# ai4eutils needs to be on the PYTHONPATH
from geospatial.enums import ExperimentConfigMode
# from azureml.core.run import Run # importing this would make all the logging.info not output to stdout
logging.basicConfig(stream=sys.stdout,
level=logging.INFO,
format='{asctime} {levelname} {message}',
style='{',
datefmt='%Y-%m-%d %H:%M:%S')
logging.info(f'Using PyTorch version {torch.__version__}.')
parser = argparse.ArgumentParser(description='UNet training')
parser.add_argument(
'--config_module_path',
help="Path to the .py file containing the experiment's configurations")
args = parser.parse_args()
# config for the run
try:
module_name = 'config'
spec = importlib.util.spec_from_file_location(module_name, args.config_module_path)
config = importlib.util.module_from_spec(spec)
sys.modules[module_name] = config
spec.loader.exec_module(config)
except Exception as e:
logging.error(f'Failed to import the configurations. Exception: {e}')
sys.exit(1)
# device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
logging.info(f'Using device: {device}.')
torch.set_default_dtype(config.dtype)
def get_sample_images(which_set='train'):
"""
Get a deterministic set of images in the specified set (train or val) by using the dataset and
not the dataloader. Only works if the dataset is not IterableDataset.
Args:
which_set: one of 'train' or 'val'
Returns:
samples: a dict with keys 'chip' and 'chip_label', pointing to torch Tensors of
dims (num_chips_to_visualize, channels, height, width) and (num_chips_to_visualize, height, width)
respectively
"""
assert which_set == 'train' or which_set == 'val'
dataset = dset_train if which_set == 'train' else dset_val
num_to_skip = 5 # first few chips might be mostly blank
assert len(dataset) > num_to_skip + config.num_chips_to_viz
keep_every = math.floor((len(dataset) - num_to_skip) / config.num_chips_to_viz)
sample_idx = range(num_to_skip, len(dataset), keep_every)
samples = dataset[sample_idx]
return samples
def visualize_result_on_samples(model, sample_images, logger, step, split='train'):
model.eval()
with torch.no_grad():
sample_images = sample_images.to(device=device)
scores = model(sample_images) # these are scores before the final softmax
# TODO Juan suggests that we also visualize the second or third most confident classes predicted
_, preds = scores.max(1)
preds = preds.cpu().numpy()
images_li = []
for i in range(preds.shape[0]):
preds_i = preds[i, :, :].squeeze()
im, buf = render_prediction(preds_i)
images_li.append((im, buf))
logger.image_summary(split, 'result', images_li, step)
def weights_init(m):
if isinstance(m, nn.Conv2d):
logging.info('Initializing weights with Kaiming uniform')
nn.init.kaiming_uniform_(m.weight)
m.bias.data.zero_()
def train(loader, model, criterion, optimizer, epoch, step, logger_train):
for batch_idx, data in enumerate(loader):
# put model in training mode; we put it in eval mode in visualize_result_on_samples every print_every steps
model.train()
step += 1
x = data['chip'].to(device=device) # move to device, e.g. GPU
y = data['chip_label'].to(device=device) # y is not an int value here; it is also an image (the label mask)
# forward pass on this batch
scores = model(x)
loss = criterion(scores, y)
# backward pass
optimizer.zero_grad()
loss.backward() # compute gradients
optimizer.step() # update parameters
# TensorBoard logging and print a line to stdout; note that the accuracy is wrt the current mini-batch only
if step % config.print_every == 1:
_, preds = scores.max(1) # equivalent to preds = scores.argmax(1)
accuracy = (y == preds).float().mean()
info = {'minibatch_loss': loss.item(), 'minibatch_accuracy': accuracy.item()}
for tag, value in info.items():
logger_train.scalar_summary(tag, value, step)
logging.info(
'Epoch {}, step {}, train minibatch_loss is {}, train minibatch_accuracy is {}'.format(
epoch, step, info['minibatch_loss'], info['minibatch_accuracy']))
return step
def evaluate(loader, model, criterion):
"""Evaluate the model on dataset of the loader"""
losses = AverageMeter()
accuracies = AverageMeter()
model.eval() # put model to evaluation mode
with torch.no_grad():
for t, data in enumerate(loader):
x = data['chip'].to(device=device) # move to device, e.g. GPU
y = data['chip_label'].to(device=device, dtype=torch.long)
scores = model(x)
loss = criterion(scores, y)
# DEBUG logging.debug('Val loss = %.4f' % loss.item())
_, preds = scores.max(1)
accuracy = (y == preds).float().mean()
losses.update(loss.item(), x.size(0))
accuracies.update(accuracy.item(), 1) # average already taken for accuracy for each pixel
return losses.avg, accuracies.avg
def save_checkpoint(state, is_best, checkpoint_dir='../checkpoints'):
"""
checkpoint_dir is used to save the best checkpoint if this checkpoint is the best one so far
"""
checkpoint_path = os.path.join(checkpoint_dir,
f"checkpoint_epoch{state['epoch'] - 1}_"
f"{strftime('%Y-%m-%d-%H-%M-%S', localtime())}.pth.tar")
torch.save(state, checkpoint_path)
if is_best:
shutil.copyfile(checkpoint_path, os.path.join(checkpoint_dir, 'model_best.pth.tar'))
def main():
out_dir = './outputs' if config.out_dir is None else config.out_dir
# if running locally, copy current version of scripts and config to output folder as a record
if out_dir != './outputs':
scripts_copy_dir = os.path.join(out_dir, 'repo_copy')
cwd = os.getcwd()
logging.info(f'cwd is {cwd}')
if 'scripts' not in cwd:
cwd = os.path.join(cwd, 'scripts')
copytree(cwd, scripts_copy_dir) # scripts_copy_dir cannot already exist
logging.info(f'Copied over scripts to output dir at {scripts_copy_dir}')
# create checkpoint dir
checkpoint_dir = os.path.join(out_dir, config.experiment_name, 'checkpoints')
os.makedirs(checkpoint_dir, exist_ok=True)
# model
model = config.model
model = model.to(device=device) # move the model parameters to CPU/GPU
if config.loss_weights is not None:
assert isinstance(config.loss_weights, torch.Tensor), \
'config.loss_weight needs to be of Tensor type'
assert len(config.loss_weights) == config.num_classes, \
f'config.loss_weight has length {len(config.loss_weights)} but needs to equal to num_classes'
criterion = nn.CrossEntropyLoss(weight=config.loss_weights).to(device=device)
optimizer = optim.Adam(model.parameters(), lr=config.init_learning_rate)
# resume from a checkpoint if provided
starting_checkpoint_path = config.starting_checkpoint_path
if starting_checkpoint_path and os.path.isfile(starting_checkpoint_path):
logging.info('Loading checkpoint from {}'.format(starting_checkpoint_path))
checkpoint = torch.load(starting_checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['state_dict'])
# don't load the optimizer settings so that a newly specified lr can take effect
# optimizer.load_state_dict(checkpoint['optimizer'])
starting_epoch = checkpoint['epoch'] # we incremented epoch before saving it, so can just start here
step = checkpoint.get('step', 0)
best_acc = checkpoint.get('best_acc', 0.0)
logging.info(f'Loaded checkpoint, starting epoch is {starting_epoch}, step is {step}, '
f'best accuracy is {best_acc}')
else:
logging.info('No valid checkpoint is provided. Start to train from scratch...')
model.apply(weights_init)
starting_epoch = 0
best_acc = 0.0
step = 0
# data sets and loaders, which will be added to the global scope for easy access in other functions
global dset_train, loader_train, dset_val, loader_val
dset_train = config.dset_train
loader_train = config.loader_train
dset_val = config.dset_val
loader_val = config.loader_val
logging.info('Getting sample chips from val and train set...')
samples_val = get_sample_images(which_set='val')
samples_train = get_sample_images(which_set='train')
# logging
# run = Run.get_context()
aml_run = None
logger_train = Logger('train', config.log_dir, config.batch_size, aml_run)
logger_val = Logger('val', config.log_dir, config.batch_size, aml_run)
log_sample_img_gt(logger_train, logger_val,
samples_train, samples_val)
logging.info('Logged image samples')
if config.config_mode == ExperimentConfigMode.EVALUATION:
val_loss, val_acc = evaluate(loader_val, model, criterion)
logging.info(f'Evaluated on val set, loss is {val_loss}, accuracy is {val_acc}')
return
for epoch in range(starting_epoch, config.total_epochs):
logging.info(f'\nEpoch {epoch} of {config.total_epochs}')
# train for one epoch
# we need the `step` concept for TensorBoard logging only
train_start_time = datetime.now()
step = train(loader_train, model, criterion, optimizer, epoch, step, logger_train)
train_duration = datetime.now() - train_start_time
# evaluate on val set
logging.info('Evaluating model on the val set at the end of epoch {}...'.format(epoch))
eval_start_time = datetime.now()
val_loss, val_acc = evaluate(loader_val, model, criterion)
eval_duration = datetime.now() - eval_start_time
logging.info(f'\nEpoch {epoch}, step {step}, val loss is {val_loss}, val accuracy is {val_acc}\n')
logger_val.scalar_summary('val_loss', val_loss, step)
logger_val.scalar_summary('val_acc', val_acc, step)
# visualize results on both train and val images
visualize_result_on_samples(model, samples_train['chip'], logger_train, step, split='train')
visualize_result_on_samples(model, samples_val['chip'], logger_val, step, split='val')
# log values and gradients of the parameters (histogram summary)
for tag, value in model.named_parameters():
tag = tag.replace('.', '/')
logger_train.histo_summary(tag, value.data.cpu().numpy(), step)
logger_train.histo_summary(tag + '/grad', value.grad.data.cpu().numpy(), step)
# record the best accuracy; save checkpoint for every epoch
is_best = val_acc > best_acc
best_acc = max(val_acc, best_acc)
logging.info(
f'Iterated through {step * config.batch_size} examples. Saved checkpoint for epoch {epoch}. '
f'Is it the highest accuracy checkpoint so far: {is_best}\n')
save_checkpoint({
# add 1 so when we restart from it, we can just read it and proceed
'epoch': epoch + 1,
'step': step,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
'val_acc': val_acc,
'best_acc': best_acc
}, is_best, checkpoint_dir)
# log execution time for this epoch
logging.info((f'epoch training duration is {train_duration.total_seconds()} seconds; '
f'evaluation duration is {eval_duration.total_seconds()} seconds'))
logger_val.scalar_summary('epoch_duration_train', train_duration.total_seconds(), step)
logger_val.scalar_summary('epoch_duration_val', eval_duration.total_seconds(), step)
if __name__ == '__main__':
main()

View file

@ -0,0 +1,81 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
Transforms to be performed on the loaded chip: ToTensor, RandomHorizontalFlip, and RandomVerticalFlip.
Referencing: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
We could use https://github.com/mlagunas/pytorch-nptransforms/ instead of using torch.flip (Caleb suggested)
"""
import random
import torch
class ToTensor(object):
"""Convert ndarrays in sample to Tensors."""
def __call__(self, sample):
return {
# 'chip_id': sample['chip_id'],
# some tiles become float64 doubles for some reason if just using from_numpy (?)
'chip': torch.from_numpy(sample['chip']).to(dtype=torch.float32),
# torch.long is required to compute nn.CrossEntropyLoss
'chip_label': torch.from_numpy(sample['chip_label']).to(dtype=torch.long),
# 'chip_for_display': torch.from_numpy(sample['chip_for_display'])
}
class RandomHorizontalFlip(object):
""" Horizontally flip chip with probability 0.5
Horizontal flip - flip along width dim
Can use https://pytorch.org/docs/master/generated/torch.fliplr.html instead when it's available in > 1.5.1
chip is of dim (channel, height, width)
chip_for_display is of dim (height, width, channel)
chip_label is of dim (height, width)
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, sample):
if random.random() < self.prob:
return {
# 'chip_id': sample['chip_id'],
'chip': torch.flip(sample['chip'], [2]),
'chip_label': torch.flip(sample['chip_label'], [1]),
# 'chip_for_display': torch.flip(sample['chip_for_display'], [1])
}
else:
return sample
class RandomVerticalFlip(object):
""" Vertically flip chip with probability 0.5
Vertical flip - flip along height dim
Can use https://pytorch.org/docs/master/generated/torch.flipud.html instead when it's available in > 1.5.1
chip is of dim (channel, height, width)
chip_for_display is of dim (height, width, channel)
chip_label is of dim (height, width)
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, sample):
if random.random() < self.prob:
return {
# 'chip_id': sample['chip_id'],
'chip': torch.flip(sample['chip'], [1]),
'chip_label': torch.flip(sample['chip_label'], [0]),
# 'chip_for_display': torch.flip(sample['chip_for_display'], [0])
}
else:
return sample
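# Example usage (a sketch, assuming torchvision is available): each transform takes and returns
# the sample dict, so they can be chained with torchvision.transforms.Compose, e.g.
# from torchvision import transforms
# train_transform = transforms.Compose([ToTensor(), RandomHorizontalFlip(), RandomVerticalFlip()])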

View file

@ -0,0 +1,271 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
This script contains two classes used for iterating chips in the dataset:
- LandsatDataset: reads one chip at a time from tiles on disk (so very slow)
- SingleShardChipsDataset: loads an entire shard of pre-made chips into memory, so iteration is fast
"""
import logging
import math
import os
from random import shuffle
import numpy as np
import rasterio
from PIL import Image
from torch.utils.data.dataset import IterableDataset, Dataset
LABELS_TO_USE = 'tiles_masks_coarse' # 'tiles_masks' - which set of labels to use
class LandsatDataset(IterableDataset):
"""Class representing a dataset of Landsat image chips and label masks, such as a training_wcs set.
It reads one chip at a time from tiles stored in disk.
make_chip_shards.py iterate through an instance of this class for train and val set and store the chips
in numpy arrays for faster iteration during training_wcs.
"""
def __init__(self, data_dir, tile_names, get_chip_func,
get_chip_mask_func=None, repeat_chip_func=None,
chip_size=256, tile_size=2000, transform=None):
"""
Args:
data_dir (string): Directory containing the subfolders tiles and tiles_label_masks
tile_names (list of strings): Names of tiles in the split corresponding to this Dataset
get_chip_func (function): A function that returns (chip, chip_for_display) given the arguments
(rasterio reader, chip_window)
get_chip_mask_func (function): A function that returns chip_mask given the arguments
(rasterio reader, chip_window for PIL image)
transform (callable, optional): Optional transform to be applied
on a sample.
"""
super(LandsatDataset).__init__()
assert len(tile_names) > 0, 'tile_names passed to LandsatDataset is empty'
self.data_dir = data_dir
self.get_chip_func = get_chip_func
self.get_chip_mask_func = get_chip_mask_func
self.repeat_chip_func = repeat_chip_func
self.chip_size = chip_size
self.tile_names = tile_names
self.num_tiles = len(tile_names)
# how many rows and columns of chips can fit on this tile
# pixels per side / chip_size of 256
self.num_rows = math.ceil(tile_size / chip_size)
self.num_cols = self.num_rows # only support square tiles and chips
print(f'Number of rows or columns in a tile for making chips is {self.num_rows}')
self.transform = transform
self._shuffle_tiles = False
@property
def shuffle_tiles(self):
return self._shuffle_tiles
@shuffle_tiles.setter
def shuffle_tiles(self, value):
assert isinstance(value, bool)
self._shuffle_tiles = value
def __iter__(self):
# shuffle the tiles if the shuffle_tiles option is on (each epoch will get a new iterator instance, I think)
if self._shuffle_tiles:
print('Returning iterator instance with the tiles shuffled!')
shuffle(self.tile_names)
else:
print('Returning iterator instance with the tiles NOT shuffled!')
return self._get_generator()
@staticmethod
def _default_get_chip_mask(tile_label, chip_window):
"""
Args:
tile_label: PIL Image of the tile's label mask
chip_window: PIL window (left x, upper y, right x2, lower y2)
Returns:
chip_label as a numpy array of dtype np.uint8
"""
chip_label = tile_label.crop(box=chip_window)
chip_label = np.array(chip_label, dtype=np.uint8)
return chip_label
def _get_chip(self, tile_reader, tile_label, col_idx, row_idx):
"""Call the config supplied functions to get the imagery chip and optionally the label mask chip
referencing the col_idx and row_idx specified.
This function is responsible for
- observing entirely empty chips and masking out imagery pixels corresponding to label masks where
there is no label (category of 0).
- masking out label mask where there is no valid imagery available
(i.e. both imagery and label chips should be empty / of value 0 in the same pixels)
Args:
tile_reader: rasterio dataset object of the imagery tile
tile_label: PIL Image of the tile's label mask
col_idx: index of the column in the tile this call should reference
row_idx: index of the row in the tile this call should reference
Returns:
(chip, chip_label_masked, chip_for_display)
"""
# torch dimensions: batch, channel, height, width
# rasterio window is (col_off x, row_off y, width, height)
chip_window = (
col_idx * self.chip_size,
row_idx * self.chip_size,
self.chip_size,
self.chip_size
)
chip, chip_for_display = self.get_chip_func(tile_reader, chip_window) # dim is (C, H, W)
# where cloud is masked out on GEE, there are NaN values
# pixel values are normalized to [0, 1]; NDVI is [-1, 1]
chip = np.nan_to_num(chip, nan=0.0, posinf=1.0, neginf=-1.0)
# return None if satellite imagery in this chip is empty / not available
# sometimes the SWIR has non-zero values while all other bands are zero, so
# using band 2 (visible blue), which is the first in the stack, to mask out labels
if chip[0].max() == chip[0].min():
logging.debug('_get_chip, Empty imagery chip!')
return None, None, None
# mask for where data is available in the satellite chip,
# again using band 2 (visible blue) to determine if the satellite chip is available
# as opposed to summing across all bands
# also avoids the problem that the elevation layer has no gaps
sat_mask = chip[0].squeeze() > 0.0
# PIL window is (left x, upper y, right x2, lower y2)
mask_chip_window = (
col_idx * self.chip_size,
row_idx * self.chip_size,
col_idx * self.chip_size + self.chip_size,
row_idx * self.chip_size + self.chip_size
)
if self.get_chip_mask_func is None:
chip_label = LandsatDataset._default_get_chip_mask(tile_label, chip_window=mask_chip_window)
else:
chip_label = self.get_chip_mask_func(tile_label, chip_window=mask_chip_window)
if chip_label is None:
print('_get_chip, Skipped due to label mask not meeting criteria')
return None, None, None
# the label mask can also be empty on some places where we have imagery data
if chip_label.max() == 0 and chip_label.min() == 0:
logging.debug('_get_chip, Empty label mask chip!')
return None, None, None
label_available_mask = chip_label > 0
# we only want to keep pixels that have both data and label
data_label_mask = np.logical_and(sat_mask, label_available_mask)
chip = chip * data_label_mask
chip = chip * sat_mask # masking out the elevation and other bands that may not be zero where the blue band is
if chip_for_display.shape != (self.chip_size, self.chip_size, 3):
# default pad constant value is 0
chip_for_display = np.pad(chip_for_display,
[(0, self.chip_size - chip_for_display.shape[0]),
(0, self.chip_size - chip_for_display.shape[1]),
(0, 0)])
chip_for_display = chip_for_display * np.expand_dims(data_label_mask, axis=2)  # the (H, W) mask does not broadcast against (H, W, 3), so expand it to (H, W, 1)
chip_label_masked = chip_label * data_label_mask
# these are without the batch dimension, which will be added by the default collate fn as the outermost/first dim
return chip, chip_label_masked, chip_for_display
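# Toy illustration of the mutual masking above (made-up 2x2 values, not real data):
#   chip[0]    = [[0.2, 0.0], [0.5, 0.3]]  -> sat_mask             = [[T, F], [T, T]]
#   chip_label = [[3,   2  ], [0,   7  ]]  -> label_available_mask = [[T, T], [F, T]]
#   data_label_mask = AND of the two       =                          [[T, F], [F, T]]
# after multiplying by data_label_mask, imagery and label are zero in exactly the same pixels.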
def _get_generator(self):
num_chips = 0
for tile_idx, tile_name in enumerate(self.tile_names):
# print(f'Now accessing tile number {tile_idx}, {tile_name}')
tile_path = os.path.join(self.data_dir, 'tiles', tile_name)
parts = tile_name.split('.tif')[0].split('_')
lon = parts[-2]
lat = parts[-1]
mask_name = f'mask_{lon}_{lat}.tif'
mask_path = os.path.join(self.data_dir, LABELS_TO_USE, mask_name)
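# e.g. a tile named 'tile_-71.5_4.2.tif' (hypothetical name; only the trailing lon/lat parts matter)
# yields lon='-71.5', lat='4.2' and the corresponding mask name 'mask_-71.5_4.2.tif'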
tile_reader = rasterio.open(tile_path)
tile_label = Image.open(mask_path)
for row_idx in range(self.num_rows):
for col_idx in range(self.num_cols):
chip, chip_label, chip_for_display = self._get_chip(tile_reader, tile_label, col_idx, row_idx)
if chip is None:
continue # completely empty chip
sample = {
'chip_id': f'{tile_name}_col{col_idx}_row{row_idx}',
'chip': chip,
'chip_label': chip_label,
'chip_for_display': chip_for_display
}
if self.transform:
sample = self.transform(sample)
num_chips += 1
yield sample
# if repeat_chip_func is provided and flags this chip (e.g. to oversample chips containing
# rare classes), yield the same sample a second time
if self.repeat_chip_func is not None and self.repeat_chip_func(chip_label):
    yield sample
print(f'End of generator... Number of chips is {num_chips}')
class SingleShardChipsDataset(Dataset):
"""
A class for iterating over chips created using the make_chip_shards.py script. The entire shard is loaded
into memory as a numpy array, so iteration is fast.
Only supports a single shard currently. To support multiple shards effectively we probably need to
implement IterableDataset.
"""
def __init__(self, data_shard_dir, shard_prefix='train', channels=None, transform=None):
# available channels in this round are (2, 3, 6, 7, NDVI, elevation)
# so if you want all the channels to be used, set channels=(0, 1, 2, 3, 4, 5)
super().__init__()
self.transform = transform
self.chips = np.load(os.path.join(data_shard_dir, f'{shard_prefix}_chips_0.npy'))
if channels is not None:
assert max(channels) < self.chips.shape[1]
assert min(channels) >= 0
self.chips = self.chips[:, channels, :, :]
self.labels = np.load(os.path.join(data_shard_dir, f'{shard_prefix}_labels_0.npy'))
print(f'For prefix {shard_prefix}, loaded chips of dims {self.chips.shape}, labels of dims {self.labels.shape}')
def __len__(self):
return len(self.chips)
def __getitem__(self, idx):
sample = {
'chip': self.chips[idx, :, :, :],
'chip_label': self.labels[idx, :, :]
}
if self.transform:
sample = self.transform(sample)
return sample
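# Minimal usage sketch (illustrative only: the shard directory, channel selection, and DataLoader
# settings below are assumptions, not the values used elsewhere in this repo):
if __name__ == '__main__':
    from torch.utils.data import DataLoader

    dataset = SingleShardChipsDataset('/path/to/shards', shard_prefix='train', channels=(0, 1, 2, 3))
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
    for batch in loader:
        print(batch['chip'].shape, batch['chip_label'].shape)  # (B, C, H, W) and (B, H, W)
        break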

View file

@@ -0,0 +1,90 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""
The Logger class logs metrics (scalars, images, and histograms) to an events file for display in TensorBoard.
Adapted from https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/04-utils/tensorboard/logger.py
Reference https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
"""
import os
import torch
import tensorflow as tf
import numpy as np
class Logger(object):
def __init__(self, split, log_dir, batch_size, aml_run=None):
"""Create a summary writer logging to log_dir.
Assumes that the batch size is constant throughout the run. We log
step * batch_size to be consistent across runs of different batch size.
"""
log_dir = os.path.join(log_dir, split)
self.writer = tf.compat.v1.summary.FileWriter(log_dir)
self.split = split
self.batch_size = batch_size
self.aml_run = aml_run
def scalar_summary(self, tag, value, step):
"""Log a scalar variable."""
step = step * self.batch_size
summary = tf.compat.v1.Summary(value=[tf.compat.v1.Summary.Value(tag=tag, simple_value=value)])
self.writer.add_summary(summary, step)
if self.aml_run:
self.aml_run.log(f'{self.split}/{tag}', value)
def image_summary(self, split, tag, images_and_buffers, step):
"""Log a list of images."""
step = step * self.batch_size
img_summaries = []
for i, (pil_im, buf) in enumerate(images_and_buffers):
# Build a TF Summary.Image proto from the already-encoded PNG bytes;
# PIL Image.size is (width, height), while tensors / numpy arrays are (height, width, ...)
if hasattr(pil_im, 'shape'):
    h, w = pil_im.shape[0], pil_im.shape[1]
else:
    w, h = pil_im.size
img_summary = tf.compat.v1.Summary.Image(encoded_image_string=buf.getvalue(),
                                          height=h,
                                          width=w)
# Create a Summary value
img_summaries.append(tf.compat.v1.Summary.Value(tag=f'{split}_{i}/{tag}', image=img_summary))
# Create and write Summary
summary = tf.compat.v1.Summary(value=img_summaries)
self.writer.add_summary(summary, step)
def histo_summary(self, tag, values, step, bins=1000):
"""Log a histogram of the tensor of values."""
step = step * self.batch_size
# Create a histogram using numpy
counts, bin_edges = np.histogram(values, bins=bins)
# Fill the fields of the histogram proto
hist = tf.compat.v1.HistogramProto()
hist.min = float(np.min(values))
hist.max = float(np.max(values))
hist.num = int(np.prod(values.shape))
hist.sum = float(np.sum(values))
hist.sum_squares = float(np.sum(values ** 2))
# Drop the start of the first bin
bin_edges = bin_edges[1:]
# Add bin edges and counts
for edge in bin_edges:
hist.bucket_limit.append(edge)
for c in counts:
hist.bucket.append(c)
# Create and write Summary
summary = tf.compat.v1.Summary(value=[tf.compat.v1.Summary.Value(tag=tag, histo=hist)])
self.writer.add_summary(summary, step)
self.writer.flush()
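# Minimal usage sketch (illustrative; the log directory, metric values, and step are made up):
if __name__ == '__main__':
    logger_train = Logger('train', './logs_example', batch_size=32)
    logger_train.scalar_summary('loss', 0.85, step=10)  # recorded at global step 10 * 32
    logger_train.histo_summary('sample_values', np.random.randn(1000), step=10)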

View file

@@ -0,0 +1,131 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from io import BytesIO
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from skimage import img_as_ubyte
from PIL import Image
from viz_utils import VizUtils
class AverageMeter(object):
"""Computes and stores the average and current value
https://github.com/pytorch/examples/blob/master/imagenet/main.py
"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
"""
Args:
val: mini-batch loss or accuracy value
n: mini-batch size
"""
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
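# Worked example of the running average (values are made up): two mini-batches of size 32 with
# losses 0.9 and 0.7 give avg = (0.9 * 32 + 0.7 * 32) / 64 = 0.8
#
#   loss_meter = AverageMeter()
#   loss_meter.update(0.9, n=32)
#   loss_meter.update(0.7, n=32)
#   loss_meter.avg  # 0.8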
viz_util = VizUtils()
ndvi_normalizer = mcolors.Normalize(vmin=-1, vmax=1)
def log_sample_img_gt(logger_train, logger_val,
samples_train, samples_val):
"""
Logs sample chips and ground truth labels to TensorBoard
Args:
logger_train: Logger object for the training loop, used every print_every iterations
logger_val: Logger object for the validation loop, used at the end of every epoch
samples_train: dict of tensors of chips and labels for the training set
samples_val: dict of tensors of chips and labels for the val set
Returns:
None
"""
for split, samples, logger in [
('train', samples_train, logger_train),
('val', samples_val, logger_val)
]:
chips = samples['chip']
labels = samples['chip_label']
print()
print(f'{split} chips has shape {chips.shape}')
tag = 'image bands 6, 3, 2'
band_combo = (2, 1, 0) # bands 6, 3, 2 in the baseline case
images_and_buffers = []
for i, sample in enumerate(chips):
im_np = sample[band_combo, :, :]
im_np = np.transpose(im_np, axes=(1, 2, 0))
# sanity check: img_as_ubyte below requires float input within [-1, 1]
# (without this print, values outside that range were occasionally observed; cause not yet identified)
print(f'{i}th sample shape is {im_np.shape}, min: {im_np.min()}, max: {im_np.max()}')
im = Image.fromarray(img_as_ubyte(im_np))
buf = BytesIO()
im.save(buf, format='png')
images_and_buffers.append((im_np, buf)) # image_summary reads the first two dims for height and width
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the NDVI of the sample chips
tag = 'ndvi'
ndvi_band = 4 # bands are (2, 3, 6, 7, NDVI, elevation) in baseline
images_and_buffers = []
for sample in chips:
ndvi = sample[ndvi_band, :, :].squeeze()
# TODO move to viz_utils as well
plt.figure(figsize=(4, 4))
plt.imshow(ndvi, cmap='viridis', norm=ndvi_normalizer)
buf = BytesIO()
plt.savefig(buf, format='png')
plt.close()
buf.seek(0)
im = Image.open(buf)
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the elevation of the sample chips if available; elevation is available in the tensor input as a channel
# note that we do not pass a normalizer to imshow() here, so the color scale is stretched to this
# chip's own min and max values, not to an absolute range across all elevation values
if chips[0].shape[0] == 6:
tag = 'elevation'
images_and_buffers = []
for sample in chips:
elevation = sample[5, :, :].squeeze()
im, buf = viz_util.show_single_band(elevation, size=(4, 4))
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, step=0)
# log the labels
tag = 'label'
images_and_buffers = []
for label_mask in labels:
im, buf = viz_util.show_label_raster(label_mask, size=(4, 4))
images_and_buffers.append((im, buf))
logger.image_summary(split, tag, images_and_buffers, 0)
def render_prediction(hardmax):
im, buf = viz_util.show_label_raster(hardmax)
return im, buf
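# Minimal usage sketch for log_sample_img_gt (illustrative; train_loader / val_loader and the two
# Logger instances are assumed to be created elsewhere, and each batch dict is expected to contain
# 'chip' of shape (B, C, H, W) and 'chip_label' of shape (B, H, W)):
#
#   samples_train = next(iter(train_loader))
#   samples_val = next(iter(val_loader))
#   log_sample_img_gt(logger_train, logger_val, samples_train, samples_val)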

Binary data
visuals/in_the_tool.png Executable file

Binary file not shown (size: 2.7 MiB).

Binary data
visuals/unet.png Normal file

Binary file not shown (size: 118 KiB).