Initial Commit - Transferring code to new repo
This commit is contained in:
Perry Skountrianos 2021-02-17 11:47:57 -08:00
Parent 75c41c4857
Commit 7e9c2955f0
7 changed files: 950 additions and 24 deletions

View file

@@ -1,33 +1,50 @@
# Distributed training of Image segmentation on Azure ML

This repo shows how to run distributed training of an image segmentation model on Azure ML.

## Platform

We run the distributed training on Azure ML using multiple nodes and multiple GPUs per node.

[Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning/)

[Azure ML SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py)

To run the notebook, you need to have or create:
1. An Azure subscription
2. An Azure storage account
3. An Azure ML workspace
4. (Optional) An Azure ML compute target (4 nodes of STANDARD_NC24); this can also be created in the notebook.

## Dataset

We use the data from a Kaggle competition:

https://www.kaggle.com/c/airbus-ship-detection

The competition is about segmenting ships in satellite images. We use its train_v2 data.

To prepare the data, you need to:
1. Create a container in your Azure storage account.
2. Upload "train_v2" into the container under the folder name "airbus".

## Package

We use the fast.ai package, which needs very little code to create and train a deep learning model. For example, we use 3 lines for the image classification:

>data = ImageDataBunch.from_folder(data_folder, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=sz, bs = bs, num_workers=8).normalize(imagenet_stats)
>learn = cnn_learner(data, models.resnet34, metrics=dice)
>learn.fit_one_cycle(5, slice(1e-5), pct_start=0.8)

fast.ai supports computer vision (CNNs and U-Net) and NLP (transformers). Please find details on their website:

https://www.fast.ai/

You can install it with (the scripts in this repo use the fastai v1 API):

>pip install fastai

## Distributed training

fast.ai only supports distributed training with the NCCL backend, which Azure ML does not natively support. We use the script "azureml_adapter.py" to complete the NCCL initialization on Azure ML.

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

View file

@@ -0,0 +1,546 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"_uuid": "9380833a1a2503c5d3518f0ed8d6df8dcf05b7c2"
},
"source": [
"## Overview\n",
"The whole processing has 2 steps:\n",
"1. Image classification: classifying images with or without ships. \n",
"2. Image segmentation: segmenting ships from images.\n",
"We downsample images into 256 X 256. However, the downsampling caused ship size to be only 1 pixel, which leads to lower segmentation performance. So we select images with larger ship size for the segmentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"_uuid": "2a2f9181ed56a8310f6188ac1254f903574fb115"
},
"outputs": [],
"source": [
"import fastai\n",
"from fastai.vision import *\n",
"from fastai.callbacks.hooks import *\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import os, glob"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"\n",
"from azureml.core import Workspace, Datastore, Dataset, Experiment, Run, Environment\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"from azureml.core.model import Model\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"from azureml.train.dnn import PyTorch, Mpi\n",
"from azureml.train.hyperdrive import GridParameterSampling\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.train.hyperdrive import HyperDriveConfig\n",
"from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.train.hyperdrive import PrimaryMetricGoal\n",
"from azureml.train.hyperdrive.parameter_expressions import choice\n",
"\n",
"from azureml.core.runconfig import MpiConfiguration\n",
"\n",
"from azureml.widgets import RunDetails"
]
},
{
"cell_type": "markdown",
"metadata": {
"_uuid": "8fa09c99d9f5b03e8e3a213f6d84902d5e1d59e1"
},
"source": [
"### Prepare Azure Resource"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Connect the workspace\n",
"interactive_auth = InteractiveLoginAuthentication()\n",
"\n",
"subscription_id = '<Your Azure subscription id>'\n",
"resource_group = '<Your resource group in Azure'\n",
"workspace_name = '<Your workspace name>'\n",
"\n",
"workspace = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name,\n",
" auth=interactive_auth)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register storage container as datastore\n",
"storange_name = '<Your Azure storage name>'\n",
"ket_to_storage = '<Key to your storage>'\n",
"datastore_name = 'airbus'\n",
"\n",
"datastore = Datastore.register_azure_blob_container(workspace=workspace, \n",
" datastore_name=datastore_name, \n",
" container_name=datastore_name,\n",
" account_name=storange_name, \n",
" account_key=ket_to_storage,\n",
" create_if_not_exists=False)"
]
},
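{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the Kaggle files are still on a local machine, they can be uploaded through the registered datastore. This is a sketch: the local source path is a placeholder, and `target_path` assumes the 'airbus' folder layout the README asks for."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Upload the Kaggle download into the 'airbus' folder the scripts expect (sketch)\n",
"datastore.upload(src_dir='<Local path to the Kaggle download>', target_path='airbus', overwrite=False)"
]
},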
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Find datastore by name\n",
"datastore = Datastore.get(workspace, datastore_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Connect/create computer resource\n",
"cluster_name = 'gpu-nc24'\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace = workspace, name = cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size = 'STANDARD_NC24', min_nodes = 0, max_nodes = 4)\n",
" compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = 4, timeout_in_minutes = 20)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register dataset\n",
"dataset = Dataset.File.from_files(path=(datastore, 'airbus'))\n",
"dataset = dataset.register(workspace=workspace,\n",
" name='Airbus root',\n",
" description='Dataset for airbus images')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the script folder\n",
"script_folder = os.path.join(os.getcwd(), \"training_scripts\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Clean\n",
"1. downsize images to 256 X 256 \n",
"2. Put images to 2 folders: ship or no ship\n",
"3. Create the segmentation label images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create experiment to clear data\n",
"exp_data = Experiment(workspace = workspace, name = 'urthecast_data_clean')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register data reference\n",
"data_folder = DataReference(\n",
" datastore=datastore,\n",
" data_reference_name=\"airbus_root\",\n",
" path_on_datastore = 'airbus',\n",
" mode = 'mount')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create estimator for data clean\n",
"script_params = {\n",
" '--data_folder': data_folder\n",
"}\n",
"est_data = PyTorch(source_directory = script_folder,\n",
" compute_target = compute_target,\n",
" entry_script = 'clean-data.py', # python script for cleaning\n",
" script_params = script_params,\n",
" use_gpu = False,\n",
" node_count=1,\n",
" pip_packages = ['fastai'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Submit for running\n",
"data_run = exp_data.submit(est_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show run details\n",
"RunDetails(data_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ship/No ship classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create experiment to classification\n",
"exp_class = Experiment(workspace = workspace, name = 'classification')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Data reference for classification data\n",
"class_data_folder = DataReference(\n",
" datastore=datastore,\n",
" data_reference_name=\"airbus_class\",\n",
" path_on_datastore = 'airbus/class',\n",
" mode = 'mount')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Estimator for classification\n",
"from azureml.train.dnn import PyTorch, Mpi\n",
"\n",
"script_params = {\n",
" '--data_folder': class_data_folder,\n",
" '--num_epochs': 5\n",
"}\n",
"\n",
"est_class = PyTorch(source_directory = script_folder,\n",
" compute_target = compute_target,\n",
" entry_script = 'classification.py', # Classification script\n",
" script_params = script_params,\n",
" use_gpu = True,\n",
" node_count=3, # 3 nodes are used\n",
" distributed_training=Mpi(process_count_per_node = 4), # 4 GPU's per node\n",
" pip_packages = ['fastai'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the hyper drive for parameter tunning\n",
"param_sampling = GridParameterSampling({\n",
" 'start_learning_rate': choice(0.0001, 0.001),\n",
" 'end_learning_rate': choice(0.01, 0.1)})\n",
"\n",
"hyperdrive_class = HyperDriveConfig(estimator = est_class,\n",
" hyperparameter_sampling = param_sampling,\n",
" policy = None,\n",
" primary_metric_name = 'dice',\n",
" primary_metric_goal = PrimaryMetricGoal.MAXIMIZE,\n",
" max_total_runs = 4,\n",
" max_concurrent_runs = 4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Kick off running\n",
"classification_run = exp_class.submit(hyperdrive_class)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show running details\n",
"RunDetails(classification_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get results for all running\n",
"classification_run.wait_for_completion(show_output = False)\n",
"\n",
"children = list(classification_run.get_children())\n",
"metricslist = {}\n",
"i = 0\n",
"\n",
"for single_run in children:\n",
" results = {k: np.min(v) for k, v in single_run.get_metrics().items() if (k in ['dice', 'loss']) and isinstance(v, float)}\n",
" parameters = single_run.get_details()['runDefinition']['arguments']\n",
" try:\n",
" results['start_learning_rate'] = parameters[5]\n",
" results['end_learning_rate'] = parameters[7]\n",
" metricslist[i] = results\n",
" i += 1\n",
" except:\n",
" pass\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1).T.sort_values(by = ['loss'], ascending = True)\n",
"rundata"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show best running\n",
"best_run = classification_run.get_best_run_by_primary_metric()\n",
"best_run.get_file_names()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_uuid": "f487dd77687f4edb070bd5d2dc9da9a001d62bdb"
},
"source": [
"### Ship segmentation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Data reference for segmentation\n",
"sgmt_data_folder = DataReference(\n",
" datastore=datastore,\n",
" data_reference_name=\"airbus_segmentation\",\n",
" path_on_datastore = 'airbus/segmentation',\n",
" mode = 'mount')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Experiment for segmentation\n",
"exp_sgmt = Experiment(workspace = workspace, name = 'segmentation')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"_uuid": "56ed39146115a4767a257fec60a3b367284fa0d6"
},
"outputs": [],
"source": [
"# Estimator for segmentation\n",
"segmt_script_params = {\n",
" '--data_folder': sgmt_data_folder,\n",
" '--img_folder': '256-filter99',\n",
" '--num_epochs': 12\n",
"}\n",
"\n",
"segmt_est = PyTorch(source_directory = script_folder,\n",
" compute_target = compute_target,\n",
" entry_script = 'segmentation.py', # Segmentation script\n",
" script_params = segmt_script_params,\n",
" use_gpu = True,\n",
" node_count=4, # 4 nodes\n",
" distributed_training=Mpi(process_count_per_node = 4), # 4 GPU's per node\n",
" pip_packages = ['fastai'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"_uuid": "25fa3283c992696575914a5fdb6ebc433a0b5d1f"
},
"outputs": [],
"source": [
"# Kick off running\n",
"segmentation_run = exp_sgmt.submit(config=segmt_est)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Running detail\n",
"RunDetails(segmentation_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"_uuid": "193381699f5595c916647bfd6c51eaeba699379d"
},
"outputs": [],
"source": [
"# Results\n",
"segmentation_run.wait_for_completion(show_output=False) # specify True for a verbose log\n",
"print(segmentation_run.get_file_names())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Register model\n",
"model = larger_sgmt_run.register_model(model_name='segmentation-99',\n",
" tags={'ship': 'min99'},\n",
" model_path='outputs/segmentation.pkl')\n",
"print(model.name, model.id, model.version, sep='\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {
"_uuid": "a5af78f6512ab4f514818ef47b3481ef67a65e46"
},
"source": [
"### Prediction\n",
"Sample code for prediction"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read image\n",
"size = 256\n",
"ifile = '<Test image>'\n",
"img = open_image(ifile)\n",
"img = img.resize(size)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prediction\n",
"model_path = '<The model path>'\n",
"learn = load_learner(model_path)\n",
"pred = learn.predict(img)"
]
},
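{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick visual check (sketch): for a segmentation learner, `pred[0]` is the predicted mask, which fastai can overlay on the input image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Overlay the predicted mask on the image\n",
"img.show(y=pred[0], figsize=(8, 8))"
]
},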
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View file

@@ -0,0 +1,8 @@
name: project_environment
dependencies:
- python>3.6.2
- pip:
  # You must list azureml-defaults as a pip dependency
  - azureml-defaults
  - fastai<2  # these scripts use the fastai v1 API
  - azureml-sdk[notebooks,automl]
  - torch>1.0  # torch is the pip package name; the conda equivalent is 'pytorch'
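The notebook installs packages through the estimators' pip_packages argument rather than through this file. If you prefer the environment file, it could be wired up roughly like this (a sketch; the file name environment.yml and the script choice are assumptions, and compute_target is the cluster created in the notebook):

from azureml.core import Environment, ScriptRunConfig

# Build an Azure ML environment from the conda specification above
env = Environment.from_conda_specification(name='project_environment',
                                           file_path='environment.yml')

# Attach it to a run configuration instead of passing pip_packages to an estimator
src = ScriptRunConfig(source_directory='training_scripts',
                      script='classification.py',
                      compute_target=compute_target,
                      environment=env)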

View file

@@ -0,0 +1,36 @@
import os

def set_environment_variables_for_nccl_backend(single_node=False):
    # Map the Open MPI variables that Azure ML sets onto the env:// variables
    # that torch.distributed's NCCL backend expects.
    os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
    os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
    if not single_node:
        master_node_params = os.environ['AZ_BATCH_MASTER_NODE'].split(':')
        os.environ['MASTER_ADDR'] = master_node_params[0]
        os.environ['MASTER_PORT'] = master_node_params[1]
    else:
        os.environ['MASTER_ADDR'] = os.environ['AZ_BATCHAI_MPI_MASTER_NODE']
        os.environ['MASTER_PORT'] = '54965'
    print('NCCL_SOCKET_IFNAME original value = {}'.format(os.environ.get('NCCL_SOCKET_IFNAME')))
    # TODO make this parameterizable; exclude the Docker bridge and loopback interfaces
    os.environ['NCCL_SOCKET_IFNAME'] = '^docker0,lo'
    print('RANK = {}'.format(os.environ['RANK']))
    print('WORLD_SIZE = {}'.format(os.environ['WORLD_SIZE']))
    print('MASTER_ADDR = {}'.format(os.environ['MASTER_ADDR']))
    print('MASTER_PORT = {}'.format(os.environ['MASTER_PORT']))
    # print('MASTER_NODE = {}'.format(os.environ['MASTER_NODE']))
    print('NCCL_SOCKET_IFNAME new value = {}'.format(os.environ['NCCL_SOCKET_IFNAME']))

def get_local_rank():
    # Rank of this process within its node (0 .. processes-per-node - 1)
    return int(os.environ['OMPI_COMM_WORLD_LOCAL_RANK'])

def get_global_size():
    # Total number of processes across all nodes
    return int(os.environ['OMPI_COMM_WORLD_SIZE'])

def get_local_size():
    # Number of processes on this node
    return int(os.environ['OMPI_COMM_WORLD_LOCAL_SIZE'])

def get_world_size():
    return int(os.environ['WORLD_SIZE'])
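A minimal sketch of how a training script consumes these helpers before building a model; classification.py and segmentation.py below follow this pattern:

import torch
from azureml_adapter import set_environment_variables_for_nccl_backend, \
    get_local_rank, get_local_size, get_global_size

local_rank = get_local_rank()
# single_node is True when every process runs on one machine
set_environment_variables_for_nccl_backend(get_local_size() == get_global_size())
torch.cuda.set_device(local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')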

View file

@@ -0,0 +1,66 @@
import numpy as np
import fastai
from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.callbacks.mem import PeakMemMetric
from fastai.distributed import *
import os, argparse, time, random
from azureml.core import Workspace, Run, Dataset
from azureml_adapter import set_environment_variables_for_nccl_backend, get_local_rank, get_global_size, get_local_size
parser = argparse.ArgumentParser()
parser.add_argument('--data_folder', type=str, dest='data_folder', default='')
parser.add_argument('--img_size', type=int, dest='img_size', default=256)
parser.add_argument('--batch_size', type=int, dest='batch_size', default=64)
parser.add_argument('--num_epochs', type=int, dest='num_epochs', default=12)
parser.add_argument('--start_learning_rate', type=float, dest='start_learning_rate', default=0.001)
parser.add_argument('--end_learning_rate', type=float, dest='end_learning_rate', default=0.01)
parser.add_argument('--pct_start', type=float, dest='pct_start', default=0.9)
args = parser.parse_args()
local_rank = get_local_rank()
global_size = get_global_size()
local_size = get_local_size()
# TODO use logger
print('local_rank = {}'.format(local_rank))
print('global_size = {}'.format(global_size))
print('local_size = {}'.format(local_size))
set_environment_variables_for_nccl_backend(local_size == global_size)
torch.cuda.set_device(local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')
rank = int(os.environ['RANK'])
data_folder = args.data_folder
sz = args.img_size
bs = args.batch_size
print('Data folder:', data_folder)
run = Run.get_context()
work_folder = os.getcwd()
print('Work directory: ', work_folder)
data = ImageDataBunch.from_folder(data_folder, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=sz, bs = bs, num_workers=8).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=dice).to_distributed(local_rank)
learn.fit_one_cycle(args.num_epochs, slice(args.start_learning_rate,args.end_learning_rate))
#learn.unfreeze()
#learn.fit_one_cycle(5, slice(1e-5), pct_start=0.8)
result = learn.validate()
run.log('Worker #{:} loss'.format(rank), np.float(result[0]))
run.log('Worker #{:} dice'.format(rank), np.float(result[1]))
os.chdir(work_folder)
if rank == 0:
run.log('loss', np.float(result[0]))
run.log('dice', np.float(result[1]))
#filename = 'outputs/classification.pkl'
#learn.export(filename)
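The export above is left commented out, so the classification run never writes a model file. A rank-0 export mirroring the end of segmentation.py could look like this sketch (the path choices are assumptions):

# Sketch: export once, from rank 0 only; Azure ML uploads ./outputs when the run completes
if rank == 0:
    os.makedirs(os.path.join(work_folder, 'outputs'), exist_ok=True)
    learn.export(os.path.join(work_folder, 'outputs', 'classification.pkl'))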

View file

@@ -0,0 +1,134 @@
import numpy as np
import fastai
from fastai.vision import *
from fastai.callbacks.hooks import *
import os, glob, argparse, time, random, math
from azureml.core import Workspace, Run, Dataset
parser = argparse.ArgumentParser()
parser.add_argument('--data_folder', type=str, dest='data_folder')
parser.add_argument('--org_size', type=int, dest='org_size', default=768)
parser.add_argument('--train_folder', type=str, dest='train_folder', default='train_v2')
parser.add_argument('--train_sgmtfile', type=str, dest='train_sgmtfile', default='train_ship_segmentations_v2.csv')
parser.add_argument('--class_folder', type=str, dest='class_folder', default='class')
parser.add_argument('--img_size', type=int, dest='img_size', default=256)
parser.add_argument('--min_area', type=int, dest='min_area', default=99)
parser.add_argument('--sgmtimg_folder', type=str, dest='sgmtimg_folder', default='256-filter99')
parser.add_argument('--sgmtlabel_folder', type=str, dest='sgmtlabel_folder', default='256-label')
args = parser.parse_args()
run = Run.get_context()
data_folder = args.data_folder
print('Data folder: ', data_folder)
train_folder = os.path.join(data_folder, args.train_folder)
SEGMENTATION = os.path.join(data_folder, args.train_sgmtfile)
# Clean images
print('Searching for broken images.............')
brokenfiles = []
for fpath in glob.glob(os.path.join(train_folder, '*.jpg')):
try:
img = open_image(fpath)
except:
fn = os.path.basename(fpath)
print(fn, ' is broken')
brokenfiles.append(fn)
print(len(brokenfiles), ' images are broken')
print('Moving broken images.........')
broken_folder = os.path.join(train_folder, 'broken')
os.makedirs(broken_folder, exist_ok=True)
for fn in brokenfiles:
orig_name = os.path.join(train_folder, fn)
new_name = os.path.join(broken_folder, fn)
os.rename(orig_name, new_name)
# Divide images into Ship and NoShip
print('Split images to ship & no-ship folder .........')
df_masks = pd.read_csv(SEGMENTATION, index_col='ImageId')
class_folder = os.path.join(train_folder, args.class_folder)
ship_folder = os.path.join(class_folder, 'ship')
noship_folder = os.path.join(class_folder, 'no-ship')
# The class folders must exist before images are moved into them
os.makedirs(ship_folder, exist_ok=True)
os.makedirs(noship_folder, exist_ok=True)
for fpath in glob.glob(os.path.join(train_folder, '*.jpg')):
fn = os.path.basename(fpath)
if isinstance(df_masks.loc[fn,'EncodedPixels'], str):
tpath = os.path.join(ship_folder, fn)
else:
tpath = os.path.join(noship_folder, fn)
os.rename(fpath, tpath)
print('Generating label files............')
sz_enc = [args.org_size, args.org_size]
def enc2mask(masks, shape = sz_enc):
img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
if(type(masks) == float): return img.reshape(shape)
if(type(masks) == str): masks = [masks]
for mask in masks:
s = mask.split()
for i in range(len(s)//2):
start = int(s[2*i]) - 1
length = int(s[2*i+1])
img[start:start+length] = 1
return img.reshape(shape).T
label_folder = os.path.join(train_folder, 'label')
os.makedirs(label_folder, exist_ok=True)
for fpath in glob.glob(os.path.join(ship_folder, '*.jpg')):
fn = os.path.basename(fpath)
labelpath = os.path.join(label_folder, Path(fn).stem + '.png')
mask = enc2mask(df_masks.loc[fn,'EncodedPixels'])
maskimg = PIL.Image.fromarray(mask)
maskimg.save(labelpath)
def SummaryLabelArea(label_root):
min_area = 1000000
area_hist = np.zeros(20, int)
for fpath in glob.glob(os.path.join(label_root, '*.png')):
mask = open_mask(fpath)
area = mask.data.sum()
area_hist[int(math.log2(area))] += 1
if area < min_area: min_area = area
print('Min area is ', min_area)
print(area_hist / np.sum(area_hist))
return min_area, area_hist
SummaryLabelArea(label_folder)
print('Resizing images and labels .........')
def ResizeTrainLabel(train_root, label_root, dest_train_root, dest_label_root, size, min_area = 0):
for fpathstr in glob.glob(os.path.join(train_root, '*.jpg')):
fpath = Path(fpathstr)
lpath = os.path.join(label_root, fpath.stem + '.png')
mask = open_mask(lpath)
mask = mask.resize(size)
if mask.data.sum() > min_area:
dest_lpath = os.path.join(dest_label_root, fpath.stem + '.png')
mask.save(dest_lpath)
img = open_image(fpath)
img = img.resize(size)
dest_fpath = os.path.join(dest_train_root, fpath.stem + '.jpg')
img.save(dest_fpath)
sgmtimg_folder = os.path.join(train_folder, args.sgmtimg_folder)
sgmtlabel_folder = os.path.join(train_folder, args.sgmtlabel_folder)
# Destination folders must exist before the resized files are written
os.makedirs(sgmtimg_folder, exist_ok=True)
os.makedirs(sgmtlabel_folder, exist_ok=True)
ResizeTrainLabel(ship_folder, label_folder, sgmtimg_folder, sgmtlabel_folder, args.img_size, args.min_area)
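For reference, a toy check of enc2mask above: Kaggle's RLE pairs are 1-indexed (start, length) runs over pixels numbered top-to-bottom, then left-to-right, which is why the decoder transposes at the end. A small sketch:

# Decode two runs on a 4 x 4 image:
# '1 3' fills rows 0-2 of column 0, '10 2' fills rows 1-2 of column 2
print(enc2mask('1 3 10 2', shape=[4, 4]))
# [[1 0 0 0]
#  [1 0 1 0]
#  [1 0 1 0]
#  [0 0 0 0]]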

View file

@@ -0,0 +1,119 @@
import numpy as np
import fastai
from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.callbacks.mem import PeakMemMetric
from fastai.distributed import *
import os, argparse, time, random
from azureml.core import Workspace, Run, Dataset
from azureml_adapter import set_environment_variables_for_nccl_backend, get_local_rank, get_global_size, get_local_size
def dice_loss(input, target):
    # Soft Dice on probabilities; 'smooth' avoids division by zero on empty masks
    #input = torch.sigmoid(input)
    smooth = 1.0
    iflat = input.flatten()
    tflat = target.flatten()
    intersection = (iflat * tflat).sum()
    return ((2.0 * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth))

class FocalLoss(nn.Module):
    # Focal loss on logits, using the numerically stable
    # binary-cross-entropy-with-logits formulation
    def __init__(self, gamma):
        super().__init__()
        self.gamma = gamma

    def forward(self, input, target):
        if not (target.size() == input.size()):
            raise ValueError("Target size ({}) must be the same as input size ({})"
                             .format(target.size(), input.size()))
        max_val = (-input).clamp(min=0)
        loss = input - input * target + max_val + \
            ((-max_val).exp() + (-input - max_val).exp()).log()
        invprobs = F.logsigmoid(-input * (target * 2.0 - 1.0))
        loss = (invprobs * self.gamma).exp() * loss
        return loss.mean()

class MixedLoss(nn.Module):
    # alpha * focal - log(dice): the focal term handles the heavy ship/background
    # class imbalance, while the dice term optimizes overlap directly
    def __init__(self, alpha, gamma):
        super().__init__()
        self.alpha = alpha
        self.focal = FocalLoss(gamma)

    def forward(self, input, target):
        # take the 'Ship' channel probability, then convert back to a logit for the focal term
        input = F.softmax(input, dim=1)[:, 1:, :, :]
        input2 = torch.log((input + 1e-7) / (1 - input + 1e-7))
        loss = self.alpha * self.focal(input2, target) - torch.log(dice_loss(input, target))
        return loss
parser = argparse.ArgumentParser()
parser.add_argument('--data_folder', type=str, dest='data_folder', default='')
parser.add_argument('--label_folder', type=str, dest='label_folder', default='256-label')
parser.add_argument('--img_folder', type=str, dest='img_folder', default='256-filter')
parser.add_argument('--img_size', type=int, dest='img_size', default=256)
parser.add_argument('--batch_size', type=int, dest='batch_size', default=16)
parser.add_argument('--num_epochs', type=int, dest='num_epochs', default=12)
parser.add_argument('--start_learning_rate', type=float, dest='start_learning_rate', default=0.000001)
parser.add_argument('--end_learning_rate', type=float, dest='end_learning_rate', default=0.001)
args = parser.parse_args()
local_rank = get_local_rank()
global_size = get_global_size()
local_size = get_local_size()
# TODO use logger
print('local_rank = {}'.format(local_rank))
print('global_size = {}'.format(global_size))
print('local_size = {}'.format(local_size))
set_environment_variables_for_nccl_backend(local_size == global_size)
torch.cuda.set_device(local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')
rank = int(os.environ['RANK'])
data_folder = args.data_folder
sz = args.img_size
bs = args.batch_size
print('Data folder:', data_folder)
run = Run.get_context()
work_folder = os.getcwd()
print('Work directory: ', work_folder)
label_path = Path(os.path.join(data_folder, args.label_folder))
get_y_fn = lambda x: label_path/f'{x.stem}.png'
tfms = get_transforms(max_rotate = 10, max_lighting = 0.05, max_warp = 0.2, flip_vert = True,
p_affine = 1., p_lighting = 1)
img_path = os.path.join(data_folder, args.img_folder)
data = (SegmentationItemList.from_folder(img_path)
.split_by_rand_pct(0.2)
.label_from_func(get_y_fn, classes=['Background','Ship'])
.transform(tfms, size=sz, tfm_y=True)
.databunch(path=Path('.'), bs=bs, num_workers=0)
.normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, loss_func=MixedLoss(10.0,2.0), metrics=dice, wd=1e-7).to_distributed(local_rank)
learn.fit_one_cycle(args.num_epochs, slice(args.start_learning_rate,args.end_learning_rate))
#learn.unfreeze()
#learn.fit_one_cycle(args.num_epochs, slice(args.start_learning_rate,args.end_learning_rate))
result = learn.validate()
run.log('Worker #{:} loss'.format(rank), np.float(result[0]))
run.log('Worker #{:} dice'.format(rank), np.float(result[1]))
if rank == 0:
run.log('loss', np.float(result[0]))
run.log('dice', np.float(result[1]))
os.chdir(work_folder)
if rank == 0:
    # Export once, from rank 0 only; Azure ML uploads ./outputs when the run completes
    os.makedirs('outputs', exist_ok=True)
    filename = 'outputs/segmentation.pkl'
    learn.export(filename)