Video Dataset / Model refactor + framework for action tests (#477)

* Removed submodules.

* Add back submodules using https://

* Update FAQ.md

* added object detection readme

* fixes to ic (#437)

* fixes to ic

* Matplotlib bug fix

* Matplotlib matrix plot bug fix
* Fix 01 notebook heatmap

* revert env

* revert env yml

* remove matplotlib

* simplified plotting functions

* fixed most tests

* fixed test

* fixed unit test

* small text edits to the 02 notebook

* added function description

* tiny cleanup on notebook

* Move r2p1d from contrib to scenarios.

* Update .gitignore.

* Add README.md

* Remove the folder /scenarios/action_recognition/data/samples; update notebook to use web url for sample data.

* Move data split files to data/misc; update notebook accordingly.

* Add pretrained keypoint model (#453)

* Add pretrained keypoint model

* Fix bugs in tests

* Add 03 notebook in conftest.py

* Minor revision

* Reformat code using black

* if folder exists, remove (#448)

* Update data path.

* Add mask annotation tool (#447)

* Add mask annotation tool

* Update mask annotation explanation and add conversion scripts

* Add screenshots of Labelbox annotation

* Rearrange screenshots

* Move conversion script into functions in data.py

* Point out annotation conversion scripts clearly in notebook

* Refine annotation conversion scripts

* Fix bugs

* Add tests for labelbox format conversion methods

* Update README.md

* Add keypoint detection with tuned model (#454)

* Add keypoint detection with tuned model

* Add tests

* Minor revision

* Update tests

* Fix bugs in tests

* Use GPU device if available

* Update tests

* Fix bug: 'not idx' will be 'True' if 'idx=0'

* Fix bugs

* Move toy keypoint meta into notebook

* Fix bugs

* Fix bugs

* Fix bugs in notebook

* Add descriptions for keypoint metadata

* Raise exception when RandomHorizontalFlip is used without specifying hflip_inds

* Add NOTICE file.

* Add keypoint detection model tuning with top and bottom keypoints (#456)

* Add keypoint detection model tuning with top and bottom keypoints

* Fix undefined unzip_url

* Resolved undefined od_urls

* Plot keypoints as round dots to make them noticeable (#458)

* Plot keypoints as dots

* Change variable naming

* Add annotation tool to scenarios.

* Resolve test machine failure (#460)

This is because the latest PyTorch (version 1.3) from conda is built against
CUDA 10.1, while the test machine has CUDA 10.0.

* Remove unused imports in 02_mask_rcnn.ipynb (#463)

* Remove unused imports in 02_mask_rcnn.ipynb

* Add missing imports

* Simplify binary_mask() (#464)

* remove conflict code (#471)

* Update README.md (#472)

* unit test for action rec

* reformat files

* added 01/02 notebooks

* fix all unit tests + abstract out commons from action rec

* dataset

* test data

* black reformat

* refactor action rec

* ignore /data

* notebook update

* update gitignore

* manage transforms better

* tfms_config defaults

* video dataset refactor + black

* notebook update with video dataset refactored out

* Refactor model/dataset

* clean up

* refactor + beautification

* re-run 02 notebook

* make tests work locally

* pr fixes

* PR fixes

* pr fixes

* pr fixes

* update env

* pr fix

* move decord to pip

* added ref to config

* update pr fix

* flake8 + pr bug

Co-authored-by: Lixun Zhang <lixun.zhang@microsoft.com>
Co-authored-by: Lixun <lixzhang@users.noreply.github.com>
Co-authored-by: PatrickBue <pabuehle@microsoft.com>
Co-authored-by: Simon Zhao <43029286+simonzhaoms@users.noreply.github.com>
Co-authored-by: Miguel González-Fierro <3491412+miguelgfierro@users.noreply.github.com>
JS 2020-03-26 13:06:55 -04:00, committed via GitHub
Parent 198b985581
Commit 8221c1659e
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
28 changed files: 3123 additions and 1000 deletions

.gitignore (vendored)

@ -116,7 +116,6 @@ output.ipynb
# don't save any data
classification/data/*
/data/*
!/data/misc
!contrib/action_recognition/r2p1d/**
!contrib/crowd_counting/crowdcounting/data/
!scenarios/action_recognition/data


@ -1,4 +1,3 @@
#
# To create the conda environment:
# $ conda env create -f environment.yml
#
@ -36,8 +35,10 @@ dependencies:
- pre-commit>=1.14.4
- pyyaml>=5.1.2
- requests>=2.22.0
- einops==0.1.0
- cytoolz
- pip:
- decord==0.3.5
- nvidia-ml-py3
- nteract-scrapbook
- azureml-sdk[notebooks,contrib]>=1.0.30

(File diff hidden because one or more lines are too long.)

(File diff hidden because one or more lines are too long.)

(File diff hidden because one or more lines are too long.)


@ -1,316 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Video Dataset Transformation \n",
"\n",
"In this notebook, we show examples of video dataset transformation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append(\"../../\")\n",
"import os\n",
"import time\n",
"import decord\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from sklearn.metrics import accuracy_score\n",
"import torch\n",
"import torch.cuda as cuda\n",
"import torch.nn as nn\n",
"import torchvision\n",
"import urllib.request\n",
"import shutil\n",
"\n",
"from utils_cv.action_recognition.data import show_batch, VideoDataset\n",
"from utils_cv.action_recognition.model import DEFAULT_MEAN, DEFAULT_STD\n",
"from utils_cv.action_recognition import system_info\n",
"from utils_cv.action_recognition.functional_video import denormalize\n",
"from utils_cv.action_recognition.transforms_video import (\n",
" CenterCropVideo, \n",
" NormalizeVideo,\n",
" RandomCropVideo,\n",
" RandomHorizontalFlipVideo,\n",
" RandomResizedCropVideo,\n",
" ResizeVideo,\n",
" ToTensorVideo,\n",
")\n",
"\n",
"system_info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def show_clip(clip, size_factor=600):\n",
" \"\"\"Show frames in a clip\"\"\"\n",
" if isinstance(clip, torch.Tensor):\n",
" # Convert [C, T, H, W] tensor to [T, H, W, C] numpy array \n",
" clip = np.moveaxis(clip.numpy(), 0, -1)\n",
" \n",
" figsize = np.array([clip[0].shape[1]*len(clip), clip[0].shape[0]]) / size_factor\n",
" plt.tight_layout()\n",
" fig, axs = plt.subplots(1, len(clip), figsize=figsize)\n",
" for i, f in enumerate(clip):\n",
" axs[i].axis(\"off\")\n",
" axs[i].imshow(f)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare a Sample Video\n",
"A sample video path:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"url = \"https://cvbp.blob.core.windows.net/public/datasets/action_recognition/drinking.mp4\"\n",
"VIDEO_PATH = os.path.join(\"../../data/drinking.mp4\")\n",
"# Download the file from `url` and save it locally under `file_name`:\n",
"with urllib.request.urlopen(url) as response, open(VIDEO_PATH, 'wb') as out_file:\n",
" shutil.copyfileobj(response, out_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"video_reader = decord.VideoReader(VIDEO_PATH)\n",
"video_length = len(video_reader)\n",
"print(\"Video length = {} frames\".format(video_length))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use three frames (the first, middle, and the last) to quickly visualize video transformations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"clip = [\n",
" video_reader[0].asnumpy(),\n",
" video_reader[video_length//2].asnumpy(),\n",
" video_reader[video_length-1].asnumpy(),\n",
"]\n",
"show_clip(clip)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# [T, H, W, C] numpy array to [C, T, H, W] tensor\n",
"t_clip = ToTensorVideo()(torch.from_numpy(np.array(clip)))\n",
"t_clip.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video Transformations\n",
"\n",
"Resizing with the original ratio"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(ResizeVideo(size=800)(t_clip))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resizing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(ResizeVideo(size=800, keep_ratio=False)(t_clip))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Center cropping"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(CenterCropVideo(size=800)(t_clip))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Random cropping"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"random_crop = RandomCropVideo(size=800)\n",
"show_clip(random_crop(t_clip))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(random_crop(t_clip))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Random resized cropping"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"random_resized_crop = RandomResizedCropVideo(size=800)\n",
"show_clip(random_resized_crop(t_clip))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(random_resized_crop(t_clip))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Normalizing (and denormalizing to verify)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"norm_t_clip = NormalizeVideo(mean=DEFAULT_MEAN, std=DEFAULT_STD)(t_clip)\n",
"show_clip(norm_t_clip)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(denormalize(norm_t_clip, mean=DEFAULT_MEAN, std=DEFAULT_STD))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Horizontal flipping"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_clip(RandomHorizontalFlipVideo(p=.5)(t_clip))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "r2p1d",
"language": "python",
"name": "r2p1d"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

(File diff hidden because one or more lines are too long.)


@ -73,6 +73,18 @@ def path_detection_notebooks():
)
def path_action_recognition_notebooks():
""" Returns the path of the action recognition notebooks folder. """
return os.path.abspath(
os.path.join(
os.path.dirname(__file__),
os.path.pardir,
"scenarios",
"action_recognition",
)
)
# ----- Module fixtures ----------------------------------------------------------
@ -82,39 +94,33 @@ def classification_notebooks():
# Path for the notebooks
paths = {
"00_webcam": os.path.join(folder_notebooks, "00_webcam.ipynb"),
"01_training_introduction": os.path.join(
folder_notebooks, "01_training_introduction.ipynb"
),
"02_multilabel_classification": os.path.join(
"00": os.path.join(folder_notebooks, "00_webcam.ipynb"),
"01": os.path.join(folder_notebooks, "01_training_introduction.ipynb"),
"02": os.path.join(
folder_notebooks, "02_multilabel_classification.ipynb"
),
"03_training_accuracy_vs_speed": os.path.join(
"03": os.path.join(
folder_notebooks, "03_training_accuracy_vs_speed.ipynb"
),
"10_image_annotation": os.path.join(
folder_notebooks, "10_image_annotation.ipynb"
),
"11_exploring_hyperparameters": os.path.join(
"10": os.path.join(folder_notebooks, "10_image_annotation.ipynb"),
"11": os.path.join(
folder_notebooks, "11_exploring_hyperparameters.ipynb"
),
"12_hard_negative_sampling": os.path.join(
"12": os.path.join(
folder_notebooks, "12_hard_negative_sampling.ipynb"
),
"20_azure_workspace_setup": os.path.join(
folder_notebooks, "20_azure_workspace_setup.ipynb"
),
"21_deployment_on_azure_container_instances": os.path.join(
"20": os.path.join(folder_notebooks, "20_azure_workspace_setup.ipynb"),
"21": os.path.join(
folder_notebooks,
"21_deployment_on_azure_container_instances.ipynb",
),
"22_deployment_on_azure_kubernetes_service": os.path.join(
"22": os.path.join(
folder_notebooks, "22_deployment_on_azure_kubernetes_service.ipynb"
),
"23_aci_aks_web_service_testing": os.path.join(
"23": os.path.join(
folder_notebooks, "23_aci_aks_web_service_testing.ipynb"
),
"24_exploring_hyperparameters_on_azureml": os.path.join(
"24": os.path.join(
folder_notebooks, "24_exploring_hyperparameters_on_azureml.ipynb"
),
}
@ -164,6 +170,20 @@ def detection_notebooks():
return paths
@pytest.fixture(scope="module")
def action_recognition_notebooks():
folder_notebooks = path_action_recognition_notebooks()
# Path for the notebooks
paths = {
"00": os.path.join(folder_notebooks, "00_webcam.ipynb"),
"01": os.path.join(folder_notebooks, "01_training_introduction.ipynb"),
"02": os.path.join(folder_notebooks, "02_training_hmbd.ipynb"),
"10": os.path.join(folder_notebooks, "10_video_transformation.ipynb"),
}
return paths
# ----- Function fixtures ----------------------------------------------------------
@ -378,7 +398,7 @@ def od_cup_path(tmp_session) -> str:
@pytest.fixture(scope="session")
def od_cup_mask_path(tmp_session) -> str:
""" Returns the path to the downloaded cup image. """
""" Returns the path to the downloaded cup mask image. """
im_url = (
"https://cvbp.blob.core.windows.net/public/images/cvbp_cup_mask.png"
)
@ -687,6 +707,22 @@ def od_detections(od_detection_dataset):
return learner.predict_dl(od_detection_dataset.test_dl, threshold=0)
# ------|-- Action Recognition ------------------------------------------------
@pytest.fixture(scope="session")
def ar_path(tmp_session) -> str:
""" Returns the path to the downloaded action recognition sample video. """
VID_URL = "https://cvbp.blob.core.windows.net/public/datasets/action_recognition/drinking.mp4"
vid_path = os.path.join(tmp_session, "drinking.mp4")
urllib.request.urlretrieve(VID_URL, vid_path)
return vid_path
# TODO
# ----- AML Settings ----------------------------------------------------------
@pytest.fixture(scope="session")
def coco_sample_path(tmpdir_factory) -> str:
""" Returns the path to a coco-formatted annotation. """
@ -695,9 +731,6 @@ def coco_sample_path(tmpdir_factory) -> str:
return path
# ----- AML Settings ----------------------------------------------------------
# TODO i can't find where this function is being used
def pytest_addoption(parser):
parser.addoption(
@ -767,3 +800,4 @@ def tiny_ic_databunch_valid_features(tiny_ic_databunch):
tiny_ic_databunch, DatasetType.Valid, learn, embedding_layer
)
return features
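For context, a minimal sketch of how a test could consume the new ar_path fixture above (the test name and assertion are hypothetical, not part of this PR):

import decord

def test_ar_path_video_readable(ar_path):
    # ar_path is the session-scoped fixture that downloads drinking.mp4
    reader = decord.VideoReader(ar_path)
    # the sample clip should contain at least one decodable frame
    assert len(reader) > 0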


@ -13,7 +13,7 @@ OUTPUT_NOTEBOOK = "output.ipynb"
@pytest.mark.notebooks
@pytest.mark.linuxgpu
def test_01_notebook_run(classification_notebooks):
notebook_path = classification_notebooks["01_training_introduction"]
notebook_path = classification_notebooks["01"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -30,7 +30,7 @@ def test_01_notebook_run(classification_notebooks):
@pytest.mark.notebooks
@pytest.mark.linuxgpu
def test_02_notebook_run(classification_notebooks):
notebook_path = classification_notebooks["02_multilabel_classification"]
notebook_path = classification_notebooks["02"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -48,7 +48,7 @@ def test_02_notebook_run(classification_notebooks):
@pytest.mark.notebooks
@pytest.mark.linuxgpu
def test_03_notebook_run(classification_notebooks):
notebook_path = classification_notebooks["03_training_accuracy_vs_speed"]
notebook_path = classification_notebooks["03"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -65,7 +65,7 @@ def test_03_notebook_run(classification_notebooks):
@pytest.mark.notebooks
@pytest.mark.linuxgpu
def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["11_exploring_hyperparameters"]
notebook_path = classification_notebooks["11"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -91,7 +91,7 @@ def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
@pytest.mark.notebooks
@pytest.mark.linuxgpu
def test_12_notebook_run(classification_notebooks):
notebook_path = classification_notebooks["12_hard_negative_sampling"]
notebook_path = classification_notebooks["12"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,


@ -23,7 +23,7 @@ def test_ic_20_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = classification_notebooks["20_azure_workspace_setup"]
notebook_path = classification_notebooks["20"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -46,9 +46,7 @@ def test_ic_21_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = classification_notebooks[
"21_deployment_on_azure_container_instances"
]
notebook_path = classification_notebooks["21"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -71,9 +69,7 @@ def test_ic_22_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = classification_notebooks[
"22_deployment_on_azure_kubernetes_service"
]
notebook_path = classification_notebooks["22"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -96,7 +92,7 @@ def test_ic_23_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = classification_notebooks["23_aci_aks_web_service_testing"]
notebook_path = classification_notebooks["23"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -119,9 +115,7 @@ def test_ic_24_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = classification_notebooks[
"24_exploring_hyperparameters_on_azureml"
]
notebook_path = classification_notebooks["24"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -180,7 +174,7 @@ def test_od_20_notebook_run(
workspace_name,
workspace_region,
):
notebook_path = detection_notebooks["20_deployment_on_kubernetes"]
notebook_path = detection_notebooks["20"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,


@ -0,0 +1,23 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import os
from utils_cv.action_recognition.data import (
_DatasetSpec,
Urls,
)
from utils_cv.common.data import data_path
def test__DatasetSpec_kinetics():
""" Tests that _DatasetSpec initializes with the Kinetics-400 classes """
kinetics = _DatasetSpec(Urls.kinetics_label_map, 400)
kinetics.class_names
assert os.path.exists(str(data_path() / "label_map.txt"))
def test__DatasetSpec_hmdb():
""" Tests that _DatasetSpec initializes with the HMDB51 classes """
hmdb51 = _DatasetSpec(Urls.hmdb51_label_map, 51)
hmdb51.class_names
assert os.path.exists(str(data_path() / "label_map.txt"))


@ -0,0 +1,70 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# This test is based on the test suite implemented for Recommenders project
# https://github.com/Microsoft/Recommenders/tree/master/tests
import os
import papermill as pm
import pytest
import scrapbook as sb
# Unless manually modified, python3 should be
# the name of the current jupyter kernel
# that runs on the activated conda environment
KERNEL_NAME = "python3"
OUTPUT_NOTEBOOK = "output.ipynb"
@pytest.mark.notebooks
def test_00_notebook_run(action_recognition_notebooks):
notebook_path = action_recognition_notebooks["00"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
parameters=dict(PM_VERSION=pm.__version__),
kernel_name=KERNEL_NAME,
)
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
# TODO add some asserts like below
# assert nb_output.scraps["predicted_label"].data == "coffee_mug"
# assert nb_output.scraps["predicted_confidence"].data > 0.5
@pytest.mark.notebooks
def test_01_notebook_run(action_recognition_notebooks):
# TODO - this notebook relies on downloading hmdb51, so pass for now
pass
# notebook_path = action_recognition_notebooks["01"]
# pm.execute_notebook(
# notebook_path,
# OUTPUT_NOTEBOOK,
# parameters=dict(PM_VERSION=pm.__version__),
# kernel_name=KERNEL_NAME,
# )
# nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
# TODO add some asserts like below
# assert len(nb_output.scraps["training_accuracies"].data) == 1
@pytest.mark.notebooks
def test_02_notebook_run(action_recognition_notebooks):
pass
@pytest.mark.notebooks
def test_10_notebook_run(action_recognition_notebooks):
notebook_path = action_recognition_notebooks["10"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
parameters=dict(PM_VERSION=pm.__version__),
kernel_name=KERNEL_NAME,
)
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
# TODO add some asserts like below
# assert len(nb_output.scraps["training_accuracies"].data) == 1
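For reference, a hedged sketch of the scrapbook pattern the TODOs above refer to: the notebook glues a value and the test reads it back (the scrap name vid_pred_accuracy is hypothetical):

# inside a notebook cell (hypothetical scrap name):
#     sb.glue("vid_pred_accuracy", accuracy)
# inside the test, after pm.execute_notebook(...):
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
assert nb_output.scraps["vid_pred_accuracy"].data > 0.0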


@ -18,7 +18,7 @@ OUTPUT_NOTEBOOK = "output.ipynb"
@pytest.mark.notebooks
def test_00_notebook_run(classification_notebooks):
notebook_path = classification_notebooks["00_webcam"]
notebook_path = classification_notebooks["00"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -33,7 +33,7 @@ def test_00_notebook_run(classification_notebooks):
@pytest.mark.notebooks
def test_01_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["01_training_introduction"]
notebook_path = classification_notebooks["01"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -52,7 +52,7 @@ def test_01_notebook_run(classification_notebooks, tiny_ic_data_path):
@pytest.mark.notebooks
def test_02_notebook_run(classification_notebooks, multilabel_ic_data_path):
notebook_path = classification_notebooks["02_multilabel_classification"]
notebook_path = classification_notebooks["02"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -71,7 +71,7 @@ def test_02_notebook_run(classification_notebooks, multilabel_ic_data_path):
@pytest.mark.notebooks
def test_03_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["03_training_accuracy_vs_speed"]
notebook_path = classification_notebooks["03"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -93,7 +93,7 @@ def test_03_notebook_run(classification_notebooks, tiny_ic_data_path):
@pytest.mark.notebooks
def test_10_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["10_image_annotation"]
notebook_path = classification_notebooks["10"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -110,7 +110,7 @@ def test_10_notebook_run(classification_notebooks, tiny_ic_data_path):
@pytest.mark.notebooks
def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["11_exploring_hyperparameters"]
notebook_path = classification_notebooks["11"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
@ -131,7 +131,7 @@ def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
@pytest.mark.notebooks
def test_12_notebook_run(classification_notebooks, tiny_ic_data_path):
notebook_path = classification_notebooks["12_hard_negative_sampling"]
notebook_path = classification_notebooks["12"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,


@ -8,6 +8,7 @@ from utils_cv.common.gpu import (
is_linux,
is_windows,
which_processor,
system_info,
)
@ -39,3 +40,7 @@ def test_db_num_workers():
else:
assert db_num_workers() == 16
assert db_num_workers(non_windows_num_workers=7) == 7
def test_system_info():
system_info()


@ -2,12 +2,13 @@
# Licensed under the MIT License.
import os
import pytest
from pathlib import Path
from PIL import ImageFont
from fastai.vision import ImageList
from utils_cv.common.gpu import db_num_workers
from utils_cv.common.misc import copy_files, set_random_seed, get_font
from utils_cv.common.misc import copy_files, set_random_seed, get_font, Config
def test_set_random_seed(tiny_ic_data_path):
@ -75,3 +76,21 @@ def test_get_font():
type(font) == ImageFont.FreeTypeFont
or type(font) == ImageFont.ImageFont
)
def test_Config():
# test dictionary wrapper to make sure keys can be accessed as attributes
cfg = Config({"lr": 0.01, "momentum": 0.95})
assert cfg.lr == 0.01 and cfg.momentum == 0.95
cfg = Config(lr=0.01, momentum=0.95)
assert cfg.lr == 0.01 and cfg.momentum == 0.95
cfg = Config({"lr": 0.01}, momentum=0.95)
assert cfg.lr == 0.01 and cfg.momentum == 0.95
cfg_wrapper = Config(cfg, epochs=3)
assert (
cfg_wrapper.lr == 0.01
and cfg_wrapper.momentum == 0.95
and cfg_wrapper.epochs == 3
)
with pytest.raises(ValueError):
Config(3)


@ -1 +0,0 @@
from .common import Config, system_info


@ -1,51 +0,0 @@
# Copyright (c) Microsoft
# Licensed under the MIT License.
import sys
import torch
import torch.cuda as cuda
import torchvision
class Config(object):
def __init__(self, config=None, **extras):
"""Dictionary wrapper to access keys as attributes.
Args:
config (dict or Config): Configurations
extras (kwargs): Extra configurations
Examples:
>>> cfg = Config({'lr': 0.01}, momentum=0.95)
or
>>> cfg = Config({'lr': 0.01, 'momentum': 0.95})
then, use as follows:
>>> print(cfg.lr, cfg.momentum)
"""
if config is not None:
if isinstance(config, dict):
for k in config:
setattr(self, k, config[k])
elif isinstance(config, self.__class__):
self.__dict__ = config.__dict__.copy()
else:
raise ValueError("Unknown config")
for k, v in extras.items():
setattr(self, k, v)
def get(self, key, default):
return getattr(self, key, default)
def system_info():
print(sys.version, "\n")
print("PyTorch {}".format(torch.__version__), "\n")
print("Torch-vision {}".format(torchvision.__version__), "\n")
print("Available devices:")
if cuda.is_available():
for i in range(cuda.device_count()):
print("{}: {}".format(i, cuda.get_device_name(i)))
else:
print("CPUs")


@ -3,40 +3,35 @@
import os
from pathlib import Path
from typing import Union, List
from urllib.request import urlretrieve
import warnings
import decord
from einops.layers.torch import Rearrange
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import randint
import torch
from torch.utils.data import Dataset
from torchvision.transforms import Compose
from . import transforms_video as transforms
from .functional_video import denormalize
DEFAULT_MEAN = (0.43216, 0.394666, 0.37645)
DEFAULT_STD = (0.22803, 0.22145, 0.216989)
from ..common.data import data_path
class _DatasetSpec:
def __init__(self, label_url, root, num_classes):
""" Properties of a Video Dataset. """
def __init__(
self,
label_url: str,
num_classes: int,
data_path: Union[Path, str] = data_path(),
) -> None:
self.label_url = label_url
self.root = root
self.num_classes = num_classes
self.data_path = data_path
self._class_names = None
@property
def class_names(self):
def class_names(self) -> List[str]:
if self._class_names is None:
label_filepath = os.path.join(self.root, "label_map.txt")
label_filepath = os.path.join(self.data_path, "label_map.txt")
if not os.path.isfile(label_filepath):
os.makedirs(self.root, exist_ok=True)
urlretrieve(self.label_url, label_filepath)
os.makedirs(self.data_path, exist_ok=True)
else:
os.remove(label_filepath)
urlretrieve(self.label_url, label_filepath)
with open(label_filepath) as f:
self._class_names = [l.strip() for l in f]
assert len(self._class_names) == self.num_classes
@ -44,259 +39,15 @@ class _DatasetSpec:
return self._class_names
class Urls:
kinetics_label_map = "https://github.com/microsoft/ComputerVision/files/3746975/kinetics400_lable_map.txt"
hmdb51_label_map = "https://github.com/microsoft/ComputerVision/files/3746963/hmdb51_label_map.txt"
KINETICS = _DatasetSpec(
"https://github.com/microsoft/ComputerVision/files/3746975/kinetics400_lable_map.txt",
os.path.join("data", "kinetics400"),
400
Urls.kinetics_label_map, 400, os.path.join("data", "kinetics400"),
)
HMDB51 = _DatasetSpec(
"https://github.com/microsoft/ComputerVision/files/3746963/hmdb51_label_map.txt",
os.path.join("data", "hmdb51"),
51
Urls.hmdb51_label_map, 51, os.path.join("data", "hmdb51"),
)
class VideoRecord(object):
def __init__(self, row):
self._data = row
self._num_frames = -1
@property
def path(self):
return self._data[0]
@property
def num_frames(self):
if self._num_frames == -1:
self._num_frames = int(len([x for x in Path(self._data[0]).glob('img_*')]) - 1)
return self._num_frames
@property
def label(self):
return int(self._data[1])
class VideoDataset(Dataset):
"""
Args:
split_file (str): Annotation file containing video filenames and labels.
video_dir (str): Videos directory.
num_segments (int): Number of clips to sample from each video.
sample_length (int): Number of consecutive frames to sample from a video (i.e. clip length).
sample_step (int): Sampling step.
input_size (int or tuple): Model input image size.
im_scale (int or tuple): Resize target size.
resize_keep_ratio (bool): If True, keep the original ratio when resizing.
mean (tuple): Normalization mean.
std (tuple): Normalization std.
random_shift (bool): Random temporal shift when sample a clip.
temporal_jitter (bool): Randomly skip frames when sampling each frames.
flip_ratio (float): Horizontal flip ratio.
random_crop (bool): If False, do center-crop.
random_crop_scales (tuple): Range of size of the origin size random cropped.
video_ext (str): Video file extension.
warning (bool): On or off warning.
"""
def __init__(
self,
split_file,
video_dir,
num_segments=1,
sample_length=8,
sample_step=1,
input_size=112,
im_scale=128,
resize_keep_ratio=True,
mean=DEFAULT_MEAN,
std=DEFAULT_STD,
random_shift=False,
temporal_jitter=False,
flip_ratio=0.5,
random_crop=False,
random_crop_scales=(0.6, 1.0),
video_ext="mp4",
warning=False,
):
# TODO maybe check wrong arguments to early failure
assert sample_step > 0
assert num_segments > 0
self.video_dir = video_dir
self.video_records = [
VideoRecord(x.strip().split(" ")) for x in open(split_file)
]
self.num_segments = num_segments
self.sample_length = sample_length
self.sample_step = sample_step
self.presample_length = sample_length * sample_step
# Temporal noise
self.random_shift = random_shift
self.temporal_jitter = temporal_jitter
# Video transforms
# 1. resize
trfms = [
transforms.ToTensorVideo(),
transforms.ResizeVideo(im_scale, resize_keep_ratio),
]
# 2. crop
if random_crop:
if random_crop_scales is not None:
crop = transforms.RandomResizedCropVideo(input_size, random_crop_scales)
else:
crop = transforms.RandomCropVideo(input_size)
else:
crop = transforms.CenterCropVideo(input_size)
trfms.append(crop)
# 3. flip
trfms.append(transforms.RandomHorizontalFlipVideo(flip_ratio))
# 4. normalize
trfms.append(transforms.NormalizeVideo(mean, std))
self.transforms = Compose(trfms)
self.video_ext = video_ext
self.warning = warning
def __len__(self):
return len(self.video_records)
def _sample_indices(self, record):
"""
Args:
record (VideoRecord): A video record.
Return:
list: Segment offsets (start indices)
"""
if record.num_frames > self.presample_length:
if self.random_shift:
# Random sample
offsets = np.sort(
randint(
record.num_frames - self.presample_length + 1,
size=self.num_segments,
)
)
else:
# Uniform sample
distance = (record.num_frames - self.presample_length + 1) / self.num_segments
offsets = np.array(
[int(distance / 2.0 + distance * x) for x in range(self.num_segments)]
)
else:
if self.warning:
warnings.warn(
"num_segments and/or sample_length > num_frames in {}".format(
record.path
)
)
offsets = np.zeros((self.num_segments,), dtype=int)
return offsets
def _get_frames(self, video_reader, offset):
clip = list()
# decord.seek() seems to have a bug. use seek_accurate().
video_reader.seek_accurate(offset)
# first frame
clip.append(video_reader.next().asnumpy())
# remaining frames
try:
if self.temporal_jitter:
for i in range(self.sample_length - 1):
step = randint(self.sample_step + 1)
if step == 0:
clip.append(clip[-1].copy())
else:
if step > 1:
video_reader.skip_frames(step - 1)
cur_frame = video_reader.next().asnumpy()
if len(cur_frame.shape) != 3:
# maybe end of the video
break
clip.append(cur_frame)
else:
for i in range(self.sample_length - 1):
if self.sample_step > 1:
video_reader.skip_frames(self.sample_step - 1)
cur_frame = video_reader.next().asnumpy()
if len(cur_frame.shape) != 3:
# maybe end of the video
break
clip.append(cur_frame)
except StopIteration:
pass
# if clip needs more frames, simply duplicate the last frame in the clip.
while len(clip) < self.sample_length:
clip.append(clip[-1].copy())
return clip
def __getitem__(self, idx):
"""
Return:
clips (torch.tensor), label (int)
"""
record = self.video_records[idx]
video_reader = decord.VideoReader(
"{}.{}".format(os.path.join(self.video_dir, record.path), self.video_ext),
# TODO try to add `ctx=decord.ndarray.gpu(0) or .cuda(0)`
)
record._num_frames = len(video_reader)
offsets = self._sample_indices(record)
clips = np.array([self._get_frames(video_reader, o) for o in offsets])
if self.num_segments == 1:
# [T, H, W, C] -> [C, T, H, W]
return self.transforms(torch.from_numpy(clips[0])), record.label
else:
# [S, T, H, W, C] -> [S, C, T, H, W]
return (
torch.stack([
self.transforms(torch.from_numpy(c)) for c in clips
]),
record.label
)
def show_batch(batch, sample_length, mean=DEFAULT_MEAN, std=DEFAULT_STD):
"""
Args:
batch (list[torch.tensor]): List of sample (clip) tensors
sample_length (int): Number of frames to show for each sample
mean (tuple): Normalization mean
std (tuple): Normalization std-dev
"""
batch_size = len(batch)
plt.tight_layout()
fig, axs = plt.subplots(
batch_size,
sample_length,
figsize=(4 * sample_length, 3 * batch_size)
)
for i, ax in enumerate(axs):
if batch_size == 1:
clip = batch[0]
else:
clip = batch[i]
clip = Rearrange("c t h w -> t c h w")(clip)
if not isinstance(ax, np.ndarray):
ax = [ax]
for j, a in enumerate(ax):
a.axis("off")
a.imshow(
np.moveaxis(
denormalize(
clip[j],
mean,
std,
).numpy(),
0,
-1,
)
)


@ -0,0 +1,498 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import os
import copy
from pathlib import Path
import warnings
from typing import Callable, Tuple, Union, List
import decord
from einops.layers.torch import Rearrange
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import randint
import torch
from torch.utils.data import Dataset, Subset, DataLoader
from torchvision.transforms import Compose
from .references import transforms_video as transforms
from .references.functional_video import denormalize
from ..common.misc import Config
from ..common.gpu import num_devices
Trans = Callable[[object, dict], Tuple[object, dict]]
DEFAULT_MEAN = (0.43216, 0.394666, 0.37645)
DEFAULT_STD = (0.22803, 0.22145, 0.216989)
class VideoRecord(object):
"""
This class is used for parsing split-files where each row contains a path
and a label:
Ex:
```
path/to/my/clip.mp4 3
path/to/another/clip.mp4 32
```
"""
def __init__(self, data: List[str]):
""" Initializes a VideoRecord
Args:
data: a list whose first element is the path and whose second element is
the label
"""
self._data = data
self._num_frames = None
@property
def path(self) -> str:
return self._data[0]
@property
def num_frames(self) -> int:
if self._num_frames is None:
self._num_frames = int(
len([x for x in Path(self._data[0]).glob("img_*")]) - 1
)
return self._num_frames
@property
def label(self) -> int:
return int(self._data[1])
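As a hedged illustration (the path and label below are made up), a split-file row parsed into a VideoRecord behaves as follows:

row = "path/to/my/clip.mp4 3".strip().split(" ")
record = VideoRecord(row)
assert record.path == "path/to/my/clip.mp4"
assert record.label == 3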
def get_transforms(train: bool, tfms_config: Config = None) -> Trans:
""" Get default transformations to apply depending on whether we're applying it to the training or the validation set. If no tfms configurations are passed in, use the defaults.
Args:
train: whether or not this is for training
tfms_config: Config object with transforms-related configs
Returns:
A composed transform pipeline to apply
"""
if tfms_config is None:
tfms_config = (
get_default_tfms_config(train=True)
if train
else get_default_tfms_config(train=False)
)
# 1. resize
tfms = [
transforms.ToTensorVideo(),
transforms.ResizeVideo(
tfms_config.im_scale, tfms_config.resize_keep_ratio
),
]
# 2. crop
if tfms_config.random_crop:
if tfms_config.random_crop_scales:
crop = transforms.RandomResizedCropVideo(
tfms_config.input_size, tfms_config.random_crop_scales
)
else:
crop = transforms.RandomCropVideo(tfms_config.input_size)
else:
crop = transforms.CenterCropVideo(tfms_config.input_size)
tfms.append(crop)
# 3. flip
tfms.append(transforms.RandomHorizontalFlipVideo(tfms_config.flip_ratio))
# 4. normalize
tfms.append(transforms.NormalizeVideo(tfms_config.mean, tfms_config.std))
return Compose(tfms)
def get_default_tfms_config(train: bool) -> Config:
"""
Args:
train: whether or not this is for training
Settings:
input_size (int or tuple): Model input image size.
im_scale (int or tuple): Resize target size.
resize_keep_ratio (bool): If True, keep the original ratio when resizing.
mean (tuple): Normalization mean.
std (tuple): Normalization std.
if train:
flip_ratio (float): Horizontal flip ratio.
random_crop (bool): If False, do center-crop.
random_crop_scales (tuple): Range of the original size to randomly crop.
"""
flip_ratio = 0.5 if train else 0.0
random_crop = True if train else False
random_crop_scales = (0.6, 1.0) if train else None
return Config(
dict(
input_size=112,
im_scale=128,
resize_keep_ratio=True,
mean=DEFAULT_MEAN,
std=DEFAULT_STD,
flip_ratio=flip_ratio,
random_crop=random_crop,
random_crop_scales=random_crop_scales,
)
)
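A short usage sketch (the overridden values are illustrative, not defaults mandated by this PR): start from the default training config, tweak a couple of settings, and build the transform pipeline.

custom_cfg = get_default_tfms_config(train=True)
custom_cfg.flip_ratio = 0.0    # e.g. disable horizontal flips
custom_cfg.im_scale = 156      # e.g. resize to a larger scale before cropping
train_tfms = get_transforms(train=True, tfms_config=custom_cfg)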
class VideoDataset:
""" A video recognition dataset. """
def __init__(
self,
root: str,
train_pct: float = 0.75,
num_samples: int = 1,
sample_length: int = 8,
sample_step: int = 1,
temporal_jitter: bool = True,
temporal_jitter_step: int = 2,
random_shift: bool = True,
batch_size: int = 8,
video_ext: str = "mp4",
warning: bool = False,
train_split_file: str = None,
test_split_file: str = None,
train_transforms: Trans = get_transforms(train=True),
test_transforms: Trans = get_transforms(train=False),
) -> None:
""" Initializes the dataset.
Args:
root: Videos directory.
train_pct: percentage of dataset to use for training
num_samples: Number of clips to sample from each video.
sample_length: Number of consecutive frames to sample from a video (i.e. clip length).
sample_step: Sampling step.
temporal_jitter: Randomly skip frames when sampling each frame.
temporal_jitter_step: temporal jitter step in frames
random_shift: Apply a random temporal shift when sampling a clip.
video_ext: Video file extension.
warning: Whether to show warnings.
train_split_file: Annotation file containing video filenames and labels.
test_split_file: Annotation file containing video filenames and labels.
train_transforms: transforms for training
test_transforms: transforms for testing
"""
# TODO check wrong arguments early to prevent failure
assert sample_step > 0
assert num_samples > 0
if temporal_jitter:
assert temporal_jitter_step > 0
if train_split_file:
assert Path(train_split_file).exists()
assert (
test_split_file is not None and Path(test_split_file).exists()
)
if test_split_file:
assert Path(test_split_file).exists()
assert (
train_split_file is not None
and Path(train_split_file).exists()
)
self.root = root
self.num_samples = num_samples
self.sample_length = sample_length
self.sample_step = sample_step
self.presample_length = sample_length * sample_step
self.temporal_jitter_step = temporal_jitter_step
self.train_transforms = train_transforms
self.test_transforms = test_transforms
self.random_shift = random_shift
self.temporal_jitter = temporal_jitter
self.batch_size = batch_size
self.video_ext = video_ext
self.warning = warning
# create training and validation datasets
self.train_ds, self.test_ds = (
self.split_with_file(
train_split_file=train_split_file,
test_split_file=test_split_file,
)
if train_split_file
else self.split_train_test(train_pct=train_pct)
)
# initialize dataloaders
self.init_data_loaders()
def split_train_test(
self, train_pct: float = 0.8
) -> Tuple[Dataset, Dataset]:
""" Split this dataset into a training and testing set
Args:
train_pct: the ratio of videos to use for training vs
testing
Return
A training and testing dataset in that order
"""
pass
def split_with_file(
self,
train_split_file: Union[Path, str],
test_split_file: Union[Path, str],
) -> Tuple[Dataset, Dataset]:
""" Split this dataset into a training and testing set using a split file.
Each line in the split file must use the form:
```
path/to/jumping/video.mp4 3
path/to/swimming/video.mp4 5
path/to/another/jumping/video.mp4 3
```
Args:
train_split_file: path to the training split file
test_split_file: path to the testing split file
Return:
A training and testing dataset in that order
"""
self.video_records = []
# add train records
self.video_records.extend(
[
VideoRecord(row.strip().split(" "))
for row in open(train_split_file)
]
)
train_len = len(self.video_records)
# add validation records
self.video_records.extend(
[
VideoRecord(row.strip().split(" "))
for row in open(test_split_file)
]
)
# create indices
indices = torch.arange(0, len(self.video_records))
train_range = indices[:train_len]
test_range = indices[train_len:]
# create train subset
train = copy.deepcopy(Subset(self, train_range))
train.dataset.transforms = self.train_transforms
train.dataset.sample_step = (
self.temporal_jitter_step
if self.temporal_jitter
else self.sample_step
)
train.dataset.presample_length = self.sample_length * self.sample_step
# create test subset
test = copy.deepcopy(Subset(self, test_range))
test.dataset.transforms = self.test_transforms
test.dataset.random_shift = False
test.dataset.temporal_jitter = False
return train, test
def init_data_loaders(self) -> None:
""" Create training and validation data loaders. """
devices = num_devices()
self.train_dl = DataLoader(
self.train_ds,
batch_size=self.batch_size * devices,
shuffle=True,
num_workers=0,  # Torch 1.2 has a bug when num_workers > 0 (0 means load in the main process)
pin_memory=True,
)
self.test_dl = DataLoader(
self.test_ds,
batch_size=self.batch_size * devices,
shuffle=False,
num_workers=0,
pin_memory=True,
)
def __len__(self) -> int:
return len(self.video_records)
def _sample_indices(self, record: VideoRecord) -> List[int]:
"""
Create a list of frame-wise offsets into a video record. Depending on
whether or not 'random shift' is used, perform a uniform sample or a
random sample.
Args:
record (VideoRecord): A video record.
Return:
list: Segment offsets (start indices)
"""
if record.num_frames > self.presample_length:
if self.random_shift:
# Random sample
offsets = np.sort(
randint(
record.num_frames - self.presample_length + 1,
size=self.num_samples,
)
)
else:
# Uniform sample
distance = (
record.num_frames - self.presample_length + 1
) / self.num_samples
offsets = np.array(
[
int(distance / 2.0 + distance * x)
for x in range(self.num_samples)
]
)
else:
if self.warning:
warnings.warn(
f"num_samples and/or sample_length > num_frames in {record.path}"
)
offsets = np.zeros((self.num_samples,), dtype=int)
return offsets
def _get_frames(
self, video_reader: decord.VideoReader, offset: int,
) -> List[np.ndarray]:
""" Get frames at sample length.
Args:
video_reader: the decord tool for parsing videos
offset: where to start the reader from
Returns
Frames at sample length in a List
"""
clip = list()
# decord.seek() seems to have a bug. use seek_accurate().
video_reader.seek_accurate(offset)
# first frame
clip.append(video_reader.next().asnumpy())
# remaining frames
try:
for i in range(self.sample_length - 1):
step = (
randint(self.sample_step + 1)
if self.temporal_jitter
else self.sample_step
)
if step == 0 and self.temporal_jitter:
clip.append(clip[-1].copy())
else:
if step > 1:
video_reader.skip_frames(step - 1)
cur_frame = video_reader.next().asnumpy()
clip.append(cur_frame)
except StopIteration:
# pass when video has ended
pass
# if clip needs more frames, simply duplicate the last frame in the clip.
while len(clip) < self.sample_length:
clip.append(clip[-1].copy())
return clip
def __getitem__(self, idx: int) -> Tuple[torch.tensor, int]:
"""
Return:
clips (torch.tensor), label (int)
"""
record = self.video_records[idx]
video_reader = decord.VideoReader(
"{}.{}".format(
os.path.join(self.root, record.path), self.video_ext
),
# TODO try to add `ctx=decord.ndarray.gpu(0) or .cuda(0)`
)
record._num_frames = len(video_reader)
offsets = self._sample_indices(record)
clips = np.array([self._get_frames(video_reader, o) for o in offsets])
if self.num_samples == 1:
# [T, H, W, C] -> [C, T, H, W]
return self.transforms(torch.from_numpy(clips[0])), record.label
else:
# [S, T, H, W, C] -> [S, C, T, H, W]
return (
torch.stack(
[self.transforms(torch.from_numpy(c)) for c in clips]
),
record.label,
)
def _show_batch(
self,
batch: List[torch.tensor],
sample_length: int,
mean: Tuple[int, int, int] = DEFAULT_MEAN,
std: Tuple[int, int, int] = DEFAULT_STD,
) -> None:
"""
Display a batch of images.
Args:
batch: List of sample (clip) tensors
sample_length: Number of frames to show for each sample
mean: Normalization mean
std: Normalization std-dev
"""
batch_size = len(batch)
plt.tight_layout()
fig, axs = plt.subplots(
batch_size,
sample_length,
figsize=(4 * sample_length, 3 * batch_size),
)
for i, ax in enumerate(axs):
if batch_size == 1:
clip = batch[0]
else:
clip = batch[i]
clip = Rearrange("c t h w -> t c h w")(clip)
if not isinstance(ax, np.ndarray):
ax = [ax]
for j, a in enumerate(ax):
a.axis("off")
a.imshow(
np.moveaxis(denormalize(clip[j], mean, std).numpy(), 0, -1)
)
pass
def show_batch(self, train_or_test: str = "train", rows: int = 1) -> None:
"""Plot first few samples in the datasets"""
if train_or_test == "train":
batch = [self.train_ds.dataset[i][0] for i in range(rows)]
elif train_or_test == "valid":
batch = [self.test_ds.dataset[i][0] for i in range(rows)]
else:
raise ValueError("Unknown data type {}".format(train_or_test))
self._show_batch(batch, self.sample_length)
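Putting it together, a hedged end-to-end sketch of constructing the refactored VideoDataset from split files and previewing a batch (directory and file names are hypothetical):

data = VideoDataset(
    root="videos",                        # folder holding the video files
    train_split_file="train_split.txt",   # rows of "relative/clip_name label" (extension added via video_ext)
    test_split_file="test_split.txt",
    sample_length=8,
    batch_size=8,
)
print(len(data.train_ds), len(data.test_ds))
data.show_batch(train_or_test="train", rows=2)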


@ -5,198 +5,137 @@ from collections import OrderedDict
import os
import time
import warnings
from typing import Union
from pathlib import Path
try:
from apex import amp
AMP_AVAILABLE = True
except ModuleNotFoundError:
AMP_AVAILABLE = False
from IPython.core.debugger import set_trace
import numpy as np
import torch
import torch.cuda as cuda
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
from . import Config
from .data import (
DEFAULT_MEAN,
DEFAULT_STD,
show_batch as _show_batch,
VideoDataset,
)
from ..common.misc import Config
from ..common.gpu import torch_device, num_devices
from .dataset import VideoDataset
from .metrics import accuracy, AverageMeter
from .references.metrics import accuracy, AverageMeter
# From https://github.com/moabitcoin/ig65m-pytorch
TORCH_R2PLUS1D = "moabitcoin/ig65m-pytorch"
# These parameters are set so that we can use torch hub to download pretrained
# models from the specified repo
TORCH_R2PLUS1D = "moabitcoin/ig65m-pytorch" # From https://github.com/moabitcoin/ig65m-pytorch
MODELS = {
# model: output classes
'r2plus1d_34_32_ig65m': 359,
'r2plus1d_34_32_kinetics': 400,
'r2plus1d_34_8_ig65m': 487,
'r2plus1d_34_8_kinetics': 400,
# Model name followed by the number of output classes.
"r2plus1d_34_32_ig65m": 359,
"r2plus1d_34_32_kinetics": 400,
"r2plus1d_34_8_ig65m": 487,
"r2plus1d_34_8_kinetics": 400,
}
class R2Plus1D(object):
def __init__(self, cfgs):
self.configs = Config(cfgs)
self.train_ds, self.valid_ds = self.load_datasets(self.configs)
self.model = self.init_model(
self.configs.sample_length,
self.configs.base_model,
self.configs.num_classes
class VideoLearner(object):
""" Video recognition learner object that handles training loop and evaluation. """
def __init__(
self,
dataset: VideoDataset,
num_classes: int,  # e.g. 51 for hmdb51
base_model: str = "ig65m",  # or "kinetics"
) -> None:
""" By default, the Video Learner will use an R2plus1D model. Pass in
a dataset of type VideoDataset and the Video Learner will initialize
the model.
Args:
dataset: the dataset to use for this model
num_classes: the number of actions/classifications
base_model: the R2plus1D model is based on either ig65m or
kinetics. By default it will use the weights from ig65m since they
tend to attain higher accuracy.
"""
self.dataset = dataset
self.model, self.model_name = self.init_model(
self.dataset.sample_length, base_model, num_classes,
)
self.model_name = "r2plus1d_34_{}_{}".format(self.configs.sample_length, self.configs.base_model)
@staticmethod
def init_model(sample_length, base_model, num_classes=None):
if base_model not in ('ig65m', 'kinetics'):
def init_model(
sample_length: int, base_model: str, num_classes: int = None
) -> torchvision.models.video.resnet.VideoResNet:
"""
Initializes the model by loading it using torch's `hub.load`
functionality. Uses the model from TORCH_R2PLUS1D.
Args:
sample_length: Number of consecutive frames to sample from a video (i.e. clip length).
base_model: the R2plus1D model is based on either ig65m or kinetics.
num_classes: the number of classes/actions
Returns:
The loaded model with pretrained weights, together with the model name.
"""
if base_model not in ("ig65m", "kinetics"):
raise ValueError(
"Not supported model {}. Should be 'ig65m' or 'kinetics'"
.format(base_model)
f"Not supported model {base_model}. Should be 'ig65m' or 'kinetics'"
)
# Decide whether to use the pretrained weights from the DNN trained on 8 or on 32 frames
if sample_length<=8:
if sample_length <= 8:
model_sample_length = 8
else:
model_sample_length = 32
model_name = "r2plus1d_34_{}_{}".format(model_sample_length, base_model)
print("Loading {} model".format(model_name))
model_name = f"r2plus1d_34_{model_sample_length}_{base_model}"
print(f"Loading {model_name} model")
model = torch.hub.load(
TORCH_R2PLUS1D, model_name, num_classes=MODELS[model_name], pretrained=True
TORCH_R2PLUS1D,
model_name,
num_classes=MODELS[model_name],
pretrained=True,
)
# Replace head
if num_classes is not None:
model.fc = nn.Linear(model.fc.in_features, num_classes)
return model
@staticmethod
def load_datasets(cfgs):
"""Load VideoDataset
return model, model_name
Args:
cfgs (dict or Config): Dataset configuration. For validation dataset,
data augmentation such as random shift and temporal jitter is not used.
Return:
VideoDataset, VideoDataset: Train and validation datasets.
If split file is not provided, returns None.
"""
cfgs = Config(cfgs)
train_split = cfgs.get('train_split', None)
train_ds = None if train_split is None else VideoDataset(
split_file=train_split,
video_dir=cfgs.video_dir,
num_segments=1,
sample_length=cfgs.sample_length,
sample_step=cfgs.get('temporal_jitter_step', cfgs.get('sample_step', 1)),
input_size=112,
im_scale=cfgs.get('im_scale', 128),
resize_keep_ratio=cfgs.get('resize_keep_ratio', True),
mean=cfgs.get('mean', DEFAULT_MEAN),
std=cfgs.get('std', DEFAULT_STD),
random_shift=cfgs.get('random_shift', True),
temporal_jitter=True if cfgs.get('temporal_jitter_step', 0) > 0 else False,
flip_ratio=cfgs.get('flip_ratio', 0.5),
random_crop=cfgs.get('random_crop', True),
random_crop_scales=cfgs.get('random_crop_scales', (0.6, 1.0)),
video_ext=cfgs.video_ext,
)
valid_split = cfgs.get('valid_split', None)
valid_ds = None if valid_split is None else VideoDataset(
split_file=valid_split,
video_dir=cfgs.video_dir,
num_segments=1,
sample_length=cfgs.sample_length,
sample_step=cfgs.get('sample_step', 1),
input_size=112,
im_scale=cfgs.get('im_scale', 128),
resize_keep_ratio=True,
mean=cfgs.get('mean', DEFAULT_MEAN),
std=cfgs.get('std', DEFAULT_STD),
random_shift=False,
temporal_jitter=False,
flip_ratio=0.0,
random_crop=False, # == Center crop
random_crop_scales=None,
video_ext=cfgs.video_ext,
)
return train_ds, valid_ds
def show_batch(self, which_data='train', num_samples=1):
"""Plot first few samples in the datasets"""
if which_data == 'train':
batch = [self.train_ds[i][0] for i in range(num_samples)]
elif which_data == 'valid':
batch = [self.valid_ds[i][0] for i in range(num_samples)]
else:
raise ValueError("Unknown data type {}".format(which_data))
_show_batch(
batch,
self.configs.sample_length,
mean=self.configs.get('mean', DEFAULT_MEAN),
std=self.configs.get('std', DEFAULT_STD),
)
def freeze(self):
def freeze(self) -> None:
"""Freeze model except the last layer"""
self._set_requires_grad(False)
for param in self.model.fc.parameters():
param.requires_grad = True
def unfreeze(self):
def unfreeze(self) -> None:
self._set_requires_grad(True)
def _set_requires_grad(self, requires_grad=True):
def _set_requires_grad(self, requires_grad=True) -> None:
for param in self.model.parameters():
param.requires_grad = requires_grad
def fit(self, train_cfgs):
def fit(self, train_cfgs) -> None:
""" The primary fit function """
train_cfgs = Config(train_cfgs)
model_dir = train_cfgs.get('model_dir', "checkpoints")
model_dir = train_cfgs.get("model_dir", "checkpoints")
os.makedirs(model_dir, exist_ok=True)
if cuda.is_available():
device = torch.device("cuda")
num_devices = cuda.device_count()
# Look for the optimal set of algorithms to use in cudnn. Use this only with fixed-size inputs.
torch.backends.cudnn.benchmark = True
else:
device = torch.device("cpu")
num_devices = 1
data_loaders = {}
if self.train_ds is not None:
data_loaders['train'] = DataLoader(
self.train_ds,
batch_size=train_cfgs.get('batch_size', 8) * num_devices,
shuffle=True,
num_workers=0, # Torch 1.2 has a bug when num-workers > 0 (0 means run a main-processor worker)
pin_memory=True,
)
if self.valid_ds is not None:
data_loaders['valid'] = DataLoader(
self.valid_ds,
batch_size=train_cfgs.get('batch_size', 8) * num_devices,
shuffle=False,
num_workers=0,
pin_memory=True,
)
data_loaders["train"] = self.dataset.train_dl
data_loaders["valid"] = self.dataset.test_dl
# Move model to gpu before constructing optimizers and amp.initialize
device = torch_device()
self.model.to(device)
count_devices = num_devices()
torch.backends.cudnn.benchmark = True
named_params_to_update = {}
total_params = 0
@ -210,19 +149,22 @@ class R2Plus1D(object):
print("\tfull network")
else:
for name in named_params_to_update:
print("\t{}".format(name))
print(f"\t{name}")
momentum=train_cfgs.get('momentum', 0.95)
# create optimizer
momentum = train_cfgs.get("momentum", 0.95)
optimizer = optim.SGD(
list(named_params_to_update.values()),
lr=train_cfgs.lr,
momentum=momentum,
weight_decay=train_cfgs.get('weight_decay', 0.0001),
weight_decay=train_cfgs.get("weight_decay", 0.0001),
)
# Use mixed-precision if available
# Currently, only O1 works with DataParallel: See issues https://github.com/NVIDIA/apex/issues/227
if train_cfgs.get('mixed_prec', False) and AMP_AVAILABLE:
if train_cfgs.get("mixed_prec", False):
# break if not AMP_AVAILABLE
assert AMP_AVAILABLE
# 'O0': Full FP32, 'O1': Conservative, 'O2': Standard, 'O3': Full FP16
self.model, optimizer = amp.initialize(
self.model,
@ -233,36 +175,34 @@ class R2Plus1D(object):
)
# Learning rate scheduler
if train_cfgs.get('use_one_cycle_policy', False):
if train_cfgs.get("use_one_cycle_policy", False):
# Use warmup with the one-cycle policy
scheduler = torch.optim.lr_scheduler.OneCycleLR(
optimizer,
max_lr=train_cfgs.lr,
total_steps=train_cfgs.epochs,
pct_start=train_cfgs.get('warmup_pct', 0.3),
base_momentum=0.9*momentum,
pct_start=train_cfgs.get("warmup_pct", 0.3),
base_momentum=0.9 * momentum,
max_momentum=momentum,
)
else:
# Simple step-decay
scheduler = torch.optim.lr_scheduler.StepLR(
optimizer,
step_size=train_cfgs.get('lr_step_size', float("inf")),
gamma=train_cfgs.get('lr_gamma', 0.1),
step_size=train_cfgs.get("lr_step_size", float("inf")),
gamma=train_cfgs.get("lr_gamma", 0.1),
)
# DataParallel after amp.initialize
if num_devices > 1:
model = nn.DataParallel(self.model)
else:
model = self.model
model = (
nn.DataParallel(self.model) if count_devices > 1 else self.model
)
criterion = nn.CrossEntropyLoss().to(device)
for e in range(1, train_cfgs.epochs + 1):
print("Epoch {} ==========".format(e))
if scheduler is not None:
print("lr={}".format(scheduler.get_lr()))
print(f"Epoch {e} ==========")
print(f"lr={scheduler.get_lr()}")
self.train_an_epoch(
model,
@ -276,14 +216,16 @@ class R2Plus1D(object):
scheduler.step()
if train_cfgs.get('save_models', False):
if train_cfgs.get("save_models", False):
self.save(
os.path.join(
model_dir,
"{model_name}_{epoch}.pt".format(
model_name=train_cfgs.get('model_name', self.model_name),
epoch=str(e).zfill(3)
)
model_name=train_cfgs.get(
"model_name", self.model_name
),
epoch=str(e).zfill(3),
),
)
)
@ -296,29 +238,35 @@ class R2Plus1D(object):
optimizer,
grad_steps=1,
mixed_prec=False,
):
) -> None:
"""Train / validate a model for one epoch.
:param model:
:param data_loaders: dict {'train': train_dl, 'valid': valid_dl}
:param device:
:param criterion:
:param optimizer:
:param grad_steps: If > 1, use gradient accumulation. Useful for larger batching
:param mixed_prec: If True, use FP16 + FP32 mixed precision via NVIDIA apex.amp
:return: dict {
'train/time': batch_time.avg,
'train/loss': losses.avg,
'train/top1': top1.avg,
'train/top5': top5.avg,
'valid/time': ...
}
Args:
model: the model to train
data_loaders: dict {'train': train_dl, 'valid': valid_dl}
device: torch.device to run on (GPU if available, otherwise CPU)
criterion: loss function, e.g. nn.CrossEntropyLoss
optimizer: torch optimizer, e.g. optim.SGD
grad_steps: If > 1, use gradient accumulation. Useful for simulating larger batch sizes
mixed_prec: If True, use FP16 + FP32 mixed precision via NVIDIA apex.amp
Return:
dict {
'train/time': batch_time.avg,
'train/loss': losses.avg,
'train/top1': top1.avg,
'train/top5': top5.avg,
'valid/time': ...
}
"""
assert "train" in data_loaders
if mixed_prec and not AMP_AVAILABLE:
warnings.warn(
"NVIDIA apex module is not installed. Cannot use mixed-precision."
"""
NVIDIA apex module is not installed. Cannot use
mixed-precision. Turning off mixed-precision.
"""
)
mixed_prec = False
result = OrderedDict()
for phase in ["train", "valid"]:
@ -356,8 +304,10 @@ class R2Plus1D(object):
# scale the loss so the accumulated gradient matches the scale of an update without accumulation
loss = loss / grad_steps
if mixed_prec and AMP_AVAILABLE:
with amp.scale_loss(loss, optimizer) as scaled_loss:
if mixed_prec:
with amp.scale_loss(
loss, optimizer
) as scaled_loss:
scaled_loss.backward()
else:
loss.backward()
@ -371,30 +321,26 @@ class R2Plus1D(object):
end = time.time()
print(
"{} took {:.2f} sec: loss = {:.4f}, top1_acc = {:.4f}, top5_acc = {:.4f}".format(
phase, batch_time.sum, losses.avg, top1.avg, top5.avg
)
f"{phase} took {batch_time.sum:.2f} sec: loss = {losses.avg:.4f}, top1_acc = {top1.avg:.4f}, top5_acc = {top5.avg:.4f}"
)
result["{}/time".format(phase)] = batch_time.sum
result["{}/loss".format(phase)] = losses.avg
result["{}/top1".format(phase)] = top1.avg
result["{}/top5".format(phase)] = top5.avg
result[f"{phase}/time"] = batch_time.sum
result[f"{phase}/loss"] = losses.avg
result[f"{phase}/top1"] = top1.avg
result[f"{phase}/top5"] = top5.avg
return result
def save(self, model_path):
torch.save(
self.model.state_dict(),
model_path
)
def save(self, model_path: Union[Path, str]) -> None:
""" Save the model to a path on disk. """
torch.save(self.model.state_dict(), model_path)
def load(self, model_name, model_dir="checkpoints"):
def load(self, model_name: str, model_dir: str = "checkpoints") -> None:
"""
TODO accept epoch. If None, load the latest model.
:param model_name: Model name in the format '<name>_<epoch>', with the epoch zero-padded to three digits (e.g. 'name_003'), matching the files written by save()
:param model_dir: By default, 'checkpoints'
:return:
"""
self.model.load_state_dict(torch.load(
os.path.join(model_dir, "{}.pt".format(model_name))
))
self.model.load_state_dict(
torch.load(os.path.join(model_dir, f"{model_name}.pt"))
)
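As a side note for readers of this diff: train_an_epoch above combines gradient accumulation with optional apex mixed precision. A minimal, self-contained sketch of the accumulation pattern (illustrative names and hyperparameters, not the repo's exact code) looks like this:

# Minimal sketch of gradient accumulation, assuming a standard classification setup.
import torch
import torch.nn as nn
import torch.optim as optim

def train_with_accumulation(model, data_loader, device, grad_steps=4, lr=0.001):
    criterion = nn.CrossEntropyLoss().to(device)
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.95)
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader, start=1):
        inputs, targets = inputs.to(device), targets.to(device)
        loss = criterion(model(inputs), targets)
        # divide by grad_steps so the accumulated gradient matches a single large-batch update
        (loss / grad_steps).backward()
        if step % grad_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

With mixed precision enabled, the backward call would instead go through amp.scale_loss(loss, optimizer), as in the hunk above.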


@ -20,7 +20,7 @@ def crop(clip, i, j, h, w):
clip (torch.tensor): Video clip to be cropped. Size is (C, T, H, W)
"""
assert len(clip.size()) == 4, "clip should be a 4D tensor"
return clip[..., i:i + h, j:j + w]
return clip[..., i : i + h, j : j + w]
def resize(clip, target_size, interpolation_mode):
@ -53,7 +53,9 @@ def center_crop(clip, crop_size):
assert _is_tensor_video_clip(clip), "clip should be a 4D torch.tensor"
h, w = clip.size(-2), clip.size(-1)
th, tw = crop_size
assert h >= th and w >= tw, "height and width must be no smaller than crop_size"
assert (
h >= th and w >= tw
), "height and width must be no smaller than crop_size"
i = int(round((h - th) / 2.0))
j = int(round((w - tw) / 2.0))
@ -71,7 +73,9 @@ def to_tensor(clip):
"""
assert _is_tensor_video_clip(clip), "clip should be a 4D torch.tensor"
if not clip.dtype == torch.uint8:
raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
raise TypeError(
"clip tensor should have data type uint8. Got %s" % str(clip.dtype)
)
return clip.float().permute(3, 0, 1, 2) / 255.0
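To make the expected tensor layout concrete: to_tensor takes a uint8 clip shaped (T, H, W, C) and returns a float clip shaped (C, T, H, W) in [0, 1], which is what crop and center_crop then slice. A small illustration with dummy shapes (not from the repo):

import torch

clip_thwc = torch.randint(0, 256, (16, 128, 171, 3), dtype=torch.uint8)  # (T, H, W, C)
clip_cthw = clip_thwc.float().permute(3, 0, 1, 2) / 255.0                # (C, T, H, W), values in [0, 1]
cropped = clip_cthw[..., 8:8 + 112, 30:30 + 112]                          # same slicing as crop()
print(cropped.shape)  # torch.Size([3, 16, 112, 112])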


@ -5,6 +5,7 @@ import torch
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self):
self.reset()


@ -37,13 +37,19 @@ class ResizeVideo(object):
size = (int(self.size), int(self.size))
else:
if self.keep_ratio:
scale = min(self.size[0] / clip.shape[-2], self.size[1] / clip.shape[-1], )
scale = min(
self.size[0] / clip.shape[-2],
self.size[1] / clip.shape[-1],
)
else:
size = self.size
return nn.functional.interpolate(
clip, size=size, scale_factor=scale,
mode=self.interpolation_mode, align_corners=False
clip,
size=size,
scale_factor=scale,
mode=self.interpolation_mode,
align_corners=False,
)
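The keep_ratio branch above resizes both spatial dims by the same factor. A hedged, standalone sketch of that behaviour with torch.nn.functional.interpolate (dummy shapes, and mode="bilinear" is an assumption; ResizeVideo's constructor arguments are not shown in this hunk):

import torch
import torch.nn as nn

clip = torch.rand(3, 16, 128, 171)  # (C, T, H, W)
target = (112, 112)
scale = min(target[0] / clip.shape[-2], target[1] / clip.shape[-1])
resized = nn.functional.interpolate(
    clip, scale_factor=scale, mode="bilinear", align_corners=False
)
print(resized.shape)  # both H and W scaled by the same factor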
@ -66,7 +72,7 @@ class RandomCropVideo(object):
return F.crop(clip, i, j, h, w)
def __repr__(self):
return self.__class__.__name__ + '(size={0})'.format(self.size)
return self.__class__.__name__ + "(size={0})".format(self.size)
@staticmethod
def get_params(clip, output_size):
@ -116,13 +122,17 @@ class RandomResizedCropVideo(object):
size is (C, T, H, W)
"""
i, j, h, w = self.get_params(clip, self.scale, self.ratio)
return F.resized_crop(clip, i, j, h, w, self.size, self.interpolation_mode)
return F.resized_crop(
clip, i, j, h, w, self.size, self.interpolation_mode
)
def __repr__(self):
return self.__class__.__name__ + \
'(size={0}, interpolation_mode={1}, scale={2}, ratio={3})'.format(
return (
self.__class__.__name__
+ "(size={0}, interpolation_mode={1}, scale={2}, ratio={3})".format(
self.size, self.interpolation_mode, self.scale, self.ratio
)
)
@staticmethod
def get_params(clip, scale, ratio):
@ -187,7 +197,7 @@ class CenterCropVideo(object):
return F.center_crop(clip, self.size)
def __repr__(self):
return self.__class__.__name__ + '(size={0})'.format(self.size)
return self.__class__.__name__ + "(size={0})".format(self.size)
class NormalizeVideo(object):
@ -212,8 +222,12 @@ class NormalizeVideo(object):
return F.normalize(clip, self.mean, self.std, self.inplace)
def __repr__(self):
return self.__class__.__name__ + '(mean={0}, std={1}, inplace={2})'.format(
self.mean, self.std, self.inplace)
return (
self.__class__.__name__
+ "(mean={0}, std={1}, inplace={2})".format(
self.mean, self.std, self.inplace
)
)
class ToTensorVideo(object):


@ -70,7 +70,7 @@ def read_classes_file(classes_filepath):
classes = {}
with open(classes_filepath) as class_file:
for line in class_file:
class_name, class_id = line.split(' ')
class_name, class_id = line.split(" ")
classes[class_name] = class_id.rstrip()
return classes
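The file read here is assumed to hold one space-separated 'class_name class_id' pair per line, so a hypothetical classes.txt round-trips like this (made-up class names; note the ids come back as strings):

# Illustrative only: made-up class names and ids.
with open("classes.txt", "w") as f:
    f.write("drinking 0\n")
    f.write("no_action 1\n")

print(read_classes_file("classes.txt"))  # {'drinking': '0', 'no_action': '1'}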
@ -87,7 +87,7 @@ def create_clip_file_name(row, clip_file_format="mp4"):
:return: str.
The output clip file name.
"""
#video_file = ast.literal_eval(row.file_list)[0]
# video_file = ast.literal_eval(row.file_list)[0]
video_file = os.path.splitext(row["file_list"])[0]
clip_id = row["# CSV_HEADER = metadata_id"]
clip_file = "{}_{}.{}".format(video_file, clip_id, clip_file_format)
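With hypothetical row values, the resulting clip file name follows the '<video>_<clip_id>.<format>' pattern built above:

# Illustrative only: a made-up row.
import os

row = {"file_list": "video1.mp4", "# CSV_HEADER = metadata_id": "3"}
video_file = os.path.splitext(row["file_list"])[0]
print("{}_{}.{}".format(video_file, row["# CSV_HEADER = metadata_id"], "mp4"))  # video1_3.mp4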
@ -477,14 +477,14 @@ def extract_contiguous_negative_clips(
# video_path = os.path.join(video_dir, negative_sample_file)
video_fname = os.path.splitext(os.path.basename(video_file_path))[0]
clip_fname = video_fname+no_action_class+str(i)
clip_fname = video_fname + no_action_class + str(i)
clip_subdir_fname = os.path.join(no_action_class, clip_fname)
negative_clip_file_list.append(clip_subdir_fname)
_extract_clip_ffmpeg(
start_time,
duration,
video_file_path,
os.path.join(negative_clip_dir, clip_fname+"."+clip_format),
os.path.join(negative_clip_dir, clip_fname + "." + clip_format),
ffmpeg_path,
)
@ -496,6 +496,7 @@ def extract_contiguous_negative_clips(
}
)
def extract_sampled_negative_clips(
video_info_df,
num_negative_samples,
@ -548,7 +549,9 @@ def extract_sampled_negative_clips(
clips_sampled = 0
while clips_sampled < num_negative_samples:
# pick random file in list of videos
negative_sample_file = video_files[random.randint(0, len(video_files)-1)]
negative_sample_file = video_files[
random.randint(0, len(video_files) - 1)
]
# get video duration
duration = video_len[negative_sample_file]
# pick random start time for clip
@ -559,15 +562,27 @@ def extract_sampled_negative_clips(
# check that the negative clip doesn't overlap any positive clip; if it does, pick another file
if negative_sample_file in positive_intervals.keys():
clip_positive_intervals = positive_intervals[negative_sample_file]
if check_interval_overlaps(clip_start, clip_end, clip_positive_intervals):
if check_interval_overlaps(
clip_start, clip_end, clip_positive_intervals
):
continue
video_path = os.path.join(video_dir, negative_sample_file)
video_fname = os.path.splitext(negative_sample_file)[0]
clip_fname = video_fname+no_action_class+str(clips_sampled)
clip_fname = video_fname + no_action_class + str(clips_sampled)
clip_subdir_fname = os.path.join(no_action_class, clip_fname)
_extract_clip_ffmpeg(
clip_start, negative_clip_length, video_path, os.path.join(clip_dir, clip_subdir_fname+"."+clip_format),
clip_start,
negative_clip_length,
video_path,
os.path.join(clip_dir, clip_subdir_fname + "." + clip_format),
)
with open(label_filepath, 'a') as f:
f.write("\""+clip_subdir_fname+"\""+" "+str(classes[no_action_class])+"\n")
with open(label_filepath, "a") as f:
f.write(
'"'
+ clip_subdir_fname
+ '"'
+ " "
+ str(classes[no_action_class])
+ "\n"
)
clips_sampled += 1
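The rejection step above reduces to an interval-intersection test. A minimal sketch of that idea (assumed behaviour only; the real check_interval_overlaps signature may differ):

# Assumed behaviour, not the repo's exact implementation.
def intervals_overlap(clip_start, clip_end, positive_intervals):
    """Return True if [clip_start, clip_end] intersects any (start, end) interval."""
    return any(
        clip_start < pos_end and clip_end > pos_start
        for pos_start, pos_end in positive_intervals
    )

print(intervals_overlap(5.0, 7.0, [(0.0, 4.0), (6.5, 10.0)]))  # True, overlaps (6.5, 10.0)
print(intervals_overlap(4.0, 6.0, [(0.0, 3.0), (7.0, 10.0)]))  # False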


@ -28,7 +28,9 @@ class Urls:
base, "fridgeObjectsWatermarkTiny.zip"
)
fridge_objects_negatives_path = urljoin(base, "fridgeObjectsNegative.zip")
fridge_objects_negatives_tiny_path = urljoin(base, "fridgeObjectsNegativeTiny.zip")
fridge_objects_negatives_tiny_path = urljoin(
base, "fridgeObjectsNegativeTiny.zip"
)
# multilabel datasets
multilabel_fridge_objects_path = urljoin(


@ -3,8 +3,10 @@
import os
import platform
import sys
import torch
import torch.cuda as cuda
import torchvision
from torch.cuda import current_device, get_device_name, is_available
@ -47,6 +49,15 @@ def torch_device():
)
def num_devices():
""" Gets the number of devices based on cpu/gpu """
return (
torch.cuda.device_count()
if torch.cuda.is_available()
else 1
)
def db_num_workers(non_windows_num_workers: int = 16):
"""Returns how many workers to use when loading images in a databunch. On windows machines using >0 works significantly slows down model
training and evaluation. Setting num_workers to zero on Windows machines will speed up training/inference significantly, but will still be
@ -58,3 +69,15 @@ def db_num_workers(non_windows_num_workers: int = 16):
return 0
else:
return non_windows_num_workers
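A usage sketch with a plain PyTorch DataLoader and a dummy dataset (on Windows this resolves to num_workers=0, elsewhere to 16 by default):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.rand(32, 3, 8, 8), torch.randint(0, 2, (32,)))
dl = DataLoader(ds, batch_size=8, shuffle=True, num_workers=db_num_workers())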
def system_info():
print(sys.version, "\n")
print(f"PyTorch {torch.__version__} \n")
print(f"Torch-vision {torchvision.__version__} \n")
print("Available devices:")
if cuda.is_available():
for i in range(cuda.device_count()):
print(f"{i}: {cuda.get_device_name(i)}")
else:
print("CPUs only, no GPUs found")


@ -83,3 +83,34 @@ def get_font(size: int = 12) -> ImageFont:
font = None
return font
class Config(object):
def __init__(self, config=None, **extras):
"""Dictionary wrapper to access keys as attributes.
Args:
config (dict or Config): Configurations
extras (kwargs): Extra configurations
Examples:
>>> cfg = Config({'lr': 0.01}, momentum=0.95)
or
>>> cfg = Config({'lr': 0.01, 'momentum': 0.95})
then, use as follows:
>>> print(cfg.lr, cfg.momentum)
"""
if config is not None:
if isinstance(config, dict):
for k in config:
setattr(self, k, config[k])
elif isinstance(config, self.__class__):
self.__dict__ = config.__dict__.copy()
else:
raise ValueError("Unknown config")
for k, v in extras.items():
setattr(self, k, v)
def get(self, key, default):
return getattr(self, key, default)
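This Config wrapper is what backs the train_cfgs.get(...) calls earlier in this diff; in short, it behaves like this:

cfg = Config({"lr": 0.001, "epochs": 8}, mixed_prec=True)
print(cfg.lr, cfg.epochs, cfg.mixed_prec)  # attribute access: 0.001 8 True
print(cfg.get("momentum", 0.95))           # missing key falls back to the default: 0.95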


@ -109,7 +109,13 @@ class CocoEvaluator(object):
labels = prediction["labels"].tolist()
rles = [
mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0]
mask_util.encode(
# Workaround for the mask-encoding issue:
# https://github.com/pytorch/vision/issues/1355#issuecomment-544951911
np.array(
mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"
)
)[0]
for mask in masks
]
for rle in rles: