Video Dataset / Model refactor + framework for action tests (#477)
* Removed submodules.
* Add back submodules using https://
* Update FAQ.md
* added object detection readme
* fixes to ic (#437)
* fixes to ic
* Matplotlib bug fix
* Matplotlib matrix plot bug fix
* Fix 01 notebook heatmap
* revert env
* revert env yml
* remove matplotlib
* simplified plotting functions
* fixed most tests
* fixed test
* fixed unit test
* small text edits to the 02 notebook
* added fct description
* tiny cleanup on notebook
* Move r2p1d from contrib to scenarios.
* Update .gitignore.
* Add README.md
* Remove the folder /scenario/action_recognition/data/samples; update notebook to use web url for sample data.
* Move data split files to data/misc; update notebook accordingly.
* Add pretrained keypoint model (#453)
* Add pretrained keypoint model
* Fix bugs in tests
* Add 03 notebook in conftest.py
* Minor revision
* Reformat code using black
* if folder exists, remove (#448)
* Update data path.
* Add mask annotation tool (#447)
* Add mask annotation tool
* Update mask annotation explanation and add conversion scripts
* Add screenshots of Labelbox annotation
* Rearrange screenshots
* Move conversion script into functions in data.py
* Point out annotation conversion scripts clearly in notebook
* Refine annotation conversion scripts
* Fix bugs
* Add tests for labelbox format conversion methods
* Update README.md
* Add keypoint detection with tuned model (#454)
* Add keypoint detection with tuned model
* Add tests
* Minor revision
* Update tests
* Fix bugs in tests
* Use GPU device if available
* Update tests
* Fix bug: 'not idx' will be 'True' if 'idx=0'
* Fix bugs
* Move toy keypoint meta into notebook
* Fix bugs
* Fix bugs
* Fix bugs in notebook
* Add descriptions for keypoint meta data
* Raise exception when RandomHorizontalFlip is used without specifying hflip_inds
* Add NOTICE file.
* Add keypoint detection model tuning with top and bottom keypoints (#456)
* Add keypoint detection model tuning with top and bottom keypoints
* Fix undefined unzip_url
* Resolved undefined od_urls
* Plot keypoints as round dots to make them noticeable (#458)
* Plot keypoints as dots
* Change variable naming
* Add annotation tool to scenarios.
* Resolve test machine failure (#460): the latest PyTorch (version 1.3) from conda is built on CUDA 10.1 while the version on the test machine is CUDA 10.0.
* Remove unused imports in 02_mask_rcnn.ipynb (#463)
* Remove unused imports in 02_mask_rcnn.ipynb
* Add missing imports
* Simplify binary_mask() (#464)
* remove conflict code (#471)
* Update README.md (#472)
* unit test for action rec
* reformat files
* added 01/02 notebooks
* fix all unit tests + abstract out commons from action rec
* dataset
* test data
* black reformat
* refactor action rec
* ignore /data
* notebook update
* update gitignore
* manage transforms better
* tfms_config defaults
* video dataset refactor + black
* notebook update with video dataset refactored out
* Refactor model/dataset
* clean up
* refactor + beautification
* re-run 02 notebook
* make tests work locally
* pr fixes
* PR fixes
* pr fixes
* pr fixes
* update env
* pr fix
* move decord to pip
* added ref to config
* update pr fix
* flake8 + pr bug

Co-authored-by: Lixun Zhang <lixun.zhang@microsoft.com>
Co-authored-by: Lixun <lixzhang@users.noreply.github.com>
Co-authored-by: PatrickBue <pabuehle@microsoft.com>
Co-authored-by: Simon Zhao <43029286+simonzhaoms@users.noreply.github.com>
Co-authored-by: Miguel González-Fierro <3491412+miguelgfierro@users.noreply.github.com>
This commit is contained in:
Parent
198b985581
Commit
8221c1659e
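For orientation, here is a minimal usage sketch that strings together the pieces this PR introduces (VideoDataset, get_transforms, VideoLearner). It is assembled only from the signatures visible in the diff below; the module paths are inferred from the relative imports, the data paths are placeholders, and no training call is shown in this diff.

```python
# Hypothetical end-to-end usage of the refactored action recognition API.
from utils_cv.action_recognition.dataset import VideoDataset, get_transforms
from utils_cv.action_recognition.model import VideoLearner

data = VideoDataset(
    root="data/hmdb51/videos",                              # placeholder
    train_split_file="data/misc/hmdb51_train_split.txt",    # placeholder
    test_split_file="data/misc/hmdb51_test_split.txt",      # placeholder
    batch_size=8,
    train_transforms=get_transforms(train=True),
    test_transforms=get_transforms(train=False),
)
data.show_batch(train_or_test="train", rows=2)

learner = VideoLearner(data, num_classes=51, base_model="ig65m")
# Training/evaluation entry points live in model.py, outside the lines shown here.
```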
|
@ -116,7 +116,6 @@ output.ipynb
|
|||
# don't save any data
|
||||
classification/data/*
|
||||
/data/*
|
||||
!/data/misc
|
||||
!contrib/action_recognition/r2p1d/**
|
||||
!contrib/crowd_counting/crowdcounting/data/
|
||||
!scenarios/action_recognition/data
|
||||
|
@ -142,4 +141,4 @@ classification/FAQ.html
|
|||
classification/.DS_Store
|
||||
.DS_Store
|
||||
|
||||
*.h5
|
||||
*.h5
|
||||
|
|
|
@ -1,4 +1,3 @@
|
|||
#
|
||||
# To create the conda environment:
|
||||
# $ conda env create -f environment.yml
|
||||
#
|
||||
|
@ -36,8 +35,10 @@ dependencies:
|
|||
- pre-commit>=1.14.4
|
||||
- pyyaml>=5.1.2
|
||||
- requests>=2.22.0
|
||||
- einops==0.1.0
|
||||
- cytoolz
|
||||
- pip:
|
||||
- decord==0.3.5
|
||||
- nvidia-ml-py3
|
||||
- nteract-scrapbook
|
||||
- azureml-sdk[notebooks,contrib]>=1.0.30
|
||||
|
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -1,316 +0,0 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Video Dataset Transformation \n",
|
||||
"\n",
|
||||
"In this notebook, we show examples of video dataset transformation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"sys.path.append(\"../../\")\n",
|
||||
"import os\n",
|
||||
"import time\n",
|
||||
"import decord\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"from sklearn.metrics import accuracy_score\n",
|
||||
"import torch\n",
|
||||
"import torch.cuda as cuda\n",
|
||||
"import torch.nn as nn\n",
|
||||
"import torchvision\n",
|
||||
"import urllib.request\n",
|
||||
"import shutil\n",
|
||||
"\n",
|
||||
"from utils_cv.action_recognition.data import show_batch, VideoDataset\n",
|
||||
"from utils_cv.action_recognition.model import DEFAULT_MEAN, DEFAULT_STD\n",
|
||||
"from utils_cv.action_recognition import system_info\n",
|
||||
"from utils_cv.action_recognition.functional_video import denormalize\n",
|
||||
"from utils_cv.action_recognition.transforms_video import (\n",
|
||||
" CenterCropVideo, \n",
|
||||
" NormalizeVideo,\n",
|
||||
" RandomCropVideo,\n",
|
||||
" RandomHorizontalFlipVideo,\n",
|
||||
" RandomResizedCropVideo,\n",
|
||||
" ResizeVideo,\n",
|
||||
" ToTensorVideo,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"system_info()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def show_clip(clip, size_factor=600):\n",
|
||||
" \"\"\"Show frames in a clip\"\"\"\n",
|
||||
" if isinstance(clip, torch.Tensor):\n",
|
||||
" # Convert [C, T, H, W] tensor to [T, H, W, C] numpy array \n",
|
||||
" clip = np.moveaxis(clip.numpy(), 0, -1)\n",
|
||||
" \n",
|
||||
" figsize = np.array([clip[0].shape[1]*len(clip), clip[0].shape[0]]) / size_factor\n",
|
||||
" plt.tight_layout()\n",
|
||||
" fig, axs = plt.subplots(1, len(clip), figsize=figsize)\n",
|
||||
" for i, f in enumerate(clip):\n",
|
||||
" axs[i].axis(\"off\")\n",
|
||||
" axs[i].imshow(f)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prepare a Sample Video\n",
|
||||
"A sample video path:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://cvbp.blob.core.windows.net/public/datasets/action_recognition/drinking.mp4\"\n",
|
||||
"VIDEO_PATH = os.path.join(\"../../data/drinking.mp4\")\n",
|
||||
"# Download the file from `url` and save it locally under `file_name`:\n",
|
||||
"with urllib.request.urlopen(url) as response, open(VIDEO_PATH, 'wb') as out_file:\n",
|
||||
" shutil.copyfileobj(response, out_file)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"video_reader = decord.VideoReader(VIDEO_PATH)\n",
|
||||
"video_length = len(video_reader)\n",
|
||||
"print(\"Video length = {} frames\".format(video_length))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We use three frames (the first, middle, and the last) to quickly visualize video transformations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clip = [\n",
|
||||
" video_reader[0].asnumpy(),\n",
|
||||
" video_reader[video_length//2].asnumpy(),\n",
|
||||
" video_reader[video_length-1].asnumpy(),\n",
|
||||
"]\n",
|
||||
"show_clip(clip)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# [T, H, W, C] numpy array to [C, T, H, W] tensor\n",
|
||||
"t_clip = ToTensorVideo()(torch.from_numpy(np.array(clip)))\n",
|
||||
"t_clip.shape"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Video Transformations\n",
|
||||
"\n",
|
||||
"Resizing with the original ratio"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(ResizeVideo(size=800)(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Resizing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(ResizeVideo(size=800, keep_ratio=False)(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Center cropping"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(CenterCropVideo(size=800)(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Random cropping"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"random_crop = RandomCropVideo(size=800)\n",
|
||||
"show_clip(random_crop(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(random_crop(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Random resized cropping"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"random_resized_crop = RandomResizedCropVideo(size=800)\n",
|
||||
"show_clip(random_resized_crop(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(random_resized_crop(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Normalizing (and denormalizing to verify)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"norm_t_clip = NormalizeVideo(mean=DEFAULT_MEAN, std=DEFAULT_STD)(t_clip)\n",
|
||||
"show_clip(norm_t_clip)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(denormalize(norm_t_clip, mean=DEFAULT_MEAN, std=DEFAULT_STD))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Horizontal flipping"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_clip(RandomHorizontalFlipVideo(p=.5)(t_clip))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "r2p1d",
|
||||
"language": "python",
|
||||
"name": "r2p1d"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
File diff suppressed because one or more lines are too long
|
@ -73,6 +73,18 @@ def path_detection_notebooks():
|
|||
)
|
||||
|
||||
|
||||
def path_action_recognition_notebooks():
|
||||
""" Returns the path of the action recognition notebooks folder. """
|
||||
return os.path.abspath(
|
||||
os.path.join(
|
||||
os.path.dirname(__file__),
|
||||
os.path.pardir,
|
||||
"scenarios",
|
||||
"action_recognition",
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
# ----- Module fixtures ----------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -82,39 +94,33 @@ def classification_notebooks():
|
|||
|
||||
# Path for the notebooks
|
||||
paths = {
|
||||
"00_webcam": os.path.join(folder_notebooks, "00_webcam.ipynb"),
|
||||
"01_training_introduction": os.path.join(
|
||||
folder_notebooks, "01_training_introduction.ipynb"
|
||||
),
|
||||
"02_multilabel_classification": os.path.join(
|
||||
"00": os.path.join(folder_notebooks, "00_webcam.ipynb"),
|
||||
"01": os.path.join(folder_notebooks, "01_training_introduction.ipynb"),
|
||||
"02": os.path.join(
|
||||
folder_notebooks, "02_multilabel_classification.ipynb"
|
||||
),
|
||||
"03_training_accuracy_vs_speed": os.path.join(
|
||||
"03": os.path.join(
|
||||
folder_notebooks, "03_training_accuracy_vs_speed.ipynb"
|
||||
),
|
||||
"10_image_annotation": os.path.join(
|
||||
folder_notebooks, "10_image_annotation.ipynb"
|
||||
),
|
||||
"11_exploring_hyperparameters": os.path.join(
|
||||
"10": os.path.join(folder_notebooks, "10_image_annotation.ipynb"),
|
||||
"11": os.path.join(
|
||||
folder_notebooks, "11_exploring_hyperparameters.ipynb"
|
||||
),
|
||||
"12_hard_negative_sampling": os.path.join(
|
||||
"12": os.path.join(
|
||||
folder_notebooks, "12_hard_negative_sampling.ipynb"
|
||||
),
|
||||
"20_azure_workspace_setup": os.path.join(
|
||||
folder_notebooks, "20_azure_workspace_setup.ipynb"
|
||||
),
|
||||
"21_deployment_on_azure_container_instances": os.path.join(
|
||||
"20": os.path.join(folder_notebooks, "20_azure_workspace_setup.ipynb"),
|
||||
"21": os.path.join(
|
||||
folder_notebooks,
|
||||
"21_deployment_on_azure_container_instances.ipynb",
|
||||
),
|
||||
"22_deployment_on_azure_kubernetes_service": os.path.join(
|
||||
"22": os.path.join(
|
||||
folder_notebooks, "22_deployment_on_azure_kubernetes_service.ipynb"
|
||||
),
|
||||
"23_aci_aks_web_service_testing": os.path.join(
|
||||
"23": os.path.join(
|
||||
folder_notebooks, "23_aci_aks_web_service_testing.ipynb"
|
||||
),
|
||||
"24_exploring_hyperparameters_on_azureml": os.path.join(
|
||||
"24": os.path.join(
|
||||
folder_notebooks, "24_exploring_hyperparameters_on_azureml.ipynb"
|
||||
),
|
||||
}
|
||||
|
@ -164,6 +170,20 @@ def detection_notebooks():
|
|||
return paths
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
def action_recognition_notebooks():
|
||||
folder_notebooks = path_action_recognition_notebooks()
|
||||
|
||||
# Path for the notebooks
|
||||
paths = {
|
||||
"00": os.path.join(folder_notebooks, "00_webcam.ipynb"),
|
||||
"01": os.path.join(folder_notebooks, "01_training_introduction.ipynb"),
|
||||
"02": os.path.join(folder_notebooks, "02_training_hmbd.ipynb"),
|
||||
"10": os.path.join(folder_notebooks, "10_video_transformation.ipynb"),
|
||||
}
|
||||
return paths
|
||||
|
||||
|
||||
# ----- Function fixtures ----------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -378,7 +398,7 @@ def od_cup_path(tmp_session) -> str:
|
|||
|
||||
@pytest.fixture(scope="session")
|
||||
def od_cup_mask_path(tmp_session) -> str:
|
||||
""" Returns the path to the downloaded cup image. """
|
||||
""" Returns the path to the downloaded cup mask image. """
|
||||
im_url = (
|
||||
"https://cvbp.blob.core.windows.net/public/images/cvbp_cup_mask.png"
|
||||
)
|
||||
|
@ -687,6 +707,22 @@ def od_detections(od_detection_dataset):
|
|||
return learner.predict_dl(od_detection_dataset.test_dl, threshold=0)
|
||||
|
||||
|
||||
# ----- Action Recognition ------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def ar_path(tmp_session) -> str:
|
||||
""" Returns the path to the downloaded cup image. """
|
||||
VID_URL = "https://cvbp.blob.core.windows.net/public/datasets/action_recognition/drinking.mp4"
|
||||
vid_path = os.path.join(tmp_session, "drinking.mp4")
|
||||
urllib.request.urlretrieve(VID_URL, vid_path)
|
||||
return vid_path
|
||||
|
||||
|
||||
# TODO
|
||||
|
||||
# ----- AML Settings ----------------------------------------------------------
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def coco_sample_path(tmpdir_factory) -> str:
|
||||
""" Returns the path to a coco-formatted annotation. """
|
||||
|
@ -695,9 +731,6 @@ def coco_sample_path(tmpdir_factory) -> str:
|
|||
return path
|
||||
|
||||
|
||||
# ----- AML Settings ----------------------------------------------------------
|
||||
|
||||
|
||||
# TODO i can't find where this function is being used
|
||||
def pytest_addoption(parser):
|
||||
parser.addoption(
|
||||
|
@ -767,3 +800,4 @@ def tiny_ic_databunch_valid_features(tiny_ic_databunch):
|
|||
tiny_ic_databunch, DatasetType.Valid, learn, embedding_layer
|
||||
)
|
||||
return features
|
||||
|
||||
|
|
|
@ -13,7 +13,7 @@ OUTPUT_NOTEBOOK = "output.ipynb"
|
|||
@pytest.mark.notebooks
|
||||
@pytest.mark.linuxgpu
|
||||
def test_01_notebook_run(classification_notebooks):
|
||||
notebook_path = classification_notebooks["01_training_introduction"]
|
||||
notebook_path = classification_notebooks["01"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -30,7 +30,7 @@ def test_01_notebook_run(classification_notebooks):
|
|||
@pytest.mark.notebooks
|
||||
@pytest.mark.linuxgpu
|
||||
def test_02_notebook_run(classification_notebooks):
|
||||
notebook_path = classification_notebooks["02_multilabel_classification"]
|
||||
notebook_path = classification_notebooks["02"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -48,7 +48,7 @@ def test_02_notebook_run(classification_notebooks):
|
|||
@pytest.mark.notebooks
|
||||
@pytest.mark.linuxgpu
|
||||
def test_03_notebook_run(classification_notebooks):
|
||||
notebook_path = classification_notebooks["03_training_accuracy_vs_speed"]
|
||||
notebook_path = classification_notebooks["03"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -65,7 +65,7 @@ def test_03_notebook_run(classification_notebooks):
|
|||
@pytest.mark.notebooks
|
||||
@pytest.mark.linuxgpu
|
||||
def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["11_exploring_hyperparameters"]
|
||||
notebook_path = classification_notebooks["11"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -91,7 +91,7 @@ def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
@pytest.mark.notebooks
|
||||
@pytest.mark.linuxgpu
|
||||
def test_12_notebook_run(classification_notebooks):
|
||||
notebook_path = classification_notebooks["12_hard_negative_sampling"]
|
||||
notebook_path = classification_notebooks["12"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
|
|
@ -23,7 +23,7 @@ def test_ic_20_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = classification_notebooks["20_azure_workspace_setup"]
|
||||
notebook_path = classification_notebooks["20"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -46,9 +46,7 @@ def test_ic_21_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = classification_notebooks[
|
||||
"21_deployment_on_azure_container_instances"
|
||||
]
|
||||
notebook_path = classification_notebooks["21"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -71,9 +69,7 @@ def test_ic_22_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = classification_notebooks[
|
||||
"22_deployment_on_azure_kubernetes_service"
|
||||
]
|
||||
notebook_path = classification_notebooks["22"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -96,7 +92,7 @@ def test_ic_23_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = classification_notebooks["23_aci_aks_web_service_testing"]
|
||||
notebook_path = classification_notebooks["23"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -119,9 +115,7 @@ def test_ic_24_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = classification_notebooks[
|
||||
"24_exploring_hyperparameters_on_azureml"
|
||||
]
|
||||
notebook_path = classification_notebooks["24"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -180,7 +174,7 @@ def test_od_20_notebook_run(
|
|||
workspace_name,
|
||||
workspace_region,
|
||||
):
|
||||
notebook_path = detection_notebooks["20_deployment_on_kubernetes"]
|
||||
notebook_path = detection_notebooks["20"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
|
|
@ -0,0 +1,23 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import os
|
||||
from utils_cv.action_recognition.data import (
|
||||
_DatasetSpec,
|
||||
Urls,
|
||||
)
|
||||
from utils_cv.common.data import data_path
|
||||
|
||||
|
||||
def test__DatasetSpec_kinetics():
|
||||
""" Tests DatasetSpec initialize with kinetics classes """
|
||||
kinetics = _DatasetSpec(Urls.kinetics_label_map, 400)
|
||||
kinetics.class_names
|
||||
assert os.path.exists(str(data_path() / "label_map.txt"))
|
||||
|
||||
|
||||
def test__DatasetSpec_hmdb():
|
||||
""" Tests DatasetSpec initialize with hmdb51 classes """
|
||||
hmdb51 = _DatasetSpec(Urls.hmdb51_label_map, 51)
|
||||
hmdb51.class_names
|
||||
assert os.path.exists(str(data_path() / "label_map.txt"))
|
|
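The two tests above work because `_DatasetSpec` lazily downloads its label map the first time `class_names` is accessed and checks the class count. A minimal sketch of that behaviour (illustrative only, using the same default `data_path()` location as the tests):

```python
from utils_cv.action_recognition.data import _DatasetSpec, Urls
from utils_cv.common.data import data_path

hmdb51 = _DatasetSpec(Urls.hmdb51_label_map, 51)
names = hmdb51.class_names      # first access fetches data_path()/"label_map.txt"
assert len(names) == 51         # the property enforces this internally as well
```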
@ -0,0 +1,70 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
# This test is based on the test suite implemented for Recommenders project
|
||||
# https://github.com/Microsoft/Recommenders/tree/master/tests
|
||||
|
||||
import os
|
||||
import papermill as pm
|
||||
import pytest
|
||||
import scrapbook as sb
|
||||
|
||||
# Unless manually modified, python3 should be
|
||||
# the name of the current jupyter kernel
|
||||
# that runs on the activated conda environment
|
||||
KERNEL_NAME = "python3"
|
||||
OUTPUT_NOTEBOOK = "output.ipynb"
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_00_notebook_run(action_recognition_notebooks):
|
||||
notebook_path = action_recognition_notebooks["00"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
parameters=dict(PM_VERSION=pm.__version__),
|
||||
kernel_name=KERNEL_NAME,
|
||||
)
|
||||
|
||||
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
|
||||
# TODO add some asserts like below
|
||||
# assert nb_output.scraps["predicted_label"].data == "coffee_mug"
|
||||
# assert nb_output.scraps["predicted_confidence"].data > 0.5
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_01_notebook_run(action_recognition_notebooks):
|
||||
# TODO - this notebook relies on downloading hmdb51, so pass for now
|
||||
pass
|
||||
|
||||
# notebook_path = action_recognition_notebooks["01"]
|
||||
# pm.execute_notebook(
|
||||
# notebook_path,
|
||||
# OUTPUT_NOTEBOOK,
|
||||
# parameters=dict(PM_VERSION=pm.__version__),
|
||||
# kernel_name=KERNEL_NAME,
|
||||
# )
|
||||
|
||||
# nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
|
||||
# TODO add some asserts like below
|
||||
# assert len(nb_output.scraps["training_accuracies"].data) == 1
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_02_notebook_run(action_recognition_notebooks):
|
||||
pass
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_10_notebook_run(action_recognition_notebooks):
|
||||
notebook_path = action_recognition_notebooks["10"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
parameters=dict(PM_VERSION=pm.__version__),
|
||||
kernel_name=KERNEL_NAME,
|
||||
)
|
||||
|
||||
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
|
||||
# TODO add some asserts like below
|
||||
# assert len(nb_output.scraps["training_accuracies"].data) == 1
|
|
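These notebook tests follow the repository's papermill pattern: each notebook is executed into `output.ipynb` and (eventually) scrapbook scraps are asserted on. Assuming the standard `notebooks` marker used above, a selective run could look like the sketch below; the test-file path is a placeholder.

```python
import pytest

# Hypothetical invocation, equivalent to `pytest -m notebooks <file>` on the CLI.
pytest.main([
    "tests/unit/test_action_recognition_notebooks.py",  # placeholder path
    "-m", "notebooks",
])
```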
@ -18,7 +18,7 @@ OUTPUT_NOTEBOOK = "output.ipynb"
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_00_notebook_run(classification_notebooks):
|
||||
notebook_path = classification_notebooks["00_webcam"]
|
||||
notebook_path = classification_notebooks["00"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -33,7 +33,7 @@ def test_00_notebook_run(classification_notebooks):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_01_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["01_training_introduction"]
|
||||
notebook_path = classification_notebooks["01"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -52,7 +52,7 @@ def test_01_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_02_notebook_run(classification_notebooks, multilabel_ic_data_path):
|
||||
notebook_path = classification_notebooks["02_multilabel_classification"]
|
||||
notebook_path = classification_notebooks["02"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -71,7 +71,7 @@ def test_02_notebook_run(classification_notebooks, multilabel_ic_data_path):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_03_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["03_training_accuracy_vs_speed"]
|
||||
notebook_path = classification_notebooks["03"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -93,7 +93,7 @@ def test_03_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_10_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["10_image_annotation"]
|
||||
notebook_path = classification_notebooks["10"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -110,7 +110,7 @@ def test_10_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["11_exploring_hyperparameters"]
|
||||
notebook_path = classification_notebooks["11"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
@ -131,7 +131,7 @@ def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
|
||||
@pytest.mark.notebooks
|
||||
def test_12_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["12_hard_negative_sampling"]
|
||||
notebook_path = classification_notebooks["12"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
|
|
|
@ -8,6 +8,7 @@ from utils_cv.common.gpu import (
|
|||
is_linux,
|
||||
is_windows,
|
||||
which_processor,
|
||||
system_info,
|
||||
)
|
||||
|
||||
|
||||
|
@ -39,3 +40,7 @@ def test_db_num_workers():
|
|||
else:
|
||||
assert db_num_workers() == 16
|
||||
assert db_num_workers(non_windows_num_workers=7) == 7
|
||||
|
||||
|
||||
def test_system_info():
|
||||
system_info()
|
||||
|
|
|
@ -2,12 +2,13 @@
|
|||
# Licensed under the MIT License.
|
||||
|
||||
import os
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
from PIL import ImageFont
|
||||
|
||||
from fastai.vision import ImageList
|
||||
from utils_cv.common.gpu import db_num_workers
|
||||
from utils_cv.common.misc import copy_files, set_random_seed, get_font
|
||||
from utils_cv.common.misc import copy_files, set_random_seed, get_font, Config
|
||||
|
||||
|
||||
def test_set_random_seed(tiny_ic_data_path):
|
||||
|
@ -75,3 +76,21 @@ def test_get_font():
|
|||
type(font) == ImageFont.FreeTypeFont
|
||||
or type(font) == ImageFont.ImageFont
|
||||
)
|
||||
|
||||
|
||||
def test_Config():
|
||||
# test dictionary wrapper to make sure keys can be accessed as attributes
|
||||
cfg = Config({"lr": 0.01, "momentum": 0.95})
|
||||
assert cfg.lr == 0.01 and cfg.momentum == 0.95
|
||||
cfg = Config(lr=0.01, momentum=0.95)
|
||||
assert cfg.lr == 0.01 and cfg.momentum == 0.95
|
||||
cfg = Config({"lr": 0.01}, momentum=0.95)
|
||||
assert cfg.lr == 0.01 and cfg.momentum == 0.95
|
||||
cfg_wrapper = Config(cfg, epochs=3)
|
||||
assert (
|
||||
cfg_wrapper.lr == 0.01
|
||||
and cfg_wrapper.momentum == 0.95
|
||||
and cfg_wrapper.epochs == 3
|
||||
)
|
||||
with pytest.raises(ValueError):
|
||||
Config(3)
|
||||
|
|
|
@ -1 +0,0 @@
|
|||
from .common import Config, system_info
|
|
@ -1,51 +0,0 @@
|
|||
# Copyright (c) Microsoft
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import sys
|
||||
|
||||
import torch
|
||||
import torch.cuda as cuda
|
||||
import torchvision
|
||||
|
||||
|
||||
class Config(object):
|
||||
def __init__(self, config=None, **extras):
|
||||
"""Dictionary wrapper to access keys as attributes.
|
||||
|
||||
Args:
|
||||
config (dict or Config): Configurations
|
||||
extras (kwargs): Extra configurations
|
||||
|
||||
Examples:
|
||||
>>> cfg = Config({'lr': 0.01}, momentum=0.95)
|
||||
or
|
||||
>>> cfg = Config({'lr': 0.01, 'momentum': 0.95})
|
||||
then, use as follows:
|
||||
>>> print(cfg.lr, cfg.momentum)
|
||||
"""
|
||||
if config is not None:
|
||||
if isinstance(config, dict):
|
||||
for k in config:
|
||||
setattr(self, k, config[k])
|
||||
elif isinstance(config, self.__class__):
|
||||
self.__dict__ = config.__dict__.copy()
|
||||
else:
|
||||
raise ValueError("Unknown config")
|
||||
|
||||
for k, v in extras.items():
|
||||
setattr(self, k, v)
|
||||
|
||||
def get(self, key, default):
|
||||
return getattr(self, key, default)
|
||||
|
||||
|
||||
def system_info():
|
||||
print(sys.version, "\n")
|
||||
print("PyTorch {}".format(torch.__version__), "\n")
|
||||
print("Torch-vision {}".format(torchvision.__version__), "\n")
|
||||
print("Available devices:")
|
||||
if cuda.is_available():
|
||||
for i in range(cuda.device_count()):
|
||||
print("{}: {}".format(i, cuda.get_device_name(i)))
|
||||
else:
|
||||
print("CPUs")
|
|
@ -3,40 +3,35 @@
|
|||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Union, List
|
||||
from urllib.request import urlretrieve
|
||||
import warnings
|
||||
|
||||
import decord
|
||||
from einops.layers.torch import Rearrange
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from numpy.random import randint
|
||||
import torch
|
||||
from torch.utils.data import Dataset
|
||||
from torchvision.transforms import Compose
|
||||
|
||||
from . import transforms_video as transforms
|
||||
from .functional_video import denormalize
|
||||
|
||||
|
||||
DEFAULT_MEAN = (0.43216, 0.394666, 0.37645)
|
||||
DEFAULT_STD = (0.22803, 0.22145, 0.216989)
|
||||
from ..common.data import data_path
|
||||
|
||||
|
||||
class _DatasetSpec:
|
||||
def __init__(self, label_url, root, num_classes):
|
||||
""" Properties of a Video Dataset. """
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
label_url: str,
|
||||
num_classes: int,
|
||||
data_path: Union[Path, str] = data_path(),
|
||||
) -> None:
|
||||
self.label_url = label_url
|
||||
self.root = root
|
||||
self.num_classes = num_classes
|
||||
self.data_path = data_path
|
||||
self._class_names = None
|
||||
|
||||
@property
|
||||
def class_names(self):
|
||||
def class_names(self) -> List[str]:
|
||||
if self._class_names is None:
|
||||
label_filepath = os.path.join(self.root, "label_map.txt")
|
||||
label_filepath = os.path.join(self.data_path, "label_map.txt")
|
||||
if not os.path.isfile(label_filepath):
|
||||
os.makedirs(self.root, exist_ok=True)
|
||||
urlretrieve(self.label_url, label_filepath)
|
||||
os.makedirs(self.data_path, exist_ok=True)
|
||||
else:
|
||||
os.remove(label_filepath)
|
||||
urlretrieve(self.label_url, label_filepath)
|
||||
with open(label_filepath) as f:
|
||||
self._class_names = [l.strip() for l in f]
|
||||
assert len(self._class_names) == self.num_classes
|
||||
|
@ -44,259 +39,15 @@ class _DatasetSpec:
|
|||
return self._class_names
|
||||
|
||||
|
||||
class Urls:
|
||||
kinetics_label_map = "https://github.com/microsoft/ComputerVision/files/3746975/kinetics400_lable_map.txt"
|
||||
hmdb51_label_map = "https://github.com/microsoft/ComputerVision/files/3746963/hmdb51_label_map.txt"
|
||||
|
||||
|
||||
KINETICS = _DatasetSpec(
|
||||
"https://github.com/microsoft/ComputerVision/files/3746975/kinetics400_lable_map.txt",
|
||||
os.path.join("data", "kinetics400"),
|
||||
400
|
||||
Urls.kinetics_label_map, 400, os.path.join("data", "kinetics400"),
|
||||
)
|
||||
|
||||
HMDB51 = _DatasetSpec(
|
||||
"https://github.com/microsoft/ComputerVision/files/3746963/hmdb51_label_map.txt",
|
||||
os.path.join("data", "hmdb51"),
|
||||
51
|
||||
Urls.hmdb51_label_map, 51, os.path.join("data", "hmdb51"),
|
||||
)
|
||||
|
||||
|
||||
class VideoRecord(object):
|
||||
def __init__(self, row):
|
||||
self._data = row
|
||||
self._num_frames = -1
|
||||
|
||||
@property
|
||||
def path(self):
|
||||
return self._data[0]
|
||||
|
||||
@property
|
||||
def num_frames(self):
|
||||
if self._num_frames == -1:
|
||||
self._num_frames = int(len([x for x in Path(self._data[0]).glob('img_*')]) - 1)
|
||||
return self._num_frames
|
||||
|
||||
@property
|
||||
def label(self):
|
||||
return int(self._data[1])
|
||||
|
||||
|
||||
class VideoDataset(Dataset):
|
||||
"""
|
||||
Args:
|
||||
split_file (str): Annotation file containing video filenames and labels.
|
||||
video_dir (str): Videos directory.
|
||||
num_segments (int): Number of clips to sample from each video.
|
||||
sample_length (int): Number of consecutive frames to sample from a video (i.e. clip length).
|
||||
sample_step (int): Sampling step.
|
||||
input_size (int or tuple): Model input image size.
|
||||
im_scale (int or tuple): Resize target size.
|
||||
resize_keep_ratio (bool): If True, keep the original ratio when resizing.
|
||||
mean (tuple): Normalization mean.
|
||||
std (tuple): Normalization std.
|
||||
random_shift (bool): Random temporal shift when sample a clip.
|
||||
temporal_jitter (bool): Randomly skip frames when sampling each frames.
|
||||
flip_ratio (float): Horizontal flip ratio.
|
||||
random_crop (bool): If False, do center-crop.
|
||||
random_crop_scales (tuple): Range of size of the origin size random cropped.
|
||||
video_ext (str): Video file extension.
|
||||
warning (bool): On or off warning.
|
||||
"""
|
||||
def __init__(
|
||||
self,
|
||||
split_file,
|
||||
video_dir,
|
||||
num_segments=1,
|
||||
sample_length=8,
|
||||
sample_step=1,
|
||||
input_size=112,
|
||||
im_scale=128,
|
||||
resize_keep_ratio=True,
|
||||
mean=DEFAULT_MEAN,
|
||||
std=DEFAULT_STD,
|
||||
random_shift=False,
|
||||
temporal_jitter=False,
|
||||
flip_ratio=0.5,
|
||||
random_crop=False,
|
||||
random_crop_scales=(0.6, 1.0),
|
||||
video_ext="mp4",
|
||||
warning=False,
|
||||
):
|
||||
# TODO maybe check wrong arguments to early failure
|
||||
assert sample_step > 0
|
||||
assert num_segments > 0
|
||||
|
||||
self.video_dir = video_dir
|
||||
self.video_records = [
|
||||
VideoRecord(x.strip().split(" ")) for x in open(split_file)
|
||||
]
|
||||
|
||||
self.num_segments = num_segments
|
||||
self.sample_length = sample_length
|
||||
self.sample_step = sample_step
|
||||
self.presample_length = sample_length * sample_step
|
||||
|
||||
# Temporal noise
|
||||
self.random_shift = random_shift
|
||||
self.temporal_jitter = temporal_jitter
|
||||
|
||||
# Video transforms
|
||||
# 1. resize
|
||||
trfms = [
|
||||
transforms.ToTensorVideo(),
|
||||
transforms.ResizeVideo(im_scale, resize_keep_ratio),
|
||||
]
|
||||
# 2. crop
|
||||
if random_crop:
|
||||
if random_crop_scales is not None:
|
||||
crop = transforms.RandomResizedCropVideo(input_size, random_crop_scales)
|
||||
else:
|
||||
crop = transforms.RandomCropVideo(input_size)
|
||||
else:
|
||||
crop = transforms.CenterCropVideo(input_size)
|
||||
trfms.append(crop)
|
||||
# 3. flip
|
||||
trfms.append(transforms.RandomHorizontalFlipVideo(flip_ratio))
|
||||
# 4. normalize
|
||||
trfms.append(transforms.NormalizeVideo(mean, std))
|
||||
self.transforms = Compose(trfms)
|
||||
self.video_ext = video_ext
|
||||
self.warning = warning
|
||||
|
||||
def __len__(self):
|
||||
return len(self.video_records)
|
||||
|
||||
def _sample_indices(self, record):
|
||||
"""
|
||||
Args:
|
||||
record (VideoRecord): A video record.
|
||||
Return:
|
||||
list: Segment offsets (start indices)
|
||||
"""
|
||||
if record.num_frames > self.presample_length:
|
||||
if self.random_shift:
|
||||
# Random sample
|
||||
offsets = np.sort(
|
||||
randint(
|
||||
record.num_frames - self.presample_length + 1,
|
||||
size=self.num_segments,
|
||||
)
|
||||
)
|
||||
else:
|
||||
# Uniform sample
|
||||
distance = (record.num_frames - self.presample_length + 1) / self.num_segments
|
||||
offsets = np.array(
|
||||
[int(distance / 2.0 + distance * x) for x in range(self.num_segments)]
|
||||
)
|
||||
else:
|
||||
if self.warning:
|
||||
warnings.warn(
|
||||
"num_segments and/or sample_length > num_frames in {}".format(
|
||||
record.path
|
||||
)
|
||||
)
|
||||
offsets = np.zeros((self.num_segments,), dtype=int)
|
||||
|
||||
return offsets
|
||||
|
||||
def _get_frames(self, video_reader, offset):
|
||||
clip = list()
|
||||
|
||||
# decord.seek() seems to have a bug. use seek_accurate().
|
||||
video_reader.seek_accurate(offset)
|
||||
# first frame
|
||||
clip.append(video_reader.next().asnumpy())
|
||||
# remaining frames
|
||||
try:
|
||||
if self.temporal_jitter:
|
||||
for i in range(self.sample_length - 1):
|
||||
step = randint(self.sample_step + 1)
|
||||
if step == 0:
|
||||
clip.append(clip[-1].copy())
|
||||
else:
|
||||
if step > 1:
|
||||
video_reader.skip_frames(step - 1)
|
||||
cur_frame = video_reader.next().asnumpy()
|
||||
if len(cur_frame.shape) != 3:
|
||||
# maybe end of the video
|
||||
break
|
||||
clip.append(cur_frame)
|
||||
else:
|
||||
for i in range(self.sample_length - 1):
|
||||
if self.sample_step > 1:
|
||||
video_reader.skip_frames(self.sample_step - 1)
|
||||
cur_frame = video_reader.next().asnumpy()
|
||||
if len(cur_frame.shape) != 3:
|
||||
# maybe end of the video
|
||||
break
|
||||
clip.append(cur_frame)
|
||||
except StopIteration:
|
||||
pass
|
||||
|
||||
# if clip needs more frames, simply duplicate the last frame in the clip.
|
||||
while len(clip) < self.sample_length:
|
||||
clip.append(clip[-1].copy())
|
||||
|
||||
return clip
|
||||
|
||||
def __getitem__(self, idx):
|
||||
"""
|
||||
Return:
|
||||
clips (torch.tensor), label (int)
|
||||
"""
|
||||
record = self.video_records[idx]
|
||||
video_reader = decord.VideoReader(
|
||||
"{}.{}".format(os.path.join(self.video_dir, record.path), self.video_ext),
|
||||
# TODO try to add `ctx=decord.ndarray.gpu(0) or .cuda(0)`
|
||||
)
|
||||
record._num_frames = len(video_reader)
|
||||
|
||||
offsets = self._sample_indices(record)
|
||||
clips = np.array([self._get_frames(video_reader, o) for o in offsets])
|
||||
|
||||
if self.num_segments == 1:
|
||||
# [T, H, W, C] -> [C, T, H, W]
|
||||
return self.transforms(torch.from_numpy(clips[0])), record.label
|
||||
else:
|
||||
# [S, T, H, W, C] -> [S, C, T, H, W]
|
||||
return (
|
||||
torch.stack([
|
||||
self.transforms(torch.from_numpy(c)) for c in clips
|
||||
]),
|
||||
record.label
|
||||
)
|
||||
|
||||
|
||||
def show_batch(batch, sample_length, mean=DEFAULT_MEAN, std=DEFAULT_STD):
|
||||
"""
|
||||
Args:
|
||||
batch (list[torch.tensor]): List of sample (clip) tensors
|
||||
sample_length (int): Number of frames to show for each sample
|
||||
mean (tuple): Normalization mean
|
||||
std (tuple): Normalization std-dev
|
||||
"""
|
||||
batch_size = len(batch)
|
||||
plt.tight_layout()
|
||||
fig, axs = plt.subplots(
|
||||
batch_size,
|
||||
sample_length,
|
||||
figsize=(4 * sample_length, 3 * batch_size)
|
||||
)
|
||||
|
||||
for i, ax in enumerate(axs):
|
||||
if batch_size == 1:
|
||||
clip = batch[0]
|
||||
else:
|
||||
clip = batch[i]
|
||||
clip = Rearrange("c t h w -> t c h w")(clip)
|
||||
if not isinstance(ax, np.ndarray):
|
||||
ax = [ax]
|
||||
for j, a in enumerate(ax):
|
||||
a.axis("off")
|
||||
a.imshow(
|
||||
np.moveaxis(
|
||||
denormalize(
|
||||
clip[j],
|
||||
mean,
|
||||
std,
|
||||
).numpy(),
|
||||
0,
|
||||
-1,
|
||||
)
|
||||
)
|
|
@ -0,0 +1,498 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import os
|
||||
import copy
|
||||
from pathlib import Path
|
||||
import warnings
|
||||
from typing import Callable, Tuple, Union, List
|
||||
|
||||
import decord
|
||||
from einops.layers.torch import Rearrange
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from numpy.random import randint
|
||||
import torch
|
||||
from torch.utils.data import Dataset, Subset, DataLoader
|
||||
from torchvision.transforms import Compose
|
||||
|
||||
from .references import transforms_video as transforms
|
||||
from .references.functional_video import denormalize
|
||||
|
||||
from ..common.misc import Config
|
||||
from ..common.gpu import num_devices
|
||||
|
||||
Trans = Callable[[object, dict], Tuple[object, dict]]
|
||||
|
||||
DEFAULT_MEAN = (0.43216, 0.394666, 0.37645)
|
||||
DEFAULT_STD = (0.22803, 0.22145, 0.216989)
|
||||
|
||||
|
||||
class VideoRecord(object):
|
||||
"""
|
||||
This class is used for parsing split-files where each row contains a path
|
||||
and a label:
|
||||
|
||||
Ex:
|
||||
```
|
||||
path/to/my/clip.mp4 3
|
||||
path/to/another/clip.mp4 32
|
||||
```
|
||||
"""
|
||||
|
||||
def __init__(self, data: List[str]):
|
||||
""" Initialized a VideoRecord
|
||||
|
||||
Args:
|
||||
data: a list where the first element is the path and the second element is
|
||||
the label
|
||||
"""
|
||||
self._data = data
|
||||
self._num_frames = None
|
||||
|
||||
@property
|
||||
def path(self) -> str:
|
||||
return self._data[0]
|
||||
|
||||
@property
|
||||
def num_frames(self) -> int:
|
||||
if self._num_frames is None:
|
||||
self._num_frames = int(
|
||||
len([x for x in Path(self._data[0]).glob("img_*")]) - 1
|
||||
)
|
||||
return self._num_frames
|
||||
|
||||
@property
|
||||
def label(self) -> int:
|
||||
return int(self._data[1])
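# Illustrative only (not part of this commit): a VideoRecord wraps one row of a
# split file, e.g.
#   rec = VideoRecord("action/drinking_clip01 3".split(" "))
#   rec.path    # -> "action/drinking_clip01"
#   rec.label   # -> 3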
|
||||
|
||||
|
||||
def get_transforms(train: bool, tfms_config: Config = None) -> Trans:
|
||||
""" Get default transformations to apply depending on whether we're applying it to the training or the validation set. If no tfms configurations are passed in, use the defaults.
|
||||
|
||||
Args:
|
||||
train: whether or not this is for training
|
||||
tfms_config: Config object with transform-related settings
|
||||
|
||||
Returns:
|
||||
A torchvision Compose chaining the transforms to apply
|
||||
"""
|
||||
if tfms_config is None:
|
||||
tfms_config = (
|
||||
get_default_tfms_config(train=True)
|
||||
if train
|
||||
else get_default_tfms_config(train=False)
|
||||
)
|
||||
|
||||
# 1. resize
|
||||
tfms = [
|
||||
transforms.ToTensorVideo(),
|
||||
transforms.ResizeVideo(
|
||||
tfms_config.im_scale, tfms_config.resize_keep_ratio
|
||||
),
|
||||
]
|
||||
# 2. crop
|
||||
if tfms_config.random_crop:
|
||||
if tfms_config.random_crop_scales:
|
||||
crop = transforms.RandomResizedCropVideo(
|
||||
tfms_config.input_size, tfms_config.random_crop_scales
|
||||
)
|
||||
else:
|
||||
crop = transforms.RandomCropVideo(tfms_config.input_size)
|
||||
else:
|
||||
crop = transforms.CenterCropVideo(tfms_config.input_size)
|
||||
tfms.append(crop)
|
||||
# 3. flip
|
||||
tfms.append(transforms.RandomHorizontalFlipVideo(tfms_config.flip_ratio))
|
||||
# 4. normalize
|
||||
tfms.append(transforms.NormalizeVideo(tfms_config.mean, tfms_config.std))
|
||||
|
||||
return Compose(tfms)
|
||||
|
||||
|
||||
def get_default_tfms_config(train: bool) -> Config:
|
||||
"""
|
||||
Args:
|
||||
train: whether or not this is for training
|
||||
|
||||
Settings:
|
||||
input_size (int or tuple): Model input image size.
|
||||
im_scale (int or tuple): Resize target size.
|
||||
resize_keep_ratio (bool): If True, keep the original ratio when resizing.
|
||||
mean (tuple): Normalization mean.
|
||||
std (tuple): Normalization std.
|
||||
if train:
|
||||
flip_ratio (float): Horizontal flip ratio.
|
||||
random_crop (bool): If False, do center-crop.
|
||||
random_crop_scales (tuple): Range of scales, relative to the original size, of the random crop.
|
||||
"""
|
||||
flip_ratio = 0.5 if train else 0.0
|
||||
random_crop = True if train else False
|
||||
random_crop_scales = (0.6, 1.0) if train else None
|
||||
|
||||
return Config(
|
||||
dict(
|
||||
input_size=112,
|
||||
im_scale=128,
|
||||
resize_keep_ratio=True,
|
||||
mean=DEFAULT_MEAN,
|
||||
std=DEFAULT_STD,
|
||||
flip_ratio=flip_ratio,
|
||||
random_crop=random_crop,
|
||||
random_crop_scales=random_crop_scales,
|
||||
)
|
||||
)
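# Illustrative only: the returned Config can be tweaked before being passed to
# get_transforms(), e.g. to disable horizontal flips during training:
#   tfms_cfg = get_default_tfms_config(train=True)
#   tfms_cfg.flip_ratio = 0.0
#   train_tfms = get_transforms(train=True, tfms_config=tfms_cfg)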
|
||||
|
||||
|
||||
class VideoDataset:
|
||||
""" A video recognition dataset. """
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
root: str,
|
||||
train_pct: float = 0.75,
|
||||
num_samples: int = 1,
|
||||
sample_length: int = 8,
|
||||
sample_step: int = 1,
|
||||
temporal_jitter: bool = True,
|
||||
temporal_jitter_step: int = 2,
|
||||
random_shift: bool = True,
|
||||
batch_size: int = 8,
|
||||
video_ext: str = "mp4",
|
||||
warning: bool = False,
|
||||
train_split_file: str = None,
|
||||
test_split_file: str = None,
|
||||
train_transforms: Trans = get_transforms(train=True),
|
||||
test_transforms: Trans = get_transforms(train=False),
|
||||
) -> None:
|
||||
""" initialize dataset
|
||||
|
||||
Args:
|
||||
root: Videos directory.
|
||||
train_pct: percentage of dataset to use for training
|
||||
num_samples: Number of clips to sample from each video.
|
||||
sample_length: Number of consecutive frames to sample from a video (i.e. clip length).
|
||||
sample_step: Sampling step.
|
||||
temporal_jitter: Randomly skip frames when sampling each frame.
|
||||
temporal_jitter_step: temporal jitter in frames
|
||||
random_shift: Random temporal shift when sampling a clip.
|
||||
video_ext: Video file extension.
|
||||
warning: On or off warning.
|
||||
train_split_file: Annotation file containing video filenames and labels.
|
||||
test_split_file: Annotation file containing video filenames and labels.
|
||||
train_transforms: transforms for training
|
||||
test_transforms: transforms for testing
|
||||
"""
|
||||
|
||||
# TODO check wrong arguments early to prevent failure
|
||||
assert sample_step > 0
|
||||
assert num_samples > 0
|
||||
|
||||
if temporal_jitter:
|
||||
assert temporal_jitter_step > 0
|
||||
|
||||
if train_split_file:
|
||||
assert Path(train_split_file).exists()
|
||||
assert (
|
||||
test_split_file is not None and Path(test_split_file).exists()
|
||||
)
|
||||
|
||||
if test_split_file:
|
||||
assert Path(test_split_file).exists()
|
||||
assert (
|
||||
train_split_file is not None
|
||||
and Path(train_split_file).exists()
|
||||
)
|
||||
|
||||
self.root = root
|
||||
self.num_samples = num_samples
|
||||
self.sample_length = sample_length
|
||||
self.sample_step = sample_step
|
||||
self.presample_length = sample_length * sample_step
|
||||
self.temporal_jitter_step = temporal_jitter_step
|
||||
self.train_transforms = train_transforms
|
||||
self.test_transforms = test_transforms
|
||||
self.random_shift = random_shift
|
||||
self.temporal_jitter = temporal_jitter
|
||||
self.batch_size = batch_size
|
||||
self.video_ext = video_ext
|
||||
self.warning = warning
|
||||
|
||||
# create training and validation datasets
|
||||
self.train_ds, self.test_ds = (
|
||||
self.split_with_file(
|
||||
train_split_file=train_split_file,
|
||||
test_split_file=test_split_file,
|
||||
)
|
||||
if train_split_file
|
||||
else self.split_train_test(train_pct=train_pct)
|
||||
)
|
||||
|
||||
# initialize dataloaders
|
||||
self.init_data_loaders()
|
||||
|
||||
def split_train_test(
|
||||
self, train_pct: float = 0.8
|
||||
) -> Tuple[Dataset, Dataset]:
|
||||
""" Split this dataset into a training and testing set
|
||||
|
||||
Args:
|
||||
train_pct: the ratio of videos to use for training vs
|
||||
testing
|
||||
|
||||
Return
|
||||
A training and testing dataset in that order
|
||||
"""
|
||||
pass
|
||||
|
||||
def split_with_file(
|
||||
self,
|
||||
train_split_file: Union[Path, str],
|
||||
test_split_file: Union[Path, str],
|
||||
) -> Tuple[Dataset, Dataset]:
|
||||
""" Split this dataset into a training and testing set using a split file.
|
||||
|
||||
Each line in the split file must use the form:
|
||||
```
|
||||
path/to/jumping/video.mp4 3
|
||||
path/to/swimming/video.mp4 5
|
||||
path/to/another/jumping/video.mp4 3
|
||||
```
|
||||
|
||||
Args:
|
||||
train_split_file, test_split_file: paths to the train and test split files
|
||||
|
||||
Return:
|
||||
A training and testing dataset in that order
|
||||
"""
|
||||
self.video_records = []
|
||||
|
||||
# add train records
|
||||
self.video_records.extend(
|
||||
[
|
||||
VideoRecord(row.strip().split(" "))
|
||||
for row in open(train_split_file)
|
||||
]
|
||||
)
|
||||
train_len = len(self.video_records)
|
||||
|
||||
# add validation records
|
||||
self.video_records.extend(
|
||||
[
|
||||
VideoRecord(row.strip().split(" "))
|
||||
for row in open(test_split_file)
|
||||
]
|
||||
)
|
||||
|
||||
# create indices
|
||||
indices = torch.arange(0, len(self.video_records))
|
||||
train_range = indices[:train_len]
|
||||
test_range = indices[train_len:]
|
||||
|
||||
# create train subset
|
||||
train = copy.deepcopy(Subset(self, train_range))
|
||||
train.dataset.transforms = self.train_transforms
|
||||
train.dataset.sample_step = (
|
||||
self.temporal_jitter_step
|
||||
if self.temporal_jitter
|
||||
else self.sample_step
|
||||
)
|
||||
train.dataset.presample_length = self.sample_length * self.sample_step
|
||||
|
||||
# create test subset
|
||||
test = copy.deepcopy(Subset(self, test_range))
|
||||
test.dataset.transforms = self.test_transforms
|
||||
test.dataset.random_shift = False
|
||||
test.dataset.temporal_jitter = False
|
||||
|
||||
return train, test
|
||||
|
||||
def init_data_loaders(self) -> None:
|
||||
""" Create training and validation data loaders. """
|
||||
devices = num_devices()
|
||||
|
||||
self.train_dl = DataLoader(
|
||||
self.train_ds,
|
||||
batch_size=self.batch_size * devices,
|
||||
shuffle=True,
|
||||
num_workers=0, # Torch 1.2 has a bug when num-workers > 0 (0 means run a main-processor worker)
|
||||
pin_memory=True,
|
||||
)
|
||||
|
||||
self.test_dl = DataLoader(
|
||||
self.test_ds,
|
||||
batch_size=self.batch_size * devices,
|
||||
shuffle=False,
|
||||
num_workers=0,
|
||||
pin_memory=True,
|
||||
)
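# Note (illustrative arithmetic, not from this commit): with the default
# batch_size of 8 and, say, 4 visible GPUs, num_devices() makes each loader
# yield 8 * 4 = 32 clips per batch, presumably so the training loop can split
# one batch across all devices.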
|
||||
|
||||
def __len__(self) -> int:
|
||||
return len(self.video_records)
|
||||
|
||||
def _sample_indices(self, record: VideoRecord) -> List[int]:
|
||||
"""
|
||||
Create a list of frame-wise offsets into a video record. Depending on
|
||||
whether or not 'random shift' is used, perform a uniform sample or a
|
||||
random sample.
|
||||
|
||||
Args:
|
||||
record (VideoRecord): A video record.
|
||||
|
||||
Return:
|
||||
list: Segment offsets (start indices)
|
||||
"""
|
||||
if record.num_frames > self.presample_length:
|
||||
if self.random_shift:
|
||||
# Random sample
|
||||
offsets = np.sort(
|
||||
randint(
|
||||
record.num_frames - self.presample_length + 1,
|
||||
size=self.num_samples,
|
||||
)
|
||||
)
|
||||
else:
|
||||
# Uniform sample
|
||||
distance = (
|
||||
record.num_frames - self.presample_length + 1
|
||||
) / self.num_samples
|
||||
offsets = np.array(
|
||||
[
|
||||
int(distance / 2.0 + distance * x)
|
||||
for x in range(self.num_samples)
|
||||
]
|
||||
)
|
||||
else:
|
||||
if self.warning:
|
||||
warnings.warn(
|
||||
f"num_samples and/or sample_length > num_frames in {record.path}"
|
||||
)
|
||||
offsets = np.zeros((self.num_samples,), dtype=int)
|
||||
|
||||
return offsets
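# Worked example (illustrative): for num_frames=100, sample_length=8,
# sample_step=1 (so presample_length=8) and num_samples=3, uniform sampling
# gives distance = (100 - 8 + 1) / 3 = 31.0 and offsets
# [int(15.5), int(46.5), int(77.5)] = [15, 46, 77]; with random_shift=True the
# offsets are instead drawn from randint(93, size=3) and sorted.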
|
||||
|
||||
def _get_frames(
|
||||
self, video_reader: decord.VideoReader, offset: int,
|
||||
) -> List[np.ndarray]:
|
||||
""" Get frames at sample length.
|
||||
|
||||
Args:
|
||||
video_reader: the decord tool for parsing videos
|
||||
offset: where to start the reader from
|
||||
|
||||
Returns
|
||||
Frames at sample length in a List
|
||||
"""
|
||||
clip = list()
|
||||
|
||||
# decord.seek() seems to have a bug. use seek_accurate().
|
||||
video_reader.seek_accurate(offset)
|
||||
|
||||
# first frame
|
||||
clip.append(video_reader.next().asnumpy())
|
||||
|
||||
# remaining frames
|
||||
try:
|
||||
for i in range(self.sample_length - 1):
|
||||
step = (
|
||||
randint(self.sample_step + 1)
|
||||
if self.temporal_jitter
|
||||
else self.sample_step
|
||||
)
|
||||
|
||||
if step == 0 and self.temporal_jitter:
|
||||
clip.append(clip[-1].copy())
|
||||
else:
|
||||
if step > 1:
|
||||
video_reader.skip_frames(step - 1)
|
||||
cur_frame = video_reader.next().asnumpy()
|
||||
clip.append(cur_frame)
|
||||
|
||||
except StopIteration:
|
||||
# pass when video has ended
|
||||
pass
|
||||
|
||||
# if clip needs more frames, simply duplicate the last frame in the clip.
|
||||
while len(clip) < self.sample_length:
|
||||
clip.append(clip[-1].copy())
|
||||
|
||||
return clip
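# Illustrative note: with temporal_jitter off, sample_length=8 and
# sample_step=2, the frames gathered from offset o are o, o+2, ..., o+14
# (skip_frames(step - 1) advances the reader before each next() call); if the
# video runs out early, the last frame is duplicated until the clip holds
# sample_length frames.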
|
||||
|
||||
def __getitem__(self, idx: int) -> Tuple[torch.tensor, int]:
|
||||
"""
|
||||
Return:
|
||||
clips (torch.tensor), label (int)
|
||||
"""
|
||||
record = self.video_records[idx]
|
||||
video_reader = decord.VideoReader(
|
||||
"{}.{}".format(
|
||||
os.path.join(self.root, record.path), self.video_ext
|
||||
),
|
||||
# TODO try to add `ctx=decord.ndarray.gpu(0) or .cuda(0)`
|
||||
)
|
||||
record._num_frames = len(video_reader)
|
||||
|
||||
offsets = self._sample_indices(record)
|
||||
clips = np.array([self._get_frames(video_reader, o) for o in offsets])
|
||||
|
||||
if self.num_samples == 1:
|
||||
# [T, H, W, C] -> [C, T, H, W]
|
||||
return self.transforms(torch.from_numpy(clips[0])), record.label
|
||||
else:
|
||||
# [S, T, H, W, C] -> [S, C, T, H, W]
|
||||
return (
|
||||
torch.stack(
|
||||
[self.transforms(torch.from_numpy(c)) for c in clips]
|
||||
),
|
||||
record.label,
|
||||
)
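# Shape example (illustrative): with num_samples=2, sample_length=8 and the
# default input_size of 112, __getitem__ returns a tensor of shape
# [2, 3, 8, 112, 112] plus an integer label; with num_samples=1 the leading
# sample dimension is dropped and the shape is [3, 8, 112, 112].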
    def _show_batch(
        self,
        batch: List[torch.tensor],
        sample_length: int,
        mean: Tuple[int, int, int] = DEFAULT_MEAN,
        std: Tuple[int, int, int] = DEFAULT_STD,
    ) -> None:
        """
        Display a batch of images.

        Args:
            batch: List of sample (clip) tensors
            sample_length: Number of frames to show for each sample
            mean: Normalization mean
            std: Normalization std-dev
        """
        batch_size = len(batch)
        plt.tight_layout()
        fig, axs = plt.subplots(
            batch_size,
            sample_length,
            figsize=(4 * sample_length, 3 * batch_size),
        )

        for i, ax in enumerate(axs):
            if batch_size == 1:
                clip = batch[0]
            else:
                clip = batch[i]
            clip = Rearrange("c t h w -> t c h w")(clip)
            if not isinstance(ax, np.ndarray):
                ax = [ax]
            for j, a in enumerate(ax):
                a.axis("off")
                a.imshow(
                    np.moveaxis(denormalize(clip[j], mean, std).numpy(), 0, -1)
                )

    def show_batch(self, train_or_test: str = "train", rows: int = 1) -> None:
        """Plot the first few samples in the dataset."""
        if train_or_test == "train":
            batch = [self.train_ds.dataset[i][0] for i in range(rows)]
        elif train_or_test == "valid":
            batch = [self.test_ds.dataset[i][0] for i in range(rows)]
        else:
            raise ValueError("Unknown data type {}".format(train_or_test))

        self._show_batch(batch, self.sample_length)
@@ -5,198 +5,137 @@ from collections import OrderedDict

import os
import time
import warnings
from typing import Union
from pathlib import Path

try:
    from apex import amp

    AMP_AVAILABLE = True
except ModuleNotFoundError:
    AMP_AVAILABLE = False

from IPython.core.debugger import set_trace
import numpy as np
import torch
import torch.cuda as cuda
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision

from . import Config
from .data import (
    DEFAULT_MEAN,
    DEFAULT_STD,
    show_batch as _show_batch,
    VideoDataset,
)
from ..common.misc import Config
from ..common.gpu import torch_device, num_devices
from .dataset import VideoDataset

from .metrics import accuracy, AverageMeter
from .references.metrics import accuracy, AverageMeter

# From https://github.com/moabitcoin/ig65m-pytorch
TORCH_R2PLUS1D = "moabitcoin/ig65m-pytorch"
# These parameters are set so that we can use torch hub to download pretrained
# models from the specified repo
TORCH_R2PLUS1D = "moabitcoin/ig65m-pytorch"  # From https://github.com/moabitcoin/ig65m-pytorch
MODELS = {
    # model: output classes
    'r2plus1d_34_32_ig65m': 359,
    'r2plus1d_34_32_kinetics': 400,
    'r2plus1d_34_8_ig65m': 487,
    'r2plus1d_34_8_kinetics': 400,
    # Model name followed by the number of output classes.
    "r2plus1d_34_32_ig65m": 359,
    "r2plus1d_34_32_kinetics": 400,
    "r2plus1d_34_8_ig65m": 487,
    "r2plus1d_34_8_kinetics": 400,
}
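For reference, a minimal sketch of what these constants are used for (the 51-class head is a hypothetical HMDB51 fine-tune, and torch hub must be able to reach the repo):

import torch
import torch.nn as nn

model = torch.hub.load(
    "moabitcoin/ig65m-pytorch",   # TORCH_R2PLUS1D
    "r2plus1d_34_8_ig65m",        # 8-frame clips pretrained on IG65M
    num_classes=487,              # must match the pretrained head size listed in MODELS
    pretrained=True,
)
model.fc = nn.Linear(model.fc.in_features, 51)  # replace the head, e.g. 51 HMDB51 classes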


class R2Plus1D(object):
    def __init__(self, cfgs):
        self.configs = Config(cfgs)
        self.train_ds, self.valid_ds = self.load_datasets(self.configs)
        self.model = self.init_model(
            self.configs.sample_length,
            self.configs.base_model,
            self.configs.num_classes
class VideoLearner(object):
    """ Video recognition learner object that handles training loop and evaluation. """

    def __init__(
        self,
        dataset: VideoDataset,
        num_classes: int,  # e.g. 51 for hmdb51
        base_model: str = "ig65m",  # or "kinetics"
    ) -> None:
        """ By default, the VideoLearner will use an R2plus1D model. Pass in
        a dataset of type VideoDataset and the VideoLearner will initialize
        the model.

        Args:
            dataset: the dataset to use for this model
            num_classes: the number of actions/classifications
            base_model: the R2plus1D model is based on either ig65m or
                kinetics. By default it will use the weights from ig65m since
                they tend to attain higher results.
        """
        self.dataset = dataset
        self.model, self.model_name = self.init_model(
            self.dataset.sample_length, base_model, num_classes,
        )
        self.model_name = "r2plus1d_34_{}_{}".format(self.configs.sample_length, self.configs.base_model)
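A minimal usage sketch of the new learner (the VideoDataset constructor arguments are assumed/abridged):

data = VideoDataset("path/to/video/data")      # hypothetical root folder with split files
learner = VideoLearner(data, num_classes=51)   # defaults to the "ig65m" base model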
    @staticmethod
    def init_model(sample_length, base_model, num_classes=None):
        if base_model not in ('ig65m', 'kinetics'):
    def init_model(
        sample_length: int, base_model: str, num_classes: int = None
    ) -> torchvision.models.video.resnet.VideoResNet:
        """
        Initializes the model by loading it using torch's `hub.load`
        functionality. Uses the model from TORCH_R2PLUS1D.

        Args:
            sample_length: Number of consecutive frames to sample from a video (i.e. clip length).
            base_model: the R2plus1D model is based on either ig65m or kinetics.
            num_classes: the number of classes/actions

        Returns:
            A model loaded from the TORCH_R2PLUS1D GitHub repo, with pretrained weights.
        """
        if base_model not in ("ig65m", "kinetics"):
            raise ValueError(
                "Not supported model {}. Should be 'ig65m' or 'kinetics'"
                .format(base_model)
                f"Not supported model {base_model}. Should be 'ig65m' or 'kinetics'"
            )

        # Decide whether to use the pre-trained weights from the DNN trained on 8 or on 32 frames
        if sample_length<=8:
        if sample_length <= 8:
            model_sample_length = 8
        else:
            model_sample_length = 32
        model_name = "r2plus1d_34_{}_{}".format(model_sample_length, base_model)

        print("Loading {} model".format(model_name))
        model_name = f"r2plus1d_34_{model_sample_length}_{base_model}"

        print(f"Loading {model_name} model")

        model = torch.hub.load(
            TORCH_R2PLUS1D, model_name, num_classes=MODELS[model_name], pretrained=True
            TORCH_R2PLUS1D,
            model_name,
            num_classes=MODELS[model_name],
            pretrained=True,
        )

        # Replace head
        if num_classes is not None:
            model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model

    @staticmethod
    def load_datasets(cfgs):
        """Load VideoDataset
        return model, model_name
        Args:
            cfgs (dict or Config): Dataset configuration. For validation dataset,
                data augmentation such as random shift and temporal jitter is not used.

        Return:
            VideoDataset, VideoDataset: Train and validation datasets.
                If split file is not provided, returns None.
        """
        cfgs = Config(cfgs)

        train_split = cfgs.get('train_split', None)
        train_ds = None if train_split is None else VideoDataset(
            split_file=train_split,
            video_dir=cfgs.video_dir,
            num_segments=1,
            sample_length=cfgs.sample_length,
            sample_step=cfgs.get('temporal_jitter_step', cfgs.get('sample_step', 1)),
            input_size=112,
            im_scale=cfgs.get('im_scale', 128),
            resize_keep_ratio=cfgs.get('resize_keep_ratio', True),
            mean=cfgs.get('mean', DEFAULT_MEAN),
            std=cfgs.get('std', DEFAULT_STD),
            random_shift=cfgs.get('random_shift', True),
            temporal_jitter=True if cfgs.get('temporal_jitter_step', 0) > 0 else False,
            flip_ratio=cfgs.get('flip_ratio', 0.5),
            random_crop=cfgs.get('random_crop', True),
            random_crop_scales=cfgs.get('random_crop_scales', (0.6, 1.0)),
            video_ext=cfgs.video_ext,
        )

        valid_split = cfgs.get('valid_split', None)
        valid_ds = None if valid_split is None else VideoDataset(
            split_file=valid_split,
            video_dir=cfgs.video_dir,
            num_segments=1,
            sample_length=cfgs.sample_length,
            sample_step=cfgs.get('sample_step', 1),
            input_size=112,
            im_scale=cfgs.get('im_scale', 128),
            resize_keep_ratio=True,
            mean=cfgs.get('mean', DEFAULT_MEAN),
            std=cfgs.get('std', DEFAULT_STD),
            random_shift=False,
            temporal_jitter=False,
            flip_ratio=0.0,
            random_crop=False,  # == Center crop
            random_crop_scales=None,
            video_ext=cfgs.video_ext,
        )

        return train_ds, valid_ds

    def show_batch(self, which_data='train', num_samples=1):
        """Plot first few samples in the datasets"""
        if which_data == 'train':
            batch = [self.train_ds[i][0] for i in range(num_samples)]
        elif which_data == 'valid':
            batch = [self.valid_ds[i][0] for i in range(num_samples)]
        else:
            raise ValueError("Unknown data type {}".format(which_data))
        _show_batch(
            batch,
            self.configs.sample_length,
            mean=self.configs.get('mean', DEFAULT_MEAN),
            std=self.configs.get('std', DEFAULT_STD),
        )

    def freeze(self):
    def freeze(self) -> None:
        """Freeze model except the last layer"""
        self._set_requires_grad(False)
        for param in self.model.fc.parameters():
            param.requires_grad = True

    def unfreeze(self):
    def unfreeze(self) -> None:
        self._set_requires_grad(True)

    def _set_requires_grad(self, requires_grad=True):
    def _set_requires_grad(self, requires_grad=True) -> None:
        for param in self.model.parameters():
            param.requires_grad = requires_grad

    def fit(self, train_cfgs):
    def fit(self, train_cfgs) -> None:
        """ The primary fit function """
        train_cfgs = Config(train_cfgs)

        model_dir = train_cfgs.get('model_dir', "checkpoints")
        model_dir = train_cfgs.get("model_dir", "checkpoints")
        os.makedirs(model_dir, exist_ok=True)

        if cuda.is_available():
            device = torch.device("cuda")
            num_devices = cuda.device_count()
            # Look for the optimal set of algorithms to use in cudnn. Use this only with fixed-size inputs.
            torch.backends.cudnn.benchmark = True
        else:
            device = torch.device("cpu")
            num_devices = 1

        data_loaders = {}
        if self.train_ds is not None:
            data_loaders['train'] = DataLoader(
                self.train_ds,
                batch_size=train_cfgs.get('batch_size', 8) * num_devices,
                shuffle=True,
                num_workers=0,  # Torch 1.2 has a bug when num_workers > 0 (0 means run a main-process worker)
                pin_memory=True,
            )
        if self.valid_ds is not None:
            data_loaders['valid'] = DataLoader(
                self.valid_ds,
                batch_size=train_cfgs.get('batch_size', 8) * num_devices,
                shuffle=False,
                num_workers=0,
                pin_memory=True,
            )
        data_loaders["train"] = self.dataset.train_dl
        data_loaders["valid"] = self.dataset.test_dl

        # Move model to gpu before constructing optimizers and amp.initialize
        device = torch_device()
        self.model.to(device)
        count_devices = num_devices()
        torch.backends.cudnn.benchmark = True

        named_params_to_update = {}
        total_params = 0
@@ -210,19 +149,22 @@ class R2Plus1D(object):
            print("\tfull network")
        else:
            for name in named_params_to_update:
                print("\t{}".format(name))
                print(f"\t{name}")

        momentum=train_cfgs.get('momentum', 0.95)
        # create optimizer
        momentum = train_cfgs.get("momentum", 0.95)
        optimizer = optim.SGD(
            list(named_params_to_update.values()),
            lr=train_cfgs.lr,
            momentum=momentum,
            weight_decay=train_cfgs.get('weight_decay', 0.0001),
            weight_decay=train_cfgs.get("weight_decay", 0.0001),
        )

        # Use mixed-precision if available
        # Currently, only O1 works with DataParallel: see https://github.com/NVIDIA/apex/issues/227
        if train_cfgs.get('mixed_prec', False) and AMP_AVAILABLE:
        if train_cfgs.get("mixed_prec", False):
            # break if not AMP_AVAILABLE
            assert AMP_AVAILABLE
            # 'O0': Full FP32, 'O1': Conservative, 'O2': Standard, 'O3': Full FP16
            self.model, optimizer = amp.initialize(
                self.model,
@@ -233,37 +175,35 @@ class R2Plus1D(object):
            )

        # Learning rate scheduler
        if train_cfgs.get('use_one_cycle_policy', False):
        if train_cfgs.get("use_one_cycle_policy", False):
            # Use warmup with the one-cycle policy
            scheduler = torch.optim.lr_scheduler.OneCycleLR(
                optimizer,
                max_lr=train_cfgs.lr,
                total_steps=train_cfgs.epochs,
                pct_start=train_cfgs.get('warmup_pct', 0.3),
                base_momentum=0.9*momentum,
                pct_start=train_cfgs.get("warmup_pct", 0.3),
                base_momentum=0.9 * momentum,
                max_momentum=momentum,
            )
        else:
            # Simple step-decay
            scheduler = torch.optim.lr_scheduler.StepLR(
                optimizer,
                step_size=train_cfgs.get('lr_step_size', float("inf")),
                gamma=train_cfgs.get('lr_gamma', 0.1),
                step_size=train_cfgs.get("lr_step_size", float("inf")),
                gamma=train_cfgs.get("lr_gamma", 0.1),
            )

        # DataParallel after amp.initialize
        if num_devices > 1:
            model = nn.DataParallel(self.model)
        else:
            model = self.model
        model = (
            nn.DataParallel(self.model) if count_devices > 1 else self.model
        )

        criterion = nn.CrossEntropyLoss().to(device)

        for e in range(1, train_cfgs.epochs + 1):
            print("Epoch {} ==========".format(e))
            if scheduler is not None:
                print("lr={}".format(scheduler.get_lr()))

            print(f"Epoch {e} ==========")
            print(f"lr={scheduler.get_lr()}")

            self.train_an_epoch(
                model,
                data_loaders,
@@ -276,14 +216,16 @@ class R2Plus1D(object):

            scheduler.step()

            if train_cfgs.get('save_models', False):
            if train_cfgs.get("save_models", False):
                self.save(
                    os.path.join(
                        model_dir,
                        "{model_name}_{epoch}.pt".format(
                            model_name=train_cfgs.get('model_name', self.model_name),
                            epoch=str(e).zfill(3)
                        )
                            model_name=train_cfgs.get(
                                "model_name", self.model_name
                            ),
                            epoch=str(e).zfill(3),
                        ),
                    )
                )
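A sketch of a typical two-stage fine-tuning run with the methods above (the learning rates and epoch counts are illustrative; the config keys are the ones read by fit()):

learner.freeze()                                   # train only the replaced fc head
learner.fit({"lr": 0.001, "epochs": 4})
learner.unfreeze()                                 # then fine-tune the whole network
learner.fit({"lr": 0.0001, "epochs": 8, "save_models": True})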
@@ -296,29 +238,35 @@ class R2Plus1D(object):
        optimizer,
        grad_steps=1,
        mixed_prec=False,
    ):
    ) -> None:
        """Train / validate a model for one epoch.

        :param model:
        :param data_loaders: dict {'train': train_dl, 'valid': valid_dl}
        :param device:
        :param criterion:
        :param optimizer:
        :param grad_steps: If > 1, use gradient accumulation. Useful for larger batching
        :param mixed_prec: If True, use FP16 + FP32 mixed precision via NVIDIA apex.amp
        :return: dict {
            'train/time': batch_time.avg,
            'train/loss': losses.avg,
            'train/top1': top1.avg,
            'train/top5': top5.avg,
            'valid/time': ...
        }
        Args:
            model: the model to train
            data_loaders: dict {'train': train_dl, 'valid': valid_dl}
            device: gpu or not
            criterion: TODO
            optimizer: TODO
            grad_steps: If > 1, use gradient accumulation. Useful for larger batching
            mixed_prec: If True, use FP16 + FP32 mixed precision via NVIDIA apex.amp

        Return:
            dict {
                'train/time': batch_time.sum,
                'train/loss': losses.avg,
                'train/top1': top1.avg,
                'train/top5': top5.avg,
                'valid/time': ...
            }
        """
        assert "train" in data_loaders
        if mixed_prec and not AMP_AVAILABLE:
            warnings.warn(
                "NVIDIA apex module is not installed. Cannot use mixed-precision."
                """
                NVIDIA apex module is not installed. Cannot use
                mixed-precision. Turning off mixed-precision.
                """
            )
            mixed_prec = False

        result = OrderedDict()
        for phase in ["train", "valid"]:
@@ -356,8 +304,10 @@ class R2Plus1D(object):
                # make the accumulated gradient the same scale as without accumulation
                loss = loss / grad_steps

                if mixed_prec and AMP_AVAILABLE:
                    with amp.scale_loss(loss, optimizer) as scaled_loss:
                if mixed_prec:
                    with amp.scale_loss(
                        loss, optimizer
                    ) as scaled_loss:
                        scaled_loss.backward()
                else:
                    loss.backward()
@@ -371,30 +321,26 @@ class R2Plus1D(object):
            end = time.time()

            print(
                "{} took {:.2f} sec: loss = {:.4f}, top1_acc = {:.4f}, top5_acc = {:.4f}".format(
                    phase, batch_time.sum, losses.avg, top1.avg, top5.avg
                )
                f"{phase} took {batch_time.sum:.2f} sec: loss = {losses.avg:.4f}, top1_acc = {top1.avg:.4f}, top5_acc = {top5.avg:.4f}"
            )
            result["{}/time".format(phase)] = batch_time.sum
            result["{}/loss".format(phase)] = losses.avg
            result["{}/top1".format(phase)] = top1.avg
            result["{}/top5".format(phase)] = top5.avg
            result[f"{phase}/time"] = batch_time.sum
            result[f"{phase}/loss"] = losses.avg
            result[f"{phase}/top1"] = top1.avg
            result[f"{phase}/top5"] = top5.avg

        return result

    def save(self, model_path):
        torch.save(
            self.model.state_dict(),
            model_path
        )
    def save(self, model_path: Union[Path, str]) -> None:
        """ Save the model to a path on disk. """
        torch.save(self.model.state_dict(), model_path)

    def load(self, model_name, model_dir="checkpoints"):
    def load(self, model_name: str, model_dir: str = "checkpoints") -> None:
        """
        TODO accept epoch. If None, load the latest model.
        :param model_name: Model name format should be 'name_0EE' where E is the epoch
        :param model_dir: By default, 'checkpoints'
        :return:
        """
        self.model.load_state_dict(torch.load(
            os.path.join(model_dir, "{}.pt".format(model_name))
        ))
        self.model.load_state_dict(
            torch.load(os.path.join(model_dir, f"{model_name}.pt"))
        )
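Checkpoints are plain state_dicts, so save() and load() need to agree on the naming scheme used in fit() ("{model_name}_{epoch}.pt" with a zero-padded epoch); the file name below is illustrative:

learner.save("checkpoints/r2plus1d_34_8_ig65m_001.pt")
learner.load("r2plus1d_34_8_ig65m_001")   # resolved as checkpoints/r2plus1d_34_8_ig65m_001.pt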

@@ -20,7 +20,7 @@ def crop(clip, i, j, h, w):
        clip (torch.tensor): Video clip to be cropped. Size is (C, T, H, W)
    """
    assert len(clip.size()) == 4, "clip should be a 4D tensor"
    return clip[..., i:i + h, j:j + w]
    return clip[..., i : i + h, j : j + w]


def resize(clip, target_size, interpolation_mode):

@@ -53,7 +53,9 @@ def center_crop(clip, crop_size):
    assert _is_tensor_video_clip(clip), "clip should be a 4D torch.tensor"
    h, w = clip.size(-2), clip.size(-1)
    th, tw = crop_size
    assert h >= th and w >= tw, "height and width must be no smaller than crop_size"
    assert (
        h >= th and w >= tw
    ), "height and width must be no smaller than crop_size"

    i = int(round((h - th) / 2.0))
    j = int(round((w - tw) / 2.0))

@@ -71,7 +73,9 @@ def to_tensor(clip):
    """
    assert _is_tensor_video_clip(clip), "clip should be a 4D torch.tensor"
    if not clip.dtype == torch.uint8:
        raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
        raise TypeError(
            "clip tensor should have data type uint8. Got %s" % str(clip.dtype)
        )
    return clip.float().permute(3, 0, 1, 2) / 255.0
@@ -5,6 +5,7 @@ import torch

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()
@@ -37,13 +37,19 @@ class ResizeVideo(object):
            size = (int(self.size), int(self.size))
        else:
            if self.keep_ratio:
                scale = min(self.size[0] / clip.shape[-2], self.size[1] / clip.shape[-1], )
                scale = min(
                    self.size[0] / clip.shape[-2],
                    self.size[1] / clip.shape[-1],
                )
            else:
                size = self.size

        return nn.functional.interpolate(
            clip, size=size, scale_factor=scale,
            mode=self.interpolation_mode, align_corners=False
            clip,
            size=size,
            scale_factor=scale,
            mode=self.interpolation_mode,
            align_corners=False,
        )


@@ -66,7 +72,7 @@ class RandomCropVideo(object):
        return F.crop(clip, i, j, h, w)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0})'.format(self.size)
        return self.__class__.__name__ + "(size={0})".format(self.size)

    @staticmethod
    def get_params(clip, output_size):

@@ -87,8 +93,8 @@ class RandomCropVideo(object):
        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw


class RandomResizedCropVideo(object):
    def __init__(
        self,

@@ -116,13 +122,17 @@ class RandomResizedCropVideo(object):
            size is (C, T, H, W)
        """
        i, j, h, w = self.get_params(clip, self.scale, self.ratio)
        return F.resized_crop(clip, i, j, h, w, self.size, self.interpolation_mode)
        return F.resized_crop(
            clip, i, j, h, w, self.size, self.interpolation_mode
        )

    def __repr__(self):
        return self.__class__.__name__ + \
            '(size={0}, interpolation_mode={1}, scale={2}, ratio={3})'.format(
        return (
            self.__class__.__name__
            + "(size={0}, interpolation_mode={1}, scale={2}, ratio={3})".format(
                self.size, self.interpolation_mode, self.scale, self.ratio
            )
        )

    @staticmethod
    def get_params(clip, scale, ratio):

@@ -187,7 +197,7 @@ class CenterCropVideo(object):
        return F.center_crop(clip, self.size)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0})'.format(self.size)
        return self.__class__.__name__ + "(size={0})".format(self.size)


class NormalizeVideo(object):

@@ -212,8 +222,12 @@ class NormalizeVideo(object):
        return F.normalize(clip, self.mean, self.std, self.inplace)

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1}, inplace={2})'.format(
            self.mean, self.std, self.inplace)
        return (
            self.__class__.__name__
            + "(mean={0}, std={1}, inplace={2})".format(
                self.mean, self.std, self.inplace
            )
        )


class ToTensorVideo(object):
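A sketch of how these video transforms are typically chained for evaluation (the constructor arguments are assumed from the class bodies above, and DEFAULT_MEAN / DEFAULT_STD are the package-level constants used elsewhere in this PR):

from torchvision.transforms import Compose

eval_tfms = Compose([
    ToTensorVideo(),               # uint8 [T, H, W, C] -> float [C, T, H, W] in [0, 1]
    ResizeVideo(128),              # assumed: rescale so the shorter side is ~128 px
    CenterCropVideo((112, 112)),
    NormalizeVideo(DEFAULT_MEAN, DEFAULT_STD),
])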

@@ -70,10 +70,10 @@ def read_classes_file(classes_filepath):
    classes = {}
    with open(classes_filepath) as class_file:
        for line in class_file:
            class_name, class_id = line.split(' ')
            class_name, class_id = line.split(" ")
            classes[class_name] = class_id.rstrip()
    return classes
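For context, a sketch of the expected classes-file format and the resulting mapping (the file contents below are hypothetical):

# classes.txt holds one "<class_name> <class_id>" pair per line, e.g.:
#   brush_hair 0
#   cartwheel 1
classes = read_classes_file("classes.txt")
print(classes)  # {'brush_hair': '0', 'cartwheel': '1'} -- ids are kept as strings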


def create_clip_file_name(row, clip_file_format="mp4"):
    """

@@ -87,7 +87,7 @@ def create_clip_file_name(row, clip_file_format="mp4"):
    :return: str.
        The output clip file name.
    """
    #video_file = ast.literal_eval(row.file_list)[0]
    # video_file = ast.literal_eval(row.file_list)[0]
    video_file = os.path.splitext(row["file_list"])[0]
    clip_id = row["# CSV_HEADER = metadata_id"]
    clip_file = "{}_{}.{}".format(video_file, clip_id, clip_file_format)

@@ -477,14 +477,14 @@ def extract_contiguous_negative_clips(

        # video_path = os.path.join(video_dir, negative_sample_file)
        video_fname = os.path.splitext(os.path.basename(video_file_path))[0]
        clip_fname = video_fname+no_action_class+str(i)
        clip_fname = video_fname + no_action_class + str(i)
        clip_subdir_fname = os.path.join(no_action_class, clip_fname)
        negative_clip_file_list.append(clip_subdir_fname)
        _extract_clip_ffmpeg(
            start_time,
            duration,
            video_file_path,
            os.path.join(negative_clip_dir, clip_fname+"."+clip_format),
            os.path.join(negative_clip_dir, clip_fname + "." + clip_format),
            ffmpeg_path,
        )

@@ -496,6 +496,7 @@ def extract_contiguous_negative_clips(
        }
    )


def extract_sampled_negative_clips(
    video_info_df,
    num_negative_samples,

@@ -548,7 +549,9 @@ def extract_sampled_negative_clips(
    clips_sampled = 0
    while clips_sampled < num_negative_samples:
        # pick random file in list of videos
        negative_sample_file = video_files[random.randint(0, len(video_files)-1)]
        negative_sample_file = video_files[
            random.randint(0, len(video_files) - 1)
        ]
        # get video duration
        duration = video_len[negative_sample_file]
        # pick random start time for clip

@@ -559,15 +562,27 @@ def extract_sampled_negative_clips(
        # check to ensure negative clip doesn't overlap a positive clip or pick another file
        if negative_sample_file in positive_intervals.keys():
            clip_positive_intervals = positive_intervals[negative_sample_file]
            if check_interval_overlaps(clip_start, clip_end, clip_positive_intervals):
            if check_interval_overlaps(
                clip_start, clip_end, clip_positive_intervals
            ):
                continue
        video_path = os.path.join(video_dir, negative_sample_file)
        video_fname = os.path.splitext(negative_sample_file)[0]
        clip_fname = video_fname+no_action_class+str(clips_sampled)
        clip_fname = video_fname + no_action_class + str(clips_sampled)
        clip_subdir_fname = os.path.join(no_action_class, clip_fname)
        _extract_clip_ffmpeg(
            clip_start, negative_clip_length, video_path, os.path.join(clip_dir, clip_subdir_fname+"."+clip_format),
            clip_start,
            negative_clip_length,
            video_path,
            os.path.join(clip_dir, clip_subdir_fname + "." + clip_format),
        )
        with open(label_filepath, 'a') as f:
            f.write("\""+clip_subdir_fname+"\""+" "+str(classes[no_action_class])+"\n")
            clips_sampled += 1
        with open(label_filepath, "a") as f:
            f.write(
                '"'
                + clip_subdir_fname
                + '"'
                + " "
                + str(classes[no_action_class])
                + "\n"
            )
        clips_sampled += 1
@@ -28,7 +28,9 @@ class Urls:
        base, "fridgeObjectsWatermarkTiny.zip"
    )
    fridge_objects_negatives_path = urljoin(base, "fridgeObjectsNegative.zip")
    fridge_objects_negatives_tiny_path = urljoin(base, "fridgeObjectsNegativeTiny.zip")
    fridge_objects_negatives_tiny_path = urljoin(
        base, "fridgeObjectsNegativeTiny.zip"
    )

    # multilabel datasets
    multilabel_fridge_objects_path = urljoin(
@@ -3,8 +3,10 @@

import os
import platform

import sys
import torch
import torch.cuda as cuda
import torchvision
from torch.cuda import current_device, get_device_name, is_available


@@ -47,6 +49,15 @@ def torch_device():
    )


def num_devices():
    """ Gets the number of devices based on cpu/gpu """
    return (
        torch.cuda.device_count()
        if torch.cuda.is_available()
        else 1
    )


def db_num_workers(non_windows_num_workers: int = 16):
    """Returns how many workers to use when loading images in a databunch. On Windows machines, using > 0 workers significantly slows down model
    training and evaluation. Setting num_workers to zero on Windows machines will speed up training/inference significantly, but will still be

@@ -58,3 +69,15 @@ def db_num_workers(non_windows_num_workers: int = 16):
        return 0
    else:
        return non_windows_num_workers


def system_info():
    print(sys.version, "\n")
    print(f"PyTorch {torch.__version__} \n")
    print(f"Torch-vision {torchvision.__version__} \n")
    print("Available devices:")
    if cuda.is_available():
        for i in range(cuda.device_count()):
            print(f"{i}: {cuda.get_device_name(i)}")
    else:
        print("CPUs only, no GPUs found")

@@ -83,3 +83,34 @@ def get_font(size: int = 12) -> ImageFont:
        font = None

    return font


class Config(object):
    def __init__(self, config=None, **extras):
        """Dictionary wrapper to access keys as attributes.

        Args:
            config (dict or Config): Configurations
            extras (kwargs): Extra configurations

        Examples:
            >>> cfg = Config({'lr': 0.01}, momentum=0.95)
            or
            >>> cfg = Config({'lr': 0.01, 'momentum': 0.95})
            then, use as follows:
            >>> print(cfg.lr, cfg.momentum)
        """
        if config is not None:
            if isinstance(config, dict):
                for k in config:
                    setattr(self, k, config[k])
            elif isinstance(config, self.__class__):
                self.__dict__ = config.__dict__.copy()
            else:
                raise ValueError("Unknown config")

        for k, v in extras.items():
            setattr(self, k, v)

    def get(self, key, default):
        return getattr(self, key, default)
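A short usage sketch for the wrapper, including the get() fallback (the values are illustrative):

cfg = Config({"lr": 0.01}, momentum=0.95)
print(cfg.lr, cfg.momentum)            # 0.01 0.95
print(cfg.get("weight_decay", 1e-4))   # falls back to the default when the key is absent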

@@ -109,7 +109,13 @@ class CocoEvaluator(object):
        labels = prediction["labels"].tolist()

        rles = [
            mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0]
            mask_util.encode(
                # Change according to the issue related to mask:
                # https://github.com/pytorch/vision/issues/1355#issuecomment-544951911
                np.array(
                    mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"
                )
            )[0]
            for mask in masks
        ]
        for rle in rles: