MS-2D-TAN
In this paper, we study the problem of moment localization with natural language and extend our previously proposed 2D-TAN method to a multi-scale version. The core idea is to retrieve a moment from two-dimensional temporal maps at different temporal scales, which treat adjacent moment candidates as temporal context. The extended version is capable of encoding adjacent temporal relations at different scales, while learning discriminative features for matching video moments with referring expressions. Our model is simple in design and achieves competitive performance in comparison with state-of-the-art methods on three benchmark datasets.
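To make the idea of multi-scale 2D temporal maps concrete, here is a minimal sketch (not the repository's implementation; `build_2d_map`, the max-pooling choice, and the scale set {16, 32, 64} are illustrative assumptions). Each scale resamples the clip features to a coarser temporal axis and pools them over every (start, end) span, so one 2D map of moment candidates is produced per scale, with adjacent cells acting as each other's temporal context.

```python
import torch

def build_2d_map(clip_feats: torch.Tensor, num_clips: int) -> torch.Tensor:
    """Pool clip features into a (num_clips, num_clips, D) map of moment candidates."""
    T, D = clip_feats.shape
    # Resample the T clips onto a num_clips-long axis for this temporal scale.
    idx = torch.linspace(0, T - 1, num_clips).round().long()
    feats = clip_feats[idx]
    fmap = torch.zeros(num_clips, num_clips, D)
    # Entry (i, j) holds the pooled feature of the moment spanning clips i..j (j >= i).
    for i in range(num_clips):
        for j in range(i, num_clips):
            fmap[i, j] = feats[i:j + 1].max(dim=0).values
    return fmap

# One 2D map per temporal scale; coarser scales cover longer moments per cell.
clip_feats = torch.randn(64, 512)   # e.g. 64 clips of pre-extracted video features
maps = {n: build_2d_map(clip_feats, n) for n in (16, 32, 64)}
```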
Framework
Main Results
Main results on Charades-STA
Feature | Rank1@0.5 | Rank1@0.7 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|
VGG | 45.65 | 27.20 | 85.91 | 57.61 |
C3D | 41.10 | 23.25 | 81.53 | 48.55 |
I3D | 56.64 | 36.21 | 89.14 | 61.13 |
I3D* | 60.08 | 37.39 | 89.06 | 59.17 |
(I3D* represents I3D features finetuned on Charades)
Main results on ActivityNet Captions
Feature | Rank1@0.3 | Rank1@0.5 | Rank1@0.7 | Rank5@0.3 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|---|---|
C3D | 61.04 | 46.16 | 29.21 | 87.31 | 78.80 | 60.85 |
I3D | 62.09 | 45.50 | 28.28 | 87.61 | 79.36 | 61.70 |
Main results on TACoS
Feature | Rank1@0.1 | Rank1@0.3 | Rank1@0.5 | Rank1@0.7 | Rank5@0.1 | Rank5@0.3 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|---|---|---|---|
VGG | 50.64 | 43.31 | 35.27 | 23.54 | 78.31 | 66.18 | 55.81 | 38.09 |
C3D | 49.24 | 41.74 | 34.29 | 21.54 | 78.33 | 67.01 | 56.76 | 36.84 |
I3D | 48.66 | 41.96 | 33.59 | 22.14 | 75.96 | 64.93 | 53.44 | 36.12 |
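In the tables above, Rank n@m denotes the percentage of language queries for which at least one of the top-n predicted moments has a temporal IoU of at least m with the ground-truth moment. Below is a minimal sketch of the metric for a single query; the segment values are made up for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def rank_n_at_m(top_moments, gt, n, m):
    """1.0 if any of the top-n predictions reaches tIoU >= m, else 0.0 (averaged over queries)."""
    return float(any(temporal_iou(p, gt) >= m for p in top_moments[:n]))

# Example: Rank5@0.5 for a single query with ground truth (5.0, 10.0) seconds
preds = [(4.2, 9.8), (12.0, 18.0), (0.0, 3.5), (6.0, 14.0), (20.0, 25.0)]
print(rank_n_at_m(preds, gt=(5.0, 10.0), n=5, m=0.5))  # -> 1.0
```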
Prerequisites
- PyTorch 1.4.0
- Python 3.7
- torchtext
- easydict
- terminaltables
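A quick way to check that the environment matches these requirements is a small script like the following (a hypothetical helper, not part of the repository):

```python
import sys
import torch, torchtext, easydict, terminaltables  # the packages listed above

print("python:", sys.version.split()[0])    # expected 3.7.x
print("torch:", torch.__version__)          # expected 1.4.0
print("torchtext:", torchtext.__version__)
```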
Quick Start
Download Datasets
Please download the data from Box, Dropbox, or Baidu and save it to the data folder.
Training
Run the following commands for training:
Table 1 (Charades-STA)
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-VGG.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-C3D.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-I3D.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-I3D-Finetuned.yaml --verbose --tag base
Table 2 (ActivityNet Captions)
python moment_localization/run.py --cfg experiments/activitynet/MS-2D-TAN-G-C3D.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/activitynet/MS-2D-TAN-G-I3D.yaml --verbose --tag base
Table 3 (TACoS)
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-VGG.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-C3D.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-I3D.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-C3D-H512N512K5A8k9L2.yaml --verbose --tag base
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-I3D-H512N512K5A8k9L2.yaml --verbose --tag base
Evaluation
Download all the trained models from Box, Dropbox, or Baidu and save them to the release_checkpoints folder.
Then, run the following commands to evaluate our trained models:
Table 1 (Charades-STA)
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-VGG.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-C3D.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-I3D.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/charades/MS-2D-TAN-G-I3D-Finetuned.yaml --verbose --split test --mode test
Table 2 (ActivityNet Captions)
python moment_localization/run.py --cfg experiments/activitynet/MS-2D-TAN-G-C3D.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/activitynet/MS-2D-TAN-G-I3D.yaml --verbose --split test --mode test
Table 3 (TACoS)
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-VGG.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-C3D.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-I3D.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-C3D-H512N512K5A8k9L2.yaml --verbose --split test --mode test
python moment_localization/run.py --cfg experiments/tacos/MS-2D-TAN-G-I3D-H512N512K5A8k9L2.yaml --verbose --split test --mode test
Citation
If any part of our paper or code is helpful to your work, please cite it with:
@Article{Zhang2021MS2DTAN,
author = {Zhang, Songyang and Peng, Houwen and Fu, Jianlong and Lu, Yijuan and Luo, Jiebo},
title = {Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language},
journal = {TPAMI},
year = {2021}
}