An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
video
localization
segmentation
caption-task
coin
joint
msrvtt
multimodal-sentiment-analysis
multimodality
pretrain
pretraining
retrieval-task
video-language
video-text
video-text-retrieval
youcookii
alignment
caption
Updated 2024-07-25 14:07:31 +03:00
Oscar and VinVL
Updated 2023-08-28 04:34:58 +03:00
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
Updated 2023-05-23 01:20:31 +03:00