An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Updated 2024-07-25 14:07:31 +03:00
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
Updated 2023-05-23 01:20:31 +03:00