Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
nlp
multimodal
beit
beit-3
deepnet
document-ai
foundation-models
kosmos
kosmos-1
layoutlm
layoutxlm
llm
minilm
mllm
pre-trained-model
textdiffuser
trocr
unilm
xlm-e
Updated 2024-11-09 13:45:59 +03:00
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
video
localization
segmentation
caption-task
coin
joint
msrvtt
multimodal-sentiment-analysis
multimodality
pretrain
pretraining
retrieval-task
video-language
video-text
video-text-retrieval
youcookii
alignment
caption
Updated 2024-07-25 14:07:31 +03:00
Multitask Multilingual Multimodal Pre-training
Updated 2021-05-13 09:56:36 +03:00