Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
nlp
multimodal
beit
beit-3
deepnet
document-ai
foundation-models
kosmos
kosmos-1
layoutlm
layoutxlm
llm
minilm
mllm
pre-trained-model
textdiffuser
trocr
unilm
xlm-e
Updated 2024-11-09 13:45:59 +03:00
Platform for Situated Intelligence
artificial-intelligence
perception
framework
component-library
streaming
stream-processing
human-robot-interaction
multimodal
multimodal-interactions
pipelines
Updated 2024-11-08 21:16:09 +03:00
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
video
localization
segmentation
caption-task
coin
joint
msrvtt
multimodal-sentiment-analysis
multimodality
pretrain
pretraining
retrieval-task
video-language
video-text
video-text-retrieval
youcookii
alignment
caption
Updated 2024-07-25 14:07:31 +03:00
Updated 2024-06-05 21:41:15 +03:00
Updated 2022-09-29 22:46:51 +03:00
Multitask Multilingual Multimodal Pre-training
Updated 2021-05-13 09:56:36 +03:00