ChunyuanLI 2023-04-17 22:56:43 -07:00 committed by GitHub
Parent 4788a7425c
Commit 4a29ea1763
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file with 3 additions and 0 deletions


@@ -1,6 +1,9 @@
# Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks <img src="docs/oscar_logo.png" width="200" align="right">
# VinVL: Revisiting Visual Representations in Vision-Language Models
## Updates
04/17/2023: Visual instruction tuning with GPT-4 is released! Please check out the multimodal model LLaVA: [[Project Page](https://llava-vl.github.io/)] [[Paper](https://arxiv.org/abs/2304.08485)] [[Demo](https://llava.hliu.cc/)] [[Data](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)] [[Model](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v0)] <br/>
05/28/2020: Released finetuned models for downstream tasks; please check [MODEL_ZOO.md](MODEL_ZOO.md). <br/>
05/15/2020: Released pretrained models, datasets, and code for finetuning on downstream tasks. <br/>
01/13/2021: Our new work [VinVL](https://arxiv.org/abs/2101.00529) proposed OSCAR+, an improved version of OSCAR, and provided a better object-attribute detection model for extracting features for V+L tasks. VinVL achieved SOTA performance on all seven V+L tasks covered here. Please stay tuned for the model and code release. <br/>