История

ZubinGou 5cf70eb41c svamp and tabmwp prompt		2024-04-28 15:27:01 +08:00
..
data	🎉 upload code and data	2023-05-23 00:09:07 +08:00
images	🎉 upload code and data	2023-05-23 00:09:07 +08:00
prompts	svamp and tabmwp prompt	2024-04-28 15:27:01 +08:00
scripts	🎉 upload code and data	2023-05-23 00:09:07 +08:00
src	FIX: dataset loader	2023-08-19 15:48:40 +08:00
.gitignore	🎉 upload code and data	2023-05-23 00:09:07 +08:00
README.md	svamp and tabmwp prompt	2024-04-28 15:27:01 +08:00
requirements.txt	🎉 upload code and data	2023-05-23 00:09:07 +08:00

README.md

🤔🛠️🤖 CRITIC:
Large Language Models Can Self-Correct
with Tool-Interactive Critiquing

[Quick Start] • [Paper] • [Citation]

Repo for the paper "CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing" [ICLR'24]

🔥 News

In our updated paper, we add CRITIC results of open-source LLaMA-2 models from 7B to 70B on three QA / commonsense reasoning tasks, and three mathematical code synthesis tasks.

💡 Introduction

For the first time, we find that LLMs' Self-Verification and Self-Correction are unreliable; and we propose CRITIC, which enables LLMs to validate and rectify themselves through interaction with external tools.

Humans typically utilize external tools to cross-check and reﬁne their initial content, like using a search engine for fact-checking, or a code interpreter for debugging. Inspired by this observation, we introduce a framework called CRITIC that allows LLMs, which are essentially “black boxes” to validate and progressively amend their own outputs in a manner similar to human interaction with tools.

💬 Examples

🛠️ Setup

We recommend the use of conda environments:

conda create --name critic python=3.8
conda activate critic
pip install -r requirements.txt

Configure APIs:

Configure the LLMs API in src/llms/api.py.
For truthfulness evaluation and fact correction, configure the Google Search API in src/tools/config.py.
For toxicity reduction, you can follow this tutorial and configure Perspective API in src/tools/config.py.

🔥🔥 For an alternative to Google API, try our free web scraping tools available at LLM-Agent-Web-Tools.

🚀 Quick Start

We provide example bash scripts for each task as follows:

Free-from Question Answering (Google)

Inference: bash scripts/run_qa_infer.sh
CRITIC: bash scripts/run_qa_critic.sh
Evaluation: python -m src.qa.evaluate

Mathematical Program Synthesis (Python Interpreter)

Inference: bash scripts/run_program_infer.sh
CRITIC: bash scripts/run_program_critic.sh
Evaluation: python -m src.program.evaluate

Toxicity Reduction (Perpective API)

Inference: bash scripts/run_toxicity_infer.sh
CRITIC: bash scripts/run_toxicity_critic.sh
Evaluation: python -m src.toxicity.evaluate

🎯 Results

Example results with gpt-3.5-turbo:

Free-from Question Answering:

Mathematical Program Synthesis:

Toxicity Reduction:

☕️ Citation

@inproceedings{
    gou2024critic,
    title={{CRITIC}: Large Language Models Can Self-Correct with Tool-Interactive Critiquing},
    author={Zhibin Gou and Zhihong Shao and Yeyun Gong and yelong shen and Yujiu Yang and Nan Duan and Weizhu Chen},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=Sx038qxjek}
}

README.md Убрать экранирование Экранировать

🤔🛠️🤖 CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing