Update fine-tuned model weights.

This commit is contained in:
SivilTaram 2021-08-29 17:50:20 +08:00
Parent 85be38dca8
Commit 1b2514327f
4 changed files with 39 additions and 23 deletions

View file

@ -9,7 +9,7 @@ The official repository which contains the code and pre-trained models for our p
# 🏴 Overview
## 📝 Paper
In the paper, we present T<span class="span-small">A</span>PE<span class="span-small">X</span> (for **Ta**ble **P**re-training via **Ex**ecution), a conceptually simple and empirically powerful pre-training approach to empower existing generative pre-trained models (e.g., [BART](https://arxiv.org/abs/1910.13461) in our paper) with table reasoning skills.
T<span class="span-small">A</span>PE<span class="span-small">X</span> realizes table pre-training by **learning a neural SQL executor over a synthetic corpus**, which is obtained by automatically synthesizing executable SQL queries.
@ -30,7 +30,7 @@ We believe that if a model can be trained to faithfully *execute* SQL queries, t
Meanwhile, since the diversity of SQL queries can be guaranteed systematically, a *diverse* and *high-quality* pre-training corpus can be automatically synthesized for T<span class="span-small">A</span>PE<span class="span-small">X</span>.
## 💻 Project
This project contains two parts: the `tapex` library, and `examples` that employ it on different table-related applications (e.g., Table Question Answering).
@ -97,18 +97,11 @@ Below is an example from the pre-training corpus:
- The SQL plus flattened Table as **INPUT**:
```
select ( select number where number = 4 ) - ( select number where number = 3 )
col : number | date | name | age (at execution) | age (at offense) | race | state | method
row 1 : 1 | november 2, 1984 | velma margie barfield | 52 | 45 | white | north carolina | lethal injection
row 2 : 2 | february 3, 1998 | karla faye tucker | 38 | 23 | white | texas | lethal injection
row 3 : 3 | march 30, 1998 | judias v. buenoano | 54 | 28 | white | florida | electrocution
row 4 : 4 | february 24, 2000 | betty lou beets | 62 | 46 | white | texas | lethal injection
row 5 : 5 | may 2, 2000 | christina marie riggs | 28 | 26 | white | arkansas | lethal injection
row 6 : 6 | january 11, 2001 | wanda jean allen | 41 | 29 | black | oklahoma | lethal injection
row 7 : 7 | may 1, 2001 | marilyn kay plantz | 40 | 27 | white | oklahoma | lethal injection
row 8 : 8 | december 4, 2001 | lois nadean smith | 61 | 41 | white | oklahoma | lethal injection
row 9 : 9 | may 10, 2002 | lynda lyon block | 54 | 45 | white | alabama | electrocution
row 10 : 10 | october 9, 2002 | aileen carol wuornos | 46 | 33 | white | florida | lethal injection
row 11 : 11 | september 14, 2005 | frances elaine newton | 40 | 21 | black | texas | lethal injection
row 12 : 12 | september 23, 2010 | teresa wilson bean lewis | 41 | 33 | white | virginia | lethal injection
row 13 : 13 | june 26, 2013 | kimberly lagayle mccarthy | 52 | 36 | black | texas | lethal injection
row 14 : 14 | february 5, 2014 | suzanne margaret basso | 59 | 44 | white | texas | lethal injection
```
- The SQL Execution Result as **OUTPUT**:
```
1.0
```
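To make the input/output pairing concrete, here is a minimal sketch of how such a pre-training pair could be assembled: flatten the table into the `col : ... row 1 : ...` format shown above, and execute the SQL to obtain the target. This is our illustration, not the repo's actual synthesis pipeline; note that the corpus SQL omits the `FROM` clause, while a standard engine such as sqlite needs one.
```python
import sqlite3

def flatten(header, rows):
    # linearize a table into the "col : ... row i : ..." format shown above
    text = "col : " + " | ".join(header)
    for i, row in enumerate(rows, start=1):
        text += f" row {i} : " + " | ".join(str(v) for v in row)
    return text

header = ["number", "name"]
rows = [(1, "velma margie barfield"), (2, "karla faye tucker")]

# execute the query to get the OUTPUT side of the pair
conn = sqlite3.connect(":memory:")
conn.execute("create table t (number int, name text)")
conn.executemany("insert into t values (?, ?)", rows)
result = conn.execute("select number from t where name = 'karla faye tucker'").fetchall()

pretrain_input = "select number where name = 'karla faye tucker' " + flatten(header, rows)
pretrain_output = " ".join(str(v) for row in result for v in row)  # -> "2"
```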
Here we want to acknowledge the huge effort of the paper [On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries](https://arxiv.org/pdf/2010.11246.pdf), which provides the rich resource of SQL templates that we use to synthesize the pre-training corpus.
@ -121,7 +114,23 @@ Model | Description | # params | Download
`tapex.base` | 6 encoder and decoder layers | 140M | [tapex.base.tar.gz](https://github.com/microsoft/Table-Pretraining/releases/download/v1.0/tapex.base.tar.gz)
`tapex.large` | 12 encoder and decoder layers | 400M | [tapex.large.tar.gz](https://github.com/microsoft/Table-Pretraining/releases/download/v1.0/tapex.large.tar.gz)
> More pre-trained models will be uploaded soon!
## Fine-tuned Models
We provide fine-tuned model weights and their performance on different datasets below. The Accuracy (Acc) below refers to denotation accuracy computed by our script `model_eval.py`. Meanwhile, it is worth noting that we need to truncate long tables during preprocessing, which involves some randomness. Therefore, we also provide the preprocessed datasets for reproducing our experimental results.
Model | Dev Acc | Test Acc | Data | Download
---|---|----|----|----
`tapex.large.wtq` | 58.0 | 57.2 | WikiTableQuestions | [data](https://github.com/microsoft/Table-Pretraining/releases/download/preprocessed-data/wtq.preprocessed.zip) [model](https://github.com/microsoft/Table-Pretraining/releases/download/fine-tuned-model/tapex.large.wtq.tar.gz)
`tapex.large.sqa` | 70.7 | 74.0 | SQA | [data](https://github.com/microsoft/Table-Pretraining/releases/download/preprocessed-data/sqa.preprocessed.zip) [model](https://github.com/microsoft/Table-Pretraining/releases/download/fine-tuned-model/tapex.large.sqa.tar.gz)
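To reproduce these numbers or run the prediction example below, first fetch and unpack one of the archives from the table; for instance (we assume the archive unpacks into a `./tapex.large.wtq` folder containing `model.pt`, which the command below expects):
```shell
$ wget https://github.com/microsoft/Table-Pretraining/releases/download/fine-tuned-model/tapex.large.wtq.tar.gz
$ tar -xzf tapex.large.wtq.tar.gz
```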
Given these fine-tuned model weights, you can play with them using the `predict` mode in `examples/tableqa/run_model.py`.
For example, you can use the following command and see its log:
```shell
$ python examples/tableqa/run_model.py predict --resource-dir ./tapex.large.wtq --checkpoint-name model.pt
2021-08-29 17:39:47 | INFO | __main__ | Receive question as : Greece held its last Summer Olympics in which year?
2021-08-29 17:39:47 | INFO | __main__ | The answer should be : 2004
```
# 💬 Citation

View file

@ -30,9 +30,15 @@ After one dataset is prepared, you can run the `tableqa/run_model.py` script to
### 🍳 Train
To train a model, you could simply run the following command, where:
- `<dataset_dir>` refers to the directory which contains a `bin` folder, such as `dataset/wikisql/tapex.base`
- `<model_path>` refers to a pre-trained model path, such as `tapex.base/model.pt`
- `<model_arch>` is a pre-defined model architecture in fairseq, such as `bart_base`
**HINT**:
- For `tapex.base` or `tapex.large`, `<model_arch>` should be `bart_base` or `bart_large` respectively.
- We would like to draw the readers' attention to the fact that the `accuracy` metric during training is the token-level accuracy defined in fairseq, not the denotation accuracy described below. Therefore, `checkpoint_best.pt` is not always the best checkpoint in terms of denotation accuracy. We recommend evaluating all checkpoints with the evaluation command below to determine the best one (a checkpoint-sweep sketch follows that command).
```shell
$ python run_model.py train --dataset-dir <dataset_dir> --model-path <model_path> --model-arch <model_arch>
```
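For instance, with the example paths above, fine-tuning `tapex.base` on WikiSQL would look like the following (the paths are illustrative; point them at wherever your data and pre-trained model actually live):
```shell
$ python run_model.py train --dataset-dir dataset/wikisql/tapex.base \
                            --model-path tapex.base/model.pt \
                            --model-arch bart_base
```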
@ -65,10 +71,14 @@ A full list of training arguments can be seen as below:
### 🍪 Evaluate
Once the model is fine-tuned, we can evaluate it by running the following command, where:
- `<dataset_dir>` refers to the directory which contains a `bin` folder, such as `dataset/wikisql/tapex.base`
- `<model_path>` refers to a fine-tuned model path, such as `checkpoints/checkpoint_best.pt`
- `<sub_dir>` refers to `valid` or `test`, for the validation set or the test set respectively
- `<predict_dir>` is the directory used to save the evaluation results, which indicate the correctness of each sample, such as `predict_wikisql`
```shell
$ python run_model.py eval --dataset-dir <dataset_dir> --model-path <model_path> --sub-dir <sub_dir> --predict-dir <predict_dir>
```
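Since `checkpoint_best.pt` is selected by token-level accuracy (see the HINT above), one way to pick the best checkpoint by denotation accuracy is to evaluate every saved checkpoint in turn. A minimal sketch, assuming checkpoints are saved under `checkpoints/` with fairseq's default naming and using the example paths from above:
```shell
# Illustrative sweep: evaluate each checkpoint on the validation set and
# keep the one with the highest printed denotation accuracy.
for ckpt in checkpoints/checkpoint*.pt; do
    echo "evaluating ${ckpt}"
    python run_model.py eval --dataset-dir dataset/wikisql/tapex.base \
        --model-path "${ckpt}" --sub-dir valid --predict-dir predict_wikisql
done
```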
A full list of evaluation arguments can be seen below:

View file

@ -53,21 +53,14 @@ def evaluate(data: List, target_delimiter: str):
    correct_num = 0
    correct_arr = []
    # bucket accuracy by the number of <chunk> pieces in the flattened input
    chunk_level_correct = defaultdict(lambda: {
        "total": 0,
        "correct": 0
    })
    total = len(data)
    for example in data:
        predict_str, ground_str, source_str, predict_id = example
        chunk_number = source_str.count("<chunk>")
        is_correct = evaluate_example(predict_str, ground_str)
        if is_correct:
            correct_num += 1
            chunk_level_correct[chunk_number]["correct"] += 1
        correct_arr.append(is_correct)
        chunk_level_correct[chunk_number]["total"] += 1
    print("Correct / Total : {} / {}, Denotation Accuracy : {:.3f}".format(correct_num, total, correct_num / total))
    return correct_arr
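For reference, here is a hypothetical call of the function above; the tuple layout mirrors the unpacking in the loop, while `evaluate_example` and the meaning of `target_delimiter` come from elsewhere in `model_eval.py` (the sample strings and delimiter are our assumptions):
```python
if __name__ == "__main__":
    # (prediction, gold answer, flattened source with <chunk> pieces, example id)
    data = [
        ("2004", "2004", "question ... <chunk> col : year | host city ...", 0),
        ("1996", "2004", "question ... <chunk> col : year | host city ...", 1),
    ]
    # prints: Correct / Total : 1 / 2, Denotation Accuracy : 0.500
    evaluate(data, target_delimiter=", ")
```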

View file

@ -11,6 +11,9 @@ from tapex.processor.table_linearize import TableLinearize
logger = logging.getLogger(__name__)
# truncation randomly drops rows, so fix the seed for reproducibility
random.seed(42)
class TableTruncate(ABC):
@ -79,6 +82,7 @@ class RowDeleteTruncate(TableTruncate):
    The row deleting principle is straightforward: randomly delete rows until the table fits into memory,
    but do not make it too small (e.g., just below the limit is fine).
    """

    def __init__(self, table_linearize: TableLinearize, **kwargs):
        super().__init__(**kwargs)
        self.table_linearize = table_linearize
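To make the docstring concrete, here is a minimal sketch of the row-deleting principle; this is our illustration, not the actual `RowDeleteTruncate` logic (`linearize` and `max_tokens` are assumed stand-ins for the table linearizer and the memory limit, and whitespace splitting stands in for real tokenization):
```python
import random

def delete_rows_to_fit(rows, linearize, max_tokens):
    """Randomly drop rows until the linearized table fits the token budget."""
    rows = list(rows)
    while rows and len(linearize(rows).split()) > max_tokens:
        # random.seed(42) above makes the dropped rows reproducible
        rows.pop(random.randrange(len(rows)))
    return rows
```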