An algorithm for cross-domain NL2SQL

JiaqiGuo 72df5c876f Fix bug in measurement		2019-12-07 11:24:42 +09:00
preprocess	add files	2019-11-01 10:26:20 +08:00
src	Fix bug in measurement	2019-12-07 02:34:01 +09:00
.gitignore	Initial commit	2019-11-01 01:10:43 +00:00
CODE_OF_CONDUCT.md	Initial CODE_OF_CONDUCT.md commit	2019-10-31 18:10:47 -07:00
LICENSE	Initial LICENSE commit	2019-10-31 18:10:47 -07:00
NOTICE	upload notice file	2019-11-01 10:31:54 +08:00
README.md	Update README.md	2019-12-05 09:23:23 +08:00
SECURITY.md	Initial SECURITY.md commit	2019-10-31 18:10:49 -07:00
__init__.py	add files	2019-11-01 10:26:20 +08:00
eval.py	Fix bug in measurement	2019-12-07 02:34:01 +09:00
eval.sh	fix file format	2019-11-11 21:57:11 +08:00
requirements.txt	add files	2019-11-01 10:26:20 +08:00
sem2SQL.py	Generate ground truth for evaluation	2019-11-12 10:47:15 +08:00
train.py	Fix bug in measurement	2019-12-07 11:24:42 +09:00
train.sh	fix file format	2019-11-11 21:57:11 +08:00

README.md

IRNet

Code for our ACL'19 accepted paper: Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Environment Setup

Python 3.6
PyTorch 0.4.0 or higher

Once Python and PyTorch are set up, install the Python dependencies via pip install -r requirements.txt.
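Before installing the remaining dependencies, it can help to confirm that the interpreter and PyTorch meet the versions listed above. The script below is a minimal sketch and not part of this repository: the helper names (parse_version, at_least, check_environment) are hypothetical, and it assumes that Python versions newer than 3.6 are also acceptable, which the list above does not state.

```python
import sys


def parse_version(text):
    """Turn '1.2.0+cu118' or '0.4.1.post2' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(found, required):
    """True if version string `found` is numerically >= `required`."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '0.4' compares equal to '0.4.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    # Assumption: anything from Python 3.6 upwards is treated as acceptable.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 expected, found %d.%d" % sys.version_info[:2])
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not at_least(torch.__version__, "0.4.0"):
            problems.append("PyTorch 0.4.0 or higher expected, found " + torch.__version__)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

The check only compares version numbers; it does not verify that the packages in requirements.txt are installed.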

Running Code

Data preparation

Download the GloVe embeddings and put glove.42B.300d in the ./data/ directory
Download the pretrained IRNet model and put IRNet_pretrained.model in the ./saved_model/ directory
Download the preprocessed train/dev datasets from here and put train.json, dev.json and tables.json in the ./data/ directory
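A quick way to confirm that the downloads landed in the right places is to check for each file named in the list above. This is a hedged sketch rather than a script shipped with the repository: the function name missing_files and the EXPECTED table are hypothetical, and the GloVe entry is matched with a trailing wildcard because the list above does not give a file extension.

```python
import glob
import os

# File names come from the data preparation list. A pattern ends in "*"
# where the README does not spell out a file extension (assumption).
EXPECTED = {
    "data": ["glove.42B.300d*", "train.json", "dev.json", "tables.json"],
    "saved_model": ["IRNet_pretrained.model"],
}


def missing_files(root="."):
    """Return the 'folder/pattern' entries under `root` that match no file."""
    missing = []
    for folder, patterns in EXPECTED.items():
        for pattern in patterns:
            if not glob.glob(os.path.join(root, folder, pattern)):
                missing.append(folder + "/" + pattern)
    return missing


if __name__ == "__main__":
    gaps = missing_files(".")
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All expected files are present.")
```

Run it from the repository root; it only checks that the files exist, not that their contents are valid.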

Generating train/dev data by yourself

You can also preprocess the original Spider data yourself. Download it, put train.json, dev.json and tables.json in the ./data/ directory, and follow the instructions in ./preprocess/

Training

Run train.sh to train IRNet.

sh train.sh [GPU_ID] [SAVE_FOLD]

Testing

Run eval.sh to evaluate IRNet.

sh eval.sh [GPU_ID] [OUTPUT_FOLD]
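Both train.sh and eval.sh take two positional arguments: a GPU id and a folder. When driving them from Python (for example, to queue several runs), a small wrapper along the following lines may be convenient. It is an illustrative sketch only: build_command and run are hypothetical helpers, and the GPU id and folder names in the example are placeholders, not values prescribed by the repository.

```python
import subprocess


def build_command(script, gpu_id, folder):
    """Return the argument list for `sh <script> <gpu_id> <folder>`."""
    if script not in ("train.sh", "eval.sh"):
        raise ValueError("expected train.sh or eval.sh, got %r" % script)
    return ["sh", script, str(gpu_id), str(folder)]


def run(script, gpu_id, folder):
    """Run the script from the repository root; raise if it exits non-zero."""
    subprocess.run(build_command(script, gpu_id, folder), check=True)


if __name__ == "__main__":
    # Hypothetical values: GPU 0 and folder names chosen for illustration only.
    print(" ".join(build_command("train.sh", 0, "./saved_model/run1")))
    print(" ".join(build_command("eval.sh", 0, "./predictions")))
```

Running the file directly only prints the two command lines; call run(...) from the repository root to actually start training or evaluation.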

Evaluation

You can follow the general evaluation process described on the Spider page.

Results

Model	Dev Exact Set Match Accuracy	Test Exact Set Match Accuracy
IRNet	53.2	46.7
IRNet+BERT(base)	61.9	54.7

Citation

If you use IRNet, please cite the following work.

@inproceedings{GuoIRNet2019,
  author={Jiaqi Guo and Zecheng Zhan and Yan Gao and Yan Xiao and Jian-Guang Lou and Ting Liu and Dongmei Zhang},
  title={Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2019},
  organization={Association for Computational Linguistics}
}

Thanks

We would like to thank Tao Yu and Bo Pang for running evaluations on our submitted models. We are also grateful to the flexible semantic parser TranX, which inspired our work.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.