An algorithm for cross-domain NL2SQL

IRNet

Code for our ACL'19 accepted paper: Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Environment Setup

  • Python 3.6
  • PyTorch 0.4.0 or higher

Once Python and PyTorch are set up, install the Python dependencies with pip install -r requirements.txt.
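
A quick environment check can save a failed run later. The sketch below is a hypothetical helper, not part of the repository. It assumes Python 3.6 is a minimum (the README only names 3.6, so newer interpreters may or may not work with this codebase) and compares the installed PyTorch version against 0.4.0.

```python
import sys


def parse_version(text):
    """Turn strings like '0.4.0' or '1.3.1+cu92' into tuples like (1, 3, 1)."""
    core = text.split("+")[0]  # drop local build tags such as '+cu92'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep only leading digits, e.g. '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # pad with zeros so '0.4' compares equal to '0.4.0'
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Report whether Python and PyTorch meet the README versions (assumed minimums)."""
    report = {"python>=3.6": sys.version_info[:2] >= (3, 6)}
    try:
        import torch
        report["torch>=0.4.0"] = meets_minimum(torch.__version__, "0.4.0")
    except ImportError:
        report["torch>=0.4.0"] = False  # PyTorch is not installed
    return report


if __name__ == "__main__":
    print(check_environment())
```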

Running Code

Data preparation

  • Download the GloVe embeddings and put glove.42B.300d under the ./data/ directory.
  • Download the pretrained IRNet model and put IRNet_pretrained.model under the ./saved_model/ directory.
  • Download the preprocessed train/dev datasets from here and put train.json, dev.json and tables.json under the ./data/ directory.
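
Before training or evaluating, it may help to confirm that the files landed where the list above expects them. This is a minimal sketch, not a repository script. The folder and file names come from the list; because the README gives glove.42B.300d without an extension, the check accepts any entry in ./data/ whose name starts with that prefix.

```python
from pathlib import Path

# Folder and file names taken from the data-preparation list.
REQUIRED = {
    "data": ["train.json", "dev.json", "tables.json"],
    "saved_model": ["IRNet_pretrained.model"],
}
GLOVE_PREFIX = "glove.42B.300d"  # the README gives no file extension


def missing_files(root="."):
    """Return the expected paths (relative to root) that are not present."""
    root = Path(root)
    missing = []
    for folder, names in REQUIRED.items():
        for name in names:
            if not (root / folder / name).exists():
                missing.append(f"{folder}/{name}")
    data_dir = root / "data"
    has_glove = data_dir.is_dir() and any(
        p.name.startswith(GLOVE_PREFIX) for p in data_dir.iterdir()
    )
    if not has_glove:
        missing.append(f"data/{GLOVE_PREFIX}*")
    return missing


if __name__ == "__main__":
    gaps = missing_files()
    print("all data in place" if not gaps else "missing: " + ", ".join(gaps))
```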

Generating train/dev data yourself

You can also preprocess the original Spider data yourself. Download it, put train.json, dev.json and tables.json under the ./data/ directory, and follow the instructions in ./preprocess/.
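
If you preprocess Spider yourself, it can be useful to inspect tables.json first. The sketch below is not part of IRNet's ./preprocess/ pipeline; it assumes the standard Spider layout, in which each database entry has db_id, table_names_original and column_names_original (a list of [table_index, column_name] pairs, with index -1 for the special * column). The function name load_schemas is hypothetical.

```python
import json


def load_schemas(path):
    """Map each db_id to {table_name: [column names]} from a Spider-style tables.json."""
    with open(path, encoding="utf-8") as f:
        databases = json.load(f)
    schemas = {}
    for db in databases:
        tables = db["table_names_original"]
        columns = {name: [] for name in tables}  # keep tables even if they have no columns
        for table_idx, col_name in db["column_names_original"]:
            if table_idx < 0:  # -1 marks the special '*' column
                continue
            columns[tables[table_idx]].append(col_name)
        schemas[db["db_id"]] = columns
    return schemas


if __name__ == "__main__":
    for db_id, tables in load_schemas("data/tables.json").items():
        print(db_id, {t: len(cols) for t, cols in tables.items()})
```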

Training

Run train.sh to train IRNet.

sh train.sh [GPU_ID] [SAVE_FOLD]

Testing

Run eval.sh to evaluate IRNet.

sh eval.sh [GPU_ID] [OUTPUT_FOLD]
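
Both train.sh and eval.sh take a GPU id and a folder as positional arguments. If you drive them from Python, for example to queue several runs, a thin wrapper along these lines might be used. build_command and run are hypothetical helpers, and the folder name in the example is an assumption. By default the wrapper only returns the command string, so nothing is launched by accident.

```python
import shlex
import subprocess


def build_command(script, gpu_id, folder):
    """Return argv for 'sh <script> [GPU_ID] [FOLDER]' as described in the README."""
    if script not in ("train.sh", "eval.sh"):
        raise ValueError("script must be 'train.sh' or 'eval.sh'")
    return ["sh", script, str(gpu_id), str(folder)]


def run(script, gpu_id, folder, dry_run=True):
    """Return the quoted command when dry_run is True; otherwise execute it."""
    cmd = build_command(script, gpu_id, folder)
    if dry_run:
        return " ".join(shlex.quote(part) for part in cmd)
    # check=True raises CalledProcessError if the script exits non-zero
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    print(run("train.sh", 0, "saved_model/my_run"))  # hypothetical folder name
```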

Evaluation

You can follow the general evaluation process described on the Spider page.
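
Exact set match treats each SQL clause as an unordered set of components, so a prediction that lists the same columns in a different order still counts as correct. The sketch below is a deliberately simplified illustration of that idea on already-parsed components. It is not the official Spider evaluator, which also parses the SQL, handles nested queries and ignores literal values; the clause names and item encoding here are assumptions.

```python
def exact_set_match(pred, gold):
    """Simplified exact set match on pre-parsed SQL components.

    `pred` and `gold` map a clause name (e.g. "select", "where") to a list of
    hashable items. Order inside a clause is ignored by comparing sets, and a
    missing clause is treated like an empty one.
    """
    clauses = set(pred) | set(gold)
    return all(set(pred.get(c, [])) == set(gold.get(c, [])) for c in clauses)


def accuracy(pairs):
    """Percentage of (pred, gold) pairs whose components match exactly."""
    if not pairs:
        return 0.0
    hits = sum(exact_set_match(p, g) for p, g in pairs)
    return 100.0 * hits / len(pairs)


if __name__ == "__main__":
    gold = {"select": ["name", "age"], "where": [("age", ">")]}
    pred = {"select": ["age", "name"], "where": [("age", ">")]}  # same columns, new order
    print(exact_set_match(pred, gold))  # True
    print(accuracy([(pred, gold), ({"select": ["name"]}, gold)]))  # 50.0
```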

Results

| Model | Dev Exact Set Match Accuracy (%) | Test Exact Set Match Accuracy (%) |
|---|---|---|
| IRNet | 53.2 | 46.7 |
| IRNet+BERT(base) | 61.9 | 54.7 |
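
The effect of adding BERT can be read off the table as an absolute difference in percentage points. A small worked example, using only the numbers in the table:

```python
# Numbers copied from the results table (exact set match accuracy, %).
RESULTS = {
    "IRNet": {"dev": 53.2, "test": 46.7},
    "IRNet+BERT(base)": {"dev": 61.9, "test": 54.7},
}


def gains(base, improved):
    """Absolute gain in percentage points per split, rounded to one decimal."""
    return {
        split: round(RESULTS[improved][split] - RESULTS[base][split], 1)
        for split in RESULTS[base]
    }


if __name__ == "__main__":
    print(gains("IRNet", "IRNet+BERT(base)"))  # {'dev': 8.7, 'test': 8.0}
```

On these numbers, BERT(base) adds 8.7 points on dev and 8.0 points on test.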

Citation

If you use IRNet, please cite the following work.

@inproceedings{GuoIRNet2019,
  author={Jiaqi Guo and Zecheng Zhan and Yan Gao and Yan Xiao and Jian-Guang Lou and Ting Liu and Dongmei Zhang},
  title={Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2019},
  organization={Association for Computational Linguistics}
}

Thanks

We would like to thank Tao Yu and Bo Pang for running evaluations on our submitted models. We are also grateful to the flexible semantic parser TranX, which inspired our work.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.