Source code for the EMNLP 2019 paper "Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL".

Haoyan Liu 246f52ee70 Merge pull request #3 from microsoft/release add citation		2019-11-11 10:25:34 +08:00
SQLNet	init commit	2019-10-17 21:28:27 +08:00
data	init commit	2019-10-11 14:54:02 +08:00
syntaxSQL	init commit	2019-10-17 21:28:27 +08:00
.gitignore	Initial commit	2019-09-23 00:34:22 -07:00
CODE_OF_CONDUCT.md	Initial commit	2019-09-23 00:34:26 -07:00
LICENSE	Initial commit	2019-09-23 00:34:27 -07:00
NOTICE	init commit	2019-10-11 14:54:02 +08:00
README.md	add citation	2019-11-11 10:24:32 +08:00
SECURITY.md	Initial commit	2019-09-23 00:34:29 -07:00
preprocess_direction_features.py	init commit	2019-10-17 21:28:27 +08:00
preprocess_syntaxSQL.py	init commit	2019-10-17 21:28:27 +08:00
requirements.txt	init commit	2019-10-11 14:54:02 +08:00

README.md

Adjective-Knowledge-for-Text-to-SQL

This is the source code for our paper Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL (EMNLP 2019).

In this paper, we propose leveraging adjective-noun phrasing knowledge mined from the web to predict comparison relations in text-to-SQL. Experimental results on both the original and the re-split Spider datasets show that our approach achieves a significant improvement over syntaxSQL and SQLNet on comparison relation prediction.

Preliminaries

Environment Setup

The baseline code uses Python 2.7 and PyTorch 0.2.0 (GPU).
- Install the Python dependencies: pip install -r requirements.txt
- Alternatively, use Docker: docker pull buaa1156/py27torch0.2cuda8vim:latest
The preprocessing scripts require Python >= 3.5.

Data and Embeddings

The dataset comes from the Spider task website, and the singletable and resplitdata used in our paper are under data/singletable and data/resplitdata respectively.
The knowledge used in this paper is under the folder data/knowledge.
Download the pretrained GloVe embeddings and place them under both the syntaxSQL and SQLNet folders as glove/glove.42B.300d.txt.
Download evaluation.py and process_sql.py from the Spider GitHub page, and evaluate the results following their instructions.
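
Before training, it can help to confirm that the embeddings are where the two baselines expect them. The following is a minimal sketch, not part of the repository: the helper name missing_glove_files is hypothetical, and it assumes the script is run from the repository root with the folder layout described above.

```python
import os

# Relative location named in this README; each model folder needs its own copy.
GLOVE_RELATIVE_PATH = os.path.join("glove", "glove.42B.300d.txt")
MODEL_DIRS = ("syntaxSQL", "SQLNet")


def missing_glove_files(repo_root):
    """Return the expected GloVe paths that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks file presence.
    """
    expected = [
        os.path.join(repo_root, model_dir, GLOVE_RELATIVE_PATH)
        for model_dir in MODEL_DIRS
    ]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    for path in missing_glove_files("."):
        print("missing:", path)
```

An empty result means both copies are in place; it says nothing about whether the file contents are complete.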

Run syntaxSQL with Knowledge

Generate the train and dev data by running:
- python3 preprocess_syntaxSQL.py train|dev singletable|resplitdata
Preprocess knowledge features by running:
- python3 preprocess_direction_features.py syntaxSQL singletable|resplitdata weighted|direct
Run run_train.sh and run_test.sh under the syntaxSQL directory after setting data_type, feats_format, and DATE in the first lines of each script.
- data_type: singletable or resplitdata
- feats_format: weighted or direct
- DATE: set automatically to the local time during training; must be assigned manually for testing
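
The preprocessing commands above take their options positionally, with "|" separating the alternatives. As a rough sketch of how one configuration expands into concrete commands, the hypothetical helper below builds the argument lists. It assumes preprocess_syntaxSQL.py is run once for train and once for dev, which the README implies but does not state outright.

```python
DATA_TYPES = ("singletable", "resplitdata")
FEATS_FORMATS = ("weighted", "direct")


def syntaxsql_preprocess_commands(data_type, feats_format):
    """Return the syntaxSQL preprocessing commands as argv lists, in order.

    Hypothetical helper; argument order follows the commands in this README.
    """
    if data_type not in DATA_TYPES:
        raise ValueError("data_type must be one of %s" % (DATA_TYPES,))
    if feats_format not in FEATS_FORMATS:
        raise ValueError("feats_format must be one of %s" % (FEATS_FORMATS,))
    # Assumption: the data generation script is run once per split.
    commands = [
        ["python3", "preprocess_syntaxSQL.py", split, data_type]
        for split in ("train", "dev")
    ]
    commands.append(
        ["python3", "preprocess_direction_features.py",
         "syntaxSQL", data_type, feats_format]
    )
    return commands


if __name__ == "__main__":
    for argv in syntaxsql_preprocess_commands("singletable", "weighted"):
        print(" ".join(argv))
```

Each list can be passed to subprocess.check_call from the repository root. The data_type and feats_format values chosen here should match the ones set in run_train.sh and run_test.sh.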

Run SQLNet with Knowledge

Copy the files in the data/ directory to SQLNet/data/.
Preprocess knowledge features by running:
- python3 preprocess_direction_features.py SQLNet singletable|resplitdata weighted|direct
Run run_train.sh and run_test.sh under the SQLNet directory after setting data_type, feats_format, and DATE in the first lines of each script.
- data_type: singletable or resplitdata
- feats_format: weighted or direct
- DATE: set automatically to the local time during training; must be assigned manually for testing
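
The copy step at the start of this section can also be scripted. The sketch below is one possible way to do it, using only the standard library. The function name copy_data_tree is hypothetical, and it assumes that subfolders such as singletable, resplitdata, and knowledge should be copied along with their contents.

```python
import os
import shutil


def copy_data_tree(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, keeping subfolders.

    Hypothetical helper; returns the list of destination paths written.
    Existing files at the destination are overwritten.
    """
    copied = []
    for current, _dirs, files in os.walk(src_dir):
        relative = os.path.relpath(current, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, relative))
        os.makedirs(target, exist_ok=True)
        for name in files:
            destination = os.path.join(target, name)
            shutil.copy2(os.path.join(current, name), destination)
            copied.append(destination)
    return copied
```

From the repository root, a call such as copy_data_tree("data", os.path.join("SQLNet", "data")) would mirror the data folder before running preprocess_direction_features.py.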

Questions

If you have any questions, please open an issue.

Citation

@inproceedings{liu2019leveraging,
  title={Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL},
  author={Liu, Haoyan and Fang, Lei and Liu, Qian and Chen, Bei and Lou, Jian-Guang and Li, Zhoujun},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={3506--3511},
  year={2019}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.