ContextualSP/unified_parser_text_to_sql
longxud 5ae4e368e2 init unisar 2022-04-14 12:34:57 +08:00
data/spider_schema_linking_tag
dataset_post/spider_sl
eval
genre
semparse
third_party
unisar
README.md
interactive.py
multiprocessing_bpe_encoder.py
requirements.txt
running_pipeline.sh
step1_schema_linking.py
step2_serialization.py
step3_evaluate.py
train.py

README.md

Introduction

This paper introduces UniSAr, which makes existing autoregressive language models structure-aware through three non-invasive extensions: (1) structure marks that encode the database schema, the conversation context, and the relationships between them; (2) constrained decoding, which produces well-structured SQL for a given database schema; and (3) SQL completion, which fills in potentially missing JOIN relationships in the SQL based on the database schema.
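To make the third extension more concrete, the following is a minimal sketch of how a missing JOIN chain could be recovered from foreign keys. It is an illustration only, not the code in this repository: the function name `join_path`, the pair-based foreign-key format, and the example schema are all assumptions. The idea is to treat tables as graph nodes and search for the shortest path between two tables that the predicted SQL mentions.

```python
from collections import deque

def join_path(foreign_keys, start, goal):
    """Return the shortest chain of tables linking start to goal via foreign keys."""
    # Build an undirected adjacency map from (table_a, table_b) foreign-key pairs.
    graph = {}
    for a, b in foreign_keys:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    # Breadth-first search returns a path with the fewest intermediate tables.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two tables are not connected

# Hypothetical schema: student and course are linked only through enrollment.
fks = [("student", "enrollment"), ("enrollment", "course")]
print(join_path(fks, "student", "course"))  # ['student', 'enrollment', 'course']
```

Every table on the returned path other than the two endpoints is a candidate for a JOIN that the model left out.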

Dataset and Model

Spider -> ./data/spider

Fine-tuned BART model -> ./models/spider_sl (Please download this model with git-lfs so that the actual weight files, rather than LFS pointer files, are fetched.)

sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/dreamerdeo/mark-bart
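After cloning, it can be useful to confirm that the large files were really downloaded. The sketch below relies on the fact that a git-lfs pointer file is a small text file beginning with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_pointers` are hypothetical and not part of this repository.

```python
from pathlib import Path

# First bytes of every git-lfs pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file looks like a git-lfs pointer rather than real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER

def find_pointers(root):
    """List files under root that are still un-fetched git-lfs pointers."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_pointers("./mark-bart")` on the cloned directory should return an empty list when the download succeeded; any path it reports can usually be fixed by running `git lfs pull` inside the clone.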

Main dependencies

  • Python version >= 3.6
  • PyTorch version >= 1.5.0
  • pip install -r requirements.txt
  • fairseq is going through changes that are not backward compatible. Install fairseq from source and use this commit for reproducibility. See here for the current PR that should fix fairseq/master.
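The version requirements above can be checked before running anything else. This is a small sketch under the assumption that version strings follow the usual `major.minor.patch` pattern, possibly with a suffix such as `+cu117`; `parse_version`, `meets_minimum`, and `check_environment` are illustrative helpers, not functions from this repository.

```python
import re
import sys

def parse_version(text):
    """Turn a string such as '1.5.0+cu101' into the tuple (1, 5, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '3.6' compares equal to '3.6.0'
        parts.append(0)
    return tuple(parts)

def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 1.13 counts as newer than 1.5."""
    return parse_version(installed) >= parse_version(minimum)

def check_environment():
    """Report whether Python and PyTorch satisfy the minimums listed above."""
    report = {"python": meets_minimum("%d.%d.%d" % sys.version_info[:3], "3.6")}
    try:
        import torch
        report["torch"] = meets_minimum(torch.__version__, "1.5.0")
    except ImportError:
        report["torch"] = False  # PyTorch is not installed
    return report
```

Numeric comparison matters here because a plain string comparison would rank "1.13.1" below "1.5.0".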

Evaluation Pipeline

Step 1: Preprocess the data by adding schema-linking and value-linking tags.

python step1_schema_linking.py
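As a rough illustration of what a schema-linking tag is, the sketch below labels each question token by exact n-gram match against table and column names. It is a simplified stand-in, not the logic of step1_schema_linking.py: the function `link_schema`, the tag names, and the example schema are assumptions, and value linking is omitted.

```python
def link_schema(question_tokens, tables, columns, max_ngram=3):
    """Tag each question token as 'table', 'column', or 'O' by exact n-gram match."""
    # Columns are added last, so a column name overrides an identical table name.
    names = {t.lower(): "table" for t in tables}
    names.update({c.lower(): "column" for c in columns})
    tokens = [tok.lower() for tok in question_tokens]
    tags = ["O"] * len(tokens)
    # Try longer spans first so that "student id" is preferred over "student".
    for size in range(max_ngram, 0, -1):
        for start in range(len(tokens) - size + 1):
            span = range(start, start + size)
            if any(tags[i] != "O" for i in span):
                continue  # already covered by a longer match
            label = names.get(" ".join(tokens[start:start + size]))
            if label:
                for i in span:
                    tags[i] = label
    return tags

question = "show the student id of each student".split()
print(link_schema(question, tables=["student"], columns=["student id", "name"]))
# ['O', 'O', 'column', 'column', 'O', 'O', 'table']
```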

Step 2: Build the input and output sequences for BART.

python step2_serialization.py
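Serialization means flattening the question and the schema into one source string and using the SQL query as the target string. The sketch below shows one plausible layout; the separators, the function name `serialize_example`, and the example schema are assumptions, and the format actually produced by step2_serialization.py may differ.

```python
def serialize_example(question, schema, sql):
    """Flatten a question and its schema into a source string; the SQL is the target."""
    parts = [question.strip()]
    # One segment per table: "table : col1 , col2".
    for table, columns in schema.items():
        parts.append("%s : %s" % (table, " , ".join(columns)))
    source = " | ".join(parts)
    target = " ".join(sql.split())  # collapse newlines and repeated spaces
    return source, target

schema = {"student": ["id", "name"], "course": ["id", "title"]}
source, target = serialize_example(
    "How many students are there?", schema, "SELECT  count(*)\nFROM student"
)
print(source)  # How many students are there? | student : id , name | course : id , title
print(target)  # SELECT count(*) FROM student
```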

Step 3: Run the evaluation script with or without constrained decoding (the --constrain flag shown below enables it).

python step3_evaluate.py --constrain
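Constrained decoding can be pictured as masking the model's choices at every step so that only tokens leading to a valid schema item remain. The toy sketch below uses a prefix trie over tokenized schema names and a greedy loop. It is not the decoder used by step3_evaluate.py: the `Trie` class, `constrained_greedy`, the token pieces, and the scores are all made up for illustration.

```python
class Trie:
    """Prefix tree over token sequences; each sequence ends with an end token."""

    def __init__(self, sequences=()):
        self.root = {}
        for sequence in sequences:
            self.add(sequence)

    def add(self, sequence):
        node = self.root
        for token in sequence:
            node = node.setdefault(token, {})

    def next_tokens(self, prefix):
        """Return the tokens that may follow prefix, or [] if prefix is invalid."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)

def constrained_greedy(score_fn, trie, end_token="</s>"):
    """Greedily pick the best-scoring token among those the trie allows."""
    output = []
    while True:
        allowed = trie.next_tokens(output)
        if not allowed:
            return output
        best = max(allowed, key=lambda tok: score_fn(output, tok))
        if best == end_token:
            return output
        output.append(best)

# Hypothetical schema names, already split into token pieces.
trie = Trie([["student", "</s>"], ["student", "_id", "</s>"], ["course", "</s>"]])
# Toy scores: unconstrained, the model would prefer "teacher", which is not in the schema.
toy = {"teacher": 0.9, "student": 0.6, "</s>": 0.5, "course": 0.3, "_id": 0.2}
toy_score = lambda prefix, token: toy[token]
print(constrained_greedy(toy_score, trie))  # ['student']
```

A real decoder would apply the same restriction inside beam search and would also constrain SQL keywords and clause order, but the masking idea stays the same.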

Results

Prediction: 69.34

Prediction with Constrained Decoding: 70.02

Interactive

python interactive.py --logdir ./models/spider_sl --db_id student_1 --db-path ./data/spider/database --schema-path ./data/spider/tables.json

Reference Code

https://github.com/ryanzhumich/editsql

https://github.com/benbogin/spider-schema-gnn-global

https://github.com/ElementAI/duorat

https://github.com/facebookresearch/GENRE