The model is tested with Python 3.6 and PyTorch 1.0. Install the dependencies:
pip install -r requirements.txt
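After installing, it can help to confirm that the environment matches the tested versions. The following is a minimal sketch, not part of the repository: the function name `check_environment` and its parameters are hypothetical, and it only compares version prefixes.

```python
import sys


def check_environment(expected_python=(3, 6), expected_torch="1.0"):
    """Return a list of warning strings; the list is empty if everything matches."""
    warnings = []

    # Compare only major.minor of the running interpreter.
    found_python = tuple(sys.version_info[:2])
    if found_python != tuple(expected_python):
        warnings.append(
            "Python %d.%d found, but the model was tested with %d.%d"
            % (found_python + tuple(expected_python))
        )

    # PyTorch may not be installed yet, so import it lazily.
    try:
        import torch
    except ImportError:
        warnings.append("PyTorch is not installed; run pip install -r requirements.txt")
    else:
        if not torch.__version__.startswith(expected_torch):
            warnings.append(
                "PyTorch %s found, but the model was tested with %s"
                % (torch.__version__, expected_torch)
            )
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```

A mismatch does not necessarily mean the code will fail; the warnings only flag a deviation from the tested setup.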
Download the pretrained BERT model from here and save it as model/bert/data/annotated_wikisql_and_PyTorch_bert_param/pytorch_model_uncased_L-12_H-768_A-12.bin.
Download the SQLite database files from here and place them in data/database.
First, download SParC. Then follow these steps:
- For training, run run_sparc_editsql.sh.
- Experimental logs are saved at logs/logs_sparc_editsql. Delete args.log from there before starting training.
- The dev results can be reproduced by running test_sparc_editsql.sh with the pre-trained model downloaded from here and placed under logs/logs_sparc_editsql/save_31_sparc_editsql.
- The predictions are saved in logs/logs_sparc_editsql as dev_use_predicted_queries_predictions.json.
To add a new table, edit data/sparc/tables.json. To add new questions, edit data/sparc/dev.json and data/sparc/dev_no_value.json:
- Use https://github.com/taoyds/spider#tables and https://github.com/taoyds/spider#question-sql-and-parsed-sql as reference to understand the structure of the files.
- parsed_sql_examples.sql (https://github.com/taoyds/spider/blob/master/preprocess/parsed_sql_examples.sql) gives examples to help understand the input structure.
- In dev.json and dev_no_value.json, there is no need to edit anything except utterance and utterance_toks, which should match the new question that you want to ask.
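The last point can be sketched in Python. This is an illustration under stated assumptions: the helper `set_question` is hypothetical, the record layout (an `interaction` list whose turns carry `utterance` and `utterance_toks`) is assumed from the SParC format, and the tokens are produced by plain whitespace splitting, which may differ from the tokenization used in the original files.

```python
import copy


def set_question(example, turn_index, question):
    """Return a copy of `example` with the question of one turn replaced."""
    updated = copy.deepcopy(example)
    # Assumption: each record stores its turns under the "interaction" key.
    turn = updated["interaction"][turn_index]
    turn["utterance"] = question
    # Assumption: whitespace tokenization is close enough to the original tokens.
    turn["utterance_toks"] = question.split()
    return updated


# Toy record for illustration only; real entries contain more fields.
example = {
    "database_id": "new_schema_name",
    "interaction": [
        {
            "utterance": "old question ?",
            "utterance_toks": ["old", "question", "?"],
        }
    ],
}

new_example = set_question(example, 0, "How many singers are there ?")
print(new_example["interaction"][0]["utterance_toks"])
```

To apply this to the real files, load the list with `json.load`, update the chosen entry, and write the list back with `json.dump`, doing the same for both dev.json and dev_no_value.json.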
Add the new database schema file (.sqlite file) at data/sparc/databases/new_schema_name/new_schema.sqlite, and add the database name to the list of database names in data/sparc/dev_db_ids.txt.
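Registering the database name can be scripted so that repeated runs do not add duplicates. The sketch below is an assumption-laden example: the helper `register_database` is hypothetical, and it assumes that dev_db_ids.txt holds one database name per line.

```python
from pathlib import Path


def register_database(db_name, ids_file):
    """Append `db_name` to `ids_file` unless it is already listed.

    Returns True if the name was added and False if it was already present.
    """
    ids_path = Path(ids_file)
    text = ids_path.read_text() if ids_path.exists() else ""

    # Assumption: names are separated by whitespace (one per line).
    if db_name in text.split():
        return False

    # Make sure the new name starts on its own line.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with ids_path.open("a") as handle:
        handle.write(prefix + db_name + "\n")
    return True


# Example call (run from the repository root):
# register_database("new_schema_name", "data/sparc/dev_db_ids.txt")
```

The function does not create the .sqlite file itself; that file still has to be placed at the path given above.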
After adding new questions, delete the following folders if they exist:
- processed_data_sparc_removefrom
- processed_data_sparc_removefrom_test
- data/sparc_data_removefrom

These folders contain vocabulary files, which need to be recreated if you have edited the dev files or added a new schema.
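The cleanup can also be done with a short script. This is a minimal sketch: the helper `remove_stale_dirs` is hypothetical, and it assumes the folder names listed above are relative to the repository root.

```python
import shutil
from pathlib import Path

# Folder names taken from the list above, relative to the repository root.
STALE_DIRS = [
    "processed_data_sparc_removefrom",
    "processed_data_sparc_removefrom_test",
    "data/sparc_data_removefrom",
]


def remove_stale_dirs(root=".", names=STALE_DIRS):
    """Delete each listed folder under `root` if it exists.

    Returns the names of the folders that were actually removed.
    """
    removed = []
    for name in names:
        path = Path(root) / name
        if path.is_dir():
            shutil.rmtree(path)  # removes the folder and everything inside it
            removed.append(name)
    return removed


# Example call (run from the repository root):
# print(remove_stale_dirs())
```

Folders that do not exist are skipped silently, so the script is safe to run before every evaluation on edited data.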
Run output.py to get a text file named output.txt with formatted results.