OrQA (Open Data Retrieval and Question Answering) is a workflow for generating new benchmark datasets to evaluate retrieval and tabular question answering models on Open Data.
The workflow is composed of four main stages:
- Crawling data and metadata from the desired Open Data endpoint
- Searching for candidate related tables
- Evaluating the previously found pairs
- Generating questions and corresponding SQL queries
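The four stages above are run as separate scripts (the index-building script supports the search stage). As a rough sketch, a driver that chains them in order might look like the following; the script names and argument order come from the commands shown later in this README, while the function names and the use of `subprocess` are illustrative assumptions, not OrQA's actual code:

```python
import subprocess
import sys

# OrQA workflow scripts, in execution order (names from the scripts folder).
STAGES = [
    "orqa_0_open_data_crawler.py",
    "orqa_1_create_blend_index.py",
    "orqa_2_search_candidates.py",
    "orqa_3_evaluation.py",
    "orqa_4_generate_questions.py",
]

def build_commands(portal, start, end, api_url=None):
    """Build one command line per stage for the given portal and package range."""
    commands = []
    for script in STAGES:
        cmd = [sys.executable, script, portal, str(start), str(end)]
        # Only the crawler stage takes the Open Data API endpoint URL.
        if script.endswith("crawler.py") and api_url:
            cmd.append(api_url)
        commands.append(cmd)
    return commands

def run_pipeline(portal, start, end, api_url):
    """Run every stage in order, stopping on the first failure."""
    for cmd in build_commands(portal, start, end, api_url):
        subprocess.run(cmd, check=True)
```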
All scripts needed to run your own experiments are located in the scripts folder.
OrQA is built on top of Ollama and LiteLLM.
You will need to manually install Ollama before running the scripts.
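LiteLLM exposes the models it manages through an OpenAI-compatible HTTP endpoint, which is presumably how the evaluation and generation scripts reach them. A minimal sketch of building such a request is shown below; the port (4000, LiteLLM's default), the model alias, and the helper name are assumptions for illustration, not values taken from OrQA's scripts:

```python
import json
import urllib.request

def build_chat_request(prompt, model="ollama/llama3",
                       base_url="http://localhost:4000"):
    """Build an OpenAI-style chat completion request for a LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request requires a running proxy:
# with urllib.request.urlopen(build_chat_request("Generate a question")) as r:
#     print(json.load(r))
```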
Install the required Python packages via Conda:
```shell
$ conda env create -f environment.yml
```
and manually install the LiteLLM proxy:
```shell
$ pip install 'litellm[proxy]'
```
Before running the evaluation and generation scripts, start the Ollama server:
```shell
$ ollama serve
```
Then, launch LiteLLM:
```shell
(orqa) $ litellm --config litell_config.yml
```
Use the following commands to create a new dataset from the first 1000 available packages on the Canadian Open Data portal:
```shell
(orqa) $ python orqa_0_open_data_crawler.py CAN 0 1000 https://open.canada.ca/data/api/action
(orqa) $ python orqa_1_create_blend_index.py CAN 0 1000
(orqa) $ python orqa_2_search_candidates.py CAN 0 1000
(orqa) $ python orqa_3_evaluation.py CAN 0 1000
(orqa) $ python orqa_4_generate_questions.py CAN 0 1000
```
At this stage, customization of the workflow (such as selecting different models for question generation) is not yet available via command-line arguments or external config files; these settings must be changed directly in the scripts.
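The endpoint passed to the crawler above is a CKAN Action API, the interface used by the Canadian Open Data portal. Assuming the crawler pages through CKAN's standard `package_search` action with its `start`/`rows` parameters (a common approach, not necessarily OrQA's exact implementation), fetching a range of package metadata could be sketched as:

```python
import json
import urllib.parse
import urllib.request

def package_search_url(api_base, start, rows):
    """Build a CKAN package_search URL for one page of package metadata."""
    query = urllib.parse.urlencode({"start": start, "rows": rows})
    return f"{api_base}/package_search?{query}"

def crawl_packages(api_base, first, last, page_size=100):
    """Yield package metadata dicts for packages in the range [first, last)."""
    for start in range(first, last, page_size):
        rows = min(page_size, last - start)
        url = package_search_url(api_base, start, rows)
        with urllib.request.urlopen(url) as resp:
            yield from json.load(resp)["result"]["results"]
```

For example, `crawl_packages("https://open.canada.ca/data/api/action", 0, 1000)` would page through the first 1000 packages of the Canadian portal.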
The dataset folder contains a first dataset version generated with the OrQA workflow: it comprises 1,000 questions created from Canadian and UK Open Data tables.
