dbmodena/orqa

OrQA

OrQA (Open Data Retrieval and Question Answering) is a workflow for generating new benchmark datasets to evaluate retrieval and tabular question answering models on Open Data.

The workflow is composed of four main stages:

  1. Crawling data and metadata from the desired Open Data endpoint
  2. Searching for candidate related tables
  3. Evaluating the candidate table pairs found in the previous stage
  4. Generating questions and corresponding SQL queries
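
As a rough illustration of the first stage, the sketch below pages through package metadata on a CKAN-style Action API. It assumes the endpoint exposes CKAN's `package_search` action with `start` and `rows` parameters, which is how the Canadian portal's `/api/action` URL used later in this README is normally addressed. The helper names `package_search_url` and `fetch_packages` are hypothetical and are not taken from the OrQA scripts, whose crawler may work differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(api_base: str, start: int, rows: int) -> str:
    """Build a CKAN `package_search` URL for one page of results."""
    query = urlencode({"start": start, "rows": rows})
    return f"{api_base.rstrip('/')}/package_search?{query}"


def fetch_packages(api_base: str, start: int, rows: int) -> list:
    """Download one page of package metadata (performs a network request)."""
    with urlopen(package_search_url(api_base, start, rows), timeout=30) as resp:
        payload = json.load(resp)
    # CKAN wraps results as {"success": ..., "result": {"results": [...]}}
    return payload["result"]["results"]


# Only builds and prints the URL; call fetch_packages(...) to hit the portal.
print(package_search_url("https://open.canada.ca/data/api/action", 0, 100))
```

Keeping the URL construction separate from the download makes it possible to inspect the request before any network traffic is sent.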

All scripts needed to run your own experiments are located in the scripts folder.


🧰 Requirements

OrQA is built on top of Ollama and LiteLLM.
You will need to manually install Ollama before running the scripts.

Install the required Python packages via Conda:

$ conda env create -f environment.yml

Then manually install the LiteLLM proxy:

$ pip install 'litellm[proxy]'

🚀 Starting the Services

Before running the evaluation and generation scripts, start the Ollama server:

$ ollama serve 

Then, launch LiteLLM:

(orqa) $ litellm --config litell_config.yml 
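
Before moving on, it can help to confirm that both services are answering. The sketch below is one possible check, not part of OrQA. It assumes the default addresses (Ollama on port 11434, the LiteLLM proxy on port 4000 with its `/health/liveliness` endpoint); if your LiteLLM config or Ollama setup uses other ports, adjust the two constants. The function name `is_reachable` is hypothetical.

```python
from http.client import HTTPException
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed default addresses; adjust them if your setup uses other ports.
OLLAMA_URL = "http://localhost:11434/"
LITELLM_URL = "http://localhost:4000/health/liveliness"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, just not with a 2xx status (e.g. auth required).
        return True
    except (OSError, ValueError, HTTPException):
        # Connection refused, timeout, DNS failure, or malformed URL/response.
        return False


for name, url in [("Ollama", OLLAMA_URL), ("LiteLLM", LITELLM_URL)]:
    print(f"{name}: {'up' if is_reachable(url) else 'not reachable'}")
```

If either service is reported as not reachable, the evaluation and generation steps are likely to fail when they try to call a model.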

🧪 Run the Workflow

Use the following commands to create a new dataset from the first 1000 available packages on the Canadian Open Data portal:

(orqa) $ python orqa_0_open_data_crawler.py CAN 0 1000 https://open.canada.ca/data/api/action
(orqa) $ python orqa_1_create_blend_index.py CAN 0 1000
(orqa) $ python orqa_2_search_candidates.py CAN 0 1000
(orqa) $ python orqa_3_evaluation.py CAN 0 1000
(orqa) $ python orqa_4_generate_questions.py CAN 0 1000
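
The five commands share the same portal code and range arguments, so they can also be chained from a small driver. The following is a minimal sketch under that assumption, not a script shipped with OrQA: `build_commands` and `run_workflow` are hypothetical helpers, and the script is assumed to be launched from the folder that contains the `orqa_*.py` files.

```python
import subprocess
import sys

# Script names copied from the commands above, in execution order.
SCRIPTS = [
    "orqa_0_open_data_crawler.py",
    "orqa_1_create_blend_index.py",
    "orqa_2_search_candidates.py",
    "orqa_3_evaluation.py",
    "orqa_4_generate_questions.py",
]


def build_commands(portal, start, end, api_url):
    """Return one argument list per script, mirroring the manual commands."""
    commands = []
    for script in SCRIPTS:
        args = [sys.executable, script, portal, str(start), str(end)]
        if script == SCRIPTS[0]:
            args.append(api_url)  # only the crawler receives the API URL
        commands.append(args)
    return commands


def run_workflow(portal, start, end, api_url, dry_run=True):
    """Run every step in order; check=True aborts on the first failing script."""
    for args in build_commands(portal, start, end, api_url):
        print(" ".join(args))
        if not dry_run:
            subprocess.run(args, check=True)


# Dry run: only prints the commands. Pass dry_run=False to execute them.
run_workflow("CAN", 0, 1000, "https://open.canada.ca/data/api/action")
```

Stopping at the first failing step avoids running later stages on incomplete output from earlier ones.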

⚙️ Customization

At this stage, customization of the workflow (for example, selecting different models for question generation) is not yet available via command-line arguments or external config files. These settings are hardcoded and must be changed directly in the scripts.


The dataset folder contains a first version of a dataset generated with the OrQA workflow: 1,000 questions created from Canadian and UK Open Data tables.
