MetaTab

This is the official guide for the paper “Detecting Logic Errors in LM-Generated Programs for Tabular Data via Metamorphic Testing.”

[Figure: MetaTab illustration]


Requirements

  • Python ≥ 3.10
  • Linux

Installation

Clone this repository and install the required dependencies:

pip install -r requirements.txt

Usage

1. Data Preparation

Extract the dataset:

unzip assets/data.zip -d path/to/data

2. Model Setup

Set up the tabular language models locally before running the steps below. The example in the next section uses TableGPT.

3. Step-by-Step Example (TableGPT)

Intermediate Program Generation (Original)

python run_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation none --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt

Intermediate Program Generation (Perturbed)


Permutation Metamorphic Relations (PMR)

  • PMR1: Shuffle
python run_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation shuffle --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
  • PMR2: Column Shuffle
python run_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation column_shuffle --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
  • PMR3: Transpose
python run_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation transpose --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
  • PMR4: Reconstruction
python run_reconstruction_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation none --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
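As a rough illustration of what the permutation perturbations do to a table, the sketch below implements a row shuffle, a column shuffle, and a transpose on a small in-memory table. It is not the repository's implementation: the header-plus-rows table representation, the function names, and the `seed` parameter are assumptions made for this example, and the reconstruction relation (PMR4) is not covered.

```python
import random

# Assumption for this sketch: a table is a header list plus a list of rows.

def row_shuffle(header, rows, seed=0):
    """Return the table with its rows in a random order (header unchanged)."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the input table is not modified
    rng.shuffle(shuffled)
    return header, shuffled

def column_shuffle(header, rows, seed=0):
    """Return the table with its columns in a random order."""
    rng = random.Random(seed)
    order = list(range(len(header)))
    rng.shuffle(order)
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows

def transpose(header, rows):
    """Swap rows and columns; the old header becomes the first column."""
    grid = [header] + rows
    flipped = [list(column) for column in zip(*grid)]
    return flipped[0], flipped[1:]
```

None of these transformations changes the information in the table, so a logically correct program should produce the same answer before and after each one.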

Decomposition Metamorphic Relations (DMR)

  • DMR1
python run_tablegpt_agent_cut.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation none --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
  • DMR2
python run_tablegpt_agent_c_cut.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation none --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
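The README does not describe DMR1 and DMR2 beyond the script names. Judging only from the `_cut` and `_c_cut` suffixes, one plausible reading is a row-wise and a column-wise decomposition of the table. The sketch below shows what such cuts could look like; the function names, the `parts` and `key` parameters, and the choice to repeat a key column in every column group are all assumptions, not the repository's behavior.

```python
def split_rows(header, rows, parts=2):
    """Cut the table into contiguous row blocks, each keeping the full header."""
    size = max(1, -(-len(rows) // parts))   # ceiling division, at least 1
    return [(header, rows[i:i + size]) for i in range(0, len(rows), size)]

def split_columns(header, rows, key=0, parts=2):
    """Cut the table into column groups; the key column is repeated in each group."""
    others = [i for i in range(len(header)) if i != key]
    size = max(1, -(-len(others) // parts))
    pieces = []
    for start in range(0, len(others), size):
        keep = [key] + others[start:start + size]
        sub_header = [header[i] for i in keep]
        sub_rows = [[row[i] for i in keep] for row in rows]
        pieces.append((sub_header, sub_rows))
    return pieces
```

Under this reading, answers computed on the pieces can be compared with, or combined and checked against, the answer computed on the full table.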

Semantic Metamorphic Relations (SMR)

  • SMR1
python Symbolization_pure_numbers_to_words.py
  • SMR2
python Category_Anonymization.py
  • SMR3
python filter_time_series_table.py

After preprocessing the tables with one of the scripts above, run the agent on the transformed data:

python run_tablegpt_agent.py \
    --model tablegpt \
    --dataset wtq --sub_sample False \
    --perturbation none --use_full_table True \
    --disable_resort True --norm_cache True \
    --resume 0 --stop_at 1e6 --self_consistency 5 --temperature 0.8 \
    --log_dir output/wtq_agent --cache_dir cache/tablegpt
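The SMR preprocessing scripts above are named after the transformation they apply. As a hedged illustration of the first two, the sketch below spells out purely numeric cells as words and replaces categorical values with stable placeholders. The function names, the 0 to 999 range, and the `Category_N` placeholder format are assumptions for this example and may differ from what `Symbolization_pure_numbers_to_words.py` and `Category_Anonymization.py` actually do.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")

def symbolize_cell(cell):
    """Rewrite a purely numeric cell as words; leave every other cell untouched."""
    text = cell.strip()
    if text.isascii() and text.isdigit() and int(text) <= 999:
        return number_to_words(int(text))
    return cell

def anonymize_column(values, prefix="Category"):
    """Replace each distinct value with a stable placeholder, in order of first appearance."""
    mapping = {}
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix}_{len(mapping) + 1}"
        result.append(mapping[value])
    return result
```

For example, `symbolize_cell("42")` gives `"forty-two"`, while `symbolize_cell("42 kg")` is returned unchanged because the cell is not purely numeric.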

Evaluation

  • Error Rate
python evaluate_agent_all_type.py
  • Recall, Precision, F1 Score
python hhh_wtq.py
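For orientation, the metrics listed above can be computed as in the following minimal sketch. It assumes that a metamorphic test flags a program as erroneous when its answer on the perturbed table differs from its answer on the original table, and that ground-truth labels say which original programs were actually wrong. The function names and the exact comparison rule are assumptions, not the logic of `evaluate_agent_all_type.py` or `hhh_wtq.py`.

```python
def flag_error(original_answer, perturbed_answer):
    """Flag a suspected logic error when the two answers disagree."""
    return original_answer != perturbed_answer

def error_rate(actually_wrong):
    """Fraction of programs whose answer is wrong according to the ground truth."""
    return sum(actually_wrong) / len(actually_wrong) if actually_wrong else 0.0

def precision_recall_f1(flagged, actually_wrong):
    """Score the flags against ground-truth error labels."""
    pairs = list(zip(flagged, actually_wrong))
    tp = sum(1 for f, w in pairs if f and w)        # flagged and really wrong
    fp = sum(1 for f, w in pairs if f and not w)    # flagged but correct
    fn = sum(1 for f, w in pairs if not f and w)    # wrong but not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Precision measures how many flagged programs are truly erroneous, recall measures how many erroneous programs are caught, and F1 is their harmonic mean.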

Environment

  • PyTorch