HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval

HELIOS outperforms state-of-the-art models with a significant improvement in both recall and nDCG on the OTT-QA benchmark.

About HELIOS

We introduce HELIOS, a table-text retrieval model that improves open-domain question answering systems by addressing the limitations of both early and late fusion methods. Its key features are:

  • Combining early and late fusion techniques, it bridges the gap between static pre-alignments and dynamic retrieval strategies, ensuring contextually relevant results for more complex queries.
  • Utilizing edge-based bipartite subgraph retrieval, HELIOS materializes finer-grained relationships between table segments and text passages, reducing the inclusion of irrelevant information while maintaining crucial query-dependent links.
  • Employing a query-relevant node expansion mechanism, it dynamically identifies and retrieves the most promising nodes for expansion, minimizing the risk of missing vital contexts.
  • Integrating a star-based LLM refinement step, it prevents hallucinations by performing logical inference at the star graph level, enabling advanced reasoning tasks such as column-wise aggregation and multi-hop reasoning.

Getting Started

This page explains how to reproduce the results reported in the paper "HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval". Follow the steps below in order.

Prerequisites

Docker

You need Docker to pull our images from Docker Hub. Please refer to the Docker Docs for installation instructions.

Download Docker Image

We provide our environment as Docker images. Please download them from Docker Hub.

  1. Download our images from Docker Hub
docker pull anonymous824/heliosworkspace:latest

docker pull anonymous824/heliosresources:latest

Create HELIOS Workspace

Create a helios workspace using the downloaded image.

  1. Create the container
docker run -itd --name acl2025-heliosworkspace anonymous824/heliosworkspace /bin/bash
  2. Start the container (if it is stopped)
docker start acl2025-heliosworkspace
  3. Open a shell inside the container
docker exec -it acl2025-heliosworkspace /bin/bash
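
The create and start steps above can be wrapped so that rerunning them is harmless. The following is a minimal sketch, not part of the repository: the function name `ensure_workspace` is hypothetical, while the container and image names are taken from the commands above. It relies only on `docker container inspect`, which exits with a non-zero status when the named container does not exist.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HELIOS): create the workspace
# container on first use, otherwise just make sure it is started.
ensure_workspace() {
    name="acl2025-heliosworkspace"
    image="anonymous824/heliosworkspace"
    # `docker container inspect` fails if no container has this name.
    if docker container inspect "$name" >/dev/null 2>&1; then
        docker start "$name"
    else
        docker run -itd --name "$name" "$image" /bin/bash
    fi
}
```

Calling `ensure_workspace` before opening a shell in the container avoids the "name is already in use" error that a second `docker run --name` would raise.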

Activate Conda Env

conda activate fm

Download Dataset and Model Checkpoints

  1. Create the resources container
docker run -itd --name acl2025-heliosresources anonymous824/heliosresources /bin/bash
  2. Start the container (if it is stopped)
docker start acl2025-heliosresources
  3. Open a shell inside the container
docker exec -it acl2025-heliosresources /bin/bash
  4. Download the large language model
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir-use-symlinks False --local-dir /mnt/sdd/OTT-QAMountSpace/ModelCheckpoints/Ours/llm/Meta-Llama-3.1-8B-Instruct --exclude "*.pth"

Build Index

  1. Create edge index
sh Algorithms/Ours/scripts/build_edge_index.sh
  2. Create table segment index
sh Algorithms/Ours/scripts/build_table_segment_index.sh
  3. Create passage index
sh Algorithms/Ours/scripts/build_passage_index.sh
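
If you prefer to build all three indexes in one go, the steps above can be chained so that a failure stops the run early. This is only a sketch: `build_indexes` is a hypothetical wrapper, it assumes you are in the repository root, and it assumes the scripts can run back to back in the listed order (the source does not state whether they depend on each other).

```shell
#!/bin/sh
# Hypothetical wrapper: run the three index-building scripts in the
# order listed above and stop at the first one that fails.
build_indexes() {
    for script in build_edge_index build_table_segment_index build_passage_index; do
        echo "Running ${script}.sh"
        sh "Algorithms/Ours/scripts/${script}.sh" || {
            echo "Failed: ${script}.sh" >&2
            return 1
        }
    done
}
```

Run it as `build_indexes` from the directory that contains `Algorithms/`.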

Run Edge-based Bipartite Subgraph Retrieval

  1. If tmux is not installed, install it
apt-get update && apt-get install -y tmux
  2. Load edge retriever (once it is running, detach from the session with Ctrl-b d)
tmux new -s edge_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_edge_retriever.sh
  3. Load edge reranker (detach again with Ctrl-b d)
tmux new -s edge_reranker
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_edge_reranker.sh
  4. Run bipartite subgraph retrieval
sh Algorithms/Ours/scripts/run_edge_based_bipartite_subgraph_retrieval.sh
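
The "load" steps above each open a tmux session and type the same three commands. As a convenience, they could be scripted with detached sessions. The sketch below is an assumption-laden illustration: `start_service` is a hypothetical helper, and it simply sends the same keystrokes shown above into a session created with `tmux new-session -d`. It does not wait for the service to finish loading, since the source does not say how readiness is signalled.

```shell
#!/bin/sh
# Hypothetical helper: start one HELIOS service in a detached tmux
# session by typing the same commands the manual steps use.
start_service() {
    session="$1"   # tmux session name, e.g. edge_retriever
    script="$2"    # loader script, relative to the HELIOS directory
    # Do nothing if a session with this name is already running.
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "Session $session already exists" >&2
        return 0
    fi
    tmux new-session -d -s "$session"
    # C-m is the Enter key, so the command line runs inside the session.
    tmux send-keys -t "$session" "conda activate fm && cd HELIOS && sh $script" C-m
}
```

For example, `start_service edge_retriever Algorithms/Ours/scripts/load_edge_retriever.sh` followed by `start_service edge_reranker Algorithms/Ours/scripts/load_edge_reranker.sh`. Use `tmux attach -t edge_retriever` to check the loader's output before running the retrieval script.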

Run Query-relevant Node Expansion

  1. Kill edge retriever session
tmux kill-session -t edge_retriever
  2. Load seed node scorer
tmux new -s node_scorer
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_seed_node_scorer.sh
  3. Load table segment retriever
tmux new -s table_segment_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_table_segment_retriever.sh
  4. Load passage retriever
tmux new -s passage_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_passage_retriever.sh
  5. Run query-relevant node expansion
sh Algorithms/Ours/scripts/run_query_relevant_node_expansion.sh

Run Star-based LLM Refinement

  1. Kill the sessions left over from the previous stages
tmux kill-session -t edge_reranker
tmux kill-session -t node_scorer
tmux kill-session -t table_segment_retriever
tmux kill-session -t passage_retriever
  2. Load large language model
tmux new -s llm
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_llm.sh
  3. Run star-based LLM refinement
sh Algorithms/Ours/scripts/run_star_based_llm_refinement.sh
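
The session cleanup at the start of this stage fails with an error for any session that is no longer running. A tolerant version might look like the sketch below; `stop_sessions` is a hypothetical helper, and it uses only `tmux has-session` and `tmux kill-session`, both of which take a target with `-t`.

```shell
#!/bin/sh
# Hypothetical helper: kill each named tmux session if it exists,
# and skip quietly over names that are not running.
stop_sessions() {
    for session in "$@"; do
        if tmux has-session -t "$session" 2>/dev/null; then
            tmux kill-session -t "$session"
            echo "Stopped $session"
        else
            echo "No session named $session, skipping"
        fi
    done
}
```

For this stage the call would be `stop_sessions edge_reranker node_scorer table_segment_retriever passage_retriever`.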

Evaluate Retrieval Accuracy

  1. Evaluate AnswerRecall@K
sh Algorithms/Ours/scripts/eval_answer_recall.sh
  2. Evaluate nDCG@K
sh Algorithms/Ours/scripts/eval_ndcg.sh
  3. Evaluate HITS@4K
sh Algorithms/Ours/scripts/eval_hits.sh
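
For readers unfamiliar with nDCG@K, the block below gives the textbook definition together with a small worked example. It is included for orientation only: the evaluation script may use a different gain function or relevance labelling, so treat this as an assumption rather than a description of `eval_ndcg.sh`. Here rel_i denotes the relevance of the item ranked at position i, and IDCG@K is the DCG@K of the ideal ordering.

```latex
\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{\mathrm{rel}_i}{\log_2(i+1)}, \qquad
\mathrm{nDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}

% Worked example with binary relevance (1, 0, 1) and K = 3:
% DCG@3  = 1/\log_2 2 + 0/\log_2 3 + 1/\log_2 4 = 1.5
% IDCG@3 = 1/\log_2 2 + 1/\log_2 3 \approx 1.631   (ideal order 1, 1, 0)
% nDCG@3 \approx 1.5 / 1.631 \approx 0.920
```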

Evaluate Reading Accuracy

  1. Convert retrieval results into reader input
sh Algorithms/Ours/scripts/get_reader_input.sh
  2. Evaluate Exact Match & F1 Score
sh Algorithms/Ours/scripts/eval_reading_accuracy.sh
