HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval
HELIOS outperforms state-of-the-art models with a significant improvement in both recall and nDCG on the OTT-QA benchmark.
We introduce HELIOS, our novel table-text retrieval model designed to enhance the capabilities of open-domain question answering systems by addressing the limitations of both early and late fusion methods. Here are the key features of HELIOS:
- Combining early and late fusion techniques, it bridges the gap between static pre-alignments and dynamic retrieval strategies, ensuring contextually relevant results for more complex queries.
- Utilizing edge-based bipartite subgraph retrieval, HELIOS materializes finer-grained relationships between table segments and text passages, reducing the inclusion of irrelevant information while maintaining crucial query-dependent links.
- Employing a query-relevant node expansion mechanism, it dynamically identifies and retrieves the most promising nodes for expansion, minimizing the risk of missing vital contexts.
- Integrating a star-based LLM refinement step, it prevents hallucinations by performing logical inference at the star graph level, enabling advanced reasoning tasks such as column-wise aggregation and multi-hop reasoning.
This page guides you through reproducing the results reported in the paper "HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval".
Please refer to the instructions below.
Docker must be installed to follow these instructions (see the Docker Docs for installation). We provide Docker images of our environment on Docker Hub.
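The container steps below (create the container once, start it again if it is stopped, then open a shell in it) can also be wrapped in a single idempotent helper. This is a minimal sketch, assuming bash and the Docker CLI on the host; ensure_container is a hypothetical name, not part of the HELIOS repo, and opening a shell afterwards is assumed to be done with docker exec.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the HELIOS repo.
# ensure_container NAME IMAGE
# Creates NAME from IMAGE on first use, or starts it if it exists but is stopped.
ensure_container() {
  local name="$1" image="$2"
  if ! docker container inspect "$name" >/dev/null 2>&1; then
    # No such container yet: create and start it detached with an interactive bash.
    docker run -itd --name "$name" "$image" /bin/bash
  elif [ "$(docker container inspect -f '{{.State.Running}}' "$name")" != "true" ]; then
    # The container exists but is stopped (for example, after a host reboot).
    docker start "$name"
  fi
}
```

Usage: ensure_container acl2025-heliosworkspace anonymous824/heliosworkspace, then docker exec -it acl2025-heliosworkspace /bin/bash to open a shell inside it. Checking the container state first means the helper can be rerun safely, whereas a second docker run with the same --name fails.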
- Download our images from Docker Hub
docker pull anonymous824/heliosworkspace:latest
docker pull anonymous824/heliosresources:latest
Create the HELIOS workspace and resources containers from the downloaded images.
- Docker run
docker run -itd --name acl2025-heliosworkspace anonymous824/heliosworkspace /bin/bash
- Docker start (only needed if the container has been stopped)
docker start acl2025-heliosworkspace
- Docker exec (open a shell in the container and activate the fm environment)
docker exec -it acl2025-heliosworkspace /bin/bash
conda activate fm
- Docker run
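docker run -itd --name acl2025-heliosresources anonymous824/heliosresources /bin/bash
- Docker start (only needed if the container has been stopped)
docker start acl2025-heliosresources
- Docker exec (open a shell in the container)
docker exec -it acl2025-heliosresources /bin/bash
- Download large language model (Llama 3.1 is gated on Hugging Face: accept its license and run huggingface-cli login first; HF_HUB_ENABLE_HF_TRANSFER=1 also requires the hf_transfer package)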
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir-use-symlinks False --local-dir /mnt/sdd/OTT-QAMountSpace/ModelCheckpoints/Ours/llm/Meta-Llama-3.1-8B-Instruct --exclude "*.pth"
- Create edge index
sh Algorithms/Ours/scripts/build_edge_index.sh
- Create table segment index
sh Algorithms/Ours/scripts/build_table_segment_index.sh
- Create passage index
sh Algorithms/Ours/scripts/build_passage_index.sh
- If tmux is not installed, install it
apt-get update && apt-get install -y tmux
- Load edge retriever (each Load step starts a service in its own tmux session; once it is running, detach with Ctrl-b d and continue in the original shell)
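Each Load step runs a loader script inside an interactive tmux session. As an alternative, sessions can be created already detached, and torn down without errors when a session was never started. The sketch below assumes bash, a conda installation with its standard etc/profile.d/conda.sh hook, and the same working directory as the manual commands (so that cd HELIOS resolves); launch_service and stop_services are hypothetical names, not part of the HELIOS scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the HELIOS repo.

# launch_service SESSION SCRIPT
# Starts SCRIPT in a new detached tmux session, inside the fm conda env and
# the HELIOS directory, unless a session with that name already exists.
launch_service() {
  local session="$1" script="$2"
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "tmux session '$session' is already running"
    return 0
  fi
  # A non-interactive shell needs conda's hook sourced before "conda activate";
  # the trailing "exec bash" keeps the pane open so output stays visible if SCRIPT exits.
  tmux new-session -d -s "$session" \
    ". \"\$(conda info --base)/etc/profile.d/conda.sh\" && conda activate fm && cd HELIOS && sh $script; exec bash"
}

# stop_services SESSION...
# Kills each named tmux session if it exists; a missing session is reported, not fatal.
stop_services() {
  local session
  for session in "$@"; do
    if tmux has-session -t "$session" 2>/dev/null; then
      tmux kill-session -t "$session"
      echo "killed $session"
    else
      echo "no session named $session"
    fi
  done
}
```

For example, launch_service edge_retriever Algorithms/Ours/scripts/load_edge_retriever.sh stands in for the four manual commands of this step, and stop_services edge_reranker node_scorer table_segment_retriever passage_retriever covers the later cleanup. Because the sessions start detached, service output is only visible after tmux attach -t SESSION.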
tmux new -s edge_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_edge_retriever.sh
- Load edge reranker
tmux new -s edge_reranker
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_edge_reranker.sh
- Run bipartite subgraph retrieval
sh Algorithms/Ours/scripts/run_edge_based_bipartite_subgraph_retrieval.sh
- Kill edge retriever session
tmux kill-session -t edge_retriever
- Load seed node scorer
tmux new -s node_scorer
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_seed_node_scorer.sh
- Load table segment retriever
tmux new -s table_segment_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_table_segment_retriever.sh
- Load passage retriever
tmux new -s passage_retriever
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_passage_retriever.sh
- Run query-relevant node expansion
sh Algorithms/Ours/scripts/run_query_relevant_node_expansion.sh
- Kill edge reranker, node scorer, and retriever sessions
tmux kill-session -t edge_reranker
tmux kill-session -t node_scorer
tmux kill-session -t table_segment_retriever
tmux kill-session -t passage_retriever
- Load large language model
tmux new -s llm
conda activate fm
cd HELIOS
sh Algorithms/Ours/scripts/load_llm.sh
- Run star-based LLM refinement
sh Algorithms/Ours/scripts/run_star_based_llm_refinement.sh
- Evaluate AnswerRecall@K
sh Algorithms/Ours/scripts/eval_answer_recall.sh
- Evaluate nDCG@K
sh Algorithms/Ours/scripts/eval_ndcg.sh
- Evaluate HITS@4K
sh Algorithms/Ours/scripts/eval_hits.sh
- Convert retrieval results into reader input
sh Algorithms/Ours/scripts/get_reader_input.sh
- Evaluate exact match (EM) and F1 score
sh Algorithms/Ours/scripts/eval_reading_accuracy.sh
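AnswerRecall@K and nDCG@K are computed by the evaluation scripts above. For reference, here is a minimal sketch of the standard per-query definitions, assuming graded relevance labels for nDCG (DCG@K is the sum of rel_i / log2(i + 1) over the top K ranks, normalized by the DCG@K of the same judgments sorted in descending order) and a case-insensitive literal string match for answer hits. AnswerRecall@K is then the mean hit value over all questions. The helper names are hypothetical, and the HELIOS scripts may normalize answers or build the ideal ranking differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the HELIOS repo; eval_ndcg.sh and
# eval_answer_recall.sh may define gains, cutoffs, and text matching differently.

# ndcg_at_k K rel_1 rel_2 ...
# rel_i is the graded relevance of the item at rank i (1-based).
ndcg_at_k() {
  local k="$1"; shift
  printf '%s\n' "$@" | awk -v k="$k" '
    { rel[NR] = $1 + 0 }
    END {
      n = NR
      # DCG@K of the ranking as given: sum of rel_i / log2(i + 1).
      dcg = 0
      for (i = 1; i <= n && i <= k; i++) dcg += rel[i] / (log(i + 1) / log(2))
      # Ideal ranking: the same judgments sorted in descending order (insertion sort).
      for (i = 2; i <= n; i++) {
        v = rel[i]; j = i - 1
        while (j >= 1 && rel[j] < v) { rel[j + 1] = rel[j]; j-- }
        rel[j + 1] = v
      }
      idcg = 0
      for (i = 1; i <= n && i <= k; i++) idcg += rel[i] / (log(i + 1) / log(2))
      ndcg = (idcg > 0) ? dcg / idcg : 0
      printf "%.4f\n", ndcg
    }'
}

# answer_hit_at_k K ANSWER FILE
# FILE holds one retrieved block per line in rank order; prints 1 if ANSWER
# occurs (case-insensitive, literal match) in the top K blocks, else 0.
answer_hit_at_k() {
  local k="$1" answer="$2" file="$3"
  if head -n "$k" "$file" | grep -qiF -- "$answer"; then
    echo 1
  else
    echo 0
  fi
}
```

For example, ndcg_at_k 2 0 1 prints 0.6309: the only relevant item sits at rank 2 instead of rank 1, so its gain is discounted by 1 / log2(3).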