mateusnishimura/summpaper

📄 summpaper

๐Ÿ“ Project Overview

This project explores an approach that combines RAG (Retrieval-Augmented Generation) and LLMs to generate structured summaries of research papers. The goal is to automatically extract the most relevant passages and organize the summary into the following sections:

  • Introduction
  • Context
  • Results
  • Conclusion
  • Relevance

โš™๏ธ How It Works

1๏ธโƒฃ Paper Vectorization โ†’ The document is divided into chunks (smaller segments) with overlap to ensure context retention. 2๏ธโƒฃ Information Retrieval โ†’ The system retrieves the most relevant excerpts based on predefined topics. 3๏ธโƒฃ Summary Generation โ†’ LLaMA 3 processes the retrieved excerpts and generates a structured and concise summary.

โ–ถ๏ธ Running the Script

The main script can be executed as follows:

python main.py --pdf_file data/sample.pdf --output_file output/summa.txt --database_dir chroma_db

🛠 Script Parameters

| Parameter | Description |
|---|---|
| `-pdf`, `--pdf_file` | Path to the input PDF file (required). |
| `-o`, `--output_file` | Path to save the extracted text (default: `./summa.txt`). |
| `-db`, `--database_dir` | Directory where ChromaDB will be stored (default: `./chroma_db`). |
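
As a rough guide to how these parameters might be declared, here is a sketch using Python's standard `argparse` module. The flag names and defaults are taken from the table above; the `build_parser` function name and the help strings are assumptions, and the real `main.py` may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the documented flags, not the actual main.py.
    parser = argparse.ArgumentParser(description="Summarize a research paper from a PDF.")
    parser.add_argument("-pdf", "--pdf_file", required=True,
                        help="Path to the input PDF file.")
    parser.add_argument("-o", "--output_file", default="./summa.txt",
                        help="Path to save the output text.")
    parser.add_argument("-db", "--database_dir", default="./chroma_db",
                        help="Directory where ChromaDB will be stored.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.pdf_file, args.output_file, args.database_dir)
```

With this layout, only `--pdf_file` has to be supplied; the other two options fall back to the defaults listed in the table.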

🔧 Future Improvements

  • 📌 Enhance section extraction, vectorizing chunks by section.
  • ⚡ Optimize LLM response time.
  • 🖼 Extract images from papers to enrich summaries.
