This project explores an approach that combines RAG (Retrieval-Augmented Generation) and LLMs to generate structured summaries of research papers. The goal is to automatically extract the most relevant passages and organize the summary into the following sections:
- Introduction
- Context
- Results
- Conclusion
- Relevance
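Since the summary sections double as retrieval targets, one way to represent them is a mapping from section names to retrieval queries. The query phrasings below are illustrative assumptions, not the project's actual prompts:

```python
# Hypothetical mapping from summary sections to retrieval queries.
# Section names come from the README; the query wording is an assumption.
SECTION_QUERIES = {
    "Introduction": "What problem does the paper address and why?",
    "Context": "What prior work and background does the paper build on?",
    "Results": "What are the main experimental results and findings?",
    "Conclusion": "What conclusions do the authors draw?",
    "Relevance": "Why are these findings significant for the field?",
}
```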
1. **Paper Vectorization**: The document is divided into chunks (smaller segments) with overlap to ensure context retention.
2. **Information Retrieval**: The system retrieves the most relevant excerpts based on predefined topics.
3. **Summary Generation**: LLaMA 3 processes the retrieved excerpts and generates a structured, concise summary.
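The chunking step above can be sketched as a simple fixed-size splitter with overlap. This is a minimal illustration, not the project's actual implementation (which may split on tokens or sentences rather than characters):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap`
    characters so context at chunk boundaries is not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk would then be embedded and stored in ChromaDB for the retrieval step.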
The main script can be executed as follows:
```shell
python main.py --pdf_file data/sample.pdf --output_file output/summa.txt --database_dir chroma_db
```

| Parameter | Description |
|---|---|
| `-pdf, --pdf_file` | Path to the input PDF file (required). |
| `-o, --output_file` | Path to save the generated summary (default: `./summa.txt`). |
| `-db, --database_dir` | Directory where ChromaDB will be stored (default: `./chroma_db`). |
- Enhance section extraction by vectorizing chunks per section.
- Optimize LLM response time.
- Extract images from papers to enrich summaries.