Convert PDFs into structured tree maps with LLMs, supporting English and Arabic, multiple providers, and JSON export.

AhmadHakami/PDF2TreeMap

PDF2TreeMap

Overview

PDF2TreeMap is a document analysis tool that uses LLMs to convert PDF documents into structured tree representations. It parses document sections and establishes hierarchical relationships between content elements, with support for both Arabic and English.

Features

  • Multi-language Support: Full support for Arabic and English document processing
  • Intelligent Document Parsing: Advanced PDF parsing capabilities for digital documents
  • LLM-powered Analysis: Utilizes generative models to understand document structure and relationships
  • Multiple AI Provider Support: Compatible with OpenAI, Groq, and Ollama models
  • Interactive Visualization:
    • Treemap visualization of document hierarchy
    • Outline view of document sections
    • JSON export functionality
  • Robust Architecture: Built following SOLID principles with comprehensive unit testing
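
To make the outline view and JSON export concrete, here is a minimal sketch of how a document hierarchy might be represented. The names `SectionNode`, `to_dict`, `outline`, and `export_json` are hypothetical and chosen for illustration; the project's actual schema is not documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One document section. Hypothetical structure, not the project's real schema."""
    title: str
    text: str = ""
    children: list["SectionNode"] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Recursively convert the tree into plain dicts and lists.
        return {
            "title": self.title,
            "text": self.text,
            "children": [child.to_dict() for child in self.children],
        }

    def outline(self, depth: int = 0) -> list[str]:
        # One line per section, indented two spaces per nesting level.
        lines = ["  " * depth + self.title]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


def export_json(root: SectionNode) -> str:
    # ensure_ascii=False keeps Arabic titles readable instead of \u escapes.
    return json.dumps(root.to_dict(), ensure_ascii=False, indent=2)


doc = SectionNode(
    "Report",
    children=[
        SectionNode("Introduction", text="Background and goals."),
        SectionNode("المنهجية", children=[SectionNode("Data")]),
    ],
)

print("\n".join(doc.outline()))
print(export_json(doc))
```

A nested structure like this could serve both views: `outline()` yields the section list, and the exported JSON is the kind of input a treemap renderer would typically consume.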

Todo List

  • Design parsing system for Arabic/English digital PDF files
  • Implement intelligent document chunking using language models
  • Integrate OpenAI, Groq, and Ollama model support
  • Build interactive treemap visualization
  • Create outline sections view and JSON export
  • Write comprehensive unit tests for all functions
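
As a rough sketch of how the planned multi-provider support could be kept swappable and unit-testable, the following uses a small interface plus a registry. Everything here (`LLMProvider`, `StubProvider`, `register`, `get_provider`) is an assumed design for illustration, not the project's code, and no real OpenAI, Groq, or Ollama client calls are shown.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Hypothetical interface that each provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Offline stand-in that returns a canned reply; useful in unit tests."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


# Registry mapping a provider name to a factory that builds it.
PROVIDERS: dict[str, Callable[[], LLMProvider]] = {}


def register(name: str, factory: Callable[[], LLMProvider]) -> None:
    PROVIDERS[name] = factory


def get_provider(name: str) -> LLMProvider:
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


# Real adapters would be registered the same way under their own names.
register("stub", lambda: StubProvider('{"title": "Untitled", "children": []}'))

provider = get_provider("stub")
print(provider.complete("Extract the section hierarchy of this text."))
```

With this shape, the parsing and visualization code depends only on `complete`, so tests can run against the stub without network access.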
