A simple AI tool that reads a PDF annual report, detects its basic structure, and answers questions about it.

VinNLP/FinXtract

📊 FinXtract: LLM-Powered Annual Report Insight Tool

FinXtract is an LLM-powered tool that turns annual reports into structured, searchable data without manual rule-writing.

🚀 What It Does

Upload a full annual report (PDF or text), and FinXtract will:

  • 🗂 Detect Document Structure
    Identify key sections (e.g. Chairman’s Statement, Risk Disclosure) and map them to page numbers

  • 📄 Extract Section Text
    Retrieve and display full text for each detected section

  • 🧾 Named Entity Recognition (NER)
    Automatically annotate entities like names, roles, organisations, and locations

  • ❓ Question Answering
    Ask natural-language questions about the company or report
    e.g. “Who is the chairman?”, “What risks are mentioned?”

  • 📈 Keyword Tracking & Chart Generation
    Analyse keyword trends and visualise them across the report
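
To make the structure-detection and keyword-tracking steps above concrete, here is a minimal sketch that works on a list of page texts. It is not FinXtract's own method: the tool relies on an LLM, whereas this stand-in uses plain substring and word matching so it runs without a model. The function names, the `SECTION_TITLES` list, and the sample pages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of headings to look for; FinXtract itself is described
# as finding sections with an LLM rather than a fixed list like this.
SECTION_TITLES = ["Chairman's Statement", "Risk Disclosure"]


def find_section_pages(pages, titles=SECTION_TITLES):
    """Map each title to the first 1-based page whose text contains it."""
    found = {}
    for number, text in enumerate(pages, start=1):
        lowered = text.lower()
        for title in titles:
            if title not in found and title.lower() in lowered:
                found[title] = number
    return found


def keyword_counts_per_page(pages, keywords):
    """Count case-insensitive, whole-word hits for single-word keywords."""
    wanted = sorted({k.lower() for k in keywords})
    rows = []
    for text in pages:
        words = Counter(re.findall(r"[a-z]+", text.lower()))
        rows.append({k: words[k] for k in wanted})
    return rows


# Made-up sample pages, for illustration only.
pages = [
    "Chairman's Statement\nGrowth was strong despite market risk.",
    "Risk Disclosure\nCredit risk and liquidity risk are monitored.",
]
print(find_section_pages(pages))
print(keyword_counts_per_page(pages, ["risk", "growth"]))
```

The per-page counts are already in a shape that a charting library could plot as one series per keyword.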

🧠 Powered By

  • 🔗 Large Language Models (LLMs)
  • 💬 Prompt Engineering
  • 🐍 Python + Streamlit (or Flask)
  • 📄 PDF/Text parsing (e.g. PyMuPDF)
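
As a rough illustration of how PDF parsing and prompt engineering might fit together, the sketch below reads page text with PyMuPDF and assembles a question-answering prompt from a section excerpt. The helper names, the prompt wording, and the `max_chars` limit are assumptions, not taken from the FinXtract code, and the call to an actual LLM is left out.

```python
def read_pdf_pages(path):
    """Return the plain text of each page of a PDF, using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the prompt helper works without it

    doc = fitz.open(path)
    try:
        return [page.get_text() for page in doc]
    finally:
        doc.close()


def build_qa_prompt(section_title, section_text, question, max_chars=4000):
    """Assemble a grounded QA prompt (hypothetical wording)."""
    excerpt = section_text[:max_chars]  # keep the prompt within a size budget
    return (
        "You are answering questions about a company's annual report.\n"
        "Use only the excerpt below. If the answer is not there, say so.\n\n"
        f"Section: {section_title}\n"
        f"Excerpt:\n{excerpt}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up section text, for illustration only.
prompt = build_qa_prompt(
    "Chairman's Statement",
    "Jane Doe, Chairman, reports a year of steady growth.",
    "Who is the chairman?",
)
print(prompt)
```

Restricting the model to a single section excerpt is one way to keep answers tied to the report; whether FinXtract does this is not stated here.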

💼 Use Cases

  • Financial and ESG analysts
  • Regulatory and audit teams
  • NLP researchers and students
  • Anyone working with complex corporate disclosures

🏁 Getting Started

git clone https://github.com/VinNLP/FinXtract.git
cd FinXtract
pip install -r requirements.txt
python app.py
