FinXtract is an LLM-powered tool that transforms annual reports into structured, searchable data, with no manual rule-writing.
Upload a full annual report (PDF or text), and FinXtract will:

- Detect Document Structure: identify key sections (e.g. Chairman's Statement, Risk Disclosure) and map them to page numbers.
- Extract Section Text: retrieve and display the full text of each detected section.
- Named Entity Recognition (NER): automatically annotate entities such as names, roles, organisations, and locations.
- Question Answering: ask natural-language questions about the company or the report, e.g. "Who is the chairman?" or "What risks are mentioned?"
- Keyword Tracking & Chart Generation: analyse keyword trends across the report and visualise them as charts.
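The structure-detection and section-extraction steps above can be illustrated with a simple stand-in. FinXtract uses an LLM for this; the sketch below swaps in plain heading matching so the idea is visible without a model. It assumes the report has already been split into a list of page texts, and the `SECTION_TITLES` list and the `map_sections` / `extract_sections` helpers are hypothetical.

```python
from typing import Dict, List

# Hypothetical section titles; real reports may word their headings differently.
SECTION_TITLES = ["Chairman's Statement", "Risk Disclosure", "Corporate Governance"]


def map_sections(pages: List[str], titles: List[str]) -> Dict[str, int]:
    """Map each title to the 1-based page where a line first matches it exactly."""
    found: Dict[str, int] = {}
    wanted = {title.lower(): title for title in titles}
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            # Normalise curly apostrophes and case before comparing with the titles.
            key = line.strip().replace("\u2019", "'").lower()
            title = wanted.get(key)
            if title is not None and title not in found:
                found[title] = page_no
    return found


def extract_sections(pages: List[str], section_pages: Dict[str, int]) -> Dict[str, str]:
    """Return each section's text, running from its start page up to the next section."""
    ordered = sorted(section_pages.items(), key=lambda item: item[1])
    sections: Dict[str, str] = {}
    for i, (title, start) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else len(pages)
        end = max(end, start)  # a section always covers at least its own page
        sections[title] = "\n".join(pages[start - 1:end])
    return sections
```

Because boundaries are whole pages, any text that precedes a heading on the same page is assigned to the section that starts there; an LLM-based detector can draw finer boundaries.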
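For the NER step, one way to use an LLM is to ask for entities as JSON and validate the reply before displaying it. The sketch below assumes that approach: the prompt wording, the label set, and the helpers `build_ner_prompt`, `parse_entities`, and `annotate` are hypothetical, and `call_llm` is a placeholder for whatever model client the app actually uses (no real API is called here).

```python
import json
from typing import Callable, Dict, List

# Entity types taken from the feature list; the exact label names are assumptions.
ENTITY_TYPES = ["PERSON", "ROLE", "ORGANISATION", "LOCATION"]


def build_ner_prompt(section_text: str) -> str:
    """Build a prompt asking the model to return entities as a JSON list."""
    return (
        "Extract named entities from the annual-report text below.\n"
        f"Allowed types: {', '.join(ENTITY_TYPES)}.\n"
        'Reply with JSON only, e.g. [{"text": "Jane Doe", "type": "PERSON"}].\n\n'
        f"Text:\n{section_text}"
    )


def parse_entities(raw_reply: str) -> List[Dict[str, str]]:
    """Parse the model's JSON reply, keeping only well-formed entities of allowed types."""
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    return [
        {"text": item["text"], "type": item["type"]}
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("text"), str)
        and item.get("type") in ENTITY_TYPES
    ]


def annotate(section_text: str, call_llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run NER on one section; call_llm is a hypothetical prompt -> reply function."""
    return parse_entities(call_llm(build_ner_prompt(section_text)))
```

Validating the reply matters because LLM output is not guaranteed to be valid JSON or to stick to the requested labels; anything malformed is dropped rather than shown to the user.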
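Question answering over a long report usually means sending the model only the most relevant sections. The sketch below swaps in a deliberately crude relevance score (word overlap between the question and each section) in place of whatever retrieval FinXtract uses; `select_context` and `build_qa_prompt` are hypothetical names, and the resulting prompt would then be sent to the LLM.

```python
import re
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    """Lower-cased word set used for a crude relevance score."""
    return set(re.findall(r"[a-z']+", text.lower()))


def select_context(question: str, sections: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank section titles by word overlap with the question and return the top_k."""
    q = _words(question)
    ranked = sorted(
        sections,
        key=lambda title: len(q & _words(title + " " + sections[title])),
        reverse=True,
    )
    return ranked[:top_k]


def build_qa_prompt(question: str, sections: Dict[str, str], top_k: int = 2) -> str:
    """Build a prompt that restricts the model to the selected report excerpts."""
    chosen = select_context(question, sections, top_k)
    context = "\n\n".join(f"[{title}]\n{sections[title]}" for title in chosen)
    return (
        "Answer the question using only the report excerpts below. "
        "If the answer is not in them, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Instructing the model to answer only from the excerpts, and to say when the answer is absent, is a common way to reduce answers that the report does not support.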
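Keyword tracking reduces to counting mentions per page and plotting the counts. This sketch assumes whole-word, case-insensitive matching and page-level bins; `keyword_counts` and `plot_keyword_trends` are hypothetical helpers, and the plotting function uses matplotlib, which may not be what the app itself uses for charts.

```python
import re
from typing import Dict, List


def keyword_counts(pages: List[str], keywords: List[str]) -> Dict[str, List[int]]:
    """Count whole-word, case-insensitive matches of each keyword on every page."""
    counts: Dict[str, List[int]] = {}
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        counts[kw] = [len(pattern.findall(page)) for page in pages]
    return counts


def plot_keyword_trends(counts: Dict[str, List[int]], out_path: str) -> None:
    """Draw one line per keyword with page number on the x-axis; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for kw, per_page in counts.items():
        ax.plot(range(1, len(per_page) + 1), per_page, marker="o", label=kw)
    ax.set_xlabel("Page")
    ax.set_ylabel("Mentions")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Escaping each keyword with `re.escape` lets multi-word or punctuated terms (e.g. "climate risk") be tracked without being misread as regular-expression syntax.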
Built with:

- Large Language Models (LLMs)
- Prompt engineering
- Python + Streamlit (or Flask)
- PDF/text parsing (e.g. PyMuPDF)
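For the PDF/text parsing step, PyMuPDF (imported as `fitz`) can return the text of each page, which provides the page numbers that section mapping relies on. This is a sketch: `load_pages` and `split_text_pages` are hypothetical helpers, and splitting plain-text reports on form-feed characters is an assumption about how page breaks are stored. Reading text files with an explicit UTF-8 encoding avoids depending on the platform's default encoding.

```python
from pathlib import Path
from typing import List


def split_text_pages(text: str) -> List[str]:
    """Split a plain-text report into pages on form-feed characters (assumed separator)."""
    return text.split("\f")


def load_pages(path: str) -> List[str]:
    """Return the report as a list of page texts, one string per page."""
    file = Path(path)
    if file.suffix.lower() == ".pdf":
        import fitz  # PyMuPDF; imported lazily so text files work without it

        with fitz.open(str(file)) as doc:
            return [page.get_text("text") for page in doc]
    return split_text_pages(file.read_text(encoding="utf-8"))
```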
Who it's for:

- Financial and ESG analysts
- Regulatory and audit teams
- NLP researchers and students
- Anyone working with complex corporate disclosures
To run FinXtract locally:

    git clone https://github.com/yourusername/finxtract.git
    cd finxtract
    pip install -r requirements.txt
    python app.py