yanhua-wang/FinancialReport_QnA

Financial Statement Q&A System

A Retrieval-Augmented Generation (RAG) system that enables natural language question-answering on financial statements.

Overview

This system allows users to:

  • Query financial statements using natural language
  • Get detailed answers about specific companies' financial data
  • Navigate through different years of financial reports
  • Access information through an intuitive web interface

Installation

  1. Clone the repository:
git clone https://github.com/yanhua-wang/FinancialReport_QnA.git
cd FinancialReport_QnA
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Set up your Together API key:
    • Sign up for a Together API key at https://www.together.ai
    • Create a .env file in the project root directory
    • Add your API key to the .env file: TOGETHER_API_KEY=your_api_key_here
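To check that the key from step 3 is visible before starting the app, a minimal sketch such as the following may help. It uses only the standard library and assumes the .env file holds plain KEY=VALUE lines; the helper names `parse_env_text` and `load_api_key` are hypothetical and are not part of this repository, which may load the key differently.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path=".env"):
    """Return TOGETHER_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("TOGETHER_API_KEY")
    if not key and os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            key = parse_env_text(handle.read()).get("TOGETHER_API_KEY")
    if not key:
        raise RuntimeError("TOGETHER_API_KEY is not set; add it to .env")
    return key
```

Calling `load_api_key()` from the project root raises a clear error when the key is missing, rather than failing later during a query.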

Usage

Downloading and Preparing Financial Data

Before running the application or building the RAG index, you need to download the 10-K financial statements for the supported companies and years. These statements must then be organized into a specific directory structure that the system expects.

Supported Tickers: The tickers listed in app.py (e.g., AAPL, AEE, BA, CMCSA, CNP, CRL, D, ED, HWM, VRSN). To change the list, edit sampled_tickers.txt.

Supported Years: The years listed in app.py (e.g., 2010 through 2019). To change the range, edit data_downloading.py.

  1. Important: Before running data_downloading.py, you must provide your email address in the EMAIL variable within the data_downloading.py script. This is required by the SEC EDGAR Downloader.
  2. Run data_downloading.py, data_processing.py, and vector_store_construction.py in this order to download, process, and index the financial data.
    python data_downloading.py
    python data_processing.py
    python vector_store_construction.py
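Because each script depends on the output of the previous one, the three commands above can also be wrapped in a small driver that stops at the first failure. This is a sketch only: `run_pipeline` is a hypothetical helper, not something shipped with the repository, and it assumes the scripts are run from the project root with no extra arguments.

```python
import subprocess
import sys

# Script order taken from the steps above: download, process, then index.
PIPELINE = [
    "data_downloading.py",
    "data_processing.py",
    "vector_store_construction.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code."""
    completed = []
    for script in scripts:
        # Use the current interpreter so the active environment is reused.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Calling `run_pipeline()` after setting EMAIL in data_downloading.py runs all three stages. The `runner` parameter exists only so the function can be exercised without actually launching the scripts.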

Running the Streamlit App

  1. Start the Streamlit application:
streamlit run app.py
  2. Open your web browser and navigate to the URL shown in the terminal

  3. Using the app:

    • Select a company ticker from the dropdown menu
    • Choose the year of the financial report
    • Enter your question in natural language
    • Click "Submit" to get your answer

Example Questions

You can ask questions like:

  • "What was the company's revenue in 2018?"
  • "How much did the company spend on R&D?"
  • "What were the major risks identified in the annual report?"
  • "What is the company's current debt-to-equity ratio?"

Technologies Used

This project leverages several key technologies and frameworks:

Core Technologies

  • LlamaIndex: Used for building the RAG (Retrieval-Augmented Generation) system
  • ChromaDB: Vector database for storing and retrieving document embeddings
  • Streamlit: Web application framework for the user interface
  • Together AI: LLM API for generating responses to user queries
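To illustrate how these pieces fit together, here is a toy retrieve-then-prompt loop in plain Python. It is not the project's implementation: word-count vectors stand in for the real embedding model, an in-memory list stands in for ChromaDB, and the prompt is only built, not sent to Together AI. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=2):
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the real system, the vector database performs the similarity search over stored embeddings and the LLM receives the assembled prompt; the sketch only shows the order of operations.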

Key Components

  • Embedding Model: sentence-transformers/multi-qa-MiniLM-L6-cos-v1 for document embeddings
  • LLM Model: meta-llama/Llama-3-70b-chat-hf for generating responses
  • SEC EDGAR Downloader: For fetching financial statements and reports
  • BeautifulSoup: For parsing and cleaning HTML content from financial documents
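As a rough picture of the HTML cleaning step, the sketch below strips tags, scripts, and styles from a filing and collapses whitespace. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies; the class and function names are hypothetical, and the cleaning rules in data_processing.py may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join fragments, then collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    """Return the whitespace-normalized visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Net  sales</p><p>were $10</p>")` yields a single clean line of text that can then be split into chunks for embedding.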

Note

Querying the current dataset through the Streamlit app runs efficiently on a local CPU. Fully replicating the project requires GPU access. Contact the repository owner if you would like sample data.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
