Financial Statement Q&A System

A Retrieval-Augmented Generation (RAG) system that enables natural language question-answering on financial statements.

Overview

This system allows users to:

Query financial statements using natural language
Get detailed answers about specific companies' financial data
Navigate through different years of financial reports
Access information through an intuitive web interface

Installation

Clone the repository:

git clone https://github.com/yanhua-wang/FinancialReport_QnA.git
cd FinancialReport_QnA

Install the required dependencies:

pip install -r requirements.txt

Set up your Together API key:
- Sign up for a Together API key at https://www.together.ai
- Create a .env file in the project root directory
- Add your API key to the .env file: TOGETHER_API_KEY=your_api_key_here
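As a minimal sketch of how the key might be picked up at startup, the snippet below parses a .env file with the standard library only. The helper names `load_env_file` and `get_together_api_key` are hypothetical and not taken from this repository; the project may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_together_api_key(path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("TOGETHER_API_KEY") or load_env_file(path).get("TOGETHER_API_KEY")
    if not key:
        raise RuntimeError("TOGETHER_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the LLM client.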

Usage

Downloading and Preparing Financial Data

Before running the application or building the RAG index, you need to download the 10-K financial statements for the supported companies and years. These statements must then be organized into a specific directory structure that the system expects.

Supported Tickers: The tickers listed in app.py (e.g., AAPL, AEE, BA, CMCSA, CNP, CRL, D, ED, HWM, VRSN). To change them, edit sampled_tickers.txt.

Supported Years: The years listed in app.py (e.g., 2010 through 2019). To change them, edit data_downloading.py.
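To illustrate how the ticker and year settings combine, here is a rough sketch that enumerates every (ticker, year) pair to download. It assumes sampled_tickers.txt holds one ticker per line, which is a guess about the file format; `load_tickers` and `filing_targets` are hypothetical names, and the year range simply mirrors the example above.

```python
from itertools import product
from pathlib import Path

# Years mirror the example range in this README (2010 through 2019).
YEARS = range(2010, 2020)


def load_tickers(path="sampled_tickers.txt"):
    """Read tickers from a text file, assuming one symbol per line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip().upper() for line in lines if line.strip()]


def filing_targets(tickers, years=YEARS):
    """Return every (ticker, year) pair the pipeline would need to cover."""
    return list(product(tickers, years))
```

With ten tickers and ten years, this yields 100 filings to download and index.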

Important: Before running data_downloading.py, set the EMAIL variable in that script to your email address. The SEC EDGAR Downloader requires it.

Then run data_downloading.py, data_processing.py, and vector_store_construction.py in this order to download, process, and index the financial data:
```
python data_downloading.py
python data_processing.py
python vector_store_construction.py
```
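If you prefer a single entry point, the three steps above can be chained from Python. This is only a sketch: `run_pipeline` is a hypothetical helper, not part of the repository, and the `runner` parameter exists solely so the function can be exercised without launching real processes.

```python
import subprocess
import sys

# Script order taken from the instructions above.
PIPELINE = [
    "data_downloading.py",
    "data_processing.py",
    "vector_store_construction.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run against incomplete data.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the project root runs all three stages with the current Python interpreter.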

Running the Streamlit App

Start the Streamlit application:

streamlit run app.py

Open your web browser and navigate to the URL shown in the terminal.

Using the app:
- Select a company ticker from the dropdown menu
- Choose the year of the financial report
- Enter your question in natural language
- Click "Submit" to get your answer
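One plausible way the ticker and year selections could narrow retrieval is a metadata filter on the vector store. The sketch below builds a filter in the `where` syntax that ChromaDB accepts; the metadata field names `ticker` and `year` and the helper `build_metadata_filter` are assumptions, since this README does not describe how documents are tagged.

```python
def build_metadata_filter(ticker, year):
    """Build a ChromaDB-style `where` filter for one company and report year."""
    if not ticker or not ticker.strip():
        raise ValueError("ticker must be a non-empty string")
    return {
        "$and": [
            # Field names are hypothetical; match them to the indexed metadata.
            {"ticker": {"$eq": ticker.strip().upper()}},
            {"year": {"$eq": int(year)}},
        ]
    }
```

Restricting the search to one filing keeps chunks from other companies or years out of the context passed to the LLM.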

Example Questions

You can ask questions like:

"What was the company's revenue in 2018?"
"How much did the company spend on R&D?"
"What were the major risks identified in the annual report?"
"What is the company's current debt-to-equity ratio?"

Technologies Used

This project leverages several key technologies and frameworks:

Core Technologies

LlamaIndex: Used for building the RAG (Retrieval-Augmented Generation) system
ChromaDB: Vector database for storing and retrieving document embeddings
Streamlit: Web application framework for the user interface
Together AI: LLM API for generating responses to user queries
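To make the retrieve-then-generate flow concrete, here is a toy sketch in plain Python. It is not the project's implementation, which uses LlamaIndex and ChromaDB; the function names, the hand-written two-dimensional vectors, and the prompt wording are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=2):
    """Return the top_k chunk texts ranked by similarity to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return [c["text"] for c in ranked[:top_k]]


def build_prompt(question, contexts):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )
```

In the real system, an embedding model produces the vectors, the vector database performs the ranking, and the assembled prompt is sent to the LLM API.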

Key Components

Embedding Model: sentence-transformers/multi-qa-MiniLM-L6-cos-v1 for document embeddings
LLM: meta-llama/Llama-3-70b-chat-hf for generating responses
SEC EDGAR Downloader: For fetching financial statements and reports
BeautifulSoup: For parsing and cleaning HTML content from financial documents
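As a rough illustration of the HTML cleaning step, the sketch below strips markup from a filing and keeps only visible text. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra dependencies; `TextExtractor` and `html_to_text` are hypothetical names, and the project's actual cleaning rules may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed markup.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Cleaning before chunking keeps tags and inline scripts out of the embeddings, which would otherwise add noise to retrieval.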

Note

The Streamlit app runs efficiently on a local CPU for querying the current dataset. To fully replicate the project, GPU access is required. Reach out to me if you would like sample data.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

