Stock Price Forecasting with Transformer & Temporal Fusion Transformer (TFT)

This project implements two state-of-the-art deep learning architectures for short-term stock price forecasting:

Transformer (TensorFlow/Keras)
Temporal Fusion Transformer (TFT) (PyTorch)

It supports multiple stock symbols, customizable hyperparameters, automatic technical indicator generation, and detailed evaluation metrics.

Features

Dual-Model Support → Transformer (fast & accurate) and TFT (interpretable, multi-horizon forecasting)
Configurable Parameters → Sequence length, features, batch size, learning rate, patience
Automatic Mixed Precision (AMP) → Faster GPU training with reduced memory usage
Dynamic Close Index Detection → The position of the Close column is looked up at runtime, so there is no hardcoded target index
Rich Feature Engineering → SMA, EMA, MACD, RSI, Bollinger Bands
Complete Evaluation → MAE, RMSE, MAPE, R², Directional Accuracy, Sharpe Ratio, Total Return
Live Demo Script (demo.py) → Short training and inference run with reduced parameters, without a full training run
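
As a rough illustration of the indicators listed under Rich Feature Engineering, the sketch below computes them with pandas. The function name `add_indicators`, the output column names, and the window lengths (14 for SMA/EMA/RSI, 12/26/9 for MACD, 20 days and 2 standard deviations for Bollinger Bands) are conventional defaults assumed here; they are not taken from stock_preprocessor.py.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Append SMA, EMA, MACD, RSI and Bollinger Bands computed from 'Close'.

    Hypothetical helper: names and window sizes are assumptions.
    """
    out = df.copy()
    close = out["Close"]

    # Simple and exponential moving averages over the same window.
    out["SMA"] = close.rolling(window).mean()
    out["EMA"] = close.ewm(span=window, adjust=False).mean()

    # MACD: fast EMA minus slow EMA, plus a 9-period signal line.
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    out["MACD"] = macd
    out["MACD_signal"] = macd.ewm(span=9, adjust=False).mean()

    # RSI from average gains and losses (simple rolling-mean variant,
    # not Wilder's smoothing).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)

    # Bollinger Bands: 20-day SMA plus/minus 2 standard deviations.
    mid = close.rolling(20).mean()
    std = close.rolling(20).std()
    out["BB_upper"] = mid + 2 * std
    out["BB_lower"] = mid - 2 * std

    # Drop warm-up rows where the rolling windows are not yet full.
    return out.dropna()
```

The first rows are dropped because the longest rolling window (20 days) has no value until it is full, so the model only ever sees complete feature rows.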

Installation

Prerequisites

Python 3.8+
pip

Setup

# 1. Clone the repository
git clone <repo-url>
cd final-project

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate      # Linux/Mac
venv\Scripts\activate       # Windows

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

Troubleshooting:

If python doesn't work, try python3
If pip fails, try pip3 or add --user flag
On macOS, you might need: xcode-select --install

Usage

Train & Evaluate

Transformer (AAPL example)

python main.py --model transformer --symbol AAPL --sequence-length 60 --batch-size 256 --epochs 50 --learning-rate 1e-4 --patience 7

TFT (NVDA example)

python main.py --model tft --symbol NVDA --sequence-length 60 --batch-size 256 --epochs 50 --learning-rate 3e-4 --patience 7

Key CLI Arguments

Argument	Description	Example
`--model`	Model type (`transformer` / `tft`)	`--model transformer`
`--symbol`	Stock ticker	`--symbol AAPL`
`--sequence-length`	Lookback window size	`--sequence-length 60`
`--prediction-horizon`	Days ahead to predict	`--prediction-horizon 1`
`--epochs`	Training epochs	`--epochs 50`
`--batch-size`	Training batch size	`--batch-size 256`
`--learning-rate`	Learning rate	`--learning-rate 1e-4`
`--patience`	Early stopping patience	`--patience 7`
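
For orientation, the flags in the table could be declared with argparse roughly as follows. This is a sketch, not the parser in main.py: the function name `build_parser` is hypothetical, and the defaults simply mirror the example values shown above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags from the table (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Train and evaluate a forecasting model")
    parser.add_argument("--model", choices=["transformer", "tft"], default="transformer")
    parser.add_argument("--symbol", default="AAPL")
    parser.add_argument("--sequence-length", type=int, default=60)
    parser.add_argument("--prediction-horizon", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--batch-size", type=int, default=256)
    parser.add_argument("--learning-rate", type=float, default=1e-4)
    parser.add_argument("--patience", type=int, default=7)
    return parser


# argparse turns dashes into underscores: --sequence-length -> args.sequence_length
args = build_parser().parse_args(
    ["--model", "tft", "--symbol", "NVDA", "--learning-rate", "3e-4"]
)
print(args.model, args.symbol, args.sequence_length, args.learning_rate)
```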

Quick Demo

Run a short demo with reduced parameters:

python demo.py --model transformer --symbol AAPL
python demo.py --model tft --symbol NVDA

Run only data processing (no training):

python demo.py --data-only --symbol TSLA

Final Results (Interim → Final)

Model	Symbol	R² Score (Interim)	R² Score (Final)	MAE (Final)	RMSE (Final)
Transformer	AAPL	-0.4468	0.9066	0.079	0.101
TFT	NVDA	0.0707	0.1257	0.602	0.754

Transformer → Major accuracy improvement after efficiency optimizations.
TFT → Minor gains; requires further hyperparameter tuning.
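
To make the reported numbers easier to interpret, here is a minimal NumPy sketch of MAE, RMSE, MAPE, R² and directional accuracy. The function name `regression_metrics` and the exact definition of directional accuracy are assumptions, not the project's evaluation code. Sharpe Ratio and Total Return are left out because they depend on a trading rule that this README does not specify.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return common forecast metrics (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100  # assumes y_true has no zeros

    # R^2 compares the model against always predicting the mean of y_true.
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot

    # Directional accuracy (assumed definition): does the predicted move from
    # yesterday's actual value have the same sign as the actual move?
    true_dir = np.sign(np.diff(y_true))
    pred_dir = np.sign(y_pred[1:] - y_true[:-1])
    directional_accuracy = np.mean(true_dir == pred_dir)

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2,
            "DirectionalAccuracy": directional_accuracy}


print(regression_metrics([100, 102, 101, 105], [101, 101, 102, 104]))
```

R² is 0 for a model that always predicts the mean of the targets and negative for one that does worse than that, which is how a negative interim score such as -0.4468 should be read.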

Project Structure

final-project-main/
├── main.py               # Main training & evaluation pipeline
├── trainer.py            # PyTorch training loop for TFT
├── transformer.py        # Transformer model
├── tft.py                # Temporal Fusion Transformer model
├── stock_preprocessor.py # Data preprocessing & feature engineering
├── demo.py               # Quick demo script
├── requirements.txt      # Python dependencies
├── results/              # Metrics and reports
├── plots/                # Generated plots
└── models/               # Saved model checkpoints

Model Architectures

Transformer

[Input Sequence] → Input Projection → Positional Encoding
→ Multi-Head Attention → Layer Norm → Feed-Forward → Layer Norm
→ Global Average Pooling → Dense → Output

TFT

Static Variable Encoder + Historical Encoder
→ Multi-Head Attention + LSTM Layers
→ Temporal Decoder → Output Layer

Below is a detailed, end-to-end breakdown of the data and model architecture for both the Transformer and TFT implementations.

End-to-End Workflow

Both models follow the same initial data processing pipeline:

Data Fetching:
- Input: Stock Symbol (e.g., AAPL), Start Date, End Date.
- Action: Use the yfinance library to download historical daily data (Open, High, Low, Close, Volume).
- Output: A raw pandas.DataFrame.
Feature Engineering & Preprocessing:
- Input: The raw DataFrame.
- Action:
  - Calculate a rich set of technical indicators (SMA, EMA, MACD, RSI, Bollinger Bands, Volatility).
  - Select the features to be used in the model (e.g., Close, Volume, RSI).
  - Scale all selected features to a common range using StandardScaler or MinMaxScaler.
- Output: A scaled DataFrame of engineered features.
Sequence Creation:
- Input: The scaled feature DataFrame.
- Action: Create overlapping time-series sequences. For each sample, the input (X) is a window of the last N days (e.g., 60 days), and the target (y) is the 'Close' price of the following day.
- Output: (X_train, y_train), (X_val, y_val), and (X_test, y_test) NumPy arrays ready for training.
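
The Sequence Creation step can be sketched in NumPy as below. The helper names (`make_sequences`, `chronological_split`), the 70/15/15 split, and the toy feature names are assumptions for illustration. The README does not state how the project splits its data; a chronological split is used here because it keeps validation and test samples later in time than the training samples.

```python
import numpy as np


def make_sequences(features, close_idx, seq_len=60, horizon=1):
    """Slide a seq_len-day window over the rows; the target is Close `horizon` days later."""
    features = np.asarray(features, dtype=float)
    n_windows = len(features) - seq_len - horizon + 1
    X = [features[s:s + seq_len] for s in range(n_windows)]
    y = [features[s + seq_len + horizon - 1, close_idx] for s in range(n_windows)]
    return np.array(X), np.array(y)


def chronological_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split in time order so validation and test data come after the training data."""
    n_train = int(len(X) * train_frac)
    n_val = int(len(X) * val_frac)
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))


# Toy data: 300 days, 3 already-scaled features. Column names are illustrative.
feature_names = ["Volume", "Close", "RSI"]
close_idx = feature_names.index("Close")  # look the target column up instead of hardcoding it
data = np.random.default_rng(0).normal(size=(300, len(feature_names)))

X, y = make_sequences(data, close_idx)
train, val, test = chronological_split(X, y)
print(X.shape, y.shape)  # (240, 60, 3) (240,)
```

Consecutive windows overlap by 59 days, so 300 rows yield 300 - 60 - 1 + 1 = 240 samples.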

The diagram in end-to-end.png illustrates the complete end-to-end architecture of the project, from data ingestion to final evaluation.

1. Transformer (TensorFlow/Keras) Architecture

The Transformer model processes the sequences to capture temporal patterns and predict the change in the next day's stock price.

[Input Sequence (Batch, 60, 7)]
           │
           ▼
┌──────────────────────────┐
│  Input Projection (Dense)│  (Projects 7 features to 128)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Positional Encoding     │  (Adds time-step information)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Transformer Encoder x4  │
│  ┌─────────────────────┐ │
│  │ Multi-Head Attention│ │  (Learns relationships across the 60 days)
│  └─────────────────────┘ │
│  ┌─────────────────────┐ │
│  │ Feed-Forward Network│ │
│  └─────────────────────┘ │
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│ Global Average Pooling   │  (Condenses the sequence into one vector)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Output Head (Dense)     │  (Outputs a single value: the predicted price *delta*)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Final Price Calculation │  (Last Close Price + Predicted Delta)
└──────────────────────────┘
           │
           ▼
  [Predicted Stock Price]
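
To make the tensor shapes in the diagram concrete, the following NumPy walk-through pushes a random batch through a heavily simplified version of the pipeline. It is not the Keras model in transformer.py: the weights are random and untrained, a sinusoidal positional encoding is assumed, a single attention head with identity projections stands in for the four encoder blocks, and layer normalization and the feed-forward network are omitted. Treating feature index 0 as Close is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
BATCH, SEQ_LEN, N_FEATURES, D_MODEL = 4, 60, 7, 128


def sinusoidal_encoding(seq_len, d_model):
    """Fixed sine/cosine position signal (assumed encoding scheme)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


x = rng.normal(size=(BATCH, SEQ_LEN, N_FEATURES))         # (4, 60, 7)
W_in = rng.normal(scale=0.1, size=(N_FEATURES, D_MODEL))
h = x @ W_in                                              # input projection: (4, 60, 128)
h = h + sinusoidal_encoding(SEQ_LEN, D_MODEL)             # add time-step information
attended, attn_weights = self_attention(h)                # (4, 60, 128), (4, 60, 60)
h = h + attended                                          # residual connection
pooled = h.mean(axis=1)                                   # global average pooling: (4, 128)
w_out = rng.normal(scale=0.1, size=(D_MODEL, 1))
delta = (pooled @ w_out).squeeze(-1)                      # predicted delta: (4,)

last_close = x[:, -1, 0]                                  # assumes Close is feature 0
predicted_price = last_close + delta                      # final price calculation
print(h.shape, attn_weights.shape, pooled.shape, predicted_price.shape)
```

Each row of `attn_weights` sums to 1 and describes how strongly one day attends to each of the 60 days in the window.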

2. Temporal Fusion Transformer (TFT) (PyTorch) Architecture

The TFT model uses specialized components like variable selection networks and an LSTM to interpret the time-series data.

[Input Sequence (Batch, 60, 7)]
           │
           ▼
┌──────────────────────────┐
│ Variable Selection Net   │  (Learns which features are important at each time step)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  LSTM Encoder            │  (Processes the weighted features sequentially)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Multi-Head Attention    │  (Focuses on the most relevant past time steps)
│  - Query: LSTM State     │
│  - Key/Value: LSTM Output│
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│ Gated Residual Network   │  (Post-attention processing)
└──────────────────────────┘
           │
           ▼
┌──────────────────────────┐
│  Output Head (Linear)    │  (Outputs a single value: the predicted price)
└──────────────────────────┘
           │
           ▼
  [Predicted Stock Price]
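
Two of the TFT building blocks named in the diagram can be illustrated with a small NumPy sketch. This is a simplification under stated assumptions, not the PyTorch code in tft.py: variable selection is reduced to one linear layer followed by a softmax, the gated residual network uses a single sigmoid gate, all weights are random, and the LSTM encoder and attention layers are skipped, so the gated residual block is applied directly to the selected features.

```python
import numpy as np

rng = np.random.default_rng(0)
BATCH, SEQ_LEN, N_FEATURES, HIDDEN = 4, 60, 7, 16


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def variable_selection(x, W_sel):
    """Per-time-step feature weights that sum to 1, applied to the inputs."""
    weights = softmax(x @ W_sel)          # (batch, seq_len, n_features)
    return x * weights, weights


def gated_residual(x, W1, W2, W_gate, eps=1e-6):
    """Simplified gated residual network: skip connection + gated update + layer norm."""
    a = x @ W1
    hidden = np.where(a > 0, a, np.expm1(np.minimum(a, 0.0)))  # ELU activation
    update = hidden @ W2
    gate = 1.0 / (1.0 + np.exp(-(hidden @ W_gate)))            # sigmoid gate in (0, 1)
    out = x + gate * update                                    # gate controls how much is added
    mean = out.mean(axis=-1, keepdims=True)
    var = out.var(axis=-1, keepdims=True)
    return (out - mean) / np.sqrt(var + eps)                   # layer normalization


x = rng.normal(size=(BATCH, SEQ_LEN, N_FEATURES))
W_sel = rng.normal(scale=0.5, size=(N_FEATURES, N_FEATURES))
selected, weights = variable_selection(x, W_sel)

W1 = rng.normal(scale=0.5, size=(N_FEATURES, HIDDEN))
W2 = rng.normal(scale=0.5, size=(HIDDEN, N_FEATURES))
W_gate = rng.normal(scale=0.5, size=(HIDDEN, N_FEATURES))
refined = gated_residual(selected, W1, W2, W_gate)
print(weights.shape, refined.shape)  # (4, 60, 7) (4, 60, 7)
```

Averaging `weights` over the batch and time axes gives one importance score per feature, which is the kind of output that makes variable selection useful for interpretation.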

Acknowledgements

Yahoo Finance for data
PyTorch, TensorFlow/Keras for model implementation
Original TFT paper: Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
