This project focuses on time-series-based stock prediction using Auto-Regression (AR) and Auto-ARIMA models.
Originally developed in R, it has been converted into Python with equivalent functionality for reproducibility and deployment.
The goal is to predict future stock closing prices using past historical values and provide insights into stock performance across multiple companies.
- Data Preprocessing:
  - Handling missing values (NaN)
  - Checking normality (Shapiro-Wilk test, boxplot, histogram, density plot, Q-Q plot)
  - Outlier detection (IQR method)
  - Stationarity checks (rolling mean/std, Augmented Dickey-Fuller test)
  - Differencing for non-stationary series
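
As a rough illustration of two of the preprocessing steps above (IQR outlier bounds and first-order differencing), here is a minimal standard-library sketch. The function names `iqr_bounds` and `difference`, the 1.5 multiplier, and the sample prices are assumptions for illustration; they are not taken from the project's `utils.py`.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def difference(values, lag=1):
    """Lagged differences y[t] - y[t-lag], commonly used to remove a trend."""
    return [later - earlier for earlier, later in zip(values, values[lag:])]


prices = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one obvious outlier
low, high = iqr_bounds(prices)
outliers = [p for p in prices if p < low or p > high]
print(low, high, outliers)

# A steadily rising series becomes a series of day-to-day changes.
print(difference([10.0, 12.0, 15.0, 19.0]))
```

Points outside the fences are only flagged here; whether to drop, cap, or keep them is a separate modelling decision.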
- Exploratory Data Analysis (EDA):
  - Trend, seasonal, and residual decomposition
  - Visualization of ACF (Autocorrelation) and PACF (Partial Autocorrelation) plots
  - Statistical checks for time-series stationarity
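
To make the ACF item above concrete, the following sketch computes the sample autocorrelation that an ACF plot displays. It is a plain-Python illustration under the usual biased-estimator definition; the function name `acf` and the toy series are assumptions, and the project itself may rely on a library routine instead.

```python
def acf(values, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    r_k = sum_{t=k}^{n-1} (y_t - mean)(y_{t-k} - mean) / sum_t (y_t - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


r = acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)  # toy trending series
print(r)
```

A slowly decaying ACF is one informal hint that a series is non-stationary and may need differencing.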
- Model Building:
  - p-th order Auto-Regression Analysis (manual AR models)
  - Auto-ARIMA for automatic (p, d, q) order selection
  - ARIMA model training with expanding-window (one-step-ahead) forecasting
  - Multi-company analysis across 12 stocks:
    - Apple, TCS, Tesla, Dr Reddy’s Lab, Abbott, IBM, Nvidia, Google, Accenture, Microsoft, Amazon, HP
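
The expanding-window idea listed above can be sketched with a hand-rolled AR(p) model fitted by ordinary least squares. This is a simplified stand-in, not the project's `modeling.py`: the function names, the OLS fit (rather than a full ARIMA estimator), and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def fit_ar(series, p):
    """Fit AR(p) with intercept by least squares; returns [c, phi_1, ..., phi_p]."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Each row of X is [1, y[t-1], ..., y[t-p]] for the target y[t], t = p..n-1.
    columns = [np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def predict_next(history, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(history[len(history) - p:], dtype=float)[::-1]  # newest first
    return float(coef[0] + coef[1:] @ recent)


def expanding_window_forecast(series, p, n_train):
    """Refit on all data before day t, predict day t, then grow the window."""
    preds = []
    for t in range(n_train, len(series)):
        history = series[:t]
        coef = fit_ar(history, p)
        preds.append(predict_next(history, coef))
    return preds


# Synthetic, noise-free AR(1) series: y[t] = 2 + 0.5 * y[t-1].
series = [0.0]
for _ in range(19):
    series.append(2.0 + 0.5 * series[-1])

preds = expanding_window_forecast(series, p=1, n_train=10)
max_err = max(abs(a - b) for a, b in zip(series[10:], preds))
print(len(preds), max_err)
```

Because each forecast only sees data up to the previous day, the evaluation mimics how the model would be used in practice, at the cost of refitting at every step.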
- Evaluation:
  - Accuracy metrics: RMSE, MAE, MAPE, ME, MPE
  - Covariance matrices for lagged features
  - Error comparisons across different p values for individual companies
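
For reference, the five accuracy metrics named above can be computed as in the sketch below. It assumes the common convention error = actual - predicted and percentage errors relative to the actual value; the function name `accuracy` and the example numbers are illustrative, and the project's own helper may differ in naming or sign convention.

```python
from math import sqrt


def accuracy(actual, predicted):
    """Return ME, RMSE, MAE, MPE and MAPE for paired observations."""
    errors = [a - p for a, p in zip(actual, predicted)]
    # Percentage errors are undefined when an actual value is 0.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,                        # mean error (bias)
        "RMSE": sqrt(sum(e * e for e in errors) / n),  # root mean squared error
        "MAE": sum(abs(e) for e in errors) / n,        # mean absolute error
        "MPE": sum(pct) / n,                           # mean percentage error
        "MAPE": sum(abs(x) for x in pct) / n,          # mean absolute percentage error
    }


metrics = accuracy([100.0, 200.0], [110.0, 190.0])  # made-up values
print(metrics)
```

ME and MPE keep the sign of the errors, so they reveal systematic over- or under-prediction that RMSE, MAE, and MAPE hide.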
- Outputs:
  - Predicted vs Actual plots
  - Stationarity diagnostics
  - Accuracy JSON reports
  - Covariance CSVs
  - Multi-company forecasts
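
As one possible shape for the JSON and CSV outputs listed above, the sketch below builds a covariance matrix of lagged features and writes both files. The file names `accuracy.json` and `covariance.csv`, the `lag_k` column headers, and the metric value are hypothetical; the actual scripts may lay out their output differently.

```python
import csv
import json
import tempfile
from pathlib import Path


def lagged_covariance(series, p):
    """Sample covariance matrix of the columns [y_t, y_{t-1}, ..., y_{t-p}]."""
    n = len(series)
    cols = [series[p - k:n - k] for k in range(p + 1)]
    m = len(cols[0])
    means = [sum(c) / m for c in cols]
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / (m - 1)
            for j in range(p + 1)
        ]
        for i in range(p + 1)
    ]


def write_reports(outdir, metrics, cov):
    """Write metrics as JSON and the covariance matrix as CSV (hypothetical layout)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "accuracy.json").write_text(json.dumps(metrics, indent=2))
    with open(out / "covariance.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([f"lag_{k}" for k in range(len(cov))])
        writer.writerows(cov)
    return out


with tempfile.TemporaryDirectory() as tmp:
    cov = lagged_covariance([1.0, 2.0, 3.0, 4.0, 5.0], p=1)
    out = write_reports(Path(tmp) / "apple", {"RMSE": 0.5}, cov)  # 0.5 is a placeholder
    loaded = json.loads((out / "accuracy.json").read_text())
    with open(out / "covariance.csv", newline="") as fh:
        rows = list(csv.reader(fh))

print(loaded, rows[0])
```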
https://www.kaggle.com/datasets/minatverma/nse-stocks-data
- Python stack:
  - pandas, numpy, matplotlib
  - statsmodels (ADF test, ARIMA)
  - pmdarima (Auto-ARIMA)
  - scipy (Shapiro-Wilk)
  - scikit-learn (metrics)
- From R original:
  - ggplot2, zoo, tseries, forecast, tidyverse
stock_auto_arima_py/
├── README.md
├── requirements.txt
│
├── src/stock_arima/
│   ├── __init__.py
│   ├── utils.py        # Accuracy metrics, IQR bounds
│   ├── io.py           # CSV ingestion with Date parsing
│   ├── eda.py          # Stationarity, rolling stats, ACF/PACF, plots
│   └── modeling.py     # ARIMA, Auto-ARIMA, iterative predictions
│
└── scripts/
    ├── run_univariate.py       # Run full EDA + ARIMA pipeline for companies
    └── multivariate_tests.py   # Multivariate stationarity checks (ADF on all cols)
pip install -r requirements.txt

python scripts/run_univariate.py --csv "Path/stocks_ida.csv" --company "apple" --col "Close" --outdir outputs/apple

python scripts/run_univariate.py --csv "Path/stocks_ida.csv" --companies "apple,TCS,tesla,dr reddy lab,abott,IBM,nvdia,google,accenture,micro soft,amazon,Hp" --col "Close" --outdir outputs/batch

python scripts/multivariate_tests.py --csv "Path/stocks_ida.csv" --company "apple" --start-col-index 3 --outdir outputs/multi

- Stationarity: Most series were non-stationary initially; differencing made them stationary.
- Order selection: Auto-ARIMA identified optimal (p, d, q) orders, validated with PACF plots.
- Prediction style: Day-by-day iterative predictions (not all at once) for realistic forecasting.
- Best-performing stocks: Microsoft and Accenture showed increasing predicted trends and were identified as good investment candidates.
- Error metrics: Comparative RMSE values were tabulated for all 12 companies. Example (from R study):
  - Apple: p=2, RMSE=0.3757
  - Tesla: p=0, RMSE=0.2453
  - Microsoft: p=2, RMSE=0.7382
This project demonstrates how time-series forecasting with Auto-Regression and Auto-ARIMA can provide actionable insights into stock price movements.
It balances statistical rigor (stationarity, decomposition, ACF/PACF) with predictive power (Auto-ARIMA, iterative ARIMA), offering a reusable Python-based framework for financial analytics.