This project implements Linear Regression from scratch in Java, without using any machine learning libraries.
- Linear Regression using Batch Gradient Descent
- Categorical feature handling via internal label encoding
- Custom DataFrame abstraction for tabular data
- Supports:
  - training from CSV
  - single-row prediction
  - batch prediction
- Mean Squared Error (MSE) loss
- Clean separation of:
  - data storage
  - encoding
  - model logic
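The MSE loss listed above is the mean of the squared differences between targets and predictions. The class below is a minimal sketch of that formula, not the project's own code. The class name `MseSketch` and the method `mse` are hypothetical.

```java
/** Hypothetical helper (not part of the project's API): mean squared error. */
public final class MseSketch {

    /** Returns (1/n) * sum over i of (yTrue[i] - yPred[i])^2. */
    static double mse(double[] yTrue, double[] yPred) {
        if (yTrue.length == 0 || yTrue.length != yPred.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            double err = yTrue[i] - yPred[i]; // residual for row i
            sum += err * err;
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 7.0};
        double[] predicted = {2.5, 5.0, 8.0};
        // (0.25 + 0 + 1) / 3, about 0.4167
        System.out.println(mse(actual, predicted));
    }
}
```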
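Project structure:

    .
    ├── DataFrame.java          // Custom tabular data structure
    ├── EncodeData.java         // Categorical encoder
    ├── LinearRegression.java   // Linear Regression model
    ├── Test.java               // Test / demo
    └── README.md

For each row $i$ with feature vector $x_i$, the model predicts

$$\hat{y}_i = w^\top x_i + b$$

where:
- $w$ = learned weights
- $b$ = learned bias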
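A linear model with learned weights `w` and bias `b` makes each prediction as a dot product plus the bias. Batch prediction applies the same rule to every row. The sketch below assumes the features are already numeric (for example, after label encoding). `PredictSketch`, `predictRow` and `predictBatch` are hypothetical names, not the API of `LinearRegression.java`.

```java
import java.util.Arrays;

/** Hypothetical sketch of single-row and batch prediction for y_hat = w . x + b. */
public final class PredictSketch {

    /** Single-row prediction: bias plus the weighted sum of the features. */
    static double predictRow(double[] x, double[] w, double b) {
        if (x.length != w.length) {
            throw new IllegalArgumentException("feature count mismatch");
        }
        double yHat = b;
        for (int j = 0; j < x.length; j++) {
            yHat += w[j] * x[j];
        }
        return yHat;
    }

    /** Batch prediction: applies predictRow to every row of X. */
    static double[] predictBatch(double[][] X, double[] w, double b) {
        double[] out = new double[X.length];
        for (int i = 0; i < X.length; i++) {
            out[i] = predictRow(X[i], w, b);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] w = {2.0, -1.0};
        double b = 0.5;
        double[][] X = {{1.0, 3.0}, {4.0, 0.0}};
        System.out.println(Arrays.toString(predictBatch(X, w, b))); // [-0.5, 8.5]
    }
}
```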
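- Batch Gradient Descent on the MSE loss $L = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$ over all $n$ training rows, with learning rate $\alpha$
- Weight update: $w \leftarrow w - \alpha \cdot \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i$
- Bias update: $b \leftarrow b - \alpha \cdot \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$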
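Batch gradient descent computes the error on every training row with the current parameters, then updates the weights and bias once per epoch. The updates use the MSE gradients (2/n)·Σ(ŷᵢ − yᵢ)xᵢ for the weights and (2/n)·Σ(ŷᵢ − yᵢ) for the bias. The sketch below is written under a few assumptions: zero initialisation, a fixed learning rate, a fixed number of epochs, and the hypothetical names `GradientDescentSketch`, `fit` and `predict`. It is not the project's `LinearRegression` implementation, whose defaults and constant factors may differ.

```java
/** Hypothetical sketch of full-batch gradient descent on the MSE loss. */
public final class GradientDescentSketch {
    double[] w; // learned weights
    double b;   // learned bias

    /** Fits w and b; every update uses the gradient over all n rows. */
    void fit(double[][] X, double[] y, double lr, int epochs) {
        int n = X.length;
        int d = X[0].length;
        w = new double[d]; // assumed zero initialisation
        b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[d];
            double gradB = 0.0;
            // Accumulate sum of (y_hat_i - y_i) * x_i and sum of (y_hat_i - y_i)
            for (int i = 0; i < n; i++) {
                double err = predict(X[i]) - y[i];
                for (int j = 0; j < d; j++) {
                    gradW[j] += err * X[i][j];
                }
                gradB += err;
            }
            // Apply the (2/n)-scaled gradients once per epoch
            for (int j = 0; j < d; j++) {
                w[j] -= lr * (2.0 / n) * gradW[j];
            }
            b -= lr * (2.0 / n) * gradB;
        }
    }

    /** y_hat = w . x + b with the current parameters. */
    double predict(double[] x) {
        double yHat = b;
        for (int j = 0; j < x.length; j++) {
            yHat += w[j] * x[j];
        }
        return yHat;
    }

    public static void main(String[] args) {
        // Noise-free data generated from y = 2x + 1
        double[][] X = {{0.0}, {1.0}, {2.0}, {3.0}};
        double[] y = {1.0, 3.0, 5.0, 7.0};
        GradientDescentSketch model = new GradientDescentSketch();
        model.fit(X, y, 0.05, 5000);
        System.out.printf("w = %.3f, b = %.3f%n", model.w[0], model.b); // w = 2.000, b = 1.000
    }
}
```

With unscaled features, a learning rate that is too large makes the updates diverge. This is one reason feature normalization tends to matter for gradient descent.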
Categorical feature handling:
- Automatically detects non-numeric columns
- Encodes each categorical feature with its own category-to-integer mapping
- Encoder is learned during training and reused during prediction
- Unknown categories during inference throw an explicit error
This prevents:
- silent data leakage (the encoder is never refit on inference data)
- inconsistent feature mappings between training and prediction
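The encoder behaviour described above has four parts: detect non-numeric columns, learn one mapping per feature, reuse it at prediction time, and reject unknown categories. The class below is a minimal sketch of those rules. It assumes a column is categorical if any value fails to parse as a `double`, and that codes are assigned in order of first appearance. `LabelEncoderSketch`, `isNumeric`, `fit` and `transform` are hypothetical names, not the API of `EncodeData.java`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-column label encoder; one instance per categorical feature. */
public final class LabelEncoderSketch {
    private final Map<String, Integer> mapping = new HashMap<>();
    private boolean fitted = false;

    /** Assumed rule: a column is numeric only if every value parses as a double. */
    static boolean isNumeric(String[] column) {
        for (String v : column) {
            try {
                Double.parseDouble(v.trim());
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }

    /** Learns the mapping from training data only, in order of first appearance. */
    void fit(String[] column) {
        mapping.clear();
        for (String value : column) {
            mapping.putIfAbsent(value, mapping.size());
        }
        fitted = true;
    }

    /** Reuses the learned mapping; unknown categories raise an explicit error. */
    int transform(String value) {
        if (!fitted) {
            throw new IllegalStateException("encoder has not been fitted");
        }
        Integer code = mapping.get(value);
        if (code == null) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return code;
    }

    public static void main(String[] args) {
        String[] city = {"Paris", "Tokyo", "Paris", "Lima"};
        System.out.println(isNumeric(city)); // false
        LabelEncoderSketch enc = new LabelEncoderSketch();
        enc.fit(city);
        System.out.println(enc.transform("Tokyo")); // 1
        try {
            enc.transform("Oslo");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown category: Oslo
        }
    }
}
```

Throwing on `"Oslo"` instead of assigning it a fresh code is what keeps training and prediction mappings consistent.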
Possible extensions (not currently implemented):
- One-Hot Encoding
- Feature normalization
- Logistic Regression
- Ridge / Lasso Regression
- R² and MAE metrics
- Mini-batch gradient descent