Project_Outline.ipynb
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Project Outline.ipynb","provenance":[],"authorship_tag":"ABX9TyPZl4d0nA5Qmq8X1mDqSb1O"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# **Title of Project**"],"metadata":{"id":"dqZ-nhxiganh"}},{"cell_type":"markdown","source":["-------------"],"metadata":{"id":"gScHkw6jjrLo"}},{"cell_type":"markdown","source":["## **Objective**"],"metadata":{"id":"Xns_rCdhh-vZ"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"9sPvnFM1iI9l"}},{"cell_type":"markdown","source":["## **Data Source**"],"metadata":{"id":"-Vbnt9CciKJP"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"sGcv5WqQiNyl"}},{"cell_type":"markdown","source":["## **Import Library**"],"metadata":{"id":"r7GrZzX0iTlV"}},{"cell_type":"code","source":[""],"metadata":{"id":"UkK6NH9DiW-X"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Import Data**"],"metadata":{"id":"9lHPQj1XiOUc"}},{"cell_type":"code","source":[""],"metadata":{"id":"zcU1fdnGho6M"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Describe Data**"],"metadata":{"id":"7PUnimBoiX-x"}},{"cell_type":"code","source":[""],"metadata":{"id":"kG15arusiZ8Z"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Visualization**"],"metadata":{"id":"oBGX4Ekniriz"}},{"cell_type":"code","source":[""],"metadata":{"id":"lW-OIRK0iuzO"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Preprocessing**"],"metadata":{"id":"UqfyPOCYiiww"}},{"cell_type":"code","source":[""],"metadata":{"id":"3cyr3fbGin0A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Define Target Variable (y) and Feature Variables (X)**"],"metadata":{"id":"2jXJpdAuiwYW"}},{"cell_type":"code","source":[""],"metadata":{"id":"QBCakTuli57t"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Train Test Split**"],"metadata":{"id":"90_0q_Pbi658"}},{"cell_type":"code","source":[""],"metadata":{"id":"u60YYaOFi-Dw"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Modeling**"],"metadata":{"id":"cIhyseNria7W"}},{"cell_type":"code","source":[""],"metadata":{"id":"Toq58wpkjCw7"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Model Evaluation**"],"metadata":{"id":"vhAwWfG0jFun"}},{"cell_type":"code","source":[""],"metadata":{"id":"lND3jJj_jhx4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Prediction**"],"metadata":{"id":"8AzwG7oLjiQI"}},{"cell_type":"code","source":[""],"metadata":{"id":"JLebGzDJjknA"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Explanation**"],"metadata":{"id":"SBo38CJZjlEX"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"Ybi8FR9Kjv00"}}]}
Mileage Prediction Project Using Regression Analysis in Python

Dataset:
For this example, we'll use a small sample dataset of 10 car models, hard-coded in the script, listing each car's features and its mileage. Replace it with your own data for a real analysis.

Dataset Columns:

- Model_Year
- Cylinders
- Displacement
- Horsepower
- Weight
- Acceleration
- Mileage (target variable)
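The script hard-codes its data. To use your own data instead, one option is to read a CSV file with pandas and check its columns before modeling. This is a minimal sketch assuming a CSV whose header uses exactly the column names listed above; `load_mileage_data` and `EXPECTED_COLUMNS` are hypothetical names, not part of the original project, and dropping rows with missing values is just one simple way to handle gaps.

```python
import io

import pandas as pd

# Columns the rest of the script expects; "Mileage" is the target.
EXPECTED_COLUMNS = [
    "Model_Year", "Cylinders", "Displacement",
    "Horsepower", "Weight", "Acceleration", "Mileage",
]


def load_mileage_data(source) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and verify the expected columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Keep the expected columns in a fixed order and drop incomplete rows.
    return df[EXPECTED_COLUMNS].dropna()


# An in-memory CSV holding the first two rows of the sample data stands in for a file.
csv_text = """Model_Year,Cylinders,Displacement,Horsepower,Weight,Acceleration,Mileage
2015,4,2.5,150,3000,8.5,25
2018,6,3.5,250,4000,6.5,20
"""
df = load_mileage_data(io.StringIO(csv_text))
print(df.shape)  # (2, 7)
```

With a real file you would pass a path instead, e.g. `load_mileage_data("your_data.csv")`, where the file name is a placeholder.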

Code:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Create a small sample dataset (replace with your own data, e.g. pd.read_csv(...))
data = {
'Model_Year': [2015, 2018, 2020, 2012, 2019, 2016, 2017, 2014, 2021, 2013],
'Cylinders': [4, 6, 4, 6, 4, 8, 4, 6, 4, 8],
'Displacement': [2.5, 3.5, 2.0, 4.0, 2.5, 5.0, 2.5, 3.5, 2.0, 5.0],
'Horsepower': [150, 250, 180, 300, 200, 350, 220, 280, 200, 320],
'Weight': [3000, 4000, 2800, 4500, 3200, 5000, 3000, 4200, 2800, 4800],
'Acceleration': [8.5, 6.5, 7.5, 5.5, 7.0, 4.5, 8.0, 6.0, 7.5, 5.0],
'Mileage': [25, 20, 28, 18, 24, 15, 26, 22, 29, 19]
}

df = pd.DataFrame(data)

# Explore dataset
print(df.head())
print(df.describe())

# Split dataset into features (X) and target (y)
X = df.drop('Mileage', axis=1)
y = df['Mileage']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression Model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)

# Random Forest Regressor Model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)

# Evaluate models
print("Linear Regression Metrics:")
print(f"MSE: {mean_squared_error(y_test, y_pred_lr)}")
print(f"R2 Score: {r2_score(y_test, y_pred_lr)}")

print("Random Forest Regressor Metrics:")
print(f"MSE: {mean_squared_error(y_test, y_pred_rf)}")
print(f"R2 Score: {r2_score(y_test, y_pred_rf)}")

# Plot predicted vs actual mileage
plt.scatter(y_test, y_pred_lr, label='Linear Regression')
plt.scatter(y_test, y_pred_rf, label='Random Forest Regressor')
plt.xlabel('Actual Mileage')
plt.ylabel('Predicted Mileage')
plt.legend()
plt.show()
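A predicted-vs-actual scatter plot is easier to read with a y = x reference line, because points on that line are perfect predictions and the vertical distance to it is the error. The following is a minimal sketch of such a helper; `plot_predicted_vs_actual` is a hypothetical name, and the dashed gray line is a stylistic choice.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, predictions):
    """Scatter predicted vs actual values per model and draw a y = x reference line.

    `predictions` maps a model label to its array of predicted values.
    """
    y_true = np.asarray(y_true, dtype=float)
    fig, ax = plt.subplots()
    for label, y_pred in predictions.items():
        ax.scatter(y_true, y_pred, label=label)
    # Span the reference line over every plotted value so no point falls outside it.
    all_values = np.concatenate(
        [y_true] + [np.asarray(p, dtype=float) for p in predictions.values()]
    )
    lo, hi = all_values.min(), all_values.max()
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray", label="Perfect prediction")
    ax.set_xlabel("Actual Mileage")
    ax.set_ylabel("Predicted Mileage")
    ax.legend()
    return fig, ax
```

With the variables from the script above, you would call `plot_predicted_vs_actual(y_test, {"Linear Regression": y_pred_lr, "Random Forest Regressor": y_pred_rf})` and then `plt.show()`.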

Explanation:

1. Import necessary libraries.
2. Load the dataset.
3. Explore the dataset using head() and describe().
4. Split the dataset into features (X) and target (y).
5. Split data into training and testing sets using train_test_split().
6. Train a Linear Regression model and a Random Forest Regressor model on the training data.
7. Make predictions on the testing data.
8. Evaluate the models using Mean Squared Error (MSE) and R2 Score.
9. Plot the predicted vs actual mileage for both models.
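Step 8 relies on two formulas: MSE is the mean of the squared residuals, and R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The sketch below computes both by hand and checks them against scikit-learn's `mean_squared_error` and `r2_score`. The `y_true` and `y_pred` values are small illustrative inputs, not outputs of the models above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [25, 20, 28]
y_pred = [24, 21, 27]
# Every residual is +1 or -1, so both MSE values are 1.0.
print(f"MSE: manual={mse(y_true, y_pred)}, sklearn={mean_squared_error(y_true, y_pred)}")
print(f"R2:  manual={r2(y_true, y_pred):.4f}, sklearn={r2_score(y_true, y_pred):.4f}")
```

An R2 of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than the mean.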

Advice:

- Use a larger dataset: with 10 rows and test_size=0.2, the test set holds only 2 cars, so the reported MSE and R2 scores are not reliable.
- Add or engineer additional features (e.g., engine type, transmission type).
- Experiment with different regression algorithms (e.g., Ridge, Lasso, Elastic Net).
- Tune hyperparameters (e.g., n_estimators and max_depth for the random forest) to improve performance.
- Consider using cross-validation for more robust evaluation.
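Several of these suggestions can be combined in one workflow. The sketch below, assuming the same 10-row sample data as the script, compares plain linear regression with Ridge, Lasso, and Elastic Net using 5-fold cross-validation, then grid-searches Ridge's `alpha`. Features are standardized inside a pipeline because the regularization penalties depend on feature scale; the `alpha` and `l1_ratio` values are arbitrary starting points, not tuned choices. With only 10 rows, each fold scores just 2 cars, so treat the numbers as a demonstration of the workflow rather than a verdict on the models.

```python
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same sample data as the main script.
data = {
    'Model_Year': [2015, 2018, 2020, 2012, 2019, 2016, 2017, 2014, 2021, 2013],
    'Cylinders': [4, 6, 4, 6, 4, 8, 4, 6, 4, 8],
    'Displacement': [2.5, 3.5, 2.0, 4.0, 2.5, 5.0, 2.5, 3.5, 2.0, 5.0],
    'Horsepower': [150, 250, 180, 300, 200, 350, 220, 280, 200, 320],
    'Weight': [3000, 4000, 2800, 4500, 3200, 5000, 3000, 4200, 2800, 4800],
    'Acceleration': [8.5, 6.5, 7.5, 5.5, 7.0, 4.5, 8.0, 6.0, 7.5, 5.0],
    'Mileage': [25, 20, 28, 18, 24, 15, 26, 22, 29, 19],
}
df = pd.DataFrame(data)
X = df.drop('Mileage', axis=1)
y = df['Mileage']

# Shuffled 5-fold split, fixed seed so every model sees the same folds.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "Elastic Net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)),
}

cv_mse = {}
for name, model in models.items():
    # scikit-learn scorers are "higher is better", so MSE comes back negated.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()
    print(f"{name}: mean CV MSE = {cv_mse[name]:.2f}")

# Hyperparameter tuning: search Ridge's alpha over the same folds.
search = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("Best Ridge alpha:", search.best_params_["ridge__alpha"])
```

`make_pipeline` names each step after its lowercased class, which is why the grid key is `ridge__alpha`.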

Example Use Cases:

- Predicting fuel efficiency for new car models.
- Identifying key factors affecting mileage.
- Comparing the performance of different regression algorithms.
- Developing a mileage prediction tool for car manufacturers or consumers.
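The first two use cases can be sketched with the random forest from the script. The example below fits on all 10 sample rows for brevity (in practice you would keep a held-out set), ranks features by the forest's impurity-based `feature_importances_`, and predicts mileage for one new car whose specifications are hypothetical. Impurity-based importances can be misleading on tiny or correlated data such as Cylinders, Displacement, and Weight here; `sklearn.inspection.permutation_importance` on held-out data is a common alternative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same sample data as the main script.
data = {
    'Model_Year': [2015, 2018, 2020, 2012, 2019, 2016, 2017, 2014, 2021, 2013],
    'Cylinders': [4, 6, 4, 6, 4, 8, 4, 6, 4, 8],
    'Displacement': [2.5, 3.5, 2.0, 4.0, 2.5, 5.0, 2.5, 3.5, 2.0, 5.0],
    'Horsepower': [150, 250, 180, 300, 200, 350, 220, 280, 200, 320],
    'Weight': [3000, 4000, 2800, 4500, 3200, 5000, 3000, 4200, 2800, 4800],
    'Acceleration': [8.5, 6.5, 7.5, 5.5, 7.0, 4.5, 8.0, 6.0, 7.5, 5.0],
    'Mileage': [25, 20, 28, 18, 24, 15, 26, 22, 29, 19],
}
df = pd.DataFrame(data)
X = df.drop('Mileage', axis=1)
y = df['Mileage']

rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X, y)

# Identify key factors: impurity-based importances, normalized to sum to 1.
importances = pd.Series(rf_model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# Predict mileage for a new car; these specifications are hypothetical.
new_car = pd.DataFrame([{
    'Model_Year': 2022, 'Cylinders': 4, 'Displacement': 2.0,
    'Horsepower': 190, 'Weight': 2900, 'Acceleration': 7.8,
}])
# Match the training column order before predicting.
new_car = new_car[X.columns]
pred = rf_model.predict(new_car)
print(f"Predicted mileage: {pred[0]:.1f}")
```

Because a random forest averages training targets, its predictions always stay within the range of mileage values it was trained on, so it cannot extrapolate to cars far more efficient than any in the data.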