This project aims to predict flight prices using a Random Forest Regressor. By leveraging the power of ensemble learning and applying k-fold cross-validation, the model is optimized for accuracy and robustness. The dataset is preprocessed and imputed for missing values, and the model's performance is evaluated based on the R-squared score.
- β **Random Forest Regressor: A powerful ensemble learning model used for predicting continuous variables like flight prices.
- β **Cross-Validation: Implements 5-fold cross-validation to evaluate model performance and prevent overfitting.
- β **R-squared Metric: Uses R-squared to assess the quality of predictions and model fit.
- β **Efficient Model Training: The model is trained on imputed data for better accuracy and handling of missing values.
- β **Scalable: The Random Forest algorithm is suitable for large datasets and can easily be extended to other use cases.
- Python 3.x
- scikit-learn: For machine learning models and evaluation metrics
- pandas and numpy: For data manipulation and preprocessing
- Random Forest Regressor: For predicting flight prices
- Cross-validation: For model evaluation
- π Cross-validated R-squared scores: The model demonstrates high predictive accuracy across multiple folds, with R-squared scores ranging from 0.82 to 0.88.
- π Final R-squared score: After training on the entire dataset, the Random Forest model achieved an R-squared score of 0.8542, indicating that it can explain 85.42% of the variance in flight prices.
- π Imputed Data Handling: The model uses imputed data to handle missing values effectively, ensuring that the predictions are not biased by incomplete information.
- πΉ Hyperparameter Tuning: Explore tuning the Random Forest parameters like n_estimators and max_depth to improve model performance further.
- πΉ Feature Engineering: Experiment with additional features like flight duration, airline, or departure time to enhance the modelβs predictive power.
- πΉ Comparison with Other Models: Compare the performance of Random Forest with other models like Gradient Boosting or XGBoost to assess which provides the best results for flight price prediction.
Clone the repository: git clone https://github.com/Baybordi/Flight-Price-Prediction.git