The project predicts house prices in the Southeast Asian real estate market.
The raw dataset contains 1,460 rows and 80 features, with the sale price as the target variable.
The data is preprocessed by removing null values, one-hot encoding the categorical features, and standardizing the numerical features. The dataset is then split into training and test sets, and five models are developed and evaluated.
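
The preprocessing steps described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes pandas and scikit-learn, the small table and its column names (LotArea, YearBuilt, Neighborhood, SalePrice) are hypothetical stand-ins for the real data, and the choice to split before fitting the scaler and encoder (so that their statistics come only from the training rows) is an assumption about ordering that the text does not specify.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the raw data; all column names are hypothetical.
raw = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, None, 14260, 14115, 10084, 10382],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993, 2004, 1973],
    "Neighborhood": ["A", "B", "A", "C", "B", None, "C", "A"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000],
})

# Step 1: remove rows that contain null values.
clean = raw.dropna().reset_index(drop=True)

X = clean.drop(columns="SalePrice")
y = clean["SalePrice"]

# Step 2: split into training and test sets (assumed here to happen before
# fitting the transformers, so the test rows do not leak into the scaling).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Step 3: standardize numerical columns and one-hot encode categorical ones.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_p = preprocess.fit_transform(X_train)
X_test_p = preprocess.transform(X_test)
```

Setting handle_unknown="ignore" lets the encoder cope with a category that appears only in the test rows, which is likely on a dataset with many sparse categorical levels.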
The five models are a random forest, a random forest restricted to its three most important features, linear regression on PCA components, Lasso linear regression with cross-validation, and a fully connected neural network.
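
As a rough sketch of how these five models might be set up side by side, the example below fits each one on synthetic regression data and records its R-squared on a held-out split. Everything specific here is an assumption: the data comes from scikit-learn's make_regression rather than the housing dataset, the hyperparameters are arbitrary, and MLPRegressor is used only as a convenient stand-in for the fully connected network, since the text does not say which framework or architecture was used.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (not the housing dataset).
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}  # model name -> R-squared on the test split

# 1. Random forest on all features.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
scores["random_forest"] = rf.score(X_test, y_test)

# 2. Random forest on the three features the first forest ranks highest.
top3 = np.argsort(rf.feature_importances_)[-3:]
rf_top3 = RandomForestRegressor(n_estimators=100, random_state=0)
rf_top3.fit(X_train[:, top3], y_train)
scores["random_forest_top3"] = rf_top3.score(X_test[:, top3], y_test)

# 3. Linear regression on PCA components (5 components is an arbitrary choice).
pca_lr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X_train, y_train)
scores["pca_linear"] = pca_lr.score(X_test, y_test)

# 4. Lasso with the regularization strength chosen by 5-fold cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
scores["lasso_cv"] = lasso.score(X_test, y_test)

# 5. Fully connected network (MLPRegressor as an assumed stand-in).
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
scores["neural_network"] = mlp.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

The ranking this produces on synthetic data says nothing about the ranking on the housing data; the point is only the shape of the comparison, with every model scored on the same held-out rows.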
The results indicate that the fully connected neural network outperforms the other models, achieving a very high accuracy score and R-squared value. The random forest and the cross-validated Lasso regression also show promising results. Overall, the study demonstrates the importance of data analysis and machine learning in real estate price prediction, and the potential benefits of using such models for portfolio management.
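
For reference, the R-squared value mentioned above is conventionally defined as shown below. The notation is an assumption introduced here: y_i is the observed sale price of house i, \hat{y}_i is the model's prediction, \bar{y} is the mean observed price, and n is the number of evaluated houses. The text does not state which split the reported value was computed on.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A value of 1 means the predictions match the observed prices exactly, while a value of 0 means the model does no better than always predicting the mean price.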