Pricing Model Using K-nearest Neighbors

This repo contains a class that implements k-nearest neighbors regression that avoids time leakage.

Demo

You can run python house_price_prediction.py in the command line to see the demo.

What is the performance of the model measured in Median Relative Absolute Error?

The Relative Median Absolute Error (RMAE) is 0.2128.
What would be an appropriate methodology to determine the optimal k?

We can use grid search to determine the optimal k, which performs cross-validation and choose the optimal k with lowest MRAE. Execute the following commands:
```
 from sklearn.grid_search import GridSearchCV
 parameters = [{'k': np.arange(1,10)}]
 clf = GridSearchCV(housePricePredictor(), parameters, n_jobs=-1)
 clf.fit(X_train, y_train)
 clf.best_params_
```
Do you notice any spatial or temporal trends in error?

There is no noticable spatial or temporal trends in error.
How would you improve this model?
- Improve the accuracy:
  - Adding more features to the model
  - Using grid search cross-validation to find optimal parameters for the model.
- Improve the efficiency:
  - Precomputing the distance matrix.
How would you productionize this model?

Since our K-nearest Neighbors model is computationally intensive, we need to speed up the program. There are several ways we could try:
- Using cython to speed up computation
- Using numba to speed up computation
- Using processes in parallel with ProcessPoolExecutor
- Using Apache Spark framework

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
data.csv		data.csv
house_price_prediction.py		house_price_prediction.py