A machine learning-based system for recommending comparable properties for real estate appraisals. The system uses various property features to find and rank similar properties, providing detailed explanations for each recommendation.
- Advanced Feature Engineering: Combines property characteristics, location data, and textual descriptions
- Multiple ML Models: Support for Random Forest, Gradient Boosting, SVM, and ensemble methods
- Explainable AI: Detailed explanations for each recommendation using feature importance
- Comprehensive Evaluation: Multiple metrics including precision, recall, and similarity scores
- Real-time Predictions: Generate recommendations for any property in the dataset
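The features above describe finding and ranking properties that are similar to a subject. One common way to frame this with the listed classifiers is to build pairwise features (the absolute difference between subject and candidate for each attribute), train a classifier to predict whether a candidate was used as a comparable, and rank candidates by predicted probability. The sketch below shows that framing under those assumptions; it is not the project's own implementation, and the attribute names (`gla`, `bedrooms`, `year_built`), the toy training labels, and the helpers `pair_features` and `rank_candidates` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical numeric attributes used to compare a subject with a candidate.
FEATURES = ["gla", "bedrooms", "year_built"]


def pair_features(subject: dict, candidate: dict) -> list[float]:
    """Absolute difference per attribute: small values mean similar properties."""
    return [abs(subject[f] - candidate[f]) for f in FEATURES]


def rank_candidates(model, subject: dict, candidates: list[dict], n_comps: int = 3) -> list[tuple[int, float]]:
    """Return (candidate index, probability of being a comparable), best first."""
    X = np.array([pair_features(subject, c) for c in candidates])
    # Column 1 of predict_proba is the probability of class 1 ("was a comparable").
    scores = model.predict_proba(X)[:, 1]
    order = np.argsort(-scores)[:n_comps]
    return [(int(i), float(scores[i])) for i in order]


# Toy pairwise training data (hypothetical): label 1 = used as a comparable, 0 = not.
X_train = np.array([[50, 0, 2], [100, 1, 5], [900, 2, 40], [1200, 3, 60]])
y_train = np.array([1, 1, 0, 0])
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

subject = {"gla": 1044, "bedrooms": 3, "year_built": 1976}
candidates = [
    {"gla": 1100, "bedrooms": 3, "year_built": 1980},
    {"gla": 2300, "bedrooms": 6, "year_built": 2015},
    {"gla": 1000, "bedrooms": 3, "year_built": 1975},
]
print(rank_candidates(model, subject, candidates, n_comps=2))
```

Framing the task as pairwise classification is what makes classification metrics such as precision, recall, and F1 applicable, while the predicted probability doubles as a ranking score.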
Property-Recommendation-System/
├── src/
│ ├── data/
│ │ ├── data_loader.py # Data loading and preprocessing
│ │ └── feature_engineering.py # Feature extraction and transformation
│ ├── models/
│ │ └── comp_recommender.py # ML model implementations
│ ├── evaluation/
│ │ └── evaluator.py # Model evaluation and metrics
│ ├── utils/
│ │ └── explainability.py # Explanation generation
│ └── config/
│ └── config.py # Configuration settings
├── models/ # Trained model files
├── results/ # Evaluation results and recommendations
├── explanations/ # Generated explanations
├── train.py # Model training script
├── predict.py # Prediction and recommendation script
└── requirements.txt # Python dependencies
- Clone the repository:
git clone <repository-url>
cd Property-Recommendation-System
- Install dependencies:
pip install -r requirements.txt
- Ensure the dataset file appraisals_dataset.json is in the project root directory.
Train a model using the training script:
# Train a Random Forest model
python3 train.py --data-path appraisals_dataset.json --model-types random_forest --n-comps 3
# Train multiple models
python3 train.py --data-path appraisals_dataset.json --model-types random_forest gradient_boosting --n-comps 5
# Train with ensemble method
python3 train.py --data-path appraisals_dataset.json --model-types random_forest gradient_boosting --use-ensemble
Training Options:
- --data-path: Path to the dataset file (default: appraisals_dataset.json)
- --model-types: One or more model types to train (random_forest, gradient_boosting, svm)
- --n-comps: Number of comparable properties to recommend (default: 3)
- --use-ensemble: Combine the trained models into an ensemble
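For readers extending the training script, the documented training flags map naturally onto Python's standard `argparse` module. The parser below is a minimal sketch that mirrors the option names and defaults listed above; it is not copied from train.py, and the default of `["random_forest"]` for `--model-types` is an assumption, since the README does not state one.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented train.py options."""
    parser = argparse.ArgumentParser(description="Train comparable-property recommenders")
    parser.add_argument("--data-path", default="appraisals_dataset.json",
                        help="Path to the dataset file")
    # nargs="+" accepts one or more model names after a single flag.
    # The default below is an assumption; the README does not document one.
    parser.add_argument("--model-types", nargs="+", default=["random_forest"],
                        choices=["random_forest", "gradient_boosting", "svm"],
                        help="Types of models to train")
    parser.add_argument("--n-comps", type=int, default=3,
                        help="Number of comparable properties to recommend")
    # store_true makes this a boolean switch that is False unless given.
    parser.add_argument("--use-ensemble", action="store_true",
                        help="Combine the trained models into an ensemble")
    return parser


args = build_train_parser().parse_args(
    ["--model-types", "random_forest", "gradient_boosting", "--use-ensemble"]
)
print(args.model_types, args.n_comps, args.use_ensemble)
```

`argparse` turns `--model-types` into the attribute `model_types`, and `choices` validates each value passed to the `nargs="+"` flag, so a typo such as `random_forrest` is rejected before training starts.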
Generate recommendations for a specific property:
# Basic recommendation
python3 predict.py --subject-id 4762597 --model-path models/random_forest_model.pkl
# With custom output path
python3 predict.py --subject-id 4762597 --model-path models/random_forest_model.pkl --output-path results/my_recommendations.json
# More recommendations with LLM explanations
python3 predict.py --subject-id 4762597 --model-path models/random_forest_model.pkl --n-comps 5 --use-llm
Prediction Options:
- --subject-id: ID of the property to generate recommendations for (required)
- --model-path: Path to the trained model file
- --data-path: Path to the dataset file (default: appraisals_dataset.json)
- --n-comps: Number of recommendations to generate (default: 3)
- --use-llm: Use an LLM to generate enhanced explanations
- --output-path: Path to save the recommendations JSON file
The system generates detailed recommendations with explanations:
===== PROPERTY RECOMMENDATIONS =====
SUBJECT PROPERTY:
ID: 4762597
Address: 142-950 Oakview Ave Kingston ON K7M 6W8
Structure Type: Townhouse
Year Built: 1976.0
GLA: 1044.0
Bedrooms: 3
Bathrooms: 1:1
RECOMMENDED COMPARABLE PROPERTIES:
1. 311 Janette St
Similarity Score: 0.2840
Structure Type: Freehold Townhouse
Year Built: None
GLA: 1500.0
Bedrooms: 3.0
Price: 585000.0
Key Factors:
- Heating forced air: 6.82 (positive)
- Structure type freehold townhouse: 9.10 (positive)
- Basement features: -0.06 (negative)
To list the first five subject IDs in your dataset:
python3 -c "import json; data = json.load(open('appraisals_dataset.json', 'r')); print('Available IDs:', [appraisal['orderID'] for appraisal in data['appraisals'][:5]])"
The system has been evaluated on multiple metrics:
- Precision & Recall: How many recommended comparables are relevant, and how many relevant comparables are recommended
- Similarity Scores: Quantitative similarity between properties
- Feature Importance: Understanding which factors drive recommendations
- Coverage: Percentage of properties that can receive recommendations
Results are saved in the results/ directory with detailed evaluation reports.
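Precision, recall, and coverage can be computed per subject from a ranked recommendation list. The sketch below assumes that "relevant" means the comparables actually chosen for that subject in the appraisal data; that assumption, the string IDs, and the function names are illustrative rather than taken from evaluator.py.

```python
def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k slots filled by relevant comparables."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant comparables that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def coverage(recommendations: dict[str, list[str]], subjects: list[str]) -> float:
    """Share of subjects that received at least one recommendation."""
    if not subjects:
        return 0.0
    return sum(1 for s in subjects if recommendations.get(s)) / len(subjects)


# Hypothetical example: the appraiser used A, C and D; the model recommended A, B, C.
recommended = ["A", "B", "C"]
relevant = {"A", "C", "D"}
print(precision_at_k(recommended, relevant, 3), recall_at_k(recommended, relevant, 3))
print(coverage({"s1": ["A", "B"], "s2": []}, ["s1", "s2", "s3"]))
```

Here two of the three recommendations are relevant (precision@3 = 2/3) and two of the three relevant comparables were found (recall@3 = 2/3); only one of three subjects received recommendations, so coverage is 1/3.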
Key configuration options in src/config/config.py:
- Model parameters (n_estimators, max_depth, etc.)
- Feature engineering settings
- Evaluation metrics
- File paths and directories
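The README does not show the contents of config.py. As a hedged illustration of how the four groups listed above could be organized, the sketch below uses nested dataclasses; every field name and default here is hypothetical, so check src/config/config.py for the real settings.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical defaults for the tree-based models.
    n_estimators: int = 100
    max_depth: Optional[int] = None  # None lets trees grow until leaves are pure
    random_state: int = 42


@dataclass
class FeatureConfig:
    tfidf_max_features: int = 500  # cap on the text vocabulary size
    scale_numeric: bool = True


@dataclass
class PathConfig:
    data_path: Path = Path("appraisals_dataset.json")
    models_dir: Path = Path("models")
    results_dir: Path = Path("results")
    explanations_dir: Path = Path("explanations")


@dataclass
class Config:
    # default_factory gives each Config its own nested objects instead of shared ones.
    model: ModelConfig = field(default_factory=ModelConfig)
    features: FeatureConfig = field(default_factory=FeatureConfig)
    paths: PathConfig = field(default_factory=PathConfig)
    eval_metrics: tuple[str, ...] = ("precision", "recall", "f1", "map", "ndcg")
    n_comps: int = 3


cfg = Config(model=ModelConfig(max_depth=10))
model_file = cfg.paths.models_dir / "random_forest_model.pkl"
print(cfg.model, model_file.as_posix())
```

Grouping settings this way lets a single value be overridden (as with `max_depth` above) while every other default stays in one place.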
The system automatically extracts and engineers features including:
- Property characteristics (bedrooms, bathrooms, GLA, etc.)
- Location features (coordinates, municipality, etc.)
- Text features from property descriptions (TF-IDF)
- Categorical encodings
- Numerical scaling and normalization
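The feature types listed above (numeric scaling, categorical encoding, TF-IDF text features) can be combined in one scikit-learn `ColumnTransformer`. The sketch below is one way to do this and is not the project's feature_engineering.py; the column names and the three sample rows are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the fields in appraisals_dataset.json may differ.
NUMERIC = ["gla", "bedrooms", "latitude", "longitude"]
CATEGORICAL = ["structure_type", "municipality"]
TEXT = "description"

preprocessor = ColumnTransformer([
    # Fill missing numbers with the column median, then standardize to mean 0, std 1.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), NUMERIC),
    # One indicator column per category; unseen categories become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    # A single column name (not a list) hands TfidfVectorizer the 1-D text it expects.
    ("text", TfidfVectorizer(max_features=200), TEXT),
])

df = pd.DataFrame({
    "gla": [1044.0, 1500.0, None],
    "bedrooms": [3, 3, 4],
    "latitude": [44.23, 44.25, 44.22],
    "longitude": [-76.55, -76.52, -76.58],
    "structure_type": ["Townhouse", "Freehold Townhouse", "Detached"],
    "municipality": ["Kingston", "Kingston", "Kingston"],
    "description": ["bright end unit", "renovated townhouse with garage", "detached home with garage"],
})
# TfidfVectorizer cannot handle missing values, so replace them with empty strings.
df[TEXT] = df[TEXT].fillna("")
X = preprocessor.fit_transform(df)
print(X.shape)
```

The output has 4 scaled numeric columns, 4 one-hot columns (three structure types, one municipality), and 9 TF-IDF columns for the distinct words in the descriptions. Fitting the transformer once and reusing it at prediction time keeps training and inference features aligned.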
Each recommendation includes:
- Similarity score
- Top contributing features
- Feature impact analysis
- Detailed property comparisons
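The "Key Factors" lines in the sample output pair a feature with a signed value and a positive/negative label. The README does not say how those values are computed. The sketch below uses a simple heuristic (feature importance multiplied by a signed per-feature value) purely to illustrate selecting and formatting the top contributors; the heuristic, the feature names, and the numbers are hypothetical, and a method such as SHAP values would be a more principled alternative.

```python
def contributions_from_importance(importances: dict[str, float], values: dict[str, float]) -> dict[str, float]:
    """Heuristic contribution: global importance times a signed per-feature value."""
    return {name: weight * values.get(name, 0.0) for name, weight in importances.items()}


def top_factors(contributions: dict[str, float], n: int = 3) -> list[str]:
    """Format the n largest contributions (by magnitude) like the Key Factors output."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]
    lines = []
    for name, value in ranked:
        direction = "positive" if value >= 0 else "negative"
        lines.append(f"- {name}: {value:.2f} ({direction})")
    return lines


contributions = contributions_from_importance(
    {"gla": 0.4, "bedrooms": 0.3, "year_built": 0.2, "heating": 0.1},
    {"gla": 0.5, "bedrooms": 1.0, "year_built": -2.0, "heating": 0.2},
)
print("\n".join(top_factors(contributions)))
```

Ranking by absolute value ensures that a strongly negative factor (here `year_built`) is reported alongside the positive ones instead of being hidden at the bottom.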
Comprehensive evaluation using:
- Classification metrics (precision, recall, F1)
- Ranking metrics (MAP, NDCG)
- Similarity metrics
- Feature importance analysis
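The ranking metrics listed above reward placing relevant comparables near the top of the list. The sketch below uses the standard binary-relevance definitions of average precision (averaged over subjects to give MAP) and NDCG@k; it assumes relevance means "was an actual comparable for this subject", which, like the function names, is illustrative and not taken from evaluator.py.

```python
import math


def average_precision(recommended: list[str], relevant: set[str]) -> float:
    """Mean of precision@i over the ranks i where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average precision averaged over all subjects."""
    if not runs:
        return 0.0
    return sum(average_precision(rec, rel) for rec, rel in runs) / len(runs)


def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the best possible DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


print(average_precision(["A", "B", "C"], {"A", "C"}))
print(ndcg_at_k(["A", "B", "C"], {"A", "C"}, 3))
```

With relevant items at ranks 1 and 3, average precision is (1/1 + 2/3) / 2 = 5/6, and NDCG@3 is (1 + 1/log2 4) / (1 + 1/log2 3), slightly below 1 because the second relevant item is not at rank 2.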
- Subject ID not found: Ensure the subject ID exists in your dataset
- Model not found: Train a model first using train.py
- Feature engineer not found: Retrain the model to save the feature engineer
- JSON serialization errors: The system automatically handles timestamp conversions
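If you export recommendations or intermediate data yourself, `json.dumps` will still raise `TypeError` on datetime values. A common fix is a `default=` fallback function. The sketch below is one such fallback, not the project's own handler; the field name `sale_date` is hypothetical.

```python
import json
from datetime import date, datetime


def to_jsonable(obj):
    """json.dumps fallback: convert dates and timestamps to ISO-8601 strings."""
    # pandas.Timestamp subclasses datetime, so it is handled by this branch too.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # NumPy scalars expose .item(), which returns the matching Python number.
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"subject_id": "4762597", "sale_date": datetime(2024, 3, 15, 10, 30)}
print(json.dumps(record, default=to_jsonable))
```

`json.dumps` calls the `default` function only for objects it cannot serialize natively, so strings and numbers pass through unchanged.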
Check the evaluation results in results/ for model performance metrics and potential issues with specific properties.
See requirements.txt for the complete list of dependencies. Key packages include:
- pandas, numpy: Data processing
- scikit-learn: Machine learning models
- matplotlib, seaborn: Visualization
- joblib: Model serialization
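The troubleshooting note about a missing feature engineer suggests that the model and its preprocessing must be saved together to stay in sync. The sketch below shows one way to do this with joblib by writing both objects into a single file. The bundle layout, the file name, and the use of a `StandardScaler` as a stand-in for the project's feature engineer are assumptions made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def save_bundle(path, model, feature_engineer):
    """Write the model and its fitted preprocessing to a single file."""
    joblib.dump({"model": model, "feature_engineer": feature_engineer}, path)


def load_bundle(path):
    """Load both objects, failing clearly if either one is missing."""
    bundle = joblib.load(path)
    missing = {"model", "feature_engineer"} - set(bundle)
    if missing:
        raise KeyError(f"bundle is missing {sorted(missing)}; retrain and save again")
    return bundle["model"], bundle["feature_engineer"]


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [0, 0, 1, 1]
scaler = StandardScaler().fit(X)  # stand-in for the fitted feature engineer
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_forest_bundle.pkl")
    save_bundle(path, model, scaler)
    loaded_model, loaded_scaler = load_bundle(path)
    # The reloaded pair should reproduce the original predictions exactly.
    same = (loaded_model.predict(loaded_scaler.transform(X)) == model.predict(scaler.transform(X))).all()
print(bool(same))
```

Saving one bundle rather than two separate files means a model can never be loaded without the exact preprocessing it was trained with.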