ML Property Ranking System

This project is an interactive AI-powered tool for evaluating and explaining comparable property rankings.

It uses:

XGBoost for pairwise ranking
SHAP for explainability
GPT-3.5 for natural language explanations
GPT-3.5 for parsing public property remarks into more accurate, complete data

Getting Started

Install dependencies

pip install -r requirements.txt

Set your OpenAI API key

Make sure your OpenAI API key is exported as an environment variable:

export OPENAI_API_KEY=your-key-here
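
To confirm the key is visible to Python before launching the app, a check along these lines can help. This is a minimal sketch: the helper name get_openai_api_key is hypothetical and is not taken from app.py.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the OpenAI API key from the given environment mapping.

    Hypothetical helper for illustration; raises a clear error when the
    variable is missing or blank instead of failing later inside an API call.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the app."
        )
    return key
```

Passing the environment as an argument keeps the check easy to exercise without touching the real shell environment.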

Run the app

Launch the Streamlit interface:

streamlit run app.py

How It Works

Reads a JSON dataset of appraisals and candidate properties
Cleans/parses the necessary appraisal data
Uses GPT-3.5 to parse public property remarks to improve data accuracy and completeness
Runs geocoding for all addresses if needed
Performs feature engineering on each candidate vs. subject
Trains a ranking model to score candidate comparables
Uses SHAP to compute feature-level impact for each of the top-3 ranked comps
Uses GPT-3.5 to explain the rankings in natural language
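
The feature engineering step compares each candidate against the subject property. The sketch below illustrates the idea under stated assumptions: the field names (lat, lon, gla, bedrooms, condition) and the function names are hypothetical, and the actual features in features.py may differ. Distance is computed from geocoded coordinates with the haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pair_features(subject, candidate):
    """Build candidate-vs-subject features (hypothetical field names)."""
    return {
        "distance_km": haversine_km(subject["lat"], subject["lon"],
                                    candidate["lat"], candidate["lon"]),
        # Signed differences: positive means the candidate has more.
        "gla_diff": candidate["gla"] - subject["gla"],
        "bedroom_diff": candidate["bedrooms"] - subject["bedrooms"],
        "condition_diff": candidate["condition"] - subject["condition"],
    }
```

One row of such features per (subject, candidate) pair, grouped by appraisal, is the kind of input a ranking model can then score.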

Feedback Loop

Users can provide feedback on poor comp predictions directly in the UI:

Feedback is logged and integrated into the next training cycle
Bad comps are dropped entirely
Over time, the model learns from user guidance and improves
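
Dropping flagged comps before retraining could look roughly like the following. This is a sketch only: the column names order_id and candidate_address are assumptions, not the actual schema of feedback_log.csv or training_data.csv.

```python
def apply_feedback(training_rows, feedback_rows):
    """Return training rows with user-flagged bad comps removed.

    Both arguments are lists of dicts (for example, rows read with
    csv.DictReader). A row is dropped when its (order_id, candidate_address)
    pair appears in the feedback log. Column names are hypothetical.
    """
    flagged = {(f["order_id"], f["candidate_address"]) for f in feedback_rows}
    return [
        row for row in training_rows
        if (row["order_id"], row["candidate_address"]) not in flagged
    ]
```

The filtered rows would then be written out as the feedback-integrated training set and used in the next training cycle.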

Files

appraisals_dataset.json: Input data
cleaned_appraisals_dataset.json: Cleaned/parsed appraisal data
gpt_extracted_features_appraisals.json: Appraisal data parsed from public remarks
feature_engineered_appraisals_dataset.json: Feature engineered appraisal data
geocoded_addresses.json: Longitude and latitude data for each address in the dataset
training_data.csv: Processed training dataset
training_data_with_feedback.csv: Dataset with integrated user feedback
feedback_log.csv: Log of submitted feedback
top3_gpt_explanations.csv: Final output with model explanations

Key Improvements This Iteration

Parsed public-remarks data with GPT-3.5 to get more reliable and accurate data. This also allowed adding new data such as property condition and other property features
Merged this new data into the final dataset, prioritizing public-remarks data as ground truth
The new data raised precision to 98%, with condition difference being the most important new feature
Removed some of the less important features according to SHAP feature analysis
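
Treating public-remarks data as ground truth during the merge can be illustrated as below. This is a sketch under assumptions: the function name and field names are hypothetical, and a missing value in the remarks data is assumed to be represented as None.

```python
def merge_with_remarks(structured, remarks):
    """Merge two records for one property, preferring remarks-derived values.

    Values extracted from public remarks override the structured values
    whenever they are present (not None); fields that exist only in the
    remarks data, such as condition, are added as new fields.
    """
    merged = dict(structured)  # copy so the input record is not mutated
    for field, value in remarks.items():
        if value is not None:
            merged[field] = value
    return merged
```

For example, if the structured record lists 3 bedrooms but the remarks say 4, the merged record keeps 4; a remarks field left as None falls back to the structured value.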

Thoughts on Future Improvements

More data: Appraisers consider more than questions like "are there the same number of bedrooms" or "is the GLA (gross living area) similar". Details such as trendy cabinetry, window sizes, and general aesthetics are also important to appraisers.
System for adding new data points: The ability to add new appraisals, potential properties, etc.
Better understanding of neighborhoods: Rather than just distance between coordinates, a system for better understanding whether or not two properties are in the same neighborhood, school district, etc.
