Group 5 Term Project: Arizona Home Property Value Analysis

Project Summary

This project applies data cleaning, analysis, and visualization to find pricing patterns in real estate transactions from Pinal County, Arizona. By comparing property types and ZIP codes, our notebook breaks down how each factor influences sale prices.

Overarching Research Question: How have home purchasing habits changed over time in Pinal County, Arizona, and what drives property values across different types and locations?

Team Members:

Daniel Ohebshalom
Shivani Ramdeo
Caroline Guirand
Robert Neagu

Dataset

Source: Pinal County, Arizona real estate transactions

Format: CSV

Key Columns:

sale_price
sale_datetime
property_zip5
property_city
property_type
property_street_address
source_url

Methodology

1. Data Cleaning & Preparation

How to start: Make sure the CSV file is in the same directory as the notebook.

Import the required libraries (module names are lowercase):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Cleaning the Data:

Explore the data to find out which columns will be used and which columns contain NULL values. Then create a copy of the original data to clean.
Convert sale_datetime to a real datetime type
Convert property_zip5 from float to integer-like string (ZIP codes shouldn't have decimals)
Fill missing property_type with "Unknown". Reason: too much data would be deleted, leading to a weaker analysis.
Drop columns that are completely empty (100% missing values)
Remove rows where sale_price is missing or not positive

Note: Some NULL values remain in the cleaned data, but only in columns that are not used in the analysis. Columns the analysis relies on, such as property_type, are unaffected.
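
The cleaning steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame stands in for AZ.csv, its values are made up, and empty_col is a hypothetical column added only to show the "drop completely empty columns" step.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for AZ.csv; all values are made up for illustration.
raw = pd.DataFrame({
    "sale_price": [250000.0, np.nan, -5.0, 410000.0],
    "sale_datetime": ["2021-03-01 00:00:00", "2020-07-15 00:00:00",
                      "2019-01-02 00:00:00", "not a date"],
    "property_zip5": [85132.0, 85140.0, np.nan, 85122.0],
    "property_type": ["Residential", None, "Vacant Land", None],
    "empty_col": [np.nan, np.nan, np.nan, np.nan],  # hypothetical all-empty column
})

clean = raw.copy()  # work on a copy so the original stays untouched

# Parse dates; values that cannot be parsed become NaT instead of raising.
clean["sale_datetime"] = pd.to_datetime(clean["sale_datetime"], errors="coerce")

# 85132.0 -> "85132"; missing ZIP codes stay missing.
clean["property_zip5"] = clean["property_zip5"].astype("Int64").astype("string")

# Keep rows with no property type by labelling them instead of dropping them.
clean["property_type"] = clean["property_type"].fillna("Unknown")

# Drop columns that are 100% missing.
clean = clean.dropna(axis=1, how="all")

# Keep only rows with a present, positive sale price.
clean = clean[clean["sale_price"].notna() & (clean["sale_price"] > 0)]
```

Converting the ZIP code through the nullable Int64 type first avoids strings such as "85132.0" while still allowing missing values.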

Data Analysis

We explored trends and patterns in the provided CSV file using analytics and visualizations.

Sale Price Distribution
- Created a histogram displaying the price ranges
- Identified a right-skewed distribution
- Noticed that most properties sell under $500k
Property type breakdown
- Counted and graphed the top 10 property types
- Found that residential properties dominate
- Highlighted the impact of the “Unknown” category
Average sale price by ZIP code
- Calculated the mean sale prices for each ZIP code
- Identified the top 10 most expensive ZIP code areas
- Noticed a big geographic difference in pricing
Average sale price by property type
- Grouped by property_type and calculated the average sale price for each type
- Used a bar chart to visualize the highest valued property categories
- Compared pricing across different property types

2. Exploratory Data Analysis (EDA) Results:

Question 1: What does the Distribution of Sale Prices look like across all properties?

Discovered basic statistics of the total dataset after cleaning data using .describe()
Created price bins grouping by property type using np.array and bin_idx()
Generated a histogram using plt.figure mapping the distribution of sale prices up to $1,000,000
Key Finding: Identified the skewness and concentration of pricing both overall and by property type.
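
One way to reproduce the summary statistics and the binning behind the histogram is sketched below. The prices are made up, and the ten equal-width bins are an assumption; the notebook's own bin edges may differ.

```python
import numpy as np
import pandas as pd

# Made-up prices standing in for the cleaned dataset.
clean = pd.DataFrame({
    "sale_price": [120000, 180000, 240000, 260000,
                   310000, 450000, 900000, 2500000],
})

# Count, mean, std, min, quartiles, and max in one call.
print(clean["sale_price"].describe())

# Ten equal-width bins from $0 to $1,000,000 (assumed bin layout).
edges = np.linspace(0, 1_000_000, 11)
under_cap = clean.loc[clean["sale_price"] <= 1_000_000, "sale_price"]
counts, _ = np.histogram(under_cap, bins=edges)

# A positive skew value indicates a long right tail.
skew = clean["sale_price"].skew()
```

Passing the same edges to plt.hist(under_cap, bins=edges) would draw the corresponding chart.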

Question 2: Which Property Types are Most Common?

Applied value_counts() function to count occurrences of each property type
Extracted top 10 most frequent categories using .head(10)
Visualized results as a bar chart showing property type distribution
Key Finding: Identified which property types dominate the Pinal County housing market
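
The counting step can be sketched as below, using made-up rows. The share column is an addition for illustration and is not described in the notebook.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset.
clean = pd.DataFrame({
    "property_type": ["Residential", "Residential", "Unknown",
                      "Vacant Land", "Residential", "Unknown"],
})

# value_counts() returns counts sorted from most to least frequent.
type_counts = clean["property_type"].value_counts()
top10 = type_counts.head(10)

# Fraction of all rows in each of the top categories (illustrative extra).
share = (top10 / len(clean)).round(3)
```

Calling top10.plot(kind="bar") would produce the bar chart, provided matplotlib is installed.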

Question 3: How Do Average Sale Prices Vary Across ZIP Codes?

Used groupby("property_zip5") to partition data by geographic location
Calculated mean sale price for each ZIP code group using .mean()
Sorted results in descending order to identify most expensive neighborhoods
Extracted top 10 most expensive ZIP codes using .head(10)
Created a bar chart visualization for geographic price comparison
Key Finding: Revealed significant geographic price variation, showing that location is a major driver of property values
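
A minimal sketch of the ZIP code comparison, assuming property_zip5 has already been converted to a string during cleaning. The prices are made up.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset.
clean = pd.DataFrame({
    "property_zip5": ["85132", "85132", "85140", "85140", "85122"],
    "sale_price": [200000, 300000, 500000, 700000, 150000],
})

# Mean sale price per ZIP code, most expensive first.
zip_means = (
    clean.groupby("property_zip5")["sale_price"]
    .mean()
    .sort_values(ascending=False)
)
top10_zips = zip_means.head(10)
```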

Question 4: Which property types tend to have higher or lower sale prices?

Calculated the average sale price by property type using .groupby() and .mean() afterwards
Sorted in descending order to easily identify the most expensive property types
Took the 10 most expensive using .head() and created a bar chart visualizing the result
Key Finding: Unknown properties tend to be the most expensive, though this is skewed by a very expensive Unknown property. Excluding it, the most expensive property type is Multiple Unit, with an average of ~$1.6 million.
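
The skew described in the key finding is easier to spot when the median and the number of sales are reported next to the mean. The sketch below shows this with made-up prices; the median and count columns are a suggested addition, not something the notebook is stated to compute.

```python
import pandas as pd

# Made-up rows; one extreme "Unknown" sale dominates that group's mean.
clean = pd.DataFrame({
    "property_type": ["Unknown", "Unknown", "Unknown",
                      "Multiple Unit", "Multiple Unit",
                      "Residential", "Residential"],
    "sale_price": [100000, 200000, 30000000,
                   800000, 1000000,
                   250000, 350000],
})

# Mean, median, and number of sales per property type, sorted by mean.
by_type = (
    clean.groupby("property_type")["sale_price"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
top10_types = by_type.head(10)
```

In this toy data, Unknown ranks first by mean but Multiple Unit ranks first by median, which is the pattern a single outlier produces.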

Model Analysis

Grouped the dataset by property type and ZIP code
Calculated mean sale prices for each group
Sorted results to identify the highest-valued categories
Interpreted relationships between property characteristics and pricing
Supported findings with graphs and visualizations that highlighted how location and property type affect market value
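
If the grouping is read as a joint grouping on both columns (an assumption; the notebook may group on each column separately), it could look like the sketch below. The data is made up.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset.
clean = pd.DataFrame({
    "property_type": ["Residential", "Residential", "Residential",
                      "Vacant Land", "Vacant Land"],
    "property_zip5": ["85132", "85132", "85140", "85132", "85140"],
    "sale_price": [200000, 300000, 600000, 50000, 90000],
})

# Mean price for every (property type, ZIP code) pair, highest first.
grouped = (
    clean.groupby(["property_type", "property_zip5"])["sale_price"]
    .mean()
    .sort_values(ascending=False)
)

# Reshape into a type-by-ZIP table for side-by-side comparison.
pivot = grouped.unstack("property_zip5")
```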

Saving the Cleaned Data

Saved the cleaned dataset so that further analysis can be done without repeating the cleaning steps:

output_path = "AZ_cleaned.csv"
clean.to_csv(output_path, index=False)
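
CSV files do not store column types, so the ZIP codes and dates need to be restored when the cleaned file is loaded again. The sketch below shows one way to do that; it writes to a temporary folder and uses made-up rows so it can run on its own.

```python
import os
import tempfile

import pandas as pd

# Made-up cleaned rows; in the notebook this would be the real `clean` frame.
clean = pd.DataFrame({
    "sale_price": [250000.0, 410000.0],
    "sale_datetime": pd.to_datetime(["2021-03-01", "2020-07-15"]),
    "property_zip5": ["85132", "85122"],
})

# Temporary location used only for this example.
output_path = os.path.join(tempfile.mkdtemp(), "AZ_cleaned.csv")
clean.to_csv(output_path, index=False)

# Read ZIP codes back as text and re-parse the date column.
reloaded = pd.read_csv(
    output_path,
    dtype={"property_zip5": "string"},
    parse_dates=["sale_datetime"],
)
```

Without the dtype argument, pandas would read the ZIP codes back as integers.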

How to run this project

Download the files: CIS9650_Group5_Term_Project.ipynb and AZ.csv
Place the files in the same folder (if using Colab, make sure to put them in the root folder, not content)
Open Jupyter Notebook
Run the notebook in sequential order
A cleaned dataset (AZ_cleaned.csv) will be automatically generated and downloaded to the user’s computer

Requirements:

This project uses the following Python libraries: numpy, pandas, and matplotlib

Files in this project:

CIS9650_Group5_Term_Project.ipynb - Complete Jupyter Notebook with all analysis, visualizations, and findings.
AZ_cleaned.csv - Cleaned dataset exported from the notebook for future analysis.
README.md - This documentation file.
Download the CSV used for this project
