This repository contains the code, data, and documentation for a project that cleans and prepares a dataset for analysis using SQL.
The primary goals of this project are to identify and address data quality issues within the dataset, enhance the accuracy and consistency of the data, and transform the data into a format suitable for further analysis and visualization.
The dataset used in this project is the Nashville Housing Data.
Data Exploration: Examine the structure and content of the dataset to understand its characteristics and identify potential issues. Use descriptive statistics, visualizations, and SQL queries to inspect data distributions and detect anomalies.
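As an illustration of the exploration step, the sketch below profiles a small in-memory table with plain SQL, run through Python's built-in sqlite3 module so it is self-contained. The table name `housing`, its columns, and the sample rows are assumptions made for this example; they are not taken from the project's actual schema or data.

```python
import sqlite3

# Hypothetical table, columns, and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing ("
    "parcel_id TEXT, property_address TEXT, sale_price INTEGER, sold_as_vacant TEXT)"
)
rows = [
    ("P1", "100 MAIN ST, NASHVILLE", 250000, "No"),
    ("P2", None, 310000, "N"),
    ("P3", "7 OAK AVE, NASHVILLE", 180000, "Yes"),
    ("P4", "9 ELM RD, GOODLETTSVILLE", 9500000, "Y"),
]
conn.executemany("INSERT INTO housing VALUES (?, ?, ?, ?)", rows)

# Descriptive statistics: row count, missing addresses, and price range.
profile = conn.execute(
    """
    SELECT COUNT(*)                      AS total_rows,
           SUM(property_address IS NULL) AS missing_addresses,
           MIN(sale_price)               AS min_price,
           MAX(sale_price)               AS max_price,
           AVG(sale_price)               AS avg_price
    FROM housing
    """
).fetchone()

# Distinct labels in a categorical column reveal inconsistent encodings.
labels = conn.execute(
    """
    SELECT sold_as_vacant, COUNT(*)
    FROM housing
    GROUP BY sold_as_vacant
    ORDER BY sold_as_vacant
    """
).fetchall()

print(profile)
print(labels)
```

In this sample, the label counts show four spellings for what is presumably a two-valued field, and the maximum price sits far above the others, which are the kinds of anomalies the exploration step is meant to surface.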
Data Cleaning:
Handle missing values: Apply strategies such as imputation or removal, depending on the nature of the missing data.
Correct inconsistencies: Identify and rectify discrepancies in data formatting, naming conventions, or value ranges.
Remove duplicates: Identify and eliminate identical records.
Handle outliers: Detect and address extreme values that may distort analysis.
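The sketch below shows one possible way to express three of these cleaning steps in SQL (imputation, label standardization, and duplicate removal), again run through Python's sqlite3 module. The `housing` table, its columns, the sample rows, and the choice to fill a missing address from another row with the same parcel ID are all assumptions for illustration, not a description of the project's actual queries. Outlier handling is left out because the appropriate threshold depends on the data.

```python
import sqlite3

# Hypothetical table, columns, and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing ("
    "parcel_id TEXT, property_address TEXT, sale_price INTEGER, sold_as_vacant TEXT)"
)
rows = [
    ("P1", "100 MAIN ST", 250000, "No"),
    ("P1", None, 255000, "N"),           # missing address, same parcel as above
    ("P2", "7 OAK AVE", 180000, "Y"),
    ("P2", "7 OAK AVE", 180000, "Y"),    # exact duplicate
]
conn.executemany("INSERT INTO housing VALUES (?, ?, ?, ?)", rows)

# 1. Impute a missing address from another row with the same parcel ID.
#    Rows with no matching parcel keep a NULL address.
conn.execute(
    """
    UPDATE housing
    SET property_address = (
        SELECT b.property_address
        FROM housing AS b
        WHERE b.parcel_id = housing.parcel_id
          AND b.property_address IS NOT NULL
        LIMIT 1
    )
    WHERE property_address IS NULL
    """
)

# 2. Standardize inconsistent labels to a single spelling.
conn.execute(
    """
    UPDATE housing
    SET sold_as_vacant = CASE sold_as_vacant
        WHEN 'Y' THEN 'Yes'
        WHEN 'N' THEN 'No'
        ELSE sold_as_vacant
    END
    """
)

# 3. Remove duplicates, keeping the first row of each identical group.
conn.execute(
    """
    DELETE FROM housing
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM housing
        GROUP BY parcel_id, property_address, sale_price, sold_as_vacant
    )
    """
)

cleaned = conn.execute("SELECT * FROM housing ORDER BY rowid").fetchall()
print(cleaned)
```

The order matters in this sketch: imputing and standardizing first means rows that differ only by a missing value or label spelling can be recognized as duplicates afterwards.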
Data Transformation:
Standardize data formats: Ensure consistency in data types, date formats, and units of measurement.
Create new features: Derive additional variables or aggregate data as needed for analysis.
Normalize or scale data: Adjust values to a common scale for certain statistical analyses.
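A minimal sketch of the transformation steps follows, using SQL through Python's sqlite3 module. It assumes a hypothetical `housing` table whose dates are stored as `MM/DD/YYYY` text and whose addresses follow a `street, city` pattern; neither assumption comes from the project itself. The query converts dates to ISO format, derives street and city columns from the address, and applies min-max scaling to the sale price.

```python
import sqlite3

# Hypothetical table, columns, and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing (property_address TEXT, sale_date TEXT, sale_price INTEGER)"
)
rows = [
    ("100 MAIN ST, NASHVILLE", "04/09/2013", 100000),
    ("7 OAK AVE, GOODLETTSVILLE", "12/01/2014", 300000),
    ("9 ELM RD, NASHVILLE", "06/15/2015", 200000),
]
conn.executemany("INSERT INTO housing VALUES (?, ?, ?)", rows)

transformed = conn.execute(
    """
    SELECT
        -- Standardize format: MM/DD/YYYY text -> ISO 8601 (YYYY-MM-DD).
        substr(sale_date, 7, 4) || '-' || substr(sale_date, 1, 2) || '-'
            || substr(sale_date, 4, 2) AS sale_date_iso,
        -- New features: split the address at the first comma.
        trim(substr(property_address, 1, instr(property_address, ',') - 1)) AS street,
        trim(substr(property_address, instr(property_address, ',') + 1))    AS city,
        -- Min-max scaling to [0, 1]; yields NULL if all prices are equal.
        (sale_price - (SELECT MIN(sale_price) FROM housing)) * 1.0
            / ((SELECT MAX(sale_price) FROM housing)
               - (SELECT MIN(sale_price) FROM housing)) AS price_scaled
    FROM housing
    ORDER BY rowid
    """
).fetchall()

for row in transformed:
    print(row)
```

ISO dates sort correctly as plain text, which is one reason to prefer them when the database has no native date type. Min-max scaling is only one option; other scaling methods may suit a given analysis better.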