This basic data analysis project focuses on data cleaning techniques using SQL queries. The data, sourced from Kaggle, consists of raw information on the Nashville housing market between 2013 and 2016.
The raw data is imported into Microsoft SQL Server and then cleaned so that it is accurate and complete enough for further analysis. The key cleaning steps are:
- Standardizing Date Formats: Ensuring consistent date formats across all rows.
- Handling NULL Values: Populating NULL values in certain cells to ensure data completeness and accuracy.
- Splitting Addresses: Dividing full addresses into individual columns for improved readability.
- Removing Duplicates: Eliminating duplicate rows to maintain unique and accurate data.
- Deleting Irrelevant Columns: Removing columns that are not relevant to the analysis.
These steps prepare the dataset for subsequent analysis so that it can provide reliable insights into the Nashville housing market.
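The date-standardization step can be sketched as follows. This is a minimal illustration, not the project's script: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server, and it assumes a hypothetical `housing` table whose `SaleDate` column holds timestamp text with a meaningless time part. In SQL Server the same idea is commonly written with `CONVERT(date, SaleDate)`.

```python
import sqlite3

# Hypothetical schema and toy rows (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing (UniqueID INTEGER, SaleDate TEXT)")
conn.executemany(
    "INSERT INTO housing VALUES (?, ?)",
    [(1, "2013-04-09 00:00:00.000"), (2, "2014-06-10 00:00:00.000")],
)

# Drop the time part: SQLite's date() parses the timestamp text
# and returns only the YYYY-MM-DD portion.
conn.execute("UPDATE housing SET SaleDate = date(SaleDate)")

dates = [r[0] for r in conn.execute("SELECT SaleDate FROM housing ORDER BY UniqueID")]
print(dates)  # ['2013-04-09', '2014-06-10']
```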
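For the NULL-handling step, one common approach (an assumption here, since the text does not say which cells are filled) is to copy a missing property address from another row that shares the same parcel ID. The sketch below does this with a correlated subquery in SQLite; the `housing` table, its `UniqueID`, `ParcelID` and `PropertyAddress` columns, and the rows are hypothetical toy data. In T-SQL the same fill is often expressed as a self-join `UPDATE` using `ISNULL`.

```python
import sqlite3

# Hypothetical schema and toy rows (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing (UniqueID INTEGER, ParcelID TEXT, PropertyAddress TEXT)"
)
conn.executemany(
    "INSERT INTO housing VALUES (?, ?, ?)",
    [
        (1, "A1", "100 MAIN ST, NASHVILLE"),
        (2, "A1", None),  # same parcel as row 1, address missing
        (3, "B2", None),  # no other row for this parcel, so it stays NULL
    ],
)

# Fill each missing address from a different row with the same ParcelID.
conn.execute(
    """
    UPDATE housing
    SET PropertyAddress = (
        SELECT h2.PropertyAddress
        FROM housing AS h2
        WHERE h2.ParcelID = housing.ParcelID
          AND h2.UniqueID <> housing.UniqueID
          AND h2.PropertyAddress IS NOT NULL
        LIMIT 1
    )
    WHERE PropertyAddress IS NULL
    """
)

rows = conn.execute(
    "SELECT UniqueID, PropertyAddress FROM housing ORDER BY UniqueID"
).fetchall()
print(rows)
```

Rows with no donor row keep their NULL, which is usually preferable to inventing a value.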
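Splitting an address usually means cutting it at a delimiter. The sketch below assumes (hypothetically) that a property address has the form `street, city` and splits it at the first comma into two new columns, `PropertySplitAddress` and `PropertySplitCity`; these names and the rows are illustrative only, and SQLite again stands in for SQL Server, where `SUBSTRING` and `CHARINDEX` play the roles of `substr` and `instr`.

```python
import sqlite3

# Hypothetical schema and toy rows (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing (UniqueID INTEGER, PropertyAddress TEXT)")
conn.executemany(
    "INSERT INTO housing VALUES (?, ?)",
    [(1, "100 MAIN ST, NASHVILLE"), (2, "22 OAK AVE, ANTIOCH")],
)

# New columns for the two parts (hypothetical names).
conn.execute("ALTER TABLE housing ADD COLUMN PropertySplitAddress TEXT")
conn.execute("ALTER TABLE housing ADD COLUMN PropertySplitCity TEXT")

# Text before the first comma is the street; text after it is the city.
# Rows without a comma are left untouched.
conn.execute(
    """
    UPDATE housing
    SET PropertySplitAddress = trim(substr(PropertyAddress, 1, instr(PropertyAddress, ',') - 1)),
        PropertySplitCity    = trim(substr(PropertyAddress, instr(PropertyAddress, ',') + 1))
    WHERE instr(PropertyAddress, ',') > 0
    """
)

rows = conn.execute(
    "SELECT PropertySplitAddress, PropertySplitCity FROM housing ORDER BY UniqueID"
).fetchall()
print(rows)  # [('100 MAIN ST', 'NASHVILLE'), ('22 OAK AVE', 'ANTIOCH')]
```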
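Duplicate removal is often done by numbering rows inside each group of identical values with `ROW_NUMBER()` and deleting every row numbered above 1. The sketch below shows that technique in SQLite (window functions need SQLite 3.25 or newer); which columns define a duplicate is an assumption here, and the `housing` table, its columns, and the rows are hypothetical. It also assumes `UniqueID` is unique per row. In SQL Server this is usually written as a CTE with `ROW_NUMBER()` followed by a `DELETE` from the CTE.

```python
import sqlite3

# Hypothetical schema and toy rows (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing (UniqueID INTEGER, ParcelID TEXT, PropertyAddress TEXT,"
    " SaleDate TEXT, SalePrice INTEGER)"
)
conn.executemany(
    "INSERT INTO housing VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A1", "100 MAIN ST", "2013-04-09", 150000),
        (2, "A1", "100 MAIN ST", "2013-04-09", 150000),  # duplicate of row 1
        (3, "B2", "22 OAK AVE", "2014-06-10", 98000),
    ],
)

# Number rows within each group of identical values (lowest UniqueID first)
# and delete every row after the first one in its group.
conn.execute(
    """
    DELETE FROM housing
    WHERE UniqueID IN (
        SELECT UniqueID FROM (
            SELECT UniqueID,
                   ROW_NUMBER() OVER (
                       PARTITION BY ParcelID, PropertyAddress, SaleDate, SalePrice
                       ORDER BY UniqueID
                   ) AS rn
            FROM housing
        )
        WHERE rn > 1
    )
    """
)

remaining = [r[0] for r in conn.execute("SELECT UniqueID FROM housing ORDER BY UniqueID")]
print(remaining)  # [1, 3]
```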
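Which columns count as irrelevant is not specified above, so `OwnerAddress` and `TaxDistrict` in the sketch below are hypothetical examples. SQL Server can drop them directly with `ALTER TABLE ... DROP COLUMN`; the SQLite stand-in instead rebuilds the table with only the columns to keep, which works on any SQLite version (SQLite's own `DROP COLUMN` needs 3.35 or newer).

```python
import sqlite3

# Hypothetical schema and a toy row (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing (UniqueID INTEGER, SaleDate TEXT, SalePrice INTEGER,"
    " OwnerAddress TEXT, TaxDistrict TEXT)"
)
conn.execute(
    "INSERT INTO housing VALUES (1, '2013-04-09', 150000, '100 MAIN ST, NASHVILLE, TN', 'X')"
)

# Copy only the columns to keep into a new table, then swap it in.
conn.executescript(
    """
    CREATE TABLE housing_clean AS
        SELECT UniqueID, SaleDate, SalePrice FROM housing;
    DROP TABLE housing;
    ALTER TABLE housing_clean RENAME TO housing;
    """
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(housing)")]
print(columns)  # ['UniqueID', 'SaleDate', 'SalePrice']
```

Dropping columns is irreversible, so it is usually the last cleaning step, after any column it depends on (such as a split address) has been derived.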
To replicate this analysis or use the cleaned data for your own projects, follow these steps:
- Clone the Repository: https://github.com/Aiswariya-R/SQL-project.git
- Import Data: Import the raw dataset into Microsoft SQL Server.
- Run SQL Scripts: Execute the provided SQL scripts to clean the data.
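If SQL Server is not available, a rough local way to try the import-then-clean workflow is to load the raw CSV into SQLite and run a SQL script against it. This is only a sketch under that assumption: `load_csv` and `run_script` are hypothetical helpers, and the project's scripts target SQL Server, so any T-SQL-specific functions they use (for example `CONVERT` or `PARSENAME`) will not run unchanged in SQLite.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Hypothetical helper: create `table` from the CSV header and insert all rows as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


def run_script(conn, sql_text):
    """Hypothetical helper: run a multi-statement SQL script, e.g. the text of a .sql file."""
    conn.executescript(sql_text)


conn = sqlite3.connect(":memory:")
# Toy CSV content standing in for the raw Kaggle file (assumed layout).
raw = io.StringIO(
    "UniqueID,SaleDate\n"
    "1,2013-04-09 00:00:00.000\n"
    "2,2014-06-10 00:00:00.000\n"
)
load_csv(conn, "housing", raw)
run_script(conn, "UPDATE housing SET SaleDate = date(SaleDate);")

result = conn.execute("SELECT UniqueID, SaleDate FROM housing ORDER BY UniqueID").fetchall()
print(result)  # [('1', '2013-04-09'), ('2', '2014-06-10')]
```

With a real file you would pass an open handle instead, e.g. `open(path, newline="")`, where `path` is wherever you saved the downloaded CSV. Every value arrives as text, so numeric columns may need casting before analysis.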
Contributions to this project are welcome! If you have any ideas, suggestions, or improvements, please feel free to submit a pull request. Make sure to provide a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for details.