- Objective: To understand and apply different methods of data collection, analysis, and cleaning across various data formats.
- Key Skills: Web Scraping, JSON and CSV Data Handling, Data Cleaning, Python Programming.
The project consists of four tasks: web scraping to collect raw data from websites, analysis of pre-collected data in JSON format, exploration of pre-collected data in CSV format, and a final data cleaning step to ensure data quality. Work through and complete all four tasks in Assignment1.
- Task 1: Web Scraping for Data Acquisition
- Task 2: Analyzing Pre-collected Data in JSON Format
- Task 3: Exploring Pre-collected Data in CSV Format
- Task 4: Data Cleaning
- Objective: Learn how to extract data from websites using web scraping tools like Beautiful Soup.
- Key Concepts: HTML structure, CSS selectors, Python scripting.
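For the web scraping task, the sketch below shows how Beautiful Soup can pull structured records out of HTML using CSS selectors. The HTML snippet, the `#books` id, the class names, and the `extract_books` helper are all made up for illustration; the actual pages and selectors in the assignment will differ, and a real scraper would first download the page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (assumption, not assignment data).
SAMPLE_HTML = """
<html><body>
  <ul id="books">
    <li class="book"><span class="title">Dune</span><span class="price">9.99</span></li>
    <li class="book"><span class="title">Emma</span><span class="price">4.50</span></li>
  </ul>
</body></html>
"""


def extract_books(html):
    """Return a list of {'title', 'price'} dicts parsed from the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # CSS selector: every <li class="book"> inside the element with id="books".
    for item in soup.select("#books li.book"):
        title = item.select_one(".title").get_text(strip=True)
        price = float(item.select_one(".price").get_text(strip=True))
        books.append({"title": title, "price": price})
    return books


books = extract_books(SAMPLE_HTML)
print(books)
```

Parsing an inline string keeps the example independent of any website; the same `select` and `select_one` calls apply once the HTML comes from a real response.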
- Objective: Understand how to load and analyze data stored in JSON files.
- Key Concepts: JSON structure, data parsing, nested objects and arrays.
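For the JSON task, the following sketch illustrates parsing a document with nested objects and arrays using the standard `json` module. The sample data, its field names, and the `summarize_users` helper are assumptions chosen for the example, not the assignment's actual file.

```python
import json

# Hypothetical JSON text; the assignment's file will have its own structure.
SAMPLE_JSON = """
{
  "users": [
    {"name": "Ada", "address": {"city": "London"}, "tags": ["admin", "dev"]},
    {"name": "Lin", "address": {"city": "Taipei"}, "tags": []}
  ]
}
"""


def summarize_users(text):
    """Flatten each nested user record into a simple dict."""
    data = json.loads(text)  # str -> Python dicts and lists
    rows = []
    for user in data["users"]:
        rows.append({
            "name": user["name"],
            # Nested object: fall back to None if "address" is absent.
            "city": user.get("address", {}).get("city"),
            # Nested array: count its elements.
            "tag_count": len(user.get("tags", [])),
        })
    return rows


user_rows = summarize_users(SAMPLE_JSON)
print(user_rows)
```

To read from a file instead of a string, `json.load(open_file)` takes the place of `json.loads(text)`.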
- Objective: Master the techniques for importing, manipulating, and analyzing data in CSV files using pandas.
- Key Concepts: CSV file structure, pandas DataFrame, data manipulation.
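For the CSV task, a minimal sketch of loading data into a pandas DataFrame, filtering rows, and aggregating by group might look like this. The columns `city`, `year`, and `sales` are invented for illustration, and the data is read from an in-memory string rather than the assignment's file.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice pd.read_csv would receive a file path.
SAMPLE_CSV = """city,year,sales
Oslo,2022,100
Oslo,2023,150
Rome,2022,80
Rome,2023,120
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Row filtering with a boolean mask.
recent = df[df["year"] == 2023]

# Aggregation: total sales per city.
total_by_city = df.groupby("city")["sales"].sum()

print(df.shape)
print(recent)
print(total_by_city)
```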
- Objective: Learn why data cleaning matters and how to apply it to improve the reliability and accuracy of analyses.
- Key Concepts: Handling missing values, data validation.
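For the data cleaning task, the sketch below combines duplicate removal, a simple validation rule, and missing-value imputation in pandas. The dataset, the rule that ages must be non-negative, and the choice to fill gaps with the median are all assumptions made for illustration; the right strategy depends on the actual data.

```python
import io

import pandas as pd

# Hypothetical raw data with a missing age, an invalid age, and a duplicate row.
RAW_CSV = """name,age,email
Ada,36,ada@example.com
Lin,,lin@example.com
Sam,-5,sam@example.com
Ada,36,ada@example.com
"""

raw = pd.read_csv(io.StringIO(RAW_CSV))

# 1. Drop exact duplicate rows.
cleaned = raw.drop_duplicates()

# 2. Validation: treat negative ages as invalid by turning them into missing values.
cleaned = cleaned.assign(age=cleaned["age"].where(cleaned["age"] >= 0))

# 3. Handle missing values: fill with the median of the remaining valid ages.
median_age = cleaned["age"].median()
cleaned = cleaned.fillna({"age": median_age}).reset_index(drop=True)

print(cleaned)
```

Converting invalid values to missing ones before imputing keeps the validation and filling steps separate, so either can be changed (for example, dropping rows instead of filling) without touching the other.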
Once completed, submit your Jupyter notebook (assignment1.ipynb) along with any additional files generated during the assignment.