A Python data analysis project that uses Pandas to examine demographic patterns in the 1994 Census database. It answers 9 demographic questions through statistical analysis.
This analyzer processes 32,561 demographic records to provide insights into:
- Race distribution across the population
- Income patterns by education level
- Work hour statistics and earnings correlation
- Geographic income distribution
- Occupation trends by country
demographic-data-analyzer/
├── src/
│ ├── demographic_data_analyzer.py # Main analysis code
│ ├── test_module.py # Unit tests (9/9 passing)
│ ├── main.py # Development runner
│ └── demographic_data.csv # Dataset (32,561 records)
├── requirements.txt # Dependencies
└── README.md # This file
1. Clone and navigate to the project: `cd src/`
2. Install dependencies: `pip install pandas`
3. Run the analysis: `python main.py`
4. Run tests: `python test_module.py`
| Metric | Result |
|---|---|
| Average age of men | 39.4 years |
| Bachelor's degree holders | 16.4% |
| Advanced education earning >50K | 46.5% |
| No advanced education earning >50K | 17.4% |
| Minimum work hours/week | 1 hour |
| Country with the highest share earning >50K | Iran (41.9%) |
| Top India occupation (>50K) | Prof-specialty |
- Race distribution - Pandas Series with race counts
- Average age of men - Mean age calculation
- Bachelor's degree percentage - Education level analysis
- Advanced vs. non-advanced education income - Comparative analysis
- Minimum work hours - Labor statistics
- Income correlation with minimal hours - Work-income relationship
- Highest earning country - Geographic income analysis
- Popular occupation in India - Country-specific career trends
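The calculations listed above could be sketched roughly as follows. This is not the project's actual code: the column names (`race`, `sex`, `salary`, and so on), the set of degrees treated as "advanced", and the six-row sample are all assumptions made for illustration.

```python
import pandas as pd

# Tiny hand-made sample. Column names and values are assumptions,
# not taken from the project's dataset.
df = pd.DataFrame({
    "age": [30, 40, 50, 20, 60, 38],
    "sex": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "race": ["White", "Black", "White", "White", "Asian-Pac-Islander", "White"],
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate", "HS-grad"],
    "hours-per-week": [40, 1, 50, 1, 45, 40],
    "native-country": ["India", "United-States", "India", "United-States", "Iran", "India"],
    "occupation": ["Prof-specialty", "Sales", "Prof-specialty", "Sales", "Exec-managerial", "Sales"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"],
})

rich = df["salary"] == ">50K"  # boolean mask: earns more than 50K
# Assumed definition of "advanced education" for this sketch.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])

# Race distribution as a Series of counts.
race_count = df["race"].value_counts()

# Mean age of men, rounded to the nearest tenth.
average_age_men = round(float(df.loc[df["sex"] == "Male", "age"].mean()), 1)

# Share of people holding a Bachelor's degree.
percentage_bachelors = round(float((df["education"] == "Bachelors").mean()) * 100, 1)

# Share earning >50K with and without advanced education.
higher_education_rich = round(float(rich[advanced].mean()) * 100, 1)
lower_education_rich = round(float(rich[~advanced].mean()) * 100, 1)

# Minimum weekly hours, and the share of those workers earning >50K.
min_work_hours = int(df["hours-per-week"].min())
min_workers = df["hours-per-week"] == min_work_hours
rich_percentage = round(float(rich[min_workers].mean()) * 100, 1)

# Country with the highest share of >50K earners.
country_share = rich.astype(int).groupby(df["native-country"]).mean()
highest_earning_country = country_share.idxmax()
highest_earning_country_percentage = round(float(country_share.max()) * 100, 1)

# Most common occupation among >50K earners in India.
india_rich = df[(df["native-country"] == "India") & rich]
top_in_occupation = india_rich["occupation"].value_counts().idxmax()
```

Boolean masks keep each calculation to one or two lines: the mean of a boolean Series is the fraction of `True` values, so multiplying by 100 gives a percentage directly.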
All functions include comprehensive unit tests:
- 9/9 tests passing
- Automated validation
- Error handling verification
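As a rough illustration of what such tests might look like, the sketch below checks one value and one error case with the standard `unittest` module. The helper `average_age_men` and the test class are hypothetical and do not reflect the contents of `test_module.py`.

```python
import unittest

import pandas as pd


def average_age_men(df):
    """Hypothetical helper: mean age of rows where sex is 'Male', to one decimal."""
    return round(float(df.loc[df["sex"] == "Male", "age"].mean()), 1)


class AverageAgeTest(unittest.TestCase):
    def test_value(self):
        # Two men aged 30 and 40 give a mean of 35.0; the woman is ignored.
        df = pd.DataFrame({"age": [30, 40, 50], "sex": ["Male", "Male", "Female"]})
        self.assertAlmostEqual(average_age_men(df), 35.0)

    def test_missing_column(self):
        # Without a 'sex' column, pandas raises KeyError on the lookup.
        df = pd.DataFrame({"age": [30, 40]})
        with self.assertRaises(KeyError):
            average_age_men(df)


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageAgeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```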
- Language: Python 3.12+
- Primary Library: Pandas for data manipulation
- Data Format: CSV (no headers, 15 columns)
- Precision: All decimals rounded to the nearest tenth
- Dataset Size: 32,561 records
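A headerless 15-column CSV like the one described above could be loaded along these lines. The column names are an assumption based on the usual naming of the Adult Data Set fields, and the two inline rows stand in for `demographic_data.csv` so that the sketch runs on its own.

```python
import io

import pandas as pd

# Assumed column names for the 15 fields; the file itself carries no header row.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "salary",
]

# Two illustrative rows standing in for the real file.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# header=None tells pandas the first line is data, and names supplies the labels.
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(sample, header=None, names=COLUMNS, skipinitialspace=True)

# Round results to the nearest tenth, matching the stated precision.
mean_age = round(float(df["age"].mean()), 1)
print(df.shape, mean_age)
```

To read the real file, the `sample` argument would be replaced with the path to `demographic_data.csv`.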
UCI Machine Learning Repository: Adult Data Set
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
🎓 freeCodeCamp Data Analysis with Python Certification Project