This project analyzes U.S. mobility patterns using the Trips by Distance dataset.
It includes data preprocessing, exploratory analysis, travel distance insights,
threshold-based comparisons, and parallel processing performance benchmarking.
Several predictive models are also implemented to study trends in travel behavior.
This repository contains a complete workflow for analyzing national-level mobility data:
- Automatic file path resolution
- Null value inspection
- Column structure verification
- Date conversion and cleanup
- Stay-at-home vs. not-at-home population statistics
- Distance-based travel metrics (0–500+ miles)
- Weighted distance calculations
- Daily trend visualizations
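The null inspection and date cleanup steps listed above could look roughly like the following pandas sketch. The column names (`Date`, `Population Staying at Home`, `Population Not Staying at Home`), the inline sample data, and the helper `load_and_clean` are assumptions made for illustration; the notebook may name and structure things differently.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the real CSV file.
# Column names and values are assumed, not taken from the notebook.
SAMPLE_CSV = """Date,Population Staying at Home,Population Not Staying at Home
2020/01/01,100,900
2020/01/02,,880
not-a-date,120,870
"""


def load_and_clean(source):
    """Read the CSV, count nulls per column, and normalize the Date column."""
    df = pd.read_csv(source)

    # Null value inspection: number of missing values in each column.
    null_counts = df.isna().sum()

    # Date conversion: values that cannot be parsed become NaT instead of raising.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

    # Cleanup: drop rows without a valid date, then order chronologically.
    df = df.dropna(subset=["Date"]).sort_values("Date").reset_index(drop=True)
    return df, null_counts


df, null_counts = load_and_clean(io.StringIO(SAMPLE_CSV))
print(null_counts)
print(df.dtypes)
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pd.read_csv` accepts both.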
The project identifies dates where:
- more than 10,000,000 people made 10–25 mile trips, and
- more than 10,000,000 people made 50–100 mile trips
and compares the overlap between the two sets of dates.
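One way to express this comparison is as two sets of dates and their intersection. The sketch below uses made-up rows and assumed column names (`Number of Trips 10-25`, `Number of Trips 50-100`); the helper `dates_above` is hypothetical and not taken from the notebook.

```python
import pandas as pd

THRESHOLD = 10_000_000

# Hypothetical rows for illustration only; column names are assumed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "Number of Trips 10-25": [12_000_000, 9_000_000, 15_000_000],
    "Number of Trips 50-100": [11_000_000, 12_000_000, 8_000_000],
})


def dates_above(frame, column, threshold=THRESHOLD):
    """Return the set of dates on which `column` exceeds `threshold`."""
    return set(frame.loc[frame[column] > threshold, "Date"])


dates_10_25 = dates_above(df, "Number of Trips 10-25")
dates_50_100 = dates_above(df, "Number of Trips 50-100")

# Dates that satisfy both conditions, and dates unique to each set.
overlap = dates_10_25 & dates_50_100
only_10_25 = dates_10_25 - dates_50_100
only_50_100 = dates_50_100 - dates_10_25

print(sorted(overlap))
```

Using sets keeps the overlap and the two differences to one operator each, which is convenient when the results feed a summary table or a Venn-style plot.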
Aggregation over dates is computed using:
- Sequential processing
- Parallel processing with 10 workers
- Parallel processing with 20 workers
Performance metrics include:
- Total runtime
- Speedup
- Efficiency
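Speedup and efficiency are conventionally defined as sequential runtime divided by parallel runtime, and speedup divided by the number of workers. The sketch below assumes the notebook follows these standard definitions; the function name and the timings in the usage lines are placeholders, not measured results.

```python
def speedup_and_efficiency(sequential_seconds, parallel_seconds, workers):
    """Return (speedup, efficiency) under the standard definitions.

    speedup    = sequential runtime / parallel runtime
    efficiency = speedup / number of workers
    """
    if parallel_seconds <= 0 or workers <= 0:
        raise ValueError("parallel_seconds and workers must be positive")
    speedup = sequential_seconds / parallel_seconds
    efficiency = speedup / workers
    return speedup, efficiency


# Placeholder timings for illustration only, not benchmark output.
s10, e10 = speedup_and_efficiency(12.0, 3.0, workers=10)
s20, e20 = speedup_and_efficiency(12.0, 2.4, workers=20)
print(f"10 workers: speedup={s10:.2f}, efficiency={e10:.2f}")
print(f"20 workers: speedup={s20:.2f}, efficiency={e20:.2f}")
```

An efficiency well below 1 means that adding workers costs more in scheduling and data transfer than it saves, which is worth checking when comparing the 10-worker and 20-worker runs.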
Predictive models include:
- Linear Regression
- Random Forest Regression
with performance metrics such as:
- R²
- RMSE
- MAE
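For reference, these three metrics can be computed directly from true and predicted values. The sketch below uses NumPy and the textbook formulas; the function name `regression_metrics` and the sample values are illustrative, and the notebook may rely on a library such as scikit-learn instead.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, and MAE from true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Residual and total sums of squares for the coefficient of determination.
    # Note: ss_tot is zero when y_true is constant, which makes R² undefined.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
    }


# Illustrative values only.
metrics = regression_metrics([0.0, 2.0], [1.0, 1.0])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a model's error is dominated by a few outlying days.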
All major results are saved as CSV:
- processing_performance.csv
- model_performance.csv
- home_stats.csv
- distance_stats.csv
The main and most up-to-date analysis is contained in:
Analysis_of_U_S_Mobility_Patterns_and_Parallel_Processing.ipynb
This notebook includes all final data processing steps, visualizations, parallel processing benchmarks, threshold comparisons, and modeling results. Users should refer to this file for the complete and polished analysis.
The archive/ folder contains earlier versions of the notebooks used in this project.
These files represent intermediate development stages and are not as complete or polished
as the main analysis notebook. They are kept for reference and version history, but
users should rely on the latest notebook in the root directory for the final results.