Parallel Data Processing and Modeling for U.S. Trip Analysis

This project analyzes U.S. mobility patterns using the Trips by Distance dataset.
It includes data preprocessing, exploratory analysis, travel distance insights,
threshold-based comparisons, and parallel processing performance benchmarking.
Several predictive models are also implemented to study trends in travel behavior.

📊 Project Overview

This repository contains a complete workflow for analyzing national-level mobility data:

1. Data Loading & Inspection

Automatic file path resolution
Null value inspection
Column structure verification
Date conversion and cleanup
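
As a rough illustration of the inspection steps above, the sketch below counts empty cells per column and parses the date column using only the Python standard library. The sample rows, the column names (`Level`, `Date`, `Number of Trips`), and the `%Y/%m/%d` date format are assumptions made for this example; the notebook may well use a dataframe library and different names.

```python
import csv
import io
from datetime import datetime

# Made-up sample standing in for the real file; column names are assumed.
SAMPLE_CSV = """Level,Date,Number of Trips
National,2020/01/01,100
National,2020/01/02,
"""

def load_rows(handle, date_column="Date", date_format="%Y/%m/%d"):
    """Read CSV rows, count empty cells per column, and parse the date column."""
    reader = csv.DictReader(handle)
    null_counts = {name: 0 for name in reader.fieldnames}
    rows = []
    for row in reader:
        # Null inspection: treat empty or whitespace-only cells as missing.
        for name, value in row.items():
            if value is None or value.strip() == "":
                null_counts[name] += 1
        # Date conversion: turn the text date into a datetime.date object.
        row[date_column] = datetime.strptime(row[date_column], date_format).date()
        rows.append(row)
    return rows, null_counts

rows, null_counts = load_rows(io.StringIO(SAMPLE_CSV))
print(list(rows[0].keys()))  # column structure verification
print(null_counts)
```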

2. National-Level Trip Analysis

Stay-at-home vs. not-at-home population statistics
Distance-based travel metrics (0–500+ miles)
Weighted distance calculations
Daily trend visualizations
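
One plausible reading of "weighted distance" is an average trip length in which each distance bin is weighted by its trip count. The sketch below assumes that interpretation; the bin labels and representative distances are hypothetical, and the open-ended top bin uses its lower bound as a placeholder.

```python
# Hypothetical representative distance (miles) per bin; midpoints are an
# assumption, and ">=500" has no midpoint so its lower bound is used.
BIN_MILES = {
    "<1": 0.5, "1-3": 2.0, "3-5": 4.0, "5-10": 7.5, "10-25": 17.5,
    "25-50": 37.5, "50-100": 75.0, "100-250": 175.0, "250-500": 375.0,
    ">=500": 500.0,
}

def weighted_mean_distance(trip_counts):
    """Average miles per trip, weighting each bin's distance by its trip count."""
    total_trips = sum(trip_counts.values())
    if total_trips == 0:
        return 0.0
    total_miles = sum(BIN_MILES[bin_label] * count
                      for bin_label, count in trip_counts.items())
    return total_miles / total_trips

# Made-up counts for a single day.
print(weighted_mean_distance({"<1": 10, "1-3": 10}))
```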

3. Threshold-Based Comparison

The project identifies dates on which:

more than 10,000,000 people made 10–25 mile trips, and
more than 10,000,000 people made 50–100 mile trips,

and then compares the overlap between the two sets of dates.
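
The comparison described above amounts to building two sets of dates and intersecting them. A minimal sketch, assuming the daily counts are already available per date; the values below are made up for illustration.

```python
from datetime import date

THRESHOLD = 10_000_000

# Made-up daily counts: date -> (10-25 mile count, 50-100 mile count).
daily = {
    date(2020, 1, 1): (12_000_000, 11_000_000),
    date(2020, 1, 2): (9_000_000, 13_000_000),
    date(2020, 1, 3): (15_000_000, 8_000_000),
}

# Dates exceeding the threshold in each distance band.
dates_10_25 = {d for d, (mid, _) in daily.items() if mid > THRESHOLD}
dates_50_100 = {d for d, (_, far) in daily.items() if far > THRESHOLD}

# Overlap and the dates unique to each band.
both = dates_10_25 & dates_50_100
only_10_25 = dates_10_25 - dates_50_100
only_50_100 = dates_50_100 - dates_10_25

print(sorted(both), sorted(only_10_25), sorted(only_50_100))
```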

4. Parallel Processing Performance Benchmark

Aggregation over dates is computed using:

Sequential processing
Parallel processing with 10 workers
Parallel processing with 20 workers
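
The sketch below shows one way a per-date aggregation could be split across a fixed number of workers and merged back together. It is an assumption-laden illustration, not the notebook's implementation: the helper names are hypothetical, the rows are made up, and a thread pool is used only to keep the example portable (a process pool or a parallel dataframe library would be more typical for CPU-bound work).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def aggregate(rows):
    """Sum trip counts per date for one chunk of (date, trips) rows."""
    totals = defaultdict(int)
    for day, trips in rows:
        totals[day] += trips
    return totals

def merge(partials):
    """Combine per-chunk totals into a single dict of per-date totals."""
    merged = defaultdict(int)
    for partial in partials:
        for day, trips in partial.items():
            merged[day] += trips
    return dict(merged)

def aggregate_parallel(rows, n_workers):
    """Split rows into roughly n_workers chunks, aggregate each, then merge."""
    chunk_size = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return merge(pool.map(aggregate, chunks))

# Made-up rows: (date, number of trips).
rows = [("2020-01-01", 5), ("2020-01-02", 7), ("2020-01-01", 3), ("2020-01-03", 1)]
sequential = merge([aggregate(rows)])
parallel = aggregate_parallel(rows, n_workers=10)
print(sequential == parallel)
```

Checking that the parallel result equals the sequential one is a cheap safeguard before comparing runtimes.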

Performance metrics include:

Total runtime
Speedup
Efficiency
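
Speedup and efficiency are conventionally defined as speedup = T_sequential / T_parallel and efficiency = speedup / number of workers. The sketch below assumes those standard definitions; the function names and the timing helper are hypothetical.

```python
import time

def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def speedup(t_sequential, t_parallel):
    """How many times faster the parallel run is than the sequential run."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_workers):
    """Speedup per worker; 1.0 would mean perfect scaling."""
    return speedup(t_sequential, t_parallel) / n_workers

# Illustrative runtimes in seconds, not measured results.
print(speedup(10.0, 2.0), efficiency(10.0, 2.0, 10))
```

An efficiency well below 1.0 with 20 workers would suggest that scheduling overhead or a serial portion of the work dominates at that worker count.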

5. Modeling

Predictive models include:

Linear Regression
Random Forest Regression

Performance is evaluated with metrics such as:

R²
RMSE
MAE
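
For reference, the three metrics can be computed directly from true and predicted values. This is a minimal standard-library sketch using the usual definitions; the function name is hypothetical, and the notebook may rely on a machine learning library for the same quantities.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # undefined if y_true is constant
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Made-up values: predicting the mean everywhere gives R² of 0.
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(metrics)
```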

6. Export Key Tables

All major results are saved as CSV files:

processing_performance.csv
model_performance.csv
home_stats.csv
distance_stats.csv
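
A table like these can be written with the standard `csv` module. The sketch below is only an illustration: the `export_table` helper and the column names are hypothetical, and the file is written to a temporary directory rather than the repository.

```python
import csv
import os
import tempfile

def export_table(rows, path):
    """Write a list of dicts with identical keys to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical columns; only the worker counts come from the benchmark setup.
performance = [
    {"method": "sequential", "workers": 1},
    {"method": "parallel", "workers": 10},
    {"method": "parallel", "workers": 20},
]

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "processing_performance.csv")
export_table(performance, out_path)
print(out_path)
```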

📌 Final Notebook

The main and most up-to-date analysis is contained in:

Analysis_of_U_S_Mobility_Patterns_and_Parallel_Processing_Performance.ipynb

This notebook includes all final data processing steps, visualizations, parallel processing benchmarks, threshold comparisons, and modeling results. Users should refer to this file for the complete and polished analysis.

📂 Archive

The archive/ folder contains earlier versions of the notebooks used in this project.
These files represent intermediate development stages and are not as complete or polished
as the main analysis notebook. They are kept for reference and version history, but
users should rely on the latest notebook in the root directory for the final results.
