DMV_Road_Assessment_Analysis

Project Overview
Key Objectives
Key Performance Indicators
Overview of Dataset Structure
Tools Used
Data Cleaning
Feature Engineering
Exploratory Data Analysis
Statistical Analysis
Logistic Regression Predicting Road Test Success
Failure Reasons
Data Analysis and Visualization
Key Findings
Recommendations
Limitations

Project Overview

This project analyzes a simulated DMV road test dataset to uncover the key factors that influence whether applicants pass or fail their driving test. With high failure rates leading to delays and added costs, the study aims to generate actionable insights that can improve training programs, optimize resource allocation, and support applicants in being better prepared.

Key Objectives

Demographic Insights: Explore pass/fail trends across age, gender, and race.
Training Effectiveness: Evaluate the impact of different training levels (Advanced, Basic, None) on outcomes.
Performance Drivers: Identify the most significant road assessment indicators that predict success.
Predictive Modeling: Build and test a logistic regression model to estimate pass likelihood and measure performance using accuracy, precision, recall, and F1-score.
Threshold Analysis: Demonstrate how varying cutoff points affect predictions using confusion matrices.
Interactive Dashboards: Design Power BI dashboards with dynamic filters (demographics, training, results) to provide intuitive, real-time insights.

Key Performance Indicators

Overview of Dataset Structure

DADV-2-Capstone Project-Group D-Dataset Drivers License Data:
Contains data on applicants' demographics (age, gender, and race), training participation, and road assessment indicators.

Tools Used

Excel- Feature Engineering, Exploratory Data Analysis, Statistical Analysis and Data Visualization
- Download_Here
Power BI- Data Visualization using DAX formulas
- Download_Here

Data Cleaning

The key data cleaning steps were:

Each field was assigned the correct data type.
The dataset was checked for missing values and duplicate records; none were found.
All numeric fields were converted from the General format to numeric data types to ensure proper analysis.

Feature Engineering

The Age column was derived from the Age Group column using Generative AI tools. Applicants in the Teenager group were assigned ages 16–19, Young Adults were assigned ages 20–29, and Middle Age applicants were assigned ages 30–50.
For logistic regression, categorical variables were encoded into numeric format. The Gender column was converted to binary, with Male = 1 and Female = 0. From the Training_Type column, two dummy variables were created: Training_Advanced (Advanced = 1, Basic/None = 0) and Training_Basic (Basic = 1, Advanced/None = 0).
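The encoding described above can be sketched in a few lines of Python. This is only an illustration: the project did this step in Excel, and the sample rows and the `encode` helper are hypothetical.

```python
# Hypothetical sample rows; the real workbook has more columns and records.
applicants = [
    {"Gender": "Male", "Training_Type": "Advanced"},
    {"Gender": "Female", "Training_Type": "Basic"},
    {"Gender": "Female", "Training_Type": "None"},
]


def encode(row):
    """Return the numeric features described above for one applicant."""
    return {
        # Gender as a binary flag: Male = 1, Female = 0.
        "Gender": 1 if row["Gender"] == "Male" else 0,
        # Two dummies for three training levels; "None" is the baseline (0, 0).
        "Training_Advanced": 1 if row["Training_Type"] == "Advanced" else 0,
        "Training_Basic": 1 if row["Training_Type"] == "Basic" else 0,
    }


encoded = [encode(row) for row in applicants]
for row in encoded:
    print(row)
```

Using two dummies for three training levels leaves "None" as the reference category, so each training coefficient is read relative to untrained applicants.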

Exploratory Data Analysis

EDA involved exploring the DMV road assessment data to answer key questions, such as:

What is the total number of applicants, and among them, how many successfully qualified versus how many did not qualify?
What is the overall qualification rate in the DMV road assessment?
What is the distribution of applicants by gender (male vs. female), and within each group, how many passed and how many failed?
What is the distribution of applicants across different age groups?
What is the distribution of applicants across different races?
What is the distribution of applicants across different training types, and within each type, how many passed, how many failed, and what is the qualification rate?
What is the overall average age of applicants, and how does it compare between those who qualified and those who did not?
What is the distribution of pass rates across gender, age groups, training types, and race?
What are the pass rates of male and female applicants across different training types (Advanced, Basic, None)?
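Most of these questions reduce to counting applicants and passes per group. A minimal Python sketch of that calculation follows; the project answered them with Excel pivot tables, and the records and the `pass_rates` helper here are hypothetical.

```python
from collections import defaultdict

# Hypothetical (group, result) pairs; the group could be gender, age group,
# race, or training type.
records = [
    ("Advanced", "Pass"), ("Advanced", "Pass"), ("Advanced", "Fail"),
    ("Basic", "Pass"), ("Basic", "Fail"),
    ("None", "Fail"), ("None", "Fail"), ("None", "Pass"), ("None", "Fail"),
]


def pass_rates(rows):
    """Return the share of applicants who passed within each group."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for group, result in rows:
        totals[group] += 1
        if result == "Pass":
            passes[group] += 1
    return {group: passes[group] / totals[group] for group in totals}


rates = pass_rates(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.2%}")
```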

Statistical Analysis

1. Relationship between age & Pass-Fail group

A t-test comparing the ages of applicants who passed with those who failed showed that age significantly influences the outcome (p = 0.005). On average, candidates who passed the road test were older (28.7 years) than those who failed (26.3 years). This suggests that maturity or experience may improve success rates.
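For readers who want to see the mechanics, here is a small sketch of a two-sample t statistic. It assumes Welch's unequal-variance form, which may not be the variant used in the Excel workbook, and the ages are made-up values, not the project data. It returns only the statistic and the approximate degrees of freedom; the p-value would come from a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    # Squared standard error contributed by each sample.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Hypothetical ages, for illustration only.
passed_ages = [31, 27, 35, 24, 29, 33]
failed_ages = [19, 26, 22, 30, 18, 25]

t_stat, df = welch_t(passed_ages, failed_ages)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic here means the first sample (passed) has the higher mean age.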

2. Chi-Square Outcome between gender and pass/fail

A chi-square test of Gender against Pass/Fail found no significant relationship between gender and the pass/fail outcome.

3. Chi-Square Outcome between training type and pass/fail

A chi-square test of Training_Type against Pass/Fail found a strong, statistically significant relationship between training type and the pass/fail outcome.
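The chi-square test of independence used in the two tests above can be sketched as follows. The counts are hypothetical and do not come from the project dataset, and the `p_value` helper covers only 1 or 2 degrees of freedom, where the chi-square tail probability has a simple closed form.

```python
from math import erfc, exp, sqrt


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return erfc(sqrt(stat / 2))
    if df == 2:
        return exp(-stat / 2)
    raise ValueError("closed form implemented only for df of 1 or 2")


# Hypothetical counts: rows are Advanced, Basic, None; columns are Pass, Fail.
training_table = [[40, 10], [30, 20], [10, 40]]
stat, df = chi_square_independence(training_table)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value(stat, df):.2e}")
```

A 2x2 table such as Gender by Pass/Fail has 1 degree of freedom, while the 3x2 Training_Type table has 2.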

Logistic Regression Predicting Road Test Success

A logistic regression model was built to predict the Pass/Fail outcome.
Predictor variables used: Gender, Age, Training_Advanced, Training_Basic, Theory Test, Reaction time, Signals, Speed Control, Road_signs, Mirror usage, Confidence, Parking, Night_Drive, Steer_Control.
Categorical variables were converted to dummy variables: Gender (Male = 1, Female = 0), Reaction_Fast (Fast = 1, Slow/Average = 0), and Reaction_Slow (Slow = 1, Fast/Average = 0).
The model assigns each candidate a probability of passing between 0 and 1.
The cutoff that classifies a candidate as “Pass” or “Fail” is adjustable in Excel.
Coefficients were fitted by Maximum Likelihood Estimation (MLE); Training_Advanced and Training_Basic emerged as the main deciding factors for whether a person passes or fails the driving test.
A confusion matrix counts True Positives, True Negatives, False Positives, and False Negatives, from which accuracy, precision, and F1 score are calculated. The priority was to minimize False Positives, because the model should not mark an inefficient driver as an efficient one.
Because the cutoff is dynamic, changing it and refreshing the pivot table automatically updates TP, TN, FP, FN, accuracy, precision, and F1 score.
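The cutoff mechanism can be illustrated outside Excel with a short Python sketch. The probabilities and actual outcomes are hypothetical, and treating a probability equal to the cutoff as a predicted pass is an assumption; the workbook may break ties differently.

```python
def confusion_counts(probabilities, actuals, cutoff):
    """Return (TP, TN, FP, FN) when predicting Pass for probability >= cutoff."""
    tp = tn = fp = fn = 0
    for prob, actual in zip(probabilities, actuals):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical model outputs (1 = Pass, 0 = Fail), for illustration only.
probs = [0.92, 0.81, 0.67, 0.55, 0.48, 0.30, 0.74, 0.15]
actual = [1, 1, 1, 0, 1, 0, 0, 0]

for cutoff in (0.5, 0.8):
    counts = confusion_counts(probs, actual, cutoff)
    print(cutoff, counts, metrics(*counts))
```

Raising the cutoff in this toy data removes the false positives at the price of more false negatives, which is the trade-off the dynamic cutoff is meant to expose.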

Key Factors responsible for success

After filtering by p-value, Training_Advanced and Training_Basic remained the most important predictors. The negative intercept indicates that, without training or skills, the baseline chance of passing is low.

The strongest drivers of passing are Advanced Training (coef 0.45) and Basic Training (coef 0.31), followed by skill-based factors like Signals, Road Signs, and Mirror Usage. Demographics such as age and gender show no significant effect in the model.
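If the reported coefficients are standard logistic regression coefficients on the log-odds scale (an assumption; the source does not state the scale), exponentiating them gives odds ratios relative to applicants with no training. A minimal sketch:

```python
from math import exp

# Coefficients quoted in the text above; the log-odds interpretation is an assumption.
coefficients = {"Training_Advanced": 0.45, "Training_Basic": 0.31}

# exp(coefficient) is the multiplicative change in the odds of passing
# relative to the baseline category (no training), other predictors held fixed.
odds_ratios = {name: exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio of about {ratio:.2f}")
```

Under that assumption, Advanced training multiplies the odds of passing by roughly 1.57 and Basic training by roughly 1.36, with the other predictors held fixed.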

Prediction model Outcome

Failure Reasons

Data Analysis and Visualization

After data cleaning, preprocessing, and statistical analysis, all the Excel files were loaded into Power BI. Using DAX formulas, visualization charts, and other Power BI features, three dashboards were created to address the questions in the problem statement.

📊 Dashboard 1: DMV Road Test Analysis – Qualification Patterns by Demographics and Training

Purpose- Highlights trends in qualification rates across age groups, gender, race, and training types to uncover demographic and behavioral patterns.
Visualization-

📊 Dashboard 2: Training & Demographic Influence on Qualification Outcomes

Purpose- Evaluates the impact of different training programs and demographic attributes on pass/fail outcomes, helping identify factors that enhance success rates.
Visualization-

📊 Dashboard 3: Predicting Pass/Fail in DMV Road Test – Logistic Regression Insights

Purpose- Applies predictive modeling through logistic regression to estimate the likelihood of qualification, enabling data-driven decision support.
Visualization-

Key Findings

1. Training Impact: Applicants with Advanced training achieved the highest pass rate (88.16%). The Basic training group contained more candidates with slower reaction types.

2. Gender Insights: Males had the higher qualification rate (50.82%), while females had the higher fail rate (51.17%).

3. Age Trends: The Middle Age group (30–50) had the highest pass rate (39.36%), whereas Teenagers (16–19) experienced the highest fail rate.

4. Reaction & Performance: Candidates with fast reaction times had the highest pass rate (35.99%).

5. Race Analysis: The Others race group had the highest qualification rate (34.06%), and White candidates had the highest fail rate.

6. Test Scores: Average written and theory test scores were highest among Young Adults (20–29) and those with Advanced training.

7. Correlation Insights: Age positively correlates with qualification, indicating maturity improves success likelihood.

8. Predictive Modeling: Logistic regression showed applicants with Advanced or Basic training are more likely to pass.

9. Cutoff Optimization: At a 60–88% cutoff, the model achieved 73% accuracy with minimal false positives, aligning with the goal of reducing candidates incorrectly predicted as passing.

Recommendations

1. Promote Advanced Training Programs

Applicants who took advanced training had the highest chance of passing. Increasing access to such programs will improve pass rates.

2. Encourage Basic Training for Beginners

Even basic training gives applicants a better chance of success compared to no training. Making this more widely available can reduce failures.

3. Focus on Key Driving Skills

Indicators like signals, road signs, and mirror usage showed significant impact on outcomes. Extra practice and testing in these areas can raise success rates.

4. Regularly Monitor Results with Dashboards

An interactive dashboard helps DMV officials track pass rates by gender, age, and training type. This makes it easier to spot trends and adjust training programs quickly.

5. Track Demographic Trends

While gender showed no significant impact, age and training choices did. Monitoring subgroup performance ensures fairness and highlights where extra support is needed.

Limitations

1. Observational data — causality not proven.

We only looked at existing data. We have seen patterns (like training and pass rates), but we can’t be 100% sure that training causes higher pass rates. There may be other reasons.

2. Training selection may be endogenous (people who opt for training may differ)

The people who choose “advanced training” might already be more motivated or better prepared than those who don’t. So, their higher pass rate may not be only because of the training.

3. Possible measurement error in skill indicators

Some information might not be fully accurate (for example, if skills or training were recorded incorrectly, or if people reported it themselves instead of being observed).

4. Model may omit other relevant variables

We didn’t have all possible factors (like prior driving experience or family income). These could also influence pass/fail results but were not included in the model.
