This project evaluates the business impact of a redesigned digital client experience through an A/B test conducted by Vanguard Group. By comparing user behavior between the existing platform and the new interface, the analysis measures performance across key metrics including conversion, process completion time, and user drop-off.
The study goes beyond performance tracking to identify where the redesigned experience improves usability and where friction persists. Segment-level analysis reveals how different client groups respond to the new interface, providing a more nuanced understanding of customer behavior.
The project delivers data-driven recommendations to support product and business stakeholders in deciding whether to scale the new experience, refine specific journey steps, or maintain the current design, with the goal of improving both customer experience and operational efficiency.
```
├── config.yaml                  # Project configuration settings
├── main.py                      # Main script to run project workflows
├── pyproject.toml               # Project dependencies and configuration
├── uv.lock                      # Dependency lock file
├── README.md                    # Project documentation
├── data/                        # Project datasets
│   ├── raw/                     # Original raw datasets
│   │   ├── df_final_demo.txt
│   │   ├── df_final_experiment_clients.txt
│   │   ├── df_final_web_data_pt_1.txt
│   │   └── df_final_web_data_pt_2.txt
│   │
│   └── clean/                   # Cleaned and processed datasets
│       ├── demographics_dataset.csv
│       ├── df_main_merge.csv
│       ├── df_time_window.csv
│       ├── kpi_duration_dataset.csv
│       ├── kpi_completion_steps.csv
│       ├── kpi_completion_visits.csv
│       ├── kpi_merged.xlsx
│       └── kpi_errors.xlsx
├── figures/                     # Generated visualizations
│   ├── Conversion-Rate_Steps.png
│   ├── Drop-off-Rate_Steps.png
│   ├── Task-Completion-Time_Group.png
│   ├── Step-Time-Distribution_Step.png
│   ├── Client-Behavior_Age-Group.png
│   ├── Client-Tenure-Distribution.png
│   └── ... (additional plots)
├── notebooks/                   # Analysis and exploration notebooks
│   ├── KPI_errors_beatriz.ipynb
│   ├── demographics_rachel.ipynb
│   ├── KPI_completion_anne.ipynb
│   ├── KPI-duration_rachel.ipynb
│   ├── main_dataset_tasks_anne.ipynb
│   └── functions.py
```
This project analyzes a digital experiment conducted by Vanguard to improve its online user journey. A redesigned interface and in-context prompts were introduced to make the process more intuitive and increase completion rates. The main goal is to assess whether these changes lead to higher user engagement and more completed processes.
A/B testing is a method that compares two UI versions (Control vs Test) by randomly assigning users to evaluate which performs better based on defined metrics.
The analysis is based on three datasets:
- Client Profiles (df_final_demo): demographics and account details
- Digital Footprints (df_final_web_data_pt_1 and pt_2): online interactions (concatenated from two parts)
- Experiment Roster (df_final_experiment_clients): test participation
Key figures:
- Total clients: 70,609
- Selected for A/B test: 50,500
- Participants: 40,028
- Total visits: 51,998
- Period: March–June 2017
Objectives:
- Measure if the new experience increases process completion
- Evaluate the quality of randomization
- Provide recommendations
The preprocessing of the datasets involved several key steps to ensure accuracy and consistency for analysis:
- Remove duplicates – Duplicate rows across datasets were deleted to avoid double-counting.
- Format conversion – String date columns were converted to datetime format for proper time-based analysis.
- Filter unassigned clients – Clients not associated with any experimental group were removed from the Digital Footprints dataset to maintain valid comparisons.
- Merge datasets –
- The two parts of the Digital Footprints data were first concatenated.
- A left join was then performed with the Client Profiles (demo) dataset.
- Another left join was applied with the Experiment Roster to include group assignments.
- Remove redundant dates – The last_start and confirm date columns were dropped to simplify the analysis and focus on relevant event timestamps.
These steps created a clean, unified dataset suitable for all subsequent KPI calculations and A/B test analysis.
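The cleaning steps above can be sketched with pandas. Note that the column names used here (`client_id`, `date_time`, `Variation`) are assumptions based on the dataset descriptions, not a confirmed schema:

```python
import pandas as pd

def preprocess(web_pt1, web_pt2, demo, experiment):
    """Sketch of the cleaning pipeline: concatenate, dedupe, convert
    dates, merge, and filter unassigned clients.
    Column names are illustrative assumptions."""
    # Concatenate the two Digital Footprints parts
    web = pd.concat([web_pt1, web_pt2], ignore_index=True)
    # Remove duplicate rows to avoid double-counting
    web = web.drop_duplicates()
    # Convert string dates to datetime for time-based analysis
    web["date_time"] = pd.to_datetime(web["date_time"])
    # Left join with Client Profiles, then with the Experiment Roster
    merged = web.merge(demo, on="client_id", how="left")
    merged = merged.merge(experiment, on="client_id", how="left")
    # Keep only clients assigned to an experimental group
    merged = merged[merged["Variation"].notna()].reset_index(drop=True)
    return merged
```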
Randomization appears globally reliable across the experiment. The initial split between Control (46.6%) and Test (53.4%) groups is slightly imbalanced but remains close to a 50/50 distribution, which is acceptable. This balance is maintained among participating clients (45% vs 55%) and across total visits (44.8% vs 55.2%), suggesting consistent allocation throughout the funnel. Additionally, the very low proportion of erroneous visits (~0.18%) has minimal impact on the overall dataset. Overall, these elements indicate that the randomization process is sound and does not introduce major bias into the analysis.
The user base is primarily concentrated in the 30–50 age range, with most clients having moderate to long tenure. On average, clients hold around 2.27 accounts, indicating an already engaged and established customer base. The average account balance is approximately 152K, with a high standard deviation (~302K), revealing a strong right-skew distribution driven by a small group of high-value clients. This indicates that while most users are moderately engaged, a minority contributes disproportionately to overall value. Segment-level analysis highlights clear behavioral differences:
- Clients with longer tenure exhibit more stable engagement patterns (in terms of accounts and activity).
- Lower-tenure or less active clients show greater variability, suggesting higher friction or inconsistent usage.
- Higher-balance clients tend to navigate more efficiently, implying that familiarity and financial engagement reduce friction in the user journey.

Key insight: The platform is primarily used by financially engaged, established clients, but the wide variance in balance and activity indicates that user behavior is highly segmented — not uniform.
Completion rate measures the percentage of users who successfully complete a task or full process.
Formula:
- Step completion = users who complete a step / users who started it
- Process completion = users who complete all steps without errors / users who started
A full process follows this exact sequence: start → step_1 → step_2 → step_3 → confirm
Benchmark: 75–80%
Target improvement: +5% for Test group
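A minimal sketch of the two formulas, assuming a long-format events table with `visit_id` and `process_step` columns (names are illustrative); process completion is simplified here to the share of visits reaching `confirm`:

```python
import pandas as pd

STEPS = ["start", "step_1", "step_2", "step_3", "confirm"]

def step_completion(df):
    """Share of visits reaching each step, relative to visits
    that reached the previous step."""
    reached = (df.groupby("process_step")["visit_id"].nunique()
                 .reindex(STEPS, fill_value=0))
    # Step completion = visits completing a step / visits that started it
    return (reached / reached.shift(1)).dropna()

def process_completion(df):
    """Visits that reached 'confirm' / visits that reached 'start'."""
    starts = df.loc[df["process_step"] == "start", "visit_id"].nunique()
    confirms = df.loc[df["process_step"] == "confirm", "visit_id"].nunique()
    return confirms / starts if starts else 0.0
```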
Completion time measures the average duration per step and per group to evaluate efficiency and detect friction points.
Each visit was classified into one of four types:
| Type | Definition |
|---|---|
| Smooth | All 5 steps completed in correct order, zero errors |
| Lumpy – Repetition | Completed, but re-visited a step |
| Lumpy – Regression | Completed, but navigated backwards |
| Incomplete | Never reached confirm |
Errors were flagged by comparing each step to the previous one within the same visit:
- Repetition: the same step appears consecutively (e.g., step_2 → step_2)
- Regression: the step rank decreases (e.g., step_2 → step_1)
A visit was marked as having an error if at least one repetition or regression occurred during that visit.
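The flagging rules above can be sketched as follows, assuming columns `visit_id`, `date_time`, and `process_step` (column names are assumptions):

```python
import pandas as pd

STEP_RANK = {"start": 0, "step_1": 1, "step_2": 2, "step_3": 3, "confirm": 4}

def flag_errors(df):
    """Compare each step to the previous one within the same visit."""
    df = df.sort_values(["visit_id", "date_time"]).copy()
    rank = df["process_step"].map(STEP_RANK)
    prev = rank.groupby(df["visit_id"]).shift(1)
    # Repetition: the same step appears consecutively (step_2 -> step_2)
    df["repetition"] = rank.eq(prev)
    # Regression: the step rank decreases (step_2 -> step_1)
    df["regression"] = rank < prev
    return df

def visits_with_errors(df):
    """Boolean per visit: at least one repetition or regression."""
    flagged = flag_errors(df)
    per_visit = flagged.groupby("visit_id")[["repetition", "regression"]].any()
    return per_visit.any(axis=1)
```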
Overall completion rate:
- Control: 46.85%
- Test: 49.74%
The Test group performs better with a +2.89 percentage point increase (~6.16% relative), but both remain below the expected benchmark (~70%).
- Start: Test group performs better → improved entry experience
- Step 1 & 2: Control performs better → clearer guidance in current interface
- Step 3: Test performs better → stronger engagement at final stage
The very low p-value and high z-score indicate a statistically significant difference between the two groups.
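A difference in completion rates of this kind can be tested with a two-proportion z-test, e.g. via `statsmodels`. The counts below are illustrative placeholders chosen to match the reported rates, not the actual group sizes:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts approximating the reported 46.85% vs 49.74%
successes = [11550, 14250]   # completed visits: Control, Test
totals = [24653, 28650]      # visits started:   Control, Test

# Two-sided test of H0: the two completion proportions are equal
z_stat, p_value = proportions_ztest(count=successes, nobs=totals)
print(f"z = {z_stat:.2f}, p = {p_value:.4g}")
```

With samples in the tens of thousands, even a ~3pp gap yields a large |z| and a p-value far below 0.05, which matches the significance reported above.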
The improvement does not reach the +5% target. Overall completion remains relatively low.
The new interface shows promising results, especially for users progressing through the funnel, but further optimization is needed before full rollout.
Task duration analysis shows a strong right-skewed distribution: most users complete the process quickly, while a smaller group experiences significantly longer durations. After removing outliers using the IQR method, the analysis becomes more representative of typical user behavior. In this context, the median is a more reliable metric than the mean.
Step-level analysis reveals that time is unevenly distributed across the process, with specific steps acting as clear bottlenecks. These longer steps directly align with higher drop-off rates, highlighting a strong relationship between friction and abandonment. Users who take longer are significantly less likely to complete the process.
Focusing on the last attempt per user ensures that results reflect actual outcomes rather than repeated trials.
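A minimal sketch of the IQR outlier filter described above, applied to illustrative duration values:

```python
import pandas as pd

def remove_outliers_iqr(durations):
    """Drop durations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = durations.quantile(0.25), durations.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return durations[(durations >= lower) & (durations <= upper)]

# Illustrative right-skewed durations in seconds; with this skew,
# the median is a more reliable summary than the mean
durations = pd.Series([30, 35, 40, 42, 45, 50, 55, 600])
typical = remove_outliers_iqr(durations)
print(typical.median())  # the 600s outlier no longer distorts the summary
```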
Key insight:
Process inefficiencies are concentrated in a few critical steps, and these friction points have a direct impact on completion rates. Optimizing these stages would likely generate the highest gains in both performance and user experience.
- The user base is experienced but heterogeneous, with strong differences in engagement and value
- The process is globally efficient but impacted by localized friction points
- High-value and experienced users navigate more smoothly, while others face more difficulties
- Improvements should target both problematic steps and struggling user segments
The redesigned UI shows partial effectiveness:
- Users complete the process efficiently when no friction is encountered
- However, overall completion remains limited by a few high-friction steps
- The impact of the new UI is uneven across the journey, not uniformly positive
Step-level insights confirm that:
- A small number of steps concentrate most of the delays
- These steps also correspond to higher drop-off rates
- Most of the journey works well → issues are localized, not systemic
Is the new UI worth it?
Yes — but not fully optimized.
The redesign improves efficiency for many users, but unresolved friction points still limit its overall impact.
Key insight:
Targeted improvements on specific bottlenecks will likely deliver higher returns than a full redesign.
The start step was the most error-prone in both groups:
| Group | Step | Error Rate |
|---|---|---|
| Control | start | 8.68% |
| Control | step_2 | 2.64% |
| Control | step_1 | 2.35% |
| Control | step_3 | 0.69% |
| Control | confirm | 0.56% |
| Test | start | 12.45% |
| Test | step_1 | 2.81% |
| Test | confirm | 2.19% |
| Test | step_2 | 1.66% |
| Test | step_3 | 0.51% |
The Test group had a notably higher error rate at start (12.45% vs 8.68%) and at confirm (2.19% vs 0.56%).
Note: error rate here = number of erroneous step transitions at that step / total rows in that group.
The most common single error in both groups was start → start (the user repeated the start page):
- Control: 6,118 occurrences (6.08% error rate)
- Test: 9,184 occurrences (6.59% error rate)
Other notable high-error transitions in the Test group:
- step_1 → start: 4,878 errors (3.5%)
- confirm → confirm: 3,058 errors (2.19%)
- step_2 → step_1: 2,342 errors (1.68%)
The error rate here means: the percentage of visits by that age group that contained at least one navigation error (repetition or regression).
| Age Group | Control | Test |
|---|---|---|
| <30 | 34.14% | 40.31% |
| 30–50 | 31.36% | 40.26% |
| 50–70 | 35.33% | 47.85% |
| 70+ | 35.36% | 48.89% |
The Test group had higher error rates across all age groups. Users aged 50–70 and 70+ were the most affected — in the Test group, nearly 1 in 2 of their visits contained at least one navigation error. This suggests the new UI may present more friction for older users.
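The per-age-group rates can be computed from a visit-level table. The sketch below assumes columns `clnt_age`, `Variation`, and a precomputed boolean `has_error` flag per visit (all names are assumptions):

```python
import pandas as pd

def error_rate_by_age(visits):
    """Share of visits with at least one navigation error,
    per age group and experimental group."""
    bins = [0, 30, 50, 70, 200]
    labels = ["<30", "30-50", "50-70", "70+"]
    visits = visits.copy()
    visits["age_group"] = pd.cut(visits["clnt_age"], bins=bins,
                                 labels=labels, right=False)
    # Mean of a boolean column = proportion of visits with an error
    return (visits.groupby(["age_group", "Variation"], observed=True)
                  ["has_error"].mean()
                  .unstack())
```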
The A/B test conducted on Vanguard’s digital platform reveals partial improvements in process completion due to the redesigned interface, but several friction points remain.
- Completion rate: The Test group shows a small improvement (+2.89pp), but the absolute rate remains below the 70% benchmark. Step-level analysis shows uneven performance, with some steps performing worse than the Control group.
- Completion time: Median task duration is reasonable for most users, but a few bottlenecks increase friction and reduce completion probability.
- Error rate: The Test group exhibits higher errors, especially at start and among older users (50–70 and 70+), indicating that the new UI may be less intuitive for certain segments.
User feedback insights:
- Repeated navigation errors suggest users struggle with unclear instructions or step layout.
- Experienced users navigate efficiently, but less familiar or older clients encounter significant friction.
- Users value clarity and simplicity; additional guidance could reduce repeated attempts and regressions.
- Selective participation: Repeated or self-selected participation may skew results toward more engaged users.
- Device/system variability: Differences in devices or browsers could affect user experience and measured KPIs.
- Temporal changes: Any modifications to the platform during the experiment could introduce confounding effects.
Synthesis: While the new interface shows promise for engaged users, overall performance improvements are modest. Friction points persist, particularly for less experienced or older users, and error rates in critical steps remain high.
We do not recommend launching the new version after the A/B test. Instead, we suggest focusing on improving the intermediate steps of the process, reducing errors before implementation, and leveraging additional information such as device/browser differences and user feedback to guide targeted optimizations.
Visual analysis was performed using a combination of Python, Prezi, and Tableau to provide both static and interactive insights. Please refer to the Resources section below.
- pandas – data manipulation and aggregation
- matplotlib & seaborn – histograms, boxplots, scatter plots, and bar charts
- plotly – interactive charts for step duration and funnel analysis
- Prezi – dynamic presentation of key findings with embedded charts
- Tableau – interactive dashboards for step-level completion rates, user flows, and duration analysis
- Users can filter by group, tenure, or balance
- Allows drill-down into specific steps and client segments
This multi-tool approach ensures both detailed statistical insight and accessible visual storytelling for stakeholders.
- ipykernel>=7.2.0
- jupyter>=1.1.1
- matplotlib>=3.10.8
- numpy>=2.4.3
- pandas>=3.0.1
- polars>=1.39.0
- seaborn>=0.13.2
- statsmodels>=0.14.6
- scipy>=1.11.1
- plotly>=2.26.0
- kaleido>=0.2.1
Beatriz Fernandez, Anne Leschallier de Lisle, Rachel Vianna