Process Mining with PM4Py

A Hands-On Tutorial Using an HR Onboarding Event Log

This project demonstrates how to perform process discovery, bottleneck detection, and visual analysis using the PM4Py process mining library in Python.
The notebook loads an HR onboarding event log from a CSV file, converts it into a PM4Py event log, and generates several Directly-Follows Graphs (DFGs).

Project Overview

This notebook covers:

Loading and preparing an event log dataset (HR onboarding)
Converting a CSV file into a PM4Py-compatible event log
Discovering the process model using:
- Standard DFG
- Frequency-based DFG
- Performance-based DFG
Visualizing each discovered model
Understanding process flow and identifying inefficiencies

Dataset Description

The dataset contains dummy HR onboarding events with fields including:

Employee_ID: unique case identifier (one case per employee)
Activity: process step
Timestamp: event time

These were mapped to PM4Py event-log fields:

Employee_ID is mapped to case:concept:name
Activity is mapped to concept:name
Timestamp is mapped to time:timestamp
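To make the role of the three fields concrete, here is a minimal sketch in plain Python of how they turn a flat table into traces: the case ID groups events, the timestamp orders them, and the activity name is what the process model is built from. The event values and the `build_traces` helper are invented for illustration; they are not taken from the project's CSV or from PM4Py.

```python
from collections import defaultdict
from datetime import datetime

# Toy events using the three PM4Py field names. The values are made up
# for illustration and do not come from HR_Onboarding.csv.
events = [
    {"case:concept:name": "E1", "concept:name": "Offer Accepted",
     "time:timestamp": datetime(2024, 1, 2, 9, 0)},
    {"case:concept:name": "E2", "concept:name": "Offer Accepted",
     "time:timestamp": datetime(2024, 1, 3, 10, 0)},
    {"case:concept:name": "E1", "concept:name": "Account Created",
     "time:timestamp": datetime(2024, 1, 5, 8, 15)},
    {"case:concept:name": "E1", "concept:name": "Documents Submitted",
     "time:timestamp": datetime(2024, 1, 4, 14, 30)},
    {"case:concept:name": "E2", "concept:name": "Documents Submitted",
     "time:timestamp": datetime(2024, 1, 5, 11, 0)},
]


def build_traces(events):
    """Group events by case ID and order each case by timestamp (hypothetical helper)."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case:concept:name"]].append(event)

    traces = {}
    for case_id, case_events in by_case.items():
        ordered = sorted(case_events, key=lambda e: e["time:timestamp"])
        traces[case_id] = [e["concept:name"] for e in ordered]
    return traces


traces = build_traces(events)
for case_id, activities in traces.items():
    print(case_id, "->", activities)
```

Each value in `traces` is the ordered activity sequence for one employee, which is the unit that process discovery works on.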

Technologies Used

Python 3
PM4Py – Process Mining for Python
Pandas – Data handling
Google Colab – Runtime environment

Installation

Inside the notebook, PM4Py is installed with:

!pip install pm4py

Import Libraries & Load Dataset

Load your CSV event log and inspect the initial data.

import pandas as pd
import pm4py

# Load the raw event log and preview the first rows
df = pd.read_csv('HR_Onboarding.csv')
df.head()

Convert DataFrame → PM4Py Event Log

Convert your cleaned DataFrame into a PM4Py event log.

df = df.rename(columns={
    "Employee_ID": "case:concept:name",
    "Activity": "concept:name",
    "Timestamp": "time:timestamp"
})
df["time:timestamp"] = pd.to_datetime(df["time:timestamp"])
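Before handing the DataFrame to PM4Py, it can help to confirm that the three required columns exist and that every timestamp parsed. The following is a minimal sketch of such a check, not part of the original notebook. The sample rows are invented stand-ins for the renamed DataFrame, and dropping bad rows is only one possible policy.

```python
import pandas as pd

# Hypothetical rows standing in for the renamed DataFrame; one timestamp
# is deliberately malformed and the E1 rows are out of order.
df = pd.DataFrame({
    "case:concept:name": ["E1", "E1", "E2", "E2"],
    "concept:name": ["Account Created", "Offer Accepted",
                     "Offer Accepted", "Account Created"],
    "time:timestamp": ["2024-01-05 08:15", "2024-01-02 09:00",
                       "not a date", "2024-01-03 10:00"],
})

required = ["case:concept:name", "concept:name", "time:timestamp"]
missing = [col for col in required if col not in df.columns]
if missing:
    raise ValueError(f"Missing required columns: {missing}")

# errors="coerce" turns unparseable timestamps into NaT instead of raising.
df["time:timestamp"] = pd.to_datetime(df["time:timestamp"], errors="coerce")
n_bad = int(df["time:timestamp"].isna().sum())
print(f"Rows with unparseable timestamps: {n_bad}")

# Drop rows with missing key fields, then order events within each case.
clean = (
    df.dropna(subset=required)
      .sort_values(["case:concept:name", "time:timestamp"])
      .reset_index(drop=True)
)
print(clean)
```

Sorting by case and timestamp makes the event order explicit, so the directly-follows relations do not depend on the row order of the CSV.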

Discover Standard Directly-Follows Graph (DFG)

Generate and visualize the initial DFG.

from pm4py.algo.discovery.dfg import algorithm as dfg_discovery
from pm4py.visualization.dfg import visualizer as dfg_visualization

# Convert the DataFrame to an event log, then count directly-follows pairs
log = pm4py.convert_to_event_log(df)
dfg = dfg_discovery.apply(log)
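Conceptually, a DFG is a count of how often one activity is immediately followed by another within the same case. The sketch below reproduces that counting in plain Python on toy traces. It illustrates the idea only and is not PM4Py's implementation; the trace contents and the `directly_follows` helper are assumptions made for the example.

```python
from collections import Counter

# Toy traces (one ordered activity list per case); the names are invented.
traces = [
    ["Offer Accepted", "Documents Submitted", "Account Created"],
    ["Offer Accepted", "Documents Submitted", "Account Created"],
    ["Offer Accepted", "Account Created"],
]


def directly_follows(traces):
    """Count each (source, target) pair of consecutive activities (hypothetical helper)."""
    dfg = Counter()
    for trace in traces:
        # zip(trace, trace[1:]) yields every pair of neighbouring activities.
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += 1
    return dfg


dfg = directly_follows(traces)
for (source, target), count in dfg.items():
    print(f"{source} -> {target}: {count}")
```

The result is a mapping from activity pairs to counts. The `dfg` object produced by the discovery step can be read the same way, with each key being a pair of activity names.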

Standard DFG Visualization

gviz = dfg_visualization.apply(dfg, log=log)
dfg_visualization.view(gviz)

This renders the DFG and displays it as an image.

Frequency-Based DFG

from pm4py.algo.discovery.dfg import algorithm as dfg_discovery
from pm4py.visualization.dfg import visualizer as dfg_visualization

dfg_freq = dfg_discovery.apply(log, variant=dfg_discovery.Variants.FREQUENCY)
gviz_freq = dfg_visualization.apply(dfg_freq, log=log, variant=dfg_visualization.Variants.FREQUENCY)
dfg_visualization.view(gviz_freq)
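Besides the picture, the frequency counts can be read as a ranked list, which is often quicker for spotting the dominant path and rare deviations. The sketch below assumes the frequency DFG behaves like a mapping from activity pairs to counts. The numbers are invented for illustration and are not results from the HR onboarding log.

```python
from collections import Counter

# Invented edge counts standing in for a frequency DFG.
dfg_freq = Counter({
    ("Offer Accepted", "Documents Submitted"): 8,
    ("Documents Submitted", "Account Created"): 7,
    ("Documents Submitted", "Documents Submitted"): 3,
    ("Offer Accepted", "Account Created"): 2,
})

total = sum(dfg_freq.values())

# most_common() returns edges sorted from most to least frequent.
ranked = [
    (edge, count, count / total)
    for edge, count in dfg_freq.most_common()
]

for (source, target), count, share in ranked:
    print(f"{source} -> {target}: {count} ({share:.0%} of transitions)")
```

Edges with a small share, such as a step that loops back to itself, are usually the first candidates to inspect for rework or skipped steps.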

Performance-Based DFG

dfg_perf = dfg_discovery.apply(log, variant=dfg_discovery.Variants.PERFORMANCE)
gviz_perf = dfg_visualization.apply(dfg_perf, log=log, variant=dfg_visualization.Variants.PERFORMANCE)
dfg_visualization.view(gviz_perf)
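A performance DFG annotates each edge with the time elapsed between the two activities instead of a count. The sketch below shows the underlying arithmetic in plain Python, using the mean as the aggregate (to my knowledge also PM4Py's default, though that is an assumption here). The timestamps and the `mean_edge_durations` helper are invented for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Toy timestamped traces; values are made up for illustration.
traces = {
    "E1": [
        ("Offer Accepted", datetime(2024, 1, 2, 9, 0)),
        ("Documents Submitted", datetime(2024, 1, 4, 9, 0)),
        ("Account Created", datetime(2024, 1, 4, 15, 0)),
    ],
    "E2": [
        ("Offer Accepted", datetime(2024, 1, 3, 9, 0)),
        ("Documents Submitted", datetime(2024, 1, 7, 9, 0)),
        ("Account Created", datetime(2024, 1, 7, 11, 0)),
    ],
}


def mean_edge_durations(traces):
    """Mean seconds between consecutive activities, per edge (hypothetical helper)."""
    durations = defaultdict(list)
    for events in traces.values():
        ordered = sorted(events, key=lambda e: e[1])
        for (source, t_source), (target, t_target) in zip(ordered, ordered[1:]):
            durations[(source, target)].append((t_target - t_source).total_seconds())
    return {edge: mean(values) for edge, values in durations.items()}


perf = mean_edge_durations(traces)

# The edge with the largest mean waiting time is a bottleneck candidate.
bottleneck = max(perf, key=perf.get)
for (source, target), seconds in perf.items():
    print(f"{source} -> {target}: {seconds / 3600:.1f} h on average")
print("Slowest transition:", bottleneck)
```

In this toy data the wait between accepting the offer and submitting documents dominates, which is the kind of edge the performance DFG highlights visually.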

Repository Structure

| File | Description |
| --- | --- |
| mining.ipynb | Main Google Colab notebook |
| HR_Onboarding.csv | Input event log dataset |
| dfg_frequency.png | Frequency DFG image |
| dfg_performance.png | Performance DFG image |
| README.md | Project documentation |

Purpose of the Notebook

This notebook helps you:

Understand event logs and their structure
Transform raw process data into PM4Py format
Discover process models using DFGs
Analyze bottlenecks using performance metrics
Build a reproducible process mining workflow

How to Use This Notebook

Open the notebook in Google Colab
Upload the event log CSV
Run all cells sequentially
Visualizations (DFGs) will appear automatically

The HR Onboarding dataset contains these columns:

Employee_ID
Activity
Timestamp

These are converted into PM4Py’s event log format during preprocessing.

References

PM4Py Documentation

Author Satish Gupta

This project was created as part of a hands-on learning exercise in Process Mining with Python & PM4Py.
