Skip to content

Latest commit

 

History

History

README.md

Declustering: Correcting Hierarchical Clustering Artifacts

Overview

When a hierarchical clustering algorithm (such as iterative Fiedler) is used to partition a graph, it can happen that subclusters of two previously separated clusters are actually more strongly connected to each other than to their assigned groups. Declustering is the process of identifying and merging these incorrectly separated subclusters.

The Problem

Consider a weighted graph with four groups of vertices (A, B, C, D) where:

  • Groups A and D are large (40 vertices each)
  • Groups B and C are small (10 vertices each)
  • B and C are strongly connected to each other (edge weight = 40)

The iterative Fiedler method proceeds as follows:

  1. Step 1: Split into {A, B} and {C, D} (the large groups dominate)
  2. Step 2: Split {A, B} into {A} and {B}
  3. Step 3: Split {C, D} into {C} and {D}

This produces 4 clusters: {A}, {B}, {C}, {D}. But B and C were separated in Step 1 only because of the influence of the large groups A and D, not because they belong apart.

The Solution

After hierarchical clustering, check whether merging adjacent subclusters improves the modularity. In this example, merging B and C into a single cluster {B, C} gives 3 clusters: {A}, {B, C}, {D} -- which has higher modularity than the 4-cluster result.

Mathematical Details

The Girvan-Newman modularity is computed as:

Q = trace(E) - ||E^2||

where E_ij is the fraction of edges connecting cluster i to cluster j, normalized by total edge weight.

See declustering.pdf for the full mathematical treatment.

Usage

% Run the numerical example
numerical_example

This generates:

  • The weighted adjacency matrix visualization
  • Hierarchical Fiedler clustering into 2, 3, and 4 clusters
  • Modularity comparison showing the declustered (3-cluster) solution is superior

Files

File Description
numerical_example.m Main script demonstrating the declustering concept
declustering.tex LaTeX source for the explanatory document
declustering.pdf Compiled PDF with full mathematical explanation
sample_graph*.png Visualizations of the graph under different clusterings
modularities.png Modularity comparison plot

Visual Results

The key insight is shown in the modularity comparison: the 3-cluster "declustered" solution achieves higher modularity than either the 2-cluster or 4-cluster solutions from the standard hierarchical method.

Declustering Example