Skip to content

Jun-Zhang-32108/Data-Mining

Repository files navigation

Data Mining Project

Attention! To see the equations in this file with chrome browser without flaws, we recommend a plugin for chrome browser called MathJax Plugin for Github.

Author: Jun Zhang <jun.1.zhang@aalto.fi>, Zhiheng Qian <zhiheng.qian@aalto.fi>

The objective of this project is to design and implement a algorithm to partition a graph into communities. We use pre-processed graphs from the Stanford Network Analysis Project. The preprocessed datasets are in the folder "\graph_processed". Given an undirected graph $G = (V, E)$ and an integer $k &gt; 1$, we need to partition the set of vertices $V$ into $k$ communities $(V_1,...,V_k)$, so that$\cup_{i=1}^{k}V_i = V$ and $V_i \cap V_j = \emptyset$ for all $i\neq j$. We want the communities $V_1, . . . , V_k$ to be as much separated from each other as possible. We also want that the communities have roughly equal size. Thus, we will evaluate the goodness of a partition $V_1,...,V_k$ using the objective function $$\phi (V_1,...,V_k)=\sum_{i=1}^{k} \frac{|E(V_i,\overline{V_i})|}{|V_i|}$$ where for $S, T \subseteq V$ with $S \cap T = \phi$ we define $E(S, T)$ to be the set of edges of $G$ with one endpoint in $S$ and the other endpoint in $T$, i.e., $E(S,T) = {(u,v) \in E | u \in S; and; v \in T}$, and we also define $\overline{V_i} = V \backslash V_i$ .

The baseline algorithm is spectral clustering with RatioCut. We also explored a couple of ways to optimize the general spectral clustering and do some comparisons. To see the complete comparsons in detail, please refer to the file "DataMining_Project.pdf".

To run the program, please use the following command:

    python3 graph_partition.py [filename]

If you want to save the eigenvectors, please add the parameter '--save', i.e:

    python3 graph_partition.py [filename] --save

The result will be save in the folder "\result". Some useful statistics will be saved in the foler "\stat". Notice that some datasets such as soc-Epinions1, web-NotreDame, roadNet-CA are too big to be ran on the PC. In that case, please use powerful server instead.

About

Data Mining Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published