An implementation of "POLIGRAS: Policy-based Graph Summarization".
All datasets can be accessed [here]. Because GitHub limits uploaded files to 25 MB, some datasets cannot be uploaded to the repository directly.
The astro-ph and cnr-200 datasets are included as .zip files in ./dataset/. Before running the code on a specific dataset, first create a directory with the same name as the dataset, then download and unzip the dataset files from the link above and place them in that directory.
For example, if running on the in-2004 dataset, users can execute the following steps:
- Create the directory ./dataset/in-2004/:
mkdir ./dataset/in-2004/
- Download and unzip the in-2004 dataset files, which include the in-2004_graph file that contains the graph structure and the in-2004_feat file that contains the node features:
unzip in-2004_graph.zip
unzip in-2004_feat.zip
- Move the in-2004_graph and in-2004_feat files into ./dataset/in-2004/:
mv in-2004_graph in-2004_feat ./dataset/in-2004/
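Before starting a run, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch, assuming the `<name>_graph` / `<name>_feat` naming from the in-2004 example applies to every dataset; the helper `missing_dataset_files` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_dataset_files(dataset_name, root="./dataset"):
    """Return the expected dataset files that are not present.

    Assumes the layout from the in-2004 example:
    <root>/<name>/<name>_graph and <root>/<name>/<name>_feat.
    """
    dataset_dir = Path(root) / dataset_name
    expected = [
        dataset_dir / f"{dataset_name}_graph",
        dataset_dir / f"{dataset_name}_feat",
    ]
    return [str(path) for path in expected if not path.is_file()]
```

Calling `missing_dataset_files("in-2004")` returns an empty list when both files are in place, and the paths of the missing files otherwise.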
In detail, the in-2004_graph file stores the graph structure in the NetworkX graph format; it can be generated from other graph formats (e.g., an edge list) with the provided networkx_graph_generation.py file. The in-2004_feat file stores the node feature matrix, with one row per node and one column per feature dimension; it can be generated with the provided node_feature_generation.py file.
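To illustrate the kind of conversion described here, the sketch below reads an edge list and stores it as a NetworkX graph. It is not the code in networkx_graph_generation.py: the function names are hypothetical, and it assumes integer node IDs, an undirected graph, and pickle as the on-disk format, none of which this README confirms. Use the provided script for real datasets.

```python
import pickle


def read_edge_list(path):
    """Parse a whitespace-separated edge list, one "u v" pair per line.

    Blank lines and lines starting with '#' are skipped; any columns
    after the first two are ignored.
    """
    edges = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            u, v = line.split()[:2]
            edges.append((int(u), int(v)))
    return edges


def save_as_networkx(edges, out_path):
    """Build an undirected NetworkX graph and pickle it (assumed format)."""
    import networkx as nx  # imported here so the parser works without it

    graph = nx.Graph()
    graph.add_edges_from(edges)
    with open(out_path, "wb") as handle:
        pickle.dump(graph, handle)
    return graph
```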
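Install the following tools and packages:
- python3: python3 is assumed by default (use pip3 to install packages).
- numpy
- torch
- random
- networkx
- copy
- argparse
- pickle
- glob
Note that random, copy, argparse, pickle, and glob are part of the Python standard library and need no separate installation.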
Users can also install all the above packages with the provided installer.txt:
pip3 install -r installer.txt
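After installation, a quick way to confirm that every module from the package list can be found is to query the import system without actually importing anything. This is a small sketch using only the standard library; `find_missing` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Module names taken from the package list above.
REQUIRED_MODULES = ["numpy", "torch", "random", "networkx", "copy",
                    "argparse", "pickle", "glob"]


def find_missing(modules):
    """Return the modules that cannot be located in the current environment."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```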
The following command trains and runs the Poligras model on a specific dataset:
python3 src/run.py --dataset dataset_name
Users can also set further model options:
- --dataset: dataset to run;
- --counts: number of graph summarization iterations;
- --lr: learning rate.
python3 src/run.py --dataset astro-ph --counts 100 --lr 0.001
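For readers unfamiliar with argparse, the three options could be declared roughly as follows. This is only a sketch: the option names come from this README, but the types and default values are assumptions and may differ from what run.py actually uses.

```python
import argparse


def build_parser():
    """Sketch of a parser for the three documented options.

    The types and defaults are assumptions, not the values used in run.py.
    """
    parser = argparse.ArgumentParser(description="Poligras options (sketch)")
    parser.add_argument("--dataset", type=str,
                        help="dataset to run")
    parser.add_argument("--counts", type=int, default=100,
                        help="number of graph summarization iterations")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="learning rate")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["--dataset", "astro-ph", "--counts", "100", "--lr", "0.001"]
)
print(args.dataset, args.counts, args.lr)
```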
After the run finishes, the graph summary (including the supernodes, superedges, and edge correction set) is stored in a file named datasetname_graph_summary. The total summarization reward is printed in the following format:
#super edge: ####
correction set size: ####
-------SuperNode encoding ended, total reward is ####---------.
Apart from the total summarization reward, the output also reports the number of superedges and the correction set size of the final graph summary.
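When comparing several runs, it may be convenient to pull these three numbers out of captured console output. The sketch below matches the example format shown above and assumes that each "####" placeholder is printed as a plain integer or decimal number; `parse_summary_output` is a hypothetical helper, not part of this repository.

```python
import re

# Patterns mirror the three example output lines; the values printed in
# place of "####" are assumed to be plain numbers.
NUMBER = r"(-?\d+(?:\.\d+)?)"
PATTERNS = {
    "superedges": re.compile(r"#super edge:\s*" + NUMBER),
    "correction_set_size": re.compile(r"correction set size:\s*" + NUMBER),
    "total_reward": re.compile(r"total reward is\s*" + NUMBER),
}


def parse_summary_output(text):
    """Extract the reported numbers from captured run.py output.

    Returns a dict with one float per line that was found in the text.
    """
    result = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[key] = float(match.group(1))
    return result
```

The output can be captured by redirecting the run to a file and passing the file contents to `parse_summary_output`.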
run.py sets all hyperparameters and imports the Poligras model implemented in model.py.
model.py contains the Poligras model and is imported by run.py when the code runs.
networkx_graph_generation.py generates the NetworkX-format graph from an initial graph stored in another format (e.g., an edge list).
node_feature_generation.py generates node features for the NetworkX graph produced by networkx_graph_generation.py.