From 7b2ef08874bdc94c4dd458301c884014c7b423cb Mon Sep 17 00:00:00 2001
From: Evan West
Date: Tue, 4 Feb 2025 14:54:43 -0500
Subject: [PATCH 1/2] Update README.md

---
 README.md | 75 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 8b1ee1a..b6b5465 100644
--- a/README.md
+++ b/README.md
@@ -1,45 +1,72 @@
-# How to Run
+# Streaming Connected Components with Dynamic Queries
+Previous [implementation](https://github.com/GraphStreamingProject/GraphStreamingCC) of graph semi-streaming algorithms can efficiently process a stream of edge insertions and deletions and then return the connected components of the graph. In this repo we will attempt to build an implementation that can efficiently answer queries interspersed with the stream while retaining update performance.
+
+## Based Upon
+* [Dynamic graph connectivity in poly-logarithmic worst case time](https://dl.acm.org/doi/10.5555/2627817.2627898) Kapron, King, Mountjoy.
+* Our paper here, once its published
+
+## Building and Running
+
+### Requirements
+- Unix OS (not Mac, tested on Ubuntu)
+- cmake>=3.15
+- openmpi
+- openmp
+
+### Building
+1. Clone this repository
+2. Create a build sub directory
+3. From build run `cmake ..`
+4. Run `make`
+```
+git clone https://github.com/etwest/DynamicQueriesCC.git
+mkdir build
+cd build
+cmake ..
+make -j
+```
+
+### Run unit tests
+Run our unit tests with the following command:`./dynamicCC_tests`
+
+## Experiments
+
+### Acquire Data
+Place instructions for downloading datasets here.
+
+## How to Run
 Setup:
-* ```$ mkdir build```
-* ```$ cd build```
-* ```$ mkdir binary_streams```
-* ```$ ln -s /mnt/ssd2/fast_query_project/binary_streams/* binary_streams```
-* ```$ cmake .. -DCMAKE_BUILD_TYPE=Release``` or release type of choice (Release, RelWithDebInfo, Debug)
-* ```$ make$```
+- Complete build steps above
+- Create a link to datasets called `binary_streams` in the build directory: `ln -s binary_streams`
 
 Run With Scripts:
 * To run performance tests for update speed, query speed, or correctness tests, uncomment the script file to run on the input stream of your choice, and then run one of the following scripts:
-* You may have to make the scripts executable with ```$ chmod +x [script_file]```
-* ```$ scripts/mpi_update_test```
-* ```$ scripts/mpi_query_test```
-* ```$ scripts/mpi_correct_test```
-* ```$ scripts/mpi_space_test```
-* * In the space test script you can change the numerical value TIME in ```./../scripts/mem_record.sh mpi_dynamicCC_tests [TIME] ./../results/mpi_space_results/$2_mem.txt``` to determine how many seconds are inbetween each memory measurement
+* You may have to make the scripts executable with `chmod +x [script_file]`
+* `scripts/mpi_update_test`
+* `scripts/mpi_query_test`
+* `scripts/mpi_correct_test`
+* `scripts/mpi_space_test`
+* In the space test script you can change the numerical value TIME in `./../scripts/mem_record.sh mpi_dynamicCC_tests [TIME] ./../results/mpi_space_results/$2_mem.txt` to determine how many seconds are inbetween each memory measurement
 
 Tweaking Hyperparameters:
-* Update batch size: in ```test/mpi_graph_tiers_test.cpp``` edit the ```DEFAULT_BATCH_SIZE``` variable.
-* Sketch buffer size: in ```include/skiplist.h``` edit the ```skiplist_buffer_cap``` variable.
-* Skiplist height: in ```test/mpi_graph_tiers_test.cpp``` in the specific test you want to run edit the ```height_factor``` and\or ```sketchless_height_factor``` variables. Note that the first variable is for the skiplists in the Euler tour trees for each tier, and the second variable is only for the single query Euler tour tree on the input node (not containing sketches).
+* Update batch size: in `test/mpi_graph_tiers_test.cpp` edit the `DEFAULT_BATCH_SIZE` variable.
+* Sketch buffer size: in `include/skiplist.h` edit the `skiplist_buffer_cap` variable.
+* Skiplist height: in `test/mpi_graph_tiers_test.cpp` in the specific test you want to run edit the `height_factor` and\or `sketchless_height_factor` variables. Note that the first variable is for the skiplists in the Euler tour trees for each tier, and the second variable is only for the single query Euler tour tree on the input node (not containing sketches).
 
 Run OMP Version Manually:
-* ```$ ./dynamicCC_tests [binary_stream_file] --gtest_filter=*[filter]*```
+* `./dynamicCC_tests [binary_stream_file] --gtest_filter=*[filter]*`
 * Possible filters: omp_speed, omp_correct, query_speed, etc.
 * Possible streams: kron_13_stream_binary, kron_15_stream_binary, etc.
 
 Run MPI Version Manually:
-* ```$ mpirun -np [num_processes] ./mpi_dynamicCC_tests [binary_stream_file] --gtest_filter=*[filter]*```
+* `mpirun -np [num_processes] ./mpi_dynamicCC_tests [binary_stream_file] --gtest_filter=*[filter]*`
 * num_processes: you can try to guess the number of processes and run it, the program will tell you the correct number for your input
 * Possible filters: mpi_speed, mpi_correct, mpi_queries, etc.
 * Possible streams: kron_13_stream_binary, kron_15_stream_binary, etc.
 
-# Streaming Connected Components with Dynamic Queries
-Using our previous [implementation](https://github.com/GraphStreamingProject/GraphStreamingCC) we can efficiently process a stream of edge insertions and deletions and then return the connected components of the graph. In this repo we will attempt to build an implementation that can efficiently answer queries interspersed with the stream.
 
-# Based Upon
-* [Dynamic graph connectivity in poly-logarithmic worst case time](https://dl.acm.org/doi/10.5555/2627817.2627898) Kapron, King, Mountjoy.
-* Our paper here, once its published
 
-# Data-structures
+## Data-structures
 
 ### Euler Tour Tree
 Euler Tour Trees are a spanning tree representation that can be efficiently joined togther or cut apart. Additionally, they can efficient maintain sub-tree aggregate information. The key to this algorithm is that this sub-tree aggregate information can include the sum over the sketches for all nodes in the sub-tree. We can then use these sketches to sample edges in the cut of this sub-tree and the rest of the graph. This allows us to efficiently use sketches to join ETTs and to potentially repair deletions that shouldn't change the connected components.

From 692745a37d8d659fbf9a861eaf11cb532daaf189 Mon Sep 17 00:00:00 2001
From: Evan West
Date: Tue, 4 Feb 2025 17:33:21 -0500
Subject: [PATCH 2/2] Update README.md

---
 README.md | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index b6b5465..bbe5fcc 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,7 @@
 # Streaming Connected Components with Dynamic Queries
-Previous [implementation](https://github.com/GraphStreamingProject/GraphStreamingCC) of graph semi-streaming algorithms can efficiently process a stream of edge insertions and deletions and then return the connected components of the graph. In this repo we will attempt to build an implementation that can efficiently answer queries interspersed with the stream while retaining update performance.
+This repository solves fully dynamic graph connectivity in the semi-streaming setting. It achieves fast update throughputs (millions of updates per second) and compact data-structures (orders of magnitude smaller than other dynamic connectivity systems), while also answering connectivity queries (of the form is `u` connected to `v`) in less than a 0.00001 seconds.
 
-## Based Upon
-* [Dynamic graph connectivity in poly-logarithmic worst case time](https://dl.acm.org/doi/10.5555/2627817.2627898) Kapron, King, Mountjoy.
-* Our paper here, once its published
+Cite our paper here, once published.
 
 ## Building and Running
 
@@ -71,9 +69,9 @@ Run MPI Version Manually:
 ### Euler Tour Tree
 Euler Tour Trees are a spanning tree representation that can be efficiently joined togther or cut apart. Additionally, they can efficient maintain sub-tree aggregate information. The key to this algorithm is that this sub-tree aggregate information can include the sum over the sketches for all nodes in the sub-tree. We can then use these sketches to sample edges in the cut of this sub-tree and the rest of the graph. This allows us to efficiently use sketches to join ETTs and to potentially repair deletions that shouldn't change the connected components.
 
-Additionally, we increase the usefulness of the euler tour trees by having them track the highest weight edge added at a given tier for each node. This is done so that when inserting at a lower tier would cause a cycle we can delete this tracked edge. The Kapron, King, Mountjoy paper uses link-cut-trees for this purpose but we have found that it is unnecessary to do so.
+We implement our Euler Tour Trees with reduced height skip-lists (see our paper).
 
 ### Linear Sketching
 In the context of graph streaming, Linear Sketching data-structures represent a collection a edges incident to a node (or cluster of nodes) in sub-linear space. This sub-linear representation can be queried and will return an edge incident to the node (or cluster) at random.
 
-We use our linear sketch `CubeSketch` that we describe and implement in `paper reference here` and in our previous implementation. The paper by Kapron, King, Mountjoy utilizes a different version of linear sketching. The performance of our sketching technique is a log faster in the average case.
+We use the linear sketch `CameoSketch` in our implementation.