An R package for identifying geospatial clusters from time series of counts by location. Locations can be expressed as counties, zip codes, census tracts, or other user-defined geographies. Users provide:
- a data.frame of counts by location and date
- a distance object that contains the distances between each location and its neighbors
The package provides functions to create these distance objects in either matrix or list format. These can be generated for census tracts, zip codes, or counties (FIPS codes), or can be constructed for custom locations by providing a data frame with columns for latitude and longitude (i.e., the centroid of each location).
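To make the two inputs concrete, here is a minimal base R sketch of what a counts data frame and a centroid-based distance matrix might look like. It does not use the package's own helpers: the `haversine_miles()` function, the `centroids` and `counts` objects, the approximate city coordinates, and the column names are all assumptions made for illustration, and the package's functions may compute or store distances differently.

```r
# Hypothetical custom locations with approximate centroid coordinates.
centroids <- data.frame(
  location  = c("columbus", "cleveland", "cincinnati"),
  latitude  = c(39.96, 41.50, 39.10),
  longitude = c(-83.00, -81.69, -84.51)
)

# Illustrative counts by location and date (values are made up).
counts <- data.frame(
  location = rep(centroids$location, each = 2),
  date     = rep(as.Date(c("2025-02-04", "2025-02-05")), times = 3),
  count    = c(1L, 0L, 6L, 7L, 2L, 0L)
)

# Great-circle (haversine) distance in miles between two sets of points,
# assuming a mean Earth radius of 3958.8 miles.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Pairwise distance matrix, with rows and columns named by location.
n <- nrow(centroids)
dist_mat <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_miles(
    centroids$latitude[i], centroids$longitude[i],
    centroids$latitude[j], centroids$longitude[j]
  )
})
dimnames(dist_mat) <- list(centroids$location, centroids$location)

print(round(dist_mat, 1))
```

The result is a symmetric matrix with zeros on the diagonal, which is the general shape a matrix-format distance object takes. For real analyses, the package's own distance functions are the safer choice, since they produce objects in the format the package expects.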
Install the gsClusterDetect package from CRAN as follows:
install.packages("gsClusterDetect")
Install the development version from GitHub as follows:
devtools::install_github("lmullany/gsClusterDetect")
- Load the package and provide a data frame with location, date, and count columns.
library(gsClusterDetect)
df <- example_count_data
tail(df)
location date count
<char> <IDat> <int>
1: 39171 2025-02-04 1
2: 39171 2025-02-05 0
3: 39173 2025-02-04 6
4: 39173 2025-02-05 7
5: 39175 2025-02-04 2
6: 39175 2025-02-05 0
- Generate the distance matrix for these locations. In this case, the synthetic data has counts from counties (FIPS codes) in the state of Ohio, so we use county_distance_matrix() and pass the state abbreviation:
ohio_dm <- county_distance_matrix("OH")
# This is a named list of two elements
cat("Class:", class(ohio_dm), "\nNames:", names(ohio_dm))
Class: list
Names: loc_vec distance_matrix
- Set the end of your target period. This is called the detect_date, and is a parameter that must be passed to the find_clusters function. Typically, this might be the current (or last available) date.
detect_date <- max(df[, date])
- Call the find_clusters() function; see ?find_clusters for the full set of options. Note that below, we pass the minimum required elements (cases, distance_matrix, and detect_date) and set the distance_limit (the maximum size of the clusters) to 50 (miles).
clusters <- find_clusters(
cases = df,
distance_matrix = ohio_dm[["distance_matrix"]],
detect_date = detect_date,
distance_limit = 50
)
- Luke Mullany Luke.Mullany@jhuapl.edu
- Howard Burkom Howard.Burkom@jhuapl.edu
Copyright 2026 The Johns Hopkins University Applied Physics Laboratory LLC.