gsClusterDetect

Description

An R package for identifying geospatial clusters from time series of counts by location. Locations can be expressed as counties, zip codes, census tracts, or other user-defined geographies. Users provide:

a data.frame of counts by location and date
a distance object that contains the distances between each location and its neighbors

The package provides functions to create these distance objects in either matrix or list format. These can be generated for census tracts, zip codes, or counties (FIPS codes), or can be constructed for custom locations by providing a data frame with columns for latitude and longitude (i.e., the centroid of each location).
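To make the custom-location case concrete, here is a minimal base R sketch of how a distance matrix in miles could be computed from centroid coordinates using the haversine formula. The `sites` data frame and the `haversine_matrix()` helper are hypothetical and are not part of gsClusterDetect; the package's own functions remain the intended way to build distance objects, and the format they return may differ from this illustration.

```r
# Hypothetical custom locations: an id plus the centroid latitude/longitude.
sites <- data.frame(
  location  = c("A", "B", "C"),
  latitude  = c(39.9612, 41.4993, 39.1031),
  longitude = c(-82.9988, -81.6944, -84.5120)
)

# Great-circle (haversine) distance in miles between all pairs of points.
# This helper is an illustration only, not a gsClusterDetect function.
haversine_matrix <- function(lat, lon, ids, radius_miles = 3958.8) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  d <- 2 * radius_miles * asin(pmin(sqrt(a), 1))
  matrix(d, nrow = length(lat), dimnames = list(ids, ids))
}

custom_dm <- haversine_matrix(sites$latitude, sites$longitude, sites$location)
round(custom_dm)
```

The result is a symmetric matrix with zeros on the diagonal and row and column names taken from the location ids.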

Installation

Install the gsClusterDetect package from CRAN as follows:

install.packages("gsClusterDetect")

Install the development version from GitHub as follows:

devtools::install_github("lmullany/gsClusterDetect")

Getting Started:

Load the package and provide a data frame with location, date, and count columns.

library(gsClusterDetect)
df <- example_count_data
tail(df)

   location       date count
     <char>     <IDat> <int>
1:    39171 2025-02-04     1
2:    39171 2025-02-05     0
3:    39173 2025-02-04     6
4:    39173 2025-02-05     7
5:    39175 2025-02-04     2
6:    39175 2025-02-05     0
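If your data start as one row per case rather than as counts, a table with this shape can be built in base R. The sketch below uses made-up `records` and is not taken from the package. It also adds explicit zero rows for location/date pairs with no cases, as the example data above does; whether `find_clusters()` requires those zero rows is an assumption here, not something this README states.

```r
# Hypothetical line-level records: one row per case.
records <- data.frame(
  location = c("39171", "39173", "39173", "39175", "39173"),
  date = as.Date(c("2025-02-04", "2025-02-04", "2025-02-04",
                   "2025-02-04", "2025-02-05"))
)

# Count the records in each location/date combination.
counts <- aggregate(
  list(count = rep(1L, nrow(records))),
  by = list(location = records$location, date = records$date),
  FUN = sum
)

# Build every location/date pair so that missing pairs become zero counts.
grid <- expand.grid(
  location = unique(records$location),
  date = seq(min(records$date), max(records$date), by = "day"),
  stringsAsFactors = FALSE
)
cases <- merge(grid, counts, by = c("location", "date"), all.x = TRUE)
cases$count[is.na(cases$count)] <- 0L
cases <- cases[order(cases$location, cases$date), ]
cases
```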

Generate the distance matrix for these locations. In this case, the synthetic data has counts from counties (FIPS codes) in the state of Ohio, so we use county_distance_matrix() and pass the state abbreviation:

ohio_dm <- county_distance_matrix("OH")

# This is a named list of two elements
cat("Class:", class(ohio_dm), "\nNames:", names(ohio_dm))

Class: list 
Names: loc_vec distance_matrix
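Before searching for clusters, it can be worth checking that every location in the counts also appears in the distance object. The sketch below uses a made-up stand-in (`toy_dm`) with the same two element names shown above, and assumes that `loc_vec` holds the location identifiers; the names suggest this, but the README does not confirm it, so inspect your own object first.

```r
# Stand-in for a distance object: a named list with elements loc_vec and
# distance_matrix. The ids and distances are made up for illustration.
locs <- c("39171", "39173", "39175")
toy_dm <- list(
  loc_vec = locs,
  distance_matrix = matrix(
    c(0, 40, 75,
      40, 0, 38,
      75, 38, 0),
    nrow = 3, dimnames = list(locs, locs)
  )
)

# Locations that appear in the counts; "39999" is deliberately absent above.
case_locations <- c("39171", "39173", "39999")

# Locations with counts but no entry in the distance object.
missing_locations <- setdiff(case_locations, toy_dm$loc_vec)
missing_locations
```

A non-empty result points to locations that would need to be dropped, recoded, or added to the distance object.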

Set the end of your target period. This is called the detect_date, and is a parameter that must be passed to the find_clusters function. Typically, this might be the current (or last available) date.

detect_date <- max(df[, date])

Call the find_clusters() function; see ?find_clusters for the full set of options. Below, we pass the minimum required elements (cases, distance_matrix, and detect_date) and set the distance_limit (the maximum size of the clusters) to 50 miles.

clusters <- find_clusters(
    cases = df,
    distance_matrix = ohio_dm[["distance_matrix"]],
    detect_date = detect_date,
    distance_limit = 50
)

Contacts:

Luke Mullany Luke.Mullany@jhuapl.edu
Howard Burkom Howard.Burkom@jhuapl.edu
