Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: eiCompare
Type: Package
Title: Compares Different Ecological Inference Methods
Version: 3.0.4
Version: 3.0.5
Authors@R:
c(person(given = "Loren",
family = "Collingwood",
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ export(ei_rc_good_table)
export(ei_reg_bayes_conf_int)
export(ei_rxc)
export(elect_algebra)
export(extract_rxc_precinct)
export(fips_extract)
export(get_multi_barreled_surnames)
export(get_special_character_surnames)
Expand Down
9 changes: 8 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,14 @@
# eiCompare 3.0.5

## added function 11/11/25

* included extract_rxc_precinct() function to extract precinct level estimates from ei_rxc()

# eiCompare 3.0.4

## Package changes

* added rpv_normalize() function
* removed wru dependency
* incorporated rpv_coef_plot() and rpv_toDF() functions from eiExpand package
* edited ei_iter() to have flexible CI parameters (default is 0.95) using bayestestR for calculation and updated column naming, and to use reproducible parallel processing (.inorder=TRUE)
* edited ei_rxc() with repdocuible parallel processing and changed column naming to fit ei_iter()
Expand Down
118 changes: 118 additions & 0 deletions R/extract_rxc_precinct.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
#' Extract Precinct-Level Estimates from ei.MD.bayes Object
#'
#' Extracts precinct-specific ecological inference estimates from ei_rxc() output.
#' Uses exact string matching to handle variation column names
#'
#' @param eivote `ei_rxc()` output object containing `stat_objects`
#' @param cand_cols Character vector of candidate column names (e.g., `c("pct_cand_A", "pct_cand_B")`)
#' @param race_cols Character vector of race column names (e.g., `c("pct_black", "pct_white")`)
#' @param dat Original data frame used in `ei_rxc()` call
#' @param precinct_id Column name for precinct identifier (must exist in `dat`)
#'
#' @return Data frame with precinct IDs and race×candidate estimate columns
#'
#' @details
#' The function extracts `md_out$draws$Beta` from the `ei_rxc()` output, which contains
#' MCMC draws for each precinct-race-candidate combination. Beta column names follow
#' the format `"beta.race_name.cand_name.precinct_idx"`. The function computes posterior
#' means across MCMC iterations for each precinct.
#'
#' Output columns follow `expand.grid(cand, race)` ordering, with column names formatted
#' as `paste0(race, cand)` (e.g., `"pct_blackpct_cand_A"`).
#'
#' @examples
#' \donttest{
#'
#' # library(eiCompare)
#' # data(gwinnett_ei)
#' #
#' # gwinnett_ei$precinct <- 1:nrow(gwinnett_ei)
#' #
#' # eivote <- ei_rxc( #this will take some time
#' # data = gwinnett_ei,
#' # cand_cols = c("kemp", "abrams", "metz"),
#' # race_cols = c("white", "black", "other"),
#' # totals_col = "turnout",
#' # seed = 12345
#' #)
#'
#' # # Extract precinct-level estimates
#' # precinct_results <- extract_rxc_precinct(
#' # eivote = eivote,
#' # cand_cols = c("kemp", "abrams"),
#' # race_cols = c("white", "black", "other"),
#' # dat = gwinnett_ei,
#' # precinct_id = "precinct"
#' #)
#'
#' #head(precinct_results)
#' }
#'
#' @export
extract_rxc_precinct <- function(eivote, cand_cols, race_cols, dat, precinct_id) {

# Extract md_out object from ei_rxc wrapper
eiMD_object <- eivote$stat_objects[[1]]

# Extract Beta matrix (MCMC iterations × beta parameters)
Beta <- eiMD_object$draws$Beta

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was messing with using dplyr functionality, which I tend toward when possible over base R matrix stuff. I used this which pretty much does it. format is slightly different but you can tweak as needed.

beta_means <- Beta %>% colMeans()

result_df <- data.frame(
    names = gsub('beta.','',names(beta_means)),
    vals = beta_means
  ) %>%
  tidyr::separate(
    col = names,
    into = c('race', 'cand', 'precinct'),
    sep = "\\."
  ) %>%
  tidyr::pivot_wider(
    names_from = c(race,cand),
    values_from = vals
  )

It simplifies the code and also makes it so the user doesn't need to specify the race and cands and the other args, it would just be the rxc result object. This assumes that the naming convention will always be beta.<race>.<cand>.<precinct> which I think is right based on your code?

If you want the user to be able to specify which cands and races are in the output you can keep those args and tweak this statement to filter on them. I kind of like it clean and just have it include all and the user can just subset results if they want. But thats really just an operational preference and I defer to you. If there are runs with a lot of cands and races it would mean an output with a ton of columns so I can see wanting to keep the args.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey good suggestions throughout. I think we should defer on the dplyr functionality, just because we did some extensive stress testing/testing for edge case names and want the function to work with messier names. We could stress test your version, but I think we should defer that and get this up and running.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thats ok for now. The next thing I want to do is make a bunch of package tests. Do you have some scripts of the stress testing you did? I can incorporate that in, which is important either way! Then if down the road we want to update this function we will have tests in place to make sure nothing breaks.

# Check that precinct_id column exists in dat
if(!precinct_id %in% colnames(dat)) {
stop(paste0("Column '", precinct_id, "' not found in dat. ",
"Available columns: ", paste(colnames(dat), collapse = ", ")))
}

n_precincts <- nrow(dat)
beta_colnames <- colnames(Beta)

# Initialize result matrix (precincts × race-candidate combinations)
result_matrix <- matrix(NA,
nrow = n_precincts,
ncol = length(race_cols) * length(cand_cols))

# Loop through race-candidate combinations and extract precinct estimates
col_idx <- 1
for(race in race_cols) {
for(cand in cand_cols) {

# Build expected prefix pattern for exact matching
# Format: beta.race.cand.precinct_number
expected_prefix <- paste0("beta.", race, ".", cand, ".")

# Find Beta columns matching this race-candidate pair
matching_cols <- grep(paste0("^", gsub("\\.", "\\\\.", expected_prefix)),
beta_colnames,
value = FALSE)

# Validation - should have exactly n_precincts matches
if(length(matching_cols) != n_precincts) {
stop(paste0("Column matching error for race='", race, "', cand='", cand,
"': found ", length(matching_cols), " columns but expected ",
n_precincts, " precincts"))
}

# Extract precinct indices and reorder to match dat row order
precinct_nums <- sub(expected_prefix, "", beta_colnames[matching_cols])
precinct_order <- order(as.numeric(precinct_nums))
matching_cols_ordered <- matching_cols[precinct_order]

# Calculate mean across MCMC iterations for each precinct
result_matrix[, col_idx] <- colMeans(Beta[, matching_cols_ordered])
col_idx <- col_idx + 1
}
}

# Create column names (race + candidate, matching expand.grid order)
col_names_df <- expand.grid(cand = cand_cols, race = race_cols)
col_names <- paste0(col_names_df$race, col_names_df$cand)

# Convert to data frame with column names
result_df <- as.data.frame(result_matrix)
colnames(result_df) <- col_names

# Attach precinct IDs from original data as first column
result_df <- cbind(dat[, precinct_id, drop = FALSE], result_df)

return(result_df)
}
21 changes: 21 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,27 @@

## News

# eiCompare 3.0.5

## New function

* included extract_rxc_precinct() function to extract precinct level estimates from ei_rxc()

# eiCompare 3.0.4

## Package changes
* added add_rpv_normalize() function
* removed wru dependency
* incorporated rpv_coef_plot() and rpv_toDF() functions from eiExpand package
* edited ei_iter() to have flexible CI parameters (default is 0.95) using bayestestR for calculation and updated column naming, and to use reproducible parallel processing (.inorder=TRUE)
* edited ei_rxc() with repdocuible parallel processing and changed column naming to fit ei_iter()
* Fixed summary.eiCompare() print behavior
* Added viridis to imports for color visualiztion and updated RoxygenNote to 7.3.2

### eiCompare 3.0.3

Updated

### eiCompare 3.0.2

#### Package changes
Expand Down
64 changes: 64 additions & 0 deletions man/extract_rxc_precinct.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading