extract_rxc_precinct function #155

rachel-carroll · 2025-11-10T05:03:05Z

I was messing with using dplyr functionality, which I tend toward when possible over base R matrix stuff. I used this which pretty much does it. format is slightly different but you can tweak as needed.

beta_means <- Beta %>% colMeans() result_df <- data.frame( names = gsub('beta.','',names(beta_means)), vals = beta_means ) %>% tidyr::separate( col = names, into = c('race', 'cand', 'precinct'), sep = "\\." ) %>% tidyr::pivot_wider( names_from = c(race,cand), values_from = vals )

It simplifies the code and also makes it so the user doesn't need to specify the race and cands and the other args, it would just be the rxc result object. This assumes that the naming convention will always be beta.<race>.<cand>.<precinct> which I think is right based on your code?

If you want the user to be able to specify which cands and races are in the output you can keep those args and tweak this statement to filter on them. I kind of like it clean and just have it include all and the user can just subset results if they want. But thats really just an operational preference and I defer to you. If there are runs with a lot of cands and races it would mean an output with a ton of columns so I can see wanting to keep the args.

Hey good suggestions throughout. I think we should defer on the dplyr functionality, just because we did some extensive stress testing/testing for edge case names and want the function to work with messier names. We could stress test your version, but I think we should defer that and get this up and running.

ok thats ok for now. The next thing I want to do is make a bunch of package tests. Do you have some scripts of the stress testing you did? I can incorporate that in, which is important either way! Then if down the road we want to update this function we will have tests in place to make sure nothing breaks.

-Original file line number
+Diff line change
@@ -1,7 +1,7 @@
     Package: eiCompare
     Type: Package
     Title: Compares Different Ecological Inference Methods
-    Version: 3.0.4
+    Version: 3.0.5
     Authors@R:
         c(person(given = "Loren",
                  family = "Collingwood",
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -15,6 +15,7 @@ export(ei_rc_good_table) @@
     export(ei_reg_bayes_conf_int)
     export(ei_rxc)
     export(elect_algebra)
+    export(extract_rxc_precinct)
     export(fips_extract)
     export(get_multi_barreled_surnames)
     export(get_special_character_surnames)
@@ Expand Down @@

-Original file line number
+Diff line change
@@ -1,7 +1,14 @@
+    # eiCompare 3.0.5
+    ## added function 11/11/25
+    * included extract_rxc_precinct() function to extract precinct level estimates from ei_rxc()
     # eiCompare 3.0.4
     ## Package changes
+    * added rpv_normalize() function
+    * removed wru dependency
     * incorporated rpv_coef_plot() and rpv_toDF() functions from eiExpand package
     * edited ei_iter() to have flexible CI parameters (default is 0.95) using bayestestR for calculation and updated column naming, and to use reproducible parallel processing (.inorder=TRUE)
     * edited ei_rxc() with repdocuible parallel processing and changed column naming to fit ei_iter()
@@ Expand Down @@

-Original file line number
+Diff line change
@@ -0,0 +1,118 @@
+    #' Extract Precinct-Level Estimates from ei.MD.bayes Object
+    #'
+    #' Extracts precinct-specific ecological inference estimates from ei_rxc() output.
+    #' Uses exact string matching to handle variation column names
+    #'
+    #' @param eivote `ei_rxc()` output object containing `stat_objects`
+    #' @param cand_cols Character vector of candidate column names (e.g., `c("pct_cand_A", "pct_cand_B")`)
+    #' @param race_cols Character vector of race column names (e.g., `c("pct_black", "pct_white")`)
+    #' @param dat Original data frame used in `ei_rxc()` call
+    #' @param precinct_id Column name for precinct identifier (must exist in `dat`)
+    #'
+    #' @return Data frame with precinct IDs and race×candidate estimate columns
+    #'
+    #' @details
+    #' The function extracts `md_out$draws$Beta` from the `ei_rxc()` output, which contains
+    #' MCMC draws for each precinct-race-candidate combination. Beta column names follow
+    #' the format `"beta.race_name.cand_name.precinct_idx"`. The function computes posterior
+    #' means across MCMC iterations for each precinct.
+    #'
+    #' Output columns follow `expand.grid(cand, race)` ordering, with column names formatted
+    #' as `paste0(race, cand)` (e.g., `"pct_blackpct_cand_A"`).
+    #'
+    #' @examples
+    #' \donttest{
+    #'
+    #' # library(eiCompare)
+    #' # data(gwinnett_ei)
+    #' #
+    #' # gwinnett_ei$precinct <- 1:nrow(gwinnett_ei)
+    #' #
+    #' # eivote <- ei_rxc( #this will take some time
+    #' #  data = gwinnett_ei,
+    #' #  cand_cols = c("kemp", "abrams", "metz"),
+    #' #  race_cols = c("white", "black", "other"),
+    #' #  totals_col = "turnout",
+    #' #  seed = 12345
+    #' #)
+    #'
+    #' # # Extract precinct-level estimates
+    #' # precinct_results <- extract_rxc_precinct(
+    #' #  eivote = eivote,
+    #' #  cand_cols = c("kemp", "abrams"),
+    #' #  race_cols = c("white", "black", "other"),
+    #' #  dat = gwinnett_ei,
+    #' #  precinct_id = "precinct"
+    #' #)
+    #'
+    #' #head(precinct_results)
+    #' }
+    #'
+    #' @export
+    extract_rxc_precinct <- function(eivote, cand_cols, race_cols, dat, precinct_id) {
+      # Extract md_out object from ei_rxc wrapper
+      eiMD_object <- eivote$stat_objects[[1]]
+      # Extract Beta matrix (MCMC iterations × beta parameters)
+      Beta <- eiMD_object$draws$Beta
+      # Check that precinct_id column exists in dat
+      if(!precinct_id %in% colnames(dat)) {
+        stop(paste0("Column '", precinct_id, "' not found in dat. ",
+                    "Available columns: ", paste(colnames(dat), collapse = ", ")))
+      }
+      n_precincts <- nrow(dat)
+      beta_colnames <- colnames(Beta)
+      # Initialize result matrix (precincts × race-candidate combinations)
+      result_matrix <- matrix(NA,
+                              nrow = n_precincts,
+                              ncol = length(race_cols) * length(cand_cols))
+      # Loop through race-candidate combinations and extract precinct estimates
+      col_idx <- 1
+      for(race in race_cols) {
+        for(cand in cand_cols) {
+          # Build expected prefix pattern for exact matching
+          # Format: beta.race.cand.precinct_number
+          expected_prefix <- paste0("beta.", race, ".", cand, ".")
+          # Find Beta columns matching this race-candidate pair
+          matching_cols <- grep(paste0("^", gsub("\\.", "\\\\.", expected_prefix)),
+                               beta_colnames,
+                               value = FALSE)
+          # Validation - should have exactly n_precincts matches
+          if(length(matching_cols) != n_precincts) {
+            stop(paste0("Column matching error for race='", race, "', cand='", cand,
+                        "': found ", length(matching_cols), " columns but expected ",
+                        n_precincts, " precincts"))
+          }
+          # Extract precinct indices and reorder to match dat row order
+          precinct_nums <- sub(expected_prefix, "", beta_colnames[matching_cols])
+          precinct_order <- order(as.numeric(precinct_nums))
+          matching_cols_ordered <- matching_cols[precinct_order]
+          # Calculate mean across MCMC iterations for each precinct
+          result_matrix[, col_idx] <- colMeans(Beta[, matching_cols_ordered])
+          col_idx <- col_idx + 1
+        }
+      }
+      # Create column names (race + candidate, matching expand.grid order)
+      col_names_df <- expand.grid(cand = cand_cols, race = race_cols)
+      col_names <- paste0(col_names_df$race, col_names_df$cand)
+      # Convert to data frame with column names
+      result_df <- as.data.frame(result_matrix)
+      colnames(result_df) <- col_names
+      # Attach precinct IDs from original data as first column
+      result_df <- cbind(dat[, precinct_id, drop = FALSE], result_df)
+      return(result_df)
+    }

-Original file line number
+Diff line change
@@ Expand Up / @@ -13,6 +13,27 @@ @@
     ## News
+    # eiCompare 3.0.5
+    ## New function
+    * included extract_rxc_precinct() function to extract precinct level estimates from ei_rxc()
+    # eiCompare 3.0.4
+    ## Package changes
+    * added add_rpv_normalize() function
+    * removed wru dependency
+    * incorporated rpv_coef_plot() and rpv_toDF() functions from eiExpand package
+    * edited ei_iter() to have flexible CI parameters (default is 0.95) using bayestestR for calculation and updated column naming, and to use reproducible parallel processing (.inorder=TRUE)
+    * edited ei_rxc() with repdocuible parallel processing and changed column naming to fit ei_iter()
+    * Fixed summary.eiCompare() print behavior
+    * Added viridis to imports for color visualiztion and updated RoxygenNote to 7.3.2
+    ### eiCompare 3.0.3
+    Updated
     ### eiCompare 3.0.2
     #### Package changes
@@ Expand Down @@

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extract_rxc_precinct function #155

Uh oh!

Diff view

Diff view

There are no files selected for viewing

rachel-carroll Nov 10, 2025

Uh oh!

SeanDLong Nov 11, 2025

Uh oh!

rachel-carroll Nov 13, 2025

Uh oh!

Uh oh!

Uh oh!

extract_rxc_precinct function #155

Uh oh!

extract_rxc_precinct function #155

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

rachel-carroll Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

SeanDLong Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

rachel-carroll Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!