-
Notifications
You must be signed in to change notification settings - Fork 7
extract_rxc_precinct function #155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| #' Extract Precinct-Level Estimates from ei.MD.bayes Object | ||
| #' | ||
| #' Extracts precinct-specific ecological inference estimates from ei_rxc() output. | ||
| #' Uses exact string matching to handle variation column names | ||
| #' | ||
| #' @param eivote `ei_rxc()` output object containing `stat_objects` | ||
| #' @param cand_cols Character vector of candidate column names (e.g., `c("pct_cand_A", "pct_cand_B")`) | ||
| #' @param race_cols Character vector of race column names (e.g., `c("pct_black", "pct_white")`) | ||
| #' @param dat Original data frame used in `ei_rxc()` call | ||
| #' @param precinct_id Column name for precinct identifier (must exist in `dat`) | ||
| #' | ||
| #' @return Data frame with precinct IDs and race×candidate estimate columns | ||
| #' | ||
| #' @details | ||
| #' The function extracts `md_out$draws$Beta` from the `ei_rxc()` output, which contains | ||
| #' MCMC draws for each precinct-race-candidate combination. Beta column names follow | ||
| #' the format `"beta.race_name.cand_name.precinct_idx"`. The function computes posterior | ||
| #' means across MCMC iterations for each precinct. | ||
| #' | ||
| #' Output columns follow `expand.grid(cand, race)` ordering, with column names formatted | ||
| #' as `paste0(race, cand)` (e.g., `"pct_blackpct_cand_A"`). | ||
| #' | ||
| #' @examples | ||
| #' \donttest{ | ||
| #' | ||
| #' # library(eiCompare) | ||
| #' # data(gwinnett_ei) | ||
| #' # | ||
| #' # gwinnett_ei$precinct <- 1:nrow(gwinnett_ei) | ||
| #' # | ||
| #' # eivote <- ei_rxc( #this will take some time | ||
| #' # data = gwinnett_ei, | ||
| #' # cand_cols = c("kemp", "abrams", "metz"), | ||
| #' # race_cols = c("white", "black", "other"), | ||
| #' # totals_col = "turnout", | ||
| #' # seed = 12345 | ||
| #' #) | ||
| #' | ||
| #' # # Extract precinct-level estimates | ||
| #' # precinct_results <- extract_rxc_precinct( | ||
| #' # eivote = eivote, | ||
| #' # cand_cols = c("kemp", "abrams"), | ||
| #' # race_cols = c("white", "black", "other"), | ||
| #' # dat = gwinnett_ei, | ||
| #' # precinct_id = "precinct" | ||
| #' #) | ||
| #' | ||
| #' #head(precinct_results) | ||
| #' } | ||
| #' | ||
| #' @export | ||
| extract_rxc_precinct <- function(eivote, cand_cols, race_cols, dat, precinct_id) { | ||
|
|
||
| # Extract md_out object from ei_rxc wrapper | ||
| eiMD_object <- eivote$stat_objects[[1]] | ||
|
|
||
| # Extract Beta matrix (MCMC iterations × beta parameters) | ||
| Beta <- eiMD_object$draws$Beta | ||
|
|
||
| # Check that precinct_id column exists in dat | ||
| if(!precinct_id %in% colnames(dat)) { | ||
| stop(paste0("Column '", precinct_id, "' not found in dat. ", | ||
| "Available columns: ", paste(colnames(dat), collapse = ", "))) | ||
| } | ||
|
|
||
| n_precincts <- nrow(dat) | ||
| beta_colnames <- colnames(Beta) | ||
|
|
||
| # Initialize result matrix (precincts × race-candidate combinations) | ||
| result_matrix <- matrix(NA, | ||
| nrow = n_precincts, | ||
| ncol = length(race_cols) * length(cand_cols)) | ||
|
|
||
| # Loop through race-candidate combinations and extract precinct estimates | ||
| col_idx <- 1 | ||
| for(race in race_cols) { | ||
| for(cand in cand_cols) { | ||
|
|
||
| # Build expected prefix pattern for exact matching | ||
| # Format: beta.race.cand.precinct_number | ||
| expected_prefix <- paste0("beta.", race, ".", cand, ".") | ||
|
|
||
| # Find Beta columns matching this race-candidate pair | ||
| matching_cols <- grep(paste0("^", gsub("\\.", "\\\\.", expected_prefix)), | ||
| beta_colnames, | ||
| value = FALSE) | ||
|
|
||
| # Validation - should have exactly n_precincts matches | ||
| if(length(matching_cols) != n_precincts) { | ||
| stop(paste0("Column matching error for race='", race, "', cand='", cand, | ||
| "': found ", length(matching_cols), " columns but expected ", | ||
| n_precincts, " precincts")) | ||
| } | ||
|
|
||
| # Extract precinct indices and reorder to match dat row order | ||
| precinct_nums <- sub(expected_prefix, "", beta_colnames[matching_cols]) | ||
| precinct_order <- order(as.numeric(precinct_nums)) | ||
| matching_cols_ordered <- matching_cols[precinct_order] | ||
|
|
||
| # Calculate mean across MCMC iterations for each precinct | ||
| result_matrix[, col_idx] <- colMeans(Beta[, matching_cols_ordered]) | ||
| col_idx <- col_idx + 1 | ||
| } | ||
| } | ||
|
|
||
| # Create column names (race + candidate, matching expand.grid order) | ||
| col_names_df <- expand.grid(cand = cand_cols, race = race_cols) | ||
| col_names <- paste0(col_names_df$race, col_names_df$cand) | ||
|
|
||
| # Convert to data frame with column names | ||
| result_df <- as.data.frame(result_matrix) | ||
| colnames(result_df) <- col_names | ||
|
|
||
| # Attach precinct IDs from original data as first column | ||
| result_df <- cbind(dat[, precinct_id, drop = FALSE], result_df) | ||
|
|
||
| return(result_df) | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was messing with using dplyr functionality, which I tend toward when possible over base R matrix stuff. I used this which pretty much does it. format is slightly different but you can tweak as needed.
It simplifies the code and also makes it so the user doesn't need to specify the race and cands and the other args, it would just be the rxc result object. This assumes that the naming convention will always be
beta.<race>.<cand>.<precinct>which I think is right based on your code?If you want the user to be able to specify which cands and races are in the output you can keep those args and tweak this statement to filter on them. I kind of like it clean and just have it include all and the user can just subset results if they want. But thats really just an operational preference and I defer to you. If there are runs with a lot of cands and races it would mean an output with a ton of columns so I can see wanting to keep the args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey good suggestions throughout. I think we should defer on the dplyr functionality, just because we did some extensive stress testing/testing for edge case names and want the function to work with messier names. We could stress test your version, but I think we should defer that and get this up and running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thats ok for now. The next thing I want to do is make a bunch of package tests. Do you have some scripts of the stress testing you did? I can incorporate that in, which is important either way! Then if down the road we want to update this function we will have tests in place to make sure nothing breaks.