Description
There's a persistent issue where people give expansive, idiosyncratic responses (e.g. "I'm sexually female") that a human could reasonably classify, but that are difficult to accommodate with the dictionary method as it stands.
There are a number of suggestions for how we might resolve this (e.g. pattern matching with grep), but these can behave unpredictably on unknown future inputs. Emily also likes that the current process gives you a transparent log of how each response is recoded, which becomes trickier to maintain with fuzzy matching.
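To make the risk with pattern matching concrete, here is a small illustration using base R's `grepl()` on made-up responses (not real data). A naive pattern for "male" also matches "female"; adding word boundaries fixes that, but still classifies "not male" as male.

```r
# Made-up example responses.
responses <- c("male", "female", "I'm sexually female", "not male")

# Naive substring match: "male" is contained in "female", so all four match.
naive_male <- grepl("male", responses)

# Word boundaries stop "female" from matching, but "not male" is still
# picked up as male, because the pattern has no notion of negation.
bounded_male <- grepl("\\bmale\\b", responses, perl = TRUE)

print(naive_male)    # TRUE for all four responses
print(bounded_male)  # TRUE, FALSE, FALSE, TRUE
```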
This is a summary of the implementation Emily and I propose for any fuzzy matching.
Fuzzy matching should:
- not be the default,
- require deliberate action to enable (i.e. not just `fuzzy = TRUE`), and
- require user input to validate matches.
The core function arguments would default to:

```r
gender_recode <- function(gender = gender, dictionary = gendercoder::broad, fill = FALSE, match = "exact")
```
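A minimal sketch of what the default `match = "exact"` path could do. This assumes the dictionary is a named character vector mapping normalised raw responses to categories, and that `fill = TRUE` means unmatched responses keep their original text. `exact_recode()` and `toy_dictionary` are hypothetical names for illustration; neither is part of gendercoder, and the summary message simply mirrors the proposed output below.

```r
# Hypothetical toy dictionary (NOT gendercoder::broad): names are normalised
# raw responses, values are the recoded categories.
toy_dictionary <- c(
  "male" = "male", "m" = "male", "man" = "male",
  "female" = "female", "f" = "female", "woman" = "female",
  "non-binary" = "sex and gender diverse"
)

# Hypothetical exact-match recoder: normalise case and whitespace, look each
# response up verbatim, and report how many cases matched.
exact_recode <- function(gender, dictionary, fill = FALSE) {
  key <- tolower(trimws(gender))
  recoded <- unname(dictionary[key])  # NA where the key is absent
  matched <- !is.na(recoded)
  # Assumption: fill = TRUE keeps the original response for unmatched cases.
  if (fill) recoded[!matched] <- gender[!matched]
  message(sprintf("gendercoder has exactly matched %d (%.0f%%) of cases",
                  sum(matched), 100 * mean(matched)))
  recoded
}

x <- c("Male", " f ", "woman", "I'm sexually female")
exact_recode(x, toy_dictionary)
# message: gendercoder has exactly matched 3 (75%) of cases
# returns "male" "female" "female" NA
```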
And a call opting in to fuzzy matching would be:

```r
gender_recode(gender_data, dictionary = broad, fill = TRUE, match = "fuzzy")
```

which would then prompt:

```
gendercoder has exactly matched 99 (99%) of cases
gendercoder suggests that "I'm sexually female" indicates a gender of: female. Please provide input:
1. Yes, female
2. No, male
3. No, Sex and gender diverse
4. No, other (provide text input)
5. No, replace with NA
Selection:
```
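One possible shape for the opt-in fuzzy path, sketched under several assumptions. Suggestions come from generalized Levenshtein distance via `utils::adist()`, a stand-in technique rather than anything agreed on here. The prompt uses `utils::menu()` with the same option order as above, and every suggestion and decision is returned in an audit table, which would keep the transparent log Emily values. `suggest_fuzzy()`, `validate_fuzzy()`, `fuzzy_recode()` and `toy_dictionary` are hypothetical names. The `choose` and `ask_text` arguments exist only so the interactive steps can be scripted or tested.

```r
# Hypothetical toy dictionary (NOT gendercoder::broad).
toy_dictionary <- c(
  "male" = "male", "m" = "male", "man" = "male",
  "female" = "female", "f" = "female", "woman" = "female",
  "non-binary" = "sex and gender diverse"
)

# Suggest the category of the closest dictionary key by edit distance;
# ties go to the first key.
suggest_fuzzy <- function(response, dictionary) {
  keys <- names(dictionary)
  d <- utils::adist(tolower(trimws(response)), keys)  # 1 x length(keys)
  unname(dictionary[keys[which.min(d[1, ])]])
}

# Ask the user to confirm or override one suggestion.
validate_fuzzy <- function(response, suggestion, categories,
                           choose = function(choices, title) utils::menu(choices, title = title),
                           ask_text = function() readline("Recode as: ")) {
  alternatives <- setdiff(categories, suggestion)
  choices <- c(paste0("Yes, ", suggestion),
               paste0("No, ", alternatives),
               "No, other (provide text input)",
               "No, replace with NA")
  title <- sprintf('gendercoder suggests that "%s" indicates a gender of: %s. Please provide input:',
                   response, suggestion)
  pick <- choose(choices, title)
  # 0 means the menu was cancelled; the last option means "replace with NA".
  if (pick == 0L || pick == length(choices)) return(NA_character_)
  if (pick == 1L) return(suggestion)
  if (pick <= 1L + length(alternatives)) return(alternatives[pick - 1L])
  ask_text()
}

# Exact-match first, then suggest and validate each remaining response,
# recording every decision in an audit table.
fuzzy_recode <- function(gender, dictionary, ...) {
  categories <- unique(unname(dictionary))
  recoded <- unname(dictionary[tolower(trimws(gender))])
  todo <- which(is.na(recoded) & !is.na(gender))
  audit <- data.frame(response = gender[todo],
                      suggestion = rep(NA_character_, length(todo)),
                      decision = rep(NA_character_, length(todo)),
                      stringsAsFactors = FALSE)
  for (i in seq_along(todo)) {
    s <- suggest_fuzzy(gender[todo[i]], dictionary)
    d <- validate_fuzzy(gender[todo[i]], s, categories, ...)
    recoded[todo[i]] <- d
    audit$suggestion[i] <- s
    audit$decision[i] <- d
  }
  list(recoded = recoded, audit = audit)
}
```

For example, passing `choose = function(choices, title) 1L` accepts every suggestion without prompting, so `"I'm sexually female"` becomes `"female"`. Plain edit distance is a weak signal for long free-text answers, so a real implementation would probably need a distance threshold or a different matcher.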
Keen to get input on alternatives and implementations.