add fuzzy matching #37

@Lingtax

There's a persistent issue where people provide expansive and idiosyncratic responses (e.g. "I'm sexually female") that a human user can reasonably classify, but that are difficult to accommodate in the dictionary method as it stands.

There are a number of suggestions for how we might resolve this (e.g. grep), but these of course have potential issues with unknown future inputs. Emily also likes how the current process gives you a transparent log of how recoding happens, which becomes trickier with fuzzy matching.

This is a summary of the implementation of fuzzy matching proposed by Emily and me.

Fuzzy matching should:

  1. not be the default,
  2. require deliberate action to implement (i.e. not just fuzzy = TRUE),
  3. require user input to validate matches.

The core function arguments would default to:
gender_recode <- function(gender = gender, dictionary = gendercoder::broad, fill = FALSE, match = "exact")
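As a rough illustration of how the opt-in could be enforced, here is a minimal sketch using base R's `match.arg()`. The names `recode_sketch` and `toy_dictionary` are hypothetical stand-ins (not the gendercoder API), and the behaviour of `fill` (keep the original response when nothing matches) is an assumption made for this example.

```r
# Hypothetical stand-in for a gendercoder dictionary: names are cleaned
# raw responses, values are the recoded categories.
toy_dictionary <- c(female = "female", f = "female", male = "male", m = "male")

# Sketch only: `recode_sketch` is an illustrative name, not the package function.
recode_sketch <- function(gender, dictionary = toy_dictionary,
                          fill = FALSE, match = c("exact", "fuzzy")) {
  # match.arg() selects the first choice ("exact") when `match` is not
  # supplied, and raises an error for anything other than "exact" or "fuzzy".
  match <- match.arg(match)

  # Exact lookup on lower-cased, trimmed responses; unknown responses give NA.
  cleaned <- tolower(trimws(gender))
  recoded <- unname(dictionary[cleaned])

  # Assumption: fill = TRUE keeps the original response where nothing matched.
  if (fill) {
    unmatched <- is.na(recoded)
    recoded[unmatched] <- gender[unmatched]
  }

  if (match == "fuzzy") {
    message("Fuzzy matching requested: unmatched responses would be reviewed interactively.")
  }
  recoded
}
```

Because `"exact"` is the first element of the choices vector, fuzzy matching only runs when the caller writes `match = "fuzzy"` explicitly.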

And implementation would be:

gender_recode(gender_data, dictionary = broad, fill = TRUE, match = "fuzzy")

> gendercoder has exactly matched 99 (99%) of cases
> gendercoder suggests that "I'm sexually female" indicates a gender of: female. Please provide input:

1. Yes, female
2. No, male
3. No, Sex and gender diverse
4. No, other (provide text input)
5. No, replace with NA

 Selection:
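The prompt above could be driven by something like the following sketch, which also keeps a log of every suggestion and decision so the recoding stays transparent. Everything here is an assumption for illustration: `suggest_match`, `validate_match`, `review_unmatched`, and `toy_dictionary` are hypothetical names, the word-level edit distance via `utils::adist()` is just one possible matching rule, and the menu is reduced to two options for brevity.

```r
# Hypothetical stand-in dictionary: names are known responses, values are codes.
toy_dictionary <- c(female = "female", woman = "female", male = "male", man = "male")

# Suggest a category by comparing each word of the response with each
# dictionary key using edit distance (one possible rule, not the package's).
suggest_match <- function(response, dictionary = toy_dictionary, max_distance = 1) {
  if (is.na(response)) return(NA_character_)
  words <- strsplit(tolower(response), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(NA_character_)
  # Rows are words in the response, columns are dictionary keys.
  distances <- utils::adist(words, names(dictionary))
  if (min(distances) > max_distance) return(NA_character_)
  best_key <- col(distances)[which.min(distances)]
  unname(dictionary[best_key])
}

# Ask the user to confirm a suggestion. `ask` defaults to utils::menu(),
# which only works in an interactive session; any function taking
# (choices, title) and returning the selected index can be swapped in.
validate_match <- function(response, suggestion, ask = utils::menu) {
  choices <- c(paste0("Yes, ", suggestion), "No, replace with NA")
  title <- sprintf('"%s" may indicate a gender of: %s.', response, suggestion)
  picked <- ask(choices, title = title)
  if (picked == 1) suggestion else NA_character_
}

# Review responses one by one and return a log of suggestions and decisions.
review_unmatched <- function(responses, dictionary = toy_dictionary, ask = utils::menu) {
  n <- length(responses)
  log <- data.frame(response = responses,
                    suggestion = rep(NA_character_, n),
                    recoded = rep(NA_character_, n),
                    stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    suggestion <- suggest_match(responses[i], dictionary)
    if (is.na(suggestion)) next  # nothing close enough; leave as NA
    log$suggestion[i] <- suggestion
    log$recoded[i] <- validate_match(responses[i], suggestion, ask)
  }
  log
}
```

In an interactive session, `review_unmatched(c("I'm sexually female"))` would show the menu for each suggestion; passing a custom `ask` function makes the same flow scriptable. The returned data frame records what was suggested and what the user accepted.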

Keen to get input on alternatives and implementations.

Labels: enhancement (New feature or request)