This repository was archived by the owner on Jul 7, 2025. It is now read-only.

pandas-dedupe==1.5.0 not compatible with dedupe>=3.0 (released on 27th June 2024) #64

@jmanprz

pandas-dedupe installs the latest version of dedupe, which is 3.0.3 at the time of writing. However, when field_properties is passed in df_final = pandas_dedupe.dedupe_dataframe(df=df, field_properties=[...]), dedupe raises the following error:

  File "/.../lib/python3.11/site-packages/dedupe/api.py", line 1141, in __init__
    self.data_model = datamodel.DataModel(variable_definition)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../lib/python3.11/site-packages/dedupe/datamodel.py", line 32, in __init__
raise ValueError(
ValueError: It looks like you are trying to use a variable definition composed of dictionaries. dedupe 3.0 uses variable objects directly. So instead of [{"field": "name", "type": "String"}] we now do [dedupe.variables.String("name")].

A quick-and-dirty fix that let me use dedupe>=3.0.3 (just to unblock myself) was to update the utility function pandas_dedupe.utility_functions.select_fields(fields, field_properties) with:

if isinstance(i, String):
    fields.append(i)

where i is an instance of dedupe.variables.String (so String has to be imported from dedupe.variables), replacing the original:

if type(i)==str:
    fields.append({'field': i, 'type': 'String'})
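
A slightly more general take on this workaround could convert the old inputs instead of requiring callers to pass variable objects. The sketch below is an assumption, not pandas-dedupe's actual code: to_variable_objects is a hypothetical helper, and it receives the variables namespace as a parameter (dedupe.variables in real use) so the logic can be tried without dedupe installed. It only covers the two shapes mentioned above: plain column names and {"field", "type"} dictionaries.

```python
def to_variable_objects(field_properties, variables):
    """Hypothetical helper: turn old-style definitions into variable objects.

    `variables` is any namespace exposing variable classes by name,
    for example the `dedupe.variables` module in dedupe>=3.0.
    """
    converted = []
    for prop in field_properties:
        if isinstance(prop, str):
            # A bare column name used to be treated as a String field.
            converted.append(variables.String(prop))
        elif isinstance(prop, dict):
            # Old style: {"field": "name", "type": "String"}
            variable_class = getattr(variables, prop["type"])
            converted.append(variable_class(prop["field"]))
        else:
            # Assume it is already a variable object and pass it through.
            converted.append(prop)
    return converted
```

With dedupe>=3.0 installed, this would presumably be called as to_variable_objects(["name"], dedupe.variables). Extra dictionary keys beyond "field" and "type" are ignored here, so any other options would need handling that this sketch does not attempt.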

The last commit in this project dates from 4 years ago. Are there any plans to upgrade the package to be compatible with dedupe>=3.0 and drop compatibility with older versions? Is any help needed?
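
If dropping older versions is not desirable, one option might be to branch on the installed dedupe version. The following is only a sketch under that assumption: major_version and uses_variable_objects are hypothetical names, and the check relies solely on the standard-library importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '3.0.3'."""
    return int(version_string.split(".")[0])


def uses_variable_objects(package="dedupe"):
    """True if the installed package is 3.x or newer, False if older,
    and None if the package is not installed at all."""
    try:
        return major_version(version(package)) >= 3
    except PackageNotFoundError:
        return None
```

A caller could then build variable objects when uses_variable_objects() is True and fall back to the dictionary definitions otherwise.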
