This repository was archived by the owner on Oct 21, 2024. It is now read-only.

🐛 [BUG] - Deletes both lines when two lines are exactly the same. #21

@41ow1ives

Description

Environment Settings

JDK openjdk-11-jdk
Spark 3.5.0
Python 3.11.5

Reproduction

Apply deduplication___polyglot___minhash to data that contains exactly identical lines.

Expected Behavior

I ran into a problem while using deduplication___polyglot___minhash.
When I apply the function to data that contains exactly identical lines, one of them should remain, but all of them are deleted.
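
To make the expectation concrete, here is a minimal plain-Python sketch of the two behaviors. It does not use Spark or deduplication___polyglot___minhash, and the cause of the bug is not known from this report; `dedup_keep_one` and `dedup_drop_all` are hypothetical helpers written only for illustration.

```python
from collections import Counter


def dedup_keep_one(lines):
    """Expected behavior: keep the first occurrence of each line."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


def dedup_drop_all(lines):
    """Hypothetical faulty variant: drop every line that has a duplicate."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]


# Illustrative input (not taken from the original data).
sample = ["hello world", "hello world", "unique line"]

expected = dedup_keep_one(sample)   # one copy of the duplicated line survives
reported = dedup_drop_all(sample)   # both copies disappear, as described above

print("expected:", expected)
print("reported:", reported)
```

With this input, the expected result still contains one "hello world", while the reported behavior corresponds to losing both copies.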

Actual Behavior

No response

Metadata

Assignees: No one assigned

Labels: Bug (Something isn't working), Open for contribution (This issue is waiting for your contribution)