@philipp-jung
Collaborator

I observed that many datasets contain empty strings in columns into which I want to insert Typos. Currently, this causes an error. With this PR, an empty string "" will be replaced by a random character, e.g. "a". While that's not a perfect solution, it gets us going.
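As a rough illustration of the fallback described above, here is a minimal sketch. `insert_typo` is a hypothetical helper and not the actual `Typo` implementation in this PR. The choice of lowercase ASCII letters and the single-character substitution for non-empty values are assumptions made only for the example.

```python
import random
import string


def insert_typo(value: str, rng: random.Random) -> str:
    """Hypothetical helper, NOT the library's Typo class."""
    if value == "":
        # Nothing to corrupt: fall back to one random lowercase letter
        # (assumed alphabet), so the cell still ends up erroneous.
        return rng.choice(string.ascii_lowercase)
    # Otherwise replace one randomly chosen character with a different letter.
    position = rng.randrange(len(value))
    candidates = [c for c in string.ascii_lowercase if c != value[position]]
    return value[:position] + rng.choice(candidates) + value[position + 1:]
```

Passing a seeded `random.Random` keeps the behaviour reproducible in tests. The empty-string branch always returns exactly one character, and the non-empty branch preserves the length of the input.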

In the future, we could look into dependencies between error types and error mechanisms (e.g., insert Typos only into values that are not empty strings, yet preserve the error mechanism as well as possible).

Changes

  • Refactor tests, move them into separate folders and introduce conftest.py for shared fixtures.
  • Add a (very basic) unit test for Typo.
  • Make Typo work with empty strings.

@philipp-jung requested a review from @se-jaeger on May 13, 2025 16:54
@se-jaeger force-pushed the feat/typo-one-word branch from 343a90f to bddff8c on May 14, 2025 11:05
Member

@se-jaeger se-jaeger left a comment

LGTM.
I fixed some minor things. If you agree, feel free to merge it.

@se-jaeger merged commit e494477 into main on May 28, 2025
34 checks passed
@se-jaeger deleted the feat/typo-one-word branch on May 28, 2025 12:58
3 participants