-
Notifications
You must be signed in to change notification settings - Fork 2
hard deletions #504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 12-07-soft-deletes
Are you sure you want to change the base?
hard deletions #504
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Coverage Report (Python 3.11) •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Coverage Report (Python 3.10) •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Coverage Report (Python 3.13) •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
f892f10 to
5aed737
Compare
f7fa9b5 to
5addac5
Compare
| """Hard delete currently implemented for in-memory store only.""" | ||
|
|
||
| if any_store.__class__.__name__ != "InMemoryMetadataStore": | ||
| pytest.xfail("Hard delete pending for non-memory backends") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead just @pytest.mark.xfail(raises=NotImplementedError) the entire test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we later want to implement integration by integration so being able to remoe/choose which ones work
makes sense for me
is there a better way?
| assert active_row[METAXY_DELETED_AT].is_null().all() | ||
|
|
||
|
|
||
| def test_hard_delete_memory_store_only(any_store: MetadataStore): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO these deletion tests (the one from the previous PR as well) should be moved into other testing modules, this is called test_provenance_golden because it's testing... provenance engines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then we would have to copy the test initialization logic as well or externalize it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isnt the deletion also a part of the provenance i.e. especially in the case of soft deletions where we add rows and want to ensure that the right data is read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok that's a good point, actually to verify this we need to attempt to call .resolve_update for the feature and make sure increment.deleted is calculated correctly
danielgafni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, looking better, let's drop the DeletionResult thing for now?
Check my comment for reasoning
| f"Feature {feature_key.to_string()} not found in store; cannot soft delete." | ||
| ) | ||
|
|
||
| df = lazy.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm why exactly do we need this collect? is it just to return the row_count? probably we should avoid calculating the row count in this case.
We could get back to this later once we implement the metaxy_metadata column with arbitrary key-value pairs, we could include a soft_deletion_id there for backtracking the row count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should generalize that - also for hard deletions and ideally from the DB driver retrieve the affected/deleted records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we have to collect it somewhere: df: IntoFrame, for write_metadata is not lazy. So by dropping the Deletionresult we already got rid of a couple of collects. Is it fine to keep this one here for starters?
|
Another thing: this PR currently implements both hard and soft deletes. But only hard deletes are tested. Could you either split the PR or add soft deletion tests? For example, ensure that soft-deleting historical feature versions does not affect the current feature versions. Ensure that writing new metadata on top of soft-deleted rows properly undo the soft-deletion. Same with hard deletes. |
4427609 to
84fdbae
Compare
1a9d523 to
22465e0
Compare
can you explain what you mean. I think the soft deletes are downstack and have their test - and hard deletes here have their own tests see metaxy/tests/metadata_stores/test_provenance_golden_reference.py Lines 314 to 407 in 89cc2da
|
55c2d49 to
f4c8847
Compare
22465e0 to
bfa1ec9
Compare
f4c8847 to
3f6f579
Compare
3f6f579 to
cc0363a
Compare
720c654 to
b62661c
Compare
cc0363a to
8afd16b
Compare
b62661c to
b29bc0b
Compare
61f93dc to
e7564c0
Compare
b29bc0b to
01ee00e
Compare
192956a to
4faaaff
Compare
01ee00e to
da0b24b
Compare
4faaaff to
3472412
Compare
da0b24b to
a308098
Compare
6bf2974 to
b1c9ddd
Compare
a308098 to
5e9126b
Compare
5e9126b to
ab1b584
Compare
b1c9ddd to
3a3a985
Compare

No description provided.