Conversation

@geoHeil
Collaborator

@geoHeil geoHeil commented Dec 8, 2025

No description provided.

Collaborator Author

geoHeil commented Dec 8, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

Coverage

Coverage Report (Python 3.12)

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| metadata_store | | | | |
|    base.py | 505 | 155 | 69% | 3, 5–11, 13–17, 19–21, 27–29, 34, 38, 51–53, 58–60, 67, 70, 89, 91, 96, 101, 107, 112, 124, 152, 159, 162, 164, 234–235, 248–249, 262, 383, 444, 467, 553, 655, 712, 741, 743–744, 804, 881, 964–965, 1008–1009, 1012–1013, 1030, 1060–1061, 1066–1067, 1088, 1093, 1105, 1110, 1142, 1159, 1177, 1188, 1202, 1226, 1240, 1244, 1257, 1267–1268, 1292, 1319, 1354, 1388, 1445, 1519, 1535, 1542, 1550, 1556, 1560, 1596, 1618, 1644–1645, 1658, 1680, 1682–1683, 1686, 1729, 1758, 1783, 1817, 1843, 1856, 1858, 1915, 1994, 1996, 1999–2000, 2003–2004, 2008, 2018, 2031, 2033, 2035–2037, 2042–2043, 2046–2047, 2049–2051, 2053, 2056–2057, 2060–2063, 2066, 2075–2076, 2078–2079, 2082, 2085, 2087–2089, 2093–2094, 2097–2099, 2102, 2104, 2108 |
|    delta.py | 123 | 13 | 89% | 114, 128, 131, 164–165, 227, 271, 301, 304–305, 308, 316, 366 |
|    duckdb.py | 160 | 30 | 81% | 84–90, 92, 203–204, 208, 276, 286–288, 290, 298–299, 320, 325, 330, 344–345, 356, 361, 419, 426, 432, 437, 450 |
|    ibis.py | 176 | 39 | 77% | 205, 224, 375–376, 378, 420, 438, 469, 472–473, 476, 479, 482–483, 486, 489, 493, 496–497, 500, 503, 509, 511, 517–518, 520, 524, 526, 529, 531–532, 534, 538, 540–541, 562, 606, 674, 701 |
|    lancedb.py | 105 | 9 | 91% | 158–160, 163, 309–311, 361, 374 |
|    memory.py | 104 | 46 | 55% | 3–5, 7–10, 12–17, 20, 33, 36, 64–65, 67, 78, 82, 86–87, 96, 99, 101–102, 105, 107, 112, 139, 141, 149, 151, 169, 181, 189, 196, 201, 240, 257, 267–268, 306, 311–312 |
| models | | | | |
|    types.py | 152 | 81 | 46% | 3, 5–6, 16, 30, 33–34, 37, 46–48, 51, 54, 65, 70, 88–90, 100–104, 114–115, 117, 136–138, 158, 185, 192–193, 201–202, 206, 210, 214, 216, 222, 226, 230, 232, 236, 238, 242, 244, 248, 250, 254–255, 263, 267, 271, 275, 283, 320, 324, 328, 334, 339, 376, 380, 384, 391, 393, 396, 401, 437, 468, 471, 475, 483, 492, 495, 502, 507, 513, 516, 520, 522 |
| TOTAL | 8674 | 3441 | 60% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1999 | 46 💤 | 0 ❌ | 0 🔥 | 6m 22s ⏱️ |

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

Coverage

Coverage Report (Python 3.11)

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| metadata_store | | | | |
|    base.py | 505 | 155 | 69% | 3, 5–11, 13–17, 19–21, 27–29, 34, 38, 51–53, 58–60, 67, 70, 89, 91, 96, 101, 107, 112, 124, 152, 159, 162, 164, 234–235, 248–249, 262, 383, 444, 467, 553, 655, 712, 741, 743–744, 804, 881, 964–965, 1008–1009, 1012–1013, 1030, 1060–1061, 1066–1067, 1088, 1093, 1105, 1110, 1142, 1159, 1177, 1188, 1202, 1226, 1240, 1244, 1257, 1267–1268, 1292, 1319, 1354, 1388, 1445, 1519, 1535, 1542, 1550, 1556, 1560, 1596, 1618, 1644–1645, 1658, 1680, 1682–1683, 1686, 1729, 1758, 1783, 1817, 1843, 1856, 1858, 1915, 1994, 1996, 1999–2000, 2003–2004, 2008, 2018, 2031, 2033, 2035–2037, 2042–2043, 2046–2047, 2049–2051, 2053, 2056–2057, 2060–2063, 2066, 2075–2076, 2078–2079, 2082, 2085, 2087–2089, 2093–2094, 2097–2099, 2102, 2104, 2108 |
|    delta.py | 123 | 13 | 89% | 114, 128, 131, 164–165, 227, 271, 301, 304–305, 308, 316, 366 |
|    duckdb.py | 160 | 30 | 81% | 84–90, 92, 203–204, 208, 276, 286–288, 290, 298–299, 320, 325, 330, 344–345, 356, 361, 419, 426, 432, 437, 450 |
|    ibis.py | 176 | 39 | 77% | 205, 224, 375–376, 378, 420, 438, 469, 472–473, 476, 479, 482–483, 486, 489, 493, 496–497, 500, 503, 509, 511, 517–518, 520, 524, 526, 529, 531–532, 534, 538, 540–541, 562, 606, 674, 701 |
|    lancedb.py | 105 | 9 | 91% | 158–160, 163, 309–311, 361, 374 |
|    memory.py | 104 | 46 | 55% | 3–5, 7–10, 12–17, 20, 33, 36, 64–65, 67, 78, 82, 86–87, 96, 99, 101–102, 105, 107, 112, 139, 141, 149, 151, 169, 181, 189, 196, 201, 240, 257, 267–268, 306, 311–312 |
| models | | | | |
|    types.py | 152 | 81 | 46% | 3, 5–6, 16, 30, 33–34, 37, 46–48, 51, 54, 65, 70, 88–90, 100–104, 114–115, 117, 136–138, 158, 185, 192–193, 201–202, 206, 210, 214, 216, 222, 226, 230, 232, 236, 238, 242, 244, 248, 250, 254–255, 263, 267, 271, 275, 283, 320, 324, 328, 334, 339, 376, 380, 384, 391, 393, 396, 401, 437, 468, 471, 475, 483, 492, 495, 502, 507, 513, 516, 520, 522 |
| TOTAL | 8674 | 3441 | 60% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1999 | 46 💤 | 0 ❌ | 0 🔥 | 6m 33s ⏱️ |

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

Test Results (Python 3.10)

1 999 tests  +30   1 953 ✅ + 6   6m 54s ⏱️ +21s
    1 suites ± 0      46 💤 +24 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 3a3a985. ± Comparison against base commit ab1b584.

♻️ This comment has been updated with latest results.

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

Coverage

Coverage Report (Python 3.10)

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| metadata_store | | | | |
|    base.py | 505 | 155 | 69% | 3, 5–11, 13–17, 19–21, 27–29, 34, 38, 51–53, 58–60, 67, 70, 89, 91, 96, 101, 107, 112, 124, 152, 159, 162, 164, 234–235, 248–249, 262, 383, 444, 467, 553, 655, 712, 741, 743–744, 804, 881, 964–965, 1008–1009, 1012–1013, 1030, 1060–1061, 1066–1067, 1088, 1093, 1105, 1110, 1142, 1159, 1177, 1188, 1202, 1226, 1240, 1244, 1257, 1267–1268, 1292, 1319, 1354, 1388, 1445, 1519, 1535, 1542, 1550, 1556, 1560, 1596, 1618, 1644–1645, 1658, 1680, 1682–1683, 1686, 1729, 1758, 1783, 1817, 1843, 1856, 1858, 1915, 1994, 1996, 1999–2000, 2003–2004, 2008, 2018, 2031, 2033, 2035–2037, 2042–2043, 2046–2047, 2049–2051, 2053, 2056–2057, 2060–2063, 2066, 2075–2076, 2078–2079, 2082, 2085, 2087–2089, 2093–2094, 2097–2099, 2102, 2104, 2108 |
|    delta.py | 123 | 13 | 89% | 114, 128, 131, 164–165, 227, 271, 301, 304–305, 308, 316, 366 |
|    duckdb.py | 160 | 30 | 81% | 84–90, 92, 203–204, 208, 276, 286–288, 290, 298–299, 320, 325, 330, 344–345, 356, 361, 419, 426, 432, 437, 450 |
|    ibis.py | 176 | 39 | 77% | 205, 224, 375–376, 378, 420, 438, 469, 472–473, 476, 479, 482–483, 486, 489, 493, 496–497, 500, 503, 509, 511, 517–518, 520, 524, 526, 529, 531–532, 534, 538, 540–541, 562, 606, 674, 701 |
|    lancedb.py | 105 | 9 | 91% | 158–160, 163, 309–311, 361, 374 |
|    memory.py | 104 | 46 | 55% | 3–5, 7–10, 12–17, 20, 33, 36, 64–65, 67, 78, 82, 86–87, 96, 99, 101–102, 105, 107, 112, 139, 141, 149, 151, 169, 181, 189, 196, 201, 240, 257, 267–268, 306, 311–312 |
| models | | | | |
|    types.py | 152 | 81 | 46% | 3, 5–6, 16, 30, 33–34, 37, 46–48, 51, 54, 65, 70, 88–90, 100–104, 114–115, 117, 136–138, 158, 185, 192–193, 201–202, 206, 210, 214, 216, 222, 226, 230, 232, 236, 238, 242, 244, 248, 250, 254–255, 263, 267, 271, 275, 283, 320, 324, 328, 334, 339, 376, 380, 384, 391, 393, 396, 401, 437, 468, 471, 475, 483, 492, 495, 502, 507, 513, 516, 520, 522 |
| TOTAL | 8674 | 3441 | 60% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1999 | 46 💤 | 0 ❌ | 0 🔥 | 6m 54s ⏱️ |

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

Coverage

Coverage Report (Python 3.13)

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| metadata_store | | | | |
|    base.py | 505 | 155 | 69% | 3, 5–11, 13–17, 19–21, 27–29, 34, 38, 51–53, 58–60, 67, 70, 89, 91, 96, 101, 107, 112, 124, 152, 159, 162, 164, 234–235, 248–249, 262, 383, 444, 467, 553, 655, 712, 741, 743–744, 804, 881, 964–965, 1008–1009, 1012–1013, 1030, 1060–1061, 1066–1067, 1088, 1093, 1105, 1110, 1142, 1159, 1177, 1188, 1202, 1226, 1240, 1244, 1257, 1267–1268, 1292, 1319, 1354, 1388, 1445, 1519, 1535, 1542, 1550, 1556, 1560, 1596, 1618, 1644–1645, 1658, 1680, 1682–1683, 1686, 1729, 1758, 1783, 1817, 1843, 1856, 1858, 1915, 1994, 1996, 1999–2000, 2003–2004, 2008, 2018, 2031, 2033, 2035–2037, 2042–2043, 2046–2047, 2049–2051, 2053, 2056–2057, 2060–2063, 2066, 2075–2076, 2078–2079, 2082, 2085, 2087–2089, 2093–2094, 2097–2099, 2102, 2104, 2108 |
|    delta.py | 123 | 13 | 89% | 114, 128, 131, 164–165, 227, 271, 301, 304–305, 308, 316, 366 |
|    duckdb.py | 160 | 30 | 81% | 84–90, 92, 203–204, 208, 276, 286–288, 290, 298–299, 320, 325, 330, 344–345, 356, 361, 419, 426, 432, 437, 450 |
|    ibis.py | 176 | 39 | 77% | 205, 224, 375–376, 378, 420, 438, 469, 472–473, 476, 479, 482–483, 486, 489, 493, 496–497, 500, 503, 509, 511, 517–518, 520, 524, 526, 529, 531–532, 534, 538, 540–541, 562, 606, 674, 701 |
|    lancedb.py | 105 | 9 | 91% | 158–160, 163, 309–311, 361, 374 |
|    memory.py | 104 | 46 | 55% | 3–5, 7–10, 12–17, 20, 33, 36, 64–65, 67, 78, 82, 86–87, 96, 99, 101–102, 105, 107, 112, 139, 141, 149, 151, 169, 181, 189, 196, 201, 240, 257, 267–268, 306, 311–312 |
| models | | | | |
|    types.py | 152 | 81 | 46% | 3, 5–6, 16, 30, 33–34, 37, 46–48, 51, 54, 65, 70, 88–90, 100–104, 114–115, 117, 136–138, 158, 185, 192–193, 201–202, 206, 210, 214, 216, 222, 226, 230, 232, 236, 238, 242, 244, 248, 250, 254–255, 263, 267, 271, 275, 283, 320, 324, 328, 334, 339, 376, 380, 384, 391, 393, 396, 401, 437, 468, 471, 475, 483, 492, 495, 502, 507, 513, 516, 520, 522 |
| TOTAL | 8674 | 3441 | 60% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1999 | 46 💤 | 0 ❌ | 0 🔥 | 6m 29s ⏱️ |

@geoHeil geoHeil force-pushed the 12-07-soft-deletes branch from f892f10 to 5aed737 Compare December 8, 2025 12:39
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch 2 times, most recently from f7fa9b5 to 5addac5 Compare December 8, 2025 12:44
"""Hard delete currently implemented for in-memory store only."""

if any_store.__class__.__name__ != "InMemoryMetadataStore":
    pytest.xfail("Hard delete pending for non-memory backends")
Collaborator

Maybe instead just @pytest.mark.xfail(raises=NotImplementedError) the entire test?

Collaborator Author

@geoHeil geoHeil Dec 8, 2025

We later want to implement this integration by integration, so being able to remove/choose which ones work makes sense to me.

Is there a better way?
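
A minimal sketch of the two options discussed here, assuming pytest. The store classes are toy stand-ins that only borrow class names from this thread, and `HARD_DELETE_PENDING`, `xfail_if_pending` and the `hard_delete` method are hypothetical names introduced for illustration, not part of the PR. The `any_store` fixture is a stand-in for the project's real one.

```python
import pytest


class InMemoryMetadataStore:
    """Toy stand-in (not the real store): hard delete works."""

    def hard_delete(self) -> int:
        return 1


class ClickHouseMetadataStore:
    """Toy stand-in (not the real store): hard delete not implemented yet."""

    def hard_delete(self) -> int:
        raise NotImplementedError


# Hypothetical registry: drop a name once that backend supports hard deletes.
HARD_DELETE_PENDING = {"ClickHouseMetadataStore", "LanceDBMetadataStore"}


@pytest.fixture(params=[InMemoryMetadataStore, ClickHouseMetadataStore])
def any_store(request):
    # Stand-in for the project's real `any_store` fixture.
    return request.param()


def xfail_if_pending(store: object, pending: set[str], what: str) -> None:
    """Stop the current test as an expected failure if the backend is listed."""
    name = type(store).__name__
    if name in pending:
        pytest.xfail(f"{what} pending for {name}")


# Option A: mark the whole test; only NotImplementedError counts as expected.
@pytest.mark.xfail(raises=NotImplementedError)
def test_hard_delete_marked(any_store):
    assert any_store.hard_delete() == 1


# Option B: keep an explicit per-backend list that shrinks over time.
def test_hard_delete_per_backend(any_store):
    xfail_if_pending(any_store, HARD_DELETE_PENDING, "Hard delete")
    assert any_store.hard_delete() == 1
```

With option A, a backend that already supports hard deletes is reported as XPASS (or as a failure when strict xfail is enabled), which may be a reason to prefer the explicit list while support is rolled out backend by backend.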

assert active_row[METAXY_DELETED_AT].is_null().all()


def test_hard_delete_memory_store_only(any_store: MetadataStore):
Collaborator

IMO these deletion tests (the one from the previous PR as well) should be moved into other test modules. This one is called test_provenance_golden because it's testing... provenance engines.

Collaborator Author

But then we would have to copy the test initialization logic as well, or externalize it?

Collaborator Author

Isn't deletion also part of provenance, especially in the case of soft deletions, where we add rows and want to ensure that the right data is read?

Collaborator

Hmm, OK, that's a good point. Actually, to verify this we need to call .resolve_update for the feature and make sure increment.deleted is calculated correctly.

Collaborator

@danielgafni danielgafni left a comment

Alright, looking better, let's drop the DeletionResult thing for now?
Check my comment for reasoning

f"Feature {feature_key.to_string()} not found in store; cannot soft delete."
)

df = lazy.collect()
Collaborator

Hmm, why exactly do we need this collect? Is it just to return the row_count? We should probably avoid calculating the row count in this case.

We could get back to this later, once we implement the metaxy_metadata column with arbitrary key-value pairs; we could then include a soft_deletion_id there for backtracking the row count.

Collaborator Author

@geoHeil geoHeil Dec 9, 2025

We should generalize that to hard deletions as well and, ideally, retrieve the affected/deleted records from the DB driver.

Collaborator Author

But we have to collect it somewhere: the df: IntoFrame parameter of write_metadata is not lazy. By dropping the DeletionResult we already got rid of a couple of collects. Is it fine to keep this one here for starters?
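
As a sketch of what getting the affected row count from the DB driver could look like for a SQL backend: Python DB-API cursors expose `rowcount` after an UPDATE or DELETE, so the count comes back without collecting a frame. The example uses the standard-library sqlite3 module and a made-up `profile` table with a `deleted_at` column. Whether each metadata store backend offers an equivalent is an open assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (sample_uid TEXT, status TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO profile (sample_uid, status) VALUES (?, ?)",
    [("u1", "active"), ("u2", "active"), ("u3", "inactive")],
)

# Soft delete: stamp matching rows; the cursor reports how many were modified.
soft = conn.execute(
    "UPDATE profile SET deleted_at = datetime('now') "
    "WHERE status = ? AND deleted_at IS NULL",
    ("inactive",),
)
soft_count = soft.rowcount

# Hard delete: remove matching rows; again the cursor reports the count.
hard = conn.execute("DELETE FROM profile WHERE status = ?", ("inactive",))
hard_count = hard.rowcount

remaining = [
    row[0]
    for row in conn.execute("SELECT sample_uid FROM profile ORDER BY sample_uid")
]
```

Here both counts come from the statement that performed the change, so no separate counting query or materialized frame is needed.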

@danielgafni
Collaborator

Another thing: this PR currently implements both hard and soft deletes.

But only hard deletes are tested.

Could you either split the PR or add soft deletion tests?

For example, ensure that soft-deleting historical feature versions does not affect the current feature versions.

Ensure that writing new metadata on top of soft-deleted rows properly undoes the soft-deletion.

Same with hard deletes.
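
The two soft-delete scenarios above can be pinned down with a toy model. The sketch below is not the metaxy API: `ToyStore` is a hypothetical append-only store in which a soft delete appends a tombstone row and reads keep only the latest row per sample. That behaviour is an assumption about the intended semantics, made only to show what the requested tests would assert.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical append-only store; a stand-in, not the metaxy API."""

    rows: list[dict] = field(default_factory=list)

    def _append(self, version: str, uid: str, deleted: bool) -> None:
        # Later rows shadow earlier ones for the same (version, uid).
        self.rows.append({"version": version, "uid": uid, "deleted": deleted})

    def write(self, version: str, uid: str) -> None:
        self._append(version, uid, deleted=False)

    def soft_delete(self, version: str, uid: str) -> None:
        # A tombstone row is added; nothing is removed.
        self._append(version, uid, deleted=True)

    def read(self, version: str, include_soft_deleted: bool = False) -> set[str]:
        latest: dict[str, bool] = {}
        for row in self.rows:  # insertion order doubles as time order
            if row["version"] == version:
                latest[row["uid"]] = row["deleted"]
        return {
            uid
            for uid, deleted in latest.items()
            if include_soft_deleted or not deleted
        }


store = ToyStore()
for uid in ("u1", "u2"):
    store.write("v1", uid)  # historical feature version
    store.write("v2", uid)  # current feature version

# Scenario 1: soft-deleting a historical version leaves the current one alone.
store.soft_delete("v1", "u1")
assert store.read("v2") == {"u1", "u2"}
assert store.read("v1") == {"u2"}
assert store.read("v1", include_soft_deleted=True) == {"u1", "u2"}

# Scenario 2: writing new metadata on top of a soft-deleted row undoes it.
store.write("v1", "u1")
assert store.read("v1") == {"u1", "u2"}
```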

@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from 4427609 to 84fdbae Compare December 9, 2025 07:16
@geoHeil geoHeil force-pushed the 12-07-soft-deletes branch from 1a9d523 to 22465e0 Compare December 9, 2025 07:16
@geoHeil
Collaborator Author

geoHeil commented Dec 9, 2025

> Another thing: this PR currently implements both hard and soft deletes.
>
> But only hard deletes are tested.
>
> Could you either split the PR or add soft deletion tests?
>
> For example, ensure that soft-deleting historical feature versions does not affect the current feature versions.
>
> Ensure that writing new metadata on top of soft-deleted rows properly undoes the soft-deletion.
>
> Same with hard deletes.

Can you explain what you mean? I think the soft deletes are downstack and have their own test, and the hard deletes here have their own tests. See:

import narwhals as nw  # assumed from the `nw` alias used below
import polars as pl
import pytest

# MetadataStore, SampleFeature, SampleFeatureSpec and METAXY_PROVENANCE_BY_FIELD
# come from the project's own modules (imports not shown in this excerpt).


def test_soft_delete(any_store: MetadataStore):
    """Soft delete for stores that support it."""
    unsupported = {
        "ClickHouseMetadataStore",
        "LanceDBMetadataStore",
    }
    if any_store.__class__.__name__ in unsupported:
        pytest.xfail(f"Soft delete pending for {any_store.__class__.__name__}")

    class UserProfile(
        SampleFeature,
        spec=SampleFeatureSpec(
            key="hard_delete_profile",
            fields=["email", "status"],
        ),
    ):
        email: str | None = None
        status: str | None = None

    df = pl.DataFrame(
        {
            "sample_uid": ["u1", "u2", "u3"],
            "email": ["a@example.com", "b@example.com", "c@example.com"],
            "status": ["active", "active", "inactive"],
            METAXY_PROVENANCE_BY_FIELD: [
                {"email": "p1", "status": "p1"},
                {"email": "p2", "status": "p2"},
                {"email": "p3", "status": "p3"},
            ],
        }
    )

    with any_store.open("write"):
        any_store.write_metadata(UserProfile, df)
        # Soft delete is the default: the row is hidden, not removed.
        result = any_store.delete_metadata(
            UserProfile, filters=nw.col("status") == "inactive"
        )
        assert result.row_count == 1

    with any_store:
        remaining = any_store.read_metadata(UserProfile).collect().to_polars()
        assert set(remaining["sample_uid"]) == {"u1", "u2"}
        with_deleted = (
            any_store.read_metadata(UserProfile, include_soft_deleted=True)
            .collect()
            .to_polars()
        )
        assert set(with_deleted["sample_uid"]) == {"u1", "u2", "u3"}


def test_hard_delete(any_store: MetadataStore):
    """Hard delete removes rows from storage."""
    if any_store.__class__.__name__ != "InMemoryMetadataStore":
        pytest.xfail("Hard delete pending for non-memory backends")

    class UserProfile(
        SampleFeature,
        spec=SampleFeatureSpec(
            key="hard_delete_profile",
            fields=["email", "status"],
        ),
    ):
        email: str | None = None
        status: str | None = None

    df = pl.DataFrame(
        {
            "sample_uid": ["u1", "u2", "u3"],
            "email": ["a@example.com", "b@example.com", "c@example.com"],
            "status": ["active", "active", "inactive"],
            METAXY_PROVENANCE_BY_FIELD: [
                {"email": "p1", "status": "p1"},
                {"email": "p2", "status": "p2"},
                {"email": "p3", "status": "p3"},
            ],
        }
    )

    with any_store.open("write"):
        any_store.write_metadata(UserProfile, df)
        # soft=False requests physical removal of the matching rows.
        result = any_store.delete_metadata(
            UserProfile, filters=nw.col("status") == "inactive", soft=False
        )
        assert result.row_count == 1

    with any_store:
        remaining = any_store.read_metadata(UserProfile).collect().to_polars()
        assert remaining.height == 2
        assert set(remaining["sample_uid"]) == {"u1", "u2"}

@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch 2 times, most recently from 55c2d49 to f4c8847 Compare December 9, 2025 10:47
@geoHeil geoHeil force-pushed the 12-07-soft-deletes branch from 22465e0 to bfa1ec9 Compare December 9, 2025 10:47
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from f4c8847 to 3f6f579 Compare December 9, 2025 12:27
@geoHeil geoHeil requested a review from danielgafni December 9, 2025 13:01
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from 3f6f579 to cc0363a Compare December 10, 2025 13:24
@geoHeil geoHeil force-pushed the 12-07-soft-deletes branch 2 times, most recently from 720c654 to b62661c Compare December 10, 2025 14:19
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from cc0363a to 8afd16b Compare December 10, 2025 14:19
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch 2 times, most recently from 61f93dc to e7564c0 Compare December 10, 2025 16:09
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch 3 times, most recently from 192956a to 4faaaff Compare December 10, 2025 17:56
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from 4faaaff to 3472412 Compare December 11, 2025 13:16
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch 2 times, most recently from 6bf2974 to b1c9ddd Compare December 19, 2025 09:34
@geoHeil geoHeil force-pushed the 12-08-hard_deletions branch from b1c9ddd to 3a3a985 Compare December 23, 2025 07:08