Fix schema mismatch #99

Merged
jacobtomlinson merged 2 commits into dask-contrib:main from snorkel-ai:fix/schema_mismatch
Jun 30, 2025

@HiromuHota HiromuHota commented Jun 27, 2025

As demonstrated in test_append_with_schema below,

def test_append_with_schema(tmpdir):
    """Ensure we can append to a table with a schema"""
    tmpdir = str(tmpdir)
    df = pd.DataFrame({"a": [1, 2, 3, 4]})
    ddf = dd.from_pandas(df, npartitions=2)
    schema = pa.Schema.from_pandas(df)
    to_deltalake(tmpdir, ddf, schema=schema)
    to_deltalake(tmpdir, ddf, schema=schema, mode="append")

this fails as of 032253f on the main branch with "Schema of data does not match table schema" because we currently compare a pyarrow.Schema (schema) against an arro3.core.Schema (table.schema().to_arrow()).

if table:  # already exists
    if (
        schema is not None
        and schema != table.schema().to_arrow()
        and not (mode == "overwrite" and overwrite_schema)
    ):
        raise ValueError(
            "Schema of data does not match table schema\n"
            f"Table schema:\n{schema}\nData Schema:\n{table.schema().to_arrow()}"
        )

We should align the schema class before comparison.

dt.load_as_version(datetime)

- schema = pa.schema(dt.schema().to_arrow())
+ schema = pa.schema(dt.schema())

Just a refactoring.
Before: DeltaSchema -> Arro3Schema -> PyArrowSchema
After: DeltaSchema -> PyArrowSchema

@HiromuHota HiromuHota marked this pull request as ready for review June 27, 2025 18:42
@jacobtomlinson jacobtomlinson merged commit af91282 into dask-contrib:main Jun 30, 2025
13 checks passed