
Conversation

@rocha-paula
Contributor

Brief overview of changes

Added a comprehensive vignette and supporting benchmarking datasets for the write_df_to_delta() function.

Why are these changes being made?

To provide DfE analysts with a clear technical guide and empirical performance evidence for write_df_to_delta(). These changes ensure the new functionality is reproducible, documented to DfE standards, and proven to offer significant time savings for large-scale data ingestion.

Detailed description of changes

  • Vignette: Created vignettes/write_df_to_delta.Rmd.
  • Benchmarking Data: Added write_df_to_delta_benchmarks.rda and write_df_to_delta_stress_test.rda to the data/ folder.
  • Data Generation Scripts: Added data-raw/write_df_to_delta_benchmarks.R and data-raw/write_df_to_delta_stress_test.R.
  • Documentation: Updated R/datasets_documentation.R with documentation for the new datasets.
  • Dependencies: Updated DESCRIPTION to include ggplot2, purrr, and scales under Suggests.
  • Spelling: Updated inst/WORDLIST to resolve technical jargon flags.

Essential Checklist

  • I have read the contributing guidelines
  • The code follows the package style and naming conventions
  • All new and existing tests pass (devtools::test())
  • I have updated the documentation using devtools::document()
  • I have checked that my changes do not break existing functionality

Consider (as applicable)

  • I have added or updated documentation (function documentation, vignettes, readme, etc.)
  • I have updated the NEWS.md file with a summary of my changes
  • I have considered if a version bump is required in DESCRIPTION
  • I have added examples or usage where relevant
  • I have resolved styling (formatted using Air, or styler::style_pkg()) and lintr issues (lintr::lint_package())

Additional context

Note on test failures: During devtools::check(), a failure was noted in test-air_formatter.R. This is a pre-existing issue related to the local air.exe binary path on Windows and is unrelated to the changes in this PR.

Issue ticket number/s and link

#126


codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.19%. Comparing base (378f30e) to head (472f1ea).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #138   +/-   ##
=======================================
  Coverage   57.19%   57.19%           
=======================================
  Files          18       18           
  Lines        1509     1509           
=======================================
  Hits          863      863           
  Misses        646      646           


@mzayeddfe mzayeddfe self-requested a review January 28, 2026 11:04
Contributor

@mzayeddfe mzayeddfe left a comment


Thank you for your work on this @rocha-paula! I wasn't able to render the vignette, but I did go through it in an unrendered format, combined with the one you sent on Teams. The vignette is well organised, the structure is very clear, and I love the comparative content. It really drives the point home about how effective this function is!

I think the vignette fails to render and the pkgdown test fails here because we need either:

  • to document the write_df_to_delta_benchmarks.rda and write_df_to_delta_stress_test.rda datasets in the pkgdown file, or
  • to save those data files as internal. You can do this by setting internal = TRUE in usethis::use_data() for those datasets. Then amend the vignette so the wording doesn't imply users can access the datasets directly; you can still showcase their structure there if needed.

My suggestion would be to make the data internal, because most end users would not use it directly. The vignette already walks through the methodology, and the generation code is publicly available here. However, I'm interested to hear your thoughts on this.
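For reference, the internal-data option might look something like this at the end of the data-raw scripts (a sketch only; the object names are assumed from the .rda file names in this PR):

```r
# At the end of the data-raw/ scripts, after the benchmark objects are built.
# internal = TRUE saves the objects to R/sysdata.rda instead of data/, so
# package code and vignettes can use them but they are not exported to users.
usethis::use_data(
  write_df_to_delta_benchmarks,
  write_df_to_delta_stress_test,
  internal = TRUE,
  overwrite = TRUE
)
```

With the data internal, the two datasets would also no longer need entries in the pkgdown reference index or in R/datasets_documentation.R.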

I made one tiny suggestion below to link the `check_databricks_odbc()` function :)

- `DATABRICKS_HOST`: The URL of your workspace.
- `DATABRICKS_TOKEN`: Your personal access token (PAT).

**Tip:** Before you start, use `check_databricks_odbc()` to verify that your connection and environment variables are correctly configured.


A tiny suggestion here :)

Suggested change
**Tip:** Before you start, use `check_databricks_odbc()` to verify that your connection and environment variables are correctly configured.
**Tip:** Before you start, use [`check_databricks_odbc()`](https://dfe-analytical-services.github.io/dfeR/reference/check_databricks_odbc.html) to verify that your connection and environment variables are correctly configured.
