add write_df_to_delta vignette and performance datasets #138
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #138   +/-   ##
=======================================
  Coverage   57.19%   57.19%
=======================================
  Files          18       18
  Lines        1509     1509
=======================================
  Hits          863      863
  Misses        646      646
mzayeddfe
left a comment
Thank you for your work on this, @rocha-paula! I wasn't able to render the vignette, but I went through the unrendered source alongside the version you sent on Teams. The vignette is well organised, the structure is very clear, and I love the comparative content. I think it really drives home how effective this function is!
I think the vignette won't render for me, and the pkgdown test fails here, because we need to do one of the following:
- document the `write_df_to_delta_benchmarks.rda` and `write_df_to_delta_stress_test.rda` datasets in the pkgdown file, or
- save those data files as internal, by using `internal = TRUE` in `usethis::use_data` for each of those datasets. The vignette would then need amending so that the wording doesn't imply users can access those datasets; you can still showcase their structure there if needed.
My suggestion would be to make the data internal, because most end users would not use it. You go through the methodology, and the code is publicly available here. However, I'm interested to hear your thoughts on this.
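For illustration only, the internal-data option might look roughly like the sketch below at the end of a `data-raw/` script. The helper name `save_benchmarks_internal()` is hypothetical and not part of this PR. The sketch assumes it is run from inside the package project and that overwriting an existing `R/sysdata.rda` is acceptable. Because `usethis::use_data(internal = TRUE)` writes all supplied objects into a single `R/sysdata.rda`, both datasets are passed in one call.

```r
# Hypothetical helper (not part of the PR): store both benchmark datasets as
# internal package data. The argument names deliberately match the dataset
# names, because usethis::use_data() saves each object under the name of the
# symbol it is given.
save_benchmarks_internal <- function(write_df_to_delta_benchmarks,
                                     write_df_to_delta_stress_test) {
  usethis::use_data(
    write_df_to_delta_benchmarks,
    write_df_to_delta_stress_test,
    internal = TRUE,  # write to R/sysdata.rda instead of data/
    overwrite = TRUE  # assumption: replacing an existing sysdata.rda is fine
  )
}
```

Internal objects are then available to package code and vignettes without being exported, so they would not need entries in `R/datasets_documentation.R` or the pkgdown reference index.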
I made one tiny suggestion below to link the `check_databricks_odbc()` function :)
- `DATABRICKS_HOST`: The URL of your workspace.
- `DATABRICKS_TOKEN`: Your personal access token (PAT).

**Tip:** Before you start, use `check_databricks_odbc()` to verify that your connection and environment variables are correctly configured.
A tiny suggestion here :)
- **Tip:** Before you start, use `check_databricks_odbc()` to verify that your connection and environment variables are correctly configured.
+ **Tip:** Before you start, use [`check_databricks_odbc()`](https://dfe-analytical-services.github.io/dfeR/reference/check_databricks_odbc.html) to verify that your connection and environment variables are correctly configured.
Brief overview of changes
Added a comprehensive vignette and supporting benchmarking datasets for the `write_df_to_delta()` function.

Why are these changes being made?
To provide DfE analysts with a clear technical guide and empirical performance evidence for `write_df_to_delta()`. These changes ensure the new functionality is reproducible, documented to DfE standards, and proven to offer significant time savings for large-scale data ingestion.

Detailed description of changes
- `vignettes/write_df_to_delta.Rmd`.
- `write_df_to_delta_benchmarks.rda` and `write_df_to_delta_stress_test.rda` to the `data/` folder.
- `data-raw/write_df_to_delta_benchmarks.R` and `data-raw/write_df_to_delta_stress_test.R`.
- `R/datasets_documentation.R` with documentation for the new datasets.
- `DESCRIPTION` to include `ggplot2`, `purrr`, and `scales` under `Suggests`.
- `inst/WORDLIST` to resolve technical jargon flags.

Essential Checklist
- `devtools::test()`
- `devtools::document()`

Consider (as applicable)
- `NEWS.md` file with a summary of my changes
- `DESCRIPTION`
- `styler::style_pkg()` and lintr issues (`lintr::lint_package()`)

Additional context
Note on test failures: During `devtools::check()`, a failure was noted in `test-air_formatter.R`. This is a pre-existing issue related to the local `air.exe` binary path on Windows and is unrelated to the changes in this PR.

Issue ticket number/s and link
#126