Handle record/item/row deletions #265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Changes from all commits: 332dcc8, 949c29e, e153c2e, 926e780, ca29d5e, bb2b65f, 7b9cba8, a8accda, 0ba9cb9, 3be1812, dc26703, d24e8c4, 66099c4
```diff
@@ -158,7 +158,7 @@ db_add_primary_key <- function(con, name, value, keys = NULL) {
 #' Add Logging Table
 #'
-#' Both creates the logging table and the trigger to update it for
+#' Creates the logging table and the triggers on updates and deletions for
 #' all_review_data.
 #'
 #' @param con A DBI Connection to the SQLite DB
```
```diff
@@ -218,6 +218,78 @@ db_add_log <- function(con, key_cols = c("id", key_columns)) {
     "END"
   ))
   DBI::dbClearResult(rs)
+  create_delete_log_trigger(con)
 }
 
+#' Create Delete Logging Trigger
+#'
+#' Logs record deletions on all_review_data into all_review_data_log.
+#'
+#' @param con A DBI Connection to the SQLite DB
+#'
+#' @keywords internal
+create_delete_log_trigger <- function(con) {
+  rs <- DBI::dbSendStatement(con, paste(
+    "CREATE TRIGGER all_review_data_delete_log_trigger",
+    "AFTER DELETE ON all_review_data FOR EACH ROW",
+    "BEGIN",
+    "INSERT INTO all_review_data_log (",
+    "review_id, edit_date_time, reviewed, comment, reviewer, timestamp, status, dml_type",
+    ")",
+    "VALUES(",
+    "OLD.id,",
+    "OLD.edit_date_time,",
+    "OLD.reviewed,",
+    "OLD.comment,",
+    "OLD.reviewer,",
+    "OLD.timestamp,",
+    "OLD.status,",
+    "'DELETE'",
+    ");",
+    "END"
+  ))
+  DBI::dbClearResult(rs)
+}
```
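The added trigger is easiest to understand in isolation. Below is a standalone sketch of the same AFTER DELETE mechanics against an in-memory SQLite database; the table names and reduced column set here are hypothetical, not ClinSight's actual schema.

```r
library(DBI)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE review (id INTEGER, status TEXT)")
DBI::dbExecute(con, paste(
  "CREATE TABLE review_log",
  "(review_id INTEGER, status TEXT, dml_type TEXT)"
))

# AFTER DELETE fires once per removed row; OLD.* exposes the values of the
# row being deleted, which is what makes the audit insert possible.
DBI::dbExecute(con, paste(
  "CREATE TRIGGER review_delete_log AFTER DELETE ON review FOR EACH ROW",
  "BEGIN",
  "INSERT INTO review_log (review_id, status, dml_type)",
  "VALUES (OLD.id, OLD.status, 'DELETE');",
  "END"
))

DBI::dbExecute(con, "INSERT INTO review VALUES (1, 'reviewed'), (2, 'new')")
DBI::dbExecute(con, "DELETE FROM review WHERE id = 1")
log_rows <- DBI::dbGetQuery(con, "SELECT * FROM review_log")
log_rows  # one row: review_id 1, status 'reviewed', dml_type 'DELETE'
DBI::dbDisconnect(con)
```

The application code never has to remember to log deletions: the database does it, which is the point of using a trigger rather than a wrapper function.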
```diff
+
+#' Upgrade app database
+#'
+#' Helper function to upgrade the database when migrating to a newer version of
+#' ClinSight that impacts the database.
+#'
+#' @param db_path Character vector. Path to the database
+#'
+#' @return A character vector indicating the version the DB was upgraded to. Run
+#'   for side effects on the database.
+#'
+#' @export
+db_upgrade <- function(db_path){
+  stopifnot(file.exists(db_path))
+
+  current_db_version <- db_get_version(db_path)
+  if (identical(current_db_version, db_version))
+    return(paste("Upgraded to version", db_version))
+
+  switch(
+    current_db_version,
+    "1.1" = db_temp_connect(db_path, {
+      if (!identical(db_upgrade_menu("1.2"), 1L))
+        return(paste("Abandoning DB upgrading. DB version is", current_db_version))
+      create_delete_log_trigger(con)
+      DBI::dbWriteTable(con, "db_version", data.frame(version = "1.2"), overwrite = TRUE)
+    }),
+    stop("No upgrade available for version ", current_db_version)
+  )
+  db_upgrade(db_path)
+}
```
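The recursive `switch()` pattern in `db_upgrade()` can be illustrated with a toy, database-free model. The version numbers and migration bodies below are invented for illustration; the point is that each `switch` branch advances exactly one version and the recursive call at the end re-enters the function until the target version is reached.

```r
# Toy model of the recursive upgrade pattern; versions and migration
# steps are made up for illustration.
target_version <- "1.3"

state <- new.env()
state$version <- "1.1"

upgrade <- function(state) {
  if (identical(state$version, target_version))
    return(paste("Upgraded to version", target_version))
  switch(
    state$version,
    "1.1" = { state$version <- "1.2" },  # the 1.1 -> 1.2 migration would run here
    "1.2" = { state$version <- "1.3" },  # the 1.2 -> 1.3 migration would run here
    stop("No upgrade available for version ", state$version)
  )
  upgrade(state)  # recurse until the target version is reached
}

result <- upgrade(state)
result  # "Upgraded to version 1.3"
```

Adding support for a future version means adding one named branch; the recursion chains the existing steps together, so no branch needs to know about anything other than the next version up.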
```diff
+
+db_upgrade_menu <- function(upgrade_version) {
+  utils::menu(
+    c("Yes", "No"),
+    title = sprintf(
+      "Attempting to update to v%s. Do you want to proceed?",
+      upgrade_version
+    )
+  )
+}
```
```diff
 
 #' Update app database
 
@@ -242,6 +314,7 @@ db_update <- function(
   edit_time_var = "edit_date_time"
 ){
   stopifnot(file.exists(db_path))
+  stopifnot(identical(db_version, db_get_version(db_path)))
   con <- get_db_connection(db_path)
   data_synch_time <- attr(data, "synch_time") %||% ""
```
```diff
@@ -260,6 +333,14 @@ db_update <- function(
   # Continue in the case data_synch_time is missing and if data_synch_time is
   # more recent than db_synch_time
   review_data <- DBI::dbGetQuery(con, "SELECT * FROM all_review_data")
+  cat("Check for deleted rows\n")
+  deleted_review_data <- delete_review_data(
+    review_df = review_data,
+    latest_review_data = data,
+    key_cols = key_cols
+  )
+  cat("logging deleted review data to database...\n")
+  db_delete(con, deleted_review_data)
```
Comment on lines +336 to +343

Collaborator: Do we need to hard-force a DB update / this DB version, or can we run this part on the condition that the DB is at least version 1.2?

Collaborator (Author): I think things will get messy fast if we allow desynching between the DB version and the ClinSight version. Also, we already have a hard version check in

Collaborator: Desynching should be removed as soon as it becomes messy, of course. A desynch of no more than a minor version would not be a problem, I think. We can remove desynching at the next DB update. It would be easier to update ClinSight in running studies. With that desynch the dev version is still a drop-in replacement for the current v0.3.0 version, with no breaking changes.
Comment on lines +337 to +343

Collaborator: Is this safe to run automatically with each update? What if this goes wrong and the wrong lines are selected for deletion? Can this be restored somehow easily, or is that difficult? For example: what if you use an API to pull data, this data pull malfunctions and returns either zero rows or not enough rows, and you run db_update, do you lose all your review data then?

Collaborator (Author): The deleted rows are located in the logging table. The situation you outlined above seems outside of this process, though. I would hope that a check would be in place to verify data before pushing to ClinSight. And even if a push happened, I would hope someone would be creating DB backups for rollback if needed.

Collaborator (Author): Above statement is inaccurate. The key columns are not being stored in the logging table, only the ID. Their values are truly lost in this case.

Collaborator: The
```diff
 
   cat("Start adding new rows to database\n")
   updated_review_data <- update_review_data(
     review_df = review_data,
@@ -279,6 +360,34 @@ db_update <- function(
   cat("Finished updating review data\n")
 }
```
```diff
 
+#' DELETE FROM all_review_data
+#'
+#' Performs a DELETE FROM on all_review_data based on a data set containing the
+#' records to delete.
+#'
+#' @param con A DBI Connection to the SQLite DB
+#' @param data A data frame containing the records to DELETE FROM
+#'   all_review_data
+#' @param key_cols A character vector specifying which columns define a unique
+#'   index for a row. Defaults to "id"
+#'
+#' @return invisibly returns TRUE. Is run for its side effects on the DB.
+#'
+#' @keywords internal
+db_delete <- function(con, data, key_cols = "id") {
+  dplyr::copy_to(con, data, "row_deletes")
+  rs <- DBI::dbSendStatement(con, paste(
+    "DELETE FROM",
+    "all_review_data",
+    "WHERE EXISTS (",
+    "SELECT 1",
+    "FROM row_deletes",
+    "WHERE", paste(sprintf("row_deletes.%1$s = all_review_data.%1$s", key_cols), collapse = " AND "),
+    ")"
+  ))
+  DBI::dbClearResult(rs)
+}
```
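A minimal sketch of the staged-delete pattern used by db_delete(): write the keys to remove into a helper table, then issue one DELETE ... WHERE EXISTS keyed on the match. Table and column names here are illustrative; the real function stages the rows with dplyr::copy_to() rather than DBI::dbWriteTable().

```r
library(DBI)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "main_tbl", data.frame(id = 1:3, val = c("a", "b", "c")))
# Stage the keys to remove (db_delete() does this with dplyr::copy_to()).
DBI::dbWriteTable(con, "row_deletes", data.frame(id = 2L))

# One statement removes every row whose key appears in the staging table.
DBI::dbExecute(con, paste(
  "DELETE FROM main_tbl WHERE EXISTS (",
  "SELECT 1 FROM row_deletes WHERE row_deletes.id = main_tbl.id",
  ")"
))
remaining <- DBI::dbGetQuery(con, "SELECT id FROM main_tbl")
remaining  # ids 1 and 3
DBI::dbDisconnect(con)
```

Staging the keys server-side keeps the delete a single set-based statement, which also means the AFTER DELETE logging trigger fires once per matched row in the same transaction.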
```diff
 
 #' UPSERT to all_review_data
 #'
 #' Performs an UPSERT on all_review_data. New records will be appended to the
@@ -290,7 +399,7 @@ db_update <- function(
 #' @param key_cols A character vector specifying which columns define a unique
 #'   index for a row. Defaults to `ClinSight` [key_columns()].
 #'
-#' @return invisibly returns TRUE. Is run for it's side effects on the DB.
+#' @return invisibly returns TRUE. Is run for its side effects on the DB.
 #'
 #' @keywords internal
 db_upsert <- function(con, data, key_cols = key_columns) {
```
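For contrast with the delete path, here is a hypothetical sketch of the UPSERT idea against SQLite (the ON CONFLICT clause needs SQLite 3.24 or later, which current RSQLite bundles). Table and column names are illustrative, not the package's actual statement.

```r
library(DBI)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE review (id INTEGER PRIMARY KEY, status TEXT)")
DBI::dbExecute(con, "INSERT INTO review VALUES (1, 'new')")

# A second insert with the same key updates the row in place instead of
# erroring; excluded.* refers to the values of the rejected insert.
DBI::dbExecute(con, paste(
  "INSERT INTO review (id, status) VALUES (1, 'reviewed')",
  "ON CONFLICT (id) DO UPDATE SET status = excluded.status"
))
res <- DBI::dbGetQuery(con, "SELECT * FROM review")
res  # a single row with status 'reviewed'
DBI::dbDisconnect(con)
```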
```diff
@@ -39,6 +39,40 @@ get_review_data <- function(
   all_review_data
 }
 
+#' Delete review data
+#'
+#' Identifies the records no longer in the dataset to be removed from
+#' all_review_data.
+#'
+#' @param review_df Data frame containing old review data that need to be
+#'   updated.
+#' @param latest_review_data Data frame containing latest review data.
+#' @param key_cols A character vector containing the common key variables.
+#'   Defaults to `ClinSight` [key_columns()].
+#'
+#' @return A data frame containing only the rows to remove from the review data.
+#'
+#' @keywords internal
+delete_review_data <- function(
```
Collaborator: Function name is a bit confusing because it is not deleting review data but rather returning the rows to be deleted.

Collaborator (Author): I agree. I was mimicking
||
| review_df, | ||
| latest_review_data, | ||
| key_cols = key_columns | ||
| ){ | ||
| stopifnot(is.data.frame(latest_review_data), nrow(latest_review_data) > 0 ) | ||
| stopifnot(is.data.frame(review_df), nrow(review_df) > 0 ) | ||
| stopifnot(is.character(key_cols)) | ||
|
|
||
| deleted_data <- dplyr::anti_join(review_df, latest_review_data, by = key_cols) | ||
| n_deleted <- nrow(deleted_data) | ||
| if(n_deleted != 0){ | ||
| warning(n_deleted, " items were not found in the updated dataset") | ||
| cat("Missing items in the dataset:\n") | ||
| print(deleted_data) | ||
| } | ||
|
|
||
| deleted_data | ||
| } | ||
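The core of delete_review_data() is the anti-join: it returns the rows of review_df whose keys no longer appear in latest_review_data. A miniature example with hypothetical data:

```r
library(dplyr)

# Hypothetical miniature data: record 2 is gone from the latest pull.
review_df <- data.frame(id = c(1, 2, 3), status = c("old", "old", "reviewed"))
latest_review_data <- data.frame(id = c(1, 3), status = c("new", "updated"))

# anti_join() keeps the rows of x that have no key match in y.
deleted <- anti_join(review_df, latest_review_data, by = "id")
deleted  # the single row with id = 2
```

Because only the key columns drive the join, changes in non-key columns (such as status) do not flag a row for deletion; only a vanished key does.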
```diff
 
 #' Update review data
 #'
 #'
@@ -67,20 +101,6 @@ update_review_data <- function(
   stopifnot(is.character(key_cols))
   stopifnot(is.character(edit_time_var))
```
```diff
 
-  deleted_data <- dplyr::anti_join(review_df, latest_review_data, by = key_cols)
-  n_deleted <- nrow(deleted_data)
-  if(n_deleted != 0){
-    warning(n_deleted, " items were not found in the updated dataset")
-    cat("Missing items in the dataset:\n")
-    print(deleted_data)
-  }
-
-  # If data is reviewed and later in an update deleted, then it is okay if the
-  # deletion does not show up in the updated data set, and the deleted data point shows
-  # up as reviewed. One reason why this is okay is because missing data is checked
-  # by the data managers.
-  # a more practical reason is that missing data does not show up in the application.
 
   # when do we have the same edit_date_time but different timestamps?
   # For example:
   # - we have a new data point. It is not yet reviewed (status new or updated).
```
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Collaborator (Author): Was updated by running
I like this approach to the database upgrading issue. By using recursion, we can essentially just add a step to the upgrade process. Once the database is fully upgraded to the expected value based on the version of ClinSight itself, it will return a success. I also added a version check in db_update(). Don't know if there are other places versioning should be checked.

I am fine with this approach. I guess this needs to be rewritten to a loop if there are multiple database versions, but that can be done once needed. You could also think of asking user confirmation of the update if one is found, something like this:

utils::menu(c("Yes", "No"), title = "Update available. Do you really want to proceed?")

Version is also checked in run_app().

Actually, this approach doesn't need a loop because it is using recursion (it calls itself at the end: db_upgrade(db_path)). Say the release after this is version 1.3. We would simply add "1.2" to the switch() and the function would iterate based on the version. This allows us to focus on the changes needed to go from version 1.2 to 1.3 and ignore 1.1 to 1.3.

Sure. I think that makes sense.

Yes, I added checks to db_update() because it is exported and can be run outside of run_app().