Description
The list_result_files() function fails with an unhelpful error message when it encounters directories or files that don't follow the expected ethoscope naming convention (e.g., backup directories with .backup suffix).
This issue can be difficult to debug because:
- The error message doesn't identify which file/directory is problematic
- Manual testing with properly formatted data works fine, masking the issue
- Large datasets make manual inspection impractical (we have 21000 folders as of now...)
Error Message
Error: 1 parsing failure
Traceback:
1. build_query(result_dir, query, index_file)
2. list_result_files(result_dir, index_file)
3. parse_datetime(files_info$datetime)
4. parse_time(match[, 2], format = "%H-%M-%S")
5. readr::stop_for_problems(out)
Root Cause
The function uses list.files(result_dir, recursive=T, pattern="*\\.db$") which can pick up files in directories with non-standard names. When it encounters a directory named like 2025-07-02_15-32-33.backup, it tries to parse 15-32-33.backup as a time string, causing readr::parse_time() to fail.
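The failure mode can be illustrated without scopr or readr. The sketch below uses base R only and assumes that the datetime is taken from the directory directly containing the .db file; the two paths are made-up examples modelled on the one in this report, and the variable names are illustrative rather than scopr internals.

```r
# Hypothetical relative paths, shaped like what a recursive list.files() call returns
all_db_files <- c(
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33/file.db",
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db"
)

# Assumption: the datetime string is the name of the directory holding the .db file
datetime <- basename(dirname(all_db_files))

# Split "date_time" at the underscore and keep the time half
parts <- strsplit(datetime, "_", fixed = TRUE)
time_part <- vapply(parts, function(p) p[2], character(1))

# Only strings of the exact form HH-MM-SS can be parsed with "%H-%M-%S"
looks_like_time <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}$", time_part)

print(data.frame(time_part, looks_like_time))
```

The second entry keeps its .backup suffix after the split, which is the string a strict time parser then rejects.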
Steps to Reproduce
- Create a directory structure with a malformed directory name:
  /some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db
- Run list_result_files() on the parent directory
- Observe the cryptic parsing failure error
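The steps above can be scripted roughly as follows. This is a sketch that builds the layout in a temporary directory with an empty placeholder file.db; the scopr call is left commented out because it needs the package installed, and a real result directory may need to meet further conventions (for example the machine ID format) before the function reaches the datetime parsing step.

```r
# Build the malformed layout from this report inside a throwaway directory
result_dir <- file.path(tempdir(), "scopr_repro")
bad_dir <- file.path(result_dir, "some_machine_id", "ETHOSCOPE_XXX",
                     "2025-07-02_15-32-33.backup")
dir.create(bad_dir, recursive = TRUE, showWarnings = FALSE)

# Empty placeholder; only the path matters for this illustration
file.create(file.path(bad_dir, "file.db"))

# Same kind of listing as described under Root Cause: the backup copy is picked up
found <- list.files(result_dir, recursive = TRUE, pattern = "\\.db$")
print(found)

# With scopr installed, this call is expected to hit the parsing failure:
# scopr::list_result_files(result_dir)
```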
Current Workaround
Manually identify and remove/rename malformed directories:
find /path/to/data -name "*backup*"
rm -rf /path/to/malformed/directory
Proposed Solution
Enhance the list_result_files() function to:
- Validate datetime format before parsing:
  # Add validation step
  datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
  valid_datetime <- grepl(datetime_pattern, files_info$datetime)
  if(!all(valid_datetime)) {
    invalid_files <- all_db_files[!valid_datetime]
    warning("Found files with invalid datetime format:")
    for(file in head(invalid_files, 5)) {
      warning(" ", file)
    }
    # Optionally filter out invalid files or stop with informative error
  }
- Provide more informative error messages:
  parse_datetime <- function(x){
    match <- stringr::str_split(x, "_", simplify=TRUE)
    tryCatch({
      d <- parse_date(match[,1])
      t <- parse_time(match[,2], format="%H-%M-%S")
      data.table::data.table(date=d, time = t)
    }, error = function(e) {
      # Identify problematic entries
      problems <- readr::problems(readr::parse_time(match[,2], format="%H-%M-%S"))
      if(nrow(problems) > 0) {
        stop(sprintf("Failed to parse datetime from file paths. Problematic entries:\n%s\nCheck for malformed directory names or backup files.",
                     paste(x[problems$row[1:min(5, nrow(problems))]], collapse="\n")))
      }
      stop(e)
    })
  }
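As one possible shape for the filtering option mentioned in the first proposal, the sketch below wraps the check in a standalone helper that drops non-matching paths and reports them in a single warning. filter_valid_db_files and its max_shown argument are hypothetical names, not part of scopr, and the helper assumes the datetime is the name of the directory containing each .db file.

```r
# Hypothetical helper (not part of scopr): keep only paths whose parent
# directory is named exactly YYYY-MM-DD_HH-MM-SS, and warn about the rest.
filter_valid_db_files <- function(all_db_files, max_shown = 5) {
  datetime <- basename(dirname(all_db_files))
  pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}$"
  valid <- grepl(pattern, datetime)

  if (!all(valid)) {
    invalid <- all_db_files[!valid]
    # One warning listing the first few offenders, so they are easy to locate
    warning(sprintf(
      "Skipping %d file(s) with an unexpected datetime directory name:\n%s",
      length(invalid),
      paste(head(invalid, max_shown), collapse = "\n")
    ), call. = FALSE)
  }

  all_db_files[valid]
}

# Usage with made-up paths: the backup copy is reported and dropped
files <- c(
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33/file.db",
  "some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db"
)
kept <- filter_valid_db_files(files)
print(kept)
```

Whether to skip with a warning or stop with an error is a design choice for the maintainers; skipping lets a query proceed on 21000 folders when only a handful are malformed, while stopping makes sure nothing is silently excluded.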
Environment
- R version: 4.5.1 (2025-06-13)
- scopr version: 0.3.3
- readr version: 2.1.5
- OS: [your OS]