list_result_files() fails with cryptic "1 parsing failure" error when encountering malformed directory names #21

@ggilestro

Description

The list_result_files() function fails with an unhelpful error message when it encounters directories or files that don't follow the expected ethoscope naming convention (e.g., backup directories with .backup suffix).

This issue can be difficult to debug because:

  • The error message doesn't identify which file/directory is problematic
  • Manual testing with properly formatted data works fine, masking the issue
  • Large datasets make manual inspection impractical (we have 21,000 folders as of now)

Error Message

Error: 1 parsing failure
Traceback:
1. build_query(result_dir, query, index_file)
2. list_result_files(result_dir, index_file)
3. parse_datetime(files_info$datetime)
4. parse_time(match[, 2], format = "%H-%M-%S")
5. readr::stop_for_problems(out)

Root Cause

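The function uses list.files(result_dir, recursive=T, pattern="*\\.db$"), which also picks up .db files inside directories with non-standard names. When it encounters a directory named like 2025-07-02_15-32-33.backup, it splits the name on "_" and tries to parse 15-32-33.backup as a time string with the format %H-%M-%S. The trailing .backup cannot be parsed, so readr::stop_for_problems() aborts with the generic "1 parsing failure" message, which does not name the offending path.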
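
The failure can be isolated outside scopr. The sketch below assumes only that readr is installed and, as the traceback suggests, that the time string is ultimately handed to readr's time parser. It shows how the suffixed directory name produces a single unparseable value:

```r
library(readr)

# Directory name taken from the example above
dir_name <- "2025-07-02_15-32-33.backup"

# Split on "_" into a date part and a time part
parts <- strsplit(dir_name, "_", fixed = TRUE)[[1]]
time_part <- parts[2]  # "15-32-33.backup"

# The format must match the whole string, so the trailing ".backup"
# makes readr return NA and record one parsing problem.
# readr signals this with a warning, which is silenced here.
parsed <- suppressWarnings(parse_time(time_part, format = "%H-%M-%S"))
n_problems <- nrow(problems(parsed))

# stop_for_problems() converts the recorded problem into an error;
# capture its message instead of aborting the script
msg <- tryCatch(
  {
    stop_for_problems(parsed)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Note that the captured message carries no reference to time_part or to the path it came from, which is why the error is hard to trace in a large result directory.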

Steps to Reproduce

  1. Create a directory structure with a malformed directory name:
    /some_machine_id/ETHOSCOPE_XXX/2025-07-02_15-32-33.backup/file.db
    
  2. Run list_result_files() on the parent directory
  3. Observe the cryptic parsing failure error

Current Workaround

Manually identify and remove/rename malformed directories:

find /path/to/data -name "*backup*"
rm -rf /path/to/malformed/directory
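
For large archives, a non-destructive scan may be a safer first step than deleting by name pattern. The following is a minimal sketch in base R, assuming each .db file sits directly inside its datetime-named directory, as in the reproduction path above. find_malformed_results() is a hypothetical helper and not part of scopr:

```r
# Hypothetical helper: list .db files whose parent directory does not
# match the expected YYYY-MM-DD_HH-MM-SS naming convention.
find_malformed_results <- function(result_dir) {
  db_files <- list.files(result_dir, pattern = "\\.db$",
                         recursive = TRUE, full.names = TRUE)
  datetime_dirs <- basename(dirname(db_files))
  datetime_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}$"
  db_files[!grepl(datetime_pattern, datetime_dirs)]
}

# Demonstration on a throwaway directory tree
root <- file.path(tempdir(), "scan_demo")
good <- file.path(root, "some_machine_id", "ETHOSCOPE_XXX", "2025-07-02_15-32-33")
bad <- paste0(good, ".backup")
dir.create(good, recursive = TRUE, showWarnings = FALSE)
dir.create(bad, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(good, "file.db"), file.path(bad, "file.db")))

# Only the file under the ".backup" directory is reported
malformed <- find_malformed_results(root)
print(malformed)
```

The returned paths can then be reviewed before anything is renamed or removed.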

Proposed Solution

Enhance the list_result_files() function to:

  1. Validate datetime format before parsing:

    # Add validation step
    datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
    valid_datetime <- grepl(datetime_pattern, files_info$datetime)
    
    if (!all(valid_datetime)) {
      invalid_files <- all_db_files[!valid_datetime]
      # Emit a single warning that lists up to five offending paths
      warning("Found files with invalid datetime format:\n",
              paste0("  ", head(invalid_files, 5), collapse = "\n"))
      # Optionally filter out invalid files or stop with informative error
    }

  2. Provide more informative error messages:

    parse_datetime <- function(x){
      match <- stringr::str_split(x, "_", simplify=TRUE)
      
      tryCatch({
        d <- parse_date(match[,1])
        t <- parse_time(match[,2], format="%H-%M-%S")
        data.table::data.table(date=d, time = t)
      }, error = function(e) {
        # Identify problematic entries
        problems <- readr::problems(readr::parse_time(match[,2], format="%H-%M-%S"))
        if(nrow(problems) > 0) {
          stop(sprintf("Failed to parse datetime from file paths. Problematic entries:\n%s\nCheck for malformed directory names or backup files.",
                      paste(x[problems$row[1:min(5, nrow(problems))]], collapse="\n")))
        }
        stop(e)
      })
    }
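
The two proposals can also be combined so that malformed entries are reported by name before any parsing is attempted. This is a sketch under stated assumptions: parse_datetime_checked() is a hypothetical name, it uses base R only, and it returns a plain data.frame rather than the data.table built by the existing function. The issue leaves open whether to stop or to skip bad entries; this version stops.

```r
# Hypothetical replacement for the validation + parsing steps.
# x is a character vector of directory names such as "2025-07-02_15-32-33".
parse_datetime_checked <- function(x, max_report = 5) {
  datetime_pattern <- "^\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}$"
  ok <- grepl(datetime_pattern, x)

  if (!all(ok)) {
    bad <- x[!ok]
    # Name the offending entries (capped at max_report) in the error itself
    stop(sprintf(
      "Found %d malformed datetime name(s):\n%s\nCheck for backup or renamed directories.",
      length(bad),
      paste0("  ", head(bad, max_report), collapse = "\n")
    ), call. = FALSE)
  }

  # Every entry matched the pattern, so splitting on "_" is safe
  data.frame(
    date = as.Date(sub("_.*$", "", x), format = "%Y-%m-%d"),
    time = chartr("-", ":", sub("^.*_", "", x)),
    stringsAsFactors = FALSE
  )
}

# A well-formed name parses normally
ok_result <- parse_datetime_checked("2025-07-02_15-32-33")

# A malformed name is reported explicitly instead of "1 parsing failure"
err <- tryCatch(
  parse_datetime_checked(c("2025-07-02_15-32-33", "2025-07-02_15-32-33.backup")),
  error = function(e) conditionMessage(e)
)
cat(err, "\n")
```

Validating with a regular expression first keeps the error path independent of readr's problem reporting, at the cost of duplicating the format definition in two places.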

Environment

  • R version: 4.5.1 (2025-06-13)
  • scopr version: 0.3.3
  • readr version: 2.1.5
  • OS: [your OS]
