process_data() throws an error with 2024 data #23

@mthomas-ketchbrook

Description

process_data() throws an error when run against a directory containing the downloaded March 2024 data.

Reproducible Example

library(fcall)
# Download March 2024 data
download_data(year = 2024, month = 3, dest = "data-raw/2024-03")
# Process data
processed_data <- process_data("data-raw/2024-03")

returns the error:

Error in `map2()`:
ℹ️ In index: 28.
ℹ️ With name: RCR7.
Caused by error in `scan()`:
! line 61 did not have 535 elements

Error Details

The problem occurs due to missing rows in the RCR7_Q202403_G20240508.TXT file.
As described in Scenario 3, the RCR7 file expects, for each institution in the data file:

  • a row that contains comma-separated values for variables that belong to the first set of single-occurrence variables
  • a row for each class of the code variable with comma-separated values of multiple-occurrence variables
  • a row that contains comma-separated values of the remaining single-occurrence variables

In particular, there are some institutions that have missing entries for code class 2000 (i.e., some variables do not have a row that corresponds to the "Risk Weight Factor" for that variable).

Our current approach assumes that the RCR7 data published by FCA will have a row for each RegCapCode (for each multiple-occurrence variable) for each institution. In fact, the text "THERE IS ONE OCCURENCE FOR EACH RegCapCode VALUE" appears at the bottom of the D_RCR7.TXT file itself, so the published data contradicts its own metadata.

This missing 2000 code for some variables (for certain institutions) is causing process_data() to fail.
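
To see which rows are affected, the raw file can be scanned for 1900 rows that are not followed by a 2000 row. The following is a minimal sketch, not part of {fcall}: find_missing_2000() is a hypothetical helper, and it assumes each multiple-occurrence row begins with its code followed directly by a comma (e.g. "1900,").

```r
# Hypothetical helper (not part of {fcall}): return the positions of rows that
# start with code 1900 but are not immediately followed by a row for code 2000.
# Assumption: code rows begin with the code and a comma, e.g. "1900,".
find_missing_2000 <- function(lines) {
  starts_1900 <- startsWith(lines, "1900,")
  # Look at the row following each line; the last line has no successor,
  # so it is treated as "not followed by 2000".
  next_is_2000 <- c(startsWith(lines[-1], "2000,"), FALSE)
  which(starts_1900 & !next_is_2000)
}

# Toy input for illustration only (not real FCA data)
toy_lines <- c("1800,1,2", "1900,3,4", "2000,5,6",
               "1800,7,8", "1900,9,10", "999,a,b")
missing_after <- find_missing_2000(toy_lines)

# Against the real file, something like:
# find_missing_2000(readLines("data-raw/2024-03/RCR7_Q202403_G20240508.TXT"))
```

In the toy input, only the second 1900 row (position 5) lacks a following 2000 row, so that is the only position reported.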

Possible Workarounds

There are several ways to work around this error:

  1. Avoid processing the RCR7 file by removing D_RCR7.TXT and RCR7_Q202403_G20240508.TXT from the directory the data was downloaded into (i.e., the dir argument of process_data()).
  2. Use process_metadata_file() and process_data_file() to process only the non-RCR7 files you are interested in.
    For example, the code below processes only the RCB data:
RCB_metadata <- fcall::process_metadata_file(file = "data-raw/2024-03/D_RCB.TXT")
RCB_data <- fcall::process_data_file(
  file = "data-raw/2024-03/RCB_Q202403_G20240508.TXT",
  metadata = RCB_metadata,
  dict = RCB__INV_CODE
)

Remember that available dicts are stored as internal {fcall} datasets.

  3. Manually add the missing lines to RCR7_Q202403_G20240508.TXT (this assumes all values for this code are zero).
    You can add 2000,,,,,,,,,,,,,,,,,, below each row that starts with 1900 and is not followed by a row that starts with 2000.
  4. Replace the RCR7_Q202403_G20240508.TXT file in the directory the data was downloaded into (i.e., the dir argument of process_data()) with the attached file below, which applies the changes described in #3 above.

RCR7_Q202403_G20240508.TXT
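
If you would rather script the manual edit than apply it by hand, something along these lines might work. This is a sketch under assumptions, not part of {fcall}: add_missing_2000() is a hypothetical helper, it assumes code rows begin with the code followed by a comma, and the filler row should be copied from the workaround text rather than taken from this example.

```r
# Hypothetical helper (not part of {fcall}): insert a filler row for code 2000
# after every row that starts with 1900 and is not followed by a 2000 row.
# Assumption: code rows begin with the code and a comma, e.g. "1900,".
add_missing_2000 <- function(lines, filler) {
  starts_1900 <- startsWith(lines, "1900,")
  next_is_2000 <- c(startsWith(lines[-1], "2000,"), FALSE)
  gap_positions <- which(starts_1900 & !next_is_2000)
  # Append one filler per gap, then sort so that each filler lands directly
  # after its 1900 row (position + 0.5 sorts between neighbouring lines).
  combined <- c(lines, rep(filler, length(gap_positions)))
  sort_keys <- c(seq_along(lines), gap_positions + 0.5)
  combined[order(sort_keys)]
}

# Toy input for illustration only (not real FCA data), with a short filler
toy_lines <- c("1900,1", "2000,2", "1900,3", "999,4")
patched <- add_missing_2000(toy_lines, filler = "2000,")

# Against the real file (back up the original first), something like:
# path <- "data-raw/2024-03/RCR7_Q202403_G20240508.TXT"
# fixed <- add_missing_2000(readLines(path), filler = "2000,,,,,,,,,,,,,,,,,,")
# writeLines(fixed, path)
```

As with the manual edit, this treats every missing 2000 value as empty, so the result should be checked against the attached file before relying on it.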

Metadata

    Labels

    wontfix (This will not be worked on)
