Description
An error is thrown when using the process_data() function against a directory containing the March 2024 downloaded data.
Reproducible Example
library(fcall)
# Download March 2024 data
download_data(year = 2024, month = 3, dest = "data-raw/2024-03")
# Process data
processed_data <- process_data("data-raw/2024-03")
returns the error:
Error in `map2()`:
ℹ️ In index: 28.
ℹ️ With name: RCR7.
Caused by error in `scan()`:
! line 61 did not have 535 elements
Error Details
The problem occurs due to missing rows in the RCR7_Q202403_G20240508.TXT file.
As described in Scenario 3, the RCR7 file expects, for each institution in the data file:
- a row that contains comma-separated values for variables that belong to the first set of single-occurrence variables
- a row for each class of the code variable with comma-separated values of multiple-occurrence variables
- a row that contains comma-separated values of the remaining single-occurrence variables
In particular, there are some institutions that have missing entries for code class 2000 (i.e., some variables do not have a row that corresponds to the "Risk Weight Factor" for that variable).
Our current approach assumes that the RCR7 data published by FCA will have a row for each RegCapCode (for each multiple-occurrence variable) for each institution. In fact, the text "THERE IS ONE OCCURENCE FOR EACH RegCapCode VALUE" appears at the bottom of the D_RCR7.TXT file itself.
This missing 2000 code for some variables (for certain institutions) is causing process_data() to fail.
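One way to locate the affected rows is sketched below in base R. The helper name find_missing_2000() is hypothetical (it is not part of {fcall}), and the sketch assumes the code is the first comma-separated field on each row, so that a missing entry shows up as a row starting with 1900 that is not immediately followed by a row starting with 2000. The toy input is made up for illustration and is not real FCA data.

```r
# Hypothetical helper (not part of {fcall}): return the line numbers of rows
# that start with "1900," but are not immediately followed by a "2000," row.
# Assumes the code is the first comma-separated field on each row.
find_missing_2000 <- function(lines) {
  is_1900 <- startsWith(lines, "1900,")
  # Line that follows each row; the last row has no successor
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  which(is_1900 & !next_is_2000)
}

# Toy input (made up): the second 1900 row has no 2000 row after it
toy_lines <- c("1800,1,2", "1900,3,4", "2000,5,6",
               "1800,7,8", "1900,9,10", "1800,1,1")
missing_at <- find_missing_2000(toy_lines)

# Against the real file this would look something like:
# lines <- readLines("data-raw/2024-03/RCR7_Q202403_G20240508.TXT")
# find_missing_2000(lines)
```

The returned positions are line numbers in the raw text file, so they can be checked directly in a text editor.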
Possible Workarounds
There are several options for troubleshooting this error:
- Avoid processing the RCR7 file by removing D_RCR7.TXT and RCR7_Q202403_G20240508.TXT from the directory into which the data was downloaded (i.e., the dir argument of process_data()).
- Leverage process_metadata_file() and process_data_file() to process the non-RCR7 files you are interested in.
For example, the code below shows how to process only the RCB data:
RCB_metadata <- fcall::process_metadata_file(file = "data-raw/2024-03/D_RCB.TXT")
RCB_data <- fcall::process_data_file(
file = "data-raw/2024-03/RCB_Q202403_G20240508.TXT",
metadata = RCB_metadata,
dict = RCB__INV_CODE
)
Remember that available dicts are stored as internal {fcall} datasets.
- Manually add the missing lines to RCR7_Q202403_G20240508.TXT (this assumes all values for this code are zero).
You can add 2000,,,,,,,,,,,,,,,,,, below each instance of a row that starts with 1900 that is not followed by a row that starts with 2000.
- Replace the RCR7_Q202403_G20240508.TXT file in the directory into which the data was downloaded (i.e., the dir argument of process_data()) with the attached file below, which applies the changes described in the previous option.
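The manual edit described above could also be scripted. The following is a minimal base R sketch, not a function provided by {fcall}: add_missing_2000() is a hypothetical name, the filler row is the one quoted in the workaround, and, as with the manual fix, it assumes every value for the missing 2000 code is zero (blank). The toy input uses a shortened filler for readability.

```r
# Hypothetical helper (not part of {fcall}): insert a filler "2000" row after
# every row that starts with "1900," and is not already followed by a "2000," row.
add_missing_2000 <- function(lines, filler = "2000,,,,,,,,,,,,,,,,,,") {
  is_1900 <- startsWith(lines, "1900,")
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  pos <- which(is_1900 & !next_is_2000)

  # The k-th filler row lands k positions after its original line number
  out <- character(length(lines) + length(pos))
  is_filler <- logical(length(out))
  is_filler[pos + seq_along(pos)] <- TRUE
  out[is_filler] <- filler
  out[!is_filler] <- lines
  out
}

# Toy input (made up), with a short filler so the result is easy to read
toy_rows <- c("1800,1,2", "1900,3,4", "2000,5,6",
              "1800,7,8", "1900,9,10", "1800,1,1")
patched <- add_missing_2000(toy_rows, filler = "2000,,")

# Against the real file this would look something like (keep a backup first):
# path <- "data-raw/2024-03/RCR7_Q202403_G20240508.TXT"
# writeLines(add_missing_2000(readLines(path)), path)
```

Running the helper a second time leaves the file unchanged, because every 1900 row is then followed by a 2000 row.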