Skip to content

NAs causing failures in training combination models #411

@kenahoo

Description

@kenahoo

I have a combination model that's running into some problems with NAs.

The data looks like this (see end of message for actual dput format to recreate the exact data set):

Browse[2]> summary(t1)
 interval_time                     dalmp           target       
 Min.   :2023-08-29 23:00:00   Min.   :21.89   Min.   :-1.8695  
 1st Qu.:2023-08-30 06:15:00   1st Qu.:24.90   1st Qu.:-0.1650  
 Median :2023-08-30 13:30:00   Median :34.82   Median : 0.1793  
 Mean   :2023-08-30 13:30:00   Mean   :40.45   Mean   : 0.2144  
 3rd Qu.:2023-08-30 20:45:00   3rd Qu.:53.67   3rd Qu.: 0.7105  
 Max.   :2023-08-31 04:00:00   Max.   :81.69   Max.   : 1.5769  
                                               NA's   :2        
Browse[2]> t1
# A tsibble: 30 x 3 [1h] <UTC>
   interval_time       dalmp target
   <dttm>              <dbl>  <dbl>
 1 2023-08-29 23:00:00  64.8  1.58 
 2 2023-08-30 00:00:00  55.0  1.57 
 3 2023-08-30 01:00:00  49.6 NA    
 4 2023-08-30 02:00:00  37.0 NA    
 5 2023-08-30 03:00:00  34.1  0.217
 6 2023-08-30 04:00:00  27.9  0.111
 7 2023-08-30 05:00:00  24.8 -0.132
 8 2023-08-30 06:00:00  24.6 -0.121
 9 2023-08-30 07:00:00  22.8 -0.195
10 2023-08-30 08:00:00  22.1 -0.328
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows

And the training goes like this:

rt_fit <- fabletools::model(training,
                              cmbn1 = combination_model(
                                ARIMA(target ~  dalmp),
                                NNETAR(target ~  dalmp),
                                AR(target ~ order(2)),
                                cmbn_args = list(weights = "inv_var")),
                              .safely = FALSE)

# Error in qr.default(XX) : NA/NaN/Inf in foreign function call (arg 1)

I thought I might be able to work around this by deleting the rows with NAs, but it doesn't seem happy with that either:

Error in check_gaps(.data) : 
  .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.

Is there a problem in combination_model that's causing this? Any suggested workaround? Any help greatly appreciated!

I'm using:

> R.version
               _                           
platform       aarch64-apple-darwin20      
arch           aarch64                     
os             darwin20                    
system         aarch64, darwin20           
status                                     
major          4                           
minor          2.2                         
year           2022                        
month          10                          
day            31                          
svn rev        83211                       
language       R                           
version.string R version 4.2.2 (2022-10-31)
nickname       Innocent and Trusting      

> packageVersion('fabletools')
[1] ‘0.3.3’
> packageVersion('fable')
[1] ‘0.3.3’

Here's the input data for reproduction purposes:

training <- 
structure(list(interval_time = structure(c(1693350000, 1693353600, 
1693357200, 1693360800, 1693364400, 1693368000, 1693371600, 1693375200, 
1693378800, 1693382400, 1693386000, 1693389600, 1693393200, 1693396800, 
1693400400, 1693404000, 1693407600, 1693411200, 1693414800, 1693418400, 
1693422000, 1693425600, 1693429200, 1693432800, 1693436400, 1693440000, 
1693443600, 1693447200, 1693450800, 1693454400), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), dalmp = c(64.772, 55.031, 49.5954, 37.0346, 34.122, 
27.9416, 24.8358, 24.6128, 22.81, 22.1077, 21.8885, 21.9106, 
22.9985, 25.0922, 25.7776, 30.3662, 32.1598, 35.5114, 39.2446, 
44.2576, 59.0444, 65.3967, 76.7988, 81.6873, 73.1935, 64.3783, 
45.2286, 36.5493, 26.4268, 22.6344), target = c(1.57691363525177, 
1.56855930002442, NA, NA, 0.216786192700598, 0.110616995970005, 
-0.131849633308032, -0.12072764832231, -0.195075151428118, -0.327716438359099, 
-0.403658249907274, -0.361481112249875, -0.254285754950902, -0.154945697500086, 
-0.142376837846178, 0.141843887915466, 0.376564223792779, 0.327934476018599, 
0.875235233052582, 0.550416417555717, 0.682786557929835, 0.482133615075935, 
0.952303345137791, 1.16381971509555, 0.793706899407923, 0.854539241873788, 
0.239481464973916, -0.0814464383521666, -1.86950100289322, -0.867599159975814
)), class = c("tbl_ts", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L), key = structure(list(.rows = structure(list(1:30), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L)), index = structure("interval_time", ordered = TRUE), index2 = "interval_time", interval = structure(list(
    year = 0, quarter = 0, month = 0, week = 0, day = 0, hour = 1, 
    minute = 0, second = 0, millisecond = 0, microsecond = 0, 
    nanosecond = 0, unit = 0), .regular = TRUE, class = c("interval", 
"vctrs_rcrd", "vctrs_vctr")))

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions