
Update to incorporate new features in drake #11

Merged
edgararuiz-zz merged 3 commits into sol-eng:master from wlandau:drake-history on Aug 8, 2019


wlandau (Contributor) commented Jun 27, 2019

The development version of drake now tracks the history and provenance of targets over time (CRAN release expected mid-July), and the additions in this PR describe how this works.

drake's history tracking is similar to MLflow tracking, with two differences:

  1. You do not have to tell drake which parameters to track. drake_history() analyzes the commands in the plan automatically and detects named, length-one atomic arguments to function calls.
  2. drake does not automatically create downstream summaries such as performance metrics. Users still need to define dedicated targets for those.
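To make point 1 more concrete, here is a minimal base R sketch of the kind of detection described: walk a command's call tree and collect every named argument whose value is a single atomic constant. The helper find_atomic_args() and the functions train_model() and dense() are hypothetical, made up for illustration; this is not drake's actual implementation. In drake itself you would just call drake_history() after make().

```r
# Hypothetical helper, for illustration only: collect named, length-one
# atomic arguments from every function call inside a command. This is a
# simplified sketch of the idea, not drake's actual implementation.
find_atomic_args <- function(command) {
  found <- list()
  walk <- function(expr) {
    if (!is.call(expr)) return(invisible(NULL))
    args <- as.list(expr)[-1]  # drop the function name
    arg_names <- names(args)   # NULL if no argument is named
    for (i in seq_along(args)) {
      # Record the argument if it is named and a single atomic constant.
      if (!is.null(arg_names) && nzchar(arg_names[i]) &&
          is.atomic(args[[i]]) && length(args[[i]]) == 1L) {
        found[[arg_names[i]]] <<- args[[i]]
      }
      # Recurse into nested calls such as dense(activation = "relu").
      if (is.call(args[[i]])) walk(args[[i]])
    }
    invisible(NULL)
  }
  walk(command)
  found
}

# A made-up plan command; train_model() and dense() are never evaluated.
command <- quote(
  train_model(data, units = 32L, optimizer = "adam", dropout = 0.1,
              layer = dense(activation = "relu"))
)
str(find_atomic_args(command))
```

Here the result holds units, optimizer, dropout, and activation. data is skipped because it is unnamed, and layer is skipped because its value is a call rather than a constant (the sketch recurses into that call instead).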

wlandau changed the title from "Describe new history/provenance features in drake" to "Update to incorporate new features in drake" on Aug 8, 2019
wlandau (Contributor, Author) commented Aug 8, 2019

As of ropensci/drake#977, we no longer need to worry about the performance issue I mentioned in richfitz/storr#77 (comment). With target(format = "keras"), drake can now save and load Keras models directly in HDF5 format instead of serializing them into storr RDS files. The workaround from #9 is now obsolete, so I removed it in b982bf9 when I added target(format = "keras").

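Relatedly, drake also supports fst storage for large data frames via target(format = "fst"). It does not work out of the box for the data in this example, which is an object of class rsplit rather than a data frame, but some refactoring might let us work around this technicality. The reprex below compares the build times of two 1e8-row data frames, one stored with format = "fst" and one stored the default way.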

```r
library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  # Saved in fst format instead of a storr RDS file.
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  # Saved in the default storr cache, for comparison.
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#>   target   elapsed              user                 system    
#>   <chr>    <Duration>           <Duration>           <Duration>
#> 1 data_fst 13.93s               37.562s              7.954s    
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s
```

Created on 2019-08-05 by the reprex package (v0.3.0)
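The 1.6 GB figure in the reprex comment is simple arithmetic: two double-precision columns at 8 bytes per value, times 1e8 rows. A small base R sketch that checks this, scaled down to 1e6 rows so it runs quickly; the names bytes_per_row and small are only illustrative.

```r
# Back-of-the-envelope check of the "1.6 GB per target" comment:
# a double takes 8 bytes, and each row holds two doubles (x and y).
bytes_per_row <- 2 * 8
bytes_per_row * 1e8 / 1e9  # gigabytes for n = 1e8 rows

# Empirical check on a smaller data frame (1e6 rows, so roughly 16 MB).
# object.size() reports the in-memory size in bytes; the small overhead
# comes from the names, row.names, and class attributes.
small <- data.frame(x = runif(1e6), y = runif(1e6))
as.numeric(object.size(small)) / 1e6  # megabytes, close to 16
```

Since the per-row cost is fixed, the 1e6-row measurement scales linearly to the 1e8 rows used in the reprex.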

edgararuiz-zz merged commit f85ef17 into sol-eng:master on Aug 8, 2019