Prework
- Read and abide by drake's code of conduct.
- Search for duplicates among the existing issues, both open and closed.
- Advanced users: verify that the bug still persists in the current development version (i.e. remotes::install_github("ropensci/drake")) and mention the SHA-1 hash of the Git commit you install.
Description
Processing a big file fails with 'cannot allocate vector of size X' or 'cannot allocate buffer', but ONLY inside drake. That is, I can read and process the same file outside of drake without any error.
Reproducible example
I had some trouble making this a reprex because I'm very unfamiliar with drake. In fact, this is my first project, but since I found this weird error I thought it would be useful to report it here. Instead of a reprex, I have a minimal working repository on GitHub that contains the workflow. I explain the steps below.
- Clone the repo:
git clone https://github.com/cimentadaj/spain_census.git
- Run renv (next iteration of packrat) for package management:
devtools::install_github("rstudio/renv")
renv::restore() # should only take 1-2 mins
- Load drake and run r_make():
library(drake)
r_make()
# This will take a few mins because it downloads the data, which is about 4M rows
There are four files (the same as in drake's documentation):
- code/01-packages.R loads packages
- code/02-reading_data.R has one function which downloads, reads and saves the data in output/
- code/plan.R outlines the plan
- _drake.R
If I run r_make() (which I use because my workflow is very interactive), everything runs OK (although it takes some time because everything is very heavy) until it reaches the plan in code/plan.R. That is, line 13 reads the heavy data, but when the plan executes the target process_data (which only selects a few columns), drake crashes with memory-related problems. The specific error is 'Error : cannot allocate vector of size 7.9GB' or 'cannot allocate buffer'.
However, if I run all the scripts inside the folder code/, manually run everything up to line 13 in code/plan.R, and then just do select(read_data, CPRO), it works. The error only happens inside drake.
Session info
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] drake_7.4.0 workflowr_1.4.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 ps_1.3.0 crayon_1.3.4 assertthat_0.2.1
[5] digest_0.6.19 R6_2.4.0 backports_1.1.4 storr_1.2.1
[9] magrittr_1.5 evaluate_0.14 cli_1.1.0 rlang_0.4.0
[13] renv_0.5.0-66 callr_3.2.0 rmarkdown_1.13 tools_3.6.0
[17] igraph_1.2.4.1 processx_3.3.1 xfun_0.7 compiler_3.6.0
[21] pkgconfig_2.0.2 base64url_1.4 htmltools_0.3.6 knitr_1.23
Expected output
What output would the correct behavior have produced?
No error, and readd(process_data) would then return the correct data frame.