-
Notifications
You must be signed in to change notification settings - Fork 196
Description
I read a few larger Excel files (several hundreds of MB's) and noticed that the RAM usage almost instantly jumped to 100%. This is unexpected since that machine has 32GB of RAM and only ~ 5GB were used before calling read_excel().
Thus, I reproduced the problem on a smaller scale. The example xlsx file is attached in the zip file. When I read that file with r readxl::read_excel("readxl_test.xlsx") I see in the Windows task manager that RAM usage jumps about 2GB higher during the process. However, profiling the call shows that only ~ 240 MB were used. For small files, this is no problem, for large ones, it is.
Here is the reprex including the sessionInfo() ...
set.seed(20190130)
# generate example data
df <- data.frame(A = rnorm(1000000),
B = rnorm(1000000),
C = rnorm(1000000),
D = rnorm(1000000))
# write excel file
openxlsx::write.xlsx(df, "readxl_test.xlsx")
# read excel file
readxl::read_excel("readxl_test.xlsx")
#> # A tibble: 1,000,000 x 4
#> A B C D
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.141 -1.85 0.410 -1.38
#> 2 1.45 0.397 -1.40 -0.147
#> 3 -0.456 0.164 -0.744 -0.0907
#> 4 2.02 0.492 0.157 1.05
#> 5 2.14 -1.87 -0.946 1.80
#> 6 3.58 0.502 1.05 1.50
#> 7 1.34 0.509 -0.627 1.00
#> 8 -1.82 1.65 1.03 -0.685
#> 9 0.810 0.0659 0.156 -1.91
#> 10 0.711 0.393 0.581 -0.543
#> # ... with 999,990 more rows
# clean up
file.remove("readxl_test.xlsx")
#> [1] TRUECreated on 2019-01-30 by the reprex package (v0.2.0).
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> os Windows 7 x64 SP 1
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2019-01-30
#>
#> - Packages --------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.1)
#> backports 1.1.2 2017-12-13 [1] CRAN (R 3.5.0)
#> callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.5.1)
#> cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.1)
#> devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
#> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
#> evaluate 0.12 2018-10-09 [1] CRAN (R 3.5.1)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.1)
#> fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.1)
#> glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.1)
#> highr 0.7 2018-06-09 [1] CRAN (R 3.5.1)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1)
#> knitr 1.21 2018-12-10 [1] CRAN (R 3.5.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1)
#> openxlsx 4.1.0 2018-05-26 [1] CRAN (R 3.5.1)
#> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)
#> pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.1)
#> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)
#> processx 3.2.1 2018-12-05 [1] CRAN (R 3.5.2)
#> ps 1.2.1 2018-11-06 [1] CRAN (R 3.5.1)
#> R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.1)
#> Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.2)
#> readxl 1.1.0 2018-04-20 [1] CRAN (R 3.5.1)
#> remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.1)
#> rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.2)
#> rmarkdown 1.11 2018-12-08 [1] CRAN (R 3.5.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
#> stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.1)
#> stringr 1.3.1 2018-05-10 [1] CRAN (R 3.5.1)
#> testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.1)
#> tibble 2.0.1 2019-01-12 [1] CRAN (R 3.5.2)
#> usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.1)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)
#> xfun 0.4 2018-10-23 [1] CRAN (R 3.5.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
#> zip 1.0.0 2017-04-25 [1] CRAN (R 3.5.1)
#>
#> [1] C:/Program Files/R/library... here is the result from profvis ...
... and here a screenshot showing the RAM usage jump from 4.5 GB RAM usage to 6.6 GB:
