Skip to content

Potential memory leak with read_excel() on Windows 7? #545

@matthiasgomolka

Description

@matthiasgomolka

I read a few larger Excel files (several hundreds of MB's) and noticed that the RAM usage almost instantly jumped to 100%. This is unexpected since that machine has 32GB of RAM and only ~ 5GB were used before calling read_excel().

Thus, I reproduced the problem on a smaller scale. The example xlsx file is attached in the zip file. When I read that file with r readxl::read_excel("readxl_test.xlsx") I see in the Windows task manager that RAM usage jumps about 2GB higher during the process. However, profiling the call shows that only ~ 240 MB were used. For small files, this is no problem, for large ones, it is.

Here is the reprex including the sessionInfo() ...

set.seed(20190130)

# generate example data
df <- data.frame(A = rnorm(1000000),
                 B = rnorm(1000000),
                 C = rnorm(1000000),
                 D = rnorm(1000000))

# write excel file
openxlsx::write.xlsx(df, "readxl_test.xlsx")

# read excel file
readxl::read_excel("readxl_test.xlsx")
#> # A tibble: 1,000,000 x 4
#>         A       B      C       D
#>     <dbl>   <dbl>  <dbl>   <dbl>
#>  1 -0.141 -1.85    0.410 -1.38  
#>  2  1.45   0.397  -1.40  -0.147 
#>  3 -0.456  0.164  -0.744 -0.0907
#>  4  2.02   0.492   0.157  1.05  
#>  5  2.14  -1.87   -0.946  1.80  
#>  6  3.58   0.502   1.05   1.50  
#>  7  1.34   0.509  -0.627  1.00  
#>  8 -1.82   1.65    1.03  -0.685 
#>  9  0.810  0.0659  0.156 -1.91  
#> 10  0.711  0.393   0.581 -0.543 
#> # ... with 999,990 more rows

# clean up
file.remove("readxl_test.xlsx")
#> [1] TRUE

Created on 2019-01-30 by the reprex package (v0.2.0).

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.0 (2018-04-23)
#>  os       Windows 7 x64 SP 1          
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-01-30                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)
#>  backports     1.1.2   2017-12-13 [1] CRAN (R 3.5.0)
#>  callr         3.1.1   2018-12-21 [1] CRAN (R 3.5.2)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.5.1)
#>  cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)
#>  devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)
#>  digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
#>  evaluate      0.12    2018-10-09 [1] CRAN (R 3.5.1)
#>  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.5.1)
#>  fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)
#>  glue          1.3.0   2018-07-17 [1] CRAN (R 3.5.1)
#>  highr         0.7     2018-06-09 [1] CRAN (R 3.5.1)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)
#>  knitr         1.21    2018-12-10 [1] CRAN (R 3.5.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)
#>  openxlsx      4.1.0   2018-05-26 [1] CRAN (R 3.5.1)
#>  pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.2)
#>  pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)
#>  pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)
#>  processx      3.2.1   2018-12-05 [1] CRAN (R 3.5.2)
#>  ps            1.2.1   2018-11-06 [1] CRAN (R 3.5.1)
#>  R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.1)
#>  Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.2)
#>  readxl        1.1.0   2018-04-20 [1] CRAN (R 3.5.1)
#>  remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)
#>  rlang         0.3.1   2019-01-08 [1] CRAN (R 3.5.2)
#>  rmarkdown     1.11    2018-12-08 [1] CRAN (R 3.5.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)
#>  stringi       1.2.4   2018-07-20 [1] CRAN (R 3.5.1)
#>  stringr       1.3.1   2018-05-10 [1] CRAN (R 3.5.1)
#>  testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)
#>  tibble        2.0.1   2019-01-12 [1] CRAN (R 3.5.2)
#>  usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.5.1)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
#>  xfun          0.4     2018-10-23 [1] CRAN (R 3.5.1)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)
#>  zip           1.0.0   2017-04-25 [1] CRAN (R 3.5.1)
#> 
#> [1] C:/Program Files/R/library

... here is the result from profvis ...

readxl_test.Rprofvis.zip

... and here a screenshot showing the RAM usage jump from 4.5 GB RAM usage to 6.6 GB:

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugan unexpected problem or unintended behaviorxlsx 📎

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions