Unable to dbInsert a .txt file #9

@pakom

Dear Roger,

I am trying to dbInsert a large .txt file as a data frame using the read_fwf function from the readr package. The file comes from OECD's PISA 2012 and its size is 1.1GB. It contains the responses to the student questionnaire. I work on a laptop with 4GB of RAM under Arch Linux (64-bit) and have about 250GB of free space on the hard drive. The size of the swap partition is 2GB. Here is the code that I use:

library(filehash)  # dbCreate, dbInit, dbInsert, dbList
library(readr)     # read_fwf, fwf_positions

setwd("/media/work")

dbCreate("tmpDB")

DB <- dbInit("tmpDB")

dbInsert(DB, "x", data.frame(read_fwf(
    file = "/media/PISA_2012/INT_STU12_DEC03.txt",
    fwf_positions(start = ranges.start, end = ranges.end,
                  col_names = var.names),
    progress = FALSE)))

ranges.start, ranges.end and var.names are taken from the .sps file provided with the .txt data file.

The tmpDB file is created and DB is initialized in the R session. dbInsert runs without any error or warning message, but after it finishes the tmpDB file is still 0 B in size, dbList(DB) returns character(0), and the key x does not appear to exist.
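The checks described above can be run in one place on a small stand-in object, to confirm what a successful insert looks like before repeating them on the full data. This is only a sketch: the database name `checkDB` and the data frame `small` are illustrative and not part of the original code, and it assumes the default filehash database type, which stores everything in a single file.

```r
library(filehash)

# Throwaway database in R's temporary directory (illustrative path).
db_path <- file.path(tempdir(), "checkDB")
dbCreate(db_path)
db <- dbInit(db_path)

# Small stand-in for the real data frame.
small <- data.frame(id = 1:5, value = letters[1:5])

# Run the insert inside tryCatch so that the first warning or error
# is captured as text instead of passing unnoticed.
result <- tryCatch(
  {
    dbInsert(db, "x", small)
    "ok"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e))
)

print(result)
print(dbExists(db, "x"))    # is the key present?
print(dbList(db))           # all keys in the database
print(file.size(db_path))   # on-disk size in bytes
```

If the same checks report "ok" but no key and a 0-byte file for the large object, the failure is happening silently somewhere between reading the file and writing the database, which narrows down where to look.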

I tried smaller files from the same and previous cycles, and files of about 500MB work. I also tried taking just 200 lines from the problematic file, and that works too. I thought the cause might be the size limit of my /tmp folder, which is the system's temporary folder and is limited to 1.8GB. I then installed the unixtools package and used the following to change R's temporary folder and check that the change took effect:

> set.tempdir("/media/temp")
> tempdir()
[1] "/media/temp"
> tempfile()
[1] "/media/temp/file8fc7d43a8d6"

I ran the dbInsert code above again. However, the result is the same: tmpDB is still 0B and the x key does not exist.
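Since the temporary folder does not seem to be the limit, one rough way to gauge whether memory is involved is to extrapolate from a small sample such as the 200-line subset. The sketch below uses only base R; `estimate_size` is a hypothetical helper, the sample data frame is synthetic, and the row count of 480000 is a placeholder rather than the real number of records in the file. Linear scaling is only an approximation, because fixed per-object overhead and repeated strings do not grow proportionally.

```r
# Hypothetical helper: scale the in-memory and serialized size of a
# sample data frame up to the expected number of rows.
estimate_size <- function(sample_df, total_rows) {
  stopifnot(is.data.frame(sample_df), nrow(sample_df) > 0)
  scale <- total_rows / nrow(sample_df)
  c(
    in_memory_bytes  = as.numeric(object.size(sample_df)) * scale,
    serialized_bytes = length(serialize(sample_df, NULL)) * scale
  )
}

# Synthetic stand-in for the 200-line sample; the real one would be the
# data frame read from the first 200 lines of the file.
sample_df <- data.frame(id = 1:200, score = rnorm(200))

# Placeholder row count, not the actual size of the PISA file.
est <- estimate_size(sample_df, total_rows = 480000)
print(round(est / 1024^2, 1))  # estimated sizes in MiB
```

If the estimate comes close to the 4GB of RAM plus 2GB of swap, holding the data frame and its serialized copy at the same time may not fit, which would be worth ruling out.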

What could be the reason for this behavior?

Regards
