Scopus exceeds csv field limit #92

@r-wrobel

Description

Hi, I encountered a bug that is triggered by (very) long lines in the CSV files.
It seems that the csv module being used has a limit of 131072 characters per field:

Cell In[33], line 2
----> 2 docs_scopus=litstudy.load_scopus_csv("scopus.csv")

File \site-packages\litstudy\sources\scopus_csv.py:116, in load_scopus_csv(path)
    114 with robust_open(path) as f:
    115     lines = csv.DictReader(f)
--> 116     docs = [ScopusCsvDocument(line) for line in lines]
    117     return DocumentSet(docs)

File \Lib\csv.py:116, in DictReader.__next__(self)
    113 if self.line_num == 0:
    114     # Used only for its side effect.
    115     self.fieldnames
--> 116 row = next(self.reader)
    117 self.line_num = self.reader.line_num
    119 # unlike the basic reader, we prefer not to return blanks,
    120 # because we will typically wind up with a dict full of None
    121 # values

Error: field larger than field limit (131072)

You can use the DOI 10.1016/C2013-0-19213-6 for testing; the corresponding line in the complete CSV export from Scopus has 182667 characters.
I assume a solution is presented at https://stackoverflow.com/a/15063941
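
For reference, the workaround from that StackOverflow answer is to raise csv.field_size_limit before parsing. A minimal sketch of how it could be applied as a user-side workaround (the file name scopus.csv is just the example from the traceback above):

    import csv
    import sys

    import litstudy

    # Raise the csv module's per-field size limit as far as the platform allows.
    # On some platforms sys.maxsize overflows the C long used internally,
    # so back off until a value is accepted.
    max_int = sys.maxsize
    while True:
        try:
            csv.field_size_limit(max_int)
            break
        except OverflowError:
            max_int = int(max_int / 10)

    # With the limit raised, the long Scopus export line parses without the error.
    docs_scopus = litstudy.load_scopus_csv("scopus.csv")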
