Open
Labels: bug
Description
Hi, I encountered a bug that is triggered by very long lines in CSV files.
It seems that the `csv` module used here limits each field to 131072 characters by default:
Cell In[33], line 2
----> 2 docs_scopus=litstudy.load_scopus_csv("scopus.csv")
File \site-packages\litstudy\sources\scopus_csv.py:116, in load_scopus_csv(path)
114 with robust_open(path) as f:
115 lines = csv.DictReader(f)
--> 116 docs = [ScopusCsvDocument(line) for line in lines]
117 return DocumentSet(docs)
File \Lib\csv.py:116, in DictReader.__next__(self)
113 if self.line_num == 0:
114 # Used only for its side effect.
115 self.fieldnames
--> 116 row = next(self.reader)
117 self.line_num = self.reader.line_num
119 # unlike the basic reader, we prefer not to return blanks,
120 # because we will typically wind up with a dict full of None
121 # values
Error: field larger than field limit (131072)
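As a minimal sketch using only the standard library, the same error can be reproduced without any Scopus data. The snippet builds an in-memory CSV whose single field is longer than the limit and parses it with `csv.DictReader`, which is what `load_scopus_csv` uses according to the traceback. The helper name `parse_error_for_field_length` is hypothetical.

```python
import csv
import io
from typing import Optional

# csv.field_size_limit() with no argument returns the current limit;
# in a fresh interpreter this is the default, 131072 (128 * 1024).
print("current limit:", csv.field_size_limit())


def parse_error_for_field_length(field_length: int) -> Optional[str]:
    """Parse a CSV with a header and one row whose only field has
    `field_length` characters; return the csv.Error message, or None."""
    text = "Abstract\n" + "x" * field_length + "\n"
    reader = csv.DictReader(io.StringIO(text))
    try:
        list(reader)  # force every row to be parsed
    except csv.Error as exc:
        return str(exc)
    return None


print(parse_error_for_field_length(1_000))    # well within the limit
print(parse_error_for_field_length(200_000))  # longer than the default limit
```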
You can use the DOI 10.1016/C2013-0-19213-6 for testing: its line in the complete CSV export from Scopus has 182667 characters.
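To check which records in an export would hit the limit, a hypothetical diagnostic can parse the file with a temporarily raised limit and report every field longer than 131072 characters. This is a sketch under assumptions: the names `temporary_field_size_limit` and `oversized_fields` are made up for illustration, and record numbers count data rows after the header, not physical lines, since a quoted field may contain newlines.

```python
import csv
import io
from contextlib import contextmanager
from typing import Iterator, List, TextIO, Tuple

DEFAULT_CSV_FIELD_LIMIT = 131072  # csv module default (128 * 1024)


@contextmanager
def temporary_field_size_limit(limit: int) -> Iterator[None]:
    """Temporarily set csv's module-wide field limit, then restore it."""
    previous = csv.field_size_limit(limit)  # setting returns the old limit
    try:
        yield
    finally:
        csv.field_size_limit(previous)


def oversized_fields(
    stream: TextIO, threshold: int = DEFAULT_CSV_FIELD_LIMIT
) -> List[Tuple[int, str, int]]:
    """Return (record_number, column, length) for each field longer than threshold."""
    found: List[Tuple[int, str, int]] = []
    # 2**31 - 1 is the largest value guaranteed to fit in a C long.
    with temporary_field_size_limit(2**31 - 1):
        for record_number, row in enumerate(csv.DictReader(stream), start=1):
            for column, value in row.items():
                # Skip restkey lists and restval None from malformed rows.
                if isinstance(value, str) and len(value) > threshold:
                    found.append((record_number, column, len(value)))
    return found
```

Passing a file opened with `open("scopus.csv", newline="")` should list the oversized fields of an export like the one in this report; the previous limit is restored when the scan finishes.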
I assume a solution is presented at https://stackoverflow.com/a/15063941.
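The usual workaround for this error, and to my understanding the approach in the linked answer, is to raise the limit with `csv.field_size_limit` before parsing. A minimal sketch, assuming it is acceptable to change the limit for the whole process: `sys.maxsize` is tried first and divided by 10 on `OverflowError`, because the limit is stored in a C `long`, which is only 32 bits on Windows. The function name `raise_csv_field_size_limit` is hypothetical.

```python
import csv
import io
import sys


def raise_csv_field_size_limit() -> int:
    """Set csv's module-wide field limit as high as the platform accepts.

    The limit is stored in a C long, so sys.maxsize can overflow where
    long is 32 bits (e.g. Windows); shrink the candidate until it fits.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


new_limit = raise_csv_field_size_limit()
print("csv field limit is now", new_limit)

# A field longer than the old 131072-character default now parses.
text = "Abstract\n" + "x" * 200_000 + "\n"
rows = list(csv.DictReader(io.StringIO(text)))
print(len(rows[0]["Abstract"]))
```

Because the limit is module-wide, calling this once before `litstudy.load_scopus_csv("scopus.csv")` should also apply to the `csv.DictReader` inside `load_scopus_csv`; a fix in litstudy itself could raise the limit the same way before creating the reader.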