Fix in strdb.cc file for handling large file reads #18
ajalagam wants to merge 1 commit into percyliang:master
Conversation
percyliang left a comment
This change doesn't preserve the previous functionality, which ignores all whitespace (multiple spaces, tabs, newlines), whereas the proposed change only breaks on single spaces. Would be good not to change this.
Can you explain why this code works when the previous one doesn't? I don't know how ifstream is implemented, but the memory usage for reading in a string should be constant...
```diff
  char s[16384];
  char buf[16384]; int buf_i = 0; // Output buffer
- while(in >> s) { // Read a string
+ for(std::string line; getline( in, line, ' ');) {
```
To preserve the "breaking on whitespace" functionality, the code can be modified as below:
`for(std::string line; getline(std::ws(in), line, ' ');)`
I think the reason the previous code didn't work has something to do with the internal memory allocation and buffer handling of the input stream, and with copying into a char array of size 16384. The recommended way to read is into a std::string rather than a char array. The stack trace also seems to indicate that an internal memcpy operation failed.
An understanding beyond this may require me to dig deeper into the workings of ifstream, the limitations of reading via operator>> versus getline, and possibly libc library operations.
strdb.cc terminates with a segmentation fault when run on large data files of, say, 5 GB in size. This commit has a fix for this issue.