Conversation
I was wondering whether it's wise to stop outputting the single columns as CSV, because that has implications for escaping (you could write it as single-column-but-escaped CSV), but I could not come up with a scenario where that would ever become a practical problem.
With this job I wanted to dump the columns as a non-CSV file on purpose, because I want to interpret the columns without the I'd say this is not an issue because if we had a setup which relies on |
Actually, I just found a case: if the per-cell CSV values contain newlines, they will lose their escaping and break the scheme. So perhaps escape those?
Excellent point! I initially thought that the escaping would only be applied to the delimiter, but it seems that other types of backslashes mess up our final columns: A quick search on the Internets tells me that our fellow coders seem to agree on
If a user dumps a CSV file with e.g. `i6_core.text.processing.WriteToCsvFileJob`, there's a chance that the delimiter appears in one of the fields. The `csv` module escapes these by adding quotes around the offending field. As a consequence, parsing the resulting CSV file with basic parsers such as `awk -F ...` yields unpredictable results.

I propose a job to reliably obtain a given (set of) column(s) from a CSV file by using `csv.reader`, which properly unescapes the columns.
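
For illustration, here is a minimal sketch of the idea, not the job's actual implementation. The helper names `extract_columns` and `escape_newlines` are hypothetical, and the backslash-based newline escaping is only one possible answer to the newline concern raised in the review; the thread does not state which scheme was finally chosen.

```python
import csv
import io
from typing import Iterable, Iterator, List


def extract_columns(
    lines: Iterable[str], columns: List[int], delimiter: str = ","
) -> Iterator[List[str]]:
    """Yield the requested columns of each row, unescaped by csv.reader."""
    for row in csv.reader(lines, delimiter=delimiter):
        yield [row[i] for i in columns]


def escape_newlines(value: str) -> str:
    """Hypothetical escaping: backslashes first, then line breaks,
    so that every value fits on exactly one output line."""
    return value.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


# One row: the first field contains the delimiter, the second a newline.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(["a,b", "line one\nline two", "c"])
buffer.seek(0)

# csv.reader undoes the quoting, so both fields come back intact.
rows = list(extract_columns(buffer, columns=[0, 1]))

# Writing the raw values one per line would split the second field in two;
# escaping keeps the one-value-per-line layout.
plain_lines = [escape_newlines(cell) for cell in rows[0]]
```

Splitting the same input on `,` (as `awk -F ,` would) breaks `a,b` into two fields, which is the failure mode described above.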