bug: single value is invalidly split into multiple fields on certain unicode characters #137

@neko-kai

Example:

import java.io.ByteArrayInputStream
import java.io.InputStreamReader

import com.github.tototoshi.csv.CSVReader

object App extends App {
  val csv = ",퀙䘘縤ઞ◒䘬掤⢶坪⁓匕ମҀꑤꇮ腋觯\uE5D8\uE564栚ℑ钺剸蕁耥믠鐛挀쐜麂\uE6BF슊䧩奌쒒\u0085䃡썙츚祉≔轾╠扒㱉鞎뽖븢暩䜄蚂\uE0F4\uEF66\uEAC8\uEDEE\uF172秊ӥ붝ヴ恢둊\uEE65\uED46\uF4AC쫎,,,,2018-10-04T20:23:15.639Z,,,,,,,,,,,233299423,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Ⓑ쁖僂\uEF95䳀呇捧動瀼䂲殆䶐훥鼟쿠덠Ớ땄礪\uEEF4ἳ홤篏碽⪎ʞ昉\uF7E2\u0B29걫雘᪆脟\uEE43ᠪ뒤栗\uE487ɦ瀻\uE4AF\uEF5B\uF358ᝬﭧ薪쉶䗹훴殊Ӯ\u0FF2\uD7A9묬鼃\uEFBF䀌럚ᆾ掽呈콒ᶿ蟡䵫䃽ꅡᠹ檸ⰹ\uA4CA뢳ᑤ\uE57D웪ⷹ\uF436槵巸貉ﻥگ쁸㎿顲鱿뽿쒏﹪\uEB34浱ퟲ驊,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-7.147540834511315E-49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2018-10-04T09:54:28.639Z,,,,,,,,,"

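  // Decode with an explicit charset so the reproduction does not depend on the platform default.
  val input = new InputStreamReader(new ByteArrayInputStream(csv.getBytes("UTF-8")), "UTF-8")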

  val res = CSVReader.open(input).all()

  println(res)
}

The output is:

List(List(, 퀙䘘縤ઞ◒䘬掤⢶坪⁓匕ମҀꑤꇮ腋觯栚ℑ钺剸蕁耥믠鐛挀쐜麂슊䧩奌쒒), List(䃡썙츚祉≔轾╠扒㱉鞎뽖븢暩䜄蚂窏㋡秊ӥ붝ヴ恢둊쫎, , , , 2018-10-04T20:23:15.639Z, , , , , , , , , , , 233299423, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Ⓑ쁖僂䳀呇捧動瀼䂲殆䶐훥鼟쿠덠Ớ땄礪ἳ홤篏碽⪎ʞ昉଩걫雘᪆脟ᠪ뒤栗ɦ瀻뮲橾ᝬﭧ薪쉶䗹훴殊Ӯ࿲ᢦ힩묬鼃䀌럚ᆾ掽呈콒ᶿ蟡䵫䃽ꅡᠹ檸ⰹ꓊뢳ᑤ웪ⷹ槵巸貉ﻥگ쁸㎿顲鱿뽿쒏﹪浱ퟲ驊, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , -7.147540834511315E-49, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2018-10-04T09:54:28.639Z, , , , , , , , , ))

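However, the second value should have been the single field 퀙䘘縤ઞ◒䘬掤⢶坪⁓匕ମҀꑤꇮ腋觯栚ℑ钺剸蕁耥믠鐛挀쐜麂슊䧩奌쒒\u0085䃡썙츚祉≔轾╠扒㱉鞎뽖븢暩䜄蚂窏㋡秊ӥ붝ヴ恢둊쫎. Instead, it was truncated at the \u0085 (NEL, "next line") character to 퀙䘘縤ઞ◒䘬掤⢶坪⁓匕ମҀꑤꇮ腋觯栚ℑ钺剸蕁耥믠鐛挀쐜麂슊䧩奌쒒, and the rest of the input was returned as a second row.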
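
Since the output above breaks exactly where the input contains \u0085, a much smaller input may be enough to show the problem. The following is a minimal sketch, not a confirmed diagnosis: the object name `NelRepro` and the reduced input `"a,b\u0085c,d"` are made up for illustration, and only the `CSVReader.open(...).all()` calls are taken from the example above. It reads from a `StringReader`, so byte encoding is not involved.

```scala
import java.io.StringReader

import com.github.tototoshi.csv.CSVReader

object NelRepro {
  // Hypothetical reduced input: one row, three fields, with U+0085 (NEL)
  // embedded in the middle of the second field.
  val csv = "a,b\u0085c,d"

  // Same library calls as the original example, but on a StringReader,
  // so no bytes are encoded or decoded along the way.
  def rows: List[List[String]] = {
    val reader = CSVReader.open(new StringReader(csv))
    try reader.all() finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val parsed = rows
    // One row is what this report expects; more than one row would mean
    // the input was split at U+0085.
    println(s"rows: ${parsed.size}")
    parsed.foreach(println)
  }
}
```

If this also yields two rows, the character encoding of the original input can probably be ruled out as the cause.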
