LibIconv.mod - illegal byte sequence with BIG5 encoded sources #95

@GWRon

I am using a libiconv.mod adapted to work with NG (I think I got it from @woollybah at some point; the archive was named "bah.libiconv_refurbished1_x86_x64.zip" and contains various updates to libiconv.h and some source files). It works with e.g. "utf8" input but fails with "illegal byte sequence" on BIG5 encoded files.

Converting on the command line ($ iconv -f BIG5-HKSCS -t utf8 testunicode.big5.txt -o testunicode.iconv.utf8.txt) generates a file that can be loaded in BlitzMax via ReadStream("utf8::file.txt") (or similar).
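To cross-check the sample files independently of both the iconv binary and the module, the same conversion can be sketched in Python. This is only a minimal sketch: the function name `convert_file` is hypothetical, and Python's `big5hkscs` codec is assumed to be close enough to iconv's BIG5-HKSCS table for this purpose (the two may differ for rare characters).

```python
from pathlib import Path


def convert_file(src: str, dst: str, src_encoding: str = "big5hkscs") -> int:
    """Decode src from src_encoding and write it to dst as UTF-8.

    Returns the number of decoded characters. Raises UnicodeDecodeError
    if src contains bytes that are not valid in src_encoding.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(src_encoding)  # strict mode: fail loudly on bad bytes
    Path(dst).write_bytes(text.encode("utf-8"))
    return len(text)
```

If a call such as `convert_file("testunicode.big5.txt", "check.utf8.txt")` succeeds, the input bytes are valid in that encoding, which would point at the module rather than at the file.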

As the iconv.ConvertStream() I have locally does not work, I used a TBank to load the file from BlitzMax (skipping C) and pass its bytes to the TIConv instance:

SuperStrict 
Import bah.libiconv2
Import Brl.Bank

'Local iconv:TIConv = TIConv.Create("BIG5-HKSCS", "UTF-8//IGNORE")
Local iconv:TIConv = TIConv.Create("BIG5", "UTF-8//IGNORE")
Local bank:TBank = LoadBank("testunicode.geany.big5.txt")
'Local iconv:TIConv = TIConv.Create("UTF-8", "UTF-8//IGNORE")
'Local bank:TBank = LoadBank("testunicode.iconv.utf8.txt")
if not iconv or not bank then throw "correct files or encoding!"
Local b:Byte Ptr = bank.Lock()
Local bsize:int = int(bank.size())
	
Local out:Byte[1024]
Local outLen:Int = 1024
	
local res:int = iconv.Convert(b, bsize, out, outLen)
' use "$ cat /usr/include/asm-generic/errno.h" to list error codes on your linux OS
if res < 0 Then Print "iconv error: " + iconv.LastError() + " (84 = EILSEQ, illegal byte sequence, on my OS)" 

bank.Unlock()
	
'want to have utf8 at the end
Local text:String = String.FromUTF8Bytes(out, 1024 - outLen)

print text 'consoles are utf8 on blitzmax NG + linux

Passing in "normal text" or Big5 text that was saved as UTF-8 (i.e. already converted to a UTF-8 compatible representation) works. Directly passing Big5, Big5-HKSCS, GB18030 ... encoded bytes leads to the error code above (illegal byte sequence).
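One possibility worth ruling out, and purely an assumption since I have not confirmed the parameter order of TIConv.Create: the C function iconv_open() takes (tocode, fromcode), and the //IGNORE suffix belongs on the target encoding. If the wrapper keeps that order, the sample above would treat its input as UTF-8. The Python sketch below (the helper `try_decode` is hypothetical) shows that such a swap produces the same pattern: plain text and UTF-8 input pass, raw Big5 bytes are rejected.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return "ok" if data is valid in encoding, else describe the failure."""
    try:
        data.decode(encoding)
        return "ok"
    except UnicodeDecodeError as err:
        return f"illegal byte sequence at offset {err.start}"


sample = "中文測試"
big5_bytes = sample.encode("big5")
utf8_bytes = sample.encode("utf-8")
ascii_bytes = b"normal text"

# Intended direction: Big5 bytes read as Big5.
print("big5 as big5: ", try_decode(big5_bytes, "big5"))

# Swapped direction (assumption): every input is read as UTF-8.
print("big5 as utf-8:", try_decode(big5_bytes, "utf-8"))
print("utf8 as utf-8:", try_decode(utf8_bytes, "utf-8"))
print("text as utf-8:", try_decode(ascii_bytes, "utf-8"))
```

If this is the cause, swapping the two arguments in the sample would be a quick test; if the error persists, the encoding tables in the refurbished module are the next suspect.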

Sample text files:
testunicode.geany.big5.txt
testunicode.geany.big5hkscs.txt
testunicode.geany.gb18030.txt
testunicode.iconv.utf8.txt

Module source (I adjusted the stream sample to override the TStream methods that now use :Long; it still does not convert anything there):
libiconv2.mod.zip
