Automatic charset detection#4

Closed
Nyoroon wants to merge 4 commits into mgedmin:master from Nyoroon:patch-1

@Nyoroon Nyoroon commented Oct 22, 2013

No description provided.

@coveralls

Coverage Status

Coverage decreased (-85.0%) when pulling 3c9713a on Nyoroon:patch-1 into 3dae140 on mgedmin:master.

Owner

mgedmin commented Oct 22, 2013

You need to add chardet to install_requires in setup.py.

Owner

mgedmin commented Oct 22, 2013

I see chardet detects UTF-8, so why not use it always? I.e. lose the try/except.

Owner

mgedmin commented Oct 22, 2013

I'd also be more comfortable if the input encoding could be specified explicitly, in case chardet guesses wrong. A command-line/config file option perhaps? Say, --input-charset={detect|hybrid|<charset>}, where detect would use chardet, hybrid would be http://xchat.org/encoding/ (i.e. the current behavior of irclog2html master), and <charset> would be s.decode(charset, 'replace').

The default should probably be chardet, unless I discover that it guesses wrong too often, in which case I'll revert the default to hybrid.
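
A minimal sketch of how the proposed `--input-charset` option might behave, assuming Python 3 and log lines read as bytes. The names `decode_line` and `make_parser` and the `cp1252` fallback are hypothetical and not taken from irclog2html; `chardet.detect` is the only third-party call, imported lazily so that the other two modes work without chardet installed.

```python
import argparse


def decode_line(raw, input_charset="detect", fallback="cp1252"):
    """Decode one raw log line (bytes) according to the chosen mode.

    `fallback` is an assumed legacy charset, used when UTF-8 decoding
    fails (hybrid) or when chardet returns no guess (detect).
    """
    if input_charset == "detect":
        # Lazy import: only the "detect" mode needs chardet.
        import chardet
        guess = chardet.detect(raw)["encoding"] or fallback
        return raw.decode(guess, "replace")
    if input_charset == "hybrid":
        # Try strict UTF-8 first, then fall back to the legacy charset.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw.decode(fallback, "replace")
    # Anything else is treated as an explicit charset name.
    return raw.decode(input_charset, "replace")


def make_parser():
    """Build a parser exposing only the hypothetical --input-charset option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input-charset",
        default="detect",
        metavar="CHARSET",
        help="'detect', 'hybrid', or an explicit charset name",
    )
    return parser
```

With this shape, a wrong guess from chardet can be overridden per run, for example by passing `--input-charset=hybrid` or `--input-charset=koi8-r`.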

@coveralls

Coverage Status

Coverage remained the same when pulling 711e2e2 on Nyoroon:patch-1 into 3dae140 on mgedmin:master.

Owner

mgedmin commented Oct 22, 2013

Chardet doesn't appear to be very well maintained: the PyPI page points to https://github.com/erikrose/chardet, which doesn't allow anyone to file issues, and is a fork of https://github.com/dcramer/chardet, which has a bunch of 1-year-old unanswered issues.

One of those issues is Python 3 support: dcramer/chardet#7. irclog2html currently supports Python 3. Adding a dependency on chardet breaks that.

Author

Nyoroon commented Oct 22, 2013

Oh, thanks!
I'll think about this.

Author

Nyoroon commented Oct 22, 2013

https://github.com/sigmavirus24/charade seems better, but not ideal.

@coveralls

Coverage Status

Coverage decreased (-4.33%) when pulling a2f1fc4 on Nyoroon:patch-1 into 3dae140 on mgedmin:master.

Author

Nyoroon commented Oct 22, 2013

I'll test after every code change

@coveralls

Coverage Status

Coverage decreased (-0.01%) when pulling 8399444 on Nyoroon:patch-1 into 3dae140 on mgedmin:master.

Owner

mgedmin commented Nov 4, 2013

Do you still plan to work on this?

As you can see on Travis CI, the tests are broken for Python 2.x and 3.x.

Author

Nyoroon commented Nov 5, 2013

Yes, I'm working on it.

@Nyoroon Nyoroon closed this Apr 27, 2014
@Nyoroon Nyoroon deleted the patch-1 branch April 27, 2014 20:44