CLD2 cannot classify text that doesn't have spaces

The following text gets properly detected as English, with a percent of 99:

> the soldier with the green whiskers led them through the streets of the emerald city until they reached the room where the guardian of the gates lived this office run locked their spectacles to put them back in his great box and then he

However, the same text with the spaces removed:

>thesoldierwiththegreenwhiskersledthemthroughthestreetsoftheemeraldcityuntiltheyreachedtheroomwheretheguardianofthegateslivedthisofficerunlockedtheirspectaclestoputthembackinhisgreatboxandthenhe

gets classified as English (because that's the default), but `is_reliable` is set to `false` and the percentage is **0**.
Upon further inspection, using the function `DetectLanguageSummary`, the 3 most likely languages are all `UNKNOWN_LANGUAGE`, and the percentages for them are all **0**.

Since CLD2 uses quadgrams to analyze Latin-script text, whitespace should matter very little (if at all) when detecting the language.
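
To illustrate the expectation, here is a minimal sketch that compares character quadgrams of a spaced and an unspaced string. It is a naive sliding window written for this issue, not CLD2's actual quadgram extraction or scoring; the `quadgrams` helper and the variable names are hypothetical. It only shows that quadgrams lying entirely inside a word survive when spaces are removed, while new ones appear across the former word boundaries.

```python
def quadgrams(text: str) -> set[str]:
    """Return the set of overlapping 4-character windows in text.

    Naive illustration only; this is NOT how CLD2 builds or scores quadgrams.
    """
    return {text[i:i + 4] for i in range(len(text) - 3)}


spaced = "the soldier with the green whiskers"
unspaced = spaced.replace(" ", "")

with_spaces = quadgrams(spaced)
without_spaces = quadgrams(unspaced)

# Quadgrams found in both versions: these sit entirely inside one word.
shared = with_spaces & without_spaces

# Quadgrams that exist only after the spaces are removed: these straddle
# what used to be a word boundary (for example "thes" from "the soldier").
only_unspaced = without_spaces - with_spaces

print(f"shared: {len(shared)}, new in unspaced text: {len(only_unspaced)}")
```

If CLD2 treated the input the same way, a large share of the English quadgrams would still be available for scoring, which is why a result of `UNKNOWN_LANGUAGE` at 0 percent is surprising.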

