Convert Arabic glyphs into standard letters #64

@linuxscout

Following up on the earlier issue 57, we propose adding a new function to unshape this kind of text.

Salam,
I tested the given words with pyarabic, as shown in the test further down.
The words contain encoded glyphs (presentation forms) rather than standard letters, so they must be converted to ordinary letters.

To convert a glyph-based word into a string of standard letters, you can use `unshaping_word`.
NB: the second (outer) call to `unshaping_word` is used only to reverse the resulting word.

    >>> word = "ﻣﺴﺎﻣﻌﻬﻢ"
    >>> from pyarabic.unshape import unshaping_word
    >>> unshaping_word(unshaping_word(word))
    'مسامعهم'
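
As a cross-check that does not depend on pyarabic, the same mapping can be sketched with the standard library. This is not the `unshaping_word` implementation; it swaps in Unicode NFKC normalization, which replaces each presentation-form glyph with its standard letter. The helper name `glyphs_to_letters` is hypothetical. Note that NFKC also folds other compatibility characters and does not change character order, so it only fits text already stored in logical order, as the sample word is.

```python
import unicodedata


def glyphs_to_letters(word):
    """Hypothetical helper: map presentation-form glyphs to standard letters.

    Uses NFKC normalization instead of pyarabic. Character order is left
    untouched, and other compatibility characters are folded as well.
    """
    return unicodedata.normalize("NFKC", word)


# The sample word from this issue, built from the code points printed by the test.
word = "".join(chr(cp) for cp in [65251, 65204, 65166, 65251, 65228, 65260, 65250])
converted = glyphs_to_letters(word)

# Every character now sits in the basic Arabic block (U+0600..U+06FF).
print([hex(ord(c)) for c in converted])
```

Once converted this way, each character has a code point in the basic Arabic block rather than in the 65xxx range shown by the test.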
The test used to detect the problem:

```python
>>> import pyarabic.araby as ar
>>> lst = ["اﻟﻤﺴﺌﻮﻟﻴﺔ","ﻣﺴﺎﻣﻌﻬﻢ","ﻓﻜﻠﻨﺎ","ﻣﺒﺎدراﺗﻨﺎ","ﻓﻬﻢ","اﻟﻤﻨﻈﻮﻣﺔ"]
>>> for i in lst:
...     print(i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     print("%s" % i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     for c in i:
...         print(c, ord(c), ar.name(c))
...
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﺴ 65204
ﺌ 65164
ﻮ 65262
ﻟ 65247
ﻴ 65268
ﺔ 65172
ﻣ 65251
ﺴ 65204
ﺎ 65166
ﻣ 65251
ﻌ 65228
ﻬ 65260
ﻢ 65250
ﻓ 65235
ﻜ 65244
ﻠ 65248
ﻨ 65256
ﺎ 65166
ﻣ 65251
ﺒ 65170
ﺎ 65166
د 1583 دال
ر 1585 راء
ا 1575 ألف
ﺗ 65175
ﻨ 65256
ﺎ 65166
ﻓ 65235
ﻬ 65260
ﻢ 65250
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﻨ 65256
ﻈ 65224
ﻮ 65262
ﻣ 65251
ﺔ 65172
```
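
The code points printed by the test (65164 to 65268) all fall in the Unicode block Arabic Presentation Forms-B, while the standard letters (1575, 1583, 1585) sit in the basic Arabic block. A minimal sketch of a check built on that observation follows; the helper name `find_presentation_forms` is hypothetical and not part of pyarabic.

```python
# Unicode blocks that hold Arabic contextual glyphs (presentation forms).
PRESENTATION_FORMS_A = range(0xFB50, 0xFE00)  # U+FB50..U+FDFF
PRESENTATION_FORMS_B = range(0xFE70, 0xFEFF)  # U+FE70..U+FEFE (U+FEFF is the BOM)


def find_presentation_forms(word):
    """Hypothetical helper: list (index, code point) of glyph characters."""
    return [
        (i, ord(c))
        for i, c in enumerate(word)
        if ord(c) in PRESENTATION_FORMS_A or ord(c) in PRESENTATION_FORMS_B
    ]


# First three characters of the first test word: a standard alef, then two glyphs.
sample = "".join(chr(cp) for cp in [1575, 65247, 65252])
print(find_presentation_forms(sample))
```

A word for which this returns a non-empty list contains glyphs that need unshaping before functions such as `is_arabicword` are applied.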
