gh-66802: Add unicodedata.block() function #145042

Open
StanFromIreland wants to merge 5 commits into python:main from StanFromIreland:unicodedata-blocks
StanFromIreland commented Feb 20, 2026

[clinic start generated code]*/

static PyObject *
unicodedata_block_impl(PyObject *module, int chr)
Member

Have you tried other approaches for this, and compared the performances?

Member Author

| | Table with ranges & bin search (PR) | Adding to each record |
| --- | --- | --- |
| unique properties | 395 | 1829 |
| size of unicodedata_db.h (bytes) | 626,364 | 759,474 |
| simple bench* | 498 nsec per loop | 490 nsec per loop |

* ./python -m timeit -v -r 10 -n 1000000 -s "import unicodedata; b=unicodedata.block" "b('\u0041');b('\u4E00');b('\U0010FFFF')"

The lookup time difference is minor (498 vs. 490 nsec per loop), and unicodedata_db.h ends up about 17% smaller (626,364 vs. 759,474 bytes), so I think this approach is better.
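For readers unfamiliar with the "table with ranges & bin search" approach, here is a minimal Python sketch of the idea, not the PR's actual C implementation. The `BLOCKS` subset, the helper name `block`, and the `"No_Block"` fallback are assumptions made for illustration. The real table would be generated from the Unicode Character Database and has several hundred entries.

```python
from bisect import bisect_right

# Illustrative subset of Unicode blocks: (first, last, name).
# Assumption: a sorted, non-overlapping table; the real one is far larger.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x100000, 0x10FFFF, "Supplementary Private Use Area-B"),
]

# Range start points, kept separately so bisect can search them directly.
STARTS = [first for first, _, _ in BLOCKS]


def block(ch):
    """Return the block name for a single character (hypothetical helper)."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp, or -1 if none.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, last, name = BLOCKS[i]
        if cp <= last:
            return name
    # Assumption: code points outside every listed range map to "No_Block".
    return "No_Block"


print(block("A"), "|", block("\u4E00"), "|", block("\U0010FFFF"))
```

The trade-off matches the comparison above: one small sorted table costs O(log n) per lookup, whereas storing the block in every character record makes the lookup direct but multiplies the number of unique records.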

StanFromIreland and others added 2 commits February 21, 2026 09:58
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Member

vstinner left a comment

LGTM. The implementation with a binary tree is efficient (CPU, memory). It's a reasonable addition to the unicodedata module.

Member

malemburg left a comment

LGTM, but please do check whether using size_t for namelen wouldn't be better.
