bug: /dev/urandom returns UTF-8 replacement chars instead of raw bytes #811

@chaliy

Problem

Reading from /dev/urandom in the VFS returns UTF-8 encoded data instead of raw bytes. Bytes > 0x7F are expanded to 2-byte UTF-8 sequences (c2 xx or c3 xx), inflating the output size and breaking byte-level text processing.

Note: #828 partially fixed this (no more U+FFFD replacement characters), but bytes are still UTF-8 encoded rather than raw.

Reproduction

Test 1: Byte count inflation

head -c 8 /dev/urandom | wc -c
# Expected: 8
# Actual: 11-13 (varies, always > 8)

Requested vs actual byte counts:

Requested Actual
1 1
4 5
8 11
16 25
32 51

About 25-60% inflation in these samples. The expected value is 50%: roughly half of all random bytes are > 0x7F, and each of those becomes a 2-byte UTF-8 sequence.
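
The expected inflation can be checked with a short Python sketch. It assumes each raw byte is mapped to the Unicode code point of the same value and then UTF-8 encoded. That mapping is inferred from the c2/c3 prefixes seen in the output, not confirmed from the VFS source, and the helper name `inflate` is hypothetical.

```python
def inflate(raw: bytes) -> bytes:
    """Model the suspected bug: treat each byte as a code point, then UTF-8 encode."""
    # latin-1 maps byte value N to code point U+00NN (the assumed behaviour).
    return raw.decode("latin-1").encode("utf-8")

# Every possible byte value exactly once: 128 values <= 0x7F, 128 values > 0x7F.
all_bytes = bytes(range(256))
encoded = inflate(all_bytes)

# ASCII bytes stay 1 byte, the rest become 2 bytes: 128 * 1 + 128 * 2 = 384.
ratio = len(encoded) / len(all_bytes)
print(len(all_bytes), len(encoded), ratio)  # 256 384 1.5
```

For uniformly random input the average ratio is therefore 1.5, and small samples such as the 4-byte and 8-byte rows above can land well below or above it.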

Test 2: Raw bytes show UTF-8 multibyte sequences

head -c 16 /dev/urandom | od -A x -t x1z | head -2
# 0000000 c3 aa 5b 4d c3 82 c3 bd 72 25 c3 8c c3 84 68 09

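The c3 and c2 prefixes are UTF-8 lead bytes. The output is consistent with each raw byte above 0x7F being encoded as the code point of the same value: 0xEA becomes c3 aa, and 0xAA would become c2 aa (2 bytes each).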
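
As a rough check, the dump from Test 2 can be reversed in Python under the assumption that the VFS maps byte value N to code point U+00NN before UTF-8 encoding. This is an illustration of the encoding only, not the project's code.

```python
dump = "c3 aa 5b 4d c3 82 c3 bd 72 25 c3 8c c3 84 68 09"
encoded = bytes.fromhex(dump)  # fromhex skips the spaces between byte pairs

# Undo the assumed mapping: UTF-8 decode, then map each code point back to one byte.
raw = encoded.decode("utf-8").encode("latin-1")

print(len(encoded), len(raw))  # 16 11
print(raw.hex(" "))            # ea 5b 4d c2 fd 72 25 cc c4 68 09

# Single-byte checks: 0xEA maps to c3 aa, 0xAA maps to c2 aa.
print(b"\xea".decode("latin-1").encode("utf-8").hex(" "))  # c3 aa
print(b"\xaa".decode("latin-1").encode("utf-8").hex(" "))  # c2 aa
```

Under that assumption, the 16 bytes shown by od carry only 11 bytes of actual random data.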

Test 3: tr -dc filtering broken

LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 8
# Expected: 8 alphanumeric chars like "a7xk2m9p"
# Actual: garbled non-ASCII like "ÅʤÄ" (4-5 chars, many non-ASCII)

Test 4: Repeated runs show inconsistent lengths

result=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 8)
echo "len=${#result} val=$result"
# len=5 val=Ýs±Ï�
# len=5 val=¹aÑ�
# len=4 val=²¨á
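
One possible reading of the trailing � in the first sample, not confirmed in this report, is that head -c 8 cuts the stream in the middle of a 2-byte sequence. The Python sketch below reproduces len=5 from 8 bytes; the dangling lead byte 0xc3 is a hypothetical choice for illustration.

```python
# First sample from Test 4 without its trailing replacement character.
visible = "Ýs±Ï"
prefix = visible.encode("utf-8")  # 2 + 1 + 2 + 2 = 7 bytes

# Hypothetical 8th byte: the lead byte of a 2-byte sequence whose
# continuation byte was dropped by the 8-byte cut.
truncated = prefix + b"\xc3"

# An incomplete sequence at the end decodes to a single U+FFFD.
text = truncated.decode("utf-8", errors="replace")
print(len(truncated), len(text), text.endswith("\ufffd"))  # 8 5 True
```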

Root cause

The VFS read path converts random bytes through a Rust String (which must be valid UTF-8). Bytes > 0x7F are encoded as multi-byte UTF-8 sequences instead of being passed through as raw single bytes.

Impact

The common pattern tr -dc 'a-z0-9' < /dev/urandom | head -c N for generating random strings is broken. This pattern is used in wedow/ticket's generate_id(). Workaround: use $RANDOM instead.
