Python port of rs/xid, providing globally unique, k-sortable 12-byte identifiers with a canonical base32 string representation.

Install from PyPI:

    pip install pyxid

Install from source:

    git clone https://github.com/thomastheyoung/pyxid.git
    cd pyxid
    pip install .

Optional extras:

    pip install pyxid[sqlalchemy]  # include SQLAlchemy integration helpers
    pip install pyxid[dev]         # include pytest and pytest-cov

Basic usage:

    import pyxid
    import concurrent.futures
    import json
    import sqlite3

    identifier = pyxid.new()
    print(identifier)                  # base32 string
    print(identifier.time())           # datetime
    print(identifier.machine())        # machine bytes
    print(identifier.pid())            # pid as 16-bit int
    print(identifier.pid_bytes())      # pid bytes, stable width
    print(identifier.counter())        # 24-bit counter
    print(identifier.counter_bytes())  # counter bytes, stable width

    # JSON helpers
    payload = {"id": identifier, "nil": pyxid.NilID()}
    json_text = json.dumps(payload, cls=pyxid.IDJSONEncoder)
    restored = json.loads(json_text, object_hook=pyxid.id_object_hook())

    # SQL helpers
    value_for_db = pyxid.to_sql_value(identifier)
    identifier_from_db = pyxid.from_sql_value(value_for_db)

    # DB-API helpers (sqlite3 example)
    pyxid.register_sqlite()  # registers adapters and converters
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    conn.execute("CREATE TABLE items (value XID)")
    conn.execute("INSERT INTO items (value) VALUES (?)", (identifier,))
    fetched = conn.execute("SELECT value FROM items").fetchone()[0]
    assert fetched == identifier

    # Multi-process generation (ProcessPoolExecutor example)
    def generate_many(count: int) -> list[pyxid.ID]:
        return [pyxid.new() for _ in range(count)]

    # Guard the pool so worker processes can re-import this module safely
    # on platforms that spawn instead of fork (macOS, Windows).
    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor() as pool:
            batches = list(pool.map(generate_many, [1000] * 4))
        assert len({str(id_) for batch in batches for id_ in batch}) == 4000

- pyxid hashes the hostname when platform IDs are unavailable; in Docker/Kubernetes, ensure each container has a distinct hostname (the default in Docker). Set `XID_MACHINE_ID` explicitly for full control.
- When using read-only containers, confirm that `/etc/machine-id` (or the platform equivalent) is accessible; otherwise pyxid falls back to hostname hashing.
- For reproducible builds, set `XID_MACHINE_ID` and `TZ` to avoid environment-dependent variations during tests.
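
To make the fallback order above concrete, here is a minimal standard-library sketch of an environment → platform → hostname resolution. It is not pyxid's actual code: the function name `resolve_machine_id`, the choice of SHA-256, and the decision to hash the `XID_MACHINE_ID` value like any other source are all assumptions made for illustration.

```python
import hashlib
import socket
from pathlib import Path
from typing import Mapping, Optional


def resolve_machine_id(
    env: Mapping[str, str],
    machine_id_path: Path = Path("/etc/machine-id"),
    hostname: Optional[str] = None,
) -> bytes:
    """Return 3 machine bytes using an env -> platform -> hostname order."""
    # 1. Explicit override. Hashing the value is an assumption of this sketch.
    source = env.get("XID_MACHINE_ID", "").strip()

    # 2. Platform ID. A missing or unreadable file falls through to step 3,
    #    which is what can happen in a locked-down container.
    if not source:
        try:
            source = machine_id_path.read_text().strip()
        except OSError:
            source = ""

    # 3. Hostname as the last resort.
    if not source:
        source = hostname if hostname is not None else socket.gethostname()

    # SHA-256 is a hypothetical choice; only the first 3 bytes are kept.
    return hashlib.sha256(source.encode("utf-8")).digest()[:3]
```

Two containers that share a hostname and have no readable platform ID would resolve to the same machine bytes in this sketch, which is why distinct hostnames or an explicit override matter.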

| Capability | Go rs/xid | pyxid |
|---|---|---|
| 12-byte, k-sortable identifiers | ✅ | ✅ (lexicographically sorted by timestamp, 1-second precision) |
| Custom counter seeded with PID / cpuset | ✅ | ✅ |
| Environment → platform → hostname fallback | ✅ | ✅ |
| JSON marshal/unmarshal helpers | ✅ (`MarshalJSON`) | ✅ (`IDJSONEncoder`, `id_object_hook`, `json_default`) |
| SQL database integration | ✅ (`Value`, `Scan`) | ✅ (`to_sql_value`, `from_sql_value`, `register_sqlite`, `SQLAlchemyXID`) |
| Formatting helpers | ✅ (`String`) | ✅ (`__str__`, `__format__`, raw/upper/lower specifiers) |
| PID / counter fixed-width accessors | ✅ | ✅ (`pid_bytes`, `counter_bytes`) |
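
For readers who want to see what the 12 bytes contain, the following sketch packs and unpacks the field layout documented for rs/xid: a 4-byte big-endian Unix timestamp, 3 machine bytes, a 2-byte PID, and a 3-byte counter. The names `Parts`, `pack_id`, and `unpack_id` are hypothetical helpers for illustration and are not part of pyxid.

```python
import struct
from datetime import datetime, timezone
from typing import NamedTuple


class Parts(NamedTuple):
    time: datetime
    machine: bytes
    pid: int
    counter: int


def pack_id(timestamp: int, machine: bytes, pid: int, counter: int) -> bytes:
    """Assemble 12 raw bytes: 4 (time) + 3 (machine) + 2 (pid) + 3 (counter)."""
    if len(machine) != 3:
        raise ValueError("machine must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp)               # seconds since the epoch
        + machine
        + struct.pack(">H", pid & 0xFFFF)          # pid truncated to 16 bits
        + (counter & 0xFFFFFF).to_bytes(3, "big")  # 24-bit counter
    )


def unpack_id(raw: bytes) -> Parts:
    """Split 12 raw bytes back into their four fields."""
    if len(raw) != 12:
        raise ValueError("an ID is exactly 12 bytes")
    (timestamp,) = struct.unpack(">I", raw[:4])
    (pid,) = struct.unpack(">H", raw[7:9])
    return Parts(
        time=datetime.fromtimestamp(timestamp, tz=timezone.utc),
        machine=raw[4:7],
        pid=pid,
        counter=int.from_bytes(raw[9:], "big"),
    )
```

Because the timestamp occupies the most significant bytes, comparing the raw bytes of two IDs orders them by creation second first, which is where the k-sortable property comes from.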

- Reuse worker processes (e.g., via `multiprocessing` pools) to amortize module import and machine-ID discovery costs.
- Avoid expensive string conversions in tight loops; cache `str(id_)` if it is used repeatedly.
- Configure `PYTHONHASHSEED` and random seeding when benchmarking to reduce variance; pyxid seeds counters using `os.urandom`.
- For burst workloads, batch ID generation (e.g., `[pyxid.new() for _ in range(1000)]`) and perform downstream serialization asynchronously.

Simple benchmark on Apple M1 / Python 3.12.4:

    generated 500000 IDs in 0.538s -> 930,154 ids/sec

Script:

    import time

    import pyxid

    COUNT = 500_000
    start = time.perf_counter()
    for _ in range(COUNT):
        pyxid.new()
    elapsed = time.perf_counter() - start
    # Print in the same format as the sample output above.
    print(f"generated {COUNT} IDs in {elapsed:.3f}s -> {COUNT / elapsed:,.0f} ids/sec")

Compared to the Go implementation (`go test -bench=.` typically yields ~3-5M ids/sec on similar hardware), pyxid is slower, as expected for Python, but still well suited for most application workloads that need collision-free, sortable identifiers.
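
pyxid encodes each ID as a 20-character string using a 32-character base32 alphabet (`0123456789abcdefghijklmnopqrstuv`).
This is the RFC 4648 base32hex alphabet in lowercase, used without padding, matching the Go reference implementation. MongoDB ObjectId, by contrast, is usually rendered as a 24-character hexadecimal string.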
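
The string form can be sketched with plain integer arithmetic: 12 bytes are 96 bits, and padding with 4 zero bits on the right gives 20 groups of 5 bits, one per output character. The `encode` and `decode` functions below are illustrative helpers written for this README, not pyxid's API.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"
_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(raw: bytes) -> str:
    """Encode 12 raw bytes as a 20-character lowercase base32 string."""
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    # 96 bits + 4 zero bits of right padding = 100 bits = 20 groups of 5.
    bits = int.from_bytes(raw, "big") << 4
    return "".join(ALPHABET[(bits >> shift) & 0x1F] for shift in range(95, -1, -5))


def decode(text: str) -> bytes:
    """Decode a 20-character string back into 12 raw bytes."""
    if len(text) != 20:
        raise ValueError("expected exactly 20 characters")
    bits = 0
    for ch in text:
        # A KeyError here means the character is outside the alphabet.
        bits = (bits << 5) | _INDEX[ch]
    # Drop the 4 padding bits added during encoding.
    return (bits >> 4).to_bytes(12, "big")
```

Since the alphabet is in ascending ASCII order and every string has the same length, sorting the encoded strings gives the same order as sorting the raw bytes.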

The `benchmarks/` directory contains quick probes you can run locally:

    python benchmarks/throughput.py --iterations 200000
    python benchmarks/concurrency.py --workers 4 --per-worker 50000
    python benchmarks/memory.py --count 100000

Use these as starting points for your own profiling; results vary by hardware and workload.

- UUIDv4: Replace `uuid.uuid4().hex` with `str(pyxid.new())`. Expect 20-character sortable strings instead of 32-character hex.
- Snowflake: pyxid preserves chronological ordering; if you need to backfill historical timestamps, use `pyxid.new_with_time(dt)`.
- MongoDB ObjectId: Both are 12 bytes; convert raw bytes with `pyxid.from_bytes(object_id.binary)` if needed, though the layout differs.
- During phased rollouts, dual-write both old and new IDs, then backfill using `pyxid.from_string` or `pyxid.from_bytes` to normalize stored data.

- The package maintains 100% test coverage to ensure feature parity with the Go reference implementation.
- JSON and SQL helpers are optional; import only what you need.
- See `tests/test_xid.py` for thorough examples, including platform-specific machine ID fallbacks, JSON encoding, SQLAlchemy integration, and formatting.
- Thread-safe: `pyxid.new()` uses a shared counter to guarantee uniqueness across threads within a process.
- Counter wraparound: up to 16,777,216 IDs per second per process before the counter wraps (matching the Go implementation).
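
The last two points can be illustrated with a small sketch of a lock-protected 24-bit counter. This shows the general technique only; the class name `Counter24` is hypothetical, and pyxid's internal counter may be implemented differently.

```python
import os
import threading
from typing import Optional


class Counter24:
    """A thread-safe counter that wraps after 2**24 (16,777,216) values."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # Seed from the OS random source unless a seed is given explicitly.
        if seed is None:
            seed = int.from_bytes(os.urandom(3), "big")
        self._value = seed & 0xFFFFFF
        self._lock = threading.Lock()

    def next(self) -> int:
        # The lock makes increment-and-read a single atomic step, so two
        # threads can never observe the same value.
        with self._lock:
            self._value = (self._value + 1) & 0xFFFFFF
            return self._value
```

With a seed of `0xFFFFFE`, the first call returns `0xFFFFFF` and the second wraps to `0`. Values stay unique within one second only while fewer than 16,777,216 IDs are requested in that second.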