thomastheyoung/pyxid

pyxid

Python port of rs/xid, providing globally unique, k-sortable 12-byte identifiers with a canonical base32 string representation.

Installation

Install from PyPI:

pip install pyxid

Install from source:

git clone https://github.com/thomastheyoung/pyxid.git
cd pyxid
pip install .

Optional extras:

pip install pyxid[sqlalchemy]  # include SQLAlchemy integration helpers
pip install pyxid[dev]         # include pytest and pytest-cov

Usage

import concurrent.futures
import json
import sqlite3

import pyxid

identifier = pyxid.new()
print(identifier)              # base32 string
print(identifier.time())       # datetime
print(identifier.machine())    # machine bytes
print(identifier.pid())        # pid as 16-bit int
print(identifier.pid_bytes())  # pid bytes, stable width
print(identifier.counter())    # 24-bit counter
print(identifier.counter_bytes())

# JSON helpers
payload = {"id": identifier, "nil": pyxid.NilID()}
json_text = json.dumps(payload, cls=pyxid.IDJSONEncoder)
restored = json.loads(json_text, object_hook=pyxid.id_object_hook())

# SQL helpers
value_for_db = pyxid.to_sql_value(identifier)
identifier_from_db = pyxid.from_sql_value(value_for_db)

# DB-API helpers (sqlite3 example)
pyxid.register_sqlite()  # registers adapters and converters
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE items (value XID)")
conn.execute("INSERT INTO items (value) VALUES (?)", (identifier,))
fetched = conn.execute("SELECT value FROM items").fetchone()[0]
assert fetched == identifier

# Multi-process generation (ProcessPoolExecutor example)
def generate_many(count: int) -> list[pyxid.ID]:
    return [pyxid.new() for _ in range(count)]

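# Guard the pool so spawn-based platforms (macOS, Windows) can re-import this module safely.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as pool:
        batches = list(pool.map(generate_many, [1000] * 4))
        assert len({str(id_) for batch in batches for id_ in batch}) == 4000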
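
The accessors at the top of the usage example (time, machine, pid, counter) correspond to fixed slices of the 12-byte value. The sketch below splits raw bytes by hand, assuming the rs/xid layout of a 4-byte big-endian timestamp, 3 machine bytes, a 2-byte pid, and a 3-byte counter. The function name split_xid is hypothetical and is not part of pyxid; returning a UTC-aware datetime is also an assumption.

```python
import datetime
import struct


def split_xid(raw: bytes) -> dict:
    """Split a 12-byte xid value into its four fields (hypothetical helper)."""
    if len(raw) != 12:
        raise ValueError("an xid is exactly 12 bytes")
    (seconds,) = struct.unpack(">I", raw[0:4])   # 4-byte big-endian Unix time
    machine = raw[4:7]                           # 3-byte machine identifier
    (pid,) = struct.unpack(">H", raw[7:9])       # 2-byte process id
    counter = int.from_bytes(raw[9:12], "big")   # 3-byte counter
    return {
        # Assumption: expose the timestamp as a UTC-aware datetime.
        "time": datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc),
        "machine": machine,
        "pid": pid,
        "counter": counter,
    }


# Sample value: the raw bytes behind the string 9m4e2mr0ui3e8a215n4g.
raw = bytes.fromhex("4d88e15b60f486e428412dc9")
parts = split_xid(raw)
print(parts)
```

Because the timestamp occupies the leading bytes, sorting IDs by their raw bytes orders them by creation second first.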

Containers & Deployment

  • pyxid hashes the hostname when platform IDs are unavailable; in Docker/Kubernetes, ensure each container has a distinct hostname (the default in Docker). Set XID_MACHINE_ID explicitly for full control.
  • When using read-only containers, confirm /etc/machine-id (or platform equivalent) is accessible; otherwise pyxid falls back to hostname hashing.
  • For reproducible builds, set XID_MACHINE_ID and TZ to avoid environment-dependent variations during tests.
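
The fallback order described above (environment variable, then platform identifier, then hostname) can be sketched as follows. This is an illustration under stated assumptions, not pyxid's actual code: the function name resolve_machine_id is hypothetical, and hashing the chosen source with SHA-256 and keeping the first 3 bytes is an assumed reduction step. Only the XID_MACHINE_ID variable name and the /etc/machine-id path come from the notes above.

```python
import hashlib
import os
import socket
from typing import Mapping, Optional


def resolve_machine_id(
    env: Optional[Mapping[str, str]] = None,
    machine_id_path: str = "/etc/machine-id",
) -> bytes:
    """Return 3 machine bytes via an env -> platform file -> hostname chain."""
    env = os.environ if env is None else env
    source = env.get("XID_MACHINE_ID", "")           # 1. explicit override
    if not source:
        try:
            with open(machine_id_path, encoding="utf-8") as fh:
                source = fh.read().strip()           # 2. platform identifier
        except OSError:
            source = ""                              # unreadable, e.g. locked-down container
    if not source:
        source = socket.gethostname()                # 3. hostname fallback
    # Assumption: reduce whichever source was chosen to 3 bytes with SHA-256.
    return hashlib.sha256(source.encode("utf-8")).digest()[:3]


print(resolve_machine_id({"XID_MACHINE_ID": "node-a"}).hex())
```

Passing the environment and path as parameters keeps the sketch testable; two containers that share a hostname and have no readable machine ID would resolve to the same bytes, which is why the explicit override matters.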

Feature Comparison

Capability | Go rs/xid | pyxid
12-byte, k-sortable identifiers | | ✅ (lexicographically sorted by timestamp, 1-second precision)
Custom counter seeded with PID / cpuset | |
Environment → platform → hostname fallback | |
JSON marshal/unmarshal helpers | ✅ (MarshalJSON) | ✅ (IDJSONEncoder, id_object_hook, json_default)
SQL database integration | ✅ (Value, Scan) | ✅ (to_sql_value, from_sql_value, register_sqlite, SQLAlchemyXID)
Formatting helpers | ✅ (String) | ✅ (__str__, __format__, raw/upper/lower specifiers)
PID / counter fixed-width accessors | | ✅ (pid_bytes, counter_bytes)

Performance Tuning Tips

  • Reuse worker processes (e.g., via multiprocessing pools) to amortize module import and machine-ID discovery costs.
  • Avoid expensive string conversions in tight loops; cache str(id_) if it is used repeatedly.
  • Configure PYTHONHASHSEED and random seeding when benchmarking to reduce variance; pyxid seeds counters using os.urandom.
  • For burst workloads, batch ID generation (e.g., [pyxid.new() for _ in range(1000)]) and perform downstream serialization asynchronously.

Performance

Simple benchmark on Apple M1 / Python 3.12.4:

generated 500000 IDs in 0.538s -> 930,154 ids/sec

Script:

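import time

import pyxid

COUNT = 500_000
start = time.perf_counter()
for _ in range(COUNT):
    pyxid.new()
elapsed = time.perf_counter() - start
# Print in the same format as the sample output above.
print(f"generated {COUNT} IDs in {elapsed:.3f}s -> {COUNT / elapsed:,.0f} ids/sec")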

Compared to the Go implementation (go test -bench=. typically yields ~3-5M ids/sec on similar hardware), pyxid is slower, as expected for Python, but still well suited to most application workloads that need collision-free, sortable identifiers.

Encoding Notes

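pyxid encodes each 12-byte ID as a 20-character string using the 32-character alphabet 0123456789abcdefghijklmnopqrstuv. This is the RFC 4648 base32hex alphabet in lowercase, emitted without padding, which matches the Go reference implementation. Unlike the 24-character hexadecimal form of a MongoDB ObjectId, the string is only 20 characters long, and it sorts in the same order as the raw bytes.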
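
A minimal sketch of how 12 raw bytes map onto the 20-character string, using only the standard library: encode with base64.b32encode, drop the padding, and remap the output onto the alphabet quoted above. The helper names encode_xid and decode_xid are hypothetical and are not part of pyxid's API; pyxid's own encoder may be implemented differently.

```python
import base64

_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # alphabet produced by base64.b32encode
_XID = "0123456789abcdefghijklmnopqrstuv"   # alphabet used in xid strings
_TO_XID = str.maketrans(_STD, _XID)
_TO_STD = str.maketrans(_XID, _STD)


def encode_xid(raw: bytes) -> str:
    """Encode 12 raw bytes as a 20-character xid string (hypothetical helper)."""
    if len(raw) != 12:
        raise ValueError("an xid is exactly 12 bytes")
    padded = base64.b32encode(raw).decode("ascii")   # 20 symbols plus "====" padding
    return padded.rstrip("=").translate(_TO_XID)


def decode_xid(text: str) -> bytes:
    """Decode a 20-character xid string back to 12 raw bytes (hypothetical helper)."""
    if len(text) != 20:
        raise ValueError("an xid string is exactly 20 characters")
    # Restore the padding that b32decode expects for a 12-byte payload.
    return base64.b32decode(text.translate(_TO_STD) + "====")


sample = bytes.fromhex("4d88e15b60f486e428412dc9")
print(encode_xid(sample))   # 9m4e2mr0ui3e8a215n4g
assert decode_xid(encode_xid(sample)) == sample
```

Because the alphabet is in ascending ASCII order and every string has the same length, comparing two encoded strings gives the same result as comparing their raw bytes.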

Benchmark Scripts

The benchmarks/ directory contains quick probes you can run locally:

python benchmarks/throughput.py --iterations 200000
python benchmarks/concurrency.py --workers 4 --per-worker 50000
python benchmarks/memory.py --count 100000

Use these as starting points for your own profiling; results vary by hardware and workload.

Migrating from Other ID Systems

  • UUIDv4: Replace uuid.uuid4().hex with str(pyxid.new()). Expect 20-character sortable strings instead of 32-character hex.
  • Snowflake: pyxid preserves chronological ordering; if you need to backfill historical timestamps use pyxid.new_with_time(dt).
  • MongoDB ObjectId: Both are 12 bytes; convert raw bytes with pyxid.from_bytes(object_id.binary) if needed, though the layout differs.
  • During phased rollouts, dual-write both old and new IDs, then backfill using pyxid.from_string or pyxid.from_bytes to normalize stored data.
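
When planning a backfill, it can help to know that a MongoDB ObjectId and an xid, despite their different layouts, both begin with a 4-byte big-endian Unix timestamp in seconds. The sketch below reads that prefix from either kind of 12-byte value. The helper name leading_timestamp is hypothetical, and the ObjectId hex string is only a sample value.

```python
import datetime


def leading_timestamp(raw: bytes) -> datetime.datetime:
    """Read the creation time from the first 4 bytes of a 12-byte identifier."""
    if len(raw) != 12:
        raise ValueError("expected a 12-byte identifier")
    seconds = int.from_bytes(raw[:4], "big")   # big-endian seconds since the epoch
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# Sample ObjectId given as 24 hex characters, and sample xid raw bytes.
object_id_bytes = bytes.fromhex("507f1f77bcf86cd799439011")
xid_bytes = bytes.fromhex("4d88e15b60f486e428412dc9")

print(leading_timestamp(object_id_bytes).isoformat())
print(leading_timestamp(xid_bytes).isoformat())
```

The remaining 8 bytes do not line up between the two formats, so a recovered timestamp is better passed to pyxid.new_with_time than the raw ObjectId bytes reinterpreted as an xid.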

Additional Notes

  • The package maintains 100% test coverage to help preserve feature parity with the Go reference implementation.
  • JSON and SQL helpers are optional; import only what you need.
  • See tests/test_xid.py for thorough examples, including platform-specific machine ID fallbacks, JSON encoding, SQLAlchemy integration, and formatting.
  • Thread-safe: pyxid.new() uses a shared counter to guarantee uniqueness across threads within a process.
  • Counter wraparound: up to 16,777,216 IDs per second per process before the counter wraps (matching the Go implementation).
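
The thread-safety and wraparound notes above can be illustrated with a lock-protected 24-bit counter. This is a sketch under assumptions, not pyxid's implementation: the class name WrappingCounter is hypothetical, and the use of threading.Lock is one possible way to share a counter across threads. Seeding from os.urandom follows the tuning tips earlier in this README.

```python
import os
import threading


class WrappingCounter:
    """A 24-bit counter shared across threads (hypothetical illustration)."""

    MASK = 0xFFFFFF  # 24 bits, so 16,777,216 distinct values

    def __init__(self, seed=None):
        if seed is None:
            seed = int.from_bytes(os.urandom(3), "big")  # random starting point
        self._value = seed & self.MASK
        self._lock = threading.Lock()

    def next(self) -> int:
        # The lock makes increment-and-read atomic across threads.
        with self._lock:
            self._value = (self._value + 1) & self.MASK  # wraps to 0 after 0xFFFFFF
            return self._value


counter = WrappingCounter(seed=0xFFFFFE)
print(counter.next(), counter.next())   # 16777215 0
```

Within one second, values stay unique until the counter has advanced 16,777,216 times, which is where the per-second limit quoted above comes from.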
