pyxid

Python port of rs/xid, providing globally unique, k-sortable 12-byte identifiers with a canonical base32 string representation.
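The 12 bytes follow the rs/xid layout, which the accessors in the Usage section mirror: a 4-byte big-endian Unix timestamp in seconds, a 3-byte machine identifier, a 2-byte process ID, and a 3-byte counter. Because the timestamp comes first, comparing raw bytes orders IDs by creation second, which is what "k-sortable" means here. The sketch below packs and unpacks that layout with the standard library only. `pack_xid`, `unpack_xid` and `XidFields` are hypothetical helpers for illustration, not part of pyxid's API.

```python
import struct
from typing import NamedTuple


class XidFields(NamedTuple):
    """Fields of a 12-byte xid (hypothetical helper type)."""

    timestamp: int  # seconds since the Unix epoch (32 bits)
    machine: bytes  # 3-byte machine identifier
    pid: int  # 16-bit process ID
    counter: int  # 24-bit counter


def pack_xid(timestamp: int, machine: bytes, pid: int, counter: int) -> bytes:
    """Pack the fields big-endian in rs/xid order; pid and counter are truncated."""
    if len(machine) != 3:
        raise ValueError("machine identifier must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp & 0xFFFFFFFF)  # bytes 0-3: timestamp
        + machine  # bytes 4-6: machine ID
        + struct.pack(">H", pid & 0xFFFF)  # bytes 7-8: process ID
        + (counter & 0xFFFFFF).to_bytes(3, "big")  # bytes 9-11: counter
    )


def unpack_xid(raw: bytes) -> XidFields:
    """Split 12 raw bytes back into their four fields."""
    if len(raw) != 12:
        raise ValueError("an xid is exactly 12 bytes")
    (timestamp,) = struct.unpack(">I", raw[0:4])
    (pid,) = struct.unpack(">H", raw[7:9])
    return XidFields(timestamp, raw[4:7], pid, int.from_bytes(raw[9:12], "big"))


# A later second sorts after an earlier one, whatever the other fields hold.
older = pack_xid(1_700_000_000, b"\xff\xff\xff", 0xFFFF, 0xFFFFFF)
newer = pack_xid(1_700_000_001, b"\x00\x00\x00", 0, 0)
assert older < newer
```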

Installation

Install from PyPI:

pip install pyxid

Install from source:

git clone https://github.com/thomastheyoung/pyxid.git
cd pyxid
pip install .

Optional extras (quote the brackets so shells such as zsh do not treat them as glob patterns):

pip install "pyxid[sqlalchemy]"  # include SQLAlchemy integration helpers
pip install "pyxid[dev]"         # include pytest and pytest-cov

Usage

import concurrent.futures
import json
import sqlite3

import pyxid

identifier = pyxid.new()
print(identifier)                  # base32 string
print(identifier.time())           # datetime
print(identifier.machine())        # machine bytes
print(identifier.pid())            # pid as 16-bit int
print(identifier.pid_bytes())      # pid bytes, stable width
print(identifier.counter())        # 24-bit counter
print(identifier.counter_bytes())  # counter bytes, stable width

# JSON helpers
payload = {"id": identifier, "nil": pyxid.NilID()}
json_text = json.dumps(payload, cls=pyxid.IDJSONEncoder)
restored = json.loads(json_text, object_hook=pyxid.id_object_hook())

# SQL helpers
value_for_db = pyxid.to_sql_value(identifier)
identifier_from_db = pyxid.from_sql_value(value_for_db)

# DB-API helpers (sqlite3 example)
pyxid.register_sqlite()  # registers adapters and converters
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE items (value XID)")
conn.execute("INSERT INTO items (value) VALUES (?)", (identifier,))
fetched = conn.execute("SELECT value FROM items").fetchone()[0]
assert fetched == identifier

# Multi-process generation (ProcessPoolExecutor example)
def generate_many(count: int) -> list[pyxid.ID]:
    return [pyxid.new() for _ in range(count)]

# The __main__ guard is required where workers are started with "spawn"
# (the default on Windows and macOS), because each worker re-imports this module.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as pool:
        batches = list(pool.map(generate_many, [1000] * 4))
    # 4 batches x 1000 IDs from separate processes: all strings must be distinct
    assert len({str(id_) for batch in batches for id_ in batch}) == 4000

Containers & Deployment

  • When no platform machine ID is available, pyxid hashes the hostname instead. Containerized processes often run as PID 1, so the machine bytes are the main field that distinguishes their IDs: in Docker/Kubernetes, ensure each container has a distinct hostname (Docker assigns one per container by default, and Kubernetes uses the pod name). Set XID_MACHINE_ID explicitly for full control.
  • In read-only or minimal container images, confirm that /etc/machine-id (or the platform equivalent) exists and is readable; otherwise pyxid falls back to hostname hashing.
  • For reproducible test runs, set XID_MACHINE_ID and TZ so results do not depend on the host environment.
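The fallback order described above (environment variable, then platform machine ID, then hostname) can be sketched as follows. rs/xid hashes the chosen source and keeps the first 3 bytes, falling back to random bytes when nothing is available; in this sketch, treating XID_MACHINE_ID as an opaque string and using SHA-256 as the hash are assumptions, and `machine_id_bytes` and `read_platform_id` are hypothetical helpers, not pyxid's internal code.

```python
import hashlib
import os
import socket
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

# Hypothetical sketch of an environment -> platform -> hostname fallback.
# Assumptions (not confirmed for pyxid): XID_MACHINE_ID is hashed as an opaque
# string, and SHA-256 truncated to 3 bytes derives the machine field.


def read_platform_id(path: str = "/etc/machine-id") -> Optional[str]:
    """Return the platform machine ID, or None if missing, unreadable, or empty."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value or None


def machine_id_bytes(
    env: Optional[Mapping[str, str]] = None,
    platform_id: Optional[str] = None,
    hostname: Optional[str] = None,
) -> bytes:
    """Hash the first available source (env, platform, hostname) down to 3 bytes."""
    env = os.environ if env is None else env
    source = env.get("XID_MACHINE_ID") or platform_id or hostname
    if not source:
        # Last resort: random bytes keep IDs unique, but the machine field
        # then changes on every process start.
        return os.urandom(3)
    return hashlib.sha256(source.encode("utf-8")).digest()[:3]


if __name__ == "__main__":
    machine = machine_id_bytes(
        platform_id=read_platform_id(), hostname=socket.gethostname()
    )
    print(machine.hex())  # 6 hex digits; stable as long as the source is stable
```

The sketch also shows why two containers that share a hostname and lack a machine ID end up with identical machine bytes: the hash input is the same.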

Feature Comparison

Capability Go rs/xid pyxid
12-byte, k-sortable identifiers ✅ (lexicographically sorted by timestamp, 1-second precision)
Custom counter seeded with PID / cpuset
Environment → platform → hostname fallback
JSON marshal/unmarshal helpers ✅ (MarshalJSON) ✅ (IDJSONEncoder, id_object_hook, json_default)
SQL database integration ✅ (Value, Scan) ✅ (to_sql_value, from_sql_value, register_sqlite, SQLAlchemyXID)
Formatting helpers ✅ (String) ✅ (__str__, __format__, raw/upper/lower specifiers)
PID / counter fixed-width accessors ✅ (pid_bytes, counter_bytes)

Performance Tuning Tips

  • Reuse worker processes (e.g., via multiprocessing pools) to amortize module import and machine-ID discovery costs.
  • Avoid expensive string conversions in tight loops; cache str(id_) if it is used repeatedly.
  • Set PYTHONHASHSEED (and seed random, if your harness uses it) when benchmarking to reduce run-to-run variance. These settings do not make the IDs themselves reproducible: pyxid seeds its counter from os.urandom.
  • For burst workloads, batch ID generation (e.g., [pyxid.new() for _ in range(1000)]) and perform downstream serialization asynchronously.
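A minimal sketch of the batching tips: generate each batch in a tight loop, convert every ID to a string exactly once, and hand the batch to a background thread for serialization. The `new_id` parameter stands in for `pyxid.new` so the sketch runs without pyxid installed; `placeholder_id`, `serialize_batch` and `generate_and_serialize` are hypothetical names.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def placeholder_id() -> str:
    """Stand-in for str(pyxid.new()): 20 random hex characters."""
    return os.urandom(10).hex()


def serialize_batch(ids: list[str]) -> str:
    """Downstream step (JSON encoding here) that runs off the generating thread."""
    return json.dumps({"ids": ids})


def generate_and_serialize(
    batches: int,
    batch_size: int,
    new_id: Callable[[], str] = placeholder_id,
) -> list[str]:
    """Generate IDs in tight batches and hand each batch to a worker thread."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = []
        for _ in range(batches):
            # Each ID becomes a string exactly once, inside the batch loop.
            batch = [new_id() for _ in range(batch_size)]
            futures.append(executor.submit(serialize_batch, batch))
        # Results come back in submission order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    payloads = generate_and_serialize(batches=3, batch_size=1000)
    print(len(payloads), "payloads")
```

With pyxid installed you would pass `new_id=lambda: str(pyxid.new())`. In CPython, the GIL means a thread pool mainly helps when the downstream step waits on I/O (sockets, files, queues); for CPU-bound serialization, a process pool is usually the better fit.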

Performance

Simple benchmark on Apple M1 / Python 3.12.4:

generated 500000 IDs in 0.538s -> 930,154 ids/sec

Script:

import time

import pyxid

N = 500_000
start = time.perf_counter()
for _ in range(N):
    pyxid.new()
elapsed = time.perf_counter() - start
print(f"generated {N} IDs in {elapsed:.3f}s -> {N / elapsed:,.0f} ids/sec")
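A single timing like the one above can swing noticeably from run to run. One way to steady it is to repeat the measurement and report both the best and the median rate. `measure_rate` is a hypothetical helper that accepts any zero-argument callable; with pyxid installed you would pass `pyxid.new`, while `os.urandom(12)` stands in here so the sketch runs on its own.

```python
import os
import statistics
import time
from typing import Callable


def measure_rate(
    make_id: Callable[[], object], n: int = 100_000, repeats: int = 5
) -> tuple[float, float]:
    """Time `repeats` runs of `n` calls; return (best, median) IDs per second."""
    rates = []
    for _run in range(repeats):
        start = time.perf_counter()
        for _ in range(n):
            make_id()
        elapsed = time.perf_counter() - start
        rates.append(n / elapsed)
    return max(rates), statistics.median(rates)


if __name__ == "__main__":
    # os.urandom(12) stands in for pyxid.new so the sketch runs on its own.
    best, median = measure_rate(lambda: os.urandom(12))
    print(f"best {best:,.0f} ids/sec, median {median:,.0f} ids/sec")
```

The best run approximates the cost without interference from other processes; the median shows what a typical run looks like.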

Compared to the Go implementation (go test -bench=. typically yields ~3-5M ids/sec on similar hardware), pyxid is slower—as expected for Python—but still well suited for most application workloads that need collision-free, sortable identifiers.

Encoding Notes

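pyxid encodes each ID as a 20-character string using the lowercase base32hex alphabet (0123456789abcdefghijklmnopqrstuv) with no padding, matching the Go reference implementation. This is the RFC 4648 base32hex alphabet in lowercase with the trailing padding omitted. Because its characters are in ascending ASCII order, sorting ID strings gives the same order as sorting the raw bytes. MongoDB ObjectIds, by contrast, are usually written as 24 hexadecimal characters.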

Benchmark Scripts

The benchmarks/ directory contains quick probes you can run locally:

python benchmarks/throughput.py --iterations 200000
python benchmarks/concurrency.py --workers 4 --per-worker 50000
python benchmarks/memory.py --count 100000

Use these as starting points for your own profiling; results vary by hardware and workload.

Migrating from Other ID Systems

  • UUIDv4: Replace uuid.uuid4().hex with str(pyxid.new()). Expect 20-character sortable strings instead of 32-character hex.
  • Snowflake: pyxid preserves chronological ordering; if you need to backfill historical timestamps use pyxid.new_with_time(dt).
  • MongoDB ObjectId: Both are 12 bytes, so you can convert raw bytes with pyxid.from_bytes(object_id.binary) if needed, but the layouts differ: only the leading 4-byte timestamp has the same meaning in both formats.
  • During phased rollouts, dual-write both old and new IDs, then backfill using pyxid.from_string or pyxid.from_bytes to normalize stored data.

Additional Notes

  • The package maintains 100% test coverage to ensure feature parity with the Go reference implementation.
  • JSON and SQL helpers are optional; import only what you need.
  • See tests/test_xid.py for thorough examples, including platform-specific machine ID fallbacks, JSON encoding, SQLAlchemy integration, and formatting.
  • Thread-safe: pyxid.new() uses a shared counter to guarantee uniqueness across threads within a process.
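  • Counter wraparound: the 24-bit counter has 16,777,216 values, so each process can generate up to 16,777,216 unique IDs per second. Beyond that the counter wraps, and IDs created within the same second can repeat (matching the Go implementation).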
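The last two notes both concern a 3-byte counter shared by every thread in the process. The sketch below shows such a counter, assuming a lock-protected integer seeded with a random value (rs/xid also seeds its counter randomly). `Counter24` and `next_value` are hypothetical names, not pyxid's implementation. With 2^24 = 16,777,216 values, the counter returns to its starting point after that many calls. Because the timestamp changes only once per second, a wrap within the same second would reproduce an earlier ID.

```python
import os
import threading
from typing import Optional


class Counter24:
    """Hypothetical thread-safe 24-bit counter with a random starting value."""

    MODULUS = 1 << 24  # 16,777,216 distinct values

    def __init__(self, start: Optional[int] = None) -> None:
        if start is None:
            # Random seed so restarted processes do not replay the same sequence.
            start = int.from_bytes(os.urandom(3), "big")
        self._value = start % self.MODULUS
        self._lock = threading.Lock()

    def next_value(self) -> int:
        # The lock makes increment-and-wrap atomic across threads.
        with self._lock:
            self._value = (self._value + 1) % self.MODULUS
            return self._value


counter = Counter24(start=Counter24.MODULUS - 1)
print(counter.next_value())  # 0: wrapped after 0xFFFFFF
```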