Status: Draft v0.1
Target runtime: .NET 8 LTS (Windows/Linux/macOS, x64 & ARM64)
Scope: Phase A (dBASE III+/IV DBF + DBT, NTX/MDX) with writable EF/ADO.NET; Phase B (FoxPro 2.x DBF/FPT, CDX)
The framework is a modular, layered system that separates binary storage concerns from provider interfaces and high-level integrations:
Applications / Tools
▲
│ EF Core Provider (XBase.EFCore)
│ ADO.NET Provider (XBase.Data)
│
Abstractions (XBase.Abstractions) ← public SPIs & contracts
│
Core Engine (XBase.Core) ← DBF/DBT, NTX/MDX, journaling, locking, codepages, cursors
│
Diagnostics & Expressions (XBase.Diagnostics / XBase.Expressions)
│
File System & OS (memory-mapped I/O, file locks)
Key principles:
- Portability-first: No mandatory native dependencies. MMAP and file locks via .NET APIs with fallbacks.
- Safety over cleverness: Crash-safe journaling; single-writer default; deterministic error contracts.
- Pushdown where possible: Filters/order executed as close to storage as possible.
- Composable providers: ADO.NET and EF Core layers are thin adapters over the same cursor APIs.
- Binary formats: DBF (dBASE III+/IV), DBT (memo), NTX/MDX (B-tree indexes).
- Record model: fixed-length rows, deleted-flag handling, optional null bitmap (if present), RECNO addressing.
- Memo management: block size detection, chain following, slack space reuse, safe growth.
- Index engine: Node/page cache, key comparers, collations, tag metadata, rebuild/reindex.
- Transactions: user-space WAL/journal (`.trx`), atomic commits, recovery on open.
- Locking: OS file locks; optional `.lck` sidecar; record-level locks with coarse-grained file lock guard.
- Codepages: LDID map; heuristics when LDID is missing; override policies; transcoding utilities.
- Cursors: sequential & indexed scans, predicate/order pushdown, pagination.
- Contracts: `ITable`, `IIndex`, `ICursor`, `ISchemaProvider`, `ITransaction`, `IJournal`, `ILocker`, `IValueEncoder`, `IPageCache`.
- Schema evolution primitives: `SchemaVersion`, `SchemaOperation`, `SchemaLogEntry`, `SchemaBackfillTask`, `ISchemaMutator`.
- SPI for adding future format modules (Phase B CDX/FPT, future VFP module).
- Implements `DbConnection`, `DbCommand`, `DbDataReader`, `DbTransaction`, `DbParameter`.
- SQL-subset parser → query plan builder (index selection + scan) and DDL interpreter for `CREATE/DROP TABLE`, `ALTER TABLE`, `CREATE/DROP INDEX` routed through `ISchemaMutator`.
- Connection string policy parsing (journal/locking/codepage/cache/deleted-flag).
- Provider glue: `UseXBase(...)` extension, `TypeMapping`, `MemberTranslator`, `QuerySqlGenerator`-equivalent pipeline for pushdown.
- Keys: synthetic RECNO if none present; concurrency token via record checksum.
- Write model: I/U/D delegates to Core with journaling; migrations via copy-rebuild.
- dBASE/Clipper/FoxPro expression subset used for index keys & predicate evaluation (Phase A subset; extended in Phase B).
- Pluggable evaluators & function registry (e.g., `UPPER`, `TRIM`, arithmetic, date ops; collation-aware comparisons).
- Structured logging, event counters (cache hits/misses, bytes read/written, journal commits, lock waits).
- Validators: header/index consistency checkers.
- CLI utilities: `dbfinfo`, `dbfdump`, `dbfreindex`, `dbfpack`, `dbfconvert`, plus online DDL helpers `ddl apply/checkpoint/pack`.
- Open: `XBaseConnection` resolves directory, discovers table/index/memo files, loads headers lazily.
- Plan: ADO.NET/EF build a logical plan → Core Planner selects index/tag or sequential scan; builds a Cursor.
- Execute: Cursor yields records; Predicate pushdown applies comparisons before materialization; Order pushdown uses index order when possible.
- Write: EF/ADO.NET writes via Core Mutator → journal append → on commit: fsync journal, apply changes, fsync data, rotate journal.
- Recovery: On open, any non-empty journal is replayed (redo/undo depending on last consistent marker).
- Header parsing: version, date, header size, record size, field descriptors, LDID.
- `DbfTableLoader` surfaces parsed metadata as `DbfTableDescriptor` (implementing `ITableDescriptor`) for reuse across Core/Data tools.
- `TableCatalog` discovers `.dbf` files in a directory, hydrates memo/index sidecars, and feeds higher-level providers.
- Record access: fixed-offset views; deleted flag; optional null bitmap (later variants); RECNO = 1-based index.
- Type mapping: `C/N/F/D/L/M` → .NET types (`string`/`decimal`/`double`/`DateOnly`/`bool`, and `Stream` or `string` for memo).
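The header fields listed above sit at fixed offsets in the classic dBASE III+ layout, so parsing them is mostly a matter of reading little-endian integers. The following is a minimal sketch in Python (chosen only for brevity; the framework itself targets .NET). The function name `parse_dbf_header` and the returned dictionary shape are illustrative assumptions, not part of `DbfTableLoader`.

```python
import struct


def parse_dbf_header(buf: bytes) -> dict:
    """Parse the 32-byte DBF header and the field descriptors that follow it."""
    if len(buf) < 32:
        raise ValueError("buffer is shorter than the 32-byte DBF header")
    # Byte 0: version; bytes 1-3: last update (YY MM DD); then record count,
    # header size and record size, all little-endian.
    version, yy, mm, dd, record_count, header_size, record_size = struct.unpack_from(
        "<4BIHH", buf, 0
    )
    fields = []
    offset = 32
    # Each descriptor is 32 bytes; the array ends at a 0x0D terminator byte.
    while offset < len(buf) and buf[offset] != 0x0D:
        if offset + 32 > min(len(buf), header_size):
            raise ValueError("field descriptor runs past the header")
        raw_name = buf[offset:offset + 11].split(b"\x00", 1)[0]
        fields.append({
            "name": raw_name.decode("ascii"),
            "type": chr(buf[offset + 11]),      # C/N/F/D/L/M
            "length": buf[offset + 16],
            "decimals": buf[offset + 17],
        })
        offset += 32
    return {
        "version": version,
        "last_update": (1900 + yy, mm, dd),     # year is stored as an offset from 1900
        "record_count": record_count,
        "header_size": header_size,
        "record_size": record_size,
        "ldid": buf[29],                        # language driver ID
        "fields": fields,
    }
```

The bounds check inside the loop mirrors the bounds-checked parsing goal: a descriptor that would extend past the declared header size is rejected rather than read.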
- Block size from header; first block directory; fragmented chains.
- Writes allocate new blocks; compaction optional; safe growth with fsync checkpoints.
- B-tree with fixed fan-out; key serialization; page cache; tag metadata.
- Key expressions (subset) compiled to evaluator delegates; deterministic collation.
- Reindex builds new index side-by-side and atomically swaps.
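The side-by-side reindex can be pictured as "write to a temp file in the same directory, flush, then rename over the old index". A minimal Python sketch of that pattern follows; `swap_in_rebuilt_index`, `build`, and `validate` are hypothetical names, and the atomicity of the final rename depends on the platform and on both paths living on the same volume.

```python
import os
import tempfile


def swap_in_rebuilt_index(index_path, build, validate=None):
    """Build a new index next to the old one, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(index_path))
    # The temp file lives in the same directory so the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            build(tmp)                  # caller writes the rebuilt index pages
            tmp.flush()
            os.fsync(tmp.fileno())      # make the new file durable before the swap
        if validate is not None:
            validate(tmp_path)          # may raise to abort the swap
        os.replace(tmp_path, index_path)
    except BaseException:
        # Any failure leaves the old index untouched and removes the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If `build` or `validate` raises, readers keep seeing the previous index file, which is the property the swap is meant to provide.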
- FoxPro 2.x FPT/CDX introduce compound indexes and richer collation; isolated module keeps Core stable.
- File envelope: 16-byte header (`"XBASEJNL"`, version = 1, reserved padding) precedes an append-only stream of entries. `WalJournal` verifies the header on open and refuses new transactions when pending entries exist.
- Entry layout: each record stores `{length:int32, checksum:uint32}` followed by payload:
  - `byte entryType` (`Begin|Mutation|Commit|Rollback`)
  - `int64 transactionId` (monotonic, per process)
  - `int64 timestamp` (UTC ticks)
  - `[mutation payload]`

  Mutation payload embeds the table name (UTF-8 + length prefix), record number, mutation kind (`Insert|Update|Delete`), and serialized before/after images (length-prefixed byte blobs).
- Durability controls: writes are issued with `FileOptions.WriteThrough` and optional double flush (`FlushAsync` + `Flush(true)`) to satisfy `FR‑WT‑3`. Options allow deferring journal truncation (`AutoResetOnCommit = false`) for diagnostics.
- Commit protocol: `Begin` entry → mutation batch → `Commit` entry (fsync) → core applies DBF/DBT/index updates → data fsync → journal reset (truncate + header rewrite). `Rollback` records trigger the same reset semantics after flushing the rollback marker.
- Crash recovery: `WalJournal.RecoverAsync` parses the stream, producing
  - `CommittedTransactions` (redo set: after images applied in order),
  - `IncompleteTransactions` (undo set: before images for uncommitted or rollback-tagged txns), and
  - diagnostic flags for checksum mismatches or truncated tails.

  Recovery must run before a new transaction begins; `BeginAsync` enforces this by checking for residual entries.
- Atomicity: once data files reconcile the redo/undo plan, the journal is reset, matching the rename/swap guarantees used by DDL flows.
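To make the entry framing and the recovery classification concrete, here is a small Python sketch. Several details are assumptions, since the text does not pin them down: CRC-32 as the checksum, a 4-byte version followed by 4 bytes of padding in the header, and the numeric codes for the entry types. `encode_entry` and `recover` are illustrative names, not the `WalJournal` API.

```python
import struct
import zlib

MAGIC = b"XBASEJNL"
# Assumed split of the 16-byte envelope: 8-byte magic, 4-byte version, 4 bytes padding.
HEADER = MAGIC + struct.pack("<I", 1) + bytes(4)
BEGIN, MUTATION, COMMIT, ROLLBACK = 1, 2, 3, 4   # assumed numeric codes
PREFIX = struct.Struct("<iI")                     # {length:int32, checksum:uint32}
FIXED = struct.Struct("<Bqq")                     # entryType, transactionId, timestamp


def encode_entry(entry_type, txn_id, timestamp, body=b""):
    """Frame one journal entry: length + checksum prefix, then the payload."""
    payload = FIXED.pack(entry_type, txn_id, timestamp) + body
    return PREFIX.pack(len(payload), zlib.crc32(payload)) + payload


def recover(stream: bytes):
    """Return (committed_ids, incomplete_ids, damaged) for a journal image."""
    if len(stream) < len(HEADER) or stream[:len(MAGIC)] != MAGIC:
        raise ValueError("not an XBASEJNL journal")
    pos, state, damaged = len(HEADER), {}, False
    while pos < len(stream):
        if pos + PREFIX.size > len(stream):
            damaged = True                        # truncated prefix
            break
        length, checksum = PREFIX.unpack_from(stream, pos)
        payload = stream[pos + PREFIX.size: pos + PREFIX.size + max(length, 0)]
        if length < FIXED.size or len(payload) != length or zlib.crc32(payload) != checksum:
            damaged = True                        # truncated tail or checksum mismatch
            break
        entry_type, txn_id, _timestamp = FIXED.unpack_from(payload, 0)
        if entry_type in (COMMIT, ROLLBACK):
            state[txn_id] = entry_type
        else:
            state.setdefault(txn_id, BEGIN)       # Begin/Mutation: still open
        pos += PREFIX.size + length
    committed = [t for t, s in state.items() if s == COMMIT]
    incomplete = [t for t, s in state.items() if s != COMMIT]
    return committed, incomplete, damaged
```

Parsing stops at the first damaged entry, so everything after a torn write is treated as never having been journaled; transactions without a `Commit` entry (including rolled-back ones) land in the undo set.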
- Default single-writer, multi-reader. Optional record-level locks for long updates.
- Readers respect file locks; writers acquire exclusive table locks during commit window.
- EF `SaveChanges` wraps mutations in a transaction; optimistic concurrency via record checksum column.
- `FileLockManager` coordinates OS locks through `.lck` sidecars adjacent to DBF/DBT files. Shared locks open the sidecar read-only (`FileShare.Read`), allowing multiple readers, while exclusive locks request `FileShare.None` and block until readers drain.
- Record-level locks (when enabled) reuse the same sidecar with byte-range locking (one byte per record), so writers on different records interleave without contending on the global file mutex.
- `FileLockManagerOptions` tune retry cadence (`RetryDelay`), acquisition timeout, lock directory overrides, and the span used for record offsets (default `1` byte).
- `LockingMode` toggles `None | File | Record`, with record mode layering per-record locks on top of the file primitive.
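Two pieces of the locking description are easy to show without touching OS lock APIs: how a record number maps to a byte range in the sidecar, and how a retry delay and timeout bound an acquisition attempt. The Python sketch below assumes record `n` maps to offset `(n - 1) * span`; that mapping, and the names `record_lock_range` and `acquire_with_retry`, are illustrative and not taken from `FileLockManager`.

```python
import time


def record_lock_range(recno, span=1):
    """Map a 1-based record number to (offset, length) in the lock sidecar."""
    if recno < 1:
        raise ValueError("RECNO is 1-based")
    return (recno - 1) * span, span


def acquire_with_retry(try_lock, retry_delay=0.05, timeout=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call try_lock() until it succeeds or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if try_lock():
            return True
        if clock() >= deadline:
            return False            # caller would surface a lock-timeout error
        sleep(retry_delay)
```

Because each record owns a disjoint byte range, two writers locking records 3 and 7 request non-overlapping ranges and do not block each other.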
- LDID-first detection; if missing: heuristics (invalid byte ratios, Hungarian accent profile, fallback CP852).
- Collation strategies: binary (default) + optional locale-aware uppercasing for comparisons; documented and consistent.
- Tooling: `dbfconvert` supports transcoding to UTF‑8 copies for migration workflows.
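The detection order (override, then LDID, then heuristics, then the CP852 fallback) can be sketched as below. This is an illustrative Python approximation: the LDID table is deliberately partial, the candidate list and the scoring rule (Hungarian accented letters minus undecodable bytes) are assumptions about what such a heuristic might look like, and `resolve_codec` is a hypothetical name.

```python
# Partial LDID map; only a few well-known language driver IDs are listed.
LDID_TO_CODEC = {0x01: "cp437", 0x02: "cp850", 0x03: "cp1252",
                 0x64: "cp852", 0xC8: "cp1250"}
HUNGARIAN_ACCENTS = set("áéíóöőúüűÁÉÍÓÖŐÚÜŰ")


def score(sample: bytes, codec: str) -> int:
    """Accent hits minus undecodable bytes when the sample is read as `codec`."""
    text = sample.decode(codec, errors="replace")
    hits = sum(ch in HUNGARIAN_ACCENTS for ch in text)
    return hits - text.count("\ufffd")


def resolve_codec(ldid, sample=b"", override=None,
                  candidates=("cp852", "cp1250"), fallback="cp852"):
    if override:                       # explicit policy wins
        return override
    if ldid in LDID_TO_CODEC:          # LDID-first detection
        return LDID_TO_CODEC[ldid]
    best = max(candidates, key=lambda codec: score(sample, codec))
    # No positive evidence in the sample: use the documented fallback.
    return best if score(sample, best) > 0 else fallback
```

For example, memo text written as CP1250 decodes into far more Hungarian accented letters under `cp1250` than under `cp852`, so the heuristic picks `cp1250` when the LDID byte is zero.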
- Supported pushdown: equality/inequalities on `C/N/F/D/L`, `BETWEEN`, prefix `LIKE`, small `IN` lists.
- Non-pushdown expressions evaluated client-side (EF) with logged warnings.
- If an index prefix matches the `ORDER BY`, use index order; otherwise stable in-memory sort with spill-to-temp when needed.
- Pagination via `LIMIT/OFFSET` materialized on cursor.
- Literals, arithmetic, logical ops, string funcs (`SUBSTR`, `LEFT`, `RIGHT`, `UPPER`, `TRIM`), date diff/add (limited).
- Deterministic serialization for index keys.
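A function registry plus a tiny tree-walking evaluator is one way to picture how index key expressions could be evaluated without generating code. The Python sketch below assumes expressions arrive already parsed as nested tuples; the tuple shapes, `FUNCTIONS`, `evaluate`, and `serialize_key` are all illustrative names. `SUBSTR` is 1-based and `TRIM` strips trailing blanks, following dBASE conventions.

```python
# Registry of supported functions; anything else is rejected at evaluation time.
FUNCTIONS = {
    "UPPER": lambda s: s.upper(),
    "TRIM": lambda s: s.rstrip(" "),
    "LEFT": lambda s, n: s[:n],
    "RIGHT": lambda s, n: s[-n:] if n > 0 else "",
    "SUBSTR": lambda s, start, n=None: (
        s[start - 1:] if n is None else s[start - 1:start - 1 + n]),
}


def evaluate(expr, record):
    """Evaluate ('lit', v) | ('field', name) | ('call', fn, *args) against a record."""
    kind = expr[0]
    if kind == "lit":
        return expr[1]
    if kind == "field":
        return record[expr[1]]
    if kind == "call":
        if expr[1] not in FUNCTIONS:
            raise ValueError(f"unsupported function: {expr[1]}")
        args = [evaluate(arg, record) for arg in expr[2:]]
        return FUNCTIONS[expr[1]](*args)
    raise ValueError(f"unsupported expression node: {kind}")


def serialize_key(value: str, width: int, codec: str = "cp852") -> bytes:
    """Fixed-width, space-padded key bytes so equal inputs always serialize equally."""
    return value[:width].ljust(width).encode(codec)
```

Usage: `evaluate(("call", "UPPER", ("call", "LEFT", ("field", "NAME"), ("lit", 3))), {"NAME": "paprika"})` yields `"PAP"`, which `serialize_key` pads to the tag's key width.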
- Connection: `xbase://path=<dir>;readonly=<bool>;journal=<on|off>;locking=<file|record|none>;codepage=<auto|cp852|...>;deleted=<hide|show>;cacheSize=<int>`.
- Command: SQL-subset → logical plan; parameters `@p` mapped to DBF types.
- Reader: column ordinals map to field descriptors; memo streams on demand.
- Transaction: wraps journal begin/commit/rollback.
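As an illustration of the connection string policy parsing, the Python sketch below splits the `xbase://` string into typed options and validates the enumerated values. The default values shown are assumptions made for the example (the text only states the accepted values), and `parse_connection_string` is a hypothetical name.

```python
def parse_connection_string(text):
    """Parse 'xbase://key=value;...' into a dict of typed options."""
    prefix = "xbase://"
    if not text.startswith(prefix):
        raise ValueError("expected an xbase:// connection string")
    # Assumed defaults; the real provider may choose differently.
    options = {"readonly": False, "journal": "on", "locking": "file",
               "codepage": "auto", "deleted": "hide", "cacheSize": None}
    allowed = {"journal": {"on", "off"},
               "locking": {"file", "record", "none"},
               "deleted": {"hide", "show"}}
    canonical = {name.lower(): name for name in list(options) + ["path"]}
    for part in (p.strip() for p in text[len(prefix):].split(";")):
        if not part:
            continue
        key, sep, value = part.partition("=")
        name = canonical.get(key.strip().lower())
        if not sep or name is None:
            raise ValueError(f"unrecognized option: {part!r}")
        value = value.strip()
        if name == "readonly":
            if value.lower() not in ("true", "false"):
                raise ValueError("readonly must be true or false")
            options[name] = value.lower() == "true"
        elif name == "cacheSize":
            options[name] = int(value)
        elif name in allowed:
            if value.lower() not in allowed[name]:
                raise ValueError(f"invalid value for {name}: {value!r}")
            options[name] = value.lower()
        else:
            options[name] = value          # path and codepage are kept verbatim
    if "path" not in options:
        raise ValueError("path is required")
    return options
```

Rejecting unknown keys and out-of-range values up front keeps misconfiguration from silently falling back to defaults.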
- Model building: conventions map DBF fields to .NET; RECNO as shadow key if needed.
- Query translation: attempt pushdown; unsupported features fall back to client evaluation with warnings; joins default to client-eval.
- Change tracking: serialize-before/after for checksum; concurrency token compares at save time.
- Migrations: copy-rebuild pattern with progress callbacks.
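The checksum-based concurrency token amounts to "remember a digest of the record as read, and refuse the write if the stored bytes no longer match it". A minimal Python sketch follows, assuming CRC-32 as the checksum (the text does not name an algorithm) and using an in-memory dict in place of a table; `record_token`, `update_record`, and `ConcurrencyConflict` are illustrative names.

```python
import zlib


class ConcurrencyConflict(Exception):
    """Raised when a record changed between read and save."""


def record_token(raw: bytes) -> int:
    # Assumed checksum; any stable digest of the raw record bytes would do.
    return zlib.crc32(raw)


def update_record(table: dict, recno: int, original_token: int, new_raw: bytes) -> int:
    """Write new_raw only if the stored record still matches the token read earlier."""
    if record_token(table[recno]) != original_token:
        raise ConcurrencyConflict(f"record {recno} was modified by another writer")
    table[recno] = new_raw
    return record_token(new_raw)       # token for the next save
```

In the real write path the comparison and the write would need to happen under the writer's lock, so that no other change slips in between the check and the update.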
- In-Place Online DDL (IPOD) maintains table availability for readers and the single-writer pipeline while schema mutations append to a schema-delta `.ddl` log handled by `SchemaLog` and `SchemaMutator` inside `XBase.Core`.
- Versioned projections let cursors materialize records against the target schema version, with adapters to reshape legacy rows until they are backfilled via `SchemaBackfillQueue`.
- Lazy backfill occurs opportunistically during writes and via background workers, respecting throttles and journaling checkpoints.
- Atomic checkpoints consolidate applied deltas into refreshed DBF headers and regenerate catalog metadata while holding only a short exclusive DDL lock. `SchemaMutator.CreateCheckpointAsync` manages log compaction and backfill cleanup.
- Short exclusive DDL locks protect header rewrites and index swap windows; standard reads/writes use shared locks plus version gates.
- Side-by-side index swaps rebuild NTX/MDX artifacts under temp names and atomically rename once validated.
- Provider integration: ADO.NET exposes `CREATE/DROP INDEX` and `ALTER TABLE` verbs; EF Core migrations target the same API and track schema version numbers to coordinate with the `.ddl` log.
- Tooling: `xbase ddl apply`, `xbase ddl checkpoint`, and `xbase ddl pack` orchestrate delta ingestion, compaction, and optional vacuum aligned with schema state, with dry-run support.
- Recovery order: Data journal replay runs first, followed by `.ddl` log replay to restore schema projections and resume pending backfill tasks.
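A versioned projection can be thought of as replaying, for each legacy row, only the schema operations logged after the version that row was written under. The Python sketch below illustrates this with three assumed operation kinds (`add_column`, `drop_column`, `rename_column`); the log entry shape and the name `project_row` are hypothetical and do not describe the actual `SchemaLogEntry` format.

```python
def project_row(row: dict, row_version: int, target_version: int, log: list) -> dict:
    """Reshape a row stored at row_version into the shape of target_version.

    `log` is a list of (version, op, args) tuples ordered by version.
    """
    out = dict(row)                                   # never mutate the stored row
    for version, op, args in log:
        if not (row_version < version <= target_version):
            continue                                  # already applied, or beyond target
        if op == "add_column":
            out.setdefault(args["name"], args.get("default"))
        elif op == "drop_column":
            out.pop(args["name"], None)
        elif op == "rename_column":
            out[args["to"]] = out.pop(args["from"])
        else:
            raise ValueError(f"unknown schema operation: {op}")
    return out
```

A backfill task would persist the projected row and bump its stored version, after which the adapter becomes a no-op for that record.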
- I/O: Memory-mapped reads where safe; buffered writes with durable fsync boundaries.
- Caches: Record page cache and index node cache (LRU); configurable sizes.
- Batching: Grouped writes; deferred index updates per transaction.
- Metrics: event counters for throughput/latency; optional simple flame traces.
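An LRU page cache with hit/miss counters is small enough to sketch in full. The Python version below is illustrative only: `PageCache` and its `load` callback are hypothetical names, and the counters stand in for the event counters mentioned above.

```python
from collections import OrderedDict


class PageCache:
    """Least-recently-used cache keyed by page number, with hit/miss counters."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hits = 0
        self.misses = 0
        self._pages = OrderedDict()

    def get(self, page_no: int, load):
        if page_no in self._pages:
            self.hits += 1
            self._pages.move_to_end(page_no)      # mark as most recently used
            return self._pages[page_no]
        self.misses += 1
        page = load(page_no)                      # read from disk on a miss
        self._pages[page_no] = page
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)       # evict the least recently used page
        return page
```

The same structure can serve both the record page cache and the index node cache, each with its own configured capacity.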
- Typed exceptions: `XBaseFileFormatException`, `XBaseCodepageException`, `XBaseLockException`, `XBaseTransactionException`.
- Messages include file, table, recno, tag, and remediation hints.
- `Explain` endpoint (debug only) prints query plan and pushdown decisions.
- Bounds-checked parsing; limit record and memo sizes; guard against path traversal in index/memo resolution.
- No dynamic codegen from untrusted expressions; expression subset is compiled with a safe interpreter/JIT.
- Fixture-based tests on real DBF/DBT/NTX/MDX.
- Property-based invariants: read→write→read equivalence; reindex equivalence; crash simulations for journal recovery.
- CI matrix: Windows/Linux/macOS, x64/ARM64.
- SPI in `XBase.Abstractions` for new formats (CDX/FPT, future VFP).
- Pluggable collations and codepage providers.
- Expression function registry and custom translators.
- NuGet packages per module (`XBase.Core`, `XBase.Data`, `XBase.EFCore`, `XBase.Tools`).
- Signed packages, SemVer, source link, symbols.
- Minimal runtime dependencies; enable trimming where possible.
- Precise null semantics per variant (documented matrix).
- Record-level locks cross-platform consistency (advisory vs mandatory).
- Large file limits and safe behavior above 2–4 GB boundaries for legacy tools.
Example: `SELECT Name FROM Products WHERE Name LIKE 'P%' ORDER BY Name LIMIT 100`
- Plan: Use `Name` ascending index (if exists) → prefix `LIKE` pushdown → scan until `'Q'` boundary → return first 100.
Example: `WHERE Price BETWEEN 100 AND 200`
- Plan: Use `Price` index tag range scan; if absent, sequential scan with predicate pushdown.
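Both example plans reduce to a range scan over sorted index entries. The Python sketch below models an index as a sorted list of `(key, recno)` pairs and assumes binary collation, under which the upper bound of a prefix is found by incrementing its last character (`'P'` → `'Q'`). The names `range_scan`, `like_prefix`, and `between` are illustrative.

```python
import bisect


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string greater than every string starting with `prefix` (binary order)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_scan(entries, low, high, include_high=True, limit=None):
    """Return recnos for keys in [low, high] (or [low, high) if include_high is False)."""
    keys = [key for key, _ in entries]
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high) if include_high else bisect.bisect_left(keys, high)
    recnos = [recno for _, recno in entries[start:stop]]
    return recnos if limit is None else recnos[:limit]


def like_prefix(entries, prefix, limit=None):
    # LIKE 'P%' becomes the half-open key range ['P', 'Q'), already in index order.
    return range_scan(entries, prefix, prefix_upper_bound(prefix),
                      include_high=False, limit=limit)


def between(rows, field, low, high, index=None):
    """BETWEEN via index range scan, or a sequential scan when no index exists."""
    if index is not None:
        return range_scan(index, low, high)
    return [recno for recno, rec in rows.items() if low <= rec[field] <= high]
```

Because the entries are already ordered by key, `ORDER BY Name LIMIT 100` needs no separate sort: the scan simply stops after the first 100 matches.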
End of ARCHITECTURE.md