xBase for .NET — Architecture (ARCHITECTURE.md)

Status: Draft v0.1
Target runtime: .NET 8 LTS (Windows/Linux/macOS, x64 & ARM64)
Scope: Phase A (dBASE III+/IV DBF + DBT, NTX/MDX) with writable EF/ADO.NET; Phase B (FoxPro 2.x DBF/FPT, CDX)


1. Architectural Overview

The framework is a modular, layered system that separates binary storage concerns from provider interfaces and high-level integrations:

Applications / Tools
      ▲
      │         EF Core Provider (XBase.EFCore)
      │         ADO.NET Provider (XBase.Data)
      │
Abstractions (XBase.Abstractions)  ← public SPIs & contracts
      │
Core Engine (XBase.Core)  ← DBF/DBT, NTX/MDX, journaling, locking, codepages, cursors
      │
Diagnostics & Expressions (XBase.Diagnostics / XBase.Expressions)
      │
File System & OS (memory-mapped I/O, file locks)

Key principles:

  • Portability-first: No mandatory native dependencies. MMAP and file locks via .NET APIs with fallbacks.
  • Safety over cleverness: Crash-safe journaling; single-writer default; deterministic error contracts.
  • Pushdown where possible: Filters/order executed as close to storage as possible.
  • Composable providers: ADO.NET and EF Core layers are thin adapters over the same cursor APIs.

2. Projects & Responsibilities

2.1 XBase.Core

  • Binary formats: DBF (dBASE III+/IV), DBT (memo), NTX/MDX (B-tree indexes).
  • Record model: fixed-length rows, deleted-flag handling, optional null bitmap (if present), RECNO addressing.
  • Memo management: block size detection, chain following, slack space reuse, safe growth.
  • Index engine: Node/page cache, key comparers, collations, tag metadata, rebuild/reindex.
  • Transactions: user-space WAL/journal (.trx), atomic commits, recovery on open.
  • Locking: OS file locks; optional .lck sidecar; record-level locks with coarse-grained file lock guard.
  • Codepages: LDID map; heuristics when LDID is missing; override policies; transcoding utilities.
  • Cursors: sequential & indexed scans, predicate/order pushdown, pagination.

2.2 XBase.Abstractions

  • Contracts: ITable, IIndex, ICursor, ISchemaProvider, ITransaction, IJournal, ILocker, IValueEncoder, IPageCache.
  • Schema evolution primitives: SchemaVersion, SchemaOperation, SchemaLogEntry, SchemaBackfillTask, ISchemaMutator.
  • SPI for adding future format modules (Phase B CDX/FPT, future VFP module).

2.3 XBase.Data (ADO.NET)

  • Implements DbConnection, DbCommand, DbDataReader, DbTransaction, DbParameter.
  • SQL-subset parser → query plan builder (index selection + scan) and DDL interpreter for CREATE/DROP TABLE, ALTER TABLE, CREATE/DROP INDEX routed through ISchemaMutator.
  • Connection string policy parsing (journal/locking/codepage/cache/deleted-flag).

2.4 XBase.EFCore

  • Provider glue: UseXBase(...) extension, TypeMapping, MemberTranslator, QuerySqlGenerator-equivalent pipeline for pushdown.
  • Keys: synthetic RECNO if none present; concurrency token via record checksum.
  • Write model: I/U/D delegates to Core with journaling; migrations via copy-rebuild.

2.5 XBase.Expressions

  • dBASE/Clipper/FoxPro expression subset used for index keys & predicate evaluation (Phase A subset; extended in Phase B).
  • Pluggable evaluators & function registry (e.g., UPPER, TRIM, arithmetic, date ops; collation-aware comparisons).

2.6 XBase.Diagnostics

  • Structured logging, event counters (cache hits/misses, bytes read/written, journal commits, lock waits).
  • Validators: header/index consistency checkers.

2.7 XBase.Tools

  • CLI utilities: dbfinfo, dbfdump, dbfreindex, dbfpack, dbfconvert, plus online DDL helpers ddl apply/checkpoint/pack.

3. Data Flow & Execution Model

  1. Open: XBaseConnection resolves directory, discovers table/index/memo files, loads headers lazily.
  2. Plan: ADO.NET/EF build a logical plan → Core Planner selects index/tag or sequential scan; builds a Cursor.
  3. Execute: Cursor yields records; Predicate pushdown applies comparisons before materialization; Order pushdown uses index order when possible.
  4. Write: EF/ADO.NET writes via Core Mutator → journal append → on commit: fsync journal, apply changes, fsync data, reset journal (truncate + header rewrite).
  5. Recovery: On open, any non-empty journal is replayed (redo/undo depending on last consistent marker).

4. File Format Handling

4.1 DBF

  • Header parsing: version, date, header size, record size, field descriptors, LDID.
  • DbfTableLoader surfaces parsed metadata as DbfTableDescriptor (implementing ITableDescriptor) for reuse across Core/Data tools.
  • TableCatalog discovers .dbf files in a directory, hydrates memo/index sidecars, and feeds higher-level providers.
  • Record access: fixed-offset views; deleted flag; optional null bitmap (later variants); RECNO = 1-based index.
  • Type mapping: C/N/F/D/L/M → .NET types (string/decimal/double/DateOnly/bool/Stream or string for memo).
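
As a rough illustration of the header parsing step, the sketch below reads the 32-byte DBF prologue and the field descriptor array using the commonly documented dBASE III+ layout. It is written in Python for brevity and is not the project's DbfTableLoader; the names DbfHeader, FieldDescriptor, and parse_dbf_header are hypothetical.

```python
import struct
from dataclasses import dataclass
from datetime import date


@dataclass
class FieldDescriptor:
    name: str
    type: str       # one of C, N, F, D, L, M
    length: int
    decimals: int


@dataclass
class DbfHeader:
    version: int
    last_update: date
    record_count: int
    header_size: int
    record_size: int
    ldid: int
    fields: list


def parse_dbf_header(buf: bytes) -> DbfHeader:
    """Parse the fixed prologue plus field descriptors, with bounds checks."""
    if len(buf) < 32:
        raise ValueError("DBF header shorter than 32 bytes")
    # Bytes 0-11: version, YY MM DD, record count, header size, record size.
    version, yy, mm, dd, count, header_size, record_size = struct.unpack_from(
        "<4BIHH", buf, 0)
    ldid = buf[29]  # language driver ID
    if header_size > len(buf):
        raise ValueError("header size exceeds available bytes")
    fields = []
    offset = 32
    # Descriptors are 32 bytes each and end at the 0x0D terminator.
    while offset < header_size and buf[offset] != 0x0D:
        if offset + 32 > header_size:
            raise ValueError("truncated field descriptor")
        raw_name, ftype, length, decimals = struct.unpack_from(
            "<11sc4xBB", buf, offset)
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append(
            FieldDescriptor(name, ftype.decode("ascii"), length, decimals))
        offset += 32
    return DbfHeader(version, date(1900 + yy, mm, dd), count,
                     header_size, record_size, ldid, fields)
```

The year byte is stored as an offset from 1900, and all multi-byte integers are little-endian. Later variants add fields this sketch ignores.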

4.2 Memo (DBT)

  • Block size from header; first block directory; fragmented chains.
  • Writes allocate new blocks; compaction optional; safe growth with fsync checkpoints.

4.3 Indexes (NTX/MDX)

  • B-tree with fixed fan-out; key serialization; page cache; tag metadata.
  • Key expressions (subset) compiled to evaluator delegates; deterministic collation.
  • Reindex builds new index side-by-side and atomically swaps.
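
The side-by-side rebuild with an atomic swap can be sketched as follows. This is a minimal Python illustration, assuming the replacement index is written by a caller-supplied `build` callback; `swap_in_rebuilt_index` is a hypothetical name, not a Core API.

```python
import os
import tempfile


def swap_in_rebuilt_index(index_path: str, build) -> None:
    """Build a new index next to the old one, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(index_path))
    # The temp file lives in the same directory so the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(prefix=".reindex-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            build(tmp)               # caller writes the complete new index
            tmp.flush()
            os.fsync(tmp.fileno())   # make the new file durable before the swap
        os.replace(tmp_path, index_path)  # readers see the old or the new file
    except BaseException:
        # On any failure the old index is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The sketch omits validating the rebuilt index and syncing the containing directory, both of which a production implementation would likely need. On Windows the rename can fail while another handle holds the target open.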

4.4 Phase B (Preview)

  • FoxPro 2.x FPT/CDX introduce compound indexes and richer collation; isolated module keeps Core stable.

5. Transactions & Locking

5.1 Journaled Transactions

  • File envelope: 16-byte header ("XBASEJNL", version = 1, reserved padding) precedes an append-only stream of entries. WalJournal verifies the header on open and refuses new transactions when pending entries exist.

  • Entry layout: each record stores {length:int32, checksum:uint32} followed by payload:

    byte   entryType (Begin|Mutation|Commit|Rollback)
    int64  transactionId (monotonic, per process)
    int64  timestamp (UTC ticks)
    [mutation payload]
    

    Mutation payload embeds the table name (UTF-8 + length prefix), record number, mutation kind (Insert|Update|Delete), and serialized before/after images (length-prefixed byte blobs).

  • Durability controls: writes are issued with FileOptions.WriteThrough and optional double flush (FlushAsync + Flush(true)) to satisfy FR‑WT‑3. Options allow deferring journal truncation (AutoResetOnCommit = false) for diagnostics.

  • Commit protocol: Begin entry → mutation batch → Commit entry (fsync) → core applies DBF/DBT/index updates → data fsync → journal reset (truncate + header rewrite). Rollback records trigger the same reset semantics after flushing the rollback marker.

  • Crash recovery: WalJournal.RecoverAsync parses the stream, producing

    • CommittedTransactions (redo set: after images applied in order),
    • IncompleteTransactions (undo set: before images for uncommitted or rollback-tagged txns), and
    • diagnostic flags for checksum mismatches or truncated tails.

    Recovery must run before a new transaction begins; BeginAsync enforces this by checking for residual entries.
  • Atomicity: once data files reconcile the redo/undo plan, the journal is reset, matching the rename/swap guarantees used by DDL flows.
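
To make the entry layout and the redo/undo split concrete, here is a small Python sketch of entry framing and recovery classification. It is not WalJournal itself. The numeric entry-type codes, little-endian byte order, and CRC-32 as the checksum are assumptions made for the example, and the 16-byte file header is left out, so `recover` expects the entry stream only.

```python
import struct
import zlib

# Assumed numeric codes; the source names the entry types but not their values.
BEGIN, MUTATION, COMMIT, ROLLBACK = 1, 2, 3, 4
PAYLOAD_PREFIX = struct.calcsize("<Bqq")  # entryType + transactionId + timestamp


def encode_entry(entry_type: int, txn_id: int, ticks: int, body: bytes = b"") -> bytes:
    """Frame one entry as {length:int32, checksum:uint32} followed by the payload."""
    payload = struct.pack("<Bqq", entry_type, txn_id, ticks) + body
    return struct.pack("<iI", len(payload), zlib.crc32(payload)) + payload


def recover(stream: bytes):
    """Return (redo_ids, undo_ids, clean) for an entry stream.

    Parsing stops at the first truncated or corrupt entry; clean is False then.
    """
    seen, committed = [], set()
    offset, clean = 0, True
    while offset < len(stream):
        if offset + 8 > len(stream):
            clean = False  # truncated frame header
            break
        length, checksum = struct.unpack_from("<iI", stream, offset)
        if length < PAYLOAD_PREFIX:
            clean = False  # impossible length, treat as corruption
            break
        payload = stream[offset + 8:offset + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            clean = False  # truncated tail or checksum mismatch
            break
        entry_type, txn_id, _ticks = struct.unpack_from("<Bqq", payload, 0)
        if txn_id not in seen:
            seen.append(txn_id)
        if entry_type == COMMIT:
            committed.add(txn_id)
        offset += 8 + length
    redo = [t for t in seen if t in committed]
    undo = [t for t in seen if t not in committed]  # uncommitted or rolled back
    return redo, undo, clean
```

A transaction whose Commit entry is lost in a torn tail lands in the undo set, which is the conservative outcome the commit protocol relies on.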

5.2 Concurrency Model

  • Default single-writer, multi-reader. Optional record-level locks for long updates.
  • Readers respect file locks; writers acquire exclusive table locks during commit window.
  • EF SaveChanges wraps mutations in a transaction; optimistic concurrency via record checksum column.
  • FileLockManager coordinates OS locks through .lck sidecars adjacent to DBF/DBT files. Shared locks open the sidecar read-only (FileShare.Read), allowing multiple readers, while exclusive locks request FileShare.None and block until readers drain. Record-level locks (when enabled) reuse the same sidecar with byte-range locking (one byte per record), so writers on different records interleave without contending on the global file mutex.
  • FileLockManagerOptions tune retry cadence (RetryDelay), acquisition timeout, lock directory overrides, and the span used for record offsets (default 1 byte). LockingMode toggles None | File | Record, with record mode layering per-record locks on top of the file primitive.

6. Codepages & Collations

  • LDID-first detection; if missing: heuristics (invalid byte ratios, Hungarian accent profile, fallback CP852).
  • Collation strategies: binary (default) + optional locale-aware uppercasing for comparisons; documented and consistent.
  • Tooling dbfconvert supports transcode to UTF‑8 copies for migration workflows.
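
A minimal sketch of LDID-first resolution with an override policy and the CP852 fallback is shown below. The table lists only a few well-known LDID values, the statistical heuristics are not modeled, and the function names are hypothetical.

```python
from typing import Optional

# Small subset of the commonly documented LDID values; a real map is larger.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x64: "cp852",   # Eastern European MS-DOS
    0xC8: "cp1250",  # Eastern European Windows
}


def resolve_codec(ldid: int, override: Optional[str] = None,
                  fallback: str = "cp852") -> str:
    """Pick a codec: explicit override first, then the LDID, then the fallback."""
    if override:
        return override
    return LDID_TO_CODEC.get(ldid, fallback)


def decode_field(raw: bytes, ldid: int, override: Optional[str] = None) -> str:
    """Decode a space-padded character field using the resolved codec."""
    return raw.decode(resolve_codec(ldid, override)).rstrip(" ")
```

For example, `decode_field("Győr".encode("cp852").ljust(10), 0x64)` yields the original string with padding removed.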

7. Query & Expression Pipeline

7.1 Predicates

  • Supported pushdown: equality/inequalities on C/N/F/D/L, BETWEEN, prefix LIKE, small IN lists.
  • Non-pushdown expressions evaluated client-side (EF) with logged warnings.

7.2 Ordering & Paging

  • If an index prefix matches the ORDER BY, use index order; otherwise stable in-memory sort with spill-to-temp when needed.
  • Pagination via LIMIT/OFFSET is applied on the cursor.

7.3 Expression Subset (Phase A)

  • Literals, arithmetic, logical ops, string funcs (SUBSTR, LEFT, RIGHT, UPPER, TRIM), date diff/add (limited).
  • Deterministic serialization for index keys.

8. ADO.NET Provider Design

  • Connection: xbase://path=<dir>;readonly=<bool>;journal=<on|off>;locking=<file|record|none>;codepage=<auto|cp852|...>;deleted=<hide|show>;cacheSize=<int>.
  • Command: SQL-subset → logical plan; parameters @p mapped to DBF types.
  • Reader: column ordinals map to field descriptors; memo streams on demand.
  • Transaction: wraps journal begin/commit/rollback.
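
The connection string format above could be parsed along these lines. This Python sketch is illustrative only: the default values, the strictness about unknown keys, and the names ConnectionOptions and parse_connection_string are assumptions, not the provider's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ConnectionOptions:
    path: str
    readonly: bool = False   # assumed default
    journal: bool = True     # assumed default
    locking: str = "file"    # assumed default
    codepage: str = "auto"   # assumed default
    deleted: str = "hide"    # assumed default
    cache_size: int = 0      # assumed default; unit not stated in the source


_BOOLS = {"true": True, "false": False, "on": True, "off": False}


def _choice(key, value, allowed):
    """Validate an enumerated option and return it lower-cased."""
    if value.lower() not in allowed:
        raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
    return value.lower()


def parse_connection_string(text: str) -> ConnectionOptions:
    scheme = "xbase://"
    if not text.startswith(scheme):
        raise ValueError("connection string must start with xbase://")
    pairs = {}
    for part in text[len(scheme):].split(";"):
        if not part.strip():
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        pairs[key.strip().lower()] = value.strip()
    if "path" not in pairs:
        raise ValueError("path is required")
    opts = ConnectionOptions(path=pairs.pop("path"))
    for key, value in pairs.items():
        if key == "readonly":
            opts.readonly = _BOOLS[_choice(key, value, {"true", "false"})]
        elif key == "journal":
            opts.journal = _BOOLS[_choice(key, value, {"on", "off"})]
        elif key == "locking":
            opts.locking = _choice(key, value, {"file", "record", "none"})
        elif key == "codepage":
            opts.codepage = value.lower()
        elif key == "deleted":
            opts.deleted = _choice(key, value, {"hide", "show"})
        elif key == "cachesize":
            opts.cache_size = int(value)
        else:
            raise ValueError(f"unknown option {key!r}")
    return opts
```

Rejecting unknown keys is a design choice made here so that typos surface early; a provider could equally choose to ignore them.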

9. EF Core Provider Design

  • Model building: conventions map DBF fields to .NET; RECNO as shadow key if needed.
  • Query translation: attempt pushdown; unsupported features fall back to client evaluation with warnings; joins default to client evaluation.
  • Change tracking: serialize-before/after for checksum; concurrency token compares at save time.
  • Migrations: copy-rebuild pattern with progress callbacks.
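
The checksum-based concurrency token can be illustrated with a short Python sketch. It assumes CRC-32 over the raw record bytes and models the table as an in-memory dict keyed by RECNO; the source does not specify the checksum algorithm, and all names here are hypothetical.

```python
import zlib


class ConcurrencyError(Exception):
    """Raised when a record changed between read and save."""


def record_checksum(raw: bytes) -> int:
    # Assumption: CRC-32 over the raw fixed-length record bytes.
    return zlib.crc32(raw)


def save_if_unchanged(table: dict, recno: int, original_token: int,
                      new_raw: bytes) -> int:
    """Write new_raw only if the stored record still matches the token."""
    current = table[recno]
    if record_checksum(current) != original_token:
        raise ConcurrencyError(f"record {recno} was modified by another writer")
    table[recno] = new_raw
    return record_checksum(new_raw)  # token for the next update
```

A caller captures the token when the record is read, then passes it back at save time. A mismatch corresponds to the optimistic concurrency failure that EF surfaces from SaveChanges.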

10. Online DDL & Schema Evolution (M3)

  • In-Place Online DDL (IPOD) maintains table availability for readers and the single-writer pipeline while schema mutations append to a schema-delta .ddl log handled by SchemaLog and SchemaMutator inside XBase.Core.
  • Versioned projections let cursors materialize records against the target schema version, with adapters to reshape legacy rows until they are backfilled via SchemaBackfillQueue.
  • Lazy backfill occurs opportunistically during writes and via background workers, respecting throttles and journaling checkpoints.
  • Atomic checkpoints consolidate applied deltas into refreshed DBF headers and regenerate catalog metadata while holding only a short exclusive DDL lock. SchemaMutator.CreateCheckpointAsync manages log compaction and backfill cleanup.
  • Short exclusive DDL locks protect header rewrites and index swap windows; standard reads/writes use shared locks plus version gates.
  • Side-by-side index swaps rebuild NTX/MDX artifacts under temp names and atomically rename once validated.
  • Provider integration: ADO.NET exposes CREATE/DROP INDEX and ALTER TABLE verbs; EF Core migrations target the same API and track schema version numbers to coordinate with the .ddl log.
  • Tooling: xbase ddl apply, xbase ddl checkpoint, and xbase ddl pack orchestrate delta ingestion, compaction, and optional vacuum aligned with schema state, with dry-run support.
  • Recovery order: Data journal replay runs first, followed by .ddl log replay to restore schema projections and resume pending backfill tasks.
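
One way to picture versioned projections is a function that replays schema deltas over a legacy row until it matches the target version. The Python sketch below assumes a log of `(version, operation, arguments)` tuples in ascending version order with three operation kinds; the actual SchemaLog entry shape is not described in this document.

```python
def project(row: dict, row_version: int, log: list, target_version: int) -> dict:
    """Reshape a row written at row_version into the target schema version."""
    out = dict(row)  # never mutate the stored row
    for version, op, args in log:
        # Apply only the deltas the row has not seen yet.
        if not (row_version < version <= target_version):
            continue
        if op == "add":
            out.setdefault(args["name"], args.get("default"))
        elif op == "drop":
            out.pop(args["name"], None)
        elif op == "rename":
            out[args["to"]] = out.pop(args["from"])
        else:
            raise ValueError(f"unknown schema operation {op!r}")
    return out


# Hypothetical delta log: version 2 adds a column, version 3 renames one.
SCHEMA_LOG = [
    (2, "add", {"name": "STOCK", "default": 0}),
    (3, "rename", {"from": "NAME", "to": "TITLE"}),
]
```

With this log, `project({"NAME": "Pen"}, 1, SCHEMA_LOG, 3)` produces a row with TITLE and STOCK, while a row already at version 3 passes through unchanged. Backfill would then persist the projected shape so the adapter is no longer needed for that row.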

11. Performance Architecture

  • I/O: Memory-mapped reads where safe; buffered writes with durable fsync boundaries.
  • Caches: Record page cache and index node cache (LRU); configurable sizes.
  • Batching: Grouped writes; deferred index updates per transaction.
  • Metrics: event counters for throughput/latency; optional simple flame traces.
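
An LRU page cache with hit and miss counters might look like the following Python sketch. The `loader` callback stands in for reading a page from disk, and the class is illustrative rather than the IPageCache contract.

```python
from collections import OrderedDict


class PageCache:
    """Fixed-capacity LRU cache keyed by page number."""

    def __init__(self, capacity: int, loader):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader          # called on a miss to fetch the page
        self._pages = OrderedDict()    # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, page_no: int) -> bytes:
        if page_no in self._pages:
            self._pages.move_to_end(page_no)  # mark as most recently used
            self.hits += 1
            return self._pages[page_no]
        self.misses += 1
        page = self._loader(page_no)
        self._pages[page_no] = page
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)   # evict least recently used
        return page
```

The two counters map directly onto the cache hit/miss event counters listed under Diagnostics.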

12. Error Handling & Diagnostics

  • Typed exceptions: XBaseFileFormatException, XBaseCodepageException, XBaseLockException, XBaseTransactionException.
  • Messages include file, table, recno, tag, and remediation hints.
  • Explain endpoint (debug only) prints query plan and pushdown decisions.

13. Security Considerations

  • Bounds-checked parsing; limit record and memo sizes; guard against path traversal in index/memo resolution.
  • No dynamic code generation from untrusted input; expressions are restricted to the documented subset and evaluated by a safe interpreter or compiled evaluator delegates.

14. Testing & CI

  • Fixture-based tests on real DBF/DBT/NTX/MDX.
  • Property-based invariants: read→write→read equivalence; reindex equivalence; crash simulations for journal recovery.
  • CI matrix: Windows/Linux/macOS, x64/ARM64.

15. Extensibility

  • SPI in XBase.Abstractions for new formats (CDX/FPT, future VFP).
  • Pluggable collations and codepage providers.
  • Expression function registry and custom translators.

16. Deployment & Packaging

  • NuGet packages per module (XBase.Core, XBase.Data, XBase.EFCore, XBase.Tools).
  • Signed packages, SemVer, source link, symbols.
  • Minimal runtime dependencies; enable trimming where possible.

17. Open Questions (to be tracked)

  • Precise null semantics per variant (documented matrix).
  • Record-level locks cross-platform consistency (advisory vs mandatory).
  • Large file limits and safe behavior above 2–4 GB boundaries for legacy tools.

18. Appendix: Example Plans

Example: SELECT Name FROM Products WHERE Name LIKE 'P%' ORDER BY Name LIMIT 100

  • Plan: Use Name ascending index (if exists) → prefix LIKE pushdown → scan until 'Q' boundary → return first 100.

Example: WHERE Price BETWEEN 100 AND 200

  • Plan: Use Price index tag range scan; if absent, sequential scan with predicate pushdown.
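
The prefix LIKE plan in the first example amounts to converting the prefix into a half-open key range and scanning the index within it. The Python sketch below assumes binary collation and models the index as a sorted list of byte-string keys; `prefix_range` and `scan_prefix` are hypothetical helpers.

```python
import bisect


def prefix_range(prefix: bytes):
    """Return (low, high) so that low <= key < high matches the prefix.

    high is None when no upper bound exists (empty or all-0xFF prefix).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Increment the last byte that can still be incremented: b"P" -> b"Q".
    return prefix, trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_prefix(sorted_keys: list, prefix: bytes, limit: int) -> list:
    """Range-scan a sorted key list for a prefix, stopping at limit rows."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_keys, low)
    end = len(sorted_keys) if high is None else bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:min(end, start + limit)]
```

Because the index is already ordered by Name, the scan also satisfies ORDER BY Name, so LIMIT 100 can stop early without a sort. The BETWEEN example follows the same shape with both bounds taken directly from the predicate.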

End of ARCHITECTURE.md