Status: Draft v0.1
License of the project: Apache-2.0 (recommended)
Target runtime: .NET 8 LTS (cross‑platform)
Purpose. Provide a robust, cross‑platform, open‑source framework to read/write and query legacy xBase databases (dBASE/Clipper/FoxPro families) from .NET, with modern integrations (ADO.NET, EF Core) and production‑grade tooling.
Scope (Phase A → Phase B).
- Phase A (MVP, write‑capable): dBASE III+/IV DBF + DBT memo, NTX (Clipper) & MDX (dBASE IV) indexes.
Includes: ADO.NET provider (read/write), EF Core provider (read/write with journaling), transactions (user‑space journal), record/file locking, codepage detection, core CLI tools. - Phase B: FoxPro 2.x DBF + FPT memo, CDX (compound/structural) indexes.
Includes: expression evaluator extensions, richer collation, and performance work for CDX. - Out of Scope (initially): Visual FoxPro DBC metadata, VFP‑specific field types, OLE/General blobs beyond pass‑through, SQL DDL, stored procedures.
- Library Consumers: .NET developers integrating legacy xBase data into new apps (ERP, billing, migration tools).
- Data Migrators: Analysts/engineers converting DBF systems to modern stores.
- Ops: Teams needing read‑only analytics or batch updates on shared filesystems.
- G1: High‑fidelity, lossless read/write for Phase‑A formats.
- G2: Cross‑platform support (Windows/Linux/macOS; works under Docker/containers).
- G3: ADO.NET provider implementing
DbConnection,DbCommand,DbDataReader,DbTransaction,DbParameter. - G4: EF Core provider with LINQ translation for common operators; supports Insert/Update/Delete with journaling.
- G5: Transactions via user‑space WAL/journal, with crash recovery and record/file locking.
- G6: Index support: NTX/MDX read/write; predicate and order pushdown; reindex.
- G7: Codepage/LDID detection with heuristics and override; UTF‑8 export tool.
- G8: Tooling:
dbfinfo,dbfdump,dbfpack,dbfreindex,dbfconvert. - G9: Comprehensive tests with real fixtures, property‑based tests, and CI.
- Full SQL engine, multi‑user server, or network protocol.
- VFP DBC schema semantics and advanced VFP data types in Phase A.
- FR‑FF‑1: DBF read/write for dBASE III+/IV with types:
C(Char),N(Numeric),F(Float),D(Date),L(Logical),M(Memo). - FR‑FF‑2: DBT memo read/write (block size detection, chain following).
- FR‑FF‑3: NTX/MDX index read/write (B‑tree).
- FR‑FF‑4 (Phase B): FoxPro 2.x DBF/FPT, CDX (read/write).
- FR‑SC‑1: Parse header, fields, record length, deleted flag, last update date.
- FR‑SC‑2: LDID‑based codepage detection; if missing/0, heuristic selection among CP852/850/437/1250.
- FR‑SC‑3: Provide override via connection string/API; default to CP852 when uncertain.
- FR‑SC‑4: Support null semantics if present (later dBASE variants) and map to .NET
Nullable<T>.
- FR‑CQ‑1: Sequential and index‑based scans; filter and order pushdown.
- FR‑CQ‑2: Predicates:
=,<,<=,>,>=,BETWEEN, prefixLIKE,IN(small sets). - FR‑CQ‑3: Ordering by indexed columns; stable sort fallback when no index matches.
- FR‑CQ‑4: Pagination:
LIMIT/OFFSETequivalent. - FR‑CQ‑5: Deleted records excluded by default; toggle to include.
- FR‑WT‑1: Insert/Update/Delete with journaled commit.
- FR‑WT‑2: Journal format: append‑only log entries (op, recno, before/after), checksums.
- FR‑WT‑3: Commit protocol: fsync journal → update main files atomically → fsync → truncate/rotate journal.
- FR‑WT‑4: Crash recovery: on open, redo/undo to consistent state.
- FR‑WT‑5: Index maintenance: deferred updates within transaction;
REINDEXtool. - FR‑WT‑6: Locking: OS file locks + optional
.lck; single‑writer/multi‑reader; optional record‑level locks.FileLockManageruses.lcksidecars to coordinate shared (FileShare.Read) and exclusive (FileShare.None) table locks with retry/timeout controls.- Record locking (enabled via
LockingMode.Record) reuses the same sidecar and applies byte-range locks (default one byte per record) so disjoint records can update concurrently.
- FR‑WT‑7: Pack/Zap: implement with safety checks and backups.
- FR‑AD‑1: Connection string:
xbase://path=<dir>;readonly=<bool>;journal=<on|off>;locking=<file|record|none>;codepage=<auto|cp852|...>;deleted=<hide|show>;cacheSize=<int>. - FR‑AD‑2: SQL‑subset parser for:
SELECT(projection, where, order, limit/offset). - FR‑AD‑3: Parameters with inferred DBF type mapping; named parameters
@p. - FR‑AD‑4: Transactions via
DbTransactionbound to journal mechanism. - FR‑AD‑5: Schema discovery APIs (
GetSchema, table/column metadata). - FR‑AD‑6: Recognize
CREATE/DROP TABLE,ALTER TABLE, andCREATE/DROP INDEXstatements, executing them through the schema mutator and surfacing resulting schema version metadata.
- FR‑EF‑1:
UseXBase(connectionString); model conventions for DBF → .NET types. - FR‑EF‑2: Keys: if no natural PK, synthesize using RECNO (exposed as read‑only shadow key).
- FR‑EF‑3: Query translation:
Where,OrderBy,Select,Skip,Take; Join: client evaluation by default; warn in logs. - FR‑EF‑4: Change Tracking: optimistic concurrency using checksum/hash of serialized record or timestamp surrogate.
- FR‑EF‑5: Migrations: copy‑rebuild strategy (create new DBF with target schema, copy data).
- FR‑EF‑6: Bulk ops: optional batched write API; document performance trade‑offs.
- FR‑TL‑1:
dbfinfo– print header, fields, LDID, codepage guess, record count, index tags. - FR‑TL‑2:
dbfdump– export to CSV/JSON (UTF‑8), toggle include deleted. - FR‑TL‑3:
dbfpack– remove deleted records safely (backup + journal sync). - FR‑TL‑4:
dbfreindex– rebuild NTX/MDX (CDX in Phase B). - FR‑TL‑5:
dbfconvert– transcode codepage → UTF‑8, or copy to new schema.
- FR‑OD‑1: Deliver In-Place Online DDL (IPOD) with schema-delta
.ddllog, versioned projections, lazy backfill, atomic checkpoints, and short exclusive DDL locks. - FR‑OD‑2: Support
CREATE TABLE,ALTER TABLE ADD/DROP/RENAME/MODIFY COLUMN,DROP TABLE,CREATE INDEX,DROP INDEXwith transactional guarantees and side-by-side index swaps. - FR‑OD‑3: Maintain replayable
.ddllog entries with checksums, monotonic schema version numbers, and bounded retention. - FR‑OD‑4: Integrate schema replay with journal recovery: apply data log first,
.ddllog second; resume incomplete backfill work queues. - FR‑OD‑5: Extend ADO.NET/EF Core providers to parse/emit DDL statements and surface schema version metadata to callers.
- FR‑OD‑6: Tooling must provide
xbase ddl apply,xbase ddl checkpoint, andxbase ddl packwith validation/dry-run switches. - FR‑OD‑7: Acceptance coverage: concurrent reader safety, writer throttling under backfill, recovery from mid-DDL crash, and index swap correctness.
- NFR‑P‑1: Efficient read path using memory‑mapped I/O where safe (platform fallback to buffered).
- NFR‑P‑2: Page caches for DBF records and index nodes; configurable sizes; LRU eviction.
- NFR‑P‑3: Target baselines (indicative on mid‑range hardware):
- Sequential read: ≥ 150 MB/s effective scanning of fixed‑length records.
- Indexed point lookup: p50 ≤ 1 ms (warm cache); p99 ≤ 15 ms.
- Insert throughput: ≥ 5k rec/s without index; ≥ 1k rec/s with 2 indexes.
- NFR‑P‑4: Large file support: single DBF sizes ≥ 2 GB (subject to format limits).
- NFR‑R‑1: Crash‑safe commit (no torn writes on supported filesystems).
- NFR‑R‑2: Recovery time: O(size of journal), with bounded upper limit via rotation.
- NFR‑R‑3: Checksums on journal entries; optional periodic snapshot.
- NFR‑X‑1: Windows, Linux, macOS; x64 & ARM64.
- NFR‑X‑2: Compatible with containerized deployments; no platform‑specific P/Invoke hard dependency (only optional).
- NFR‑S‑1: No remote code execution; safe parsing with bounds checks.
- NFR‑S‑2: Optional file permission checks; no elevation required.
- NFR‑S‑3: Supply‑chain safety: signed releases, reproducible builds where possible.
- NFR‑O‑1: Structured logging (Microsoft.Extensions.Logging).
- NFR‑O‑2: Event counters: cache hits/misses, I/O bytes, journal commits, lock waits.
- NFR‑O‑3: Diagnostics hooks to inspect query plans and index usage.
- CMP‑1: Read DBF/DBT/NTX/MDX created by common dBASE/Clipper tools.
- CMP‑2: Write files that remain readable by those tools where format allows.
- CMP‑3: Codepage round‑trip tests; document known edge cases.
- Connection string and fluent API options as per FR‑AD‑1 and FR‑EF‑1.
- Global options: codepage policy (
auto|force), default locking mode, cache sizes, deleted record visibility, null policy.
- Deterministic exception types (e.g.,
XBaseFileFormatException,XBaseCodepageException,XBaseLockException,XBaseTransactionException). - Clear remediation guidance in messages; include file path, table, recno, index tag.
xBase.sln
/src
/XBase.Core
/XBase.Abstractions
/XBase.Data // ADO.NET provider
/XBase.EFCore // EF Core provider
/XBase.Tools // CLI tools
/XBase.Expressions // expr & index key evaluator
/XBase.Diagnostics
/tests
/XBase.Core.Tests
/XBase.Data.Tests
/XBase.EFCore.Tests
/XBase.Tools.Tests
/fixtures // real DBF/DBT/NTX/MDX samples
/docs
ARCHITECTURE.md
ROADMAP.md
CODEPAGES.md
INDEXES.md
TRANSACTIONS.md
EFCORE_LIMITATIONS.md
LICENSE (Apache-2.0)
README.md
- Unit tests for parsers, encoders, index operations, journal commit/recovery.
- Property‑based tests: read→write→read equivalence; reindex invariants.
- Fuzzing for corrupted headers/memos/index pages.
- Concurrency tests: single‑writer with parallel readers; lock contention.
- Cross‑tool validation using third‑party DBF readers (when license permits).
- Quickstart for ADO.NET and EF Core; mapping tables for types; codepage guide.
- Tooling manuals; migration cookbook (DBF → CSV/Parquet/SQL).
- Performance tuning guide (cache, locking, batch strategies).
- License: Apache‑2.0 (patent grant, enterprise‑friendly).
- SemVer releases; signed NuGet packages; GitHub Actions CI; SBOM.
- R‑1: Index expression dialect differences → define a documented subset, pluggable evaluator.
- R‑2: Codepage ambiguity → LDID first, then heuristics, then explicit override; provide
dbfconvert. - R‑3: Data corruption from concurrent writers → enforce single‑writer by default; strong warnings and lock enforcement.
- R‑4: EF Core JOIN expectations → default to client‑eval, log warnings, provide guidance.
- M1 (Core Read): DBF/DBT read; NTX/MDX read; codepage auto;
dbfinfoworks.
Acceptance: open 20+ fixtures, correct schema and record counts. - M2 (Core Write + Journal): I/U/D; journal commit/recover; basic locking.
Acceptance: crash simulation tests pass; no data loss. - M3 (Online DDL + ADO.NET): Deliver In-Place Online DDL/IPOD (schema-delta log, lazy backfill, DDL lock discipline) while
completing the ADO.NET surface (SELECT subset, parameters, transactions) needed to drive tooling and provider scenarios.
Acceptance: Schema-delta logging captured and replayed in tests; ADO.NET/EF providers issue DDL verbs end-to-end; CLI
ddl apply/checkpoint/packcommands ship with validation and dry-run coverage. - M4 (EF Core Read/Write): LINQ subset; tracking; concurrency tokens.
Acceptance: CRUD round‑trips; concurrency tests pass. - M5 (Indexes & Tools): Reindex/pack tools; predicate/order pushdown measured.
Acceptance: >70% queries use index according to diagnostics counters. - M6 (Phase B Read): FoxPro 2.x DBF/FPT/CDX read.
Acceptance: Open known CDX datasets correctly; tag enumeration.
- US‑01: As a developer, I can connect via ADO.NET and
SELECTfiltered rows with paging and ordering. - US‑02: As an EF Core user, I can
Add/Update/Deleteentities and commit atomically with crash safety. - US‑03: As an operator, I can run
dbfpackanddbfreindexto maintain file health. - US‑04: As a migrator, I can
dbfdumpto UTF‑8 CSV/JSON preserving accents.
- DBF/DBT/FPT: Table & memo file formats.
- NTX/MDX/CDX: Index formats (Clipper/dBASE/FoxPro).
- LDID: Language Driver ID (codepage indicator).
- RECNO: Record number (1‑based).
- PACK/ZAP: Physical deletion/clear operations.
End of requirements.md