HL7 v2 Parser Library

Overview

Single-package Go library (package hl7) for parsing HL7 version 2.x messages in ER7 (pipe-delimited) format. Zero external dependencies. Requires Go 1.23+ for language features.

Goals

  • Low-latency parsing. Consumers reading only MSH-9 should not pay the cost of parsing OBX segments. Parsing is lazy: ParseMessage splits the input into segments at the segment terminator (\r, with \n and \r\n also accepted), and field/component/subcomponent boundaries are scanned on each access.
  • Minimal allocations. Sub-message types (Field, Repetition, Component, Subcomponent) are lightweight value types (~32 bytes each: raw []byte + delims Delimiters). No internal caches or memoization. The only heap allocations come from ParseMessage itself (buffer copy + segment slice).
  • Spec-compliant HL7v2 parsing. Delimiter extraction from MSH-1/MSH-2, MSH field numbering quirks, null vs empty semantics, escape sequence processing, MLLP framing, and batch/file structure all follow the HL7 v2.5.1 specification.
  • Ergonomic chained access. Out-of-range access returns zero values (not errors), enabling patterns like seg.Field(9).Component(1).String() without intermediate nil checks.
  • General-purpose. Not tied to specific message types (ADT, ORU, etc.). Intended to be consumed by a larger application that interprets message content.
  • Safe for concurrent use. Messages are immutable after parsing — all types are either value types or hold only read-only []byte slices into the owned buffer, with no mutable state or caches. Concurrent reads require no synchronization.

Non-Goals

  • Full serialization or Marshal API. The library does not provide a general Marshal function. Message construction is handled by MessageBuilder (from-scratch) and Transform (modify existing), both of which produce *Message via ParseMessage. There are no per-field setters on parsed messages.
  • Built-in schema definitions. The library does not ship with HL7v2 segment, data type, or table definitions. Users provide their own Schema struct to msg.Validate(). This keeps the library general-purpose and avoids bundling version-specific definitions.
  • Field location constants. HL7 field positions (e.g., PID-5.1) are not stable across HL7 v2 versions or vendor implementations. The library does not provide named constants for terser-style location strings; callers define their own.
  • Predicate-based segment filtering. Methods like SegmentsOfType, SegmentsWhere, or combinator functions (And, Or) were considered and rejected. The plain Segments() loop with an inline if or switch is idiomatic Go, immediately readable without library knowledge, and handles all filtering cases in a single pass. The abstraction does not eliminate domain knowledge (callers still decide which types and fields matter); it only restructures where that logic lives. The API surface cost exceeds the ergonomic benefit.

Building and Testing

go build ./...                                  # Compile
go test ./... -v                                # Run all tests
go test -bench=. -benchmem                      # Benchmarks
go test -fuzz=FuzzParseMessage -fuzztime=30s    # Fuzz testing
go vet ./...                                    # Static analysis

Project Structure

hl7/
  go.mod              # module hl7, go 1.25.7, zero dependencies
  doc.go              # Package-level documentation and examples
  error.go            # Sentinel errors and ParseError type
  delimiters.go       # Delimiters struct, extraction, validation, scan helpers
  escape.go           # Unescape() with zero-alloc fast path
  component.go        # Component and Subcomponent value types
  field.go            # Field and Repetition value types
  segment.go          # Segment type with MSH/BHS/FHS special-case handling
  message.go          # Message type, ParseMessage(), segment iterators
  accessor.go         # Terser-style location parser, Location type, Value type, Get()
  charset.go          # ValueDecoder type; DecodeString on Field/Repetition/Component/Subcomponent/Value
  reader.go           # io.Reader wrapper with MLLP and raw mode support
  writer.go           # io.Writer wrapper with MLLP and raw mode support
  ack.go              # ACK message generation (AckCode, AckOption, Message.Ack, WithErrors)
  batch.go            # Batch (BHS/BTS) and File (FHS/FTS) parsing; BatchBuilder for constructing batches
  transform.go        # Transform engine, Change types, workBuf, delimiter re-encoding
  builder.go          # MessageBuilder for from-scratch message construction
  schema.go           # Schema types for validation (MessageDef, SegmentDef, etc.)
  validate.go         # Validation engine (structure + content validation)
  examples/           # Runnable example programs
    builder/main.go   # Construct a message from scratch and write it
    full/main.go      # End-to-end workflow: read, validate, transform, write
  testdata/           # Golden test files (adt_a01.hl7, oru_r01.hl7, etc.)
  *_test.go           # Unit tests, benchmarks, fuzz test

File ordering reflects the dependency graph bottom-up: error.go and delimiters.go are leaf dependencies; message.go and above consume everything below. transform.go provides the byte-splicing engine; builder.go wraps it for from-scratch construction. schema.go defines data types only; validate.go depends on everything.

doc.go must be reviewed after every change and updated whenever the change makes it stale.

Architecture

Scan-on-Access (No Memoization)

The central design decision is that sub-message types do not cache parsed results. Every call to Field(n), Rep(n), Component(n), or SubComponent(n) re-scans the raw bytes for the n-th delimiter. This was an intentional tradeoff:

  • Pro: Eliminates 10 heap allocations per message that were previously needed for []Field, []Repetition, and []Component slices created on first access.
  • Pro: All sub-message types become pure value types (two fields: raw []byte + delims Delimiters) — trivially copyable, no pointer indirection, no mutation.
  • Con: Repeated access to the same field re-scans each time. For HL7 messages (segments typically <200 bytes), each scan costs ~10-20ns, which is negligible compared to the ~100-200ns per heap allocation saved.
  • Measured impact: Parse + access every field of every segment in a 695-byte ORU_R01 message: 5 allocations / 1,140 bytes. Selective access (3 Get() calls): 6 allocations / 96 bytes / ~330ns.

The scan helpers in delimiters.go power all access:

  • nthSlice(data, delim, n) — returns the n-th (0-based) delimited piece, or nil if out of range.
  • nthRange(data, delim, n) — returns the byte offset range [start, end) of the n-th piece (used by spliceField for offset-based navigation without intermediate slices).
  • countDelimited(data, delim) — returns the number of pieces (always >= 1, even for empty data).
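A minimal sketch of how the three helpers could be written, assuming the signatures described above. The bodies are illustrative and not the library's actual code; in particular, nthRange returning an ok flag is an assumption made for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthRange returns the half-open byte range [start, end) of the n-th
// (0-based) piece of data split on delim. ok is false when n is out of range.
func nthRange(data []byte, delim byte, n int) (start, end int, ok bool) {
	if n < 0 {
		return 0, 0, false
	}
	for i := 0; i < n; i++ {
		j := bytes.IndexByte(data[start:], delim)
		if j < 0 {
			return 0, 0, false
		}
		start += j + 1
	}
	if j := bytes.IndexByte(data[start:], delim); j >= 0 {
		return start, start + j, true
	}
	return start, len(data), true
}

// nthSlice returns the n-th piece, or nil when n is out of range.
// For non-nil data, a present-but-empty piece is a non-nil, zero-length slice.
func nthSlice(data []byte, delim byte, n int) []byte {
	start, end, ok := nthRange(data, delim, n)
	if !ok {
		return nil
	}
	return data[start:end:end]
}

// countDelimited returns the number of pieces, which is always at least 1.
func countDelimited(data []byte, delim byte) int {
	n := 1
	for _, b := range data {
		if b == delim {
			n++
		}
	}
	return n
}

func main() {
	seg := []byte("PID|1||12345^^^MRN")
	fmt.Printf("%q\n", nthSlice(seg, '|', 3)) // "12345^^^MRN"
	fmt.Println(nthSlice(seg, '|', 2) != nil) // true: present but empty
	fmt.Println(nthSlice(seg, '|', 9) == nil) // true: out of range
	fmt.Println(countDelimited(seg, '|'))     // 4
}
```

No slice of pieces is ever built: each call walks the raw bytes with bytes.IndexByte and returns a sub-slice of the input.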

Type Hierarchy

Message        — owns the byte buffer, stores []Segment internally
  Segment      — raw []byte slice into Message buffer; uses pointer receivers to maintain identity with parent slice
    Field      — value type, scans raw bytes for ~ (repetition separator)
      Repetition — value type, scans raw bytes for ^ (component separator)
        Component — value type, scans raw bytes for & (subcomponent separator)
          Subcomponent — leaf value type, holds raw bytes

Segment uses pointer receivers to maintain identity with the Message.segments slice when passing segment references. All other types use value receivers.

Buffer Ownership

ParseMessage copies the input bytes into an owned buffer. All raw []byte slices throughout the hierarchy are sub-slices of this single buffer. The caller can freely reuse or discard their input after calling ParseMessage.

Segment Access

Segments() returns []Segment — a slice reference to the message's internal segment storage. The slice shares the Message's backing array (zero-copy), but callers should not modify it.

for _, seg := range msg.Segments() {
    if seg.Type() == "OBX" {
        // Process OBX segment
    }
}

To access specific segment types, iterate and filter by seg.Type().

Validation Engine

msg.Validate(schema) checks a parsed message against a user-provided Schema and returns a *ValidationResult containing a Valid bool and a slice of Issue structs. Validation runs in three phases, each gated by the presence of relevant definitions in the schema:

Phase 1 — Structure validation (schema.Messages): Looks up the MessageDef via MSH-9.3 (or MSH-9.1 + "_" + MSH-9.2 as fallback). Uses recursive descent to match the message's segment sequence against the Element tree, checking segment presence, order, and cardinality (Min/Max). Groups use a snapshot/restore mechanism to roll back spurious issues when probing for additional group iterations that don't consume segments.

Phase 2 — Content validation (schema.Segments, schema.DataTypes, schema.Tables): Iterates all segments, looks up SegmentDef by type, and validates each defined field for: exact value assertion, required presence, max length, repetition cardinality, primitive format (DT, TM, DTM/TS, NM, SI), composite component structure via DataTypeDef, and coded table values via TableDef. Value assertions (FieldDef.Value, ComponentDef.Value) are checked first; when a value doesn't match, remaining checks (format, table, length) are skipped. After declarative field checks, FieldDef.Check runs per field (skipped on empty/null), then SegmentDef.Check runs per segment occurrence.

Phase 3 — Custom message checks (schema.Checks): Runs each MessageCheckFunc in order, passing the full *Message. Useful for cross-segment business rules that cannot be expressed declaratively.

Key performance decisions:

  • Raw bytes, not strings. Format validators operate on []byte to avoid Unescape() allocations. Escape sequences in date/numeric fields are pathological and would correctly fail validation on the raw representation.
  • Deferred location strings. Location strings (e.g., "PID-3.1") are only constructed on error paths via buildFieldLoc/buildCompLoc helpers. On valid messages (the common case), zero location strings are allocated.
  • No fmt dependency. All string formatting uses strconv.Itoa and string concatenation.
  • Compiler-optimized table lookups. isInTable indexes the table with m[string(b)], where b is a []byte; the Go compiler recognizes this form and performs the lookup without allocating a temporary string.
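The last three decisions can be illustrated with a short sketch. The function signatures and the map value type are assumptions made for this example; only the names isInTable and buildFieldLoc come from the text above.

```go
package main

import (
	"fmt"
	"strconv"
)

// isInTable reports whether the raw field bytes are a key of table.
// The conversion string(raw) appears directly in the index expression,
// which lets the Go compiler skip allocating a temporary string.
func isInTable(table map[string]struct{}, raw []byte) bool {
	_, ok := table[string(raw)]
	return ok
}

// buildFieldLoc formats a location such as "PID-8" without using fmt.
// It is meant to be called only once a problem has been found.
func buildFieldLoc(seg string, field int) string {
	return seg + "-" + strconv.Itoa(field)
}

func main() {
	sex := map[string]struct{}{"F": {}, "M": {}}
	raw := []byte("X")
	if !isInTable(sex, raw) {
		// The location string is built on the error path only.
		fmt.Println("invalid code at", buildFieldLoc("PID", 8))
	}
}
```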

Schema types are defined in schema.go:

  • Schema — four optional maps (Messages, Segments, DataTypes, Tables) plus a Checks slice for message-level custom validators
  • MessageDef — tree of Element (segment or group with Min/Max cardinality)
  • SegmentDef — list of FieldDef plus optional Check SegmentCheckFunc for cross-field validation
  • FieldDef — index, data type, required, repeating, max length, table, value, plus optional Check FieldCheckFunc
  • DataTypeDef — list of ComponentDef for composite types (CX, XPN, etc.), each with optional value assertion
  • TableDef — map of valid coded values
  • FieldCheckFunc, SegmentCheckFunc, MessageCheckFunc — custom validator function types, all returning []Issue and tagged json:"-"

Transform and Builder Engine

Transform, TransformWith, and MessageBuilder share a common byte-splicing engine in transform.go. The central data structure is workBuf — a contiguous []byte buffer with segment boundary tracking ([]segBound). All mutations operate by splicing bytes directly in the buffer and adjusting segment offsets.

workBuf operations:

  • splice(segIdx, start, end, data) — replaces w.data[start:end] with data, updates segment bounds.
  • spliceExtend(segIdx, at, gaps, sep, value) — inserts N separator bytes + value at position, growing w.data in-place. Avoids the intermediate []byte allocation that fieldExtension previously required.
  • replaceField(segIdx, delims, fieldNum, value) — locates the field byte range with fieldByteRange and splices, or extends the segment using fieldGaps + spliceExtend if the field doesn't exist.
  • findSeg / createSeg — locate or append segments by type.

spliceField modifies a field's bytes at a specific hierarchical position (repetition, component, subcomponent). It navigates to the target byte range using nthRange offsets at each hierarchy level, computing padding delimiter counts when extending, and builds the result with a single make([]byte).
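As a rough sketch of the bookkeeping behind splice, the example below replaces a byte range and shifts the bounds of the affected and following segments. The struct layouts are assumptions, and for brevity this version builds a new buffer rather than growing the existing one in place.

```go
package main

import "fmt"

// segBound records the half-open byte range of one segment (assumed layout).
type segBound struct{ start, end int }

// workBuf is a contiguous buffer plus per-segment bounds (assumed layout).
type workBuf struct {
	data []byte
	segs []segBound
}

// splice replaces w.data[start:end] with repl. The edited range must lie
// inside segment segIdx. That segment's end and the bounds of every later
// segment shift by the change in length.
func (w *workBuf) splice(segIdx, start, end int, repl []byte) {
	delta := len(repl) - (end - start)
	out := make([]byte, 0, len(w.data)+delta)
	out = append(out, w.data[:start]...)
	out = append(out, repl...)
	out = append(out, w.data[end:]...)
	w.data = out
	w.segs[segIdx].end += delta
	for i := segIdx + 1; i < len(w.segs); i++ {
		w.segs[i].start += delta
		w.segs[i].end += delta
	}
}

func main() {
	w := &workBuf{
		data: []byte("AAA|1\rBBB|2\r"),
		segs: []segBound{{0, 5}, {6, 11}},
	}
	// Replace the single-byte field "1" in the first segment.
	w.splice(0, 4, 5, []byte("xyz"))
	fmt.Printf("%q %v\n", w.data, w.segs)
}
```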

MessageBuilder wraps workBuf seeded with a minimal MSH skeleton (instead of a source message). Set(location, value) calls the same applyValueAtLocation path used by Transform. Build() calls ParseMessage on the buffer to produce an immutable *Message.

BatchBuilder constructs BHS/BTS-wrapped batch files. NewBatchBuilder(opts...) seeds the builder with DefaultDelimiters(). SetHeader(fieldNum, value) stores plain-text header field values (escaped at Build time). Add(msg) appends messages. Build() pre-calculates total output size, allocates once, and writes BHS + messages + BTS. BTS-1 is set to the message count. Reset() clears messages while preserving header fields. BHS-7 defaults to time.Now() at build time unless set via SetHeader(7, ...). BHS and FHS segments use the same MSH-style field numbering (field 1 = separator, field 2 = encoding chars, field 3+ = normal fields) via isHeaderSeg in segment.go.

Change types: Replace, Null, Omit, Move, and Copy are sealed interface implementations (Change has an unexported method). applyOneChange dispatches on the concrete type.

Delimiter re-encoding: reencodeData performs a single-pass conversion of all bytes from source to destination delimiters, resolving escape sequences to their literal source values and re-escaping if they collide with destination delimiters.

Location Type

Location in accessor.go represents a specific position in an HL7 message hierarchy. ParseLocation parses terser-style strings (e.g., "PID-3[1].4.2") into a Location. Location.String() implements fmt.Stringer and produces the inverse terser representation. Both are used by the accessor (Get), transform, and builder subsystems.
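To make the terser round-trip concrete, here is a simplified sketch of a parser and its inverse. The struct fields and the lowercase parseLocation are assumptions for illustration; this version handles only the SEG-field[rep].comp.sub shape and ignores segment occurrence indices.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Location is a simplified stand-in for the library type.
type Location struct {
	Segment      string
	Field        int
	Repetition   int // 0-based
	Component    int // 1-based, 0 = not specified
	SubComponent int // 1-based, 0 = not specified
}

var errBadLocation = errors.New("invalid location")

// parseLocation parses strings such as "PID-3[1].4.2".
func parseLocation(s string) (Location, error) {
	seg, rest, ok := strings.Cut(s, "-")
	if !ok || len(seg) != 3 {
		return Location{}, errBadLocation
	}
	loc := Location{Segment: seg}
	parts := strings.Split(rest, ".")
	if len(parts) > 3 {
		return Location{}, errBadLocation
	}
	// The field part may carry a repetition index: "3[1]".
	field := parts[0]
	if i := strings.IndexByte(field, '['); i >= 0 {
		if !strings.HasSuffix(field, "]") {
			return Location{}, errBadLocation
		}
		rep, err := strconv.Atoi(field[i+1 : len(field)-1])
		if err != nil || rep < 0 {
			return Location{}, errBadLocation
		}
		loc.Repetition = rep
		field = field[:i]
	}
	var err error
	if loc.Field, err = strconv.Atoi(field); err != nil || loc.Field < 0 {
		return Location{}, errBadLocation
	}
	for i, p := range parts[1:] {
		n, err := strconv.Atoi(p)
		if err != nil || n < 1 {
			return Location{}, errBadLocation
		}
		if i == 0 {
			loc.Component = n
		} else {
			loc.SubComponent = n
		}
	}
	return loc, nil
}

// String produces the terser form that parseLocation accepts.
func (l Location) String() string {
	s := l.Segment + "-" + strconv.Itoa(l.Field)
	if l.Repetition > 0 {
		s += "[" + strconv.Itoa(l.Repetition) + "]"
	}
	if l.Component > 0 {
		s += "." + strconv.Itoa(l.Component)
		if l.SubComponent > 0 {
			s += "." + strconv.Itoa(l.SubComponent)
		}
	}
	return s
}

func main() {
	loc, err := parseLocation("PID-3[1].4.2")
	fmt.Println(loc, err)
}
```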

Value in accessor.go is the return type of Get(). It is a lightweight value type (raw []byte + delims Delimiters) with String(), Bytes(), IsEmpty(), IsNull(), and HasValue() — the same interface as Field, Repetition, Component, and Subcomponent. A zero Value (nil raw bytes) is returned for invalid or not-found locations.

ValueDecoder in charset.go is a func([]byte) ([]byte, error) that converts post-unescape bytes to a target encoding (typically UTF-8). DecodeString(ValueDecoder) is defined on Value, Field, Repetition, Component, and Subcomponent. When the decoder is nil, DecodeString is equivalent to String() with no extra allocation.

HL7v2 Specification Decisions

Delimiter Handling

Delimiters are extracted per-message from MSH-1 (field separator) and MSH-2 positions 1-4 (component, repetition, escape, subcomponent). The standard set |^~\& is never assumed — any valid delimiter set is accepted. Validation rejects zero bytes, CR/LF, and duplicate characters.
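The extraction and validation rules can be sketched as follows. The field names match the delims.X references used elsewhere in this document; the function name, the error value, and the requirement that all four encoding characters be present are assumptions of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Delimiters holds the five per-message delimiter bytes.
type Delimiters struct {
	Field, Component, Repetition, Escape, SubComponent byte
}

var errBadDelimiters = errors.New("invalid delimiters")

// extractDelimiters reads MSH-1 and MSH-2 from the start of a message.
func extractDelimiters(msg []byte) (Delimiters, error) {
	// "MSH" + field separator + four encoding characters = 8 bytes.
	if len(msg) < 8 || string(msg[:3]) != "MSH" {
		return Delimiters{}, errBadDelimiters
	}
	d := Delimiters{
		Field:        msg[3],
		Component:    msg[4],
		Repetition:   msg[5],
		Escape:       msg[6],
		SubComponent: msg[7],
	}
	all := [5]byte{d.Field, d.Component, d.Repetition, d.Escape, d.SubComponent}
	for i, c := range all {
		// Reject zero bytes and segment terminators.
		if c == 0 || c == '\r' || c == '\n' {
			return Delimiters{}, errBadDelimiters
		}
		// Reject duplicates.
		for _, prev := range all[:i] {
			if prev == c {
				return Delimiters{}, errBadDelimiters
			}
		}
	}
	return d, nil
}

func main() {
	d, err := extractDelimiters([]byte("MSH|^~\\&|SENDER"))
	fmt.Printf("%c %c %c %c %c %v\n", d.Field, d.Component, d.Repetition, d.Escape, d.SubComponent, err)
}
```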

MSH/BHS/FHS Field Numbering

MSH, BHS, and FHS segments share the same unique field numbering because field 1 IS the field separator character (it does not appear between delimiters like normal fields):

Index      Content                  Notes
Field(0)   "MSH" / "BHS" / "FHS"    Segment type (same as all segments)
Field(1)   "|"                      The field separator character itself
Field(2)   "^~\\&"                  Encoding characters (literal, not parsed further)
Field(3+)  Normal fields            Parsed normally from bytes after encoding chars

This is implemented in segment.go via isHeaderSeg() (true for MSH, BHS, FHS) and mshField(), which handles the three special cases before falling through to nthSlice for fields 3+.
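A hedged sketch of the special-casing, written as a standalone function over the raw segment bytes. The signature and the nthPiece helper are assumptions for this example and stand in for the library's mshField and nthSlice.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthPiece returns the n-th (0-based) delimited piece, or nil if out of range.
func nthPiece(data []byte, delim byte, n int) []byte {
	for ; n > 0; n-- {
		i := bytes.IndexByte(data, delim)
		if i < 0 {
			return nil
		}
		data = data[i+1:]
	}
	if i := bytes.IndexByte(data, delim); i >= 0 {
		return data[:i]
	}
	return data
}

// mshField returns field n of an MSH, BHS, or FHS segment.
func mshField(raw []byte, n int) []byte {
	if len(raw) < 4 || n < 0 {
		return nil
	}
	sep := raw[3]
	switch n {
	case 0:
		return raw[:3] // segment type
	case 1:
		return raw[3:4] // the field separator itself
	}
	// Encoding characters run from byte 4 up to the next field separator.
	rest := raw[4:]
	end := bytes.IndexByte(rest, sep)
	if n == 2 {
		if end < 0 {
			return rest
		}
		return rest[:end]
	}
	if end < 0 {
		return nil
	}
	// Fields 3+ are ordinary delimited pieces after the encoding characters.
	return nthPiece(rest[end+1:], sep, n-3)
}

func main() {
	seg := []byte("MSH|^~\\&|SENDER|FAC||RECV")
	for n := 0; n <= 4; n++ {
		fmt.Printf("Field(%d) = %q\n", n, mshField(seg, n))
	}
}
```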

ADD Segment (Continuation)

Per HL7v2.5.1 Section 2.5.2, when a segment exceeds the practical length limit, it is split at a field boundary and an ADD (Addendum) segment immediately follows it. Each ADD field is a distinct additional field of the preceding segment — ADD field 1 becomes the next field after the last field in the preceding segment, ADD field 2 becomes the field after that, and so on. No content is concatenated within a field across the segment boundary.

ParseMessage merges ADD segments during the initial buffer copy via mergeADD(). The function strips the segment terminator and the ADD type marker (\rADD, \nADD, or \r\nADD) but retains the field separator that follows, so the ADD fields are appended to the preceding segment with correct field boundaries. The merged buffer is always shorter than or equal to the input, so the allocation size is unchanged.

A fast path checks for the ADD pattern with bytes.Contains before scanning. Messages without ADD segments pay only the cost of two bytes.Contains calls and then fall through to a plain copy.

ADD segments without a field separator (e.g., ADD\r with no fields) are not merged and remain as standalone segments. This matches the spec requirement that ADD must carry continuation data.

ADD segments that immediately follow MSH are also left as standalone segments. Per the spec, ADD extends data segments, not the message header. This also enables Concatenate to correctly reassemble cross-message continuations where page N+1 begins with MSH followed by ADD (continuing the last segment of page N): because ADD is still visible as a standalone segment in the parsed page, Concatenate preserves it in the assembled buffer, and the final ParseMessage call then merges it into the correct preceding segment.
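The merge rules described in this section can be sketched as below. This is an illustrative version only: it splits on any terminator, normalizes output terminators to \r, and allocates a slice of segments, none of which is claimed about the library's mergeADD.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeADD folds each ADD segment's fields into the preceding segment.
// ADD segments without a field separator, or directly after MSH, are kept.
func mergeADD(in []byte, fieldSep byte) []byte {
	isTerm := func(r rune) bool { return r == '\r' || r == '\n' }
	out := make([]byte, 0, len(in))
	prevType := "" // type of the segment an ADD would extend
	for _, seg := range bytes.FieldsFunc(in, isTerm) {
		isADD := len(seg) > 3 && string(seg[:3]) == "ADD" && seg[3] == fieldSep
		if isADD && prevType != "" && prevType != "MSH" {
			// Drop the terminator and the "ADD" marker, keep the separator.
			out = append(out, seg[3:]...)
			continue
		}
		if len(out) > 0 {
			out = append(out, '\r')
		}
		out = append(out, seg...)
		n := len(seg)
		if n > 3 {
			n = 3
		}
		prevType = string(seg[:n])
	}
	return out
}

func main() {
	in := []byte("MSH|^~\\&|A\rOBX|1|TX\rADD|more|data\rPID|1")
	fmt.Printf("%q\n", mergeADD(in, '|'))
}
```

Because only the terminator and the three-byte marker are removed, the merged output never exceeds the input length.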

Null vs Empty

Per the HL7 v2 specification:

  • Empty (||): Field omitted, no value present. On update, means "preserve existing value."
  • Null (|""|): Field explicitly set to null. On update, means "clear existing value."

Field.IsNull() checks for exactly two double-quote bytes. Field.IsEmpty() checks for zero-length raw bytes. Field.HasValue() is !IsEmpty() && !IsNull().
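A small sketch of the three predicates and of how a consumer might apply the update semantics. The Field type here is reduced to its raw bytes, and applyUpdate is a hypothetical caller-side helper, not part of the library.

```go
package main

import "fmt"

// Field is a simplified stand-in that holds only the raw bytes.
type Field struct{ raw []byte }

// IsEmpty reports whether the field was omitted (||).
func (f Field) IsEmpty() bool { return len(f.raw) == 0 }

// IsNull reports whether the field is exactly two double-quote bytes (|""|).
func (f Field) IsNull() bool {
	return len(f.raw) == 2 && f.raw[0] == '"' && f.raw[1] == '"'
}

// HasValue reports whether the field carries an actual value.
func (f Field) HasValue() bool { return !f.IsEmpty() && !f.IsNull() }

// applyUpdate is a hypothetical helper showing the update semantics:
// empty preserves the existing value, null clears it.
func applyUpdate(existing string, incoming Field) string {
	switch {
	case incoming.IsEmpty():
		return existing
	case incoming.IsNull():
		return ""
	default:
		return string(incoming.raw)
	}
}

func main() {
	fmt.Printf("%q\n", applyUpdate("555-1234", Field{}))                   // "555-1234"
	fmt.Printf("%q\n", applyUpdate("555-1234", Field{raw: []byte(`""`)}))  // ""
	fmt.Printf("%q\n", applyUpdate("555-1234", Field{raw: []byte("555")})) // "555"
}
```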

Escape Sequences

Escape processing happens only when .String() is called (not during parsing). Unescape() has a zero-allocation fast path: if bytes.IndexByte(data, escapeChar) returns -1, the input slice is returned directly.

Supported sequences (per HL7 v2.5.1 Section 2.7):

Sequence  Result                  Implementation
\F\       Field separator         Replaced with delims.Field
\S\       Component separator     Replaced with delims.Component
\T\       Subcomponent separator  Replaced with delims.SubComponent
\R\       Repetition separator    Replaced with delims.Repetition
\E\       Escape character        Replaced with delims.Escape
\H\       Start highlighting      Consumed (no output)
\N\       End highlighting        Consumed (no output)
\Xhh..\   Hex-encoded data        Decoded to bytes
\Z..\     Locally defined         Passed through verbatim
\C..\     Single-byte charset     Passed through verbatim
\M..\     Multi-byte charset      Passed through verbatim
\.xx\     Formatted text          Passed through verbatim
Unknown   -                       Passed through verbatim
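The fast path and the sequence handling can be sketched as follows. This is an illustrative implementation under the rules listed above, not the library's Unescape; the function name, the treatment of unterminated sequences, and the fallback for malformed hex are assumptions.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

type Delimiters struct {
	Field, Component, Repetition, Escape, SubComponent byte
}

var stdDelims = Delimiters{'|', '^', '~', '\\', '&'}

// unescape resolves escape sequences in data.
func unescape(data []byte, d Delimiters) []byte {
	if bytes.IndexByte(data, d.Escape) < 0 {
		return data // fast path: no copy, no allocation
	}
	out := make([]byte, 0, len(data))
	for len(data) > 0 {
		i := bytes.IndexByte(data, d.Escape)
		if i < 0 {
			out = append(out, data...)
			break
		}
		out = append(out, data[:i]...)
		j := bytes.IndexByte(data[i+1:], d.Escape)
		if j < 0 {
			// Unterminated sequence: keep the remainder as-is.
			out = append(out, data[i:]...)
			break
		}
		seq := data[i+1 : i+1+j] // bytes between the two escape characters
		whole := data[i : i+j+2] // the sequence including both escapes
		switch string(seq) {
		case "F":
			out = append(out, d.Field)
		case "S":
			out = append(out, d.Component)
		case "T":
			out = append(out, d.SubComponent)
		case "R":
			out = append(out, d.Repetition)
		case "E":
			out = append(out, d.Escape)
		case "H", "N":
			// Highlighting markers are consumed.
		default:
			if len(seq) > 1 && seq[0] == 'X' {
				if b, err := hex.DecodeString(string(seq[1:])); err == nil {
					out = append(out, b...)
					break
				}
			}
			out = append(out, whole...) // passed through verbatim
		}
		data = data[i+j+2:]
	}
	return out
}

func main() {
	fmt.Printf("%q\n", unescape([]byte(`Smith \T\ Sons`), stdDelims)) // "Smith & Sons"
	fmt.Printf("%q\n", unescape([]byte(`\X41\`), stdDelims))          // "A"
}
```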

Segment Terminators

The HL7 spec mandates \r (0x0D) as the segment terminator. In practice, systems emit \r\n or \n alone. The parser accepts all three. splitSegments in message.go handles \r, \n, and \r\n pairs, skipping empty lines.
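A minimal sketch of terminator-tolerant splitting, assuming a function that returns sub-slices of the input buffer. It is not the library's splitSegments, only an illustration of treating \r, \n, and \r\n alike while dropping empty lines.

```go
package main

import "fmt"

// splitSegments returns the non-empty lines of buf, where a line ends at
// \r, \n, or \r\n. The returned slices alias buf; nothing is copied.
func splitSegments(buf []byte) [][]byte {
	var segs [][]byte
	start := 0
	for i := 0; i <= len(buf); i++ {
		if i < len(buf) && buf[i] != '\r' && buf[i] != '\n' {
			continue
		}
		// At a terminator or at end of input: emit the pending line if any.
		if i > start {
			segs = append(segs, buf[start:i])
		}
		start = i + 1
	}
	return segs
}

func main() {
	msg := []byte("MSH|^~\\&|A\rPID|1\r\nOBX|1\n\nNTE|1")
	for _, s := range splitSegments(msg) {
		fmt.Printf("%q\n", s)
	}
}
```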

MLLP Framing

The Minimum Lower Layer Protocol wraps each message:

0x0B (VT) + message_bytes + 0x1C (FS) + 0x0D (CR)

Reader supports three modes:

  • ModeMLLP — strict MLLP framing
  • ModeRaw — MSH-boundary detection for unframed streams
  • ModeAuto — peeks at the first byte to detect framing

The trailing CR after 0x1C is tolerated if missing (some implementations omit it).
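The framing itself is simple enough to sketch in a few lines. The helper names frame and unframe and the error value are hypothetical; they operate on a complete in-memory frame rather than on a stream as Reader does.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

const (
	startBlock     = 0x0B // VT
	endBlock       = 0x1C // FS
	carriageReturn = 0x0D // CR
)

var errBadFrame = errors.New("mllp: malformed frame")

// frame wraps msg as VT + msg + FS + CR.
func frame(msg []byte) []byte {
	out := make([]byte, 0, len(msg)+3)
	out = append(out, startBlock)
	out = append(out, msg...)
	return append(out, endBlock, carriageReturn)
}

// unframe returns the payload between VT and FS. Because it stops at FS,
// a missing trailing CR is tolerated.
func unframe(b []byte) ([]byte, error) {
	if len(b) < 2 || b[0] != startBlock {
		return nil, errBadFrame
	}
	end := bytes.IndexByte(b, endBlock)
	if end < 0 {
		return nil, errBadFrame
	}
	return b[1:end], nil
}

func main() {
	f := frame([]byte("MSH|^~\\&|A\r"))
	payload, err := unframe(f)
	fmt.Printf("%q %v\n", payload, err)
}
```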

Batch and File Structure

ParseBatch() handles BHS/BTS-wrapped message groups. ParseFile() handles FHS/FTS-wrapped batch groups. Both are tolerant: header and trailer segments are optional, and messages not wrapped in BHS/BTS are placed in an implicit batch. BHS/FHS segments extract delimiters using the same MSH-style encoding character layout.

Code Conventions

Naming

  • Types: Message, Segment, Field, Repetition, Component, Subcomponent — named for the HL7 hierarchy level they represent. Location — terser-style position with ParseLocation/String() round-trip. MessageBuilder, BuilderOption — from-scratch message construction. Change (sealed interface), replaceChange, nullChange, omitChange, moveChange, copyChange — transform operations. Schema, MessageDef, SegmentDef, FieldDef, DataTypeDef, ComponentDef, TableDef — schema input types. ValidationResult, Issue, Severity — validation output types. FieldCheckFunc, SegmentCheckFunc, MessageCheckFunc — custom validator function types. AckCode, AckOption, WithErrors — ACK generation types.
  • Errors: Sentinel errors prefixed with Err (e.g., ErrMessageTooShort, ErrNoMSHSegment). All are package-level var declarations using errors.New. Validation issue codes are Code-prefixed string constants (e.g., CodeRequiredField, CodeInvalidFormat).
  • Unexported helpers: Lowercase descriptive names (nthSlice, countDelimited, splitSegments, mshField, normalField).
  • Method naming follows Go conventions: String() for the fmt.Stringer interface, Bytes() for raw access, IsX() for boolean predicates.

Commenting

  • Exported types and functions have doc comments per Go convention.
  • Comments explain why, not what, except where the HL7 spec demands non-obvious behavior (e.g., MSH field numbering).
  • Internal helpers have brief comments only where the logic is not self-evident.
  • No inline comments on straightforward code.

Error Handling

  • Parse-time errors: ParseMessage, ParseBatch, ParseFile, and Reader.ReadMessage return errors for structural failures.
  • Access-time zero values: Field(n), Rep(n), Component(n), SubComponent(n) return empty values for out-of-range indices. This enables chained access without error checking at each level.
  • ParseError wraps a sentinel error with position and context for detailed diagnostics.
  • Validation results: msg.Validate() never returns an error — it always returns a *ValidationResult. Issues are collected in a slice of Issue structs with Severity, Location, Code, and Description. The Valid field is false if any issue has SeverityError.
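The sentinel-plus-wrapper pattern might look like the sketch below. ErrNoMSHSegment is named in this document; the ParseError field names, the error text, and the parse and isNoMSH helpers are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrNoMSHSegment is a package-level sentinel created with errors.New.
var ErrNoMSHSegment = errors.New("hl7: no MSH segment")

// ParseError wraps a sentinel with position and context (assumed fields).
type ParseError struct {
	Err      error
	Position int
	Context  string
}

func (e *ParseError) Error() string {
	return e.Context + " at byte " + strconv.Itoa(e.Position) + ": " + e.Err.Error()
}

// Unwrap lets errors.Is and errors.As see the wrapped sentinel.
func (e *ParseError) Unwrap() error { return e.Err }

// parse is a stand-in for a parse-time structural check.
func parse(data []byte) error {
	if len(data) < 3 || string(data[:3]) != "MSH" {
		return &ParseError{Err: ErrNoMSHSegment, Position: 0, Context: "message start"}
	}
	return nil
}

// isNoMSH shows how a caller can test for the sentinel.
func isNoMSH(err error) bool { return errors.Is(err, ErrNoMSHSegment) }

func main() {
	err := parse([]byte("PID|1"))
	fmt.Println(err)
	fmt.Println(isNoMSH(err)) // true
}
```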

Testing

  • Unit tests in *_test.go files mirror the source file they test (e.g., segment_test.go tests segment.go).

  • Golden tests in message_test.go use real-world-style messages from testdata/.

  • Fuzz test in fuzz_test.go exercises parse + full traversal + accessor on arbitrary input.

  • Benchmarks in message_test.go cover parse-only, parse+access, minimal message, and accessor patterns. Writer benchmarks in writer_test.go cover MLLP and raw write modes. ACK benchmarks in ack_test.go cover ACK generation. Builder benchmarks in builder_test.go cover from-scratch message construction. Transform benchmarks in transform_test.go cover multi-change transforms on real-world messages.

  • Transform tests in transform_test.go cover replace, null, omit, move, copy, field-level vs component-level ordering, delimiter conversion (including escape sequence resolution and re-escaping), segment extension, multiple segment occurrences, and last-write-wins semantics.

  • Builder tests in builder_test.go cover basic set, multiple segments, component/subcomponent targeting, repetitions, null values, escaping round-trip, custom delimiters, build reusability, and error cases.

  • Validation tests in validate_test.go cover structure matching (segments, groups, cardinality), field content (required, length, cardinality), data type format (DT, TM, DTM, NM, SI), table lookups, nil inputs, and integration with realistic schemas. Benchmarks test structure-only, fields-only, full, empty-schema, and ADT patterns.

  • Tests use package hl7 (not hl7_test) to access unexported fields for direct struct construction.

  • Always write tests first.

  • Do not make a code change until a failing test for it has been observed.

  • Tests must be comprehensive, covering both positive and negative outcomes.

  • Test coverage must not drop below 96%.

  • Always write benchmarks for each public function.

  • Benchmarks may be written after a function is introduced, but before the change that introduces it is considered complete.

  • Always run tests after a completed change.

  • A failing test must be addressed before moving forward.

  • Always run benchmarks after a completed change.

  • A performance regression in the form of additional latency, allocation, or memory must be addressed before moving forward unless an exception is granted due to an expected regression.

Performance Characteristics

Benchmarked on Apple M3 Pro:

Parsing

Benchmark                     Time     Allocs  Bytes
ParseMessage (695B ORU_R01)   ~834ns   3       1088
Parse + access all fields     ~8.2us   5       1140
ParseMessage (minimal MSH)    ~94ns    3       144
Get() accessor (3 lookups)    ~329ns   6       96

The 3 allocations in ParseMessage are: the owned byte buffer copy, the []Segment slice, and the *Message struct itself. The 2 additional allocations in parse+access come from Unescape slow paths on fields containing the escape character (MSH-2 always contains \).

Validation

Benchmark                            Time     Allocs  Bytes
Validate (empty schema)              ~15ns    1       32
Validate (structure only, ORU_R01)   ~257ns   3       72
Validate (fields only, ORU_R01)      ~2.8us   11      64
Validate (full, ORU_R01)             ~3.1us   13      104
Validate (full, ADT_A01)             ~815ns   8       88

Validation allocations are dominated by the ValidationResult and internal validator struct. On valid messages, no location strings are constructed. The low allocation counts are achieved by operating on raw []byte (avoiding Unescape), deferring string construction to error paths, and using the allocation-free m[string(b)] map lookup form for table lookups.

Writing

Benchmark           Time    Allocs  Bytes
WriteMessage MLLP   ~12ns   0       0
WriteMessage raw    ~9ns    0       0

Writer writes are zero-allocation when the Writer is reused. The bufio.Writer batches the framing bytes and payload into a single syscall.

ACK Generation

Benchmark                 Time     Allocs  Bytes
Ack (ADT^A01)             ~475ns   3       192
Ack with 3 ERR segments   ~830ns   14      768

The 3 allocations in basic ACK are: the time.Time.Format() string, the output []byte buffer, and the ackBuilder struct. Buffer size is pre-calculated to avoid growing. WithErrors adds allocations for the []errData slice, Escape calls on code/description fields, and appendERL calls for location conversion.

Transform

Benchmark                        Time     Allocs  Bytes
Transform (4 changes, ORU_R01)   ~1.3us   6       2200

Transform allocations: workBuf data buffer copy, []segBound slice, and ParseMessage at the end (3 allocs). Sub-field splicing via spliceField uses offset-based navigation with a single make([]byte) per call. Field extension uses spliceExtend to write separator bytes + value directly into the work buffer (zero intermediate allocations). Copy reads from the work buffer without clearing the source, so it adds no allocations beyond the value snapshot.

Builder

Benchmark                        Time     Allocs  Bytes
Builder (10 Set calls + Build)   ~1.0us   13      784

Builder allocations: MessageBuilder struct, initial workBuf data/segs, Escape calls on values containing no delimiters (fast-path returns input), spliceField result allocations for sub-field targeting, and ParseMessage in Build(). The spliceExtend path avoids an allocation per field extension by writing separator gaps directly into the work buffer.

Gotchas

Index Conventions Are Mixed

The API mixes 0-based and 1-based indexing per HL7 convention. This is the most likely source of off-by-one bugs when modifying the code:

Method                  Indexing                                  Why
Field(n)                0-based (0 = segment type, 1+ = fields)   HL7 convention: MSH-1 is the first field
Rep(n)                  0-based                                   Repetitions are naturally 0-indexed
Component(n)            1-based                                   HL7 convention: PID-5.1 is the first component
SubComponent(n)         1-based                                   HL7 convention: PID-3.1.1 is the first subcomponent
Location.SegmentIndex   0-based                                   OBX(0) is the first OBX segment
Location.Repetition     0-based                                   PID-3[0] is the first repetition
Location.Component      1-based (0 = not specified)               Matches HL7 convention
Location.SubComponent   1-based (0 = not specified)               Matches HL7 convention

Internally, nthSlice is always 0-based. Callers in field.go and component.go subtract 1 from 1-based indices before passing to nthSlice.

nil vs Empty Slice from nthSlice

nthSlice returns nil for out-of-range (meaning "not found") vs an empty []byte for a present-but-empty piece between delimiters (e.g., ||). Every caller depends on this distinction — nil triggers the "return zero value" path, while an empty slice creates a valid-but-empty Field/Component/etc.

MSH-2 Always Allocates on .String()

MSH-2 contains ^~\&, which includes \ — the escape character. Calling .String() on MSH-2 (or any field containing \) always hits the Unescape slow path and allocates a new byte slice. This accounts for 2 of the 5 allocations in the parse+access-all-fields benchmark. This is expected and correct.

Reader Raw Mode Is Fragile

readRaw() in reader.go reconstructs the bufio.Reader via io.MultiReader when it needs to push back an MSH boundary that belongs to the next message. This is the most complex and fragile code in the library. Changes here should be accompanied by thorough testing with multi-message raw streams.
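The push-back technique can be sketched in isolation. The helper names below are hypothetical and the example uses an in-memory stream; it only demonstrates rebuilding a bufio.Reader with io.MultiReader so that already-consumed bytes are read again first.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// newReader wraps a string in a bufio.Reader for the demonstration.
func newReader(s string) *bufio.Reader { return bufio.NewReader(strings.NewReader(s)) }

// take reads up to n bytes from br.
func take(br *bufio.Reader, n int) []byte {
	buf := make([]byte, n)
	m, _ := io.ReadFull(br, buf)
	return buf[:m]
}

// pushBack returns a reader that yields pending first and then whatever
// is still unread in br (including bytes br has already buffered).
func pushBack(br *bufio.Reader, pending []byte) *bufio.Reader {
	return bufio.NewReader(io.MultiReader(bytes.NewReader(pending), br))
}

// rest drains br and returns what was left.
func rest(br *bufio.Reader) string {
	b, _ := io.ReadAll(br)
	return string(b)
}

func main() {
	br := newReader("PID|1\rMSH|x\r")
	got := take(br, 9) // reads "PID|1\rMSH": three bytes too many
	// The trailing "MSH" belongs to the next message, so push it back.
	br = pushBack(br, got[6:])
	fmt.Printf("%q then %q\n", got[:6], rest(br))
}
```

Each push-back adds one more wrapper layer around the original reader, which is part of why this path deserves careful multi-message testing.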

Validation Group Matching Uses Snapshot/Restore

When matching repeating groups in matchGroupElem, the validator probes for additional group iterations by calling matchElements and checking whether any segments were consumed (newPos == start means none were). If the probe doesn't consume segments, any issues generated during the probe (e.g., "required segment missing" for required elements within the group) are spurious and must be rolled back. The snapshotIssues/restoreIssues mechanism handles this by truncating the issues slice back to its pre-probe state. Removing this mechanism will cause false-positive validation errors on messages with optional/repeating groups.
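A reduced sketch of the probe-and-roll-back idea, using a group that consists of a single required segment type. Only the names snapshotIssues and restoreIssues come from the text; the validator layout, matchOnce, matchRepeating, and the issue code are assumptions for this example.

```go
package main

import "fmt"

type Issue struct{ Code, Location string }

type validator struct{ issues []Issue }

// snapshotIssues remembers how many issues exist before a probe.
func (v *validator) snapshotIssues() int { return len(v.issues) }

// restoreIssues truncates the slice back to a previous snapshot.
func (v *validator) restoreIssues(n int) { v.issues = v.issues[:n] }

// matchOnce matches one group iteration: a single required segment.
// It records an issue and returns pos unchanged when the segment is absent.
func (v *validator) matchOnce(segs []string, pos int, want string) int {
	if pos < len(segs) && segs[pos] == want {
		return pos + 1
	}
	v.issues = append(v.issues, Issue{Code: "REQUIRED_SEGMENT", Location: want})
	return pos
}

// matchRepeating matches one mandatory iteration, then probes for more.
func (v *validator) matchRepeating(segs []string, pos int, want string) int {
	pos = v.matchOnce(segs, pos, want) // issues from this pass are real
	for {
		snap := v.snapshotIssues()
		newPos := v.matchOnce(segs, pos, want)
		if newPos == pos {
			// The probe consumed nothing, so its issues are spurious.
			v.restoreIssues(snap)
			return pos
		}
		pos = newPos
	}
}

func main() {
	v := &validator{}
	end := v.matchRepeating([]string{"OBX", "OBX", "NTE"}, 0, "OBX")
	fmt.Println(end, len(v.issues)) // 2 0
}
```

Without the restoreIssues call, the final failed probe would leave a "required segment missing" issue behind even though both OBX segments matched.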

Test Data Uses LF Line Endings

The .hl7 files in testdata/ use LF (\n) line endings, not the CR (\r) mandated by the HL7 spec. The parser handles this transparently, but it can be confusing when inspecting test data. If adding new test data files, either line ending works.