Single-package Go library (package hl7) for parsing HL7 version 2.x messages in ER7 (pipe-delimited) format. Zero external dependencies. Requires Go 1.23+ for language features.
Design goals:
- Low-latency parsing. Consumers reading only MSH-9 should not pay the cost of parsing OBX segments. Parsing is lazy: ParseMessage splits segments on the segment terminator (\r, with \n and \r\n also accepted), and field/component/subcomponent boundaries are scanned on each access.
- Minimal allocations. Sub-message types (Field, Repetition, Component, Subcomponent) are lightweight value types (~32 bytes each: raw []byte + delims Delimiters). There are no internal caches or memoization. Parsing allocates only inside ParseMessage itself (buffer copy, segment slice, and the Message struct).
- Spec-compliant HL7v2 parsing. Delimiter extraction from MSH-1/MSH-2, MSH field numbering quirks, null vs empty semantics, escape sequence processing, MLLP framing, and batch/file structure all follow the HL7 v2.5.1 specification.
- Ergonomic chained access. Out-of-range access returns zero values (not errors), enabling patterns like seg.Field(9).Component(1).String() without intermediate nil checks.
- General-purpose. Not tied to specific message types (ADT, ORU, etc.). Intended to be consumed by a larger application that interprets message content.
- Safe for concurrent use. Messages are immutable after parsing: all types are either value types or hold only read-only []byte slices into the owned buffer, with no mutable state or caches. Concurrent reads require no synchronization.
Non-goals:
- Full serialization or Marshal API. The library does not provide a general Marshal function. Message construction is handled by MessageBuilder (from scratch) and Transform (modify existing), both of which produce a *Message via ParseMessage. There are no per-field setters on parsed messages.
- Built-in schema definitions. The library does not ship with HL7v2 segment, data type, or table definitions. Users provide their own Schema struct to msg.Validate(). This keeps the library general-purpose and avoids bundling version-specific definitions.
- No field location constants. HL7 field positions (e.g., PID-5.1) are not stable across HL7 v2 versions or vendor implementations. The library does not provide named constants for terser-style location strings; callers define their own.
- No predicate-based segment filtering. Methods like SegmentsOfType, SegmentsWhere, or combinator functions (And, Or) were considered and rejected. The plain Segments() loop with an inline if or switch is idiomatic Go, immediately readable without library knowledge, and handles all filtering cases in a single pass. The abstraction does not eliminate domain knowledge (callers still decide which types and fields matter); it only restructures where that logic lives. The API surface cost exceeds the ergonomic benefit.
```sh
go build ./...                                 # Compile
go test ./... -v                               # Run all tests
go test -bench=. -benchmem                     # Benchmarks
go test -fuzz=FuzzParseMessage -fuzztime=30s   # Fuzz testing
go vet ./...                                   # Static analysis
```

```
hl7/
  go.mod          # module hl7, go 1.25.7, zero dependencies
  doc.go          # Package-level documentation and examples
  error.go        # Sentinel errors and ParseError type
  delimiters.go   # Delimiters struct, extraction, validation, scan helpers
  escape.go       # Unescape() with zero-alloc fast path
  component.go    # Component and Subcomponent value types
  field.go        # Field and Repetition value types
  segment.go      # Segment type with MSH/BHS/FHS special-case handling
  message.go      # Message type, ParseMessage(), segment iterators
  accessor.go     # Terser-style location parser, Location type, Value type, Get()
  charset.go      # ValueDecoder type; DecodeString on Field/Repetition/Component/Subcomponent/Value
  reader.go       # io.Reader wrapper with MLLP and raw mode support
  writer.go       # io.Writer wrapper with MLLP and raw mode support
  ack.go          # ACK message generation (AckCode, AckOption, Message.Ack, WithErrors)
  batch.go        # Batch (BHS/BTS) and File (FHS/FTS) parsing; BatchBuilder for constructing batches
  transform.go    # Transform engine, Change types, workBuf, delimiter re-encoding
  builder.go      # MessageBuilder for from-scratch message construction
  schema.go       # Schema types for validation (MessageDef, SegmentDef, etc.)
  validate.go     # Validation engine (structure + content validation)
  examples/       # Runnable example programs
    builder/main.go   # Construct a message from scratch and write it
    full/main.go      # End-to-end workflow: read, validate, transform, write
  testdata/       # Golden test files (adt_a01.hl7, oru_r01.hl7, etc.)
  *_test.go       # Unit tests, benchmarks, fuzz test
```
The file listing follows the dependency graph from the bottom up: error.go and delimiters.go are leaf dependencies, and message.go and the files listed after it build on everything listed before them. transform.go provides the byte-splicing engine; builder.go wraps it for from-scratch construction. schema.go defines data types only; validate.go depends on everything.
After every change, review doc.go and update it if the change affects anything it documents.
The central design decision is that sub-message types do not cache parsed results. Every call to Field(n), Rep(n), Component(n), or SubComponent(n) re-scans the raw bytes for the n-th delimiter. This was an intentional tradeoff:
- Pro: Eliminates 10 heap allocations per message that were previously needed for the []Field, []Repetition, and []Component slices created on first access.
- Pro: All sub-message types become pure value types (two fields: raw []byte + delims Delimiters): trivially copyable, with no pointer indirection and no mutation.
- Con: Repeated access to the same field re-scans each time. For HL7 messages (segments typically <200 bytes), each scan costs ~10-20ns, which is negligible compared to the ~100-200ns per heap allocation saved.
- Measured impact: Parse + access every field of every segment in a 695-byte ORU_R01 message: 5 allocations / 1,140 bytes. Selective access (3 Get() calls): 6 allocations / 96 bytes / ~330ns.
The scan helpers in delimiters.go power all access:
- nthSlice(data, delim, n) returns the n-th (0-based) delimited piece, or nil if out of range.
- nthRange(data, delim, n) returns the byte offset range [start, end) of the n-th piece (used by spliceField for offset-based navigation without intermediate slices).
- countDelimited(data, delim) returns the number of pieces (always >= 1, even for empty data).
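The helpers below are a minimal sketch written from the descriptions above, not copied from delimiters.go, so edge-case handling may differ. They show how one IndexByte loop yields both offsets and slices, and how nil (out of range) stays distinct from a present-but-empty piece.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthRange returns the byte offsets [start, end) of the n-th (0-based)
// piece of data split on delim, and false if that piece does not exist.
func nthRange(data []byte, delim byte, n int) (start, end int, ok bool) {
	if n < 0 {
		return 0, 0, false
	}
	for i := 0; i < n; i++ {
		j := bytes.IndexByte(data[start:], delim)
		if j < 0 {
			return 0, 0, false
		}
		start += j + 1
	}
	if j := bytes.IndexByte(data[start:], delim); j >= 0 {
		return start, start + j, true
	}
	return start, len(data), true
}

// nthSlice returns the n-th piece, or nil when it is out of range. Given
// non-nil data, a piece that exists but is empty (as in "||") comes back
// as a non-nil, zero-length slice.
func nthSlice(data []byte, delim byte, n int) []byte {
	start, end, ok := nthRange(data, delim, n)
	if !ok {
		return nil
	}
	return data[start:end:end] // cap limited so appends cannot clobber the next piece
}

// countDelimited returns the number of pieces, which is always at least 1.
func countDelimited(data []byte, delim byte) int {
	return bytes.Count(data, []byte{delim}) + 1
}

func main() {
	seg := []byte("PID|1||12345^^^MRN")
	fmt.Printf("%q\n", nthSlice(seg, '|', 3))
	fmt.Println(nthSlice(seg, '|', 2) != nil, nthSlice(seg, '|', 9) == nil)
	fmt.Println(countDelimited(seg, '|'), countDelimited(nil, '|'))
}
```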
Message — owns the byte buffer, stores []Segment internally
Segment — raw []byte slice into Message buffer; uses pointer receivers to maintain identity with parent slice
Field — value type, scans raw bytes for ~ (repetition separator)
Repetition — value type, scans raw bytes for ^ (component separator)
Component — value type, scans raw bytes for & (subcomponent separator)
Subcomponent — leaf value type, holds raw bytes
Segment uses pointer receivers to maintain identity with the Message.segments slice when passing segment references. All other types use value receivers.
ParseMessage copies the input bytes into an owned buffer. All raw []byte slices throughout the hierarchy are sub-slices of this single buffer. The caller can freely reuse or discard their input after calling ParseMessage.
Segments() returns []Segment — a slice reference to the message's internal segment storage. The slice shares the Message's backing array (zero-copy), but callers should not modify it.
To access specific segment types, iterate and filter by seg.Type():

```go
for _, seg := range msg.Segments() {
	if seg.Type() == "OBX" {
		// Process OBX segment
	}
}
```
msg.Validate(schema) checks a parsed message against a user-provided Schema and returns a *ValidationResult containing a Valid bool and a slice of Issue structs. Validation runs in three phases, each gated by the presence of relevant definitions in the schema:
Phase 1 — Structure validation (schema.Messages): Looks up the MessageDef via MSH-9.3 (or MSH-9.1 + "_" + MSH-9.2 as fallback). Uses recursive descent to match the message's segment sequence against the Element tree, checking segment presence, order, and cardinality (Min/Max). Groups use a snapshot/restore mechanism to roll back spurious issues when probing for additional group iterations that don't consume segments.
Phase 2 — Content validation (schema.Segments, schema.DataTypes, schema.Tables): Iterates all segments, looks up SegmentDef by type, and validates each defined field for: exact value assertion, required presence, max length, repetition cardinality, primitive format (DT, TM, DTM/TS, NM, SI), composite component structure via DataTypeDef, and coded table values via TableDef. Value assertions (FieldDef.Value, ComponentDef.Value) are checked first; when a value doesn't match, remaining checks (format, table, length) are skipped. After declarative field checks, FieldDef.Check runs per field (skipped on empty/null), then SegmentDef.Check runs per segment occurrence.
Phase 3 — Custom message checks (schema.Checks): Runs each MessageCheckFunc in order, passing the full *Message. Useful for cross-segment business rules that cannot be expressed declaratively.
Key performance decisions:
- Raw bytes, not strings. Format validators operate on []byte to avoid Unescape() allocations. Escape sequences in date/numeric fields are pathological and would correctly fail validation on the raw representation.
- Deferred location strings. Location strings (e.g., "PID-3.1") are only constructed on error paths via the buildFieldLoc/buildCompLoc helpers. On valid messages (the common case), no location strings are allocated.
- No fmt dependency. All string formatting uses strconv.Itoa and string concatenation.
- Compiler-optimized table lookups. isInTable indexes the table map with string(b), a form the Go compiler recognizes and performs without allocating a temporary string.
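A hypothetical sketch of the raw-byte style: isDT checks HL7's YYYY[MM[DD]] date shape without unescaping, and isInTable indexes a map with string(b). The map[string]struct{} table type is an assumption standing in for TableDef, and this isDT range-checks month and day but not days per month.

```go
package main

import "fmt"

// isDigits reports whether b is non-empty and contains only ASCII digits.
// An escape character such as `\` therefore fails the check on raw bytes.
func isDigits(b []byte) bool {
	if len(b) == 0 {
		return false
	}
	for _, c := range b {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// isDT checks the DT shape YYYY[MM[DD]] directly on raw field bytes.
// Simplified: month must be 01-12 and day 01-31; days per month are not checked.
func isDT(b []byte) bool {
	if (len(b) != 4 && len(b) != 6 && len(b) != 8) || !isDigits(b) {
		return false
	}
	if len(b) >= 6 {
		if m := int(b[4]-'0')*10 + int(b[5]-'0'); m < 1 || m > 12 {
			return false
		}
	}
	if len(b) == 8 {
		if d := int(b[6]-'0')*10 + int(b[7]-'0'); d < 1 || d > 31 {
			return false
		}
	}
	return true
}

// isInTable indexes the map with string(b); the Go compiler recognizes this
// form and does not allocate a string for the lookup.
func isInTable(table map[string]struct{}, b []byte) bool {
	_, ok := table[string(b)]
	return ok
}

func main() {
	sex := map[string]struct{}{"F": {}, "M": {}, "U": {}}
	fmt.Println(isDT([]byte("20240229")), isDT([]byte(`2024\T\`)))
	fmt.Println(isInTable(sex, []byte("F")), isInTable(sex, []byte("X")))
}
```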
Schema types are defined in schema.go:
- Schema: four optional maps (Messages, Segments, DataTypes, Tables) plus a Checks slice for message-level custom validators.
- MessageDef: tree of Element (segment or group with Min/Max cardinality).
- SegmentDef: list of FieldDef plus an optional Check SegmentCheckFunc for cross-field validation.
- FieldDef: index, data type, required, repeating, max length, table, value, plus an optional Check FieldCheckFunc.
- DataTypeDef: list of ComponentDef for composite types (CX, XPN, etc.), each with an optional value assertion.
- TableDef: map of valid coded values.
- FieldCheckFunc, SegmentCheckFunc, MessageCheckFunc: custom validator function types, all returning []Issue. The struct fields that hold them are tagged json:"-".
Transform, TransformWith, and MessageBuilder share a common byte-splicing engine in transform.go. The central data structure is workBuf — a contiguous []byte buffer with segment boundary tracking ([]segBound). All mutations operate by splicing bytes directly in the buffer and adjusting segment offsets.
workBuf operations:
- splice(segIdx, start, end, data) replaces w.data[start:end] with data and updates segment bounds.
- spliceExtend(segIdx, at, gaps, sep, value) inserts gaps separator bytes followed by value at offset at, growing w.data in place. This avoids the intermediate []byte allocation that fieldExtension previously required.
- replaceField(segIdx, delims, fieldNum, value) locates the field byte range with fieldByteRange and splices, or extends the segment using fieldGaps + spliceExtend if the field doesn't exist.
- findSeg / createSeg locate or append segments by type.
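A simplified, hypothetical model of splice. Only the data buffer and segment bounds (start inclusive, end exclusive, terminator excluded) are modeled. It shows the key invariant: after one in-place move of the tail, the edited segment's end and every later segment's bounds shift by the same size delta.

```go
package main

import "fmt"

// segBound is one segment's byte range [start, end), terminator excluded.
type segBound struct{ start, end int }

// workBuf here is a hypothetical, stripped-down model of the real type.
type workBuf struct {
	data []byte
	segs []segBound
}

// splice replaces w.data[start:end], which lies inside segment segIdx, with
// repl. The tail is moved once in place, then bounds are shifted by delta.
func (w *workBuf) splice(segIdx, start, end int, repl []byte) {
	oldLen := len(w.data)
	delta := len(repl) - (end - start)
	if delta > 0 {
		w.data = append(w.data, make([]byte, delta)...) // grow; may reallocate
	}
	copy(w.data[start+len(repl):], w.data[end:oldLen]) // copy handles overlap
	copy(w.data[start:], repl)
	if delta < 0 {
		w.data = w.data[:oldLen+delta]
	}
	w.segs[segIdx].end += delta
	for i := segIdx + 1; i < len(w.segs); i++ {
		w.segs[i].start += delta
		w.segs[i].end += delta
	}
}

func main() {
	w := &workBuf{
		data: []byte("MSH|^~\\&|A\rPID|1||OLD\rOBX|1\r"),
		segs: []segBound{{0, 10}, {11, 21}, {22, 27}},
	}
	w.splice(1, 18, 21, []byte("NEW123")) // replace PID-3
	for _, s := range w.segs {
		fmt.Printf("%q\n", w.data[s.start:s.end])
	}
}
```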
spliceField modifies a field's bytes at a specific hierarchical position (repetition, component, subcomponent). It navigates to the target byte range using nthRange offsets at each hierarchy level, computing padding delimiter counts when extending, and builds the result with a single make([]byte).
MessageBuilder wraps workBuf seeded with a minimal MSH skeleton (instead of a source message). Set(location, value) calls the same applyValueAtLocation path used by Transform. Build() calls ParseMessage on the buffer to produce an immutable *Message.
BatchBuilder constructs BHS/BTS-wrapped batch files. NewBatchBuilder(opts...) seeds the builder with DefaultDelimiters(). SetHeader(fieldNum, value) stores plain-text header field values (escaped at Build time). Add(msg) appends messages. Build() pre-calculates total output size, allocates once, and writes BHS + messages + BTS. BTS-1 is set to the message count. Reset() clears messages while preserving header fields. BHS-7 defaults to time.Now() at build time unless set via SetHeader(7, ...). BHS and FHS segments use the same MSH-style field numbering (field 1 = separator, field 2 = encoding chars, field 3+ = normal fields) via isHeaderSeg in segment.go.
Change types — Replace, Null, Omit, Move, Copy — are sealed interface implementations (Change has an unexported method). applyOneChange dispatches on the concrete type.
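A minimal sketch of the sealed-interface pattern. The payload fields (plain strings) are hypothetical stand-ins: the library's change types carry real locations and values, and its applyOneChange edits a workBuf rather than returning a description.

```go
package main

import "fmt"

// Change is sealed: because isChange is unexported, only code inside this
// package can add new implementations.
type Change interface{ isChange() }

// Hypothetical payloads for illustration.
type replaceChange struct{ loc, value string }
type nullChange struct{ loc string }
type omitChange struct{ loc string }
type moveChange struct{ from, to string }
type copyChange struct{ from, to string }

func (replaceChange) isChange() {}
func (nullChange) isChange()    {}
func (omitChange) isChange()    {}
func (moveChange) isChange()    {}
func (copyChange) isChange()    {}

// applyOneChange dispatches on the concrete type. This sketch only describes
// each change instead of editing a buffer.
func applyOneChange(c Change) string {
	switch c := c.(type) {
	case replaceChange:
		return "replace " + c.loc + " with " + c.value
	case nullChange:
		return "null " + c.loc
	case omitChange:
		return "omit " + c.loc
	case moveChange:
		return "move " + c.from + " to " + c.to
	case copyChange:
		return "copy " + c.from + " to " + c.to
	default:
		return "unsupported change"
	}
}

func main() {
	changes := []Change{
		replaceChange{"PID-5.1", "DOE"},
		nullChange{"PID-8"},
		moveChange{"PID-3", "PID-4"},
	}
	for _, c := range changes {
		fmt.Println(applyOneChange(c))
	}
}
```

Because no type outside the package can satisfy Change, the type switch covers every possible implementation; the default branch only catches a nil Change.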
Delimiter re-encoding — reencodeData performs a single-pass conversion of all bytes from source to destination delimiters, resolving escape sequences to their literal source values and re-escaping if they collide with destination delimiters.
Location in accessor.go represents a specific position in an HL7 message hierarchy. ParseLocation parses terser-style strings (e.g., "PID-3[1].4.2") into a Location. Location.String() implements fmt.Stringer and produces the inverse terser representation. Both are used by the accessor (Get), transform, and builder subsystems.
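A hypothetical parser for the terser syntax, assuming the shape SEG(index)-field[rep].comp.sub with the segment index and repetition optional. It uses the same conventions as Location (segment index and repetition 0-based, component and subcomponent 1-based with 0 meaning "not specified"), but the names location, parseLocation, and bracketed are illustrative, not the accessor.go API.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// location is a hypothetical stand-in for Location.
type location struct {
	Segment                 string
	SegmentIndex, Field     int
	Repetition              int
	Component, SubComponent int
}

var errBadLocation = errors.New("invalid location")

// bracketed splits "NAME(3)" into "NAME" and 3 for the given bracket pair.
// Without the opening byte it returns s unchanged and 0.
func bracketed(s string, opening, closing byte) (string, int, error) {
	i := strings.IndexByte(s, opening)
	if i < 0 {
		return s, 0, nil
	}
	if s[len(s)-1] != closing {
		return "", 0, errBadLocation
	}
	n, err := strconv.Atoi(s[i+1 : len(s)-1])
	if err != nil || n < 0 {
		return "", 0, errBadLocation
	}
	return s[:i], n, nil
}

func parseLocation(s string) (location, error) {
	var loc location
	segPart, rest, ok := strings.Cut(s, "-")
	if !ok {
		return location{}, errBadLocation
	}
	seg, idx, err := bracketed(segPart, '(', ')')
	if err != nil || len(seg) != 3 {
		return location{}, errBadLocation
	}
	loc.Segment, loc.SegmentIndex = seg, idx

	parts := strings.Split(rest, ".") // field[rep], component, subcomponent
	if len(parts) > 3 {
		return location{}, errBadLocation
	}
	field, rep, err := bracketed(parts[0], '[', ']')
	if err != nil {
		return location{}, errBadLocation
	}
	if loc.Field, err = strconv.Atoi(field); err != nil || loc.Field < 1 {
		return location{}, errBadLocation
	}
	loc.Repetition = rep
	for i, p := range parts[1:] {
		n, err := strconv.Atoi(p)
		if err != nil || n < 1 {
			return location{}, errBadLocation
		}
		if i == 0 {
			loc.Component = n
		} else {
			loc.SubComponent = n
		}
	}
	return loc, nil
}

// String implements fmt.Stringer, omitting parts that are zero.
func (l location) String() string {
	var b strings.Builder
	b.WriteString(l.Segment)
	if l.SegmentIndex > 0 {
		b.WriteString("(" + strconv.Itoa(l.SegmentIndex) + ")")
	}
	b.WriteString("-" + strconv.Itoa(l.Field))
	if l.Repetition > 0 {
		b.WriteString("[" + strconv.Itoa(l.Repetition) + "]")
	}
	if l.Component > 0 {
		b.WriteString("." + strconv.Itoa(l.Component))
		if l.SubComponent > 0 {
			b.WriteString("." + strconv.Itoa(l.SubComponent))
		}
	}
	return b.String()
}

func main() {
	loc, err := parseLocation("PID-3[1].4.2")
	fmt.Println(loc.Field, loc.Repetition, loc.Component, loc.SubComponent, err)
	fmt.Println(loc)
}
```

This String() emits the shortest form, so an explicit zero index such as PID-3[0] parses to the same location as PID-3 and prints as PID-3.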
Value in accessor.go is the return type of Get(). It is a lightweight value type (raw []byte + delims Delimiters) with String(), Bytes(), IsEmpty(), IsNull(), and HasValue() — the same interface as Field, Repetition, Component, and Subcomponent. A zero Value (nil raw bytes) is returned for invalid or not-found locations.
ValueDecoder in charset.go is a func([]byte) ([]byte, error) that converts post-unescape bytes to a target encoding (typically UTF-8). DecodeString(ValueDecoder) is defined on Value, Field, Repetition, Component, and Subcomponent. When the decoder is nil, DecodeString is equivalent to String() with no extra allocation.
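A hypothetical sketch of the decoder contract. decodeString is a free function standing in for the DecodeString methods, its (string, error) result is an assumption, and latin1ToUTF8 is only an example decoder; the input is assumed to be already unescaped.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ValueDecoder matches the shape described above: unescaped bytes in,
// bytes in the target encoding out.
type ValueDecoder func([]byte) ([]byte, error)

// decodeString falls back to a plain conversion when dec is nil.
func decodeString(b []byte, dec ValueDecoder) (string, error) {
	if dec == nil {
		return string(b), nil
	}
	out, err := dec(b)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// latin1ToUTF8 maps every ISO-8859-1 byte to the Unicode code point with
// the same value.
func latin1ToUTF8(b []byte) ([]byte, error) {
	out := make([]byte, 0, len(b)*2)
	for _, c := range b {
		out = utf8.AppendRune(out, rune(c))
	}
	return out, nil
}

func main() {
	raw := []byte{'J', 'o', 's', 0xE9} // "José" encoded as Latin-1
	s, err := decodeString(raw, latin1ToUTF8)
	fmt.Println(s, err)
}
```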
Delimiters are extracted per-message from MSH-1 (field separator) and MSH-2 positions 1-4 (component, repetition, escape, subcomponent). The standard set |^~\& is never assumed — any valid delimiter set is accepted. Validation rejects zero bytes, CR/LF, and duplicate characters.
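A minimal sketch of per-message delimiter extraction and validation. It assumes MSH-2 carries all four encoding characters; the function and error names are hypothetical. The example uses a deliberately non-standard delimiter set.

```go
package main

import (
	"errors"
	"fmt"
)

// Delimiters holds the five characters described above.
type Delimiters struct {
	Field, Component, Repetition, Escape, SubComponent byte
}

var errInvalidDelimiters = errors.New("invalid delimiters")

// extractDelimiters reads MSH-1 and the first four characters of MSH-2.
func extractDelimiters(msg []byte) (Delimiters, error) {
	if len(msg) < 8 || string(msg[:3]) != "MSH" {
		return Delimiters{}, errInvalidDelimiters
	}
	d := Delimiters{
		Field:        msg[3],
		Component:    msg[4],
		Repetition:   msg[5],
		Escape:       msg[6],
		SubComponent: msg[7],
	}
	set := [5]byte{d.Field, d.Component, d.Repetition, d.Escape, d.SubComponent}
	for i, c := range set {
		if c == 0 || c == '\r' || c == '\n' {
			return Delimiters{}, errInvalidDelimiters
		}
		for _, prev := range set[:i] {
			if prev == c {
				return Delimiters{}, errInvalidDelimiters // duplicates are ambiguous
			}
		}
	}
	return d, nil
}

func main() {
	d, err := extractDelimiters([]byte("MSH#$*!@#SENDAPP"))
	fmt.Printf("%c %c %c %c %c %v\n", d.Field, d.Component, d.Repetition, d.Escape, d.SubComponent, err)
}
```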
MSH, BHS, and FHS segments share a special field numbering because field 1 is the field separator character itself (it does not sit between delimiters like a normal field):
| Index | Content | Notes |
|---|---|---|
| Field(0) | `MSH` / `BHS` / `FHS` | Segment type (same as all segments) |
| Field(1) | `\|` | The field separator character itself |
| Field(2) | `^~\&` | Encoding characters (literal, not parsed further) |
| Field(3+) | Normal fields | Parsed normally from the bytes after the encoding characters |
This is implemented in segment.go via isHeaderSeg() (true for MSH, BHS, FHS) and mshField(), which handles the three special cases before falling through to nthSlice for fields 3+.
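A hypothetical sketch of the header-segment special case, named headerField so it is not mistaken for the library's mshField. It reuses a compact nthSlice and shows why a naive split of the whole segment is off by one for these segments.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthSlice returns the 0-based n-th piece of data split on delim, or nil.
func nthSlice(data []byte, delim byte, n int) []byte {
	if n < 0 {
		return nil
	}
	for ; n > 0; n-- {
		i := bytes.IndexByte(data, delim)
		if i < 0 {
			return nil
		}
		data = data[i+1:]
	}
	if i := bytes.IndexByte(data, delim); i >= 0 {
		return data[:i]
	}
	return data
}

// headerField returns field n of an MSH, BHS, or FHS segment.
func headerField(seg []byte, n int) []byte {
	if len(seg) < 4 {
		return nil
	}
	switch n {
	case 0:
		return seg[:3] // segment type
	case 1:
		return seg[3:4] // the field separator itself
	}
	// After "MSH|", the encoding characters are piece 0, field 3 is piece 1, ...
	return nthSlice(seg[4:], seg[3], n-2)
}

func main() {
	seg := []byte(`MSH|^~\&|SENDAPP|SENDFAC`)
	for n := 0; n <= 4; n++ {
		fmt.Printf("Field(%d) = %q\n", n, headerField(seg, n))
	}
	// A naive split is off by one: its field 2 is really MSH-3.
	fmt.Printf("naive field 2 = %q\n", nthSlice(seg, '|', 2))
}
```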
Per HL7v2.5.1 Section 2.5.2, when a segment exceeds the practical length limit, it is split at a field boundary and an ADD (Addendum) segment immediately follows it. Each ADD field is a distinct additional field of the preceding segment — ADD field 1 becomes the next field after the last field in the preceding segment, ADD field 2 becomes the field after that, and so on. No content is concatenated within a field across the segment boundary.
ParseMessage merges ADD segments during the initial buffer copy via mergeADD(). The function strips the segment terminator and the ADD type marker (\rADD, \nADD, or \r\nADD) but retains the field separator that follows, so the ADD fields are appended to the preceding segment with correct field boundaries. The merged buffer is always shorter than or equal to the input, so the allocation size is unchanged.
A fast path checks for the ADD pattern with bytes.Contains before scanning. Messages without ADD segments pay only the cost of two bytes.Contains calls and then fall through to a plain copy.
ADD segments without a field separator (e.g., ADD\r with no fields) are not merged and remain as standalone segments. This matches the spec requirement that ADD must carry continuation data.
ADD segments that immediately follow MSH are also left as standalone segments. Per the spec, ADD extends data segments, not the message header. This also enables Concatenate to correctly reassemble cross-message continuations where page N+1 begins with MSH followed by ADD (continuing the last segment of page N): because ADD is still visible as a standalone segment in the parsed page, Concatenate preserves it in the assembled buffer, and the final ParseMessage call then merges it into the correct preceding segment.
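A hypothetical single-pass sketch of the merge rules above, assuming fs is the message's field separator. It is not the library's mergeADD: for simplicity its slow path drops blank lines, which splitSegments would skip anyway.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeADD folds "<terminator>ADD|..." into the preceding segment, keeping
// the field separator, unless the preceding segment is MSH or ADD has no fields.
func mergeADD(in []byte, fs byte) []byte {
	// Fast path: no segment starts with ADD, so copy verbatim.
	if !bytes.Contains(in, []byte("\rADD")) && !bytes.Contains(in, []byte("\nADD")) {
		return append([]byte(nil), in...)
	}
	out := make([]byte, 0, len(in))
	var pending []byte // terminator of the last written segment, not yet emitted
	wrote, prevIsMSH := false, false
	for i := 0; i < len(in); {
		j := i
		for j < len(in) && in[j] != '\r' && in[j] != '\n' {
			j++
		}
		k := j // k ends the terminator: \r, \n, or \r\n
		if k < len(in) && in[k] == '\r' {
			k++
		}
		if k < len(in) && in[k] == '\n' {
			k++
		}
		seg := in[i:j]
		i = k
		if len(seg) == 0 {
			continue // blank line
		}
		if wrote && !prevIsMSH && len(seg) > 3 && bytes.HasPrefix(seg, []byte("ADD")) && seg[3] == fs {
			out = append(out, seg[3:]...) // drop terminator and "ADD", keep "|field|..."
		} else {
			out = append(out, pending...)
			out = append(out, seg...)
			wrote, prevIsMSH = true, bytes.HasPrefix(seg, []byte("MSH"))
		}
		pending = in[j:k]
	}
	return append(out, pending...)
}

func main() {
	in := []byte("MSH|^~\\&|A\rOBX|1|TX|first part\rADD|second part\r")
	fmt.Printf("%q\n", mergeADD(in, '|'))
}
```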
Per the HL7 v2 specification:
- Empty (||): Field omitted, no value present. On update, means "preserve existing value."
- Null (|""|): Field explicitly set to null. On update, means "clear existing value."
Field.IsNull() checks for exactly two double-quote bytes. Field.IsEmpty() checks for zero-length raw bytes. Field.HasValue() is !IsEmpty() && !IsNull().
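A minimal sketch of the three predicates on raw bytes, plus a hypothetical applyUpdate helper that encodes the update semantics (empty preserves, null clears).

```go
package main

import "fmt"

// isEmpty: the field was omitted (||).
func isEmpty(raw []byte) bool { return len(raw) == 0 }

// isNull: the field is exactly two double quotes (|""|).
func isNull(raw []byte) bool { return len(raw) == 2 && raw[0] == '"' && raw[1] == '"' }

// hasValue: neither empty nor null.
func hasValue(raw []byte) bool { return !isEmpty(raw) && !isNull(raw) }

// applyUpdate is hypothetical: empty keeps the existing value, null clears
// it, anything else replaces it.
func applyUpdate(existing, update []byte) []byte {
	switch {
	case isEmpty(update):
		return existing
	case isNull(update):
		return nil
	default:
		return update
	}
}

func main() {
	for _, f := range []string{"", `""`, "DOE"} {
		raw := []byte(f)
		fmt.Printf("%-5q empty=%v null=%v value=%v -> %q\n",
			f, isEmpty(raw), isNull(raw), hasValue(raw), applyUpdate([]byte("OLD"), raw))
	}
}
```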
Escape processing happens only when .String() is called (not during parsing). Unescape() has a zero-allocation fast path: if bytes.IndexByte(data, escapeChar) returns -1, the input slice is returned directly.
Supported sequences (per HL7 v2.5.1 Section 2.7):
| Sequence | Result | Implementation |
|---|---|---|
| `\F\` | Field separator | Replaced with delims.Field |
| `\S\` | Component separator | Replaced with delims.Component |
| `\T\` | Subcomponent separator | Replaced with delims.SubComponent |
| `\R\` | Repetition separator | Replaced with delims.Repetition |
| `\E\` | Escape character | Replaced with delims.Escape |
| `\H\` | Start highlighting | Consumed (no output) |
| `\N\` | End highlighting | Consumed (no output) |
| `\Xhh..\` | Hex-encoded data | Decoded to bytes |
| `\Z..\` | Locally defined | Passed through verbatim |
| `\C..\` | Single-byte charset | Passed through verbatim |
| `\M..\` | Multi-byte charset | Passed through verbatim |
| `\.xx\` | Formatted text | Passed through verbatim |
| Unknown | Unrecognized sequence | Passed through verbatim |
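A hypothetical sketch of Unescape covering the sequences in the table. The Delimiters field names come from the table; passing malformed \X..\ hex and unterminated sequences through verbatim is an assumption about edge cases the table does not cover.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

type Delimiters struct {
	Field, Component, Repetition, Escape, SubComponent byte
}

func unescape(data []byte, d Delimiters) []byte {
	if bytes.IndexByte(data, d.Escape) < 0 {
		return data // fast path: no escape character, no allocation
	}
	out := make([]byte, 0, len(data))
	for i := 0; i < len(data); i++ {
		if data[i] != d.Escape {
			out = append(out, data[i])
			continue
		}
		end := bytes.IndexByte(data[i+1:], d.Escape)
		if end < 0 {
			return append(out, data[i:]...) // unterminated sequence
		}
		seq := data[i+1 : i+1+end] // text between the two escape characters
		whole := data[i : i+end+2] // including both escape characters
		i += end + 1               // the loop increment then steps past the closing escape
		if len(seq) == 1 {
			switch seq[0] {
			case 'F':
				out = append(out, d.Field)
				continue
			case 'S':
				out = append(out, d.Component)
				continue
			case 'T':
				out = append(out, d.SubComponent)
				continue
			case 'R':
				out = append(out, d.Repetition)
				continue
			case 'E':
				out = append(out, d.Escape)
				continue
			case 'H', 'N':
				continue // highlighting markers produce no output
			}
		}
		if len(seq) > 1 && seq[0] == 'X' {
			buf := make([]byte, hex.DecodedLen(len(seq)-1))
			if _, err := hex.Decode(buf, seq[1:]); err == nil {
				out = append(out, buf...)
				continue
			}
		}
		out = append(out, whole...) // \Z, \C, \M, \.xx, bad hex, and unknown sequences
	}
	return out
}

func main() {
	d := Delimiters{Field: '|', Component: '^', Repetition: '~', Escape: '\\', SubComponent: '&'}
	fmt.Printf("%s\n", unescape([]byte(`Smith\T\Jones \X4142\ \H\bold\N\`), d))
}
```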
The HL7 spec mandates \r (0x0D) as the segment terminator. In practice, systems emit \r\n or \n alone. The parser accepts all three. splitSegments in message.go handles \r, \n, and \r\n pairs, skipping empty lines.
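A minimal sketch of the terminator handling, assuming the parser treats \r and \n alike and discards empty lines. Under that rule a \r\n pair needs no special case: it simply produces an empty line that is skipped. The [][]byte return type is a simplification of []Segment.

```go
package main

import "fmt"

// splitSegments splits buf on \r or \n and skips empty lines.
func splitSegments(buf []byte) [][]byte {
	var segs [][]byte
	start := 0
	for i := 0; i <= len(buf); i++ {
		if i == len(buf) || buf[i] == '\r' || buf[i] == '\n' {
			if i > start {
				segs = append(segs, buf[start:i])
			}
			start = i + 1
		}
	}
	return segs
}

func main() {
	for _, seg := range splitSegments([]byte("MSH|^~\\&|A\r\nPID|1\nOBX|1\r\rNTE|x")) {
		fmt.Printf("%q\n", seg)
	}
}
```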
The Minimum Lower Layer Protocol wraps each message:
0x0B (VT) + message_bytes + 0x1C (FS) + 0x0D (CR)
Reader supports three modes:
- ModeMLLP: strict MLLP framing
- ModeRaw: MSH-boundary detection for unframed streams
- ModeAuto: peeks at the first byte to detect framing
The trailing CR after 0x1C is tolerated if missing (some implementations omit it).
ParseBatch() handles BHS/BTS-wrapped message groups. ParseFile() handles FHS/FTS-wrapped batch groups. Both are tolerant: header and trailer segments are optional, and messages not wrapped in BHS/BTS are placed in an implicit batch. BHS/FHS segments extract delimiters using the same MSH-style encoding character layout.
- Types:
  - Message, Segment, Field, Repetition, Component, Subcomponent: named for the HL7 hierarchy level they represent.
  - Location: terser-style position with a ParseLocation/String() round-trip.
  - MessageBuilder, BuilderOption: from-scratch message construction.
  - Change (sealed interface), replaceChange, nullChange, omitChange, moveChange, copyChange: transform operations.
  - Schema, MessageDef, SegmentDef, FieldDef, DataTypeDef, ComponentDef, TableDef: schema input types.
  - ValidationResult, Issue, Severity: validation output types.
  - FieldCheckFunc, SegmentCheckFunc, MessageCheckFunc: custom validator function types.
  - AckCode, AckOption, WithErrors: ACK generation types.
- Errors: Sentinel errors are prefixed with Err (e.g., ErrMessageTooShort, ErrNoMSHSegment). All are package-level var declarations using errors.New. Validation issue codes are Code-prefixed string constants (e.g., CodeRequiredField, CodeInvalidFormat).
- Unexported helpers: Lowercase descriptive names (nthSlice, countDelimited, splitSegments, mshField, normalField).
- Method naming follows Go conventions: String() for the fmt.Stringer interface, Bytes() for raw access, IsX() for boolean predicates.
- Exported types and functions have doc comments per Go convention.
- Comments explain why, not what, except where the HL7 spec demands non-obvious behavior (e.g., MSH field numbering).
- Internal helpers have brief comments only where the logic is not self-evident.
- No inline comments on straightforward code.
- Parse-time errors: ParseMessage, ParseBatch, ParseFile, and Reader.ReadMessage return errors for structural failures.
- Access-time zero values: Field(n), Rep(n), Component(n), SubComponent(n) return empty values for out-of-range indices. This enables chained access without error checking at each level.
- ParseError wraps a sentinel error with position and context for detailed diagnostics.
- Validation results: msg.Validate() never returns an error; it always returns a *ValidationResult. Issues are collected in a slice of Issue structs with Severity, Location, Code, and Description. The Valid field is false if any issue has SeverityError.
- Unit tests in *_test.go files mirror the source file they test (e.g., segment_test.go tests segment.go).
- Golden tests in message_test.go use real-world-style messages from testdata/.
- Fuzz test in fuzz_test.go exercises parse + full traversal + accessor on arbitrary input.
- Benchmarks in message_test.go cover parse-only, parse+access, minimal message, and accessor patterns. Writer benchmarks in writer_test.go cover MLLP and raw write modes. ACK benchmarks in ack_test.go cover ACK generation. Builder benchmarks in builder_test.go cover from-scratch message construction. Transform benchmarks in transform_test.go cover multi-change transforms on real-world messages.
- Transform tests in transform_test.go cover replace, null, omit, move, copy, field-level vs component-level ordering, delimiter conversion (including escape sequence resolution and re-escaping), segment extension, multiple segment occurrences, and last-write-wins semantics.
- Builder tests in builder_test.go cover basic set, multiple segments, component/subcomponent targeting, repetitions, null values, escaping round-trip, custom delimiters, build reusability, and error cases.
- Validation tests in validate_test.go cover structure matching (segments, groups, cardinality), field content (required, length, cardinality), data type format (DT, TM, DTM, NM, SI), table lookups, nil inputs, and integration with realistic schemas. Benchmarks test structure-only, fields-only, full, empty-schema, and ADT patterns.
- Tests use package hl7 (not hl7_test) to access unexported fields for direct struct construction.
- Always write tests first.
- Do not make a code change unless a failing test has been observed first.
- Tests must be comprehensive, covering both positive and negative outcomes.
- Test coverage must not drop below 96%.
- Always write benchmarks for each public function.
- Benchmarks may be written after a function is introduced, but before the change that introduces it is considered complete.
- Always run tests after a completed change.
- A failing test must be addressed before moving forward.
- Always run benchmarks after a completed change.
- A performance regression (additional latency, allocations, or memory) must be addressed before moving forward, unless an exception is granted for an expected regression.
Benchmarked on Apple M3 Pro:
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| ParseMessage (695B ORU_R01) | ~834ns | 3 | 1088 |
| Parse + access all fields | ~8.2us | 5 | 1140 |
| ParseMessage (minimal MSH) | ~94ns | 3 | 144 |
| Get() accessor (3 lookups) | ~329ns | 6 | 96 |
The 3 allocations in ParseMessage are: the owned byte buffer copy, the []Segment slice, and the *Message struct itself. The 2 additional allocations in parse+access come from Unescape slow paths on fields containing the escape character (MSH-2 always contains \).
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| Validate (empty schema) | ~15ns | 1 | 32 |
| Validate (structure only, ORU_R01) | ~257ns | 3 | 72 |
| Validate (fields only, ORU_R01) | ~2.8us | 11 | 64 |
| Validate (full, ORU_R01) | ~3.1us | 13 | 104 |
| Validate (full, ADT_A01) | ~815ns | 8 | 88 |
Validation allocations are dominated by the ValidationResult and internal validator struct. On valid messages, no location strings are constructed. The low allocation counts are achieved by operating on raw []byte (avoiding Unescape), deferring string construction to error paths, and looking up table values with m[string(b)], which the Go compiler performs without allocating a temporary string.
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| WriteMessage MLLP | ~12ns | 0 | 0 |
| WriteMessage raw | ~9ns | 0 | 0 |
Writer writes are zero-allocation when the Writer is reused. The bufio.Writer batches the framing bytes and payload into a single syscall.
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| Ack (ADT^A01) | ~475ns | 3 | 192 |
| Ack with 3 ERR segments | ~830ns | 14 | 768 |
The 3 allocations in basic ACK are: the time.Time.Format() string, the output []byte buffer, and the ackBuilder struct. Buffer size is pre-calculated to avoid growing. WithErrors adds allocations for the []errData slice, Escape calls on code/description fields, and appendERL calls for location conversion.
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| Transform (4 changes, ORU_R01) | ~1.3us | 6 | 2200 |
Transform allocations: workBuf data buffer copy, []segBound slice, and ParseMessage at the end (3 allocs). Sub-field splicing via spliceField uses offset-based navigation with a single make([]byte) per call. Field extension uses spliceExtend to write separator bytes + value directly into the work buffer (zero intermediate allocations). Copy reads from the work buffer without clearing the source, so it adds no allocations beyond the value snapshot.
| Benchmark | Time | Allocs | Bytes |
|---|---|---|---|
| Builder (10 Set calls + Build) | ~1.0us | 13 | 784 |
Builder allocations: MessageBuilder struct, initial workBuf data/segs, Escape calls on values containing no delimiters (fast-path returns input), spliceField result allocations for sub-field targeting, and ParseMessage in Build(). The spliceExtend path avoids an allocation per field extension by writing separator gaps directly into the work buffer.
The API mixes 0-based and 1-based indexing per HL7 convention. This is the most likely source of off-by-one bugs when modifying the code:
| Method | Indexing | Why |
|---|---|---|
| Field(n) | 0-based (0 = segment type, 1+ = fields) | HL7 convention: MSH-1 is the first field |
| Rep(n) | 0-based | Repetitions are naturally 0-indexed |
| Component(n) | 1-based | HL7 convention: PID-5.1 is the first component |
| SubComponent(n) | 1-based | HL7 convention: PID-3.1.1 is the first subcomponent |
| Location.SegmentIndex | 0-based | OBX(0) is the first OBX segment |
| Location.Repetition | 0-based | PID-3[0] is the first repetition |
| Location.Component | 1-based (0 = not specified) | Matches HL7 convention |
| Location.SubComponent | 1-based (0 = not specified) | Matches HL7 convention |
Internally, nthSlice is always 0-based. Callers in field.go and component.go subtract 1 from 1-based indices before passing to nthSlice.
nthSlice returns nil for out-of-range (meaning "not found") vs an empty []byte for a present-but-empty piece between delimiters (e.g., ||). Every caller depends on this distinction — nil triggers the "return zero value" path, while an empty slice creates a valid-but-empty Field/Component/etc.
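A hypothetical sketch of the index translation. repetition and component are simplified stand-ins for Repetition and Component; they show the n-1 conversion to nthSlice's 0-based index, and how a nil result maps to the zero value while an empty piece stays present.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthSlice returns the 0-based n-th piece of data split on delim, or nil.
func nthSlice(data []byte, delim byte, n int) []byte {
	if n < 0 {
		return nil
	}
	for ; n > 0; n-- {
		i := bytes.IndexByte(data, delim)
		if i < 0 {
			return nil
		}
		data = data[i+1:]
	}
	if i := bytes.IndexByte(data, delim); i >= 0 {
		return data[:i]
	}
	return data
}

type delims struct{ component, subComponent byte }

type repetition struct {
	raw []byte
	d   delims
}

type component struct {
	raw []byte
	d   delims
}

// Component takes a 1-based index; n < 1 or out of range yields the zero value.
func (r repetition) Component(n int) component {
	p := nthSlice(r.raw, r.d.component, n-1)
	if p == nil {
		return component{}
	}
	return component{raw: p, d: r.d}
}

// SubComponent applies the same 1-based rule one level down.
func (c component) SubComponent(n int) []byte {
	return nthSlice(c.raw, c.d.subComponent, n-1)
}

func (c component) String() string { return string(c.raw) }

func main() {
	cx := repetition{raw: []byte("12345^^^HOSP&1.2.3&ISO"), d: delims{'^', '&'}}
	fmt.Println(cx.Component(1), string(cx.Component(4).SubComponent(2)))
	fmt.Println(cx.Component(2).raw != nil, cx.Component(9).raw == nil)
}
```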
MSH-2 contains ^~\&, which includes \ — the escape character. Calling .String() on MSH-2 (or any field containing \) always hits the Unescape slow path and allocates a new byte slice. This accounts for 2 of the 5 allocations in the parse+access-all-fields benchmark. This is expected and correct.
readRaw() in reader.go reconstructs the bufio.Reader via io.MultiReader when it needs to push back an MSH boundary that belongs to the next message. This is the most complex and fragile code in the library. Changes here should be accompanied by thorough testing with multi-message raw streams.
When matching repeating groups in matchGroupElem, the validator probes for additional group iterations by calling matchElements and checking if any segments were consumed (newPos == start). If the probe doesn't consume segments, any issues generated during the probe (e.g., "required segment missing" for required elements within the group) are spurious and must be rolled back. The snapshotIssues/restoreIssues mechanism handles this by truncating the issues slice back to its pre-probe state. Removing this mechanism will cause false-positive validation errors on messages with optional/repeating groups.
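A simplified, hypothetical model of the rollback. Issue, validator, and matchElements are stand-ins (every group element is treated as required, and the Code string is made up); matchGroupRepeats shows the snapshot, probe, and truncate sequence.

```go
package main

import "fmt"

type Issue struct{ Code, Location string }

type validator struct{ issues []Issue }

func (v *validator) snapshotIssues() int { return len(v.issues) }
func (v *validator) restoreIssues(n int) { v.issues = v.issues[:n] }

// matchElements treats every element of group as a required segment and
// records an issue for each one missing at pos.
func (v *validator) matchElements(segs []string, pos int, group []string) int {
	for _, want := range group {
		if pos < len(segs) && segs[pos] == want {
			pos++
			continue
		}
		v.issues = append(v.issues, Issue{Code: "RequiredSegmentMissing", Location: want})
	}
	return pos
}

// matchGroupRepeats probes for further group iterations. A probe that
// consumes nothing produced only spurious issues, so they are rolled back.
func (v *validator) matchGroupRepeats(segs []string, pos int, group []string) int {
	for {
		snap := v.snapshotIssues()
		newPos := v.matchElements(segs, pos, group)
		if newPos == pos {
			v.restoreIssues(snap)
			return pos
		}
		pos = newPos
	}
}

func main() {
	v := &validator{}
	segs := []string{"MSH", "OBR", "OBX", "OBR", "OBX", "PID"}
	end := v.matchGroupRepeats(segs, 1, []string{"OBR", "OBX"})
	fmt.Println(end, len(v.issues))
}
```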
The .hl7 files in testdata/ use LF (\n) line endings, not the CR (\r) mandated by the HL7 spec. The parser handles this transparently, but it can be confusing when inspecting test data. If adding new test data files, either line ending works.