Add logic for separator/initiator/prefixed/terminator alignment #1604

olabusayoT · 2025-12-18T21:01:18Z

Implement logic to calculate prefix, infix, and postfix separator alignments and lengths in sequence terms.
Added new alignment logic/test cases covering initiator alignment, prefixed length elements, terminator alignment and value MTA (for optimization) behavior.
Enhanced alignment calculation by excluding prior alignment from the contentStart, as alignmentApprox is a good place to start the contentStartAlignment.
Updated schemas and TDML files to include test data for new alignment scenarios.

DAFFODIL-2295, DAFFODIL-3056, DAFFODIL-3057


stevedlawrence

This alignment logic is very complicated and difficult to make sense of. I think this is the right approach to fixing the current issues, though some tweaks are needed.

Though we might want to consider if there's a different approach, or maybe some refactoring that's easier to make sense of for a future update.

stevedlawrence · 2025-12-19T14:48:22Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+      case Some(s: SequenceTermBase) if s.hasSeparator =>
+        import SeparatorPosition.*
+        s.separatorPosition match {
+          case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits)


MTA is an alignment thing, so I think these MTA approx functions want to return AlignmentMultipleOf instead of LengthApprox.

Not sure that's right.

An MTA "region", like any region, is a part of the data stream that has a length. So it depends on whether we are computing the length of something, computing the alignment before or after it, or asking what its required alignment is.

The separatorPrefixMTAApprox val, and other MTAApprox vals are calculating where the MTA region needs to align to, not how big the region is.

This way we can look at approximate end alignment (we should probably call this approximate position) of whatever came before the MTA region and determine if the MTA region is known to already be aligned and so can be excluded.

And then based on the previous alignment and the MTA, we can calculate where that MTA will have put us. For example, we might know it just put us on a byte boundary, but we might know more and know that it actually put us on a 2-byte boundary, allowing for better optimizations if a later element needs to be 2-byte aligned.

Gotcha. I think you are correct.

I think some of the complexity in this code could be reduced by rigorously naming the various things to avoid confusion.

Being clear about lengthApprox vs positionApprox vs. alignmentRequirement would go a long ways. We're overusing the term "alignment" here to mean position and requirement.

Definitely. The naming could definitely be improved.

Adding more variables might also be helpful. This document has great diagrams of all the different parts that are considered in our alignment

https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/

Some of our current variables kind of mush multiple boxes together, which can make it difficult to tease apart what's going on and how well we actually match the document.

It might make sense to have a variable that calculates the approximate start of every one of the boxes, with matching names (e.g. alignFillApproximateStart, initiatorMTAApproximateStart). It adds more variables, but might make things clearer.

And that might simplify other logic that tries to statically compile out different grams. For example, the gram guards can become something like:

    val needsAlignFillGram = alignFillApproxStart % alignment != 0
    val needsInitiatorMTAGram = initiatorMTAApproxStart % initiator.encoding.mandatoryTextAlignment != 0
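To make the guard idea concrete, here is a minimal standalone sketch. It assumes a simplified `AlignmentMultipleOf` with only the `%` operator, and every value in `GramGuardSketch` is a hypothetical stand-in, not the real schema compiler state.

```scala
// Simplified stand-in for the real class; only the % operator is modeled.
final case class AlignmentMultipleOf(nBits: Long) {
  def %(that: AlignmentMultipleOf): Long = nBits % that.nBits
}

object GramGuardSketch {
  // Hypothetical approximate start positions and requirements, in bits.
  val alignFillApproxStart = AlignmentMultipleOf(16)   // known to start on a 2-byte boundary
  val alignment = AlignmentMultipleOf(8)               // term requires byte alignment
  val initiatorMTAApproxStart = AlignmentMultipleOf(4) // only known to be 4-bit aligned
  val initiatorMTA = AlignmentMultipleOf(8)            // initiator encoding needs byte alignment

  // A gram is needed only when the known start position cannot guarantee
  // the required alignment.
  val needsAlignFillGram = alignFillApproxStart % alignment != 0
  val needsInitiatorMTAGram = initiatorMTAApproxStart % initiatorMTA != 0
}
```

With these assumed values the align-fill gram can be compiled out, while the initiator MTA gram must stay.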

stevedlawrence · 2025-12-19T14:49:19Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+  private lazy val separatorLengthApprox = this.optLexicalParent match {
+    case Some(s: SequenceTermBase) if s.hasSeparator =>
+      getEncodingLengthApprox(s)
+    case _ => LengthMultipleOf(0)


Instead of LengthMultipleOf(0), it might be clearer to make this LengthExact(0). I imagine all the math ends up the same so it probably doesn't really matter, but it feels a bit odd to say something has a length that is a multiple of zero.

stevedlawrence · 2025-12-19T16:18:12Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+  // To combine AlignmentMultipleOfs, use *
  def *(that: AlignmentMultipleOf) = AlignmentMultipleOf(Math.gcd(nBits, that.nBits))
  def %(that: AlignmentMultipleOf) = nBits % that.nBits
+


Some of these changes are making me wonder if we need a def +(that: AlignmentMultipleOf) that has slightly different logic than gcd.

I believe the intention of * was to combine alignments that could all happen at the same point in data (e.g. a choice of elements, optional siblings elements).

But in the new code, we are taking an existing approximate alignment, and potentially adding a new alignment to it, which is slightly different.

For example, say we are currently at 2-byte alignment (AlignmentMultipleOf(16)) and we are adding an element that needs to be 1-byte aligned (AlignmentMultipleOf(8)). In this case, we do not need to perform any alignment here because we are already byte aligned (we are actually 2-byte aligned), and our logic handles that correctly.

But the contentStartAlignment of that new element still really wants to be 2-byte aligned, so it should remain AlignmentMultipleOf(16). But right now, using *, its alignment will be AlignmentMultipleOf(8) since gcd(16,8) => 8. So we actually have less information about our true alignment, and could miss out on future optimizations.

So it feels like a + operation might want to be something like

def +(that: AlignmentMultipleOf) = if (this.nBits % that.nBits == 0) this else that

The idea is that if our new alignment (that) evenly divides our existing alignment (this), then no alignment is actually needed, and we can keep our more accurate current alignment. Otherwise we use the new alignment.
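A minimal sketch of the difference between the two operators, assuming a simplified `AlignmentMultipleOf` with a local `gcd` helper standing in for the project's `Math.gcd`. The object name and the example values are hypothetical.

```scala
import scala.annotation.tailrec

object AlignmentOpsSketch {
  // Local helper; stands in for the gcd used by the real class.
  @tailrec
  def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

  final case class AlignmentMultipleOf(nBits: Long) {
    // Existing operator: combine alignments that can occur at the same point.
    def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
      AlignmentMultipleOf(gcd(nBits, that.nBits))

    // Proposed operator: apply a new alignment requirement to a known position.
    def +(that: AlignmentMultipleOf): AlignmentMultipleOf =
      if (this.nBits % that.nBits == 0) this else that
  }

  val current = AlignmentMultipleOf(16) // known 2-byte aligned
  val required = AlignmentMultipleOf(8) // next element needs 1-byte alignment

  val withStar = current * required // AlignmentMultipleOf(8): loses the 2-byte knowledge
  val withPlus = current + required // AlignmentMultipleOf(16): keeps it

  // When the requirement is stricter than what is known, the requirement wins.
  val realigned = AlignmentMultipleOf(8) + AlignmentMultipleOf(16) // AlignmentMultipleOf(16)
}
```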

stevedlawrence · 2025-12-19T16:25:06Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+          case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits)
+          case Postfix => LengthMultipleOf(0)
+        }
+      case _ => LengthMultipleOf(0)


I think this really wants to be AlignmentApprox(1) (MTA is alignment after all so using length isn't quite right) and then with the new + operator mentioned above it will work as expected. This is because 1 will always evenly divide the existing alignment so the existing alignment will be used. Same with the other MTA things.

In fact, I think AlignmentApprox(0) never wants to be used except for the root element, since it's kind of a 1-based thing.
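A small sketch of why 1 behaves as a "no MTA" value under the suggested + operator, and why 0 is awkward. It assumes a simplified `AlignmentMultipleOf` that models only that operator; the names in `MtaIdentitySketch` are hypothetical.

```scala
import scala.util.Try

object MtaIdentitySketch {
  // Simplified stand-in: only the suggested + operator is modeled.
  final case class AlignmentMultipleOf(nBits: Long) {
    def +(that: AlignmentMultipleOf): AlignmentMultipleOf =
      if (this.nBits % that.nBits == 0) this else that
  }

  val prior = AlignmentMultipleOf(16)
  val noMta = AlignmentMultipleOf(1)

  // 1 evenly divides any alignment, so the prior alignment is preserved.
  val afterNoMta = prior + noMta

  // In this sketch a zero alignment makes the modulo divide by zero,
  // so the attempt is wrapped in Try to show the failure.
  val withZero = Try(prior + AlignmentMultipleOf(0))
}
```

Here `afterNoMta` equals `prior`, and `withZero` is a `Failure` wrapping an `ArithmeticException`.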

stevedlawrence · 2025-12-19T16:29:26Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

  private lazy val priorAlignmentWithLeadingSkipApprox: AlignmentMultipleOf = {
-    priorAlignmentApprox + leadingSkipApprox
+    val priorAlignmentWithSeparatorApprox = priorAlignmentApprox
+      + separatorPrefixMTAApprox


Note that with the new + operator, this does want to stay a + and not use the old * operator. Since this isn't combining potential alignments, this is taking an existing alignment and adding to it.

stevedlawrence · 2025-12-19T16:41:11Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

-    }
+        * initiatorMTAApprox
+        + initiatorLengthApprox
+    val csa = (leftFramingApprox + prefixLengthElementLength) * valueMTAApprox


I think this also wants to use the new + operator.

stevedlawrence · 2025-12-19T16:42:12Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

-          contentStartAlignment + (elementSpecifiedLengthApprox + trailingSkipApprox)
+          contentStartAlignment + elementSpecifiedLengthApprox
        }
+        val cea = res * terminatorMTAApprox


Wants to be the new + operator

stevedlawrence · 2025-12-19T16:43:36Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+        // to terminator MTA/length and our trailing skip to get where it would leave off were
+        // each the actual last child.
+        val lastApproxesFinal = lastApproxes.map {
+          _ * terminatorMTAApprox


This wants to be a +

stevedlawrence · 2025-12-19T16:44:30Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+        }
+        Assert.invariant(lastApproxesFinal.nonEmpty)
+        val res = lastApproxesFinal.reduce {
+          _ * _


I think right here is the only place we really want to use the * operator. The goal of this is to combine a bunch of different alignments that could happen at the same place due to choice or optional elements, so this finds the common alignment they all have.
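As an illustration of that reduction, here is a sketch that assumes a simplified `AlignmentMultipleOf` with a gcd-based `*`. The three ending alignments are made-up values for hypothetical alternative last children.

```scala
import scala.annotation.tailrec

object CombineAlternativesSketch {
  @tailrec
  def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

  final case class AlignmentMultipleOf(nBits: Long) {
    // Combine alignments that may occur at the same point in the data.
    def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
      AlignmentMultipleOf(gcd(nBits, that.nBits))
  }

  // Hypothetical ending alignments, one per possible last child.
  val lastApproxesFinal = Seq(
    AlignmentMultipleOf(32),
    AlignmentMultipleOf(48),
    AlignmentMultipleOf(16)
  )

  // The only alignment guaranteed regardless of which alternative occurred.
  val res = lastApproxesFinal.reduce(_ * _) // AlignmentMultipleOf(16)
}
```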

stevedlawrence · 2025-12-19T16:51:30Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/AlignedMixin.scala

+                prefixElem.elementLengthInBitsEv.optConstant.get.get
+              ) + prefixLengthElementLength
+            } else {
+              getEncodingLengthApprox(prefixElem)


Prefix elements don't necessarily have an encoding. They could just be binary numbers. And I think it's possible for prefix lengths to have lengthKind="prefixed", in which case this would have to be LengthMultipleOf(lengthUnits) or something?

olabusayoT requested review from jadams-tresys, mbeckerle and stevedlawrence December 18, 2025 21:01

This was referenced Dec 18, 2025

Update alignment to handle terminator alignment #1601

Closed

Add support for separator alignment in sequences #1594

Closed

stevedlawrence requested changes Dec 19, 2025
