@olabusayoT
Contributor

  • Implement logic to calculate prefix, infix, and postfix separator alignments and lengths in sequence terms.
  • Add new alignment logic and test cases covering initiator alignment, prefixed length elements, terminator alignment, and value MTA (for optimization) behavior.
  • Improve the alignment calculation by excluding prior alignment from the contentStart, since alignmentApprox is a good starting point for the contentStartAlignment.
  • Update schemas and TDML files to include test data for the new alignment scenarios.

DAFFODIL-2295, DAFFODIL-3056, DAFFODIL-3057

Member

@stevedlawrence left a comment

This alignment stuff is very complicated and difficult to make sense of. I think this is the right approach to fixing the current issues, though some tweaks are needed.

We might also want to consider whether there's a different approach, or some refactoring, that would be easier to make sense of in a future update.

case Some(s: SequenceTermBase) if s.hasSeparator =>
import SeparatorPosition.*
s.separatorPosition match {
case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits)
Member

MTA is an alignment thing, so I think these MTA approx functions want to return AlignmentMultipleOf instead of LengthApprox.

Contributor

Not sure that's right.

An MTA "region", like any region, is a part of the data stream that has a length. So it depends: are we computing the length of something, computing the alignment before or after it, or asking what its required alignment is?

Member

The separatorPrefixMTAApprox val and the other MTAApprox vals are calculating where the MTA region needs to align to, not how big the region is.

This way we can look at the approximate end alignment (we should probably call this approximate position) of whatever came before the MTA region and determine whether the MTA region is known to already be aligned and so can be excluded.

And then, based on the previous alignment and the MTA, we can calculate where that MTA will have put us. For example, we might know it just put us on a byte boundary, but we might know more and know that it actually put us on a 2-byte boundary, allowing for better optimizations if a later element needs to be 2-byte aligned.

Contributor

Gotcha. I think you are correct.

I think some of the complexity in this code could be reduced by rigorously naming the various things to avoid confusion.

Being clear about lengthApprox vs. positionApprox vs. alignmentRequirement would go a long way. We're overusing the term "alignment" here to mean both position and requirement.
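One way to picture that naming discipline is to give each notion its own type, so the compiler rejects mix-ups. This is only a sketch: the types `LengthApprox`, `PositionApprox`, and `AlignmentRequirement` below are hypothetical and are not Daffodil's actual classes, and folding a length into a position with gcd is an assumption based on the gcd-style math discussed in this review.

```scala
// Hypothetical types for illustration only; not Daffodil's actual API.

// How many bits a region occupies: known only to be a multiple of nBits.
final case class LengthApprox(nBits: Long)

// What an item demands: it must start on a multiple of nBits (at least 1).
final case class AlignmentRequirement(nBits: Long) {
  require(nBits >= 1, "an alignment requirement must be at least 1 bit")
}

// Where we are in the data stream: known only to be a multiple of nBits.
final case class PositionApprox(nBits: Long) {
  // True when this position is statically known to satisfy the requirement.
  def satisfies(req: AlignmentRequirement): Boolean = nBits % req.nBits == 0

  // Advancing by a length leaves us on a multiple of the gcd of the two.
  def advance(len: LengthApprox): PositionApprox =
    PositionApprox(BigInt(nBits).gcd(BigInt(len.nBits)).toLong)
}
```

With separate types, an expression such as `position.satisfies(length)` fails to compile, which is the kind of confusion the renaming is meant to prevent.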

Member

Definitely. The naming could definitely be improved.

Adding more variables might also be helpful. This document has great diagrams of all the different parts that are considered in our alignment:

https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/

Some of our current variables kind of mush multiple boxes together, which can make it difficult to tease apart what's going on and how well we actually match the document.

It might make sense to have a variable that calculates the approximate start of every one of the boxes, with matching names (e.g. alignFillApproximateStart, initiatorMTAApproximateStart). It adds more variables, but might make things clearer.

And that might simplify other logic that tries to statically compile out different grams. For example, the gram guards can become something like:

val needsAlignFillGram = alignFillApproxStart % alignment != 0

val needsInitiatorMTAGram = initiatorMTAApproxStart % initiator.encoding.mandatoryTextAlignment != 0
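As a rough, runnable illustration of those guards: the sketch below uses a simplified stand-in for `AlignmentMultipleOf` and made-up start and requirement values. The helper `needsGram` is hypothetical and not part of the PR.

```scala
// Simplified stand-in for the approximation type in this PR (an assumption).
final case class AlignmentMultipleOf(nBits: Long) {
  def %(that: AlignmentMultipleOf): Long = nBits % that.nBits
}

// Hypothetical helper: a gram is needed unless the approximate start is
// statically known to already be a multiple of the required alignment.
def needsGram(approxStart: AlignmentMultipleOf, required: AlignmentMultipleOf): Boolean =
  approxStart % required != 0

// Assumed values: the start is only known to be byte aligned, but the
// element requires 4-byte alignment, so the align-fill gram must stay.
val alignFillApproxStart = AlignmentMultipleOf(8)
val needsAlignFillGram = needsGram(alignFillApproxStart, AlignmentMultipleOf(32)) // true

// Assumed values: the start is known to be 2-byte aligned and the initiator
// encoding only needs byte alignment, so the MTA gram can be compiled out.
val initiatorMTAApproxStart = AlignmentMultipleOf(16)
val needsInitiatorMTAGram = needsGram(initiatorMTAApproxStart, AlignmentMultipleOf(8)) // false
```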

private lazy val separatorLengthApprox = this.optLexicalParent match {
case Some(s: SequenceTermBase) if s.hasSeparator =>
getEncodingLengthApprox(s)
case _ => LengthMultipleOf(0)
Member

Instead of LengthMultipleOf(0), it might be clearer to make this LengthExact(0). I imagine all the math ends up the same so it probably doesn't really matter, but it feels a bit odd to say something has a length that is a multiple of zero.
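A small sketch of why the math should come out the same, assuming a length is folded into an alignment with gcd (as the gcd-based `*` in this diff suggests). The types below are simplified stand-ins rather than the real Daffodil definitions, and BigInt's gcd is used here for self-containment.

```scala
// Simplified, hypothetical stand-ins for the approximation types.
sealed trait LengthApprox { def nBits: Long }
final case class LengthExact(nBits: Long) extends LengthApprox
final case class LengthMultipleOf(nBits: Long) extends LengthApprox

final case class AlignmentMultipleOf(nBits: Long) {
  // Assumption: adding a length keeps only the gcd of the two bit counts.
  // gcd(n, 0) is n, so a zero length leaves the alignment unchanged.
  def +(len: LengthApprox): AlignmentMultipleOf =
    AlignmentMultipleOf(BigInt(nBits).gcd(BigInt(len.nBits)).toLong)
}

val start = AlignmentMultipleOf(16)
val viaExact = start + LengthExact(0)         // still AlignmentMultipleOf(16)
val viaMultiple = start + LengthMultipleOf(0) // still AlignmentMultipleOf(16)
```

Under that assumption the choice between the two is about readability only.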

// To combine AlignmentMultipleOfs, use *
def *(that: AlignmentMultipleOf) = AlignmentMultipleOf(Math.gcd(nBits, that.nBits))
def %(that: AlignmentMultipleOf) = nBits % that.nBits

Member

Some of these changes are making me wonder if we need a def +(that: AlignmentMultipleOf) that has slightly different logic than gcd.

I believe the intention of * was to combine alignments that could all happen at the same point in the data (e.g. a choice of elements, optional sibling elements).

But in the new code, we are taking an existing approximate alignment, and potentially adding a new alignment to it, which is slightly different.

For example, say we are currently at 2-byte alignment (AlignmentMultipleOf(16)) and we are adding an element that needs to be 1-byte aligned (AlignmentMultipleOf(8)). In this case, we do not need to perform any alignment here because we are already byte aligned (we are actually 2-byte aligned), and our logic handles that correctly.

But the contentStartAlignment of that new element still really wants to be 2-byte aligned, so it should remain AlignmentMultipleOf(16). But right now, using *, its alignment will be AlignmentMultipleOf(8) since gcd(16,8) => 8. So we actually have less information about our true alignment, and could miss out on future optimizations.

So it feels like a + operation might want to be something like

def +(that: AlignmentMultipleOf) = if (this.nBits % that.nBits == 0) this else that

The idea is that if our new alignment (that) evenly divides our existing alignment (this), then no alignment is actually needed, and we can keep our more accurate current alignment. Otherwise we use the new alignment.
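To make the 2-byte/1-byte example concrete, here is a minimal sketch comparing the gcd-based * with the + proposed above. `AlignmentMultipleOf` here is a simplified stand-in rather than the real class, and BigInt's gcd is used in place of whatever gcd helper the actual code calls.

```scala
// Simplified stand-in (an assumption), with the proposed + next to the existing *.
final case class AlignmentMultipleOf(nBits: Long) {
  // Existing behavior: the common alignment of alternatives at the same point.
  def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
    AlignmentMultipleOf(BigInt(nBits).gcd(BigInt(that.nBits)).toLong)

  // Proposed behavior: keep the current alignment when it already satisfies
  // `that`; otherwise alignment is performed and `that` is all we know.
  def +(that: AlignmentMultipleOf): AlignmentMultipleOf =
    if (this.nBits % that.nBits == 0) this else that
}

val twoByte = AlignmentMultipleOf(16)
val oneByte = AlignmentMultipleOf(8)

val combined = twoByte * oneByte  // AlignmentMultipleOf(8): the 2-byte knowledge is lost
val added = twoByte + oneByte     // AlignmentMultipleOf(16): the 2-byte knowledge is kept
val realigned = oneByte + twoByte // AlignmentMultipleOf(16): alignment had to be performed
```

Note that + is not commutative in this sketch: the left operand is the position we already have and the right operand is the requirement being added.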

case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits)
case Postfix => LengthMultipleOf(0)
}
case _ => LengthMultipleOf(0)
Member

I think this really wants to be AlignmentApprox(1) (MTA is alignment after all so using length isn't quite right) and then with the new + operator mentioned above it will work as expected. This is because 1 will always evenly divide the existing alignment so the existing alignment will be used. Same with the other MTA things.

In fact, I think AlignmentApprox(0) never wants to be used except for the root element, since it's kind of a 1-based thing.

private lazy val priorAlignmentWithLeadingSkipApprox: AlignmentMultipleOf = {
priorAlignmentApprox + leadingSkipApprox
val priorAlignmentWithSeparatorApprox = priorAlignmentApprox
+ separatorPrefixMTAApprox
Member

Note that with the new + operator, this does want to stay a + and not use the old * operator, since this isn't combining potential alignments; it is taking an existing alignment and adding to it.

}
* initiatorMTAApprox
+ initiatorLengthApprox
val csa = (leftFramingApprox + prefixLengthElementLength) * valueMTAApprox
Member

I think this also wants to use the new + operator.

contentStartAlignment + (elementSpecifiedLengthApprox + trailingSkipApprox)
contentStartAlignment + elementSpecifiedLengthApprox
}
val cea = res * terminatorMTAApprox
Member

Wants to be the new + operator

// to terminator MTA/length and our trailing skip to get where it would leave off were
// each the actual last child.
val lastApproxesFinal = lastApproxes.map {
_ * terminatorMTAApprox
Member

This wants to be a +

}
Assert.invariant(lastApproxesFinal.nonEmpty)
val res = lastApproxesFinal.reduce {
_ * _
Member

I think right here is the only place we really want to use the * operator. The goal of this is to combine a bunch of different alignments that could happen at the same place due to choice or optional elements, so this finds the common alignment they all have.
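For illustration, a minimal sketch of that reduction, again with a simplified stand-in for `AlignmentMultipleOf` and made-up branch alignments (three alternatives whose ends are known to be multiples of 32, 48, and 16 bits):

```scala
// Simplified stand-in (an assumption); * is the gcd-based combination.
final case class AlignmentMultipleOf(nBits: Long) {
  def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
    AlignmentMultipleOf(BigInt(nBits).gcd(BigInt(that.nBits)).toLong)
}

// Assumed example: any one of these could be the actual last child, so the
// only safe claim is the alignment that all of them share.
val lastApproxesFinal = Seq(
  AlignmentMultipleOf(32),
  AlignmentMultipleOf(48),
  AlignmentMultipleOf(16)
)

val res = lastApproxesFinal.reduce(_ * _) // AlignmentMultipleOf(16)
```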

prefixElem.elementLengthInBitsEv.optConstant.get.get
) + prefixLengthElementLength
} else {
getEncodingLengthApprox(prefixElem)
Member

Prefix elements don't necessarily have an encoding. They could just be binary numbers. And I think it's possible for prefix lengths to have lengthKind="prefixed", in which case this would have to be LengthMultipleOf(lengthUnits) or something?
