Add logic for separator/initiator/prefixed/terminator alignment #1604
- Implement logic to calculate prefix, infix, and postfix separator alignments and lengths in sequence terms.
- Added new alignment logic/test cases covering initiator alignment, prefixed length elements, terminator alignment and value MTA (for optimization) behavior.
- Enhanced alignment calculation by excluding prior alignment from the contentStart, as alignmentApprox is a good place to start the contentStartAlignment.
- Updated schemas and TDML files to include test data for new alignment scenarios.

DAFFODIL-2295, DAFFODIL-3056, DAFFODIL-3057
stevedlawrence
left a comment
This alignment stuff is very complicated and difficult to make sense of. I think this is the right approach to fixing the current issues, though some tweaks are needed.
Though we might want to consider if there's a different approach, or maybe some refactoring that's easier to make sense of for a future update.
| case Some(s: SequenceTermBase) if s.hasSeparator => | ||
| import SeparatorPosition.* | ||
| s.separatorPosition match { | ||
| case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits) |
MTA is an alignment thing, so I think these MTA approx functions want to return AlignmentMultipleOf instead of LengthApprox.
Not sure that's right.
An MTA "region", like any region, is a part of the data stream that has a length. So it depends on whether we are computing the length of something, computing the alignment before or after it, or asking what its required alignment is.
The separatorPrefixMTAApprox val and the other MTAApprox vals calculate where the MTA region needs to align to, not how big the region is.
This way we can look at the approximate end alignment (we should probably call this approximate position) of whatever came before the MTA region and determine if the MTA region is known to already be aligned and so can be excluded.
And then, based on the previous alignment and the MTA, we can calculate where that MTA will have put us. For example, we might know it just put us on a byte boundary, but we might know more and know that it actually put us on a 2-byte boundary, allowing for better optimizations if a later element needs to be 2-byte aligned.
Gotcha. I think you are correct.
I think some of the complexity in this code could be reduced by rigorously naming the various things to avoid confusion.
Being clear about lengthApprox vs. positionApprox vs. alignmentRequirement would go a long way. We're overusing the term "alignment" here to mean both position and requirement.
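One way to picture this naming split is to give each concept its own type, so the compiler rejects accidental mixing. This is only a sketch: `LengthApproxBits`, `PositionApproxBits`, `AlignmentRequirementBits`, and `needsFill` are hypothetical names made up for illustration, not existing Daffodil classes.

```scala
// Hypothetical types, not Daffodil's API. Each wraps a bit count,
// but each means something different.

// "This region's length is known to be a multiple of nBits."
final case class LengthApproxBits(nBits: Long)

// "We are known to be at a bit position that is a multiple of nBits."
final case class PositionApproxBits(nBits: Long)

// "The next region must start at a multiple of nBits." Assumes nBits >= 1.
final case class AlignmentRequirementBits(nBits: Long)

// A fill region (for example an MTA region) can be compiled out only when
// the known position already guarantees the requirement.
def needsFill(pos: PositionApproxBits, req: AlignmentRequirementBits): Boolean =
  pos.nBits % req.nBits != 0
```

With distinct types, a call such as `needsFill(someLength, someRequirement)` would fail to compile, which is the kind of confusion the stricter naming is meant to prevent.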
Definitely. The naming could definitely be improved.
Adding more variables might also be helpful. This document has great diagrams of all the different parts that are considered in our alignment:
https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
Some of our current variables kind of mush multiple boxes together, which can make it difficult to tease apart what's going on and how well we actually match the document.
It might make sense to have a variable that calculates the approximate start of every one of the boxes, with matching names (e.g. alignFillApproximateStart, initiatorMTAApproximateStart). It adds more variables, but might make things clearer.
And that might simplify other logic that tries to statically compile out different grams. For example, the gram guards can become something like:
val needsAlignFillGram = alignFillApproxStart % alignment != 0
val needsInitiatorMTAGram = initiatorMTAApproxStart % initiator.encoding.mandatoryTextAlignment != 0
| private lazy val separatorLengthApprox = this.optLexicalParent match { | ||
| case Some(s: SequenceTermBase) if s.hasSeparator => | ||
| getEncodingLengthApprox(s) | ||
| case _ => LengthMultipleOf(0) |
Instead of LengthMultipleOf(0), it might be clearer to make this LengthExact(0). I imagine all the math ends up the same, so it probably doesn't really matter, but it feels a bit odd to say something has a length that is a multiple of zero.
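A small model can illustrate why the math would come out the same. The classes below are simplified stand-ins that reuse the names from the diff; the rule inside `+` (the new alignment is the gcd of the current alignment and the length) is an assumption made for this sketch, not a copy of Daffodil's implementation.

```scala
// Simplified stand-ins for illustration only.
def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

sealed trait LengthApprox { def nBits: Long }
final case class LengthExact(nBits: Long) extends LengthApprox
final case class LengthMultipleOf(nBits: Long) extends LengthApprox

final case class AlignmentMultipleOf(nBits: Long) {
  // Assumed rule: after moving forward by a length, we are only known to be
  // at a multiple of gcd(current alignment, length).
  def +(len: LengthApprox): AlignmentMultipleOf =
    AlignmentMultipleOf(gcd(nBits, len.nBits))
}
```

Under that assumption, `AlignmentMultipleOf(16) + LengthExact(0)` and `AlignmentMultipleOf(16) + LengthMultipleOf(0)` both stay at `AlignmentMultipleOf(16)`, because gcd(16, 0) is 16. The choice between the two is then about readability rather than behavior.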
| // To combine AlignmentMultipleOfs, use * | ||
| def *(that: AlignmentMultipleOf) = AlignmentMultipleOf(Math.gcd(nBits, that.nBits)) | ||
| def %(that: AlignmentMultipleOf) = nBits % that.nBits | ||
|
|
Some of these changes are making me wonder if we need a def +(that: AlignmentMultipleOf) that has slightly different logic than gcd.
I believe the intention of * was to combine alignments that could all happen at the same point in the data (e.g. a choice of elements, optional sibling elements).
But in the new code, we are taking an existing approximate alignment, and potentially adding a new alignment to it, which is slightly different.
For example, say we are currently at 2-byte alignment (AlignmentMultipleOf(16)) and we are adding an element that needs to be 1-byte aligned (AlignmentMultipleOf(8)). In this case, we do not need to perform any alignment here because we are already byte aligned (we are actually 2-byte aligned), and our logic handles that correctly.
But the contentStartAlignment of that new element is still really 2-byte aligned, so it should remain AlignmentMultipleOf(16). Right now, using *, its alignment will be AlignmentMultipleOf(8) since gcd(16,8) => 8. So we actually have less information about our true alignment, and could miss out on future optimizations.
So it feels like a + operation might want to be something like
def +(that: AlignmentMultipleOf) = if (this.nBits % that.nBits == 0) this else that
The idea is that if our new alignment (that) evenly divides our existing alignment (this), then no alignment is actually needed, and we can keep our more accurate current alignment. Otherwise we use the new alignment.
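The difference between the two operators can be seen in a small self-contained sketch. The case class here is a stand-in that reuses the name from the diff, with a local `gcd` helper in place of the project's own; `twoByte` and `oneByte` are illustrative values.

```scala
def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

final case class AlignmentMultipleOf(nBits: Long) {
  // Combine alternatives that could occur at the same point in the data:
  // only their common divisor is guaranteed.
  def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
    AlignmentMultipleOf(gcd(nBits, that.nBits))

  // Proposed: apply a required alignment (that) on top of a known
  // alignment (this). If we already satisfy it, keep what we know.
  def +(that: AlignmentMultipleOf): AlignmentMultipleOf =
    if (this.nBits % that.nBits == 0) this else that
}

val twoByte = AlignmentMultipleOf(16)
val oneByte = AlignmentMultipleOf(8)
```

Here `twoByte * oneByte` gives `AlignmentMultipleOf(8)`, losing the 2-byte knowledge, while `twoByte + oneByte` keeps `AlignmentMultipleOf(16)`. In the other direction, `oneByte + twoByte` gives `AlignmentMultipleOf(16)`, since aligning to 16 bits is actually required there.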
| case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits) | ||
| case Postfix => LengthMultipleOf(0) | ||
| } | ||
| case _ => LengthMultipleOf(0) |
I think this really wants to be AlignmentApprox(1) (MTA is alignment after all so using length isn't quite right) and then with the new + operator mentioned above it will work as expected. This is because 1 will always evenly divide the existing alignment so the existing alignment will be used. Same with the other MTA things.
In fact, I think AlignmentApprox(0) never wants to be used except for the root element, since it's kind of a 1-based thing.
| private lazy val priorAlignmentWithLeadingSkipApprox: AlignmentMultipleOf = { | ||
| priorAlignmentApprox + leadingSkipApprox | ||
| val priorAlignmentWithSeparatorApprox = priorAlignmentApprox | ||
| + separatorPrefixMTAApprox |
Note that with the new + operator, this does want to stay a + and not use the old * operator. This isn't combining potential alignments; it is taking an existing alignment and adding to it.
| } | ||
| * initiatorMTAApprox | ||
| + initiatorLengthApprox | ||
| val csa = (leftFramingApprox + prefixLengthElementLength) * valueMTAApprox |
I think this also wants to use the new + operator.
| contentStartAlignment + (elementSpecifiedLengthApprox + trailingSkipApprox) | ||
| contentStartAlignment + elementSpecifiedLengthApprox | ||
| } | ||
| val cea = res * terminatorMTAApprox |
Wants to be the new + operator
| // to terminator MTA/length and our trailing skip to get where it would leave off were | ||
| // each the actual last child. | ||
| val lastApproxesFinal = lastApproxes.map { | ||
| _ * terminatorMTAApprox |
This wants to be a +
| } | ||
| Assert.invariant(lastApproxesFinal.nonEmpty) | ||
| val res = lastApproxesFinal.reduce { | ||
| _ * _ |
I think right here is the only place we really want to use the * operator. The goal of this is to combine a bunch of different alignments that could happen at the same place due to choice or optional elements, so this finds the common alignment they all have.
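As a rough illustration of that reduction, the sketch below folds a list of candidate ending alignments with a gcd-based `*`. The case class is a stand-in reusing the name from the diff, and the three values in `lastApproxes` are made up for the example.

```scala
def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

final case class AlignmentMultipleOf(nBits: Long) {
  // Only the common divisor of both alternatives is guaranteed.
  def *(that: AlignmentMultipleOf): AlignmentMultipleOf =
    AlignmentMultipleOf(gcd(nBits, that.nBits))
}

// Hypothetical ending alignments of three children, any of which
// might turn out to be the last one present in the data.
val lastApproxes =
  List(AlignmentMultipleOf(32), AlignmentMultipleOf(16), AlignmentMultipleOf(24))

// Whichever child is actually last, we end on a multiple of this.
val common = lastApproxes.reduce(_ * _)
```

For these values `common` is `AlignmentMultipleOf(8)`: gcd(32, 16) is 16, and gcd(16, 24) is 8.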
| prefixElem.elementLengthInBitsEv.optConstant.get.get | ||
| ) + prefixLengthElementLength | ||
| } else { | ||
| getEncodingLengthApprox(prefixElem) |
Prefix elements don't necessarily have an encoding. They could just be binary numbers. And I think it's possible for prefix lengths to have lengthKind="prefixed", in which case this would have to be LengthMultipleOf(lengthUnits) or something?