Reassociate reduction chains for long Min/Max#2
Reassociate reduction chains for long Min/Max#2galderz wants to merge 51 commits intorwestrel:masterfrom
Conversation
| * a typical loop would generate additional IR nodes. Hence, the test uses | ||
| * a custom while loop with non-inlined helper methods. | ||
| */ | ||
| public class TestReductionReassociationForAssociativeAdds { |
There was a problem hiding this comment.
I'm unsure about the name for this test. As you can see, I've originally used it to test all associative adds but which it doesn't do now, but I thought it could be useful to leave it like this for future enhancements? We know we would want to do this for min/max for Integer, and I think additions for long/integer might also work well.
|
To help review, this is what the generated IR test looks like (having it formatted after test execution): |
src/hotspot/share/opto/loopnode.cpp
Outdated
| return progress; | ||
| } | ||
|
|
||
| static AddNode* build_min_max(int opcode, Node* a, Node* b, PhaseIdealLoop* phase) { |
There was a problem hiding this comment.
There's already a MinMaxNode::build_min_max_long()
There was a problem hiding this comment.
But maybe it's not convenient to use here.
There was a problem hiding this comment.
Hmmmm, I could use it for sure. I would just need to have a opcode branch around it though, but then I could just reuse what is_associative does below
src/hotspot/share/opto/loopnode.cpp
Outdated
| } | ||
| } | ||
|
|
||
| static bool is_associative(Node* node) { |
There was a problem hiding this comment.
A comment that says only long min and max are supported at this point?
There was a problem hiding this comment.
Yeah I can add a comment. The more I think about it, maybe it can be renamed to can_reassociate.
src/hotspot/share/opto/loopnode.cpp
Outdated
| return node->Opcode() == Op_MinL || node->Opcode() == Op_MaxL; | ||
| } | ||
|
|
||
| static Node* reassociate_chain(int add_opcode, Node* node, PhiNode* phi, Node* loop_head, PhaseIdealLoop* phase) { |
There was a problem hiding this comment.
rather than static methods, create a ReassociateAssociativeOperations class and make static methods part of the class.
There was a problem hiding this comment.
Are there some guidelines on when to use static vs instance methods?
I initially considered making these instance methods of PhaseIdealLoop, which would have been fine for me, but I ended up with static methods to speed up development. Is making a ReassociateAssociativeOperations class with static methods preferable to this?
There was a problem hiding this comment.
There are no guidelines that I'm aware of. One benefit of a new class might be that, rather than passing things around as arguments to methods, they can be fields in the class. For instance, phase can be a field. So no need to pass it around. Same goes with loop_head I suppose. And all the code that's specific to a particular optimizations and unlikely to be needed elsewhere is clearly grouped together.
There was a problem hiding this comment.
Maybe the new class is overkill here but I went that way for some patch I'm working on and I found that it worked fairly well.
src/hotspot/share/opto/loopnode.cpp
Outdated
| Node* loop_head_use = loop_head->fast_out(i); | ||
| if (loop_head_use->is_Phi()) { | ||
| PhiNode* phi = loop_head_use->as_Phi(); | ||
| Unique_Node_List wq; |
There was a problem hiding this comment.
I think you should be able to have this work without the extra Unique_Node_List. If you don't want to process new Phis in the loop, you can capture uint unique = Compile::unique() before entering the loop and then in the loop test phi->_idx < unique.
src/hotspot/share/opto/loopnode.cpp
Outdated
| break; | ||
| } | ||
|
|
||
| Node* use = nullptr; |
There was a problem hiding this comment.
current has only one use, so you can use current->unique_out() instead of the loop.
There was a problem hiding this comment.
This said, wouldn't it be ok if there was multiple uses if only one of them is in the loop?
There was a problem hiding this comment.
In theory yeah but we agreed to limit this as much as possible to avoid breaking other parts. The primary use of this would be unrolled loops by HotSpot itself (as opposed to user unrolled ones)
src/hotspot/share/opto/loopopts.cpp
Outdated
| Node* loop_head = loop->head(); | ||
| ReassociateReductionChains rrc(loop, this); | ||
|
|
||
| for (DUIterator_Fast imax, i = loop_head->fast_outs(imax); i < imax; i++) { |
There was a problem hiding this comment.
Can this be a method of ReassociateReductionChains?
src/hotspot/share/opto/loopopts.cpp
Outdated
| } | ||
|
|
||
| bool transform(Node* n, PhiNode* phi) { | ||
| bool is_associative = n->Opcode() == Op_MinL || n->Opcode() == Op_MaxL; |
There was a problem hiding this comment.
I would put this in its own method
Here's my draft proposal for JDK-8351409.
I've run tier1-3 testing on x86_64 successfully. Emanuel run tests and they looked good.
Here are the benchmark results I observe:
darwin/aarch64:
linux/x86_64 (xeon cascade lake with AVX-512):
The
-6%down forlongMaxBigon xeon is noise, I did perfnorm and perfasm runs and in those it was330ops/ms.Please see some additional notes I've made in the code.