
[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note#1

Closed
bgergely0 wants to merge 10000 commits into main from bolt-gnu-property-note

Conversation

@bgergely0
Owner

This commit adds the GNUPropertyRewriter, which parses features from the
.note.gnu.property section.

Currently we only read the bit indicating BTI support
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI).

As BOLT does not add BTI landing pads to targets of indirect
branches/calls, we have to emit a warning that the output binary may be
corrupted.
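
As an illustration of the check described above, here is a minimal sketch of reading the BTI bit from one ELF64 note record, as found in the `.note.gnu.property` section. It is not BOLT's implementation: `readLE32` and `noteHasBTI` are hypothetical helper names, and the sketch assumes a little-endian note that has already been extracted from the section. The constants are the ones defined by the GNU property and AArch64 ELF ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t NT_GNU_PROPERTY_TYPE_0 = 5;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_AND = 0xc0000000;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_BTI = 1u << 0;

// Read a little-endian 32-bit value; the caller guarantees Off + 4 <= size.
static uint32_t readLE32(const std::vector<uint8_t> &Buf, std::size_t Off) {
  return uint32_t(Buf[Off]) | uint32_t(Buf[Off + 1]) << 8 |
         uint32_t(Buf[Off + 2]) << 16 | uint32_t(Buf[Off + 3]) << 24;
}

// Hypothetical helper: true if a single ELF64 GNU property note sets BTI.
bool noteHasBTI(const std::vector<uint8_t> &Note) {
  if (Note.size() < 16)
    return false;
  uint32_t NameSz = readLE32(Note, 0);
  uint32_t DescSz = readLE32(Note, 4);
  uint32_t Type = readLE32(Note, 8);
  if (NameSz != 4 || Type != NT_GNU_PROPERTY_TYPE_0)
    return false;
  if (Note[12] != 'G' || Note[13] != 'N' || Note[14] != 'U' || Note[15] != 0)
    return false;

  std::size_t Off = 16;
  if (DescSz > Note.size() - Off)
    return false;
  std::size_t End = Off + DescSz;

  // The descriptor is an array of (pr_type, pr_datasz, pr_data) entries.
  while (End - Off >= 8) {
    uint32_t PrType = readLE32(Note, Off);
    uint32_t PrDataSz = readLE32(Note, Off + 4);
    Off += 8;
    if (PrDataSz > End - Off)
      return false;
    if (PrType == GNU_PROPERTY_AARCH64_FEATURE_1_AND && PrDataSz >= 4)
      return (readLE32(Note, Off) & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
    // Each property is padded to an 8-byte boundary in ELF64.
    std::size_t Padded = (std::size_t(PrDataSz) + 7) & ~std::size_t(7);
    if (Padded > End - Off)
      break;
    Off += Padded;
  }
  return false;
}
```

A caller in the spirit of this change would print a warning when `noteHasBTI` returns true, since indirect branch targets in the rewritten binary may lack landing pads.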

petechou and others added 30 commits September 26, 2025 14:45
llvm#159145)

Post-RA machine sinking could sink a copy of a sub-register into
a successor. However, the sub-register might not be removed from the
live-in bitmask of its super-register in the successor, and then a later
pass, e.g., if-converter, may add an implicit use of the register from
live-in, resulting in a use of an undefined register. This change makes
sure the subrange of live-ins from the super-register can be removed as well.
)

We didn't have coverage for this yet. And I'm planning on making some
changes in this area. These tests will be useful for that.
Factor out from llvm#151275
Remove all UnsafeFPMath uses but ABI tags related part.
Windows paths have different slashes, but I don't think we care about
the exact paths there anyway so I've just checked for the final
filename.

Fixes llvm#160652
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.

The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that that range of
immediate values are the ones you can use in a LDR or STR instruction.
So it's suitable for cases like this:

asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");

in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
We were looking for any mention of the feature name in cpuinfo, which
could have hit anything including features with common prefixes like
sme, sme2, smefa64.

Luckily this was not a problem but I'm changing this to find the
features line and split the features into a list. Then we are only
looking for exact matches.

Here's the information for one core as an example:
```
processor	: 7
BogoMIPS	: 200.00
Features	: fp asimd evtstrm crc32 atomics fphp asimdhp cpuid <...>
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd0f
CPU revision	: 0
```
(and to avoid any doubt, this is from a CPU simulated in Arm's FVP, it's
not real)

Note that the layout of the label, colon, values is sometimes aligned
but not always. So I trim whitespace a few times to normalise that.

This repeats once for each core so we only need to find one features
line.
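
The exact-match lookup described above can be sketched as follows. This is only an illustration in C++, not the code from the change; `trim` and `cpuinfoHasFeature` are hypothetical names, and the input is assumed to be the text of `/proc/cpuinfo` already read into a string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace, since label alignment varies.
static std::string trim(const std::string &S) {
  const char *WS = " \t\r\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Hypothetical helper: exact-match lookup of one feature name.
bool cpuinfoHasFeature(const std::string &CpuInfo, const std::string &Feature) {
  std::istringstream Lines(CpuInfo);
  std::string Line;
  while (std::getline(Lines, Line)) {
    std::size_t Colon = Line.find(':');
    if (Colon == std::string::npos)
      continue;
    if (trim(Line.substr(0, Colon)) != "Features")
      continue;
    // Split the value on whitespace and compare whole tokens only, so
    // "sme" does not match "sme2" or "smefa64".
    std::istringstream Tokens(Line.substr(Colon + 1));
    std::string Tok;
    while (Tokens >> Tok)
      if (Tok == Feature)
        return true;
    // The block repeats per core, so the first features line is enough.
    return false;
  }
  return false;
}
```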
Split out from llvm#151300 to isolate TargetTransformInfo cost modelling for
fault-only-first loads from VPlan implementation details. This change
adds costing support for vp.load.ff independently of the VPlan work.

For now, model a vp.load.ff as cost-equivalent to a vp.load.
This fixes the ifdefs added in
e9e166e; we need to include int_lib.h
first before we can expect these defines to be set.

Also remove the XFAILs for aarch64 windows. As this test now became a
no-op on platforms that lack CRT_HAS_128BIT or CRT_HAS_F128 (aarch64
windows lacks the latter), it no longer fails.
…remat (llvm#159110)

Currently, something like:

```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```

Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```

Which marks the full $rax as used, not just $al.

With this change, this is rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```

To indicate that only $al is used. 

Note: This issue is latent right now, but is exposed when llvm#134408 is
applied, as it results in the register pressure being incorrectly
calculated (unless this patch is applied too).

I think this change is in line with past fixes in this area, notably:

llvm@059cead

llvm@69cd121
…if successor is loop header (llvm#154063)

This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not
split.
The critical edge's destination is a loop header. Splitting the critical
edge will not break structured CFG.

Add an nvptx test to demonstrate the issue since the target also
requires structured CFG.
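
For readers less familiar with the term, a critical edge runs from a block with more than one successor to a block with more than one predecessor. The toy sketch below shows the definition and what splitting does. The `CFG` alias and the helper names are hypothetical and unrelated to the MachineFunction API.

```cpp
#include <cstddef>
#include <vector>

// Toy CFG (hypothetical): successor lists indexed by block number.
using CFG = std::vector<std::vector<int>>;

static std::size_t numPreds(const CFG &G, int Block) {
  std::size_t N = 0;
  for (const auto &Succs : G)
    for (int S : Succs)
      if (S == Block)
        ++N;
  return N;
}

// An edge is critical if its source has several successors and its
// destination has several predecessors.
bool isCriticalEdge(const CFG &G, int From, int To) {
  return G[From].size() > 1 && numPreds(G, To) > 1;
}

// Split From->To by routing it through a fresh block; returns its index.
int splitEdge(CFG &G, int From, int To) {
  int Mid = static_cast<int>(G.size());
  for (int &S : G[From])
    if (S == To)
      S = Mid;
  G.push_back({To});
  return Mid;
}
```

After the split, the new block has a single predecessor and a single successor, which gives a pass a place to put code that should execute only along that edge.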

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
In the im2col decomposition, propagate the filter tensor encoding (if
specified) through the tensor.collapse_shape op, so that it can be used
by the consuming linalg.generic matmul op.

Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
Additional CSE opportunities are exposed after converting to concrete
recipes/dissolving regions and materializing various expressions. Run
CSE later, to capitalize on some of the late opportunities.

PR: llvm#160572
This flag enables the compiler to generate most of the debug
information in a separate file, which can reduce executable size
and link times. Clang already supports this flag.
 
I have tried to follow the logic of the clang implementation where
possible. Some functions were moved where they could be used by both
clang and flang. The `addOtherOptions` was renamed to `addDebugOptions`
to better reflect its purpose.

Clang also sets the `splitDebugFilename` field of the `DICompileUnit` in
the IR when this option is present. That part is currently missing from
this patch and will come in a follow-up PR.
…lvm#160021)

This patch makes the following updates to the `QualGroup` documentation:

✅ 1. Move to Reference section
Relocated the Qualification Working Group (QualGroup) docs from the main
index into the Reference section for better organization and
consistency.

✅ 2. Add link in GettingInvolved
Inserted a proper link to the QualGroup documentation in the
GettingInvolved sync-ups table, improving discoverability for newcomers.

✅ 3. Align structure with Security Group
Revised the documentation layout to follow the same structure pattern as
the Security Group docs, ensuring consistency across LLVM working group
references.
… `_LIBCPP_VERSION` (llvm#160627)

And add some guaranteed cases (namely, for `expected`, `optional`, and
`variant`) to `is_implicit_lifetime.pass.cpp`.

It's somewhat unfortunate that `pair` and `tuple` are not guaranteed to
propagate triviality of copy/move constructors, and MSVC STL fails to do
so due to ABI compatibility. This affects the implicit-lifetime
property.
…oisonForTargetNode - add X86ISD::PSHUFB handling (llvm#160842)

X86ISD::PSHUFB shuffles can't create undef/poison themselves, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y))
On targets where f32 maximumnum is legal, but maximumnum on vectors of
smaller types is not legal (e.g. v2f16), try unrolling the vector first
as part of the expansion.

Only fall back to expanding the full maximumnum computation into
compares + selects if maximumnum on the scalar element type cannot be
supported.
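
The idea of unrolling first can be sketched in plain C++ as below. This is an illustration only, not the SelectionDAG code: `float` stands in for whatever scalar type the target supports, the function names are hypothetical, and the scalar helper spells out the maximumnum semantics (a NaN operand is ignored, and +0.0 is treated as greater than -0.0) with compares and selects.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Scalar maximumnum written as compares + selects (hypothetical helper).
float maximumNumScalar(float A, float B) {
  if (std::isnan(A))
    return B; // If both are NaN this still returns a NaN.
  if (std::isnan(B))
    return A;
  if (A == B)
    return std::signbit(A) ? B : A; // Prefer +0.0 over -0.0.
  return A > B ? A : B;
}

// Unroll the vector operation into one scalar operation per element.
template <std::size_t N>
std::array<float, N> maximumNumUnrolled(const std::array<float, N> &A,
                                        const std::array<float, N> &B) {
  std::array<float, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = maximumNumScalar(A[I], B[I]);
  return R;
}
```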
)

Program itself is unused in that file, so just include the needed
headers.
… clang (llvm#160605)

When cross-compiling the LLVM project as a whole (from llvm/), if it
cannot find presupplied tools it will create a native build environment
to build the tools it needs.

However, when doing a standalone build of clang (that is, from clang/
and linking against an existing libLLVM) this doesn't work. Instead a
_target_ binary is built which predictably then fails.

The conventional workaround for this is to build the native tools in a
separate native compile phase and pass the paths to the cross build, for
example see OpenEmbedded[1] or Nix[2]. But we can do better!

The first problem is that LLVM_USE_HOST_TOOLS is only set in the llvm/
CMakeLists.txt, so setup_host_tool() will never consider building a
native binary. This can be solved by setting LLVM_USE_HOST_TOOLS based
on CMAKE_CROSSCOMPILING in clang/CMakeLists.txt in the standalone case.

Now setup_host_tool() will try to build a native tool, but it needs
build_native_tool() from CrossCompile.cmake, so that also needs to be
included.

Finally, the native binary then fails because there's no provider for
the dependency "CONFIGURE_Clang_NATIVE", so use llvm_create_cross_target
to create the native environment.

These few lines mirror what the lldb CMakeLists.txt does in the
standalone case, so there is prior art for this.

[1]
https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang_git.bb?id=e18d697e92b55e57124e80234369d46575226386#n212
[2]
https://github.com/NixOS/nixpkgs/blob/3354d448f2a26117a74638957b0131ce3da9c8c4/pkgs/development/compilers/llvm/common/tblgen.nix#L54
…oisonForTargetNode - add X86ISD::VPERMV handling (llvm#160845)

X86ISD::VPERMV shuffles can't create undef/poison themselves, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y))
llvm.convert/to.fp16 and from.fp16 are no longer used / deprecated and do not
need to be tested any more.
Mostly mechanical changes to add the missing field.
frasercrmck and others added 26 commits September 29, 2025 10:20
Wenju He has been active on the libclc project for a while now and has
been contributing to the overall health and steering the future of the
project.
…I targets (llvm#161152)

Inspired by llvm#160928 - if we have an AVX512 target capable of AVXVNNI but not AVX512VNNI then we must split 512-bit (or larger) types to 256 bits
…0049)

Stumbled across a typo in the `MachineVerifier` file and since I had it
open, I changed some other comments.

Not important but why not leave it a bit cleaner 🙂

---------

Signed-off-by: Daniel Stadelmann <dasta_7@hotmail.com>
Remove NoSignedZerosFPMath in visitFSUB part, we should always use
instruction level fast math flags.
Currently, devices store a raw pointer back to their owning Platform.
Platforms are stored directly inside of a vector. Modifying this vector
risks invalidating all the platform pointers stored in devices.

This patch allocates platforms individually, and changes devices to
store a reference to their platform instead of a pointer. This is safe,
because platforms are guaranteed to outlive the devices they contain.
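
A minimal sketch of the ownership pattern described above, using hypothetical `Platform` and `Device` structs rather than the real offload types: each platform lives in its own heap allocation, so growing the container moves only the owning pointers and the back-references held by devices stay valid.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Platform;

// Hypothetical device type holding a reference to its owning platform.
struct Device {
  Platform &Owner;
  explicit Device(Platform &P) : Owner(P) {}
};

// Hypothetical platform type that owns its devices.
struct Platform {
  std::string Name;
  std::vector<Device> Devices;
  explicit Platform(std::string N) : Name(std::move(N)) {}
};

// Platforms are allocated individually. Reallocating the outer vector moves
// the unique_ptrs, not the Platform objects they point to.
Platform &addPlatform(std::vector<std::unique_ptr<Platform>> &Platforms,
                      std::string Name) {
  Platforms.push_back(std::make_unique<Platform>(std::move(Name)));
  Platform &P = *Platforms.back();
  P.Devices.emplace_back(P);
  return P;
}
```

Had the container been a `std::vector<Platform>`, any `push_back` that triggers reallocation would leave the earlier back-references dangling.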
Fix bug in llvm#140188 where incoming vectors are rotated in the wrong
direction.

Co-authored-by: Leon Clark <leoclark@amd.com>
…itchToSelect`

Make sure selects do exist prior to assigning weights to edges.

Fixes: llvm#161137.
…bstract subprogram DIEs" (llvm#160786)

This is an attempt to reland
llvm#159104 with the fix for
llvm#160197.

The original patch had the following problem: when an abstract
subprogram DIE is constructed from within
`DwarfDebug::endFunctionImpl()`,
`DwarfDebug::constructAbstractSubprogramScopeDIE()` acknowledges `unit:`
field of DISubprogram. But an abstract subprogram DIE constructed from
`DwarfDebug::beginModule()` was put in the same compile unit to which
global variable referencing the subprogram belonged, regardless of
subprogram's `unit:`.

This is fixed by adding `DwarfDebug::getOrCreateAbstractSubprogramCU()`,
used by both `DwarfDebug::constructAbstractSubprogramScopeDIE()` and
`DwarfCompileUnit::getOrCreateSubprogramDIE()` when an abstract subprogram
is queried during the creation of DIEs for globals in
`DwarfDebug::beginModule()`.

The fix and the already-reviewed code from
llvm#159104 are two separate
commits in this PR.

=====
The original commit message follows:

With this change, construction of abstract subprogram DIEs is split in
two stages/functions: creation of DIE (in
DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population
with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).

With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with static local
variables DIE creation of inlined functions with optimized-out
definitions. It fixes llvm#29985.

LexicalScopes class now stores mapping from DISubprograms to their
corresponding llvm::Function's. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.

DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.

This is a temporary fix for
llvm#29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
llvm#90523 was taken for this
commit.
…lvm#161110)

When -nostdlib is specified, Clang should not report any
library‑provided module manifest, even if a manifest for the default
standard library is present.
…lvm#161085)

Under some options used by LLVM Buildbot, an uninitialized variable
(recently added to the test suite) caused a constant evaluation failure,
even though the type of that variable is an empty class.

This PR explicitly initializes the variables with `{}` to fix the error.
Follows up a558d65.
This reverts commit 99a29f6.

Original change was reverted because following assertion started firing:
```
clang++: clang/include/clang/AST/LambdaCapture.h:105: ValueDecl
*clang::LambdaCapture::getCapturedVar() const: Assertion
`capturesVariable() && "No variable available for capture"' failed.

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.

Stack dump:
0.Program arguments: ../../prebuilt/third_party/clang/custom/bin/clang++ -MD -MF host_x64/obj/third_party/android/platform/system/libbase/libbase.logging.cpp.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -I../.. -Ihost_x64/gen -I../../third_party/android/platform/system/libbase/include -I../../third_party/fmtlib/src/include -I../../third_party/android/platfo...com
1.<eof> parser at end of file
2.Per-file LLVM IR generation
clang++: error: clang frontend command failed with exit code 134 (use -v
to see invocation)
Fuchsia clang version 22.0.0git
(https://llvm.googlesource.com/llvm-project
8553bd2)
********************
```

The relanded patch just adds a `Capture.capturesVariable()` check before calling `getCapturedVar`. That's what the code did before the refactor.
…MaybeAlign` (llvm#159449)

Change remaining OpBuilder methods to use `llvm::MaybeAlign` instead of
`uint64_t` for alignment parameters.

---------

Co-authored-by: Erick Ochoa Lopez <erick.ochoalopez@amd.com>
This commit adds the GNUPropertyRewriter, which parses features from the
.note.gnu.property section.

Currently we only read the bit indicating BTI support
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI).

As BOLT does not add BTI landing pads to targets of indirect
branches/calls, we have to emit a warning that the output binary may be
corrupted.
Owner Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@bgergely0 bgergely0 closed this Sep 29, 2025
@bgergely0
Owner Author

thanks Graphite, really cool

bgergely0 pushed a commit that referenced this pull request Oct 6, 2025
Specifically, `X & M ?= C --> (C << clz(M)) ?= (X << clz(M))` where M is
a non-empty sequence of ones starting at the least significant bit with
the remainder zero and C is a constant subset of M that cannot be
materialised into a SUBS (immediate). Proof:
https://alive2.llvm.org/ce/z/haqdJ4.

This improves the comparison in isinf, for example:
```cpp
int isinf(float x) {
  return __builtin_isinf(x);
}
```

Before:
```
isinf:
  fmov    w9, s0
  mov     w8, #2139095040
  and     w9, w9, #0x7fffffff
  cmp     w9, w8
  cset    w0, eq
  ret
```

After:
```
isinf:
  fmov    w9, s0
  mov     w8, #-16777216
  cmp     w8, w9, lsl #1
  cset    w0, eq
  ret
```
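
The equivalence behind this rewrite can be spot-checked with a small sketch on 32-bit values. The helper names are hypothetical, and the Alive2 link above remains the actual proof. For the isinf case, M is 0x7fffffff and C is 0x7f800000, so clz(M) is 1 and C << 1 is 0xff000000, which is the -16777216 immediate in the "After" listing.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a non-zero 32-bit value with a portable loop.
unsigned clz32(uint32_t V) {
  assert(V != 0 && "mask must be non-empty");
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; !(V & Bit); Bit >>= 1)
    ++N;
  return N;
}

// Original form: X & M == C.
bool maskedEq(uint32_t X, uint32_t M, uint32_t C) { return (X & M) == C; }

// Rewritten form: (C << clz(M)) == (X << clz(M)). Shifting left by clz(M)
// discards exactly the bits of X that the mask would have cleared.
bool shiftedEq(uint32_t X, uint32_t M, uint32_t C) {
  unsigned S = clz32(M);
  return (C << S) == (X << S);
}
```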