[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note by bgergely0 · Pull Request #1 · bgergely0/llvm-project

bgergely0 · 2025-09-29T14:22:12Z

This commit adds the GNUPropertyRewriter, which parses features from the
.note.gnu.property section.

Currently we only read the bit indicating BTI support
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI).

As BOLT does not add BTI landing pads to targets of indirect
branches/calls, we have to emit a warning that the output binary may be
corrupted.
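
To make the check concrete, here is a minimal Python sketch of how the BTI bit can be read from the raw bytes of a `.note.gnu.property` section. It is not BOLT's implementation (BOLT is written in C++). It assumes a little-endian ELF64 layout with 8-byte property alignment, and the helper names (`aarch64_feature_bits`, `bti_warning`, `make_note`) and the warning text are hypothetical.

```python
import struct

# Values defined by the Linux gABI extensions and the AArch64 ELF ABI.
NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_AARCH64_FEATURE_1_AND = 0xC0000000
GNU_PROPERTY_AARCH64_FEATURE_1_BTI = 1 << 0


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def aarch64_feature_bits(section):
    """Return the FEATURE_1_AND bitmask from the section bytes, or 0 if absent."""
    offset = 0
    while offset + 12 <= len(section):
        # Note header: name size, descriptor size, note type.
        namesz, descsz, note_type = struct.unpack_from("<III", section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset = align_up(offset + namesz, 4)
        desc = section[offset:offset + descsz]
        offset = align_up(offset + descsz, 8)  # assumption: ELF64 alignment
        if name != b"GNU\0" or note_type != NT_GNU_PROPERTY_TYPE_0:
            continue
        # The descriptor is an array of (pr_type, pr_datasz, pr_data, padding).
        pos = 0
        while pos + 8 <= len(desc):
            pr_type, pr_datasz = struct.unpack_from("<II", desc, pos)
            pos += 8
            if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND
                    and pr_datasz >= 4 and pos + 4 <= len(desc)):
                return struct.unpack_from("<I", desc, pos)[0]
            pos = align_up(pos + pr_datasz, 8)
    return 0


def bti_warning(section):
    """Return a (hypothetical) warning string if the BTI bit is set, else None."""
    if aarch64_feature_bits(section) & GNU_PROPERTY_AARCH64_FEATURE_1_BTI:
        return "warning: input is marked as BTI-enabled; output may be corrupted"
    return None


def make_note(feature_bits):
    """Build a synthetic one-property note, for illustration only."""
    desc = struct.pack("<III", GNU_PROPERTY_AARCH64_FEATURE_1_AND, 4, feature_bits)
    desc += b"\0" * 4  # pad the property to 8 bytes
    header = struct.pack("<III", 4, len(desc), NT_GNU_PROPERTY_TYPE_0)
    return header + b"GNU\0" + desc
```

Calling `bti_warning(make_note(0x1))` returns the warning string, while a note that only sets the PAC bit (`0x2`) returns `None`.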

llvm#159145) Post-RA machine sinking could sink a copy of a sub-register into a successor. However, the sub-register might not be removed from the live-in bitmask of its super-register in the successor, and a later pass, e.g., the if-converter, may then add an implicit use of the register from the live-ins, resulting in a use of an undefined register. This change makes sure that the subrange of live-ins from the super-register can be removed as well.

An inline asm constraint "Jr", in AArch32, means that if the input value is a compile-time constant in the range -4095 to +4095, then it can be inserted into the assembly language as an immediate operand, and otherwise it will be placed in a register. The comment in the Arm backend said "It is not clear what this constraint is intended for". I believe the answer is that this range of immediate values is exactly the one you can use in an LDR or STR instruction. So it's suitable for cases like this: `asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");` in the same way that the "Ir" constraint is suitable for the immediate in a data-processing instruction such as ADD or EOR.

We were looking for any mention of the feature name in cpuinfo, which could have hit anything including features with common prefixes like sme, sme2, smefa64. Luckily this was not a problem but I'm changing this to find the features line and split the features into a list. Then we are only looking for exact matches.

Here's the information for one core as an example:
```
processor : 7
BogoMIPS : 200.00
Features : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid <...>
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd0f
CPU revision : 0
```
(and to avoid any doubt, this is from a CPU simulated in Arm's FVP, it's not real)

Note that the layout of the label, colon, values is sometimes aligned but not always. So I trim whitespace a few times to normalise that. This repeats once for each core so we only need to find one features line.
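
The exact-match approach described above can be sketched roughly as follows. This is an illustration, not the code from the patch: the function names `cpuinfo_features` and `has_feature` and the sample text are hypothetical.

```python
def cpuinfo_features(cpuinfo_text):
    """Return the feature list from the first 'Features' line, or [] if none."""
    for line in cpuinfo_text.splitlines():
        # Split on the first colon only; the label may or may not be padded.
        label, sep, values = line.partition(":")
        if sep and label.strip() == "Features":
            # split() with no argument also trims and collapses whitespace.
            return values.split()
    return []


def has_feature(cpuinfo_text, feature):
    """Exact-match lookup, so 'sme' does not match 'sme2' or 'smefa64'."""
    return feature in cpuinfo_features(cpuinfo_text)


# Hypothetical sample, made up for this sketch (not taken from real hardware).
SAMPLE_CPUINFO = """\
processor\t: 0
BogoMIPS\t: 200.00
Features\t: fp asimd sme2 smefa64
CPU implementer\t: 0x41
"""
```

With this sample, the naive substring test `"sme" in SAMPLE_CPUINFO` is true even though the core does not list `sme`, whereas `has_feature(SAMPLE_CPUINFO, "sme")` is false and `has_feature(SAMPLE_CPUINFO, "sme2")` is true.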

Split out from llvm#151300 to isolate TargetTransformInfo cost modelling for fault-only-first loads from VPlan implementation details. This change adds costing support for vp.load.ff independently of the VPlan work. For now, model a vp.load.ff as cost-equivalent to a vp.load.

This fixes the ifdefs added in e9e166e; we need to include int_lib.h first before we can expect these defines to be set. Also remove the XFAILs for aarch64 windows. As this test now became a no-op on platforms that lack CRT_HAS_128BIT or CRT_HAS_F128 (aarch64 windows lacks the latter), it no longer fails.

…remat (llvm#159110) Currently, something like:
```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```
Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```
Which marks the full $rax as used, not just $al. With this change, this is rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```
To indicate that only $al is used.

Note: This issue is latent right now, but is exposed when llvm#134408 is applied, as it results in the register pressure being incorrectly calculated (unless this patch is applied too).

I think this change is in line with past fixes in this area, notably: llvm@059cead llvm@69cd121

…if successor is loop header (llvm#154063) This addresses a performance issue for our downstream GPU target that sets requiresStructuredCFG to true. The issue is that the EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split. The critical edge's destination is a loop header, so splitting the critical edge will not break the structured CFG. Add an NVPTX test to demonstrate the issue, since that target also requires a structured CFG. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

In the im2col decomposition, propagate the filter tensor encoding (if specified) through the tensor.collapse_shape op, so that it can be used by the consuming linalg.generic matmul op. Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>

This flag enables the compiler to generate most of the debug information in a separate file, which can reduce executable size and link times. Clang already supports this flag. I have tried to follow the logic of the clang implementation where possible. Some functions were moved to where they could be used by both clang and flang. `addOtherOptions` was renamed to `addDebugOptions` to better reflect its purpose. Clang also sets the `splitDebugFilename` field of the `DICompileUnit` in the IR when this option is present. That part is currently missing from this patch and will come in a follow-up PR.

…lvm#160021) This patch makes the following updates to the `QualGroup` documentation:

✅ 1. Move to Reference section
Relocated the Qualification Working Group (QualGroup) docs from the main index into the Reference section for better organization and consistency.

✅ 2. Add link in GettingInvolved
Inserted a proper link to the QualGroup documentation in the GettingInvolved sync-ups table, improving discoverability for newcomers.

✅ 3. Align structure with Security Group
Revised the documentation layout to follow the same structure pattern as the Security Group docs, ensuring consistency across LLVM working group references.

… `_LIBCPP_VERSION` (llvm#160627) And add some guaranteed cases (namely, for `expected`, `optional`, and `variant`) to `is_implicit_lifetime.pass.cpp`. It's somewhat unfortunate that `pair` and `tuple` are not guaranteed to propagate triviality of copy/move constructors, and MSVC STL fails to do so due to ABI compatibility. This affects the implicit-lifetime property.

On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.

… clang (llvm#160605) When cross-compiling the LLVM project as a whole (from llvm/), if it cannot find presupplied tools it will create a native build environment to build the tools it needs. However, when doing a standalone build of clang (that is, from clang/ and linking against an existing libLLVM) this doesn't work. Instead a _target_ binary is built which predictably then fails.

The conventional workaround for this is to build the native tools in a separate native compile phase and pass the paths to the cross build, for example see OpenEmbedded[1] or Nix[2]. But we can do better!

The first problem is that LLVM_USE_HOST_TOOLS is only set in the llvm/ CMakeLists.txt, so setup_host_tool() will never consider building a native binary. This can be solved by setting LLVM_USE_HOST_TOOLS based on CMAKE_CROSSCOMPILING in clang/CMakeLists.txt in the standalone case.

Now setup_host_tool() will try to build a native tool, but it needs build_native_tool() from CrossCompile.cmake, so that also needs to be included.

Finally, the native binary then fails because there's no provider for the dependency "CONFIGURE_Clang_NATIVE", so use llvm_create_cross_target to create the native environment.

These few lines mirror what the lldb CMakeLists.txt does in the standalone case, so there is prior art for this.

[1] https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang_git.bb?id=e18d697e92b55e57124e80234369d46575226386#n212
[2] https://github.com/NixOS/nixpkgs/blob/3354d448f2a26117a74638957b0131ce3da9c8c4/pkgs/development/compilers/llvm/common/tblgen.nix#L54

…0049) Stumbled across a typo in the `MachineVerifier` file and since I had it open, I changed some other comments. Not important but why not leave it a bit cleaner 🙂 --------- Signed-off-by: Daniel Stadelmann <dasta_7@hotmail.com>

Currently, devices store a raw pointer back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes each device to store a reference to its platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.

…bstract subprogram DIEs" (llvm#160786) This is an attempt to reland llvm#159104 with the fix for llvm#160197.

The original patch had the following problem: when an abstract subprogram DIE is constructed from within `DwarfDebug::endFunctionImpl()`, `DwarfDebug::constructAbstractSubprogramScopeDIE()` acknowledges the `unit:` field of the DISubprogram. But an abstract subprogram DIE constructed from `DwarfDebug::beginModule()` was put in the same compile unit to which the global variable referencing the subprogram belonged, regardless of the subprogram's `unit:`.

This is fixed by adding `DwarfDebug::getOrCreateAbstractSubprogramCU()`, used by both `DwarfDebug::constructAbstractSubprogramScopeDIE()` and `DwarfCompileUnit::getOrCreateSubprogramDIE()` when an abstract subprogram is queried during the creation of DIEs for globals in `DwarfDebug::beginModule()`.

The fix and the already-reviewed code from llvm#159104 are two separate commits in this PR.

=====

The original commit message follows:

With this change, construction of abstract subprogram DIEs is split in two stages/functions: creation of the DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population with children (in DwarfCompileUnit::constructAbstractSubprogramScopeDIE). With that, abstract subprograms can be created/referenced from DwarfDebug::beginModule, which should solve the issue with static local variable DIE creation for inlined functions with optimized-out definitions. It fixes llvm#29985.

The LexicalScopes class now stores a mapping from DISubprograms to their corresponding llvm::Function's. It is supposed to be built before processing of each function (so, now the LexicalScopes class has a method for "module initialization" alongside the method for "function initialization"). It is used by DwarfCompileUnit to determine whether a DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is invoked.

The DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can create an abstract or a concrete DIE for a subprogram. It accepts an llvm::Function* argument to determine whether a concrete DIE must be created.

This is a temporary fix for llvm#29985. Ideally, it will be fixed by moving global variables and types emission to DwarfDebug::endModule (https://reviews.llvm.org/D144007, https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in llvm#90523 was taken for this commit.

…lvm#161085) Under some options used by LLVM Buildbot, an uninitialized variable (recently added to the test suite) caused a constant evaluation failure, even though the type of that variable is an empty class. This PR explicitly initializes the variables with `{}` to fix the error. Follow-up to a558d65.

This reverts commit 99a29f6.

The original change was reverted because the following assertion started firing:
```
clang++: clang/include/clang/AST/LambdaCapture.h:105: ValueDecl *clang::LambdaCapture::getCapturedVar() const: Assertion `capturesVariable() && "No variable available for capture"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.Program arguments: ../../prebuilt/third_party/clang/custom/bin/clang++ -MD -MF host_x64/obj/third_party/android/platform/system/libbase/libbase.logging.cpp.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -I../.. -Ihost_x64/gen -I../../third_party/android/platform/system/libbase/include -I../../third_party/fmtlib/src/include -I../../third_party/android/platfo...com
1.<eof> parser at end of file
2.Per-file LLVM IR generation
clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation)
Fuchsia clang version 22.0.0git (https://llvm.googlesource.com/llvm-project 8553bd2)
********************
```
The relanded patch just adds a `Capture.capturesVariable()` check before calling `getCapturedVar`. That's what the code did before the refactor.

bgergely0 · 2025-09-29T14:22:56Z

[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note #1 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

bgergely0 · 2025-09-29T14:28:09Z

thanks Graphite, really cool

Specifically, `X & M ?= C --> (C << clz(M)) ?= (X << clz(M))` where M is a non-empty sequence of ones starting at the least significant bit with the remainder zero and C is a constant subset of M that cannot be materialised into a SUBS (immediate). Proof: https://alive2.llvm.org/ce/z/haqdJ4.

This improves the comparison in isinf, for example:
```cpp
int isinf(float x) {
  return __builtin_isinf(x);
}
```

Before:
```
isinf:
  fmov w9, s0
  mov w8, #2139095040
  and w9, w9, #0x7fffffff
  cmp w9, w8
  cset w0, eq
  ret
```

After:
```
isinf:
  fmov w9, s0
  mov w8, #-16777216
  cmp w8, w9, lsl #1
  cset w0, eq
  ret
```
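
As a quick sanity check of the rewrite (separate from the Alive2 proof linked above), the equality case can be modelled in Python on 32-bit values. The function names here are hypothetical and the sketch only covers `==`, with the isinf constants taken from the assembly in the example.

```python
MASK32 = 0xFFFFFFFF


def clz32(value):
    """Count leading zeros of a non-zero 32-bit value."""
    return 32 - value.bit_length()


def masked_compare(x, m, c):
    """Original form: (X & M) == C."""
    return (x & m) == c


def shifted_compare(x, m, c):
    """Rewritten form: (C << clz(M)) == (X << clz(M)), truncated to 32 bits."""
    shift = clz32(m)
    return ((c << shift) & MASK32) == ((x << shift) & MASK32)


# Constants from the isinf example: M clears the sign bit, C is +infinity.
M = 0x7FFFFFFF
C = 0x7F800000

# A few float bit patterns: +inf, -inf, 1.0, 0.0, a NaN, the largest finite.
SAMPLES = [0x7F800000, 0xFF800000, 0x3F800000, 0x00000000, 0x7FC00000, 0x7F7FFFFF]
AGREE = all(masked_compare(x, M, C) == shifted_compare(x, M, C) for x in SAMPLES)

# The shifted constant, reinterpreted as a signed 32-bit immediate.
SHIFTED_C = ((C << clz32(M)) & MASK32) - (1 << 32)
```

Here `SHIFTED_C` evaluates to `-16777216`, matching the `mov w8, #-16777216` in the "After" listing, and both forms agree on every sample.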

petechou and others added 30 commits September 26, 2025 14:45

[LoongArch] Refine 256-bit vector_shuffle legalization for LASX (llvm…

fe2dc19

…#160254)

[clang][DebugInfo][test] Add tests for lambda capture packs (llvm#160705

77a3d43

) We didn't have coverage for this yet. And I'm planning on making some changes in this area. These tests will be useful for that.

[LoongArch] Custom legalize vector_shuffle to xvpermi.d when possible (…

beed796

…llvm#160429)

[AMDGPU] Skip debug uses in SIInsertWaitcnts::shouldFlushVmCnt (llvm#…

8cd917b

…160818)

[ARM] Remove UnsafeFPMath uses in code generation part (llvm#160801)

3257dc3

Factor out from llvm#151275 Remove all UnsafeFPMath uses but ABI tags related part.

[lldb][test] Fix elf-no-shdrs-pt-notes.yaml on Windows (llvm#160827)

368d599

Windows paths have different slashes, but I don't think we care about the exact paths there anyway so I've just checked for the final filename. Fixes llvm#160652

[LoongArch] Generate [x]vldi instructions with special constant splats (

9de1bc0

llvm#159258)

[X86] Set default rounding mode round to nearest for llvm.set.rounding (

9b270fc

llvm#160823)
- Fix llvm#156591 (comment)
- As per https://cdrdv2.intel.com/v1/dl/getContent/671200 default rounding mode is **round to nearest**.

[VPlan] Run CSE closer to VPlan::execute. (llvm#160572)

78af056

Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: llvm#160572

[X86] Add test showing failure to fold freeze(pshufb(x,y)) -> pshufb(…

ef5e0c7

…freeze(x),freeze(y)) (llvm#160835)

[X86] Add test showing failure to fold freeze(vpermps(x,y)) -> vpermp…

c10befb

…s(freeze(x),freeze(y)) (llvm#160837)

[X86] Add test showing failure to fold freeze(permilvar(x,y)) -> perm…

c731291

…ilvar(freeze(x),freeze(y)) (llvm#160836)

[X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrP…

81aafd9

…oisonForTargetNode - add X86ISD::PSHUFB handling (llvm#160842) X86ISD::PSHUFB shuffles can't create undef/poison itself, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y))

[clang][bytecode] Remove Program include from InterpFrame.h (llvm#160843

347df23

) Program itself is unused in that file, so just include the needed headers.

[X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrP…

3073bb5

…oisonForTargetNode - add X86ISD::VPERMV handling (llvm#160845) X86ISD::VPERMV shuffles can't create undef/poison itself, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y))

[ARM] Remove -fno-unsafe-math from a number of tests. NFC

02746f8

llvm.convert/to.fp16 and from.fp16 are no longer used / deprecated and do not need to be tested any more.

[mlir] Add splitDebugFilename field in DICompileUnitAttr. (llvm#160704)

e38e0bd

Mostly mechanical changes to add the missing field.

frasercrmck and others added 26 commits September 29, 2025 10:20

[libclc] Propose new libclc maintainer (llvm#161141)

585fd4c

Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.

[X86] createVPDPBUSD - only use 512-bit X86ISD::VPDPBUSD on AVX512VNN…

2ab2ffe

…I targets (llvm#161152) Inspired by llvm#160928 - if we have an AVX512 target capable of AVXVNNI but not AVX512VNNI then we must split 512-bit (or larger) types to 256 bits

[llvm-cov] Fix MSVC "not all control paths return a value" warning. N…

f1b4a3b

…FC. (llvm#161150)

[ConstantFold] Fold inttoptr, ptrtoaddr to bitcast (llvm#161087)

f628a54

Fixes llvm#157334.

[X86] Remove X86ISD::VSHLDV/VSHRDV and use ISD::FSHL/FSHR opcodes dir…

9552e89

…ectly (llvm#157616) Fixes [issue](llvm#155591)

[clang][bytecode] Pointer::isZero - fix MSVC "not all control paths r…

3c98be4

…eturn a value" warning. NFC. (llvm#161168)

[clang][x86] tbm-builtins.c - add i386 test coverage (llvm#161169)

c20ef94

[DAGCombiner] Remove NoSignedZerosFPMath uses in visitFSUB (llvm#160974)

84e4c06

Remove NoSignedZerosFPMath in visitFSUB part, we should always use instruction level fast math flags.

[DOC][GlobalISel] Add more explanation to InstructionSelect (llvm#160510

7b25cef

)

[clang][x86] bmi-builtins.c - add i386 test coverage (llvm#161171)

3253ec0

[clang][X86] bmi2-builtins.c - add -fexperimental-new-constant-interp…

2d30392

…reter bytecode test coverage (llvm#161172) Part of llvm#155814

[clang][X86] tbm-builtins.c - add -fexperimental-new-constant-interpr…

9d33b99

…eter bytecode test coverage (llvm#161174) Part of llvm#155814

[clang][X86] bmi-builtins.c - add -fexperimental-new-constant-interpr…

cd94035

…eter bytecode test coverage (llvm#161182) Part of llvm#155814

[VectorCombine] Fix rotation in phi narrowing. (llvm#160465)

8df643f

Fix bug in llvm#140188 where incoming vectors are rotated in the wrong direction. Co-authored-by: Leon Clark <leoclark@amd.com>

[SimplifyCFG] Ensure selects have not been constant folded in `foldSw…

5ff9f7b

…itchToSelect` Make sure selects do exist prior to assigning weights to edges. Fixes: llvm#161137.

[clang][modules] Ensure -nostdlib causes no manifest to be reported (l…

b555c99

…lvm#161110) When -nostdlib is specified, Clang should not report any library‑provided module manifest, even if a manifest for the default standard library is present.

[X86] ftrunc.ll - add nounwind to silence cfi noise (llvm#161186)

492bcff

[X86][APX] Promote 8/16-bit LEA to 32-bit to avoid partial dependence (…

4fe1a87

…llvm#161051)

[VectorCombine] foldShuffleOfCastops - handle unary shuffles (llvm#16…

766c90f

…0009) Fixes llvm#156853.

[MLIR][MemRef] Change builders with int alignment params to `llvm::…

d8a8d1f

…MaybeAlign` (llvm#159449) Change remaining OpBuilder methods to use `llvm::MaybeAlign` instead of `uint64_t` for alignment parameters. --------- Co-authored-by: Erick Ochoa Lopez <erick.ochoalopez@amd.com>

bgergely0 closed this Sep 29, 2025

bgergely0 wants to merge 10000 commits into main from bolt-gnu-property-note
