[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note#1
Closed
[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note#1
Conversation
llvm#159145) Post-RA machine sinking could sink a copy of sub-register into a successor. However, the sub-register might not be removed from the live-in bitmask of its super register in successor and then a later pass, e.g, if-converter, may add an implicit use of the register from live-in resulting in an use of an undefined register. This change makes sure subrange of live-ins from super register could be removed as well.
Factor out from llvm#151275 Remove all UnsafeFPMath uses but ABI tags related part.
Windows paths have different slashes, but I don't think we care about the exact paths there anyway so I've just checked for the final filename. Fixes llvm#160652
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.
The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that that range of
immediate values are the ones you can use in a LDR or STR instruction.
So it's suitable for cases like this:
asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");
in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
We were looking for any mention of the feature name in cpuinfo, which could have hit anything including features with common prefixes like sme, sme2, smefa64. Luckily this was not a problem but I'm changing this to find the features line and split the features into a list. Then we are only looking for exact matches. Here's the information for one core as an example: ``` processor : 7 BogoMIPS : 200.00 Features : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid <...> CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd0f CPU revision : 0 ``` (and to avoid any doubt, this is from a CPU simulated in Arm's FVP, it's not real) Note that the layout of the label, colon, values is sometimes aligned but not always. So I trim whitespace a few times to normalise that. This repeats once for each core so we only need to find one features line.
llvm#160823) - Fix llvm#156591 (comment) - As per https://cdrdv2.intel.com/v1/dl/getContent/671200 default rounding mode is **round to nearest**.
Split out from llvm#151300 to isolate TargetTransformInfo cost modelling for fault-only-first loads from VPlan implementation details. This change adds costing support for vp.load.ff independently of the VPlan work. For now, model a vp.load.ff as cost-equivalent to a vp.load.
This fixes the ifdefs added in e9e166e; we need to include int_lib.h first before we can expect these defines to be set. Also remove the XFAILs for aarch64 windows. As this test now became a no-op on platforms that lack CRT_HAS_128BIT or CRT_HAS_F128 (aarch64 windows lacks the latter), it no longer fails.
…remat (llvm#159110) Currently, something like: ``` $eax = MOV32ri -11, implicit-def $rax %al = COPY $eax ``` Can be rematerialized as: ``` dead $eax = MOV32ri -11, implicit-def $rax ``` Which marks the full $rax as used, not just $al. With this change, this is rematerialized as: ``` dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al ``` To indicate that only $al is used. Note: This issue is latent right now, but is exposed when llvm#134408 is applied, as it results in the register pressure being incorrectly calculated (unless this patch is applied too). I think this change is in line with past fixes in this area, notably: llvm@059cead llvm@69cd121
…if successor is loop header (llvm#154063) This addresses a performance issue for our downstream GPU target that sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split. The critical edge's destination a loop header. Splitting the critical edge will not break structured CFG. Add a nvptx test to demonstrate the issue since the target also requires structured CFG. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
In the im2col decomposition, propagate the filter tensor encoding (if specified) through the tensor.collapse_shape op, so that it can be used by the consuming linalg.generic matmul op. Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: llvm#160572
…freeze(x),freeze(y)) (llvm#160835)
…s(freeze(x),freeze(y)) (llvm#160837)
…ilvar(freeze(x),freeze(y)) (llvm#160836)
This flags enables the compiler to generate most of the debug information in a separate file which can be useful for executable size and link times. Clang already supports this flag. I have tried to follow the logic of the clang implementation where possible. Some functions were moved where they could be used by both clang and flang. The `addOtherOptions` was renamed to `addDebugOptions` to better reflect its purpose. Clang also set the `splitDebugFilename` field of the `DICompileUnit` in the IR when this option is present. That part is currently missing from this patch and will come in a follow-up PR.
…lvm#160021) This patch makes the following updates to the `QualGroup` documentation: ✅ 1. Move to Reference section Relocated the Qualification Working Group (QualGroup) docs from the main index into the Reference section for better organization and consistency. ✅ 2. Add link in GettingInvolved Inserted a proper link to the QualGroup documentation in the GettingInvolved sync-ups table, improving discoverability for newcomers. ✅ 3. Align structure with Security Group Revised the documentation layout to follow the same structure pattern as the Security Group docs, ensuring consistency across LLVM working group references.
… `_LIBCPP_VERSION` (llvm#160627) And add some guaranteed cases (namely, for `expected`, `optional`, and `variant`) to `is_implicit_lifetime.pass.cpp`. It's somehow unfortunate that `pair` and `tuple` are not guaranteed to propagate triviality of copy/move constructors, and MSVC STL fails to do so due to ABI compatibility. This affects the implicit-lifetime property.
…oisonForTargetNode - add X86ISD::PSHUFB handling (llvm#160842) X86ISD::PSHUFB shuffles can't create undef/poison itself, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y))
On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.
… clang (llvm#160605) When cross-compiling the LLVM project as a whole (from llvm/), if it cannot find presupplied tools it will create a native build environment to build the tools it needs. However, when doing a standalone build of clang (that is, from clang/ and linking against an existing libLLVM) this doesn't work. Instead a _target_ binary is built which predictably then fails. The conventional workaround for this is to build the native tools in a separate native compile phase and pass the paths to the cross build, for example see OpenEmbedded[1] or Nix[2]. But we can do better! The first problem is that LLVM_USE_HOST_TOOLS is only set in the llvm/ CMakeLists.txt, so setup_host_tool() will never consider building a native binary. This can be solved by setting LLVM_USE_HOST_TOOLS based on CMAKE_CROSSCOMPILING in clang/CMakeLists.txt in the standalone case. Now setup_host_tool() will try to build a native tool, but it needs build_native_tool() from CrossCompile.cmake, so that also needs to be included. Finally, the native binary then fails because there's no provider for the dependency "CONFIGURE_Clang_NATIVE", so use llvm_create_cross_target to create the native environment. These few lines mirror what the lldb CMakeLists.txt does in the standalone case, so there is prior art for this. [1] https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang_git.bb?id=e18d697e92b55e57124e80234369d46575226386#n212 [2] https://github.com/NixOS/nixpkgs/blob/3354d448f2a26117a74638957b0131ce3da9c8c4/pkgs/development/compilers/llvm/common/tblgen.nix#L54
…oisonForTargetNode - add X86ISD::VPERMV handling (llvm#160845) X86ISD::VPERMV shuffles can't create undef/poison itself, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y))
llvm.convert/to.fp16 and from.fp16 are no longer used / deprecated and do not need to be tested any more.
Mostly mechanical changes to add the missing field.
Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.
…I targets (llvm#161152) Inspired by llvm#160928 - if we have a AVX512 target capable of AVXVNNI but not AVX512VNNI then we must split 512-bit (or larger) types to 256-bits
…0049) Stumbled across a typo in the `MachineVerifier` file and since I had it open, I changed some other comments. Not important but why not leave it a bit cleaner 🙂 --------- Signed-off-by: Daniel Stadelmann <dasta_7@hotmail.com>
…ectly (llvm#157616) Fixes [issue](llvm#155591)
…eturn a value" warning. NFC. (llvm#161168)
Remove NoSignedZerosFPMath in visitFSUB part, we should always use instruction level fast math flags.
…reter bytecode test coverage (llvm#161172) Part of llvm#155814
…eter bytecode test coverage (llvm#161174) Part of llvm#155814
Currently, devices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes devices to store a reference to its platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.
…eter bytecode test coverage (llvm#161182) Part of llvm#155814
Fix bug in llvm#140188 where incoming vectors are rotated in the wrong direction. Co-authored-by: Leon Clark <leoclark@amd.com>
…itchToSelect` Make sure selects do exist prior to assigning weights to edges. Fixes: llvm#161137.
…bstract subprogram DIEs" (llvm#160786) This is an attempt to reland llvm#159104 with the fix for llvm#160197. The original patch had the following problem: when an abstract subprogram DIE is constructed from within `DwarfDebug::endFunctionImpl()`, `DwarfDebug::constructAbstractSubprogramScopeDIE()` acknowledges `unit:` field of DISubprogram. But an abstract subprogram DIE constructed from `DwarfDebug::beginModule()` was put in the same compile unit to which global variable referencing the subprogram belonged, regardless of subprogram's `unit:`. This is fixed by adding `DwarfDebug::getOrCreateAbstractSubprogramCU()` used by both`DwarfDebug:: constructAbstractSubprogramScopeDIE()` and `DwarfCompileUnit::getOrCreateSubprogramDIE()` when abstract subprogram is queried during the creation of DIEs for globals in `DwarfDebug::beginModule()`. The fix and the already-reviewed code from llvm#159104 are two separate commits in this PR. ===== The original commit message follows: With this change, construction of abstract subprogram DIEs is split in two stages/functions: creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population with children (in DwarfCompileUnit::constructAbstractSubprogramScopeDIE). With that, abstract subprograms can be created/referenced from DwarfDebug::beginModule, which should solve the issue with static local variables DIE creation of inlined functons with optimized-out definitions. It fixes llvm#29985. LexicalScopes class now stores mapping from DISubprograms to their corresponding llvm::Function's. It is supposed to be built before processing of each function (so, now LexicalScopes class has a method for "module initialization" alongside the method for "function initialization"). It is used by DwarfCompileUnit to determine whether a DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is invoked. DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can create an abstract or a concrete DIE for a subprogram. It accepts llvm::Function* argument to determine whether a concrete DIE must be created. This is a temporary fix for llvm#29985. Ideally, it will be fixed by moving global variables and types emission to DwarfDebug::endModule (https://reviews.llvm.org/D144007, https://reviews.llvm.org/D144005). Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in llvm#90523 was taken for this commit.
…lvm#161110) When -nostdlib is specified, Clang should not report any library‑provided module manifest, even if a manifest for the default standard library is present.
…lvm#161085) Under some options used by LLVM Buildbot, an uninitialized variable (recently added to the test suite) caused constant evaluation failure, despite the type of that variable is an empty class. This PR explicitly initializes the variables with `{}` to fix the error. Follows-up a558d65.
This reverts commit 99a29f6. Original change was reverted because following assertion started firing: ``` clang++: clang/include/clang/AST/LambdaCapture.h:105: ValueDecl *clang::LambdaCapture::getCapturedVar() const: Assertion `capturesVariable() && "No variable available for capture"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0.Program arguments: ../../prebuilt/third_party/clang/custom/bin/clang++ -MD -MF host_x64/obj/third_party/android/platform/system/libbase/libbase.logging.cpp.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -I../.. -Ihost_x64/gen -I../../third_party/android/platform/system/libbase/include -I../../third_party/fmtlib/src/include -I../../third_party/android/platfo...com 1.<eof> parser at end of file 2.Per-file LLVM IR generation clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation) Fuchsia clang version 22.0.0git (https://llvm.googlesource.com/llvm-project 8553bd2) ******************** ``` The relanded patch just adds a `Capture.capturesVariable()` check before calling `getCapturedVar`. That's what the code did before the refactor.
…MaybeAlign` (llvm#159449) Change remaining OpBuilder methods to use `llvm::MaybeAlign` instead of `uint64_t` for alignment parameters. --------- Co-authored-by: Erick Ochoa Lopez <erick.ochoalopez@amd.com>
This commit adds the GNUPropertyRewriter, which parses features from the .gnu.property.note section. Currently we only read the bit indicating BTI support (GNU_PROPERTY_AARCH64_FEATURE_1_BTI). As BOLT does not add BTI landing pads to targets of indirect branches/calls, we have to emit a warning that the output binary may be corrupted.
Owner
Author
|
thanks Graphite, really cool |
bgergely0
pushed a commit
that referenced
this pull request
Oct 6, 2025
Specifically, `X & M ?= C --> (C << clz(M)) ?= (X << clz(M))` where M is a non-empty sequence of ones starting at the least significant bit with the remainder zero and C is a constant subset of M that cannot be materialised into a SUBS (immediate). Proof: https://alive2.llvm.org/ce/z/haqdJ4. This improves the comparison in isinf, for example: ```cpp int isinf(float x) { return __builtin_isinf(x); } ``` Before: ``` isinf: fmov w9, s0 mov w8, #2139095040 and w9, w9, #0x7fffffff cmp w9, w8 cset w0, eq ret ``` After: ``` isinf: fmov w9, s0 mov w8, #-16777216 cmp w8, w9, lsl #1 cset w0, eq ret ```
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

This commit adds the GNUPropertyRewriter, which parses features from the
.gnu.property.note section.
Currently we only read the bit indicating BTI support
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI).
As BOLT does not add BTI landing pads to targets of indirect
branches/calls, we have to emit a warning that the output binary may be
corrupted.