merge main into amd-staging by ronlieb · Pull Request #2738 · ROCm/llvm-project

ronlieb · 2026-05-30T16:51:33Z

No description provided.

This test explicitly checks for spurious DWARF lookups by looking for the string `$__lldb` in the dwarf lookup logs. The intent is to validate that no call to `FindFunctions` or `FindTypes` has `$__lldb` in its lookup target. On arm64, this is trivial because `expr self` goes through the IRInterpreter. However, on arm64e, `expr self` is JIT compiled and executed in the inferior process. LLDB installs some utility functions to check invariants, and those functions are usually prefixed with `$__lldb`. When those utility functions appear in the logs, the test incorrectly fails.

Add more test cases based on false negatives we've seen in the past.

…ter. (llvm#200309) Emit a warning for an uninitialized local pointer variable.

Summary: Ever since llvm#171515 we now build the module files in the runtime step. This is problematic because pretty much all of the tests depend on them. The current dependency chain doesn't work correctly because the dependencies are set much later. This fix just adds a global property so that we can set these out-of-order and sets that property to depend on all the runtime tests. Flang now becomes the only test suite that depends on the runtimes like this; every other project's test suite is hermetic. However, moving pretty much every flang test to flang-rt isn't ideal. We can investigate this in depth later, but for now this fixes the build.

…#200430) After PR184065 was committed, memprof ThinLTO builds were failing on imported aliases, which now have the original aliasee guid attached as metadata (we import aliases as a copy of the aliasee body). In distributed ThinLTO, unless also importing the aliasee symbol, we won't have an entry in the summary for the aliasee guid. And we now don't have a way to locate the alias summary, which caused some assumptions and assertions to fail. Work around this with a TODO to add a way to find the original alias guid.

This is compatible with GNU, as well as being shorter and allowing users to specify symbol names with commas in them. Note that this is distinct from the existing --disassemble which has existed for a long time and disassembles all symbols. This change adds a near-alias for the existing LLVM-specific --disassemble-symbols=. Reviewers: jh7370, MaskRay Pull Request: llvm#196594

The driver resolves the path to the linker ("ld") to the absolute path. Microsoft Visual Studio comes with its own instance of `ld.lld`, which it installs under "c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe". In the developer command prompt, this path is added to PATH, where the Clang driver finds it. However, this path does not match the regular expression `"{{[^" ]*}}ld{{[^" ]*}}"` used by the MIPS test because the path contains spaces. Fix the test failure by matching only the trailing component after `ld`. Matching the prefix of the path is unique to the MIPS test; this is not done in tests for other platforms.
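The mismatch can be reproduced outside FileCheck. The sketch below is only an illustration: it transcribes the old pattern into `std::regex` syntax and compares it with a suffix-only pattern. The suffix-only pattern and the function names are assumptions for this example, not the exact text of the fix.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Old pattern from the test, with the FileCheck {{...}} wrappers removed:
// a quoted token that contains "ld" and has no spaces or quotes inside it.
inline bool matchesWholePath(const std::string &Line) {
  static const std::regex Old(R"re("[^" ]*ld[^" ]*")re");
  return std::regex_search(Line, Old);
}

// Hypothetical suffix-only pattern: match from "ld" to the closing quote and
// ignore whatever directory prefix precedes it.
inline bool matchesSuffixOnly(const std::string &Line) {
  static const std::regex New(R"re(ld[^" ]*")re");
  return std::regex_search(Line, New);
}

// Small usage check: a path without spaces satisfies both patterns.
inline void checkExamples() {
  const std::string Unix = R"("/usr/bin/ld" "-o" "a.out")";
  assert(matchesWholePath(Unix));
  assert(matchesSuffixOnly(Unix));
}
```

With the Visual Studio path quoted above, the first function finds no match because the space in `program files` ends the `[^" ]*` run before `ld` is reached, while the second one still matches `ld.lld.exe"`.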

It has always been the intent that it was possible to duplicate unnamed_addr constants, and LTO takes advantage of it. The current LangRef wording allows it, but it was not explicitly spelled out, which led Clang developers to add an optimization that assumed that it wasn't possible. We are considering changing the semantics of unnamed_addr, but for now, just make the current state explicit. Reviewers: teresajohnson, zygoloid, rjmccall, ChuanqiXu9, efriedma-quic, ojhunt Pull Request: llvm#199251

This showed up on a spec test, but is a very simple system-sequentially consistent fence instruction.

This PR changes olLaunchKernel to accept an array of pointers to arguments:

```
void *ArgPtrs[] = {&A, &B, &C};
size_t ArgSizes[] = {sizeof(A), sizeof(B), sizeof(C)};
olLaunchKernel(Queue, Device, Kernel, &LaunchArgs, std::size(ArgPtrs), ArgPtrs, ArgSizes);
```

The newly proposed interface is implementable by existing and anticipated backends, is familiar to CUDA programmers, eliminates the extraneous construction of a contiguous arguments buffer, replacing it with constructing an array of pointers, sidesteps the alignment requirements, does not require reading program image metadata where it's impractical, and enables a compliant SYCL implementation to be built on top of it. The ArgSizes array is required to support OpenCL, which does not have native support for launching a kernel with an argument pointer array, or a reliable way of retrieving argument sizes for a kernel.

## Mapping the proposed API to backends

CUDA and Level-Zero both support accepting an array of pointers to kernel arguments, through `cuLaunchKernel` (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15) and `zeCommandListAppendLaunchKernelWithArguments` (https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistappendlaunchkernelwitharguments) respectively.

For OpenCL, which requires the kernel arguments to be set separately from the kernel launch, a potential implementation can extract the number of arguments from the OpenCL API, and then iterate over the argument pointer and size arrays:

```
cl_uint num_args = 0;
cl_int err = clGetKernelInfo(kernel, CL_KERNEL_NUM_ARGS, sizeof(num_args), &num_args, NULL);
for (cl_uint i = 0; i < num_args; ++i) {
  clSetKernelArg(kernel, i, ArgSizes[i], ArgPtrs[i]);
}
```

The AMDGPU plugin needs to construct a contiguous buffer using the array of argument pointers. To do this conversion, we need to have offsets at which to place the arguments. Here, luckily, as mentioned before, the AMD plugin already reads kernel metadata from the program image. The implementation simply retrieves size and offset for each argument from the kernel image, and then uses it to populate a buffer.
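The pointer-array-to-buffer conversion described for the AMDGPU plugin might look roughly like the sketch below. This is not the plugin's code: `ArgLayout` and `packKernelArgs` are hypothetical names, and the layout table stands in for the per-argument size and offset that the plugin reads from the kernel image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-argument layout (offset and size in bytes), standing in
// for the values read from kernel metadata in the program image.
struct ArgLayout {
  std::size_t Offset;
  std::size_t Size;
};

// Copy each argument from its own pointer to its offset in one contiguous
// buffer. The buffer is sized to the end of the furthest argument.
inline std::vector<unsigned char>
packKernelArgs(void *const *ArgPtrs, const std::vector<ArgLayout> &Layout) {
  std::size_t Total = 0;
  for (const ArgLayout &L : Layout)
    Total = std::max(Total, L.Offset + L.Size);

  std::vector<unsigned char> Buffer(Total, 0);
  for (std::size_t I = 0; I < Layout.size(); ++I) {
    assert(ArgPtrs[I] != nullptr && "every argument needs a source pointer");
    std::memcpy(Buffer.data() + Layout[I].Offset, ArgPtrs[I], Layout[I].Size);
  }
  return Buffer;
}
```

For an `int` at offset 0 and a `double` at offset 8, the result is a 16-byte buffer with four bytes of zero padding between the two values; the caller only ever supplies the two pointers.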

[This commit](llvm@5536348) moved the config file `open()` into a `with` context manager but left the trailing `f.close()` call behind. Since the context manager already closes the file, the call is redundant. It is also outside the `with` block, so `f` is unbound on the `except OSError` path. This removes it. No change in behavior. Signed-off-by: Prasoon Kumar <prasoonkumar054@gmail.com>

5047ae2 left out some whitespace that is necessary for sphinx to work.

This patch modifies existing splitdouble lowering to handle matrix types. Fix: llvm#199069

llvm#200353) The libclc lit test 'math/cos.cl' (introduced by llvm#197151 with auto-generated FileCheck assertions) started failing on the staging buildbot after the revert in llvm#199981 of the AMDGPU ABI coercion change (llvm#185083). The revert restores the older codegen path, which produces a slightly different IR shape for the select instruction inside the cos() implementation:

- the select's 'contract' fast-math flag is no longer present, and
- the result is named %.v.i.i instead of a numbered SSA temporary.

Regenerate the AMDGCN CHECK lines with libclc/test/update_libclc_tests.py to reflect the post-revert IR. After this change, all 10 libclc tests in check-libclc-amdgcn-amd-amdhsa-llvm pass.

Note: running update_libclc_tests.py also produces a purely cosmetic diff in math/rsqrt.cl (it renames a few FileCheck capture names like META13 to META12 to match the actual metadata indices); those capture renames are not required for the test to pass and are not included here to keep the change minimal.

Co-authored-by: mselehov <mselehov@amd.com>

…lvm#199964) When a module contains only declarations, no `MachineFunction` is produced at all, `SPIRVModuleAnalysis` is freed before `AsmPrinter::doFinalization`, and `outputModuleSections` then asserts in `getAnalysis<SPIRVModuleAnalysis>()`.

CreateCondBr always returns an Instruction, so no need to dyn_cast back to an instruction after downcasting to a Value.

…andling (llvm#200096) Several small fixes and improvements to `clang-sycl-linker`'s command-line handling, plus completing the `--spirv-dump-device-code` option:

- **`--version`**: now exits with `EXIT_SUCCESS` after printing, instead of falling through into the rest of `main`.
- **Empty input**: report a clear "No input files provided" error from `getInput` rather than triggering an assertion deep inside `linkDeviceCode`.
- **`--spirv-dump-device-code`**: previously parsed but ignored; now actually copies each generated `.spv` file into the requested directory. The directory is created up-front (`mkdir -p` semantics) so a missing path produces a friendly diagnostic instead of a low-level copy errno.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…es (llvm#200327) mergeConditionalStoreToAddress() merges two stores into one. It does this for non-atomic and atomic-unordered stores, but when merging unordered stores, it would downgrade them to non-atomic! This bug isn't accessible from C because C doesn't expose unordered atomics. But you can access it from e.g. Objective-C with something like

```
// repro.m: clang -fno-objc-arc -O2
__attribute__((objc_root_class))
@interface C {
  int _value;
}
@property(atomic, direct) int value;
@end

@implementation C
@end

void f(C *obj, _Bool c1, _Bool c2, int v1, int v2) {
  if (!obj)
    __builtin_unreachable();
  if (c1)
    obj.value = v1;
  if (c2)
    obj.value = v2;
}
```

LLVM merges these into a single store. The store is non-atomic without this change. This bug was found by a large run of Opus 4.7 looking for bugs in LLVM. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#195745) Third PR in the series splitting [llvm#192119](llvm#192119) / [llvm#192124](llvm#192124). [llvm#195725](llvm#195725) and [llvm#195737](llvm#195737) have merged; this PR is now a standalone diff on main. Adds Extend (signext / zeroext) to `cir-call-conv-lowering`. The CIR signature keeps the original narrow integer type; the rewriter attaches `llvm.signext` / `llvm.zeroext` to `arg_attrs` and `res_attrs`. That matches classic Clang's LLVM IR convention — `define void @f(i8 signext %x)`, not `define void @f(i32 signext %x)` with an entry-block truncation. The `coercedType` field on an Extend `ArgClassification` is informational only; the rewriter doesn't use it to change the CIR signature. Three `.cir` tests cover narrow-signed-arg, narrow-unsigned-arg, and narrow-signed-return. Since the test target's narrow-int Extend rule fires only on MLIR `IntegerType` and CIR functions use `cir::IntType`, these tests drive the rewriter through the classification-injection path added in [llvm#195725](llvm#195725).

Link using `-random_uuid` on macOS to avoid accidental UUID matching in tests.

…#199573) Enable `init_priority` on z/OS

Motivation

The recent addition of `clang/test/Sema/type-dependent-attrs.cpp` in llvm#182208 started failing on z/OS. That test uses `[[gnu::init_priority(2000)]]`, and the failure exposed that init_priority support was still disabled for z/OS in `Attr.td`.

What changed

- Enabled init_priority for z/OS in `clang/include/clang/Basic/Attr.td`
- Updated `clang/test/SemaCXX/init-priority-attr.cpp` so z/OS now expects normal semantic handling for init_priority

This reverts commit 2c7e24c and preserves any changes done after this commit.
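For readers unfamiliar with the attribute, here is a minimal sketch of what `[[gnu::init_priority(N)]]` does on a target that supports it. `Tracker`, `initOrder`, `Late`, and `Early` are hypothetical names for this illustration and do not come from the tests mentioned above. Objects with a smaller priority value are initialized before objects with a larger one, regardless of declaration order.

```cpp
#include <cassert>
#include <vector>

// Function-local static, so the vector exists before any Tracker records
// into it.
inline std::vector<int> &initOrder() {
  static std::vector<int> Order;
  return Order;
}

// Records its id at construction time.
struct Tracker {
  explicit Tracker(int Id) { initOrder().push_back(Id); }
};

// Declared first, but given the larger priority value, so it is constructed
// after Early on targets that honor init_priority.
[[gnu::init_priority(2000)]] Tracker Late(2);
[[gnu::init_priority(1000)]] Tracker Early(1);
```

The attribute applies only to namespace-scope objects of class type, and priority values up to 100 are reserved for the implementation, which is why the sketch uses 1000 and 2000.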

…control hints (llvm#181612) Add target-agnostic infrastructure for the !mem.cache_hint metadata kind; see https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443

This patch includes:

- Registration of mem.cache_hint in FixedMetadataKinds
- IR Verifier validation of structural constraints
- Metadata helper support in combineMetadata(), copyMetadataForLoad(), and dropUBImplyingAttrsAndMetadata()
- LangRef documentation for the metadata format and semantics
- Verifier and transform pass test coverage (GVN, InstCombine, SimplifyCFG)

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
Assisted-by: Claude Code

This test uses outdated `cbuffer` layout design. It has been replaced by `cbuffer-metadata.ll` when we updated the frontend to use explicit padding for `cbuffer` data types.

Ran this on an Android device using both algorithms, the new algorithm is on average 10% faster, but gets to be 15% faster in some cases. This is an example of the speed-ups.

Average Operation Time   Maximum Operation Time   Name
326.9(ns)                80770(ns)                PushBlocks New
365.9(ns)                108032(ns)               PushBlocks Old

…h Atomic-Clause (llvm#199636) Adhere to the restrictions on using a Memory-Order-Clause with an Atomic-Clause. Added warnings to indicate the transformations that will be done internally in flang. In the process of handling all the restrictions on using memory-order-clause, this also fixes [llvm#199490](llvm#199490). Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>

This is a very simple implementation, we just make sure we add the base class destructor to the cleanup scope.

…200261) When expanding fptoui.sat/fptosi.sat, we saturate when the biased exponent is at least ExponentBias + BitWidth - IsSigned, the point where the value no longer fits in the target integer. We should *also* always saturate when the floating-point value is +/-inf. Usually this doesn't require any special handling; for example for a float32 -> int32 conversion, inf has a biased exponent of 255 > ExponentBias + BitWidth - IsSigned = 127 + 32 - 1. But for integer types which are large enough to contain all source floating-point values, this doesn't work. For example, if you're converting float32 to int256, you'd compute a threshold of 383, which is greater than 255. Therefore float32(inf) would not correctly saturate to INT256_MAX. Fix this by clamping the threshold to the all-ones biased exponent. This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
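The clamping can be checked with a few lines of arithmetic. The sketch below is an illustration only, not the code in the expansion pass; `saturationThreshold` and its parameters are hypothetical names. It computes the biased-exponent threshold and clamps it to the all-ones exponent, which is the exponent carried by +/-inf.

```cpp
#include <cassert>

// Biased exponent at or above which the conversion must saturate.
// ExponentBits is the width of the exponent field (8 for float32), so the
// all-ones exponent used by inf and NaN is (1 << ExponentBits) - 1.
constexpr unsigned saturationThreshold(unsigned ExponentBias,
                                       unsigned ExponentBits,
                                       unsigned BitWidth, bool IsSigned) {
  const unsigned AllOnes = (1u << ExponentBits) - 1;
  const unsigned Unclamped = ExponentBias + BitWidth - (IsSigned ? 1u : 0u);
  return Unclamped < AllOnes ? Unclamped : AllOnes;
}

// float32 -> int32: 127 + 32 - 1 = 158, already below 255, so no clamping.
static_assert(saturationThreshold(127, 8, 32, true) == 158, "no clamp needed");

// float32 -> unsigned 256-bit: 127 + 256 = 383 exceeds 255, so the threshold
// is clamped and an exponent of 255 (inf) now reaches it.
static_assert(saturationThreshold(127, 8, 256, false) == 255, "clamped");
```

Without the clamp, the comparison `255 >= 383` is false for inf, which is the case the fix addresses.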

ExtractCostCalculated deduplicates by scalar so only the first ExternalUser determines the scale, making the cost depend on IR block ordering via LLVM's reverse-insertion use-list order. Add a pre-pass computing ScalarToExtractBlock - the nearest common dominator of all effective extract sites per scalar. For PHI users inside a loop the effective site is the incoming block; for PHI users outside all loops it is the PHI's own block (scale = 1). The extract cost is then scaled by getLoopNestScale of the NCD block, which is fully order-independent. Fixes llvm#199548 Reviewers: bababuck, RKSimon, hiraditya Pull Request: llvm#199962

…00393) Calling Integral::getPtr() shouldn't happen for AddrLabelDiff integrals.

…C) (llvm#200255) Set the scalar type for VPBlendRecipe and VPExpressionRecipe at construction time, instead of inferring it on demand via VPTypeAnalysis. With this change, all VPValues have their scalar type set at construction, so VPTypeAnalysis::inferScalarType becomes a thin wrapper around VPValue::getScalarType. To be removed in a follow-up: llvm#200256. PR: llvm#200255

Fixes llvm#200295

Update the assertion text to match the actual code behavior. Some functions enforce strictly positive values, whereas the error message incorrectly mentioned "nonnegative".

) We may also see OO_Array_Delete here.

Clang's TBAA grants the [basic.lval]/11.3 char-aliasing privilege only to the named ::std::byte type (Type::isStdByteType() requires the enum to be declared in the std namespace). LIBC_NAMESPACE::cpp::byte lives in libc's cpp namespace, so it gets its own TBAA node disjoint from char even though it has the same shape as std::byte. That mismatch lets the optimizer reorder typed loads past raw-byte writes through cpp::byte *, miscompiling HeapSort on rv64/Release (UnsortedThreeElementArray{1,2,3}, UnsortedTwoElementArray1 in SortingTest.h). The same hazard is latent in every cpp::byte *-based raw-aliasing site: memory_utils Ptr/CPtr, lsearch/lfind, block.h and freelist_heap.h allocator metadata. Tag the type with gnu::may_alias so accesses through cpp::byte * share the universal char-aliasing TBAA node, fixing all of the above in one place. This patch also reverts PR llvm#194171, as the may_alias attribute fixes it too.
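As a rough illustration of the attribute placement, the sketch below tags a byte-like enum with `gnu::may_alias` and writes to another object's storage through it. The `sketch` namespace and `fillBytes` are hypothetical and stand in for `LIBC_NAMESPACE::cpp::byte` and its users; this is not the libc source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Same shape as std::byte, but outside namespace std, so without the
// attribute it would get its own TBAA node distinct from char.
enum class [[gnu::may_alias]] byte : unsigned char {};
} // namespace sketch

// Raw-byte write through sketch::byte *. With may_alias on the type, the
// optimizer has to assume these stores can alias objects of any type.
inline void fillBytes(void *Dst, unsigned char Value, std::size_t N) {
  assert(Dst != nullptr);
  auto *P = static_cast<sketch::byte *>(Dst);
  for (std::size_t I = 0; I < N; ++I)
    P[I] = static_cast<sketch::byte>(Value);
}
```

A caller can fill a `std::uint32_t` with `fillBytes(&X, 0xFF, sizeof(X))` and then read `X` as an integer; the attribute is what keeps that typed read ordered after the byte stores in optimized builds.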

Mark Complex visitExpr as unsupported, similar to Clang ORCG, not as NYI

Although this gets inlined in LTO builds, non-LTO builds benefit from having isIntegerTy(n) defined in a header. Co-authored-by: Nikita Popov <npopov@redhat.com>

In LLVM_ENABLE_ABI_BREAKING_CHECKS builds, when poisoned (`deleted()` or `allUsesReplacedWith()`), a PoisoningVH is removed from its use list but keeps its raw value pointer for identity, so its PrevPtr/Next are left stale. PoisoningVH has no move constructor, so relocating a value that embeds one falls back to the copy constructor, where `setRawValPtr` relinks with the stale pointers and corrupts the use list. This is a latent bug for any relocation of a PoisoningVH handle but becomes load-bearing for llvm#199615, whose relocating erase exercises it via ScalarEvolution's BackedgeTakenInfo (its ExitNotTakenInfo holds a PoisoningVH<BasicBlock>). Fix by special-casing the `Poisoned` case. Aided By Claude Opus 4.8

- Pin containers. - Cleaner names for different targets and options. - Add Build Test step. - Skip shared tests and death tests.

…2yaml` and `yaml2pdb`"" (llvm#200588) Reverts llvm#200413 Breaks build bots: https://lab.llvm.org/buildbot/#/builders/169/builds/23142 https://lab.llvm.org/buildbot/#/builders/25/builds/18082

…ernel arguments (llvm#199483)" needs downstream integration: see rlieberm/SalaOffProblem. This reverts commit f15904e.

bulbazord and others added 30 commits May 29, 2026 10:11

[analyzer][webkit][NFC] Add more tests for nodelete checker (llvm#200357)

82840c5

Add more test cases based on false negatives we've seen in the past.

[alpha.webkit.UncountedLocalVarsChecker] Check uninitialized raw pointer. (llvm#200309)

2d411c5

Emit a warning for an uninitialized local pointer variable.

[CIR] Implement __sync_synchronize builtin (llvm#200423)

bd5c724

This showed up on a spec test, but is a very simple system-sequentially consistent fence instruction.

[Docs] Fix build (llvm#200467)

91f3d8d

5047ae2 left out some whitespace that is necessary for sphinx to work.

[HLSL] Adding matrix support to splitdouble (llvm#200257)

3e3871d

This patch modifies existing splitdouble lowering to handle matrix types. Fix: llvm#199069

[ExpandIRInsts] Avoid redundant dyn_cast after llvm#175864 (llvm#200475)

d5abd9c

CreateCondBr always returns an Instruction, so no need to dyn_cast back to an instruction after downcasting to a Value.

[lldb][test] Link test binaries with -random_uuid (llvm#199385)

352fd0b

Link using `-random_uuid` on macOS to avoid accidental UUID matching in tests.

[libc] Fix SSE2 check for x86_64/sqrt.h. (llvm#200468)

6a96948

[libc][bazel] Add arm and riscv FEnvImpl.h to textual_hdrs. (llvm#200479)

e04dbbf

[IR] Introduce Instruction::getFastMathFlagsOrNone (NFC) (llvm#200457)

02997d7

[DirectX] Remove obsolete cbuffer layout test (llvm#200307)

d337e68

This test uses outdated `cbuffer` layout design. It has been replaced by `cbuffer-metadata.ll` when we updated the frontend to use explicit padding for `cbuffer` data types.

[CIR] Implement cleanups of base classes for aggregates. (llvm#200473)

f8545be

This is a very simple implementation, we just make sure we add the base class destructor to the cleanup scope.

alexey-bataev and others added 20 commits May 30, 2026 07:43

[offload] *KernelInfo becomes KernelInfo

343f005

[clang][bytecode] Fix an assertion failure in AddSubNonNumber (llvm#200393)

d46e7dc

Calling Integral::getPtr() shouldn't happen for AddrLabelDiff integrals.

[X86] Add VBMI2 shuffle test to track miscompile reported in llvm#200136 (llvm#200569)

2501cdd

[clang][bytecode] Reject invalid UnaryOperators (llvm#200394)

30ec1fa

[clang][bytecode] Fix a crash with an empty InitListExpr (llvm#200366)

ef6a217

Fixes llvm#200295

[mlir][nfc] Fix assertion text in IndexingUtils (llvm#181826)

f7d5760

Update the assertion text to match the actual code behavior. Some functions enforce strictly positive values, whereas the error message incorrectly mentioned "nonnegative".

[clang][bytecode] Loosen an assertion about operator delete (llvm#200575)

470411c

We may also see OO_Array_Delete here.

[CIR][NFC] Mark Complex visitExpr as unsupported (llvm#197782)

2396aa8

Mark Complex visitExpr as unsupported, similar to Clang ORCG, not as NYI

[IR][NFC] Inline Type::isIntegerTy(n) (llvm#200471)

51256ed

Although this gets inlined in LTO builds, non-LTO builds benefit from having isIntegerTy(n) defined in a header. Co-authored-by: Nikita Popov <npopov@redhat.com>

[libc][ci] Clean up libc-fullbuild-tests precommit CI. (llvm#200520)

b771021

- Pin containers. - Cleaner names for different targets and options. - Add Build Test step. - Skip shared tests and death tests.

merge main into amd-staging

1c4e914

merge main into amd-staging

2d9367e

Revert "Reapply "[PDB][llvm-pdbutil] Add DXContainer support for `pdb2yaml` and `yaml2pdb`"" (llvm#200588)

013ee58

Reverts llvm#200413. Breaks build bots: https://lab.llvm.org/buildbot/#/builders/169/builds/23142 https://lab.llvm.org/buildbot/#/builders/25/builds/18082

Revert "[offload][OpenMP] Add strict flag for blocks and threads in kernel arguments (llvm#199483)"

5cd6f6b

needs downstream integration: see rlieberm/SalaOffProblem. This reverts commit f15904e.

merge main into amd-staging

9117da7

[revPat] update reverts

a381577

ronlieb requested review from a team, dpalermo and skganesan008 May 30, 2026 16:51

ronlieb requested review from antiagainst and kuhar as code owners May 30, 2026 16:51

ronlieb removed request for antiagainst and kuhar May 30, 2026 16:51

dpalermo approved these changes May 30, 2026

ronlieb merged commit 183fecb into amd-staging May 31, 2026
121 of 125 checks passed

ronlieb deleted the amd/merge/upstream_merge_20260530113308 branch May 31, 2026 04:42

ronlieb merged 107 commits into amd-staging from amd/merge/upstream_merge_20260530113308

20 participants
