
merge main into amd-staging#2738

Merged
ronlieb merged 107 commits into amd-staging from amd/merge/upstream_merge_20260530113308
May 31, 2026

Conversation

@ronlieb
Collaborator

@ronlieb ronlieb commented May 30, 2026

No description provided.

bulbazord and others added 30 commits May 29, 2026 10:11
This test explicitly checks for spurious DWARF lookups by looking for
the string `$__lldb` in the dwarf lookup logs. The intent here is to
validate that any calls to `FindFunctions` or `FindTypes` won't have
`$__lldb` in its lookup target. On arm64, this is trivial because `expr
self` goes through the IRInterpreter.

However, on arm64e, `expr self` is JIT compiled and executed in the
inferior process. LLDB installs some utility functions to check
invariants and those functions are usually prefixed with `$__lldb`. When
those utility functions appear in the logs, the test incorrectly fails.
)

Add more test cases based on false negatives we've seen in the past.
…ter. (llvm#200309)

Emit a warning for an uninitialized local pointer variable.
Summary:
Ever since llvm#171515 we build
the module files in the runtime step. This is problematic because pretty
much all of the tests depend on them. The current dependency chain
doesn't work correctly because the dependencies are set much later.

This fix just adds a global property so that we can set these
out-of-order and sets that property to depend on all the runtime tests.

Flang now becomes the only test suite that depends on the runtimes like
this; every other project's test suite is hermetic. However, moving
pretty much every flang test to flang-rt isn't ideal. We can investigate
this in depth later but for now this fixes the build.
…#200430)

After PR184065 was committed, memprof ThinLTO builds were failing on
imported aliases, which now have the original aliasee guid attached
as metadata (we import aliases as a copy of the aliasee body). In
distributed ThinLTO, unless also importing the aliasee symbol, we won't
have an entry in the summary for the aliasee guid. And we now don't have
a way to locate the alias summary, which caused some assumptions and
assertions to fail.

Work around this with a TODO to add a way to find the original alias
guid.
This is compatible with GNU, as well as being shorter and allowing users
to specify symbol names with commas in them.

Note that this is distinct from the existing --disassemble which has
existed for a long time and disassembles all symbols. This change adds
a near-alias for the existing LLVM-specific --disassemble-symbols=.

Reviewers: jh7370, MaskRay

Pull Request: llvm#196594
The driver resolves the path to the linker ("ld") to the absolute path.
Microsoft Visual Studio comes with its own instance of `ld.lld` which it
installs under

"c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe"

In the developer command prompt, this path is added to PATH where the
Clang driver finds it. However, this path does not match regular
expression `"{{[^" ]*}}ld{{[^" ]*}}"` used by the MIPS test because the
path contains spaces.

Fix the test failure by matching only the trailing component after
`ld`. Matching the prefix of the path is unique to the MIPS test; this
is not done in tests for other platforms.
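
The mismatch can be reproduced outside FileCheck. The following is a minimal sketch using `std::regex`, assuming the FileCheck regex fragments translate directly to ECMAScript syntax. The helper `matchesAnywhere` and the constant names are hypothetical, and `kNewPattern` is only an assumed shape of the relaxed check, not the exact line from the test.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `line`.
bool matchesAnywhere(const std::string &line, const std::string &pattern) {
  assert(!pattern.empty() && "an empty pattern would match everything");
  return std::regex_search(line, std::regex(pattern));
}

// The linker path from the description, quoted as it appears in the output.
const std::string kQuotedLdPath =
    R"("c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe")";

// Old check: opening quote, a run without spaces or quotes, "ld",
// another such run, closing quote.
const std::string kOldPattern = R"re("[^" ]*ld[^" ]*")re";

// Assumed shape of the fix: only "ld" and the rest of the last component.
const std::string kNewPattern = R"re(ld[^" ]*")re";
```

With these definitions, `matchesAnywhere(kQuotedLdPath, kOldPattern)` is false because the spaces in `program files` break the run between the opening quote and `ld`, while `kNewPattern` still matches the trailing `ld.lld.exe"`.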
It has always been the intent that it was possible to duplicate
unnamed_addr constants, and LTO takes advantage of it. The current LangRef
wording allows it, but it was not explicitly spelled out, which led Clang
developers to add an optimization that assumed that it wasn't possible.

We are considering changing the semantics of unnamed_addr, but for now,
just make the current state explicit.

Reviewers:
teresajohnson, zygoloid, rjmccall, ChuanqiXu9, efriedma-quic, ojhunt

Pull Request: llvm#199251
This showed up on a spec test, but is a very simple system-sequentially
consistent fence instruction.
This PR changes olLaunchKernel to accept an array of pointers to
arguments:
```
  void *ArgPtrs[] = {&A, &B, &C};
  size_t ArgSizes[] = {sizeof(A), sizeof(B), sizeof(C)};

  olLaunchKernel(Queue, Device, Kernel, &LaunchArgs, std::size(ArgPtrs), ArgPtrs, ArgSizes);
```

The newly proposed interface is implementable by existing and anticipated
backends, is familiar to CUDA programmers, eliminates the extraneous
construction of a contiguous arguments buffer, replacing it with
constructing an array of pointers, sidesteps the alignment requirements,
does not require reading program image metadata where it's impractical,
and enables a compliant SYCL implementation to be built on top of it.

The ArgSizes array is required to support OpenCL, which does not have
native support for launching a kernel with an argument pointer array, or
a reliable way of retrieving argument sizes for a kernel.

## Mapping the proposed API to backends

CUDA and Level-Zero both support accepting an array of pointers to
kernel arguments, through `cuLaunchKernel`

(https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15)
and `zeCommandListAppendLaunchKernelWithArguments`

(https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistappendlaunchkernelwitharguments)
respectively.

For OpenCL, which requires the kernel arguments to be set separately
from the kernel launch, a potential implementation can extract the
number of arguments from the OpenCL API, and then iterate over the
argument pointer and size arrays:
```
  cl_uint num_args = 0;
  cl_int err = clGetKernelInfo(kernel, CL_KERNEL_NUM_ARGS,
                               sizeof(num_args), &num_args, NULL);
  for (cl_uint i = 0; i < num_args; ++i) {
    clSetKernelArg(kernel, i, ArgSizes[i], ArgPtrs[i]);
  }
```
The AMDGPU plugin needs to construct a contiguous buffer using the array
of argument pointers. To do this conversion, we need to have offsets
at which to place the arguments. Here, luckily, as mentioned before,
the AMD plugin already reads kernel metadata from the program image.
The implementation simply retrieves size and offset for each argument
from the kernel image, and then uses it to populate a buffer.
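
The conversion described above can be sketched as follows. This is an illustration only, not the plugin's code: `ArgLayout` and `packKernelArgs` are hypothetical names, and the offsets and sizes are assumed to have already been read from the kernel metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-argument layout, standing in for the size and offset
// that would be read from the kernel metadata in the program image.
struct ArgLayout {
  std::size_t Offset; // byte offset of the argument in the contiguous buffer
  std::size_t Size;   // size of the argument in bytes
};

// Copy each argument from ArgPtrs[I] to its offset in one contiguous buffer.
std::vector<unsigned char> packKernelArgs(void *const *ArgPtrs,
                                          const std::vector<ArgLayout> &Layout) {
  // The buffer must be large enough to hold the furthest-reaching argument.
  std::size_t Total = 0;
  for (const ArgLayout &L : Layout)
    Total = std::max(Total, L.Offset + L.Size);

  std::vector<unsigned char> Buffer(Total, 0); // padding bytes stay zero
  for (std::size_t I = 0; I < Layout.size(); ++I) {
    assert(ArgPtrs[I] != nullptr && "every argument needs a source pointer");
    std::memcpy(Buffer.data() + Layout[I].Offset, ArgPtrs[I], Layout[I].Size);
  }
  return Buffer;
}
```

A caller would pass the same `ArgPtrs` array it hands to `olLaunchKernel`. The alignment concerns stay inside the backend because the offsets come from the kernel image rather than from the user.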
[This commit](llvm@5536348)
moved the config file `open()` into a `with` context manager but left
the trailing `f.close()` call behind. Since the context manager already
closes the file, the call is redundant. It is also outside the `with`
block, so `f` is unbound on the `except OSError` path. This removes it.
No change in behavior.

Signed-off-by: Prasoon Kumar <prasoonkumar054@gmail.com>
5047ae2 left out some whitespace that
is necessary for sphinx to work.
This patch modifies existing splitdouble lowering to handle matrix
types.
Fix: llvm#199069
llvm#200353)

The libclc lit test 'math/cos.cl' (introduced by llvm#197151 with
auto-generated FileCheck assertions) started failing on the staging
buildbot after the revert in llvm#199981 of the AMDGPU ABI coercion change
(llvm#185083). The revert restores the older codegen path, which produces a
slightly different IR shape for the select instruction inside the cos()
implementation:

  - the select's 'contract' fast-math flag is no longer present, and
  - the result is named %.v.i.i instead of a numbered SSA temporary.

Regenerate the AMDGCN CHECK lines with
libclc/test/update_libclc_tests.py to reflect the post-revert IR. After
this change, all 10 libclc tests in check-libclc-amdgcn-amd-amdhsa-llvm
pass.

Note: running update_libclc_tests.py also produces a purely cosmetic
diff in math/rsqrt.cl (it renames a few FileCheck capture names like
META13 to META12 to match the actual metadata indices); those capture
renames are not required for the test to pass and are not included here
to keep the change minimal.

Co-authored-by: mselehov <mselehov@amd.com>
…lvm#199964)

A declarations-only module produces no `MachineFunction` at all. In that
case `SPIRVModuleAnalysis` is freed before `AsmPrinter::doFinalization`, and
`outputModuleSections` asserts in `getAnalysis<SPIRVModuleAnalysis>()`,
which causes a failure.
CreateCondBr always returns an Instruction, so no need to dyn_cast back
to an instruction after downcasting to a Value.
…andling (llvm#200096)

Several small fixes and improvements to `clang-sycl-linker`'s command-line
handling, plus completing the `--spirv-dump-device-code` option:

- **`--version`**: now exits with `EXIT_SUCCESS` after printing, instead of
  falling through into the rest of `main`.
- **Empty input**: report a clear "No input files provided" error from
  `getInput` rather than triggering an assertion deep inside
  `linkDeviceCode`.
- **`--spirv-dump-device-code`**: previously parsed but ignored; now actually
  copies each generated `.spv` file into the requested directory. The
  directory is created up-front (`mkdir -p` semantics) so a missing path
  produces a friendly diagnostic instead of a low-level copy errno.
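
The create-then-copy behavior of the dump option can be sketched with `std::filesystem`. This is an illustration under assumptions, not the tool's implementation: `dumpDeviceCode` is a hypothetical helper, and the message wording is invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: copy SpvFile into DumpDir, creating DumpDir first.
// Returns an empty string on success, otherwise a human-readable message.
std::string dumpDeviceCode(const fs::path &SpvFile, const fs::path &DumpDir) {
  assert(!DumpDir.empty() && "a dump directory must be given");

  std::error_code EC;
  // Creates all missing parents; an already existing directory is not an error.
  fs::create_directories(DumpDir, EC);
  if (EC)
    return "cannot create dump directory '" + DumpDir.string() +
           "': " + EC.message();

  fs::copy_file(SpvFile, DumpDir / SpvFile.filename(),
                fs::copy_options::overwrite_existing, EC);
  if (EC)
    return "cannot copy '" + SpvFile.string() + "': " + EC.message();
  return {};
}
```

Creating the directory before the copy means a bad dump path is reported as a directory problem, separately from a failure to copy an individual file.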


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es (llvm#200327)

mergeConditionalStoreToAddress() merges two stores into one.  It does
this for non-atomic and atomic-unordered stores, but when merging
unordered stores, it would downgrade them to non-atomic!

This bug isn't accessible from C because C doesn't expose unordered
atomics. But you can access it from e.g. Objective-C with something like

```
// repro.m — clang -fno-objc-arc -O2
__attribute__((objc_root_class))
@interface C { int _value; }
@property(atomic, direct) int value;
@end
@implementation C
@end

void f(C *obj, _Bool c1, _Bool c2, int v1, int v2) {
    if (!obj) __builtin_unreachable();
    if (c1) obj.value = v1;
    if (c2) obj.value = v2;
}
```

LLVM merges these into a single store.  The store is non-atomic without
this change.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#195745)

Third PR in the series splitting
[llvm#192119](llvm#192119) /
[llvm#192124](llvm#192124).
[llvm#195725](llvm#195725) and
[llvm#195737](llvm#195737) have merged;
this PR is now a standalone diff on main.

Adds Extend (signext / zeroext) to `cir-call-conv-lowering`. The CIR
signature keeps the original narrow integer type; the rewriter attaches
`llvm.signext` / `llvm.zeroext` to `arg_attrs` and `res_attrs`. That
matches classic Clang's LLVM IR convention — `define void @f(i8 signext
%x)`, not `define void @f(i32 signext %x)` with an entry-block
truncation. The `coercedType` field on an Extend `ArgClassification` is
informational only; the rewriter doesn't use it to change the CIR
signature.

Three `.cir` tests cover narrow-signed-arg, narrow-unsigned-arg, and
narrow-signed-return. Since the test target's narrow-int Extend rule
fires only on MLIR `IntegerType` and CIR functions use `cir::IntType`,
these tests drive the rewriter through the classification-injection path
added in [llvm#195725](llvm#195725).
Link using `-random_uuid` on macOS to avoid accidental UUID matching in tests.
…#199573)

Enable `init_priority` on z/OS

Motivation
The recent addition of `clang/test/Sema/type-dependent-attrs.cpp` in
llvm#182208 started failing on
z/OS. That test uses `[[gnu::init_priority(2000)]]`, and the failure
exposed that init_priority support was still disabled for z/OS in
`Attr.td`.

What changed

- Enabled init_priority for z/OS in `clang/include/clang/Basic/Attr.td`
- Updated `clang/test/SemaCXX/init-priority-attr.cpp` so z/OS now
expects normal semantic handling for init_priority

This reverts commit 2c7e24c and
preserves any changes made after that commit.
…control hints (llvm#181612)

Add target-agnostic infrastructure for the !mem.cache_hint metadata
kind,
https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443

This patch includes:
- Registration of mem.cache_hint in FixedMetadataKinds
- IR Verifier validation of structural constraints
- Metadata helper support in combineMetadata(), copyMetadataForLoad(),
and dropUBImplyingAttrsAndMetadata()
- LangRef documentation for the metadata format and semantics
- Verifier and transform pass test coverage (GVN, InstCombine,
SimplifyCFG)

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
Assisted-by: Claude Code

---------

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
This test uses an outdated `cbuffer` layout design. It was replaced by
`cbuffer-metadata.ll` when we updated the frontend to use explicit
padding for `cbuffer` data types.
Ran this on an Android device using both algorithms. The new algorithm
is on average 10% faster, and up to 15% faster in some cases. This
is an example of the speed-ups.

Average Operation Time    Maximum Operation Time   Name
        326.9(ns)                 80770(ns)        PushBlocks New
        365.9(ns)                108032(ns)        PushBlocks Old
…h Atomic-Clause (llvm#199636)

Adhere to the restrictions on using Memory-Order-Clause with
Atomic-Clause.
Added warnings to indicate the transformations that will be done internally
in flang.

Handling all the restrictions on using memory-order-clause also fixes
[llvm#199490](llvm#199490).

---------

Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
This is a very simple implementation, we just make sure we add the base
class destructor to the cleanup scope.
…200261)

When expanding fptoui.sat/fptosi.sat, we saturate when the biased
exponent is at least ExponentBias + BitWidth - IsSigned, the point where
the value no longer fits in the target integer.

We should *also* always saturate when the floating-point value is
+/-inf.  Usually this doesn't require any special handling; for example
for a float32 -> int32 conversion, inf has a biased exponent of 255 >
ExponentBias + BitWidth - IsSigned = 127 + 32 - 1.

But for integer types which are large enough to contain all source
floating-point values, this doesn't work. For example, if you're
converting float32 to int256, you'd compute a threshold of
127 + 256 - 1 = 382, which is greater than 255.  Therefore float32(inf)
would not correctly saturate to INT256_MAX.

Fix this by clamping the threshold to the all-ones biased exponent.
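
The clamp can be illustrated with a small sketch for a float32 source. It models only the exponent comparison described above, not the actual expansion code. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// float32 (IEEE 754 binary32) parameters.
const unsigned kExponentBias = 127;
const unsigned kAllOnesExponent = 255; // biased exponent shared by inf and NaN

// Unclamped threshold from the description: ExponentBias + BitWidth - IsSigned.
unsigned rawSaturationThreshold(unsigned BitWidth, bool IsSigned) {
  assert(BitWidth > 0 && "integer type must have at least one bit");
  return kExponentBias + BitWidth - (IsSigned ? 1u : 0u);
}

// Clamped threshold: never larger than the all-ones exponent, so a value
// with exponent 255 (such as inf) always reaches it.
unsigned saturationThreshold(unsigned BitWidth, bool IsSigned) {
  return std::min(rawSaturationThreshold(BitWidth, IsSigned), kAllOnesExponent);
}

// True if a value with this biased exponent takes the saturating path.
bool saturates(unsigned BiasedExponent, unsigned BitWidth, bool IsSigned) {
  return BiasedExponent >= saturationThreshold(BitWidth, IsSigned);
}
```

For a 32-bit signed target the clamp changes nothing, since 127 + 32 - 1 = 158 is already below 255. For a 256-bit target the raw threshold exceeds 255, and only the clamped version lets `saturates(255, 256, true)` return true.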

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
alexey-bataev and others added 20 commits May 30, 2026 07:43
ExtractCostCalculated deduplicates by scalar so only the first
ExternalUser determines the scale, making the cost depend on IR block
ordering via LLVM's reverse-insertion use-list order.
Add a pre-pass computing ScalarToExtractBlock - the nearest common
dominator of all effective extract sites per scalar. For PHI users inside
a loop the effective site is the incoming block; for PHI users outside
all loops it is the PHI's own block (scale = 1). The extract cost is
then scaled by getLoopNestScale of the NCD block, which is fully
order-independent.

Fixes llvm#199548

Reviewers: bababuck, RKSimon, hiraditya

Pull Request: llvm#199962
…00393)

Calling Integral::getPtr() shouldn't happen for AddrLabelDiff integrals.
…C) (llvm#200255)

Set the scalar type for VPBlendRecipe and VPExpressionRecipe at
construction time, instead of inferring it on demand via VPTypeAnalysis.
With this change, all VPValues have their scalar type set at
construction, so VPTypeAnalysis::inferScalarType becomes a thin wrapper
around VPValue::getScalarType.

To be removed in a follow-up:
llvm#200256.

PR: llvm#200255
Update the assertion text to match the actual code behavior.
Some functions enforce strictly positive values, whereas the error
message incorrectly mentioned "nonnegative".
Clang's TBAA grants the [basic.lval]/11.3 char-aliasing privilege only
to the named ::std::byte type (Type::isStdByteType() requires the enum
to be declared in the std namespace). LIBC_NAMESPACE::cpp::byte lives in
libc's cpp namespace, so it gets its own TBAA node disjoint from char
even though it has the same shape as std::byte.

That mismatch lets the optimizer reorder typed loads past raw-byte
writes through cpp::byte *, miscompiling HeapSort on rv64/Release
(UnsortedThreeElementArray{1,2,3}, UnsortedTwoElementArray1 in
SortingTest.h). The same hazard is latent in every cpp::byte *-based
raw-aliasing site: memory_utils Ptr/CPtr, lsearch/lfind, block.h and
freelist_heap.h allocator metadata.

Tag the type with gnu::may_alias so accesses through cpp::byte * share
the universal char-aliasing TBAA node, fixing all of the above in one
place. This patch also reverts PR llvm#194171, as the may_alias attribute
fixes it too.
Mark Complex visitExpr as unsupported, similar to Clang ORCG, not as NYI
Although this gets inlined in LTO builds, non-LTO builds benefit from
having isIntegerTy(n) defined in a header.

Co-authored-by: Nikita Popov <npopov@redhat.com>
In LLVM_ENABLE_ABI_BREAKING_CHECKS builds, when poisoned (`deleted()` or
`allUsesReplacedWith()`), a PoisoningVH is removed from its use list but
keeps its raw value pointer for identity, so its PrevPtr/Next are left
stale.

PoisoningVH has no move constructor, so relocating a value that embeds one
falls back to the copy constructor, where `setRawValPtr` relinks with
the stale pointers and corrupts the use list.

This is a latent bug for any relocation of a PoisoningVH handle,
but it becomes load-bearing for llvm#199615, whose relocating erase exercises
it via ScalarEvolution's BackedgeTakenInfo (its ExitNotTakenInfo holds a
PoisoningVH<BasicBlock>).

Fix by special casing the `Poisoned` case.
Aided By Claude Opus 4.8
- Pin containers.
- Cleaner names for different targets and options.
- Add Build Test step.
- Skip shared tests and death tests.
…ernel arguments (llvm#199483)"

  needs downstream integration: see rlieberm/SalaOffProblem

This reverts commit f15904e.
@ronlieb ronlieb requested review from a team, dpalermo and skganesan008 May 30, 2026 16:51
@ronlieb ronlieb requested review from antiagainst and kuhar as code owners May 30, 2026 16:51
@ronlieb ronlieb removed request for antiagainst and kuhar May 30, 2026 16:51
@ronlieb ronlieb merged commit 183fecb into amd-staging May 31, 2026
121 of 125 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20260530113308 branch May 31, 2026 04:42