
merge main into amd-staging #2737

Merged
68 commits merged into amd-staging from amd/merge/upstream_merge_20260529234307
May 31, 2026
ronlieb (Collaborator) commented May 30, 2026

No description provided.

bulbazord and others added 30 commits May 29, 2026 10:11
This test explicitly checks for spurious DWARF lookups by looking for
the string `$__lldb` in the dwarf lookup logs. The intent here is to
validate that any calls to `FindFunctions` or `FindTypes` won't have
`$__lldb` in its lookup target. On arm64, this is trivial because `expr
self` goes through the IRInterpreter.

However, on arm64e, `expr self` is JIT compiled and executed in the
inferior process. LLDB installs some utility functions to check
invariants and those functions are usually prefixed with `$__lldb`. When
those utility functions appear in the logs, the test incorrectly fails.
)

Add more test cases based on false negatives we've seen in the past.
…ter. (llvm#200309)

Emit a warning for an uninitialized local pointer variable.
Summary:
Ever since llvm#171515 we now build
the module files in the runtime step. This is problematic because pretty
much all of the tests depend on them. The current dependency chain
doesn't work correctly because the dependencies are set much later.

This fix just adds a global property so that we can set these
out-of-order and sets that property to depend on all the runtime tests.

Flang now becomes the only test suite that depends on the runtimes like
this, every other project's test suite is hermetic. However, moving
pretty much every flang test to flang-rt isn't ideal. We can investigate
this in depth later but for now this fixes the build.
…#200430)

After PR184065 was committed, memprof ThinLTO builds were failing on
imported aliases, which now have the original aliasee guid attached
as metadata (we import aliases as a copy of the aliasee body). In
distributed ThinLTO, unless also importing the aliasee symbol, we won't
have an entry in the summary for the aliasee guid. And we now don't have
a way to locate the alias summary, which caused some assumptions and
assertions to fail.

Work around this with a TODO to add a way to find the original alias
guid.
This is compatible with GNU, as well as being shorter and allowing users
to specify symbol names with commas in them.

Note that this is distinct from the existing --disassemble which has
existed for a long time and disassembles all symbols. This change adds
a near-alias for the existing LLVM-specific --disassemble-symbols=.

Reviewers: jh7370, MaskRay

Pull Request: llvm#196594
The driver resolves the path to the linker ("ld") to the absolute path.
Microsoft Visual Studio comes with its own instance of `ld.lld` which it
installs under

"c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe"

In the developer command prompt, this path is added to PATH where the
Clang driver finds it. However, this path does not match regular
expression `"{{[^" ]*}}ld{{[^" ]*}}"` used by the MIPS test because the
path contains spaces.

Fix the test failure by matching only the trailing component after
`ld`. Matching the prefix of the path is unique to the MIPS test, this
is not done with tests for other platforms.
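
The mismatch can be reproduced outside of FileCheck. The sketch below translates the old CHECK pattern by hand into a single Python regular expression; the relaxed pattern is a hypothetical stand-in for "match only the trailing component after `ld`" and is not the exact CHECK line from the patch. `PLAIN_LINKER` is likewise an invented example path.

```python
import re

# Quoted linker path as the driver prints it; this one contains spaces.
VS_LINKER = r'"c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe"'
# Invented path without spaces, for comparison.
PLAIN_LINKER = '"/usr/bin/ld.lld"'

# Hand translation of "{{[^" ]*}}ld{{[^" ]*}}": no quote or space is allowed
# anywhere between the opening and closing quote.
OLD_PATTERN = re.compile(r'"[^" ]*ld[^" ]*"')

# Hypothetical relaxed pattern: only constrain what follows `ld`.
RELAXED_PATTERN = re.compile(r'ld[^" ]*"')


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None
```

With these definitions, `OLD_PATTERN` matches `PLAIN_LINKER` but not `VS_LINKER`, while `RELAXED_PATTERN` matches both.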
It has always been the intent that it was possible to duplicate
unnamed_addr constants, and LTO takes advantage of it. The current LangRef
wording allows it, but it was not explicitly spelled out, which led Clang
developers to add an optimization that assumed that it wasn't possible.

We are considering changing the semantics of unnamed_addr, but for now,
just make the current state explicit.

Reviewers:
teresajohnson, zygoloid, rjmccall, ChuanqiXu9, efriedma-quic, ojhunt

Pull Request: llvm#199251
This showed up on a spec test, but is a very simple system-sequentially
consistent fence instruction.
This PR changes olLaunchKernel to accept an array of pointers to
arguments:
```
  void *ArgPtrs[] = {&A, &B, &C};
  size_t ArgSizes[] = {sizeof(A), sizeof(B), sizeof(C)};

  olLaunchKernel(Queue, Device, Kernel, &LaunchArgs, std::size(ArgPtrs), ArgPtrs, ArgSizes);
```

The newly proposed interface is implementable by existing and anticipated
backends, is familiar to CUDA programmers, eliminates the extraneous
construction of a contiguous arguments buffer, replacing it with
constructing an array of pointers, sidesteps the alignment requirements,
does not require reading program image metadata where it's impractical,
and enables a compliant SYCL implementation to be built on top of it.

The ArgSizes array is required to support OpenCL, which does not have
native support for launching a kernel with an argument pointer array, or
a reliable way of retrieving argument sizes for a kernel.

## Mapping the proposed API to backends

CUDA and Level-Zero both support accepting an array of pointers to
kernel arguments, through `cuLaunchKernel`

(https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15)
and `zeCommandListAppendLaunchKernelWithArguments`

(https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistappendlaunchkernelwitharguments)
respectively.

For OpenCL, which requires the kernel arguments to be set separately
from the kernel launch, a potential implementation can extract the
number of arguments from the OpenCL API, and then iterate over the
argument pointer and size arrays:
```
  cl_uint num_args = 0;
  cl_int err = clGetKernelInfo(kernel, CL_KERNEL_NUM_ARGS,
                               sizeof(num_args), &num_args, NULL);
  for (cl_uint i = 0; i < num_args; ++i) {
    clSetKernelArg(kernel, i, ArgSizes[i], ArgPtrs[i]);
  }
```
The AMDGPU plugin needs to construct a contiguous buffer using the array
of argument pointers. To do this conversion, we need to have offsets
at which to place the arguments. Here, luckily, as mentioned before,
the AMD plugin already reads kernel metadata from the program image.
The implementation simply retrieves size and offset for each argument
from the kernel image, and then uses it to populate a buffer.
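
The pointer-array-to-buffer conversion can be sketched in a few lines. This is an illustration in Python rather than the plugin's C++: `pack_kernel_args` is a hypothetical helper, and the `(offset, size)` pairs are made-up values standing in for what the plugin reads from the kernel image.

```python
import struct
from typing import Sequence, Tuple


def pack_kernel_args(
    args: Sequence[bytes],
    layout: Sequence[Tuple[int, int]],
) -> bytes:
    """Copy each argument into a contiguous buffer at its (offset, size) slot."""
    if len(args) != len(layout):
        raise ValueError("argument count does not match kernel metadata")
    total = max((offset + size for offset, size in layout), default=0)
    buffer = bytearray(total)  # zero-filled, so padding bytes stay zero
    for arg, (offset, size) in zip(args, layout):
        if len(arg) != size:
            raise ValueError("argument size does not match kernel metadata")
        buffer[offset:offset + size] = arg
    return bytes(buffer)


# Assumed layout: a 4-byte int at offset 0 and an 8-byte pointer at offset 8.
ARG_A = struct.pack("<i", 7)
ARG_B = struct.pack("<Q", 0x1000)
PACKED = pack_kernel_args([ARG_A, ARG_B], [(0, 4), (8, 8)])
```

The caller never has to know the alignment rules: the four padding bytes between the two arguments come entirely from the offsets in the metadata.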
[This
commit](llvm@5536348)
moved the config file `open()` into a `with` context manager but left
the trailing `f.close()` call behind. Since the context manager already
closes the file, the call is redundant. It is also outside the `with`
block, so `f` is unbound on the `except OSError` path. This removes it.
No change in behavior.
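
A minimal sketch of the pattern, using a hypothetical `read_config` helper rather than the script touched by the commit, shows why the trailing call is unnecessary and why `f` cannot be relied on after a failed `open()`:

```python
import os
import tempfile


def read_config(path):
    """Return the file's text, or None if the file cannot be opened or read."""
    try:
        with open(path) as f:
            contents = f.read()
    except OSError:
        # open() may have raised before `f` was ever bound, so `f` must not
        # be referenced on this path.
        return None
    # The context manager has already closed the file; an explicit
    # f.close() here would be a no-op.
    assert f.closed
    return contents


def demo():
    """Read one existing and one missing file from a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        present = os.path.join(tmp, "lit.site.cfg")
        with open(present, "w") as out:
            out.write("config")
        return read_config(present), read_config(os.path.join(tmp, "missing.cfg"))
```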

Signed-off-by: Prasoon Kumar <prasoonkumar054@gmail.com>
5047ae2 left out some whitespace that
is necessary for sphinx to work.
This patch modifies existing splitdouble lowering to handle matrix
types.
Fix: llvm#199069
llvm#200353)

The libclc lit test 'math/cos.cl' (introduced by llvm#197151 with
auto-generated FileCheck assertions) started failing on the staging
buildbot after the revert in llvm#199981 of the AMDGPU ABI coercion change
(llvm#185083). The revert restores the older codegen path, which produces a
slightly different IR shape for the select instruction inside the cos()
implementation:

  - the select's 'contract' fast-math flag is no longer present, and
  - the result is named %.v.i.i instead of a numbered SSA temporary.

Regenerate the AMDGCN CHECK lines with
libclc/test/update_libclc_tests.py to reflect the post-revert IR. After
this change, all 10 libclc tests in check-libclc-amdgcn-amd-amdhsa-llvm
pass.

Note: running update_libclc_tests.py also produces a purely cosmetic
diff in math/rsqrt.cl (it renames a few FileCheck capture names like
META13 to META12 to match the actual metadata indices); those capture
renames are not required for the test to pass and are not included here
to keep the change minimal.

Co-authored-by: mselehov <mselehov@amd.com>
…lvm#199964)

In a declarations-only module, which produces no `MachineFunction` at all,
`SPIRVModuleAnalysis` is freed before `AsmPrinter::doFinalization`, and
`outputModuleSections` asserts in `getAnalysis<SPIRVModuleAnalysis>()`,
which causes a failure.
CreateCondBr always returns an Instruction, so no need to dyn_cast back
to an instruction after downcasting to a Value.
…andling (llvm#200096)

Several small fixes and improvements to `clang-sycl-linker`'s command-line
handling, plus completing the `--spirv-dump-device-code` option:

- **`--version`**: now exits with `EXIT_SUCCESS` after printing, instead of
  falling through into the rest of `main`.
- **Empty input**: report a clear "No input files provided" error from
  `getInput` rather than triggering an assertion deep inside
  `linkDeviceCode`.
- **`--spirv-dump-device-code`**: previously parsed but ignored; now actually
  copies each generated `.spv` file into the requested directory. The
  directory is created up-front (`mkdir -p` semantics) so a missing path
  produces a friendly diagnostic instead of a low-level copy errno.


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es (llvm#200327)

mergeConditionalStoreToAddress() merges two stores into one.  It does
this for non-atomic and atomic-unordered stores, but when merging
unordered stores, it would downgrade them to non-atomic!

This bug isn't accessible from C because C doesn't expose unordered
atomics. But you can access it from e.g. Objective-C with something like

```
// repro.m — clang -fno-objc-arc -O2
__attribute__((objc_root_class))
@interface C { int _value; }
@property(atomic, direct) int value;
@end
@implementation C
@end

void f(C *obj, _Bool c1, _Bool c2, int v1, int v2) {
    if (!obj) __builtin_unreachable();
    if (c1) obj.value = v1;
    if (c2) obj.value = v2;
}
```

LLVM merges these into a single store.  The store is non-atomic without
this change.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#195745)

Third PR in the series splitting
[llvm#192119](llvm#192119) /
[llvm#192124](llvm#192124).
[llvm#195725](llvm#195725) and
[llvm#195737](llvm#195737) have merged;
this PR is now a standalone diff on main.

Adds Extend (signext / zeroext) to `cir-call-conv-lowering`. The CIR
signature keeps the original narrow integer type; the rewriter attaches
`llvm.signext` / `llvm.zeroext` to `arg_attrs` and `res_attrs`. That
matches classic Clang's LLVM IR convention — `define void @f(i8 signext
%x)`, not `define void @f(i32 signext %x)` with an entry-block
truncation. The `coercedType` field on an Extend `ArgClassification` is
informational only; the rewriter doesn't use it to change the CIR
signature.

Three `.cir` tests cover narrow-signed-arg, narrow-unsigned-arg, and
narrow-signed-return. Since the test target's narrow-int Extend rule
fires only on MLIR `IntegerType` and CIR functions use `cir::IntType`,
these tests drive the rewriter through the classification-injection path
added in [llvm#195725](llvm#195725).
Link using `-random_uuid` on macOS to avoid accidental UUID matching in tests.
…#199573)

Enable `init_priority` on z/OS

Motivation
The recent addition of `clang/test/Sema/type-dependent-attrs.cpp` in
llvm#182208 started failing on
z/OS. That test uses `[[gnu::init_priority(2000)]]`, and the failure
exposed that init_priority support was still disabled for z/OS in
`Attr.td`.

What changed

- Enabled init_priority for z/OS in `clang/include/clang/Basic/Attr.td`
- Updated `clang/test/SemaCXX/init-priority-attr.cpp` so z/OS now
expects normal semantic handling for init_priority

This reverts commit 2c7e24c and
preserves any changes made after this commit.
…control hints (llvm#181612)

Add target-agnostic infrastructure for the !mem.cache_hint metadata
kind,
https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443

This patch includes:
- Registration of mem.cache_hint in FixedMetadataKinds
- IR Verifier validation of structural constraints
- Metadata helper support in combineMetadata(), copyMetadataForLoad(),
and dropUBImplyingAttrsAndMetadata()
- LangRef documentation for the metadata format and semantics
- Verifier and transform pass test coverage (GVN, InstCombine,
SimplifyCFG)

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
Assisted-by: Claude Code

---------

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
This test uses outdated `cbuffer` layout design. It has been replaced by
`cbuffer-metadata.ll` when we updated the frontend to use explicit
padding for `cbuffer` data types.
Ran this on an Android device using both algorithms. The new algorithm
is on average 10% faster, and up to 15% faster in some cases. This
is an example of the speed-ups.

Average Operation Time    Maximum Operation Time   Name
        326.9(ns)                 80770(ns)        PushBlocks New
        365.9(ns)                108032(ns)        PushBlocks Old
…h Atomic-Clause (llvm#199636)

Adhering to the restrictions on using Memory-Order-Clause with
Atomic-Clause.
Added warnings to indicate the transformations that will be done internally
in flang.

In the process of handling all the restrictions on using
memory-order-clause, this also fixes
[llvm#199490](llvm#199490)

---------

Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
This is a very simple implementation, we just make sure we add the base
class destructor to the cleanup scope.
…200261)

When expanding fptoui.sat/fptosi.sat, we saturate when the biased
exponent is at least ExponentBias + BitWidth - IsSigned, the point where
the value no longer fits in the target integer.

We should *also* always saturate when the floating-point value is
+/-inf.  Usually this doesn't require any special handling; for example
for a float32 -> int32 conversion, inf has a biased exponent of 255 >
ExponentBias + BitWidth - IsSigned = 127 + 32 - 1.

But for integer types which are large enough to contain all source
floating-point values, this doesn't work. For example, if you're
converting float32 to int256, you'd compute a threshold of
127 + 256 - 1 = 382, which is greater than 255.  Therefore float32(inf)
would not correctly saturate to INT256_MAX.

Fix this by clamping the threshold to the all-ones biased exponent.
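
The arithmetic can be checked with a short sketch. It only models the threshold formula quoted above; `saturation_threshold` and `saturates` are hypothetical names, not functions from the pass.

```python
def saturation_threshold(exponent_bits: int, int_bits: int, is_signed: bool) -> int:
    """Biased exponent at or above which the conversion saturates."""
    exponent_bias = (1 << (exponent_bits - 1)) - 1  # 127 for float32
    all_ones = (1 << exponent_bits) - 1             # 255 for float32 (inf/NaN)
    unclamped = exponent_bias + int_bits - int(is_signed)
    # Clamp so that an all-ones exponent (inf) always reaches the threshold.
    return min(unclamped, all_ones)


def saturates(biased_exponent: int, exponent_bits: int, int_bits: int,
              is_signed: bool) -> bool:
    """True if a value with this biased exponent takes the saturating path."""
    return biased_exponent >= saturation_threshold(exponent_bits, int_bits, is_signed)


FLOAT32_EXPONENT_BITS = 8
FLOAT32_INF_EXPONENT = 255
```

For float32 to int32 the clamp changes nothing (the threshold stays 127 + 32 - 1 = 158). For float32 to a signed or unsigned 256-bit integer the unclamped value would be 382 or 383, and the clamp brings it down to 255 so that inf saturates.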

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
joker-eph and others added 23 commits May 30, 2026 00:11
…orStore (llvm#189235)

When `ConvertVectorStore` emits the narrow-type emulation for a
`vector.store` into a 2-D memref, it previously assumed that if the
trailing dimension of the memref exactly matches the vector size
(`trailingDimsMatch`), then the last-dimension index must be zero and no
sub-byte alignment adjustment is needed. This assumption is wrong: a
valid store such as

  vector.store %v, %src[%c0, %c1] : memref<3x4xi2>, vector<4xi2>

has a non-zero column index (%c1 == 1) even though trailingDim (4)
equals the vector size (4). The incorrect shortcut caused the pattern to
fall into the "aligned" path and emit a plain bitcast + store at byte
offset 0, silently dropping elements [1], [2], [3] of the first byte and
overwriting the wrong memory.

Fix: use `linearizedInfo.intraDataOffset` when it can be folded, so
constant non-zero offsets emit the required partial RMW stores. If the
offset is dynamic, reject the generic unaligned lowering instead of
assuming byte alignment; callers that can guarantee container-element
alignment should use the existing `assumeAligned` path.

The regression coverage includes the original constant-index case,
dynamic unaligned stores that now fail legalization, and existing
dynamic-row cases where the low-order offset is provably aligned.
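
The index arithmetic behind the failing case can be sketched as follows. This assumes a row-major memref and 8-bit containers; `locate_store` and `containers_touched` are hypothetical helpers that mimic what an intra-data offset represents, not the MLIR utilities themselves.

```python
def locate_store(shape, indices, elem_bits, container_bits=8):
    """Return (container_index, intra_data_offset) for a row-major store."""
    elems_per_container = container_bits // elem_bits
    linear = 0
    for dim, idx in zip(shape, indices):
        linear = linear * dim + idx  # row-major linearization
    return linear // elems_per_container, linear % elems_per_container


def containers_touched(shape, indices, vector_len, elem_bits, container_bits=8):
    """List the container (byte) indices a vector store writes into."""
    elems_per_container = container_bits // elem_bits
    first, offset = locate_store(shape, indices, elem_bits, container_bits)
    last_elem = first * elems_per_container + offset + vector_len - 1
    return list(range(first, last_elem // elems_per_container + 1))
```

For `memref<3x4xi2>` with indices `[0, 1]`, `locate_store((3, 4), (0, 1), 2)` gives container 0 with intra-data offset 1, and a `vector<4xi2>` store straddles bytes 0 and 1. With indices `[1, 0]` the offset is 0 and the store fits in byte 1 alone, which is the only situation the old shortcut handled correctly.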

Fixes llvm#131528

Assisted-by: Claude Code
Assisted-by: Codex
)

Mangled should not be more than 4GB. This will halve the size of
`DemangledNameInfo` from 128 to 64 bytes.
…dware accelerators like gpus (llvm#198907)

This is the first patch of many related to
https://discourse.llvm.org/t/upstreaming-basic-support-for-accelerators/89827/6

### What this patch adds

  - **`LLDBServerAcceleratorPlugin`** base class in
`source/Plugins/Process/gdb-remote/` so accelerators can implement their
own plugin
  - **`accelerator-plugins+`** feature in `qSupported` response, only
    advertised when at least one plugin is installed
- **`jAcceleratorPluginInitialize`** GDB remote packet and its
implementation in handlers, request and response.
- **`AcceleratorActions`** struct so every plugin can return the actions
that need to be taken on initialization. In the future we will extend this
to install breakpoints, etc.
  - **Mock accelerator plugin** for testing, gated by CMake option
    `LLDB_ENABLE_MOCK_ACCELERATOR_PLUGIN` (default OFF)
  - **Tests** that connect to a real lldb-server, verify
    `accelerator-plugins+` in `qSupported`, send
    `jAcceleratorPluginInitialize`, and validate the JSON response

  ### Design decisions
  - CMake option defaults to OFF so normal builds are unaffected
  - Tests skip automatically when the plugin is not compiled in
…st events (llvm#200474)

For pull_request events, the author of the pull request has full control
over the workflow job and can potentially write any pull request number
to the file.   This would allow them to modify comments on any pull
request not just their own.

The reading of the pull request number from the file was added in
53ff447 to support issue_comment events
so we don't need it for pull_request events anyway.
…ck (llvm#200515)

Jonas caught that I had a typo in checking for the
`sizeof_mh_and_loadcmds` key in the `jGetLoadedDynamicLibrariesInfos`
response from debugserver in DynamicLoaderDarwin. Fix that.

Also I originally picked a fallback value for the mach header + load
commands as a guess. I've since looked at a large UI app's binaries and
based on the size of their actual mh+load commands, picked a default
that will read all the data needed in the majority of cases.

rdar://178283767
…of inf (llvm#200291)

[ExpandIRInsts] Fix sitofp/uitofp producing garbage instead of inf

s/uitofp of an integer larger than the max finite floating-point value
should produce inf.  This can't happen with e.g. an int32 -> float32
conversion, but can happen for e.g. int256 -> float32.

Before this change we'd produce garbage.
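
Which integer widths can exceed the largest finite value is easy to check with exact integer arithmetic. The sketch below uses hypothetical helper names and ignores rounding: integers only slightly above the maximum still round down to it, so the check marks where overflow becomes possible, not the precise rounding boundary.

```python
def max_finite(exponent_bits: int, mantissa_bits: int) -> int:
    """Largest finite value of an IEEE-754 binary format, as an exact integer."""
    max_exponent = (1 << (exponent_bits - 1)) - 1  # 127 for float32
    # (2 - 2**-mantissa_bits) * 2**max_exponent, kept in integers.
    return ((1 << (mantissa_bits + 1)) - 1) << (max_exponent - mantissa_bits)


def can_exceed_max_finite(int_bits: int, is_signed: bool,
                          exponent_bits: int = 8, mantissa_bits: int = 23) -> bool:
    """True if the largest integer of this width is above the max finite float."""
    largest_int = (1 << (int_bits - int(is_signed))) - 1
    return largest_int > max_finite(exponent_bits, mantissa_bits)
```

With the float32 defaults, `can_exceed_max_finite(32, True)` is false and `can_exceed_max_finite(256, True)` is true, matching the two cases in the description.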

Fixes llvm#189054.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lvm#199256)

The existing setLastAccessAndModificationTime takes a file descriptor.
Add a const Twine & overload that opens the path internally so callers
no longer need to manage the fd themselves. The new overload accepts
both files and directories: on POSIX, O_RDONLY opens directories and the
existing fd-based implementation accepts a directory fd. On Windows,
FILE_FLAG_BACKUP_SEMANTICS is required to obtain a handle for a
directory.

The path overload pair mirrors the existing (Twine &) / (int FD) shape
used by setPermissions and resize_file.
Cover the HIP `__hipRegisterVar` path in CIR and LLVM.
Introduced in llvm#200291. Exclude for now while we get to fixing it.
`copyPhysReg` selected `KMOVQkk_EVEX` for a `$k -> $k` VK16 copy on a
`+egpr` (APX) subtarget even without BWI, but `KMOVQ` requires BWI. Use
`KMOVW` instead.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
The earlier fix in commit 8a0d145 (llvm#158744) only emitted a hard
error for os_log arguments of record or complex type that took the
VarArgKind::Valid / ValidInCXX11 path in checkFormatExpr. Arguments of
non-trivial C++ class type (non-trivial copy/move ctor or dtor) instead
take the VarArgKind::Undefined path, which only emitted the
-Wnon-pod-varargs warning and let compilation proceed into CodeGen.

There, emitBuiltinOSLogFormat passes the argument expression to
EmitScalarExpr, which requires a scalar type. A non-trivial class
argument is not a scalar, so CodeGen crashes (asserting in
hasScalarEvaluationKind in assertions builds).

Emit the hard error err_format_conversion_argument_type_mismatch on the
Undefined path too, so compilation stops before CodeGen.

rdar://174747930
llvm#200499)

…tion

The current wording of the hint is so long that it obscures the
output of the command, which can be confusing. By shortening the message
the command output hopefully comes back into the center of attention.
`determineCalleeSaves` can run more than once and as a result we were
appending duplicate Zilsd GPRPair CSRs. Skip a pair if it is already
present in the CSR set.
…9257)

When dsymutil rewrites an existing .dSYM bundle, only the inner DWARF
file is replaced and the bundle directory's mtime stays frozen at the
time of the original build.

macOS Spotlight's bundle re-import path keys off the bundle directory's
mtime to decide whether the importer should re-run. With the mtime
frozen, Spotlight keeps the previous build's UUID indexed forever, so
DebugSymbols.framework's Spotlight lookup misses on the new UUID.

Bump the bundle directory's mtime explicitly at the end of a successful
run, reusing the .dSYM extraction already used by the codesign path.
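
The effect is easy to observe with ordinary filesystem calls. The sketch below is a Python illustration, not dsymutil's implementation: `bump_mtime` is a hypothetical helper, and the bundle layout is recreated in a scratch directory with an artificially old timestamp.

```python
import os
import tempfile


def bump_mtime(bundle_dir: str) -> None:
    """Set the directory's access and modification times to the current time."""
    os.utime(bundle_dir, None)


def demo():
    """Rewrite the inner DWARF file, then bump the bundle directory's mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = os.path.join(tmp, "a.out.dSYM")
        dwarf_dir = os.path.join(bundle, "Contents", "Resources", "DWARF")
        os.makedirs(dwarf_dir)
        inner = os.path.join(dwarf_dir, "a.out")
        with open(inner, "w") as f:
            f.write("old build")
        # Pretend the bundle was created by a much earlier build.
        os.utime(bundle, (1_000_000_000, 1_000_000_000))
        # Rewriting the inner file leaves the bundle directory's mtime alone.
        with open(inner, "w") as f:
            f.write("new build")
        before = os.stat(bundle).st_mtime
        bump_mtime(bundle)
        after = os.stat(bundle).st_mtime
        return before, after
```

`demo()` returns the old timestamp as `before`, showing that the rewrite alone did not touch the bundle directory, and a current timestamp as `after`.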

rdar://177725866
They are unsupported and will hopefully always be.
…0529)

These three tests pass when run against a remote-darwin platform backed
by lldb-platform on device. Update each decorator to reflect where it's
still expected to fail rather than blanket-XFAILing every remote run.

- `TestAssertMessages.test_createTestTarget`: was XFAIL on
oslist=no_match(["linux"]) + remote=True. Add darwin_all to the no_match
list so the XFAIL stays only on remote-windows / remote-freebsd /
remote-netbsd / remote-android.
- `TestDebuggerAPI.test_CreateTarget_platform`: scope to non-Darwin
remotes (bug llvm#92419 still tracks the underlying issue
on those platforms).
- `TestObjcOptimized`: drop @expectedFailureAll(remote=True) from the
test method and put @skipUnlessDarwin on the class. The Makefile depends
on `-framework Foundation` and `-lobjc`, so the test cannot build on
non-Darwin platforms — skip it there outright instead of pretending it
could XFAIL.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…9396)

To enable the new constant interpreter by default at configure time.

I don't expect any distributions to set this for now but it's useful for
testing and I think we need it eventually.
ronlieb requested review from a team, dpalermo and skganesan008 May 30, 2026 06:14
ronlieb closed this pull request by merging all changes into amd-staging in 183fecb May 31, 2026
ronlieb deleted the amd/merge/upstream_merge_20260529234307 branch May 31, 2026 04:42