merge main into amd-staging#2737
Merged
68 commits merged on May 31, 2026
This test explicitly checks for spurious DWARF lookups by looking for the string `$__lldb` in the DWARF lookup logs. The intent is to validate that calls to `FindFunctions` or `FindTypes` never have `$__lldb` in their lookup target. On arm64, this trivially holds because `expr self` goes through the IRInterpreter. However, on arm64e, `expr self` is JIT-compiled and executed in the inferior process. LLDB installs some utility functions to check invariants, and those functions are usually prefixed with `$__lldb`. When those utility functions appear in the logs, the test incorrectly fails.
…ter. (llvm#200309) Emit a warning for an uninitialized local pointer variable.
Summary: Ever since llvm#171515, we build the module files in the runtime step. This is problematic because pretty much all of the tests depend on them. The current dependency chain doesn't work correctly because the dependencies are set much later. This fix just adds a global property so that we can set these out of order, and sets that property to depend on all the runtime tests. Flang now becomes the only test suite that depends on the runtimes like this; every other project's test suite is hermetic. However, moving pretty much every flang test to flang-rt isn't ideal. We can investigate this in depth later, but for now this fixes the build.
…#200430) After PR184065 was committed, memprof ThinLTO builds were failing on imported aliases, which now have the original aliasee guid attached as metadata (we import aliases as a copy of the aliasee body). In distributed ThinLTO, unless also importing the aliasee symbol, we won't have an entry in the summary for the aliasee guid. And we now don't have a way to locate the alias summary, which caused some assumptions and assertions to fail. Work around this with a TODO to add a way to find the original alias guid.
This is compatible with GNU, as well as being shorter and allowing users to specify symbol names with commas in them. Note that this is distinct from the existing --disassemble which has existed for a long time and disassembles all symbols. This change adds a near-alias for the existing LLVM-specific --disassemble-symbols=. Reviewers: jh7370, MaskRay Pull Request: llvm#196594
The driver resolves the path to the linker ("ld") to the absolute path.
Microsoft Visual Studio comes with its own instance of `ld.lld` which it
installs under
"c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe"
In the developer command prompt, this path is added to PATH where the
Clang driver finds it. However, this path does not match regular
expression `"{{[^" ]*}}ld{{[^" ]*}}"` used by the MIPS test because the
path contains spaces.
Fix the test failure by matching only the trailing component after
`ld`. Matching the path prefix is unique to the MIPS test; tests for
other platforms do not do this.
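To see why the old pattern fails on the Visual Studio path, the sketch below checks it with C++ `std::regex`. For these simple bracket expressions, ECMAScript syntax and FileCheck's POSIX-style regex agree. The quoted path is copied from the message above; the second pattern is only a hypothetical approximation of "match from `ld` onward", not the exact pattern the test now uses.

```cpp
#include <regex>
#include <string>

// Linker path as the driver prints it (quoted, with escaped backslashes).
const std::string VSLinkerPath =
    R"("c:\\program files\\microsoft visual studio\\2022\\community\\vc\\tools\\llvm\\x64\\bin\\ld.lld.exe")";

// Old check: "{{[^" ]*}}ld{{[^" ]*}}" requires the whole quoted path to be
// free of spaces, so it cannot match a path under "program files".
bool matchesOldPattern(const std::string &S) {
  static const std::regex Old(R"("[^" ]*ld[^" ]*")");
  return std::regex_search(S, Old);
}

// Hypothetical trailing-component check: only the text from `ld` up to the
// closing quote has to be free of spaces.
bool matchesTrailingPattern(const std::string &S) {
  static const std::regex Trailing(R"(ld[^" ]*")");
  return std::regex_search(S, Trailing);
}
```

With the Visual Studio path, `matchesOldPattern` fails because the text between the opening quote and `ld` contains spaces. `matchesTrailingPattern` succeeds because it never looks at that prefix.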
It has always been the intent that it was possible to duplicate unnamed_addr constants, and LTO takes advantage of it. The current LangRef wording allows it, but it was not explicitly spelled out, which led Clang developers to add an optimization that assumed that it wasn't possible. We are considering changing the semantics of unnamed_addr, but for now, just make the current state explicit. Reviewers: teresajohnson, zygoloid, rjmccall, ChuanqiXu9, efriedma-quic, ojhunt Pull Request: llvm#199251
This showed up on a spec test, but is a very simple system-sequentially consistent fence instruction.
This PR changes olLaunchKernel to accept an array of pointers to arguments:
```cpp
// ArgPtrs[i] points at the i-th kernel argument; ArgSizes[i] is its size in bytes.
void *ArgPtrs[] = {&A, &B, &C};
size_t ArgSizes[] = {sizeof(A), sizeof(B), sizeof(C)};
olLaunchKernel(Queue, Device, Kernel, &LaunchArgs, std::size(ArgPtrs), ArgPtrs, ArgSizes);
```
The newly proposed interface is implementable by existing and anticipated backends, is familiar to CUDA programmers, eliminates the extraneous construction of a contiguous arguments buffer (replacing it with the construction of an array of pointers), sidesteps the alignment requirements, does not require reading program image metadata where that is impractical, and enables a compliant SYCL implementation to be built on top of it.
The ArgSizes array is required to support OpenCL, which does not have
native support for launching a kernel with an argument pointer array, or
a reliable way of retrieving argument sizes for a kernel.
## Mapping the proposed API to backends
CUDA and Level-Zero both support accepting an array of pointers to
kernel arguments, through `cuLaunchKernel`
(https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15)
and `zeCommandListAppendLaunchKernelWithArguments`
(https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistappendlaunchkernelwitharguments)
respectively.
For OpenCL, which requires the kernel arguments to be set separately
from the kernel launch, a potential implementation can extract the
number of arguments from the OpenCL API, and then iterate over the
argument pointer and size arrays:
```c
cl_uint num_args = 0;
cl_int err = clGetKernelInfo(kernel, CL_KERNEL_NUM_ARGS,
                             sizeof(num_args), &num_args, NULL);
// Set each argument from the pointer/size arrays, stopping at the first error.
for (cl_uint i = 0; err == CL_SUCCESS && i < num_args; ++i) {
  err = clSetKernelArg(kernel, i, ArgSizes[i], ArgPtrs[i]);
}
```
The AMDGPU plugin needs to construct a contiguous buffer from the array
of argument pointers. To do this conversion, it needs the offset at
which to place each argument. Fortunately, as mentioned above, the
AMDGPU plugin already reads kernel metadata from the program image.
The implementation simply retrieves the size and offset of each argument
from the kernel image and then uses them to populate the buffer.
[This commit](llvm@5536348) moved the config file `open()` into a `with` context manager but left the trailing `f.close()` call behind. Since the context manager already closes the file, the call is redundant. It is also outside the `with` block, so `f` is unbound on the `except OSError` path. This removes it. No change in behavior. Signed-off-by: Prasoon Kumar <prasoonkumar054@gmail.com>
5047ae2 left out some whitespace that is necessary for Sphinx to work.
This patch modifies the existing splitdouble lowering to handle matrix types. Fix: llvm#199069
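For reference, HLSL's `splitdouble` writes the low and high 32 bits of a double's IEEE-754 bit pattern to two `uint` outputs. The C++ sketch below models those scalar semantics and, as an assumption, applies them elementwise to a small matrix stored as nested arrays. It illustrates what the lowering has to compute, not how the compiler implements it; `splitDouble` and `splitDoubleMatrix` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar semantics: split the bit pattern of a double into 32-bit halves.
inline void splitDouble(double Value, std::uint32_t &Low, std::uint32_t &High) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  Low = static_cast<std::uint32_t>(Bits);
  High = static_cast<std::uint32_t>(Bits >> 32);
}

// Assumed elementwise extension to an R x C matrix.
template <std::size_t R, std::size_t C>
void splitDoubleMatrix(const std::array<std::array<double, C>, R> &M,
                       std::array<std::array<std::uint32_t, C>, R> &Low,
                       std::array<std::array<std::uint32_t, C>, R> &High) {
  for (std::size_t I = 0; I < R; ++I)
    for (std::size_t J = 0; J < C; ++J)
      splitDouble(M[I][J], Low[I][J], High[I][J]);
}
```

For example, 1.0 has the bit pattern 0x3FF0000000000000, so its low word is 0 and its high word is 0x3FF00000.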
llvm#200353) The libclc lit test 'math/cos.cl' (introduced by llvm#197151 with auto-generated FileCheck assertions) started failing on the staging buildbot after the revert in llvm#199981 of the AMDGPU ABI coercion change (llvm#185083). The revert restores the older codegen path, which produces a slightly different IR shape for the select instruction inside the cos() implementation:

- the select's 'contract' fast-math flag is no longer present, and
- the result is named %.v.i.i instead of a numbered SSA temporary.

Regenerate the AMDGCN CHECK lines with libclc/test/update_libclc_tests.py to reflect the post-revert IR. After this change, all 10 libclc tests in check-libclc-amdgcn-amd-amdhsa-llvm pass.

Note: running update_libclc_tests.py also produces a purely cosmetic diff in math/rsqrt.cl (it renames a few FileCheck capture names like META13 to META12 to match the actual metadata indices); those capture renames are not required for the test to pass and are not included here to keep the change minimal.

Co-authored-by: mselehov <mselehov@amd.com>
…lvm#199964) A declarations-only module produces no `MachineFunction` at all. In that case, `SPIRVModuleAnalysis` is freed before `AsmPrinter::doFinalization`, and `outputModuleSections` asserts in `getAnalysis<SPIRVModuleAnalysis>()`, causing a failure.
`CreateCondBr` always returns an `Instruction`, so there is no need to `dyn_cast` back to an instruction after converting it to a `Value`.
…andling (llvm#200096) Several small fixes and improvements to `clang-sycl-linker`'s command-line handling, plus completing the `--spirv-dump-device-code` option:

- **`--version`**: now exits with `EXIT_SUCCESS` after printing, instead of falling through into the rest of `main`.
- **Empty input**: report a clear "No input files provided" error from `getInput` rather than triggering an assertion deep inside `linkDeviceCode`.
- **`--spirv-dump-device-code`**: previously parsed but ignored; now actually copies each generated `.spv` file into the requested directory. The directory is created up-front (`mkdir -p` semantics) so a missing path produces a friendly diagnostic instead of a low-level copy errno.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
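The "create the directory first, then copy" behavior of `--spirv-dump-device-code` can be sketched with `std::filesystem`. This is a stand-in for whatever LLVM file APIs the tool actually uses, and `dumpDeviceCode` is a hypothetical name.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Copies each .spv file into DumpDir. Returns an empty string on success,
// otherwise a human-readable diagnostic.
std::string dumpDeviceCode(const std::vector<fs::path> &SpvFiles,
                           const fs::path &DumpDir) {
  std::error_code EC;
  // Create the directory up front (mkdir -p semantics; an existing directory
  // is not an error) so a bad path is reported once, with a clear message.
  fs::create_directories(DumpDir, EC);
  if (EC)
    return "cannot create dump directory '" + DumpDir.string() +
           "': " + EC.message();
  for (const fs::path &Spv : SpvFiles) {
    fs::copy_file(Spv, DumpDir / Spv.filename(),
                  fs::copy_options::overwrite_existing, EC);
    if (EC)
      return "cannot copy '" + Spv.string() + "': " + EC.message();
  }
  return "";
}
```

Creating the directory before the copy loop means that a missing intermediate path is handled by the directory creation itself. It does not surface as a failure on the first copy.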
…es (llvm#200327) mergeConditionalStoreToAddress() merges two stores into one. It does this for non-atomic and atomic-unordered stores, but when merging unordered stores, it would downgrade them to non-atomic!

This bug isn't accessible from C because C doesn't expose unordered atomics. But you can access it from e.g. Objective-C with something like

```objc
// repro.m: clang -fno-objc-arc -O2
__attribute__((objc_root_class))
@interface C {
  int _value;
}
@property(atomic, direct) int value;
@end

@implementation C
@end

void f(C *obj, _Bool c1, _Bool c2, int v1, int v2) {
  if (!obj)
    __builtin_unreachable();
  if (c1)
    obj.value = v1;
  if (c2)
    obj.value = v2;
}
```

LLVM merges these into a single store. Without this change, the merged store is non-atomic.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#195745) Third PR in the series splitting [llvm#192119](llvm#192119) / [llvm#192124](llvm#192124). [llvm#195725](llvm#195725) and [llvm#195737](llvm#195737) have merged; this PR is now a standalone diff on main. Adds Extend (signext / zeroext) to `cir-call-conv-lowering`. The CIR signature keeps the original narrow integer type; the rewriter attaches `llvm.signext` / `llvm.zeroext` to `arg_attrs` and `res_attrs`. That matches classic Clang's LLVM IR convention — `define void @f(i8 signext %x)`, not `define void @f(i32 signext %x)` with an entry-block truncation. The `coercedType` field on an Extend `ArgClassification` is informational only; the rewriter doesn't use it to change the CIR signature. Three `.cir` tests cover narrow-signed-arg, narrow-unsigned-arg, and narrow-signed-return. Since the test target's narrow-int Extend rule fires only on MLIR `IntegerType` and CIR functions use `cir::IntType`, these tests drive the rewriter through the classification-injection path added in [llvm#195725](llvm#195725).
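A minimal sketch of a narrow-integer Extend rule of the kind described above follows. It assumes a 32-bit threshold, which is common but target-dependent. `ArgKind`, `IntArgClass`, and `classifyInt` are hypothetical stand-ins, not the CIR `ArgClassification` type. The sketch illustrates that the classification selects only an attribute: the narrow type in the signature is never widened.

```cpp
#include <string>

enum class ArgKind { Direct, Extend };

// Hypothetical classification result: the kind plus the attribute to attach.
struct IntArgClass {
  ArgKind Kind;
  std::string Attr; // "llvm.signext", "llvm.zeroext", or empty
};

// Integers narrower than the assumed 32-bit threshold keep their type but get
// signext/zeroext, mirroring e.g. `define void @f(i8 signext %x)`.
IntArgClass classifyInt(unsigned Width, bool IsSigned) {
  if (Width < 32)
    return {ArgKind::Extend, IsSigned ? "llvm.signext" : "llvm.zeroext"};
  return {ArgKind::Direct, ""};
}
```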
Link using `-random_uuid` on macOS to avoid accidental UUID matching in tests.
…#199573) Enable `init_priority` on z/OS

Motivation

The recent addition of `clang/test/Sema/type-dependent-attrs.cpp` in llvm#182208 started failing on z/OS. That test uses `[[gnu::init_priority(2000)]]`, and the failure exposed that `init_priority` support was still disabled for z/OS in `Attr.td`.

What changed

- Enabled `init_priority` for z/OS in `clang/include/clang/Basic/Attr.td`
- Updated `clang/test/SemaCXX/init-priority-attr.cpp` so z/OS now expects normal semantic handling for `init_priority`

This reverts commit 2c7e24c and preserves any changes made after that commit.
…control hints (llvm#181612) Add target-agnostic infrastructure for the !mem.cache_hint metadata kind (RFC: https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443).

This patch includes:

- Registration of mem.cache_hint in FixedMetadataKinds
- IR Verifier validation of structural constraints
- Metadata helper support in combineMetadata(), copyMetadataForLoad(), and dropUBImplyingAttrsAndMetadata()
- LangRef documentation for the metadata format and semantics
- Verifier and transform pass test coverage (GVN, InstCombine, SimplifyCFG)

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>
Assisted-by: Claude Code
This test uses an outdated `cbuffer` layout design. It was superseded by `cbuffer-metadata.ll` when we updated the frontend to use explicit padding for `cbuffer` data types.
Ran this on an Android device using both algorithms. The new algorithm is on average 10% faster, and is 15% faster in some cases. Here is an example of the speed-ups:

| Average Operation Time | Maximum Operation Time | Name |
| --- | --- | --- |
| 326.9 ns | 80770 ns | PushBlocks New |
| 365.9 ns | 108032 ns | PushBlocks Old |
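As a quick check against the "10% faster" figure, here is the relative reduction in operation time for the example rows (measured as time saved relative to the old algorithm):

$$
\frac{365.9 - 326.9}{365.9} \approx 10.7\%\ \text{(average)}, \qquad
\frac{108032 - 80770}{108032} \approx 25.2\%\ \text{(maximum)}
$$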
…h Atomic-Clause (llvm#199636) Adhere to the restrictions on using a Memory-Order-Clause with an Atomic-Clause, and add warnings to indicate the transformations that will be done internally in flang. While handling all the restrictions on using a memory-order-clause, this also fixes [llvm#199490](llvm#199490).

Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
This is a very simple implementation, we just make sure we add the base class destructor to the cleanup scope.
…200261) When expanding fptoui.sat/fptosi.sat, we saturate when the biased exponent is at least ExponentBias + BitWidth - IsSigned, the point where the value no longer fits in the target integer. We should *also* always saturate when the floating-point value is +/-inf.

Usually this doesn't require any special handling; for example, for a float32 -> int32 conversion, inf has a biased exponent of 255 > ExponentBias + BitWidth - IsSigned = 127 + 32 - 1 = 158. But for integer types that are large enough to contain all source floating-point values, this doesn't work. For example, if you're converting float32 to a signed int256, you'd compute a threshold of 127 + 256 - 1 = 382 (383 for an unsigned int256), which is greater than 255. Therefore float32(inf) would not correctly saturate to INT256_MAX.

Fix this by clamping the threshold to the all-ones biased exponent.

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
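The clamped threshold can be illustrated for float32 sources with the C++ sketch below. It follows the formula in the message, but it is not the ExpandIRInsts code itself, and the function names are hypothetical. NaN also has the all-ones exponent, and fptosi.sat/fptoui.sat map NaN to 0, so a real expansion has to test for NaN separately. The sketch does not.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Saturation threshold on the biased exponent: ExponentBias + BitWidth -
// IsSigned, clamped to the all-ones exponent (that of +/-inf) so that inf
// always saturates, even for very wide integer types.
unsigned saturationThreshold(unsigned ExponentBias, unsigned AllOnesExponent,
                             unsigned BitWidth, bool IsSigned) {
  unsigned Unclamped = ExponentBias + BitWidth - (IsSigned ? 1u : 0u);
  return std::min(Unclamped, AllOnesExponent);
}

// Biased exponent field (bits 23..30) of an IEEE-754 binary32 value.
unsigned biasedExponent(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return (Bits >> 23) & 0xFFu;
}

// True if the magnitude of F is at or past the saturation point
// (NaN handling intentionally omitted; see above).
bool magnitudeTooLarge(float F, unsigned BitWidth, bool IsSigned) {
  return biasedExponent(F) >=
         saturationThreshold(/*ExponentBias=*/127, /*AllOnesExponent=*/255,
                             BitWidth, IsSigned);
}
```

Without the `std::min`, the int256 threshold would be 382, and inf (exponent 255) would fall below it. This is exactly the bug described above.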
…orStore (llvm#189235) When `ConvertVectorStore` emits the narrow-type emulation for a `vector.store` into a 2-D memref, it previously assumed that if the trailing dimension of the memref exactly matches the vector size (`trailingDimsMatch`), then the last-dimension index must be zero and no sub-byte alignment adjustment is needed.

This assumption is wrong: a valid store such as `vector.store %v, %src[%c0, %c1] : memref<3x4xi2>, vector<4xi2>` has a non-zero column index (%c1 == 1) even though trailingDim (4) equals the vector size (4). The incorrect shortcut caused the pattern to fall into the "aligned" path and emit a plain bitcast + store at byte offset 0, silently dropping elements [1], [2], [3] of the first byte and overwriting the wrong memory.

Fix: use `linearizedInfo.intraDataOffset` when it can be folded, so constant non-zero offsets emit the required partial RMW stores. If the offset is dynamic, reject the generic unaligned lowering instead of assuming byte alignment; callers that can guarantee container-element alignment should use the existing `assumeAligned` path.

The regression coverage includes the original constant-index case, dynamic unaligned stores that now fail legalization, and existing dynamic-row cases where the low-order offset is provably aligned.

Fixes llvm#131528

Assisted-by: Claude Code
Assisted-by: Codex
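The arithmetic behind the bug fits in a few lines. For sub-byte elements packed into a container (assumed here to be 8 bits wide), the offset inside the first container is determined by the linearized index. Whether the trailing dimension matches the vector size is irrelevant. `intraContainerOffset` is a hypothetical helper, not the pass's actual `intraDataOffset` computation.

```cpp
// Offset (in elements) of a store's first element inside its containing
// storage word, for a row-major 2-D memref of sub-byte elements.
// A result of 0 means the store starts on a container boundary.
unsigned intraContainerOffset(unsigned Row, unsigned Col, unsigned NumCols,
                              unsigned ElemBits, unsigned ContainerBits = 8) {
  unsigned ElemsPerContainer = ContainerBits / ElemBits; // 4 for i2 in i8
  unsigned Linear = Row * NumCols + Col;                  // linearized index
  return Linear % ElemsPerContainer;
}
```

For `memref<3x4xi2>` at `[%c0, %c1]`, the linear index is 1 and the offset is 1, so the store is unaligned even though the trailing dimension (4) equals the vector size (4). At `[%c1, %c0]`, the linear index is 4 and the offset is 0.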
…ps" (llvm#200512) Reverts llvm#198457 since there are buildbot failures.
…dware accelerators like gpus (llvm#198907) This is the first patch of many related to https://discourse.llvm.org/t/upstreaming-basic-support-for-accelerators/89827/6

### What this patch adds

- **`LLDBServerAcceleratorPlugin`** base class in `source/Plugins/Process/gdb-remote/` so accelerators can implement their own plugins
- **`accelerator-plugins+`** feature in the `qSupported` response, only advertised when at least one plugin is installed
- **`jAcceleratorPluginInitialize`** GDB remote packet and its implementation in the handlers, request, and response.
- **`AcceleratorActions`** struct so every plugin can return the actions that need to be performed on initialization. In the future we will extend this to install breakpoints, etc.
- **Mock accelerator plugin** for testing, gated by CMake option `LLDB_ENABLE_MOCK_ACCELERATOR_PLUGIN` (default OFF)
- **Tests** that connect to a real lldb-server, verify `accelerator-plugins+` in `qSupported`, send `jAcceleratorPluginInitialize`, and validate the JSON response

### Design decisions

- CMake option defaults to OFF so normal builds are unaffected
- Tests skip automatically when the plugin is not compiled in
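In the gdb-remote protocol, a `qSupported` reply is a `;`-separated list of features, and a trailing `+` marks a supported feature. A client-side check for the new `accelerator-plugins+` feature might look like the sketch below. The helper name and the sample reply are hypothetical; this is not LLDB's implementation.

```cpp
#include <sstream>
#include <string>

// Returns true if a qSupported reply lists Feature with a trailing '+',
// e.g. "PacketSize=20000;QStartNoAckMode+;accelerator-plugins+".
bool advertisesFeature(const std::string &Reply, const std::string &Feature) {
  std::istringstream In(Reply);
  std::string Item;
  while (std::getline(In, Item, ';'))
    if (Item == Feature + "+")
      return true;
  return false;
}
```

Since the server only advertises the feature when at least one plugin is installed, a client would send `jAcceleratorPluginInitialize` only when a check like this succeeds.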
Squelch a warning reported by a Buildbot: https://lab.llvm.org/buildbot/#/builders/228/builds/477.
…st events (llvm#200474) For pull_request events, the author of the pull request has full control over the workflow job and can potentially write any pull request number to the file. This would allow them to modify comments on any pull request, not just their own. Reading the pull request number from the file was added in 53ff447 to support issue_comment events, so we don't need it for pull_request events anyway.
…ck (llvm#200515) Jonas caught that I had a typo in the check for the `sizeof_mh_and_loadcmds` key in the `jGetLoadedDynamicLibrariesInfos` response from debugserver in DynamicLoaderDarwin. Fix that. Also, I originally picked the fallback size for the mach header + load commands as a guess. I've since looked at a large UI app's binaries and, based on the size of their actual mach header + load commands, picked a default that will read all the data needed in the majority of cases. rdar://178283767
…of inf (llvm#200291) [ExpandIRInsts] Fix sitofp/uitofp producing garbage instead of inf

s/uitofp of an integer larger than the max finite floating-point value should produce inf. This can't happen with e.g. an int32 -> float32 conversion, but can happen for e.g. int256 -> float32. Before this change we'd produce garbage.

Fixes llvm#189054.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#200097)

- https://github.com/llvm/llvm-project/security/code-scanning/1580
- https://github.com/llvm/llvm-project/security/code-scanning/1581
- https://github.com/llvm/llvm-project/security/code-scanning/1582
- https://github.com/llvm/llvm-project/security/code-scanning/1583
- https://github.com/llvm/llvm-project/security/code-scanning/1584
- https://github.com/llvm/llvm-project/security/code-scanning/1585
- https://github.com/llvm/llvm-project/security/code-scanning/1586
- https://github.com/llvm/llvm-project/security/code-scanning/1587
…lvm#199256) The existing setLastAccessAndModificationTime takes a file descriptor. Add a const Twine & overload that opens the path internally so callers no longer need to manage the fd themselves. The new overload accepts both files and directories: on POSIX, O_RDONLY opens directories and the existing fd-based implementation accepts a directory fd. On Windows, FILE_FLAG_BACKUP_SEMANTICS is required to obtain a handle for a directory. The path overload pair mirrors the existing (Twine &) / (int FD) shape used by setPermissions and resize_file.
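The path-to-descriptor pattern described above can be sketched in POSIX-only C++. `setTimesFD` is a hypothetical stand-in for the existing fd-based overload, not LLVM's actual function. The sketch does not cover the Windows `FILE_FLAG_BACKUP_SEMANTICS` branch.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/stat.h>
#include <system_error>
#include <time.h>
#include <unistd.h>

// Stand-in for an fd-based setter: futimens accepts a directory fd too.
std::error_code setTimesFD(int FD, timespec Access, timespec Modification) {
  timespec Times[2] = {Access, Modification};
  if (::futimens(FD, Times) != 0)
    return std::error_code(errno, std::generic_category());
  return {};
}

// Path overload: open the path (O_RDONLY also works for directories on
// POSIX), delegate to the fd overload, and always close the descriptor.
std::error_code setTimesPath(const char *Path, timespec Access,
                             timespec Modification) {
  int FD = ::open(Path, O_RDONLY);
  if (FD < 0)
    return std::error_code(errno, std::generic_category());
  std::error_code EC = setTimesFD(FD, Access, Modification);
  ::close(FD);
  return EC;
}
```

Opening inside the overload keeps descriptor management in one place. The fd-based function stays available for callers that already hold a descriptor.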
Cover the HIP `__hipRegisterVar` path in CIR and LLVM.
Introduced in llvm#200291. Exclude it for now while we work on a fix.
`copyPhysReg` selected `KMOVQkk_EVEX` for a `$k -> $k` VK16 copy on a `+egpr` (APX) subtarget even without BWI, but `KMOVQ` requires BWI. Use `KMOVW` instead. This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
The earlier fix in commit 8a0d145 (llvm#158744) only emitted a hard error for os_log arguments of record or complex type that took the VarArgKind::Valid / ValidInCXX11 path in checkFormatExpr. Arguments of non-trivial C++ class type (non-trivial copy/move ctor or dtor) instead take the VarArgKind::Undefined path, which only emitted the -Wnon-pod-varargs warning and let compilation proceed into CodeGen. There, emitBuiltinOSLogFormat passes the argument expression to EmitScalarExpr, which requires a scalar type. A non-trivial class argument is not a scalar, so CodeGen crashes (asserting in hasScalarEvaluationKind in assertions builds). Emit the hard error err_format_conversion_argument_type_mismatch on the Undefined path too, so compilation stops before CodeGen. rdar://174747930
llvm#200499) …tion The current wording of the hint is so long that it obscures the output of the command, which can be confusing. Shortening the message should bring the command output back to the center of attention.
`determineCalleeSaves` can run more than once, and as a result we were appending duplicate Zilsd GPRPair CSRs. Skip a pair if it is already present in the CSR set.
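The fix makes the append idempotent: running it twice must leave the same list as running it once. A minimal sketch of that pattern follows. `RegPair` and `addPairCSRs` are hypothetical; the real code works on LLVM's register and CSR data structures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical register-pair identifier (e.g. an even/odd GPR pair).
using RegPair = unsigned;

// Appends each candidate pair only if it is not already in the CSR list,
// so calling this more than once does not create duplicates.
void addPairCSRs(std::vector<RegPair> &CSRs, const std::vector<RegPair> &Pairs) {
  for (RegPair P : Pairs)
    if (std::find(CSRs.begin(), CSRs.end(), P) == CSRs.end())
      CSRs.push_back(P);
}
```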
…9257) When dsymutil rewrites an existing .dSYM bundle, only the inner DWARF file is replaced, and the bundle directory's mtime stays frozen at the time of the original build. macOS Spotlight's bundle re-import path keys off the bundle directory's mtime to decide whether the importer should re-run. With the mtime frozen, Spotlight keeps the previous build's UUID indexed forever, so DebugSymbols.framework's Spotlight lookup misses the new UUID. Bump the bundle directory's mtime explicitly at the end of a successful run, reusing the .dSYM extraction already used by the codesign path. rdar://177725866
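The "bump the bundle directory's mtime" step can be sketched with `std::filesystem::last_write_time`. This is not the LLVM API that dsymutil uses, and `bumpBundleMTime` is a hypothetical name.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// After the inner DWARF file has been rewritten, touch the bundle directory
// itself so that tools keyed on the directory's mtime notice the change.
std::error_code bumpBundleMTime(const fs::path &Bundle) {
  std::error_code EC;
  fs::last_write_time(Bundle, fs::file_time_type::clock::now(), EC);
  return EC;
}
```

Replacing a file inside `Contents/Resources/DWARF/` updates only that subdirectory's mtime, not the bundle root's. This is why an explicit touch is needed.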
They are unsupported and hopefully always will be.
…0529) These three tests pass when run against a remote-darwin platform backed by lldb-platform on device. Update each decorator to reflect where it's still expected to fail rather than blanket-XFAILing every remote run.

- `TestAssertMessages.test_createTestTarget`: was XFAIL on oslist=no_match(["linux"]) + remote=True. Add darwin_all to the no_match list so the XFAIL stays only on remote-windows / remote-freebsd / remote-netbsd / remote-android.
- `TestDebuggerAPI.test_CreateTarget_platform`: scope to non-Darwin remotes (bug llvm#92419 still tracks the underlying issue on those platforms).
- `TestObjcOptimized`: drop @expectedFailureAll(remote=True) from the test method and put @skipUnlessDarwin on the class. The Makefile depends on `-framework Foundation` and `-lobjc`, so the test cannot build on non-Darwin platforms; skip it there outright instead of pretending it could XFAIL.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…9396) To enable the new constant interpreter by default at configure time. I don't expect any distributions to set this for now but it's useful for testing and I think we need it eventually.
dpalermo approved these changes on May 30, 2026.
No description provided.