This repository was archived by the owner on Sep 15, 2025. It is now read-only.

Commit 872ddfd

Update llpc from commit 2ad9102b
GfxRegHandler enhancements
lgcdis: Avoid createAsmStreamer deprecation warning
[Continuations] Merge max payload size metadata
[Continuations] Cleanup pre-llvmraytracing continuations leftovers
Remove static opcodes from tanh test
Update tests after LLVM update
[Continuations] Add llvmraytracing::PipelineState
Output semantic mask is decorated mistakenly
[Continuations] Move DialectContextAnalysis to util header
Fix assumptions in tryOptimizeWorkgroupId
[Continuations] Add test for RayGen cont-state free in persistent launch mode
[Continuations] Fixup await usage in tests, add assert
Add CompleteOp to lowerCpsOps function
Add a new pass for Scalar Replacement of Builtins
lgc : change ExeGraphRuntimeContext to ownedTheModule
Add AmdExtDeviceMemoryAcquire / Release to support device scope memory order acquire / release
Clean up unused newly created built-in global variables
Preparation for GPURT version 48 bump
[RayQuery] Set the workgroup size for the Mesh/Task
Refactor GS printing info
Refactor multi/single stream framework in copy shader
Add back GpurtGetRayStaticIdOp processing.
Print the supported gfxip
Add passRegistry for ScalarReplacementOfBuiltins
[llvmraytracing] Fix linking with dynamic libs
[llvmraytracing] Remove functions later in LgcCpsJumpInliner
Update llvm-dialects submodule
[CompilerUtils] Add ValueOriginTracking
[LLPC] Fix assertion in processInputPipeline
[CompilerUtils] Add ValueSpecializer
[LGC] More ShaderStage refactorings
lgc : move ADDR_SPACE_PayloadArray definition to lgcWgDialect.h
Remove '.checksum_value' from lit tests
Preserve nnan and nsz flags for gl_Position
Update and disable OpGroupNonUniformMax.comp
Fix a latent LDS allocation issue of NGG ES-GS ring
lgc: get sp3 comments back into pipeline dump
[CompilerUtils] Expose getCrossModuleName
Use overload of CreateIntrinsic with implicit mangling
Cleanup non-necessary trivial user-declared destructors
Fix atomic swap test after upstream change
[Continuations] Add CleanupContinuations Impl class
Remove unused lambda capture
PatchBufferOp: relax condition of s.buffer.load
[LGC] Remove nodetype matching
[LGC] Dynamically set CB_SHADER_MASK for dummy export
Split up gl_out array type
[RayQuery] Use gpurt new functions
Rename some classes and files in lower pass
[LGC] Use more generic type to search node
1 parent a00d749 commit 872ddfd

261 files changed

Lines changed: 7635 additions & 3360 deletions

cmake/continuations.cmake

Lines changed: 0 additions & 34 deletions
This file was deleted.

compilerutils/CMakeLists.txt

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@ add_llvm_library(LLVMCompilerUtils
     lib/DxilToLlvm.cpp
     lib/TypeLowering.cpp
     lib/TypesMetadata.cpp
+    lib/ValueOriginTracking.cpp
+    lib/ValueOriginTrackingTestPass.cpp
+    lib/ValueSpecialization.cpp
+    lib/ValueSpecializationTestPass.cpp
 
     DEPENDS
     intrinsics_gen

compilerutils/include/compilerutils/CompilerUtils.h

Lines changed: 2 additions & 0 deletions
@@ -118,6 +118,8 @@ class CrossModuleInliner {
   // target module.
   llvm::GlobalValue *findCopiedGlobal(llvm::GlobalValue &sourceGv, llvm::Module &targetModule);
 
+  static std::string getCrossModuleName(llvm::GlobalValue &gv);
+
 private:
   // Checks that we haven't processed a different target module earlier.
   void checkTargetModule(llvm::Module &targetModule) {

compilerutils/include/compilerutils/TypesMetadata.h

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ class TypedFuncTy {
   // Construct a TypedFuncTy for the given result type and arg types.
   // This constructs the !pointeetys metadata; that can then be attached to a function
   // using writeMetadata().
-  TypedFuncTy(TypedArgTy ResultTy, ArrayRef<TypedArgTy> ArgTys);
+  TypedFuncTy(TypedArgTy ResultTy, ArrayRef<TypedArgTy> ArgTys, bool IsVarArg = false);
 
   // Get a TypedFuncTy for the given Function, looking up the !pointeetys metadata.
   static TypedFuncTy get(const Function *F);
Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
+/*
+ ***********************************************************************************************************************
+ *
+ * Copyright (c) 2024 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ **********************************************************************************************************************/
+/**
+ ***********************************************************************************************************************
+ * @file ValueOriginTracking.h
+ * @brief Helpers for tracking the byte-wise origin of SSA values.
+ *
+ * @details
+ * Sometimes we are interested in the byte-wise contents of a value.
+ * If the value is a constant, this can be determined with standard LLVM helpers like computeKnownBits,
+ * but even if the value is dynamic it can be helpful to trace where these bytes come from.
+ *
+ * For instance, if some outgoing function arguments de-facto preserve incoming function arguments in the same argument
+ * slot, then this information may be used to enable certain inter-procedural optimizations.
+ *
+ * This file provides helpers for such an analysis.
+ * It can be thought of splitting values into "slices" (e.g. bytes or dwords), and performing an analysis of where
+ * these values come from, propagating through things like {insert,extract}{value,element}.
+ * Using single-byte slices results in a potentially more accurate analysis, but has higher runtime cost.
+ * For every value, the analysis works on the in-memory layout of its type, including padding, even though we analyze
+ * only SSA values that might end up in registers.
+ * It can be thought of as describing the memory obtained from storing a value to memory.
+ *
+ * In that sense, it is similar to how SROA splits up allocas into ranges, and analyses ranges separately.
+ * However, we only track contents of SSA values, and do not propagate through memory, and thus generally
+ * SROA should have been run before to eliminate non-necessary memory operations.
+ *
+ * If the client code has extra information on the origin of some intermediate values that this analysis cannot reason
+ * about, e.g. calls to special functions, or special loads, then it can provide this information in terms of
+ * assumptions, which use the same format as the analysis result, mapping slices of a value to slices of other values or
+ * constants. When analyzing a value with an assumption on it, the algorithm then applies the analysis result for
+ * values referenced by assumptions, and propagates the result through following instructions.
+ *
+ * The analysis does not modify functions, however, as part of the analysis, additional constants may be created.
+ *
+ * The motivating application that we have implemented this for is propagating constant known arguments into the
+ * Traversal shader in continuations-based ray tracing:
+ *
+ * The Traversal shader is enqueued by potentially multiple call sites in RayGen (RGS), Closest-Hit (CHS) or Miss (MS)
+ * shaders. If all these call sites share some common constant arguments (e.g. on the ray payload), then we may
+ * want to propagate these constants into the Traversal shader to reduce register pressure.
+ * On these call sites, a simple analysis based on known constant values suffices.
+ *
+ * However, the Traversal shader is re-entrant, and may enqueue itself. Also, with Any-Hit (AHS) and/or Intersection
+ * (IS) shaders in the pipeline, these shaders are enqueued by Traversal, which in turn re-enqueue Traversal.
+ *
+ * Thus, in order to prove that incoming arguments of the Traversal shader are known constants, we need to prove
+ * that all TraceRay call sites share these constants, *and* that all functions that might re-enqueue Traversal
+ * (Traversal itself, AHS, IS) preserve these arguments, or set it to the same constant.
+ *
+ * This analysis allows all of the above: It allows to prove that certain outgoing arguments at TraceRay call sites
+ * have a specific constant value, and allow to prove that outgoing arguments of Traversal/AHS/IS preserve the
+ * corresponding incoming ones, or more precisely, that argument slots are preserved.
+ * Because we track on a fine granularity (e.g. dwords), we might be able to prove that parts of a struct argument are
+ * preserved even if some fields of it are changed.
+ *
+ ***********************************************************************************************************************
+ */
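The file comment above describes splitting a value's in-memory representation into fixed-size slices. As a rough illustration of what that implies for the number of slices per value, here is a minimal sketch. `numTrackedSlices` is a hypothetical helper that does not appear in the commit; its defaults mirror the `BytesPerSlice = 4` and `MaxBytesPerValue = 512` constructor defaults declared later in this header, and the rounding-up behaviour is an assumption based on the comment that the last slice may contain unspecified high bits.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of ValueOriginTracking.h): estimates how many
// slices would describe a value of ValueBytes bytes, assuming that
//  * only a prefix of at most MaxBytesPerValue bytes is analyzed, and
//  * a partially filled trailing slice still counts as one slice.
inline unsigned numTrackedSlices(unsigned ValueBytes, unsigned BytesPerSlice = 4,
                                 unsigned MaxBytesPerValue = 512) {
  assert(BytesPerSlice > 0 && "slice size must be positive");
  unsigned TrackedBytes = std::min(ValueBytes, MaxBytesPerValue);
  // Round up so that a trailing partial slice is included.
  return (TrackedBytes + BytesPerSlice - 1) / BytesPerSlice;
}
```

Under these assumptions a `[3 x i32]` (12 bytes) yields three dword slices, and a 1024-byte value is cut off at 128 dword slices. The actual implementation in `ValueOriginTracking.cpp` is not shown on this page and may compute this differently.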
+
+#pragma once
+
+#include <llvm/ADT/ArrayRef.h>
+#include <llvm/ADT/DenseMap.h>
+#include <llvm/ADT/SmallVector.h>
+
+namespace llvm {
+class raw_ostream;
+class Constant;
+class DataLayout;
+class Function;
+class Instruction;
+class Value;
+} // namespace llvm
+
+namespace CompilerUtils {
+
+namespace ValueTracking {
+
+// enum wrapper with some convenience helpers for common operations.
+// The contained value is a bitmask of status, and thus multiple status can be set.
+// In that case we know that at run time, one of the status holds, but we don't know which one.
+// This can occur with phi nodes and select instructions.
+// In the common cases, just a single bit is set though.
+struct SliceStatus {
+  // As the actual enum is contained within the struct, its values don't leak into the containing namespace,
+  // and it's not possible to implicitly cast a SliceStatus to an int, so it's as good as an enum class.
+  enum StatusEnum : uint32_t { Constant = 0x1, Dynamic = 0x2, UndefOrPoison = 0x4 };
+  StatusEnum S = {};
+
+  SliceStatus(StatusEnum S) : S{S} {}
+
+  static SliceStatus makeEmpty() { return static_cast<StatusEnum>(0); }
+
+  // Returns whether all status bits set in other are also set in us.
+  bool contains(SliceStatus Other) const { return (*this & Other) == Other; }
+
+  // Returns whether no status bits are set.
+  bool isEmpty() const { return static_cast<uint32_t>(S) == 0; }
+
+  // Returns whether there is exactly one status bit set. Returns false for an empty status.
+  bool isSingleStatus() const {
+    auto AsInt = static_cast<uint32_t>(S);
+    return (AsInt != 0) && (((AsInt - 1) & AsInt) == 0);
+  }
+
+  SliceStatus operator&(SliceStatus Other) const { return static_cast<StatusEnum>(S & Other.S); }
+
+  SliceStatus operator|(SliceStatus Other) const { return static_cast<StatusEnum>(S | Other.S); }
+
+  bool operator==(SliceStatus Other) const { return S == Other.S; }
+  bool operator!=(SliceStatus Other) const { return !(S == Other.S); }
+};
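To make the bitmask semantics of `SliceStatus` concrete, the following standalone sketch copies the struct from the diff (so that it compiles without LLVM) and combines two statuses the way a `select` or phi node with a constant and a dynamic operand might. `mergeStatus` and `demoSliceStatus` are hypothetical names introduced only for this illustration; the commit's own merging logic lives in the `.cpp` file, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Standalone copy of SliceStatus from the diff above, for illustration only.
struct SliceStatus {
  enum StatusEnum : uint32_t { Constant = 0x1, Dynamic = 0x2, UndefOrPoison = 0x4 };
  StatusEnum S = {};

  SliceStatus(StatusEnum S) : S{S} {}
  static SliceStatus makeEmpty() { return static_cast<StatusEnum>(0); }

  bool contains(SliceStatus Other) const { return (*this & Other) == Other; }
  bool isEmpty() const { return static_cast<uint32_t>(S) == 0; }
  // Classic power-of-two test: exactly one bit set.
  bool isSingleStatus() const {
    auto AsInt = static_cast<uint32_t>(S);
    return (AsInt != 0) && (((AsInt - 1) & AsInt) == 0);
  }

  SliceStatus operator&(SliceStatus Other) const { return static_cast<StatusEnum>(S & Other.S); }
  SliceStatus operator|(SliceStatus Other) const { return static_cast<StatusEnum>(S | Other.S); }
  bool operator==(SliceStatus Other) const { return S == Other.S; }
};

// Hypothetical helper: the status of a slice that may come from either of two
// incoming values is assumed here to be the union of both statuses.
inline SliceStatus mergeStatus(SliceStatus A, SliceStatus B) { return A | B; }

inline void demoSliceStatus() {
  SliceStatus Merged = mergeStatus(SliceStatus::Constant, SliceStatus::Dynamic);
  assert(Merged.contains(SliceStatus::Constant)); // may be the constant ...
  assert(Merged.contains(SliceStatus::Dynamic));  // ... or the dynamic value
  assert(!Merged.isSingleStatus());               // but we don't know which one
}
```

A merged status with more than one bit set is what the struct comment describes as "one of the status holds, but we don't know which one".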
+
+static constexpr unsigned MaxSliceSize = 4; // Needed for SliceInfo::ConstantValue
+
+// A slice consists of a consecutive sequence of bytes within the representation of a value.
+// We keep track of a potential constant value, and a potential dynamic value that determines
+// the byte representation of our slice.
+// If both dynamic and constant values are set, then one of them determines the byte representation
+// of our slice, but we don't know which.
+// If just a single value is set, then we know that that one determines us.
+//
+// Allowing both a dynamic and a constant value is intended to allow patterns where a value
+// is either a constant, or a passed-through argument. If the constant matches the values used
+// to initialize the incoming argument on the caller side, then we can still prove that the value
+// is in fact constant.
+//
+// If the bit width of a value is not a multiple of the slice size, the last slice contains
+// unspecified high bits. These are not guaranteed to be zeroed out.
+struct SliceInfo {
+  SliceInfo(SliceStatus S) : Status{S} {}
+  void print(llvm::raw_ostream &OS, bool Compact = false) const;
+
+  // Enum-bitmask of possible status of the value.
+  SliceStatus Status = SliceStatus::makeEmpty();
+  uint32_t ConstantValue = 0;
+  static_assert(sizeof(ConstantValue) >= MaxSliceSize);
+  // If set, the byte representation of this slice is obtained
+  // from the given value at the given offset.
+  llvm::Value *DynamicValue = nullptr;
+  unsigned DynamicValueByteOffset = 0;
+};
+llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const SliceInfo &BI);
+
+// Combines slice infos for a whole value, unless the value is too large, in which case it might be cut off.
+// It is up to client code to detect missing slice infos at the value tail if that is relevant,
+// e.g. in order to prove that all bytes in a value match some assumption.
+struct ValueInfo {
+  void print(llvm::raw_ostream &OS, bool Compact = false) const;
+
+  // Infos for the byte-wise representation of a value, partitioned into consecutive slices
+  llvm::SmallVector<SliceInfo> Slices;
+};
+llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const ValueInfo &VI);
+
+} // namespace ValueTracking
+
+// Utility class to track the origin of values, partitioned into slices of e.g. 1 or 4 bytes each.
+// See the documentation at the top of this file for details.
+//
+// The status of each slice is given by its SliceStatus.
+// If the size of a value exceeds MaxBytesPerValue, then only a prefix of that size is analyzed.
+// This ensures bounded runtime and memory consumption on pathological cases with huge values.
+//
+// This is intended to be used for interprocedural optimizations, detecting cases where arguments are initialized with a
+// constant and then always propagated, allowing to replace the argument by the initial constant.
+class ValueOriginTracker {
+public:
+  using ValueInfo = ValueTracking::ValueInfo;
+  // In some cases, client code has additional information on where values originate from, or
+  // where they should be assumed to originate from just for the purpose of the analysis.
+  // For instance, if a value is spilled and then re-loaded, value origin tracking
+  // would consider the reloaded value as unknown dynamic, because it doesn't track memory.
+  // Value origin assumptions allow the client to provide such extra information.
+  // For each registered value, when the analysis reaches the given value, it will instead rely on the supplied
+  // ValueInfo, and replace dynamic references by the analysis result for these dynamic values.
+  // This means that when querying values for which assumptions were given, it is *not* ensured that
+  // the exact assumptions are returned.
+  //
+  // Consider this example using dword slices:
+  //   %equals.3 = add i32 3, 0
+  //   %unknown = call i32 @opaque()
+  //   %arr.0 = insertvalue [3 x i32] poison, i32 %equals.3, 0
+  //   %arr.1 = insertvalue [3 x i32] %arr.0, i32 %unknown, 1
+  //   %arr.stored = insertvalue [3 x i32] %arr.1, i32 %unknown, 2
+  //   store [3 x i32] %arr.stored, ptr %ptr
+  //   %reloaded = load [3 x i32], ptr %ptr
+  // We supply the assumption that the first two dwords of %reloaded are in fact the first two dwords of
+  // %arr.stored, and that the third dword equals 7 (because we have some additional knowledge somehow).
+  // Then, when querying %reloaded, the result will be:
+  //  * dword 0: constant: 0x3 (result of the add)
+  //  * dword 1: dynamic: %unknown (offset 0)
+  //  * dword 2: constant: 0x7
+  //
+  // If only some slices are known, the other slices can use the fallback of point to the value itself.
+  // For values with assumptions, we skip the analysis we'd perform otherwise, so adding assumptions can
+  // lead to worse analysis results on values that can be analyzed. For now, this feature however
+  // is intended for values that are otherwise opaque. Support for merging with the standard analysis could be added.
+  //
+  // For now, only assumptions on instructions are supported.
+  // The intended uses of this feature only require it for instructions, and support for non-instructions
+  // is a bit more complicated but can be added if necessary.
+  // Also, only a single status on assumptions is allowed.
+  using ValueOriginAssumptions = llvm::DenseMap<llvm::Instruction *, ValueInfo>;
+
+  ValueOriginTracker(const llvm::DataLayout &DL, unsigned BytesPerSlice = 4, unsigned MaxBytesPerValue = 512,
+                     ValueOriginAssumptions OriginAssumptions = ValueOriginAssumptions{})
+      : DL{DL}, BytesPerSlice{BytesPerSlice}, MaxBytesPerValue{MaxBytesPerValue},
+        OriginAssumptions(std::move(OriginAssumptions)) {}
+
+  // Computes a value info for the given value.
+  // If the value has been seen before, returns a cache hit from the ValueInfos map.
+  // When querying multiple values within the same functions, it is more efficient
+  // to first run analyzeValues() on all of them together.
+  ValueInfo getValueInfo(llvm::Value *V);
+
+  // Analyze a set of values in bulk for efficiency.
+  // Value analysis needs to process whole functions, so analysing multiple values within the same
+  // function allows to use a single pass for them all.
+  // The passed values don't have to be instructions, and don't have to be in the same functions,
+  // although there is no perf benefit in that case.
+  // Values may contain duplicates.
+  void analyzeValues(llvm::ArrayRef<llvm::Value *> Values);
+
+private:
+  struct ValueInfoBuilder;
+  const llvm::DataLayout &DL;
+  unsigned BytesPerSlice = 0;
+  unsigned MaxBytesPerValue = 0;
+  ValueOriginAssumptions OriginAssumptions;
+  llvm::DenseMap<llvm::Value *, ValueInfo> ValueInfos;
+
+  // Analyze a value, creating a ValueInfo for it.
+  // If V is an instruction, this assumes the ValueInfos of dependencies have
+  // already been created. If some miss, we assume cyclic dependencies and give up
+  // on this value.
+  ValueInfo computeValueInfo(llvm::Value *V);
+  // Same as above, implementing constant analysis
+  ValueInfo computeConstantValueInfo(ValueInfoBuilder &VIB, llvm::Constant *C);
+  // Given an origin assumption, compute a value info that combines analysis results
+  // of the values referenced by the assumption.
+  ValueInfo computeValueInfoFromAssumption(ValueInfoBuilder &VIB, const ValueInfo &OriginAssumption);
+
+  // Implementation function for analyzeValues():
+  // Ensures that the ValueInfos map contains an entry for V, by optionally computing a value info first.
+  // Then, return a reference to the value info object within the map.
+  // The resulting reference is invalidated if ValueInfos is mutated.
+  // Assumes that all values this depends on have already been analyzed, except for phi nodes,
+  // which are handled pessimistically in case of loops.
+  ValueInfo &getOrComputeValueInfo(llvm::Value *V, bool KnownToBeNew = false);
+};
+
+} // namespace CompilerUtils
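
The `%reloaded` example in the `ValueOriginAssumptions` comment can be modelled without LLVM. The sketch below is a toy model and not the commit's implementation: `ToySlice`, `ToyValueInfo`, `constantSlice`, `dynamicSlice`, `resolveAssumption` and `demoReloaded` are all hypothetical names, values are identified by strings instead of `llvm::Value *`, and each slice has exactly one status. It assumes that the analysis result for `%arr.stored` is already known, and shows how dynamic references in an assumption would be replaced by that result.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a slice: either a constant dword, or a reference to a
// byte offset within another (named) value.
struct ToySlice {
  bool IsConstant;
  uint32_t ConstantValue;
  std::string DynamicValue;   // only meaningful if !IsConstant
  unsigned DynamicByteOffset; // only meaningful if !IsConstant
};
using ToyValueInfo = std::vector<ToySlice>;

inline ToySlice constantSlice(uint32_t V) { return ToySlice{true, V, std::string(), 0}; }
inline ToySlice dynamicSlice(const std::string &Name, unsigned ByteOffset) {
  return ToySlice{false, 0, Name, ByteOffset};
}

// Replaces dynamic references in Assumption by the already-known analysis
// result of the referenced value. References that cannot be resolved
// (unknown value, unaligned or out-of-range offset) are kept unchanged.
inline ToyValueInfo resolveAssumption(const ToyValueInfo &Assumption,
                                      const std::map<std::string, ToyValueInfo> &Analyzed,
                                      unsigned BytesPerSlice = 4) {
  ToyValueInfo Result;
  for (const ToySlice &Slice : Assumption) {
    if (Slice.IsConstant) {
      Result.push_back(Slice);
      continue;
    }
    auto It = Analyzed.find(Slice.DynamicValue);
    unsigned Index = Slice.DynamicByteOffset / BytesPerSlice;
    bool Resolvable = It != Analyzed.end() && Slice.DynamicByteOffset % BytesPerSlice == 0 &&
                      Index < It->second.size();
    Result.push_back(Resolvable ? It->second[Index] : Slice);
  }
  return Result;
}

// Mirrors the example from the header comment, using dword slices.
inline ToyValueInfo demoReloaded() {
  std::map<std::string, ToyValueInfo> Analyzed;
  // Assumed analysis result for %arr.stored: {3, %unknown, %unknown}.
  Analyzed["%arr.stored"] = {constantSlice(3), dynamicSlice("%unknown", 0), dynamicSlice("%unknown", 0)};
  // Assumption on %reloaded: dwords 0 and 1 come from %arr.stored, dword 2 equals 7.
  ToyValueInfo Assumption = {dynamicSlice("%arr.stored", 0), dynamicSlice("%arr.stored", 4), constantSlice(7)};
  return resolveAssumption(Assumption, Analyzed);
}
```

With these inputs, `demoReloaded()` yields constant `3` for dword 0, a reference to `%unknown` at offset 0 for dword 1, and constant `7` for dword 2, matching the result listed in the header comment. This also illustrates why querying a value with an assumption does not necessarily return the assumption itself.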
