|
| 1 | +/* |
| 2 | + *********************************************************************************************************************** |
| 3 | + * |
| 4 | + * Copyright (c) 2024 Advanced Micro Devices, Inc. All Rights Reserved. |
| 5 | + * |
| 6 | + * Permission is hereby granted, free of charge, to any person obtaining a copy |
| 7 | + * of this software and associated documentation files (the "Software"), to |
| 8 | + * deal in the Software without restriction, including without limitation the |
| 9 | + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or |
| 10 | + * sell copies of the Software, and to permit persons to whom the Software is |
| 11 | + * furnished to do so, subject to the following conditions: |
| 12 | + * |
| 13 | + * The above copyright notice and this permission notice shall be included in all |
| 14 | + * copies or substantial portions of the Software. |
| 15 | + * |
| 16 | + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| 17 | + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| 18 | + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| 19 | + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| 20 | + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING |
| 21 | + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS |
| 22 | + * IN THE SOFTWARE. |
| 23 | + * |
| 24 | + **********************************************************************************************************************/ |
| 25 | +/** |
| 26 | + *********************************************************************************************************************** |
| 27 | + * @file ValueOriginTracking.h |
| 28 | + * @brief Helpers for tracking the byte-wise origin of SSA values. |
| 29 | + * |
| 30 | + * @details |
| 31 | + * Sometimes we are interested in the byte-wise contents of a value. |
| 32 | + * If the value is a constant, this can be determined with standard LLVM helpers like computeKnownBits, |
| 33 | + * but even if the value is dynamic it can be helpful to trace where these bytes come from. |
| 34 | + * |
| 35 | + * For instance, if some outgoing function arguments de-facto preserve incoming function arguments in the same argument |
| 36 | + * slot, then this information may be used to enable certain inter-procedural optimizations. |
| 37 | + * |
| 38 | + * This file provides helpers for such an analysis. |
| 39 | + * It can be thought of as splitting values into "slices" (e.g. bytes or dwords), and performing an analysis of where |
| 40 | + * these values come from, propagating through things like {insert,extract}{value,element}. |
| 41 | + * Using single-byte slices results in a potentially more accurate analysis, but has higher runtime cost. |
| 42 | + * For every value, the analysis works on the in-memory layout of its type, including padding, even though we analyze |
| 43 | + * only SSA values that might end up in registers. |
| 44 | + * It can be thought of as describing the memory obtained from storing a value to memory. |
| 45 | + * |
| 46 | + * In that sense, it is similar to how SROA splits up allocas into ranges, and analyzes each range separately. |
| 47 | + * However, we only track contents of SSA values, and do not propagate through memory, and thus generally |
| 48 | + * SROA should have been run beforehand to eliminate unnecessary memory operations. |
| 49 | + * |
| 50 | + * If the client code has extra information on the origin of some intermediate values that this analysis cannot reason |
| 51 | + * about, e.g. calls to special functions, or special loads, then it can provide this information in terms of |
| 52 | + * assumptions, which use the same format as the analysis result, mapping slices of a value to slices of other values or |
| 53 | + * constants. When analyzing a value with an assumption on it, the algorithm then applies the analysis result for |
| 54 | + * values referenced by assumptions, and propagates the result through following instructions. |
| 55 | + * |
| 56 | + * The analysis does not modify functions. However, as part of the analysis, additional constants may be created. |
| 57 | + * |
| 58 | + * The motivating application that we have implemented this for is propagating constant known arguments into the |
| 59 | + * Traversal shader in continuations-based ray tracing: |
| 60 | + * |
| 61 | + * The Traversal shader is enqueued by potentially multiple call sites in RayGen (RGS), Closest-Hit (CHS) or Miss (MS) |
| 62 | + * shaders. If all these call sites share some common constant arguments (e.g. on the ray payload), then we may |
| 63 | + * want to propagate these constants into the Traversal shader to reduce register pressure. |
| 64 | + * On these call sites, a simple analysis based on known constant values suffices. |
| 65 | + * |
| 66 | + * However, the Traversal shader is re-entrant, and may enqueue itself. Also, with Any-Hit (AHS) and/or Intersection |
| 67 | + * (IS) shaders in the pipeline, these shaders are enqueued by Traversal, and in turn re-enqueue Traversal. |
| 68 | + * |
| 69 | + * Thus, in order to prove that incoming arguments of the Traversal shader are known constants, we need to prove |
| 70 | + * that all TraceRay call sites share these constants, *and* that all functions that might re-enqueue Traversal |
| 71 | + * (Traversal itself, AHS, IS) preserve these arguments, or set them to the same constant. |
| 72 | + * |
| 73 | + * This analysis enables all of the above: it can prove that certain outgoing arguments at TraceRay call sites |
| 74 | + * have a specific constant value, and that outgoing arguments of Traversal/AHS/IS preserve the |
| 75 | + * corresponding incoming ones, or more precisely, that argument slots are preserved. |
| 76 | + * Because we track at a fine granularity (e.g. dwords), we might be able to prove that parts of a struct argument are |
| 77 | + * preserved even if some fields of it are changed. |
| 78 | + * |
| 79 | + *********************************************************************************************************************** |
| 80 | + */ |
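The argument in the file comment can be sketched as a small standalone check. This is a simplified model and not code from this change: `Kind`, `ArgSlice`, and `provenConstant` are hypothetical names, each `ArgSlice` stands for one dword of one argument slot, and a dword is assumed to be either a known constant, a pass-through of the incoming argument in the same slot, or unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one dword of an outgoing argument slot.
enum class Kind { Constant, PreservesIncoming, Unknown };

struct ArgSlice {
  Kind K;
  uint32_t Value; // only meaningful if K == Kind::Constant
};

// Returns true and sets Result if the incoming dword of Traversal is proven constant.
// TraceRaySites: the dword as passed at each TraceRay call site (RGS/CHS/MS).
// ReEnqueueSites: the dword as passed wherever Traversal/AHS/IS re-enqueue Traversal.
bool provenConstant(const std::vector<ArgSlice> &TraceRaySites,
                    const std::vector<ArgSlice> &ReEnqueueSites, uint32_t &Result) {
  if (TraceRaySites.empty())
    return false;
  // All TraceRay call sites must agree on one constant.
  for (const ArgSlice &S : TraceRaySites)
    if (S.K != Kind::Constant || S.Value != TraceRaySites.front().Value)
      return false;
  const uint32_t C = TraceRaySites.front().Value;
  // Re-enqueuing functions must preserve the incoming dword or set the same constant.
  for (const ArgSlice &S : ReEnqueueSites) {
    const bool Ok = S.K == Kind::PreservesIncoming || (S.K == Kind::Constant && S.Value == C);
    if (!Ok)
      return false;
  }
  Result = C;
  return true;
}
```

In this model, a single re-enqueue site that passes an unknown dword, or a different constant, is enough to defeat the proof for that dword, while other dwords of the same argument can still be proven independently.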
| 81 | + |
| 82 | +#pragma once |
| 83 | + |
| 84 | +#include <llvm/ADT/ArrayRef.h> |
| 85 | +#include <llvm/ADT/DenseMap.h> |
| 86 | +#include <llvm/ADT/SmallVector.h> |
| 87 | + |
| 88 | +namespace llvm { |
| 89 | +class raw_ostream; |
| 90 | +class Constant; |
| 91 | +class DataLayout; |
| 92 | +class Function; |
| 93 | +class Instruction; |
| 94 | +class Value; |
| 95 | +} // namespace llvm |
| 96 | + |
| 97 | +namespace CompilerUtils { |
| 98 | + |
| 99 | +namespace ValueTracking { |
| 100 | + |
| 101 | +// enum wrapper with some convenience helpers for common operations. |
| 102 | +// The contained value is a bitmask of statuses, and thus multiple statuses can be set. |
| 103 | +// In that case we know that at run time, one of the statuses holds, but we don't know which one. |
| 104 | +// This can occur with phi nodes and select instructions. |
| 105 | +// In the common cases, just a single bit is set though. |
| 106 | +struct SliceStatus { |
| 107 | + // As the actual enum is contained within the struct, its values don't leak into the containing namespace, |
| 108 | + // and it's not possible to implicitly cast a SliceStatus to an int, so it's as good as an enum class. |
| 109 | + enum StatusEnum : uint32_t { Constant = 0x1, Dynamic = 0x2, UndefOrPoison = 0x4 }; |
| 110 | + StatusEnum S = {}; |
| 111 | + |
| 112 | + SliceStatus(StatusEnum S) : S{S} {} |
| 113 | + |
| 114 | + static SliceStatus makeEmpty() { return static_cast<StatusEnum>(0); } |
| 115 | + |
| 116 | + // Returns whether all status bits set in other are also set in us. |
| 117 | + bool contains(SliceStatus Other) const { return (*this & Other) == Other; } |
| 118 | + |
| 119 | + // Returns whether no status bits are set. |
| 120 | + bool isEmpty() const { return static_cast<uint32_t>(S) == 0; } |
| 121 | + |
| 122 | + // Returns whether there is exactly one status bit set. Returns false for an empty status. |
| 123 | + bool isSingleStatus() const { |
| 124 | + auto AsInt = static_cast<uint32_t>(S); |
| 125 | + return (AsInt != 0) && (((AsInt - 1) & AsInt) == 0); |
| 126 | + } |
| 127 | + |
| 128 | + SliceStatus operator&(SliceStatus Other) const { return static_cast<StatusEnum>(S & Other.S); } |
| 129 | + |
| 130 | + SliceStatus operator|(SliceStatus Other) const { return static_cast<StatusEnum>(S | Other.S); } |
| 131 | + |
| 132 | + bool operator==(SliceStatus Other) const { return S == Other.S; } |
| 133 | + bool operator!=(SliceStatus Other) const { return !(S == Other.S); } |
| 134 | +}; |
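A minimal standalone sketch of how the bitmask behaves when two statuses are merged, as would happen for a phi or select. The struct is a trimmed copy of `SliceStatus` from the diff so that the example compiles without LLVM; `mergeForSelect` is a hypothetical helper introduced only for illustration and is not part of this change.

```cpp
#include <cassert>
#include <cstdint>

// Trimmed copy of SliceStatus from the diff, kept only so the example is self-contained.
struct SliceStatus {
  enum StatusEnum : uint32_t { Constant = 0x1, Dynamic = 0x2, UndefOrPoison = 0x4 };
  StatusEnum S = {};

  SliceStatus(StatusEnum S) : S{S} {}
  static SliceStatus makeEmpty() { return static_cast<StatusEnum>(0); }

  // True if every bit set in Other is also set in this status.
  bool contains(SliceStatus Other) const { return (*this & Other) == Other; }
  bool isEmpty() const { return static_cast<uint32_t>(S) == 0; }
  // True if exactly one bit is set (power-of-two test).
  bool isSingleStatus() const {
    auto AsInt = static_cast<uint32_t>(S);
    return (AsInt != 0) && (((AsInt - 1) & AsInt) == 0);
  }
  SliceStatus operator&(SliceStatus Other) const { return static_cast<StatusEnum>(S & Other.S); }
  SliceStatus operator|(SliceStatus Other) const { return static_cast<StatusEnum>(S | Other.S); }
  bool operator==(SliceStatus Other) const { return S == Other.S; }
};

// Hypothetical helper: a slice of `select %c, %a, %b` may have any status that
// the corresponding slice of %a or %b has, so the statuses are united.
SliceStatus mergeForSelect(SliceStatus A, SliceStatus B) {
  return A | B;
}
```

Merging `Constant` with `Dynamic` yields a status that contains both bits and is no longer a single status, which matches the "one of them holds, but we don't know which" reading in the comment.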
| 135 | + |
| 136 | +static constexpr unsigned MaxSliceSize = 4; // Needed for SliceInfo::ConstantValue |
| 137 | + |
| 138 | +// A slice consists of a consecutive sequence of bytes within the representation of a value. |
| 139 | +// We keep track of a potential constant value, and a potential dynamic value that determines |
| 140 | +// the byte representation of our slice. |
| 141 | +// If both dynamic and constant values are set, then one of them determines the byte representation |
| 142 | +// of our slice, but we don't know which. |
| 143 | +// If just a single value is set, then we know that that one determines us. |
| 144 | +// |
| 145 | +// Allowing both a dynamic and a constant value is intended to allow patterns where a value |
| 146 | +// is either a constant, or a passed-through argument. If the constant matches the values used |
| 147 | +// to initialize the incoming argument on the caller side, then we can still prove that the value |
| 148 | +// is in fact constant. |
| 149 | +// |
| 150 | +// If the bit width of a value is not a multiple of the slice size, the last slice contains |
| 151 | +// unspecified high bits. These are not guaranteed to be zeroed out. |
| 152 | +struct SliceInfo { |
| 153 | + SliceInfo(SliceStatus S) : Status{S} {} |
| 154 | + void print(llvm::raw_ostream &OS, bool Compact = false) const; |
| 155 | + |
| 156 | + // Enum-bitmask of possible status of the value. |
| 157 | + SliceStatus Status = SliceStatus::makeEmpty(); |
| 158 | + uint32_t ConstantValue = 0; |
| 159 | + static_assert(sizeof(ConstantValue) >= MaxSliceSize); |
| 160 | + // If set, the byte representation of this slice is obtained |
| 161 | + // from the given value at the given offset. |
| 162 | + llvm::Value *DynamicValue = nullptr; |
| 163 | + unsigned DynamicValueByteOffset = 0; |
| 164 | +}; |
| 165 | +llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const SliceInfo &SI); |
| 166 | + |
| 167 | +// Combines slice infos for a whole value, unless the value is too large, in which case it might be cut off. |
| 168 | +// It is up to client code to detect missing slice infos at the value tail if that is relevant, |
| 169 | +// e.g. in order to prove that all bytes in a value match some assumption. |
| 170 | +struct ValueInfo { |
| 171 | + void print(llvm::raw_ostream &OS, bool Compact = false) const; |
| 172 | + |
| 173 | + // Infos for the byte-wise representation of a value, partitioned into consecutive slices |
| 174 | + llvm::SmallVector<SliceInfo> Slices; |
| 175 | +}; |
| 176 | +llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const ValueInfo &VI); |
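The relation between a value's size, the slice size, and the truncation limit can be sketched as follows. This is an assumption about how the slice count might be derived, not the implementation in this change: `numTrackedSlices` and `coversWholeValue` are hypothetical helpers, and the rounding behavior is inferred from the comments about partial trailing slices and cut-off tails.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of slices tracked for a value whose in-memory
// size is SizeInBytes, assuming truncation to MaxBytesPerValue and rounding up.
unsigned numTrackedSlices(unsigned SizeInBytes, unsigned BytesPerSlice, unsigned MaxBytesPerValue) {
  assert(BytesPerSlice != 0 && "slice size must be non-zero");
  // Only a prefix of at most MaxBytesPerValue bytes is analyzed.
  const unsigned TrackedBytes = std::min(SizeInBytes, MaxBytesPerValue);
  // Round up: a trailing partial slice still gets an entry.
  return (TrackedBytes + BytesPerSlice - 1) / BytesPerSlice;
}

// Hypothetical client-side check: do NumSlices slices cover the whole value,
// or was the tail cut off?
bool coversWholeValue(unsigned NumSlices, unsigned SizeInBytes, unsigned BytesPerSlice) {
  return NumSlices * BytesPerSlice >= SizeInBytes;
}
```

A client that needs a statement about every byte of a value, for example to prove that a whole argument matches an assumption, would run a check like `coversWholeValue` before trusting the slice list.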
| 177 | + |
| 178 | +} // namespace ValueTracking |
| 179 | + |
| 180 | +// Utility class to track the origin of values, partitioned into slices of e.g. 1 or 4 bytes each. |
| 181 | +// See the documentation at the top of this file for details. |
| 182 | +// |
| 183 | +// The status of each slice is given by its SliceStatus. |
| 184 | +// If the size of a value exceeds MaxBytesPerValue, then only a prefix of that size is analyzed. |
| 185 | +// This ensures bounded runtime and memory consumption on pathological cases with huge values. |
| 186 | +// |
| 187 | +// This is intended to be used for interprocedural optimizations, detecting cases where arguments are initialized with a |
| 188 | +// constant and then always propagated, which allows replacing the argument with the initial constant. |
| 189 | +class ValueOriginTracker { |
| 190 | +public: |
| 191 | + using ValueInfo = ValueTracking::ValueInfo; |
| 192 | + // In some cases, client code has additional information on where values originate from, or |
| 193 | + // where they should be assumed to originate from just for the purpose of the analysis. |
| 194 | + // For instance, if a value is spilled and then re-loaded, value origin tracking |
| 195 | + // would consider the reloaded value as unknown dynamic, because it doesn't track memory. |
| 196 | + // Value origin assumptions allow the client to provide such extra information. |
| 197 | + // For each registered value, when the analysis reaches the given value, it will instead rely on the supplied |
| 198 | + // ValueInfo, and replace dynamic references by the analysis result for these dynamic values. |
| 199 | + // This means that when querying values for which assumptions were given, it is *not* ensured that |
| 200 | + // the exact assumptions are returned. |
| 201 | + // |
| 202 | + // Consider this example using dword slices: |
| 203 | + // %equals.3 = add i32 3, 0 |
| 204 | + // %unknown = call i32 @opaque() |
| 205 | + // %arr.0 = insertvalue [3 x i32] poison, i32 %equals.3, 0 |
| 206 | + // %arr.1 = insertvalue [3 x i32] %arr.0, i32 %unknown, 1 |
| 207 | + // %arr.stored = insertvalue [3 x i32] %arr.1, i32 %unknown, 2 |
| 208 | + // store [3 x i32] %arr.stored, ptr %ptr |
| 209 | + // %reloaded = load [3 x i32], ptr %ptr |
| 210 | + // We supply the assumption that the first two dwords of %reloaded are in fact the first two dwords of |
| 211 | + // %arr.stored, and that the third dword equals 7 (because we have some additional knowledge somehow). |
| 212 | + // Then, when querying %reloaded, the result will be: |
| 213 | + // * dword 0: constant: 0x3 (result of the add) |
| 214 | + // * dword 1: dynamic: %unknown (offset 0) |
| 215 | + // * dword 2: constant: 0x7 |
| 216 | + // |
| 217 | + // If only some slices are known, the other slices can fall back to pointing to the value itself. |
| 218 | + // For values with assumptions, we skip the analysis we'd perform otherwise, so adding assumptions can |
| 219 | + // lead to worse analysis results on values that can be analyzed. For now, however, this feature |
| 220 | + // is intended for values that are otherwise opaque. Support for merging with the standard analysis could be added. |
| 221 | + // |
| 222 | + // For now, only assumptions on instructions are supported. |
| 223 | + // The intended uses of this feature only require it for instructions, and support for non-instructions |
| 224 | + // is a bit more complicated but can be added if necessary. |
| 225 | + // Also, only a single status on assumptions is allowed. |
| 226 | + using ValueOriginAssumptions = llvm::DenseMap<llvm::Instruction *, ValueInfo>; |
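The `%reloaded` example from the comment can be modeled without LLVM to show how an assumption is resolved against existing analysis results. This is a sketch under stated assumptions: `ModelSlice`, `ModelInfo`, `makeConstant`, `makeDynamic`, and `resolveAssumption` are hypothetical stand-ins, values are identified by name instead of `llvm::Value *`, and each slice has exactly one status.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for SliceInfo, with values identified by name.
struct ModelSlice {
  bool IsConstant;
  uint32_t ConstantValue;   // valid if IsConstant
  std::string DynamicValue; // valid if !IsConstant
  unsigned ByteOffset;      // byte offset into DynamicValue
};
using ModelInfo = std::vector<ModelSlice>;

ModelSlice makeConstant(uint32_t V) { return ModelSlice{true, V, "", 0}; }
ModelSlice makeDynamic(const std::string &Name, unsigned Offset) {
  return ModelSlice{false, 0, Name, Offset};
}

// Resolves an assumption against already computed analysis results: every
// dynamic reference is replaced by the analysis result for the referenced slice.
ModelInfo resolveAssumption(const ModelInfo &Assumption,
                            const std::map<std::string, ModelInfo> &Analyzed,
                            unsigned BytesPerSlice) {
  assert(BytesPerSlice != 0 && "slice size must be non-zero");
  ModelInfo Result;
  for (const ModelSlice &S : Assumption) {
    if (S.IsConstant) {
      Result.push_back(S);
      continue;
    }
    const auto It = Analyzed.find(S.DynamicValue);
    const unsigned Index = S.ByteOffset / BytesPerSlice;
    // Simplification of this sketch: keep the reference unchanged if the
    // referenced value has no result, or the offset is unaligned or out of range.
    if (It == Analyzed.end() || S.ByteOffset % BytesPerSlice != 0 || Index >= It->second.size())
      Result.push_back(S);
    else
      Result.push_back(It->second[Index]);
  }
  return Result;
}
```

With `%arr.stored` analyzed as {constant 3, `%unknown`, `%unknown`} and the assumption {`%arr.stored`+0, `%arr.stored`+4, constant 7}, the resolved result is {constant 3, `%unknown`+0, constant 7}, matching the three dwords listed in the comment.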
| 227 | + |
| 228 | + ValueOriginTracker(const llvm::DataLayout &DL, unsigned BytesPerSlice = 4, unsigned MaxBytesPerValue = 512, |
| 229 | + ValueOriginAssumptions OriginAssumptions = ValueOriginAssumptions{}) |
| 230 | + : DL{DL}, BytesPerSlice{BytesPerSlice}, MaxBytesPerValue{MaxBytesPerValue}, |
| 231 | + OriginAssumptions(std::move(OriginAssumptions)) {} |
| 232 | + |
| 233 | + // Computes a value info for the given value. |
| 234 | + // If the value has been seen before, returns a cache hit from the ValueInfos map. |
| 235 | + // When querying multiple values within the same function, it is more efficient |
| 236 | + // to first run analyzeValues() on all of them together. |
| 237 | + ValueInfo getValueInfo(llvm::Value *V); |
| 238 | + |
| 239 | + // Analyze a set of values in bulk for efficiency. |
| 240 | + // Value analysis needs to process whole functions, so analyzing multiple values within the same |
| 241 | + // function allows a single pass to cover all of them. |
| 242 | + // The passed values don't have to be instructions, and don't have to be in the same function, |
| 243 | + // although there is no perf benefit in that case. |
| 244 | + // Values may contain duplicates. |
| 245 | + void analyzeValues(llvm::ArrayRef<llvm::Value *> Values); |
| 246 | + |
| 247 | +private: |
| 248 | + struct ValueInfoBuilder; |
| 249 | + const llvm::DataLayout &DL; |
| 250 | + unsigned BytesPerSlice = 0; |
| 251 | + unsigned MaxBytesPerValue = 0; |
| 252 | + ValueOriginAssumptions OriginAssumptions; |
| 253 | + llvm::DenseMap<llvm::Value *, ValueInfo> ValueInfos; |
| 254 | + |
| 255 | + // Analyze a value, creating a ValueInfo for it. |
| 256 | + // If V is an instruction, this assumes the ValueInfos of dependencies have |
| 257 | + // already been created. If some are missing, we assume cyclic dependencies and give up |
| 258 | + // on this value. |
| 259 | + ValueInfo computeValueInfo(llvm::Value *V); |
| 260 | + // Same as above, implementing constant analysis |
| 261 | + ValueInfo computeConstantValueInfo(ValueInfoBuilder &VIB, llvm::Constant *C); |
| 262 | + // Given an origin assumption, compute a value info that combines analysis results |
| 263 | + // of the values referenced by the assumption. |
| 264 | + ValueInfo computeValueInfoFromAssumption(ValueInfoBuilder &VIB, const ValueInfo &OriginAssumption); |
| 265 | + |
| 266 | + // Implementation function for analyzeValues(): |
| 267 | + // Ensures that the ValueInfos map contains an entry for V, computing a value info first if necessary. |
| 268 | + // Then, returns a reference to the value info object within the map. |
| 269 | + // The resulting reference is invalidated if ValueInfos is mutated. |
| 270 | + // Assumes that all values this depends on have already been analyzed, except for phi nodes, |
| 271 | + // which are handled pessimistically in case of loops. |
| 272 | + ValueInfo &getOrComputeValueInfo(llvm::Value *V, bool KnownToBeNew = false); |
| 273 | +}; |
| 274 | + |
| 275 | +} // namespace CompilerUtils |