1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and are the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet two
41// requirements: avoid allocating LDS where it is not necessary, as that impacts
42// occupancy and may cause compilation to fail, and avoid imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
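//
// As an illustrative sketch (not taken from a real test; names and indices
// are invented), a non-kernel use of an LDS variable @v becomes roughly:
//   %id   = call i32 @llvm.amdgcn.lds.kernel.id()
//   %slot = getelementptr inbounds [K x [V x i32]], ptr addrspace(4)
//           @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 <index of @v>
//   %off  = load i32, ptr addrspace(4) %slot
//   %ptr  = inttoptr i32 %off to ptr addrspace(3)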
88//
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are lowered that way. The remaining
101// ones are lowered via table.
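//
// A small worked example (hypothetical kernels and variables): if kernels K0
// and K1 both indirectly use @a and @b, and only K0 indirectly uses @c, then
// hybrid lowering places @a (reachable from the most kernels) in the module
// struct at address zero, lowers @c with the kernel strategy, and lowers @b
// via the lookup table unless every kernel that needs @b already allocates
// the module struct, in which case @b joins the module struct at no extra
// LDS cost.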
102//
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
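//
// Illustrative sketch (hypothetical names): a kernel @k that reaches dynamic
// LDS gets a zero sized, maximally aligned marker variable, e.g.
//   @llvm.amdgcn.k.dynlds = external addrspace(3) global [0 x i8], align 8
// whose absolute_symbol metadata records the end of @k's static frame. The
// per-kernel addresses are gathered into llvm.amdgcn.dynlds.offset.table and
// non-kernel uses are rewritten to the same kernel-id indexed lookup used by
// the "table" strategy.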
125//
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// it. This pass essentially specialises such variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUMemoryUtils.h"
181#include "AMDGPUTargetMachine.h"
182#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
187#include "llvm/ADT/SetOperations.h"
188#include "llvm/Analysis/CallGraph.h"
189#include "llvm/Analysis/ScopedNoAliasAA.h"
190#include "llvm/CodeGen/TargetPassConfig.h"
191#include "llvm/IR/Constants.h"
192#include "llvm/IR/DerivedTypes.h"
193#include "llvm/IR/Dominators.h"
194#include "llvm/IR/IRBuilder.h"
195#include "llvm/IR/InlineAsm.h"
196#include "llvm/IR/Instructions.h"
197#include "llvm/IR/IntrinsicsAMDGPU.h"
198#include "llvm/IR/MDBuilder.h"
199#include "llvm/IR/ReplaceConstant.h"
200#include "llvm/InitializePasses.h"
201#include "llvm/Pass.h"
202#include "llvm/Support/CommandLine.h"
203#include "llvm/Support/Debug.h"
204#include "llvm/Support/Format.h"
205#include "llvm/Support/OptimizedStructLayout.h"
206#include "llvm/Support/raw_ostream.h"
207#include "llvm/Transforms/Utils/ModuleUtils.h"
209
210#include <vector>
211
212#include <cstdio>
213
214#define DEBUG_TYPE "amdgpu-lower-module-lds"
215
216using namespace llvm;
217using namespace AMDGPU;
218
219namespace {
220
221cl::opt<bool> SuperAlignLDSGlobals(
222 "amdgpu-super-align-lds-globals",
223 cl::desc("Increase alignment of LDS if it is not on align boundary"),
224 cl::init(true), cl::Hidden);
225
226enum class LoweringKind { module, table, kernel, hybrid };
227cl::opt<LoweringKind> LoweringKindLoc(
228 "amdgpu-lower-module-lds-strategy",
229 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
230 cl::init(LoweringKind::hybrid),
231 cl::values(
232 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
233 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
234 clEnumValN(
235 LoweringKind::kernel, "kernel",
236 "Lower variables reachable from one kernel, otherwise abort"),
237 clEnumValN(LoweringKind::hybrid, "hybrid",
238 "Lower via mixture of above strategies")));
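// The strategy is normally left at the default; for experiments it can be
// overridden on the command line, e.g. (illustrative)
//   opt -passes=amdgpu-lower-module-lds -amdgpu-lower-module-lds-strategy=table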
239
240template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
241 llvm::sort(V, [](const auto *L, const auto *R) {
242 return L->getName() < R->getName();
243 });
244 return {std::move(V)};
245}
246
247class AMDGPULowerModuleLDS {
248 const AMDGPUTargetMachine &TM;
249
250 static void
251 removeLocalVarsFromUsedLists(Module &M,
252 const DenseSet<GlobalVariable *> &LocalVars) {
253 // The verifier rejects used lists containing an inttoptr of a constant
254 // so remove the variables from these lists before replaceAllUsesWith
255 SmallPtrSet<Constant *, 8> LocalVarsSet;
256 for (GlobalVariable *LocalVar : LocalVars)
257 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
258
259 removeFromUsedLists(
260 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
261
262 for (GlobalVariable *LocalVar : LocalVars)
263 LocalVar->removeDeadConstantUsers();
264 }
265
266 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
267 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
268 // that might call a function which accesses a field within it. This is
269 // presently approximated to 'all kernels' if there are any such functions
270 // in the module. This implicit use is redefined as an explicit use here so
271 // that later passes, specifically PromoteAlloca, account for the required
272 // memory without any knowledge of this transform.
273
274 // An operand bundle on llvm.donothing works because the call instruction
275 // survives until after the last pass that needs to account for LDS. It is
276 // better than inline asm as the latter survives until the end of codegen. A
277 // totally robust solution would be a function with the same semantics as
278 // llvm.donothing that takes a pointer to the instance and is lowered to a
279 // no-op after LDS is allocated, but that is not presently necessary.
280
281 // This intrinsic is eliminated shortly before instruction selection. It
282 // does not suffice to indicate to ISel that a given global which is not
283 // immediately used by the kernel must still be allocated by it. An
284 // equivalent target specific intrinsic which lasts until immediately after
285 // codegen would suffice for that, but one would still need to ensure that
286 // the variables are allocated in the anticipated order.
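// For illustration (a sketch, not copied from a test), the marker emitted at
// the start of a kernel looks roughly like:
//   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
//     @llvm.amdgcn.module.lds) ]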
287 BasicBlock *Entry = &Func->getEntryBlock();
288 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
289
290 Function *Decl = Intrinsic::getOrInsertDeclaration(
291 Func->getParent(), Intrinsic::donothing, {});
292
293 Value *UseInstance[1] = {
294 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
295
296 Builder.CreateCall(
297 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
298 }
299
300public:
301 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
302
303 struct LDSVariableReplacement {
304 GlobalVariable *SGV = nullptr;
305 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
306 };
307
308 // remap from lds global to a constantexpr gep to where it has been moved to
309 // for each kernel
310 // an array with an element for each kernel containing where the corresponding
311 // variable was remapped to
312
313 static Constant *getAddressesOfVariablesInKernel(
314 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
315 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
316 // Create a ConstantArray containing the address of each Variable within the
317 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
318 // does not allocate it
319 // TODO: Drop the ptrtoint conversion
320
321 Type *I32 = Type::getInt32Ty(Ctx);
322
323 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
324
325 SmallVector<Constant *> Elements;
326 for (GlobalVariable *GV : Variables) {
327 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
328 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
329 auto *elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
330 Elements.push_back(elt);
331 } else {
332 Elements.push_back(PoisonValue::get(I32));
333 }
334 }
335 return ConstantArray::get(KernelOffsetsType, Elements);
336 }
337
338 static GlobalVariable *buildLookupTable(
339 Module &M, ArrayRef<GlobalVariable *> Variables,
340 ArrayRef<Function *> kernels,
341 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
342 if (Variables.empty()) {
343 return nullptr;
344 }
345 LLVMContext &Ctx = M.getContext();
346
347 const size_t NumberVariables = Variables.size();
348 const size_t NumberKernels = kernels.size();
349
350 ArrayType *KernelOffsetsType =
351 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
352
353 ArrayType *AllKernelsOffsetsType =
354 ArrayType::get(KernelOffsetsType, NumberKernels);
355
356 Constant *Missing = PoisonValue::get(KernelOffsetsType);
357 std::vector<Constant *> overallConstantExprElts(NumberKernels);
358 for (size_t i = 0; i < NumberKernels; i++) {
359 auto Replacement = KernelToReplacement.find(kernels[i]);
360 overallConstantExprElts[i] =
361 (Replacement == KernelToReplacement.end())
362 ? Missing
363 : getAddressesOfVariablesInKernel(
364 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
365 }
366
367 Constant *init =
368 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
369
370 return new GlobalVariable(
371 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
372 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
373 AMDGPUAS::CONSTANT_ADDRESS);
374 }
375
376 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
377 GlobalVariable *LookupTable,
378 GlobalVariable *GV, Use &U,
379 Value *OptionalIndex) {
380 // Table is a constant array of the same length as OrderedKernels
381 LLVMContext &Ctx = M.getContext();
382 Type *I32 = Type::getInt32Ty(Ctx);
383 auto *I = cast<Instruction>(U.getUser());
384
385 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
386
387 if (auto *Phi = dyn_cast<PHINode>(I)) {
388 BasicBlock *BB = Phi->getIncomingBlock(U);
389 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
390 } else {
391 Builder.SetInsertPoint(I);
392 }
393
394 SmallVector<Value *, 3> GEPIdx = {
395 ConstantInt::get(I32, 0),
396 tableKernelIndex,
397 };
398 if (OptionalIndex)
399 GEPIdx.push_back(OptionalIndex);
400
401 Value *Address = Builder.CreateInBoundsGEP(
402 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
403
404 Value *loaded = Builder.CreateLoad(I32, Address);
405
406 Value *replacement =
407 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
408
409 U.set(replacement);
410 }
411
412 void replaceUsesInInstructionsWithTableLookup(
413 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
414 GlobalVariable *LookupTable) {
415
416 LLVMContext &Ctx = M.getContext();
417 IRBuilder<> Builder(Ctx);
418 Type *I32 = Type::getInt32Ty(Ctx);
419
420 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
421 auto *GV = ModuleScopeVariables[Index];
422
423 for (Use &U : make_early_inc_range(GV->uses())) {
424 auto *I = dyn_cast<Instruction>(U.getUser());
425 if (!I)
426 continue;
427
428 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
429 ConstantInt::get(I32, Index));
430 }
431 }
432 }
433
434 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
435 Module &M, LDSUsesInfoTy &LDSUsesInfo,
436 DenseSet<GlobalVariable *> const &VariableSet) {
437
438 DenseSet<Function *> KernelSet;
439
440 if (VariableSet.empty())
441 return KernelSet;
442
443 for (Function &Func : M.functions()) {
444 if (Func.isDeclaration() || !isKernelLDS(&Func))
445 continue;
446 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
447 if (VariableSet.contains(GV)) {
448 KernelSet.insert(&Func);
449 break;
450 }
451 }
452 }
453
454 return KernelSet;
455 }
456
457 static GlobalVariable *
458 chooseBestVariableForModuleStrategy(const DataLayout &DL,
459 VariableFunctionMap &LDSVars) {
460 // Find the global variable with the most indirect uses from kernels
461
462 struct CandidateTy {
463 GlobalVariable *GV = nullptr;
464 size_t UserCount = 0;
465 size_t Size = 0;
466
467 CandidateTy() = default;
468
469 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
470 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
471
472 bool operator<(const CandidateTy &Other) const {
473 // Fewer users makes module scope variable less attractive
474 if (UserCount < Other.UserCount) {
475 return true;
476 }
477 if (UserCount > Other.UserCount) {
478 return false;
479 }
480
481 // Bigger makes module scope variable less attractive
482 if (Size < Other.Size) {
483 return false;
484 }
485
486 if (Size > Other.Size) {
487 return true;
488 }
489
490 // Arbitrary but consistent
491 return GV->getName() < Other.GV->getName();
492 }
493 };
494
495 CandidateTy MostUsed;
496
497 for (auto &K : LDSVars) {
498 GlobalVariable *GV = K.first;
499 if (K.second.size() <= 1) {
500 // A variable reachable by only one kernel is best lowered with kernel
501 // strategy
502 continue;
503 }
504 CandidateTy Candidate(
505 GV, K.second.size(),
506 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
507 if (MostUsed < Candidate)
508 MostUsed = Candidate;
509 }
510
511 return MostUsed.GV;
512 }
513
514 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
515 uint32_t Address) {
516 // Write the specified address into metadata where it can be retrieved by
517 // the assembler. Format is a half open range, [Address Address+1)
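// For example (illustrative), recording address 0 for a variable in the
// 32 bit LDS address space produces:
//   @llvm.amdgcn.module.lds = ... !absolute_symbol !0
//   !0 = !{i32 0, i32 1}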
518 LLVMContext &Ctx = M->getContext();
519 auto *IntTy =
520 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
521 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
522 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
523 GV->setMetadata(LLVMContext::MD_absolute_symbol,
524 MDNode::get(Ctx, {MinC, MaxC}));
525 }
526
527 DenseMap<Function *, Value *> tableKernelIndexCache;
528 Value *getTableLookupKernelIndex(Module &M, Function *F) {
529 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
530 // lowers to a read from a live in register. Emit it once in the entry
531 // block to spare deduplicating it later.
532 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
533 if (Inserted) {
534 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
535 IRBuilder<> Builder(&*InsertAt);
536
537 It->second = Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {});
538 }
539
540 return It->second;
541 }
542
543 static std::vector<Function *> assignLDSKernelIDToEachKernel(
544 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
545 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
546 // Associate kernels in the set with an arbitrary but reproducible order and
547 // annotate them with that order in metadata. This metadata is recognised by
548 // the backend and lowered to a SGPR which can be read from using
549 // amdgcn_lds_kernel_id.
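// For example (illustrative), the first kernel in the sorted order ends up
// with:
//   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !0 { ... }
//   !0 = !{i32 0}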
550
551 std::vector<Function *> OrderedKernels;
552 if (!KernelsThatAllocateTableLDS.empty() ||
553 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
554
555 for (Function &Func : M->functions()) {
556 if (Func.isDeclaration())
557 continue;
558 if (!isKernelLDS(&Func))
559 continue;
560
561 if (KernelsThatAllocateTableLDS.contains(&Func) ||
562 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
563 assert(Func.hasName()); // else fatal error earlier
564 OrderedKernels.push_back(&Func);
565 }
566 }
567
568 // Put them in an arbitrary but reproducible order
569 OrderedKernels = sortByName(std::move(OrderedKernels));
570
571 // Annotate the kernels with their order in this vector
572 LLVMContext &Ctx = M->getContext();
573 IRBuilder<> Builder(Ctx);
574
575 if (OrderedKernels.size() > UINT32_MAX) {
576 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
577 reportFatalUsageError("unimplemented LDS lowering for > 2**32 kernels");
578 }
579
580 for (size_t i = 0; i < OrderedKernels.size(); i++) {
581 Metadata *AttrMDArgs[1] = {
582 ConstantAsMetadata::get(Builder.getInt32(i)),
583 };
584 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
585 MDNode::get(Ctx, AttrMDArgs));
586 }
587 }
588 return OrderedKernels;
589 }
590
591 static void partitionVariablesIntoIndirectStrategies(
592 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
593 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
594 DenseSet<GlobalVariable *> &ModuleScopeVariables,
595 DenseSet<GlobalVariable *> &TableLookupVariables,
596 DenseSet<GlobalVariable *> &KernelAccessVariables,
597 DenseSet<GlobalVariable *> &DynamicVariables) {
598
599 GlobalVariable *HybridModuleRoot =
600 LoweringKindLoc != LoweringKind::hybrid
601 ? nullptr
602 : chooseBestVariableForModuleStrategy(
603 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
604
605 DenseSet<Function *> const EmptySet;
606 DenseSet<Function *> const &HybridModuleRootKernels =
607 HybridModuleRoot
608 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
609 : EmptySet;
610
611 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
612 // Each iteration of this loop assigns exactly one global variable to
613 // exactly one of the implementation strategies.
614
615 GlobalVariable *GV = K.first;
616 assert(AMDGPU::isLDSVariableToLower(*GV));
617 assert(K.second.size() != 0);
618
619 if (AMDGPU::isDynamicLDS(*GV)) {
620 DynamicVariables.insert(GV);
621 continue;
622 }
623
624 switch (LoweringKindLoc) {
625 case LoweringKind::module:
626 ModuleScopeVariables.insert(GV);
627 break;
628
629 case LoweringKind::table:
630 TableLookupVariables.insert(GV);
631 break;
632
633 case LoweringKind::kernel:
634 if (K.second.size() == 1) {
635 KernelAccessVariables.insert(GV);
636 } else {
637 // FIXME: This should use DiagnosticInfo
638 reportFatalUsageError(
639 "cannot lower LDS '" + GV->getName() +
640 "' to kernel access as it is reachable from multiple kernels");
641 }
642 break;
643
644 case LoweringKind::hybrid: {
645 if (GV == HybridModuleRoot) {
646 assert(K.second.size() != 1);
647 ModuleScopeVariables.insert(GV);
648 } else if (K.second.size() == 1) {
649 KernelAccessVariables.insert(GV);
650 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
651 ModuleScopeVariables.insert(GV);
652 } else {
653 TableLookupVariables.insert(GV);
654 }
655 break;
656 }
657 }
658 }
659
660 // All LDS variables accessed indirectly have now been partitioned into
661 // the distinct lowering strategies.
662 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
663 KernelAccessVariables.size() + DynamicVariables.size() ==
664 LDSToKernelsThatNeedToAccessItIndirectly.size());
665 }
666
667 static GlobalVariable *lowerModuleScopeStructVariables(
668 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
669 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
670 // Create a struct to hold the ModuleScopeVariables
671 // Replace all uses of those variables from non-kernel functions with the
672 // new struct instance Replace only the uses from kernel functions that will
673 // allocate this instance. That is a space optimisation - kernels that use a
674 // subset of the module scope struct and do not need to allocate it for
675 // indirect calls will only allocate the subset they use (they do so as part
676 // of the per-kernel lowering).
677 if (ModuleScopeVariables.empty()) {
678 return nullptr;
679 }
680
681 LLVMContext &Ctx = M.getContext();
682
683 LDSVariableReplacement ModuleScopeReplacement =
684 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
685 ModuleScopeVariables);
686
687 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
688 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
689 cast<Constant>(ModuleScopeReplacement.SGV),
690 PointerType::getUnqual(Ctx)))});
691
692 // module.lds will be allocated at zero in any kernel that allocates it
693 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
694
695 // historic
696 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
697
698 // Replace all uses of module scope variable from non-kernel functions
699 replaceLDSVariablesWithStruct(
700 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
701 Instruction *I = dyn_cast<Instruction>(U.getUser());
702 if (!I) {
703 return false;
704 }
705 Function *F = I->getFunction();
706 return !isKernelLDS(F);
707 });
708
709 // Replace uses of module scope variable from kernel functions that
710 // allocate the module scope variable, otherwise leave them unchanged
711 // Record on each kernel whether the module scope global is used by it
712
713 for (Function &Func : M.functions()) {
714 if (Func.isDeclaration() || !isKernelLDS(&Func))
715 continue;
716
717 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
718 replaceLDSVariablesWithStruct(
719 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
720 Instruction *I = dyn_cast<Instruction>(U.getUser());
721 if (!I) {
722 return false;
723 }
724 Function *F = I->getFunction();
725 return F == &Func;
726 });
727
728 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
729 }
730 }
731
732 return ModuleScopeReplacement.SGV;
733 }
734
735 static DenseMap<Function *, LDSVariableReplacement>
736 lowerKernelScopeStructVariables(
737 Module &M, LDSUsesInfoTy &LDSUsesInfo,
738 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
739 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
740 GlobalVariable *MaybeModuleScopeStruct) {
741
742 // Create a struct for each kernel for the non-module-scope variables.
743
744 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
745 for (Function &Func : M.functions()) {
746 if (Func.isDeclaration() || !isKernelLDS(&Func))
747 continue;
748
749 DenseSet<GlobalVariable *> KernelUsedVariables;
750 // Allocating variables that are used directly in this struct to get
751 // alignment aware allocation and predictable frame size.
752 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
753 if (!AMDGPU::isDynamicLDS(*v)) {
754 KernelUsedVariables.insert(v);
755 }
756 }
757
758 // Allocating variables that are accessed indirectly so that a lookup of
759 // this struct instance can find them from nested functions.
760 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
761 if (!AMDGPU::isDynamicLDS(*v)) {
762 KernelUsedVariables.insert(v);
763 }
764 }
765
766 // Variables allocated in module lds must all resolve to that struct,
767 // not to the per-kernel instance.
768 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
769 for (GlobalVariable *v : ModuleScopeVariables) {
770 KernelUsedVariables.erase(v);
771 }
772 }
773
774 if (KernelUsedVariables.empty()) {
775 // Either used no LDS, or the LDS it used was all in the module struct
776 // or dynamically sized
777 continue;
778 }
779
780 // The association between kernel function and LDS struct is done by
781 // symbol name, which only works if the function in question has a
782 // name. This is not expected to be a problem in practice, as kernels
783 // are called by name making anonymous ones (which are named by the
784 // backend) difficult to use. This does mean that llvm test cases need
785 // to name the kernels.
786 if (!Func.hasName()) {
787 reportFatalUsageError("anonymous kernels cannot use LDS variables");
788 }
789
790 std::string VarName =
791 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
792
793 auto Replacement =
794 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
795
796 // If any indirect uses, create a direct use to ensure allocation
797 // TODO: Simpler to unconditionally mark used but that regresses
798 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
799 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
800 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
801 !Accesses->second.empty())
802 markUsedByKernel(&Func, Replacement.SGV);
803
804 // remove preserves existing codegen
805 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
806 KernelToReplacement[&Func] = Replacement;
807
808 // Rewrite uses within kernel to the new struct
809 replaceLDSVariablesWithStruct(
810 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
811 Instruction *I = dyn_cast<Instruction>(U.getUser());
812 return I && I->getFunction() == &Func;
813 });
814 }
815 return KernelToReplacement;
816 }
817
818 static GlobalVariable *
819 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
820 Function *func) {
821 // Create a dynamic lds variable with a name associated with the passed
822 // function that has the maximum alignment of any dynamic lds variable
823 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
824 // allocation, possibly after alignment padding. The representative variable
825 // created here has the maximum alignment of any other dynamic variable
826 // reachable by that kernel. All dynamic LDS variables are allocated at the
827 // same address in each kernel in order to provide the documented aliasing
828 // semantics. Setting the alignment here allows this IR pass to accurately
829 // predict the exact constant at which it will be allocated.
830
831 assert(isKernelLDS(func));
832
833 LLVMContext &Ctx = M.getContext();
834 const DataLayout &DL = M.getDataLayout();
835 Align MaxDynamicAlignment(1);
836
837 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
838 if (AMDGPU::isDynamicLDS(*GV)) {
839 MaxDynamicAlignment =
840 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
841 }
842 };
843
844 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
845 UpdateMaxAlignment(GV);
846 }
847
848 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
849 UpdateMaxAlignment(GV);
850 }
851
852 assert(func->hasName()); // Checked by caller
853 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
854 GlobalVariable *N = new GlobalVariable(
855 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
856 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
857 false);
858 N->setAlignment(MaxDynamicAlignment);
859
861 return N;
862 }
863
864 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
865 Module &M, LDSUsesInfoTy &LDSUsesInfo,
866 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
867 DenseSet<GlobalVariable *> const &DynamicVariables,
868 std::vector<Function *> const &OrderedKernels) {
869 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
870 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
871 LLVMContext &Ctx = M.getContext();
872 IRBuilder<> Builder(Ctx);
873 Type *I32 = Type::getInt32Ty(Ctx);
874
875 std::vector<Constant *> newDynamicLDS;
876
877 // Table is built in the same order as OrderedKernels
878 for (auto &func : OrderedKernels) {
879
880 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
882 if (!func->hasName()) {
883 reportFatalUsageError("anonymous kernels cannot use LDS variables");
884 }
885
886 GlobalVariable *N =
887 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
888
889 KernelToCreatedDynamicLDS[func] = N;
890
891 markUsedByKernel(func, N);
892
893 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
894 auto *GEP = ConstantExpr::getGetElementPtr(
895 emptyCharArray, N, ConstantInt::get(I32, 0), true);
896 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
897 } else {
898 newDynamicLDS.push_back(PoisonValue::get(I32));
899 }
900 }
901 assert(OrderedKernels.size() == newDynamicLDS.size());
902
903 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
904 Constant *init = ConstantArray::get(t, newDynamicLDS);
905 GlobalVariable *table = new GlobalVariable(
906 M, t, true, GlobalValue::InternalLinkage, init,
907 "llvm.amdgcn.dynlds.offset.table", nullptr,
908 GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);
909
910 for (GlobalVariable *GV : DynamicVariables) {
911 for (Use &U : make_early_inc_range(GV->uses())) {
912 auto *I = dyn_cast<Instruction>(U.getUser());
913 if (!I)
914 continue;
915 if (isKernelLDS(I->getFunction()))
916 continue;
917
918 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
919 }
920 }
921 }
922 return KernelToCreatedDynamicLDS;
923 }
924
925 static GlobalVariable *uniquifyGVPerKernel(Module &M, GlobalVariable *GV,
926 Function *KF) {
927 bool NeedsReplacement = false;
928 for (Use &U : GV->uses()) {
929 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
930 Function *F = I->getFunction();
931 if (isKernelLDS(F) && F != KF) {
932 NeedsReplacement = true;
933 break;
934 }
935 }
936 }
937 if (!NeedsReplacement)
938 return GV;
939 // Create a new GV used only by this kernel and its function
940 GlobalVariable *NewGV = new GlobalVariable(
941 M, GV->getValueType(), GV->isConstant(), GV->getLinkage(),
942 GV->getInitializer(), GV->getName() + "." + KF->getName(), nullptr,
943 GV->getThreadLocalMode(), GV->getType()->getAddressSpace());
944 NewGV->copyAttributesFrom(GV);
945 for (Use &U : make_early_inc_range(GV->uses())) {
946 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
947 Function *F = I->getFunction();
948 if (!isKernelLDS(F) || F == KF) {
949 U.getUser()->replaceUsesOfWith(GV, NewGV);
950 }
951 }
952 }
953 return NewGV;
954 }
955
956 bool lowerSpecialLDSVariables(
957 Module &M, LDSUsesInfoTy &LDSUsesInfo,
958 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly) {
959 bool Changed = false;
960 const DataLayout &DL = M.getDataLayout();
961 // The 1st round: give module-absolute assignments
962 int NumAbsolutes = 0;
963 std::vector<GlobalVariable *> OrderedGVs;
964 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
965 GlobalVariable *GV = K.first;
966 if (!isNamedBarrier(*GV))
967 continue;
968 // give a module-absolute assignment if it is indirectly accessed by
969 // multiple kernels. This is not precise, but we don't want to duplicate
970 // a function when it is called by multiple kernels.
971 if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
972 OrderedGVs.push_back(GV);
973 } else {
974 // leave it to the 2nd round, which will give a kernel-relative
975 // assignment if it is only indirectly accessed by one kernel
976 LDSUsesInfo.direct_access[*K.second.begin()].insert(GV);
977 }
978 LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);
979 }
980 OrderedGVs = sortByName(std::move(OrderedGVs));
981 for (GlobalVariable *GV : OrderedGVs) {
982 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
983 unsigned BarId = NumAbsolutes + 1;
984 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
985 NumAbsolutes += BarCnt;
986
987 // 4 bits for alignment, 5 bits for the barrier num,
988 // 3 bits for the barrier scope
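// For example (illustrative): with BarId == 1 the low bits contribute
// 1 << 4 == 0x10, so the recorded address is 0x802010 plus the scope field
// (BarrierScope << 9).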
989 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
990 recordLDSAbsoluteAddress(&M, GV, Offset);
991 }
992 OrderedGVs.clear();
993
994 // The 2nd round: give a kernel-relative assignment to each GV that is
995 // either only indirectly accessed by a single kernel or only directly
996 // accessed by multiple kernels.
997 std::vector<Function *> OrderedKernels;
998 for (auto &K : LDSUsesInfo.direct_access) {
999 Function *F = K.first;
1001 OrderedKernels.push_back(F);
1002 }
1003 OrderedKernels = sortByName(std::move(OrderedKernels));
1004
1005 DenseMap<Function *, unsigned> Kernel2BarId;
1006 for (Function *F : OrderedKernels) {
1007 for (GlobalVariable *GV : LDSUsesInfo.direct_access[F]) {
1008 if (!isNamedBarrier(*GV))
1009 continue;
1010
1011 LDSUsesInfo.direct_access[F].erase(GV);
1012 if (GV->isAbsoluteSymbolRef()) {
1013 // already assigned
1014 continue;
1015 }
1016 OrderedGVs.push_back(GV);
1017 }
1018 OrderedGVs = sortByName(std::move(OrderedGVs));
1019 for (GlobalVariable *GV : OrderedGVs) {
1020 // GV could also be used directly by other kernels. If so, we need to
1021 // create a new GV used only by this kernel and its function.
1022 auto NewGV = uniquifyGVPerKernel(M, GV, F);
1023 Changed |= (NewGV != GV);
1024 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
1025 unsigned BarId = Kernel2BarId[F];
1026 BarId += NumAbsolutes + 1;
1027 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
1028 Kernel2BarId[F] += BarCnt;
1029 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
1030 recordLDSAbsoluteAddress(&M, NewGV, Offset);
1031 }
1032 OrderedGVs.clear();
1033 }
1034 // Also erase those special LDS variables from indirect_access.
1035 for (auto &K : LDSUsesInfo.indirect_access) {
1036 assert(isKernelLDS(K.first));
1037 for (GlobalVariable *GV : K.second) {
1038 if (isNamedBarrier(*GV))
1039 K.second.erase(GV);
1040 }
1041 }
1042 return Changed;
1043 }
1044
1045 bool runOnModule(Module &M) {
1046 CallGraph CG = CallGraph(M);
1047 bool Changed = superAlignLDSGlobals(M);
1048
1050
1051 Changed = true; // todo: narrow this down
1052
1053 // For each kernel, what variables does it access directly or through
1054 // callees
1055 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1056
1057 // For each variable accessed through callees, which kernels access it
1058 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1059 for (auto &K : LDSUsesInfo.indirect_access) {
1060 Function *F = K.first;
1061 assert(isKernelLDS(F));
1062 for (GlobalVariable *GV : K.second) {
1063 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1064 }
1065 }
1066
1067 if (LDSUsesInfo.HasSpecialGVs) {
1068 // Special LDS variables need special address assignment
1069 Changed |= lowerSpecialLDSVariables(
1070 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
1071 }
1072
1073 // Partition variables accessed indirectly into the different strategies
1074 DenseSet<GlobalVariable *> ModuleScopeVariables;
1075 DenseSet<GlobalVariable *> TableLookupVariables;
1076 DenseSet<GlobalVariable *> KernelAccessVariables;
1077 DenseSet<GlobalVariable *> DynamicVariables;
1078 partitionVariablesIntoIndirectStrategies(
1079 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1080 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1081 DynamicVariables);
1082
1083 // If the kernel accesses a variable that is going to be stored in the
1084 // module instance through a call then that kernel needs to allocate the
1085 // module instance
1086 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1087 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1088 ModuleScopeVariables);
1089 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1090 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1091 TableLookupVariables);
1092
1093 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1094 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1095 DynamicVariables);
1096
1097 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1098 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1099
1100 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1101 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1102 KernelsThatAllocateModuleLDS,
1103 MaybeModuleScopeStruct);
1104
1105 // Lower zero cost accesses to the kernel instances just created
1106 for (auto &GV : KernelAccessVariables) {
1107 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1108 assert(funcs.size() == 1); // Only one kernel can access it
1109 LDSVariableReplacement Replacement =
1110 KernelToReplacement[*(funcs.begin())];
1111
1112 DenseSet<GlobalVariable *> Vec;
1113 Vec.insert(GV);
1114
1115 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1116 return isa<Instruction>(U.getUser());
1117 });
1118 }
1119
1120 // The ith element of this vector is kernel id i
1121 std::vector<Function *> OrderedKernels =
1122 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1123 KernelsThatIndirectlyAllocateDynamicLDS);
1124
1125 if (!KernelsThatAllocateTableLDS.empty()) {
1126 LLVMContext &Ctx = M.getContext();
1127 IRBuilder<> Builder(Ctx);
1128
1129 // The order must be consistent between lookup table and accesses to
1130 // lookup table
1131 auto TableLookupVariablesOrdered =
1132 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1133 TableLookupVariables.end()));
1134
1135 GlobalVariable *LookupTable = buildLookupTable(
1136 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1137 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1138 LookupTable);
1139 }
1140
1141 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1142 lowerDynamicLDSVariables(M, LDSUsesInfo,
1143 KernelsThatIndirectlyAllocateDynamicLDS,
1144 DynamicVariables, OrderedKernels);
1145
1146 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1147 // kernel. We may have inferred this wasn't used prior to the pass.
1148 // TODO: We could filter out subgraphs that do not access LDS globals.
1149 for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
1150 &KernelsThatAllocateTableLDS})
1151 for (Function *F : *KernelSet)
1152 removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
1153
1154 // All kernel frames have been allocated. Calculate and record the
1155 // addresses.
1156 {
1157 const DataLayout &DL = M.getDataLayout();
1158
1159 for (Function &Func : M.functions()) {
1160 if (Func.isDeclaration() || !isKernelLDS(&Func))
1161 continue;
1162
1163 // All three of these are optional. The first variable is allocated at
1164 // zero. They are allocated by AMDGPUMachineFunction as one block.
1165 // Layout:
1166 //{
1167 // module.lds
1168 // alignment padding
1169 // kernel instance
1170 // alignment padding
1171 // dynamic lds variables
1172 //}
1173
1174 const bool AllocateModuleScopeStruct =
1175 MaybeModuleScopeStruct &&
1176 KernelsThatAllocateModuleLDS.contains(&Func);
1177
1178 auto Replacement = KernelToReplacement.find(&Func);
1179 const bool AllocateKernelScopeStruct =
1180 Replacement != KernelToReplacement.end();
1181
1182 const bool AllocateDynamicVariable =
1183 KernelToCreatedDynamicLDS.contains(&Func);
1184
1185 uint32_t Offset = 0;
1186
1187 if (AllocateModuleScopeStruct) {
1188 // Allocated at zero, recorded once on construction, not once per
1189 // kernel
1190 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1191 }
1192
1193 if (AllocateKernelScopeStruct) {
1194 GlobalVariable *KernelStruct = Replacement->second.SGV;
1195 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1196 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1197 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1198 }
1199
1200 // If there is dynamic allocation, the alignment needed is included in
1201 // the static frame size. There may be no reference to the dynamic
1202 // variable in the kernel itself, so without including it here, that
1203 // alignment padding could be missed.
1204 if (AllocateDynamicVariable) {
1205 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1206 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1207 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1208 }
1209
1210 if (Offset != 0) {
1211 (void)TM; // TODO: Account for target maximum LDS
1212 std::string Buffer;
1213 raw_string_ostream SS{Buffer};
1214 SS << format("%u", Offset);
1215
1216 // Instead of explicitly marking kernels that access dynamic variables
1217 // using special case metadata, annotate with min-lds == max-lds, i.e.
1218 // that there is no more space available for allocating more static
1219 // LDS variables. That is the right condition to prevent allocating
1220 // more variables which would collide with the addresses assigned to
1221 // dynamic variables.
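// For example (illustrative), a kernel with a 400 byte static frame would
// get "amdgpu-lds-size"="400", or "400,400" if it also allocates dynamic
// LDS.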
1222 if (AllocateDynamicVariable)
1223 SS << format(",%u", Offset);
1224
1225 Func.addFnAttr("amdgpu-lds-size", Buffer);
1226 }
1227 }
1228 }
1229
1230 for (auto &GV : make_early_inc_range(M.globals()))
1231 if (AMDGPU::isLDSVariableToLower(GV)) {
1232 // probably want to remove from used lists
1233 GV.removeDeadConstantUsers();
1234 if (GV.use_empty())
1235 GV.eraseFromParent();
1236 }
1237
1238 return Changed;
1239 }
1240
1241private:
1242 // Increase the alignment of LDS globals if necessary to maximise the chance
1243 // that we can use aligned LDS instructions to access them.
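// For example (illustrative), a 12 byte LDS variable declared with align 4
// is raised to align 16 below so that b96/b128 accesses remain possible.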
1244 static bool superAlignLDSGlobals(Module &M) {
1245 const DataLayout &DL = M.getDataLayout();
1246 bool Changed = false;
1247 if (!SuperAlignLDSGlobals) {
1248 return Changed;
1249 }
1250
1251 for (auto &GV : M.globals()) {
1252 if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
1253 // Only changing alignment of LDS variables
1254 continue;
1255 }
1256 if (!GV.hasInitializer()) {
1257 // cuda/hip extern __shared__ variable, leave alignment alone
1258 continue;
1259 }
1260
1261 if (GV.isAbsoluteSymbolRef()) {
1262 // If the variable is already allocated, don't change the alignment
1263 continue;
1264 }
1265
1266 Align Alignment = AMDGPU::getAlign(DL, &GV);
1267 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1268
1269 if (GVSize > 8) {
1270 // We might want to use a b96 or b128 load/store
1271 Alignment = std::max(Alignment, Align(16));
1272 } else if (GVSize > 4) {
1273 // We might want to use a b64 load/store
1274 Alignment = std::max(Alignment, Align(8));
1275 } else if (GVSize > 2) {
1276 // We might want to use a b32 load/store
1277 Alignment = std::max(Alignment, Align(4));
1278 } else if (GVSize > 1) {
1279 // We might want to use a b16 load/store
1280 Alignment = std::max(Alignment, Align(2));
1281 }
1282
1283 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1284 Changed = true;
1285 GV.setAlignment(Alignment);
1286 }
1287 }
1288 return Changed;
1289 }
1290
1291 static LDSVariableReplacement createLDSVariableReplacement(
1292 Module &M, std::string VarName,
1293 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1294 // Create a struct instance containing LDSVarsToTransform and map from those
1295 // variables to ConstantExprGEP
1296 // Variables may be introduced to meet alignment requirements. No aliasing
1297 // metadata is useful for these as they have no uses. Erased before return.
1298
1299 LLVMContext &Ctx = M.getContext();
1300 const DataLayout &DL = M.getDataLayout();
1301 assert(!LDSVarsToTransform.empty());
1302
1303 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1304 LayoutFields.reserve(LDSVarsToTransform.size());
1305 {
1306 // The order of fields in this struct depends on the order of
1307 // variables in the argument which varies when changing how they
1308 // are identified, leading to spurious test breakage.
1309 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1310 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1311
1312 for (GlobalVariable *GV : Sorted) {
1313 OptimizedStructLayoutField F(GV,
1314 DL.getTypeAllocSize(GV->getValueType()),
1315 AMDGPU::getAlign(DL, GV));
1316 LayoutFields.emplace_back(F);
1317 }
1318 }
1319
1320 performOptimizedStructLayout(LayoutFields);
1321
1322 std::vector<GlobalVariable *> LocalVars;
1323 BitVector IsPaddingField;
1324 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1325 IsPaddingField.reserve(LDSVarsToTransform.size());
1326 {
1327 uint64_t CurrentOffset = 0;
1328 for (auto &F : LayoutFields) {
1329 GlobalVariable *FGV =
1330 static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
1331 Align DataAlign = F.Alignment;
1332
1333 uint64_t DataAlignV = DataAlign.value();
1334 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1335 uint64_t Padding = DataAlignV - Rem;
1336
1337 // Append an array of padding bytes to meet alignment requested
1338 // Note (o + (a - (o % a)) ) % a == 0
1339 // (offset + Padding ) % align == 0
1340
1341 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1342 LocalVars.push_back(new GlobalVariable(
1343 M, ATy, false, GlobalValue::InternalLinkage,
1344 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1345 AMDGPUAS::LOCAL_ADDRESS, false));
1346 IsPaddingField.push_back(true);
1347 CurrentOffset += Padding;
1348 }
1349
1350 LocalVars.push_back(FGV);
1351 IsPaddingField.push_back(false);
1352 CurrentOffset += F.Size;
1353 }
1354 }
1355
1356 std::vector<Type *> LocalVarTypes;
1357 LocalVarTypes.reserve(LocalVars.size());
1358 std::transform(
1359 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1360 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1361
1362 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1363
1364 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1365
1366 GlobalVariable *SGV = new GlobalVariable(
1367 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1368 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1369 false);
1370 SGV->setAlignment(StructAlign);
1371
1372 DenseMap<GlobalVariable *, Constant *> Map;
1373 Type *I32 = Type::getInt32Ty(Ctx);
1374 for (size_t I = 0; I < LocalVars.size(); I++) {
1375 GlobalVariable *GV = LocalVars[I];
1376 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1377 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1378 if (IsPaddingField[I]) {
1379 assert(GV->use_empty());
1380 GV->eraseFromParent();
1381 } else {
1382 Map[GV] = GEP;
1383 }
1384 }
1385 assert(Map.size() == LDSVarsToTransform.size());
1386 return {SGV, std::move(Map)};
1387 }
1388
1389 template <typename PredicateTy>
1390 static void replaceLDSVariablesWithStruct(
1391 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1392 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1393 LLVMContext &Ctx = M.getContext();
1394 const DataLayout &DL = M.getDataLayout();
1395
1396 // A hack... we need to insert the aliasing info in a predictable order for
1397 // lit tests. Would like to have them in a stable order already, ideally the
1398 // same order they get allocated, which might mean an ordered set container
1399 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1400 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1401
1402 // Create alias.scope and their lists. Each field in the new structure
1403 // does not alias with all other fields.
1404 SmallVector<MDNode *> AliasScopes;
1405 SmallVector<Metadata *> NoAliasList;
1406 const size_t NumberVars = LDSVarsToTransform.size();
1407 if (NumberVars > 1) {
1408 MDBuilder MDB(Ctx);
1409 AliasScopes.reserve(NumberVars);
1410 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1411 for (size_t I = 0; I < NumberVars; I++) {
1412 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1413 AliasScopes.push_back(Scope);
1414 }
1415 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1416 }
1417
1418 // Replace uses of ith variable with a constantexpr to the corresponding
1419 // field of the instance that will be allocated by AMDGPUMachineFunction
1420 for (size_t I = 0; I < NumberVars; I++) {
1421 GlobalVariable *GV = LDSVarsToTransform[I];
1422 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1423
1424 GV->replaceUsesWithIf(GEP, Predicate);
1425
1426 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1427 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1428 uint64_t Offset = APOff.getZExtValue();
1429
1430 Align A =
1431 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1432
1433 if (I)
1434 NoAliasList[I - 1] = AliasScopes[I - 1];
1435 MDNode *NoAlias =
1436 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1437 MDNode *AliasScope =
1438 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1439
1440 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1441 }
1442 }
1443
1444 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1445 const DataLayout &DL, MDNode *AliasScope,
1446 MDNode *NoAlias, unsigned MaxDepth = 5) {
1447 if (!MaxDepth || (A == 1 && !AliasScope))
1448 return;
1449
1450 ScopedNoAliasAAResult ScopedNoAlias;
1451
1452 for (User *U : Ptr->users()) {
1453 if (auto *I = dyn_cast<Instruction>(U)) {
1454 if (AliasScope && I->mayReadOrWriteMemory()) {
1455 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1456 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1457 : AliasScope);
1458 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1459
1460 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1461
1462 // Scoped aliases can originate from two different domains.
1463 // First domain would be from LDS domain (created by this pass).
1464 // All entries (LDS vars) into LDS struct will have same domain.
1465
1466 // Second domain could be existing scoped aliases that are the
1467 // results of noalias params and subsequent optimizations that
1468 // may alter these sets.
1469
1470 // We need to be careful how we create new alias sets, and
1471 // have right scopes and domains for loads/stores of these new
1472 // LDS variables. We intersect NoAlias set if alias sets belong
1473 // to the same domain. This is the case if we have memcpy using
1474 // LDS variables. Both src and dst of memcpy would belong to
1475 // LDS struct, they do not alias.
1476 // On the other hand, if one of the domains is LDS and other is
1477 // existing domain prior to LDS, we need to have a union of all
1478 // these aliases set to preserve existing aliasing information.
1479
1480 SmallPtrSet<const MDNode *, 16> ExistingDomains, LDSDomains;
1481 ScopedNoAlias.collectScopedDomains(NA, ExistingDomains);
1482 ScopedNoAlias.collectScopedDomains(NoAlias, LDSDomains);
1483 auto Intersection = set_intersection(ExistingDomains, LDSDomains);
1484 if (Intersection.empty()) {
1485 NA = NA ? MDNode::concatenate(NA, NoAlias) : NoAlias;
1486 } else {
1487 NA = NA ? MDNode::intersect(NA, NoAlias) : NoAlias;
1488 }
1489 I->setMetadata(LLVMContext::MD_noalias, NA);
1490 }
1491 }
1492
1493 if (auto *LI = dyn_cast<LoadInst>(U)) {
1494 LI->setAlignment(std::max(A, LI->getAlign()));
1495 continue;
1496 }
1497 if (auto *SI = dyn_cast<StoreInst>(U)) {
1498 if (SI->getPointerOperand() == Ptr)
1499 SI->setAlignment(std::max(A, SI->getAlign()));
1500 continue;
1501 }
1502 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1503 // None of the atomicrmw operations can operate on pointer values, but
1504 // check anyway in case that changes or we are processing a ConstantExpr.
1505 if (AI->getPointerOperand() == Ptr)
1506 AI->setAlignment(std::max(A, AI->getAlign()));
1507 continue;
1508 }
1509 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1510 if (AI->getPointerOperand() == Ptr)
1511 AI->setAlignment(std::max(A, AI->getAlign()));
1512 continue;
1513 }
1514 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1515 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1516 APInt Off(BitWidth, 0);
1517 if (GEP->getPointerOperand() == Ptr) {
1518 Align GA;
1519 if (GEP->accumulateConstantOffset(DL, Off))
1520 GA = commonAlignment(A, Off.getLimitedValue());
1521 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1522 MaxDepth - 1);
1523 }
1524 continue;
1525 }
1526 if (auto *I = dyn_cast<Instruction>(U)) {
1527 if (I->getOpcode() == Instruction::BitCast ||
1528 I->getOpcode() == Instruction::AddrSpaceCast)
1529 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1530 }
1531 }
1532 }
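// Illustrative user chain that the recursion above is meant to refine
// (assumed IR, not taken from a real test):
//   %p = addrspacecast ptr addrspace(3) <field gep> to ptr
//   %q = getelementptr inbounds i8, ptr %p, i64 8
//   %v = load i32, ptr %q
// The load ends up carrying the field's !alias.scope/!noalias metadata and an
// alignment of at least commonAlignment(<field align>, 8), provided the chain
// is no deeper than MaxDepth.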
1533};
1534
1535class AMDGPULowerModuleLDSLegacy : public ModulePass {
1536public:
1537 const AMDGPUTargetMachine *TM;
1538 static char ID;
1539
1540 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM = nullptr)
1541 : ModulePass(ID), TM(TM) {}
1542
1543 void getAnalysisUsage(AnalysisUsage &AU) const override {
1544 if (!TM)
1545 AU.addRequired<TargetPassConfig>();
1546 }
1547
1548 bool runOnModule(Module &M) override {
1549 if (!TM) {
1550 auto &TPC = getAnalysis<TargetPassConfig>();
1551 TM = &TPC.getTM<AMDGPUTargetMachine>();
1552 }
1553
1554 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1555 }
1556};
1557
1558} // namespace
1559char AMDGPULowerModuleLDSLegacy::ID = 0;
1560
1561char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1562
1563INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1564 "Lower uses of LDS variables from non-kernel functions",
1565 false, false)
1566 INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1567 INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1568 "Lower uses of LDS variables from non-kernel functions",
1569 false, false)
1570
1571 ModulePass *
1572 llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1573 return new AMDGPULowerModuleLDSLegacy(TM);
1574}
1575
1576 PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1577 ModuleAnalysisManager &) {
1578 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1579 : PreservedAnalyses::all();
1580 }
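// The new pass manager entry point above is what `opt` dispatches to; a
// minimal invocation for testing (the pass name below is the one registered
// in AMDGPUPassRegistry.def) would look like:
//   opt -mtriple=amdgcn-- -S -passes=amdgpu-lower-module-lds in.ll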