Package Description
Source generators for the DotCompute framework that enable compile-time code generation for high-performance compute kernels.
The DotCompute.Generators project provides Roslyn-based source generators that automatically generate optimized backend-specific implementations for compute kernels marked with the [Kernel] or [RingKernel] attributes.
- IIncrementalGenerator for optimal performance
- [Kernel] and [RingKernel] attributes
- Vector<T>

dotnet add package DotCompute.Generators --version 0.6.0
<ItemGroup>
  <ProjectReference Include="..\..\src\DotCompute.Generators\DotCompute.Generators.csproj"
                    OutputItemType="Analyzer"
                    ReferenceOutputAssembly="false" />
</ItemGroup>
using DotCompute.Generators.Kernel;

// Standard kernel for one-shot execution
public static unsafe class VectorMath
{
    [Kernel(
        Backends = KernelBackends.CPU | KernelBackends.CUDA,
        VectorSize = 8,
        IsParallel = true,
        Optimizations = OptimizationHints.AggressiveInlining | OptimizationHints.Vectorize)]
    public static void AddVectors(float* a, float* b, float* result, int length)
    {
        for (int i = 0; i < length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }
}

// Ring kernel for persistent GPU-resident computation
public static class GraphAlgorithms
{
    [RingKernel(
        KernelId = "pagerank-vertex",
        Domain = RingKernelDomain.GraphAnalytics,
        Mode = RingKernelMode.Persistent,
        Capacity = 16384, // Capacity must be a power of 2
        Backends = KernelBackends.CUDA | KernelBackends.OpenCL)]
    public static void PageRankVertex(
        IMessageQueue<VertexMessage> incoming,
        IMessageQueue<VertexMessage> outgoing,
        Span<float> pageRank)
    {
        int vertexId = Kernel.ThreadId.X;

        // Drain incoming messages and accumulate rank for this vertex
        while (incoming.TryDequeue(out var msg))
        {
            if (msg.TargetVertex == vertexId)
                pageRank[vertexId] += msg.Rank;
        }

        // Send to neighbors...
    }
}
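The message loop in PageRankVertex can be tried on the CPU without any GPU backend. The following is a minimal sketch, not DotCompute code: it assumes a `VertexMessage` shape with only the two fields used in the example (`TargetVertex` and `Rank`), uses the standard `ConcurrentQueue<T>` as a stand-in for `IMessageQueue<T>`, and passes the vertex index explicitly instead of reading `Kernel.ThreadId.X`. The names `PageRankSimulation` and `ProcessVertex` are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical message shape: only TargetVertex and Rank appear in the example.
public readonly record struct VertexMessage(int TargetVertex, float Rank);

public static class PageRankSimulation
{
    // Drains the queue and accumulates rank for one vertex, mirroring the
    // loop in the kernel body. Messages addressed to other vertices are
    // dequeued and ignored, exactly as in the example.
    public static void ProcessVertex(
        ConcurrentQueue<VertexMessage> incoming,
        Span<float> pageRank,
        int vertexId)
    {
        while (incoming.TryDequeue(out var msg))
        {
            if (msg.TargetVertex == vertexId)
                pageRank[vertexId] += msg.Rank;
        }
    }
}
```

Because the loop consumes every message it sees, a simulation of several vertices would likely need one queue per vertex; how the real queues route messages is not specified here.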
The source generator will create:
- Kernel Registry (KernelRegistry.g.cs)
- CPU Implementation (AddVectors_CPU.g.cs)
- Kernel Invoker (VectorMathInvoker.g.cs)
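The contents of the generated files are not shown here. As a rough illustration of the kind of SIMD code a CPU implementation of AddVectors could contain, here is a hand-written sketch using `System.Numerics.Vector<T>`. It is not the generator's actual output: the class name `AddVectorsSketch` and the span-based signature (instead of raw pointers) are assumptions made to keep the example safe and self-contained.

```csharp
using System;
using System.Numerics;

public static class AddVectorsSketch
{
    // Hypothetical hand-written equivalent of a vectorized CPU kernel.
    public static void AddVectors(
        ReadOnlySpan<float> a,
        ReadOnlySpan<float> b,
        Span<float> result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        // Number of floats per hardware vector (depends on the CPU).
        int width = Vector<float>.Count;
        int i = 0;

        // Vectorized main loop: process 'width' elements per iteration.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(result.Slice(i, width));
        }

        // Scalar tail loop for the remaining elements.
        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];
    }
}
```

The scalar tail loop handles lengths that are not a multiple of the vector width, so the method gives the same results as the plain loop in the kernel source.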
KernelBackends values:
- CPU: CPU backend with SIMD support
- CUDA: NVIDIA GPU backend
- Metal: Apple GPU backend
- OpenCL: Cross-platform GPU backend
- All: All available backends

OptimizationHints values:
- AggressiveInlining: Force method inlining
- LoopUnrolling: Unroll loops for better performance
- Vectorize: Enable SIMD vectorization
- Prefetch: Add memory prefetch hints
- FastMath: Use fast math operations (may reduce accuracy)

Memory access patterns:
- Sequential: Linear memory access
- Strided: Fixed-stride memory access
- Random: Random memory access
- Coalesced: GPU-optimized coalesced access
- Tiled: Tiled/blocked memory access

Ring Kernels enable persistent GPU computation with message passing capabilities:
RingKernelMode values:
- Persistent: Kernel stays active continuously, ideal for streaming workloads
- EventDriven: Kernel launches on-demand when messages arrive, conserves resources

Message passing strategies:
- SharedMemory: Lock-free queues in GPU shared memory (fastest for single-GPU)
- AtomicQueue: Lock-free queues in global memory with atomics (scalable)
- P2P: Direct GPU-to-GPU memory transfers (CUDA only, requires NVLink)
- NCCL: Multi-GPU collectives (CUDA only, optimal for distributed workloads)

RingKernelDomain values:
- General: No domain-specific optimizations
- GraphAnalytics: Optimized for irregular memory access patterns (graph algorithms)
- SpatialSimulation: Optimized for regular access with halo exchange (physics, fluids)
- ActorModel: Optimized for message-heavy workloads with dynamic distribution

[RingKernel] properties:
- KernelId: Unique identifier for the kernel (required)
- Capacity: Maximum concurrent work items (default: 1024, must be power of 2)
- InputQueueSize: Size of incoming message queue (default: 256, must be power of 2)
- OutputQueueSize: Size of outgoing message queue (default: 256, must be power of 2)
- GridDimensions: Number of thread blocks per dimension (auto-calculated if null)
- BlockDimensions: Threads per block per dimension (auto-selected if null)
- UseSharedMemory: Enable shared memory for thread-block coordination
- SharedMemorySize: Shared memory size in bytes per block

| ID | Severity | Description |
|---|---|---|
| DC0001 | Error | Unsupported type in kernel |
| DC0002 | Error | Kernel method missing buffer parameter |
| DC0003 | Error | Invalid vector size (must be 4, 8, or 16) |
| DC0004 | Warning | Unsafe code context required |
| DC0005 | Warning | Potential performance issue |
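
Two of the constraints stated in this document are easy to check before a build: Capacity, InputQueueSize and OutputQueueSize must be powers of 2, and DC0003 requires a vector size of 4, 8, or 16. The sketch below shows one way to pre-validate such values in your own code. It assumes .NET 6 or later for `System.Numerics.BitOperations`; the class and method names are hypothetical and are not part of DotCompute.

```csharp
using System;
using System.Numerics;

public static class RingKernelSettingsCheck
{
    // Hypothetical helper: true when value is a positive power of two.
    public static bool IsValidQueueSize(int value) =>
        value > 0 && BitOperations.IsPow2(value);

    // Hypothetical helper: smallest power of two that is >= value.
    // Throws OverflowException if the result does not fit in an int.
    public static int RoundUpQueueSize(int value)
    {
        if (value <= 0)
            throw new ArgumentOutOfRangeException(nameof(value));
        return checked((int)BitOperations.RoundUpToPowerOf2((uint)value));
    }

    // Mirrors the rule described for DC0003: only 4, 8, or 16 are accepted.
    public static bool IsValidVectorSize(int vectorSize) =>
        vectorSize is 4 or 8 or 16;
}
```

For example, a requested capacity of 10000 is not a power of 2, and `RoundUpQueueSize(10000)` returns 16384.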
DotCompute.Generators/
├── Kernel/
│   ├── KernelSourceGenerator.cs       # Main generator
│   ├── KernelAttribute.cs             # Attribute definitions
│   ├── KernelCompilationAnalyzer.cs   # Compile-time analysis
│   └── AcceleratorType.cs             # Backend enum
├── Backend/
│   └── CpuCodeGenerator.cs            # CPU code generation
├── Models/
│   ├── KernelParameter.cs             # Parameter model
│   └── VectorizationInfo.cs           # Vectorization analysis model
├── Configuration/
│   └── GeneratorConfiguration.cs      # Generator configuration
└── Utils/
    ├── SourceGeneratorHelpers.cs      # Legacy facade (deprecated)
    ├── CodeFormatter.cs               # Code formatting utilities
    ├── ParameterValidator.cs          # Parameter validation
    ├── LoopOptimizer.cs               # Loop optimization
    ├── VectorizationAnalyzer.cs       # Vectorization analysis
    ├── MethodBodyExtractor.cs         # Method body extraction
    └── SimdTypeMapper.cs              # SIMD type mapping
CUDA Code Generation
Metal Shader Generation
OpenCL Kernel Generation
Advanced Optimizations
Debugging Support
Comprehensive documentation is available for DotCompute: