DotCompute.Backends.CPU

Production-ready CPU compute backend with SIMD vectorization for .NET 9+.

Status: ✅ Production Ready

The CPU backend provides high-performance compute acceleration through:

SIMD Vectorization: AVX512/AVX2/NEON instruction sets
Multi-threading: Work-stealing thread pool
Memory Optimization: NUMA-aware allocation
Native AOT: Full compatibility with Native AOT compilation

Key Features

SIMD Acceleration

AVX512: Best performance on Intel Ice Lake+ and AMD Zen 4+
AVX2: Wide compatibility on modern Intel/AMD processors
NEON: ARM64 support for Apple Silicon and ARM servers
Automatic Detection: Runtime detection of optimal instruction set

Performance

8-23x Speedup: Achieved on vectorizable operations
Memory Bandwidth: 95%+ of theoretical peak utilization
Thread Scaling: Near-linear scaling to CPU core count
Low Overhead: Sub-microsecond kernel launch latency

Installation

dotnet add package DotCompute.Backends.CPU --version 0.5.3

Usage

Basic Setup

using DotCompute.Backends.CPU;
using Microsoft.Extensions.Logging;

var logger = LoggerFactory.Create(builder => builder.AddConsole())
    .CreateLogger<CpuAccelerator>();

var accelerator = new CpuAccelerator(logger);
await accelerator.InitializeAsync();

Service Registration

services.AddSingleton<IAccelerator, CpuAccelerator>();
// OR
services.AddCpuBackend();

Kernel Execution

var kernelDef = new KernelDefinition
{
    Name = "VectorAdd",
    Source = "/* OpenCL C kernel source */",
    EntryPoint = "vector_add"
};

var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);
await compiledKernel.ExecuteAsync(parameters);

Architecture

SIMD Dispatcher

Automatically selects the best available SIMD instruction set:

Detection: Runtime CPU capability detection
Dispatch: Function pointer selection to optimized kernels
Fallback: Scalar implementation for unsupported hardware

Thread Pool

Work-Stealing: Efficient load balancing across cores
Thread-Local Storage: Minimizes synchronization overhead
Adaptive Sizing: Scales with workload and system load

Memory Management

NUMA Awareness: Memory allocation respects CPU topology
Cache Optimization: Data layout for optimal cache usage
Memory Pooling: Reuse allocations to reduce overhead

Performance Benchmarks

Tested on Intel Core Ultra 7 165H with 16 threads:

Operation	Elements	CPU Time	SIMD Time	Speedup
Vector Add	1M floats	4.33ms	187μs	23x
Matrix Mult	512x512	2,340ms	89ms	26x
Dot Product	1M floats	2.1ms	156μs	13.4x

System Requirements

Minimum

.NET 9.0 or later
x64 or ARM64 processor
2GB RAM

Supported Platforms

Windows: x64, ARM64
Linux: x64, ARM64
macOS: x64 (Intel), ARM64 (Apple Silicon)

Build Configuration

The CPU backend automatically configures itself based on the target platform:

<PropertyGroup Condition="'$(TargetArchitecture)' == 'x64'">
  <DefineConstants>$(DefineConstants);ENABLE_AVX2;ENABLE_AVX512</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(TargetArchitecture)' == 'arm64'">
  <DefineConstants>$(DefineConstants);ENABLE_NEON</DefineConstants>
</PropertyGroup>

Troubleshooting

Performance Issues

Check SIMD Support: Verify CPU supports AVX2/AVX512
Memory Alignment: Ensure data is properly aligned for SIMD
Thread Count: Match thread count to physical cores
Memory Bandwidth: Monitor memory utilization during execution

Compatibility Issues

Native AOT: Ensure all types are AOT-compatible
Platform Support: Verify target platform support
Dependencies: Check for missing runtime dependencies

Documentation & Resources

Comprehensive documentation is available for DotCompute:

Architecture Documentation

Backend Integration - CPU SIMD implementation details
System Overview - Architecture and design principles

Developer Guides

Getting Started - Installation and quick start
Backend Selection - When to use CPU vs GPU
Performance Tuning - SIMD optimization techniques (3.7x measured speedup)
Kernel Development - Writing efficient kernels

Examples

Basic Vector Operations - CPU SIMD examples
Matrix Operations - Optimized CPU implementations

API Documentation

API Reference - Complete API documentation

Support

Documentation: Comprehensive Guides
Issues: GitHub Issues
Discussions: GitHub Discussions

Contributing

The CPU backend welcomes contributions in:

New SIMD instruction set support (e.g., AVX-512 variants)
Platform-specific optimizations
Kernel compilation improvements
Performance benchmarks and analysis

See CONTRIBUTING.md for guidelines.

DotCompute.Backends.CPU

Production-ready CPU compute backend with SIMD vectorization for .NET 9+.

Status: ✅ Production Ready

The CPU backend provides high-performance compute acceleration through:

SIMD Vectorization: AVX512/AVX2/NEON instruction sets
Multi-threading: Work-stealing thread pool
Memory Optimization: NUMA-aware allocation
Native AOT: Full compatibility with Native AOT compilation

Key Features

SIMD Acceleration

AVX512: Best performance on Intel Ice Lake+ and AMD Zen 4+
AVX2: Wide compatibility on modern Intel/AMD processors
NEON: ARM64 support for Apple Silicon and ARM servers
Automatic Detection: Runtime detection of optimal instruction set

Performance

8-23x Speedup: Achieved on vectorizable operations
Memory Bandwidth: 95%+ of theoretical peak utilization
Thread Scaling: Near-linear scaling to CPU core count
Low Overhead: Sub-microsecond kernel launch latency

Installation

dotnet add package DotCompute.Backends.CPU --version 0.5.3

Usage

Basic Setup

using DotCompute.Backends.CPU;
using Microsoft.Extensions.Logging;

var logger = LoggerFactory.Create(builder => builder.AddConsole())
    .CreateLogger<CpuAccelerator>();

var accelerator = new CpuAccelerator(logger);
await accelerator.InitializeAsync();

Service Registration

services.AddSingleton<IAccelerator, CpuAccelerator>();
// OR
services.AddCpuBackend();

Kernel Execution

var kernelDef = new KernelDefinition
{
    Name = "VectorAdd",
    Source = "/* OpenCL C kernel source */",
    EntryPoint = "vector_add"
};

var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);
await compiledKernel.ExecuteAsync(parameters);

Architecture

SIMD Dispatcher

Automatically selects the best available SIMD instruction set:

Detection: Runtime CPU capability detection
Dispatch: Function pointer selection to optimized kernels
Fallback: Scalar implementation for unsupported hardware

Thread Pool

Work-Stealing: Efficient load balancing across cores
Thread-Local Storage: Minimizes synchronization overhead
Adaptive Sizing: Scales with workload and system load

Memory Management

NUMA Awareness: Memory allocation respects CPU topology
Cache Optimization: Data layout for optimal cache usage
Memory Pooling: Reuse allocations to reduce overhead

Performance Benchmarks

Tested on Intel Core Ultra 7 165H with 16 threads:

Operation	Elements	CPU Time	SIMD Time	Speedup
Vector Add	1M floats	4.33ms	187μs	23x
Matrix Mult	512x512	2,340ms	89ms	26x
Dot Product	1M floats	2.1ms	156μs	13.4x

System Requirements

Minimum

.NET 9.0 or later
x64 or ARM64 processor
2GB RAM

Supported Platforms

Windows: x64, ARM64
Linux: x64, ARM64
macOS: x64 (Intel), ARM64 (Apple Silicon)

Build Configuration

The CPU backend automatically configures itself based on the target platform:

<PropertyGroup Condition="'$(TargetArchitecture)' == 'x64'">
  <DefineConstants>$(DefineConstants);ENABLE_AVX2;ENABLE_AVX512</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(TargetArchitecture)' == 'arm64'">
  <DefineConstants>$(DefineConstants);ENABLE_NEON</DefineConstants>
</PropertyGroup>

Troubleshooting

Performance Issues

Check SIMD Support: Verify CPU supports AVX2/AVX512
Memory Alignment: Ensure data is properly aligned for SIMD
Thread Count: Match thread count to physical cores
Memory Bandwidth: Monitor memory utilization during execution

Compatibility Issues

Native AOT: Ensure all types are AOT-compatible
Platform Support: Verify target platform support
Dependencies: Check for missing runtime dependencies

Documentation & Resources

Comprehensive documentation is available for DotCompute:

Architecture Documentation

Backend Integration - CPU SIMD implementation details
System Overview - Architecture and design principles

Developer Guides

Getting Started - Installation and quick start
Backend Selection - When to use CPU vs GPU
Performance Tuning - SIMD optimization techniques (3.7x measured speedup)
Kernel Development - Writing efficient kernels

Examples

Basic Vector Operations - CPU SIMD examples
Matrix Operations - Optimized CPU implementations

API Documentation

API Reference - Complete API documentation

Support

Documentation: Comprehensive Guides
Issues: GitHub Issues
Discussions: GitHub Discussions

Contributing

The CPU backend welcomes contributions in:

New SIMD instruction set support (e.g., AVX-512 variants)
Platform-specific optimizations
Kernel compilation improvements
Performance benchmarks and analysis

See CONTRIBUTING.md for guidelines.

mivertowski/DotCompute.Backends.CPUv0.6.2

Get Started

Readme

DotCompute.Backends.CPU

Status: ✅ Production Ready

Key Features

SIMD Acceleration

Performance

Installation

Usage

Basic Setup

Service Registration

Kernel Execution

Architecture

SIMD Dispatcher

Thread Pool

Memory Management

Performance Benchmarks

System Requirements

Minimum

Recommended

Supported Platforms

Build Configuration

Troubleshooting

Performance Issues

Compatibility Issues

Documentation & Resources

Architecture Documentation

Developer Guides

Examples

API Documentation

Support

Contributing

mivertowski/DotCompute.Backends.CPUv0.6.2

Get Started

Readme

DotCompute.Backends.CPU

Status: ✅ Production Ready

Key Features

SIMD Acceleration

Performance

Installation

Usage

Basic Setup

Service Registration

Kernel Execution

Architecture

SIMD Dispatcher

Thread Pool

Memory Management

Performance Benchmarks

System Requirements

Minimum

Recommended

Supported Platforms

Build Configuration

Troubleshooting

Performance Issues

Compatibility Issues

Documentation & Resources

Architecture Documentation

Developer Guides

Examples

API Documentation

Support

Contributing