A very lightweight library that lets developers enumerate spans with SIMD acceleration
$ dotnet add package VectorizedSpans
This library is a fine addition to your collection. Lightweight and focused purely on performance through SIMD, it makes vectorizing your code easier than ever. It depends on no third-party libraries, so developers keep building fast and apps keep running faster.
Two ref structs are provided: VectorizedSpan and VectorizedSpanEnumerator.
Let's sum the integers in a span. Here's the scalar before:
var sum = 0;
foreach (var n in numbers)
    sum += n;
return sum;
And here's the after:
VectorizedSpan<int> vspan = numbers; // Yeah we got implicit conversions 😎
var vsum = Vector<int>.Zero;
// Cover all possible vectors until there are no more
foreach (var v in vspan)
    vsum += v;
// Add up the leftovers in case not all ints could be reached
var sum = Vector.Sum(vsum);
foreach (var n in vspan.Leftovers)
    sum += n;
return sum;
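For comparison, here is roughly what the same sum looks like when written by hand against System.Numerics alone. Treat it as a sketch rather than a look at the library's internals: the names ManualVectorization and Sum are made up for illustration, and Vector.Sum assumes .NET 6 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of VectorizedSpans.
public static class ManualVectorization
{
    public static int Sum(ReadOnlySpan<int> numbers)
    {
        var vsum = Vector<int>.Zero;
        var i = 0;

        // Last index at which a full vector still fits inside the span.
        var lastBlock = numbers.Length - Vector<int>.Count;

        // Walk the span one full vector at a time.
        for (; i <= lastBlock; i += Vector<int>.Count)
            vsum += new Vector<int>(numbers.Slice(i, Vector<int>.Count));

        // Collapse the vector lanes into one scalar.
        var sum = Vector.Sum(vsum);

        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < numbers.Length; i++)
            sum += numbers[i];

        return sum;
    }
}
```

The block loop, the bounds arithmetic and the scalar tail are the parts that the foreach over the VectorizedSpan and its Leftovers stand in for.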
Now imagine that simplicity in something far more complex. Even as it stands, this is far simpler than writing out all of the vectorized code by hand, all over again, every time, bounds checks and all.
Let's set all numbers except negatives to 0
The "challenge" here will be writing the vectors back into the span. Here's the scalar before:
public static void NegativeIsolation(Span<int> numbers)
{
    for (var i = 0; i < numbers.Length; i++)
    {
        if (numbers[i] > 0)
            numbers[i] = 0;
    }
}
Wow. One comparison per number. If you're reading this, you don't like those statistics. Why else are you here? Let's see the after:
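public static void NegativeIsolation(Span<int> numbers)
{
    var venumer = new VectorizedSpanEnumerator<int>(numbers, i => i + Vector<int>.Count);
    while (venumer.MoveNext())
    {
        var v = venumer.Current;
        const int shrCount = sizeof(int) * 8 - 1; // shift that moves the sign bit down to bit 0
        var negatives = v >>> shrCount; // 1 where the element is negative, 0 elsewhere
        negatives *= v; // negatives keep their value, everything else becomes 0
        negatives.TryCopyTo(venumer.VSpan[venumer.Index..]); // write back where the vector was loaded
    }
    // Scalar pass over the elements that did not fill a whole vector
    for (var i = venumer.VSpan.LeftoversIndex; i < numbers.Length; i++)
    {
        if (numbers[i] > 0)
            numbers[i] = 0;
    }
}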
Sure, it's a little longer. However, is it vectorized? Yes. Is it shorter than what it normally takes to vectorize too? Yes. That's what we're here for. This example also demonstrates the potential in using the enumerator directly instead of in a foreach. Namely, the ability to get the index in the span from which the vector was loaded. That is how we wrote the integers back into the span. If the index isn't needed, it is recommended to stick with a foreach loop.
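The shift-and-multiply step in the example above is easier to trust once you see it on a single int. Here is a small scalar sketch of the same trick; SignMaskDemo and KeepNegative are hypothetical names used only for illustration, and the >>> operator assumes C# 11 or later.

```csharp
using System;

// Hypothetical demo type, not part of VectorizedSpans.
public static class SignMaskDemo
{
    // Branch-free: returns x when x is negative, otherwise 0.
    public static int KeepNegative(int x)
    {
        const int signShift = sizeof(int) * 8 - 1; // 31 for a 32-bit int

        // Unsigned shift moves the sign bit down to bit 0:
        // 1 when x is negative, 0 when x is zero or positive.
        var isNegative = x >>> signShift;

        // Multiplying by 1 keeps x, multiplying by 0 clears it.
        return isNegative * x;
    }
}
```

The vectorized loop does the same thing lane by lane, which is why it needs no comparison per number.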
Although the examples provided are very simple, almost too simple for vectorization, they get the point across that vectorizing almost anything is made simple.
It ain't much but it's (an) honest work(horse)