A Danish string is a German-string-like implementation for .NET, optimized for managed memory.
$ dotnet add package SpiseMisu.Text.Dstring
A dstring consists of 16 bytes (128 bits) of contiguous memory, where:
The first byte stores a bitmask describing the next seven bytes as well
as the byte[] pointer.
The first byte uses a 4-bit bitmask to store the length of the
dstring prefix, and another 4-bit bitmask to store flags for
encoding and format. Once the upper-bound length of the dstring prefix
is reached, a 3-bit bitmask with compression flags becomes available:
# Upperbound length of eight (compression flags are available)
+--------+
|▭▭▭▭■□□□|
+--------+
# Length of five (compression flags are NOT available)
+--------+
|▭▭▭▭□■□■|
+--------+
and
# Encoding. Default is multiple single-byte UTF8 for optimal storage
+--------+
|□□□□▭▭▭▭| UTF8.......: Encoded bytes as multiple UTF8 single-bytes
+--------+
|□□□■▭▭▭▭| ASCII......: Encoded bytes in [0x00 - 0x7F]
+--------+
|□□■□▭▭▭▭| ExtASCII...: Encoded bytes in [0x00 - 0xFF]
+--------+
# Encoding and Format placeholders
+--------+
|□□■■▭▭▭▭| PlaceholderF03 (placeholder for future formats/encodings)
+--------+
|□■□□▭▭▭▭| PlaceholderF04 (placeholder for future formats/encodings)
+--------+
|□■□■▭▭▭▭| PlaceholderF05 (placeholder for future formats/encodings)
+--------+
|□■■□▭▭▭▭| PlaceholderF06 (placeholder for future formats/encodings)
+--------+
|□■■■▭▭▭▭| PlaceholderF07 (placeholder for future formats/encodings)
+--------+
|■□□□▭▭▭▭| PlaceholderF08 (placeholder for future formats/encodings)
+--------+
|■□□■▭▭▭▭| PlaceholderF09 (placeholder for future formats/encodings)
+--------+
|■□■□▭▭▭▭| PlaceholderF10 (placeholder for future formats/encodings)
+--------+
|■□■■▭▭▭▭| PlaceholderF11 (placeholder for future formats/encodings)
+--------+
|■■□□▭▭▭▭| PlaceholderF12 (placeholder for future formats/encodings)
+--------+
|■■□■▭▭▭▭| PlaceholderF13 (placeholder for future formats/encodings)
+--------+
|■■■□▭▭▭▭| PlaceholderF14 (placeholder for future formats/encodings)
+--------+
# Format
+--------+
|■■■■▭▭▭▭| JSON.......: Ex: [{"foo":42}]
+--------+
bit-mask
and
# Default is uncompressed
+--------+
|▭▭▭▭■□□□| Uncompressed
+--------+
# Compression algorithms, with streaming support
+--------+
|▭▭▭▭■□□■| Deflate
+--------+
|▭▭▭▭■□■□| GZip
+--------+
|▭▭▭▭■□■■| ZLib
+--------+
|▭▭▭▭■■□□| Brotli
+--------+
# Compression algorithms placeholders
+--------+
|▭▭▭▭■■□■| PlaceholderF05
+--------+
|▭▭▭▭■■■□| PlaceholderF06
+--------+
|▭▭▭▭■■■■| PlaceholderF07
+--------+
bit-mask
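The bit layout above can be sketched in F# as follows. This is only an illustration of the diagrams, not the library's code: the `Header` type and the `decodeHeader` function are hypothetical names, and the sketch assumes that the high nibble holds the encoding/format flags and the low nibble holds the prefix length, as the diagrams suggest.

```fsharp
// Illustrative sketch only: `Header` and `decodeHeader` are hypothetical
// names and are NOT part of SpiseMisu.Text.Dstring.

/// Decoded view of the first (bitmask) byte of a dstring.
type Header =
    { Encoding    : byte          // high nibble: 0 = UTF8, 1 = ASCII, 2 = ExtASCII, 15 = JSON
      Length      : int           // prefix length; 8 means the upper bound was reached
      Compression : byte option } // low 3 bits, only available at the upper bound

let decodeHeader (mask : byte) : Header =
    let encoding = mask >>> 4
    let low      = mask &&& 0x0Fuy
    if (low &&& 0x08uy) <> 0uy then
        // Upper bound reached: the three lowest bits are compression flags.
        { Encoding = encoding; Length = 8; Compression = Some (low &&& 0x07uy) }
    else
        // Short dstring: the low nibble is the prefix length itself.
        { Encoding = encoding; Length = int low; Compression = None }

let five    = decodeHeader 0b00000101uy // UTF8, length 5, no compression flags
let deflate = decodeHeader 0b00001001uy // UTF8, upper bound, compression flag 1 (Deflate)
```

Modelling the compression flags as an `option` mirrors the rule that they only exist once the prefix length has reached its upper bound.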
The next seven bytes store the first seven bytes of the dstring. If
the dstring is shorter than seven bytes, the remaining bytes are
initialized to a default value of zero.
Finally, the last eight bytes contain an x64 pointer to a byte[]
(on the heap) holding the rest of the bytes in the dstring. If the
dstring is less than eight bytes, the byte[] will not be instantiated
(null value).
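A minimal sketch of that split between the inline prefix and the heap array, under stated assumptions: the `Layout` record and the `split` function are hypothetical and only model the description above. The real dstring is a 16-byte value, not a record, and this sketch ignores the bitmask byte.

```fsharp
open System.Text

// Hypothetical illustration type, NOT the library's representation.
type Layout =
    { Prefix : byte[]   // always seven bytes, zero-padded
      Rest   : byte[] } // null when nothing spills onto the heap

let split (bytes : byte[]) : Layout =
    // Copy up to seven bytes into a zero-initialized prefix.
    let prefix = Array.zeroCreate<byte> 7
    Array.blit bytes 0 prefix 0 (min 7 bytes.Length)
    // Only input of eight bytes or more gets a heap array for the remainder.
    let rest = if bytes.Length >= 8 then bytes.[7..] else null
    { Prefix = prefix; Rest = rest }

let short = split (Encoding.UTF8.GetBytes "test")          // Rest is null
let long  = split (Encoding.UTF8.GetBytes "Danish string") // Prefix "Danish ", Rest "string"
```

For "Danish string" the prefix holds 0x44 … 0x20 and the remainder holds 0x73 … 0x67, matching the second diagram below.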
dstring ("test"). No heap allocation:
+--------+----+----+----+----+----+----+----+----------+
|□□□□□■□□|0x74|0x65|0x73|0x74|0x00|0x00|0x00|  <NULL>  |
+--------+----+----+----+----+----+----+----+----------+
 bit-mask  b0   b1   b2   b3   b4   b5   b6   pointer
           ——   ——   ——   ——
dstring ("Danish string") + heap allocation:
                                              0x551A4290 (byte[] on heap)
                                                  |
                                                  v
+--------+----+----+---+----+----------+      +----+----+---+----+
|□□□□■□□□|0x44|0x61| … |0x20|0x551A4290| ---> |0x73|0x74| … |0x67|
+--------+----+----+---+----+----------+      +----+----+---+----+
 bit-mask  b0   b1   …   b6   pointer           b7   b8   …   bn
           ——   ——       ——   ———————           ——   ——       ——
dstring:
  extra allocated byte arrays on heap ----+------------+------------+
                                          |            |            |
                                          v            |            |
                                     0x6796EE96        |            |
+-+----+-----------------------+          |            |            |
|i|memo|   contiguous memory   |          v            |            |
+-+----+--------+---+----------+        +---+          v            |
|0|0x00|□□□□■□□□| … |0x6796EE96| -----> | … |     0x53EB31F6        |
+-+----+--------+---+----------+        +---+          |            |
|1|0x10|□□□□□□■□| … |  <NULL>  |                       v            |
+-+----+--------+---+----------+                     +---+          v
|2|0x20|□□□□■□□□| … |0x53EB31F6| ------------------> | … |     0x4A424B5E
+-+----+--------+---+----------+                     +---+          |
|…|0x…0|□□□□□■□■| … |  <NULL>  |                                    v
+-+----+--------+---+----------+                                  +---+
|8|0x80|□□□□■□□□| … |0x4A424B5E| -------------------------------> | … |
+-+----+--------+---+----------+                                  +---+
├── SpiseMisu.Text.Dstring
│ ├── lib
│ │ └── utils.fs
│ ├── SpiseMisu.Text.Dstring.fsproj
│ └── dstring.fs
├── SpiseMisu.Text.Dstring.Perfs
│ ├── SpiseMisu.Text.Dstring.Perfs.fsproj
│ └── program.fs
├── SpiseMisu.Text.Dstring.Tests
│ ├── SpiseMisu.Text.Dstring.Tests.fsproj
│ ├── program.fs
│ └── tests.fs
├── demo
│ └── dstring.fsx
├── imgs
│ ├── docs
│ ├── licenses
│ └── nuget
├── SpiseMisu.Text.Dstring.sln
├── global.json
├── license.txt
├── license_cil-bytecode_agpl-3.0-only.txt
├── license_knowhow_cc-by-nc-nd-40.txt
├── readme.md
└── todo.org
![Figure: dstring[] hex-dump](imgs/docs/dstring-memory-layout.png)
dotnet-dump mini-guide
In ./SpiseMisu.Text.Dstring.Perfs/program.fs > x.GlobalCleanup () =,
uncomment System.Threading.Thread.Sleep(15_000 (* 15 secs *))
Execute ./dotnet-cli-pidof.sh and you will see all the dotnet apps
running. Look for the ones ending with
SpiseMisu.Text.Dstring.Perfs-Job-OVERNF-1/bin/Release/net10.0.
Now wait until the job you want to take the memory dump of reaches the
clean-up section: // AfterActualRun
Execute dotnet-dump collect --type Heap --process-id 2456129 and you will
see:
// AfterActualRun
WorkloadResult 1: 2 op, 507459083.00 ns, 253.7295 ms/op
// GC: 8 7 0 207217488 2
// Threading: 0 0 2
[createdump] Gathering state for process 2456129 dotnet
[createdump] Writing minidump with heap to file ~/…/SpiseMisu.Text.Dstring/core_20251004_170724
[createdump] Written 596156416 bytes (145546 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 306ms
Investigate by typing: dotnet-dump analyze core_20251004_170724
In the tool, type: dumpheap -stat and you will see:
…
561d22bacde0 13,565 539,936 Free
7f54cec830c0 1 8,000,024 System.Int64[]
7f54cec82ee8 1 16,000,024 SpiseMisu.Text+Dstring[]
7f54cec82010 2 16,000,048 System.Byte[][]
7f54ce9aeb48 34 24,004,640 System.String[]
7f54ce90d7c8 3,000,708 158,772,680 System.String
7f54ceb75950 5,000,005 209,002,292 System.Byte[]
Total 8,015,865 objects, 432,486,422 bytes
dumpheap -mt 7f54cec82ee8
Address MT Size
7f14ce800048 7f54cec82ee8 16,000,024
dumparray -length 5 7f14ce800048
Name: SpiseMisu.Text+Dstring[]
MethodTable: 00007f54cec82ee8
EEClass: 00007f54cec82e60
Size: 16000024(0xf42418) bytes
Array: Rank 1, Number of elements 1000000, Type VALUETYPE
Element Methodtable: 00007f54cec82db0
[0] 00007f14ce800058
[1] 00007f14ce800068
[2] 00007f14ce800078
[3] 00007f14ce800088
[4] 00007f14ce800098
db -c 80 00007f14ce800058 (16-byte element x 5 = 80 bytes):
00007f14ce800058: 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba 0k"......s..7...
00007f14ce800068: 58 6b 22 ce 14 7f 00 00 08 53 d1 20 a4 46 a1 86 Xk"......S. .F..
00007f14ce800078: 80 6b 22 ce 14 7f 00 00 08 44 8f d6 ea 76 37 34 .k"......D...v74
00007f14ce800088: a8 6b 22 ce 14 7f 00 00 08 5b c1 41 f8 f9 bd 58 .k"......[.A...X
00007f14ce800098: d0 6b 22 ce 14 7f 00 00 08 50 72 ef 42 a5 6a 2a .k"......Pr.B.j*
which shows a pattern similar to that of the hex dumper (Dstring.Memory.dump):
0112748739DB99|00001000|↔|00007F536E755118|459055102CAE09F54B
01E606DBB4F6FA|00001000|↔|00007F536E754DD8|4BBC8ED0A25F0B8755
07BDEDF50B83AC|00001000|↔|00007F536E754DB0|43A0DFEEA191AEA2A3
0C5FB78013D42F|00001000|↔|00007F536E754CC0|41854A8815FE6E6A3C
1F3A8D9CC33F5E|00001000|↔|00007F536E7550F0|4BA36307910E82AB70
NOTE: In the performance benchmark, Guids are byte[]-reversed.
> 0112748739DB99|08|↔|00007F536E755118
(byte reversed becomes)
> 18 51 75 6E 53 7F 00 00|08|99 DB 39 87 74 12 01
(and compared to `dotnet-dump`)
< 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba
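The byte reversal in the note above can be reproduced with a few lines of F#. This is a sketch for checking the comparison by hand: `reverseHex` is a hypothetical helper and not part of the library. It relies on `System.Convert.FromHexString`, which is available from .NET 5 onwards.

```fsharp
open System

/// Reverses the byte order of a hex string and renders it
/// space-separated, in the style of the `db` output.
/// Hypothetical helper, NOT part of SpiseMisu.Text.Dstring.
let reverseHex (hex : string) : string =
    Convert.FromHexString hex
    |> Array.rev
    |> Array.map (fun (b : byte) -> b.ToString("X2"))
    |> String.concat " "

let prefix  = reverseHex "0112748739DB99"   // "99 DB 39 87 74 12 01"
let pointer = reverseHex "00007F536E755118" // "18 51 75 6E 53 7F 00 00"
```

Both results line up with the byte-reversed line shown in the note: the pointer comes first, then the bitmask byte 08, then the prefix.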
core_[DATESTAMP]_[TIMESTAMP] files
// * Summary *
BenchmarkDotNet v0.15.6, Linux NixOS 25.05 (Warbler)
12th Gen Intel Core i7-12800H 0.40GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.100
[Host] : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3 DEBUG
Job-NTWEWU : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
Job=Job-NTWEWU Runtime=.NET 10.0 IterationCount=1
LaunchCount=0 WarmupCount=0 Error=NA
| Method | N | Mean | Ratio | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-------------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N' | 1000000 | 494.9 us | 1.00 | 7.63 MB | 1.00 |
| 'Array.zeroCreate<dstring> x.N' | 1000000 | 672.2 us | 1.36 | 15.26 MB | 2.00 |
| 'x.guids |> Array.map Encoding.ASCII.GetString' | 1000000 | 54,606.6 us | 110.33 | 61.04 MB | 8.00 |
| 'x.guids |> Array.map Dstring.Bytes.toDstring' | 1000000 | 51,448.0 us | 103.95 | 53.41 MB | 7.00 |
| 'x.sha256s |> Array.map Encoding.ASCII.GetString' | 1000000 | 72,909.0 us | 147.31 | 91.55 MB | 12.00 |
| 'x.sha256s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 60,189.2 us | 121.61 | 68.66 MB | 9.00 |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 75,732.9 us | 153.02 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 8,107.6 us | 16.38 | 15.26 MB | 2.00 |
| 'x.strings |> Array.sort' | 1000000 | 209,608.7 us | 423.51 | 7.63 MB | 1.00 |
| 'x.strings |> Array.sortDescending' | 1000000 | 238,639.4 us | 482.16 | 7.63 MB | 1.00 |
| 'x.strings |> Array.map Dstring.UTF8.fromString' | 1000000 | 130,051.8 us | 262.77 | 53.39 MB | 7.00 |
| 'x.dstrings |> Array.map Dstring.UTF8.toString' | 1000000 | 135,886.5 us | 274.55 | 98.69 MB | 12.94 |
| 'x.dstrings |> Dstrings.sort' | 1000000 | 168,288.2 us | 340.02 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortDescending' | 1000000 | 168,100.8 us | 339.64 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortPrefix' | 1000000 | 147,110.7 us | 297.23 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortPrefixDescending' | 1000000 | 149,646.4 us | 302.36 | 15.26 MB | 2.00 |
// * Hints *
HideColumnsAnalyser
Summary -> Hidden columns: Error
// * Legends *
N : Value of the 'N' parameter
Mean : Arithmetic mean of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
1 us : 1 Microsecond (0.000001 sec)
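The Allocated column for the two `Array.zeroCreate` rows can be cross-checked against the heap dump with some simple arithmetic. The sketch below assumes 24 bytes of array overhead on x64, which is what the 16,000,024-byte SpiseMisu.Text+Dstring[] in the dump suggests; `arrayBytes` and `toMB` are hypothetical helper names.

```fsharp
// Assumption: 24 bytes of array overhead on x64, inferred from the
// 16,000,024-byte Dstring[] (1,000,000 elements) seen in the heap dump.
let arrayBytes (elementSize : int64) (n : int64) : int64 =
    24L + elementSize * n

// BenchmarkDotNet reports sizes with 1KB = 1024B.
let toMB (bytes : int64) : float =
    float bytes / (1024.0 * 1024.0)

let stringRefs = arrayBytes 8L  1_000_000L // 8-byte references to string
let dstrings   = arrayBytes 16L 1_000_000L // 16-byte dstring values

printfn "%.2f MB" (toMB stringRefs) // 7.63 MB
printfn "%.2f MB" (toMB dstrings)   // 15.26 MB
```

This is why an empty dstring array costs twice as much as an empty string array: each element is 16 bytes inline instead of an 8-byte reference.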
NOTE: By adding pinnedLastByte for dstrings that are exactly 8 bytes in size, we minimize the number of instantiated byte[]. Compare with the previous approach (see below, before and after):
| Method | N | Mean | Ratio | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-----------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N' | 1000000 | 2.419 ms | 1.00 | 7.63 MB | 1.00 |
| … | … | … | … | … | … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 84.291 ms | 34.85 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 51.341 ms | 21.23 | 45.78 MB | 6.00 |
| … | … | … | … | … | … |
| … | … | … | … | … | … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 83.702 ms | 28.04 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 11.347 ms | 3.80 | 15.26 MB | 2.00 |
| … | … | … | … | … | … |
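That would be a reduction of x5.5 in computation time relative to the baseline (Ratio 21.23 vs 3.80; about x4.5 in mean time, 51.341 ms vs 11.347 ms) and x3 in (heap) memory allocation (45.78 MB vs 15.26 MB).
This is especially useful when storing basic types, for example
DateTime, float64, int64/uint64, …, as dstrings.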
Source code in this repository is ONLY covered by the Server Side Public License, v 1, while the rest (knowhow, text, media, …) is covered by the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

However, nuget.org does not permit deploying a package under a license
that is approved by neither the OSI nor the FSF:
Pushing SpiseMisu.Text.Dstring.0.11.0.nupkg to 'https://www.nuget.org/api/v2/package'...
PUT https://www.nuget.org/api/v2/package/
BadRequest https://www.nuget.org/api/v2/package/ 846ms
error: Response status code does not indicate success: 400 (License expression must only contain licenses that are approved by Open Source Initiative or Free Software Foundation. Unsupported licenses: SSPL-1.0.).
The CIL-bytecode content of the nuget package is therefore dual-licensed
under the GNU Affero General Public License v3.0 only, while the
rest (knowhow, text, media, …) is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
license.
For more info on compatible nuget packages licenses, see SPDX License
List.