LocalEmbeddings

A .NET library for generating text embeddings locally using ONNX Runtime and Microsoft.Extensions.AI abstractions — no external API calls required.

🎥 Quick Overview

New to local embeddings? Watch this 5-minute video explaining the main goal of the library.

Want to build RAG applications? Read this blog post about 3 RAG approaches in .NET with local embeddings and zero cloud calls.

Interested in image embeddings? Check out the YouTube video and blog post about local image embeddings with CLIP and ONNX.

Features

Local Embedding Generation — Run inference entirely on your machine using ONNX Runtime
Microsoft.Extensions.AI Integration — Implements IEmbeddingGenerator<string, Embedding<float>>
Kernel Memory Integration — Companion package ElBruno.LocalEmbeddings.KernelMemory provides a native ITextEmbeddingGenerator adapter for Microsoft Kernel Memory
VectorData Integration — Companion package ElBruno.LocalEmbeddings.VectorData adds DI helpers for Microsoft.Extensions.VectorData (VectorStore and typed collections)
Built-in In-Memory Vector Store — ElBruno.LocalEmbeddings.VectorData includes InMemoryVectorStore (no Semantic Kernel connector dependency required)
HuggingFace Model Support — Use popular sentence transformer models from HuggingFace Hub
Automatic Model Caching — Models are downloaded once and cached locally
Dependency Injection Support — First-class IServiceCollection integration
Single-String Convenience API — GenerateAsync("text") and GenerateEmbeddingAsync("text") — no array wrapping needed
Similarity Helpers — Cosine similarity, all-pairs Similarity(...) matrix, and one-line FindClosestAsync(...) semantic search
Thread-Safe & Batched — Concurrent generation and efficient multi-text processing

Installation

dotnet add package ElBruno.LocalEmbeddings

For Kernel Memory integration, also install the companion package:

dotnet add package ElBruno.LocalEmbeddings.KernelMemory

For VectorData integration, install:

dotnet add package ElBruno.LocalEmbeddings.VectorData

Quick Start

1) Generate one embedding

using ElBruno.LocalEmbeddings;

await using var generator = await LocalEmbeddingGenerator.CreateAsync();
var embedding = await generator.GenerateEmbeddingAsync("Hello, world!");
Console.WriteLine(embedding.Vector.Length); // 384

2) Generate embeddings for multiple texts

var inputs = new[] { "first text", "second text", "third text" };
var embeddings = await generator.GenerateAsync(inputs);
Console.WriteLine(embeddings.Count); // 3

3) Compare two texts with cosine similarity

using ElBruno.LocalEmbeddings.Extensions;

var pair = await generator.GenerateAsync(["I love coding", "I enjoy programming"]);
var score = pair[0].CosineSimilarity(pair[1]);
Console.WriteLine(score);
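
For intuition about what the score means: cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1 and is higher for vectors pointing in similar directions. The sketch below computes it on plain float arrays. `VectorMath` and its `Cosine` method are hypothetical names used only for illustration; they are not part of the library, and the library's own `CosineSimilarity` extension may be implemented differently.

```csharp
using System;

// Toy 3-dimensional vectors; real embeddings have many more dimensions.
float[] a = { 1f, 2f, 3f };
float[] b = { 2f, 4f, 6f };   // same direction as a, so the score is 1 up to rounding
float[] c = { -3f, 0f, 1f };  // orthogonal to a (dot product is 0), so the score is 0

Console.WriteLine(VectorMath.Cosine(a, b));
Console.WriteLine(VectorMath.Cosine(a, c));

// Hypothetical helper, for illustration only (not the library's API).
static class VectorMath
{
    public static float Cosine(float[] x, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Vectors must have the same length.");

        // Accumulate in double to limit rounding error on long vectors.
        double dot = 0, normX = 0, normY = 0;
        for (int i = 0; i < x.Length; i++)
        {
            dot += (double)x[i] * y[i];
            normX += (double)x[i] * x[i];
            normY += (double)y[i] * y[i];
        }

        // Note: an all-zero vector makes the denominator 0 and the result NaN.
        return (float)(dot / (Math.Sqrt(normX) * Math.Sqrt(normY)));
    }
}
```

Sentences with similar meaning, such as the pair above, should score noticeably higher than unrelated ones.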

4) Semantic search in one line

var corpus = new[]
{
    "Python for data science",
    "JavaScript for web apps",
    "Swift for iOS development"
};

var corpusEmbeddings = await generator.GenerateAsync(corpus);
var results = await generator.FindClosestAsync(
    "best language for websites",
    corpus,
    corpusEmbeddings,
    topK: 2,
    minScore: 0.2f);

foreach (var result in results)
    Console.WriteLine($"{result.Score:F3} - {result.Text}");
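
Conceptually, a search like this scores every corpus entry against the query, drops entries below `minScore`, and keeps the `topK` best. The sketch below shows only that ranking step over precomputed scores. `Ranker.TopK` is a hypothetical helper, the scores are made-up values rather than library output, and treating `minScore` as an inclusive threshold is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] corpus =
{
    "Python for data science",
    "JavaScript for web apps",
    "Swift for iOS development"
};
float[] scores = { 0.31f, 0.62f, 0.12f }; // made-up similarity scores, one per corpus entry

foreach (var (text, score) in Ranker.TopK(corpus, scores, topK: 2, minScore: 0.2f))
    Console.WriteLine($"{score:F3} - {text}");

// Hypothetical helper, for illustration only (not the library's API).
static class Ranker
{
    public static List<(string Text, float Score)> TopK(
        IReadOnlyList<string> texts, IReadOnlyList<float> scores, int topK, float minScore)
    {
        if (texts.Count != scores.Count)
            throw new ArgumentException("Each text needs exactly one score.");

        return texts
            .Select((text, i) => (Text: text, Score: scores[i]))
            .Where(r => r.Score >= minScore)   // assumed inclusive threshold
            .OrderByDescending(r => r.Score)   // best match first
            .Take(topK)
            .ToList();
    }
}
```

With these values the JavaScript entry ranks first, the Python entry second, and the Swift entry is filtered out by the threshold.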

For custom models and runtime behavior, use the options-based constructor: new LocalEmbeddingGenerator(new LocalEmbeddingsOptions { ... }).

Note: The synchronous constructor remains available for backward compatibility, but it blocks the calling thread during initialization whenever the model has to be downloaded.

Want to go further? Read the Getting Started guide and the other docs in this repo for DI, configuration, VectorData, Kernel Memory, and full RAG examples.

Prefer a containerized dev environment? See the Dev Container section in the Contributing guide.

Samples

See the samples README for prerequisites and run instructions.

Sample	What It Shows
HelloWorldAltModel	Minimal hello world with `sentence-transformers/all-MiniLM-L12-v2`
ConsoleApp	All the basics: single/batch embeddings, similarity, semantic search, DI
RagChat	Embedding-only semantic search Q&A using shared VectorData `InMemoryVectorStore` (no LLM needed)
RagOllama	Full RAG with Ollama + phi4-mini + Kernel Memory
RagFoundryLocal	Full RAG with Foundry Local + phi4-mini
ImageRagSimple	Minimal image RAG: index images → search by text
ImageRagChat	Interactive image RAG chat with text and image-to-image search

Configuration

var options = new LocalEmbeddingsOptions
{
    ModelName = "sentence-transformers/all-MiniLM-L6-v2",  // HuggingFace model
    MaxSequenceLength = 512,                                // Max tokens
    CacheDirectory = null,                                  // Auto-detect per platform
    EnsureModelDownloaded = true,                           // Download if missing
    NormalizeEmbeddings = false                             // Set to true to L2-normalize vectors
};

See Configuration docs for supported models, local model paths, and cache locations.
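
To illustrate what L2 normalization means for the `NormalizeEmbeddings` option: a vector is scaled to unit length, after which the dot product of two normalized vectors equals their cosine similarity. The sketch below shows the arithmetic on a plain array. `Normalizer.L2Normalize` is a hypothetical helper for illustration, and how the library applies the option internally is not shown here.

```csharp
using System;
using System.Linq;

float[] v = { 3f, 4f };                      // Euclidean length is 5
float[] unit = Normalizer.L2Normalize(v);    // each component divided by 5

Console.WriteLine(string.Join(", ", unit));

// Hypothetical helper, for illustration only (not the library's API).
static class Normalizer
{
    public static float[] L2Normalize(float[] vector)
    {
        // Euclidean (L2) norm: square root of the sum of squared components.
        double norm = Math.Sqrt(vector.Sum(x => (double)x * x));

        // A zero vector has no direction; return an unchanged copy.
        if (norm == 0)
            return (float[])vector.Clone();

        return vector.Select(x => (float)(x / norm)).ToArray();
    }
}
```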

Common model options (with model cards)

Estimated download sizes below are approximate and can vary by ONNX variant (fp32/int8) and tokenizer assets.

Documentation

Topic	Description
Getting Started	Step-by-step guide from hello world to RAG
API Reference	Classes, methods, and extension methods
Configuration	Options, supported models, cache locations
Alternative Models	Non-default free models, local download workflow, and license notes
Dependency Injection	All DI overloads and `IConfiguration` binding
Kernel Memory Integration	Using local embeddings with Microsoft Kernel Memory
VectorData Integration	Using local embeddings with Microsoft.Extensions.VectorData abstractions
Contributing	Build from source, repo structure, guidelines
Roadmap	Planned and completed features/samples with priorities
Publishing	NuGet publishing with GitHub Actions + Trusted Publishing
Changelog	Versioned summary of notable changes

Have an idea for a new feature or sample? Please open an issue and share your suggestion.

Building from Source

git clone https://github.com/elbruno/elbruno.localembeddings.git
cd elbruno.localembeddings
dotnet build
dotnet test

Requirements

.NET 10.0 SDK or later
ONNX Runtime compatible platform (Windows, Linux, macOS)

👋 About the Author

Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.

Made with ❤️ by ElBruno

If you like this project, consider following my work across platforms:

📻 Podcast: No Tienen Nombre — Spanish-language episodes on AI, development, and tech culture
💻 Blog: ElBruno.com — Deep dives on embeddings, RAG, .NET, and local AI
📺 YouTube: youtube.com/elbruno — Demos, tutorials, and live coding
🔗 LinkedIn: @elbruno — Professional updates and insights
𝕏 Twitter: @elbruno — Quick tips, releases, and tech news

License

This project is licensed under the MIT License — see the LICENSE file for details.

