Transform your .NET applications into AI powerhouses - embed models directly or deploy as an Ollama-compatible and OpenAI-compatible API server. No cloud dependencies, no limits, just local embeddings and inference.
A .NET library for local AI model inference with Ollama-compatible and OpenAI-compatible REST APIs
Embeddings • Completions • Chat • Built on LlamaSharp • GGUF Models Only
Install SharpAI via NuGet:
dotnet add package SharpAI
Or via Package Manager Console:
Install-Package SharpAI
The main entry point that provides access to all functionality:
using SharpAI;
using SyslogLogging;
// Initialize the AI driver
var ai = new AIDriver(
    logging: new LoggingModule(),
    databaseFilename: "./sharpai.db",
    huggingFaceApiKey: "hf_xxxxxxxxxxxx",
    modelDirectory: "./models/"
);
// Download a model from HuggingFace (GGUF format)
await ai.Models.Add(
    name: "microsoft/phi-2",
    quantizationPriority: null,
    progressCallback: (url, bytesDownloaded, percentComplete) =>
    {
        Console.WriteLine($"Progress: {percentComplete:P0}");
    });
// Generate a completion
string response = await ai.Completion.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: "Once upon a time",
    maxTokens: 512,
    temperature: 0.7f
);
The AIDriver provides access to APIs via:
- ai.Models - Model management operations
- ai.Embeddings - Embedding generation
- ai.Completion - Text completion generation
- ai.Chat - Chat completion generation

ai.Models manages model downloads and lifecycle:
// List all downloaded models
List<ModelFile> models = ai.Models.All();
// Get a specific model
ModelFile model = ai.Models.GetByName("microsoft/phi-2");
// Download a new model from HuggingFace
ModelFile downloaded = await ai.Models.Add(
    name: "meta-llama/Llama-2-7b-chat-hf",
    quantizationPriority: null,
    progressCallback: null);
// Delete a model
ai.Models.Delete("microsoft/phi-2");
// Get the filesystem path for a model
string modelPath = ai.Models.GetFilename("microsoft/phi-2");
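A common pattern is to download a model only when it is not already on disk. A minimal sketch combining the calls above (it assumes `GetByName` returns `null` for an unknown model, which is not confirmed by this document):

```csharp
// Download the model only if it is not already present (sketch).
const string modelName = "microsoft/phi-2";

ModelFile model = ai.Models.GetByName(modelName); // assumption: null when absent
if (model == null)
{
    model = await ai.Models.Add(
        name: modelName,
        quantizationPriority: null,
        progressCallback: null);
}

Console.WriteLine($"Model file: {ai.Models.GetFilename(modelName)}");
```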
SharpAI automatically handles downloading GGUF files from HuggingFace. Only GGUF format models are supported.
Model metadata includes:
Generate vector embeddings for text:
// Single text embedding
float[] embedding = await ai.Embeddings.Generate(
    model: "microsoft/phi-2",
    input: "This is a sample text"
);
// Multiple text embeddings
string[] texts = { "First text", "Second text", "Third text" };
float[][] embeddings = await ai.Embeddings.Generate(
    model: "microsoft/phi-2",
    inputs: texts
);
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
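Embedding vectors are typically compared by cosine similarity. A minimal sketch in plain C# (independent of SharpAI) that could rank the vectors returned by `ai.Embeddings.Generate`:

```csharp
// Cosine similarity between two embedding vectors of equal length.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

Scores range from -1 to 1, with higher values indicating greater semantic similarity. Only compare vectors produced by the same model.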
Generate text continuations:
// Non-streaming completion
string completion = await ai.Completion.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: "The meaning of life is",
    maxTokens: 512,
    temperature: 0.7f
);
// Streaming completion
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
}
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate conversational responses:
// Non-streaming chat
string response = await ai.Chat.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt, // Prompt should be formatted for chat
    maxTokens: 512,
    temperature: 0.7f
);
// Streaming chat
await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f))
{
    Console.Write(token);
}
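The `chatFormattedPrompt` used above can be produced with the prompt builders covered in the next section. An end-to-end sketch (it pairs `ChatFormat.Phi` with `microsoft/phi-2` on the assumption that the Phi format matches that model; verify against your model's card):

```csharp
using SharpAI.Prompts;

// Build a chat-formatted prompt, then stream the reply.
var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "Summarize GGUF in one sentence." }
};
string chatFormattedPrompt = PromptBuilder.Build(ChatFormat.Phi, messages);

await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt,
    maxTokens: 256,
    temperature: 0.7f))
{
    Console.Write(token);
}
```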
SharpAI includes prompt builders to format conversations for different model types:
using SharpAI.Prompts;
var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" },
    new ChatMessage { Role = "assistant", Content = "The capital of France is Paris." },
    new ChatMessage { Role = "user", Content = "What is its population?" }
};
// Format for different model types
string chatMLPrompt = PromptBuilder.Build(ChatFormat.ChatML, messages);
/* Output:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
*/
string llama2Prompt = PromptBuilder.Build(ChatFormat.Llama2, messages);
/* Output:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is its population? [/INST]
*/
string simplePrompt = PromptBuilder.Build(ChatFormat.Simple, messages);
/* Output:
system: You are a helpful assistant.
user: What is the capital of France?
assistant: The capital of France is Paris.
user: What is its population?
assistant:
*/
Supported chat formats:
- Simple - Basic role: content format (generic models, base models)
- ChatML - OpenAI ChatML format (GPT models and models fine-tuned with ChatML, including Qwen)
- Llama2 - Llama 2 instruction format (Llama-2-Chat models)
- Llama3 - Llama 3 format (Llama-3-Instruct models)
- Alpaca - Alpaca instruction format (Alpaca, Vicuna, WizardLM, and many Llama-based fine-tunes)
- Mistral - Mistral instruction format (Mistral-Instruct and Mixtral-Instruct models)
- HumanAssistant - Human/Assistant format (Anthropic Claude-style training, some chat models)
- Zephyr - Zephyr model format (Zephyr alpha/beta models)
- Phi - Microsoft Phi format (Phi-2 and Phi-3 models)
- DeepSeek - DeepSeek format (DeepSeek-Coder and DeepSeek-LLM models)

If you are unsure which format your model supports, choose Simple.
using SharpAI.Prompts;
// Simple instruction
string instructionPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.Instruction,
    "Write a haiku about programming"
);
/* Output:
### Instruction:
Write a haiku about programming
### Response:
*/
// Code generation with context
var context = new Dictionary<string, string>
{
    ["language"] = "python",
    ["requirements"] = "Include error handling"
};
string codePrompt = TextPromptBuilder.Build(
    TextGenerationFormat.CodeGeneration,
    "Write a function to parse JSON",
    context
);
/* Output:
Language: python
Task: Write a function to parse JSON
Requirements: Include error handling
```python
*/
// Question-answer format
string qaPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.QuestionAnswer,
    "What causes rain?"
);
/* Output:
Question: What causes rain?
Answer:
*/
// Few-shot examples
var examples = new List<(string input, string output)>
{
    ("2+2", "4"),
    ("5*3", "15")
};
string fewShotPrompt = TextPromptBuilder.BuildWithExamples(
    TextGenerationFormat.QuestionAnswer,
    "7-3",
    examples
);
/* Output:
Examples:
Question: 2+2
Answer:
4
---
Question: 5*3
Answer:
15
---
Now complete the following:
Question: 7-3
Answer:
*/
Supported text generation formats:
- Raw - No formatting
- Completion - Continuation format
- Instruction - Instruction/response format
- QuestionAnswer - Q&A format
- CreativeWriting - Story/creative format
- CodeGeneration - Code generation format
- Academic - Academic writing format
- ListGeneration - List creation format
- TemplateFilling - Template completion
- Dialogue - Dialogue generation

SharpAI includes a fully functional REST API server through the SharpAI.Server project, which provides Ollama-compatible and OpenAI-compatible endpoints. The server acts and behaves like Ollama (with minor gaps), allowing you to use existing Ollama clients and integrations with SharpAI.
Ollama API endpoints include:
- /api/generate - Text generation
- /api/chat - Chat completions
- /api/embed - Generate embeddings
- /api/tags - List available models
- /api/pull - Download models from HuggingFace

OpenAI API endpoints include:

- /v1/embeddings - Generate embeddings
- /v1/completions - Text generation
- /v1/chat/completions - Chat completions

SharpAI has been tested on:
When models are downloaded, the following information is tracked:
Models are stored in the specified modelDirectory with files named by their GUID. Model metadata is stored in the SQLite database specified by databaseFilename.
The library automatically detects CUDA availability and optimizes layer allocation. The LlamaSharpEngine determines optimal GPU layers based on available hardware.
SharpAI supports multiple GPU backends through LlamaSharp and llama.cpp:
Note: The actual GPU support depends on the LlamaSharp build and backend availability on your system. CUDA support is most mature, while other backends may require specific LlamaSharp builds or additional setup.
SharpAI.Server is available as a Docker image, providing an easy way to deploy the Ollama-compatible API server without local installation.
For Windows:
run.bat v1.0.0
For Linux/macOS:
./run.sh v1.0.0
For Windows:
compose-up.bat
For Linux/macOS:
./compose-up.sh
Before running the Docker container, ensure you have:
- A sharpai.json configuration file in your working directory
- ./logs/ - For application logs
- ./models/ - For storing downloaded GGUF models

The official Docker image is available at: jchristn/sharpai. Refer to the docker directory for assets useful for running in Docker and Docker Compose.
The container uses several volume mappings for persistence:
| Host Path | Container Path | Description |
|---|---|---|
| ./sharpai.json | /app/sharpai.json | Configuration file |
| ./sharpai.db | /app/sharpai.db | SQLite database for model registry |
| ./logs/ | /app/logs/ | Application logs |
| ./models/ | /app/models/ | Downloaded GGUF model files |
Modify the sharpai.json file to supply your configuration.
The container exposes port 8000 by default.
You can access Ollama APIs at:
- http://localhost:8000/api/tags - List available models
- http://localhost:8000/api/pull - Pull a model
- http://localhost:8000/api/generate - Generate text
- http://localhost:8000/api/chat - Chat completions
- http://localhost:8000/api/embed - Generate embeddings

You can access OpenAI APIs at:

- http://localhost:8000/v1/embeddings - Generate embeddings
- http://localhost:8000/v1/completions - Generate text
- http://localhost:8000/v1/chat/completions - Chat completions

Create the required directory structure:
mkdir logs models
Create your sharpai.json configuration file
Run the container:
# Windows
run.bat v1.0.0
# Linux/macOS
./run.sh v1.0.0
Download a model using the API (GGUF format required):
curl http://localhost:8000/api/pull \
  -d '{"model":"QuantFactory/Qwen2.5-3B-GGUF"}'
Generate text:
curl http://localhost:8000/api/generate \
  -d '{
        "model": "QuantFactory/Qwen2.5-3B-GGUF",
        "prompt": "Why is the sky blue?",
        "stream": false
      }'
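From .NET, the same Ollama-compatible endpoint can be called with HttpClient. A minimal sketch, assuming the server is running on port 8000 with the model above already pulled (the response body follows Ollama's /api/generate JSON shape):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Point the client at the local SharpAI server.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:8000") };

// Same payload as the curl example above.
string payload = JsonSerializer.Serialize(new
{
    model = "QuantFactory/Qwen2.5-3B-GGUF",
    prompt = "Why is the sky blue?",
    stream = false
});

using HttpResponseMessage response = await http.PostAsync(
    "/api/generate",
    new StringContent(payload, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());
```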
For production deployments, you can use Docker Compose. Create a compose.yaml file:
version: '3.8'

services:
  sharpai:
    image: jchristn/sharpai:v1.0.0
    ports:
      - "8000:8000"
    volumes:
      - ./sharpai.json:/app/sharpai.json
      - ./sharpai.db:/app/sharpai.db
      - ./logs:/app/logs
      - ./models:/app/models
    environment:
      - TERM=xterm-256color
    restart: unless-stopped
Then run:
docker compose up -d
To enable GPU acceleration in Docker:
Install the NVIDIA Container Toolkit and modify your run command:
docker run --gpus all \
  -p 8000:8000 \
  -v ./sharpai.json:/app/sharpai.json \
  -v ./sharpai.db:/app/sharpai.db \
  -v ./logs:/app/logs \
  -v ./models:/app/models \
  jchristn/sharpai:v1.0.0
For Docker Compose, add:
services:
  sharpai:
    # ... other configuration ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
If you encounter issues, verify that:

- sharpai.json exists and is valid JSON
- The ./models/ directory has proper write permissions

Please see the CHANGELOG.md file for detailed version history and release notes.
Have a bug, feature request, or idea? Please file an issue on our GitHub repository. We welcome community input on our roadmap!
This project is licensed under the MIT License.