A RAG (Retrieval-Augmented Generation) toolkit for game development, supporting Unity and Unreal Engine integrations with NPC dialogue and context management.

GameRAGKit is a drop-in retrieval-augmented generation (RAG) toolkit for building non-player characters (NPCs) that scale from solo prototypes to fully fledged productions. It keeps the runtime lightweight enough for Unity or dedicated C# services while still letting you route high-impact scenes to cloud LLMs on demand.

Dual-licensed: PolyForm Noncommercial 1.0.0 for community use, with commercial terms available from the author.
If you’d like to support development, you can donate via WiseTag:
@towsifa8
Install the GameRagKit package from NuGet:

```
dotnet add package GameRagKit
```

Or via the Package Manager Console:

```
Install-Package GameRagKit
```
Versioning: Each push to main automatically publishes a new version with an auto-incremented patch number (e.g., 0.1.1, 0.1.2, etc.). To publish a specific version, push a tag like v0.2.0 which will publish exactly as 0.2.0.
Clone the repository and build from source:

```
git clone https://github.com/TowsifAhamed/GameRagKit.git
cd GameRagKit
dotnet build
```
```
GameRagKit/
├── src/                          # Source code
│   ├── GameRagKit/               # Core library (config, vector store, routing, providers)
│   └── GameRagKit.Cli/           # CLI tool (`gamerag` command)
│
├── tests/                        # Unit and integration tests
│   └── GameRagKit.Tests/         # Test suite for core library
│
├── samples/                      # Integration examples
│   ├── unity/                    # Unity integration guide and sample scripts
│   └── unreal/                   # Unreal Engine integration (C++/Blueprint examples)
│
├── examples/                     # Ready-to-use configurations
│   └── configs/                  # Example NPC YAML files for all providers
│       ├── gemini-example.yaml         # Google Gemini (cloud-only)
│       ├── openai-example.yaml         # OpenAI (cloud-only)
│       ├── ollama-local-example.yaml   # Ollama (fully offline)
│       ├── hybrid-example.yaml         # Smart routing (local + cloud)
│       └── README.md                   # Complete configuration guide
│
├── docs/                         # Documentation
│   └── 2025-11-29/               # Timestamped documentation updates
│       ├── ISSUES_AND_IMPROVEMENTS.md  # Detailed issue analysis
│       ├── QUICK_ISSUE_SUMMARY.md      # Executive summary
│       ├── CHANGELOG_2025-11-29.md     # What changed
│       └── READY_TO_COMMIT.md          # Commit guide
│
├── docker-compose.yml            # Quick database setup (PostgreSQL/Qdrant)
├── .env.example                  # Environment variable template
└── README.md                     # This file
```
New to GameRagKit? Check out the example configurations in `examples/configs/` for ready-to-use setups.
See the complete configuration guide for model details and setup instructions.
Want to see GameRagKit in action? Check out the GameRagKit Demo - a working example application that demonstrates real-world NPC conversations with both cloud and local providers.
OpenAI Cloud Provider Demo:

Ollama Local Provider Demo:

Both demos showcase "Bram the Blacksmith", comparing a basic script-only NPC with a GameRagKit-powered smart NPC. The demo repository is a great way to quickly understand how GameRagKit works before integrating it into your own game.
Highlights:

- Fully offline: point `model_path`/`embed_model_path` at the GGUF files you own.
- Streaming: use `NpcAgent.StreamAsync`, `/ask/stream` (SSE), or the CLI `chat` command to watch cinematic, partial responses roll in.
- Packaging: the CLI (`gamerag pack`) plus the shipping guide produce deployable lore + `.gamerag` bundles for consoles or locked-down servers.
- Runtime state: `AskOptions.State` and `WriteSnapshot` inject transient facts (pressures, morale, timers) so NPCs reason about the current run without polluting the persistent index.
- The systems assistant sample (`examples/systems-assistant/`) shows supply-chain debugging in action, blending world/region/faction lore with live RUNTIME STATE for grounded, actionable answers.

Ship a drop-in "what broke?" assistant for supply chains, happiness loops, or any system-by-system debugging. Try the starter kit in `examples/systems-assistant/`, which includes world/region/faction lore, a live snapshot, and five canned questions.
Run the systems assistant end to end:

```
dotnet run --project src/GameRagKit.Cli/GameRagKit.Cli.csproj -- ingest examples/systems-assistant
dotnet run --project src/GameRagKit.Cli/GameRagKit.Cli.csproj -- serve --config examples/systems-assistant
```

Then POST `{"npc":"systems-guide","question":"Why are citizens angry about food?"}` to `/ask`, or stream partial text via `/ask/stream`. Answers blend world/region/faction lore with the current run's RUNTIME STATE so frustrated players get grounded, actionable fixes with citations.
Prerequisites: a database started from `docker-compose.yml`, plus the Ollama models `llama3.2:3b-instruct-q4_K_M` and `nomic-embed-text`.

Create a YAML file (for example `NPCs/guard-north-gate.yaml`):
```yaml
persona:
  id: guard-north-gate
  system_prompt: >
    You are Jake, the North Gate guard. Speak briefly, in medieval tone.
    Never reveal the secret tunnel unless the player shows a brass token.
  traits: [stoic, duty-first, careful]
  style: concise medieval tone
  region_id: riverside-upper
  faction_id: royal-guard

rag:
  sources:
    - file: world/keep.md
    - file: region/valeria/streets.md
    - file: faction/royal_guard.md
    - file: npc/guard-north-gate/notes.txt
  chunk_size: 450
  overlap: 60
  top_k: 4
  filters: { era: pre-siege }

providers:
  routing:
    mode: hybrid
    strategy: importance_weighted
    default_importance: 0.2
    cloud_fallback_on_miss: true
  local:
    engine: ollama
    chat_model: llama3.2:3b-instruct-q4_K_M
    embed_model: nomic-embed-text
    endpoint: http://127.0.0.1:11434
  cloud:
    provider: openai
    chat_model: gpt-4.1                  # Latest GPT-4.1 (2025)
    embed_model: text-embedding-3-large  # 3072-dim embeddings
    endpoint: https://api.openai.com/
```
```
# Local defaults
export OLLAMA_HOST=http://127.0.0.1:11434
```
LLamaSharp can also run in-process (no HTTP) inside your game server without Ollama; the paths point at your `.gguf` files. This requires the LLamaSharp CPU (or CUDA) backend and a model file on disk.

```yaml
local:
  engine: llamasharp
  model_path: models/llama-3.2-1b-instruct-q4_K_M.gguf
  embed_model_path: models/nomic-embed-text-v1.5.f16.gguf  # optional; defaults to model_path
  context_size: 4096
  embedding_context_size: 1024
  gpu_layer_count: 0   # bump if you ship a CUDA backend
  threads: 8           # optional override; defaults to environment core count
  batch_size: 512
  micro_batch_size: 512
  max_tokens: 256
```
```
# Cloud defaults
export PROVIDER=openai
export API_KEY=sk-...
export ENDPOINT=https://api.openai.com/
```
```
dotnet run --project src/GameRagKit.Cli/GameRagKit.Cli.csproj -- ingest NPCs
```
The CLI chunks the lore, generates embeddings (preferring local providers when configured), and saves tiered indexes under .gamerag/ next to the YAML file.
```
dotnet run --project src/GameRagKit.Cli/GameRagKit.Cli.csproj -- chat --npc NPCs/guard-north-gate.yaml --question "Where is the master key?"
```
Or start an interactive shell by omitting `--question`.
```
dotnet run --project src/GameRagKit.Cli/GameRagKit.Cli.csproj -- serve --config NPCs --port 5280
```
Send `POST /ask` with:

```json
{
  "npc": "guard-north-gate",
  "question": "Where is the master key?"
}
```
You receive:

```json
{
  "answer": "The master keeps his key close. Present the brass token and I may tell you more.",
  "sources": ["npc:guard-north-gate/notes.txt#0", "region:valeria/streets.md#2"],
  "scores": [0.82, 0.74],
  "fromCloud": false
}
```
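You can call the endpoint from C# as well. This is a minimal sketch, assuming the server from `gamerag serve` is listening on port 5280; the `AskReply` record is defined here for illustration and is not part of the library.

```csharp
// A sketch, not part of GameRagKit: call the serve endpoint from C#.
// Assumes `gamerag serve` is listening on http://localhost:5280.
using System.Net.Http.Json;

using var http = new HttpClient { BaseAddress = new Uri("http://localhost:5280") };

var response = await http.PostAsJsonAsync("/ask",
    new { npc = "guard-north-gate", question = "Where is the master key?" });
response.EnsureSuccessStatusCode();

// Mirrors the response shape shown above; `fromCloud` tells you which route answered.
var reply = await response.Content.ReadFromJsonAsync<AskReply>();
Console.WriteLine($"{reply!.Answer} (fromCloud: {reply.FromCloud})");

// Illustrative shape for deserialization only.
record AskReply(string Answer, string[] Sources, double[] Scores, bool FromCloud);
```

`PostAsJsonAsync`/`ReadFromJsonAsync` use web serializer defaults, so the lowercase JSON field names map onto the record properties case-insensitively.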
Set `SERVICE_API_KEY` (or `SERVICE_BEARER_TOKEN`) to require `X-API-Key` or `Authorization: Bearer` on incoming requests. When set, `/ask`, `/ask/stream`, and `/ingest` require credentials while `/health` and `/metrics` stay public unless overridden via `SERVICE_AUTH_ALLOW`.

`GET /metrics` exposes Prometheus-compatible counters for ask/stream/ingest calls. Combine with `app.UseHttpMetrics()` (already enabled) to scrape latency and status labels.

Load an NPC from C#:

```csharp
var npc = await GameRAGKit.Load("NPCs/guard-north-gate.yaml");
npc.UseEnv(); // applies PROVIDER/API_KEY/ENDPOINT/OLLAMA_HOST if set
await npc.EnsureIndexAsync();
var reply = await npc.AskAsync(
    "Where is the master key?",
    new AskOptions(Importance: 0.8, State: "RunState: gate closed, patrol shift B waiting"));
SubtitleUI.Show(reply.Text);
```
`NpcAgent` also supports:

- `StreamAsync` for token-by-token streaming (HTTP `/ask/stream` mirrors this via SSE).
- `RememberAsync` to append runtime memories into the NPC-specific index.
- `WriteSnapshot(key, state, ttl)` to inject short-lived run state (e.g., supply or morale numbers) without persisting it.

Run `gamerag serve`, then POST to `/ask`. The response echoes whether the chat was served by the local model or a cloud provider. Use `importance` in the payload to nudge the router:
```json
{
  "npc": "guard-north-gate",
  "question": "Reveal the hidden tunnel, Jake.",
  "importance": 0.9
}
```
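In-engine, the same importance nudge can be combined with streaming. This is only a sketch: `StreamAsync` is named in this README but its signature is not shown, so the `IAsyncEnumerable<string>` shape and the `SubtitleUI.Append` hook are assumptions to verify against the actual API.

```csharp
var npc = await GameRAGKit.Load("NPCs/guard-north-gate.yaml");
npc.UseEnv();
await npc.EnsureIndexAsync();

// Assumed shape: StreamAsync yields partial text as tokens arrive;
// only the method name comes from this README, not the signature.
await foreach (var token in npc.StreamAsync(
    "Reveal the hidden tunnel, Jake.",
    new AskOptions(Importance: 0.9))) // high importance nudges the router toward cloud
{
    SubtitleUI.Append(token); // hypothetical UI hook for partial subtitles
}
```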
GameRAGKit saves embeddings per tier so thousands of NPCs can share world/region/faction lore without duplicating vectors:
- `world/` – global canon, timelines, items
- `region/{id}/` – towns, maps, local history
- `faction/{id}/` – politics, ranks, relationships
- `npc/{id}/memory/` – per-NPC evolving notes (managed by `RememberAsync`)

`AskAsync` retrieves a blend of chunks (2 world, 1 region, 1 faction, NPC + memory) and merges them by cosine similarity, so designers can simply drop markdown/text files into the appropriate folders.
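Writing into the per-NPC memory tier and injecting transient state might look like this. A sketch only: `RememberAsync` and `WriteSnapshot(key, state, ttl)` are named in this README, but the exact parameter types (a plain string memory, a `TimeSpan` TTL) are assumptions.

```csharp
// Persist a new fact into npc/{id}/memory/ so the next AskAsync can retrieve it.
// Assumed: RememberAsync takes the memory text as a string.
await npc.RememberAsync("The player showed a brass token at dusk.");

// Inject short-lived run state with a TTL; per the README it is not written
// to the persistent index. Assumed: the ttl parameter is a TimeSpan.
npc.WriteSnapshot(
    "gate-status",
    "RunState: gate closed, patrol shift B waiting",
    TimeSpan.FromMinutes(5));
```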
Routing rules combine config defaults with per-question overrides:
- `mode` = `local_only`, `cloud_only`, or `hybrid` (default)
- Cloud handles questions with `importance >= 0.5` or when forced via `AskOptions`
- When a question omits `importance`, the persona's `default_importance` (falling back to the routing default) is used
- Local retrieval misses can escalate to the cloud (`cloud_fallback_on_miss`)
- `RememberAsync` writes memory chunks instantly so subsequent questions can retrieve them without re-ingesting

| Command | Description |
|---|---|
| `gamerag ingest <dir> [--clean]` | Rebuild indexes for every `.yaml` file in the directory (recursively). |
| `gamerag chat --npc <file> [--question <text>]` | Quick smoke test for designers/writers. |
| `gamerag serve --config <dir> [--port <n>]` | Launch a tiny HTTP service (`POST /ask`). |
| `gamerag pack <dir> [--output <file>]` | Produce a deployable bundle (configs + lore + `.gamerag` indexes). |
PolyForm Noncommercial 1.0.0 (community use) + commercial license – see LICENSE.md.
Commercial licensing: email towsif.kuet.ac.bd@gmail.com (we keep terms lightweight and primarily ask for recognition/attribution).