Azure OpenAI and Azure AI Foundry integration for TokenRateGate. Eliminates HTTP 429 rate limit errors with intelligent TPM/RPM management, managed identity support, and multi-deployment capabilities.
$ dotnet add package TokenRateGate.Azure

Stop getting HTTP 429 "Rate limit exceeded" errors from Azure OpenAI, OpenAI, and Anthropic Claude APIs.
TokenRateGate is a .NET library that prevents rate limit errors by intelligently managing your token and request budgets. It tracks both TPM (Tokens-Per-Minute) and RPM (Requests-Per-Minute) limits, automatically queues requests when capacity is full, and ensures you never hit the dreaded 429 error again.
Getting this error when calling LLM APIs?
HTTP 429: Too Many Requests
Rate limit is exceeded. Try again in X seconds.
This happens when you exceed your API provider's TPM (tokens-per-minute) or RPM (requests-per-minute) limits.
TokenRateGate prevents these errors by managing your token budget and queueing requests before they hit the API.
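The core flow is reserve, call, record. A minimal sketch of that pattern (`ReserveTokensAsync` and `RecordActualUsage` are the library API used throughout this README; `CallLlmApiAsync` is a hypothetical stand-in for your provider call):

```csharp
// Sketch: reserve capacity, call the API, then reconcile with actual usage.
public async Task<string> AskAsync(ITokenRateGate rateGate, string prompt)
{
    // Reserve an estimated budget before the call. If the window is full,
    // this awaits until capacity frees up instead of letting the API return 429.
    await using var reservation = await rateGate.ReserveTokensAsync(
        estimatedInputTokens: 500,
        estimatedOutputTokens: 1000);

    var response = await CallLlmApiAsync(prompt); // your provider call

    // Reconcile the reservation with what the API actually consumed.
    reservation.RecordActualUsage(response.Usage.TotalTokens);
    return response.Content;
}
```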
For OpenAI or Azure OpenAI users:
dotnet add package TokenRateGate
Includes everything: base engine, DI support, OpenAI integration, and Azure integration.
For Anthropic Claude, Google Gemini, or custom APIs:
dotnet add package TokenRateGate.Base
Includes: Core engine, DI support, character-based token estimation. Use when: Building custom integrations without OpenAI/Azure SDKs.
# Base + OpenAI only
dotnet add package TokenRateGate.Base
dotnet add package TokenRateGate.OpenAI
# Base + Azure only
dotnet add package TokenRateGate.Base
dotnet add package TokenRateGate.Azure
The easiest way to use TokenRateGate is with dependency injection:
using TokenRateGate.Extensions.DependencyInjection;
// In your Program.cs or Startup.cs
var builder = WebApplication.CreateBuilder(args);
// Register TokenRateGate with configuration
builder.Services.AddTokenRateGate(options =>
{
options.TokenLimit = 150000; // 150K tokens per minute (Azure Standard tier)
options.WindowSeconds = 60; // 60-second sliding window
options.SafetyBufferPercentage = 0.05; // 5% safety buffer (avoids hitting exact limit)
options.MaxConcurrentRequests = 10; // Limit concurrent API calls
options.MaxRequestsPerMinute = 100; // Optional: Also enforce RPM limit
});
var app = builder.Build();
Or bind from configuration:
// appsettings.json
{
"TokenRateGate": {
"TokenLimit": 500000,
"WindowSeconds": 60,
"MaxConcurrentRequests": 10
}
}
builder.Services.AddTokenRateGate(
builder.Configuration.GetSection("TokenRateGate"));
For APIs without built-in token estimation (Anthropic Claude, Google Gemini, custom LLMs):
using TokenRateGate.Core;
using TokenRateGate.Core.TokenEstimation;
using TokenRateGate.Abstractions;
public class CustomLlmService
{
private readonly ITokenRateGate _rateGate;
private readonly ITokenEstimator _tokenEstimator;
private readonly ILogger<CustomLlmService> _logger;
public CustomLlmService(ITokenRateGate rateGate, ILogger<CustomLlmService> logger)
{
_rateGate = rateGate;
_logger = logger;
// Use character-based estimation (4 chars ≈ 1 token for most LLMs)
_tokenEstimator = new CharacterBasedTokenEstimator();
}
public async Task<string> CallCustomLlmAsync(string prompt)
{
// Estimate tokens using character-based estimator
int estimatedInputTokens = _tokenEstimator.EstimateTokens(prompt);
int estimatedOutputTokens = 1000; // Your estimated response size
// Reserve capacity before calling the LLM
await using var reservation = await _rateGate.ReserveTokensAsync(
estimatedInputTokens,
estimatedOutputTokens);
_logger.LogInformation("Reserved {Tokens} tokens", reservation.ReservedTokens);
// Make your custom LLM API call
var response = await CallYourCustomApiAsync(prompt);
// Record actual usage from the response (IMPORTANT for accurate tracking)
var actualTotalTokens = response.Usage.TotalTokens;
reservation.RecordActualUsage(actualTotalTokens);
_logger.LogInformation("Actual usage: {Tokens} tokens", actualTotalTokens);
return response.Content;
}
}
Using CharacterBasedTokenEstimator:
// Default: 4 characters per token
var estimator = new CharacterBasedTokenEstimator();
int tokens = estimator.EstimateTokens("Hello world!"); // ≈ 3 tokens
// Custom ratio for different languages
var chineseEstimator = new CharacterBasedTokenEstimator(charactersPerToken: 2.0);
int chineseTokens = chineseEstimator.EstimateTokens("你好世界"); // Better for non-Latin scripts
// Estimate multiple texts
var messages = new[] { "System prompt", "User message", "Assistant response" };
int totalTokens = estimator.EstimateTokens(messages);
TokenRateGate integrates seamlessly with the OpenAI SDK.
Note: Logging is optional. The WithRateLimit() extension method accepts an optional ILoggerFactory parameter for diagnostics. If not provided, logging is disabled.
using System.Text;
using Microsoft.Extensions.Configuration;
using OpenAI.Chat;
using TokenRateGate.Abstractions;
using TokenRateGate.OpenAI;
public class ChatService
{
private readonly ITokenRateGate _rateGate;
private readonly string _apiKey;
public ChatService(
ITokenRateGate rateGate,
IConfiguration configuration)
{
_rateGate = rateGate;
_apiKey = configuration["OpenAI:ApiKey"];
}
public async Task<string> AskQuestionAsync(string question)
{
// Create OpenAI client and wrap with rate limiting
var client = new ChatClient("gpt-4", _apiKey);
var rateLimitedClient = client.WithRateLimit(_rateGate, "gpt-4");
// Make rate-limited API call - automatic token tracking!
var messages = new[] { new UserChatMessage(question) };
var response = await rateLimitedClient.CompleteChatAsync(messages);
return response.Content[0].Text;
}
public async Task<string> AskQuestionStreamingAsync(string question)
{
var client = new ChatClient("gpt-4", _apiKey);
var rateLimitedClient = client.WithRateLimit(_rateGate, "gpt-4");
var messages = new[] { new UserChatMessage(question) };
var result = new StringBuilder();
// Streaming support with automatic token tracking
await foreach (var chunk in rateLimitedClient.CompleteChatStreamingAsync(messages))
{
if (chunk.ContentUpdate.Count > 0)
{
var text = chunk.ContentUpdate[0].Text;
result.Append(text);
Console.Write(text);
}
}
return result.ToString();
}
}
Note: Logging is optional. The WithRateLimit() extension method accepts an optional ILoggerFactory parameter for diagnostics. If not provided, logging is disabled.
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;
using TokenRateGate.Abstractions;
using TokenRateGate.Azure;
public class AzureChatService
{
private readonly ITokenRateGate _rateGate;
public AzureChatService(ITokenRateGate rateGate)
{
_rateGate = rateGate;
}
public async Task<string> AskQuestionAsync(string question)
{
var azureClient = new AzureOpenAIClient(
new Uri("https://your-resource.openai.azure.com/"),
new AzureKeyCredential("your-api-key"));
// Wrap with rate limiting (deployment name + model name for token counting)
var rateLimitedClient = azureClient.WithRateLimit(
_rateGate,
deploymentName: "my-gpt4-deployment",
modelName: "gpt-4");
var messages = new[] { new UserChatMessage(question) };
var response = await rateLimitedClient.CompleteChatAsync(messages);
return response.Content[0].Text;
}
}
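Since the Azure package supports managed identity, the client can also be constructed with `Azure.Identity`'s `DefaultAzureCredential` instead of an API key. A sketch (the `WithRateLimit` call is unchanged; `Azure.Identity` is an assumed extra package reference):

```csharp
using Azure.AI.OpenAI;
using Azure.Identity; // requires the Azure.Identity NuGet package

// Authenticate with managed identity (or developer credentials locally)
// instead of an API key. The rate-limiting wrapper is applied the same way.
var azureClient = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com/"),
    new DefaultAzureCredential());

var rateLimitedClient = azureClient.WithRateLimit(
    _rateGate,
    deploymentName: "my-gpt4-deployment",
    modelName: "gpt-4");
```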
Support different rate limits for different users, models, or tenants:
// Registration in Program.cs
builder.Services.AddTokenRateGateFactory();
builder.Services.AddNamedTokenRateGate("basic-tier", options =>
{
options.TokenLimit = 100000; // 100K tokens/min for basic users
options.WindowSeconds = 60;
});
builder.Services.AddNamedTokenRateGate("premium-tier", options =>
{
options.TokenLimit = 1000000; // 1M tokens/min for premium users
options.WindowSeconds = 60;
});
// Usage in your service
public class MultiTenantChatService
{
private readonly ITokenRateGateFactory _factory;
public MultiTenantChatService(ITokenRateGateFactory factory)
{
_factory = factory;
}
public async Task<string> AskQuestionAsync(string question, string tier)
{
// Get rate gate for the tenant's tier
var rateGate = _factory.GetOrCreate(tier);
var client = new ChatClient("gpt-4", "your-api-key");
var rateLimitedClient = client.WithRateLimit(rateGate, "gpt-4");
var messages = new[] { new UserChatMessage(question) };
var response = await rateLimitedClient.CompleteChatAsync(messages);
return response.Content[0].Text;
}
}
You can also use TokenRateGate without dependency injection:
using Microsoft.Extensions.Logging;
using OpenAI.Chat;
using TokenRateGate.Core;
using TokenRateGate.Core.Options;
using TokenRateGate.OpenAI;
// Create rate gate manually
var options = new TokenRateGateOptions
{
TokenLimit = 500000,
WindowSeconds = 60,
MaxConcurrentRequests = 10
};
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder.AddConsole();
});
var rateGate = new TokenRateGate(options, loggerFactory);
// Use with OpenAI
var client = new ChatClient("gpt-4", "your-api-key");
var rateLimitedClient = client.WithRateLimit(rateGate, "gpt-4", loggerFactory);
var messages = new[] { new UserChatMessage("Hello!") };
var response = await rateLimitedClient.CompleteChatAsync(messages);
Console.WriteLine(response.Content[0].Text);
public class MonitoringService
{
private readonly ITokenRateGate _rateGate;
public MonitoringService(ITokenRateGate rateGate)
{
_rateGate = rateGate;
}
public void LogCurrentUsage()
{
var stats = _rateGate.GetUsageStats();
Console.WriteLine($"Current Usage: {stats.CurrentUsage}/{stats.EffectiveCapacity} tokens");
Console.WriteLine($"Reserved: {stats.ReservedTokens} tokens");
Console.WriteLine($"Available: {stats.AvailableTokens} tokens");
Console.WriteLine($"Usage: {stats.UsagePercentage:F1}%");
Console.WriteLine($"Near Capacity: {stats.IsNearCapacity}");
}
}
| Option | Default | Description |
|---|---|---|
| TokenLimit | 500000 | Maximum tokens per window (TPM limit) |
| WindowSeconds | 60 | Time window in seconds for token tracking |
| SafetyBufferPercentage | 0.05 (5%) | Percentage of TokenLimit reserved as a safety buffer<br>Effective limit = TokenLimit * (1 - SafetyBufferPercentage) |
| MaxConcurrentRequests | 1000 | Maximum concurrent active reservations |
| MaxRequestsPerMinute | null | Optional RPM limit (enforced in addition to the token limit)<br>If both are configured, whichever is more restrictive applies |
| RequestWindowSeconds | max(120, 2×WindowSeconds) | Time window in seconds for RPM tracking |
| MaxWaitTime | null (unlimited) | Maximum time to wait in the queue for capacity before timing out<br>Note: applies only to capacity-queue waiting, NOT semaphore waiting<br>Set to null for unlimited waiting (recommended for most use cases) |
| OutputEstimationStrategy | FixedMultiplier | How to estimate output tokens when not provided |
| OutputMultiplier | 0.5 | Multiplier for the FixedMultiplier strategy |
| DefaultOutputTokens | 1000 | Fixed output-token count for the FixedAmount strategy |
The FixedMultiplier strategy multiplies the input tokens by OutputMultiplier (default 0.5); the FixedAmount strategy reserves a flat DefaultOutputTokens (default 1000).

TokenRateGate uses a dual-component capacity system:
Current Capacity = Historical Usage + Active Reservations
A new request is admitted only when:

- (Historical Usage + Active Reservations + Requested Tokens) <= (TokenLimit - SafetyBuffer)
- Current Request Count < MaxRequestsPerMinute (when an RPM limit is configured)

After the API call, report actual tokens from the response via RecordActualUsage(). When the using block ends, the reservation is released and queued requests are processed.

Health checks can be registered alongside the gate:

builder.Services.AddHealthChecks()
    .AddTokenRateGate(name: "tokenrategate", tags: ["rate-limiting"]);
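A quick worked example of the capacity math using the defaults from the table above (plain arithmetic, not library API):

```csharp
// With TokenLimit = 500_000 and SafetyBufferPercentage = 0.05:
const int tokenLimit = 500_000;
const double safetyBuffer = 0.05;
int effectiveCapacity = (int)(tokenLimit * (1 - safetyBuffer)); // 475_000

// Suppose 400_000 tokens were consumed in the current window and 50_000
// are held by active reservations. A request asking for 30_000 more tokens:
int historicalUsage = 400_000;
int activeReservations = 50_000;
int requested = 30_000;

bool admitted =
    historicalUsage + activeReservations + requested <= effectiveCapacity;
// 480_000 > 475_000, so the request is queued until capacity frees up.
```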
// Configure estimation strategy
builder.Services.AddTokenRateGate(options =>
{
options.OutputEstimationStrategy = OutputEstimationStrategy.Conservative;
// Now reserves 2x input tokens (assumes output = input)
});
TokenRateGate provides detailed structured logging:
builder.Services.AddLogging(logging =>
{
logging.AddConsole();
logging.SetMinimumLevel(LogLevel.Debug); // See detailed token tracking
});
Check the samples/ directory for complete examples.
See tests/TokenRateGate.PerformanceTests for benchmarks.
# Run all tests
dotnet test
# Run specific test categories
dotnet test --filter "Category=Integration"
dotnet test --filter "Category=Performance"
The integration packages depend on:

- OpenAI NuGet package
- Azure.AI.OpenAI NuGet package

You don't need to install these individually; they're included as dependencies of the corresponding TokenRateGate packages.
Contributions are welcome! Please:
MIT License - Copyright © 2025 Marko Mrdja
See LICENSE for details.