6 posts tagged with "incident-report"

Incident Report: Cache Eviction Closes In-Use httpx Clients

February 27, 2026

Ryan Crabbe

Performance Engineer, LiteLLM

Ishaan Jaff

CTO, LiteLLM

Krrish Dholakia

CEO, LiteLLM

Date: February 27, 2026
Duration: ~6 days (Feb 21 merge -> Feb 27 fix)
Severity: High
Status: Resolved

Note: This fix is available in LiteLLM v1.81.14.rc.2 and later.

Summary

A change intended to improve Redis connection pool cleanup introduced a regression that closed httpx clients the proxy was still using. The LLMClientCache (an in-memory TTL cache) stores both Redis clients and httpx clients under the same eviction policy. When a cache entry expired or was evicted, the new cleanup code called aclose()/close() on the evicted value. That was correct for Redis clients, but it destroyed httpx clients that other parts of the system still held references to and were actively using for LLM API calls.

Impact: Any proxy instance that hit the cache TTL (default 10 minutes) or capacity limit (200 entries) would have its httpx clients closed out from under it, causing requests to LLM providers to fail with connection errors.

Background

LLMClientCache extends InMemoryCache and is used to cache SDK clients (OpenAI, Anthropic, etc.) to avoid re-creating them on every request. These clients are keyed by configuration + event loop ID. The cache has:

Max size: 200 entries
Default TTL: 10 minutes

When the cache is full or entries expire, InMemoryCache.evict_cache() calls _remove_key() to drop entries.

The cached values are a mix of:

Redis/async Redis clients — owned exclusively by the cache, safe to close on eviction
httpx-backed SDK clients (OpenAI, Anthropic, etc.) — shared references, still in use by router/model instances
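The TTL and capacity eviction described above can be sketched with a small stand-in cache. This is not LiteLLM's InMemoryCache: the class name `TinyTTLCache`, the injectable `now` clock, the `set_cache`/`get_cache` method names, and the "closest to expiry goes first" eviction order are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional


class TinyTTLCache:
    """Hypothetical stand-in for a TTL + max-size in-memory cache."""

    def __init__(self, max_size: int = 200, default_ttl: float = 600.0,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.now = now  # injectable clock so the sketch is easy to test
        self.cache_dict: Dict[str, Any] = {}
        self.ttl_dict: Dict[str, float] = {}

    def _remove_key(self, key: str) -> None:
        # Dropping the entry only removes the cache's own reference.
        self.cache_dict.pop(key, None)
        self.ttl_dict.pop(key, None)

    def evict_cache(self) -> None:
        # 1) Drop entries whose TTL has passed.
        current = self.now()
        for key in [k for k, exp in self.ttl_dict.items() if exp <= current]:
            self._remove_key(key)
        # 2) If still at capacity, drop the entries closest to expiry.
        while self.cache_dict and len(self.cache_dict) >= self.max_size:
            oldest = min(self.ttl_dict, key=self.ttl_dict.get)
            self._remove_key(oldest)

    def set_cache(self, key: str, value: Any) -> None:
        if len(self.cache_dict) >= self.max_size:
            self.evict_cache()
        self.cache_dict[key] = value
        self.ttl_dict[key] = self.now() + self.default_ttl

    def get_cache(self, key: str) -> Optional[Any]:
        expiry = self.ttl_dict.get(key)
        if expiry is not None and expiry <= self.now():
            self._remove_key(key)  # lazy TTL eviction on read
            return None
        return self.cache_dict.get(key)
```

In this sketch every eviction path funnels through `_remove_key()`, which is why overriding that one method changes what happens on both TTL expiry and capacity eviction.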

Root Cause

PR #21717 overrode _remove_key() in LLMClientCache to close async clients on eviction:

Problematic code added in PR #21717

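import asyncio

class LLMClientCache(InMemoryCache):
    def _remove_key(self, key: str) -> None:
        # Grab the value before the parent class drops it from the cache.
        value = self.cache_dict.get(key)
        super()._remove_key(key)
        if value is not None:
            # Matches any object exposing aclose()/close(): Redis clients
            # and httpx-backed SDK clients alike.
            close_fn = getattr(value, "aclose", None) or getattr(value, "close", None)
            if close_fn and asyncio.iscoroutinefunction(close_fn):
                try:
                    # Schedules the close on the running loop; nothing checks
                    # whether the client is still in use elsewhere.
                    asyncio.get_running_loop().create_task(close_fn())
                except RuntimeError:
                    # No running event loop: the close is skipped.
                    pass
            elif close_fn and callable(close_fn):
                try:
                    close_fn()
                except Exception:
                    pass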

The intent was correct for Redis clients: prevent connection pool leaks when cached Redis clients expire. But LLMClientCache also stores httpx-backed SDK clients (e.g., AsyncOpenAI, AsyncAnthropic). These clients:

Have an aclose() method (inherited from httpx)
Are still held by references elsewhere in the codebase (router, model instances)
Were being closed without any check on whether they were still in use

So when the cache evicted an entry, it called aclose() on an httpx client that was still serving active LLM requests. The client's transport was closed, and subsequent requests through it failed with connection errors.
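The failure mode can be reproduced in miniature with a stand-in client. `FakeAsyncClient` and `evict_and_close` below are hypothetical names, not LiteLLM or httpx APIs, and the sketch awaits the close directly where the original code scheduled it as a task. The point is only that closing an object through one reference closes it for every other holder.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an httpx-backed SDK client."""

    def __init__(self) -> None:
        self.is_closed = False

    async def aclose(self) -> None:
        self.is_closed = True

    async def request(self, prompt: str) -> str:
        if self.is_closed:
            raise RuntimeError("client has been closed")
        return f"response to {prompt!r}"


async def evict_and_close(cache: dict, key: str) -> None:
    """Mimics the regression: close whatever the cache evicts."""
    value = cache.pop(key, None)
    close_fn = getattr(value, "aclose", None)
    if close_fn is not None:
        await close_fn()


async def demo() -> str:
    client = FakeAsyncClient()
    cache = {"openai-client": client}  # the cache's reference
    router_client = client             # a second holder, e.g. a router

    await router_client.request("before eviction")  # succeeds
    await evict_and_close(cache, "openai-client")

    try:
        await router_client.request("after eviction")
    except RuntimeError as exc:
        return str(exc)
    return "no error"


result = asyncio.run(demo())
print(result)
```

The router never asked for its client to be closed. It simply shares the object with the cache, so the cache's cleanup breaks it.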

The Fix

PR #22247 removed the _remove_key override entirely:

The fix (PR #22247)

 class LLMClientCache(InMemoryCache):
-    def _remove_key(self, key: str) -> None:
-        """Close async clients before evicting them to prevent connection pool leaks."""
-        value = self.cache_dict.get(key)
-        super()._remove_key(key)
-        if value is not None:
-            close_fn = getattr(value, "aclose", None) or getattr(
-                value, "close", None
-            )
-            ...
-
     def update_cache_key_with_event_loop(self, key):

The eviction now simply drops the reference and lets Python's GC handle cleanup, which is safe because:

httpx clients that are still referenced elsewhere stay alive
Unreferenced clients get cleaned up by GC naturally
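The two cases above can be observed with weak references. This is a minimal sketch using a placeholder `FakeClient` class (an assumption, not a real SDK client), and it relies on CPython's reference-counting behavior.

```python
import gc
import weakref


class FakeClient:
    """Hypothetical stand-in for a cached SDK client."""


cache = {}
shared = FakeClient()
cache["shared"] = shared        # also held by `shared`, e.g. a router
cache["orphan"] = FakeClient()  # held only by the cache

shared_ref = weakref.ref(cache["shared"])
orphan_ref = weakref.ref(cache["orphan"])

# Evict both entries by dropping only the cache's references.
del cache["shared"]
del cache["orphan"]
gc.collect()  # not needed in CPython for acyclic objects; explicit for clarity

shared_alive = shared_ref() is not None  # still referenced elsewhere
orphan_alive = orphan_ref() is not None  # no references left
print(shared_alive, orphan_alive)
```

The sketch shows only object lifetime. How a real client releases its sockets once it is collected depends on the client library.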

The other improvements from PR #21717 were kept:

max_connections is now respected for URL-based Redis configs (it was previously dropped silently)
disconnect() now closes both sync and async Redis clients (the sync client was previously leaked)
Connection pool passthrough: when a pool is provided with a URL config, it is used directly instead of creating a duplicate

Remediation

Action	Status	Code
Remove `_remove_key` override that closes shared clients on eviction	✅ Done	PR #22247
Add regression test: httpx client survives capacity eviction	✅ Done	PR #22306
Add regression test: httpx client survives TTL eviction	✅ Done	PR #22306

Incident Report: Wildcard Blocking New Models After Cost Map Reload

February 23, 2026

Sameer Kankute

SWE @ LiteLLM (LLM Translation)

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaff

CTO, LiteLLM

Date: Feb 23, 2026
Duration: ~3 hours
Severity: High (for users with provider wildcard access rules)
Status: Resolved

Summary

When a new Anthropic model (e.g. claude-sonnet-4-6) was added to the LiteLLM model cost map and a cost map reload was triggered, requests to the new model were rejected with:

key not allowed to access model. This key can only access models=['anthropic/*']. Tried to access claude-sonnet-4-6.

The reload updated litellm.model_cost correctly but never re-ran add_known_models(), so litellm.anthropic_models (the in-memory set used by the wildcard resolver) remained stale. The new model was invisible to the anthropic/* wildcard even though the cost map knew about it.

LLM calls: All requests to newly-added Anthropic models were blocked with a 401.
Existing models: Unaffected — only models missing from the stale provider set were impacted.
Other providers: Same bug class existed for any provider wildcard (e.g. openai/*, gemini/*).
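A simplified model of the stale-set problem is sketched below. The module-level `model_cost` and `anthropic_models` are stand-ins for the LiteLLM attributes of the same name. `key_can_access`, `reload_cost_map`, the `refresh_sets` flag, and the `claude-old` entry are hypothetical and exist only to contrast a reload that skips the rebuild with one that performs it.

```python
from typing import Dict, Set

# Stand-ins for litellm.model_cost and litellm.anthropic_models.
model_cost: Dict[str, dict] = {
    "claude-old": {"litellm_provider": "anthropic"},  # placeholder model name
}
anthropic_models: Set[str] = set()


def add_known_models() -> None:
    """Rebuild the provider set from the current cost map."""
    anthropic_models.clear()
    for name, info in model_cost.items():
        if info.get("litellm_provider") == "anthropic":
            anthropic_models.add(name)


def key_can_access(model: str, allowed: str = "anthropic/*") -> bool:
    """Resolve a provider wildcard against the in-memory provider set."""
    return allowed == "anthropic/*" and model in anthropic_models


def reload_cost_map(new_map: Dict[str, dict], refresh_sets: bool) -> None:
    model_cost.clear()
    model_cost.update(new_map)
    if refresh_sets:  # the step the buggy reload skipped
        add_known_models()


add_known_models()  # startup: set is built once
new_map = {**model_cost, "claude-sonnet-4-6": {"litellm_provider": "anthropic"}}

reload_cost_map(new_map, refresh_sets=False)
blocked = not key_can_access("claude-sonnet-4-6")  # cost map knows it, set does not

reload_cost_map(new_map, refresh_sets=True)
allowed_after_fix = key_can_access("claude-sonnet-4-6")
print(blocked, allowed_after_fix)
```

Any derived index, such as a per-provider model set, has to be rebuilt whenever its source of truth is reloaded.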

Incident Report: SERVER_ROOT_PATH regression broke UI routing

February 21, 2026

Yuneng Jiang

SWE @ LiteLLM (Full Stack)

Ishaan Jaff

CTO, LiteLLM

Krrish Dholakia

CEO, LiteLLM

Date: January 22, 2026
Duration: ~4 days (until fix merged January 26, 2026)
Severity: High
Status: Resolved

Note: This fix is available in LiteLLM v1.81.3.rc.6 and later.

Summary

A PR (#19467) accidentally removed the root_path=server_root_path parameter from the FastAPI app initialization in proxy_server.py. This caused the proxy to ignore the SERVER_ROOT_PATH environment variable when serving the UI. Users who deploy LiteLLM behind a reverse proxy with a path prefix (e.g., /api/v1 or /llmproxy) found that all UI pages returned 404 Not Found.

LLM API calls: No impact. API routing was unaffected.
UI pages: All UI pages returned 404 for deployments using SERVER_ROOT_PATH.
Swagger/OpenAPI docs: Broken when accessed through the configured root path.

Incident Report: vLLM Embeddings Broken by encoding_format Parameter

February 18, 2026

Sameer Kankute

SWE @ LiteLLM (LLM Translation)

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaff

CTO, LiteLLM

Date: Feb 16, 2026
Duration: ~3 hours
Severity: High (for vLLM embedding users)
Status: Resolved

Summary

A commit (dbcae4a) intended to fix OpenAI SDK behavior broke vLLM embeddings by explicitly passing encoding_format=None in API requests. vLLM rejects this with the error "unknown variant ``, expected float or base64".

vLLM embedding calls: Complete failure - all requests rejected
Other providers: No impact - OpenAI and other providers functioned normally
Other vLLM functionality: No impact - only embeddings were affected
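The difference between sending a null parameter and omitting it is visible in the serialized request body. The helper below, `build_embedding_payload`, is a hypothetical illustration of dropping unset optional parameters before serialization. It is not the actual LiteLLM fix.

```python
import json
from typing import Any, Dict, Optional


def build_embedding_payload(model: str, text: str,
                            encoding_format: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body, dropping optional params that are unset."""
    optional = {"encoding_format": encoding_format}
    payload: Dict[str, Any] = {"model": model, "input": text}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Passing None explicitly serializes the key with a JSON null.
explicit_none = json.dumps({"model": "m", "input": "hi", "encoding_format": None})
# Filtering unset values leaves the key out of the body entirely.
omitted = json.dumps(build_embedding_payload("m", "hi"))

print(explicit_none)
print(omitted)
```

A server that validates encoding_format strictly can accept the second body and reject the first, even though both mean "not specified" to the caller.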

Incident Report: Invalid beta headers with Claude Code

February 16, 2026

Sameer Kankute

SWE @ LiteLLM (LLM Translation)

Ishaan Jaff

CTO, LiteLLM

Krrish Dholakia

CEO, LiteLLM

Date: February 13, 2026
Duration: ~3 hours
Severity: High
Status: Resolved

Note: This fix will be available in LiteLLM v1.81.13-nightly and later.

Summary

Claude Code began sending unsupported Anthropic beta headers to non-Anthropic providers (Bedrock, Azure AI, Vertex AI), causing invalid beta flag errors. LiteLLM was forwarding all beta headers without provider-specific validation. Users experienced request failures when routing Claude Code requests through LiteLLM to these providers.

LLM calls to Anthropic: No impact.
LLM calls to Bedrock/Azure/Vertex: Failed with invalid beta flag errors when unsupported headers were present.
Cost tracking and routing: No impact.
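Provider-specific validation of beta headers could look roughly like the sketch below. The `SUPPORTED_BETAS` allowlist, its provider keys, and the flag names `flag-a` and `flag-b` are placeholders invented for this example. They are not real beta flags, and this is not LiteLLM's implementation.

```python
from typing import Dict, Set

# Hypothetical allowlist; the flag names are placeholders, not real beta flags.
SUPPORTED_BETAS: Dict[str, Set[str]] = {
    "bedrock": {"flag-a"},
    "vertex_ai": {"flag-a", "flag-b"},
}


def filter_beta_header(headers: Dict[str, str], provider: str) -> Dict[str, str]:
    """Return a copy of headers with unsupported anthropic-beta flags removed."""
    if provider == "anthropic" or "anthropic-beta" not in headers:
        return dict(headers)  # forward unchanged
    allowed = SUPPORTED_BETAS.get(provider, set())
    requested = [flag.strip() for flag in headers["anthropic-beta"].split(",")]
    kept = [flag for flag in requested if flag in allowed]
    filtered = {k: v for k, v in headers.items() if k != "anthropic-beta"}
    if kept:  # drop the header entirely if nothing survives
        filtered["anthropic-beta"] = ",".join(kept)
    return filtered


incoming = {"anthropic-beta": "flag-a, flag-c", "x-request-id": "1"}
print(filter_beta_header(incoming, "bedrock"))
print(filter_beta_header(incoming, "azure_ai"))
```

In this sketch a provider with no allowlist entry has the header removed entirely, which is a conservative default chosen for the example.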

Incident Report: Invalid model cost map on main

February 10, 2026

Ishaan Jaffer

CTO, LiteLLM

Date: January 27, 2026
Duration: ~20 minutes
Severity: Low
Status: Resolved

Summary

A malformed JSON entry in model_prices_and_context_window.json was merged to main (562f0a0). This caused LiteLLM to silently fall back to a stale local copy of the model cost map. Users on older package versions lost cost tracking for newer models only (e.g. azure/gpt-5.2). No LLM calls were blocked.

LLM calls and proxy routing: No impact.
Cost tracking: Impacted for newer models not present in the local backup. Older models were unaffected. The incident lasted ~20 minutes until the commit was reverted.
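The fallback behavior can be sketched as follows. `load_cost_map` and its arguments are hypothetical names. Unlike the silent fallback described above, this version logs a warning, which is one possible way to make such a failure visible.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("cost_map_sketch")


def load_cost_map(remote_text: str, local_backup: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the fetched cost map; fall back to the bundled copy if it is invalid."""
    try:
        parsed = json.loads(remote_text)
    except json.JSONDecodeError as exc:
        logger.warning("cost map is not valid JSON (%s); using local backup", exc)
        return dict(local_backup)
    if not isinstance(parsed, dict):
        logger.warning("cost map is not a JSON object; using local backup")
        return dict(local_backup)
    return parsed


backup = {"older-model": {"input_cost_per_token": 0.0}}  # placeholder entry
malformed = '{"azure/gpt-5.2": {"input_cost_per_token": 0.0}'  # missing closing brace

cost_map = load_cost_map(malformed, backup)
# Models that exist only in the remote map have no pricing after the fallback.
has_new_model = "azure/gpt-5.2" in cost_map
print(has_new_model)
```

An older package ships an older backup, so the set of models missing after a fallback grows with the age of the installed version.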
