SHAHRIAR LABS: Intelligence in Motion
    Architecture | June 5, 2026

    Zero-Downtime LLMs: Building High-Availability AI

    An architectural deep-dive into how freelm uses circuit breakers and interleaved failover to guarantee uptime.

    Building an AI wrapper is easy. Keeping it online is hard. High availability requires treating LLM APIs as unreliable dependencies.

    The "Always-Up" Architecture

    At Shahriar Labs, we engineered freelm with an obsessive focus on fault tolerance:

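    • Circuit Breakers: Every API key has its own breaker. A 5xx response or a timeout opens the breaker so we stop hammering a dead key. After a cooldown it half-opens and lets a trial request through.
    • Interleaved Failover: We try the best model from every provider before falling back to the second-best model of any provider. This keeps failover fast and avoids downgrading model quality while any provider's top model is still reachable.
    • Token Bucket Pacing: We track requests per minute locally and pace outgoing calls so they stay under the limit rather than triggering 429 responses.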
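
    To make the breaker lifecycle concrete, here is a minimal sketch of a per-key circuit breaker. It is not freelm's actual implementation: the class name, the method names, and the 30-second cooldown are all assumptions chosen for illustration. The clock is injectable so the state transitions can be exercised without real waiting.

```python
import time


class CircuitBreaker:
    """Illustrative per-key breaker (hypothetical, not freelm's real class).

    closed    -> requests flow normally
    open      -> requests are refused until the cooldown has elapsed
    half-open -> exactly one trial request is allowed through
    """

    def __init__(self, cooldown_s=30.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s   # assumed value, not a freelm default
        self._clock = clock            # injectable so the sketch is testable
        self._opened_at = None         # None means the breaker is closed
        self._probing = False          # True while a half-open trial is in flight

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half-open"
        return "open"

    def allow(self):
        """Return True if a request may be sent with this key right now."""
        state = self.state
        if state == "closed":
            return True
        if state == "half-open" and not self._probing:
            self._probing = True       # let a single trial request through
            return True
        return False

    def record_success(self):
        """A call succeeded: close the breaker."""
        self._opened_at = None
        self._probing = False

    def record_failure(self):
        """A 5xx response or timeout occurred: open and restart the cooldown."""
        self._opened_at = self._clock()
        self._probing = False
```

    A caller would check allow() before using a key, then report the outcome with record_success() or record_failure(). A failed trial request in the half-open state simply reopens the breaker and restarts the cooldown.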
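
    The interleaved ordering can be sketched as a small generator. This is one possible reading of the rule above, not freelm's routing code: the function name, the provider names, and the model names are hypothetical, and each provider's list is assumed to be sorted best-first.

```python
from itertools import zip_longest


def interleaved_order(models_by_provider):
    """Yield (provider, model) pairs rank by rank: every provider's best
    model first, then every provider's second-best model, and so on."""
    missing = object()  # sentinel for providers that have run out of models
    providers = list(models_by_provider)
    ranked_lists = [models_by_provider[p] for p in providers]
    for tier in zip_longest(*ranked_lists, fillvalue=missing):
        for provider, model in zip(providers, tier):
            if model is not missing:
                yield provider, model


# Hypothetical catalog; each list is ordered best-first.
catalog = {
    "provider_a": ["a-large", "a-small"],
    "provider_b": ["b-large"],
    "provider_c": ["c-large", "c-medium", "c-small"],
}

for provider, model in interleaved_order(catalog):
    print(provider, model)
```

    With this catalog, all three "large" models are attempted before any smaller one, so an outage at one provider does not push the request onto a weaker model while another provider's top model is still available.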

    Health Introspection

    Because reliability is the whole point, freelm exposes live state so you can see exactly why the router picked a specific path:

    # llm is an already-constructed freelm client
    for row in llm.health():
        print(row)  # ready status, breaker state, requests used, and latency

    Frequently Asked Questions (FAQ)

    Q: Is AsyncFreeLLM thread-safe?
    A: Yes, in the asyncio sense: a single instance can be shared safely by many concurrent tasks on one event loop.

    Q: What if every key is rate-limited?
    A: Set wait=True to sleep briefly until a key frees up. max_wait bounds how long the call is allowed to wait.
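
    As a rough illustration of how local requests-per-minute pacing and a bounded wait can fit together, here is a synchronous token bucket sketch. Only the names wait and max_wait come from the answer above. The class, its methods, and the 5-second default are assumptions, and an async client would presumably use asyncio.sleep rather than time.sleep.

```python
import time


class TokenBucket:
    """Illustrative local requests-per-minute pacing for one API key."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.capacity = float(rpm)
        self.refill_per_s = rpm / 60.0   # tokens regained per second
        self._tokens = float(rpm)        # start with a full bucket
        self._clock = clock              # injectable for testing
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        now = self._clock()
        gained = (now - self._last) * self.refill_per_s
        self._tokens = min(self.capacity, self._tokens + gained)
        self._last = now

    def acquire(self, wait=False, max_wait=5.0):
        """Take one token. Returns True on success, or False if the key is
        rate-limited and waiting is disabled or would exceed max_wait."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        if not wait:
            return False
        needed_s = (1.0 - self._tokens) / self.refill_per_s
        if needed_s > max_wait:
            return False                 # waiting would exceed the bound
        self._sleep(needed_s)
        self._refill()
        # Clamp at zero to absorb floating-point rounding in the refill.
        self._tokens = max(0.0, self._tokens - 1.0)
        return True


bucket = TokenBucket(rpm=2)
print(bucket.acquire())                         # a token is available
print(bucket.acquire())                         # second and last token
print(bucket.acquire(wait=True, max_wait=1.0))  # refill needs ~30 s, so it gives up
```

    Because the bucket refuses locally once it is empty, the request is never sent and the provider has no reason to return a 429. Returning False instead of sleeping past max_wait lets the caller fail over to another key.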

    Summary

    Stop apologizing for API outages. Use freelm to build resilient, enterprise-grade AI architecture.