SHAHRIAR LABS
Intelligence in Motion

    freelm Documentation

    freelm is a drop-in client for Python and Node.js that aggregates multiple free-tier LLM providers into one fault-tolerant, OpenAI-compatible API call. Use this guide to set up API keys, configure routing strategies, and deploy production-grade LLM applications without spending a dime.

    1. Installation

    Install the package via your preferred package manager. Both implementations provide identical API signatures.

    PYTHON (PyPI)

    pip install freelm

    NODE.JS / TYPESCRIPT (npm)

    npm install freelm

    2. Drop-in OpenAI Shim

    The fastest way to use freelm is through the OpenAI compatibility shim. You keep your existing `openai` client code and change only the import; freelm then routes the network requests through its failover engine.

    PYTHON EXAMPLE

    # Instead of: from openai import OpenAI
    from freelm.compat import OpenAI
    
    # Automatically loads available keys from .env
    client = OpenAI()
    
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "Explain quantum computing."}]
    )
    print(response.choices[0].message.content)

    NODE.JS EXAMPLE

    // Instead of: import OpenAI from "openai";
    import { OpenAI } from "freelm/compat";
    
    // Automatically loads available keys from process.env
    const client = new OpenAI();
    
    // Top-level await requires an ES module context (e.g. an .mjs file).
    const response = await client.chat.completions.create({
        model: "chat:fast",
        messages: [{ role: "user", content: "Explain quantum computing." }]
    });
    console.log(response.choices[0].message.content);
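    Under the hood, the shim hands each request to a failover engine: if one provider fails, the next one in the pool is tried. The sketch below illustrates that idea in plain Python. `ProviderError`, `complete_with_failover`, and the stub providers are hypothetical stand-ins, not part of freelm's API, and no network calls are made.

    FAILOVER SKETCH (PYTHON)

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error for rate limits, timeouts, or 5xx responses."""


def complete_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this provider failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real HTTP clients.
def rate_limited(prompt: str) -> str:
    raise ProviderError("429 Too Many Requests")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    pool = [("groq", rate_limited), ("cerebras", healthy)]
    print(complete_with_failover(pool, "Explain quantum computing."))
    # ('cerebras', 'echo: Explain quantum computing.')
```

    Only errors that signal a transient provider problem trigger failover in this sketch; any other exception propagates to the caller immediately.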

    3. Environment Variables

    freelm dynamically detects keys via environment variables. You do not need keys for every provider; freelm will automatically build a connection pool based on the keys it finds. You can supply multiple keys per provider by comma-separating them.

    Provider          API key variables (first match wins)   Tier override variable
    ----------------  ------------------------------------   ------------------------------------
    OpenRouter        OPENROUTER_API_KEY                     FREELM_OPENROUTER_TIER (free|credit)
                      FREELM_OPENROUTER_KEYS
    Google AI Studio  GEMINI_API_KEY                         FREELM_GOOGLE_TIER (free|tier1)
                      GOOGLE_API_KEY
                      FREELM_GOOGLE_KEYS
    NVIDIA NIM        NVIDIA_API_KEY                         FREELM_NIM_TIER
                      NIM_API_KEY
                      FREELM_NIM_KEYS
    Groq              GROQ_API_KEY                           FREELM_GROQ_TIER
                      FREELM_GROQ_KEYS
    Cerebras          CEREBRAS_API_KEY                       FREELM_CEREBRAS_TIER
                      FREELM_CEREBRAS_KEYS
    Mistral           MISTRAL_API_KEY                        FREELM_MISTRAL_TIER
                      FREELM_MISTRAL_KEYS
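    To make the "first match wins" rule concrete, here is a minimal sketch of how a key pool could be assembled from the variables in the table. `KEY_VARIABLES` and `resolve_keys` are hypothetical names, and splitting whichever variable matched on commas is an assumption based on the multi-key note in this section; freelm's actual resolver may differ, and tier overrides are not modeled.

    KEY POOL SKETCH (PYTHON)

```python
import os
from typing import Dict, List, Mapping, Tuple

# Hypothetical table mirroring the rows above; tuple order is the
# "first match wins" precedence for each provider.
KEY_VARIABLES: Dict[str, Tuple[str, ...]] = {
    "openrouter": ("OPENROUTER_API_KEY", "FREELM_OPENROUTER_KEYS"),
    "google": ("GEMINI_API_KEY", "GOOGLE_API_KEY", "FREELM_GOOGLE_KEYS"),
    "nim": ("NVIDIA_API_KEY", "NIM_API_KEY", "FREELM_NIM_KEYS"),
    "groq": ("GROQ_API_KEY", "FREELM_GROQ_KEYS"),
    "cerebras": ("CEREBRAS_API_KEY", "FREELM_CEREBRAS_KEYS"),
    "mistral": ("MISTRAL_API_KEY", "FREELM_MISTRAL_KEYS"),
}


def resolve_keys(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Build a provider -> keys pool; providers with no variable set are skipped."""
    pool: Dict[str, List[str]] = {}
    for provider, names in KEY_VARIABLES.items():
        for name in names:
            # Assumption: each variable may hold several comma-separated keys.
            keys = [k.strip() for k in env.get(name, "").split(",") if k.strip()]
            if keys:
                pool[provider] = keys
                break  # first match wins; later variables for this provider are ignored
    return pool


if __name__ == "__main__":
    # Print only provider names so real keys never reach stdout.
    print(sorted(resolve_keys(os.environ)))
```

    Passing the environment in as a mapping (rather than reading `os.environ` inside the function) keeps the resolver easy to test with fake values.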

    4. Routing Strategies

    freelm offers four routing algorithms that control the order in which requests fail over between your API keys. Select one by passing `strategy` when you initialize `FreeLLM()` explicitly.

    • Priority (strategy="priority")

      The default. Providers are ranked deterministically by their assigned `priority` value (lower numbers are tried first), falling back strictly in order.

    • Quota Aware (strategy="quota_aware")

      Ranks keys dynamically by their current headroom (remaining requests per minute, bounded by the daily quota). This load-balances across all keys so that no single key hits its rate limit prematurely.

    • Round Robin (strategy="round_robin")

      Rotates which provider is tried first on each call, spreading request volume evenly across the pool without regard to quota or latency.

    • Latency (strategy="latency")

      Monitors moving averages of response times and prefers the provider with the lowest historically observed latency.
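    The four orderings can be sketched as follows. `KeyState`, its fields, the `alpha=0.2` smoothing factor, and `order_keys` are illustrative assumptions rather than freelm internals. The latency strategy is modeled with an exponential moving average, which is one common kind of moving average and not necessarily the one freelm uses.

    ROUTING SKETCH (PYTHON)

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyState:
    """Hypothetical per-key bookkeeping; field names are illustrative."""
    name: str
    priority: int
    rpm_remaining: int
    daily_remaining: int
    avg_latency_ms: Optional[float] = None  # None until the key has been measured


def headroom(k: KeyState) -> int:
    # Remaining requests this minute, bounded by what is left of the daily quota.
    return min(k.rpm_remaining, k.daily_remaining)


def record_latency(k: KeyState, sample_ms: float, alpha: float = 0.2) -> None:
    # Exponential moving average: recent calls count more than older ones.
    if k.avg_latency_ms is None:
        k.avg_latency_ms = sample_ms
    else:
        k.avg_latency_ms = alpha * sample_ms + (1 - alpha) * k.avg_latency_ms


def order_keys(keys: List[KeyState], strategy: str, call_index: int = 0) -> List[KeyState]:
    """Return the order in which keys are tried for one request."""
    if strategy == "priority":
        return sorted(keys, key=lambda k: k.priority)  # lower number first
    if strategy == "quota_aware":
        return sorted(keys, key=headroom, reverse=True)  # most headroom first
    if strategy == "round_robin":
        start = call_index % len(keys) if keys else 0  # rotate the starting key each call
        return keys[start:] + keys[:start]
    if strategy == "latency":
        # Unmeasured keys go last; measured keys sort by average latency.
        return sorted(keys, key=lambda k: (k.avg_latency_ms is None, k.avg_latency_ms or 0.0))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

    In this sketch each strategy only produces an ordering; a failover loop would then walk that order until a call succeeds.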

    5. Virtual Models & Discovery

    Because free models are constantly cycled in and out by providers (especially OpenRouter), freelm uses Virtual Models. You ask for a class of model, and freelm maps it to the best available concrete model for the active provider.

    • auto / chat: Any available chat model.
    • chat:fast: Fast/cheap models (e.g., Llama 3 8B, Gemma 2 9B).
    • chat:large: Stronger, more capable models.
    • vendor/model-id: Passthrough for specific model requests.
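    A minimal sketch of how virtual names could be mapped to concrete models. The tag-based catalog, its `example/...` model IDs, and `resolve_model` are hypothetical; treating any name containing `/` as a passthrough is also an assumption. In freelm the catalog would come from the active provider (or from discovery, for OpenRouter) rather than a hard-coded list.

    VIRTUAL MODEL SKETCH (PYTHON)

```python
from typing import List

# Hypothetical catalog; real entries would come from the active provider.
CATALOG: List[dict] = [
    {"id": "example/llama-3-8b", "tags": {"chat", "fast"}},
    {"id": "example/gemma-2-9b", "tags": {"chat", "fast"}},
    {"id": "example/large-70b", "tags": {"chat", "large"}},
]


def resolve_model(requested: str, catalog: List[dict]) -> List[str]:
    """Map a virtual model name to concrete candidate IDs, in catalog order."""
    if "/" in requested:
        return [requested]  # vendor/model-id: passed through untouched
    if requested in ("auto", "chat"):
        wanted = "chat"  # any chat model qualifies
    elif requested.startswith("chat:"):
        wanted = requested.split(":", 1)[1]  # "fast" or "large"
    else:
        raise ValueError(f"unknown virtual model: {requested!r}")
    return [m["id"] for m in catalog if wanted in m["tags"]]


if __name__ == "__main__":
    print(resolve_model("chat:fast", CATALOG))  # ['example/llama-3-8b', 'example/gemma-2-9b']
```

    Returning a list of candidates rather than a single ID leaves room for the router to fall back to the next concrete model if the first one is unavailable.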

    Dynamic Discovery: For OpenRouter, freelm queries the `GET /models` endpoint on first run, keeps the models whose IDs carry the `:free` suffix, parses their context lengths and tags, and caches the result to disk (`~/.cache/freelm`, TTL 3600 s) so that subsequent calls skip the extra API request.
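    The discovery-and-cache step might look roughly like this. It assumes the response wraps entries in a `data` array with `id` and `context_length` fields, as OpenRouter's public models listing does; `free_models`, `load_or_refresh`, the injected `fetch` callable, and the JSON cache layout are hypothetical. Tag parsing is omitted, and the demo writes to a temporary directory rather than `~/.cache/freelm`.

    DISCOVERY CACHE SKETCH (PYTHON)

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, List, Optional

CACHE_TTL_S = 3600  # the one-hour TTL described above


def free_models(payload: dict) -> List[dict]:
    """Keep entries whose ID ends in ':free', recording each one's context length."""
    return [
        {"id": m["id"], "context_length": m.get("context_length")}
        for m in payload.get("data", [])
        if m.get("id", "").endswith(":free")
    ]


def load_or_refresh(
    cache_file: Path, fetch: Callable[[], dict], now: Optional[float] = None
) -> List[dict]:
    """Serve the disk cache while it is fresh; otherwise fetch, filter, and rewrite it."""
    now = time.time() if now is None else now
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if now - cached["fetched_at"] < CACHE_TTL_S:
            return cached["models"]
    models = free_models(fetch())  # fetch() stands in for the GET /models call
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"fetched_at": now, "models": models}))
    return models


if __name__ == "__main__":
    sample = {"data": [
        {"id": "vendor/model:free", "context_length": 8192},
        {"id": "vendor/model", "context_length": 8192},
    ]}
    path = Path(tempfile.mkdtemp()) / "freelm" / "models.json"
    print(load_or_refresh(path, lambda: sample))
    # [{'id': 'vendor/model:free', 'context_length': 8192}]
```

    Injecting `fetch` instead of making the HTTP call inline keeps the sketch offline and lets the TTL logic be exercised with a fake clock via `now`.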

    INSPECTING LIVE MODELS (JS/TS)

    import { listFreeModels } from "freelm";
    
    const models = await listFreeModels();
    for (const m of models.slice(0, 5)) {
        console.log(m.id, m.tags);
    }