freelm Documentation
freelm is a drop-in client for Python and Node.js that aggregates multiple free-tier LLM providers into one fault-tolerant, OpenAI-compatible API call. Use this guide to set up API keys, configure routing strategies, and deploy production-grade LLM applications without spending a dime.
1. Installation
Install the package via your preferred package manager. Both implementations provide identical API signatures.
PYTHON (PyPI)
```shell
pip install freelm
```
NODE.JS / TYPESCRIPT (npm)
```shell
npm install freelm
```
2. Drop-in OpenAI Shim
The fastest way to use freelm is via the OpenAI compatibility shim. It lets you keep your existing `openai`-style code unchanged apart from the import, while the network requests are routed through the freelm failover engine.
PYTHON EXAMPLE
```python
# Instead of: from openai import OpenAI
from freelm.compat import OpenAI

# Automatically loads available keys from .env
client = OpenAI()

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
)
```
NODE.JS EXAMPLE
```javascript
// Instead of: import OpenAI from "openai";
import { OpenAI } from "freelm/compat";

// Automatically loads available keys from process.env
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "chat:fast",
  messages: [{ role: "user", content: "Explain quantum computing." }],
});

console.log(response.choices[0].message.content);
```
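To picture what "routing through the failover engine" means, here is a minimal Python sketch. It is not freelm's actual implementation: `call_with_failover`, `AllProvidersFailed`, and the two stand-in provider functions are hypothetical names introduced for illustration, and the sketch assumes providers are simply tried in order until one succeeds.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the pool has failed (hypothetical name)."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each (name, send) pair in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider calls.
def rate_limited(prompt: str) -> str:
    raise RuntimeError("429 rate limit")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = call_with_failover([("a", rate_limited), ("b", healthy)], "hi")
```

Here the first provider raises, so the call is answered by the second one; the caller only sees the successful reply.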
3. Environment Variables
freelm dynamically detects keys via environment variables. You do not need keys for every provider; freelm will automatically build a connection pool based on the keys it finds. You can supply multiple keys per provider by comma-separating them.
| Provider | API key variables (first match wins) | Tier override variable |
|---|---|---|
| OpenRouter | `OPENROUTER_API_KEY`, `FREELM_OPENROUTER_KEYS` | `FREELM_OPENROUTER_TIER` (`free` or `credit`) |
| Google AI Studio | `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `FREELM_GOOGLE_KEYS` | `FREELM_GOOGLE_TIER` (`free` or `tier1`) |
| NVIDIA NIM | `NVIDIA_API_KEY`, `NIM_API_KEY`, `FREELM_NIM_KEYS` | `FREELM_NIM_TIER` |
| Groq | `GROQ_API_KEY`, `FREELM_GROQ_KEYS` | `FREELM_GROQ_TIER` |
| Cerebras | `CEREBRAS_API_KEY`, `FREELM_CEREBRAS_KEYS` | `FREELM_CEREBRAS_TIER` |
| Mistral | `MISTRAL_API_KEY`, `FREELM_MISTRAL_KEYS` | `FREELM_MISTRAL_TIER` |
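The lookup rules above (first matching variable wins, comma-separated values yield several keys, providers without keys are left out of the pool) can be sketched as follows. This is an illustration under assumptions, not freelm's code: `KEY_VARS`, `resolve_keys`, and `build_pool` are hypothetical names, and only two providers from the table are included to keep it short.

```python
import os
from typing import Mapping, Optional, Sequence

# Variable names per provider, in the lookup order listed in the table.
KEY_VARS: dict[str, Sequence[str]] = {
    "groq": ("GROQ_API_KEY", "FREELM_GROQ_KEYS"),
    "mistral": ("MISTRAL_API_KEY", "FREELM_MISTRAL_KEYS"),
}


def resolve_keys(provider: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the keys from the first variable that is set (first match wins)."""
    env = os.environ if env is None else env
    for var in KEY_VARS[provider]:
        raw = env.get(var, "")
        # Split on commas and drop empty fragments and surrounding whitespace.
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if keys:
            return keys
    return []


def build_pool(env: Optional[Mapping[str, str]] = None) -> dict[str, list[str]]:
    """Keep only the providers for which at least one key was found."""
    pool = {p: resolve_keys(p, env) for p in KEY_VARS}
    return {p: keys for p, keys in pool.items() if keys}


# Demo with an explicit mapping instead of the real process environment.
demo_env = {"GROQ_API_KEY": "gsk_a, gsk_b", "FREELM_GROQ_KEYS": "ignored"}
pool = build_pool(demo_env)
```

In the demo, `GROQ_API_KEY` is found first, so `FREELM_GROQ_KEYS` is never read, and Mistral is omitted because no key is set for it.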
4. Routing Strategies
freelm offers four routing strategies that control how requests fail over across your API keys. Specify the strategy when you explicitly initialize `FreeLLM()`.
- Priority (`strategy="priority"`): The default. Providers are ranked deterministically by their assigned `priority` value (lower numbers are tried first), and requests fall back strictly in that order.
- Quota Aware (`strategy="quota_aware"`): Ranks keys dynamically by their current headroom (remaining requests per minute, bounded by the remaining daily quota). This load-balances across all keys so that no single key hits its rate limit prematurely.
- Round Robin (`strategy="round_robin"`): Rotates the first provider tried on each call, distributing request volume evenly across the pool without regard to quota or latency.
- Latency (`strategy="latency"`): Tracks a moving average of response times and prefers the provider with the lowest observed latency.
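The four orderings can be compared side by side in a small sketch. Everything here is an assumption made for illustration rather than freelm's internals: the `Provider` dataclass and its fields, the `order` and `update_latency` helpers, the reading of headroom as `min(rpm_left, daily_left)`, and the use of an exponential moving average with smoothing factor `alpha` for latency.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Provider:
    name: str
    priority: int          # lower is tried first
    rpm_left: int          # requests remaining this minute
    daily_left: int        # requests remaining today
    avg_latency_ms: float  # moving average of response time

    @property
    def headroom(self) -> int:
        # Remaining per-minute budget, capped by what is left of the daily quota.
        return min(self.rpm_left, self.daily_left)


_calls = count()  # call counter consumed only by round_robin


def order(pool: list[Provider], strategy: str = "priority") -> list[Provider]:
    """Return the providers in the order they would be tried."""
    if strategy == "priority":
        return sorted(pool, key=lambda p: p.priority)
    if strategy == "quota_aware":
        return sorted(pool, key=lambda p: p.headroom, reverse=True)
    if strategy == "round_robin":
        start = next(_calls) % len(pool)
        return pool[start:] + pool[:start]
    if strategy == "latency":
        return sorted(pool, key=lambda p: p.avg_latency_ms)
    raise ValueError(f"unknown strategy: {strategy}")


def update_latency(p: Provider, sample_ms: float, alpha: float = 0.2) -> None:
    """Fold a new sample into the moving average (alpha is an assumed factor)."""
    p.avg_latency_ms = (1 - alpha) * p.avg_latency_ms + alpha * sample_ms


pool = [
    Provider("groq", priority=2, rpm_left=30, daily_left=5, avg_latency_ms=120.0),
    Provider("cerebras", priority=1, rpm_left=10, daily_left=900, avg_latency_ms=300.0),
    Provider("mistral", priority=3, rpm_left=50, daily_left=400, avg_latency_ms=200.0),
]

by_priority = [p.name for p in order(pool, "priority")]
by_quota = [p.name for p in order(pool, "quota_aware")]
by_latency = [p.name for p in order(pool, "latency")]
first_rr = [order(pool, "round_robin")[0].name for _ in range(4)]
```

With these made-up numbers, the quota-aware ordering puts `groq` last even though it has plenty of per-minute budget, because only 5 requests remain in its daily quota.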
5. Virtual Models & Discovery
Because free models are constantly cycled in and out by providers (especially OpenRouter), freelm uses Virtual Models. You ask for a class of model, and freelm maps it to the best available concrete model for the active provider.
- `auto` / `chat`: Any available chat model.
- `chat:fast`: Fast/cheap models (e.g., Llama 3 8B, Gemma 2 9B).
- `chat:large`: Stronger, more capable models.
- `vendor/model-id`: Passthrough for specific model requests.
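One way to picture the mapping from a virtual model name to a concrete model is the sketch below. It is a guess at the general shape, not freelm's resolver: `CATALOGUE`, `resolve_model`, the provider names, the model ids, and the tag sets are all hypothetical, and "best available" is simplified to "first catalogue entry whose tags match".

```python
from typing import Optional

# Hypothetical catalogue: concrete model ids per provider, tagged by class.
CATALOGUE: dict[str, list[dict]] = {
    "provider-a": [
        {"id": "vendor-x/small-8b", "tags": {"chat", "fast"}},
        {"id": "vendor-x/big-70b", "tags": {"chat", "large"}},
    ],
    "provider-b": [
        {"id": "vendor-y/mini", "tags": {"chat", "fast"}},
    ],
}


def resolve_model(requested: str, provider: str) -> Optional[str]:
    """Map a virtual model name to a concrete model id for one provider."""
    if requested in ("auto", "chat"):
        wanted = {"chat"}
    elif requested.startswith("chat:"):
        # "chat:fast" -> needs both the "chat" and the "fast" tag.
        wanted = {"chat", requested.split(":", 1)[1]}
    else:
        # vendor/model-id: pass the concrete id through unchanged.
        return requested
    for model in CATALOGUE.get(provider, []):
        if wanted <= model["tags"]:
            return model["id"]
    return None  # this provider has nothing in the requested class


small = resolve_model("auto", "provider-a")
large = resolve_model("chat:large", "provider-a")
missing = resolve_model("chat:large", "provider-b")
direct = resolve_model("vendor-z/custom", "provider-b")
```

Returning `None` for a provider that has no model in the requested class gives a router a natural signal to move on to the next provider.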
Dynamic Discovery: For OpenRouter, freelm hits the `GET /models` endpoint on the first run, discovers which models are tagged as `:free`, parses their context lengths and tags, and caches them to disk (`~/.cache/freelm`, TTL 3600s) to avoid API overhead on subsequent calls.
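The cache-then-refetch behaviour described above can be sketched with the standard library alone. This is an assumed shape, not freelm's code: `load_models`, `keep_free`, the `models.json` file name, and the fake fetch function are hypothetical, and "tagged as `:free`" is interpreted here as a model id ending in `:free`.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable

TTL_SECONDS = 3600  # the TTL quoted in the text


def keep_free(models: list[dict]) -> list[dict]:
    """Keep only entries whose id ends in ':free' (assumed meaning of the tag)."""
    return [m for m in models if m["id"].endswith(":free")]


def load_models(cache_file: Path, fetch: Callable[[], list[dict]],
                ttl: float = TTL_SECONDS) -> list[dict]:
    """Return cached models if the cache file is younger than ttl, else refetch."""
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < ttl:
            return json.loads(cache_file.read_text(encoding="utf-8"))
    models = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(models), encoding="utf-8")
    return models


fetch_log: list[str] = []  # records how often the "network" was hit


def fake_fetch() -> list[dict]:
    """Stand-in for a real GET /models request."""
    fetch_log.append("fetch")
    return keep_free([
        {"id": "vendor/model-a:free", "context_length": 8192},
        {"id": "vendor/model-b", "context_length": 32768},
    ])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "freelm" / "models.json"
    first = load_models(path, fake_fetch)   # cache miss: fetches and writes
    second = load_models(path, fake_fetch)  # cache hit: served from disk
```

The second call reads the file written by the first, so the fetch function runs only once within the TTL window.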
INSPECTING LIVE MODELS (JS/TS)
```javascript
import { listFreeModels } from "freelm";

const models = await listFreeModels();
for (const m of models.slice(0, 5)) {
  console.log(m.id, m.tags);
}
```