freelm Documentation

freelm is a drop-in client for Python and Node.js that aggregates multiple free-tier LLM providers into one fault-tolerant, OpenAI-compatible API call. Use this guide to set up API keys, configure routing strategies, and deploy production-grade LLM applications without spending a dime.

1. Installation

Install the package via your preferred package manager. Both implementations provide identical API signatures.

PYTHON (PyPI)

pip install freelm

NODE.JS / TYPESCRIPT (npm)

npm install freelm

2. Drop-in OpenAI Shim

The fastest way to use freelm is through the OpenAI compatibility shim. It lets you keep your existing `openai` client code unchanged apart from the import, while routing the network requests through the freelm failover engine.

PYTHON EXAMPLE

# Instead of: from openai import OpenAI
from freelm.compat import OpenAI

# Automatically loads available keys from .env
client = OpenAI()

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing."}]
)
print(response.choices[0].message.content)

NODE.JS EXAMPLE

// Instead of: import OpenAI from "openai";
import { OpenAI } from "freelm/compat";

// Automatically loads available keys from process.env
const client = new OpenAI();

const response = await client.chat.completions.create({
    model: "chat:fast",
    messages: [{ role: "user", content: "Explain quantum computing." }]
});
console.log(response.choices[0].message.content);

3. Environment Variables

freelm dynamically detects keys via environment variables. You do not need keys for every provider; freelm will automatically build a connection pool based on the keys it finds. You can supply multiple keys per provider by comma-separating them.

| Provider | API key variables (first match wins) | Tier override variable |
| --- | --- | --- |
| OpenRouter | `OPENROUTER_API_KEY`, `FREELM_OPENROUTER_KEYS` | `FREELM_OPENROUTER_TIER` (`free` or `credit`) |
| Google AI Studio | `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `FREELM_GOOGLE_KEYS` | `FREELM_GOOGLE_TIER` (`free` or `tier1`) |
| NVIDIA NIM | `NVIDIA_API_KEY`, `NIM_API_KEY`, `FREELM_NIM_KEYS` | `FREELM_NIM_TIER` |
| Groq | `GROQ_API_KEY`, `FREELM_GROQ_KEYS` | `FREELM_GROQ_TIER` |
| Cerebras | `CEREBRAS_API_KEY`, `FREELM_CEREBRAS_KEYS` | `FREELM_CEREBRAS_TIER` |
| Mistral | `MISTRAL_API_KEY`, `FREELM_MISTRAL_KEYS` | `FREELM_MISTRAL_TIER` |
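
The lookup rules above ("first match wins", comma-separated keys) can be illustrated with a short sketch. This is not freelm's internal code: `resolve_keys` and `GOOGLE_KEY_VARS` are hypothetical names, and the variable order is copied from the Google AI Studio row of the table.

```python
import os

# Hypothetical lookup order, copied from the Google AI Studio row of the table.
GOOGLE_KEY_VARS = ("GEMINI_API_KEY", "GOOGLE_API_KEY", "FREELM_GOOGLE_KEYS")


def resolve_keys(var_names, env=None):
    """Return the keys from the first variable that is set and non-empty."""
    env = os.environ if env is None else env
    for name in var_names:
        raw = env.get(name, "")
        # Split on commas and drop blanks so "a, b," yields ["a", "b"].
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if keys:
            return keys  # first match wins; later variables are ignored
    return []


# A plain dict stands in for the process environment here.
example_env = {"GOOGLE_API_KEY": "key-a, key-b", "FREELM_GOOGLE_KEYS": "key-c"}
print(resolve_keys(GOOGLE_KEY_VARS, example_env))
```

In this sketch `GEMINI_API_KEY` is unset, so the two keys from `GOOGLE_API_KEY` are used and `FREELM_GOOGLE_KEYS` is never read.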

4. Routing Strategies

freelm offers four routing algorithms that control how requests fail over between your API keys. You can specify the strategy when explicitly initializing `FreeLLM()`.

Priority (`strategy="priority"`)
The default. Providers are ranked deterministically by their assigned `priority` value (lower numbers are tried first), and requests fall back strictly in that order.

Quota Aware (`strategy="quota_aware"`)
Ranks keys dynamically by their current headroom: the requests remaining in the current minute, capped by what is left of the daily quota. This load-balances across all keys so that no single key hits its rate limit prematurely.

Round Robin (`strategy="round_robin"`)
Rotates the first provider tried on each call, spreading request volume evenly across the pool without regard to quota or latency.

Latency (`strategy="latency"`)
Tracks a moving average of response times and prefers the provider with the lowest observed latency.
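
To make the differences concrete, here is a minimal sketch of how each strategy might order a pool before a request is attempted. It is an illustration under assumptions, not freelm's implementation: the `Provider` fields, the `order` function, and the sample numbers are all hypothetical, and headroom is modeled simply as `min(rpm_left, daily_left)`.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Provider:
    name: str
    priority: int          # lower is tried first
    rpm_left: int          # requests left in the current minute
    daily_left: int        # requests left today
    avg_latency_ms: float  # moving average of response time


_calls = count()  # call counter used only by round robin


def order(providers, strategy="priority"):
    """Return providers in the order a request would try them."""
    if not providers:
        return []
    if strategy == "priority":
        return sorted(providers, key=lambda p: p.priority)
    if strategy == "quota_aware":
        # Headroom: per-minute allowance capped by the remaining daily quota.
        return sorted(providers, key=lambda p: min(p.rpm_left, p.daily_left), reverse=True)
    if strategy == "round_robin":
        # Shift the starting point by one on every call.
        start = next(_calls) % len(providers)
        return providers[start:] + providers[:start]
    if strategy == "latency":
        return sorted(providers, key=lambda p: p.avg_latency_ms)
    raise ValueError(f"unknown strategy: {strategy}")


# Sample numbers are made up for illustration.
pool = [
    Provider("groq", priority=1, rpm_left=2, daily_left=900, avg_latency_ms=180.0),
    Provider("cerebras", priority=2, rpm_left=25, daily_left=10, avg_latency_ms=120.0),
    Provider("mistral", priority=3, rpm_left=40, daily_left=400, avg_latency_ms=450.0),
]
for s in ("priority", "quota_aware", "round_robin", "latency"):
    print(s, [p.name for p in order(pool, s)])
```

With this sample pool, `quota_aware` tries `mistral` first because it has the most headroom, while `latency` starts with `cerebras`. In every case the remaining providers act as the fallback chain if the first attempt fails.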

5. Virtual Models & Discovery

Because free models are constantly cycled in and out by providers (especially OpenRouter), freelm uses Virtual Models. You ask for a class of model, and freelm maps it to the best available concrete model for the active provider.

`auto` / `chat`: Any available chat model.
`chat:fast`: Fast/cheap models (e.g., Llama 3 8B, Gemma 2 9B).
`chat:large`: Stronger, more capable models.
`vendor/model-id`: Passthrough for specific model requests.
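
The mapping from a virtual model to a concrete one can be sketched as a small lookup. Everything here is hypothetical: `resolve_model`, the `CATALOG` structure, and the placeholder model IDs are illustrative assumptions and do not reflect freelm's actual tables or selection logic.

```python
# Hypothetical per-provider catalog; the model IDs are placeholders, not real listings.
CATALOG = {
    "provider-a": {
        "chat:fast": ["example/small-8b"],
        "chat:large": ["example/large-70b"],
    },
    "provider-b": {
        "chat:fast": [],
        "chat:large": ["example/other-large"],
    },
}


def resolve_model(requested, provider, catalog=CATALOG):
    """Map a virtual model name to a concrete model ID for one provider."""
    if "/" in requested:
        return requested  # vendor/model-id: pass through untouched
    classes = catalog.get(provider, {})
    if requested in ("auto", "chat"):
        # Any available chat model, regardless of class.
        candidates = [m for models in classes.values() for m in models]
    else:
        candidates = classes.get(requested, [])
    # None signals that this provider cannot serve the request.
    return candidates[0] if candidates else None


for name in ("auto", "chat:fast", "chat:large", "vendor/model-id"):
    print(name, "->", resolve_model(name, "provider-b"))
```

Returning `None` when a provider has no model in the requested class gives the router a signal to move on to the next provider in the pool.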

Dynamic Discovery: For OpenRouter, freelm calls the `GET /models` endpoint on the first run, selects the models tagged `:free`, parses their context lengths and tags, and caches the result to disk (`~/.cache/freelm`, TTL 3600 s) so that subsequent calls skip the extra API request.
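
The discover-then-cache flow might look roughly like the sketch below. It makes several assumptions: the response is taken to be a JSON object with a `data` list whose entries carry `id` and `context_length`, the network call is replaced by an injected `fetch` function, and `free_models`, `load_models`, and the cache file name are hypothetical. A temporary directory stands in for `~/.cache/freelm`.

```python
import json
import tempfile
import time
from pathlib import Path

TTL_SECONDS = 3600  # matches the TTL quoted above


def free_models(payload):
    """Keep only entries whose id carries the ':free' suffix (assumed payload shape)."""
    return [
        {"id": m["id"], "context_length": m.get("context_length")}
        for m in payload.get("data", [])
        if m["id"].endswith(":free")
    ]


def load_models(fetch, cache_file, ttl=TTL_SECONDS, now=time.time):
    """Return cached models while the cache file is fresh, otherwise refetch."""
    if cache_file.exists() and now() - cache_file.stat().st_mtime < ttl:
        return json.loads(cache_file.read_text())
    models = free_models(fetch())
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(models))
    return models


calls = []  # records how often the fake endpoint is hit


def fake_fetch():
    """Stand-in for the HTTP request; returns placeholder data."""
    calls.append(1)
    return {"data": [
        {"id": "example/model-a:free", "context_length": 8192},
        {"id": "example/model-b", "context_length": 32768},
    ]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "freelm" / "models.json"
    first = load_models(fake_fetch, cache)   # cache miss: fetches and writes
    second = load_models(fake_fetch, cache)  # cache hit: reads from disk
    print(first, "fetches:", len(calls))
```

Using the file's modification time as the freshness check keeps the cache format a plain JSON list, at the cost of trusting the filesystem clock.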

INSPECTING LIVE MODELS (JS/TS)

import { listFreeModels } from "freelm";

const models = await listFreeModels();
for (const m of models.slice(0, 5)) {
    console.log(m.id, m.tags);
}
