Agents · Open Source · Free · Active · Machine-verified · intermediate · ~20 min setup

FreeLLMAPI: one socket, sixteen free model tiers with auto-fallback

Front the free tiers of many providers with a single OpenAI-compatible endpoint and a prioritized fallback chain, so your apps point at one key and the router switches providers automatically when one runs out for the day.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

One person wiring their own free provider keys behind a single endpoint for personal use. CI validates the client config: the endpoint targets the FreeLLMAPI port (:3001/v1), the key carries the freellmapi- prefix, the fallback chain is an ordered list of distinct providers, and the request body is a well-formed OpenAI chat call. No keys, no model call. The actual generation is fenced.

Not for

  • Reselling access or pointing a crowd at it: the project is for personal experimentation only, and several free tiers forbid proxying
  • Production or anything latency- or quality-critical: late-day fallback to weaker models is expected by design

The Stack

Tested Against

tashfeenahmed/freellmapi (2026-06) · node@20

Side effects & data flow

Network: none, local only
Writes: ./freellmapi.json
Credentials: none required

Prerequisites

  • Docker (the backend ships as a container on port 3001)
  • Free API keys from the providers you want in the chain

Steps

  1. Write the client config and validate it

    After installing the backend (curl install or docker compose, port 3001) and pasting your provider keys in the dashboard, capture the unified key and your fallback order. CI checks the client config shape; the actual chat call needs your key and is fenced.

    cat > freellmapi.json <<'JSON'
    {
      "endpoint": "http://localhost:3001/v1",
      "unified_key": "freellmapi-REPLACE_WITH_YOUR_KEY",
      "fallback_chain": ["google", "groq", "cerebras", "mistral", "cohere", "nvidia"],
      "request": { "model": "auto", "messages": [{ "role": "user", "content": "hi" }] }
    }
    JSON
    node -e '
    const fs = require("fs");
    const c = JSON.parse(fs.readFileSync("freellmapi.json", "utf8"));
    function bad(m) { console.error("BAD: " + m); process.exit(1); }
    if (!c.endpoint || c.endpoint.indexOf(":3001/v1") === -1) bad("endpoint must target the FreeLLMAPI port :3001/v1");
    if (!c.unified_key || c.unified_key.indexOf("freellmapi-") !== 0) bad("unified_key must start with freellmapi-");
    const chain = c.fallback_chain;
    if (!Array.isArray(chain) || chain.length < 2) bad("fallback_chain must list at least 2 providers");
    if (!chain.every((p) => typeof p === "string")) bad("fallback_chain entries must be strings");
    if (new Set(chain).size !== chain.length) bad("fallback_chain has duplicate providers");
    const req = c.request || {};
    if (!req.model) bad("request.model is required (use auto or a specific id)");
    if (!Array.isArray(req.messages) || req.messages.length === 0) bad("request.messages must be a non-empty array");
    if (!req.messages.every((m) => m.role && typeof m.content === "string")) bad("each message needs a role and string content");
    console.log("config OK: FreeLLMAPI at :3001/v1, unified key prefix ok, fallback chain of " + chain.length + " provider(s), request well-formed");
    '
  2. Call it (the model step, not checked by CI)

    Send the request with your real unified key (the Step 3 curl in the guide). A reply means the backend is live and routing. Expect sharper answers early in the quota day (caps reset at midnight UTC) and weaker fallback models later on. Neither the call nor its quality is checked by CI.
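
The call in Step 2 can also be sketched in Node, reusing the freellmapi.json written in Step 1. This is a minimal sketch, not the guide's own command. It assumes the unified key is sent as an OpenAI-style `Authorization: Bearer` header and that the reply follows the OpenAI chat completion shape. The helper names `buildChatRequest` and `callFreeLLMAPI` are hypothetical.

```javascript
// Sketch: build and send the Step 2 chat call from freellmapi.json.
// Assumptions (not confirmed by the recipe): the unified key travels in an
// OpenAI-style "Authorization: Bearer" header, and the reply uses the
// OpenAI chat completion shape (choices[0].message.content).
const fs = require("fs");

// Pure helper: turn the client config into a URL plus fetch options.
function buildChatRequest(config) {
  const base = config.endpoint.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: base + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + config.unified_key,
      },
      body: JSON.stringify(config.request),
    },
  };
}

// Network step: needs the backend running on :3001 and a real unified key.
async function callFreeLLMAPI(configPath) {
  const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
  const { url, init } = buildChatRequest(config);
  const res = await fetch(url, init); // global fetch is available in Node 18+
  if (!res.ok) throw new Error("HTTP " + res.status + " from " + url);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Uncomment to run against a live backend:
// callFreeLLMAPI("freellmapi.json").then(console.log, console.error);
```

Keeping the request builder separate from the network call means the request shape can be inspected without a key or a running backend, which mirrors how CI checks only the config.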

Eval, 2 fixtures

Last passed: verified today
  • config-ok · contains · timeout 30s · max $0

    Expected: config OK: FreeLLMAPI at :3001/v1, unified key prefix ok, fallback chain of 6 provider(s), request well-formed

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0

Results

FreeLLMAPI stacks 16 provider free tiers (Google, Groq, Cerebras, Mistral, Cohere, NVIDIA, and more) behind one /v1/chat/completions endpoint on port 3001, with a unified key that fronts your encrypted provider keys. Combined free allowances reach a theoretical ~1.7B tokens/month, but treat that as a ceiling, not a number you will hit. The project labels itself as for personal experimentation only. The top models have the smallest caps, so quality drops late in the day and recovers when the caps reset at midnight UTC.
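
To make the late-day quality drop concrete, here is an illustrative sketch of a prioritized fallback chain with daily caps that reset at midnight UTC. It is not FreeLLMAPI's actual router: `createRouter`, `pick`, and the cap numbers are all hypothetical, chosen only to show the behavior described above.

```javascript
// Illustrative sketch of a prioritized fallback chain with daily caps.
// NOT FreeLLMAPI's real router: function names and cap values are made up.

// Usage counters are keyed to the UTC calendar day, so they reset at midnight UTC.
function utcDay(date) {
  return date.toISOString().slice(0, 10); // e.g. "2026-06-01"
}

function createRouter(chain, dailyCaps) {
  let day = null;
  let used = {};
  return {
    // Return the first provider in priority order with enough quota left,
    // or null when every provider is exhausted for the day.
    pick(tokens, now = new Date()) {
      const today = utcDay(now);
      if (today !== day) {
        day = today; // new UTC day: reset all counters
        used = {};
      }
      for (const provider of chain) {
        const spent = used[provider] || 0;
        if (spent + tokens <= (dailyCaps[provider] || 0)) {
          used[provider] = spent + tokens;
          return provider;
        }
      }
      return null;
    },
  };
}

const router = createRouter(
  ["google", "groq", "cerebras"],
  { google: 100, groq: 50, cerebras: 50 } // hypothetical token caps
);
const evening = new Date("2026-06-01T22:00:00Z");
console.log(router.pick(80, evening)); // google
console.log(router.pick(40, evening)); // groq (google would exceed its cap)
```

The first provider in the chain absorbs traffic until its cap is spent, then requests fall through to the next one. This is why answers are expected to weaken as the day goes on and recover after the reset.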
