v9.4.6 — benchmarked in production

The LLM gateway that compresses before forwarding.

Lynkr sits between your AI coding tools and any LLM provider, cutting tokens before the model sees them. 13+ providers. Zero code changes.

$ npm install -g lynkr

87.6% JSON tool compression: 3,458 → 427 tokens on grep arrays
53% tool selection savings: 2,085 → 972 tokens on 14-tool sessions
171ms semantic cache response: 0 tokens billed, vs 3,282ms without cache
13+ LLM providers: local, cloud, and enterprise behind one endpoint

What it does

Compression happens before the model sees your request.

Every optimization runs in-process, adds no meaningful latency, and requires no changes to your client tool.

TOON JSON Compression (87.6% compression)

Compresses large JSON tool results into a compact encoding before forwarding. A 60-item grep array drops from 3,458 to 427 tokens. Plain text passes through unchanged: TOON targets structured data, which is where the waste is.
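
As a rough illustration of why uniform JSON arrays compress so well, the sketch below re-encodes a list of same-shaped objects as one header line plus one row per item, so each key is written once instead of once per item. `encode_table` is a hypothetical helper, not Lynkr's API, and the sample data is made up; a real encoder also has to handle quoting, escaping, and nesting, which this ignores. It compares character counts, not tokens.

```python
import json

def encode_table(name, rows):
    """Encode a list of same-keyed dicts as a header line plus CSV-like rows.

    Hypothetical helper for illustration; assumes flat rows whose values
    contain no commas or newlines.
    """
    if not rows:
        return f"{name}[0]:"
    keys = list(rows[0].keys())
    if any(list(r.keys()) != keys for r in rows):
        # Non-uniform data: fall back to compact JSON instead of guessing.
        return json.dumps(rows, separators=(",", ":"))
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    lines = ["  " + ",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header] + lines)

# A made-up grep-style result with 60 matches.
matches = [
    {"file": f"src/module_{i}.js", "line": i * 3 + 1, "text": "TODO"}
    for i in range(60)
]

as_json = json.dumps(matches, indent=2)
as_table = encode_table("matches", matches)

print(len(as_json), "chars as pretty-printed JSON")
print(len(as_table), "chars as a table")
```

The saving comes almost entirely from not repeating keys, braces, and quotes on every item, which is why free-form text gains nothing from this kind of encoding.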

Smart Tool Selection (47–60% token reduction)

Classifies each request and strips irrelevant tool schemas before forwarding. A read-only question doesn't need Write, Edit, Bash, or Git schemas. On a 14-tool Claude Code session: read requests drop to 547 tokens, write requests to 412.
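
A minimal sketch of the idea, assuming a keyword-based classifier: decide whether a request is read-only, then forward only the tool schemas that request could use. The hint words, the `READ_ONLY_TOOLS` set, and both function names are assumptions for illustration; how Lynkr actually classifies requests is not described here.

```python
# Tools a read-only request might need (assumed names, for illustration only).
READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}

# Words that suggest the request wants to change something (assumed list).
WRITE_HINTS = {"write", "edit", "fix", "refactor", "create", "delete", "run", "commit"}

def classify(prompt):
    """Return "write" if the prompt looks like it needs to change something, else "read"."""
    words = (w.strip(".,?!") for w in prompt.lower().split())
    return "write" if any(w in WRITE_HINTS for w in words) else "read"

def select_tools(prompt, tools):
    """Drop tool schemas a read-only request cannot use; keep everything otherwise."""
    if classify(prompt) == "read":
        return [t for t in tools if t["name"] in READ_ONLY_TOOLS]
    return tools

tools = [
    {"name": name, "input_schema": {"type": "object"}}
    for name in ("Read", "Grep", "Glob", "Write", "Edit", "Bash", "Git")
]

read_tools = select_tools("Where is the retry logic defined?", tools)
write_tools = select_tools("Fix the retry logic in client.js", tools)

print([t["name"] for t in read_tools])
print([t["name"] for t in write_tools])
```

Erring toward "write" when in doubt is the safer default in a design like this: a wrongly kept schema costs tokens, while a wrongly stripped one can break the request.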

Semantic Cache (171ms on cache hit)

Near-identical prompts return cached responses in 171ms with 0 tokens billed. Uses local embeddings — no external calls, no latency on misses. Fixes the contextHash so dynamic memory injections don't break similarity matching.
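
The lookup can be pictured as a nearest-neighbour search over prompt embeddings with a similarity cutoff. The sketch below is a toy version under stated assumptions: `embed` is a bag-of-words stand-in for a real local embedding model, the 0.8 threshold is arbitrary, and `SemanticCache` is a hypothetical class, not Lynkr's implementation.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a local embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff; tune against real traffic
        self.entries = []           # list of (embedding, response) pairs

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        """Return the cached response for the most similar prompt, or None on a miss."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("how does the retry logic work in the http client", "It retries three times with backoff.")

hit = cache.get("how does retry logic work in the http client?")
miss = cache.get("list all open pull requests")
print(hit, miss)
```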

Complexity Tier Routing (automatic, 15-dimension)

Scores each request across 15 dimensions — token count, code complexity, reasoning markers, risk signals, agentic patterns — and routes automatically. Simple questions stay on cheap local models. Security audits escalate to cloud.
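
To make the routing idea concrete, here is a sketch that scores a request on five of the signals named above and maps the score to a tier. The marker lists, weights, and thresholds are invented for illustration, and the tier names are borrowed from the Quick Start configuration; the real scorer uses 15 dimensions that are not enumerated here.

```python
# Assumed marker lists; the real signals are not documented here.
REASONING_MARKERS = ("why", "prove", "trade-off", "step by step")
RISK_SIGNALS = ("security", "vulnerability", "credentials")

def score_request(prompt, tool_calls=0):
    """Return a rough complexity score from a handful of signals (weights are made up)."""
    text = prompt.lower()
    score = min(len(text.split()) // 200, 2)                  # size, using word count as a token proxy
    score += 1 if "def " in text or "class " in text else 0   # crude code-complexity signal
    score += 2 if any(m in text for m in REASONING_MARKERS) else 0
    score += 3 if any(r in text for r in RISK_SIGNALS) else 0
    score += 1 if tool_calls >= 3 else 0                      # agentic pattern
    return score

def pick_tier(score):
    """Map a score to one of the four tier names used in the configuration."""
    if score <= 1:
        return "TIER_SIMPLE"
    if score == 2:
        return "TIER_MEDIUM"
    if score <= 4:
        return "TIER_COMPLEX"
    return "TIER_REASONING"

simple = pick_tier(score_request("What does this flag do?"))
audit = pick_tier(score_request("Run a security audit of the auth module"))
print(simple, audit)
```

Weighting risk signals heavily is what sends a short security question to the cloud tier even though it is small by every other measure.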

MCP Integration + Code Mode (96% less MCP tool overhead)

Full MCP support with Code Mode: replaces 100+ tool schemas with 4 meta-tools (list, inspect, docs, execute), cutting MCP tool token overhead from 17,500 to 700. Session-level memory, per-tenant budget enforcement, and audit logging included.
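
One way to picture Code Mode: the model is shown four small meta-tool schemas and discovers individual tools on demand, instead of receiving every schema up front. The sketch below assumes plausible semantics for list, inspect, docs, and execute; `CodeModeRegistry` and the registered tools are hypothetical, not Lynkr's API.

```python
class CodeModeRegistry:
    """Expose many tools through four meta-tools instead of one schema per tool."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func, schema, doc):
        self._tools[name] = {"func": func, "schema": schema, "doc": doc}

    def list(self):
        """Meta-tool 1: names only, which is cheap to send."""
        return sorted(self._tools)

    def inspect(self, name):
        """Meta-tool 2: the full input schema for one tool, fetched on demand."""
        return self._tools[name]["schema"]

    def docs(self, name):
        """Meta-tool 3: human-readable documentation for one tool."""
        return self._tools[name]["doc"]

    def execute(self, name, **kwargs):
        """Meta-tool 4: run a tool by name with keyword arguments."""
        return self._tools[name]["func"](**kwargs)

registry = CodeModeRegistry()
for i in range(100):
    registry.register(
        f"tool_{i:03d}",
        lambda x, i=i: x + i,  # bind i now so each tool keeps its own offset
        {"type": "object", "properties": {"x": {"type": "integer"}}},
        f"Adds {i} to x.",
    )

result = registry.execute("tool_007", x=1)
print(len(registry.list()), registry.docs("tool_007"), result)
```

The trade-off is an extra round trip when the model needs a schema it has not seen yet, in exchange for not paying for 100 schemas on every request.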

Benchmark

Measured on real agentic workloads.

Same prompts, same backend providers (Ollama, Moonshot, Azure OpenAI). Lynkr compresses first; the model bills for what remains.

Tool selection (14-tool Claude Code session): 2,085 tokens before, 972 tokens after (−53%)
TOON compression (60-item JSON grep result): 3,458 tokens before, 427 tokens after (−87.6%)
Semantic cache (paraphrased repeat query): 2,857 tokens · 1,891ms cold, 0 tokens · 171ms on a hit

June 2026 · Node.js 20 · Apple Silicon · Lynkr v9.3.2 · Full methodology →
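
For readers who want to check the percentages, the arithmetic is simply (before − after) / before. The short script below recomputes the reductions from the token counts quoted above and the cache speedup from the two latencies; `reduction_pct` is just a helper name chosen here.

```python
def reduction_pct(before, after):
    """Percentage of tokens removed relative to the original count."""
    return (before - after) / before * 100

tool_selection = reduction_pct(2085, 972)   # 14-tool session
toon = reduction_pct(3458, 427)             # 60-item grep result
cache_speedup = 1891 / 171                  # cold latency divided by hit latency

print(f"tool selection: {tool_selection:.1f}% fewer tokens")
print(f"TOON compression: {toon:.1f}% fewer tokens")
print(f"semantic cache: {cache_speedup:.1f}x faster on a hit")
```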

Providers

13+ LLM providers. One endpoint.

Local (free): Ollama, llama.cpp, LM Studio
Cloud: AWS Bedrock, OpenRouter, OpenAI, Azure OpenAI, Azure Anthropic, Moonshot, Vertex AI, Z.AI
Enterprise: Databricks, DeepSeek

Compatible tools: Claude Code, Cursor, Codex CLI, Cline, Continue.dev

Quick Start

Running in under two minutes.

01 Install

One command. No Python, no Docker, no database required.

Terminal
$ npm install -g lynkr

02 Configure a provider

Free local model or cloud — your choice. Tier routing is optional.

Free — Ollama local
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest
Cloud — Azure OpenAI
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o
Tier routing — local for simple, cloud for complex
# Simple + medium → free local
TIER_SIMPLE=ollama:qwen2.5-coder:7b
TIER_MEDIUM=ollama:qwen2.5-coder:7b
# Complex + reasoning → cloud
TIER_COMPLEX=azure-openai:gpt-4o
TIER_REASONING=azure-openai:gpt-4o
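
One detail worth noting in the tier values: the model tag can itself contain a colon (`qwen2.5-coder:7b`), so a parser has to split on the first colon only. The sketch below assumes the format is `provider:model`, inferred from the example values above; `parse_tier` is a hypothetical helper, not part of Lynkr.

```python
def parse_tier(value):
    """Split "provider:model" on the first colon only, since model tags may contain colons."""
    provider, sep, model = value.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {value!r}")
    return provider, model

# The example values from the tier routing configuration.
env = {
    "TIER_SIMPLE": "ollama:qwen2.5-coder:7b",
    "TIER_MEDIUM": "ollama:qwen2.5-coder:7b",
    "TIER_COMPLEX": "azure-openai:gpt-4o",
    "TIER_REASONING": "azure-openai:gpt-4o",
}

routes = {name: parse_tier(value) for name, value in env.items()}
for name, (provider, model) in routes.items():
    print(f"{name}: provider={provider} model={model}")
```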

03 Connect your tool

One environment variable. No code changes.

Start Lynkr + connect Claude Code
# Start the gateway
$ npm start  # → http://localhost:8081

# Point Claude Code at it
$ export ANTHROPIC_BASE_URL=http://localhost:8081
$ export ANTHROPIC_API_KEY=any-value
$ claude
Architecture

Where Lynkr fits.

       Claude Code  /  Cursor  /  Codex  /  Cline  /  Continue.dev
                              │  Anthropic or OpenAI format
                              ▼
                ┌──────────────────────────────┐
                │       Lynkr Gateway          │
                │       localhost:8081         │
                │                              │
                │  strip unused tool schemas   │
                │  compress JSON tool results  │
                │  semantic cache lookup       │
                │  complexity tier routing     │
                │  format conversion           │
                └──────────────┬───────────────┘
                               │
              ┌────────────────┼─────────────────┐
              ▼                ▼                  ▼
        ┌───────────┐   ┌───────────┐    ┌────────────┐
        │  Local    │   │  Cloud    │    │ Enterprise │
        ├───────────┤   ├───────────┤    ├────────────┤
        │ Ollama    │   │ Bedrock   │    │ Databricks │
        │ llama.cpp │   │ OpenRouter│    │ Azure      │
        │ LM Studio │   │ OpenAI    │    │ Vertex AI  │
        └───────────┘   └───────────┘    └────────────┘

Start cutting costs today.

One-command install. Works with Claude Code, Cursor, and Codex out of the box. Apache 2.0 licensed, self-hosted, no usage tracking.