v9.4.6 — benchmarked in production

The LLM gateway
that compresses
before forwarding.

Lynkr sits between your AI coding tools and any LLM provider, cutting tokens before the model sees them. 13+ providers. Zero code changes.

$ npm install -g lynkr click to copy

GitHub → NPM → Documentation →

87.6% JSON tool compression 3,458 → 427 tokens on grep arrays

53% tool selection savings 2,085 → 972 tokens on 14-tool sessions

171ms semantic cache response 0 tokens billed vs 3,282ms without cache

13+ LLM providers local, cloud, enterprise — one endpoint

What it does

Compression happens before the model sees your request.

Every optimization runs in-process, adds no meaningful latency, and requires no changes to your client tool.

87.6%

compression

TOON JSON Compression

Binary-compresses large JSON tool results before forwarding. A 60-item grep array drops from 3,458 to 427 tokens. Plain text passes through unchanged — TOON targets structured data, which is where the waste is.

47–60%

token reduction

Smart Tool Selection

Classifies each request and strips irrelevant tool schemas before forwarding. A read-only question doesn't need Write, Edit, Bash, or Git schemas. On a 14-tool Claude Code session: read requests drop to 547 tokens, write requests to 412.

171ms

on cache hit

Semantic Cache

Near-identical prompts return cached responses in 171ms with 0 tokens billed. Uses local embeddings — no external calls, no latency on misses. Fixes the contextHash so dynamic memory injections don't break similarity matching.

Auto

15-dimension

Complexity Tier Routing

Scores each request across 15 dimensions — token count, code complexity, reasoning markers, risk signals, agentic patterns — and routes automatically. Simple questions stay on cheap local models. Security audits escalate to cloud.

96%

MCP Code Mode

MCP Integration + Code Mode

Full MCP support with Code Mode: replaces 100+ tool schemas with 4 meta-tools (list, inspect, docs, execute), cutting MCP tool token overhead from 17,500 to 700. Session-level memory, per-tenant budget enforcement, and audit logging included.

Benchmark

Measured on real agentic workloads.

Same prompts, same backend providers (Ollama, Moonshot, Azure OpenAI). Lynkr compresses first; the model bills for what remains.

Tool selection — 14-tool Claude Code session

before

2,085 tokens

after

972 tokens −53%

TOON compression — 60-item JSON grep result

before

3,458 tokens

after

427 tokens −87.6%

Semantic cache — paraphrased repeat query

cold

2,857 tokens · 1,891ms

hit

0 tokens · 171ms

June 2026 · Node.js 20 · Apple Silicon · Lynkr v9.3.2 · Full methodology →

Quick Start

Running in under two minutes.

Install

One command. No Python, no Docker, no database required.

Terminal

$ npm install -g lynkr

Configure a provider

Free local model or cloud — your choice. Tier routing is optional.

Free — Ollama local

MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest

Cloud — Azure OpenAI

MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o

Tier routing — local for simple, cloud for complex

# Simple + medium → free local
TIER_SIMPLE=ollama:qwen2.5-coder:7b
TIER_MEDIUM=ollama:qwen2.5-coder:7b
# Complex + reasoning → cloud
TIER_COMPLEX=azure-openai:gpt-4o
TIER_REASONING=azure-openai:gpt-4o

Connect your tool

One environment variable. No code changes.

Start Lynkr + connect Claude Code

# Start the gateway
$ npm start  # → http://localhost:8081

# Point Claude Code at it
$ export ANTHROPIC_BASE_URL=http://localhost:8081
$ export ANTHROPIC_API_KEY=any-value
$ claude

Architecture

Where Lynkr fits.

       Claude Code  /  Cursor  /  Codex  /  Cline  /  Continue.dev
                              │  Anthropic or OpenAI format
                              ▼
                ┌──────────────────────────────┐
                │       Lynkr Gateway          │
                │       localhost:8081         │
                │                              │
                │  strip unused tool schemas   │
                │  compress JSON tool results  │
                │  semantic cache lookup       │
                │  complexity tier routing     │
                │  format conversion           │
                └──────────────┬───────────────┘
                               │
              ┌────────────────┼─────────────────┐
              ▼                ▼                  ▼
        ┌───────────┐   ┌───────────┐    ┌────────────┐
        │  Local    │   │  Cloud    │    │ Enterprise │
        ├───────────┤   ├───────────┤    ├────────────┤
        │ Ollama    │   │ Bedrock   │    │ Databricks │
        │ llama.cpp │   │ OpenRouter│    │ Azure      │
        │ LM Studio │   │ OpenAI    │    │ Vertex AI  │
        └───────────┘   └───────────┘    └────────────┘

The LLM gateway
that compresses
before forwarding.

Compression happens before the model sees your request.

Measured on real agentic workloads.

13+ LLM providers. One endpoint.

Running in under two minutes.

Where Lynkr fits.

Start cutting costs today.

The LLM gateway that compresses before forwarding.

Compression happens before the model sees your request.

Measured on real agentic workloads.

13+ LLM providers. One endpoint.

Running in under two minutes.

Where Lynkr fits.

Start cutting costs today.

The LLM gateway
that compresses
before forwarding.