Lynkr sits between your AI coding tools and any LLM provider, cutting tokens before the model sees them. 13+ providers. Zero code changes.
npm install -g lynkr
click to copy
Every optimization runs in-process, adds no meaningful latency, and requires no changes to your client tool.
Binary-compresses large JSON tool results before forwarding. A 60-item grep array drops from 3,458 to 427 tokens. Plain text passes through unchanged — TOON targets structured data, which is where the waste is.
Classifies each request and strips irrelevant tool schemas before forwarding. A read-only question doesn't need Write, Edit, Bash, or Git schemas. On a 14-tool Claude Code session: read requests drop to 547 tokens, write requests to 412.
Near-identical prompts return cached responses in 171ms with 0 tokens billed. Uses local embeddings — no external calls, no latency on misses. Fixes the contextHash so dynamic memory injections don't break similarity matching.
Scores each request across 15 dimensions — token count, code complexity, reasoning markers, risk signals, agentic patterns — and routes automatically. Simple questions stay on cheap local models. Security audits escalate to cloud.
Full MCP support with Code Mode: replaces 100+ tool schemas with 4 meta-tools (list, inspect, docs, execute), cutting MCP tool token overhead from 17,500 to 700. Session-level memory, per-tenant budget enforcement, and audit logging included.
Same prompts, same backend providers (Ollama, Moonshot, Azure OpenAI). Lynkr compresses first; the model bills for what remains.
June 2026 · Node.js 20 · Apple Silicon · Lynkr v9.3.2 · Full methodology →
One command. No Python, no Docker, no database required.
$ npm install -g lynkr
Free local model or cloud — your choice. Tier routing is optional.
MODEL_PROVIDER=ollama OLLAMA_ENDPOINT=http://localhost:11434 OLLAMA_MODEL=qwen2.5-coder:latest
MODEL_PROVIDER=azure-openai AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com AZURE_OPENAI_API_KEY=your-key AZURE_OPENAI_DEPLOYMENT=gpt-4o
# Simple + medium → free local TIER_SIMPLE=ollama:qwen2.5-coder:7b TIER_MEDIUM=ollama:qwen2.5-coder:7b # Complex + reasoning → cloud TIER_COMPLEX=azure-openai:gpt-4o TIER_REASONING=azure-openai:gpt-4o
One environment variable. No code changes.
# Start the gateway $ npm start # → http://localhost:8081 # Point Claude Code at it $ export ANTHROPIC_BASE_URL=http://localhost:8081 $ export ANTHROPIC_API_KEY=any-value $ claude
Claude Code / Cursor / Codex / Cline / Continue.dev
│ Anthropic or OpenAI format
▼
┌──────────────────────────────┐
│ Lynkr Gateway │
│ localhost:8081 │
│ │
│ strip unused tool schemas │
│ compress JSON tool results │
│ semantic cache lookup │
│ complexity tier routing │
│ format conversion │
└──────────────┬───────────────┘
│
┌────────────────┼─────────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌────────────┐
│ Local │ │ Cloud │ │ Enterprise │
├───────────┤ ├───────────┤ ├────────────┤
│ Ollama │ │ Bedrock │ │ Databricks │
│ llama.cpp │ │ OpenRouter│ │ Azure │
│ LM Studio │ │ OpenAI │ │ Vertex AI │
└───────────┘ └───────────┘ └────────────┘
One command install. Works with Claude Code, Cursor, and Codex out of the box. Apache 2.0, self-hosted, no usage tracking.