Claude Code, without the runaway monthly bill
Plug Claude Code into your LMbox. On every request, a cascade router decides whether to use a local model (free, on your LAN) or Claude Sonnet via Bedrock EU (billed, frontier). In practice: 70-80% of traffic never leaves your infrastructure.
Claude Code, on its own, is expensive and leaks code
≈ €400/dev/month
A team of 10 developers running Claude Code in agentic mode easily burns €4,000/month on the Anthropic API. The bill grows with usage.
Your code goes to Anthropic
Every agentic prompt ships repo context in clear text to a US API. Fine for an open-source side project — a deal-breaker for a law firm, an OIV (French critical-infrastructure operator) or a bank.
No fallback plan
API down, quota exhausted, sanctions, network outage: Claude Code stops. No local fallback, no offline continuity.
How Claude Code plugs into the Box
Each pattern ships in the Box. You enable whichever matches your maturity and the sovereignty bar your sector requires.
Pure local proxy
Claude Code talks straight to a model running on your Box. Zero internet egress. Great for short questions and simple code review, limited for long agentic sessions.
ANTHROPIC_BASE_URL points to the Box. The Box answers with Mistral Large 2 or Codestral via the Anthropic-compatible /v1/messages endpoint.
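Because the endpoint is Anthropic-compatible, any Messages-API client works unchanged. A minimal sketch of the request body such a call carries, assuming the payload shape of Anthropic's public Messages API (the model name here is illustrative, standing in for whatever the Box serves locally):

```python
import json

# Anthropic-style Messages payload the Box's /v1/messages endpoint accepts.
# Shape follows Anthropic's public Messages API; the model name is
# illustrative — the Box answers with whichever local model it serves.
payload = {
    "model": "mistral-large-2",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Review this function for edge cases."}
    ],
}

body = json.dumps(payload)
print(len(body) > 0)
```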
Cascade routing (recommended)
The lmbox router scores each request on 6 dimensions (turns, tokens, tools, keywords…) and decides: fast (Mistral local), medium (Qwen Coder local), or frontier (Claude Sonnet via Bedrock EU). The model name is swapped on the wire; the client only ever sees one URL.
Tunable in router/config.yaml. X-LMbox-Force-Tier header for per-session overrides. Exhaustive JSONL audit log on /audit, exported to Loki.
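A hypothetical sketch of what the relevant section of router/config.yaml could look like (the field names are invented; the tiers, models and signal types are the ones described above):

```yaml
# Hypothetical sketch of router/config.yaml (field names invented; tiers,
# models and signal types are the ones described in the text).
tiers:
  fast:
    model: mistral-large-2            # local
    max_score: 6
  medium:
    model: ollama/qwen2.5-coder:32b   # local
    max_score: 14
  frontier:
    model: bedrock-eu/claude-sonnet   # billed
signals:
  agentic_keywords: [refactor, migrate, plan]
  per_turn_weight: 2
```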
MCP RAG over connectors
Claude Code attaches to the LMbox-mcp server alongside the LLM. The server exposes your connectors (SharePoint, Confluence, Drive, Jira) as MCP tools. Claude Code can read your internal docs during a session — without ever shipping content to Anthropic.
Streamable HTTP transport (MCP spec 2025-11-25). Auth delegated to Authentik via OIDC. RAG queries run inside the Box; only the ranked result is passed to the model.
10-developer team, three scenarios
Assumptions: 10 developers in heavy agentic mode, 8h working day, observed cascade ratio 60/30/10 (fast/medium/frontier). Bedrock EU pricing as of May 6, 2026.
| Scenario | Anthropic / Bedrock API | LMbox cost | Monthly total | Sovereignty |
|---|---|---|---|---|
| Direct Claude Code (no LMbox) | ≈ €4,000 | — | ≈ €4,000 · €48,000/year | None (US) |
| LMbox cascade (75% local) | ≈ €800 | Box amortised over 36 months | ≈ €1,100 · €13,200/year | Hybrid, controlled |
| 100% local (Mistral Large 2) | €0 | Box amortised over 36 months | ≈ €300 · €3,600/year | Full (EU / Mistral) |
ROI depends on team size and usage intensity. Beyond 5 developers in agentic mode, the LMbox cascade pays for itself in under a year, before counting the sovereignty benefit.
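The break-even claim can be sanity-checked from the table's own figures. The box purchase price is back-derived from the amortisation line, so treat it as an assumption:

```python
# Back-of-envelope payback check using the scenario table's own numbers.
direct_monthly  = 4000   # € — Claude Code straight to Anthropic
cascade_monthly = 1100   # € — LMbox cascade (API + amortisation)
api_share       = 800    # € — Bedrock portion of the cascade scenario

# The amortisation line implies a box price (assumption: 36-month straight line).
box_price = (cascade_monthly - api_share) * 36   # 300 €/month × 36

monthly_saving = direct_monthly - cascade_monthly
payback_months = box_price / monthly_saving

print(box_price, monthly_saving, round(payback_months, 1))
```

Under these assumptions the box pays for itself in roughly four months, comfortably inside the "under a year" claim.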
What the router actually scores
Transparent heuristics in router/config.yaml. Every decision is recorded with its score and reasons. If the calibration looks off, edit the YAML and reload.
Default thresholds: score < 7 → fast · 7-14 → medium · ≥ 15 → frontier.
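Those thresholds can be sketched as a toy scorer. The weights and keyword list below are invented; only the tier cut-offs and the output fields (tier, score, reasons) mirror the documented behaviour:

```python
# Toy cascade scorer. Tier cut-offs and output fields mirror the docs;
# the weights and keyword list are invented for illustration.
AGENTIC_KEYWORDS = {"refactor", "migrate", "plan", "implement"}

def score_request(prompt: str, turns: int = 1, tokens: int = 0,
                  tool_calls: int = 0) -> dict:
    score, reasons = 0, []
    for kw in sorted(AGENTIC_KEYWORDS):
        if kw in prompt.lower():
            score += 5
            reasons.append(f"agentic_keyword:{kw}")
    score += 2 * turns                      # invented per-turn weight
    reasons.append(f"turns:{turns}")
    score += tokens // 2000 + 3 * tool_calls
    tier = "fast" if score < 7 else "medium" if score <= 14 else "frontier"
    return {"tier": tier, "score": score, "reasons": reasons}

print(score_request("Refactor the auth module to dependency injection"))
```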
What each tier handles, concretely
Fast (Mistral, local):
- Reword this docstring
- What's this function's signature?
- Generate a basic test for this helper

Medium (Qwen Coder, local):
- Code review across these 3 files
- Refactor this module with dependency injection
- Why does this test flake?

Frontier (Claude Sonnet via Bedrock EU):
- 20-turn agentic plan for this feature
- Cross-file refactor with migration
- Untangling a complex Python traceback
5 minutes, 3 commands
1. Point Claude Code at the Box
Two environment variables, one internal certificate. No change to the Claude Code binary.
export ANTHROPIC_BASE_URL=https://lmbox.local
export ANTHROPIC_API_KEY=$(lmbox vault get claude_code_key)
claude   # runs against the Box
2. Dry-run the decision before paying
Before triggering a Bedrock call, preview the tier and reasons.
$ lmbox router test "Refactor the auth module to dependency injection"
{
"tier": "medium",
"score": 9,
"reasons": ["agentic_keyword:refactor", "turns:1"],
"would_route_to": "ollama/qwen2.5-coder:32b"
}
3. Audit usage by tier
After a week, your security team sees who routed where, and what you've spent.
$ lmbox router stats 7
TIER       REQ   ERR  TIN/M  TOUT/M  P50ms  P95ms  COST€
fast     2,134     2   1.41    0.32     78    142   0.00
medium     612     0   3.20    0.95  1,430  2,810   0.00
frontier   147     1   2.07    0.61    340    980  34.32
TOTAL    2,893                                     34.32
Window: last 7 days · ratio_local = 95%
What if I want to force a tier?
Three ways: temporary env var, per-request HTTP header, or a YAML rule per project. Useful for benchmarks, debugging unexpected behaviour, or pinning a critical feature to the frontier tier.
# Force this session
ANTHROPIC_BASE_URL=https://lmbox.local \
  curl -H "X-LMbox-Force-Tier: frontier" ...

# Or per project, in .LMbox.yml:
force_tier: medium   # never Bedrock, never fast
Plug Claude Code into the LMbox connectors
Pattern C — your IDE can search SharePoint, Confluence, Jira, GitLab and 5 other connectors without the content ever leaving the Box. Three steps.
1. Declare the MCP server in Claude Code
Drop a block into ~/.claude/claude.json. In production the key comes from the age-encrypted vault: lmbox vault get LMbox_mcp_key.
{
"mcpServers": {
"LMbox": {
"type": "http",
"url": "https://lmbox.local/mcp",
"headers": {
"x-lmbox-mcp-key": "${LMBOX_MCP_KEY}"
}
}
}
}
2. Verify the tools on the Box side
The server exposes 4 tools. Confirm they're being served and the sources are indexed before opening a session.
$ lmbox mcp tools
• search        args=(query, source, top_k)
• read_doc      args=(doc_id, source)
• list_sources  args=()
• find_similar  args=(text, source, top_k)

$ lmbox mcp sources
✓ sharepoint   SharePoint   docs=12,480  last=2026-05-06T08:00
✓ confluence   Confluence   docs= 6,230  last=2026-05-06T08:00
✓ gitlab       GitLab       docs=24,100  last=2026-05-06T08:00
✓ jira         Jira         docs= 3,770  last=2026-05-06T08:00
3. Ask Claude Code to use a tool
At session start, the 4 LMbox tools appear in the model context. Prompts that trigger them run a local RAG query — never Anthropic.
$ claude
> Use LMbox.search on Confluence to find anything about
> our Azure AD SSO flow.
[tool] search confluence "Azure AD SSO flow" → 8 hits
[model] The SSO flow is documented across 3 spaces:
Architecture (page 142), Security (page 88)…
Only the ranked top-K hits the model
The search runs inside the Box. Only the ranked snippets (default top-8, tunable in mcp/config.yaml) are injected into the LLM context. Raw Confluence or SharePoint document bodies never leave the appliance — you can even disable read_doc entirely in the config if you want to forbid full-document retrieval.
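A minimal sketch of that guard, with a naive term-overlap ranker standing in for the Box's real one (only the top_k default of 8 comes from the text; everything else is illustrative):

```python
# Sketch of the top-K guard: rank snippets locally, pass only the best K
# (default 8, per mcp/config.yaml) into the model context. The term-overlap
# scoring is a naive stand-in for the Box's real ranker.
def rank_snippets(query: str, snippets: list[str], top_k: int = 8) -> list[str]:
    q_terms = set(query.lower().split())
    scored = sorted(
        snippets,
        key=lambda s: len(q_terms & set(s.lower().split())),
        reverse=True,
    )
    return scored[:top_k]   # raw document bodies never leave this function

docs = [f"doc {i} about sso flow" for i in range(20)] + ["unrelated page"]
context = rank_snippets("azure ad sso flow", docs)
print(len(context))   # only the ranked top-K is injected
```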
Questions teams raise before signing
- Does code routed to frontier transit through the US?
- Is local latency really competitive vs Bedrock?
- Does Mistral Large 2 really match Sonnet on code?
- How do I cut off Bedrock during an incident?
- What about full air-gap mode?
We install the Box, we connect Claude Code
30-day on-site POC with a team of 5 to 15 developers. We measure the actual local/frontier ratio on your repos together. If it doesn't deliver the promise, we pack up.