Claude Code, without the runaway monthly bill
Plug Claude Code into your LMbox. On every request, a cascade router decides whether to use a local model (free, on your LAN) or Claude Sonnet via Bedrock EU (billed, frontier). In practice: 70-80% of traffic never leaves your infrastructure.
Claude Code, on its own, is expensive and leaks code
≈ €400/dev/month
A team of 10 developers running Claude Code in agentic mode easily burns €4,000/month on the Anthropic API. The bill grows with usage.
Your code goes to Anthropic
Every agentic prompt ships repo context in clear text to a US API. Fine for an open-source side project — a deal-breaker for a law firm, an OIV (French critical-infrastructure operator) or a bank.
No fallback plan
API down, quota exhausted, sanctions, network outage: Claude Code stops. No local fallback, no offline continuity.
How Claude Code plugs into the Box
Each pattern ships in the Box. You enable whichever matches your maturity and the sovereignty bar your sector requires.
Pure local proxy
Claude Code talks straight to a model running on your Box. Zero internet egress. Great for short questions and simple code review, limited for long agentic sessions.
ANTHROPIC_BASE_URL points to the Box. The Box answers with Mistral Large 2 or Codestral via the Anthropic-compatible /v1/messages endpoint.
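Because the endpoint is Anthropic-compatible, any Messages-API client works unchanged. A minimal sketch of the request body such a call carries, assuming the payload shape of Anthropic's public Messages API (the model name here is illustrative, standing in for whatever the Box serves locally):

```python
import json

# Anthropic-style Messages payload the Box's /v1/messages endpoint accepts.
# Shape follows Anthropic's public Messages API; the model name is
# illustrative — the Box answers with whichever local model it serves.
payload = {
    "model": "mistral-large-2",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Review this function for edge cases."}
    ],
}

body = json.dumps(payload)
print(len(body) > 0)
```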
Cascade routing (recommended)
The lmbox router scores each request on 6 dimensions (turns, tokens, tools, keywords…) and decides: fast (Mistral local), medium (Qwen Coder local), or frontier (Claude Sonnet via Bedrock EU). The model name is swapped on the wire; the client only ever sees one URL.
Tunable in router/config.yaml. X-LMbox-Force-Tier header for per-session overrides. Exhaustive JSONL audit log on /audit, exported to Loki.
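A hypothetical sketch of what the relevant section of router/config.yaml could look like (the field names are invented; the tiers, models and signal types are the ones described above):

```yaml
# Hypothetical sketch of router/config.yaml (field names invented; tiers,
# models and signal types are the ones described in the text).
tiers:
  fast:
    model: mistral-large-2            # local
    max_score: 6
  medium:
    model: ollama/qwen2.5-coder:32b   # local
    max_score: 14
  frontier:
    model: bedrock-eu/claude-sonnet   # billed
signals:
  agentic_keywords: [refactor, migrate, plan]
  per_turn_weight: 2
```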
MCP RAG over connectors
Claude Code attaches to the LMbox-mcp server alongside the LLM. The server exposes your connectors (SharePoint, Confluence, Drive, Jira) as MCP tools. Claude Code can read your internal docs during a session — without ever shipping content to Anthropic.
Streamable HTTP transport (MCP spec 2025-11-25). Auth delegated to Authentik via OIDC. RAG queries run inside the Box; only the ranked result is passed to the model.
10-developer team, three scenarios
Assumptions: 10 developers in heavy agentic mode, 8h working day, observed cascade ratio 60/30/10 (fast/medium/frontier). Bedrock EU pricing as of May 6, 2026.
| Scenario | Anthropic / Bedrock API | LMbox cost | Monthly total | Sovereignty |
|---|---|---|---|---|
| Direct Claude Code (no LMbox) | ≈ €4,000 | — | ≈ €4,000 · €48,000/year | None (US) |
| LMbox cascade (75% local) | ≈ €800 | Box amortised over 36 months | ≈ €1,100 · €13,200/year | Hybrid, controlled |
| 100% local (Mistral Large 2) | €0 | Box amortised over 36 months | ≈ €300 · €3,600/year | Full (EU / Mistral) |
ROI depends on team size and usage intensity. Beyond 5 developers in agentic mode, the LMbox cascade pays for itself in under a year, before counting the sovereignty benefit.
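The break-even claim can be sanity-checked from the table's own figures. The box purchase price is back-derived from the amortisation line, so treat it as an assumption:

```python
# Back-of-envelope payback check using the scenario table's own numbers.
direct_monthly  = 4000   # € — Claude Code straight to Anthropic
cascade_monthly = 1100   # € — LMbox cascade (API + amortisation)
api_share       = 800    # € — Bedrock portion of the cascade scenario

# The amortisation line implies a box price (assumption: 36-month straight line).
box_price = (cascade_monthly - api_share) * 36   # 300 €/month × 36

monthly_saving = direct_monthly - cascade_monthly
payback_months = box_price / monthly_saving

print(box_price, monthly_saving, round(payback_months, 1))
```

Under these assumptions the box pays for itself in roughly four months, comfortably inside the "under a year" claim.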
What the router actually scores
Transparent heuristics in router/config.yaml. Every decision is recorded with its score and reasons. If the calibration looks off, edit the YAML and reload.
Default thresholds: score < 7 → fast · 7-14 → medium · ≥ 15 → frontier.
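Those thresholds can be sketched as a toy scorer. The weights and keyword list below are invented; only the tier cut-offs and the output fields (tier, score, reasons) mirror the documented behaviour:

```python
# Toy cascade scorer. Tier cut-offs and output fields mirror the docs;
# the weights and keyword list are invented for illustration.
AGENTIC_KEYWORDS = {"refactor", "migrate", "plan", "implement"}

def score_request(prompt: str, turns: int = 1, tokens: int = 0,
                  tool_calls: int = 0) -> dict:
    score, reasons = 0, []
    for kw in sorted(AGENTIC_KEYWORDS):
        if kw in prompt.lower():
            score += 5
            reasons.append(f"agentic_keyword:{kw}")
    score += 2 * turns                      # invented per-turn weight
    reasons.append(f"turns:{turns}")
    score += tokens // 2000 + 3 * tool_calls
    tier = "fast" if score < 7 else "medium" if score <= 14 else "frontier"
    return {"tier": tier, "score": score, "reasons": reasons}

print(score_request("Refactor the auth module to dependency injection"))
```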
What each tier handles, concretely
Fast (Mistral, local):
- Reword this docstring
- What's this function's signature?
- Generate a basic test for this helper

Medium (Qwen Coder, local):
- Code review across these 3 files
- Refactor this module with dependency injection
- Why does this test flake?

Frontier (Claude Sonnet via Bedrock EU):
- 20-turn agentic plan for this feature
- Cross-file refactor with migration
- Untangling a complex Python traceback
5 minutes, 3 commands
1. Point Claude Code at the Box
Two environment variables, one internal certificate. No change to the Claude Code binary.
export ANTHROPIC_BASE_URL=https://lmbox.local
export ANTHROPIC_API_KEY=$(lmbox vault get claude_code_key)
claude   # runs against the Box
2. Dry-run the decision before paying
Before triggering a Bedrock call, preview the tier and reasons.
$ lmbox router test "Refactor the auth module to dependency injection"
{
"tier": "medium",
"score": 9,
"reasons": ["agentic_keyword:refactor", "turns:1"],
"would_route_to": "ollama/qwen2.5-coder:32b"
}
3. Audit usage by tier
After a week, your security team sees who routed where, and what you've spent.
$ lmbox router stats 7
TIER       REQ   ERR  TIN/M  TOUT/M  P50ms  P95ms  COST€
fast     2,134     2   1.41    0.32     78    142   0.00
medium     612     0   3.20    0.95  1,430  2,810   0.00
frontier   147     1   2.07    0.61    340    980  34.32
TOTAL    2,893                                     34.32
Window: last 7 days · ratio_local = 95%
What if I want to force a tier?
Three ways: temporary env var, per-request HTTP header, or a YAML rule per project. Useful for benchmarks, debugging unexpected behaviour, or pinning a critical feature to the frontier tier.
# Force this session
ANTHROPIC_BASE_URL=https://lmbox.local \
  curl -H "X-LMbox-Force-Tier: frontier" ...

# Or per project, in .LMbox.yml:
force_tier: medium   # never Bedrock, never fast
Plug Claude Code into the LMbox connectors
Pattern C — your IDE can search SharePoint, Confluence, Jira, GitLab and 5 other connectors without the content ever leaving the Box. Three steps.
1. Declare the MCP server in Claude Code
Drop a block into ~/.claude/claude.json. In production the key comes from the age-encrypted vault: lmbox vault get LMbox_mcp_key.
{
"mcpServers": {
"LMbox": {
"type": "http",
"url": "https://lmbox.local/mcp",
"headers": {
"x-lmbox-mcp-key": "${LMBOX_MCP_KEY}"
}
}
}
}
2. Verify the tools on the Box side
The server exposes 4 tools. Confirm they're being served and the sources are indexed before opening a session.
$ lmbox mcp tools
• search        args=(query, source, top_k)
• read_doc      args=(doc_id, source)
• list_sources  args=()
• find_similar  args=(text, source, top_k)

$ lmbox mcp sources
✓ sharepoint   SharePoint   docs=12,480  last=2026-05-06T08:00
✓ confluence   Confluence   docs= 6,230  last=2026-05-06T08:00
✓ gitlab       GitLab       docs=24,100  last=2026-05-06T08:00
✓ jira         Jira         docs= 3,770  last=2026-05-06T08:00
3. Ask Claude Code to use a tool
At session start, the 4 LMbox tools appear in the model context. Prompts that trigger them run a local RAG query — never Anthropic.
$ claude
> Use LMbox.search on Confluence to find anything about
> our Azure AD SSO flow.
[tool] search confluence "Azure AD SSO flow" → 8 hits
[model] The SSO flow is documented across 3 spaces:
Architecture (page 142), Security (page 88)…
Only the ranked top-K hits the model
The search runs inside the Box. Only the ranked snippets (default top-8, tunable in mcp/config.yaml) are injected into the LLM context. Raw Confluence or SharePoint document bodies never leave the appliance — you can even disable read_doc entirely in the config if you want to forbid full-document retrieval.
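A minimal sketch of that guard, with a naive term-overlap ranker standing in for the Box's real one (only the top_k default of 8 comes from the text; everything else is illustrative):

```python
# Sketch of the top-K guard: rank snippets locally, pass only the best K
# (default 8, per mcp/config.yaml) into the model context. The term-overlap
# scoring is a naive stand-in for the Box's real ranker.
def rank_snippets(query: str, snippets: list[str], top_k: int = 8) -> list[str]:
    q_terms = set(query.lower().split())
    scored = sorted(
        snippets,
        key=lambda s: len(q_terms & set(s.lower().split())),
        reverse=True,
    )
    return scored[:top_k]   # raw document bodies never leave this function

docs = [f"doc {i} about sso flow" for i in range(20)] + ["unrelated page"]
context = rank_snippets("azure ad sso flow", docs)
print(len(context))   # only the ranked top-K is injected
```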
Questions teams raise before signing
- Does code routed to frontier transit through the US?
- Is local latency really competitive vs Bedrock?
- Does Mistral Large 2 really match Sonnet on code?
- How do I cut off Bedrock during an incident?
- What about full air-gap mode?
We install the Box, we connect Claude Code
30-day on-site POC with a team of 5 to 15 developers. We measure the actual local/frontier ratio on your repos together. If it doesn't deliver the promise, we pack up.