SKIP. SHRINK. DISCOUNT.
The only local-first layer that SKIPS (cache: 100% token savings on a hit), SHRINKS (compress: 60–95% on a miss), and DISCOUNTS (align: provider prefix-cache discounts) every LLM call, and remembers, all in one install.
Three Levers. One Install.
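SLM v3.6 sits between your application and your LLM provider. Every call is intercepted, optimized, and then either served from cache or forwarded. Error handling is fail-open: if an optimization step fails, the call goes through to the provider unchanged, so your calls never break.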
Cache
Byte-identical repeat calls are served from local SQLite. Exact matching is always on; vCache-gated semantic matching (opt-in) covers near-duplicates. A hit consumes zero provider tokens.
Compress
Structure-preserving extractive compression for JSON, code, and tool outputs. LLMLingua-2 for prose (opt-in, with a warning). CCR makes compression reversible: every original byte is recoverable.
Align
Stabilizes prompt prefixes by detecting volatile tokens (UUIDs, timestamps, JWTs) that would otherwise break prefix matching. Maximizes native provider prefix-cache discounts on cached input tokens: Anthropic 90%, OpenAI 50%.
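A minimal Python sketch of volatile-token detection, assuming simple regular expressions for the three token kinds named above. The patterns, the `<kind:n>` placeholder format, and the `find_volatile_spans`/`stabilize` names are hypothetical; SLM's actual detector may relocate volatile values rather than substitute them.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for the volatile token kinds named above.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "timestamp": re.compile(
        r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
    ),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}


def find_volatile_spans(text: str) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) for every volatile token, sorted by position."""
    spans = []
    for kind, pattern in VOLATILE_PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((kind, m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])


def stabilize(text: str) -> Tuple[str, List[str]]:
    """Swap volatile tokens for numbered placeholders so the prefix is byte-stable.

    The original values are returned separately so they can be sent after the
    provider's cache breakpoint (for example, in the final user turn).
    """
    values: List[str] = []
    out, cursor = [], 0
    for kind, start, end in find_volatile_spans(text):
        if start < cursor:  # skip a match that overlaps one already replaced
            continue
        out.append(text[cursor:start])
        out.append(f"<{kind}:{len(values)}>")
        values.append(text[start:end])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), values
```

Two requests that differ only in session ID and timestamp now share a byte-identical prefix, which is what provider prefix caches match on.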
How It Works
Optimize runs alongside SLM Memory. Memory shapes what is in the prompt; Optimize decides whether and how it is sent. Separate jobs. One layer. Separate storage.
Your App ──► Proxy/SDK/Wrap ──► Cache Check ──hit──► Return Cached (0 tokens)
                                     │
                                   miss
                                     │
                               ┌─────▼──────┐
                               │  Compress  │──► Provider ──► Store in Cache
                               │   60-95%   │
                               │  + Align   │
                               └────────────┘
┌────────────────────┐   ┌───────────────────────┐
│ llmcache.db        │   │ ~/.slm/optimize.json  │
│ (separate file)    │   │ (hot-reload)          │
│ AES-256-GCM        │   │ Written by UI/CLI     │
│ No memory.db       │   │ 2s reload             │
└────────────────────┘   └───────────────────────┘
See Your Savings, Live
Every cache hit, compression, and alignment event is counted. View real-time USD/INR savings from the dashboard or CLI.
What's New in v3.6
Exact Cache
Byte-identical repeat calls served from local SQLite. SHA-256 key derivation, stampede shield, TTL-based expiry.
- → Zero provider tokens on hit
- → Tag-based invalidation
- → Tool-use and length-truncated responses are never cached
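A minimal sketch of an exact-match cache along the lines described above, assuming SHA-256 over the raw request body, a per-entry TTL, and a single tag column. The `ExactCache` class and its schema are hypothetical; encryption and the stampede shield are deliberately left out to keep it short.

```python
import hashlib
import sqlite3
import time
from typing import Optional

# Stop reasons that mark tool-use or truncated answers. Anthropic reports
# tool_use / max_tokens; OpenAI reports tool_calls / length.
NEVER_CACHE = {"tool_use", "tool_calls", "max_tokens", "length"}


def cache_key(body: bytes) -> str:
    """SHA-256 of the raw request body: only byte-identical calls collide."""
    return hashlib.sha256(body).hexdigest()


class ExactCache:
    """Hypothetical exact-match cache; values are stored in plaintext here."""

    def __init__(self, path: str = ":memory:", ttl_s: float = 3600.0) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, tag TEXT, expires_at REAL NOT NULL)"
        )
        self.ttl_s = ttl_s

    def get(self, body: bytes) -> Optional[str]:
        row = self.db.execute(
            "SELECT value, expires_at FROM cache WHERE key = ?", (cache_key(body),)
        ).fetchone()
        if row is None or row[1] <= time.time():
            return None  # miss, or entry past its TTL
        return row[0]

    def put(self, body: bytes, response: str, stop_reason: str,
            tag: Optional[str] = None) -> bool:
        if stop_reason in NEVER_CACHE:
            return False  # tool-use and truncated answers are never stored
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, tag, expires_at) VALUES (?, ?, ?, ?)",
            (cache_key(body), response, tag, time.time() + self.ttl_s),
        )
        self.db.commit()
        return True

    def invalidate(self, tag: str) -> int:
        """Delete every entry carrying `tag`; returns the number removed."""
        cur = self.db.execute("DELETE FROM cache WHERE tag = ?", (tag,))
        self.db.commit()
        return cur.rowcount
```

Hashing the raw bytes (rather than re-serialized JSON) keeps the "byte-identical" promise literal: even a trailing space produces a different key.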
Semantic Cache (opt-in)
vCache-powered learned thresholds with SAFE-CACHE centroid defense. Near-duplicate queries are served within an error bound.
- ★ Dual-threshold: similarity ≥ 0.98 served directly, 0.90–0.98 verified first
- ★ Blocks 86% of the CacheAttack hijack class
- ★ Side-channel padding + multi-turn guard
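The dual-threshold rule above can be sketched as follows. This is an illustration under stated assumptions: vCache learns thresholds per cached entry, whereas this sketch hard-codes the two numbers from the bullet; `verify` is a hypothetical callback (for example, a cheap judge model), embeddings come from any model; and the SAFE-CACHE centroid defense is not shown.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

DIRECT = 0.98  # similarity at or above this: serve the cached answer as-is
VERIFY = 0.90  # similarity in [VERIFY, DIRECT): serve only if a verifier agrees


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_lookup(
    query: Sequence[float],
    entries: List[Tuple[Sequence[float], str]],
    verify: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return ("direct" | "verified" | "miss", cached response or None)."""
    best_sim, best = -1.0, None
    for vec, response in entries:
        sim = cosine(query, vec)
        if sim > best_sim:
            best_sim, best = sim, response
    if best is not None and best_sim >= DIRECT:
        return "direct", best
    if best is not None and best_sim >= VERIFY and verify(best):
        return "verified", best
    return "miss", None
```

The middle band is where most false hits would live, so it pays for an extra check instead of being served blindly.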
Extractive Compression
Structure-preserving compression for JSON, code, and prose. Lossless by default. CCR for byte-exact reversal.
- → JSON: 120-char truncation + array limit
- → Code: AST-aware (Python/JS/Go/Rust/Java/C++)
- → Prose: LLMLingua-2 (opt-in, warned)
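A minimal sketch of the JSON rule above, assuming strings are cut at 120 characters and arrays are capped at a fixed count. `MAX_ITEMS = 5` and the `"... N more items"` marker are hypothetical choices; on its own this step is lossy, which is why CCR keeps the original.

```python
import json
from typing import Any

MAX_STR = 120   # characters kept per string, per the bullet above
MAX_ITEMS = 5   # hypothetical array limit; the real default is not stated here


def compress_json(value: Any) -> Any:
    """Recursively shorten a parsed JSON value while keeping keys and nesting."""
    if isinstance(value, str):
        return value if len(value) <= MAX_STR else value[:MAX_STR] + "..."
    if isinstance(value, list):
        kept = [compress_json(v) for v in value[:MAX_ITEMS]]
        if len(value) > MAX_ITEMS:
            kept.append(f"... {len(value) - MAX_ITEMS} more items")
        return kept
    if isinstance(value, dict):
        return {k: compress_json(v) for k, v in value.items()}
    return value  # numbers, booleans, and null pass through unchanged


tool_output = json.dumps({"id": 7, "log": "x" * 500, "rows": list(range(100))})
compact = json.dumps(compress_json(json.loads(tool_output)))
```

Every key survives, so the model still sees the full shape of the tool output, just with long values elided.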
Interception Proxy
HTTP proxy on port 8765 serving the Anthropic, OpenAI, and Gemini API surfaces. Zero-code integration.
- • 3 surfaces: /v1/messages (Anthropic), /v1/chat/completions (OpenAI), /v1beta (Gemini)
- • Fail-open: any optimizer error passes the request through unchanged
- • Header redaction + SSRF allowlist
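A sketch of path routing with an SSRF allowlist and header redaction, assuming the proxy forwards each surface to the provider's public HTTPS host. The route table, the set of secret header names, and the function names are hypothetical; the real proxy's rules may differ.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical route table: proxy path prefix -> provider origin.
ROUTES = {
    "/v1/messages": "https://api.anthropic.com",
    "/v1/chat/completions": "https://api.openai.com",
    "/v1beta": "https://generativelanguage.googleapis.com",
}
ALLOWED_HOSTS = {urlsplit(origin).hostname for origin in ROUTES.values()}
SECRET_HEADERS = {"authorization", "x-api-key", "x-goog-api-key", "cookie"}


def is_allowed(url: str) -> bool:
    """SSRF guard: only HTTPS to a known provider host may be contacted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def upstream_url(path: str, base: Optional[str] = None) -> str:
    """Map a proxy path to a provider URL; a custom base must pass the allowlist."""
    for prefix, origin in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            target = base or origin
            if not is_allowed(target):
                raise ValueError(f"upstream not allowlisted: {target}")
            return target.rstrip("/") + path
    raise ValueError(f"unsupported path: {path}")


def redact(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy of the headers that is safe to log: credential values are masked."""
    return {k: "[REDACTED]" if k.lower() in SECRET_HEADERS else v for k, v in headers.items()}
```

Checking the parsed hostname, rather than a string prefix, rejects look-alikes such as `api.openai.com.attacker.example` as well as plain-HTTP metadata endpoints.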
Agent Wrapping
One command: `slm wrap claude` starts the proxy, sets the environment, and launches the agent.
- ● Supports 10 agents, including Claude, Cursor, and Aider
- ● SDK adapters: withSLM(OpenAI()) as a drop-in wrapper
- ● Persistent config write for a permanent setup
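A rough sketch of the "set the environment, then launch the agent" step, assuming the wrapper points SDK base URLs at the local proxy. `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` are variables the official SDKs read, but whether `slm wrap` sets exactly these is an assumption, and `wrapped_env`/`launch` are hypothetical names. Starting the proxy itself is omitted.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

PROXY = "http://127.0.0.1:8765"  # local proxy port from the section above


def wrapped_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Child environment that points the Anthropic and OpenAI SDKs at the proxy."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = PROXY       # read by Anthropic SDKs and Claude Code
    env["OPENAI_BASE_URL"] = PROXY + "/v1"  # read by the official OpenAI SDKs
    return env


def launch(agent_cmd: List[str]) -> int:
    """Run the agent with the proxied environment and return its exit code."""
    return subprocess.run(agent_cmd, env=wrapped_env()).returncode
```

The OpenAI base URL carries `/v1` because that SDK appends paths like `/chat/completions` to it, while the Anthropic SDK appends `/v1/messages` to a bare host.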
Savings Dashboard
Live cost visibility from the UI and CLI. Every event is tracked, estimated, and displayed in real time.
- ● USD + INR savings with configurable rates
- ● Hit rate, compress ratio, cache size
- ● 5 new API endpoints for programmatic access
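One plausible way to turn the counters above into a dollar figure, sketched under assumptions: the per-lever formula, the `Rates` fields, and the example prices are hypothetical, and the estimate ignores provider surcharges on cache writes.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    usd_per_mtok_in: float   # input price, USD per million tokens (configurable)
    usd_per_mtok_out: float  # output price, USD per million tokens (configurable)
    inr_per_usd: float       # exchange rate for the INR view (configurable)
    prefix_discount: float   # e.g. 0.90 (Anthropic) or 0.50 (OpenAI), per the Align lever


def savings_usd(r: Rates, hit_in: int = 0, hit_out: int = 0,
                compressed_away: int = 0, cached_prefix: int = 0) -> float:
    """Estimated USD saved by the three levers over some period.

    hit_in / hit_out: tokens a cache hit avoided sending and receiving.
    compressed_away: input tokens removed by compression on misses.
    cached_prefix: input tokens billed at the provider's prefix-cache rate.
    """
    per_in = r.usd_per_mtok_in / 1_000_000
    per_out = r.usd_per_mtok_out / 1_000_000
    skip = hit_in * per_in + hit_out * per_out
    shrink = compressed_away * per_in
    discount = cached_prefix * per_in * r.prefix_discount
    return skip + shrink + discount


def savings_inr(r: Rates, usd: float) -> float:
    return usd * r.inr_per_usd
```

Keeping the three terms separate makes it easy to report per-lever savings (SKIP, SHRINK, DISCOUNT) alongside the total.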
One Command to Start Saving
Optimize is ON by default with safe settings. Semantic cache and aggressive compression are opt-in. No behavior change until you enable them.
Safety by Default. Security by Design.
SLM v3.6 is built on AI Reliability Engineering principles. Every feature has safe defaults, fail-open guarantees, and audited security.
Data Isolation
A separate llmcache.db that never touches memory.db. AES-256-GCM encryption on all cache values. Random per-install salt. chmod 600 on DB files. Cross-contamination guard at the code level.
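A POSIX-only sketch of three of these measures: owner-only file permissions, a random per-install salt, and a name-based guard against opening memory.db. The function names are hypothetical, and so is the choice of scrypt for turning the salt into a 32-byte AES-256-GCM key; the encryption step itself is not shown.

```python
import hashlib
import os
import secrets
from pathlib import Path

FORBIDDEN_NAME = "memory.db"  # the Memory store the cache layer must never open


def create_cache_file(path: Path) -> Path:
    """Create (or tighten) the cache DB file with owner-only permissions, like chmod 600."""
    if path.name == FORBIDDEN_NAME:
        raise ValueError("refusing to use memory.db as the cache store")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)  # covers files that already existed with looser modes
    return path


def install_salt(path: Path) -> bytes:
    """Return the per-install salt, generating 16 random bytes on first run."""
    if path.exists():
        return path.read_bytes()
    salt = secrets.token_bytes(16)
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(salt)
    return salt


def derive_key(secret: bytes, salt: bytes) -> bytes:
    """32-byte key suitable for AES-256-GCM; scrypt is an assumed choice of KDF."""
    return hashlib.scrypt(secret, salt=salt, n=2**14, r=8, p=1, dklen=32)
```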
Fail-Open Guarantee
Every cache/compress/proxy error passes through to the provider. Your calls never break. Tested: kill the cache mid-flight and calls still succeed with zero latency increase.
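The fail-open pattern can be sketched as a decorator around each optimization step. `fail_open`, the logger name, and `flaky_compress` are hypothetical; only the optimizer is wrapped, so genuine provider errors would still reach the caller.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("slm.optimize")  # hypothetical logger name


def fail_open(step: Callable[[T], T]) -> Callable[[T], T]:
    """Wrap an optimization step so any exception returns the request unchanged."""
    def wrapper(request: T) -> T:
        try:
            return step(request)
        except Exception:
            log.warning("optimize step failed; passing request through", exc_info=True)
            return request
    return wrapper


@fail_open
def flaky_compress(request: dict) -> dict:
    raise RuntimeError("cache unavailable")  # simulates killing the cache mid-flight
```

Catching `Exception` rather than `BaseException` lets Ctrl-C and interpreter shutdown propagate normally.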
CacheAttack Mitigation
SAFE-CACHE centroid defense blocks 86% of adversarial hijack probes. Side-channel timing padding. UUID4 validation on CCR retrieval. No pickle serialization, which avoids the CWE-502 (deserialization of untrusted data) class of bugs.
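Timing padding can be sketched as enforcing a latency floor. The idea, stated as an assumption about how such padding works: a cache hit that returns instantly tells an observer that someone already sent a similar prompt, so hits are delayed to at least `floor_s`. The `padded` helper and any floor value are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def padded(fn: Callable[[], T], floor_s: float) -> T:
    """Run fn, then sleep so the call takes at least floor_s seconds in total."""
    start = time.monotonic()
    result = fn()
    remaining = floor_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result
```

The trade-off is deliberate: padded hits give up part of their latency advantage, but still save every provider token.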
Safe Compression
Default safe mode: extractive only (structure-preserving, lossless, production-safe). Aggressive mode requires explicit opt-in with warning. CCR stores originals for byte-exact reversal.
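A minimal sketch of a reversible store, assuming originals are kept under random UUID4 references and that retrieval rejects anything that is not a canonical UUID4. The `ReversibleStore` class is hypothetical, and an in-memory dict stands in for llmcache.db.

```python
import uuid
from typing import Dict


class ReversibleStore:
    """Keeps originals under random UUID4 references so compression can be undone."""

    def __init__(self) -> None:
        self._originals: Dict[str, bytes] = {}

    def stash(self, original: bytes) -> str:
        ref = str(uuid.uuid4())
        self._originals[ref] = original
        return ref

    def restore(self, ref: str) -> bytes:
        """Return the exact original bytes; malformed or non-UUID4 refs are rejected."""
        try:
            parsed = uuid.UUID(ref)
        except (ValueError, TypeError, AttributeError):
            raise ValueError("malformed reference") from None
        if parsed.version != 4 or str(parsed) != ref:
            raise ValueError("not a canonical UUID4 reference")
        return self._originals[ref]
```

Because the stored value is the untouched original, restoring it is byte-exact by construction; the validation only controls which references may be looked up.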
Start Your Free Tier Now
The only local-first layer that SKIPS, SHRINKS, and DISCOUNTS every LLM call — and remembers — in one install. Free. Open source. Your data stays on your machine.