v3.6 Optimize — Cache + Compress + Align

SKIP. SHRINK. DISCOUNT.

The only local-first layer that SKIPS (cache: 100% on a hit), SHRINKS (compress: 60–95% on a miss), and DISCOUNTS (align: provider prefix-cache) every LLM call — and remembers — in one install.

pip install -U superlocalmemory && slm restart
slm wrap claude

Three Levers. One Install.

SLM v3.6 sits between your application and your LLM provider. Every call is intercepted, optimized, and forwarded. The layer fails open: if any optimization step errors, the original call passes through to the provider, so your calls never break.

Cache

100% on a hit

Byte-identical repeat calls served from local SQLite. Exact-match always. vCache-gated semantic (opt-in) for near-duplicates. Zero provider tokens consumed on hit.

24h TTL default · Tag-based invalidation · Stampede shield

Compress

60–95% on a miss

Structure-preserving extractive compression for JSON/code/tool outputs. LLMLingua-2 prose (opt-in, warned). CCR reversible — every byte recoverable.

Safe mode default · AST-aware code (6 languages) · Batch retention

Align

Lossless extra

Stabilizes prompt prefixes by detecting volatile tokens (UUIDs, timestamps, JWTs). Maximizes native provider prefix-cache discounts on cached input tokens: up to 90% on Anthropic, 50% on OpenAI.

Stability scoring · Per-finding analysis · Zero prompt mutation
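
The detection step described above can be sketched roughly as follows. This is a minimal illustration, assuming plain regular expressions for UUIDs, ISO-8601 timestamps, and JWTs; the pattern set, the function names, and the prefix-ratio score are hypothetical and are not SLM's actual detectors or scoring formula. The prompt is only inspected, never modified.

```python
import re

# Hypothetical patterns for illustration; SLM's real detectors may differ.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "timestamp": re.compile(
        r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
    ),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_volatile_tokens(prompt):
    """Return (kind, start, end) findings sorted by position in the prompt."""
    findings = []
    for kind, pattern in VOLATILE_PATTERNS.items():
        for match in pattern.finditer(prompt):
            findings.append((kind, match.start(), match.end()))
    return sorted(findings, key=lambda finding: finding[1])


def stable_prefix_ratio(prompt):
    """Fraction of the prompt that precedes the first volatile token."""
    if not prompt:
        return 1.0
    findings = find_volatile_tokens(prompt)
    first_volatile = findings[0][1] if findings else len(prompt)
    return first_volatile / len(prompt)
```

A provider prefix cache can only reuse the part of a prompt that is identical across calls, so a volatile token near the start of the prompt lowers this ratio sharply, while the same token at the end barely affects it.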

How It Works

Optimize runs alongside SLM Memory. Memory shapes what is in the prompt; Optimize decides whether and how it is sent. Separate jobs. One layer. Separate storage.

Your App ──► Proxy/SDK/Wrap ──► Cache Check ──hit──► Return Cached (0 tokens)
                                    │
                                  miss
                                    │
                              ┌─────▼──────┐
                              │  Compress  │──► Provider ──► Store in Cache
                              │  60-95%    │
                              │  + Align   │
                              └────────────┘

   ┌────────────────────┐   ┌────────────────────────┐
   │  llmcache.db       │   │  ~/.slm/optimize.json  │
   │  (separate file)   │   │  (hot-reload)          │
   │  AES-256-GCM       │   │  Written by UI/CLI     │
   │  No memory.db      │   │  2s reload             │
   └────────────────────┘   └────────────────────────┘

See Your Savings — Live

Every cache hit, compression, and alignment event is counted. View real-time USD/INR savings from the dashboard or CLI.

100% tokens saved on cache hit · input + output
60–95% fewer input tokens on miss · compression ratio
Real-time USD + INR savings · CLI + dashboard
Safe · fail-open, AES-256-GCM · zero regression risk

What's New in v3.6

Exact Cache

Byte-identical repeat calls served from local SQLite. SHA-256 key derivation, stampede shield, TTL-based expiry.

  • Zero provider tokens on hit
  • Tag-based invalidation
  • Tool-use/length calls never cached
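
The exact-cache behavior listed above could look roughly like this. It is a sketch under stated assumptions, not SLM's implementation: the `ExactCache` class, the table layout, and the `stop_reason` field are hypothetical, and the key is derived from a canonical JSON encoding of the request (a strictly byte-identical check would hash the raw request body instead). Encryption, tag-based invalidation, and the stampede shield are omitted.

```python
import hashlib
import json
import sqlite3
import time

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24h default TTL


def cache_key(request):
    """SHA-256 over a canonical JSON encoding of the request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """Hypothetical exact-match cache; table and column names are made up."""

    def __init__(self, path=":memory:", ttl=DEFAULT_TTL_SECONDS):
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "key TEXT PRIMARY KEY, response TEXT NOT NULL, "
            "expires_at REAL NOT NULL)"
        )

    def get(self, request, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT response, expires_at FROM entries WHERE key = ?",
            (cache_key(request),),
        ).fetchone()
        if row is None or row[1] <= now:
            return None  # miss or expired: the caller forwards to the provider
        return json.loads(row[0])

    def put(self, request, response, now=None):
        # Assumed field name: skip tool-use and length-truncated responses.
        if response.get("stop_reason") in ("tool_use", "length"):
            return False
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
            (cache_key(request), json.dumps(response), now + self.ttl),
        )
        self.db.commit()
        return True
```

A caller would try `get` first, forward to the provider only when it returns `None`, and then `put` the provider's response.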

Semantic Cache (opt-in)

vCache-powered learned thresholds with SAFE-CACHE centroid defense. Near-duplicate queries served within error bound.

  • Dual-threshold: 0.98 direct, 0.90–0.98 verify
  • CacheAttack 86% hijack class blocked
  • Side-channel padding + multi-turn guard
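
The dual-threshold rule above can be illustrated with a small routing function. This is a simplified sketch: the source describes learned thresholds, whereas the constants here are fixed at the listed values, the function names are hypothetical, and treating the boundaries as inclusive is an assumption.

```python
import math

DIRECT_THRESHOLD = 0.98  # at or above: serve the cached response directly
VERIFY_THRESHOLD = 0.90  # between the two thresholds: verify before serving


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def route(similarity):
    """Map a similarity score to 'direct', 'verify', or 'miss'."""
    if similarity >= DIRECT_THRESHOLD:
        return "direct"
    if similarity >= VERIFY_THRESHOLD:
        return "verify"
    return "miss"
```

Anything routed to "miss" goes to the provider as usual, so a conservative threshold costs tokens but never returns a wrong cached answer.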

Extractive Compression

Structure-preserving compression for JSON, code, and prose. Lossless by default. CCR for byte-exact reversal.

  • JSON: 120-char truncation + array limit
  • Code: AST-aware (Python/JS/Go/Rust/Java/C++)
  • Prose: LLMLingua-2 (opt-in, warned)
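
The JSON rule above (string truncation plus an array limit) might be sketched like this. The 120-character limit comes from the list above; the array limit of 5, the omission marker, and the function name are assumptions for illustration only.

```python
MAX_STRING_CHARS = 120  # from the "120-char truncation" rule
MAX_ARRAY_ITEMS = 5     # hypothetical; the array limit is not stated


def compress_json(value):
    """Recursively shorten long strings and long arrays, keeping the shape."""
    if isinstance(value, str):
        if len(value) <= MAX_STRING_CHARS:
            return value
        return value[:MAX_STRING_CHARS] + "..."
    if isinstance(value, list):
        kept = [compress_json(item) for item in value[:MAX_ARRAY_ITEMS]]
        dropped = len(value) - len(kept)
        if dropped > 0:
            kept.append(f"<{dropped} more items omitted>")
        return kept
    if isinstance(value, dict):
        return {key: compress_json(item) for key, item in value.items()}
    return value  # numbers, booleans, and null pass through unchanged
```

Keys and nesting survive, so the model still sees the structure of a tool output. Taken alone this truncation discards data, which is why a reversible store of the originals (the role CCR plays above) matters.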

Interception Proxy

HTTP proxy on port 8765 serving Anthropic, OpenAI, and Gemini surfaces. Zero-code integration.

  • 3 surfaces: /v1/messages, /v1/chat/completions, /v1beta
  • Fail-open: any error passes through
  • Header redaction + SSRF allowlist

Agent Wrapping

One command: `slm wrap claude` starts the proxy, sets the environment, and launches the agent.

  • Supports 10 agents: Claude, Cursor, Aider, etc.
  • SDK adapters: withSLM(OpenAI()) — drop-in
  • Persistent config write for permanent setup

Savings Dashboard

Live cost visibility from the UI and CLI. Cache hits, compression, and alignment events are tracked, priced, and displayed in real time.

  • USD + INR savings with configurable rates
  • Hit rate, compress ratio, cache size
  • 5 new API endpoints for programmatic access
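
The savings arithmetic behind such a dashboard can be sketched as below. The function name and parameters are hypothetical, and all rates are caller-supplied placeholders rather than real provider prices or exchange rates.

```python
def estimate_savings(tokens_saved_input, tokens_saved_output,
                     usd_per_million_input, usd_per_million_output,
                     inr_per_usd):
    """Convert saved token counts into USD and INR at configurable rates."""
    usd = (tokens_saved_input * usd_per_million_input
           + tokens_saved_output * usd_per_million_output) / 1_000_000
    return {"usd": round(usd, 6), "inr": round(usd * inr_per_usd, 4)}
```

A cache hit saves both input and output tokens, while compression on a miss saves only input tokens, so the two would be passed in as separate counts.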

One Command to Start Saving

# Step 1 — Install or upgrade
pip install -U superlocalmemory
slm restart
# Step 2 — Verify
slm optimize status
# Step 3 — Wrap your agent
slm wrap claude
# Step 4 — See savings
slm optimize savings

Optimize is ON by default with safe settings. Semantic cache and aggressive compression are opt-in and stay off until you enable them.

Safety by Default. Security by Design.

SLM v3.6 is built on AI Reliability Engineering principles. Every feature has safe defaults, fail-open guarantees, and audited security.

Data Isolation

Separate llmcache.db — never touches memory.db. AES-256-GCM encryption on all cache values. Random per-install salt. chmod 600 on DB files. Cross-contamination guard at the code level.

Fail-Open Guarantee

Every cache/compress/proxy error passes through to the provider. Your calls never break. Tested: kill the cache mid-flight and calls still succeed with zero latency increase.
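
The fail-open pattern described above reduces to a small wrapper. This is a generic sketch, not SLM's code: `optimize` and `forward` are hypothetical callables standing in for the optimization step and the provider call.

```python
import logging

logger = logging.getLogger("fail_open_sketch")


def call_fail_open(request, optimize, forward):
    """Try the optimized path; on any optimizer error, send the original."""
    try:
        optimized = optimize(request)
    except Exception:
        logger.warning("optimizer failed; passing request through",
                       exc_info=True)
        optimized = request
    # Provider errors raised by forward() propagate to the caller unchanged.
    return forward(optimized)
```

Only the optimization step is guarded. Errors from the provider itself are deliberately not swallowed, so the application sees the same failures it would see without the layer.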

CacheAttack Mitigation

SAFE-CACHE centroid defense blocks 86% of adversarial hijack probes. Side-channel timing padding. UUID4 validation on CCR retrieval. No pickle serialization (CWE-502 safe).
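
Two of the measures above, UUID4 validation of retrieval handles and avoiding pickle, can be sketched as follows. The function names and the dict-backed `store` are hypothetical. Note that passing `version=4` to `uuid.UUID` overrides the version bits instead of checking them, so this sketch parses first and then inspects `.version`.

```python
import json
import uuid


def is_uuid4(value):
    """True only for a canonical, lowercase, hyphenated version-4 UUID."""
    if not isinstance(value, str):
        return False
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Comparing str(parsed) with the input rejects braces, urn: prefixes,
    # uppercase, and unhyphenated forms that uuid.UUID would otherwise accept.
    return parsed.version == 4 and str(parsed) == value


def load_original(store, handle):
    """Look up a stored original by handle; `store` is a hypothetical dict."""
    if not is_uuid4(handle):
        raise ValueError("handle is not a valid UUID4")
    # JSON rather than pickle: decoding cannot execute code (CWE-502).
    return json.loads(store[handle])
```

Rejecting anything that is not a well-formed UUID4 before the lookup keeps path-like or crafted handles from ever reaching the storage layer.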

Safe Compression

Default safe mode: extractive only (structure-preserving, lossless, production-safe). Aggressive mode requires explicit opt-in with warning. CCR stores originals for byte-exact reversal.

Start Your Free Tier Now

The only local-first layer that SKIPS, SHRINKS, and DISCOUNTS every LLM call — and remembers — in one install. Free. Open source. Your data stays on your machine.