Mathematical Foundation

Geometry, not guesswork.

Three published proofs power every recall: the Fisher-Rao information metric, Riemannian lifecycle management, and information-theoretic compression. All verifiable, all open.

The problem

Why cosine fails agents.

Cosine similarity treats all dimensions equally. A memory stored with confidence 0.12 — nearly a guess — gets the same geometric weight as a memory stored with confidence 0.99. That is not a retrieval system. That is noise.

Confidence is information. When an agent stores a fact it observed once versus a fact it verified twelve times, those are not the same thing. Any distance metric that ignores this distinction will surface the wrong memories under load — precisely when correctness matters most.

The Fisher-Rao metric lives on the statistical manifold — the curved space where probability distributions actually live. Distance there is measured along geodesics, not straight lines. High-confidence memories are geometrically closer to their neighbors. Low-confidence memories are pushed outward. The geometry does the work.

Memory                              Cosine (rank · similarity)    Fisher-Rao (rank · confidence)
Alice → Staff Eng. at Google        #2 · 0.84                     #1 · conf 0.97
Alice → maybe works in tech?        #1 · 0.91                     #4 · conf 0.11
Alice → ex-Googler (unverified)     #3 · 0.79                     #3 · conf 0.38
Alice → G-FAANG eng., confirmed     #4 · 0.77                     #2 · conf 0.91
Query: slm recall "Where does Alice work?"
Cosine surfaces the lowest-confidence memory first. Fisher-Rao surfaces the highest. Confidence is geometry, not metadata.
01 — Retrieval Metric
Fisher-Rao Distance · arXiv:2603.14588
d_FR(p, q) = arccos( Σ_i √(p_i · q_i) )

Where p and q are probability distributions over memory confidence scores. The sum Σ is taken over all memory dimensions i. The arccos maps the result to angular distance on the unit sphere — the natural geometry of probability distributions under the Fisher information metric.

This is the geodesic distance on the statistical manifold — the shortest path between two probability distributions, measured along the curved surface they inhabit. Not the Euclidean shortcut through the void. The manifold curvature is determined by the Fisher information matrix, which encodes how much information each dimension carries. High-confidence dimensions curve the space more steeply. Low-confidence dimensions barely curve it at all.
  • Symmetric: d_FR(p, q) = d_FR(q, p)
  • Triangle inequality: d_FR(p, r) ≤ d_FR(p, q) + d_FR(q, r)
  • Zero at identical distributions: d_FR(p, p) = 0
  • Bounded: 0 ≤ d_FR(p, q) ≤ π / 2
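The formula and its four properties can be checked numerically. The following is a minimal sketch, assuming memories are already available as discrete probability vectors; the function name fisher_rao_distance and the sample vectors are illustrative and do not come from the SLM codebase.

```python
import numpy as np

def fisher_rao_distance(p, q):
    """arccos of the sum of sqrt(p_i * q_i), as in the formula above."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must have the same shape")
    if np.any(p < 0) or np.any(q < 0):
        raise ValueError("probabilities must be non-negative")
    # Normalise defensively so both inputs sum to 1.
    p = p / p.sum()
    q = q / q.sum()
    overlap = np.sum(np.sqrt(p * q))
    # Clip to [0, 1] so rounding error cannot push arccos out of its domain.
    return float(np.arccos(np.clip(overlap, 0.0, 1.0)))

# Hypothetical distributions, chosen only to exercise the properties.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
r = np.array([1.0, 0.0, 0.0])
s = np.array([0.0, 1.0, 0.0])

d_pq = fisher_rao_distance(p, q)
d_qp = fisher_rao_distance(q, p)
d_pp = fisher_rao_distance(p, p)
d_rs = fisher_rao_distance(r, s)  # disjoint support: the upper bound pi / 2
```

Distributions with disjoint support, such as r and s here, sit at the maximum distance π / 2 because their overlap term is zero.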
2603.14588 — V3 SLM: Information-Geometric Agent Memory →
02 — Lifecycle Model
Riemannian Lifecycle · arXiv:2603.02240
∇_{γ'(t)} γ'(t) = 0     ·     Exp_p(v) = γ(1)

Memory lifecycle follows geodesic paths on the Riemannian manifold. ∇_{γ'(t)} γ'(t) = 0 is the geodesic equation: it defines the "straightest possible path" between memory states, parallel-transporting the tangent vector along its own trajectory. Exp_p(v) is the Riemannian exponential map: starting at memory state p with velocity v, it gives the memory state after one unit of geodesic travel, the mechanism by which memories consolidate toward related facts.
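For a concrete picture of the exponential map, here is a minimal sketch on the unit sphere. It assumes memory states are discrete distributions embedded by their element-wise square root, which is one standard way to realise the Fisher geometry but is not confirmed by the page as SLM's representation. The names exp_map, log_map, and geodesic are hypothetical.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p with velocity v for unit time."""
    speed = np.linalg.norm(v)
    if speed < 1e-12:
        return p.copy()
    return np.cos(speed) * p + np.sin(speed) * (v / speed)

def log_map(p, q):
    """Inverse of exp_map: tangent vector at p whose geodesic reaches q at time 1."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(p)
    # Component of q orthogonal to p gives the direction of travel.
    direction = q - cos_theta * p
    return theta * direction / np.linalg.norm(direction)

def geodesic(p, q, t):
    """Point gamma(t) on the geodesic with gamma(0) = p and gamma(1) = q."""
    return exp_map(p, t * log_map(p, q))

# Two hypothetical memory states, embedded via the square root.
a = np.sqrt(np.array([0.7, 0.2, 0.1]))
b = np.sqrt(np.array([0.1, 0.3, 0.6]))

v = log_map(a, b)
end = exp_map(a, v)            # gamma(1), which should land on b
mid = geodesic(a, b, 0.5)      # halfway along the geodesic
mid_distribution = mid ** 2    # map back to a probability vector
```

Under this embedding the geodesic length arccos(a · b) equals arccos( Σ √(p_i · q_i) ), so the same construction ties the lifecycle geodesics to the retrieval distance.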

Memories that cohere — that have short geodesics to related facts in the graph — strengthen through consolidation. Memories that are isolated — with long geodesics from everything else — decay. The geometry decides. Not a scheduler, not an arbitrary timer, not a recency window.

There is no arbitrary TTL. Decay is a geometric property, not a timer. A memory used yesterday and confirmed again today has a short geodesic to current context — it stays. A memory untouched for months with no related facts in the manifold has a long geodesic from every active region — it fades. The manifold forgets naturally. You never configure a decay rate.
2603.02240 — V2 SLM: Bounded Persistent Memory →
03 — Compression Theory
Information-Theoretic Compression · arXiv:2604.06392
H(X|Y) ≤ H(X)     ·     I(X;Y) ≥ 0

These are Shannon's fundamental information inequalities. H(X|Y) is the conditional entropy of prompt X given prior context Y — the irreducible information content that cannot be compressed away without information loss. I(X;Y) is mutual information — the quantity of information that X and Y share, which is always non-negative.

The compression algorithm is bounded by these inequalities. It cannot reduce the prompt below H(X|Y) without losing information. Everything above that floor is redundancy — and redundancy is what SLM strips before forwarding to the LLM.
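Both inequalities can be verified on a small joint distribution. The sketch below uses a made-up 3×2 joint table for (X, Y); the numbers are purely illustrative, and the helper names entropy and information_quantities are assumptions, not part of SLM.

```python
import numpy as np

def entropy(dist):
    """Shannon entropy in bits of a (possibly multi-dimensional) probability table."""
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(d * np.log2(d)))

def information_quantities(joint):
    """Return H(X), H(X|Y), I(X;Y) for a joint table with X on rows, Y on columns."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))
    h_y = entropy(joint.sum(axis=0))
    h_xy = entropy(joint)
    h_x_given_y = h_xy - h_y           # chain rule: H(X,Y) = H(Y) + H(X|Y)
    mutual_info = h_x - h_x_given_y    # I(X;Y) = H(X) - H(X|Y)
    return h_x, h_x_given_y, mutual_info

# Hypothetical joint distribution in which X and Y are dependent.
joint = np.array([[0.30, 0.05],
                  [0.10, 0.15],
                  [0.05, 0.35]])
h_x, h_x_given_y, mutual_info = information_quantities(joint)

# Independent case built from the same marginals: I(X;Y) drops to zero.
independent = np.outer(joint.sum(axis=1), joint.sum(axis=0))
_, h_x_given_y_ind, mutual_info_ind = information_quantities(independent)
```

The gap between H(X) and H(X|Y) is exactly I(X;Y): the more the context Y tells you about the prompt X, the lower the floor on what must be sent.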

The 60–95% compression claim is not marketing. It is bounded by information theory. Structured payloads — JSON, code, schema-conformant text — have high mutual information with their schema and surrounding context. Most bytes are redundant given the schema. SLM exploits this: extractive compression preserves keys, signatures, and structural anchors; the schema reconstructs the rest. The bound is provable. The implementation is byte-exact reversible on structured paths.
  • Structured payloads (JSON, code): 60–95% reduction
  • Unstructured prose: 15–40% reduction
  • Byte-exact reversible on structured paths: I(compressed; original) = H(original)
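The effect of shared context on structured payloads can be demonstrated with Python's standard zlib module. This sketch is not SLM's extractive algorithm: it swaps in DEFLATE with a preset dictionary as a stand-in, where the dictionary plays the role of the schema Y that both sides already hold. The schema, payload, and function names are invented for illustration.

```python
import json
import zlib

# Hypothetical shared context: a schema-like template known to both sides.
schema_context = json.dumps({
    "user_id": 0, "display_name": "", "email_address": "",
    "subscription_tier": "", "created_at": "",
}).encode("utf-8")

# Hypothetical payload that conforms to the schema.
payload = json.dumps({
    "user_id": 48213, "display_name": "Alice",
    "email_address": "alice@example.com",
    "subscription_tier": "staff", "created_at": "2024-05-01T12:00:00Z",
}).encode("utf-8")

def compress(data, context=None):
    """DEFLATE-compress data, optionally priming the compressor with shared context."""
    if context:
        compressor = zlib.compressobj(level=9, zdict=context)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()

def decompress(blob, context=None):
    """Invert compress(); the same context must be supplied."""
    if context:
        decompressor = zlib.decompressobj(zdict=context)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()

without_context = compress(payload)
with_context = compress(payload, schema_context)
restored = decompress(with_context, schema_context)
```

Because the keys and punctuation already appear in the shared context, the compressor only has to encode the values, and the round trip still restores the payload byte for byte.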
compression — live example
# Check prompt compression on a structured payload
$ slm compress --status --payload schema.json
↳ original: 4,821 tokens · compressed: 287 tokens · ratio: 94.0%
  algorithm: extractive · reversible: true · H(X|schema) = 287 bits
  provider KV-cache prefix aligned: 90% match · net saving: 97.2%
2604.06392 — Qualixar OS Architecture →
The field

Why information geometry?

Background · Amari & Rao, 1945–1985

A geometry built for probability.

Information geometry, developed by Shun-ichi Amari and C. R. Rao, is the study of probability distributions as geometric objects. The key insight: probability distributions do not live on a flat plane. They live on a curved manifold — the statistical manifold — where the natural notion of distance is the Fisher information metric, not Euclidean distance. Amari's 1985 monograph unified differential geometry and statistics into a single framework that lets you reason about distributions the same way classical geometry reasons about shapes.

Why it fits agent memory

Memories are distributions, not points.

When an agent stores a fact, it does not store a crisp value — it stores a probability distribution over possible values, weighted by confidence at observation time. A memory with confidence 0.97 and one with confidence 0.14 are not points separated by a number. They are distributions separated by a geodesic on the statistical manifold. Treating them as Euclidean points — as cosine similarity does — throws away exactly the information that matters most for reliability. Information geometry keeps it. That is why SLM retrieval improves as the agent operates: the manifold accumulates evidence and the geometry tightens around verified facts.
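As a worked example, treat each memory's confidence c as a two-outcome distribution (c, 1 − c): the fact holds or it does not. That modeling choice is an assumption made here for illustration; the page does not say how SLM turns confidence into a distribution. Under it, the Fisher-Rao formula reduces to a closed form over two terms.

```python
import math

def confidence_distance(conf_a, conf_b):
    """arccos( sqrt(a*b) + sqrt((1-a)*(1-b)) ) for two confidences in [0, 1]."""
    for c in (conf_a, conf_b):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
    overlap = math.sqrt(conf_a * conf_b) + math.sqrt((1 - conf_a) * (1 - conf_b))
    # Clamp so rounding error cannot leave the domain of acos.
    return math.acos(min(1.0, max(0.0, overlap)))

# Confidences taken from the examples on this page.
near = confidence_distance(0.97, 0.91)   # two well-verified memories
far = confidence_distance(0.97, 0.14)    # verified memory vs. near-guess
extreme = confidence_distance(1.0, 0.0)  # certainty vs. its opposite
```

Here near is much smaller than far, and extreme reaches the upper bound π / 2, which matches the intuition that verified memories cluster while guesses sit far from them.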

Published Research

Three papers. All open.

Every claim on this page is traceable to an arXiv preprint. Read the proofs, reproduce the results, cite the work.

01 · Retrieval metric
d_FR(p, q) = arccos( Σ_i √(p_i · q_i) )

Fisher-Rao Distance

Confidence-weighted geodesic on the statistical manifold. Replaces cosine similarity in all recall paths. 3 formal properties. 1 tight bound.

arXiv:2603.14588 →
02 · Lifecycle model
∇_{γ'(t)} γ'(t) = 0  ·  Exp_p(v)

Riemannian Lifecycle

Geodesic consolidation and geometric decay. Memories strengthen along short geodesics to related context. No TTL, no scheduler — geometry decides.

arXiv:2603.02240 →
03 · Compression theory
H(X|Y) ≤ H(X)  ·  I(X;Y) ≥ 0

Information-Theoretic Compression

Shannon-bounded extractive compression. 60–95% on structured payloads. Byte-exact reversible. The bound is proven. The ratio is measured.

arXiv:2604.06392 →