A research investigation into federated MCP gateways. One hub process, every MCP server, every AI client. Tool schemas discovered on demand rather than preloaded. Cache, cost telemetry, and retrieval learning shared across concurrent sessions.
Each AI coding session that integrates with MCP inherits a per-session architecture that was designed for one-to-one client-to-server use. At cross-session scale the design produces three observable regressions.
Each session spawns its own child MCP server processes. With N sessions and M MCP servers, the process count grows as O(N·M). At five sessions and thirty-six MCPs the resident process set reaches 180 processes and roughly 9 GB of memory, before any model call has been made.
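The O(N·M) growth above is simple arithmetic; a minimal sketch, where the ~50 MB average resident size per process is an assumption implied by the reported 180-process / ~9 GB measurement rather than a figure stated in the text:

```python
# Back-of-the-envelope model of per-session MCP process growth.
MB_PER_PROCESS = 50  # ASSUMPTION: implied by 180 processes / ~9 GB


def process_footprint(sessions: int, mcps: int) -> tuple[int, float]:
    """Return (process count, resident memory in GB) for N sessions x M MCPs."""
    procs = sessions * mcps              # every session spawns its own children: O(N*M)
    mem_gb = procs * MB_PER_PROCESS / 1024
    return procs, mem_gb


procs, mem = process_footprint(5, 36)    # the reference workload
# 5 sessions x 36 MCPs -> 180 processes, ~8.8 GB before any model call
```

A hub that owns the server lifecycle collapses the first factor: the process count becomes O(M) regardless of how many sessions connect.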
Every MCP tool schema is preloaded into each session’s context window. 400+ tool definitions consume approximately 150K tokens; on a 200K-context model, 75% of the window is spent on tool schemas rather than task content.
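The context-budget claim above follows directly from the stated figures; a worked version, where the ~375-token per-tool average is implied by the 150K / 400 ratio rather than measured:

```python
# Context-window budget with preloaded tool schemas (figures from the text).
SCHEMA_TOKENS = 150_000   # ~400 tool definitions preloaded up front
CONTEXT_TOKENS = 200_000  # total window on a 200K-context model
TOOL_COUNT = 400

schema_share = SCHEMA_TOKENS / CONTEXT_TOKENS   # fraction spent on schemas: 0.75
task_tokens = CONTEXT_TOKENS - SCHEMA_TOKENS    # what remains for the task: 50_000
per_tool = SCHEMA_TOKENS / TOOL_COUNT           # implied average: ~375 tokens/tool
```

With on-demand retrieval, only the handful of schemas a task actually touches ever enters the window; the other ~395 definitions cost nothing.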
Independent sessions cannot share cache, cost ledgers, or retrieval signals. Every session re-computes the same lookups, re-pays the same API costs, and re-discovers the same failure modes. There is no shared substrate for learning.
A hub process owns the MCP server lifecycle. AI clients connect to the hub over HTTP and interact exclusively through three meta-tools. Tool schemas are retrieved on demand. Cache, cost telemetry, and optional retrieval learning are shared across every connected client.
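A minimal sketch of the meta-tool surface described above. The meta-tool names (`list_tools`, `get_tool_schema`, `call_tool`), the in-memory registry, and the example `fs` server are all illustrative assumptions, not the reference implementation's actual API; a real hub would own child MCP server processes and proxy calls to them, with the shared cache and cost telemetry hooked into the call path.

```python
from typing import Any, Callable


class Hub:
    """Sketch of a federated gateway: clients interact only through three
    meta-tools, and full schemas are fetched on demand, never preloaded."""

    def __init__(self) -> None:
        # server -> tool -> (schema, handler). A real hub would proxy to
        # child MCP server processes instead of calling local handlers.
        self._registry: dict[str, dict[str, tuple[dict, Callable[..., Any]]]] = {}

    def register(self, server: str, tool: str, schema: dict,
                 handler: Callable[..., Any]) -> None:
        self._registry.setdefault(server, {})[tool] = (schema, handler)

    # --- the three meta-tools (names are hypothetical) ---
    def list_tools(self) -> list[str]:
        """Meta-tool 1: cheap name-only listing; no schemas enter the context."""
        return [f"{s}/{t}" for s, tools in self._registry.items() for t in tools]

    def get_tool_schema(self, server: str, tool: str) -> dict:
        """Meta-tool 2: retrieve a single tool schema on demand."""
        return self._registry[server][tool][0]

    def call_tool(self, server: str, tool: str, **kwargs: Any) -> Any:
        """Meta-tool 3: invoke through the hub; the shared cache and
        cost-telemetry hooks would sit on this path."""
        return self._registry[server][tool][1](**kwargs)


hub = Hub()
hub.register("fs", "read_file",
             {"type": "object", "properties": {"path": {"type": "string"}}},
             lambda path: f"<contents of {path}>")
hub.list_tools()                              # ['fs/read_file']
hub.call_tool("fs", "read_file", path="/tmp/x")
```

Because every client goes through the same three entry points, a session pays context cost only for the schemas it actually requests, and every call is observable at one chokepoint.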
Measurements taken on a reference workload of five concurrent AI coding sessions with thirty-six MCP servers. Absolute values depend on the client, model, and specific MCP set in use; the relative shape is stable.
The reference implementation is distributed on PyPI and npm. Source under AGPL v3. The artifact runs locally with no cloud dependency and is intended to be forked, studied, and critiqued.