Anthropic Model Context Protocol (MCP): The 2026 Architecture Deep Dive

The Model Context Protocol (MCP) is reshaping how LLM clients talk to external tools and data sources. Unlike ad-hoc integration patterns where every client implements its own tool-calling logic, MCP defines a standardized, transport-agnostic architecture for tool discovery, resource management, and secure execution. This post breaks down MCP’s JSON-RPC 2.0 core, transport layer options (stdio, SSE, HTTP), the initialization handshake, the four core primitives (tools, resources, prompts, sampling), and production deployment patterns used by Anthropic’s Claude Desktop, Claude Code, and ecosystem clients like Cursor and VS Code extensions.

The stakes are high: teams shipping agentic systems now must choose between rebuilding the same tool-calling boilerplate or standardizing on MCP. We’ll examine why this protocol matters, how it actually works on the wire, and the architectural trade-offs you’ll face in production.

Why Model Context Protocol matters in 2026

MCP emerged from a simple observation: Claude’s desktop integration needs (Claude Desktop, Claude Code, Cursor, VS Code extensions) and enterprise tool-use patterns all converge on a repeated problem — how do we let an LLM safely discover, call, and iterate on external tools without reimplementing authorization, error handling, and discovery for each client? The 2026 wave of agentic applications amplified this pressure. Without a standard, every new tool vendor (GitHub, Slack, Postgres, filesystem) had to build custom integrations for Claude, then again for GPT, then again for open-source clients. MCP inverts this: servers implement the protocol once, and any client that understands JSON-RPC 2.0 can work with them. This mirrors the REST API revolution, but for LLM tool use.

In 2026, MCP has become the de facto standard for Claude-based agents. It’s not a requirement — you can call tools directly via custom APIs — but it’s increasingly where ecosystem momentum lies. Anthropic publishes the specification openly at spec.modelcontextprotocol.io, and reference server implementations exist for filesystem, GitHub, PostgreSQL, Slack, Brave Search, and more. The architecture is composable: a single client can wire up multiple MCP servers, aggregate their tools and resources, and route LLM requests to whichever server owns a tool.

MCP Client-Host-Server Architecture

MCP’s core topology is simple. In the spec’s terms, a host application (Claude Desktop, Claude Code, or any app embedding the MCP SDK) owns one client per server connection, and each client maintains a dedicated 1:1 JSON-RPC 2.0 session with its server over a transport (stdio, Server-Sent Events, or streamable HTTP). Servers never talk to each other; the host layer routes messages, aggregates tool/resource manifests, and enforces sandboxing boundaries between connections.

[Figure: MCP client-host-server reference architecture showing client connecting to host layer and multiple servers with tool, resource, and prompt primitives]

The client’s job is simple: send a JSON-RPC request (e.g., “initialize”, “tools/call”, “resources/read”) and wait for a response. It doesn’t know or care where the server lives — whether it’s a subprocess, an HTTP endpoint, or an SSE stream. The host layer abstracts transport. This design has three immediate wins: (1) clients aren’t coupled to specific transports, (2) servers can be deployed anywhere without client code changes, and (3) tool discovery is centralized, enabling capability negotiation before any tool call.

The host layer’s contract is strict: every message in or out is valid JSON-RPC 2.0. A request has an id, method, and params. A response matches the id and carries a result or error. Notifications (requests without an id) are used for async messages like resource updates. This discipline means tooling for JSON-RPC debugging works out of the box — the host can log every message, and clients can replay conversations to understand failures.
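That per-message discipline is easy to enforce in code. Here is a minimal sketch (plain Python, not from any official MCP SDK) that classifies an incoming JSON-RPC 2.0 message as a request, notification, or response:

```python
import json

def classify_message(raw: str) -> str:
    """Classify a JSON-RPC 2.0 message as request, notification, or response."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in msg:
        # Requests carry an id; notifications do not and must get no reply.
        return "request" if "id" in msg else "notification"
    if "result" in msg or "error" in msg:
        return "response"
    raise ValueError("malformed message")
```

A host can run every inbound and outbound message through a check like this before routing or logging it.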

Each MCP server declares what it offers during the Initialize handshake. The client sends an Initialize request listing the client’s name, version, and capabilities (e.g., “I support text/plain resources and tool invocation”). The server responds with its name, version, and list of capabilities it implements: tools (with schemas), resources (with URI patterns), prompts (with templates), and sampling support. From that point on, the client knows exactly which tools it can call and what parameters each tool expects. Schema validation happens client-side, though well-behaved servers validate inputs again rather than trusting the host blindly.

JSON-RPC 2.0 Message Format and Transport

MCP mandates JSON-RPC 2.0 as the wire format (JSON-RPC 2.0 is a standalone specification maintained at jsonrpc.org, not an IETF RFC). Every message is a JSON object. Requests look like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "resources/read",
  "params": {"uri": "file:///path/to/config.yaml"}
}

The host routes it to the right server and returns:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {"contents": [{"uri": "file:///path/to/config.yaml", "mimeType": "text/yaml", "text": "..."}]}
}

Or, on error:

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {"code": -32603, "message": "File not found"}
}

Error codes follow JSON-RPC convention: -32700 (parse error), -32600 (invalid request), -32601 (method not found), -32602 (invalid params), -32603 (internal error), and custom codes for MCP-specific failures (e.g., -32002 for “Resource not found”).

Notifications (messages without an id) flow server-to-client for async updates. When a filesystem watcher detects a file change, the server sends:

{
  "jsonrpc": "2.0",
  "method": "notifications/resources/list_changed"
}

The client processes it but doesn’t send back a response. This is how resource subscriptions work: the client says “I care about changes to this resource,” and the server asynchronously pushes updates whenever the resource changes.

The JSON-RPC layer is agnostic to transport. Three transports are officially supported:

  1. stdio (standard input/output): The server is a subprocess. The client spawns it and speaks to it over stdin/stdout. Each message is newline-delimited JSON. This is the default for local servers (filesystem, git, databases running on the same machine). It has negligible latency and is trivial to debug — just run the subprocess in a terminal and watch JSON flow. No firewall, no TLS, no service discovery required. Downside: only works for subprocesses, not remote services.

  2. Server-Sent Events (SSE): The server runs an HTTP server. The client makes an HTTP POST to send requests and opens an EventSource stream to receive notifications and async responses. This is halfway between stdio and full HTTP — it’s simpler than bidirectional WebSocket but handles streaming well. Used for local services that want to decouple from the subprocess model (e.g., a long-running daemon you want to restart without killing the client).

  3. Streamable HTTP: The server exposes an HTTP endpoint. The client sends JSON-RPC messages via HTTP POST with full request/response semantics, and the server can answer with a plain JSON body or stream chunks back (no particular HTTP version is required). Used for cloud-deployed servers, load-balanced clusters, and multi-tenant services. Requires TLS, auth (API key, mTLS), and server discovery (hardcoded URL or service discovery). Higher latency than stdio, but fully cloud-native and supports scaling.

Choosing a transport is a deployment decision. A reference implementation for one server can ship with stdio as the default (Claude Desktop spawns it), and the same server can expose SSE or HTTP for web clients. The server doesn’t change; only the transport shim changes.
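To make the stdio framing concrete, here is a hypothetical helper (illustrative names, not MCP SDK APIs) that emits one JSON-RPC message per line and correlates responses back to requests by id:

```python
import itertools
import json

class StdioFraming:
    """Newline-delimited JSON-RPC framing with id-based response correlation."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # request id -> method, so responses can be matched

    def encode_request(self, method: str, params: dict) -> str:
        req_id = next(self._ids)
        self.pending[req_id] = method
        msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
        return json.dumps(msg) + "\n"  # one complete message per line

    def decode_line(self, line: str):
        msg = json.loads(line)
        if "id" in msg and msg["id"] in self.pending:
            method = self.pending.pop(msg["id"])
            return ("response", method, msg.get("result", msg.get("error")))
        # No id (or an unknown one): treat as a server-initiated notification.
        return ("notification", msg.get("method"), msg.get("params"))
```

The client would write `encode_request(...)` to the subprocess’s stdin and feed each stdout line through `decode_line`.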

The Four Core Primitives: Tools, Resources, Prompts, Sampling

MCP defines four primitive types that servers expose to clients. Understanding them is key to understanding what MCP enables.

Tools: Function Discovery and Schema

Tools are named functions the LLM can call. Each tool has a name, human-readable description, and a JSON Schema defining its input parameters. The client (Claude) reads the schema during Initialize, and when the user asks Claude to “read the file at /home/alice/config.yaml,” Claude knows to call the read_file tool with schema validation.

{
  "name": "read_file",
  "description": "Read the contents of a text file",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "Absolute path to the file"
      }
    },
    "required": ["path"]
  }
}

The server is free to implement this tool however it wants — filesystem I/O, a database query, a remote API call. The client doesn’t care. Execution is one-way: client sends “tools/call” with the tool name and input, the server processes it (possibly in a sandbox), and returns a result or error. No request for approval, no human-in-the-loop by default (though the host can intercept for security).

Tools are the most visible MCP primitive and power agentic workflows. A GitHub MCP server exposes tools like “list_repositories”, “create_issue”, “get_file_contents”. A database server exposes “query_sql” and “insert_row”. Clients aggregate tools from all connected servers and use their schemas to guide tool selection during LLM inference.
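Client-side, the cached inputSchema lets you reject bad arguments before a tools/call ever reaches the server. A deliberately simplified validator (real clients would use a full JSON Schema library such as jsonschema) covering only required keys and basic types:

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Check tool arguments against a simplified JSON Schema subset."""
    errors = []
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "object": dict, "array": list}
    # Required keys must be present.
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required parameter: {key}")
    # Present keys must have the declared primitive type.
    for key, spec in schema.get("properties", {}).items():
        if key in args and spec.get("type") in type_map:
            if not isinstance(args[key], type_map[spec["type"]]):
                errors.append(f"{key}: expected {spec['type']}")
    return errors
```

Run against the read_file schema above, `validate_args(schema, {"path": 42})` flags the type mismatch before any round-trip to the server.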

Resources: Stateful Data and Subscriptions

Resources are pointers to data that the server manages. Unlike tools (which are functions), resources are data: a file on disk, a GitHub issue, a database row, a Slack channel. Each resource has a URI (e.g., “file:///path/to/file”), a MIME type, and content (either text or binary blob). Resources are read-only from the client’s perspective — the client asks to read a resource, and the server returns its current content.

The key innovation is subscriptions. A client can say “alert me whenever this resource changes” (e.g., “notify me if the file is modified by another process”). The server maintains a watch on the resource, and when it changes, it sends an async notification to the client. This powers real-time collaboration in Claude Code: while the user edits a file, the server watches the filesystem, and any changes made by other processes (e.g., a formatter running in the background) are pushed to Claude immediately.

Resources are also listed and filtered. A client can ask “list all resources matching file://**/*.py” (all Python files), and the server returns a paginated list. The schema includes URI templates, so the client knows which URIs a server supports without querying.
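A sketch of the server-side bookkeeping this implies, with glob matching standing in for URI templates and a simple subscription table (all names are illustrative, not spec-defined):

```python
from fnmatch import fnmatch

class ResourceIndex:
    """Track subscriptions and report who to notify when a resource changes."""

    def __init__(self):
        self.subscribers = {}  # uri -> set of client ids

    def subscribe(self, client_id: str, uri: str):
        self.subscribers.setdefault(uri, set()).add(client_id)

    def changed(self, uri: str):
        # Clients that asked to be told about this URI.
        return sorted(self.subscribers.get(uri, ()))

def filter_resources(resources: list, pattern: str) -> list:
    """Glob-style filter over resource URIs (a simplification of URI templates)."""
    return [r for r in resources if fnmatch(r["uri"], pattern)]
```

On a filesystem change event, the server would send a notification to every client id returned by `changed(uri)`.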

Prompts: Parameterized Templates

Prompts are reusable, parameterized snippets of text that servers can expose. A prompt might be “analyze this code for security issues” or “generate a unit test for this function.” The client can list available prompts, see what arguments they accept, and ask the server to render a specific prompt with specific arguments.

Prompts are less commonly used than tools or resources but powerful for common tasks. Rather than hardcoding a security analysis prompt in the client, a server can expose it as a prompt, making it discoverable and parameter-aware. The server returns the rendered text, which the client then sends to Claude as part of the conversation.

Sampling: LLM Invocation Within Servers

Sampling is MCP’s most unusual primitive. It allows a server to ask the client’s LLM to generate text on the server’s behalf. This is bidirectional: the client has an LLM (Claude), and the server can request samples from it during tool execution.

Use case: A code-review tool (server) is asked to review a function. During execution, the server sends a sampling/createMessage request back to the client, which invokes the LLM with a prompt like “Review this code and suggest improvements.” The LLM generates a response, and the server processes it. This is rare but essential for advanced agentic workflows where the server needs to make decisions based on LLM reasoning without the main client driving the interaction.

Capabilities Negotiation and Initialize Handshake

The Initialize message is the handshake. The client sends:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": {"listChanged": true},
      "sampling": {}
    },
    "clientInfo": {
      "name": "Claude Desktop",
      "version": "0.5.0"
    }
  }
}

The client tells the server what it supports: roots (filesystem-like hierarchies), sampling (ability to invoke the LLM), etc. The server responds with its capabilities:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "tools": {},
      "resources": {},
      "prompts": {}
    },
    "serverInfo": {
      "name": "Filesystem MCP Server",
      "version": "1.0.0"
    }
  }
}

From this point on, the client knows: “This server offers tools and resources, but not prompts or sampling.” The client can then list tools (tools/list) and resources (resources/list) to populate its in-memory catalog. If a later tool call fails with “unknown tool,” the client may refresh its catalog (in case the server reloaded), but it won’t ask about capabilities it already knows the server doesn’t support.

This contract is durable. Once initialized, the client and server are bound for the lifetime of the connection. Re-initialization is not standard; you close the connection and reconnect if either side changes state in an incompatible way.
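In client code, the negotiated capability set becomes a gate. A minimal sketch (illustrative names, not SDK API):

```python
class ServerSession:
    """Remember what an initialized server supports; refuse unsupported calls."""

    def __init__(self, initialize_result: dict):
        self.info = initialize_result["serverInfo"]
        # Capability keys advertised in the initialize result.
        self.capabilities = set(initialize_result["capabilities"])

    def supports(self, capability: str) -> bool:
        return capability in self.capabilities

    def require(self, capability: str):
        if not self.supports(capability):
            raise RuntimeError(
                f"{self.info['name']} does not advertise {capability!r}")
```

Given the filesystem server’s response above, `supports("tools")` is true while `supports("sampling")` is false, so the client never wastes a round-trip on sampling requests.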

[Figure: MCP JSON-RPC initialize handshake and tool call sequence showing client-host-server interactions and async resource notifications]

Transport Layer Deep Dive

Choosing a transport is one of the first MCP deployment decisions. Each transport has different tradeoffs for latency, scalability, and operational complexity.

stdio: The Default, Simplest Path

stdio is the out-of-the-box transport for local development and single-user systems. The client spawns the server as a subprocess, inherits its stdin/stdout, and sends newline-delimited JSON. Every line is a complete JSON-RPC message.

Pros: near-zero latency, trivial debugging (watch stdout), process lifecycle management for free (if the server crashes, the client detects it immediately).

Cons: only works for local subprocesses, scales poorly beyond a few servers (each subprocess is memory overhead), no built-in clustering or failover.

Claude Desktop uses stdio for local servers. When you add a filesystem MCP server to Claude Desktop, it downloads the binary, adds an entry to the config file pointing to the binary path, and Claude spawns it on startup. If it crashes, Claude restarts it. If you want to debug, you can edit the config to run the server in a debugger, and Claude will work with it.

SSE and HTTP: The Cloud-Ready Transports

For servers that need to survive client restarts or serve multiple clients, SSE or HTTP makes sense.

SSE runs an HTTP server. The client makes a POST to /mcp/messages with a JSON body (the request), and gets back a 200 response with the result. For async notifications, the client opens an EventSource stream to /mcp/stream and listens for “message” events. The server can push notifications asynchronously without the client polling.

Streamable HTTP is full request-response at its core. The client makes a POST /mcp/messages with the request, and the response is the result. Server-initiated pushes need an extra channel: either the client opens a long-lived SSE stream (the hybrid model) or it polls (resources/list periodically).

In production, teams typically run MCP servers as HTTP microservices. A client connects to a pool of MCP endpoints via service discovery (Kubernetes service, DNS, load balancer), and the host layer round-robins requests. This decouples client scaling from server scaling. You can have 10 Claude instances talking to 1 filesystem server (shared state is now a concern) or 100 specialized servers each handling one data source.

[Figure: Transport decision tree for choosing stdio, SSE, or HTTP based on deployment topology and latency requirements]

Server Discovery and Security

Local stdio servers are hardcoded in the client config. Remote HTTP servers need discovery. MCP doesn’t mandate a discovery protocol; teams use whatever they have: Kubernetes DNS, Consul, internal service registries, or hardcoded config. A client config might specify:

servers:
  - name: filesystem
    command: /usr/local/bin/mcp-server-filesystem
  - name: github
    type: http
    url: https://mcp-github.example.com
    auth:
      type: bearer
      token: secret-api-key
  - name: database
    type: http
    url: https://mcp-db.example.com/rpc
    auth:
      type: mtls
      cert: /etc/mcp/db-client.crt
      key: /etc/mcp/db-client.key

Security is delegated to the transport. stdio assumes the subprocess is sandboxed at the OS level (kernel namespaces, containers). SSE/HTTP use TLS and authentication (API keys, mTLS, OAuth). The MCP spec doesn’t define authentication; it assumes the host (client) handles it before sending requests to the server.

Server Lifecycle and State Management

An MCP server is stateless by design, but operations assume a clean lifecycle.

A server starts up. The client sends Initialize. The server responds with its capabilities. From that point, the server can be called for any method listed in its capabilities. The server maintains internal state (cached data, subscriptions, watches) for the duration of the connection. If the connection drops, the server cleans up. The next connection starts from Initialize again.

This is different from stateful RPC systems (like gRPC with service mesh). MCP assumes each connection is independent. If you have 10 clients, you have 10 connections and potentially 10 instances of the server’s internal state. For read-only servers (filesystem, GitHub public repos), this is fine. For read-write servers (databases, Slack), teams must be careful about consistency.

State Machine for Server Lifecycle

A server’s lifecycle has discrete states: Idle (not yet initialized), InitWait (client has sent Initialize, server hasn’t responded), Ready (initialized, can accept tool calls), and Subscribed (handling async resource updates). If a tool call fails due to schema mismatch or a missing resource, the server returns an error but stays in Ready state. If a catastrophic error happens (the server crashes), the connection closes and the client reconnects from Idle.

[Figure: MCP server lifecycle state machine showing transitions from Idle through InitWait, Ready, Subscribed, and Shutdown states with error paths]

Production Deployment Patterns

Deploying MCP at scale requires thinking through aggregation, health, and fallback.

Multi-Server Aggregation and Capability Routing

A typical production setup has a capability router that connects to multiple MCP servers and merges their tool/resource lists. When a client asks “list all tools,” the router gathers responses from all servers, deduplicates (if two servers expose the same tool name, pick one or flag an error), and returns the merged list.

Tool execution is routed: the client calls “tools/call” with a tool name. The router looks up which server owns that tool and forwards the request. If the tool fails, the router returns the error. Some routers implement fallback logic: if a server is unresponsive, remove it from the catalog temporarily and route calls elsewhere.
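A toy capability router along these lines, assuming each server object exposes list_tools() and call(name, args) (illustrative interfaces, not SDK APIs):

```python
class CapabilityRouter:
    """Aggregate tools from several MCP servers and route calls to the owner."""

    def __init__(self, servers: dict):
        self.servers = servers        # server name -> server object
        self.owner = {}               # tool name -> server name

    def refresh(self):
        self.owner.clear()
        for sname, server in self.servers.items():
            for tool in server.list_tools():
                # On a name collision, qualify with the server name
                # instead of silently picking one.
                key = tool if tool not in self.owner else f"{sname}:{tool}"
                self.owner[key] = sname

    def call(self, tool: str, args: dict):
        if tool not in self.owner:
            raise KeyError(f"no server owns tool {tool!r}")
        server = self.servers[self.owner[tool]]
        return server.call(tool.split(":")[-1], args)
```

Fallback logic (dropping unhealthy servers, re-running refresh) layers naturally on top of the owner map.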

Health Checks and Failover

Production deployments must handle server failures. A health-check sidecar pings each MCP server periodically (heartbeat every 30 seconds is common). If a server doesn’t respond, the router marks it as unhealthy and removes it from the capability catalog until it recovers. This prevents the client from seeing tools it can’t actually call.

For critical servers (e.g., the main data server), teams run multiple instances behind a load balancer. The router load-balances requests across healthy instances.
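The health bookkeeping itself can be as simple as a last-seen timestamp per server, evicted after a few missed heartbeats. A sketch (the 90-second timeout assumes the 30-second heartbeat interval mentioned above; the injectable clock is just for testability):

```python
import time

class HealthTracker:
    """Mark servers unhealthy when they miss heartbeats for too long."""

    def __init__(self, timeout_s: float = 90.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # e.g. three missed 30 s heartbeats
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, server: str):
        self.last_seen[server] = self.clock()

    def healthy(self) -> set:
        now = self.clock()
        return {s for s, t in self.last_seen.items() if now - t < self.timeout_s}
```

The router would intersect its capability catalog with `healthy()` before answering tools/list.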

Audit Logging and Observability

Every tool call should be logged for audit trails. A typical log entry captures the tool name, input parameters (with sensitive data redacted), the result or error, and the server that handled it. This is essential for debugging and compliance.
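A sketch of such a log entry builder, with a hypothetical redaction list you would extend per deployment:

```python
import json

REDACT_KEYS = {"token", "password", "api_key", "secret"}  # extend per deployment

def audit_entry(tool: str, params: dict, server: str, outcome: str) -> str:
    """Build a structured audit log line with sensitive parameters redacted."""
    safe = {k: ("[REDACTED]" if k.lower() in REDACT_KEYS else v)
            for k, v in params.items()}
    return json.dumps({"tool": tool, "params": safe,
                       "server": server, "outcome": outcome})
```

Emitting one JSON line per call keeps the trail greppable and easy to ship to a log pipeline.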

Observability (Prometheus metrics, structured logs, distributed tracing) should instrument the router: how many tools are currently available, what’s the p99 latency for tool calls, how many servers are unhealthy, are there tools that clients request but don’t exist (mismatched catalogs).

[Figure: Multi-server deployment topology with capability router, load balancer, health checks, and audit logging]

Trade-offs, Gotchas, and What Goes Wrong

MCP’s strength is also its greatest gotcha: tool call semantics are simple, which means error handling is your responsibility.

No transactions, no rollback. If you call a tool that modifies state (e.g., “delete_file”) and it partially succeeds, there’s no way to undo it via MCP. You must implement idempotence and retry logic in the tool itself. If a “transfer money” tool gets called twice due to network retries, the server must detect the duplicate and return the same result without actually transferring twice.

Tool schemas can change between Initialize and execution. A server is allowed to add new tools or change their schemas without reinitializing. The client caches the schema at Initialize time. If you add a required parameter to a tool, old clients will call it without that parameter and fail. Versioning tools is your problem — which usually means keeping the old tool and adding a new one with a new name (read_file_v2).

Resources are eventually consistent. If you subscribe to a resource, you get notifications when it changes, but there’s no guarantee you’ll see every intermediate state. If a file is modified 1000 times in a second, you might only see notifications 1, 500, and 1000. This is acceptable for most use cases but breaks if you’re trying to build a complete log of all changes.

Multi-server queries are slow. If you want to know “which server has a tool called ‘X’?”, you must query all of them (or cache the manifest). In a system with 50 servers, this can be slow. Most deployments cache the manifest and refresh periodically, accepting eventual inconsistency.

Security is transport-specific. MCP doesn’t define authorization — whether a client can actually call a tool is up to the server and transport. HTTP servers often use API keys (weak for multi-user systems), mTLS (better but operationally complex), or OAuth (complex but delegated). stdio servers assume the client is already trusted (it spawned them). There’s no per-tool authorization; you can’t say “Alice can call this tool, but Bob can’t” at the MCP level.

Practical Recommendations

If you’re deploying MCP, follow these patterns:

Start with stdio locally. Download or build the reference servers (filesystem, GitHub, database), point Claude Desktop at them, and test locally. This is zero-friction setup and gives you intuition for how the protocol works.

For multiple servers, use a capability router. Don’t have the client talk to each server directly. Build (or use an open-source) router that aggregates capabilities and routes calls. This simplifies client code and enables health checking.

Implement idempotent tool execution. Every tool that mutates state should be idempotent: calling it twice with the same inputs should have the same effect as calling once. Use request IDs or content hashes to detect and deduplicate retries.
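One way to implement the deduplication: key each mutating call by a hash of the tool name and inputs, and replay the stored result on retries (a sketch; production systems would add TTLs and persistent storage):

```python
import hashlib
import json

class IdempotentExecutor:
    """Replay cached results when the same mutating call arrives twice."""

    def __init__(self, handler):
        self.handler = handler  # the real tool implementation
        self.seen = {}          # request key -> prior result

    def call(self, tool: str, args: dict):
        key = hashlib.sha256(
            json.dumps([tool, args], sort_keys=True).encode()).hexdigest()
        if key in self.seen:
            return self.seen[key]  # duplicate: same result, no second side effect
        result = self.handler(tool, args)
        self.seen[key] = result
        return result
```

With this wrapper, a “transfer money” call retried by the network layer returns the original result instead of transferring twice.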

Log every tool call. Capture the tool name, input, output, errors, and which server handled it. This is essential for debugging and auditing. Redact secrets, API keys, and PII.

Health-check every server. Use a lightweight heartbeat (JSON-RPC keepalive or a dummy tool call) to detect failures quickly. Remove unhealthy servers from the capability catalog.

Test error paths. MCP assumes servers can fail. What happens if a server is down when the client tries to call a tool? Does the client retry? Does it fail gracefully? Test this before deploying to production.

Quick checklist:
– Spawn servers in containers (ephemeral, easy to replace)
– Use SSE or HTTP for production servers (not stdio)
– Implement TLS + mTLS or API key auth for HTTP servers
– Cache capability manifests and refresh on failures
– Monitor tool execution latency and error rates
– Redact sensitive parameters in logs

Frequently asked questions

What is the difference between MCP and OpenAI’s Tool/Function Calling format?

OpenAI’s function-calling format (and Anthropic’s tool_use blocks) define how an LLM says “I want to call this tool” and what parameters to use. They’re prompt-based: the LLM’s response contains the tool name and params as structured JSON, and the client parses it. MCP, by contrast, is a server-side discovery and execution protocol — it’s about how servers expose tools and clients invoke them, independent of the LLM’s reasoning. You can use MCP tools with Claude’s tool_use feature, and Claude will invoke them correctly. The two are orthogonal: MCP handles the plumbing, and tool_use handles the LLM’s decision to call a tool.

Can a single client connect to 100 MCP servers?

Technically yes, but practically no. Each server connection consumes memory (a few MB per server for connection state and buffered data). A client with 100 servers would waste RAM. More importantly, capability discovery becomes slow. Most production deployments cap at 10-20 servers and batch them by function (one server for filesystem, one for GitHub, one for Slack, etc.). If you need more, aggregate servers behind a capability router.

Is MCP a replacement for REST APIs?

No. REST APIs are for general-purpose data access and often expose business logic (billing, authentication, entitlements). MCP is specifically for LLM tool use: it’s the protocol a server implements to say “here are the functions I expose and the data I manage, in a schema the LLM can understand.” Many products will expose both a REST API (for user-facing integrations) and an MCP interface (for LLM clients). They can share the same backend implementation.

What happens if two servers expose the same tool name?

The router must decide which one to use. Options: (1) error out and tell the client there’s a conflict, (2) rename one tool (e.g., “filesystem:read_file” vs “github:read_file”), or (3) use priority rules (if both servers have “query”, prefer the database server). Ideally, servers coordinate names via a registry, but MCP doesn’t enforce this. Team conventions work well in practice.

Can MCP servers call other MCP servers?

Not as defined by the spec. MCP is a client-server protocol: a server can implement a tool that makes external HTTP calls, but the protocol gives it no standard way to act as a client of another MCP server (though nothing stops a server process from embedding its own MCP client in application code). If you want server A to delegate to server B via MCP, the client must orchestrate it (call A, then call B, then reconcile results).

How is MCP versioned?

MCP uses a protocol version (e.g., “2024-11-05”) in the Initialize message. Clients and servers declare what version they support, and they negotiate. If client supports 2024-11-05 and server supports 2024-06-01 and 2024-11-05, they both agree on 2024-11-05. If there’s no overlap, initialization fails. This allows gradual rollouts of new protocol features.
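Under that framing, negotiation reduces to picking the newest version both sides list (the dated version strings sort chronologically as plain strings):

```python
def negotiate(client_versions: list, server_versions: list) -> str:
    """Pick the newest protocol version both peers support, or fail."""
    common = set(client_versions) & set(server_versions)
    if not common:
        raise RuntimeError("no common protocol version; initialization fails")
    return max(common)  # YYYY-MM-DD strings sort chronologically
```

For the example in the paragraph above, the intersection contains only "2024-11-05", so that version wins.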

References

  1. Model Context Protocol Specification — Anthropic — Official MCP spec, transport definitions, and JSON-RPC contract
  2. Model Context Protocol GitHub — modelcontextprotocol/specification — Reference implementations (Node.js, Python), example servers
  3. JSON-RPC 2.0 Specification — jsonrpc.org — JSON-RPC wire format and error codes
  4. Anthropic Blog — Introducing the Model Context Protocol — Design rationale and use cases

Last updated: April 22, 2026. Author: Riju (about).
