
MCP: an in-depth introduction

To abuse a famous quote about monoids , “MCP is an open protocol that standardizes how applications provide context to LLMs, what’s the problem?”

But even after a few hours of reading about what MCP is  and working through an example , it can be confusing to follow exactly what is happening when and where. What does the LLM do? What does the MCP server do? What does the MCP client do? Where does data flow, and where are choices made?

This is an in-depth introduction to MCP: what it is, how it works, and a full walkthrough example showing how everything fits together and exactly what happens at each step.

Specifically, we deployed a deliberately buggy Cloudflare Worker. The error surfaced in Sentry, and an AI assistant (Cline) running inside Visual Studio Code (VS Code) pulled the stack trace via the hosted Sentry MCP Server, opened a matching GitHub issue through a local GitHub MCP Server, patched the code, committed the fix and redeployed - all under human approval. MCP cut the integration work from M × N bespoke bridges to M + N adapters, but it charged us in latency, security diligence, and a healthy learning curve.

Why we need MCP

When an AI assistant has to juggle real-world systems - Sentry for monitoring, GitHub for code, Jira for tickets, and Postgres for data - every extra pairing means another custom adapter, another token shim, another place to break. The result is a hairball of one-off glue code that takes maintenance time and adds security risks. MCP was created to replace that chaos with a single, predictable handshake, so any compliant host can talk to any compliant tool out of the box.

The M × N integration tax

Without MCP, each Large Language Model (LLM) host or agent (such as ChatGPT, Claude, Cline, or VS Code Copilot) plus each tool (such as Sentry, GitHub, Jira, MySQL, or Stripe) requires its own connector - that is, M hosts × N tools pieces of glue code.

Every connector re-implements:

  • Authentication and token refresh
  • Data-format mapping
  • Error handling and retries
  • Rate-limiting quirks
  • Security hardening

The cost grows quadratically. We imagine teams will end up prioritizing a handful of integrations and calling the rest “backlog”.

One protocol to rule the connectors

MCP proposes a USB-C moment for AI tooling: Every host implements MCP once, every tool exposes an MCP Server once, and any host-and-tool pair can talk. Complexity collapses to M + N. This claim sounded too good to ignore, so we put it to the test.

Before we get to our walkthrough, let’s go through a quick primer:

MCP 101

If you already speak LSP or JSON-RPC, you’ll feel at home. If not, here’s the 30-second cheat sheet:

Core vocabulary

| Word | One-liner | Examples | Why it matters |
| --- | --- | --- | --- |
| Host | App that holds the LLM and user UI | ChatGPT, Claude, Cline, VS Code Copilot | Generates tool calls, mediates approvals |
| MCP Client | Library embedded in the host | | Maintains a stateful session per MCP Server |
| MCP Server | Lightweight wrapper in front of a tool | | Exposes tools and resources uniformly |
| Tool | A callable function | Creating a GitHub issue, fetching a Sentry error | Discoverable by the client at runtime; typed JSON args |
| Resource | Text or blob-like thing exposed by the MCP Server | PDF file, log file, query syntax description, documentation | Information the LLM or user can access |
| Transport | How bytes flow between the client and server | stdio or HTTP + SSE | Local or cloud; always JSON-RPC 2.0 |

Stateful by design

MCP insists on a persistent channel, which is usually HTTP + Server-Sent Events (SSE) for remote servers and plain stdio for local processes. The server can remember per-client context (for example, auth tokens, working directory, in-progress job IDs). This is heavier than stateless REST but enables streaming diffs, long-running jobs, and server-initiated callbacks.

Discovery flow

  1. The client calls tools/list to ask the server, “What can you do?”

  2. The server returns JSON describing each tool, including its name, summary, and JSON Schema for the parameters and result.

  3. The host injects that JSON into the model’s context.

  4. When the user’s prompt demands an action, the model emits a structured call (see the sketch after this list).

  5. The MCP Client executes it using the transport and streams back result chunks. The conversation then resumes.
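
As a rough sketch (the tool name and schema below are illustrative, not taken from any specific server), steps 1, 2, and 4 look like this on the wire:

```jsonc
// Step 1: the client asks what the server can do.
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

// Step 2: the server describes its tools (abridged).
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "create_issue",
        "description": "Create a new issue in a repository",
        "inputSchema": {
          "type": "object",
          "properties": {
            "repo": { "type": "string" },
            "title": { "type": "string" }
          },
          "required": ["repo", "title"]
        }
      }
    ]
  }
}

// Step 4: the model emits a structured call against one of those tools.
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "create_issue",
    "arguments": { "repo": "bug-demo", "title": "TypeError in fetch handler" }
  }
}
```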

Our demo scenario

The best way to learn a new protocol is by using it to solve a real problem, so here’s what we’ll do: We’ll create a real problem and build a real solution.

The recurring nightmare of the 5 PM regression alert

To help set the scene, imagine this scenario: It’s Friday afternoon, you’re the last one in the office, and just as you’re packing your bag, Slack starts screaming that there is a new regression in worker.ts:12.

We want to find the shortest route from that first Slack message to a deployed fix.

The demo stack

We want a realistic but snack-sized scenario:

| Component | Role |
| --- | --- |
| Cloudflare Worker (`bug-demo`) | Emits a `TypeError` on every request and reports it to Sentry via the Sentry SDK. |
| Sentry MCP (hosted) | Tools for querying Sentry issues and error details; hosted by Sentry, SSE transport with OAuth. |
| GitHub MCP (local) | Tools for creating repositories, issues, comments, and pull requests; run with Docker, stdio transport with an API key. |
| Cline (VS Code) | MCP host, agent, code editor, and human-approval gate that connects to both MCP Servers. |
A buggy Cloudflare Worker reports exceptions to Sentry, which surface through the hosted Sentry MCP (SSE) into Cline within VS Code. The same session then flows down to a local GitHub MCP (stdio) that is running in Docker, allowing the agent to file an issue, add comments, and push a pull request to the GitHub repository - all under human oversight.

Setup walkthrough

Let’s set up our stack.

Setup requirements

You’ll need the following:

| Requirement | Version/Details | Purpose |
| --- | --- | --- |
| Node.js | ≥ 18 | To build the Worker and run npx helpers |
| Docker Desktop | Latest desktop | To run the GitHub MCP Server locally |
| VS Code | Latest desktop | Code editor |
| Cline | Latest release | MCP host/agent |
| Sentry account | Free plan is fine | To catch our crash |
| GitHub account | Any tier | Access token for MCP Server |
| Cloudflare account | Free plan is fine | To host our buggy code |
You’ll also need an LLM to run the Cline agent. We used Mistral Codestral 25.01  for this demo, but you can use any LLM supported by Cline .

Bootstrap the buggy worker

In the terminal, run:
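
The scaffold command we used was along these lines, assuming the standard create-cloudflare wizard and a project named bug-demo (the name is inferred from the Worker URL later in this guide):

```bash
npm create cloudflare@latest -- bug-demo
```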

When prompted, select the scaffolding options you prefer (we used a TypeScript Worker).

Change into your new project directory:
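
Assuming the project directory matches the name chosen above:

```bash
cd bug-demo
```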

Install the Sentry SDK npm package:
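
The Sentry SDK for Cloudflare Workers is published as @sentry/cloudflare; the exact package may differ depending on the setup guide you follow:

```bash
npm install @sentry/cloudflare
```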

Open your project in VS Code:
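
For example, using the code CLI (assuming it is on your PATH):

```bash
code .
```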

Edit wrangler.jsonc and add the compatibility_flags array with one item, nodejs_compat:
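
The relevant portion of wrangler.jsonc might look like this (other generated fields omitted):

```jsonc
{
  "name": "bug-demo",
  "main": "src/index.ts",
  "compatibility_flags": ["nodejs_compat"]
}
```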

Visit the Sentry setup guide for Cloudflare Workers  and copy the example code. Paste it in src/index.ts and then add the intentional bug in the fetch() method.

Edit src/index.ts:
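
A sketch of what the buggy Worker might look like, based on the Sentry Cloudflare Workers guide; the DSN is a placeholder and the wrapper options may differ in your version of the SDK:

```ts
import * as Sentry from "@sentry/cloudflare";

export default Sentry.withSentry(
  (env) => ({
    // Placeholder DSN: use the value from your Sentry project settings.
    dsn: "https://<key>@<org>.ingest.sentry.io/<project>",
    tracesSampleRate: 1.0,
  }),
  {
    async fetch(request, env, ctx): Promise<Response> {
      // Intentional bug: calling .call() on undefined throws a TypeError
      // on every request, which the Sentry SDK captures and reports.
      (undefined as any).call();
      return new Response("Hello World!");
    },
  } satisfies ExportedHandler,
);
```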

Deploy and trigger:
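
Something along these lines (the exact wrangler invocation depends on your scaffold's scripts):

```bash
npx wrangler deploy
curl https://bug-demo.<your-cf-hostname>.workers.dev
```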

In your browser, visit your Cloudflare Worker at https://bug-demo.<your-cf-hostname>.workers.dev.

You should see the following error:

Cloudflare Worker error page displaying TypeError exception

Set up the Sentry MCP Server in Cline

In VS Code, with Cline installed, follow the steps below:

💡 npx: You may need to adjust configuration settings depending on the path of your Node and npx installation. For this guide, we used Node and npx installed with Homebrew.

Cline VS Code extension showing the MCP Servers configuration panel

  1. Click the Cline (robot) icon in the VS Code sidebar.

  2. Click the MCP Servers toolbar button at the top of the Cline panel.

  3. Select the Installed tab.

  4. Click Configure MCP Servers.

  5. Paste the Sentry MCP Server config JSON that runs npx mcp-remote https://mcp.sentry.dev/sse in the window (an example config follows this list), then press Cmd + s to save.

  6. Click Done in the top-right corner of the panel.
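
The config we pasted looked roughly like this; the "sentry" key is an arbitrary label, and the npx path may need to be absolute depending on your installation (see the note above):

```json
{
  "mcpServers": {
    "sentry": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.sentry.dev/sse"]
    }
  }
}
```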

After saving the MCP configuration, your browser should open with a prompt to authorize Remote MCP.

Sentry OAuth authorization page for Remote MCP Server access

Click Approve so that the application can allow the Sentry MCP Server to connect with your Sentry account.

Set up the GitHub MCP server in Cline

Generate a GitHub personal access token:

  1. In GitHub, click your profile picture to open the right sidebar.

  2. Click Settings in the sidebar.

  3. Click Developer settings at the bottom of the left sidebar.

  4. Expand Personal access tokens in the left sidebar.

  5. Click Fine-grained tokens.

  6. Press the Generate new token button.

  7. Enter any name for your token, and select All repositories.

  8. Select the following permissions:

    • Administration: Read and Write
    • Contents: Read and Write
    • Issues: Read and Write
    • Pull requests: Read and Write
  9. Click Generate token and save your token for the next step.

Now that we have a GitHub token, let’s add the GitHub MCP Server.

Configuring the GitHub MCP server in Cline's settings panel

  1. Click the Cline (robot) icon in the VS Code sidebar.

  2. Click the MCP Servers toolbar button at the top of the Cline panel.

  3. Select the Installed tab.

  4. Press Configure MCP Servers.

  5. Paste the GitHub MCP Server config JSON that runs docker run -it --rm -e GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_PERSONAL_ACCESS_TOKEN ghcr.io/github/github-mcp-server in the window (an example config follows this list), then press Cmd + s to save.

  6. Click Done.
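
Here is a sketch of one way to wire this up; we drop the -t flag in the config because Cline pipes stdio rather than attaching a terminal, and the token placeholder must be replaced with the fine-grained token generated above:

```json
{
  "mcpServers": {
    "github": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}
```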

Let’s take it for a spin.

Create a GitHub repository

We’ll use the GitHub MCP Server to create a new repository for our demo.

We asked Cline to create a new GitHub repository for the demo project.

Here’s what happened next:

System prompt with tools and task

Cline sent a completion request to the LLM. The request contained our prompt, a list of tools, and the tool schemas. You can see how Cline built this prompt in the Cline repository at src/core/prompts/system.ts.

LLM generates a tool call

The LLM generates a tool call based on the prompt and the tools available. The tool call is a structured XML object that contains the name of the MCP Server, the name of the tool to be called, and the arguments to be passed to it.
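
A sketch of what such a call can look like in Cline's format; the server key, tool name, and argument values are illustrative:

```xml
<use_mcp_tool>
<server_name>github</server_name>
<tool_name>create_repository</tool_name>
<arguments>
{
  "name": "bug-demo",
  "private": true
}
</arguments>
</use_mcp_tool>
```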

Cline sends the tool call to the MCP Server

Cline sends the tool call to the GitHub MCP Server using the stdio transport.

  • The message is sent over the already-open stdio pipe as a single UTF-8 line, typically terminated by \n, so the server can parse it line by line.
  • The id is an opaque request identifier chosen by Cline; the server will echo it in its response to let the client match replies to calls.
  • All MCP tool invocations follow the same structure - only the method and parameters change.
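
On the wire, the request is a single JSON-RPC tools/call message, roughly like this (the id and argument values are illustrative):

```json
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"create_repository","arguments":{"name":"bug-demo","private":true}}}
```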

GitHub MCP Server processes the request

The GitHub MCP Server receives the tool call and processes it. It calls the GitHub API to create a new repository with the specified name and privacy settings, then parses the response from the API.

GitHub MCP Server sends the response back to Cline

The GitHub MCP Server sends the response back to Cline over the stdio transport.

  • The response contains the id of the request and the result of the tool call.
  • The result is a JSON object that contains the details of the newly created repository.
  • Cline receives the response and displays it in the UI.
  • The response is also passed to the LLM for further processing and is now available in the context for the next prompt.
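
A sketch of the shape of that response; the repository details are abbreviated and the field values are illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"name\": \"bug-demo\", \"html_url\": \"https://github.com/<you>/bug-demo\", \"private\": true}"
      }
    ]
  }
}
```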

Cline displays the result

Cline displays the result of the tool call in the UI, and the LLM can use it in subsequent prompts.

Cline pushes the repository to GitHub

Cline then pushes the local project to the newly created GitHub repository using git commands in the terminal.

Fixing the bug using MCP

That was the fiddly part, which only happens once. Now we can fix the bug.

Giving Cline a task

In Cline, we gave it a task along these lines: investigate the latest Sentry issue, open a matching GitHub issue, and fix the bug.

Cline sends the request to the LLM

Cline sends the request to the LLM, which generates a tool call to the Sentry MCP Server to fetch the latest issue.

The JSON-RPC request sent to the Sentry MCP Server looks like this:
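
(The tool and argument names below are illustrative; the hosted Sentry MCP Server publishes its exact tool list via tools/list.)

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "find_issues",
    "arguments": { "organizationSlug": "<your-org>", "sortBy": "last_seen" }
  }
}
```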

Sentry MCP Server processes the request

The Sentry MCP Server processes the request and calls the Sentry API to fetch the latest issue. It returns the result to Cline.

Cline analyzes the Sentry issue

After receiving the issue from Sentry, Cline requests additional details to understand the stack trace, and the Sentry MCP Server returns the full error details.

The stack trace reveals the exact nature and location of the bug: It’s due to the (undefined).call() in src/index.ts on line 12.

Sentry error details showing the stack trace and TypeError

This response is passed to the LLM, which uses it to generate the next tool call.

Cline creates a GitHub issue

Next, at the request of the LLM, Cline uses the GitHub MCP Server to create an issue documenting the bug:
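
(A sketch of the call; the tool name, argument names, and issue text are illustrative.)

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "create_issue",
    "arguments": {
      "owner": "<you>",
      "repo": "bug-demo",
      "title": "TypeError: Cannot read properties of undefined (reading 'call')",
      "body": "Sentry reports a TypeError thrown from the fetch handler in src/index.ts:12."
    }
  }
}
```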

The GitHub MCP Server confirms that the issue has been created:

GitHub issue created about the Sentry error

Cline examines the codebase

To fix the bug, the LLM needs to have the code in context. The LLM initiates a tool call to read the source code of the Cloudflare Worker.

Since we’re working directly in the VS Code editor, Cline uses the read_file tool:
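
(A sketch of the call in Cline's tool format; the tag names mirror Cline's built-in read_file tool.)

```xml
<read_file>
<path>src/index.ts</path>
</read_file>
```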

Cline sends the result of the tool call to the LLM, which now has the full context of the codebase.

After examining the code, the LLM responds with a proposed fix.

The LLM generates the fix and Cline applies it

The LLM generates a fix for the bug, which is then sent to Cline:
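
The fix boils down to removing the intentional bug. A sketch of the patched src/index.ts, with the same placeholder DSN as before:

```ts
import * as Sentry from "@sentry/cloudflare";

export default Sentry.withSentry(
  (env) => ({
    dsn: "https://<key>@<org>.ingest.sentry.io/<project>",
  }),
  {
    async fetch(request, env, ctx): Promise<Response> {
      // The (undefined).call() line is gone; the Worker now responds normally.
      return new Response("Hello World!");
    },
  } satisfies ExportedHandler,
);
```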

Cline creates a new branch and commits the fix

Cline opens the VS Code terminal, creates a new branch for the fix, commits the change, and pushes the branch to GitHub. The git commands it runs look roughly like the sketch below.
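
(The branch name and commit message are assumptions.)

```bash
git checkout -b fix/sentry-typeerror
git add src/index.ts
git commit -m "Remove call on undefined in fetch handler"
git push -u origin fix/sentry-typeerror
```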

Cline opens a PR

Finally, Cline creates a pull request (PR) with the fix:
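
(A sketch of the call; the tool name, branch name, and field values are illustrative.)

```json
{
  "jsonrpc": "2.0",
  "id": 9,
  "method": "tools/call",
  "params": {
    "name": "create_pull_request",
    "arguments": {
      "owner": "<you>",
      "repo": "bug-demo",
      "title": "Fix TypeError in fetch handler",
      "head": "fix/sentry-typeerror",
      "base": "main"
    }
  }
}
```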

The GitHub MCP Server confirms the PR creation.

GitHub PR showing the bug fix about Sentry issue

Human approval and deployment

The final step requires human approval. The developer reviews the PR, approves the changes, merges the PR, and deploys:
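
A sketch of the final commands after merging the PR:

```bash
git checkout main
git pull
npx wrangler deploy
```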

A quick visit to the Cloudflare Worker URL confirms the fix is working, and Sentry shows no new errors.

Fixed Cloudflare Worker successfully serving Hello World response

What we learned

We learned a lot about MCP, both the good and the bad. Here are our key takeaways:

The good: Protocol unification

The MCP approach delivered on its promise. We only had to set up each service once, rather than having to build custom bridges between every pair of services. This was our first concrete experience with the theoretical M + N vs M × N difference.

Consider our modest demo setup with just three components (the Cline host, Sentry server, and GitHub server):

| Integration approach | Number of integrations needed |
| --- | --- |
| Direct APIs (M × N) | 3 × 2 = 6 integrations |
| MCP (M + N) | 3 + 2 = 5 components |

The difference is small, with just a few components, but scales dramatically. With 10 hosts and 10 tool backends (100 potential connections), MCP requires just 20 adapters.

Moreover, we found that the JSON Schema system improved tooling discoverability. When Cline connected to a server, it automatically received comprehensive documentation about available operations without having to consult external API references.

The challenges: Latency, security, learning curve

MCP may not be suitable for all use cases. We encountered several challenges that could limit its applicability.

Latency

The MCP approach introduces additional layers between the LLM and the APIs. This comes with a latency cost.

In our testing, each MCP hop added a small amount of latency overhead, which is negligible for most use cases but could become significant in latency-sensitive applications or during complex multi-step workflows.

This isn’t a flaw in the protocol, but rather a trade-off for the convenience of having a single, unified interface. The latency is manageable for most applications, but it’s worth considering if you’re building a performance-critical application.

Security

Authentication represents one of the more challenging aspects of MCP. The protocol requires the secure handling of access tokens, which introduces additional security considerations.

  1. Token management: Each server requires its own authentication approach (OAuth for Sentry, API tokens for GitHub).
  2. Permission scoping: The user needs to grant permissions for each server, which can be cumbersome.
  3. Token refresh: OAuth flows with refresh tokens add complexity.

This may be a symptom of the relative immaturity of the ecosystem, but most MCP Clients do not yet support OAuth flows or token refreshes. This is exactly why the Sentry MCP Server is called via npx mcp-remote, a local proxy MCP Server that handles the OAuth flow and token refresh for you (as in the Sentry configuration shown earlier).

Learning curve

While the protocol is reasonably straightforward for developers familiar with JSON-RPC, we encountered a few hurdles.

  1. Sparse documentation: The specification is comprehensive, but practical guides are limited.
  2. Debugging challenges: When tools failed, error messages weren’t always clear. The first time we tried to run the Sentry MCP Server, we encountered an authentication error that was difficult to diagnose.

Ecosystem maturity

MCP is still in its early days. The available servers are primarily reference implementations rather than production-ready services. We identified several areas needing improvement.

  1. Standardization: Common operations (like CRUD) aren’t yet standardized across servers.
  2. Host support: LLM host support is limited and extremely variable. Some hosts support only tools, while others support resources and tools. Some hosts support only stdio, while others support SSE.

Despite these challenges, the direction is promising. Each new MCP Server adds disproportionate value to the ecosystem by enabling any compliant host to connect to it.

Future improvements

Here’s a wishlist of improvements we think would benefit the MCP ecosystem:

Performance optimizations

Future MCP implementations could benefit from several performance optimizations:

  1. Connection optimization: Using streaming HTTP instead of SSE for long-lived connections would reduce the need for persistent connections and improve performance. (This is already in development in the MCP SDK and spec.)
  2. Schema caching: Hosts and LLMs could cache tool schemas to avoid repeated discovery calls and wasted token usage.
  3. Request batching: Grouping related operations into a single request would increase efficiency by reducing round trips between the client and server.
  4. Partial schema loading: Loading only the schemas for tools likely to be used in a given context could reduce token usage and improve tool selection.

Our Cline session for this problem sent 413,300 tokens and received 2,100 in total. This is a lot of tokens for a single session, and we could have saved plenty of tokens by caching the schemas and using partial schema loading.

Enhanced security

The security model could be strengthened with:

  1. Granular permissions: Token scopes limited to specific tools rather than entire servers would allow for granular permissions. This is part of the MCP specification (in the form of MCP Roots) but isn’t widely supported by clients or servers yet.
  2. Approval workflows: Using more sophisticated approval UIs for dangerous operations could ensure the user is aware of the implications of each action and help them avoid prompt injection attacks.
  3. Audit logging: Comprehensive tracking of all MCP operations would improve the security model.

Verdict and recommendations

We’re excited about the potential of MCP and think the protocol will become a key part of the AI ecosystem. However, we’d recommend caution for production use. Audit the security of any MCP Server you use and be prepared to work around the ecosystem’s current fragmentation.

While everyone is still figuring out the security details, perhaps deactivate YOLO mode and stick to human-in-the-loop workflows for now.
