
by Sushanth Kummitha
When we built AskNeedl, our flagship Q&A chatbot at Needl (needl.ai), the early results were promising. It could dig deep into user-linked data sources such as annual reports, presentations, and exchange filings, and produce insightful answers using RAG and structured prompts.
But we quickly hit a wall: prompt quality was everything.
If the prompt wasn’t perfectly framed, or if we didn’t guide the model through a series of intermediate reasoning steps, the output would drop in quality fast. What we needed was a system that could break down, reframe, and enrich user queries dynamically.
That’s when we started exploring the Model Context Protocol (MCP). By exposing AskNeedl’s core capabilities (tools, prompts, data access) via a fully async, SSE-capable MCP server, we let external clients like Claude call into them directly.
Model Context Protocol (MCP) is a specification for exposing prompts, tools, and resources over a network interface.
In our case, MCP became the way we let outside models like Claude interact with AskNeedl’s engine — treating it less like a chatbot, and more like an extensible Q&A backend.
From the official MCP intro:
“MCP enables agents and runtime environments to fetch model input context dynamically, from structured remote sources.”
Through MCP, we turned our LLM expertise into remote-callable capabilities, giving us a structured way to expose our tools, prompt templates, and data access.
This turned AskNeedl from a monolith into a context provider, capable of powering external agents through composable APIs.
With MCP, AskNeedl became a backend for anyone building smart model workflows — not just our UI.
The async-first, SSE-powered implementation means external clients can stream responses, invoke tools concurrently, and get partial answers immediately.
In a system where every request might stream a response, invoke several tools concurrently, or wait on slow upstream calls, async isn't optional.
We designed our server so that every layer, from request parsing to final tool execution, runs on `asyncio`. This keeps concurrent requests from blocking one another and keeps per-request state isolated.
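To make that concrete, here is a minimal sketch (the URLs and helper names are illustrative, not from our codebase) of the kind of fan-out an async tool can do: several upstream lookups run concurrently instead of back to back.

import asyncio
import httpx

# Illustrative only: fetch several upstream documents concurrently.
# On a sync stack these calls would run sequentially and block the worker.
async def fetch_json(client: httpx.AsyncClient, url: str) -> dict:
    resp = await client.get(url)
    resp.raise_for_status()
    return resp.json()

async def gather_context(urls: list[str]) -> list[dict]:
    async with httpx.AsyncClient(timeout=30) as client:
        return await asyncio.gather(*(fetch_json(client, u) for u in urls))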
We use Python’s `contextvars` to isolate auth context across coroutines:
import contextvars
token_ctx: contextvars.ContextVar[str] = contextvars.ContextVar("token")
This allows us to read the caller's token anywhere in a request's coroutine chain without passing it through every function call.
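A quick standalone snippet (not from our server) shows why this is safe under concurrency: each task sees only the value it set, even while tasks interleave on the same event loop.

import asyncio
import contextvars

token_ctx: contextvars.ContextVar[str] = contextvars.ContextVar("token")

async def handle(name: str, token: str) -> None:
    token_ctx.set(token)       # each task sets its own value
    await asyncio.sleep(0.01)  # simulate awaiting downstream work
    print(name, token_ctx.get())

async def main() -> None:
    await asyncio.gather(handle("req-1", "key-a"), handle("req-2", "key-b"))

asyncio.run(main())  # prints "req-1 key-a" and "req-2 key-b"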
We use a `Bearer` token in the `Authorization` header as the auth layer that secures tool access.
Example header:
Authorization: Bearer my-api-key
Our middleware extracts the token, stores it in an async-safe `contextvar`, and validates it. This design ensures every tool call is authenticated and that each request's token stays isolated from concurrent requests.
Sample logic:
token = request.headers.get("Authorization", "").replace("Bearer ", "")
token_ctx.set(token)
if not token:
    return JSONResponse({"error": "Unauthorized"}, status_code=401)
@mcp.tool(name="ask_feed")
async def ask_feed(prompt: str, feed_id: str) -> AskNeedlResponse:
    # Read the per-request token set by the auth middleware
    token = token_ctx.get()
    return await _ask_needl_request({"prompt": prompt, "feed_id": feed_id}, token)
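`_ask_needl_request` is just a thin async wrapper over our internal HTTP API. Here is a minimal sketch, assuming `httpx`; the endpoint URL and the `AskNeedlResponse` fields shown are placeholders, not the real ones.

import httpx
from pydantic import BaseModel

class AskNeedlResponse(BaseModel):
    answer: str
    sources: list[str] = []

ASKNEEDL_API_URL = "https://example.com/askneedl"  # placeholder endpoint

async def _ask_needl_request(payload: dict, token: str) -> AskNeedlResponse:
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            ASKNEEDL_API_URL,
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        return AskNeedlResponse(**resp.json())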
We expose reusable prompt templates through MCP prompts in the same way:
@mcp.prompt(name="summarize_annual_report")
async def summarize_annual_report(message: str) -> str:
    return f"{ANNUAL_REPORT_SUMMARY_PROMPT}\n\nUser Input:\n{message}"
This lets external agents construct well-scaffolded input prompts by simply passing a message string. The structure, tone, and format are handled remotely.
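For a sense of what that scaffold looks like, here is a hypothetical stand-in for `ANNUAL_REPORT_SUMMARY_PROMPT` (the real template is longer and more prescriptive):

# Hypothetical scaffold; the production prompt is more detailed.
ANNUAL_REPORT_SUMMARY_PROMPT = """You are a financial analyst assistant.
Summarize the annual report the user refers to.
Cover revenue and margin trends, segment performance, guidance, and key risks.
Answer in concise bullet points and cite the sections you relied on."""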
def create_sse_server(mcp: FastMCP):
    app = mcp.http_app(path="/sse", transport="sse", middleware=[…])
    return Starlette(routes=[Mount("/", app=app)])
SSE streams tokens or structured payloads as they're generated, which is perfect for long-running tool calls and clients that want partial answers immediately.
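From the client side, connecting to the `/sse` endpoint and calling a tool takes only a few lines. Here is a sketch using the official `mcp` Python SDK (install it separately; the URL, API key, and `feed_id` are placeholders, and import paths can differ between SDK versions):

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    headers = {"Authorization": "Bearer my-api-key"}  # placeholder key
    async with sse_client("http://localhost:8000/sse", headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # e.g. ["ask_feed", ...]
            result = await session.call_tool(
                "ask_feed",
                {"prompt": "Summarize the latest exchange filing", "feed_id": "<feed_id>"},
            )
            print(result)

asyncio.run(main())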
class AuthenticateRequestMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        token = extract_token(request)
        token_ctx.set(token)
        if not validate_token(token):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
        return await call_next(request)
Because the middleware is applied at the application level, it protects all routes.
We use structured logs throughout:
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)
logger.info(f"Calling tool {tool_name} with params: {params}")  # inside a tool dispatcher
Why it matters: every tool invocation is traceable, with the tool name and parameters right in the log line, which makes debugging calls from remote clients much easier.
This guide is for developers who want to build and deploy their own Remote MCP server. Whether you’re working on internal LLM orchestration, agent frameworks, or enterprise tooling — here’s a clear path.
mkdir my-mcp-server && cd my-mcp-server
python -m venv venv && source venv/bin/activate
pip install fastapi fastmcp httpx uvicorn starlette
Use `contextvars` to store the per-request token safely:
# context.py
import contextvars
token_ctx = contextvars.ContextVar("token")
# middlewares/auth.py
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware
from context import token_ctx
class AuthenticateRequestMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        token = request.headers.get("Authorization", "").replace("Bearer ", "")
        token_ctx.set(token)
        if not token:
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
        return await call_next(request)
# server.py
from fastapi import FastAPI
from fastmcp import FastMCP
from starlette.routing import Mount
from starlette.applications import Starlette
from starlette.middleware import Middleware
from middlewares.auth import AuthenticateRequestMiddleware
from context import token_ctx
mcp = FastMCP("My Remote MCP")
# Register tools or prompts here (see next step)
def create_sse_app():
    mcp_app = mcp.http_app(
        path="/sse",
        transport="sse",
        middleware=[Middleware(AuthenticateRequestMiddleware)],
    )
    return Starlette(routes=[Mount("/", app=mcp_app)], lifespan=mcp_app.router.lifespan_context)
app = FastAPI()
app.mount("/", create_sse_app())
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        "server:app",
        host="0.0.0.0",
        port=8000,  # or read the port from your config/env
        loop="asyncio",
    )
# tools.py
from pydantic import BaseModel
import httpx  # handy for tools that call external HTTP APIs

from context import token_ctx
from server import mcp  # the FastMCP instance created in server.py; import this module there so the decorator runs

class ResponseModel(BaseModel):
    result: str

@mcp.tool(name="echo_tool", description="Echoes the input message.")
async def echo_tool(message: str) -> ResponseModel:
    # Echo the message plus the caller's token set by the auth middleware
    token = token_ctx.get()
    return ResponseModel(result=f"Echo: {message}, Token: {token}")
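Prompts register the same way as tools. A minimal example (the name and wording are illustrative):

# prompts.py
from server import mcp  # reuse the same FastMCP instance

@mcp.prompt(name="summarize_text")
async def summarize_text(message: str) -> str:
    return (
        "Summarize the following text in three concise bullet points.\n\n"
        f"User Input:\n{message}"
    )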
#!/bin/bash
set -e

TIMEOUT=180
WORKERS=1
THREADS=8
PORT=${PORT:-8000}

gunicorn -b 0.0.0.0:$PORT \
  -k uvicorn.workers.UvicornWorker \
  src.askneedl_mcp_server.server:app \
  --timeout $TIMEOUT \
  --workers $WORKERS \
  --threads $THREADS \
  --max-requests 1000 \
  --max-requests-jitter 100 \
  --log-level "info"
One way is to use the official MCP Inspector tool, which offers an interactive interface to test and debug your remote MCP server's liveness, resources, prompts, and tools, all without writing extra code. It supports direct inspection of locally developed or packaged servers via npx, making it ideal for rapid iteration and validation during development.
Another way to test your MCP server locally, especially with LLM agents like Claude, is the `mcp-remote` npm package. It acts as a shim/proxy that bridges a local stdio-based MCP client to your remote SSE endpoint.
Even if your MCP server is hosted remotely, the Claude desktop app does not allow remote integrations unless you're on the Claude Max plan. Using `mcp-remote`, however, you can bridge that gap on any plan.
Here’s how:
{
  "mcpServers": {
    "my_mcp_server": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote@latest",
        "http://0.0.0.0:8000/sse",
        "8000",
        "--allow-http",
        "--transport",
        "sse-only",
        "--header",
        "Authorization: Bearer ${API_KEY}"
      ],
      "env": {
        "API_KEY": "<your_api_key>"
      }
    }
  }
}
How this works: Claude launches `mcp-remote` via `npx`, which connects to your server's `/sse` endpoint, attaches the `Authorization` header built from the `API_KEY` environment variable, and relays tool calls between Claude and your server over the SSE transport.
This is a great way to test your tools locally before deploying them.
Learn more: npmjs.com/package/mcp-remote
Bearer-token auth, async-safe context isolation, and structured logging together help ensure the MCP server is secure, stable, and extensible for broader use.
MCP transformed AskNeedl from a chatbot into a platform.
And most importantly: it’s not just about sending better prompts. It’s about unlocking the full value of your LLM system by separating concerns, adding structure, and thinking like a protocol designer.
If you’re building LLM infra, it might be time to ship your own MCP server.
You can also check this post on Medium.