Happy birthday MCP! 🥳 The world's fastest-growing protocol was released on 26th November 2024 and has captivated developers and users alike. I am certain that everyone reading this newsletter has used MCP in one way or another, and will be happy to hear that there is plenty of work going on to keep improving and evolving the protocol.
In the newsletter this week we have stories on prompt caching, JSON outputs, product evals, and the evolution of LLM extensions that has brought us to the state of MCP today.
— Phil Nash
Developer relations engineer for Langflow
🛠️ Building with AI, Agents & MCP
MCP's anniversary brings new features
The MCP blog celebrated one year since the original spec release with an updated spec that includes support for task-based workflows for long-running operations, simplified authorization flows, and extensions. One of the new extensions is MCP Apps, which builds on MCP-UI and the OpenAI Apps SDK to bring interactive interfaces to your agents.
Simplifying authorization is important for MCP servers, as this extensive article on MCP Auth shows.
Data ingestion for LLMs
This week on the Langflow blog we look at how the open-source document processor Docling can be used in Langflow to easily turn PDFs into Markdown with just one component. Docling is awesome and it comes out of the box with Langflow, making it easy to parse files, chat with them, ingest them into vector databases, or anything you choose to do with the parsed data.
How to work with LLMs
This deep look into prompt caching will help you to understand both how it works and how you can take advantage of it to decrease latency and costs.
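A minimal sketch of how prompt caching is typically used: mark the end of a large, stable prefix (a long system prompt, tool definitions, reference documents) so repeated requests can reuse it. The payload shape follows Anthropic's `cache_control` convention; the model name is a stand-in and the prompt text is invented for illustration.

```python
# Sketch: structure a request so a large, stable system prompt is marked
# as cacheable and reused across calls (Anthropic-style `cache_control`).

LONG_SYSTEM_PROMPT = "You are a support agent. " + "Policy text... " * 200

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder; any caching-capable model
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Marks the end of the stable prefix; everything up to here
                # can be served from cache on subsequent requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# Only the content after the cached prefix changes between calls, which is
# where the latency and input-token cost savings come from on cache hits.
req_a = build_request("Where is my order?")
req_b = build_request("How do I reset my password?")
assert req_a["system"] == req_b["system"]  # identical prefix -> cacheable
```

The key design point is keeping the cacheable part byte-identical between requests; any change to the prefix invalidates the cache.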
Here's a collection of tips on getting accurate JSON output from models. Note that sequential calling is no more accurate than a single call, just much slower, and pay attention to the order and naming of output fields.
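One way to act on those tips is to define an explicit, ordered schema and validate every reply against it in a single call, rather than requesting fields one at a time. A hand-rolled sketch using only the standard library; the field names are hypothetical.

```python
import json

# Sketch: validate a model's JSON reply against an explicit, ordered schema.
# Listing fields in the order you want them emitted, with descriptive names,
# reflects the tips above; here we check the reply parses and has the
# expected keys and types.

SCHEMA = {  # field name -> expected type, in the order we prompt for them
    "product_name": str,
    "sentiment": str,
    "confidence": float,
}

def parse_reply(raw: str) -> dict:
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, expected in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for {field}")
    return data

reply = '{"product_name": "Langflow", "sentiment": "positive", "confidence": 0.93}'
parsed = parse_reply(reply)
assert parsed["sentiment"] == "positive"
```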
Agents beat workflows and single loops beat subagents. These are just two of the eight learnings from one year of agents at PostHog.
Label some data, align an LLM evaluator, run the eval harness with each change. Those are the steps for using product evals.
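The three steps above can be sketched as a tiny harness: label examples, measure how well an LLM judge agrees with those labels, and only then use the judge to score each new version. The judge here is a stub standing in for a real model call; all names and data are invented for illustration.

```python
# Step 1: human-labeled data -> (model output, is this answer acceptable?)
labeled = [
    ("The refund was processed.", True),
    ("I don't know, ask someone else.", False),
    ("Your order ships Tuesday.", True),
]

def llm_judge(output: str) -> bool:
    # Stub standing in for an LLM evaluator prompt.
    return "don't know" not in output

# Step 2: align the judge -- fraction of examples where it agrees
# with the human label. Iterate on the judge prompt until this is high.
def alignment(judge, examples) -> float:
    hits = sum(judge(out) == label for out, label in examples)
    return hits / len(examples)

assert alignment(llm_judge, labeled) == 1.0  # aligned; safe to rely on

# Step 3: run the harness with each change, scoring candidate outputs
# from the new model or prompt version with the aligned judge.
def run_harness(candidate_outputs, judge) -> float:
    return sum(map(judge, candidate_outputs)) / len(candidate_outputs)
```

Keeping step 2 separate matters: a harness built on an unaligned judge just measures noise.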
New models!
There are so many releases, this might need a permanent section in the newsletter. In the last two weeks we've seen the release of Claude Opus 4.5, DeepSeek 3.2, and Mistral 3.
Mistral 3 is a multi-modal model family that includes a 3B size that can run in the browser; check out the demo on Hugging Face.
🗞️ Other news
🧑‍💻 Code & Libraries
- MCP Bundles (MCPB) was previously known as DXT (Desktop Extensions) and has been taken over by the MCP team.
- Tokenflood is a load-testing tool for LLMs that lets you test latency across a number of different parameters. Be careful when running it against hosted LLM providers where you pay per token, as it could cost you!
- LLM Council is a project built by Andrej Karpathy that uses multiple LLMs as a council to answer questions and critique each other's answers.
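On Tokenflood's cost warning: a back-of-envelope estimate before pointing a load test at a pay-per-token provider is cheap insurance. The prices below are placeholders, not any provider's actual rates.

```python
# Back-of-envelope cost check before load-testing a paid, per-token API.
# Prices are placeholders, not real provider rates.

def load_test_cost(requests: int, input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    cost_in = requests * input_tokens / 1_000_000 * usd_per_m_input
    cost_out = requests * output_tokens / 1_000_000 * usd_per_m_output
    return cost_in + cost_out

# 10,000 requests of 2,000 input + 500 output tokens each, at placeholder
# pricing of $3 / $15 per million input/output tokens:
cost = load_test_cost(10_000, 2_000, 500, 3.0, 15.0)
assert round(cost, 2) == 135.0  # $60 input + $75 output
```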
🦜 Langflow Spotlight
Once you've launched an agent, observability is essential to ensure that the agent is behaving. Langflow supports sending tracing data to six different services: Arize, Langfuse, LangSmith, LangWatch, Opik, and Traceloop. All you need to do is set the correct environment variables for your chosen service and you'll be able to observe the behavior of your agents in production.
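As an example, taking Langfuse as the chosen service, the configuration is just environment variables set before Langflow starts. The variable names below follow Langfuse's own documentation; the key values are obvious placeholders, not real credentials.

```python
import os

# Example: configure Langfuse tracing via environment variables before
# launching Langflow in the same environment. Variable names are from
# Langfuse's documentation; values here are placeholders.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."   # placeholder key
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."   # placeholder key
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# With these set, traces from your flows appear in Langfuse with no
# further code changes.
```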
🗓️ Events
December 4th, São Paulo, Brazil
Get down to the Langflow Meetup São Paulo to meet the founders of Langflow and learn how people are using Langflow in production today.