The big news last week was that Anthropic mistakenly leaked the source code of Claude Code by leaving source maps in the package. Part of the source code referenced Claude Mythos, which was officially announced this week as a model that’s incredibly good at finding software bugs and creating security exploits. It’s so good that it’s only being shared with 40 partners as part of Project Glasswing.
Here’s a quick heads-up: this newsletter is going to change format soon. As Langflow integrates into IBM, we need to change where we send the newsletter from. It might look a little different, but it will have the same content, links, and cadence, so watch out for it!
Phil Nash
Developer relations engineer for IBM
🛠️ Building with AI, Agents & MCP
Claude Code leaks
If you’re building your own agent, the internals of one of the most popular coding agents make for interesting reading. Plenty of people dug into the source code to break down what makes the Claude Code harness so good. Some people even visualized it; check out this deep dive and Claude Code Unpacked. Of particular interest was this look at how Claude Code builds its system prompt.
There was some fun too, finding all the spinner verbs that Claude Code uses while it loads, or that it has a list of frustration keywords that help it classify bad behavior. That’s one way to run evaluations in production.
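The full keyword list wasn’t published, but the idea is simple enough to sketch. Here’s a minimal, hypothetical version of keyword-based frustration detection, the kind of lightweight signal an agent can log as an in-production eval (the keywords and function name below are illustrative stand-ins, not Claude Code’s actual implementation):

```python
# Sketch of keyword-based frustration detection. An agent can log how
# often this fires to get a cheap, always-on quality signal in production.
# The keyword list here is made up for illustration.

FRUSTRATION_KEYWORDS = [
    "this is wrong",
    "you already tried that",
    "that's not what i asked",
    "stop",
]


def is_frustrated(message: str) -> bool:
    """Return True if the user message contains any frustration keyword."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in FRUSTRATION_KEYWORDS)
```

A real classifier would likely be an LLM call or a small model, but keyword matching is free and deterministic, which is why it makes a nice first-pass signal.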
If you’re interested in more, check out the Latent Space round-up of the leak. Or, if you want a non-Claude Code take on how to build an agent harness, Anthropic released this article on harness design for long-running application development.
Building and improving agents
The team at PostHog shared what they wish they knew about building agents. Point 3 is “your context is your advantage”, so this guide to context engineering might help along the way. For more on adding to context, this article on how agentic RAG works is also invaluable. If agentic RAG interests you, we’ve been working on an open-source agentic RAG system called OpenRAG; we’d love for you to check it out and contribute if you can. I also gave a talk recently about using Docling with OpenSearch for advanced RAG applications.
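For a rough mental model of what “agentic” adds over classic RAG: instead of a single retrieve-then-generate pass, the model decides whether to retrieve again with a refined query based on what it has gathered so far. A minimal sketch of that loop (the `retrieve`, `decide`, and `generate` callables are hypothetical stand-ins for your vector search and LLM calls, not OpenRAG’s actual API):

```python
# Minimal agentic RAG loop: the model may refine its query and retrieve
# again instead of answering immediately. The injected callables are
# placeholders for real vector-search and LLM calls.

def agentic_rag(question, retrieve, decide, generate, max_rounds=3):
    context = []
    query = question
    for _ in range(max_rounds):
        context.extend(retrieve(query))
        # decide() is an LLM step: return "answer" to stop,
        # or a refined query string to retrieve again.
        action = decide(question, context)
        if action == "answer":
            break
        query = action
    return generate(question, context)
```

The `max_rounds` cap matters in practice: without it, a model that keeps asking for more context can loop indefinitely and burn tokens.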
If you want to improve your agents, check in with Dropbox to find out how they used DSPy to improve their LLM judge, or see how LangChain built evals for their Deep Agents project.
MCP news
A cool little experiment from Google DeepMind showed that combining Gemini API Skills with the Gemini Docs MCP server improved the pass rate on an eval set from 7.7% to 96.3%. Skills and MCP servers serve different, complementary purposes.
Meanwhile, in the world of the MCP spec, here’s an interesting look into Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do. Tool Annotations have been around in the spec for a while, but they shouldn’t be ignored.
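For context, the annotations in question are the optional hints a server can attach to a tool definition, per the MCP spec’s `ToolAnnotations` fields. Roughly, a tool definition carrying them looks like this (shown as a plain Python dict for illustration; the tool itself is made up):

```python
# An MCP tool definition with the spec's optional annotation hints.
# Crucially, these are unverified hints from the server, not enforced
# guarantees -- which is the limitation the linked article digs into.
delete_file_tool = {
    "name": "delete_file",
    "description": "Delete a file at the given path.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
    "annotations": {
        "readOnlyHint": False,    # this tool modifies its environment
        "destructiveHint": True,  # its updates may be irreversible
        "idempotentHint": True,   # repeating the same call adds no new effect
        "openWorldHint": False,   # operates on a closed, local domain
    },
}
```

A client can use these hints to decide, for example, which tool calls need explicit user confirmation, but it should never treat them as a security boundary, since a malicious server can claim anything.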
🧠 New models
- Google released their latest open-source model Gemma 4, which has been getting very good reviews and, for the legally minded, is licensed under Apache 2.0 instead of the less permissive Gemma Terms of Use that covered the previous versions of the model
If you want to interact with audio and speech, there were some interesting releases last week:
- More interested in vision models? Granite 4.0 3B Vision is a new model for enterprise document understanding and can go hand-in-hand with Docling for parsing your unstructured data
- Finally, Trinity-Large-Thinking was released, claiming to be the best open frontier model released outside of China
🗞️ Other news
🧑‍💻 Code & Libraries
🔦 Langflow Spotlight
Here’s a new Langflow use-case you can build. If you get a bunch of newsletters that you don’t have time to read, you can turn them into a podcast with this Langflow workflow.
Check out the template here.
🗓️ Events
This week you’ll find me and Tejas from the IBM developer relations team at AI Engineer Europe in London. Tejas is speaking about harnesses in AI, so don’t miss that.