The big news last week was that Anthropic mistakenly leaked the source code of Claude Code by leaving source maps in the package. Part of the source code referenced Claude Mythos, which was officially announced this week as a model that’s incredibly good at finding software bugs and creating security exploits. It’s so good that it’s only being shared with 40 partners as part of Project Glasswing.
Here’s a quick heads-up: this newsletter is going to change format soon. As Langflow integrates into IBM, we need to change where we send the newsletter from. It might look a little different, but it will have the same content, links, and cadence, so watch out for it!
Phil Nash
Developer relations engineer for IBM
🛠️ Building with AI, Agents & MCP
Claude Code leaks
If you’re building your own agent, the internals of one of the most popular coding agents make for interesting reading. Plenty of people dug into the source code to break down what makes the Claude Code harness so good. Some people even visualized it; check out this deep dive and Claude Code Unpacked. Of particular interest was this look at how Claude Code builds its system prompt.
There was some fun too, finding all the spinner verbs that Claude Code uses while it loads, or that it has a list of frustration keywords that help it classify bad behavior. That’s one way to run evaluations in production.
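The full keyword list wasn’t published, but the idea is simple enough to sketch. Here’s a minimal, hypothetical version of keyword-based frustration detection, the kind of lightweight signal an agent can log as an in-production eval (the keywords and function name below are illustrative stand-ins, not Claude Code’s actual implementation):

```python
# Sketch of keyword-based frustration detection. An agent can log how
# often this fires to get a cheap, always-on quality signal in production.
# The keyword list here is made up for illustration.

FRUSTRATION_KEYWORDS = [
    "this is wrong",
    "you already tried that",
    "that's not what i asked",
    "stop",
]


def is_frustrated(message: str) -> bool:
    """Return True if the user message contains any frustration keyword."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in FRUSTRATION_KEYWORDS)
```

A real classifier would likely be an LLM call or a small model, but keyword matching is free and deterministic, which is why it makes a nice first-pass signal.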
If you’re interested in more, check out the Latent Space round-up of the leak. Or, if you want a non-Claude Code take on how to build an agent harness, Anthropic released this article on harness design for long-running application development.
Building and improving agents
The team at PostHog shared what they wish they knew about building agents. Point 3 is “your context is your advantage”, so this guide to context engineering might help along the way. For more on adding to context, this article on how agentic RAG works is also invaluable. If agentic RAG interests you, we’ve been working on an open-source agentic RAG system called OpenRAG; we’d love for you to check it out and contribute if you can. I also gave a talk recently about using Docling with OpenSearch for advanced RAG applications.
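For a rough mental model of what “agentic” adds over classic RAG: instead of a single retrieve-then-generate pass, the model decides whether to retrieve again with a refined query based on what it has gathered so far. A minimal sketch of that loop (the `retrieve`, `decide`, and `generate` callables are hypothetical stand-ins for your vector search and LLM calls, not OpenRAG’s actual API):

```python
# Minimal agentic RAG loop: the model may refine its query and retrieve
# again instead of answering immediately. The injected callables are
# placeholders for real vector-search and LLM calls.

def agentic_rag(question, retrieve, decide, generate, max_rounds=3):
    context = []
    query = question
    for _ in range(max_rounds):
        context.extend(retrieve(query))
        # decide() is an LLM step: return "answer" to stop,
        # or a refined query string to retrieve again.
        action = decide(question, context)
        if action == "answer":
            break
        query = action
    return generate(question, context)
```

The `max_rounds` cap matters in practice: without it, a model that keeps asking for more context can loop indefinitely and burn tokens.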
If you want to improve your agents, check in with Dropbox to find out how they used DSPy to improve their LLM judge, or see how LangChain built evals for their Deep Agents project.
MCP news
A cool little experiment from Google DeepMind showed that combining Gemini API Skills with the Gemini Docs MCP server improved the pass rate on an eval set from 7.7% to 96.3%. Skills and MCP servers serve different, complementary purposes.
Meanwhile, in the world of the MCP spec, here’s an interesting look into Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do. Tool Annotations have been around in the spec for a while, but they shouldn’t be ignored.
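For context, the annotations in question are the optional hints a server can attach to a tool definition, per the MCP spec’s `ToolAnnotations` fields. Roughly, a tool definition carrying them looks like this (shown as a plain Python dict for illustration; the tool itself is made up):

```python
# An MCP tool definition with the spec's optional annotation hints.
# Crucially, these are unverified hints from the server, not enforced
# guarantees -- which is the limitation the linked article digs into.
delete_file_tool = {
    "name": "delete_file",
    "description": "Delete a file at the given path.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
    "annotations": {
        "readOnlyHint": False,    # this tool modifies its environment
        "destructiveHint": True,  # its updates may be irreversible
        "idempotentHint": True,   # repeating the same call adds no new effect
        "openWorldHint": False,   # operates on a closed, local domain
    },
}
```

A client can use these hints to decide, for example, which tool calls need explicit user confirmation, but it should never treat them as a security boundary, since a malicious server can claim anything.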
🧠 New models
- Google released their latest open-source model Gemma 4, which has been getting very good reviews and, for the legally minded, is licensed under Apache 2.0 instead of the less permissive Gemma Terms of Use that covered the previous versions of the model
If you want to interact with audio and speech, there were some interesting releases last week:
- More interested in vision models? Granite 4.0 3B Vision is a new model for enterprise document understanding and can go hand-in-hand with Docling for parsing your unstructured data
- Finally, Trinity-Large-Thinking was released, claiming to be the best open frontier model released outside of China
🗞️ Other news
🧑‍💻 Code & Libraries
🔦 Langflow Spotlight
Here’s a new Langflow use-case you can build. If you get a bunch of newsletters that you don’t have time to read, you can turn them into a podcast with this Langflow workflow.
Check out the template here.
🗓️ Events
This week you’ll find me and Tejas from the IBM developer relations team at AI Engineer Europe in London. Tejas is speaking about harnesses in AI, so don’t miss that.