AI++ // Lessons from the Claude Code leak


The big news last week was that Anthropic mistakenly leaked the source code of Claude Code by leaving source maps in the package. Part of the source code referenced Claude Mythos, which has been properly announced this week as a model that’s incredibly good at finding software bugs and creating security exploits. It’s so good that it’s only being shared with 40 partners as part of Project Glasswing.

A quick heads up: this newsletter is going to be changing format soon. As Langflow integrates into IBM, we need to change where we send the newsletter from. It might look a little different, but it’s going to be the same content, links and cadence, so watch out!

Phil Nash
Developer relations engineer for IBM

🛠️ Building with AI, Agents & MCP

Claude Code leaks

If you’re building your own agent, the internals of one of the most popular coding agents are always interesting. Plenty of people dug into the source code to break down what makes the Claude Code harness so good. Some people even visualized it: check out this deep dive and Claude Code Unpacked. Of particular interest was this look at how Claude Code builds its system prompt.

There was some fun too, like finding all the spinner verbs that Claude Code uses while it loads, or discovering that it has a list of frustration keywords that help it classify bad behavior. That’s one way to run evaluations in production.
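As a rough illustration of the idea, here is a minimal Python sketch of keyword-based frustration detection. The keyword list, function names and the rate metric are all made up for this example; they are not taken from the Claude Code source.

```python
import re

# Hypothetical keywords for illustration only; NOT the list found in Claude Code.
FRUSTRATION_KEYWORDS = (
    "wtf",
    "still broken",
    "this is wrong",
    "not what i asked",
    "useless",
)

# One case-insensitive pattern that matches any keyword on word boundaries.
_PATTERN = re.compile(
    "|".join(r"\b" + re.escape(keyword) + r"\b" for keyword in FRUSTRATION_KEYWORDS),
    re.IGNORECASE,
)


def is_frustrated(message: str) -> bool:
    """Return True if the user message contains any frustration keyword."""
    return _PATTERN.search(message) is not None


def frustration_rate(messages: list[str]) -> float:
    """Share of messages flagged as frustrated; 0.0 for an empty list."""
    if not messages:
        return 0.0
    return sum(is_frustrated(m) for m in messages) / len(messages)
```

Tracked over time, a number like `frustration_rate` could act as a cheap production signal that a release made the agent worse, though a keyword match is only a crude proxy for how a user actually feels.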

If you’re interested in more, check the Latent Space roundup of the leak. Or if you want an article on how to build an agent harness that isn’t about Claude Code, Anthropic did release this article on harness design for long-running application development.

Building and improving agents

The team at PostHog shared what they wish they knew about building agents. Point 3 is “your context is your advantage”, so this guide to context engineering might help along the way. For more on adding to context, this article on how agentic RAG works is also invaluable. If you’re interested in agentic RAG, we’ve been working on an open-source agentic RAG system called OpenRAG. We’d love for you to check it out and contribute if you can. I also gave a talk recently about using Docling with OpenSearch for advanced RAG applications.

If you want to improve your agents, you can check in with Dropbox to find out how they used DSPy to improve their LLM judge, or see how LangChain built evals for their Deep Agents project.

MCP news

A cool little experiment from Google DeepMind showed that combining Gemini API Skills with the Gemini Docs MCP server improved the pass rate on an eval set from 7.7% to 96.3%. Skills and MCP servers serve different, complementary purposes.

Meanwhile, in the world of the MCP spec, here’s an interesting look into Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do. Tool Annotations have been around in the spec for a while, but they shouldn’t be ignored.
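To make that concrete, here is a minimal Python sketch of how a client might read tool annotations. The hint names and their defaults follow the MCP specification as I understand it (readOnlyHint defaults to false, destructiveHint to true). The two tool definitions and the confirmation policy are hypothetical, and the sketch uses plain dictionaries, not any particular MCP SDK.

```python
# Hypothetical tool definitions, shaped like entries in a tools/list response.
SEARCH_TOOL = {
    "name": "search_docs",
    "annotations": {"readOnlyHint": True, "openWorldHint": True},
}
DELETE_TOOL = {
    "name": "delete_file",
    "annotations": {"readOnlyHint": False, "destructiveHint": True},
}


def needs_confirmation(tool: dict, server_trusted: bool) -> bool:
    """Decide whether to ask the user before calling a tool.

    Annotations are hints, not guarantees, so this hypothetical policy
    only honors them when the server itself is trusted.
    """
    if not server_trusted:
        return True  # ignore hints from servers we do not trust

    annotations = tool.get("annotations") or {}
    # Defaults assume the worst case when a hint is missing.
    read_only = annotations.get("readOnlyHint", False)
    destructive = annotations.get("destructiveHint", True)

    if read_only:
        return False  # the tool claims not to modify its environment
    return destructive
```

With this policy a tool with no annotations at all still triggers a confirmation, which matches the idea that hints can lower friction for well-described tools but can’t be relied on as a security boundary.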

🧠 New models

  • Google released their latest open-source model, Gemma 4, which has been getting very good reviews and, for the legally minded, is licensed under Apache 2.0 instead of the less permissive Gemma Terms of Use that covered previous versions of the model
  • If you want to interact with audio and speech, there were some interesting releases last week:
  • More interested in vision models? Granite 4.0 3B Vision is a new model for enterprise document understanding and can go hand-in-hand with Docling for parsing your unstructured data
  • Finally, Trinity-Large-Thinking was released, claiming to be the best open frontier model released outside of China

🗞️ Other news

🧑‍💻 Code & Libraries

🔦 Langflow Spotlight

Here’s a new Langflow use case you can build. If you get a bunch of newsletters that you don’t have the time to read, you can turn them into a podcast with this Langflow workflow.

Check out the template here.

🗓️ Events

This week you’ll find me and Tejas from the IBM developer relations team at AI Engineer Europe in London. Tejas is speaking about harnesses in AI, so don’t miss that.

Enjoy this newsletter? Forward it to a friend.

