AI++ // Lessons from the Claude Code leak


The big news last week was that Anthropic mistakenly leaked the source code of Claude Code by leaving source maps in the package. Part of the source code referenced Claude Mythos, which has been properly announced this week as a model that’s incredibly good at finding software bugs and creating security exploits. It’s so good that it’s only being shared with 40 partners as part of Project Glasswing.

Here’s a quick heads up: this newsletter is going to be changing format soon. As Langflow integrates into IBM, we need to change where we send the newsletter from. It might look a little different, but it’s going to be the same content, links and cadence, so watch out!

Phil Nash
Developer relations engineer for IBM

🛠️ Building with AI, Agents & MCP

Claude Code leaks

When you’re building your own agent, the internals of one of the most popular coding agents are always interesting. Plenty of people dug into the source code to break down what makes the Claude Code harness so good. Some people even visualized it: check out this deep dive and Claude Code Unpacked. Of particular interest was this look at how Claude Code builds its system prompt.

There was some fun too, like finding all the spinner verbs that Claude Code uses while it loads, or discovering that it has a list of frustration keywords that help it classify bad behavior. That’s one way to run evaluations in production.
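To give a feel for the keyword idea, here is a minimal Python sketch that flags user messages containing frustration phrases, so the share of frustrated messages can be tracked as a rough production signal. The phrase list and the function names are hypothetical, made up for illustration, and are not taken from the Claude Code source.

```python
import re

# Hypothetical phrases for illustration only; not the list from the Claude Code source.
FRUSTRATION_PHRASES = ["this is wrong", "not what i asked", "stop doing that", "useless"]

# One case-insensitive pattern with word boundaries, so "useless" matches
# as a whole word but not inside "uselessness".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FRUSTRATION_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_frustrated(message: str) -> bool:
    """Return True if the message contains any frustration phrase."""
    return _PATTERN.search(message) is not None


def frustration_rate(messages: list[str]) -> float:
    """Fraction of messages flagged as frustrated (0.0 for an empty list)."""
    if not messages:
        return 0.0
    return sum(is_frustrated(m) for m in messages) / len(messages)
```

A keyword match like this is cheap enough to run on every message, which is what makes it usable in production. It will miss sarcasm and misfire on quoted text, so it is probably best treated as a trend to watch and not as a verdict on any single conversation.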

If you’re interested in more, check out the Latent Space roundup of the leak. Or, if you want a non-Claude Code article on how to build an agent harness, Anthropic released this article on harness design for long-running application development.

Building and improving agents

The team at PostHog shared what they wish they knew about building agents. Point 3 is “your context is your advantage”, so this guide to context engineering might help along the way. For more on adding to context, this article on how agentic RAG works is also invaluable. If you’re interested in agentic RAG, we’ve been working on an open-source agentic RAG system called OpenRAG. We’d love for you to check it out and contribute if you can. I also gave a talk recently about using Docling with OpenSearch for advanced RAG applications.

If you want to improve your agents, you can check in with Dropbox to find out how they used DSPy to improve their LLM judge, or see how LangChain built evals for their Deep Agents project.

MCP news

A cool little experiment from Google DeepMind showed that combining Gemini API Skills with the Gemini Docs MCP server improved the pass rate on an eval set from 7.7% to 96.3%. Skills and MCP servers serve different, complementary purposes.

Meanwhile, in the world of the MCP spec, here’s an interesting look into Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do. Tool Annotations have been around in the spec for a while, but they shouldn’t be ignored.
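For a sense of how a client might use these hints, here is a minimal Python sketch of a confirmation policy. The hint names (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`) and their defaults follow the MCP spec's tool annotations as I understand them, so check them against the spec version you target. The `needs_confirmation` function, the `server_trusted` flag and the policy itself are assumptions for illustration, not part of the spec or any SDK.

```python
# Defaults for missing hints, as described in the MCP spec's tool annotations
# (assumed here; verify against the spec version you are targeting).
DEFAULT_HINTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


def needs_confirmation(tool: dict, server_trusted: bool) -> bool:
    """Hypothetical policy: should the user confirm before this tool runs?"""
    # Annotations are only hints, so ignore them entirely for untrusted servers.
    if not server_trusted:
        return True
    hints = {**DEFAULT_HINTS, **(tool.get("annotations") or {})}
    # A tool that claims to be read-only is allowed through without a prompt.
    if hints["readOnlyHint"]:
        return False
    # Otherwise ask only when the tool may make destructive changes.
    return hints["destructiveHint"]


delete_tool = {"name": "delete_file"}  # no annotations, so defaults apply
search_tool = {"name": "search_docs", "annotations": {"readOnlyHint": True}}

print(needs_confirmation(delete_tool, server_trusted=True))
print(needs_confirmation(search_tool, server_trusted=True))
```

The design choice worth noting is that the trust check comes first: a hint from a server you don't trust tells you nothing, so the sketch falls back to always asking.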

🧠 New models

  • Google released their latest open-source model, Gemma 4, which has been getting very good reviews and, for the legally minded, is licensed under Apache 2.0 instead of the less permissive Gemma Terms of Use that covered previous versions of the model
  • If you want to interact with audio and speech, there were some interesting releases last week:
  • More interested in vision models? Granite 4.0 3B Vision is a new model for enterprise document understanding and can go hand-in-hand with Docling for parsing your unstructured data
  • Finally, Trinity-Large-Thinking was released, claiming to be the best open frontier model released outside of China

🗞️ Other news

🧑‍💻 Code & Libraries

🔦 Langflow Spotlight

Here’s a new Langflow use case you can build. If you get a bunch of newsletters that you don’t have the time to read, you can turn them into a podcast with this Langflow workflow.

Check out the template here.

🗓️ Events

This week you’ll find me and Tejas from the IBM developer relations team at AI Engineer Europe in London. Tejas is speaking about harnesses in AI, so don’t miss that.

Enjoy this newsletter? Forward it to a friend.

