AI++ // memory, search, evals and risk; what it takes to be an agent


Claude Mythos is here, except it’s called Fable 5 and comes with a few restrictions. It appears to be the largest model released to date and, according to the benchmarks, the most accomplished, even more so than Opus 4.8, which was released only two weeks ago. It’s also the most expensive model, so you might want to think twice before swapping it into your RAG support chatbot.

While it’s impressive to see the frontier march forward, this week in AI++ we’ll look at some of the techniques people are using to build agents for production. We’ll also cover a new Langflow release and round up some resources on search and memory for agents.

Phil Nash

Developer relations engineer for IBM

🛠️ Building with AI, Agents & MCP

Agents in production

To understand whether your agent is doing its job, you need to evaluate its performance. The guide at How to Evaluate AI Agents is a great starting point for understanding what you’re targeting and the ways to go about it. The OpenAI team walk us through building self-improving tax agents with Codex and show how evals help you build loops that hill-climb to the best results.
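To make the idea of an eval-driven loop concrete, here is a minimal sketch in Python. It is not taken from the OpenAI walkthrough: the toy eval set, the `make_agent` stand-in and its single `threshold` setting are all assumptions made up for illustration. The point is only the shape of the loop, which is to score the current setting against a fixed set of cases, try a nearby change, and keep it only if the score goes up.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (input text, expected label)

# Toy eval set (hypothetical): inputs of length 1-4 are "short", 5-8 are "long".
EVAL_CASES: List[Case] = [
    ("x" * n, "short" if n <= 4 else "long") for n in range(1, 9)
]


def make_agent(threshold: int) -> Callable[[str], str]:
    """Stand-in for an agent whose behaviour depends on one tunable setting."""
    return lambda text: "long" if len(text) > threshold else "short"


def run_eval(threshold: int, cases: List[Case]) -> float:
    """Return the fraction of eval cases the agent labels correctly."""
    agent = make_agent(threshold)
    return sum(agent(text) == expected for text, expected in cases) / len(cases)


def hill_climb(start: int, cases: List[Case], max_steps: int = 100) -> Tuple[int, float]:
    """Move to a neighbouring setting only while the eval score strictly improves."""
    current, current_score = start, run_eval(start, cases)
    for _ in range(max_steps):
        neighbours = [current - 1, current + 1]
        best = max(neighbours, key=lambda t: run_eval(t, cases))
        best_score = run_eval(best, cases)
        if best_score <= current_score:
            break  # no neighbour beats the current setting, so stop here
        current, current_score = best, best_score
    return current, current_score


if __name__ == "__main__":
    print(hill_climb(0, EVAL_CASES))  # prints (4, 1.0)
```

In a real agent the "setting" would more likely be a prompt, a tool list or a model choice, and the score would come from graded traces rather than exact string matches, but the accept-only-if-better loop stays the same.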

The Anthropic team wrote about containing Claude across the different products in which it exists: identifying risks, reducing blast radius and working out what to trust. Along similar lines, Sean Goedecke compares the risks and benefits of agents over pipelines.

Finally, LangChain describe how Lyft built their own agent platform, sharing how the agents are evaluated in production with tracing and monitoring.

Brand new Langflow

Great news in the world of Langflow with the release of version 1.10. This version upgrades the Langflow Assistant from building components to building whole flows with you. It also adds Memory Bases that persist conversation context across sessions in a flow, and configurable database connectors for Knowledge Bases.

Memory and Search

We’ll start with this in-depth look at agentic search that was originally a talk from the AI Engineer Europe conference. Watch the talk or walk through the examples yourself. A recent study showed that grep is all you need, but was it right? Was the harness doing a lot of the work instead?

Have you considered what to do with images in RAG? The team at Kapa have, and they describe how they index images for RAG.

Finally, the team at mem0 do a rundown of how popular agent harnesses manage their memory. There is a lot of work to be done in the memory space, and this is a great overview of techniques and drawbacks.

🧠 New models

🗞️ Other news

🧑‍💻 Code & Libraries

🔦 Langflow Spotlight

Did you see that you can now apply policies to agent actions in Langflow? Policies turn natural language rules into guards for tools directly within the agent. Prompts can guide behavior, but Policies constrain execution. Learn about how Policies work in this blog post.
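The difference between guiding and constraining can be sketched in a few lines of Python. This is not Langflow's API: `Policy`, `guarded`, `PolicyViolation` and the `issue_refund` tool are hypothetical names, and the rule is a hand-written check standing in for whatever Langflow generates from a natural language rule. It only illustrates the general pattern of checking a tool call before it runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when a tool call breaks a policy, before the tool executes."""


@dataclass
class Policy:
    description: str  # the rule as a human would state it
    allows: Callable[[str, Dict[str, Any]], bool]  # hand-written check (assumption)


def guarded(tool_name: str, tool: Callable[..., Any], policies: List[Policy]) -> Callable[..., Any]:
    """Wrap a tool so that every call is checked against the policies first."""
    def wrapper(**kwargs: Any) -> Any:
        for policy in policies:
            if not policy.allows(tool_name, kwargs):
                raise PolicyViolation(f"{tool_name} blocked: {policy.description}")
        return tool(**kwargs)
    return wrapper


def issue_refund(amount: float) -> str:
    """Hypothetical tool an agent might be allowed to call."""
    return f"refunded {amount:.2f}"


refund_limit = Policy(
    description="Refunds over 100 need human approval",
    allows=lambda name, args: not (name == "issue_refund" and args.get("amount", 0) > 100),
)

safe_refund = guarded("issue_refund", issue_refund, [refund_limit])

if __name__ == "__main__":
    print(safe_refund(amount=50))  # allowed, the tool runs
    try:
        safe_refund(amount=500)  # blocked before the tool runs
    except PolicyViolation as err:
        print(err)
```

A prompt asking the model not to issue large refunds can be ignored or talked around. A guard like this sits between the model and the tool, so the call never executes unless the check passes.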

🗓️ Events

The AI Coding Summit will be in London and online on July 6th and 7th with talks and workshops on MCP, agentic systems, AI-driven testing & debugging, and real-world best practices.

Use the promo code AI++ for a 10% discount on tickets.

Enjoy this newsletter? Forward it to a friend.

2755 Augustine Dr, 8th Floor, Santa Clara, CA 95054

AI++ newsletter

Subscribe for all the latest news for developers on AI, Agents and MCP curated by the Langflow team.
