When we last published AI++, Anthropic had just launched Fable 5 and everyone was very excited about another step-change in the quality of models. That excitement didn’t last long, as the US government issued an export directive that effectively meant Anthropic had to withdraw access. If you missed out, Ethan Mollick wrote about what it was like to work with Fable 5.
I found two interesting looks at the world of AI this week. First, Anthropic published the results of 81,000 interviews with users of AI asking what people want from it. It covers both benefits and concerns, so it is a fascinating look at how people are experiencing AI. Meanwhile, the folks at Build Club put together a bunch of research into how industries are using AI in the State of AI.
In this edition of AI++ we look at the new buzzword everyone is talking about: loops! We also investigate the promise of multi-agent systems, the world of data extraction, and see whether an AI model can beat Zork.
Phil Nash
Developer relations engineer for IBM
🛠️ Building with AI, Agents & MCP
Loops and durability
There has been a lot of talk lately about how you should no longer be prompting coding agents, but building loops that do the prompting. I think this should apply to building agents too, as the key to a loop is giving an agent a verifiable goal that it can keep working towards until the goal is met. PostHog wrote up why they are bullish on loops, and the post outlines the basics of how they work.
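To make the idea concrete, here is a minimal sketch of a loop driven by a verifiable goal. It is not taken from PostHog's post: `run_loop`, `fake_agent` and `check_add` are hypothetical names, and the "agent" is a toy stand-in for a real model call so that the example runs on its own.

```python
from typing import Callable, Optional


def run_loop(
    propose: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    goal: str,
    max_iterations: int = 5,
) -> tuple[bool, str, int]:
    """Re-prompt until verify() passes or the iteration budget runs out."""
    feedback: Optional[str] = None
    attempt = ""
    for iteration in range(1, max_iterations + 1):
        attempt = propose(goal, feedback)  # stand-in for a model call
        feedback = verify(attempt)         # None means the goal is met
        if feedback is None:
            return True, attempt, iteration
    return False, attempt, max_iterations


# Toy stand-ins (assumptions) so the sketch runs without a model.
def fake_agent(goal: str, feedback: Optional[str]) -> str:
    # Gets it wrong first, then corrects itself once it receives feedback.
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def check_add(source: str) -> Optional[str]:
    # The verifiable goal: the generated function must pass this check.
    namespace: dict = {}
    exec(source, namespace)
    if namespace["add"](2, 3) == 5:
        return None
    return "add(2, 3) should equal 5"


done, result, iterations = run_loop(fake_agent, check_add, "Write add(a, b)")
```

The important design choice is that the loop, not a person, decides when the work is finished, and the failure message from the check becomes the next prompt. The iteration cap is there so a goal that can't be met doesn't burn tokens forever.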
One of the emerging building blocks for loops is durable execution. Dan Farrelly from Inngest wrote about Agent Loop Architecture and Sunil Pai of Cloudflare wrote Never Waste a Token, bringing in resumable streaming to the problem.
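As a rough illustration of what durable execution buys you, here is a small sketch that checkpoints each step of a run to disk so a restarted process can skip work it has already done. This is not Inngest's or Cloudflare's API: `DurableRun`, `step` and `expensive` are hypothetical names, and a JSON file stands in for a real durable store.

```python
import json
import os
import tempfile


class DurableRun:
    """Persist each step's result so a restarted run can skip finished steps."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.results: dict = {}
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def step(self, name: str, fn):
        if name in self.results:
            # Already completed in an earlier run: replay the stored result.
            return self.results[name]
        result = fn()
        self.results[name] = result
        with open(self.path, "w") as f:
            json.dump(self.results, f)
        return result


calls = []


def expensive(label: str) -> str:
    # Stand-in for a model call that costs tokens.
    calls.append(label)
    return f"output of {label}"


checkpoint = os.path.join(tempfile.mkdtemp(), "run.json")

first = DurableRun(checkpoint)
first.step("plan", lambda: expensive("plan"))
# Imagine the process crashes here, before the "act" step.

resumed = DurableRun(checkpoint)
plan = resumed.step("plan", lambda: expensive("plan"))  # replayed from disk
act = resumed.step("act", lambda: expensive("act"))     # runs for the first time
```

After the simulated crash, the "plan" step is read back from the checkpoint instead of calling the model again, so the tokens spent on it are not wasted.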
On the governance side of the loop, both IBM and Amazon have been talking about how human-in-the-loop isn’t the silver bullet it sounds like for AI supervision. IBM warns about automation bias and argues for human accountability while Amazon warns about the normalization of deviance and how human identity and ownership should govern AI decisions.
Multi-LLM systems
Fable 5 may have been retracted, but there have been reports of getting better results not from a single model but by combining panels of models. OpenRouter’s Fusion has been benchmarked as beating individual models by forming a model network. Sakana released Fugu, a “multi-agent system as a model” that they also report beats Fable on certain benchmarks.
It’s great that lesser models can team up to beat a frontier level model, but it is worth considering time and cost. OpenRouter say that there will be N panel calls + 1 judge call, so costs for the default 3-model panel will be 4-5 times the cost of a single prompt. I don’t think this makes multi-model panels the future yet, but perhaps for the hardest problems they will be worth it.
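The arithmetic behind that estimate can be sketched in a few lines. The call count comes from the N + 1 formula above. The cost multiple rests on my own assumptions: each panel call costs about the same as a single prompt, and the judge call costs between one and two times that, because it has to read every panel answer. The function names are made up for illustration.

```python
def panel_calls(panel_size: int) -> int:
    """N panel calls plus 1 judge call."""
    return panel_size + 1


def relative_cost(panel_size: int, judge_cost_ratio: float = 1.0) -> float:
    """Cost as a multiple of one single-model prompt.

    Assumes every panel call costs one unit and the judge call costs
    judge_cost_ratio units (an assumption, not an OpenRouter figure).
    """
    return panel_size + judge_cost_ratio


calls_needed = panel_calls(3)      # 4 calls for the default 3-model panel
low = relative_cost(3)             # 4.0 if the judge costs the same as a panel call
high = relative_cost(3, 2.0)       # 5.0 if the judge costs twice as much
```

Under those assumptions the default panel lands at four to five times the cost of a single prompt, and every extra panel member adds at least one more unit.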
Formatting data for AI
This week saw the release of Docling for IBM watsonx, a managed service for the open-source Docling document converter. Also Docling-related, the Linux Foundation formed a working group to develop DocLang as a standard for representing documents in a structured, AI-native manner.
Google also launched the Open Knowledge Format which is intended to formalize Andrej Karpathy’s LLM-wiki pattern into an interoperable format.
🧠 New models
Z.ai’s open-source GLM-5.2 has been the one model that everyone is talking about this week. It’s too big to run yourself, unless you have your own data center, but it’s significantly cheaper than other frontier models and is being compared favourably to Opus 4.8 and GPT-5.5.
🗞️ Other news
Fun with AI
Have you found yourself wondering whether LLMs are sentient? Well, you can stop now, since a Microsoft researcher built a goat-powered LLM in a game. If LLMs are sentient, then so is 1999’s Age of Empires II. Speaking of games, Raymond Camden pointed Chrome’s built-in Prompt API at a game of Zork to see if it could win. I think the grues are winning so far.
Anthropic’s naming of model levels has been the most poetic, literally, and one data scientist wondered what would happen if you extrapolated their naming to enterprise-scale narrative objects.
🧑‍💻 Code & Libraries
🔦 Langflow Spotlight
Langflow Memory Bases provide AI agents with long-term, persistent memory across chat sessions using a vector-based storage layer. They use semantic search to retrieve relevant past context. Developers can filter memory by session, control ingestion timing, and use LLM preprocessing to filter out noise, ensuring agents only remember useful, relevant details over time. Check out this video walkthrough of Memory Bases.
🗓️ Events
On Friday 26th June I’ll be speaking at AgentCon Perth on building MCP Apps.
From 29th June until 2nd July the AI Engineer World’s Fair is on in San Francisco. You can catch Tejas Kumar from the team speaking about Evals in AI on the first day.
The AI Coding Summit will be in London and online on July 6th and 7th with talks and workshops on MCP, agentic systems, AI-driven testing & debugging, and real-world best practices.
Use the promo code AI++ for a 10% discount on tickets.