AI++ // Did we reach AGI? Depends on how you define AGI...

Jensen Huang has declared on a podcast that we have reached AGI, albeit for a very specific definition of AGI that probably doesn't match what you think AGI is. One would have thought that the afterglow of NVIDIA GTC would have provided enough hype for at least the rest of the month.

Meanwhile, for those building agents, there has been a lot of talk about CLIs and Agent Skills, and this week we focus on evaluating skills to make sure they do what they are supposed to. WebMCP has been an exciting experiment in the browser, so we have more on what it is and how it differs from MCP. Oh, and if you're a LiteLLM user, sorry if your builds broke today.

Phil Nash
Developer relations engineer for Langflow

🛠️ Building with AI, Agents & MCP

Supply chain attack

If you're building agents with Python, you might be a user of LiteLLM to give you a unified interface to access multiple LLMs. You might have also discovered that the latest versions of LiteLLM have fallen victim to a supply chain attack. The compromised versions look to have been pulled from PyPI now, but it's a good reminder to stay vigilant with your dependencies.
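
One way to stay vigilant is to have CI fail when a known-bad version is installed. Here is a minimal sketch using only the Python standard library. The `BLOCKED_VERSIONS` value is a hypothetical placeholder, not the real list of compromised releases, so fill it in from the official advisory. The helper names are also my own.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_blocked(installed: str | None, blocked: set[str]) -> bool:
    """Return True when the installed version appears in the blocklist."""
    return installed is not None and installed in blocked


# Hypothetical placeholder: replace with the versions named in the official advisory.
BLOCKED_VERSIONS = {"0.0.0-example"}

if __name__ == "__main__":
    found = installed_version("litellm")
    if is_blocked(found, BLOCKED_VERSIONS):
        raise SystemExit(f"litellm {found} is on the blocklist; pin a known-good version")
    print(f"litellm: {found or 'not installed'} (not on the blocklist)")
```

An exact-match check like this only catches versions you already know about. Pinning versions with hashes in a lockfile is the stronger default.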

CLIs and Skills

Agents are good at using CLIs, but can they be better? Justin Poehnelt argues that you should rewrite your CLIs for agents. Don't worry about big rewrites though; many of these patterns can be added incrementally.

Agent Skills are the hot way to expose CLIs to agents, and Angie Jones wrote 3 principles for designing skills. You might want to test or evaluate those skills too. Phil Schmid shared a practical guide to testing skills and Robert Xu at LangChain also wrote up how they evaluate skills. For an automated test, there is also a skill validator to check your skills against the spec.

MCP and WebMCP

Before we get too carried away with skills, we can't forget about MCP. This article makes good arguments that MCP provides much more than token bloat, so please read MCP is dead; long live MCP.

WebMCP is still a new experiment, but there has already been enough confusion that the Chrome team had to write up when to use WebMCP and MCP. And if you want a good primer on what WebMCP can do, check out WebMCP for beginners.

Sandboxes

Agents can get work done by calling a CLI via a skill or using an MCP tool, but increasingly they are writing their own code too. We should never let them execute that untrusted code in our environments. Sandboxes isolate the untrusted code and protect your data, so we're seeing more options. First up is NVIDIA's OpenShell, released as part of their NemoClaw additions to OpenClaw. LangChain also released LangSmith Sandboxes as a hosted sandbox.

🧠 New models

OpenAI released GPT-5.4 (also in mini and nano variations)
Google released Gemini Embedding 2 which is a natively multimodal embedding model that works on text, images, video, audio and documents.
In image model news, Midjourney have an alpha version of their v8 and Microsoft released their in-house image model MAI-Image-2

🗞️ Other news

Langflow released version 1.8 that includes global model providers, a new API for executing flows, flow and component traces, and much more.
I don't think I expected this from A16z, but this is a great look into the world of forming context from data for agents
OpenAI shared how treating prompt injection as social engineering helps them resist attacks. I like this framing since social engineering works on people too, but we do our best to defend against it.
The Vimeo engineering team shared an interesting take on how to build a resilient system of LLMs to translate subtitles that still synchronize properly
Anthropic built generative UI into Claude and this developer reverse-engineered it and built it for the terminal
Mozilla are building an open-source “Stack Overflow for agents” called cq
Research has shown that calling your LLM an expert may not help, but telling it to be a safety monitor will help it refuse attacks
Anthropic have created a dream feature for Claude where it cleans up its own memory over time
I enjoyed this story in which Claude Code was caught making end-to-end tests pass by patching the application at runtime

🧑‍💻 Code & Libraries

An interesting example of an always-on agent memory layer built in Google's ADK
Cloudflare released a crawl API that will scrape content for you
OpenHarness is an open-source project built on Vercel's AI SDK to provide the building blocks to build general-purpose agents
Mellea, a Python library for writing generative programs, released version 0.4.0

