Working with LLMs is weird, but I never thought it would be as weird as OpenAI having to specifically tell their models not to talk about goblins, gremlins, raccoons, trolls, ogres, or pigeons. It raises so many questions. Thankfully, after someone spotted the list in Codex's base instructions, OpenAI did give an explanation as to where the goblins came from. They never mentioned why raccoons and pigeons got caught up in the fantasy creature fascination, though.
In this edition of AI++ there are many tips on writing Agent Skills, a look at the latest on building agent harnesses, and we check out some real-time speech models.
As a reminder, we are working to move this newsletter over to IBM systems. So the next time you receive it, it may look a little different. Watch out for that!
Phil Nash
Developer relations engineer for IBM
🛠️ Building with AI, Agents & MCP
Skills
Writing good Agent Skills is important whether you are building for yourself, working in a team, or creating something that others will consume as a Skill. Phil Schmid’s 8 Tips for Writing Agent Skills is a good place to start, and the team at Perplexity went even deeper on how they Design, Refine, and Maintain Agent Skills. I also liked the approach Pulse took in describing how to collaborate on Skills as a team.
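If you haven’t written one before, a Skill in the convention these articles discuss is typically a directory containing a SKILL.md file: YAML frontmatter with a name and a description (which the agent uses to decide when to load the Skill), followed by markdown instructions. A minimal illustrative sketch (the skill itself is made up for this example):

```markdown
---
name: changelog-writer
description: Drafts a changelog entry from a list of merged PR titles. Use when the user asks to summarise recent changes for release notes.
---

# Changelog writer

1. Ask for the release version if it was not provided.
2. Group the PR titles into Added, Changed, and Fixed.
3. Output the entry in Keep a Changelog format.
```

The description is doing most of the work here: it is what the agent reads up front, so the tips in the posts above largely come down to making it specific about when the Skill applies.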
Harnesses
Agent harnesses are the biggest topic of conversation right now, with much work going into how to turn an LLM loop and some tools into the very capable agents we see today. Addy Osmani shared a roundup on harness engineering, and Vivek Trivedy at LangChain wrote about the anatomy of an agent harness. Meanwhile, the team behind the Astro web framework released Flue, a harness framework that makes it easy to build your own harness.
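The articles above all orbit the same core shape: a loop that sends conversation state to a model, executes any tool calls it requests, and feeds the results back until the model produces an answer. A minimal sketch in Python, with the model call stubbed out (the function names here are illustrative, not from any of the linked frameworks):

```python
# Minimal agent harness sketch: an LLM loop plus a tool registry.
# stub_model stands in for a real LLM API call.

def stub_model(messages, tools):
    """Fake model: requests a tool call once, then answers using its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"The result is {messages[-1]['content']}"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(user_input, model=stub_model, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = model(messages, TOOLS)
        if "final" in reply:          # model chose to answer: loop ends
            return reply["final"]
        # Model requested a tool: run it and append the result to the state
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("what is 2 + 3?"))
```

Everything a harness adds on top, such as context management, sandboxing, retries, and sub-agents, is elaboration of this loop, which is what makes harness engineering its own discipline.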
Real-time models
There is a section for new model news below, but I thought that it was interesting to see OpenAI and Thinking Machines, a lab started by former OpenAI CTO Mira Murati, both release real-time models within the last week. Thinking Machines described their new model as an interaction model that takes part in conversations like a human would: listening in real time, interjecting when required, and able to do the same with vision too. Sadly the model isn’t available to experiment with yet, but the demos look very interesting.
OpenAI are touting similar skills with the new GPT-Realtime-2 model, though I don’t think it can interject in the same way. They also announced GPT-Realtime-Translate which can live translate from 70 languages into 13 output languages, and GPT-Realtime-Whisper which live transcribes speech to text.
I also learned about Deepgram’s Flux model last week while at Twilio’s SIGNAL conference. Flux was released in October last year and is helping to drive voice AI use cases because it was trained to understand conversations, particularly turn-taking.
It’s an exciting time to be building voice interfaces to your agents.
🧠 New models
🗞️ Other news
🧑‍💻 Code & Libraries
- Steve Yegge announced Gas City, a framework in which you can build your own Gas Town multi-agent orchestration system
- Agent Vault is an open-source credential broker that sits between agents and the APIs they call
- I came across a couple of agentic memory projects: Stash and agentmemory
- Cursor launched a TypeScript SDK that you can use to build your own coding agents
- OpenUI is a toolkit for building generative UI into your agent
🔦 Langflow Spotlight
Back when Langflow released version 1.9, one of the most exciting additions was the Langflow Assistant, an in-app assistant that lets you generate components from natural language. Check out the video of it in action in this post.