April 29, 2026
What is this?
This is an overview of the principles I used to assemble zagent, a coding harness for getting more progress out of a single “shot”. I’ll be describing both the ideas I built on top of and the structure of what I put together.
If you’re looking for a post that gives copy-and-pasteable commands, this isn’t for you. If you’re alright with reading something more explanatory, then continue on!
What is this not?
zagent is not going to replace your Claude Code or pi (which I predominantly use), but the ideas below should be high level enough that you can implement them in your own harnesses or coding agent systems.
The background
I didn’t by any means invent a new model or algorithm; I simply combined some existing concepts that are accessible to anyone. To be transparent, this is my way of building up towards a general-purpose version of what Google accomplished with AlphaEvolve.
To break down what the heck is going on with zagent, there are three “primitives” in the area of coding agents that would be useful to know.
Coding agent
From editors like Cursor to headless systems like Devin, there’s a large variety of offerings that all fall under the notion of “coding agents”. Simplified to the bare minimum, a coding agent is an AI that can take a prompt from someone and write code to accomplish some goal. However it may be accessed by a user (could be tagging in a Slack workspace, sending a message on Telegram, writing a prompt from a UI, etc), the step that distinguishes it from general agents is that it can write and run code.
Sometimes underappreciated in domains other than literally writing software, the power of coding agents lies in how much of the world is built on top of code, making them immediately ‘effective’ today. A coding agent could be a “short lived” agent that only runs to solve a specific problem before exiting or a “long running” agent with a growing memory. Depending on the particular use case you’re looking for, one may be better than the other.
In the context of writing software that delivers something, I’ve personally found the philosophy of short lived agents to be better suited.
Ralph loops
Ralph Wiggum loops, named literally after the Simpsons character, are a technique for working with coding agents that looks something like the below pseudo-code:
while not done:
    fire coding agent at task(s)
For instance, the Claude Code plugin would run a while-true loop in bash until the LLM output a specific string indicating it had actually completed the task rather than saying things that sounded nice. In pseudo-code, that’d look something like:
while "DONE" not in last_output:
    fire claude code and tell it to say "DONE" when finished
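To make the pattern concrete, here’s a minimal runnable sketch. The `fire_agent` callable is an illustrative stand-in for whatever actually invokes your coding agent (a subprocess call to a CLI, an API request, etc); the name, signature, and sentinel string are my assumptions, not a real API.

```python
def ralph_loop(fire_agent, sentinel="DONE", max_iters=50):
    """Re-fire the agent, fresh context each time, until it emits the sentinel."""
    for i in range(max_iters):
        output = fire_agent()  # each call starts from an empty context
        if sentinel in output:
            return i + 1       # how many shots it took
    raise RuntimeError("agent never reported completion")

# Demo with a fake agent that "finishes" on its third run.
outputs = iter(["working...", "still working...", "all tests pass. DONE"])
print(ralph_loop(lambda: next(outputs)))  # 3
```

The key property is that each iteration is a brand-new invocation: nothing carries over except what the agent left on disk.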
To avoid running out of context (and to keep making progress while you sleep), folks would run “ralph loops”, since the while-true serves as a way to reset the context over and over, letting it run ‘infinitely’. For problems where the task is working through a large bullet-point list of items (ie meticulously writing unit tests across a large codebase), it works well, since the tokens that filled up the context about previously solved items aren’t relevant to the context needed for solving the problems that remain.
However, in the case of problems where you do lose something by resetting the context (ie a complex integration which requires knowing about all the pieces involved to be useful), then ralph loops can fall short. While still a useful technique, it’s no longer meme’d as a solution for “solving programming” for this reason.
RLMs
An idea popularized from a blog post and then published to arXiv, RLMs broadly solve the problem of “running out of context” but in an importantly different way. Rather than place the “infinite loop” above the LLM (like done in the ralph loop), what if the loop were conceptually brought into the agent loop itself? In RLMs, this is done by letting the agent recursively call itself or other agents before coming back with a final answer.
To explain how this works with an analogy: suppose you wake up to a text message asking you to research something, you have five minutes to respond, and you haven’t even had coffee yet. Lacking the energy to Google around, you text someone else who either knows the answer already or wouldn’t mind finding it; they get back with the answer, you forward it to the first person, and all’s done.
A profound utility of this is being able to “stretch” your context window, since spawned sub-agents can spend their own context windows exploring something rather than the context window of the top-level agent you provided the original prompt to. Nowadays, in conjunction with techniques like memory, some of the older problems with arbitrarily large context windows have tools for tackling them.
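Here’s a minimal sketch of that recursion with a fake LLM in place of a real model. The `DELEGATE:` convention, the function names, and the depth limit are all illustrative assumptions; the point is only that a sub-call gets its own fresh context while the parent keeps its context lean.

```python
def agent(task, answer, depth=0, max_depth=3):
    """Answer `task`, recursively spawning a sub-agent when the model asks to."""
    result = answer(task)
    if result.startswith("DELEGATE:") and depth < max_depth:
        subtask = result[len("DELEGATE:"):]
        # The sub-agent burns its own context window, not the parent's.
        sub_answer = agent(subtask, answer, depth + 1, max_depth)
        return answer(f"{task}\nSub-agent found: {sub_answer}")
    return result

# Fake "LLM": delegates the big question once, then answers directly.
def fake_llm(prompt):
    if "Sub-agent found" in prompt:
        return "final answer using the sub-result"
    if prompt == "big question":
        return "DELEGATE:narrow question"
    return "sub-result"

print(agent("big question", fake_llm))  # final answer using the sub-result
```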
Where “infinite context” can fall short is broadly explained by how completely illuminating a house so that there are no shadows makes it uninhabitable. It’s no secret LLMs can be convincing, whether to themselves or to users falling into AI psychosis. As a result, letting an agent ruminate endlessly on some goal or task (even a rational one like programming) can lead to adverse results that feel unproductive to the person just hoping to finish an app.
zagent terms
Inspired by my experience with herding coding agents, there are three layers I’ve assembled into zagent that apply the above ideas. Before you ask, yes, the names are inspired by One Piece.
Code cannon
In my prior projects using “code cannons”, like rewriting git in Zig or developing a modern toolkit between Elixir and WebAssembly, what I was really doing was leveraging Vers VMs as the RLM environments in which sub-agents worked on scoped problems. To differentiate it from the ideal of a code factory, this RLM pattern is what I’ve referred to as a “code cannon”.
In the case of rewriting the git CLI, there are several subcommands which can be worked on in parallel (and on different files which can prevent conflicts when merging changes). You can think of this like how, at a hackathon, you may have one person working on the backend, one person working on the frontend, and one person working on the slideshow presentation; each of them can work on their piece of the overall project without stepping on each others’ toes.
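A cannon shot can be sketched as firing one scoped sub-agent per disjoint slice of work. This is an illustrative sketch, not zagent’s actual code: `work_on` stands in for a real agent run inside a VM, and the scopes here are git subcommands as in the example above.

```python
from concurrent.futures import ThreadPoolExecutor

def fire_cannon(scopes, work_on):
    """Run one sub-agent per scope concurrently; return a scope -> result map."""
    with ThreadPoolExecutor(max_workers=len(scopes)) as pool:
        futures = {scope: pool.submit(work_on, scope) for scope in scopes}
        # Because the scopes touch disjoint files, results merge without conflicts.
        return {scope: f.result() for scope, f in futures.items()}

# Demo: each fake "agent" just reports what it implemented.
subcommands = ["git add", "git commit", "git log"]
results = fire_cannon(subcommands, lambda s: f"implemented {s}")
print(results["git log"])  # implemented git log
```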
Code pirate
Taking a step back and contemplating what I was really doing when “firing code cannons”: I would check the progress or status of changes, break down the next wave of changes I wanted to see, provision new agents with their respective prompts, and let it all run for a while before coming back to my laptop and repeating.
Enter the “code pirate”: a ralph loop that works from a markdown file, firing code cannons until it has made substantial progress.
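As a sketch, a pirate is a ralph loop whose state lives in a markdown checklist rather than in the agent’s context. The `- [ ]` checkbox format, the `fire_cannon` callable, and the convergence rule are all assumptions for illustration.

```python
import re

def unchecked(markdown):
    """Return the unchecked `- [ ]` items in a markdown checklist."""
    return re.findall(r"- \[ \] (.+)", markdown)

def pirate_loop(read_plan, fire_cannon, max_rounds=20):
    """Fresh context each round: re-read the plan, fire a cannon, repeat."""
    for round_no in range(max_rounds):
        todo = unchecked(read_plan())
        if not todo:
            return round_no  # nothing left unchecked: done
        fire_cannon(todo)   # the cannon is expected to tick items off
    raise RuntimeError("plan never converged")

# Demo with an in-memory plan and a fake cannon that ticks one item per round.
plan = ["- [ ] parse args", "- [ ] write tests"]

def fake_cannon(todo):
    i = plan.index(f"- [ ] {todo[0]}")
    plan[i] = f"- [x] {todo[0]}"

print(pirate_loop(lambda: "\n".join(plan), fake_cannon))  # 2
```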
By bridging together the context-resetting of the Ralph loop (the pirate) and the context-mindfulness of the RLMs (the cannons), it establishes a coding harness which is able to accomplish larger diffs like building out sterling (if it’s still private, it’s coming soon!).
When I come back to my computer to review a pirate’s results, it’s less about untangling knots in individual feature intentions and more about steering the army of coding agents overall. Making sterling with the code pirate was less about firing it over and over at a goal and more about setting goals, letting it follow them through, and then setting new goals to be implemented (like making a peanut butter and jelly sandwich).
Code captain
Everything up to this point, I can truthfully say, has yielded a real result that would have taken more time or effort with a different tool. This next “layer” is something I’ve been tinkering with and haven’t yet found a version that feels like I’ve “cracked it”. However, I’m sharing it here in case the concepts are of use to someone else facing similar problems.
When tackling projects where “working” is non-negotiable (ie meeting a test coverage quota, the kind of ambiguity that leads some agents to give up early), depending entirely on the LLM to come back with a result can be anticlimactic.
To remedy this while tinkering with Lean, I’ve started working on a “code captain” which behaves like a code pirate but, rather than let the agent exit when it’s gone astray, I added a gate which prevents the pirate from exiting until all conditions are met.
If the gate’s not well defined, then the agent can find a way to exit early. If the gate’s redefinable (ie learning about new objectives or constraints over time) or even appendable, then the agent may still find a way to exit early. So, ultimately, software engineering’s a game of scoping objectives well.
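A gate can be sketched as a set of explicit, machine-checkable predicates that stand between the pirate and the exit. The predicate names and the `fire_pirate` callable here are illustrative stand-ins, not zagent’s real interface.

```python
def captain_loop(fire_pirate, gates, max_rounds=10):
    """Keep firing the pirate until every gate predicate passes."""
    failing = list(gates)
    for round_no in range(max_rounds):
        fire_pirate()
        failing = [name for name, check in gates.items() if not check()]
        if not failing:
            return round_no + 1  # all conditions met: allowed to exit
        # A real captain would feed `failing` back into the next prompt.
    raise RuntimeError(f"gates never satisfied: {failing}")

# Demo: coverage climbs each round; the gate opens at 90%.
state = {"coverage": 70}

def fake_pirate():
    state["coverage"] += 10

gates = {"coverage>=90": lambda: state["coverage"] >= 90}
print(captain_loop(fake_pirate, gates))  # 2
```

Note the gate checks run outside the agent, which is the whole point: the agent can claim it’s done, but the loop only trusts the predicates.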
Takeaways
Training existing employees versus hiring interns is like vertical versus horizontal scaling; likewise, leveling up a single person versus spinning up agents to fill certain tasks is vertical versus horizontal scaling of responsibilities. The underlying problem with coding harnesses is boiling down the responsibilities of a software engineer into horizontally scalable skills.
It’s already the case in some hedge funds that folks will develop models for executing strategies but aren’t picking up the phone and placing orders themselves. While there are still some firms which rely on old fashioned methods, the analog to software is that there will eventually be categories of products where the code defining these products isn’t governed by people but instead by the systems established by them.
Until the day coding’s finally solved, we shall still have problems to solve. Hack the planet!