A coding agent with direction

April 29, 2026


What is this?

This is an overview of the principles I used to assemble zagent, a coding harness for getting more progress out of a single “shot”. I’ll describe both the ideas I built on top of and the structure of what I put together.

If you’re looking for a post that gives copy-and-paste-able commands, this isn’t for you. If you’re alright with reading something more explanatory, then continue on!

What is this not?

zagent is not going to replace your Claude Code or pi (which I predominantly use), but the ideas below should be high level enough that you can implement them in your own harnesses or coding agent systems.

The background

I didn’t by any means invent a new model or algorithm; I simply combined some existing concepts that are accessible to anyone. Being transparent, this is my way of building up towards a general purpose version of what Google accomplished with AlphaEvolve.

To break down what the heck is going on with zagent, there are three “primitives” in the area of coding agents that would be useful to know.

Coding agent

From editors like Cursor to headless systems like Devin, there’s a large variety of offerings that all fall under the notion of “coding agents”. Simplifying it to the bare minimum, a coding agent is an AI that can take a prompt from someone and write code to accomplish some goal. However it may be accessed by a user (tagging it in a Slack workspace, sending a message on Telegram, writing a prompt in a UI, etc), the underlying step up from general agents is that it can write and run code.

graph LR;
    Prompt-->Agent
    Agent["Coding agent\n(Can write/run code)"]-->Output

The power of coding agents, sometimes underappreciated in domains other than literally writing software, is in how much of the world is built on top of code, making them immediately ‘effective’ today. An agent could be “short lived”, running only to solve a specific problem before exiting, or “long running” with a growing memory. Depending on the particular use case you’re looking at, one may be better than the other.

In the context of writing software that delivers something, I’ve personally found the philosophy of short lived agents to be better suited.

Ralph loops

Ralph Wiggum loops, named literally after the Simpsons character, are a technique for working with coding agents that looks something like the below pseudo-code:

while not done:
    fire coding agent at task(s)
    repeat until done

For instance, the Claude Code plugin would run a while true loop in bash until the LLM output a specific string indicating it had actually completed the task rather than just saying things that sounded nice. In pseudo-code, that’d look something like:

while "DONE" not in last_output:
    fire claude code and tell it to say "DONE" when finished
    continue until done
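As a runnable sketch of the loop above (with a stubbed-out `fire_agent` standing in for a real agent invocation, which you’d normally shell out to; the stub and its three-pass behavior are purely illustrative):

```python
import itertools

def fire_agent(task: str, attempt: int) -> str:
    """Stub agent: pretend the task needs a few fresh-context passes.
    In practice this would shell out to a real coding agent CLI."""
    return f"made progress on {task}" if attempt < 3 else "all tests pass. DONE"

def ralph_loop(task: str, max_iterations: int = 50) -> int:
    """Re-fire the agent with a fresh context until it emits the sentinel."""
    for attempt in itertools.count(1):
        output = fire_agent(task, attempt)
        if "DONE" in output or attempt >= max_iterations:
            return attempt

iterations = ralph_loop("write unit tests for the parser")
```

The `max_iterations` cap matters in practice: a sentinel the agent never says would otherwise loop forever.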

To avoid running out of context (and to stay productive while you sleep), folks would run “ralph loops”, since the while true serves as a way to reset the context over and over, letting the agent run ‘infinitely’. For problems where the task is working through a large bullet point list of items (ie meticulously writing unit tests across a large codebase), it works well since the tokens that filled up the context on prior items aren’t relevant to solving the remaining ones.

graph LR;
    Prompt["Send prompt before going to sleep"]-->Ralph
    Ralph["Ralph loop\n(Resetting and repeating over and over till it's done)"]-->Goal["Completed goal"]
    Ralph-->Ralph

However, for problems where you do lose something by resetting the context (ie a complex integration which requires knowing about all the pieces involved to be useful), ralph loops can fall short. While still a useful technique, it’s no longer meme’d as the solution for “solving programming” for this reason.

RLMs

An idea popularized by a blog post and then published to arXiv, RLMs broadly solve the same problem of “running out of context”, but in an importantly different way. Rather than place the “infinite loop” above the LLM (as in the ralph loop), what if the loop were conceptually brought into the agent loop itself? In RLMs, this is done by letting the agent recursively call itself or other agents before coming back with a final answer.

graph LR;
    Prompt-->Agent
    Agent-->Sub["Sub-agent"]
    Sub-->Web["Web request"]
    Sub-->Code["Run code"]
    Sub-->Process["Process results"]
    Sub-->Agent
    Agent-->Result

To explain how this works with an analogy: suppose you wake up to a text message asking you to research something, you have five minutes to respond, and you haven’t even had the chance to have coffee yet. Lacking the energy to Google around, you text someone else who either knows the answer already or wouldn’t mind finding it; they get back to you with the answer, you forward it to the first person, and all’s done.

A profound utility of this is being able to “stretch” your context window, since spawned sub-agents can burn through their own context windows exploring something instead of the top-level agent you gave the original prompt to. Nowadays, in conjunction with techniques like memory, some of the older problems with arbitrarily large context windows have tools for tackling them.
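The “stretching” can be sketched like so: only compact summaries flow back up to the top-level agent, while each sub-agent spends its own context exploring (both functions here are illustrative stubs, not any real RLM API):

```python
def sub_agent(question: str) -> str:
    """Each sub-agent burns its own context window exploring,
    then returns only a short summary to its caller (stubbed)."""
    _exploration = f"[...thousands of tokens spent reading about '{question}'...]"
    return f"summary of {question}"

def top_agent(goal: str, subtasks: list[str]) -> str:
    """The top-level agent only ever sees the compact summaries, so its
    own context stays small regardless of how much the sub-agents read."""
    findings = [sub_agent(task) for task in subtasks]
    return f"{goal}: " + "; ".join(findings)

result = top_agent("plan the integration",
                   ["read auth module", "read billing module"])
```

The key design point is the return value: a sub-agent hands back a digest, never its full transcript.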

Where “infinite context” can fall short is broadly captured by how completely illuminating a house such that there are no shadows makes it uninhabitable. It’s no secret LLMs can be convincing, whether to themselves or to users falling into AI psychosis. As a result, letting an agent ruminate on some goal or task (even one as rational as programming) can lead to adverse results that read as unproductive to the person hoping to finish an app.

zagent terms

Inspired by my experience with herding coding agents, there are three layers I’ve assembled into zagent that apply the above ideas. Before you ask, yes, the names are inspired by One Piece.

Code cannon

In my prior projects using “code cannons”, like rewriting git in Zig or developing a modern toolkit between Elixir and WebAssembly, what I was really doing was leveraging Vers VMs as the RLM environments in which sub-agents worked on scoped problems. To differentiate it from the ideal of a code factory, this RLM pattern is what I’ve referred to as a “code cannon”.

graph TD;
    Agent-->Sub1["Sub-agent"]
    Agent-->Sub2["Sub-agent"]
    Agent-->Sub3["Sub-agent"]
    subgraph cannon[" "]
        Sub1
        Sub2
        Sub3
        Sub1-->RF1["Read file"]
        Sub1-->WF1["Write file"]
        Sub1-->RP1["Run program"]
        Sub2-->RF2["Read file"]
        Sub2-->WF2["Write file"]
        Sub2-->RP2["Run program"]
        Sub3-->RF3["Read file"]
        Sub3-->WF3["Write file"]
        Sub3-->RP3["Run program"]
    end

In the case of rewriting the git CLI, there are several subcommands which can be worked on in parallel (and on different files which can prevent conflicts when merging changes). You can think of this like how, at a hackathon, you may have one person working on the backend, one person working on the frontend, and one person working on the slideshow presentation; each of them can work on their piece of the overall project without stepping on each others’ toes.
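A minimal sketch of that parallel, scoped fan-out (the subcommand names, file paths, and `sub_agent` stub are all hypothetical; a real cannon would run each sub-agent in its own sandboxed VM):

```python
from concurrent.futures import ThreadPoolExecutor

# Disjoint file scopes per sub-agent, so merging their changes won't conflict
SCOPED_TASKS = {
    "git-add": ["src/add.zig"],
    "git-commit": ["src/commit.zig"],
    "git-log": ["src/log.zig"],
}

def sub_agent(subcommand: str, files: list[str]) -> str:
    """Stub for a sandboxed sub-agent working only its assigned files."""
    return f"{subcommand}: edited {', '.join(files)}"

def fire_cannon(tasks: dict[str, list[str]]) -> list[str]:
    """Launch all scoped sub-agents in parallel and collect their reports."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda kv: sub_agent(*kv), tasks.items()))

reports = fire_cannon(SCOPED_TASKS)
```

The scoping is what makes the parallelism safe: like the hackathon analogy, no two agents touch the same files.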

Code pirate

Taking a step back and contemplating what I was really doing when “firing code cannons”: I would check the progress or status of changes, break down the next wave of changes I wanted to see, provision new agents with their respective prompts, and let everything run for a while before coming back to my laptop and repeating.

Enter the “code pirate”: a ralph loop that works from a markdown file, firing code cannons until it has made more substantial progress.

graph LR;
    Pirate["Code pirate"]-->Pirate
    Pirate-->SA1
    Pirate-->SA2
    Pirate-->SA3
    Pirate-->SA4
    subgraph pair1["Code cannon"]
        SA1["Sub-agent"]
        SA2["Sub-agent"]
    end
    subgraph pair2["Code cannon"]
        SA3["Sub-agent"]
        SA4["Sub-agent"]
    end

By bridging the context-resetting of the ralph loop (the pirate) with the context-mindfulness of RLMs (the cannons), it establishes a coding harness able to accomplish larger diffs, like building out sterling (if it’s still private, it’s coming soon!).
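One way to sketch the pirate, assuming its markdown file is a simple checklist (the task list, checkbox format, and `fire_cannon` stub here are my illustrative choices, not zagent’s actual internals):

```python
TASKS_MD = """\
- [ ] scaffold the project
- [ ] implement the parser
- [x] write the README
"""

def unchecked(md: str) -> list[str]:
    """Extract the tasks that still have an open checkbox."""
    return [line[6:] for line in md.splitlines() if line.startswith("- [ ]")]

def fire_cannon(task: str) -> str:
    """Stub: a wave of parallel, scoped sub-agents working the task."""
    return task

def pirate_loop(md: str) -> str:
    """Fresh context each pass: re-read the checklist, fire a cannon
    at each remaining item, mark it done, repeat until nothing is left."""
    while unchecked(md):
        for task in unchecked(md):
            fire_cannon(task)
            md = md.replace(f"- [ ] {task}", f"- [x] {task}")
    return md

done = pirate_loop(TASKS_MD)
```

Because the loop re-reads the file on every pass, state lives in the markdown rather than in any single agent’s context.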

When I come back to my computer to review a run’s result, it’s less about nitpicking feature details and more about steering the army of coding agents overall. Making sterling with the code pirate was less about firing it over and over at a single goal and more about setting goals, letting it follow through on them, and then setting new goals to be implemented (like making a peanut butter jelly sandwich).

Code captain

Everything up to this point, I can truthfully say, has yielded a real result that would have taken more time or effort with a different tool. This next “layer” is something I’ve been tinkering with and have not yet felt like I’ve “cracked”. However, I’m sharing it here in case the concepts are of use to someone else facing similar problems.

When tackling projects where “working” is non-negotiable (ie meeting a test coverage quota, an ambiguity that leads some agents to give up early), totally depending on the LLM to come back with a result can be anticlimactic.

To remedy this while tinkering with Lean, I’ve started working on a “code captain”, which behaves like a code pirate but, rather than letting the agent exit when it’s gone astray, adds a gate that prevents the pirate from exiting until all conditions are met.

graph LR;
    subgraph pirate["Repeat until complete"]
        Pirate["Code captain"]-->Pirate
    end
    pirate-->SA1
    pirate-->SA2
    pirate-->SA3
    pirate-->SA4
    subgraph pair1["Code cannon"]
        SA1["Sub-agent"]
        SA2["Sub-agent"]
    end
    subgraph pair2["Code cannon"]
        SA3["Sub-agent"]
        SA4["Sub-agent"]
    end

If the gate’s not well defined, the agent can find a way to exit early. If the gate’s redefinable (ie learning about new objectives or constraints over time) or even appendable, the agent may still find a way to exit early. So, ultimately, software engineering remains a game of scoping objectives well.
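A minimal sketch of such a gate, assuming the conditions are plain pass/fail checks (the check names and the stubbed `run_checks` that “passes” on the second round are hypothetical; in practice this would run the real test suite, coverage tool, etc):

```python
def gate(checks: dict[str, bool]) -> list[str]:
    """Return the names of failing conditions; empty means the gate opens."""
    return [name for name, passed in checks.items() if not passed]

def run_checks(iteration: int) -> dict[str, bool]:
    """Stub: pretend coverage only reaches the quota on the second pass."""
    return {"tests pass": True, "coverage >= 90%": iteration >= 2}

def captain_loop(max_iterations: int = 10) -> int:
    """A pirate loop that cannot exit until every gate condition holds.
    The failing conditions would be fed into the next pass's prompt."""
    for i in range(1, max_iterations + 1):
        if not gate(run_checks(i)):
            return i
    raise RuntimeError("gate never opened")

rounds = captain_loop()
```

Note the gate lives outside the LLM: the agent never gets to declare itself done, only the checks do.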

Takeaways

Training up an employee versus hiring more interns is like the difference between vertical and horizontal scaling. Likewise, leveling up a single person versus spinning up agents to fill certain tasks is vertical versus horizontal scaling, but for responsibilities. The underlying problem with coding harnesses is boiling down the responsibilities of a software engineer into horizontally scalable skills.

It’s already the case at some hedge funds that folks develop models for executing strategies but aren’t picking up the phone and placing orders themselves. While some firms still rely on old fashioned methods, the analog in software is that there will eventually be categories of products where the code defining them isn’t governed by people, but by the systems those people established.

Until the day coding’s finally solved, we shall still have problems to solve. Hack the planet!