March 09, 2026
There’s a funny video?
That’s right, you can watch it on:
A video game?
You can think of the funny video as a trailer for a game, which was the actual project I worked on for the Hermes Agent hackathon. You can watch a playthrough of the game on Twitter.
I modded an open source game inspired by Factorio so that the “point of contact” for the Hermes Agent becomes a factory building game. Levels and tutorials guide the player through the features I assembled:
- Level 1: Prompt a Playwright web browser agent in a Vers VM
- Level 2: Prompt an iMessage communication agent locally
- Level 3: Prompt a GitHub administrative agent inside an Apple container instead of a Docker container
- Level 4: Prompt a coding agent in a cloud sandbox
Why
Like how every AI company has a similar looking logo, every AI company with a website has a centered input bar that resembles Google:

Compared to:

And every AI company with a big enough engineering team will have their own browser and so forth. The list of similarities is endless, so what are we missing?
A great deal of the work done in harnesses or orchestration revolves around clarification. What do I mean by that? Let’s refer to a classic XKCD comic:

One point of the comic is the idea that “some things are difficult for humans but easy for computers and some things are difficult for computers but easy for humans”. We already have companies with market caps of trillions of dollars being run by people and they seem to continue along just fine. Translating how organizations of people work into systems of agents is a game of clarifying the necessary loops people or agents should be working in.
We already know that AI can be organized to work on flashy or sizeable problems, yet we’re always eager to chip away at arranging the next best system. With folks rediscovering political science but for agents, it’s worth recognizing the never-ending rabbit hole here; there’s always going to be a number to increase or decrease in a score or benchmark.
Where things could be interesting is if we consider how science fiction always has intuitive interfaces: holograms that slide at the gesture of a hand, voice input immediately available, and information that presents itself as itself and not an output of a medium. We get what’s in front of us rather than only see it.
What factory building games do well is let you visually watch the pulse of the game. I’d compare watching that pulse to watching an animation or diagram of cellular activity. And that was precisely what I thought would fit well in conjunction with the newly added features like iMessage or Apple containers.
Still, let me tie this into the whole “argument for alternative interfaces” and not just link to Iron Man on YouTube. The miracle of modern video calling tech is that people can talk to folks thousands of miles away as though they were right in front of them, without ever having to cross that distance. Sure, you’re talking to a face on a screen and not a person; with VR you’re talking to a face on two screens instead. But if you’ve ever talked to any person through a smartphone, then you can see how different it is in terms of presence versus sending a handwritten letter.
The promise of the information highway was that all of the world’s information would be at our fingertips, and it’s gotten pretty good at it if we’re to be honest. For AI to be a similar advancement in the world, it’s got to come with a new interface. We already have information at our fingertips, so where are the interfaces with capabilities at our fingertips?
GitHub
Finally, if you scrolled down here for the GitHubs, here ya go
- https://github.com/hdresearch/shapez.io - shapez.io fork with mod
- https://github.com/hdresearch/hermes-agent - hermes-agent fork
- https://github.com/hdresearch/shapez - Custom bridge/server (submodule of hermes-agent)