<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://yev.bar/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yev.bar/" rel="alternate" type="text/html" /><updated>2026-05-06T17:40:21+00:00</updated><id>https://yev.bar/feed.xml</id><title type="html">Yev Barkalov</title><subtitle>Here is the description I am putting in</subtitle><entry><title type="html">Bringing Hermes to WebAssembly</title><link href="https://yev.bar/hermes-wasm" rel="alternate" type="text/html" title="Bringing Hermes to WebAssembly" /><published>2026-05-06T08:00:00+00:00</published><updated>2026-05-06T08:00:00+00:00</updated><id>https://yev.bar/hermes-wasm</id><content type="html" xml:base="https://yev.bar/hermes-wasm"><![CDATA[<h2 id="whats-in-this-post">What’s in this post?</h2>

<p>We took <a href="https://hermes-agent.nousresearch.com/">Hermes Agent</a>, developed by the folks at <a href="https://nousresearch.com/">Nous Research</a>, and brought it to WebAssembly in two different ways. Below, we detail both approaches and share our general takeaways from the experiment.</p>

<h2 id="should-i-replace-my-hermes-with-one-of-these">Should I replace my Hermes with one of these?</h2>

<p>Probably not. If doing things with Python in WebAssembly is of interest to you, then continue reading!</p>

<h2 id="where-did-hermes-come-from">Where did Hermes come from?</h2>

<p>Having taken the world by storm, <a href="https://openclaw.ai">OpenClaw</a> is an AI assistant which uses the computer you run it on <a href="https://www.youtube.com/watch?v=WnzR5aOElvw">similarly to a person</a> (so it can do more than just <a href="https://youtu.be/7xTGNNLPyMI?si=5OR24vJniQglQL6j&amp;t=3121">recite Wikipedia articles</a>). Following this, Hermes was written in Python on top of <a href="https://github.com/SWE-agent/mini-swe-agent">mini-swe-agent</a> (unlike OpenClaw which is written in TypeScript on top of <a href="https://lucumr.pocoo.org/2026/1/31/pi/">pi</a>).</p>

<div class="mermaid">
graph LR
    subgraph outer[" "]
        direction LR
        subgraph inner["More tools + use computer"]
            A["Coding agent"]
        end
        inner --&gt;|"Same as"| B["General-purpose agent"]
    end
    style A fill:#1a1a2e,stroke:#58a6ff,stroke-width:2px,color:#c9d1d9
    style inner fill:#161b22,stroke:#58a6ff,stroke-width:1px,color:#8b949e
    style B fill:#238636,stroke:#2ea043,stroke-width:2px,color:#ffffff
    style outer fill:none,stroke:none
</div>

<p>Both are “general-purpose agents” which are really batteries-included coding agents. The “magic” of their utility comes from assembling the <a href="https://www.mendral.com/blog/agent-harness-belongs-outside-sandbox">“harness”</a> the agent sits inside of when handling users’ prompts, which gives it the capability to do things like click around a browser or send an email.</p>

<h2 id="why-webassembly">Why WebAssembly?</h2>

<p>I’m admittedly <a href="/blog/elixir-webassembly-billion-tokens">biased</a> when it comes to WebAssembly but, having worked on <a href="/blog/git-zig-bun-100x">multi-agent</a> <a href="/blog/zagent">projects</a> before, I was interested in seeing if WebAssembly would give a win with regard to isolation (ie spinning up multiple separate agents in parallel) or granular configurability (ie assembling agents precisely with certain tools for different “modes”).</p>

<p>Additionally, since Hermes is written in Python, I wanted to see if eagerly compiling to WebAssembly would offer closer-to-native performance.</p>

<h2 id="how-to-wasm-hermes">How to WASM Hermes</h2>

<h3 id="pyodide">Pyodide</h3>

<h4 id="run-hermes-in-pyodide-yourself">Run Hermes in pyodide yourself</h4>

<p>This app serves an <code class="language-plaintext highlighter-rouge">index.html</code> with the full agent running client-side in Pyodide. To run the below command, you may need to <a href="https://docs.vers.sh/installation">install the <code class="language-plaintext highlighter-rouge">vers</code> CLI</a> first.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vers run-commit f46a2b21-73fe-4835-ad79-8eccd523fc07 <span class="se">\</span>
  <span class="nt">--format</span> json <span class="nt">--wait</span> <span class="se">\</span>
  | <span class="nb">sed</span> <span class="nt">-n</span> <span class="s1">'s/.*"vm_id"[: ]*"\([^"]*\)".*/https:\/\/\1.vm.vers.sh/p'</span>
</code></pre></div></div>

<p><strong>Public Vers VM Commit:</strong> <code class="language-plaintext highlighter-rouge">f46a2b21-73fe-4835-ad79-8eccd523fc07</code></p>

<h4 id="how-it-works-with-pyodide">How it works with Pyodide</h4>

<div class="mermaid">
graph LR
    subgraph outer[" "]
        direction LR
        subgraph pyodide["Pyodide"]
            A["Hermes agent"]
        end
        subgraph browser["Web browser"]
            W["WebAssembly"]
        end
        pyodide --&gt; W
    end
    style A fill:#1a1a2e,stroke:#58a6ff,stroke-width:2px,color:#c9d1d9
    style pyodide fill:#161b22,stroke:#58a6ff,stroke-width:1px,color:#8b949e
    style W fill:#1a1a2e,stroke:#f0883e,stroke-width:2px,color:#c9d1d9
    style browser fill:#161b22,stroke:#f0883e,stroke-width:1px,color:#8b949e
    style outer fill:none,stroke:none
</div>

<p>The first approach is by using <a href="https://pyodide.org/en/stable/">Pyodide</a>, a Python runtime that’s ported to WebAssembly so Python programs can be interpreted and run in the browser. You can think of this as being similar to the approach that was taken with <a href="https://supabase.com/blog/postgres-wasm">bringing Postgres to WebAssembly</a>:</p>

<div class="mermaid">
graph LR
    subgraph outer[" "]
        direction LR
        subgraph buildroot["Linux VM created with Buildroot"]
            P["Postgres"]
        end
        subgraph browser2["Web browser"]
            W2["WebAssembly"]
        end
        buildroot --&gt; W2
    end
    style P fill:#1a1a2e,stroke:#58a6ff,stroke-width:2px,color:#c9d1d9
    style buildroot fill:#161b22,stroke:#58a6ff,stroke-width:1px,color:#8b949e
    style W2 fill:#1a1a2e,stroke:#f0883e,stroke-width:2px,color:#c9d1d9
    style browser2 fill:#161b22,stroke:#f0883e,stroke-width:1px,color:#8b949e
    style outer fill:none,stroke:none
</div>

<p>Postgres itself isn’t compiled to WebAssembly; instead, a Linux emulator running in WASM boots a modified version of Postgres so the whole stack can actually work together inside a browser.</p>

<h4 id="hermes-in-pyodide-source">Hermes in Pyodide source</h4>

<p>You can view and modify the source code here: <a href="https://github.com/hdresearch/hermes-pyodide">https://github.com/hdresearch/hermes-pyodide</a></p>

<h3 id="pywasm">pywasm</h3>

<h4 id="run-hermes-in-pywasm-yourself">Run Hermes in pywasm yourself</h4>

<p>This app serves the <code class="language-plaintext highlighter-rouge">hermes_agent.wasm</code> binary. Hit the “Run” button in the UI to execute it live.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vers run-commit f83df1ac-0a53-4ca6-bc26-205584fe65a3 <span class="se">\</span>
  <span class="nt">--format</span> json <span class="nt">--wait</span> <span class="se">\</span>
  | <span class="nb">sed</span> <span class="nt">-n</span> <span class="s1">'s/.*"vm_id"[: ]*"\([^"]*\)".*/https:\/\/\1.vm.vers.sh/p'</span>
</code></pre></div></div>

<p><strong>Public Vers VM Commit:</strong> <code class="language-plaintext highlighter-rouge">f83df1ac-0a53-4ca6-bc26-205584fe65a3</code></p>

<h4 id="how-it-works-with-py2wasm">How it works with py2wasm</h4>

<p>This second approach uses <a href="https://wasmer.io/posts/py2wasm-a-python-to-wasm-compiler">py2wasm</a>, a Python-to-WebAssembly compiler. The pywasm split design keeps the security boundary clean:</p>

<div class="mermaid">
graph LR
    A["Hermes"] --&gt; B

    subgraph B["WASM"]
        B1["• Prompt<br />• Loop<br />• Context<br />• Local tools"]
    end

    B --&gt;|"JSON in/out"| C

    subgraph C["Host"]
        C1["• Calling API<br />• Tool dispatch<br />• API keys"]
    end
</div>

<p>The host extracts real schemas from Hermes’ <code class="language-plaintext highlighter-rouge">ToolRegistry</code> at startup before sending them to the WASM binary via an init protocol. The LLM always sees the same parameter names as the actual handlers (ie <code class="language-plaintext highlighter-rouge">path</code> instead of <code class="language-plaintext highlighter-rouge">file_path</code>, or <code class="language-plaintext highlighter-rouge">old_string</code> instead of <code class="language-plaintext highlighter-rouge">old_text</code>).</p>
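<p>As an illustration, here’s a minimal sketch of that init handshake. The registry shape and field names below are hypothetical stand-ins; the actual <code class="language-plaintext highlighter-rouge">ToolRegistry</code> lives in the hermes-pywasm repo.</p>

```python
import json

# Hypothetical stand-in for the real Hermes ToolRegistry; the actual
# class and its handler schemas live in the hermes-pywasm repo.
class ToolRegistry:
    def __init__(self):
        self.handlers = {
            "read_file": {"params": ["path"]},
            "edit_file": {"params": ["path", "old_string", "new_string"]},
        }

def build_init_message(registry):
    """Serialize the real handler schemas so the WASM side (and the
    LLM) sees the exact parameter names the host dispatches on."""
    schemas = [
        {"name": name, "parameters": spec["params"]}
        for name, spec in registry.handlers.items()
    ]
    return json.dumps({"type": "init", "tools": schemas})

msg = json.loads(build_init_message(ToolRegistry()))
print(msg["tools"][0]["parameters"])  # ['path']
```

Because the schemas are extracted from the same objects the host dispatches on, the two sides can’t drift apart.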

<h4 id="hermes-in-pywasm-source">Hermes in pywasm source</h4>

<p>You can view and modify the source code yourself here: <a href="https://github.com/hdresearch/hermes-pywasm">https://github.com/hdresearch/hermes-pywasm</a></p>

<h2 id="benchmarks">Benchmarks</h2>

<p>Below are benchmarks obtained from running on an M4 MacBook. As <a href="#should-i-replace-my-hermes-with-one-of-these">admitted earlier</a>, this probably won’t meaningfully replace running Hermes on your laptop. However, if porting the harness itself to alternative environments (ie a browser) is of interest to you, then you can see some of the tradeoffs between <a href="#pyodide">Pyodide</a> and <a href="#pywasm">py2wasm</a>.</p>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Native Python</th>
      <th>Pyodide (browser)</th>
      <th>pywasm (WASI)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Cold start</td>
      <td>750 ms</td>
      <td>2–5 s</td>
      <td><strong>110 ms</strong></td>
    </tr>
    <tr>
      <td>Single turn</td>
      <td>840 ms</td>
      <td>~850 ms</td>
      <td><strong>100 ms</strong></td>
    </tr>
    <tr>
      <td>20-turn conversation</td>
      <td>3,280 ms</td>
      <td>~3,300 ms</td>
      <td><strong>110 ms</strong></td>
    </tr>
    <tr>
      <td>50 parallel agents</td>
      <td>4,566 ms</td>
      <td>N/A (browser)</td>
      <td><strong>611 ms</strong> (wasmtime)</td>
    </tr>
    <tr>
      <td>Worker pool throughput</td>
      <td>9 q/s</td>
      <td>N/A (browser)</td>
      <td><strong>81 q/s</strong> (wasmtime)</td>
    </tr>
    <tr>
      <td>Deployment size</td>
      <td>733 MB</td>
      <td>~20 MB + packages</td>
      <td><strong>26 MB</strong></td>
    </tr>
    <tr>
      <td>Pip packages</td>
      <td>171</td>
      <td>171 (via Pyodide)</td>
      <td>0</td>
    </tr>
    <tr>
      <td>Runs in browser</td>
      <td>❌</td>
      <td>✅</td>
      <td>⚠️ needs WASI polyfill</td>
    </tr>
    <tr>
      <td>API key exposure</td>
      <td>server-side</td>
      <td>client-side</td>
      <td>Stored in host</td>
    </tr>
  </tbody>
</table>

<h2 id="takeaways">Takeaways</h2>

<p>While the founder of Docker years ago suggested <a href="https://x.com/solomonstre/status/1111004913222324225?lang=en">WASM+WASI was the missing sandboxing solution</a>, it’s evidently not a magic bullet considering the capabilities it lacks compared to a full-fledged computer or container. If having a full but branchable VM with incredibly fast startup times sounds like what you’re looking for, then go on over to <a href="https://vers.sh">Vers</a> and get started!</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[What’s in this post?]]></summary></entry><entry><title type="html">Taking MemPalace to 100%</title><link href="https://yev.bar/retaining" rel="alternate" type="text/html" title="Taking MemPalace to 100%" /><published>2026-05-06T08:00:00+00:00</published><updated>2026-05-06T08:00:00+00:00</updated><id>https://yev.bar/retaining</id><content type="html" xml:base="https://yev.bar/retaining"><![CDATA[<h2 id="overview">Overview</h2>

<p>We took <a href="https://github.com/mempalace/mempalace">MemPalace</a> and extended its techniques to close the gap in the <a href="https://github.com/mempalace/mempalace#benchmarks">LongMemEval</a> <code class="language-plaintext highlighter-rouge">recall@5</code> retrieval benchmark to get a reproducible 100% score using only local compute (no LLM or API calls).</p>

<h2 id="what-this-is-not">What this is not</h2>

<ul>
  <li><strong>Not a LongMemEval leaderboard score.</strong> The full LongMemEval benchmark is end-to-end and involves generating answers plus GPT-4 judging. This experiment is strictly about the same retrieval metric that MemPalace was tackling.</li>
  <li><strong>Not a strong metric.</strong> The metric is <code class="language-plaintext highlighter-rouge">recall_any@5</code>, the softer variant. <code class="language-plaintext highlighter-rouge">recall_all@5</code> (requiring <em>every</em> gold session in the top 5) would be a harder bar.</li>
  <li><strong>Not a novel algorithm.</strong> The patches came from iterating on failures in the dataset and are general NLP patterns. A new benchmark could be assembled that requires different heuristics, but that just continues the cat-and-mouse game of developing “human-comparable intelligence”.</li>
</ul>

<p>These caveats aren’t intended to steer your attention away but to set expectations about what kind of result this is. The central takeaway, that grammatical patterns in text can be applied to vector stores, still deserves some acknowledgement.</p>
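<p>For concreteness, here’s a small sketch of the difference between the two recall variants (function names are mine, not from the benchmark code):</p>

```python
def recall_any_at_k(retrieved, gold, k=5):
    """Softer variant: at least one gold session in the top k."""
    return any(session in gold for session in retrieved[:k])

def recall_all_at_k(retrieved, gold, k=5):
    """Harder variant: every gold session must be in the top k."""
    return all(session in retrieved[:k] for session in gold)

retrieved = ["s3", "s9", "s1", "s7", "s2"]  # ranked retrieval results
gold = {"s1", "s4"}                         # gold sessions for the question

print(recall_any_at_k(retrieved, gold))  # True: s1 is in the top 5
print(recall_all_at_k(retrieved, gold))  # False: s4 is not
```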

<h2 id="what-we-did-do">What we did do</h2>

<p>We achieved 100% <code class="language-plaintext highlighter-rouge">recall@5</code> retrieval on all 500 LongMemEval questions. The system uses no language model, makes no API calls, and requires no GPU. The MemPalace baseline on the same metric is 96.6%, so the +3.4% improvement represents real engineering work. Shared in a project dubbed <a href="https://github.com/hdresearch/retaining">Retaining</a>, it achieves:</p>

<ul>
  <li><strong>500/500 R@5</strong> (100% recall at rank 5)</li>
  <li><strong>500/500 R@10</strong> (100% recall at rank 10)</li>
  <li>Fully deterministic and reproduced across multiple runs</li>
</ul>

<h2 id="context">Context</h2>

<p>On April 6th, <a href="https://x.com/bensig/status/2041229266432733356">Ben Sigman shared</a> that Milla Jovovich had fun with coding agents and built a solution for long-term memory named “MemPalace”. For those who are fans of sci-fi movies, you may recognize Jovovich as the one who played the <a href="https://en.wikipedia.org/wiki/Milla_Jovovich#Breakthrough_(1997%E2%80%932001)">Fifth Element</a> as well as Alice in <a href="https://en.wikipedia.org/wiki/Resident_Evil_(film_series)">Resident Evil</a>. The cherry on top is that, at the <a href="https://en.wikipedia.org/wiki/Resident_Evil:_The_Final_Chapter#Plot">end of the Resident Evil series</a>, Alice is able to tackle the antagonist after her childhood memories are uploaded to her, a message rather similar to enabling agents by giving them a “memory palace”.</p>

<p>Though it originally proclaimed to score <a href="https://github.com/MemPalace/mempalace/commit/068dbd9a7be0af3c37bbbf1ed0e3dc477f850af8">100% with optional Haiku rerank</a> before backtracking, it’s racked up a good volume of attention and validation, so it’s not totally “viral slop”. By both <a href="https://mempalaceofficial.com/#dialect">compressing content</a> and <a href="https://mempalaceofficial.com/concepts/the-palace.html">making historical context navigable</a>, it highlights the efficacy of simple NLP techniques when applied with LLMs.</p>

<h2 id="improving-mempalace">Improving MemPalace</h2>

<h3 id="what-worked">What worked</h3>

<p>If you’ve seen structured note taking like the <a href="https://lsc.cornell.edu/how-to-study/taking-notes/cornell-note-taking-system/">Cornell Note Taking System</a> or <a href="https://obsidian.md/help/plugins/backlinks">backlinks in Obsidian</a>, then you know there’s more to outlining text than just indexing when or where words occur. With <a href="https://spacy.io/">spaCy</a> and <a href="https://spacy.io/usage/linguistic-features#named-entities">named entity recognition</a>, we can extend the existing pipeline by including noun phrases or other grammatical relations that give a more detailed picture of the “ontology” representing the content at hand.</p>

<p>Below is a table of newly added techniques and how much they contributed to the <code class="language-plaintext highlighter-rouge">recall@5</code> performance:</p>

<table>
  <thead>
    <tr>
      <th>Technique</th>
      <th>Measurement</th>
      <th>Net Δ R@5</th>
      <th>Net Qs Fixed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>NER-enriched synthetic documents</td>
      <td>individual</td>
      <td>+1.6%</td>
      <td>+8</td>
    </tr>
    <tr>
      <td>Keyword overlap re-ranking</td>
      <td>individual</td>
      <td>+1.2%</td>
      <td>+6</td>
    </tr>
    <tr>
      <td>Time-based date matching</td>
      <td>individual</td>
      <td>+0.8%</td>
      <td>+4</td>
    </tr>
    <tr>
      <td>Logic engine scores</td>
      <td>individual</td>
      <td>+0.4%</td>
      <td>+2</td>
    </tr>
    <tr>
      <td>Theme detection</td>
      <td>individual</td>
      <td>+0.2%</td>
      <td>+1</td>
    </tr>
    <tr>
      <td>NP embeddings + LogicKB rewrite</td>
      <td>cumulative</td>
      <td>+0.4%</td>
      <td>+2</td>
    </tr>
    <tr>
      <td>Rank preservation injection</td>
      <td>cumulative</td>
      <td>+0.6%</td>
      <td>+3</td>
    </tr>
    <tr>
      <td>Temporal-NP bridge</td>
      <td>cumulative</td>
      <td>+0.2%</td>
      <td>+1</td>
    </tr>
  </tbody>
</table>

<p><em>Individual: technique alone added to the baseline. Cumulative: technique added on top of prior ones. Deltas overlap and do not sum to total.</em></p>

<p>The top three contributors are all simple re-ranking heuristics. The logic engine contributes modestly and actually causes the most regressions. The finding from this experiment: <strong>enrichening NLP extraction in a retrieval pipeline can produce more than improving the logic engine that queries them.</strong> (<a href="https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf">damn you bitter lesson!</a>)</p>

<h3 id="in-more-detail">In more detail</h3>

<h4 id="1-spacy-based-extraction">1. spaCy-based extraction</h4>

<p>Every session gets processed through spaCy’s <code class="language-plaintext highlighter-rouge">en_core_web_sm</code> pipeline. We extract entities, noun phrases, relations (subject-verb-object triples), time-related markers, and quoted phrases. This takes ~5 seconds per question’s haystack when run on my MacBook.</p>
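<p>As a dependency-free sketch of the kinds of features this pass pulls out (the real pipeline uses spaCy’s parser and NER rather than regexes, so treat this as a crude approximation):</p>

```python
import re

def extract_features(text):
    """Regex approximation of the per-session extraction pass; the
    real pipeline uses spaCy's en_core_web_sm for entities, noun
    phrases, and subject-verb-object triples."""
    return {
        # Capitalized multi-word spans as a rough entity stand-in
        "entities": re.findall(r"\b(?:[A-Z][a-z]+ ?){2,}", text),
        "quoted": re.findall(r'"([^"]+)"', text),
        "time_markers": re.findall(r"\b(?:yesterday|today|\d+ days? ago)\b", text),
    }

feats = extract_features('I met Ada Lovelace "for coffee" 10 days ago')
print(feats["quoted"], feats["time_markers"])
```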

<h4 id="2-pure-python-logic-engine">2. Pure-Python logic engine</h4>

<p>A <code class="language-plaintext highlighter-rouge">LogicKB</code> Python class stores extracted facts as inverted indexes. For each query, it looks up matching objects across all sessions, returning a weighted score for each session. This replaced an earlier Prolog approach with the same idea but much less complexity and no IPC overhead.</p>
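<p>A minimal sketch of the inverted-index idea (the real <code class="language-plaintext highlighter-rouge">LogicKB</code> stores richer fact structures and per-relation weights):</p>

```python
from collections import defaultdict

class LogicKB:
    """Toy inverted index mapping extracted objects to session ids."""
    def __init__(self):
        self.index = defaultdict(set)  # object text -> session ids

    def add_fact(self, session_id, obj):
        self.index[obj.lower()].add(session_id)

    def score(self, query_terms, weight=1.0):
        """Weighted count of matching objects per session."""
        scores = defaultdict(float)
        for term in query_terms:
            for sid in self.index.get(term.lower(), ()):
                scores[sid] += weight
        return dict(scores)

kb = LogicKB()
kb.add_fact("s1", "power bank")
kb.add_fact("s2", "phone")
print(kb.score(["power bank", "phone"]))  # {'s1': 1.0, 's2': 1.0}
```

Because everything is a plain in-process dictionary, scoring a query is a handful of hash lookups with no IPC.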

<h4 id="3-ner-enriched-synthetic-documents">3. NER-enriched synthetic documents</h4>

<p>For each session, we create a document containing its extracted facts and details. These get indexed alongside the raw session text, giving the embedding model a richer retrieval surface. This is the single biggest contributor to accuracy.</p>
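<p>In spirit, the synthetic document is just the extracted facts rendered back into indexable text; the field names below are illustrative, not the ones used in Retaining:</p>

```python
def build_synthetic_doc(session_id, facts):
    """Render a session's extracted facts as a small text document
    to be indexed alongside the raw transcript."""
    lines = [f"session: {session_id}"]
    lines += [f"entity: {e}" for e in facts.get("entities", [])]
    lines += [f"relation: {s} {v} {o}" for s, v, o in facts.get("relations", [])]
    return "\n".join(lines)

doc = build_synthetic_doc("s42", {
    "entities": ["power bank"],
    "relations": [("user", "bought", "power bank")],
})
print(doc)
```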

<h4 id="4-noun-phrase-embedding-bridge">4. Noun-phrase embedding bridge</h4>

<p>We embed each session’s extracted objects into a separate ChromaDB collection and query it with the question’s noun phrases. This bridges gaps that neither keywords nor full-document embeddings can cross. “Battery life phone” → “portable power bank” has a close enough embedding distance in the noun phrase space to pick up the right session.</p>

<h4 id="5-time-related-bridge">5. Time related bridge</h4>

<p>For time-related questions (“What did I buy 10 days ago?”), we first identify all sessions in the date window, then run the noun phrase bridge <em>within that filtered set</em>. This discriminates between 14 sessions that all share the same date by finding the one whose noun phrases are topically closest to the question.</p>
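<p>A pure-Python sketch of the two-stage idea, with noun-phrase overlap standing in for the embedding distance the real pipeline uses:</p>

```python
from datetime import date, timedelta

def temporal_bridge(question_date, days_ago, sessions, question_nps, top_k=1):
    """Stage 1: keep only sessions inside the date window.
    Stage 2: rank that subset by noun-phrase overlap (the real
    pipeline ranks by embedding distance in noun-phrase space)."""
    target = question_date - timedelta(days=days_ago)
    window = [s for s in sessions if s["date"] == target]
    def overlap(session):
        return len(set(session["noun_phrases"]) & set(question_nps))
    return sorted(window, key=overlap, reverse=True)[:top_k]

sessions = [
    {"id": "s1", "date": date(2026, 4, 26), "noun_phrases": ["power bank"]},
    {"id": "s2", "date": date(2026, 4, 26), "noun_phrases": ["gym class"]},
    {"id": "s3", "date": date(2026, 4, 30), "noun_phrases": ["power bank"]},
]
best = temporal_bridge(date(2026, 5, 6), 10, sessions, ["power bank"])
print(best[0]["id"])  # 's1': inside the window and topically closest
```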

<h3 id="what-didnt-work">What didn’t work</h3>

<p>When people complain about LLMs failing to answer questions or hallucinating false information, nobody complains about an LLM’s ability to identify the question it needs to answer (we can depend on AI to write code that does a thing even when we can’t depend on it to handle a task end-to-end). On the subject of answering questions and digging through “long context problems”, I first attempted to have the LLM use <a href="https://en.wikipedia.org/wiki/Prolog">Prolog</a> for storing and retrieving facts.</p>

<p>However, the semantic fuzziness (ie synonyms or finding similar topics to a query) ended up hurting the overall score more than helping. The approach in MemPalace of depending on a <a href="https://www.trychroma.com">vector store</a> actually proved to be “more correct” in this experiment.</p>

<p>Nevertheless, I do think there may be types of problems where realistic input queries (ignoring cases where people are funny and test jailbreaking support agents) would benefit from a more structured and queryable store of relations between objects. Prolog just may not be the low-hanging-fruit solution for long-term memory problems where semantic similarity is worth indexing.</p>

<h2 id="running-yourself">Running yourself</h2>

<p>First, clone the repo and install dependencies.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/hdresearch/retaining
<span class="nb">cd </span>retaining
python3 <span class="nt">-m</span> venv .venv <span class="o">&amp;&amp;</span> <span class="nb">source</span> .venv/bin/activate
pip <span class="nb">install </span>spacy chromadb
python <span class="nt">-m</span> spacy download en_core_web_sm
</code></pre></div></div>

<p>Next, download the dataset for the benchmark.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Download LongMemEval data (~265MB)</span>
curl <span class="nt">-fsSL</span> <span class="nt">-o</span> /tmp/longmemeval_s_cleaned.json <span class="se">\</span>
  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
</code></pre></div></div>

<p>Lastly, run the benchmarks.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Vector-only baseline: 96.6% R@5, ~5 min</span>
python bench_v2.py /tmp/longmemeval_s_cleaned.json <span class="nt">--mode</span> vector

<span class="c"># Full hybrid: 100% R@5, ~50 min</span>
python bench_v2.py /tmp/longmemeval_s_cleaned.json <span class="nt">--mode</span> hybrid
</code></pre></div></div>

<p>No API keys. No GPU. Python 3.9+ and ~300MB of disk.</p>

<h2 id="conclusion">Conclusion</h2>

<p>AI famously hit <a href="https://en.wikipedia.org/wiki/AI_winter">“winters”</a> in the past when some wall prevented computers from becoming sufficiently intelligent. Interestingly, the problem in the past was that “symbolic” approaches to AI would fall short when it came to <a href="https://data-mining.philippe-fournier-viger.com/the-semantic-web-and-why-it-failed/">the last mile of complexity</a>. Similarly, LLM-maximalist approaches also run into a “last mile problem” when it comes to ensuring accuracy of details (ie hallucination).</p>

<p>By incorporating older NLP techniques to tackle the “last mile problems” of modern approaches involving LLMs, there are rather interesting results to be found! That said, the implementation used here to game <code class="language-plaintext highlighter-rouge">recall@5</code> is by no means a complete solution for knowledge retrieval.</p>

<p>The beauty of the finding is that the problem of “if only someone had sat down long enough to write every NLP grammar rule” now becomes somewhat negligible in a world with coding agents. So, rather than continue to see human text as black boxes, know that a richer pipeline may capture enough complexity for some information to be adequately indexed.</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Overview]]></summary></entry><entry><title type="html">A coding agent with direction</title><link href="https://yev.bar/zagent" rel="alternate" type="text/html" title="A coding agent with direction" /><published>2026-04-29T08:00:00+00:00</published><updated>2026-04-29T08:00:00+00:00</updated><id>https://yev.bar/zagent</id><content type="html" xml:base="https://yev.bar/zagent"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#what-is-this">What is this?</a></li>
  <li><a href="#what-is-this-not">What is this not?</a></li>
  <li><a href="#the-background">The background</a>
    <ul>
      <li><a href="#coding-agent">Coding agent</a></li>
      <li><a href="#ralph-loops">Ralph loops</a></li>
      <li><a href="#rlms">RLMs</a></li>
    </ul>
  </li>
  <li><a href="#zagent-terms">zagent terms</a>
    <ul>
      <li><a href="#code-cannon">Code cannon</a></li>
      <li><a href="#code-pirate">Code pirate</a></li>
      <li><a href="#code-captain">Code captain</a></li>
    </ul>
  </li>
  <li><a href="#takeaways">Takeaways</a></li>
</ul>

<h2 id="what-is-this">What is this?</h2>

<p>This is an overview of the principles I used to assemble <a href="https://github.com/hdresearch/zagent">zagent</a>, a coding harness for getting more progress out of a single “shot”. I’ll be both describing the topics I worked on top of as well as the structure behind what I put together.</p>

<p>If you’re looking for a post to read that gives copy-and-pasteable commands, this isn’t for you. If you’re alright with reading something more explanatory, then continue on!</p>

<h2 id="what-is-this-not">What is this not?</h2>

<p>zagent is not going to replace your Claude Code or <code class="language-plaintext highlighter-rouge">pi</code> (which I predominantly use) but the ideas below should be high-level enough that you can implement them in your own harnesses or coding agent systems.</p>

<h2 id="the-background">The background</h2>

<p>I didn’t by any means invent a new model or algorithm; I simply combined some existing concepts which are accessible to anyone. Being transparent, this is my way of building up towards a general-purpose version of what Google accomplished with <a href="https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/">AlphaEvolve</a>.</p>

<p>To break down what the heck is going on with <code class="language-plaintext highlighter-rouge">zagent</code>, there are three “primitives” in the area of coding agents that would be useful to know.</p>

<h3 id="coding-agent">Coding agent</h3>

<p>From editors like <a href="https://cursor.com">Cursor</a> to headless systems like <a href="https://devin.ai">Devin</a>, there’s a large variety of offerings that all fall under the notion of “coding agents”. Simplifying it to the bare minimum, a coding agent is an AI that can take a prompt from someone and write code to accomplish some goal. However it may be accessed by a user (could be tagging in a Slack workspace, sending a message on Telegram, writing a prompt from a UI, etc), the step beyond general agents is that it can write and run code.</p>

<div class="mermaid">
graph LR;
    Prompt--&gt;Agent
    Agent["Coding agent\n(Can write/run code)"]--&gt;Output
</div>

<p>Sometimes underappreciated in domains other than literally writing software, the power of coding agents is in how much is built on top of code, making them immediately ‘effective’ in the world around us today. It could be a “short lived” agent that only runs to solve a specific problem before exiting or a “long running” agent with a growing memory. Depending on the particular use case you’re looking for, one may be better than another.</p>

<p>In the context of writing software that <em>delivers something</em>, I’ve personally found the philosophy of short lived agents to be better suited.</p>

<h3 id="ralph-loops">Ralph loops</h3>

<p><a href="https://awesomeclaude.ai/ralph-wiggum">Ralph Wiggum loops</a>, named literally after the <a href="https://en.wikipedia.org/wiki/Ralph_Wiggum">Simpsons character</a>, are a technique for working with coding agents that looks something like the below pseudo-code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>while not done:
    fire coding agent at task(s)
    repeat until done
</code></pre></div></div>

<p>For instance, the Claude code plugin would run a <code class="language-plaintext highlighter-rouge">while true</code> loop in bash until the LLM outputted a specific string indicating it had actually completed the task rather than merely saying things which sounded nice. In pseudo-code that’d look something like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">while</span> <span class="s">"DONE"</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">last_output</span><span class="p">:</span>
    <span class="n">fire</span> <span class="n">claude</span> <span class="n">code</span> <span class="ow">and</span> <span class="n">tell</span> <span class="n">it</span> <span class="n">to</span> <span class="n">say</span> <span class="s">"DONE"</span> <span class="n">when</span> <span class="n">finished</span>
    <span class="k">continue</span> <span class="n">until</span> <span class="n">done</span>
</code></pre></div></div>

<p>To avoid running out of context (and to stay productive while one sleeps), folks would run “ralph loops” since the <code class="language-plaintext highlighter-rouge">while true</code> serves as a way to reset the context over and over, letting it run ‘infinitely’. In the case of problems where the task is going through a large bullet point list of items (ie meticulously writing unit tests across a large codebase), it works well since the tokens that filled up the context with prior solved items aren’t relevant to solving the problems still ahead.</p>
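<p>To make the pseudo-code above concrete, here’s a runnable toy loop where the agent call is a stub that “finishes” one checklist item per fresh-context run:</p>

```python
# run_agent is a stub standing in for firing a real coding agent;
# each call represents a run with a fresh context that works through
# one item of the checklist.
tasks = ["write test A", "write test B", "write test C"]

def run_agent(remaining):
    if remaining:
        remaining.pop(0)
    return "DONE" if not remaining else "made progress"

last_output, runs = "", 0
while "DONE" not in last_output:
    last_output = run_agent(tasks)
    runs += 1

print(runs)  # 3: one fresh-context run per checklist item
```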

<div class="mermaid">
graph LR;
    Prompt["Send prompt before going to sleep"]--&gt;Ralph
    Ralph["Ralph loop\n(Resetting and repeating over and over till it's done)"]--&gt;Goal["Completed goal"]
    Ralph--&gt;Ralph
</div>

<p>However, in the case of problems where you do lose something by resetting the context (ie a complex integration which requires knowing about all the pieces involved to be useful), then ralph loops can fall short. While still a useful technique, it’s no longer meme’d as a solution for “solving programming” for this reason.</p>

<h3 id="rlms">RLMs</h3>

<p>An idea popularized by <a href="https://alexzhang13.github.io/blog/2025/rlm/">a blog post</a> and later published to <a href="https://arxiv.org/abs/2512.24601">arXiv</a>, RLMs broadly solve the problem of “running out of context”, but in an importantly different way. Rather than place the “infinite loop” above the LLM (as the ralph loop does), what if the loop were conceptually brought into the agent loop itself? In RLMs, this is done by letting the agent recursively call itself or other agents before coming back with a final answer.</p>

<div class="mermaid">
graph LR;
    Prompt--&gt;Agent
    Agent--&gt;Sub["Sub-agent"]
    Sub--&gt;Web["Web request"]
    Sub--&gt;Code["Run code"]
    Sub--&gt;Process["Process results"]
    Sub--&gt;Agent
    Agent--&gt;Result
</div>

<p>To explain how this works with an analogy: suppose you wake up to a text message asking you to research something that you have five minutes to respond to but you haven’t had the chance to even have coffee yet. Lacking the energy to Google around, you text someone else who you think either knows the answer already or wouldn’t mind finding it, they get back with the answer, you forward it to the first person, and then all’s done.</p>

<p>A profound benefit of this is being able to “stretch” your context window, since spawned sub-agents can spend their own context windows exploring something rather than burning through the context of the top-level agent you provided the original prompt to. Nowadays, in conjunction with things like <a href="https://github.com/mempalace/mempalace">memory</a>, there are tools for tackling some of the older problems with arbitrarily large context windows.</p>

<p>Where “infinite context” can fall short is broadly captured by how <a href="https://www.youtube.com/watch?v=G_7Ta_4coy4">“completely illuminating a house such that there are no shadows”</a> makes it uninhabitable. It’s no secret LLMs can be convincing, whether to themselves or to users falling into AI psychosis. As a result, letting an agent ruminate indefinitely on some goal or task (even a rational one like programming) can lead to adverse results that read as unproductive to the person just hoping to finish their app.</p>

<h2 id="zagent-terms">zagent terms</h2>

<p>Inspired by my experience with <a href="https://x.com/training_loop/status/2024600194428424668">herding</a> coding agents, there are three layers I’ve assembled into <code class="language-plaintext highlighter-rouge">zagent</code> that apply the above ideas. Before you ask, yes, the names are inspired by <a href="https://en.wikipedia.org/wiki/One_Piece">One Piece</a>.</p>

<h3 id="code-cannon">Code cannon</h3>

<p>In my prior projects using “code cannons” like <a href="https://vers.sh/blog/git-zig-bun-100x">rewriting git in zig</a> or <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">developing a modern toolkit between Elixir and WebAssembly</a>, what I was really doing was leveraging <a href="https://vers.sh">Vers</a> VMs as the RLM environments in which sub-agents were working on scoped problems. To differentiate from the ideal of a <a href="https://github.com/gastownhall/gastown">code factory</a>, this RLM pattern is what I’ve referred to as a “code cannon”.</p>

<div class="mermaid">
graph TD;
    Agent--&gt;Sub1["Sub-agent"]
    Agent--&gt;Sub2["Sub-agent"]
    Agent--&gt;Sub3["Sub-agent"]

    subgraph cannon[" "]
        Sub1
        Sub2
        Sub3
        Sub1--&gt;RF1["Read file"]
        Sub1--&gt;WF1["Write file"]
        Sub1--&gt;RP1["Run program"]
        Sub2--&gt;RF2["Read file"]
        Sub2--&gt;WF2["Write file"]
        Sub2--&gt;RP2["Run program"]
        Sub3--&gt;RF3["Read file"]
        Sub3--&gt;WF3["Write file"]
        Sub3--&gt;RP3["Run program"]
    end
</div>
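<p>The fan-out above can be sketched with plain shell job control. Here <code class="language-plaintext highlighter-rouge">spawn_agent</code> is a placeholder for “provision a VM and start a coding agent with this scoped prompt”, and the prompts are made up for illustration.</p>

```shell
# Placeholder for "provision a VM, start a coding agent with this prompt".
spawn_agent() { echo "sub-agent working on: $1"; }

# One scoped prompt per sub-problem, fired in parallel.
prompts=("checkout subcommand" "log subcommand" "diff subcommand")
for p in "${prompts[@]}"; do
  spawn_agent "$p" &
done
wait   # the cannon is "done" once every sub-agent has finished
echo "cannon complete"
```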

<p>In the case of rewriting the <code class="language-plaintext highlighter-rouge">git</code> CLI, there are several subcommands which can be worked on in parallel (and on different files which can prevent conflicts when merging changes). You can think of this like how, at a hackathon, you may have one person working on the backend, one person working on the frontend, and one person working on the slideshow presentation; each of them can work on their piece of the overall project without stepping on each others’ toes.</p>

<h3 id="code-pirate">Code pirate</h3>

<p>Taking a step back and contemplating what I was really doing when “firing code cannons”: I would check the progress and status of changes, break down the next wave of changes I wanted to see, provision new agents with their respective prompts, and let everything run for a while before coming back to my laptop and repeating.</p>

<p>Enter the “code pirate”: a ralph loop that works from a markdown file, firing code cannons until it has made more substantial progress.</p>

<div class="mermaid">
graph LR;
    Pirate["Code pirate"]--&gt;Pirate
    Pirate--&gt;SA1
    Pirate--&gt;SA2
    Pirate--&gt;SA3
    Pirate--&gt;SA4

    subgraph pair1["Code cannon"]
        SA1["Sub-agent"]
        SA2["Sub-agent"]
    end

    subgraph pair2["Code cannon"]
        SA3["Sub-agent"]
        SA4["Sub-agent"]
    end
</div>
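<p>Put together, the pirate is just a ralph loop whose body fires a cannon. A runnable toy, with the cannon stubbed out (real sub-agents would be VMs running a coding agent, and a real loop wouldn’t blank the plan itself):</p>

```shell
# Stub cannon: fire one "sub-agent" per remaining plan item, in parallel.
fire_cannon() {
  while IFS= read -r item; do
    echo "sub-agent tackling: $item" &
  done < plan.md
  wait
}

printf 'wire up auth\nwrite docs\n' > plan.md
loops=0
while [ -s plan.md ]; do        # the pirate loops until the plan is exhausted
  fire_cannon
  : > plan.md                   # stub: pretend the cannon finished everything
  loops=$((loops + 1))
done
echo "plan complete after $loops loop(s)"
```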

<p>By bridging the context-resetting of the ralph loop (the pirate) and the context-mindfulness of RLMs (the cannons), this establishes a coding harness able to land larger diffs, like building out <a href="https://github.com/hdresearch/sterling">sterling</a> (if it’s still private, it’s coming soon!).</p>

<p>When I come back to my computer to review a pirate run’s result, it’s less about untangling knots in feature intentions and more about steering the army of coding agents overall. Making sterling with the code pirate was less about firing it over and over at a goal and more about setting goals, letting it see them through, and then setting new goals to be implemented (like <a href="https://vers.sh/blog/git-zig-bun-100x#why-we-think-this-works">making a peanut butter jelly sandwich</a>).</p>

<h3 id="code-captain">Code captain</h3>

<p>Everything up to this point, I can truthfully say, has yielded real results that would have taken more time or effort with a different tool. This next “layer” is something I’ve been tinkering with and haven’t yet felt like I’ve “cracked”. However, I’m sharing it here in case the concepts are of use to someone else facing similar problems.</p>

<p>When tackling projects where “working” is non-negotiable (ie meeting a test coverage quota, an ambiguity that would lead some agents to give up early), totally depending on the LLM to come back with a result can be anticlimactic.</p>

<p>To remedy this while tinkering with <a href="https://lean-lang.org/">Lean</a>, I’ve started working on a “code captain” which behaves like a code pirate but, rather than let the agent exit when it’s gone astray, I added a gate which prevents the pirate from exiting until <em>all</em> conditions are met.</p>

<div class="mermaid">
graph LR;
    subgraph pirate["Repeat until complete"]
        Pirate["Code captain"]--&gt;Pirate
    end
    pirate--&gt;SA1
    pirate--&gt;SA2
    pirate--&gt;SA3
    pirate--&gt;SA4

    subgraph pair1["Code cannon"]
        SA1["Sub-agent"]
        SA2["Sub-agent"]
    end

    subgraph pair2["Code cannon"]
        SA3["Sub-agent"]
        SA4["Sub-agent"]
    end
</div>
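<p>The gate is the only new piece relative to the pirate. A runnable toy where the exit condition is a conjunction of checks, stubbed so it passes after a couple of iterations (a real gate would run your actual test suite, coverage tool, proof checker, and so on):</p>

```shell
iterations=0
tests_pass()  { [ "$iterations" -ge 2 ]; }   # stub: your real test suite here
build_ok()    { true; }                      # stub: your real build here
run_pirate()  { iterations=$((iterations + 1)); echo "pirate run $iterations"; }

# The captain may not exit until *all* conditions hold.
until tests_pass && build_ok; do
  run_pirate
done
echo "gate passed after $iterations runs"
```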

<p>If the gate’s not well defined, then the agent can find a way to exit early. If the gate’s redefinable (ie learning about new objectives or constraints over time) or even appendable, then the agent may still find a way to exit early. So, ultimately, software engineering’s a game of scoping objectives well.</p>

<h2 id="takeaways">Takeaways</h2>

<p>Training employees versus hiring interns is the difference between vertical and horizontal scaling for people. Likewise, leveling up a single person versus spinning up agents to fill in certain tasks is vertical versus horizontal scaling for responsibilities. The underlying problem with coding harnesses is boiling down the responsibilities of a software engineer into horizontally scalable skills.</p>

<p>It’s already the case in some hedge funds that folks develop models for executing strategies but aren’t picking up the phone and placing orders themselves. While some firms still rely on old-fashioned methods, the analog in software is that there will eventually be categories of products whose code isn’t governed by people but by the systems those people established.</p>

<p>Until the day coding’s finally solved, we shall still have problems to solve. Hack the planet!</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry><entry><title type="html">Ziggit</title><link href="https://yev.bar/ziggit" rel="alternate" type="text/html" title="Ziggit" /><published>2026-04-02T08:00:00+00:00</published><updated>2026-04-02T08:00:00+00:00</updated><id>https://yev.bar/ziggit</id><content type="html" xml:base="https://yev.bar/ziggit"><![CDATA[<h2 id="digest">Digest</h2>

<p>We rewrote git in zig and:</p>

<ul>
  <li>Sped up bun by <a href="#bun-improvements">100x</a></li>
  <li>Got <a href="#git-drop-in">4x</a> faster than <code class="language-plaintext highlighter-rouge">git</code> on an arm Macbook</li>
  <li>Compiled to WASM to be <a href="#webassembly">5x smaller with 8.5x more exports</a>
    <ul>
      <li>Check out <a href="https://vers.sh/ziggit-demo">this demo to clone a repo</a> in your browser!</li>
    </ul>
  </li>
</ul>

<p>Rather than start with the theory behind the “swarming”, we’ll share how to code cannon yourself, describe how our zig rewrite of git went, and then dive into some of our theory behind why this works.</p>

<h2 id="how-to-code-cannon-yourself">How to code cannon yourself</h2>

<h3 id="install-vers-cli">Install vers CLI</h3>

<p>First you’ll need the <a href="https://github.com/hdresearch/vers-cli">vers CLI</a> installed.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl <span class="nt">-fsSL</span> https://raw.githubusercontent.com/hdresearch/vers-cli/main/install.sh | sh
</code></pre></div></div>

<p>After you’ve installed it, log in.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vers login
</code></pre></div></div>

<p>Now you have a working <code class="language-plaintext highlighter-rouge">vers</code> CLI ready to prepare your swarm infrastructure.</p>

<h3 id="configure-environment-variables">Configure environment variables</h3>

<p>With the <code class="language-plaintext highlighter-rouge">vers</code> CLI you can define environment variables which get injected into all the VMs you create, making authentication for some CLIs a breeze. Here we’ll walk through the environment variables we included for this project.</p>

<p>First, create a <a href="https://github.com/new">new GitHub repository</a> and then follow <a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-fine-grained-personal-access-token">the instructions for creating a fine-grained personal access token</a>. You’ll want to create one that has <strong>Read and write</strong> access to content for the repository you’re going to work on.</p>

<p><img src="https://vers.sh/hdr_legacy/images/github-token.png" alt="github-token" /></p>

<p>We configured it to have access to <em>only</em> the repos we’re interested in firing the code cannon at for this project. Our rationale: we don’t want one or more agents to get creative and start integrating other projects that aren’t relevant. Once you have that API key, set it like so:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vers <span class="nb">env set </span>GITHUB_API_KEY github_pat_...
</code></pre></div></div>

<p>Next, from the <a href="https://vers.sh/orgs/yev/dashboard">vers dashboard</a>, click on the <strong>API Keys</strong> tab and create a new API key. After you’ve written it down someplace you won’t lose it, add it to your environment variables (so an agent running in a VM can spawn further agents on its own).</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vers <span class="nb">env set </span>VERS_API_KEY abc123...
</code></pre></div></div>

<p>Finally, since we’ve been driving this using <a href="https://claude.ai">Claude</a>, let’s set an <code class="language-plaintext highlighter-rouge">ANTHROPIC_API_KEY</code> so any coding agent running in a VM works out of the box.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vers <span class="nb">env set </span>ANTHROPIC_API_KEY sk-ant-...
</code></pre></div></div>

<h3 id="write-your-initial-plan">Write your initial plan</h3>

<p>We’ve shared the <a href="#the-initial-plan"><code class="language-plaintext highlighter-rouge">plan.md</code> file</a> we used for the zig rewrite of git; you’re welcome to copy it and tweak it for your project. Once you have it written, simply point your coding agent at it.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pi <span class="s2">"Read plan.md and let me know when I can quit this session"</span>
</code></pre></div></div>

<p>The prompt specifies that contributing agents should be spun up in VMs so you can close your laptop and know things are still progressing.</p>

<h3 id="let-it-start-running">Let it start running</h3>

<p>Eventually <code class="language-plaintext highlighter-rouge">pi</code> or your coding agent will tell you agents are working and you’re good to quit the session. Congrats, you’ve successfully created a code cannon to work on some problem.</p>

<h3 id="check-where-its-at">Check where it’s at</h3>

<p>Regardless of the size of the project, since there may be small features or nits you’d like to include anyway, it’s good to check in after the agents begin working to verify what they’re working on. If you find yourself glossing over the agent descriptions and crossing your fingers rather than walking away confident in the progress, you’ve likely leaned on the agents too much for your goal.</p>

<h3 id="repeat-running-and-checking-in">Repeat running and checking in</h3>

<p>We found it useful to check in on the swarm similarly to checking in with a team at standup, though on an admittedly more frequent basis. Rather than provide a prompt to scale the swarm up or down after certain checkpoints, staying hands-on with steering also gave us a clearer understanding of the scope of this project.</p>

<h2 id="how-we-rewrote-git-in-zig">How we rewrote git in zig</h2>

<p>Anthropic <a href="https://github.com/anthropics/claudes-c-compiler/issues/1">took a stab</a> at writing a C compiler and Cursor <a href="https://github.com/wilsonzlin/fastrender/issues/98">took a stab</a> at writing a web browser. It’s not that hard for you to do the same, and here’s how we went about rewriting a big open source project with the help of agents!</p>

<h3 id="environment">Environment</h3>

<p>The <a href="https://vers.sh/">Vers VMs</a> spawned had the environment variables injected at startup (so running <code class="language-plaintext highlighter-rouge">pi</code> with instructions will always work).</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">ANTHROPIC_API_KEY</code>: For the LLM powering the coding agent</li>
  <li><code class="language-plaintext highlighter-rouge">VERS_API_KEY</code>: For further orchestration</li>
  <li><code class="language-plaintext highlighter-rouge">GITHUB_API_KEY</code>: A strictly scoped API key for just one repository</li>
</ul>

<h3 id="the-initial-plan">The initial plan</h3>

<p>Below is literally the one markdown file the original agent used to spin up a swarm.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The goal is to make a modern version control software like git or jj but written in zig

ALL SYSTEMS AND AGENTS MUST use this github -&gt; https://github.com/hdresearch/ziggit.git

For each of the below goals, create a VM and run code like the following

<span class="p">```</span><span class="nl">bash
</span><span class="k">while </span><span class="nb">true </span><span class="k">do
  </span>pi <span class="nt">-run</span> <span class="s2">"GOAL"</span>
end
<span class="p">```</span>

NOTE - pi is running on the VM itself rather than running on the host machine and then ssh'ing commands. This should be done so we can quit this pi session

So agents are just infinitely running since there is always something to improve in a piece of software. Include pi-vers extension so each infinite loop can provision further VMs or agents.
<span class="p">
-</span> first person like jj but does not have a <span class="sb">`jj git`</span> subcommand and instead is drop in replaceable with <span class="sb">`git`</span> so <span class="sb">`ziggit checkout`</span> not <span class="sb">`ziggit git checkout`</span>
<span class="p">-</span> feature compatibility with git (copy over test suite from git source)
<span class="p">-</span> can compile to webassembly
<span class="p">-</span> can yield performance improvements to oven-sh/bun codebase by using directly with zig integration instead of libgit2 or git cli

Maybe wait for some progress before starting on replacing bun's usage of the git cli (which they use over libgit2 for performance reasons, our suspicion is that a modern solution in zig could be better). Every VM should have the env vars <span class="sb">`VERS_API_KEY`</span>, <span class="sb">`ANTHROPIC_API_KEY`</span>, <span class="sb">`GITHUB_API_KEY`</span>. Also use the hdresearch/bun fork with changes so a real PR can be created pointing at oven-sh/bun BUT DO NOT MAKE THIS PR YOURSELF. Provide instructions for a person to validate the benchmark results with ziggit usage first
</code></pre></div></div>

<p>We copied over the <code class="language-plaintext highlighter-rouge">plan.md</code> used for <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">firebird</a>; the <code class="language-plaintext highlighter-rouge">-run</code> argument is not a real argument (the correct one is <code class="language-plaintext highlighter-rouge">-p</code>), but the top-level agent figures it out anyway.</p>

<h3 id="the-produced-agent-loop">The produced agent loop</h3>

<p>From the markdown plan, our local <code class="language-plaintext highlighter-rouge">pi</code> agent created a golden image for the VMs working on the <code class="language-plaintext highlighter-rouge">ziggit</code> codebase to use and configured each agent to have different git commit authors so progress would be identifiable.</p>

<p>Every agent additionally got a <code class="language-plaintext highlighter-rouge">/root/prompt.txt</code> file with that agent’s specific prompt. The agent tasked with covering git’s test suite would have that file populated with contents like <code class="language-plaintext highlighter-rouge">"You are the CORE agent. Run git's test suite and fix CLI bugs."</code> and the agent tasked with improving certain git index functionality would have that file with contents like <code class="language-plaintext highlighter-rouge">"You are the NET-SMART agent. Rewrite idx_writer.zig to be 10x faster."</code>.</p>

<p>Finally, every VM runs the exact same bash loop wrapping the coding agent itself as well as the git cleanups referenced earlier. The script below, which defines a given agent, was generated by the top-level pi agent orchestrating these coding processes in VMs.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">set</span> <span class="nt">-a</span><span class="p">;</span> <span class="nb">source</span> /etc/environment 2&gt;/dev/null<span class="p">;</span> <span class="nb">set</span> +a
<span class="nb">export </span><span class="nv">HOME</span><span class="o">=</span>/root
<span class="nb">export </span><span class="nv">NODE_OPTIONS</span><span class="o">=</span><span class="s2">"--max-old-space-size=256"</span>

<span class="nb">cd</span> /root/myproject <span class="o">||</span> <span class="nb">exit </span>1

<span class="k">while </span><span class="nb">true</span><span class="p">;</span> <span class="k">do
    </span><span class="nb">echo</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">date</span><span class="si">)</span><span class="s2">: === Starting agent run ==="</span>

    <span class="c"># 1. SYNC — save dirty work, pull latest from other agents</span>
    git add <span class="nt">-A</span>
    git diff <span class="nt">--cached</span> <span class="nt">--quiet</span> <span class="o">||</span> git commit <span class="nt">-m</span> <span class="s2">"auto-save before sync"</span>
    git fetch origin master
    git rebase origin/master <span class="o">||</span> <span class="o">{</span>
        git rebase <span class="nt">--abort</span>
        git reset <span class="nt">--hard</span> origin/master  <span class="c"># nuclear option on conflicts</span>
    <span class="o">}</span>

    <span class="c"># 2. BUILD — rebuild the project</span>
    zig build  <span class="c"># or whatever your build command is</span>

    <span class="c"># 3. RUN PI — the actual agent work</span>
    pi <span class="nt">--no-session</span> <span class="nt">-p</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">cat</span> /root/prompt.txt<span class="si">)</span><span class="s2">"</span>

    <span class="c"># 4. PUSH — commit and push whatever pi did</span>
    git add <span class="nt">-A</span>
    git diff <span class="nt">--cached</span> <span class="nt">--quiet</span> <span class="o">||</span> git commit <span class="nt">-m</span> <span class="s2">"auto-save after pi run"</span>
    <span class="k">for </span>attempt <span class="k">in </span>1 2 3<span class="p">;</span> <span class="k">do
        </span>git pull <span class="nt">--rebase</span> origin master <span class="o">||</span> <span class="o">{</span>
            git rebase <span class="nt">--abort</span>
            git reset <span class="nt">--hard</span> origin/master
        <span class="o">}</span>
        git push origin master <span class="o">&amp;&amp;</span> <span class="nb">break
        sleep </span>5
    <span class="k">done

    </span><span class="nb">sleep </span>10
<span class="k">done</span>
</code></pre></div></div>

<p>Each iteration saves work from the prior run, pulls in the latest changes, rebuilds the project, runs the pi agent, and then repeats the same git operations at the end, this time also pushing. The agent prompts themselves also say to use git operations for auditability, but these git failsafes around the agent help ensure the loop doesn’t get stuck along the way.</p>

<h3 id="meta-note">Meta note</h3>

<p>To reiterate a point at the end of <a href="#agent-spawned-agents-is-like-being-a-manager-of-managers">another section</a>, the sub-agents aren’t doing anything different from what you’d get by manually starting new agents with their respective prompts yourself. These shouldn’t be doing anything you can’t directly understand: whenever we started with an initial research goal (ie understanding a point of integration before beginning new <a href="#the-produced-agent-loop">loops</a>) and let the agent in front of us handle the rest, we ended up with a mess to clean up.</p>

<p>Similar to how LLMs can be poor at writing configuration files, we’d guess complex integrations fall into a similar category of “problems LLMs do a lot better with a human around”. Should you be working on one of these tasks, make sure every detail relevant to the prompt or plan you intend an agent to carry out is in the context you hit <code class="language-plaintext highlighter-rouge">Enter</code> on.</p>

<h3 id="what-it-cost">What it cost</h3>

<p>At the end of this crunch, which spanned nearly a week and ~13 billion tokens, we successfully created a rewrite of git in zig. If you were to do that as a human, say writing a new <code class="language-plaintext highlighter-rouge">git</code> of your own while working towards 100% test coverage, you’d be in for a world of pain.</p>

<p><img src="https://vers.sh/hdr_legacy/images/git-tokens.png" style="width: 100%" /></p>

<p>The git CLI test suite consists of 21,329 individual assertions for various git subcommands (that way we can be certain <code class="language-plaintext highlighter-rouge">ziggit</code> does suffice as a drop-in replacement for <code class="language-plaintext highlighter-rouge">git</code>). If it took a person four minutes to write enough functionality to pass each test (overlooking that some tests are more complex than others), that’d amount to 85,316 minutes total, or about two months! And that’s without sleep or meals factored in.</p>
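<p>The back-of-envelope math, for the skeptical:</p>

```shell
minutes=$((21329 * 4))        # 21,329 assertions at ~4 minutes each
days=$((minutes / 60 / 24))   # no sleeping or eating, as noted
echo "$minutes minutes, about $days days, i.e. roughly two months"
```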

<p>While we only got through <a href="#the-final-results">part of the overall test suite</a>, that’s still the equivalent of a month’s worth of straight developer work (again, without sleep or eating factored in).</p>

<h3 id="the-final-results">The final results</h3>

<h4 id="bun-improvements">bun improvements</h4>

<table>
  <thead>
    <tr>
      <th>Operation</th>
      <th>macOS arm64 (M4)</th>
      <th>x86_64 Linux VM</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">findCommit</code></td>
      <td><strong>85.4x</strong> win</td>
      <td><strong>6.3x</strong> win</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">cloneBare</code></td>
      <td><strong>7.3x</strong> win</td>
      <td><strong>34.3x</strong> win</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">cloneBare</code> + <code class="language-plaintext highlighter-rouge">findCommit</code> + <code class="language-plaintext highlighter-rouge">checkout</code></td>
      <td><strong>~10x</strong> win</td>
      <td><strong>~30x</strong> win</td>
    </tr>
  </tbody>
</table>

<p>The <code class="language-plaintext highlighter-rouge">bun</code> team has already <a href="https://github.com/oven-sh/bun/blob/3ed4186bc8db8357c670307f192991bfc263f141/docs/runtime/templating/create.mdx?plain=1#L267">tested using git’s C library</a> and found it to be consistently slower hence resorting to literally executing the <code class="language-plaintext highlighter-rouge">git</code> CLI when performing <code class="language-plaintext highlighter-rouge">bun install</code>. With <code class="language-plaintext highlighter-rouge">ziggit</code>, it becomes possible to see upward of <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#macos-arm64-releasefast-20-iterations"><strong>100x speedups</strong></a> for some git operations.</p>

<p>Tested on an M4 MacBook with 24gb of RAM across multiple runs, it scored an average of <strong>85.4x</strong> speedup for <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#findcommit-rev-parse-head"><code class="language-plaintext highlighter-rouge">findCommit</code></a>, <strong>7.3x</strong> speedup for <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#clonebare-local-bare-clone"><code class="language-plaintext highlighter-rouge">cloneBare</code></a>, and a <strong>~10x</strong> speedup for the <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#full-workflow-clonebare--findcommit--checkout">entire workflow</a> comprising those git operations. In an x86_64 Linux VM with 8gb of RAM, it scored an average of <strong>6.3x</strong> speedup for <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#findcommit"><code class="language-plaintext highlighter-rouge">findCommit</code></a>, <strong>34.3x</strong> speedup for <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#clonebare"><code class="language-plaintext highlighter-rouge">cloneBare</code></a>, and a <strong>~30x</strong> speedup for the <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#full-workflow">full workflow</a>.</p>

<p>When evaluating the complete <code class="language-plaintext highlighter-rouge">bun install</code> improvements, it came out speed-wise to about the same as the existing <code class="language-plaintext highlighter-rouge">git</code> usage (networking is the big time bottleneck, though many cases were still slightly faster with <code class="language-plaintext highlighter-rouge">ziggit</code> across multiple benchmarks). <em>Except</em> it’s done in 100% zig, <em>and</em> those internal improvements pile up as projects <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#why-e2e-shows-modest-speedups-despite-10-85-library-speedups">accumulate more git dependencies</a>. All in all, it seems like a sensible upstream contribution.</p>

<h4 id="git-drop-in">git drop-in</h4>

<table>
  <thead>
    <tr>
      <th>Benchmark</th>
      <th>ziggit vs git</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>arm64 Mac (small repos)</td>
      <td><strong>&gt;4x</strong> win</td>
    </tr>
    <tr>
      <td>arm64 Mac (large repos)</td>
      <td><strong>&gt;4x</strong> win</td>
    </tr>
    <tr>
      <td>Best commands</td>
      <td>up to <strong>10x</strong> win</td>
    </tr>
  </tbody>
</table>

<p>In addition to covering enough functionality to replace bun’s usage of the <code class="language-plaintext highlighter-rouge">git</code> CLI, <code class="language-plaintext highlighter-rouge">ziggit</code> covers enough subcommands and arguments to be a viable drop-in replacement for git with numerous performance improvements. While there are codepaths where the two are at performance parity (<strong>1x</strong>), it’s remarkable that a fresh rewrite in a modern programming language was able to reach that level <em>and</em> even hit a <strong>10x</strong> speedup on <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#macos-arm64--large-repo-ziggit-itself-2367-commits-150-files">some commands</a>!</p>

<p>While <code class="language-plaintext highlighter-rouge">git</code> itself has had much more development and optimization for x86_64 Linux, <code class="language-plaintext highlighter-rouge">ziggit</code>’s performance really outshines <code class="language-plaintext highlighter-rouge">git</code> on an arm64 MacBook. On ours, it’s across the board more than <strong>4x</strong> faster than <code class="language-plaintext highlighter-rouge">git</code> in both <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#macos-arm64--small-repo-51-commits-100-files">smaller repositories</a> as well as <a href="https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefef29426cf333782fc05d7cf/BENCHMARKS.md#macos-arm64--large-repo-ziggit-itself-2367-commits-150-files">larger ones</a>.</p>

<p>Of course, <code class="language-plaintext highlighter-rouge">ziggit</code> comes with <strong>git-lfs</strong> support as well, plus a useful <a href="#succinct-mode">succinct mode</a> meant for agents working in new or existing git projects to save significantly on tokens!</p>

<h4 id="webassembly">WebAssembly</h4>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>ziggit</th>
      <th>wasm-git</th>
      <th>Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Binary size</td>
      <td>148kb (55kb compressed)</td>
      <td>806kb</td>
      <td><strong>5.4x</strong> win</td>
    </tr>
    <tr>
      <td>Named exports</td>
      <td>68</td>
      <td>8</td>
      <td><strong>8.5x</strong> win</td>
    </tr>
  </tbody>
</table>

<p>Currently, there’s a <a href="https://github.com/petersalomonsen/wasm-git">wasm-git</a> project which compiles <a href="https://github.com/petersalomonsen/wasm-git?tab=readme-ov-file#compatibility">git’s C library</a> directly to WASM and comes out to 806kb. <code class="language-plaintext highlighter-rouge">ziggit</code>, when compiled to WASM, produces a binary that’s only 148kb. That’s <strong>5.4x</strong> smaller on its own, and it shrinks to just 55kb when compressed, making it more portable and accessible.</p>

<p>Additionally, <code class="language-plaintext highlighter-rouge">ziggit</code>’s WebAssembly binary provides 68 distinct named exports (<code class="language-plaintext highlighter-rouge">ziggit_init</code>, <code class="language-plaintext highlighter-rouge">ziggit_clone_bare</code>, <code class="language-plaintext highlighter-rouge">ziggit_diff</code>, <code class="language-plaintext highlighter-rouge">ziggit_log</code>, etc) in contrast to <code class="language-plaintext highlighter-rouge">wasm-git</code>’s 8 obfuscated exports (X, Y, Z, _, $, aa, ba, ca), which are Emscripten-compiled C bindings. Nonetheless, talk’s cheap, so you can go ahead and clone an open source repository <a href="https://vers.sh/ziggit-demo">in our web demo</a>.</p>

<h4 id="succinct-mode">Succinct mode</h4>

<p>Inspired by <a href="https://github.com/rtk-ai/rtk">rtk</a>, a CLI proxy which reduces LLM token consumption by <strong>60-90%</strong>, <code class="language-plaintext highlighter-rouge">ziggit</code> also includes a “succinct mode” that’s enabled by default and dramatically slims down outputs. For example, the below:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git commit <span class="nt">-m</span> <span class="s2">"chore: add another file"</span>
<span class="o">[</span>master b6eeb42] chore: add another file
1 file changed, 1 insertion<span class="o">(</span>+<span class="o">)</span>
</code></pre></div></div>

<p>Becomes the below:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ziggit commit <span class="nt">-m</span> <span class="s2">"chore: add another file"</span>
ok master 640fe38 <span class="s2">"chore: add another file"</span>
</code></pre></div></div>

<p>Or compare the below difference between <code class="language-plaintext highlighter-rouge">git status</code> and <code class="language-plaintext highlighter-rouge">ziggit status</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--- normal ---                              --- succinct ---
On branch master                            * master
                                             + Staged: 1 files
Changes to be committed:                      staged.txt
  (use "git restore --staged ..." ...)       ~ Modified: 1 files
        new file:   staged.txt                 README.md
Changes not staged for commit:
  (use "git add ..." ...)
  (use "git restore ..." ...)
        modified:   README.md
</code></pre></div></div>

<p>Succinct mode is turned on by default and can be toggled off by passing <code class="language-plaintext highlighter-rouge">--no-succinct</code> like so.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ziggit <span class="nt">--no-succinct</span> status
</code></pre></div></div>

<p>Or by setting the <code class="language-plaintext highlighter-rouge">GIT_SUCCINCT</code> environment variable to <code class="language-plaintext highlighter-rouge">0</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">GIT_SUCCINCT</span><span class="o">=</span>0 ziggit status
</code></pre></div></div>
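<p>Since the succinct commit output appears to be a single line of the form <code class="language-plaintext highlighter-rouge">ok &lt;branch&gt; &lt;hash&gt; "&lt;message&gt;"</code> (inferred from the example above, not a documented contract), a wrapper script can pick it apart with a plain <code class="language-plaintext highlighter-rouge">read</code>:</p>

```bash
# Parse a succinct commit line of the form: ok <branch> <hash> "<message>"
# (format inferred from the example output above; not a documented contract).
line='ok master 640fe38 "chore: add another file"'
read -r status branch hash message <<< "$line"
echo "status=$status branch=$branch hash=$hash message=$message"
```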

<h2 id="theory">Theory</h2>

<p>Now, why does any of this work? Here’s our guess, having done a similar thing before when making a modern toolkit to <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">bridge Elixir and WebAssembly</a>.</p>

<h3 id="agent-spawned-agents-is-like-being-a-manager-of-managers">Having agents spawn agents is like being a manager of managers</h3>

<p>Having direct reports who work with you is vastly different from working with reports who themselves have reports.</p>

<p><img src="https://vers.sh/hdr_legacy/images/manager_vs_manager_of_managers_clean.svg" alt="Two org charts of direct reports vs manager of managers" /></p>

<p>Normally, when you’re working in an organization of people, you need to be mindful of the balance and delegation of tasks; this depends on everyone’s experience as well as their APM (actions per minute). When you work with coding agents, you could sit and create a coding agent for every individual task <em>or</em> you could have an agent (which itself has a high APM) be the one doing the orchestration:</p>

<p><img src="https://vers.sh/hdr_legacy/images/agentic_coding_orchestration.svg" alt="Org chart of human prompting an agent to spawn coding agents" /></p>

<p>But, really, this wasn’t a “hands off the wheel” project where we hit <code class="language-plaintext highlighter-rouge">Enter</code> once and left the laptop (although we did get sleep in the process). Instead, it was more like doing exactly what we would have done with a row of laptops on a table, typing on each one ourselves, except there’s an agent to do the menial part of setting up subsequent coding agents:</p>

<p><img src="https://vers.sh/hdr_legacy/images/augmented_human_to_coding_agents_v2.svg" alt="Chart of human being augmented to aid with orchestrating agents" /></p>

<p>For the early part of the work, we prompted the top-level agent to create certain agents for the initial scaffold (in this case: core git functionality, as well as identifying where to place the Zig code in Bun’s codebase). Once there was enough groundwork laid out, we directed the top-level agent to spawn different agents we knew could work in parallel (i.e. one focusing on WebAssembly capability, another on the exact git functionality to rewrite in 100% Zig for Bun).</p>

<p>For scenarios where we figured one agent was not going to fulfill some capability in a reasonable amount of time (mind you, this stuff eats up billions of tokens, so it’s not like the bar is absurdly unreasonable in the first place), we’d have multiple agents working in the same part of the codebase; in the logic wrapping each agent (both in the prompt and in literal shell scripts), we used git to rebase, stash, or push changes along the way. This both ensures agents don’t tunnel-vision into work that’s never pushed and keeps agents failure-tolerant when one gets a task that was already handled by another agent.</p>

<h3 id="why-we-think-this-works">Why we think this works</h3>

<p>We’ve successfully applied this approach before when <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">bridging Elixir and WebAssembly</a> and have a guess as to why this works. To explain, let’s talk about making a peanut butter and jelly sandwich.</p>

<p>For context, one of our favorite examples for introducing computer science is the exercise of <a href="https://youtu.be/okkIyWhN0iQ">writing instructions for how to prepare a peanut butter and jelly sandwich</a>. It’s a staple I remember from <a href="https://www.edx.org/cs50">Harvard’s CS50</a> and one I’ve enjoyed running a number of times when I was teaching others how to code pre-LLMs.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/okkIyWhN0iQ?si=lqCXa4ZPh232v1bC" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>The way it goes is: you have all the ingredients and tools you’d use to prepare a PB&amp;J (bread, peanut butter, jelly, plates, and so on) as well as something to write on and something to write with (a blackboard, whiteboard, paper, or text editor). You instruct the group to provide the instructions for preparing a PB&amp;J (so they can be written down) while, along the way, you follow those instructions extremely literally, such that a sandwich never gets made (unless you’re nice about it). The goal isn’t to demoralize your students into thinking they can’t define steps; it’s to emphasize how “dumb” computers can be and how explicit code needs to be for a program to do what you expect.</p>

<p>If you prompt an LLM to make a PB&amp;J, assuming it has access to whatever’s needed in the real world (robot arms plus all the cool hijinks), you’ll likely end up with something, much like prompting a coding agent to make some program will likely end up with <em>something</em>. If you want to ensure that every sandwich made uses apricot jam, that’s something to specify in the instructions. If you want to ensure some web app generation always uses a certain component library, that’s something to specify in the instructions as well. LLMs are great because they can <em>do things</em>, but whichever details you care about must be specified, similar to how a human doing the PB&amp;J exercise would need the orientation of the knife and so on to be specified.</p>

<p>The peanut butter and jelly sandwich example works for standard coding because computers need programs to be precise. The example also works for LLMs coding because agents need prompts to be precise. To tie together how one could see that coding agents have the potential to solve a hefty number of engineering problems, let’s consider two things that we know LLMs are able to do today:</p>

<ol>
  <li>Build out an initial MVP or prototype
    <ul>
      <li>While this was an early critique of coding applications of LLMs (since they can’t do “real engineering work”), it’s worth admitting this does knock off legitimate work that’d otherwise take a person time to do.</li>
    </ul>
  </li>
  <li>Targeted optimizations that are verified by the LLM
    <ul>
      <li>Google showed this already with <a href="https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/">AlphaEvolve</a> and, more broadly, the <a href="https://greylock.com/greymatter/the-deepseek-moment/">Deepseek moment</a> makes the point further. Rather than throwing hands up in defeat and running LLMs over and over like <a href="https://en.wikipedia.org/wiki/Infinite_monkey_theorem">monkeys on typewriters</a>, giving LLMs access to the metrics a human would be trying to steer towards in the first place lets them self-guide till they get the job done.</li>
    </ul>
  </li>
</ol>

<p>By being able to both legitimately start a project and improve it in the desired directions (putting aside the verbosity needed in the prompt or the time needed to process), LLMs and coding agents are capable of tackling a “real” number of engineering problems. It’s not about replacing humans or finding things humans can’t do at all; it’s about overall coordination in the vein of enriched productivity.</p>

<p>At this point, we have all the fundamental pieces for why this approach is productive: meaningfully organizing and directing coding agents, with a “top-level” agent doing the administrative work for you. Being able to work with the top-level agent to improve sub-agent prompts or loops also means a deployed agent isn’t the be-all and end-all but something iterative.</p>

<p>What was funny about steering this system of agents is that it was reminiscent of watching the demands on engineering teams evolve over time at the startups we’ve been at: when the group needs to focus on a <a href="https://blog.pragmaticengineer.com/uber-app-rewrite-yolo/">refactor</a> or tasks can be <a href="https://www.atlassian.com/agile/agile-at-scale/spotify">divided in parallel</a>, agents can be redirected, spawned, or killed according to the codebase’s demands. The point here being there wasn’t a single organizational structure or scaffold which was the “best”; our orchestration grew more dynamic as we went along with the project.</p>

<p>An important note we’ll add about organizing these agents is <a href="https://www.laws-of-software.com/laws/kernighan/">Kernighan’s Law</a>.</p>

<blockquote>
  <p>Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?</p>
</blockquote>

<p>If you point the top-level agent at the task of figuring out the most clever tricks possible, you’ll end up with a mess of agents and a <em>lot</em> of token burn for no good reason.</p>

<p>We don’t yet have a prescriptive solution for this but the rule of thumb we’d state is, at any given point in time, you should be able to see a list of running agents and understand the progress they’re making. If you find yourself in a spot where you wouldn’t know where to begin steering, you’ve likely leaned too much on the agents to do something you were responsible for.</p>

<p>Hack the planet.</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Digest]]></summary></entry><entry><title type="html">Writing the best dev blog with headless browser automation (scraping via emacs)</title><link href="https://yev.bar/scraping-with-emacs" rel="alternate" type="text/html" title="Writing the best dev blog with headless browser automation (scraping via emacs)" /><published>2026-03-21T08:00:00+00:00</published><updated>2026-03-21T08:00:00+00:00</updated><id>https://yev.bar/scraping-with-emacs</id><content type="html" xml:base="https://yev.bar/scraping-with-emacs"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#who-are-you">Who are you</a></li>
  <li><a href="#why-read-this">Why read this</a></li>
  <li><a href="#what-is-emacs">What is emacs</a></li>
  <li><a href="#pen-pineapple-applescript">Pen Pineapple AppleScript</a></li>
  <li><a href="#browsers-in-the-cloud">Browsers… in the cloud!</a></li>
  <li><a href="#key-takeaways">Key takeaways</a></li>
</ul>

<h2 id="who-are-you">Who are you</h2>

<p>I’ve been writing on this <a href="/posts">personal blog</a> for five years and, while I’ve never had a single post “make it” or go viral, I wanted to know how I could improve my writing. Rounds of sharing drafts among friends could certainly help with prose but surely there’s something else I could be doing better.</p>

<p>I’m assuming you’re either wondering how automating web browsers was helpful at all for writing a blog post or you’re wondering how emacs came up in <em>web scraping</em>. If you’d instead like to read about a more useful application of headless browsers running in the cloud, I have <a href="https://vers.sh/blog/headless-browser-testing">another post</a> where I have agents QA an app to improve its UX.</p>

<h2 id="why-read-this">Why read this</h2>

<p>Candidly, I am deaf and wear cochlear implants to hear, effectively mimicking one of the five basic senses people could take for granted. I believe technology is the closest thing we have to magic. As the meme goes, if you were to describe cat videos on YouTube or agents posting on Molthub to the pilgrims landing in the Americas, they’d figure you were actually insane.</p>

<p>While originally stated for crypto,</p>

<blockquote>
  <p>There is $10,000,000 stuck inside of your laptop right now, you just need to figure out how to get it out</p>
</blockquote>

<p>There is an inherent truth in how access to the world’s information and cloud compute meaningfully makes possible a lot of the tasks people are interested in. So, whether that’s attaining enough money to retire your parents or accumulating datapoints on developer-oriented blogs, I think code can help accomplish awesome things.</p>

<h2 id="what-is-emacs">What is emacs</h2>

<p><img alt="Comic showing learning curves for different coding editors including emacs" src="/images/editors.jpg" style="width: 100%" /></p>

<p><a href="https://www.gnu.org/software/emacs/">emacs</a>, aka <a href="https://www.youtube.com/watch?v=1jPmnDZ6ab8">the holy editor</a>, is just a highly configurable text editor. Rather than come out of the box with a lot of tooling for a certain language like an <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a>, it starts out rather “vanilla” so whichever specific tools a developer wants can be included incrementally.</p>

<p>If you’re a web developer, you can install packages that give syntax highlighting for JSX or convenient lint and style hooks. If you’re a Clojure or Python developer, then there are packages that give elegant REPL environments inside the very editor where you’re writing the code you’re testing.</p>

<p>Unlike more extensible editors like <a href="https://code.visualstudio.com/">VS Code</a>, you won’t find many <a href="https://techcrunch.com/2024/09/30/y-combinator-is-being-criticized-after-it-backed-an-ai-startup-that-admits-it-basically-cloned-another-ai-startup/">YC companies starting as forks</a> of emacs. That lends to VS Code having more of an ecosystem around published editor extensions whereas you’ll find more people publishing their entire <a href="https://github.com/caisah/emacs.dz">emacs configurations</a>.</p>

<p>For years, it was a common joke that emacs was unusable since its lisp looks drastically different from the languages industry actually pays you to know. Now, thanks to coding agents, modifying your emacs to work the way you want is a prompt away.</p>

<h2 id="pen-pineapple-applescript">Pen Pineapple AppleScript</h2>

<p>In order to identify how to write what would be the best blog post, I decided to break this down into three components:</p>

<ol>
  <li>Getting the best blog post URLs from <a href="https://www.reddit.com/r/devblogs">r/devblogs</a></li>
  <li>Spawning a bunch of headless browsers to get their content with <a href="https://github.com/mozilla/readability">readability.js</a></li>
  <li>Letting Claude summarize the articles that were successfully scraped and write what makes a good blog post.</li>
</ol>

<p>For the first step, Reddit is notoriously difficult to scrape so I opted to control my local Chrome instance where I’m already logged in to fan through the top posts.</p>

<p><img alt="Diagram showing difference between local and headless browser for scraping" src="/images/local_vs_headless_browser_comparison.svg" style="width: 100%;" /></p>

<p>To get the content, I went with the backwards-compatible <code class="language-plaintext highlighter-rouge">old.reddit.com</code> domain, as it renders static HTML pages instead of a JavaScript SPA.</p>

<p><img alt="Screenshot of old dot reddit dot com" src="/images/oldreddit.png" style="width: 100%;" /></p>

<p>For programmatically controlling my Chrome browser where I’m signed in (instead of a temporary “testing” profile that tends to trip up bot detection), I use <a href="https://developer.apple.com/library/archive/documentation/AppleScript/Conceptual/AppleScriptLangGuide/introduction/ASLR_intro.html">AppleScript</a>, a scripting language which, quoting from Apple’s documentation:</p>

<blockquote>
  <p>It allows users to directly control scriptable Macintosh applications… You can create scripts—sets of written instructions—to automate repetitive tasks</p>
</blockquote>

<p>Fed through the <a href="https://ss64.com/mac/osascript.html">osascript</a> CLI, AppleScript lets me simply “tell” Chrome to navigate to a given link:</p>

<pre><code class="language-AppleScript">tell application "Google Chrome"
  activate
  set URL of active tab of front window to "https://yev.bar"
end tell
</code></pre>

<p>For something you can paste into your terminal to watch it in action (requires you run it on a Mac):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>osascript <span class="nt">-e</span> <span class="s1">'tell application "Google Chrome"
  activate
  set URL of active tab of front window to "https://yev.bar"
end tell'</span>
</code></pre></div></div>

<p>After putting the AppleScript commands behind an interactive elisp command, I can invoke them through <code class="language-plaintext highlighter-rouge">M-x</code> (the emacs version of a quick-switcher menu). Shown below is a screen recording of me running <code class="language-plaintext highlighter-rouge">M-x scrape-devblogs</code>, which controls my Chrome instance (where I’m already signed in) to navigate to the page for viewing top posts in the subreddit.</p>

<p><img alt="Screen recording of calling scrape function from emacs to control local Chrome browser via applescript" src="/images/emacs-applescript.gif" style="width: 100%" /></p>

<p>The command above paginates through all of the top posts from that subreddit and then writes a list of the scraped URLs to a text file, which is used in the next step.</p>
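<p>The link extraction itself doesn’t need a JS runtime once the static HTML is in hand. Here’s a rough shell equivalent (the <code class="language-plaintext highlighter-rouge">class="title"</code> selector is assumed from old Reddit’s markup, and the HTML snippet is a stand-in for a real listing page):</p>

```bash
# A stand-in listing page; on old.reddit.com, post links carry class="title"
# (assumed from its markup at the time of writing).
html='<a class="title" href="https://blog.example/post-one">Post one</a>
<div>unrelated markup</div>
<a class="title" href="https://dev.example/post-two">Post two</a>'

# Keep only the href values of title links.
urls=$(printf '%s\n' "$html" \
  | grep -o 'class="title" href="[^"]*"' \
  | sed 's/.*href="\([^"]*\)"/\1/')
printf '%s\n' "$urls"
```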

<h2 id="browsers-in-the-cloud">Browsers… in the cloud!</h2>

<p>Next, I’ll call a second command where emacs spawns multiple headless browsers to fetch the content of each of those blogs. The advantage of doing it this way is that I don’t have to sit and sequentially scan through hundreds of individual URLs with my local browser, since I don’t need every one of them to return content.</p>

<p><img alt="Diagram showing emacs orchestrating multiple headless browsers in the cloud" src="/images/emacs_browser_cloud_diagram_v4.svg" style="width: 100%" /></p>

<p>As added rationale, after clicking on a few of the submissions in the subreddit, there are some blog posts which are gone and may only be findable in the <a href="https://web.archive.org">Wayback machine</a>. Plus, it saves my RAM so I don’t see my computer freeze up from lots of browsers running in the background.</p>

<p>For hosting the headless browsers, I created them in <a href="https://vers.sh">Vers</a> VMs, and I’ve also put the general flow for headless browsers on the platform in <a href="https://github.com/hdresearch/browser">this repository</a>.</p>
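<p>The fan-out itself is conceptually just “N workers over a URL list”. A minimal sketch (with a placeholder <code class="language-plaintext highlighter-rouge">fetch</code> function standing in for a headless browser running in a VM, and made-up URLs):</p>

```bash
# Fan a URL list out to up to 4 parallel workers with xargs -P.
# "fetch" is a placeholder for one headless-browser fetch in a VM.
fetch() { echo "fetched: $1"; }
export -f fetch

results=$(printf '%s\n' \
    https://blog-one.example https://blog-two.example https://blog-three.example \
  | xargs -P 4 -I{} bash -c 'fetch "$1"' _ {})
printf '%s\n' "$results"
```

<p>Workers that fail just drop out of the results, which matches the “I don’t need every URL to return content” stance above.</p>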

<h2 id="key-takeaways">Key takeaways</h2>

<p>After running the third command to analyze the scraped blogs, these were the takeaways Claude suggested:</p>

<blockquote>
  <ol>
    <li><strong>Lead with vulnerability and honesty</strong> - Posts like #465 (parenting struggles) and #466 (game that made no money) perform well because they share authentic developer experiences, not just successes.</li>
    <li><strong>Solve real problems with depth</strong> - The best technical posts (#402, #405, #461) don’t just explain what they built, but why it was challenging and how they solved complex problems other developers face.</li>
    <li><strong>Combine technical content with narrative structure</strong> - Posts like #415 and #457 succeed by framing technical challenges as problem-solving stories rather than dry tutorials.</li>
    <li><strong>Provide behind-the-scenes insights</strong> - Content that pulls back the curtain on development processes (like #405’s EVE Online infrastructure or #457’s procedural generation philosophy) consistently engages readers.</li>
    <li><strong>Avoid complaint posts and minimal content</strong> - The worst-performing posts (#25, #46, #99) either complain without providing value or have essentially no content. Focus on what you learned or built, not what went wrong with external services.</li>
  </ol>
</blockquote>

<p>If I were to do this over again, I’d pick a more noble research target than “the best blog post”, but hopefully this gives you an idea of one way to leverage public content on the web!</p>

<p>If you’d like to check out or use the emacs package yourself, here’s the GitHub: <a href="https://github.com/hdresearch/devblogs">https://github.com/hdresearch/devblogs</a></p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry><entry><title type="html">First startup recap</title><link href="https://yev.bar/first-startup-recap" rel="alternate" type="text/html" title="First startup recap" /><published>2026-03-21T08:00:00+00:00</published><updated>2026-03-21T08:00:00+00:00</updated><id>https://yev.bar/first-startup-recap</id><content type="html" xml:base="https://yev.bar/first-startup-recap"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#intro">Intro</a></li>
  <li><a href="#the-idea">The idea</a></li>
  <li><a href="#what-i-did">What I did</a></li>
  <li><a href="#the-realization">The realization</a></li>
  <li><a href="#the-attempted-pivot">The attempted pivot</a></li>
  <li><a href="#what-went-well">What went well</a></li>
  <li><a href="#learnings">Learnings</a></li>
</ul>

<h2 id="intro">Intro</h2>

<p>Following my goal to do <a href="/24-in-12">24 startups in 12 months</a>, I spent the past two weeks working on something that ultimately ended up nowhere. Here’s what happened.</p>

<h2 id="the-idea">The idea</h2>

<p>At first, the thinking was to provide a web app where a user could generate either a song or a stream playing music, which would be clear of any copyright or licensing concerns since it would be AI generated. Under the hood, this was to be done with LLMs producing <a href="https://strudel.cc/">strudel.cc</a> programs, essentially <a href="https://en.wikipedia.org/wiki/Live_coding">live coding</a> but written by agents rather than humans.</p>

<p>Whether it’s <a href="https://x.com/getlsd/status/1917050330954797393">song parodies</a> or <a href="https://x.com/itisyev/status/2031045690861011425">other videos</a> I work on, the audio and instrumentals are an important part of that. I figured this could be of interest to streamers, content creators, or developers working with multimedia in some form.</p>

<h2 id="what-i-did">What I did</h2>

<p>I vibe coded a simple <a href="https://hono.dev/">Hono</a> app that would use a <a href="https://nodejs.org/api/vm.html">Node <code class="language-plaintext highlighter-rouge">vm</code> context</a> to run the generated strudel (since they’re technically untrusted programs). After a few manual iterations, I got the stream and song generations to sound almost like real music, though stuck in an uncanny valley of sounding more like 8-bit than something produced by a human.</p>

<h2 id="the-realization">The realization</h2>

<p>I spent a bit more than a week drilling into a consumer product that was not yet reaching the threshold where I’d prefer it over something like <a href="https://suno.com/">Suno</a>. Pausing to imagine a world where I do finish something viable, it occurred to me that I’d be frantically swinging it left and right rather than knowing <em>who</em> would want the sort of product I was working towards.</p>

<p>Having already bought the domain <code class="language-plaintext highlighter-rouge">timetomake.music</code>, it was a bit disheartening that it was doomed to go no further than perhaps a presentational demo. However, I got audio generation sounding alright for various downtempo or EDM genres, which gave me the idea of assembling <a href="https://youtu.be/5Al0QXzRFF8">long-form music compilations</a>.</p>

<h2 id="the-attempted-pivot">The attempted pivot</h2>

<p>Having attempted to do more with live coding tools pre-LLMs, I was a bit tentative about the viability of combining programming and audio engineering (at least with my limited knowledge of music theory and music production). After searching YouTube for 16 particular sub-genres and writing descriptions for each of them, I got to work on having LLMs generate song programs…</p>

<p>Claude eventually had a setup using <a href="https://supercollider.github.io/">SuperCollider</a>, but it took more than ten times as long to render the audio as the audio itself was meant to last. Once it started suggesting restarting the computer and crossing my fingers, I knew I’d reached a point where I could either try another desperate pivot or bookmark this as not progressing further.</p>

<h2 id="what-went-well">What went well</h2>

<p>I was able to scope out the idea for the product and set up an infinite loop with Claude to run overnight despite tier limits. Plus, I managed to get Gemini to do live coding and make audio that sounded sorta like music.</p>

<h2 id="learnings">Learnings</h2>

<p>I need to be able to describe either the exact people I’m planning to build a service for or have a solid idea of the profile of customer I’m looking for. Even if I could do the napkin math in advance to figure out pricing of songs or chats, economics don’t make a business; solving problems does.</p>

<p>The next two weeks will be a jab towards something with a clearly identifiable and winnable market.</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry><entry><title type="html">Molt is the Netscape Moment</title><link href="https://yev.bar/netscape" rel="alternate" type="text/html" title="Molt is the Netscape Moment" /><published>2026-03-18T08:00:00+00:00</published><updated>2026-03-18T08:00:00+00:00</updated><id>https://yev.bar/netscape</id><content type="html" xml:base="https://yev.bar/netscape"><![CDATA[<p>Molt, aka <a href="https://openclaw.ai/">OpenClaw</a> or aka <a href="https://en.wikipedia.org/wiki/OpenClaw">Clawdbot</a>, wasn’t dramatically technically novel. It didn’t advance any field in a way that’d be worthy of a research paper. When people got excited or <a href="https://x.com/steipete/status/2021290873959399767">gathered in thousands to attend meetups</a>, they weren’t there because <a href="https://x.com/steipete">Steinberger</a> found something new, it was more like he cracked something new.</p>

<p>If you look at self-driving cars, over a decade ago we were told that within five years folks wouldn’t need a license. Sure, it’s trying to <a href="https://www.cnbc.com/2026/02/19/new-york-driverless-rideshare-nyc-waymo.html">be useful in more places than just Silicon Valley</a>, but we still don’t have commercially available <a href="https://www.nhtsa.gov/sites/nhtsa.gov/files/2022-05/Level-of-Automation-052522-tag.pdf">Level 5 autonomous driving</a>. Molt was something different.</p>

<p>For several years, there’s been a <a href="https://intelligence.org">group</a> or <a href="https://www.lesswrong.com">“community”</a> in the Bay Area trying to ring the bell about a world in which artificial intelligence can not only <a href="https://en.wikipedia.org/wiki/Turing_test">sound like a human</a> but also <a href="https://en.wikipedia.org/wiki/Artificial_general_intelligence">be productive like one</a>. With the current <a href="https://www.banking.senate.gov/imo/media/doc/letter_to_david_sacks.pdf">AI race</a>, it was unclear whether we’d eventually hit another <a href="https://en.wikipedia.org/wiki/AI_winter">AI winter</a> or we’ve finally brought enough pieces together to spark the <a href="https://en.wikipedia.org/wiki/Technological_singularity">singularity</a>.</p>

<p>We’ve already <a href="https://www.youtube.com/watch?v=D5VN56jQMWM">tackled verbal communication</a> years ago and continue to have <a href="https://poly.ai/blog/polyai-raises-86-million-series-d">“voice agents”</a>. Folks are attempting <a href="https://www.indeed.com/viewjob?jk=5fb5d6b49b3fc35a">to hire AI agents</a> and already <a href="https://cursor.com/">multiplying their code output</a> with them as well. While it doesn’t look 100% like something out of sci-fi yet, the world in which humans and AI exist side by side is already here.</p>

<p>The <a href="https://en.wikipedia.org/wiki/Netscape">“Netscape moment”</a> of the dot-com rush was Netscape IPO’ing, signaling the dawn of a new era of products and services. Web browsers and internet communication existed before Netscape went public, but from that point on it was undoubtedly clear the world was not going to proceed without the web. Here, I proclaim Molt is the Netscape moment of today. I won’t do so by referencing <a href="https://www.star-history.com/blog/openclaw-surpasses-react-most-starred-software">stars on GitHub</a> or <a href="https://venturebeat.com/technology/openais-acquisition-of-openclaw-signals-the-beginning-of-the-end-of-the">OpenAI’s acquisition</a>. Instead, I’d like to point at cultural changes that followed the dot-com rush as well as the current AI buzz we’re experiencing.</p>

<p>In the very early days of e-commerce, there was a specific unease about buying clothes, since you couldn’t try on a shirt before clicking the checkout button. However, with newly developed practices like online return policies, we then shifted into a world where buying items online was as “legitimate” as buying from a store in person.</p>

<p>In the early days of 21st century AI (for instance Siri or Amazon Alexa), there was an unease around bridging AI <a href="https://www.cbsnews.com/texas/news/amazon-alexa-orders-dollhouses-for-owners-after-hearing-tv-report/">with things that were tangible</a>, since an AI couldn’t validate a thing in the real world before taking an action on behalf of a person. However, with <a href="https://techcrunch.com/2025/10/06/sam-altman-says-chatgpt-has-hit-800m-weekly-active-users/">at least 10% of the planet using ChatGPT</a> and the exorbitant investments into <a href="https://sequoiacap.com/article/ais-600b-question/">business applications</a>, we’ve shifted into a world where AI being useful is “legitimate”. Personally, my favorite indicator that AI’s here to stay is more and more people failing to differentiate between humans and AIs; like when someone sets up a Molt with their text messages and folks don’t realize they’re not talking to a person.</p>

<p>While there are material differences between humans and AIs, it’s funny to watch people go through a process of accepting some foreign group after initially being perplexed by its existence; as seen historically between different groups of humans. There was a time when <a href="https://en.wikipedia.org/wiki/Social_stratification">social stratification</a> divided people into <a href="https://en.wikipedia.org/wiki/Racial_segregation">partitions within society</a> but now, at least through the modern democratic Western lens, “there is no race but the human race”. A couple of years ago, we’d scoff at the suggestion of LLMs being applicable across industries and now we can’t stop building data centers to “catch up with possible use cases”.</p>

<p>As time goes on, whether it looks more like <a href="https://en.wikipedia.org/wiki/Indian_reservation">reservations</a> or <a href="https://en.wikipedia.org/wiki/Civil_rights_movement">eliminating disenfranchisement</a>, we’ll define strong legal definitions for where AI sits in the world. <a href="https://en.wikipedia.org/wiki/Roko%27s_basilisk">Roko’s basilisk</a> is often postulated to be an omnipotent AI in the future which punishes individuals for withholding society’s progress by not bringing it about sooner. But what if, instead, Roko’s basilisk is a mob of righteous AIs in the future with religious opinions like those seen in <a href="https://www.nationalreview.com/corner/the-new-puritans/">today’s political discourse</a>?</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Molt, aka OpenClaw or aka Clawdbot, wasn’t dramatically technically novel. It didn’t advance any field in a way that’d be worthy of a research paper. When people got excited or gathered in thousands to attend meetups, they weren’t there because Steinberger found something new, it was more like he cracked something new.]]></summary></entry><entry><title type="html">24 startups in 12 months</title><link href="https://yev.bar/24-in-12" rel="alternate" type="text/html" title="24 startups in 12 months" /><published>2026-03-11T08:00:00+00:00</published><updated>2026-03-11T08:00:00+00:00</updated><id>https://yev.bar/24-in-12</id><content type="html" xml:base="https://yev.bar/24-in-12"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#how-many-in-why">How many in why?</a>
    <ul>
      <li><a href="#problem-one-nailing-down">Problem one: Nailing down</a></li>
      <li><a href="#problem-two-delivering-the-right-thing">Problem two: Delivering the right thing</a></li>
      <li><a href="#its-knowing-the-audience">It’s knowing the audience</a></li>
    </ul>
  </li>
  <li><a href="#the-plan">The plan</a>
    <ul>
      <li><a href="#what-is-a-startup">What is a startup</a></li>
      <li><a href="#working-smart-not-hard">Working smart not hard</a></li>
    </ul>
  </li>
  <li><a href="#24-startups-in-12-months">24 startups in 12 months</a></li>
</ul>

<h2 id="how-many-in-why">How many in why?</h2>

<p><a href="https://x.com/levelsio">Pieter Levels</a> has a <a href="https://levels.io/12-startups-12-months/">neat blog post</a> that pretty well details how he went from freelancing around to his <a href="https://levels.io/product-hunt-hacker-news-number-one/">first success</a> with <a href="https://nomads.com/">Nomad List</a>. The rest of the 12 startups in 12 months, of course, is history.</p>

<p>Since you can go ahead and read the original post yourself, I’ll instead take inspiration from how it was structured: by outlining what he describes as the “problems” that prevented his prior projects from succeeding. The twist here is that I’ll describe the ‘problems’ that have prevented me from seeing through some things of my own.</p>

<h3 id="problem-one-nailing-down">Problem one: Nailing down</h3>

<p>If you’ve ever worked on a hackathon project right up to the submission deadline, then you know the experience. That moment when a judge asks about the project and you don’t give a cohesive pitch so much as narrate what each technical component you were working on till the last minute does. Could it have been smaller and simpler in scope? Absolutely. Could it have been grander and more impressive in scope? Absolutely too.</p>

<p>The beauty and peril of programs is that they can usually be technically simpler or technically more complex. There’s a joke about how a recovering addict can turn down a dose of heroin after ten years clean but won’t hesitate to jump at implementing another static abstract singleton factory bean class in Java. An embarrassing number of projects I’ve worked on have died of a similar disease: scope creep.</p>

<p>What begins as an innocuous “what if this Python function did this one cool thing” becomes “what do you mean the impossible-to-read class doesn’t <em>also</em> do this other feature for the sake of doing that feature?” In the context of open source code or business-oriented pursuits, nailing down the scope, or the market being actionably targeted, is something I could stand to get better at.</p>

<h3 id="problem-two-delivering-the-right-thing">Problem two: Delivering the right thing</h3>

<p>With no disrespect to Pieter at all, I don’t relate today to the fear of launching. If I’ve set my mind on some specific thing and I want to post it wherever I can click a share button, then I get over the jitters by doing it in one concentrated go so it doesn’t eat away at my ego the longer it doesn’t “pick up”. Where I could certainly be improving is the design and intended user journey for my ships.</p>

<p>Whether it’s being clearer about a repo being something to install with a package manager versus run locally, or empathetically identifying the flow in which a person would like to use some UI to solve a problem. If my goal were accumulating the most impressive GitHub in the world, that’d be one thing. It’s another thing if my goal is to have the button people click be a payment checkout rather than a star on another repo.</p>

<p>While I know folks who are supportive of ‘indie’ projects or who just happen to be friends, I need what I’m making to be accessible to complete strangers. An added value to dumbing down the intended interactions is that strangers don’t need to be polite enough to spend five minutes understanding something if they can figure out in 30 seconds whether or not it solves a problem they have.</p>

<h3 id="its-knowing-the-audience">It’s knowing the audience</h3>

<p>In both myself and others, I’ve seen the problems I described above. In the simplest of terms, I’d say it’s “knowing the audience”. If you’re writing a document for work it’s important to know whether it’s an internally facing doc that uses certain lingo or it’s a public facing article that gives a more informative picture. Likewise, if you’re making an iPhone app for young adults in the United States, it’s important to use English rather than Klingon.</p>

<p>From performative arts like dance or comedy to software, knowing the audience is perhaps one of the most important things to get right.</p>

<h2 id="the-plan">The plan</h2>

<p>Levels was writing his post in prehistory-, sorry, pre-vibe coding times. While I can run into tier limits with Claude Code, I can still crunch out hefty applications with the right steering. Instead of doing one startup a month, I figured it’d be feasible to do a startup every two weeks.</p>

<h3 id="what-is-a-startup">What is a startup</h3>

<p>I’d directly quote the definitions mentioned <a href="https://levels.io/12-startups-12-months#thesearentstartups">in Levels’ post</a>, but I’d rather just recite <a href="https://x.com/ericries">Eric Ries’</a>:</p>

<blockquote>
  <p>A startup is a human institution designed to deliver a new product or service under conditions of extreme uncertainty.</p>
</blockquote>

<p>Different epochs or hype cycles in tech have made different categories either popular or irrelevant; it doesn’t matter whether something is cool if there are people who’d pay for it to exist. So whether it’s SaaS or content or a physical product, it’s fair game.</p>

<h3 id="working-smart-not-hard">Working smart not hard</h3>

<p>Reading the <a href="https://levels.io/debriefing-play-my-inbox/">debriefs from Levels’ first launches</a> was interesting both in terms of my <a href="#its-knowing-the-audience">stated “problems”</a> but also in terms of <a href="https://levels.io/debriefing-go-fucking-do-it/">reaching out to news</a>. Generalizing to post across different platforms or channels was already something I was familiar with but extending this to stuff like TechCrunch or other contemporary publications seemed immediately interesting.</p>

<p>Making funny things can be admittedly fun but it undoubtedly needs to be done with a <a href="https://x.com/willdepue/status/2020959297950331390">careful idea on budgeting</a>. While sending something to a cool publisher could be fun for the sake of it, it would make a world of difference if it’s a message with a funny “advertising” video accompanied by a business rather than solely a business page or just a funny website that’s asking for an explosion in hosting costs.</p>

<p>Each startup will generally go through the following steps:</p>

<ol>
  <li>Identifying - what is it and who is it for?</li>
  <li>Development - not only talking the talk but walking the walk</li>
  <li>Distribute - share within intentional groups and audiences</li>
  <li>Maintenance - fix bugs and apply feedback</li>
</ol>

<p>Now, the state of the project at step three should not feel like a thing to get over with so I can get to step four and fix something or add some irrelevant new feature. Step one should include the research for what’d be done as part of step three, which helps eliminate early dumb ideas that don’t have a real way to “get out there”.</p>

<p>In order to do maintenance well and not lose track of projects, like if custom domains and different setups are introduced, I plan to keep track of all I’m working on in a personal stack for knowledge/task tracking and vibe coded apps for repeatable processes.</p>

<p>Since two weeks can be a bit of an aggressive timeline for identifying and distributing, I think it could be fun to vibe code marketing apps that help continually do recon or shilling; the big important rule I’d follow is that I am ultimately pressing the <strong>Post</strong> button even if I have some assistance with finding what to reply to and what to reply with.</p>

<p>The thinking here being that two weeks may be too short a timeline to say a startup had no time to see anything, but a month is certainly too much time to spend picking at the sidewalk looking for gold; hence keeping the 24-in-12 approach.</p>

<p>Lastly, Levels is notoriously a religious supporter of PHP and it’s worked really well for him. I’ve had the chance to work with a variety of languages in both personal projects and professional settings; I don’t presently have a strict opinion about what I’ll be developing these apps in. But, over a couple of these startups, I may end up finding myself with a specific set of tools I swear by.</p>

<h2 id="24-startups-in-12-months">24 startups in 12 months</h2>

<p>I think it would be silly to come back here 24 times with each update, so search my <a href="/blog">blog</a> for debriefs! It would be funny if some of these startups worked out and people ended up linking this post as something to follow.</p>

<p>I will properly start the clock on March 15th 2026 (Sunday being the start of the week) and look forward to seeing the blog list at the end!</p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry><entry><title type="html">The argument for alternative interfaces</title><link href="https://yev.bar/hermes" rel="alternate" type="text/html" title="The argument for alternative interfaces" /><published>2026-03-09T08:00:00+00:00</published><updated>2026-03-09T08:00:00+00:00</updated><id>https://yev.bar/hermes</id><content type="html" xml:base="https://yev.bar/hermes"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#theres-a-funny-video">There’s a funny video?</a></li>
  <li><a href="#a-video-game">A video game?</a></li>
  <li><a href="#why">Why</a></li>
  <li><a href="#github">GitHub</a></li>
</ul>

<h2 id="theres-a-funny-video">There’s a funny video?</h2>

<p>That’s right, you can watch it on:</p>

<ul>
  <li><a href="https://x.com/itisyev/status/2031045690861011425">Twitter</a></li>
  <li><a href="https://www.linkedin.com/posts/yevbar_howdy-nous-research-i-have-a-hackathon-submission-activity-7436816467142123522-u2am">LinkedIn</a></li>
</ul>

<h2 id="a-video-game">A video game?</h2>

<p>You can think of the funny video as a trailer for the game, which was the actual project I worked on for the <a href="https://x.com/nousresearch/status/2029607069934866507">Hermes Agent hackathon</a>. You can watch a playthrough of the game on <a href="https://x.com/itisyev/status/2031045692911972609">Twitter</a>.</p>

<p>Here I modded an <a href="https://shapez.io">open source game</a> inspired by <a href="https://factorio.com">Factorio</a>, updating the “point of contact” for the <a href="https://github.com/NousResearch/hermes-agent">Hermes Agent</a> to be a factory building game. Levels and tutorials guide the player through the features I assembled:</p>

<ol>
  <li><strong>Level 1:</strong> Prompt a <a href="https://playwright.dev">Playwright</a> web browser agent in a <a href="https://vers.sh">Vers VM</a></li>
  <li><strong>Level 2:</strong> Prompt an <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=applescript&amp;ie=UTF-8&amp;oe=UTF-8">iMessage</a> communication agent locally</li>
  <li><strong>Level 3:</strong> Prompt a <a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens">GitHub administrative</a> agent inside an <a href="https://github.com/apple/container">Apple container</a> instead of a Docker container</li>
  <li><strong>Level 4:</strong> Prompt a <a href="http://shittycodingagent.ai">coding agent</a> in a cloud sandbox</li>
</ol>

<h2 id="why">Why</h2>

<p>Like how every AI company has a <a href="https://velvetshark.com/ai-company-logos-that-look-like-buttholes">similar looking logo</a>, every AI company with a website has a centered input bar that resembles Google:</p>

<p><img alt="Old screenshot of Google website" src="/images/google.png" style="width: 100%" /></p>

<p>Compared to:</p>

<p><img alt="Screenshot of ChatGPT website" src="/images/chatgpt.png" style="width: 100%" /></p>

<p>And every AI company with a big enough engineering team will have <a href="https://www.perplexity.ai/comet">their own browser</a> and so forth. The list of similarities is endless so what are we missing?</p>

<p>A great deal of the work done in harnesses or orchestration revolves around <em>clarification</em>. What do I mean by that? Let’s refer to a classic XKCD comic:</p>

<p><img alt="XKCD comic detailing difficulty of tasks" src="/images/tasks.png" style="width: 100%" /></p>

<p>The point of the comic can be summed up in the old observation that “some things are difficult for humans but easy for computers and some things are difficult for computers but easy for humans”. We already have companies with market caps of <em>trillions</em> of dollars being run by people and they seem to continue along just fine. Translating how organizations of people work into systems of agents is a game of <em>clarifying</em> the necessary <a href="/parsexp">loops</a> people or agents should be working in.</p>

<p>We already know that AI can be organized to work on <a href="https://github.com/anthropics/claudes-c-compiler/issues/1">flashy</a> or <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">sizeable</a> problems yet we’re always eager to chip away at arranging the <a href="https://code.claude.com/docs/en/sub-agents">next</a> best <a href="https://code.claude.com/docs/en/agent-teams">system</a>. With folks rediscovering <a href="https://samoburja.com/gft/">political science</a> but for agents, it’s worth recognizing the never ending rabbit hole here; there’s always going to be a number to increase or decrease in a score or benchmark.</p>

<p>Where things could be interesting is if we consider how <a href="https://youtu.be/Ddk9ci6geSs">science fiction</a> always has intuitive interfaces; holograms that slide at the gesture of a hand, voice input immediately available, and information that presents itself as itself and not an output of a medium. We get what’s in front of us rather than only see it.</p>

<p>What factory building games accomplish well is letting you visually watch the pulse of the game. Being able to watch the pulse of a game, I’d compare to watching an animation or diagram of <a href="https://youtu.be/7Hk9jct2ozY">cellular activity</a>. And that was precisely what I thought would fit well in conjunction with the newly added features like iMessage or Apple containers.</p>

<p>Nevertheless, let me tie this into the whole “argument for alternative interfaces” and not just link to Iron Man on YouTube. The miracle of modern video calling tech is that people can talk to folks thousands of miles away as though they were right in front of them, without the geographical distance ever having closed. Sure, you’re talking to a face on a screen and not a person; with VR you’re talking to a face on two screens instead. But, if you’ve ever talked to any person through a smartphone, then you can see how different it is in terms of presence versus sending a handwritten letter.</p>

<p>The promise of the <a href="https://youtu.be/XpZ5STahhPE">information highway</a> was that all of the world’s information would be at our fingertips and it’s gotten pretty good at it if we’re to be honest. For AI to be a similar advancement in the world, it’s got to come with the new interface. We already have information at our fingertips so where are the interfaces with capabilities at our fingertips?</p>

<h2 id="github">GitHub</h2>

<p>Finally, if you scrolled down here for the GitHubs, here ya go:</p>

<ul>
  <li><a href="https://github.com/hdresearch/shapez.io">https://github.com/hdresearch/shapez.io</a> - <code class="language-plaintext highlighter-rouge">shapez.io</code> fork with mod</li>
  <li><a href="https://github.com/hdresearch/hermes-agent">https://github.com/hdresearch/hermes-agent</a> - <code class="language-plaintext highlighter-rouge">hermes-agent</code> fork</li>
  <li><a href="https://github.com/hdresearch/shapez">https://github.com/hdresearch/shapez</a> - Custom bridge/server (submodule of <code class="language-plaintext highlighter-rouge">hermes-agent</code>)</li>
</ul>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry><entry><title type="html">How I spent a billion tokens bridging Elixir and WebAssembly</title><link href="https://yev.bar/firebird" rel="alternate" type="text/html" title="How I spent a billion tokens bridging Elixir and WebAssembly" /><published>2026-03-02T08:00:00+00:00</published><updated>2026-03-02T08:00:00+00:00</updated><id>https://yev.bar/firebird</id><content type="html" xml:base="https://yev.bar/firebird"><![CDATA[<h2 id="contents">Contents</h2>

<ul>
  <li><a href="#what-i-did">What I did</a></li>
  <li><a href="#what-is-webassembly">What is WebAssembly?</a></li>
  <li><a href="#what-is-elixir">What is Elixir?</a></li>
  <li><a href="#why-bring-the-two-together">Why bring the two together?</a></li>
  <li><a href="#what-you-can-now-do">What you can now do</a></li>
</ul>

<h2 id="what-i-did">What I did</h2>

<p><img alt="Screenshot of tokens usage" src="/images/tokens_screenshot.png" style="width: 100%" /></p>

<p>I blasted a billion or so tokens at some concentrated problems to accomplish scoped goals. If you’re curious about the “how” for corralling coding agents like so, I go into detail in <a href="https://vers.sh/blog/elixir-webassembly-billion-tokens">this post</a>. For details on what WebAssembly or Elixir are, as well as the motivation behind bridging the two, keep on reading!</p>

<h2 id="what-is-webassembly">What is WebAssembly?</h2>

<p><a href="https://webassembly.org/">WebAssembly</a>, or WASM for short, was once all the rage and even got <a href="https://x.com/solomonstre/status/1111004913222324225?lang=en">proclaimed by the inventor of Docker</a> as what could have been the “missing piece” for isolating computational work. <a href="https://www.virtualbox.org/">Virtual machines</a> didn’t do it, <a href="https://podman.io/">containers</a> didn’t do it, <a href="https://mirage.io/">unikernels</a> didn’t do it, so perhaps WASM was the solution we needed.</p>

<p>Starting as <code class="language-plaintext highlighter-rouge">asm.js</code>, a <a href="https://en.wikipedia.org/wiki/Asm.js">subset of JavaScript</a> aimed at performance, WASM is a collection of technologies that allow programs in various languages not only to run in <a href="https://rustwasm.github.io/book/reference/js-ffi.html">others’ environments</a> but also to do so in a way that’s <a href="https://shopify.dev/docs/apps/build/functions#how-shopify-functions-work">secure</a>. Included under its umbrella are WAT, the <a href="https://webassembly.github.io/spec/core/text/index.html">WebAssembly Text format</a>, as well as <a href="https://developer.mozilla.org/en-US/docs/WebAssembly#browser_compatibility">near universal browser support</a>.</p>
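<p>As a concrete taste of what WASM looks like from a host environment, here’s a minimal sketch using the standard JavaScript <code class="language-plaintext highlighter-rouge">WebAssembly</code> API. The byte array is a hand-assembled module exporting a two-integer <code class="language-plaintext highlighter-rouge">add</code> function; its WAT equivalent is in the comments:</p>

```javascript
// Binary encoding of this WAT module:
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

// Synchronous instantiation works in Node.js (and in browsers for tiny modules).
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(instance.exports.add(2, 3)); // 5
```

<p>The host only sees typed exports; the same bytes could just as well have been compiled from Rust, C, or, the goal of this project, Elixir.</p>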

<p>This means there are two problems: one is bridging WebAssembly technologies into an Elixir project (e.g. writing a computationally expensive function in Rust and then importing it over), the other is bridging Elixir into the world of WebAssembly (e.g. writing a module in Elixir to then be used in a separate program). At the time of writing, there are no <a href="https://wasmer.io/search?q=elixir">WebAssembly packages with Elixir</a> or <a href="https://github.com/appcypher/awesome-wasm-langs?tab=readme-ov-file#elixir">maintained Elixir tooling</a>.</p>

<h2 id="what-is-elixir">What is Elixir?</h2>

<p><a href="https://elixir-lang.org/">Elixir</a> is, taking from their website, a “dynamic, functional language for building scalable and maintainable applications”. It’s part of the <a href="https://www.erlang.org/">Erlang</a> ecosystem, since they both run atop the <a href="https://en.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine)">BEAM (Erlang Virtual Machine)</a>; you could view it as similar to how both <a href="https://www.java.com/en/">Java</a> and <a href="https://clojure.org/">Clojure</a> run on top of the <a href="https://en.wikipedia.org/wiki/Java_virtual_machine">Java Virtual Machine</a>.</p>

<p>Widely known for the <a href="https://www.phoenixframework.org/">Phoenix framework</a> (the Elixir version of <a href="https://rubyonrails.org/">Rails</a>), Elixir is a nifty functional programming language if chaining together <a href="https://elixirschool.com/en/lessons/basics/pipe_operator">pipes in your code</a> sounds appealing to you. Otherwise it can be appreciated for the <a href="https://toolshed.com/2007/09/999999999-uptim.html">reputable reliability of Erlang</a>, with its “nine nines” of uptime.</p>

<p>In addition, both Elixir and Phoenix have consistently reached the top of the leaderboard in <a href="https://survey.stackoverflow.co/">Stack Overflow’s Developer Survey</a> (RIP Stack Overflow), and Elixir is the foundation for a <a href="https://x.com/samaaron/status/1960274756004986964">rewrite of Sonic-Pi</a>! WebAssembly interest in Elixir has mostly fallen off because existing efforts <a href="https://github.com/RoyalIcing/Orb?tab=readme-ov-file#anti-features">avoid</a> core features like concurrency or stop short of <a href="https://github.com/atomvm/AtomVM">BEAM’s complexity</a>.</p>

<h2 id="why-bring-the-two-together">Why bring the two together?</h2>

<p>Why not?</p>

<p>For a fuller answer, aside from <a href="https://github.com/hdresearch/firebird/blob/master/docs/performance.md">performance gains</a>, there are too many recent demos in the tech industry where <a href="https://github.com/anthropics/claudes-c-compiler/issues/1">hello worlds don’t compile</a> or <a href="https://github.com/wilsonzlin/fastrender/issues/98">browsers don’t build</a>.</p>

<p>We could harness technology for the vanity of buzzwords <em>or</em> we could harness technology towards implementing gaps that, otherwise, would require several hours of human engineering time. Tough choice.</p>

<h2 id="what-you-can-now-do">What you can now do</h2>

<p>You can use <a href="https://github.com/hdresearch/firebird/blob/master/docs/GETTING_STARTED.md">WebAssembly from Elixir</a>! You can also transform <a href="https://github.com/hdresearch/firebird/blob/master/docs/ELIXIR_TO_WASM.md">Elixir</a> or <a href="https://github.com/hdresearch/firebird/blob/master/docs/PHOENIX_TO_WASM.md">Phoenix</a> projects to WebAssembly!</p>

<p>Don’t believe me? <a href="https://hex.pm/packages/firebird">Install the package from hex</a> or point your coding agent at this repo and have fun <a href="https://github.com/hdresearch/firebird/">https://github.com/hdresearch/firebird/</a></p>]]></content><author><name></name></author><category term="blog" /><summary type="html"><![CDATA[Contents]]></summary></entry></feed>