We were promised JARVIS, instead we got 140 tool approvals

Where we are

Similar to how old Thiel-isms have become cliché (likely thanks to the popularization of his book), the bit “We wanted flying cars, instead we got 140 characters” plays out again today. We are promised AI might cure cancer, even though that promise has been made many times before. Sure, there’s nuance to juggle and some wins to recognize, but something’s missing from the world we’re in.

If AI really were solely good and beneficial, then walking back an AI deployment wouldn’t come across as correcting a stupid mistake in the first place. If success stories for those driving bleeding-edge technology were “wins for the nerds”, then a company exiting to Nvidia wouldn’t land as an alarming event.

Yet, something is undeniably happening. If the Jeff Bezos of 1997 were around today, he’d point to the explosive growth of ChatGPT. Politicians are giddy to get involved before the tech becomes antiquated. Technical employees are commanding professional-athlete salaries. A whole wave of automated programming is happening as you read this paragraph.

While all this is going on, does our economy look like a horoscope?

[Diagram showing the circular nature of deals among large tech companies]

Admittedly, yes. But, financial matters aside, are we at least advancing toward that utopia of tomorrow? When the biggest name in AI nudges into advertising, it’s hard to take the vision seriously.

Where LLMs have shown a strong degree of “usefulness” is in translating requests into structured outputs or tool calls against external APIs or MCP servers. It’s nice that a coding agent can double as smart home automation but, after jumping through the research detours and code invocations, what you’ve ended up with is a heap of tokens for every dance between the model and the available functions.

We were promised JARVIS, instead we got 140 tool approvals. The model asks for permission to check the temperature, the model asks for permission to set the temperature, the model asks for permission to mark the change… Tony Stark would never have flown off the ground if he had to walk JARVIS through every individual task.
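To make the complaint concrete, here’s a minimal sketch of that approval dance in Python. The model is a stub and the get_temperature/set_temperature tools are hypothetical; the only point is that every single tool invocation blocks on a human yes/no before anything happens.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolCall:
    name: str
    args: dict

# Hypothetical smart home tools; a real agent would register these with the model.
TOOLS = {
    "get_temperature": lambda args: "68F",
    "set_temperature": lambda args: f"thermostat set to {args['value']}F",
}

def fake_model(history: list) -> Optional[ToolCall]:
    """Stand-in for an LLM: proposes one tool call per turn, then stops."""
    plan = [ToolCall("get_temperature", {}),
            ToolCall("set_temperature", {"value": 72})]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent() -> None:
    history = []
    while (call := fake_model(history)) is not None:
        # The dance: every single tool call blocks on a human yes/no.
        answer = input(f"Model wants to call {call.name}({call.args}). Approve? [y/n] ")
        if answer.strip().lower() != "y":
            print("Denied. Tony never leaves the ground.")
            return
        result = TOOLS[call.name](call.args)
        history.append((call, result))
        print(f"  -> {result}")

if __name__ == "__main__":
    run_agent()
```

Scale this up to a real agent with dozens of tools and a multi-step plan, and you arrive at the 140 approvals in the title.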

Looking forward

Sci-fi has long been an oracle for the future, whether that’s lightsabers or the self-lacing shoes and hoverboard from Back to the Future. So what references are there for “AI”?

In popular culture, there are plenty of examples of AI as the “bad guy”: HAL 9000, Skynet, Ultron, AUTO, Agent Smith, ED-209, and so on. It’s usually when the AI is meant to serve as a vanguard or marshal.

When AI’s the “good guy”, it’s either an auxiliary character (such as the droids in Star Wars) that primarily augments the functions of a human (such as a droid riding along in a starfighter to assist the pilot), or the protagonist “taking on the world” (e.g. Astro Boy). The latter case doesn’t matter as much, since the character’s protagonist status supersedes them being an AI. Fans don’t care that the Iron Giant is from another planet, or that Mega Man is canonically a robot (unless it gives them a point to relate to as being “strange”).

Interestingly, consider the following rule from the 1979 IBM training manual:

“A computer can never be held accountable, therefore a computer must never make a management decision.”

In fiction, every AI villain scenario starts with an engineer somewhere along the way failing to follow this rule. What ultimately happens is we find ourselves with an AI that takes its rules too literally. It’s worth noting that Asimov’s books are specifically about well-intentioned “laws” for robots never working! So, what are we missing?

Useful AI

Which brings me to what I claim is missing from AI: people want to confidently and reliably tell their computer to do a thing, and have it do that thing. Nobody in sci-fi interacting with a superintelligent machine expects to approve steps or “auto-accept” its contributions.

Ever since the popularization of the internet, connected technologies have generally enabled people to do things. You no longer need to walk to a store to buy a new pair of pants. You no longer need a physical server to host a website. You no longer need connections at a music label to get an album published.

But all of these dashboards and apps require some understanding of what you’re doing. You can’t reasonably purchase a pair of pants on a website written in a language you don’t understand. You can’t sign up for AWS and expect a website to magically appear. You can’t publish songs that haven’t been made yet. Sure, there are solutions for all of the above, but those too come with their own complications.

Complexity, naturally, is inherent to software. For every plaintext database, there’s a NoSQL database with vector search. For every simple audio player, there’s a GPL-licensed DJ suite. Zawinski’s Law (“every program attempts to expand until it can read mail”) speaks to the same tendency.

To quote a passage from “Augmenting Human Intellect” by Douglas Engelbart:

Every process of thought or action is made up of sub-processes. Let us consider such examples as making a pencil stroke, writing a letter of the alphabet, or making a plan. Quite a few discrete muscle movements are organized into the making of a pencil stroke; similarly, making particular pencil strokes and making a plan for a letter are complex processes in themselves that become sub-processes to the over-all writing of an alphabetic character.

Although every sub-process is a process in its own right, in that it consists of further sub-processes, there seems to be no point here in looking for the ultimate bottom of the process-hierarchical structure. There seems to be no way of telling whether or not the apparent bottoms (processes that cannot be further subdivided) exist in the physical world or in the limitations of human understanding.

Each button or page is a process or sub-process within an application. Companies’ products grow over time as users specify further sub-processes or request that new processes be enabled. In ordinary software development, this indeterminate bottom of the “process-hierarchical structure” means products are never “done”, since users supply perpetual work.

Because LLMs can handle arbitrary text, they offer a way out of Engelbart’s observation that processes and sub-processes are turtles all the way down. Any portrayal of a process or sub-process is simply more text, and the implementation of a solution’s components is also simply text (assuming a programmatic solution exists at all, since code is just text). If any request or input can be handled with sufficient capability, then the arbitrary depth of complexity behind any task can be assumed to be handled as well, thanks to the newfound “intelligence” found in these models.

Compared to the last few decades of NLP, what LLMs unlock from a developer’s perspective is the capacity to handle a much wider range of human input. Rather than playing along with a phone support bot’s structured questions, a person can phrase their request however they’d like and a system can fulfill it end to end.
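As a contrast to the approval loop sketched earlier, here’s a minimal sketch of that end-to-end shape. The LLM structured-output call is stubbed out, and the intent schema and handler table are hypothetical; free-form text goes in, a structured intent comes out, and the system acts on it directly.

```python
from typing import Callable, Dict

def parse_intent(user_text: str) -> dict:
    """Stand-in for an LLM structured-output call.

    A real system hands the model a schema alongside the free-form text
    and gets JSON back; here the result is faked for one example request.
    """
    # e.g. "it's freezing in here, warm it up a bit" -> a structured intent
    return {"action": "set_temperature", "value": 72}

# Hypothetical handlers keyed by the intent's action field.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "set_temperature": lambda intent: f"thermostat set to {intent['value']}F",
}

def fulfill(user_text: str) -> str:
    """Free-form text in, completed action out, no approval loop."""
    intent = parse_intent(user_text)
    return HANDLERS[intent["action"]](intent)

print(fulfill("it's freezing in here, warm it up a bit"))
```

The hard part, of course, is making the parsing step reliable enough that the human approval step can actually be dropped.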

What the future holds

Today, we sit on the edge of our seats waiting for the next announcement, which might be the last announcement a lab ever needs to make because it’s the unveiling of superintelligence. But what are we waiting for, exactly? Each model is called AGI before launch, then called the step immediately before AGI after launch, even though Google demonstrated convincing conversational AI half a decade ago.

I argue that AGI instead takes the shape of a god of the gaps: a label for whatever “stuff we haven’t figured out yet”. If “AGI” were such a clear concept, it wouldn’t be a tenuous piece of a legal contract (web archive).

With each model improvement, we set a new ceiling that must be reached. Past results end up like 480p: there was a time when it was considered cutting-edge display technology. It makes sense for people to always place the idea a bit beyond whatever we’ve reached, because why wouldn’t you want progress to go further than wherever it is today?

If not AGI, what is the future we’re headed towards? To explain, I’d like to quote a scene from the Mad Men pilot:

[Executives at a cigarette company explain to a marketing agency that recent reports link smoking with cancer, so they can no longer run advertisements linking cigarettes and health, as they and their competitors had before the news]

Don: “This is the greatest advertising opportunity since the invention of cereal. We have six identical companies making six identical products… We can say anything we want.”

Don: “How do you make your cigarettes?”

Lee Junior: “I don’t know.”

Lee Senior: “Shame on you… We breed insect-repellant tobacco seeds, plant ‘em in the North Carolina sunshine. Grow it, cure it, toast it-“

Don: “There you go. There you go. [writes “It’s toasted” on blackboard]”

Lee Junior: “But everybody else’s tobacco is toasted.”

Don: “Everybody else’s tobacco is poisonous. Lucky Strike’s is toasted.”

This describes the state of AI precisely. Everyone making models is using the same architectures, and everyone making AI is using the same models, yet, somehow, nobody’s building the same thing. What truly matters is the story as it relates to the people using the products themselves.

Just because everyone’s computers descend from the same vacuum tubes, it does not mean software on Apple computers isn’t, as Steve Jobs would put it, a way of expressing ideas and principles. Just because LLMs may or may not be non-deterministic random word generators, it does not mean Claude Code cannot augment a developer to be dramatically more proficient in a way that would historically have been considered impossible.

The successful AI story is going to be the one where the narrative around what people become with it outshines the narrative around the AI in and of itself. Those with an aspiring god complex will be the ones offering this market’s snake oil, the product that claims to solve every problem. Those who want to make something useful and worthwhile will offer something that, ideally, would make someone from five years ago (or more recently, if you’re operating in a rapidly evolving ecosystem) react with shock.

When everyone else is telling you how great they are, doesn’t an option that makes you better than the rest of them sound better? In simpler terms, people don’t want an Iron Man automated by JARVIS; they want their own Iron Man suits with their own JARVISes.