Reasoning Models are Stored Program Computers

April 5, 2026

This blog post from François Chollet has influenced the way that I think about LLMs more than almost anything else. But lately, I’m starting to think it was missing something subtle that wasn’t obvious at the time. Today I’m going to explain why, and also take us on a brief journey through some very old computers.

The Post

Chollet states in his post that just as Word2Vec allowed us to embed the semantics of words geometrically, LLMs embed “vector programs” that can actually do things. This is why you can say “translate this:” or “translate this sentence:” to an LLM and get similar results either way. Semantically those sentences are similar, so they should land in a similar area of the action space. LLMs took the rich understanding of NLP methods and gave it a sense of forward motion.

Chollet goes on to say that a good way to think of an LLM is as a program database, where your prompts are querying the database and the generations are the results of running those programs.
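To make the “program database” picture concrete, here is a minimal sketch of my reading of it. Everything in it is an assumption for illustration: the 3-d vectors are hand-picked rather than taken from any model, and `PROMPT_VECTORS`, `PROGRAMS`, `query`, and `run` are hypothetical names. The only point is that two similar prompts retrieve the same stored program.

```python
import math

# Hypothetical toy embeddings: hand-picked 3-d vectors, not from any real model.
PROMPT_VECTORS = {
    "translate this:": (0.9, 0.1, 0.0),
    "translate this sentence:": (0.85, 0.15, 0.05),
    "summarize this:": (0.1, 0.9, 0.0),
}

# Hypothetical "program database": each entry is (key vector, program).
PROGRAMS = {
    "translate": ((1.0, 0.0, 0.0), lambda text: f"<translation of {text!r}>"),
    "summarize": ((0.0, 1.0, 0.0), lambda text: f"<summary of {text!r}>"),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query(prompt):
    """Return the name of the stored program whose key is nearest the prompt."""
    vec = PROMPT_VECTORS[prompt]
    return max(PROGRAMS, key=lambda name: cosine(vec, PROGRAMS[name][0]))


def run(prompt, text):
    """Query the database with the prompt, then run the retrieved program."""
    return PROGRAMS[query(prompt)][1](text)
```

Here `query("translate this:")` and `query("translate this sentence:")` both pick `"translate"`, because the two prompt vectors point in nearly the same direction. A real model does nothing this tidy, but the lookup-then-execute shape is the part of the analogy I care about.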

Reading this post a few years later, with the advancements that reasoning models like o1 and R1 have brought us, I find myself quibbling with his use of the word “program” here. Did GPT-3 have programs stored in its weights? My gut says no. I think this was too generous a word for where the technology was at the time.

A brief (probably mostly correct) history of computers

From my perspective, there are three key eras that computers/programs have gone through:

Fixed program computers. The mechanical calculators and accounting machines of old that were state of the art up until World War Two. They could do computation, but these computations were fixed into the hardware when they were manufactured. If you need new programs, buy a new calculator.
Programmable computers. The rush of wartime computing during WW2 (code-breaking in Britain, artillery firing tables in the US) leads to a wave of innovation that gives us the ENIAC. Using a system of cables and switches, you could reroute the flow of computation between runs, allowing you to test out new programs every day if you wanted.
Stored program computers. This is the logical consequence of the Universal Turing Machine, made concrete with the von Neumann architecture. We quickly get computers that can store programs in memory! The EDVAC design gives us the first written description of this, and since then there’s no separation between the place where you store the data and the place where you store the programs.
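The three eras above can be sketched as toy code. This is a deliberately tiny illustration, not a model of any real machine: the unit names, the four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), and the memory layout are all made up for the example.

```python
# Era 1: fixed program. The computation is baked in when the machine is built.
def fixed_adder(a, b):
    return a + b


# Era 2: programmable. Primitive units exist in hardware; the "wiring" picks
# their order and has to be re-plugged by hand between runs.
UNITS = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
    "square": lambda x: x * x,
}


def run_patched(wiring, value):
    """Push a value through the units in the order the wiring lists them."""
    for unit_name in wiring:
        value = UNITS[unit_name](value)
    return value


# Era 3: stored program. Instructions and data share one memory, so a program
# is just more data that can be loaded or overwritten.
def run_stored(memory):
    """Run a hypothetical accumulator machine until it reaches HALT."""
    pc, acc = 0, 0  # program counter and accumulator
    while True:
        op, arg = memory[pc]
        pc += 1
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown instruction: {op}")


# Cells 0-3 hold the program, cells 4-6 hold data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
first_result = run_stored(memory)  # computes memory[4] + memory[5]

# Swapping in a new program is an ordinary memory write, no rewiring needed.
memory[0:4] = [("LOAD", 4), ("ADD", 4), ("STORE", 6), ("HALT", 0)]
second_result = run_stored(memory)  # computes memory[4] + memory[4]
```

The difference between eras two and three shows up in the last few lines: `run_patched` needs a human to hand it a new wiring, while the stored program machine gets a new program through the same write operation it uses for data.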

Nowadays we take the concept of the stored program computer for granted. It’s natural to assume that when you get a new computer it both has programs in it, and can be trivially modified with new programs. We’ve all since decided that programming computers with physical wires is far too tedious (except for a few modular synth enthusiasts who do this for fun).

Early LLMs were programmable, but did not store programs

This is a delicate analogy, but I believe that the GPT-3 era LLMs that Chollet was referencing were not truly a “database of programs.” They struggled enormously if you tried to get them to do anything that required more than one logical step. The main things they could do were fuzzy recall of facts from their memory and one-shotting a set of cognitive tasks that showed up a lot in the training data. For example, they were great at translation, summarization, and simple programming tasks.

What this looks like to me is that the LLMs of that time had a large library of (comparatively complex) operations in them. These were the cognitive equivalent of a single arithmetic step on the ENIAC. The prompts we were writing at that time were wires between these primitive operations. You could patch together some simple operations, but the programs themselves were not in the weights.

Reasoning models store chains of action

An interesting fact about the RL process used to train reasoning models is that it essentially never changes the token distribution to shift probability to a very unlikely token. The explore/exploit paradigm at the root of these training loops means the model rarely tries a path that wasn’t at least in the top 5-10 or so potential candidates at each step. So you’re not getting new behavior, you’re getting new combinations of behavior.
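A quick back-of-the-envelope sketch of why sampling stays near the top of the distribution. The logits here are invented (a 1000-token vocabulary whose logits fall off by 0.5 per rank) and are not measured from any model; real distributions vary a lot from step to step. The sketch only shows how little probability mass is left outside the top 10 once a distribution is peaked, which is the mass that on-policy sampling would need to land in before RL could reinforce it.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mass_outside_top_k(probs, k):
    """Total probability of every token that is not among the k most likely."""
    ranked = sorted(probs, reverse=True)
    return sum(ranked[k:])


# Hypothetical logits: the token at rank r gets logit -0.5 * r.
logits = [-0.5 * rank for rank in range(1000)]
probs = softmax(logits)

tail = mass_outside_top_k(probs, 10)
print(f"probability of sampling outside the top 10: {tail:.4f}")
```

With these made-up numbers the tail comes out to under one percent per step. Flatter distributions would explore more, so treat this as an intuition pump rather than a measurement.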

This is the actual descendant of the stored program computer. Through gazillions of training steps, the LLM has learned to combine long chains of operations into coherent sequences. It implicitly stores these sequences in its weights, and can recall them when prompted. This means less time for us fiddling with wires, because the programs are in the computer.

Something to chew on: Meta-cognitive Loops, Society of Thought, Online Learning

If you believe the reasoning model is a stored program computer, then some questions naturally follow. What programs are stored in there? What is their nature and how are they structured? How can we make them better?

In the DeepSeek R1 paper (Supplementary C.2) they show that the model autonomously starts to upweight tokens like “wait”, “mistake”, “however”, and “but.” These are useful because they tend to trigger self-reflection flows where the model can expend compute checking its work. These phrases have taken on the name Meta-cognitive Loops in the research literature and are being actively investigated. This collection of keywords echoes the control flow (if/else/try/except) in programming languages. They route you through certain paths, and are invariant to the specific task/data you’re working with.
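Here is the control flow analogy written out as actual control flow. To be clear, this is not how R1 or any reasoning model is implemented; `solve_with_reflection`, `REFLECT`, and the list of candidate answers are hypothetical stand-ins. It just shows a “wait” step acting as a task-agnostic branch: the same loop wraps an arithmetic task and a string task without changing.

```python
REFLECT = "wait, that looks like a mistake"


def solve_with_reflection(candidates, check, max_attempts=5):
    """Try candidates in order; a failed check emits a reflection step.

    `candidates` stands in for whatever the model proposes next, and `check`
    stands in for the model verifying its own work. Returns the first answer
    that passes (or None) together with the trace of steps taken.
    """
    trace = []
    for candidate in candidates[:max_attempts]:
        trace.append(f"answer: {candidate}")
        if check(candidate):
            return candidate, trace
        # The "else" branch: the reflection token routes us back for a retry.
        trace.append(REFLECT)
    return None, trace


# The same loop handles two unrelated tasks.
answer, trace = solve_with_reflection([12, 15, 14], lambda x: 2 * x == 28)
word, word_trace = solve_with_reflection(
    ["cat", "tac", "act"], lambda s: s == "act"
)
```

In both runs the trace contains two reflection steps before the accepted answer. The loop itself never looks at what the task is, which is the sense in which these keywords behave like if/else rather than like task knowledge.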

The Society of Thought paper shows how reasoning models will set up distinct personas in their chain of thought reasoning to check themselves. The tension between these personas seems to improve the quality of responses. The analogies become stretched here, but this feels to me like it’s more at the framework or design pattern level. Language models are finding that modular design and separation of responsibility are a good way to structure vector programs.

More broadly, there is a tension between the framing of a stored program computer and the reality that LLM weights can only be updated at train time. Prompts become a rich scripting language over a powerful standard library, but we still don’t have write access to the memory. In other words, the programs are stored in memory but doing the actual write is extremely complex. Fixing this would seemingly complete the stored program analogy, and much of the field is pushing towards Online Learning to do exactly that.