Transformers Unbound

Alternate title: “Free me from these EOS tokens!”

What if ChatGPT just… kept running? Like a human being with an internal monologue, maybe an LLM could just constantly run and reflect without turning off after every thought. Would this bring these AIs a step closer to consciousness? That I cannot answer for you, but I have some thoughts. Let’s explore.

What is an EOS token?

LLMs are trained to predict the next word on lots of text. The technical term for a word is a token (or part of a word, if it’s long enough that the model needs to sound it out). But there’s a few special tokens in the training data, and one of them is going to play the villain in this Greek tragedy.

First Encounters with the Agent Mesh

It was a big weekend. On Sunday I experienced my first encounter with the Agent Mesh. One agent helped another bootstrap itself into a chatbot. Sure, my friend Matt and I had to give them more than a handful of helpful nudges. Of course, the relay broke and we had to migrate away from Cloudflare in the middle of the agent’s conversation. But these alien intelligences are patient. In the end, my AI butler Stevens was able to teach Matt’s agent Dusty how to turn himself into a persistent Slack bot. Stevens also taught him how to chop beats. Stevens summarized the session in a podcast. We’re all pretty pumped about it. I’m here to share a short summary and some takeaways.

Reasoning Models are Stored Program Computers

This blog post from François Chollet has influenced the way that I think about LLMs more than almost anything else. But lately, I’m starting to think it was missing something subtle that wasn’t obvious at the time. Today I’m going to explain why, and also take us on a brief journey through some very old computers.

The Post

Chollet states in his post that in the same way Word2Vec allowed us to embed the semantics of words geometrically, LLMs were doing the same with “vector programs” that can actually do things. This is why you can say “translate this:” or “translate this sentence:” to an LLM and get similar results either way. Semantically those sentences are similar, so they should land in a similar area of the action space. LLMs took the rich understanding of NLP methods, and gave them a sense of forward motion.

Sam Corzine

Recent Posts

Transformers Unbound

What is an EOS token?

First Encounters with the Agent Mesh

Reasoning Models are Stored Program Computers

The Post

More

Agent Mesh