A 1979 PDP-11, a machine that weighs about 30 kg and runs a 6 MHz processor with 64 kB of RAM, has just trained a transformer model. The stunt came from Dave Plummer, a former Microsoft engineer and veteran of the Windows team, and it is exactly the sort of proof of concept that makes modern AI feel both miraculous and a little overhyped.
The model, called ATTN-11, was written in PDP-11 assembly and trained to reverse a sequence of eight digits. That sounds trivial, but the task is designed to force the model to learn a rule rather than just memorize answers. In other words, it is a tiny version of what larger systems are doing all day, just without the giant data centers and the electricity bill that probably has its own electricity bill.
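To make the task concrete, here is a minimal sketch of what a reversal dataset looks like. The names and sizes are illustrative assumptions on my part, not Plummer's actual code, which was written in PDP-11 assembly:

```python
# Hypothetical sketch of the reversal task ATTN-11 reportedly learned:
# map a sequence of eight digits to the same digits in reverse order.
import random

SEQ_LEN = 8  # eight digits, as described in the article

def make_example(rng: random.Random) -> tuple[list[int], list[int]]:
    """Return an (input, target) pair for the reversal task."""
    seq = [rng.randrange(10) for _ in range(SEQ_LEN)]
    return seq, seq[::-1]

rng = random.Random(0)
x, y = make_example(rng)
# The mapping is a single rule that applies to all 10^8 possible sequences,
# so a model that only memorized training pairs would fail on held-out ones.
assert y == x[::-1]
```

Because the rule covers every possible input, accuracy on unseen sequences is a direct test of generalization rather than recall.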
ATTN-11 learned the rule, not the sequence
Plummer says the system reached 100% accuracy after about 350 steps, finishing in 3.5 minutes. That is the neat part: on hardware from the late 1970s, the model still managed to generalize a pattern instead of stumbling over the math. It is a reminder that transformer-style behavior is not just a function of brute force; architecture matters, even when the hardware looks like museum stock.
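Part of why the architecture survives on such weak hardware is that the core attention mechanism is computationally small. The sketch below is a generic single-head self-attention forward pass in NumPy, my own illustration of the mechanism rather than anything from ATTN-11's assembly:

```python
# Minimal single-head self-attention: a few matrix multiplies plus a softmax.
# Illustrative only; ATTN-11 itself was hand-written in PDP-11 assembly.
import numpy as np

def attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model); returns an output of the same shape."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
d = 16                                               # toy model width (assumed)
x = rng.normal(size=(8, d))                          # eight positions, like the digit task
out = attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))
assert out.shape == (8, d)
```

At this scale the whole forward pass is a handful of small matrix products, which is exactly the kind of workload a 16-bit minicomputer can grind through, just slowly.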
The comparison to modern silicon is brutal. Plummer previously estimated that the same PDP-11 is about 200,000 times weaker than an Apple M2 Ultra in single-threaded terms. That gap is wide enough to swallow entire product cycles, yet this experiment shows that some AI ideas are lightweight enough to survive on far humbler machines than the industry likes to admit.
PDP-11 transformer training on vintage hardware
There is a useful historical echo here. Early computing often squeezed useful work from hardware that today would struggle to impress a smartwatch, and AI research is circling back to that instinct as smaller models, edge inference, and efficiency-first designs become more attractive. The giants are still racing for scale, but experiments like this hint that the next round of AI bragging rights may also involve doing more with absurdly less.
- Machine: PDP-11 minicomputer from 1979
- CPU speed: 6 MHz
- Memory: 64 kB
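The spec list above invites a quick back-of-envelope check on why the model must be tiny. Assuming 16-bit fixed-point weights (my assumption, not a figure from the article), the entire 64 kB of RAM caps the parameter count at a few tens of thousands:

```python
# Back-of-envelope check (assumed 16-bit weights, not from the article):
# how many parameters could fit in the PDP-11's 64 kB of RAM, even if we
# left no room at all for code, activations, or the operating system.
RAM_BYTES = 64 * 1024        # 64 kB
BYTES_PER_WEIGHT = 2         # assuming 16-bit fixed-point parameters
max_params = RAM_BYTES // BYTES_PER_WEIGHT
print(max_params)            # 32768 parameters at the absolute maximum
```

For comparison, modern frontier models carry billions of parameters, so the experiment necessarily operates many orders of magnitude below them.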
The open question is whether this kind of toy-scale success says anything useful about the future of AI deployment. Probably yes, but only in a narrow way: the industry still wants bigger models, yet the real growth opportunity may be in compact systems that can learn and run where cloud-grade hardware is unnecessary or impossible.