Here’s some refreshing news. Instead of burbling endlessly about an imagined (and unknowable) future, a trio of distinguished AI researchers decided to explore something that we do know about; namely, the past.
They built a “vintage” large language model (LLM) called Talkie that was trained exclusively on pre-1931 English texts. So, unlike most LLMs, which are trained on everything that their makers can scrape from the internet, Talkie has a hard cut-off date – 31 December 1930 – in its knowledge base.
What’s the significance of that date? Simply this: everything published before it is in the public domain, even under the warped US legislation crafted over decades by Hollywood to ensure that Mickey Mouse stayed copyrighted for as long as possible.
That places Talkie squarely inside the copyright fight that is now engulfing the AI companies. It is, by design, the thing that the plaintiffs in the current piracy lawsuits are arguing that the big labs should have built – instead of ripping off the intellectual property of millions of authors.
The model was trained on 260 billion tokens (words or fragments of words) of historical pre-1931 English text, including books, newspapers, periodicals, scientific journals, patents and case law. What’s really charming about it is that its conversational interface was built from pre-1931 reference works including etiquette manuals, letter-writing manuals, encyclopedias and poetry collections.
So Talkie’s notion of how to respond to the user is reconstructed from Edwardian and Victorian conventions of correspondence and conduct rather than the conversational guff one gets from 21st-century LLMs. At times, you half-expect it to reply to a prompt saying: “With regard to your inquiry of the 15th…”
It would be tempting, but wrong, to regard Talkie as just a retro curiosity. In fact, it provides a vivid confirmation of Alison Gopnik’s insight that LLMs are not artificial “minds” but just cultural technologies such as writing, printing and libraries.
In other words, they are tools we use in order to access the accumulated knowledge of our species. Talkie isn’t pretending to be a human from the 1930s. It’s just providing a view into the collective written knowledge of the period and, via that, an insight into its culture.
In that sense, it enables us to interact with texts written by people who did not possess the 20/20 vision that hindsight bestows. Ask Talkie what will be the likely effect of the automobile on public morality, for example, and it will dig out this from Blackwood’s magazine: “The automobile has had an unquestionable effect in democratising pleasure-seeking, and enlarging the sphere of popular recreations. It has popularised holiday-making, and added to the number of those who spend their leisure in outdoor amusements. The consequence has been to improve public morality, inasmuch as it has substituted innocent for vicious pleasures, and has set up a healthier standard of enjoyment.”
Talkie is a good example of the value of curiosity-driven research. It’s truly generative in the best sense of the word, in that it prompts people to think, to daydream about having conversations with people in the past.
What would you ask someone with no knowledge of what was to come?
It also enables us to do thought experiments. How good were people in the years before 1931 at predicting things that happened before the end of the period? Did ordinary people in Germany in the 1920s foresee the possibility of a fascist takeover?
Or, as Demis Hassabis, the boss of Google DeepMind, wondered, could a language model trained on data up to 1911 independently discover general relativity, as Einstein did in 1915? And so on.
What Talkie quietly demonstrates is that Gopnik’s framing isn’t just a corrective to the breathless mythology of AI; it’s a guide to what the technology should be allowed to be. A library, after all, doesn’t justify its holdings by claiming they constitute a “mind”. An LLM is just a special kind of library: one with an interactive, infinitely patient, combinatorial index to our accumulated written past.
Talkie also provides a neat, understated riposte to the AI companies’ insistence that their technology cannot work without ingesting the contemporary copyrighted web. It shows that they have been outflanked, intellectually if not commercially, by three researchers working with material from before colour film was commonplace.
The smallness of the team, the modesty of the claim, and the public-domain training corpus together represent a quietly impressive rebuke to the current Gadarene stampede in search of AI supremacy.
So maybe the best slogan for Talkie would be: the shock of the old.
What I’m reading
Out of control
When Decentralisation Fails is a long, long essay by Alex Chalmers on the problem of governing AI.
Small talk
A really interesting blogpost by economist John Quiggin is What If We Just Stopped Hyperscaling?
Critical mass
The Life and Death of the Book Review is a sad tale of literary decline by David A Bell.