
Friday, 7 November 2025

Would you pay an AI to read your book? Authors may soon have no choice

A ‘shadow library’ of 500,000 books has been used to teach large language models. That could become the norm

In 1978, the American scientist and futurist Roy Amara formulated what became known as Amara's law: we tend to overestimate the effect of a technology in the short run and underestimate its effect in the long run. We've seen this repeatedly. Photography, for example, was supposed to kill painting (news to Picasso). And nobody imagined when Twitter launched in 2006 that, nearly two decades later, a few tweets would bring down a large US bank in a day.

We’re now in the early phase of Amara’s law with large language models (LLMs) and their associated chatbots. Current discourse focuses overwhelmingly on their short-term implications: students cheating with ChatGPT-written essays, journalists being laid off, torrents of misinformation and supposed “AI slop” coursing through our media ecosystem, job losses, ethical deficiencies, misuse and the astonishing growth of user dependence on chatbots for therapeutic purposes – to list just a handful.


Then there’s the wholesale theft of intellectual property involved in training these models. This, at least, was obvious from the start, and is now slowly being tackled. For example, Anthropic, the company behind the Claude chatbot, recently lost a key court case and has agreed to pay $1.5bn to authors for illegally copying 500,000 books from a “shadow library” containing digital versions of their works – all collected by renegade librarians dreaming of universal book access.

Which is good news for this columnist because the search engine provided by the lawyers revealed that one of my books appeared in this dodgy collection, and I look forward to discussing with the publisher my share of the $3,000 compensation.

Just for the record, though, Anthropic claims that it didn’t actually use this library for training (it had just downloaded it), and was penalised simply for possessing unauthorised copies.

Kevin Kelly, Wired magazine’s founding executive editor – and a prolific author – also checked the database and found that four of his five published books were there. Unlike most authors, however, he felt “honoured to be included in a group of books that can train AIs that I now use every day” and “flattered that my ideas might be able to reach millions of people through the chain of thought of LLMs”. He seemed disappointed that Anthropic didn’t actually use the shadow library, so those fine ideas may remain trapped between the hard covers of his books.

Why? Kelly thinks that authors have got the wrong idea. “They believe,” he writes, “that AI companies should pay them for training AIs on their books. But I predict in a very short while, authors will be paying AI companies to ensure that their books are included in the education and training of AIs. The authors (and their publishers) will pay in order to have influence on the answers and services the AIs provide. If your work is not known and appreciated by the AIs, it will be essentially unknown.”

If that reminds you of the European court of justice’s famous 2014 “right to be forgotten” ruling, then join the club. That judgment empowered European citizens to petition for the removal of embarrassing online information about them from search engine results.

It wasn’t truly a right to be forgotten, merely a right not to be found by Google – and in that sense it was an implicit acknowledgment of the search engine’s power. If Google didn’t find you, then you didn’t exist.

Kelly’s thesis likewise rests on an implicit long-term vision of the kind of power and authority these machines might eventually come to wield. He clearly also understands that LLMs should not be viewed primarily as intelligent agents but as what the psychologist Alison Gopnik calls a new kind of “cultural technology” that enables humans to take advantage of information other humans have accumulated. Such as libraries, printing and books, in other words.

Viewed through this analogical lens, what author wouldn’t want her book and ideas to be in print or in libraries – or in LLMs? Perhaps that’s the deeper lesson of Amara’s law: just as the printing press reshaped knowledge and authority over centuries, LLMs may slowly reshape how we produce, share and even think about knowledge itself.

And writers seeking an audience may need to accept LLMs alongside humans as their “readers”.

What I’m reading

Class of its own

Britain’s Elite Needs a History Lesson is an interesting essay by Alastair Benn, prompted by an interview with Nigel Farage.

Think piece

Anthony Gottlieb has written a nice piece based on his new biography of the philosopher Ludwig Wittgenstein. It’s called Wittgenstein and Philosophy as ‘Neverending Therapy’.

Out of this world

Ed Simon’s Close Reading Carl Sagan is a lovely essay about the great science communicator.
