Sunday 26 April 2026

Arms race to exploit personal data exposed by Biobank breach

Ethics researchers have warned that while data leaks and hacks are taken seriously, the threat of data misuse by ‘rogue’ researchers is flying under the radar

UK Biobank’s data breach is part of a global arms race between research bodies and criminals seeking to exploit personal data

Last week’s revelation that “rogue” researchers had tried to sell UK Biobank data came after 18 months in which the charity has been fighting to keep secure the biological data it holds on more than 500,000 volunteers. After receiving an anonymous tip-off, UK Biobank discovered that researchers linked to three Chinese academic institutions, who had previously been vetted by the charity, had created listings on e-commerce websites owned by Alibaba.

It banned the three institutions and asked the government for help. Diplomats liaised with the Chinese government and Alibaba, which quickly took down the listings.

Although the data contained no names, addresses or dates of birth, there is a risk that anonymised data can be pieced together to identify people. Naomi Allen, UK Biobank’s chief scientist, said they had been “assured” that the data had not been “sold to third parties”.

The UK Biobank incident was just one of several data security issues that emerged last week. Hackers stole 19m records from the French agency that manages driving licences, passports and ID cards, while Booking.com and ADT, the home security firm, were also hacked. The UK was hit by 8.5m cybercrimes last year.

Data breaches became a headache for UK Biobank in 2024, after academic journals began requiring researchers to publish computer code they had used to analyse large medical datasets. Sometimes that code, usually published on code-sharing platform GitHub, has included raw data from the charity, according to Allen.

“We’ve got a machine-learning algorithm that does a daily trawl of all open-source repositories and we check that it doesn’t include any data,” she said. “When we do find it, we get the researcher to take it down immediately. Or if we can’t find the researcher, GitHub takes it down for us. That has been really successful.”

Ethics researchers have warned that the type of threat posed by rogue researchers – data misuse rather than data leaks or hacks – is not taken seriously enough by biobanks. Their warning was based on a 2021 analysis of BBMRI-ERIC, an association of more than 550 European biobanks.

‘It’s a balance between advancing science and ensuring the security of data’

Naomi Allen, UK Biobank

UK Biobank was one of the first attempts to gather large amounts of medical data about individuals so that researchers could make links between different biological processes in the body. Recruitment of people aged 40 to 69 began in 2003; volunteers agreed to have blood tests and body and brain scans, and to answer pages of questions about their health and lifestyle. The details of their answers are so intimate that married couples who signed up are treated separately, said Claudine Henderson, who runs one of UK Biobank’s imaging centres in Newcastle.

It has been enormously successful in enabling scientific advances: the 22,000 researchers who have accessed its data have produced 18,000 papers. Doctors now analyse heart scans in seconds with AI developed using UK Biobank data, and NHS clinics can diagnose dementia in minutes.

Yet UK Biobank’s size, longevity and success are part of the reason it is more vulnerable. When it began making data available to researchers in 2012, it let them download datasets. Now the database is nearly 40 petabytes (about 40m gigabytes) – it would take more than 10 years to download over a high-speed domestic connection.
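
The download estimate can be checked with some rough arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes) and a sustained 1 gigabit-per-second connection; that speed is an assumption for illustration, not a figure given by UK Biobank.

```python
# Rough check of the "more than 10 years" download estimate.
# Assumptions (not from the article): decimal units and a sustained
# 1 Gbit/s link with no protocol overhead or interruptions.
PETABYTE_BYTES = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def download_years(size_petabytes: float, link_gbit_per_s: float) -> float:
    """Return the years needed to transfer the data at a constant link speed."""
    bits = size_petabytes * PETABYTE_BYTES * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / SECONDS_PER_YEAR


years = download_years(40, 1.0)
print(f"{years:.1f} years")  # prints "10.1 years"
```

A slower connection lengthens the estimate in proportion: at half that speed the same transfer would take about 20 years.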

Most of the large health data repositories that have been launched since then, such as Finland’s FinnGen, Germany’s Nako, and All of Us and the Million Veteran Program in the US, have put data in the cloud, and forced researchers to do their analysis there instead.

UK Biobank began to transfer to a similar system in 2021, but the security measure was highly unpopular with researchers as it was more costly. Timothy Raben, a geneticist, said in 2024 that some groups would have to give up research because it had become “cost prohibitive”. China’s Kadoorie Biobank still appears to allow researchers to download data and biotech firms are increasingly turning to it and other Chinese data sources, industry sources say.

“The most secure data set is one that’s locked away and is never used,” Allen said. “We want to make rapid progress into the causes and treatments of disease. It’s a balance between making the data available and advancing science versus ensuring the security of the data.”

There is a risk in sharing data, but also in not sharing it, she said.

“Scientific progress will not be made if you don’t have the global collaborative community working on these data to make those discoveries,” Allen said. “And I think that trade-off is difficult to get right, because the technology is moving all the time.”
