The rise in voluntary DNA sequencing has raised major concerns about genetic privacy. But how do we protect our genetic code from those who want to exploit it?
From identifying long-hidden criminals to predicting medical risks, genetic sequencing has transformed many aspects of modern life.
More and more people are voluntarily sequencing their DNA and submitting it to genetic databases to aid scientific research and chart their ancestry. As these databases grow, so do concerns around privacy – how do we protect our genetic code from those who would seek to exploit it?
Many of us are careful about who we give our personal information to, including contact details and especially banking details. Yet millions of people voluntarily reveal their DNA to private companies, making it vulnerable to hacks and nefarious practices. Should we be worried?
What Is Genetic Privacy?
Genetic data relates to an individual’s DNA and can reveal specific gene variants, such as the BRCA mutations associated with breast and ovarian cancer and the PSEN genes linked with early-onset Alzheimer’s disease. By comparing two people’s DNA, scientists can also determine familial associations, such as parentage and ancestry.
DNA can be obtained from a simple saliva sample or cheek swab – the method often used in the at-home kits sent out by genealogy tracing companies. It may be collected for personal interest, such as genealogy, or as part of scientific research. It may also be collected for medical analysis or to identify (or exonerate) suspects in criminal proceedings.
As with any personal data, genetic privacy is important. Because DNA is considered unique to the individual (with the exception of identical twins), once an individual’s DNA sequence is stored in a personally identifiable database, anyone who later obtains a DNA sample may be able to identify it as belonging to that person and uncover personal information such as the individual’s propensity to disease.
Why Is Genetic Privacy an Increasing Concern?
As more for-profit organisations build databases of genetic information – often with the individual’s name attached to their record – it becomes increasingly important to ensure that such data is used in accordance with the individual’s wishes.
In November 2018, Harvard University’s Science in the News reported on the different privacy policies in place at these organisations. While the more reputable companies store data in an anonymised and strictly controlled database, others are publishing DNA complete with the person’s name on publicly searchable websites.
The consequences go far beyond reuniting long-lost siblings. Police are using newly compiled DNA databases to identify decades-old genetic samples from cold cases. In 2018 this led to the arrest of the Golden State Killer, who was identified after 40 years when a distant relative’s DNA yielded a partial match against evidence held on file since the 1970s.
Genetic Data and GDPR
With more genetic data being used in health profiling, the UK has established the Genomic Medicine Service. This NHS-led initiative sets out to sequence half a million whole genomes by 2023-24, with the aim of accelerating the diagnosis of rare diseases and matching people more closely to the best medication for their condition. The initiative was complicated by the introduction of the EU General Data Protection Regulation (GDPR), which requires personal data to be protected and, in some instances, erased upon request by the individual.
The University of Cambridge PHG Foundation published a series of reports on this issue, arguing that exceptions may apply for ‘pseudonymised’ data where the originating individual cannot be identified, and for familial clinical and genetic data. Where the GDPR is deemed not to apply, individuals’ whole or partial genomes, or partial DNA matches from other family members, could be stored in databases indefinitely, with no right to request their deletion.
Why Is Genetic Privacy Important?
On the one hand, the ability to identify historic criminals from present-day DNA databases clearly benefits the public. However, there is the distinct possibility that this information could be misused. Concerns include potential blackmail for a past crime, the risk of identifying individuals who have legitimately changed their identity (for instance, by entering witness protection), and certain forms of discrimination (on account of having a predisposition to disease, for example).
Work is under way to raise the profile of genetic privacy. Companies like CricaGene are exploring the use of homomorphic encryption, which allows computation on data while it remains encrypted, to obscure medically sensitive data. In November 2020, Yale University unveiled a new system for sharing information between legitimate researchers without compromising individual privacy. ‘Genetic information is the most fundamental information of all’, says Mark Gerstein, senior author of the Yale research paper. ‘Once a genome is in a database, you are stuck – and so are your children and grandchildren.’
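To make the idea of homomorphic encryption concrete, here is a minimal sketch in Python. It uses the Paillier scheme, a well-known additively homomorphic cryptosystem chosen purely for illustration; it is not a description of any company's system. The primes are toy values, and the `carrier_flags` data is hypothetical.

```python
import math
import secrets

# Toy parameters (assumption): real deployments use primes of 1024+ bits.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                             # standard simple generator choice
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                               # modular inverse (Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N under the public key (N, G)."""
    if not 0 <= message < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1              # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    return ((u - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


# Hypothetical data: 1 = participant carries a given risk variant, 0 = does not.
carrier_flags = [1, 0, 1, 1]
encrypted_flags = [encrypt(flag) for flag in carrier_flags]

# An analyst can total the flags without ever seeing an individual value.
encrypted_total = encrypt(0)
for ciphertext in encrypted_flags:
    encrypted_total = add_encrypted(encrypted_total, ciphertext)

print(decrypt(encrypted_total))
```

Only the holder of the private values can read the total, while the party doing the counting sees ciphertexts alone.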
Most personal data can be changed if it is subject to a privacy breach; passwords can be updated and protected via two-factor authentication, bank accounts can be closed, and credit card numbers changed. DNA, on the other hand, is a unique, lifelong fingerprint. Once it is publicly disclosed, it cannot be put ‘back in the bottle’.
Nor is the risk limited to individuals. There are concerns that hostile foreign powers could use genetic data maliciously, perhaps by tailoring biological weapons to a given population. Even a domestic government could use DNA data for nefarious ends: to identify and persecute members of specific groups, for instance.
Challenges to Genetic Privacy
Absolute genetic privacy might not be in the public interest. The ability to identify cold case serial killers, or to carry out large-scale genetic analysis for the advancement of medicines and disease prevention, may outweigh the desire for individual privacy under a democratic government.
There is also a grey area between an individual’s genetic information, the rights of their family members, and other medical data. The US National Institutes of Health (NIH) have examined this issue in depth.
David Korn of the Association of American Medical Colleges told the NIH: ‘There is no feasible operational way that you can carve genetic information out of the medical record for purposes of rational legislative or regulatory oversight. You just cannot do it.’
Finally, there is the risk of secure databases being compromised by a zero-day vulnerability, an attack from within by a malicious employee, or a stolen or misplaced physical storage device. Despite an organisation’s best efforts, there is a chance that, as with any stored personal data, the information might fall into the wrong hands.
How Do We Protect Genetic Privacy?
Legislation like GDPR is one way to protect individuals’ personal data. However, it is important to be clear about what is protected. The ability to retroactively identify individuals from anonymised DNA records means that simply pseudonymising databases may not be enough, especially as DNA cannot be changed in the event of a breach.
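The limitation described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any database's actual scheme: the record layout, the `SECRET_KEY` and the short genotype strings are all hypothetical. It replaces each name with a keyed hash, then shows that a matching sample obtained elsewhere still singles out the record.

```python
import hashlib
import hmac

# Hypothetical key held only by the organisation that controls the data.
SECRET_KEY = b"hypothetical-key-held-by-the-data-controller"


def pseudonymise(record: dict) -> dict:
    """Replace the name with a keyed hash; the genotype is left untouched."""
    token = hmac.new(
        SECRET_KEY, record["name"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return {"pseudonym": token[:16], "genotype": record["genotype"]}


def reidentify(sample_genotype: str, released_records: list) -> list:
    """Return the pseudonyms whose genotype exactly matches an outside sample."""
    return [
        record["pseudonym"]
        for record in released_records
        if record["genotype"] == sample_genotype
    ]


# Hypothetical records; real genotypes are vastly longer.
records = [
    {"name": "Alice Example", "genotype": "AGCTTAGGCA"},
    {"name": "Bob Example", "genotype": "AGCTCAGGTA"},
]
released = [pseudonymise(record) for record in records]

# Someone holding Alice's DNA from another source can still find her record.
matches = reidentify("AGCTTAGGCA", released)
print(matches)
```

The name is gone from the released data, but the genotype itself acts as the identifier, which is why hiding the name alone offers limited protection.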
The Yale University study, published in the journal Cell, called for ‘principled privacy-utility trade-offs’ to prevent individuals from being identified using genome data shared for research purposes. In initial testing, the researchers found individuals could be identified from saliva left on a coffee cup with an accuracy of 60% and for as little as £15.
They proposed a ‘Read Sanitization Protocol’ that substitutes the parts of the genome known to be personally identifiable with the equivalent string from the reference genome, while storing the differences in a separate, secure file. If the substituted data proves important for an analysis, users may be granted permission, on a limited basis, to access the restored, individually identifiable genomes.
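The substitution idea can be illustrated with a simplified, string-level sketch in Python. This is not the Yale team's implementation, which works on sequencing reads rather than whole strings; the function names, the `sensitive_regions` intervals and the short sequences are assumptions made for illustration.

```python
def sanitise(genome: str, reference: str, sensitive_regions: list):
    """Mask identifying variants with reference bases.

    sensitive_regions is a hypothetical list of (start, end) half-open
    intervals marking positions considered personally identifiable.
    Returns the shareable sequence and a private position-to-base diff.
    """
    if len(genome) != len(reference):
        raise ValueError("genome and reference must be the same length")
    public = list(genome)
    private_diff = {}
    for start, end in sensitive_regions:
        for i in range(start, end):
            if genome[i] != reference[i]:
                private_diff[i] = genome[i]   # kept in the secure file
                public[i] = reference[i]      # shared copy shows the reference base
    return "".join(public), private_diff


def restore(public: str, private_diff: dict) -> str:
    """Rebuild the original sequence for a user granted access to the diff."""
    sequence = list(public)
    for i, base in private_diff.items():
        sequence[i] = base
    return "".join(sequence)


# Hypothetical toy sequences.
reference = "ACGTACGTAC"
genome = "ACGAACGTTC"     # differs from the reference at positions 3 and 8

public, private_diff = sanitise(genome, reference, [(2, 5)])
print(public)
print(private_diff)
print(restore(public, private_diff) == genome)
```

In this toy run the variant inside the sensitive interval is masked and recorded in the private diff, while the variant outside it is shared unchanged. Where to draw those intervals is the privacy-utility decision the researchers describe.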
Ultimately it may not be possible to reconcile the public interest in using genetic data with total privacy for the individuals whose genomes are stored. It seems more likely that trade-offs like the one recommended by Yale will be applied more widely in the future, prompting questions over who decides on the location of the privacy-utility dividing line and how to determine who is granted access to complete, identifiable genome records.
For a deeper discussion on genetic privacy, listen to episodes 9 and 11 of our tech and innovation podcast, ‘What Comes Next?’.