Overview
In recent years, large-scale genetic data has become increasingly prominent in medicine. Its widespread use in healthcare and research has rapidly improved the understanding of a wide range of diseases, including cancer, cardiovascular diseases and neurological disorders. However, this growing use has also raised concerns about patient security and digital privacy. These concerns stem from cybersecurity risks such as cyber-attacks, fraud, patient re-identification and file leakage. To address them, policymakers and bioinformatics companies take multiple approaches to strengthening the confidentiality and anonymity of genetic data in healthcare. This is crucial for preventing the misuse of genetic information and genetic discrimination based on ethnicity or other genetic factors, and for protecting individual rights and access to healthcare across population groups.
Understanding biobanks and genetic data
What is a biobank?
In simple terms, a biobank is a large-scale collection of biological samples, such as blood, skin or organ tissue, from which genetic information can be extracted. Biobanks give researchers, companies and health institutions an accessible pool of biological information for purposes such as patient treatment or commercial drug development. The UK Biobank holds one of the largest such collections in the world, containing whole genome sequencing data from 500,000 volunteers (UK Biobank).
Types of biobanks
The categorisation of biobanks has been refined over the past few years. Several studies have proposed the following subgroups of biobanks:1,2
- Population-based biobanks – built on long-term observation of the general population to detect possible diseases and/or monitor their complications. Samples are donated by volunteers
- Disease-oriented biobanks – built from samples collected in hospitals during patient treatment, which makes them more clinically relevant than population-based biobanks
- Leftover tissue biobanks – tissues collected during clinical diagnosis
- Twin biobanks – a collection of biological data obtained from twins
Biobanks can also be categorised in other ways, including by the tissue type collected, purpose, ownership, size of collection or type of donor (gender, age, pregnant women, etc.).1
Role of genetic data in biobanks
In research, the purpose of a biobank is to allow large-scale matching of genotypes to phenotypes at different levels of biological organisation, including individuals, organs and tissues. This helps to build a better understanding of the prevention, diagnosis and treatment of less common diseases.
Privacy risks associated with genetic data
Consequences of data breaches and cybersecurity threats
Personal data recorded in biobanks are at risk of a data breach, which can lead to fraud, unintended use or loss of anonymity.3 Anonymised personal data can be re-identified because of the large amount of supplementary information provided alongside genetic data, which poses a major threat to patient and donor confidentiality.4
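To illustrate why supplementary information makes re-identification possible, the following is a minimal Python sketch that counts how many records share the same combination of non-genetic attributes. It is not taken from the cited studies: the records, the field names (`birth_year`, `sex`, `postcode_prefix`) and the helper functions are all hypothetical, and the "smallest group size" measure used here is the commonly described k-anonymity idea, chosen only as one simple way to show the risk.

```python
from collections import Counter

# Hypothetical, made-up records with no real donor data. The genetic data
# itself is omitted: the point is that the accompanying attributes alone
# can single a person out.
RECORDS = [
    {"birth_year": 1980, "sex": "F", "postcode_prefix": "AB1"},
    {"birth_year": 1980, "sex": "F", "postcode_prefix": "AB1"},
    {"birth_year": 1975, "sex": "M", "postcode_prefix": "CD2"},
    {"birth_year": 1962, "sex": "F", "postcode_prefix": "EF3"},
]

# Assumed set of attributes an attacker could look up elsewhere.
QUASI_IDENTIFIERS = ("birth_year", "sex", "postcode_prefix")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of the given attributes."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Return rows whose attribute combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group; a value of 1 means someone is unique."""
    return min(group_sizes(rows, keys).values())


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS))
    print("unique records:", unique_records(RECORDS))
```

In this toy dataset, two of the four records have a combination of attributes that nobody else shares, so anyone who knows those attributes from another source could link the "anonymised" genetic record back to a person.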
Misuse of genetic information
Genetic information, just like any other information, is vulnerable to exploitation outside healthcare. One example is genetic discrimination, which the National Human Genome Research Institute defines as “the unequal treatment of individuals based on an aspect of their genetic code or genome, including disease risk factors.” This raises ethical concerns, as such discrimination can affect certain population groups by preventing their access to healthcare or diminishing their employability.
Methods of enhancing privacy and security
Technical solutions
Data protection is crucial both when data is used and when it is shared. To improve confidentiality and anonymity, the following approaches can be taken:5
- Cryptographic tools – these use algorithms or other mathematical tools to encode data so that it is difficult to decipher without the correct key. The technique is commonly used for bank details, emails and other data that need to remain confidential
- Access control – this technique prevents unauthorised users from accessing the data, which limits data exposure6
- Data perturbation methods – in simple terms, data perturbation is the addition of “noise” to the data so that individual records are harder to identify, while the overall patterns remain usable for analysis7
- Blockchain technology – this records data in linked blocks that are copied across multiple computers, which prevents centralisation of data and thus improves security and management
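As an illustration of the data perturbation idea in the list above, here is a minimal Python sketch that adds random noise to a count before it is shared, for example the number of donors who carry a particular variant. The choice of Laplace noise (the approach used in differential privacy) is an assumption made for this example rather than a method named in the cited sources, and the function name `perturb_count` and the parameters `epsilon` and `sensitivity` are hypothetical.

```python
import random


def perturb_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace-distributed noise.

    epsilon:     assumed privacy parameter; smaller values add more noise.
    sensitivity: assumed maximum change one donor can cause in the count.
    rng:         a random.Random instance, passed in so results can be seeded.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    true_carriers = 120      # hypothetical number of variant carriers
    for eps in (0.1, 1.0, 10.0):
        noisy = perturb_count(true_carriers, eps, rng)
        print(f"epsilon={eps}: shared count is about {noisy:.1f}")
```

Each shared value differs slightly from the true count, so an outsider cannot be sure whether any one donor is included, yet the average over many releases stays close to the real figure. The trade-off is that stronger protection (a smaller `epsilon`) makes each individual release less accurate.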
Policies and regulations
Multiple policies enforce data security in genomic studies. Here are a few examples:
- Universal Declaration on the Human Genome and Human Rights
- Council of Europe Convention
- The European Union’s General Data Protection Regulation
- Genetic Information Nondiscrimination Act
Such regulations primarily affirm the importance of restricting access to private genetic data and protecting the confidentiality of patients and donors. They emphasise the need to protect individual rights, especially those of minority population groups, by proposing bans on genetic discrimination and inappropriate data usage.8 These rules typically attempt to go beyond pre-existing research policies on data privacy, because genetic data can never truly be anonymised: unlike other information such as age or gender, a genetic sequence is unique to each individual, with the exception of identical twins (National Human Genome Research Institute). Even so, the consensus amongst scientists is that convincing policies on genetic privacy are still lacking when compared with those covering non-biological information.
Organisational practices
Although many bioinformatics companies attempt to implement a range of technical solutions to increase cybersecurity, studies reveal that many applications produced by biotech companies lack cutting-edge cybersecurity algorithms, leaving them vulnerable to file leakage and illegal access.9 Companies that handle large-scale genetic data must therefore make a substantial effort to update and maintain their security algorithms. Another approach is to change how the data is stored, either by changing the file format (e.g. BAM or CRAM) or by moving to an online cloud platform.10 This can also reduce the financial, personnel and time costs of data management, potentially improving its quality and security.
Future directions
Informed consent
Bioinformatics companies and healthcare clinics should inform donors about how their data will be used and protected. Once donors have this information, they must be given the opportunity to make an independent decision about whether to take part in a biobank. This process is known as informed consent.11 Recently, it has become increasingly clear that the traditional approach to consent in general healthcare cannot be applied directly to genetic information, which increases the need to adapt consent procedures to the current context of genetic data collection.12
Improvements in laws and regulation
It is important for companies to regularly improve their cybersecurity to keep up with the ever-increasing use of genetic data. This is important not only for protection but also for traceability and clarity in data management. To further encourage this, policymakers should not only produce stricter regulations but also redefine genetic information as personally identifiable data by law, as this is not necessarily the case under today's legal understanding.13
Conclusion
In summary, there are many legal and technological ambiguities regarding the security of genetic information. Although biobanks have great potential to improve the understanding of the human body through the management of large-scale genetic information, questions remain about whether current genetic data security systems and measures are sophisticated enough to thoroughly protect individual rights. To improve in this respect, bioinformatics companies must consider updating their cybersecurity, while policymakers should propose regulations that accelerate this process and also improve transparency between companies and customers. Meanwhile, as donors or potential patients, members of the general public should increase their understanding of how genetic data protection works in today's information society.
Frequently asked questions
How secure are genetic testing apps?
There has recently been an increase in the use of genetic testing applications among the general population. Although the safety of genetic data varies between companies, it is important to keep in mind the possible privacy risks of sharing your genetic data. Current cybersecurity algorithms are imperfect and laws protecting genetic data are insufficient. In addition, customers tend to skim-read terms and conditions, so they often do not fully understand how companies handle their genetic data. Those policies may also change after the data has been shared. It is therefore arguable that customers also share responsibility for learning the risks of sharing genetic information.
Do I legally own my genetic information?
The legal understanding of ownership of genetic information is vague. From a philosophical perspective, studies suggest that genetic information is unlikely to be an appropriate target for self-ownership.14 Hence, it is questionable whether people can claim to own their genetic information, especially after sharing it with bioinformatics or genetic testing companies.
References
1. Gramatiuk S, Huppertz B. Types of biobanks. In: Sargsyan K, Huppertz B, Gramatiuk S, editors. Biobanks in Low- and Middle-Income Countries: Relevance, Setup and Management [Internet]. Cham: Springer International Publishing; 2022 [cited 2024 Aug 30]. p. 17–20. Available from: https://doi.org/10.1007/978-3-030-87637-1_3
2. Gottweis H, Zatloukal K. Biobank governance: trends and perspectives. Pathobiology [Internet]. 2007 Aug 24 [cited 2024 Aug 30];74(4):206–11. Available from: https://doi.org/10.1159/000104446
3. Akyüz K, Chassang G, Goisauf M, Kozera Ł, Mezinska S, Tzortzatou O, et al. Biobanking and risk assessment: a comprehensive typology of risks for an adaptive risk governance. Life Sci Soc Policy [Internet]. 2021 Dec 13 [cited 2024 Aug 30];17:10. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8666836/
4. Harbord K. Genetic data privacy solutions in the GDPR. Texas A&M Law Review [Internet]. 2019 Jan 1;7(1):269–97. Available from: https://scholarship.law.tamu.edu/lawreview/vol7/iss1/7
5. Wan Z, Hazel JW, Clayton EW, Vorobeychik Y, Kantarcioglu M, Malin BA. Sociotechnical safeguards for genomic data privacy. Nat Rev Genet [Internet]. 2022 Jul [cited 2024 Aug 30];23(7):429–45. Available from: https://www.nature.com/articles/s41576-022-00455-y
6. Erlich Y, Williams JB, Glazer D, Yocum K, Farahany N, Olson M, et al. Redefining genomic privacy: trust and empowerment. PLoS Biol. 2014 Nov;12(11):e1001983.
7. Wilson RL, Rosen PA. Does protecting databases using perturbation techniques impact knowledge discovery? In: Siau K, editor. Advances in Database Research [Internet]. IGI Global; 2005 [cited 2024 Aug 30]. p. 96–107. Available from: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-59140-471-2.ch003
8. Costello RÁ. Genetic data and the right to privacy: towards a relational theory of privacy? Human Rights Law Review [Internet]. 2022 Jan 6 [cited 2024 Aug 30];22(1):ngab031. Available from: https://academic.oup.com/hrlr/article/doi/10.1093/hrlr/ngab031/6497576
9. Tao T, Chen Y, Liu B, Jin X, Yan M, Ji S. Security analysis of bioinformatics web application. In: Yang CN, Peng SL, Jain LC, editors. Security with Intelligent Computing and Big-data Services. Cham: Springer International Publishing; 2020. p. 383–97.
10. Tanjo T, Kawai Y, Tokunaga K, Ogasawara O, Nagasaki M. Practical guide for managing large-scale human genome data in research. J Hum Genet [Internet]. 2021 Jan [cited 2024 Aug 30];66(1):39–52. Available from: https://www.nature.com/articles/s10038-020-00862-1
11. Beauchamp TL. Informed consent: its history, meaning, and present challenges. Cambridge Quarterly of Healthcare Ethics [Internet]. 2011 Oct [cited 2024 Aug 30];20(4):515–23. Available from: https://www.cambridge.org/core/journals/cambridge-quarterly-of-healthcare-ethics/article/abs/informed-consent-its-history-meaning-and-present-challenges/27E8171706F09D53D5702137B3DEA168
12. Rego S, Grove ME, Cho MK, Ormond KE. Informed consent in the genomics era. Cold Spring Harb Perspect Med [Internet]. 2020 Aug [cited 2024 Aug 30];10(8):a036582. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397836/
13. Schumacher GJ, Sawaya S, Nelson D, Hansen AJ. Genetic information insecurity as state of the art. Front Bioeng Biotechnol [Internet]. 2020 Dec 8 [cited 2024 Aug 30];8:591980. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768984/
14. Nielsen MEJ, Kongsholm NCH, Schovsbo J. Property and human genetic information. J Community Genet [Internet]. 2019 Jan [cited 2024 Aug 30];10(1):95–107. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325034/

