Author Archives: genomicsincontext

How a community of UK scientists came together to break the COVID code and brought about a revolution in genomic surveillance

Dr Lara V Marks, managing editor of

Overview of COG-UK participants. Credit: Darren Smith Presentation to COG-UK Together (1:30 mins). From:

The full implications and toll of COVID-19 will only become clear many years from now. But one of the more striking aspects of the pandemic was the unprecedented degree and scale with which scientists and clinicians collaborated to address the emergency. Nowhere was this more apparent than in genomic surveillance. Employed to monitor pathogens and analyse their genetic similarities, genomic surveillance was key to tracking the evolution and transmission pathways of SARS-CoV-2 – the virus behind COVID-19 – and its new variants. Genomic surveillance had previously been regarded as the preserve of academic research and thought to have only limited practical applications. The pandemic illustrated its power as a critical tool for informing public health decisions.

One of the key drivers behind the large-scale adoption of genomic surveillance for public health was the establishment of the COVID-19 Genomics UK Consortium (COG-UK). Set up in March 2020 with remarkable speed and foresight, COG-UK provided the first proof that genomic surveillance of a pathogen could be done in real time at a national level. Its work was pivotal to the quick detection of more transmissible and worrying variants and was essential to the roll-out of effective diagnostics, vaccines and treatments.

Within two months of officially beginning operations, COG-UK had managed to sequence and analyse over half of the SARS-CoV-2 viral genomes reported globally. Thereafter COG-UK continued to push genomic surveillance to unprecedented levels, offering an important model for other countries to follow. How it did this is documented in a new online exhibition ‘Cracking Covid: The history of COG-UK’ that I have curated.

I first learnt about the work of COG-UK in late December 2020 when the UK government decided to cancel Christmas in the wake of the discovery of the Alpha variant. My interest was further piqued when I found out that the consortium had been set up by Professor Sharon Peacock, a clinical microbiologist at the University of Cambridge whom I had previously interviewed in 2015 for my online exhibition ‘The path to DNA sequencing: The life and work of Fred Sanger’. In this exhibition, I featured the DNA sequencing work Peacock and her team did to successfully bring under control an outbreak of MRSA in a Special Care Baby Unit at the Rosie Hospital in Cambridge in 2011.

Cartoon building the plane while trying to fly. First used by software developers in Silicon Valley, the analogy of ‘building a plane as you fly’ has become a common analogy in many sectors. Credit: James Baylay (aka James the Scribe). From:

I also had links to Peacock by virtue of the fact that just before the pandemic, I took up a visiting research position in the Cambridge Institute for Therapeutic Immunology and Infectious Diseases, with which she is also affiliated. The Institute commissioned me to put together the online exhibition ‘The history of antimicrobial resistance and scientists’ struggles to overcome the problem’ on, an online charitable educational resource I founded in 2013 to tell the stories of the people, sciences and places behind recent advances in biomedical science and compile a searchable timeline of key developments.

With the outbreak of COVID-19, the Institute encouraged me to develop some resources around the scientific developments behind COVID-19. This included compiling regular briefings from Professor Stephen Baker to document his and his team’s efforts to roll out diagnostic testing for COVID-19 among healthcare workers in Addenbrooke’s Hospital and the wider population of Cambridge. Talking with Baker and others alerted me to the urgent need to capture the experiences of scientists working on the frontline of the pandemic before their memories faded. To this end, I gained Peacock’s support to conduct interviews with COG-UK’s participants and collect documents relating to its operation. My aim was to use this material to create an online exhibition and archive to preserve the history of COG-UK.

To reflect the diversity of participants in COG-UK, Peacock encouraged me to interview a wide range of people with a multitude of skills and varying levels of seniority. Little did I realise how much of an undertaking this would be. In the end I interviewed 85 out of the 600 members of COG-UK between December 2021 and December 2022. Where possible I transcribed these interviews.

Many of the people interviewed generously shared some of their documents. These included some slide presentations given at the first meeting held to set up COG-UK and the first proposal drawn up to secure funding from the government. Put together at remarkable speed, these materials provide key insights into how the consortium took shape in the early days. Operating for a total of 18 months, COG-UK also put out a series of blog postings which offer an overview of how its work expanded and responded to the changing demands of the pandemic. Another important source of information was the set of reports its team compiled for the UK government’s Scientific Advisory Group for Emergencies (SAGE), which detail the evolution and spread of the SARS-CoV-2 virus within the UK. Early on, COG-UK also provided regular updates through Coverage Reports on the geographic spread of sequencing undertaken by its partners. Also invaluable are the presentations COG-UK recorded for an event, held in October 2021, to bring together all of its members to share their experiences of sequencing since its inception.

Tweet put out by Professor Nick Loman announcing the successful funding of COG-UK. From:

In the process of putting the exhibition together I also drew extensively on messages put out on Twitter, the American social media platform, by people involved in or adjacent to COG-UK. Concerned that such messages would fail to be preserved in the wake of Elon Musk’s acquisition of Twitter, I immediately saved these messages into a searchable database to appear alongside the exhibition.

The exhibition and its resources serve as a monument to the hard toil and sacrifices many scientists and others went through to overcome the adversities of the pandemic and save lives. Everyone who took part in COG-UK has their own rich and moving story to tell, reflecting the wide range of skills and backgrounds of the individuals who helped propel the project forward. It is their stories which the exhibition aims to convey.

Now that the immediate crisis of COVID-19 has passed, the strong temptation to put it behind us is already in danger of changing the narrative and understanding of the pandemic, and the role that scientists played in helping to bring it under control risks being forgotten or rewritten. But how scientists experienced and successfully responded to the pandemic has important lessons for the future, particularly for handling other infectious disease threats.



Special issue published: ‘The sequences and the sequencers’

Cover of Historical Studies in the Natural Sciences journal

After several years of analysis and interpretation of datasets and networks concerning publications reporting new DNA sequences, the TRANSGENE collaboration can announce the release of five papers reporting our findings in the journal Historical Studies in the Natural Sciences. The special issue demonstrates the mixed-methods approach used and developed by the multi-disciplinary collaboration, which involved moving between the quantitative and network-based analysis and more qualitative modes of research, such as reviews of literature, oral history interviews, and inspection of archives and other historical records. In the three papers in the middle of the issue, this overall approach is pursued in distinct ways to develop the history of human, yeast and pig genomics.

For human genomics, the paper explores a collaboration between medical genetics laboratories – who were not involved in the Human Genome Project (HGP) – and the private company Celera who rivalled the HGP in attempting to sequence the whole human genome. This collaboration, around human chromosome 7, demonstrated that the separation between medical genetics and the mainline large-scale genomics represented by the HGP was contingent, and the divisions between smaller-scale genetics and large-scale infrastructures could – and did – break down when it suited both sides. In the case of Celera, their business model encouraged them to seek partners in medical genetics to help enrich their proprietary datasets and establish the means to develop diagnostic and therapeutic innovations using their data. For the medical geneticists, working with Celera and its highly-refined infrastructure for generating and processing sequence data would enable them to improve the canonical sequence of a key chromosome for their research.

The paper concerning yeast genomics focuses on some private DNA sequencing companies in Germany and the Biozentrum research institution in Switzerland, which inspection of the clusters in the co-authorship network for this species revealed as bridging the European Commission-funded Yeast Genome Sequencing Project with other groups and projects, particularly in North America. In examining these companies and Biozentrum, this work is as much a contribution to the distinct history of biotechnology in Europe as it is to the history of yeast genomics. It also reveals details of the way in which Europe and North America interacted during and after the projects to sequence the yeast genome (1989-1996), the ways in which yeast biologists contributed to genomic research, and the late-twentieth century European model of organising genomics.

The study of pig genomics highlights the significance of bricolage, the repurposing of existing resources for new applications. While these practices are present across genomics and scientific research more broadly, they have been particularly salient in pig genomics. Using this as an organising concept, the paper identifies three sets of institutions based on an assessment of two metrics concerning their publishing activity: agricultural, systematic (conducting research on evolution, diversity and related topics) and hybrid. The institutions in these sets exhibit particular forms of bricolage and modes of collaboration. Intriguingly, over the period covered by this research (1990-2015), the authors discerned a shift from dominance by the agriculturally-inclined institutions towards the more systematically-oriented. The inflection point, in the mid-2000s, was when the Swine Genome Sequencing Project – involving many of the agriculturally-inclined institutions – was producing and publishing data on the pig genome. The availability of this resource enabled new institutions to publish and therefore appear in our dataset, and fresh forms of collaboration to be forged, often in the systematic mode. This also affected institutions who had long been involved in pig genomics, who were able to use the new reference sequence to conduct novel kinds of research. The paper highlights one, Universitat Autònoma de Barcelona, who shifted from a more agriculturally-inclined mode to a systematic one. This changed the pattern of their collaborations, from a few Spanish or intra-institutional partners, towards larger international networks. In so doing, it made the institution much more central and well-connected in the co-authorship networks. Without centering the production of a reference genome, this paper was able to identify its salience as a bricolaged resource through an examination of trends and types of collaborative activity preceding it, paralleling it and succeeding it.

The issue closes with a reflection on the different ways in which the overall mixed-methods approach has enabled the history of genomics to be thickened. Recognising that the precise methods and datasets developed and used may not be applicable beyond genomics, or even certain kinds of genomics, the authors suggest other forms of data and ways of generating resources from which to construct networks or other representations of collaboration.

All of the papers are available with full open-access here.

The authors of the special issue and all individual papers are: Miguel García-Sancho, Rhodri Leng, James Lowe, Niki Vermeulen, Gil Viry and Mark Wong.

The 100,000 Genomes Project: shaping genomic medicine in the NHS

Jarmo de Vries, Science, Technology and Innovation Studies, University of Edinburgh

With the advent of the NHS Genomic Medicine Service (GMS), NHS England is transforming the organisation of its genetic testing services. As part of my PhD project, I am studying how this signifies the emergence of a new knowledge-control regime, a sociotechnical arrangement ‘that constitutes categories of agents, spaces, objects, and relationships among them in a manner that allocates entitlements and burdens pertaining to knowledge’ (Hilgartner 2017: p. 9). In this new knowledge-control regime, the organisation of genetic and genomic testing services is being centralised, the production of genomic data becomes key, the control of this data is placed outside the NHS, and new actors become important in the analysis and interpretation of genomic data. The origins of the GMS and aspects of this new knowledge-control regime can be traced to the 100,000 Genomes Project (100kGP) and to several UK policy reports produced following the completion of the Human Genome Project. The 100kGP was a project announced in 2012 to sequence the whole genome of 100,000 NHS patients and aimed to make the implementation of whole-genome sequencing possible in the NHS and to lay the groundwork for the GMS (Genomics England, 2014). I will discuss how these visions of genetic and genomic medicine led to the announcement of the 100kGP and indicate changes in the knowledge-control regime that emerge in them.

Let me start with the White Paper Our Inheritance, Our Future that was published in 2003 by the Department of Health just after the completion of the Human Genome Project (Department of Health, 2003). This report talks about a projected revolution in healthcare that would be enabled by genetic technologies, a future in which genetic testing becomes a routine part of mainstream NHS services, and outlines an ambition for the NHS to become a world leader in genetics. More importantly, it also started to discuss potential changes to genetic laboratory services. The White Paper recognised that these changes did not need to happen immediately, but set out considerations for future genetic testing services: the centralisation of laboratories and services, closing certain laboratories, establishing divisions of labour across laboratories, and involving the private sector. This foreshadows some of the changes that the GMS brings, especially the centralisation of laboratory services and collaborations with the private sector.

A few months after the White Paper was published, the Bioscience 2015 report was published by the Bioscience Innovation and Growth Team (BIGT) set up by the Departments of Health and of Trade and Industry (BIGT, 2003). This report develops the aim to make the UK a global leader in the life sciences, to stimulate the bioscience industry and the UK economy, and to improve healthcare, including through personalised and preventative medicine. In it, the NHS is seen as an asset for developing a bioscience industry by potentially providing access to a large patient pool for clinical trials and research (BIGT 2003: p. 13). However, a lack of support for innovation in the NHS was identified as a barrier to these plans. While this was not discussed further, it implied the need for a culture change in the NHS. This was a recurrent theme in later reports. Overall, the plans for making genetic testing routine in the NHS cannot be seen separately from plans such as Bioscience 2015.

The House of Lords Genomic Medicine Report from 2009 is key for how the future of genomics and genetic testing has been envisioned in the UK (House of Lords, 2009). It is the first report that explicitly asked for funding to study the implementation of genomic technologies in the NHS and for something like the 100kGP. Furthermore, it put the implementation of genomic technologies in the context of making the NHS more innovative and creating collaborations between the NHS and private companies. To do this it suggested that a cultural change in the NHS was needed to achieve a ‘real commitment to research’ and to develop a ‘culture of innovation’ (House of Lords, 2009: section 3.13). Part of this process of cultural change would be to overcome the perceived lack of translation in the NHS, where translation is understood as the development and implementation of healthcare products and treatments from basic research. Only by improving this, they argued, could the NHS and the UK gain the full economic and health benefits from health research. So, the NHS was again seen as a barrier to the successful implementation of genomic technologies. As a result, organisational changes in the NHS were proposed to overcome these supposed challenges. Interestingly, historical research into medical genetics actually sketches a different picture of the relationship between research and care in the NHS. For example, Sturdy (2021) discusses the development of molecular oncology in the UK and shows the close involvement of clinical geneticists in its development in the NHS. It was not just a one-directional movement from research to care but instead depended on both clinical and laboratory expertise for the development of both research and clinical services. This shows that medical genetics in the NHS has historically been involved in research. It can serve as a counterpoint both to claims that the NHS has no commitment to research and to the common perception of translation and innovation as a one-directional process.

The House of Lords report mainly focused on reorganising the laboratory services. In his oral evidence to the Lords committee, prominent medical geneticist and scientific administrator Professor John Bell made an explicit call to reorganise laboratory services. He called existing provision ‘severely fragmented’, stated that there is ‘an urgent need therefore to rationalise the management of these’, and that a ‘single clinical service structure is imperative to ensure that there is a coherent approach to these methodologies within the NHS’ (House of Lords, 2009: section 4.38). His involvement and views are important. Following this report he was appointed as chair of the Human Genomics Strategy Group that was tasked to develop a vision for genomics in the NHS. It is not surprising, therefore, that his vision – reflected in the House of Lords report – was prominent in the report produced by the Human Genomics Strategy Group in 2012.

The Human Genomics Strategy Group report recommended that the NHS prepare for an imminent implementation of genomic testing, produce a centralised genomic database and develop a new service delivery model for genomic and genetic testing (Human Genomics Strategy Group, 2012). It followed the earlier recommendations of the House of Lords report, but was also more specific in describing what a new genetic testing service should look like. It proposed that the laboratory services be taken away from the Regional Genetic Centres, which carried out most of the testing. Instead, a new type of laboratory would become responsible for all types of genetic testing, including ones not performed by the Regional Genetic Centres. Furthermore, these plans suggested reducing the number of laboratories servicing the NHS and restricting some specialist tests to specific accredited laboratories. The report also suggested that, increasingly, private sector providers should be used, but did not explain how or under what circumstances. The Regional Genetic Centres would mainly become hubs for the provision of clinical genetic services, managing familial disease, and offering support and expertise to other clinical disciplines. A reduction of their overall numbers was not ruled out either. This report was the first to suggest a specific reorganisation of genetic testing in the NHS, building on previous calls for centralisation.

The plans outlined in the Human Genomics Strategy Group report were supported by the UK government and at least some parts of NHS England. At the end of 2012, another report was published that set out plans to sequence 100,000 whole genomes of NHS patients, the 100kGP, and to lay the groundwork for routine genomic testing in the NHS. This update on the Strategy for UK Life Sciences was introduced by then Prime Minister David Cameron and, once again, by John Bell, this time as a Life Science Champion (HM Government, 2012). It set out three main goals, to:

1) harness the potential of genomic technology by the NHS to improve patient outcomes and healthcare;

2) maximise the opportunities for research and translation of research findings into health and economic benefits for the UK; and

3) support the growth of UK genomics and bioinformatics companies, including SMEs by enabling the creation of genomic platforms for innovation (HM Government, 2012: p. 44/45).

In this way, the UK government continued the ambitions and visions laid out for genomics in previous reports. The 100kGP can also be seen as following the recommendation of the House of Lords and the Human Genomics Strategy Group to fund a study for the implementation of genomic medicine. The 100kGP seems, therefore, to be the outcome of a perceived translational gap between research and clinic, an ambition of the UK government to stimulate and form a profitable life sciences industry, and the vision to make genomics a mainstream part of medicine. These visions also seem to underlie the reorganisation of the genetic testing services in the NHS and the specific structure and organisation of the 100kGP and the GMS. This I plan to discuss in a later blog post.


BIGT, 2003. Bioscience 2015: Improving National Health, Increasing National Wealth. Executive Summary

Department of Health, 2003. Our Inheritance, Our Future: Realising the potential of genetics in the NHS.

Genomics England, 2014. The 100,000 Genomes Project.

Hilgartner, S., 2017. Reordering Life: Knowledge and Control in the Genomics Revolution. The MIT Press.

HM Government, 2012. Strategy for UK Life Sciences: One Year On.

House of Lords, 2009. Genomic Medicine – Volume I: Report (Science and Technology Committee No. 2), Session 2008–09.

Human Genomics Strategy Group, 2012. Building on our inheritance: Genomic technology in healthcare.

Sturdy, S., 2021. Local mutations: on the tentative beginnings of molecular oncology in Britain 1980–2000. New Genetics and Society, 40 (1), 1–19.

Record approvals of new cancer drugs – what is the role of genomics?

Post by Matt Wasmuth, a postgraduate student at the University of Edinburgh. This post is adapted from work conducted as part of the ‘Biobusiness’ course.

Image of prescription drugs, produced by J. Troha, source National Cancer Institute. Reproduced under a Creative Commons Attribution-Share Alike 4.0 International license. Available online at:

Over the last decade, the number of new cancer patients in the United States has grown from 1.5 million per year to a projected 1.9 million for 2020. This has also resulted in an increase in spending on cancer treatment. This trend is not restricted to the US. Global spending on cancer treatment is expected to reach $200 billion by 2022, constituting 14% of total medical expenditure. This spending has paralleled the unprecedented number of drugs being approved for the treatment of cancer. Here I discuss some key changes within the oncology drug market over the last 10 years and assess the extent to which genomics has altered the dynamics of drug discovery.

Approval rates of cancer drugs continue to rise, with an increase from 13% to 17% of novel drug treatments from 2015 to 2019. What has stimulated this rise? The story begins with Richard Nixon signing the National Cancer Act in 1971, which led to increased funding directed towards oncology through the National Institutes of Health. The research that followed led to the recognition that cancer is not just one disease, but a collection of many – sometimes rare – diseases stemming from one or more genetic mutations. Due to the much smaller markets for treatment, rare diseases attract less investment into the development of potential drug therapies. To ameliorate this, the Orphan Drug Designation Program (ODDP) within the Food and Drug Administration (FDA) was established in 1983. The program aims to advance the development of products that demonstrate promise for the treatment or diagnosis of rare diseases by providing incentives for sponsors. The National Organization for Rare Disorders (NORD), originally spearheaded by patient groups affected by more widely-known disorders such as Huntington’s disease and severe combined immunodeficiency (SCID), gave the necessary impetus for the underlying Orphan Drug Act to take shape. Whilst members of these advocacy groups benefitted hugely from the research and products that arose from this movement, so did companies and research groups who develop cancer treatments. With more than 200 kinds of cancer now recognised, most potential treatments for rare cancers (those affecting fewer than 200,000 people in the US) are eligible for the program. This also explains why the largest category of these otherwise unprofitable approved ‘orphan’ drugs is oncology. As of 2019, 21 out of 44 products approved were orphan drugs.

Rising R&D spending in the pharmaceutical industry overall has not generally been matched by an equivalent rise in the rate of approvals, especially within Europe. Approvals within the oncology market buck the trend. This may be for a variety of reasons. One is that, as the genetic bases for various cancers become known, treatments have become more targeted. Indeed, they are now often approved with a companion diagnostic test that detects the genetic basis of the cancer.

Clinical trials are then optimised and designed for trial populations in which the genetics of the cancer has been characterised. Smaller patient numbers for rare diseases pose a challenge for clinical trial recruitment and for ensuring that the trials are statistically valid. However, novel clinical trial designs are being explored. For instance, predictive probability is employed to select patients most likely to benefit from treatment, based on their biomarker signatures. In cancer, where time is a crucial factor, adaptive trial designs replace clinical end-points (such as remission rates) with surrogate end-points (such as evidence of tumour shrinkage) or the detection of biomarkers such as HER2 gene overexpression, which is a feature of breast cancer. These measures exploit the relationship between the severity of the disease and response to treatment. Such designs often allow for trials to be smaller and for decisions to be made faster.

A collaboration between the European Medicines Agency and the FDA was launched in 2009. Its product, a standardised application form for drugs seeking entry to both markets, has reduced approval times and will continue to do so as the approach becomes more refined. The UK government is aiming for regulation to take a more “streamlined, internationally competitive approach” in areas such as clinical trials, with the aim of further improving approval rates as well as encouraging the use of techniques such as genome editing by CRISPR-Cas9. Genome editing may make it possible to accurately modify the genome to provide a better understanding of cancer biology, highlight potential druggable targets and present a method of synthesising the drugs for these targets in cell factories.

Figure 1: Number of new drugs approved from 1993-2019. Adapted from Mullard, 2020.

Significant developments in the treatment of cancer are not new. Radiotherapy was introduced in the 1900s, and chemotherapy in the 1940s. The next jump, with targeted treatments such as monoclonal antibodies and tyrosine kinase inhibitors in the 1980s, was a more profound change, shaping the policy and regulatory landscape. Research efforts towards studying the genetics of cancer that began in the 1980s and the genomics of cancer from the 2000s are now bearing fruit, which is reflected in the spike we see in approvals for targeted treatments over the last three years (Figure 1). From the 1980s onwards, oncologists came to understand cancer as a molecular genetic disease by cataloguing different types of mutations found in tumours. These included mutations of the p53 gene, which is implicated in cell cycle control and apoptosis, and therefore in tumour suppression when active and in the development of tumours when inactivated. Genomic studies of the DNA sequences of particular tumours have enabled better-tailored treatments and also allowed the further stratification of cancer into the individual rare diseases that we know today.

However, the undoubted impact of genetics and genomics on diagnosis and treatment should not lead us to ignore the ongoing salience of monoclonal antibodies, immune proteins that can be designed to selectively attack cancer cells.

Due to the specificity with which it targets cancer cells, in contrast with radiotherapeutic and chemotherapeutic approaches, monoclonal antibody therapy has taken off over the last decade and is expected to increase further not only as a cancer treatment but also in treating numerous other diseases (Table 1). Recently, antibody-drug conjugates that combine the specificity of an antibody with a cytotoxic (cell-killing) drug compound have been developed to target cancer cells. CAR-T cell therapy (Chimeric Antigen Receptor T cells) involves editing the genome of T cells, immune cells that kill other cells, to enable them to detect specific molecular markers (antigens) on the surface of cancer cells. This causes the T cells to attack those cells with high specificity (Mullard, 2020).

The limitations of particular immunological approaches such as monoclonal antibody therapy have not restricted their growth in all classes of drugs, including oncological ones. Methods such as genome editing are helping to further improve the specificity and effectiveness of immunological approaches. But genomics has also made its own distinctive contribution. It has enabled the stratification of cancer into multiple rare diseases, a stratification that has been incentivised by the advantages of orphan drug designation. And it contributes towards the identification of new therapeutic targets: one paper alone highlighted 5 novel gene targets from whole genome sequencing of over 500 patients.

Genomics and the coronavirus, SARS-CoV-2

Post by James Lowe, a member of the TRANSGENE: Medical Translation in the History of Modern Genomics project, which is funded by a European Research Council Horizon 2020 Programme Starting Grant. See the TRANSGENE website for more information on the project:

Computer-generated representation of the SARS-CoV-2 virus, produced by Felipe Esquivel Reed. Reproduced under a Creative Commons Attribution-Share Alike 4.0 International license. Available online at:

The coronavirus SARS-CoV2, which causes the potentially fatal illness known as COVID-19, was first detected in the city of Wuhan in December 2019. The viral genome, made of RNA rather than the DNA that constitutes the genomes of all non-viral species, was rapidly sequenced, and published in January 2020. Remarkable quantities of work have been published on the virus, on the disease and its epidemiology, and on the mitigation of its spread and potential treatments. Further sequencing and investigation of the virus’s genome has formed a significant portion of this research. Due to the quantity of this work and my own lack of expertise in this area, I will not comment on its validity or implications for managing the spread of the virus and treating the disease. Instead I will highlight what the genomics research performed on the virus can tell us about the uses of genomics, and the relationship of genomics to other areas of the life sciences, including novel public health challenges.

The initial sequencing of the genome of the virus took place in China, both at BGI, the large-scale sequencing company, and the Chinese Center for Disease Control and Prevention (CDCP). The sequencing was based on samples provided by nine patients, the RNA from which was reverse transcribed to complementary DNA (cDNA). The resulting DNA assemblies for each patient were used to construct a consensus sequence. This consensus sequence became the representative reference genome. Comparative practices were central to this process, as I have shown they are to genomics more generally (here and in a paper in preparation). For example, the researchers compared the sequence data they were getting with the latest human reference genome data, using software called the Burrows-Wheeler Aligner. The algorithms in this software detect alignments of the supposed viral sequence to human sequence, thus enabling human DNA not previously washed out by the researchers’ purification procedures to be identified and removed from the sequence.
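The logic of removing human sequence by comparison can be illustrated with a toy Python sketch. It is not the Burrows-Wheeler Aligner, which performs proper read alignment using an index of the reference; here a much cruder stand-in is used, namely counting how many short substrings (k-mers) of a read also occur in a host reference. The function names, the value of `k` and the threshold are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_host_reads(reads, host_reference, k=8, max_shared_fraction=0.5):
    """Split reads into (kept, removed) by k-mer overlap with a host reference.

    A crude stand-in for alignment-based host filtering: a read is removed
    when more than max_shared_fraction of its k-mers occur in the host.
    """
    host_kmers = kmers(host_reference, k)
    kept, removed = [], []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # read shorter than k: too short to judge
            continue
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared > max_shared_fraction:
            removed.append(read)
        else:
            kept.append(read)
    return kept, removed


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    host = "ACGTACGTTAGCCGATAGGCTTAACCGGTTAA"
    reads = [host[4:20], "TTTTTTTTGGGGGGGGCCCC"]
    kept, removed = filter_host_reads(reads, host)
    print("kept:", kept)
    print("removed:", removed)
```

In the toy run, the first read is cut directly from the host string and so is removed, while the second shares no k-mers with it and is kept.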

They needed the reference genome of another strain of coronavirus, bat-SL-CoVZC45, to aid them in assembling the genomes of the viruses extracted from the patients. The sequence data they already had indicated the similarity between the two strains. They used this similarity to map the sequence reads from the nine patients to the bat-SL-CoVZC45 genome, using it as scaffolding to construct the genomes of the viruses extracted from each of the patients. These genomes were then compared against each other to ascertain the consensus sequence, and to identify a tiny number of sequence differences between them.
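The final comparative step, deriving a consensus and noting where individual genomes depart from it, can be sketched in a few lines of Python. This is a minimal illustration that assumes the per-patient sequences have already been aligned to equal length, and that takes the most common base at each position as the consensus; the function name and the toy sequences are hypothetical, and the published analysis will have used more sophisticated tools.

```python
from collections import Counter


def consensus_and_variants(aligned):
    """Return (consensus, variable_sites) for equal-length aligned sequences.

    aligned maps a sample name to its sequence. variable_sites maps each
    0-based position that varies to the samples differing from the consensus.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and aligned to one length")
    names = list(aligned)
    consensus = []
    variable_sites = {}
    for pos in range(lengths.pop()):
        counts = Counter(aligned[name][pos] for name in names)
        base, _ = counts.most_common(1)[0]  # majority base at this position
        consensus.append(base)
        if len(counts) > 1:
            variable_sites[pos] = {
                name: aligned[name][pos]
                for name in names
                if aligned[name][pos] != base
            }
    return "".join(consensus), variable_sites


if __name__ == "__main__":
    # Made-up toy sequences standing in for per-patient assemblies.
    samples = {"patient_1": "ACGTAC", "patient_2": "ACGTAC", "patient_3": "ACTTAC"}
    consensus, variants = consensus_and_variants(samples)
    print(consensus)
    print(variants)
```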

Finally, the sequence of the virus was compared against the reference sequences of other known coronaviruses, to infer evolutionary relationships between them. This phylogenetic analysis, which posited the possible strains from which SARS-CoV2 derived, was not merely of academic interest. It provided clues as to its origins in bats (with other evidence suggesting that another animal vector transmitted it from bats to humans in the Wuhan seafood market), and indicated a similar receptor (ACE2) to that employed by the original SARS virus (SARS-CoV), with implications for the virus’s mode of action and possible treatment.
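The simplest ingredient of such a comparison is a measure of how different two aligned sequences are. The Python sketch below computes the proportion of differing positions (sometimes called the p-distance) between a query and each of several references, and ranks the references from most to least similar. It is only an illustration: real phylogenetic analyses rely on explicit models of sequence evolution and tree-building methods, and the strain names and sequences here are invented.

```python
def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences.

    Positions where either sequence has a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in compared) / len(compared)


def rank_by_similarity(query, references):
    """Return (name, distance) pairs sorted from closest to most distant."""
    distances = [(name, p_distance(query, seq)) for name, seq in references.items()]
    return sorted(distances, key=lambda item: item[1])


if __name__ == "__main__":
    # Invented toy sequences; the names are hypothetical placeholders.
    query = "ACGTACGT"
    references = {"strain_a": "ACGTACGA", "strain_b": "TCGAACGA"}
    for name, distance in rank_by_similarity(query, references):
        print(name, distance)
```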

The analysis of the sequence so produced therefore provided evidence as to the origins of the new virus, its relationship to other viruses, some clues as to its mode of action, and formed the basis of a Polymerase Chain Reaction test for the presence of the virus that was quickly devised by the CDCP.

Since then, genomics has been used in a multitude of ways to investigate the virus and its spread. At the time of writing, 579 separate sequences have been submitted to the publicly-available database GenBank alone, from viruses collected all around the world. Two types of studies demonstrate the potential directions genomics can take even after the publication of a consensus reference genome. One concerns the deeper investigation of the sequence itself for clues about the virus’s function, and its evolutionary history. The other concerns the diversity of viral RNA sequence. Both have implications for health, and broader ones besides. One question raised by the diversity of the viral RNA sequence among affected humans is whether a single vaccine will suffice, or whether new vaccines will have to be developed every year, as for seasonal flu. Sequence comparisons between samples across the world suggest that a rather low number of mutations have occurred in the virus’s reproduction and spread. Depending on the immunology of COVID-19, this suggests that the virus will not evolve fast enough to necessitate regular novel vaccine production.

Chinese pangolin, Manis pentadactyla. Photograph by Sarita Jnawali of National Trust for Nature Conservation Central Zoo, Nepal. Reproduced under a Creative Commons Attribution-Share Alike 4.0 International license. Available online at:

In an example of the former type of study, a US-UK-Australian collaboration examined two key features of the SARS-CoV2 genome, with the aim of clarifying its origins, and also providing further data for understanding the biology of the virus’s infection of human cells. The first feature is the Receptor-Binding Domain (RBD), which aids the virus in binding to the aforementioned ACE2 receptor on human cells to enable it to enter. Comparing the sequence of six key amino acids with those in the original SARS virus, the researchers found that although these amino acids were well-suited to binding to ACE2, and therefore to entering the cell, they were not optimal for binding. They suggest that this means that the virus was not deliberately engineered by malign actors, but such reasoning is unlikely to persuade the conspiratorially-minded. The evolution of a set of amino acids with high binding efficiency distinct from any set found in humans led them to conclude that the virus had its origins in another animal. A similar RBD in pangolins made them the preferred candidate, though data on the other feature they studied also indicated some evolution of the virus in humans before it reached its current form. The extent of evolution of the virus before human-to-human transmission has implications for whether we might expect new coronaviruses to emerge. If it did mainly derive from an animal reservoir of viruses, they project that another strain is likely to emerge at some point, necessitating the development of new strategies to reduce the risk of this happening. Further comparative data on viral genomes from many different animal sources would be needed when weighing the various hypotheses concerning viral origins that they examine.
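The comparison of a handful of key amino acids between two viruses is, at its core, a lookup at chosen positions in two aligned protein sequences. The Python sketch below shows only that basic step, under the assumption that the sequences are already aligned; the function name, the sequences and the positions are invented for illustration and do not correspond to the real RBD residues.

```python
def compare_sites(seq_a, seq_b, positions):
    """Compare two aligned protein sequences at selected 0-based positions.

    Returns {position: (residue_a, residue_b, identical)}.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return {
        pos: (seq_a[pos], seq_b[pos], seq_a[pos] == seq_b[pos])
        for pos in positions
    }


if __name__ == "__main__":
    # Invented toy protein fragments and positions, not real RBD data.
    virus_a = "MKTLYFNQ"
    virus_b = "MKSLYFHQ"
    for pos, (a, b, same) in compare_sites(virus_a, virus_b, [2, 3, 6]).items():
        print(pos, a, b, "same" if same else "different")
```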

This blog post should not be taken to be an authoritative source of information on SARS-CoV2, COVID-19, its epidemiology or treatment. The findings I have reported are only a few of the many that have been published, and are open to challenges, including alternative interpretations of the data they have produced. Instead, I have endeavoured to show what the function of genomics is in a rapidly developing situation in which the investigation of the biology of a new entity and its interaction with humans is intended to produce results of direct and immediate relevance. The biology required ranges from the molecular biology of the ACE2 receptor to the sequence differences in viral samples across the globe. The production of new genomic data required for these wildly different studies has relied on the skilful exploitation of existing genomic data. Researchers have, for instance, used the reference genome of a similar strain, inferred function and mechanisms of action of the virus from a similar strain, and used human genome data to wash out potential contamination from the human DNA of the viral RNA donors. The data they produce may be similarly used in ways its creators did not envisage or plan for, as this rare mobilisation of a substantial portion of the world scientific community continues.