In February 2020 a substitution at the interface between SARS-CoV-2 Spike protein subunits, Spike D614G, was observed in public databases. The Spike 614G variant subsequently increased in frequency in many locations throughout the world. Global patterns of dispersal of Spike 614G are suggestive of a selective advantage of this variant; however, the origin of Spike 614G is associated with early colonization events in Europe and subsequent radiations to the rest of the world. The increasing frequency of 614G may therefore be due to a random founder effect. We investigate the hypothesis of positive selection for Spike 614G at the level of an individual country, the United Kingdom, using more than 25,000 whole-genome SARS-CoV-2 sequences collected by the COVID-19 Genomics UK Consortium. Using phylogenetic analysis, we identify Spike 614G and 614D clades with unique origins in the UK, and from these we extrapolate and compare the growth rates of co-circulating transmission clusters. We find that Spike 614G clusters are introduced into the UK later, on average, than 614D clusters and grow to larger sizes after adjusting for time of introduction. Phylodynamic analysis does not show a significant increase in growth rates for clusters with the 614G variant, but population genetic modelling indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We also investigate the potential influence of Spike 614D versus 614G on virulence by matching a subset of records to clinical data on patient outcomes. We do not find any indication that patients infected with the Spike 614G variant have higher COVID-19 mortality, but younger patients have slightly increased odds of carrying 614G.
Despite the availability of a very large data set in which both Spike 614 variants are well represented, not all approaches showed a conclusive signal of a higher transmission rate for 614G, but significant differences in the growth, size, and composition of these lineages indicate a need for continued study.
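The abstract does not specify which population genetic model was used. As an illustration only, the sketch below assumes the textbook deterministic haploid selection model, in which a variant with constant relative fitness 1 + s per generation has a log-odds (logit) frequency that rises linearly in time with slope ln(1 + s). Under this assumption a selective advantage shows up as a straight-line increase in logit frequency, and s can be recovered from the fitted slope. The function names (`next_freq`, `trajectory`, `estimate_s`) are hypothetical and are not the authors' method.

```python
import math


def next_freq(p: float, s: float) -> float:
    """One generation of haploid selection for a variant with relative fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Deterministic frequency trajectory starting from p0 (hypothetical helper)."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s))
    return freqs


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def estimate_s(freqs: list[float]) -> float:
    """Least-squares slope of logit(p) against generation index.

    Under this model logit(p_{t+1}) = logit(p_t) + ln(1 + s),
    so s = exp(slope) - 1.
    """
    xs = list(range(len(freqs)))
    ys = [logit(p) for p in freqs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return math.exp(sxy / sxx) - 1


if __name__ == "__main__":
    freqs = trajectory(p0=0.05, s=0.1, generations=30)
    print(round(estimate_s(freqs), 6))  # 0.1
```

With no fitness difference (s = 0) this model predicts a flat logit trajectory, so it cannot by itself separate selection from a founder effect in real surveillance data; that also requires handling sampling noise and repeated introductions, which this sketch ignores.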
### Competing Interest Statement
The authors have declared no competing interest.
### Funding Statement
We thank all partners and contributors to the COG-UK consortium who are listed at https://www.cogconsortium.uk/about/. We also acknowledge the important work of SARS-CoV-2 genome data producers globally contributing sequence data to the GISAID database, and particularly acknowledge the groups who have generated data used by this project, listed in Table S4. EV acknowledges the MRC Centre for Global Infectious Disease Analysis MR/R015600/1. VH was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) [grant number BB/M010996/1]. JTM, RMC, NJL and AR acknowledge the support of the Wellcome Trust (Collaborators Award 206298/Z/17/Z ARTIC network). AR is supported by the European Research Council (grant agreement no. 725422 - ReservoirDOCS). DLR, ASF and ECT are supported by the MRC (MC\_UU\_1201412). JS was supported by the Biotechnology and Biological Sciences Research Council-funded South West Biosciences Doctoral Training Partnership [training grant reference BB/M009122/1]. TRC and NJL acknowledge support from the MRC, which funded computational resources used by the project [grant reference MR/L015080/1]. TRC acknowledges funding as part of the BBSRC Institute Strategic Programme Microbes in the Food Chain BB/R012504/1 and its constituent projects BBS/E/F/000PR10348 and BBS/E/F/000PR10352. AP and TRC acknowledge support from Supercomputing Wales, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government. The project was also supported by specific funding from Welsh Government, which provided funds for the sequencing of a subset of the Welsh samples used in this study via Genomics Partnership Wales.
### Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study presented encompasses two elements. The first does not require specific ethical approval, as it addresses public health and surveillance questions using sequence data and other metadata that are already shared with the wider world as part of the activities of the COG-UK consortium (https://www.cogconsortium.uk/). COG-UK data are publicly available via the ENA, GISAID and the COG-UK website. The element of the work that could require ethical approval is the specific examination of pathogenicity using mortality data from PHE. This work is covered by the COG-UK project protocol, which was approved by the Public Health England Research Support and Governance Office (RSGO) following review by the PHE Research Ethics and Governance Group (REGG). Caldicott approval and ethical approval were obtained for clinical and genomic data from the relevant national biorepository authorities (16/WS/0207NHS and 10/S1402/33).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
All sequence data and metadata used in this work are shared via the COG-UK website (https://www.cogconsortium.uk/data/), GISAID (https://www.gisaid.org/), and the ENA as part of BioProject PRJEB37886 (https://www.ebi.ac.uk/ena/data/view/PRJEB37886).