Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year. For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus3. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. stand-alone pangolin workflows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. Even before the COVID-19 pandemic, pangolins had been making headlines. One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. ISSN 2058-5276 (online). & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. = 0.00075 and one with a mean of 0.00024 and s.d. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. The canine viral genome was excluded from the Bayesian phylogenetic analyses because temporal signal analyses (see below) indicated that it was an outlier. These residues are also in the Pangolin Guangdong 2019 sequence. Uncertainty measures are shown in Extended Data Fig.
Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. BEAST inferences made use of the BEAGLE v.3 library68 for efficient likelihood computations. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity27,37. 3) to examine the sensitivity of date estimates to this prior specification. Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different datasets and different rate priors in our analyses. Boni, M.F., Lemey, P., Jiang, X. et al. Using these breakpoints, the longest putative non-recombining segment (nt 1,885–21,753) is 9.9 kb long, and we call this region NRR2. Extensive diversity of coronaviruses in bats from China. These datasets were subjected to the same recombination masking approach as NRA3 and were characterized by a strong temporal signal (Fig.
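As a rough illustration of how a substitution rate translates into divergence over the timescales discussed here, the following Python sketch multiplies a rate by the elapsed time along one lineage. The rate of 0.00055 substitutions per site per year and the divergence dates of 1948, 1969 and 1982 are values quoted in this text. The sampling year of 2019, the function name and the assumption of a single constant rate are simplifications introduced for illustration only; this is not the authors' inference procedure.

```python
def expected_divergence(rate_per_site_per_year, tmrca_year, sampling_year):
    """Expected substitutions per site along ONE lineage, assuming a constant rate."""
    if sampling_year < tmrca_year:
        raise ValueError("sampling year must not precede the divergence date")
    return rate_per_site_per_year * (sampling_year - tmrca_year)


RATE = 0.00055          # substitutions per site per year (quoted in the text)
SAMPLING_YEAR = 2019    # assumed sampling year, for illustration only

for tmrca in (1948, 1969, 1982):
    d = expected_divergence(RATE, tmrca, SAMPLING_YEAR)
    print(f"divergence date {tmrca}: about {d:.3f} substitutions per site")
```

Because evolutionary rates can vary with the timescale of measurement, a constant-rate calculation like this one is best read as an order-of-magnitude check rather than as a dating method.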
This leaves the insertion of polybasic. This genotype pattern led to the creation of a new Pangolin lineage, B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage, which was renamed B.1.640.1. The extent of sarbecovirus recombination history can be illustrated by five phylogenetic trees inferred from BFRs or concatenated adjacent BFRs (Fig. However, on closer inspection, the relative divergences in the phylogenetic tree (Fig. G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. It is RaTG13 that is more divergent in the variable-loop region (Extended Data Fig. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. performed recombination analysis for non-recombining alignment 3, calibration of rate of evolution and phylogenetic reconstruction and dating. Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63 sequences. Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome. As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. B., Weaver, S. & Sergei, L. Evidence of significant natural selection in the evolution of SARS-CoV-2 in bats, not humans.
In the absence of any reasonable prior knowledge on the TMRCA of the sarbecovirus datasets (which is required for grid specification in a skygrid model), we specified a simpler constant size population prior. (2020) with additional (and higher quality) snake coding sequence data and several miscellaneous eukaryotes with low genomic GC content failed to find any meaningful clustering of the SARS-CoV-2 with snake genomes (a). Further information on research design is available in the Nature Research Reporting Summary linked to this article. By mid-January 2020, the virus was spreading widely within Hubei province and by early March SARS-CoV-2 was declared a pandemic8. These differences reflect the fact that rate estimates can vary considerably with the timescale of measurement, a frequently observed phenomenon in viruses known as time-dependent evolutionary rates41,43,44. In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable-loop region and CTD region excluding the variable loop. Coronavirus: Pangolins found to carry related strains. Developed by the Centre for Genomic Pathogen Surveillance. We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig.
The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus genome sequences. These authors contributed equally: Maciej F. Boni, Philippe Lemey. Regions A–C were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. is funded by the MRC (no. Conducting analogous analyses of codon usage bias as Ji et al. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 1879–1999), 1969 (95% HPD: 1930–2000) and 1982 (95% HPD: 1948–2009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. A reduced sequence set of 25 sequences chosen to capture the breadth of diversity in the sarbecoviruses (obvious recombinants not involving the SARS-CoV-2 lineage were also excluded) was used because GARD is computationally intensive. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. Posterior distributions were approximated through Markov chain Monte Carlo sampling, with chains run sufficiently long to ensure effective sample sizes >100. 4), that region and shorter BFRs were not included in combined putative non-recombinant regions.
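The effective sample size criterion mentioned above can be illustrated with a minimal Python sketch. It estimates the effective sample size of a chain by summing positive autocorrelations and stopping at the first non-positive lag. This simple truncation rule, the function names and the toy AR(1) chain are assumptions made for illustration; they are not necessarily the estimator used by the authors' software.

```python
import random


def autocorrelation(samples, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(samples, max_lag=None):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation (a simple heuristic)."""
    n = len(samples)
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(samples, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Toy example: a strongly autocorrelated AR(1) chain (hypothetical data).
random.seed(1)
chain = [0.0]
for _ in range(1999):
    chain.append(0.9 * chain[-1] + random.gauss(0, 1))

ess = effective_sample_size(chain)
print(f"chain length {len(chain)}, estimated ESS {ess:.0f}")
print("passes ESS > 100 check:", ess > 100)
```

A strongly autocorrelated chain yields an effective sample size far below its raw length, which is why a threshold is stated in terms of effective samples rather than iterations.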
The shaded region corresponds to the S protein. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses22,33; specifying all past events in their correct temporal order34 is challenging and not shown here. Eight other BFRs <500 nt were identified, and the regions were named BFR A–J in order of length. The SARS-CoV divergence times are somewhat earlier than dates previously estimated15 because previous estimates were obtained using a collection of SARS-CoV genomes from human and civet hosts (as well as a few closely related bat genomes), which implies that evolutionary rates were predominantly informed by the short-term SARS outbreak scale and probably biased upwards. Phylogenetic supertree reveals detailed evolution of SARS-CoV-2, Origin and cross-species transmission of bat coronaviruses in China, Emerging SARS-CoV-2 variants follow a historical pattern recorded in outgroups infecting non-human hosts, Inferring the ecological niche of bat viruses closely related to SARS-CoV-2 using phylogeographic analyses of Rhinolophus species, Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2, A Bayesian approach to infer recombination patterns in coronaviruses, Metagenomic identification of a new sarbecovirus from horseshoe bats in Europe, A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic, Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape, https://github.com/plemey/SARSCoV2origins, https://doi.org/10.1101/2020.04.20.052019, https://doi.org/10.1101/2020.02.10.942748, https://doi.org/10.1101/2020.05.28.122366, http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339,
http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331. Volume 5, pages 1408–1417 (2020). 62,63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. The existing diversity and dynamic process of recombination amongst lineages in the bat reservoir demonstrate how difficult it will be to identify viruses with potential to cause major human outbreaks before they emerge. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. We compiled a set of 69 SARS-CoV genomes including 58 sampled from humans and 11 sampled from civets and raccoon dogs. with an alignment on which an initial recombination analysis was done. Conservatively, we combined the three BFRs >2 kb identified above into non-recombining region 1 (NRR1). Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? Among the 68 sequences in the aligned sarbecovirus sequence set, 67 show evidence of mosaicism (all Dunn–Sidak-corrected P < 4 × 10⁻⁴ and 3SEQ14), indicating involvement in homologous recombination either directly with identifiable parentals or in their deeper shared evolutionary history; that is, due to shared ancestral recombination events.