I re-rooted the whole-genome phylogenetic tree of Covid-19 (2019-nCoV; SARS-CoV-2). The re-rooted phylogeny suggests that the most recent common ancestor (MRCA) might have been circulating in humans for several months before the outbreak in Wuhan in Dec. 2019; the outbreak might be related to a non-synonymous Ser > Leu mutation in ORF8.
The auspice site has collected dozens of whole-genome sequences of Covid-19 and provides a good visualization of the phylogeny. The website places the node of the biggest “star cluster” at the beginning of the outbreak, and that node is commonly believed to be the origin of the epidemic. However, this phylogenetic tree is unrooted. Since the mutation rate of a virus may not follow the molecular clock assumption, the most recent common ancestor, or the “entrance of the tree” (cf. “Y-chromosomal Adam” or “mitochondrial Eve” for modern humans), should be confirmed using outgroup sequences. Just as the human Y-chromosomal and mitochondrial trees were rooted using chimpanzee or Neanderthal sequences, I re-rooted this tree using RaTG13, the closest known strain to Covid-19, which was found in a bat in Yunnan province.
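To make the idea of outgroup rooting concrete, here is a minimal sketch in plain Python. It is not the procedure used for the figures: the tree, the node names and the helper `root_with_outgroup` are all hypothetical, and the only point is that the root is placed on the branch leading to the outgroup, so the ingroup node next to the outgroup becomes the “entrance of the tree”.

```python
from collections import defaultdict


def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    edges: iterable of (node_a, node_b) pairs describing an unrooted tree.
    Returns a nested tuple (outgroup, ingroup_subtree); internal node
    names are dropped and only leaf names are kept.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    if len(neighbours[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf of the tree")

    def subtree(node, parent):
        # Everything reachable from `node` without going back to `parent`.
        children = sorted(n for n in neighbours[node] if n != parent)
        if not children:
            return node  # leaf
        return tuple(subtree(child, node) for child in children)

    (attachment,) = neighbours[outgroup]
    return (outgroup, subtree(attachment, outgroup))


# Toy unrooted tree (hypothetical names, not real samples):
# two big-cluster leaves hang off n1, while a minor leaf and the
# outgroup hang off n2.
toy_edges = [
    ("n1", "big_cluster_a"),
    ("n1", "big_cluster_b"),
    ("n1", "n2"),
    ("n2", "minor_a"),
    ("n2", "outgroup"),
]

rooted = root_with_outgroup(toy_edges, "outgroup")
print(rooted)
```

In this toy case the root lands next to `minor_a` and the big cluster ends up as a derived clade. That is the kind of rearrangement an outgroup can produce even when the largest cluster looks like the natural starting point.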
I discovered that the largest “star cluster” is not the MRCA of Covid-19! The root is shown in red in Fig. 1 and Fig. 2. At three mutations, C8782T (Fig. 3), T28144C (Fig. 4) and C24034T (Fig. 5), the bat coronavirus RaTG13 is identical to the Covid-19 sequences on the right of Fig. 2 but differs from the others. The root should therefore be located at the position in red (no sequence has been found at this exact position; the closest sequence is one mutation away). The change at pos. 28144, in ORF8, is non-synonymous: read forward from the new root it is a Ser > Leu substitution, which might have enhanced the virulence. The rooting also implies that Wuhan might not be the only candidate for where the first human infection occurred; the virus might have circulated for several months before December 2019 and become known only because of the rise in virulence in Wuhan. Future investigation of the sequences near the root might reveal hints about the origin of Covid-19.
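As a result, I found that the biggest “pie” is not the root of the tree! The real root is at the position marked in red in Fig. 1 and Fig. 2. At three mutations, C8782T (Fig. 3), T28144C (Fig. 4) and C24034T (Fig. 5), RaTG13 agrees with the samples toward the right of Fig. 2. This means that the ancestor of all Covid-19 sits at the red cross (so far no sample with exactly that sequence has been sequenced; the closest is one step away). The ORF8 mutation (ORF8 is one of the translated proteins) causes a serine-to-leucine (Ser > Leu) residue change, which may be an important site for the increase in virulence. It also suggests that Wuhan may not be where the virus originated: the virus may have been spreading among humans for several months before December and became known only when it broke out in Wuhan because of increased virulence. The samples closest to the true root of the tree may offer more clues about the original source of the virus.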
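The reasoning at the three sites can be written as a small sketch. It assumes that each label such as `C8782T` reads as "reference base, position, alternative base" and that the outgroup (RaTG13) carries the alternative base at all three sites, as described above. The two samples are hypothetical allele patterns, not real sequences, and the function names are mine.

```python
# The three mutation labels named in the text.
MUTATIONS = ["C8782T", "T28144C", "C24034T"]


def parse(label):
    """Split a label like 'C8782T' into (position, reference_base, alternative_base)."""
    return int(label[1:-1]), label[0], label[-1]


def ancestral_states(labels):
    """Ancestral base per site, taken from the outgroup.

    Assumption: the outgroup carries the alternative base of every label.
    """
    return {pos: alt for pos, _ref, alt in map(parse, labels)}


def steps_from_root(sample, ancestral):
    """Count the sites where a sample differs from the inferred ancestral state."""
    return sum(sample[pos] != base for pos, base in ancestral.items())


ancestral = ancestral_states(MUTATIONS)

# Hypothetical allele patterns, for illustration only.
samples = {
    "star_cluster_like": {8782: "C", 28144: "T", 24034: "C"},
    "near_root_like": {8782: "T", 28144: "C", 24034: "C"},
}

distances = {name: steps_from_root(alleles, ancestral) for name, alleles in samples.items()}
closest = min(distances, key=distances.get)
print(distances, closest)
```

Under these assumptions the sample that shares more alleles with the outgroup is fewer steps from the inferred root. The same logic puts the root away from the largest cluster.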