SANTA CRUZ. UC Santa Cruz researchers have collected more than 10 million genome sequences of the COVID-19 virus from around the world and organized them into a family tree that maps the evolution of the coronavirus.
“This scale of data is truly unprecedented,” said Angie Hinrichs, a bioinformatics programmer at the University of California, Santa Cruz. “We’ve never had so many genomes of the same species before.”
In February 2020, after the first genome sequence of the coronavirus became available, UC Santa Cruz researchers adapted their existing genome browser to create a version designed specifically to store and display genetic variants of the COVID-19 virus. Since then, thousands of coronavirus sequences have arrived every day from researchers in the US and around the world, and managing that massive volume of data has been a challenge for the UC Santa Cruz team.
“The tools that were available to build phylogenetic trees before the pandemic could process a few thousand genome sequences, but all of a sudden we have tens of thousands,” said Hinrichs, who has worked with the university’s genome browser for more than 20 years.
With this in mind, a team of researchers that included Hinrichs set out to turn this overwhelming amount of information into a phylogenetic tree, essentially a family tree of the virus. Team member Yatish Turakhia, then a postdoctoral fellow, wrote a new program called UShER that allows scientists to quickly and accurately place coronavirus sequences into the massive phylogenetic tree hosted in the university's coronavirus browser. In June, the number of sequences in the tree passed 10 million.
For comparison, the organism with the next-highest number of sequences in the UC Santa Cruz Genome Browser is E. coli, with just over 5 million genome sequences, about half as many as have been collected for the coronavirus. Hinrichs notes that scientists have been studying E. coli for decades.
The university's coronavirus genome browser and the phylogenetic tree it hosts have allowed scientists to track the history of the virus as it spreads geographically and to identify new lineages and variants of concern, such as Omicron and its sublineages BA.1 and BA.2. Researchers can also use the tree to predict superspreading events or other potentially dangerous developments, reading them in the tea leaves of the phylogenetic tree as it continues to grow.
"I wish the pandemic would go away and the sequences would stop coming in so I could finish this project, but that hasn't happened yet," Hinrichs said. "As long as the virus continues to evolve, we will keep building this tree so that we can at least understand what it is doing."