
Tracing coronavirus’ family tree

How a website designed to track the flu is now mapping the spread of COVID-19

By Tony Rehagen

Back in February, when the public was wondering where the novel coronavirus might travel from its epicenter in China, Samuel Scarpino was at a computer in Boston, trying to find out where the virus had already been. Scarpino, an assistant professor at Northeastern University’s Network Science Institute, is a mathematical epidemiologist: a detective, essentially, hot on the trail of the perpetrator pathogen. Like any good sleuth, Scarpino was starting his investigation by building a composite picture of what the virus looks like.

Today, molecular epidemiologists can map a virus’ genetic code and monitor its mutations as it spreads. Thanks to a website called Nextstrain, they’re doing it in real time. From one positive nose swab test, scientists can map and sequence a virus’ entire 30,000-letter genome. Researchers like Scarpino can then instantly upload their results and use Nextstrain to compare them with other samples from around the globe, looking for variations. The result is an evolutionary family tree, a sort of composite picture of the COVID-19 virus.
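For readers curious what "looking for variations" means in practice, here is a minimal sketch in Python. It is not Nextstrain's code: the function name `find_mutations` and the 12-letter sequences are invented for illustration, and it assumes the two genomes have already been lined up letter for letter. Real analyses work on roughly 30,000 letters per sample and use statistical methods to build the family tree from differences like these.

```python
def find_mutations(reference, sample):
    """Return (position, reference_letter, sample_letter) for every site
    where two aligned, equal-length sequences differ. Positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample), start=1)
        if r != s
    ]


# Toy fragments, made up for this example (not real SARS-CoV-2 data).
reference = "ATGGCTAGCTAA"
sample_a = "ATGGCTAGCTAA"  # identical to the reference
sample_b = "ATGACTAGCTGA"  # differs at two sites

print(find_mutations(reference, sample_a))  # []
print(find_mutations(reference, sample_b))  # [(4, 'G', 'A'), (11, 'A', 'G')]
```

Two samples that share the same rare changes are likely close relatives, which is the kind of clue researchers use to connect cases.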

Nextstrain is emerging as a powerful tool for tracking the coronavirus and learning more about how it spreads. By studying the virus’ mutations in Nextstrain, researchers can tell where each version of it came from, how it spread, and thus where and how it escaped containment. That gives government and public health officials the clues they need to take action and stay ahead of the virus.

The interactive visualization, a set of color-coded horizontal “trees” overlaid on a clickable map, clearly illustrates changes and mutations in the virus’ genetic code with the click of a mouse. It’s available online for any researcher or curious layperson to see (though to draw conclusions from the data, it helps to have some background in genomics).

Until this year, Nextstrain was little known outside the relatively isolated community of viral genomics researchers. That quickly changed in early 2020, with a world desperate to curtail a global pandemic. Researchers didn’t have time to wait for traditional peer-reviewed publication. They needed answers as soon as possible. Nextstrain’s moment had arrived.

Some of Nextstrain’s key contributions came soon after the epidemic surfaced. In mid-January, computational biologist Trevor Bedford — who co-created Nextstrain with biologist Richard Neher — noticed that strains in China and southeast Asia had moved from host to host before they’d had time to mutate. In a January 31 blog post, and in messages to health officials the week before, Bedford warned that transmission from person to person was probably far more widespread than previously thought.

On January 19, health officials had identified the first known U.S. coronavirus patient: a man in Snohomish County, Washington, just north of Seattle, who’d recently visited family in Wuhan, China. A month later, a high school student in Snohomish County tested positive for SARS-CoV-2, the virus that causes COVID-19. The teen hadn’t been to China.

“Our understanding of the virus is evolving rapidly. The visualization gives us a better idea of how it’s [spreading] over time.”

Samuel Scarpino of Northeastern University’s Network Science Institute

But when researchers used Nextstrain to compare the traveler’s virus and the student’s, they found the two were almost genetically identical, likely direct relatives. The finding suggested that coronavirus had been silently spreading via personal contact for a month, infecting hundreds of Seattle-area residents. (Recently, new data revealed a nearly identical strain in nearby British Columbia, raising the possibility that the Washington outbreak did not start with the Wuhan traveler.)

“It was unquestionable from their early reports that there was community transmission,” says Scarpino, who helped sound that alarm in Boston and has co-authored a study of what did and didn’t work in China to contain the virus.

Armed with that information, local authorities and public health officials in Washington State implemented social-distancing measures to help stem the outbreak, setting an example for other governments throughout the United States. “Just being able to show that was really important,” says Scarpino, “for jump-starting what had been a pretty stagnant response in the U.S.”

Nextstrain started in 2014 as a website called nextflu, co-created by Bedford at the Fred Hutchinson Cancer Research Center in Seattle. Drawing from data uploaded to GISAID, an open-source database, the site offered real-time tracking of seasonal influenza viruses. The platform’s ability to convert data into clear, interactive graphics also proved useful for monitoring other global outbreaks, including Ebola in 2014 and the Zika virus in 2016.

“The interactive visualization was the new component,” says Shirlee Wohl, a postdoctoral fellow in epidemiology at the Johns Hopkins Bloomberg School of Public Health. She’s used Nextstrain since 2016, when she contributed genomic data on a Massachusetts mumps outbreak.

“A lot of times, writing papers, there are six different variables,” says Wohl, “and you try hard to create compelling figures that highlight the analysis and results. But it’s hard to do that.” Many researchers had published papers showing pathogens’ family trees. “But there was no push to make them customizable and interactive, where people could click on and color in different ways,” Wohl explains. “This allows users to explore data on their own.” The site’s analysis and visualization tools are open-source code that anyone with basic bioinformatics training can use.

That’s proven helpful for tracking the new coronavirus. Researchers used Nextstrain to see how the virus spread from China to Washington State and to discover that many of New York City’s cases had originated in Europe. “Because it’s unfolding rapidly and the geographic breadth of the spread is happening rapidly, our understanding of the virus is evolving rapidly,” says Scarpino. “The visualization gives us a better idea of how it’s happening over time.”

Showing the virus’ natural origins via Nextstrain — that its structure is unique and not some mix of other pre-existing viruses — also helped scientists dispel false and dangerous conspiracy theories that claimed SARS-CoV-2 was engineered as a weapon in some lab.

Nextstrain is triggering a tectonic shift in the larger world of academia. Since the fight against this fleet-footed coronavirus is so urgent, the website needs the data as soon as researchers can map it. There is no time for the lengthy peer review that traditional papers receive before publication. So academics have emerged from their ivory-tower labs and banded together to face a common enemy.

“It’s been really awesome to see the scientific community adopt this platform,” says Wohl. “Nextstrain is an early adopter of making the data available for everyone and giving credit where it’s due.”

The lack of peer review carries risks, but Nextstrain proponents say researchers can compensate by understanding the difference between preliminary results and peer-reviewed results, and by relying on the growing Nextstrain scientific community to police itself.

“Like the viruses we study, science also changes over time,” says Anderson Brito, postdoctoral associate at Yale School of Public Health. “Nextstrain is a real-time representation of how scientific knowledge evolves.”

In January, at the earliest stages of the coronavirus epidemic, only a few dozen variant SARS-CoV-2 genomes were available, not enough to draw specific conclusions. Now, more than 4,500 different genomes from samples of the virus are mapped on Nextstrain. “As the number of genomes increased, more detailed patterns of viral spread and evolution started to be more evident,” Brito says.

Researchers affiliated with Nextstrain are also now planning a sister site, NextTrace, which will apply Nextstrain’s tools to contact tracing. That’s the process of identifying people who have had contact with a positive COVID-19 patient during the epidemic, and asking them to self-quarantine — another strategy for combatting and ultimately defeating this invisible foe.

After the coronavirus fight slows down, Scarpino says, the Nextstrain community will also look back to see how people reacted to its information. “We need to learn about what we got right, and what we got wrong, and what we could have done differently,” he says. “It’s those things we do in between epidemics that — hopefully — make us more prepared for when the next one comes.”


Tony Rehagen is a writer based in St. Louis.


Illustration by Errata Carmona

