On Nov. 26, the day the World Health Organization (WHO) named omicron the fifth variant of concern of the COVID-19 pandemic, US stock markets lost more than 2 percent of their value; the prominent Dow Jones Industrial Average fell more than 2.5 percent, its worst performance of the year. News of the variant, first reported by South Africa the day before, sent economic shockwaves through other markets as well. With global equity markets worth roughly $120 trillion, it's fair to say that the discovery of omicron had an impact measured in trillions of dollars in a single day.
Omicron's genetic profile closely matched the mix of mutations that variant modelers had predicted would let a virus dodge the protection of vaccines and previous infections, raising fears that the variant might spread rapidly around the world and cause widespread illness. Amid intense public interest in omicron and its economic and societal reverberations, governments scrambled to determine whether new vaccine formulations or new mitigation measures were necessary. Knowing the genomic sequence of a variant is critical to developing a public health response, and with omicron, the public had this information quickly.
But what if it had instead remained in the hands of a small number of private entities who made it their business to sell exclusive, “early access” insights on emerging variants to Wall Street trading firms? There are no doubt strong incentives to divine the signal from the noise in freshly generated variant sequencing data, and knowing first could make all the difference. Or what if the genomic data for omicron hadn’t been available at all, because the researchers who found it had little incentive to share the information?
Unfortunately, in the rapidly evolving genomic surveillance landscape, neither of these negative scenarios is out of the question for future variants.
Recent history shows that access to critical information about circulating viruses isn't guaranteed. And now, some important members of the scientific establishment are working to undermine a key source of viral genomic data: the Global Initiative on Sharing All Influenza Data (GISAID), a public-private global collaboration that has given researchers an incentive to share their findings and to which they have uploaded millions of SARS-CoV-2 genomes. At the same time, big technology corporations like Microsoft, Oracle, and Google are eyeing viral genomic surveillance as a potentially lucrative source of data, raising the specter of a for-profit system. It's not hard to imagine a future in which much critical genomic data is privatized or otherwise inaccessible in the fight against pandemics.
GISAID and inequalities in viral genome sharing. Global access to virus samples and genome sequences has long been a contentious matter. For decades, the Western scientific establishment did not properly credit the researchers and others—often in the developing world or from oppressed groups—who discovered viruses or made other important contributions to medical research. And when virus discoveries made vaccines and therapeutics possible, they were usually developed in wealthy countries, and poorer countries have often found themselves last in line to access them.
The technology to sequence viral genomes and rapidly share colossal amounts of digital data has improved enormously in recent years. But we already know what happens when poorer countries recognize that richer ones are keener to gather data on their viruses than to share new vaccines and medicines: They simply stop sharing. In 2007, for example, the Indonesian government, concerned that vaccines developed from its influenza samples would benefit only wealthy countries, refused to share H5N1 bird flu samples with the WHO.
In 2006, Peter Bogner, a Time Warner executive, and Nancy Cox, director of the Influenza Division at the Centers for Disease Control and Prevention (CDC) in the United States, devised a solution to encourage equitable genomic data sharing: the GISAID initiative. Their framework was designed to overcome scientists' hesitancy to share their influenza data rapidly, by giving people a way to share viral genome sequence data while ensuring that those who used the data credited them. The initiative, which formally launched in 2008, also encourages (and in some cases requires) data consumers to work with data generators, fostering collaboration among scientists around the world. Although Cox retired from the CDC in 2014, and Bogner and his team are based in Santa Monica, California, where he lived before GISAID was founded, GISAID essentially operates as a series of public-private partnerships with governments, public health agencies, and academic institutions in Germany, Argentina, Brazil, China, the Republic of the Congo, Ethiopia, Indonesia, Malaysia, Russia, Senegal, Singapore, and South Africa. The computers that hold the viral genome data were initially based in Geneva, Switzerland, but after a 2009 dispute with the Swiss Institute of Bioinformatics, the databases were moved to Germany. The organization lists its sources of funding, which include a major grant from the Rockefeller Foundation, as well as smaller contributions from the WHO, the Institut Pasteur (France), various governments, and biopharma firms such as Roche, GSK, Merck, J&J, and Pfizer.
Although the GISAID sharing mechanism was originally designed for influenza genome data, it has truly proven its worth during the COVID-19 pandemic. On April 4, researchers from the Bandung Institute of Technology in Indonesia uploaded a genome of BA.2, a subvariant of omicron, to GISAID, marking 10 million sequences shared since the beginning of the pandemic. Approximately 50,000 submitters in more than 200 countries rely on GISAID to share their data on influenza, SARS-like coronaviruses, and respiratory syncytial virus (RSV), a common respiratory virus that can be harmful to young children and the elderly.
GISAID's attribution requirements are about more than giving scientists an ego boost; credit means scientific credibility, opportunities for funding, and, more broadly, the ability to cultivate human capital and infrastructure in a region. When leading Congolese microbiologist Jean-Jacques Muyembe demanded that Ebola samples from an outbreak in the Democratic Republic of Congo that began in 2018 remain in his country, he appeared to force the hand of scientists in richer countries who wanted to study the Congolese pathogens, better positioning his domestic colleagues to receive resources and credit.
By 2019, Muyembe had a long and instructive history of confronting biases in the biomedical enterprise that favor researchers in richer countries, primarily in Europe and North America. After earning a doctorate in Belgium in the late 1960s, he returned to his home country, then known as Zaire and now named the Democratic Republic of Congo, to work as a field epidemiologist. In 1976, he encountered patients suffering from a new viral disease. Without access even to medical gloves, Muyembe took patient samples and provided them to Belgian scientist Peter Piot, who co-published the discovery of Ebola without naming Muyembe.
Despite the snub, Muyembe's career flourished, and when Ebola struck his country for the 10th time, in an epidemic beginning in 2018, he was able to make a startling demand: All blood samples would stay in the country. It was a decision born out of frustration with slights that included African scientists not receiving credit for work they had performed. According to NPR, as Muyembe announced his decision on the blood samples, Japan agreed to invest in a state-of-the-art research facility in the Democratic Republic of Congo.
One of Muyembe’s legacies, he told the outlet, is that “if another young Congolese scientist finds himself with an interesting blood sample, he’ll be able to investigate it right here in Congo.”
Despite GISAID's success as a sharing mechanism during public health emergencies, it has also faced controversies and outright attacks. The organization has been criticized in glossy scientific journals such as Science. But these criticisms come almost entirely from scientists in wealthy countries who would prefer to be able to access and benefit from other people's data anonymously. Last April, for example, former National Institutes of Health director Francis Collins sent an email in which he proposed that governmental funding agencies in the United States and Europe, as well as the nonprofit Gates Foundation, use their collective heft to curtail GISAID's role by enforcing "public domain" sharing of viral genome data.
Likewise, the Gates Foundation, which has a major focus on global health and has given more than $60 billion in grants since 1994, is moving forward with multiple initiatives to fund pathogen genomic surveillance in so-called "lower middle-income countries" in Africa, Southeast Asia, and Latin America. In a 2021 funding call, a Gates Foundation-sponsored organization, the Public Health Alliance for Genomic Epidemiology, attempted to compel scientists in these countries who accepted its funding to deposit their viral genome sequencing data into the public domain. Doing so would strip these scientists of any right even to be credited by name when authors from wealthier countries use their data in publications or grant applications.
Public domain repositories. But what is public domain sharing? And why isn't it the preferred way to share coronavirus sequence data? Public domain databases such as the US National Center for Biotechnology Information's GenBank or the European Nucleotide Archive are instrumental for biology research, and almost all genetic sequence data is, in fact, shared in the public domain. But public domain sharing, whether of music, software, photography, art, or genetic data, means exactly what it says. When you put your work into the public domain, it belongs "to the public." This means anyone can use it for any reason without attribution or any obligation to share benefits.
There are virtues to public domain sharing of genetic sequence data. The public domain is a fantastic way for society to extract the maximum value from genetic data once it has already been generated, for instance by taxpayer-funded research, because the products, in principle, can be used for the benefit of everyone. But its virtues are inextricable from its drawbacks. A major problem with public domain sharing is that it gives scientists no incentive to generate or rapidly share viral genome data. Instead, they often allow their data to sit on computers until they get a manuscript accepted or a grant funded. This reluctance to share genetic data rapidly is a recipe for disaster in public health emergencies like pandemics, because fresh viral sequence data from many regions are needed to discover new variants, to determine which ones are growing faster than others, and to keep diagnostics and vaccines up to date.
An omicron sequence did not show up in the public domain until Nov. 29, almost a week after data on the variant became available on GISAID (Nov. 23). By Nov. 30, only two omicron genome sequences had been made available in the public domain, both from a single country, Belgium. By then, the GISAID community had already shared 230 omicron genomes from 13 countries on three continents. The public health merit of eliminating or marginalizing such an important system for the early detection of new viruses and viral variants seems dubious.
The push for public domain sharing ignores the very problem GISAID was built to solve. A system that allows anyone, anywhere, to access and use a researcher's data anonymously, for any reason, without so much as crediting them by name offers no incentive for rapid sharing. Therefore, if public domain sharing became the norm, the incentives to provide data in a timely way would have to come from outside, in the form of financial contracts or grant mechanisms stipulating that data be deposited into the public domain.
It's hard to see how a global viral genomic surveillance system that relies on external inputs like grant funding, which are at the mercy of federal budgets and philanthropic whims, could detect the next pandemic virus early enough to stop it in its tracks. It's even harder to see how such an approach would succeed if all the data must be shared in the public domain.
A for-profit route? Will corporations and scientific agencies in Europe and North America inadvertently build a viral variant detection framework that gathers data from poor countries to protect those of us in wealthier places? What would happen if data analyses conducted on computers in Seattle, Berlin, and Paris demonstrate the existence of a terrifying new virus circulating in Uganda, Nigeria, and Cameroon? Who would see the data first?
Given GISAID's prominent role in curating and distributing data that has such profound effects on economic activity around the world, it is perhaps no surprise to see a coterie of big tech and geopolitical players trying to lay down new infrastructure and stand up services that can intercept microbial genomic surveillance data at its source. Microsoft, Oracle, and Google's Verily Health arm have all shown strong interest in becoming involved in microbial genomic surveillance.
If corporate technologies and investments become a key part of the pipeline to detect new viruses, what would prevent corporations from selling early access to such "signals" to hedge funds or to firms that trade in airline stocks or oil futures? It is easy to see how quickly things could turn dystopian when timely genetic data are generated and handled in non-transparent ways. In fact, we already know that firms like 23andMe and health data brokerages trade patient data for profit. What's to stop this type of for-profit, entrepreneurial system from becoming the default way we track the emergence of new viruses and viral variants?
Today, even in wealthy countries like the United States, coronavirus genome sequences are decoded by firms like LabCorp, Quest Diagnostics, Aegis Sciences, and many other companies, some of which have deals with stores like Walgreens to carry out COVID-19 PCR tests on patient nasal swabs. In 2021, the CDC put in place contracts that pay these firms to decode the viral genomes from 25,000 samples a week. This data is, of course, valuable, but patients are never told that their samples might generate up to $200 in net revenue for the work of sequencing the viral genome. After being decoded, these viral genomes are deposited in the public domain. And even when smaller academic labs do the sequencing, funding agencies like the National Institutes of Health and CDC require them to deposit their data into the public domain, where corporations could freely use it to generate insights that they could then sell for profit.
A particular problem in the United States, which does not guarantee health care as a right to its citizens, is that a person, insured or not, may well pay out-of-pocket costs for a coronavirus test and for medical care while people behind the scenes capture additional revenue from their illness by sequencing the viral genomes in their samples. That revenue is not directly used to offset the patient's medical costs or improve their care.
In developing countries, the situation could become even more objectionable. The hundredth genome sequence from a crowded slum in a developing or low-to-middle-income country like Brazil or India could be far more valuable than the millionth sequence from New York City or London, because crowded conditions and a lack of medical resources can speed viral evolution. Will we allow private entities to quietly build and profit from their own viral genomic surveillance networks?
One of the lessons of the COVID-19 pandemic is that viral genome data can move stock markets, send the price of oil plummeting, and affect the revenue of major hotel chains and airlines, with cascading effects on just about everything and everyone. In the wake of COVID-19, investments are being made to ensure that global economies are not caught off guard again by a virus like SARS-CoV-2, which was able to spread undetected for weeks.
Getting this right won't be easy. Yet it's fair to say that genetic data that comes from people's bodies, even if it is the genomes of disease-causing microorganisms, is no ordinary commodity. New technologies are making it possible to produce and monetize new data streams in ways we could not previously have imagined. Implemented properly, advances in genome sequencing could let us keep tabs on myriad infectious diseases and track their evolution in real time. In fact, genome sequencing is poised to become the primary way that infections are diagnosed. But if the few are allowed to profit at the expense of the many, the spigots of viral genome data will dry up. People will simply have no incentive to donate samples, and new viruses won't be detected until after the most severe cases show up at hospitals. Time and time again, humans take advantage of one another to make a buck. Some of that is just capitalism. That said, if we do not work to build equitable and transparent ways of generating and sharing microbial genome data from people who are ill, we will be cheating ourselves out of an early warning system that could prevent the next big pandemic.
The challenge going forward is to avoid abusing or betraying trust. If wealthy countries want to know about viruses circulating in poorer countries, they must establish partnerships built on trust and transparency.