5 Reasons We Need Integrated and Detailed SARS-CoV-2 Case Data

In this post, we aim to illustrate why we built Global.health and make the case for sustainable and large-scale integration of genomic, clinical, and epidemiological data to protect our communities and prevent pandemics. The more fully we understand the dynamics of the pandemic, the more effectively we can translate essential data for decision-makers to reduce the burden of disease.

Here are 5 reasons for integrated and detailed SARS-CoV-2 case data:


  1. Defining strategies for controlling an outbreak;

Controlling an infectious disease outbreak is incredibly difficult but still possible. To design control strategies, important basic characteristics of an infectious disease – such as the incubation period (infection to symptom onset) – need to be assessed rapidly and continuously (1, 2). Almost all recommendations on quarantine requirements are based in part on estimates of the incubation period (and how long someone is potentially infectious) (3).

For example, the incubation period can be assessed from patients who have a known travel history to case clusters (4, 5) or from infection pairs (6). However, this quantity varies because of individual variation, interventions (7), and genetic changes in the virus. This makes it important to continuously monitor and map individual-level case data against behavioural and genetic changes (8). (We include travel history data on Global.health where available.)

Figure (b) taken from Ref. 4: Incubation period of SARS-CoV-2 using case data from individuals who had reported travel history to Wuhan early in the pandemic.
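To make the travel-history approach concrete, here is a minimal sketch of how a crude incubation-period summary could be derived from a line list. The records, field names, and dates are invented for illustration and are not the Global.health schema. The sketch also uses a simple midpoint approximation, whereas published estimates typically fit a distribution to the interval-censored exposure windows.

```python
from datetime import date
from statistics import median

# Hypothetical line-list records; field names and dates are invented for
# illustration and are not the Global.health schema.
cases = [
    {"exposure_start": date(2020, 1, 10), "exposure_end": date(2020, 1, 12), "onset": date(2020, 1, 17)},
    {"exposure_start": date(2020, 1, 15), "exposure_end": date(2020, 1, 15), "onset": date(2020, 1, 19)},
    {"exposure_start": date(2020, 1, 8), "exposure_end": date(2020, 1, 14), "onset": date(2020, 1, 18)},
    {"exposure_start": date(2020, 1, 12), "exposure_end": date(2020, 1, 13), "onset": date(2020, 1, 16)},
    {"exposure_start": date(2020, 1, 11), "exposure_end": date(2020, 1, 11), "onset": date(2020, 1, 19)},
]


def incubation_bounds(case):
    """Return the shortest and longest incubation period (in days) that is
    consistent with the reported exposure window and symptom onset date."""
    shortest = (case["onset"] - case["exposure_end"]).days
    longest = (case["onset"] - case["exposure_start"]).days
    return shortest, longest


def incubation_midpoint(case):
    """Crude point estimate: the midpoint of the possible range."""
    shortest, longest = incubation_bounds(case)
    return (shortest + longest) / 2


def median_incubation(records):
    """Median of the per-case midpoint estimates."""
    return median(incubation_midpoint(c) for c in records)


print(median_incubation(cases))
```

Recomputing such a summary as new cases arrive, or separately by lineage or time window, is one simple way to monitor whether the incubation period is shifting.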


  2. Monitoring the evolution of infection and hospital utilization;

Hospitalisations have become the dominant lens through which governments and health ministries decide their SARS-CoV-2 intervention strategies. The number of hospitalisations that can be expected from case rates for each age group depends on well-estimated parameters. For example, 1,000 cases today can be expected to translate into n hospitalisations in 1, 2, or 3 weeks, depending on additional parameters.
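The 1,000-case example can be sketched as a small calculation. Every number below (the age split, the hospitalisation risks, and the delay shares) is an illustrative assumption rather than an estimate from any dataset. In practice these values would have to be estimated from linked case and hospitalisation data.

```python
# Illustrative assumptions only: none of these values come from real data.
cases_by_age = {"0-39": 600, "40-64": 300, "65+": 100}  # 1,000 cases today
hosp_risk_by_age = {"0-39": 0.01, "40-64": 0.05, "65+": 0.20}  # P(hospitalisation | case)
delay_share_by_week = {1: 0.5, 2: 0.3, 3: 0.2}  # share of admissions 1, 2, 3 weeks later


def expected_admissions(cases, risk, delay_share):
    """Expected hospital admissions per week after today's cases.

    Multiplies each age group's cases by its hospitalisation risk, then
    spreads the total over the following weeks using the delay shares.
    """
    total = sum(cases[age] * risk[age] for age in cases)
    return {week: total * share for week, share in delay_share.items()}


projection = expected_admissions(cases_by_age, hosp_risk_by_age, delay_share_by_week)
for week, admissions in projection.items():
    print(f"week +{week}: {admissions:.1f} expected admissions")
```

Under these assumed values the 1,000 cases yield about 41 expected admissions in total. Changing any single parameter changes the projection in direct proportion, which is why the estimates need to be kept current.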

Many of these parameters are context-specific (e.g., vaccinations, variants of concern, prior immunity, access to health care, underlying comorbidities, and ongoing community mitigation efforts, to name a few) and time-specific (9). Studies on the heterogeneity of case-hospitalisation-death rates are limited by the paucity of data globally, but they become possible when data on cases and hospitalisations are integrated. Global.health has brought together datasets from Brazil, Argentina, and Mexico, providing the baseline data to enable this research; the effort now needs to be scaled up. Individual-level metadata that tracks the progression of individuals in hospital is equally important for clinical planning. The risk factors that influence admission rates similarly affect the likelihood of a longer hospital stay or of ICU admission.

Taken from Lefrancq et al. Ref 7. Figure: Changes in probabilities of ICU admission and death. A. Daily number of hospital admissions as a function of time, from 13th March to 30th June 2020. Dashed lines denote the different windows of time (named T1-T5) used to estimate the changes in probabilities. B. Changes in probability of ICU admission given hospitalization, as a function of time. C. Changes in probability of death given hospitalization and no ICU admission, as a function of time. D. Changes in probability of death given ICU admission, as a function of time. E. Changes in the overall probability of death given hospitalization, as a function of time. We divide the epidemic into different periods of time: T1: 13 March - 1 April; T2: 2 April - 21 April; T3: 22 April - 11 May; T4: 12 May - 31 May; T5: 1 June - 30 June, T6: 1 July - 31 July, T7: 1 August - 31 August, T8: 1 September - 30 September, T9: 1 October - 31 October, T10: 1 November - 30 November. All changes are weighted by the proportion of patients that are of each sex. Changes are computed relatively to T1 (reference), estimates are presented in Tables S7–10. The dots and lines represent 2.5, 50, and 97.5 percentiles of the posterior distributions.


  3. Tracking symptoms, infectiousness, and symptomatic vs. asymptomatic rates;

Which symptoms are most prevalent? How do they change depending on comorbidities and between contexts? To enable this research, we track symptoms based on a unified global symptom ontology (see dataset from Brazil). For example, dry cough and loss of taste and smell were early symptoms of the original strain of SARS-CoV-2 (10), but infections with the Delta variant of concern may also involve a runny nose (11). Without mapping cases to lineage, we are flying blind, which makes it more difficult to control an outbreak of a new variant. Further, it is now well known that some genetic changes result in differences in peak infectiousness (12). Ideally, it should be possible to share data on virus shedding for a subset of individuals so that better protective measures can be taken to control spread.

Figure taken from (12): Viral load and cell culture infectivity in 25,381 SARS-CoV-2 infections. (A) Viral loads in presymptomatic, asymptomatic, and mildly symptomatic cases (PAMS; red), hospitalized patients (blue), and other subjects (black). (B) Expected first-positive viral load and cell culture isolation probability, colored as in (A). (C) Temporal estimation with lines representing patients, colored as in (A). (D) As in (C), but colored by age.


  4. Assessing the transmissibility and burden of SARS-CoV-2 Variants of Concern;

The future of SARS-CoV-2 will be governed by its ability to escape immunity and cause re-infection. Serious re-infections are of primary concern, yet no database that matches individual-level genomic, vaccination, and clinical data exists at the global scale. At this point, such studies come either from small study populations or from a few select countries where these datasets exist (Israel, United Kingdom; see Figure) (13, 14), and the data are usually not accessible to the wider research community. Brazil has recently made the vaccination status of severe infections openly available across all municipalities, and we plan to integrate these data with the Global.health platform. (See a detailed description of the dataset here. NB: virus genomic data are still not linked in the national database.)

Figure taken from (14): Kaplan–Meier plot showing survival (point estimates and 95% confidence intervals) among individuals tested in the community in England with (B.1.1.7) and without SGTF, in the subset for whom SGTF was measured. The inset shows the full y-axis range.


  5. Exposing health inequities.

How can we improve health for all? How can we make the distribution of vaccines and access to rapid diagnostic tests more equitable? It is now well established that Black and Hispanic populations in the United States are more vulnerable to infection with, and death from, SARS-CoV-2 (15, 16). Some countries now routinely report cases, hospitalisations, and deaths broken down by race and ethnicity (the US, Brazil), but many others are lagging behind in their reporting (e.g., Germany, France). Delivering health services that prevent infection in the most vulnerable groups must be a priority if we are to reduce the burden of this and future infectious diseases.

Figures (A) (B) taken from (16): Individual-level hospitalisation and death risk by age-standardised OR. (A) OR for severe acute respiratory infection (SARI) hospitalisation by race. (B) OR for SARI hospitalisation by income.
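For readers unfamiliar with odds ratios, the following is a minimal sketch of how an unadjusted OR and an approximate 95% confidence interval can be computed from individual-level counts. The counts are invented, and the function and variable names are illustrative. The study cited above reports age-standardised ORs, which require an additional adjustment step that is not shown here.

```python
import math


def odds_ratio(group_events, group_non_events, ref_events, ref_non_events):
    """Unadjusted odds ratio of an outcome (e.g. hospitalisation) in a group
    relative to a reference group, with a Wald-type 95% confidence interval
    computed on the log scale."""
    estimate = (group_events * ref_non_events) / (group_non_events * ref_events)
    # Standard error of log(OR) from the four cell counts of the 2x2 table.
    se_log = math.sqrt(
        1 / group_events + 1 / group_non_events + 1 / ref_events + 1 / ref_non_events
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Invented counts: 30 of 100 cases hospitalised in one group,
# 15 of 100 cases hospitalised in the reference group.
estimate, lower, upper = odds_ratio(30, 70, 15, 85)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A calculation like this is only possible when case records carry both the outcome and the demographic attribute, which is why disaggregated reporting matters.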

Integrated and detailed data are essential to inform both local and global responses to combat the ongoing transmission of SARS-CoV-2, and adapt to future challenges posed to our health systems and populations. Developing an open platform that makes critical data such as these available and actionable for the international public health community is a cornerstone of Global.health’s mission.

These concepts – and possible insights for further health gains – are also applicable to a wide array of pathogens. These far-reaching and life-saving applications – and the supporting technology systems to generate, track, and combine key data in real time – must be front-and-centre as we prepare for the next epidemic of a known threat, or the next pandemic from a previously unknown source.

David Pigott, Rebecca Katz, & Moritz Kraemer on behalf of the Global.health team.

References