Curator Review for 2022: Reflecting on the mpox and Ebola outbreaks
In 2022, the Global.health (G.h) team completed our 100 Days Mission for two emerging infectious disease outbreaks - mpox (formerly named monkeypox, May - September) and Ebola (September - December). Both outbreaks posed an immediate threat to global public health, and we responded by creating openly accessible epidemiological line-list data to support response, providing early situational awareness during the period when the chance for outbreak containment is highest. Our curation team experienced recurring challenges in building an emerging disease dataset in real time, and here we highlight similarities and differences between the outbreaks.
Both outbreaks tested our tracking methodology, but their scale was very different. The mpox outbreak was widespread, impacting over 110 countries/territories and resulting in more than 84,000 confirmed cases worldwide. Our curation team reviewed over 50 sources per day to update line-list information. In contrast, the Ebola (Sudan ebolavirus strain) outbreak was contained to one country - Uganda. When the end of the outbreak was declared on January 11th 2023, there had been only 142 confirmed cases. Our curators focused on one primary source for outbreak information - Situation Reports (SitReps) from the Ugandan Ministry of Health (MoH). You may think that one source is easier to triage than 50, but the quality and reliability of information have a large impact on our ability to create and maintain a line-list dataset, regardless of the number of sources.
Early in an outbreak, curators are developing and refining our sourcing strategy. Often, sources and the information they provide are not consistent in format, detail, or timing of delivery. We spend a lot of time identifying, vetting, and augmenting information with secondary media sources. As time passes, information may become more centralized and predictable from official sources; this is the pattern that we observed for mpox. For each outbreak, we establish a source list and prioritize the order and frequency for review based on resources. Our curators sync and discuss pain points frequently. From these conversations, we created a “Top 10” list of curation challenges: access to official information; time delays; data gaps; format changes; country-level reporting that lacks geographic granularity; aggregated counts; inconsistent details; confusing statements (including translation); data entry errors; and data reconciliation exercises. SitReps have presented every challenge in our Top 10 list during the Ebola reporting period, and as a result we are working with many limitations and assumptions.
When the Ebola outbreak was first declared in September 2022, our curation team was limited by access to official information. SitReps were not publicly available at the outset, and we relied on news media for case information. The first MoH SitRep that was publicly released [#10] already totaled 54 confirmed cases, including 35 deaths; we could not review the early evolution of cases over time and had to use secondary sources to create the line-list and add key metadata [e.g. location; outcome; occupation]. To understand the impact of this outbreak, we needed to track cases, deaths, and recoveries; geographic spread; and the affected population (healthcare workers). We quickly found that the SitRep formatting and data were inconsistent and contained many errors, and that data changed from one report to the next without explanation. These inconsistencies and errors directly impact the integrity of G.h data; we have built-in quality control checks to review and attempt to reconcile errors, which is a manual and very time-consuming process. Oftentimes, our curators will delay ingestion of cases or metadata if we observe a reporting inconsistency or suspect that a data entry error has occurred; we wait for the next SitRep to be released, compare the data to see whether the error has been corrected or updated, and then ingest accordingly. This QC step creates a lag in ingestion, but here we decided to prioritize quality and accuracy over speed.
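To make the QC step concrete, the comparison between consecutive reports could be sketched roughly as follows. This is an illustration only, not our actual tooling: the snapshot shape, the function name `districts_to_hold`, the placeholder district names, and the numbers are all hypothetical, and the rule it applies (cumulative counts should not decrease from one report to the next) is a simplifying assumption.

```python
from typing import Dict, List

# Hypothetical shape: district name -> cumulative counts from one report.
Snapshot = Dict[str, Dict[str, int]]


def districts_to_hold(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return districts whose ingestion should wait for the next report.

    Assumption: cumulative counts should never go down. A district is held
    if it disappears from the newer report or if any of its counts drops.
    """
    held = []
    for district, prev_counts in previous.items():
        curr_counts = current.get(district)
        if curr_counts is None:
            held.append(district)  # district missing from the newer report
            continue
        if any(curr_counts.get(field, 0) < value
               for field, value in prev_counts.items()):
            held.append(district)  # a cumulative count decreased
    return sorted(held)


# Made-up numbers for two placeholder districts (not taken from any SitRep).
earlier = {
    "District A": {"confirmed": 10, "deaths": 3, "recoveries": 2},
    "District B": {"confirmed": 5, "deaths": 1, "recoveries": 1},
}
later = {
    "District A": {"confirmed": 12, "deaths": 3, "recoveries": 4},
    "District B": {"confirmed": 4, "deaths": 1, "recoveries": 1},
}
held = districts_to_hold(earlier, later)
print(held)
```

In this sketch only the second placeholder district would be held back, because its confirmed count drops between the two reports. A curator would still need to review the flagged district by hand once the next report is released.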
The average case fatality rate (CFR) for Ebola is about 50%; an estimated half of infected people will die. This Ebola outbreak ended up ranking in the top 10 largest Ebola outbreaks ever recorded in terms of case magnitude. We needed to log case details, specifically Outcome [death/recovery status], to support downstream analyses. It was not feasible for our team to track Outcome for tens of thousands of mpox cases, but case numbers were smaller for Ebola and thus seemingly more manageable. However, even though Ebola case counts were relatively small, we struggled to assign Outcome in our line-list. The current G.h sum of Outcome by district-level location is misaligned with MoH SitRep case counts. Starting from SitRep #66, our curation team identified discrepancies and data entry errors in summary charts. SitRep #66 reported 4 recoveries in Mubende district and SitRep #68 reported an additional 2 new recoveries in Mubende. These increases resulted in the sum of the Outcome count exceeding the confirmed case count for Mubende [i.e. MoH SitReps count more deaths and recoveries for Mubende than there are confirmed cases, which suggests a data entry error]. As a result, the G.h dataset has 6 remaining cases [1 Mubende, 5 Kassanda] without an assigned Outcome. Because the MoH SitReps reported all remaining recoveries under Mubende, we are unable to assign Outcome status for the remaining G.h cases with confidence. Further, SitRep #68 reported a reclassification of Outcome for a Mubende case (death to recovery) after a data reconciliation exercise, but again our curation team was unable to determine which case ID to update because the SitRep did not provide enough detail. Curation methods, and the process to log Outcome, are reviewed in GitHub.
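The kind of discrepancy described above, where reported deaths plus recoveries exceed confirmed cases for a district, can be expressed as a simple consistency check. The sketch below is illustrative only: the function name `outcome_excess`, the data shape, the placeholder district names, and the numbers are hypothetical and do not reproduce SitRep figures or our curation code.

```python
from typing import Dict


def outcome_excess(counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Map each inconsistent district to its number of surplus outcomes.

    A district is inconsistent when deaths + recoveries is greater than
    its confirmed case count, which should not be possible.
    """
    excess = {}
    for district, c in counts.items():
        over = c["deaths"] + c["recoveries"] - c["confirmed"]
        if over > 0:
            excess[district] = over
    return excess


# Made-up cumulative counts for two placeholder districts.
report = {
    "District A": {"confirmed": 10, "deaths": 4, "recoveries": 7},
    "District B": {"confirmed": 5, "deaths": 2, "recoveries": 3},
}
flagged = outcome_excess(report)
print(flagged)
```

A check like this can only show that a report is internally inconsistent. It cannot say which individual case IDs are affected, which is why outcomes still have to be left unassigned when the source gives no further detail.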
Both outbreaks were brought under control. Officials were able to curb spread and contain them, but the lack of a coordinated global response, delayed guidance for risk communication and community engagement, and slow government action drew criticism from the public. A combination of increased situational awareness, changes in behavior, and vaccination curbed the spread of mpox, while more stringent measures like lockdowns were used to combat Ebola. Ugandans in the two most affected districts - Mubende and Kassanda - were placed under lockdown, which extended for 63 days. Lockdowns have many unintended consequences, including a lack of food and resources, household economic distress, negative impacts on mental health, increased reports of domestic violence, and persistent community stigma.
In both outbreaks, affected communities played a role in the collective response. For example, the “for us, by us” approach reflected a grassroots response to mpox, and was previously used for COVID-19 vaccination. The predominantly affected MSM community got involved to support messaging around mpox education, vaccination, and prevention. For Ebola, Village Health Teams were formed from volunteers who were trained in disease surveillance. Not only did they provide support in actively monitoring the symptoms of contacts, but they were indispensable in raising awareness and dispelling fear and stigma among community members, which the WHO mentions was a crucial aspect of the Ebola response. This type of community engagement resonates with us - our transparent, crowdsourced approach to sharing epidemiological information across borders and our GitHub collaboration aim “to put the public back in public health.”
The global public health community still has a lot of work to do in order to prepare for the next outbreak, and we are doing our part to improve access to open public health data. Our approach remains to contribute a line-list dataset that is open, granular, and standardized for epidemiologists and other responders to visualize, model, and mitigate the spread of emerging infectious diseases. But our data is only as good as the source data provided by public health institutions, and we have used this newsletter to highlight many limitations and identify areas for improvement. The threats are real. We are not fully prepared. We can all do better.
We continue to thank our user community for the many helpful contributions we have received through our GitHub and email [info@global.health]. Get involved!