
This section takes a close look at how case data--their definition, use, and management--influenced the Ebola response. Both case data about individual patients and caseload data, which comprised aggregated individual case data, were central to understanding the disease’s trajectory and to formulating corresponding aspects of the operational response. As one health expert explained, “Infectious disease requires strong operational linkage of interventions. The case investigation team has to provide information to the contact tracers and burial teams, and laboratories are providing results back to everyone. The data on epidemiology has to go to social mobilization folks. This is not technical but operational links are critical for this type of response.”[1]

Data about individual patients suspected of contracting Ebola were usually collected via a case investigation form (CIF). The data on these forms informed the range of operational activities associated with the response efforts. Information about contacts became the basis for tracing other potential Ebola cases (contact tracing); documentation about presenting symptoms, daily vital signs, and other health data became the basis for both patient treatment and epidemiological research; geographic data about an individual patient’s location were crucial for everything from isolation or quarantine for family members to social mobilization activities and the dispatch of ambulances or burial teams. 

The exchange of case data is also broadly reflective of how formal response actors (such as governments, international actors, and large NGOs) exchanged other types of data in mounting and deploying the response. Similarly, the challenges facing the collection of case data mirror those affecting the use of other data sets (e.g., laboratory tests, supply chain and logistics information about medical equipment and other supplies, or survey data about people’s knowledge, attitudes, and practices (KAP) related to Ebola) used in the response.

Case Data Definition and Use

Terminology 

Individual patients were characterized as falling into one of three categories: suspected, probable, or confirmed Ebola positive. 

  • A suspected case was any person, alive or dead, who had symptoms and known contact with a suspected, probable, or confirmed Ebola case or with a dead or sick animal; any person with unexplained bleeding; or any sudden, unexplained death. 
  • A probable case was any suspected case evaluated by a clinician, or any person who died from “suspected” Ebola and had an epidemiological link to a confirmed case but was not tested and did not have laboratory confirmation of the disease. 
  • A confirmed case was any probable or suspected case that was confirmed when a sample from that patient tested positive for Ebola virus in the laboratory.[2]
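
Expressed as decision logic, the hierarchy above might look like the following minimal sketch. It is illustrative only: the record fields and function are invented, it simplifies the suspected definition to its contact-based branch, and it is not drawn from any system used in the response.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseRecord:
    """Illustrative case record; every field name here is invented."""
    has_symptoms: bool
    known_ebola_contact: bool         # contact with a case or a dead/sick animal
    clinician_evaluated: bool
    died: bool
    epi_link_to_confirmed: bool
    lab_result: Optional[str] = None  # "positive", "negative", or None if untested

def classify(case: CaseRecord) -> str:
    """Apply the suspected/probable/confirmed hierarchy described above."""
    if case.lab_result == "positive":
        return "confirmed"
    if case.lab_result == "negative":
        return "non-case"
    meets_suspected = case.has_symptoms and case.known_ebola_contact
    if meets_suspected and case.clinician_evaluated:
        return "probable"
    if case.died and case.epi_link_to_confirmed:
        return "probable"   # died of "suspected" Ebola with an epi link, never tested
    if meets_suspected:
        return "suspected"
    return "unclassified"

print(classify(CaseRecord(True, True, False, False, False)))  # suspected
```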

Although these terms were standardized across the response, the interpretation of the terms differed across countries and organizations.

The Complications of Counting and Defining Ebola Cases

Compiling Ebola Caseload Data

Compiling data about the toll of Ebola for this report proved especially complicated. Readers may note that the numbers in this report differ from those in other sources, and that the report includes several sets of numbers (see Tables 1 and 2). As this section outlines, the Ebola case numbers differed, depending on whether the source included only confirmed cases, or confirmed, probable, and suspected cases (see Table 1, The Toll of Ebola, which contrasts these numbers). Whereas cumulative numbers were more dramatic and illustrated the scale of the response, the weekly case counts provided a picture of the state of the epidemic as it happened.

As laboratory testing facilities became more available and accessible, the accuracy of case counts improved and individual cases moved from classifications as either probable or suspected to confirmed cases or, if the result was negative, to non-cases. This affected the tallies of Ebola cases and deaths, across sources and dates, even though the patterns and trends remained relatively consistent. For these reasons, constructing the cumulative toll of the outbreak will remain a challenge; the numbers will likely never be fully reconciled. As one official suggested, capturing weekly case counts as opposed to the cumulative numbers represented a worthy tradeoff. “We were putting in the hands of responders the best data possible. For the world, getting an accurate picture was not going to happen.”[3] Although these challenges were acute at the start of the Ebola outbreak, data collection improved as the outbreak progressed, even though many of the data analysis challenges outlined in this report persisted.
 
Table 2, below, illustrates these changes over time, due to reclassification (e.g., a case moving from probable or suspected to confirmed, due to laboratory test results) and retrospective investigations. It serves as a reminder of the data picture closer to the peak of the Ebola outbreak in West Africa and of the “fog” that characterized the early days of Ebola case data collection.

Defining Ebola Cases

In addition to the challenges of compiling case data, the case definitions of Ebola changed between and within countries and over the course of the outbreak.[4] All three countries used the WHO case definitions as a starting point. Yet interviewees reported definitions sometimes changed without warning or without ensuring district surveillance officers and others were aware of or understood how to interpret the changes. This made case designations difficult. Those making the designations did not always understand the distinctions between the terms used for Ebola case definition (i.e., probable, suspected, confirmed), and not all probable or suspected cases were eventually reclassified as confirmed cases or, for negative results, as non-cases, even after the end of the outbreak. According to the WHO classification guidelines, all probable and suspected cases should have been confirmed by laboratory tests, ideally within seven days. Yet variables such as the availability of laboratory testing, the accuracy of the test, and when an individual was tested influenced when, whether, and how individual cases would shift between the categories of probable, suspected, and confirmed. At the height of the epidemic, delays in the manual input of data meant that these reclassifications would happen in batches. According to one official, “They would reassess a bunch of probable cases and they would become suspect or confirmed. I’m guessing that was due to a backlog of laboratory cases.”[5] As a result, the aggregated numbers of confirmed cases could shift from week to week, which created confusion with the caseload data counts.[6]
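
The batched reclassifications described above meant that historical weekly totals, not just the current week, could change when a backlog of laboratory results was processed. A toy recomputation illustrates the effect (all cases, weeks, and numbers are invented):

```python
# Toy illustration: recomputing weekly confirmed counts after a batch of
# laboratory results arrives. All cases and dates here are invented.
cases = [
    {"id": 1, "week": "2014-W38", "status": "probable"},
    {"id": 2, "week": "2014-W38", "status": "suspected"},
    {"id": 3, "week": "2014-W39", "status": "confirmed"},
]

def confirmed_by_week(case_list):
    counts = {}
    for c in case_list:
        if c["status"] == "confirmed":
            counts[c["week"]] = counts.get(c["week"], 0) + 1
    return counts

print(confirmed_by_week(cases))   # {'2014-W39': 1}

# A backlog of lab results comes in and reclassifies earlier cases ...
cases[0]["status"] = "confirmed"  # probable -> confirmed
cases[1]["status"] = "non-case"   # suspected -> negative result

# ... so the *historical* week totals change, not just the current week.
print(confirmed_by_week(cases))   # {'2014-W38': 1, '2014-W39': 1}
```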

Guinea

In Guinea, “probable” cases were only assigned when the individual in question was dead, and the cause of death was assumed to be Ebola.[7] A “suspected” case referred to a person who was sick and who met the case definition for Ebola at the time. Only those cases with a positive laboratory test were designated as confirmed cases. Individuals from certain geographic locations (i.e., préfectures with a confirmed Ebola case in the previous 21 days) were designated as suspected cases.[8] Case definitions changed multiple times in Guinea, sometimes without official approval or without notifying all health care providers of the changes. In one instance, a préfecture changed the definition without a change at the national level.[9] 

Liberia

Liberia’s Ministry of Health established the Ebola case definitions in use during the outbreak, based upon the WHO guidelines. In May 2016, well after the end of the epidemic, the Liberian MOH defined standard case definitions for priority diseases, including Ebola, as part of its Integrated Disease Surveillance and Response (IDSR) guidelines. Health workers conducted ongoing and systematic monitoring of disease in order to improve public health, a practice known as disease surveillance and response. This routine surveillance definition of a suspected Ebola case required fewer symptoms than the more sensitive definition that officials used during the outbreak.[10] This change emerged, in part, from a recognition that integrating Ebola disease surveillance into routine surveillance activities would assist the earlier detection and containment of future Ebola outbreaks.

Sierra Leone

The initial case definitions for suspected, probable, and confirmed Ebola cases originated with the WHO. In August 2014, the Sierra Leone Ministry of Health, with guidance from the WHO, changed the definitions to those described above. As in Liberia and Guinea, the case definitions were inconsistently applied across health facilities and treatment units, and were not always well understood by those completing the CIF or entering data into the VHF module database, which led to variation in how case data were interpreted and recorded.[11] 

Case Data Use

Health officials at the county, préfecture, or district level used case data to dispatch relevant operational responders, such as contact tracers and burial teams. These data also were aggregated and transmitted, often via email or daily or weekly phone calls to the national level, where they became the basis for aggregated country-level Situation Reports (SitReps). SitReps were the primary vehicle through which national governments, together with WHO and CDC, provided formal reporting about aggregated data on the outbreak and response on daily, weekly, and monthly bases. These SitReps were disseminated and cited in media stories and used to inform high-level national and international discussions about the Ebola outbreak and response efforts.
 
The publication of case data in aggregated form in PDF SitReps meant that the raw case data, graphics, and charts in these formal, public reporting documents were not easily accessible. The PDFs were not machine readable, nor could the data readily be transformed into other formats. This limited the ease with which other actors could use these data, requiring that data be manually re-entered before they could be re-used for analysis and reporting,[12] or transformed into new charts and graphs. One interviewee succinctly stated, “People will make a PDF SitRep out of a spreadsheet, and that’s the end of the spreadsheet,”[13] a reference to the lost ability to easily access or alter data electronically once they moved into PDF format. 
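
One mitigation, had the reporting pipelines supported it, would have been to publish the underlying table in a machine-readable format alongside each PDF. A minimal sketch of the idea (the file name, columns, and numbers are hypothetical):

```python
import csv

# Hypothetical aggregated rows that would normally be flattened into a PDF SitRep.
rows = [
    {"district": "Example District A", "confirmed": 12, "probable": 3, "suspected": 7},
    {"district": "Example District B", "confirmed": 4, "probable": 1, "suspected": 2},
]

# Writing the same table to CSV keeps it machine readable, so downstream users
# can re-chart or re-aggregate the data without manual re-entry.
with open("sitrep_2014-W40.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["district", "confirmed", "probable", "suspected"])
    writer.writeheader()
    writer.writerows(rows)
```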

Forms and Data

Case data, whether in paper or digital format, originated at various collection points. They were recorded by different actors and traveled among response entities in a variety of complex ways. One interviewee in Sierra Leone characterized data flows in this way: “Data flow was nuanced across districts; it varied by district, and between countries. The reason for that was because of the number of different ways that a case form could be completed and lab results generated. You could have an Ebola case be an ill person arriving at hospital or a ... dead body picked up by burial teams.” In the former, responders would complete a CIF whereas a laboratory test would define the latter as an Ebola case, or a non-case.[14] Although the primary flow of case data remained the same across countries (from treatment unit to district/county/préfecture to national level), the flow differed by response actor and the type of data being shared. For example, in Guinea, WHO managed case data, which préfecture officials sent to a central, national-level repository where a limited number of data managers entered data. In Sierra Leone, the CDC managed case data in the EpiInfo VHF module. In Liberia, case data were managed by the Ministry of Health, with support from the CDC and WHO. In each of these cases, however, the data belonged to the respective national ministry of health.[15] 

Collecting Case Data - Case Investigation Forms (CIFs)

Case data, a primary data source used to inform the response, usually were collected on paper CIFs at the point of identification as probable, suspected, or confirmed Ebola cases. The primary Ebola CIF used in this outbreak was originally developed for and used in previous Ebola outbreaks in Uganda and the Democratic Republic of Congo. The CIF was based on the data tracked in the VHF disease surveillance module for EpiInfo.[16] Case data, however, could be collected multiple times, primarily because routine ministry of health-level surveillance for malaria, cholera, or other infectious diseases occurred separately from the overall response. As a result, case data of various types in all three countries were collected in multiple formats (both paper and digital) in treatment units by those providing care, by those responsible for routine health ministry disease surveillance (supported by WHO), as well as by those investigating Ebola cases or tracing contacts of Ebola cases as part of the overall Ebola outbreak response.[17]
 
As mentioned above, a number of variables in the response had an impact on whether, how, and the number of times CIF data were collected. For example, where, when, and how a person was identified as a probable, suspected, or confirmed Ebola case affected the amount and type of information collected in CIFs. Interviewees indicated CIFs were not always completed, as in instances in which a patient was too ill to answer questions, and family members were unavailable or unwilling to provide information. At times responders were too overwhelmed with cases to complete CIFs for each patient, as was the case in many Ebola treatment centers at the height of the outbreak. If an individual presented with symptoms at multiple facilities on different dates, one individual might be associated with several CIFs. Finally, deceased patients were tested for Ebola and had lab results, which data managers at the national or county level later associated with a unique patient ID. Their CIF indicated their status as a confirmed Ebola case or as a non-case, and was complete to varying degrees, depending upon the information case investigators were able to discover about the deceased.[18] These forms were eventually manually transcribed into electronic format and compiled either in Excel spreadsheets or in digital databases.
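
Where several CIFs pointed to the same person, data managers effectively had to consolidate them under one patient ID. The following sketch shows one simple way such a merge could work; the field names and records are invented, and this is not a description of how any particular response database operated:

```python
from collections import defaultdict

# Illustrative CIF records; one patient (P-001) has two partial forms
# from presentations at different facilities.
cifs = [
    {"patient_id": "P-001", "symptoms": "fever", "contacts": None},
    {"patient_id": "P-001", "symptoms": None, "contacts": "3 household"},
    {"patient_id": "P-002", "symptoms": "vomiting", "contacts": "1 coworker"},
]

def consolidate(records):
    """Merge CIFs per patient, keeping the first non-empty value per field."""
    merged = defaultdict(dict)
    for rec in records:
        pid = rec["patient_id"]
        for field, value in rec.items():
            if value is not None and field not in merged[pid]:
                merged[pid][field] = value
    return dict(merged)

print(consolidate(cifs))
# P-001 ends up with both its symptoms ('fever') and contacts ('3 household'),
# drawn from two partial forms.
```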

Non-standardized Use of Forms

The paper CIF had a corresponding digital format in the EpiInfo VHF module that essentially mirrored the paper form.[19] The paper-based format used at the outset of the outbreak and response was 11 pages with eight sections. Sections included patient information, clinical signs and symptoms, hospitalization information, epidemiological risk factors and exposure, case reporter information, patient outcome, and final case classification. In fall 2014, the paper and the corresponding digital version of the form used in the EpiInfo VHF module were condensed to a shorter version covering symptoms, dates of onset, travel history, exposure history, and lab samples.[20] The shortened form facilitated data entry, since data managers had fewer data points to digitize. Even so, the forms were not always completed or accurately completed, for myriad reasons including time, resources, or staff capacity.[21]
 
In addition, the information collected in the CIFs was not consistently digitized or used across actors. In Sierra Leone, toward the end of the response when widespread transmission had stopped, some districts stopped using the CIFs and would only complete the CIF associated with the process of swabbing deceased individuals or suspected cases. One official explained how this affected both the collection of CIF data and its digitization: “The swabber and DSO [district surveillance officer] would both respond to a death. In reality the swabber was usually there first and then the DSO. … Some districts have both at the same time, some just a swabber.… Some use both a CIF and a swabbing form. Either way some information is captured, but if there is no CIF then the case doesn’t get to VHF. Any time there is a CIF, the CIF goes to a data manager who enters this into the VHF.”[22] If no one entered the CIF into the VHF, digitization did not occur. In general, both duplication of case entry and non-digitization of CIFs proved problematic for the compilation, analysis, and use of case data to support the response.

Tension between Levels of Detail of Data Required

The CIF, which became the basis for the case data collected for the response, was designed by and for epidemiological purposes for those seeking to capture the precise and detailed patient-level data required to understand the incidence, transmission, and clinical presentation of Ebola. In the early days, however, it was impossible to capture this level of detail for the number of patients in the Ebola treatment units where medical staff worked under incredible time pressure and in difficult environments. One medical doctor providing care at an Ebola treatment center in Sierra Leone said, “Information collected on paper forms was built for research purposes, not for patient care. We were given case information forms with pages of data in size 10 font that we were meant to collect. But the clinicians didn’t have time to report this level of detail, and so it was never captured.”[23] Another health official in Liberia echoed this sentiment with regard to the degree of detail required in the VHF forms, “VHF was a research tool with lots of variables.” She continued, “The wrong data system was set up with best intentions.”[24]
 
Although burdensome for the frontline health care workers, this level of detail is crucial for understanding the epidemiology, and subsequent prevention and treatment of Ebola. In the words of one epidemiologist, “The clinical definition of [an Ebola] case here was very different than what we had seen before. Prior to this outbreak, we would say hemorrhage was the defining feature of Ebola. But we didn’t have this here. Here it was vomiting and diarrhea and general pain. People had incredible weakness.” His colleague elaborated on the importance of this level of detail: “With this, the case definition changes. If you don’t document all this, you can’t find [this information]. ... you would lose it.”[25] How a disease spreads, how rapidly it spreads (e.g., in a linear fashion or exponentially), and the patterns of its spread are all part of the epidemiology of a disease.[26] This knowledge, in turn, informs the type of interventions needed to stop transmission, how rapidly responders should intervene, and where and who responders should target first. In the case of Ebola, epidemiologists had very few answers to these questions prior to this outbreak.

Multiple Reporting Requirements

The requirement for many managing treatment centers to submit various reports to multiple entities compounded these challenges. To the central response coordination bodies, they reported available beds, and case and patient care data, and to donors they provided grant-related data. One NGO official explained these reporting requirements, saying, “We made large tables with current [Ebola] status by age, how many patients were in suspect/confirmed wards, who transited between wards, and numbers for discharge and death. … [Our headquarters] wanted what changed that day but [our donor] wanted statistics for that day in the entire center, and also the cumulative total for entire center since our opening. Because this was too complicated, we took the reports that included the most information and made two-page reports that we sent to everyone.”[27] Multiple interviewees highlighted the burden of reporting to multiple entities, including within and outside of their agency (to donors and coordinating bodies), much of which required different information in varying formats.[28]

Moving from Paper to Digital

Impact of Human Error and Time Delays

As with many other kinds of data collected in the outbreak response, case data were collected on paper, primarily at the community or district/county/préfecture level. In these instances, community or frontline health workers collected the data, and subsequently transmitted them to the county, district, or préfecture and then to the national coordinating body. One responder in Sierra Leone noted, however, “Not every district was running its own data entry. If the district had a low number of people or limited human resources, then they’d be doing data entry elsewhere.”[29] In many places, data entry clerks--usually referred to as data managers--from WHO or the national coordination bodies supported these officials. At the district or national level, these data managers manually entered relevant data from thousands of pages of information about individual Ebola cases into one of three digital tools used to track Ebola cases: Excel spreadsheets and/or a designated database, such as the EpiInfo VHF module or the District Health Information Software 2 (DHIS2) Ebola module.[30]
 
The process of manually digitizing case data led to human error and time delays that affected the quality of case count reporting. Numerous interviewees highlighted the possibilities for human error and inconsistent reporting that emerged in the data entry process. At the beginning of the response, the VHF module supported free-form text cells as opposed to dropdown menus. For example, with a free-text cell, those doing data entry could provide an exact age or a range, making automatic comparisons across cases impossible. During the fall of 2014 the CDC modified the module, in part to eliminate or at least minimize transcription errors and to facilitate comparisons.[31]
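
The free-text problem is easy to illustrate: before the change, an “age” cell could contain anything, and any comparison across cases first required best-effort normalization that discarded information. A sketch of the kind of cleaning this forced (the input strings are invented):

```python
import re

# Invented examples of what a free-text "age" cell could contain.
raw_ages = ["34", "about 30", "30-40", "six months", "unknown"]

def parse_age(text):
    """Best-effort normalization of a free-text age to a number, else None."""
    match = re.search(r"\d+", text)
    if match:
        return int(match.group())   # "30-40" collapses to 30: information is lost
    return None                     # "six months", "unknown": unusable as numbers

print([parse_age(a) for a in raw_ages])   # [34, 30, 30, None, None]
```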
 
The significant time required to enter relevant data into digital format likewise affected data quality and use. In an analysis of on-the-ground data systems, the Gobee Group tested the time required to manually digitize data and the resulting effect on data quality. In testing 80 forms, they found the following: “Scanning the forms to have their data pulled out automatically was 38 times faster than data entry completed by hand. While it took 153 minutes for the team to manually input the data, scanning the forms took only four minutes. Although accuracy for number-based data was roughly the same for both processes, letter-based data was 21 percent more accurate when scanned.”[32] Unsurprisingly, given the amount of data during the height of the outbreak, a WHO official indicated that in September 2014, the data entry required to maintain the VHF module in Liberia was three weeks behind the data reported in the SitReps.[33] Similarly, the CDC reported a time lag of one to two weeks between VHF entry and data submission to the national level in Sierra Leone.[34]

Excel-based Data Management

Interviewees frequently referred to Excel spreadsheets as the unofficial data management tool of the response. “Excel was the unsung hero of the Ebola response,” one interviewee said.[35] Because Excel spreadsheets are often used for reporting and monitoring health, humanitarian, and development activities, many organizations adopted Excel to manage case data. Both for instances in which digital databases were not used and, frequently when they were, Excel spreadsheets were central tools by which digitized data were shared, stored, and managed. To compile aggregated caseload data, the WHO provided a spreadsheet template with guidance about data collection, which organizations modified to meet their own reporting needs. According to one official, “The spreadsheets did have a standard variable list. These were used by organizations, and they were adding extra variables on the end.”[36]
 
In most cases, case data collected on paper were manually entered in Excel and/or the EpiInfo VHF module (depending on the country) and then shared with community or district/county/préfecture-level health teams or sent directly to the national coordination bodies, where they were aggregated and reported in caseload data statistics. Even so, multiple interviewees signaled the importance of training since computer and digital literacy were not widespread among local staff.[37]
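
Because organizations appended their own variables to the end of the WHO spreadsheet template, anyone aggregating across organizations first had to separate the standard columns from the additions. A minimal validation sketch (the column names and file are invented stand-ins, not the actual WHO variable list):

```python
import csv

# Invented stand-in for the standard variable list; the actual WHO template
# columns are not reproduced here.
STANDARD_VARIABLES = ["case_id", "district", "status", "report_date"]

def check_columns(path):
    """Report which standard columns are missing and which extras were appended."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    missing = [c for c in STANDARD_VARIABLES if c not in header]
    extras = [c for c in header if c not in STANDARD_VARIABLES]
    return missing, extras

# Demo: an organization's spreadsheet that drops one standard column and
# appends two of its own variables at the end.
with open("org_caseload.csv", "w", newline="") as f:
    csv.writer(f).writerow(["case_id", "district", "status", "team_lead", "grant_code"])

print(check_columns("org_caseload.csv"))
# (['report_date'], ['team_lead', 'grant_code'])
```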

EpiInfo Viral Hemorrhagic Fever (VHF) Module/EpiInfo (Guinea and Sierra Leone)

Both Sierra Leone and Guinea used the EpiInfo VHF module to track overall caseload data throughout the response.[38] The VHF module, however, was designed to track cluster outbreak investigations and not as a national disease surveillance system. As a result, the VHF module originally functioned on a single computer. Because the outbreak was geographically dispersed and required data entry from multiple locations, during the response it was redesigned to support distributed data entry. This allowed data entry and aggregation from multiple computers and locations into a single, centralized database, housed at the national level. In principle, data managers in the community health teams or district health offices would manually input data from their paper CIF forms into the centralized VHF database, which then could be accessed at the national level.
 
Even so, users reported challenges related to connectivity, version control, and system updates.[39] For example, within the VHF module, when district-level officials updated their data with new information, the update could overwrite and replace the existing data, including any changes or modifications. Consequently, in some instances when national-level data managers corrected data, their data cleaning corrections were lost in the process. Others reported challenges with having the relevant technical expertise in country to update the VHF module as new versions came out, or to train new people in VHF data entry.[40]
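
The overwrite behavior described above is a last-write-wins failure: a whole-record upload from a district replaces the nationally cleaned record, corrections included. A field-level merge keyed on per-field edit timestamps is one common alternative; the sketch below illustrates that idea and is not a description of how the VHF module actually worked:

```python
# Sketch of field-level merging by edit timestamp, as an alternative to
# last-write-wins record replacement. ISO date strings compare correctly
# as plain strings.

def merge_records(local, remote):
    """Keep, for each field, whichever side edited it most recently."""
    merged = {}
    for field in set(local) | set(remote):
        lv = local.get(field)   # (value, edit_timestamp) or None
        rv = remote.get(field)
        if lv is None:
            merged[field] = rv
        elif rv is None:
            merged[field] = lv
        else:
            merged[field] = lv if lv[1] >= rv[1] else rv
    return merged

national = {"age": (34, "2014-10-02"), "district": ("Kenema", "2014-10-05")}  # cleaned
district = {"age": (43, "2014-10-01"), "district": ("Kenema", "2014-10-01"),
            "outcome": ("recovered", "2014-10-06")}                            # new upload

print(merge_records(national, district))
# The national team's corrected age (edited 10-02) survives, while the
# district's newly added "outcome" field (edited 10-06) is still incorporated.
```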

The District Health Information Software 2 (DHIS2) Database (Liberia)

Unlike Sierra Leone and Guinea, which used the VHF module throughout the outbreak, Liberia changed how it collected caseload data.[41] In the summer of 2014, the CDC EpiInfo team set up the VHF module and trained staff from the Liberian Ministry of Health in its use. The VHF database was housed in Lofa county, where the outbreak was centered at the time, and at the Ministry of Health building in Monrovia. The data were mostly CIF data, but individual records often lacked contact tracing and lab data.[42] As the fall progressed, cases mounted into the hundreds and it became apparent that the VHF data entry was weeks behind, limiting the system’s use for reporting and decision making. In the words of one health official, “The outbreak outgrew the VHF as a solution.”[43]
 
Three years prior to the outbreak, Liberia had adopted a national health information management system called DHIS2, a system designed to track health facilities, monitor and evaluate select health programs, and analyze and visualize data.[44] Given the issues with data entry, in the fall of 2014 Liberian officials decided to jettison the VHF system in favor of developing an Ebola-specific disease module for DHIS2. Although DHIS2 had only been deployed for aggregated data (versus individual data, as in the Ebola VHF module), DHIS2 developers from the University of Oslo created a new module tracking Ebola patients and contacts that catered to the revised and shortened CIF, which was deployed in late 2014.[45] In the interim, the Liberian Ministry of Health, with support from CDC and WHO officials, managed the Liberian case data with paper forms coupled with Excel spreadsheets and Google Drive.[46]
 
Designing the new module was fraught with complications. The urgency of the situation meant that the outside technical team from the University of Oslo had to scramble to quickly assemble a team to support the work. Most of the team worked virtually from abroad but one technical coordinator was deployed to Liberia. Poor connectivity and the perception of danger associated with foreigners traveling to Liberia at the time (September through November 2014) further complicated the work. In addition, developers had to consider ways to link in the VHF module case data throughout the response; in the end the VHF data were entered manually into the new DHIS2 module. Key officials were trained on using the new module but continued adjustments to the module (and associated training) occurred throughout the fall until it was deployed in late 2014. According to Knut Staring, the developer deployed to Monrovia, “The constant stream of updates interfered quite substantially with the development of the new module. Because of the great urgency it had not been engineered to be generic and easy to change.”[47]

Impact of Infection Prevention and Control (IPC) Measures on Digitization in Treatment Centers

The physical constraints required by stringent infection prevention and control (IPC) measures necessary to stop Ebola transmission complicated data collection and digitization within the treatment centers themselves. As a result, digitization almost always occurred as the second step of data collection, with paper as the first step. The physical constraints included: limited time periods during which doctors were permitted to remain in the infectious “red zones,” due to the personal protective equipment worn in extreme heat without temperature control; the physical separation of red zones housing Ebola patients from the uncontaminated “green zones”; and the requirement to burn or disinfect through a chlorine rinse anything moving between zones. Most frontline health care workers prioritized patient care over reporting during their limited time inside the red zone.[48]
 
Due to these constraints, it was not possible to follow normal patient care protocol in which charts are kept with individual patients, enabling doctors and nurses to document their treatment and leave an accessible record for other clinicians to view. Instead, those running treatment centers developed a variety of coping strategies, many of which depended on connectivity within the facility, as well as people’s access to and familiarity with equipment and technology.[49]
 
In the most basic centers, physicians in the red zone would dictate patient information to individuals in the green zone, or write down essential patient information and leave it in a place viewable by those outside the red zone. Health care workers in treatment centers with chalk or white boards ensured the boards were visible to those in the green zone. This allowed someone to take a picture or transcribe the information by hand into patient charts kept outside the red zone. In these cases, patient information could be transcribed two or more times, thereby increasing the possibilities for error. One medical doctor working in a treatment center in the fall of 2014 estimated that this process of documenting patient treatment took several hours per day: the clinician would document the treatment in the red zone, then orally report to someone outside the red zone, and that individual would transcribe the information to the patient’s chart. This process would be repeated at least once for each patient during each of the three daily shifts.[50]
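
The error risk compounds with each hand-off: if every transcription independently introduces an error with some probability, the chance that a value survives untouched falls geometrically with the number of copies. A back-of-envelope illustration (the 2 percent per-hand-off error rate is invented for the example):

```python
# Back-of-envelope: probability a data point survives repeated transcription
# without error, assuming an invented 2% error rate per hand-off.
p_error = 0.02
for n_copies in [1, 2, 3, 6]:   # e.g., 2 hand-offs per shift x 3 daily shifts
    p_clean = (1 - p_error) ** n_copies
    print(f"{n_copies} transcriptions -> {p_clean:.1%} chance of no error")
# 6 transcriptions -> ~88.6% chance of no error, i.e. over 11% of values corrupted
```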

A Proliferation of Platforms and Tools

Each of the three most-affected countries collected and reported its aggregated caseload data differently, adopting different structures, mechanisms, and procedures to coordinate the response. The ministries of health, WHO, and CDC all released both national counts and regional totals for the outbreak.[51] The reporting and the timing of release of these data sometimes differed. In addition to the use of the VHF module and DHIS2 for the collection and management of case data and aggregated caseload data, multiple forms, formats, and platforms were used in collecting data related to the various pillars of the response, such as case investigation, social mobilization/community engagement, and infection prevention and control.[52]
 
Across the response, organizations deployed digital technologies to manage treatment centers, case information, contact tracing, burials, and other key activities. These technologies ranged from Google documents, Excel spreadsheets, Dropbox, open-source--often free--software (e.g., OpenDataKit (ODK), KoboToolbox, Voozanoo, OpenMRS), proprietary software (e.g., Magpi, Sense Followup/ID, Tableau, iForm), as well as combinations of these tools. This resulted in a non-aligned approach to data collection, storage, and management. Various accounts have tracked the breadth of digital tools used across the response. As one report described, “Over the course of the epidemic, the operational infrastructure of the response involved more than 50 independent technology tools. One group catalogued more than 300 separate initiatives to engage the public,” a number of which intended to do so using digital tools.[53]
 
Although the functionality they offered enabled users to meet a variety of needs, the proliferation of tools and platforms and a lack of commonly used standards and data sharing between them contributed to the lack of readily available data needed to create a common picture of the outbreak and the corresponding response.[54] Moreover, many of the information collection systems that organizations set up during the response were not linked to national systems or national capacity. One responder collecting data about community attitudes in Sierra Leone recounted a conversation with a national official. He reported, “The question I got from the NERC coordinator was, ‘This is all good, but what are you doing to ensure that these platforms are integrated into national response system?’” He continued, “There are different organizations doing different pieces of data collection. Some of these things even in an emergency context have to be thought out, ideally in the preparedness phase.”[55]
 
In other ways, the fact that the response played out over time and across multiple countries made it possible to use and reuse tools and to employ fixes from one country to another. One interviewee pointed out that the early peaks in Liberia made it possible to employ lessons learned in Liberia in the other two countries. In another example, a glitch in the VHF module that temporarily de-linked laboratory test results from the associated patient affected both Sierra Leone and Guinea. Responders were able to use the fix developed for Guinea in Sierra Leone as well.[56]
 
Although some organizations that operated treatment centers based their patient data collection in these centers on the CIF forms, others developed data systems that responded to their own specific workflows or needs. Within Ebola treatment centers in particular, organizations developed customized software systems to manage patient records and enable the transfer of information from contaminated areas of treatment centers. One NGO official remarked, “It was so hard to accurately get information out of the red zone. If we could get high-quality information out, then we could improve our understanding, and also patient care.”[57]
 
Some of the digital technologies employed open-source or interoperable platforms--in the sense of being technically integrated with other systems--but interviewees reported that many of these systems were standalone systems.[58] Even though some of these systems used standards and open-source tools, such as OpenMRS, OpenHIE, and DHIS2, the number of standalone data collection and management systems deployed resulted in a lack of interoperability between systems. This proliferation of systems also created difficulties in centralizing data, and complicated efforts to align information infrastructures used to support the operational response. “The challenge is getting data that can ‘talk’ to itself--across different actors and types of data,” said one interviewee.[59]
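
Absent shared standards, every pairwise integration required hand-written mapping code; a minimal common schema would have reduced that to one mapping per system. An illustrative sketch of such an alignment (both source formats and the target schema are invented):

```python
# Illustrative only: two invented source formats mapped onto one minimal
# shared schema, the kind of alignment that common standards would provide.
def from_system_a(rec):
    return {"case_id": rec["CaseID"], "district": rec["District"],
            "status": rec["Classification"].lower()}

def from_system_b(rec):
    return {"case_id": rec["id"], "district": rec["location"]["district"],
            "status": rec["case_status"]}

a_record = {"CaseID": "SL-0042", "District": "Bo", "Classification": "CONFIRMED"}
b_record = {"id": "SL-0042", "location": {"district": "Bo"}, "case_status": "confirmed"}

# Once both map to the same shape, the data can finally "talk" to itself.
assert from_system_a(a_record) == from_system_b(b_record)
```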
 
Finally, many of the newly developed systems or platforms were one-off instances that functioned more like pilots in that many were new, and none were deployed at scale. The effort and time required to develop, test, and deploy new digital systems meant that many of the systems were ready only after the caseload had declined. An NGO official, who had been involved in deploying a pilot system, said, “It was hard to understand how long and complicated it would be to do a digital system.”[60] Multiple interviewees mentioned that they underestimated the amount of time, effort, and human resources required to develop, deploy, and manage these systems, particularly in the middle of a response that escalated from dozens to over one hundred Ebola cases in a week.[61]
 
When used, however, these technologies enabled efficiencies and adaptations that met identified needs from frontline responders. For example, the Red Cross used maps of Ebola cases to quickly and efficiently deploy social mobilization and burial teams to Ebola hot spots. Using mobile phones and GPS software allowed them to track where they picked up and buried people. “We were collecting hundreds of bodies, and reporting about 20 people per week without names at the peak of the crisis,” said one Red Cross official. To promote accountability and address this issue, they worked with the software company to quickly integrate an additional feature--photographs--to help identify people in cases where they lacked names for the deceased. The latter feature was important in helping relatives locate family members who died during the Ebola outbreak.[62] These and other benefits of digital technologies are explored more fully in the following section.

Lessons

This discussion of the collection, management, and analysis of case and caseload data in the Ebola response paints a picture of the myriad challenges that complicated efforts to efficiently and effectively track case data in particular. It highlights a series of lessons regarding data and information flows, and for health and humanitarian preparedness and response more generally.
 

  • Limited mobile and Internet connectivity hampered the sharing and digitization of case data as well as other types of data used in the response. Where limitations in digital connectivity were understood, solutions could be designed accordingly, such as mobile credit top-ups for health workers to facilitate reporting of case data. As one interviewee said, “In order to generate good data, first you have to understand the situation and environment, where data is being produced. Some days, 7 out of 15 counties weren’t reporting [case data]. The reason was simple: they did not have Internet or scratch cards [for mobile top-ups].”[63]
Extending Connectivity in an Emergency

The use of VSAT and BGAN satellite terminals to rapidly deploy access to communications in areas without Internet or mobile network coverage has been common practice in response to sudden onset emergencies like natural disasters, and was repeated to extend connectivity to responding entities during the Ebola outbreak. This satellite-based communications model presents two significant challenges: high costs to establish and use, and a lack of sustainability once the international partners who traditionally deploy the satellite links leave.
 
The 2014 arrival along the coast of West Africa of an undersea fiber optic broadband Internet cable presented the opportunity for an alternative model of extending connectivity: a point-to-point wifi network. The Ebola Response Connectivity Initiative (ERCI), a consortium made up of a diverse group of telecommunications and technology organizations, leveraged this model to extend access to the Internet in Ebola hotspots in Liberia and Sierra Leone. The initiative forged relationships with mobile network operators to use existing cell phone towers to set up the point-to-point wifi connections, and within three months established over 100 communications centers for health clinics and other areas staffed by responding organizations.

In addition to the involvement of international partners Facebook, Cisco, NetHope, EveryLayer, and Inveneo, the initiative included local technology companies, such as Damsel in Sierra Leone, that were certified to provide ongoing hardware, software, and system maintenance. Although these in-country partners solved one portion of the challenge of maintaining this network over time, financial sustainability challenges remain once international donor funding ends. Furthermore, while the initiative succeeded in boosting connectivity for response actors, it did not extend the reach of communications to extension workers and average citizens, leaving a critical gap still to be filled.

  • In environments with limited digital connectivity, solutions that functioned in both online and offline environments were essential. One responder noted, “The [electronic medical records] system we built worked for our site but wouldn’t sync offline. That is a useful feature. … The situation for patients changes so quickly--it is nontrivial to say that I’ll do this offline but then there are really important time stamps that don’t get captured.”[64] (A minimal sketch of this offline-first pattern appears after this list.) Agreeing upon a simple and straightforward paper-based data collection approach at the beginning of the response, prior to digitization, could have enabled comparable data across paper and digitized datasets, and facilitated the implementation of digitization where connectivity existed.
  • Technological challenges included version control and lack of in-country expertise to troubleshoot problems that arose with the digital technologies in use.
  • Existing digital information systems were under- or unprepared to deal with the data demands of the response. The VHF module, designed to support single and limited outbreaks, faltered under the weight of the unprecedented spike in Ebola cases. Existing investments in health information systems in the region had focused primarily on engineering support for one-off projects, rather than on long-term capacity building, systems maintenance, and systems adaptation to meet needs identified at the national level. As a result, fledgling national information systems struggled to meet the data aggregation and reporting needs of the response.
  • None of the three most-affected countries had interlinked emergency and routine surveillance systems, nor were disease surveillance and response integrated with national health information systems. According to one interviewee, “The routine public health surveillance systems were sufficient to generate trend analysis for seasonality, but they weren’t highly sensitive. … We still have routine and emergency surveillance separate.”[65] This made it difficult to automatically sound the alarm when the outbreak occurred.
  • Data and corresponding data flows often were siloed and duplicative. This resulted in a proliferation of distinct and non-interoperable platforms to collect and manage data. For case data, separate reporting of routine disease surveillance and Ebola case data resulted in multiple sources of data. Yet this was true of other data types and data collection systems as well, including health or humanitarian program and evaluation information. One report on Sierra Leone noted, “Three distinct streams of information management were in operation during the response: via technical coordinators and programme managers, via [Monitoring, Evaluation, Accountability and Learning] staff and via communications staff. These streams did not appear to be strongly linked.”[66]
  • The issue of who “owned” the data, and the related question of who could share data, surfaced as crucial questions in relation to a variety of sources of data, particularly at the beginning of the response. For case data, a general consensus existed that the national ministries of health owned these data. Yet the ownership, sharing, and publication rules of subsets of these data were less clear. For example, who “owned” data about individual patients in a treatment unit, and, therefore, who could publish analysis about these patients? Once data were shared in a publicly accessible format, who owned these data? What policies, procedures, or gatekeeping functions should be in place to formally request access to data? 
  • Data collection tools for cross-response activities, such as case data reporting, contact tracing, and laboratory tests, needed to be standardized with a minimal set of data points/indicators and distributed widely. The urgency of the crisis, combined with the lack of flexibility in many of the data collection systems, meant that updates and technological fixes were not easily implemented.
  • The lack of a robust unique identifier system was a great hindrance to data integration across data sets (e.g., case data, contact tracing, burials, family notification, laboratory data), as was the lack of machine-readable data. Compatibility of data standards and systems was an issue between countries, but even more so within countries.
  • Human capacity issues related to the collection and management of data and information affected the response in a variety of ways. These included the data burden (e.g., referring to the time required and human error introduced with multiple transcriptions, including from one paper form to another or from paper to digital format, as well as multiple reporting streams); limited capacity to collect, manage, and analyze data; and inadequate time, human resources, or supporting institutional policies and processes required to effectively manage and use the volume and velocity of data collected.
  • Outbreak responses require research. During the Ebola response (as with a number of other public health crises), it was essential for research to happen during, and not only after, the response, given the need to understand how the virus was mutating, how that affected contact chains, and the large numbers of unknowns related to how best to treat and stop the virus.
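
As referenced in the connectivity lesson above, the offline-first pattern one responder described can be sketched as a local queue that stamps each observation at capture time and uploads whenever a connection is available. The class and file format here are invented for illustration, not drawn from any system used in the response:

```python
import json, time

class OfflineQueue:
    """Invented sketch: record observations locally, sync when online."""
    def __init__(self, path="pending_observations.jsonl"):
        self.path = path

    def record(self, observation):
        # Stamp the observation at capture time, not at upload time, so the
        # clinically meaningful timestamp survives a connectivity outage.
        observation["recorded_at"] = time.time()
        with open(self.path, "a") as f:
            f.write(json.dumps(observation) + "\n")

    def flush(self, upload):
        """Attempt to upload queued observations; keep any that fail."""
        try:
            with open(self.path) as f:
                pending = [json.loads(line) for line in f]
        except FileNotFoundError:
            return
        remaining = [obs for obs in pending if not upload(obs)]
        with open(self.path, "w") as f:
            for obs in remaining:
                f.write(json.dumps(obs) + "\n")

queue = OfflineQueue()
queue.record({"patient_id": "P-001", "temp_c": 39.2})
queue.flush(upload=lambda obs: True)   # stand-in for a real transmit call
```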

References

[1] Interview with health official, January 2016.
[2] World Health Organization, “Case Definition Recommendations for Ebola or Marburg Virus Diseases,” August 9, 2014, http://www.who.int/csr/resources/publications/ebola/ebola-case-definition-contact-en.pdf?ua=1&ua=1
[3] Interview with CDC official, February 2015; interviews with WHO and CDC officials, February 2016.
[4] These definitions were compiled with the assistance of the U.S. CDC. See also World Health Organization, “Ebola and Marburg virus disease epidemics: preparedness, alert, control, and evaluation,” interim manual version 1.2, August 2014, accessed August 9, 2014, http://www.who.int/csr/disease/ebola/manual_EVD/en/; and World Health Organization, “WHO: Ebola Response Roadmap Situation Report,” October 1, 2014, accessed August 11, 2016, http://apps.who.int/iris/bitstream/10665/135600/1/roadmapsitrep_1Oct2014_eng.pdf.
[5] Interview with WHO and CDC officials, February 2016.
[6] Note that the archive of WHO SitReps across the three countries is available at: World Health Organization, “Ebola Situation Reports: Archive,” updated June 2016, accessed May 9, 2016, http://www.who.int/csr/disease/ebola/situation-reports/archive/en/
[7] Eventually the “probable case” designation was discontinued and no longer used in Guinea.
[8] World Health Organization, “Définition des cas suspects aucours de la phase 3 de l’épidémie de la MVE [Ebola],” December 15, 2015. 
[9] Email correspondence with CDC officials, August 2016.
[10] Ministry of Health, Liberia, World Health Organization, and U.S. Centers for Disease Control and Prevention (2016). National Technical Guidelines Integrated Disease Surveillance and Response Liberia. Monrovia, Republic of Liberia. P. 183.
[11] For more on Ebola definitions in Sierra Leone, see Dietz PM, Jambai A, Paweska JT, Yoti Z, Ksiazek TG. Epidemiology and risk factors for Ebola virus disease in Sierra Leone-23 May 2014 to 31 January 2015. Clinical Infectious Disease 2015;61(11):1648-1654.
[12] In part based on the experience from Ebola, leading scientific publishers have pledged to make data related to the Zika outbreak openly available, to enable research and analysis to support clinical interventions. See Gretchen Vogel, “A Plea for Open Science on Zika,” Science AAAS, February 16, 2016, accessed September 13, 2016, http://www.sciencemag.org/news/2016/02/plea-open-science-zika.
[13] Interview with international responder, February 2015. Lab data, however, were typically tracked and shared in Excel, making these data more easily disseminated or re-used. 
[14] Interview with international responder, June 2015.
[15] Correspondence with CDC officials, August 2016.
[16] EpiInfo was originally developed by the CDC and is a set of open software tools commonly used by researchers and health professionals to track, analyze, and visualize disease surveillance and epidemiological information (e.g., patient symptoms, modes of disease transmission). Although it is open, downloadable software, it is not open-source software, meaning the source code is not publicly available online.
[17] Interview with international health official, February 2016.
[18] Confirmed cases, by definition, had confirmation via laboratory tests. (Interviews with various responders, February 2016). Starting in October of 2014, anyone who died in these countries was supposed to receive a safe and dignified burial by burial teams, regardless of whether they showed any Ebola symptoms. All those who died were supposed to be tested, “swabbed,” for Ebola. If positive, these individuals were included in the aggregated counts. See World Health Organization, “Field Situation: How to conduct safe and dignified burial of a patient who has died from suspected or confirmed Ebola virus disease,” October 2014, accessed September 29, 2016, http://apps.who.int/iris/bitstream/10665/137379/1/WHO_EVD_GUIDANCE_Burials_14.2_eng.pdf?ua=1.
[19] Interview with CDC official, February 2016.
[20] Interview with CDC officials, February 2016.
[21] Correspondence with CDC officials, August 2016.
[22] Interview with CDC official, February 2016. 
[23] Interview with NGO medical doctor, October 2015.
[24] Interview with health official, February 2016.
[25] Interviews with epidemiologists, February 2016.
[26] In September 2014, David Nabarro, the Senior UN System Coordinator for Ebola, indicated the disease was exponentially spreading and doubling approximately every three weeks. See United Nations Security Council, 7268th Meeting (PM), “With Spreads of Ebola Outpacing Response, Security Council Adopts Resolution 2177 (2014) Urging Immediate Action, End to Isolation of Affected States,” Meeting Coverage, September 18, 2014, https://www.un.org/press/en/2014/sc11566.doc.htm.
[27] Interviews with national and international responders, January, March, and May 2015, January and February 2016. 
[28] Interviews with national and international responders, January, March, and May 2015, January and February 2016. 
[29] Interview with CDC official, June 2015.
[30] For instance, if a district had 30 Ebola cases in one week and the CIF form was 10 pages, this would require manual data entry of 300 pages of information, much of which would also require cross-checking to avoid duplication.
[31] Interview with CDC officials, June 2015 and February 2016.
[32] Mahad Ibrahim, “Ebola’s Paper Trail,” Motherboard (July 15, 2015), http://motherboard.vice.com/read/ebolas-paper-trail (accessed April 27, 2016). 
[33] Interview with Esther Hamblion, WHO, and other officials involved with case data collection in Liberia, February 2016.
[34] Correspondence with CDC officials, August 2016.
[35] Interview with international responder, April 2015.
[36] Interview with WHO official, February 2016.
[37] Interviews with international responders, June and October 2015, February 2016.
[38] Interview with various officials involved in caseload data collection in Guinea and Sierra Leone, June 2015, January and February 2016.
[39] Interviews with CDC officials, February 2016.
[40] Interviews with CDC officials, April 2015 and February 2016.
[41] Interviews with individuals involved in caseload data collection in Liberia, April and May 2015, January and February 2016.
[42] Interview with USG official, February 2016.
[43] Interview with health official, February 2016.
[44] District Health Information Software 2, see “DHIS 2.23 is here,” DHIS2, https://www.dhis2.org. A national Health Information System (HIS) is designed to provide information support at all levels of a health system (e.g., patient, community, facility, district/county, national). It includes population-level data as well as facility and community data, such as service or administrative records about health workers, logistics, and financial records. A health management information system (HMIS), in comparison, refers to a subset of the HIS, specifically focused on aggregate service delivery records, such as number of pregnant women receiving antenatal care, malaria cases, etc. At the core, strengthening HIS involves establishing a culture of data-driven decision-making. It also involves standardizing indicator definitions and allowing the various subsets of the system to “speak to” or interoperate with other subsets in order to access holistic data (e.g., making it possible to link health worker registries to health facility registries, or supply chain logistics to link to health facility registries).
[45] Interview with officials involved, April/May 2015 and February 2016.
[46] Interviews with Liberian and USG officials, April 2015 and February 2016.
[47] Email correspondence with Knut Staring, September 2016.
[48] Interviews with NGO, international organization, and health officials working in treatment units, February 2016.
[49] See Jeff Neiman, “OpenMRS Ebola Case Study: Fighting Ebola with Open Source Collaboration,” August 31, 2016, http://openmrs.org/2016/08/openmrs-ebola-case-study/; see also the following articles which describe data collection and clinical outcomes for patients: John S Schieffelin et al., “Clinical Illness and Outcomes in Patients with Ebola in Sierra Leone,” New England Journal of Medicine 371, no. 22 (2014): 2092-100, doi: 10.1056/NEJMoa1411680; Gabriel Fitzpatrick et al., “The Contribution of Ebola Viral Load at Admission and Other Patient Characteristics to Mortality in a Médecins Sans Frontières Ebola Case Management Centre, Kailahun, Sierra Leone, June-October 2014,” Journal of Infectious Diseases 212, no. 11 (2015): 1752-8, doi: 10.1093/infdis/jiv304; and Elhadj Ibrahima Bah et al., “Clinical Presentation of Patients with Ebola Virus Disease in Conakry, Guinea,” New England Journal of Medicine 372, no.1 (2015): 40-47, doi: 10.1056/NEJMoa1411249.
[50] Interview with medical doctor, January 2015.
[51] World Health Organization, “Ebola Situation Reports,” accessed May 18, 2016, http://apps.who.int/ebola/ebola-situation-reports. See also the national MOH Situation Reports and CDC Ebola outbreak updates, available from http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/previous-updates.html.
[52] Other digital tools for case management were used, such as eHealth Africa’s Sense software suite, which was used for contact tracing in Nigeria, Sierra Leone, and Liberia, or Dimagi’s CommCare, which was used for contact tracing in Guinea.
[53] Sean Martin McDonald, Ebola: A Big Data Disaster - Privacy, Property, and the Law of Disaster Experimentation, CIS papers (Delhi: The Center for Internet and Society, 2016), 14, http://cis-india.org/papers/ebola-a-big-data-disaster.
[54] In May 2015, the USAID Development Informatics team and partners identified approximately 150 uses of HIS platforms across all of West Africa, some of which were used in multiple countries. This list was not comprehensive of all digital platforms used in the health sector, nor did it include non-health-focused platforms, such as the WFP mobile Vulnerability Assessment and Mapping tool developed for and used in WFP programs in all three most-affected countries. Interview with WFP officials, February 2016. See “MVAM: The Blog, Mobile Technology for WFP’s Food Security Monitoring,” World Food Programme, http://mvam.org. Also interviews with officials, December 2014 and June 2015.
[55] Interview with international responder, November 2015.
[56] Interview with NGO official, February 2016; also with CDC officials, February 2016.
[57] Interview with NGO official, October 2015.
[58] Interviews with international responders, December 2014, April 2015 and February 2016.
[59] Interview with USG official, January 2015; see also Sean McDonald, Ebola: A Big Data Disaster, 14.
[60] Interview with NGO official, October 2015.
[61] Interviews with national and international responders, December 2015 and February 2016.
[62] Interview with Amanda McClelland, December 2015.
[63] Interview with national official, April 2015.
[64] Interview with NGO official, October 2015.
[65] Interview with international health official, February 2016.
[66] John Adams, Anne Lloyd, and Carolyn Miller, The Oxfam Ebola Response in Liberia and Sierra Leone: An Evaluation Report for the Disasters Emergency committee (Oxford, UK: Oxfam, 2015), 23, accessed September 29, 2016, http://policy-practice.oxfam.org.uk/publications/the-oxfam-ebola-response-in-liberia-and-sierra-leone-an-evaluation-report-for-t-560602.