Cover story for Pulse@UM Year 2021 Issue 1: Big Data & Artificial Intelligence in Medicine and Healthcare Research

The Democratization of Global Infectious Disease Surveillance Data: Practical Considerations, Concerns and Opportunities

By Dr Vivek Jason Jayaraj & Professor Dr Sanjay Rampal

The rapid emergence of the internet, mobile phones, satellites, and sensors, and their integration into everyday life, has woven data ever more tightly into the fabric of our postmodern society. In the 1990s, a single 3.5-inch floppy disk with a capacity of 1.44MB could perhaps hold much of the data a person generated over a period of days to weeks. Fast forward 30 years, and each human on the planet now produces 1.7MB of data per second. Data is now so ubiquitous in our day-to-day existence that terms such as “big data” have come to seem almost banal.

This data revolution has no doubt rippled through every field, including disease surveillance. The great American epidemiologist Alexander Langmuir defined surveillance in 1963 as a “continued watchfulness of trend in all relevant data”, foreshadowing the concepts of velocity, volume, variety, veracity, variability and value that have become the cornerstones of big data. The intersection of big data and surveillance has given rise to the discipline of digital epidemiology, which promises more precise identification of at-risk populations, more efficient surveillance, and more targeted interventions.

Over the last decade we have increasingly witnessed the democratization of data. Examples include (1) the democratization of surveillance data through Project Tycho and (2) the use of novel surveillance methodologies, such as mining social media data through the Twitter API. These efforts, however, are dwarfed by the sheer scale and speed at which the COVID-19 data machinery has developed. In just a year, organisations such as the Johns Hopkins Coronavirus Resource Center, Worldometers, and Our World in Data have each assembled global networks that collate basic surveillance aggregates. This global democratization of COVID-19 data has enabled the rapid development and dissemination of data visualisations and analytics.

Nonetheless, mining and using these data sources raises several important practical considerations. Despite the growth of the open data movement over the last decade, COVID-19 has again highlighted the reluctance of authorities to report data with complete transparency (Neill, 2020). Instead, data has been released piecemeal, in inconsistent locations and non-standard formats, in breach of common data storage conventions, and without detailed descriptions of the data structure.

Data, whether aggregate counts of cases or of deaths, is ultimately a reflection of the public health apparatus that produces it, and that apparatus varies both across national boundaries and over time. It is useful to think of a disease as an iceberg, with the public health apparatus as a camera photographing it. A better camera captures the iceberg at higher resolution and in greater extent, possibly even capturing the portions that lie underwater. Differences in reported figures between countries, or within a country over time, may therefore reflect differences in the camera as much as in the iceberg.

Just as the internet has served as an information superhighway accessible from the comfort of one's home, surveillance data during the COVID-19 pandemic has been almost universally accessible. Early in the pandemic, this drew many non-experts into infectious disease epidemiology, earning them the moniker of “armchair epidemiologists”. The resulting deluge of data visualisations and analyses contributed to an “infodemic”, an environment that quickly gave rise to factions and followings on social media, each with its own narratives and counter-narratives.

The COVID-19 pandemic is far from over, and larger future pandemics caused by other emerging and re-emerging infectious diseases remain a threat. To be better prepared, we need further democratization of data and further evolution of the discipline of digital epidemiology.
