TY  - JOUR
AU  - Rounds, Jeremiah
AU  - Charles-Smith, Lauren
AU  - Corley, Courtney D.
PY  - 2017/05/02
Y2  - 2024/03/28
TI  - Soda Pop: A Time-Series Clustering, Alarming and Disease Forecasting Application
JF  - Online Journal of Public Health Informatics
JA  - OJPHI
VL  - 9
IS  - 1
SE  - Data fusion/integration
DO  - 10.5210/ojphi.v9i1.7582
UR  - https://ojphi.org/ojs/index.php/ojphi/article/view/7582
SP  - 
AB  - Objective
To introduce Soda Pop, an R/Shiny application designed to be a disease-agnostic time-series clustering, alarming, and forecasting tool to assist in disease surveillance “triage, analysis and reporting” workflows within the Biosurveillance Ecosystem (BSVE) [1]. In this poster, we highlight the new capabilities that Soda Pop brings to the BSVE, with an emphasis on the impact of methodological decisions.
Introduction
The Biosurveillance Ecosystem (BSVE) is a biological and chemical threat surveillance system sponsored by the Defense Threat Reduction Agency (DTRA). BSVE is intended to be a user-friendly, multi-agency, cooperative, modular, and threat-agnostic platform for biosurveillance [2]. In BSVE, a web-based workbench presents the analyst with applications (apps) developed by various DTRA-funded researchers, which are deployed on demand in the cloud (e.g., Amazon Web Services). These apps aim to address emerging needs and refine capabilities to enable early warning of chemical and biological threats for multiple users across local, state, and federal agencies.
Soda Pop is an app developed by Pacific Northwest National Laboratory (PNNL) to meet the current needs of the BSVE for early warning and detection of disease outbreaks. Intended for use by a diverse set of analysts, the application is agnostic to data source and spatial scale, which lets it generalize across many diseases and locations.
To achieve this, we placed a particular emphasis on clustering and alerting of disease signals within Soda Pop without strong prior assumptions on the nature of observed disease counts.
Methods
Although designed to be agnostic to the data source, Soda Pop was initially developed and tested on data summarizing Influenza-Like Illness in military hospitals, obtained in collaboration with the Armed Forces Health Surveillance Branch. Currently, the incorporated data also include the CDC’s National Notifiable Diseases Surveillance System (NNDSS) tables [3] and the WHO’s Influenza A/B data (FluNet) [4]. These data sources are now present in BSVE’s Postgres data storage for direct access.
Soda Pop is designed to automate the time-series tasks of data summarization, exploration, clustering, alarming, and forecasting. Built as an R/Shiny application, Soda Pop is founded on the statistical tool R [5]. Where applicable, Soda Pop facilitates nonparametric seasonal decomposition of time-series; hierarchical agglomerative clustering across reporting areas and between diseases within reporting areas; and a variety of alarming techniques, including Exponentially Weighted Moving Average alarms and Early Aberration Detection [6].
Soda Pop embeds these techniques within a user interface designed to enhance an analyst’s understanding of emerging trends in their data, and it lets analysts include its graphical elements in their dossier for further tracking and reporting.
The ultimate goal of this software is to facilitate the discovery of unknown disease signals and to increase the speed at which unusual patterns within these signals are detected.
Conclusions
Soda Pop organizes common statistical disease surveillance tasks in a manner integrated with BSVE data source inputs and outputs. The app analyzes time-series disease data and supports a robust set of clustering and alarming routines that avoid strong assumptions on the nature of observed disease counts. This attribute allows for flexibility in the data source, spatial scale, and disease types, making it useful to a wide range of analysts within the BSVE.
Keywords
BSVE; Biosurveillance; R/Shiny; Clustering; Alarming
Acknowledgments
This work was supported by the Defense Threat Reduction Agency under contract CB10082 with Pacific Northwest National Laboratory.
References
1. Dasey, Timothy, et al. “Biosurveillance Ecosystem (BSVE) Workflow Analysis.” Online Journal of Public Health Informatics 5.1 (2013).
2. http://www.defense.gov/News/Article/Article/681832/dtra-scientists-develop-cloud-based-biosurveillance-ecosystem. Accessed 9/6/2016.
3. Centers for Disease Control and Prevention. “National Notifiable Diseases Surveillance System (NNDSS).”
4. World Health Organization. “FluNet.” Global Influenza Surveillance and Response System (GISRS).
5. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
6. Salmon, Maëlle, et al. “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance.” Journal of Statistical Software [Online], 70.10 (2016): 1-35.
ER  - 