Abstract

Objective
To comparatively analyze Google, Twitter, and Wikipedia by evaluating how well change points detected in each web-based source correspond to change points detected in CDC ILI data.

Introduction
Traditional influenza surveillance relies on reports of influenza-like illness (ILI) by healthcare providers, capturing individuals who seek medical care and missing those who may search, post, and tweet about their illnesses instead. Existing research has shown some promise in using data from Google, Twitter, and Wikipedia for influenza surveillance, but findings conflict, and studies have evaluated these web-based sources only individually or in pairs, without comparing all three [1-5]. A comparative analysis of all three web-based sources is needed to determine which performs best and could be considered as a complement to traditional methods.

Methods
We collected publicly available, de-identified data from the CDC ILINet system, Google Flu Trends, HealthTweets.org, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect change points, or seasonal changes, in each of the web-based data sources for comparison to change points in CDC ILI data. All analyses were conducted using the R package 'bcp' v4.0.0 in RStudio v0.99.484. Sensitivity and positive predictive values (PPV) were then calculated.

Results
During the 2012-2015 influenza seasons, Google had a high sensitivity of 92% and a PPV of 85%. Twitter had a low sensitivity of 50% and a low PPV of 43%. Wikipedia had the lowest sensitivity, 33%, and the lowest PPV, 40%.

Conclusions
Google had the best combination of sensitivity and PPV in detecting change points that corresponded with change points found in CDC data. Overall, change points in Google, Twitter, and Wikipedia data occasionally aligned well with change points captured in CDC ILI data, yet these sources did not detect all changes in CDC data, which could indicate limitations of the web-based data or signify that the Bayesian method is not adequately sensitive. These three web-based sources need to be further studied and compared using other statistical methods before being incorporated as surveillance data to complement traditional systems.

Figure 1. Detection of change points, 2012-2013 influenza season
Figure 2. Detection of change points, 2013-2014 influenza season
Figure 3. Detection of change points, 2014-2015 influenza season
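The sensitivity and PPV calculation named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code, which used the R package 'bcp'. The abstract does not state how a web-source change point was judged to correspond to a CDC change point, so the one-to-one matching rule and the `tolerance` parameter (in weeks) below are assumptions, as is the function name. The week indices in the usage lines are hypothetical and are not study data.

```python
from typing import Iterable, Tuple


def sensitivity_and_ppv(
    reference_weeks: Iterable[int],
    source_weeks: Iterable[int],
    tolerance: int = 0,
) -> Tuple[float, float]:
    """Compare change points from a web source against reference (CDC ILI) change points.

    Assumption: a source change point counts as a true positive if it lies
    within `tolerance` weeks of a reference change point, and each source
    change point can match at most one reference change point.
    """
    reference = sorted(set(reference_weeks))
    unmatched = sorted(set(source_weeks))
    n_source = len(unmatched)

    true_positives = 0
    for ref in reference:
        # Source change points still available and close enough to this one.
        candidates = [s for s in unmatched if abs(s - ref) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda s: abs(s - ref))
            unmatched.remove(closest)
            true_positives += 1

    # Sensitivity: share of reference change points that the source detected.
    sensitivity = true_positives / len(reference) if reference else float("nan")
    # PPV: share of source change points that match a reference change point.
    ppv = true_positives / n_source if n_source else float("nan")
    return sensitivity, ppv


# Hypothetical week indices within one season, for illustration only.
cdc_change_points = [3, 10, 18, 25]
web_change_points = [3, 11, 30]
sens, ppv = sensitivity_and_ppv(cdc_change_points, web_change_points, tolerance=1)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

With a tolerance of zero, only change points falling in exactly the same week count as matches, so the choice of tolerance directly affects both measures.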
Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes. Share-alike: when posting copies or adaptations of the work, release the work under the same license as the original. For any other use of articles, please contact the copyright owner. The journal/publisher is not responsible for subsequent uses of the work, including uses infringing the above license. It is the author's responsibility to bring an infringement action if so desired by the author.