TY  - JOUR
AU  - Ray, Bisakha
AU  - Chunara, Rumi
PY  - 2017/05/02
Y2  - 2024/03/28
TI  - Predicting Acute Respiratory Infections from Participatory Data
JF  - Online Journal of Public Health Informatics
JA  - OJPHI
VL  - 9
IS  - 1
SE  - Language processing, classifiers, and syndrome definitions
DO  - 10.5210/ojphi.v9i1.7650
UR  - https://ojphi.org/ojs/index.php/ojphi/article/view/7650
SP  - 
AB  - Objective: To evaluate how well machine learning models predict a laboratory diagnosis of acute respiratory infection (ARI) from participatory data.
Introduction: ARIs have epidemic and pandemic potential. In existing studies, predictions of ARI presence from individual signs and symptoms have been based on clinically sourced data [1]. Clinical data generally represent the most severe cases and patients in locations with access to healthcare institutions. Thus, viral information from clinical sampling is insufficient to capture either disease incidence in the general population or how predictable infection is from symptoms. Participatory data, meaning information that individuals can now produce on their own thanks to ubiquitous digital tools, can help fill this gap by providing self-reported data from the community. Internet-based participatory efforts such as Flu Near You [2] have augmented existing ARI surveillance through early and widespread detection of outbreaks and public health trends.
Methods: The GoViral platform [3] was established to obtain self-reported symptoms and diagnostic specimens from the community (Table 1 summarizes participation details). Participants from the states with the most data (MA, NY, CT, NH, and CA) were included. Age, gender, zip code, and vaccination status were requested from each participant. Participants submitted saliva and nasal swab specimens and reported symptoms from the following list: fever, cough, sore throat, shortness of breath, chills, fatigue, body aches, headache, nausea, and diarrhea. Pathogens were confirmed via RT-PCR on a GenMark respiratory panel assay (full virus list reported previously [3]). Observations with missing, invalid, or equivocal lab tests were removed. Table 2 summarizes the binary features. Age categories were ≤ 20, > 20 and < 40, and ≥ 40, representing young, middle-aged, and old participants. Missing age and gender values were imputed from the overall distributions. Three machine learning algorithms were considered: Support Vector Machines (SVMs) [4], Random Forests (RFs) [5], and Logistic Regression (LR). Both individual features and their combinations were assessed. The outcome was the presence (1) or absence (0) of a laboratory diagnosis of ARI.
Results: Ten-fold cross-validation was repeated ten times. Evaluation metrics were positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity [6]. LR and SVMs yielded the best PPV, 0.64 (standard deviation ± 0.08), with cough and fever as predictors. The best sensitivity, 0.59 (± 0.14), came from LR using cough, fever, and sore throat. RFs had the best NPV and specificity, 0.62 (± 0.15) and 0.83 (± 0.10) respectively, with the CDC ILI symptom profile of fever and (cough or sore throat). Adding demographics and vaccination status did not improve classifier performance. These results are consistent with studies using clinically sourced data, in which cough and fever together were found to be the best predictors of flu-like illness [1]. Because our data include mildly infectious and asymptomatic cases, classifier sensitivity and PPV are low compared to results from clinical data.
Conclusions: Reported fever and cough together are good predictors of ARI in the community, but clinical data may overestimate their predictive performance because of sampling bias. Integrating participatory data can not only improve population health by actively engaging the general public [2] but also broaden the scope of studies based solely on clinically sourced surveillance data.
Table 1. Details of included participants.
Table 2. Coding of binary features.
ER  - 