TY  - JOUR
AU  - Fitzpatrick, Dylan
AU  - Ni, Yun
AU  - Neill, Daniel B.
PY  - 2017/05/02
Y2  - 2024/03/29
TI  - Support Vector Subset Scan for Spatial Outbreak Detection
JF  - Online Journal of Public Health Informatics
JA  - OJPHI
VL  - 9
IS  - 1
SE  - Novel algorithms, statistical or mathematical methods
DO  - 10.5210/ojphi.v9i1.7599
UR  - https://ojphi.org/ojs/index.php/ojphi/article/view/7599
SP  - 
AB  - Objective
We present the support vector subset scan (SVSS), a new method for detecting localized and irregularly shaped patterns in spatial data. SVSS integrates the penalized fast subset scan [3] with a kernel support vector machine classifier to accurately detect disease clusters that are compact and irregular in shape.
Introduction
Neill's fast subset scan [2] detects significant spatial patterns of disease by efficiently maximizing a log-likelihood ratio statistic over subsets of locations, but may result in patterns that are not spatially compact. The penalized fast subset scan (PFSS) [3] provides a flexible framework for adding soft constraints to the fast subset scan, rewarding or penalizing inclusion of individual points into a cluster with additive point-specific penalty terms. We propose the support vector subset scan (SVSS), a novel method that iteratively assigns penalties according to distance from the separating hyperplane learned by a kernel support vector machine (SVM). SVSS efficiently detects disease clusters that are geometrically compact and irregular.
Methods
Speakman [3] observes that for a fixed value of relative risk q, the log-likelihood ratio for the exponential family of expectation-based scan statistics can be written as an additive set function over all data elements. This property enables addition of element-specific penalty terms to the log-likelihood ratio, interpreted as the prior log-odds of including a data point in the cluster. We propose an iterative method for setting the penalty terms which leads to spatially compact clusters, alternately running PFSS to obtain an optimal subset and training a kernel SVM to maximize the margin between points within and outside of the subset. On each iteration of PFSS, penalties are assigned based on distance to the SVM decision boundary. We apply random restarts across the penalty space to approach a global optimum in the non-convex SVSS objective function.
Results
We demonstrate detection of disease clusters in mosquito pools tested for West Nile Virus (WNV), using data made publicly available by the Chicago Department of Public Health through the City of Chicago Data Portal. In comparison to the circular scan [1], which detects circular patterns with elevated WNV, SVSS has improved power to detect disease clusters that are elongated or irregular in shape. For example, the top WNV cluster detected by SVSS roughly conforms to sections of two major rivers in North Chicago, overlapping significant portions of the forest preserves adjacent to these rivers. The unconstrained fast subset scan [2] has high detection power for subtle and irregular disease clusters, but finds patterns that are spatially sparse and intermingled with non-anomalous points. SVSS rewards patterns with spatial coherence, detecting clusters that are compact and separated from non-anomalous points while maintaining power to detect slight but significant increases in detected rates of WNV.
Conclusions
SVSS introduces soft spatial constraints to the fast subset scan [2] in the form of penalties to the log-likelihood ratio statistic, learned iteratively based on distance to a high-dimensional SVM decision boundary. These constraints give SVSS greater power to detect spatially compact and irregular patterns of disease.
Figure caption: Clusters of West Nile Virus detected by three scanning algorithms.
ER  - 
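
The Methods paragraph of the abstract states that, for a fixed relative risk q, the log-likelihood ratio is additive over data elements, so element-specific penalties can simply be added to each term. The following is a minimal sketch of that one property, not the authors' implementation. It assumes the expectation-based Poisson statistic, for which the fixed-q term of an element with count c and baseline b is c log q + b (1 - q). The function names, the toy data, the penalty values, and the coarse grid over q are all hypothetical choices made for illustration. In SVSS the penalties would come from distances to the SVM decision boundary; here they are supplied by hand.

```python
import math

def contribution(c, b, q, penalty=0.0):
    """Fixed-q log-likelihood ratio term for one element (Poisson assumption),
    plus an additive penalty read as the prior log-odds of inclusion."""
    return c * math.log(q) + b * (1.0 - q) + penalty

def best_subset_at_q(counts, baselines, penalties, q):
    """Because the penalized score is a sum of per-element terms, the best
    subset at a fixed q is exactly the set of elements with a positive term."""
    terms = [contribution(c, b, q, d)
             for c, b, d in zip(counts, baselines, penalties)]
    subset = [i for i, t in enumerate(terms) if t > 0]
    return subset, sum(terms[i] for i in subset)

def penalized_scan(counts, baselines, penalties, q_grid):
    """Illustrative simplification: try each q on a grid and keep the best
    penalized score. Returns (subset, score, q); q is None if nothing scores > 0."""
    best = ([], 0.0, None)
    for q in q_grid:
        subset, score = best_subset_at_q(counts, baselines, penalties, q)
        if score > best[1]:
            best = (subset, score, q)
    return best

# Toy data (hypothetical): elements 0, 1 and 4 have elevated counts.
counts = [20, 18, 5, 4, 19]
baselines = [10.0, 10.0, 5.0, 5.0, 10.0]
q_grid = [1.0 + 0.1 * k for k in range(1, 21)]  # q in (1, 3]

# Without penalties, all three elevated elements are selected.
unpenalized = penalized_scan(counts, baselines, [0.0] * 5, q_grid)

# A strongly negative penalty on element 4 (as if it lay far on the wrong
# side of a decision boundary) removes it from the optimal subset.
penalties = [0.5, 0.5, -0.5, -0.5, -5.0]
penalized = penalized_scan(counts, baselines, penalties, q_grid)

print(unpenalized[0], penalized[0])
```

In this toy run the penalty changes which elements are selected without changing how the search works: the subset at each q is still found by checking the sign of one term per element.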