Bias Associated with Mining Electronic Health Records


  • George Hripcsak, Columbia University
  • Charles Knirsch, Pfizer Inc., New York, NY
  • Li Zhou, Partners Healthcare, Boston, MA
  • Adam Wilcox, Columbia University
  • Genevieve Melton, University of Minnesota



Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce distorted results. We found that a naïve approach approximated the gold standard, but errors in a minority of cases shifted the estimated mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.




How to Cite

Hripcsak, G., Knirsch, C., Zhou, L., Wilcox, A., & Melton, G. (2011). Bias Associated with Mining Electronic Health Records. DISCO: Journal of Biomedical Discovery and Collaboration, 6, 48–52.



Lessons Learned