Abstract

The amount of observational data available for research is growing rapidly with the rise of electronic health records (EHRs) and patient-generated data. However, these data bring new challenges, as data collected outside controlled environments and generated for purposes other than research may be error-prone, biased, or systematically missing. Analysis of these data requires methods that are robust to such challenges, yet current methods for causal inference handle uncertainty only at the level of causal relationships, rather than variables or specific observations. In contrast, we develop a new approach for causal inference from time series data that allows uncertainty at the level of individual data points, so that inferences depend more strongly on variables and individual observations that are more certain. In the limit, a completely uncertain variable will be treated as if it were not measured. Using simulated data, we demonstrate that the approach is more accurate than the state of the art, making substantially fewer false discoveries. Finally, we apply the method to a unique set of data collected from 17 individuals with type 1 diabetes mellitus (T1DM) in free-living conditions over 72 h, where glucose levels, insulin dosing, physical activity, and sleep are measured using body-worn sensors. These data often have high rates of error that vary across time, but we are able to uncover relationships such as that between anaerobic activity and hyperglycemia. Ultimately, better modeling of uncertainty may enable better translation of methods to free-living conditions, as well as better use of noisy and uncertain EHR data.

  • Publication date: 2016-10