A multi-component classifier for nonalcoholic fatty liver disease (NAFLD) based on genomic, proteomic, and phenomic data domains

Authors: Wood G Craig; Chu Xin; Argyropoulos George; Benotti Peter; Rolston David; Mirshahi Tooraj; Petrick Anthony; Gabrielson John; Carey David J; DiStefano Johanna K; Still Christopher D; Gerhard Glenn S
Source: Scientific Reports, 2017, 7(1): 43238.
DOI: 10.1038/srep43238

Abstract

Non-alcoholic fatty liver disease (NAFLD) represents a spectrum of conditions, including steatohepatitis and fibrosis, that are thought to emanate from hepatic steatosis. Few robust biomarkers or diagnostic tests have been developed for hepatic steatosis in the setting of obesity. We have developed a multi-component classifier for hepatic steatosis composed of phenotypic, genomic, and proteomic variables using data from 576 adults with extreme obesity who underwent bariatric surgery and intra-operative liver biopsy. In a 443-patient training set, protein biomarker discovery was performed using the highly multiplexed SOMAscan® proteomic assay, a set of 19 clinical variables, and the steatosis-predisposing PNPLA3 rs738409 single nucleotide polymorphism genotype status. The most stable markers were selected using a stability selection algorithm with an L1-regularized logistic regression kernel and were then fitted with logistic regression models to classify steatosis; these models were then tested against a 133-sample blinded verification set. The highest area under the ROC curve (AUC) for steatosis obtained from a single domain (PNPLA3 rs738409 genotype, 8 proteins, or 19 phenotypic variables) was 0.913, whereas the final classifier that included variables from all three domains had an AUC of 0.935. These data indicate that multi-domain modeling has better predictive power than comprehensive analysis of variables from a single domain.
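
The selection-then-classification workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the penalty strength, the number of subsampling rounds, the half-size subsamples, and the 0.8 stability threshold are all assumptions chosen for illustration. The sketch repeatedly fits an L1-penalized logistic regression on random subsamples, keeps the features that are selected most often, refits a plain logistic regression on those features, and scores it by AUC on held-out samples.

```python
import math
import random

def fit_logistic(X, y, lam=0.0, steps=100, lr=0.5):
    """Logistic regression with an L1 penalty of strength lam, fitted by
    proximal gradient descent (soft-thresholding). lam=0 is a plain fit."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(p):
            v = w[j] - lr * grad_w[j]
            shrunk = max(abs(v) - lr * lam, 0.0)  # soft-threshold
            w[j] = shrunk if v > 0 else -shrunk
    return w, b

def stability_selection(X, y, lam, n_rounds=20, seed=0):
    """Fraction of half-size subsamples in which each feature gets a
    nonzero L1-penalized coefficient (illustrative settings)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)
        w, _ = fit_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_rounds for c in counts]

def auc(scores, labels):
    """Area under the ROC curve as the probability that a positive case
    outscores a negative one (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0
               for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (an assumption): only feature 0 carries signal.
rng = random.Random(0)
n, p = 160, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

freqs = stability_selection(X_train, y_train, lam=0.1)
stable = [j for j, f in enumerate(freqs) if f >= 0.8]  # assumed threshold

def keep_stable(rows):
    return [[r[j] for j in stable] for r in rows]

# Refit without a penalty on the stable features, then score held-out data.
w, b = fit_logistic(keep_stable(X_train), y_train)
scores = [b + sum(wj * xj for wj, xj in zip(w, r)) for r in keep_stable(X_test)]
test_auc = auc(scores, y_test)
print("selection frequencies:", freqs)
print("stable features:", stable, "held-out AUC:", round(test_auc, 3))
```

In practice one would likely use an established library for the penalized fits and the ROC analysis; the hand-written routines here only keep the sketch dependency-free.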

  • Publication date: 2017-3-7