Abstract

This study proposes a novel algorithm for investigating the risk factors of complex diseases. We applied it to identify the risk factors for depressive disorder, osteoporosis, and fracture in young patients with breast cancer who underwent curative surgery. The algorithm has three steps. First, multiple correspondence analysis (MCA) transforms the raw data set into a multidimensional coordinate matrix. Second, the expectation-maximization (EM) algorithm clusters the multidimensional coordinates of each variable category. Third, v-fold cross-validation is applied to the coordinate matrix obtained with the MCA-EM algorithm to determine the optimal clustering of complex diseases and risk factors. A total of 4108 patients with breast cancer aged 20-39 years were enrolled. Depressive disorder, osteoporosis, and fracture were clustered with liver cirrhosis, chronic obstructive pulmonary disease (COPD), distant metastasis, and primary metastatic and adjuvant therapies, namely chemotherapy, radiotherapy, tamoxifen, aromatase inhibitors, and trastuzumab. Of the risk factors identified by the algorithm, liver cirrhosis and COPD have rarely been reported in the literature. In conclusion, the proposed algorithm enables clinicians to identify risk factors for multiple diseases.
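The three steps described above can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the EM step fits a diagonal-covariance Gaussian mixture to the MCA category coordinates, category masses are not used as weights during clustering, and the v-fold cross-validation step is assumed to select the number of clusters by mean held-out log-likelihood (the abstract does not state the criterion). The data are synthetic and bear no relation to the study cohort.

```python
import numpy as np


def indicator_matrix(data):
    """One-hot (disjunctive) coding of an n x Q array of category codes."""
    blocks, names = [], []
    for j in range(data.shape[1]):
        cats = np.unique(data[:, j])
        blocks.append((data[:, [j]] == cats[None, :]).astype(float))
        names += [f"var{j}={c}" for c in cats]
    return np.hstack(blocks), names


def mca_column_coords(Z, n_dims=2):
    """Step 1 (MCA): correspondence analysis of the indicator matrix Z.
    Returns principal coordinates of the categories and their masses."""
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    _, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    G = (Vt[:n_dims].T * sigma[:n_dims]) / np.sqrt(c)[:, None]
    return G, c


def log_gauss(X, means, var):
    """Log-density of each row of X under each diagonal Gaussian (n x k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * var).sum(axis=1)[None, :]
                   + (diff ** 2 / var[None, :, :]).sum(axis=2))


def logsumexp(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def fit_gmm(X, k, n_iter=100, reg=1e-3, seed=0):
    """Step 2 (EM): fit a k-component diagonal Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n = len(X)
    means = X[rng.choice(n, k, replace=False)]
    var = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        log_r = log_gauss(X, means, var) + np.log(weights)
        resp = np.exp(log_r - logsumexp(log_r)[:, None])
        # M-step: update mixing weights, means and variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + reg
    return weights, means, var


def choose_k_by_vfold(X, k_max=3, v=4, seed=0):
    """Step 3 (assumed criterion): k with the best mean held-out log-likelihood."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    folds = np.array_split(rng.permutation(idx), v)
    scores = {}
    for k in range(1, k_max + 1):
        total = 0.0
        for test in folds:
            train = np.setdiff1d(idx, test)
            w, m, s = fit_gmm(X[train], k, seed=seed)
            total += logsumexp(log_gauss(X[test], m, s) + np.log(w)).sum()
        scores[k] = total / len(X)
    return max(scores, key=scores.get), scores


# Synthetic demo data (NOT the study cohort): 500 hypothetical patients and six
# binary variables; var0-var2 share a latent factor, var3-var5 are pure noise.
rng = np.random.default_rng(42)
latent = rng.integers(0, 2, 500)
data = np.column_stack([
    rng.random(500) < (np.where(latent == 1, 0.8, 0.2) if j < 3 else 0.5)
    for j in range(6)
]).astype(int)

Z, names = indicator_matrix(data)
G, masses = mca_column_coords(Z, n_dims=2)
best_k, scores = choose_k_by_vfold(G, k_max=3, v=4)
w, m, s = fit_gmm(G, best_k)
labels = np.argmax(log_gauss(G, m, s) + np.log(w), axis=1)
for name, lab in zip(names, labels):
    print(f"{name}: cluster {lab}")
```

Held-out log-likelihood is used here because it penalizes mixtures that overfit the training category points, which is one common way to let cross-validation pick the number of EM clusters; other criteria would be equally plausible readings of the abstract.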