Objective Using genome-wide association study (GWAS) data, Mendelian randomization (MR) was applied to investigate the causal relationships between the oral flora and type 2 diabetes (T2D) and myocardial infarction (MI). Methods Genetic association data for the oral microbiota were taken from the GWAS dataset of the Chinese 4D-SZ cohort, and T2D and MI outcome data were obtained from a large-scale cohort study in BioBank Japan. Four MR methods, including inverse-variance weighting (IVW), were used to analyze the causal relationship between exposure and outcomes. Sensitivity analyses were conducted on the significant MR results to verify their robustness. Results A total of 24 species of dorsal tongue flora and 13 species of salivary flora showed a potential causal relationship with T2D. For MI, 12 species each of dorsal tongue flora and salivary flora showed a potential causal relationship. A total of 8 oral taxa found on the dorsum of the tongue and in saliva could affect both T2D and MI, namely Saccharimonadaceae, Treponemataceae, Prevotella, Haemophilus, Lachnoanaerobaculum, Campylobacter_A, Neisseria, and Streptococcus. Conclusion We identified 8 oral taxa causally associated with both T2D and MI, suggesting that diabetes may play a role in promoting the progression of myocardial infarction by affecting these oral flora.
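For readers unfamiliar with the IVW estimator named in the Methods, the following is a minimal sketch of the standard fixed-effect form, which combines per-variant ratio estimates weighted by the precision of the outcome association. It is not the authors' analysis code. The function name `ivw_estimate`, its argument names, and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each genetic variant j contributes a ratio estimate
    beta_outcome[j] / beta_exposure[j], weighted by
    beta_exposure[j]**2 / se_outcome[j]**2 (first-order weights).
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one genetic variant is required")

    # Weighted numerator and denominator of the pooled ratio estimate.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three variants (illustration only).
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.02, 0.03],
)
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.4f}")
```

In practice, MR analyses of this kind are usually run with dedicated packages that also provide the complementary estimators and sensitivity analyses; the sketch only shows the arithmetic behind the IVW point estimate.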
Objective To construct a "disease-syndrome combination" mathematical representation model for pulmonary nodules based on oral microbiome data, using a multimodal data algorithm framework centered on dynamic systems theory; and to compare predictive models built under different algorithmic frameworks and validate the efficacy of the optimal model in predicting the presence of pulmonary nodules. Methods A total of 213 subjects were prospectively enrolled from July 2022 to March 2023 at the Hospital of Chengdu University of Traditional Chinese Medicine, Sichuan Cancer Hospital, and the Chengdu Integrated Traditional Chinese and Western Medicine Hospital. The cohort comprised 173 patients with pulmonary nodules and 40 healthy subjects. A novel multimodal data algorithm framework centered on dynamic systems theory, termed VAEGANTF (Variational Autoencoder-Generative Adversarial Network-Transformer), was proposed. Based on a multi-dimensional integrated dataset of "clinical features-syndrome elements-microorganisms", all subjects were divided into training (70%) and testing (30%) sets for model construction and efficacy testing, respectively. Group membership (healthy subject versus patient with pulmonary nodules) served as the dependent variable. Candidate markers, including clinical features, lesion location, disease nature, and microbial genera, were checked for multicollinearity, and the independent variables were then screened by variable importance ranking. Missing values were imputed and the data were standardized. Eight machine learning algorithms were then used to construct pulmonary nodule risk prediction models: random forest, least absolute shrinkage and selection operator (LASSO) regression, support vector machine, multilayer perceptron, eXtreme gradient boosting (XGBoost), VAE-ViT (Vision Transformer), GAN-ViT, and VAEGANTF. K-fold cross-validation was used for model parameter tuning and optimization.
The efficacy of the eight predictive models was evaluated using confusion matrices and receiver operating characteristic (ROC) curves, and the optimal model was selected. Finally, goodness-of-fit testing and decision curve analysis (DCA) were performed to evaluate the optimal model. Results There were no statistically significant differences between the two groups in demographic characteristics such as age and sex. The 213 subjects were randomly divided into training and testing sets (7:3), and prediction models were constructed using the eight machine learning algorithms. After potential problems such as multicollinearity were excluded, a total of 301 markers covering clinical features, syndrome elements, and microbial genera were included for model construction. The area under the curve (AUC) values of the random forest, LASSO regression, support vector machine, multilayer perceptron, and VAE-ViT models did not reach 0.85, indicating poor efficacy. The AUC values of the XGBoost, GAN-ViT, and VAEGANTF models all exceeded 0.85, with the VAEGANTF model achieving the highest value (AUC=0.923). Goodness-of-fit testing indicated good calibration of the VAEGANTF model, and decision curve analysis showed a high degree of clinical benefit. The nomogram results showed that age, sex, heart, lung, Qi deficiency, blood stasis, dampness, and the genera Porphyromonas, Granulicatella, Neisseria, Haemophilus, and Actinobacillus could be used as predictors. Conclusion The "disease-syndrome combination" risk prediction model for pulmonary nodules based on the VAEGANTF algorithm framework, which incorporates multi-dimensional data features of "clinical features-syndrome elements-microorganisms", performs better than the other machine learning algorithms and has reference value for the early non-invasive diagnosis of pulmonary nodules.
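The two evaluation quantities reported above, the ROC AUC and the net benefit plotted in a decision curve, can be illustrated with a short sketch. This is not the authors' pipeline: the function names `roc_auc` and `net_benefit` are hypothetical, the labels and scores are made-up toy values rather than study data, and the AUC is computed here by the pairwise (Mann-Whitney) definition for transparency rather than with a library routine.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def net_benefit(labels, scores, threshold):
    """Net benefit at one threshold probability, as used in decision
    curve analysis: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(
        1 for y, s in zip(labels, scores) if s >= threshold and y == 1
    )
    false_pos = sum(
        1 for y, s in zip(labels, scores) if s >= threshold and y == 0
    )
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Toy example (illustration only): 1 = pulmonary nodule, 0 = healthy.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))           # 0.75
print(net_benefit(labels, scores, 0.5))  # 0.0
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.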