Deep learning models to predict survival of patients with hepatocellular carcinoma based on Surveillance, Epidemiology, and End Results (SEER) database analysis.

Data Description

In this study, 35,444 patients with HCC were identified in the SEER database between 2010 and 2015, of whom 2197 met the inclusion criteria. Table 1 shows the basic clinical characteristics of the patients (eTable 1 in the Appendix). Of the 2197 participants, 70% (n = 1548) were 66 years of age or younger, 23% (n = 505) were between 66 and 77 years of age, and 6.6% (n = 144) were older than 77 years. Male participants accounted for 78% (n = 1915), while females represented 22% (n = 550). In terms of race, the majority of participants were White, 66% (n = 1455), followed by Asian or Pacific Islander individuals at 22% (n = 478), Black individuals at 10% (n = 228), and American Indian/Alaska Native individuals at only 1.6% (n = 36). Regarding marital status, 60% (n = 1319) were married, and the remaining 40% (n = 878) had another marital status. Histologically, most participants (98%, n = 2154) were type 8170. Additionally, 50% (n = 1104) of patients had grade II tumors, 18% (n = 402) grade III, 1.0% (n = 22) grade IV, and 30% (n = 669) grade I. In terms of tumor stage, 48% (n = 1054) of participants were in stage I, 29% (n = 642) in stage II, 16% (n = 344) in stage III, and 7.1% (n = 157) in stage IV. Regarding TNM classification, 49% (n = 1079) were T1, 31% (n = 677) were T2, 96% (n = 2114) were N0, and 95% (n = 2090) were M0. Positive/elevated AFP was found in 66% (n = 1444) of participants, and 70% (n = 1532) showed high levels of liver fibrosis. A single tumor was present in 92% (n = 2012), while the remaining 8.4% (n = 185) had multiple tumors. Regarding surgery, 32% (n = 704) underwent lobectomy, 14% (n = 311) underwent local tumor destruction, 20% (n = 429) underwent wedge or segmental resection, and 34% (n = 753) had no surgery. Finally, 2.1% (n = 46) received radiation therapy; 62% (n = 1352) did not receive chemotherapy, and 38% (n = 855) did.
The median overall survival (OS) for participants was 45 ± 34 months, with 1327 (60%) alive at the end of follow-up.

Table 1 shows univariate and multivariate Cox regression analyses of significant characteristics.

Feature selection

After univariate Cox regression analysis, we identified several factors significantly associated with the survival of patients with hepatocellular carcinoma (P < 0.05): age, race, marital status, histological type, tumor grade, tumor stage, T stage, N stage, M stage, alpha-fetoprotein level, tumor size, type of surgery, and chemotherapy status. In multivariate Cox regression analysis, however, only age, marital status, histological type, tumor grade, tumor stage, and tumor size were confirmed as independent factors affecting patient survival (P < 0.05) (Table 1). Furthermore, collinearity analysis showed a high degree of collinearity between the overall tumor stage (Stage) and the individual T, N, and M stages (Fig. 1). This occurs mainly because the overall tumor stage is determined directly from the TNM findings. This collinearity means that these variables must be handled carefully during modeling to avoid overfitting and low predictive performance.

Although some variables were not identified as independent predictors in the multivariate analysis, we included them in the construction of our deep learning models for two reasons. First, these variables can capture subtle interactions and nonlinear relationships that are not readily apparent in traditional regression models but can be identified by more flexible techniques such as deep learning. Second, including a broader set of variables may increase the generalizability and robustness of the model across different clinical scenarios, allowing it to better account for variation across patient subgroups or treatment conditions.

Based on this analysis, we finally selected 12 significant factors (age, race, marital status, histological type, tumor grade, T stage, N stage, M stage, alpha-fetoprotein, tumor size, type of surgery, and chemotherapy) to build the predictive models. We divided the dataset into two subsets: a training set containing 1537 samples and a test set containing 660 samples (Table 2). By training and testing the models on these data, we aim to develop a model that can accurately predict the survival of patients with hepatocellular carcinoma, support clinical decision making, and improve patient prognosis.
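The reported subset sizes (1537 and 660 of 2197) are consistent with a split of roughly 70% to 30%. The following is a minimal sketch of such a split, not the authors' procedure: the function name `split_indices`, the 0.3 fraction, the round-up rule, and the seed are all assumptions made for illustration.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into (train, test) lists.

    Assumption: the test size is rounded up, so 30% of 2197 gives 660.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(2197)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, so every model is evaluated on the same held-out patients.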

Figure 1

Correlation coefficients for each pair of variables in the data set.
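A pairwise correlation matrix of this kind can be sketched as follows. This is an illustration only: the source does not state which correlation coefficient was used, so Pearson's r is an assumption, the toy values are made up (they are not SEER data), and the 0.8 flagging threshold is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict matrix[a][b] = r for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up ordinal encodings for eight hypothetical patients.
toy = {
    "stage": [1, 1, 2, 2, 3, 4, 1, 3],
    "t_stage": [1, 1, 2, 2, 3, 3, 1, 4],
    "age": [60, 72, 55, 68, 63, 70, 58, 66],
}
matrix = correlation_matrix(toy)

# Flag strongly correlated pairs (hypothetical threshold of 0.8).
names = list(toy)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(matrix[a][b]) > 0.8]
print(flagged)
```

Pairs flagged this way are candidates for dropping one member, which matches the decision to keep T, N, and M stage while leaving out the overall stage.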

Table 2 Distribution of the main features in the training and test sets.

Hyperparameter optimization and model comparison results

Initially, we performed 1000 iterations of random search with five-fold cross-validation on the training set. Across these validations, we selected the parameters with the highest average concordance index (C-index) as the best parameters. Figure 2 shows the loss curves for the two deep learning models, N-MTLR and DeepSurv, illustrating how their losses changed during training.
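The search procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: `k_fold_indices`, `random_search`, the search space, and `toy_score` are hypothetical names, and `toy_score` stands in for "train a survival model on the training fold and return its validation C-index".

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def random_search(search_space, score_fn, n_samples, n_iter=1000, k=5, seed=0):
    """Sample n_iter parameter sets; keep the one with the best mean fold score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        # Every candidate is scored on the same folds (same seed).
        fold_scores = [score_fn(params, train_idx, val_idx)
                       for train_idx, val_idx in k_fold_indices(n_samples, k, seed)]
        mean_score = statistics.mean(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    """Placeholder for model training; returns a made-up 'C-index'."""
    bonus = 0.02 if params["learning_rate"] == 0.01 else 0.0
    return 0.70 + bonus


space = {"learning_rate": [0.1, 0.01, 0.001], "hidden_units": [16, 32, 64]}
best_params, best_score = random_search(space, toy_score, n_samples=1537, n_iter=20)
print(best_params, best_score)
```

Averaging over the five folds before comparing candidates reduces the chance of picking parameters that only look good on one lucky fold.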

Figure 2

Loss convergence graphs for (A) DeepSurv and (B) neural network multitask logistic regression (N-MTLR) models.

Table 3 presents the prediction performance of each model on the test set, comparing the machine learning models with the standard Cox proportional hazards (CoxPH) model. In our analysis, we used the log-rank test to compare the concordance indices (C-index) across models. The results indicated that the three machine learning models (DeepSurv, N-MTLR, and RSF) demonstrated significantly higher discrimination ability than the standard CoxPH model (p < 0.01), as detailed in Table 4. The C-index was 0.7317 for DeepSurv, 0.7353 for NMTLR, and 0.7336 for RSF, compared to only 0.6837 for the standard CoxPH model. Among the three machine learning models, NMTLR had the highest C-index, indicating the best discrimination. Further analysis of the Integrated Brier Score (IBS) showed values of 0.1598 (NMTLR), 0.1632 (DeepSurv), 0.1648 (RSF), and 0.1789 (CoxPH) for the four models (Fig. 3). The NMTLR model had the lowest IBS, indicating the best performance in terms of prediction uncertainty. Furthermore, there was no significant difference between the C-index obtained from the training and test sets, which suggests that the NMTLR model generalizes well to complex real-world data and effectively avoids overfitting.
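For readers unfamiliar with the C-index, a minimal sketch of Harrell's concordance index is given here. It is a simplified illustration rather than the implementation used in the study: pairs with tied survival times are skipped, tied risk scores count as half, and the example values are made up.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has the shorter time and
    experienced the event (events[i] == 1). The pair is concordant when
    the patient with the shorter survival has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: survival in months, event flag (1 = death), predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [4.0, 3.0, 2.0, 1.0]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the gap between roughly 0.73 for the machine learning models and 0.68 for CoxPH is meaningful.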

Table 3 Performance of four survival models.
Table 4 Comparative analysis of discrimination ability (C-index) between CoxPH and machine learning models (DeepSurv, N-MTLR, RSF).
Figure 3

Through calibration plots (Fig. 4), we observed that the NMTLR model demonstrated excellent consistency between model predictions and actual observations for 1-year, 3-year, and 5-year overall survival rates, followed by the DeepSurv model, the RSF model, and the CoxPH model. This consistency was also reflected in the AUC values: for the prediction of 1-year, 3-year, and 5-year survival rates, the NMTLR and DeepSurv models had higher AUC values than the RSF and CoxPH models. Specifically, the 1-year AUC values were 0.803 for NMTLR and 0.794 for DeepSurv, compared with 0.786 for RSF and 0.766 for CoxPH; the 3-year AUC values were 0.808 for NMTLR and 0.809 for DeepSurv, compared with 0.797 for RSF and 0.772 for CoxPH; the 5-year AUC values were 0.819 for both DeepSurv and NMTLR, compared with 0.812 for RSF and 0.772 for CoxPH. The results show that in predicting the survival of patients with hepatocellular carcinoma, the deep learning models (DeepSurv and NMTLR) show higher accuracy than the RSF and classical CoxPH models. The NMTLR model performed better on several evaluation metrics.

Figure 4

Receiver operating characteristic (ROC) curves and calibration curves for 1-, 3-, and 5-year survival predictions. ROC curves for (A) 1-, (C) 3-, and (E) 5-year survival predictions. Calibration curves for (B) 1-, (D) 3-, and (F) 5-year survival predictions.

Model feature importance

In feature analysis of deep learning models, the effect of a feature on the model's accuracy can be measured by the percentage reduction in the concordance index (C-index) when its values are replaced by random data. A high reduction percentage indicates the critical importance of the feature in maintaining the predictive accuracy of the model. Figure 5 shows the feature importance heatmaps for the DeepSurv, NMTLR, and RSF models.
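The measurement described above can be sketched as a permutation-style importance. This is an illustration under stated assumptions: shuffling a feature's column is used here as one common way of "replacing values with random data" (the source does not specify the mechanism), and `toy_predict`, the feature names, and the data are hypothetical.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C-index (pairs with tied times skipped, tied risks count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(predict_risk, rows, times, events, feature,
                           n_repeats=20, seed=0):
    """Mean percentage drop in C-index after shuffling one feature's values."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, [predict_risk(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted_rows = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        permuted = concordance_index(
            times, events, [predict_risk(r) for r in permuted_rows])
        drops.append(100.0 * (baseline - permuted) / baseline)
    return sum(drops) / len(drops)


def toy_predict(row):
    """Hypothetical stand-in for a trained model: risk grows with tumor size."""
    return row["tumor_size"]


# Made-up data: larger tumors have shorter survival; "noise" is ignored.
sizes = [80, 65, 50, 40, 30, 20]
noise = [3, 1, 4, 1, 5, 9]
rows = [{"tumor_size": s, "noise": z} for s, z in zip(sizes, noise)]
times = [5, 9, 14, 20, 27, 35]
events = [1, 1, 1, 0, 1, 1]

for name in ("tumor_size", "noise"):
    print(name, permutation_importance(toy_predict, rows, times, events, name))
```

A feature the model ignores yields a drop of exactly zero, while a feature the model relies on yields a positive drop; averaging over repeats smooths out the randomness of individual shuffles.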

Figure 5

Heatmap of feature importance for DeepSurv, Neural Network Multitask Logistic Regression (NMTLR) and Random Survival Forest (RSF) models.

In the NMTLR model, replacing features such as age, race, marital status, histological type, tumor grade, T stage, N stage, alpha-fetoprotein, tumor size, type of surgery, and chemotherapy with random data decreased the concordance index by more than 0.1% on average. In the DeepSurv model, a similar average decrease in the concordance index was observed when variables such as age, race, marital status, histological type, T stage, N stage, alpha-fetoprotein, tumor size, and type of surgery were replaced with random data. In the RSF model, we found that age, race, tumor grade, T stage, M stage, tumor size, and type of surgery affected the accuracy of the model, as indicated by a decrease in the C-index, averaging less than 0.1%, when they were replaced with random data.

The risk stratification potential of the NMTLR model

In the training cohort, the NMTLR model was used to predict patient risk probabilities. Optimal threshold values for these probabilities were determined using X-tile software. Based on these cut-off points, patients were classified into low-risk (< 178.8), medium-risk (178.8–248.4), and high-risk (> 248.4) categories. A statistically significant difference was observed in the survival curves between the groups, with a p-value less than 0.001, as shown in Fig. 6A. Similar results were replicated in the external validation cohort, as shown in Fig. 6B, demonstrating the robust risk stratification ability of the NMTLR model.
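The stratification step can be sketched as follows, using the two reported cut-off points. This is an illustration only: the function names are hypothetical, assigning scores that fall exactly on a cut-off to the medium-risk group is an assumption, and the Kaplan-Meier helper is a plain textbook estimator applied to made-up data, not the study's analysis code.

```python
def risk_group(score, low_cut=178.8, high_cut=248.4):
    """Map a predicted risk score to a group using the reported cut-offs."""
    if score < low_cut:
        return "low"
    if score <= high_cut:
        return "medium"
    return "high"


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death)."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Made-up patients: (risk score, survival months, event flag).
patients = [(120.0, 60, 0), (150.5, 48, 1), (200.0, 30, 1),
            (230.0, 36, 0), (260.0, 10, 1), (300.0, 6, 1)]

groups = {}
for score, months, event in patients:
    groups.setdefault(risk_group(score), []).append((months, event))

for name, members in groups.items():
    months = [m for m, _ in members]
    flags = [e for _, e in members]
    print(name, kaplan_meier(months, flags))
```

Plotting one such curve per group gives the kind of comparison shown in Figure 6; a log-rank test would then be applied to check whether the curves differ.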

Figure 6

Kaplan-Meier curves evaluating the risk stratification ability of the NMTLR model.

Model deployment

The web application developed in this study, intended primarily for research or informational purposes, is publicly accessible online. The functionality and output visualization of this application are shown in Figure 7 and eFigure 1 in the Appendix.

Figure 7

Web-based application of the NMTLR model.
