Machine learning-driven prediction of medical expenses in triple-vessel PCI patients using feature selection

Abstract

Revascularization therapies, such as percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG), alleviate symptoms and treat myocardial ischemia. Patients with multivessel disease, particularly those undergoing 3-vessel PCI, are more susceptible to procedural complications, which can increase healthcare costs. Developing efficient strategies for resource allocation has become a paramount concern due to tightening healthcare budgets and the escalating costs of treating heart conditions. Therefore, it is essential to develop an evaluation model to estimate the costs of PCI surgeries and identify the key factors influencing these costs to enhance healthcare quality. This study utilized the National Health Insurance Research Database (NHIRD), encompassing data from multiple hospitals across Taiwan and covering up to 99% of the population. The study examined data from triple-vessel PCI patients treated between January 2015 and December 2017. Additionally, six machine-learning algorithms and five cross-validation techniques were employed to identify key features and construct the evaluation model. The machine learning algorithms used included linear regression (LR), random forest (RF), support vector regression (SVR), generalized linear model boost (GLMBoost), Bayesian generalized linear model (BayesGLM), and extreme gradient boosting (eXGB). Among these, the eXGB model exhibited outstanding performance, with the following metrics: MSE (0.02419), RMSE (0.15552), and MAPE (0.00755). We found that the patient’s medication use in the previous year is also crucial in determining subsequent surgical costs. Additionally, 25 significant features influencing surgical expenses were identified. The top variables included 1-year medical expenditure before PCI surgery (hospitalization and outpatient costs), average blood transfusion volume, ventilator use duration, Charlson Comorbidity Index scores, emergency department visits, and patient age. 
This research is crucial for estimating potential expenses linked to complications from the procedure, directing the allocation of resources in the future, and acting as an important resource for crafting medical management policies.

Introduction

Across the globe, healthcare providers face the challenge of prioritizing and distributing resources under the constraints of limited budgets. Heart disease, a major public health concern, is responsible for approximately US$39 billion in treatment costs every year in the United States [1]. Furthermore, by 2030, the cost of treatment for heart disease in the United States is projected to increase to US$70 billion [2, 3]. Due to their advanced age and accompanying health conditions, patients with heart disease are frequently hospitalized for treatment, which in turn elevates their risk of mortality [4, 5].

Coronary atherosclerosis is a heart-related disease that often incurs high treatment costs [4,5,6,7]. Percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG) are commonly used procedures for treating coronary artery disease. Both of these procedures involve cardiac revascularization and typically require hospitalization [4, 8]. However, the cost of these procedures can vary significantly depending on several factors [9].

The average cost of a PCI procedure ranges from approximately $11,030 for elective procedures to $14,840 for acute cases. In patients found to have triple-vessel disease (TVD) on coronary angiography, particularly those diagnosed with diabetes and impaired left ventricular function, these costs can escalate further when complications occur [10, 11].

Due to the differences in healthcare systems, having data specific to each country is crucial for supporting decision-making. To reduce healthcare expenses, researchers have conducted extensive studies to identify the key factors influencing hospitalization costs for patients undergoing PCI. Table 1 presents a range of methods for identifying risk factors and analyzing medical expenses in patients who have undergone PCI. Pohlen et al. used the dataset from Germany in 2002 and found that major cost drivers for PCI include elevated creatinine levels, reduced ejection fraction, and thrombus presence, with patients suffering from acute coronary syndromes (ACS) incurring significantly higher costs due to urgent and expensive treatments [12]. Furthermore, the average cost per patient among ACS patients, of whom 85% received PCI treatment, was €2601, with a standard deviation of €5378 over a one-year follow-up period. The study highlighted that depression scores, LDL cholesterol levels, and left ventricular ejection fraction (LVEF) were significant predictors of overall healthcare costs, accounting for up to 30% of the total variance in costs [13].

Table 1 Literature review of risk factor analyses in patients who have undergone PCI

Amin et al. found that acute kidney injury (AKI) after PCI significantly increases hospital costs, with an estimated annual burden of $411.3 million in the United States [17]. Moleerergpoom et al. reported that costs for patients treated with PCI were significantly higher than for other treatment modalities, with PCI adding 116,445 baht to the overall cost [14]. Hospitalization costs for patients undergoing PCI have increased over time, leaving hospitals with a deficit ranging from $4,493 to $7,940 per patient in 2009 [16]. In Hong Kong, PCI significantly affects the treatment costs for CAD because of the increased need for invasive procedures and associated medical care. For patients with comorbidities, particularly hyperlipidemia, additional self-financed consumables (e.g., balloons and stents) push costs even higher [15].

The studies reviewed above show that few investigations have used machine learning (ML) methods to evaluate PCI-related medical expenses. ML has a broad spectrum of applications and holds substantial potential for efficiently allocating healthcare resources.

Given the constraints of limited healthcare budgets, the main purpose of this study is to construct an ML evaluation model for assessing medical expenses before triple-vessel PCI and to provide a valuable reference for future resource allocation and the formulation of medical management policies. We employ feature selection alongside multiple ML algorithms to identify and prioritize the key factors influencing medical expenses. Lastly, we develop a framework for evaluating medical expenses for patients who underwent new-onset triple-vessel PCI surgery between 2015 and 2017. This information is pivotal in identifying potential costs associated with complications arising from the procedure.

Materials and methods

Data source

Since its launch in March 1995, Taiwan’s National Health Insurance system has had approximately 23 million beneficiaries. The National Health Insurance Research Database (NHIRD) is Taiwan’s most comprehensive administrative healthcare resource and has been widely utilized in numerous academic studies. Between 2002 and 2018, the National Health Research Institutes of Taiwan contributed to over 3,300 published research articles based on this database. As a crucial empirical tool, the NHIRD significantly influences medical decision-making and provides critical disease prevention and management insights [18].

This study focused on patients undergoing triple-vessel PCI for the first time and utilized data obtained from the NHIRD. In 2016, the NHIRD started including diagnosis codes from both the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). Because the drug coding system used in the NHIRD is non-hierarchical, various researchers have mapped its codes to international systems, e.g., the WHO Anatomical Therapeutic Chemical (ATC) Classification System, to facilitate their studies [19]; the mapped codes provide valuable insights into patients’ medical histories.

The NHIRD organizes its data across various datasheets containing ambulatory care records, inpatient and outpatient claims, medication prescription details, and medical facility registries. In addition, the National Death Registry of Taiwan includes the patient’s survival status and cause of death. To ensure patient privacy, all original beneficiary identification data that could potentially be used to identify patients or care providers, including the names of medical institutions and physicians, were encrypted.

However, each patient and hospital was assigned a unique personal identification number (PIN), which is the only way to recognize the same patient or hospital across records [20]. By linking these datasheets through the PINs, researchers can connect a patient’s past and future data, including demographic information, for research purposes [20].

This study was approved by the Institutional Review Board of Fu Jen Catholic University, New Taipei City, Taiwan (Approval no. C108121). No informed consent was required.

Research framework

Figure 1 depicts the four-stage conceptual framework of patients who underwent new-onset PCI. In the first stage, we conducted a retrospective cohort analysis of critically ill patients who underwent triple-vessel PCI (procedure codes 33078A and 33078B) for the first time between January 1, 2015, and December 31, 2017, in Taiwan. Patients meeting the following criteria were excluded from this study: (1) having missing information (n = 9) and (2) having a history of PCI (n = 404). Ultimately, 4,835 patients were included for analysis. The initial hospitalization date for PCI was designated as the index date, and the final follow-up period was extended until the date of death or the end of the study period (December 31, 2020), with death records sourced from Taiwan’s National Death Registry.

Fig. 1 Four-stage conceptual framework of patients who underwent new-onset PCI

Six ML algorithms and five cross-validation techniques were used to identify key features. These ML algorithms were linear regression (LR), random forest (RF), support vector regression (SVR), generalized linear model boost (GLMBoost), the Bayesian generalized linear model (BayesGLM), and extreme gradient boosting (eXGB). Because each of these algorithms has its own computational rules, in the second stage, we compared the key variables identified by each algorithm and the averaged variables to establish an evaluation model. In the third stage, we examined which variable selection algorithm had the highest predictive performance when the nth variable was selected. In the final stage, surgical expenses were evaluated using the following six indicators: mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE), mean absolute scaled error (MASE), mean absolute percentage error (MAPE) [21, 22], and symmetric mean absolute percentage error (SMAPE) [23].

Risk factors

A total of 64 variables were selected for this study, including demographic, previous-year, relevant disease, and medication variables. Variables from X1 to X3 included data related to sex, age, and Charlson Comorbidity Index (CCI) scores. Variables from X4 to X12 included preoperative data for the preceding year, such as the frequency of hemodialysis and peritoneal dialysis (P.D.), the average duration of intensive care unit admissions, the average volume of blood transfused (94001C, 94002C, 4013C, 94015C, and 94003C), the average duration of mechanical ventilation (57001B, 57002B, and 57003B), the number of CABG vessels, whether intra-aortic balloon pump (IABP) surgery was performed, outpatient medical expenses, and hospitalization costs. Variables from X13 to X43 included data on whether the patients had baseline conditions related to bleeding or infection. These conditions were calculated using three outpatient records or a single hospitalization record within the preceding year.
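
The baseline-condition rule described above (three outpatient records or a single hospitalization record within the preceding year) can be sketched as follows. This is a minimal illustration, not the study's SAS code: the record layout, the function name `has_baseline_condition`, the 365-day lookback window, and the reading of "three" as "at least three" are all assumptions.

```python
from datetime import date, timedelta


def has_baseline_condition(records, index_date, lookback_days=365):
    """Flag a condition from claims for one diagnosis.

    records: list of (visit_date, setting) tuples, where setting is
             "outpatient" or "inpatient" (hypothetical layout).
    index_date: date of the first PCI hospitalization.
    """
    window_start = index_date - timedelta(days=lookback_days)
    outpatient_count = 0
    for visit_date, setting in records:
        # Ignore claims outside the one-year window before the index date.
        if not (window_start <= visit_date < index_date):
            continue
        if setting == "inpatient":
            return True  # a single hospitalization record is enough
        if setting == "outpatient":
            outpatient_count += 1
    # Assumption: "three outpatient records" means at least three.
    return outpatient_count >= 3
```

For example, a patient with two outpatient claims and no admission in the window would not be flagged, whereas one admission alone would be.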

Singh et al. stated that patients diagnosed with chronic stable angina should each receive optimal medical treatment, including medications such as ACE inhibitors, beta-blockers, statins, and nitrates, to manage the condition effectively [10]. Furthermore, post-PCI patients require long-term medications, such as antiplatelets and lipid-lowering agents, which contribute to ongoing medical costs [15]. Variables X44 to X64 included data on each patient’s preoperative medication history. Supplementary 1 lists all the ICD-9 and ICD-10 codes, and Supplementary 2 lists the procedure codes and drug codes.

Outcome definition

According to the literature, previous-year medical records help evaluate medical expenses [24]. This study used previous-year medical records of outpatient and hospitalization costs, underlying diseases, and medications as predictive variables. All variables were derived from the records of both inpatients and outpatients.

The primary outcome of this study was surgical expenditures, which served as the dependent variable (Y) (Fig. 2). All expenses are reported in new Taiwan dollars (NT$).

Fig. 2 Definition of surgical expenses

Model construction

Medical information is inherently complex. Before making a clinical decision, physicians must obtain their patients’ medical records, which is a time-consuming process. Given the negative repercussions of risk factors, variable selection is a key preprocessing step. This step excludes irrelevant risk factors and preserves strongly correlated features without significant information loss [24,25,26]. With the advancement of machine learning algorithms, an increasing number of researchers are employing diverse machine learning techniques to develop an evaluation model. These models aim to identify relevant risk factors, facilitating disease prevention and management [24, 27,28,29,30].

These algorithms offer diverse capabilities for handling different aspects of the dataset, including linear relationships (LR), nonlinear interactions (RF, eXGB), support vector mechanisms (SVR), gradient boosting approaches (GLMBoost), and Bayesian inference methods (BayesGLM). Each method can provide insights into feature importance and strikes its own balance between interpretability and performance. Moreover, studies by Liu in 2021 [31] and Zack in 2019 [32] developed and validated ML-based models (RF) to predict clinical outcomes and forecast relevant data for PCI procedures. EO Cruz in 2024 demonstrated that eXGB provided the most accurate predictions for CABG hospitalization costs [33].

Consequently, in the second stage, we utilized a dataset containing 64 input features and employed different algorithms to determine the best predictive models and identify the optimal features for evaluating surgical expenses. Due to variations in their calculation methods, we used two different feature selection methods. The first method averages and ranks the features calculated by each algorithm, while the second method allows each algorithm to determine the important variables independently.
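
The first feature selection method (averaging and ranking features across algorithms) might look like the sketch below. It assumes each algorithm returns a complete ordering of the same features; the function name `average_rank` and the toy feature lists are hypothetical and are not taken from the study.

```python
def average_rank(rankings):
    """Combine per-algorithm feature rankings into one consensus order.

    rankings: dict mapping an algorithm name to a list of features
              ordered from most to least important.
    Returns the features sorted by mean rank (ties broken by name).
    """
    positions = {}
    for ordered_features in rankings.values():
        for position, feature in enumerate(ordered_features, start=1):
            positions.setdefault(feature, []).append(position)
    mean_rank = {f: sum(p) / len(p) for f, p in positions.items()}
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))


# Toy rankings from three hypothetical algorithms.
toy_rankings = {
    "rf": ["X10", "X3", "X2"],
    "xgb": ["X10", "X2", "X3"],
    "svr": ["X3", "X10", "X2"],
}
consensus = average_rank(toy_rankings)
```

In the second method, by contrast, each algorithm's own ordering would be used directly without this averaging step.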

We do this because each machine learning algorithm offers unique benefits. RF uses the mean decrease in accuracy (MDA) approach for variable selection. MDA evaluates feature importance by measuring how much the model’s accuracy drops when a particular feature’s values are randomly shuffled. This feature selection method has been widely adopted in many published studies [34,35,36,37,38].
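
The shuffling idea behind MDA can be illustrated with a small permutation-importance sketch. For a regression target, the drop in accuracy is measured here as the increase in mean squared error. This is a simplification under stated assumptions: it works on any fitted `predict` callable and a plain dataset, whereas an RF implementation would typically use out-of-bag samples per tree. All names are illustrative.

```python
import random


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, targets, column, repeats=20, seed=0):
    """Average increase in MSE after shuffling one feature column.

    predict: callable mapping one row (list of numbers) to a prediction.
    rows: list of rows; targets: list of observed values.
    """
    rng = random.Random(seed)
    baseline = mse(targets, [predict(row) for row in rows])
    increases = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        permuted = [row[:column] + [value] + row[column + 1:]
                    for row, value in zip(rows, shuffled)]
        permuted_error = mse(targets, [predict(row) for row in permuted])
        increases.append(permuted_error - baseline)
    return sum(increases) / repeats
```

A feature the model ignores yields an importance of zero, because shuffling it leaves every prediction unchanged.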

Both GLMBoost and SVR use recursive feature elimination (RFE). This process involves training the model and progressively removing the least important features to select the most predictive ones [39, 40]. Additionally, BayesGLM determines feature importance by calculating the posterior probability of each model. The Bayes factor is used to compare the quality of different models, including those with various features. By comparing Bayes factors, one can choose the model that includes the most valuable features [41]. In eXGB, feature importance is assessed using gain, which evaluates how much a feature enhances the model’s predictive performance through its role in making splits [42].
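
The RFE loop can be sketched independently of the underlying model. In the sketch below, `fit_importance` stands in for "train the model on these features and return an importance score per feature"; it is a hypothetical callback, and the fixed toy scores are invented for illustration only.

```python
def recursive_feature_elimination(features, fit_importance, keep):
    """Drop the weakest feature one at a time until `keep` remain.

    fit_importance: callable that takes the current feature list, refits
                    the model, and returns {feature: importance}.
    Returns (remaining_features, eliminated_in_order).
    """
    remaining = list(features)
    eliminated = []
    while len(remaining) > keep:
        importance = fit_importance(remaining)  # refit on the current subset
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Toy importance scores (not study results); a real callback would refit a model.
toy_scores = {"X10": 0.9, "X8": 0.5, "X3": 0.3, "X1": 0.05}
selected, dropped = recursive_feature_elimination(
    list(toy_scores),
    lambda feats: {f: toy_scores[f] for f in feats},
    keep=2,
)
```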

Finally, the data were randomly split into training (80%) and testing (20%) datasets. We then employed machine learning techniques along with K-fold cross-validation to assess the performance of the final model. This cross-validation method reduces the risk of overfitting by assessing model generalization through multiple training cycles. It provides more stable and reliable performance metrics by averaging results across different subsets [37, 43]. In k-fold cross-validation, the data are divided into k equal subsets. We then combine k − 1 subsets for model training and use the remaining subset for testing. This process is repeated k times, and the results are averaged at the end [37]. In this study, we set k to 5.
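
A minimal sketch of the 80/20 split followed by 5-fold cross-validation is given below. It assumes that the folds are drawn from the training portion only, which the text does not state explicitly; the seed and function names are likewise illustrative.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Randomly split row indices 0..n-1 into training and testing sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (training, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]  # near-equal sized subsets
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# With the 4,835 patients in the cohort, this gives 3,868 training rows
# and 967 testing rows.
train_idx, test_idx = train_test_split_indices(4835)
fold_sizes = [len(validation) for _, validation in k_fold_indices(train_idx)]
```

Each model would be fitted on the training part of a fold and scored on its validation part, with the five scores averaged afterwards.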

This study applied the feature selection method to streamline data reduction and significantly enhance the accuracy of ML prediction models [37, 38, 44]. This approach offers a distinct advantage in modeling the complex patterns and interactions among patient characteristics, pre-treatment illness burdens, and medication usage. It provides a more precise and comprehensive analysis of the factors influencing PCI surgery costs, highlighting the importance of features across the various models utilized in our research.

LR

LR is widely used for predictive modeling and trend analysis due to its simplicity and interpretability [45, 46]. The primary objective is to find the best-fitting line that minimizes the sum of the squared differences between observed and predicted values using the least squares method [46]. The data should be normally distributed before a prediction model is established.
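
For a single predictor, the least squares fit has a closed form, sketched below. The study's models use many predictors and were fitted in R; this one-variable version and its names are only an illustration of the principle.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Points lying exactly on y = 1 + 2x are recovered (up to rounding).
intercept, slope = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
```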

RF

At each intermediate node, RF selects the optimal split from a randomly chosen subset of features. Splits are chosen according to the decrease in node impurity, such as the Gini impurity [47, 48]. The greatest impurity reduction typically occurs at the tree’s root, whereas the smallest reduction occurs at the terminal nodes. RF can also estimate the test error of a model through the out-of-bag error rate, which requires neither additional training data nor a separate test set. Although the test error varies slightly between individual trees, the correlation between trees in an RF is weaker than in the bagging method. Growing additional trees with a weaker correlation increases prediction accuracy [47, 49, 50].

SVR

While support vector machine (SVM) classification generates binary outputs (e.g., class labels), SVR is a supervised ML technique that addresses regression problems by estimating real-valued functions. The main objective of SVR is to find a function that approximates the relationship between input features and a continuous target variable while ensuring that the majority of data points fall within a specified ε-insensitive zone [51]. It employs the fundamental principle of SVM classification by utilizing a sparse kernel machine that defines a hyperplane based on a limited number of support vectors to perform regression [52, 53]. SVR can use kernel functions to map the input features into higher-dimensional spaces, making it capable of handling nonlinear relationships effectively.

Additionally, SVR formulates an optimization problem to derive a regression function that maps input predictor variables to output response values. This technique is instrumental because it balances model complexity with prediction accuracy, making it highly effective for handling high-dimensional data [53].
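
The ε-insensitive zone mentioned above corresponds to a loss that ignores errors smaller than ε. A minimal sketch follows; the default ε of 0.1 is an assumption chosen for illustration, not a value reported in the study.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon=0.1):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


inside_tube = epsilon_insensitive_loss(10.0, 10.05)   # error 0.05 is ignored
outside_tube = epsilon_insensitive_loss(10.0, 10.5)   # only the excess beyond epsilon counts
```

Only observations with a nonzero loss (or lying on the tube boundary) become support vectors, which is why the fitted function depends on a limited subset of the data.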

GLMBoost

GLMBoost is a regression and classification algorithm employing gradient boosting techniques, with GLM as the base model. This algorithm fits linear models through component-wise boosting, in which each design matrix column is individually analyzed and selected through a simple linear approach [54]. GLMBoost is typically used to fit LR models by using component-wise linear least squares with L2 boosting, and it is highly effective in reducing model complexity while enhancing prediction accuracy [55]. GLMBoost is used not only to manage diverse types of data, including categorical and continuous variables, but also to handle nonlinear data [54].
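
Component-wise L2 boosting can be sketched in a few lines: at every iteration, each column is fitted to the current residuals by simple least squares, and only the best column has its coefficient updated by a fraction `nu` of that fit. The sketch below centers the columns and uses 100 iterations with `nu = 0.1` to mirror the settings reported later in the text; it is an illustrative reimplementation, not the package code used in the study.

```python
def componentwise_l2_boost(columns, y, iterations=100, nu=0.1):
    """Fit a linear model by component-wise L2 boosting.

    columns: list of feature columns (each a list of numbers).
    Returns (intercept, column_means, coefficients); predictions are
    intercept + sum(coef_j * (x_j - mean_j)).
    """
    n = len(y)
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    intercept = sum(y) / n
    coefs = [0.0] * len(columns)
    residuals = [yi - intercept for yi in y]
    for _ in range(iterations):
        best = None  # (sum of squared errors, column index, least squares slope)
        for j, col in enumerate(centered):
            sxx = sum(v * v for v in col)
            if sxx == 0:
                continue  # constant column carries no information
            beta = sum(v * r for v, r in zip(col, residuals)) / sxx
            sse = sum((r - beta * v) ** 2 for v, r in zip(col, residuals))
            if best is None or sse < best[0]:
                best = (sse, j, beta)
        if best is None:
            break
        _, j, beta = best
        coefs[j] += nu * beta  # shrunken update of the selected component only
        residuals = [r - nu * beta * v for v, r in zip(centered[j], residuals)]
    return intercept, means, coefs


# Toy data: y depends on the first column only (y = 3 * x1 + 1).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 2.0, 1.0, 2.0]
y = [3.0 * v + 1.0 for v in x1]
intercept, means, coefs = componentwise_l2_boost([x1, x2], y)
```

Because unselected columns keep a coefficient of exactly zero, the procedure performs variable selection as a side effect of fitting.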

BayesGLM

BayesGLM is a robust alternative to traditional GLMs. When GLMs are used in modeling, responses can be defined with great flexibility, leading to an adaptable structure [56]; however, this approach often requires complex parametrizations that are difficult to interpret and may yield solutions that are hard to understand. In BayesGLM, selecting a normal prior for the parameters of the linear predictor is common. This prior is conjugate for the normal linear model, that is, a GLM that employs a Gaussian observation model with an identity link function. BayesGLM can handle computational difficulties when specific data centers report only zero counts. It can also effectively manage overdispersion in count data to ensure accurate and reliable results [57].

eXGB

In 2016, Chen et al. [42] proposed eXGB, which can be used to rapidly and accurately solve many data science problems. eXGB outperforms gradient-boosting machines in terms of computational efficiency and speed. In addition, it substantially reduces model complexity while improving prediction accuracy. It also has configurable parameters, making it highly effective in handling large data sets. eXGB does not require feature normalization and exhibits high performance with nonlinear data. Finally, it employs a multithreaded strategy that maximizes the utilization of CPU cores, resulting in enhanced speed and performance [58].

Validation indicators

This study used six ML algorithms to evaluate surgical medical expenses. The validation index of the proposed model was used as a reference to evaluate its quality and accuracy based on its attributes.

To evaluate the performance of the proposed model, six indicators were used to examine the prediction outcomes to ensure the model’s wide applicability. These metrics fell into three categories: absolute errors, scaled errors, and percentage errors. The absolute errors comprised the MAE, RMSE, and MSE; the scaled errors comprised the MASE; and the percentage errors comprised the MAPE [21, 22] and SMAPE [23]. Table 2 presents the mathematical equations used to calculate these statistical validation metrics.

Table 2 Mathematical equations used to calculate statistical validation metrics

Validation indices are essential metrics for evaluating the performance of regression models. They play a key role in determining the accuracy of model predictions in terms of their alignment with the actual data involved. Indicators are commonly used as performance benchmarks in various prediction models [59]. In an evaluation model, lower deviation indicates higher accuracy.

The MAPE is a commonly used forecast accuracy measure characterized by scale independence and interpretability [60]. MAPE values lower than 0.1 indicate excellent model discrimination, MAPE values of 0.11–0.2 indicate high model discrimination, MAPE values of 0.21–0.50 indicate acceptable model discrimination, and MAPE values higher than 0.51 indicate no model discrimination [59, 61, 62].
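
The MAPE bands above can be encoded as a small lookup. The quoted ranges leave narrow gaps (for example, between 0.1 and 0.11), so this sketch assumes contiguous bands with the upper bounds 0.1, 0.2, and 0.5; the function name and return labels are illustrative.

```python
def interpret_mape(mape):
    """Map a MAPE value (as a fraction, not a percentage) to a quality band."""
    if mape < 0.1:
        return "excellent"
    if mape <= 0.2:
        return "high"
    if mape <= 0.5:
        return "acceptable"
    return "none"


# The testing MAPE reported for eXGB in the abstract falls in the top band.
band = interpret_mape(0.00755)
```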

In the present study, we used the above indicators to determine each model’s prediction error. Here, x represents the total number of patients, \(b = [b^{1}, b^{2}, b^{3}, \dots, b^{x}]\) represents the actual medical expenses, and \(a = [a^{1}, a^{2}, a^{3}, \dots, a^{x}]\) represents the predicted medical expenses.
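
Since the equations of Table 2 are not reproduced here, the sketch below uses the standard textbook definitions of five of the six indicators, with `actual` playing the role of b and `predicted` the role of a. MASE is omitted because its scaling baseline is not specified in the text. The SMAPE variant shown (with a factor of 2 in the numerator) is one common convention and is an assumption.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, MAPE, and SMAPE for paired observations.

    actual values must be nonzero for MAPE to be defined.
    """
    n = len(actual)
    errors = [b - a for b, a in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs(e) / abs(b) for e, b in zip(errors, actual)) / n
    smape = sum(2 * abs(e) / (abs(b) + abs(a))
                for e, b, a in zip(errors, actual, predicted)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "SMAPE": smape}


# Toy values, not study data.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Note that RMSE is by definition the square root of MSE, which gives a quick consistency check on any pair of reported values.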

Statistical analysis

This study selected patients who underwent new-onset triple-vessel PCI and characterized them by their demographic characteristics, medical history, prior medication use, and previous-year medical expenses. All results are presented as N (%) for categorical variables and mean ± standard deviation (SD) for continuous variables. The analyses in stage 1 were conducted using SAS version 9.4 (SAS Institute, Cary, NC, USA).

ML algorithms were implemented in R software version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). The "caret" package was used to construct LR, GLMBoost, and BayesGLM models; the "e1071" package was used to construct an SVR model; and the "randomForest" package was used to construct an RF model.

ML parameters

Setting model parameters effectively enhances machine learning algorithms’ performance, stability, and interpretability. Proper configurations can significantly improve predictive accuracy and prevent overfitting, ensuring that models generalize well to new data. Overall, the parameter adjustments are essential for building accurate, resilient, and adaptable models tailored to specific datasets [63, 64]. Supplementary 3 lists the settings for the ML parameters. Configuring the RF model with the specified parameters provides several advantages. Setting the number of trees (ntree) to 800 enhances predictive accuracy and stability, as more trees generally improve the ensemble’s performance and robustness against overfitting; this increase from the default of 500 allows for better generalization on unseen data. The nodesize is set to 5, the default for regression, ensuring that terminal nodes contain a sufficient number of observations, which enhances prediction reliability and reduces sensitivity to noise. Meanwhile, leaving maxnodes set to null places no fixed limit on the number of terminal nodes, so the trees grow as far as nodesize allows, offering flexibility in model complexity based on the data. Overall, these parameter adjustments lead to a more accurate and resilient RF model, improving its performance in various regression tasks.

Setting the parameter prior.scale = 2.5 in Bayesian modeling influences the degree of shrinkage applied to the parameter estimates. A larger scale value allows for more variability in the estimates, which can be advantageous in scenarios where the true effects are expected to vary significantly among predictors. This helps avoid over-regularization and ensures that important signals in the data are not unduly suppressed. The parameters of GLMBoost are set as follows: the family is assigned to Gaussian(), which is the default for linear regression. This ensures that the model is tailored for linear regression, effectively modeling continuous response variables. Limiting the number of boosting iterations to 100 helps control overfitting, while a learning rate (nu) of 0.1 promotes gradual updates, enhancing stability and robustness. Additionally, selecting "bols" as the base learner allows for efficient linear basis functions, facilitating interpretability and computational efficiency. Overall, these parameter settings create a balanced and effective GLMBoost model that reliably captures linear relationships in the data.

Setting the learning rate (eta) in gradient boosting algorithms like eXGB is crucial for effective model training. The value of eta typically ranges from 0.01 to 0.3; the default is 0.3, and we set it to 0.1. A smaller value slows down the learning process, allowing the model to gradually update the weights with each iteration. This can lead to better model performance as it promotes careful learning and helps prevent overfitting to the training data. Moreover, setting max_depth = 6 helps control overfitting, reduces model complexity, improves training efficiency, enhances interpretability, and often balances performance based on empirical evidence from various datasets. When colsample_bytree is set to 1, the algorithm uses all features when building each tree, ensuring that important information is not lost due to random sampling of features during tree construction.
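
For reference, the settings described in this section can be collected in one place. The dictionary below is only a summary sketch: the values are those stated in the text, whereas the grouping, the key `iterations`, and the helper `describe` are illustrative names rather than arguments of any particular package.

```python
# Hyperparameters as reported in the text; key names are illustrative.
MODEL_SETTINGS = {
    "random_forest": {"ntree": 800, "nodesize": 5, "maxnodes": None},
    "bayesglm": {"prior.scale": 2.5},
    "glmboost": {
        "family": "Gaussian",
        "iterations": 100,   # number of boosting iterations
        "nu": 0.1,           # learning rate
        "baselearner": "bols",
    },
    "xgboost": {"eta": 0.1, "max_depth": 6, "colsample_bytree": 1},
}


def describe(settings):
    """Render one 'model: key=value, ...' line per model."""
    lines = []
    for model, params in settings.items():
        rendered = ", ".join(f"{key}={value}" for key, value in params.items())
        lines.append(f"{model}: {rendered}")
    return lines
```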

Results

Demographic characteristics of the study population

A total of 4,835 patients who underwent triple-vessel new-onset PCI between January 1, 2015, and December 31, 2017, were included in this study. Table 3 presents the demographic characteristics and underlying diseases of these patients. We analyzed 64 variables presumably influencing the target variable (Y), namely surgical expenditures (NT$253,139). Regarding baseline factors, the percentage of women was higher than that of men (78.08% vs. 21.92%). The average age (X2) was 65.46 ± 11.51 years, with an average CCI score (X3) of 4.47 ± 3.00. Variables from X4 to X12 comprised data from the previous year, including hemodialysis sessions (166.10 ± 150.18), PD sessions (23.50 ± 24.14), emergency department (ED) visits (X6, 17.10 ± 38.40), blood transfusion procedures (X7, 6.48 ± 8.16 bags), mechanical ventilation use patterns (X8, 9.78 ± 28.04 days), outpatient expenses (X9, NT$107,343 ± NT$248,107), hospitalization expenses (X10, NT$307,704 ± NT$267,111), and anastomosis procedures (X11, 2.78 ± 0.62 vessels). Variables from X13 to X43 included data related to the following comorbidities: hypertension (X13, 90.75%), hyperlipidemia (X14, 85.23%), diabetes mellitus (DM; X16, 64.45%), chronic obstructive pulmonary disease (COPD; X18, 41.94%), gastrointestinal bleeding (X36, 41.84%), lower respiratory tract infection (LRTI; X40, 41.84%), soft tissue and bone infection (STBI; X41, 63.33%), and gastrointestinal infection (GTI; X42, 44.55%). Variables from X44 to X64, totaling 21 variables, included data related to the patients’ histories of medication use. The following drugs had high use rates: nonsteroidal anti-inflammatory drugs (NSAIDs; X49, 87.78%), calcium channel blockers (CCBs; X50, 69.89%), lipid-lowering drugs (X52, 73.32%), benzodiazepines (BZDs; X54, 58.59%), β-blockers (X56, 69.29%), angiotensin-converting enzyme (ACE) inhibitors and angiotensin II receptor blockers (ARBs; X58, 74.02%), and anticoagulants (X59, 87.92%).

Table 3 Baseline characteristics of patients who underwent triple-vessel new-onset PCI

This study utilized various prediction models, including linear regression (LR), random forest (RF), support vector regression (SVR), GLMBoost, BayesGLM, and extreme gradient boosting (eXGB). To determine the optimal value for each indicator, we evaluated several metrics: SMAPE, MAPE, MAE, MASE, RMSE, relative squared error (RSE), and MSE. Each algorithm employs its own computational methods and variable selection processes.

Table 4 presents the performance results of the machine learning models on the testing dataset. The training dataset results are shown in Supplementary 4. During the first stage of evaluation, eXGB achieved the lowest values for both the training and testing datasets when 28 variables were selected by its algorithm: MSE (training: 0.02875, testing: 0.02501), RMSE (training: 0.12348, testing: 0.15813), and MAPE (training: 0.00885, testing: 0.00816). These results underscore the importance of minimizing error. Additionally, GLMBoost showed competitive results after selecting 30 variables, but overall, eXGB remained the most promising model for prediction.

Table 4 Performance results of ML models on testing dataset

In the second stage, when variables were averaged across the six algorithms, eXGB again showed superior performance with the 25th variable. eXGB exhibited the lowest MAPE values for both the training and testing datasets (0.00841 and 0.00755, respectively). Additionally, the RMSE was 0.15552 for the training and 0.12135 for the testing dataset. These findings emphasize the impact of variable selection on model performance, offering insights for optimizing machine learning applications.

Figure 3 depicts the optimal prediction result of the MAPE for surgical expenses.

Fig. 3 MAPE results with a cumulative number of variables

We evaluated the importance of each variable across multiple ML models to construct an effective model. Because each algorithm calculates variable importance in its own way, the variables selected by each algorithm differed to some extent. According to the literature, when evaluating the risk factors associated with 1-year medical expenses after discharge, various feature selection methods can be used to identify key variables that offer valuable insights [24]. In the present study, 64 variables were selected based on the physicians’ clinical experience and a comprehensive literature review [12, 16]. After these variables were filtered, six ML algorithms were used to make predictions. The variables were ranked according to their importance scores, with the highest-scoring variables deemed the most crucial and placed at the top of the ranking. Table 5 lists the optimal parameters selected through ML predictions. After applying the six ML algorithms, the top five key variables, ordered by their average ranking, were as follows: previous-year hospitalization cost, the average duration of mechanical ventilation, CCI score, outpatient cost, and the average number of blood bags transfused.
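The rank-averaging step described above can be illustrated with a short Python sketch. It assumes that every algorithm ranks every variable and that ties in mean rank are broken alphabetically; both are assumptions made for this example, and the function name `average_rank` is hypothetical rather than taken from the study.

```python
def average_rank(rankings):
    """Average each variable's rank position across several models.

    rankings: dict mapping a model name to a list of variable names,
              ordered from most to least important (rank 1 first).
    Returns a list of (variable, mean_rank) sorted by mean rank.
    """
    positions = {}
    for order in rankings.values():
        for rank, variable in enumerate(order, start=1):
            positions.setdefault(variable, []).append(rank)

    n_models = len(rankings)
    for variable, ranks in positions.items():
        # Assumption: each model ranks every variable exactly once.
        if len(ranks) != n_models:
            raise ValueError(f"{variable} is not ranked by every model")

    mean_ranks = {v: sum(r) / len(r) for v, r in positions.items()}
    # Sort by mean rank, then by name so that ties are deterministic.
    return sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))
```

For example, passing a dictionary with one ranked list per algorithm (LR, RF, SVR, GLMBoost, BayesGLM, eXGB) returns the variables ordered by mean rank, from which the top-ranked entries can be read off.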

Table 5 Ranking of key features with optimal accuracy for surgical expenses

We selected 25 variables according to their average ranking across the six prediction models; these included age, CCI score, average blood transfusion volume, hyperlipidemia, chronic kidney disease (CKD), diabetes, acute coronary syndrome (ACS), cardiogenic shock, malignant dysrhythmia, average ventilator use duration, gastrointestinal bleeding, acute kidney failure (AKF), and LRTI. As shown in Fig. 3, these 25 variables were used to establish an ensemble learning prediction model.

Discussion

PCI is a minimally invasive procedure that does not require an open incision and is associated with a short recovery period [65]. Compared with PCI, CABG is associated with longer surgical durations and postoperative hospital stays [66]. For these reasons, patients with coronary artery disease (CAD) typically prefer PCI as the treatment approach for their heart disease. Over the preceding decade, SYNTAX scores have been used effectively to customize revascularization treatments for patients with multivessel CAD [67]. Before deciding to proceed with surgery, a physician must make a preliminary judgment, based on clinical experience and the patient’s SYNTAX score, about the severity of blood vessel blockage. Based on their SYNTAX scores, patients are divided into three groups: low (score ≤ 22), intermediate (23 ≤ score ≤ 32), and high (score ≥ 33) [68, 69]. PCI is regarded as an effective revascularization strategy for patients with low SYNTAX scores, although it is also associated with a significantly high rate of repeat revascularization [69]. When patients with triple-vessel CAD undergo treatment, the likelihood of complications or even death during or after the intervention is higher than in those with one- or two-vessel CAD, and their mortality rate and associated treatment costs also increase [70]. In Europe, CAD is estimated to be responsible for approximately GBP60 billion in annual economic losses. In the United States, the costs associated with CAD were estimated to be $204.4 billion in 2010. By 2030, these medical expenses are projected to increase by approximately 100% [71, 72]. Although the initial cost of PCI is typically lower than that of CABG, its overall cost may increase because of the need for additional procedures, with each additional complication potentially resulting in an exponential increase in the overall cost [73,74,75].
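The three SYNTAX score groups quoted above can be expressed as a small lookup function. This is an illustrative sketch only: the function name `syntax_risk_group` is hypothetical, and the handling of fractional scores that fall between the stated integer cut-offs (for example, 22.5) is an assumption of this example, not something specified in the text.

```python
def syntax_risk_group(score):
    """Map a SYNTAX score to the low / intermediate / high group.

    Thresholds follow the text: low (<= 22), intermediate (23 to 32),
    high (>= 33). Assumption: a fractional score above 22 is treated as
    intermediate, and a fractional score above 32 is treated as high.
    """
    if score < 0:
        raise ValueError("SYNTAX score cannot be negative")
    if score <= 22:
        return "low"
    if score <= 32:
        return "intermediate"
    return "high"
```

For instance, `syntax_risk_group(18)` falls in the low group, for which the text describes PCI as an effective revascularization strategy.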

In Hong Kong, the cost of PCI procedures has risen significantly. The procedure costs approximately HK$27,550, but additional self-financed expenses for consumables, such as balloons and stents, range between HK$12,000 and HK$48,000, depending on the case’s complexity. Furthermore, hospitalization costs average HK$32,945 (US$4,224) [15]. According to a study conducted in Korea from 2011 to 2015, the average in-hospital cost for PCI was approximately 8,628,768 KRW, with the total medical cost within one year being 13,128,158 KRW [76]. According to a study conducted in Malaysia, the total hospitalization cost for PCI ranged between RM 11,519 and RM 14,356 [77]. The median cumulative cost of PCI was approximately USD $19,967, significantly higher than USD $9,071 for medical therapy [78]. The average cost of PCI across diverse studies stood at $13,501, with the lowest at $520 and, in a U.S. managed care analysis, the highest at $25,641. Prices in Europe ($12,208) and Asia ($11,717) were similar [79].

In the present study, multiple ML algorithms and feature screening combinations were used to identify the key factors affecting the costs of PCI. After identifying these key features, we used six ML algorithms to construct a predictive model and evaluated this model’s performance to ensure its accuracy and reliability. Baciewicz [80] reported a correlation between high treatment costs and increased blood transfusion requirements in critically ill patients. DM, hypertension, and major adverse cardiovascular events are typically associated with high medical expenditure [81]. CAD is often treated using antiplatelet agents, statins, β-blockers, ACE inhibitors, and nitrates. After new-onset PCI, medication adherence is negatively correlated with rehospitalization costs and total medical expenditures [81].

The inpatient incidence of, and mortality from, PCI-related gastrointestinal bleeding have been increasing, with a particularly large increase in incidence among older patients [82]. After coronary angioplasty, patients often need to take two antiplatelet drugs, which increases the risk of bleeding. Once bleeding occurs, it requires time, medication, and further examinations such as gastroscopy. Acute kidney injury (AKI) after PCI is often overlooked because it usually occurs alongside other complications, such as cardiac dysfunction, which can independently worsen the patient’s condition [83]. Once AKI occurs, dialysis may be required, leading to an extended hospital stay. AKI, and specifically contrast-induced AKI (CI-AKI), during PCI is associated with a significantly longer length of stay (LOS) for the PCI admission, higher PCI admission costs, and higher long-term costs [17, 84]. Kristin et al. [85] reported that patients with ACS who consistently used lipid-lowering agents incurred significantly lower direct medical costs than those who did not. Ryba et al. [86] discussed how pre-treatment with dopamine agonists can affect surgical complexity and postoperative outcomes, indirectly indicating the severity of the patients’ conditions. Another study found an association between dopamine use and patient outcomes, particularly in the context of severe preoperative conditions such as shock [87].

In this study, we adopted a multistage approach to evaluate the medical expenses of patients undergoing triple-vessel PCI. Initially, we employed feature selection to identify the key variables influencing medical costs. We then developed predictive models with several ML algorithms and assessed their performance using seven indicators. After variable selection and model evaluation, our results indicated that the most effective algorithm was eXGB, which pinpointed the significant factors influencing the cost of triple-vessel PCI surgery from a set of 64 variables. These factors included baseline variables such as CCI score (X3), age (X2), previous-year hospitalization expenses (X10), outpatient expenses (X9), the number of ED visits (X6), the average duration of ventilator use (X8), and the number of blood units transfused (X7). Moreover, among the medications used, anticoagulants (X59), dopamine (X44), and lipid-lowering agents (X52) ranked among the top ten significant factors.

Lee et al. [15] found that the increasing complexity of treatments and extended hospital stays contribute to the overall rise in the total expense of PCI. Shander et al. [88] stated that the annual spending on blood and transfusion-related activities for surgical patients ranged from $1.62 million to $6.03 million per hospital, mainly driven by the rate of transfusions. In 2018, Baciewicz [80] noted that the greater blood transfusion requirements of sicker patients resulted in increased expenses. Therefore, when assessing the factors influencing the cost of PCI surgery, it is clear that baseline variables and specific medications are crucial determinants. Our findings suggest that a history of hospitalizations in the previous year and numerous ED visits indicate a more severe or unstable condition, thereby complicating the PCI procedure and increasing both the necessary level of care and the associated costs. Similarly, significant outpatient medical expenses, along with the requirement for extended ventilator support or multiple blood transfusions, reflect the patient’s health status and the complexity of the procedure.

On the medication front, the use of anticoagulants, dopamine, and lipid-lowering agents reflects the management of clinical risks that affect PCI-related medical expenses. Some studies demonstrate that antiplatelet therapy is a cost-effective element of PCI [89]. Anticoagulants, typically used alongside PCI therapy, also help prevent ischemic stroke and systemic embolism in patients with atrial fibrillation (AF) before undergoing PCI [90, 91]. Additionally, medical costs are influenced by hospitalization expenses, the management of comorbidities and complications (e.g., diabetes, hyperlipidemia, ACS, hypertension, cardiogenic shock, acute renal failure), and long-term medication use (e.g., anticoagulants, antiplatelet agents, antihypertensive drugs, statins). These findings are consistent with our results. These factors are crucial for maintaining patient stability and reducing the risk of cardiovascular events, and they contribute to the overall complexity and cost of treatment [76,77,78].

In our research, we observed that the preoperative use of dopamine may indicate a compromised health status, potentially involving shock. This condition may indirectly influence the costs associated with the current surgical procedure and have implications for the subsequent prognosis. Among all the algorithms employed, eXGB yielded the most effective model for predicting medical expenses. However, any out-of-pocket expenses incurred by patients could not be included, because this study evaluates only healthcare costs covered by insurance. Understanding the factors that drive these costs is essential for enabling more accurate budgeting and resource allocation.

Conclusion

In this study, we developed an effective model for predicting the surgical expenses of patients undergoing new-onset triple-vessel PCI. This paper establishes a multistage framework, identifies the variables influencing medical expenditure, and offers recommendations based on them. The present findings provide useful insights for healthcare professionals aiming to alleviate future health insurance burdens, for which the key is identifying measures that effectively reduce healthcare costs. These findings underscore the importance of medication adherence and intensive follow-up care in reducing overall medical expenses.

Data availability

Due to privacy concerns and various restrictions, the raw data from the Health and Welfare Data Science Center (HWDC) are not accessible to the public.

References

  1. Bui AL, Horwich TB, Fonarow GC. Epidemiology and risk profile of heart failure. Nat Rev Cardiol. 2011;8(1):30–41.

  2. Reynolds K, et al. Relation of acute heart failure hospital length of stay to subsequent readmission and all-cause mortality. Am J Cardiol. 2015;116(3):400–5.

  3. Gasior M, et al. COnteMporary modalities in treatment of heart failure: a report from the COMMIT-HF registry. Kardiol Pol. 2016;74(6):523–8.

  4. Ghahramani S, Kazerooni AR, Hasannia S, Sayari M, Kazerooni AHR, Lankarani KB. Catastrophic Health Expenditure and Out-of-pocket Payments for Percutaneous Coronary Intervention (PCI) and Coronary Artery Bypass Grafting (CABG). Jundishapur J Chronic Dis Care. 2023;12(4). https://brieflands.com/articles/jjcdc-138446.

  5. Roth GA, et al. Global, regional, and national burden of cardiovascular diseases for 10 causes, 1990 to 2015. J Am Coll Cardiol. 2017;70(1):1–25.

  6. Galbreath AD, Krasuski RA, Smith B, Stajduhar KC, Kwan MD, Ellis R, Freeman GL. Long-term healthcare and cost outcomes of disease management in a large, randomized, community-based population with heart failure. Circulation. 2004;110(23):3518-26. https://www.ahajournals.org/doi/full/10.1161/01.CIR.0000148957.62328.89.

  7. Salarvand S, et al. Challenges experienced by nurses in the implementation of a healthcare reform plan in Iran. Electron Physician. 2017;9(4):4131.

  8. Khosravi A, et al. Impact of misclassification on measures of cardiovascular disease mortality in the Islamic Republic of Iran: a cross-sectional study. Bull World Health Organ. 2008;86(9):688–96.

  9. Tyminska A, et al. Heart failure patients with a previous coronary revascularization: results from the ESC-HF Registry. Kardiol Pol. 2018;76(1):144–52.

  10. Singh AK. Percutaneous coronary intervention vs coronary artery bypass grafting in the management of chronic stable angina: a critical appraisal. J Cardiovasc Dis Res. 2010;1(2):54–8.

  11. Bohm M, Werner N. PCI for 3-vessel disease. 2009.

  12. Pohlen M, et al. Risk predictors for adverse outcomes after percutaneous coronary interventions and their related costs. Clin Res Cardiol. 2008;97:441–8.

  13. Hautala AJ, et al. Machine learning models in predicting health care costs in patients with a recent acute coronary syndrome: a prospective pilot study. Cardiovasc Digit Health J. 2023;4(4):137–42.

  14. Moleerergpoom W, et al. Costs of payment in Thai acute coronary syndrome patients. J Med Assoc Thai. 2007;90(Suppl 1):21–31.

  15. Lee VW, et al. Direct medical cost of newly diagnosed stable coronary artery disease in Hong Kong. Heart Asia. 2013;5(1):1–6.

  16. Afana M, et al. Hospitalization costs for acute myocardial infarction patients treated with percutaneous coronary intervention in the United States are substantially higher than Medicare payments. Clin Cardiol. 2015;38(1):13–9.

  17. Amin AP, et al. Incremental cost of acute kidney injury after percutaneous coronary intervention in the United States. Am J Cardiol. 2020;125(1):29–33.

  18. Chen Y-C, et al. Taiwan’s National Health Insurance Research Database: administrative health care database as study object in bibliometrics. Scientometrics. 2010;86(2):365–80.

  19. Hsieh CY, et al. Taiwan’s national health insurance research database: past and future. Clin Epidemiol. 2019;11:349–58.

  20. Lin L, et al. Data resource profile: the national health insurance research database (NHIRD). Epidemiol Health. 2018;40:e2018062.

  21. Hudaverdi T, Akyildiz O. Investigation of the site-specific character of blast vibration prediction. Environ Earth Sci. 2017;76:1–16.

  22. Popoola SI, et al. Optimal model for path loss predictions using feed-forward neural networks. Cogent Eng. 2018;5(1):1444345.

  23. Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci. 2021;7:e623.

  24. Huang YC, et al. The prediction model of medical expenditure appling machine learning algorithm in CABG patients. Healthcare (Basel). 2021;9(6):710.

  25. Ghosh P, et al. Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access. 2021;9:19304–26.

  26. Muthukrishnan R, Rohini R. LASSO: A feature selection technique in predictive modeling for machine learning. In 2016 IEEE international conference on advances in computer applications (ICACA). 2016. p. 18-20. https://ieeexplore.ieee.org/abstract/document/7887916/?casa_token=WHJlnRuyWL4AAAAA:1YPVDUdWO70NwkC_tPiRcC9KaFl57MaAFOUVKVsmRPUGk-jyTSWJKw_mc9dFK6_mgm6zSBAb.

  27. Chang CC, et al. Developing a stacked ensemble-based classification scheme to predict second primary cancers in head and neck cancer survivors. Int J Environ Res Public Health. 2021;18(23):12499.

  28. Huang YC, et al. Machine-learning techniques for feature selection and prediction of mortality in elderly CABG patients. Healthcare (Basel). 2021;9(5):547.

  29. Lin YT, et al. Prediction of recurrence-associated death from localized prostate cancer with a charlson comorbidity index-reinforced machine learning model. Open Med (Wars). 2019;14:593–606.

  30. Raoof SS, Jabbar MA, Fathima SA. Lung Cancer prediction using machine learning: A comprehensive approach. In 2020 2nd International conference on innovative mechanisms for industry applications (ICIMIA). IEEE; 2020. p. 108-115. https://ieeexplore.ieee.org/abstract/document/9074947?casa_token=pjpKAV5RP3wAAAAA:Avk2m0Viqkod4GQ-_GW99gRwPaGbIMVq1diBEd_scsgKyjuiOOwDXh-XVhj_a22wfV3Z53gU.

  31. Liu S, et al. Machine learning-based long-term outcome prediction in patients undergoing percutaneous coronary intervention. Cardiovas Diagn Ther. 2021;11(3):736.

  32. Zack CJ, et al. Leveraging machine learning techniques to forecast patient prognosis after percutaneous coronary intervention. Cardiovasc Interv. 2019;12(14):1304–11.

  33. Cruz EO, Sakowitz S, Mallick S, Le N, Chervu N, Bakhtiyar SS, Benharash P. Machine learning prediction of hospitalization costs for coronary artery bypass grafting operations. Surgery. 2024;176(2):282–8. https://www.sciencedirect.com/science/article/pii/S0039606024002162.

  34. Han H, Guo X, Yu H. Variable selection using mean decrease accuracy and mean decrease gini based on random forest. In 2016 7th ieee international conference on software engineering and service science (icsess). IEEE; 2016. p. 219-224. https://ieeexplore.ieee.org/abstract/document/7883053?casa_token=Vrao1DXSgT4AAAAA:RaLB6gQx6b7VJw2Ff-RaLB6gQx6b7VJw2FfvtSNf19SI_QiJ99o-zRV4RK_qIDQpQYNuyUUrdohG2sn0DXd_0CBXO.

  35. Bénard C, Da Veiga S, Scornet E. Mean decrease accuracy for random forests: inconsistency, and a practical solution via the Sobol-MDA. Biometrika. 2022;109(4):881–900.

  36. Ding Y, et al. Determination of soil source using laser induced breakdown spectroscopy combined with feature selection. J Anal At Spectrom. 2023;38(11):2499–506.

  37. Huang YC, Ho CW, Chou WR, Chen M. A framework to predict second primary lung cancer patients by using ensemble models. Ann Operations Res. 2023:1–25. https://link.springer.com/article/10.1007/s10479-023-05691-x.

  38. Huang Y-C, Li S-J, Chen M, Lee T-S, Chien Y-N. Machine-Learning Techniques for Feature Selection and Prediction of Mortality in Elderly CABG Patients. Healthcare. 2021;9(5):547. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/healthcare9050547.

  39. Ferreira AJ, Figueiredo MA. Boosting algorithms: A review of methods, theory, and applications. Ensemble machine learning: Methods and applications. 2012;35-85. https://link.springer.com/chapter/10.1007/978-1-4419-9326-7_2.

  40. McKearnan SB, et al. Feature selection for support vector regression using a genetic algorithm. Biostatistics. 2023;24(2):295–308.

  41. Casella G, Moreno E. Objective Bayesian variable selection. J Am Stat Assoc. 2006;101(473):157–67.

  42. Chen T, Guestrin C. Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 2016.

  43. Xie Y, et al. Evaluation of machine learning methods for formation lithology identification: a comparison of tuning processes and model performances. J Petrol Sci Eng. 2018;160:182–93.

  44. Borboudakis G, Tsamardinos I. Forward-backward selection with early dropping. J Mach Learn Res. 2019;20(8):1–39.

  45. Su X, Yan X, Tsai CL. Linear regression. Wiley Interdisciplinary Rev Comput Stat. 2012;4(3):275–94.

  46. Snee RD. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. J Qual Tech. 1983;15(3):149–53. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/00224065.1983.11978865.

  47. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  48. Cutler DR, et al. Random forests for classification in ecology. Ecology. 2007;88(11):2783–92.

  49. Han S, Kim H, Lee Y-S. Double random forest. Mach Learn. 2020;109:1569–86.

  50. Banfield RE, et al. A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell. 2006;29(1):173–80.

  51. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14:199–222.

  52. Zhang F, O’Donnell LJ. Support vector regression, Chapter 7. In: Mechelli A, Vieira S, editors. Machine learning. Academic Press; 2020. p. 123–140. https://www.sciencedirect.com/science/article/abs/pii/B9780128157398000079.

  53. Kavitha S, Varuna S, Ramya R. "A comparative analysis on linear regression and support vector regression," 2016 Online International Conference on Green Engineering and Technologies (IC-GET). Coimbatore: 2016. p. 1-5. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/GET.2016.7916627.

  54. Zhang K, et al. Machine learning-based prediction of survival prognosis in esophageal squamous cell carcinoma. Sci Rep. 2023;13(1):13532.

  55. Adamu HA, Muhammad M, Jingi AM. Application of gradient boosting algorithm in statistical modeling. Research & Reviews: Journal of Statistics and Mathematical Sciences. 2019;5:11–18.

  56. Mosavi A, et al. Groundwater salinity susceptibility mapping using classifier ensemble and Bayesian machine learning models. Ieee Access. 2020;8:145564–76.

  57. Fneish F, Ellenberger D, Frahm N, Stahmann A, Schaarschmidt F. Appropriate statistical model for count data in central statistical monitoring and application on the German Multiple Sclerosis Registry. https://www.msregister.de/fileadmin/resources/public/documents/publications/poster/GMDS_CSM_Fneish.pdf.

  58. Ramraj S, et al. Experimenting XGBoost algorithm for prediction and classification of different datasets. Int J Control Theory Appl. 2016;9(40):651–62.

  59. Rodea-Montero ER, et al. Trends, structural changes, and assessment of time series models for forecasting hospital discharge due to death at a Mexican tertiary care hospital. PLoS One. 2021;16(3):e0248277.

  60. Kim S, Kim H. A new metric of absolute percentage error for intermittent demand forecasts. Int J Forecast. 2016;32(3):669–79.

  61. Chen W-J, et al. Hybrid basketball game outcome prediction model by integrating data mining methods for the national basketball association. Entropy. 2021;23(4):477.

  62. Juang W-C, et al. Application of time series analysis in modelling and forecasting emergency department visits in a medical centre in Southern Taiwan. BMJ Open. 2017;7(11):e018628.

  63. Saquib SS, Bouman CA, Sauer K. ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE Trans Image Process. 1998;7(7):1029–44.

  64. Islam ARMT, et al. Estimating ground-level PM2. 5 using subset regression model and machine learning algorithms in Asian megacity, Dhaka, Bangladesh. Air Qual Atmos Health. 2023;16(6):1117–39.

  65. Lee H, et al. Trends in percutaneous coronary intervention and coronary artery bypass surgery in Korea. Korean J Thoracic Cardiovasc Surg. 2016;49(Suppl 1):S60.

  66. Lim GB. Long-term superiority of CABG surgery for three-vessel disease confirmed. Nat Rev Cardiol. 2014;11(7):372–372.

  67. Barac YD, et al. The Clinical SYNTAX score predicts survival better than the SYNTAX score in coronary revascularization. J Thoracic Cardiovasc Surg. 2024;167(1):164–173. e4.

  68. Serruys PW, et al. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. N Engl J Med. 2009;360(10):961–72.

  69. Kashiyama T, et al. A multidirectional approach to risk assessment in patients with three-vessel coronary artery disease undergoing percutaneous intervention. J Cardiol. 2017;69(4):640–7.

  70. Peterson ED, et al. Contemporary mortality risk prediction for percutaneous coronary intervention: results from 588,398 procedures in the National Cardiovascular Data Registry. J Am Coll Cardiol. 2010;55(18):1923–32.

  71. Nichols M, Townsend N, Luengo-Fernandez R, Leal J, Gray A, Scarborough P, Rayner M. European cardiovascular disease statistics 2012. 2012. https://research-information.bris.ac.uk/en/publications/european-cardiovascular-disease-statistics-2012.

  72. Go AS, et al. Heart disease and stroke statistics—2014 update: a report from the American Heart Association. Circulation. 2014;129(3):e28–292.

  73. Gholami SS, et al. Cost-effectiveness of coronary artery bypass graft and percutaneous coronary intervention compared to medical therapy in patients with coronary artery disease: a systematic review. Heart Fail Rev. 2019;24:967–75.

  74. Rezapour A, et al. Effectiveness of revascularization interventions compared with medical therapy in patients with ischemic cardiomyopathy: A systematic review protocol. Medicine. 2018;97(10):e9958.

  75. Mehaffey JH, et al. Cost of individual complications following coronary artery bypass grafting. J Thoracic Cardiovasc Surg. 2018;155(3):875–882. e1.

  76. Han S, et al. Trends, characteristics, and clinical outcomes of patients undergoing percutaneous coronary intervention in Korea between 2011 and 2015. Korean Circ J. 2018;48(4):310–21.

  77. Yun LK. Cost Analysis of Elective Percutaneous Coronary Intervention and Its Predictors in Malaysia (Doctoral dissertation, University of Malaya (Malaysia)). 2017. https://www.proquest.com/openview/9088a28d274a97369ef228d75f8af6cf/1?pq-origsite=gscholar&cbl=2026366&diss=y.

  78. Vieira RDO, et al. Cost-effectiveness analysis for surgical, angioplasty, or medical therapeutics for coronary artery disease. Circulation. 2012;126(11_suppl_1):S145–50.

  79. Nicholson G, et al. Patient-level costs of major cardiovascular conditions: a review of the international literature. Clinicoecon Outcomes Res. 2016;8:495–506.

  80. Baciewicz FA. Show me the money (cost). J Thorac Cardiovasc Surg. 2018;155(3):883–4.

  81. Jia J-J, et al. Impact of physician-coordinated intensive follow-up on long-term medical costs in patients with unstable angina undergoing percutaneous coronary intervention. Acta Cardiol Sin. 2017;33(2):173.

  82. Cholankeril G, et al. Inpatient outcomes for gastrointestinal bleeding associated with percutaneous coronary intervention. J Clin Gastroenterol. 2019;53(2):120–6.

  83. Zhang Y-F, et al. Acute kidney injury in patients with acute coronary syndrome after percutaneous coronary intervention: pathophysiologies, risk factors, and preventive measures. Cardiology. 2021;146(6):678–89.

  84. Griffiths RI, et al. Cost to Medicare of acute kidney injury in percutaneous coronary intervention. Am Heart J. 2023;262:20–8.

  85. Kristin E, Krisdinarti L, Yasmina A, Pratiwi WR, Febrinasari RP, Mahati E, Indra Jaya S. Medication Persistence to Lipid-lowering Agents As A Cost-saving Opportunities on Patients with Acute Coronary Syndrome after Percutaneous Coronary Intervention in Indonesia. Indones J Pharm. 2022;33(2):291–8. https://journal.ugm.ac.id/v3/IJP/article/view/2588.

  86. Ryba A, et al. Preoperative treatment with dopamine agonist therapy influences surgical outcome in prolactinoma: a retrospective single-center on 159 patients. Acta Neurochir. 2024;166(1):1–10.

  87. Suzuki R, et al. Dopamine use and its consequences in the intensive care unit: a cohort study utilizing the Japanese Intensive care PAtient Database. Crit Care. 2022;26(1):90.

  88. Shander A, et al. Activity-based costs of blood transfusions in surgical patients at four hospitals. Transfusion. 2010;50(4):753–65.

  89. Cortese B, Sebik R, Valgimigli M. The conundrum of antithrombotic drugs before, during and after primary PCI. EuroIntervention. 2014;10(Suppl T):T64–73.

  90. Koziński M, et al. Updated overview of evidence on optimal antithrombotic therapy in patients with atrial fibrillation undergoing percutanous coronary intervention. Adv Interv Cardiol. 2020;16(2):127–37.

  91. Weintraub WS, Mandel L, Weiss SA. Antiplatelet therapy in patients undergoing percutaneous coronary intervention: economic considerations. Pharmacoeconomics. 2013;31:959–70.

Acknowledgements

Thank you to the Health and Welfare Data Science Center (HWDC) for providing the data that made this research analysis possible.

Funding

National Science and Technology Council: 113–2410-H-032–001-

Author information

Contributions

Conceptualization, K.-Y.C. and M.C.; data curation, Y.-C.H.; formal analysis, Y.-C.H.; methodology, K.-Y.C., Y.-C.H.; project administration, K.-Y.C., C.-K.L. and M.C.; resources, M.C.; supervision, S.-J.L.; validation, K.-Y.C. and S.-J.L.; visualization, C.-K.L.; writing—original draft, Y.-C.H. and Y.-C.L.; writing—review & editing, K.-Y.C., Y.-C.H., C.-K.L., S.-J.L., and M.C. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Mingchih Chen.

Ethics declarations

Ethics approval and consent to participate

Following national regulations and with the approval of the Institutional Review Board (IRB) at Fu Jen Catholic University, the requirement for obtaining informed consent was waived for this study. The waiver was granted based on the assurance that participant information would remain confidential, with data anonymized through unique identifiers (protocol code C108121; approved on March 5, 2020). This waiver was deemed appropriate as the study met specific criteria outlined in Taiwan’s Human Subjects Protection Act and rigorously adhered to all established protocols.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, KY., Huang, YC., Liu, CK. et al. Machine learning-driven prediction of medical expenses in triple-vessel PCI patients using feature selection. BMC Health Serv Res 25, 105 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12913-025-12218-6

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12913-025-12218-6

Keywords