INTRODUCTION
Pediatric hospital readmissions have received increasing attention in recent decades. The 30-day readmission rate for hospitalized children remains high, ranging from 4.40 to 29.50 % (1,2). Studies show that hospital readmission can negatively affect patients’ quality of life (3), with short- and long-term consequences (3-7). Readmissions also contribute substantially to rising healthcare costs (8-10). One study of pediatric patients reported a hospital cost of US$17.3 billion for all admissions and readmissions, of which 21.5 % (US$3.71 billion) was spent during readmission stays (10). Another study found that, of the US$11.6 billion spent annually on all hospitalizations, US$2.0 billion (16.9 % of total hospitalization costs) was related to all-cause readmissions within 30 days (9).
Nevertheless, identifying patients at high risk of readmission and implementing timely interventions remains a challenge for healthcare professionals. Recently, predictive modeling has been proposed as an efficient method to stratify the risk of readmission, allowing preventive interventions to be targeted at patients at risk and thus optimizing the allocation of clinical resources (11). Tools for the early identification of patients at risk of readmission have been proposed to help minimize the incidence of hospital readmissions (2,12-15). However, practical and easily understood predictive models to support clinical decisions are still lacking. The reported models are often poorly designed and rely mainly on black-box algorithms (2,12-15), which obscures how clinical factors led to the predictions.
For healthcare applications, a model’s interpretability is as important as its performance. When the attributes and decision paths can be inspected rationally, a predictive clinical model becomes easier for the health team to apply. Decision trees, a supervised machine learning approach, can therefore be an excellent option: the method relates nodes to each other hierarchically (16), resulting in a model that is easy to interpret. In addition, although nutritional problems are known to be associated with negative outcomes, studies investigating these aspects are scarce. Some studies using artificial intelligence have highlighted that nutrition should be considered in the several areas of health that are biologically related to it (17).
Therefore, the aim of our study was to build an interpretable predictive model using a decision tree algorithm to identify patients at risk of 30-day potentially avoidable readmissions.
METHODS
STUDY DESIGN AND SETTINGS
A retrospective cohort study was conducted at a tertiary university hospital from January 1st, 2014 to December 31st, 2018. We included 528 children and adolescents aged 0 to 18 years for whom all data (biochemical exams and nutritional monitoring) could be retrieved from electronic databases. We excluded hospitalizations that resulted in in-hospital death (not at risk for the readmission outcome), discharges against medical advice (no opportunity to implement a care plan and discharge instructions) and patients with incomplete data in the electronic databases.
To avoid algorithmic bias when applying machine learning techniques, we sought to minimize class imbalance, which can produce classifiers whose predicted class probabilities are skewed toward the majority class and ignore the minority class. To address class imbalance, a 1:1 nested case-control design was used: we included patients who were readmitted and had complete data (cases) and randomly selected patients with complete data who were not readmitted (controls) (Fig. 1). The university’s Ethics Committee approved this study (CAAE 51706221.3.0000.5152, protocol number: 5.003.236). This manuscript followed the “Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis” (TRIPOD) statement for the reporting of prediction models.
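The 1:1 selection of controls described above can be illustrated with a minimal sketch. This is not the code used in the study: the record layout (a dictionary with a boolean "readmitted" key), the function name and the fixed seed are assumptions made only for illustration.

```python
import random


def sample_controls_one_to_one(records, seed=0):
    """Keep every readmitted case and draw an equal number of random controls.

    `records` is a list of dicts with a boolean "readmitted" key
    (hypothetical schema, not the study's database layout).
    """
    cases = [r for r in records if r["readmitted"]]
    controls = [r for r in records if not r["readmitted"]]
    if len(controls) < len(cases):
        raise ValueError("not enough controls for 1:1 sampling")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = rng.sample(controls, k=len(cases))  # without replacement
    return cases + sampled


# Toy cohort: 3 readmitted patients and 10 who were not (illustrative only).
cohort = [{"id": i, "readmitted": i < 3} for i in range(13)]
balanced = sample_controls_one_to_one(cohort)
n_cases = sum(r["readmitted"] for r in balanced)
n_controls = len(balanced) - n_cases
print(n_cases, n_controls)  # 3 3
```

Sampling without replacement guarantees that no control appears twice, so the balanced set has exactly as many controls as cases.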
PREDICTORS SELECTION
Demographic data (age and sex), clinical data (ward, admission type, diagnoses and length of hospital stay in days), biochemical exams (blood count, leukogram, sodium and C-reactive protein [CRP]) and the presence or absence of any nutritional monitoring during hospitalization were obtained from electronic databases.
We classified patients’ age into six groups, according to childhood and adolescence periods of growth and development: < 1 year, ≥ 1 to < 5 years, ≥ 5 to < 9 years, ≥ 9 to < 13 years, ≥ 13 to < 16 years and ≥ 16 years. Length of hospital stay (LOS) was categorized into quartiles: < 8, ≥ 8 to < 17, ≥ 17 to < 38 and ≥ 38 days. All blood tests were performed at a single laboratory. CRP was measured by immunoturbidimetry using a Cobas® 6000 analyzer, sodium was measured by potentiometry, and hematological parameters were analyzed using an automated Sysmex XN-3000™ hematology analyzer. We categorized biochemical exams as altered or normal, considering age and sex. Nutritional data were not recorded in a standardized way in the electronic databases, so it was not possible to classify patients’ nutritional status. We could only observe whether the child had any nutritional monitoring during hospitalization, and therefore created a binary variable indicating whether or not the patient had nutritional monitoring during the hospital stay.
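The age and LOS categories listed above map directly onto a cut-point lookup. The following sketch assumes ages are given in years and LOS in days; the constant names, labels and helper function are hypothetical and only illustrate the binning, not the study's actual preprocessing.

```python
from bisect import bisect_right

# Lower bounds of every group except the first (years and days, respectively).
AGE_CUTOFFS = [1, 5, 9, 13, 16]
AGE_LABELS = ["< 1 year", "1 to < 5 years", "5 to < 9 years",
              "9 to < 13 years", "13 to < 16 years", ">= 16 years"]
LOS_CUTOFFS = [8, 17, 38]
LOS_LABELS = ["< 8 days", "8 to < 17 days", "17 to < 38 days", ">= 38 days"]


def categorize(value, cutoffs, labels):
    """Return the label of the half-open interval [lower, upper) containing value."""
    # bisect_right counts how many cut-points are <= value, which is
    # exactly the index of the matching group.
    return labels[bisect_right(cutoffs, value)]


print(categorize(0.5, AGE_CUTOFFS, AGE_LABELS))  # < 1 year
print(categorize(8, LOS_CUTOFFS, LOS_LABELS))    # 8 to < 17 days
```

Using `bisect_right` keeps each lower bound inclusive, which matches the "≥ lower to < upper" definition of the groups.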
OUTCOME
The outcome was 30-day potentially avoidable readmission, defined as a new admission within 30 days of the immediately preceding hospital discharge. Thus, all unavoidable readmissions were excluded, as were all patients admitted to wards with planned hospitalizations (obstetrics, gynecology and transplant) or with predictable admissions, such as labour/delivery and chemotherapy or radiotherapy treatments (ambulatory care).
STATISTICAL ANALYSIS
Descriptive data were summarized using proportions or means (± standard deviation, SD). For the statistical analysis, we used R (version 4.0.3) and RStudio (version 4.0.2), and considered 95 % confidence intervals (95 % CI). The machine learning-based decision tree algorithm J48, available in the Weka suite, was used to develop best-fit trees and select the minimum set of characteristics capable of efficiently classifying patients at risk of 30-day potentially avoidable readmissions. The J48 algorithm builds decision trees based on the information gain ratio, choosing splits that reduce entropy and thereby improve the tree’s predictive accuracy. Based on this concept, the algorithm searches for the best attribute and threshold and divides the data into two subsets: those with attribute values above the threshold and those with values below or equal to it. This process is repeated for each resulting subset until a stopping criterion is met, so that the most informative attributes are used to construct a decision tree that models the underlying patterns in the data.
Leave-one-out cross-validation (LOOCV) was applied to estimate the classification accuracy and test the generalizability of the model. We computed the area under the receiver operating characteristic curve (AUC). We also estimated other measures of the model’s diagnostic performance, namely specificity, sensitivity, and positive and negative predictive values. We performed these analyses using WEKA software (Waikato Environment for Knowledge Analysis, version 3.6.1).
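To make the gain ratio criterion concrete, the sketch below computes it for a single threshold split on a numeric attribute. It is a simplified illustration in Python, not Weka's J48 implementation, which includes further details (such as pruning and handling of missing values) that are not shown here. The function names and the toy values are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting into `value <= threshold` and `value > threshold`."""
    below = [y for x, y in zip(values, labels) if x <= threshold]
    above = [y for x, y in zip(values, labels) if x > threshold]
    if not below or not above:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    w_below, w_above = len(below) / n, len(above) / n
    # Information gain: entropy before the split minus weighted entropy after it.
    info_gain = entropy(labels) - (w_below * entropy(below) + w_above * entropy(above))
    # Split information penalizes very unbalanced partitions.
    split_info = -(w_below * log2(w_below) + w_above * log2(w_above))
    return info_gain / split_info


# Toy data (illustrative only): a marker value and a binary outcome.
marker = [0.1, 0.2, 0.3, 0.8, 1.2, 2.0]
outcome = [0, 0, 0, 1, 1, 1]
perfect = gain_ratio(marker, outcome, 0.5)   # separates the classes completely
partial = gain_ratio(marker, outcome, 0.15)  # isolates a single observation
print(perfect > partial)  # True
```

In this toy example the threshold 0.5 yields a gain ratio of 1.0 because both subsets are pure and equally sized, whereas a threshold that isolates one observation scores much lower.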
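The validation procedure and the diagnostic measures named above can be sketched as follows. This is an illustration only: the single-feature majority-class classifier stands in for the decision tree, and the data, function names and argument names are hypothetical rather than taken from the study.

```python
def loocv_predictions(X, y, fit, predict):
    """Predict each observation with a model trained on all the others."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]  # leave observation i out
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds


def fit_stump(X, y):
    """Learn the majority class for each value of one binary feature."""
    model = {}
    for value in (0, 1):
        group = [t for x, t in zip(X, y) if x == value]
        # Ties and empty groups default to class 0 (arbitrary choice).
        model[value] = 1 if sum(group) * 2 > len(group) else 0
    return model


def predict_stump(model, x):
    return model[x]


def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix measures; assumes every denominator is non-zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy data (illustrative only): one binary predictor and a binary outcome.
X = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
y_pred = loocv_predictions(X, y, fit_stump, predict_stump)
metrics = diagnostic_metrics(y, y_pred)
print(metrics)
```

Because every prediction comes from a model that never saw the observation being predicted, the pooled measures estimate performance on unseen patients rather than on the training data.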
RESULTS
Of the 528 patients aged between 0 and 18 years, 60.2 % (318) were male and 33.5 % (177) were under one year of age. The frequency of 30-day potentially avoidable readmissions was 50.0 % (264). Of these, 31.10 % (82) had a length of hospital stay of less than 8 days, and 70.8 % (187) and 85.6 % (226) had altered hemoglobin and CRP levels, respectively (Table I).
Table I. Demographic, clinical and biochemical variables for potentially avoidable 30-day readmission.

Considering all available predictors, a decision tree was inferred by the J48 method to classify patients at risk of 30-day potentially avoidable readmissions (Fig. 2). According to the model, the health team should first look at C-reactive protein. If CRP is greater than 0.5 mg/dL, hemoglobin should be examined. If hemoglobin is normal, nutritional monitoring should be considered, because the readmission risk is greater if the patient was not monitored. The decision tree for classifying readmission vs non-readmission used CRP, hemoglobin, sodium levels and nutritional data, obtaining an AUC of 0.65 and an accuracy of 63.3 % in full training (FULL) and leave-one-out cross-validation (LOOCV), with a specificity of 68.37 % and a sensitivity of 60.4 %. The positive and negative predictive values were 76.52 % and 50.76 %, respectively (Fig. 2).

Figure 2. Decision tree algorithm proposed to differentiate patients with 30-day potentially avoidable readmissions. The total number of classified admissions (correct and incorrect) for each class is shown in parentheses for each terminal node. Incorrectly classified admissions appear after a slash “/”. The area under the receiver operating characteristic curve (AUC), full training (FULL) and leave-one-out cross-validation (LOOCV) accuracies are shown in the figure.
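The decision path described for Figure 2 can be written as a small function. This is only a sketch based on the textual description, not a reproduction of the fitted tree: the function name, argument names and return labels are hypothetical, the CRP unit is assumed to be mg/dL, the sodium branch follows the description given in the Discussion, and branches whose outcome is not stated in the text return None rather than a guessed class.

```python
def readmission_risk(crp_mg_dl, hemoglobin_altered, sodium_meq_l, nutritional_monitoring):
    """Follow the described decision path; None means the text gives no outcome."""
    if crp_mg_dl <= 0.5:
        return None  # branch for CRP at or below the threshold is not described
    if not hemoglobin_altered:
        # Normal hemoglobin: the risk is described as greater without monitoring.
        return "lower" if nutritional_monitoring else "higher"
    # Altered hemoglobin: sodium is assessed next.
    if sodium_meq_l > 134:
        return "higher"  # sodium above 134 mEq/L was associated with readmission
    return None  # no association reported at or below this threshold


# Hypothetical patient: elevated CRP, normal hemoglobin, no nutritional monitoring.
print(readmission_risk(1.2, False, 138, False))  # higher
```

Writing the rule this way makes explicit which parts of the path are stated in the text and which would need to be read from the figure itself.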
DISCUSSION
In this study, we used the J48 algorithm to build a classification model for 30-day potentially avoidable readmissions. The most important attributes for the model were CRP, hemoglobin and sodium levels, as well as nutritional monitoring. Our findings were confirmed using leave-one-out cross-validation. To the best of our knowledge, our study is the first to build a prediction model based on a decision tree with only three levels and confirmed by leave-one-out cross-validation. The model is easy to understand and apply in clinical practice, makes clear the contribution and direction of each association, and uses attributes routinely available in hospital services. Furthermore, the rule found by our model applies to 63.6 % of new cases.
Previous studies have reported many risk factors associated with an increased risk of hospital readmission: age (18-20), multimorbidity (11,13,19,21), prolonged duration of the last hospital stay (18,21,22), polypharmacy (15,23) and the presence of diagnoses/conditions such as anemia, malnutrition, cancer and global developmental delay (1,20). However, hospital readmission is still a recurring problem that is difficult to manage clinically, with short- and long-term deleterious effects (3-7), besides contributing substantially to hospital costs (8-10). A study investigating Intensive Care Unit (ICU) readmissions observed that early markers can be used to anticipate patients at high risk of clinical deterioration after ICU discharge (24). The early identification of patients at greater risk of readmission provides opportunities for targeting interventions and allocating clinical and financial resources. Accordingly, predictive models for 30-day hospital readmission have been proposed in the literature, with variable performance: an AUC of 0.65 using Naive Bayes for all-cause 30-day readmission (12), an AUC of 0.65 with gradient boosting for 30-day unplanned hospital readmissions (13), an AUC of 0.73 using Support Vector Machines with a polynomial kernel for at-discharge models (14), and an AUC of 0.81 with XGBoost for unplanned readmissions within 30 days (2). However, there are few practical and interpretable models that are easy to understand and apply and that can support clinical decisions.
Accordingly, we used a machine learning decision tree-based algorithm to build an interpretable model capable of identifying patients at risk of hospital readmission. Despite its modest performance (AUC = 0.65), our results are relevant, and the model can identify new patients at risk of readmission in 63.6 % of cases (LOOCV = 63.6 %). This in silico validation by leave-one-out cross-validation simulates the model’s performance as if it were applied to another population (Wong, 2015). In addition, specificity (68.37 %) and sensitivity (60.4 %) were good, which supports the model’s performance, although the negative predictive value was lower.
Therefore, with these measures of diagnostic performance, our model can contribute to clinical practice, since it is easy to understand and apply in hospital routine and employs only relevant attributes that are easily obtained in medical services.
In the model built using the J48 algorithm, the most relevant attribute was the CRP level, which provided the most information and was therefore placed at the root of our decision tree; high CRP levels contribute to the risk of readmission. CRP is an acute-phase protein, considered a sensitive and rapid-response marker of inflammation. Studies have suggested that high CRP concentrations are correlated with the presence of ongoing organ dysfunction (25,26). Elevated CRP may therefore serve as an indirect marker of disease severity and could be linked to a higher risk of hospital readmission (25,26). High CRP levels have been associated with a higher risk of readmission at 7 days (25) and with a higher risk of adverse outcomes after discharge from the intensive care unit (26).
Our decision tree also used hemoglobin levels to identify patients at risk of readmission. For patients with normal hemoglobin levels, it is necessary to assess the presence or absence of nutritional monitoring during hospitalization. Patients without nutritional monitoring during hospitalization have a higher risk of being readmitted than those followed up by a nutritionist.
Nutritional data have already been explored in previous observational studies; in most cases, however, these studies only assess the association of malnutrition with hospital readmission (1,20) and do not address the relevance of nutritional monitoring during hospitalization. At least 80 % of patients admitted to a hospital must undergo nutritional screening within the first 24 hours of admission (27). Nutritional screening is the initial step that allows the identification of patients at nutritional risk and early intervention when necessary, minimizing deleterious effects related to nutritional status (28). However, there is still no consensus on the nutritional approach to pediatric patients at the hospital level, and the available tools are scarce and little used (29). A study carried out in Brazil revealed that 43.3 % of medical records did not contain any record of the children’s nutritional status (30). According to the authors, this reiterates the under-reporting of this important information by the entire health team that assists hospitalized children (30). Because of this deficiency, many hospitalized patients may not receive any type of nutritional monitoring, which makes it difficult to identify nutritional losses with negative effects on their health. Therefore, beyond malnutrition itself, it is relevant to assess the effectiveness of nutritional monitoring and its contribution to hospital readmission.
On the other hand, if hemoglobin levels are altered, it is necessary to assess sodium levels. One study found that the hemoglobin level is inversely correlated with 30-day hospital readmission rates (31). Low hemoglobin levels may be related to anemia, a condition often diagnosed in hospitalized children (32-34) that can be both a symptom and a complication of many diseases. Studies suggest that anemia is a negative prognostic factor and may contribute to worse clinical outcomes, besides negatively affecting the child’s health, with long-term deleterious effects (32,35). Moreover, sodium levels may contribute to identifying patients at risk of hospital readmission. Abnormal sodium levels are among the most common electrolyte disturbances in hospitalized patients and have been associated with worse clinical outcomes. Studies have revealed that hyponatremia is associated with hospital readmission (36-38). However, in our study, we found no association for sodium levels below 134 mEq/L. One hypothesis for the absence of association is the low frequency of hyponatremia (5.3 %) among the evaluated patients. Nevertheless, sodium levels above 134 mEq/L were associated with hospital readmission, and this rule applies to almost 50 % of the patients evaluated. Studies suggest that sodium levels may be a marker of the severity of the underlying disease, being related to an increase in negative health outcomes such as mortality (36,39,40), increased length of stay (36,37,40) and hospital readmission (36-38).
Hospital readmission is a challenging outcome: it contributes substantially to increased costs and is often associated with adverse health outcomes. Models capable of predicting the risk of readmission are therefore of interest, since these tools can help identify and reduce readmissions, improve overall patient care and reduce healthcare costs. In our study, we built a classification model for 30-day potentially avoidable readmissions using the J48 algorithm. We showed that one of the relevant predictors was nutritional monitoring, which is often neglected by predictive models. Future studies should explore nutritional data, aiming to deepen knowledge and make health professionals aware of the importance of nutritional screening.
Some limitations of this study need to be pointed out. First, the findings should be extrapolated with caution, since this study was carried out with pediatric patients from a tertiary university hospital. Second, the sample size was small, since the absence of complete data made it impossible to assemble a larger database, and algorithms are more effective when applied to large databases. However, the present study has strengths: we evaluated all available data during the study period, applied LOOCV, used predictors that are relevant to the pediatric population and easily accessible, and sought to build a model that is easy to interpret and apply in practice.
CONCLUSION
The decision tree model showed that CRP, hemoglobin and sodium levels and nutritional monitoring are the most important attributes for classifying 30-day potentially avoidable readmissions.
Our model allows the identification of individuals at risk of readmission in an easy and practical way, facilitating the targeting of interventions by the medical team and helping to minimize this outcome.