Original article

Predictive factors of adolescents’ happiness: a random forest analysis of the 2023 Korea Youth Risk Behavior Survey

Child Health Nursing Research 2025;31(2):85-95.
Published online: April 30, 2025

¹Associate Professor, Department of Nursing, Gangneung-Wonju National University, Wonju, Korea

²PhD Student, Department of Nursing, Gangneung-Wonju National University, Wonju, Korea

³Master of Science in Nursing Student, Department of Nursing, Gangneung-Wonju National University, Wonju, Korea

Corresponding author: Seong Kwang Kim, Department of Nursing, Gangneung-Wonju National University, 150 Namwon-ro, Heungop-myeon, Wonju 26403, Korea. Tel: +82-33-760-8650, Fax: +82-0504-034-1677, E-mail: tjdrhkd141@gwnu.ac.kr
• Received: December 14, 2024   • Revised: January 12, 2025   • Accepted: April 3, 2025

© 2025 Korean Academy of Child Health Nursing.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial and No Derivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits unrestricted non-commercial use, distribution of the material without any modifications, and reproduction in any medium, provided the original work is properly cited.

  • Purpose
    This study aimed to identify predictive factors affecting adolescents’ subjective happiness using data from the 2023 Korea Youth Risk Behavior Survey. A random forest model was applied to determine the strongest predictive factors, and its predictive performance was compared with traditional regression models.
  • Methods
    Responses from a total of 44,320 students from grades 7 to 12 were analyzed. Data pre-processing involved handling missing values and selecting variables to construct an optimal dataset. The random forest model was employed for prediction, and SHAP (Shapley Additive Explanations) analysis was used to assess variable importance.
  • Results
    The random forest model demonstrated a stable predictive performance, with an R² of .37. Mental and physical health factors were found to significantly affect subjective happiness. Adolescents’ subjective happiness was most strongly influenced by perceived stress, perceived health, experiences of loneliness, generalized anxiety disorder, suicidal ideation, economic status, fatigue recovery from sleep, and academic performance.
  • Conclusion
    This study highlights the utility of machine learning in identifying factors influencing adolescents’ subjective happiness, addressing limitations of traditional regression approaches. These findings underscore the need for multidimensional interventions to improve mental and physical health, reduce stress and loneliness, and provide integrated support from schools and communities to enhance adolescents’ subjective happiness.
Introduction
1. Background
Subjective happiness is an individual’s subjective evaluation of their happiness and life satisfaction. High levels of subjective happiness are characterized by high levels of positive emotions and life satisfaction and low levels of negative emotions [1]. Subjective happiness is influenced by various factors, including personality traits, ways of thinking, interpersonal relationships, and cultural and social systems [1]. In adolescents, subjective happiness is specifically affected by academic achievement, peer relationships, family dynamics, and the school environment [2]. South Korean adolescents have consistently reported the lowest levels of subjective happiness, both internationally and within Organization for Economic Cooperation and Development (OECD) countries. Subjective happiness has shown no improvement in this population since the early 2010s [2,3].
This phenomenon primarily stems from the educational system and socialization processes within families. South Korea tends to prioritize academic achievement over addressing the diverse developmental needs of adolescents, while parents often focus on grades and entrance exam results rather than on their children’s holistic development [2,4]. According to a 2024 report by the Child Fund Korea (Korean Green Umbrella Children’s Foundation), children and adolescents in South Korea experience increasing hours of extracurricular study as they progress through school grades [5]. This excessive academic burden, combined with South Korea’s competitive societal atmosphere, contributes to increased depression, anxiety, and suicidal ideation among adolescents [5].
Academic burden and a competitive social atmosphere can act as additional stressors for adolescents, contributing to lower levels of subjective happiness [5]. Other stressors include educational inequalities stemming from parental income disparities, reduced sleep quality due to increased afterschool study hours, and limited peer interactions [5].
Low subjective happiness during adolescence is a multifaceted issue and can significantly impact career development, interpersonal relationship formation, and social adaptation later in life [6-9]. In particular, low happiness during adolescence may negatively affect mental health and social adaptation. This may lead to lower individual quality of life as well as higher societal costs [10].
School health teachers and community nurses play crucial preventative roles. They interact directly with adolescents on the frontline, placing them in the position to detect signs of low subjective happiness early and, when necessary, provide appropriate interventions [5]. This allows them to significantly contribute to enhancing adolescents’ wellbeing. By identifying the factors that influence subjective happiness, school health teachers and community nurses can assess adolescents’ mental health, provide counseling and health education, and coordinate resources across different sectors to promote positive changes in subjective happiness [5].
Existing studies on adolescents’ subjective happiness have predominantly relied on traditional regression models, which present limitations in analyzing the multidimensional and complex nature of subjective happiness [11,12]. Regression models require that statistical assumptions be met, such as the absence of autocorrelation among residuals and of multicollinearity among predictors (commonly checked with the variance inflation factor). These constraints make it challenging to include numerous variables, limiting the ability to capture intricate interactions [13]. For instance, research utilizing the Korea Youth Risk Behavior Survey to study subjective happiness in adolescents divided variables into separate models based on factors rather than analyzing all variables within a single regression model [14].
Machine-learning techniques have emerged as promising alternatives to overcome these limitations. Unlike traditional regression models, machine learning is less constrained by statistical assumptions and can effectively capture nonlinear relationships and high-dimensional interactions [15]. For example, the widely used random forest algorithm is recognized as a representative predictive model because of its robust algorithmic structure and high predictive accuracy, as demonstrated in numerous empirical studies. Ensemble methods such as random forest also facilitate variable importance analysis to identify key predictive factors influencing complex phenomena [16].
In this context, the present study applied machine learning techniques to address the limitations of traditional regression models and more comprehensively analyzed the multifaceted interactions of various factors affecting adolescents’ subjective happiness. We aimed to deepen the understanding of adolescents’ subjective happiness, build predictive models with greater explanatory power, and contribute to the development of effective strategies to enhance well-being in this population.
2. Research Objectives
This study analyzed the predictive factors influencing adolescents’ subjective happiness using machine learning techniques. We aimed to achieve the following objectives: (1) To develop a predictive model for adolescents’ subjective happiness using a random forest machine learning model for regression. (2) To analyze the predictive performance of this machine learning model and compare it with that of traditional regression models to validate the utility and explanatory power of machine learning techniques in predicting adolescents’ subjective happiness. (3) To perform a variable importance analysis using the random forest machine learning model to identify the key factors that significantly influence adolescents’ subjective happiness and to present practical implications for improving it.
Methods
Ethics statements: This study is a secondary data analysis. The research plan was reviewed and approved for exemption by the Institutional Review Board (IRB) of the Gangneung-Wonju National University, as the study was determined to pose minimal risk to human subject protection and privacy (IRB no., GWNUIRB-R2024-90).
1. Study Design
This study was a secondary data analysis aimed at identifying the predictive factors for adolescents’ subjective happiness using data from the 19th Korea Youth Risk Behavior Survey conducted in 2023 by the Korea Disease Control and Prevention Agency. This large-scale, cross-sectional survey was conducted nationwide to systematically assess the health behaviors of adolescents in South Korea and provide reliable and representative data. With application to this dataset, this study aimed to develop a machine learning-based predictive model of adolescents’ subjective happiness. Reporting followed the guidelines outlined in Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) [17]. A simplified visualization of the research design and process is presented in Supplement 1. This diagram provides an overview of the research process without detailed statistical methods.
2. Research Variables
According to previous studies, adolescents’ subjective happiness is known to have complex associations with various domains, including emotional factors (e.g., depression, stress), physical activity, sleep, economic status, and family environment [2,4-9]. The 19th Korea Youth Risk Behavior Survey (2023) contains questionnaire items covering mental health, physical activity, and health behaviors. Analyzing this dataset offers the advantage of utilizing a wide range of variables that influence adolescents’ subjective happiness. We analyzed 145 variables across 17 categories, including adolescents’ mental health, physical activity, dietary habits, personal hygiene, substance use, and general characteristics (Supplement 2). The dependent variable, “subjective happiness perception,” was measured using a 5-point Likert scale as a response to the question, “How happy do you feel in your daily life?” All other variables were independent variables in the analysis (Supplement 2). The precise definitions of our research variables can be viewed on the Korea Disease Control and Prevention Agency website (https://www.kdca.go.kr/yhs/), which is publicly accessible to anyone.
3. Data Collection
The 19th Korea Youth Risk Behavior Survey (national approved statistics no., 117058) was a self-administered online questionnaire conducted nationwide from August to October 2023, targeting students from the first year of middle school to the third year of high school (grades 7 to 12). Eight hundred schools were sampled, of which 799 participated in the survey, and 52,880 of 56,935 eligible students completed the survey.
4. Data Analysis

1) Exploratory data analysis

Prior to developing the random forest predictive model, an exploratory data analysis was conducted to understand the characteristics of the dataset and establish effective preprocessing strategies. Categorical variables were analyzed using frequency and percentage distributions, whereas continuous variables were examined for median, minimum, maximum, mean, and standard deviation. Data quality checks included identifying missing values, outliers, non-standard data (e.g., text or special characters), and non-response rates. This exploratory analysis provided insights into the overall structure and characteristics of the data and formed the basis for a robust data preprocessing plan.
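As an illustration of the descriptive summaries described above, the following minimal sketch uses only the Python standard library. The function names, the toy values, and the treatment of 8888/9999 as non-response codes are assumptions made for the example; they are not taken from the authors' analysis scripts.

```python
from collections import Counter
from statistics import mean, median, stdev

# Assumed non-response codes for this sketch only.
MISSING_CODES = {8888, 9999}

def summarize_categorical(values):
    """Frequency and percentage per category, plus the non-response rate."""
    valid = [v for v in values if v not in MISSING_CODES]
    counts = Counter(valid)
    table = {k: (n, round(100 * n / len(valid), 1)) for k, n in sorted(counts.items())}
    non_response_rate = 1 - len(valid) / len(values)
    return table, non_response_rate

def summarize_continuous(values):
    """Median, minimum, maximum, mean, and sample standard deviation."""
    valid = [v for v in values if v not in MISSING_CODES]
    return {
        "median": median(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": mean(valid),
        "sd": stdev(valid),
    }

# Hypothetical responses on a 5-point item, with one non-response.
happiness = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9999]
freq_table, non_response = summarize_categorical(happiness)

# Hypothetical continuous variable.
sleep_hours = [6.0, 7.5, 5.0, 8.0, 6.5]
sleep_summary = summarize_continuous(sleep_hours)
```

Computing the non-response rate per variable at this stage is what makes a later exclusion rule (such as a 20% cut-off) straightforward to apply.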

2) Data preprocessing

The raw dataset contained observations for 52,880 participants over 145 variables before data cleaning (Supplement 2). After removing the cases with missing values, the dataset contained observations from 44,320 participants. Variables with at least 20% non-responses (35 variables) (Supplement 3) were excluded [18], leaving 110 variables in the analysis. The seven items measuring generalized anxiety disorder were summed to create a single composite variable (generalized anxiety disorder). The final dataset after data cleaning therefore included 104 variables. Values coded as missing (‘9999’ or ‘8888’) were imputed using the mode (categorical) or mean (continuous). Key variables were reverse coded for clarity (e.g., 1 was coded as 5 for subjective stress, perceived health, economic status, and sleep quality). No additional processing was required for outliers or logical inconsistencies.
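The preprocessing rules in this subsection can be sketched as follows. This is a simplified illustration rather than the authors' code: the helper names, the toy columns (including the hypothetical `X_SPARSE`), and the assumption that each anxiety item is scored from 1 to 4 are all introduced here for the example.

```python
# Variable bookkeeping: 145 raw variables minus the 35 excluded ones,
# then seven anxiety items collapsed into a single composite score.
n_raw, n_excluded, n_gad_items = 145, 35, 7
n_after_exclusion = n_raw - n_excluded              # 110
n_final = n_after_exclusion - n_gad_items + 1       # 104, the final count reported

def drop_high_nonresponse(columns, threshold=0.20, missing_codes=(8888, 9999)):
    """Keep only columns whose non-response rate is below the threshold."""
    kept = {}
    for name, values in columns.items():
        rate = sum(v in missing_codes for v in values) / len(values)
        if rate < threshold:
            kept[name] = values
    return kept

def reverse_code(value, low=1, high=5):
    """Reverse a Likert response so that 1 becomes 5, 2 becomes 4, and so on."""
    return low + high - value

def composite_score(item_responses):
    """Sum item responses into one composite variable."""
    return sum(item_responses)

# Toy columns: M_STR has no non-response; X_SPARSE has 40% and is dropped.
columns = {"M_STR": [1, 2, 3, 4, 5], "X_SPARSE": [9999, 9999, 1, 2, 3]}
kept = drop_high_nonresponse(columns)
stress_reversed = [reverse_code(v) for v in kept["M_STR"]]

# Seven hypothetical anxiety item responses, assumed to be scored 1 to 4 each.
gad_total = composite_score([4, 4, 4, 4, 4, 3, 3])
```

The "at least 20%" exclusion rule corresponds to keeping a variable only when its rate is strictly below the threshold, which is why the comparison uses `<`.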

3) Training and test data split

To objectively evaluate the predictive performance of the model, the entire dataset (n=44,320) was divided into 80% training data (n=35,456) and 20% test data (n=8,864). Additionally, five-fold cross-validation was performed to assess model stability and the generalizability of predictive performance. This process involved dividing the training data into five subsets and validating the model on each subset in turn, which helps detect overfitting and enhances the reliability of the performance estimates.
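The split sizes and the five-fold scheme described above can be reproduced with a short index-based sketch. The function names and the random seed are assumptions for illustration; the original analysis may have used a library routine and a different seed.

```python
import random

def train_test_split_indices(n, test_fraction=0.20, seed=42):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(train_indices, k=5):
    """Yield (fit, validation) index lists; each row is validated exactly once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validation

train_idx, test_idx = train_test_split_indices(44_320)
folds = list(k_fold_indices(train_idx, k=5))
```

The held-out test indices are never used inside the cross-validation loop, so the test metrics remain an independent check on the cross-validated ones.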

4) Random forest model

(1) Model construction

The random forest algorithm is an ensemble learning method that generates multiple decision trees and aggregates their predictions to produce a final prediction [19]. Bootstrapping is used to create multiple random training samples, one decision tree is fitted to each sample, and the predictions of the individual trees are aggregated through majority voting (classification) or averaging (regression) to ensure prediction stability.
A key feature of the random forest algorithm is its ability to evaluate variable importance to identify the key predictors influencing the dependent variable [19]. This is obtained by calculating the relative importance of independent variables based on the degree of impurity reduction at each decision tree node. Additionally, by combining multiple decision trees, random forest prevents the overfitting of individual trees and reduces prediction variability.
To enhance the model performance, optimization techniques such as hyperparameter tuning were applied. This process improved the prediction accuracy, facilitating the precise identification of significant variables affecting adolescents’ subjective happiness.
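To make the mechanism concrete, the sketch below implements a deliberately small random forest for regression in plain Python: bootstrap sampling, random feature subsets at each split, averaging across trees, and importance measured as accumulated impurity (squared-error) reduction. It is a teaching sketch on simulated data, not the implementation used in the study; all function names, defaults, and the toy data are assumptions.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean: the impurity of a regression node."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, features):
    """Return (impurity reduction, feature, threshold) for the best split, or None."""
    parent, best = sse(y), None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:  # candidate rule: row[f] <= t
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow_tree(X, y, rng, max_features, depth, importance):
    """Grow one regression tree; each split sees only a random subset of features."""
    split = None
    if depth > 0:
        split = best_split(X, y, rng.sample(range(len(X[0])), max_features))
    if split is None or split[0] <= 0:
        return mean(y)  # leaf: average outcome of the observations in this node
    gain, f, t = split
    importance[f] += gain  # credit the impurity reduction to the splitting feature
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      rng, max_features, depth - 1, importance),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      rng, max_features, depth - 1, importance))

def predict_tree(node, row):
    """Follow the split rules down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, max_features=1, max_depth=3, seed=0):
    """Fit trees on bootstrap samples; return the trees and normalized importances."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        trees.append(grow_tree([X[i] for i in boot], [y[i] for i in boot],
                               rng, max_features, max_depth, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict_forest(trees, row):
    """Average the predictions of all trees (the regression form of aggregation)."""
    return mean(predict_tree(tree, row) for tree in trees)

# Simulated data: feature 0 (a 1-5 "stress" score) drives the outcome, feature 1 is noise.
rng = random.Random(1)
X = [[rng.randint(1, 5), rng.randint(1, 5)] for _ in range(200)]
y = [6 - stress + rng.gauss(0, 0.3) for stress, _ in X]
trees, importance = fit_forest(X, y)
```

In this simulation the importance of feature 0 should dominate, because only splits on that feature remove a large share of the squared error.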

(2) Hyperparameter configuration

Hyperparameters are key parameters that control the learning process of a machine-learning model and must be defined by the researcher before training. Determining the optimal combination of these hyperparameters is a critical step in determining the performance of the model [20].
In this study, the hyperparameter tuning was performed using the random search method. Combinations of hyperparameters were selected at random within a specified range and tested to identify a configuration that strikes a balance between computational efficiency and optimization performance [20]. The hyperparameter values and their impact on the model are detailed in Supplement 4.
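The random search procedure can be sketched generically as below. The search space, the number of iterations, and the scoring function are all hypothetical stand-ins: the ranges actually used in the study are described in Supplement 4, and in practice the score would be a cross-validated metric from the fitted model rather than the toy formula used here.

```python
import random

# Hypothetical search space for illustration only.
SEARCH_SPACE = {
    "n_trees": [50, 100, 200, 500],
    "max_depth": [3, 5, 10, 20],
    "max_features": [0.2, 0.4, 0.6, 0.8],
}

def random_search(score_fn, space, n_iter=10, seed=0):
    """Evaluate n_iter randomly drawn configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        config = {name: rng.choice(options) for name, options in space.items()}
        trials.append((score_fn(config), config))
    best_score, best_config = max(trials, key=lambda trial: trial[0])
    return best_config, best_score, trials

def toy_score(config):
    """Stand-in for a cross-validated score; peaks at max_depth=10, max_features=0.4."""
    return (0.4
            - 0.001 * abs(config["max_depth"] - 10)
            - 0.05 * abs(config["max_features"] - 0.4))

best_config, best_score, trials = random_search(toy_score, SEARCH_SPACE)
```

Unlike an exhaustive grid (64 combinations in this toy space), random search evaluates a fixed budget of configurations, which is the trade-off between computational cost and optimization quality that the text describes.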

5) Model evaluation

(1) Performance metrics

This study utilized performance metrics such as mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and the coefficients of determination (R² and adjusted R²) to evaluate predictive performance. Among these, MSE, MAE, and RMSE reflect the absolute magnitude of prediction errors. However, their values have no upper bound, making direct comparisons between models less intuitive. Therefore, this study primarily evaluated and interpreted model performance using R², which expresses the model’s explanatory power on a standardized scale that typically ranges from 0 to 1.
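For reference, the five metrics can be computed from observed and predicted values as in the sketch below, which uses the standard textbook definitions. The function name and the toy values are assumptions; `n_predictors` is the number of independent variables entering the adjusted coefficient of determination.

```python
from math import sqrt

def regression_metrics(y_true, y_pred, n_predictors):
    """MSE, RMSE, MAE, R^2, and adjusted R^2 from observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "r2": r2, "adj_r2": adj_r2}

# Toy example on a 5-point scale with a single predictor.
metrics = regression_metrics([1, 2, 3, 4, 5], [2, 2, 3, 4, 4], n_predictors=1)
```

Because RMSE is the square root of MSE, the two always move together; the error metrics stay in the units of the outcome scale, whereas the coefficients of determination are unitless.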

(2) Shapley Additive Explanations

Shapley Additive Explanations (SHAP) is a tool to interpret machine learning models utilizing Shapley values from game theory. The impact of each feature on the predictions is quantified using a SHAP value, where the sign of the SHAP value indicates whether the feature contributes to an increase (positive) or decrease (negative) in the predicted value. In this study, key predictive variables were selected using an elbow plot of the SHAP value distribution. Variables were added incrementally to evaluate their contributions [21]. The results showed a negligible performance improvement when more than eight variables were included. At feature index 8, the mean SHAP value converged to zero. Thus, the eight features with the highest SHAP values were identified as the key predictors that significantly influenced adolescents’ subjective happiness (Figure 1).
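The ranking that underlies an elbow plot can be sketched as follows: per-observation SHAP values are averaged in absolute value for each feature, features are sorted by that average, and those whose contribution is effectively zero are set aside. The tolerance-based cut-off, the function names, the feature `X_NOISE`, and all numeric values below are assumptions for illustration; the study selected its cut-off by inspecting the plot in Figure 1, not by this rule.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average the absolute SHAP value of each feature across observations."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }

def select_key_features(importance, tolerance=0.01):
    """Rank features by mean |SHAP| and keep those above a small tolerance."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if value > tolerance]

# Hypothetical SHAP values for three observations and three features.
feature_names = ["PR_HT", "M_STR", "X_NOISE"]
shap_rows = [
    [0.1, -0.4, 0.001],
    [-0.2, 0.3, -0.002],
    [0.3, -0.2, 0.0],
]
importance = mean_abs_shap(shap_rows, feature_names)
key_features = select_key_features(importance)
```

Absolute values are averaged because positive and negative contributions would otherwise cancel, making an influential feature look unimportant.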
Results
1. Predictive Performance
For the training data (five-fold cross-validation), the metrics were as follows: MSE, 0.57 (95% confidence interval [CI], 0.56–0.58); RMSE, 0.76 (95% CI, 0.75–0.76); MAE, 0.61 (95% CI, 0.60–0.61); and R², .39 (95% CI, .38–.40). For the test data, the metrics were as follows: MSE, 0.57; RMSE, 0.76; MAE, 0.61; R², .37; and adjusted R², .37. The model performed similarly across the training and test datasets, indicating that it stably predicted adolescents’ subjective happiness without overfitting. The model explained approximately 37%–39% of the total variance in subjective happiness (Tables 1, 2). This explanatory power may seem modest, given the multifaceted nature of adolescents’ subjective happiness. However, this finding provides empirical evidence for the feasibility of leveraging machine learning to capture the interplay of diverse psychosocial, physical, and socioeconomic factors more effectively than traditional regression models. Although subjective happiness is shaped by a complex interplay of influences, a data-driven approach such as random forest can robustly identify and predict a substantial portion of the variance within a large, nationally representative sample.
2. SHAP Analysis

1) SHAP summary plot

The SHAP analysis identified eight key predictors influencing adolescents’ subjective happiness, ranked by impact: perceived stress (Supplement 5), perceived health, experience of loneliness, generalized anxiety disorder, suicidal ideation, economic status, fatigue recovery through sleep, and academic performance.
The feature impact plot also reveals the directional effects of each variable. For instance, higher perceived stress levels have a negative impact on subjective happiness, whereas better perceived health positively influences it. These findings indicate that adolescents’ subjective happiness is shaped by a complex interplay of factors, including mental health (e.g., perceived stress), physical health (e.g., perceived health), and socioeconomic status (e.g., economic status). Stress, loneliness, anxiety, and suicidal ideation were found to consistently undermine subjective happiness, whereas stronger perceived health, sufficient sleep, better economic status, and better academic performance were associated with more positive self-evaluations of happiness.

2) SHAP force plot

The SHAP force plot revealed a complex interplay of various factors influencing subjective happiness levels (Figure 3).
For adolescents with very low subjective happiness (1.34), the strongest influences were extremely high perceived stress (5.0), a severe experience of loneliness (5.0), very poor perceived health (1.0), suicidal ideation (1.0), and high levels of generalized anxiety (26.0).
At a low subjective happiness level (2.03), moderate fatigue recovery through sleep (3.0) had a positive effect. However, its impact was outweighed by extremely high perceived stress (5.0), a severe experience of loneliness (5.0), suicidal ideation (1.0), moderate perceived health (3.0), and high levels of generalized anxiety (24.0).
Moderate subjective happiness (2.99) was positively influenced by a moderate experience of loneliness (3.0) and no suicidal ideation (0.0), whereas extremely high perceived stress (5.0) had a negative effect. Moderate perceived health (3.0) and economic status (3.0) had a positive influence.
High subjective happiness (4.00) was predominantly driven by positive factors including high economic status (4.0), sufficient fatigue recovery through sleep (4.0), no suicidal ideation (0.0), good perceived health (4.0), low levels of generalized anxiety (9.0), and a low experience of loneliness (1.0). Only moderately high perceived stress (4.0) had a negative influence.
For very high subjective happiness (5.00), all the major predictive factors consistently contributed positively. These included low levels of generalized anxiety (7.0), very sufficient fatigue recovery through sleep (5.0), very high economic status (5.0), a low experience of loneliness (1.0), excellent perceived health (5.0), and very low perceived stress (1.0).
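The logic of a force plot, in which each feature pushes a single prediction above or below a base value, rests on the additivity of Shapley values. The sketch below computes exact Shapley values for one hypothetical instance by averaging marginal contributions over all feature orderings. The `toy_model` function, its coefficients, and the use of a single reference point as the baseline are assumptions for illustration; SHAP applied to a fitted random forest uses the expected model output over background data as its base value.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = instance[j]      # switch feature j from baseline to observed value
            updated = predict(current)
            phi[j] += updated - previous  # marginal contribution of feature j in this ordering
            previous = updated
    return [value / len(orderings) for value in phi]

def toy_model(x):
    """Hypothetical happiness model with a stress-by-loneliness interaction."""
    stress, health, loneliness = x
    return (3.0
            - 0.4 * (stress - 3)
            + 0.3 * (health - 3)
            - 0.2 * (loneliness - 3)
            - 0.05 * (stress - 3) * (loneliness - 3))

baseline = [3, 3, 3]   # reference respondent: mid-scale on every item
instance = [5, 1, 5]   # very high stress, very poor health, severe loneliness
phi = shapley_values(toy_model, instance, baseline)
base_value = toy_model(baseline)
prediction = toy_model(instance)
```

Adding the three contributions to the base value recovers the prediction for this instance, which is the property that lets a force plot display a prediction as a stack of positive and negative pushes. The interaction term is shared equally between stress and loneliness.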
Collectively, these findings illustrate that adolescents’ subjective happiness is governed by an intricate web of psychosocial and physical resources, mental health conditions, and academic and economic contexts. Notably, stress, anxiety, loneliness, and suicidal ideation substantially decreased subjective happiness levels, emphasizing the urgency of mental health interventions at school and community levels. Similarly, stronger perceived health, better economic status, and sufficient fatigue recovery through sleep were highlighted as factors that can bolster adolescents’ well-being. Random forest can capture these simultaneous influences with relatively high stability, reinforcing the idea that an integrated approach, combining mental health screening, promotion of physical health, and attention to socioeconomic disparities, can systematically address the complexity of adolescents’ subjective happiness. The findings of the present model provide quantifiable support for the proposition that schools, health professionals, and policymakers should collaborate across multiple dimensions to design programs targeting stress, sleep quality, and mental health, given their influential roles in shaping how young people evaluate their lives.
Discussion
This study employed random forest and SHAP analysis to investigate the key predictors of subjective happiness among South Korean adolescents, using data from the 19th Korea Youth Risk Behavior Survey (2023). This survey produced a robust dataset capturing a wide range of health behaviors and psychosocial factors among adolescents nationwide [5]. Happiness, as conceptualized in this research, is a universal aspiration subject to cultural, social, and individual variations in its definition [1]. Subjective happiness, in particular, concerns one’s evaluation of one’s own life, encompassing emotional (positive and negative affect) and cognitive (life satisfaction) components. For this reason, understanding the drivers of subjective happiness requires considering a diverse set of predictive factors [1,22].
Compared with prior studies that mostly used linear regression models [9,14], the machine learning approach in this study facilitated a more detailed examination of complex, nonlinear interactions among variables. In keeping with the comprehensive scope of the survey, our findings identified eight primary predictors of adolescents’ subjective happiness: perceived stress, perceived health, experiences of loneliness, generalized anxiety disorder, suicidal ideation, economic status, fatigue recovery through sleep, and academic performance. This aligns with the notion that adolescents’ subjective happiness is shaped by the intricate interplay of mental, physical, and social influences.
Importantly, the application of random forest and SHAP methods provides new insights into adolescent subjective happiness. Traditional regression models often struggle to capture nonlinearities and high-dimensional interactions among numerous variables [11,12]. In contrast, random forest can robustly handle such complexities. SHAP analysis provides an interpretable framework to isolate the individual contributions of each predictor. By unveiling how each factor, such as stress, sleep, or economic status, exerts its impact in different contexts, SHAP analysis of the random forest model offers a richer understanding of where targeted interventions may have the greatest effect. Consequently, alongside identifying key predictors of subjective happiness, this study deepens the contextual understanding of how these predictors interact to shape this outcome.
In particular, adolescents experiencing low levels of subjective happiness appear to be burdened by overlapping negative factors, such as high perceived stress, experiences of loneliness, poor perceived health, suicidal ideation, and high levels of generalized anxiety. Stress and loneliness, often intensified by academic pressure, conflicts with peers, or limited emotional support, have been consistently linked to lower subjective happiness [23]. Suicidal ideation reflects severe psychological distress, and can stem from depression, hopelessness, and low self-esteem [24]. Anxiety is known to compound mental and physical strain, and this effect potentially involves physiological mechanisms such as elevated cortisol levels [25]. Even among those with low subjective happiness, better sleep providing greater fatigue recovery had a modest positive effect on subjective happiness, although a persistent lack of adequate rest exacerbates mental health concerns and impedes recovery [26,27]. Without timely and comprehensive support, these compounding factors may seriously threaten the psychosocial development of adolescents.
In contrast, the SHAP force plot revealed that adolescents with high subjective happiness benefited from various protective elements, including sufficient fatigue recovery through sleep, good perceived health, low experiences of loneliness, the absence of suicidal ideation, and low levels of generalized anxiety. These factors promote emotional stability and physical well-being. Additionally, higher economic status helps mitigate financial stress and enhance subjective happiness [27-29]. Academic performance may create a sense of competence, thereby offsetting the stress associated with unsatisfactory grades [9]. Given the importance of economic status in shaping adolescents’ subjective happiness observed in this study, it may be valuable to consider policy-level interventions that support economically disadvantaged youths. Programs such as tuition subsidies, access to afterschool programs, and community welfare services could help alleviate financial stress and create more equitable environments that promote adolescent well-being.
Although these findings identify individual and environmental protective factors, contextual influences, such as family dynamics and cultural norms, can still shape the interplay between these variables.
Overall, our results demonstrate that adolescents’ subjective happiness in South Korea emerges from a constellation of factors, including mental health conditions (stress, loneliness, anxiety, and suicidal ideation), physical well-being (fatigue recovery through sleep and perceived health), and socioeconomic circumstances (economic status). Consequently, a dynamic and integrated approach spanning mental health screening, stress management programs, social support enhancements, and physical health initiatives offers a more holistic means to promote adolescents’ subjective happiness. Schools, community nurses, and policymakers should work together to improve sleep hygiene education, provide targeted counseling on stress and loneliness, and address socioeconomic disparities. Through efforts to simultaneously reduce negative stressors and strengthen protective factors, adolescents’ subjective happiness can be nurtured more effectively.
Nevertheless, because the data analyzed in this study are cross-sectional and observational, we cannot definitively establish causal relationships among the identified predictors. Future longitudinal studies or experimental designs are necessary to clarify the causal pathways and strengthen the evidence base for targeted interventions based on these factors.
Finally, although the analyzed dataset is both large and nationally representative, caution should be exercised when generalizing these findings to other cultural contexts [22]. Future research could expand to different populations and develop machine learning models tailored to specific subgroups (e.g., by sex or school level). This may improve predictive precision and reveal further details about the complex factors underlying adolescents’ subjective happiness.
By illustrating the interplay between psychosocial and physical factors, this study reaffirms that adolescents’ subjective happiness cannot be reduced to a single dimension such as academic performance. The capacity of random forest and SHAP analysis to capture these multidimensional nonlinear relationships highlights the importance of data-driven strategies that address the broad determinants of adolescents’ lives, ultimately supporting more effective interventions to promote subjective happiness.
Conclusion
This study employed a random forest model with SHAP analysis to explore multiple factors influencing adolescents’ subjective happiness using data from the 19th Korea Youth Risk Behavior Survey (2023). Perceived stress, perceived health, experience of loneliness, generalized anxiety disorder, suicidal ideation, economic status, fatigue recovery through sleep, and academic performance were identified as meaningful contributors to adolescents’ subjective happiness.
A notable strength of this study is its focus on a broad range of psychosocial and health-related variables within a large, nationally representative sample. Using machine learning, this study captured complex interactions among variables, and the SHAP analysis offered more specific insights into how each factor affects subjective happiness.
These findings highlight the interplay between the mental, physical, and socioeconomic factors that shape adolescents’ subjective happiness. Future research should build on this work by examining causal pathways through longitudinal designs or by extending the analysis to different cultural contexts. Such efforts may help to refine our understanding of adolescents’ subjective happiness and inform more precise strategies to address this population’s complex needs.

Authors' contribution

Conceptualization: EJK, SKK, SHJ, YSR. Data collection: SKK, SHJ. Formal analysis: EJK, SKK. Interpretation of data: SKK, YSR. Writing–original draft: EJK, SKK, SHJ, YSR. Writing–review and editing: SKK, SHJ, YSR. Final approval of published version: EJK, SKK, SHJ, YSR.

Conflict of interest

No existing or potential conflict of interest relevant to this article was reported.

Funding

This study was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (No. 2021R1A2C1095530).

Data availability

Please contact the corresponding author for data availability.

Acknowledgements

None.

Supplement 1. Overview of study methodology.
Supplement 2. Overview of key variables.
Supplement 3. Summary of omitted variables.
Supplement 4. Description of key hyperparameters.
Supplement 5. Frequency analysis and chi-square test for differences in subjective stress causes by adolescents’ subjective happiness levels.
Figure 1.
Shapley Additive Explanations (SHAP) values distribution (elbow plot). The elbow plot illustrates the mean absolute SHAP values from the Random Forest model, highlighting the contribution of each feature to the model predictions. The x-axis represents the feature index (ranked by importance) and the y-axis represents the mean SHAP value. A sharp decline in the SHAP values beyond the 8th feature indicates that the most influential predictors of the dependent variable were concentrated within the top eight features, whereas the remaining variables contributed minimally.
chnr-2024-049f1.jpg
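The ranking behind an elbow plot of this kind can be sketched in a few lines. The example below is an illustration only, not the authors' code: the SHAP values and the feature names `f_a` to `f_d` are made up, and the "largest single drop" rule in `elbow_index` is a simple heuristic assumed here for demonstration (an elbow is often read off the plot visually).

```python
from typing import Dict, List, Tuple


def mean_abs_shap(shap_rows: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    features = list(shap_rows[0].keys())
    n = len(shap_rows)
    scores = {f: sum(abs(row[f]) for row in shap_rows) / n for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def elbow_index(ranked: List[Tuple[str, float]]) -> int:
    """Number of features kept before the largest single drop in importance.

    Assumed heuristic for illustration; not taken from the article.
    """
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    return drops.index(max(drops)) + 1


# Hypothetical SHAP values for three respondents and four made-up features.
toy_shap = [
    {"f_a": 0.40, "f_b": -0.30, "f_c": 0.02, "f_d": -0.01},
    {"f_a": -0.50, "f_b": 0.20, "f_c": -0.03, "f_d": 0.01},
    {"f_a": 0.30, "f_b": -0.40, "f_c": 0.01, "f_d": 0.02},
]

ranked = mean_abs_shap(toy_shap)   # features ordered by mean |SHAP|
top_k = elbow_index(ranked)        # how many features sit before the elbow
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise average out to near zero despite being influential.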
Figure 2.
Shapley Additive Explanations (SHAP) values. The SHAP bar plot illustrates the ranking of key predictors of the dependent variable. The features appearing at the top were the most influential in predicting subjective happiness. The SHAP summary plot provides insights into the influence of independent variables on the dependent variable. A positive SHAP value indicates a positive contribution to subjective happiness, whereas a negative SHAP value suggests a negative impact. The feature values are color-coded; low feature values are represented in blue, whereas high feature values are shown in red. For example, lower levels of perceived stress (M_STR) were associated with higher subjective happiness, as indicated by positive SHAP values. E_SES, economic status; E_S_RCRD, academic performance; M_GAD, generalized anxiety disorder scale; M_LON, experience of loneliness; M_SLP_EN, degree of fatigue recovery through sleep; M_STR, perceived stress in daily life; M_SUI_CON, suicidal thoughts; PR_HT, perceived health status.
chnr-2024-049f2.jpg
Figure 3.
Shapley Additive Explanations (SHAP) force plot. The waterfall plot visualizes the impact of individual features on the model predictions using the SHAP values. Each row represents a single prediction instance with features contributing positively (red) or negatively (blue) to the predicted outcome. The x-axis represents the model output (f(x)) starting from the base value, and the colored bars show how each feature shifts the prediction. Red bars indicate features that increased the predicted value, contributing positively to subjective happiness. Blue bars indicate features that decreased the predicted value, contributing negatively. Features with larger absolute SHAP values (longer bars) had a greater influence on the model prediction. E_SES, economic status; E_S_RCRD, academic performance; M_GAD, generalized anxiety disorder scale; M_LON, experience of loneliness; M_SLP_EN, degree of fatigue recovery through sleep; M_STR, perceived stress in daily life; M_SUI_CON, suicidal thoughts; PR_HT, perceived health status.
chnr-2024-049f3.jpg
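The way the bars in such a plot move the output from the base value to f(x) follows from the additivity of SHAP values: the prediction equals the base value plus the sum of the per-feature contributions. A minimal sketch of that arithmetic is given below. The base value and the three contributions are invented numbers; only the feature codes are borrowed from the caption.

```python
from typing import Dict, Tuple


def reconstruct_prediction(base_value: float, contributions: Dict[str, float]) -> float:
    """SHAP additivity: f(x) = base value + sum of per-feature SHAP values."""
    return base_value + sum(contributions.values())


def split_by_direction(
    contributions: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate features that raise the prediction (red) from those that lower it (blue)."""
    pushes_up = {k: v for k, v in contributions.items() if v > 0}
    pushes_down = {k: v for k, v in contributions.items() if v < 0}
    return pushes_up, pushes_down


# Hypothetical numbers for one respondent; not taken from the study.
base_value = 3.0
contributions = {"M_STR": 0.25, "M_LON": -0.5, "PR_HT": 0.125}

prediction = reconstruct_prediction(base_value, contributions)
red, blue = split_by_direction(contributions)
```

In this toy case the single negative contribution outweighs the two positive ones, so the prediction ends slightly below the base value.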
Table 1.
Configuration of the random forest model (N=44,320)

| Random forest model configuration and performance | Results |
|---|---|
| bootstrap | True |
| max_depth | 20 |
| min_samples_leaf | 8 |
| min_samples_split | 2 |
| n_estimators | 500 |
| max_features | None |
| random_state | 42 |
| Cross-validation | 5 |
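For readers who wish to reproduce a comparable setup, the configuration in Table 1 could be expressed as follows. This is a sketch under stated assumptions: the parameter names match scikit-learn's `RandomForestRegressor` (where the leaf-size parameter is spelled `min_samples_leaf`), and a regressor is assumed because the reported metrics are MSE and R². Neither the library nor the scoring function is confirmed by the table itself, and `X` and `y` stand for a hypothetical feature matrix and outcome vector.

```python
# Hyperparameters as listed in Table 1; mapping them to scikit-learn is an assumption.
RF_PARAMS = {
    "bootstrap": True,
    "max_depth": 20,
    "min_samples_leaf": 8,
    "min_samples_split": 2,
    "n_estimators": 500,
    "max_features": None,   # consider every feature at each split
    "random_state": 42,
}
CV_FOLDS = 5  # "Cross-validation 5" in Table 1


def build_model():
    """Create an unfitted random forest with the Table 1 settings."""
    from sklearn.ensemble import RandomForestRegressor  # imported lazily

    return RandomForestRegressor(**RF_PARAMS)


def cross_validated_mse(X, y) -> float:
    """Mean squared error averaged over the cross-validation folds."""
    from sklearn.model_selection import cross_val_score

    scores = cross_val_score(
        build_model(), X, y, cv=CV_FOLDS, scoring="neg_mean_squared_error"
    )
    return float(-scores.mean())  # scikit-learn reports negated MSE
```

With `min_samples_leaf` set to 8, a node needs at least 16 samples before it can be split into two valid leaves, so the `min_samples_split` value of 2 is not the binding constraint in this configuration.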
Table 2.
Performance of the random forest model

| | MSE (95% CI) | MAE (95% CI) | RMSE (95% CI) | R² (95% CI) | Adjusted R² |
|---|---|---|---|---|---|
| Train data | 0.58 (0.57–0.59) | 0.61 (0.60–0.62) | 0.76 (0.75–0.77) | .38 (.37–.39) | - |
| Test data | 0.57 | 0.61 | 0.76 | .37 | .37 |

CI, confidence interval; MAE, mean absolute error; MSE, mean squared error; RMSE, root mean squared error.
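The metrics in Table 2 are related by standard definitions, which the following sketch spells out on toy data. The function name `regression_metrics`, the toy values, and the choice of one predictor are assumptions for illustration; they are not the study's data or code.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], n_features: int
) -> Dict[str, float]:
    """Compute MSE, MAE, RMSE, R-squared, and adjusted R-squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes R-squared for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": r2,
        "adj_r2": adj_r2,
    }


# Toy example with made-up values and a single predictor.
metrics = regression_metrics([1, 2, 3, 4, 5], [1, 2, 3, 4, 4], n_features=1)
```

Because the adjustment factor (n - 1) / (n - p - 1) approaches 1 when the sample is large relative to the number of predictors p, adjusted R² and R² can agree to two decimal places, as they do in the test-data row.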
