Park, Lee, and Cho: Analysis of the supportive care needs of the parents of preterm children in South Korea using big data text-mining: Topic modeling



The purpose of this study was to identify the supportive care needs of parents of preterm children in South Korea using text data from a portal site.


In total, 628 online newspaper articles and 1,966 social network service posts published between January 1 and December 31, 2019 were analyzed. The procedures in this study were conducted in the following order: keyword selection, data collection, morpheme analysis, keyword analysis, and topic modeling.


The term “yirundung-yi”, which is a native Korean word referring to premature infants, was confirmed to be a useful term for parents. The following four topics were identified as the supportive care needs of parents of preterm children: 1) a vague fear of caring for a baby upon imminent neonatal intensive care unit discharge, 2) real-world difficulties encountered while caring for preterm children, 3) concerns about growth and development problems, and 4) anxiety about possible complications.


Supportive care interventions for parents of preterm children should include general parenting methods for babies. A team composed of multidisciplinary experts must support the individual growth and development of preterm children and manage the complications of prematurity using highly accessible media.


Worldwide, the proportion of premature births (under 37 weeks of gestation) is approximately 10% [1]. In South Korea, the proportion of premature infants has more than doubled, from 3.8% in 2000 to 8.1% in 2019 [2]. Meanwhile, the survival rate of premature infants is steadily increasing with advances in medical technology and the implementation of active policies related to the neonatal intensive care unit (NICU) [3]. According to University of Utah Health, the survival rate is approximately 50% for infants born at 24 weeks, 80% to 90% for infants born at 28 weeks, and 95% for those born at 32 weeks [4]. In South Korea in 2019, the survival rate was 87.9% for infants born with a body weight under 1,500 g and 72.8% for those under 1,000 g [2].
As the survival rate of premature infants improves, the interest in the field has shifted from medical care to parenting at home after NICU discharge and the proper growth and development of premature children [3,5]. Even if premature infants are safely discharged, they are at a high risk of complications, such as health problems or developmental disabilities [6,7].
Parents of infants born prematurely are exposed to various crisis situations while coping with the possible health problems and developmental disabilities of their children [8]. They also experience more and longer post-traumatic and parenting stress than the parents of full-term infants [9,10] and encounter anxiety, depression, and conflict [8,11]. They may be unprepared for their parental roles or may exhibit inappropriate parenting behaviors, such as overprotection [11,12].
Parental responses are the major factor influencing the neurodevelopment, cognitive development, and behavioral development of children born as premature infants [13]. Parents experiencing high parenting stress have impaired mental health, inappropriate relationships with their children, and poor parenting quality [5,14]. Therefore, the need for interventions that can support the parents of preterm children has emerged [5,8,12].
The term “big data” refers to the vast amount of data generated online, which would be difficult to analyze using existing data processing technologies [15]. Currently, online social network services (SNSs) are used for communication by a large number of people, and the amount of data they accumulate makes it possible to grasp various viewpoints comprehensively [16]. In particular, South Korea has the highest internet penetration rate in the world, and more than 50% of newspaper users use online platforms [15,16]. Big data analysis using text data from online news and SNSs has been attracting attention as a useful analytical method for exploring trends in public perception and in society over time [15,16]. Topic modeling is an analytical method that can find hidden topics in a large amount of text data [15]. Moreover, research using surveys and interviews is limited in terms of identifying population-scale phenomena owing to the selection of participants. By contrast, the analysis of big data, including online newspapers and SNSs, can reflect the experiences of the population [15].
Interest in the growth, development, and parenting of preterm children has been increasing. Related research is being conducted worldwide; some examples include a meta-analysis of the parenting behavior of parents of preterm children [12], qualitative studies on the care experience of preterm children [8,11], a systematic literature review of the relationship between parenting style and preterm infant development [13], and research on the stress of parenting preterm children [5,9,17]. These studies have reported that support is required for the parents of preterm children, but have been limited to investigations of the relationships between parenting aspects and child development. In other words, research aiming to understand the types of support required by the parents of preterm children is insufficient.
Research on the experiences of parents of preterm children after their discharge from the NICU is insufficient owing to the limited accessibility to such parents. However, analyzing the experiences of parents of preterm children using online big data is expected to produce results that reflect the actual phenomena because there are no restrictions on participation. Online news can be regarded as a public concern because it contains information that attracts attention related to real life through interactions with people living at the same time [18]. Therefore, this study was conducted to identify the types of support requests made by the parents of preterm children by using text data from online newspaper articles and SNS posts on a portal site. The specific purpose of the present study was 1) to collect big data using text mining to analyze keywords and 2) to derive the supportive care needs of parents of preterm children by using topic modeling.


1. Study Design

In this study, text mining was performed using online big data to identify the supportive care needs of parents of preterm children.

2. Sample

The portal site used was Naver, which has been reported to be the most widely used search engine in South Korea, with a market share of more than 60% [19]. The researchers met face-to-face six times to reach agreement on the selection of texts to be used in the analysis of the supportive care needs of parents of preterm children using big data text mining. In total, 628 online newspaper articles and 1,966 SNS posts by the parents of preterm children in a web community on a portal site, posted between January 1 and December 31, 2019, were analyzed.

3. Study Process and Data Analysis

This study was conducted in the following order: keyword selection, data collection, morpheme analysis, keyword analysis, and topic modeling (Figure 1).

1) Keyword selection

The keywords selected were “premature infant” or “yirundung-yi”. Officially, the term “premature infant” is used, but the National Institute of the Korean Language created a new term, “yirundung-yi”, in 2006. “Yirundung-yi” is a native Korean word that denotes that a baby was born earlier than full term [20], and it is commonly used in online newspaper articles and SNSs.

2) Data collection

Data were collected through Textom (Textom, the IMC, South Korea), a big data analysis software program that can collect, clean, and process matrix data in a web environment. It was chosen for this study because it can collect and analyze data from various channels, including portal sites [15,16]. The settings for the data collection were as follows: 1) text in Korean; 2) keywords of “yirundung-yi” and “premature infant”; 3) full text; 4) a post start date of January 1, 2019; 5) a post end date of December 31, 2019; 6) online newspaper articles and the web community on Naver as the collection channels; and 7) removal of duplicate uniform resource locators (URLs).
The title, URL, and text of the collected data were saved as an Excel file with a size of 1.2 MB. The titles and texts of the collected data were checked by the researchers, and unrelated data and duplicate content were deleted. In total, 1,247 online newspaper articles were collected, from which 628 texts were analyzed after excluding 583 duplicate and 36 unrelated articles. In total, 1,976 SNS posts were collected; there were 10 unrelated posts and no duplicate data, resulting in 1,966 texts for analysis. The final text size was 627 KB.
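The screening counts reported above can be checked with a short sketch. The function name `remaining` is an illustrative assumption and is not part of Textom; only the numbers come from the text.

```python
def remaining(collected: int, duplicates: int, unrelated: int) -> int:
    """Texts left for analysis after removing duplicate and unrelated items."""
    return collected - duplicates - unrelated


# Counts as reported in the data collection section.
news = remaining(1247, duplicates=583, unrelated=36)
sns = remaining(1976, duplicates=0, unrelated=10)
total = news + sns

print(news, sns, total)  # 628 1966 2594
```

The total of 2,594 texts matches the number of texts cited in the discussion.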

3) Morpheme analysis

Morphological analysis of nouns and adjectives was performed using the MeCab-ko analyzer. Unlike Espresso K, MeCab-ko has the advantage of classifying vocabulary by referring to a dictionary regardless of spacing [21]. However, because compound nouns were extracted as simple nouns and synonyms and similar words could not be distinguished, the meaning of the text was not clear. Thus, through an N-gram analysis depicting the degree of concentration of the keywords, compound nouns were reconstructed from the refined data. Subsequently, a user dictionary was used to refine words with the same or similar meaning and words that did not have a meaning in the refined data. The user dictionary refers to a function in which the words to be changed are constructed as a dictionary when it is necessary to refine data for similar or identical subjects repeatedly, and these words are collectively changed into the corrected words designated in the morpheme analysis process [21]. Therefore, a synonym dictionary was constructed for words with agreed-upon meanings, and words with similar meanings were constructed as a thesaurus dictionary using Textom’s user word dictionary. Words such as “because of”, “etc.”, “related”, and “and” do not have relevant meanings in this context; therefore, they were treated as stop words. The final refined data were saved as a UTF-8-encoded .txt file and then uploaded to Textom.
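The refinement steps described above (rebuilding compound nouns, unifying synonyms, and removing stop words) can be sketched as follows. This is a minimal illustration on English toy tokens, not the MeCab-ko or Textom implementation; the dictionaries `COMPOUNDS` and `SYNONYMS` and the function names are hypothetical, and only the stop words are taken from the text.

```python
# Hypothetical toy dictionaries; the study's actual user dictionary is not reproduced here.
COMPOUNDS = {
    ("premature", "infant", "formula"): "premature infant formula",
    ("corrected", "age"): "corrected age",
}
SYNONYMS = {"moms": "mom", "babies": "baby"}
STOP_WORDS = {"because of", "etc.", "related", "and"}  # examples given in the text


def merge_compounds(tokens, compounds=COMPOUNDS):
    """Greedily join known n-grams (longest first) into single compound nouns."""
    sizes = sorted({len(key) for key in compounds}, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for n in sizes:
            gram = tuple(tokens[i:i + n])
            if gram in compounds:
                merged.append(compounds[gram])
                i += n
                break
        else:  # no compound starts at position i
            merged.append(tokens[i])
            i += 1
    return merged


def refine(tokens):
    """Rebuild compounds, map synonyms to one form, then drop stop words."""
    merged = merge_compounds(tokens)
    unified = [SYNONYMS.get(tok, tok) for tok in merged]
    return [tok for tok in unified if tok not in STOP_WORDS]


tokens = ["premature", "infant", "formula", "and", "moms", "corrected", "age"]
print(refine(tokens))  # ['premature infant formula', 'mom', 'corrected age']
```

Merging compounds before the synonym and stop-word steps keeps multiword terms such as "corrected age" intact as single keywords.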

4) Keyword analysis

In this study, the term frequency-inverse document frequency (TF-IDF) value was applied to extract important keywords, as it represents the frequency and weight of specific words in the document [22]. From the collected data, the top 50 keywords were extracted by applying the TF-IDF weight to each data set. In addition, the search keywords used for data collection, “yirundung-yi” and “premature infant”, were included in the keyword analysis. A word cloud is the most representative text mining technique for visualizing keywords [23]. In this study, the size was expressed as large, medium, and small according to the frequency of a word’s appearance in the data. The colors were set as blue, green, and gray, in descending order of importance of appearance in the data.
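As an illustration of TF-IDF keyword ranking, the sketch below scores each term by summing, over all documents, its term frequency multiplied by log(N/df), where N is the number of documents and df is the number of documents containing the term. This is one common unsmoothed variant and is an assumption; the exact weighting used by Textom is not stated in this paper. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: sum over documents of tf(term, doc) * log(N / df(term))}."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            scores[term] += tf * math.log(n_docs / df[term])
    return scores


def top_keywords(docs, k=50):
    """Return the k highest-scoring (term, score) pairs."""
    return tfidf_scores(docs).most_common(k)


# Toy tokenized documents (assumed data, not from the study corpus).
docs = [
    ["discharge", "incubator", "formula"],
    ["formula", "diaper", "formula"],
    ["formula", "worry", "weight", "worry"],
]
print(top_keywords(docs, k=1)[0][0])  # worry
```

Under this unsmoothed variant, a term that appears in every document (here, "formula") receives a score of zero, which is why many tools apply a smoothed inverse document frequency instead.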

5) Topic analysis

For topic modeling, the latent Dirichlet allocation (LDA) model was used in this study. The LDA method assumes that a document is a collection of topics, each of which is a set of keywords, and it probabilistically infers the topics latent in the document [24]. In the LDA method, the researcher determines the number of topics that yields well-separated categories [21]. In this study, the number of topics was set to four, which demonstrated the clearest distance and boundary between the topics. The researchers reviewed the keywords and texts that composed the four derived topics and named the topics in a way that clearly expressed their meaning.
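To make the idea of inferring latent topics concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling. It is a generic textbook-style sampler, not the Textom implementation, which is not described in this paper; the hyperparameters `alpha` and `beta`, the iteration count, the function names, and the toy documents are all assumptions. The study set the number of topics to four; the toy example uses two.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic_word, doc_topic) counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assign = []
    for d, doc in enumerate(docs):                     # random initial assignment
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                       # remove the current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                       # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """List the n most frequent words assigned to each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]


# Toy tokenized documents (assumed data, not from the study corpus).
docs = [
    ["discharge", "incubator", "formula", "discharge"],
    ["incubator", "discharge", "formula"],
    ["ROP", "surgery", "treatment", "ROP"],
    ["treatment", "surgery", "ROP"],
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word))
```

The sampler only groups words into topics; as in this study, naming each topic remains an interpretive step for the researchers, who review the top keywords and the texts behind them.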


1. Keyword Analysis

The top 50 keywords based on their TF-IDF values are shown in Table 1. Three word clouds for the combined data (online newspaper articles and SNS data), online newspaper article data, and SNS data are presented in Figure 2. The word cloud of online newspaper articles revealed that “premature infant” was the most important term, as its representation in the cloud was larger than that of “yirundung-yi”, indicating that the official medical term was primarily used in online newspaper articles. In the word cloud of SNS data, “yirundung-yi” was found to have higher importance than “premature infant”, and other words related to parenting were found to be the main words.

2. Topic Modeling

The four topics derived using LDA modeling are shown in Figure 2 and discussed in the following sections.

1) Vague fear of caring for a baby from the time when NICU discharge is imminent

The main keywords for the first topic were “premature infant formula”, “discharge”, and “incubator”. This topic indicates that the parents started preparing for their parental role when their infants neared their discharge from the NICU, and that they experienced vague anxiety.
I’m a yirundung-yi mom, but I’m still very unfamiliar. I don’t know anything. What is corrected age? (code 1920)
I am a yirundung-yi mom of a 37-week and 1,750 g baby. The baby is still in the incubator, and it is about to be three weeks of hospitalization. I think he/she will be discharged this month. What are the hospital expenses and supplies for discharge? (code 1165)

2) Real-world difficulties encountered while caring for preterm children

The main keywords that composed the second topic were “Naturemade®”, “infant formula”, and “recommendation”. This topic encompassed the real-world problems that the parents faced directly after their infants were discharged from the NICU. Unlike the first topic, it dealt with the specific and real-world difficulties of child-rearing, such as feeding and providing clothes and diapers.
My baby was discharged from the NICU today, and we couldn’t find a place to buy premature infant formula. Does anyone know where premature infant formula is sold, or someone who wants to sell leftover premature infant formula? (code 319)
My baby was finally discharged yesterday and came home. Whether it is day or night, the baby keeps shouting and moving. The baby constantly has strong movements throughout the whole body, is it okay? (code 366)
It is a reality that special diapers and powdered milk that are needed for yirundung-yi are difficult to obtain at home because they are not only expensive but also in limited supply. The Ministry of Health and Welfare, Samsung Card, and the Green Umbrella Children’s Foundation support the ‘childcare kit.’ (code 2027)

3) Concerns about the growth and developmental problems of preterm children

The third topic was composed of “worry”, “weight”, “test”, “retinopathy of prematurity (ROP) test”, and “corrected age”. This topic indicates that after discharge from the NICU, parents became aware of the problems related to the growth and development of their children and became anxious or concerned.
My baby was born as yirundung-yi, and corrected age is seven months. The counselor doctor in the portal site and the attending physician said that the yirundung-yi’s development will be 2-3 months later than full-term children. My baby is 7 months old, but he still doesn’t try to put weight on his legs. (code 1524)
I am a beginner mom raising a 46-day-old baby. The baby was born at 36 weeks, and the weight and height were similar to those of other children, but he was born a month early, so it seemed that he never opens his eyes and plays. He opened his eyes for a very short time and cried, and after I hugged him, he fell asleep again. It seems that only my baby is late. (code 437)
Everyone around me tells me not to worry too much because the kids will grow up on their own when the time comes. But when my child seems to be developing or growing later than his peers, I inevitably become anxious. When should my baby have an infant and toddler development test? (code 2445)

4) Anxiety about possible complications in preterm children

The main keywords that composed the fourth topic were “retinopathy of prematurity (ROP)”, “bronchopulmonary dysplasia”, “treatment”, “medical expenses support”, and “surgery”. This topic indicated worries about potential future health problems, such as future complications and treatment prognosis.
My baby was hospitalized for 5 days with jaundice and had a level of 19.5 at 4 days old. He was 6 months and 9 days old. Should he be going for a retinopathy of prematurity test? (code 365)
My baby was born as a very premature infant, and his corrected age is 6 months. During the weekend, he had phlegm and coughed quite a bit. We went to a nearby pediatrician in a hurry to get a prescription, but he didn’t have any lung sickness yet. When he wheezes, it is called bronchiolitis. What should I do? (code 720)
Medipost Pneumostem® is a treatment for bronchopulmonary dysplasia, which is mainly composed of mesenchymal stem cells derived from allogeneic cord blood. Bronchopulmonary dysplasia is a chronic lung disease that occurs in premature infants receiving artificial ventilation and oxygen therapy. (code 2451)


This study analyzed a vast amount of text from online newspaper articles and SNS posts to understand the supportive care needs reported by parents raising preterm children. Four topics were identified through the data analysis.
The first topic was parents’ vague fear of caring for their child, starting when NICU discharge was imminent. First-time parents generally experience stress and anxiety about caring for their children [5]. In particular, previous studies have reported that the parents of premature infants are generally less prepared for parenting than the parents of full-term infants [11]. Furthermore, the parents of premature infants experience more stress than the parents of full-term infants [9,25]. Parent education [26], skin-to-skin contact through kangaroo care [27], and massages [28] are among the interventions that have been implemented in NICUs to enhance parent-child attachment, reduce parental stress, and strengthen parental confidence. However, in this study, it was found that parents exhibited an uncertain and vague reaction to caring for their children. Although NICU discharge education is being implemented, in reality, parents seek advice on how to handle their children using SNSs, which indicates that the education provided at the NICU was ineffective. Therefore, it is necessary to develop effective educational methods and content that can improve parents’ confidence to raise their children at home after discharge from the NICU.
The second topic encompassed the real-world difficulties of parenting preterm children. Compared to the first topic, which represented an abstract future difficulty, this topic included specific parenting difficulties. According to a study by Lee and Koh [29], the parents of infants lacked the ability to meet their own parenting expectations. In addition, according to a study by Jeong and Kim [30] that analyzed the parental education of parents of infants, the most frequently encountered type of parental education dealt with child development; however, parents wanted education related to actual parenting behaviors, including interacting with their child, solving health problems, and proper eating habits. Consistent with previous studies, the results of this study revealed that keywords related to practical care for infants, such as “formula”, “diapers”, and “childcare products”, appeared more often than content specific to preterm children. Therefore, the demands of parents who care for their children after discharge from the NICU relate to real-world parenting behaviors. Furthermore, previous studies have reported that similar issues caused stress when parenting premature and full-term infants [5,10,17]. In other words, it is necessary to educate the parents of preterm children about general parenting behaviors and to provide interventions that can improve overall parenting competence. In addition, to deliver necessary information effectively to parents caring for infants after discharge from the NICU, it is necessary to develop educational material that is as accessible as the SNSs that parents currently use to find information.
The third topic involved concerns about the growth and development of preterm children. The results of a qualitative study by Kim [8], which analyzed parenting experience with infants born prematurely, indicated that vague anxiety was the parents’ main reaction and was caused by a lack of certainty about the normal growth and development of their children. This is consistent with the results of this study. Furthermore, Whittingham et al. [11] also reported that the parents of prematurely born infants lacked assurance that their children would develop appropriately. Parents obtain information on growth and development problems that may occur in premature infants through SNSs or online communities [5,8,10]. If parents have a limited ability to interpret information and apply it under the circumstances of caring for a child born as a premature baby, the information related to growth and development provided by health non-professionals may cause excessive anxiety. Therefore, to provide assurance regarding premature children’s growth and development, health professionals need to help parents through individual child growth and development interventions, in addition to the use of self-help groups, such as SNSs, by parents of preterm children.
The final identified topic was parental anxiety about possible future health complications in preterm children. Parents of preterm infants are aware of the potential medical problems of their children and seek information [11]. A study by Kim [8] reported that parents of premature infants were anxious to prevent infection, diseases, or complications from worsening after discharge, which is similar to the results of this study. Therefore, it is necessary to provide multidisciplinary interventions for the parents of preterm children, including a neonatal health care professional, a developmental expert, and an expert who can provide medical interventions.
In the word cloud, it was found that the keywords “premature infant” and “yirundung-yi” were similar in size in the online newspaper articles, whereas “yirundung-yi” was larger in the SNS posts where parents’ writings were reflected. This result means that for parents, the term “yirundung-yi”, which is a pure Korean word, is a more practical term than the technical term “premature infant”. Therefore, when providing intervention for the parents of preterm infants, the use of the term “yirundung-yi” rather than the medical term “premature infant” may be a more useful strategy.
As research on raising premature children discharged from the NICU is just beginning, this study is significant in that an analysis was conducted using big data with the goal of understanding the supportive care needs of parents raising preterm children. The significance of this study lies in its ability to identify parents’ needs meaningfully because it analyzed 2,594 texts using text mining and LDA topic modeling to extract topics embedded in online documents used by these parents. In addition, as the data were collected using various channels, namely online newspaper articles and SNSs, the research results derived in this study are reliable.
However, a methodological limitation is that the subjectivity of the researcher may have been reflected in the process of refining the keywords. Furthermore, slang, abbreviations, and new words were not completely reflected in the refined keywords. In addition, the results of this study should be interpreted considering the fact that they did not reflect the needs of parents who do not use SNSs.


The supportive care needs of parents of preterm children were identified in this study and can be summarized as follows: 1) uncertainty in caring for their infants, 2) real-world difficulties in raising their children, 3) concerns about appropriate growth and development, and 4) anxiety about potential medical problems of their children. Reflecting these needs, the following interventions are needed. It is necessary to develop and apply effective education that increases the confidence of parents in caring for their children at home when NICU discharge is imminent. For interventions after NICU discharge, parenting methods specific to premature infants are important; however, information on parenting methods in general should also be included. In addition, it is necessary for a team composed of multidisciplinary experts, such as doctors, nurses, development experts, and occupational therapists, to provide support for the individual growth and development of preterm children and help manage complications of prematurity. Finally, it is suggested that various media such as SNS, internet, and mobile platforms be used to increase the accessibility of information for parents.


This study was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (No. NRF-2019R1I1A3A01060632).

| Rank | Total (N=9,782): Keyword | TF-IDF | Online newspaper articles (n=5,451): Keyword | TF-IDF | SNS (n=5,304): Keyword | TF-IDF |
|---|---|---|---|---|---|---|
| 1 | Yirundung-yi | 2,132.5 | Yirundung-yi | 824.2 | Yirundung-yi | 1,262.5 |
| 2 | Premature infant | 1,660.2 | Premature infant | 626.0 | Premature infant | 906.8 |
| 3 | Premature infant formula | 1,041.0 | Newborn | 416.8 | Premature infant formula | 849.7 |
| 4 | Infant | 988.9 | Medipost Pneumostem®* | 404.3 | Naturemade® | 810.1 |
| 5 | Naturemade® | 987.0 | Daughter | 360.5 | Infant | 714.8 |
| 6 | Child | 817.2 | Under | 336.5 | Powdered milk | 574.3 |
| 7 | Newborn | 722.6 | NICU | 324.5 | Give | 505.5 |
| 8 | Powdered milk | 706.4 | Child | 322.4 | Baby | 504.6 |
| 9 | Hospital | 646.3 | Born | 306.8 | Discharge | 497.1 |
| 10 | Discharge | 631.0 | Pregnancy | 298.3 | Child | 494.3 |
| 11 | Baby | 611.0 | Incubator | 289.0 | Yirundung-yi mom | 474.1 |
| 12 | Give | 609.3 | ROP | 277.2 | Hospital | 402.5 |
| 13 | Huggies® | 601.3 | Support | 275.9 | Weight | 361.6 |
| 14 | Incubator | 584.2 | BPD | 257.4 | Huggies® | 348.7 |
| 15 | Daughter | 574.6 | Particulate matter | 254.5 | Need | 323.8 |
| 16 | Yirundung-yi mom | 564.1 | Huggies® | 253.5 | Delivery | 322.8 |
| 17 | Childbirth | 533.5 | Congenital anomaly | 252.8 | Breast milk | 314.3 |
| 18 | Medipost Pneumostem®* | 524.7 | Hospital | 243.9 | Degree | 310.1 |
| 19 | Under | 515.7 | Danger | 241.9 | Babe | 305.8 |
| 20 | Weight | 505.3 | Voucher | 232.9 | Sale | 298.7 |
| 21 | NICU | 500.9 | Infant | 224.4 | Postnatal care center | 286.4 |
| 22 | Birth | 488.4 | Treatment | 221.5 | Mother | 286.1 |
| 23 | Need | 478.0 | Pregnant woman | 219.2 | Height | 280.8 |
| 24 | Pregnancy | 466.7 | Person | 216.3 | Preemie | 280.0 |
| 25 | Mother | 464.0 | Delivery | 210.8 | Incubator | 274.4 |
| 26 | Support | 454.7 | Body weight | 205.3 | Recommendation | 270.6 |
| 27 | Breast milk | 397.6 | Diaper | 202.2 | Nicu | 266.9 |
| 28 | Degree | 377.8 | Yuhan-Kimberly | 202.2 | ROP | 261.5 |
| 29 | Body weight | 364.4 | Selection | 195.0 | Worry | 258.0 |
| 30 | ROP | 364.4 | Health | 193.0 | Birth | 253.5 |
| 31 | Congenital anomaly | 361.8 | Region | 188.4 | Completion | 252.7 |
| 32 | Sale | 358.1 | Medical expenses support | 183.4 | Test | 250.3 |
| 33 | Delivery | 356.0 | Campaign | 181.8 | Mom | 248.9 |
| 34 | Health | 354.9 | Extreme prematurity | 181.8 | Corrected | 235.1 |
| 35 | Babe | 352.0 | Mother | 177.9 | Twin | 231.3 |
| 36 | Twin | 350.5 | Parent | 175.2 | Thinking | 228.4 |
| 37 | Particulate matter | 339.3 | Hospital bills | 173.2 | Infant formula | 215.5 |
| 38 | Height | 338.6 | Hold | 170.6 | University hospital | 212.4 |
| 39 | Treatment | 337.2 | Surgery | 168.4 | Newborn | 210.4 |
| 40 | BPD | 334.1 | Ministry of Health and Welfare | 168.1 | Small | 205.2 |
| 41 | Danger | 330.3 | Outbreak | 167.4 | Request | 203.4 |
| 42 | Postnatal care center | 329.0 | Doctor | 166.6 | Home | 202.4 |
| 43 | Premature retina | 323.8 | World | 159.0 | Hospitalization | 201.3 |
| 44 | Surgery | 323.0 | Puerperal mother | 158.7 | ROP test | 196.2 |
| 45 | Preemie | 318.6 | Child Fund Korea | 156.3 | Collect on delivery | 192.5 |
| 46 | Person | 315.9 | Case | 155.3 | Use | 190.9 |
| 47 | Worry | 311.6 | This year | 153.0 | Agonize | 188.3 |
| 48 | Recommendation | 308.2 | Need | 151.9 | Post | 180.9 |
| 49 | Voucher | 306.9 | The latest | 149.8 | Premature birth | 179.2 |
| 50 | Nicu | 306.9 | Last year | 147.2 | Corrected age | 171.4 |

* A medication for preventing and treating bronchopulmonary dysplasia in clinical trials in South Korea;
A diaper brand name in South Korea;
BPD, bronchopulmonary dysplasia; NICU, neonatal intensive care unit; ROP, retinopathy of prematurity; SNS, social networking service; TF-IDF, term frequency-inverse document frequency.

