Cluster analysis for identifying obesity subgroups in health and nutritional status survey data

Khalil, Usman and Ahmed Malik, Owais and Teck, Daphne Ching Lai and Ong, Sok King (2021) Cluster analysis for identifying obesity subgroups in health and nutritional status survey data. Asia-Pacific Journal of Information Technology and Multimedia, 10 (2). pp. 146-169. ISSN 2289-2192


This study presents the discovery of meaningful patterns (groups) in the obese samples of health and nutritional status survey data by applying various clustering techniques. Because the data set mixes qualitative and quantitative variables, the best-suited clustering techniques with appropriate dissimilarity metrics were chosen so that the results could be interpreted meaningfully. The relationships between obesity and lifestyle factors such as demography, socio-economic status, physical activity, and dietary behavior were assessed using four clustering techniques, namely Two-Step clustering, Partition Around Medoids (PAM), Agglomerative Hierarchical clustering, and Kohonen Self-Organizing Maps (SOMs). The solutions generated by these techniques were analyzed and validated with cluster validity (CV) indices, and their associations with the obesity classes were then determined to discover patterns in the obese sample. Two-Step clustering and hierarchical clustering outperformed the other techniques in identifying subgroups based on the underlying hidden patterns in the data. Based on the CV index values and the association analysis (obesity factor against the cluster solutions), two subgroups were identified and their profiles are reported. The first group consisted of middle-aged individuals who appear to take care of their lifestyle, while the second consisted of young individuals who, in contrast to the first group, showed careless lifestyle behavior (i.e., in physical activity and dietary behavior). The salient features of these subgroups are reported and can be proposed for improvements in the health care industry. The research helped identify interesting subsets/groups within the survey data that share similar characteristics and health status (i.e., prevalence of obesity with respect to lifestyle factors such as physical activity and dietary behavior), which will help in suggesting appropriate measures for the concerned departments to counter and prevent obesity in the population.
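The abstract does not name the dissimilarity metric or the CV indices used, so the following is only an illustrative sketch under stated assumptions, not the authors' pipeline. It assumes Gower's coefficient (a common choice for mixed qualitative/quantitative data) as the dissimilarity, clusters the Gower matrix with a basic PAM (greedy BUILD, then SWAP until no swap lowers the total cost), scores the partition with the average silhouette width (one common CV index, assumed here), and measures the association between clusters and obesity classes with a Pearson chi-square statistic (also an assumption). The six records, their fields, and the class labels are hypothetical toy data, not NHANSS data. Two-Step clustering, hierarchical clustering, and SOMs are not covered.

```python
from collections import Counter
from itertools import combinations


def gower_matrix(rows, numeric_idx, categorical_idx):
    """Pairwise Gower dissimilarity for records mixing numeric and categorical fields."""
    n = len(rows)
    ranges = {}
    for j in numeric_idx:
        col = [r[j] for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0  # guard against constant columns
    p = len(numeric_idx) + len(categorical_idx)
    d = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        # Numeric fields: range-scaled absolute difference; categorical: 0 if equal, else 1.
        s = sum(abs(rows[a][j] - rows[b][j]) / ranges[j] for j in numeric_idx)
        s += sum(rows[a][j] != rows[b][j] for j in categorical_idx)
        d[a][b] = d[b][a] = s / p
    return d


def total_cost(d, medoids):
    """Sum of each point's dissimilarity to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d, k):
    """Partition Around Medoids: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(d)
    medoids = []
    while len(medoids) < k:  # BUILD: add the point that most reduces total cost
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(d, medoids + [i]))
        medoids.append(best)
    improved = True
    while improved:  # SWAP: try replacing each medoid with each non-medoid
        improved = False
        current = total_cost(d, medoids)
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                cost = total_cost(d, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def mean_silhouette(d, labels):
    """Average silhouette width (assumes at least two clusters)."""
    n = len(d)
    scores = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = sum(own) / len(own)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def chi_square(labels, classes):
    """Pearson chi-square statistic for the cluster x obesity-class contingency table."""
    n = len(labels)
    joint = Counter(zip(labels, classes))
    row, col = Counter(labels), Counter(classes)
    return sum((joint[(r, c)] - row[r] * col[c] / n) ** 2 / (row[r] * col[c] / n)
               for r in row for c in col)


# Hypothetical toy records: (age, weekly activity minutes, meal pattern). Not survey data.
records = [
    (52, 300, "regular"), (48, 280, "regular"), (55, 320, "regular"),
    (23, 30, "irregular"), (27, 20, "irregular"), (21, 45, "irregular"),
]
obesity_class = ["obese_I", "obese_I", "obese_I", "obese_II", "obese_II", "obese_I"]

D = gower_matrix(records, numeric_idx=[0, 1], categorical_idx=[2])
medoids, labels = pam(D, k=2)
print("medoids:", medoids, "labels:", labels)
print("mean silhouette:", round(mean_silhouette(D, labels), 3))
print("chi-square vs obesity class:", chi_square(labels, obesity_class))
```

On this toy data PAM places the three older, more active records in one cluster and the three younger, less active records in the other, and the silhouette width is high because the two groups are far apart in Gower terms. With only six records the chi-square value merely illustrates the computation and carries no inferential weight; in practice, R's `cluster` package (`daisy` with `metric = "gower"`, then `pam`) offers tested implementations of the first two steps.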

Item Type: Article
Keywords: NHANSS; Machine learning; Two-step; Partition around medoids; Agglomerative; Hierarchical; Kohonen SOMs; Clustering; Obesity
Journal: Asia-Pacific Journal of Information Technology and Multimedia (Formerly Jurnal Teknologi Maklumat dan Multimedia)
ID Code: 17963
Deposited By: ms aida
Deposited On: 12 Jan 2022 00:53
Last Modified: 15 Jan 2022 08:20
