A cluster analysis of population based cancer registry in Brunei Darussalam : an exploratory study

Lai, Daphne Teck Ching and Owais A. Malik, (2022) A cluster analysis of population based cancer registry in Brunei Darussalam : an exploratory study. Asia-Pacific Journal of Information Technology and Multimedia, 11 (1). pp. 54-64. ISSN 2289-2192


Official URL: https://www.ukm.my/apjitm/articles-issues


Machine learning techniques have mostly been applied to gene expression cancer data. Socio-demographic data available in cancer registries could also be explored to gain further insight into the relationships between cancer types and their contributing factors. Moreover, little attention has been paid to analysing the mixed demographic data (numeric and categorical) in cancer registries and their association with cancer types. The aim of this study is to identify subgroups of patients with similar demographic characteristics in the population-based cancer registry of Brunei Darussalam and to examine the prevalent cancer types in these subgroups. Four clustering algorithms are explored in the cluster analysis of the Brunei Darussalam Cancer Registry: Two-step, Partitioning Around Medoids, Agglomerative Hierarchical and Model-based. Gower distance was used to measure similarity between records of mixed data types. To evaluate the clusters found, cluster distribution and the Silhouette index were used for cluster quality, Cohen's Kappa index for cluster stability and Cramér's V coefficient for the clinical relevance of the clusters. Six distinct demographic subgroups were consistently found by three algorithms, while the model-based clustering solution was not considered for deeper analysis because it produced highly imbalanced clusters. The subgroups found are of good cluster quality, with moderate association with cancer types and high stability. The top three prevalent cancers associated with these subgroups were consistently identified by the three algorithms. Comparing the subgroups’ ages at diagnosis, we identify possible screening behaviours of specific subgroups, suggesting a need for early screening awareness programmes. This study demonstrates the use of cluster analysis in a cancer registry to identify demographic subgroups that could point to potential areas for developing targeted and improved healthcare management strategies.
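
As an illustration only, two of the measures named in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the record fields (age, sex, district), the sample values and the function names `gower_distance` and `cramers_v` are hypothetical and do not come from the registry. Gower distance averages a per-feature dissimilarity (range-scaled absolute difference for numeric features, simple mismatch for categorical ones), and Cramér's V rescales the chi-squared statistic of a cluster-by-cancer-type contingency table to the interval [0, 1].

```python
from collections import Counter
from math import sqrt


def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records of equal length.

    numeric_ranges[k] is the observed range (max - min) of feature k when the
    feature is numeric, or None when the feature is categorical.
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:
            # Categorical feature: 0 if the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif rng > 0:
            # Numeric feature: absolute difference scaled by the range.
            total += abs(x - y) / rng
        # A numeric feature with zero range contributes nothing.
    return total / len(numeric_ranges)


def cramers_v(labels_a, labels_b):
    """Cramér's V between two equally long sequences of categorical labels."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical records: (age at diagnosis, sex, district). Not registry data.
records = [(45, "F", "A"), (47, "F", "A"), (70, "M", "B"), (68, "M", "B")]
ages = [r[0] for r in records]
ranges = [max(ages) - min(ages), None, None]

d_close = gower_distance(records[0], records[1], ranges)  # similar patients
d_far = gower_distance(records[0], records[2], ranges)    # dissimilar patients

# Hypothetical cluster labels and cancer types for the same four records.
v = cramers_v([0, 0, 1, 1], ["x", "x", "y", "y"])
```

A matrix of such pairwise distances could then be passed to any clustering routine that accepts precomputed dissimilarities, and the resulting cluster labels compared against cancer type with `cramers_v`.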

Item Type: Article
Keywords: Population-based Cancer Registries; Cluster analysis; Clustering algorithms; Machine learning; Two-step; Partition around medoids; Agglomerative; Hierarchical; Model-based clustering; Gower distance
Journal: Asia-Pacific Journal of Information Technology and Multimedia (Formerly Jurnal Teknologi Maklumat dan Multimedia)
ID Code: 19427
Deposited By: ms aida
Deposited On: 16 Aug 2022 02:02
Last Modified: 18 Aug 2022 08:13
