Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia

Naeimah Mamat and Siti Fatin Mohd Razali (2023) Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia. Jurnal Kejuruteraan, 35 (1). pp. 191-201. ISSN 0128-0198

Official URL: https://www.ukm.my/jkukm/volume-3501-2023/

Abstract

In this study, the ability of several statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. The greatest obstacle was the inconsistency in the percentage of missing data between monitoring stations, which ranged from 20 percent (moderate) to over 50 percent (high). The main objective was to select the best imputation method and to determine whether the best-performing method differs between stations. The paper compares seven imputation methods: Multiple Predictive Mean Matching (PMM), Multiple Random Forest Imputation (RF), Multiple Bayesian Linear Regression Imputation (BLR), Multiple Linear Regression (non-Bayesian) Imputation (LRNB), Multiple Classification and Regression Tree (CART), k-nearest neighbours (kNN) and Bootstrap-based Expectation Maximisation (EMB). Among the seven techniques, kNN produced consistently reliable results. All of its imputed data are rated 'very good' by the Nash-Sutcliffe efficiency (NSE > 0.75). This was confirmed by |PBIAS| < 5.30 (all imputed data rated 'very good') and KGE ≥ 0.87 (all imputations rated 'good'). Imputation performance improves for all three monitoring stations, with an index of agreement WI ≥ 0.94, despite the varying percentages of missing data. According to these findings, the kNN imputation approach outperforms the others and should be prioritised in practice. Future research with the existing methods could benefit from the addition of geographical data.
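
The four evaluation criteria named in the abstract (NSE, PBIAS, KGE and WI) can be illustrated with a minimal Python sketch. The formulas below are the commonly used textbook definitions and are an assumption here, since the abstract does not state which variants the authors applied; in particular, the sign convention for PBIAS and the 2009 form of KGE are assumed. The function names and the small `held_out` and `imputed` lists are hypothetical illustration values, not data from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1.0 means imputed values match observations exactly."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias (assumed convention: positive when imputed values run low)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form: correlation, variability, bias)."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    sd_obs = math.sqrt(_mean([(o - mean_obs) ** 2 for o in obs]))
    sd_sim = math.sqrt(_mean([(s - mean_sim) ** 2 for s in sim]))
    cov = _mean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (sd_obs * sd_sim)      # linear correlation
    alpha = sd_sim / sd_obs          # variability ratio
    beta = mean_sim / mean_obs       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def willmott_index(obs, sim):
    """Willmott index of agreement (WI); 1.0 means perfect agreement."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


# Hypothetical example: values that were deliberately masked, then imputed.
held_out = [1.0, 2.0, 3.0, 4.0, 5.0]
imputed = [1.1, 1.9, 3.2, 3.8, 5.0]

scores = {
    "NSE": nse(held_out, imputed),
    "PBIAS": pbias(held_out, imputed),
    "KGE": kge(held_out, imputed),
    "WI": willmott_index(held_out, imputed),
}
print(scores)
```

A typical way to use such functions is to mask a portion of the known observations, impute them with each candidate method, and compare the scores per station. Whether the study followed exactly this procedure is not stated in the abstract.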

Item Type: Article
Keywords: Imputation methods; Missing data; Multiple imputation; Evaluation criteria; Water quality
Journal: Jurnal Kejuruteraan
ID Code: 21963
Deposited By: Mohd Hamka Md. Nasir
Deposited On: 24 Jul 2023 06:26
Last Modified: 27 Jul 2023 05:58
