Imputing missing values in modelling the PM10 concentrations

Nuradhiathy Abd Razak, Yong Zulina Zubairi and Rossita M. Yunus (2014) Imputing missing values in modelling the PM10 concentrations. Sains Malaysiana, 43 (10). pp. 1599-1607. ISSN 0126-6039


Missing values are a persistent problem in data analysis. Most studies exclude the missing values from the analysis, which may lead to biased parameter estimates. This paper considers several imputation methods: a simulation study is conducted to compare mean substitution, hot deck and expectation maximization (EM) imputation. EM imputation is found to be superior, especially when the percentage of missing values is high, as it consistently gives a lower root mean square error (RMSE) than the other two methods. The EM imputation method is then applied to PM10 concentration data sets with missing values for the southwest and northeast monsoons in Petaling Jaya and Seberang Perai, Malaysia. Four distributions, namely the Weibull, lognormal, gamma and Gumbel, are considered to describe the PM10 concentrations. The Weibull distribution gives the best fit for the southwest monsoon data in Petaling Jaya, while the lognormal distribution outperforms the others for the southwest monsoon data in Seberang Perai. For the northeast monsoon in both locations, the gamma distribution best describes the data.
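
The kind of simulation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic lognormal values rather than the PM10 measurements, the 20% missing rate and the distribution parameters are arbitrary, the helper names (make_missing, mean_substitution, hot_deck, rmse) are hypothetical, and hot deck is assumed here to mean drawing a random observed donor value, since the abstract does not say which variant was used. EM imputation is left out to keep the example short.

```python
import math
import random


def make_missing(values, frac, rng):
    """Copy `values`, setting a fraction `frac` of entries to None at random."""
    n_missing = int(round(frac * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_substitution(values):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def hot_deck(values, rng):
    """Replace every missing entry with a randomly drawn observed value (assumed variant)."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def rmse(truth, imputed, masked):
    """Root mean square error, computed over the imputed positions only."""
    errors = [(t - p) ** 2 for t, p, m in zip(truth, imputed, masked) if m is None]
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)
# Synthetic stand-in for a concentration series; parameters are arbitrary.
truth = [rng.lognormvariate(3.8, 0.4) for _ in range(500)]
masked = make_missing(truth, 0.2, rng)

results = {
    "mean substitution": rmse(truth, mean_substitution(masked), masked),
    "hot deck": rmse(truth, hot_deck(masked, rng), masked),
}
for name, value in results.items():
    print(f"{name}: RMSE = {value:.2f}")
```

Repeating the run over several missing-value percentages and many random seeds, and adding an EM-based imputer, would bring the sketch closer to the comparison the paper reports.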

Item Type:Article
Keywords:Expectation maximization; mean imputation; missing value; PM10; Weibull
Journal:Sains Malaysiana
ID Code:7824
Deposited By: ms aida
Deposited On:05 Nov 2014 05:46
Last Modified:14 Dec 2016 06:45
