Ensemble learning for multidimensional poverty classification

Azuraliza Abu Bakar, Rusnita Hamdan and Nor Samsiah Sani (2020) Ensemble learning for multidimensional poverty classification. Sains Malaysiana, 49 (2). pp. 447-459. ISSN 0126-6039

Official URL: http://www.ukm.my/jsm/malay_journals/jilid49bil2_2...

Abstract

The poverty rate in Malaysia is determined through financial or income indices and measurements. Periodic measurements are conducted through the Household Expenditure and Income Survey (HEIS) twice every five years and are subsequently used to generate a Poverty Line Income (PLI), from which poverty levels are determined through statistical methods. Such unidimensional measurement, however, cannot portray overall deprivation conditions, especially as experienced by the urban population. In addition, the United Nations Development Programme (UNDP) has introduced a set of multidimensional poverty measurements, but these have yet to be applied in the case of Malaysia. In view of this, the potential use of Machine Learning (ML) approaches to produce new poverty measurement methods is of interest, prompted by the existence of rich database collections on poverty such as the eKasih database maintained by the Malaysian Government. The goal of this study was to determine whether an ensemble learning method (random forest) can classify poverty, and hence produce a multidimensional poverty indicator, compared with base learner methods, using the eKasih dataset. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was used to ensure that the data mining and ML processes were conducted properly. Besides Random Forest, we also examined decision tree and general linear methods to benchmark their performance and determine the method with the highest accuracy. Fifteen variables were then ranked using the varImp method to identify the important variables. The analysis showed that Per Capita Income, State, Ethnic, Strata, Religion, Occupation and Education were the most important variables in the classification of poverty, with 99% accuracy using the Random Forest algorithm.
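
The benchmarking and variable-ranking steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes Python with scikit-learn, uses synthetic data in place of the eKasih dataset, invents four hypothetical variable names and an arbitrary income cut-off, substitutes logistic regression for the "general linear methods", and uses the forest's impurity-based importances as a rough analogue of the varImp ranking.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the variable names are hypothetical, not eKasih fields.
rng = np.random.default_rng(0)
n = 600
feature_names = ["per_capita_income", "household_size", "education_level", "strata"]
X = np.column_stack([
    rng.uniform(100, 3000, n),  # per_capita_income
    rng.integers(1, 10, n),     # household_size
    rng.integers(0, 5, n),      # education_level
    rng.integers(0, 2, n),      # strata (0 = rural, 1 = urban)
])
# Toy label: "poor" when income falls below an arbitrary cut-off (assumption).
y = (X[:, 0] < 1000).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Ensemble learner plus two base learners used as benchmarks.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

# Rank variables by the forest's impurity-based importance, highest first.
forest = models["random_forest"]
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, acc in accuracy.items():
    print(f"{name}: accuracy = {acc:.3f}")
for name, score in ranking:
    print(f"{name}: importance = {score:.3f}")
```

Because the toy label depends only on income, the ranking here is trivially dominated by that one variable; on real survey data the ordering and the accuracy gap between the learners would depend on the data and on preprocessing choices.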

Item Type: Article
Keywords: Machine learning; Multidimensional poverty; Random forest
Journal: Sains Malaysiana
ID Code: 14778
Deposited By: ms aida
Deposited On: 18 Jun 2020 07:02
Last Modified: 23 Jun 2020 01:15