New smoothed location models integrated with PCA and two types of MCA for handling large number of mixed continuous and binary variables

Bibliographic Details
Main Authors: Hamid, Hashibah; P.A.H., Ngu; Mohd Alipiah, Fathilah
Format: Article
Language: English
Published: Universiti Putra Malaysia Press 2018
Subjects: QA75 Electronic computers. Computer science
Online Access: https://repo.uum.edu.my/id/eprint/24407/1/PJST%20%2026%201%202018%20%20247%20260.pdf

Description: The problem of classifying objects into groups when the measured variables in an experiment are mixed has attracted the attention of statisticians. The Smoothed Location Model (SLM) is a popular classification method for data that contain both continuous and binary variables. However, SLM becomes infeasible when the number of binary variables is large, because many of the cells defined by the binary variables are empty. This study therefore constructs new SLMs by integrating SLM with two variable extraction techniques, Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA) in two variants (indicator and Burt), to reduce the large number of mixed variables, primarily the binary ones. The two new models, SLM+PCA+Indicator MCA and SLM+PCA+Burt MCA, are evaluated by their misclassification rates. In simulation studies with a sample size of n = 60, the SLM+PCA+Indicator MCA model classifies perfectly when the number of binary variables (b) is 5 or 10. For b = 20, it yields misclassification rates of 0.3833, 0.6667 and 0.3221 for n = 60, 120 and 180, respectively. The SLM+PCA+Burt MCA model classifies perfectly for b = 5, 10, 15 and 20, and yields a small misclassification rate of 0.0167 for b = 25. On a real dataset, the two new models yield low misclassification rates of 0.3066 and 0.2336, respectively, and the SLM+PCA+Burt MCA model performs best among all the classification methods compared. The findings indicate that the two new SLMs integrated with variable extraction techniques can be good alternative classifiers for mixed-variable problems, particularly when the number of binary variables is large.
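Two technical points in the description can be illustrated with a short sketch under stated assumptions. First, a location model assigns one cell to each pattern of the b binary variables, giving 2^b cells; with n = 60 observations at most 60 cells can be occupied, so at b = 20 almost all of the 1,048,576 cells are empty. Second, the two MCA variants are taken here to be the standard ones: correspondence analysis of the indicator (complete disjunctive) matrix Z, and of the Burt matrix Z^T Z. The helper names (`cell_occupancy`, `indicator_matrix`, `ca`) and the random binary data are hypothetical. The sketch does not reproduce the SLM itself or how the paper combines the PCA and MCA scores with it.

```python
import numpy as np


def cell_occupancy(x_bin: np.ndarray) -> tuple[int, int]:
    """Return (total cells, occupied cells) for a location model in which
    each distinct pattern of the b binary variables defines one cell."""
    b = x_bin.shape[1]
    # Encode each row's 0/1 pattern as an integer cell index (bit j = variable j).
    cell_index = x_bin.astype(np.int64) @ (1 << np.arange(b, dtype=np.int64))
    return 2 ** b, int(np.unique(cell_index).size)


def indicator_matrix(x_bin: np.ndarray) -> np.ndarray:
    """Complete disjunctive coding: each binary variable becomes two dummy
    columns, one for category 0 and one for category 1."""
    cols = []
    for j in range(x_bin.shape[1]):
        cols.append(1 - x_bin[:, j])
        cols.append(x_bin[:, j])
    return np.column_stack(cols).astype(float)


def ca(table: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Simple correspondence analysis via SVD of the standardized residuals.
    Returns all principal inertias and the first k row principal coordinates."""
    table = table[table.sum(axis=1) > 0]      # drop empty rows
    table = table[:, table.sum(axis=0) > 0]   # drop empty columns
    p = table / table.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)       # row and column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sv, _ = np.linalg.svd(s, full_matrices=False)
    row_coords = u[:, :k] * sv[:k] / np.sqrt(r)[:, None]
    return sv ** 2, row_coords


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 60
    # Empty-cell problem: 2**b cells, but at most n of them can be occupied.
    for b in (5, 10, 20, 25):
        x = rng.integers(0, 2, size=(n, b))
        total, occupied = cell_occupancy(x)
        print(f"b={b}: {total} cells, {occupied} occupied, {total - occupied} empty")

    # Two MCA variants on b = 10 hypothetical binary variables.
    x = rng.integers(0, 2, size=(n, 10))
    z = indicator_matrix(x)              # n x 2b indicator matrix
    burt = z.T @ z                       # 2b x 2b Burt matrix
    ind_inertia, scores = ca(z, k=2)     # scores: n x 2 extracted dimensions
    burt_inertia, _ = ca(burt, k=2)
    print("indicator inertias:", np.round(ind_inertia[:3], 4))
    print("Burt inertias:     ", np.round(burt_inertia[:3], 4))
    print("extracted scores shape:", scores.shape)
```

In this formulation the principal inertias of the Burt analysis are exactly the squares of those of the indicator analysis. The share of inertia attributed to the leading dimensions therefore differs between the two variants, and this can affect how many extracted dimensions one chooses to retain.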
Citation: Hamid, Hashibah and P.A.H., Ngu and Mohd Alipiah, Fathilah (2018) New smoothed location models integrated with PCA and two types of MCA for handling large number of mixed continuous and binary variables. Pertanika Journal of Social Sciences & Humanities, 26 (1). pp. 247-260. ISSN 0128-7702 http://www.pertanika.upm.edu.my/regular_issues.php?jtype=2&journal=JST-26-1-1