New Discrimination Procedure of Location Model for Handling Large Categorical Variables

Bibliographic Details
Main Authors: Hamid, Hashibah, Long, Mei Mei, Syed Yahaya, Sharipah Soaad
Format: Article
Language: English
Published: Universiti Kebangsaan Malaysia 2017
Subjects: QA Mathematics
Online Access: https://repo.uum.edu.my/id/eprint/30823/1/SM%2046%2006%202017%201001-1010.pdf
http://dx.doi.org/10.17576/jsm-2017-4606-20
Description: The location model is a predictive discriminant rule that classifies new observations into one of two predefined groups based on a mixture of continuous and categorical variables. Its ability to classify a new observation correctly depends heavily on the number of multinomial cells created by the categorical variables. This study first conducts a preliminary investigation showing that the location model based on maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables, across all 36 data sets tested. The model performed badly with a large number of categorical variables even when the sample size was large. To reduce the misclassification rate, a new strategy is embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. The strategy is investigated on simulated and real data sets, with the misclassification rate estimated by the leave-one-out method. The numerical results show the feasibility of the proposed model: its misclassification rate is dramatically lower than that of the cLM in all 18 data settings. An application to a real data set demonstrates a significant improvement, with results comparable to the best of the methods compared. Overall, the findings show that the proposed model extends the applicability of the location model, which was previously limited to six categorical variables for acceptable performance. The study shows that the proposed model, with its new discrimination procedure, can serve as an alternative for mixed-variable classification, primarily when the number of categorical variables is large.
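The dependence on the number of multinomial cells, and the leave-one-out estimate of the misclassification rate, can be illustrated with a minimal sketch. It rests on assumptions that are not in the record: the categorical variables are taken to be binary, the function names are hypothetical, and the nearest-group-mean classifier is only a stand-in, not the location model or the NPCA-based procedure of the article.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (continuous features, group label)


def num_cells(levels: Sequence[int]) -> int:
    """Number of multinomial cells formed by crossing the categorical variables."""
    cells = 1
    for k in levels:
        cells *= k
    return cells


def mean_obs_per_cell(n_per_group: int, levels: Sequence[int]) -> float:
    """Average number of observations available per cell in one group."""
    return n_per_group / num_cells(levels)


def nearest_mean_predict(train: List[Sample], x: List[float]) -> int:
    """Stand-in classifier (assumption): assign x to the closest group mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feats, label in train:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, v in enumerate(feats):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def sq_dist(label: int) -> float:
        # Squared Euclidean distance from x to the mean of the given group.
        return sum((s / counts[label] - xj) ** 2 for s, xj in zip(sums[label], x))

    return min(sums, key=sq_dist)


def loo_error(data: List[Sample],
              predict: Callable[[List[Sample], List[float]], int]) -> float:
    """Leave-one-out misclassification rate: hold out each observation once."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, x) != y:
            errors += 1
    return errors / len(data)


if __name__ == "__main__":
    # Cell count doubles with every additional binary variable.
    for b in (2, 6, 10):
        levels = [2] * b
        print(b, num_cells(levels), mean_obs_per_cell(100, levels))

    toy = [([0.0], 0), ([0.2], 0), ([0.1], 0),
           ([1.0], 1), ([1.2], 1), ([0.9], 1)]
    print(loo_error(toy, nearest_mean_predict))
```

Under the binary assumption, six variables already give 64 cells, so a group of 100 observations averages fewer than two observations per cell. This is one way to see why cell-wise estimates become unreliable as categorical variables are added.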
Record ID: uum-30823
Institution: Universiti Utara Malaysia
Citation: Hamid, Hashibah and Long, Mei Mei and Syed Yahaya, Sharipah Soaad (2017) New Discrimination Procedure of Location Model for Handling Large Categorical Variables. Sains Malaysiana, 46 (06). pp. 1001-1010. ISSN 0126-6039 (peer reviewed). Journal homepage: http://www.ukm.edu.my/jsm/index.html