Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction

With the explosive growth of the internet, web page classification has become an essential issue, because it gives internet users an efficient way to search for information. Without professional classification, a website would become a jumble yard of content which is confus...

Bibliographic Details
Main Authors: Sam, Lee Zhi, Maarof, Mohd. Aizaini, Selamat, Ali
Format: Conference or Workshop Item
Language: English
Published: 2006
Subjects: QA76 Computer software
Online Access: http://eprints.utm.my/3129/1/F__PDF_ICOMMS-108.pdf
_version_ 1796853523156041728
author Sam, Lee Zhi
Maarof, Mohd. Aizaini
Selamat, Ali
author_facet Sam, Lee Zhi
Maarof, Mohd. Aizaini
Selamat, Ali
author_sort Sam, Lee Zhi
collection ePrints
description With the explosive growth of the internet, web page classification has become an essential issue, because it gives internet users an efficient way to search for information. Without professional classification, a website becomes a jumble yard of content that is confusing and time-wasting. Web page classification allows visitors to navigate a web site quickly and efficiently. However, most web directories are presently still classified manually or semi-automatically (by huge teams of human editors) [1]. Automated web page classification is in high demand as a way to replace expensive manpower and reduce the time consumed. In this paper we analyze the concept of a new model that integrates Principal Component Analysis (PCA) and Independent Component Analysis (ICA) as feature reduction for web page classification. The model consists of several modules: web page retrieval, stemming, stop-word filtering, feature reduction, feature selection, classification and evaluation.
first_indexed 2024-03-05T18:00:47Z
format Conference or Workshop Item
id utm.eprints-3129
institution Universiti Teknologi Malaysia - ePrints
language English
last_indexed 2024-03-05T18:00:47Z
publishDate 2006
record_format dspace
spelling utm.eprints-3129 2017-09-30T04:28:05Z http://eprints.utm.my/3129/ Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction Sam, Lee Zhi Maarof, Mohd. Aizaini Selamat, Ali QA76 Computer software With the explosive growth of the internet, web page classification has become an essential issue, because it gives internet users an efficient way to search for information. Without professional classification, a website becomes a jumble yard of content that is confusing and time-wasting. Web page classification allows visitors to navigate a web site quickly and efficiently. However, most web directories are presently still classified manually or semi-automatically (by huge teams of human editors) [1]. Automated web page classification is in high demand as a way to replace expensive manpower and reduce the time consumed. In this paper we analyze the concept of a new model that integrates Principal Component Analysis (PCA) and Independent Component Analysis (ICA) as feature reduction for web page classification. The model consists of several modules: web page retrieval, stemming, stop-word filtering, feature reduction, feature selection, classification and evaluation. 2006 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.utm.my/3129/1/F__PDF_ICOMMS-108.pdf Sam, Lee Zhi and Maarof, Mohd. Aizaini and Selamat, Ali (2006) Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction. In: Proceedings of International Conference on Man-Machine Systems 2006, September 15-16 2006, Langkawi, Malaysia. http://icomms.unimap.edu.my/index.htm
spellingShingle QA76 Computer software
Sam, Lee Zhi
Maarof, Mohd. Aizaini
Selamat, Ali
Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction
title Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction
topic QA76 Computer software
url http://eprints.utm.my/3129/1/F__PDF_ICOMMS-108.pdf