Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting
Duplicate and near-duplicate web pages are a chief concern for web search engines. They consume enormous space in the index, which ultimately slows down the serving of results and increases its cost. A variety of techniques have been developed to identify pairs of web pages that are ...
Main Authors: | J. Prasanna Kumar, P. Govindarajulu |
---|---|
Format: | Article |
Language: | English |
Published: | Springer 2013-02-01 |
Series: | International Journal of Computational Intelligence Systems |
Subjects: | |
Online Access: | https://www.atlantis-press.com/article/25868364.pdf |
Similar Items
- CaSePer: An efficient model for personalized web page change detection based on segmentation
  by: K.S. Kuppusamy, et al.
  Published: (2014-01-01)
- A web archiving framework for national archive of Iran (NAI)
  by: Shabnam Zokaei, et al.
  Published: (2011)
- A web archiving framework for national archive of Iran (NAI) [electronic resource]
  by: Shabnam Zokaei
  Published: (2011)
- Web Page Content Block Identification with Extended Block Properties
  by: Kiril Griazev, et al.
  Published: (2023-05-01)
- A Survey Study on Relation Extraction for Web Pages
  by: Ghada Alsaigh, et al.
  Published: (2020-03-01)