Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting

Duplicate and near-duplicate web pages are the chief concerns for web search engines. In reality, they require enormous space to store the indexes, which ultimately slows down result serving and increases its cost. A variety of techniques have been developed to identify pairs of web pages that are...
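As a generic illustration of how a pair of near-duplicate pages can be flagged, here is a minimal sketch of word shingling with hashed fingerprints and Jaccard similarity, a standard technique in this area. It is not the clustering and sentence-feature method of this article. The function names, the shingle size `k=3`, and the `0.8` threshold are hypothetical choices made only for illustration.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles (contiguous word runs) in lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole word list as a single shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprints(text: str, k: int = 3) -> set[int]:
    """Hash each shingle to a 64-bit integer fingerprint (first 8 bytes of SHA-1)."""
    return {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, k)
    }


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8, k: int = 3) -> bool:
    """Hypothetical decision rule: near-duplicate if fingerprint overlap >= threshold."""
    return jaccard(fingerprints(doc_a, k), fingerprints(doc_b, k)) >= threshold


if __name__ == "__main__":
    page_a = "Duplicate and near-duplicate web pages are the chief concerns for web search engines."
    page_b = "Duplicate and near-duplicate web pages are the chief concerns for search engines."
    score = jaccard(fingerprints(page_a), fingerprints(page_b))
    print(round(score, 3))  # 9 shared shingles out of 14 distinct -> 0.643
    print(is_near_duplicate(page_a, page_b))  # False at the illustrative 0.8 threshold
```

Comparing fingerprint sets rather than raw text keeps the per-page representation compact. In practice the threshold and shingle size would need tuning on real crawl data.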


Bibliographic Details
Main Authors:	J. Prasanna Kumar, P. Govindarajulu
Format:	Article
Language:	English
Published:	Springer 2013-02-01
Series:	International Journal of Computational Intelligence Systems
Subjects:	Web Crawling; Web page; Duplicate web page; Near duplicate web page; Near duplicate detection; fingerprinting
Online Access:	https://www.atlantis-press.com/article/25868364.pdf


Similar Items