Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting
Duplicate and near-duplicate web pages are the chief concerns for web search engines. In reality, they incur enormous space to store the indexes, ultimately slowing down and increasing the cost of serving results. A variety of techniques have been developed to identify pairs of web pages that are &a...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
Springer
2013-02-01
|
Series: | International Journal of Computational Intelligence Systems |
Subjects: | |
Online Access: | https://www.atlantis-press.com/article/25868364.pdf |