Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting

Duplicate and near-duplicate web pages are a chief concern for web search engines. They require enormous space for index storage, which ultimately slows down the serving of results and increases its cost. A variety of techniques have been developed to identify pairs of web pages that are &a...

Bibliographic Details
Main Authors:	J. Prasanna Kumar, P. Govindarajulu
Format:	Article
Language:	English
Published:	Springer 2013-02-01
Series:	International Journal of Computational Intelligence Systems
Subjects:	Web Crawling; Web page; Duplicate web page; Near duplicate web page; Near duplicate detection; fingerprinting
Online Access:	https://www.atlantis-press.com/article/25868364.pdf

Similar Items

CaSePer: An efficient model for personalized web page change detection based on segmentation
by: K.S. Kuppusamy, et al.
Published: (2014-01-01)

A web archiving framework for national archive of Iran (NAI) /
by: Shabnam Zokaei, et al.
Published: (2011)

A web archiving framework for national archive of Iran (NAI) [electronic resource] /
by: Shabnam Zokaei
Published: (2011)

Web Page Content Block Identification with Extended Block Properties
by: Kiril Griazev, et al.
Published: (2023-05-01)

A Survey Study on Relation Extraction for Web Pages
by: Ghada Alsaigh, et al.
Published: (2020-03-01)

Semantic Web and Web Page Clustering Algorithms: A Landscape View
by: Ahmed J. Obaid, et al.
Published: (2020-11-01)

Content Matters: Clustering Web Pages for QoE Analysis With WebCLUST
by: Luis Roberto Jimenez, et al.
Published: (2021-01-01)

A Semantic Focused Web Crawler Based on a Knowledge Representation Schema
by: Julio Hernandez, et al.
Published: (2020-05-01)

Kemijski sadržaji na hrvatskom internetu
by: Mayer, Marina
Published: (2004-03-01)

Pembangunan aplikasi web menggunakan Active server pages (ASP) /
by: Mohd. Shahizan Othman
Published: (2006)

Web Page Streams and Relevance Propagation for Topic Distillation
by: Mohammad Amin Golshani, et al.
Published: (2014-03-01)

EVALUATION OF WEB SEARCHING METHOD USING A NOVEL WPRR ALGORITHM FOR TWO DIFFERENT CASE STUDIES
by: V. Lakshmi Praba, et al.
Published: (2012-04-01)

STUDY OF WEB PAGE RANKING ALGORITHMS: A REVIEW
by: Aditi Chowdhary, et al.
Published: (2019-03-01)

JavaServer pages /
by: Pekowsky, Larne
Published: (2000)

Active server pages /
by: Morneau, Keith, et al.
Published: (2001)

AUTOMATIC TAGGING OF PERSIAN WEB PAGES BASED ON N-GRAM LANGUAGE MODELS USING MAPREDUCE
by: Saeed Shahrivari, et al.
Published: (2015-07-01)

Using active server pages /
by: Johnson, Scott, et al.
Published: (1997)

JavaServer pages / [compact disc]
by: Pekowsky, Larne
Published: (2000)

Webbed Penis Associated with Urethral Duplication: A Case Report
by: Burhan Aksu, et al.
Published: (2011-03-01)

Javaserver pages illuminated /
by: Metlapalli, Prabhakar
Published: (2008)

Some features of alt texts associated with images in Web pages
by: Timothy C. Craven
Published: (2006-01-01)

Sams teach yourself : active server pages 3.0 in 21 days /
by: Mitchell, Scott
Published: (2000)

Facilities Management and Digital Application of Web Engineering: Implications for Business Informatics Systems
by: Ezendu Ariwa, et al.
Published: (2010-12-01)

A Language Appraisal of Hotel Web Pages in Indonesia Five Starred Hotels: Interpersonal Meaning
by: Suyik Binarkaheni
Published: (2019-04-01)

How to do everything with Microsoft Office FrontPage 2003 /
by: Plotkin, David
Published: (c200)

American Elements
by: Mayer, Marina
Published: (2009-04-01)

Navigation and its role in raising the efficiency of the website
by: Maram Hammad, et al.
Published: (2021-07-01)

Aplikasi mel elektronik (e-mel) dan laman web di kalangan guru besar program khas pensiswazah guru besar (PKPGB) UTM Skudai /
by: Jamaludin Md. Jali
Published: (2008)

Aplikasi mel elektronik (e-mel) dan laman web di kalangan guru besar program khas pensiswazah guru besar (PKPGB) UTM Skudai [electronic resource] /
by: Jamaludin Md. Jali
Published: (2008)

WEB LOG EXPLORER – CONTROL OF MULTIDIMENSIONAL DYNAMICS OF WEB PAGES
by: Mislav Šimunić, et al.
Published: (2012-07-01)

Beginning ASP.NET 1.0 with VB.NET /
by: Ullman, Chris
Published: (2002)

Instant ASP.NET applications /
by: Buczek, Greg
Published: (2001)

Beginning ASP.NET 1.1 with Visual C# .NET 2003 /
by: Ullman, Chris
Published: (2004)

E-bingkai cermin mata menggunakan teknik asp /
by: Rahime Zamahrafila Mulup
Published: (2001)

Instant ASP components /
by: Buczek, Greg
Published: (2000)

Instant ASP components / [compact disc]
by: Buczek, Greg
Published: (2000)

Effective Web Page Crawler
by: Hilal Hadi Saleh, et al.
Published: (2011-02-01)

WebScore: An Effective Page Scoring Approach for Uncertain Web Social Networks
by: Shaojie Qiao, et al.
Published: (2011-10-01)

Intelligent Content-based Categorization of Web pages Using Combination of Textual, Structural, and Visual Features
by: Ali Ahmadi, et al.
Published: (2009-12-01)

Professional active server pages 2.0 /
by: Fedorov, Alex
Published: (1998)