Hybrid focused crawling on the Surface and the Dark Web

Abstract Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framew...

Bibliographic Details
Main Authors: Christos Iliou, George Kalpakis, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris
Format: Article
Language: English
Published: SpringerOpen 2017-07-01
Series: EURASIP Journal on Information Security
Subjects: Focused crawling, Dark web, Darknets, Tor, I2P, Freenet
Online Access: http://link.springer.com/article/10.1186/s13635-017-0064-5
_version_ 1818132516752588800
author Christos Iliou
George Kalpakis
Theodora Tsikrika
Stefanos Vrochidis
Ioannis Kompatsiaris
author_sort Christos Iliou
collection DOAJ
description Abstract: Focused crawlers enable the automatic discovery of Web resources about a given topic by navigating the Web link structure and selecting which hyperlinks to follow based on their estimated relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly navigate through the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the destination network type and the strength of the local evidence present in the vicinity of a hyperlink. The work investigates 11 hyperlink selection methods, among them a novel strategy based on the dynamic linear combination of a link-based classifier and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed focused crawler on both the Surface and the Dark Web.
first_indexed 2024-12-11T08:38:05Z
format Article
id doaj.art-b9e9be0ac972459a99d8a67cbda7d5b2
institution Directory Open Access Journal
issn 2510-523X
language English
last_indexed 2024-12-11T08:38:05Z
publishDate 2017-07-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Information Security
spelling doaj.art-b9e9be0ac972459a99d8a67cbda7d5b2
2022-12-22T01:14:18Z
eng
SpringerOpen
EURASIP Journal on Information Security
2510-523X
2017-07-01
20171113
10.1186/s13635-017-0064-5
Hybrid focused crawling on the Surface and the Dark Web
Christos Iliou (Information Technologies Institute, Centre for Research and Technology Hellas)
George Kalpakis (Information Technologies Institute, Centre for Research and Technology Hellas)
Theodora Tsikrika (Information Technologies Institute, Centre for Research and Technology Hellas)
Stefanos Vrochidis (Information Technologies Institute, Centre for Research and Technology Hellas)
Ioannis Kompatsiaris (Information Technologies Institute, Centre for Research and Technology Hellas)
http://link.springer.com/article/10.1186/s13635-017-0064-5
Focused crawling
Dark web
Darknets
Tor
I2P
Freenet
title Hybrid focused crawling on the Surface and the Dark Web
title_sort hybrid focused crawling on the surface and the dark web
topic Focused crawling
Dark web
Darknets
Tor
I2P
Freenet
url http://link.springer.com/article/10.1186/s13635-017-0064-5
work_keys_str_mv AT christosiliou hybridfocusedcrawlingonthesurfaceandthedarkweb
AT georgekalpakis hybridfocusedcrawlingonthesurfaceandthedarkweb
AT theodoratsikrika hybridfocusedcrawlingonthesurfaceandthedarkweb
AT stefanosvrochidis hybridfocusedcrawlingonthesurfaceandthedarkweb
AT ioanniskompatsiaris hybridfocusedcrawlingonthesurfaceandthedarkweb