Hybrid focused crawling on the Surface and the Dark Web
Abstract Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framew...
Main Authors: | Christos Iliou, George Kalpakis, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2017-07-01 |
Series: | EURASIP Journal on Information Security |
Subjects: | Focused crawling, Dark web, Darknets, Tor, I2P, Freenet |
Online Access: | http://link.springer.com/article/10.1186/s13635-017-0064-5 |
_version_ | 1818132516752588800 |
---|---|
author | Christos Iliou, George Kalpakis, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris |
author_facet | Christos Iliou, George Kalpakis, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris |
author_sort | Christos Iliou |
collection | DOAJ |
description | Abstract Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly navigate through the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the destination network type and the strength of the local evidence present in the vicinity of a hyperlink. The work investigates 11 hyperlink selection methods, among them a novel strategy based on the dynamic linear combination of a link-based and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed focused crawler for both the Surface and the Dark Web. |
first_indexed | 2024-12-11T08:38:05Z |
format | Article |
id | doaj.art-b9e9be0ac972459a99d8a67cbda7d5b2 |
institution | Directory of Open Access Journals |
issn | 2510-523X |
language | English |
last_indexed | 2024-12-11T08:38:05Z |
publishDate | 2017-07-01 |
publisher | SpringerOpen |
record_format | Article |
series | EURASIP Journal on Information Security |
spelling | doaj.art-b9e9be0ac972459a99d8a67cbda7d5b2; 2022-12-22T01:14:18Z; eng; SpringerOpen; EURASIP Journal on Information Security; 2510-523X; 2017-07-01; 20171113; 10.1186/s13635-017-0064-5; Hybrid focused crawling on the Surface and the Dark Web; Christos Iliou (0); George Kalpakis (1); Theodora Tsikrika (2); Stefanos Vrochidis (3); Ioannis Kompatsiaris (4); Information Technologies Institute, Centre for Research and Technology Hellas (affiliation of all five authors); Abstract Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly navigate through the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the destination network type and the strength of the local evidence present in the vicinity of a hyperlink. The work investigates 11 hyperlink selection methods, among them a novel strategy based on the dynamic linear combination of a link-based and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed focused crawler for both the Surface and the Dark Web.; http://link.springer.com/article/10.1186/s13635-017-0064-5; Focused crawling; Dark web; Darknets; Tor; I2P; Freenet |
spellingShingle | Christos Iliou, George Kalpakis, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris; Hybrid focused crawling on the Surface and the Dark Web; EURASIP Journal on Information Security; Focused crawling, Dark web, Darknets, Tor, I2P, Freenet |
title | Hybrid focused crawling on the Surface and the Dark Web |
title_full | Hybrid focused crawling on the Surface and the Dark Web |
title_fullStr | Hybrid focused crawling on the Surface and the Dark Web |
title_full_unstemmed | Hybrid focused crawling on the Surface and the Dark Web |
title_short | Hybrid focused crawling on the Surface and the Dark Web |
title_sort | hybrid focused crawling on the surface and the dark web |
topic | Focused crawling, Dark web, Darknets, Tor, I2P, Freenet |
url | http://link.springer.com/article/10.1186/s13635-017-0064-5 |
work_keys_str_mv | AT christosiliou hybridfocusedcrawlingonthesurfaceandthedarkweb AT georgekalpakis hybridfocusedcrawlingonthesurfaceandthedarkweb AT theodoratsikrika hybridfocusedcrawlingonthesurfaceandthedarkweb AT stefanosvrochidis hybridfocusedcrawlingonthesurfaceandthedarkweb AT ioanniskompatsiaris hybridfocusedcrawlingonthesurfaceandthedarkweb |