Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation


Bibliographic Details
Main Authors: Jesper Bergman, Oliver B. Popov
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Cybercrime, digital forensics, systematic literature review, dark web crawling, Tor
Online Access: https://ieeexplore.ieee.org/document/10064292/
author Jesper Bergman
Oliver B. Popov
collection DOAJ
description Strong encryption algorithms and reliable anonymity routing have made cybercrime investigation more challenging. Hence, one option for law enforcement agencies (LEAs) is to search through unencrypted content on the Internet or anonymous communication networks (ACNs). The capability of automatically harvesting web content from web servers enables LEAs to collect and preserve data that may serve as leads, clues, or evidence in an investigation. Although scientific studies have explored web crawling since soon after the inception of the web, few have thoroughly scrutinised web crawling on the “dark web”, or ACNs, such as I2P, IPFS, Freenet, and Tor. The current paper presents a systematic literature review (SLR) that examines the prevalence and characteristics of dark web crawlers. From a selection of 58 peer-reviewed articles mentioning crawling and the dark web, 34 remained after excluding irrelevant articles. The literature review showed that most dark web crawlers were programmed in Python, using either Selenium or Scrapy as the web scraping library. The knowledge gathered from the systematic literature review was used to develop a Tor-based web crawling model and integrate it into an existing software toolset customised for ACN-based investigations. Finally, the performance of the model was examined through a set of experiments. The results indicate that the developed crawler was successful in scraping web content from both clear and dark web pages, and in scraping dark marketplaces on the Tor network. The scientific contribution of this paper is novel knowledge concerning ACN-based web crawlers. Furthermore, it presents a model for crawling and scraping clear and dark websites for the purpose of digital investigations. The conclusions include practical implications of dark web content retrieval and archival, such as investigation clues and evidence, and related future research topics.
first_indexed 2024-04-09T18:10:01Z
format Article
id doaj.art-60770aaf040f4dbc8020135d695e5e4a
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-04-09T18:10:01Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-60770aaf040f4dbc8020135d695e5e4a
2023-04-13T23:00:55Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
Volume 11, pages 35914-35933
DOI: 10.1109/ACCESS.2023.3255165
10064292
Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation
Jesper Bergman (https://orcid.org/0000-0002-2653-9325), Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
Oliver B. Popov (https://orcid.org/0000-0001-6176-6817), Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
https://ieeexplore.ieee.org/document/10064292/
Cybercrime
digital forensics
systematic literature review
dark web crawling
Tor
title Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation
topic Cybercrime
digital forensics
systematic literature review
dark web crawling
Tor
url https://ieeexplore.ieee.org/document/10064292/