Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation
Strong encryption algorithms and reliable anonymity routing have made cybercrime investigation more challenging. Hence, one option for law enforcement agencies (LEAs) is to search through unencrypted content on the Internet or anonymous communication networks (ACNs). The capability of automatically...
Main Authors: | Jesper Bergman, Oliver B. Popov |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Cybercrime, digital forensics, systematic literature review, dark web crawling, Tor |
Online Access: | https://ieeexplore.ieee.org/document/10064292/ |
_version_ | 1797847354909392896 |
---|---|
author | Jesper Bergman, Oliver B. Popov |
author_facet | Jesper Bergman, Oliver B. Popov |
author_sort | Jesper Bergman |
collection | DOAJ |
description | Strong encryption algorithms and reliable anonymity routing have made cybercrime investigation more challenging. Hence, one option for law enforcement agencies (LEAs) is to search through unencrypted content on the Internet or anonymous communication networks (ACNs). The capability of automatically harvesting web content from web servers enables LEAs to collect and preserve data that may serve as leads, clues, or evidence in an investigation. Although scientific studies began exploring web crawling soon after the inception of the web, few have thoroughly scrutinised web crawling on the “dark web”, or ACNs, such as I2P, IPFS, Freenet, and Tor. The current paper presents a systematic literature review (SLR) that examines the prevalence and characteristics of dark web crawlers. Of 58 peer-reviewed articles mentioning crawling and the dark web, 34 remained after irrelevant articles were excluded. The literature review showed that most dark web crawlers were programmed in Python, using either Selenium or Scrapy as the web scraping library. The knowledge gathered from the systematic literature review was used to develop a Tor-based web crawling model and integrate it into an existing software toolset customised for ACN-based investigations. Finally, the performance of the model was examined through a set of experiments. The results indicate that the developed crawler successfully scraped web content from both clear and dark web pages, and from dark marketplaces on the Tor network. The scientific contribution of this paper is novel knowledge concerning ACN-based web crawlers. Furthermore, it presents a model for crawling and scraping clear and dark websites for the purpose of digital investigations. The conclusions cover practical implications of dark web content retrieval and archival, such as obtaining investigation clues and evidence, and related topics for future research. |
first_indexed | 2024-04-09T18:10:01Z |
format | Article |
id | doaj.art-60770aaf040f4dbc8020135d695e5e4a |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-09T18:10:01Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-60770aaf040f4dbc8020135d695e5e4a; 2023-04-13T23:00:55Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 35914-35933; DOI 10.1109/ACCESS.2023.3255165; document 10064292; Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation; Jesper Bergman (https://orcid.org/0000-0002-2653-9325), Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden; Oliver B. Popov (https://orcid.org/0000-0001-6176-6817), Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden; https://ieeexplore.ieee.org/document/10064292/; Cybercrime, digital forensics, systematic literature review, dark web crawling, Tor |
title | Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation |
topic | Cybercrime, digital forensics, systematic literature review, dark web crawling, Tor |
url | https://ieeexplore.ieee.org/document/10064292/ |