WebCollectives: A light regular expression based web content extractor in Java

Conventional web crawling methods typically involve a sequence of distinct steps for downloading and extracting web content. A noteworthy limitation of these conventional crawling approaches is their lack of a focus-based crawling strategy. The software introduced in this paper, known as WebCollecti...


Bibliographic Details
Main Author: Hayri Volkan Agun
Format: Article
Language: English
Published: Elsevier 2023-12-01
Series: SoftwareX
Online Access: http://www.sciencedirect.com/science/article/pii/S2352711023002650