WebCollectives: A light regular expression based web content extractor in Java
Conventional web crawling methods typically involve a sequence of distinct steps for downloading and extracting web content. A noteworthy limitation of these conventional crawling approaches is their lack of a focus-based crawling strategy. The software introduced in this paper, known as WebCollecti...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-12-01 |
Series: | SoftwareX |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352711023002650 |