Exploiting Multi-Category Characteristics and Unified Framework to Extract Web Content

Abstract: Web content extraction aims to obtain the required data embedded in web pages, usually including structured records, such as product information, and text content, such as news. Web pages use a large number of HTML tags to organize and present various information. Both knowing little about...

Bibliographic Details
Main Authors:	Jingwei Zhang, Qian Wang, Qing Yang, Rui Zhou, Yanchun Zhang
Format:	Article
Language:	English
Published:	SpringerOpen 2018-06-01
Series:	Data Science and Engineering
Subjects:	Web extraction; Visual characteristics; Content semantics; Unified framework
Online Access:	http://link.springer.com/article/10.1007/s41019-018-0067-3

Similar Items