Exploiting Multi-Category Characteristics and Unified Framework to Extract Web Content
Abstract: Web content extraction obtains the required data embedded in web pages, usually structured records (such as product information) and text content (such as news). Web pages use a large number of HTML tags to organize and present information. Both knowing little about...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2018-06-01 |
Series: | Data Science and Engineering |
Online Access: | http://link.springer.com/article/10.1007/s41019-018-0067-3 |