RED: Redundancy-driven data extraction from result pages?
Data-driven websites are mostly accessed through search interfaces. Such sites follow a common publishing pattern that, surprisingly, has not been fully exploited for unsupervised data extraction yet: the result of a search is presented as a paginated list of result records. Each result record conta...
Main Authors: | , , , , |
---|---|
Format: | Conference item |
Published: | Association for Computing Machinery, 2019 |