Exploring the web with OXPath.


Bibliographic Details
Main Authors: Furche, T, Gottlob, G, Grasso, G, Schallhart, C, Sellers, A
Other Authors: Virgilio, R
Format: Journal article
Language: English
Published: ACM 2011
Description
Summary: OXPath is a careful extension of XPath that facilitates data extraction from the deep web. It is designed for large-scale extraction of data from sophisticated modern web interfaces with client-side scripting and asynchronous server communication. Its main characteristics are (1) a minimal extension of XPath to allow page navigation and action execution, (2) a set-theoretic formal semantics for full OXPath, and (3) sophisticated memory management that minimizes page buffering. In this poster, we briefly review the main features of the language and discuss ongoing and future work. Copyright 2011 ACM.