Exploring the web with OXPath.
Main Authors: | , , , , |
---|---|
Format: | Journal article |
Language: | English |
Published: | ACM, 2011 |
Summary: | OXPath is a careful extension of XPath that facilitates data extraction from the deep web. It is designed to facilitate the large-scale extraction of data from sophisticated modern web interfaces with client-side scripting and asynchronous server communication. Its main characteristics are (1) a minimal extension of XPath to allow page navigation and action execution, (2) a set-theoretic formal semantics for full OXPath, and (3) sophisticated memory management that minimizes page buffering. In this poster, we briefly review the main features of the language and discuss ongoing and future work. Copyright 2011 ACM. |