You Need Only One Clue for Effective Record Segmentation
Record segmentation is a core problem in data extraction. Previous approaches have focused on more and more sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules...
Main authors: | Wang, C; Furche, T; Gottlob, G; Grasso, G; Orsi, G; Schallhart, C |
---|---|
Format: | Conference item |
Published: | 2011 |
_version_ | 1826276802123792384 |
---|---|
author | Wang, C; Furche, T; Gottlob, G; Grasso, G; Orsi, G; Schallhart, C |
author_facet | Wang, C; Furche, T; Gottlob, G; Grasso, G; Orsi, G; Schallhart, C |
author_sort | Wang, C |
collection | OXFORD |
description | Record segmentation is a core problem in data extraction. Previous approaches have focused on more and more sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules for record segmentation suffice to achieve 100% precise record extraction from the vast majority of web sites in that domain. These results are first outcomes of the just launched ERC project DIADEM on domain-specific intelligent automated data extraction. |
first_indexed | 2024-03-06T23:19:19Z |
format | Conference item |
id | oxford-uuid:6833bb7e-df67-42c1-94de-860e9f08bd68 |
institution | University of Oxford |
last_indexed | 2024-03-06T23:19:19Z |
publishDate | 2011 |
record_format | dspace |
spelling | oxford-uuid:6833bb7e-df67-42c1-94de-860e9f08bd68; 2022-03-26T18:43:19Z; You Need Only One Clue for Effective Record Segmentation; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:6833bb7e-df67-42c1-94de-860e9f08bd68; Department of Computer Science; 2011; Wang, C; Furche, T; Gottlob, G; Grasso, G; Orsi, G; Schallhart, C; Record segmentation is a core problem in data extraction. Previous approaches have focused on more and more sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules for record segmentation suffice to achieve 100% precise record extraction from the vast majority of web sites in that domain. These results are first outcomes of the just launched ERC project DIADEM on domain-specific intelligent automated data extraction. |
spellingShingle | Wang, C; Furche, T; Gottlob, G; Grasso, G; Orsi, G; Schallhart, C; You Need Only One Clue for Effective Record Segmentation |
title | You Need Only One Clue for Effective Record Segmentation |