You Need Only One Clue for Effective Record Segmentation

Record segmentation is a core problem in data extraction. Previous approaches have focused on more and more sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules...

Bibliographic Details
Main Authors: Wang, C, Furche, T, Gottlob, G, Grasso, G, Orsi, G, Schallhart, C
Format: Conference item
Published: 2011
author Wang, C
Furche, T
Gottlob, G
Grasso, G
Orsi, G
Schallhart, C
collection OXFORD
description Record segmentation is a core problem in data extraction. Previous approaches have focused on more and more sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules for record segmentation suffice to achieve 100% precise record extraction from the vast majority of web sites in that domain. These results are the first outcomes of the just-launched ERC project DIADEM on domain-specific intelligent automated data extraction.
first_indexed 2024-03-06T23:19:19Z
format Conference item
id oxford-uuid:6833bb7e-df67-42c1-94de-860e9f08bd68
institution University of Oxford
last_indexed 2024-03-06T23:19:19Z
publishDate 2011
record_format dspace
spelling oxford-uuid:6833bb7e-df67-42c1-94de-860e9f08bd68; 2022-03-26T18:43:19Z; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:6833bb7e-df67-42c1-94de-860e9f08bd68; Department of Computer Science; 2011
title You Need Only One Clue for Effective Record Segmentation