A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection

Bibliographic Details
Main Authors: Seonyeong Song, Jiyoung Han, Kunwoo Park
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Data-centric AI, contrastive learning, contextomy
Online Access: https://ieeexplore.ieee.org/document/10472505/

Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from their original context, changing the speaker’s intended meaning. The result is a headline quote that diverges semantically from all quotes in the article body. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework for representing quote semantics. Leveraging knowledge of the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes, enabling accurate detection of articles with contextomized headline quotes. Our experiments show that the proposed method outperforms both general-purpose embedding and domain-adapted methods in detection accuracy. Notably, the method exhibits few-shot detection capability, reaching the performance level of SimCSE with just 200 training samples. We also evaluate the framework on the more general task of retrieving relevant quotes, suggesting its potential value for related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research. The code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.
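The abstract rests on two ideas: a quote encoder trained with a contrastive objective, and detection based on a headline quote diverging semantically from every quote in the body. The following is a minimal plain-Python sketch of both, assuming the quote embeddings are already computed. It uses a generic SimCSE-style InfoNCE loss with in-batch negatives (temperature 0.05, the value commonly used with SimCSE) and a simple maximum-cosine-similarity threshold rule. Neither is claimed to be the actual QuoteCSE++ objective or classifier; the function names `info_nce_loss` and `is_contextomized` and the 0.5 threshold are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce_loss(anchors: Sequence[Vector], positives: Sequence[Vector],
                  temperature: float = 0.05) -> float:
    """Generic SimCSE-style InfoNCE loss with in-batch negatives.

    anchors[i] and positives[i] form a positive pair; every positives[j]
    with j != i acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_denom - logits[i]
    return total / len(anchors)


def is_contextomized(headline_quote: Vector, body_quotes: Sequence[Vector],
                     threshold: float = 0.5) -> bool:
    """Flag a headline quote whose closest body quote is still dissimilar.

    HYPOTHETICAL decision rule: the 0.5 threshold is illustrative and
    would need tuning on labeled data; it is not taken from the paper.
    """
    if not body_quotes:
        raise ValueError("need at least one body quote to compare against")
    best = max(cosine(headline_quote, q) for q in body_quotes)
    return best < threshold


# Toy 3-d vectors standing in for encoder outputs (hypothetical values).
headline = [1.0, 0.0, 0.0]
body = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(is_contextomized(headline, body))  # prints True: no body quote is close
```

A fixed threshold is the simplest possible decision rule; a classifier trained on the similarity scores, or on the embeddings themselves, is a natural alternative when labeled examples are available.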

ISSN: 2169-3536
Volume: 12
Pages: 40168–40181
DOI: 10.1109/ACCESS.2024.3377227
Affiliations: Seonyeong Song (ORCID 0009-0006-8822-9812) and Kunwoo Park (ORCID 0000-0003-2913-9711), Department of Intelligent Semiconductors, Soongsil University, Dongjak, Seoul, South Korea; Jiyoung Han, Moon Soul Graduate School of Future Strategy, KAIST, Daejeon, South Korea