A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection

Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from the...

Bibliographic Details
Main Authors:	Seonyeong Song, Jiyoung Han, Kunwoo Park
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Data-centric AI; contrastive learning; contextomy
Online Access:	https://ieeexplore.ieee.org/document/10472505/

_version_	1797243312262873088
author	Seonyeong Song; Jiyoung Han; Kunwoo Park
author_facet	Seonyeong Song; Jiyoung Han; Kunwoo Park
author_sort	Seonyeong Song
collection	DOAJ
description	Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from their original context, changing the speaker’s intended meaning. This results in a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables the detection of articles with contextomized headline quotes accurately. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the ability of this framework for more general tasks of retrieving relevant quotes, implying its potential contribution to relevant fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.
first_indexed	2024-04-24T18:53:07Z
format	Article
id	doaj.art-9c5d76a5ea7342f6910d869a24d221a6
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-24T18:53:07Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-9c5d76a5ea7342f6910d869a24d221a6 | 2024-03-26T17:48:10Z | eng | IEEE | IEEE Access | 2169-3536 | 2024-01-01 | 12 | 40168 | 40181 | 10.1109/ACCESS.2024.3377227 | 10472505 | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection | Seonyeong Song (0) https://orcid.org/0009-0006-8822-9812 | Jiyoung Han (1) | Kunwoo Park (2) https://orcid.org/0000-0003-2913-9711 | Department of Intelligent Semiconductors, Soongsil University, Dongjak, Seoul, South Korea | Moon Soul Graduate School of Future Strategy, KAIST, Daejeon, South Korea | Department of Intelligent Semiconductors, Soongsil University, Dongjak, Seoul, South Korea | Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from their original context, changing the speaker’s intended meaning. This results in a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables the detection of articles with contextomized headline quotes accurately. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the ability of this framework for more general tasks of retrieving relevant quotes, implying its potential contribution to relevant fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access. | https://ieeexplore.ieee.org/document/10472505/ | Data-centric AI | contrastive learning | contextomy
spellingShingle	Seonyeong Song Jiyoung Han Kunwoo Park A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection IEEE Access Data-centric AI contrastive learning contextomy
title	A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
title_full	A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
title_fullStr	A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
title_full_unstemmed	A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
title_short	A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
title_sort	data centric contrastive embedding framework for contextomized quote detection
topic	Data-centric AI; contrastive learning; contextomy
url	https://ieeexplore.ieee.org/document/10472505/
work_keys_str_mv	AT seonyeongsong adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT jiyounghan adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT kunwoopark adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT seonyeongsong datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT jiyounghan datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT kunwoopark datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection

Similar Items