A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection
Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from the...
Main Authors: | Seonyeong Song, Jiyoung Han, Kunwoo Park |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
Subjects: | Data-centric AI, contrastive learning, contextomy |
Online Access: | https://ieeexplore.ieee.org/document/10472505/ |
_version_ | 1797243312262873088 |
---|---|
author | Seonyeong Song, Jiyoung Han, Kunwoo Park |
author_facet | Seonyeong Song, Jiyoung Han, Kunwoo Park |
author_sort | Seonyeong Song |
collection | DOAJ |
description | Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from their original context, changing the speaker’s intended meaning. This results in a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables accurate detection of articles with contextomized headline quotes. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the framework on the more general task of retrieving relevant quotes, which suggests its potential contribution to related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access. |
first_indexed | 2024-04-24T18:53:07Z |
format | Article |
id | doaj.art-9c5d76a5ea7342f6910d869a24d221a6 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-24T18:53:07Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-9c5d76a5ea7342f6910d869a24d221a6; 2024-03-26T17:48:10Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 40168; 40181; 10.1109/ACCESS.2024.3377227; 10472505; A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection; Seonyeong Song (0, https://orcid.org/0009-0006-8822-9812); Jiyoung Han (1); Kunwoo Park (2, https://orcid.org/0000-0003-2913-9711); Department of Intelligent Semiconductors, Soongsil University, Dongjak, Seoul, South Korea; Moon Soul Graduate School of Future Strategy, KAIST, Daejeon, South Korea; Department of Intelligent Semiconductors, Soongsil University, Dongjak, Seoul, South Korea; Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as ‘contextomizing,’ where words are extracted from their original context, changing the speaker’s intended meaning. This results in a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables accurate detection of articles with contextomized headline quotes. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the framework on the more general task of retrieving relevant quotes, which suggests its potential contribution to related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.; https://ieeexplore.ieee.org/document/10472505/; Data-centric AI; contrastive learning; contextomy |
spellingShingle | Seonyeong Song; Jiyoung Han; Kunwoo Park; A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection; IEEE Access; Data-centric AI; contrastive learning; contextomy |
title | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection |
title_full | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection |
title_fullStr | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection |
title_full_unstemmed | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection |
title_short | A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection |
title_sort | data centric contrastive embedding framework for contextomized quote detection |
topic | Data-centric AI, contrastive learning, contextomy |
url | https://ieeexplore.ieee.org/document/10472505/ |
work_keys_str_mv | AT seonyeongsong adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT jiyounghan adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT kunwoopark adatacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT seonyeongsong datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT jiyounghan datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection AT kunwoopark datacentriccontrastiveembeddingframeworkforcontextomizedquotedetection |
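
The description characterizes a contextomized headline quote as one that semantically diverges from every quote in the article body. The sketch below illustrates that idea only; it is not the QuoteCSE++ method from the paper. It assumes quote embeddings are already available as plain float vectors, and the function names, the toy vectors, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_body_similarity(
    headline_vec: Sequence[float],
    body_vecs: Sequence[Sequence[float]],
) -> float:
    """Highest similarity between the headline quote and any body quote."""
    if not body_vecs:
        # No body quotes to compare against: treat as maximally divergent.
        return 0.0
    return max(cosine(headline_vec, b) for b in body_vecs)


def is_contextomized(
    headline_vec: Sequence[float],
    body_vecs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> bool:
    """Flag the headline quote when no body quote is close to it."""
    return max_body_similarity(headline_vec, body_vecs) < threshold


# Toy embeddings (hypothetical; a real system would use an encoder's output).
headline = [1.0, 0.0, 0.0]
body_with_match = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
body_without_match = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(is_contextomized(headline, body_with_match))
print(is_contextomized(headline, body_without_match))
```

Taking the maximum over body quotes mirrors the phrase "diverges from any other quote": a single close match in the body is enough to clear the headline quote. In practice the decision rule would more likely be learned from labeled data than fixed by a hand-picked threshold.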