A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection

Authors
Song, Seonyeong [1]
Han, Jiyoung [2]
Park, Kunwoo [1, 3]
Affiliations
[1] Soongsil Univ, Dept Intelligent Semicond, Seoul 06978, South Korea
[2] Korea Adv Inst Sci & Technol, Moon Soul Grad Sch Future Strategy, Daejeon 34141, South Korea
[3] Soongsil Univ, Sch AI Convergence, Seoul 06978, South Korea
Funding
National Research Foundation, Singapore
Keywords
Semantics; Training; Task analysis; Self-supervised learning; Context modeling; Artificial intelligence; Fake news; Data models; Contrast resolution; Detection algorithms; Professional communication; Data-centric AI; contrastive learning; contextomy
DOI
10.1109/ACCESS.2024.3377227
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also indicates a reliable source. However, there is a practice known as 'contextomizing,' where words are extracted from their original context, changing the speaker's intended meaning. This results in a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables accurate detection of articles with contextomized headline quotes. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the framework on the more general task of retrieving relevant quotes, which suggests its potential contribution to related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.
Pages: 40168-40181
Number of pages: 14