A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection

Cited by: 0
Authors
Song, Seonyeong [1]
Han, Jiyoung [2]
Park, Kunwoo [1, 3]
Affiliations
[1] Soongsil Univ, Dept Intelligent Semicond, Seoul 06978, South Korea
[2] Korea Adv Inst Sci & Technol, Moon Soul Grad Sch Future Strategy, Daejeon 34141, South Korea
[3] Soongsil Univ, Sch AI Convergence, Seoul 06978, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Semantics; Training; Task analysis; Self-supervised learning; Context modeling; Artificial intelligence; Fake news; Data models; Contrast resolution; Detection algorithms; Professional communication; Data-centric AI; contrastive learning; contextomy;
DOI
10.1109/ACCESS.2024.3377227
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also signals a reliable source. However, in a practice known as 'contextomizing,' words are extracted from their original context in a way that changes the speaker's intended meaning. The result is a headline quote that semantically diverges from any other quote in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed for the representation of quote semantics. Utilizing knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables accurate detection of articles with contextomized headline quotes. Our evaluation experiments demonstrate the superiority of the proposed method over both general-purpose embedding and domain-adapted methods in terms of detection accuracy. Remarkably, the proposed method exhibits a few-shot detection capability, achieving the performance level of SimCSE with just 200 training samples. We also test the framework on the more general task of retrieving relevant quotes, which suggests its potential contribution to related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research endeavors. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.
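The abstract characterizes a contextomized headline quote as one that semantically diverges from the quotes in the article body. The following is only a minimal sketch of that idea, not the QuoteCSE++ method: it assumes quote embeddings are already available as plain vectors, and the function names, the toy 2-dimensional vectors, and the 0.8 threshold are all hypothetical choices made for illustration. The paper itself learns a detector on top of its embeddings rather than applying a fixed similarity cutoff.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_contextomized(headline_vec, body_vecs, threshold=0.8):
    """Flag a headline quote when no body quote is similar enough to it.

    `threshold` is a hypothetical cutoff chosen for this illustration only.
    """
    best = max(cosine(headline_vec, b) for b in body_vecs)
    return best < threshold


# Toy embeddings (hypothetical); a real system would obtain these from an encoder.
headline = [1.0, 0.0]
faithful_body = [[0.9, 0.1], [0.0, 1.0]]   # one body quote is close to the headline
distorted_body = [[0.0, 1.0], [0.1, 0.9]]  # no body quote is close to the headline

print(is_contextomized(headline, faithful_body))
print(is_contextomized(headline, distorted_body))
```

Taking the maximum over body quotes reflects the definition in the abstract: a headline quote is suspicious only when it matches none of the quotes in the article, so a single close match is enough to clear it.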
Pages: 40168-40181
Number of pages: 14