Cite-worthiness Detection on Social Media: A Preliminary Study

Times cited: 0
Authors
Hafid, Salim [1]
Ammar, Wassim [1]
Bringay, Sandra [1,2]
Todorov, Konstantin [1]
Affiliations
[1] Univ Montpellier, CNRS, LIRMM, Montpellier, France
[2] Univ Montpellier 3, Montpellier, France
Source
NATURAL SCIENTIFIC LANGUAGE PROCESSING AND RESEARCH KNOWLEDGE GRAPHS, NSLP 2024 | 2024 / Vol. 14770
Keywords
Cite-worthiness; Science-related discourse; Social Media; NLP
DOI
10.1007/978-3-031-65794-8_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cite-worthiness detection is the task of flagging text that makes a claim which should be supported by a reference to a scientific result (an article or a dataset) but lacks one. Previous work has studied this problem in scientific literature, motivated by the need to recommend references to researchers and to flag missing citations in scientific papers. In this preliminary study, we extend the task to social media. Because scientific claims are often invoked to support arguments in societal debates on the Web, it is crucial to flag unreferenced or unsupported science-related claims, as doing so promises to improve the quality of online debate. We apply baseline models, originally evaluated on scientific literature, to the SciTweets dataset, which gathers science-related claims from X (formerly Twitter). We show that models trained on scientific papers struggle to detect cite-worthy text from X. We discuss the implications of these results and argue that models need to be trained on social media corpora to flag missing references on social media satisfactorily. We make our data publicly available to encourage further research on cite-worthiness detection on social media.
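The abstract frames cite-worthiness detection as deciding whether a piece of text needs a reference, i.e., a binary classification task, and its central experiment trains on one domain (scientific papers) and tests on another (posts from X). The following is a minimal, self-contained sketch of that train-on-one-domain, test-on-another setup. It is not the authors' method: this record does not describe their baseline models, so a simple multinomial Naive Bayes classifier is substituted. The class `NaiveBayesCiteWorthiness`, the function `f1_positive`, and all toy sentences are hypothetical illustrations (not SciTweets data), and scoring with F1 on the cite-worthy class is an assumption, since the record does not name the evaluation metric.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only (NOT from SciTweets or the paper).
# Label 1 = cite-worthy (makes a claim that should carry a reference), 0 = not cite-worthy.
SCI_TEXTS = [
    "previous studies have shown that this method improves accuracy",
    "prior work reported similar results on this benchmark",
    "we describe our experimental setup in this section",
    "the remainder of the paper is organized as follows",
]
SCI_LABELS = [1, 1, 0, 0]

SOCIAL_TEXTS = [
    "new study says coffee lowers heart disease risk",
    "scientists found microplastics in human blood",
    "good morning everyone have a great day",
    "cannot wait for the weekend",
]
SOCIAL_LABELS = [1, 1, 0, 0]


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokenization; deliberately simplistic.
    return text.lower().split()


class NaiveBayesCiteWorthiness:
    """Multinomial Naive Bayes with add-one smoothing over labels {0, 1}.

    A hypothetical stand-in baseline chosen for brevity, not the paper's models.
    """

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesCiteWorthiness":
        self.doc_counts = Counter(labels)
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def log_score(self, text: str, c: int) -> float:
        # log P(c) + sum of log P(token | c), with Laplace (add-one) smoothing
        # so that tokens never seen with class c do not zero out the score.
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + len(self.vocab)
        for tok in tokenize(text):
            score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> int:
        # Ties (e.g. a text made only of unseen tokens under equal priors)
        # default to 0, i.e. "no reference needed".
        return 1 if self.log_score(text, 1) > self.log_score(text, 0) else 0


def f1_positive(y_true: list[int], y_pred: list[int]) -> float:
    # F1 score of the cite-worthy class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Train on the toy "scientific paper" domain, evaluate on the toy "social media" domain.
    model = NaiveBayesCiteWorthiness().fit(SCI_TEXTS, SCI_LABELS)
    preds = [model.predict(t) for t in SOCIAL_TEXTS]
    print("cross-domain predictions:", preds)
    print("cross-domain F1 (cite-worthy class):", f1_positive(SOCIAL_LABELS, preds))
```

With these toy sentences, most of the posts' words (e.g., "study", "scientists") never occur in the training vocabulary, so predictions hinge on a few incidental shared words such as "have" or "the". This is one simple way a shift from paper text to social media text can hurt a classifier; it is not an account of the paper's own analysis or results.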
Pages: 19-30
Page count: 12
References (30 in total)
[1] Alam, F. 2023. Working Notes of CLEF.
[2] Allcott, Hunt; Gentzkow, Matthew. 2017. Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives, 31(2): 211-235.
[3] Alperin, J.P. 2024. Quant. Sci. Stud., p. 1.
[4] Arnold, Phoebe. 2020. Technical Report.
[5] August, Tal; Card, Dallas; Hsieh, Gary; Smith, Noah A.; Reinecke, Katharina. 2020. Explain like I am a Scientist: The Linguistic Barriers of Entry to r/science. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20).
[6] Beltagy, I. 2020. arXiv preprint arXiv:2004.05150.
[7] Beltagy, I. 2019. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), p. 3615.
[8] Bird, S. 2008. Sixth International Conference on Language Resources and Evaluation (LREC 2008), p. 1755.
[9] Chandrasekharan, Eshwar; Samory, Mattia; Jhaver, Shagun; Charvat, Hunter; Bruckman, Amy; Lampe, Cliff; Eisenstein, Jacob; Gilbert, Eric. 2018. The Internet's hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW).
[10] de Semir, V. 2000. Int. Microbiol., 3, p. 125.