A Transformer Based Encodings for Detection of Semantically Equivalent Questions in cQA

Cited by: 2
Authors
Kumar, Shobhan [1]
Chauhan, Arun [2]
Affiliations
[1] IIIT Dharwad, Comp Sci & Engn, Dharwad 580009, Karnataka, India
[2] Graph Era Univ, Comp Sci & Engn, Dehra Dun 580009, Uttarakhand, India
Keywords
artificial intelligence; fine-tuning; community question-answers; language model; semantic equivalence; question-answer
DOI
10.1093/comjnl/bxac003
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The probability of redundant questions has increased significantly with the growing influx of users on cQA forums such as Quora and Stack Overflow. Because of this redundancy, responses are scattered across many variations of the same question, which leads to unsatisfactory search results for any specific question. To address this issue, this work proposes a model for discovering semantic similarity among cQA questions. We followed two approaches. (i) Feature-based: the question embedding is created using four forms of word embeddings and an ensemble of all four; a Siamese LSTM (sLSTM) is then used to find the semantic similarity among the questions. (ii) Fine-tuning: we fine-tuned the BERT model on STS and SNLI data, employing Siamese network architectures to generate semantically meaningful sentence embeddings; sBERT is then used to assess the similarity between the questions. Experiments were carried out on the Quora (QQP) and Stack Exchange cQA datasets with training sets of different sizes and word vectors of different dimensionalities. The model shows significant improvement over the state of the art on sentence similarity tasks.
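The Siamese idea mentioned in the abstract (one shared encoder applied to both questions, followed by a similarity score) can be illustrated with a minimal sketch. This is not the authors' implementation: `encode` is a toy bag-of-words stand-in for the sLSTM or sBERT encoder, and `is_duplicate` with its `threshold` of 0.8 is a hypothetical helper introduced only for illustration.

```python
import math
import re
from collections import Counter


def encode(question: str) -> Counter:
    """Toy stand-in encoder: lowercase bag-of-words counts.

    Assumption: a real system would use a Siamese LSTM or sBERT here;
    this function only plays the role of the shared encoder.
    """
    return Counter(re.findall(r"[a-z0-9]+", question.lower()))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty question is similar to nothing
    return dot / (norm_u * norm_v)


def is_duplicate(q1: str, q2: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: both questions pass through the SAME
    encoder (the Siamese part), then the score is thresholded."""
    return cosine_similarity(encode(q1), encode(q2)) >= threshold
```

For example, `is_duplicate("What is BERT?", "what is bert")` holds because both questions map to the same vector, while two questions with no shared tokens score 0.0. A learned encoder would also capture paraphrases that share few surface words, which this toy encoder cannot.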
Pages: 1139-1155
Page count: 17