Query reformulation system based on WordNet and word vectors clusters

Cited by: 0
Authors
Jumde A. [1]
Keskar R. [1]
Affiliations
[1] Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur
Keywords
Query reformulation; TREC; whoosh; word embedding; WordNet
DOI
10.3233/JIFS-236296
Abstract
With the rapid growth of the internet, online access has become commonplace in households. Users rely on search engines or personal assistants to request information, and the results depend heavily on the keywords entered. Casual users may enter a vague query because they lack knowledge of domain-specific terms. We propose a query reformulation system that determines the context of the query, decides which keywords to replace, and outputs an improved query. We propose strategies for keyword replacement and metrics for checking whether a query has improved. We have found that when keywords are projected into a vector space using word embedding techniques and a keyword replacement is correct, the clusters formed by the new set of keywords become more cohesive. This assumption forms the basis of our proposed work. To demonstrate the effectiveness of the proposed system, we applied it to ad-hoc retrieval tasks over two benchmark corpora, TREC-CDS 2014 and OHSUMED. We indexed these corpora with the Whoosh search engine and evaluated the system on the queries provided with each corpus. Experimental results show that the proposed techniques achieved a 9 to 11% improvement in precision and recall scores. Using Google’s popularity index, we also show that the reformulated queries are not only more accurate but also more popular. The proposed system also applies to conversational AI chatbots such as ChatGPT, where users must rephrase their queries to obtain better results. © 2024 – IOS Press. All rights reserved.
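The cohesion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the hand-written candidate list (standing in for WordNet synonyms), the function names, and the use of mean pairwise cosine similarity as the cohesion measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Toy 3-dimensional "embeddings" (hypothetical values, not trained vectors).
TOY_VECTORS = {
    "heart":    (0.9, 0.1, 0.0),
    "attack":   (0.7, 0.3, 0.1),
    "signs":    (0.1, 0.2, 0.9),
    "symptoms": (0.8, 0.2, 0.1),
    "signals":  (0.0, 0.3, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def cohesion(words, vectors=TOY_VECTORS):
    """Mean pairwise cosine similarity of the query keywords.

    This is an assumed stand-in for the cluster-cohesion metric; the
    abstract does not specify the exact measure.
    """
    pairs = list(combinations(words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def reformulate(query, target, candidates, vectors=TOY_VECTORS):
    """Replace `target` with the candidate that most improves cohesion.

    Returns the original query unchanged if no candidate improves it.
    """
    best_query, best_score = list(query), cohesion(query, vectors)
    for candidate in candidates:
        trial = [candidate if word == target else word for word in query]
        score = cohesion(trial, vectors)
        if score > best_score:
            best_query, best_score = trial, score
    return best_query


if __name__ == "__main__":
    original = ["heart", "attack", "signs"]
    improved = reformulate(original, "signs", ["symptoms", "signals"])
    print(original, round(cohesion(original), 3))
    print(improved, round(cohesion(improved), 3))
```

In a fuller version of this sketch, the candidate list would presumably come from WordNet and the vectors from a trained embedding model; both are replaced by hard-coded values here so the example stays self-contained.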
Pages: 9119-9137
Number of pages: 18
Related papers
61 in total
  • [1] Nogueira R., Cho K., Task-oriented query reformulation with reinforcement learning, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 574-583, (2017)
  • [2] Abolghasemi A., Verberne S., Askari A., Azzopardi L., Retrievability bias estimation using synthetically generated queries, Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 3712-3716, (2023)
  • [3] Krishna Adatrao N.S., Gadireddy G.R., Noh J., A survey on conversational search and applications in biomedicine, Proceedings of the 2023 ACM Southeast Conference (ACMSE 2023), pp. 78-88, (2023)
  • [4] Anand A., Anand A., Setty V., et al., Query understanding in the age of large language models
  • [5] Baeza-Yates R., Calderon-Benavides L., Gonzalez-Caro C., The intention behind web queries, International Symposium on String Processing and Information Retrieval, pp. 98-109, (2006)
  • [6] Bhargav A., Bhargav M., Pattern discovery and users classification through web usage mining, 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 632-636, (2014)
  • [7] Bhopale A.P., Tiwari A., Leveraging neural network phrase embedding model for query reformulation in adhoc biomedical information retrieval, Malaysian Journal of Computer Science, 34, 2, pp. 151-170, (2021)
  • [8] Broder A., A taxonomy of web search, ACM Sigir forum
  • [9] Word2vec, pp. 3-10, (2002)
  • [10] Conesa J., Storey V.C., Sugumaran V., Improving web-query processing through semantic knowledge, Data & Knowledge Engineering, 66, 1, pp. 18-34, (2008)