An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Cited by: 0

Authors:

Ben Ayed, Alaidine ^{[1]}

Biskri, Ismail ^{[2]}

Meunier, Jean-Guy ^{[3]}

Affiliations:

[1] Univ Quebec Montreal, Cognit Comp Sci, Montreal, PQ, Canada

[2] Univ Quebec Trois Rivieres, Comp Sci Dept, Computat Linguist & Artificial Intelligence, Trois Rivieres, PQ, Canada

[3] Univ Quebec Montreal, Montreal, PQ, Canada

Source:

INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH | 2022 / Vol. 12 / No. 01

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Data and Knowledge Representation; Document Retrieval; Internet and Web Applications; Mono/Multi-Document Summarization; RELEVANCE;

DOI:

10.4018/IJIRR.289950

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In the context of big data and the Industrial Revolution 4.0 era, enhancing document/information retrieval framework efficiency to handle the ever-growing volume of text data in an ever more digital world is a must. This article describes a double-stage system of document/information retrieval. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is performed on top of the retrieved documents to create a short, accurate, and fluent extract of a longer retrieved single document (or a set of top retrieved documents). Obtained results show that using word embeddings is an excellent way to achieve higher precision rates and retrieve more accurate documents. Also, obtained summaries satisfy the retention and fidelity criteria of relevant summaries.


Pages: 14