MapIntel: Enhancing Competitive Intelligence Acquisition Through Embeddings and Visual Analytics

Cited by: 2
Authors
Silva, David [1]
Bacao, Fernando [1]
Affiliations
[1] Univ Lisbon, NOVA, IMS, Campus Campolide, P-1070312 Lisbon, Portugal
Source
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2022 | 2022 / Vol. 13566
Keywords
Sentence embeddings; Transformer architecture; Visual analytics; Information retrieval; Topic modeling; Competitive Intelligence;
DOI
10.1007/978-3-031-16474-3_49
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex natural language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to support decision-making. The system's search module uses a retriever-and-reranker engine that first finds the nearest neighbors of the query embedding and then passes the results through a cross-encoder model that identifies the most relevant documents. The browsing module also leverages the embeddings by projecting them onto two dimensions while preserving the original landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modeling. This map aims to provide a fast overview of the corpus while allowing more detailed exploration and an interactive information-encountering process. In this work, we evaluate the system and its components on the 20 newsgroups dataset and demonstrate the superiority of Transformer-based components.
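The two-stage search described in the abstract (nearest-neighbor retrieval on embeddings, then reranking of the candidates) can be illustrated with a minimal sketch. This is not MapIntel's implementation: the bag-of-words `embed` function stands in for a sentence-embedding model, `rerank_score` stands in for a cross-encoder, and the names `search`, `k_retrieve`, `k_final`, and the toy `DOCS` corpus are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Assumption: a sparse bag-of-words vector stands in for a
    Transformer sentence embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Assumption: a stand-in for a cross-encoder. It scores the
    (query, document) pair jointly as the fraction of distinct
    query tokens that also occur in the document."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def search(query, docs, doc_vectors, k_retrieve=3, k_final=2):
    """Stage 1: retrieve the k_retrieve nearest documents by cosine
    similarity to the query vector. Stage 2: rerank only those
    candidates with the pairwise scorer and keep the top k_final."""
    query_vector = embed(query)
    candidates = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vector, doc_vectors[i]),
        reverse=True,
    )[:k_retrieve]
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:k_final]]


# Toy corpus (hypothetical), loosely in the spirit of newsgroup topics.
DOCS = [
    "The graphics card renders 3D images quickly",
    "The hockey team won the playoff game",
    "New encryption keys protect private email",
    "The baseball game went to extra innings",
]
DOC_VECTORS = [embed(doc) for doc in DOCS]

if __name__ == "__main__":
    for hit in search("who won the hockey game", DOCS, DOC_VECTORS):
        print(hit)
```

The point of the split is cost: the first stage compares the query against precomputed document vectors and is cheap, while the pairwise scorer must process the query together with each document, so it runs only on the short candidate list.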
Pages: 599-610
Page count: 12