Transformer-Based Approach for Automatic Semantic Financial Document Verification

Cited by: 0
Authors
Toprak, Ahmet [1]
Turan, Metin [1]
Affiliations
[1] Istanbul Ticaret Univ, Dept Comp Engn, TR-34840 Istanbul, Turkiye
Keywords
Transformers; Semantics; Accuracy; Manuals; Personnel; Data models; Feature extraction; Training data; Costs; Document handling; Financial management; Document verification; semantic analysis; transformer; abstract summarization; Reuters financial datasets; SIMILARITY
DOI
10.1109/ACCESS.2024.3477270
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Document verification is the process of checking a summary document against the original full-text document it was derived from. Semantic checking is critical in this process. In this study, an automatic document verification system based on Natural Language Processing techniques was designed to check the semantic consistency of an abstractive summary against its original document or documents, specifically for financial documents. The summaries were verified against the full-text documents using a Transformer-based model. Because the reference documents in the study are financial, the Transformer model was trained on the Reuters financial dataset. The proposed approach was tested on pairs of full-text and summary documents. Both were first subjected to data pre-processing and spell checking. Then, for each summary sentence, the most similar sentences in the full-text document were identified with the Simhash and Cross Encoder text similarity algorithms. This heuristic step completes the proposed verification system. For each summary sentence, the two most similar full-text sentences were selected (the number two was chosen experimentally). These full-text sentences were then given to the Transformer model as training data, and the model produced an abstractive summary of them. In the last stage, the original summary and the summary produced by the Transformer model were compared using both Simhash and Cross Encoder similarity, and the average document verification accuracy was calculated.
The proposed Transformer-based semantic document verification approach achieved an average verification accuracy of 84.1% on financial documents from the Reuters financial dataset. This study makes three contributions to semantic document verification. First, it introduces a Transformer-based model tailored to financial texts and trained on the Reuters financial dataset, aimed at more precise handling of financial language. Second, it applies Natural Language Processing techniques for semantic analysis to verify the consistency of document summaries. Third, it proposes a hybrid methodology that combines Transformer models with sentence grouping techniques to generate accurate and informative abstractive summaries. Together, these contributions advance the automation and precision of document verification.
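The sentence-matching and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no implementation details, so the word-level tokenization, the MD5-based 64-bit fingerprint, and the function names `simhash`, `similarity`, `top_k_matches`, and `average_similarity` are all assumptions made for illustration. The Cross Encoder comparison and the Transformer summarizer are omitted.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Build a Simhash fingerprint from lowercase word tokens (assumed tokenization)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position of the fingerprint.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def similarity(a: str, b: str, bits: int = 64) -> float:
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


def top_k_matches(summary_sentence: str, full_text_sentences: list[str], k: int = 2) -> list[str]:
    """Return the k full-text sentences most similar to one summary sentence."""
    ranked = sorted(
        full_text_sentences,
        key=lambda s: similarity(summary_sentence, s),
        reverse=True,
    )
    return ranked[:k]


def average_similarity(pairs: list[tuple[str, str]]) -> float:
    """Mean similarity over (original summary, generated summary) sentence pairs."""
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    full_text = [
        "The bank reported higher quarterly profit on strong loan growth.",
        "Shares rose after the earnings announcement.",
        "The weather in the region was unusually mild.",
    ]
    summary = "The bank reported higher quarterly profit."
    print(top_k_matches(summary, full_text, k=2))
```

The default `k=2` mirrors the two sentences per summary sentence that the abstract reports as the experimentally chosen setting. In this sketch, `average_similarity` stands in for the final accuracy calculation using Simhash alone.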
Pages: 184327-184349
Number of pages: 23