Transformer-Based Approach for Automatic Semantic Financial Document Verification

Cited by: 0
Authors
Toprak, Ahmet [1]
Turan, Metin [1]
Affiliations
[1] Istanbul Ticaret Univ, Dept Comp Engn, TR-34840 Istanbul, Turkiye
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Transformers; Semantics; Accuracy; Manuals; Personnel; Data models; Feature extraction; Training data; Costs; Document handling; Financial management; Document verification; semantic analysis; transformer; abstract summarization; Reuters financial datasets; SIMILARITY;
DOI
10.1109/ACCESS.2024.3477270
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document verification is the process of verifying a summary document against its original full-text document. Semantic control is critical in this process. In this study, an automatic document verification system based on Natural Language Processing techniques was designed to check the semantic consistency of an abstractive summary produced for one or more original documents, specifically documents of the financial type. Summaries were verified against the original full-text documents by means of a Transformer-based model. Because the reference documents to be verified are financial, the Transformer model was trained on the Reuters financial dataset. The proposed Transformer-based semantic document verification approach was tested on original full-text documents and their summaries. Both the full-text and the summary documents first underwent data pre-processing and spell checking. Then, because each summary is verified against its full-text document, the full-text sentences most similar to each summary sentence were identified using the Simhash and Cross Encoder text similarity algorithms. This heuristic step completes the proposed verification system. For each summary sentence, the two most similar full-text sentences were selected (the number two was chosen experimentally). These full-text sentences were then fed to the Transformer model as training data. Finally, the Transformer model produced an abstractive summary of the selected full-text sentences. In the last stage, the original summary and the summary produced by the Transformer model were compared using both the Simhash and Cross Encoder text similarity algorithms, and the average document verification accuracy was calculated.
The proposed approach achieved an average semantic verification accuracy of 84.1% on the financial documents in the Reuters financial dataset. This study makes three contributions to semantic document verification. First, we introduce a Transformer-based model tailored to financial texts and trained on the Reuters financial dataset, which improves precision in understanding financial language. Second, our approach applies Natural Language Processing techniques for deep semantic analysis to verify the consistency of document summaries. Third, we propose a hybrid methodology that combines Transformer models with sentence grouping techniques to generate accurate and informative abstractive summaries. Together, these contributions advance the automation and precision of document verification.
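The sentence-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization, the MD5-based token hash, the 64-bit fingerprint width, and the function names `simhash`, `simhash_similarity`, and `top_k_similar` are all assumptions made for illustration, and the Cross Encoder scoring used alongside Simhash in the paper is omitted.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a `bits`-wide Simhash fingerprint built from word tokens.

    Assumption: tokens are lowercase word characters and each token is
    hashed with MD5; the paper does not specify these details.
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a, b, bits=64):
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


def top_k_similar(summary_sentence, full_text_sentences, k=2):
    """Return the k full-text sentences most similar to the summary sentence.

    k=2 mirrors the experimentally chosen value reported in the abstract.
    """
    ranked = sorted(
        full_text_sentences,
        key=lambda s: simhash_similarity(summary_sentence, s),
        reverse=True,
    )
    return ranked[:k]
```

Under these assumptions, calling `top_k_similar(summary_sentence, full_text_sentences)` returns the two candidate sentences that would be passed on to the summarization model. The same `simhash_similarity` function could also score the generated summary against the original summary in the final comparison stage.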
Pages: 184327-184349
Page count: 23