Transformer-Based Approach for Automatic Semantic Financial Document Verification

Cited by: 0
Authors
Toprak, Ahmet [1]
Turan, Metin [1]
Affiliation
[1] Istanbul Ticaret Univ, Dept Comp Engn, TR-34840 Istanbul, Turkiye
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Transformers; Semantics; Accuracy; Manuals; Personnel; Data models; Feature extraction; Training data; Costs; Document handling; Financial management; Document verification; semantic analysis; transformer; abstract summarization; Reuters financial datasets; SIMILARITY;
DOI
10.1109/ACCESS.2024.3477270
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document verification is the process of checking a summary document against the original full-text document it summarizes, and semantic control is critical in this process. In this study, an automatic document verification system based on Natural Language Processing techniques was designed to check semantically whether an abstract summary is consistent with the original document or documents, with a particular focus on financial documents. Abstract summaries were verified against the original full-text documents using a Transformer-based model. Because the reference documents to be verified are financial, the Transformer model was trained on the Reuters financial dataset. The proposed Transformer-based semantic document verification approach was tested on original full-text and summary documents. Both the full-text and summary documents first underwent data pre-processing and spell checking. Next, because each summary document is verified against its full-text document, the Simhash and Cross Encoder text similarity algorithms were used to find, for each summary sentence, the most similar sentences in the full-text document. This heuristic step completes the proposed verification system. For each summary sentence, the two most similar full-text sentences were selected (the number two was determined experimentally). These full-text sentences were then fed to the Transformer model as training data, and the model produced an abstract summary of them. In the final stage, the original summary and the summary produced by the Transformer model were compared for similarity with both the Simhash and Cross Encoder algorithms, and the average document verification accuracy was calculated.
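The sentence-matching step described above can be sketched with a plain Simhash fingerprint. This is a minimal, illustrative sketch and not the authors' implementation: the regex tokenizer, the 64-bit BLAKE2b token hash, and the names `simhash`, `simhash_similarity`, and `top_k_sentences` are assumptions, and the Cross Encoder scoring that the study uses alongside Simhash is not reproduced. Similarity is taken as one minus the normalized Hamming distance between two fingerprints, and `k=2` mirrors the two full-text sentences selected per summary sentence.

```python
import hashlib
import re

BITS = 64  # fingerprint width; assumption, the paper does not state it


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for the paper's pre-processing.
    return re.findall(r"[a-z0-9]+", text.lower())


def simhash(text: str) -> int:
    """Classic Simhash: each token hash votes +1/-1 on every bit position."""
    votes = [0] * BITS
    for token in tokenize(text):
        # BLAKE2b with an 8-byte digest gives exactly 64 hash bits per token.
        h = int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(BITS):
        if votes[i] > 0:  # bit is set where the positive votes win
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a: str, b: str) -> float:
    """1 minus the normalized Hamming distance between two fingerprints."""
    distance = bin(simhash(a) ^ simhash(b)).count("1")
    return 1.0 - distance / BITS


def top_k_sentences(summary_sentence: str, full_text_sentences: list[str], k: int = 2) -> list[str]:
    """Return the k full-text sentences most similar to one summary sentence."""
    ranked = sorted(
        full_text_sentences,
        key=lambda s: simhash_similarity(summary_sentence, s),
        reverse=True,  # highest similarity first; ties keep input order
    )
    return ranked[:k]
```

Because Simhash only reflects surface token overlap, a semantic scorer such as a Cross Encoder would be needed to catch paraphrases; that is presumably why the study combines the two.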
The proposed Transformer-based semantic document verification approach achieved an average semantic verification accuracy of 84.1% on the financial documents in the Reuters financial dataset. This study makes several key contributions to the field of semantic document verification. First, we introduce a Transformer-based model tailored to financial texts and trained on the Reuters financial dataset, which offers enhanced precision in understanding financial language. Second, our approach employs advanced Natural Language Processing techniques for deep semantic analysis to verify the consistency of document summaries. Third, we propose a novel hybrid methodology that integrates Transformer models with sentence grouping techniques to generate accurate and informative abstract summaries. Together, these innovations mark a substantial advance in the automation and precision of document verification processes.
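The sentence-grouping pipeline and the averaged accuracy figure can be outlined as follows. This is a hedged sketch under stated assumptions: `verify_document`, `corpus_accuracy`, and `jaccard` are hypothetical names; `summarize` stands in for the trained Transformer summarizer and `retrieve` for the Simhash/Cross Encoder sentence selection; scoring each summary sentence separately is an assumption; and since the abstract does not say how the similarity scores are combined, an unweighted mean is assumed at every level. Jaccard word overlap is only a placeholder similarity so the example runs without a model.

```python
from statistics import mean
from typing import Callable

# Hypothetical type aliases for the pluggable pieces of the pipeline.
Similarity = Callable[[str, str], float]            # score in [0, 1]
Retriever = Callable[[str, list[str]], list[str]]   # picks matching full-text sentences
Summarizer = Callable[[str], str]                   # stand-in for the trained Transformer


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: word-set overlap (not a measure used in the study)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def verify_document(
    summary_sentences: list[str],
    full_text_sentences: list[str],
    retrieve: Retriever,
    summarize: Summarizer,
    similarities: list[Similarity],
) -> float:
    """Document-level verification score in [0, 1], using unweighted means (assumed)."""
    per_sentence_scores = []
    for sentence in summary_sentences:
        # Sentence grouping: join the retrieved full-text sentences into one passage.
        group = " ".join(retrieve(sentence, full_text_sentences))
        generated = summarize(group)
        # Score the generated summary against the original summary sentence
        # with every similarity measure, then average those scores.
        per_sentence_scores.append(mean(sim(sentence, generated) for sim in similarities))
    return mean(per_sentence_scores)


def corpus_accuracy(document_scores: list[float]) -> float:
    """Average verification accuracy across documents, as a percentage."""
    return 100.0 * mean(document_scores)
```

Swapping in a real summarizer and real similarity measures changes only the callables passed in; the grouping and averaging logic stays the same.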
Pages: 184327-184349
Number of pages: 23
Related papers
(50 in total)
  • [31] Automatic assessment of divergent thinking in Chinese language with TransDis: A transformer-based language model approach
    Yang, Tianchen
    Zhang, Qifan
    Sun, Zhaoyang
    Hou, Yubo
    BEHAVIOR RESEARCH METHODS, 2024, 56 (06) : 5798 - 5819
  • [32] Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems
    Viet The Bui
    Tho Chi Luong
    Oanh Thi Tran
    CYBERNETICS AND SYSTEMS, 2024, 55 (07) : 1614 - 1630
  • [33] The interactive reading task: Transformer-based automatic item generation
    Attali, Yigal
    Runge, Andrew
    LaFlair, Geoffrey T.
    Yancey, Kevin
    Goodwin, Sarah
    Park, Yena
    von Davier, Alina A.
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 5
  • [34] Automatic Detection of Sensitive Data Using Transformer-Based Classifiers
    Petrolini, Michael
    Cagnoni, Stefano
    Mordonini, Monica
    FUTURE INTERNET, 2022, 14 (08)
  • [35] Transformer-Based Models for the Automatic Indexing of Scientific Documents in French
    Angel Gonzalez, Jose
    Buscaldi, Davide
    Sanchis, Emilio
    Hurtado, Lluis-F
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 60 - 72
  • [36] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [37] Vision Transformer-Based Automatic Crack Detection on Dam Surface
    Zhou, Jian
    Zhao, Guochuan
    Li, Yonglong
    WATER, 2024, 16 (10)
  • [38] Explaining transformer-based models for automatic short answer grading
    Poulton, Andrew
    Eliens, Sebas
    5TH INTERNATIONAL CONFERENCE ON DIGITAL TECHNOLOGY IN EDUCATION, ICDTE 2021, 2021, : 110 - 116
  • [39] A Transformer-Based Pipeline for German Clinical Document De-Identification
    Arzideh, Kamyar
    Baldini, Giulia
    Winnekens, Philipp
    Friedrich, Christoph M.
    Nensa, Felix
    Idrissi-Yaghir, Ahmad
    Hosch, Rene
    APPLIED CLINICAL INFORMATICS, 2025, 16 (01) : 31 - 43
  • [40] Doc-Former: A transformer-based document shadow denoising network
    Pei, Shengchang
    Liu, Jun
    Yi, Niannian
    Zhang, Yun
    Liu, Zhengtao
    Chen, Zengyan
    2023 THE 6TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA 2023, 2023, : 139 - 143