Transformer-Based Approach for Automatic Semantic Financial Document Verification

Cited by: 0
Authors
Toprak, Ahmet [1]
Turan, Metin [1]
Affiliation
[1] Istanbul Ticaret Univ, Dept Comp Engn, TR-34840 Istanbul, Turkiye
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Transformers; Semantics; Accuracy; Manuals; Personnel; Data models; Feature extraction; Training data; Costs; Document handling; Financial management; Document verification; semantic analysis; transformer; abstract summarization; Reuters financial datasets; SIMILARITY;
DOI
10.1109/ACCESS.2024.3477270
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document verification is the process of checking a summary document against the original full-text document, and semantic control is critical to this process. In this study, an automatic document verification system based on Natural Language Processing techniques was designed to semantically check the consistency of an abstract summary produced for one or more original documents, particularly financial ones. Abstract summaries were verified against the original full-text documents using a Transformer-based model. Because the reference documents to be verified are financial, the Transformer model was trained on the Reuters financial dataset. The proposed Transformer-based semantic document verification approach was tested on original full-text documents and their summaries. Both the full-text and summary documents were first subjected to data pre-processing and spell checking. Then, because the summary is verified against the full text, the full-text sentences most similar to each summary sentence were identified using the Simhash and Cross Encoder text similarity algorithms. This heuristic step completes the proposed verification system. For each summary sentence, the two most similar full-text sentences were selected; the number two was determined experimentally. These full-text sentences were then supplied to the Transformer model as training data. Finally, the Transformer model produced an abstract summary of the selected full-text sentences. In the last stage, the original summary and the summary produced by the Transformer model were compared using both the Simhash and Cross Encoder text similarity algorithms, and the average document verification accuracy was calculated.
The proposed Transformer-based semantic document verification approach achieved an average semantic verification accuracy of 84.1% on the financial documents in the Reuters financial dataset. This study makes three key contributions to the field of semantic document verification. First, we introduce a Transformer-based model tailored for financial texts, trained on the Reuters financial dataset, which offers enhanced precision in understanding financial language. Second, our approach employs advanced Natural Language Processing techniques for deep semantic analysis to verify the consistency of document summaries. Third, we propose a novel hybrid methodology that integrates Transformer models with sentence grouping techniques to generate accurate and informative abstract summaries. Together, these innovations mark a substantial advancement in the automation and precision of document verification processes.
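
The sentence-matching step described in the abstract (finding the two full-text sentences most similar to each summary sentence) can be illustrated with a minimal Simhash sketch. This is not the authors' implementation: the tokenization, the use of MD5 as the per-token hash, the 64-bit fingerprint size, and the function names `simhash`, `similarity`, and `top_k_similar` are all assumptions made for illustration. The Cross Encoder scoring that the abstract pairs with Simhash is not shown.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption, not taken from the paper


def simhash(text: str) -> int:
    """Return a 64-bit Simhash fingerprint of the text's word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # First 8 bytes of the MD5 digest give a 64-bit hash of the token.
        digest = hashlib.md5(token.encode("utf-8")).digest()[:8]
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    distance = bin(simhash(a) ^ simhash(b)).count("1")
    return 1.0 - distance / BITS


def top_k_similar(summary_sentence: str, full_text_sentences: list, k: int = 2) -> list:
    """Return the k full-text sentences most similar to the summary sentence."""
    ranked = sorted(
        full_text_sentences,
        key=lambda s: similarity(summary_sentence, s),
        reverse=True,
    )
    return ranked[:k]
```

Calling `top_k_similar(summary_sentence, full_text_sentences)` with the default `k=2` mirrors the experimentally chosen value of two sentences mentioned in the abstract. The same `similarity` function could then be applied to the original and generated summaries, and the scores averaged across documents.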
Pages: 184327-184349
Page count: 23