AutoPaperBench: An MLLM-Based Framework for Automatic Generation of Paper Understanding Evaluation Benchmarks

Cited by: 0
Authors
Kim, Min-Woo [1]
Park, Hyo-Bin [1]
Ahn, Hee-Jin [1]
Park, Woo-Ram [1]
Jeon, Jae-Wan [1]
Lee, Kyong-Ha [2]
Lee, Ryong [2]
Choi, Dong-Geol [1]
Affiliations
[1] Department of Information and Communication Engineering, Hanbat National University, Daejeon 34158, South Korea
[2] Large-Scale AI Research Group, Korea Institute of Science and Technology Information, Daejeon 34141, South Korea
Source
ELECTRONICS | 2025 / Vol. 14 / No. 6
Keywords
large language model; deep learning; benchmark; research paper evaluation system
DOI
10.3390/electronics14061175
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
AutoPaperBench is a benchmark generation system for automatically evaluating how well Multimodal Large Language Models (MLLMs) comprehend research papers. The system structures the content of a paper through semantic parsing and then automatically generates text-based question-answer (QA) pairs and visual question-answer (VQA) pairs. To ensure the quality of the generated QA pairs, we introduce a reviewer system that scores them on six criteria, including logic and appropriateness. In experiments on 60 research papers from the medical, natural science, and engineering fields, the generated benchmarks produce model performance rankings comparable to those of existing benchmarks, and the performance improvements achieved through semantic parsing are validated. The system runs in a single-GPU environment and provides a framework for efficiently evaluating how well LLMs understand research papers.
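The abstract reports that the generated benchmarks rank models comparably to existing benchmarks but does not state which agreement measure was used. One standard way to quantify agreement between two model rankings is Spearman's rank correlation. The following is a minimal sketch under that assumption, taking one score per model from each benchmark; the function names `average_ranks` and `spearman_rho`, the model names, and the scores are all hypothetical illustrations, not the authors' method or results.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank scores so that 1 is the highest; tied scores share their mean rank."""
    # Indices sorted by descending score (Python's sort is stable, even with reverse=True).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied scores.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Spearman's rank correlation of two benchmarks over the models both of them score."""
    models = sorted(set(scores_a) & set(scores_b))
    if len(models) < 2:
        raise ValueError("need at least two models scored by both benchmarks")
    ra = average_ranks([scores_a[m] for m in models])
    rb = average_ranks([scores_b[m] for m in models])
    # Pearson correlation of the rank vectors; this form stays correct when ties occur.
    n = len(models)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a benchmark gives every model the same score")
    return cov / (var_a * var_b) ** 0.5


# Hypothetical accuracy scores (%) for four hypothetical models; not data from the paper.
generated = {"model_a": 71.2, "model_b": 64.5, "model_c": 58.0, "model_d": 49.3}
existing = {"model_a": 66.0, "model_b": 68.1, "model_c": 52.4, "model_d": 50.2}

print(f"{spearman_rho(generated, existing):.2f}")  # prints 0.80 (one adjacent swap)
```

A value near 1 means the two benchmarks order the models almost identically, and a value near -1 means they reverse each other. If a dependency is acceptable, SciPy's `scipy.stats.spearmanr` computes the same statistic, also using average ranks for ties.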
Pages: 20