AutoPaperBench: An MLLM-Based Framework for Automatic Generation of Paper Understanding Evaluation Benchmarks

Cited by: 0

Authors:

Kim, Min-Woo [1]

Park, Hyo-Bin [1]

Ahn, Hee-Jin [1]

Park, Woo-Ram [1]

Jeon, Jae-Wan [1]

Lee, Kyong-Ha [2]

Lee, Ryong [2]

Choi, Dong-Geol [1]

Affiliations:

[1] Hanbat Natl Univ, Dept Informat & Commun Engn, Daejeon 34158, South Korea

[2] Korea Inst Sci & Technol Informat, Dept Large Scale AI Res Grp, Daejeon 34141, South Korea

Source:

ELECTRONICS | 2025 / Vol. 14 / Issue 06

Keywords:

large language model; deep learning; benchmark; research paper evaluation system

DOI:

10.3390/electronics14061175

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

AutoPaperBench is a benchmark generation system for automatically evaluating how well a Multimodal Large Language Model (MLLM) comprehends research papers. The system efficiently structures the content of a paper through semantic parsing and automatically generates text-based question-answer pairs (QAs) and image-based visual question-answer pairs (VQAs). To ensure the quality of the generated QAs, we introduce a reviewer system that evaluates them against six criteria, including logic and appropriateness. In experiments on 60 research papers from the medical, natural science, and engineering fields, the generated benchmarks yield model performance rankings comparable to those of previous benchmarks, and the performance improvements achieved through semantic parsing are validated. The system runs in a single-GPU environment and provides a framework for efficiently evaluating LLM paper comprehension.


Pages: 20

References: 43 in total

