AutoPaperBench: An MLLM-Based Framework for Automatic Generation of Paper Understanding Evaluation Benchmarks

Times cited: 0

Authors:

Kim, Min-Woo [1]

Park, Hyo-Bin [1]

Ahn, Hee-Jin [1]

Park, Woo-Ram [1]

Jeon, Jae-Wan [1]

Lee, Kyong-Ha [2]

Lee, Ryong [2]

Choi, Dong-Geol [1]

Affiliations:

[1] Hanbat Natl Univ, Dept Informat & Commun Engn, Daejeon 34158, South Korea

[2] Korea Inst Sci & Technol Informat, Dept Large Scale AI Res Grp, Daejeon 34141, South Korea

Source:

ELECTRONICS | 2025 / Vol. 14 / Issue 06

Keywords:

large language model; deep learning; benchmark; research paper evaluation system

DOI:

10.3390/electronics14061175

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

AutoPaperBench is a benchmark generation system for automatically evaluating how well a Multimodal Large Language Model (MLLM) understands research papers. The system structures the content of a paper through semantic parsing and automatically generates text-based question-answer pairs (QAs) and image-based visual question-answer pairs (VQAs). To ensure the quality of the generated QAs, we introduce a reviewer system that evaluates them on six criteria, including logic and appropriateness. In experiments on 60 research papers from the medical, natural science, and engineering fields, the generated benchmarks produce model performance rankings comparable to those of previous benchmarks, and the performance improvements achieved through semantic parsing are validated. The system can run in a single-GPU environment and provides a framework for efficiently evaluating how well LLMs comprehend research papers.


Pages: 20

