AutoPaperBench: An MLLM-Based Framework for Automatic Generation of Paper Understanding Evaluation Benchmarks
Cited by: 0
Authors:
Kim, Min-Woo [1]
Park, Hyo-Bin [1]
Ahn, Hee-Jin [1]
Park, Woo-Ram [1]
Jeon, Jae-Wan [1]
Lee, Kyong-Ha [2]
Lee, Ryong [2]
Choi, Dong-Geol [1]
Affiliations:
[1] Hanbat Natl Univ, Dept Informat & Commun Engn, Daejeon 34158, South Korea
[2] Korea Inst Sci & Technol Informat, Dept Large Scale AI Res Grp, Daejeon 34141, South Korea
Source:
ELECTRONICS | 2025 | Volume 14 | Issue 6
Keywords:
large language model;
deep learning;
benchmark;
research paper evaluation system;
DOI:
10.3390/electronics14061175
Chinese Library Classification (CLC) number:
TP [Automation technology, computer technology];
Discipline classification code:
0812;
Abstract:
AutoPaperBench is a benchmark generation system for automatically evaluating how well a Multimodal Large Language Model (MLLM) comprehends research papers. The system structures the content of a paper through semantic parsing and automatically generates text-based question-answer pairs (QAs) and visual question-answer pairs (VQAs). To ensure the quality of the generated QAs, we introduce a reviewer system that evaluates them against six criteria, such as logic and appropriateness. In our experiments on 60 research papers from the medical, natural science, and engineering fields, the generated benchmarks yield performance rankings comparable to those of previous benchmarks, and the performance improvements achieved through semantic parsing are validated. The system runs in a single-GPU environment and provides a framework for efficiently evaluating LLM paper comprehension.
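The reviewer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `QAItem` type, the function names, the 1-5 scoring scale, and the acceptance threshold are all hypothetical, and only the two criteria named in the abstract (logic and appropriateness) are used, since the remaining four are not listed there. The sample questions and scores are made up for illustration.

```python
from dataclasses import dataclass, field

# Only "logic" and "appropriateness" are named in the abstract; the other
# four criteria are not listed there, so they are deliberately left out.
CRITERIA = ("logic", "appropriateness")


@dataclass
class QAItem:
    question: str
    answer: str
    kind: str  # "qa" for text-based items, "vqa" for visual items
    scores: dict = field(default_factory=dict)  # criterion -> reviewer score


def passes_review(item, threshold=4):
    """Accept an item only if every criterion scores at or above threshold.

    The 1-5 scale and the threshold of 4 are assumptions for illustration.
    A criterion with no score counts as a failure.
    """
    return all(item.scores.get(c, 0) >= threshold for c in CRITERIA)


def filter_benchmark(items, threshold=4):
    """Keep only the generated items that pass the reviewer check."""
    return [it for it in items if passes_review(it, threshold)]


# Made-up candidate items with made-up reviewer scores.
candidates = [
    QAItem("What problem does the paper address?", "Benchmark generation",
           "qa", {"logic": 5, "appropriateness": 4}),
    QAItem("What does the diagram show?", "A processing pipeline",
           "vqa", {"logic": 2, "appropriateness": 5}),
]
kept = filter_benchmark(candidates)
print([it.question for it in kept])
```

Requiring every criterion to clear the threshold, rather than averaging the scores, is one possible design choice: it prevents a high score on one criterion from masking a low score on another.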