Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Times cited: 0

Authors:

Yamac, Aylin [1]

Genc, Dilan [1]

Zaman, Esra [1]

Gerschner, Felix [1]

Klaiber, Marco [1]

Theissler, Andreas [1]

Affiliation:

[1] Aalen University of Applied Sciences, Aalen, Germany

Source:

2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024

Keywords:

text-to-image; open-source; weaknesses

DOI:

10.1109/COMPSAC61105.2024.00261

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-to-image models, which aim to convert text input into images, have gained popularity, partly because of their flexibility and user-friendliness. However, they still show weaknesses when generating images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes these weaknesses in three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics CLIPScore, Fréchet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION), and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and it confirmed the identified weaknesses.

Pages: 1659-1664

Number of pages: 6