Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception


Authors:

Yamac, Aylin [1]

Genc, Dilan [1]

Zaman, Esra [1]

Gerschner, Felix [1]

Klaiber, Marco [1]

Theissler, Andreas [1]

Affiliations:

[1] Aalen University of Applied Sciences, Aalen, Germany

Source:

2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC 2024), 2024

Keywords:

text-to-image; open-source; weaknesses

DOI:

10.1109/COMPSAC61105.2024.00261

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics ClipScore, Frechet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
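For orientation, two of the metrics named in the abstract can be sketched from their commonly published definitions. This record does not describe the authors' implementation, so the following is only an illustration: the function names `frechet_distance` and `clip_score` are hypothetical, the inputs are plain NumPy arrays standing in for Inception feature statistics and CLIP embeddings, and the rescaling weight `w = 2.5` is taken from the original CLIPScore definition rather than from this paper.

```python
import numpy as np


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between two Gaussians N(mu_r, sigma_r) and N(mu_g, sigma_g).

    FID applies this formula to feature statistics of real and generated
    images; here the statistics are assumed to be given.
    """
    diff = mu_r - mu_g
    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative for
    # covariance matrices. Clipping guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


def clip_score(image_emb, text_emb, w=2.5):
    """Rescaled, non-negative cosine similarity between two embeddings.

    The embeddings are assumed to come from a CLIP image and text encoder;
    any two vectors of equal length work for this sketch.
    """
    cos = float(image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb)))
    return w * max(cos, 0.0)
```

A lower Frechet distance indicates that the generated feature distribution is closer to the reference distribution, while a higher CLIP-based score indicates closer agreement between image and prompt.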


Pages: 1659-1664

Page count: 6
