Image to Text Conversion: State of the Art and Extended Work

Cited by: 9

Authors:

Farhani, Nada [1]

Terbeh, Naim [1]

Zrigui, Mounir [1]

Affiliations:

[1] LaTICE Lab, Monastir, Tunisia

Source:

2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA) | 2017

Keywords:

Modality; learning; image processing; automatic phrase generation; PTT Conversion

DOI:

10.1109/AICCSA.2017.159

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This article studies the conversion of information between modalities (text and image). The topic is motivated by the evolution of human-machine communication, which has introduced modalities that are natural to humans, such as gestures, speech, sound, and vision. One of the main challenges of this "multimodal" learning is learning a shared representation across the distinct modalities and predicting missing data (for example, by retrieval or synthesis) in one modality conditioned on another. Existing research covers several types of conversion, including Text to Speech, Speech to Picture, and Text to Picture synthesis, as well as the reverse directions; this paper focuses on Text to Picture (TTP) and Picture to Text (PTT) synthesis.


Pages: 937-943

Page count: 7