Title
Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis
Document Type
Article
Publication Date
1-1-2022
Abstract
Synthesizing photo-realistic images from text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very difficult. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between a text description and its synthesized image for fine-grained text-to-image generation. Two new modules are proposed in CSM-GAN: the Text Encoder Module (TEM) and the Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to make the distance, in a global semantic embedding space, between a synthesized image and its corresponding text description smaller than the distances of mismatched pairs. This improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over other representative state-of-the-art methods.
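The abstract describes two components: a convolutional text encoder (TEM) and a matching objective that pulls matched image-text pairs closer than mismatched ones in a shared embedding space (TVSMM). The sketch below illustrates these ideas under assumptions of our own: a multi-window 1-D Text_CNN over word embeddings and a bidirectional hinge ranking loss with cosine similarity and a fixed margin. Class names, layer sizes, and the margin value are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: a toy Text_CNN encoder and a cross-modal hinge
# ranking loss, assuming cosine similarity over L2-normalized embeddings.
# Not the authors' published architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    """Convolutional sentence encoder in the spirit of TEM: 1-D convolutions
    with several window sizes over word embeddings, max-pooled and projected
    to a sentence embedding. All sizes here are assumptions."""

    def __init__(self, vocab_size=5000, emb_dim=300, n_filters=128,
                 windows=(2, 3, 4), out_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in windows)
        self.proj = nn.Linear(n_filters * len(windows), out_dim)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)      # (batch, emb_dim, seq_len)
        feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.proj(torch.cat(feats, dim=1))   # (batch, out_dim)


def cross_modal_matching_loss(img_emb, txt_emb, margin=0.2):
    """Push each matched image-text pair closer than mismatched pairs.

    img_emb, txt_emb: (batch, dim); row i of each forms a matched pair.
    """
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()                 # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)               # similarities of matched pairs

    # Hinge loss in both retrieval directions, ignoring the diagonal
    # (the matched pairs themselves).
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_i2t.mean() + cost_t2i.mean()


# Usage with stand-in data, just to show the expected shapes.
tokens = torch.randint(0, 5000, (8, 16))        # 8 captions of 16 word ids
img_emb = torch.randn(8, 256)                   # stand-in image features
loss = cross_modal_matching_loss(img_emb, TextCNN()(tokens))
```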
Publication Source (Journal or Book title)
IEEE Transactions on Multimedia
First Page
832
Last Page
845
Recommended Citation
Tan, H., Liu, X., Yin, B., & Li, X. (2022). Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis. IEEE Transactions on Multimedia, 24, 832-845. https://doi.org/10.1109/TMM.2021.3060291