Context-Aware Layout to Image Generation with Enhanced Object Appearance

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Sen He
  • Wentong Liao
  • Michael Ying Yang
  • Yongxin Yang
  • Yi-Zhe Song
  • Bodo Rosenhahn
  • Tao Xiang

Research Organisations

External Research Organisations

  • University of Surrey
  • University of Twente

Details

Original language: English
Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 15044-15053
Number of pages: 10
ISBN (electronic): 978-1-6654-4509-2
ISBN (print): 978-1-6654-4510-8
Publication status: Published - 2021

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919
ISSN (electronic): 2575-7075

Abstract

A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against a natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken, and (2) each object's appearance is typically distorted, lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and of location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.
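
The two modules summarised in the abstract lend themselves to a brief code illustration. The following is a minimal PyTorch-style sketch of the ideas only: the class and function names, feature dimensions, and the use of multi-head self-attention for the context-aware transformation are assumptions made here for illustration, not the authors' actual architecture.

```python
# Minimal sketch of the two ideas from the abstract (illustrative only):
# (1) a context-aware transformation that lets each object/stuff code attend
#     to all co-existing codes in the layout, and
# (2) a Gram matrix computed from an object's feature maps as the statistic
#     passed to the discriminator.
# All names, shapes, and the choice of nn.MultiheadAttention are assumptions.
import torch
import torch.nn as nn


class ContextAwareTransform(nn.Module):
    """Each object/stuff feature vector attends to every other one in the scene."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_instances, dim), one row per object/stuff region
        ctx, _ = self.attn(feats, feats, feats)  # self-attention over instances
        return self.norm(feats + ctx)            # residual keeps the original code


def gram_matrix(fmap: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map: (batch, C, H, W) -> (batch, C, C)."""
    b, c, h, w = fmap.shape
    flat = fmap.view(b, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (h * w)


if __name__ == "__main__":
    obj_feats = torch.randn(2, 5, 128)   # 2 layouts, 5 object/stuff instances each
    print(ContextAwareTransform(128)(obj_feats).shape)  # (2, 5, 128)
    crop = torch.randn(2, 64, 32, 32)    # feature maps of a cropped generated object
    print(gram_matrix(crop).shape)       # (2, 64, 64)
```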

Keywords

    cs.CV

Cite this

Context-Aware Layout to Image Generation with Enhanced Object Appearance. / He, Sen; Liao, Wentong; Yang, Michael Ying et al.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. p. 15044-15053 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

He, S, Liao, W, Yang, MY, Yang, Y, Song, Y-Z, Rosenhahn, B & Xiang, T 2021, Context-Aware Layout to Image Generation with Enhanced Object Appearance. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 15044-15053. https://doi.org/10.1109/CVPR46437.2021.01480
He, S., Liao, W., Yang, M. Y., Yang, Y., Song, Y.-Z., Rosenhahn, B., & Xiang, T. (2021). Context-Aware Layout to Image Generation with Enhanced Object Appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 15044-15053). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR46437.2021.01480
He S, Liao W, Yang MY, Yang Y, Song YZ, Rosenhahn B et al. Context-Aware Layout to Image Generation with Enhanced Object Appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2021. p. 15044-15053. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.01480
He, Sen ; Liao, Wentong ; Yang, Michael Ying et al. / Context-Aware Layout to Image Generation with Enhanced Object Appearance. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 15044-15053 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).
BibTeX
@inproceedings{0a5525459ebb47ed9a06d9bd49de027a,
title = "Context-Aware Layout to Image Generation with Enhanced Object Appearance",
abstract = "A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks. ",
keywords = "cs.CV",
author = "Sen He and Wentong Liao and Yang, {Michael Ying} and Yongxin Yang and Yi-Zhe Song and Bodo Rosenhahn and Tao Xiang",
note = "Funding Information: This work was supported by the Center for Digital In- novations (ZDIN), Federal Ministry of Education and Re- search (BMBF), Germany under the project LeibnizKILa- bor(grant no.01DD20003) and the Deutsche Forschungs- gemeinschaft (DFG) under Germany{\textquoteright}s Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122).",
year = "2021",
doi = "10.1109/CVPR46437.2021.01480",
language = "English",
isbn = "978-1-6654-4510-8",
series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "15044--15053",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
address = "United States",

}

RIS

TY - GEN

T1 - Context-Aware Layout to Image Generation with Enhanced Object Appearance

AU - He, Sen

AU - Liao, Wentong

AU - Yang, Michael Ying

AU - Yang, Yongxin

AU - Song, Yi-Zhe

AU - Rosenhahn, Bodo

AU - Xiang, Tao

N1 - Funding Information: This work was supported by the Center for Digital Innovations (ZDIN), Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003) and the Deutsche Forschungsgemeinschaft (DFG) under Germany's Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122).

PY - 2021

Y1 - 2021

N2 - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.

AB - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.

KW - cs.CV

UR - http://www.scopus.com/inward/record.url?scp=85115173922&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.01480

DO - 10.1109/CVPR46437.2021.01480

M3 - Conference contribution

SN - 978-1-6654-4510-8

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 15044

EP - 15053

BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
