Details
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 15044-15053 |
| Number of pages | 10 |
| ISBN (electronic) | 978-1-6654-4509-2 |
| ISBN (print) | 978-1-6654-4510-8 |
| Publication status | Published - 2021 |
Publication series
| Field | Value |
|---|---|
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| ISSN (print) | 1063-6919 |
| ISSN (electronic) | 2575-7075 |
Abstract

A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against a natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted, lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.
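As a rough illustration of the discriminator-side idea mentioned in the abstract, the sketch below computes a Gram matrix from a batch of object feature maps. The tensor shapes, function name, and PyTorch usage are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed shapes and names, not the paper's code):
# compute a Gram matrix over the channels of each object's feature map.
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (N, C, H, W) feature maps of generated object crops."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)               # flatten spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))   # (N, C, C) channel co-activations
    return gram / (c * h * w)                # normalise by feature-map size

# Example: a batch of 4 object feature maps with 64 channels at 16x16
feats = torch.randn(4, 64, 16, 16)
print(gram_matrix(feats).shape)  # torch.Size([4, 64, 64])
```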
Keywords
- cs.CV
ASJC Scopus subject areas
- Computer Science (all)
- Software
- Computer Vision and Pattern Recognition
Cite this
He, S., Liao, W., Yang, M. Y., Yang, Y., Song, Y.-Z., Rosenhahn, B., & Xiang, T. (2021). Context-Aware Layout to Image Generation with Enhanced Object Appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 15044-15053). Institute of Electrical and Electronics Engineers Inc. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR46437.2021.01480
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Context-Aware Layout to Image Generation with Enhanced Object Appearance
AU - He, Sen
AU - Liao, Wentong
AU - Yang, Michael Ying
AU - Yang, Yongxin
AU - Song, Yi-Zhe
AU - Rosenhahn, Bodo
AU - Xiang, Tao
N1 - Funding Information: This work was supported by the Center for Digital Innovations (ZDIN), Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003) and the Deutsche Forschungsgemeinschaft (DFG) under Germany's Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122).
PY - 2021
Y1 - 2021
N2 - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.
AB - A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks.
KW - cs.CV
UR - http://www.scopus.com/inward/record.url?scp=85115173922&partnerID=8YFLogxK
U2 - 10.1109/CVPR46437.2021.01480
DO - 10.1109/CVPR46437.2021.01480
M3 - Conference contribution
SN - 978-1-6654-4510-8
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 15044
EP - 15053
BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
ER -