Learning to Disentangle Latent Physical Factors for Video Prediction

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer review

Authors

  • Deyao Zhu
  • Marco Munderloh
  • Bodo Rosenhahn
  • Jörg Stückler

Research Organisations

External Research Organisations

  • Max Planck Institute for Intelligent Systems

Details

Original language: English
Title of host publication: Pattern Recognition
Subtitle of host publication: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings
Editors: Gernot A. Fink, Simone Frintrop, Xiaoyi Jiang
Publisher: Springer Nature
Pages: 595-608
Number of pages: 14
Edition: 1.
ISBN (electronic): 978-3-030-33676-9
ISBN (print): 978-3-030-33675-2
Publication status: Published - 25 Oct 2019
Event: 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 - Dortmund, Germany
Duration: 10 Sept 2019 to 13 Sept 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11824 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
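
The total correlation named in the abstract is the KL divergence between a joint distribution and the product of its marginals; it is zero exactly when the latent dimensions are independent. The abstract does not say how the authors estimate this quantity during training, so the following is only a minimal sketch of the quantity itself, under the assumption of a zero-mean Gaussian latent code, where it has the closed form 0.5 * (sum_j log cov[j][j] - log det cov). The function name and the example covariance matrices are hypothetical and not taken from the paper.

```python
import math


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a Gaussian with covariance matrix `cov`.

    Illustrative assumption: the latent code is Gaussian, so
    TC = 0.5 * (sum_j log cov[j][j] - log det cov).
    """
    n = len(cov)
    # Cholesky factorisation cov = L L^T, so log det cov = 2 * sum_j log L[j][j].
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance must be positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    log_det = 2.0 * sum(math.log(L[j][j]) for j in range(n))
    log_marginals = sum(math.log(cov[j][j]) for j in range(n))
    return 0.5 * (log_marginals - log_det)


# Hypothetical 2-D latent codes: one with independent dimensions, one correlated.
independent = [[1.0, 0.0], [0.0, 2.0]]
entangled = [[1.0, 0.8], [0.8, 1.0]]

print(gaussian_total_correlation(independent))
print(gaussian_total_correlation(entangled))
```

The independent code yields a total correlation of (numerically) zero, while the correlated one yields a positive value; penalizing this quantity pushes the latent dimensions towards statistical independence, which is the sense in which it encourages disentanglement.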

Cite this

Learning to Disentangle Latent Physical Factors for Video Prediction. / Zhu, Deyao; Munderloh, Marco; Rosenhahn, Bodo et al.
Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. ed. / Gernot A. Fink; Simone Frintrop; Xiaoyi Jiang. 1. ed. Springer Nature, 2019. p. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS).

Zhu, D, Munderloh, M, Rosenhahn, B & Stückler, J 2019, Learning to Disentangle Latent Physical Factors for Video Prediction. in GA Fink, S Frintrop & X Jiang (eds), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1. edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11824 LNCS, Springer Nature, pp. 595-608, 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019, Dortmund, Germany, 10 Sept 2019. https://doi.org/10.1007/978-3-030-33676-9_42
Zhu, D., Munderloh, M., Rosenhahn, B., & Stückler, J. (2019). Learning to Disentangle Latent Physical Factors for Video Prediction. In G. A. Fink, S. Frintrop, & X. Jiang (Eds.), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings (1. ed., pp. 595-608). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS). Springer Nature. https://doi.org/10.1007/978-3-030-33676-9_42
Zhu D, Munderloh M, Rosenhahn B, Stückler J. Learning to Disentangle Latent Physical Factors for Video Prediction. In Fink GA, Frintrop S, Jiang X, editors, Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1. ed. Springer Nature. 2019. p. 595-608. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-33676-9_42
Zhu, Deyao ; Munderloh, Marco ; Rosenhahn, Bodo et al. / Learning to Disentangle Latent Physical Factors for Video Prediction. Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. editor / Gernot A. Fink ; Simone Frintrop ; Xiaoyi Jiang. 1. ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{9b635a8a9c724a579eec0f4e9083b16a,
title = "Learning to Disentangle Latent Physical Factors for Video Prediction",
abstract = "Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.",
author = "Deyao Zhu and Marco Munderloh and Bodo Rosenhahn and J{\"o}rg St{\"u}ckler",
note = "Funding information: This work has been supported through Cyber Valley.; 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 ; Conference date: 10-09-2019 Through 13-09-2019",
year = "2019",
month = oct,
day = "25",
doi = "10.1007/978-3-030-33676-9_42",
language = "English",
isbn = "978-3-030-33675-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature",
pages = "595--608",
editor = "Fink, {Gernot A.} and Simone Frintrop and Xiaoyi Jiang",
booktitle = "Pattern Recognition",
address = "United States",
edition = "1.",

}

TY - GEN
T1 - Learning to Disentangle Latent Physical Factors for Video Prediction
AU - Zhu, Deyao
AU - Munderloh, Marco
AU - Rosenhahn, Bodo
AU - Stückler, Jörg
N1 - Funding information: This work has been supported through Cyber Valley.
PY - 2019/10/25
Y1 - 2019/10/25
N2 - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
AB - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
UR - http://www.scopus.com/inward/record.url?scp=85076178878&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-33676-9_42
DO - 10.1007/978-3-030-33676-9_42
M3 - Conference contribution
AN - SCOPUS:85076178878
SN - 978-3-030-33675-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 595
EP - 608
BT - Pattern Recognition
A2 - Fink, Gernot A.
A2 - Frintrop, Simone
A2 - Jiang, Xiaoyi
PB - Springer Nature
T2 - 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019
Y2 - 10 September 2019 through 13 September 2019
ER -
