Learning to Disentangle Latent Physical Factors for Video Prediction

Deyao Zhu; Marco Munderloh; Bodo Rosenhahn; Jörg Stückler

doi:10.1007/978-3-030-33676-9_42

Details

Original language	English
Title of host publication	Pattern Recognition
Subtitle	41st DAGM German Conference, DAGM GCPR 2019, Proceedings
Editors	Gernot A. Fink, Simone Frintrop, Xiaoyi Jiang
Publisher	Springer Nature
Pages	595-608
Number of pages	14
Edition	1st
ISBN (electronic)	978-3-030-33676-9
ISBN (print)	978-3-030-33675-2
Publication status	Published - 25 Oct 2019
Event	41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 - Dortmund, Germany Duration: 10 Sept 2019 → 13 Sept 2019

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11824 LNCS
ISSN (print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
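
The abstract refers to the total correlation of the latent dimensions. As a minimal sketch of that quantity only, and not the paper's training objective, total correlation is the KL divergence between a joint distribution and the product of its marginals. For a Gaussian with covariance matrix Σ it has the closed form 0.5 * (Σ_j log Σ_jj - log det Σ). The function name and the example covariances below are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a Gaussian with covariance `cov`.

    Illustrative helper, not taken from the paper: it evaluates
    0.5 * (sum_j log cov[j, j] - log det cov), which is zero exactly
    when the dimensions are independent (diagonal covariance).
    """
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # Sum of marginal log-variances minus the joint log-determinant.
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)


# Hypothetical latent covariances for illustration.
independent = np.diag([1.0, 2.0, 0.5])           # factorized latent code
correlated = np.array([[1.0, 0.9], [0.9, 1.0]])  # entangled latent code

tc_independent = gaussian_total_correlation(independent)
tc_correlated = gaussian_total_correlation(correlated)
```

A factorized code gives a total correlation of zero, while correlated dimensions give a positive value. Penalizing this quantity is what pushes a representation towards statistically independent latent dimensions.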

ASJC Scopus subject areas

Mathematics (all)
Theoretical Computer Science
Computer Science (all)
General Computer Science

Cite

Learning to Disentangle Latent Physical Factors for Video Prediction. / Zhu, Deyao; Munderloh, Marco; Rosenhahn, Bodo et al.
Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. ed. / Gernot A. Fink; Simone Frintrop; Xiaoyi Jiang. 1st ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS).

Publication: Chapter in book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed

Zhu, D, Munderloh, M, Rosenhahn, B & Stückler, J 2019, Learning to Disentangle Latent Physical Factors for Video Prediction. in GA Fink, S Frintrop & X Jiang (eds), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1st edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11824 LNCS, Springer Nature, pp. 595-608, 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019, Dortmund, Germany, 10 Sept 2019. https://doi.org/10.1007/978-3-030-33676-9_42

Zhu, D., Munderloh, M., Rosenhahn, B., & Stückler, J. (2019). Learning to Disentangle Latent Physical Factors for Video Prediction. In G. A. Fink, S. Frintrop, & X. Jiang (Eds.), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings (1st ed., pp. 595-608). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS). Springer Nature. https://doi.org/10.1007/978-3-030-33676-9_42

Zhu D, Munderloh M, Rosenhahn B, Stückler J. Learning to Disentangle Latent Physical Factors for Video Prediction. In Fink GA, Frintrop S, Jiang X, editors, Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1st ed. Springer Nature. 2019. p. 595-608. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-33676-9_42

Zhu, Deyao ; Munderloh, Marco ; Rosenhahn, Bodo et al. / Learning to Disentangle Latent Physical Factors for Video Prediction. Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. ed. / Gernot A. Fink ; Simone Frintrop ; Xiaoyi Jiang. 1st ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).


@inproceedings{9b635a8a9c724a579eec0f4e9083b16a,
title = "Learning to Disentangle Latent Physical Factors for Video Prediction",
abstract = "Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.",
author = "Deyao Zhu and Marco Munderloh and Bodo Rosenhahn and J{\"o}rg St{\"u}ckler",
note = "Funding information: This work has been supported through Cyber Valley.; 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 ; Conference date: 10-09-2019 Through 13-09-2019",
year = "2019",
month = oct,
day = "25",
doi = "10.1007/978-3-030-33676-9_42",
language = "English",
isbn = "978-3-030-33675-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature",
pages = "595--608",
editor = "Fink, {Gernot A.} and Simone Frintrop and Xiaoyi Jiang",
booktitle = "Pattern Recognition",
address = "United States",
edition = "1.",
}


TY - GEN
T1 - Learning to Disentangle Latent Physical Factors for Video Prediction
AU - Zhu, Deyao
AU - Munderloh, Marco
AU - Rosenhahn, Bodo
AU - Stückler, Jörg
N1 - Funding information: This work has been supported through Cyber Valley.
PY - 2019/10/25
Y1 - 2019/10/25
N2 - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
AB - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
UR - http://www.scopus.com/inward/record.url?scp=85076178878&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-33676-9_42
DO - 10.1007/978-3-030-33676-9_42
M3 - Conference contribution
AN - SCOPUS:85076178878
SN - 978-3-030-33675-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 595
EP - 608
BT - Pattern Recognition
A2 - Fink, Gernot A.
A2 - Frintrop, Simone
A2 - Jiang, Xiaoyi
PB - Springer Nature
T2 - 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019
Y2 - 10 September 2019 through 13 September 2019
ER -

By the same authors

Robust Shape Fitting for 3D Scene Abstraction

Quantum normalizing flows for anomaly detection

A variational autoencoder trained with priors from canonical pathways increases the interpretability of transcriptome data

Segment Any Object Model (SAOM): Real-To-Simulation Fine-Tuning Strategy For Multi-Class Multi-Instance Segmentation

Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change
