Learning to Disentangle Latent Physical Factors for Video Prediction

Publication: Contribution to book/report/anthology/conference proceedings | Conference contribution | Research | Peer-reviewed

Authors

  • Deyao Zhu
  • Marco Munderloh
  • Bodo Rosenhahn
  • Jörg Stückler

External organizations

  • Max-Planck-Institut für Intelligente Systeme (Stuttgart)

Details

Original language: English
Title of host publication: Pattern Recognition
Subtitle: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings
Editors: Gernot A. Fink, Simone Frintrop, Xiaoyi Jiang
Publisher: Springer Nature
Pages: 595-608
Number of pages: 14
Edition: 1st
ISBN (electronic): 978-3-030-33676-9
ISBN (print): 978-3-030-33675-2
Publication status: Published - 25 Oct. 2019
Event: 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 - Dortmund, Germany
Duration: 10 Sept. 2019 to 13 Sept. 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11824 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
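
The total correlation mentioned in the abstract is, in its standard definition, the KL divergence between the joint distribution of the latent code and the product of its marginals: TC(z) = KL(q(z) || prod_j q(z_j)). It is zero exactly when the latent dimensions are independent. The abstract does not say how the authors estimate this quantity during training, so the following is only an illustrative sketch under a simplifying assumption: for a Gaussian with covariance matrix Sigma, the total correlation has the closed form 0.5 * (sum_j log Sigma_jj - log det Sigma). The function names `cholesky` and `gaussian_total_correlation` are hypothetical and do not come from the paper.

```python
import math


def cholesky(cov):
    """Return the lower-triangular Cholesky factor of a symmetric
    positive-definite matrix given as a list of lists."""
    n = len(cov)
    factor = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(factor[i][k] * factor[j][k] for k in range(j))
            if i == j:
                factor[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                factor[i][j] = (cov[i][j] - partial) / factor[j][j]
    return factor


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a Gaussian with covariance `cov`.

    Assumption for illustration only: the latent code is Gaussian, so
    TC = 0.5 * (sum_j log cov[j][j] - log det cov).
    """
    factor = cholesky(cov)
    # log det(cov) = 2 * sum of the logs of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(factor[j][j]) for j in range(len(cov)))
    log_marginal_vars = sum(math.log(cov[j][j]) for j in range(len(cov)))
    return 0.5 * (log_marginal_vars - log_det)


# Independent dimensions give zero total correlation;
# correlated dimensions give a positive value.
independent = [[2.0, 0.0], [0.0, 3.0]]
correlated = [[1.0, 0.8], [0.8, 1.0]]
tc_independent = gaussian_total_correlation(independent)
tc_correlated = gaussian_total_correlation(correlated)
```

For two unit-variance dimensions with correlation rho, the closed form reduces to -0.5 * log(1 - rho^2), which is a convenient sanity check. Penalizing a quantity of this kind pushes the latent dimensions towards statistical independence, which is the sense in which the abstract links total correlation to disentanglement.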


Cite this

Learning to Disentangle Latent Physical Factors for Video Prediction. / Zhu, Deyao; Munderloh, Marco; Rosenhahn, Bodo et al.
Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. ed. / Gernot A. Fink; Simone Frintrop; Xiaoyi Jiang. 1st ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS).


Zhu, D, Munderloh, M, Rosenhahn, B & Stückler, J 2019, Learning to Disentangle Latent Physical Factors for Video Prediction. in GA Fink, S Frintrop & X Jiang (eds), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1st ed., Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11824 LNCS, Springer Nature, pp. 595-608, 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019, Dortmund, Germany, 10 Sept. 2019. https://doi.org/10.1007/978-3-030-33676-9_42
Zhu, D., Munderloh, M., Rosenhahn, B., & Stückler, J. (2019). Learning to Disentangle Latent Physical Factors for Video Prediction. In G. A. Fink, S. Frintrop, & X. Jiang (Eds.), Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings (1st ed., pp. 595-608). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS). Springer Nature. https://doi.org/10.1007/978-3-030-33676-9_42
Zhu D, Munderloh M, Rosenhahn B, Stückler J. Learning to Disentangle Latent Physical Factors for Video Prediction. In Fink GA, Frintrop S, Jiang X, editors, Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. 1st ed. Springer Nature. 2019. pp. 595-608. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-33676-9_42
Zhu, Deyao ; Munderloh, Marco ; Rosenhahn, Bodo et al. / Learning to Disentangle Latent Physical Factors for Video Prediction. Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. ed. / Gernot A. Fink ; Simone Frintrop ; Xiaoyi Jiang. 1st ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{9b635a8a9c724a579eec0f4e9083b16a,
title = "Learning to Disentangle Latent Physical Factors for Video Prediction",
abstract = "Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.",
author = "Deyao Zhu and Marco Munderloh and Bodo Rosenhahn and J{\"o}rg St{\"u}ckler",
note = "Funding information: This work has been supported through Cyber Valley.; 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 ; Conference date: 10-09-2019 Through 13-09-2019",
year = "2019",
month = oct,
day = "25",
doi = "10.1007/978-3-030-33676-9_42",
language = "English",
isbn = "978-3-030-33675-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature",
pages = "595--608",
editor = "Fink, {Gernot A.} and Simone Frintrop and Xiaoyi Jiang",
booktitle = "Pattern Recognition",
address = "United States",
edition = "1.",

}


TY - GEN
T1 - Learning to Disentangle Latent Physical Factors for Video Prediction
AU - Zhu, Deyao
AU - Munderloh, Marco
AU - Rosenhahn, Bodo
AU - Stückler, Jörg
N1 - Funding information: This work has been supported through Cyber Valley.
PY - 2019/10/25
Y1 - 2019/10/25
N2 - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
AB - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
UR - http://www.scopus.com/inward/record.url?scp=85076178878&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-33676-9_42
DO - 10.1007/978-3-030-33676-9_42
M3 - Conference contribution
AN - SCOPUS:85076178878
SN - 978-3-030-33675-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 595
EP - 608
BT - Pattern Recognition
A2 - Fink, Gernot A.
A2 - Frintrop, Simone
A2 - Jiang, Xiaoyi
PB - Springer Nature
T2 - 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019
Y2 - 10 September 2019 through 13 September 2019
ER -
