Details
Original language | English |
---|---|
Title of host publication | Pattern Recognition |
Subtitle | 41st DAGM German Conference, DAGM GCPR 2019, Proceedings |
Editors | Gernot A. Fink, Simone Frintrop, Xiaoyi Jiang |
Publisher | Springer Nature |
Pages | 595-608 |
Number of pages | 14 |
Edition | 1st |
ISBN (electronic) | 978-3-030-33676-9 |
ISBN (print) | 978-3-030-33675-2 |
Publication status | Published - 25 Oct 2019 |
Event | 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019 - Dortmund, Germany. Duration: 10 Sep 2019 → 13 Sep 2019 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11824 LNCS |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
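The abstract names total correlation as the quantity being optimized but does not spell out the objective. As a minimal illustration of the quantity itself, and not of the authors' training loss, the sketch below computes the total correlation TC(z) = KL(q(z) || prod_j q(z_j)) in closed form for a Gaussian latent code. The Gaussian assumption and the function names `cholesky` and `gaussian_total_correlation` are hypothetical choices made only for this example.

```python
import math


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance must be positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a Gaussian with covariance `cov`.

    Assumption for illustration: the latent code is Gaussian, so
    TC = 0.5 * (sum_j log cov[j][j] - log det cov).
    The mean cancels because joint and marginals share it.
    """
    lower = cholesky(cov)
    n = len(cov)
    # log det(cov) from the Cholesky diagonal: det(cov) = prod(lower[i][i])**2
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    log_marginals = sum(math.log(cov[i][i]) for i in range(n))
    return 0.5 * (log_marginals - log_det)


# Independent latent dimensions: no shared information.
tc_independent = gaussian_total_correlation([[1.0, 0.0], [0.0, 2.0]])
# Strongly correlated latent dimensions: large total correlation.
tc_correlated = gaussian_total_correlation([[1.0, 0.9], [0.9, 1.0]])
print(tc_independent, tc_correlated)
```

The total correlation is zero exactly when the latent dimensions are statistically independent and grows as they become more dependent, which is why penalizing it can encourage each dimension to capture a separate factor of variation.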
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
- General Computer Science
Cite this
Learning to Disentangle Latent Physical Factors for Video Prediction. / Zhu, Deyao; Munderloh, Marco; Rosenhahn, Bodo; Stückler, Jörg. In: Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Proceedings. Ed. / Gernot A. Fink; Simone Frintrop; Xiaoyi Jiang. 1st ed. Springer Nature, 2019. pp. 595-608 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11824 LNCS).
Publication: Chapter in book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - Learning to Disentangle Latent Physical Factors for Video Prediction
AU - Zhu, Deyao
AU - Munderloh, Marco
AU - Rosenhahn, Bodo
AU - Stückler, Jörg
N1 - Funding information: This work has been supported through Cyber Valley.
PY - 2019/10/25
Y1 - 2019/10/25
AB - Physical scene understanding is a fundamental human ability. Empowering artificial systems with such understanding is an important step towards flexible and adaptive behavior in the real world. As a step in this direction, we propose a novel approach to physical scene understanding in video. We train a deep neural network for video prediction which embeds the video sequence in a low-dimensional recurrent latent space representation. We optimize the total correlation of the latent dimensions within a variational recurrent auto-encoder framework. This encourages the representation to disentangle the latent physical factors of variation in the training data. To train and evaluate our approach, we use synthetic video sequences in three different physical scenarios with various degrees of difficulty. Our experiments demonstrate that our model can disentangle several appearance-related properties in the unsupervised case. If we add supervision signals for the latent code, our model can further improve the disentanglement of dynamics-related properties.
UR - http://www.scopus.com/inward/record.url?scp=85076178878&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-33676-9_42
DO - 10.1007/978-3-030-33676-9_42
M3 - Conference contribution
AN - SCOPUS:85076178878
SN - 978-3-030-33675-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 595
EP - 608
BT - Pattern Recognition
A2 - Fink, Gernot A.
A2 - Frintrop, Simone
A2 - Jiang, Xiaoyi
PB - Springer Nature
T2 - 41st DAGM German Conference on Pattern Recognition, DAGM GCPR 2019
Y2 - 10 September 2019 through 13 September 2019
ER -