Details
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
Pages | 7774-7783 |
Number of pages | 10 |
ISBN (electronic) | 9781728132938 |
Publication status | Published - 2019 |
Event | 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States Duration: 16 Jun 2019 → 20 Jun 2019 |
Publication series
Name | IEEE Conference on Computer Vision and Pattern Recognition |
---|---|
ISSN (Print) | 1063-6919 |
ISSN (electronic) | 2575-7075 |
Abstract
This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
Keywords
- 3D from Single Image, And Body Pose, Face, Gesture
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Science(all)
- Computer Vision and Pattern Recognition
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2019. p. 7774-7783 8953653 (IEEE Conference on Computer Vision and Pattern Recognition).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
}
TY - GEN
T1 - RepNet
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
AU - Wandt, Bastian
AU - Rosenhahn, Bodo
PY - 2019
Y1 - 2019
N2 - This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
AB - This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
KW - 3D from Single Image
KW - And Body Pose
KW - Face
KW - Gesture
UR - http://www.scopus.com/inward/record.url?scp=85074369170&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2019.00797
DO - 10.1109/CVPR.2019.00797
M3 - Conference contribution
AN - SCOPUS:85074369170
SN - 978-1-7281-3294-5
T3 - IEEE Conference on Computer Vision and Pattern Recognition
SP - 7774
EP - 7783
BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Y2 - 16 June 2019 through 20 June 2019
ER -