RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review


Details

Original language: English
Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Pages: 7774-7783
Number of pages: 10
ISBN (electronic): 9781728132938
Publication status: Published - 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: 16 Jun 2019 - 20 Jun 2019

Publication series

Name: IEEE Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919
ISSN (electronic): 2575-7075

Abstract

This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.
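The core idea described in the abstract — estimating a camera alongside the 3D pose and defining a layer that reprojects the estimated 3D pose back to 2D — can be illustrated with a minimal sketch. This is not the authors' implementation; the weak-perspective camera layout, joint count, and function names below are assumptions chosen for illustration.

```python
import numpy as np

def reprojection_layer(pose_3d, camera):
    """Project an estimated 3D pose back to 2D with an estimated camera.

    pose_3d : (3, n) array of 3D joint positions (hypothetical layout,
              columns are joints).
    camera  : (2, 3) weak-perspective camera matrix, standing in for the
              output of the network's camera-estimation branch.
    Returns a (2, n) array of reprojected 2D joints.
    """
    return camera @ pose_3d

def reprojection_loss(pose_2d_observed, pose_3d_estimated, camera):
    """Sum of per-joint Euclidean distances between the observed 2D joints
    and the reprojection of the estimated 3D pose."""
    reproj = reprojection_layer(pose_3d_estimated, camera)
    return float(np.linalg.norm(pose_2d_observed - reproj, axis=0).sum())

# Toy example: if the reprojection of the estimated 3D pose matches the
# 2D observation exactly, the loss is zero.
pose_3d = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [2.0, 2.0, 2.0]])   # 3 joints; rows are x, y, z
camera = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])    # orthographic projection onto x-y
pose_2d = camera @ pose_3d
print(reprojection_loss(pose_2d, pose_3d, camera))  # 0.0
```

Because this loss compares only 2D observations with reprojected 2D joints, it can be minimized without paired 2D-to-3D ground truth; in the paper the plausibility of the 3D pose itself is enforced separately by the adversarial critic.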

Keywords

    3D from Single Image, And Body Pose, Face, Gesture


Cite this

RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. / Wandt, Bastian; Rosenhahn, Bodo.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2019. p. 7774-7783 8953653 (IEEE Conference on Computer Vision and Pattern Recognition).

Wandt, B & Rosenhahn, B 2019, RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition., 8953653, IEEE Conference on Computer Vision and Pattern Recognition, pp. 7774-7783, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, United States, 16 Jun 2019. https://doi.org/10.1109/CVPR.2019.00797
Wandt, B., & Rosenhahn, B. (2019). RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 7774-7783). Article 8953653 (IEEE Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR.2019.00797
Wandt B, Rosenhahn B. RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2019. p. 7774-7783. 8953653. (IEEE Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR.2019.00797
Wandt, Bastian ; Rosenhahn, Bodo. / RepNet : Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2019. pp. 7774-7783 (IEEE Conference on Computer Vision and Pattern Recognition).
BibTeX
@inproceedings{d2bba294f44e4a82b641e83b55f59edd,
title = "RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation",
abstract = "This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.",
keywords = "3D from Single Image, And Body Pose, Face, Gesture",
author = "Bastian Wandt and Bodo Rosenhahn",
year = "2019",
doi = "10.1109/CVPR.2019.00797",
language = "English",
isbn = "978-1-7281-3294-5",
series = "IEEE Conference on Computer Vision and Pattern Recognition",
pages = "7774--7783",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
note = "32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 ; Conference date: 16-06-2019 Through 20-06-2019",
}

RIS

TY - GEN

T1 - RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation

T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

AU - Wandt, Bastian

AU - Rosenhahn, Bodo

PY - 2019

Y1 - 2019

N2 - This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.

AB - This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by satisfying a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied and are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids a simple memorization of the training data and allows for a weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D which results in a reprojection loss function. Our experiments show that RepNet generalizes well to unknown data and outperforms state-of-the-art methods when applied to unseen data. Moreover, our implementation runs in real-time on a standard desktop PC.

KW - 3D from Single Image

KW - And Body Pose

KW - Face

KW - Gesture

UR - http://www.scopus.com/inward/record.url?scp=85074369170&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2019.00797

DO - 10.1109/CVPR.2019.00797

M3 - Conference contribution

AN - SCOPUS:85074369170

SN - 978-1-7281-3294-5

T3 - IEEE Conference on Computer Vision and Pattern Recognition

SP - 7774

EP - 7783

BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Y2 - 16 June 2019 through 20 June 2019

ER -
