Details
Original language | English |
---|---|
Pages (from-to) | 348-359 |
Number of pages | 12 |
Journal | IEEE Transactions on Biometrics, Behavior, and Identity Science |
Volume | 5 |
Issue number | 3 |
Early online date | 19 Jan 2023 |
Publication status | Published - July 2023 |
Abstract
Head pose estimation plays a vital role in biometric systems related to facial and human behavior analysis. Typically, neural networks are trained on head pose datasets. Unfortunately, manual or sensor-based annotation of head pose is impractical. A solution is synthetic training data generated from 3D face models, which can provide an infinite number of perfect labels. However, computer-generated images only approximate real-world images, leading to a performance gap between the training and application domains. Therefore, strategies are needed that allow simultaneous learning on labeled synthetic data and unlabeled real-world data to overcome this domain gap. In this work, we propose relative pose consistency, a semi-supervised learning strategy for head pose estimation based on consistency regularization. Consistency regularization enforces consistent network predictions under random image augmentations, including pose-preserving and pose-altering augmentations. We propose a strategy that exploits the relative pose introduced by pose-altering augmentations between augmented image pairs, allowing the network to benefit from relative pose labels during training on unlabeled data. We evaluate our approach in a domain-adaptation scenario and in a commonly used cross-dataset scenario. Furthermore, we reproduce related works to enforce consistent evaluation protocols and show that we outperform the state of the art (SOTA) in both scenarios.
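To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the relative-pose-consistency idea as described in the abstract; it is not the authors' implementation. It assumes a backbone that regresses Euler angles (yaw, pitch, roll) in degrees, uses an in-plane rotation as the pose-altering augmentation (assumed to shift only the roll angle, with an assumed sign convention), and the helper name `rpc_losses` is invented for illustration.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def rpc_losses(model, synth_imgs, synth_poses, real_imgs, max_rot_deg=30.0):
    """Supervised loss on labeled synthetic images plus a relative-pose
    consistency loss on unlabeled real images (one possible formulation)."""
    # Supervised branch: labeled synthetic data (poses as Euler angles).
    loss_synth = F.l1_loss(model(synth_imgs), synth_poses)

    # Unsupervised branch: apply a pose-altering augmentation whose effect
    # on the pose is known. An in-plane rotation by `a` degrees is assumed
    # here to shift only the roll angle (sign convention is an assumption).
    angles = (torch.rand(real_imgs.size(0)) * 2 - 1) * max_rot_deg
    rotated = torch.stack(
        [TF.rotate(img, float(a)) for img, a in zip(real_imgs, angles)]
    )

    pred_orig = model(real_imgs)   # (B, 3): yaw, pitch, roll in degrees
    pred_rot = model(rotated)

    # Relative pose introduced by the augmentation: only roll changes.
    rel_pose = torch.zeros_like(pred_orig)
    rel_pose[:, 2] = -angles.to(pred_orig.device)

    # Consistency regularization with relative pose: the difference
    # between the two predictions should equal the known relative pose.
    loss_rpc = F.l1_loss(pred_rot - pred_orig, rel_pose)

    return loss_synth, loss_rpc
```

In a training loop, the supervised and consistency losses would typically be summed with a weighting factor (a hyperparameter not specified here) and backpropagated jointly.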
ASJC Scopus subject areas
- Computer Science (all)
- Artificial Intelligence
- Physics and Astronomy (all)
- Instrumentation
- Computer Science (all)
- Computer Vision and Pattern Recognition
- Computer Science (all)
- Computer Science Applications
Cite
Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency. / Kuhnke, Felix; Ostermann, Jörn. In: IEEE Transactions on Biometrics, Behavior, and Identity Science, Vol. 5, No. 3, 07.2023, pp. 348-359.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency
AU - Kuhnke, Felix
AU - Ostermann, Jörn
PY - 2023/7
Y1 - 2023/7
N2 - Head pose estimation plays a vital role in biometric systems related to facial and human behavior analysis. Typically, neural networks are trained on head pose datasets. Unfortunately, manual or sensor-based annotation of head pose is impractical. A solution is synthetic training data generated from 3D face models, which can provide an infinite number of perfect labels. However, computer-generated images only approximate real-world images, leading to a performance gap between the training and application domains. Therefore, strategies are needed that allow simultaneous learning on labeled synthetic data and unlabeled real-world data to overcome this domain gap. In this work, we propose relative pose consistency, a semi-supervised learning strategy for head pose estimation based on consistency regularization. Consistency regularization enforces consistent network predictions under random image augmentations, including pose-preserving and pose-altering augmentations. We propose a strategy that exploits the relative pose introduced by pose-altering augmentations between augmented image pairs, allowing the network to benefit from relative pose labels during training on unlabeled data. We evaluate our approach in a domain-adaptation scenario and in a commonly used cross-dataset scenario. Furthermore, we reproduce related works to enforce consistent evaluation protocols and show that we outperform the state of the art (SOTA) in both scenarios.
AB - Head pose estimation plays a vital role in biometric systems related to facial and human behavior analysis. Typically, neural networks are trained on head pose datasets. Unfortunately, manual or sensor-based annotation of head pose is impractical. A solution is synthetic training data generated from 3D face models, which can provide an infinite number of perfect labels. However, computer-generated images only approximate real-world images, leading to a performance gap between the training and application domains. Therefore, strategies are needed that allow simultaneous learning on labeled synthetic data and unlabeled real-world data to overcome this domain gap. In this work, we propose relative pose consistency, a semi-supervised learning strategy for head pose estimation based on consistency regularization. Consistency regularization enforces consistent network predictions under random image augmentations, including pose-preserving and pose-altering augmentations. We propose a strategy that exploits the relative pose introduced by pose-altering augmentations between augmented image pairs, allowing the network to benefit from relative pose labels during training on unlabeled data. We evaluate our approach in a domain-adaptation scenario and in a commonly used cross-dataset scenario. Furthermore, we reproduce related works to enforce consistent evaluation protocols and show that we outperform the state of the art (SOTA) in both scenarios.
KW - Behavioral sciences
KW - Consistency Regularization
KW - Deep Learning
KW - Domain Adaptation
KW - Feature extraction
KW - Head Pose Estimation
KW - Pose estimation
KW - Task analysis
KW - Three-dimensional displays
KW - Training
KW - Training data
UR - http://www.scopus.com/inward/record.url?scp=85147281573&partnerID=8YFLogxK
U2 - 10.1109/TBIOM.2023.3237039
DO - 10.1109/TBIOM.2023.3237039
M3 - Article
AN - SCOPUS:85147281573
VL - 5
SP - 348
EP - 359
JO - IEEE Transactions on Biometrics, Behavior, and Identity Science
JF - IEEE Transactions on Biometrics, Behavior, and Identity Science
IS - 3
ER -