Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN

S. El Amrani Abouelassad; M. Mehltretter; F. Rottensteiner

doi:10.1007/s41064-024-00311-0

Details

Original language	English
Pages (from - to)	499-516
Number of pages	18
Journal	PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
Volume	92
Issue number	5
Early online date	16 Sept 2024
Publication status	Published - Oct 2024

Abstract

Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.
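The abstract relies on two technical ingredients that a short example can make concrete. The first is a 3D shape model that follows the Active Shape Model concept. The second is keypoint accuracy reported separately for planimetry and height. The Python (NumPy) sketch below shows the textbook PCA formulation of an Active Shape Model, in which a shape is the mean keypoint configuration plus a weighted sum of principal deformation modes. It also shows one common way to compute planimetric and height RMS and a wrapped orientation error. All function names, array shapes, the planimetric RMS definition (root of the mean squared 2D distance), and the use of synthetic data are assumptions made for illustration. None of them come from the paper, whose robust 3D object model, loss variants and evaluation protocol may differ.

```python
import numpy as np


def fit_active_shape_model(training_shapes: np.ndarray, num_modes: int):
    """Learn a linear (PCA) shape model from aligned training shapes.

    Assumption: training_shapes has shape (M, K, 3), i.e. M vehicles with
    K 3D keypoints each, already aligned in a common object frame.
    Returns the mean shape (3K,), the first `num_modes` eigenvectors as
    columns (3K, num_modes) and their eigenvalues (num_modes,).
    """
    m = training_shapes.shape[0]
    flat = training_shapes.reshape(m, -1)
    mean = flat.mean(axis=0)
    centred = flat - mean
    # PCA via SVD of the centred data matrix; singular values come sorted
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s**2 / max(m - 1, 1)
    return mean, vt[:num_modes].T, eigvals[:num_modes]


def reconstruct_shape(mean, modes, weights):
    """Shape = mean + modes @ weights, returned as (K, 3) keypoints."""
    return (mean + modes @ weights).reshape(-1, 3)


def keypoint_rms(pred, gt):
    """RMS keypoint error split into planimetry (X, Y) and height (Z)."""
    diff = pred - gt
    rms_xy = np.sqrt(np.mean(np.sum(diff[:, :2] ** 2, axis=1)))
    rms_z = np.sqrt(np.mean(diff[:, 2] ** 2))
    return rms_xy, rms_z


def orientation_error_deg(pred_deg, gt_deg):
    """Absolute heading difference, wrapped so the result lies in [0, 180]."""
    d = (np.asarray(pred_deg) - np.asarray(gt_deg) + 180.0) % 360.0 - 180.0
    return np.abs(d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 50 random "vehicles" with 12 keypoints each
    shapes = rng.normal(size=(50, 12, 3))
    mean, modes, eigvals = fit_active_shape_model(shapes, num_modes=3)
    # Deform the mean shape along the first two modes (in units of std. dev.)
    weights = np.array([1.0, -0.5, 0.0]) * np.sqrt(eigvals)
    shape = reconstruct_shape(mean, modes, weights)
    print(keypoint_rms(shape, mean.reshape(-1, 3)))
    print(orientation_error_deg(359.0, 1.0))
```

Setting all mode weights to zero reproduces the mean shape. Because the model is linear, a compact vector of weights can describe a whole keypoint configuration. One plausible design is for a network head to regress such weights alongside the pose, but the abstract does not say how the paper parameterises this output.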

ASJC Scopus subject areas

Social Sciences (all)
Geography, Planning and Development
Physics and Astronomy (all)
Instrumentation
Earth and Planetary Sciences (all)
Earth and Planetary Sciences (miscellaneous)

Cite

Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. / El Amrani Abouelassad, S.; Mehltretter, M.; Rottensteiner, F.
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, pp. 499-516.

Publication: Contribution to journal › Article › Research › Peer-review

El Amrani Abouelassad, S, Mehltretter, M & Rottensteiner, F 2024, 'Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN', PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 5, pp. 499-516. https://doi.org/10.1007/s41064-024-00311-0

El Amrani Abouelassad, S., Mehltretter, M., & Rottensteiner, F. (2024). Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 92(5), 499-516. https://doi.org/10.1007/s41064-024-00311-0

El Amrani Abouelassad S, Mehltretter M, Rottensteiner F. Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 Oct;92(5):499-516. Epub 2024 Sep 16. doi: 10.1007/s41064-024-00311-0

El Amrani Abouelassad, S. ; Mehltretter, M. ; Rottensteiner, F. / Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN. In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 ; Vol. 92, No. 5. pp. 499-516.


@article{7fd7714d18b148b38200326cc8a3c536,
title = "Monocular Pose and Shape Reconstruction of Vehicles in {UAV} imagery using a Multi-task {CNN}",
abstract = "Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3{\textdegree} in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.",
keywords = "Autonomous driving, Object detection, Object reconstruction, Pose estimation, Shape estimation",
author = "{El Amrani Abouelassad}, S. and M. Mehltretter and F. Rottensteiner",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = oct,
doi = "10.1007/s41064-024-00311-0",
language = "English",
journal = "PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science",
issn = "2512-2789",
volume = "92",
pages = "499--516",
number = "5",
}


TY  - JOUR
T1  - Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN
AU  - El Amrani Abouelassad, S.
AU  - Mehltretter, M.
AU  - Rottensteiner, F.
N1  - Publisher Copyright: © The Author(s) 2024.
PY  - 2024/10
Y1  - 2024/10
N2  - Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.
AB  - Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of cm in planimetry and cm in height for keypoints defining the car shape.
KW  - Autonomous driving
KW  - Object detection
KW  - Object reconstruction
KW  - Pose estimation
KW  - Shape estimation
UR  - http://www.scopus.com/inward/record.url?scp=85204012518&partnerID=8YFLogxK
U2  - 10.1007/s41064-024-00311-0
DO  - 10.1007/s41064-024-00311-0
M3  - Article
AN  - SCOPUS:85204012518
VL  - 92
SP  - 499
EP  - 516
JO  - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
JF  - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
SN  - 2512-2789
IS  - 5
ER  - 

Research@Leibniz University


From the same authors

Fresh Concrete Properties from Stereoscopic Image Sequences

Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images

Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications

Cooperative Image Orientation with Dynamic Objects

Editorial for Special Issue: 75 Years IPI—an Overview of Current Research Activities in Photogrammetry and Remote Sensing