Details
Original language | English |
---|---|
Article number | 9166762 |
Pages (from - to) | 8476-8489 |
Number of pages | 14 |
Journal | IEEE Transactions on Image Processing |
Volume | 29 |
Publication status | Published - 13 Aug 2020 |
Abstract
Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
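To make the association step described in the abstract concrete, here is a minimal sketch that casts matching video detections to IMU devices as a bipartite assignment problem on motion similarity. All names, the feature inputs, and the cosine-distance cost below are illustrative assumptions; the paper instead learns the similarity with a neural network and solves a global graph labeling problem over the whole sequence.

```python
# Hypothetical sketch: assign tracked detections to IMU devices by
# comparing motion descriptors. This is NOT the paper's model; the
# cosine-distance cost stands in for the learned similarity network.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_detections_with_imus(det_motion, imu_motion):
    """Match video tracklets to IMU devices by motion similarity.

    det_motion: (n_tracks, d) motion features derived from detections.
    imu_motion: (n_imus, d) motion features from IMU orientation streams.
    Returns (track_idx, imu_idx) pairs minimizing total cosine distance.
    """
    det = det_motion / np.linalg.norm(det_motion, axis=1, keepdims=True)
    imu = imu_motion / np.linalg.norm(imu_motion, axis=1, keepdims=True)
    cost = 1.0 - det @ imu.T  # (n_tracks, n_imus) dissimilarity matrix
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy usage: three people, three IMUs whose signals are a permuted,
# slightly noisy copy of the video motion features.
rng = np.random.default_rng(0)
dets = rng.standard_normal((3, 8))
imus = dets[[2, 0, 1]] + 0.05 * rng.standard_normal((3, 8))
print(associate_detections_with_imus(dets, imus))  # -> [(0, 1), (1, 2), (2, 0)]
```

Note the difference in scope: this sketch solves a one-shot assignment, whereas the paper's graph labeling formulation enforces consistency with both video and inertial recordings globally across the sequence, which is what allows positions to be reconstructed from IMU data through occlusions.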
ASJC Scopus Subject Areas
- Computer Science (all)
  - Software
  - Computer Graphics and Computer-Aided Design
Cite
Henschel, R., von Marcard, T., & Rosenhahn, B. (2020). Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs. In: IEEE Transactions on Image Processing, Vol. 29, 9166762, 13 Aug 2020, pp. 8476-8489.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs
AU - Henschel, Roberto
AU - von Marcard, Timo
AU - Rosenhahn, Bodo
PY - 2020/8/13
Y1 - 2020/8/13
N2 - Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
AB - Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
KW - graph labeling
KW - human motion analysis
KW - IMU
KW - Multiple people tracking
KW - sensor fusion
UR - http://www.scopus.com/inward/record.url?scp=85090821429&partnerID=8YFLogxK
U2 - 10.1109/tip.2020.3013801
DO - 10.1109/tip.2020.3013801
M3 - Article
AN - SCOPUS:85090821429
VL - 29
SP - 8476
EP - 8489
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
SN - 1057-7149
M1 - 9166762
ER -