M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks

Publication: Contribution to book/report/collection/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Mariia Khan
  • Jumana Abu-Khalaf
  • David Suter
  • Bodo Rosenhahn

Organisational units

External organisations

  • Edith Cowan University

Details

Original language: English
Title of host publication: Image and Vision Computing
Subtitle of host publication: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers
Editors: Wei Qi Yan, Minh Nguyen, Martin Stommel
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 246-261
Number of pages: 16
ISBN (electronic): 978-3-031-25825-1
ISBN (print): 978-3-031-25824-4
Publication status: Published - 2023
Event: 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022 - Auckland, New Zealand
Duration: 24 Nov 2022 - 25 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13836 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
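
The detect-then-associate pipeline described in the abstract admits a compact illustration: run a detector on each of the 12 rotation frames, then link detections across frames by the cosine similarity of their appearance features, counting the resulting tracks as unique objects. The Python sketch below is an illustrative approximation only, not the authors' implementation: the greedy matching scheme, the 0.6 similarity threshold, the running-mean appearance update, and the names (associate_tracks, sim_threshold) are all assumptions made for this example.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two appearance-feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def associate_tracks(frames, sim_threshold: float = 0.6):
    # frames: one list per rotation step (e.g. 12 for a 360-degree turn),
    # each containing (class_id, feature_vector) pairs from a detector.
    # Returns the list of tracks; its length estimates the number of
    # unique objects seen during the rotation. No training is involved.
    tracks = []
    for detections in frames:
        for class_id, feat in detections:
            # Only compare against tracks of the same predicted class.
            best, best_sim = None, sim_threshold
            for track in tracks:
                if track["class_id"] != class_id:
                    continue
                sim = cosine_similarity(track["feature"], feat)
                if sim > best_sim:
                    best, best_sim = track, sim
            if best is None:
                # No sufficiently similar track: start a new object identity.
                tracks.append({"class_id": class_id, "feature": feat})
            else:
                # Fold the new view into the track's appearance model
                # (running-mean update, an assumption of this sketch).
                best["feature"] = (best["feature"] + feat) / 2.0
    return tracks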

ASJC Scopus subject areas

Cite this

M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. / Khan, Mariia; Abu-Khalaf, Jumana; Suter, David et al.
Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. ed. / Wei Qi Yan; Minh Nguyen; Martin Stommel. Springer Science and Business Media Deutschland GmbH, 2023. pp. 246-261 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13836 LNCS).


Khan, M, Abu-Khalaf, J, Suter, D & Rosenhahn, B 2023, M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. in WQ Yan, M Nguyen & M Stommel (eds), Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13836 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 246-261, 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022, Auckland, New Zealand, 24 Nov. 2022. https://doi.org/10.1007/978-3-031-25825-1_18
Khan, M., Abu-Khalaf, J., Suter, D., & Rosenhahn, B. (2023). M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. In W. Q. Yan, M. Nguyen, & M. Stommel (Eds.), Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers (pp. 246-261). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13836 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-25825-1_18
Khan M, Abu-Khalaf J, Suter D, Rosenhahn B. M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. In Yan WQ, Nguyen M, Stommel M, editors, Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2023. p. 246-261. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2023 Feb 4. doi: 10.1007/978-3-031-25825-1_18
Khan, Mariia ; Abu-Khalaf, Jumana ; Suter, David et al. / M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. ed. / Wei Qi Yan ; Minh Nguyen ; Martin Stommel. Springer Science and Business Media Deutschland GmbH, 2023. pp. 246-261 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{f66aa93d2f084ea49f2208a469ef4958,
title = "M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks",
abstract = "In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent{\textquoteright}s way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.",
keywords = "Embodied AI, Multiple Object Tracking, Scene Understanding",
author = "Mariia Khan and Jumana Abu-Khalaf and David Suter and Bodo Rosenhahn",
year = "2023",
doi = "10.1007/978-3-031-25825-1_18",
language = "English",
isbn = "9783031258244",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "246--261",
editor = "Yan, {Wei Qi} and Minh Nguyen and Martin Stommel",
booktitle = "Image and Vision Computing",
address = "Germany",
note = "37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022 ; Conference date: 24-11-2022 Through 25-11-2022",

}

RIS

TY - GEN

T1 - M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks

T2 - 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022

AU - Khan, Mariia

AU - Abu-Khalaf, Jumana

AU - Suter, David

AU - Rosenhahn, Bodo

PY - 2023

Y1 - 2023

N2 - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.

AB - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.

KW - Embodied AI

KW - Multiple Object Tracking

KW - Scene Understanding

UR - http://www.scopus.com/inward/record.url?scp=85147999282&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-25825-1_18

DO - 10.1007/978-3-031-25825-1_18

M3 - Conference contribution

AN - SCOPUS:85147999282

SN - 9783031258244

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 246

EP - 261

BT - Image and Vision Computing

A2 - Yan, Wei Qi

A2 - Nguyen, Minh

A2 - Stommel, Martin

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 24 November 2022 through 25 November 2022

ER -
