M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks

Publication: Contribution to book/report/collection/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Mariia Khan
  • Jumana Abu-Khalaf
  • David Suter
  • Bodo Rosenhahn

Organisational units

External organisations

  • Edith Cowan University

Details

Original language: English
Title of host publication: Image and Vision Computing
Subtitle of host publication: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers
Editors: Wei Qi Yan, Minh Nguyen, Martin Stommel
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 246-261
Number of pages: 16
ISBN (electronic): 978-3-031-25825-1
ISBN (print): 978-3-031-25824-4
Publication status: Published - 2023
Event: 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022 - Auckland, New Zealand
Duration: 24 Nov 2022 - 25 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13836 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
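
The detect-then-associate pipeline described in the abstract admits a compact illustration: run a detector on each of the 12 rotation frames, then link detections across frames by the cosine similarity of their appearance features, counting the resulting tracks as unique objects. The Python sketch below is an illustrative approximation only, not the authors' implementation: the greedy matching scheme, the 0.6 similarity threshold, the running-mean appearance update, and the names (associate_tracks, sim_threshold) are all assumptions made for this example.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two appearance-feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def associate_tracks(frames, sim_threshold: float = 0.6):
    # frames: one list per rotation step (e.g. 12 for a 360-degree turn),
    # each containing (class_id, feature_vector) pairs from a detector.
    # Returns the list of tracks; its length estimates the number of
    # unique objects seen during the rotation. No training is involved.
    tracks = []
    for detections in frames:
        for class_id, feat in detections:
            # Only compare against tracks of the same predicted class.
            best, best_sim = None, sim_threshold
            for track in tracks:
                if track["class_id"] != class_id:
                    continue
                sim = cosine_similarity(track["feature"], feat)
                if sim > best_sim:
                    best, best_sim = track, sim
            if best is None:
                # No sufficiently similar track: start a new object identity.
                tracks.append({"class_id": class_id, "feature": feat})
            else:
                # Fold the new view into the track's appearance model
                # (running-mean update, an assumption of this sketch).
                best["feature"] = (best["feature"] + feat) / 2.0
    return tracks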

ASJC Scopus subject areas

Cite this

M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. / Khan, Mariia; Abu-Khalaf, Jumana; Suter, David et al.
Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. ed. / Wei Qi Yan; Minh Nguyen; Martin Stommel. Springer Science and Business Media Deutschland GmbH, 2023. pp. 246-261 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13836 LNCS).


Khan, M, Abu-Khalaf, J, Suter, D & Rosenhahn, B 2023, M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. in WQ Yan, M Nguyen & M Stommel (eds), Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13836 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 246-261, 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022, Auckland, New Zealand, 24 Nov. 2022. https://doi.org/10.1007/978-3-031-25825-1_18
Khan, M., Abu-Khalaf, J., Suter, D., & Rosenhahn, B. (2023). M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. In W. Q. Yan, M. Nguyen, & M. Stommel (Eds.), Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers (pp. 246-261). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13836 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-25825-1_18
Khan M, Abu-Khalaf J, Suter D, Rosenhahn B. M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. In Yan WQ, Nguyen M, Stommel M, editors, Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2023. p. 246-261. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2023 Feb 4. doi: 10.1007/978-3-031-25825-1_18
Khan, Mariia ; Abu-Khalaf, Jumana ; Suter, David et al. / M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks. Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. ed. / Wei Qi Yan ; Minh Nguyen ; Martin Stommel. Springer Science and Business Media Deutschland GmbH, 2023. pp. 246-261 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{f66aa93d2f084ea49f2208a469ef4958,
title = "M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks",
abstract = "In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent{\textquoteright}s way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.",
keywords = "Embodied AI, Multiple Object Tracking, Scene Understanding",
author = "Mariia Khan and Jumana Abu-Khalaf and David Suter and Bodo Rosenhahn",
year = "2023",
doi = "10.1007/978-3-031-25825-1_18",
language = "English",
isbn = "9783031258244",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "246--261",
editor = "Yan, {Wei Qi} and Minh Nguyen and Martin Stommel",
booktitle = "Image and Vision Computing",
address = "Germany",
note = "37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022 ; Conference date: 24-11-2022 Through 25-11-2022",

}

RIS

TY - GEN

T1 - M3T: Multi-class Multi-instance Multi-view Object Tracking for Embodied AI Tasks

T2 - 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022

AU - Khan, Mariia

AU - Abu-Khalaf, Jumana

AU - Suter, David

AU - Rosenhahn, Bodo

PY - 2023

Y1 - 2023

N2 - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.

AB - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the number of unique objects in the environment that are observed along the agent's way, whether visible from a far or close view, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving single-class multiple object instances in one video and track objects visible from only one angle or camera viewpoint. Thus, we present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track objects (without any training) using a cosine similarity metric for the association of object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26 point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6 point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new challenging tracking dataset from the AI2-THOR [3] simulator for training and evaluation of the proposed M3T-Round algorithm.

KW - Embodied AI

KW - Multiple Object Tracking

KW - Scene Understanding

UR - http://www.scopus.com/inward/record.url?scp=85147999282&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-25825-1_18

DO - 10.1007/978-3-031-25825-1_18

M3 - Conference contribution

AN - SCOPUS:85147999282

SN - 9783031258244

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 246

EP - 261

BT - Image and Vision Computing

A2 - Yan, Wei Qi

A2 - Nguyen, Minh

A2 - Stommel, Martin

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 24 November 2022 through 25 November 2022

ER -
