Details
Original language | English
Title of host publication | Image and Vision Computing
Subtitle | 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers
Editors | Wei Qi Yan, Minh Nguyen, Martin Stommel
Publisher | Springer Science and Business Media Deutschland GmbH
Pages | 246-261
Number of pages | 16
ISBN (electronic) | 978-3-031-25825-1
ISBN (print) | 978-3-031-25824-4
Publication status | Published - 2023
Event | 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022 - Auckland, New Zealand. Duration: 24 Nov 2022 → 25 Nov 2022
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume | 13836 LNCS
ISSN (print) | 0302-9743
ISSN (electronic) | 1611-3349
Abstract
In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the unique number of objects in the environment that the agent observes along its way, whether they are visible from far away or close up, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving, single-class, multiple object instances in one video and track objects visible from only one angle or camera viewpoint. We therefore present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track the objects (without any training), using a cosine similarity metric to associate object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26-point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6-point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new, challenging tracking dataset from the AI2-Thor [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
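The tracking step described in the abstract, detecting objects in each of the 12 rotation frames and then linking detections across frames by appearance similarity, can be illustrated with a short sketch. The code below is not the authors' implementation: it assumes each detection already carries a predicted class label and an appearance embedding, and the greedy matching, the dictionary layout, the function names and the 0.7 similarity threshold are illustrative choices only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D appearance embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def associate_detections(frames, sim_threshold=0.7):
    """Training-free association of detections across rotation frames.

    `frames` is a list with one entry per image frame; each entry is a list
    of detections of the form {"cls": <class label>, "emb": <np.ndarray>}.
    Returns a list of tracks; its length is the estimated number of unique
    objects seen during the 360° rotation.
    """
    tracks = []  # each track stores the object class and its latest embedding
    for detections in frames:
        for det in detections:
            best_track, best_sim = None, sim_threshold
            for track in tracks:
                if track["cls"] != det["cls"]:
                    continue  # only compare detections of the same class
                sim = cosine_similarity(track["emb"], det["emb"])
                if sim > best_sim:
                    best_track, best_sim = track, sim
            if best_track is None:
                # no sufficiently similar existing track: a new unique object
                tracks.append({"cls": det["cls"], "emb": det["emb"]})
            else:
                # same object seen again from another angle: refresh its embedding
                best_track["emb"] = det["emb"]
    return tracks
```

A full tracker would additionally enforce one-to-one matching within a frame (e.g. via the Hungarian algorithm) rather than the purely greedy assignment shown here.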
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
- General Computer Science
Cite

Standard
Image and Vision Computing: 37th International Conference, IVCNZ 2022, Auckland, New Zealand, November 24–25, 2022, Revised Selected Papers. Ed. / Wei Qi Yan; Minh Nguyen; Martin Stommel. Springer Science and Business Media Deutschland GmbH, 2023. pp. 246-261 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13836 LNCS).
Publication: Contribution to book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed
RIS
TY - GEN
T1 - M3T
T2 - 37th International Conference on Image and Vision Computing New Zealand, IVCNZ 2022
AU - Khan, Mariia
AU - Abu-Khalaf, Jumana
AU - Suter, David
AU - Rosenhahn, Bodo
PY - 2023
Y1 - 2023
N2 - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the unique number of objects in the environment that the agent observes along its way, whether they are visible from far away or close up, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving, single-class, multiple object instances in one video and track objects visible from only one angle or camera viewpoint. We therefore present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track the objects (without any training), using a cosine similarity metric to associate object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26-point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6-point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new, challenging tracking dataset from the AI2-Thor [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
AB - In this paper, we propose an extended multiple object tracking (MOT) task definition for the embodied AI visual exploration research task: multi-class, multi-instance and multi-view object tracking (M3T). The aim of the proposed M3T task is to identify the unique number of objects in the environment that the agent observes along its way, whether they are visible from far away or close up, from different angles, or only partially. Classic MOT algorithms are not applicable to the M3T task, as they typically target moving, single-class, multiple object instances in one video and track objects visible from only one angle or camera viewpoint. We therefore present the M3T-Round algorithm, designed for a simple scenario in which an agent captures 12 image frames while rotating 360° from its initial position in a scene. We first detect each object in all image frames and then track the objects (without any training), using a cosine similarity metric to associate object tracks. The detector part of our M3T-Round algorithm outperforms the baseline YOLOv4 algorithm [1] in detection accuracy, with a 5.26-point improvement in AP75. The tracker part of our M3T-Round algorithm shows a 4.6-point improvement in HOTA over the GMOTv2 algorithm [2], a recent, high-performance tracking method. Moreover, we have collected a new, challenging tracking dataset from the AI2-Thor [3] simulator for training and evaluation of the proposed M3T-Round algorithm.
KW - Embodied AI
KW - Multiple Object Tracking
KW - Scene Understanding
UR - http://www.scopus.com/inward/record.url?scp=85147999282&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-25825-1_18
DO - 10.1007/978-3-031-25825-1_18
M3 - Conference contribution
AN - SCOPUS:85147999282
SN - 9783031258244
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 246
EP - 261
BT - Image and Vision Computing
A2 - Yan, Wei Qi
A2 - Nguyen, Minh
A2 - Stommel, Martin
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 24 November 2022 through 25 November 2022
ER -