LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

  • Tuan T. Nguyen
  • Hoang H. Nguyen
  • Mina Sartipi
  • Marco Fisichella

External organisations

  • University of Tennessee, Chattanooga

Details

Original language: English
Pages (from - to): 6811–6837
Number of pages: 27
Journal: Machine learning
Volume: 113
Issue number: 9
Early online date: 15 July 2024
Publication status: Published - Sept. 2024

Abstract

Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging because publicly available data are scarce and because each new camera scenario requires laborious manual annotation of a new dataset and a tailored rule-based matching system. To address this issue, we present a novel solution termed LaMMOn, an end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) a Language Model Detection (LMD) module for object detection; (2) a Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) a Text-to-embedding (T2E) module that overcomes the problem of limited data by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on multiple datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable frame rate (12.20 to 13.37 FPS) for an online application.

Cite

LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. / Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina et al.
In: Machine learning, Vol. 113, No. 9, 09.2024, pp. 6811–6837.

Nguyen TT, Nguyen HH, Sartipi M, Fisichella M. LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. Machine learning. 2024 Sep;113(9):6811–6837. Epub 2024 Jul 15. doi: 10.1007/s10994-024-06592-1
Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina et al. / LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. In: Machine learning. 2024; Vol. 113, No. 9. pp. 6811–6837.
@article{3add23712e2f4aa4966de3fb3766b661,
title = "LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios",
abstract = "Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging because publicly available data are scarce and because each new camera scenario requires laborious manual annotation of a new dataset and a tailored rule-based matching system. To address this issue, we present a novel solution termed LaMMOn, an end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) a Language Model Detection (LMD) module for object detection; (2) a Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) a Text-to-embedding (T2E) module that overcomes the problem of limited data by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on multiple datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable frame rate (12.20 to 13.37 FPS) for an online application.",
keywords = "Language model, MTMCT, Multi-camera tracking, Object tracking",
author = "Nguyen, {Tuan T.} and Nguyen, {Hoang H.} and Mina Sartipi and Marco Fisichella",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = sep,
doi = "10.1007/s10994-024-06592-1",
language = "English",
volume = "113",
pages = "6811--6837",
journal = "Machine learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "9",

}

TY - JOUR

T1 - LaMMOn

T2 - language model combined graph neural network for multi-target multi-camera tracking in online scenarios

AU - Nguyen, Tuan T.

AU - Nguyen, Hoang H.

AU - Sartipi, Mina

AU - Fisichella, Marco

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/9

Y1 - 2024/9

AB - Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging because publicly available data are scarce and because each new camera scenario requires laborious manual annotation of a new dataset and a tailored rule-based matching system. To address this issue, we present a novel solution termed LaMMOn, an end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) a Language Model Detection (LMD) module for object detection; (2) a Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) a Text-to-embedding (T2E) module that overcomes the problem of limited data by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on multiple datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable frame rate (12.20 to 13.37 FPS) for an online application.

KW - Language model

KW - MTMCT

KW - Multi-camera tracking

KW - Object tracking

UR - http://www.scopus.com/inward/record.url?scp=85198649632&partnerID=8YFLogxK

U2 - 10.1007/s10994-024-06592-1

DO - 10.1007/s10994-024-06592-1

M3 - Article

AN - SCOPUS:85198649632

VL - 113

SP - 6811

EP - 6837

JO - Machine learning

JF - Machine learning

SN - 0885-6125

IS - 9

ER -
