LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

Tuan T. Nguyen; Hoang H. Nguyen; Mina Sartipi; Marco Fisichella

doi:10.1007/s10994-024-06592-1

Details

Original language	English
Pages (from - to)	6811–6837
Number of pages	27
Journal	Machine Learning
Volume	113
Issue number	9
Early online date	15 July 2024
Publication status	Published - Sept. 2024

Abstract

Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data, the laborious process of manually annotating new datasets, and the need to create a tailored rule-based matching system for each camera scenario. To address this issue, we present LaMMOn, a novel end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) the Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; and (3) the Text-to-embedding module (T2E), which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on several datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with a frame rate (12.20 to 13.37 FPS) acceptable for online applications.
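The abstract describes LaMMOn only at the level of its three modules. As a rough, non-authoritative illustration of how an online multi-camera pipeline can pass object embeddings from one stage to the next, the Python sketch below uses two deliberately simple stand-ins, both assumptions that do not come from the paper. `text_to_embedding` maps a text description to a normalized word-count vector over a small hypothetical vocabulary (in place of the learned T2E module). `GlobalTracker` assigns global IDs by greedy cosine-similarity matching against one track pool shared by all cameras (in place of the graph-based LGMA). Detection (LMD) is not modeled: each camera's detections are assumed to arrive already embedded, and the `threshold` of 0.8 is arbitrary.

```python
import math
from dataclasses import dataclass, field

# Hypothetical vocabulary for the T2E stand-in (not taken from the paper).
VOCAB = ["red", "white", "blue", "black", "sedan", "truck", "bus", "suv"]


def text_to_embedding(text: str) -> list[float]:
    """Stand-in for T2E: L2-normalised word counts over VOCAB.

    Unknown words are ignored; a text with no known words maps to the zero vector.
    """
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: list[float], b: list[float]) -> float:
    # The dot product equals cosine similarity because embeddings are unit-length;
    # a zero vector yields 0, so it never matches at a positive threshold.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class GlobalTracker:
    """Stand-in for association: greedy matching to a cross-camera track pool."""

    threshold: float = 0.8  # arbitrary similarity cut-off (assumption)
    tracks: dict[int, list[float]] = field(default_factory=dict)
    next_id: int = 0

    def update(self, embeddings: list[list[float]]) -> list[int]:
        """Assign a global track ID to each detection embedding of one camera frame."""
        ids: list[int] = []
        used: set[int] = set()  # at most one detection per track within a frame
        for emb in embeddings:
            best_id, best_sim = None, self.threshold
            for tid, track_emb in self.tracks.items():
                if tid in used:
                    continue
                sim = cosine(emb, track_emb)
                if sim >= best_sim:
                    best_id, best_sim = tid, sim
            if best_id is None:  # no existing track is similar enough: open a new one
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = emb  # keep the latest appearance embedding
            used.add(best_id)
            ids.append(best_id)
        return ids


if __name__ == "__main__":
    tracker = GlobalTracker()
    cam1 = [text_to_embedding("red sedan"), text_to_embedding("white truck")]
    cam2 = [text_to_embedding("white truck"), text_to_embedding("blue bus")]
    print(tracker.update(cam1))  # [0, 1]
    print(tracker.update(cam2))  # [1, 2]
```

Because both cameras share one track pool, the white truck seen by the second camera receives the global ID already assigned by the first camera, which is the cross-camera identity consistency that multi-camera tracking requires. In a learned system, the greedy threshold rule would be replaced by a trained association model.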

ASJC Scopus subject areas

Computer Science (all): Software
Computer Science (all): Artificial Intelligence

Cite

LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. / Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina et al.
In: Machine Learning, Vol. 113, No. 9, 09.2024, pp. 6811–6837.

Publication: Contribution to journal › Article › Research › Peer-review

Nguyen, TT, Nguyen, HH, Sartipi, M & Fisichella, M 2024, 'LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios', Machine Learning, vol. 113, no. 9, pp. 6811–6837. https://doi.org/10.1007/s10994-024-06592-1

Nguyen, T. T., Nguyen, H. H., Sartipi, M., & Fisichella, M. (2024). LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. Machine Learning, 113(9), 6811–6837. https://doi.org/10.1007/s10994-024-06592-1

Nguyen TT, Nguyen HH, Sartipi M, Fisichella M. LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. Machine Learning. 2024 Sep;113(9):6811–6837. Epub 2024 Jul 15. doi: 10.1007/s10994-024-06592-1

Nguyen, Tuan T. ; Nguyen, Hoang H. ; Sartipi, Mina et al. / LaMMOn : language model combined graph neural network for multi-target multi-camera tracking in online scenarios. In: Machine Learning. 2024 ; Vol. 113, No. 9. pp. 6811–6837.


@article{3add23712e2f4aa4966de3fb3766b661,

title = "{LaMMOn}: language model combined graph neural network for multi-target multi-camera tracking in online scenarios",

abstract = "Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data, the laborious process of manually annotating new datasets, and the need to create a tailored rule-based matching system for each camera scenario. To address this issue, we present LaMMOn, a novel end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) the Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; and (3) the Text-to-embedding module (T2E), which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on several datasets, e.g., CityFlow (HOTA 76.46\%), I24 (HOTA 25.7\%), and TrackCUIP (HOTA 80.94\%), with a frame rate (12.20 to 13.37 FPS) acceptable for online applications.",

keywords = "Language model, Mtmct, Multi-camera tracking, Object tracking",

author = "Nguyen, {Tuan T.} and Nguyen, {Hoang H.} and Mina Sartipi and Marco Fisichella",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = sep,

doi = "10.1007/s10994-024-06592-1",

language = "English",

volume = "113",

pages = "6811--6837",

journal = "Machine Learning",

issn = "0885-6125",

publisher = "Springer Netherlands",

number = "9",

}


TY - JOUR

T1 - LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

AU - Nguyen, Tuan T.

AU - Nguyen, Hoang H.

AU - Sartipi, Mina

AU - Fisichella, Marco

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/9

Y1 - 2024/9

N2 - Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data, the laborious process of manually annotating new datasets, and the need to create a tailored rule-based matching system for each camera scenario. To address this issue, we present LaMMOn, a novel end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) the Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; and (3) the Text-to-embedding module (T2E), which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on several datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with a frame rate (12.20 to 13.37 FPS) acceptable for online applications.

AB - Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data, the laborious process of manually annotating new datasets, and the need to create a tailored rule-based matching system for each camera scenario. To address this issue, we present LaMMOn, a novel end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) the Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; and (3) the Text-to-embedding module (T2E), which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn can run online in real-time scenarios and achieves competitive results on several datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with a frame rate (12.20 to 13.37 FPS) acceptable for online applications.

KW - Language model

KW - Mtmct

KW - Multi-camera tracking

KW - Object tracking

UR - http://www.scopus.com/inward/record.url?scp=85198649632&partnerID=8YFLogxK

U2 - 10.1007/s10994-024-06592-1

DO - 10.1007/s10994-024-06592-1

M3 - Article

AN - SCOPUS:85198649632

VL - 113

SP - 6811

EP - 6837

JO - Machine Learning

JF - Machine Learning

SN - 0885-6125

IS - 9

ER -

Research@Leibniz University


By the same authors

Open benchmark for filtering techniques in entity resolution

Does a language model “understand” high school math? A survey of deep learning based word problem solvers

FairTrade: Achieving Pareto-Optimal Trade-Offs between Balanced Accuracy and Fairness in Federated Learning

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs
