LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Tuan T. Nguyen
  • Hoang H. Nguyen
  • Mina Sartipi
  • Marco Fisichella

Research Organisations

External Research Organisations

  • University of Tennessee, Chattanooga

Details

Original language: English
Pages (from-to): 6811–6837
Number of pages: 27
Journal: Machine learning
Volume: 113
Issue number: 9
Early online date: 15 Jul 2024
Publication status: Published - September 2024

Abstract

Multi-target multi-camera tracking is crucial to intelligent transportation systems, and numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating each new dataset and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) the Language Model Detection (LMD) module for object detection; (2) the Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) the Text-to-embedding (T2E) module, which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn runs online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable FPS (from 12.20 to 13.37) for online applications.
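The abstract describes the architecture only at a high level; to make the module roles concrete, the following is a minimal, purely illustrative Python sketch of how the three modules could fit together in an online loop. All class, method, and parameter names (TextToEmbedding, LanguageModelDetector, LanguageGraphAssociator, run_online) are assumptions introduced for illustration and do not correspond to the authors' published code or API.

```python
# Conceptual sketch of the LaMMOn pipeline as described in the abstract above.
# Every name here is a hypothetical illustration, not the authors' actual code.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in frame coordinates
    embedding: List[float]                   # embedding used for association
    camera_id: int
    frame_id: int


class TextToEmbedding:
    """T2E: synthesizes object embeddings from defined text descriptions,
    which, per the abstract, eases the lack of annotated training data."""
    def encode(self, description: str) -> List[float]:
        raise NotImplementedError  # e.g., a pretrained text encoder


class LanguageModelDetector:
    """LMD: transformer-based detection producing boxes plus embeddings."""
    def detect(self, frame, camera_id: int, frame_id: int) -> List[Detection]:
        raise NotImplementedError


class LanguageGraphAssociator:
    """LGMA: graph neural network that links detections into tracks within a
    camera and clusters trajectories across cameras into global identities."""
    def associate(self, detections: List[Detection]) -> Dict[int, List[Detection]]:
        raise NotImplementedError  # returns {global_track_id: [detections]}


def run_online(frames_by_step, lmd: LanguageModelDetector, lgma: LanguageGraphAssociator):
    """Process one synchronized set of camera frames per time step, which is
    what allows the tracker to be used in online (streaming) scenarios."""
    tracks: Dict[int, List[Detection]] = {}
    for frame_id, frames in enumerate(frames_by_step):
        detections: List[Detection] = []
        for camera_id, frame in enumerate(frames):
            detections.extend(lmd.detect(frame, camera_id, frame_id))
        tracks = lgma.associate(detections)
    return tracks
```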

Keywords

    Language model, MTMCT, Multi-camera tracking, Object tracking

ASJC Scopus subject areas

Cite this

LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. / Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina et al.
In: Machine learning, Vol. 113, No. 9, 09.2024, p. 6811–6837.

Research output: Contribution to journal › Article › Research › peer review

Nguyen TT, Nguyen HH, Sartipi M, Fisichella M. LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. Machine learning. 2024 Sept;113(9):6811–6837. Epub 2024 Jul 15. doi: 10.1007/s10994-024-06592-1
Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina et al. / LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. In: Machine learning. 2024; Vol. 113, No. 9. pp. 6811–6837.
BibTeX
@article{3add23712e2f4aa4966de3fb3766b661,
title = "LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios",
abstract = "Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this issue. Nevertheless, using the approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating the new dataset and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; (3) Text-to-embedding module (T2E) that overcome the problem of data limitation by synthesizing the object embedding from defined texts. LaMMOn can be run online in real-time scenarios and achieve a competitive result on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%) with an acceptable FPS (from 12.20 to 13.37) for an online application.",
keywords = "Language model, Mtmct, Multi-camera tracking, Object tracking",
author = "Nguyen, {Tuan T.} and Nguyen, {Hoang H.} and Mina Sartipi and Marco Fisichella",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = sep,
doi = "10.1007/s10994-024-06592-1",
language = "English",
volume = "113",
pages = "6811–6837",
journal = "Machine learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "9",

}

RIS

TY - JOUR

T1 - LaMMOn

T2 - language model combined graph neural network for multi-target multi-camera tracking in online scenarios

AU - Nguyen, Tuan T.

AU - Nguyen, Hoang H.

AU - Sartipi, Mina

AU - Fisichella, Marco

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/9

Y1 - 2024/9

N2 - Multi-target multi-camera tracking is crucial to intelligent transportation systems, and numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating each new dataset and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) the Language Model Detection (LMD) module for object detection; (2) the Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) the Text-to-embedding (T2E) module, which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn runs online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable FPS (from 12.20 to 13.37) for online applications.

AB - Multi-target multi-camera tracking is crucial to intelligent transportation systems, and numerous recent studies have addressed this problem. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating each new dataset and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) the Language Model Detection (LMD) module for object detection; (2) the Language and Graph Model Association (LGMA) module for object tracking and trajectory clustering; and (3) the Text-to-embedding (T2E) module, which overcomes the problem of data limitation by synthesizing object embeddings from defined texts. LaMMOn runs online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable FPS (from 12.20 to 13.37) for online applications.

KW - Language model

KW - MTMCT

KW - Multi-camera tracking

KW - Object tracking

UR - http://www.scopus.com/inward/record.url?scp=85198649632&partnerID=8YFLogxK

U2 - 10.1007/s10994-024-06592-1

DO - 10.1007/s10994-024-06592-1

M3 - Article

AN - SCOPUS:85198649632

VL - 113

SP - 6811

EP - 6837

JO - Machine learning

JF - Machine learning

SN - 0885-6125

IS - 9

ER -
