Improving single image localization through domain adaptation and large kernel attention with synthetic data

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Dansheng Yao
  • Hehua Zhu
  • Bangke Ren
  • Xiaoying Zhuang

External Research Organisations

  • Tongji University

Details

Original language: English
Article number: 108951
Number of pages: 18
Journal: Engineering Applications of Artificial Intelligence
Volume: 137
Early online date: 8 Aug 2024
Publication status: Published - Nov 2024

Abstract

In the realm of digital twin technology, image localization is a crucial task, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (SfM) pipelines, the construction phase of civil engineering projects often faces economic constraints that limit data collection. The result is sporadic, localized snapshots rather than comprehensive spatial and temporal coverage of the entire scene, and this pervasive data sparsity makes accurate image localization difficult. Our research addresses this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network with feature-fused adversarial components, designed for the sparse data typical of civil engineering construction sites. The network uses large kernel convolutions for refined channel and spatial attention, preserving precise location information even in data-limited scenarios. Multi-scale convolutional layers and a multi-level discriminator network further improve accuracy by reducing the domain shift between virtual and real-world imagery. We evaluated the approach, including ablation studies, on two public datasets. In indoor settings, we achieved median localization errors of 1.12 m and 9.80°; in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the challenges that data sparsity poses in civil engineering construction. We also investigated how domain adaptation affects localization accuracy at different feature levels, finding that its effect depends on the degree of alignment between the virtual and real datasets.
In conclusion, this study contributes to image localization for digital twin technology, particularly in data-sparse civil engineering construction, and lays a foundation for future work on optimizing image localization in similar sparse-data environments.
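The median errors quoted above pair a distance (metres) with an angle (degrees). In absolute pose regression benchmarks these are conventionally computed per query image as the Euclidean distance between predicted and ground-truth camera positions and the angle of the relative rotation between predicted and ground-truth orientations, then summarised by the median over the test set. The abstract does not spell out the exact evaluation protocol, so the following is a minimal sketch assuming that convention, with orientations given as (w, x, y, z) quaternions; the function names are illustrative and not taken from the paper.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed (w, x, y, z) order
Pose = Tuple[Vec3, Quat]  # (camera position, camera orientation)


def translation_error(t_pred: Vec3, t_gt: Vec3) -> float:
    """Euclidean distance between predicted and ground-truth positions (scene units, e.g. metres)."""
    return math.dist(t_pred, t_gt)


def rotation_error_deg(q_pred: Quat, q_gt: Quat) -> float:
    """Angle in degrees of the relative rotation between two orientations."""
    norm_pred = math.sqrt(sum(c * c for c in q_pred))
    norm_gt = math.sqrt(sum(c * c for c in q_gt))
    dot = sum(a * b for a, b in zip(q_pred, q_gt)) / (norm_pred * norm_gt)
    # q and -q encode the same rotation, so use |dot|; clamp to 1.0 against rounding.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


def median_pose_errors(preds: Sequence[Pose], gts: Sequence[Pose]) -> Tuple[float, float]:
    """Median translation error and median rotation error (degrees) over a test set."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and of equal length")
    t_errs: List[float] = [translation_error(p[0], g[0]) for p, g in zip(preds, gts)]
    r_errs: List[float] = [rotation_error_deg(p[1], g[1]) for p, g in zip(preds, gts)]
    return median(t_errs), median(r_errs)


if __name__ == "__main__":
    s = math.sqrt(0.5)
    # Toy ground truth: every camera at the origin with identity orientation.
    gts = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))] * 3
    preds = [
        ((3.0, 4.0, 0.0), (1.0, 0.0, 0.0, 0.0)),  # 5 m off, 0 deg off
        ((0.0, 0.0, 0.0), (s, 0.0, 0.0, s)),      # 0 m off, 90 deg about z
        ((0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 1.0)),  # 2 m off, 180 deg about z
    ]
    t_med, r_med = median_pose_errors(preds, gts)
    print(f"median error: {t_med:.2f} m, {r_med:.2f} deg")  # prints: median error: 2.00 m, 90.00 deg
```

Using the median rather than the mean keeps a handful of gross failures from dominating the summary statistic, which is why a single pair such as "1.12 m, 9.80°" can describe a whole test scene.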

Keywords

    3D model, Domain adaptation, Large kernel attention, Synthetic dataset, Visual localization

Cite this

Improving single image localization through domain adaptation and large kernel attention with synthetic data. / Yao, Dansheng; Zhu, Hehua; Ren, Bangke et al.
In: Engineering Applications of Artificial Intelligence, Vol. 137, 108951, 11.2024.

Yao D, Zhu H, Ren B, Zhuang X. Improving single image localization through domain adaptation and large kernel attention with synthetic data. Engineering Applications of Artificial Intelligence. 2024 Nov;137:108951. Epub 2024 Aug 8. doi: 10.1016/j.engappai.2024.108951
@article{fcdcb700fec543e9983ccefc51dc033a,
title = "Improving single image localization through domain adaptation and large kernel attention with synthetic data",
keywords = "3D model, Domain adaptation, Large kernel attention, Synthetic dataset, Visual localization",
author = "Dansheng Yao and Hehua Zhu and Bangke Ren and Xiaoying Zhuang",
note = "Publisher Copyright: {\textcopyright} 2024",
year = "2024",
month = nov,
doi = "10.1016/j.engappai.2024.108951",
language = "English",
volume = "137",
journal = "Engineering Applications of Artificial Intelligence",
issn = "0952-1976",
publisher = "Elsevier Ltd.",

}

TY  - JOUR
T1  - Improving single image localization through domain adaptation and large kernel attention with synthetic data
AU  - Yao, Dansheng
AU  - Zhu, Hehua
AU  - Ren, Bangke
AU  - Zhuang, Xiaoying
N1  - Publisher Copyright: © 2024
PY  - 2024/11
Y1  - 2024/11
N2  - In the realm of digital twin technology, image localization emerges as a crucial aspect, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (SfM) technologies, the construction phase of civil engineering projects often faces economic constraints that limit data collection. This results in sporadic and localized snapshots, rather than comprehensive spatial and temporal coverage of the entire scene. Such prevalent data sparsity poses significant challenges to achieving accurate image localization. Our research is tailored to address this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network, incorporating feature-fused adversarial components, to effectively navigate the complexities of sparse data typical in civil engineering construction sites. The network employs large kernel convolutions for refined channel and spatial attention, ensuring precise location information transmission, even in data-limited scenarios. This accuracy is further augmented by multi-scale convolutional layers and a multi-level discriminator network, aiming to minimize the domain shift between virtual and real-world imagery. Our approach was rigorously tested and subjected to ablation studies on two public datasets, confirming its efficacy. In indoor settings, we achieved a median localization accuracy of 1.12 m and 9.80°, and in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the unique challenges posed by data sparsity in civil engineering construction. We also investigated the impact of domain adaptation on localization accuracy across different feature levels, finding that its effect varies depending on the degree of alignment between virtual and real datasets.
In conclusion, this study offers a significant contribution to image localization in digital twin technology, particularly in the challenging context of data-sparse civil engineering construction processes. It paves the way for future research in optimizing image localization techniques in similar sparse data environments.
KW  - 3D model
KW  - Domain adaptation
KW  - Large kernel attention
KW  - Synthetic dataset
KW  - Visual localization
UR  - http://www.scopus.com/inward/record.url?scp=85200634854&partnerID=8YFLogxK
U2  - 10.1016/j.engappai.2024.108951
DO  - 10.1016/j.engappai.2024.108951
M3  - Article
AN  - SCOPUS:85200634854
VL  - 137
JO  - Engineering Applications of Artificial Intelligence
JF  - Engineering Applications of Artificial Intelligence
SN  - 0952-1976
M1  - 108951
ER  - 