Improving single image localization through domain adaptation and large kernel attention with synthetic data

Dansheng Yao; Hehua Zhu; Bangke Ren; Xiaoying Zhuang

doi:10.1016/j.engappai.2024.108951

Details

Original language	English
Article number	108951
Number of pages	18
Journal	Engineering Applications of Artificial Intelligence
Volume	137
Early online date	8 Aug 2024
Publication status	Published - Nov 2024

Abstract

In the realm of digital twin technology, image localization emerges as a crucial aspect, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (SfM) technologies, the construction phase of civil engineering projects often faces economic constraints that limit data collection. This results in sporadic and localized snapshots, rather than comprehensive spatial and temporal coverage of the entire scene. Such prevalent data sparsity poses significant challenges to achieving accurate image localization. Our research is tailored to address this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network, incorporating feature-fused adversarial components, to effectively navigate the complexities of sparse data typical in civil engineering construction sites. The network employs large kernel convolutions for refined channel and spatial attention, ensuring precise location information transmission, even in data-limited scenarios. This accuracy is further augmented by multi-scale convolutional layers and a multi-level discriminator network, aiming to minimize the domain shift between virtual and real-world imagery. Our approach was rigorously tested and subjected to ablation studies on two public datasets, confirming its efficacy. In indoor settings, we achieved a median localization accuracy of 1.12 m and 9.80°, and in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the unique challenges posed by data sparsity in civil engineering construction. We also investigated the impact of domain adaptation on localization accuracy across different feature levels, finding that its effect varies depending on the degree of alignment between virtual and real datasets.
In conclusion, this study offers a significant contribution to image localization in digital twin technology, particularly in the challenging context of data-sparse civil engineering construction processes. It paves the way for future research in optimizing image localization techniques in similar sparse data environments.
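The abstract reports results as median position and orientation errors (for example 1.12 m and 9.80° indoors). This record does not state how those figures are computed, so the sketch below assumes the convention common in camera pose regression benchmarks: the position error is the Euclidean distance between estimated and ground-truth camera centres, the orientation error is the angle between the two rotations written as quaternions, and each is reduced to a median over the test images. The function names and the (w, x, y, z) quaternion layout are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import median


def translation_error(t_est, t_gt):
    """Euclidean distance (metres) between estimated and ground-truth camera positions."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(q_est, q_gt):
    """Angle in degrees between two orientations given as quaternions (w, x, y, z).

    Assumes both quaternions are non-zero; they are normalised here.
    """
    n_est = math.sqrt(sum(c * c for c in q_est))
    n_gt = math.sqrt(sum(c * c for c in q_gt))
    # |<q_est, q_gt>| is the cosine of half the relative rotation angle.
    # abs() handles the double cover: q and -q describe the same rotation.
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt))) / (n_est * n_gt)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def median_pose_error(estimates, ground_truth):
    """Median translation (m) and rotation (deg) error over paired poses.

    Each pose is a tuple (t, q) with t = (x, y, z) and q = (w, x, y, z).
    """
    t_errs = [translation_error(te, tg)
              for (te, _), (tg, _) in zip(estimates, ground_truth)]
    r_errs = [rotation_error_deg(qe, qg)
              for (_, qe), (_, qg) in zip(estimates, ground_truth)]
    return median(t_errs), median(r_errs)


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    yaw_90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))  # 90° about z
    gt = [((0.0, 0.0, 0.0), identity)] * 3
    est = [((1.0, 0.0, 0.0), identity),
           ((0.0, 2.0, 0.0), yaw_90),
           ((0.0, 0.0, 3.0), identity)]
    print(median_pose_error(est, gt))  # → (2.0, 0.0)
```

Reporting the median rather than the mean keeps a few badly localized images (common when training views are sparse) from dominating the summary figure.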

Keywords

3D model, Domain adaptation, Large kernel attention, Synthetic dataset, Visual localization

ASJC Scopus subject areas

Engineering(all) › Control and Systems Engineering
Computer Science(all) › Artificial Intelligence
Engineering(all) › Electrical and Electronic Engineering

Cite this

Improving single image localization through domain adaptation and large kernel attention with synthetic data. / Yao, Dansheng; Zhu, Hehua; Ren, Bangke et al.
In: Engineering Applications of Artificial Intelligence, Vol. 137, 108951, 11.2024.

Research output: Contribution to journal › Article › Research › peer review

Yao, D, Zhu, H, Ren, B & Zhuang, X 2024, 'Improving single image localization through domain adaptation and large kernel attention with synthetic data', Engineering Applications of Artificial Intelligence, vol. 137, 108951. https://doi.org/10.1016/j.engappai.2024.108951

Yao, D., Zhu, H., Ren, B., & Zhuang, X. (2024). Improving single image localization through domain adaptation and large kernel attention with synthetic data. Engineering Applications of Artificial Intelligence, 137, Article 108951. https://doi.org/10.1016/j.engappai.2024.108951

Yao D, Zhu H, Ren B, Zhuang X. Improving single image localization through domain adaptation and large kernel attention with synthetic data. Engineering Applications of Artificial Intelligence. 2024 Nov;137:108951. Epub 2024 Aug 8. doi: 10.1016/j.engappai.2024.108951

Yao, Dansheng ; Zhu, Hehua ; Ren, Bangke et al. / Improving single image localization through domain adaptation and large kernel attention with synthetic data. In: Engineering Applications of Artificial Intelligence. 2024 ; Vol. 137.


@article{fcdcb700fec543e9983ccefc51dc033a,

title = "Improving single image localization through domain adaptation and large kernel attention with synthetic data",

abstract = "In the realm of digital twin technology, image localization emerges as a crucial aspect, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (sfm) technologies, the construction phase of civil engineering projects often faces economic constraints that limit data collection. This results in sporadic and localized snapshots, rather than comprehensive spatial and temporal coverage of the entire scene. Such prevalent data sparsity poses significant challenges to achieving accurate image localization. Our research is tailored to address this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network, incorporating feature-fused adversarial components, to effectively navigate the complexities of sparse data typical in civil engineering construction sites. The network employs large kernel convolutions for refined channel and spatial attention, ensuring precise location information transmission, even in data-limited scenarios. This accuracy is further augmented by multi-scale convolutional layers and a multi-level discriminator network, aiming to minimize the domain shift between virtual and real-world imagery. Our approach was rigorously tested and subjected to ablation studies on two public datasets, confirming its efficacy. In indoor settings, we achieved a median localization accuracy of 1.12 m and 9.80°, and in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the unique challenges posed by data sparsity in civil engineering construction. We also investigated the impact of domain adaptation on localization accuracy across different feature levels, finding that its effect varies depending on the degree of alignment between virtual and real datasets. 
In conclusion, this study offers a significant contribution to image localization in digital twin technology, particularly in the challenging context of data-sparse civil engineering construction processes. It paves the way for future research in optimizing image localization techniques in similar sparse data environments.",

keywords = "3D model, Domain adaptation, Large kernel attention, Synthetic dataset, Visual localization",

author = "Dansheng Yao and Hehua Zhu and Bangke Ren and Xiaoying Zhuang",

note = "Publisher Copyright: {\textcopyright} 2024",

year = "2024",

month = nov,

doi = "10.1016/j.engappai.2024.108951",

language = "English",

volume = "137",

journal = "Engineering Applications of Artificial Intelligence",

issn = "0952-1976",

publisher = "Elsevier Ltd.",

}


TY - JOUR

T1 - Improving single image localization through domain adaptation and large kernel attention with synthetic data

AU - Yao, Dansheng

AU - Zhu, Hehua

AU - Ren, Bangke

AU - Zhuang, Xiaoying

PY - 2024/11

Y1 - 2024/11

N2 - In the realm of digital twin technology, image localization emerges as a crucial aspect, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (sfm) technologies, the construction phase of civil engineering projects often faces economic constraints that limit data collection. This results in sporadic and localized snapshots, rather than comprehensive spatial and temporal coverage of the entire scene. Such prevalent data sparsity poses significant challenges to achieving accurate image localization. Our research is tailored to address this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network, incorporating feature-fused adversarial components, to effectively navigate the complexities of sparse data typical in civil engineering construction sites. The network employs large kernel convolutions for refined channel and spatial attention, ensuring precise location information transmission, even in data-limited scenarios. This accuracy is further augmented by multi-scale convolutional layers and a multi-level discriminator network, aiming to minimize the domain shift between virtual and real-world imagery. Our approach was rigorously tested and subjected to ablation studies on two public datasets, confirming its efficacy. In indoor settings, we achieved a median localization accuracy of 1.12 m and 9.80°, and in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the unique challenges posed by data sparsity in civil engineering construction. We also investigated the impact of domain adaptation on localization accuracy across different feature levels, finding that its effect varies depending on the degree of alignment between virtual and real datasets. 
In conclusion, this study offers a significant contribution to image localization in digital twin technology, particularly in the challenging context of data-sparse civil engineering construction processes. It paves the way for future research in optimizing image localization techniques in similar sparse data environments.

AB - In the realm of digital twin technology, image localization emerges as a crucial aspect, particularly in the challenging domain of civil engineering construction. Unlike the data-rich environments typical of structure-from-motion (sfm) technologies, the construction phase of civil engineering projects often faces economic constraints that limit data collection. This results in sporadic and localized snapshots, rather than comprehensive spatial and temporal coverage of the entire scene. Such prevalent data sparsity poses significant challenges to achieving accurate image localization. Our research is tailored to address this specific challenge, focusing on single image localization in environments where data is inherently sparse. We introduce a multi-scale convolutional attention network, incorporating feature-fused adversarial components, to effectively navigate the complexities of sparse data typical in civil engineering construction sites. The network employs large kernel convolutions for refined channel and spatial attention, ensuring precise location information transmission, even in data-limited scenarios. This accuracy is further augmented by multi-scale convolutional layers and a multi-level discriminator network, aiming to minimize the domain shift between virtual and real-world imagery. Our approach was rigorously tested and subjected to ablation studies on two public datasets, confirming its efficacy. In indoor settings, we achieved a median localization accuracy of 1.12 m and 9.80°, and in outdoor environments, our best results were 3.69 m and 1.67°. These outcomes highlight the effectiveness of our method in addressing the unique challenges posed by data sparsity in civil engineering construction. We also investigated the impact of domain adaptation on localization accuracy across different feature levels, finding that its effect varies depending on the degree of alignment between virtual and real datasets. 
In conclusion, this study offers a significant contribution to image localization in digital twin technology, particularly in the challenging context of data-sparse civil engineering construction processes. It paves the way for future research in optimizing image localization techniques in similar sparse data environments.

KW - 3D model

KW - Domain adaptation

KW - Large kernel attention

KW - Synthetic dataset

KW - Visual localization

UR - http://www.scopus.com/inward/record.url?scp=85200634854&partnerID=8YFLogxK

U2 - 10.1016/j.engappai.2024.108951

DO - 10.1016/j.engappai.2024.108951

M3 - Article

AN - SCOPUS:85200634854

VL - 137

JO - Engineering Applications of Artificial Intelligence

JF - Engineering Applications of Artificial Intelligence

SN - 0952-1976

M1 - 108951

ER -

Research@Leibniz University