Details
Original language | English |
---|---|
Title of host publication | Neural Information Processing |
Subtitle of host publication | 30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part I |
Editors | Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 173-187 |
Number of pages | 15 |
ISBN (electronic) | 978-981-99-8079-6 |
ISBN (print) | 978-981-99-8078-9 |
Publication status | Published - 14 Nov 2023 |
Event | 30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China. Duration: 20 Nov 2023 → 23 Nov 2023 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14447 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Patch embedding has been a significant advancement in Transformer-based models, particularly the Vision Transformer (ViT): it enables handling larger image sizes and mitigates the quadratic runtime of self-attention layers. It also captures global dependencies and relationships between patches, supporting effective image understanding and analysis. However, Convolutional Neural Networks (CNNs) continue to excel in scenarios with limited data availability, and their efficiency in memory usage and latency makes them particularly suitable for deployment on edge devices. Building on this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that applies patch embedding to both time series and image data for classification. By employing isotropic models, Minape addresses the challenges posed by varying data sizes and complexities. It groups samples by modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs of these pathways are merged and fed to a temporal classifier. Experimental results demonstrate that Minape significantly outperforms existing approaches in accuracy while requiring fewer than 1M parameters and occupying less than 12 MB. This performance was observed on multimodal benchmark datasets and on the authors’ newly collected multi-dimensional multimodal dataset, Mude-streda, obtained from real industrial processing devices (link to code and dataset: https://github.com/hubtru/Minape).
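The pipeline outlined above (per-modality two-dimensional representations, linear patch embedding, a scalable isotropic convolutional stack, and a temporal classifier over the merged pathway outputs) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical reconstruction under assumed dimensions, module names, and concatenation-based fusion; it is not the authors’ implementation, which is available at https://github.com/hubtru/Minape.

```python
# Hypothetical sketch of the pipeline described in the abstract (not the authors' code).
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Linear patch embedding: split a 2-D input into patches and project each patch."""
    def __init__(self, in_channels, patch_size, dim):
        super().__init__()
        # A strided convolution is equivalent to a per-patch linear projection.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, C, H, W)
        return self.proj(x)                     # (B, dim, H/ps, W/ps)

class IsotropicBlock(nn.Module):
    """Isotropic block: channel width and spatial resolution stay constant throughout."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise
        self.pw = nn.Conv2d(dim, dim, kernel_size=1)                          # pointwise
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.pw(self.act(self.norm(self.dw(x))))  # residual connection

class ModalityPathway(nn.Module):
    """Patch embedding followed by a stack of isotropic blocks for one modality."""
    def __init__(self, in_channels, patch_size, dim, depth):
        super().__init__()
        self.embed = PatchEmbed(in_channels, patch_size, dim)
        self.blocks = nn.Sequential(*[IsotropicBlock(dim) for _ in range(depth)])
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.blocks(self.embed(x))
        return self.pool(x).flatten(1)           # (B, dim)

class MinapeSketch(nn.Module):
    """Two pathways (image, time series rendered as a 2-D map) fused by concatenation."""
    def __init__(self, dim=64, depth=4, num_classes=10):
        super().__init__()
        self.image_path = ModalityPathway(in_channels=3, patch_size=8, dim=dim, depth=depth)
        self.series_path = ModalityPathway(in_channels=1, patch_size=4, dim=dim, depth=depth)
        # "Temporal classifier" placeholder: a GRU over the fused features of a short window.
        self.rnn = nn.GRU(input_size=2 * dim, hidden_size=dim, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images, series):           # images: (B, T, 3, H, W), series: (B, T, 1, h, w)
        T = images.shape[1]
        feats = []
        for t in range(T):                        # encode each time step independently
            f = torch.cat([self.image_path(images[:, t]), self.series_path(series[:, t])], dim=1)
            feats.append(f)
        seq = torch.stack(feats, dim=1)           # (B, T, 2*dim)
        _, h = self.rnn(seq)
        return self.head(h[-1])                   # (B, num_classes)

model = MinapeSketch()
out = model(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 1, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```

The patch sizes, embedding width, depth, and GRU-based temporal head are illustrative assumptions chosen only to make the sketch runnable end to end.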
Keywords
- Isotropic Architecture
- Multimodal Classification
- Patch Embedding
- Time Series
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
Cite this
Multimodal Isotropic Neural Architecture with Patch Embedding. / Truchan, Hubert; Naumov, Evgenii; Abedin, Rezaul; Palmer, Gregory; Ahmadi, Zahra. Neural Information Processing: 30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part I. ed. / Biao Luo; Long Cheng; Zheng-Guang Wu; Hongyi Li; Chaojie Li. Springer Science and Business Media Deutschland GmbH, 2023. p. 173-187 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14447 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Multimodal Isotropic Neural Architecture with Patch Embedding
AU - Truchan, Hubert
AU - Naumov, Evgenii
AU - Abedin, Rezaul
AU - Palmer, Gregory
AU - Ahmadi, Zahra
PY - 2023/11/14
Y1 - 2023/11/14
N2 - Patch embedding has been a significant advancement in Transformer-based models, particularly the Vision Transformer (ViT): it enables handling larger image sizes and mitigates the quadratic runtime of self-attention layers. It also captures global dependencies and relationships between patches, supporting effective image understanding and analysis. However, Convolutional Neural Networks (CNNs) continue to excel in scenarios with limited data availability, and their efficiency in memory usage and latency makes them particularly suitable for deployment on edge devices. Building on this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that applies patch embedding to both time series and image data for classification. By employing isotropic models, Minape addresses the challenges posed by varying data sizes and complexities. It groups samples by modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs of these pathways are merged and fed to a temporal classifier. Experimental results demonstrate that Minape significantly outperforms existing approaches in accuracy while requiring fewer than 1M parameters and occupying less than 12 MB. This performance was observed on multimodal benchmark datasets and on the authors’ newly collected multi-dimensional multimodal dataset, Mude-streda, obtained from real industrial processing devices (link to code and dataset: https://github.com/hubtru/Minape).
AB - Patch embedding has been a significant advancement in Transformer-based models, particularly the Vision Transformer (ViT): it enables handling larger image sizes and mitigates the quadratic runtime of self-attention layers. It also captures global dependencies and relationships between patches, supporting effective image understanding and analysis. However, Convolutional Neural Networks (CNNs) continue to excel in scenarios with limited data availability, and their efficiency in memory usage and latency makes them particularly suitable for deployment on edge devices. Building on this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that applies patch embedding to both time series and image data for classification. By employing isotropic models, Minape addresses the challenges posed by varying data sizes and complexities. It groups samples by modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs of these pathways are merged and fed to a temporal classifier. Experimental results demonstrate that Minape significantly outperforms existing approaches in accuracy while requiring fewer than 1M parameters and occupying less than 12 MB. This performance was observed on multimodal benchmark datasets and on the authors’ newly collected multi-dimensional multimodal dataset, Mude-streda, obtained from real industrial processing devices (link to code and dataset: https://github.com/hubtru/Minape).
KW - Isotropic Architecture
KW - Multimodal Classification
KW - Patch Embedding
KW - Time Series
UR - http://www.scopus.com/inward/record.url?scp=85181982796&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8079-6_14
DO - 10.1007/978-981-99-8079-6_14
M3 - Conference contribution
AN - SCOPUS:85181982796
SN - 9789819980789
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 173
EP - 187
BT - Neural Information Processing
A2 - Luo, Biao
A2 - Cheng, Long
A2 - Wu, Zheng-Guang
A2 - Li, Hongyi
A2 - Li, Chaojie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Neural Information Processing, ICONIP 2023
Y2 - 20 November 2023 through 23 November 2023
ER -