Details
Original language | English |
---|---|
Pages (from - to) | 174107-174121 |
Number of pages | 15 |
Journal | IEEE ACCESS |
Volume | 12 |
Publication status | Published - 13 Nov 2024 |
Abstract
With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation comes from the important areas of predicting mortality and phenotyping where using different modalities of data could significantly improve our ability to predict. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. Apart from the technical challenges, our goal is to make the predictive model more robust in noisy conditions and perform better than current methods. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results while simultaneously providing a principled means of modeling uncertainty. Additionally, we include attention mechanisms to fuse different modalities, allowing the model to focus on what's important for each task. We tested our approach using the comprehensive multimodal MIMIC dataset, combining MIMIC-IV and MIMIC-CXR datasets. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code for this work is publicly available at: https://github.com/AliRasekh/TSImageFusion.
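For illustration only: a minimal sketch of the two-encoder, attention-fused architecture the abstract describes. The module choices (GRU for the time series, small CNN for the image), dimensions, and names below are assumptions made for the sketch, not the authors' implementation; the actual code is available in the repository linked above.

```python
# Minimal sketch of a two-encoder fusion model with attention, as described in the abstract.
# All module choices and sizes are illustrative assumptions, not the published implementation
# (see https://github.com/AliRasekh/TSImageFusion for the authors' code).
import torch
import torch.nn as nn


class TwoEncoderFusion(nn.Module):
    def __init__(self, ts_features=76, d_model=128, n_classes=2):
        super().__init__()
        # Time-series encoder: a GRU over clinical measurement sequences (assumption).
        self.ts_encoder = nn.GRU(ts_features, d_model, batch_first=True)
        # Image encoder: a small CNN over chest X-rays, pooled to a single vector (assumption).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Attention-based fusion: the image summary attends over the time-series states,
        # letting the model weight the time steps that matter for the task.
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, ts, img):
        ts_states, _ = self.ts_encoder(ts)            # (B, T, d_model)
        img_vec = self.img_encoder(img).unsqueeze(1)  # (B, 1, d_model)
        fused, _ = self.fusion(img_vec, ts_states, ts_states)
        return self.head(torch.cat([fused.squeeze(1), img_vec.squeeze(1)], dim=-1))


# Toy usage with random tensors standing in for vitals time series and X-ray images.
model = TwoEncoderFusion()
logits = model(torch.randn(4, 48, 76), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 2])
```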
ASJC Scopus subject areas
- Computer Science (all)
- General Computer Science
- Materials Science (all)
- General Materials Science
- Engineering (all)
- General Engineering
Cite
In: IEEE ACCESS, Vol. 12, 13.11.2024, p. 174107-174121.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction
AU - Rasekh, Ali
AU - Heidari, Reza
AU - Hosein Haji Mohammad Rezaie, Amir
AU - Sharifi Sedeh, Parsa
AU - Ahmadi, Zahra
AU - Mitra, Prasenjit
AU - Nejdl, Wolfgang
N1 - Publisher Copyright: © 2024 The Authors.
PY - 2024/11/13
Y1 - 2024/11/13
N2 - With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation comes from the important areas of predicting mortality and phenotyping where using different modalities of data could significantly improve our ability to predict. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. Apart from the technical challenges, our goal is to make the predictive model more robust in noisy conditions and perform better than current methods. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results while simultaneously providing a principled means of modeling uncertainty. Additionally, we include attention mechanisms to fuse different modalities, allowing the model to focus on what's important for each task. We tested our approach using the comprehensive multimodal MIMIC dataset, combining MIMIC-IV and MIMIC-CXR datasets. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code for this work is publicly available at: https://github.com/AliRasekh/TSImageFusion.
AB - With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation comes from the important areas of predicting mortality and phenotyping where using different modalities of data could significantly improve our ability to predict. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. Apart from the technical challenges, our goal is to make the predictive model more robust in noisy conditions and perform better than current methods. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results while simultaneously providing a principled means of modeling uncertainty. Additionally, we include attention mechanisms to fuse different modalities, allowing the model to focus on what's important for each task. We tested our approach using the comprehensive multimodal MIMIC dataset, combining MIMIC-IV and MIMIC-CXR datasets. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code for this work is publicly available at: https://github.com/AliRasekh/TSImageFusion.
KW - attention mechanism
KW - Multimodal learning
KW - phenotyping
KW - robustness
KW - time series
UR - http://www.scopus.com/inward/record.url?scp=85210390662&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3497668
DO - 10.1109/ACCESS.2024.3497668
M3 - Article
AN - SCOPUS:85210390662
VL - 12
SP - 174107
EP - 174121
JO - IEEE ACCESS
JF - IEEE ACCESS
SN - 2169-3536
ER -