MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction

Ngan Dong; Stefanie Mucke; Megha Khosla

doi:10.1109/TCBB.2022.3176456

Details

Original language	English
Pages (from-to)	3081-3092
Number of pages	12
Journal	IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume	19
Issue number	6
Publication status	Published - 20 May 2022

Abstract

Growing evidence from recent studies suggests that microRNAs (miRNAs) could serve as biomarkers in various complex human diseases. Because wet-lab experiments for detecting miRNAs associated with a disease are expensive and time-consuming, machine learning techniques for miRNA-disease association prediction have attracted much attention in recent years. A major challenge in building reliable machine learning models is data scarcity. In particular, existing approaches trained on the available small datasets, even when combined with precalculated handcrafted input features, often suffer from poor generalization and data leakage. We overcome these limitations by proposing a novel multitask graph convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from five heterogeneous biological information sources (associations between miRNAs and protein-coding genes (PCGs), associations between diseases and PCGs, interactions between PCGs, miRNA family information, and disease ontology) in a multitask setting, a perspective that has not been studied before. To test the generalization capability of our model, we conduct large-scale experiments on the standard benchmark datasets as well as on our proposed large independent testing sets and case studies. MuCoMiD obtains significantly higher Average Precision (AP) scores than all benchmarked models on three large independent testing sets, especially those with many new miRNAs, as well as in the detection of false positives. Because it learns directly from raw input information, MuCoMiD is easier to maintain and update than handcrafted feature-based methods, which require recomputing features every time the original information sources change (e.g., disease ontology or miRNA/disease-PCG associations). We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/cmtt.
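
The abstract reports results as Average Precision (AP). As a rough illustration of how that metric is commonly computed for ranked association predictions, here is a minimal Python sketch. It is not taken from the authors' code: the function name `average_precision`, the toy labels and the toy scores are all hypothetical, and tied scores are simply kept in input order rather than grouped by threshold as some libraries do.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a true association appears.

    labels: 1 for a known miRNA-disease association, 0 otherwise (assumed encoding).
    scores: predicted association scores, higher meaning more confident.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Rank candidate pairs from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("AP is undefined without at least one positive label")
    return precision_sum / hits


# Hypothetical toy example: five candidate miRNA-disease pairs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
print(round(ap, 4))  # prints 0.8333
```

In this toy ranking the two true associations sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2. Because AP rewards placing true associations near the top of the ranking, it is often used when positives are rare.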

Keywords

Data integration, disease, graph representation learning, miRNA, multitask

ASJC Scopus subject areas

Biochemistry, Genetics and Molecular Biology (all): Biotechnology; Genetics
Mathematics (all): Applied Mathematics

Sustainable Development Goals

SDG 3 - Good Health and Well-being

Cite this

MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction. / Dong, Ngan; Mucke, Stefanie; Khosla, Megha.
In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 19, No. 6, 20.05.2022, p. 3081-3092.

Research output: Contribution to journal › Article › Research › peer review

Dong, N, Mucke, S & Khosla, M 2022, 'MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction', IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 19, no. 6, pp. 3081-3092. https://doi.org/10.1109/TCBB.2022.3176456

Dong, N., Mucke, S., & Khosla, M. (2022). MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19(6), 3081-3092. https://doi.org/10.1109/TCBB.2022.3176456

Dong N, Mucke S, Khosla M. MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2022 May 20;19(6):3081-3092. doi: 10.1109/TCBB.2022.3176456

Dong, Ngan ; Mucke, Stefanie ; Khosla, Megha. / MuCoMiD : A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2022 ; Vol. 19, No. 6. pp. 3081-3092.

@article{6413f62ac80c47b7ab52349fb2bab598,

title = "MuCoMiD: A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction",

abstract = "Growing evidence from recent studies implies that microRNAs or miRNAs could serve as biomarkers in various complex human diseases. Since wet-lab experiments for detecting miRNAs associated with a disease are expensive and time-consuming, machine learning techniques for miRNA-disease association prediction have attracted much attention in recent years. A big challenge in building reliable machine learning models is that of data scarcity. In particular, existing approaches trained on the available small datasets, even when combined with precalculated handcrafted input features, often suffer from bad generalization and data leakage problems. We overcome the limitations of existing works by proposing a novel multitask graph convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from five heterogeneous biological information sources (associations between miRNAs/diseases and protein-coding genes (PCGs), interactions between protein-coding genes, miRNA family information, and disease ontology) in a multitask setting which is a novel perspective and has not been studied before. To effectively test the generalization capability of our model, we conduct large-scale experiments on the standard benchmark datasets as well as on our proposed large independent testing sets and case studies. MuCoMiD obtains significantly higher Average Precision (AP) scores than all benchmarked models on three large independent testing sets, especially those with many new miRNAs, as well as in the detection of false positives. Thanks to its capability of learning directly from raw input information, MuCoMiD is easier to maintain and update than handcrafted feature-based methods, which would require recomputation of features every time there is a change in the original information sources (e.g., disease ontology, miRNA/disease-PCG associations, etc.). 
We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/cmtt.",

keywords = "Data integration, disease, graph representation learning, MiRNA, multitask",

author = "Ngan Dong and Stefanie Mucke and Megha Khosla",

year = "2022",

month = may,

day = "20",

doi = "10.1109/TCBB.2022.3176456",

language = "English",

volume = "19",

pages = "3081--3092",

journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",

issn = "1545-5963",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - MuCoMiD

T2 - A Multitask Graph Convolutional Learning Framework for miRNA-Disease Association Prediction

AU - Dong, Ngan

AU - Mucke, Stefanie

AU - Khosla, Megha

PY - 2022/5/20

Y1 - 2022/5/20

AB - Growing evidence from recent studies implies that microRNAs or miRNAs could serve as biomarkers in various complex human diseases. Since wet-lab experiments for detecting miRNAs associated with a disease are expensive and time-consuming, machine learning techniques for miRNA-disease association prediction have attracted much attention in recent years. A big challenge in building reliable machine learning models is that of data scarcity. In particular, existing approaches trained on the available small datasets, even when combined with precalculated handcrafted input features, often suffer from bad generalization and data leakage problems. We overcome the limitations of existing works by proposing a novel multitask graph convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from five heterogeneous biological information sources (associations between miRNAs/diseases and protein-coding genes (PCGs), interactions between protein-coding genes, miRNA family information, and disease ontology) in a multitask setting which is a novel perspective and has not been studied before. To effectively test the generalization capability of our model, we conduct large-scale experiments on the standard benchmark datasets as well as on our proposed large independent testing sets and case studies. MuCoMiD obtains significantly higher Average Precision (AP) scores than all benchmarked models on three large independent testing sets, especially those with many new miRNAs, as well as in the detection of false positives. Thanks to its capability of learning directly from raw input information, MuCoMiD is easier to maintain and update than handcrafted feature-based methods, which would require recomputation of features every time there is a change in the original information sources (e.g., disease ontology, miRNA/disease-PCG associations, etc.). 
We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/cmtt.

KW - Data integration

KW - disease

KW - graph representation learning

KW - MiRNA

KW - multitask

UR - http://www.scopus.com/inward/record.url?scp=85130506199&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2022.3176456

DO - 10.1109/TCBB.2022.3176456

M3 - Article

AN - SCOPUS:85130506199

VL - 19

SP - 3081

EP - 3092

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 6

ER -
