A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells

Research output: Contribution to journal / Article / Research / peer review

Authors

  • Bin Liu
  • Patrick Lindner
  • Adan Chari Jirmo
  • Ulrich Maus
  • Thomas Illig
  • David S. Deluca

Research Organisations

External Research Organisations

  • Hannover Medical School (MHH)

Details

Original language: English
Article number: 28
Journal: BMC BIOINFORMATICS
Volume: 21
Issue number: 1
Publication status: Published - 2020

Abstract

Background: Despite the significant contribution of transcriptomics to biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative to using gene sets from curated databases. We examine the method of deriving custom gene sets, which may be relevant to a given experiment, from reference data sets generated in previous transcriptomics studies. We call these data-derived gene sets "gene signatures" for the biological process tested in the previous study. We focus on the feasibility of this approach in analyzing immune-related processes, which are complex in nature but play an important role in medical research.

Results: We evaluate several statistical approaches to detecting the activity of a gene signature in a target data set. We compare the performance of the data-derived gene signature approach with comparable GO term gene sets across all of the statistical tests. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures were used to detect the presence of immunological processes in the test data with modest accuracy (AUC = 0.67). The performance for GO and literature-based gene sets was worse (AUC = 0.59). Both approaches were plagued by poor specificity.

Conclusions: When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set based approaches for immunological signatures. Furthermore, the data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they also offer the opportunity to define signatures in a cell-type specific manner. However, neither the data-derived signatures nor standard gene sets can be demonstrated to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives.
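
The abstract does not name the statistical tests that were compared, so the following is only an illustrative sketch, not the authors' method. It assumes a one-sided hypergeometric overlap test as one plausible way to score the activity of a gene signature in a target data set, and a rank-based AUC to summarize how well such scores separate positive from negative cases. The function names and parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_signature, n_de, n_overlap):
    """One-sided hypergeometric p-value (assumed test, not from the paper).

    Probability of seeing at least `n_overlap` signature genes among
    `n_de` differentially expressed genes drawn from `n_universe` genes,
    of which `n_signature` belong to the signature.
    """
    total = comb(n_universe, n_de)
    upper = min(n_signature, n_de)
    tail = sum(
        comb(n_signature, k) * comb(n_universe - n_signature, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, each differential expression comparison would receive a score (for example, one minus the enrichment p-value), and `auc` would be computed over all comparisons using labels that mark whether the immunological process was actually present. Poor specificity, as reported in the abstract, would show up as negative cases receiving scores as high as positive ones.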

Keywords

    Gene set, Gene signature, Transcriptome

Cite this

A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells. / Liu, Bin; Lindner, Patrick; Jirmo, Adan Chari et al.
In: BMC BIOINFORMATICS, Vol. 21, No. 1, 28, 2020.

Liu B, Lindner P, Jirmo AC, Maus U, Illig T, Deluca DS. A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells. BMC BIOINFORMATICS. 2020;21(1):28. doi: 10.1186/s12859-020-3366-4
@article{1e0ffe69fc53434c9b1c4eb0caa5ef06,
title = "A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells",
abstract = "Background: Despite the significant contribution of transcriptomics to biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative to using gene sets from curated databases. We examine the method of deriving custom gene sets, which may be relevant to a given experiment, from reference data sets generated in previous transcriptomics studies. We call these data-derived gene sets {"}gene signatures{"} for the biological process tested in the previous study. We focus on the feasibility of this approach in analyzing immune-related processes, which are complex in nature but play an important role in medical research. Results: We evaluate several statistical approaches to detecting the activity of a gene signature in a target data set. We compare the performance of the data-derived gene signature approach with comparable GO term gene sets across all of the statistical tests. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures were used to detect the presence of immunological processes in the test data with modest accuracy (AUC = 0.67). The performance for GO and literature-based gene sets was worse (AUC = 0.59). Both approaches were plagued by poor specificity. Conclusions: When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set based approaches for immunological signatures. Furthermore, the data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they also offer the opportunity to define signatures in a cell-type specific manner. However, neither the data-derived signatures nor standard gene sets can be demonstrated to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives.",
keywords = "Gene set, Gene signature, Transcriptome",
author = "Bin Liu and Patrick Lindner and Jirmo, {Adan Chari} and Ulrich Maus and Thomas Illig and Deluca, {David S.}",
note = "Funding information: This work was funded by the Bundesministerium f{\"u}r Bildung und Forschung BMBF (German Center for Lung Research (DZL)). The funding bodies were not involved in the design of the study, collection, analysis, and interpretation of data, nor in writing the manuscript.",
year = "2020",
doi = "10.1186/s12859-020-3366-4",
language = "English",
volume = "21",
journal = "BMC BIOINFORMATICS",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "1",

}

TY - JOUR

T1 - A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells

AU - Liu, Bin

AU - Lindner, Patrick

AU - Jirmo, Adan Chari

AU - Maus, Ulrich

AU - Illig, Thomas

AU - Deluca, David S.

N1 - Funding information: This work was funded by the Bundesministerium für Bildung und Forschung BMBF (German Center for Lung Research (DZL)). The funding bodies were not involved in the design of the study, collection, analysis, and interpretation of data, nor in writing the manuscript.

PY - 2020

Y1 - 2020

N2 - Background: Despite the significant contribution of transcriptomics to biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative to using gene sets from curated databases. We examine the method of deriving custom gene sets, which may be relevant to a given experiment, from reference data sets generated in previous transcriptomics studies. We call these data-derived gene sets "gene signatures" for the biological process tested in the previous study. We focus on the feasibility of this approach in analyzing immune-related processes, which are complex in nature but play an important role in medical research. Results: We evaluate several statistical approaches to detecting the activity of a gene signature in a target data set. We compare the performance of the data-derived gene signature approach with comparable GO term gene sets across all of the statistical tests. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures were used to detect the presence of immunological processes in the test data with modest accuracy (AUC = 0.67). The performance for GO and literature-based gene sets was worse (AUC = 0.59). Both approaches were plagued by poor specificity. Conclusions: When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set based approaches for immunological signatures. Furthermore, the data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they also offer the opportunity to define signatures in a cell-type specific manner. However, neither the data-derived signatures nor standard gene sets can be demonstrated to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives.

KW - Gene set

KW - Gene signature

KW - Transcriptome

UR - http://www.scopus.com/inward/record.url?scp=85078680098&partnerID=8YFLogxK

U2 - 10.1186/s12859-020-3366-4

DO - 10.1186/s12859-020-3366-4

M3 - Article

C2 - 31992182

AN - SCOPUS:85078680098

VL - 21

JO - BMC BIOINFORMATICS

JF - BMC BIOINFORMATICS

SN - 1471-2105

IS - 1

M1 - 28

ER -