ViruSurf: an integrated database to investigate viral sequences

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Arif Canakoglu
  • Pietro Pinoli
  • Anna Bernasconi
  • Tommaso Alfonsi
  • Damianos P. Melidis
  • Stefano Ceri

Research Organisations

External Research Organisations

  • Politecnico di Milano

Details

Original language: English
Pages (from-to): D817-D824
Journal: Nucleic Acids Research
Volume: 49
Issue number: D1
Early online date: 12 Oct 2020
Publication status: Published - 8 Jan 2021

Abstract

ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.
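To make the four-dimension search idea in the abstract concrete, the following is a minimal in-memory sketch, not ViruSurf's actual schema or interface. All field names (`virus_species`, `sequencing_platform`, `source_database`, `amino_acid_variants`, and so on), the accession IDs, and the sample records are hypothetical and chosen only for illustration. It shows how conditions on biological, technological, organizational and analytical attributes could be combined freely to select sequences.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class SequenceRecord:
    """One sequence with attributes grouped by dimension (hypothetical names)."""
    accession_id: str
    # Biological dimension
    virus_species: str
    host_species: str
    # Technological dimension
    sequencing_platform: str
    # Organizational dimension
    source_database: str
    country: str
    # Analytical dimension (variants called on the sequence)
    amino_acid_variants: FrozenSet[str] = frozenset()


# A condition is any predicate over a record.
Condition = Callable[[SequenceRecord], bool]


def search(records: Iterable[SequenceRecord], *conditions: Condition) -> List[SequenceRecord]:
    """Return the records that satisfy every condition (logical AND)."""
    return [r for r in records if all(cond(r) for cond in conditions)]


# Made-up sample data, for illustration only.
records = [
    SequenceRecord("EX0001", "SARS-CoV-2", "Homo sapiens", "Illumina",
                   "GenBank", "Italy", frozenset({"S:D614G"})),
    SequenceRecord("EX0002", "SARS-CoV-2", "Homo sapiens", "Nanopore",
                   "COG-UK", "United Kingdom"),
    SequenceRecord("EX0003", "MERS-CoV", "Camelus dromedarius", "Illumina",
                   "GenBank", "Saudi Arabia"),
]

# Combine one biological, one organizational and one analytical condition.
hits = search(
    records,
    lambda r: r.virus_species == "SARS-CoV-2",
    lambda r: r.source_database == "GenBank",
    lambda r: "S:D614G" in r.amino_acid_variants,
)
print([r.accession_id for r in hits])
```

Modelling each condition as an independent predicate keeps the dimensions decoupled: a query can mix any number of conditions from any dimension, which mirrors the "freely combine" wording of the abstract without assuming anything about how the real database implements it.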

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Genetics

Cite this

ViruSurf: an integrated database to investigate viral sequences. / Canakoglu, Arif; Pinoli, Pietro; Bernasconi, Anna et al.
In: Nucleic Acids Research, Vol. 49, No. D1, 08.01.2021, p. D817-D824.

Canakoglu, A, Pinoli, P, Bernasconi, A, Alfonsi, T, Melidis, D & Ceri, S 2021, 'ViruSurf: an integrated database to investigate viral sequences', Nucleic Acids Research, vol. 49, no. D1, pp. D817-D824. https://doi.org/10.1093/nar/gkaa846
Canakoglu, A., Pinoli, P., Bernasconi, A., Alfonsi, T., Melidis, D., & Ceri, S. (2021). ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research, 49(D1), D817-D824. https://doi.org/10.1093/nar/gkaa846
Canakoglu A, Pinoli P, Bernasconi A, Alfonsi T, Melidis D, Ceri S. ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research. 2021 Jan 8;49(D1):D817-D824. Epub 2020 Oct 12. doi: 10.1093/nar/gkaa846
Canakoglu, Arif ; Pinoli, Pietro ; Bernasconi, Anna et al. / ViruSurf : an integrated database to investigate viral sequences. In: Nucleic Acids Research. 2021 ; Vol. 49, No. D1. pp. D817-D824.
@article{ca11031611fb4ded93e11fabc013df2f,
title = "ViruSurf: an integrated database to investigate viral sequences",
abstract = "ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.",
author = "Arif Canakoglu and Pietro Pinoli and Anna Bernasconi and Tommaso Alfonsi and Damianos P. Melidis and Stefano Ceri",
note = "Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].",
year = "2021",
month = jan,
day = "8",
doi = "10.1093/nar/gkaa846",
language = "English",
volume = "49",
pages = "D817--D824",
journal = "Nucleic Acids Research",
issn = "0301-5610",
publisher = "Oxford University Press",
number = "D1",

}


TY - JOUR

T1 - ViruSurf

T2 - an integrated database to investigate viral sequences

AU - Canakoglu, Arif

AU - Pinoli, Pietro

AU - Bernasconi, Anna

AU - Alfonsi, Tommaso

AU - Melidis, Damianos P.

AU - Ceri, Stefano

N1 - Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].

PY - 2021/1/8

Y1 - 2021/1/8

N2 - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

AB - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

UR - http://www.scopus.com/inward/record.url?scp=85097436367&partnerID=8YFLogxK

U2 - 10.1093/nar/gkaa846

DO - 10.1093/nar/gkaa846

M3 - Article

VL - 49

SP - D817-D824

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0301-5610

IS - D1

ER -