ViruSurf: an integrated database to investigate viral sequences

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

  • Arif Canakoglu
  • Pietro Pinoli
  • Anna Bernasconi
  • Tommaso Alfonsi
  • Damianos P. Melidis

Organisational units

External organisations

  • Politecnico di Milano

Details

Original language: English
Pages (from-to): D817-D824
Journal: Nucleic Acids Research
Volume: 49
Issue number: D1
Early online date: 12 Oct 2020
Publication status: Published - 8 Jan 2021

Abstract

ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Genetics

Sustainable Development Goals

Cite this

ViruSurf: an integrated database to investigate viral sequences. / Canakoglu, Arif; Pinoli, Pietro; Bernasconi, Anna et al.
In: Nucleic Acids Research, Vol. 49, No. D1, 8 Jan 2021, pp. D817-D824.

Canakoglu, A, Pinoli, P, Bernasconi, A, Alfonsi, T, Melidis, D & Ceri, S 2021, 'ViruSurf: an integrated database to investigate viral sequences', Nucleic Acids Research, vol. 49, no. D1, pp. D817-D824. https://doi.org/10.1093/nar/gkaa846
Canakoglu, A., Pinoli, P., Bernasconi, A., Alfonsi, T., Melidis, D., & Ceri, S. (2021). ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research, 49(D1), D817-D824. https://doi.org/10.1093/nar/gkaa846
Canakoglu A, Pinoli P, Bernasconi A, Alfonsi T, Melidis D, Ceri S. ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research. 2021 Jan 8;49(D1):D817-D824. Epub 2020 Oct 12. doi: 10.1093/nar/gkaa846
Canakoglu, Arif ; Pinoli, Pietro ; Bernasconi, Anna et al. / ViruSurf : an integrated database to investigate viral sequences. In: Nucleic Acids Research. 2021 ; Vol. 49, No. D1. pp. D817-D824.
@article{ca11031611fb4ded93e11fabc013df2f,
title = "ViruSurf: an integrated database to investigate viral sequences",
abstract = "ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.",
author = "Arif Canakoglu and Pietro Pinoli and Anna Bernasconi and Tommaso Alfonsi and Damianos P. Melidis and Stefano Ceri",
note = "Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].",
year = "2021",
month = jan,
day = "8",
doi = "10.1093/nar/gkaa846",
language = "English",
volume = "49",
pages = "D817--D824",
journal = "Nucleic Acids Research",
issn = "0301-5610",
publisher = "Oxford University Press",
number = "D1",

}

TY - JOUR

T1 - ViruSurf

T2 - an integrated database to investigate viral sequences

AU - Canakoglu, Arif

AU - Pinoli, Pietro

AU - Bernasconi, Anna

AU - Alfonsi, Tommaso

AU - Melidis, Damianos P.

AU - Ceri, Stefano

N1 - Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].

PY - 2021/1/8

Y1 - 2021/1/8

N2 - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

AB - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

UR - http://www.scopus.com/inward/record.url?scp=85097436367&partnerID=8YFLogxK

U2 - 10.1093/nar/gkaa846

DO - 10.1093/nar/gkaa846

M3 - Article

VL - 49

SP - D817-D824

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0301-5610

IS - D1

ER -