ViruSurf: an integrated database to investigate viral sequences

Arif Canakoglu; Pietro Pinoli; Anna Bernasconi; Tommaso Alfonsi; Damianos P. Melidis; Stefano Ceri

doi:10.1093/nar/gkaa846

Details

Original language	English
Pages (from - to)	D817-D824
Journal	Nucleic Acids Research
Volume	49
Issue number	D1
Early online date	12 Oct 2020
Publication status	Published - 8 Jan 2021

Abstract

ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

ASJC Scopus subject areas

Biochemistry, Genetics and Molecular Biology (all)
Genetics

Sustainable Development Goals

SDG 3 – Good Health and Well-being

Cite this

ViruSurf: an integrated database to investigate viral sequences. / Canakoglu, Arif; Pinoli, Pietro; Bernasconi, Anna et al.
In: Nucleic Acids Research, Vol. 49, No. D1, 08.01.2021, p. D817-D824.

Research output: Contribution to journal › Article › Research › Peer-reviewed

Canakoglu, A, Pinoli, P, Bernasconi, A, Alfonsi, T, Melidis, D & Ceri, S 2021, 'ViruSurf: an integrated database to investigate viral sequences', Nucleic Acids Research, vol. 49, no. D1, pp. D817-D824. https://doi.org/10.1093/nar/gkaa846

Canakoglu, A., Pinoli, P., Bernasconi, A., Alfonsi, T., Melidis, D., & Ceri, S. (2021). ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research, 49(D1), D817-D824. https://doi.org/10.1093/nar/gkaa846

Canakoglu A, Pinoli P, Bernasconi A, Alfonsi T, Melidis D, Ceri S. ViruSurf: an integrated database to investigate viral sequences. Nucleic Acids Research. 2021 Jan 8;49(D1):D817-D824. Epub 2020 Oct 12. doi: 10.1093/nar/gkaa846

Canakoglu, Arif ; Pinoli, Pietro ; Bernasconi, Anna et al. / ViruSurf : an integrated database to investigate viral sequences. In: Nucleic Acids Research. 2021 ; Vol. 49, No. D1. pp. D817-D824.

@article{ca11031611fb4ded93e11fabc013df2f,

title = "ViruSurf: an integrated database to investigate viral sequences",

abstract = "ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.",

author = "Arif Canakoglu and Pietro Pinoli and Anna Bernasconi and Tommaso Alfonsi and Damianos P. Melidis and Stefano Ceri",

note = "Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].",

year = "2021",

month = jan,

day = "8",

doi = "10.1093/nar/gkaa846",

language = "English",

volume = "49",

pages = "D817--D824",

journal = "Nucleic Acids Research",

issn = "0301-5610",

publisher = "Oxford University Press",

number = "D1",

}

TY - JOUR

T1 - ViruSurf

T2 - an integrated database to investigate viral sequences

AU - Canakoglu, Arif

AU - Pinoli, Pietro

AU - Bernasconi, Anna

AU - Alfonsi, Tommaso

AU - Melidis, Damianos P.

AU - Ceri, Stefano

N1 - Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].

PY - 2021/1/8

Y1 - 2021/1/8

N2 - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

AB - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

UR - http://www.scopus.com/inward/record.url?scp=85097436367&partnerID=8YFLogxK

U2 - 10.1093/nar/gkaa846

DO - 10.1093/nar/gkaa846

M3 - Article

VL - 49

SP - D817

EP - D824

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0301-5610

IS - D1

ER -
