Details
Original language | English |
---|---|
Pages (from - to) | D817-D824 |
Journal | Nucleic Acids Research |
Volume | 49 |
Issue number | D1 |
Early online date | 12 Oct. 2020 |
Publication status | Published - 8 Jan. 2021 |
Abstract
ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.
ASJC Scopus subject areas
- General Biochemistry, Genetics and Molecular Biology
- Genetics
Cite this
In: Nucleic Acids Research, Vol. 49, No. D1, 8 Jan. 2021, pp. D817-D824.
Research output: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - ViruSurf
T2 - an integrated database to investigate viral sequences
AU - Canakoglu, Arif
AU - Pinoli, Pietro
AU - Bernasconi, Anna
AU - Alfonsi, Tommaso
AU - Melidis, Damianos P.
AU - Ceri, Stefano
N1 - Funding Information: The authors would like to thank Ilaria Capua, Matteo Chiara, Ana Conesa, Luca Ferretti, Alice Fusaro, Ruba Khalaf, Susanna Lamers, Stefania Leopardi, Francesca Mari, Carla Mavian, Graziano Pesole, Alessandra Renieri, Anna Sandionigi, Stephen Tsui, Limsoon Wong and Federico Zambelli for their contribution to requirements elicitation and for inspiring future developments of this research. The authors are grateful to the GISAID organization for the data sharing agreement that allowed the development of the GISAID-specific version of ViruSurf. The authors also acknowledge the depositions of worldwide laboratories to GenBank, COG-UK and NMDC. Finally, we acknowledge the support from Amazon Machine Learning Research Award 'Data-driven Machine and Deep Learning for Genomics'. H2020 European Research Council [693174]; H2020 European Institute of Innovation and Technology [20663]. Funding for open access charge: H2020 European Research Council [693174].
PY - 2021/1/8
Y1 - 2021/1/8
N2 - ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.
UR - http://www.scopus.com/inward/record.url?scp=85097436367&partnerID=8YFLogxK
U2 - 10.1093/nar/gkaa846
DO - 10.1093/nar/gkaa846
M3 - Article
VL - 49
SP - D817-D824
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0301-5610
IS - D1
ER -