Details
Original language | English |
---|---|
Title of host publication | 2021 IEEE International Conference on Big Data (Big Data) |
Editors | Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 263-273 |
Number of pages | 11 |
ISBN (electronic) | 978-1-6654-3902-2 |
ISBN (print) | 978-1-6654-4599-3 |
Publication status | Published - 2021 |
Event | 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States. Duration: 15 Dec 2021 → 18 Dec 2021 |
Publication series
Name | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|---|
Abstract
Temporal web graphs have been attracting much attention recently due to their important applications in web search, data mining, and social network analysis. Accumulated over long periods, these graphs have grown gigantic in size and rich in temporal evolution, which poses tough challenges for data storage and management. Though a few temporal graph management systems have previously been proposed, none of them can simultaneously satisfy both essential requirements for retrieval on temporal web graphs: very large data scalability and very low query latency. In this work, we address this gap by developing a highly efficient temporal graph management system dedicated to web graphs. To this end, we greatly extend the most efficient framework for managing large static web graphs to handle temporal information using the property matrix, while preserving most of the outstanding features of the base framework. As a result, our proposed system can achieve a nearly instant response for vertex-centric temporal retrieval while remaining scalable to huge datasets. Experiments on a real-world dataset with more than 43B nodes and 317B links show that, using a small non-dedicated cluster, our system can reduce data storage space by up to 88% of the raw data size and reduce retrieval time by 20% compared to the baselines. We also demonstrate that our system yields a significant reduction in computational costs for many graph ranking algorithms.
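To make the idea of vertex-centric temporal retrieval more concrete, here is a minimal, hypothetical sketch. It is not the authors' implementation, and the abstract does not specify the layout. It assumes the temporal graph is held as one static adjacency structure in compressed sparse row (CSR) form over the union of all snapshots, with a single extra per-edge property (a bitmask of the snapshots in which the edge exists) standing in for the property matrix. The class and method names (`TemporalCSR`, `from_edges`, `neighbors_at`, `neighbors_during`) are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class TemporalCSR:
    """Hypothetical temporal adjacency store (illustration only).

    A static CSR adjacency over the union of all snapshots, plus one
    per-edge property: a bitmask whose bit t is set if the edge exists
    in snapshot t. The bitmask is an assumed stand-in for a property matrix.
    """
    offsets: List[int]     # edges of v occupy indices offsets[v]..offsets[v+1]-1
    targets: List[int]     # destination vertex of each edge
    alive_mask: List[int]  # bit t set => edge present in snapshot t

    @classmethod
    def from_edges(cls, num_vertices: int,
                   timed_edges: Iterable[Tuple[int, int, int]]) -> "TemporalCSR":
        """Build from (src, dst, snapshot) triples; ids must be < num_vertices."""
        # Merge repeated (src, dst) pairs into a single edge with a snapshot bitmask.
        masks: Dict[Tuple[int, int], int] = {}
        for src, dst, t in timed_edges:
            masks[(src, dst)] = masks.get((src, dst), 0) | (1 << t)
        edges = sorted(masks.items())  # sorted by (src, dst)
        # Count out-degrees, then prefix-sum them into CSR offsets.
        offsets = [0] * (num_vertices + 1)
        for (src, _dst), _mask in edges:
            offsets[src + 1] += 1
        for v in range(num_vertices):
            offsets[v + 1] += offsets[v]
        targets = [dst for (_src, dst), _mask in edges]
        alive_mask = [mask for _pair, mask in edges]
        return cls(offsets, targets, alive_mask)

    def neighbors_at(self, v: int, t: int) -> List[int]:
        """Vertex-centric point query: out-neighbors of v in snapshot t."""
        lo, hi = self.offsets[v], self.offsets[v + 1]
        return [self.targets[i] for i in range(lo, hi)
                if (self.alive_mask[i] >> t) & 1]

    def neighbors_during(self, v: int, t_start: int, t_end: int) -> List[int]:
        """Out-neighbors of v present in at least one snapshot of [t_start, t_end]."""
        window = ((1 << (t_end - t_start + 1)) - 1) << t_start
        lo, hi = self.offsets[v], self.offsets[v + 1]
        return [self.targets[i] for i in range(lo, hi)
                if self.alive_mask[i] & window]


if __name__ == "__main__":
    g = TemporalCSR.from_edges(4, [(0, 1, 0), (0, 2, 0), (0, 1, 1),
                                   (0, 3, 2), (1, 2, 1), (2, 0, 2)])
    print(g.neighbors_at(0, 0))         # [1, 2]
    print(g.neighbors_at(0, 2))         # [3]
    print(g.neighbors_during(0, 1, 2))  # [1, 3]
```

Keeping the static adjacency unchanged and attaching time as a separate per-edge property means a snapshot query only scans one vertex's edge range. The bitmask is compact when the number of snapshots is small. At the scale reported above, a real system would likely also need to compress these arrays, which this sketch does not attempt.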
Keywords
- archival search, compression, distributed system, graph index, temporal graph representation
ASJC Scopus subject areas
- Decision Sciences(all)
- Information Systems and Management
- Computer Science(all)
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Information Systems
Cite this
Vo, Khoi Duy; Zerr, Sergej; Zhu, Xiaofei; Nejdl, Wolfgang. / Efficient Scalable Temporal Web Graph Store. 2021 IEEE International Conference on Big Data (Big Data). ed. / Yixin Chen; Heiko Ludwig; Yicheng Tu; Usama Fayyad; Xingquan Zhu; Xiaohua Tony Hu; Suren Byna; Xiong Liu; Jianping Zhang; Shirui Pan; Vagelis Papalexakis; Jianwu Wang; Alfredo Cuzzocrea; Carlos Ordonez. Institute of Electrical and Electronics Engineers Inc., 2021. p. 263-273 (Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Efficient Scalable Temporal Web Graph Store
AU - Vo, Khoi Duy
AU - Zerr, Sergej
AU - Zhu, Xiaofei
AU - Nejdl, Wolfgang
PY - 2021
Y1 - 2021
AB - Temporal web graphs have been attracting much attention recently due to their important applications in web search, data mining, and social network analysis. Accumulated over long periods, these graphs have grown gigantic in size and rich in temporal evolution, which poses tough challenges for data storage and management. Though a few temporal graph management systems have previously been proposed, none of them can simultaneously satisfy both essential requirements for retrieval on temporal web graphs: very large data scalability and very low query latency. In this work, we address this gap by developing a highly efficient temporal graph management system dedicated to web graphs. To this end, we greatly extend the most efficient framework for managing large static web graphs to handle temporal information using the property matrix, while preserving most of the outstanding features of the base framework. As a result, our proposed system can achieve a nearly instant response for vertex-centric temporal retrieval while remaining scalable to huge datasets. Experiments on a real-world dataset with more than 43B nodes and 317B links show that, using a small non-dedicated cluster, our system can reduce data storage space by up to 88% of the raw data size and reduce retrieval time by 20% compared to the baselines. We also demonstrate that our system yields a significant reduction in computational costs for many graph ranking algorithms.
KW - archival search
KW - compression
KW - distributed system
KW - graph index
KW - temporal graph representation
UR - http://www.scopus.com/inward/record.url?scp=85125298267&partnerID=8YFLogxK
U2 - 10.1109/bigdata52589.2021.9671984
DO - 10.1109/bigdata52589.2021.9671984
M3 - Conference contribution
AN - SCOPUS:85125298267
SN - 978-1-6654-4599-3
T3 - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
SP - 263
EP - 273
BT - 2021 IEEE International Conference on Big Data (Big Data)
A2 - Chen, Yixin
A2 - Ludwig, Heiko
A2 - Tu, Yicheng
A2 - Fayyad, Usama
A2 - Zhu, Xingquan
A2 - Hu, Xiaohua Tony
A2 - Byna, Suren
A2 - Liu, Xiong
A2 - Zhang, Jianping
A2 - Pan, Shirui
A2 - Papalexakis, Vagelis
A2 - Wang, Jianwu
A2 - Cuzzocrea, Alfredo
A2 - Ordonez, Carlos
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Big Data, Big Data 2021
Y2 - 15 December 2021 through 18 December 2021
ER -