Details
| Original language | English |
| --- | --- |
| Pages (from-to) | 1807-1818 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 33 |
| Issue number | 5 |
| Early online date | 4 Nov 2019 |
| Publication status | Published - 1 May 2021 |
Abstract
There has been significant progress in unsupervised network representation learning (UNRL) approaches over graphs recently, with flexible random-walk approaches, new optimization objectives, and deep architectures. However, there is no common ground for systematically comparing embeddings to understand their behavior on different graphs and tasks. We argue that most UNRL approaches model and exploit either the neighborhood of a node or what we call its context information. These methods largely differ in how they define and exploit context. Consequently, we propose a framework that casts a variety of approaches (random walk based, matrix factorization based, and deep learning based) into a unified context-based optimization function. We systematically group the methods by their similarities and differences, which we later use to explain their performance differences on downstream tasks. We conduct a large-scale empirical study of nine popular and recent UNRL techniques on 11 real-world datasets with varying structural properties and two common tasks: node classification and link prediction. We find that for non-attributed graphs no single method is a clear winner; the choice of a suitable method is dictated by properties of the embedding method, the task, and the structural properties of the underlying graph. In addition, we report common pitfalls in the evaluation of UNRL methods and offer suggestions for experimental design and interpretation of results.
ASJC Scopus subject areas
- Computer Science (all)
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
Cite this
In: IEEE Transactions on Knowledge and Data Engineering, Vol. 33, No. 5, 01.05.2021, p. 1807-1818.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - A Comparative Study for Unsupervised Network Representation Learning
AU - Khosla, Megha
AU - Setty, Vinay
AU - Anand, Avishek
N1 - Funding Information: This work is partially funded by SoBigData (European Union's Horizon 2020 Research and Innovation Programme under Grant agreement No. 654024).
PY - 2021/5/1
Y1 - 2021/5/1
AB - There has been significant progress in unsupervised network representation learning (UNRL) approaches over graphs recently, with flexible random-walk approaches, new optimization objectives, and deep architectures. However, there is no common ground for systematically comparing embeddings to understand their behavior on different graphs and tasks. We argue that most UNRL approaches model and exploit either the neighborhood of a node or what we call its context information. These methods largely differ in how they define and exploit context. Consequently, we propose a framework that casts a variety of approaches (random walk based, matrix factorization based, and deep learning based) into a unified context-based optimization function. We systematically group the methods by their similarities and differences, which we later use to explain their performance differences on downstream tasks. We conduct a large-scale empirical study of nine popular and recent UNRL techniques on 11 real-world datasets with varying structural properties and two common tasks: node classification and link prediction. We find that for non-attributed graphs no single method is a clear winner; the choice of a suitable method is dictated by properties of the embedding method, the task, and the structural properties of the underlying graph. In addition, we report common pitfalls in the evaluation of UNRL methods and offer suggestions for experimental design and interpretation of results.
UR - http://www.scopus.com/inward/record.url?scp=85104043134&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2951398
DO - 10.1109/TKDE.2019.2951398
M3 - Article
AN - SCOPUS:85104043134
VL - 33
SP - 1807
EP - 1818
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 5
ER -