Details
Original language | English |
---|---|
Title of host publication | The Past Web |
Subtitle of host publication | Exploring Web Archives |
Publisher | Springer International Publishing AG |
Pages | 85-99 |
Number of pages | 15 |
ISBN (electronic) | 9783030632915 |
ISBN (print) | 9783030632908 |
Publication status | Published - 1 Aug 2021 |
Abstract
In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.
ASJC Scopus subject areas
- Computer Science(all)
- General Computer Science
- Arts and Humanities(all)
- General Arts and Humanities
- Social Sciences(all)
- General Social Sciences
Cite this
A holistic view on web archives. / Holzmann, Helge; Nejdl, Wolfgang. The Past Web: Exploring Web Archives. Springer International Publishing AG, 2021. p. 85-99.
Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review
TY - CHAP
T1 - A holistic view on web archives
AU - Holzmann, Helge
AU - Nejdl, Wolfgang
PY - 2021/8/1
Y1 - 2021/8/1
UR - http://www.scopus.com/inward/record.url?scp=85150047765&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63291-5_8
DO - 10.1007/978-3-030-63291-5_8
M3 - Contribution to book/anthology
AN - SCOPUS:85150047765
SN - 9783030632908
SP - 85
EP - 99
BT - The Past Web
PB - Springer International Publishing AG
ER -