A holistic view on web archives

Publication: Contribution to book/report/anthology/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed

Authors

Helge Holzmann, Wolfgang Nejdl

Details

Original language: English
Title of host publication: The Past Web
Subtitle: Exploring Web Archives
Publisher: Springer International Publishing AG
Pages: 85-99
Number of pages: 15
ISBN (electronic): 9783030632915
ISBN (print): 9783030632908
Publication status: Published - 1 Aug 2021

Abstract

In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.

Cite

A holistic view on web archives. / Holzmann, Helge; Nejdl, Wolfgang.
The Past Web: Exploring Web Archives. Springer International Publishing AG, 2021. pp. 85-99.

Holzmann, H & Nejdl, W 2021, A holistic view on web archives. in The Past Web: Exploring Web Archives. Springer International Publishing AG, pp. 85-99. https://doi.org/10.1007/978-3-030-63291-5_8
Holzmann, H., & Nejdl, W. (2021). A holistic view on web archives. In The Past Web: Exploring Web Archives (pp. 85-99). Springer International Publishing AG. https://doi.org/10.1007/978-3-030-63291-5_8
Holzmann H, Nejdl W. A holistic view on web archives. In The Past Web: Exploring Web Archives. Springer International Publishing AG. 2021. pp. 85-99. Epub 2021 Jul 1. doi: 10.1007/978-3-030-63291-5_8
Holzmann, Helge ; Nejdl, Wolfgang. / A holistic view on web archives. The Past Web: Exploring Web Archives. Springer International Publishing AG, 2021. pp. 85-99.
@inbook{db64d4ce6adf4fdb896edb6ec6114a90,
title = "A holistic view on web archives",
abstract = "In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.",
author = "Helge Holzmann and Wolfgang Nejdl",
year = "2021",
month = aug,
day = "1",
doi = "10.1007/978-3-030-63291-5_8",
language = "English",
isbn = "9783030632908",
pages = "85--99",
booktitle = "The Past Web",
publisher = "Springer International Publishing AG",
address = "Switzerland",
}

TY - CHAP

T1 - A holistic view on web archives

AU - Holzmann, Helge

AU - Nejdl, Wolfgang

PY - 2021/8/1

Y1 - 2021/8/1

AB - In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.

UR - http://www.scopus.com/inward/record.url?scp=85150047765&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-63291-5_8

DO - 10.1007/978-3-030-63291-5_8

M3 - Contribution to book/anthology

AN - SCOPUS:85150047765

SN - 9783030632908

SP - 85

EP - 99

BT - The Past Web

PB - Springer International Publishing AG

ER -
