Details
Original language | English |
---|---|
Title of host publication | The Past Web |
Subtitle | Exploring Web Archives |
Publisher | Springer International Publishing AG |
Pages | 85-99 |
Number of pages | 15 |
ISBN (electronic) | 9783030632915 |
ISBN (print) | 9783030632908 |
Publication status | Published - 1 Aug 2021 |
Abstract
In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.
ASJC Scopus subject areas
- Computer Science (all)
- General Computer Science
- Arts and Humanities (all)
- General Arts and Humanities
- Social Sciences (all)
- General Social Sciences
Cite this
Holzmann, Helge; Nejdl, Wolfgang. A holistic view on web archives. In: The Past Web: Exploring Web Archives. Springer International Publishing AG, 2021. pp. 85-99.
Publication: Contribution to book/report/anthology/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed
TY - CHAP
T1 - A holistic view on web archives
AU - Holzmann, Helge
AU - Nejdl, Wolfgang
PY - 2021/8/1
Y1 - 2021/8/1
N2 - In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.
UR - http://www.scopus.com/inward/record.url?scp=85150047765&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63291-5_8
DO - 10.1007/978-3-030-63291-5_8
M3 - Contribution to book/anthology
AN - SCOPUS:85150047765
SN - 9783030632908
SP - 85
EP - 99
BT - The Past Web
PB - Springer International Publishing AG
ER -