Ranking Archived Documents for Structured Queries on Semantic Layers

Research output: Chapter in book/report/conference proceeding > Conference contribution > Research > Peer-reviewed

Authors

    Pavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl

Details

Original language: English
Title of host publication: JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 155-164
Number of pages: 10
ISBN (electronic): 9781450351782
Publication status: Published - 23 May 2018
Event: 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 - Fort Worth, United States
Duration: 3 Jun 2018 to 7 Jun 2018

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Abstract

Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of ranking archived documents for structured queries on semantic layers. Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.
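
The ranking problem described in the abstract can be illustrated with a small sketch. This is not the paper's model: the abstract does not give the formulas, so the two scoring functions below are assumptions chosen only to show how a set of documents that all match a structured query might be ordered. Here, relativeness is assumed to be the share of a document's entity mentions that refer to the query entities, and timeliness is assumed to be the share of matching documents published in the same month. The third factor, temporal relations among entities, is left out. The `Doc` class, the `ex:` entity identifiers, and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    published: date
    entity_counts: dict  # entity identifier -> number of mentions in the document


def relativeness(doc, query_entities):
    """Assumed measure: share of the document's entity mentions that hit query entities."""
    total = sum(doc.entity_counts.values())
    if total == 0:
        return 0.0
    hits = sum(doc.entity_counts.get(e, 0) for e in query_entities)
    return hits / total


def timeliness(doc, month_histogram, n_docs):
    """Assumed measure: share of matching documents published in the same month."""
    return month_histogram[(doc.published.year, doc.published.month)] / n_docs


def rank(docs, query_entities):
    """Order documents that all match the query by the product of the two toy scores."""
    histogram = Counter((d.published.year, d.published.month) for d in docs)
    scored = [
        (relativeness(d, query_entities) * timeliness(d, histogram, len(docs)), d.doc_id)
        for d in docs
    ]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical result set: every document mentions the query entity ex:E1.
docs = [
    Doc("a", date(1990, 3, 2), {"ex:E1": 4, "ex:E2": 1}),
    Doc("b", date(1990, 3, 9), {"ex:E1": 1, "ex:E3": 3}),
    Doc("c", date(1994, 7, 1), {"ex:E1": 2}),
]
ranking = rank(docs, {"ex:E1"})
for score, doc_id in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Multiplying the two scores means a document must do reasonably well on both to rank high. In the sample data, document "c" is entirely about the query entity but was published in a quiet month, so it falls below document "a".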

Keywords

    archived documents, probabilistic modeling, ranking, semantic layers, stochastic modeling

Cite this

Ranking Archived Documents for Structured Queries on Semantic Layers. / Fafalios, Pavlos; Kasturia, Vaibhav; Nejdl, Wolfgang.
JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. p. 155-164 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).

Fafalios, P, Kasturia, V & Nejdl, W 2018, Ranking Archived Documents for Structured Queries on Semantic Layers. in JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Institute of Electrical and Electronics Engineers Inc., pp. 155-164, 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018, Fort Worth, United States, 3 Jun 2018. https://doi.org/10.1145/3197026.3197049
Fafalios, P., Kasturia, V., & Nejdl, W. (2018). Ranking Archived Documents for Structured Queries on Semantic Layers. In JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (pp. 155-164). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3197026.3197049
Fafalios P, Kasturia V, Nejdl W. Ranking Archived Documents for Structured Queries on Semantic Layers. In JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. 2018. p. 155-164. (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). doi: 10.1145/3197026.3197049
Fafalios, Pavlos ; Kasturia, Vaibhav ; Nejdl, Wolfgang. / Ranking Archived Documents for Structured Queries on Semantic Layers. JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 155-164 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
@inproceedings{81c661779a3346d89f30d8d7f4077e4b,
title = "Ranking Archived Documents for Structured Queries on Semantic Layers",
abstract = "Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of ranking archived documents for structured queries on semantic layers. Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.",
keywords = "archived documents, probabilistic modeling, ranking, semantic layers, stochastic modeling",
author = "Pavlos Fafalios and Vaibhav Kasturia and Wolfgang Nejdl",
note = "Funding information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233).; 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 ; Conference date: 03-06-2018 Through 07-06-2018",
year = "2018",
month = may,
day = "23",
doi = "10.1145/3197026.3197049",
language = "English",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "155--164",
booktitle = "JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries",
address = "United States",
}

TY - GEN

T1 - Ranking Archived Documents for Structured Queries on Semantic Layers

AU - Fafalios, Pavlos

AU - Kasturia, Vaibhav

AU - Nejdl, Wolfgang

N1 - Funding information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233).

PY - 2018/5/23

Y1 - 2018/5/23

N2 - Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of ranking archived documents for structured queries on semantic layers. Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.

AB - Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of ranking archived documents for structured queries on semantic layers. Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.

KW - archived documents

KW - probabilistic modeling

KW - ranking

KW - semantic layers

KW - stochastic modeling

UR - http://www.scopus.com/inward/record.url?scp=85048856169&partnerID=8YFLogxK

U2 - 10.1145/3197026.3197049

DO - 10.1145/3197026.3197049

M3 - Conference contribution

AN - SCOPUS:85048856169

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 155

EP - 164

BT - JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018

Y2 - 3 June 2018 through 7 June 2018

ER -
