Details
Original language | English |
---|---|
Title of host publication | JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 155-164 |
Number of pages | 10 |
ISBN (electronic) | 9781450351782 |
Publication status | Published - 23 May 2018 |
Event | 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 - Fort Worth, United States. Duration: 3 Jun 2018 → 7 Jun 2018 |
Publication series
Name | Proceedings of the ACM/IEEE Joint Conference on Digital Libraries |
---|---|
ISSN (Print) | 1552-5996 |
Abstract
Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, and it can in turn be queried through a semantic query language (SPARQL). This allows running advanced queries that combine metadata of the documents (like publication date) with content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous, and they all match the query equally well. In this paper, we address this problem and formalize the task of ranking archived documents for structured queries on semantic layers. We then propose two ranking models for this task, which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.
Keywords
- archived documents, probabilistic modeling, ranking, semantic layers, stochastic modeling
ASJC Scopus subject areas
- Engineering(all)
- General Engineering
Cite this
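Ranking Archived Documents for Structured Queries on Semantic Layers. / Fafalios, Pavlos; Kasturia, Vaibhav; Nejdl, Wolfgang. JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. p. 155-164 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).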
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Ranking Archived Documents for Structured Queries on Semantic Layers
AU - Fafalios, Pavlos
AU - Kasturia, Vaibhav
AU - Nejdl, Wolfgang
N1 - Funding information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233).
PY - 2018/5/23
Y1 - 2018/5/23
KW - archived documents
KW - probabilistic modeling
KW - ranking
KW - semantic layers
KW - stochastic modeling
UR - http://www.scopus.com/inward/record.url?scp=85048856169&partnerID=8YFLogxK
U2 - 10.1145/3197026.3197049
DO - 10.1145/3197026.3197049
M3 - Conference contribution
AN - SCOPUS:85048856169
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 155
EP - 164
BT - JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018
Y2 - 3 June 2018 through 7 June 2018
ER -