Incremental Diversification for Very Large Sets: A Streaming-based Approach

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Authors

Research Organisations

View graph of relations

Details

Original languageEnglish
Title of host publicationSIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery (ACM)
Pages585-594
Number of pages10
ISBN (print)9781450309349
Publication statusPublished - 24 Jul 2011
Event34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China
Duration: 24 Jul 201128 Jul 2011

Publication series

NameSIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

Keywords

    Approximation, Diversification, Large sets, Streams

ASJC Scopus subject areas

Cite this

Incremental Diversification for Very Large Sets: A Streaming-based Approach. / Minack, Enrico; Siberski, Wolf; Nejdl, Wolfgang.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. p. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Minack, E, Siberski, W & Nejdl, W 2011, Incremental Diversification for Very Large Sets: A Streaming-based Approach. in SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery (ACM), pp. 585-594, 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, 24 Jul 2011. https://doi.org/10.1145/2009916.2009996
Minack, E., Siberski, W., & Nejdl, W. (2011). Incremental Diversification for Very Large Sets: A Streaming-based Approach. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 585-594). (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery (ACM). https://doi.org/10.1145/2009916.2009996
Minack E, Siberski W, Nejdl W. Incremental Diversification for Very Large Sets: A Streaming-based Approach. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM). 2011. p. 585-594. (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2009916.2009996
Minack, Enrico ; Siberski, Wolf ; Nejdl, Wolfgang. / Incremental Diversification for Very Large Sets : A Streaming-based Approach. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. pp. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).
Download
@inproceedings{028a68cf60a94c9ba57f4773fa52e8bb,
title = "Incremental Diversification for Very Large Sets: A Streaming-based Approach",
abstract = "Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.",
keywords = "Approximation, Diversification, Large sets, Streams",
author = "Enrico Minack and Wolf Siberski and Wolfgang Nejdl",
year = "2011",
month = jul,
day = "24",
doi = "10.1145/2009916.2009996",
language = "English",
isbn = "9781450309349",
series = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",
publisher = "Association for Computing Machinery (ACM)",
pages = "585--594",
booktitle = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",
address = "United States",
note = "34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 ; Conference date: 24-07-2011 Through 28-07-2011",

}

Download

TY - GEN

T1 - Incremental Diversification for Very Large Sets

T2 - 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011

AU - Minack, Enrico

AU - Siberski, Wolf

AU - Nejdl, Wolfgang

PY - 2011/7/24

Y1 - 2011/7/24

N2 - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

AB - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

KW - Approximation

KW - Diversification

KW - Large sets

KW - Streams

UR - http://www.scopus.com/inward/record.url?scp=80052136714&partnerID=8YFLogxK

U2 - 10.1145/2009916.2009996

DO - 10.1145/2009916.2009996

M3 - Conference contribution

AN - SCOPUS:80052136714

SN - 9781450309349

T3 - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 585

EP - 594

BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery (ACM)

Y2 - 24 July 2011 through 28 July 2011

ER -

By the same author(s)