Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach

Enrico Minack; Wolf Siberski; Wolfgang Nejdl

doi:10.1145/2009916.2009996

Details

Original language	English
Title of host publication	SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher	Association for Computing Machinery (ACM)
Pages	585-594
Number of pages	10
ISBN (print)	9781450309349
Publication status	Published - 24 Jul 2011
Event	34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China Duration: 24 Jul 2011 → 28 Jul 2011

Publication series

Name	SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

Keywords

Approximation, Diversification, Large sets, Streams

ASJC Scopus subject areas

Computer Science(all)
Information Systems

Cite this

Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. / Minack, Enrico; Siberski, Wolf; Nejdl, Wolfgang.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. p. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Minack, E, Siberski, W & Nejdl, W 2011, Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. in SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery (ACM), pp. 585-594, 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, 24 Jul 2011. https://doi.org/10.1145/2009916.2009996

Minack, E., Siberski, W., & Nejdl, W. (2011). Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 585-594). (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery (ACM). https://doi.org/10.1145/2009916.2009996

Minack E, Siberski W, Nejdl W. Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM). 2011. p. 585-594. (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2009916.2009996

Minack, Enrico ; Siberski, Wolf ; Nejdl, Wolfgang. / Incremental Diversiﬁcation for Very Large Sets : A Streaming-based Approach. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. pp. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Download

@inproceedings{028a68cf60a94c9ba57f4773fa52e8bb,

title = "Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach",

abstract = "Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.",

keywords = "Approximation, Diversification, Large sets, Streams",

author = "Enrico Minack and Wolf Siberski and Wolfgang Nejdl",

year = "2011",

month = jul,

day = "24",

doi = "10.1145/2009916.2009996",

language = "English",

isbn = "9781450309349",

series = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",

publisher = "Association for Computing Machinery (ACM)",

pages = "585--594",

booktitle = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",

address = "United States",

note = "34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 ; Conference date: 24-07-2011 Through 28-07-2011",

}

Download

TY - GEN

T1 - Incremental Diversiﬁcation for Very Large Sets

T2 - 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011

AU - Minack, Enrico

AU - Siberski, Wolf

AU - Nejdl, Wolfgang

PY - 2011/7/24

Y1 - 2011/7/24

N2 - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

AB - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

KW - Approximation

KW - Diversification

KW - Large sets

KW - Streams

UR - http://www.scopus.com/inward/record.url?scp=80052136714&partnerID=8YFLogxK

U2 - 10.1145/2009916.2009996

DO - 10.1145/2009916.2009996

M3 - Conference contribution

AN - SCOPUS:80052136714

SN - 9781450309349

T3 - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 585

EP - 594

BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery (ACM)

Y2 - 24 July 2011 through 28 July 2011

ER -

Research@Leibniz University

Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach

Authors

Research Organisations

Details

Publication series

Abstract

Keywords

ASJC Scopus subject areas

Cite this

By the same author(s)

Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

Open benchmark for filtering techniques in entity resolution

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions

An artificial intelligence-assisted clinical framework to facilitate diagnostics and translational discovery in hematologic neoplasia