Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach

Enrico Minack; Wolf Siberski; Wolfgang Nejdl

doi:10.1145/2009916.2009996

Details

Originalsprache	Englisch
Titel des Sammelwerks	SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Herausgeber (Verlag)	Association for Computing Machinery (ACM)
Seiten	585-594
Seitenumfang	10
ISBN (Print)	9781450309349
Publikationsstatus	Veröffentlicht - 24 Juli 2011
Veranstaltung	34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China Dauer: 24 Juli 2011 → 28 Juli 2011

Publikationsreihe

Name	SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

ASJC Scopus Sachgebiete

Informatik (insg.)
Information systems

Zitieren

Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. / Minack, Enrico; Siberski, Wolf; Nejdl, Wolfgang.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. S. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Publikation: Beitrag in Buch/Bericht/Sammelwerk/Konferenzband › Aufsatz in Konferenzband › Forschung › Peer-Review

Minack, E, Siberski, W & Nejdl, W 2011, Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. in SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery (ACM), S. 585-594, 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, 24 Juli 2011. https://doi.org/10.1145/2009916.2009996

Minack, E., Siberski, W., & Nejdl, W. (2011). Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (S. 585-594). (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery (ACM). https://doi.org/10.1145/2009916.2009996

Minack E, Siberski W, Nejdl W. Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach. in SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM). 2011. S. 585-594. (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). doi: 10.1145/2009916.2009996

Minack, Enrico ; Siberski, Wolf ; Nejdl, Wolfgang. / Incremental Diversiﬁcation for Very Large Sets : A Streaming-based Approach. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2011. S. 585-594 (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval).

Download

@inproceedings{028a68cf60a94c9ba57f4773fa52e8bb,

title = "Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach",

abstract = "Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.",

keywords = "Approximation, Diversification, Large sets, Streams",

author = "Enrico Minack and Wolf Siberski and Wolfgang Nejdl",

year = "2011",

month = jul,

day = "24",

doi = "10.1145/2009916.2009996",

language = "English",

isbn = "9781450309349",

series = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",

publisher = "Association for Computing Machinery (ACM)",

pages = "585--594",

booktitle = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",

address = "United States",

note = "34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 ; Conference date: 24-07-2011 Through 28-07-2011",

}

Download

TY - GEN

T1 - Incremental Diversiﬁcation for Very Large Sets

T2 - 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011

AU - Minack, Enrico

AU - Siberski, Wolf

AU - Nejdl, Wolfgang

PY - 2011/7/24

Y1 - 2011/7/24

N2 - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

AB - Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.

KW - Approximation

KW - Diversification

KW - Large sets

KW - Streams

UR - http://www.scopus.com/inward/record.url?scp=80052136714&partnerID=8YFLogxK

U2 - 10.1145/2009916.2009996

DO - 10.1145/2009916.2009996

M3 - Conference contribution

AN - SCOPUS:80052136714

SN - 9781450309349

T3 - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

SP - 585

EP - 594

BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery (ACM)

Y2 - 24 July 2011 through 28 July 2011

ER -

Research@Leibniz University

Incremental Diversiﬁcation for Very Large Sets: A Streaming-based Approach

Autorschaft

Organisationseinheiten

Details

Publikationsreihe

Abstract

ASJC Scopus Sachgebiete

Zitieren

Von denselben Autoren

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

Open benchmark for filtering techniques in entity resolution

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions

Adaptive Dispatching of Mobile Charging Stations using Multi-Agent Graph Convolutional Cooperative-Competitive Reinforcement Learning

Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

Open benchmark for filtering techniques in entity resolution

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions

Adaptive Dispatching of Mobile Charging Stations using Multi-Agent Graph Convolutional Cooperative-Competitive Reinforcement Learning

Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets