Cardinality estimation and dynamic length adaptation for Bloom filters

Odysseas Papapetrou; Wolf Siberski; Wolfgang Nejdl

doi:10.1007/s10619-010-7067-2

Details

Original language	English
Pages (from-to)	119-156
Number of pages	38
Journal	Distributed and parallel databases
Volume	28
Issue number	2-3
Publication status	Published - 2 Sept 2010

Abstract

Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.

Keywords

Bloom filters, Distributed databases, Distributed information systems

ASJC Scopus subject areas

Computer Science(all)
Software
Computer Science(all)
Information Systems
Computer Science(all)
Hardware and Architecture
Decision Sciences(all)
Information Systems and Management

Cite this

Cardinality estimation and dynamic length adaptation for Bloom filters. / Papapetrou, Odysseas; Siberski, Wolf; Nejdl, Wolfgang.
In: Distributed and parallel databases, Vol. 28, No. 2-3, 02.09.2010, p. 119-156.

Research output: Contribution to journal › Article › Research › peer review

Papapetrou, O, Siberski, W & Nejdl, W 2010, 'Cardinality estimation and dynamic length adaptation for Bloom filters', Distributed and parallel databases, vol. 28, no. 2-3, pp. 119-156. https://doi.org/10.1007/s10619-010-7067-2

Papapetrou, O., Siberski, W., & Nejdl, W. (2010). Cardinality estimation and dynamic length adaptation for Bloom filters. Distributed and parallel databases, 28(2-3), 119-156. https://doi.org/10.1007/s10619-010-7067-2

Papapetrou O, Siberski W, Nejdl W. Cardinality estimation and dynamic length adaptation for Bloom filters. Distributed and parallel databases. 2010 Sept 2;28(2-3):119-156. doi: 10.1007/s10619-010-7067-2

Papapetrou, Odysseas ; Siberski, Wolf ; Nejdl, Wolfgang. / Cardinality estimation and dynamic length adaptation for Bloom filters. In: Distributed and parallel databases. 2010 ; Vol. 28, No. 2-3. pp. 119-156.

Download

@article{4e0d4e29b4aa4800b0f08ab45bd349b5,

title = "Cardinality estimation and dynamic length adaptation for Bloom filters",

abstract = "Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.",

keywords = "Bloom filters, Distributed databases, Distributed information systems",

author = "Odysseas Papapetrou and Wolf Siberski and Wolfgang Nejdl",

year = "2010",

month = sep,

day = "2",

doi = "10.1007/s10619-010-7067-2",

language = "English",

volume = "28",

pages = "119--156",

journal = "Distributed and parallel databases",

issn = "0926-8782",

publisher = "Springer Netherlands",

number = "2-3",

}

Download

TY - JOUR

T1 - Cardinality estimation and dynamic length adaptation for Bloom filters

AU - Papapetrou, Odysseas

AU - Siberski, Wolf

AU - Nejdl, Wolfgang

PY - 2010/9/2

Y1 - 2010/9/2

N2 - Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.

AB - Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.

KW - Bloom filters

KW - Distributed databases

KW - Distributed information systems

UR - http://www.scopus.com/inward/record.url?scp=77957859235&partnerID=8YFLogxK

U2 - 10.1007/s10619-010-7067-2

DO - 10.1007/s10619-010-7067-2

M3 - Article

AN - SCOPUS:77957859235

VL - 28

SP - 119

EP - 156

JO - Distributed and parallel databases

JF - Distributed and parallel databases

SN - 0926-8782

IS - 2-3

ER -

Research@Leibniz University

Cardinality estimation and dynamic length adaptation for Bloom filters

Authors

Research Organisations

Details

Abstract

Keywords

ASJC Scopus subject areas

Cite this

By the same author(s)

Adaptive Dispatching of Mobile Charging Stations using Multi-Agent Graph Convolutional Cooperative-Competitive Reinforcement Learning

Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

Open benchmark for filtering techniques in entity resolution

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions