public abstract class BloomFilter
extends Object
Byte
Short
Integer
Long
String
FPP
) of a Bloom filter is defined as the probability that
mightContain(Object) will erroneously return true
for an object that has
not actually been put in the BloomFilter
.
The implementation is largely based on the BloomFilter
class from Guava.Modifier and Type | Class and Description |
---|---|
static class |
BloomFilter.Version |
Constructor and Description |
---|
BloomFilter() |
Modifier and Type | Method and Description |
---|---|
abstract long |
bitSize()
Returns the number of bits in the underlying bit array.
|
long |
cardinality() |
static BloomFilter |
create(long expectedNumItems)
Creates a
BloomFilter with the expected number of insertions and a default expected
false positive probability of 3%. |
static BloomFilter |
create(long expectedNumItems,
double fpp)
Creates a
BloomFilter with the expected number of insertions and expected false
positive probability. |
static BloomFilter |
create(long expectedNumItems,
long numBits)
Creates a
BloomFilter with given expectedNumItems and numBits , it will
pick an optimal numHashFunctions which can minimize fpp for the bloom filter. |
abstract double |
expectedFpp()
Returns the probability that mightContain(Object) erroneously return
true
for an object that has not actually been put in the BloomFilter . |
abstract BloomFilter |
intersectInPlace(BloomFilter other)
Combines this bloom filter with another bloom filter by performing a bitwise AND of the
underlying data.
|
abstract boolean |
isCompatible(BloomFilter other)
Determines whether a given bloom filter is compatible with this bloom filter.
|
abstract BloomFilter |
mergeInPlace(BloomFilter other)
Combines this bloom filter with another bloom filter by performing a bitwise OR of the
underlying data.
|
abstract boolean |
mightContain(Object item)
Returns
true if the element might have been put in this Bloom filter,
false if this is definitely not the case. |
abstract boolean |
mightContainBinary(byte[] item)
A specialized variant of
mightContain(Object) that only tests byte array items. |
abstract boolean |
mightContainLong(long item)
A specialized variant of
mightContain(Object) that only tests long items. |
abstract boolean |
mightContainString(String item)
A specialized variant of
mightContain(Object) that only tests String items. |
static long |
optimalNumOfBits(long n,
double p)
Computes m (total bits of Bloom filter) which is expected to achieve, for the specified
expected insertions, the required false positive probability.
|
static long |
optimalNumOfBits(long expectedNumItems,
long maxNumItems,
long maxNumOfBits)
Computes m (total bits of Bloom filter) which is expected to achieve.
|
abstract boolean |
put(Object item)
Puts an item into this
BloomFilter . |
abstract boolean |
putBinary(byte[] item)
A specialized variant of
put(Object) that only supports byte array items. |
abstract boolean |
putLong(long item)
A specialized variant of
put(Object) that only supports long items. |
abstract boolean |
putString(String item)
A specialized variant of
put(Object) that only supports String items. |
static BloomFilter |
readFrom(java.io.InputStream in)
Reads in a
BloomFilter from an input stream. |
abstract void |
writeTo(java.io.OutputStream out)
Writes out this
BloomFilter to an output stream in binary format. |
public abstract double expectedFpp()
true
for an object that has not actually been put in the BloomFilter
.
Ideally, this number should be close to the fpp
parameter passed in
create(long, double), or smaller. If it is significantly higher, it is usually
the case that too many items (more than expected) have been put in the BloomFilter
,
degenerating it.public abstract long bitSize()
public abstract boolean put(Object item)
BloomFilter
. Ensures that subsequent invocations of
mightContain(Object) with the same item will always return true
.object
has been added to the
filter. If the bits haven't changed, this might be the first time object
has been added to the filter. Note that put(t)
always returns the
opposite result to what mightContain(t)
would have returned at the time
it is called.public abstract boolean putString(String item)
put(Object)
that only supports String
items.public abstract boolean putLong(long item)
put(Object)
that only supports long
items.public abstract boolean putBinary(byte[] item)
put(Object)
that only supports byte array items.public abstract boolean isCompatible(BloomFilter other)
other
- The bloom filter to check for compatibility.public abstract BloomFilter mergeInPlace(BloomFilter other) throws IncompatibleMergeException
other
- The bloom filter to combine this bloom filter with. It is not mutated.IncompatibleMergeException
- if isCompatible(other) == false
public abstract BloomFilter intersectInPlace(BloomFilter other) throws IncompatibleMergeException
other
- The bloom filter to combine this bloom filter with. It is not mutated.IncompatibleMergeException
- if isCompatible(other) == false
public abstract boolean mightContain(Object item)
true
if the element might have been put in this Bloom filter,
false
if this is definitely not the case.public abstract boolean mightContainString(String item)
mightContain(Object)
that only tests String
items.public abstract boolean mightContainLong(long item)
mightContain(Object)
that only tests long
items.public abstract boolean mightContainBinary(byte[] item)
mightContain(Object)
that only tests byte array items.public abstract void writeTo(java.io.OutputStream out) throws java.io.IOException
BloomFilter
to an output stream in binary format. It is the caller's
responsibility to close the stream.java.io.IOException
public long cardinality()
BloomFilter
.public static BloomFilter readFrom(java.io.InputStream in) throws java.io.IOException
BloomFilter
from an input stream. It is the caller's responsibility to close
the stream.java.io.IOException
public static long optimalNumOfBits(long n, double p)
n
- expected insertions (must be positive)p
- false positive rate (must be 0 < p < 1)public static long optimalNumOfBits(long expectedNumItems, long maxNumItems, long maxNumOfBits)
public static BloomFilter create(long expectedNumItems)
BloomFilter
with the expected number of insertions and a default expected
false positive probability of 3%.
Note that overflowing a BloomFilter
with significantly more elements than specified,
will result in its saturation, and a sharp deterioration of its false positive probability.public static BloomFilter create(long expectedNumItems, double fpp)
BloomFilter
with the expected number of insertions and expected false
positive probability.
Note that overflowing a BloomFilter
with significantly more elements than specified,
will result in its saturation, and a sharp deterioration of its false positive probability.public static BloomFilter create(long expectedNumItems, long numBits)
BloomFilter
with given expectedNumItems
and numBits
, it will
pick an optimal numHashFunctions
which can minimize fpp
for the bloom filter.