Package org.apache.spark.util.sketch
Class CountMinSketch
Object
org.apache.spark.util.sketch.CountMinSketch
A Count-min sketch is a probabilistic data structure used for cardinality estimation using
sub-linear space. Currently, supported data types include:
A
CountMinSketch is initialized with a random seed, and a pair of parameters:
- relative error (or
eps), and - confidence (or
delta)
x has appeared in a data
stream so far. With probability delta, the estimate of this frequency is within the
range true frequency <= estimate <= true frequency + eps * N, where N is the
total count of items have appeared the data stream so far.
Under the cover, a CountMinSketch is essentially a two-dimensional long array
with depth d and width w, where
d = ceil(2 / eps)w = ceil(-log(1 - confidence) / log(2))
CountMinSketch class from stream-lib.-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionabstract voidIncrementsitem's count by one.abstract voidIncrementsitem's count bycount.abstract voidaddBinary(byte[] item) Incrementsitem's count by one.abstract voidaddBinary(byte[] item, long count) Incrementsitem's count bycount.abstract voidaddLong(long item) Incrementsitem's count by one.abstract voidaddLong(long item, long count) Incrementsitem's count bycount.abstract voidIncrementsitem's count by one.abstract voidIncrementsitem's count bycount.abstract doubleReturns the confidence (ordelta) of thisCountMinSketch.static CountMinSketchcreate(double eps, double confidence, int seed) static CountMinSketchcreate(int depth, int width, int seed) abstract intdepth()Depth of thisCountMinSketch.abstract longestimateCount(Object item) Returns the estimated frequency ofitem.abstract CountMinSketchmergeInPlace(CountMinSketch other) Merges anotherCountMinSketchwith this one in place.static CountMinSketchreadFrom(byte[] bytes) Reads in aCountMinSketchfrom a byte array.static CountMinSketchreadFrom(InputStream in) Reads in aCountMinSketchfrom an input stream.abstract doubleReturns the relative error (oreps) of thisCountMinSketch.abstract byte[]Serializes thisCountMinSketchand returns the serialized form.abstract longTotal count of items added to thisCountMinSketchso far.abstract intwidth()Width of thisCountMinSketch.abstract voidwriteTo(OutputStream out) Writes out thisCountMinSketchto an output stream in binary format.
-
Constructor Details
-
CountMinSketch
public CountMinSketch()
-
-
Method Details
-
relativeError
public abstract double relativeError()Returns the relative error (oreps) of thisCountMinSketch. -
confidence
public abstract double confidence()Returns the confidence (ordelta) of thisCountMinSketch. -
depth
public abstract int depth()Depth of thisCountMinSketch. -
width
public abstract int width()Width of thisCountMinSketch. -
totalCount
public abstract long totalCount()Total count of items added to thisCountMinSketchso far. -
add
Incrementsitem's count by one. -
add
Incrementsitem's count bycount. -
addLong
public abstract void addLong(long item) Incrementsitem's count by one. -
addLong
public abstract void addLong(long item, long count) Incrementsitem's count bycount. -
addString
Incrementsitem's count by one. -
addString
Incrementsitem's count bycount. -
addBinary
public abstract void addBinary(byte[] item) Incrementsitem's count by one. -
addBinary
public abstract void addBinary(byte[] item, long count) Incrementsitem's count bycount. -
estimateCount
Returns the estimated frequency ofitem. -
mergeInPlace
Merges anotherCountMinSketchwith this one in place. Note that only Count-Min sketches with the samedepth,width, and random seed can be merged.- Throws:
IncompatibleMergeException- if theotherCountMinSketchhas incompatible depth, width, relative-error, confidence, or random seed.
-
writeTo
Writes out thisCountMinSketchto an output stream in binary format. It is the caller's responsibility to close the stream.- Throws:
IOException
-
toByteArray
Serializes thisCountMinSketchand returns the serialized form.- Throws:
IOException
-
readFrom
Reads in aCountMinSketchfrom an input stream. It is the caller's responsibility to close the stream.- Throws:
IOException
-
readFrom
Reads in aCountMinSketchfrom a byte array.- Throws:
IOException
-
create
- Parameters:
depth- depth of the Count-min Sketch, must be positivewidth- width of the Count-min Sketch, must be positiveseed- random seed
-
create
- Parameters:
eps- relative error, must be positiveconfidence- confidence, must be positive and less than 1.0seed- random seed
-