Package org.apache.spark.util.sketch
Class CountMinSketch
Object
org.apache.spark.util.sketch.CountMinSketch
A Count-min sketch is a probabilistic data structure used for cardinality estimation using
 sub-linear space.  Currently, supported data types include:
 
 A 
CountMinSketch is initialized with a random seed, and a pair of parameters:
 - relative error (or eps), and
- confidence (or delta)
x has appeared in a data
 stream so far.  With probability delta, the estimate of this frequency is within the
 range true frequency <= estimate <= true frequency + eps * N, where N is the
 total count of items have appeared the data stream so far.
 Under the cover, a CountMinSketch is essentially a two-dimensional long array
 with depth d and width w, where
 - d = ceil(2 / eps)
- w = ceil(-log(1 - confidence) / log(2))
CountMinSketch class from stream-lib.- 
Nested Class SummaryNested Classes
- 
Constructor SummaryConstructors
- 
Method SummaryModifier and TypeMethodDescriptionabstract voidIncrementsitem's count by one.abstract voidIncrementsitem's count bycount.abstract voidaddBinary(byte[] item) Incrementsitem's count by one.abstract voidaddBinary(byte[] item, long count) Incrementsitem's count bycount.abstract voidaddLong(long item) Incrementsitem's count by one.abstract voidaddLong(long item, long count) Incrementsitem's count bycount.abstract voidIncrementsitem's count by one.abstract voidIncrementsitem's count bycount.abstract doubleReturns the confidence (ordelta) of thisCountMinSketch.static CountMinSketchcreate(double eps, double confidence, int seed) static CountMinSketchcreate(int depth, int width, int seed) abstract intdepth()Depth of thisCountMinSketch.abstract longestimateCount(Object item) Returns the estimated frequency ofitem.abstract CountMinSketchmergeInPlace(CountMinSketch other) Merges anotherCountMinSketchwith this one in place.static CountMinSketchreadFrom(byte[] bytes) Reads in aCountMinSketchfrom a byte array.static CountMinSketchreadFrom(InputStream in) Reads in aCountMinSketchfrom an input stream.abstract doubleReturns the relative error (oreps) of thisCountMinSketch.abstract byte[]Serializes thisCountMinSketchand returns the serialized form.abstract longTotal count of items added to thisCountMinSketchso far.abstract intwidth()Width of thisCountMinSketch.abstract voidwriteTo(OutputStream out) Writes out thisCountMinSketchto an output stream in binary format.
- 
Constructor Details- 
CountMinSketchpublic CountMinSketch()
 
- 
- 
Method Details- 
relativeErrorpublic abstract double relativeError()Returns the relative error (oreps) of thisCountMinSketch.
- 
confidencepublic abstract double confidence()Returns the confidence (ordelta) of thisCountMinSketch.
- 
depthpublic abstract int depth()Depth of thisCountMinSketch.
- 
widthpublic abstract int width()Width of thisCountMinSketch.
- 
totalCountpublic abstract long totalCount()Total count of items added to thisCountMinSketchso far.
- 
addIncrementsitem's count by one.
- 
addIncrementsitem's count bycount.
- 
addLongpublic abstract void addLong(long item) Incrementsitem's count by one.
- 
addLongpublic abstract void addLong(long item, long count) Incrementsitem's count bycount.
- 
addStringIncrementsitem's count by one.
- 
addStringIncrementsitem's count bycount.
- 
addBinarypublic abstract void addBinary(byte[] item) Incrementsitem's count by one.
- 
addBinarypublic abstract void addBinary(byte[] item, long count) Incrementsitem's count bycount.
- 
estimateCountReturns the estimated frequency ofitem.
- 
mergeInPlaceMerges anotherCountMinSketchwith this one in place. Note that only Count-Min sketches with the samedepth,width, and random seed can be merged.- Throws:
- IncompatibleMergeException- if the- other- CountMinSketchhas incompatible depth, width, relative-error, confidence, or random seed.
 
- 
writeToWrites out thisCountMinSketchto an output stream in binary format. It is the caller's responsibility to close the stream.- Throws:
- IOException
 
- 
toByteArraySerializes thisCountMinSketchand returns the serialized form.- Throws:
- IOException
 
- 
readFromReads in aCountMinSketchfrom an input stream. It is the caller's responsibility to close the stream.- Throws:
- IOException
 
- 
readFromReads in aCountMinSketchfrom a byte array.- Throws:
- IOException
 
- 
create- Parameters:
- depth- depth of the Count-min Sketch, must be positive
- width- width of the Count-min Sketch, must be positive
- seed- random seed
 
- 
create- Parameters:
- eps- relative error, must be positive
- confidence- confidence, must be positive and less than 1.0
- seed- random seed
 
 
-