Monday, April 7, 2014

Stream Mining with Rfx Framework

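Goal: estimate the number of unique words in a data stream, using this URL as the source: http://en.wikipedia.org/wiki/List_of_United_States_counties_and_county_equivalents
This uses the HyperLogLog data structure, available in Redis since version 2.8.9 (commands PFADD, PFCOUNT, PFMERGE):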
http://redis.io/commands#hyperloglog

AddThis open-source stream library (stream-lib):
https://github.com/addthis/stream-lib

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
Original Paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
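
To make the idea from the paper concrete, here is a minimal HyperLogLog sketch in plain Python. It is not the Redis or stream-lib implementation; the class name, the use of SHA-1 as the hash, and the default of 2^12 registers are my own assumptions for illustration. It hashes each item, uses the first p bits to pick a register, stores the largest "position of the first 1-bit" seen in the remaining bits, and combines the registers with a harmonic mean, falling back to linear counting for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical class, not a library API)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha formula below assumes m >= 128.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # first p bits select the register
        rest = h & ((1 << width) - 1)       # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Raw estimate: alpha * m^2 / sum(2^-register).
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction (linear counting) while empty registers remain.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


# Usage: feed words one at a time, as they would arrive from a stream.
hll = HyperLogLog()
for word in "the quick brown fox jumps over the lazy dog the end".split():
    hll.add(word)
print("estimated unique words:", hll.count())
```

With 2^12 registers the expected standard error is about 1.04 / sqrt(4096), roughly 1.6%, and memory use stays fixed no matter how many items pass through.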

Mining Data Streams (Stanford CS246)
Slides: http://www.stanford.edu/class/cs246/slides/16-streams.pdf

Applicable Problems:
  • Estimating the number of unique elements in a continuous data stream
  • Approximate cardinality estimation on Big Data
  • Networking and traffic monitoring, where it is finding an ever-growing number of applications: detecting worm propagation, network attacks (e.g., Denial of Service), and link-based spam on the web
  • Counting distinct active flows, an important indicator for detecting attacks and monitoring traffic
Reference Links