US20090268623A1 - Efficient probabilistic counting scheme for stream-expression cardinalities - Google Patents
Efficient probabilistic counting scheme for stream-expression cardinalities Download PDFInfo
- Publication number
- US20090268623A1 US20090268623A1 US12/110,380 US11038008A US2009268623A1 US 20090268623 A1 US20090268623 A1 US 20090268623A1 US 11038008 A US11038008 A US 11038008A US 2009268623 A1 US2009268623 A1 US 2009268623A1
- Authority
- US
- United States
- Prior art keywords
- nodes
- updating
- data packets
- received
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/026—Capturing of monitoring data using flow identification
Definitions
- the present invention relates to traffic analysis in a network.
- Massive and distributed data streams are increasingly prevalent in many modern applications.
- IP Internet-Protocol
- packets arrive at and depart from the nodes at very high speeds.
- web content-delivery system composed of many servers (such as Akamai)
- user requests for accessing websites are distributed among the many servers based on the location of the user and current server loads.
- Other application domains that give rise to these massive and distributed streams include financial applications and sensor networks.
- the number of distinct flows in a network sharing the same characteristics is of high interest to network operators, where a packet flow is defined as, e.g., a sequence of packets that have the same 5-tuple (a logical construct containing five parameters used to identify the connection and allowing network packets of data to be communicated between a server process and a client process in a bi-directional fashion), the same IP addresses/ports of the two communicating peers, and the same protocol.
- the flow ID of a packet can be derived from the 5-tuple.
- the number of distinct flows between a node pair x and y which is a type of traffic matrix element, can be formulated as the number of flows of
- traffic matrix elements are the numbers of streams of packet-flow IDs seen at nodes x and y, respectively.
- traffic matrix elements can be used by network operators for network provisioning and optimization.
- a second example is the total number of distinct flows to the same destination node i, i.e., U i , where are the streams of packet-flow IDs to node i.
- U i the total number of distinct flows to the same destination node i
- a significant increase in U i may indicate an underlying network anomaly, such as a Denial of Service (DoS) attack.
- DoS Denial of Service
- set expression refers to an expression that defines a set of data elements and is made up of set identifiers (i.e., names of sets) and set operations (such as complements, unions, intersections, and differences) performed on those sets.
- stream expression refers to a set expression defined over multiple streams (such as streams of data passing through different nodes of a network), where each stream is considered as a set of elements. Since, in a given stream expression, elements may appear more than once, the term “stream-expression cardinality” refers to the number of distinct elements in a stream expression. For example, in the Venn diagram of FIG.
- T 1 and T 2 represent two different stream expressions
- the cardinality of T 1 is 1 (i.e., T 1 contains 1 distinct element)
- the cardinality of T 2 is 2 (i.e., T 2 contains 2 distinct elements).
- the cardinality of the stream-intersection set T 1 ⁇ T 2 is 0, since there are no elements common to both T 1 and T 2 , and the cardinality of the stream-union set T 1 ⁇ T 2 is 3.
- the present invention provides a method of monitoring a network.
- the method includes, at each node of a fixed set, constructing a corresponding vector of M components based on data packets received at the node during a time period, M being an integer greater than 1, the fixed set being formed of some nodes of the network; and, based on the constructed vectors, estimating how many of the received data packets have been received by all of the nodes of the set or estimating how many flows of the received data packets have data packets that have passed through all of the nodes of the set.
- the constructing includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet.
- the updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes.
- FIG. 1 is an exemplary block diagram for a portion of a network implementing a method consistent with one embodiment of the present invention
- FIG. 2 is a flowchart of an exemplary method for monitoring flows or packets among multiple network nodes by using a Proportional-Union Estimation (PUE) estimator;
- PUE Proportional-Union Estimation
- FIG. 3 is a Venn diagram illustrating proportions of set-expression cardinalities for a two-stream set expression
- FIG. 4 is a Venn diagram illustrating proportions of set-expression cardinalities for a three-stream set expression
- FIG. 5 is a flowchart illustrating a method for continuous FM-vector generation consistent with one embodiment of the present invention
- FIG. 6 is a flowchart illustrating an exemplary method consistent with one embodiment of the present invention.
- FIG. 7 is an exemplary block diagram of a network implementing an exemplary method consistent with one embodiment of the present invention.
- An embodiment of a PUE method is a traffic-matrix estimation problem in a high-speed network, where the objective is to accurately estimate total traffic volume (in flows or packets) between origin and destination nodes in the network, using flows or packet streams observed at individual nodes.
- This embodiment may be used in the system of FIG. 1 .
- data streams travel from an origin node 101 to a destination node 102 via the Internet 103 .
- the streams include a first stream T 1 of flows or packets passing through origin node 101 and a second stream T 2 of flows or packets passing through destination node 102 .
- Each of nodes 101 and 102 generates and provides a vector of M components, where M>1, to server 104 , which applies one or more set operations to the vectors for the two data streams T 1 ,T 2 to generate a PUE cardinality estimate, i.e., an estimate of the number of elements in these data streams.
- Server 104 includes a memory 105 (e.g., a hard disk or RAM device) that contains machine-readable instructions for generating the PUE cardinality estimate. This PUE estimate characterizes the amount of traffic between origin node 101 and destination node 102 and can be used by server 104 for various traffic-monitoring or routing purposes.
- server 104 might send messages to one or more nodes in the network (not shown) to admit new flows or packets into the network at origin node 101 or to deny entrance to such new flows or packets, depending on the amount of traffic indicated by the PUE cardinality estimate.
- FIG. 2 shows an embodiment of a method for monitoring flows among network nodes with a PUE estimator. This method can also be used to monitor individual packets, each of which would simply be treated as a single-packet flow.
- step 210 continuous FM vectors from flow IDs (or packet headers, in the case of single-packet flows) are generated at a plurality of individual nodes in the network (e.g., as shown in the flowchart of FIG. 5 , which will be discussed in further detail below).
- Step 210 can be carried out, e.g., according to Algorithm 1 , below.
- the vectors from the plurality of individual nodes in the network are all received at a single node (e.g., server 104 of FIG. 1 ).
- This receiving step may be the result of one or more queries from the single node, or alternatively, the vectors may automatically be provided by the individual nodes.
- a PUE scheme is used to estimate the number of flows (or packets, in the case of single-packet flows) in stream expressions over the plurality of individual nodes in the network, i.e., a PUE cardinality estimate is generated by applying one or more set operations to the vectors corresponding to the stream expressions.
- Step 230 can be carried out, e.g., by methods based on the formulae of Equation (6), below.
- step 240 based on the amount of traffic indicated by the PUE cardinality estimate of the number of flows or packets, a determination is made whether to admit one or more new flows or packets into the network.
- the expression S is used herein to denote a stream expression of interest being studied, and N is the cardinality of the union of all streams.
- the variable ⁇ represents a specified desirable value of relative standard error, which is a measure of an estimate's reliability obtained by dividing the standard error of the estimate by the estimate itself.
- a stream expression passes through an arbitrary combination of some network nodes but not others.
- distributed streams under consideration will be denoted by , 1 ⁇ j ⁇ J, where J is the number of nodes in the network.
- the operators ⁇ , ⁇ , and ⁇ represent set union, set intersection, and set difference, respectively.
- is used to denote the cardinality of a set.
- 1 ⁇ j ⁇ J For a stream expression that involves an arbitrary combination of unions, intersections, or differences of , 1 ⁇ j ⁇ J, e.g.,
- estimating the cardinality of the set is done to provide probabilistic guarantees of estimation accuracy, which permits minimizing computational overhead and memory usage.
- the expression P(•) represents a probability function.
- the expressions E(•) and var(•) represent the expectation and variance of a random variable, respectively.
- the expressions corr(•,•) and cov(•,•) represent correlation and covariance, respectively, between two random variables, and the expression
- the expression z, 36 means a definition, and the expression a ⁇ b is equivalent to a/b ⁇ 1, where the operator represents an approximation of equality.
- the expression Exp(r) (also written as e r ) is an exponential distribution with rate parameter r, and Normal( ⁇ , ⁇ 2 ) represents a Gaussian distribution with mean ⁇ and variance ⁇ .
- N represents the cardinality of the total stream union
- p s
- f ⁇ (x) is a probability-density function for a continuous random variable x (or a probability-mass function for a discrete random variable x), parameterized by ⁇ .
- L( ⁇ ) of ⁇ i.e.,
- I ⁇ ( ⁇ ) ( 1 n ) ⁇ E ⁇ ( ⁇ ⁇ ⁇ ⁇ log ⁇ ⁇ L ⁇ ( ⁇ ) ) 2 ( 1 )
- ⁇ circumflex over ( ⁇ ) ⁇ is an unbiased estimator of ⁇ based on the given sample x 1 , x 2 , . . . , x n .
- the variance of ⁇ circumflex over ( ⁇ ) ⁇ is then bounded by the reciprocal of the Fisher information I( ⁇ ), i.e.,
- Cramer-Rao inequality also known as the Cramer-Rao lower bound (CRB), as described in Bickel et al., Mathematical Statistics: Basic Ideas and Selected Topics, Vol 1, (2 nd Edition ), Prentice Hall, 2000, which is incorporated herein by reference in its entirety.
- CB Cramer-Rao lower bound
- the efficiency of ⁇ circumflex over ( ⁇ ) ⁇ is defined using the Cramer-Rao lower bound by
- MLE Maximum-Likelihood Estimation
- the cardinality of distributed streams can be estimated using vectors that are compact representations of the actual elements in the streams of the streams.
- the vectors have M components, where M>1.
- Such vectors may also be referred to as “sketches,” “statistical sketches,” “statistical digests,” “digests,” or “minimal statistics.”
- Such vectors can be, e.g., sampling-based or hash-based probabilistic vectors.
- These probabilistic vector-based solutions which largely focus on deriving novel algorithms for deriving the vectors, include the Flajolet-Martin (FM) estimator, proposed for calculating the cardinality of a single stream, as disclosed in Flajolet et al., “Probabilistic counting,” In Proc. Symp. on Foundations of Computer Science ( FOCS ), 1983, which is incorporated herein by reference in its entirety.
- FM Flajolet-Martin
- Flajolet and Martin proposed an estimator for counting distinct values in a single stream, using a hash-based probabilistic vector with O(log N) space, where N is the true cardinality.
- a hash function is used to map an element in the stream to an integer that follows a geometric distribution.
- a continuous variant of the FM algorithm is developed by replacing the geometric random number with a uniform random number. The continuous variant is used to simplify the statistical analysis, as will be discussed below.
- a hash array Y[k] is initialized with 1 for all k.
- an incoming element e is hashed using the function h:[M] ⁇ 1, . . . ,m ⁇ ,which is a universal hash function that maps an element uniformly over an array of m buckets.
- a random number is generated using the function g:[M] ⁇ [0,1], which is a universal hash function that maps the element e to a uniform random number in [0,1], independent of h.
- an update of Y[k] is then performed using:
- step 550 a determination is made whether additional elements e exist, in which case the method returns to step 520 . If, at step 550 , it is determined that no additional elements exist, then the method proceeds to step 560 , wherein hash array Y is returned as a result. A bucket value will remain at 1 if no element is hashed to that bucket.
- the following exemplary pseudo-code Algorithm 1 may be used to implement continuous FM-vector generation for stream :
- steps 1, 2, 3, 4, 5, and 6 correspond to steps 510 , 550 , 520 , 530 , 540 , and 560 of FIG. 5 , respectively, as described above.
- the variable ⁇ m ⁇ 1
- represents the mean number of distinct elements in each bucket for .
- ⁇ k 1 m ⁇ ⁇ Y ⁇ [ k ] , ⁇ . ( 4 )
- PUE Proportional-Union Estimation
- the expression Y j [k], 1 ⁇ k ⁇ m represents the corresponding continuous FM vector, obtained using Algorithm 1 with the same hash functions h and g. (For ease of reference, the “[k]” portion of the expression will be omitted when referring to these vectors at a bucket location k.)
- the stream union ⁇ can be partitioned into three subsets, ⁇ , ⁇ , and ⁇ .
- the second line of Equation (6) states that an estimate for
- Equation (6) states that an estimate for
- PUE Proportional-Union Estimator
- Equation (5) can be inverted to obtain a PUE estimate of the cardinalities.
- the “[k]” portion of the expression will be omitted when referring to these vectors at a bucket location k.
- ⁇ can be estimated by e ⁇ circumflex over (N) ⁇ /m . If ⁇ is not negligible, then the proportion equation derived from the above procedure can be inverted to obtain the PUE estimate of correspondingly.
- RSE relative standard error
- the variable ⁇ represents a specified value of the relative standard error of a proportional-union estimate of
- log log ⁇ j uses log log ⁇ j storage bits
- log Y j ⁇ max(0, ⁇ log U 1 , . . . , ⁇ log U B ).
- the uniform random number generator g(•) can be replaced by a unit-exponential random-number generator (with the decimal truncated into a bits), and the minimum update can be replaced by a maximum update. This avoids taking the logarithm in the vector generation, and the initial values for the buckets now become 0 instead of 1.
- ⁇ 0
- ⁇ 1
- ⁇ 2
- ⁇ ( ⁇ 0 , ⁇ 1 , ⁇ 2 ) T .
- Result 1 provides the gradient and Hessian matrix of l( ⁇ ) with respect to ⁇ , noting that the expectation of the Hessian matrix is the same as the information matrix l( ⁇ ) defined in Equation (1).
- H ij is a 3 ⁇ 3 symmetric non-negative definite matrix with elements H ij given by:
- the MLE of 0 that minimizes l( 0 ) does not have a closed-form solution.
- l( 0 ) is strictly convex with a probability of almost 1 , and hence, its unique minimum can be located using a Newton-Rapson algorithm, e.g., wherein the expression represents the MLE of 0 by minimizing l( 0 ), and the MLE of cardinalities is simply
- a PUE estimator has almost the same efficiency as the MLE estimator.
- FIG. 6 is a flowchart illustrating an exemplary method consistent with one embodiment of the present invention.
- a corresponding vector of M components is constructed based on data packets received at the node during a time period, where M is an integer greater than 1 .
- an estimate is made of either (a) how many of the received data packets have been received by all of the nodes of the set, or (b) how many flows of the received data packets have data packets that have passed through all of the nodes of the set.
- Step 620 includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet. The updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes.
- the estimating may estimate how many of the data packets have propagated to every node of the set.
- the estimating may estimate how many of the flows have data packets that have passed through all of the nodes of the set.
- the updating may further include determining a number to assign to the component for updating based on the property of the data packet received by the one of the nodes.
- the estimating may involve evaluating a correlation between the vectors.
- the constructing of the vector of M components may involve updating the number assigned to each component of one of the vectors by a process that changes the assigned number in a monotonic manner.
- FIG.7 is an exemplary block diagram of a network 703 implementing an exemplary method consistent with one embodiment of the present invention.
- network 703 includes a fixed set formed of some nodes 701 of the network and a server 704 .
- Each node 701 is configured to construct a corresponding vector of M components based on data packets received at the node during a time period, M being an integer greater than 1.
- Server 704 is configured to (i) receive the constructed vectors from nodes 701 and (ii) based on the constructed vectors, estimate either (a) how many of the received data packets have been received by all of the nodes of the set or (b) how many flows of the received data packets have data packets that have passed through all of the nodes of the set.
- the constructing of the vector of M components includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet.
- the updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes.
- the estimation performed by the server may estimate how many of the data packets have propagated to every node of the set.
- the estimation performed by the server may estimate how many of the flows have data packets that have passed through all of the nodes of the set.
- the updating may further include determining a number to assign to the component for updating based on the property of the data packet received by the one of the nodes.
- the estimating may involve evaluating a correlation between the vectors.
- the constructing of the vector of M components may involve updating the number assigned to each component of one of the vectors by a process that changes the assigned number in a monotonic manner.
- Proportional-Union Estimation and “PUE” should be understood to include the methods described herein, as well as other implementations and embodiments of proportional-union estimation not specifically set forth herein, and that such terms should not be construed as being limited by the methods described herein.
- the present invention has applicability in monitoring traffic in different environments and comprising data streams of different types, including not only traditional-network (e.g., hardwired LAN) data streams, but also, e.g., wireless-network data streams, sensor-network data streams, and financial-application data streams.
- data streams of different types including not only traditional-network (e.g., hardwired LAN) data streams, but also, e.g., wireless-network data streams, sensor-network data streams, and financial-application data streams.
- random should not be construed as being limited to pure random selections or number generations, but should be understood to include pseudo-random, including seed-based selections or number generations, as well as other selection or number generation methods that might simulate randomness but are not actually random, or do not even attempt to simulate randomness.
- the present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack.
- various functions of circuit elements may also be implemented as processing blocks in a software program.
- Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
- the present invention can be embodied in the form of methods and apparatuses for practicing those methods.
- the present invention can also be embodied in the form of data-storage media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable data-storage medium storing machine-readable program code, wherein the program code includes a set of instructions for executing one of the inventive methods on a digital data-processing machine, such as a computer, to perform the method.
- the present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- program code When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
- each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value of the value or range.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Algebra (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Pure & Applied Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
- This application is related to U.S. application Ser. No. ______, filed on the same date as this application as attorney docket no. Cao 7-7, the teachings of which are incorporated herein by reference.
- 1. Field of the Invention
- The present invention relates to traffic analysis in a network.
- 2. Description of the Related Art
- Massive and distributed data streams are increasingly prevalent in many modern applications. In a backbone Internet-Protocol (IP) network composed of hundreds or even thousands of nodes, packets arrive at and depart from the nodes at very high speeds. In a web content-delivery system composed of many servers (such as Akamai), user requests for accessing websites are distributed among the many servers based on the location of the user and current server loads. Other application domains that give rise to these massive and distributed streams include financial applications and sensor networks.
- Due to their massive and distributed nature, answering queries about these data streams poses a unique challenge. Often, exact-query answering is infeasible due to memory requirements and communications overhead. In this scenario, approximate-query answering, which can provide probabilistic guarantees, becomes the only viable option. One of the most fundamental query classes of interest is the estimation of the number of flows in such streams.
- As a first example, in the context of IP-network management, the number of distinct flows in a network sharing the same characteristics is of high interest to network operators, where a packet flow is defined as, e.g., a sequence of packets that have the same 5-tuple (a logical construct containing five parameters used to identify the connection and allowing network packets of data to be communicated between a server process and a client process in a bi-directional fashion), the same IP addresses/ports of the two communicating peers, and the same protocol. Moreover, the flow ID of a packet can be derived from the 5-tuple. The number of distinct flows between a node pair x and y, which is a type of traffic matrix element, can be formulated as the number of flows of
-
- where
-
- are the numbers of streams of packet-flow IDs seen at nodes x and y, respectively. Such traffic matrix elements can be used by network operators for network provisioning and optimization.
-
- The term “set expression” refers to an expression that defines a set of data elements and is made up of set identifiers (i.e., names of sets) and set operations (such as complements, unions, intersections, and differences) performed on those sets. The term “stream expression” refers to a set expression defined over multiple streams (such as streams of data passing through different nodes of a network), where each stream is considered as a set of elements. Since, in a given stream expression, elements may appear more than once, the term “stream-expression cardinality” refers to the number of distinct elements in a stream expression. For example, in the Venn diagram of
FIG. 3 , where T1 and T2 represent two different stream expressions, the cardinality of T1 is 1 (i.e., T1 contains 1 distinct element), and the cardinality of T2 is 2 (i.e., T2 contains 2 distinct elements). The cardinality of the stream-intersection set T1∩T2 is 0, since there are no elements common to both T1 and T2, and the cardinality of the stream-union set T1∪T2 is 3. - In one embodiment, the present invention provides a method of monitoring a network. The method includes, at each node of a fixed set, constructing a corresponding vector of M components based on data packets received at the node during a time period, M being an integer greater than 1, the fixed set being formed of some nodes of the network; and, based on the constructed vectors, estimating how many of the received data packets have been received by all of the nodes of the set or estimating how many flows of the received data packets have data packets that have passed through all of the nodes of the set. The constructing includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet. The updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes.
-
FIG. 1 is an exemplary block diagram for a portion of a network implementing a method consistent with one embodiment of the present invention; -
FIG. 2 is a flowchart of an exemplary method for monitoring flows or packets among multiple network nodes by using a Proportional-Union Estimation (PUE) estimator; -
FIG. 3 is a Venn diagram illustrating proportions of set-expression cardinalities for a two-stream set expression; -
FIG. 4 is a Venn diagram illustrating proportions of set-expression cardinalities for a three-stream set expression; -
FIG. 5 is a flowchart illustrating a method for continuous FM-vector generation consistent with one embodiment of the present invention; -
FIG. 6 is a flowchart illustrating an exemplary method consistent with one embodiment of the present invention; and -
FIG. 7 is an exemplary block diagram of a network implementing an exemplary method consistent with one embodiment of the present invention. - Estimation methods for stream-expression cardinalities, consistent with embodiments of the present invention, will now be discussed in further detail in the following sequence. First, a method for monitoring flows or packets among multiple network nodes by using a Proportional-Union Estimation (PUE) estimator consistent with one embodiment of the invention will be described. Second, some basic statistical concepts and a continuous variant of Flajolet-Martin (FM) vectors on which a PUE method is based will be discussed. Third, a PUE estimator will be developed from the continuous FM vectors, and its performance will be characterized analytically. Fourth, the statistical efficiency of a PUE estimator will be analyzed by comparison with the MLE for expressions over two streams.
- An embodiment of a PUE method is a traffic-matrix estimation problem in a high-speed network, where the objective is to accurately estimate total traffic volume (in flows or packets) between origin and destination nodes in the network, using flows or packet streams observed at individual nodes. This embodiment may be used in the system of
FIG. 1 . In that system, data streams travel from anorigin node 101 to adestination node 102 via the Internet 103. The streams include a first stream T1 of flows or packets passing throughorigin node 101 and a second stream T2 of flows or packets passing throughdestination node 102. Each ofnodes server 104, which applies one or more set operations to the vectors for the two data streams T1,T2 to generate a PUE cardinality estimate, i.e., an estimate of the number of elements in these data streams.Server 104 includes a memory 105 (e.g., a hard disk or RAM device) that contains machine-readable instructions for generating the PUE cardinality estimate. This PUE estimate characterizes the amount of traffic betweenorigin node 101 anddestination node 102 and can be used byserver 104 for various traffic-monitoring or routing purposes. For example,server 104 might send messages to one or more nodes in the network (not shown) to admit new flows or packets into the network atorigin node 101 or to deny entrance to such new flows or packets, depending on the amount of traffic indicated by the PUE cardinality estimate. -
FIG. 2 shows an embodiment of a method for monitoring flows among network nodes with a PUE estimator. This method can also be used to monitor individual packets, each of which would simply be treated as a single-packet flow. First, atstep 210, continuous FM vectors from flow IDs (or packet headers, in the case of single-packet flows) are generated at a plurality of individual nodes in the network (e.g., as shown in the flowchart ofFIG. 5 , which will be discussed in further detail below).Step 210 can be carried out, e.g., according to Algorithm 1, below. Next, atstep 220, the vectors from the plurality of individual nodes in the network are all received at a single node (e.g.,server 104 ofFIG. 1 ). This receiving step may be the result of one or more queries from the single node, or alternatively, the vectors may automatically be provided by the individual nodes. At step 230, a PUE scheme, as will be described in further detail below, is used to estimate the number of flows (or packets, in the case of single-packet flows) in stream expressions over the plurality of individual nodes in the network, i.e., a PUE cardinality estimate is generated by applying one or more set operations to the vectors corresponding to the stream expressions. This generates an estimate of the number of either (i) the number of received data packets that have been received by all of the nodes of the set, or (ii) the number of flows within received data packets that have passed through all of the nodes of the set. Step 230 can be carried out, e.g., by methods based on the formulae of Equation (6), below. Lastly, at step 240, based on the amount of traffic indicated by the PUE cardinality estimate of the number of flows or packets, a determination is made whether to admit one or more new flows or packets into the network. - Some terminology used herein will now be introduced. The expression S is used herein to denote a stream expression of interest being studied, and N is the cardinality of the union of all streams. The variable δ represents a specified desirable value of relative standard error, which is a measure of an estimate's reliability obtained by dividing the standard error of the estimate by the estimate itself.
- In the context of a stream of network flows, a stream expression passes through an arbitrary combination of some network nodes but not others. In general, distributed streams under consideration will be denoted by , 1≦j≦J, where J is the number of nodes in the network. The operators ∪, ∩, and \ represent set union, set intersection, and set difference, respectively. The operator |•| is used to denote the cardinality of a set. For a stream expression that involves an arbitrary combination of unions, intersections, or differences of , 1≦j≦J, e.g.,
-
- estimating the cardinality of the set is done to provide probabilistic guarantees of estimation accuracy, which permits minimizing computational overhead and memory usage.
- The following additional notations will be used herein. The expression P(•) represents a probability function. The expressions E(•) and var(•) represent the expectation and variance of a random variable, respectively. The expressions corr(•,•) and cov(•,•) represent correlation and covariance, respectively, between two random variables, and the expression
- represents a convergence in distribution. The expression z,36 means a definition, and the expression a≈b is equivalent to a/b≈1, where the operator represents an approximation of equality. The expression Exp(r) (also written as er) is an exponential distribution with rate parameter r, and Normal(μ, σ2) represents a Gaussian distribution with mean μ and variance σ. Finally, for a stream expression S over J streams, the variable N represents the cardinality of the total stream union, and ps=|S|/N represents the proportion of S in N, which proportion is also referred to herein as simply p.
- A brief introduction of some statistical concepts that are used herein, including the notion of statistical efficiency, will now be provided. The expression fθ(x) is a probability-density function for a continuous random variable x (or a probability-mass function for a discrete random variable x), parameterized by θ. Supposing a random sample of size n is drawn from fθ(•), i.e., x1, x2, . . . , xn, the probability density associated with the observed data is fθ(x1, . . . , xn)=Πi=1 n fθ(xi). As a function of θ with x1, x2, . . . , xn observed, this is, in fact, the likelihood function L(θ) of θ, i.e.,
-
L(θ)=Πi=1 n f θ(x i). - The expression I(θ) is defined by
-
- and {circumflex over (θ)} is an unbiased estimator of θ based on the given sample x1, x2, . . . , xn. The variance of {circumflex over (θ)} is then bounded by the reciprocal of the Fisher information I(θ), i.e.,
-
- The foregoing inequality is the well-known Cramer-Rao inequality, also known as the Cramer-Rao lower bound (CRB), as described in Bickel et al., Mathematical Statistics: Basic Ideas and Selected Topics, Vol 1, (2nd Edition), Prentice Hall, 2000, which is incorporated herein by reference in its entirety. The efficiency of {circumflex over (θ)} is defined using the Cramer-Rao lower bound by
-
- From the Cramer-Rao inequality, eff ({circumflex over (θ)})≦1. A desirable statistical inference method seeks an efficient estimator {circumflex over (θ)} that has a large value eff ({circumflex over (θ)}) of efficiency. One such method is the popular Maximum-Likelihood Estimation (MLE) method. The MLE of θ, {circumflex over (θ)}MLE, is defined by finding the value of θ that maximizes L(θ), i.e., {circumflex over (θ)}MLE=argmaxθL(θ), where the function argmax represents the argument of the maximum. As the number of samples increases to infinity, the MLE is asymptotically unbiased and efficient, i.e., to achieve the Cramer-Rao lower bound.
- The cardinality of distributed streams can be estimated using vectors that are compact representations of the actual elements in the streams of the streams. The vectors have M components, where M>1. Such vectors may also be referred to as “sketches,” “statistical sketches,” “statistical digests,” “digests,” or “minimal statistics.” Such vectors can be, e.g., sampling-based or hash-based probabilistic vectors. These probabilistic vector-based solutions, which largely focus on deriving novel algorithms for deriving the vectors, include the Flajolet-Martin (FM) estimator, proposed for calculating the cardinality of a single stream, as disclosed in Flajolet et al., “Probabilistic counting,” In Proc. Symp. on Foundations of Computer Science (FOCS), 1983, which is incorporated herein by reference in its entirety.
- A continuous variant of the Flajolet-Martin (FM) vector, which is used to develop an efficient PUE estimator for stream-expression cardinalities, will now be described. Flajolet and Martin proposed an estimator for counting distinct values in a single stream, using a hash-based probabilistic vector with O(log N) space, where N is the true cardinality. In the original version of the Flajolet-Martin (FM) algorithm for generating vectors, a hash function is used to map an element in the stream to an integer that follows a geometric distribution. In embodiments of the present invention, a continuous variant of the FM algorithm is developed by replacing the geometric random number with a uniform random number. The continuous variant is used to simplify the statistical analysis, as will be discussed below.
- To generate independent replicates of statistics used for counting cardinalities, a technique referred to as stochastic averaging is employed, as described in Durand et al., “Loglog counting of large cardinalities,” Proceedings of European Symposium on Algorithms, pages 605-617, 2003, which is incorporated herein by reference in its entirety. In stochastic averaging, elements are randomly distributed over an array of buckets. For a single stream , the expression [M] represents the data domain of its element e. In the example of IP-flow counting, e is the packet-flow ID, and [M] is the set of all possible values of flow IDs. As shown in the flowchart of
FIG. 5 , the continuous FM vector Y[k], k=1, . . . ,m, is generated for an array of size m, using the following method. First, atstep 510, a hash array Y[k] is initialized with 1 for all k. Atstep 520, an incoming element e is hashed using the function h:[M]→{1, . . . ,m},which is a universal hash function that maps an element uniformly over an array of m buckets. Atstep 530, a random number is generated using the function g:[M]→[0,1], which is a universal hash function that maps the element e to a uniform random number in [0,1], independent of h. For each incoming element e, the variable k=h(e) represents the bucket to which e is mapped. Atstep 540, an update of Y[k] is then performed using: -
Y[k]→min(Y[k], g(e)). (3) - At
step 550, a determination is made whether additional elements e exist, in which case the method returns to step 520. If, atstep 550, it is determined that no additional elements exist, then the method proceeds to step 560, wherein hash array Y is returned as a result. A bucket value will remain at 1 if no element is hashed to that bucket. The following exemplary pseudo-code (Algorithm 1) may be used to implement continuous FM-vector generation for stream : -
Algorithm 1: Continuous FM Sketch for a Stream 1: Initialize a hash array Y of size m with values 1. 2: for each incoming element e of do 3: Hash e into a bucket k = h(e) uniformly, where k ε {1,2,...,m} 4: Generate a random number g(e) uniformly on the interval [0,1]. 5: Update Y[k] by min(Y[k],g(e)). 6: Return Y at the end of the stream.
In Algorithm 1, steps 1, 2, 3, 4, 5, and 6 correspond tosteps FIG. 5 , respectively, as described above. The variable μ=m−1|| represents the mean number of distinct elements in each bucket for . - By ignoring the weak dependence among {Y[k], 1≦k≦m}, the likelihood function of μ can be written as
-
-
- It is assumed above that two universal hash functions h and g are available for producing random independent numbers. To be more realistic, t-wise independent hashing, which employs additional storage for storing a seed, could alternatively be used.
- The development of a new, Proportional-Union Estimation (PUE) method for estimating the cardinality of a stream expression using continuous FM vectors will now be described. To facilitate the clarity of the following description, a PUE method will first be demonstrated for a set expression over two streams and then this method will be generalized to an arbitrary number of streams ,j=1, . . . , J.
- For each stream , j=1,2, the expression Yj[k], 1≦k≦m represents the corresponding continuous FM vector, obtained using Algorithm 1 with the same hash functions h and g. (For ease of reference, the “[k]” portion of the expression will be omitted when referring to these vectors at a bucket location k.) As shown in
FIG. 3 , the stream union ∪ can be partitioned into three subsets, ∩ , \, and \. -
- STATEMENT 1:
- Suppose that m is large enough such that
-
-
P(Y 1 <Y 2)≈(1−ε)N −1 |T 1 \T 2|, -
P(Y 1 >Y 2)≈(1−ε)N −1 |T 2 \T 1|. - For example, if N is large enough such that c is negligible, then Statement 1 permits the conclusion that | ∩|≈NP(Y1=Y2). Therefore, an estimate of ∩ can be obtained using the product of estimates of | ∪ T2| and P(Y1=Y2). It is noted that the continuous FM vector (Algorithm 1) of the stream union ∪ is exactly the bucket-wise minimum min(Y1, Y2), and therefore, | ∪ | can be estimated by using Equation (4) on the new vector. Furthermore, P(Y1=Y2) can be estimated empirically from the observed vector-pair (Y1,Y2).
-
-
- respectively, where {circumflex over (P)} represents the empirical probabilities based on the observed vector pair (Y1[k], Y2[k]), k=1, . . . ,m. In other words, the first line of Equation (6) states that an estimate for ∩ can be derived by multiplying the estimate {circumflex over (N)} of stream union N by the estimated probability that Y1=Y2. The second line of Equation (6) states that an estimate for |\| can be derived by multiplying the estimate {circumflex over (N)} of stream union N by the estimated probability that Y1<Y2. The third line of Equation (6) states that an estimate for |\| can be derived by multiplying the estimate {circumflex over (N)} of stream union N by the estimated probability that Y2<Y1. Equation (6) can be used to obtain an estimate of either (i) the number of received data packets that have been received by all of the nodes of the set, or (ii) the number of flows within received data packets that have passed through all of the nodes of the set. The foregoing estimation scheme or method is referred to as a Proportional-Union Estimator (PUE) because the estimation is based on proportions of subsets of a stream union.
- If ε=exp(−N/m) is not negligible, then Equation (5) can be inverted to obtain a PUE estimate of the cardinalities.
- A PUE method will now be generalized to estimate the cardinality of a set expression over multiple streams , j=1, . . . , J, with J≧2. The expression Yj[k], k=1, . . . ,m represents the corresponding continuous FM vectors by Algorithm 1 for stream . As before, for ease of reference, the “[k]” portion of the expression will be omitted when referring to these vectors at a bucket location k. The expression Y∩ represents the continuous FM vector for the stream union ∪j=1 J and is defined as:
-
Y ∪=min(Y 1 , . . . ,Y J). (7) - The following
Statement 2 is an extension of Statement 1 on P(Y1=Y2) to the case of multiple streams: - STATEMENT 2:
- Suppose m is large enough such that (1−m−1)m≈e−1.
- Then for 1≦d≦J,
-
FIG. 4 illustrates an example of set expressions over J=3 streams , where d=2. It is noted that the stream union can be partitioned into 7 exclusive subsets, numbered from 0 to 6. For example, subset 0 denotes ∩j=1 3 . It should be understood that a generalization can be made to arbitrary values of J and d. - Supposing that S is a set expression over J streams, whose cardinality is the subject of interest, to complete the generalization of a PUE method for |S|, two additional techniques are used for dealing with set expressions, which is illustrated by the following example. Considering S=\((∩ )∪ ), the first technique is to remove set differences that appear in the expression using the relation
- It is noted that this relation can be used repeatedly if there are multiple set differences. In this example, this implies
- Without loss of generality, it can be assumed that the set expression only involves unions and intersections. The second technique is to rewrite the set expression in terms of intersections of set unions, i.e.,
- In this example, this implies
- It is noted that the continuous FM vector of a set union of d streams is exactly the minimum of the d individual FM vectors. If =/N is the proportion of in the total union, then, by applying
Statement 2 with vectors Yj replaced by the vectors corresponding to the set unions, a close approximation of ps can be derived. In this example, the following expressions hold true (for simplicity, ε is assumed to be ignorable): -
and thus - Assuming {circumflex over (N)} is the estimate of N by Y∪, and is the empirical proportion based on the observed vector corresponding to the tuple of the J streams, if N is large enough such that ε=exp(−N/m) is negligible, then a PUE estimate of can be obtained in a straightforward way as in the two-stream case, by:
-
-
- The memory usage of a PUE scheme will now be discussed. For the stream expression S, the variable δ represents a specified value of the relative standard error of a proportional-union estimate of |S|. Given the above equation for calculating RSE, the number of buckets used for a given S is
-
Yj˜min(λj −1E,1) (9) - Since λj˜O(N), it is implied that Yj uses O(log N) storage bits. The following procedure can reduce the per-bucket storage of vector statistics Yj from log N to log log N, by instead storing log Yj. It is noted that, by Equation (9),
-
log Yj˜min(0, −log(λj)+log E). (10) - Assuming log log λj is an integer, log λj uses log log λj storage bits, and log Y uses at most log log(N)+a storage bits, where the a bits are used for storing the decimals of log E (for reference, it is noted that the 0.1% and 99.9% quantiles of log E are −6.907 and 1.933, respectively). Therefore, the per-bucket storage is now at most O(log log N), and the total memory used is δ=2 (log log N+a).
- From experimental studies, it has been observed that a=10 bits is enough for storing the decimals of log E without compromising the overall accuracy of the cardinality estimate. This can be further justified using a careful bias analysis of the probability approximation of e.g., using Equations (5) and (8). For example, considering = ∩ over the J streams, and ==P(Y1=Y2=Y∪), where (I) is the new probability based on the discretized vectors Yj, j=1, . . . ,J, described above, it can be shown that ≦ (I)≦+2 −a+1. Therefore, if a=10, then the difference in the probabilities is at most 0.002, which is negligible for practical purposes.
- It is further noted that, alternatively, a direct method can be used for computing the logarithmic vector log Yj. By Algorithm 1, the following expression is true:
-
Y j=min(1, U 1 , . . . , U B), - where B is a binomial random number representing the number of distinct elements that are mapped to the bucket, and each Ui is a uniform random number in [0,1]. It is noted that −log Ui follows a unit exponential distribution, and therefore,
-
log Y j=−max(0, −log U 1, . . . , −log U B). - Thus, to generate the logarithmic continuous FM vectors, the uniform random number generator g(•) can be replaced by a unit-exponential random-number generator (with the decimal truncated into a bits), and the minimum update can be replaced by a maximum update. This avoids taking the logarithm in the vector generation, and the initial values for the buckets now become 0 instead of 1.
- The efficiency of a PUE method will now be discussed, using the formal statistical methods described above. The likelihood of the continuous FM vectors is derived for the case of two streams, the MLE of the cardinality parameters is obtained, and then its asymptotic variance is compared with that of a PUE method. As explained above, MLE is asymptotically most efficient, because it can achieve the Cramer-Rao lower bound. For a set expression whose cardinality is the subject of interest, =||/N is the proportion of the cardinality of in the total union. It will be shown that, in the two-stream case, a PUE method is as efficient as that of MLE when is small.
- In addition to the notation used above in the discussion of set expressions over two streams, the following are defined as unknown cardinality parameters:
- where
-
θ=(λ0, λ1, λ2)T. - The function fλ(•) denotes the density function of an exponential random variable with rate λ, i.e., fλ(x)=λe−λx, x≧0. The density function for the continuous FM vectors (Y1,Y2), i.e., P(Y1=y1,Y2=y2), 0≦y1,y2≦1, can be expressed as
-
- The vectors at two different bucket locations (Y1[k], Y2[k]) and (Y1[j], Y2[j]) for j≠k, are very weakly dependent. If l(θ) is the negative logarithmic-likelihood function of the continuous FM vectors (Y1[k], Y2[k]), k=1, . . . ,m, i.e.,
-
- then the following Result 1 provides the gradient and Hessian matrix of l(θ) with respect to θ, noting that the expectation of the Hessian matrix is the same as the information matrix l(θ) defined in Equation (1).
- RESULT 1:
- The gradient
-
- is given by
-
-
- where mi are the number of buckets of case i defined in
-
- The Hessian matrix
-
- is a 3×3 symmetric non-negative definite matrix with elements Hij given by:
-
-
- Furthermore if m5>0 and m6>0, is strictly positive definite and thus I(θ) is a strictly convex function. If ∩|>0, then the probability
-
-
- is almost 1 for large m.
- Unlike a PUE estimate, the MLE of 0 that minimizes l(0) does not have a closed-form solution. By Result 1, l(0) is strictly convex with a probability of almost 1, and hence, its unique minimum can be located using a Newton-Rapson algorithm, e.g., wherein the expression represents the MLE of 0 by minimizing l(0), and the MLE of cardinalities is simply
-
- It has been observed, through simulation and experimental studies, that the Newton-Rapson iteration typically employs only a few steps (less than 5) before convergence is reached. The following
Statement 3 provides the asymptotic distribution for the relative accuracy of {circumflex over (θ)}(MLE): STATEMENT 3: -
-
-
- where b is a small nurnbcr given in Equation 19 in the append. Comparing the relative standard error of the MLE estimator and that of a PUE estimator, the efficiency of PUE as compared to MLE, defined by the ratio of their MSEs, i.e., Equation (2), is:
-
- which is close to 1 for a small value of p,. Therefore, a PUE estimator has almost the same efficiency as the MLE estimator.
- For cardinalities defined over a larger number of streams, a PUE method is still relatively simple to implement, while the MLE method becomes difficult, due to the complexity of the likelihood computation.
-
FIG. 6 is a flowchart illustrating an exemplary method consistent with one embodiment of the present invention. At step 610, at each node of a fixed set that is formed of some nodes of the network, a corresponding vector of M components is constructed based on data packets received at the node during a time period, where M is an integer greater than 1. Next, atstep 620, based on the constructed vectors, an estimate is made of either (a) how many of the received data packets have been received by all of the nodes of the set, or (b) how many flows of the received data packets have data packets that have passed through all of the nodes of the set. Step 620 includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet. The updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes. - The estimating may estimate how many of the data packets have propagated to every node of the set. The estimating may estimate how many of the flows have data packets that have passed through all of the nodes of the set. The updating may further include determining a number to assign to the component for updating based on the property of the data packet received by the one of the nodes. The estimating may involve evaluating a correlation between the vectors.
- The constructing of the vector of M components may involve updating the number assigned to each component of one of the vectors by a process that changes the assigned number in a monotonic manner.
-
FIG.7 is an exemplary block diagram of anetwork 703 implementing an exemplary method consistent with one embodiment of the present invention. As shown,network 703 includes a fixed set formed of somenodes 701 of the network and aserver 704. Eachnode 701 is configured to construct a corresponding vector of M components based on data packets received at the node during a time period, M being an integer greater than 1.Server 704 is configured to (i) receive the constructed vectors fromnodes 701 and (ii) based on the constructed vectors, estimate either (a) how many of the received data packets have been received by all of the nodes of the set or (b) how many flows of the received data packets have data packets that have passed through all of the nodes of the set. The constructing of the vector of M components includes updating a component of the vector of one of the nodes in response to the one of the nodes receiving a data packet. The updating includes selecting the component for updating by hashing a property of the data packet received by the one of the nodes. - The estimation performed by the server may estimate how many of the data packets have propagated to every node of the set. The estimation performed by the server may estimate how many of the flows have data packets that have passed through all of the nodes of the set. The updating may further include determining a number to assign to the component for updating based on the property of the data packet received by the one of the nodes. The estimating may involve evaluating a correlation between the vectors.
- The constructing of the vector of M components may involve updating the number assigned to each component of one of the vectors by a process that changes the assigned number in a monotonic manner.
- The terms “Proportional-Union Estimation” and “PUE” should be understood to include the methods described herein, as well as other implementations and embodiments of proportional-union estimation not specifically set forth herein, and that such terms should not be construed as being limited by the methods described herein.
- While embodiments of the invention disclosed herein use an FM method for generating vectors, it should be understood that a PUE estimator consistent with alternative embodiments of the invention could use other methods for generating vectors.
- The present invention has applicability in monitoring traffic in different environments and comprising data streams of different types, including not only traditional-network (e.g., hardwired LAN) data streams, but also, e.g., wireless-network data streams, sensor-network data streams, and financial-application data streams.
- The term “random,” as used herein, should not be construed as being limited to pure random selections or number generations, but should be understood to include pseudo-random, including seed-based selections or number generations, as well as other selection or number generation methods that might simulate randomness but are not actually random, or do not even attempt to simulate randomness.
- The present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
- The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of data-storage media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable data-storage medium storing machine-readable program code, wherein the program code includes a set of instructions for executing one of the inventive methods on a digital data-processing machine, such as a computer, to perform the method. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
- Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value of the value or range.
- It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
- It should be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present invention.
- Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
- Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/110,380 US8400933B2 (en) | 2008-04-28 | 2008-04-28 | Efficient probabilistic counting scheme for stream-expression cardinalities |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/110,380 US8400933B2 (en) | 2008-04-28 | 2008-04-28 | Efficient probabilistic counting scheme for stream-expression cardinalities |
Publications (2)
Publication Number | Publication Date |
---|---|
US20090268623A1 true US20090268623A1 (en) | 2009-10-29 |
US8400933B2 US8400933B2 (en) | 2013-03-19 |
Family
ID=41214917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/110,380 Expired - Fee Related US8400933B2 (en) | 2008-04-28 | 2008-04-28 | Efficient probabilistic counting scheme for stream-expression cardinalities |
Country Status (1)
Country | Link |
---|---|
US (1) | US8400933B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090271509A1 (en) * | 2008-04-28 | 2009-10-29 | Lucent Technologies Inc. | Probabilistic aggregation over distributed data streams |
CN102202321A (en) * | 2010-03-25 | 2011-09-28 | 通用汽车环球科技运作有限责任公司 | V2X-Connected Cooperative Diagnostic & Prognostic Applications in Vehicular AD HOC Networks |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8533167B1 (en) * | 2012-08-31 | 2013-09-10 | Guavus, Inc. | Compressed set representation for sets as measures in OLAP cubes |
US20140067751A1 (en) * | 2012-08-31 | 2014-03-06 | Nikhil Shirish Ketkar | Compressed set representation for sets as measures in olap cubes |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050220023A1 (en) * | 2004-03-31 | 2005-10-06 | Kodialam Muralidharan S | High-speed traffic measurement and analysis methodologies and protocols |
US6980521B1 (en) * | 2000-11-29 | 2005-12-27 | Cisco Technology, Inc. | Method and apparatus for per session load balancing with improved load sharing in a packet switched network |
US7206861B1 (en) * | 2002-07-29 | 2007-04-17 | Juniper Networks, Inc. | Network traffic distribution across parallel paths |
US20070155396A1 (en) * | 2005-12-30 | 2007-07-05 | Samsung Electronics Co., Ltd. | Interactive traffic information providing method and apparatus |
US20090271509A1 (en) * | 2008-04-28 | 2009-10-29 | Lucent Technologies Inc. | Probabilistic aggregation over distributed data streams |
US8214479B2 (en) * | 2007-04-02 | 2012-07-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Scalability and redundancy in an MSC-Server blade cluster |
-
2008
- 2008-04-28 US US12/110,380 patent/US8400933B2/en not_active Expired - Fee Related
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6980521B1 (en) * | 2000-11-29 | 2005-12-27 | Cisco Technology, Inc. | Method and apparatus for per session load balancing with improved load sharing in a packet switched network |
US7206861B1 (en) * | 2002-07-29 | 2007-04-17 | Juniper Networks, Inc. | Network traffic distribution across parallel paths |
US20050220023A1 (en) * | 2004-03-31 | 2005-10-06 | Kodialam Muralidharan S | High-speed traffic measurement and analysis methodologies and protocols |
US20070155396A1 (en) * | 2005-12-30 | 2007-07-05 | Samsung Electronics Co., Ltd. | Interactive traffic information providing method and apparatus |
US8214479B2 (en) * | 2007-04-02 | 2012-07-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Scalability and redundancy in an MSC-Server blade cluster |
US20090271509A1 (en) * | 2008-04-28 | 2009-10-29 | Lucent Technologies Inc. | Probabilistic aggregation over distributed data streams |
US8204985B2 (en) * | 2008-04-28 | 2012-06-19 | Alcatel Lucent | Probabilistic aggregation over distributed data streams |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090271509A1 (en) * | 2008-04-28 | 2009-10-29 | Lucent Technologies Inc. | Probabilistic aggregation over distributed data streams |
US8204985B2 (en) | 2008-04-28 | 2012-06-19 | Alcatel Lucent | Probabilistic aggregation over distributed data streams |
CN102202321A (en) * | 2010-03-25 | 2011-09-28 | 通用汽车环球科技运作有限责任公司 | V2X-Connected Cooperative Diagnostic & Prognostic Applications in Vehicular AD HOC Networks |
US20110238259A1 (en) * | 2010-03-25 | 2011-09-29 | Gm Global Technology Operations, Inc. | V2X-Connected Cooperative Diagnostic & Prognostic Applications in Vehicular AD HOC Networks |
Also Published As
Publication number | Publication date |
---|---|
US8400933B2 (en) | 2013-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7779143B2 (en) | Scalable methods for detecting significant traffic patterns in a data network | |
US8406132B2 (en) | Estimating cardinality distributions in network traffic | |
US9888030B2 (en) | Detection of stealthy malware activities with traffic causality and scalable triggering relation discovery | |
US8204985B2 (en) | Probabilistic aggregation over distributed data streams | |
US7751325B2 (en) | Method and apparatus for sketch-based detection of changes in network traffic | |
Tong et al. | Privacy-preserving data integrity verification in mobile edge computing | |
US20110239299A1 (en) | Adaptive distinct counting for network-traffic monitoring and other applications | |
US20060085592A1 (en) | Method for distinct count estimation over joins of continuous update stream | |
Chen et al. | Rethinking encrypted traffic classification: A multi-attribute associated fingerprint approach | |
Zhou et al. | Persistent spread measurement for big network data based on register intersection | |
US7773538B2 (en) | Estimating origin-destination flow entropy | |
US8185559B2 (en) | Method and system for operating a telecommunication device using a hash table in particular for protecting such device from attacks | |
Chen et al. | Precise error estimation for sketch-based flow measurement | |
US7639611B2 (en) | Method and apparatus for payload-based flow estimation | |
Du et al. | GraphShield: Dynamic large graphs for secure queries with forward privacy | |
Mahmood et al. | Critical infrastructure protection: Resource efficient sampling to improve detection of less frequent patterns in network traffic | |
US7957272B2 (en) | Method and apparatus for coincidence counting for estimating flow statistics | |
US8400933B2 (en) | Efficient probabilistic counting scheme for stream-expression cardinalities | |
US8195710B2 (en) | Method for summarizing data in unaggregated data streams | |
Kostopoulos et al. | Leveraging on the XDP framework for the efficient mitigation of water torture attacks within authoritative DNS servers | |
Zheng et al. | Unbiased delay measurement in the data plane | |
Li et al. | Stable-sketch: A versatile sketch for accurate, fast, web-scale data stream processing | |
US8005949B2 (en) | Variance-optimal sampling-based estimation of subset sums | |
Cormode et al. | Time-decaying sketches for sensor data aggregation | |
Hegedus et al. | Distributed differentially private stochastic gradient descent: An empirical study |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: LUCENT TECHNOLOGIES INC., NEW JERSEY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BU, TIAN;CAO, JIN;CHEN, AIYOU;REEL/FRAME:020862/0096 Effective date: 20080424 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: ALCATEL-LUCENT USA INC., NEW JERSEY Free format text: MERGER;ASSIGNOR:LUCENT TECHNOLOGIES INC.;REEL/FRAME:029446/0103 Effective date: 20081101 |
|
AS | Assignment |
Owner name: ALCATEL LUCENT, FRANCE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALCATEL-LUCENT USA INC.;REEL/FRAME:029497/0359 Effective date: 20121218 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: CREDIT SUISSE AG, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:ALCATEL-LUCENT USA INC.;REEL/FRAME:030510/0627 Effective date: 20130130 |
|
AS | Assignment |
Owner name: ALCATEL-LUCENT USA INC., NEW JERSEY Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG;REEL/FRAME:033949/0016 Effective date: 20140819 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210319 |