[go: up one dir, main page]

CN116303585A - A kind of data flow counting method, equipment and storage medium based on Flag flag position - Google Patents

A kind of data flow counting method, equipment and storage medium based on Flag flag position Download PDF

Info

Publication number
CN116303585A
CN116303585A CN202211217920.XA CN202211217920A CN116303585A CN 116303585 A CN116303585 A CN 116303585A CN 202211217920 A CN202211217920 A CN 202211217920A CN 116303585 A CN116303585 A CN 116303585A
Authority
CN
China
Prior art keywords
data packet
data
counter
flag
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211217920.XA
Other languages
Chinese (zh)
Other versions
CN116303585B (en
Inventor
胡文龙
宛俊美
胡钰涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211217920.XA priority Critical patent/CN116303585B/en
Publication of CN116303585A publication Critical patent/CN116303585A/en
Application granted granted Critical
Publication of CN116303585B publication Critical patent/CN116303585B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a Flag bit-based data stream counting method, a Flag bit-based data stream counting device and a storage medium, wherein the Flag bit-based data stream counting method comprises the following steps: setting a parameter initial value; when a data packet arrives, providing key value pair information, and acquiring the position of a calculator mapped by the data packet; when the first data packet arrives, calling a counter value mapped to an insertion function update data packet of the CUS structure to be 1; comparing the counter position of each data packet with the previous data packet from the second data packet, and judging whether the data packet and the previous data packet mapped to the same counter belong to the same data stream or not; and carrying out data stream query, and selecting the minimum value as a query result. The Flag bit is utilized to detect whether the data packets mapped to the same counter belong to the same data stream, and different counting schemes are adopted to improve the accuracy of calculating the number of the data packets in the data stream and improve the measurement accuracy of the data stream; and the Flag bit is of a Bool type, and the memory occupied in the actual environment is negligible.

Description

一种基于Flag标志位的数据流计数方法、设备和存储介质A kind of data flow counting method, equipment and storage medium based on Flag flag position

技术领域technical field

本发明涉及网络管理的技术领域,尤其涉及一种基于Flag标志位的数据流计数方法、设备和存储介质。The present invention relates to the technical field of network management, in particular to a method, device and storage medium for counting data streams based on Flag flag bits.

背景技术Background technique

在高速网络中,海量的数据流量给网络测量工作带来了巨大的压力,基于数据流的网络流量测量方案作为下一代网络测量的重要手段,已经广泛应用于网络计费、流量工程、流数据库、网络安全等领域。基于数据流的网络流量测量方法只产生占据少量存储空间的概要信息,就可以满足数据处理的常规操作需求,尤其是sketch的数据流技术功能强大且灵活性高,更新速度快、复杂度低。Sketch是一种紧凑型数据结构,也是一种基于Hash的数据结构,可以进行高速网络中的大流、异常流等的检测,具有内存和估计精度平衡的特征。In high-speed networks, massive data traffic has brought enormous pressure to network measurement work. As an important means of next-generation network measurement, the network traffic measurement solution based on data flow has been widely used in network billing, traffic engineering, and flow database. , network security and other fields. The network traffic measurement method based on data flow only generates summary information occupying a small amount of storage space, which can meet the routine operation requirements of data processing, especially the data flow technology of Sketch is powerful and flexible, with fast update speed and low complexity. Sketch is a compact data structure and a Hash-based data structure, which can detect large flows and abnormal flows in high-speed networks, and has the characteristics of a balance between memory and estimation accuracy.

基于sketch的数据流方法主要包括Count-Min Sketch(CMS)、Count Sketch(CS)、Conservative Update Sketch(CUS)、Augmented Sketch(AS)等,这类算法不存储元素本身,而是存储元素的计数。CMS是一种典型的且应用最为广泛的sketch结构,其通过Hash函数对数据包的键值对进行哈希计算后映射存储于计数器中,映射到的每个计数器桶都需要更新,而CUS只需要对其中最小的计数器桶进行更新即可。CS与CMS类似,不同的地方在于CS中的每个数组都和两个Hash函数相关。AS在sketch的基础上增添了一个附加过滤器,用来动态地捕获频繁项。上述这几个sketch虽然在网络流量测量的频次计数方面提供了作用,但是它们存在着内存访问次数多、哈希计算资源消耗高、准确率低、误差大等问题。Sketch-based data flow methods mainly include Count-Min Sketch (CMS), Count Sketch (CS), Conservative Update Sketch (CUS), Augmented Sketch (AS), etc. These algorithms do not store the elements themselves, but store the count of the elements . CMS is a typical and most widely used sketch structure. It performs hash calculation on the key-value pairs of data packets through the Hash function and stores the mapping in the counter. Each counter bucket mapped to needs to be updated, while CUS only The smallest counter bucket needs to be updated. CS is similar to CMS, the difference is that each array in CS is related to two Hash functions. AS adds an additional filter on the basis of sketch to dynamically capture frequent items. Although the above-mentioned sketches play a role in the frequency counting of network traffic measurement, they have problems such as large number of memory accesses, high consumption of hash calculation resources, low accuracy, and large errors.

由于哈希表中存在哈希冲突,CMS的查询结果只可能大于或等于真实值,只会高估元素的频率,尤其是对出现次数较少的元素,其准确率会比较低。CUS在CMS的基础上,采用保守更新的方法提高了数据流计数的准确性,但是其无法执行删除操作,因为在插入元素一段时间后,该元素先前映射的最小桶内的最小者不一定就是它本身,可能有其他元素共享该计数桶,所以如果执行删除操作可能会影响到其他元素。Due to the hash conflicts in the hash table, the query result of CMS can only be greater than or equal to the real value, which will only overestimate the frequency of elements, especially for elements with fewer occurrences, the accuracy rate will be relatively low. On the basis of CMS, CUS adopts a conservative update method to improve the accuracy of data flow counting, but it cannot perform deletion operations, because after inserting an element for a period of time, the smallest one in the smallest bucket previously mapped by the element is not necessarily the same By itself, there may be other elements sharing the counting bucket, so if a delete operation is performed, other elements may be affected.

基于此,针对CMS的高估问题,在为改善CMS的准确率而提出的CUS的基础上,本发明提出了一种基于Flag标志位的数据流计数方法,旨在通过设置一个Flag标志位来检测映射到同一计数器的数据包是否属于同一个数据流,然后采取不同的计数方案来改善计数器的计数操作以提高测量精度。Based on this, aiming at the overestimation problem of CMS, on the basis of CUS proposed in order to improve the accuracy rate of CMS, the present invention proposes a kind of data flow counting method based on Flag flag bit, aims at by setting a Flag flag bit Detect whether the data packets mapped to the same counter belong to the same data flow, and then adopt different counting schemes to improve the counting operation of the counter to improve the measurement accuracy.

发明内容Contents of the invention

本部分的目的在于概述本发明的实施例的一些方面以及简要介绍一些较佳实施例。在本部分以及本申请的说明书摘要和发明名称中可能会做些简化或省略以避免使本部分、说明书摘要和发明名称的目的模糊,而这种简化或省略不能用于限制本发明的范围。The purpose of this section is to outline some aspects of embodiments of the invention and briefly describe some preferred embodiments. Some simplifications or omissions may be made in this section, as well as in the abstract and titles of this application, to avoid obscuring the purpose of this section, abstract and titles, and such simplifications or omissions should not be used to limit the scope of the invention.

鉴于上述现有基于Flag标志位的数据流计数方法存在的问题,提出了本发明。In view of the problems existing in the above-mentioned existing method for counting data streams based on Flag flag bits, the present invention is proposed.

因此,本发明目的是提供一种基于Flag标志位的数据流计数方法。Therefore, the object of the present invention is to provide a method for counting data streams based on Flag flag bits.

为解决上述技术问题,本发明提供如下技术方案:In order to solve the above technical problems, the present invention provides the following technical solutions:

设置参数初始值;Set the initial value of the parameter;

当一个数据包到达时,先提出所述数据包中的健值对信息,接着根据CUS算法的Hash函数进行哈希计算,获取所述数据包映射到的计算器位置;When a data packet arrives, first propose the health value pair information in the data packet, then carry out hash calculation according to the Hash function of the CUS algorithm, and obtain the calculator position to which the data packet is mapped;

当第一个数据包到达时,调用CUS结构的插入函数,更新数据包映射到的计数器值为1;When the first data packet arrives, the insert function of the CUS structure is called, and the counter value to which the data packet is mapped is updated to 1;

从到达的第二个数据包开始,将每一个数据包的计数器位置与前一个数据包的进行比较,判断该数据包与映射到同一计数器的前一个数据包是否属于同一个数据流;Starting from the second data packet that arrives, compare the counter position of each data packet with that of the previous data packet to determine whether the data packet belongs to the same data flow as the previous data packet mapped to the same counter;

进行数据流查询,确定映射的所有计数器位置,比较计数器值的大小,选取其中最小的值作为查询结果。Carry out data flow query, determine all the counter positions of the mapping, compare the size of the counter values, and select the smallest value among them as the query result.

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:设置参数初始值包括,As a preferred solution of the method for counting data streams based on the Flag flag bit in the present invention, wherein: setting the initial value of the parameter includes,

将CUS结构的宽度值和深度值设置为固定值,将所述CUS结构中所有计数器值初始化为0,Flag标志位初始化为flag=1。Set the width and depth values of the CUS structure to fixed values, initialize all counter values in the CUS structure to 0, and initialize the Flag bit to flag=1.

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:所述CUS结构是宽度为w、深度为d的二维矩阵数组,用来存储元素的计数信息,每个所述数组包含w个计数器,同时还包含d个哈希函数hj(j=1,2,...,d)。As a preferred solution of the method for counting data streams based on Flag flag bits in the present invention, wherein: the CUS structure is a two-dimensional matrix array with a width of w and a depth of d, used to store counting information of elements, each The array includes w counters and d hash functions h j (j=1, 2, . . . , d).

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:所述哈希计算表示为hj(key),(j=1,2,…,d),经过所述哈希计算后获取数据包映射到的计数器位置,记为f(xi,pi);As a preferred solution of the method for counting data streams based on Flag flag bits in the present invention, wherein: the hash calculation is expressed as h j (key), (j=1, 2, ..., d), through the Obtain the counter position to which the data packet is mapped after hash calculation, denoted as f( xi , p i );

其中si表示数据包所属的流,pi表示该数据包所映射到的计数器位置。Among them, s i represents the stream to which the data packet belongs, and pi represents the counter position to which the data packet is mapped.

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:所述CUS结构的插入函数包括,As a preferred solution of the method for counting data streams based on Flag flag bits in the present invention, wherein: the insertion function of the CUS structure includes,

当数据包到达时,按照键值对的形式进行映射计数,先调用CUS结构中的哈希函数,对数据包的键字段进行d次哈希计算,找到对应的计数器,然后只对映射的计数器中最小的值进行增加操作,即更新最小的计数器值为当前值加上该数据包的值字段值。When the data packet arrives, map and count in the form of key-value pairs, first call the hash function in the CUS structure, perform d times hash calculation on the key field of the data packet, find the corresponding counter, and then only count the mapped counter Increment the smallest value in the counter, that is, update the smallest counter value to the current value plus the value field value of the data packet.

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:从到达的第二个数据包开始,将每一个数据包的计数器位置与前一个数据包的进行比较,判断该数据包与映射到同一计数器的前一个数据包是否属于同一个数据流包括,As a preferred scheme of the method for counting data streams based on Flag flag bits in the present invention, wherein: starting from the second data packet arriving, the counter position of each data packet is compared with that of the previous data packet, and judged Whether this packet belongs to the same flow as the previous packet mapped to the same counter includes,

若前后两个数据包映射的计数器位置不同,则按照CUS的计数规则正常计数;If the counter positions of the two data packets before and after are mapped to be different, it will be counted normally according to the counting rules of CUS;

若前后两个数据包映射的计数位置相同,即发生了哈希冲突,且属于相同的数据流,则将flag记为1;If the count positions of the two data packet mappings before and after are the same, that is, a hash collision occurs, and they belong to the same data flow, the flag is recorded as 1;

若前后两个数据包映射的计数位置相同,且属于不同相同的数据流,则将flag记为0;If the count positions of the two data packet mappings before and after are the same, and belong to different and identical data streams, the flag will be recorded as 0;

最后根据flag的值在计数器中进行更新;Finally, update the counter according to the value of the flag;

若flag=1,则将数据包计数器值增加1;If flag=1, then increase the packet counter value by 1;

若flag=0,则将数据包计数器值增加0.5。If flag=0, increase the data packet counter value by 0.5.

作为本发明所述基于Flag标志位的数据流计数方法的一种优选方案,其中:进行所述数据流查询包括,As a preferred solution of the method for counting data streams based on Flag flag bits in the present invention, wherein: performing the data stream query includes,

调用CUS查询函数,首先根据数据包的键字段进行哈希计算找到相对应的哈希函数映射的计数器,然后找出这些计数器中最小的值作为该数据包的频次估计值,最后选取最小值作为估计值。Call the CUS query function, first perform hash calculation according to the key field of the data packet to find the corresponding counter of the hash function mapping, then find out the smallest value among these counters as the estimated frequency of the data packet, and finally select the smallest value as the estimated value.

一种基于Flag标志位的数据流计数系统,其特征在于,所述系统包括:A kind of data flow counting system based on Flag sign position, it is characterized in that, described system comprises:

数据收集模块,使相关参数初始化,便于后续计算;The data collection module initializes relevant parameters to facilitate subsequent calculations;

数据判定模块,利用flag标志位来检测映射到同一计数器的数据包是否属于同一数据流,然后采用不同的计数方案来改善计数器的计数操作;The data judgment module utilizes the flag flag to detect whether the data packets mapped to the same counter belong to the same data flow, and then adopts different counting schemes to improve the counting operation of the counter;

数据查询模块,确定最终估计值,提高数据流的测量精度。The data query module determines the final estimated value and improves the measurement accuracy of the data flow.

一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时是实现权利要求1至7中任一项所述的方法的步骤。A computer device, comprising a memory and a processor, the memory stores a computer program, wherein when the processor executes the computer program, it realizes the steps of the method according to any one of claims 1 to 7 .

一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至7中任一项所述的方法的步骤。A computer-readable storage medium, on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are realized.

本发明的有益效果:利用Flag标志位来检测映射到同一计数器的数据包是否属于同一个数据流,然后采取不同的计数方案来改善计数器的计数操作,可以提高计算数据流中的数据包数量的准确性,减少CMS的高估,提高数据流的测量精度;且设置的Flag标志位是Bool类型的,只占用一个字节的内存空间,实际环境中可以忽略Flag所需的额外内存。Beneficial effects of the present invention: use Flag flag to detect whether the data packets mapped to the same counter belong to the same data flow, and then adopt different counting schemes to improve the counting operation of the counter, which can improve the efficiency of calculating the number of data packets in the data flow Accuracy, reducing the overestimation of CMS, and improving the measurement accuracy of the data stream; and the set Flag flag is of Bool type, which only occupies one byte of memory space, and the additional memory required by the Flag can be ignored in the actual environment.

附图说明Description of drawings

为了更清楚地说明本发明实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其它的附图。其中:In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following will briefly introduce the accompanying drawings that need to be used in the description of the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For Those of ordinary skill in the art can also obtain other drawings based on these drawings without any creative effort. in:

图1为本发明基于Flag标志位的数据流计数方法的整体结构示意图。FIG. 1 is a schematic diagram of the overall structure of the method for counting data streams based on Flag flag bits in the present invention.

图2为本发明基于Flag标志位的数据流计数方法中插入数据包的流程图。FIG. 2 is a flow chart of inserting data packets in the Flag bit-based data flow counting method of the present invention.

具体实施方式Detailed ways

为使本发明的上述目的、特征和优点能够更加明显易懂,下面结合说明书附图对本发明的具体实施方式做详细的说明。In order to make the above objects, features and advantages of the present invention more obvious and comprehensible, specific implementations of the present invention will be described in detail below in conjunction with the accompanying drawings.

在下面的描述中阐述了很多具体细节以便于充分理解本发明,但是本发明还可以采用其他不同于在此描述的其它方式来实施,本领域技术人员可以在不违背本发明内涵的情况下做类似推广,因此本发明不受下面公开的具体实施例的限制。In the following description, a lot of specific details are set forth in order to fully understand the present invention, but the present invention can also be implemented in other ways different from those described here, and those skilled in the art can do it without departing from the meaning of the present invention. By analogy, the present invention is therefore not limited to the specific examples disclosed below.

其次,此处所称的“一个实施例”或“实施例”是指可包含于本发明至少一个实现方式中的特定特征、结构或特性。在本说明书中不同地方出现的“在一个实施例中”并非均指同一个实施例,也不是单独的或选择性的与其他实施例互相排斥的实施例。Second, "one embodiment" or "an embodiment" referred to herein refers to a specific feature, structure or characteristic that may be included in at least one implementation of the present invention. "In one embodiment" appearing in different places in this specification does not all refer to the same embodiment, nor is it a separate or selective embodiment that is mutually exclusive with other embodiments.

再其次,本发明结合示意图进行详细描述,在详述本发明实施例时,为便于说明,表示器件结构的剖面图会不依一般比例作局部放大,而且所述示意图只是示例,其在此不应限制本发明保护的范围。此外,在实际制作中应包含长度、宽度及深度的三维空间尺寸。Secondly, the present invention is described in detail in conjunction with schematic diagrams. When describing the embodiments of the present invention in detail, for the convenience of explanation, the cross-sectional view showing the structure of the device will not be partially enlarged according to the general scale, and the schematic diagram is only an example, and it should not be used here. Limit the scope of protection of the present invention. In addition, the three-dimensional space dimensions of length, width and depth should be included in actual production.

实施例1Example 1

参照图1和图2,一种基于Flag标志位的数据流计数方法包括:With reference to Fig. 1 and Fig. 2, a kind of data flow counting method based on Flag flag comprises:

如图1所示,该方法包含元素项和CUS这两个部分;其中,元素项由item和flag两个模块组成;item是记录元素的;flag是标志位,flag可以用来判断数据包是否属于同一个数据流;CUS是宽度为w、深度为d的二维数组,用来存储元素的计数信息的;其原理是根据flag标志位来检测映射到同一计数器的数据包是否属于同一个数据流,然后分类进行计数。As shown in Figure 1, this method includes two parts: element item and CUS; wherein, the element item is composed of two modules item and flag; item is for recording elements; flag is a flag bit, and flag can be used to determine whether the data packet is Belong to the same data stream; CUS is a two-dimensional array with a width of w and a depth of d, which is used to store the count information of elements; the principle is to detect whether the data packets mapped to the same counter belong to the same data according to the flag bit stream, and then categorized for counting.

其中,图1中p为数据包。Wherein, p in Fig. 1 is a data packet.

基于Flag标志位的数据流计数方法,CUS结构可以对数据包对应流进行频次计数,结构如图1右半部分所示。Based on the data flow counting method of the Flag flag bit, the CUS structure can count the frequency of the flow corresponding to the data packet. The structure is shown in the right half of Figure 1.

其中,将Flag标志位的数据流计数方法记作CUSF。Wherein, the data flow counting method of the Flag flag is denoted as CUSF.

其中,CUS结构是一个二维数组,宽度为w、深度为d,每个数组包含w个计数器,同时还包含d个哈希函数hj(j=1,2,...,d)。Wherein, the CUS structure is a two-dimensional array with a width of w and a depth of d, and each array contains w counters and d hash functions h j (j=1, 2, . . . , d).

当一个数据包到达时,按照键值对(key,value)的形式进行映射存储,其中键字段可以是数据包头中的属性信息,如五元组(源IP地址、源端口、目的IP地址、目的端口以及传输层协议),值字段可以是流出现的次数或是流的大小。When a data packet arrives, it is mapped and stored in the form of a key-value pair (key, value), where the key field can be attribute information in the data packet header, such as a five-tuple (source IP address, source port, destination IP address, Destination port and transport layer protocol), the value field can be the number of times the flow occurs or the size of the flow.

更新计数时,先对到达的数据包的键字段进行d次哈希计算,找到映射的计数器,然后只对映射的计数器中最小的那个(或那些)进行增加操作,即更新最小的计数器值为当前值加上该数据包的值字段值。When updating the count, first perform d hash calculations on the key field of the arriving data packet, find the mapped counter, and then only increase the smallest one (or those) of the mapped counters, that is, update the smallest counter value to The current value plus the value of the value field of the packet.

查询时,不同数据流生成的键值对是不相同的,根据数据包的键字段进行哈希计算找到这些哈希函数映射的计数器,然后找出这些计数器中最小的值作为该数据包的频次估计值,因为存在哈希冲突,查询结果只可能大于或等于真实值,所以选择最小值作为估计值是最为准确的。When querying, the key-value pairs generated by different data streams are different. Perform hash calculations based on the key fields of the data packets to find the counters mapped by these hash functions, and then find the smallest value of these counters as the frequency of the data packet Estimated value, because of hash conflicts, the query result can only be greater than or equal to the real value, so choosing the minimum value as the estimated value is the most accurate.

调用上述方法,如图2所示,执行步骤如下:Call the above method, as shown in Figure 2, the execution steps are as follows:

S1:设置参数初始值。需要说明的是:S1: Set the initial value of the parameter. It should be noted:

将CUS结构的宽度值和深度值设置为固定值,将CUS结构中所有计数器值初始化为0,Flag标志位初始化为flag=1。Set the width value and depth value of the CUS structure to fixed values, initialize all counter values in the CUS structure to 0, and initialize the Flag bit to flag=1.

S2:当一个数据包到达时,先提出所述数据包中的健值对信息,接着根据CUS算法的Hash函数进行哈希计算,获取所述数据包映射到的计算器位置。需要说明的是:S2: When a data packet arrives, first propose the key-value pair information in the data packet, and then perform hash calculation according to the Hash function of the CUS algorithm to obtain the position of the calculator to which the data packet is mapped. It should be noted:

当一个数据包到达时,首先从数据包中提取出键值对(key,value)信息,然后根据CUS算法的Hash函数进行哈希计算,获取数据包映射到的计数器位置,记为f(si,pi)。When a data packet arrives, the key-value pair (key, value) information is first extracted from the data packet, and then the hash calculation is performed according to the Hash function of the CUS algorithm to obtain the counter position to which the data packet is mapped, which is recorded as f(s i , p i ).

其中,哈希计算表示为hj(key),(j=1,2,…,d),si表示数据包所属的流,pi表示该数据包所映射到的计数器位置。Wherein, the hash calculation is expressed as h j (key), (j=1, 2, ..., d), s i indicates the flow to which the data packet belongs, and p i indicates the counter position to which the data packet is mapped.

S3:当第一个数据包到达时,调用CUS结构的插入函数,更新数据包映射到的计数器值为1。需要说明的是:S3: When the first data packet arrives, call the insert function of the CUS structure, and update the counter value to which the data packet is mapped to 1. It should be noted:

CUS结构的插入函数为当数据包到达时,按照键值对的形式进行映射计数,先调用CUS结构中的哈希函数,对数据包的键字段进行d次哈希计算,找到对应的计数器,然后只对映射的计数器中最小的值进行增加操作,即更新最小的计数器值为当前值加上该数据包的值字段值。The insertion function of the CUS structure is to perform mapping and counting in the form of key-value pairs when the data packet arrives, first call the hash function in the CUS structure, perform d times of hash calculation on the key field of the data packet, and find the corresponding counter, Then only the smallest value in the mapped counter is incremented, that is, the smallest counter value is updated to the current value plus the value field value of the data packet.

S4:从到达的第二个数据包开始,将每一个数据包的计数器位置与前一个数据包的进行比较,判断该数据包与映射到同一计数器的前一个数据包是否属于同一个数据流。需要说明的是:S4: Starting from the second arriving data packet, compare the counter position of each data packet with that of the previous data packet, and judge whether the data packet and the previous data packet mapped to the same counter belong to the same data flow. It should be noted:

若前后两个数据包映射的计数器位置不同,则按照CUS的计数规则正常计数;If the counter positions of the two data packets before and after are mapped to be different, it will be counted normally according to the counting rules of CUS;

若前后两个数据包映射的计数位置相同,即发生了哈希冲突,且属于相同的数据流,则将flag记为1;If the count positions of the two data packet mappings before and after are the same, that is, a hash collision occurs, and they belong to the same data flow, the flag is recorded as 1;

若前后两个数据包映射的计数位置相同,且属于不同相同的数据流,则将flag记为0;If the count positions of the two data packet mappings before and after are the same, and belong to different and identical data streams, the flag will be recorded as 0;

最后根据flag的值在计数器中进行更新;Finally, update the counter according to the value of the flag;

若flag=1,则将数据包计数器值增加1;If flag=1, then increase the packet counter value by 1;

若flag=0,则将数据包计数器值增加0.5。If flag=0, increase the data packet counter value by 0.5.

S5:进行数据流查询,确定映射的所有计数器位置,比较计数器值的大小,选取其中最小的值作为查询结果。需要说明的是:S5: Perform data flow query, determine the positions of all counters mapped, compare the counter values, and select the smallest value among them as the query result. It should be noted:

调用CUS查询函数,首先根据数据包的键字段进行哈希计算找到相对应的哈希函数映射的计数器,然后找出这些计数器中最小的值作为该数据包的频次估计值,最后选取最小值作为估计值。Call the CUS query function, first perform hash calculation according to the key field of the data packet to find the corresponding counter of the hash function mapping, then find out the smallest value among these counters as the estimated frequency of the data packet, and finally select the smallest value as the estimated value.

实施例2Example 2

本实施例为本发明的第四个实施例,该实施例不同于第一个实施例的是,提供了一种基于Flag标志位的数据流计数方法的验证测试,对本方法中采用的技术效果加以验证说明。This embodiment is the fourth embodiment of the present invention. What this embodiment is different from the first embodiment is that it provides a verification test of a data flow counting method based on Flag flag bits, and the technical effect adopted in this method Be verified.

Count-Min Sketch在处理数据流的元素计数问题时,由于哈希表存在哈希冲突,CMS只会高估元素的频率,尤其是对出现次数较少的元素,其准确率会比较低,为改善CMS的准确率而提出的Conservative Update Sketch,采用保守更新的方法提高了数据流计数的准确性,但是其无法执行删除操作,因为在插入元素一段时间后,该元素先前映射的最小桶内的最小者不一定就是它本身,可能有其他元素共享该计数桶,所以如果执行删除操作可能会影响到其他元素,在CMS和CUS的基础上,本发明提出一种基于Flag标志位的数据流计数方法CUSF,添加了一个Bool类型的Flag标志位,采取不同的计数方案来改善计数器的计数操作以提高测量精度。When Count-Min Sketch deals with the element counting problem of the data stream, due to the hash conflicts in the hash table, the CMS will only overestimate the frequency of elements, especially for elements with fewer occurrences, the accuracy rate will be relatively low, as The Conservative Update Sketch proposed to improve the accuracy of CMS uses a conservative update method to improve the accuracy of data flow counting, but it cannot perform deletion operations because after inserting an element for a period of time, the minimum bucket previously mapped by the element The smallest one is not necessarily itself, there may be other elements sharing the counting bucket, so if the delete operation is performed, other elements may be affected. On the basis of CMS and CUS, this invention proposes a data flow counting based on the Flag flag The method CUSF adds a Bool-type Flag flag, and adopts different counting schemes to improve the counting operation of the counter to improve the measurement accuracy.

CUSF与CUS的实验对比:假设某段时间内,存在a、b两个数据流需要更新到计数器中,数据包的到达序列是a、b、a、b、b、a、a、b、a、a,计数器初始化为0,flag=1,CUS的宽度w=10、深度d=2,a、b两个流映射到计数器中的位置至少有一个是相同的。Experimental comparison between CUSF and CUS: Assume that within a certain period of time, there are two data streams a and b that need to be updated to the counter, and the arrival sequence of data packets is a, b, a, b, b, a, a, b, a , a, the counter is initialized to 0, flag=1, the CUS width w=10, depth d=2, at least one of the positions mapped to the counter by the two streams a and b is the same.

此处有两种情况:第一种情况,a、b两个流映射到计数器中的位置都相同,即h1(a)=h1(b)、h2(a)=h2(b),。There are two cases here: in the first case, the positions of the two streams a and b mapped to the counter are the same, that is, h 1 (a)=h 1 (b), h 2 (a)=h 2 (b ),.

第二种情况,a、b两个流映射到计数器中的位置只有一个相同,即h1(a)=h1(b)、h2(a)≠h2(b)或者h1(a)≠h1(b)、h2(a)=h2(b)。In the second case, only one position of the two streams a and b mapped to the counter is the same, that is, h 1 (a)=h 1 (b), h 2 (a)≠h 2 (b) or h 1 (a )≠h 1 (b), h 2 (a)=h 2 (b).

在此以第一种情况为例,假设h1(a)=h1(b)=2、h2(a)=h2(b)=5,CUS、CUSF的计数结果如表1、表2所示。Taking the first case as an example here, assuming that h1(a)=h1(b)=2, h2(a)=h2(b)=5, the counting results of CUS and CUSF are shown in Table 1 and Table 2.

Figure SMS_1
Figure SMS_1

表1:CUS上数据流的计数结果Table 1: Counting results of data flows on CUS

Figure SMS_2
Figure SMS_2

表2:CUSF上数据流的计数结果Table 2: Counting results of data flows on CUSF

在CUS算法中查询数据流a、b的估计值,其查询结果为10个包,而实际a流的数据包数只有6个包,b流只有4个数据包,其对数据流a、b的查询准确率分别为60%、40%,在CUSF算法中的查询结果为7个包,数据流a的查询准确率达到了86%,数据流b的准确率为57%,与CUS算法相比,CUSF算法的查询精度提高了22%左右。由此可以看出,CUSF算法的准确率更高,更适用于高速网络环境下的流量测量。In the CUS algorithm, the estimated value of data stream a and b is queried, and the query result is 10 packets, but the actual number of data packets of flow a is only 6 packets, and that of flow b is only 4 packets. The query accuracy rates are 60% and 40%, respectively. The query results in the CUSF algorithm are 7 packets. The query accuracy rate of data stream a reaches 86%, and the accuracy rate of data stream b is 57%. Compared with the CUS algorithm Compared with that, the query accuracy of CUSF algorithm is increased by about 22%. It can be seen that the accuracy of the CUSF algorithm is higher, and it is more suitable for flow measurement in a high-speed network environment.

综上所述,本发明提出了一种基于Flag标志位的数据流计数方法,通过设置一个Bool类型的Flag标志位(设置的Flag标志位是Bool类型的,只占用一个字节的内存空间,实际环境中可以忽略Flag所需的额外内存),用于检测映射到同一计数器的数据包是否与先前的数据包属于同一个数据流,如果前后两个数据包映射的计数器位置相同(即发生了哈希冲突),并且属于相同的数据流,则将flag记为1,若是属于不同的数据流,则将flag记为0,然后根据flag的值在计数器中进行更新,可以更准确地计算数据流中的数据包数量,减少CMS的高估,提高数据流的测量精度。To sum up, the present invention proposes a kind of data flow counting method based on the Flag flag bit, by setting a Flag flag bit of Bool type (the Flag flag bit of setting is Bool type, only takes the memory space of one byte, In the actual environment, the additional memory required by Flag can be ignored), which is used to detect whether the data packet mapped to the same counter belongs to the same data flow as the previous data packet, and if the counter positions of the two data packets before and after the mapping are the same (that is, a Hash conflict), and belong to the same data stream, the flag is recorded as 1, if it belongs to different data streams, the flag is recorded as 0, and then updated in the counter according to the value of the flag, the data can be calculated more accurately The number of packets in the flow, reducing the overestimation of CMS and improving the measurement accuracy of data flow.

应说明的是,以上实施例仅用以说明本发明的技术方案而非限制,尽管参照较佳实施例对本发明进行了详细说明,本领域的普通技术人员应当理解,可以对本发明的技术方案进行修改或者等同替换,而不脱离本发明技术方案的精神和范围,其均应涵盖在本发明的权利要求范围当中。It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention without limitation, although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention can be carried out Modifications or equivalent replacements without departing from the spirit and scope of the technical solution of the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A data flow counting method based on Flag bit is characterized in that: comprising the steps of (a) a step of,
setting a parameter initial value;
when a data packet arrives, firstly providing key value pair information in the data packet, then carrying out Hash calculation according to a Hash function of a CUS algorithm, and obtaining a calculator position mapped to the data packet;
when the first data packet arrives, calling an insertion function of the CUS structure, and updating a counter value mapped by the data packet to be 1;
comparing the counter position of each data packet with the previous data packet from the second data packet, and judging whether the data packet and the previous data packet mapped to the same counter belong to the same data stream or not;
and carrying out data flow inquiry, determining all mapped counter positions, comparing the sizes of counter values, and selecting the minimum value as an inquiry result.
2. The Flag-based data stream counting method according to claim 1, wherein: the initial values of the set parameters include,
setting the width value and the depth value of a CUS structure to fixed values, initializing all counter values in the CUS structure to 0, and initializing flag bit to flag=1.
3. The Flag-based data stream counting method according to claim 2, wherein: the CUS structure is a two-dimensional matrix array with width w and depth d and is used for storing the counting information of elements, and each array comprises w counters and d hash functions h j (j=1,2,...,d)。
4. The Flag-based data stream counting method according to claim 1, wherein: the hash computation is denoted as h j (key), (j=1, 2, …, d) and obtaining the counter position mapped by the data packet after the hash calculation, and marking as f(s) i ,p i );
Wherein s is i Representing the flow to which the packet belongs, p i Indicating the counter location to which the packet is mapped.
5. The Flag-based data stream counting method according to claim 1, wherein: the insertion function of the CUS structure includes,
when the data packet arrives, mapping and counting are carried out according to the key value pair mode, firstly, a hash function in the CUS structure is called, d times of hash calculation are carried out on the key field of the data packet, a corresponding counter is found, then only the minimum value in the mapped counter is subjected to increasing operation, namely, the minimum counter value is updated to be the current value plus the value field value of the data packet.
6. The Flag-based data stream counting method according to claim 1, wherein: from the second packet that arrives, the counter position of each packet is compared with the previous packet to determine whether the packet belongs to the same data stream as the previous packet mapped to the same counter,
if the positions of the counters mapped by the front data packet and the rear data packet are different, normally counting according to the counting rule of the CUS;
if the counting positions of the front data packet map and the rear data packet map are the same, namely hash collision occurs and the data packets belong to the same data stream, marking the flag as 1;
if the counting positions of the front and back data packet mappings are the same and belong to different identical data streams, marking the flag as 0;
finally, updating in a counter according to the value of the flag;
if flag=1, the packet counter value is incremented by 1;
if flag=0, the packet counter value is incremented by 0.5.
7. The Flag-based data stream counting method according to claim 6, wherein: making the data stream query includes,
calling a CUS query function, firstly carrying out hash calculation according to key fields of a data packet to find out corresponding hash function mapped counters, then finding out the minimum value in the counters to serve as a frequency estimated value of the data packet, and finally selecting the minimum value to serve as an estimated value.
8. A Flag bit based data stream counting system, the system comprising:
the data collection module initializes related parameters and facilitates subsequent calculation;
the data judging module is used for detecting whether the data packets mapped to the same counter belong to the same data stream or not by using the flag bit, and then adopting different counting schemes to improve the counting operation of the counter;
and the data query module determines a final estimated value and improves the measurement accuracy of the data stream.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, is the step of implementing the method of any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202211217920.XA 2022-09-30 2022-09-30 A data stream counting method, device, and storage medium based on Flag bits. Active CN116303585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211217920.XA CN116303585B (en) 2022-09-30 2022-09-30 A data stream counting method, device, and storage medium based on Flag bits.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211217920.XA CN116303585B (en) 2022-09-30 2022-09-30 A data stream counting method, device, and storage medium based on Flag bits.

Publications (2)

Publication Number Publication Date
CN116303585A true CN116303585A (en) 2023-06-23
CN116303585B CN116303585B (en) 2026-01-02

Family

ID=86822743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211217920.XA Active CN116303585B (en) 2022-09-30 2022-09-30 A data stream counting method, device, and storage medium based on Flag bits.

Country Status (1)

Country Link
CN (1) CN116303585B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881338A (en) * 2023-09-07 2023-10-13 北京傲星科技有限公司 Data mining method and related equipment for data stream based on large model
CN119603203A (en) * 2025-02-10 2025-03-11 苏州大学 A method, system, device and medium for estimating flow frequency
CN119996264A (en) * 2025-02-17 2025-05-13 北京大学 Network traffic monitoring method, device, electronic device and storage medium
CN120144624A (en) * 2025-05-14 2025-06-13 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) A method and system for efficiently detecting persistent items in a small memory environment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3195810A (en) * 1957-12-26 1965-07-20 Burroughs Corp Accounting apparatus
US20110069632A1 (en) * 2009-09-21 2011-03-24 Alcatel-Lucent Usa Inc. Tracking network-data flows
CN108304409A (en) * 2017-01-13 2018-07-20 北京大学 A kind of data Frequency estimation method of the Sketch data structures based on carry
CN111262756A (en) * 2020-01-20 2020-06-09 长沙理工大学 An accurate measurement method and architecture of high-speed network elephant flow
CN113572705A (en) * 2021-07-23 2021-10-29 广东唯审信息科技有限公司 High-speed network active queue scheduling method based on controllable time delay

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3195810A (en) * 1957-12-26 1965-07-20 Burroughs Corp Accounting apparatus
US20110069632A1 (en) * 2009-09-21 2011-03-24 Alcatel-Lucent Usa Inc. Tracking network-data flows
CN108304409A (en) * 2017-01-13 2018-07-20 北京大学 A kind of data Frequency estimation method of the Sketch data structures based on carry
CN111262756A (en) * 2020-01-20 2020-06-09 长沙理工大学 An accurate measurement method and architecture of high-speed network elephant flow
CN113572705A (en) * 2021-07-23 2021-10-29 广东唯审信息科技有限公司 High-speed network active queue scheduling method based on controllable time delay

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张震;汪斌强;朱珂;: "流量测量的关键技术分析与研究", 计算机应用研究, no. 09, 15 September 2009 (2009-09-15) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881338A (en) * 2023-09-07 2023-10-13 北京傲星科技有限公司 Data mining method and related equipment for data stream based on large model
CN116881338B (en) * 2023-09-07 2024-01-26 北京傲星科技有限公司 Data mining method and related equipment for data stream based on large model
CN119603203A (en) * 2025-02-10 2025-03-11 苏州大学 A method, system, device and medium for estimating flow frequency
CN119603203B (en) * 2025-02-10 2025-06-06 苏州大学 A method, system, device and medium for estimating flow frequency
CN119996264A (en) * 2025-02-17 2025-05-13 北京大学 Network traffic monitoring method, device, electronic device and storage medium
CN120144624A (en) * 2025-05-14 2025-06-13 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) A method and system for efficiently detecting persistent items in a small memory environment
CN120144624B (en) * 2025-05-14 2025-08-19 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method and system for efficiently detecting persistent items in small memory environment

Also Published As

Publication number Publication date
CN116303585B (en) 2026-01-02

Similar Documents

Publication Publication Date Title
CN116303585A (en) A kind of data flow counting method, equipment and storage medium based on Flag flag position
WO2019024623A1 (en) Flow measurement method, device and system
US11218395B2 (en) Latency monitoring for network devices
CN105989061B (en) Multidimensional data repeats detection fast indexing method under a kind of sliding window
CN104424256B (en) Bloom filter generation method and device
CN110535825B (en) A data identification method for characteristic network flow
CN101309216A (en) A method and device for classifying IP packets
CN108304409A (en) A kind of data Frequency estimation method of the Sketch data structures based on carry
CN102420831A (en) Multi-domain network packet classification method
CN108011823A (en) Multipolarity method and device, multilevel flow table lookup method and the device of multiple domain flow table
CN111817978A (en) A kind of flow classification method and device
CN108268603A (en) A kind of community discovery method based on core member's identification
Hua et al. A multi-attribute data structure with parallel bloom filters for network services
CN117201368A (en) Network flow base number estimation method and system based on pre-sampling
Liu et al. SEAD counter: Self-adaptive counters with different counting ranges
CN108460030A (en) A kind of set element judgment method based on improved Bloom filter
CN118467546A (en) Data stream processing method, device, computer equipment and storage medium
CN111131197B (en) Filtering strategy management system and method thereof
CN111200542B (en) A network traffic management method and system based on deterministic replacement strategy
CN107590160B (en) Method and device for monitoring internal structure of radix tree to realize test
CN113297430B (en) High-performance arbitrary partial key measurement method and system based on Sketch
CN104794158B (en) Domain name data repeats detection fast indexing method under a kind of boundary mark window
US12381822B2 (en) Data flow table, method and device for processing data flow table, and storage medium
CN117827851B (en) Data processing structure for measuring flow base number and application thereof
CN113595816B (en) A data flow calculation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant