CN100461091C

CN100461091C - Method and system for content detection with reconfigurable hardware

Info

Publication number: CN100461091C
Application number: CNB2005800330496A
Authority: CN
Inventors: 巴拉斯·马德胡苏丹; 约翰·W·洛克伍德
Original assignee: University of Washington
Current assignee: University of Washington
Priority date: 2004-08-24
Filing date: 2005-08-24
Publication date: 2009-02-11
Anticipated expiration: 2025-08-24
Also published as: CN101031876A

Abstract

The method and system according to the present invention are used to identify duplicate content in a data stream. A hash function is calculated for at least one of a plurality of portions of the data stream. Benign characters are removed from the at least one portion of the data stream to prevent benign strings from being identified as duplicate content. At least one of a plurality of counters is incremented in response to the calculated hash function result. Each counter corresponds to a corresponding calculated hash function result. The duplicate content is identified when at least one of the plurality of counters exceeds a count value. The identified duplicate content is verified to be not a benign string.

Description

Method and system for content detection with reconfigurable hardware

相关申请的交叉引用Cross References to Related Applications

本申请要求了于2004年8月24日提交的美国临时申请序号60/604,372题目为“METHODS AND SYSTEMS FOR CONTENTDETECTION IN A RECONFIGURABLE HARDWARE”的申请日期和优先权，在此将其内容按照法律所允许的程度加以引用以供参考。This application claims the filing date and priority of U.S. Provisional Application Serial No. 60/604,372, filed August 24, 2004, entitled "METHODS AND SYSTEMS FOR CONTENT DETECTION IN A RECONFIGURABLE HARDWARE," the contents of which are hereby incorporated as permitted by law Levels are cited for reference.

技术领域 technical field

本发明总体上涉及网络通信领域，并且尤其涉及用于检测在经由网络所转送的数据中内容的方法和系统。The present invention relates generally to the field of network communications, and more particularly to methods and systems for detecting content in data transferred over a network.

背景技术 Background technique

因特网蠕虫(worm)通过利用在操作系统以及在系统上所运行的其它软件中的漏洞来工作。攻击会损害安全性并使网络性能降级。它们的影响包括由于系统停机时间和工人无法工作所导致的商业上的巨大经济损失。预计在未来用于使网络免受恶意代码危害的系统会成为关键因特网基础设施的一部分。当前，被称作入侵检测和阻止系统(Intrusion Detection and Prevention Systems IDPS)的这些系统通常只过滤先前标识的蠕虫，所以这些系统的作用很有限。Internet worms work by exploiting vulnerabilities in the operating system and other software running on the system. Attacks can compromise security and degrade network performance. Their impact includes significant financial losses for businesses due to system downtime and workers not being able to work. It is expected that in the future the systems used to protect networks from malicious code will become part of the critical Internet infrastructure. Currently, these systems, known as Intrusion Detection and Prevention Systems (IDPS), typically only filter previously identified worms, so they are of limited use.

发明内容 Contents of the invention

根据本发明的方法和系统检测网络通信业务流量中频繁出现的内容，诸如蠕虫特征码(worm signature)。用硬件来实现内容检测，这与常规的基于软件的方法相比较提供了更高的吞吐量。扫描在网络中经由数据流所发送的数据来标识类似内容的模式(pattern)。频繁出现的数据模式被标识并报告为可能是蠕虫特征码或其它类型的特征码。可以并行扫描数据以提供高吞吐量。通过把几个数据字节窗(window ofbytes)并行散列到芯片内(on-chip)块存储器来维持吞吐量，其中可以并行更新每个芯片内块存储器。可以把所标识的内容与在芯片外(off-chip)存储器中所存储的已知特征码相比较来确定是否存在假肯定(false positive)。由于按照本发明的方法和系统可标识频繁出现的模式，所以它们不局限于标识已知的特征码。The method and system according to the present invention detect frequently occurring content in network communication traffic, such as worm signatures. Content detection is implemented in hardware, which provides higher throughput compared to conventional software-based approaches. Data sent via data streams in the network is scanned to identify patterns of similar content. Frequently occurring data patterns are identified and reported as likely worm signatures or other types of signatures. Data can be scanned in parallel to provide high throughput. Throughput is maintained by parallel hashing of several windows of bytes of data to on-chip block memories, where each on-chip block memory can be updated in parallel. The identified content may be compared to known signatures stored in off-chip memory to determine if there are false positives. Because the methods and systems according to the present invention can identify frequently occurring patterns, they are not limited to identifying known signatures.

依照根据本发明的方法，提供了一种在数据处理系统中用于标识数据流中的重复内容的方法。所述方法包括步骤：为所述数据流的多个部分中的至少一个部分计算散列函数；响应于所计算的散列函数结果来使多个计数器中的至少一个计数器的值增加，每个计数器对应于相应计算的散列函数结果；当所述多个计数器中的至少一个计数器的值超过预定阈值时标识所述重复内容；并且验证所标识的重复内容并非是良性串(benign string)。According to the method according to the invention, there is provided a method in a data processing system for identifying duplicate content in a data stream. The method includes the steps of: computing a hash function for at least one of a plurality of portions of the data stream; and incrementing the value of at least one of a plurality of counters in response to the computed hash function result, each The counters correspond to corresponding calculated hash function results; identifying the duplicate content when a value of at least one of the plurality of counters exceeds a predetermined threshold; and verifying that the identified duplicate content is not a benign string.

依照根据本发明的系统，提供了一种用于标识数据流中的重复内容的系统。所述系统包括：散列函数计算电路，用于为所述数据流的多个部分中的至少一个部分计算散列函数；多个计数器，响应于所计算的散列函数结果来使多个计数器中的至少一个计数器的值增加，每个计数器对应于相应计算的散列函数结果；重复内容标识器，用于当所述多个计数器中的至少一个计数器的值超过预定计数值时标识所述重复内容；和验证器，用于验证所标识的重复内容并非是良性串。According to the system according to the invention, a system for identifying duplicate content in a data stream is provided. The system includes: a hash function calculation circuit for calculating a hash function for at least one of the plurality of portions of the data stream; a plurality of counters responsive to the calculated hash function results to cause the plurality of counters to The value of at least one counter in the plurality of counters is increased, each counter corresponding to the hash function result of the corresponding calculation; the duplicate content identifier is used to identify when the value of at least one counter in the plurality of counters exceeds a predetermined count value. duplicate content; and a validator to verify that the identified duplicate content is not a benign string.

依照根据本发明的系统，提供了一种用于标识数据流中的重复内容的系统。所述系统包括：用于为所述数据流的多个部分中的至少一个部分计算散列函数的装置；用于响应于所计算的散列函数结果来使多个计数器中的至少一个计数器的值增加的装置，每个计数器对应于相应计算的散列函数结果；用于当所述多个计数器中的至少一个计数器的值超过预定计数值时标识所述重复内容的装置；和用于验证所标识的重复内容并非是良性串的装置。According to the system according to the invention, a system for identifying duplicate content in a data stream is provided. The system includes: means for calculating a hash function for at least one of the plurality of portions of the data stream; and for causing at least one of the plurality of counters to be activated in response to the calculated result of the hash function. means for incrementing a value, each counter corresponding to a corresponding calculated hash function result; means for identifying said duplicate content when the value of at least one of said plurality of counters exceeds a predetermined count value; and means for verifying The identified duplicate content is not a benign string device.

本领域技术人员当考察以下附图和具体实施方式时，本发明的其它特征变得更加清楚。意在把所有这种附加系统、方法、特征和优点包括在本说明书中、本发明的范围内，并且通过权利要求书来保护。Other features of the present invention will become more apparent to those skilled in the art when examining the following drawings and detailed descriptions. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.

附图说明 Description of drawings

并入并构成此说明书一部分的附图图示了本发明的实现方式，并且连同说明书一起用来解释本发明的优点和原理：在附图中，The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the implementation of the invention and together with the description serve to explain the advantages and principles of the invention: In the accompanying drawings,

图1A是按照本发明用于执行内容检测的系统的框图；Figure 1A is a block diagram of a system for performing content detection in accordance with the present invention;

图1B是按照本发明示出特征码检测设备怎样处理数据流的功能框图；Fig. 1B is a functional block diagram showing how a signature detection device processes a data stream according to the present invention;

图2是按照本发明的特征码检测设备的框图；Fig. 2 is a block diagram according to the feature code detecting device of the present invention;

图3是按照本发明的计数处理器的框图；Figure 3 is a block diagram of a counting processor according to the present invention;

图4是按照本发明的字符过滤器的框图；Figure 4 is a block diagram of a character filter according to the present invention;

图5是按照本发明的字节移位器的框图；Figure 5 is a block diagram of a byte shifter according to the present invention;

图6是按照本发明包含良性字符串的控制分组的框图；Figure 6 is a block diagram of a control packet containing benign character strings in accordance with the present invention;

图7是按照本发明的大计数向量的框图；Figure 7 is a block diagram of a large count vector according to the present invention;

图8是更详细的图7的大计数向量的框图；Figure 8 is a more detailed block diagram of the large count vector of Figure 7;

图9是按照本发明的流水线的框图；Figure 9 is a block diagram of a pipeline according to the present invention;

图10是用于描绘并行处理数据流的字节的功能框图；Figure 10 is a functional block diagram for depicting parallel processing of bytes of a data stream;

图11示出了优先级编码器怎样在没有冲突的情况下处理数据的例子；Figure 11 shows an example of how a priority encoder handles data without conflicts;

图12示出了优先级编码器怎样在具有冲突的情况下处理数据的例子；Figure 12 shows an example of how a priority encoder handles data in the event of a conflict;

图13是按照本发明的分析器的框图；Figure 13 is a block diagram of an analyzer according to the present invention;

图14是按照本发明的分析器状态的状态图；和Figure 14 is a state diagram of analyzer states according to the present invention; and

图15是从按照本发明的报警产生器所发出的控制分组的框图。Fig. 15 is a block diagram of control packets sent from the alarm generator according to the present invention.

遍及几个附图，相应的附图特征码指代相应的部分。Corresponding figure identifiers refer to corresponding parts throughout the several figures.

具体实施方式 Detailed ways

现在按照如附图中所图示的本发明来详细地参考依照方法、系统和制造产品的实现方式。Reference will now be made in detail to implementations in accordance with methods, systems and articles of manufacture in accordance with the invention as illustrated in the accompanying drawings.

按照本发明的方法和系统检测数据流中频繁出现的内容，诸如蠕虫特征码，同时抵抗多态技术，诸如由蠕虫作者所使用的那些技术。为了以高速实现内容检测，用硬件来实现系统。Methods and systems according to the present invention detect frequently occurring content in data streams, such as worm signatures, while resisting polymorphic techniques, such as those used by worm authors. In order to realize content detection at high speed, the system is realized with hardware.

图1A是适于与按照本发明的方法和系统一起使用的说明性数据处理系统100的框图。如同所示，多个主机被连接到多个子网络。即，主机102、104和106被连接到子网络108；主机110和112被连接到子网络114；并且主机116和118被连接到子网络120。在各自的子网络之间以及在所述子网络和更大网络128(诸如因特网)之间的通信业务流量经过路由器126。虚拟局域网(VLAN)集中器122集中进入路由器126的网络通信业务流量。通过把特征码检测设备124放置在路由器和VLAN集中器122之间，可以为内容扫描在子网络之间的通信业务流量。FIG. 1A is a block diagram of an illustrative data processing system 100 suitable for use with methods and systems in accordance with the present invention. As shown, multiple hosts are connected to multiple subnets. That is, hosts 102 , 104 , and 106 are connected to subnetwork 108 ; hosts 110 and 112 are connected to subnetwork 114 ; and hosts 116 and 118 are connected to subnetwork 120 . Communication traffic between respective subnetworks and between the subnetworks and a larger network 128 , such as the Internet, passes through router 126 . Virtual local area network (VLAN) concentrator 122 concentrates network communication traffic entering router 126 . By placing the signature detection device 124 between the router and the VLAN concentrator 122, the traffic flow between the sub-networks can be scanned for content.

在图1A的说明性例子中，特征码检测设备124是现场可编程的端口扩展器(field-programmable port extender，FPX)平台。FPX平台允许通过使用大现场可编程门阵列(field programmable gate array，FPGA)130(诸如Xilinx XCV2000E FPGA)来处理高速网络流。下述特征码检测电路可以被下载到FPGA 130中以便按高达2.5千兆位每秒的通信业务量速率来处理网络流。使用32位宽数据字来把网络通信业务计时(clock)到FPGA 130中。本领域技术人员应当理解，可以使用不同于这里所描述的硬件和软件组件来实现按照本发明的方法和系统。例如，可以采用不同于FPX平台的其他设备来实现特征码检测设备。In the illustrative example of FIG. 1A , signature detection device 124 is a field-programmable port extender (FPX) platform. The FPX platform allows high-speed network flows to be processed by using a large field programmable gate array (FPGA) 130, such as a Xilinx XCV2000E FPGA. The signature detection circuitry described below can be downloaded into FPGA 130 to process network flows at communication traffic rates up to 2.5 Gigabits per second. Network communication traffic is clocked into FPGA 130 using 32-bit wide data words. Those skilled in the art will understand that the methods and systems according to the present invention may be implemented using hardware and software components other than those described herein. For example, other devices other than the FPX platform may be used to implement the signature detection device.

在这里所描述的说明性例子中，参考检测蠕虫特征码来说明，但按照本发明的方法和系统并不局限于此。按照本发明的方法和系统可标识数据流中的重复内容。重复内容可以是但不局限于：蠕虫；病毒；大量访问网址人群的出现；存在被发送到多个收件人的大量类似的电子邮件，诸如垃圾邮件；经由对等网络所重复交换的内容，诸如音乐或视频；及其它模式的重复内容。In the illustrative examples described herein, reference is made to detecting worm signatures, although methods and systems according to the present invention are not limited thereto. Methods and systems according to the present invention can identify duplicate content in a data stream. Duplicate content can be, but is not limited to: worms; viruses; the presence of large numbers of people visiting a website; the existence of a large number of similar emails being sent to multiple recipients, such as spam; content repeatedly exchanged via peer-to-peer networks, Duplicate content such as music or video; and other formats.

图1B是按照本发明示出特征码检测设备124怎样处理数据流的功能框图。在说明性例子中，现场可编程门阵列130包括字符过滤器150、散列处理器152、计数向量(count vector)154、时间平均处理器156、阈值分析器158、芯片外存储器分析器160和报警产生器162的功能组件。这些功能组件提供了现场可编程门阵列130的说明性的高级功能视图。下面参考图3-15更详细地描述了现场可编程门阵列130及其功能。FIG. 1B is a functional block diagram illustrating how signature detection device 124 processes a data stream in accordance with the present invention. In the illustrative example, FPGA 130 includes character filter 150, hash processor 152, count vector 154, time averaging processor 156, threshold analyzer 158, off-chip memory analyzer 160, and A functional component of the alarm generator 162 . These functional components provide an illustrative high-level functional view of field programmable gate array 130 . The field programmable gate array 130 and its function are described in more detail below with reference to FIGS. 3-15.

如在说明性例子中所示，字符过滤器150从数据流170中采样数据，并且滤出不太可能是二进制数据一部分的字符，从而提供N字节的数据串172。如下面将更详细地描述，蠕虫典型情况下由二进制数据构成。这样，字符过滤器150滤出不太可能表现蠕虫特征码特性的一些字符。散列处理器152对N字节串172计算k位散列，并且把所产生的特征码散列到计数向量154。如下面将更详细地描述，计数向量154可以包括多个计数向量。当特征码散列到计数向量154时，由散列所指定的计数器的值被增加。按周期性间隔，这里被称作测量间隔，每个计数向量中的计数值被减少等于或大于由于正常通信业务流量所到达平均数的量，如按照时间平均处理器156所确定的。当计数向量154到达预定阈值时，如阈值分析器158所确定的，芯片外存储器分析器160把有害串(offending string)散列到芯片外存储器212中的表。下一次出现相同的串时，在芯片外存储器212中的相同地点进行散列以便比较这两个串。如果两个串是相同的，那么产生报警。如果两个串是不同的，那么利用新的串来重写芯片外存储器212中的串。因此，芯片外存储器分析器160可以通过减少由于半频繁出现的串所导致的报警来减少报警数目。当接收到报警消息时，报警产生器162向外部机器发送包括有害特征码(offending signature)的控制分组以用于进一步分析。As shown in the illustrative example, character filter 150 samples data from data stream 170 and filters out characters that are unlikely to be part of the binary data, thereby providing N-byte data string 172 . As will be described in more detail below, worms typically consist of binary data. In this way, character filter 150 filters out characters that are less likely to exhibit characteristics of a worm signature. Hash processor 152 computes a k-bit hash on N-byte string 172 and hashes the resulting signature to count vector 154 . As will be described in more detail below, count vector 154 may include multiple count vectors. When the signature is hashed to count vector 154, the value of the counter specified by the hash is incremented. At periodic intervals, referred to herein as measurement intervals, the count value in each count vector is reduced by an amount equal to or greater than the average reached due to normal traffic flow, as determined by time averaging processor 156 . When count vector 154 reaches a predetermined threshold, as determined by threshold analyzer 158 , off-chip memory analyzer 160 hashes the offending string to a table in off-chip memory 212 . The next time the same string occurs, a hash is done at the same location in off-chip memory 212 to compare the two strings. If the two strings are the same, an alarm is generated. If the two strings are different, then the string in off-chip memory 212 is overwritten with the new string. Thus, off-chip memory analyzer 160 can reduce the number of alerts by reducing alerts due to semi-frequently occurring strings. When an alarm message is received, the alarm generator 162 sends a control packet including an offending signature to an external machine for further analysis.

图2是用于更详细地示出特征码检测设备124的框图。在说明性的例子中，用于在网络上检测信号的电路在现场可编程门阵列130中被实现为被称作worm_app的应用202。worm_app202适于在分层协议包装(wrapper)204的架构内。如下面将更详细地描述，计数处理器206从分层协议包装204接收包装信号，把所述包装信号分析转换为字节流，把所述字节流散列到计数向量，并且使计数器的值增加。计数处理器206进一步对所检测的蠕虫特征码数目进行计数平均并且处理良性串。计数处理器206输出信号count_match，对于超过阈值以及相应的10字节长的蠕虫offending_signature的特征码来说该信号被断言为高。另外，计数处理器206可以向分层协议包装204输出信号。FIG. 2 is a block diagram illustrating the signature detection device 124 in more detail. In the illustrative example, circuitry for detecting signals on the network is implemented in field programmable gate array 130 as application 202 called worm_app. The worm_app 202 fits within the framework of a layered protocol wrapper 204 . As will be described in more detail below, the count processor 206 receives a wrapped signal from the layered protocol wrapper 204, parses the wrapped signal into a byte stream, hashes the byte stream into a count vector, and makes the value of the counter Increase. The count processor 206 further counts and averages the number of detected worm signatures and processes benign strings. The count processor 206 outputs the signal count_match which is asserted high for signatures exceeding the threshold and corresponding 10 byte long worm offending_signature. Additionally, count processor 206 may output signals to layered protocol wrapper 204 .

实现worm_app电路，使得它提供高吞吐量和低延迟。为了达到这一点性能，worm_app电路可以具有流水线。在说明性例子中，流水线长度是27个时钟周期并且可以如下分解：Implement the worm_app circuit such that it provides high throughput and low latency. To achieve this performance, the worm_app circuit can be pipelined. In the illustrative example, the pipeline length is 27 clock cycles and can be broken down as follows:

-FIFO延迟：3个时钟周期-FIFO latency: 3 clock cycles

-计数处理器延迟：11个时钟周期- Count processor latency: 11 clock cycles

-分析器延迟：13个时钟周期- Analyzer latency: 13 clock cycles

分析器208接收来自计数处理器206的输入信号并且与在诸如静态随机存取存储器(static random access memory SRAM)之类的芯片外存储器212中所存储的散列表210有接口。如果count_match被断言为高，那么由分析器208来访问芯片外存储器212。如果在芯片外存储器212的散列表210中标识了offending_signature，那么分析器208输出被断言为高的信号analyzer_match。报警产生器214接收来自分析器208的analyzer_match信号并且把它从计数处理器206所接收的包装信号传递到分层协议包装204。当analyzer_match信号被断言为高时，报警产生器214发出包含有offending_signature的控制分组。Analyzer 208 receives input signals from counting processor 206 and interfaces with hash table 210 stored in off-chip memory 212, such as static random access memory (SRAM). If count_match is asserted high, off-chip memory 212 is accessed by analyzer 208 . If the offending_signature is identified in the hash table 210 of the off-chip memory 212, the analyzer 208 outputs a signal analyzer_match that is asserted high. Alarm generator 214 receives the analyzer_match signal from analyzer 208 and passes the wrapper signal it receives from count processor 206 to layered protocol wrapper 204 . When the analyzer_match signal is asserted high, the alarm generator 214 sends out a control packet containing the offending_signature.

在图3中示出了说明性计数处理器206的组件级视图。计数处理器206包括分组缓冲器302。如下面将描述，当块RAM被占用并且所述块RAM内的计数器无法被增加时，分组缓冲器302在计数平均周期期间缓存分组。在计数平均周期以外，分组缓冲器302经由通信业务传递。字符过滤器304判定哪些字节要包括在蠕虫特征码中。字节移位器306根据字符过滤器304的输出来汇编可以计数的输入串。大计数向量308散列从字节移位器306所接收的字符串，使相应的计数器的值增加并且按照需要产生报警。下面将更详细地描述计数处理器的每个功能组件。A component-level view of an illustrative count processor 206 is shown in FIG. 3 . Count processor 206 includes packet buffer 302 . As will be described below, the packet buffer 302 buffers packets during count averaging periods when the block RAM is occupied and the counters within the block RAM cannot be incremented. Outside of the count averaging period, the packet buffer 302 is delivered via traffic. Character filter 304 determines which bytes are to be included in the worm signature. Byte shifter 306 assembles a countable input string from the output of character filter 304 . The large count vector 308 hashes the string received from the byte shifter 306, increments the corresponding counter value and generates an alarm as needed. Each functional component of the counting processor is described in more detail below.

在图4的框图中更详细地示出了字符过滤器304。字符过滤器304可使得所选择的字符从散列计算中被排除。由于蠕虫典型情况下由二进制数据构成，所以特征码检测设备可以忽略数据流中的一些字符，所述字符非常不可能是二进制数据的一部分。这些字符例如包括数据流中的空白、换行符、换行和空格。文本文档例如包含用于填充的大量空格和空行。要避免这些字符的另一原因在于空行或空格的字符串不必表现可以用来标识蠕虫的良好特征码的特性。优选使用不会出现在文档中的串。按照本发明的方法和系统不局限于用于避免有害特征码的启发式方式。可以实现的其它方式包括但不局限于：标识并忽略电子邮件消息中的文本，预处理整个串，或者流编辑以搜索规则的表达式并用串来代替它们。Character filter 304 is shown in more detail in the block diagram of FIG. 4 . Character filter 304 may cause selected characters to be excluded from hash calculations. Since worms typically consist of binary data, the signature detection device can ignore some characters in the data stream that are very unlikely to be part of the binary data. These characters include, for example, blanks, newlines, newlines, and spaces in the data stream. Text documents for example contain a lot of spaces and empty lines for padding. Another reason to avoid these characters is that blank lines or strings of spaces do not have to exhibit the properties of a good signature to identify a worm. It is preferable to use strings that do not appear in the document. Methods and systems according to the present invention are not limited to heuristics for avoiding harmful signatures. Other ways this can be done include, but are not limited to: identifying and ignoring text in email messages, preprocessing entire strings, or stream editing to search for regular expressions and replace them with strings.

字符过滤器304接收32位数据字data_in以及信号data_en来作为输入，所述信号data_en用于标识data_in中的数据是否有效。字符过滤器304把32位字拆分为4个单个字节(byte1到byte4)并且输出相应的信号以便表明所述字节是否包含有效数据(byte1valid到byte4valid)。如果它是字符过滤器304正寻找的字符之一，那么它被认为是无效的。如果例如4字节字符串a、newline、b、null被字符过滤器304作为输入来接收，并且假定字符过滤器304被配置为忽略换行符和空字符，那么字符过滤器304的相应输出信号可能会是：The character filter 304 receives a 32-bit data word data_in and a signal data_en as input, and the signal data_en is used to identify whether the data in data_in is valid. Character filter 304 splits the 32-bit word into 4 individual bytes (byte1 to byte4) and outputs a corresponding signal to indicate whether the byte contains valid data (byte1valid to byte4valid). If it is one of the characters that character filter 304 is looking for, it is considered invalid. If, for example, the 4-byte string a, newline, b, null is received as input by character filter 304, and assuming character filter 304 is configured to ignore newline and null characters, then the corresponding output signal of character filter 304 may will be:

Byte1：a，Byte1有效：高Byte1: a, Byte1 valid: high

Byte2：newline，Byte2有效：低Byte2: newline, Byte2 effective: low

Byte3：b，Byte3有效：高Byte3: b, Byte3 valid: high

Byte4：null，Byte4有效：低Byte4: null, Byte4 valid: low

图5是说明性字节移位器306的框图。字节移位器306从字符过滤器304读入值并且输出特征码的字节移位版本，可能会被大计数向量308散列。字节移位器306还输出需要被散列的字节数目(num_hash)以及用于告诉大计数向量308何时开始计数平均的信号。字节移位器306接受从字符过滤器304输出的数据。在说明性例子中，输出特征码是13字节长，并且每个特征码均包含4个10字节的重叠字符串。FIG. 5 is a block diagram of an illustrative byte shifter 306 . Byte shifter 306 reads in values from character filter 304 and outputs a byte-shifted version of the signature, possibly hashed by large count vector 308 . The byte shifter 306 also outputs the number of bytes that need to be hashed (num_hash) and a signal to tell the large count vector 308 when to start counting averages. Byte shifter 306 accepts data output from character filter 304 . In the illustrative example, the output signatures are 13 bytes long, and each signature contains four overlapping strings of 10 bytes.

下面的说明性例子描述字节移位器的功能。如果输入是“NIMDAADMIN123”后面跟随来自先前例子的字符串a，newline，b，null，那么字符串的字节移位版本可能会是“MDAADMIN123ab”并且num_hash可能会是2。如下所述，num_hash值由大计数向量308来使用。The following illustrative example describes the functionality of the byte shifter. If the input was "NIMDAADMIN123" followed by the strings a, newline, b, null from the previous example, then the byte-shifted version of the string would be "MDAADMIN123ab" and num_hash would be 2. The num_hash value is used by the large count vector 308 as described below.

为了维护所检测特征码数目的运行平均，周期性减少所检测特征码的计数。在说明性例子中，这在已经处理了固定数目的字节(诸如2.5兆字节)之后在分组边界发生。字节移位器306跟踪已经被散列到大计数向量308的字节数目。当所处理的字节总数超过阈值时，那么字节移位器306经历下列步骤：To maintain a running average of the number of detected signatures, the count of detected signatures is periodically decremented. In the illustrative example, this occurs at packet boundaries after a fixed number of bytes, such as 2.5 megabytes, have been processed. Byte shifter 306 keeps track of the number of bytes that have been hashed to large count vector 308 . When the total number of bytes processed exceeds a threshold, then byte shifter 306 goes through the following steps:

1.字节移位器308等待要从分组缓冲器302读取的当前分组的最后字，继而停止从分组缓冲器302的读取。从此时起，进入计数处理器206的通信业务量被暂时缓存在分组缓冲器302中。这样做是由于当进行计数平均时无法散列并计数这些字节。1. The byte shifter 308 waits for the last word of the current packet to be read from the packet buffer 302 and then stops reading from the packet buffer 302 . From this point on, the communication traffic entering the counting processor 206 is temporarily buffered in the packet buffer 302 . This is done because the bytes cannot be hashed and counted when doing count averaging.

2.当当前分组的最后字已经被大计数向量308处理时，字节移位器306把subtract_now信号断言为高。此信号由大计数向量308用来开始计数平均。2. When the last word of the current packet has been processed by the large count vector 308, the byte shifter 306 asserts the subtract_now signal high. This signal is used by the large count vector 308 to start count averaging.

当来自包装的start of payload(有效负载开始)信号被断言为高时，字节移位器306把count_now信号断言为高。当来自包装的end of frame(帧末尾)信号被断言为高时，count_now被断言为低。据此，可以计数仅包括有效负载的字节。When the start of payload signal from the wrapper is asserted high, the byte shifter 306 asserts the count_now signal high. count_now is asserted low when the end of frame signal from the wrapper is asserted high. From this, bytes including only the payload can be counted.

字节移位器306还可以确定在数据流中是否存在良性串。诸如来自Microsoft Update的代码段之类的良性串可以通过作为一组串被编程到字节移位器306中来识别，这种串尽管通常出现在网络上，然而并非是蠕虫。通过经由数据流在字节移位器306接收良性串分组来把良性串加载到大计数向量308中。例如当分组在端口1200上被发送到目的地址192.168.200.2时，字节移位器306假定所述分组包含良性串的13位散列值。使用散列值的高5位来引用32个块RAM之一并且使用低8位来引用每个块RAM内256个计数器之一。在图6中示出了包含良性串的说明性控制分组602的图。按benign_string输出有效负载的第一字的低13位并且benign_valid被断言为高。由于控制分组包含不必计数的良性串，所以count_now被断言为低。benign_valid和count_string信号由大计数向量308用来避免计数良性串，如下面所解释。Byte shifter 306 may also determine whether there are benign strings in the data stream. A benign string, such as a code segment from Microsoft Update, can be identified by being programmed into the byte shifter 306 as a set of strings that, while commonly appearing on the web, are not worms. The benign strings are loaded into the large count vector 308 by receiving the benign string packets at the byte shifter 306 via the data stream. For example, when a packet is sent on port 1200 to a destination address of 192.168.200.2, byte shifter 306 assumes that the packet contains a 13-bit hash value of a benign string. The upper 5 bits of the hash value are used to reference one of the 32 block RAMs and the lower 8 bits are used to reference one of the 256 counters within each block RAM. A diagram of an illustrative control packet 602 containing benign strings is shown in FIG. 6 . The lower 13 bits of the first word of the payload are output as benign_string and benign_valid is asserted high. count_now is asserted low because the control packet contains benign strings that do not have to be counted. The benign_valid and count_string signals are used by the large count vector 308 to avoid counting benign strings, as explained below.

图7是说明性大计数向量308的框图。字节移位器306的输出是大计数向量308的输入。大计数向量308包含逻辑，用于散列输入的字符串、解决在块RAM之间的冲突、从块RAM中进行读取、使计数器的值增加并且写回到块RAM。在说明性例子中，大计数向量308包括32个块RAM，每个块RAM具有256个计数器，每个计数器都是16位宽。利用此大小的说明性计数器，可以支持高达64K的计数。下面参考图8将更详细地描述大计数向量308的功能组件。FIG. 7 is a block diagram of an illustrative large count vector 308 . The output of byte shifter 306 is the input of large count vector 308 . The big count vector 308 contains the logic to hash the input string, resolve conflicts between block RAMs, read from block RAM, increment the value of the counter and write back to block RAM. In the illustrative example, large count vector 308 includes 32 block RAMs each having 256 counters, each counter being 16 bits wide. With an illustrative counter of this size, counts up to 64K can be supported. The functional components of the large count vector 308 will be described in more detail below with reference to FIG. 8 .

说明性的大计数向量308每个时钟周期对四个10字节字符串计算四个散列值，所述四个10字节字符串包括在13字节的信号字符串中。每个时钟周期计算一个以上散列值以维护吞吐量。由于所跟踪的特征码可能出现在有效负载中的任何点，并且它们被散列到相同的地点而不管它们在分组中的偏移如何，所以在所有情况下使用相同的散列函数。每个散列函数产生13位的值。The illustrative large count vector 308 computes four hash values per clock cycle for the four 10-byte strings included in the 13-byte signal string. Compute more than one hash value per clock cycle to maintain throughput. Since the tracked signatures may appear at any point in the payload, and they are hashed to the same place regardless of their offset in the packet, the same hash function is used in all cases. Each hash function produces a 13-bit value.

为了检测经常出现的内容，大计数向量308对流送数据的10字节(80位)窗上计算k位散列。为了计算所述散列，在配置计数处理器的时候产生一组k×80个随机二进制值。所述散列的每个位被计算为对80位输入串的随机选择子集的异或(XOR)。通过使散列函数随机化，对手无法确定可能会引起过多散列冲突的字节模式。对每个有效负载的多个散列计算确保简单的多态方法难以奏效。To detect frequently occurring content, the large count vector 308 computes a k-bit hash over a 10-byte (80-bit) window of streaming data. To compute the hash, a set of k x 80 random binary values is generated when the counting processor is configured. Each bit of the hash is computed as an exclusive OR (XOR) on a randomly selected subset of the 80-bit input string. By randomizing the hash function, an adversary cannot determine byte patterns that could cause too many hash collisions. Multiple hash calculations for each payload ensure that simple polymorphic approaches are less effective.

在说明性实施例中，使用称作H₃的通用散列函数。散列函数H₃被定义为：In an illustrative embodiment, a general hash function called _H3 is used. The hash function _H3 is defined as:

$h h ((X x)) = = {d d}_{11} \cdot \cdot {x x}_{11} &CirclePlus; &CirclePlus; {d d}_{22} \cdot \cdot {x x}_{22} &CirclePlus; &CirclePlus; {d d}_{33} \cdot \cdot {x x}_{33} &CirclePlus; &CirclePlus; . . . . . . {&CirclePlus; &CirclePlus; d d}_{b b} \cdot &Center Dot; {x x}_{b b}$

在以上公式中，b是按位来计量的字符串长度。在说明性例子中，b＝80位。(d₁，d₂，d₃，...d_b)是k×80个随机二进制值组。随机二进制值在[0..2^m+n-1]范围内(其中n是按位计量的单个计数器大小，并且2^m是所使用块RAM的数目)。换句话说，d值具有与要产生的散列值相同的范围。相对于输入对该随机值组所执行的XOR功能生成散列值，其对输入值来说具有一定分布。In the above formula, b is the length of the string measured in bits. In the illustrative example, b=80 bits. (d ₁ , d ₂ , d ₃ , . . . d _b ) are k×80 groups of random binary values. Random binary values in the range [0..2m ^+n-1 ] (where n is the size of a single counter in bits and ^2m is the number of block RAMs used). In other words, the d value has the same range as the hash value to be generated. An XOR function performed on the set of random values with respect to the input generates a hash value that has a distribution over the input values.

为了计算所述散列，对于字符串中的每个位来说，如果该位等于‘1’，那么该位相关联的随机值与当前的结果相XOR，以获得散列值。例如，给定d＝(101；100；110；011)并且输入串X＝1010，那么相应的3位散列函数为101 XOR 110＝011。To compute the hash, for each bit in the string, if the bit is equal to '1', the random value associated with that bit is XORed with the current result to obtain the hash value. For example, given d = (101; 100; 110; 011) and the input string X = 1010, then the corresponding 3-bit hash function is 101 XOR 110 = 011.

大计数向量308使用散列值来索引到计数器向量中，所述计数器包含在诸如计数向量802之类的计数向量中。当某个特征码散列到计数器时，这使得计数器的值增加一。按周期性间隔，这里可以被称作测量间隔，每个计数向量中的计数值被减少等于或大于由于正常通信业务流量所到达平均数的量。当计数器的值到达预定阈值时，分析器208访问芯片外存储器212，如下面将描述，并且计数器被复位。对于Xilinx FPGA上的电路的说明性实现方式来说，通过把双端口的芯片内块RAM配置为存储单元阵列来实现计数向量。每个说明性存储器在每个时钟周期可以执行一个读取操作和一个写入操作。如图9所示，实现三级流水线在每个时钟周期读取、增加并写入存储器。由于该特征码在每个时钟周期都会改变并且由于每个特征码的每次出现都被计数，所以从存储器子系统来看需要高性能。双端口的存储器允许写回一个特征码的出现次数，同时正读取另一特征码的出现。Large count vector 308 uses hash values to index into counter vectors contained in count vectors such as count vector 802 . This causes the value of the counter to be incremented by one when a signature hashes to the counter. At periodic intervals, which may be referred to herein as measurement intervals, the count value in each count vector is reduced by an amount equal to or greater than the average number reached due to normal traffic flow. When the value of the counter reaches a predetermined threshold, the analyzer 208 accesses the off-chip memory 212, as will be described below, and the counter is reset. For the illustrative implementation of the circuit on a Xilinx FPGA, the count vector is implemented by configuring a dual-ported on-chip block RAM as an array of memory cells. Each illustrative memory can perform one read operation and one write operation per clock cycle. As shown in Figure 9, a three-stage pipeline is implemented to read, increase and write to the memory in each clock cycle. Since the signature changes every clock cycle and since each occurrence of each signature is counted, high performance is required from the memory subsystem perspective. Dual-ported memory allows writing back occurrences of one signature while an occurrence of another signature is being read.

为了标出测量间隔的末尾，大计数向量308可以周期性复位计数器。在经过了固定的字节窗之后，通过把值写为零来复位所有计数器。然而，此方法具有缺陷。如果在测量间隔末尾的附近时对应于恶意特征码的计数器值刚好在阈值以下，那么复位此计数器会使该特征码不会被检测出来。因此作为替换方式，说明性大计数向量308周期性从所有计数器减去平均值。平均值被计算为所期望的字节数目，所述字节可以在间隔内被散列到每个计数器的值。如下所述，此方法要求使用比较器和减法器。To mark the end of the measurement interval, the large count vector 308 may reset the counters periodically. After the fixed byte window has elapsed, all counters are reset by writing the value to zero. However, this approach has drawbacks. If the counter value corresponding to a malicious signature is just below the threshold near the end of the measurement interval, resetting the counter makes the signature go undetected. So instead, the illustrative large count vector 308 periodically subtracts the average from all counters. The average is calculated as the expected number of bytes that can be hashed to each counter's value in the interval. As described below, this method requires the use of comparators and subtractors.

为了实现高吞吐量，可以在每个时钟周期内处理多个串。为了允许并行执行多个存储器操作，如图10所示在内容检测系统130中使用多个块RAM把计数向量分段到多个存储体(bank)中。使用散列值的较高位来确定要访问哪个块RAM。使用较低位来确定在给定块RAM内哪个计数器的值应当增加。多于一个的字符串可能散列到相同的块RAM。这种情况这里被称作“存储体冲突”。可以使用优先级编码器来解决存储体冲突。由于优先级编码器的操作，对于按OC-48行速率运行的系统来说，每个时钟周期可能不会计数在1和3个字符串之间。To achieve high throughput, multiple strings can be processed per clock cycle. To allow multiple memory operations to be performed in parallel, multiple block RAMs are used in the content detection system 130 as shown in FIG. 10 to segment count vectors into multiple banks. Use the higher bits of the hash value to determine which block of RAM to access. The lower bits are used to determine which counter's value should be incremented within a given block of RAM. More than one string may hash to the same block of RAM. This situation is referred to herein as a "bank conflict." A priority encoder can be used to resolve bank conflicts. Due to the operation of the priority encoder, it may not count between 1 and 3 strings per clock cycle for systems running at OC-48 line rates.

冲突的概率c可以用下式表示：The probability c of conflict can be expressed by the following formula:

$c c = = 11 - - {Π Π}_{i i = = N N - - B B + + 11}^{N N - - 11} \frac{i i}{N N}$

在上面的公式中，N是所使用的块RAM的数目并且B是每个时钟周期到来的字节数目。In the above formula, N is the number of block RAMs used and B is the number of bytes coming per clock cycle.

诸如优先级编码器804之类的优先级编码器解决当四个散列值中的两个或多个的较高5个位相同时可能发生的冲突。优先级编码器804输出需要被增加的块RAM的地址。如图8所示，使用散列值的较高5个位来标识要增加的块RAM。使用较低8个位来索引在块RAM内的其值要增加的计数器。bram_num1到bram_num4引用各块RAM。ctr_addr1到ctr_addr4引用在每个块RAM内的其值要增加的计数器的号。当相应的块RAM和计数器地址有效时，num1_valid到num4_valid被断言为高。由于报警可以由32个块RAM中的任何一个产生并且所述报警可以对应于四种可能的特征码，所以大计数向量308跟踪哪个特征码触发了所述报警。这通过使用对应于bram_num和ctr_addr信号的信号sign1到sign4来实现。在说明性例子中，信号sign1到sign4可以具有以下五个值之一：一、二、三和四对应于13字节信号字符串中的第一、第二、第三和第四特征码。值八表示良性串。A priority encoder, such as priority encoder 804, resolves collisions that may occur when the upper 5 bits of two or more of the four hash values are the same. The priority encoder 804 outputs the address of the block RAM that needs to be increased. As shown in Figure 8, the upper 5 bits of the hash value are used to identify the block RAM to be increased. The lower 8 bits are used to index the counter within the block RAM whose value is to be incremented. bram_num1 to bram_num4 refer to each block of RAM. ctr_addr1 to ctr_addr4 refer to the number of the counter whose value is to be incremented within each block RAM. num1_valid to num4_valid are asserted high when the corresponding block RAM and counter addresses are valid. Since an alert can be generated by any of the 32 block RAMs and the alert can correspond to four possible signatures, the large count vector 308 keeps track of which signature triggered the alert. This is achieved by using the signals sign1 to sign4 corresponding to the bram_num and ctr_addr signals. In an illustrative example, signals sign1 through sign4 can have one of five values: one, two, three, and four corresponding to the first, second, third, and fourth signatures in the 13-byte signal string. A value of eight indicates a benign string.

值num_hash确定了其中需要解决冲突的块RAM的数目。例如如果此信号值为二，那么这意味着字节移位器306已经按两个字节移位了该特征码。因此，由于另外两个特征码已经被计数，所以只计数两个特征码。The value num_hash determines the number of block RAMs in which conflicts need to be resolved. For example, if this signal has a value of two, then this means that the byte shifter 306 has shifted the signature by two bytes. Therefore, only two signatures are counted because two other signatures have already been counted.

在图11中示出了在没有冲突的情况下优先级编码器的功能的说明性例子。在说明性例子中，在第一时钟周期内，所有四个输入字节被字符过滤器认为是有效的。因此，所有四个特征码被散列，并且sign1到sign4具有有效值连同它们相应的bram_num和ctr_addr信号。在第二时钟周期内，四个输入字节中只有两个被字符过滤器认为是有效的。因此，只有两个特征码被散列。因此只有sign1和sign2具有关于特征码3和4的有效值。An illustrative example of the functionality of the priority encoder in the absence of conflicts is shown in FIG. 11 . In the illustrative example, during the first clock cycle, all four input bytes are considered valid by the character filter. Therefore, all four signatures are hashed, and sign1 through sign4 have valid values along with their corresponding bram_num and ctr_addr signals. During the second clock cycle, only two of the four input bytes are considered valid by the character filter. Therefore, only two signatures are hashed. So only sign1 and sign2 have valid values for signatures 3 and 4.

在图12中示出了在存在冲突的情况下优先级编码器的功能的说明性例子。如说明性例子中所示，被增加的块RAM在两种情况下冲突。在这两种情况中，优先一个特征码来解决冲突。一个特征码相对于另一个特征码的优先级在大计数向量308中。An illustrative example of the functionality of the priority encoder in the presence of conflicts is shown in FIG. 12 . As shown in the illustrative example, increased block RAM conflicts in two cases. In both cases, one signature is preferred to resolve conflicts. The priority of one signature relative to another is in large count vector 308 .

在说明性实施例中，由于块RAM的固有功能并不包括支持复位和计数平均，所以围绕所述块RAM提供包装(wrapper)以实现该功能。所述包装的功能说明性地由图8中所示出的说明性计数向量来表示。在大计数向量308中例示了此计数向量组件的三十二个拷贝，对于正使用的每个块RAM来说对应一个。In an illustrative embodiment, since the native functionality of block RAM does not include support for reset and count averaging, a wrapper is provided around the block RAM to implement this functionality. The functionality of the package is illustratively represented by the illustrative count vector shown in FIG. 8 . Thirty-two copies of this count vector component are instantiated in the large count vector 308, one for each block of RAM being used.

如计数向量的说明性例子中所示，所述计数向量具有reset(复位)信号。当reset信号被断言为低时，每个计数器的值被初始化为0。由于在说明性例子中并行初始化块RAM，所以这花费256个时钟周期(每个块RAM中的计数器数目)。hash标识了在count_vector中要读取的地址。dout标识了计数器中对应于hash的数据。addr标识了要写回增加的计数的地址，这在下面将进行描述。ctr_data标识了要写回到count_vector的值。set_ctr向count_vector提供了允许写入。当subtract被断言为高时，大计数向量通过每个计数器进行迭代并且从中减去平均值。如先前所提及，平均值被计算为所期望的在每个间隔中可能会散列到计数器的值的字节数目。如果给定计数器的值小于平均值，那么它被初始化为零。如果给定计数器的值包含与良性串相关联的特定字段，那么它不被减除。就像初始化计数向量一样，并行性确保了在256个时钟周期内实现减法。As shown in the illustrative example of the count vector, the count vector has a reset signal. When the reset signal is asserted low, the value of each counter is initialized to 0. Since the block RAMs are initialized in parallel in the illustrative example, this takes 256 clock cycles (the number of counters in each block RAM). hash identifies the address to read in count_vector. dout identifies the data corresponding to the hash in the counter. addr identifies the address where the incremented count is to be written back, as described below. ctr_data identifies the value to be written back to count_vector. set_ctr provides write permission to count_vector. When subtract is asserted high, the large count vector iterates through each counter and subtracts the average from it. As mentioned previously, the average is calculated as the expected number of bytes that might hash to the counter's value in each interval. If the value of the given counter is less than the average value, then it is initialized to zero. If the value of a given counter contains the particular field associated with a benign string, it is not decremented. As with initializing the count vector, parallelism ensures that the subtraction is achieved within 256 clock cycles.

为了支持良性串，利用超出阈值的值来填充对应于良性串的散列的计数器。当计数器具有此值时，电路跳过所述的增加和写回步骤。To support benign strings, the counter corresponding to the hash of the benign string is filled with values exceeding the threshold. When the counter has this value, the circuit skips the described steps of incrementing and writing back.

对于有限数目的常见字符串来说，可以不计数散列存储桶(bucket)，并且从而可以避免发送报警。但是随着良性串的数目接近可用计数器的数目，因为只有较少的计数器用于检测特征码，所以降低了有效性。对于较大数目的不常见字符串，可以在下游软件中避免假肯定产生。为了减少被发送到下游软件的假肯定，可以由控制主机来处理是良性的但是不那么频繁出现的字符串。For a limited number of common strings, hash buckets may not be counted, and thus alerts may be avoided. But as the number of benign strings approaches the number of available counters, effectiveness decreases because fewer counters are used to detect signatures. For larger numbers of uncommon strings, false positives can be avoided in downstream software. To reduce false positives being sent to downstream software, strings that are benign but occur less frequently can be processed by the control host.

返回参照图8，读取级806的输入是来自优先级编码器804的输出。来自读取级806的输出被连接到32个块RAM的地址和数据总线(例如，计数向量802)。然而，为简单起见在图8中只示出了一个计数向量802。根据读取级806的bram_num输入的值来断言适当的地址和数据信号。进入读取级806的信号sign1到sign4被分配给signb1到sign b32中的任何一个(此后被称作“sign”信号同时指任何一个块RAM)，所述信号除了当处理包含良性串的控制分组时会流出读取级806。在该情况下，输出sign信号被指定值8，使得比较组件808和增加组件810可以适当地处理它。Referring back to FIG. 8 , the input to the read stage 806 is the output from the priority encoder 804 . The output from the read stage 806 is connected to the address and data buses of the 32 block RAMs (eg, count vector 802). However, only one count vector 802 is shown in FIG. 8 for simplicity. Depending on the value of the read stage 806's bram_num input, the appropriate address and data signals are asserted. Signals sign1 to sign4 entering the read stage 806 are assigned to any of signb1 to sign b32 (hereinafter referred to as the "sign" signal while referring to any one of the block RAMs), except when processing control packets containing benign strings will flow out of the read stage 806 . In this case, the output sign signal is assigned a value of 8 so that the comparison component 808 and the addition component 810 can handle it appropriately.

诸如计数向量802之类的计数向量的输出由其相应的比较组件808来检查，并且如果它小于阈值，那么所述比较组件的inc信号被断言为高。如果它等于阈值，那么大计数向量308把count_match信号设置为高以便向分析器208通知潜在频繁出现的特征码。count_match信号使芯片外存储器212被占用长达13个时钟周期(这是因为它是开始从芯片外存储器212读取10字节字符串、比较字符字符串并且写回该字符串所花费的时间)，count_match抑制信号确保在两个count_match信号之间存在至少13个时钟周期的间隔。The output of a count vector, such as count vector 802, is checked by its corresponding compare component 808, and if it is less than a threshold, the compare component's inc signal is asserted high. If it is equal to the threshold, the large count vector 308 sets the count_match signal high to notify the analyzer 208 of potentially frequently occurring signatures. The count_match signal keeps off-chip memory 212 occupied for up to 13 clock cycles (this is because it is the time it takes to start reading a 10-byte string from off-chip memory 212, compare character strings, and write the string back) , the count_match suppress signal ensures that there is an interval of at least 13 clock cycles between two count_match signals.

在增加和写回级中，存在四个说明性功能，增加和写回级可以按流水线执行执行。在所有情况下，ctr_data是被写回到计数向量的值。四个说明性功能如下：There are four illustrative functions in the increment and writeback stages, which may be performed in a pipelined manner. In all cases, ctr_data is the value that is written back to the count vector. The four illustrative functions are as follows:

-如果inc信号已经被断言为高，那么ctr_data的值被设置为比count_vector的输出多一。- If the inc signal has been asserted high, then the value of ctr_data is set to one more than the output of count_vector.

-如果sign值为8，那么与良性串相关联的值被指定成ctr_data。在说明性例子中，此值为0 x FFFF。- If the sign value is 8, then the value associated with the benign string is designated as ctr_data. In the illustrative example, this value is 0 x FFFF.

-如果计数向量的输出为0 x FFFF，那么把相同的值指定给ctr_data以便保存良性串。- If the output of the count vector is 0 x FFFF, then assign the same value to ctr_data to hold the benign string.

-ctr_data的缺省值为0。如果计数器已经超过阈值，那么所述0值不会改变。The default value for -ctr_data is 0. If the counter has exceeded the threshold, then the zero value will not change.

valid(有效)信号(例如，bl_valid)当被反转(flop)适当的次数时被用为计数向量的写使能的输入(即，set_ctr)。A valid signal (eg, bl_valid) is used as an input to the count vector's write enable (eg, set_ctr) when flopped the appropriate number of times.

在放置和路由期间，可以依照承受大传播延迟的这种方式来放置一些块RAM。这可能使所述电路不满足定时约束。在说明性例子中通过把触发器包括到块RAM的输入和输出中来弥补此情况。在图8中没有示出附加的触发器以便保持简明。Some block RAM can be placed in such a way as to suffer large propagation delays during placement and routing. This may cause the circuit to fail timing constraints. This is remedied in the illustrative example by including flip-flops to the inputs and outputs of the block RAM. Additional triggers are not shown in Figure 8 in order to maintain clarity.

当发现有害特征码时，大计数向量308输出count_match以及相应的特征码(sign_num)。计数处理器206反转string适当的次数来反映大计数向量308的延迟。当count_match被断言为高时，根据sign_num的值来选择offending_signature。When a malicious signature is found, the large count vector 308 outputs count_match and the corresponding signature (sign_num). The count processor 206 inverts the string an appropriate number of times to reflect the latency of the large count vector 308 . When count_match is asserted high, offending_signature is chosen according to the value of sign_num.

图13是说明性分析器208的框图。分析器208保留可疑的特征码并且估算确定的特征码多长时间出现一次。从而，分析器208可以减少由报警产生器214所发送的报警数目。为了这样，分析器确信越过预定阈值的计数器的值实际上是频繁出现字符串的结果。当计数器的值跨过预定阈值时，有害串被散列到芯片外存储器212中的表。使用上述方法对有害特征码计算17位散列值。芯片外存储器212数据总线是19位宽。散列值映射到地址信号的高17位。地址信号的低两位被改变以用来表示存储器中的三个连续字(所述存储器用来存储10字节字符串)。散列值用来索引到芯片外存储器的散列表210中。下一次出现相同的串时，分析器散列到芯片外存储器212中的相同地点并且比较两个串。如果两个串是相同的，那么产生报警。如果两个串是不同的，那么分析器208重写芯片外存储器212单元并且存储另一串。在该情况下，因为散列函数把几个半频繁出现的字符串散列到相同的值，所以可能出现计数器溢出。由于半频繁出现的字符串不是所关注的，因此分析器208可防止出现产生报警分组的开销。FIG. 13 is a block diagram of an illustrative analyzer 208 . Analyzer 208 retains suspicious signatures and estimates how often certain signatures occur. Accordingly, analyzer 208 may reduce the number of alerts sent by alert generator 214 . To do this, the analyzer makes sure that the values of the counters crossing a predetermined threshold are in fact the result of frequent occurrences of strings. When the value of the counter crosses a predetermined threshold, the harmful string is hashed to a table in off-chip memory 212 . Compute a 17-bit hash value for the malicious signature using the method described above. The off-chip memory 212 data bus is 19 bits wide. The hash value maps to the upper 17 bits of the address signal. The lower two bits of the address signal are changed to represent three consecutive words in the memory used to store 10 byte strings. The hash value is used to index into the hash table 210 in off-chip memory. The next time the same string occurs, the analyzer hashes to the same location in off-chip memory 212 and compares the two strings. If the two strings are the same, an alarm is generated. If the two strings are different, analyzer 208 overwrites off-chip memory 212 cells and stores another string. In this case, counter overflow may occur because the hash function hashes several semi-frequently occurring strings to the same value. Since semi-frequently occurring strings are not of interest, analyzer 208 can prevent the overhead of generating alert packets.

下面解释分析器208的说明性信号：The illustrative signals of analyzer 208 are explained below:

count_match：当被大计数向量308断言为高时，某个特征码已经使计数器到达阈值。count_match: When asserted high by the large count vector 308, a certain signature has caused the counter to reach the threshold.

offending_signature：对应于count_match的特征码正被断言为高。offending_signature: The signature corresponding to count_match is being asserted high.

analyzer_match；当被断言为高时，分析器已经验证到达阈值的计数器不是假肯定的结果。analyzer_match; When asserted high, the analyzer has verified that the counter reaching the threshold is not the result of a false positive.

mod1_req：当被断言为高时，此信号表明请求访问芯片外存储器212。在正访问芯片外存储器212期间此信号保持为高。mod1_req: When asserted high, this signal indicates a request to access off-chip memory 212 . This signal remains high while the off-chip memory 212 is being accessed.

mod1_gr：当被断言为高时，此信号表明许可访问芯片外存储器212。mod1_gr: When asserted high, this signal indicates that access to off-chip memory 212 is granted.

mod1_rw：当此信号被断言为高时分析器208从芯片外存储器212中进行读取并且当此信号被断言为低时分析器208写入芯片外存储器212。mod1_rw: Analyzer 208 reads from off-chip memory 212 when this signal is asserted high and analyzer 208 writes to off-chip memory 212 when this signal is asserted low.

mod1_addr：表明要从中读取或写入的芯片外存储器地址。mod1_addr: Indicates the off-chip memory address to read from or write to.

mod1_d_in：包括正从芯片外存储器212中读取的数据。mod1_d_in: Contains the data being read from off-chip memory 212 .

mod1_d_out：包括正被写入芯片外存储器212的数据。mod1_d_out: Contains the data being written to the off-chip memory 212 .

分析器208被配置为包括用于芯片外存储器212访问的多个有限状态。在图14中示出了用于分析器208的说明性有限状态机。下面解释在图14中所描绘的每个说明性状态。Analyzer 208 is configured to include a plurality of finite states for off-chip memory 212 access. An illustrative finite state machine for analyzer 208 is shown in FIG. 14 . Each illustrative state depicted in FIG. 14 is explained below.

idle：是分析器208的默认状态。当count_match被断言为高时分析器208从此状态中转换出来。idle: is the default state of the analyzer 208 . Analyzer 208 transitions out of this state when count_match is asserted high.

prep_for_sram：在此状态中请求许可访问芯片外存储器212。当许可被授权时分析器208从此状态中转换出来。prep_for_sram: permission to access off-chip memory 212 is requested in this state. Analyzer 208 transitions out of this state when a license is granted.

send_read_request：如在图14的说明性例子中所示，实现三种send_read_request状态。在发送读取请求的所有三种状态中，mod1_rw被断言为高并且mod1_addr被设置为根据offending_signature的散列所导出的值。send_read_request: As shown in the illustrative example of Figure 14, three send_read_request states are implemented. In all three states where a read request is sent, mod1_rw is asserted high and mod1_addr is set to the value derived from the hash of offending_signature.

wait1：等待要从芯片外存储器212中所读取的数据。wait1: waiting for data to be read from the off-chip memory 212 .

read_data_from_sram：在mod1_d_in上来自芯片外存储器212的数据被读入到临时寄存器中。read_data_from_sram: Data from off-chip memory 212 is read into temporary registers on mod1_d_in.

check_match：临时寄存器被串接并与offending_signature相比较。如果这两个相等，那么analyzer_match被断言为高并且分析器208转变回到idle。如果这两个不相等，那么分析器208把新的字符串写回到存储器。check_match: Temporary registers are concatenated and compared with offending_signature. If these two are equal, analyzer_match is asserted high and analyzer 208 transitions back to idle. If the two are not equal, the parser 208 writes the new string back to memory.

send_write_request：mod1_rw被断言为低，并且如同读取状态一样，mod1_addr被设置为根据offending_signature的散列所导出的值。send_write_request: mod1_rw is asserted low, and like read status, mod1_addr is set to the value derived from the hash of offending_signature.

一旦mod1_gr升高，那么分析器208中的每次转变就在时钟边沿发生。Every transition in analyzer 208 occurs on a clock edge once mod1_gr goes high.

芯片外存储器212用来存储完整的串(未散列版本)，在说明性例子中所述串为10字节(80位)长。分析器208尽管比软件要快数百倍，然而仍然要求几个附加时钟周期来访问芯片外存储器212，这可能会使数据处理流水线停滞。在说明性例子中，访问芯片外存储器212中的10字节字符串要求13个时钟周期。Off-chip memory 212 is used to store the full string (unhashed version), which in the illustrative example is 10 bytes (80 bits) long. Analyzer 208, while hundreds of times faster than software, still requires several additional clock cycles to access off-chip memory 212, which may stall the data processing pipeline. In the illustrative example, accessing a 10-byte string in off-chip memory 212 requires 13 clock cycles.

可以实现每当从芯片外存储器212执行存储器读取时就使数据处理流水线停滞的电路。然而，使流水线停滞具有缺点。相对于整个分组有效负载来对字节窗计算散列值的目的在于处理多态蠕虫的情况。但是考虑更普遍的非多态蠕虫的情况，其中蠕虫通信量的分组有效负载近乎是完全相同的。在这种情况下，按照本发明的方法和系统可以对整个分组有效负载产生一系列连续匹配。然后为了每个匹配使流水线停滞可能会造成严重的吞吐量降级，这是由于每个芯片外存储器212访问要花费多个时钟周期。实际上，这样做可能对攻击者有益，这是由于系统管理员可能被迫关掉系统。在说明性例子中，解决方案在于当从芯片外存储器212进行读取时并不使流水线停滞，而是跳过进一步的存储操作直到先前操作已经完成。因此，一旦产生报警，那么在下一13个时钟周期(在读取和写回到芯片外存储器212中所涉及的等待时间)上的数据不会导致产生进一步的报警。Circuitry may be implemented that stalls the data processing pipeline whenever a memory read is performed from off-chip memory 212 . However, stalling the pipeline has disadvantages. The purpose of hashing the byte window relative to the entire packet payload is to handle the case of polymorphic worms. But consider the more general case of non-polymorphic worms, where the packet payloads of the worm traffic are nearly identical. In this case, the method and system according to the present invention can generate a series of consecutive matches for the entire packet payload. Stalling the pipeline for each match can then cause severe throughput degradation since each off-chip memory 212 access takes multiple clock cycles. In fact, doing so may benefit the attacker, since the system administrator may be forced to shut down the system. In the illustrative example, the solution is not to stall the pipeline when reading from off-chip memory 212, but to skip further store operations until the previous operation has completed. Therefore, once an alert is generated, the data over the next 13 clock cycles (latency involved in reading and writing back into off-chip memory 212) will not cause further alerts to be generated.

在测量间隔内，所观察的特征码数目可以近似等于所处理的字符数目。因为由于存储体RAM冲突而跳过一小部分字符，所以所处理的字符数目可能较少。给定测量间隔长度来确定阈值的问题可以被简化为确定概率的界限，所述概率为当m个元素被散列到具有b个存储桶的表时被散列到相同存储桶的元素数目超过i的概率。所述界限由下式给出：During the measurement interval, the number of observed signatures can be approximately equal to the number of characters processed. The number of characters processed may be lower because a small number of characters are skipped due to bank RAM conflicts. The problem of determining a threshold given the length of the measurement interval can be reduced to determining a bound on the probability that when m elements are hashed to a table with b buckets, the number of elements hashed to the same bucket exceeds probability of i. The bound is given by:

$b b {((\frac{em em}{ib ib}))}^{i i}$

在说明性例子中，m个特征码被散列到b个计数器。在以上表示式中，i是阈值。从而，给定测量间隔长度，可以改变阈值，来使计数器超过阈值的概率的上限可接受地小。这随后减少了不必要的芯片外存储器212访问的数目。因此，由于到来的特征码随机地散列到计数器的值，所以对于适当大的阈值来说不规则的特征码可能使计数器的值超过所述预定阈值。计数器精确地接收i个元素的概率可以由下式给出：In an illustrative example, m signatures are hashed to b counters. In the above expression, i is a threshold. Thus, given the length of the measurement interval, the threshold can be varied such that the upper bound on the probability of the counter exceeding the threshold is acceptably small. This in turn reduces the number of unnecessary off-chip memory 212 accesses. Thus, since the incoming signature randomly hashes to the value of the counter, an irregular signature for a suitably large threshold may cause the value of the counter to exceed said predetermined threshold. The probability that the counter receives exactly i elements can be given by:

$(\begin{matrix} m m \\ i i \end{matrix}) {((\frac{11}{b b}))}^{i i} {((11 - - \frac{11}{b b}))}^{((m m - - i i))} \leq \leq (\begin{matrix} m m \\ i i \end{matrix}) {((\frac{11}{b b}))}^{i i} \leq \leq ((\frac{me me}{i i})) {((\frac{11}{b b}))}^{i i} = = {((\frac{me me}{bi bi}))}^{i i}$

第二不等式是二项式系数的上限的结果。计数器的值至少为i的概率可以由下式给出：The second inequality is a consequence of the upper bound on the binomial coefficient. The probability that the value of the counter is at least i can be given by:

$Pr PR ((c c &GreaterEqual; &Greater Equal; i i)) \leq \leq {Σ Σ}_{k k = = i i}^{m m} {((\frac{em em}{bk bk}))}^{k k} \leq \leq {((\frac{em em}{ib ib}))}^{i i} [[11 + + ((\frac{em em}{ib ib})) + + {((\frac{em em}{ib ib}))}^{22} + + . . . . . . + + {((\frac{em em}{ib ib}))}^{((m m - - i i))}]]$

随着i增加，方括号内的项接近于1。因此，计数器的值至少为i的概率可以由下式限制：The term in square brackets approaches 1 as i increases. Therefore, the probability that the value of the counter is at least i can be bounded by:

$b b {((\frac{em em}{ib ib}))}^{i i}$

在说明性实施例中，由于测量间隔m是2.5兆字节，所以计数器的数目b是8192，并且预定阈值i是850，对于随机通信业务流量来说计数器溢出的概率范围为1.02 x 10^-9。据此，对于在间隔内所处理的通信业务来说，计数器溢出的概率如期望的那么小。In the illustrative embodiment, since the measurement interval m is 2.5 megabytes, the number of counters b is 8192, and the predetermined threshold i is 850, the probability of counter overflow for random traffic flow ranges from 1.02 x 10 ⁻⁹ . Accordingly, the probability of counter overflow is as low as desired for the traffic processed during the interval.

当从分析器208接收报警消息时，报警产生器214向在已知的UDP/IP端口上监听的外部数据处理系统发送用户数据报协议(userdatagram protocol，UDP)控制分组。该分组可能包含该有害特征码(对其计算散列的字节的字符串)。当analyzer_match被断言为高时，报警产生器214发出该控制分组。据此，然后可以把最频繁出现的字符串标示成可疑的。图15是从报警产生器214所发出的说明性控制分组1502的框图。Upon receiving an alert message from analyzer 208, alert generator 214 sends a user datagram protocol (UDP) control packet to an external data processing system listening on a known UDP/IP port. The packet may contain the malicious signature (a string of bytes for which to hash). The alarm generator 214 issues this control packet when analyzer_match is asserted high. From this, the most frequently occurring strings can then be marked as suspicious. FIG. 15 is a block diagram of an illustrative control packet 1502 sent from the alert generator 214 .

因此，按照本发明的方法和系统检测网络通信业务流量中频繁出现的特征码。通过用硬件来实现内容检测，可以实现高吞吐量。此外，通过利用由硬件所提供的并行性，与典型的基于软件的方法相比较可以扫描更大量的通信业务流量。通过把几个字节窗并行散列到芯片级块存储器来维持吞吐量，其中可以并行更新每个芯片级块存储器。这不同于传统的基于软件的方法，其中跟随有计数器更新的散列可能需要顺序执行几个指令。此外，使用芯片外存储器分析器提供了低的假肯定率。对每个分组进行多个散列还帮助系统阻碍简单的多态测量。Therefore, the method and system according to the present invention detect frequently occurring signatures in network communication traffic. By implementing content detection in hardware, high throughput can be achieved. Furthermore, by taking advantage of the parallelism provided by the hardware, larger volumes of traffic can be scanned as compared to typical software-based approaches. Throughput is maintained by parallel hashing of several byte windows to on-chip block memories, where each on-chip block store can be updated in parallel. This differs from traditional software-based approaches, where a hash followed by a counter update may require several instructions to be executed sequentially. Furthermore, using an off-chip memory analyzer provides a low false positive rate. Doing multiple hashes per packet also helps the system hinder simple polymorphic measurements.

先前的网络监视工具依赖系统管理员的直觉来检测网络通信业务流量中的异常。按照本发明的方法和系统自动检测网络通信业务流量中对应于频繁出现内容的峰值。Previous network monitoring tools relied on a system administrator's intuition to detect anomalies in network traffic traffic. The method and system according to the present invention automatically detect peaks in network communication traffic corresponding to frequently occurring content.

已经给出了本发明实现方式的以上描述，用于示出和说明的目的。这并非是穷举并且不把本发明限制在所公开的精确形式。按照以上教导可以进行修改和变化，或者通过实施本发明可得到这些修改和变化。例如，所描述的实现方式包括软件，但是当前实现方式可以被实现为硬件和软件的组合或由硬件单独实现。此外，由程序所执行的说明性处理步骤可以依照不同于上述的次序执行，并且可以包括附加处理步骤。本发明的范围由权利要求书及其等效物来定义。The foregoing descriptions of implementations of the present invention have been presented for purposes of illustration and description. This is not exhaustive and does not limit the invention to the precise forms disclosed. Modifications and variations are possible in light of the above teachings or may be obtained through practice of the invention. For example, the described implementations include software, but current implementations may be implemented as a combination of hardware and software or by hardware alone. Furthermore, the illustrative processing steps performed by the programs may be performed in an order different from that described above, and may include additional processing steps. The scope of the invention is defined by the claims and their equivalents.

当介绍本发明或其优选实施例(一个或多个)的元素时，“一”、“一个”、“这个”和“所述”意指存在一个或多个该元素。术语“包括”、“包含”和“具有”意指包括在内的并且意味着可以存在除所列出元素之外的另外元素。When referring to elements of the invention or its preferred embodiment(s), "a", "an", "the" and "said" mean that there are one or more of that element. The terms "comprising", "comprising" and "having" are meant to be inclusive and mean that there may be additional elements other than the listed elements.

在不脱离本发明范围的情况下可以在以上构造中进行各种改变，在以上描述或附图中示出的所有内容应当被解释为说明性的而没有限制性的意义。As various changes could be made in the above constructions without departing from the scope of the invention, all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative and not in a restrictive sense.

Claims

1. method that in data handling system, is used for the duplicate contents of identification data stream, described method comprises step:

Be at least one the part compute Hash functions in a plurality of parts of described data stream;

In response to institute's computed hash function result, increase the value of at least one counter in a plurality of counters, each counter is corresponding to respective computed hash function result;

When surpassing predetermined count value, the value of at least one counter in described a plurality of counters identifies described duplicate contents; And

The duplicate contents that checking is identified is not optimum string.

2. the method for claim 1 is wherein calculated a plurality of hash functions of a plurality of part parallels ground calculating that described hash function is included as described data stream.

3. method as claimed in claim 2, wherein said a plurality of counter bit are in a plurality of memory banks.

4. method as claimed in claim 3 also comprises step:

When a plurality of counters that are arranged in same bank will be increased in the identical clock period, determine the priority which counter is increased.

5. the method for claim 1 also comprises step:

Filter at least one part in a plurality of parts of described data stream to remove predetermined data.

6. the method for claim 1 also comprises step:

Usage count on average comes periodically to reduce the value of each counter in described a plurality of counter.

7. the method for claim 1 also comprises step:

Determine whether the duplicate contents that is identified is false sign.

8. method as claimed in claim 7 is wherein by comparing the duplicate contents that is identified to determine whether the duplicate contents that is identified is false sign with the duplicate contents of previous sign.

9. method as claimed in claim 8, the duplicate contents of wherein previous sign are stored in the storer away from local storage, and this local storage comprises the duplicate contents that is identified.

10. the method for claim 1 wherein uses streamline to increase the value of at least one counter in described a plurality of counter.

11. the method for claim 1, wherein said duplicate contents are the worm condition codes.

12. the method for claim 1, wherein the duplicate contents that is identified has non-predefined condition code.

13. the method for claim 1, wherein said duplicate contents is a virus signature.

14. the method for claim 1, wherein said duplicate contents is a Spam signatures.

15. the method for claim 1, wherein said duplicate contents are the contents via the network repeated exchanged.

16. the method for claim 1, wherein said duplicate contents are the user's of a plurality of access websites appearance.

17. a system that is used for identification data stream duplicate contents, described system comprises:

The hash function counting circuit is used at least one the part compute Hash functions in a plurality of parts of described data stream;

A plurality of counters in response to institute's computed hash function result, increase the value of at least one counter in a plurality of counters, and each counter is corresponding to respective computed hash function result;

The duplicate contents concentrator marker is used for identifying described duplicate contents when the value of at least one counter of described a plurality of counters surpasses predetermined count value; With

Validator is used to verify that the duplicate contents that is identified is not optimum string.

18. a plurality of hash functions of a plurality of part parallels ground calculating that described hash function is included as described data stream wherein calculate in system as claimed in claim 17.

19. system as claimed in claim 18, wherein said a plurality of counter bit are in a plurality of memory banks.

20. system as claimed in claim 19 comprises:

Priority encoder is used for determining the priority which counter is increased when a plurality of counters that are positioned at same bank are increased.

21. system as claimed in claim 17 comprises:

Filtrator, at least one part of a plurality of parts that is used for filtering described data stream is to remove predetermined data.

22. system as claimed in claim 17, wherein usage count on average comes periodically to reduce the value of each counter in described a plurality of counter.

23. system as claimed in claim 17 comprises:

Analyzer is used to determine whether the duplicate contents that is identified is false sign.

24. system as claimed in claim 23 is wherein by comparing the duplicate contents that is identified to determine whether the duplicate contents that is identified is false sign with the duplicate contents of previous sign.

25. system as claimed in claim 24, the duplicate contents of wherein previous sign is stored in the storer away from local storage, and this local storage comprises the duplicate contents that is identified.

26. system as claimed in claim 17 wherein uses streamline to increase the value of at least one counter in described a plurality of counter.

27. system as claimed in claim 17, wherein said duplicate contents is the worm condition code.

28. system as claimed in claim 17, wherein the duplicate contents that is identified has non-predefined condition code.

29. system as claimed in claim 17, wherein said duplicate contents is a virus signature.

30. system as claimed in claim 17, wherein said duplicate contents is a Spam signatures.

31. system as claimed in claim 17, wherein said duplicate contents is the content via the network repeated exchanged.

32. system as claimed in claim 17, wherein said duplicate contents is the user's of a plurality of access websites appearance.

33. a system that is used for the duplicate contents of identification data stream, described system comprises:

Be used to the device of at least one the part compute Hash functions in a plurality of parts of described data stream, at least one part of described data stream is therefrom removed optimum character to prevent that optimum string is designated duplicate contents;

Be used for increasing in response to institute's computed hash function result the device of value of at least one counter of a plurality of counters, each counter is corresponding to respective computed hash function result;

Be used for when the value of at least one counter of described a plurality of counters surpasses predetermined count value, identifying the device of described duplicate contents; With

Be used to verify that the duplicate contents that is identified is not the device of optimum string.