CN106095971B

CN106095971B - Server-based method and server for searching for data stream split points

Info

Publication number: CN106095971B
Application number: CN201610439783.2A
Authority: CN
Inventors: 于传帅; 张程伟; 徐林波
Original assignee: Huawei Technologies Co Ltd
Current assignee: XFusion Digital Technologies Co Ltd
Priority date: 2014-02-14
Filing date: 2014-02-27
Publication date: 2019-08-13
Anticipated expiration: 2034-02-27
Also published as: CN106095971A

Abstract

An embodiment of the present invention provides a server-based method for searching for data stream split points. In the embodiment, a data stream split point is searched for by determining whether at least part of the data in a window of M windows satisfies a predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N*U is skipped to obtain the next potential split point, which improves the efficiency of searching for data stream split points.

Description

Server-based method and server for searching for data stream split points

Technical field

The present invention relates to the field of information technology, and in particular to a server-based method for searching for data stream split points and to a server.

Background

The continuous growth of data volume makes providing sufficient storage a serious challenge in the storage field. One way to address this challenge is to exploit the redundancy of the data to be stored and use deduplication technology to reduce the amount of data stored.

In the prior art, a deduplication algorithm based on content-defined chunking (Content Defined Chunk, CDC) first divides the data stream to be stored into many data blocks. Dividing the data stream into data blocks requires finding suitable split points in the data stream; the data between two adjacent data stream split points forms one data block. A feature value is calculated for each data block in order to find out whether a data block with the same feature value already exists; if one is found, the data is considered duplicate. Specifically, CDC-based deduplication applies the sliding window technique to find the split points of chunks based on the content of the file, that is, it determines data stream split points by calculating the Rabin fingerprint of the data in the window. Assume that split points are searched for from the left of the data stream to the right. Each time, the fingerprint of the data in the sliding window is calculated, the fingerprint value is taken modulo a given integer K, and the result is compared with a given remainder R. If they are equal, the right end of the window is a data stream split point; otherwise the window slides one more byte to the right, and the calculation and comparison are repeated until the end of the data stream is reached. In CDC-based deduplication, searching for data stream split points consumes a large amount of computing resources and therefore becomes the bottleneck for improving deduplication performance.
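The byte-by-byte search described in the background can be sketched as follows. This is a minimal illustration rather than the prior-art implementation: a simple polynomial rolling hash stands in for the Rabin fingerprint, and the function name, the default window size, `BASE`, and `MOD` are assumptions chosen for the example.

```python
def find_split_points_baseline(data: bytes, window: int = 16,
                               K: int = 64, R: int = 0) -> list[int]:
    """Return every offset `right` such that the fingerprint of
    data[right - window:right], taken modulo K, equals R."""
    BASE = 257                # assumed hash base (stand-in for Rabin)
    MOD = (1 << 61) - 1       # assumed hash modulus
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    fp = 0
    for b in data[:window]:            # fingerprint of the first window
        fp = (fp * BASE + b) % MOD
    splits = []
    right = window                     # window covers data[right - window:right]
    while True:
        if fp % K == R:                # compare with the given remainder R
            splits.append(right)
        if right == len(data):         # reached the end of the data stream
            break
        # Slide the window one byte to the right.
        fp = ((fp - data[right - window] * top) * BASE + data[right]) % MOD
        right += 1
    return splits
```

Every position of the stream is visited once, which is the cost that the method described in the following sections aims to reduce by skipping.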

Summary of the invention

In a first aspect, an embodiment of the present invention provides a server-based method for searching for data stream split points. A rule is preset on the server, and the rule is: for a potential split point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers. The method includes:

a) determining, according to the rule, a point p_iz and the window W_iz[p_iz - A_z, p_iz + B_z] corresponding to the point p_iz for the current potential split point k_i, where i and z are integers and 1 ≤ z ≤ M;

b) determining whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z;

when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not satisfy the predetermined condition C_z, skipping N units U from the point p_iz along the direction of searching for data stream split points, where U is the minimum unit for searching for data stream split points and N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i - p_ix)‖), to obtain a new potential split point, and performing step a);

c) when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.
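Steps a) to c) can be sketched as the loop below. This is an illustration under stated assumptions, not the patented implementation: the search runs left to right, each point is placed at `p_x = k - offset`, the window W_x[p_x - A_x, p_x + B_x] is read as the slice `data[p_x - A_x : p_x + B_x]`, the windows are checked in rule order, and the largest N allowed by the bound is always used. The names `WindowRule`, `max_skip`, and `find_split_point` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class WindowRule:
    offset: int                      # assumed placement: p_x = k - offset
    A: int                           # window extends A units before p_x
    B: int                           # window extends B units after p_x
    cond: Callable[[bytes], bool]    # predetermined condition C_x


def max_skip(rules: Sequence[WindowRule], z: int) -> int:
    """Upper bound on N*U: |B_z| + max_x(|A_x| + |k_i - p_ix|)."""
    return abs(rules[z].B) + max(abs(r.A) + abs(r.offset) for r in rules)


def find_split_point(data: bytes, rules: Sequence[WindowRule],
                     start_k: int, U: int = 1) -> Optional[int]:
    """Return the first potential split point whose M windows all pass."""
    k = start_k
    while True:
        failed = None
        for z, r in enumerate(rules):            # step a): point and window
            p = k - r.offset
            lo, hi = p - r.A, p + r.B
            if lo < 0:
                raise ValueError("start_k leaves no room for the windows")
            if hi > len(data):
                return None                      # ran past the end of the stream
            if not r.cond(data[lo:hi]):          # step b): condition C_z fails
                failed = (z, p)
                break
        if failed is None:
            return k                             # step c): k is a split point
        z, p = failed
        n = max_skip(rules, z) // U              # largest N the bound allows
        k = max(p + n * U, k + U)                # guard so the search advances
```

With two rules of width 4 placed one unit apart, for example, a failed first window moves the candidate forward by 4 units at a time instead of 1. Candidates inside the skipped range are never examined, which is the intended trade-off of the method.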

With reference to the first aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy the conditions A_e = A_f, B_e = B_f, and C_e = C_f.

With reference to the first possible implementation of the first aspect, in a second possible implementation, the rule further includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the direction of searching for data stream split points.

With reference to the first or second possible implementation of the first aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.

With reference to the first aspect, or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, determining whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z specifically includes:

using a random function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, using a random function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z is specifically: using a hash function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z.

With reference to the first aspect, or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not satisfy the predetermined condition C_z, N minimum search units U are skipped from the point p_iz along the direction of searching for data stream split points to obtain the new potential split point. According to the rule, either the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] corresponding to the point p_ic determined for the new potential split point coincides with the right boundary of the window W_iz[p_iz - A_z, p_iz + B_z], or the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] lies within the range of the window W_iz[p_iz - A_z, p_iz + B_z]. Here, the point p_ic determined for the new potential split point is the point ranked first in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential split point according to the rule.

With reference to the fourth possible implementation of the first aspect, in a seventh possible implementation, using a random function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z specifically includes:

selecting F bytes in the window W_iz[p_iz - A_z, p_iz + B_z] and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent bits 1 to 8 of the m-th byte of the F*H bytes. The bits of the F*H bytes can therefore be arranged as an (F*H) × 8 matrix whose row m is a_m,1 … a_m,8. When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Converting the bits of the F*H bytes according to this relationship between a_m,n and V_am,n yields a matrix V_a, an (F*H) × 8 matrix whose row m is V_am,1 … V_am,8. F*H*8 random numbers are selected from normally distributed random numbers to form a matrix R, an (F*H) × 8 matrix whose row m is h_m,1 … h_m,8. The entries of row m of the matrix V_a are multiplied by the random numbers of row m of the matrix R and the products are summed to obtain a value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, … S_aF*H are obtained in the same way. The number K of values greater than 0 among S_a1, S_a2, … S_aF*H is counted; when K is even, at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z.
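The condition check described above can be sketched as follows. The sketch makes several assumptions that the text does not fix: "reusing the F bytes H times" is read as concatenating H copies, bit 1 of a byte is taken to be its most significant bit, and the matrix R is drawn once from a seeded generator so that every window is tested against the same random numbers. The names `make_R` and `window_satisfies` are hypothetical.

```python
import random


def make_R(F: int, H: int, seed: int = 0) -> list[list[float]]:
    """Build the (F*H) x 8 matrix R of normally distributed random numbers."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(F * H)]


def window_satisfies(selected: bytes, H: int, R: list[list[float]]) -> bool:
    """Return True when the number K of positive row sums S_am is even."""
    repeated = selected * H                  # F*H bytes (assumed: concatenation)
    if len(R) != len(repeated):
        raise ValueError("R must have one row per byte of the F*H bytes")
    K = 0
    for m, byte in enumerate(repeated):
        s = 0.0
        for n in range(8):
            bit = (byte >> (7 - n)) & 1      # a_m,n (assumed: MSB is bit 1)
            v = 1 if bit else -1             # V_am,n
            s += v * R[m][n]                 # V_am,n * h_m,n
        if s > 0:                            # S_am > 0
            K += 1
    return K % 2 == 0
```

A caller would typically build `R` once, for example with `make_R(F, H)`, and pass the F bytes selected from each window to `window_satisfies`.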

In a second aspect, an embodiment of the present invention provides a server-based method for searching for data stream split points. A rule is preset on the server, and the rule is: for a potential split point k, determine M windows W_x[k - A_x, k + B_x] and a predetermined condition C_x corresponding to the window W_x[k - A_x, k + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;

The method includes:

a) determining, according to the rule, the corresponding window W_iz[k_i - A_z, k_i + B_z] for the current potential split point k_i, where i and z are integers and 1 ≤ z ≤ M;

b) determining whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z;

when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not satisfy the predetermined condition C_z, skipping N units U from the current potential split point k_i along the direction of searching for data stream split points, where U is the minimum unit for searching for data stream split points and N*U is not greater than ‖B_z‖ + max_x(‖A_x‖), to obtain a new potential split point, and performing step a);

c) when at least part of the data in each window W_ix[k_i - A_x, k_i + B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.

With reference to the first possible implementation of the second aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.

With reference to the first or second possible implementation of the second aspect, in a third possible implementation, the rule further includes: A_e - 1 = A_f and B_e + 1 = B_f.
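The relation A_e - 1 = A_f, B_e + 1 = B_f means that the two windows have the same width and that one is the other shifted by one unit. The sketch below applies that relation to a whole chain of M windows, which is an assumption made for illustration (the text states it only for a pair of windows); the function name `shifted_windows` is hypothetical.

```python
def shifted_windows(k: int, A1: int, B1: int, M: int) -> list[tuple[int, int]]:
    """Boundaries of W_x[k - A_x, k + B_x] for x = 1..M, where each window
    is the previous one shifted by one unit: A_(x+1) = A_x - 1 and
    B_(x+1) = B_x + 1."""
    if A1 - (M - 1) < 1:
        raise ValueError("every A_x must remain a positive integer")
    return [(k - (A1 - x), k + (B1 + x)) for x in range(M)]
```

For `k = 100`, `A1 = 5`, `B1 = 0`, and `M = 3`, this yields the windows (95, 100), (96, 101), and (97, 102), each 5 units wide.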

With reference to the second aspect, or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, determining whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z specifically includes:

using a random function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, using a random function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z is specifically: using a hash function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z.

With reference to the second aspect, or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not satisfy the predetermined condition C_z, N minimum search units U are skipped from the current potential split point k_i along the direction of searching for data stream split points to obtain the new potential split point. According to the rule, either the left boundary of the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential split point coincides with the right boundary of the window W_iz[k_i - A_z, k_i + B_z], or the left boundary of the window W_ic[k_i - A_c, k_i + B_c] lies within the range of the window W_iz[k_i - A_z, k_i + B_z]. Here, the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential split point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential split point according to the rule.
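The boundary relationship can be checked with a short calculation. The sketch assumes a left-to-right search, non-negative A_x and B_x, and that the "first" window of the new potential split point is the one with the smallest left boundary; the function names are hypothetical.

```python
def largest_skip(A: list[int], B: list[int], z: int, U: int = 1) -> int:
    """Largest N such that N*U does not exceed |B_z| + max_x(|A_x|)."""
    return (abs(B[z]) + max(abs(a) for a in A)) // U


def boundaries_after_skip(k_i: int, A: list[int], B: list[int],
                          z: int, N: int, U: int = 1) -> tuple[int, int]:
    """Return (right boundary of the failed window W_iz,
    left boundary of the first window of the new potential split point)."""
    k_new = k_i + N * U                       # new potential split point
    failed_right = k_i + B[z]
    first_left = min(k_new - a for a in A)    # leftmost window of k_new
    return failed_right, first_left
```

With `A = [5, 4, 3]`, `B = [0, 1, 2]`, `z = 1`, and `k_i = 100`, the largest skip is N = 6 and both boundaries equal 101, so the windows touch. A smaller skip such as N = 4 puts the new left boundary at 99, inside the failed window [96, 101].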

With reference to the fourth possible implementation of the second aspect, in a seventh possible implementation, using a random function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z specifically includes:

selecting F bytes in the window W_iz[k_i - A_z, k_i + B_z] and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent bits 1 to 8 of the m-th byte of the F*H bytes. The bits of the F*H bytes can therefore be arranged as an (F*H) × 8 matrix whose row m is a_m,1 … a_m,8. When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Converting the bits of the F*H bytes according to this relationship between a_m,n and V_am,n yields a matrix V_a, an (F*H) × 8 matrix whose row m is V_am,1 … V_am,8. F*H*8 random numbers are selected from normally distributed random numbers to form a matrix R, an (F*H) × 8 matrix whose row m is h_m,1 … h_m,8. The entries of row m of the matrix V_a are multiplied by the random numbers of row m of the matrix R and the products are summed to obtain a value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, … S_aF*H are obtained in the same way. The number K of values greater than 0 among S_a1, S_a2, … S_aF*H is counted; when K is even, at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z.

In a third aspect, an embodiment of the present invention provides a server for searching for data stream split points. The server includes a central processing unit and a main memory, and the central processing unit communicates with the main memory. A rule is preset on the server, and the rule is: for a potential split point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;

The main memory is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:

c) when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.

With reference to the third aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy the conditions A_e = A_f, B_e = B_f, and C_e = C_f.

With reference to the first possible implementation of the third aspect, in a second possible implementation, the rule further includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the direction of searching for data stream split points.

With reference to the first or second possible implementation of the third aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.

With reference to the third aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the central processing unit is specifically configured to

With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation, the central processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z.

With reference to the third aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not satisfy the predetermined condition C_z, N minimum search units U are skipped from the point p_iz along the direction of searching for data stream split points to obtain the new potential split point. According to the rule, either the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] corresponding to the point p_ic determined for the new potential split point coincides with the right boundary of the window W_iz[p_iz - A_z, p_iz + B_z], or the left boundary of the window W_ic[p_ic - A_c, p_ic + B_c] lies within the range of the window W_iz[p_iz - A_z, p_iz + B_z]. Here, the point p_ic determined for the new potential split point is the point ranked first in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential split point according to the rule.

With reference to the fourth possible implementation of the third aspect, in a seventh possible implementation, the central processing unit using a random function to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z specifically includes:

selecting F bytes in the window W_iz[p_iz - A_z, p_iz + B_z] and reusing the F bytes H times to obtain F*H bytes in total, where each byte consists of 8 bits, denoted a_m,1 … a_m,8, which represent bits 1 to 8 of the m-th byte of the F*H bytes. The bits of the F*H bytes can therefore be arranged as an (F*H) × 8 matrix whose row m is a_m,1 … a_m,8. When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Converting the bits of the F*H bytes according to this relationship between a_m,n and V_am,n yields a matrix V_a, an (F*H) × 8 matrix whose row m is V_am,1 … V_am,8. F*H*8 random numbers are selected from normally distributed random numbers to form a matrix R, an (F*H) × 8 matrix whose row m is h_m,1 … h_m,8. The entries of row m of the matrix V_a are multiplied by the random numbers of row m of the matrix R and the products are summed to obtain a value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2, … S_aF*H are obtained in the same way. The number K of values greater than 0 among S_a1, S_a2, … S_aF*H is counted; when K is even, at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z.

In a fourth aspect, an embodiment of the present invention provides a server for searching for data stream split points. The server includes a central processing unit and a main memory, and the central processing unit communicates with the main memory. A rule is preset on the server, and the rule is: for a potential split point k, determine M windows W_x[k - A_x, k + B_x] and a predetermined condition C_x corresponding to the window W_x[k - A_x, k + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;

The main memory is configured to store executable instructions, and the central processing unit executes the executable instructions to perform the following steps:

With reference to the first possible implementation of the fourth aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.

With reference to the first or second possible implementation of the fourth aspect, in a third possible implementation, the rule further includes: A_e - 1 = A_f and B_e + 1 = B_f.

With reference to the fourth aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the central processing unit is specifically configured to

With reference to the fourth possible implementation of the fourth aspect, in a fifth possible implementation, the central processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z.

With reference to the fourth aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, when at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] does not satisfy the predetermined condition C_z, N minimum search units U are skipped from the current potential split point k_i along the direction of searching for data stream split points to obtain the new potential split point. According to the rule, either the left boundary of the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential split point coincides with the right boundary of the window W_iz[k_i - A_z, k_i + B_z], or the left boundary of the window W_ic[k_i - A_c, k_i + B_c] lies within the range of the window W_iz[k_i - A_z, k_i + B_z]. Here, the window W_ic[k_i - A_c, k_i + B_c] determined for the new potential split point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential split point according to the rule.

With reference to the fourth possible implementation of the fourth aspect, in a seventh possible implementation, the central processing unit using a random function to determine whether at least part of the data in the window W_iz[k_i - A_z, k_i + B_z] satisfies the predetermined condition C_z specifically includes:

In a fifth aspect, an embodiment of the present invention provides a server for searching for data stream split points. A rule is preset on the server, and the rule is: for a potential split point k, determine M points p_x, a window W_x[p_x - A_x, p_x + B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x - A_x, p_x + B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;

The server includes: a processing unit, configured to perform step a):

a judging processing unit, configured to determine whether at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] satisfies the predetermined condition C_z;

when at least part of the data in the window W_iz[p_iz - A_z, p_iz + B_z] does not satisfy the predetermined condition C_z, N units U are skipped from the point p_iz along the direction of searching for data stream split points, where U is the minimum unit for searching for data stream split points and N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i - p_ix)‖), to obtain a new potential split point, and the determining unit performs step a) for the new potential split point;

when at least part of the data in each window W_ix[p_ix - A_x, p_ix + B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.

With reference to the fifth aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.

With reference to the first possible implementation of the fifth aspect, in a second possible implementation, the rule further includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the data stream split-point search direction.

With reference to the first or second possible implementation of the fifth aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.

With reference to the fifth aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the judging processing unit specifically uses a random function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

With reference to the fourth possible implementation of the fifth aspect, in a fifth possible implementation, the judging processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

With reference to the fifth aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, the judging processing unit is configured to: when at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, jump N minimum search units U of a data stream split point from the point p_iz along the split-point search direction to obtain the new potential split point. The determining unit performs step a) for the new potential split point. According to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to the point p_ic determined for the new potential split point either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of the window W_iz[p_iz-A_z, p_iz+B_z]. Here the window W_ic[p_ic-A_c, p_ic+B_c] is the window of the point that ranks first when the M points determined for the new potential split point according to the rule are ordered along the data stream search direction.
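The boundary relationship in this implementation can be checked with a short derivation. It is only a sketch under stated assumptions that the text does not spell out here: all M points lie at or before the potential split point, A_x and B_x are non-negative, the rule places the points at the same offsets for every potential split point, the maximum D below is attained by the first point p_c in the search-direction ordering, and the largest jump allowed by the bound is taken. The primed symbols k' and p'_c (the new potential split point and its first point) are notation introduced for this illustration.

```latex
\begin{aligned}
D &= \max_x\bigl(\lVert A_x\rVert + \lVert k_i - p_{ix}\rVert\bigr), \\
k' &= p_{iz} + N\,U, \qquad N\,U = \lVert B_z\rVert + D \quad\text{(largest allowed jump)}, \\
p'_{c} - A_c &= k' - \bigl(\lVert A_c\rVert + \lVert k' - p'_{c}\rVert\bigr) = k' - D = p_{iz} + \lVert B_z\rVert .
\end{aligned}
```

Under these assumptions the left boundary of the first window of the new potential split point equals p_iz + B_z, the right boundary of the window that failed. A jump shorter than the maximum by at most ‖A_z‖+‖B_z‖ moves that left boundary inside the failed window instead.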

With reference to the fourth possible implementation of the fifth aspect, in a seventh possible implementation, the judging processing unit being specifically configured to use a random function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically includes:

In a sixth aspect, an embodiment of the present invention provides a server for searching for data stream split points. A rule is preset on the server: for a potential split point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to M, M≥2, and A_x and B_x are integers;

The server includes: a determining unit, configured to perform step a):

a) determine, according to the rule, the corresponding window W_iz[k_i-A_z, k_i+B_z] for the current potential split point k_i, where i and z are integers and 1≤z≤M;

a judging processing unit, configured to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z;

c) When at least part of the data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.

With reference to the first possible implementation of the sixth aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.

With reference to the first or second possible implementation of the sixth aspect, in a third possible implementation, the rule further includes: A_e-1 = A_f and B_e+1 = B_f.

With reference to the sixth aspect, or any one of the first to third possible implementations, in a fourth possible implementation, the judging processing unit is specifically configured to

With reference to the fourth possible implementation of the sixth aspect, in a fifth possible implementation, the judging processing unit is specifically configured to use a hash function to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.

With reference to the sixth aspect, or any one of the first to fifth possible implementations, in a sixth possible implementation, the judging processing unit is configured to: when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jump N minimum search units U of a data stream split point from the current potential split point k_i along the split-point search direction to obtain the new potential split point. The determining unit performs step a) for the new potential split point. According to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point either coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z] or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here the window W_ic[k_i-A_c, k_i+B_c] is the window that ranks first when the M windows determined for the new potential split point according to the rule are ordered along the data stream search direction.

With reference to the fourth possible implementation of the sixth aspect, in a seventh possible implementation, the judging processing unit using a random function to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically includes:

In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium configured to store executable instructions; a server executes the executable instructions to search for data stream split points. A rule is preset on the server: for a potential split point k, determine M points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to M, M≥2, and A_x and B_x are integers;

When the server executes the executable instructions, the following steps are performed:

With reference to the seventh aspect, in a first possible implementation, the rule further includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.

With reference to the first possible implementation of the seventh aspect, in a second possible implementation, the rule further includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the data stream split-point search direction.

With reference to the first or second possible implementation of the seventh aspect, in a third possible implementation, the rule further includes: the distance between the at least two points p_e and p_f is 1 U.

With reference to the seventh aspect, or any one of the first to third possible implementations of the seventh aspect, in a fourth possible implementation, the server determining whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically includes:

The server uses a random function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

With reference to the fourth possible implementation of the seventh aspect, in a fifth possible implementation, the server using a random function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically includes:

The server uses a hash function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

With reference to the seventh aspect, or any one of the first to fifth possible implementations of the seventh aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, jump N minimum search units U of a data stream split point from the point p_iz along the split-point search direction to obtain the new potential split point. According to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to the point p_ic determined for the new potential split point either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of the window W_iz[p_iz-A_z, p_iz+B_z]. Here the point p_ic is the point that ranks first when the M points determined for the new potential split point according to the rule are ordered along the data stream search direction.

With reference to the fourth possible implementation of the seventh aspect, in a seventh possible implementation, using a random function to determine whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically includes:

In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium configured to store executable instructions; a server executes the executable instructions to search for data stream split points. A rule is preset on the server: for a potential split point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to M, M≥2, and A_x and B_x are integers. When the server executes the executable instructions, the following steps are performed:

With reference to the first possible implementation of the eighth aspect, in a second possible implementation, the rule further includes: A_e and A_f are positive integers.

With reference to the first or second possible implementation of the eighth aspect, in a third possible implementation, the rule further includes: A_e-1 = A_f and B_e+1 = B_f.

With reference to the eighth aspect, or any one of the first to third possible implementations of the eighth aspect, in a fourth possible implementation, the server determining whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically includes:

With reference to the fourth possible implementation of the eighth aspect, in a fifth possible implementation, the server using a random function to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z is specifically: the server uses a hash function to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.

With reference to the eighth aspect, or any one of the first to fifth possible implementations of the eighth aspect, in a sixth possible implementation, when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jump N minimum search units U of a data stream split point from the current potential split point k_i along the split-point search direction to obtain the new potential split point. According to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point either coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z] or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here the window W_ic[k_i-A_c, k_i+B_c] is the window that ranks first when the M windows determined for the new potential split point according to the rule are ordered along the data stream search direction.

With reference to the fourth possible implementation of the eighth aspect, in a seventh possible implementation, using a random function to determine whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically includes:

Select F bytes in the window W_iz[k_i-A_z, k_i+B_z] and reuse the F bytes H times to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which are the 1st to 8th bits of the m-th byte of the F*H bytes, so the bits of the F*H bytes can be written as an (F*H)×8 matrix whose m-th row is (a_m,1, a_m,2, …, a_m,8). When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n is any one of a_m,1 … a_m,8. Applying this conversion to every bit yields the matrix V_a, an (F*H)×8 matrix whose m-th row is (V_am,1, V_am,2, …, V_am,8). Select F*H*8 random numbers from random numbers following a normal distribution to form the matrix R, an (F*H)×8 matrix whose m-th row is (h_m,1, h_m,2, …, h_m,8). Multiply the entries of the m-th row of V_a by the corresponding random numbers in the m-th row of R and sum the products to obtain the value S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. In the same way obtain S_a1, S_a2, …, S_aF*H, and count the number K of values among them that are greater than 0. When K is even, at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.
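The random-function check described above can be illustrated with a minimal Python sketch. Several details are assumptions made for illustration and are not fixed by the text: the function names, the fixed seed used to make the matrix R reproducible, reading "reuse H times" as simple repetition of the F selected bytes, and treating the "1st bit" of a byte as its most significant bit.

```python
import random
from typing import List, Sequence


def make_random_matrix(rows: int, seed: int = 0) -> List[List[float]]:
    """Build a matrix R with `rows` rows of 8 normally distributed numbers."""
    rng = random.Random(seed)  # fixed seed is an assumption, so R is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(rows)]


def window_satisfies_condition(selected: bytes, h: int,
                               r: Sequence[Sequence[float]]) -> bool:
    """Return True when the count K of positive row sums S_am is even.

    selected: the F bytes chosen from the window.
    h:        how many times the F bytes are reused (H).
    r:        the matrix R with F*H rows and 8 columns.
    """
    reused = bytes(selected) * h  # F*H bytes (assumed: plain repetition)
    if len(r) != len(reused):
        raise ValueError("R must have one row per reused byte (F*H rows)")
    k = 0
    for byte, row in zip(reused, r):
        s = 0.0
        for n in range(8):
            bit = (byte >> (7 - n)) & 1        # assumed: bit 1 is the MSB
            s += (1 if bit else -1) * row[n]   # V_am,n * h_m,n
        if s > 0:
            k += 1
    return k % 2 == 0
```

With a hand-written R whose entries are all 1.0, a byte 0xFF gives a row sum of 8 and a byte 0x00 gives -8, which makes the parity rule easy to check by hand.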

In the embodiments of the present invention, a data stream split point is searched for by determining whether at least part of the data in one of the M windows satisfies the predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N*U is skipped to obtain the next potential split point, which improves the efficiency of searching for data stream split points.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of data stream split points;

FIG. 3 is a schematic diagram of searching for a data stream split point;

FIG. 4 is a schematic diagram of a method according to an embodiment of the present invention;

FIG. 5 and FIG. 6 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 7 and FIG. 8 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 9 and FIG. 10 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 11, FIG. 12, and FIG. 13 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 14 and FIG. 15 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 16 and FIG. 17 are schematic diagrams of determining whether at least part of the data in a window satisfies a predetermined condition;

FIG. 18 is a structural diagram of a deduplication server;

FIG. 19 is a structural diagram of a deduplication server;

FIG. 20 is a schematic diagram of a method according to an embodiment of the present invention;

FIG. 21 and FIG. 22 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 23 and FIG. 24 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 25 and FIG. 26 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 27, FIG. 28, and FIG. 29 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 30 and FIG. 31 are schematic diagrams of an implementation of searching for a data stream split point;

FIG. 32 and FIG. 33 are schematic diagrams of determining whether at least part of the data in a window satisfies a predetermined condition;

FIG. 34 and FIG. 35 are schematic diagrams of the probability of determining whether a predetermined condition is satisfied.

Detailed Description of Embodiments

As storage technology advances, the amount of data generated keeps growing, and this volume of data places ever higher demands on storage capacity. Larger storage capacity also raises the procurement cost of IT equipment. Data deduplication technology was introduced in the data storage field to ease the tension between data volume and storage capacity and to reduce IT equipment procurement costs.

One usage scenario of the embodiments of the present invention is data backup. Data backup is the process of copying data to other storage media through a backup server to prevent data loss from various causes. FIG. 1 shows the architecture of a data backup system. The data backup system includes clients (101a, 101b…101n), a backup server 102, a data deduplication server 103 (deduplication server for short), and storage devices (104a, 104b…104n). The clients (101a, 101b…101n) may be application servers, workstations, and the like. The backup server 102 backs up data generated by the clients. The deduplication server 103 performs the deduplication task on the backup data. The storage devices (104a, 104b…104n) store the deduplicated data and may be storage media such as disk arrays or tape libraries. The clients (101a, 101b…101n), the backup server 102, the deduplication server 103, and the storage devices (104a, 104b…104n) may be connected through switches, a local area network, the Internet, optical fiber, and the like, and may be located at the same site or at different sites. The backup server 102, the deduplication server 103, and the storage devices (104a, 104b…104n) may be independent physical devices, or in a specific implementation they may be physically integrated into one device, or the backup server 102 may be integrated with the deduplication server 103, or the deduplication server 103 may be integrated with the storage devices (104a, 104b…104n), and so on.

The deduplication server 103 performs the deduplication operation on the data stream of the backup data, which generally includes the following steps:

1) Data stream split-point search: search for data stream split points in the data stream according to a specific algorithm;

2) Divide the data stream into data blocks at the split points found;

3) Calculate the feature value of each data block: the feature value identifies the data block and is added to the feature list of the data blocks of the file corresponding to the data stream; the SHA-1 or MD5 algorithm is generally used to calculate the feature value;

4) Identical data block detection: compare the calculated feature value of a data block with the feature values already in the data block feature list to determine whether an identical data block exists;

5) Delete duplicate data blocks: if identical data block detection finds a feature value in the data block feature list that equals that of the data block, the data block does not need to be stored again, or whether to store it is decided by the number of duplicate copies that the backup policy allows.
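Steps 2) to 5) above can be sketched in a few lines of Python, assuming the split points have already been found by step 1). The function name `deduplicate`, the in-memory dictionary used as the feature list, and the policy of keeping exactly one copy of each block are illustrative assumptions, not the server's actual implementation.

```python
import hashlib
from typing import Dict, Iterable, List


def deduplicate(stream: bytes, split_points: Iterable[int],
                store: Dict[str, bytes]) -> List[str]:
    """Cut `stream` at `split_points` and store each distinct block once.

    store:   hypothetical feature list mapping SHA-1 digest -> block data.
    returns: the ordered list of digests needed to rebuild the stream.
    """
    recipe: List[str] = []
    start = 0
    # The end of the stream closes the last block.
    for end in list(split_points) + [len(stream)]:
        if end <= start:
            continue  # ignore empty or out-of-order blocks
        block = stream[start:end]                      # step 2: divide
        feature = hashlib.sha1(block).hexdigest()      # step 3: feature value
        if feature not in store:                       # step 4: detect
            store[feature] = block                     # step 5: store new blocks only
        recipe.append(feature)
        start = end
    return recipe
```

For the stream `b"abcabcxyz"` cut at offsets 3 and 6, the first two blocks share one feature value, so only two blocks are stored while the recipe still lists three entries.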

As these steps show, the data stream split-point search is a key step of the deduplication operation performed by the deduplication server 103 on the data stream of the backup data, and it directly determines deduplication performance.

In this embodiment of the present invention, the deduplication server 103 receives a backup file sent by the backup server 102 and performs deduplication on the file. The backup file to be processed is usually presented in the deduplication server 103 as a data stream. When searching for split points in the data stream, the deduplication server 103 usually first determines the minimum search unit of a data stream split point, as shown in FIG. 2. For example, the potential split point k₁ lies between the two consecutive minimum search units numbered 1 and 2; a potential split point is a point that must be examined to decide whether it can serve as a data stream split point. When the point k₁ is a data stream split point, the search continues in the direction shown by the arrow in FIG. 2, and the next potential split point is k₇, which lies between the two consecutive minimum search units numbered 7 and 8. When the potential split point k₇ is a data stream split point, the data between the two adjacent data stream split points k₁ and k₇ forms one data block. The minimum search unit can be chosen according to actual needs; here 1 byte is used as an example, that is, the minimum search units numbered 1, 2, 7, and 8 are each 1 byte in size. The search direction shown in FIG. 2 usually runs from the file header to the file tail, or from the file tail to the file header; this embodiment uses the search from file header to file tail as an example.

In a deduplication scenario, smaller data blocks usually give a higher deduplication rate because duplicate blocks are easier to find, but they also generate more metadata. Once the data blocks are small enough, the deduplication rate stops increasing while the amount of metadata grows sharply. The data block size must therefore be controlled. In practice, a minimum data block size is usually set, for example 4 KB (4096 bytes); with the deduplication rate in mind, a maximum data block size is also set that a block may not exceed, for example 12 KB (12288 bytes). A specific implementation is shown in FIG. 3. The deduplication server 103 searches for data stream split points in the direction of the arrow, and k_a is the most recently found data stream split point. To find the next potential split point from k_a while meeting the minimum block size, the search usually skips the minimum data block size from the split point along the search direction and starts at the end of the minimum data block, that is, the end position of the minimum data block is taken as the next potential split point k_i. In this embodiment of the present invention, the search can first skip the 4 KB minimum data block, that is, 4*1024 = 4096 bytes, from k_a along the search direction. Jumping 4096 bytes from k_a yields the point k_i at the end of the 4096th byte as a potential split point; for example, k_i lies between the two consecutive minimum search units numbered 4096 and 4097. Still using FIG. 3 as an example, with k_a as the most recently found data stream split point and the search running in the direction shown, if no next data stream split point has been found when the maximum data block size is exceeded, the point k_z at which the maximum data block size is reached from k_a in the search direction is taken as the next data stream split point, and a forced split is performed.
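The minimum and maximum block-size handling can be sketched as follows, using the 4 KB and 12 KB example values from the text. The function name and the `is_split_point` predicate are hypothetical placeholders for whatever test decides that a potential split point is a real one. The loop advances one byte at a time purely to keep the sketch short; it does not show the window-based skipping that the method introduces.

```python
from typing import Callable

MIN_BLOCK = 4 * 1024    # 4096 bytes, example minimum from the text
MAX_BLOCK = 12 * 1024   # 12288 bytes, example maximum from the text


def next_split_point(stream_len: int, k_a: int,
                     is_split_point: Callable[[int], bool]) -> int:
    """Return the next split point after the previous split point k_a."""
    k = k_a + MIN_BLOCK                         # first potential split point k_i
    limit = min(k_a + MAX_BLOCK, stream_len)    # forced split point k_z or end of stream
    while k < limit:
        if is_split_point(k):
            return k
        k += 1   # simplification: step by one minimum search unit (1 byte)
    return limit  # forced split at the maximum block size (or end of stream)
```

A candidate that lies before `k_a + MIN_BLOCK` is never examined, and a stream with no qualifying point is cut at exactly `k_a + MAX_BLOCK`.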

An embodiment of the present invention provides a method for searching for a data stream split point based on a deduplication server. As shown in FIG. 4, the method includes:

A rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine M points p_x, a window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where x takes each consecutive natural number from 1 to M, M ≥ 2, and A_x and B_x are integers. The distance between p_x and the potential split point k is d_x minimum search units of a data stream split point; the minimum search unit is denoted by U, and in this embodiment U = 1 byte. In the implementation shown in FIG. 3, one way to choose M is to make M*U not greater than the preset maximum distance between two adjacent data stream split points, that is, the preset maximum data block length. It is judged whether at least part of the data in the window W_z[p_z-A_z, p_z+B_z] corresponding to a point p_z satisfies the predetermined condition C_z, where z is an integer, 1 ≤ z ≤ M, and (p_z-A_z) and (p_z+B_z) are the two boundaries of the window W_z. When at least part of the data in the window W_z[p_z-A_z, p_z+B_z] of any point p_z does not satisfy the predetermined condition C_z, N bytes are skipped along the search direction from the point p_z corresponding to that window, where N ≤ ‖B_z‖ + max_x(‖A_x‖ + ‖(k-p_x)‖). Here ‖(k-p_x)‖ is the distance between a point p_x and the potential split point k; max_x(‖A_x‖ + ‖(k-p_x)‖) is the maximum, over the M points p_x, of the sum of that distance and the absolute value of the corresponding A_x; and ‖B_z‖ is the absolute value of B_z in W_z[p_z-A_z, p_z+B_z]. The principle behind the choice of N is described in the following embodiments. When at least part of the data in every one of the M windows W_x[p_x-A_x, p_x+B_x] satisfies the corresponding predetermined condition C_x, the potential split point k is a data stream split point.
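The upper bound on the jump length can be checked with a short worked computation. This is only a sketch: the helper name `max_jump` is hypothetical, the lists hold A_x, B_x and d_x = ‖(k-p_x)‖ for x = 1..M, and the sample values (eleven windows with A_x = 169, B_x = 0, d_x = 0..10) are the ones used in the FIG. 5 embodiment later in this description.

```python
from typing import Sequence


def max_jump(A: Sequence[int], B: Sequence[int], d: Sequence[int], z: int) -> int:
    """Upper bound ‖B_z‖ + max_x(‖A_x‖ + ‖k - p_x‖) for the window with 1-based index z."""
    return abs(B[z - 1]) + max(abs(a) + abs(dist) for a, dist in zip(A, d))


A = [169] * 11          # A_1 .. A_11
B = [0] * 11            # B_1 .. B_11
d = list(range(11))     # p_1 .. p_11 lie 0 .. 10 bytes before k

print(max_jump(A, B, d, z=5))  # 179
```

With these values the bound does not depend on which window fails, because every B_z is 0.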

Specifically, for the current potential split point k_i, the following steps are performed according to the rule:

Step 401: According to the rule, determine a point p_iz and the window W_iz[p_iz-A_z, p_iz+B_z] corresponding to the point p_iz for the current potential split point k_i, where i and z are integers and 1 ≤ z ≤ M.

Step 402: Judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

When at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, skip N minimum search units U from the point p_iz along the search direction of the data stream split point, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), to obtain a new potential split point, and perform step 401 again.

Further, the rule also includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f.

The rule also includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the search direction of the data stream split point.

The rule also includes: the distance between the at least two points p_e and p_f is 1 U.

Judging whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically includes:

The random function used to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z is specifically a hash function.

When at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, N minimum search units U are skipped from the point p_iz along the search direction of the data stream split point to obtain the new potential split point. The left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to a point p_ic determined for the new potential split point according to the rule either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of the window W_iz[p_iz-A_z, p_iz+B_z]. Here the point p_ic is the first point in the sequence obtained by ordering, along the search direction of the data stream, the M points determined for the new potential split point according to the rule.

In this embodiment of the present invention, a data stream split point is searched for by judging whether at least part of the data in one of the M windows satisfies the predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N*U is skipped, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), to obtain the next potential split point, which improves the efficiency of searching for data stream split points.
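Steps 401 and 402 and the jump that follows a failed window can be sketched as a single search loop. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `find_split_point`, the `(d_x, A_x, B_x)` tuple layout, the evaluation order x = 1..M, and the toy condition are all hypothetical, and a window is taken to be the byte slice `data[p - A : p + B]`. The caller is assumed to pass a jump that exceeds every d_x (so the search always advances) and that respects the bound on N*U.

```python
from typing import Callable, Sequence, Tuple

# One entry per point p_x: (d_x, A_x, B_x), all in units of U = 1 byte.
Rule = Sequence[Tuple[int, int, int]]


def find_split_point(data: bytes, k_a: int, rule: Rule,
                     satisfies: Callable[[bytes, int], bool],
                     jump: int, min_chunk: int, max_chunk: int) -> int:
    """Return the next split point after k_a, forced at the maximum block size."""
    limit = min(k_a + max_chunk, len(data))     # forced split position
    k = k_a + min_chunk                         # first potential split point
    while k < limit:
        failed_p = None
        for x, (d, a, b) in enumerate(rule, start=1):
            p = k - d                           # step 401: point p_x for candidate k
            window = data[max(p - a, 0):p + b]  # window W_x[p_x - A_x, p_x + B_x]
            if not satisfies(window, x):        # step 402: predetermined condition C_x
                failed_p = p
                break
        if failed_p is None:
            return k                            # all M windows satisfied their conditions
        k = failed_p + jump                     # skip N units from the failed point
    return limit


# Toy run: byte i holds the value i, and the (hypothetical) condition only
# looks at the last byte of a window.
data = bytes(range(40))
rule = [(d, 2, 0) for d in range(3)]            # M = 3, A_x = 2, B_x = 0
toy_condition = lambda window, x: window[-1] >= 9
print(find_split_point(data, 0, rule, toy_condition,
                       jump=3, min_chunk=4, max_chunk=20))  # 12
```

In the toy run the candidates 4, 7 and 10 are rejected and the search advances by a jump from the failed point each time, rather than sliding one byte at a time.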

In the data deduplication process, the average data block size (also called the average chunk size) is considered in order to keep data block sizes uniform; that is, while the minimum and maximum data block size limits are met, an average data block size is also determined. Two factors, the number M of points p_x and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to a point p_x satisfies the predetermined condition C_x, determine the probability of finding a data stream split point, which is expressed through P(n) as defined below. The former affects the length of a jump and the latter affects the probability of a jump; together they determine the average chunk size. Generally, for a fixed average chunk size, as the number M of points p_x increases, the probability that at least part of the data in the window corresponding to a single point p_x satisfies the predetermined condition C_x also increases. For example, one rule preset on the deduplication server 103 determines 11 points p_x for the potential split point k, where x takes each consecutive natural number from 1 to 11, and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to any one of the 11 points satisfies the predetermined condition C_x is 1/2. Another rule preset on the deduplication server 103 selects 24 points p_x for the potential split point k, where x takes each consecutive natural number from 1 to 24, and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to any one of the 24 points satisfies the predetermined condition C_x is 3/4. For how this probability is set, see the description of judging whether at least part of the data in the window W_x[p_x-A_x, p_x+B_x] satisfies the predetermined condition C_x. P(n) denotes the probability that no data stream split point has been found after n minimum search units have been searched from the start of the data stream or from the previous data stream split point. The computation of P(n) from the two factors is in effect a multi-step Fibonacci sequence and is described in detail later. Once P(n) is obtained, 1-P(n) is the distribution function of the data stream split point, and (1-P(n))-(1-P(n-1)) = P(n-1)-P(n) is the probability of finding a data stream split point at the n-th point, that is, the density function of the data stream split point. Integrating over this density function for n between the minimum data block length of 4*1024 bytes and the maximum data block length of 12*1024 bytes gives the expected length of a data stream split point, that is, the average chunk size.
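The relationship between P(n), the density P(n-1)-P(n), and the average chunk size can be illustrated numerically. The sketch below assumes that the integration is the discrete sum of n times the density over the allowed lengths; `expected_chunk_size` and `toy_P` are hypothetical names, and `toy_P` is a made-up survival function chosen only so the arithmetic is easy to follow. It is not the multi-step Fibonacci result that the description derives later.

```python
from typing import Callable


def expected_chunk_size(P: Callable[[int], float], n_min: int, n_max: int) -> float:
    """Sum n * (P(n-1) - P(n)) for n from n_min to n_max inclusive."""
    return sum(n * (P(n - 1) - P(n)) for n in range(n_min, n_max + 1))


def toy_P(n: int) -> float:
    """Toy probability that no split point has been found after n units."""
    if n < 4:
        return 1.0   # below the (toy) minimum length nothing can be found
    if n == 4:
        return 0.5   # half of the blocks end at length 4
    return 0.0       # the rest end at length 5


print(expected_chunk_size(toy_P, 4, 5))  # 4 * 0.5 + 5 * 0.5 = 4.5
```

For the parameters in this description the same sum would run from 4*1024 to 12*1024 with the real P(n).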

Building on the data stream split point search shown in FIG. 3, in the implementation shown in FIG. 5 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 points p_x, a window W_x[p_x-A_x, p_x+B_x] (window W_x for short) corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where A₁ = A₂ = … = A₁₁ = 169, B₁ = B₂ = … = B₁₁ = 0, and C₁ = C₂ = … = C₁₁. The distance between point p_x and the potential split point k is d_x bytes. Specifically, the distances from p₁, p₂, p₃, p₄, p₅, p₆, p₇, p₈, p₉, p₁₀ and p₁₁ to k are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 bytes respectively, and p₂ through p₁₁ all lie in the direction opposite to the search direction of the data stream split point relative to k. k_a is a data stream split point, and the search direction shown in FIG. 5 is from left to right. After the 4 KB minimum data block is skipped from k_a, the end position of the minimum data block is used as the next potential split point k_i, and points p_ix are determined for k_i; in this embodiment, according to the rule preset on the deduplication server 103, x takes each consecutive natural number from 1 to 11. In the implementation shown in FIG. 5, 11 points p_i1, p_i2, …, p_i11 are determined for the potential split point k_i, and the corresponding windows are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], …, W_i11[p_i11-169, p_i11], referred to as W_i1 through W_i11 for short. The distance between p_ix and k_i is d_x bytes: p_i1 through p_i11 are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 bytes from k_i respectively, and p_i2 through p_i11 all lie in the direction opposite to the search direction relative to k_i. For each x from 1 to 11, it is judged whether at least part of the data in W_ix[p_ix-169, p_ix] satisfies the predetermined condition C_x. When at least part of the data in every one of the windows W_i1 through W_i11 satisfies its predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example, as shown in FIG. 6, at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy the predetermined condition C₅, N bytes are skipped from point p_i5 along the search direction, where N is not greater than ‖B₅‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖). In the implementation shown in FIG. 6, N is therefore not greater than 179 bytes; in this embodiment, N = 11. This gives the next potential split point, denoted k_j to distinguish it from k_i. According to the rule preset on the deduplication server 103 in the implementation shown in FIG. 5, 11 points p_j1, p_j2, …, p_j11 are determined for the potential split point k_j, and the corresponding windows are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], …, W_j11[p_j11-169, p_j11]. The distance between p_jx and k_j is d_x bytes: p_j1 through p_j11 are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 bytes from k_j respectively, and p_j2 through p_j11 all lie in the direction opposite to the search direction relative to k_j. In the implementation shown in FIG. 6, to ensure that the whole range between the potential split points k_i and k_j falls within the judged range, the left boundary of the 11th window W_j11[p_j11-169, p_j11] determined for k_j must coincide with the right boundary p_i5 of W_i5[p_i5-169, p_i5] or lie within the range of W_i5[p_i5-169, p_i5]. Here the point p_j11 is the first point in the sequence obtained by ordering, along the search direction of the data stream, the M points determined for k_j according to the rule. Within this limit, when at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy the predetermined condition C₅, the distance skipped from p_i5 along the search direction is not greater than ‖B₅‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), where M = 11 and 11*U is not greater than max_x(‖A_x‖ + ‖(k_i-p_ix)‖); the distance skipped from p_i5 is therefore not greater than 179. For each x from 1 to 11, it is judged whether at least part of the data in W_jx[p_jx-169, p_jx] satisfies the predetermined condition C_x. In this embodiment of the present invention, the same rule is also followed when judging whether the potential split point k_a is a data stream split point; the implementation is not described again, and reference may be made to the description for k_i. When at least part of the data in every one of the windows W_j1 through W_j11 satisfies its predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point, and the data between k_j and k_a forms one data block. The 4 KB minimum chunk size is then skipped in the same way as for k_a to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103. When the potential split point k_j is judged not to be a data stream split point, 11 bytes are skipped in the same way as for k_i to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103 and the foregoing method. When no data stream split point has been found by the time the set maximum data block is exceeded, the end position of the maximum data block is used as the forced split point.
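The window layout of the FIG. 5 rule and the effect of the jump from p_i5 can be checked with a few lines of arithmetic. The helper names `points` and `windows` and the value chosen for k_i are assumptions made for illustration; positions are byte offsets with the search direction towards larger offsets.

```python
A, B, M = 169, 0, 11     # A_x = 169, B_x = 0 for all 11 points
N = 11                   # jump length used in this embodiment


def points(k: int) -> list:
    """Points p_1 .. p_11, lying 0 .. 10 bytes before the potential split point k."""
    return [k - d for d in range(M)]


def windows(k: int) -> list:
    """Window boundaries (p_x - A_x, p_x + B_x) for each point of candidate k."""
    return [(p - A, p + B) for p in points(k)]


k_i = 4096                         # assumed position of the potential split point k_i
p_i5 = points(k_i)[4]              # 4092: the point whose window failed
k_j = p_i5 + N                     # 4103: new potential split point after an 11-byte jump
print(min(points(k_j)) - p_i5)     # 1: p_j11 sits one byte beyond p_i5

k_far = p_i5 + 179                 # longest jump the bound allows
print(windows(k_far)[10][0] == p_i5)  # True: left edge of the 11th window meets p_i5
```

With N = 11 all eleven points of k_j lie strictly beyond p_i5, and with the maximum jump of 179 the left boundary of the 11th window coincides with p_i5, so the range between the two candidates stays within the judged range.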

In the implementation shown in FIG. 5, according to the rule preset on the deduplication server 103, judging starts with whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. Suppose that at least part of the data in W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3] and W_i4[p_i4-169, p_i4] is judged to satisfy the predetermined conditions C₁, C₂, C₃ and C₄ respectively, and that at least part of the data in W_i5[p_i5-169, p_i5] is judged not to satisfy the predetermined condition C₅. If 10 bytes are skipped from point p_i5 along the search direction of the data stream split point, a new potential split point is obtained at the end position of the 10th byte, denoted k_g to distinguish it from other potential split points. According to the rule preset on the deduplication server 103, 11 points p_gx are determined for the potential split point k_g, where x takes each consecutive natural number from 1 to 11, namely p_g1, p_g2, …, p_g11, and the corresponding windows are W_g1[p_g1-169, p_g1], W_g2[p_g2-169, p_g2], …, W_g11[p_g11-169, p_g11]. The distance between p_gx and k_g is d_x bytes: p_g1 through p_g11 are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 bytes from k_g respectively, and p_g2 through p_g11 all lie in the direction opposite to the search direction relative to k_g. For each x from 1 to 10, it is judged whether at least part of the data in W_gx[p_gx-169, p_gx] satisfies the predetermined condition C_x, and it is likewise judged whether at least part of the data in W_g11[p_g11-169,
p _g11 ] meets the predetermined condition _C11 . Therefore, the point p _g11 corresponding to the potential segmentation point k _g coincides with the point p _i5 corresponding to the potential segmentation point k _i , and the window W _g11 [p _g11 -169,p _g11 ] corresponding to the point p _g11 is the window corresponding to the point p _i5 W _i5 [p _i5 -169,p _i5 ] coincides, and C ₅ =C ₁₁ , therefore, for the potential segmentation point _ki , when it is judged that at least part of the data in W _i5 [p _i5 -169,p _i5 ] does not meet the predetermined When the condition C is ₅ , jumping 10 bytes from the point p _i5 along the direction of data stream split point search, the obtained potential split point k _g still does not meet the condition of being a data stream split point. Therefore, if jumping 10 bytes from point p _i5 along the search direction of the data flow split point, there will be double calculation, and jumping 11 bytes from point p _i5 along the search direction of the data stream split point can reduce double count and be more efficient . Therefore, the speed of finding data stream split points is increased. When the probability that at least part of the data in the window W _x [p _x -A _x ,p _x +B _x ] corresponding to the midpoint p _x satisfies the predetermined condition C _x is 1/2, that is to say, 1/2 Probability to perform a jump, and each time a maximum of 179 bytes can be jumped.
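The overlap argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes positions are plain byte offsets that increase along the search direction, and the names `OFFSETS`, `points_for`, and `revisits_failed_point` are hypothetical, not taken from the patent.

```python
# Offsets d_x (bytes, against the search direction) of p_1..p_11 from k,
# as in the FIG. 5 rule: 0, 1, ..., 10.
OFFSETS = list(range(11))
A, B = 169, 0  # every window is W_x[p_x - A, p_x + B]


def points_for(k):
    """Return the 11 points p_x determined for potential split point k."""
    return [k - d for d in OFFSETS]


def revisits_failed_point(k_i, failed_x, jump):
    """True if the potential split point reached by jumping `jump` bytes
    from the failed point p_{i,failed_x} has that same point among its
    own 11 points (so the failed window would be evaluated again)."""
    failed_point = points_for(k_i)[failed_x - 1]
    k_g = failed_point + jump
    return failed_point in points_for(k_g)


k_i = 4096  # illustrative position only
print(revisits_failed_point(k_i, failed_x=5, jump=10))  # True
print(revisits_failed_point(k_i, failed_x=5, jump=11))  # False

# The 179-byte maximum equals B + max_x(A + d_x) for this rule.
max_jump = B + max(A + d for d in OFFSETS)
print(max_jump)  # 179
```

With a 10-byte jump the failed point p_i5 reappears as p_g11, while an 11-byte jump moves every point of the new potential split point past it.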

In this embodiment, the predetermined rule is: for a potential split point k, determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers 1 to 11 and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to point p_x satisfies the predetermined condition is 1/2. P(n) can be calculated from these two factors (the number of points and the probability). Here A₁ = A₂ = ... = A₁₁ = 169, B₁ = B₂ = ... = B₁₁ = 0, and C₁ = C₂ = ... = C₁₁. The distance between p_x and the potential split point k is d_x bytes: p₁ is 0 bytes from k, p₂ is 1 byte, p₃ is 2 bytes, p₄ is 3 bytes, p₅ is 4 bytes, p₆ is 5 bytes, p₇ is 6 bytes, p₈ is 7 bytes, p₉ is 8 bytes, p₁₀ is 9 bytes, and p₁₁ is 10 bytes from k, and p₂ through p₁₁ all lie in the direction opposite to the search direction relative to k. Whether k is a data stream split point is therefore decided by whether there are 11 consecutive points such that at least part of the data in each of their corresponding windows satisfies the predetermined condition C_x. After the minimum chunk length of 4096 bytes is skipped from the start of the data stream or from the previous data stream split point, stepping back 10 bytes against the search direction reaches the 4086th point. No data stream split point can exist at that point, so P(4086) = 1; likewise P(4087) = 1, ..., P(4095) = 1. At the 4096th point, that is, at the minimum chunk size, at least part of the data in each of the windows corresponding to these 11 points satisfies the predetermined condition C_x with probability (1/2)^11. A data stream split point therefore exists with probability (1/2)^11 and does not exist with probability 1-(1/2)^11, so P(4096) = 1-(1/2)^11.

As shown in FIG. 34, at the nth point, P(n) can be derived recursively by distinguishing 12 cases.

Case 1: With probability 1/2, at least part of the data in the window corresponding to the nth point does not satisfy the predetermined condition. In that case, with probability P(n-1), the n-1 points preceding the nth point contain no 11 consecutive points whose corresponding windows each have at least part of their data satisfying the predetermined condition. P(n) therefore includes the term 1/2*P(n-1). The situation in which the window of the nth point fails the condition but the preceding n-1 points do contain 11 such consecutive points does not contribute to P(n).

Case 2: With probability 1/2, at least part of the data in the window corresponding to the nth point satisfies the predetermined condition, and with probability 1/2, at least part of the data in the window corresponding to the (n-1)th point does not. In that case, with probability P(n-2), the n-2 points preceding the (n-1)th point contain no 11 consecutive points whose corresponding windows each have at least part of their data satisfying the predetermined condition. P(n) therefore includes the term 1/2*1/2*P(n-2). The situation in which the window of the nth point satisfies the condition, the window of the (n-1)th point fails it, and the preceding n-2 points do contain 11 such consecutive points does not contribute to P(n).

Following the same pattern, Case 11: With probability (1/2)^10, at least part of the data in each of the windows corresponding to the nth through (n-9)th points satisfies the predetermined condition, and with probability 1/2, at least part of the data in the window corresponding to the (n-10)th point does not. In that case, with probability P(n-11), the n-11 points preceding the (n-10)th point contain no 11 consecutive points whose corresponding windows each have at least part of their data satisfying the predetermined condition. P(n) therefore includes the term (1/2)^10*1/2*P(n-11). The situation in which the windows of the nth through (n-9)th points all satisfy the condition, the window of the (n-10)th point fails it, and the preceding n-11 points do contain 11 such consecutive points does not contribute to P(n).

Case 12: With probability (1/2)^11, at least part of the data in each of the windows corresponding to the nth through (n-10)th points satisfies the predetermined condition. A data stream split point then exists, so this case does not contribute to P(n).

Therefore, P(n) = 1/2*P(n-1) + (1/2)^2*P(n-2) + ... + (1/2)^11*P(n-11). Another preset rule is: for a potential split point k, determine 24 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers 1 to 24 and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to point p_x satisfies the predetermined condition C_x is 3/4. P(n) can be calculated from these two factors. Here A₁ = A₂ = ... = A₂₄ = 169, B₁ = B₂ = ... = B₂₄ = 0, and C₁ = C₂ = ... = C₂₄. The distance between p_x and the potential split point k is d_x bytes: p₁ is 0 bytes from k, p₂ is 1 byte, p₃ is 2 bytes, p₄ is 3 bytes, p₅ is 4 bytes, p₆ is 5 bytes, p₇ is 6 bytes, p₈ is 7 bytes, p₉ is 8 bytes, ..., p₂₂ is 21 bytes, p₂₃ is 22 bytes, and p₂₄ is 23 bytes from k, and p₂ through p₂₄ all lie in the direction opposite to the search direction relative to k. Whether k is a data stream split point is therefore decided by whether there are 24 consecutive points such that at least part of the data in each of their corresponding windows satisfies the predetermined condition C_x, and P(n) can be calculated with the following formulas:

P(4073) = 1, P(4074) = 1, ..., P(4095) = 1, P(4096) = 1-(3/4)^24,

P(n) = 1/4*P(n-1) + 1/4*(3/4)*P(n-2) + ... + 1/4*(3/4)^23*P(n-24).
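The 11-point and 24-point recurrences have the same shape. As a compact restatement (the symbols m, for the number of points, and q, for the probability that a single window satisfies its condition, are introduced here for illustration and do not appear in the original text):

```latex
P(n) = \sum_{j=1}^{m} (1-q)\, q^{\,j-1}\, P(n-j) \quad (n > 4096), \qquad
P(n) = 1 \quad (4096 - m < n < 4096), \qquad
P(4096) = 1 - q^{m}.
```

Setting m = 11 and q = 1/2 gives the first recurrence, and m = 24 and q = 3/4 gives the formula above.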

The calculation gives P(5*1024) = 0.78, P(11*1024) = 0.17, and P(12*1024) = 0.13. That is, after 12 KB has been searched from the start of the data stream or from the previous data stream split point, there is a 13% probability that no data stream split point has been found, in which case a split is forced. From these probabilities the density function of the data stream split point is obtained, and integrating it shows that a split point is found, on average, after about 7.6 KB has been searched from the start of the data stream or from the previous split point; in other words, the average chunk length is about 7.6 KB. Here, at least part of the data in each of the windows corresponding to 11 consecutive points satisfies the predetermined condition with probability 1/2. By contrast, a conventional CDC algorithm achieves an average chunk length of 7.6 KB only when its single window satisfies the condition with probability 1/2^12.
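The figures quoted above can be reproduced approximately by iterating the recurrence numerically. The sketch below is not part of the patent: the function names are hypothetical, P(n) is assumed to be the probability that no split point has been found up to the nth point, the forced split is assumed to occur at 12*1024 bytes, and the mean is reported in bytes.

```python
def no_split_probability(m, q, min_chunk=4096, max_chunk=12 * 1024):
    """Iterate P(n) = sum_{j=1..m} (1 - q) * q**(j - 1) * P(n - j).

    m: number of consecutive points whose windows must all satisfy
       the predetermined condition.
    q: probability that a single window satisfies its condition.
    Returns a dict mapping n to P(n) for n up to max_chunk.
    """
    # Below the minimum chunk size no split point can exist, so P(n) = 1.
    P = {n: 1.0 for n in range(min_chunk - m, min_chunk)}
    for n in range(min_chunk, max_chunk + 1):
        P[n] = sum((1 - q) * q ** (j - 1) * P[n - j] for j in range(1, m + 1))
    return P


def mean_chunk_length(P, min_chunk=4096, max_chunk=12 * 1024):
    """Expected split position in bytes, with a forced split at max_chunk.

    Uses E[T] = sum over t of Pr(T > t), where Pr(T > t) = 1 below
    min_chunk and Pr(T > n) = P(n) from min_chunk up to max_chunk - 1.
    """
    return min_chunk + sum(P[n] for n in range(min_chunk, max_chunk))


P = no_split_probability(m=11, q=0.5)
for kb in (5, 11, 12):
    print(kb, round(P[kb * 1024], 3))
print(round(mean_chunk_length(P)))
```

Under these assumptions the printed probabilities should land close to the quoted 0.78, 0.17, and 0.13, and the mean close to 7.6 thousand bytes. Calling `no_split_probability(m=24, q=0.75)` evaluates the 24-point rule in the same way.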

Building on the data stream split point search shown in FIG. 3, in the embodiment shown in FIG. 7 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers 1 to 11 and the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to point p_x satisfies the predetermined condition C_x is 1/2. Here A₁ = A₂ = ... = A₁₁ = 169, B₁ = B₂ = ... = B₁₁ = 0, and C₁ = C₂ = ... = C₁₁. The distance between p_x and the potential split point k is d_x bytes: p₁ is 2 bytes from k, p₂ is 3 bytes, p₃ is 4 bytes, p₄ is 5 bytes, p₅ is 6 bytes, p₆ is 7 bytes, p₇ is 8 bytes, p₈ is 9 bytes, p₉ is 10 bytes, p₁₀ is 1 byte, and p₁₁ is 0 bytes from k, and p₁ through p₁₀ all lie in the direction opposite to the search direction relative to k. k_a is a data stream split point, and the search direction shown in FIG. 7 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of that 4 KB block serves as the next potential split point k_i, and points p_ix are determined for k_i; in this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers 1 to 11. In the embodiment shown in FIG. 7, 11 points are determined for k_i according to the predetermined rule, namely p_i1, p_i2, ..., p_i11, and the window corresponding to each point p_ix is W_ix[p_ix-169, p_ix]. The distance between p_ix and k_i is d_x bytes: p_i1 is 2 bytes from k_i, p_i2 is 3 bytes, p_i3 is 4 bytes, p_i4 is 5 bytes, p_i5 is 6 bytes, p_i6 is 7 bytes, p_i7 is 8 bytes, p_i8 is 9 bytes, p_i9 is 10 bytes, p_i10 is 1 byte, and p_i11 is 0 bytes from k_i, and p_i1 through p_i10 all lie in the direction opposite to the search direction relative to k_i. For each x from 1 to 11, it is judged whether at least part of the data in W_ix[p_ix-169, p_ix] satisfies the predetermined condition C_x. When at least part of the data in every one of the windows W_i1 through W_i11 satisfies its predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, a jump is performed; FIG. 8 takes as an example the case in which at least part of the data in W_i3[p_i3-169, p_i3] does not satisfy the predetermined condition C₃ and 11 bytes are jumped from point p_i3 along the search direction. As shown in FIG. 8, when W_i3 is judged not to satisfy the predetermined condition, N bytes are jumped along the search direction starting from p_i3, where N is not greater than ‖B₃‖+max_x(‖A_x‖+‖(k_i-p_ix)‖). In the embodiment shown in FIG. 6, the N bytes jumped are specifically no more than 179 bytes, and in this embodiment N = 11. The next potential split point is obtained at the end position of the 11th byte; to distinguish it from k_i, it is denoted k_j. According to the rule preset on the deduplication server 103, 11 points are determined for k_j, namely p_j1, p_j2, ..., p_j11, and the window corresponding to each point p_jx is W_jx[p_jx-169, p_jx]. The distance between p_jx and k_j is d_x bytes: p_j1 is 2 bytes from k_j, p_j2 is 3 bytes, p_j3 is 4 bytes, p_j4 is 5 bytes, p_j5 is 6 bytes, p_j6 is 7 bytes, p_j7 is 8 bytes, p_j8 is 9 bytes, p_j9 is 10 bytes, p_j10 is 1 byte, and p_j11 is 0 bytes from k_j, and p_j1 through p_j10 all lie in the direction opposite to the search direction relative to k_j. For each x from 1 to 11, it is judged whether at least part of the data in W_jx[p_jx-169, p_jx] satisfies the predetermined condition C_x. In the embodiments of the present invention, the same principle is of course followed when judging whether the potential split point k_a is a data stream split point; the implementation is not described again, and reference may be made to the description for k_i. When at least part of the data in every one of the windows W_j1 through W_j11 satisfies its predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point and the data between k_j and k_a forms one data block. The minimum chunk size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103. When k_j is judged not to be a data stream split point, 11 bytes are jumped in the same way as for k_i to obtain the next potential split point, which is judged according to the rule preset on the deduplication server 103 and the method described above. When no data stream split point has been found by the time the set maximum data block is exceeded, the end position of the maximum data block is used as a forced split point. The implementation of this method is of course constrained by the maximum data block length and by the size of the files that make up the data stream; details are not repeated here.
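The search loop of the FIG. 7 embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `window_ok` uses CRC-32 parity as a hypothetical stand-in for the predetermined condition C_x (the real condition is defined elsewhere in the specification), the 12 KB maximum block is an assumed value, the windows are assumed to be evaluated in the order x = 1 to 11, and positions are treated as byte offsets with the window W_x[p-169, p] taken as `data[p-169:p]`.

```python
import zlib

MIN_CHUNK = 4096            # minimum data block (4 KB)
MAX_CHUNK = 12 * 1024       # assumed maximum data block
WINDOW = 169                # A_x = 169, B_x = 0
OFFSETS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 0]  # d_x for x = 1..11 (FIG. 7 rule)
JUMP = 11                   # N = 11 bytes, counted from the failed point


def window_ok(data, p):
    """Hypothetical stand-in for C_x: true for roughly half of all windows."""
    return zlib.crc32(data[p - WINDOW:p]) & 1 == 0


def find_split_points(data, ok=window_ok):
    """Return the split point offsets found in `data`."""
    splits, last = [], 0
    k = last + MIN_CHUNK                      # first potential split point
    while k <= len(data):
        if k - last >= MAX_CHUNK:
            k = last + MAX_CHUNK              # forced split at the maximum block
            failed = None
        else:
            # First point p_x = k - d_x whose window fails the condition, if any.
            failed = next((k - d for d in OFFSETS if not ok(data, k - d)), None)
        if failed is None:
            splits.append(k)                  # all 11 windows passed (or forced)
            last = k
            k = last + MIN_CHUNK              # skip the minimum data block
        else:
            k = failed + JUMP                 # jump N bytes from the failed point
    return splits


sample = bytes((i * 2654435761 >> 7) & 0xFF for i in range(60000))
print(find_split_points(sample))
```

Because the jump starts at the failed point and is 11 bytes long, every point of the next potential split point lies beyond the failed point, so no failed window is evaluated twice.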

Building on the data stream split point search shown in FIG. 3, in the embodiment shown in FIG. 9 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where A₁ = A₂ = ... = A₁₁ = 169, B₁ = B₂ = ... = B₁₁ = 0, and C₁ = C₂ = ... = C₁₁. The distance between p_x and the potential split point k is d_x bytes: p₁ is 3 bytes from k, p₂ is 2 bytes, p₃ is 1 byte, p₄ is 0 bytes, p₅ is 1 byte, p₆ is 2 bytes, p₇ is 3 bytes, p₈ is 4 bytes, p₉ is 5 bytes, p₁₀ is 6 bytes, and p₁₁ is 7 bytes from k. Relative to k, p₅ through p₁₁ lie in the direction opposite to the search direction, while p₁, p₂, and p₃ lie in the search direction. k_a is a data stream split point, and the search direction shown in FIG. 9 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of that 4 KB block serves as the next potential split point k_i, and points p_ix are determined for k_i. In this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers 1 to 11.
In the embodiment shown in FIG. 9, 11 points are determined for the potential split point k_i, namely p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11. The windows corresponding to these points are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11]. The distance between p_ix and the potential split point k_i is d_x bytes. Specifically, the distance from k_i is 3 bytes for p_i1, 2 bytes for p_i2, 1 byte for p_i3, 0 bytes for p_i4, 1 byte for p_i5, 2 bytes for p_i6, 3 bytes for p_i7, 4 bytes for p_i8, 5 bytes for p_i9, 6 bytes for p_i10, and 7 bytes for p_i11. Relative to the potential split point k_i, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11 lie in the direction opposite to the data stream split point search direction, and p_i1, p_i2 and p_i3 lie in the search direction.
For each x from 1 to 11, it is judged whether at least part of the data in the window W_ix[p_ix-169, p_ix] satisfies the corresponding predetermined condition C_x. That is, it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies C₁, whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies C₂, and so on, up to whether at least part of the data in W_i11[p_i11-169, p_i11] satisfies C₁₁.
When at least part of the data in each of the windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9, W_i10 and W_i11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example, as shown in FIG. 10, at least part of the data in W_i7[p_i7-169, p_i7] does not satisfy the corresponding predetermined condition, N bytes are skipped from the point p_i7 along the data stream split point search direction, where N is not greater than ‖B₄‖+max_x(‖A_x‖+‖(k_i-p_ix)‖). In the embodiment shown in FIG. 10, N is not greater than 179 bytes, and in this embodiment N = 8, which yields a new potential split point. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG. 9, 11 points are determined for the potential split point k_j, namely p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11. The windows corresponding to these points are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11]. The distance between p_jx and the potential split point k_j is d_x bytes. Specifically, the distance from k_j is 3 bytes for p_j1, 2 bytes for p_j2, 1 byte for p_j3, 0 bytes for p_j4, 1 byte for p_j5, 2 bytes for p_j6, 3 bytes for p_j7, 4 bytes for p_j8, 5 bytes for p_j9, 6 bytes for p_j10, and 7 bytes for p_j11. Relative to the potential split point k_j, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11 lie in the direction opposite to the data stream split point search direction, and p_j1, p_j2 and p_j3 lie in the search direction.
For each x from 1 to 11, it is judged whether at least part of the data in the window W_jx[p_jx-169, p_jx] satisfies the corresponding predetermined condition C_x. That is, it is judged whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies C₁, whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies C₂, and so on, up to whether at least part of the data in W_j11[p_j11-169, p_j11] satisfies C₁₁. In the embodiment of the present invention, the same principle is also followed when judging whether the potential split point k_a is a data stream split point. The specific implementation is not described again; refer to the description of judging the potential split point k_i.
When at least part of the data in each of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point, and the data between k_j and k_a constitutes one data block. The minimum block size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that potential split point is a data stream split point is judged according to the rule preset on the deduplication server 103. When the potential split point k_j is judged not to be a data stream split point, 8 bytes are skipped in the same way as for k_i to obtain the next potential split point, and whether that potential split point is a data stream split point is judged according to the rule preset on the deduplication server 103 and the method described above. When no data stream split point has been found once the set maximum data block size is exceeded, the end position of the maximum data block is used as a forced split point.
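The FIG. 9 and FIG. 10 procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `window_ok` is a hypothetical stand-in for the predetermined condition C_x, `MAX_CHUNK` is an assumed value because this passage does not give the maximum data block size, the windows are assumed to be checked in the order x = 1 to 11 with the jump taken from the first failing point, and the search is assumed to stop when too little data remains to build all windows.

```python
import zlib

MIN_CHUNK = 4 * 1024   # from the text: skip the 4 KB minimum data block
MAX_CHUNK = 12 * 1024  # assumption: the maximum block size is not given here
A = 169                # window W_x = [p_x - 169, p_x], with B_x = 0
JUMP = 8               # N = 8 in this embodiment
# Signed offsets of p_1..p_11 from k (positive = in the search direction).
OFFSETS = [3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7]


def window_ok(data, p):
    """Hypothetical stand-in for C_x: an arbitrary hash-based predicate."""
    return zlib.crc32(data[p - A:p + 1]) & 1 == 0


def first_failing_point(data, k, cond=window_ok):
    """Return the first p_x whose window fails, or None if all 11 pass."""
    for d in OFFSETS:
        p = k + d
        if not cond(data, p):
            return p
    return None


def find_split_points(data, cond=window_ok):
    """Return split positions found by the skip-on-failure search."""
    splits, start = [], 0
    while True:
        k, limit = start + MIN_CHUNK, start + MAX_CHUNK
        while k < limit:
            if k + max(OFFSETS) >= len(data):
                return splits          # not enough data left for the windows
            p = first_failing_point(data, k, cond)
            if p is None:
                break                  # all 11 windows passed: k is a split point
            k = p + JUMP               # jump N bytes from the failing point
        else:
            k = limit                  # forced split at the maximum block end
            if k >= len(data):
                return splits
        splits.append(k)
        start = k
```

With a predicate that always passes, the sketch cuts after every 4 KB; with one that always fails, it falls back to the forced split at `MAX_CHUNK`.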

On the basis of the data stream split point search shown in FIG. 3, in the embodiment shown in FIG. 11 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where A₁ = A₂ = A₃ = A₄ = A₅ = A₆ = A₇ = A₈ = A₉ = A₁₀ = 169, A₁₁ = 182, B₁ = B₂ = B₃ = B₄ = B₅ = B₆ = B₇ = B₈ = B₉ = B₁₀ = B₁₁ = 0, and C₁ = C₂ = C₃ = C₄ = C₅ = C₆ = C₇ = C₈ = C₉ = C₁₀ ≠ C₁₁.
The distance between p_x and the potential split point k is d_x bytes. Specifically, the distance from k is 0 bytes for p₁, 1 byte for p₂, 2 bytes for p₃, 3 bytes for p₄, 4 bytes for p₅, 5 bytes for p₆, 6 bytes for p₇, 7 bytes for p₈, 8 bytes for p₉, 1 byte for p₁₀, and 3 bytes for p₁₁. Relative to the potential split point k, p₂, p₃, p₄, p₅, p₆, p₇, p₈ and p₉ lie in the direction opposite to the data stream split point search direction, and p₁₀ and p₁₁ lie in the search direction. k_a is a data stream split point, and the search direction shown in FIG. 11 is from left to right. After the 4 KB minimum data block is skipped from the data stream split point k_a, the end position of the 4 KB minimum data block serves as the next potential split point k_i, and points p_ix are determined for the potential split point k_i. In this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers from 1 to 11. In the embodiment shown in FIG. 11, 11 points are determined for the potential split point k_i, namely p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11. The windows corresponding to these points are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-182, p_i11]. The distance between p_ix and the potential split point k_i is d_x bytes. Specifically, the distance from k_i is 0 bytes for p_i1, 1 byte for p_i2, 2 bytes for p_i3, 3 bytes for p_i4, 4 bytes for p_i5, 5 bytes for p_i6, 6 bytes for p_i7, 7 bytes for p_i8, 8 bytes for p_i9, 1 byte for p_i10, and 3 bytes for p_i11. Relative to the potential split point k_i, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8 and p_i9 lie in the direction opposite to the data stream split point search direction, and p_i10 and p_i11 lie in the search direction.
For each x from 1 to 10, it is judged whether at least part of the data in the window W_ix[p_ix-169, p_ix] satisfies the corresponding predetermined condition C_x, that is, whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies C₁, whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies C₂, and so on, up to whether at least part of the data in W_i10[p_i10-169, p_i10] satisfies C₁₀. It is also judged whether at least part of the data in W_i11[p_i11-182, p_i11] satisfies the predetermined condition C₁₁.
When at least part of the data in each of the windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9, W_i10 and W_i11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When at least part of the data in the window W_i11 is judged not to satisfy the predetermined condition C₁₁, 1 byte is skipped from the potential split point k_i along the data stream split point search direction to obtain a new potential split point. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j.
When at least part of the data in any one of the 10 windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9 and W_i10 does not satisfy the corresponding predetermined condition, for example W_i4[p_i4-169, p_i4] as shown in FIG. 12, N bytes are skipped from the point p_i4 along the data stream split point search direction, where N is not greater than ‖B₄‖+max_x(‖A_x‖+‖(k_i-p_ix)‖). In the embodiment shown in FIG. 12, N is not greater than 179, and in this embodiment N = 9, which yields a new potential split point. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG. 11, 11 points are determined for the potential split point k_j, namely p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11. The windows corresponding to these points are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-182, p_j11].
The distance between p_jx and the potential split point k_j is d_x bytes. Specifically, the distance from k_j is 0 bytes for p_j1, 1 byte for p_j2, 2 bytes for p_j3, 3 bytes for p_j4, 4 bytes for p_j5, 5 bytes for p_j6, 6 bytes for p_j7, 7 bytes for p_j8, 8 bytes for p_j9, 1 byte for p_j10, and 3 bytes for p_j11. Relative to the potential split point k_j, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8 and p_j9 lie in the direction opposite to the data stream split point search direction, and p_j10 and p_j11 lie in the search direction. For each x from 1 to 10, it is judged whether at least part of the data in the window W_jx[p_jx-169, p_jx] satisfies the corresponding predetermined condition C_x, and it is judged whether at least part of the data in W_j11[p_j11-182, p_j11] satisfies the predetermined condition C₁₁. In the embodiment of the present invention, the same principle is also followed when judging whether the potential split point k_a is a data stream split point. The specific implementation is not described again; refer to the description of judging the potential split point k_i. When at least part of the data in each of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point, and the data between k_j and k_a constitutes one data block. The minimum block size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that potential split point is a data stream split point is judged according to the rule preset on the deduplication server 103.
When the potential split point k_j is judged not to be a data stream split point, the next potential split point is obtained in the same way as for k_i, and whether that potential split point is a data stream split point is judged according to the rule preset on the deduplication server 103 and the method described above. When no data stream split point has been found once the set maximum data block size is exceeded, the end position of the maximum data block is used as a forced split point.
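The window layout and the two jump rules of the FIG. 11 and FIG. 12 embodiment can be illustrated with a short Python sketch. The names `OFFSETS`, `window_bounds` and `next_candidate` are hypothetical and chosen only for illustration; the sketch takes the index of the failing window as input and makes no assumption about how the predetermined conditions are evaluated or in which order the windows are checked.

```python
# Signed offsets of p_1..p_11 from k; positive values lie in the search direction.
OFFSETS = {1: 0, 2: -1, 3: -2, 4: -3, 5: -4, 6: -5,
           7: -6, 8: -7, 9: -8, 10: 1, 11: 3}
# A_x is 169 for windows 1..10 and 182 for window 11; every B_x is 0.
A = {x: 169 for x in range(1, 11)}
A[11] = 182
N = 9  # jump length used in this embodiment for windows 1..10


def window_bounds(k, x):
    """Return (start, end) of window W_x = [p_x - A_x, p_x] for candidate k."""
    p = k + OFFSETS[x]
    return (p - A[x], p)


def next_candidate(k, failed_x):
    """Return the next potential split point after window failed_x fails."""
    if failed_x == 11:
        return k + 1                   # W_11 failed: move 1 byte from k
    return k + OFFSETS[failed_x] + N   # otherwise jump N bytes from p_x
```

For example, if the candidate is at position 4096 and window 4 fails as in FIG. 12, `next_candidate(4096, 4)` starts from p_4 = 4093 and returns 4102.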

On the basis of the data stream split point search shown in FIG. 3, in the embodiment shown in FIG. 13 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to 11, the probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to the point p_x satisfies the predetermined condition is 1/2, A₁ = A₂ = A₃ = A₄ = A₅ = A₆ = A₇ = A₈ = A₉ = A₁₀ = A₁₁ = 169, B₁ = B₂ = B₃ = B₄ = B₅ = B₆ = B₇ = B₈ = B₉ = B₁₀ = B₁₁ = 0, and C₁ = C₂ = C₃ = C₄ = C₅ = C₆ = C₇ = C₈ = C₉ = C₁₀ = C₁₁. The distance between p_x and the potential split point k is d_x bytes. Specifically, the distance from k is 0 bytes for p₁, 2 bytes for p₂, 4 bytes for p₃, 6 bytes for p₄, 8 bytes for p₅, 10 bytes for p₆, 12 bytes for p₇, 14 bytes for p₈, 16 bytes for p₉, 18 bytes for p₁₀, and 20 bytes for p₁₁, and p₂, p₃, p₄, p₅, p₆, p₇, p₈, p₉, p₁₀ and p₁₁ all lie in the direction opposite to the data stream split point search direction relative to the potential split point k.
k_a is a data stream split point, and the search direction shown in FIG. 13 is from left to right. After the 4 KB minimum data block is skipped from the data stream split point k_a, the end position of the 4 KB minimum data block serves as the next potential split point k_i, and points p_ix are determined for the potential split point k_i. In this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers from 1 to 11. In the embodiment shown in FIG. 13, according to the preset rule, 11 points are determined for the potential split point k_i, namely p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11. The windows corresponding to these points are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11]. The distance between p_ix and the potential split point k_i is d_x bytes. Specifically, the distance from k_i is 0 bytes for p_i1, 2 bytes for p_i2, 4 bytes for p_i3, 6 bytes for p_i4, 8 bytes for p_i5, 10 bytes for p_i6, 12 bytes for p_i7, 14 bytes for p_i8, 16 bytes for p_i9, 18 bytes for p_i10, and 20 bytes for p_i11, and p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_i. For each x from 1 to 11, it is judged whether at least part of the data in the window W_ix[p_ix-169, p_ix] satisfies the corresponding predetermined condition C_x. When at least part of the data in each of the windows W_i1 through W_i11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point.
When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example, as shown in FIG. 14, at least part of the data in W_i4[p_i4-169, p_i4] does not satisfy the predetermined condition C₄, the next potential split point is selected. To distinguish it from the potential split point k_i, it is denoted k_j; k_j lies to the right of k_i, at a distance of 1 byte from k_i. As shown in FIG. 14, according to the rule preset on the deduplication server 103, 11 points are determined for the potential split point k_j, namely p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11. The windows corresponding to these points are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11], where A₁ = A₂ = A₃ = A₄ = A₅ = A₆ = A₇ = A₈ = A₉ = A₁₀ = A₁₁ = 169, B₁ = B₂ = B₃ = B₄ = B₅ = B₆ = B₇ = B₈ = B₉ = B₁₀ = B₁₁ = 0, and C₁ = C₂ = C₃ = C₄ = C₅ = C₆ = C₇ = C₈ = C₉ = C₁₀ = C₁₁. The distance between p_jx and the potential split point k_j is d_x bytes. Specifically, the distance from k_j is 0 bytes for p_j1, 2 bytes for p_j2, 4 bytes for p_j3, 6 bytes for p_j4, 8 bytes for p_j5, 10 bytes for p_j6, 12 bytes for p_j7, 14 bytes for p_j8, 16 bytes for p_j9, 18 bytes for p_j10, and 20 bytes for p_j11, and p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_j. For each x from 1 to 11, it is judged whether at least part of the data in the window W_jx[p_jx-169, p_jx] satisfies the corresponding predetermined condition C_x. When at least part of the data in each of the windows W_j1 through W_j11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point.
When at least part of the data in any one of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 does not satisfy the predetermined condition, for example, as shown in FIG. 15, at least part of the data in W_j3[p_j3-169, p_j3] does not satisfy the predetermined condition C₃, the point p_i4 lies to the left of the point p_j3 relative to the data stream split point search direction, and 21 bytes are skipped from the point p_i4 along the search direction to obtain the next potential split point. To distinguish it from the potential split points k_i and k_j, it is denoted k_l. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG. 13, 11 points are determined for the potential split point k_l, namely p_l1, p_l2, p_l3, p_l4, p_l5, p_l6, p_l7, p_l8, p_l9, p_l10 and p_l11. The windows corresponding to these points are W_l1[p_l1-169, p_l1], W_l2[p_l2-169, p_l2], W_l3[p_l3-169, p_l3], W_l4[p_l4-169, p_l4], W_l5[p_l5-169, p_l5], W_l6[p_l6-169, p_l6], W_l7[p_l7-169, p_l7], W_l8[p_l8-169, p_l8], W_l9[p_l9-169, p_l9], W_l10[p_l10-169, p_l10] and W_l11[p_l11-169, p_l11]. The distance between p_lx and the potential split point k_l is d_x bytes. Specifically, the distance from k_l is 0 bytes for p_l1, 2 bytes for p_l2, 4 bytes for p_l3, 6 bytes for p_l4, 8 bytes for p_l5, 10 bytes for p_l6, 12 bytes for p_l7, 14 bytes for p_l8, 16 bytes for p_l9, 18 bytes for p_l10, and 20 bytes for p_l11, and p_l2, p_l3, p_l4, p_l5, p_l6, p_l7, p_l8, p_l9, p_l10 and p_l11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_l. For each x from 1 to 11, it is judged whether at least part of the data in the window W_lx[p_lx-169, p_lx] satisfies the corresponding predetermined condition C_x. When at least part of the data in each of the windows W_l1 through W_l11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_l is a data stream split point.
When at least part of the data in any one of the windows W_l1, W_l2, W_l3, W_l4, W_l5, W_l6, W_l7, W_l8, W_l9, W_l10 and W_l11 does not satisfy the predetermined condition, the next potential split point is selected. To distinguish it from the potential split points k_i, k_j and k_l, it is denoted k_m; k_m lies to the right of k_l, at a distance of 1 byte from k_l. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG. 13, 11 points are determined for the potential split point k_m, namely p_m1, p_m2, p_m3, p_m4, p_m5, p_m6, p_m7, p_m8, p_m9, p_m10 and p_m11. The windows corresponding to these points are W_m1[p_m1-169, p_m1], W_m2[p_m2-169, p_m2], W_m3[p_m3-169, p_m3], W_m4[p_m4-169, p_m4], W_m5[p_m5-169, p_m5], W_m6[p_m6-169, p_m6], W_m7[p_m7-169, p_m7], W_m8[p_m8-169, p_m8], W_m9[p_m9-169, p_m9], W_m10[p_m10-169, p_m10] and W_m11[p_m11-169, p_m11]. The distance between p_mx and the potential split point k_m is d_x bytes. Specifically, the distance between p_m1 and the potential split point k_m is 0 bytes, the distance between p_m2 and k_m is 2 bytes, the distance between p_m3 and k_m is 4 bytes, the distance between p_m4 and k_m is 6 bytes, the distance between p_m5 and k_m is 8 bytes, p_m
6与k_m之间距离10个字节，p_m7与k_m之间距离12个字节，p_m8与k_m之间距离14个字节，p_m9与k_m之间距离16个字节，p_m10与k_m之间距离18个字节，p_m11与k_m之间距离20个字节，并且p_m2、p_m3、p_m4、p_m5、p_m6、p_m7、p_m8、p_m9、p_m10和p_m11相对于潜在分割点k_m均位于数据流分割点查找反方向。判断W_m1[p_m1-169,p_m1]中至少部分数据是否满足预定条件C₁、判断W_m2[p_m2-169,p_m2]中至少部分数据是否满足预定条件C₂、判断W_m3[p_m3-169,p_m3]中至少部分数据是否满足预定条件C₃、判断W_m4[p_m4-169,p_m4]中至少部分数据是否满足预定条件C₄、判断W_m5[p_m5-169,p_m5]中至少部分数据是否满足预定条件C₅、判断W_m6[p_m6-169,p_m6]中至少部分数据是否满足预定条件C₆、判断W_m7[p_m7-169,p_m7]中至少部分数据是否满足预定条件C₇、判断W_m8[p_m8-169,p_m8]中至少部分数据是否满足预定条件C₈、判断W_m9[p_m9-169,p_m9]中至少部分数据是否满足预定条件C₉、判断W_m10[p_m10-169,p_m10]中至少部分数据是否满足预定条件C₁₀和判断W_m11[p_m11-169,p_m11]中至少部分数据是否满足预定条件C₁₁。当判断窗口W_m1中至少部分数据满足预定条件C₁、窗口W_m2中至少部分数据满足预定条件C₂、窗口W_m3中至少部分数据满足预定条件C₃、窗口W_m4中至少部分数据满足预定条件C₄、窗口W_m5中至少部分数据满足预定条件C₅、窗口W_m6中至少部分数据满足预定条件C₆、窗口W_m7中至少部分数据满足预定条件C₇、窗口W_m8中至少部分数据满足预定条件C₈、窗口W_m9中至少部分数据满足预定条件C₉、窗口W_m10中至少部分数据满足预定条件C₁₀和窗口W_m11中至少部分数据满足预定条件C₁₁时，则当前潜在分割点k_m为数据流分割点。当任一个窗口中至少部分数据不满足预定条件时，则按照前面描述的方案执行跳跃，以获得下一个潜在分割点并判断是否为数据流分割点。On the basis of the data stream segmentation point search shown in FIG. 3 , in the embodiment shown in FIG. 
13, a rule is preset on the deduplication server 103: determine 11 points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to 11. The probability that at least part of the data in the window W_x[p_x-A_x, p_x+B_x] corresponding to point p_x satisfies the predetermined condition is 1/2, with A₁ = A₂ = A₃ = A₄ = A₅ = A₆ = A₇ = A₈ = A₉ = A₁₀ = A₁₁ = 169, B₁ = B₂ = B₃ = B₄ = B₅ = B₆ = B₇ = B₈ = B₉ = B₁₀ = B₁₁ = 0, and C₁ = C₂ = C₃ = C₄ = C₅ = C₆ = C₇ = C₈ = C₉ = C₁₀ = C₁₁. The distance between p_x and the potential split point k is d_x bytes; specifically, p₁, p₂, p₃, p₄, p₅, p₆, p₇, p₈, p₉, p₁₀ and p₁₁ are 0, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from k, respectively, and p₂, p₃, p₄, p₅, p₆, p₇, p₈, p₉, p₁₀ and p₁₁ all lie in the direction opposite to the data stream split point search direction relative to the potential split point k. k_a is a data stream split point, and the data stream split point search direction shown in FIG. 13 is from left to right. After a minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of that 4 KB minimum data block is taken as the next potential split point k_i, and points p_ix are determined for the potential split point k_i. In this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers from 1 to 11.
In the embodiment shown in FIG. 13, according to the preset rule, 11 points are determined for the potential split point k_i, namely p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11, and the windows corresponding to these points are W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11], respectively. The distance between p_ix and the potential split point k_i is d_x bytes; specifically, p_i1, p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11 are 0, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from k_i, respectively, and p_i2, p_i3, p_i4, p_i5, p_i6, p_i7, p_i8, p_i9, p_i10 and p_i11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_i.
It is judged, for each of the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11], whether at least part of the data in the window satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively.
When at least part of the data in each of the windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9, W_i10 and W_i11 satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, the current potential split point k_i is a data stream split point. When at least part of the data in any one of the 11 windows does not satisfy its corresponding predetermined condition, for example, as shown in FIG. 14, when at least part of the data in W_i4[p_i4-169, p_i4] does not satisfy the predetermined condition C₄, the next potential split point is selected; to distinguish it from the potential split point k_i, it is denoted k_j here. k_j is located to the right of k_i, and the distance between k_j and k_i is 1 byte.
As shown in FIG. 14, according to the rule preset on the deduplication server 103, 11 points are determined for the potential split point k_j, namely p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11, and the windows corresponding to these points are W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11], respectively, where A₁ = A₂ = A₃ = A₄ = A₅ = A₆ = A₇ = A₈ = A₉ = A₁₀ = A₁₁ = 169, B₁ = B₂ = B₃ = B₄ = B₅ = B₆ = B₇ = B₈ = B₉ = B₁₀ = B₁₁ = 0, and C₁ = C₂ = C₃ = C₄ = C₅ = C₆ = C₇ = C₈ = C₉ = C₁₀ = C₁₁. The distance between p_jx and the potential split point k_j is d_x bytes; specifically, p_j1, p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11 are 0, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from k_j, respectively, and p_j2, p_j3, p_j4, p_j5, p_j6, p_j7, p_j8, p_j9, p_j10 and p_j11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_j.
It is judged, for each of the windows W_j1[p_j1-169, p_j1], W_j2[p_j2-169, p_j2], W_j3[p_j3-169, p_j3], W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11], whether at least part of the data in the window satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively.
When at least part of the data in each of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, the current potential split point k_j is a data stream split point. When at least part of the data in any one of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10 and W_j11 does not satisfy the predetermined condition, for example, as shown in FIG. 15, when at least part of the data in W_j3[p_j3-169, p_j3] does not satisfy the predetermined condition C₃, point p_i4 lies to the left of point p_j3 with respect to the data stream split point search direction, and 21 bytes are skipped from point p_i4 along the search direction to obtain the next potential split point; to distinguish it from the potential split points k_i and k_j, it is denoted k_l. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG.
13, 11 points are determined for the potential split point k_l, namely p_l1, p_l2, p_l3, p_l4, p_l5, p_l6, p_l7, p_l8, p_l9, p_l10 and p_l11, and the windows corresponding to these points are W_l1[p_l1-169, p_l1], W_l2[p_l2-169, p_l2], W_l3[p_l3-169, p_l3], W_l4[p_l4-169, p_l4], W_l5[p_l5-169, p_l5], W_l6[p_l6-169, p_l6], W_l7[p_l7-169, p_l7], W_l8[p_l8-169, p_l8], W_l9[p_l9-169, p_l9], W_l10[p_l10-169, p_l10] and W_l11[p_l11-169, p_l11], respectively. The distance between p_lx and the potential split point k_l is d_x bytes; specifically, p_l1, p_l2, p_l3, p_l4, p_l5, p_l6, p_l7, p_l8, p_l9, p_l10 and p_l11 are 0, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from k_l, respectively, and p_l2, p_l3, p_l4, p_l5, p_l6, p_l7, p_l8, p_l9, p_l10 and p_l11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_l.
It is judged, for each of the windows W_l1[p_l1-169, p_l1], W_l2[p_l2-169, p_l2], W_l3[p_l3-169, p_l3], W_l4[p_l4-169, p_l4], W_l5[p_l5-169, p_l5], W_l6[p_l6-169, p_l6], W_l7[p_l7-169, p_l7], W_l8[p_l8-169, p_l8], W_l9[p_l9-169, p_l9], W_l10[p_l10-169, p_l10] and W_l11[p_l11-169, p_l11], whether at least part of the data in the window satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively.
When at least part of the data in each of the windows W_l1, W_l2, W_l3, W_l4, W_l5, W_l6, W_l7, W_l8, W_l9, W_l10 and W_l11 satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, the current potential split point k_l is a data stream split point. When at least part of the data in any one of the windows W_l1, W_l2, W_l3, W_l4, W_l5, W_l6, W_l7, W_l8, W_l9, W_l10 and W_l11 does not satisfy the predetermined condition, the next potential split point is selected; to distinguish it from the potential split points k_i, k_j and k_l, it is denoted k_m. k_m is located to the right of k_l, and the distance between k_m and k_l is 1 byte. According to the rule preset on the deduplication server 103 in the embodiment shown in FIG.
13, 11 points are determined for the potential split point k_m, namely p_m1, p_m2, p_m3, p_m4, p_m5, p_m6, p_m7, p_m8, p_m9, p_m10 and p_m11, and the windows corresponding to these points are W_m1[p_m1-169, p_m1], W_m2[p_m2-169, p_m2], W_m3[p_m3-169, p_m3], W_m4[p_m4-169, p_m4], W_m5[p_m5-169, p_m5], W_m6[p_m6-169, p_m6], W_m7[p_m7-169, p_m7], W_m8[p_m8-169, p_m8], W_m9[p_m9-169, p_m9], W_m10[p_m10-169, p_m10] and W_m11[p_m11-169, p_m11], respectively. The distance between p_mx and the potential split point k_m is d_x bytes; specifically, p_m1, p_m2, p_m3, p_m4, p_m5, p_m6, p_m7, p_m8, p_m9, p_m10 and p_m11 are 0, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 bytes from k_m, respectively, and p_m2, p_m3, p_m4, p_m5, p_m6, p_m7, p_m8, p_m9, p_m10 and p_m11 all lie in the direction opposite to the data stream split point search direction relative to the potential split point k_m.
It is judged, for each of the windows W_m1[p_m1-169, p_m1], W_m2[p_m2-169, p_m2], W_m3[p_m3-169, p_m3], W_m4[p_m4-169, p_m4], W_m5[p_m5-169, p_m5], W_m6[p_m6-169, p_m6], W_m7[p_m7-169, p_m7], W_m8[p_m8-169, p_m8], W_m9[p_m9-169, p_m9], W_m10[p_m10-169, p_m10] and W_m11[p_m11-169, p_m11], whether at least part of the data in the window satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively.
When at least part of the data in each of the windows W_m1, W_m2, W_m3, W_m4, W_m5, W_m6, W_m7, W_m8, W_m9, W_m10 and W_m11 satisfies the corresponding predetermined condition C₁, C₂, C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, the current potential split point k_m is a data stream split point. When at least part of the data in any one of the windows does not satisfy the predetermined condition, the jump is performed according to the scheme described above to obtain the next potential split point, which is then judged as to whether it is a data stream split point.
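The walk from k_i to k_j to k_l described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the `satisfies` callback, and the choice to test the windows in the order x = 1 to 11 and stop at the first failure are all hypothetical. The sketch also assumes that, after two consecutive failed candidates, the jump starts from whichever failing point lies further left, which matches the FIG. 15 example (p_i4 = k_i - 6, so k_l = p_i4 + 21 = k_i + 15).

```python
from typing import Callable, List, Optional, Tuple

NUM_POINTS = 11   # points p_1..p_11 per potential split point
SPACING = 2       # d_x = 2 * (x - 1) bytes behind the candidate
A = 169           # each window is [p_x - A, p_x]; B = 0
JUMP = 21         # jump length used in the FIG. 15 example


def points_for(k: int) -> List[int]:
    """Positions p_1..p_11 for potential split point k (p_1 equals k)."""
    return [k - SPACING * x for x in range(NUM_POINTS)]


def window_for(p: int) -> Tuple[int, int]:
    """Inclusive byte range [p - A, p] of the window anchored at p."""
    return (p - A, p)


def first_failing_point(k: int,
                        satisfies: Callable[[int, int], bool]) -> Optional[int]:
    """Return the first point (in order x = 1..11) whose window fails, else None."""
    for p in points_for(k):
        if not satisfies(*window_for(p)):
            return p
    return None


def find_split_point(start_k: int, end: int,
                     satisfies: Callable[[int, int], bool]) -> Optional[int]:
    """Search left to right for a candidate whose 11 windows all pass."""
    k = start_k
    while k <= end:
        fail_a = first_failing_point(k, satisfies)
        if fail_a is None:
            return k
        k_next = k + 1                      # first failure: move 1 byte right
        if k_next > end:
            return None
        fail_b = first_failing_point(k_next, satisfies)
        if fail_b is None:
            return k_next
        # Second failure: jump from the failing point that lies further left
        # (assumption generalising the FIG. 15 case).
        k = min(fail_a, fail_b) + JUMP
    return None
```

In this sketch the leftmost point of a candidate lies 20 bytes behind it, so a 21-byte jump places that point one byte to the right of the failing point the jump started from, and the search always moves forward.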

本发明实施例提供了一种判断窗口W_iz[p_iz-A_z,p_iz+B_z]中至少部分数据是否满足预定条件C_z的方法，本实施例中使用随机函数判断窗口W_iz[p_iz-A_z,p_iz+B_z]中至少部分数据是否满足预定条件C_z，以图5所示的实施方式为例，根据在去重服务器103上预设的规则，为潜在分割点k_i确定点p_i1及点p_i1对应的窗口W_i1[p_i1-169,p_i1]，判断W_i1[p_i1-169,p_i1]中至少部分数据是否满足预定的条件C₁，如图16所示，W_i1表示窗口W_i1[p_i1-169,p_i1]，为判断W_i1[p_i1-169,p_i1]中至少部分数据是否满足预定条件C₁，选择5个字节，图16中“■”表示选择的1个字节，相邻两个选择的字节之间相差42个字节。将选择的5字节数据反复利用51次，共获得255字节，以增加随机性。其中每个字节由8位组成，记为a_m,1…a_m,8,表示255个字节中第m个字节的第1到第8位，因此，255个字节对应的位可以表示为：当a_m,n＝1时，V_am,n＝1，当 a_m,n＝0时，V_am,n＝-1，其中a_m,n表示a_m,1…a_m,8中的任一个，255个字节对应的位按照a_m,n与V_am,n的转换关系得到矩阵V_a，可以表示为：选取大量随机数，组成矩阵，由随机数据组成的矩阵一旦组成，保持不变，如从服从特定分布(这里以正态分布为例)的随机数中选择255*8个随机数组成矩阵R：将矩阵V_a的第m行与矩阵R的第m行的随机数相乘，然后求和得到一个值，具体表示为S_am＝V_am,1*h_m,1+V_am,2*h_m,2+…+V_am,8*h_m,8。根据该方法，获得S_a1、S_a2…到S_a255，统计S_a1、S_a2…到S_a255中满足特定条件(这里以大于0为例)的值的个数K。由于矩阵R服从正态分布，则S_am与矩阵R一样，仍然服从正态分布，根据概率论，正态分布随机数大于0的概率为1/2，在S_a1、S_a2…到S_a255中，每个值大于0的概率为1/2，所以K满足二项分布：根据统计结果，判断S_a1、S_a2…到S_a255的值大于0的个数K是否为偶数，二项分布的随机数为偶数的概率为为1/2，所以K以1/2的概率满足条件。当K为偶数时，表明W_i1[p_i1-169,p_i1]中至少部分数据满足预定条件C₁；当K为奇数时，表明W_i1[p_i1-169,p_i1]中至少部分数据不满足预定条件C₁,这里C₁即指根据上述方式获得的S_a1、S_a2…到S_a255的值大于0的个数K为偶数。在图5所示的实施方式中，在W_i1[p_i1-169,p_i1]、W_i2[p_i2-169,p_i2]、W_i3[p_i3-169,p_i3]、W_i4[p_i4-169,p_i4]、W_i5[p_i5-169,p_i5]、W_i6[p_i6-169,p_i6]、W_i7[p_i7-169,p_i7]、W_i8[p_i8-169,p_i8]、W_i9[p_i9-169,p_i9]、W_i10[p_i10-169, 
p_i10]和W_i11[p_i11-169,p_i11]中，各窗口大小相同,即窗口大小均为169字节，同时判断窗口中至少部分数据是否满足预定条件的方式也相同,具体见上述判断W_i1[p_i1-169,p_i1]中至少部分数据是否满足预定条件C₁的描述。因此，如图16所示，表示判断窗口W_i2[p_i2-169,p_i2]中至少部分数据是否满足预定条件C₂时选择的1个字节，相邻两个选择的字节之间相差42个字节。将选择的5字节数据反复利用51次，共获得255字节，以增加随机性。其中每个字节由8位组成，记为b_m,1…b_m,8,表示255个字节中第m个字节的第1到第8位，因此，255个字节对应的位可以表示为：当b_m,n＝1时，V_bm,n＝1，当b_m,n＝0时，V_bm,n＝-1，其中b_m,n表示b_m,1…b_m,8中的任一个，255个字节对应的位按照b_m,n与V_bm,n的转换关系得到矩阵V_b，可以表示为：判断W_i1[p_i1-169,p_i1]中至少部分数据是否满足预定条件的方式与判断窗口W_i2[p_i2-169,p_i2]中至少部分数据是否满足预定条件的方式相同，因此使用矩阵R：将矩阵V_b的第m行与矩阵R的第m行的随机数相乘，然后求和得到一个值，具体表示为S_bm＝V_bm,1*h_m,1+V_bm,2*h_m,2+…+V_bm,8*h_m,8。根据该方法，获得S_b1、S_b2…到S_b255，统计S_b1、S_b2…到S_b255中满足特定条件(这里以大于0为例)的值的个数K。由于矩阵R服从正态分布，则S_bm与矩阵R一样，仍然服从正态分布，根据概率论，正态分布随机数大于0的概率为1/2，在S_b1、S_b2…到S_b255中，每个值大于0的概率为1/2，所以K满足二项分布：根据统计结果，判断S_b1、S_b2…到S_b255的值大于0的个数K是否为偶数，二项分布的随机数为偶数的概率为为1/2，所以K以1/2的概率满足条件。当K为偶数时，表明W_i2[p_i2-169,p_i2]中至少部分数据满足预定条件C₂；当K为奇数时，表明W_i2[p_i2-169,p_i2]中至少部分数据不满足预定条件C₂,这里C₂即指根据上述方式获得的S_b1、S_b2…到S_b255的值大于0的个数K为偶数。图3所示的实施方式中，W_i2[p_i2-169,p_i2]中至少部分数据满足预定条件C₂。The embodiment of the present invention provides a method for judging whether at least part of the data in the window W _iz [p _iz -A _z ,p _iz +B _z ] satisfies the predetermined condition C _z . 
In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. Taking the implementation shown in FIG. 5 as an example, according to the rule preset on the deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. As shown in FIG. 16, W_i1 denotes the window W_i1[p_i1-169, p_i1]. To judge whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁, 5 bytes are selected; each "■" in FIG. 16 represents one selected byte, and adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as the 255×8 matrix $(a_{m,n})$ with $m = 1, \dots, 255$ and $n = 1, \dots, 8$. When a_m,n = 1, V_am,n = 1, and when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Applying this conversion between a_m,n and V_am,n to the bits of the 255 bytes gives the 255×8 matrix V_a, $V_a = (V_{am,n})$. A large number of random numbers are selected to form a matrix, and once formed this matrix of random numbers remains unchanged. For example, 255*8 random numbers drawn from a specific distribution (the normal distribution is taken as the example here) form the matrix R, $R = (h_{m,n})$ with $m = 1, \dots, 255$ and $n = 1, \dots, 8$. The m-th row of the matrix V_a is multiplied element by element with the m-th row of the matrix R and the products are summed to give one value, specifically S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8.
According to this method, S_a1, S_a2, … to S_a255 are obtained, and the number K of values among S_a1, S_a2, … to S_a255 that satisfy a specific condition (greater than 0 is taken as the example here) is counted. Since the entries of the matrix R follow a normal distribution, S_am, like the matrix R, also follows a normal distribution. According to probability theory, a normally distributed random number (for a normal distribution with zero mean) is greater than 0 with probability 1/2, so each value among S_a1, S_a2, … to S_a255 is greater than 0 with probability 1/2, and K follows a binomial distribution: $P(K = k) = \binom{255}{k} (1/2)^{k} (1/2)^{255-k}$. Based on the count, it is judged whether the number K of values among S_a1, S_a2, … to S_a255 that are greater than 0 is even. A random number following this binomial distribution is even with probability 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁; when K is odd, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy the predetermined condition C₁. Here C₁ means that the number K of values among S_a1, S_a2, … to S_a255, obtained as described above, that are greater than 0 is even. In the embodiment shown in FIG. 5, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11] all have the same size, namely 169 bytes, and the way of judging whether at least part of the data in a window satisfies the predetermined condition is also the same for every window; for details, see the description above of judging whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. Therefore, as shown in FIG. 16, the corresponding marker represents one byte selected when judging whether at least part of the data in the window W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart.
The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted b_m,1 … b_m,8, which are the 1st to 8th bits of the m-th of the 255 bytes, so the bits of the 255 bytes can be expressed as the 255×8 matrix $(b_{m,n})$ with $m = 1, \dots, 255$ and $n = 1, \dots, 8$. When b_m,n = 1, V_bm,n = 1, and when b_m,n = 0, V_bm,n = -1, where b_m,n denotes any one of b_m,1 … b_m,8. Applying this conversion between b_m,n and V_bm,n to the bits of the 255 bytes gives the 255×8 matrix V_b, $V_b = (V_{bm,n})$. The way of judging whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition is the same as the way of judging whether at least part of the data in the window W_i2[p_i2-169, p_i2] satisfies the predetermined condition, so the same matrix R, $R = (h_{m,n})$, is used. The m-th row of the matrix V_b is multiplied element by element with the m-th row of the matrix R and the products are summed to give one value, specifically S_bm = V_bm,1*h_m,1 + V_bm,2*h_m,2 + … + V_bm,8*h_m,8. According to this method, S_b1, S_b2, … to S_b255 are obtained, and the number K of values among S_b1, S_b2, … to S_b255 that satisfy a specific condition (greater than 0 is taken as the example here) is counted. Since the entries of the matrix R follow a normal distribution, S_bm, like the matrix R, also follows a normal distribution. According to probability theory, a normally distributed random number (for a normal distribution with zero mean) is greater than 0 with probability 1/2, so each value among S_b1, S_b2, … to S_b255 is greater than 0 with probability 1/2, and K follows a binomial distribution: $P(K = k) = \binom{255}{k} (1/2)^{k} (1/2)^{255-k}$. Based on the count, it is judged whether the number K of values among S_b1, S_b2, … to S_b255 that are greater than 0 is even. A random number following this binomial distribution is even with probability 1/2, so K satisfies the condition with probability 1/2.
When K is even, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂; when K is odd, at least part of the data in W_i2[p_i2-169, p_i2] does not satisfy the predetermined condition C₂. Here C₂ means that the number K of values among S_b1, S_b2, … to S_b255, obtained as described above, that are greater than 0 is even. In the embodiment shown in FIG. 3, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂.
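The parity test described above can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the exact positions of the 5 selected bytes inside the window (here p-168, p-126, p-84, p-42 and p), the bit order (most significant bit first), the seed, and the names `select_bytes`, `satisfies_condition` and `R` are assumptions introduced for the example. The standard library generator stands in for whatever source of normally distributed numbers a real system would use.

```python
import random
from typing import Sequence

ROWS, BITS = 255, 8

# R is built once and never changed afterwards; the fixed seed is an assumption
# that stands in for "once formed, the matrix remains unchanged".
_rng = random.Random(2014)
R = [[_rng.gauss(0.0, 1.0) for _ in range(BITS)] for _ in range(ROWS)]


def select_bytes(data: bytes, p: int) -> bytes:
    """Pick 5 bytes of the window [p - 169, p], 42 bytes apart (assumed offsets)."""
    return bytes(data[p - 42 * i] for i in range(4, -1, -1))


def satisfies_condition(data: bytes, p: int,
                        matrix: Sequence[Sequence[float]] = R) -> bool:
    """Return True when the count K of positive row sums S_m is even."""
    picked = select_bytes(data, p) * 51       # 5 bytes reused 51 times -> 255 bytes
    k = 0
    for m, byte in enumerate(picked):
        # Map each bit to +1 (bit set) or -1 (bit clear), MSB first (assumed).
        v = [1 if (byte >> (7 - n)) & 1 else -1 for n in range(BITS)]
        s_m = sum(v[n] * matrix[m][n] for n in range(BITS))
        if s_m > 0:
            k += 1
    return k % 2 == 0
```

Because every window uses the same matrix R and the same rule, the result depends only on the bytes at the selected positions, so the same window content always gives the same answer.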

因此，如图16所示，表示判断窗口W_i3[p_i3-169,p_i3]中至少部分数据是否满足预定条件C₃时选择的1个字节，相邻两个选择的字节之间相差42个字节。将选择的5字节数据反复利用51次，共获得255字节，以增加随机性。然后使用判断窗口W_i1[p_i1-169,p_i1]和W_i2[p_i2-169,p_i2]中至少部分数据是否满足预定条件的方法，判断W_i3[p_i3-169,p_i3]中至少数据是否满足预定条件C₃。图5所示的实施方式中，W_i3[p_i3-169,p_i3]中至少部分数据满足预定条件。如图16所示，表示判断窗口W_i4[p_i4-169,p_i4]中至少部分数据是否满足预定条件C₄时选择的1个字节，相邻两个选择的字节之间相差42个字节。将选择的5字节数据反复利用51次，共获得255字节，以增加随机性。然后使用判断窗口W_i1[p_i1-169,p_i1]、W_i2[p_i2-169,p_i2]和W_i3[p_i3-169,p_i3]中至少部分数据是否满足预定条件的方法，判断W_i4[p_i4-169,p_i4]中至少部分数据是否满足预定条件C₄。图5所示的实施方式中，W_i4[p_i4-169,p_i4]中至少部分数据满足预定条件C₄。如图16所示，表示判断窗口W_i5[p_i5-169,p_i5]中至少部分数据是否满足预定条件C₅时选择的1个字节，相邻两个选择的字节之间相差42个字节。将选择的5字节数据反复利用51次，共获得255字节，以增加随机性。然后使用判断窗口W_i1[p_i1-169,p_i1]、W_i2[p_i2-169,p_i2]、W_i3[p_i3-169,p_i3]和W_i4[p_i4-169,p_i4]中至少部分数据是否满足预定条件的方法，判断W_i5[p_i5-169,p_i5]中至少数据是否满足预定条件C₅。图5所示的实施方式中，W_i5[p_i5-169,p_i5]中至少部分数据不满足预定条件C₅。Therefore, as shown in FIG. 16, the corresponding marker represents one byte selected when judging whether at least part of the data in the window W_i3[p_i3-169, p_i3] satisfies the predetermined condition C₃, and adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Then, using the method applied to the windows W_i1[p_i1-169, p_i1] and W_i2[p_i2-169, p_i2], it is judged whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies the predetermined condition C₃. In the embodiment shown in FIG. 5, at least part of the data in W_i3[p_i3-169, p_i3] satisfies the predetermined condition. As shown in FIG. 16, the corresponding marker represents one byte selected when judging whether at least part of the data in the window W_i4[p_i4-169, p_i4] satisfies the predetermined condition C₄, and adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness.
Then use the method of judging whether at least part of the data in the windows W _i1 [p _i1 -169,p _i1 ], W _i2 [p _i2 -169,p _i2 ] and W _i3 [p _i3 -169,p _i3 ] meet the predetermined conditions, It is judged whether at least part of the data in W _i4 [p _i4 -169, p _i4 ] satisfies the predetermined condition C ₄ . In the embodiment shown in FIG. 5 , at least part of the data in W _i4 [p _i4 -169, p _i4 ] satisfies the predetermined condition C ₄ . As shown in Figure 16, Indicates one byte selected when judging whether at least part of the data in the window W _i5 [p _i5 -169,p _i5 ] satisfies the predetermined condition C ₅ , and the difference between two adjacent selected bytes is 42 bytes. The selected 5-byte data is reused 51 times to obtain a total of 255 bytes to increase randomness. Then use the judgment windows W _i1 [p _i1 -169,p _i1 ], W _i2 [p _i2 -169,p _i2 ], W _i3 [p _i3 -169,p _i3 ] and W _i4 [p _i4 -169,p _i4 The method of whether at least part of the data in W _i5 [p _i5 −169, p _i5 ] satisfies the predetermined condition C ₅ is determined. In the embodiment shown in FIG. 5 , at least part of the data in W _i5 [p _i5 −169, p _i5 ] does not satisfy the predetermined condition C ₅ .

When at least part of the data in W_i5[p_i5-169, p_i5] does not meet the predetermined condition C₅, 11 bytes are skipped from point p_i5 along the search direction for data stream split points, and the next potential split point k_j is obtained at the end position of the 11th byte. As shown in FIG. 6, according to the rule preset on the deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. Whether at least part of the data in window W_j1[p_j1-169, p_j1] meets the predetermined condition C₁ is judged in the same way as for window W_i1[p_i1-169, p_i1]. Accordingly, as shown in FIG. 17, W_j1 denotes the window W_j1[p_j1-169, p_j1], and 5 bytes are selected to judge whether at least part of the data in it meets the predetermined condition C₁. In FIG. 17, "■" indicates one selected byte, and adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to obtain 255 bytes in total, which increases randomness.

Each byte consists of 8 bits, denoted a_m,1', ..., a_m,8', which are the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be written as a 255*8 matrix whose m-th row is a_m,1', ..., a_m,8'. When a_m,n' = 1, V_am,n' = 1; when a_m,n' = 0, V_am,n' = -1, where a_m,n' is any one of a_m,1', ..., a_m,8'. Applying this conversion between a_m,n' and V_am,n' to the bits of the 255 bytes gives the matrix V_a', a 255*8 matrix whose m-th row is V_am,1', ..., V_am,8'.

Because window W_j1[p_j1-169, p_j1] is judged in the same way as window W_i1[p_i1-169, p_i1], the same matrix R is used, a 255*8 matrix of random numbers whose m-th row is h_m,1, ..., h_m,8. The m-th row of matrix V_a' is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value: S_am' = V_am,1'*h_m,1 + V_am,2'*h_m,2 + ... + V_am,8'*h_m,8. In this way S_a1', S_a2', ..., S_a255' are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the entries of matrix R follow a normal distribution, each S_am' also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_a1', S_a2', ..., S_a255' is greater than 0 with probability 1/2, and K follows a binomial distribution with 255 trials and probability 1/2 per trial. Based on the count, it is judged whether the number K of values greater than 0 among S_a1', S_a2', ..., S_a255' is even. The probability that such a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_j1[p_j1-169, p_j1] meets the predetermined condition C₁; when K is odd, at least part of the data in W_j1[p_j1-169, p_j1] does not meet the predetermined condition C₁.
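The judgment described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the byte offsets inside the window (0, 42, ..., 168), the most-significant-bit-first ordering, the standard normal parameters and the function names `select_bytes`, `make_matrix_r` and `satisfies_condition` are all assumptions made for illustration.

```python
import random

WINDOW_LEN = 170   # window W[p-169, p] covers 170 bytes
STRIDE = 42        # distance between adjacent selected bytes
REPEAT = 51        # 5 selected bytes reused 51 times -> 255 bytes


def select_bytes(window: bytes) -> bytes:
    """Pick 5 bytes, 42 bytes apart (assumed offsets 0, 42, 84, 126, 168)."""
    if len(window) != WINDOW_LEN:
        raise ValueError("window must be exactly 170 bytes")
    return bytes(window[i] for i in range(0, WINDOW_LEN, STRIDE))


def make_matrix_r(seed: int, rows: int = 255, cols: int = 8) -> list:
    """Build a 255*8 matrix R of normally distributed random numbers h_m,n."""
    rng = random.Random(seed)  # fixed seed: every window must see the same R
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def satisfies_condition(window: bytes, r: list) -> bool:
    """Return True when the count K of positive sums S_m is even."""
    expanded = select_bytes(window) * REPEAT  # 255 bytes
    k = 0
    for m, byte in enumerate(expanded):
        # Bit n maps to +1 when set and -1 when clear (assumed MSB first),
        # then the row is combined with row m of R.
        s_m = sum(
            (1 if (byte >> (7 - n)) & 1 else -1) * r[m][n] for n in range(8)
        )
        if s_m > 0:
            k += 1
    return k % 2 == 0
```

Because R is generated once from a fixed seed, the same window content always yields the same decision, which is what lets identical data produce identical split points.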

Whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[p_i2-169, p_i2]. Therefore, as shown in FIG. 17, the corresponding marker indicates one byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂; adjacent selected bytes are 42 bytes apart. The 5 selected bytes are reused 51 times to obtain 255 bytes in total, which increases randomness.

Each byte consists of 8 bits, denoted b_m,1', ..., b_m,8', which are the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be written as a 255*8 matrix whose m-th row is b_m,1', ..., b_m,8'. When b_m,n' = 1, V_bm,n' = 1; when b_m,n' = 0, V_bm,n' = -1, where b_m,n' is any one of b_m,1', ..., b_m,8'. Applying this conversion between b_m,n' and V_bm,n' to the bits of the 255 bytes gives the matrix V_b', a 255*8 matrix whose m-th row is V_bm,1', ..., V_bm,8'.

Because windows W_i2[p_i2-169, p_i2] and W_j2[p_j2-169, p_j2] are judged in the same way, matrix R is still used. The m-th row of matrix V_b' is multiplied element by element with the random numbers in the m-th row of matrix R, and the products are summed to obtain one value: S_bm' = V_bm,1'*h_m,1 + V_bm,2'*h_m,2 + ... + V_bm,8'*h_m,8. In this way S_b1', S_b2', ..., S_b255' are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the entries of matrix R follow a normal distribution, each S_bm' also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_b1', S_b2', ..., S_b255' is greater than 0 with probability 1/2, and K follows a binomial distribution with 255 trials and probability 1/2 per trial. Based on the count, it is judged whether the number K of values greater than 0 among S_b1', S_b2', ..., S_b255' is even. The probability that such a binomially distributed random number is even is 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_j2[p_j2-169, p_j2] meets the predetermined condition C₂; when K is odd, at least part of the data in W_j2[p_j2-169, p_j2] does not meet the predetermined condition C₂.

Similarly, whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃ is judged in the same way as for W_i3[p_i3-169, p_i3]. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4], W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11] satisfies the predetermined conditions C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, and details are not repeated here.
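The statement that K is even with probability 1/2 can be checked with a short derivation. It assumes, as an idealization not spelled out in the text, that the n = 255 sums are independent and that each exceeds 0 with probability p = 1/2; q denotes 1 - p.

```latex
\begin{aligned}
(q+p)^n + (q-p)^n &= 2\sum_{k\ \mathrm{even}} \binom{n}{k} p^k q^{\,n-k}
  \quad\Longrightarrow\quad
  P(K\ \text{even}) = \frac{1 + (1-2p)^n}{2},\\[4pt]
p = \tfrac12,\ n = 255:\qquad
  P(K\ \text{even}) &= \frac{1 + 0^{255}}{2} = \frac12 .
\end{aligned}
```

The result does not depend on n when p = 1/2, so the same parity argument holds for any number of sums.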

Still taking the embodiment shown in FIG. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. In this embodiment, a random function is used for this judgment. According to the rule preset on the deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. As shown in FIG. 16, W_i1 denotes the window W_i1[p_i1-169, p_i1], and 5 bytes are selected to judge whether at least part of the data in it satisfies the predetermined condition C₁. In FIG. 16, "■" indicates one selected byte, and adjacent selected bytes "■" are 42 bytes apart.

One implementation applies a hash function to the 5 selected bytes. The values produced by the hash function are uniformly distributed. If the value produced by the hash function is even, at least part of the data in W_i1[p_i1-169, p_i1] is judged to satisfy the predetermined condition C₁; that is, C₁ means that the value computed with the hash function in the above manner is even. The probability that at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition is therefore 1/2. In the embodiment shown in FIG. 5, the hash function is likewise used to judge whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂, whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies C₃, whether at least part of the data in W_i4[p_i4-169, p_i4] satisfies C₄, and whether at least part of the data in W_i5[p_i5-169, p_i5] satisfies C₅. For the specific implementation, refer to the description above of using the hash function to judge whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁ in the embodiment shown in FIG. 5; details are not repeated here.
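The hash-based judgment can be sketched as follows. The text does not name a particular hash function, so SHA-256 from Python's standard `hashlib` is used purely as a stand-in; the function name `hash_condition` and the byte offsets inside the window are likewise assumptions.

```python
import hashlib

WINDOW_LEN = 170   # window W[p-169, p] covers 170 bytes
STRIDE = 42        # distance between adjacent selected bytes


def hash_condition(window: bytes) -> bool:
    """Hash the 5 selected bytes and report whether the hash value is even."""
    if len(window) != WINDOW_LEN:
        raise ValueError("window must be exactly 170 bytes")
    # Assumed offsets 0, 42, 84, 126, 168 inside the window.
    selected = bytes(window[i] for i in range(0, WINDOW_LEN, STRIDE))
    digest = hashlib.sha256(selected).digest()  # stand-in hash, an assumption
    # The parity of the whole digest, read as an integer, is the parity of
    # its last byte.
    return digest[-1] % 2 == 0
```

Only the 5 selected bytes influence the result, so each window costs one short hash rather than a fingerprint over all 170 bytes.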

When at least part of the data in W_i5[p_i5-169, p_i5] does not meet the predetermined condition C₅, 11 bytes are skipped from point p_i5 along the search direction for data stream split points, and the current potential split point k_j is obtained at the end position of the 11th byte. As shown in FIG. 6, according to the rule preset on the deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. Whether at least part of the data in window W_j1[p_j1-169, p_j1] meets the predetermined condition C₁ is judged in the same way as for window W_i1[p_i1-169, p_i1]. Accordingly, as shown in FIG. 17, W_j1 denotes the window W_j1[p_j1-169, p_j1], and 5 bytes are selected to judge whether at least part of the data in it meets the predetermined condition C₁. In FIG. 17, "■" indicates one selected byte, and adjacent selected bytes "■" are 42 bytes apart. The hash function is applied to the 5 bytes selected from window W_j1[p_j1-169, p_j1]; if the resulting value is even, at least part of the data in W_j1[p_j1-169, p_j1] meets the predetermined condition C₁.

In FIG. 17, whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[p_i2-169, p_i2]. Therefore, as shown in FIG. 17, the corresponding marker indicates one byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the resulting value is even, at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂.

In FIG. 17, whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃ is judged in the same way as for W_i3[p_i3-169, p_i3]. Therefore, as shown in FIG. 17, the corresponding marker indicates one byte selected when judging whether at least part of the data in window W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the resulting value is even, at least part of the data in W_j3[p_j3-169, p_j3] meets the predetermined condition C₃.

In FIG. 17, whether at least part of the data in W_j4[p_j4-169, p_j4] meets the predetermined condition C₄ is judged in the same way as for window W_i4[p_i4-169, p_i4]. Therefore, as shown in FIG. 17, the corresponding marker indicates one byte selected when judging whether at least part of the data in window W_j4[p_j4-169, p_j4] satisfies the predetermined condition C₄, and adjacent selected bytes are 42 bytes apart. The hash function is applied to the 5 selected bytes; if the resulting value is even, at least part of the data in W_j4[p_j4-169, p_j4] satisfies the predetermined condition C₄.

The same method is used to judge whether at least part of the data in W_j5[p_j5-169, p_j5], W_j6[p_j6-169, p_j6], W_j7[p_j7-169, p_j7], W_j8[p_j8-169, p_j8], W_j9[p_j9-169, p_j9], W_j10[p_j10-169, p_j10] and W_j11[p_j11-169, p_j11] satisfies the predetermined conditions C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively, and details are not repeated here.
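The overall search, in which a failed window triggers an 11-byte skip, might be sketched as below. The preset rule that places the points p_1 ... p_11 relative to a potential split point k is not given in this passage, so the sketch assumes p_z = k - (11 - z), that is, points one byte apart with p_11 = k. The names `find_split_point` and `check` are hypothetical, and `check` stands for any of the per-window judgments described in this document.

```python
from typing import Callable, Optional

WINDOW_LEN = 170   # each window W[p-169, p] covers 170 bytes
M_WINDOWS = 11     # windows judged per potential split point
SKIP = 11          # bytes skipped from the point whose window failed

# check(z, window_bytes) returns True when condition C_z is met.
Check = Callable[[int, bytes], bool]


def find_split_point(data: bytes, start: int, check: Check) -> Optional[int]:
    """Return the first position k at which all 11 windows pass, else None."""
    # k must leave room for the earliest window, which ends at k - 10.
    k = max(start, WINDOW_LEN + M_WINDOWS - 1)
    while k <= len(data):
        for z in range(1, M_WINDOWS + 1):
            p_z = k - (M_WINDOWS - z)            # assumed layout, p_11 == k
            window = data[p_z - WINDOW_LEN:p_z]  # 170 bytes ending at p_z
            if not check(z, window):
                k = p_z + SKIP                   # next potential split point
                break
        else:
            return k                             # every window passed
    return None
```

Under this assumed layout a failure at window z moves the potential split point forward by z bytes, so early failures are cheap to detect and the search always advances.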

Taking the embodiment shown in FIG. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. In this embodiment, a random function is used for this judgment. According to the rule preset on the deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. As shown in FIG. 16, W_i1 denotes the window W_i1[p_i1-169, p_i1], and 5 bytes are selected to judge whether at least part of the data in it satisfies the predetermined condition C₁. In FIG. 16, the bytes "■" numbered 169, 127, 85, 43 and 1 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁, a₂, a₃, a₄ and a₅ respectively. Because one byte consists of 8 bits, each byte "■" taken as a value satisfies 0≤a_r≤255 for every a_r among a₁, a₂, a₃, a₄ and a₅. a₁, a₂, a₃, a₄ and a₅ form a 1*5 matrix. 256*5 random numbers are selected from random numbers that follow a binomial distribution to form a matrix R, in which the entry in the row for byte value v (0≤v≤255) and column c (1≤c≤5) is denoted h_v,c.

The corresponding value is looked up in matrix R according to the value of a₁ and the column in which it is located: for example, if a₁ = 36 and a₁ is in column 1, the value h_36,1 is looked up. Likewise, if a₂ = 48 and a₂ is in column 2, the value h_48,2 is looked up; if a₃ = 26 and a₃ is in column 3, the value h_26,3 is looked up; if a₄ = 26 and a₄ is in column 4, the value h_26,4 is looked up; and if a₅ = 88 and a₅ is in column 5, the value h_88,5 is looked up. Then S₁ = h_36,1 + h_48,2 + h_26,3 + h_26,4 + h_88,5. Because the entries of matrix R follow a binomial distribution, S₁ also follows a binomial distribution. When S₁ is even, at least part of the data in W_i1[p_i1-169, p_i1] meets the predetermined condition C₁; when S₁ is odd, at least part of the data in W_i1[p_i1-169, p_i1] does not meet the predetermined condition C₁. The probability that S₁ is even is 1/2, and C₁ means that S₁ computed in the above manner is even. In the embodiment shown in FIG. 5, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁.

As shown in FIG. 16, the corresponding marker indicates one byte selected when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂. In FIG. 16 these bytes are numbered 170, 128, 86, 44 and 2, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b₁, b₂, b₃, b₄ and b₅ respectively. Because one byte consists of 8 bits, each such byte taken as a value satisfies 0≤b_r≤255 for every b_r among b₁, b₂, b₃, b₄ and b₅. b₁, b₂, b₃, b₄ and b₅ form a 1*5 matrix. In this embodiment W_i1 and W_i2 are judged in the same way, so matrix R is still used. The corresponding value is looked up in matrix R according to each value and its column: if b₁ = 66 and b₁ is in column 1, the value h_66,1 is looked up; if b₂ = 48 and b₂ is in column 2, the value h_48,2 is looked up; if b₃ = 99 and b₃ is in column 3, the value h_99,3 is looked up; if b₄ = 26 and b₄ is in column 4, the value h_26,4 is looked up; and if b₅ = 90 and b₅ is in column 5, the value h_90,5 is looked up. Then S₂ = h_66,1 + h_48,2 + h_99,3 + h_26,4 + h_90,5. Because the entries of matrix R follow a binomial distribution, S₂ also follows a binomial distribution. When S₂ is even, at least part of the data in W_i2[p_i2-169, p_i2] meets the predetermined condition C₂; when S₂ is odd, at least part of the data in W_i2[p_i2-169, p_i2] does not meet the predetermined condition C₂. The probability that S₂ is even is 1/2. In the embodiment shown in FIG. 5, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂.

The same rule is used to judge whether at least part of the data in W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11] satisfies the predetermined conditions C₃, C₄, C₅, C₆, C₇, C₈, C₉, C₁₀ and C₁₁, respectively.

In the embodiment shown in FIG. 5, at least part of the data in W_i5[p_i5-169, p_i5] does not meet the predetermined condition C₅, so 11 bytes are skipped from point p_i5 along the search direction for data stream split points, and the current potential split point k_j is obtained at the end position of the 11th byte. As shown in FIG. 6, according to the rule preset on the deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. Whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁ is judged in the same way as for window W_i1[p_i1-169, p_i1]. Accordingly, as shown in FIG. 17, W_j1 denotes the window W_j1[p_j1-169, p_j1]. To judge whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁, the bytes "■" numbered 169, 127, 85, 43 and 1 in FIG. 17 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁', a₂', a₃', a₄' and a₅' respectively. Because one byte consists of 8 bits, each byte "■" taken as a value satisfies 0≤a_r'≤255 for every a_r' among a₁', a₂', a₃', a₄' and a₅'. a₁', a₂', a₃', a₄' and a₅' form a 1*5 matrix. Because window W_j1[p_j1-169, p_j1] is judged in the same way as window W_i1[p_i1-169, p_i1], the same 256*5 matrix R is still used.

For each a_r', the corresponding value is looked up in matrix R according to the value of a_r' and the column in which it is located. For example, if a₁' = 16 and a₁' is in column 1, the value of h_{16,1} is looked up; if a₂' = 98 and a₂' is in column 2, the value of h_{98,2} is looked up; if a₃' = 56 and a₃' is in column 3, the value of h_{56,3} is looked up; if a₄' = 36 and a₄' is in column 4, the value of h_{36,4} is looked up; and if a₅' = 99 and a₅' is in column 5, the value of h_{99,5} is looked up. Then S₁' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}. Because matrix R follows a binomial distribution, S₁' also follows a binomial distribution. When S₁' is even, at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁; when S₁' is odd, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy C₁. The probability that S₁' is even is 1/2.
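The table lookup and parity test described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the helper names (`make_table`, `selected_bytes`, `satisfies_condition`) are hypothetical, the table entries are drawn as 0/1 coin flips as one possible reading of "random numbers following a binomial distribution", and the byte with sequence number 169 is assumed to map to column 1.

```python
import random

STRIDE = 42       # gap between selected bytes, as in the embodiment
N_SELECTED = 5    # number of bytes sampled from each 169-byte window

def make_table(seed=0, rows=256, cols=N_SELECTED):
    """Hypothetical helper: a 256x5 table R of 0/1 coin flips (an assumption)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

def selected_bytes(data, p):
    """Sample bytes 169, 127, 85, 43 and 1 of the window data[p-169:p]."""
    # Byte 169 is data[p-1]; each further sample lies 42 bytes earlier.
    return [data[p - 1 - r * STRIDE] for r in range(N_SELECTED)]

def satisfies_condition(data, p, table):
    """Sum table[value][column] over the samples and test whether the sum is even."""
    values = selected_bytes(data, p)
    s = sum(table[v][col] for col, v in enumerate(values))
    return s % 2 == 0
```

Only five bytes of the window are read per test, so the cost of one window check is five table lookups and one parity test, regardless of the window length.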

Whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[p_i2-169, p_i2]. As shown in FIG. 17, the marked bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies C₂, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted b₁', b₂', b₃', b₄' and b₅' respectively. Because one byte consists of 8 bits, each byte is treated as one value, and every b_r' among b₁', b₂', b₃', b₄' and b₅' satisfies 0 ≤ b_r' ≤ 255. b₁', b₂', b₃', b₄' and b₅' form a 1×5 matrix. The same matrix R is used as for judging window W_i2[p_i2-169, p_i2], and the corresponding value is looked up in R according to the value of each b_r' and the column in which it is located. For example, if b₁' = 210 and b₁' is in column 1, the value of h_{210,1} is looked up; if b₂' = 156 and b₂' is in column 2, the value of h_{156,2} is looked up; if b₃' = 144 and b₃' is in column 3, the value of h_{144,3} is looked up; if b₄' = 60 and b₄' is in column 4, the value of h_{60,4} is looked up; and if b₅' = 90 and b₅' is in column 5, the value of h_{90,5} is looked up. Then S₂' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}. The judging condition is the same as for S₂: when S₂' is even, at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂; when S₂' is odd, at least part of the data in W_j2[p_j2-169, p_j2] does not satisfy C₂. The probability that S₂' is even is 1/2.

Similarly, whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃ is judged in the same way as for W_i3[p_i3-169, p_i3]. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C₄, in W_j5[p_j5-169, p_j5] satisfies C₅, in W_j6[p_j6-169, p_j6] satisfies C₆, in W_j7[p_j7-169, p_j7] satisfies C₇, in W_j8[p_j8-169, p_j8] satisfies C₈, in W_j9[p_j9-169, p_j9] satisfies C₉, in W_j10[p_j10-169, p_j10] satisfies C₁₀, and in W_j11[p_j11-169, p_j11] satisfies C₁₁; the details are not repeated here.
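The overall search that this embodiment walks through (test the 11 windows of a potential split point, and on the first failing window skip 11 bytes from that window's point) might be sketched as below. This is a sketch under assumptions, not the definitive procedure: the function name `find_split_point` and the predicate `window_ok` are hypothetical, and because the preset rule that places p_1 ... p_11 relative to k is not restated in this passage, the code simply assumes the points lie 0, 1, ..., 10 bytes before k.

```python
def find_split_point(data, start, window_ok, m=11, window=169, skip=11):
    """Return the first offset k >= start at which all m windows pass, else None.

    window_ok is a caller-supplied predicate on one 169-byte window
    (hypothetical; any of the per-window conditions could be plugged in).
    """
    k = max(start, window + m - 1)       # make sure every window fits in the data
    while k <= len(data):
        failed_p = None
        for z in range(m):
            p = k - z                    # ASSUMED placement: p_1 = k, p_2 = k - 1, ...
            if not window_ok(data[p - window:p]):
                failed_p = p             # first window whose condition is not met
                break
        if failed_p is None:
            return k                     # all m windows satisfied their conditions
        k = failed_p + skip              # jump 11 bytes from the failing point
    return None
```

Under the assumed placement the failing point is at most 10 bytes before k, so each jump of 11 bytes moves the potential split point forward by at least one byte and the loop always terminates.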

Taking the implementation shown in FIG. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. In this embodiment, a random function is used to make the judgment. According to the rule preset on the deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁ is judged. As shown in FIG. 16, W_i1 denotes the window W_i1[p_i1-169, p_i1]. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in FIG. 16 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted a₁, a₂, a₃, a₄ and a₅ respectively. Because one byte consists of 8 bits, each byte "■" is treated as one value, and every a_s among a₁, a₂, a₃, a₄ and a₅ satisfies 0 ≤ a_s ≤ 255. a₁, a₂, a₃, a₄ and a₅ form a 1×5 matrix. 256×5 random numbers are selected from random numbers following a binomial distribution to form matrix R, and 256×5 random numbers are likewise selected to form matrix G, expressed respectively as:

$$R=\begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5}\\ h_{1,1} & h_{1,2} & \cdots & h_{1,5}\\ \vdots & \vdots & & \vdots\\ h_{255,1} & h_{255,2} & \cdots & h_{255,5}\end{pmatrix},\qquad G=\begin{pmatrix} g_{0,1} & g_{0,2} & \cdots & g_{0,5}\\ g_{1,1} & g_{1,2} & \cdots & g_{1,5}\\ \vdots & \vdots & & \vdots\\ g_{255,1} & g_{255,2} & \cdots & g_{255,5}\end{pmatrix}$$

The values in matrices R and G are looked up according to the value of each a_s and the column in which it is located. For example, if a₁ = 36 and a₁ is in column 1, the value of h_{36,1} is looked up in R and the value of g_{36,1} in G; if a₂ = 48 and a₂ is in column 2, h_{48,2} and g_{48,2} are looked up; if a₃ = 26 and a₃ is in column 3, h_{26,3} and g_{26,3} are looked up; if a₄ = 26 and a₄ is in column 4, h_{26,4} and g_{26,4} are looked up; and if a₅ = 88 and a₅ is in column 5, h_{88,5} and g_{88,5} are looked up. Then S_1h = h_{36,1} + h_{48,2} + h_{26,3} + h_{26,4} + h_{88,5}; because matrix R follows a binomial distribution, S_1h also follows a binomial distribution. Likewise S_1g = g_{36,1} + g_{48,2} + g_{26,3} + g_{26,4} + g_{88,5}; because matrix G follows a binomial distribution, S_1g also follows a binomial distribution. When at least one of S_1h and S_1g is even, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁; when S_1h and S_1g are both odd, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy C₁. C₁ thus denotes the condition that at least one of S_1h and S_1g obtained as above is even. Because S_1h and S_1g both follow a binomial distribution, the probability that S_1h is even is 1/2 and the probability that S_1g is even is 1/2, so the probability that both are odd is 1/4 and the probability that at least one is even is 1 - 1/4 = 3/4. Therefore, the probability that at least part of the data in W_i1[p_i1-169, p_i1] satisfies C₁ is 3/4. In the embodiment shown in FIG. 5, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁.

In the implementation shown in FIG. 5, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10] and W_i11[p_i11-169, p_i11] all have the same size, namely 169 bytes, and whether at least part of the data in each window satisfies its predetermined condition is judged in the same way; see the description above for W_i1[p_i1-169, p_i1] and C₁. As shown in FIG. 16, the marked bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted b₁, b₂, b₃, b₄ and b₅ respectively. Because one byte consists of 8 bits, each byte is treated as one value, and every b_s among b₁, b₂, b₃, b₄ and b₅ satisfies 0 ≤ b_s ≤ 255. b₁, b₂, b₃, b₄ and b₅ form a 1×5 matrix. Because the judging method is the same for every window in this implementation, the same matrices R and G are used. For example, if b₁ = 66 and b₁ is in column 1, h_{66,1} is looked up in R and g_{66,1} in G; if b₂ = 48 and b₂ is in column 2, h_{48,2} and g_{48,2} are looked up; if b₃ = 99 and b₃ is in column 3, h_{99,3} and g_{99,3} are looked up; if b₄ = 26 and b₄ is in column 4, h_{26,4} and g_{26,4} are looked up; and if b₅ = 90 and b₅ is in column 5, h_{90,5} and g_{90,5} are looked up. Then S_2h = h_{66,1} + h_{48,2} + h_{99,3} + h_{26,4} + h_{90,5}; because matrix R follows a binomial distribution, S_2h also follows a binomial distribution. Likewise S_2g = g_{66,1} + g_{48,2} + g_{99,3} + g_{26,4} + g_{90,5}; because matrix G follows a binomial distribution, S_2g also follows a binomial distribution. When at least one of S_2h and S_2g is even, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂; when S_2h and S_2g are both odd, at least part of the data in W_i2[p_i2-169, p_i2] does not satisfy C₂. The probability that at least one of S_2h and S_2g is even is 3/4. In the embodiment shown in FIG. 5, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂. Using the same rule, it is judged in turn whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies C₃, in W_i4[p_i4-169, p_i4] satisfies C₄, in W_i5[p_i5-169, p_i5] satisfies C₅, in W_i6[p_i6-169, p_i6] satisfies C₆, in W_i7[p_i7-169, p_i7] satisfies C₇, in W_i8[p_i8-169, p_i8] satisfies C₈, in W_i9[p_i9-169, p_i9] satisfies C₉, in W_i10[p_i10-169, p_i10] satisfies C₁₀, and in W_i11[p_i11-169, p_i11] satisfies C₁₁.

In the implementation shown in FIG. 5, at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy the predetermined condition C₅, so 11 bytes are skipped from point p_i5 along the split-point search direction, and the current potential split point k_j is obtained at the end of the 11th byte, as shown in FIG. 6. According to the rule preset on the deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. Whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁ is judged in the same way as for window W_i1[p_i1-169, p_i1]. As shown in FIG. 17, W_j1 denotes the window W_j1[p_j1-169, p_j1]; the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted a₁', a₂', a₃', a₄' and a₅' respectively. Because one byte consists of 8 bits, each byte "■" is treated as one value, and every a_s' among a₁', a₂', a₃', a₄' and a₅' satisfies 0 ≤ a_s' ≤ 255. a₁', a₂', a₃', a₄' and a₅' form a 1×5 matrix. The same matrices R and G are used as for judging window W_i1[p_i1-169, p_i1], expressed respectively as:

$$R=\begin{pmatrix} h_{0,1} & h_{0,2} & \cdots & h_{0,5}\\ h_{1,1} & h_{1,2} & \cdots & h_{1,5}\\ \vdots & \vdots & & \vdots\\ h_{255,1} & h_{255,2} & \cdots & h_{255,5}\end{pmatrix},\qquad G=\begin{pmatrix} g_{0,1} & g_{0,2} & \cdots & g_{0,5}\\ g_{1,1} & g_{1,2} & \cdots & g_{1,5}\\ \vdots & \vdots & & \vdots\\ g_{255,1} & g_{255,2} & \cdots & g_{255,5}\end{pmatrix}$$

The values in matrices R and G are looked up according to the value of each a_s' and the column in which it is located. For example, if a₁' = 16 and a₁' is in column 1, the value of h_{16,1} is looked up in R and the value of g_{16,1} in G; if a₂' = 98 and a₂' is in column 2, h_{98,2} and g_{98,2} are looked up; if a₃' = 56 and a₃' is in column 3, h_{56,3} and g_{56,3} are looked up; if a₄' = 36 and a₄' is in column 4, h_{36,4} and g_{36,4} are looked up; and if a₅' = 99 and a₅' is in column 5, h_{99,5} and g_{99,5} are looked up. Then S_1h' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}; because matrix R follows a binomial distribution, S_1h' also follows a binomial distribution. Likewise S_1g' = g_{16,1} + g_{98,2} + g_{56,3} + g_{36,4} + g_{99,5}; because matrix G follows a binomial distribution, S_1g' also follows a binomial distribution. When at least one of S_1h' and S_1g' is even, at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁; when S_1h' and S_1g' are both odd, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy C₁. The probability that at least one of S_1h' and S_1g' is even is 3/4.

Whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[p_i2-169, p_i2]. As shown in FIG. 17, the marked bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in window W_j2[p_j2-169, p_j2] satisfies C₂, and adjacent selected bytes are 42 bytes apart. Each of these bytes is converted into a decimal value, denoted b₁', b₂', b₃', b₄' and b₅' respectively. Because one byte consists of 8 bits, each byte is treated as one value, and every b_s' among b₁', b₂', b₃', b₄' and b₅' satisfies 0 ≤ b_s' ≤ 255. b₁', b₂', b₃', b₄' and b₅' form a 1×5 matrix. The same matrices R and G are used as for judging window W_i2[p_i2-169, p_i2]. For example, if b₁' = 210 and b₁' is in column 1, h_{210,1} is looked up in R and g_{210,1} in G; if b₂' = 156 and b₂' is in column 2, h_{156,2} and g_{156,2} are looked up; if b₃' = 144 and b₃' is in column 3, h_{144,3} and g_{144,3} are looked up; if b₄' = 60 and b₄' is in column 4, h_{60,4} and g_{60,4} are looked up; and if b₅' = 90 and b₅' is in column 5, h_{90,5} and g_{90,5} are looked up. Then S_2h' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5} and S_2g' = g_{210,1} + g_{156,2} + g_{144,3} + g_{60,4} + g_{90,5}. When at least one of S_2h' and S_2g' is even, at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂; when S_2h' and S_2g' are both odd, at least part of the data in W_j2[p_j2-169, p_j2] does not satisfy C₂. The probability that at least one of S_2h' and S_2g' is even is 3/4.
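The two-table variant, in which a window passes when at least one of the two sums is even, can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and the tables are filled with 0/1 coin flips as an assumed stand-in for "random numbers following a binomial distribution". The `estimate_pass_rate` helper is an addition for illustration that checks the stated 3/4 pass probability empirically on random byte values.

```python
import random

def make_table(seed, rows=256, cols=5):
    """Hypothetical helper: a rows x cols table of 0/1 coin flips (an assumption)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

def column_sum(values, table):
    """Sum table[value][column] over the five selected byte values."""
    return sum(table[v][col] for col, v in enumerate(values))

def satisfies_condition(values, table_r, table_g):
    """True when at least one of S_h (from R) and S_g (from G) is even."""
    s_h = column_sum(values, table_r)
    s_g = column_sum(values, table_g)
    return s_h % 2 == 0 or s_g % 2 == 0

def estimate_pass_rate(table_r, table_g, trials=20000, seed=1):
    """Fraction of uniformly random 5-byte samples that satisfy the condition."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = [rng.randrange(256) for _ in range(5)]
        hits += satisfies_condition(values, table_r, table_g)
    return hits / trials
```

With independently generated tables, `estimate_pass_rate(make_table(2), make_table(3))` should come out close to 0.75, in line with the 1 - 1/4 = 3/4 argument in the text. Raising the per-window pass probability from 1/2 to 3/4 in this way makes each window less likely to trigger a skip.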

Taking the implementation shown in FIG. 5 as an example, a method is provided for judging whether at least part of the data in window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. In this embodiment, a random function is used to make the judgment. According to the rule preset on the deduplication server 103, point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁ is judged. As shown in FIG. 16, W_i1 denotes the window W_i1[p_i1-169, p_i1]. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in FIG. 16 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. These five bytes are treated, in order, as 40 bits, denoted a₁, a₂, a₃, a₄, ..., a₄₀. For any a_t among them, V_at = -1 when a_t = 0 and V_at = 1 when a_t = 1; this correspondence yields V_a1, V_a2, V_a3, V_a4, ..., V_a40. Forty random numbers are selected from random numbers following a normal distribution, denoted h₁, h₂, h₃, h₄, ..., h₄₀. Then S_a = V_a1*h₁ + V_a2*h₂ + V_a3*h₃ + V_a4*h₄ + ... + V_a40*h₄₀. Because h₁, h₂, h₃, h₄, ..., h₄₀ follow a normal distribution, S_a also follows a normal distribution. When S_a is positive, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁; when S_a is negative or 0, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy C₁. The probability that S_a is positive is 1/2. In the embodiment shown in FIG. 5, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁.

As shown in FIG. 16, the marked bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in window W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. These five bytes are treated, in order, as 40 bits, denoted b₁, b₂, b₃, b₄, ..., b₄₀. For any b_t among them, V_bt = -1 when b_t = 0 and V_bt = 1 when b_t = 1; this correspondence yields V_b1, V_b2, V_b3, V_b4, ..., V_b40. Because window W_i2[p_i2-169, p_i2] is judged in the same way as window W_i1[p_i1-169, p_i1], the same random numbers h₁, h₂, h₃, h₄, ..., h₄₀ are used, and S_b = V_b1*h₁ + V_b2*h₂ + V_b3*h₃ + V_b4*h₄ + ... + V_b40*h₄₀. Because h₁, h₂, h₃, h₄, ..., h₄₀ follow a normal distribution, S_b also follows a normal distribution. When S_b is positive, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂; when S_b is negative or 0, at least part of the data in W_i2[p_i2-169, p_i2] does not satisfy C₂. The probability that S_b is positive is 1/2. In the embodiment shown in FIG. 5, at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂. Using the same rule, it is judged in turn whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies C₃, in W_i4[p_i4-169, p_i4] satisfies C₄, in W_i5[p_i5-169, p_i5] satisfies C₅, in W_i6[p_i6-169, p_i6] satisfies C₆, in W_i7[p_i7-169, p_i7] satisfies C₇, in W_i8[p_i8-169, p_i8] satisfies C₈, in W_i9[p_i9-169, p_i9] satisfies C₉, in W_i10[p_i10-169, p_i10] satisfies C₁₀, and in W_i11[p_i11-169, p_i11] satisfies C₁₁.

In the implementation shown in FIG. 5, at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy the predetermined condition C₅, so 11 bytes are skipped from point p_i5 along the split-point search direction, and the current potential split point k_j is obtained at the end of the 11th byte, as shown in FIG. 6. According to the rule preset on the deduplication server 103, point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. Whether at least part of the data in window W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁ is judged in the same way as for window W_i1[p_i1-169, p_i1]. As shown in FIG. 17, W_j1 denotes the window W_j1[p_j1-169, p_j1]. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in FIG. 17 each represent one selected byte, and adjacent selected bytes are 42 bytes apart. These five bytes are treated, in order, as 40 bits, denoted a₁', a₂', a₃', a₄', ..., a₄₀'. For any a_t' among them, V_at' = -1 when a_t' = 0 and V_at' = 1 when a_t' = 1; this correspondence yields V_a1', V_a2', V_a3', V_a4', ..., V_a40'. Because window W_j1[p_j1-169, p_j1] is judged in the same way as window W_i1[p_i1-169, p_i1], the same random numbers h₁, h₂, h₃, h₄, ..., h₄₀ are used. Then S_a' = V_a1'*h₁ + V_a2'*h₂ + V_a3'*h₃ + V_a4'*h₄ + ... + V_a40'*h₄₀. Because h₁, h₂, h₃, h₄, ..., h₄₀ follow a normal distribution, S_a' also follows a normal distribution.
When S _a 'is a positive number, then at least part of the data in W _j1 [p _j1 -169,p _j1 ] meets the predetermined condition C ₁ , and when S _a ' is negative or 0, then W _j1 [p _j1 -169,p _j1 ] at least part of the data does not meet the predetermined condition C ₁ , the probability that S _a ' is a positive number is 1/2.

The manner of judging whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂ is the same as the manner of judging whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂. Therefore, as shown in Figure 17, the bytes selected when judging whether at least part of the data in the window W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂ are marked separately and carry sequence numbers 170, 128, 86, 44, and 2; adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44, and 2 are treated in order as 40 bits, denoted b₁', b₂', b₃', b₄' … b₄₀'. For any b_t' among b₁', b₂', b₃', b₄' … b₄₀', V_bt' = -1 when b_t' = 0 and V_bt' = 1 when b_t' = 1; V_b1', V_b2', V_b3', V_b4' … V_b40' are generated according to this correspondence between b_t' and V_bt'. Because the two windows are judged in the same manner, the same random numbers h₁, h₂, h₃, h₄ … h₄₀ are used: S_b' = V_b1'*h₁ + V_b2'*h₂ + V_b3'*h₃ + V_b4'*h₄ + … + V_b40'*h₄₀. Because h₁, h₂, h₃, h₄ … h₄₀ follow a normal distribution, S_b' also follows a normal distribution. When S_b' is positive, at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂; when S_b' is negative or 0, at least part of the data in W_j2[p_j2-169, p_j2] does not satisfy the predetermined condition C₂. The probability that S_b' is positive is 1/2.

Similarly, the manner of judging whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies the predetermined condition C₃ is the same as the manner of judging whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C₄, W_j5[p_j5-169, p_j5] satisfies C₅, W_j6[p_j6-169, p_j6] satisfies C₆, W_j7[p_j7-169, p_j7] satisfies C₇, W_j8[p_j8-169, p_j8] satisfies C₈, W_j9[p_j9-169, p_j9] satisfies C₉, W_j10[p_j10-169, p_j10] satisfies C₁₀, and W_j11[p_j11-169, p_j11] satisfies C₁₁; details are not repeated here.
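
The normal-distribution test described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation on the deduplication server 103. It assumes that the window [p-169, p] holds the 169 bytes with sequence numbers 1 to 169, that bits are read most significant bit first, and that h₁ … h₄₀ come from a seeded generator; the names `window_bits`, `satisfies`, and `H` are hypothetical.

```python
import random

WINDOW_BYTES = 169
# 0-based positions of the bytes with sequence numbers 169, 127, 85, 43 and 1.
SELECTED = (168, 126, 84, 42, 0)

# h_1 .. h_40: drawn once and reused for every window judged in the same manner.
# The seed is an arbitrary illustrative choice.
_rng = random.Random(2014)
H = [_rng.gauss(0.0, 1.0) for _ in range(40)]


def window_bits(data: bytes, p: int) -> list[int]:
    """Return the 40 bits of the 5 selected bytes of the window ending at offset p."""
    if p < WINDOW_BYTES or p > len(data):
        raise ValueError("window [p-169, p] falls outside the data")
    window = data[p - WINDOW_BYTES:p]
    bits = []
    for idx in SELECTED:
        byte = window[idx]
        # Most significant bit first; the bit order is an assumption.
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def satisfies(data: bytes, p: int, h: list[float] = H) -> bool:
    """True when S = sum(V_t * h_t) > 0, with V_t = 1 for a 1 bit and -1 for a 0 bit."""
    s = sum((1.0 if bit else -1.0) * h_t
            for bit, h_t in zip(window_bits(data, p), h))
    return s > 0
```

Because `H` is fixed, two windows with identical selected bytes always receive the same verdict, which is what allows W_i1 and W_j1 to be judged "in the same manner".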

Still taking the embodiment shown in Figure 5 as an example, another method is provided for judging whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. In this embodiment, a random function is used to make this judgment. According to the rule preset on the deduplication server 103, a point p_i1 and its corresponding window W_i1[p_i1-169, p_i1] are determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. As shown in Figure 16, where W_i1 denotes the window W_i1[p_i1-169, p_i1], 5 bytes are selected for this judgment. In Figure 16, each byte "■" with sequence number 169, 127, 85, 43, or 1 is one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43, and 1 are converted into one decimal number in the range 0 to (2^40-1). A uniformly distributed random number generator is used to generate one designated value for each decimal number in 0 to (2^40-1), and the correspondence R between each decimal number in 0 to (2^40-1) and its designated value is recorded. Once designated, the value corresponding to a decimal number does not change, and the designated values follow a uniform distribution. If the designated value is even, at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁; if the designated value is odd, at least part of the data in W_i1[p_i1-169, p_i1] does not satisfy the predetermined condition C₁. C₁ thus means that the designated value obtained by the above method is even. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁ is 1/2. In the embodiment shown in Figure 5, the same rule is used to judge whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂, whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies C₃, whether at least part of the data in W_i4[p_i4-169, p_i4] satisfies C₄, and whether at least part of the data in W_i5[p_i5-169, p_i5] satisfies C₅; details are not repeated here.

When at least part of the data in W_i5[p_i5-169, p_i5] does not satisfy the predetermined condition C₅, 11 bytes are skipped from point p_i5 along the data stream split point search direction, and the current potential split point k_j is obtained at the end position of the 11th byte, as shown in Figure 6. According to the rule preset on the deduplication server 103, a point p_j1 and its corresponding window W_j1[p_j1-169, p_j1] are determined for the potential split point k_j. The manner of judging whether at least part of the data in the window W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁ is the same as the manner of judging whether at least part of the data in the window W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁, so the same correspondence R between each decimal number in 0 to (2^40-1) and its designated value is used. As shown in Figure 17, where W_j1 denotes the window W_j1[p_j1-169, p_j1], 5 bytes are selected to judge whether at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁. In Figure 17, "■" denotes one selected byte, and adjacent selected bytes "■" are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43, and 1 are converted into one decimal number, and the designated value corresponding to that decimal number is looked up in R. If the designated value is even, at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁; if the designated value is odd, at least part of the data in W_j1[p_j1-169, p_j1] does not satisfy the predetermined condition C₁. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in W_j1[p_j1-169, p_j1] satisfies the predetermined condition C₁ is 1/2.

Similarly, the manner of judging whether at least part of the data in W_i2[p_i2-169, p_i2] satisfies the predetermined condition C₂ is the same as the manner of judging whether at least part of the data in W_j2[p_j2-169, p_j2] satisfies the predetermined condition C₂, and the manner of judging whether at least part of the data in W_i3[p_i3-169, p_i3] satisfies the predetermined condition C₃ is the same as the manner of judging whether at least part of the data in W_j3[p_j3-169, p_j3] satisfies the predetermined condition C₃. The same applies to judging whether at least part of the data in W_j4[p_j4-169, p_j4] satisfies C₄, W_j5[p_j5-169, p_j5] satisfies C₅, W_j6[p_j6-169, p_j6] satisfies C₆, W_j7[p_j7-169, p_j7] satisfies C₇, W_j8[p_j8-169, p_j8] satisfies C₈, W_j9[p_j9-169, p_j9] satisfies C₉, W_j10[p_j10-169, p_j10] satisfies C₁₀, and W_j11[p_j11-169, p_j11] satisfies C₁₁; details are not repeated here.
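
The uniform-distribution variant can be sketched in the same way. The text describes a recorded correspondence R covering every number in 0 to (2^40-1); the sketch below fills R lazily on first lookup instead, which is an assumption made only to keep the example small. The 32-bit range of the designated value, the big-endian byte order, and the names `key_from_window`, `designated_value`, and `satisfies_uniform` are likewise hypothetical.

```python
import random

WINDOW_BYTES = 169
# 0-based positions of the bytes with sequence numbers 169, 127, 85, 43 and 1.
SELECTED = (168, 126, 84, 42, 0)

_rng = random.Random(103)      # arbitrary illustrative seed
R: dict[int, int] = {}         # correspondence R: 40-bit number -> designated value


def key_from_window(data: bytes, p: int) -> int:
    """Convert the 5 selected bytes of window [p-169, p] into one number below 2**40."""
    if p < WINDOW_BYTES or p > len(data):
        raise ValueError("window [p-169, p] falls outside the data")
    window = data[p - WINDOW_BYTES:p]
    picked = bytes(window[i] for i in SELECTED)
    return int.from_bytes(picked, "big")   # byte order is an assumption


def designated_value(key: int) -> int:
    """Return the designated value for key; once assigned, it never changes."""
    if key not in R:
        R[key] = _rng.randrange(2 ** 32)   # uniformly distributed
    return R[key]


def satisfies_uniform(data: bytes, p: int) -> bool:
    """The condition holds when the designated value is even."""
    return designated_value(key_from_window(data, p)) % 2 == 0
```

A production system would more likely replace the table with a fixed hash function, since a stored table of 2^40 entries is impractical; that substitution is not something the text above prescribes.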

The deduplication server 103 in the embodiment of the present invention shown in Figure 1 is a device capable of implementing the technical solutions described in the embodiments of the present invention. As shown in Figure 18, it generally includes a central processing unit, a main memory, and an input/output interface. The central processing unit, the main memory, and the input/output interface communicate with one another; the main memory stores executable instructions, and the central processing unit executes the executable instructions stored in the main memory to perform specific functions, such as finding data stream split points as described with reference to Figures 4 to 17 of the embodiments of the present invention. Accordingly, as shown in Figure 19 and in line with the embodiments shown in Figures 4 to 17, a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine M points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers. The deduplication server 103 includes a determination unit 1901 and a judgment processing unit 1902. The determination unit 1901 is configured to perform step a): a) according to the rule, determine for the current potential split point k_i a point p_iz and the window W_iz[p_iz-A_z, p_iz+B_z] corresponding to the point p_iz, where i and z are integers and 1 ≤ z ≤ M. The judgment processing unit 1902 is configured to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

When at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, N minimum search units U of the data stream split point are skipped from the point p_iz along the data stream split point search direction, where N*U is not greater than ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖), to obtain a new potential split point, and the determination unit performs step a) for the new potential split point. When at least part of the data in each window W_ix[p_ix-A_x, p_ix+B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.
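
The interaction between step a), the window judgment, and the skip of N*U can be sketched as a loop. This is a minimal sketch under stated assumptions, not the server's implementation: the rule is modelled as a list of hypothetical `WindowRule` entries in which `offset` stands for k - p_x, the search runs left to right, and the caller supplies the skip length, which is only checked against the bound ‖B_z‖ + max_x(‖A_x‖ + ‖(k_i-p_ix)‖).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class WindowRule:
    offset: int                     # k - p_x: distance from k back to point p_x
    a: int                          # A_x
    b: int                          # B_x
    cond: Callable[[bytes], bool]   # predetermined condition C_x


def max_skip(rules: Sequence[WindowRule], z: int) -> int:
    """Upper bound on N*U after window z fails."""
    return abs(rules[z].b) + max(abs(r.a) + abs(r.offset) for r in rules)


def find_split_point(data: bytes, start: int, rules: Sequence[WindowRule],
                     skip: int, unit: int = 1) -> Optional[int]:
    """Return the first accepted split point at or after start, or None."""
    k = start
    while k <= len(data):
        failed = None
        for z, rule in enumerate(rules):
            p = k - rule.offset                 # step a): point p_z for candidate k
            lo, hi = p - rule.a, p + rule.b     # window W_z[p_z - A_z, p_z + B_z]
            if lo < 0:
                raise ValueError("start is too small for the configured windows")
            if hi > len(data):
                return None
            if not rule.cond(data[lo:hi]):
                failed = z
                break
        if failed is None:
            return k                            # all M windows satisfy their conditions
        if skip * unit > max_skip(rules, failed):
            raise ValueError("N*U exceeds the allowed bound")
        next_k = (k - rules[failed].offset) + skip * unit   # jump from p_z
        if next_k <= k:
            raise ValueError("skip is too small to make progress")
        k = next_k
    return None
```

With three one-byte windows at offsets 2, 1, and 0 that each require a byte equal to 1, and a skip of 3, the loop jumps past candidates that would reuse a byte already known to fail.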

Further, the rule also includes: at least two points p_e and p_f satisfy A_e = A_f, B_e = B_f, and C_e = C_f. Further, the rule also includes: relative to the potential split point k, the at least two points p_e and p_f lie in the direction opposite to the data stream split point search direction.

Further, the rule also includes: the distance between the at least two points p_e and p_f is 1 U.

Further, the judgment processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. Specifically, the judgment processing unit 1902 is configured to use a hash function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z. Specifically, using a random function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z includes:

F bytes are selected in the window W_iz[p_iz-A_z, p_iz+B_z], and the F bytes are reused H times to obtain F*H bytes in total. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, which are the 1st to 8th bits of the m-th of the F*H bytes. The bits corresponding to the F*H bytes can therefore be expressed as a matrix with F*H rows and 8 columns whose m-th row is a_m,1, a_m,2 … a_m,8. When a_m,n = 1, V_am,n = 1, and when a_m,n = 0, V_am,n = -1, where a_m,n is any one of a_m,1 … a_m,8. Converting the bits of the F*H bytes according to this relationship between a_m,n and V_am,n yields the matrix V_a, whose m-th row is V_am,1, V_am,2 … V_am,8. F*H*8 random numbers are selected from random numbers that follow a normal distribution to form a matrix R, whose m-th row is h_m,1, h_m,2 … h_m,8. The entries of the m-th row of V_a are multiplied by the corresponding random numbers in the m-th row of R and the products are summed to obtain one value, S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. S_a1, S_a2 … S_aF*H are obtained in the same way, and the number K of values among S_a1, S_a2 … S_aF*H that are greater than 0 is counted. When K is even, at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.
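
The row-wise variant above can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the patented implementation: "reusing the F bytes H times" is modelled as simple repetition of the byte sequence, bits are read most significant bit first, and `make_r` and `satisfies_matrix` are hypothetical names.

```python
import random


def make_r(f: int, h: int, seed: int = 0) -> list[list[float]]:
    """Build the matrix R: F*H rows of 8 normally distributed random numbers."""
    rng = random.Random(seed)       # seed is an arbitrary illustrative choice
    return [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(f * h)]


def satisfies_matrix(selected: bytes, h: int, r: list[list[float]]) -> bool:
    """Count the rows with S_am > 0; the condition holds when that count K is even."""
    rows = list(selected) * h       # F bytes reused H times (assumed: repetition)
    if len(r) != len(rows):
        raise ValueError("R must have F*H rows")
    k = 0
    for byte, h_row in zip(rows, r):
        # Row m of V_a: +1 for a 1 bit, -1 for a 0 bit (MSB first, an assumption).
        v_row = [1.0 if (byte >> (7 - n)) & 1 else -1.0 for n in range(8)]
        s_am = sum(v * x for v, x in zip(v_row, h_row))
        if s_am > 0:
            k += 1
    return k % 2 == 0
```

The matrix R must be generated once and kept fixed, for the same reason that h₁ … h₄₀ are shared between windows judged in the same manner.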

Further, the judgment processing unit 1902 is configured to, when at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, skip N minimum search units U of the data stream split point from the point p_iz along the data stream split point search direction to obtain the new potential split point, and the determination unit 1901 performs step a) for the new potential split point. According to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to the point p_ic determined for the new potential split point either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of the window W_iz[p_iz-A_z, p_iz+B_z]. Here, the window W_ic[p_ic-A_c, p_ic+B_c] is the window of the point that ranks first, in the data stream search direction, among the M points determined for the new potential split point according to the rule.

In the server-based method for finding data stream split points provided by the embodiments of the present invention shown in Figures 4 to 17, a point p_ix and its window W_ix[p_ix-A_x, p_ix+B_x] are determined for the potential split point k_i, where x takes consecutive natural numbers from 1 to M and M ≥ 2. Whether at least part of the data in each of the M windows satisfies the predetermined condition C_x may be judged in parallel, or the windows may be judged one after another. Alternatively, W_i2[p_i2-A₂, p_i2+B₂] may be judged against the predetermined condition C₂ only once at least part of the data in W_i1[p_i1-A₁, p_i1+B₁] has been judged to satisfy the predetermined condition C₁, and so on, until at least part of the data in W_iM[p_iM-A_M, p_iM+B_M] is judged to satisfy the predetermined condition C_M. The judgment of the other windows in the embodiments is the same and is not repeated here.

In addition, according to the embodiments of the present invention shown in Figures 4 to 17, in practical applications a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine M points p_x, the window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes consecutive natural numbers from 1 to M and M ≥ 2. In this preset rule, A₁, A₂, A₃ … A_M need not all be equal, B₁, B₂, B₃ … B_M need not all be equal, and C₁, C₂, C₃ … C_M need not all be the same. In the embodiment shown in Figure 5, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10], and W_i11[p_i11-169, p_i11] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in each window satisfies its predetermined condition is also the same; see the description above of judging whether at least part of the data in W_i1[p_i1-169, p_i1] satisfies the predetermined condition C₁. In the embodiment shown in Figure 11, however, the windows W_i1[p_i1-169, p_i1], W_i2[p_i2-169, p_i2], W_i3[p_i3-169, p_i3], W_i4[p_i4-169, p_i4], W_i5[p_i5-169, p_i5], W_i6[p_i6-169, p_i6], W_i7[p_i7-169, p_i7], W_i8[p_i8-169, p_i8], W_i9[p_i9-169, p_i9], W_i10[p_i10-169, p_i10], and W_i11[p_i11-182, p_i11] may differ in size, and the manner of judging whether at least part of the data in a window satisfies its predetermined condition may also differ between windows.

In all embodiments, according to the rule preset on the deduplication server 103, the manner of judging whether at least part of the data in the window W_i1 satisfies the predetermined condition C₁ is necessarily the same as the manner of judging whether at least part of the data in the window W_j1 satisfies the predetermined condition C₁; the manner of judging whether at least part of the data in W_i2 satisfies the predetermined condition C₂ is necessarily the same as the manner of judging whether at least part of the data in W_j2 satisfies the predetermined condition C₂; and so on, up to the manner of judging whether at least part of the data in the window W_iM satisfies the predetermined condition C_M, which is necessarily the same as the manner of judging whether at least part of the data in the window W_jM satisfies the predetermined condition C_M. Details are not repeated here. Although the embodiments shown in Figures 4 to 17 all take M = 11 as an example, the value of M is not limited to 11; a person skilled in the art can determine the value of M according to actual needs and the description in the embodiments of the present invention.

According to the embodiments of the present invention shown in Figures 4 to 17, a rule is preset on the deduplication server 103. k_a, k_i, k_j, k_l, and k_m are potential split points obtained while searching for split points along the data stream split point search direction, and k_a, k_i, k_j, k_l, and k_m all follow this rule. The window W_x[p_x-A_x, p_x+B_x] in the embodiments of the present invention denotes a specific range, within which data is selected in order to judge whether that data satisfies the predetermined condition C_x. Specifically, either part of the data or all of the data within the range may be selected for this judgment. For the window concept used in the embodiments of the present invention, refer to the window W_x[p_x-A_x, p_x+B_x]; details are not repeated here.

According to the embodiments of the present invention shown in Figures 4 to 17, in the window W_x[p_x-A_x, p_x+B_x], (p_x-A_x) and (p_x+B_x) denote the two boundaries of the window: (p_x-A_x) is the boundary that lies, relative to the point p_x, in the direction opposite to the data stream split point search direction, and (p_x+B_x) is the boundary that lies, relative to the point p_x, in the data stream split point search direction. Specifically, in the embodiments of the present invention, the data stream split point search direction shown in Figures 3 to 15 is from left to right, so (p_x-A_x) is the boundary opposite to the search direction (that is, the left boundary) and (p_x+B_x) is the boundary in the search direction (that is, the right boundary). If the data stream split point search direction in Figures 3 to 15 were from right to left, (p_x-A_x) would still be the boundary opposite to the search direction (that is, the right boundary), and (p_x+B_x) would be the boundary in the search direction (that is, the left boundary).

A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments of the present invention may be combined with other techniques and presented in more complex forms that still contain the key features of the present invention. In a real environment, a backup split point may be used. For example, in one implementation, according to the rule preset on the deduplication server 103, 11 points p_x are determined for a potential split point k_i, where x is a consecutive natural number from 1 to 11, together with the window W_x[p_x-A_x, p_x+B_x] corresponding to each p_x and the predetermined condition C_x corresponding to each window. When at least part of the data in each of the 11 windows W_x[p_x-A_x, p_x+B_x] satisfies the predetermined condition C_x, the potential split point k_i is a data stream split point. If no split point has been found when the set maximum data block is exceeded, a backup preset rule may be used. The backup preset rule is similar to the rule preset on the deduplication server 103. For example, 10 points p_x are determined for the potential split point k_i, where x is a consecutive natural number from 1 to 10, together with the window W_x[p_x-A_x, p_x+B_x] corresponding to each p_x and the predetermined condition C_x corresponding to each window. When at least part of the data in each of the 10 windows W_x[p_x-A_x, p_x+B_x] satisfies the predetermined condition C_x, the potential split point k_i is a data stream split point. If a data stream split point still has not been found when the set maximum data block is exceeded, the end position of the maximum data block is used as a forced split point.
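The fallback order described above (primary rule, then backup rule, then forced split) can be sketched as follows. This is only an illustration under stated assumptions: the function name `choose_split` and its parameters are hypothetical, and the two `*_hit` arguments are assumed to be the first split points found by the 11-window rule and the 10-window backup rule, or `None` if the respective rule found nothing.

```python
from typing import Optional


def choose_split(
    chunk_start: int,
    max_chunk: int,
    primary_hit: Optional[int],
    backup_hit: Optional[int],
) -> int:
    """Pick the split point for the data block that begins at chunk_start.

    primary_hit / backup_hit are hypothetical inputs: the first split point
    found by the primary rule and by the backup rule, or None if none was found.
    """
    limit = chunk_start + max_chunk  # end position of the maximum data block
    for hit in (primary_hit, backup_hit):
        # A hit only counts if it lies inside the maximum data block.
        if hit is not None and chunk_start < hit <= limit:
            return hit
    return limit  # forced split point


print(choose_split(0, 12 * 1024, None, 4500))  # backup rule decides
print(choose_split(0, 12 * 1024, None, None))  # forced split at 12288
```

Keeping the forced split as the final branch guarantees that a data block never exceeds the configured maximum length, whichever rule is in use.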

A rule is preset on the deduplication server 103, in which M points are determined for a potential split point k. A potential split point k need not exist beforehand; the potential split point k may instead be determined from the M points.

An embodiment of the present invention provides a method for searching for a data stream split point based on a deduplication server. As shown in FIG. 20, the method includes:

A rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to M, M≥2, and A_x and B_x are integers. In the implementation shown in FIG. 3, regarding the value of M, in one implementation M*U is not greater than the preset maximum distance between two adjacent data stream split points, that is, the preset maximum data block length. Judge whether at least part of the data in the window W_z[k-A_z, k+B_z] satisfies the predetermined condition C_z, where z is an integer, 1≤z≤M, and (k-A_z) and (k+B_z) denote the two boundaries of the window W_z. When at least part of the data in any window W_z[k-A_z, k+B_z] does not satisfy the predetermined condition C_z, jump N bytes from the potential split point k along the data stream split point search direction, where N≤‖B_z‖+max_x(‖A_x‖). Here ‖B_z‖ denotes the absolute value of B_z in W_z[k-A_z, k+B_z], and max_x(‖A_x‖) denotes the maximum of the absolute values of A_x over the M windows. The principle behind the value of N is described in the embodiments below. When at least part of the data in each of the M windows W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x, the potential split point k is a data stream split point.

Specifically, for the current potential split point k_i, the following steps are performed according to the rule:

Step 2001: According to the rule, determine the corresponding window W_iz[k_i-A_z, k_i+B_z] for the current potential split point k_i, where i and z are integers and 1≤z≤M;

Step 2002: Judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z;

When at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jump N minimum search units U of the data stream split point from the current potential split point k_i along the data stream split point search direction, where N*U is not greater than ‖B_z‖+max_x(‖A_x‖), to obtain a new potential split point, and perform step 2001;

When at least part of the data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential split point k_i satisfies the predetermined condition C_x, the current potential split point k_i is a data stream split point.
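Steps 2001 and 2002 and the jump can be sketched as a loop. This is a minimal sketch under stated assumptions, not the patented implementation: the names `find_split_point`, `satisfies` and `jump` are hypothetical, a window W_z[k-A_z, k+B_z] is taken to cover the bytes `data[k - A_z : k + B_z]`, and the caller supplies both the condition test and the jump length N for a failed window. The sketch only checks that N*U stays within the stated bound.

```python
from typing import Callable, List, Optional, Tuple

Window = Tuple[int, int]  # (A_x, B_x): window W_x covers data[k - A_x : k + B_x]


def find_split_point(
    data: bytes,
    k: int,
    windows: List[Window],
    satisfies: Callable[[bytes, int], bool],
    jump: Callable[[int], int],
    unit: int = 1,
) -> Optional[int]:
    """Return the first split point at or after k, or None if none is found."""
    max_abs_a = max(abs(a) for a, _ in windows)
    if k - max(a for a, _ in windows) < 0:
        raise ValueError("k is too small for the widest window")
    max_b = max(b for _, b in windows)
    while k + max_b <= len(data):
        failed_z = None
        for z, (a, b) in enumerate(windows, start=1):  # step 2001
            if not satisfies(data[k - a : k + b], z):  # step 2002
                failed_z = z
                break
        if failed_z is None:
            return k  # every window satisfied its condition
        n = jump(failed_z)
        b_z = windows[failed_z - 1][1]
        if n < 1 or n * unit > abs(b_z) + max_abs_a:
            raise ValueError("jump violates N*U <= |B_z| + max|A_x|")
        k += n * unit  # new potential split point; back to step 2001
    return None


# Toy usage: 11 windows (A_x = 168 + x, B_x = 1 - x) and a condition that only
# looks at the last byte of each window (purely illustrative).
windows = [(169 + x, -x) for x in range(11)]
data = bytearray(400)
data[250:261] = b"\x01" * 11
hit = find_split_point(bytes(data), 179, windows,
                       lambda w, z: w[-1] != 0, lambda z: 11 - z + 1)
print(hit)  # 261
```

In the toy usage the jump `11 - z + 1` is an assumption chosen so that the failed window is never re-evaluated for a later potential split point; other jump lengths within the bound are possible.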

Further, judging whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically includes: using a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Still further, using a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z is specifically: using a hash function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.
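One way such a hash-based judgment could look is sketched below. The passage does not say which hash function is used, so `zlib.crc32` is only a stand-in, and the function name and the `numerator`/`denominator` parameters are hypothetical. Assuming the hash values are roughly uniform, the condition holds with a probability of about numerator/denominator.

```python
import zlib


def satisfies_condition(window: bytes, numerator: int = 1, denominator: int = 2) -> bool:
    """Hash-based stand-in for a predetermined condition C_z.

    zlib.crc32 is an assumption made for illustration; the text only
    requires some hash (random) function.
    """
    if denominator <= 0 or not 0 <= numerator <= denominator:
        raise ValueError("need 0 <= numerator <= denominator and denominator > 0")
    return zlib.crc32(window) % denominator < numerator


# Condition met with probability of roughly 1/2, then roughly 3/4:
print(satisfies_condition(b"example window bytes"))
print(satisfies_condition(b"example window bytes", 3, 4))
```

The same function covers "part of the data" and "all of the data": the caller decides which bytes of the window to pass in.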

When at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, N minimum search units U of the data stream split point are jumped from the current potential split point k_i along the data stream split point search direction to obtain the new potential split point. According to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z], or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here, the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential split point according to the rule.

In the embodiments of the present invention, a data stream split point is searched for by judging whether at least part of the data in a window among the M windows satisfies the predetermined condition. When at least part of the data in a window does not satisfy the predetermined condition, a length of N*U is skipped, where N*U is not greater than ‖B_z‖+max_x(‖A_x‖), to obtain the next potential split point, which improves the efficiency of searching for data stream split points.

In the deduplication process, to keep data block sizes uniform, the average data block size (also called the average chunk size) is taken into account. That is, in addition to the limits on the minimum and maximum data block sizes, an average data block size is determined so that the resulting data blocks are uniform in size. Two factors, the number M of windows W_x[k-A_x, k+B_x] and the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition, determine how likely a data stream split point is to be found (characterized by P(n), defined below). The former affects the length of a jump, the latter affects the probability of a jump, and together they determine the average chunk size.

In general, for a fixed average chunk size, as the number of windows W_x[k-A_x, k+B_x] increases, the probability that at least part of the data in a single window W_x[k-A_x, k+B_x] satisfies the predetermined condition also increases. For example, one rule preset on the deduplication server 103 is: for a potential split point k, determine 11 windows W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 11, and the probability that at least part of the data in any one of the 11 windows satisfies the preset condition is 1/2. Another rule preset on the deduplication server 103 is: for a potential split point k, determine 24 windows W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 24, and the probability that at least part of the data in any one of the 24 windows satisfies the preset condition is 3/4. For how this probability is set, see the description of judging whether at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the preset condition.

The number M of windows and the probability that at least part of the data in a window satisfies the preset condition together determine P(n), where P(n) denotes the probability that no data stream split point has been found after n minimum search units of the data stream split point have been searched from the start of the data stream or from the previous data stream split point. The calculation of P(n) from these two factors is in fact a multi-step Fibonacci sequence, which is described in detail later. Once P(n) is obtained, 1-P(n) is the distribution function of the data stream split point, and (1-P(n))-(1-P(n-1))＝P(n-1)-P(n) is the probability of finding a data stream split point at the n-th minimum search unit, that is, the density function of the data stream split point. Integrating over this density function gives the expected length between data stream split points, that is, the average chunk size, where 4*1024 (bytes) denotes the minimum data block length and 12*1024 (bytes) denotes the maximum data block length.
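The step from the density P(n-1)-P(n) to the average chunk size can be illustrated with a discrete sum. The sketch below rests on assumptions that are not spelled out in this passage: lengths are counted in minimum search units, P(n_min - 1) equals 1, and whatever probability mass remains at the maximum length is assigned to a forced split there. The function name and the toy values of P are hypothetical.

```python
from typing import Callable


def expected_chunk_length(
    no_split_prob: Callable[[int], float],
    n_min: int,
    n_max: int,
) -> float:
    """Mean chunk length from P(n) = probability of no split point within n units."""
    total = 0.0
    for n in range(n_min, n_max):
        # Density: probability that the split point is found exactly at unit n.
        total += n * (no_split_prob(n - 1) - no_split_prob(n))
    # Assumption: chunks that reach n_max without a split point are cut there.
    total += n_max * no_split_prob(n_max - 1)
    return total


# Toy check with made-up values P(1) = 1, P(2) = 0.5, P(3) = 0.25:
toy = {1: 1.0, 2: 0.5, 3: 0.25}
print(expected_chunk_length(toy.__getitem__, 2, 4))  # 2*0.5 + 3*0.25 + 4*0.25 = 2.75
```

With the values in the text, `n_min` would be 4*1024 and `n_max` 12*1024, and `no_split_prob` would be the P(n) derived from M and the per-window probability.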

On the basis of the data stream split point search shown in FIG. 3, in the implementation shown in FIG. 21, a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x is a consecutive natural number from 1 to 11 and A_x and B_x are integers. Here A₁＝169, B₁＝0; A₂＝170, B₂＝-1; A₃＝171, B₃＝-2; A₄＝172, B₄＝-3; A₅＝173, B₅＝-4; A₆＝174, B₆＝-5; A₇＝175, B₇＝-6; A₈＝176, B₈＝-7; A₉＝177, B₉＝-8; A₁₀＝178, B₁₀＝-9; A₁₁＝179, B₁₁＝-10, and C₁＝C₂＝C₃＝C₄＝C₅＝C₆＝C₇＝C₈＝C₉＝C₁₀＝C₁₁. The 11 windows are therefore W₁[k-169,k], W₂[k-170,k-1], W₃[k-171,k-2], W₄[k-172,k-3], W₅[k-173,k-4], W₆[k-174,k-5], W₇[k-175,k-6], W₈[k-176,k-7], W₉[k-177,k-8], W₁₀[k-178,k-9] and W₁₁[k-179,k-10].

k_a is a data stream split point, and the data stream split point search direction shown in FIG. 21 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of the 4 KB minimum data block is taken as the next potential split point k_i. According to the rule preset for the deduplication server 103, windows W_ix[k_i-A_x, k_i+B_x] are determined for the potential split point k_i; in this embodiment, x is a consecutive natural number from 1 to 11. In the implementation shown in FIG. 21, 11 windows are determined for the potential split point k_i: W_i1[k_i-169,k_i], W_i2[k_i-170,k_i-1], W_i3[k_i-171,k_i-2], W_i4[k_i-172,k_i-3], W_i5[k_i-173,k_i-4], W_i6[k_i-174,k_i-5], W_i7[k_i-175,k_i-6], W_i8[k_i-176,k_i-7], W_i9[k_i-177,k_i-8], W_i10[k_i-178,k_i-9] and W_i11[k_i-179,k_i-10]. For each x from 1 to 11, it is judged whether at least part of the data in the window W_ix satisfies the corresponding predetermined condition C_x, that is, whether W_i1[k_i-169,k_i] satisfies C₁, whether W_i2[k_i-170,k_i-1] satisfies C₂, and so on through W_i11[k_i-179,k_i-10] and C₁₁. When at least part of the data in each of the windows W_i1 to W_i11 satisfies the corresponding predetermined condition C₁ to C₁₁, the current potential split point k_i is a data stream split point.

When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example W_i5[k_i-173,k_i-4] as shown in FIG. 22, N bytes are jumped from the potential split point k_i along the data stream split point search direction, where N is not greater than ‖B₅‖+max_x(‖A_x‖). In the implementation shown in FIG. 22, N is therefore not greater than 183 bytes; in this embodiment N＝7, which yields a new potential split point. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103 in the implementation shown in FIG. 21, windows W_jx[k_j-A_x, k_j+B_x] are determined for the potential split point k_j; in this embodiment, x is a consecutive natural number from 1 to 11. The 11 windows determined for the potential split point k_j are W_j1[k_j-169,k_j], W_j2[k_j-170,k_j-1], W_j3[k_j-171,k_j-2], W_j4[k_j-172,k_j-3], W_j5[k_j-173,k_j-4], W_j6[k_j-174,k_j-5], W_j7[k_j-175,k_j-6], W_j8[k_j-176,k_j-7], W_j9[k_j-177,k_j-8], W_j10[k_j-178,k_j-9] and W_j11[k_j-179,k_j-10].

As shown in FIG. 22, consider the 11th window W_j11[k_j-179,k_j-10] determined for the potential split point k_j. To ensure that the entire range between the potential split point k_i and the potential split point k_j falls within the judged range, this implementation must ensure that the left boundary of the window W_j11[k_j-179,k_j-10] coincides with the right boundary (k_i-4) of the window W_i5[k_i-173,k_i-4], or lies within the range of the window W_i5[k_i-173,k_i-4]. The window W_j11[k_j-179,k_j-10] is the window ranked first in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the potential split point k_j according to the rule. Under this constraint, when at least part of the data in the window W_i5[k_i-173,k_i-4] does not satisfy the predetermined condition C₅, the distance jumped from the potential split point k_i along the data stream split point search direction is not greater than ‖B₅‖+max_x(‖A_x‖).

For each x from 1 to 11, it is then judged whether at least part of the data in the window W_jx satisfies the corresponding predetermined condition C_x, that is, whether W_j1[k_j-169,k_j] satisfies C₁, whether W_j2[k_j-170,k_j-1] satisfies C₂, and so on through W_j11[k_j-179,k_j-10] and C₁₁. When at least part of the data in each of the windows W_j1 to W_j11 satisfies the corresponding predetermined condition C₁ to C₁₁, the current potential split point k_j is a data stream split point, and the data between k_j and k_a constitutes one data block. The minimum chunk size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that potential split point is a data stream split point is judged according to the rule preset on the deduplication server 103. When the potential split point k_j is judged not to be a data stream split point, the next potential split point is obtained in the same way as for k_i, and whether it is a data stream split point is judged according to the rule preset on the deduplication server 103 and the method above. When no data stream split point has been found after the set maximum data block is exceeded, the end position of the maximum data block is used as a forced split point.

In the implementation shown in FIG. 21, according to the rule preset on the deduplication server 103, judgment starts with whether at least part of the data in W_i1[k_i-169,k_i] satisfies the predetermined condition C₁. Suppose at least part of the data in W_i1[k_i-169,k_i], W_i2[k_i-170,k_i-1], W_i3[k_i-171,k_i-2] and W_i4[k_i-172,k_i-3] is judged to satisfy the predetermined conditions C₁, C₂, C₃ and C₄ respectively, and at least part of the data in W_i5[k_i-173,k_i-4] is judged not to satisfy the predetermined condition C₅. Consider jumping 6 bytes from the potential split point k_i along the data stream split point search direction, so that a new potential split point is obtained at the end position of the 6th byte; to distinguish it from other potential split points, it is denoted k_g. According to the rule preset on the deduplication server 103, 11 windows are determined for the potential split point k_g: W_g1[k_g-169,k_g], W_g2[k_g-170,k_g-1], W_g3[k_g-171,k_g-2], W_g4[k_g-172,k_g-3], W_g5[k_g-173,k_g-4], W_g6[k_g-174,k_g-5], W_g7[k_g-175,k_g-6], W_g8[k_g-176,k_g-7], W_g9[k_g-177,k_g-8], W_g10[k_g-178,k_g-9] and W_g11[k_g-179,k_g-10]. For each x from 1 to 11, it is judged whether at least part of the data in the window W_gx satisfies the corresponding predetermined condition C_x, that is, whether W_g1[k_g-169,k_g] satisfies C₁, whether W_g2[k_g-170,k_g-1] satisfies C₂, and so on through W_g11[k_g-179,k_g-10] and C₁₁.

The window W_g11[k_g-179,k_g-10] coincides with the window W_i5[k_i-173,k_i-4], and C₅＝C₁₁. Therefore, when at least part of the data in W_i5[k_i-173,k_i-4] is judged not to satisfy the predetermined condition C₅, the potential split point k_g obtained by jumping 6 bytes from the potential split point k_i along the data stream split point search direction still does not meet the condition for being a data stream split point. Jumping 6 bytes from the potential split point k_i therefore results in repeated computation, whereas jumping 7 bytes reduces repeated computation and is more efficient, which increases the speed of searching for data stream split points. When, under the preset rule, the probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x is 1/2, a jump is performed with probability 1/2, and each jump can cover at most ‖B₁₁‖+‖A₁₁‖＝189 bytes.
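The arithmetic behind the 6-byte and 7-byte jumps can be checked directly from the window parameters A₁ to A₁₁ and B₁ to B₁₁. The sketch below is illustrative only: the helper names `window` and `jump_bound` are hypothetical, and the concrete value of k_i is arbitrary.

```python
from typing import Tuple

A = [169 + x for x in range(11)]  # A_1..A_11 = 169..179
B = [-x for x in range(11)]       # B_1..B_11 = 0..-10


def window(k: int, x: int) -> Tuple[int, int]:
    """Boundaries (k - A_x, k + B_x) of window W_x for potential split point k (x is 1-based)."""
    return (k - A[x - 1], k + B[x - 1])


def jump_bound(z: int) -> int:
    """Upper bound |B_z| + max_x |A_x| on the jump after window z fails."""
    return abs(B[z - 1]) + max(abs(a) for a in A)


k_i = 4096  # arbitrary position chosen for the illustration
print(jump_bound(5))   # 183
print(jump_bound(11))  # 189

# Jumping 6 bytes puts W_g11 exactly on the failed window W_i5 ...
print(window(k_i + 6, 11) == window(k_i, 5))  # True

# ... while jumping 7 bytes keeps the left boundary of W_j11 inside W_i5.
left, right = window(k_i, 5)
print(left <= window(k_i + 7, 11)[0] <= right)  # True
```

Because C₅＝C₁₁, re-evaluating the identical byte range after a 6-byte jump must fail again, which is why 7 is the smallest jump that does useful work in this configuration.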

In this embodiment, the predetermined rule is: determine, for a potential split point k, 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x that at least part of the data in each window W_x[k-A_x, k+B_x] must satisfy, where the probability that at least part of the data in W_x[k-A_x, k+B_x] satisfies C_x is 1/2, x takes the consecutive natural numbers 1 to 11, and A_x and B_x are integers. Here A₁=169, B₁=0; A₂=170, B₂=-1; A₃=171, B₃=-2; A₄=172, B₄=-3; A₅=173, B₅=-4; A₆=174, B₆=-5; A₇=175, B₇=-6; A₈=176, B₈=-7; A₉=177, B₉=-8; A₁₀=178, B₁₀=-9; A₁₁=179, B₁₁=-10, and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀=C₁₁. That is, 11 windows are selected for the potential split point k, and they are 11 consecutive windows; P(n) can be calculated from these two factors. How the 11 windows are selected, and how it is judged whether at least part of the data in each of them satisfies the predetermined condition C_x, follow the rule preset on the deduplication server 103. Whether at least part of the data in each of 11 consecutive windows satisfies C_x therefore determines whether the potential split point k is a data stream split point.

We call the gap between two bytes a point. P(n) denotes the probability that, among n consecutive windows, there are no 11 consecutive windows that all satisfy the condition, that is, the probability that no data stream split point exists. After the minimum data block size of 4 KB is skipped from the file header or the previous split point, the search goes back 10 bytes against the data stream split point search direction and reaches the 4086th point. No data stream split point exists at this point, so P(4086)=1; likewise P(4087)=1, …, P(4095)=1. At the 4096th point, that is, at the minimum data block size, at least part of the data in each of the 11 windows satisfies the predetermined condition C_x with probability (1/2)^11. A data stream split point therefore exists with probability (1/2)^11 and does not exist with probability 1-(1/2)^11, so P(4096)=1-(1/2)^11.

As shown in Figure 35, at the nth window, P(n) can be derived recursively by considering 12 cases.

Case 1: With probability 1/2, at least part of the data in the nth window does not satisfy the predetermined condition. In this situation, with probability P(n-1), the n-1 windows before the nth window contain no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition. P(n) therefore includes the term 1/2*P(n-1). The situation in which the nth window does not satisfy the predetermined condition but the n-1 windows before it do contain 11 such consecutive windows does not contribute to P(n).

Case 2: With probability 1/2, at least part of the data in the nth window satisfies the predetermined condition, and with probability 1/2, at least part of the data in the (n-1)th window does not. In this situation, with probability P(n-2), the n-2 windows before the (n-1)th window contain no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition. P(n) therefore includes the term 1/2*1/2*P(n-2). The situation in which the nth window satisfies the predetermined condition and the (n-1)th window does not, but the n-2 windows before the (n-1)th window do contain 11 such consecutive windows, does not contribute to P(n).

Following the same pattern, Case 11: With probability (1/2)^10, at least part of the data in each of the nth to (n-9)th windows satisfies the predetermined condition, and with probability 1/2, at least part of the data in the (n-10)th window does not. In this situation, with probability P(n-11), the n-11 windows before the (n-10)th window contain no 11 consecutive windows in each of which at least part of the data satisfies the predetermined condition. P(n) therefore includes the term (1/2)^10*1/2*P(n-11). The situation in which the nth to (n-9)th windows all satisfy the predetermined condition and the (n-10)th window does not, but the n-11 windows before the (n-10)th window do contain 11 such consecutive windows, does not contribute to P(n).

Case 12: With probability (1/2)^11, at least part of the data in each of the nth to (n-10)th windows satisfies the predetermined condition. This case does not contribute to P(n).

Therefore, P(n)=1/2*P(n-1)+(1/2)^2*P(n-2)+…+(1/2)^11*P(n-11).

Another preset rule is: determine, for a potential split point k, 24 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers 1 to 24, A₁=169, B₁=0; A₂=170, B₂=-1; A₃=171, B₃=-2; A₄=172, B₄=-3; A₅=173, B₅=-4; A₆=174, B₆=-5; A₇=175, B₇=-6; A₈=176, B₈=-7; A₉=177, B₉=-8; A₁₀=178, B₁₀=-9; A₁₁=179, B₁₁=-10; …; A₂₄=192, B₂₄=-23, and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=…=C₂₄. The probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x is 3/4, and P(n) can be calculated from these two factors.

Therefore, whether at least part of the data in each of 24 consecutive windows satisfies the predetermined condition C_x determines whether the potential split point k is a data stream split point, and P(n) can be calculated by the following formula:

P(1)=1, P(2)=…=P(23)=1, P(24)=1-(3/4)^24.
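The general term of the recurrence for the 24-window rule is not reproduced in the text. By analogy with Cases 1 to 12 of the 11-window rule, a plausible reconstruction (an assumption, not taken from the source) replaces the per-window probabilities 1/2 and 1/2 with 3/4 for "satisfies" and 1/4 for "does not satisfy":

```latex
P(n) \;=\; \sum_{i=1}^{24} \left(\tfrac{3}{4}\right)^{i-1} \cdot \tfrac{1}{4} \cdot P(n-i),
\qquad n \ge 24, \qquad P(m) = 1 \ \text{for}\ 0 \le m \le 23 .
```

Setting n=24 gives P(24) = (1/4) * sum over i=1..24 of (3/4)^(i-1) = 1-(3/4)^24, which agrees with the initial value stated above.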

After calculation, P(5*1024)=0.78, P(11*1024)=0.17, and P(12*1024)=0.13. That is, after 12 KB has been searched from the start of the data stream or the previous data stream split point, there is a 13% probability that no data stream split point has been found, in which case a split is forced. From this probability the density function of the data stream split point is obtained, and integration shows that a split point is found on average about 7.6 KB from the start of the data stream or the previous split point, so the average data block length is about 7.6 KB. By contrast with requiring at least part of the data in each of 11 consecutive windows to satisfy the predetermined condition with probability 1/2, the traditional CDC algorithm achieves an average data block length of 7.6 KB only when a single window satisfies its condition with probability 1/2^12.
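The quoted probabilities can be checked by iterating the recurrence numerically. The sketch below assumes that the figures refer to the 11-window rule with per-window probability 1/2, that P(n)=1 for every point before the 4096th, and that a split is forced at 12 KB. The function and constant names are illustrative only.

```python
MIN_POINT = 4 * 1024    # first point at which a split can occur (minimum data block)
MAX_POINT = 12 * 1024   # a split is forced here if none was found earlier
RUN = 11                # number of consecutive windows that must all satisfy C_x
Q = 0.5                 # probability that a single window satisfies C_x


def no_split_probabilities():
    """Return a list p where p[n] is P(n), the probability that no data
    stream split point has been found up to and including point n."""
    p = [1.0] * (MAX_POINT + 1)  # P(n) = 1 for every n below MIN_POINT
    for n in range(MIN_POINT, MAX_POINT + 1):
        # Case i: the i-1 most recent windows satisfy the condition and the
        # window before them does not, leaving P(n-i) for the earlier windows.
        p[n] = sum(Q ** (i - 1) * (1 - Q) * p[n - i] for i in range(1, RUN + 1))
    return p


def mean_block_length(p):
    """Expected block length, using E[L] = sum over n of Prob(L > n)."""
    return MIN_POINT + sum(p[MIN_POINT:MAX_POINT])


if __name__ == "__main__":
    p = no_split_probabilities()
    for kb in (5, 11, 12):
        print(kb, "KB:", round(p[kb * 1024], 3))
    print("mean block length in bytes:", round(mean_block_length(p)))
```

Under these assumptions the three probabilities come out close to the quoted 0.78, 0.17 and 0.13, and the mean block length is roughly 7.6 thousand bytes.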

On the basis of the data stream split point search shown in Figure 3, in the embodiment shown in Figure 23, a rule is preset on the deduplication server 103. The rule is: determine, for a potential split point k, 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers 1 to 11, and A_x and B_x are integers. The probability that at least part of the data in a window W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x is 1/2, with A₁=171, B₁=-2; A₂=172, B₂=-3; A₃=173, B₃=-4; A₄=174, B₄=-5; A₅=175, B₅=-6; A₆=176, B₆=-7; A₇=177, B₇=-8; A₈=178, B₈=-9; A₉=179, B₉=-10; A₁₀=170, B₁₀=-1; A₁₁=169, B₁₁=0, and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀=C₁₁.

k_a is a data stream split point, and the data stream split point search direction shown in Figure 23 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of the minimum data block is taken as the next potential split point k_i. According to the rule preset on the deduplication server 103, the windows W_ix[k_i-A_x, k_i+B_x] and the corresponding predetermined conditions C_x are determined for the potential split point k_i, where x takes the consecutive natural numbers 1 to 11. The 11 windows are W_i1[k_i-171, k_i-2], W_i2[k_i-172, k_i-3], W_i3[k_i-173, k_i-4], W_i4[k_i-174, k_i-5], W_i5[k_i-175, k_i-6], W_i6[k_i-176, k_i-7], W_i7[k_i-177, k_i-8], W_i8[k_i-178, k_i-9], W_i9[k_i-179, k_i-10], W_i10[k_i-170, k_i-1] and W_i11[k_i-169, k_i]. For each x from 1 to 11, it is judged whether at least part of the data in window W_ix satisfies the predetermined condition C_x. When at least part of the data in each of the windows W_i1 to W_i11 satisfies its corresponding predetermined condition C₁ to C₁₁, the current potential split point k_i is a data stream split point.

When at least part of the data in any one of the 11 windows does not satisfy its corresponding predetermined condition, the search jumps forward. Figure 24 takes as an example the case where at least part of the data in W_i3 does not satisfy the predetermined condition C₃; written as W_i3[p_i3-169, p_i3], with p_i3=k_i-4, the jump is 11 bytes from point p_i3 along the data stream split point search direction. As shown in Figure 24, when W_i3 is judged not to satisfy the predetermined condition C₃, N bytes are jumped from k_i along the data stream split point search direction, where N is not greater than ‖B₃‖+max_x(‖A_x‖). In this embodiment N=7, which is the same position as 11 bytes from p_i3, and the next potential split point is obtained at the end position of the 7th byte. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103, 11 windows W_jx[k_j-A_x, k_j+B_x] are determined for the potential split point k_j, namely W_j1[k_j-171, k_j-2], W_j2[k_j-172, k_j-3], W_j3[k_j-173, k_j-4], W_j4[k_j-174, k_j-5], W_j5[k_j-175, k_j-6], W_j6[k_j-176, k_j-7], W_j7[k_j-177, k_j-8], W_j8[k_j-178, k_j-9], W_j9[k_j-179, k_j-10], W_j10[k_j-170, k_j-1] and W_j11[k_j-169, k_j]. For each x from 1 to 11, it is judged whether at least part of the data in window W_jx satisfies the predetermined condition C_x. In this embodiment of the present invention, the same principle is followed when judging whether the potential split point k_a is a data stream split point; the specific implementation is not described again, and reference may be made to the description for the potential split point k_i.

When at least part of the data in each of the windows W_j1 to W_j11 satisfies its corresponding predetermined condition C₁ to C₁₁, the current potential split point k_j is a data stream split point and the data between k_j and k_a constitutes one data block. The minimum data block size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103. When the potential split point k_j is judged not to be a data stream split point, the next potential split point is obtained in the same way as for k_i, and whether it is a data stream split point is judged according to the rule preset on the deduplication server 103 and the method above. When no data stream split point has been found by the time the set maximum data block size is exceeded, the end position of the maximum data block is taken as a forced split point. The implementation of this method is of course constrained by the maximum data block length and the size of the files constituting the data stream, and details are not repeated here.
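The search procedure of this embodiment can be sketched as a loop over potential split points. This is a minimal illustration under stated assumptions, not the patented implementation: `satisfies` is a hypothetical stand-in for the condition C_x (a CRC parity test that holds for roughly half of all windows), the 12 KB maximum data block is assumed, and the jump after a failed window is inferred as the smallest step after which the failed window is no longer one of the new point's windows. That inference reproduces N=7 when W_i3 fails.

```python
import zlib

MIN_BLOCK = 4 * 1024    # minimum data block skipped after each split point
MAX_BLOCK = 12 * 1024   # assumed maximum data block (forced split)

# (A_x, B_x) for x = 1..11, in the evaluation order of the Figure 23 rule.
WINDOWS = [(171, -2), (172, -3), (173, -4), (174, -5), (175, -6), (176, -7),
           (177, -8), (178, -9), (179, -10), (170, -1), (169, 0)]
B_MIN = min(b for _, b in WINDOWS)


def satisfies(data: bytes, start: int, end: int) -> bool:
    """Hypothetical stand-in for C_x, true for roughly half of all windows."""
    return zlib.crc32(data[start:end]) & 1 == 0


def next_split_point(data: bytes, prev: int) -> int:
    """Return the split point that follows the split point `prev`."""
    limit = min(prev + MAX_BLOCK, len(data))
    k = prev + MIN_BLOCK
    while k < limit:
        for a, b in WINDOWS:
            if not satisfies(data, k - a, k + b):
                # Assumed jump: skip every point that would reuse the failed window.
                k += b - B_MIN + 1
                break
        else:
            return k        # all 11 windows satisfied: k is a split point
    return limit            # forced split at the maximum block or end of data


def split(data: bytes) -> list[bytes]:
    """Cut `data` into blocks at successive split points."""
    blocks, prev = [], 0
    while prev < len(data):
        nxt = next_split_point(data, prev)
        blocks.append(data[prev:nxt])
        prev = nxt
    return blocks
```

Because all 11 windows have the same length and the same condition, a window that fails for k fails equally for every later point that includes it, which is what makes the jump safe in this sketch.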

On the basis of the data stream split point search shown in Figure 3, in the embodiment shown in Figure 25, a rule is preset on the deduplication server 103. The rule is: determine, for a potential split point k, 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers 1 to 11, A₁=166, B₁=3; A₂=167, B₂=2; A₃=168, B₃=1; A₄=169, B₄=0; A₅=170, B₅=-1; A₆=171, B₆=-2; A₇=172, B₇=-3; A₈=173, B₈=-4; A₉=174, B₉=-5; A₁₀=175, B₁₀=-6; A₁₁=176, B₁₁=-7; and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀=C₁₁. The 11 windows are then W₁[k-166, k+3], W₂[k-167, k+2], W₃[k-168, k+1], W₄[k-169, k], W₅[k-170, k-1], W₆[k-171, k-2], W₇[k-172, k-3], W₈[k-173, k-4], W₉[k-174, k-5], W₁₀[k-175, k-6] and W₁₁[k-176, k-7].

k_a is a data stream split point, and the data stream split point search direction shown in Figure 25 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of the minimum data block is taken as the next potential split point k_i. In this embodiment, according to the rule preset on the deduplication server 103, 11 windows W_ix[k_i-A_x, k_i+B_x] and the corresponding predetermined conditions C_x are determined for the potential split point k_i, where x takes the consecutive natural numbers 1 to 11. In the embodiment shown in Figure 25, the 11 windows determined for the potential split point k_i are W_i1[k_i-166, k_i+3], W_i2[k_i-167, k_i+2], W_i3[k_i-168, k_i+1], W_i4[k_i-169, k_i], W_i5[k_i-170, k_i-1], W_i6[k_i-171, k_i-2], W_i7[k_i-172, k_i-3], W_i8[k_i-173, k_i-4], W_i9[k_i-174, k_i-5], W_i10[k_i-175, k_i-6] and W_i11[k_i-176, k_i-7]. For each x from 1 to 11, it is judged whether at least part of the data in window W_ix satisfies the predetermined condition C_x. When at least part of the data in each of the windows W_i1 to W_i11 satisfies its corresponding predetermined condition C₁ to C₁₁, the current potential split point k_i is a data stream split point.

When at least part of the data in any one of the 11 windows does not satisfy its corresponding predetermined condition, for example W_i7[k_i-172, k_i-3] as shown in Figure 26, N bytes are jumped from the potential split point k_i along the data stream split point search direction, where N is not greater than ‖B₇‖+max_x(‖A_x‖). In the embodiment shown in Figure 26, N is not greater than 185 bytes; in this embodiment N=5, which yields a new potential split point. To distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103 in the embodiment shown in Figure 25, 11 windows are determined for the potential split point k_j, namely W_j1[k_j-166, k_j+3], W_j2[k_j-167, k_j+2], W_j3[k_j-168, k_j+1], W_j4[k_j-169, k_j], W_j5[k_j-170, k_j-1], W_j6[k_j-171, k_j-2], W_j7[k_j-172, k_j-3], W_j8[k_j-173, k_j-4], W_j9[k_j-174, k_j-5], W_j10[k_j-175, k_j-6] and W_j11[k_j-176, k_j-7]. For each x from 1 to 11, it is judged whether at least part of the data in window W_jx satisfies the predetermined condition C_x. In this embodiment of the present invention, the same principle is followed when judging whether the potential split point k_a is a data stream split point; the specific implementation is not described again, and reference may be made to the description for the potential split point k_i.

When at least part of the data in each of the windows W_j1 to W_j11 satisfies its corresponding predetermined condition C₁ to C₁₁, the current potential split point k_j is a data stream split point and the data between k_j and k_a constitutes one data block. The minimum data block size of 4 KB is then skipped in the same way as for k_a to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103. When the potential split point k_j is judged not to be a data stream split point, the next potential split point is obtained in the same way as for k_i, and whether it is a data stream split point is judged according to the rule preset on the deduplication server 103 and the method above. When no data stream split point has been found by the time the set maximum data block size is exceeded, the end position of the maximum data block is taken as a forced split point.
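In the Figure 25 rule some windows extend past the potential split point (B_x > 0), so judging a point k needs bytes beyond k. The sketch below, with illustrative names that are not part of the patent, lists the point ranges of the 11 windows for a given k, reports the look-ahead, and finds by brute force the smallest jump after which a failed window is no longer among the new point's windows.

```python
# (A_x, B_x) for x = 1..11 under the Figure 25 rule: A_x = 165 + x, B_x = 4 - x.
WINDOWS = [(165 + x, 4 - x) for x in range(1, 12)]


def window_ranges(k):
    """Return (start point, end point) of each window W_x[k-A_x, k+B_x]."""
    return [(k - a, k + b) for a, b in WINDOWS]


def lookahead():
    """Number of bytes beyond k that must be available to judge point k."""
    return max(b for _, b in WINDOWS)


def smallest_clear_jump(failed_x):
    """Smallest jump d >= 1 such that the failed window of point k is not
    one of the windows of point k + d."""
    a_f, b_f = WINDOWS[failed_x - 1]
    d = 1
    # Window (a, b) of point k + d covers [k - (a - d), k + (b + d)].
    while any((a - d, b + d) == (a_f, b_f) for a, b in WINDOWS):
        d += 1
    return d
```

For a failure of the 7th window this brute-force search returns 5, matching the N=5 used in this embodiment.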

On the basis of the data stream split point search shown in Figure 3, in the implementation shown in Figure 27 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to 11; A₁=169, B₁=0; A₂=170, B₂=-1; A₃=171, B₃=-2; A₄=172, B₄=-3; A₅=173, B₅=-4; A₆=174, B₆=-5; A₇=175, B₇=-6; A₈=176, B₈=-7; A₉=177, B₉=-8; A₁₀=168, B₁₀=1; A₁₁=179, B₁₁=3; and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀≠C₁₁. The 11 windows are therefore W₁[k-169, k], W₂[k-170, k-1], W₃[k-171, k-2], W₄[k-172, k-3], W₅[k-173, k-4], W₆[k-174, k-5], W₇[k-175, k-6], W₈[k-176, k-7], W₉[k-177, k-8], W₁₀[k-168, k+1], and W₁₁[k-179, k+3]. k_a is a data stream split point, and the search direction shown in Figure 27 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of that minimum data block is taken as the next potential split point k_i. In this embodiment, according to the rule preset on the deduplication server 103, windows W_ix[k_i-A_x, k_i+B_x] are determined for the potential split point k_i, where x takes the consecutive natural numbers from 1 to 11. In the implementation shown in Figure 27, the 11 windows determined for the potential split point k_i are W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-168, k_i+1], and W_i11[k_i-179, k_i+3].
It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_i1[k_i-169, k_i] against C₁, W_i2[k_i-170, k_i-1] against C₂, W_i3[k_i-171, k_i-2] against C₃, W_i4[k_i-172, k_i-3] against C₄, W_i5[k_i-173, k_i-4] against C₅, W_i6[k_i-174, k_i-5] against C₆, W_i7[k_i-175, k_i-6] against C₇, W_i8[k_i-176, k_i-7] against C₈, W_i9[k_i-177, k_i-8] against C₉, W_i10[k_i-168, k_i+1] against C₁₀, and W_i11[k_i-179, k_i+3] against C₁₁.
When it is judged that at least part of the data in each of the windows W_i1 through W_i11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When it is judged that at least part of the data in window W_i11 does not satisfy the predetermined condition C₁₁, a jump of 1 byte is made from the potential split point k_i along the data stream split point search direction to obtain a new potential split point; to distinguish it from the potential split point k_i, the new potential split point is denoted k_j.
When at least part of the data in any one of the 10 windows W_i1, W_i2, W_i3, W_i4, W_i5, W_i6, W_i7, W_i8, W_i9, and W_i10 does not satisfy the corresponding predetermined condition, for example W_i4[k_i-172, k_i-3] as shown in Figure 28, a jump of N bytes is made from the point k_i along the data stream split point search direction, where N is not greater than ‖B₄‖+max_x(‖A_x‖). In the implementation shown in Figure 28, N is therefore not greater than 182 bytes (‖B₄‖=3 and max_x(‖A_x‖)=179); in this embodiment, N=6. A new potential split point is thereby obtained; to distinguish it from the potential split point k_i, the new potential split point is denoted k_j. According to the rule preset on the deduplication server 103 in the implementation shown in Figure 27, the windows determined for the potential split point k_j are W_j1[k_j-169, k_j], W_j2[k_j-170, k_j-1], W_j3[k_j-171, k_j-2], W_j4[k_j-172, k_j-3], W_j5[k_j-173, k_j-4], W_j6[k_j-174, k_j-5], W_j7[k_j-175, k_j-6], W_j8[k_j-176, k_j-7], W_j9[k_j-177, k_j-8], W_j10[k_j-168, k_j+1], and W_j11[k_j-179, k_j+3].
It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_j1[k_j-169, k_j] against C₁, W_j2[k_j-170, k_j-1] against C₂, W_j3[k_j-171, k_j-2] against C₃, W_j4[k_j-172, k_j-3] against C₄, W_j5[k_j-173, k_j-4] against C₅, W_j6[k_j-174, k_j-5] against C₆, W_j7[k_j-175, k_j-6] against C₇, W_j8[k_j-176, k_j-7] against C₈, W_j9[k_j-177, k_j-8] against C₉, W_j10[k_j-168, k_j+1] against C₁₀, and W_j11[k_j-179, k_j+3] against C₁₁. Of course, in this embodiment of the present invention, the same principle is followed when judging whether the potential split point k_a is a data stream split point; the specific implementation is not described again, and reference may be made to the description of judging the potential split point k_i.
When it is judged that at least part of the data in each of the windows W_j1 through W_j11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point, and the data between k_j and k_a constitutes one data block. The minimum block size of 4 KB is then skipped in the same manner as for k_a to obtain the next potential split point, and whether that point is a data stream split point is judged according to the rule preset on the deduplication server 103. When it is judged that the potential split point k_j is not a data stream split point, the next potential split point is obtained in the same manner as for k_i, and whether it is a data stream split point is judged according to the rule preset on the deduplication server 103 and the foregoing method. When no data stream split point has been found by the time the set maximum data block size is exceeded, the end position of the maximum data block is used as a forced split point.
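
The window layout and the two jump cases of the Figure 27 rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: the names `windows`, `max_jump`, and `next_candidate` are hypothetical, the caller is assumed to already know which window failed, and the text does not say what happens when W_i11 and another window fail together, so that case is left to the caller.

```python
from typing import List, Tuple

# Offsets from the Figure 27 rule: window x is [k - A[x-1], k + B[x-1]].
A = [169, 170, 171, 172, 173, 174, 175, 176, 177, 168, 179]
B = [0, -1, -2, -3, -4, -5, -6, -7, -8, 1, 3]


def windows(k: int) -> List[Tuple[int, int]]:
    """Return the 11 (start, end) windows for potential split point k."""
    return [(k - a, k + b) for a, b in zip(A, B)]


def max_jump(failed_x: int) -> int:
    """Upper bound on N, i.e. |B_x| + max|A_x|, for failed window x (1-based)."""
    return abs(B[failed_x - 1]) + max(abs(a) for a in A)


def next_candidate(k: int, failed_x: int, n: int = 6) -> int:
    """Next potential split point after window failed_x fails at k."""
    if failed_x == 11:
        return k + 1  # W_11 failed: jump a single byte
    if not 1 <= n <= max_jump(failed_x):
        raise ValueError("N exceeds the bound stated in the text")
    return k + n      # any of W_1..W_10 failed: jump N bytes (N=6 in the text)
```

For the Figure 28 example, where W_i4 fails, `max_jump(4)` evaluates 3 + 179, matching the 182-byte bound given in the text.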

On the basis of the data stream split point search shown in Figure 3, in the implementation shown in Figure 29 a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine 11 windows W_x[p_x-A_x, p_x+B_x] and the predetermined condition C_x corresponding to each window W_x[p_x-A_x, p_x+B_x], where x takes the consecutive natural numbers from 1 to 11 and the probability that at least part of the data in a window W_x[p_x-A_x, p_x+B_x] satisfies the predetermined condition is 1/2; A₁=169, B₁=0; A₂=171, B₂=-2; A₃=173, B₃=-4; A₄=175, B₄=-6; A₅=177, B₅=-8; A₆=179, B₆=-10; A₇=181, B₇=-12; A₈=183, B₈=-14; A₉=185, B₉=-16; A₁₀=187, B₁₀=-18; A₁₁=189, B₁₁=-20; and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀=C₁₁. The 11 windows are therefore W₁[k-169, k], W₂[k-171, k-2], W₃[k-173, k-4], W₄[k-175, k-6], W₅[k-177, k-8], W₆[k-179, k-10], W₇[k-181, k-12], W₈[k-183, k-14], W₉[k-185, k-16], W₁₀[k-187, k-18], and W₁₁[k-189, k-20]. k_a is a data stream split point, and the search direction shown in Figure 29 is from left to right. After the minimum data block of 4 KB is skipped from the data stream split point k_a, the end position of that minimum data block is taken as the next potential split point k_i, and the points p_ix are determined for the potential split point k_i; in this embodiment, according to the rule preset on the deduplication server 103, x takes the consecutive natural numbers from 1 to 11. In the implementation shown in Figure 29, according to the preset rule, the 11 windows determined for the potential split point k_i are W_i1[k_i-169, k_i], W_i2[k_i-171, k_i-2], W_i3[k_i-173, k_i-4], W_i4[k_i-175, k_i-6], W_i5[k_i-177, k_i-8], W_i6[k_i-179, k_i-10], W_i7[k_i-181, k_i-12], W_i8[k_i-183, k_i-14], W_i9[k_i-185, k_i-16], W_i10[k_i-187, k_i-18], and W_i11[k_i-189, k_i-20].
It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_i1[k_i-169, k_i] against C₁, W_i2[k_i-171, k_i-2] against C₂, W_i3[k_i-173, k_i-4] against C₃, W_i4[k_i-175, k_i-6] against C₄, W_i5[k_i-177, k_i-8] against C₅, W_i6[k_i-179, k_i-10] against C₆, W_i7[k_i-181, k_i-12] against C₇, W_i8[k_i-183, k_i-14] against C₈, W_i9[k_i-185, k_i-16] against C₉, W_i10[k_i-187, k_i-18] against C₁₀, and W_i11[k_i-189, k_i-20] against C₁₁.
When it is judged that at least part of the data in each of the windows W_i1 through W_i11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_i is a data stream split point. When at least part of the data in any one of the 11 windows does not satisfy the corresponding predetermined condition, for example when, as shown in Figure 30, at least part of the data in W_i4[k_i-175, k_i-6] does not satisfy the predetermined condition C₄, the next potential split point is selected; to distinguish it from the potential split point k_i, it is denoted k_j here. k_j is located to the right of k_i, at a distance of 1 byte from k_i.
As shown in Figure 30, according to the rule preset for the deduplication server 103, the 11 windows determined for the potential split point k_j are W_j1[k_j-169, k_j], W_j2[k_j-171, k_j-2], W_j3[k_j-173, k_j-4], W_j4[k_j-175, k_j-6], W_j5[k_j-177, k_j-8], W_j6[k_j-179, k_j-10], W_j7[k_j-181, k_j-12], W_j8[k_j-183, k_j-14], W_j9[k_j-185, k_j-16], W_j10[k_j-187, k_j-18], and W_j11[k_j-189, k_j-20], and C₁=C₂=C₃=C₄=C₅=C₆=C₇=C₈=C₉=C₁₀=C₁₁. It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_j1[k_j-169, k_j] against C₁, W_j2[k_j-171, k_j-2] against C₂, W_j3[k_j-173, k_j-4] against C₃, W_j4[k_j-175, k_j-6] against C₄, W_j5[k_j-177, k_j-8] against C₅, W_j6[k_j-179, k_j-10] against C₆, W_j7[k_j-181, k_j-12] against C₇, W_j8[k_j-183, k_j-14] against C₈, W_j9[k_j-185, k_j-16] against C₉, W_j10[k_j-187, k_j-18] against C₁₀, and W_j11[k_j-189, k_j-20] against C₁₁.
When it is judged that at least part of the data in each of the windows W_j1 through W_j11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_j is a data stream split point. When it is judged that at least part of the data in any one of the windows W_j1, W_j2, W_j3, W_j4, W_j5, W_j6, W_j7, W_j8, W_j9, W_j10, and W_j11 does not satisfy the predetermined condition, for example when, as shown in Figure 31, at least part of the data in W_j3[k_j-173, k_j-4] does not satisfy the predetermined condition C₃, then, k_j being located to the right of k_i, a jump of N bytes is made from k_i along the data stream split point search direction, where N is not greater than ‖B₄‖+max_x(‖A_x‖). In the implementation shown in Figure 28, N is not greater than 195 bytes (here ‖B₄‖=6 and max_x(‖A_x‖)=189); in this embodiment, N=15. The next potential split point is thereby obtained; to distinguish it from the potential split points k_i and k_j, it is denoted k_l.
According to the rule preset for the deduplication server 103 in the implementation shown in Figure 29, the 11 windows determined for the potential split point k_l are W_l1[k_l-169, k_l], W_l2[k_l-171, k_l-2], W_l3[k_l-173, k_l-4], W_l4[k_l-175, k_l-6], W_l5[k_l-177, k_l-8], W_l6[k_l-179, k_l-10], W_l7[k_l-181, k_l-12], W_l8[k_l-183, k_l-14], W_l9[k_l-185, k_l-16], W_l10[k_l-187, k_l-18], and W_l11[k_l-189, k_l-20]. It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_l1[k_l-169, k_l] against C₁, W_l2[k_l-171, k_l-2] against C₂, W_l3[k_l-173, k_l-4] against C₃, W_l4[k_l-175, k_l-6] against C₄, W_l5[k_l-177, k_l-8] against C₅, W_l6[k_l-179, k_l-10] against C₆, W_l7[k_l-181, k_l-12] against C₇, W_l8[k_l-183, k_l-14] against C₈, W_l9[k_l-185, k_l-16] against C₉, W_l10[k_l-187, k_l-18] against C₁₀, and W_l11[k_l-189, k_l-20] against C₁₁.
When it is judged that at least part of the data in each of the windows W_l1 through W_l11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_l is a data stream split point. When at least part of the data in any one of the windows W_l1, W_l2, W_l3, W_l4, W_l5, W_l6, W_l7, W_l8, W_l9, W_l10, and W_l11 does not satisfy the predetermined condition, the next potential split point is selected; to distinguish it from the potential split points k_i, k_j, and k_l, it is denoted k_m. k_m is located to the right of k_l, at a distance of 1 byte from k_l. According to the rule preset for the deduplication server 103 in the embodiment shown in Figure 29, the 11 windows determined for the potential split point k_m are W_m1[k_m-169, k_m], W_m2[k_m-171, k_m-2], W_m3[k_m-173, k_m-4], W_m4[k_m-175, k_m-6], W_m5[k_m-177, k_m-8], W_m6[k_m-179, k_m-10], W_m7[k_m-181, k_m-12], W_m8[k_m-183, k_m-14], W_m9[k_m-185, k_m-16], W_m10[k_m-187, k_m-18], and W_m11[k_m-189, k_m-20].
It is judged, window by window, whether at least part of the data in each window satisfies its predetermined condition: W_m1[k_m-169, k_m] against C₁, W_m2[k_m-171, k_m-2] against C₂, W_m3[k_m-173, k_m-4] against C₃, W_m4[k_m-175, k_m-6] against C₄, W_m5[k_m-177, k_m-8] against C₅, W_m6[k_m-179, k_m-10] against C₆, W_m7[k_m-181, k_m-12] against C₇, W_m8[k_m-183, k_m-14] against C₈, W_m9[k_m-185, k_m-16] against C₉, W_m10[k_m-187, k_m-18] against C₁₀, and W_m11[k_m-189, k_m-20] against C₁₁.
When it is judged that at least part of the data in each of the windows W_m1 through W_m11 satisfies the corresponding predetermined condition C₁ through C₁₁, the current potential split point k_m is a data stream split point. When at least part of the data in any one of the windows does not satisfy the predetermined condition, a jump is performed according to the scheme described above to obtain the next potential split point, and it is judged whether that point is a data stream split point.
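
The Figure 29 rule differs from the earlier one in that a failed potential split point is followed first by its right-hand neighbour 1 byte away and only then by a point N bytes from the original point. The sketch below illustrates one reading of that sequence; the names `windows`, `jump_bound`, and `probe_sequence` are hypothetical, and the choice to bound N by the window that failed at the original point follows the ‖B₄‖ term in the text.

```python
from typing import List, Tuple

# Offsets from the Figure 29 rule: A_x = 169 + 2(x-1), B_x = -2(x-1).
A = [169 + 2 * i for i in range(11)]  # 169, 171, ..., 189
B = [-2 * i for i in range(11)]       # 0, -2, ..., -20


def windows(k: int) -> List[Tuple[int, int]]:
    """Return the 11 (start, end) windows for potential split point k."""
    return [(k - a, k + b) for a, b in zip(A, B)]


def jump_bound(failed_x: int) -> int:
    """Upper bound on N, i.e. |B_x| + max|A_x|, for failed window x (1-based)."""
    return abs(B[failed_x - 1]) + max(A)


def probe_sequence(k_base: int, failed_x: int, n: int = 15) -> List[int]:
    """Points tried after window failed_x fails at k_base: first the point
    1 byte to the right, then (if that fails too) the point N bytes away."""
    if not 1 <= n <= jump_bound(failed_x):
        raise ValueError("N exceeds the bound stated in the text")
    return [k_base + 1, k_base + n]
```

With W_i4 failing at k_i, `jump_bound(4)` evaluates 6 + 189, the 195-byte bound in the text, and `probe_sequence` returns the points corresponding to k_j and k_l.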

An embodiment of the present invention provides a method for judging whether at least part of the data in a window W_iz[k_i-A_z, k_i+B_z] satisfies a predetermined condition C_z. In this embodiment, a random function is used for the judgment. Taking the implementation shown in FIG. 21 as an example, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i according to the rule preset on the deduplication server 103, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in FIG. 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make the judgment, 5 bytes are selected; each "■" in FIG. 32 denotes one selected byte, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted a_m,1 … a_m,8, that is, the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be expressed as:

$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
a_{255,1} & a_{255,2} & \cdots & a_{255,8}
\end{pmatrix}
$$

When a_m,n = 1, V_am,n = 1; when a_m,n = 0, V_am,n = -1, where a_m,n denotes any one of a_m,1 … a_m,8. Converting the bits of the 255 bytes according to this relationship between a_m,n and V_am,n yields the matrix V_a:

$$
V_a =
\begin{pmatrix}
V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\
V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\
\vdots & \vdots & \ddots & \vdots \\
V_{a255,1} & V_{a255,2} & \cdots & V_{a255,8}
\end{pmatrix}
$$

A large number of random numbers are selected to form a matrix, and once formed the matrix remains unchanged. For example, 255*8 random numbers are selected from random numbers that follow a specific distribution (a normal distribution is used here as an example) to form the matrix R:

$$
R =
\begin{pmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\
h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
h_{255,1} & h_{255,2} & \cdots & h_{255,8}
\end{pmatrix}
$$

The elements of the m-th row of V_a are multiplied by the random numbers in the m-th row of R, and the products are summed to obtain one value: S_am = V_am,1*h_m,1 + V_am,2*h_m,2 + … + V_am,8*h_m,8. In this way S_a1, S_a2, …, S_a255 are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the matrix R follows a normal distribution, S_am, like R, also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_a1, S_a2, …, S_a255 is greater than 0 with probability 1/2, and K follows a binomial distribution:

$$
P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}
$$

Based on the count, it is judged whether the number K of values greater than 0 among S_a1, S_a2, …, S_a255 is even. A binomially distributed random number is even with probability 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; when K is odd, at least part of the data in W_i1[k_i-169, k_i] does not satisfy the predetermined condition C₁. Here C₁ means that the number K of values greater than 0 among S_a1, S_a2, …, S_a255, obtained in the above manner, is even.

In the implementation shown in FIG. 21, the windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10] all have the same size, namely 169 bytes, and the manner of judging whether at least part of the data in a window satisfies the predetermined condition is also the same; for details, see the above description of judging whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. Therefore, as shown in FIG. 32, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted b_m,1 … b_m,8, that is, the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be expressed as:

$$
\begin{pmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,8} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,8} \\
\vdots & \vdots & \ddots & \vdots \\
b_{255,1} & b_{255,2} & \cdots & b_{255,8}
\end{pmatrix}
$$

When b_m,n = 1, V_bm,n = 1; when b_m,n = 0, V_bm,n = -1, where b_m,n denotes any one of b_m,1 … b_m,8. Converting the bits of the 255 bytes according to this relationship between b_m,n and V_bm,n yields the matrix V_b:

$$
V_b =
\begin{pmatrix}
V_{b1,1} & V_{b1,2} & \cdots & V_{b1,8} \\
V_{b2,1} & V_{b2,2} & \cdots & V_{b2,8} \\
\vdots & \vdots & \ddots & \vdots \\
V_{b255,1} & V_{b255,2} & \cdots & V_{b255,8}
\end{pmatrix}
$$

The manner of judging whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition is the same as that for window W_i2[k_i-170, k_i-1], so the same matrix R is used. The elements of the m-th row of V_b are multiplied by the random numbers in the m-th row of R, and the products are summed to obtain one value: S_bm = V_bm,1*h_m,1 + V_bm,2*h_m,2 + … + V_bm,8*h_m,8. In this way S_b1, S_b2, …, S_b255 are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the matrix R follows a normal distribution, S_bm, like R, also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_b1, S_b2, …, S_b255 is greater than 0 with probability 1/2, and K follows the same binomial distribution as above. Based on the count, it is judged whether the number K of values greater than 0 among S_b1, S_b2, …, S_b255 is even. A binomially distributed random number is even with probability 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂; when K is odd, at least part of the data in W_i2[k_i-170, k_i-1] does not satisfy the predetermined condition C₂. Here C₂ means that the number K of values greater than 0 among S_b1, S_b2, …, S_b255, obtained in the above manner, is even. In the implementation shown in FIG. 21, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂.
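The sign-matrix judgment described above can be sketched in a few lines. This is an illustrative sketch only: the function name `satisfies_condition`, the fixed seed, the use of a standard normal distribution for R, and the most-significant-bit-first ordering of a_m,1 … a_m,8 are all assumptions, as is the mapping of the five selected bytes to offsets 0, 42, 84, 126 and 168 of a 169-byte window.

```python
import random

SELECT_OFFSETS = (0, 42, 84, 126, 168)  # 5 bytes, 42 apart, in a 169-byte window
REPEAT = 51                              # 5 bytes reused 51 times = 255 rows

# Fixed 255 x 8 matrix R of normally distributed values h_{m,n}.
# The seed is a hypothetical choice; what matters is that R never changes.
_rng = random.Random(12345)
R = [[_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5 * REPEAT)]


def satisfies_condition(window: bytes) -> bool:
    """True when the count K of positive row sums S_m is even."""
    selected = [window[o] for o in SELECT_OFFSETS]
    rows = selected * REPEAT              # the 255 bytes
    k = 0
    for m, byte in enumerate(rows):
        # Map bit 1 -> +1 and bit 0 -> -1 (assumed MSB-first), weight by row m of R.
        s = sum((1 if (byte >> (7 - n)) & 1 else -1) * R[m][n] for n in range(8))
        if s > 0:
            k += 1
    return k % 2 == 0
```

Because only the five selected bytes enter the computation, the result for a window is unaffected by the other 164 bytes, which is what makes the per-window test inexpensive.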

Therefore, as shown in FIG. 32, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. The method used for windows W_i1[k_i-169, k_i] and W_i2[k_i-170, k_i-1] is then applied to judge whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃. In the implementation shown in FIG. 21, at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃.

As shown in FIG. 32, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. The method used for windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1] and W_i3[k_i-171, k_i-2] is then applied to judge whether at least part of the data in W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄. In the implementation shown in FIG. 21, at least part of the data in W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄.

As shown in FIG. 32, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_i5[k_i-173, k_i-4] satisfies the predetermined condition C₅, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. The method used for windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2] and W_i4[k_i-172, k_i-3] is then applied to judge whether at least part of the data in W_i5[k_i-173, k_i-4] satisfies the predetermined condition C₅. In the implementation shown in FIG. 21, at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅.

When at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, a jump of 7 bytes is made from point p_i5 along the search direction of the data stream split point, and the next potential split point k_j is obtained at the end position of the 7th byte. As shown in FIG. 22, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j according to the rule preset for the deduplication server 103. The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is the same as that for window W_i1[k_i-169, k_i]. Therefore, as shown in FIG. 33, W_j1 denotes the window W_j1[k_j-169, k_j]. To judge whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁, 5 bytes are selected; each "■" in FIG. 33 denotes one selected byte, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted a_m,1' … a_m,8', that is, the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be expressed as:

$$
\begin{pmatrix}
a_{1,1}' & a_{1,2}' & \cdots & a_{1,8}' \\
a_{2,1}' & a_{2,2}' & \cdots & a_{2,8}' \\
\vdots & \vdots & \ddots & \vdots \\
a_{255,1}' & a_{255,2}' & \cdots & a_{255,8}'
\end{pmatrix}
$$

When a_m,n' = 1, V_am,n' = 1; when a_m,n' = 0, V_am,n' = -1, where a_m,n' denotes any one of a_m,1' … a_m,8'. Converting the bits of the 255 bytes according to this relationship between a_m,n' and V_am,n' yields the matrix V_a':

$$
V_a' =
\begin{pmatrix}
V_{a1,1}' & V_{a1,2}' & \cdots & V_{a1,8}' \\
V_{a2,1}' & V_{a2,2}' & \cdots & V_{a2,8}' \\
\vdots & \vdots & \ddots & \vdots \\
V_{a255,1}' & V_{a255,2}' & \cdots & V_{a255,8}'
\end{pmatrix}
$$

The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] satisfies the predetermined condition is the same as that for window W_i1[k_i-169, k_i], so the same 255*8 matrix R of random numbers h_m,n is used. The elements of the m-th row of V_a' are multiplied by the random numbers in the m-th row of R, and the products are summed to obtain one value: S_am' = V_am,1'*h_m,1 + V_am,2'*h_m,2 + … + V_am,8'*h_m,8. In this way S_a1', S_a2', …, S_a255' are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the matrix R follows a normal distribution, S_am', like R, also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_a1', S_a2', …, S_a255' is greater than 0 with probability 1/2, and K follows a binomial distribution:

$$
P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}
$$

Based on the count, it is judged whether the number K of values greater than 0 among S_a1', S_a2', …, S_a255' is even. A binomially distributed random number is even with probability 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁; when K is odd, at least part of the data in W_j1[k_j-169, k_j] does not satisfy the predetermined condition C₁.
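The statement that K is even with probability 1/2 can be checked directly. The following short derivation assumes, as the description does, that K follows the binomial distribution B(255, 1/2); it uses only the standard identity that the binomial coefficients with even lower index sum to 2^(n-1):

$$
P(K \text{ even}) = \sum_{\substack{n = 0 \\ n \text{ even}}}^{255} \binom{255}{n}\left(\frac{1}{2}\right)^{255} = \frac{2^{254}}{2^{255}} = \frac{1}{2}
$$

Each window condition therefore passes with probability 1/2 under this assumption.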

The manner of judging whether at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂ is the same as that for W_j2[k_j-170, k_j-1]. Therefore, as shown in FIG. 33, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. The selected 5 bytes are reused 51 times to obtain 255 bytes in total, which increases randomness. Each byte consists of 8 bits, denoted b_m,1' … b_m,8', that is, the 1st to 8th bits of the m-th of the 255 bytes. The bits of the 255 bytes can therefore be expressed as:

$$
\begin{pmatrix}
b_{1,1}' & b_{1,2}' & \cdots & b_{1,8}' \\
b_{2,1}' & b_{2,2}' & \cdots & b_{2,8}' \\
\vdots & \vdots & \ddots & \vdots \\
b_{255,1}' & b_{255,2}' & \cdots & b_{255,8}'
\end{pmatrix}
$$

When b_m,n' = 1, V_bm,n' = 1; when b_m,n' = 0, V_bm,n' = -1, where b_m,n' denotes any one of b_m,1' … b_m,8'. Converting the bits of the 255 bytes according to this relationship between b_m,n' and V_bm,n' yields the matrix V_b':

$$
V_b' =
\begin{pmatrix}
V_{b1,1}' & V_{b1,2}' & \cdots & V_{b1,8}' \\
V_{b2,1}' & V_{b2,2}' & \cdots & V_{b2,8}' \\
\vdots & \vdots & \ddots & \vdots \\
V_{b255,1}' & V_{b255,2}' & \cdots & V_{b255,8}'
\end{pmatrix}
$$

Because the manner of judging whether at least part of the data in window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂ is the same as that for W_j2[k_j-170, k_j-1], the same 255*8 matrix R of random numbers h_m,n is still used. The elements of the m-th row of V_b' are multiplied by the random numbers in the m-th row of R, and the products are summed to obtain one value: S_bm' = V_bm,1'*h_m,1 + V_bm,2'*h_m,2 + … + V_bm,8'*h_m,8. In this way S_b1', S_b2', …, S_b255' are obtained, and the number K of values among them that satisfy a specific condition (greater than 0 is used here as an example) is counted. Because the matrix R follows a normal distribution, S_bm', like R, also follows a normal distribution. According to probability theory, a normally distributed random number is greater than 0 with probability 1/2, so each of S_b1', S_b2', …, S_b255' is greater than 0 with probability 1/2, and K follows a binomial distribution:

$$
P(K = n) = \binom{255}{n}\left(\frac{1}{2}\right)^{n}\left(\frac{1}{2}\right)^{255-n} = \binom{255}{n}\left(\frac{1}{2}\right)^{255}
$$

Based on the count, it is judged whether the number K of values greater than 0 among S_b1', S_b2', …, S_b255' is even. A binomially distributed random number is even with probability 1/2, so K satisfies the condition with probability 1/2. When K is even, at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂; when K is odd, at least part of the data in W_j2[k_j-170, k_j-1] does not satisfy the predetermined condition C₂.

Similarly, the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃ is the same as that for W_j3[k_j-171, k_j-2]. In the same way, it is judged whether at least part of the data in W_j4[k_j-172, k_j-3] satisfies the predetermined condition C₄, whether at least part of the data in W_j5[k_j-173, k_j-4] satisfies the predetermined condition C₅, whether at least part of the data in W_j6[k_j-174, k_j-5] satisfies the predetermined condition C₆, whether at least part of the data in W_j7[k_j-175, k_j-6] satisfies the predetermined condition C₇, whether at least part of the data in W_j8[k_j-176, k_j-7] satisfies the predetermined condition C₈, whether at least part of the data in W_j9[k_j-177, k_j-8] satisfies the predetermined condition C₉, whether at least part of the data in W_j10[k_j-178, k_j-9] satisfies the predetermined condition C₁₀, and whether at least part of the data in W_j11[k_j-179, k_j-10] satisfies the predetermined condition C₁₁. Details are not repeated here.

In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Still taking the implementation shown in FIG. 21 as an example, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i according to the rule preset on the deduplication server 103, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in FIG. 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make the judgment, 5 bytes are selected; each "■" in FIG. 32 denotes one selected byte, and two adjacent selected bytes "■" are 42 bytes apart. One implementation applies a hash function to the selected 5 bytes. The values computed by the hash function follow a fixed uniform distribution. If the value computed by the hash function is even, it is judged that at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; that is, C₁ means that the value computed by the hash function in the above manner is even. The probability that at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition is therefore 1/2.

In the implementation shown in FIG. 21, the hash function is likewise used to judge whether at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃, whether at least part of the data in W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄, and whether at least part of the data in W_i5[k_i-173, k_i-4] satisfies the predetermined condition C₅. For the specific implementation, refer to the above description of using the hash function to judge whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁ in the implementation shown in FIG. 21. Details are not repeated here.
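The hash-based judgment can be sketched as follows. The description does not name a particular hash function, so the use of SHA-256 from Python's standard `hashlib` is an assumption made purely for illustration, as are the function name `satisfies_condition_hash` and the mapping of the five selected bytes to offsets 0, 42, 84, 126 and 168 of a 169-byte window.

```python
import hashlib

SELECT_OFFSETS = (0, 42, 84, 126, 168)  # 5 bytes, 42 apart, in a 169-byte window


def satisfies_condition_hash(window: bytes) -> bool:
    """True when the hash value of the 5 selected bytes is even."""
    selected = bytes(window[o] for o in SELECT_OFFSETS)
    # Interpret the digest as one large integer and test its parity.
    value = int.from_bytes(hashlib.sha256(selected).digest(), "big")
    return value % 2 == 0
```

Any hash whose output is close to uniformly distributed would serve the same purpose, since only the parity of the value is used.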

When at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, a jump of 7 bytes is made from the potential split point k_i along the search direction of the data stream split point, and the current potential split point k_j is obtained at the end position of the 7th byte. As shown in FIG. 22, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j according to the rule preset for the deduplication server 103. The manner of judging whether at least part of the data in window W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is the same as that for window W_i1[k_i-169, k_i]. Therefore, as shown in FIG. 33, W_j1 denotes the window W_j1[k_j-169, k_j]. To judge whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁, 5 bytes are selected; each "■" in FIG. 33 denotes one selected byte, and two adjacent selected bytes "■" are 42 bytes apart. The hash function is applied to the 5 bytes selected from window W_j1[k_j-169, k_j]; if the obtained value is even, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁.

In FIG. 33, the manner of judging whether at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂ is the same as that for W_j2[k_j-170, k_j-1]. Therefore, as shown in FIG. 33, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. The hash function is applied to the selected 5 bytes; if the obtained value is even, at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂.

In FIG. 33, the manner of judging whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃ is the same as that for W_j3[k_j-171, k_j-2]. Therefore, as shown in FIG. 33, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_j3[k_j-171, k_j-2] satisfies the predetermined condition C₃, and two adjacent selected bytes are 42 bytes apart. The hash function is applied to the selected 5 bytes; if the obtained value is even, at least part of the data in W_j3[k_j-171, k_j-2] satisfies the predetermined condition C₃.

In FIG. 33, the manner of judging whether at least part of the data in W_j4[k_j-172, k_j-3] satisfies the predetermined condition C₄ is the same as that for window W_i4[k_i-172, k_i-3]. Therefore, as shown in FIG. 33, the corresponding marker denotes one byte selected when judging whether at least part of the data in window W_j4[k_j-172, k_j-3] satisfies the predetermined condition C₄, and two adjacent selected bytes are 42 bytes apart. The hash function is applied to the selected 5 bytes; if the obtained value is even, at least part of the data in W_j4[k_j-172, k_j-3] satisfies the predetermined condition C₄.

According to the above method, it is judged whether at least part of the data in W_j5[k_j-173, k_j-4] satisfies the predetermined condition C₅, whether at least part of the data in W_j6[k_j-174, k_j-5] satisfies the predetermined condition C₆, whether at least part of the data in W_j7[k_j-175, k_j-6] satisfies the predetermined condition C₇, whether at least part of the data in W_j8[k_j-176, k_j-7] satisfies the predetermined condition C₈, whether at least part of the data in W_j9[k_j-177, k_j-8] satisfies the predetermined condition C₉, whether at least part of the data in W_j10[k_j-178, k_j-9] satisfies the predetermined condition C₁₀, and whether at least part of the data in W_j11[k_j-179, k_j-10] satisfies the predetermined condition C₁₁. Details are not repeated here.
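Putting the hash-based judgment and the 7-byte jump together gives a search loop along the following lines. This is a hedged sketch rather than the patented implementation: SHA-256 stands in for the unspecified hash function, the names `window_ok` and `find_split_point` are hypothetical, each window W_z[k-a, k-b] is assumed to be the slice `data[k-a:k-b]`, and the jump is taken from the current potential split point as this paragraph describes.

```python
import hashlib
from typing import Optional

WINDOW_LEN = 169    # bytes per window
NUM_WINDOWS = 11    # windows W_1 .. W_11, each shifted 1 byte left of the previous
JUMP = 7            # bytes skipped when any window fails
SELECT_OFFSETS = (0, 42, 84, 126, 168)


def window_ok(window: bytes) -> bool:
    """Hash the 5 selected bytes and accept the window when the value is even."""
    selected = bytes(window[o] for o in SELECT_OFFSETS)
    return int.from_bytes(hashlib.sha256(selected).digest(), "big") % 2 == 0


def find_split_point(data: bytes, start: int = 0) -> Optional[int]:
    """Return the first potential split point k at which all 11 windows pass."""
    # The smallest k for which all 11 windows lie inside the data.
    k = max(start, WINDOW_LEN + NUM_WINDOWS - 1)
    while k <= len(data):
        # all() stops at the first failing window, mirroring the early jump.
        if all(window_ok(data[k - WINDOW_LEN - z: k - z]) for z in range(NUM_WINDOWS)):
            return k
        k += JUMP
    return None
```

With each window passing with probability 1/2, a candidate is accepted with probability about 1/2^11 under the independence assumption, while most candidates are rejected after only one or two window evaluations.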

In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Taking the implementation shown in FIG. 21 as an example, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i according to the rule preset on the deduplication server 103, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in FIG. 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make the judgment, 5 bytes are selected; the bytes "■" numbered 169, 127, 85, 43 and 1 in FIG. 32 each denote one selected byte, and two adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁, a₂, a₃, a₄ and a₅ respectively. Because one byte consists of 8 bits, each byte "■" taken as a value satisfies 0 ≤ a_r ≤ 255 for any a_r among a₁, a₂, a₃, a₄ and a₅. The values a₁, a₂, a₃, a₄ and a₅ form a 1*5 matrix. 256*5 random numbers are selected from random numbers that follow a binomial distribution to form the matrix R, which is expressed as:

$$
R =
\begin{pmatrix}
h_{0,1} & h_{0,2} & \cdots & h_{0,5} \\
h_{1,1} & h_{1,2} & \cdots & h_{1,5} \\
\vdots & \vdots & \ddots & \vdots \\
h_{255,1} & h_{255,2} & \cdots & h_{255,5}
\end{pmatrix}
$$
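A minimal sketch of this table-based variant follows, including the lookup-and-sum step that the description goes on to detail. Several points are assumptions made for illustration: the name `satisfies_condition_lookup`, the fixed seed, the choice of Binomial(16, 1/2) for the table entries, the row index running from 0 to 255, and the mapping of the bytes numbered 169, 127, 85, 43 and 1 to offsets 168, 126, 84, 42 and 0 of the window.

```python
import random

# Bytes numbered 169, 127, 85, 43 and 1 in the window (assumed 0-based offsets).
SELECT_OFFSETS = (168, 126, 84, 42, 0)

# Fixed 256 x 5 table R; entry R[v][c] plays the role of h_{v, c+1}.
# Each entry is a Binomial(16, 1/2) sample, an illustrative choice only.
_rng = random.Random(2014)  # hypothetical seed so the table never changes
R = [
    [sum(_rng.getrandbits(1) for _bit in range(16)) for _col in range(5)]
    for _row in range(256)
]


def satisfies_condition_lookup(window: bytes) -> bool:
    """True when the sum S of the five looked-up table values is even."""
    values = [window[o] for o in SELECT_OFFSETS]       # a_1 .. a_5, each 0..255
    s = sum(R[a][col] for col, a in enumerate(values))  # h_{a1,1} + ... + h_{a5,5}
    return s % 2 == 0
```

Compared with the 255*8 sign-matrix variant, this form needs only five table lookups and four additions per window.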

According to the value of a₁ and the column in which it is located, the corresponding value is looked up in matrix R: for example, if a₁=36 and a₁ is in column 1, the value of h_36,1 is looked up. Likewise, if a₂=48 and a₂ is in column 2, the value of h_48,2 is looked up; if a₃=26 and a₃ is in column 3, the value of h_26,3 is looked up; if a₄=26 and a₄ is in column 4, the value of h_26,4 is looked up; and if a₅=88 and a₅ is in column 5, the value of h_88,5 is looked up. S₁=h_36,1+h_48,2+h_26,3+h_26,4+h_88,5. Because the entries of matrix R follow a binomial distribution, S₁ also follows a binomial distribution. When S₁ is even, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; when S₁ is odd, at least part of the data in W_i1[k_i-169, k_i] does not satisfy the predetermined condition C₁. The probability that S₁ is even is 1/2, and C₁ denotes that S₁ computed in the above manner is even. In the embodiment shown in Figure 21, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁.

As shown in Figure 32, the bytes numbered 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in the window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b₁, b₂, b₃, b₄ and b₅ respectively. Because 1 byte consists of 8 bits, each byte taken as a value gives 0≤b_r≤255 for every b_r among b₁, b₂, b₃, b₄ and b₅. b₁, b₂, b₃, b₄ and b₅ form a 1*5 matrix. In this implementation, W_i1 and W_i2 are judged in the same manner, so matrix R is still used. According to the value of b₁ and its column, the corresponding value is looked up in matrix R: for example, if b₁=66 and b₁ is in column 1, the value of h_66,1 is looked up; if b₂=48 and b₂ is in column 2, the value of h_48,2 is looked up; if b₃=99 and b₃ is in column 3, the value of h_99,3 is looked up; if b₄=26 and b₄ is in column 4, the value of h_26,4 is looked up; and if b₅=90 and b₅ is in column 5, the value of h_90,5 is looked up. S₂=h_66,1+h_48,2+h_99,3+h_26,4+h_90,5. Because the entries of matrix R follow a binomial distribution, S₂ also follows a binomial distribution. When S₂ is even, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂; when S₂ is odd, at least part of the data in W_i2[k_i-170, k_i-1] does not satisfy the predetermined condition C₂. The probability that S₂ is even is 1/2. In the embodiment shown in Figure 21, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂.

Using the same rule, it is judged whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃, whether at least part of the data in W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄, whether at least part of the data in W_i5[k_i-173, k_i-4] satisfies the predetermined condition C₅, whether at least part of the data in W_i6[k_i-174, k_i-5] satisfies the predetermined condition C₆, whether at least part of the data in W_i7[k_i-175, k_i-6] satisfies the predetermined condition C₇, whether at least part of the data in W_i8[k_i-176, k_i-7] satisfies the predetermined condition C₈, whether at least part of the data in W_i9[k_i-177, k_i-8] satisfies the predetermined condition C₉, whether at least part of the data in W_i10[k_i-178, k_i-9] satisfies the predetermined condition C₁₀, and whether at least part of the data in W_i11[k_i-179, k_i-10] satisfies the predetermined condition C₁₁.

In the implementation shown in Figure 21, at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, so 7 bytes are skipped from the potential split point k_i along the search direction of the data stream split point, and the current potential split point k_j is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset for the deduplication server 103, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j. Whether at least part of the data in the window W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is judged in the same manner as for the window W_i1[k_i-169, k_i]. Therefore, as shown in Figure 33, W_j1 denotes the window W_j1[k_j-169, k_j], and to judge whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁, each byte "■" numbered 169, 127, 85, 43 and 1 in Figure 33 is one selected byte, with adjacent selected bytes 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁', a₂', a₃', a₄' and a₅' respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a value gives 0≤a_r'≤255 for every a_r' among a₁', a₂', a₃', a₄' and a₅'. a₁', a₂', a₃', a₄' and a₅' form a 1*5 matrix. Because the window W_j1[k_j-169, k_j] is judged in the same manner as the window W_i1[k_i-169, k_i], the same 256*5 matrix R is still used.

According to the value of a₁' and the column in which it is located, the corresponding value is looked up in matrix R: for example, if a₁'=16 and a₁' is in column 1, the value of h_16,1 is looked up. Likewise, if a₂'=98 and a₂' is in column 2, the value of h_98,2 is looked up; if a₃'=56 and a₃' is in column 3, the value of h_56,3 is looked up; if a₄'=36 and a₄' is in column 4, the value of h_36,4 is looked up; and if a₅'=99 and a₅' is in column 5, the value of h_99,5 is looked up. S₁'=h_16,1+h_98,2+h_56,3+h_36,4+h_99,5. Because the entries of matrix R follow a binomial distribution, S₁' also follows a binomial distribution. When S₁' is even, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁; when S₁' is odd, at least part of the data in W_j1[k_j-169, k_j] does not satisfy the predetermined condition C₁. The probability that S₁' is even is 1/2.
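The single-matrix parity test described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function names, the zero-based index `end` (the position of the byte numbered 1 in the figures), and the binomial parameters `trials=16` and probability 0.5 are assumptions, since the text states only that the entries of R follow a binomial distribution.

```python
import random

STRIDE = 42        # spacing between adjacent selected bytes (from the text)
NUM_SELECTED = 5   # bytes selected per window (from the text)
WINDOW_LEN = 169   # selected bytes are numbered 169, 127, 85, 43 and 1


def make_matrix(seed, trials=16):
    """Build a 256 x 5 table of binomially distributed integers.

    `trials` and the success probability 0.5 are assumed values.
    """
    rng = random.Random(seed)
    return [[sum(rng.random() < 0.5 for _ in range(trials))
             for _ in range(NUM_SELECTED)]
            for _ in range(256)]


def selected_bytes(data, end):
    """Return the 5 selected byte values in stream order.

    `end` is the index of the byte numbered 1; the byte numbered 169
    therefore sits at index end - 168.
    """
    first = end - (WINDOW_LEN - 1)
    return [data[first + STRIDE * c] for c in range(NUM_SELECTED)]


def lookup_sum(values, matrix):
    """Sum matrix[value][column] over the selected bytes (S in the text)."""
    return sum(matrix[a][c] for c, a in enumerate(values))


def condition_met(data, end, matrix):
    """The condition holds when the looked-up sum is even."""
    return lookup_sum(selected_bytes(data, end), matrix) % 2 == 0


if __name__ == "__main__":
    R = make_matrix(seed=2014)
    s1 = lookup_sum([36, 48, 26, 26, 88], R)   # the a values from the example
    print(s1, s1 % 2 == 0)
```

Because the same table is reused for every window, evaluating a window costs five table lookups and four additions, regardless of the window length.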

Whether at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂ is judged in the same manner as for W_i2[k_i-170, k_i-1]. Therefore, as shown in Figure 33, the bytes numbered 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in the window W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b₁', b₂', b₃', b₄' and b₅' respectively. Because 1 byte consists of 8 bits, each byte taken as a value gives 0≤b_r'≤255 for every b_r' among b₁', b₂', b₃', b₄' and b₅'. b₁', b₂', b₃', b₄' and b₅' form a 1*5 matrix. The same matrix R is used as for judging whether at least part of the data in the window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂. According to the value of b₁' and its column, the corresponding value is looked up in matrix R: for example, if b₁'=210 and b₁' is in column 1, the value of h_210,1 is looked up; if b₂'=156 and b₂' is in column 2, the value of h_156,2 is looked up; if b₃'=144 and b₃' is in column 3, the value of h_144,3 is looked up; if b₄'=60 and b₄' is in column 4, the value of h_60,4 is looked up; and if b₅'=90 and b₅' is in column 5, the value of h_90,5 is looked up. S₂'=h_210,1+h_156,2+h_144,3+h_60,4+h_90,5. The judgment condition is the same as for S₂: when S₂' is even, at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂; when S₂' is odd, at least part of the data in W_j2[k_j-170, k_j-1] does not satisfy the predetermined condition C₂. The probability that S₂' is even is 1/2.

Similarly, whether at least part of the data in W_j3[k_j-171, k_j-2] satisfies the predetermined condition C₃ is judged in the same manner as for W_i3[k_i-171, k_i-2]. In the same way, it is judged whether at least part of the data in W_j4[k_j-172, k_j-3] satisfies the predetermined condition C₄, whether at least part of the data in W_j5[k_j-173, k_j-4] satisfies the predetermined condition C₅, whether at least part of the data in W_j6[k_j-174, k_j-5] satisfies the predetermined condition C₆, whether at least part of the data in W_j7[k_j-175, k_j-6] satisfies the predetermined condition C₇, whether at least part of the data in W_j8[k_j-176, k_j-7] satisfies the predetermined condition C₈, whether at least part of the data in W_j9[k_j-177, k_j-8] satisfies the predetermined condition C₉, whether at least part of the data in W_j10[k_j-178, k_j-9] satisfies the predetermined condition C₁₀, and whether at least part of the data in W_j11[k_j-179, k_j-10] satisfies the predetermined condition C₁₁. The details are not repeated here.
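The overall search over the 11 windows, with a skip when one of them fails, might be sketched as follows. This is a hedged illustration only: `find_split_point`, the predicate `window_ok`, and the zero-based indexing are hypothetical names and conventions, and the sketch assumes the same 7-byte skip whichever window fails, whereas the text only describes the case in which the fifth window fails.

```python
NUM_WINDOWS = 11   # W_1 .. W_11 in the Figure 21 implementation
WINDOW_LEN = 169   # selected bytes are numbered 1 .. 169 within a window
SKIP = 7           # bytes skipped in the Figure 21 example (assumed fixed here)


def find_split_point(data, start, window_ok, skip=SKIP):
    """Return the first candidate index k at which all 11 windows pass.

    k is the index of the last byte of W_1; window W_z ends at k - (z - 1).
    `window_ok(data, end)` is a caller-supplied predicate (hypothetical)
    that decides whether the window ending at `end` meets its condition.
    Returns None when the end of the data is reached without a match.
    """
    # The leftmost byte of W_11 is at k - 178, so k must be at least 178.
    k = max(start, (WINDOW_LEN - 1) + (NUM_WINDOWS - 1))
    while k < len(data):
        # all() stops at the first failing window, so later windows
        # are not evaluated for a rejected candidate.
        if all(window_ok(data, k - z) for z in range(NUM_WINDOWS)):
            return k
        k += skip
    return None


if __name__ == "__main__":
    # Toy predicate for demonstration: a window passes if its last byte is 0.
    demo = bytearray(400)
    demo[174] = 1   # makes the fifth window of the first candidate fail
    print(find_split_point(demo, 0, lambda d, end: d[end] == 0))
```

In the demonstration the first candidate is rejected at its fifth window, and after the 7-byte skip the eleventh window of the new candidate ends one byte past the window that failed, which mirrors the relationship between W_i5 and W_j11 in the index ranges given in the text.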

In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Taking the implementation shown in Figure 21 as an example, according to the rule preset on the deduplication server 103, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in Figure 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make this judgment, 5 bytes are selected: in Figure 32, each byte "■" numbered 169, 127, 85, 43 and 1 is one selected byte, and adjacent selected bytes are 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁, a₂, a₃, a₄ and a₅ respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a value gives 0≤a_s≤255 for every a_s among a₁, a₂, a₃, a₄ and a₅. a₁, a₂, a₃, a₄ and a₅ form a 1*5 matrix. 256*5 random numbers are selected from random numbers that follow a binomial distribution to form the matrix R, and another 256*5 random numbers are selected from random numbers that follow a binomial distribution to form the matrix G. In both matrices the rows are indexed by the byte value 0 to 255 and the columns are indexed 1 to 5, expressed as:

$$
R=\begin{pmatrix}
h_{0,1} & h_{0,2} & h_{0,3} & h_{0,4} & h_{0,5}\\
h_{1,1} & h_{1,2} & h_{1,3} & h_{1,4} & h_{1,5}\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
h_{255,1} & h_{255,2} & h_{255,3} & h_{255,4} & h_{255,5}
\end{pmatrix},
\qquad
G=\begin{pmatrix}
g_{0,1} & g_{0,2} & g_{0,3} & g_{0,4} & g_{0,5}\\
g_{1,1} & g_{1,2} & g_{1,3} & g_{1,4} & g_{1,5}\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
g_{255,1} & g_{255,2} & g_{255,3} & g_{255,4} & g_{255,5}
\end{pmatrix}
$$

According to the value of a₁ and the column in which it is located, for example a₁=36 with a₁ in column 1, the value of h_36,1 is looked up in matrix R and the value of g_36,1 is looked up in matrix G. Likewise, for a₂=48 with a₂ in column 2, the values of h_48,2 and g_48,2 are looked up; for a₃=26 with a₃ in column 3, the values of h_26,3 and g_26,3 are looked up; for a₄=26 with a₄ in column 4, the values of h_26,4 and g_26,4 are looked up; and for a₅=88 with a₅ in column 5, the values of h_88,5 and g_88,5 are looked up. S_1h=h_36,1+h_48,2+h_26,3+h_26,4+h_88,5, and because the entries of matrix R follow a binomial distribution, S_1h also follows a binomial distribution; S_1g=g_36,1+g_48,2+g_26,3+g_26,4+g_88,5, and because the entries of matrix G follow a binomial distribution, S_1g also follows a binomial distribution. When at least one of S_1h and S_1g is even, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; when S_1h and S_1g are both odd, at least part of the data in W_i1[k_i-169, k_i] does not satisfy the predetermined condition C₁. C₁ denotes that at least one of S_1h and S_1g obtained in the above manner is even. Because S_1h and S_1g both follow a binomial distribution, the probability that S_1h is even is 1/2 and the probability that S_1g is even is 1/2; both are odd with probability 1/2*1/2=1/4, so the probability that at least one of them is even is 1-1/4=3/4. Therefore, the probability that at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁ is 3/4. In the embodiment shown in Figure 21, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁.

In the implementation shown in Figure 21, the windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10] are all the same size, namely 169 bytes, and the manner of judging whether at least part of the data in each window satisfies the predetermined condition is also the same; see the description above of judging whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. Therefore, as shown in Figure 32, the bytes numbered 170, 128, 86, 44 and 2 are the bytes selected when judging whether at least part of the data in the window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, and adjacent selected bytes are 42 bytes apart. The bytes numbered 170, 128, 86, 44 and 2 are each converted into a decimal value, denoted b₁, b₂, b₃, b₄ and b₅ respectively. Because 1 byte consists of 8 bits, each byte taken as a value gives 0≤b_s≤255 for every b_s among b₁, b₂, b₃, b₄ and b₅. b₁, b₂, b₃, b₄ and b₅ form a 1*5 matrix. In this implementation, every window is judged in the same manner, so the same matrices R and G are still used. According to the value of b₁ and its column, for example b₁=66 with b₁ in column 1, the value of h_66,1 is looked up in matrix R and the value of g_66,1 is looked up in matrix G; for b₂=48 with b₂ in column 2, the values of h_48,2 and g_48,2 are looked up; for b₃=99 with b₃ in column 3, the values of h_99,3 and g_99,3 are looked up; for b₄=26 with b₄ in column 4, the values of h_26,4 and g_26,4 are looked up; and for b₅=90 with b₅ in column 5, the values of h_90,5 and g_90,5 are looked up. S_2h=h_66,1+h_48,2+h_99,3+h_26,4+h_90,5, and because the entries of matrix R follow a binomial distribution, S_2h also follows a binomial distribution. S_2g=g_66,1+g_48,2+g_99,3+g_26,4+g_90,5, and because the entries of matrix G follow a binomial distribution, S_2g also follows a binomial distribution. When at least one of S_2h and S_2g is even, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂; when S_2h and S_2g are both odd, at least part of the data in W_i2[k_i-170, k_i-1] does not satisfy the predetermined condition C₂. The probability that at least one of S_2h and S_2g is even is 3/4. In the embodiment shown in Figure 21, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂.

Using the same rule, it is judged whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃, whether at least part of the data in W_i4[k_i-172, k_i-3] satisfies the predetermined condition C₄, whether at least part of the data in W_i5[k_i-173, k_i-4] satisfies the predetermined condition C₅, whether at least part of the data in W_i6[k_i-174, k_i-5] satisfies the predetermined condition C₆, whether at least part of the data in W_i7[k_i-175, k_i-6] satisfies the predetermined condition C₇, whether at least part of the data in W_i8[k_i-176, k_i-7] satisfies the predetermined condition C₈, whether at least part of the data in W_i9[k_i-177, k_i-8] satisfies the predetermined condition C₉, whether at least part of the data in W_i10[k_i-178, k_i-9] satisfies the predetermined condition C₁₀, and whether at least part of the data in W_i11[k_i-179, k_i-10] satisfies the predetermined condition C₁₁.

In the implementation shown in Figure 21, at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, so 7 bytes are skipped from the potential split point k_i along the search direction of the data stream split point, and the current potential split point k_j is obtained at the end position of the 7th byte. As shown in Figure 22, according to the rule preset for the deduplication server 103, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j. Whether at least part of the data in the window W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is judged in the same manner as for the window W_i1[k_i-169, k_i]. Therefore, as shown in Figure 33, W_j1 denotes the window W_j1[k_j-169, k_j], and to judge whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁, each byte "■" numbered 169, 127, 85, 43 and 1 in Figure 33 is one selected byte, with adjacent selected bytes 42 bytes apart. The bytes "■" numbered 169, 127, 85, 43 and 1 are each converted into a decimal value, denoted a₁', a₂', a₃', a₄' and a₅' respectively. Because 1 byte consists of 8 bits, each byte "■" taken as a value gives 0≤a_s'≤255 for every a_s' among a₁', a₂', a₃', a₄' and a₅'. a₁', a₂', a₃', a₄' and a₅' form a 1*5 matrix. The same 256*5 matrices R and G are used as for judging whether at least part of the data in the window W_i1[k_i-169, k_i] satisfies the predetermined condition C₁.

According to the value of a₁' and the column in which it lies, for example a₁' = 16 in column 1, the value corresponding to h_{16,1} is looked up in matrix R and the value corresponding to g_{16,1} is looked up in matrix G. In the same way, for a₂' = 98 in column 2, the values corresponding to h_{98,2} and g_{98,2} are looked up; for a₃' = 56 in column 3, the values corresponding to h_{56,3} and g_{56,3}; for a₄' = 36 in column 4, the values corresponding to h_{36,4} and g_{36,4}; and for a₅' = 99 in column 5, the values corresponding to h_{99,5} and g_{99,5}. S_1h' = h_{16,1} + h_{98,2} + h_{56,3} + h_{36,4} + h_{99,5}; because matrix R obeys a binomial distribution, S_1h' also obeys a binomial distribution. S_1g' = g_{16,1} + g_{98,2} + g_{56,3} + g_{36,4} + g_{99,5}; because matrix G obeys a binomial distribution, S_1g' also obeys a binomial distribution. When at least one of S_1h' and S_1g' is even, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁; when both S_1h' and S_1g' are odd, at least part of the data in W_j1[k_j-169, k_j] does not satisfy the predetermined condition C₁. The probability that at least one of S_1h' and S_1g' is even is 3/4.
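The table lookup and parity test described above can be sketched as follows. This is only an illustration under stated assumptions: the 256-row by 5-column table shape is inferred from the byte range 0 to 255 and the five selected bytes, and the binomial parameters, the seeds and the names `make_table` and `satisfies_condition` are hypothetical rather than taken from the patent.

```python
import random

ROWS, COLS = 256, 5  # assumed shape: one row per byte value, one column per selected byte


def make_table(seed):
    """Build a stand-in for matrix R or G filled with binomially distributed integers."""
    rng = random.Random(seed)
    # Each entry is the sum of 8 fair coin flips, i.e. a Binomial(8, 1/2) draw
    # (the parameters are an illustrative assumption).
    return [[sum(rng.getrandbits(1) for _ in range(8)) for _ in range(COLS)]
            for _ in range(ROWS)]


R = make_table(1)
G = make_table(2)


def satisfies_condition(selected_bytes, R=R, G=G):
    """Return True when at least one of the two column-wise lookup sums is even."""
    s_h = sum(R[value][col] for col, value in enumerate(selected_bytes))
    s_g = sum(G[value][col] for col, value in enumerate(selected_bytes))
    return s_h % 2 == 0 or s_g % 2 == 0


# The five example byte values a1'..a5' from the text.
print(satisfies_condition([16, 98, 56, 36, 99]))
```

Because the same tables are reused for every potential split point, identical selected bytes always give the same verdict, which is what lets a failed window rule out neighbouring points.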

Whether at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[k_i-170, k_i-1]. Therefore, as shown in Figure 33, the bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in the window W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. Each of the bytes with sequence numbers 170, 128, 86, 44 and 2 is converted into a decimal value, denoted b₁', b₂', b₃', b₄' and b₅' respectively. Because 1 byte consists of 8 bits and each byte is taken as one value, any b_s' among b₁', b₂', b₃', b₄' and b₅' satisfies 0 ≤ b_s' ≤ 255. b₁', b₂', b₃', b₄' and b₅' form a 1*5 matrix. The same matrices R and G are used as for judging whether at least part of the data in the window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂. According to the value of b₁' and the column in which it lies, for example b₁' = 210 in column 1, the value corresponding to h_{210,1} is looked up in matrix R and the value corresponding to g_{210,1} is looked up in matrix G. In the same way, for b₂' = 156 in column 2, the values corresponding to h_{156,2} and g_{156,2} are looked up; for b₃' = 144 in column 3, the values corresponding to h_{144,3} and g_{144,3}; for b₄' = 60 in column 4, the values corresponding to h_{60,4} and g_{60,4}; and for b₅' = 90 in column 5, the values corresponding to h_{90,5} and g_{90,5}. S_2h' = h_{210,1} + h_{156,2} + h_{144,3} + h_{60,4} + h_{90,5}, and S_2g' = g_{210,1} + g_{156,2} + g_{144,3} + g_{60,4} + g_{90,5}. When at least one of S_2h' and S_2g' is even, at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂; when both S_2h' and S_2g' are odd, at least part of the data in W_j2[k_j-170, k_j-1] does not satisfy the predetermined condition C₂. The probability that at least one of S_2h' and S_2g' is even is 3/4.

In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Taking the implementation shown in Figure 21 as an example, according to the rule preset on the deduplication server 103, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in Figure 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 32 each represent one selected byte, and two adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are regarded in sequence as 40 bits, denoted a₁, a₂, a₃, a₄ … a₄₀. For any a_t among a₁, a₂, a₃, a₄ … a₄₀, V_at = -1 when a_t = 0 and V_at = 1 when a_t = 1; according to this correspondence between a_t and V_at, V_a1, V_a2, V_a3, V_a4 … V_a40 are generated. 40 random numbers are selected from random numbers obeying a normal distribution, denoted h₁, h₂, h₃, h₄ … h₄₀. S_a = V_a1*h₁ + V_a2*h₂ + V_a3*h₃ + V_a4*h₄ + … + V_a40*h₄₀. Because h₁, h₂, h₃, h₄ … h₄₀ obey a normal distribution, S_a also obeys a normal distribution. When S_a is positive, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; when S_a is negative or 0, at least part of the data in W_i1[k_i-169, k_i] does not satisfy the predetermined condition C₁. The probability that S_a is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁.

As shown in Figure 32, the bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in the window W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are regarded in sequence as 40 bits, denoted b₁, b₂, b₃, b₄ … b₄₀. For any b_t among b₁, b₂, b₃, b₄ … b₄₀, V_bt = -1 when b_t = 0 and V_bt = 1 when b_t = 1; according to this correspondence between b_t and V_bt, V_b1, V_b2, V_b3, V_b4 … V_b40 are generated. Because the window W_i2[k_i-170, k_i-1] is judged in the same way as the window W_i1[k_i-169, k_i], the same random numbers h₁, h₂, h₃, h₄ … h₄₀ are used: S_b = V_b1*h₁ + V_b2*h₂ + V_b3*h₃ + V_b4*h₄ + … + V_b40*h₄₀. Because h₁, h₂, h₃, h₄ … h₄₀ obey a normal distribution, S_b also obeys a normal distribution. When S_b is positive, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂; when S_b is negative or 0, at least part of the data in W_i2[k_i-170, k_i-1] does not satisfy the predetermined condition C₂. The probability that S_b is positive is 1/2. In the embodiment shown in Figure 21, at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂. Using the same rule, it is judged whether at least part of the data in W_i3[k_i-171, k_i-2] satisfies the predetermined condition C₃, in W_i4[k_i-172, k_i-3] the condition C₄, in W_i5[k_i-173, k_i-4] the condition C₅, in W_i6[k_i-174, k_i-5] the condition C₆, in W_i7[k_i-175, k_i-6] the condition C₇, in W_i8[k_i-176, k_i-7] the condition C₈, in W_i9[k_i-177, k_i-8] the condition C₉, in W_i10[k_i-178, k_i-9] the condition C₁₀, and in W_i11[k_i-179, k_i-10] the condition C₁₁.

In the embodiment shown in Figure 21, at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, so a jump of 7 bytes is made from the potential split point k_i along the split-point search direction of the data stream, and the current potential split point k_j is obtained at the end position of the seventh byte. As shown in Figure 22, according to the rule preset on the deduplication server 103, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j. Whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is judged in the same way as for the window W_i1[k_i-169, k_i]. Therefore, as shown in Figure 33, where W_j1 denotes the window W_j1[k_j-169, k_j], 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 33 each represent one selected byte, and two adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are regarded in sequence as 40 bits, denoted a₁', a₂', a₃', a₄' … a₄₀'. For any a_t' among a₁', a₂', a₃', a₄' … a₄₀', V_at' = -1 when a_t' = 0 and V_at' = 1 when a_t' = 1; according to this correspondence between a_t' and V_at', V_a1', V_a2', V_a3', V_a4' … V_a40' are generated. Because the two windows are judged in the same way, the same random numbers h₁, h₂, h₃, h₄ … h₄₀ are used. S_a' = V_a1'*h₁ + V_a2'*h₂ + V_a3'*h₃ + V_a4'*h₄ + … + V_a40'*h₄₀. Because h₁, h₂, h₃, h₄ … h₄₀ obey a normal distribution, S_a' also obeys a normal distribution. When S_a' is positive, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁; when S_a' is negative or 0, at least part of the data in W_j1[k_j-169, k_j] does not satisfy the predetermined condition C₁. The probability that S_a' is positive is 1/2.
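The signed-sum test with normally distributed weights can be sketched as below. It is a minimal illustration, not the patent's implementation: the bit order within each byte (most significant bit first), the seed, the standard normal parameters and the names `bytes_to_bits`, `signed_sum` and `satisfies_condition` are assumptions made for the example.

```python
import random

_rng = random.Random(0)  # fixed seed so h_1..h_40 stay the same for every window
H = [_rng.gauss(0.0, 1.0) for _ in range(40)]  # assumed: standard normal weights


def bytes_to_bits(selected_bytes):
    """Expand the 5 selected bytes into 40 bits (MSB-first order is an assumption)."""
    return [(value >> (7 - i)) & 1 for value in selected_bytes for i in range(8)]


def signed_sum(selected_bytes, h=H):
    """Map each bit to -1 (bit 0) or +1 (bit 1) and weight it with h_t."""
    bits = bytes_to_bits(selected_bytes)
    return sum((1 if bit else -1) * weight for bit, weight in zip(bits, h))


def satisfies_condition(selected_bytes, h=H):
    """The condition holds only for a strictly positive sum; 0 counts as not satisfied."""
    return signed_sum(selected_bytes, h) > 0


print(satisfies_condition([16, 98, 56, 36, 99]))
```

Reusing the same `H` for W_i1 and W_j1 mirrors the requirement that the same random numbers h₁ … h₄₀ are used whenever the same condition is evaluated.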

Whether at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂ is judged in the same way as for W_i2[k_i-170, k_i-1]. Therefore, as shown in Figure 33, the bytes with sequence numbers 170, 128, 86, 44 and 2 each represent one byte selected when judging whether at least part of the data in the window W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂, and two adjacent selected bytes are 42 bytes apart. The bytes with sequence numbers 170, 128, 86, 44 and 2 are regarded in sequence as 40 bits, denoted b₁', b₂', b₃', b₄' … b₄₀'. For any b_t' among b₁', b₂', b₃', b₄' … b₄₀', V_bt' = -1 when b_t' = 0 and V_bt' = 1 when b_t' = 1; according to this correspondence between b_t' and V_bt', V_b1', V_b2', V_b3', V_b4' … V_b40' are generated. Because the two windows are judged in the same way, the same random numbers h₁, h₂, h₃, h₄ … h₄₀ are used: S_b' = V_b1'*h₁ + V_b2'*h₂ + V_b3'*h₃ + V_b4'*h₄ + … + V_b40'*h₄₀. Because h₁, h₂, h₃, h₄ … h₄₀ obey a normal distribution, S_b' also obeys a normal distribution. When S_b' is positive, at least part of the data in W_j2[k_j-170, k_j-1] satisfies the predetermined condition C₂; when S_b' is negative or 0, at least part of the data in W_j2[k_j-170, k_j-1] does not satisfy the predetermined condition C₂. The probability that S_b' is positive is 1/2.

Similarly, whether at least part of the data in W_j3[k_j-171, k_j-2] satisfies the predetermined condition C₃ is judged in the same way as for W_i3[k_i-171, k_i-2]. The same applies to judging W_j4[k_j-172, k_j-3] against C₄, W_j5[k_j-173, k_j-4] against C₅, W_j6[k_j-174, k_j-5] against C₆, W_j7[k_j-175, k_j-6] against C₇, W_j8[k_j-176, k_j-7] against C₈, W_j9[k_j-177, k_j-8] against C₉, W_j10[k_j-178, k_j-9] against C₁₀, and W_j11[k_j-179, k_j-10] against C₁₁; details are not repeated here.

In this embodiment, a random function is used to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Still taking the implementation shown in Figure 21 as an example, according to the rule preset on the deduplication server 103, the window W_i1[k_i-169, k_i] is determined for the potential split point k_i, and it is judged whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. As shown in Figure 32, W_i1 denotes the window W_i1[k_i-169, k_i]. To make the judgment, 5 bytes are selected: the bytes "■" with sequence numbers 169, 127, 85, 43 and 1 in Figure 32 each represent one selected byte, and two adjacent selected bytes are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are converted into a single decimal number in the range 0 to 2^40-1. A uniformly distributed random number generator is used to generate one specified value for each decimal number in the range 0 to 2^40-1, and the correspondence R between each decimal number and its specified value is recorded. Once assigned, the specified value corresponding to a decimal number does not change, and the specified values obey a uniform distribution. If the specified value is even, at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁; if the specified value is odd, at least part of the data in W_i1[k_i-169, k_i] does not satisfy the predetermined condition C₁. C₁ thus means that the specified value obtained by the above method is even. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁ is 1/2. In the embodiment shown in Figure 21, the same rule is used to judge whether at least part of the data in W_i2[k_i-170, k_i-1] satisfies the predetermined condition C₂, in W_i3[k_i-171, k_i-2] the condition C₃, in W_i4[k_i-172, k_i-3] the condition C₄, and in W_i5[k_i-173, k_i-4] the condition C₅; details are not repeated here.
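The fixed correspondence R between a 40-bit number and its specified value can be sketched as follows. A full table over 0 to 2^40-1 is impractical to materialise, so this illustration assigns values lazily on first use and then keeps them unchanged; that lazy assignment, the 32-bit value range, the big-endian byte order and the names `FixedRandomMap`, `bytes_to_int` and `satisfies_condition` are assumptions, not details from the patent.

```python
import random


class FixedRandomMap:
    """Correspondence R: each key gets one uniformly random value that never changes."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._table = {}

    def value_for(self, key):
        # Assign a value the first time a key is seen, then always return the same one.
        if key not in self._table:
            self._table[key] = self._rng.randrange(2 ** 32)
        return self._table[key]


def bytes_to_int(selected_bytes):
    """Combine the 5 selected bytes into one number in 0 .. 2**40 - 1 (big-endian assumed)."""
    return int.from_bytes(bytes(selected_bytes), "big")


def satisfies_condition(selected_bytes, mapping):
    """The condition holds when the specified value for this number is even."""
    return mapping.value_for(bytes_to_int(selected_bytes)) % 2 == 0


R = FixedRandomMap(seed=7)
print(satisfies_condition([16, 98, 56, 36, 99], R))
```

In practice a deterministic hash of the 5 bytes would give the same "fixed once assigned" behaviour without storing a table, which is consistent with the hash function mentioned elsewhere in the description.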

When at least part of the data in W_i5[k_i-173, k_i-4] does not satisfy the predetermined condition C₅, a jump of 7 bytes is made from the potential split point k_i along the split-point search direction of the data stream, and the current potential split point k_j is obtained at the end position of the seventh byte. As shown in Figure 22, according to the rule preset on the deduplication server 103, the window W_j1[k_j-169, k_j] is determined for the potential split point k_j. Whether at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is judged in the same way as for the window W_i1[k_i-169, k_i], so the same correspondence R between each decimal number in the range 0 to 2^40-1 and its specified value is used. As shown in Figure 33, W_j1 denotes the window W_j1[k_j-169, k_j]. To make the judgment, 5 bytes are selected: "■" in Figure 33 represents one selected byte, and two adjacent selected bytes "■" are 42 bytes apart. The bytes "■" with sequence numbers 169, 127, 85, 43 and 1 are converted into a single decimal number, and the specified value corresponding to that decimal number is looked up in R. If the specified value is even, at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁; if the specified value is odd, at least part of the data in W_j1[k_j-169, k_j] does not satisfy the predetermined condition C₁. Because a uniformly distributed random number is even with probability 1/2, the probability that at least part of the data in W_j1[k_j-169, k_j] satisfies the predetermined condition C₁ is 1/2. Similarly, W_j2[k_j-170, k_j-1] is judged against C₂ in the same way as W_i2[k_i-170, k_i-1], and W_j3[k_j-171, k_j-2] is judged against C₃ in the same way as W_i3[k_i-171, k_i-2]. The same applies to judging W_j4[k_j-172, k_j-3] against C₄, W_j5[k_j-173, k_j-4] against C₅, W_j6[k_j-174, k_j-5] against C₆, W_j7[k_j-175, k_j-6] against C₇, W_j8[k_j-176, k_j-7] against C₈, W_j9[k_j-177, k_j-8] against C₉, W_j10[k_j-178, k_j-9] against C₁₀, and W_j11[k_j-179, k_j-10] against C₁₁; details are not repeated here.

The deduplication server 103 in the embodiment of the present invention shown in Figure 1 is an apparatus capable of implementing the technical solutions described in the embodiments of the present invention. As shown in Figure 18, it generally includes a central processing unit, a main memory and an input/output interface. The central processing unit, the main memory and the input/output interface communicate with one another; the main memory stores executable instructions, and the central processing unit executes the executable instructions stored in the main memory to perform specific functions, so that the deduplication server 103 has those functions, such as searching for data stream split points as described in Figures 20 to 33 of the embodiments of the present invention. Therefore, as shown in Figure 19, according to the embodiments of the present invention shown in Figures 20 to 33, a rule is preset on the deduplication server 103. The rule is: for a potential split point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes the consecutive natural numbers from 1 to M, M ≥ 2, and A_x and B_x are integers;

The deduplication server 103 includes a determining unit 1901 and a judging processing unit 1902. The determining unit 1901 is configured to perform step a):

The judging processing unit 1902 is configured to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z;

When at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, a jump of N minimum split-point search units U is made from the current potential split point k_i along the split-point search direction of the data stream, where N*U is not greater than ‖B_z‖+max_x(‖A_x‖), to obtain a new potential split point, and the determining unit 1901 performs step a) for the new potential split point;
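The search-and-skip behaviour can be sketched as a loop over potential split points. This is a hedged illustration only: positions are treated as byte boundaries so that window W_x covers `data[k-A_x : k+B_x]`, the `satisfies` callback stands in for whichever condition test is used, and the default jump `M - z + 1` is an inference from the Figure 21 example (window 5 of 11 fails, jump of 7 bytes) that is valid only for windows shifted by one unit with identical conditions. The function and parameter names are hypothetical.

```python
# (A_x, B_x) for W_1..W_11 in the Figure 21 example: W_x = [k-168-x, k-(x-1)].
WINDOWS = [(168 + x, -(x - 1)) for x in range(1, 12)]


def find_split_point(data, k, windows, satisfies, jump_units=None, U=1):
    """Return the first potential split point whose M windows all pass, else None."""
    M = len(windows)
    a_max = max(abs(A) for A, _ in windows)
    if jump_units is None:
        # Assumed default, inferred from the worked example (z=5, M=11 gives N=7).
        jump_units = lambda z: M - z + 1
    # Continue while every window of the current point lies inside the data.
    while all(k - A >= 0 and k + B <= len(data) for A, B in windows):
        for z, (A, B) in enumerate(windows, start=1):
            if not satisfies(data[k - A:k + B], z):
                N = jump_units(z)
                # The rule bounds the jump: N*U <= |B_z| + max_x |A_x|.
                assert 1 <= N and N * U <= abs(B) + a_max
                k += N * U
                break
        else:
            return k  # no window failed, so k is a split point
    return None


# Toy condition: a window fails when it contains the marker byte value 1.
stream = bytearray(600)
stream[100] = 1
print(find_split_point(bytes(stream), 179, WINDOWS, lambda chunk, z: 1 not in chunk))
```

With the toy condition, the loop skips in steps of up to 11 bytes and still stops at the first position whose windows no longer cover the marker, which illustrates why skipping does not miss a valid split point in this configuration.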

Further, the rule also includes: at least two windows W_ie[k_i-A_e, k_i+B_e] and W_if[k_i-A_f, k_i+B_f] satisfy the conditions |A_e+B_e| = |A_f+B_f| and C_e = C_f. Further, the rule also includes: A_e and A_f are positive integers. Further, the rule also includes: A_e-1 = A_f and B_e+1 = B_f.
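The window constraints in this rule can be checked mechanically. The sketch below uses the eleven windows listed for Figure 21 as data; the dictionary layout and the helper names `window_size` and `shifted_by_one` are illustrative assumptions.

```python
# (A_x, B_x) for W_1..W_11 as listed for Figure 21: W_x = [k-168-x, k-(x-1)].
WINDOWS = {x: (168 + x, -(x - 1)) for x in range(1, 12)}


def window_size(A, B):
    """Size |A + B| of a window [k-A, k+B]."""
    return abs(A + B)


def shifted_by_one(e, f, windows=WINDOWS):
    """Check the rule A_e - 1 = A_f and B_e + 1 = B_f for windows e and f."""
    (A_e, B_e), (A_f, B_f) = windows[e], windows[f]
    return A_e - 1 == A_f and B_e + 1 == B_f


# All eleven windows have the same size, and each window is its predecessor
# shifted by one unit against the search direction.
print({window_size(A, B) for A, B in WINDOWS.values()})
print(all(shifted_by_one(x + 1, x) for x in range(1, 11)))
```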

Further, the judging processing unit 1902 is specifically configured to use a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z. Still further, the judging processing unit 1902 specifically uses a hash function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.

Further, the judging processing unit 1902 is configured to: when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jump N minimum split-point search units U from the current potential split point k_i along the split-point search direction of the data stream to obtain the new potential split point, and the determining unit 1901 performs step a) for the new potential split point. According to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z], or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]. Here, the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential split point is the first window in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential split point according to the rule.
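The boundary relation between the failed window and the first window of the new potential split point can be illustrated with the Figure 21 numbers. In this sketch the "first window along the search direction" is taken to be the one with the largest A_x, which is an assumption that holds for a left-to-right search over equally sized windows; the function name and arguments are hypothetical.

```python
def leftmost_boundary_ok(k_i, failed_window, all_A, N, U=1):
    """Check that the new point's first window starts inside, or at the right
    edge of, the window that failed at the current point."""
    A_z, B_z = failed_window
    lo, hi = k_i - A_z, k_i + B_z      # boundaries of the failed window W_iz
    k_new = k_i + N * U                # new potential split point after the jump
    left = k_new - max(all_A)          # left boundary of the new point's first window
    return lo <= left <= hi


ALL_A = list(range(169, 180))          # A_1..A_11 from the Figure 21 example

# W_i5 = [k_i-173, k_i-4] fails and the jump is 7 bytes: the new first window
# starts at k_i-172, one byte inside the failed window.
print(leftmost_boundary_ok(1000, (173, -4), ALL_A, N=7))
```

Keeping that left boundary inside or adjacent to the failed window is what guarantees no stretch of data between the two points goes unexamined.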

Further, the judging processing unit 1902 uses a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z, which specifically includes:

In the server-based method for searching for data stream segmentation points provided by the embodiments of the present invention shown in FIG. 20 to FIG. 33, windows W_ix[k_i-A_x, k_i+B_x] are determined for a potential segmentation point k_i, where x takes consecutive natural numbers from 1 to M and M≥2. Whether at least part of the data in each of the M windows satisfies the predetermined condition C_x may be judged in parallel, or the windows may be judged one after another. Alternatively, the judgment may proceed in sequence: only when at least part of the data in the window W_i1[k_i-A₁, k_i+B₁] is judged to satisfy the predetermined condition C₁ is W_i2[k_i-A₂, k_i+B₂] judged against the predetermined condition C₂, and so on, until at least part of the data in W_iM[k_i-A_M, k_i+B_M] is judged to satisfy the predetermined condition C_M. The judgment of the other windows in the embodiments is the same and is not repeated here.
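The sequential and parallel judgment orders described above can be contrasted in a short Python sketch. It is illustrative only: the helper names are hypothetical, each condition C_x is modelled as a plain predicate over bytes, and a window [lo, hi] is treated as covering hi - lo bytes so that W_i1[k_i-169, k_i] has the 169-byte size stated for FIG. 21.

```python
from concurrent.futures import ThreadPoolExecutor


def windows_for(k, A, B):
    """Return the (lo, hi) bounds of the M windows W_x[k-A_x, k+B_x]."""
    return [(k - a, k + b) for a, b in zip(A, B)]


def first_failing_sequential(data, k, A, B, conditions):
    """Judge windows one by one; return the index of the first failure, else None."""
    for z, ((lo, hi), cond) in enumerate(zip(windows_for(k, A, B), conditions)):
        if not cond(data[lo:hi]):
            return z          # stop early: later windows are not examined
    return None


def all_satisfied_parallel(data, k, A, B, conditions):
    """Judge all M windows concurrently; True only if every condition holds."""
    pairs = list(zip(windows_for(k, A, B), conditions))
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: p[1](data[p[0][0]:p[0][1]]), pairs)
        return all(results)
```

The sequential form reports which window failed, which is the information needed to compute the jump; the parallel form only decides whether k is a segmentation point.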

In addition, according to the embodiments of the present invention shown in FIG. 20 to FIG. 33, a rule is preset on the deduplication server 103. The rule is: for a potential segmentation point k, determine M windows W_x[k-A_x, k+B_x] and the predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes consecutive natural numbers from 1 to M and M≥2. In this preset rule, A₁, A₂, A₃ … A_M need not all be equal, B₁, B₂, B₃ … B_M need not all be equal, and C₁, C₂, C₃ … C_M need not all be the same. In the implementation shown in FIG. 21, the windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-178, k_i-9] and W_i11[k_i-179, k_i-10] all have the same size, namely 169 bytes, and the way of judging whether at least part of the data in a window satisfies the predetermined condition is also the same for all of them; for details, see the foregoing description of judging whether at least part of the data in W_i1[k_i-169, k_i] satisfies the predetermined condition C₁. In the implementation shown in FIG. 11, however, the windows W_i1[k_i-169, k_i], W_i2[k_i-170, k_i-1], W_i3[k_i-171, k_i-2], W_i4[k_i-172, k_i-3], W_i5[k_i-173, k_i-4], W_i6[k_i-174, k_i-5], W_i7[k_i-175, k_i-6], W_i8[k_i-176, k_i-7], W_i9[k_i-177, k_i-8], W_i10[k_i-168, k_i+1] and W_i11[k_i-179, k_i+3] may differ in size, and the way of judging whether at least part of the data in a window satisfies the predetermined condition may also differ. In all embodiments, according to the rule preset for the deduplication server 103, the way of judging whether at least part of the data in the window W_i1 satisfies the predetermined condition C₁ is necessarily the same as the way of judging whether at least part of the data in the window W_j1 satisfies the predetermined condition C₁, the way of judging whether at least part of the data in W_i2 satisfies the predetermined condition C₂ is necessarily the same as the way of judging whether at least part of the data in W_j2 satisfies the predetermined condition C₂, and so on up to the windows W_iM and W_jM and the predetermined condition C_M. Details are not repeated here.
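The rule for the FIG. 21 implementation can be written down as plain data, which makes it easy to see that the same offsets are applied to every potential segmentation point. The Python below is a sketch under that reading; the constant and function names are hypothetical, and the window size is taken as the difference between the two boundaries so that it matches the 169 bytes stated above.

```python
# Offsets of the eleven windows in the FIG. 21 implementation:
# A_1..A_11 = 169..179 and B_1..B_11 = 0..-10.
A_FIG21 = list(range(169, 180))
B_FIG21 = [-x for x in range(11)]


def windows(k):
    """Apply the preset rule to a potential segmentation point k."""
    return [(k - a, k + b) for a, b in zip(A_FIG21, B_FIG21)]


def span(window):
    """Size of a window, taken as the distance between its two boundaries."""
    lo, hi = window
    return hi - lo


# The rule is identical for any two potential points k_i and k_j:
# only the anchor k changes, never the offsets.
w_i = windows(1000)
w_j = windows(2000)
same_shape = [(lo - 1000, hi - 1000) for lo, hi in w_i] == \
             [(lo - 2000, hi - 2000) for lo, hi in w_j]
```

Here `w_i[0]` is the window W_i1 and `w_i[10]` is W_i11, each shifted one byte further back than its predecessor.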

According to the embodiments of the present invention shown in FIG. 20 to FIG. 33, a rule is preset on the deduplication server 103. k_a, k_i, k_j, k_l and k_m are potential segmentation points obtained when searching for a segmentation point along the data stream segmentation point search direction, and k_a, k_i, k_j, k_l and k_m all follow this rule. The window W_x[k-A_x, k+B_x] in the embodiments of the present invention denotes a specific range; data is selected within this range to judge whether it satisfies the predetermined condition C_x. Specifically, either part of the data or all of the data within the range may be selected for this judgment. For the window concept used throughout the embodiments of the present invention, refer to the window W_x[k-A_x, k+B_x]; details are not repeated here.

In the window W_x[k-A_x, k+B_x], (k-A_x) and (k+B_x) denote the two boundaries of the window. (k-A_x) denotes the boundary that lies, relative to the potential segmentation point k, in the direction opposite to the data stream segmentation point search direction, and (k+B_x) denotes the boundary that lies, relative to the potential segmentation point k, in the data stream segmentation point search direction. Specifically, in the embodiments of the present invention shown in FIG. 20 to FIG. 33, the data stream segmentation point search direction is from left to right, so (k-A_x) is the left boundary of the window W_x[k-A_x, k+B_x] and (k+B_x) is its right boundary. If the search direction in FIG. 20 to FIG. 33 were from right to left, (k-A_x) would still denote the boundary opposite to the search direction, which is then the right boundary, and (k+B_x) would still denote the boundary in the search direction, which is then the left boundary.
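The relationship between the search direction and the left and right boundaries can be captured in a few lines of Python. This is a sketch only: the function name and the direction strings are hypothetical, and positions are assumed to be measured along the search direction, so that (k-A_x) is always the trailing boundary and (k+B_x) the leading one.

```python
def label_boundaries(k, a, b, direction):
    """Say which boundary of W[k-A, k+B] is the left one and which the right one.

    Positions are measured along the search direction (an assumption), so
    k - a is the trailing boundary and k + b is the leading boundary.
    """
    trailing, leading = k - a, k + b
    if direction == "left_to_right":
        return {"left": trailing, "right": leading}
    if direction == "right_to_left":
        return {"left": leading, "right": trailing}
    raise ValueError("direction must be 'left_to_right' or 'right_to_left'")
```

For W_i1[k_i-169, k_i] with k_i = 1000, the left boundary is k_i-169 when searching from left to right and k_i when searching from right to left.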

A person of ordinary skill in the art will appreciate that the key features of the embodiments of the present invention, together with the units and algorithm steps of the examples described with reference to FIG. 20 to FIG. 33, may be combined with other techniques and presented in more complex forms that still contain the key features of the present invention. In a real environment, a backup segmentation point may be used. For example, in one implementation, according to the rule preset for the deduplication server 103, 11 windows W_x[k-A_x, k+B_x] and the predetermined conditions C_x corresponding to the windows are determined for the potential segmentation point k_i, where x takes consecutive natural numbers from 1 to 11; when at least part of the data in each of the 11 windows W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x, the potential segmentation point k_i is a data stream segmentation point. If no segmentation point has been found by the time the set maximum data block size is exceeded, a backup preset rule may be used. The backup preset rule is similar to the rule preset on the deduplication server 103. For example, the backup preset rule determines 10 windows W_x[k-A_x, k+B_x] and the corresponding predetermined conditions C_x for the potential segmentation point k_i, where x takes consecutive natural numbers from 1 to 10; when at least part of the data in each of the 10 windows W_x[k-A_x, k+B_x] satisfies the predetermined condition C_x, the potential segmentation point k_i is a data stream segmentation point. If a data stream segmentation point still has not been found when the set maximum data block size is exceeded, the end position of the maximum data block is used as a forced segmentation point.
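The fallback order described above (primary rule, then backup rule, then forced segmentation at the maximum data block size) might be sketched in Python as follows. Several details are assumptions rather than statements from the source: the two rules are passed in as predicates, the first position accepted by the backup rule is the one used, and positions are scanned one at a time without the jump optimisation. The name `choose_split` is hypothetical.

```python
def choose_split(start, end, max_chunk, is_split_primary, is_split_backup):
    """Pick the end of the data block that begins at `start`.

    Returns (position, kind) where kind is 'primary', 'backup' or 'forced'.
    """
    limit = min(start + max_chunk, end)   # never run past the stream or max block
    backup = None
    for k in range(start + 1, limit + 1):
        if is_split_primary(k):
            return k, "primary"           # all windows of the primary rule passed
        if backup is None and is_split_backup(k):
            backup = k                    # remember the first backup candidate
    if backup is not None:
        return backup, "backup"
    return limit, "forced"                # end of the maximum data block
```

Because the backup rule checks fewer windows, it is easier to satisfy, so a backup candidate is more likely to exist than a primary one within the same range.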

According to the embodiments of the present invention shown in FIG. 20 to FIG. 33, a rule is preset on the deduplication server 103. Although the rule determines M windows for a potential segmentation point k, it does not necessarily require that a potential segmentation point k exist first; the potential segmentation point k may instead be determined from the M windows.

A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable non-volatile storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a non-volatile storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned non-volatile storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc.

The foregoing is merely a description of specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A server for searching for a data stream segmentation point, characterized in that the server comprises a processor and an interface, the processor communicates with the interface, and a rule is preset on the server, the rule being: for a potential segmentation point k, determine M points p_x, a window W_x[p_x-A_x, p_x+B_x] corresponding to each point p_x, and a predetermined condition C_x corresponding to the window W_x[p_x-A_x, p_x+B_x], where x takes consecutive natural numbers from 1 to M, M≥2, and A_x and B_x are integers;

the interface is configured to receive a data stream, and the processor is configured to perform the following steps:

a) according to the rule, determining, for a current potential segmentation point k_i, a point p_iz and a window W_iz[p_iz-A_z, p_iz+B_z] corresponding to the point p_iz, where i and z are integers and 1≤z≤M;

b) judging whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z;

when at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, jumping N minimum search units U of data stream segmentation points from the point p_iz along the data stream segmentation point search direction, where N*U is not greater than ||B_z||+max_x(||A_x||+||(k_i-p_ix)||), to obtain a new potential segmentation point, and performing step a);

c) when at least part of the data in each window W_ix[p_ix-A_x, p_ix+B_x] of the M windows of the current potential segmentation point k_i satisfies the predetermined condition C_x, the current potential segmentation point k_i is a data stream segmentation point.

2. The server according to claim 1, characterized in that the rule further comprises: at least two points p_e and p_f satisfy the conditions A_e=A_f, B_e=B_f and C_e=C_f.

3. The server according to claim 2, characterized in that the rule further comprises: the at least two points p_e and p_f are located, relative to the potential segmentation point k, in the direction opposite to the data stream segmentation point search direction.

4. The server according to claim 2 or 3, characterized in that the rule further comprises: the distance between the at least two points p_e and p_f is 1 U.

5. The server according to any one of claims 1 to 3, characterized in that the processor is specifically configured to use a random function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

6. The server according to claim 5, characterized in that the processor is specifically configured to use a hash function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

7. The server according to any one of claims 1 to 3, characterized in that, when at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] does not satisfy the predetermined condition C_z, N minimum search units U of data stream segmentation points are jumped from the point p_iz along the data stream segmentation point search direction to obtain the new potential segmentation point; according to the rule, the left boundary of the window W_ic[p_ic-A_c, p_ic+B_c] corresponding to the point p_ic determined for the new potential segmentation point either coincides with the right boundary of the window W_iz[p_iz-A_z, p_iz+B_z] or lies within the range of the window W_iz[p_iz-A_z, p_iz+B_z]; wherein the point p_ic determined for the new potential segmentation point is the first point in the sequence obtained by ordering, along the data stream search direction, the M points determined for the new potential segmentation point according to the rule.

8. The server according to claim 5, characterized in that the processor using a random function to judge whether at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z specifically comprises:

selecting F bytes in the window W_iz[p_iz-A_z, p_iz+B_z] and using the F bytes repeatedly H times to obtain F*H bytes in total, where each byte consists of 8 bits denoted a_{m,1}…a_{m,8}, representing the 1st to 8th bits of the m-th byte of the F*H bytes, so that the bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

when a_{m,n}=1, V_{am,n}=1, and when a_{m,n}=0, V_{am,n}=-1, where a_{m,n} denotes any one of a_{m,1}…a_{m,8}; the bits corresponding to the F*H bytes are converted according to this relationship between a_{m,n} and V_{am,n} to obtain a matrix V_a, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

F*H*8 random numbers are selected from random numbers following a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

the elements of the m-th row of the matrix V_a are multiplied by the random numbers of the m-th row of the matrix R and the products are summed to obtain a value, specifically s_am=V_{am,1}*h_{m,1}+V_{am,2}*h_{m,2}+...+V_{am,8}*h_{m,8}; s_a1, s_a2 … s_aF*H are obtained in the same way; the number K of values among s_a1, s_a2 … s_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[p_iz-A_z, p_iz+B_z] satisfies the predetermined condition C_z.

9. A server for searching for a data stream segmentation point, characterized in that the server comprises a processor and an interface, the processor communicates with the interface, and a rule is preset on the server, the rule being: for a potential segmentation point k, determine M windows W_x[k-A_x, k+B_x] and a predetermined condition C_x corresponding to each window W_x[k-A_x, k+B_x], where x takes consecutive natural numbers from 1 to M, M≥2, and A_x and B_x are integers;

a) according to the rule, determining, for a current potential segmentation point k_i, a corresponding window W_iz[k_i-A_z, k_i+B_z], where i and z are integers and 1≤z≤M;

b) judging whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z;

when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, jumping N minimum search units U of data stream segmentation points from the current potential segmentation point k_i along the data stream segmentation point search direction, where N*U is not greater than ‖B_z‖+max_x(‖A_x‖), to obtain a new potential segmentation point, and performing step a);

c) when at least part of the data in each window W_ix[k_i-A_x, k_i+B_x] of the M windows of the current potential segmentation point k_i satisfies the predetermined condition C_x, the current potential segmentation point k_i is a data stream segmentation point.

10. The server according to claim 9, characterized in that the rule further comprises: at least two windows W_ie[k_i-A_e, k_i+B_e] and W_if[k_i-A_f, k_i+B_f] satisfy the conditions |A_e+B_e|=|A_f+B_f| and C_e=C_f.

11. The server according to claim 10, characterized in that the rule preset for the server further comprises: A_e and A_f are positive integers.

12. The server according to claim 10 or 11, characterized in that the rule further comprises: A_e-1=A_f and B_e+1=B_f.

13. The server according to any one of claims 9 to 11, characterized in that the processor is specifically configured to use a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.

14. The server according to claim 13, characterized in that the processor is specifically configured to use a hash function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.

15. The server according to any one of claims 9 to 11, characterized in that, when at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] does not satisfy the predetermined condition C_z, N minimum search units U of data stream segmentation points are jumped from the current potential segmentation point k_i along the data stream segmentation point search direction to obtain the new potential segmentation point; according to the rule, the left boundary of the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential segmentation point either coincides with the right boundary of the window W_iz[k_i-A_z, k_i+B_z] or lies within the range of the window W_iz[k_i-A_z, k_i+B_z]; wherein the window W_ic[k_i-A_c, k_i+B_c] determined for the new potential segmentation point is the first window in the sequence obtained by ordering, along the data stream search direction, the M windows determined for the new potential segmentation point according to the rule.

16. The server according to claim 13, characterized in that the processor using a random function to judge whether at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z specifically comprises:

selecting F bytes in the window W_iz[k_i-A_z, k_i+B_z] and using the F bytes repeatedly H times to obtain F*H bytes in total, where each byte consists of 8 bits denoted a_{m,1}…a_{m,8}, representing the 1st to 8th bits of the m-th byte of the F*H bytes, so that the bits corresponding to the F*H bytes can be expressed as:

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,8} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,8} \\ \vdots & \vdots & & \vdots \\ a_{F*H,1} & a_{F*H,2} & \cdots & a_{F*H,8} \end{pmatrix}$$

when a_{m,n}=1, V_{am,n}=1, and when a_{m,n}=0, V_{am,n}=-1, where a_{m,n} denotes any one of a_{m,1}…a_{m,8}; the bits corresponding to the F*H bytes are converted according to this relationship between a_{m,n} and V_{am,n} to obtain a matrix V_a, expressed as:

$$V_a = \begin{pmatrix} V_{a1,1} & V_{a1,2} & \cdots & V_{a1,8} \\ V_{a2,1} & V_{a2,2} & \cdots & V_{a2,8} \\ \vdots & \vdots & & \vdots \\ V_{aF*H,1} & V_{aF*H,2} & \cdots & V_{aF*H,8} \end{pmatrix}$$

F*H*8 random numbers are selected from random numbers following a normal distribution to form a matrix R, expressed as:

$$R = \begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,8} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,8} \\ \vdots & \vdots & & \vdots \\ h_{F*H,1} & h_{F*H,2} & \cdots & h_{F*H,8} \end{pmatrix}$$

the elements of the m-th row of the matrix V_a are multiplied by the random numbers of the m-th row of the matrix R and the products are summed to obtain a value, specifically s_am=V_{am,1}*h_{m,1}+V_{am,2}*h_{m,2}+...+V_{am,8}*h_{m,8}; s_a1, s_a2 … s_aF*H are obtained in the same way; the number K of values among s_a1, s_a2 … s_aF*H that are greater than 0 is counted; when K is even, at least part of the data in the window W_iz[k_i-A_z, k_i+B_z] satisfies the predetermined condition C_z.