CN107196998B - Mobile Web request processing method, device and system based on data deduplication

Info

Publication number: CN107196998B
Application number: CN201710290394.2A
Authority: CN
Inventors: 施展; 冯丹; 毛艳; 李双双; 单玉祥; 张芸怡; 方交凤
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2017-04-28
Filing date: 2017-04-28
Publication date: 2020-07-10
Anticipated expiration: 2037-04-28
Also published as: CN107196998A

Abstract

The invention discloses a mobile Web request processing method, device and system based on data deduplication, and belongs to the technical field of mobile Web applications and data processing. In the mobile Web, when a file cached on the client receives only a small update, most of the content sent over the network is an unnecessary repeat of what the client already holds, which degrades Web performance. To address this, the invention introduces a data deduplication mechanism for mobile Web requests: data in the new version of the updated file that duplicates the cached file is removed to generate a delta file, only the delta file is returned to the client over the network, and the client restores the new version from the delta file and the cached file, thereby reducing the amount of data transmitted. The invention further provides a method in which delta file parsing and server-side rendering are performed asynchronously, which reduces the load on the client, reduces the number of network requests for dynamic data, and improves overall Web performance.

Description

Mobile Web request processing method, device and system based on data deduplication

Technical field

The invention belongs to the technical field of mobile Web applications and data processing, and more particularly relates to a mobile Web request processing method, device and system based on data deduplication.

Background

The Internet is today's most important platform for information dissemination and communication, and the World Wide Web is one of the most important platforms on the Internet. With the arrival of the Web 2.0 era, page content has become increasingly diverse and complex, and how to achieve good Web performance has become a research hotspot.

With the popularization of mobile devices and the rapid development of the mobile Internet, many traditional Internet applications are moving to the mobile Internet. Because mobile devices have relatively scarce system resources, such as smaller memory and storage capacity, weaker processing power, and poorer network connectivity, requesting and rendering a page is several seconds to more than ten seconds slower than on a PC, so optimization for mobile devices deserves particular attention.

In the era of information explosion, resource-constrained mobile devices are clearly unable to cope with excessive information. Studies have shown that the large volume of data in Internet applications contains a great deal of redundancy. Removing redundant data and reducing the amount of data transmitted saves limited storage space, shortens network transmission time, and saves users time and cost, which is of great significance for the mobile Internet.

At present, two techniques are mainly used to reduce the amount of data transmitted: compression and caching. Compression is applied to each original file individually and reduces its transfer size, but it cannot remove redundancy between files: when two similar files are compressed separately, the resulting compressed files still contain a large amount of redundant data. Caching effectively reduces the number of network requests, but cache capacity is limited and caching cannot eliminate data redundancy. In file systems and backup systems, data deduplication is used to identify and eliminate redundant data at scale and reduce storage costs. However, directly applying redundancy elimination techniques from these traditional fields to the mobile Web adds extra computation for calculating and restoring data, which may degrade overall Web performance on mobile devices with weak processing power. Mobile Web applications therefore need an effective method that both eliminates redundant data and improves overall Web performance.
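The limitation of per-file compression described above can be illustrated with a small sketch. It is not part of the patent: the file contents are synthetic, and `zlib` merely stands in for whatever compressor a deployment might use. Two nearly identical files are compressed separately and then as a single stream, where the compressor can see the redundancy between them.

```python
import random
import zlib

rng = random.Random(0)

# Hypothetical "old" file: 4000 bytes that compress poorly on their own.
old = bytes(rng.randrange(256) for _ in range(4000))
# Hypothetical "new" file: the same content with one small insertion.
new = old[:2000] + b"small update" + old[2000:]

# Per-file compression: each file is compressed in isolation.
separate = len(zlib.compress(old)) + len(zlib.compress(new))
# One stream: the second file can reference data already seen in the first.
together = len(zlib.compress(old + new))

print("compressed separately:", separate, "bytes")
print("compressed as one stream:", together, "bytes")
```

The gap between the two totals is the cross-file redundancy that per-file compression leaves in place and that a delta against the cached version aims to remove.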

Summary of the invention

In view of the above defects of, or needs for improvement in, the prior art, the present invention provides a mobile Web request processing method, device and system based on data deduplication. Its purpose is to address the degradation of Web performance caused by transmitting large amounts of unnecessary redundant data during mobile Web request processing, so that redundant data is effectively eliminated and overall Web performance is improved.

To achieve the above object, according to one aspect of the present invention, a mobile Web request processing method based on data deduplication is provided, comprising:

S1. The client sends a page request to the server, so that the server requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

S2. The client receives the rendered page file sent by the server;

S3. The client extracts the file version number information from the received page file and compares it with the version number of the old version file in the cache. If the file has been updated, step S4 is performed; if the file has not been updated, the old version file in the cache is used directly;

S4. The client sends a delta file request to the server. The delta file request is used to request a delta file, which is generated offline by the server when the file is updated and which includes the newly added data and the block numbers corresponding to the unchanged data content;

S5. The client receives the delta file sent by the server and, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the delta file with the old version file in the cache to restore the new version file.
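The version check in step S3 can be sketched as follows. The patent does not say how the version number is embedded in the page file, so the `<meta name="file-version" ...>` tag and the helper names `extract_version` and `needs_delta` are assumptions made only for illustration.

```python
import re
from typing import Optional

# Assumption: the rendered page carries the file version in a meta tag.
# The patent does not specify the actual format.
VERSION_RE = re.compile(r'<meta name="file-version" content="([^"]+)">')


def extract_version(page: str) -> Optional[str]:
    """Return the version string found in the page, or None if absent."""
    match = VERSION_RE.search(page)
    return match.group(1) if match else None


def needs_delta(page: str, cached_version: str) -> bool:
    """Step S3: True means the file was updated and step S4 should follow;
    False means the cached old version file can be used directly."""
    server_version = extract_version(page)
    return server_version is not None and server_version != cached_version


page = '<html><head><meta name="file-version" content="43"></head></html>'
print(needs_delta(page, cached_version="42"))
print(needs_delta(page, cached_version="43"))
```

Only when `needs_delta` returns true would the client go on to request the delta file; otherwise no further transfer is needed.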

Preferably, step S5 comprises the following steps:

S51. Initialize an empty string variable strResult to represent the new file being computed;

S52. Scan the array in the delta file;

S53. If the currently scanned array item is a string, it is newly added data, and it is appended to strResult; if the currently scanned array item is not a string, it is a block number recording unchanged data content: the start and end positions of the corresponding content in the old version file are calculated from the block size and the description rule, the content is extracted from the old version file with a substring operation, and the content is appended to strResult;

S54. Determine whether scanning has finished; if so, perform step S55, otherwise return to S52;

S55. Return the variable strResult as the new version file, and end.
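Steps S51 to S55 can be sketched as below. This is a minimal illustration rather than the patented implementation: it assumes the delta file has already been parsed into a Python list, that block numbers start at 1 (as in the embodiment), and that the "description rule" maps block n to the range starting at (n - 1) * block_size. The function name `merge_delta` is hypothetical.

```python
from typing import List, Union

# A parsed delta file: strings are new data, integers are block numbers.
Delta = List[Union[int, str]]


def merge_delta(old: str, delta: Delta, block_size: int) -> str:
    """Rebuild the new version file from the cached old file and a delta."""
    str_result = ""                        # S51: start from an empty string
    for item in delta:                     # S52/S54: scan until the array ends
        if isinstance(item, str):          # S53: a string is newly added data
            str_result += item
        else:                              # S53: otherwise it is a block number
            start = (item - 1) * block_size    # assumed 1-based numbering
            end = start + block_size
            str_result += old[start:end]   # substring taken from the old file
    return str_result                      # S55


old_file = "aaaabbbbccccddddeeee"          # five blocks of four characters
delta_file = [1, "XY", 2, 3, "Z", 5]
print(merge_delta(old_file, delta_file, block_size=4))
```

The client never hashes anything in this step; it only slices the cached file and concatenates strings, which keeps the restore cheap on a weak device.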

To achieve the above object, according to another aspect of the present invention, a mobile Web request processing method based on data deduplication is provided, comprising:

A1. The server receives the page request sent by the client, requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

A2. The server sends the rendered page file to the client, so that the client extracts the file version number information from the received page file, compares it with the version number of the old version file in the cache, and, when the file has been updated, sends a delta file request to the server, wherein the delta file request is used to request a delta file, which is generated offline by the server when the file is updated and which includes the newly added data and the block numbers corresponding to the unchanged data content;

A3. The server receives the delta file request from the client and sends the delta file to the client, so that the client, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the received delta file with the old version file in the cache to restore the new version file.

Preferably, the delta file is generated by the following steps:

T1. Divide the old version file into fixed-length blocks, compute the hash checksum of each block as its block identifier, and number each block;

T2. Perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if it matches, record the corresponding block number and advance the index by one block length; otherwise, push the data into the new-data buffer and advance the index by one character;

T3. Determine whether the rolling search has reached the end of the new version file; if so, perform step T4, otherwise return to step T2;

T4. Assemble the new data in the new-data buffer and generate the delta file.
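Steps T1 to T4 can be sketched as follows. The patent does not name the hash function or the delta file layout, so MD5 from `hashlib`, the list-of-items representation, 1-based block numbers, and the names `make_delta` and `_digest` are all assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Union

# A delta: strings are new data, integers are block numbers in the old file.
Delta = List[Union[int, str]]


def _digest(block: str) -> str:
    # Assumption: MD5 serves as the "hash checksum"; the patent names none.
    return hashlib.md5(block.encode("utf-8")).hexdigest()


def make_delta(old: str, new: str, block_size: int) -> Delta:
    # T1: split the old file into fixed-length blocks, numbered from 1,
    # and index them by checksum (the first block wins on duplicates).
    block_ids: Dict[str, int] = {}
    for number, start in enumerate(range(0, len(old), block_size), start=1):
        block_ids.setdefault(_digest(old[start:start + block_size]), number)

    delta: Delta = []
    buffer: List[str] = []                 # the new-data buffer
    i = 0
    while i < len(new):                    # T3: stop at the end of the new file
        window = new[i:i + block_size]
        number = block_ids.get(_digest(window))
        if number is not None:             # T2: match, so record the block number
            if buffer:                     # flush pending new data first
                delta.append("".join(buffer))
                buffer.clear()
            delta.append(number)
            i += len(window)               # advance by one block length
        else:                              # T2: no match, so buffer one character
            buffer.append(new[i])
            i += 1
    if buffer:                             # T4: emit any remaining new data
        delta.append("".join(buffer))
    return delta


print(make_delta("aaaabbbbccccddddeeee", "aaaaXYbbbbccccZeeee", block_size=4))
```

This sketch rehashes the full window at every position and trusts the checksum without comparing block contents. A production version would more likely use a rolling checksum for the search and confirm each match, but the patent text does not go into that level of detail.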

To achieve the above object, according to another aspect of the present invention, a client is provided, comprising:

a first sending module, configured to send a page request to the server, so that the server requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

a receiving module, configured to receive the rendered page file sent by the server;

a comparison and judgment module, configured to extract the file version number information from the received page file, compare it with the version number of the old version file in the cache, and determine whether the file has been updated;

a second sending module, configured to send a delta file request to the server when the file has been updated, wherein the delta file request is used to request a delta file, which is generated offline by the server when the file is updated and which includes the newly added data and the block numbers corresponding to the unchanged data content;

an update module, configured to receive the delta file sent by the server and, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merge the delta file with the old version file in the cache to restore the new version file.

Preferably, the update module comprises:

an initialization module, configured to initialize an empty string variable strResult to represent the new file being computed;

a scanning module, configured to scan the array in the delta file;

a scan processing module, configured to: when the currently scanned array item is a string, treat it as newly added data and append it to strResult; and when the currently scanned array item is not a string, treat it as a block number recording unchanged data content, calculate the start and end positions of the corresponding content in the old version file from the block size and the description rule, extract the content from the old version file with a substring operation, and append the content to strResult;

a judgment processing module, configured to return the variable strResult as the new version file when scanning has finished, and to return to the operation of the scanning module when scanning has not finished.

To achieve the above object, according to another aspect of the present invention, a server is provided, comprising:

a first receiving module, configured to receive the page request sent by the client, request dynamic data from the Common Gateway Interface (CGI), fill it in, and render the page file, wherein the page request includes the page file;

a first sending module, configured to send the rendered page file to the client, so that the client extracts the file version number information from the received page file, compares it with the version number of the old version file in the cache, and, when the file has been updated, sends a delta file request to the server, wherein the delta file request is used to request a delta file;

a delta file generation module, configured to generate the delta file offline when the file is updated, the delta file including the newly added data and the block numbers corresponding to the unchanged data content;

a second receiving module, configured to receive the delta file request from the client;

a second sending module, configured to send the delta file to the client, so that the client, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the received delta file with the old version file in the cache to restore the new version file.

Preferably, the delta file generation module comprises:

a blocking module, configured to divide the old version file into fixed-length blocks, compute the hash checksum of each block as its block identifier, and number each block;

a search processing module, configured to perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if it matches, to record the corresponding block number and advance the index by one block length; otherwise, to push the data into the new-data buffer and advance the index by one character;

a judgment processing module, configured to assemble the new data in the new-data buffer and generate the delta file when the rolling search has reached the end of the new version file, and to return to the operation of the search processing module when it has not.

To achieve the above object, according to another aspect of the present invention, a mobile Web request processing system based on data deduplication is provided, comprising a client and a server, wherein:

the client is configured to send a page request to the server, wherein the page request includes the page file;

the server is configured to receive the page request, request dynamic data from the Common Gateway Interface (CGI), fill it in, and render the page file, and to send the rendered page file to the client;

the client is further configured to extract the file version number information from the received page file and compare it with the version number of the old version file in the cache; if the file has been updated, to send a delta file request to the server, wherein the delta file request is used to request a delta file that is generated offline by the server when the file is updated; and if the file has not been updated, to use the old version file in the cache directly;

the server is further configured to receive the delta file request from the client and send the delta file to the client, the delta file including the newly added data and the block numbers corresponding to the unchanged data content;

the client is further configured to merge, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, the received delta file with the old version file in the cache to restore the new version file.

Preferably, the server is further configured to divide the old version file into fixed-length blocks, compute the hash checksum of each block as its block identifier, and number each block; to perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if it matches, to record the corresponding block number and advance the index by one block length; otherwise, to push the data into the new-data buffer and advance the index by one character; and to determine whether the rolling search has reached the end of the new version file, and if so, to assemble the new data in the new-data buffer and generate the delta file, otherwise to continue the rolling search in the new version file;

the client is further configured to initialize an empty string variable strResult to represent the new file being computed; to scan the array in the delta file; if the currently scanned array item is a string, to treat it as newly added data and append it to strResult; if the currently scanned array item is not a string, to treat it as a block number recording unchanged data content, calculate the start and end positions of the corresponding content in the old version file from the block size and the description rule, extract the content from the old version file with a substring operation, and append the content to strResult; and to determine whether scanning has finished, and if so, to return the variable strResult as the new version file, otherwise to continue scanning the array in the delta file.

The above mobile Web request processing method, device and system based on data deduplication, referred to as DWPF (Deduplication-based Web-request Processing Framework), differ from the existing technical solutions, which are based on caching or on compression. Overall, compared with the prior art, the present invention achieves the following beneficial effects:

1. Compared with existing caching techniques, the invention takes into account both the elimination of redundant data within files and the limited capacity of the cache; it effectively eliminates redundant data and reduces both the amount of data transmitted and the cache size.

2. Compared with existing compression techniques, the invention takes into account the elimination of redundant data between similar files, further removing redundancy in the files and reducing the amount of data transmitted.

Description of the drawings

FIG. 1 is a schematic flowchart of a Web request processing method based on data deduplication disclosed in an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another Web request processing method based on data deduplication disclosed in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a Web request processing system based on data deduplication disclosed in an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a delta file generation method disclosed in an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a method for merging a delta file and an old version file into a new version file disclosed in an embodiment of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

The present invention provides a mobile Web request processing method based on data deduplication, which includes the following mechanisms:

1. Mobile Web request data deduplication mechanism

Existing methods for improving Web performance either reduce the amount of data transmitted through compression, or add caches through caching techniques to reduce the number of access requests. However, neither technique effectively improves Web performance when a cached old version file receives only a small update. The invention proposes a mobile Web request data deduplication mechanism: when a client that has cached the old version file requests the content of the new version file from the server, the server, based on the version numbers of the old and new version files, eliminates the redundant data, generates the delta file between the old and new versions, and sends it to the client instead of sending the complete file or a compressed file. This effectively reduces redundant file data, reduces the amount of data transmitted, and improves the performance of Web page requests.

2. Asynchronous deduplication and rendering mechanism

Using the mobile Web request data deduplication mechanism effectively reduces the amount of data transmitted, but the client must merge the delta file with the old version file to restore the new version file, and at the same time the client has to request dynamic data and render the page, which increases the computational load on clients with weak processing power. The invention proposes an asynchronous deduplication and rendering mechanism, in which the server's dynamic data requests and page rendering proceed asynchronously with the client's delta file parsing. This mechanism reduces the number of network requests for dynamic data, balances the load between server and client, and effectively improves overall Web performance.

The present invention provides an embodiment to illustrate how the method is carried out. In this example, the old version file cached by the client is assumed to be chunk1-chunk2-chunk3-chunk4-chunk5 (each chunk stands for a different string obtained by splitting at a fixed length), and the new version file is chunk1-“hdfjkhlf”-chunk2-chunk3-“uygfiuy”-chunk5.

FIG. 1 is a schematic flowchart of a mobile Web request processing method based on data deduplication disclosed in an embodiment of the present invention, which includes the following steps:

S1. The client sends a page request to the server, so that the server requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

Moving the page rendering process to the server reduces the computational load on the client when it subsequently parses the delta file and restores the new version.

S5. The client receives the delta file sent by the server and, using the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the delta file with the old version file in the cache to restore the new version file. The delta file parsing process and the server-side rendering process proceed asynchronously, and processing ends when parsing is complete.

FIG. 2 is a schematic flowchart of another mobile Web request processing method based on data deduplication disclosed in an embodiment of the present invention, which includes the following steps:

An embodiment of the present invention further provides a client, comprising:

An embodiment of the present invention further provides a server, comprising:

FIG. 3 is a schematic structural diagram of a mobile Web request processing system based on data deduplication disclosed in an embodiment of the present invention. The system shown in FIG. 3 includes a client and a server, wherein:

Client: caches the old version file; it issues Web requests to the server and subsequently requests and parses the delta files between different file versions.

Server: contains a delta file generator and a page parser, and handles Web requests from the client. The delta file generator generates the delta files between different file versions offline when a file is updated; the page parser requests dynamic data and renders the page.
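One way to picture "generated offline when a file is updated" is a server-side store that does the diffing at publish time and only a lookup at request time. The sketch below is an assumption about how such a store could be organized, not something the patent describes: the class `DeltaStore`, its methods, the decision to keep every earlier version, and the injected `make_delta` callable (any function that turns old and new content into a delta) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

Delta = List[Union[int, str]]
DeltaFn = Callable[[str, str], Delta]      # (old content, new content) -> delta


class DeltaStore:
    """Hypothetical server-side store of precomputed delta files."""

    def __init__(self, make_delta: DeltaFn) -> None:
        self._make_delta = make_delta
        # file name -> list of (version, content), oldest first
        self._history: Dict[str, List[Tuple[str, str]]] = {}
        # (file name, client's cached version) -> delta to the latest version
        self._deltas: Dict[Tuple[str, str], Delta] = {}

    def publish(self, name: str, version: str, content: str) -> None:
        """Run when a file is updated: rebuild deltas offline."""
        history = self._history.setdefault(name, [])
        for old_version, old_content in history:
            self._deltas[(name, old_version)] = self._make_delta(old_content, content)
        history.append((version, content))

    def get(self, name: str, from_version: str) -> Delta:
        """Run at request time: a dictionary lookup only, no diffing."""
        return self._deltas[(name, from_version)]


# Stand-in delta function that ships the whole new file as new data.
store = DeltaStore(lambda old, new: [new])
store.publish("app.js", "1", "old content")
store.publish("app.js", "2", "new content")
print(store.get("app.js", "1"))
```

Keeping the expensive comparison out of the request path is the point of the offline step; how many old versions a real server would retain is a separate trade-off that the patent leaves open.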

FIG. 4 is a schematic flowchart of a method for generating a delta file disclosed in an embodiment of the present invention, including the following operations:

T1. According to the configuration, divide the old version file into fixed-length blocks, calculate the hash checksum of each block to serve as the block identifier, and number each block. In this embodiment, the file is divided into five consecutive blocks, chunk1, chunk2, chunk3, chunk4 and chunk5, with block numbers 1, 2, 3, 4 and 5 respectively;

T2. Perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if it matches, record the corresponding block number and advance the index by one block length; otherwise, push the data into the new-data buffer and advance the index by one character. In this embodiment, the rolling search finds that the content of chunk1 matches the old version, so 1 is recorded in the delta file; "hdfjkhlf" does not match any content of the old version, so it is recorded in the delta file as is; likewise, 2, 3, "uygfiuy" and 5 are written to the delta file in turn;

T3. The resulting delta file content is {1, "hdfjkhlf", 2, 3, "uygfiuy", 5}.
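Operations T1 to T3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `generate_delta`, the choice of MD5 as the hash checksum, the block size of 4 characters and the sample file contents are all hypothetical, and the hash of each window is simply recomputed at every position rather than updated incrementally. Hash collisions are not checked.

```python
import hashlib


def _block_hash(text: str) -> str:
    # MD5 is an assumption; the description only says "hash checksum".
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def generate_delta(old: str, new: str, block_size: int) -> list:
    """Return a delta list of block numbers (int) and new data (str)."""
    # T1: split the old file into fixed-length blocks numbered from 1
    # and index each block number by the block's hash.
    block_ids = {}
    for number, start in enumerate(range(0, len(old), block_size), start=1):
        block_ids.setdefault(_block_hash(old[start:start + block_size]), number)

    delta = []
    buffer = []  # new-data buffer for characters with no matching block
    i = 0
    # T2: rolling search over the new file.
    while i < len(new):
        window = new[i:i + block_size]
        number = block_ids.get(_block_hash(window))
        if number is not None:
            if buffer:  # flush pending new data before the block number
                delta.append("".join(buffer))
                buffer.clear()
            delta.append(number)
            i += len(window)  # advance past the matched block
        else:
            buffer.append(new[i])
            i += 1  # advance by one character
    if buffer:
        delta.append("".join(buffer))
    return delta  # T3: the finished delta


old_file = "AAAABBBBCCCCDDDDEEEE"
new_file = "AAAA" + "hdfjkhlf" + "BBBBCCCC" + "uygfiuy" + "EEEE"
delta = generate_delta(old_file, new_file, block_size=4)
print(delta)  # [1, 'hdfjkhlf', 2, 3, 'uygfiuy', 5]
```

With these sample inputs the sketch reproduces the delta of the embodiment: unchanged blocks are replaced by their block numbers and only the inserted strings are carried as literal data.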

FIG. 5 is a schematic flowchart of a method, disclosed in an embodiment of the present invention, for merging a delta file with the old version file to obtain the new version file, including the following operations:

S51. Use the variable strResult to represent the computed new file, and initialize it to the empty string "";

S52. Scan the delta file array s = {1, "hdfjkhlf", 2, 3, "uygfiuy", 5};

S53. If the current array item is a string, it is newly added data and is appended directly to strResult; if it is not a string, it is a block number recording unchanged content: the start and end positions of this content in the original file are calculated from the block size and the description rule, the actual content corresponding to the description rule is obtained from the original file by a substring method, and that content is appended to strResult. In this embodiment, the scan finds that s[0] = 1 is not a string, so the corresponding block is located in the old version file and the content of chunk1 is written to strResult; s[1] = "hdfjkhlf" is a string and is appended directly to strResult; likewise, chunk2, chunk3, "uygfiuy" and chunk5 are appended to strResult in turn;

S54. The new version file is chunk1-"hdfjkhlf"-chunk2-chunk3-"uygfiuy"-chunk5.
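Operations S51 to S54 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the invention: the function name `merge_delta`, the block size of 4 characters and the sample file contents are hypothetical, and the "description rule" is assumed to mean that block number n covers the character offsets from (n - 1) × block size up to n × block size in the old version file.

```python
def merge_delta(old: str, delta: list, block_size: int) -> str:
    """Rebuild the new version file from the cached old file and a delta."""
    str_result = ""  # S51: start from the empty string
    for item in delta:  # S52: scan the delta array
        if isinstance(item, str):
            # S53: a string is newly added data, appended as is.
            str_result += item
        else:
            # S53: a number is the 1-based block number of unchanged
            # content; compute its span in the old file (assumed rule).
            start = (item - 1) * block_size
            end = start + block_size
            str_result += old[start:end]
    return str_result  # S54: the restored new version file


old_file = "AAAABBBBCCCCDDDDEEEE"
delta = [1, "hdfjkhlf", 2, 3, "uygfiuy", 5]
restored = merge_delta(old_file, delta, block_size=4)
print(restored)  # AAAAhdfjkhlfBBBBCCCCuygfiuyEEEE
```

Because Python slicing clamps at the end of the string, a final block shorter than the block size is copied correctly without extra bounds checks.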

Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A mobile Web request processing method based on data deduplication, characterized by comprising:

S1. The client sends a page request to the server, so that the server requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders a page file, wherein the page request includes the page file; moving the page rendering process to the server reduces the client's subsequent computational load for delta file parsing and restoration;

S2. The client receives the rendered page file sent by the server;

S3. The client extracts the file version number information from the received page file and compares it with the version number of the old version file in the cache; if the file has been updated, step S4 is performed; if the file has not been updated, the old version file in the cache is used directly;

S4. The client sends a delta file request to the server, the delta file request being used to request a delta file, wherein the delta file is generated offline by the server when the file is updated, and the delta file includes the newly added data and the block numbers corresponding to the unchanged data content;

S5. The client receives the delta file sent by the server and, according to the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the delta file with the old version file in the cache to restore the new version file; the delta file parsing process is asynchronous with the server-side rendering process.

2. The method according to claim 1, wherein the step S5 comprises the following steps:

S51. Initialize an empty string variable strResult to represent the computed new file;

S52. Scan the array in the delta file;

S53. If the currently scanned array item is a string, it is newly added data, and the newly added data is appended to strResult; if the currently scanned array item is not a string, it is a block number recording unchanged data content: the start and end positions, in the old version file, of the content corresponding to the block number are calculated from the block size and the description rule corresponding to the block number, the content corresponding to the description rule is obtained from the old version file by a substring method, and that content is appended to strResult;

S54. Determine whether the scan has finished; if so, execute step S55; otherwise return to step S52;

S55. Return the variable strResult as the new version file, and end.

3. A mobile Web request processing method based on data deduplication, characterized by comprising:

A1. The server receives the page request sent by the client, requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

A2. The server sends the rendered page file to the client, so that the client extracts the file version number information from the received page file, compares it with the version number of the old version file in the cache, and sends a delta file request to the server when the file has been updated, wherein the delta file request is used to request a delta file, the delta file is generated offline by the server when the file is updated, and the delta file includes the newly added data and the block numbers corresponding to the unchanged data content;

A3. The server receives the delta file request from the client and sends the delta file to the client, so that the client, according to the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the received delta file with the old version file in the cache to restore the new version file;

The steps of generating the delta file are:

T1. Divide the old version file into fixed-length blocks, calculate the hash checksum of each block to serve as the block identifier, and number each block;

T2. Perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if so, record the corresponding block number and advance the index by one block length; otherwise, push the data into the new-data buffer and advance the index by one character;

T3. Determine whether the rolling search over the content of the new version file has finished; if so, execute step T4; otherwise return to step T2;

T4. Combine the recorded block numbers with the new data in the new-data buffer to generate the delta file.

4. A client, characterized by comprising:

a first sending module, configured to send a page request to the server, so that the server requests dynamic data from the Common Gateway Interface (CGI), fills it in, and renders the page file, wherein the page request includes the page file;

a receiving module, configured to receive the rendered page file sent by the server;

a comparison and judgment module, configured to extract the file version number information from the received page file, compare it with the version number of the old version file in the cache, and determine whether the file has been updated;

a second sending module, configured to send a delta file request to the server when the file has been updated, the delta file request being used to request a delta file, wherein the delta file is generated offline by the server when the file is updated, and the delta file includes the newly added data and the block numbers corresponding to the unchanged data content;

an update module, configured to receive the delta file sent by the server and, according to the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merge the delta file with the old version file in the cache to restore the new version file.

5. The client according to claim 4, characterized in that the update module comprises:

an initialization module, configured to initialize an empty string variable strResult to represent the computed new file;

a scanning module, configured to scan the array in the delta file;

a scan processing module, configured to append the newly added data to strResult when the currently scanned array item is a string; and, when the currently scanned array item is not a string, which indicates a block number recording unchanged data content, to calculate the start and end positions, in the old version file, of the content corresponding to the block number from the block size and the description rule corresponding to the block number, obtain the content corresponding to the description rule from the old version file by a substring method, and append that content to strResult;

a judgment processing module, configured to return the variable strResult as the new version file when the scan has finished, and to return to the operation of the scanning module when the scan has not finished.

6. A server, characterized by comprising:

a first receiving module, configured to receive the page request sent by the client, request dynamic data from the Common Gateway Interface (CGI), fill it in, and render the page file, wherein the page request includes the page file;

a first sending module, configured to send the rendered page file to the client, so that the client extracts the file version number information from the received page file, compares it with the version number of the old version file in the cache, and sends a delta file request to the server when the file has been updated, wherein the delta file request is used to request a delta file;

a delta file generation module, configured to generate the delta file offline when the file is updated, the delta file including the newly added data and the block numbers corresponding to the unchanged data content;

a second receiving module, configured to receive the delta file request from the client;

a second sending module, configured to send the delta file to the client, so that the client, according to the newly added data and the block numbers corresponding to the unchanged data content in the delta file, merges the received delta file with the old version file in the cache to restore the new version file;

The delta file generation module comprises:

a blocking module, configured to divide the old version file into fixed-length blocks, calculate the hash checksum of each block to serve as the block identifier, and number each block;

a search processing module, configured to perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if so, to record the corresponding block number and advance the index by one block length; otherwise, to push the data into the new-data buffer and advance the index by one character;

a judgment processing module, configured to combine the recorded block numbers with the new data in the new-data buffer to generate the delta file when the rolling search over the content of the new version file has finished, and to return to the operation of the search processing module when the rolling search has not finished.

7. A mobile Web request processing system based on data deduplication, characterized by comprising a client and a server, wherein:

The client is configured to send a page request to the server, wherein the page request includes a page file;

The server is configured to receive the page request, request dynamic data from the Common Gateway Interface (CGI), fill it in, and render the page file, and to send the rendered page file to the client;

The client is further configured to extract the file version number information from the received page file and compare it with the version number of the old version file in the cache; if the file has been updated, to send a delta file request to the server, the delta file request being used to request a delta file that is generated offline by the server when the file is updated; and if the file has not been updated, to use the old version file in the cache directly;

The server is further configured to receive the delta file request from the client and send the delta file to the client, the delta file including the newly added data and the block numbers corresponding to the unchanged data content;

The client is further configured to merge the received delta file with the old version file in the cache, according to the newly added data and the block numbers corresponding to the unchanged data content in the delta file, to restore the new version file.

8. The system of claim 7, wherein:

The server is further configured to divide the old version file into fixed-length blocks, calculate the hash checksum of each block to serve as the block identifier, and number each block; to perform a rolling search in the new version file, checking in turn whether the hash checksum of the block content at the current position in the new version file matches a block identifier of the old version file; if so, to record the corresponding block number and advance the index by one block length; otherwise, to push the data into the new-data buffer and advance the index by one character; and to determine whether the rolling search over the content of the new version file has finished, and if so, to combine the recorded block numbers with the new data in the new-data buffer to generate the delta file, and otherwise to continue the rolling search in the new version file;

The client is further configured to initialize an empty string variable strResult to represent the computed new file; to scan the array in the delta file; if the currently scanned array item is a string, which indicates newly added data, to append the newly added data to strResult; if the currently scanned array item is not a string, which indicates a block number recording unchanged data content, to calculate the start and end positions, in the old version file, of the content corresponding to the block number from the block size and the description rule corresponding to the block number, obtain the content corresponding to the description rule from the old version file by a substring method, and append that content to strResult; and to determine whether the scan has finished, and if so, to return the variable strResult as the new version file, and otherwise to continue scanning the array in the delta file.