CN114879909A - Data storage method and device, electronic equipment and storage medium

Info

Publication number: CN114879909A
Application number: CN202210493066.3A
Authority: CN
Inventors: 池信泽; 吴素宏; 张旭明; 王豪迈; 胥昕
Original assignee: Beijing Xingchen Tianhe Technology Co ltd
Current assignee: Beijing Xingchen Tianhe Technology Co ltd
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2022-08-09
Anticipated expiration: 2042-05-07
Also published as: CN114879909B

Abstract

The invention discloses a data storage method and device, an electronic device, and a storage medium. The method comprises: receiving a data processing request initiated by a client, wherein the data processing request carries at least service data to be processed; storing the service data into a plurality of cache blocks, and identifying the cache blocks to obtain a plurality of identified cache blocks; when the number of identified cache blocks is greater than a preset number threshold, combining the identified cache blocks to obtain a write-back request; and writing the write-back request into the respective segmented data spaces in the data layer, and determining a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces. The invention solves the technical problem that random write-back requests cannot be handled effectively when data is written back from the cache layer to the data layer.

Description

Data storage method and device, electronic device, storage medium

Technical Field

The present invention relates to the technical field of data storage, and in particular to a data storage method and device, an electronic device, and a storage medium.

Background

In the related art, a data storage system generally uses several kinds of storage media (for example, hard disk drives (HDDs) and SATA solid-state drives (SSDs)), or hybrid media, as a cost-effective solution. Such a system is usually divided into a cache layer (or cache disk) and a data layer (or data disk). The cache layer generally consists of high-speed disks (such as NVMe SSDs or ordinary SSDs), while the data layer consists of relatively slow disks (for example, ordinary SSDs, HDDs, or QLC SSDs), and the capacity of the cache disk is smaller than that of the data disk. A request from the client is first written to the cache disk and can then be acknowledged immediately; the data is later written gradually to the data disk through a write-back mechanism. Because write-back runs in the background, it does not interfere with front-end I/O.

In storing and writing back data in such a system, the core component is the cache algorithm, and the core of its write-back mechanism is the mapping between blocks of the cache disk and blocks of the data disk. The usual mechanism divides the cache disk into a number of blocks (the block size may be 4 KB, 8 KB, and so on) and divides the data disk into a corresponding number of blocks. When a front-end request needs to write Block A of the data disk, the cache algorithm selects a Block A' on the SSD, first writes the data into Block A', and then marks Block A' as dirty. At this point the mapping between cache-disk Block A' and data-disk Block A is established.

The cache algorithm described above maps blocks of the high-speed cache disk one-to-one to blocks of the low-speed data disk. However, this process does not reorganize the requests coming from the front end: if the front-end I/O is random, the write-back will be random as well. Random write-back puts heavy pressure on the data disk, especially when the data disk consists of HDDs or QLC SSDs.

No effective solution to the above problem has yet been proposed.

Summary of the Invention

Embodiments of the present invention provide a data storage method and device, an electronic device, and a storage medium, to at least solve the technical problem that random write-back requests cannot be handled effectively when data is written back from the cache layer to the data layer.

According to one aspect of the embodiments of the present invention, a data storage method is provided. The method is applied to a cache layer in a data storage system, where the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks, and comprises: receiving a data processing request initiated by a client, wherein the data processing request carries at least service data to be processed; storing the service data into a plurality of cache blocks, and identifying the plurality of cache blocks to obtain a plurality of identified cache blocks; when the number of identified cache blocks is greater than a preset number threshold, combining the identified cache blocks to obtain a write-back request; and writing the write-back request into the respective segmented data spaces in the data layer, and determining a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces.

Optionally, the step of combining the plurality of identified cache blocks to obtain a write-back request comprises: obtaining the cache interval to which each identified cache block belongs, and determining the interval start identifier of the cache interval; and merging the cache intervals based on the interval start identifiers to obtain the write-back request.

Optionally, after writing the write-back request into the respective segmented data spaces in the data layer, the method further comprises: obtaining interval information of the cache interval and the pre-stored service data; pre-processing the service data to obtain processed service data and pre-processing information; determining the processed service data as segment data and the pre-processing information as segment header information; combining the segment header information and the segment data into segment record information; and appending the segment record information to the segmented data space.
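The header-plus-data record described above can be illustrated with a minimal sketch. The source does not say what the pre-processing is or which fields the header holds, so everything concrete here is an assumption made for illustration: pre-processing is taken to be zlib compression, and the hypothetical header stores the cache offset, the original length, the stored length, and a CRC32 of the stored body.

```python
import struct
import zlib

# Hypothetical header layout (not from the source):
# cache offset (8 bytes), original length, stored length, CRC32 (4 bytes each).
HEADER = struct.Struct("<QIII")

def build_record(cache_off: int, payload: bytes) -> bytes:
    """Pre-process the payload and prepend a segment header."""
    body = zlib.compress(payload)  # assumed form of "pre-processing"
    header = HEADER.pack(cache_off, len(payload), len(body), zlib.crc32(body))
    return header + body

def parse_record(buf: bytes, pos: int = 0):
    """Return (cache_off, payload, next_pos) for the record starting at pos."""
    cache_off, orig_len, body_len, crc = HEADER.unpack_from(buf, pos)
    start = pos + HEADER.size
    body = buf[start:start + body_len]
    if zlib.crc32(body) != crc:
        raise ValueError("corrupt segment record")
    payload = zlib.decompress(body)
    if len(payload) != orig_len:
        raise ValueError("length mismatch in segment record")
    return cache_off, payload, start + body_len

# A bytearray stands in for one segmented data space; records are only appended.
segment = bytearray()
segment += build_record(1 << 20, b"a" * 4096)
segment += build_record((2 << 20) + 8192, b"b" * 4096)
```

Because each record carries its own lengths, a reader can walk the segment from the start by following `next_pos`, which is one reason an append-only layout does not need fixed-size slots.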

Optionally, the step of determining the dynamic data mapping relationship from the identified cache blocks to the segmented data spaces comprises: determining a mapping start position based on the interval start identifier and the interval size of the cache interval to which the identified cache block belongs; obtaining the segment start position of the segmented data space within the overall data space of the data layer; determining, based on the segment start position, the segment-space start position and segment-space size that the cache interval to which the identified cache block belongs occupies in the segmented data space; and establishing a dynamic data mapping relationship in which the mapping start position points to the segment-space start position.
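One way to picture this mapping is as a table from cache intervals to the positions they were appended at in the data layer. The sketch below is an illustration under assumptions, not the patented implementation: the names `Extent` and `DynamicMap`, the dictionary keyed by cache start offset, and the 64 MB segment position are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    start: int  # byte offset
    size: int   # length in bytes

class DynamicMap:
    """Maps a cache interval to wherever it landed in the data layer."""

    def __init__(self):
        self._map = {}  # cache start offset -> (cache Extent, data Extent)

    def bind(self, cache: Extent, segment_start: int, offset_in_segment: int) -> Extent:
        # Segment-space start = segment start in the overall data space
        # plus the interval's offset inside that segment.
        data = Extent(segment_start + offset_in_segment, cache.size)
        self._map[cache.start] = (cache, data)
        return data

    def lookup(self, cache_start: int) -> Extent:
        return self._map[cache_start][1]

KB, MB = 1024, 1024 * 1024
m = DynamicMap()
seg_start = 64 * MB  # hypothetical position of one segment in the data space
m.bind(Extent(1 * MB, 4 * KB), seg_start, 0)
m.bind(Extent(2 * MB + 8 * KB, 4 * KB), seg_start, 4 * KB)
```

The mapping is "dynamic" in the sense that the data-layer position is decided at write-back time rather than fixed in advance by a one-to-one block layout.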

Optionally, after determining the dynamic data mapping relationship from the identified cache blocks to the segmented data spaces, the method further comprises: when a new write-back request is generated, allocating a valid segmented data space for the new write-back request; and writing the new write-back request into the allocated valid segmented data space.

Optionally, the method further comprises: when a new write-back request is generated, determining whether the cache interval used by the new write-back request has been reused; and, if it has been reused, reclaiming the segmented data space that the reused cache interval previously pointed to, and marking the reclaimed segmented data space as invalid.

Optionally, the step of reclaiming the segmented data space that the reused cache interval previously pointed to comprises: when there are multiple segmented data spaces to be reclaimed, analyzing the space usage and the amount of internally stored data of each segmented data space; and reclaiming the segmented data spaces one by one in the order obtained by sorting on space usage and amount of internally stored data.
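The ordering step can be sketched as a sort over candidate segments. The source names the two sort keys (space usage and amount of internally stored data) but not the direction, so the policy below, lowest usage first and then least live data, is an assumption; the `Segment` class and its fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seg_id: int
    capacity: int    # bytes
    live_bytes: int  # bytes still referenced by the mapping

    @property
    def usage(self) -> float:
        return self.live_bytes / self.capacity

def reclaim_order(candidates):
    """Assumed policy: reclaim the emptiest segments first, since they
    free the most space for the least data that has to be preserved."""
    return sorted(candidates, key=lambda s: (s.usage, s.live_bytes))

MB = 1024 * 1024
candidates = [
    Segment(1, 4 * MB, 3 * MB),  # usage 0.75
    Segment(2, 4 * MB, 1 * MB),  # usage 0.25, 1 MB live
    Segment(3, 8 * MB, 2 * MB),  # usage 0.25, 2 MB live
]
order = [s.seg_id for s in reclaim_order(candidates)]
```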

According to another aspect of the embodiments of the present invention, a data storage device is also provided. The device is applied to a cache layer in a data storage system, where the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks, and comprises: a receiving unit, configured to receive a data processing request initiated by a client, wherein the data processing request carries at least service data to be processed; a storage unit, configured to store the service data into a plurality of cache blocks and to identify the plurality of cache blocks to obtain a plurality of identified cache blocks; a combining unit, configured to combine the identified cache blocks to obtain a write-back request when the number of identified cache blocks is greater than a preset number threshold; and a writing unit, configured to write the write-back request into the respective segmented data spaces in the data layer and to determine a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces.

Optionally, the combining unit comprises: a first obtaining module, configured to obtain the cache interval to which each identified cache block belongs and to determine the interval start identifier of the cache interval; and a first merging module, configured to merge the cache intervals based on the interval start identifiers to obtain the write-back request.

Optionally, the data storage device further comprises: a second obtaining module, configured to obtain interval information of the cache interval and the pre-stored service data after the write-back request has been written into the respective segmented data spaces in the data layer; a pre-processing module, configured to pre-process the service data to obtain processed service data and pre-processing information; a first determining module, configured to determine the processed service data as segment data and the pre-processing information as segment header information; a second merging module, configured to combine the segment header information and the segment data into segment record information; and a first writing module, configured to append the segment record information to the segmented data space.

Optionally, the writing unit comprises: a first determining module, configured to determine a mapping start position based on the interval start identifier and the interval size of the cache interval to which the identified cache block belongs; a third obtaining module, configured to obtain the segment start position of the segmented data space within the overall data space of the data layer, and to determine, based on the segment start position, the segment-space start position and segment-space size that the cache interval to which the identified cache block belongs occupies in the segmented data space; and a first establishing module, configured to establish a dynamic data mapping relationship in which the mapping start position points to the segment-space start position.

Optionally, the data storage device further comprises: a first allocation module, configured to allocate a valid segmented data space for a new write-back request when the new write-back request is generated after the dynamic data mapping relationship from the identified cache blocks to the segmented data spaces has been determined; and a second writing module, configured to write the new write-back request into the allocated valid segmented data space.

Optionally, the data storage device further comprises: a first judging module, configured to determine, when a new write-back request is generated, whether the cache interval used by the new write-back request has been reused; and a reclaiming module, configured to, when the cache interval used by the new write-back request has been reused, reclaim the segmented data space that the reused cache interval previously pointed to and mark the reclaimed segmented data space as invalid.

Optionally, the reclaiming module comprises: an analysis submodule, configured to analyze the space usage and the amount of internally stored data of each segmented data space when there are multiple segmented data spaces to be reclaimed; and a reclaiming submodule, configured to reclaim the segmented data spaces one by one in the order obtained by sorting on space usage and amount of internally stored data.

According to another aspect of the embodiments of the present invention, an electronic device is also provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the data storage method described in any one of the above by executing the executable instructions.

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to perform the data storage method described in any one of the above.

In the embodiments of the present invention, a data processing request initiated by a client is received, wherein the data processing request carries at least service data to be processed; the service data is stored into a plurality of cache blocks, and the cache blocks are identified to obtain a plurality of identified cache blocks; when the number of identified cache blocks is greater than a preset number threshold, the identified cache blocks are combined to obtain a write-back request; and the write-back request is written into the respective segmented data spaces in the data layer, and a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces is determined. In this embodiment, a data reorganization technique converts random small-block writes from the front end into sequential large-block write-back requests. This copes with random write-back in different situations, increases the access speed of hybrid storage media, organizes data quickly, and improves data processing efficiency, thereby solving the technical problem that random write-back requests cannot be handled effectively when data is written back from the cache layer to the data layer.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the present invention and form a part of the present application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of an optional data storage method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an optional data storage device according to an embodiment of the present invention;

FIG. 3 is a block diagram of the hardware structure of an electronic device (or mobile device) for a data storage method according to an embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be practiced in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

To help those skilled in the art understand the present invention, some of the terms used in the embodiments are explained below:

Hard disk drive (HDD): a mechanical disk;

Solid-state disk (SSD): a disk whose storage medium is flash memory;

NVMe SSD: an SSD with an NVMe interface, a high-speed SSD;

QLC (quad-level cell) SSD: a high-density SSD that is cheaper than earlier SSDs but has a relatively short write lifetime.

The present invention can be applied to various data storage systems that use several kinds of persistent devices. The persistent devices may include, but are not limited to, a cache layer and a data layer. The cache layer may include high-speed disks (such as NVMe SSDs or ordinary SSDs), and the data layer includes relatively slow disks (for example, ordinary SSDs, HDDs, or QLC SSDs); the capacity of the cache layer is smaller than that of the data layer.

The present invention uses a hybrid-media management mechanism in the data storage system. The mechanism can manage a single disk or a logical space formed by a group of disks: several cache disks can form one logical cache space, and the relatively slow data disks can form one logical low-speed data space. The same policy then implements write-back from the cache space to the low-speed data space.

For a logical data space formed by multiple groups of disks, under random small I/O, when the cache space runs short and data must be written back to the data layer, the data disks handle random small I/O inefficiently. To address this, the present application introduces a data reorganization technique that converts random small I/Os into sequential large I/Os and so makes full use of the data disks' capacity for large I/O. The reorganization technique also makes it easy to pad data, and the data layer need not be divided into data blocks that correspond one-to-one with those of the cache layer.

In the present application, the written-back data can be arranged to satisfy the storage conditions of the disk storage medium, which improves the processing capability of the disks. For example, if the resource pool uses erasure coding (EC) as its redundancy policy, the data can be padded so that it aligns with the EC stripe; similarly, SSD write behavior can be optimized by aligning to the erase-block size of the SSDs in the data pool.
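The padding idea can be shown in a few lines. This is a generic alignment sketch, not the patent's specific procedure: the zero fill byte and the 4096-byte alignment used in the example are assumptions, and in practice the alignment would be the EC stripe size or the SSD erase-block size of the actual pool.

```python
def pad_to_alignment(payload: bytes, align: int) -> bytes:
    """Pad payload with zero bytes up to the next multiple of align."""
    if align <= 0:
        raise ValueError("align must be positive")
    pad = (-len(payload)) % align  # 0 when the payload is already aligned
    return payload + b"\x00" * pad

# Hypothetical alignment of 4096 bytes.
padded = pad_to_alignment(b"x" * 5000, 4096)
```

Padding is only practical because the write-back request is assembled by the cache layer; with a fixed one-to-one block layout there would be no spare room to fill.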

The present invention is described below with reference to the embodiments.

Embodiment 1

According to an embodiment of the present invention, an embodiment of a data storage method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

This embodiment provides a data storage method applied to a cache layer in a data storage system, where the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks.

FIG. 1 is a flowchart of an optional data storage method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102: receive a data processing request initiated by a client, wherein the data processing request carries at least service data to be processed;

Step S104: store the service data into a plurality of cache blocks, and identify the cache blocks to obtain a plurality of identified cache blocks;

Step S106: when the number of identified cache blocks is greater than a preset number threshold, combine the identified cache blocks to obtain a write-back request;

Step S108: write the write-back request into the respective segmented data spaces in the data layer, and determine a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces.

Through the above steps, a data processing request initiated by a client is received, wherein the data processing request carries at least service data to be processed; the service data is stored into a plurality of cache blocks, and the cache blocks are identified to obtain a plurality of identified cache blocks; when the number of identified cache blocks is greater than a preset number threshold, the identified cache blocks are combined to obtain a write-back request; and the write-back request is written into the respective segmented data spaces in the data layer, and a dynamic data mapping relationship from the identified cache blocks to the segmented data spaces is determined. In this embodiment, a data reorganization technique converts random small-block writes from the front end into sequential large-block write-back requests. This handles different kinds of write-back requests, organizes data quickly, and improves data processing efficiency, thereby solving the technical problem that random write-back requests cannot be handled effectively when data is written back from the cache layer to the data layer.

This embodiment introduces a data reorganization technique. For the cache algorithm used by the data storage system, when the mapping between cache-layer blocks and data-layer blocks is established, blocks of the high-speed cache layer need not correspond one-to-one to blocks of the low-speed data layer. Using a log-append write technique, random small-block writes from the client (also called the front end) are converted into sequential large-block write requests, and sequential writing makes effective use of the data layer's capacity for sequential large-block writes.
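The effect of log-append writing can be seen in a minimal sketch: dirty data that sits at scattered cache offsets is appended to the tail of one segment, so the data layer sees a single sequential write. The class `LogAppendSegment`, the function `write_back`, and the in-memory `bytearray` standing in for a segment are hypothetical illustration names, not the patent's implementation.

```python
KB, MB = 1024, 1024 * 1024

class LogAppendSegment:
    """One segmented data space that only ever grows at its tail."""

    def __init__(self, start: int, capacity: int):
        self.start = start        # position of the segment in the data space
        self.capacity = capacity
        self.buf = bytearray()

    def append(self, data: bytes) -> int:
        """Append data and return the data-layer offset it was written at."""
        if len(self.buf) + len(data) > self.capacity:
            raise ValueError("segment full")
        pos = self.start + len(self.buf)
        self.buf += data
        return pos

def write_back(dirty: dict, segment: LogAppendSegment) -> dict:
    """dirty maps cache offset -> bytes; returns cache offset -> data offset."""
    mapping = {}
    for cache_off in sorted(dirty):
        mapping[cache_off] = segment.append(dirty[cache_off])
    return mapping

# Three small writes at scattered cache offsets.
dirty = {
    3 * MB + 16 * KB: b"c" * (16 * KB),
    1 * MB: b"a" * (4 * KB),
    2 * MB + 8 * KB: b"b" * (4 * KB),
}
seg = LogAppendSegment(start=0, capacity=1 * MB)
mapping = write_back(dirty, seg)
```

The cache offsets are megabytes apart, while the resulting data-layer offsets are back to back, which is why a mapping from cache intervals to segment positions has to be recorded.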

The embodiments of the present invention are described in detail below with reference to the above steps.

Step S102: receive a data processing request initiated by a client, wherein the data processing request carries at least service data to be processed.

In this embodiment, service write I/O from the client (front end) is sent to the cache layer (cache disk, cache space) through the data processing request. The cache layer writes data quickly and can organize cached data promptly. All service write I/O from the client (front end) can first be written into the cache space of the cache layer.

Step S104: store the service data into a plurality of cache blocks, and identify the cache blocks to obtain a plurality of identified cache blocks.

In this embodiment, the cache space in the cache layer may include at least one high-speed storage disk. The cache space is divided in advance into multiple cache blocks (Blocks) and is managed at Block granularity; the entire cache space can be divided into N Blocks. Optionally, the cache blocks in this embodiment are sized according to the amount of data written by each request, so the block sizes are not uniform (for example, 4K, 8K, 16K, or 128K).

Each newly written piece of service data is allocated the corresponding number of Blocks. After the data has been written into these Blocks, the Blocks are marked to obtain the identified cache blocks, for example by marking them as dirty Blocks. Once the Blocks are marked, a write-success message can be returned to the client (front end).

As the client keeps writing data, the Blocks in the cache layer are gradually consumed. Once consumption reaches a certain level, the data in the cache disk must be written back to the data layer to free enough space for new requests.

Step S106: when the number of identified cache blocks is greater than a preset number threshold, combine the identified cache blocks to obtain a write-back request.

That is, as cache blocks are continually allocated in the cache space and filled with data, the number of identified cache blocks (dirty Blocks) grows and more of the cache space is consumed. When a trigger condition of the preset write-back policy is met (for example, the number of cache blocks reaches a threshold, the caching duration reaches a preset duration threshold, or the usage of the cache space reaches a usage threshold), the identified cache blocks are pieced together into a large write-back request that is ready to be written into the data space of the data layer.
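The three example trigger conditions can be combined into a small policy check, sketched below. The class name `WritebackPolicy`, its field names, and the default values are hypothetical; the source lists the kinds of thresholds but gives no concrete numbers or comparison rules beyond the count being greater than its threshold.

```python
import time
from dataclasses import dataclass

@dataclass
class WritebackPolicy:
    max_dirty_blocks: int = 1024   # assumed default
    max_age_seconds: float = 30.0  # assumed default
    max_usage: float = 0.8         # assumed default, fraction of cache space

    def should_flush(self, dirty_blocks, oldest_dirty_ts,
                     used_bytes, capacity_bytes, now=None) -> bool:
        """Return True if any one of the trigger conditions holds."""
        now = time.monotonic() if now is None else now
        if dirty_blocks > self.max_dirty_blocks:
            return True
        # Age only matters while there is at least one dirty block.
        if dirty_blocks and now - oldest_dirty_ts >= self.max_age_seconds:
            return True
        return used_bytes / capacity_bytes >= self.max_usage

policy = WritebackPolicy(max_dirty_blocks=4, max_age_seconds=10.0, max_usage=0.8)
```

Passing `now` explicitly keeps the check deterministic in tests; a real cache would normally leave it unset and rely on the monotonic clock.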

In this embodiment, the data space of the data layer in the data storage system is divided in advance into logical segmented data spaces (Segments). The Segment size is configurable and is generally larger than 1 MB; each Segment corresponds to one section of physical space in the data space.

Optionally, the step of combining the identified cache blocks to obtain the write-back request includes: obtaining the cache interval to which each identified cache block belongs, and determining the interval start identifier of the cache interval; and merging the cache intervals based on the interval start identifiers to obtain the write-back request.

In the cache space, the size of the data written each time varies, and so does the number of data blocks it is divided into. Cache intervals of identified cache blocks are therefore set up for one or more data processing requests. As a result, the size of the resulting write-back request also varies, and the data in a write-back request is not necessarily contiguous: it contains several cache intervals, denoted A(off,len), B(off,len), C(off,len) and so on, and these intervals are merged (Merge). For example, cache interval A on the SSD has a size of 4K, covers 0K to 4K, and starts at the 1M position of the cache space, so it spans 1M+0K to 1M+4K and its interval start identifier is 1M. Cache interval B on the SSD has a size of 4K, covers 8K to 12K, and starts at the 2M position of the cache space, so it spans 2M+8K to 2M+12K and its interval start identifier is 2M+8K. Cache interval C on the SSD has a size of 16K, covers 16K to 32K, and starts at the 3M position of the cache space, so it spans 3M+16K to 3M+32K and its interval start identifier is 3M+16K.
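One possible reading of the merge step is sketched below using the intervals A, B, and C from the example. The embodiment does not define exactly how intervals are merged; this sketch assumes that intervals are sorted by their start identifier and that touching or overlapping intervals are coalesced. The function name `merge_intervals` and the extra interval `D` are hypothetical.

```python
KB, MB = 1024, 1024 * 1024


def merge_intervals(intervals):
    """Sort (off, len) intervals by start offset and coalesce touching or overlapping ones."""
    merged = []
    for off, length in sorted(intervals):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            end = max(last_off + last_len, off + length)
            merged[-1] = (last_off, end - last_off)
        else:
            merged.append((off, length))
    return merged


A = (1 * MB, 4 * KB)             # spans 1M+0K to 1M+4K
B = (2 * MB + 8 * KB, 4 * KB)    # spans 2M+8K to 2M+12K
C = (3 * MB + 16 * KB, 16 * KB)  # spans 3M+16K to 3M+32K
D = (1 * MB + 4 * KB, 4 * KB)    # hypothetical interval directly after A

# The merged, ordered interval list stands in for one write-back request.
request = merge_intervals([C, A, D, B])
```

Here A and D collapse into a single 8K interval starting at 1M, while B and C stay separate because they are not adjacent to any other interval.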

In this embodiment, the choice of cache intervals during write-back is configurable. Intervals can also be selected according to a particular characteristic in order to reduce the cost of later persisting the data dynamic mapping of the mixed media. For example, the present application can select the corresponding cache space according to the characteristics of persistence, so that subsequent persistence only needs to update a small amount of data.

In another optional implementation, after the write-back request is written into the segmented data spaces in the data layer, the method further includes: obtaining the interval information of the cache intervals and the pre-stored service data; preprocessing the service data to obtain processed service data and preprocessing information; determining the processed service data as segment data and the preprocessing information as segment header information; combining the segment header information and the segment data into segment record information; and appending the segment record information to the segmented data space.

The amount of data written back at one time can be large. Each cache interval forms one piece of segment record information (SegmentEntry). An Entry records the information and the data of the cache interval, referred to respectively as the segment header information (EntryHeader) and the segment data (EntryData). The data can be processed as needed (for example, checksumming or compression), and the resulting extra information can easily be added to the EntryHeader; the Entry is then written to the segmented data space (Segment) described above by appending.
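A minimal sketch of a SegmentEntry that is appended to a Segment follows. The header layout (cache offset, data length, CRC32 checksum), the helper names `build_entry` and `read_entry`, and the in-memory `Segment` class are assumptions for illustration; the embodiment only states that an EntryHeader and EntryData exist and that processing results such as a checksum can be placed in the header.

```python
import struct
import zlib

# Assumed EntryHeader layout: 8-byte cache offset, 4-byte length, 4-byte CRC32.
HEADER = struct.Struct("<QII")


def build_entry(cache_off: int, data: bytes) -> bytes:
    """Build one SegmentEntry: EntryHeader followed by EntryData."""
    return HEADER.pack(cache_off, len(data), zlib.crc32(data)) + data


class Segment:
    """Append-only segmented data space held in memory (hypothetical model)."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()

    def append(self, entry: bytes) -> int:
        """Append an entry and return the position at which it was written."""
        if len(self.buf) + len(entry) > self.size:
            raise ValueError("segment full")
        pos = len(self.buf)
        self.buf += entry
        return pos


def read_entry(segment: Segment, pos: int):
    """Read the entry at pos and verify its checksum."""
    cache_off, length, crc = HEADER.unpack_from(segment.buf, pos)
    start = pos + HEADER.size
    data = bytes(segment.buf[start:start + length])
    if zlib.crc32(data) != crc:
        raise ValueError("checksum mismatch")
    return cache_off, data
```

Because entries are only ever appended, the position returned by `append` is enough to locate an entry later, which is the information a cache-to-Segment mapping needs.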

Optionally, the step of determining the data dynamic mapping relationship from the identified cache blocks to the segmented data space includes: determining a mapping start position based on the interval start identifier and the interval size of the cache interval to which the identified cache block belongs; obtaining the segment start position of the segmented data space within the overall data space of the data layer; determining, based on the segment start position, the segment space start position and the segment space size, within the segmented data space, of the cache interval to which the identified cache block belongs; and establishing a data dynamic mapping relationship in which the mapping start position points to the segment space start position.

The data dynamic mapping of the mixed media is the core idea of the whole algorithm: by continuously updating the mapping, data can be written to large sequential regions of the data space. This data dynamic mapping relationship can be called the metadata mapping, and it must be persisted. Because every cache space needs its own mapping, persisting the mappings may introduce extra overhead.

Although each data dynamic mapping is small, persistence must be aligned to at least one logical block of the disk, so updating only one mapping at a time wastes disk resources. In this embodiment, the overhead of the mapping information can be reduced by updating mappings in batches.
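The batching idea can be illustrated as follows: mapping records are buffered and only written out once they fill one logical block. The 4096-byte logical block, the 20-byte record layout, and the class name `MappingLog` are assumptions for this sketch, and a Python list stands in for the metadata disk.

```python
import struct

LOGICAL_BLOCK = 4096            # assumed logical block size of the metadata disk
RECORD = struct.Struct("<QQI")  # assumed record: cache offset, data offset, length
RECORDS_PER_BLOCK = LOGICAL_BLOCK // RECORD.size


class MappingLog:
    """Buffers mapping updates and persists them one logical block at a time."""

    def __init__(self):
        self.pending = []  # packed records not yet persisted
        self.blocks = []   # stands in for logical blocks written to disk

    def update(self, cache_off: int, data_off: int, length: int) -> None:
        """Queue one mapping update; flush when a full block has accumulated."""
        self.pending.append(RECORD.pack(cache_off, data_off, length))
        if len(self.pending) == RECORDS_PER_BLOCK:
            self.flush()

    def flush(self) -> None:
        """Write all pending records as one zero-padded logical block."""
        if not self.pending:
            return
        payload = b"".join(self.pending)
        self.blocks.append(payload.ljust(LOGICAL_BLOCK, b"\0"))
        self.pending.clear()
```

With these assumed sizes, one logical-block write carries 204 mapping updates instead of one, which is the saving the batch update is meant to achieve.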

During write-back, the mapping relationship from the cache space to the segmented data space must be recorded; for example, the mapping from cache space A(off,len) to Segment space SA(off,len) is recorded. This is called the data dynamic mapping of the mixed media.
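The mapping from A(off,len) to SA(off,len) can be sketched as a dictionary keyed by the cache extent. The names `Extent` and `DynamicMap`, and the choice to compute the target offset as the Segment start plus an offset inside the Segment, are assumptions made for this illustration.

```python
from typing import NamedTuple, Optional

KB, MB = 1024, 1024 * 1024


class Extent(NamedTuple):
    """An (off, len) range, used for both cache space and Segment space."""
    off: int
    length: int


class DynamicMap:
    """Hypothetical in-memory form of the data dynamic mapping."""

    def __init__(self):
        self._map = {}  # cache Extent -> data-layer Extent

    def record(self, cache: Extent, segment_start: int, offset_in_segment: int) -> Extent:
        """Point the cache extent at its location inside the given Segment."""
        target = Extent(segment_start + offset_in_segment, cache.length)
        self._map[cache] = target
        return target

    def lookup(self, cache: Extent) -> Optional[Extent]:
        """Return the data-layer extent the cache extent currently points to."""
        return self._map.get(cache)
```

Recording the same cache extent again simply replaces the old target, which is how an overwrite redirects A from SA to a new Segment space.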

In this embodiment, for a resource pool composed of several disks, the write-back process can pad the data so that the written-back data satisfies a given condition under which certain physical characteristics of the data disks are better exploited. For example, if the data space uses EC (erasure coding) as its redundancy policy, the data can be padded so that it aligns with the EC stripe; likewise, SSD write behavior can be optimized by aligning to the erase block size of the SSDs in the data pool.
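Padding to an alignment boundary can be sketched with a single helper. The function name `pad_to_alignment`, the zero fill byte, and the example stripe width (4 data chunks of 64 KiB each) are assumptions; the embodiment does not specify an EC layout.

```python
def pad_to_alignment(data: bytes, alignment: int, fill: bytes = b"\0") -> bytes:
    """Pad data with fill bytes so that its length is a multiple of alignment."""
    remainder = len(data) % alignment
    if remainder == 0:
        return data
    return data + fill * (alignment - remainder)


# Assumed example: an EC layout with 4 data chunks of 64 KiB gives a 256 KiB stripe.
STRIPE = 4 * 64 * 1024
padded = pad_to_alignment(b"x" * 300_000, STRIPE)
```

The same helper applies to SSD erase-block alignment by passing the erase block size as `alignment`.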

As an optional implementation of the present invention, after the data dynamic mapping relationship from the identified cache blocks to the segmented data space is determined, the method further includes: when a new write-back request is generated, allocating a valid segmented data space for the new write-back request; and writing the new write-back request to the allocated valid segmented data space.

Optionally, the present application further includes: when a new write-back request is generated, judging whether the cache interval used by the new write-back request is being reused; and, if it is, reclaiming the segmented data space that the reused cache interval previously pointed to, and marking the reclaimed segmented data space as invalid.

When an overwrite occurs and a new write-back request is formed, the same cache space A(off,len) is mapped to a different Segment space SB(off,len). The old space SA(off,len) is then marked as invalid and needs to be garbage collected. In the present application, a bitmap can be used to mark the valid and invalid regions of the segmented data space; the bitmap reduces the movement of invalid data in the subsequent garbage collection stage.
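A per-Segment bitmap of this kind can be sketched as below. The 4 KiB tracking unit, the convention that a set bit means valid, and the class name `SegmentBitmap` are assumptions for illustration only.

```python
class SegmentBitmap:
    """One bit per fixed-size unit of a Segment: 1 = valid, 0 = invalid (garbage)."""

    def __init__(self, segment_size: int, unit: int = 4096):
        self.unit = unit  # assumed tracking granularity
        self.nbits = segment_size // unit
        self.bits = bytearray((self.nbits + 7) // 8)

    def _units(self, off: int, length: int) -> range:
        """Indices of all units touched by the byte range [off, off + length)."""
        return range(off // self.unit, (off + length - 1) // self.unit + 1)

    def mark_valid(self, off: int, length: int) -> None:
        for i in self._units(off, length):
            self.bits[i // 8] |= 1 << (i % 8)

    def mark_invalid(self, off: int, length: int) -> None:
        for i in self._units(off, length):
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def is_valid(self, i: int) -> bool:
        return bool((self.bits[i // 8] >> (i % 8)) & 1)

    def valid_units(self) -> int:
        """Number of units that still hold live data."""
        return sum(bin(b).count("1") for b in self.bits)
```

On an overwrite, the writer would call `mark_invalid` for the old space SA and `mark_valid` for the new space SB; garbage collection then only needs to copy units whose bit is still set.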

Optionally, the step of reclaiming the segmented data space that the reused cache interval previously pointed to includes: when there are multiple segmented data spaces to be reclaimed, analyzing the space usage rate and the amount of stored data of each segmented data space; and reclaiming the segmented data spaces one by one in the order obtained by sorting on space usage rate and amount of stored data.

In this embodiment, the garbage collection algorithm is started according to how much of the data disk capacity is in use; when capacity usage is low, data spaces with a large amount of garbage can be reclaimed first.
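One way to order the Segments awaiting reclamation is sketched below. The embodiment only says that Segments are sorted by space usage rate and amount of stored data; the specific key used here (highest garbage ratio first, then least valid data to copy) and the names `SegmentStats` and `reclaim_order` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-Segment statistics used to pick reclamation candidates."""
    seg_id: int
    size: int         # total Segment size in bytes
    valid_bytes: int  # bytes still referenced by the dynamic mapping

    @property
    def garbage_ratio(self) -> float:
        return 1 - self.valid_bytes / self.size


def reclaim_order(segments):
    """Order Segments by garbage ratio (descending), then by valid bytes (ascending)."""
    return sorted(segments, key=lambda s: (-s.garbage_ratio, s.valid_bytes))
```

Reclaiming high-garbage Segments first frees the most space per byte of valid data that has to be moved.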

Alternatively, during off-peak business periods, a more aggressive approach can be taken: garbage collection is started earlier or run more intensively so that space is released ahead of time.

Space reclamation is the process of reclaiming invalid segmented data space (Segment space). During reclamation, the valid data is read according to the contents of the segment bitmap, a new Segment is allocated, and the valid data is appended to the new data space. The data dynamic mapping of the mixed media is then modified so that the cache space points to the new data space. After the modification is complete, the old data space is reclaimed and used for subsequent write-back.
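The reclamation steps above (read valid data, append to a new Segment, update the mapping, free the old Segment) can be sketched as one function. The data structures are simplified stand-ins chosen for this illustration: a Segment is a list of `(cache_key, data)` entries, validity is a list of booleans per Segment, and the mapping stores `(segment_id, entry_index)`.

```python
def reclaim_segment(old_id, segments, valid, mapping, free_ids):
    """Move the valid entries of one Segment to a new Segment and free the old one.

    segments: dict of segment id -> list of (cache_key, data) entries
    valid:    dict of segment id -> list of bool, one flag per entry
    mapping:  dict of cache_key -> (segment id, entry index)
    free_ids: list of Segment ids available for allocation
    """
    new_id = free_ids.pop()  # allocate a new Segment
    segments[new_id] = []
    valid[new_id] = []
    for idx, (key, data) in enumerate(segments[old_id]):
        if not valid[old_id][idx]:
            continue  # invalid data is not moved
        segments[new_id].append((key, data))  # append-only write
        valid[new_id].append(True)
        mapping[key] = (new_id, len(segments[new_id]) - 1)  # repoint the cache space
    del segments[old_id], valid[old_id]
    free_ids.append(old_id)  # old Segment becomes available for later write-back
    return new_id
```

Only after every mapping has been repointed is the old Segment returned to the free list, mirroring the order of operations in the paragraph above.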

In this embodiment, hot data is cached on the cache disk and colder data is written back to the data disk, which exploits the high performance and long service life of the cache disk. Write-back uses append-only writes, which suits disk types whose random-write performance is poor but whose large sequential-write performance is good. During appending, it is easy to pad the data and to reprocess it (for example, checksumming and compression); this reprocessing is important for exploiting certain characteristics of the data disk.

The present application is described below with reference to another optional embodiment.

Embodiment 2

This embodiment provides a data storage device applied to a cache layer in a data storage system, where the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks. The data storage device provides a plurality of implementation units, each corresponding to an implementation step of Embodiment 1 above.

FIG. 2 is a schematic diagram of an optional data storage device according to an embodiment of the present invention. As shown in FIG. 2, the device may include a receiving unit 21, a storage unit 23, a combining unit 25, and a writing unit 27, wherein:

The receiving unit 21 is configured to receive a data processing request initiated by a client, wherein the data processing request carries at least: service data to be processed;

The storage unit 23 is configured to store the service data in a plurality of cache blocks, and identify the plurality of cache blocks to obtain a plurality of identified cache blocks;

The combining unit 25 is configured to combine the plurality of identified cache blocks to obtain a write-back request when the number of identified cache blocks is greater than a preset number threshold;

The writing unit 27 is configured to write the write-back request into each segmented data space in the data layer, and determine the data dynamic mapping relationship from the identified cache blocks to the segmented data space.

The above data storage device can receive, through the receiving unit 21, a data processing request initiated by a client, where the request carries at least the service data to be processed; store, through the storage unit 23, the service data in a plurality of cache blocks and identify those cache blocks to obtain a plurality of identified cache blocks; combine, through the combining unit 25, the identified cache blocks into a write-back request when their number is greater than a preset number threshold; and write, through the writing unit 27, the write-back request into the segmented data spaces in the data layer and determine the data dynamic mapping relationship from the identified cache blocks to the segmented data space. In this embodiment, a data re-organization technique converts the random small-block writes from the front end into large sequential write-back requests. This makes it possible to handle different write-back requests, organize data quickly, and improve data processing efficiency, thereby solving the technical problem that random write-back requests cannot be handled effectively when data is written back from the cache layer to the data layer.

Optionally, the combining unit includes: a first obtaining module, configured to obtain the cache interval to which each identified cache block belongs and determine the interval start identifier of the cache interval; and a first merging module, configured to merge the cache intervals based on the interval start identifiers to obtain the write-back request.

Optionally, the data storage device further includes: a second obtaining module, configured to obtain the interval information of the cache intervals and the pre-stored service data after the write-back request is written into the segmented data spaces in the data layer; a preprocessing module, configured to preprocess the service data to obtain processed service data and preprocessing information; a first determining module, configured to determine the processed service data as segment data and the preprocessing information as segment header information; a second merging module, configured to combine the segment header information and the segment data into segment record information; and a first writing module, configured to append the segment record information to the segmented data space.

Optionally, the writing unit includes: a first determining module, configured to determine a mapping start position based on the interval start identifier and the interval size of the cache interval to which the identified cache block belongs; a third obtaining module, configured to obtain the segment start position of the segmented data space within the overall data space of the data layer; determining, based on the segment start position, the segment space start position and the segment space size, within the segmented data space, of the cache interval to which the identified cache block belongs; and a first establishing module, configured to establish a data dynamic mapping relationship in which the mapping start position points to the segment space start position.

Optionally, the data storage device further includes: a first allocation module, configured to allocate a valid segmented data space for a new write-back request when the new write-back request is generated after the data dynamic mapping relationship from the identified cache blocks to the segmented data space has been determined; and a second writing module, configured to write the new write-back request to the allocated valid segmented data space.

Optionally, the data storage device further includes: a first judging module, configured to judge, when a new write-back request is generated, whether the cache interval used by the new write-back request is being reused; and a reclaiming module, configured to, when the cache interval used by the new write-back request is being reused, reclaim the segmented data space that the reused cache interval previously pointed to and mark the reclaimed segmented data space as invalid.

Optionally, the reclaiming module includes: an analysis submodule, configured to analyze the space usage rate and the amount of stored data of each segmented data space when there are multiple segmented data spaces to be reclaimed; and a reclaiming submodule, configured to reclaim the segmented data spaces one by one in the order obtained by sorting on space usage rate and amount of stored data.

The above data storage device may further include a processor and a memory. The receiving unit 21, the storage unit 23, the combining unit 25, the writing unit 27, and so on are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.

The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the write-back request is written into the segmented data spaces in the data layer, and the data dynamic mapping relationship from the identified cache blocks to the segmented data space is determined.

The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

According to another aspect of the embodiments of the present invention, an electronic device is further provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform any one of the above data storage methods by executing the executable instructions.

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium includes a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to perform any one of the above data storage methods.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: receiving a data processing request initiated by a client, wherein the data processing request carries at least: service data to be processed; storing the service data in a plurality of cache blocks, and identifying the plurality of cache blocks to obtain a plurality of identified cache blocks; when the number of identified cache blocks is greater than a preset number threshold, combining the plurality of identified cache blocks to obtain a write-back request; and writing the write-back request into each segmented data space in the data layer, and determining the data dynamic mapping relationship from the identified cache blocks to the segmented data space.

FIG. 3 is a block diagram of the hardware structure of an electronic device (or mobile device) for a data storage method according to an embodiment of the present invention. As shown in FIG. 3, the electronic device may include one or more processors 102 (shown in the figure as 102a, 102b, ..., 102n; a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. In addition, it may include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. A person of ordinary skill in the art will understand that the structure shown in FIG. 3 is only illustrative and does not limit the structure of the above electronic device. For example, the electronic device may include more or fewer components than shown in FIG. 3, or have a configuration different from that shown in FIG. 3.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A data storage method, characterized in that the method is applied to a cache layer in a data storage system, the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks, and the method comprises:

receiving a data processing request initiated by a client, wherein the data processing request carries at least: service data to be processed;

storing the service data in a plurality of cache blocks, and identifying the plurality of cache blocks to obtain a plurality of identified cache blocks;

when the number of the plurality of identified cache blocks is greater than a preset number threshold, combining the plurality of identified cache blocks to obtain a write-back request; and

writing the write-back request into each segmented data space in a data layer, and determining a data dynamic mapping relationship from the identified cache blocks to the segmented data space.

2. The method according to claim 1, wherein the step of combining the plurality of identified cache blocks to obtain a write-back request comprises:

obtaining the cache interval to which each of the identified cache blocks belongs, and determining the interval start identifier of the cache interval; and

merging the cache intervals based on the interval start identifier to obtain the write-back request.

3. The method according to claim 2, wherein after writing the write-back request into each segmented data space in the data layer, the method further comprises:

obtaining the interval information of the cache interval and the pre-stored service data;

preprocessing the service data to obtain processed service data and preprocessing information;

determining the processed service data as segment data, and determining the preprocessing information as segment header information;

combining the segment header information and the segment data into segment record information; and

appending the segment record information to the segmented data space.

4. The method according to claim 2, wherein the step of determining the data dynamic mapping relationship from the identified cache blocks to the segmented data space comprises:

determining a mapping start position based on the interval start identifier and the interval size of the cache interval to which the identified cache block belongs;

obtaining the segment start position of the segmented data space within the overall data space of the data layer;

determining, based on the segment start position, the segment space start position and the segment space size, within the segmented data space, of the cache interval to which the identified cache block belongs; and

establishing a data dynamic mapping relationship in which the mapping start position points to the segment space start position.

5. The method according to claim 2, characterized in that, after determining the data dynamic mapping relationship from the identified cache blocks to the segmented data space, the method further comprises:

when a new write-back request is generated, allocating a valid segmented data space for the new write-back request; and

writing the new write-back request to the allocated valid segmented data space.

6. The method according to claim 5, further comprising:

when a new write-back request is generated, judging whether the cache interval used by the new write-back request is being reused; and

when the cache interval used by the new write-back request is being reused, reclaiming the segmented data space that the reused cache interval previously pointed to, and marking the reclaimed segmented data space as invalid.

7. The method according to claim 6, wherein the step of reclaiming the segmented data space that the reused cache interval previously pointed to comprises:

when there are multiple segmented data spaces to be reclaimed, analyzing the space usage rate and the amount of stored data of the segmented data spaces; and

reclaiming the multiple segmented data spaces to be reclaimed one by one in the order obtained by sorting on the space usage rate and the amount of stored data.

8. A data storage device, characterized in that the device is applied to a cache layer in a data storage system, the cache space corresponding to the cache layer includes a plurality of pre-divided cache blocks, and the device comprises:

a receiving unit, configured to receive a data processing request initiated by a client, wherein the data processing request carries at least: service data to be processed;

a storage unit, configured to store the service data in a plurality of cache blocks, and identify the plurality of cache blocks to obtain a plurality of identified cache blocks;

a combining unit, configured to combine the plurality of identified cache blocks to obtain a write-back request when the number of the plurality of identified cache blocks is greater than a preset number threshold; and

a writing unit, configured to write the write-back request into each segmented data space in a data layer, and determine a data dynamic mapping relationship from the identified cache blocks to the segmented data space.

9. An electronic device, characterized by comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to perform the data storage method of any one of claims 1 to 7 by executing the executable instructions.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to perform the data storage method of any one of claims 1 to 7.