CN118567567A

CN118567567A - Transparent compression method, storage controller and storage system based on partitioned solid state disk

Info

Publication number: CN118567567A
Application number: CN202410729402.9A
Authority: CN
Inventors: 周游; 吴非; 谢长生; 王宇; 孙子斌
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2024-06-06
Filing date: 2024-06-06
Publication date: 2024-08-30

Abstract

The present invention discloses a transparent compression method, a storage controller and a storage system based on a partitioned solid-state disk, which belong to the field of data storage. A compression module is integrated in the firmware of the partitioned solid-state disk, and each super block is subdivided into sub-super blocks, including data sub-super blocks and additional sub-super blocks; subsequent flash memory allocation, writing, erasing and garbage collection are managed in units of sub-super blocks. On this basis, the flash memory is also divided into fixed-size slots, and each compressed logical page is mapped to one slot. If a compressed logical page is larger than the slot, the excess part is truncated and written into an additional sub-super block; the slot size is dynamically adjusted according to the compression rate of the data; and a lightweight garbage collection strategy is adopted. The present invention realizes data compression inside the partitioned solid-state disk, which reduces the amount of data written, improves flash memory write performance and durability, and reduces the host CPU load, while also reducing space waste and cost.

Description

Transparent compression method based on partition solid-state disk, storage controller and storage system

Technical Field

The invention belongs to the field of data storage, and in particular relates to a transparent compression method, a storage controller and a storage system based on a partition solid-state disk.

Background

With the rapid development of information technology, the data volume has shown an explosive growth, and the demand for storage devices has also increased. Solid-state disks gradually replace the traditional mechanical hard disk because of the advantages of quick reading and writing, low power consumption, vibration resistance and the like. However, the cost of solid state disks is still higher than mechanical hard disks, and therefore, how to reduce the cost of solid state disks and increase the cost performance of solid state disks is a current research hotspot.

A partitioned solid state disk is a solid state disk that uses the NVMe protocol and, at the logical level, abstracts the storage device through a partition namespace interface into fixed-size partitions; the data inside each partition must be written sequentially. At the physical level, the firmware of a partitioned solid state disk typically organizes the flash blocks at the same offset in the back-end flash chips into a set called a superblock, and subsequent flash allocation, writing, erasing, and garbage collection are managed in units of superblocks. The superblock abstraction maximizes the parallelism of flash operations and reduces management overhead. Partitioning helps to reduce cost, and high-density flash technology further increases storage density, but it also degrades write performance and shortens service life. In particular, quad-level cell (QLC) flash significantly reduces storage cost but suffers in write performance and endurance.

Data compression reduces both the storage space required and the amount of data written by compressing and encoding the original data before it is written, which improves application write performance, lowers the storage cost of the solid-state disk, and prolongs its service life. Traditional data compression is implemented either in software or in hardware. Software compression depends heavily on the CPU and often consumes a large amount of computing resources. Hardware compression relieves the CPU, but its reliance on dedicated hardware may introduce data transfer bottlenecks and occupy valuable system bus resources. In response, the industry has widely adopted schemes that integrate data compression into the storage device firmware. These schemes are transparent to the host: the host incurs no additional management cost, and neither CPU resources nor data bus bandwidth are consumed, which improves overall system efficiency. However, a partitioned solid state disk manages flash memory by partition, and a single partition is usually large (typically at the GiB level). Applying existing data compression techniques to a partitioned solid state disk therefore tends to waste a great deal of space, so the overall storage cost is not reduced. In addition, whether software or hardware compression is used, the mapping between data before and after compression must be maintained, which adds management overhead and places higher demands on memory space.

Disclosure of Invention

Aiming at the defects and improvement demands of the prior art, the invention provides a transparent compression method, a storage controller and a storage system based on a partition solid-state disk, and aims to realize a data compression function in the partition solid-state disk, reduce the writing data quantity, improve the writing performance and durability of a flash memory, reduce the CPU load of a host end, reduce the space waste in the partition solid-state disk and reduce the cost.

In order to achieve the above object, according to one aspect of the present invention, there is provided a transparent compression method based on a partitioned solid state disk, in which a data compression module is integrated in the firmware of the partitioned solid state disk, the storage space of the disk is divided into a data storage area and an additional storage area, each super block in the data storage area is equally divided into data sub-super blocks, and each partition in the additional storage area is equally divided into additional sub-super blocks; the transparent compression method comprises: for a logical partition Z_L to be written, performing the following steps:

(S1) allocating an idle data sub-super block to the logical partition Z_L, recording the mapping relation between the logical partition Z_L and the allocated data sub-super block in a partition mapping table in memory, and selecting the first logical page in the logical partition Z_L as the current logical page;

(S2) compressing the current logical page with the data compression module to obtain a compressed page; if the currently allocated data sub-super block is full, allocating another idle data sub-super block to the logical partition Z_L, recording the mapping relation between the logical partition Z_L and the newly allocated data sub-super block in the partition mapping table, and proceeding to step (S3); otherwise, proceeding directly to step (S3);

(S3) writing the compressed page into the currently allocated data sub-super block;

(S4) if an unwritten logical page remains in the logical partition Z_L, selecting the next logical page as the current logical page and returning to step (S2); otherwise, ending the write process for the logical partition Z_L.
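Steps (S1) to (S4) can be sketched on the host side as follows. This is a minimal illustration, not the firmware itself: the class names `DataSubSuperblock` and `Drive`, the byte-granular capacity, and the use of Python's `zlib` as the compression module are all assumptions made for the example, and "full" is simplified to "cannot hold the next compressed page".

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class DataSubSuperblock:
    ident: int
    capacity: int                      # usable bytes (hypothetical unit)
    used: int = 0
    pages: list = field(default_factory=list)

    def is_full_for(self, size: int) -> bool:
        # Simplification: "full" means the next compressed page does not fit.
        return self.used + size > self.capacity

    def append(self, payload: bytes) -> None:
        self.pages.append(payload)
        self.used += len(payload)


class Drive:
    def __init__(self, n_blocks: int, capacity: int):
        self.free = [DataSubSuperblock(i, capacity) for i in range(n_blocks)]
        # Partition mapping table: partition id -> ids of its data sub-superblocks.
        self.partition_map = {}

    def allocate(self, partition_id):
        block = self.free.pop(0)
        self.partition_map.setdefault(partition_id, []).append(block.ident)
        return block

    def write_partition(self, partition_id, logical_pages):
        current = self.allocate(partition_id)            # (S1)
        allocated = [current]
        for page in logical_pages:                       # (S4) advances the loop
            compressed = zlib.compress(page)             # (S2) compress
            if current.is_full_for(len(compressed)):     # (S2) allocate on demand
                current = self.allocate(partition_id)
                allocated.append(current)
            current.append(compressed)                   # (S3) write
        return allocated
```

Because sub-superblocks are handed out one at a time, a highly compressible partition ends up holding only as many sub-superblocks as its compressed data needs.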

Further, in the allocated data sub-superblock, each flash page is split equally into a plurality of slots, and each logical page in the logical partition Z_L is mapped to one slot;

and, the transparent compression method further includes: for a logical partition Z_L to be written, allocating to the logical partition Z_L an additional sub-superblock that is not currently occupied;

and, the step (S3) includes:

if the compressed page is smaller than or equal to the size of the currently mapped slot, writing the compressed page into the currently mapped slot;

if the compressed page is larger than the size of the currently mapped slot, the following operations are performed:

(S3-1) writing the leading part of the compressed page, equal in size to the slot, into the currently mapped slot as direct addressing data, and appending the remaining part, as truncated data, in log form to the allocated additional sub-superblock;

(S3-2) taking the start address, within the extra sub-super block, of the truncated data belonging to the data sub-super block as the base address of that data sub-super block and writing it into the partition mapping table, and writing the offset and length of each piece of truncated data relative to the base address, as metadata, into the out-of-band data area of the flash memory page where the corresponding direct addressing data is located.
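The slot write with truncation in steps (S3-1) and (S3-2) can be illustrated with a small sketch. The names `ExtraLog`, `SlottedBlock`, `write_compressed` and `read_compressed` are hypothetical, the out-of-band area is modelled as a plain Python list, and a single extra log stands in for the additional sub-superblock.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExtraLog:
    """Append-only stand-in for an additional (extra) sub-superblock."""
    data: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> int:
        addr = len(self.data)          # start address of this truncated piece
        self.data += payload
        return addr


@dataclass
class SlottedBlock:
    """Stand-in for a data sub-superblock with fixed-size slots."""
    slot_size: int
    slots: list = field(default_factory=list)   # direct addressing data
    oob: list = field(default_factory=list)     # (offset, length) or None per slot
    base_addr: Optional[int] = None             # kept in the partition mapping table


def write_compressed(block: SlottedBlock, log: ExtraLog, compressed: bytes) -> None:
    s = block.slot_size
    if len(compressed) <= s:                    # fits: no truncation needed
        block.slots.append(compressed)
        block.oob.append(None)
        return
    head, tail = compressed[:s], compressed[s:]
    addr = log.append(tail)                     # (S3-1) append remainder as a log
    if block.base_addr is None:
        block.base_addr = addr                  # (S3-2) first piece sets the base
    block.slots.append(head)
    block.oob.append((addr - block.base_addr, len(tail)))


def read_compressed(block: SlottedBlock, log: ExtraLog, slot_index: int) -> bytes:
    head = block.slots[slot_index]
    meta = block.oob[slot_index]
    if meta is None:
        return head
    offset, length = meta
    start = block.base_addr + offset
    return head + bytes(log.data[start:start + length])
```

Only one base address per data sub-superblock is kept in the mapping table; the per-page offset and length travel with the flash page itself.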

Further, the transparent compression method based on the partition solid state disk provided by the invention further comprises: equally dividing the logical partition Z_L into a plurality of analysis windows; for each analysis window, the slot size mapped by the logical pages therein is determined as follows:

obtaining a distribution function of the compression rate of the logical pages in the previous analysis window, and setting the slot size S mapped by the logical pages in the current analysis window according to the compressed page size S_z corresponding to the compression rate at a preset percentile; the slot size S satisfies: S ≥ S_z, and S is the size closest to S_z among the slot sizes that evenly divide the flash page size.

Further, the preset percentile is 70%.
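The slot size rule (S ≥ S_z, with S the closest size that evenly divides the flash page) can be sketched as below. The function names, the nearest-rank percentile definition, and the 16 KiB default flash page size are assumptions for illustration only.

```python
def size_at_percentile(compressed_sizes, percentile=70):
    """Nearest-rank percentile of the compressed page sizes (integer percent)."""
    if not compressed_sizes:
        raise ValueError("no samples in the previous analysis window")
    ordered = sorted(compressed_sizes)
    # Ceiling division without floating point: ceil(percentile * n / 100).
    rank = max(1, -(-percentile * len(ordered) // 100))
    return ordered[rank - 1]


def choose_slot_size(s_z, flash_page_size=16384):
    """Smallest slot size S >= s_z such that the flash page holds a whole number of slots."""
    if not 0 < s_z <= flash_page_size:
        raise ValueError("S_z must lie in (0, flash_page_size]")
    # Start from the largest slot count that still leaves S >= s_z and
    # decrease it until the count divides the flash page evenly.
    for slots_per_page in range(flash_page_size // s_z, 0, -1):
        if flash_page_size % slots_per_page == 0:
            return flash_page_size // slots_per_page
```

For example, with S_z = 2500 bytes and a 16 KiB flash page, the candidates 2048 and 4096 both divide the page, but only 4096 satisfies S ≥ S_z, so four slots per flash page are used.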

Further, the transparent compression method based on the partition solid state disk provided by the invention further comprises: executing garbage collection in units of sub super blocks; and, when executing garbage collection, for a data sub super block D_GC to be collected, the following operations are executed:

if no valid data exists in the data sub super block D_GC, its flash memory blocks are erased directly;

if valid data exists in the data sub-superblock D_GC, a data sub-superblock D_M large enough to accommodate the valid data is allocated, the valid data in the data sub-superblock D_GC is aggregated inside the partitioned solid state disk and migrated to the data sub-superblock D_M, and the flash memory blocks in the data sub-superblock D_GC are then erased;

The sub super block is a data sub super block or an additional sub super block.
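A rough sketch of collecting a data sub-superblock D_GC follows. The record layout `(logical page number, compressed payload, valid flag)` and the `allocate` callback are hypothetical; the point illustrated is that valid payloads are moved in their compressed form.

```python
from dataclasses import dataclass, field


@dataclass
class DataSubSuperblock:
    ident: int
    # Each entry: (logical page number, compressed payload, still valid?)
    entries: list = field(default_factory=list)
    erase_count: int = 0

    def erase(self) -> None:
        self.entries.clear()
        self.erase_count += 1


def collect_data_block(victim: DataSubSuperblock, allocate):
    """Reclaim `victim` (D_GC); `allocate(nbytes)` returns a target block (D_M)."""
    live = [(lpn, payload) for lpn, payload, valid in victim.entries if valid]
    target = None
    if live:
        # Valid pages are aggregated and moved as-is, without decompression.
        target = allocate(sum(len(payload) for _, payload in live))
        target.entries.extend((lpn, payload, True) for lpn, payload in live)
    victim.erase()                     # erased directly when nothing is valid
    return target
```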

Further, the transparent compression method based on the partition solid state disk provided by the invention further comprises: when the garbage collection operation is performed, for an extra sub super block E_GC to be collected, the following operations are performed:

if the extra sub-super block E_GC contains no truncated data belonging to a valid data sub-super block, its flash memory blocks are erased directly;

if the extra sub-superblock E_GC contains truncated data belonging to valid data sub-superblocks, an extra sub-superblock E_M large enough to accommodate the truncated data is allocated, the truncated data of each valid data sub-superblock in the extra sub-superblock E_GC is migrated as a whole to the extra sub-superblock E_M, and the base address of the corresponding data sub-superblock in the partition mapping table is modified.
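Collecting an extra sub-superblock E_GC might look like the sketch below. The data structures are assumptions: `extra_logs` maps an extra sub-superblock id to its bytes, `segments` lists the contiguous truncated-data run of each owning data sub-superblock, and `base_addr` plays the role of the base address field in the partition mapping table.

```python
def collect_extra_block(victim_id, target_id, extra_logs, segments,
                        live_owners, base_addr):
    """Move truncated data of still-valid data sub-superblocks from E_GC to E_M.

    segments[victim_id] is a list of (owner, start, length) runs, one per
    data sub-superblock whose truncated data lives in the victim.
    """
    source = extra_logs[victim_id]
    target = extra_logs[target_id]
    for owner, start, length in segments.pop(victim_id, []):
        if owner not in live_owners:
            continue                               # stale run: dropped by the erase
        new_start = len(target)
        target.extend(source[start:start + length])   # migrate the run as a whole
        base_addr[owner] = (target_id, new_start)     # only the base address changes
        segments.setdefault(target_id, []).append((owner, new_start, length))
    source.clear()                                 # erase E_GC
```

Since each page's out-of-band metadata stores an offset relative to the base address, moving a run as a whole requires updating one mapping table entry rather than every page's metadata.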

Further, the transparent compression method based on the partition solid state disk provided by the invention further comprises: when the partition solid state disk is idle, if there is free space at the tail of any data sub-superblock B_S and the free space is enough to accommodate the truncated data corresponding to the data sub-superblock B_S, the truncated data is migrated from the extra sub-superblock to the tail of the data sub-superblock B_S, and the partition mapping table is modified accordingly.
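The idle-time migration can be sketched as a single check-and-copy. The dictionary layouts (`blocks`, `mapping`, `extra_logs`) and the tagged base address `("extra" | "data", id, offset)` are hypothetical conventions chosen for this example.

```python
def merge_back(block_id, blocks, mapping, extra_logs):
    """Move a data sub-superblock's truncated data into its own free tail.

    Returns True if the migration happened, False if it was not possible.
    """
    entry = mapping[block_id]
    kind, where, offset = entry["base"]
    if kind != "extra":
        return False                         # already merged back
    length = entry["length"]
    block = blocks[block_id]
    free_tail = block["capacity"] - len(block["data"])
    if free_tail < length:
        return False                         # tail cannot hold the truncated data
    tail_start = len(block["data"])
    block["data"] += extra_logs[where][offset:offset + length]
    # The copy left in the extra sub-superblock is now stale and can be
    # reclaimed later by garbage collection.
    entry["base"] = ("data", block_id, tail_start)
    return True
```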

According to still another aspect of the present invention, there is provided a storage controller based on a partitioned solid state disk, in which a data compression module is integrated in the firmware of the partitioned solid state disk, the storage space of the partitioned solid state disk is divided into a data storage area and an additional storage area, a super block in the data storage area is divided into data sub-super blocks, and a partition in the additional storage area is divided into additional sub-super blocks; the controller includes:

a computer readable storage medium storing a computer program;

and the processor is used for reading the computer program in the computer readable storage medium and executing the transparent compression method based on the partition solid-state disk.

According to yet another aspect of the present invention, there is provided a partitioned solid state disk-based storage system comprising:

The partition solid state disk is characterized in that a data compression module is integrated in firmware of the partition solid state disk, a storage space of the partition solid state disk is divided into a data storage area and an additional storage area, a super block in the data storage area is divided into data sub-super blocks, and a partition in the additional storage area is divided into additional sub-super blocks;

and the storage controller based on the partition solid state disk.

In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:

(1) The invention integrates the compression module in the firmware of the partitioned solid-state disk and offloads the data compression function into the solid-state disk, thereby realizing transparent compression, reducing the amount of data written, improving the write performance and durability of the flash memory, and reducing the CPU load of the host. On this basis, the invention optimizes partition mapping management in the partitioned solid-state disk: each super block is divided into finer-grained sub super blocks, which are allocated one by one on demand as a logical partition is written. Storage space is thus allocated flexibly according to the compressed data size, which improves space utilization, reduces space waste in the partitioned solid-state disk, and reduces cost.

(2) The invention divides the flash memory page into smaller storage units, namely slots, and each slot stores only one compressed page. If a compressed page exceeds the slot size, it is truncated, and the truncated data is recorded in log form in an additional space for centralized management. Each logical page is thus mapped to a fixed-size slot within a contiguous region, which realizes a "slot alignment" data layout. Under this layout, dividing the logical page number by the number of slots per flash page gives the flash page number to which the logical page maps, and taking the logical page number modulo the number of slots per flash page gives the offset of the corresponding slot. The mapping between each logical page number and its flash page number therefore does not have to be recorded one by one, which effectively reduces the storage actually occupied by the partition mapping table. Furthermore, because the partition mapping table is relatively small under the "slot alignment" layout, it can be cached directly in the internal cache of the partition solid-state disk, which effectively improves read and write efficiency.

(3) The invention further divides each logical partition to be written into a plurality of analysis windows and determines the slot size mapped by the logical pages in the current analysis window from the compression rate statistics of the previous analysis window. The invention finds that data sets exhibit locality in compression rate: the compression rates of most data pages tend to be concentrated, and the difference in compression rate between different percentiles is relatively small. On this basis, the slot size can be adjusted dynamically according to the data compression rate, realizing a compression-rate-adaptive slot configuration mechanism that keeps the slot size matched to the compression rate of the current data set and effectively improves storage efficiency. In a preferred scheme, the slot size is set so that it just accommodates the compressed page at the 70th percentile, that is, 70% of data pages can be stored in their slots without truncation. Experimental results show that this setting sacrifices some compression rate but maintains high overall space utilization and significantly reduces storage cost.

(5) When the data sub-super block is recovered through garbage recovery operation, the invention directly carries out effective data migration in the disk, avoids unnecessary decompression and recompression, shortens the data migration path, effectively improves the garbage recovery efficiency, and further reduces the CPU load of the host end and the time delay of data migration.

(6) When truncated data is appended to an extra sub-super block in log form, the start address, within the extra sub-super block, of the truncated data belonging to a data sub-super block is used as the base address of that data sub-super block and is embedded in the partition mapping table, and the offset and length of each piece of truncated data relative to the base address are written as metadata into the out-of-band data area of the flash memory page holding the corresponding direct addressing data. When an extra sub-super block is recovered, the truncated data of each valid data sub-super block is migrated as a whole to another extra sub-super block and the corresponding partition mapping table entry is modified, so that recovery is efficient and the related mapping information is easy to maintain.

(7) When the partition solid state disk is idle and enough space exists in the data sub super block, the method and the device can transfer the additional log data which is truncated due to compression from the additional sub super block to the tail part of the data sub super block, and correspondingly update the partition level mapping table, so that the storage space utilization rate can be further improved and the storage cost can be reduced under the condition that the upper layer application is not influenced.

Drawings

FIG. 1 is a diagram illustrating a mapping between logical partitions and physical space under different conventional storage strategies; wherein, (a) is a mapping schematic diagram of a logical partition and a super block in a traditional partition solid state disk without data compression, and (b) is a mapping schematic diagram of a logical partition and a super block in a traditional partition solid state disk with data compression;

FIG. 2 is a schematic diagram of a transparent compression method based on a partition solid state disk according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating mapping between logical partitions and data sub-superblocks according to the present embodiment;

FIG. 4 is a diagram of a mapping mechanism in a conventional partitioned solid state disk employing data compression;

FIG. 5 is a schematic diagram of a data layout of "slot alignment" according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a partition map according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a slot configuration mechanism for adaptive compression rate according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating a data compression ratio distribution of different data sets according to an embodiment of the present invention;

FIG. 9 is a theoretical benefit of a data layout for "slot alignment" provided by an embodiment of the present invention;

Fig. 10 is a schematic diagram of recycling a data sub-superblock according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

Before the technical scheme of the invention is explained in detail, the related technical terms are explained.

Partition namespace interface: an interface protocol allows a storage device to be abstracted into partitions having a fixed size, thereby simplifying the operation and management of the storage device.

Partitioned solid state disk: a solid state disk employing NVMe protocol abstracts a storage device through a partition namespace interface into partitions with a fixed size, the data inside each partition having to be written in a sequential manner.

Super block: solid state disk firmware typically organizes the flash blocks in the same offset location in the back-end flash chip into a collection, which is referred to as a superblock. These operations are managed in super blocks during subsequent flash allocation, writing, erasing, and garbage collection.

In a conventional partitioned solid state disk without transparent compression, the amount of data written to a logical partition equals the space that the partition actually occupies on the flash memory blocks. Fig. 1 (a) shows the mapping between a logical partition and a super block in such a disk: when the logical partition is completely written, the flash memory space in the super block is also completely used. After transparent compression is introduced, the data written to the logical partition is processed by the transparent compression module, so it occupies less super block space, and the exact proportion depends on the compression rate of the data. As shown in Fig. 1 (b), under the conventional mapping, the corresponding super block is not fully utilized even when the logical partition is completely written, which potentially wastes storage space.

In order to achieve the data compression function inside the partition solid-state disk, reduce the writing data amount, improve the writing performance and durability of the flash memory, reduce the CPU load of the host, reduce the space waste in the partition solid-state disk, and reduce the cost, in one embodiment of the present invention, i.e., embodiment 1, a transparent compression method based on the partition solid-state disk is provided, and the partition mapping management in the partition solid-state disk is optimized. As shown in fig. 2, in this embodiment, a data compression module is integrated in the firmware of the partitioned solid state disk, and its storage space is divided into a data storage area and an additional storage area, each super block in the data storage area is equally divided into a plurality of data sub-super blocks, and each super block in the additional storage area is equally divided into a plurality of additional sub-super blocks. When the subsequent logical partition writes, the physical storage space is allocated by taking the data sub-super block as a unit, when one logical partition performs the write operation for the first time, the logical partition is mapped into one data sub-super block, and if the data sub-super block cannot meet the space requirement, more sub-super blocks are allocated as required.

The invention integrates the compression module in the firmware of the partitioned solid-state disk and offloads the data compression function into the solid-state disk, thereby realizing transparent compression, reducing the amount of data written, improving the write performance and durability of the flash memory, and reducing the CPU load of the host. Optionally, the compression algorithm used in this embodiment is the Deflate algorithm based on static Huffman coding; Deflate is a core component of the zlib compression library and the basis of numerous other compression algorithms. It should be noted that, in other embodiments of the present invention, other data compression algorithms may be selected according to the actual data compression requirements.
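For experimentation on a host, Deflate restricted to static (fixed) Huffman codes can be approximated with Python's `zlib` module by requesting the `Z_FIXED` strategy. This is only an illustrative stand-in for the firmware compression module; the helper names `deflate_static`, `inflate` and `compression_rate` are hypothetical, and the fallback to the default strategy is an assumption for builds that do not expose `Z_FIXED`.

```python
import zlib


def deflate_static(page: bytes, level: int = 6) -> bytes:
    """Compress one page as a raw Deflate stream using fixed Huffman codes."""
    # Z_FIXED asks zlib not to emit dynamic Huffman tables.
    strategy = getattr(zlib, "Z_FIXED", zlib.Z_DEFAULT_STRATEGY)
    # wbits = -15 selects a raw Deflate stream without zlib header or checksum.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    return comp.compress(page) + comp.flush()


def inflate(blob: bytes) -> bytes:
    """Decompress a raw Deflate stream produced by deflate_static."""
    return zlib.decompress(blob, -15)


def compression_rate(page: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(deflate_static(page)) / len(page)
```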

Accordingly, as shown in fig. 3, the transparent compression method provided in this embodiment includes: for a logical partition Z_L to be written, performing steps (S1) to (S4) described above.

As can be seen from comparing fig. 3 and fig. 1 (a), in this embodiment, the sub super blocks with finer granularity are used as the basic units of partition mapping, so that the storage space can be flexibly allocated according to the size of the compressed data, the space utilization is improved, and the storage cost is reduced.

As shown in fig. 2, on top of partition mapping management with sub-superblocks as the basic unit, the present embodiment further proposes a "slot alignment" data layout, partitioned management of extra log data, and a compression-rate-adaptive slot configuration. These are described in detail below.

Conventional data compression converts fixed-size data pages into variable-length fragments in physical flash memory, as shown in fig. 4. When it is implemented in a partition solid-state disk, this conflicts with the coarse-grained mapping mechanism of the disk, so that mapping table management becomes complex and the cache requirement and hardware cost of metadata management increase. In order to improve space utilization and reduce the complexity and cache requirements of mapping data management, the present embodiment proposes a "slot alignment" data layout. Specifically, as shown in fig. 5, in the allocated data sub-super-blocks, each flash page is divided equally into a plurality of slots, and each logical page in the logical partition Z_L is mapped to one slot;

and, the step (S3) includes:

If the compressed page is smaller than or equal to the size of the currently mapped slot, the compressed page is written into the currently mapped slot, as with compressed pages A and B in fig. 5;

If the compressed page is larger than the size of the currently mapped slot, as the compressed page corresponding to logical page C in FIG. 5, then the following operations are performed:

For example, in fig. 5, for the compressed page obtained by compressing logical page C, the leading part C₁, equal in size to the slot, is written into the slot as direct addressing data, and the remaining part is appended in log form to an additional sub-superblock as truncated data C₂;

In this embodiment, sub-superblocks are classified and managed according to the type of data they store: sub-superblocks that store the compressed data in slots are called "data sub-superblocks", and sub-superblocks that store truncated data are called "extra sub-superblocks". Data sub-superblocks are mapped to their corresponding logical partitions, while extra sub-superblocks are shared by a plurality of logical partitions. At any given time, an extra sub-superblock accepts truncated data from only one logical partition, appended in log form, so that the truncated data of each data sub-superblock is stored contiguously in the extra sub-superblock. The truncated data of the same data sub-superblock can then be migrated as a whole during subsequent garbage collection, which improves garbage collection efficiency, and read efficiency can be improved through data read-ahead when the extra sub-superblock is read sequentially. Because of this append-only write mechanism, the extra sub-superblock may also be referred to as an "extra log data area".

When one logical partition completes writing and closes, the extra sub-superblock it uses can be reassigned to another newly opened logical partition for storing its truncated data. The truncated data for each logical partition is journaled in a contiguous physical space, i.e., a series of contiguous and independent flash pages. As shown in FIG. 6, the data for logical partition 1 has been completely written and closed, so the extra sub-superblocks it uses can be used by the new logical partition 2. Since neither logical partition 2 nor logical partition 3 has been fully written, a separate additional sub-superblock is required to store truncated data in log form.

Under this data layout, the flash page number to which a logical page maps is obtained by dividing the logical page number by the number of slots per flash page, and the offset of the corresponding slot is obtained by taking the logical page number modulo the number of slots per flash page. The mapping between each logical page number and its flash page number therefore does not have to be recorded one by one, which effectively reduces the storage overhead actually occupied by the partition mapping table.
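The division and modulo rule can be written in a few lines. The function name `locate` and the choice to return a byte offset within the flash page are assumptions made for the example; page numbers are taken relative to the start of the region that uses a given slot size.

```python
def locate(logical_page_no: int, slots_per_flash_page: int, slot_size: int):
    """Map a logical page number to (flash page number, byte offset of its slot)."""
    flash_page_no, slot_index = divmod(logical_page_no, slots_per_flash_page)
    return flash_page_no, slot_index * slot_size
```

With four 4 KiB slots per flash page, logical page 10 lands in flash page 2 at byte offset 8192, so no per-page mapping entry has to be stored.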

To further improve storage efficiency while keeping data compression transparent, the embodiment also provides a compression-rate-adaptive slot configuration mechanism, which dynamically adjusts subsequent slot sizes according to the compression rate of the data already written to the logical partition. While compressing data, the embodiment analyzes the compression rate in real time to provide a basis for configuring subsequent slot sizes. Specifically, as shown in fig. 7, the logical partition Z_L is equally divided into a plurality of analysis windows; for each analysis window, the slot size mapped by the logical pages therein is determined as follows:

Because of the limitations of partition namespace interfaces, data within a single partition can only be written sequentially, so analysis windows are also processed sequentially.

In this embodiment, the distribution function of the compression rate of the logical pages in the previous analysis window is obtained as follows: when compressed data is written to a physical flash page, its size is recorded; to reduce memory usage, the idea of bucket sort is adopted, and the distribution of compression rates within the analysis window is counted using a limited number of buckets. From these statistics, the distribution function of the compression rate of the logical pages in the previous analysis window is obtained, with the compression rate on the abscissa and the percentage on the ordinate. The compression rate at the percentile chosen according to the load characteristics is then selected, and the corresponding slot size is calculated and used as the slot size for the next analysis window. Because data sets exhibit locality in compression rate, that is, the compression rates of most data pages tend to be concentrated and the difference in compression rate between different percentiles is relatively small, the embodiment can dynamically adjust the slot size according to the data compression rate. This realizes a compression-rate-adaptive slot configuration mechanism that keeps the slot size matched to the compression rate of the current data set and effectively improves storage efficiency.
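The bucket-based statistics can be sketched as a fixed-size histogram over compressed page sizes. The class name `RatioHistogram`, the bucket count of 16, the 4 KiB logical page size, and the choice to report the upper edge of the selected bucket are all assumptions for illustration.

```python
class RatioHistogram:
    """Fixed number of buckets over compressed size / logical page size."""

    def __init__(self, n_buckets: int = 16, page_size: int = 4096):
        self.n_buckets = n_buckets
        self.page_size = page_size
        self.counts = [0] * n_buckets
        self.total = 0

    def record(self, compressed_size: int) -> None:
        # Integer bucket index; incompressible pages land in the last bucket.
        idx = min(compressed_size * self.n_buckets // self.page_size,
                  self.n_buckets - 1)
        self.counts[idx] += 1
        self.total += 1

    def size_at_percentile(self, percentile: int) -> int:
        """Upper edge (in bytes) of the bucket containing the given percentile."""
        if self.total == 0:
            raise ValueError("analysis window is empty")
        target = max(1, -(-percentile * self.total // 100))   # ceiling division
        running = 0
        for idx, count in enumerate(self.counts):
            running += count
            if running >= target:
                return (idx + 1) * self.page_size // self.n_buckets
        return self.page_size
```

Memory use is bounded by the number of buckets rather than by the number of pages in the analysis window, at the cost of rounding the result up to a bucket boundary.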

It is readily understood that the compression rate is the ratio of the compressed size of the data to its original size. For the first analysis window, for which no previous window exists, the slot size mapped by the logical pages may be set directly according to the characteristics of the data set.
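
The slot sizing described above can be illustrated with a minimal Python sketch. It assumes a 16 KiB flash page and 16 buckets, neither of which is specified by this embodiment, and the names `WindowStats`, `candidate_slot_sizes` and `next_slot_size` are hypothetical. The sketch records compressed sizes in a fixed number of buckets, reads off the size at the chosen percentile as S_z, and picks the smallest slot size that evenly divides the flash page and is not smaller than S_z.

```python
LOGICAL_PAGE = 4096   # 4 KiB logical page, as used in the evaluation
FLASH_PAGE = 16384    # assumed flash page size (hypothetical)
NUM_BUCKETS = 16      # assumed number of buckets (hypothetical)


class WindowStats:
    """Fixed-size histogram of compressed-page sizes for one analysis window."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.width = LOGICAL_PAGE // num_buckets
        self.counts = [0] * num_buckets

    def record(self, compressed_size):
        # Bucket i covers sizes in (i * width, (i + 1) * width].
        idx = max(0, (compressed_size - 1) // self.width)
        self.counts[min(idx, len(self.counts) - 1)] += 1

    def size_at_percentile(self, pct):
        """Upper bound of the bucket where the cumulative share reaches pct."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no pages recorded in this window")
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running * 100 >= pct * total:
                return (i + 1) * self.width
        return LOGICAL_PAGE


def candidate_slot_sizes(flash_page=FLASH_PAGE, logical_page=LOGICAL_PAGE):
    """Slot sizes that evenly divide the flash page, up to one logical page."""
    return [s for s in range(1, logical_page + 1) if flash_page % s == 0]


def next_slot_size(stats, pct=70):
    """Smallest candidate slot size S with S >= S_z."""
    s_z = stats.size_at_percentile(pct)
    return min(s for s in candidate_slot_sizes() if s >= s_z)


# Usage: 70 pages compress to 1000 bytes, 30 pages to 3000 bytes.
stats = WindowStats()
for _ in range(70):
    stats.record(1000)
for _ in range(30):
    stats.record(3000)
slot_for_next_window = next_slot_size(stats, pct=70)
```

With a power-of-two flash page size the candidate slot sizes are themselves powers of two, so the selected slot may be noticeably larger than S_z; a finer set of candidates would require a different page geometry.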

In practical applications, the preset percentile used to set the slot size should be determined according to the data load characteristics and the performance requirements. It is easy to see that the larger the preset percentile, the fewer logical pages are truncated but the more space is wasted inside the slots; conversely, the smaller the preset percentile, the less space is wasted inside the slots but the more logical pages are truncated.

Optionally, in this embodiment, the preset percentile is 70%. In the ideal case, at least 70% of the logical pages can be written directly into their slots after compression, and at most 30% of the logical pages need to be truncated when written into their slots. Experiments show that with the percentile set to 70%, high overall space utilization is maintained at the cost of a small loss in compression rate, and the storage cost is significantly reduced.
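
The trade-off behind the choice of percentile can be illustrated with a short Python sketch. The function name `slot_tradeoff` and the sample sizes are hypothetical and serve only as an illustration; for a given slot size, the function reports the fraction of pages that would be truncated and the fraction of slot space left unused.

```python
def slot_tradeoff(compressed_sizes, slot_size):
    """Return (fraction of pages truncated, fraction of slot space unused)."""
    if not compressed_sizes:
        raise ValueError("need at least one compressed page size")
    truncated = sum(1 for size in compressed_sizes if size > slot_size)
    # Only pages that fit leave unused bytes; truncated pages fill their slot.
    wasted = sum(slot_size - size for size in compressed_sizes if size <= slot_size)
    total_slot_space = slot_size * len(compressed_sizes)
    return truncated / len(compressed_sizes), wasted / total_slot_space


# Hypothetical compressed sizes (bytes) of five 4 KiB logical pages.
sizes = [500, 900, 1000, 1500, 3000]
small = slot_tradeoff(sizes, 1024)  # slot chosen at a lower percentile
large = slot_tradeoff(sizes, 2048)  # slot chosen at a higher percentile
```

In this example the larger slot truncates fewer pages but leaves a larger share of slot space unused, which is the behavior described above.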

In this embodiment, the "slot alignment" data layout and the compression-rate-adaptive slot configuration mechanism are proposed based on the finding that the compression rate of a data set exhibits locality. To make these mechanisms clearer, the following analysis is based on the compression rates of real data sets. The data sets involved include the well-known Silesia compression corpus and other publicly available data sets that cover a wide range of application scenarios, such as databases (nci and osdb), code and library files (samba and ooffice), web page text (webster and xml), electronic mail (mail), language phrase tags (autokey), and bioinformatics data (hgdb and neuro). The sizes of these data sets range from 2.2 MiB to 7.6 GiB. The data are compressed in units of 4 KiB pages using the Deflate algorithm with static Huffman coding, and the compression rates at different percentiles are shown as a histogram in Fig. 8; the lower the bar, the more compressible the data. Across these data sets, the compression rate at the 50th percentile lies between 0.22 and 0.81, with an average of 0.47. This shows that real data generally has significant compression potential and that data compression can reduce storage cost by more than a factor of two. Fig. 8 also reveals the locality of the data sets in terms of compression rate: the compression rates of most data pages tend to be concentrated, with relatively small differences between percentiles.

The core idea of the "slot alignment" data layout proposed in this embodiment is to align each compressed data page to a fixed-size slot, deliberately giving up part of the space benefit of compression in order to reduce the number of entries in the compression mapping table and thus the cost of managing and caching that table. Fig. 9 shows the theoretical benefit of the "slot alignment" data layout in this embodiment: although the slot-aligned design causes some loss of compression rate, the loss is within an acceptable range. Experimental results show that the slot alignment strategy causes an average space loss of 7.5% and a maximum loss of no more than 13.0%. This indicates that, although the slot-aligned design sacrifices some compression, high overall space utilization is maintained and the storage cost is significantly reduced.

As logical partitions are updated, deleted, and reset, invalid data may appear in a sub-superblock, or an entire sub-superblock may be marked as invalid. As in a conventional partitioned solid state disk, a garbage collection operation is triggered when the number of available physical flash pages in the partitioned solid state disk falls below a certain threshold. In this embodiment, garbage collection is performed in units of sub-superblocks (including data sub-superblocks and additional sub-superblocks), and the data sub-superblock or additional sub-superblock to be collected can be selected using a policy from an existing garbage collection algorithm.

To further improve garbage collection efficiency and reduce the CPU load of the host, as a preferred implementation, this embodiment performs the following operations on the data sub-superblock D_GC to be collected when performing a garbage collection operation:

If there is no valid data in the data sub-superblock D_GC, the flash blocks therein are erased directly;

If there is valid data in the data sub-superblock D_GC, a data sub-superblock D_M sufficient to accommodate the valid data is allocated, the valid data in D_GC is aggregated and migrated to D_M inside the partitioned solid state disk, and the flash blocks in D_GC are then erased.

The sub-superblock is a data sub-superblock or an additional sub-superblock.

As shown on the left side of Fig. 10, when a partition is garbage-collected under the conventional strategy, the data must be read from the flash memory and decompressed, transferred to the memory of the host, written back by reissuing a write command, recompressed by the compression module in the solid state disk, and finally rewritten to the flash memory, which clearly occupies the compression module unnecessarily. In this embodiment, a data sub-superblock is reclaimed as shown on the right side of Fig. 10: valid data is migrated directly inside the disk, which avoids unnecessary decompression and recompression and effectively improves garbage collection efficiency. In a partitioned solid state disk, in-disk data migration can be implemented by invoking the simple copy command based on the partition namespace interface. It should be noted that other ways of implementing in-disk migration are also applicable to the present invention.
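
The in-disk reclamation of a data sub-superblock can be sketched in Python as follows. This is a simplified model under stated assumptions: the class `SubSuperBlock`, the function `collect_data_sub_superblock` and the free pool are hypothetical, a sub-superblock is reduced to a list of (compressed payload, valid flag) entries, and mapping-table updates are omitted. The point illustrated is that valid payloads are copied in their compressed form, without decompression or recompression.

```python
from dataclasses import dataclass, field


@dataclass
class SubSuperBlock:
    # Each entry is (compressed payload, valid flag); a simplified model.
    pages: list = field(default_factory=list)


def collect_data_sub_superblock(victim, free_pool):
    """Reclaim `victim`; return the destination block, or None if none was needed."""
    live = [payload for payload, valid in victim.pages if valid]
    dest = None
    if live:
        # Raises IndexError if no free sub-superblock is left (not handled here).
        dest = free_pool.pop()
        # In-disk copy: payloads stay compressed, the host is not involved.
        dest.pages.extend((payload, True) for payload in live)
    victim.pages.clear()       # models erasing the flash blocks
    free_pool.append(victim)   # the erased block becomes available again
    return dest


# Usage: one invalid page is dropped, two valid pages are migrated.
victim = SubSuperBlock([(b"a", True), (b"b", False), (b"c", True)])
pool = [SubSuperBlock()]
dest = collect_data_sub_superblock(victim, pool)
```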

An additional sub-superblock may contain both valid and invalid flash pages from different logical partitions, so additional sub-superblocks require special handling. In this embodiment, when performing a garbage collection operation, the following operations are performed on the additional sub-superblock E_GC to be collected:

If the additional sub-superblock E_GC contains no truncated data belonging to a valid data sub-superblock, the flash blocks therein are erased directly;

If the additional sub-superblock E_GC contains truncated data belonging to valid data sub-superblocks, an additional sub-superblock E_M sufficient to accommodate the truncated data is allocated, the truncated data of each valid data sub-superblock in E_GC is migrated as a whole to E_M, and the base address of the corresponding data sub-superblock in the partition mapping table is modified.

When additional sub-superblocks are reclaimed in this manner, the truncated data belonging to the same data sub-superblock is migrated as a whole, so the offsets and lengths recorded in the out-of-band areas of the flash pages need not be modified; only the base address of the data sub-superblock in the partition mapping table is updated, which makes the related information easy to maintain and update. Thanks to the compression-rate-adaptive slot configuration, relatively little data is truncated and relatively little data occupies the additional sub-superblocks, so garbage collection of the additional storage area is infrequent and the performance overhead caused by data migration is negligible.
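
The reclamation of an additional sub-superblock can be sketched in Python as follows. The model is an assumption made for illustration only: `ExtraSubSuperBlock`, `collect_extra_sub_superblock` and the layout of `base_table` are hypothetical, and an additional sub-superblock is reduced to a byte log plus one contiguous region per owning data sub-superblock. Each region is moved as a whole, so only the base address changes while per-page offsets and lengths stay valid.

```python
from dataclasses import dataclass, field


@dataclass
class ExtraSubSuperBlock:
    log: bytearray = field(default_factory=bytearray)
    # owner data sub-superblock id -> (start, length) of its truncated data
    regions: dict = field(default_factory=dict)

    def append_region(self, owner, data):
        start = len(self.log)
        self.log += data
        self.regions[owner] = (start, len(data))
        return start


def collect_extra_sub_superblock(victim, dest, dest_id, valid_owners, base_table):
    """Migrate the regions of still-valid owners from victim to dest, then erase victim."""
    for owner, (start, length) in victim.regions.items():
        if owner not in valid_owners:
            continue  # data of invalid data sub-superblocks is simply dropped
        new_base = dest.append_region(owner, victim.log[start:start + length])
        # Only the base address changes; offsets stored out-of-band stay as they are.
        base_table[owner] = (dest_id, new_base)
    victim.log.clear()      # models erasing the flash blocks
    victim.regions.clear()


# Usage: D2 is no longer valid, so only D1 and D3 are migrated.
victim = ExtraSubSuperBlock()
victim.append_region("D1", b"aaaa")
victim.append_region("D2", b"bb")
victim.append_region("D3", b"cccccc")
base_table = {"D1": ("E0", 0), "D2": ("E0", 4), "D3": ("E0", 6)}
dest = ExtraSubSuperBlock()
collect_extra_sub_superblock(victim, dest, "E1", {"D1", "D3"}, base_table)
```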

It will be readily appreciated that, to read a truncated compressed page, the directly addressed data is first read from the corresponding slot of the data sub-superblock, and the truncated data is then accessed. To access the truncated data, the base address is first obtained from the cache and combined with the offset stored in the out-of-band data area of the flash page to obtain the complete target physical flash location; data of the recorded length is then read starting from that location to obtain the truncated data. Finally, the directly addressed data and the truncated data are concatenated to obtain the complete compressed page.
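
The read path for a truncated page can be sketched in Python as follows. The function `read_compressed_page`, the 16-byte slot, and the flat byte log standing in for the additional sub-superblock are hypothetical simplifications. Python's `zlib` is used here only as a stand-in compressor to show that the reassembled page decompresses correctly; it is not the compression module of this embodiment.

```python
import zlib


def read_compressed_page(slot_data, oob, base_address, extra_log):
    """Rebuild a compressed page from its slot and, if truncated, its tail.

    oob is None for an untruncated page, else (offset, length) as recorded
    in the out-of-band area of the flash page.
    """
    if oob is None:
        return bytes(slot_data)
    offset, length = oob
    start = base_address + offset          # complete physical location
    truncated = extra_log[start:start + length]
    if len(truncated) != length:
        raise ValueError("truncated data lies outside the additional sub-superblock")
    return bytes(slot_data) + bytes(truncated)


# Usage with a deliberately tiny slot so that the page is truncated.
SLOT_SIZE = 16
page = bytes(range(256)) * 16              # 4 KiB logical page
compressed = zlib.compress(page)
head, tail = compressed[:SLOT_SIZE], compressed[SLOT_SIZE:]

# Additional sub-superblock log: 8 unrelated bytes, then this block's region.
extra_log = bytearray(b"\xff" * 8) + bytearray(b"\xee" * 5) + tail
base_address = 8                           # from the partition mapping table
oob = (5, len(tail))                       # offset relative to base, length

restored = zlib.decompress(read_compressed_page(head, oob, base_address, extra_log))
```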

To further improve storage space utilization, this embodiment further includes: when the partitioned solid state disk is idle, if there is free space at the tail of any data sub-superblock B_S and the free space is sufficient to accommodate the truncated data corresponding to the data sub-superblock B_S, the truncated data is migrated from the additional sub-superblock to the tail of the data sub-superblock B_S, and the partition mapping table is modified accordingly.

Since this write-back operation is performed only when the partitioned solid state disk is idle, it does not negatively impact user performance. Writing the truncated data back makes fuller use of the storage space in the data sub-superblock, and accessing truncated data no longer requires accessing multiple sub-superblocks. It also reduces the space that truncated data occupies in the additional sub-superblocks, which further reduces the data migration overhead when additional sub-superblocks are reclaimed.
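
The idle-time write-back can be sketched in Python as follows. The names `DataSubSuperBlock` and `write_back_truncated`, and the encoding of a mapping-table entry as (location, base, length), are hypothetical; the embodiment only states that the partition mapping table is modified accordingly. The sketch moves the truncated data only when the free tail space is sufficient.

```python
from dataclasses import dataclass, field


@dataclass
class DataSubSuperBlock:
    capacity: int                                   # usable bytes (simplified)
    payload: bytearray = field(default_factory=bytearray)

    @property
    def free_tail(self):
        return self.capacity - len(self.payload)


def write_back_truncated(block_id, data_block, extra_log, base_table):
    """Move the block's truncated data to its own tail if it fits; return True if moved."""
    location, base, length = base_table[block_id]
    if location != "extra" or length > data_block.free_tail:
        return False
    new_base = len(data_block.payload)
    data_block.payload += extra_log[base:base + length]
    # The old region in the additional sub-superblock is now invalid and is
    # left to garbage collection; only the mapping entry is updated here.
    base_table[block_id] = ("data", new_base, length)
    return True


# Usage: 6 free bytes at the tail are enough for 4 bytes of truncated data.
block = DataSubSuperBlock(capacity=16, payload=bytearray(b"x" * 10))
extra_log = bytearray(b"..TAIL..")
base_table = {"D1": ("extra", 2, 4)}
moved = write_back_truncated("D1", block, extra_log, base_table)
```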

In general, the embodiment can effectively reduce the data writing quantity, improve the writing performance and durability of the solid-state disk, reduce the CPU load of the host end, improve the utilization rate of the storage space and reduce the storage cost.

Example 2:

A storage controller based on a partitioned solid state disk, wherein a data compression module is integrated in the firmware of the partitioned solid state disk, the storage space of the partitioned solid state disk is divided into a data storage area and an additional storage area, super blocks in the data storage area are divided into data sub-super blocks, and partitions in the additional storage area are divided into additional sub-super blocks; the controller includes:

a computer readable storage medium storing a computer program;

a processor configured to read a computer program in a computer-readable storage medium and execute the transparent compression method based on the partitioned solid state disk provided in embodiment 1.

Example 3:

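A storage system based on a partitioned solid state disk, comprising:

a partitioned solid state disk, in whose firmware a data compression module is integrated, and whose storage space is divided into a data storage area and an additional storage area, super blocks in the data storage area being divided into data sub-super blocks and partitions in the additional storage area being divided into additional sub-super blocks;

and the storage controller based on a partitioned solid state disk provided in embodiment 2 above.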

It will be readily appreciated by those skilled in the art that the foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A transparent compression method based on a partitioned solid-state disk, characterized in that a data compression module is integrated into the firmware of the partitioned solid-state disk, and its storage space is divided into a data storage area and an additional storage area, each super block in the data storage area is equally divided into a plurality of data sub-super blocks, and each super block in the additional storage area is equally divided into a plurality of additional sub-super blocks; the transparent compression method comprises: for a logical partition Z_L to be written, performing the following steps:

(S1) allocating a free data sub-super block to the logical partition Z_L, recording the mapping relationship between the logical partition Z_L and the allocated data sub-super block into a partition mapping table in the memory, and selecting the first logical page in the logical partition Z_L as the current logical page;

(S2) compressing the current logical page by using the data compression module to obtain a compressed page; if the currently allocated data sub-super block is full, allocating a free data sub-super block to the logical partition Z_L, recording the mapping relationship between the logical partition Z_L and the allocated data sub-super block in the partition mapping table, and proceeding to step (S3); otherwise, directly proceeding to step (S3);

(S3) writing the compressed page into the currently allocated data sub-super block;

(S4) if there are still unwritten logical pages in the logical partition Z_L, selecting the next logical page as the current logical page and returning to step (S2); otherwise, ending the writing process of the logical partition Z_L.

2. The transparent compression method based on a partitioned solid state disk according to claim 1, characterized in that, in the allocated data sub-super block, each flash memory page is equally divided into a plurality of slots, and each logical page in the logical partition Z_L is mapped to one slot;

the transparent compression method further comprises: allocating a currently unoccupied additional sub-super block to the logical partition Z_L to be written;

and the step (S3) comprises:

if the size of the compressed page is less than or equal to the size of the currently mapped slot, writing the compressed page into the currently mapped slot;

if the size of the compressed page is larger than the size of the currently mapped slot, performing the following operations:

(S3-1) writing the leading portion of the compressed page, equal in size to the slot, into the currently mapped slot as directly addressed data, and appending the remaining trailing portion as truncated data to the allocated additional sub-super block in the form of a log;

(S3-2) writing the starting address, within the additional sub-super block, of the truncated data belonging to the data sub-super block into the partition mapping table as the base address of the data sub-super block, and writing the offset of each piece of truncated data relative to the base address, together with its length, as metadata into the out-of-band data area of the flash memory page where the corresponding directly addressed data is located.

3. The transparent compression method based on a partitioned solid state disk according to claim 2, further comprising: dividing the logical partition Z_L into a plurality of analysis windows; for each analysis window, determining the slot size mapped by the logical pages therein in the following manner:

obtaining the distribution function of the compression rate of the logical pages in the previous analysis window, and setting the slot size S mapped by the logical pages in the current analysis window according to the compressed page size S_z corresponding to the compression rate at the preset percentile; the slot size S satisfies S ≥ S_z, and S is the slot size closest to S_z among the slot sizes that evenly divide the flash memory page size.

4. The transparent compression method based on a partitioned solid state disk according to claim 3, wherein the preset percentile is 70%.

5. The transparent compression method based on a partitioned solid state disk according to any one of claims 2 to 4, characterized in that it further comprises: performing a garbage collection operation in units of sub-super blocks; and when performing the garbage collection operation, for the data sub-super block D_GC to be recycled, performing the following operations:

if there is no valid data in the data sub-super block D_GC, the flash memory blocks therein are directly erased;

if there is valid data in the data sub-super block D_GC, a data sub-super block D_M that is sufficient to accommodate the valid data is allocated, and after the valid data in the data sub-super block D_GC is aggregated and migrated to the data sub-super block D_M inside the partitioned solid state disk, the flash memory blocks in the data sub-super block D_GC are erased;

the sub-super block is a data sub-super block or an additional sub-super block.

6. The transparent compression method based on a partitioned solid state disk according to claim 5, further comprising: when performing a garbage collection operation, for the additional sub-super block E_GC to be recycled, performing the following operations:

if the additional sub-super block E_GC contains no truncated data belonging to a valid data sub-super block, the flash memory blocks therein are directly erased;

if the additional sub-super block E_GC contains truncated data belonging to valid data sub-super blocks, an additional sub-super block E_M that is sufficient to accommodate the truncated data is allocated, the truncated data of each valid data sub-super block in the additional sub-super block E_GC is migrated as a whole to the additional sub-super block E_M, and the base address of the corresponding data sub-super block in the partition mapping table is modified.

7. The transparent compression method based on a partitioned solid-state disk according to claim 6, further comprising: when the partitioned solid-state disk is idle, for any data sub-super block B_S, if there is free space at its tail and the free space is sufficient to accommodate the truncated data corresponding to the data sub-super block B_S, the truncated data is migrated from the additional sub-super block to the tail of the data sub-super block B_S, and the partition mapping table is modified accordingly.

8. A storage controller based on a partitioned solid-state disk, characterized in that a data compression module is integrated into the firmware of the partitioned solid-state disk, and its storage space is divided into a data storage area and an additional storage area, a super block in the data storage area is divided into data sub-super blocks, and a partition in the additional storage area is divided into additional sub-super blocks; the controller comprises:

A computer-readable storage medium for storing a computer program;

The processor is used to read the computer program in the computer-readable storage medium and execute the transparent compression method based on the partitioned solid state disk according to any one of claims 1 to 7.

9. A storage system based on a partitioned solid state disk, comprising:

A partitioned SSD has a data compression module integrated into its firmware, and its storage space is divided into a data storage area and an additional storage area. The superblock in the data storage area is divided into data sub-superblocks, and the partition in the additional storage area is divided into additional sub-superblocks.

And the storage controller based on partitioned solid state disk as described in claim 8.