CN115826856A - Flash memory exception handling method, system and storage medium

Info

Publication number: CN115826856A
Application number: CN202211468251.3A
Authority: CN
Inventors: 李舒
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-03-21

Abstract

Embodiments of the present application provide a flash memory exception handling method, system, and storage medium. In these embodiments, for a super block that has no free flash memory blocks reserved for replacing abnormal flash memory blocks, when an unreadable abnormal flash memory block occurs in the super block, the data of the abnormal block can be recovered from the application data of the other, normal data flash memory blocks and the check data of the redundant blocks. A RAID is then built from the normal data flash memory blocks to store and protect the recovered data, so that the data of the abnormal flash memory block can be stored without reducing data reliability. Compared with handling abnormal flash memory by reserving free flash memory blocks as replacements, the capacity of the super block can be reused, which improves the capacity utilization of the super block.

Description

Flash memory exception handling method, system and storage medium

Technical Field

The present application relates to the technical field of data storage, and in particular to a flash memory exception handling method, system, and storage medium.

Background

The proliferation of the Internet and electronic commerce generates massive amounts of data. A large number of storage systems and servers have been created in the art to store and access this data. A storage system or server may include volatile memory (e.g., dynamic random access memory) and multiple storage drives (e.g., solid-state drives). A storage drive may include non-volatile memory (e.g., NAND flash memory) for persistent storage. The memory in a server plays a key role in the performance and capacity of the storage system.

Storage drives typically use an internal data buffer as a write cache to reduce write latency. It is well known that flash memory blocks may fail while data is being read from or written to flash memory. In the prior art, blank flash memory blocks are often reserved in each flash memory chip, and when an abnormal flash memory block occurs, a reserved blank block replaces it. Once the blank blocks reserved in a flash memory chip are exhausted, the chip as a whole is no longer usable, which reduces the capacity of the storage drive and leaves flash memory capacity underutilized.

Summary of the Invention

Aspects of the present application provide a flash memory exception handling method, system, and storage medium to improve the utilization of flash memory capacity.

An embodiment of the present application provides a flash memory exception handling method, including:

for an abnormal flash memory block in the flash memory, detecting whether the abnormal flash memory block is readable;

if the abnormal flash memory block is unreadable, recovering the application data of the abnormal flash memory block according to the application data of the target flash memory blocks in the target super block to which the abnormal flash memory block belongs and the check data of the redundant block, where the target flash memory blocks are the flash memory blocks in the target super block other than the abnormal flash memory block and the redundant block;

setting the target flash memory blocks as a first redundant array of independent disks (RAID), and calculating check data for the portion of the application data of the abnormal flash memory block that is to be written to a first super page of the first RAID; and

writing the application data to be written to the first super page, together with the check data corresponding to that application data, into the first super page.
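
As an illustration only, the recovery and rewrite steps above can be sketched in Python. The sketch assumes that the check data is a bytewise XOR parity and that there is a single redundant block; the method itself does not fix a particular redundancy algorithm. The function names `xor_pages`, `recover_page`, and `build_super_page` are hypothetical.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equal-length pages (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def recover_page(target_pages, check_page):
    """Rebuild the unreadable page from the surviving pages of its super page."""
    return xor_pages(list(target_pages) + [check_page])


def build_super_page(recovered_pages, width):
    """Pack up to width - 1 recovered pages plus their parity into one super
    page of a RAID formed by `width` target flash memory blocks (assumption)."""
    data = list(recovered_pages[:width - 1])
    return data + [xor_pages(data)]


# One old super page: three data pages, the second of which became unreadable.
old_data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
old_check = xor_pages(old_data)

recovered = recover_page([old_data[0], old_data[2]], old_check)
first_super_page = build_super_page([recovered], width=3)
```

With XOR parity, the XOR of every page in a super page, check page included, is all zeros, which gives a simple consistency check on `first_super_page`.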

An embodiment of the present application further provides a computing system, including a processor and a storage drive, where the storage drive includes a controller and a flash memory;

the controller includes firmware and a channel management component; the controller communicates with the flash memory through the channel management component and accesses the flash memory through channels; and

the firmware is configured to perform the steps of the above flash memory exception handling method.

An embodiment of the present application further provides a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above flash memory exception handling method.

In the embodiments of the present application, for a super block that has no free flash memory blocks reserved for replacing abnormal flash memory blocks, when an unreadable abnormal flash memory block occurs in the super block, the data of the abnormal block can be recovered from the application data of the other, normal data flash memory blocks and the check data of the redundant blocks. A RAID is then built from the normal data flash memory blocks to store and protect the recovered data, so that the data of the abnormal flash memory block can be stored without reducing data reliability. Compared with handling abnormal flash memory by reserving free flash memory blocks as replacements, the capacity of the super block can be reused, which helps improve the capacity utilization of the super block.

Brief Description of the Drawings

The drawings described here provide a further understanding of the present application and constitute a part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the logical structure of a flash memory chip provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the logical structure of a flash memory provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the abnormal flash memory handling approach of the conventional solution;

FIG. 5 is a schematic diagram of the abnormal flash memory handling approach provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of the handling approach provided by an embodiment of the present application when the abnormal flash memory block is readable;

FIG. 7 is a schematic diagram of the handling approach provided by an embodiment of the present application when the abnormal flash memory block is unreadable;

FIG. 8 is a schematic flowchart of an abnormal flash memory handling method provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the process of the abnormal flash memory handling approach provided by an embodiment of the present application;

FIG. 10 is a schematic flowchart of another abnormal flash memory handling method provided by an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below in conjunction with specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of this application.

To address the low flash memory capacity utilization caused by existing abnormal flash memory handling approaches, the embodiments of the present application provide a management method for a super block that has no free flash memory blocks reserved for replacing abnormal flash memory blocks. For an unreadable abnormal flash memory block in such a super block, the data of the abnormal block is recovered from the application data of the other, normal data flash memory blocks and the check data of the redundant blocks. A RAID is then built from the normal data flash memory blocks to store and protect the recovered data, so that the data of the abnormal flash memory block can be stored without reducing data reliability. Compared with handling abnormal flash memory by reserving free flash memory blocks as replacements, the capacity of the super block can be reused, which helps improve the capacity utilization of the super block.

The technical solutions provided by the embodiments of the present application are described in detail below in conjunction with the accompanying drawings.

It should be noted that the same reference numerals denote the same objects in the following drawings and embodiments. Therefore, once an object is defined in one drawing or embodiment, it need not be discussed further in subsequent drawings and embodiments.

FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application. As shown in FIG. 1, the computing system S10 may include a processor 10 and a storage drive 20, where the processor 10 is communicatively connected to the storage drive 20. The processor 10 may be implemented as the central processing unit (CPU) of a computing device. The storage drive 20 may be a solid-state drive (SSD). There may be one or more storage drives 20, where "more" means two or more.

In some embodiments, the computing system S10 may further include a volatile memory 30. The volatile memory 30 may be random access memory (RAM), dynamic random access memory (DRAM), or the like, and may serve as the main memory of the computing device. The computing system S10 may also include other components such as a network card 40. FIG. 1 shows only some components of the computing system schematically; it does not mean that the computing system must include all the components shown in FIG. 1, nor that it can include only those components. In this embodiment, the computing system may be a computing system within a computing device, such as a server. In the field of cloud storage or cloud computing, the components of the computing system may also be located in different servers.

As shown in FIG. 1, the storage drive 20 may include a controller 21 and a flash memory 22. The flash memory 22 is a non-volatile storage medium and may be NAND (Not AND) flash memory, NOR (Not OR) flash memory, or the like. The controller 21 may include an interface to the host and an interface to the flash memory 22. The interface to the host mainly refers to the interface to the processor 10 of the host. The controller 21 may also include a write buffer 211, firmware 212, and a channel management component 213. The write buffer 211 has power-loss protection and is used to buffer the data to be written that is contained in write commands sent by the host.

The firmware 212 is the component that executes the commands, instructions, and/or code described in this application. In this application, the firmware 212 receives a write command, writes the data to be written contained in the write command into the write buffer 211, and then returns a write-success message to the processor 10. The host (not shown in FIG. 1) then regards the write command as having completed successfully. The firmware 212 may also write the data held in the write buffer 211 to the flash memory 22 asynchronously.

As shown in FIG. 1, the flash memory 22 may include multiple channels 221, each provided with multiple flash memory chips 222. For NAND flash memory, a flash memory chip 222 may be a NAND die. FIG. 2 is a schematic structural diagram of a flash memory chip 222. As shown in FIG. 2, each flash memory chip 222 may include one or more flash memory planes, and multiple planes can operate concurrently. Each plane may include one or more flash memory blocks; the block is the smallest unit of erasure. Each block may include one or more flash memory pages; the page is the smallest read/write unit of the flash memory 22.

In actual use, flash memory is usually erased and written in units of super blocks. The concept of a super block is explained below.

As shown in FIG. 3, the flash memory 22 includes multiple channels 221, each provided with multiple flash memory chips 222. For the organization of a flash memory chip 222, see FIG. 2 above. As shown in FIG. 3, a super block may include flash memory blocks from multiple flash memory chips located on different channels; the flash memory blocks in the same row of FIG. 3 form one super block. Correspondingly, a super page, also called a page stripe, may include flash memory pages from multiple flash memory chips on different channels; that is, the flash memory pages in the same row of a super block form one super page. In FIG. 3, flash memory pages filled in gray are physical pages that already hold data, while pages filled with oblique lines and white pages are free.
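
The row-wise grouping described above can be sketched as follows. This is a minimal illustration, assuming one contributing chip per channel and ignoring planes; the `PageAddr` type and both helper functions are hypothetical and not part of the described embodiment.

```python
from typing import NamedTuple


class PageAddr(NamedTuple):
    channel: int
    chip: int
    block: int
    page: int


def super_block(block_row, num_channels, chip=0):
    """(channel, chip, block) of each member block: one block per channel."""
    return [(ch, chip, block_row) for ch in range(num_channels)]


def super_page(block_row, page_row, num_channels, chip=0):
    """Pages with the same page index across the member blocks of a super block."""
    return [PageAddr(ch, chip, block_row, page_row) for ch in range(num_channels)]
```

For example, `super_page(2, 5, 4)` lists page 5 of block 2 on each of four channels, which together form one page stripe.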

While data is being written to the flash memory 22, the firmware 212 may write it to the flash memory pages of a super page in a "horizontal" manner. That is, the firmware 212 may write one page of data to the first free physical page of the first flash memory chip 222 in the super page (the physical page filled with oblique lines in FIG. 3). The firmware 212 may then write the next page of data to the first free flash memory page of the next flash memory chip 222 in order (also filled with oblique lines in FIG. 3), and so on, writing page data one page at a time into the first free physical pages of the super page until the check data of the super page is written into its redundant page.
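
The horizontal write order can be illustrated with a short sketch that maps each incoming data page to a (super page, chip) slot. It assumes that the last chip position in every super page holds the check data, which is one possible layout rather than the layout of the embodiment; `horizontal_layout` is a hypothetical name.

```python
def horizontal_layout(num_data_pages, data_chips):
    """Return the (super_page_index, chip_index) slot of each data page in
    write order. Chip index `data_chips` is assumed to hold the check page
    of every super page, so data only occupies chips 0..data_chips-1."""
    return [divmod(i, data_chips) for i in range(num_data_pages)]
```

With three data chips, the fourth data page wraps to chip 0 of the next super page, after the check page of the first super page has been written.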

In practical applications, the flash memory 22 may develop faulty flash memory blocks, referred to as abnormal flash memory blocks. In the conventional solution, as shown in FIG. 4, free flash memory blocks are usually reserved in each flash memory chip 222 to handle abnormal blocks; when an abnormal flash memory block occurs, a reserved free block replaces it. In the super block of FIG. 4, "blk" denotes a flash memory block, a block marked with "×" is an abnormal flash memory block, and the block that replaces it is a reserved free block. As shown in FIG. 4, an abnormal flash memory block may be replaced by a reserved free block on the same channel.

Because data has already been written to the abnormal flash memory block, that data must be copied to the reserved free block, and the abnormal block is tagged as abnormal and no longer used. This approach requires every flash memory chip to reserve free flash memory blocks. When the free blocks reserved in a flash memory chip are exhausted, the chip as a whole becomes unusable, so the capacity of the storage drive (such as an SSD) drops rapidly and flash memory capacity utilization is low.

To improve the utilization of flash memory capacity, the embodiments of the present application provide a new flash memory exception handling approach, which is described below by way of example in conjunction with specific embodiments.

FIG. 5 is a schematic structural diagram of a storage drive without reserved free flash memory blocks, provided by an embodiment of the present application. The gray-filled parts marked "s" in FIG. 5 are the free flash memory blocks that the conventional solution would reserve. In the embodiments of the present application, no free flash memory blocks are reserved in the flash memory chips 222, so the gray-filled blocks in FIG. 5 are released for normal use. In FIG. 5, when an abnormal flash memory block appears in a flash memory chip 222, the composition of the super block does not need to change; until the next garbage collection (GC), the data already written can remain in the original super block, which reduces data migration and media write amplification. GC and write amplification are explained below.

GC is the process of moving existing data in the flash memory to other flash memory locations and permanently deleting data that is no longer needed. Flash memory is written in units of flash memory pages but erased in units of flash memory blocks. To delete invalid data, the valid data in a flash memory block must therefore first be copied into pages of a fresh flash memory block, so that the original block, which then holds only invalid data, can be erased as a whole. New data can be written to the block only after it has been erased.

When new data is to be written and the controller 21 cannot find a writable flash memory page, GC is performed. The GC mechanism merges the valid data of some flash memory blocks into other flash memory blocks, erases the invalid data in the original blocks, and then writes the new data into them. Over the whole process, in addition to the new data, the flash memory also writes the data merged from other flash memory blocks. This effect is called write amplification.
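
Write amplification is commonly quantified as the ratio of pages physically written to flash to pages the host asked to write. This ratio is a widely used convention and is not a formula given in the present application; the function below is a hypothetical helper that only illustrates the arithmetic.

```python
def write_amplification(host_pages_written, gc_pages_copied):
    """Ratio of total flash page writes to host-requested page writes."""
    if host_pages_written <= 0:
        raise ValueError("host_pages_written must be positive")
    return (host_pages_written + gc_pages_copied) / host_pages_written
```

For instance, if GC copies 50 valid pages while the host writes 100 pages, the flash memory performs 150 page writes and the ratio is 1.5. Keeping written data in the original super block until the next GC avoids extra copies and so keeps this ratio lower.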

In the embodiments of the present application, to improve the stability and security of data storage, a redundancy check algorithm may be used when storing data in the flash memory. Specifically, the data may be encoded with the redundancy check algorithm to obtain the corresponding check data, and the data to be written and the check data are then written to the flash memory 22 in the storage layout that corresponds to the algorithm. If part of the data is later lost or corrupted, the lost or corrupted data can be recovered using the redundancy check algorithm and the remaining correct data. The embodiments of the present application do not limit the specific form of the redundancy check algorithm. Optionally, it may be a parity check algorithm, an error checking and correction (ECC) algorithm, a cyclic redundancy check (CRC) algorithm, an erasure code algorithm, or the like.

The number of data bits that a redundancy check algorithm can recover is limited. For example, an ECC algorithm may correct 1-bit errors and detect 2-bit errors. Correspondingly, the bit-count threshold for the ECC algorithm may be 1.
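
To make the 1-bit correction limit concrete, the sketch below uses a Hamming(7,4) code. This code is chosen purely as a familiar example; the application does not name a specific ECC code, and the two function names are hypothetical. The sketch covers single-bit correction only; detecting 2-bit errors would additionally require an overall parity bit, which is omitted here.

```python
def hamming74_encode(d):
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2, 4."""
    c = [0] * 8                      # index 0 unused, positions 1..7
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def hamming74_correct(code):
    """Return (data_bits, error_position); position 0 means no error found."""
    c = [0] + list(code)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    pos = s1 | (s2 << 1) | (s4 << 2)  # syndrome equals the flipped position
    if pos:
        c[pos] ^= 1
    return [c[3], c[5], c[6], c[7]], pos
```

Flipping any single bit of an encoded word yields a nonzero syndrome that points at the flipped position, so the original four data bits are restored. Two flipped bits exceed the threshold and would be miscorrected by this minimal version.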

In the embodiments of the present application, redundant array of independent disks (RAID) technology may be used for storing data in flash memory. A super block in the flash memory can form a RAID. In a RAID scenario, a super block may include data flash memory blocks and redundant blocks. A data flash memory block is a flash memory block used to store application data; a redundant block is a flash memory block that stores check data. The number of redundant blocks is determined by the redundancy check algorithm. For example, with an ECC algorithm, a super block may include one redundant block, and the other flash memory blocks in the super block are data flash memory blocks.

Based on the above RAID, when writing data to the flash memory, the firmware 212 may also split the data to be written according to the number of data flash memory blocks and the capacity of a flash memory page, obtaining data slices. The amount of data in each slice equals the capacity of the data flash memory pages in one super page. The firmware 212 may then use the redundancy check algorithm to calculate the check data of each data slice and write each slice and its check data, page by page, into the super block of the RAID. In the embodiments of the present application, the capacity of a super block can be set flexibly according to actual requirements. Optionally, a super block may include one flash memory block from each channel.
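
The slicing step can be sketched as follows, again assuming XOR parity and a single redundant block, and assuming that a short final slice is zero-padded (the application does not say how a partial slice is handled). `make_stripes` is a hypothetical name.

```python
from functools import reduce


def make_stripes(data, page_size, data_blocks):
    """Split `data` into slices of page_size * data_blocks bytes and return,
    for each slice, its data pages followed by one XOR check page."""
    stripe_bytes = page_size * data_blocks
    padded = data + bytes(-len(data) % stripe_bytes)   # zero padding (assumption)
    stripes = []
    for off in range(0, len(padded), stripe_bytes):
        chunk = padded[off:off + stripe_bytes]
        pages = [chunk[i:i + page_size] for i in range(0, stripe_bytes, page_size)]
        check = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*pages))
        stripes.append(pages + [check])
    return stripes
```

Each returned stripe corresponds to one super page: `data_blocks` data pages written horizontally, followed by the check page for the redundant block.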

For flash memory that stores data using the RAID technology described above, the flash memory exception handling method provided by the embodiments of the present application is described below by way of example with reference to FIG. 6 and FIG. 7. A super block includes M data flash memory blocks and P redundant blocks. A data flash memory block is a flash memory block used to store application data; a redundant block is a flash memory block that stores check data. After N abnormal flash memory blocks have appeared during use, the super block is reduced to (M-N) data flash memory blocks and P redundant blocks. P is a positive integer whose value is determined by the redundancy check algorithm; for example, P = 1 for an ECC algorithm.

In the embodiments of the present application, the firmware 212 may check the state of the flash memory during reads and writes. For example, while reading the flash memory, the firmware 212 may detect whether a read error occurs on the flash memory page currently being read. If data is read from that page, the firmware determines that the flash memory block containing the page has no read error; if no data can be read from the page, the firmware determines that a read error has occurred in the flash memory block containing the page. For an abnormal flash memory block with a read error, the physical address of the block may be tagged with a read-error label in the flash mapping layer, so that the data of that block is reclaimed first when the next garbage collection of the target super block takes place.
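
The read-path check and tagging can be modeled with a toy sketch in which the flash memory is a dictionary and a missing entry stands for a failed read. The dictionary model, the `(block, page)` address shape, and the name `read_page` are all assumptions for illustration, not the firmware's actual interfaces.

```python
def read_page(flash, addr, read_error_blocks):
    """Read one page. `flash` maps (block, page) -> bytes; a missing entry
    models a page from which no data can be read. On failure, the block is
    tagged so that a later garbage collection can reclaim it first."""
    data = flash.get(addr)
    if data is None:
        block, _page = addr
        read_error_blocks.add(block)   # stands in for the mapping-layer label
    return data
```

A successful read leaves the tag set unchanged, while a failed read records the whole block, not just the page, as abnormal.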

While writing the flash memory, the firmware 212 may read back the data written to a flash memory page of the flash memory 22 and compare it with the current page data held in the page buffer. If the data written to the flash memory page is identical to the current page data in the page buffer, the firmware determines that no write error occurred on the page. If the two are inconsistent, the firmware determines that a write error occurred on the page. The flash memory block containing a page with a write error is an abnormal flash memory block with a write error. If the firmware 212 cannot read the written data back from the page at all, both a read error and a write error have occurred. An abnormal flash memory block with a write error may be marked read-only.
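
The read-back comparison can be summarized as a small classification function. The enum and function names are hypothetical, and `None` is used here to model a read-back that returns no data.

```python
from enum import Enum


class WriteResult(Enum):
    OK = "ok"
    WRITE_ERROR = "write_error"
    READ_AND_WRITE_ERROR = "read_and_write_error"


def verify_write(page_buffer, read_back):
    """Compare the data read back from the flash page with the page buffer."""
    if read_back is None:              # nothing could be read back
        return WriteResult.READ_AND_WRITE_ERROR
    if read_back != page_buffer:       # readable, but content differs
        return WriteResult.WRITE_ERROR
    return WriteResult.OK
```

A block whose page yields `WRITE_ERROR` may still be readable, which is the distinction that decides between the two handling paths described next.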

In the embodiments of the present application, for an abnormal flash memory block, the firmware 212 may detect whether the block is readable. Optionally, the firmware 212 may perform a read operation on the abnormal flash memory block. If the data is read successfully, the block is determined to be readable; if the data cannot be read out, the block is determined to be unreadable. Because no data can be read from an abnormal flash memory block with a read error, such a block is unreadable.

The data in an abnormal flash memory block with a write error may or may not be readable. A readable abnormal flash memory block does not affect the accuracy of subsequent reads. Therefore, in this embodiment, as shown in FIG. 6, the other data flash memory blocks and the redundant block of the super block to which the abnormal flash memory block belongs may form a new RAID, and data is written to the new RAID until the next GC. For ease of description and distinction, the flash memory blocks in that super block other than the abnormal flash memory block are defined as target flash memory blocks. In FIG. 6, gray-filled rectangles are super pages that already hold data, and black-filled rectangles are flash memory pages on which a write error occurred but whose data is readable. The flash memory block to which such a page belongs is the abnormal flash memory block. The data flash memory blocks other than the abnormal flash memory block are the target flash memory blocks, shown in FIG. 6 with left-slanted hatching.

由于GC需要回收有效数据，基于此，在GC操作启动之后，固件212可优先回收异常闪存块中的应用数据；并回收目标闪存块的应用数据和冗余块的校验数据。之后，固件212可对异常闪存块所在的超级块进行擦除，并剔除异常闪存块。Since the GC needs to recover valid data, based on this, after the GC operation starts, the firmware 212 can preferentially recover the application data in the abnormal flash memory block; and recover the application data of the target flash memory block and the verification data of the redundant block. Afterwards, the firmware 212 can erase the super block where the abnormal flash memory block is located, and remove the abnormal flash memory block.

在另一些实施例中，异常闪存块的数据可能不可读取，这种情况会影响闪存的数据读取性能，需要对异常闪存块存储的数据进行回收。无法读取的异常闪存块可能发生读错误，也可能发生写错误。图7中仅以不可读取的异常闪存块发生写错误进行图示，但不构成限定。在图7中，灰色填充部分表示已写入数据的超级页，黑色填充部分表示发生写错误的闪存页，该闪存页不可读取。相应地，黑色填充部分的闪存页所在的闪存块即为异常闪存块，该异常闪存块不可读取。已写入数据的超级页中网格填充部分为异常闪存块的已写入数据的闪存页存储的应用数据，即异常闪存块的应用数据。已写入数据的超级页中左斜线填充部分为已写入数据的超级页的校验数据(即原始校验数据)。In some other embodiments, the data of the abnormal flash memory block may not be readable, which will affect the data reading performance of the flash memory, and the data stored in the abnormal flash memory block needs to be reclaimed. An abnormal flash block that cannot be read may have a read error and may have a write error. In FIG. 7 , it is only shown that a write error occurs in an unreadable abnormal flash memory block, but it does not constitute a limitation. In FIG. 7 , the gray filled part represents the superpage where data has been written, and the black filled part represents the flash memory page where a write error occurs, and the flash memory page cannot be read. Correspondingly, the flash memory block where the flash memory page of the black filled part is located is an abnormal flash memory block, and the abnormal flash memory block cannot be read. The grid filling part of the superpage with written data is the application data stored in the flash memory pages of the abnormal flash memory block, that is, the application data of the abnormal flash memory block. The part filled with left slashes in the superpage with written data is the verification data (ie original verification data) of the superpage with written data.

As shown in FIG. 7, for an unreadable abnormal flash memory block, the firmware 212 can read the application data of the target flash memory blocks in the target super block to which the abnormal flash memory block belongs, together with the check data of the target flash memory blocks (that is, the original check data in FIG. 7). In the embodiments of the present application, the target flash memory blocks are the flash memory blocks in the target super block other than the abnormal flash memory block and the redundant block, that is, the data flash memory blocks of the target super block other than the abnormal flash memory block. In FIG. 7, data flash memory block 1 and data flash memory block 2 are the target flash memory blocks.

进一步，固件212可根据目标闪存块的应用数据和冗余块的校验数据(即图7中的原校验数据)，恢复异常闪存块的应用数据。具体地，固件212可根据目标闪存块的应用数据和冗余块的校验数据(即图7中的原校验数据)，采用冗余校验算法恢复异常闪存块的应用数据。冗余校验算法为目标超级块存储应用数据所使用的冗余校验算法。可选地，固件212可以页为单位，恢复异常闪存块的应用数据。图7中以灰色网格填充的闪存块即为异常闪存块，该闪存块存储的数据，为异常闪存块的应用数据。Further, the firmware 212 can restore the application data of the abnormal flash memory block according to the application data of the target flash memory block and the verification data of the redundant block (ie, the original verification data in FIG. 7 ). Specifically, the firmware 212 can restore the application data of the abnormal flash memory block using a redundancy verification algorithm according to the application data of the target flash memory block and the verification data of the redundant block (ie, the original verification data in FIG. 7 ). The redundancy check algorithm is a redundancy check algorithm used by the target super block to store application data. Optionally, the firmware 212 may restore the application data of the abnormal flash memory block in units of pages. The flash memory block filled with gray grid in FIG. 7 is the abnormal flash memory block, and the data stored in the flash memory block is the application data of the abnormal flash memory block.

After the application data of the abnormal flash memory block has been recovered, it still has to be stored. In this embodiment, to ensure the stability and security of that application data, as shown in FIG. 7, the firmware 212 can set the target flash memory blocks as a new RAID and, for the portion of the recovered application data that is to be written into a superpage of the new RAID, calculate the corresponding check data.

例如，固件212可采用冗余校验算法，对异常闪存块的应用数据中待写入新的RAID的超级页的应用数据进行编码，以得到待写入该超级页的应用数据的校验数据。在本申请实施例中，为了便于描述和区分，将目标闪存块组成的RAID，定义为第一RAID。第一RAID的超级页，是指目标超级块的超级页中除异常闪存页之外的其它闪存页组成的页条带。第一RAID可包括：数据闪存块和冗余块。第一RAID的冗余块的数量由冗余校验算法决定。For example, the firmware 212 can use a redundancy check algorithm to encode the application data to be written into the superpage of the new RAID in the application data of the abnormal flash memory block, so as to obtain the verification data of the application data to be written into the superpage . In the embodiment of the present application, for the convenience of description and distinction, the RAID composed of target flash memory blocks is defined as the first RAID. The superpage of the first RAID refers to a page stripe composed of other flash pages except the abnormal flash page in the superpage of the target superblock. The first RAID may include: data flash blocks and redundant blocks. The number of redundant blocks of the first RAID is determined by a redundancy check algorithm.

例如，在图7中，数据闪存块1和数据闪存块2组成第一RAID。数据闪存块1为第一RAID的数据闪存块；数据闪存块2为第一RAID的冗余块。For example, in FIG. 7, data flash memory block 1 and data flash memory block 2 form the first RAID. Data flash memory block 1 is a data flash memory block of the first RAID; data flash memory block 2 is a redundant block of the first RAID.

Further, the firmware 212 can write the application data destined for a superpage of the first RAID, together with the check data corresponding to that application data, into that superpage. The firmware 212 performs this write in units of flash memory pages. For example, in FIG. 7, the application data destined for the superpages of the first RAID is written page by page into data flash memory block 1, and the check data of each superpage of the first RAID is written into the corresponding flash memory page of data flash memory block 2. In FIG. 7, the rectangles filled with a black grid represent the flash memory pages of the first RAID into which the application data of the abnormal flash memory block is written, and the rectangles filled with vertical lines represent the check data corresponding to that application data.

The embodiments of the present application provide a management method for a super block that has no free flash memory block reserved for replacing an abnormal flash memory block. For an unreadable abnormal flash memory block in such a super block, the data of the abnormal flash memory block can be recovered from the application data of the other, normal data flash memory blocks and the check data of the redundant block. A new RAID is then built from the normal data flash memory blocks of the same super block to store and protect the recovered data, so the data of the abnormal flash memory block can be stored without reducing data reliability. Compared with handling abnormal flash memory by reserving free flash memory blocks as replacements, this approach reuses the capacity of the super block and helps improve its capacity utilization.

另一方面，由于异常闪存块的应用数据还是写入到原超级块中的，超级块的构成无需更改，在下次GC整个超级块之前，超级块的已写入数据仍保存在原超级块中，因此，可减少数据搬迁量和介质写放大。On the other hand, since the application data of the abnormal flash block is still written into the original super block, the composition of the super block does not need to be changed. Before the next GC of the entire super block, the written data of the super block is still stored in the original super block. Therefore, the amount of data relocation and media write amplification can be reduced.

下面结合图8对本申请实施例提供的异常闪存处理方法进行示例性说明。如图8所示，本申请实施例提供的异常闪存处理方法，主要包括以下步骤：The method for processing abnormal flash memory provided by the embodiment of the present application will be exemplarily described below with reference to FIG. 8 . As shown in Figure 8, the abnormal flash memory processing method provided by the embodiment of the present application mainly includes the following steps:

801、针对闪存中的异常闪存块，检测异常闪存块是否可读取。801. For the abnormal flash memory block in the flash memory, detect whether the abnormal flash memory block is readable.

802. If the abnormal flash memory block is unreadable, recover the application data of the abnormal flash memory block according to the application data of the target flash memory blocks in the target super block to which the abnormal flash memory block belongs and the check data of the redundant block; the target flash memory blocks are the flash memory blocks in the target super block other than the abnormal flash memory block and the redundant block.

803、将目标闪存块设置为第一RAID。803. Set the target flash memory block as the first RAID.

804、计算异常闪存块的应用数据中待写入第一RAID的第一超级页的应用数据的校验数据。804. Calculate the check data of the application data to be written into the first superpage of the first RAID among the application data of the abnormal flash memory block.

805、将待写入第一超级页的应用数据和待写入第一超级页的应用数据对应的校验数据，写入第一超级页。其中，第一超级页为第一RAID的任一超级页。805. Write the application data to be written into the first superpage and the verification data corresponding to the application data to be written into the first superpage into the first superpage. Wherein, the first superpage is any superpage of the first RAID.

本申请实施例提供的异常闪存处理方法主要应用于上述图1的控制器中的固件。在本实施例中，闪存芯片中无预留的空闲闪存块。当闪存芯片中出现异常闪存块时，超级块的构成无需更改，在下次垃圾回收(Garbage Collection，GC)之前，已写入数据可保存在原超级块中，可减少数据迁移量和介质写入放大。The abnormal flash memory processing method provided in the embodiment of the present application is mainly applied to the firmware in the controller in FIG. 1 above. In this embodiment, there is no reserved free flash memory block in the flash memory chip. When an abnormal flash memory block appears in the flash memory chip, the composition of the super block does not need to be changed. Before the next garbage collection (Garbage Collection, GC), the written data can be saved in the original super block, which can reduce the amount of data migration and media write amplification .

在实施例中，为了提高数据存储的稳定性和安全性，可采用冗余校验算法将数据存储至闪存中。这样，在数据部分数据丢失或出错时，可利用冗余校验算法及其它正确的数据，恢复出丢失或出错数据。在本申请实施例中，可采用RAID技术进行闪存数据存储。闪存中超级块可形成RAID。在RAID场景中，超级块可包括：数据闪存块和冗余块。数据闪存块是指用于存储应用数据的闪存块。冗余块是指存储校验数据的闪存块。冗余块的数量由冗余校验算法决定。In an embodiment, in order to improve the stability and security of data storage, a redundancy check algorithm may be used to store data in the flash memory. In this way, when part of the data is lost or erroneous, the lost or erroneous data can be recovered by using the redundancy check algorithm and other correct data. In the embodiment of the present application, the RAID technology may be used for flash memory data storage. The super block in flash memory can form RAID. In a RAID scenario, a super block may include: a data flash block and a redundancy block. Data flash blocks are flash blocks used to store application data. A redundant block is a flash memory block that stores parity data. The number of redundant blocks is determined by the redundancy check algorithm.

基于上述RAID，在将待写入数据写入闪存中时，还可根据数据闪存块的数量、闪存页的容量，对待写入数据进行切分，以得到待写入数据的数据切片；每个数据切片的数据量等于超级页中数据闪存页的容量。之后，可采用冗余校验算法，计算每个数据切片的校验数据；并将数据切片和数据切片的校验数据以页为单位写入RAID的超级块中。Based on the above-mentioned RAID, when the data to be written is written in the flash memory, the data to be written can also be segmented according to the number of data flash memory blocks and the capacity of the flash memory page, so as to obtain data slices of the data to be written; each The data volume of the data slice is equal to the capacity of the data flash page in the super page. Afterwards, a redundancy check algorithm may be used to calculate the check data of each data slice; and write the data slice and the check data of the data slice into the super block of the RAID in units of pages.

For flash memory that stores data using the RAID technology described above, the flash memory exception handling method provided by the embodiments of the present application is described below by way of example with reference to FIG. 6 to FIG. 9. In the embodiments of the present application, the state of the flash memory can be checked while it is being read and written. For example, during a read, it can be detected whether a read error occurs on the flash memory page currently being read. If data is read from that page, it is determined that the flash memory block containing the page has no read error; if no data can be read from that page, it is determined that a read error has occurred in the flash memory block containing the page. For an abnormal flash memory block with a read error, the physical address of the block can be tagged as having a read error in the flash mapping layer, so that the data of this abnormal flash memory block is reclaimed first when the next garbage collection of the target super block takes place.

During a write, the data just written to a flash memory page can be read back and compared with the current page data held in the page buffer. If the data written to the flash memory page is identical to the current page data in the page buffer, it is determined that the current flash memory page has no write error. If the two differ, it is determined that a write error has occurred on the current flash memory page. The flash memory block containing the page with the write error is the abnormal flash memory block with a write error. Of course, if the firmware cannot read the written data back from the current flash memory page at all, both a read error and a write error have occurred. An abnormal flash memory block with a write error can be marked as read-only.
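The read-back comparison described above can be sketched as follows. This is a minimal illustration, not the firmware's actual interface: the function name `classify_page_write`, the string labels, and the use of `None` to model a page that cannot be read back are all assumptions made for the example.

```python
def classify_page_write(read_back, page_buffer):
    """Classify a just-programmed page from its read-back data.

    read_back is the data read from the flash page, or None if the page
    could not be read at all (hypothetical convention for this sketch).
    page_buffer is the data that was supposed to be written.
    """
    if read_back is None:
        # Nothing came back: treated as both a read error and a write error.
        return "read_and_write_error"
    if read_back == page_buffer:
        return "ok"
    # Data came back but differs from the page buffer: write error only.
    return "write_error"


def is_abnormal(status):
    """A block is abnormal as soon as one of its pages is not 'ok'."""
    return status != "ok"
```

A block for which `is_abnormal` holds would then be marked read-only, as stated in the text.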

在本申请实施例中，针对异常闪存块，在步骤801中，可检测异常闪存块是否可读取。可选地，固件可对异常闪存块进行读数据操作；若异常闪存块的数据读取成功，确定异常闪存块可读取。若异常闪存块的数据无法读出，确定异常闪存块不可读取。由于发生读错误的异常闪存块无法读取数据，因此，发生读错误的异常闪存块不可读取。In the embodiment of the present application, for the abnormal flash memory block, in step 801, it may be detected whether the abnormal flash memory block is readable. Optionally, the firmware can perform a data read operation on the abnormal flash memory block; if the data of the abnormal flash memory block is successfully read, it is determined that the abnormal flash memory block can be read. If the data of the abnormal flash memory block cannot be read, it is determined that the abnormal flash memory block cannot be read. Since data cannot be read from an abnormal flash memory block where a read error has occurred, the abnormal flash memory block where a read error has occurred cannot be read.

Data in an abnormal flash memory block that has suffered a write error may or may not be readable. An abnormal flash memory block whose data is still readable does not affect the accuracy of subsequent reads. Therefore, in this embodiment, as shown in FIG. 6, the other flash memory blocks of the target super block to which the abnormal flash memory block belongs, together with the redundant block, can form a new RAID, and data is written to the new RAID until the next GC. In the embodiments of the present application, for ease of description and distinction, the flash memory blocks of that super block other than the abnormal flash memory block are defined as target flash memory blocks.

由于GC需要回收有效数据，基于此，在GC操作启动之后，可优先回收异常闪存块中的应用数据；并回收目标闪存块的应用数据和冗余块的校验数据。之后，可对异常闪存块所在的超级块进行擦除，并剔除异常闪存块。Since the GC needs to recover valid data, based on this, after the GC operation is started, the application data in the abnormal flash block can be recovered first; and the application data of the target flash block and the verification data of the redundant block can be recovered. After that, the super block where the abnormal flash memory block is located can be erased, and the abnormal flash memory block can be removed.
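The reclaim order during GC can be sketched as below. The block identifiers and both function names are hypothetical, and the sketch only models ordering and membership, not the actual data movement or erase.

```python
def gc_reclaim_order(blocks, abnormal):
    """Return the blocks of a super block with abnormal blocks first.

    sorted() is stable, so the relative order of the remaining blocks
    (target blocks and the redundant block) is preserved.
    """
    return sorted(blocks, key=lambda b: b not in abnormal)


def blocks_after_gc(blocks, abnormal):
    """After the super block is erased, the abnormal blocks are removed from it."""
    return [b for b in blocks if b not in abnormal]
```

For a super block `["d0", "d1", "d2", "parity"]` in which `d1` is abnormal, the data of `d1` is reclaimed first and the super block continues with the three remaining blocks.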

在另一些实施例中，异常闪存块的数据可能不可读取，这种情况会影响闪存的数据读取性能，需要对异常闪存块存储的数据进行回收。无法读取的异常闪存块可能发生读错误，也可能发生写错误。图7中仅以不可读取的异常闪存块发生写错误进行图示，但不构成限定。In some other embodiments, the data of the abnormal flash memory block may not be readable, which will affect the data reading performance of the flash memory, and the data stored in the abnormal flash memory block needs to be reclaimed. An abnormal flash block that cannot be read may have a read error and may have a write error. In FIG. 7 , it is only shown that a write error occurs in an unreadable abnormal flash memory block, but it does not constitute a limitation.

结合图7和图8，针对不可读取的异常闪存块，固件可读取异常闪存块所属的目标超级块中目标闪存块的应用数据和目标超级块的校验数据。在本申请各实施例中，目标闪存块是指目标超级块中除异常闪存块和冗余块之外的闪存块，即目标超级块的数据闪存块中除异常闪存块之外的其它闪存块。Referring to FIG. 7 and FIG. 8 , for an unreadable abnormal flash memory block, the firmware can read the application data of the target flash memory block and the verification data of the target super block in the target super block to which the abnormal flash memory block belongs. In each embodiment of the present application, the target flash memory block refers to the flash memory block except the abnormal flash memory block and the redundant block in the target super block, that is, other flash memory blocks except the abnormal flash memory block in the data flash memory block of the target super block .

进一步，在步骤802中，可根据目标闪存块的应用数据和冗余块的校验数据，恢复异常闪存块的应用数据。具体地，可根据目标闪存块的应用数据和冗余块的校验数据，采用冗余校验算法恢复异常闪存块的应用数据。可选地，固件可以页为单位，恢复异常闪存块的应用数据。Further, in step 802, the application data of the abnormal flash memory block can be restored according to the application data of the target flash memory block and the check data of the redundant block. Specifically, according to the application data of the target flash memory block and the check data of the redundant block, the application data of the abnormal flash memory block can be recovered by using a redundancy check algorithm. Optionally, the firmware may restore the application data of the abnormal flash memory block in units of pages.

具体地，针对目标超级块中的任一超级页A，该超级页A为已写入数据的超级页，可根据目标闪存块在超级页A的应用数据，及冗余块在超级页A的校验数据，采用冗余校验算法恢复异常闪存块在超级页A的应用数据。Specifically, for any superpage A in the target superblock, the superpage A is a superpage that has already written data, according to the application data of the target flash memory block in superpage A, and the redundancy block in superpage A Check the data, use the redundancy check algorithm to restore the application data of the abnormal flash memory block in super page A.

可选地，可按照冗余校验算法对应的解码过程，对目标闪存块在超级页A的应用数据，及冗余块在超级页A的校验数据进行解码，以得到异常闪存块在超级页A的应用数据。Optionally, according to the decoding process corresponding to the redundancy check algorithm, the application data of the target flash memory block in super page A and the check data of the redundant block in super page A are decoded to obtain the abnormal flash block in super page A Application data for page A.

上述实施例仅以超级页A为例，对恢复异常闪存块的应用数据的过程进行示例性说明。可采用相同的方式，恢复异常闪存块在已写入数据的各超级页的应用数据，得到异常闪存块的应用数据。The above embodiment only uses super page A as an example to illustrate the process of restoring the application data of the abnormal flash memory block. In the same way, the application data of the abnormal flash memory block in each superpage where data has been written can be restored to obtain the application data of the abnormal flash memory block.
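For the common case where the redundancy check algorithm is a single-parity XOR (the case illustrated later with FIG. 9), the per-superpage recovery can be sketched as follows. The function name `recover_page` and the representation of pages as `bytes` objects are assumptions for illustration; other redundancy codes would need their own decoder.

```python
def recover_page(target_pages, parity_page):
    """Rebuild the one missing page of a superpage under single-parity XOR.

    target_pages: pages of the readable (target) data blocks in this superpage.
    parity_page:  page of the redundant block in this superpage.
    All pages are assumed to have the same length.
    """
    out = bytearray(parity_page)
    for page in target_pages:
        for i, b in enumerate(page):
            out[i] ^= b  # cancel this page's contribution to the parity
    return bytes(out)
```

Repeating the call for every superpage that already holds data yields the full application data of the abnormal flash memory block, one page at a time.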

在恢复异常闪存块的应用数据之后，还需对异常闪存块的应用数据进行存储。在本实施例中，为了保证异常闪存块的应用数据的稳定性和安全性，结合图7和图8，在图8的步骤803中，可将目标闪存块设置为新的RAID，并在步骤804中，计算异常闪存块的应用数据中待写入新的RAID的超级页的应用数据的校验数据。具体地，可采用冗余校验算法，对异常闪存块的应用数据中待写入新的RAID的超级页的应用数据进行编码，以得到待写入该超级页的应用数据的校验数据。在本申请实施例中，为了便于描述和区分，将目标闪存块组成的RAID，定义为第一RAID。第一RAID的超级页，是指目标超级块的超级页中除异常闪存页之外的其它闪存页组成的页条带。第一RAID可包括：数据闪存块和冗余块。第一RAID的冗余块的数量由冗余校验算法决定。After restoring the application data of the abnormal flash memory block, it is also necessary to store the application data of the abnormal flash memory block. In this embodiment, in order to ensure the stability and security of the application data of the abnormal flash block, in conjunction with Fig. 7 and Fig. 8, in step 803 of Fig. 8, the target flash block can be set as a new RAID, and in step In 804, the check data of the application data to be written into the superpage of the new RAID among the application data of the abnormal flash memory block is calculated. Specifically, a redundancy check algorithm may be used to encode the application data to be written into the superpage of the new RAID in the application data of the abnormal flash memory block, so as to obtain the check data of the application data to be written into the superpage. In the embodiment of the present application, for the convenience of description and distinction, the RAID composed of target flash memory blocks is defined as the first RAID. The superpage of the first RAID refers to a page stripe composed of other flash pages except the abnormal flash page in the superpage of the target superblock. The first RAID may include: data flash blocks and redundant blocks. The number of redundant blocks of the first RAID is determined by a redundancy check algorithm.

进一步，步骤805中，可将待写入第一RAID的超级页的应用数据，及待写入第一RAID的超级页的应用数据对应的校验数据，写入第一RAID的超级页。固件以闪存页为单位，将待写入第一RAID的超级页的应用数据，及待写入第一RAID的超级页的应用数据对应的校验数据，写入第一RAID的超级页。Further, in step 805, the application data to be written into the superpage of the first RAID and the verification data corresponding to the application data to be written into the superpage of the first RAID can be written into the superpage of the first RAID. The firmware writes the application data to be written into the superpage of the first RAID and the verification data corresponding to the application data to be written into the superpage of the first RAID into the superpage of the first RAID in units of flash memory pages.

具体地，可根据目标闪存块的数量和闪存页的容量，将异常闪存块的应用数据划分为应用数据切片。应用数据切片的数量可为1个或多个。多个是指2个或2个以上。每个应用数据切片的数据量等于第一RAID的超级页包含的数据闪存页的总容量。Specifically, the application data of the abnormal flash memory block may be divided into application data slices according to the quantity of the target flash memory block and the capacity of the flash memory page. The number of application data slices may be one or more. A plurality means two or more. The data volume of each application data slice is equal to the total capacity of the data flash pages included in the superpage of the first RAID.

在本实施例中，假设目标超级块已写入数据的超级页的数量为K个，K为正整数。目标闪存块的数量为Q个。Q≥3，且为整数。第一RAID的冗余块的数量为P个，1≤P≤(Q-2)，且为整数。In this embodiment, it is assumed that the number of super pages into which data has been written in the target super block is K, where K is a positive integer. The number of target flash memory blocks is Q. Q≥3, and is an integer. The number of redundant blocks of the first RAID is P, 1≦P≦(Q−2), and is an integer.

In some embodiments, K is an integer multiple of (Q-P), so the application data of the abnormal flash memory block can be divided evenly into at least one application data slice. In other embodiments, K is not an integer multiple of (Q-P); in that case R pages of data can be padded into the application data of the abnormal flash memory block to obtain padded application data, where R equals Q minus the remainder of K divided by (Q-P), that is, R = Q-[K%(Q-P)], with "%" denoting the remainder operation. For the ECC algorithm, P = 1. FIG. 6, FIG. 7 and FIG. 9 all illustrate the case of one redundant block, but this is not a limitation. In an embodiment with one redundant block, the firmware can perform exception handling for one abnormal flash memory block.
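As a worked illustration of the padding step, the sketch below computes how many filler pages bring K up to a whole number of slices of (Q-P) pages. It assumes that this is the purpose of the padding; the count it returns, (-K) mod (Q-P), follows from that assumption and is not the literal expression for R given above. The function name `padding_pages` is hypothetical.

```python
def padding_pages(k, q, p=1):
    """Filler pages needed so that k + padding is a multiple of (q - p).

    k: number of superpages already written (pages recovered from the abnormal block)
    q: number of target flash blocks forming the first RAID
    p: number of those blocks used as redundant blocks
    """
    data_blocks = q - p
    if data_blocks < 1:
        raise ValueError("the first RAID needs at least one data block")
    # Python's % returns a non-negative result for a positive modulus.
    return (-k) % data_blocks
```

With Q = 3 and P = 1, five recovered pages need one filler page to form three slices of two pages, while four recovered pages need none.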

进一步，可根据第一RAID的数据闪存块的数量和闪存页的容量，将填充后的应用数据划分为至少一个应用数据切片。应用数据切片即为待写入第一RAID的超级页的应用数据。一个应用数据切片为待写入第一RAID的一个超级页的应用数据。Further, the filled application data may be divided into at least one application data slice according to the number of data flash memory blocks and the capacity of the flash memory pages of the first RAID. The application data slice is the application data to be written into the superpage of the first RAID. An application data slice is application data to be written into a super page of the first RAID.

在将异常闪存块的应用数据切分为一个个应用数据切片之后，可采用冗余校验算法，对应用数据切片进行编码，以得到应用数据切片的校验数据。进一步，可将应用数据切片和该应用数据切片对应的校验数据写入第一RAID的超级页。一个应用数据分片和该应用数据分片对应的校验数据，写入第一RAID的一个超级页，从而实现了对异常闪存块的应用数据的存储和RAID保护。After the application data of the abnormal flash memory block is divided into individual application data slices, a redundancy check algorithm may be used to encode the application data slices to obtain check data of the application data slices. Further, the application data slice and the verification data corresponding to the application data slice may be written into the super page of the first RAID. An application data slice and the verification data corresponding to the application data slice are written into a super page of the first RAID, thereby realizing the storage and RAID protection of the application data of the abnormal flash memory block.
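The slicing and encoding described above can be sketched for the single-parity XOR case. The function name `build_first_raid_superpages`, the zero filler byte, and the in-memory list layout (data pages followed by one parity page) are assumptions for illustration; the padding rule used is the "round up to a whole slice" reading, which is also an assumption.

```python
def build_first_raid_superpages(recovered_pages, q, p=1, fill=0x00):
    """Group recovered pages into slices of (q - p) pages plus one XOR parity page."""
    if p != 1:
        raise NotImplementedError("this sketch only covers single-parity XOR")
    data_per_sp = q - p
    page_size = len(recovered_pages[0])
    # Pad with filler pages so the data divides evenly into slices.
    pad = (-len(recovered_pages)) % data_per_sp
    pages = list(recovered_pages) + [bytes([fill]) * page_size] * pad
    superpages = []
    for i in range(0, len(pages), data_per_sp):
        data = pages[i:i + data_per_sp]
        parity = bytearray(page_size)
        for page in data:
            for j, b in enumerate(page):
                parity[j] ^= b
        # One superpage of the first RAID: data pages, then the parity page.
        superpages.append(data + [bytes(parity)])
    return superpages
```

Each returned entry corresponds to one superpage of the first RAID, so the recovered data is again covered by parity after it is written back into the original super block.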

在图7中，黑色网格填充部分即为第一RAID的超级页的数据闪存页，写入的是异常闪存块的应用数据。竖线填充部分为第一RAID的超级页的冗余页，写入的是异常闪存块的应用数据的校验数据。In FIG. 7 , the black grid filled part is the data flash page of the super page of the first RAID, and the application data of the abnormal flash block is written. The part filled with the vertical lines is the redundant page of the superpage of the first RAID, and what is written is the verification data of the application data of the abnormal flash memory block.

The embodiments of the present application provide a management method for a super block that has no free flash memory block reserved for replacing an abnormal flash memory block. For an unreadable abnormal flash memory block in such a super block, the data of the abnormal flash memory block can be recovered from the application data of the other, normal data flash memory blocks and the check data of the redundant block. A RAID is then built from the normal data flash memory blocks of the same super block to store and protect the recovered data, so the data of the abnormal flash memory block can be stored without reducing data reliability. Compared with handling abnormal flash memory by reserving free flash memory blocks as replacements, this approach reuses the capacity of the super block and helps improve its capacity utilization.

The RAID protection of a superpage that already holds data consists of check data obtained by encoding the data flash memory pages of that superpage with the redundancy check algorithm. The check data stored in the redundant block of the super block therefore incorporates the application data of the abnormal flash memory block, so this RAID protection no longer works for the superpages that already hold data.

To keep the data already written to the super block under RAID protection, the influence of the application data of the abnormal flash memory block has to be eliminated. For this purpose, with reference to FIG. 7 and FIG. 9, the check data of the application data of the target flash memory blocks can be determined with the redundancy check algorithm from the application data of the abnormal flash memory block and the check data of the redundant block of the target super block, giving the updated check data. The check data obtained in this way strips out the application data of the abnormal flash memory block while keeping the protection of the other data flash memory pages, which is why it is called the offset effect: the RAID protection no longer covers the application data of the abnormal flash memory block, but continues to protect the data of the other, normal data flash memory blocks.

具体地，如图9所示，针对目标超级块中已写入数据的任一超级页B，可根据异常闪存块在超级页B的应用数据，及超级块的冗余块在超级页B的校验数据，采用冗余校验算法确定目标闪存块在超级页B的校验数据，作为超级页B的更新的校验数据。在图9中，仅以目标超级块包括M个数据闪存块和1个冗余块进行图示，但不构成限定。在图9中，在使用中出现了N个异常闪存块后，该超级块被缩减到含有(M-N)个数据闪存块和1个冗余块。Specifically, as shown in Figure 9, for any superpage B that has written data in the target superblock, the application data of the abnormal flash memory block in superpage B, and the redundancy block of the superblock in superpage B For the check data, a redundancy check algorithm is used to determine the check data of the target flash memory block in the super page B as the updated check data of the super page B. In FIG. 9 , it is only shown that the target super block includes M data flash memory blocks and one redundant block, but it is not limited thereto. In FIG. 9, after N abnormal flash blocks appear in use, the super block is reduced to contain (M-N) data flash blocks and 1 redundant block.

可选地，可对异常闪存块在超级页B的应用数据，及超级块的冗余块在超级页B的校验数据，进行冗余校验算法编码过程的逆运算，以得到目标闪存块在超级页B的校验数据，作为超级页B的更新的校验数据。Optionally, the application data of the abnormal flash block in super page B and the check data of the redundant block of the super block in super page B can be inversely calculated to obtain the target flash memory block The verification data in super page B is used as the updated verification data of super page B.

For example, the redundancy check algorithm is an exclusive-OR (XOR) algorithm. As shown in FIG. 9, a bitwise XOR operation may be performed on the application data of the abnormal flash memory block in super page B and the check data of the super block's redundant block in super page B, to obtain the check data of the target flash memory blocks in super page B, which serves as the updated check data of super page B. In FIG. 9, "⊕" denotes XOR.
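The XOR case can be illustrated with a minimal Python sketch. It assumes each flash memory page is modeled as a `bytes` object of equal length; the helper names `xor_pages` and `cancel_abnormal_page` are hypothetical and do not come from the application. The sketch only demonstrates the identity behind the cancellation effect: XOR-ing the old check data with the abnormal block's page yields the XOR of the remaining pages.

```python
from functools import reduce


def xor_pages(*pages: bytes) -> bytes:
    """Bitwise XOR of one or more equally sized pages (hypothetical helper)."""
    if not pages or any(len(p) != len(pages[0]) for p in pages):
        raise ValueError("need at least one page, all of the same length")
    # zip(*pages) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def cancel_abnormal_page(old_check: bytes, abnormal_page: bytes) -> bytes:
    """Updated check data for super page B.

    old_check is the XOR of all M data pages of super page B.
    XOR-ing it with the abnormal block's page removes that page's
    contribution, leaving the XOR of the (M-1) target pages.
    """
    return xor_pages(old_check, abnormal_page)
```

Because XOR is its own inverse, the same operation serves both as the encoding step and as the inverse step described for the general case.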

Further, as shown in FIG. 7, the check data corresponding to the application data of the target flash memory blocks, that is, the updated check data, may be written to free redundant pages of the original redundant block of the target super block. The redundant block of the target super block is the redundant block of the original super block, namely the redundant block that stores the original check data in FIG. 7. In FIG. 7, the updated check data is the check data corresponding to the application data of the target flash memory blocks. Specifically, in units of flash memory pages and starting from the first redundant page that is currently free, the check data corresponding to the application data of the target flash memory blocks is written page by page to the free redundant pages of the original redundant block.
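The page-by-page write described above can be sketched as follows. This is an illustrative in-memory model only: the class `RedundantBlock`, its fields, and its method are hypothetical names, and real firmware would issue page-program commands instead of list assignments. The model reflects the fact that NAND pages cannot be overwritten before an erase, so the updated check data is appended after the original check data rather than replacing it.

```python
from typing import List, Optional


class RedundantBlock:
    """Hypothetical model of the original redundant block of a super block."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * pages_per_block
        self.next_free = 0  # index of the first redundant page that is still free

    def append_pages(self, new_pages: List[bytes]) -> range:
        """Write pages one by one, starting at the first free redundant page.

        Returns the range of page indices that were written.
        """
        if self.next_free + len(new_pages) > len(self.pages):
            raise RuntimeError("not enough free redundant pages in this block")
        start = self.next_free
        for page in new_pages:
            self.pages[self.next_free] = page  # program one page
            self.next_free += 1
        return range(start, self.next_free)
```

In this model, a redundant block that already holds K pages of original check data needs at least K further free pages to hold the updated check data for the same K super pages.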

The above embodiments rely on RAID cancellation and rebuilding, so that the grouping of a super block can be flexibly unbound and super block capacity released. For data written before the abnormal flash memory block appeared, the RAID redundancy can be updated through the RAID cancellation effect described above, so that the data written to the normal flash memory blocks remains effectively protected and the media capacity is reused.

In the embodiments of the present application, after the application data to be written to the super pages of the first RAID and the check data of that application data have been written to the super pages of the first RAID, the target flash memory blocks and the redundant block of the target super block may be set as a new RAID (defined as the second RAID). Subsequent flash writes may then be directed to the second RAID until the next GC.

Further, when GC for the target super block starts, in response to the GC operation for the target super block, the data of the target flash memory blocks, the updated check data in the original redundant block of the target super block, and the check data of the data written to the second RAID may be reclaimed. The updated check data is the check data of the application data of the target flash memory blocks described above.

After the valid data has been reclaimed, the target super block may be erased and the abnormal flash memory block removed, completing GC of the target super block.

With the abnormal flash memory handling approach provided by the embodiments of the present application, when an abnormal flash memory block appears in a super block, the original super block continues to be used for reading and writing data if the data of the abnormal flash memory block can be read. If the data of the abnormal flash memory block cannot be read, normal reading and writing of the original super block can resume once the data of the abnormal flash memory block has been relocated within the same super block. While that relocation is in progress, front-end writes can still be served by other open super blocks operating in parallel, which reduces the impact on user-visible performance and improves the quality of service (QoS) of the SSD.

By contrast, in the abnormal flash memory block replacement process shown in FIG. 4, the data of the abnormal super block must be read and rewritten to the reserved free flash memory block. Because handling an abnormal flash memory block is classified as fault recovery, it has a high processing priority and must be completed quickly. Such operations therefore consume SSD back-end resources, compete with front-end user requests, and intermittently degrade the overall performance stability of the SSD. The flash memory exception handling approach provided by the embodiments of the present application thus helps improve the performance stability and QoS of the SSD.

On the other hand, in the replacement scheme shown in FIG. 4, an abnormal flash memory block is usually discovered through a write error during a write. At that point the parallel write to multiple flash memory pages is interrupted, the data movement and mapping update involved in replacing the abnormal flash memory block must be completed promptly, and a data cache must temporarily absorb incoming data, all of which puts significant pressure on the data consistency and power-loss protection of the computing system.

With the flash memory exception handling approach provided by the embodiments of the present application, the data of the abnormal flash memory block is relocated within the same super block, which reduces the amount of data relocated and speeds up the relocation. This helps shorten the interruption of parallel writes to multiple flash memory pages and, to some extent, relieves the pressure on data consistency and power-loss protection.

For a better understanding of the flash memory exception handling method provided by the embodiments of the present application, an exemplary description is given below with reference to the specific embodiment shown in FIG. 10. In FIG. 10, the super block includes M data flash memory blocks and one redundant block, and the redundancy check algorithm is the exclusive-OR (XOR) algorithm. As shown in FIG. 10, the flash memory exception handling method mainly includes:

S1. Load the abnormal flash memory block table, and select normal flash memory blocks according to the channel distribution to form a super block. The super block includes M data flash memory blocks and one redundant block.

S2. The SSD performs read and write operations on the flash memory.

S3. During a read operation on the flash memory, detect whether a read error occurs. If so, perform step S6; if not, perform step S4.

S4. During a write operation on the flash memory, detect whether a write error occurs. If so, perform step S5; if not, return to step S2.

S5. Mark the abnormal flash memory block in which the write error occurred as read-only.

S6. Read the application data of the abnormal flash memory block.

S7. Detect whether the read succeeded. If it succeeded, perform step S9; if it failed, perform step S8.

S8. Perform a bitwise XOR operation on the application data of the target flash memory blocks and the check data of the redundant block to recover the application data of the abnormal flash memory block. Then perform step S10.

There are (M-1) target flash memory blocks; the target flash memory blocks are the flash memory blocks among the M data flash memory blocks other than the abnormal flash memory block.

S9. Set the (M-1) target flash memory blocks and the original redundant block as a second RAID, and use the free flash memory pages of the second RAID for data writes. Then perform step S16.

S10. Set the (M-1) target flash memory blocks as a first RAID, the first RAID including (M-2) data flash memory blocks and 1 redundant block. A super page of the first RAID includes (M-2) data flash memory pages and 1 redundant page.

S11. For the portion of the application data of the abnormal flash memory block that is to be written to (M-2) data flash memory pages, calculate the check data of that application data.

S12. Write the application data to be written to the (M-2) data flash memory pages, together with the corresponding check data, to one super page of the first RAID.

S13. According to the application data of the abnormal flash memory block in super page B of the super block and the check data of the super block's redundant block in super page B, determine the check data of the target flash memory blocks in super page B to obtain the updated check data.

S14. Write the updated check data to free redundant pages of the redundant block of the super block.

S15. Set the (M-1) target flash memory blocks and the original redundant block as a third RAID, and use the free flash memory pages of the third RAID for data writes. Then perform step S16.

S16. Garbage-collect the valid data in the super block, erase the data blocks, and remove the abnormal data block.

The valid data includes the data of the target flash memory blocks, the updated check data stored in the redundant block, and the check data written to the redundant block in step S15.

S17. Add the abnormal flash memory block to the abnormal flash memory block table.

S18. Determine whether the read and write operations are complete. If complete, perform step S19; if not, return to step S2.

S19. Write the abnormal flash memory block table to a persistent storage medium so that it can be loaded when the firmware is initialized.
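The unreadable-block branch of the steps above (S8 and S10 to S13) can be sketched in Python under several stated assumptions: pages are equal-length `bytes` objects held in memory, all function and variable names are hypothetical, and a final incomplete slice is padded with zero-filled pages. The zero padding is an assumption of this sketch, not necessarily the padding rule of the application. Physical placement of the first-RAID super pages and the write to the redundant block (S12, S14) are not modeled.

```python
from functools import reduce
from typing import List, Tuple

Page = bytes


def xor_pages(pages: List[Page]) -> Page:
    """Bitwise XOR of a non-empty list of equally sized pages."""
    if not pages or any(len(p) != len(pages[0]) for p in pages):
        raise ValueError("need at least one page, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*pages))


def handle_unreadable_block(
    data_blocks: List[List[Page]],  # M data blocks, each holding K written pages
    check: List[Page],              # K check pages of the original redundant block
    bad: int,                       # index of the unreadable (abnormal) block
) -> Tuple[List[Page], List[List[Page]], List[Page]]:
    m = len(data_blocks)
    if m < 3:
        raise ValueError("M must be at least 3 so the first RAID has a data block")
    # Target blocks: every data block except the abnormal one (never read here).
    targets = [blk for i, blk in enumerate(data_blocks) if i != bad]
    page_size = len(check[0])

    # S8: recover each page of the abnormal block from the targets and check data.
    recovered = [
        xor_pages([blk[k] for blk in targets] + [check[k]])
        for k in range(len(check))
    ]

    # S10 to S12: first RAID super pages of (M-2) data pages plus 1 check page.
    width = m - 2
    super_pages: List[List[Page]] = []
    for start in range(0, len(recovered), width):
        piece = recovered[start:start + width]
        piece += [bytes(page_size)] * (width - len(piece))  # zero padding (assumed)
        super_pages.append(piece + [xor_pages(piece)])

    # S13: cancel the abnormal block's contribution from the old check data.
    updated_check = [xor_pages([check[k], recovered[k]]) for k in range(len(check))]
    return recovered, super_pages, updated_check
```

With M = 4 and K = 3 written super pages, for instance, the recovered data occupies two first-RAID super pages of three pages each, the second of which carries one padding page.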

It should be noted that all steps of the method provided in the above embodiments may be executed by the same device, or the method may be executed by different devices. For example, steps 801 and 802 may both be executed by device A; as another example, step 801 may be executed by device A and step 802 by device B; and so on.

In addition, some of the flows described in the above embodiments and drawings include multiple operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear herein or executed in parallel. Operation numbers such as 801 and 802 serve only to distinguish the operations from one another and do not themselves imply any execution order. Furthermore, these flows may include more or fewer operations, and the operations may be executed sequentially or in parallel.

Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above flash memory exception handling method.

It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like. They do not indicate an order, nor do they require that the "first" and the "second" be of different types.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, compact disc read-only memory (CD-ROM), optical storage, and the like) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (such as CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-permanent memory in a computer-readable medium, such as random-access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

The storage medium of a computer is a readable storage medium, which may also be called a readable medium. Readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are merely embodiments of the present application and are not intended to limit it. Various modifications and changes to the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A flash memory exception handling method, comprising:

for an abnormal flash memory block in a flash memory, detecting whether the abnormal flash memory block is readable;

if the abnormal flash memory block is unreadable, recovering the application data of the abnormal flash memory block according to the application data of the target flash memory blocks in the target super block to which the abnormal flash memory block belongs and the check data of a redundant block, wherein the target flash memory blocks are the flash memory blocks in the target super block other than the abnormal flash memory block and the redundant block;

setting the target flash memory blocks as a first redundant array of independent disks (RAID), and calculating check data of the application data, among the application data of the abnormal flash memory block, that is to be written to a first super page of the first RAID; and

writing the application data to be written to the first super page, and the check data corresponding to that application data, to the first super page.

2. The method according to claim 1, further comprising:

determining, using a redundancy check algorithm, check data of the target flash memory blocks according to the application data of the abnormal flash memory block and the check data of the redundant block, to obtain updated check data; and

writing the updated check data to a free redundant page of the redundant block.

3. The method according to claim 2, wherein determining, using the redundancy check algorithm, the check data of the target flash memory blocks according to the application data of the abnormal flash memory block and the check data of the redundant block comprises:

for a second super page in the target super block, determining, using the redundancy check algorithm, the check data of the target flash memory blocks in the second super page according to the application data of the abnormal flash memory block in the second super page and the check data of the redundant block in the second super page, wherein the second super page is any super page in the target super block to which data has been written.

4. The method according to claim 3, wherein the redundancy check algorithm is an XOR algorithm, and determining, using the redundancy check algorithm, the check data of the target flash memory blocks in the second super page according to the application data of the abnormal flash memory block in the second super page and the check data of the redundant block in the second super page comprises:

performing a bitwise XOR operation on the application data of the abnormal flash memory block in the second super page and the check data of the redundant block in the second super page, to obtain the check data of the target flash memory blocks in the second super page.

5. The method according to claim 1, wherein recovering the application data of the abnormal flash memory block according to the application data of the target flash memory blocks in the target super block to which the abnormal flash memory block belongs and the check data of the redundant block comprises:

for a second super page in the target super block, recovering, using a redundancy check algorithm, the application data of the abnormal flash memory block in the second super page according to the application data of the target flash memory blocks in the second super page and the check data of the redundant block in the second super page, wherein the second super page is any super page in the target super block to which data has been written.

6. The method according to claim 2, wherein calculating the check data of the application data, among the application data of the abnormal flash memory block, that is to be written to the first super page of the first RAID comprises:

dividing the application data of the abnormal flash memory block into at least one application data slice according to the number of data flash memory blocks of the first RAID and the capacity of a flash memory page, wherein the application data slice is the application data to be written to the first super page; and

encoding the application data slice using the redundancy check algorithm to obtain the check data of the application data slice.

7. The method according to claim 6, wherein the number of second super pages in the target super block is K, a second super page being a super page in the target super block to which data has been written; the number of the target flash memory blocks is Q; K is a positive integer; M≥3, and is an integer; the number of redundant blocks of the first RAID is P; and 1≤P≤(Q-2), and is an integer;

wherein dividing the application data of the abnormal flash memory block into at least one application data slice according to the number of data flash memory blocks of the first RAID and the capacity of a flash memory page comprises:

if K is not an integer multiple of (Q-P), padding the application data of the abnormal flash memory block with the data of R flash memory pages to obtain padded application data, wherein R is equal to the remainder of Q minus K divided by (Q-P); and

dividing the padded application data into at least one application data slice according to the number of data flash memory blocks of the first RAID and the capacity of a flash memory page.

8. The method of claim 2, further comprising:

after writing the application data to be written to the first super page, and the check data corresponding to that application data, to the first super page, setting the target flash memory blocks and the redundant block as a second RAID; and

writing data to the second RAID until the next garbage collection for the target super block occurs.

9. The method according to claim 8, wherein the abnormal flash memory block is a flash memory block in which a read error is detected while reading data from the flash memory, the method further comprising:

marking, in a flash memory mapping layer, the physical address of the abnormal flash memory block with a read-error label, so that the data of the abnormal flash memory block is preferentially reclaimed when the next garbage collection for the target super block occurs.

10. The method of claim 2, further comprising:

If the abnormal flash memory block can be read, the target flash memory block and the redundant block are set as a second RAID;

11. The method according to any one of claims 8-10, further comprising:

in response to a garbage collection operation for the target super block, reclaiming the data currently written in the target flash memory blocks, the updated check data in the redundant block, and the check data of the second RAID; and

erasing the target super block, and removing the abnormal flash memory block.

12. A computing system, comprising a processor and a storage drive, wherein the storage drive includes a controller and a flash memory;

the controller includes firmware and a channel management component, and the controller communicates with the flash memory through the channel management component and accesses the flash memory through a channel; and

the firmware is configured to perform the steps of the method of any one of claims 1-11.

13. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1-11.