CN115079936A - Data writing method and device - Google Patents

Publication number: CN115079936A
Authority: CN (China)
Prior art keywords: request, read, data, storage device, storage
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110281305.4A
Other languages: Chinese (zh)
Inventors: 鲁鹏, 金季焜, 刘金虎
Current Assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd
Priority to: CN202110281305.4A
Publication of: CN115079936A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data writing method and apparatus, relating to the field of data storage. The method includes: a storage device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data characteristic analysis model, determines at least one second IO request that has an association relationship of being read continuously with the first read IO request, where the second IO request is stored in the memory of the storage device; the storage device then writes the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device. For a storage device that includes multiple storage media (such as an HDD and an SSD), the storage device writes the first read IO request and all the second IO requests into the HDD together. In this way, multiple IO requests that have the association relationship of being read continuously are written into the HDD and IO with strong randomness is kept out of the HDD, which improves the data read/write speed of the HDD and, in turn, of the storage device.

Description

Data writing method and device
Technical Field
The present application relates to the field of data storage, and in particular, to a data writing method and apparatus.
Background
A Redundant Array of Independent Disks (RAID) combines multiple disks into a redundant array that can be used as a single large storage device. RAID can exploit the advantages of multiple hard disks, for example by increasing access speed and by providing fault tolerance to keep data safe. A RAID may include several kinds of storage media, such as Hard Disk Drives (HDDs) and Solid State Drives (SSDs).
Generally, when a processor writes Input/Output (IO) data into a RAID, it chooses between the HDD and the SSD according to the remaining space in the RAID and writes the IO data to whichever hard disk has more remaining space. Because an HDD reads and writes IO data by moving a magnetic head across the disk, its data read/write speed is low. If the processor writes IO data with strong randomness (for example, IO data that may be rewritten many times) into an HDD in the RAID, the data read/write speed of the HDD is low, which in turn reduces the data read/write speed of the RAID.
Therefore, how to write IO data into a RAID in a reasonable way and improve the data read/write speed of the RAID is a problem that urgently needs to be solved.
Disclosure of Invention
The present application provides a data writing method and apparatus that solve the problem of low read/write speeds of an HDD and a RAID caused by writing IO with strong randomness into the HDD.
To achieve this objective, the present application adopts the following technical solutions.
In a first aspect, an embodiment of the present application provides a data writing method, which may be performed by a storage device or by a data storage system that includes the storage device. The method includes: the storage device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data characteristic analysis model, determines at least one second IO request that has a continuous-read association relationship with the first read IO request (that is, the two requests are read continuously), where the second IO request is stored in a memory of the storage device; the storage device then writes the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device. With this method, the storage device identifies the IO requests that have the continuous-read association relationship with the first read IO request. For a storage device that includes multiple storage media (such as an HDD and an SSD), the storage device writes the first read IO request and all the second IO requests into the HDD together, so that IO requests with the continuous-read association relationship are written into the HDD and IO requests with strong randomness are kept out of it. This improves the data read/write speed of the HDD and, in turn, of the storage device.
In an optional implementation, the characteristic value of the first read IO request includes a first Logical Block Address (LBA) of the data to be written, a data length, and a timestamp. The first LBA indicates the storage location of the data to be written in the storage device, the data length indicates the number of bytes occupied by the data to be written, and the timestamp may indicate the time of the most recent data change performed on the data to be written, where the data change may be at least one of a data read, a data write, or a data rewrite.
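As an illustration only, the characteristic value described above could be held in a small record type. The class name, field names, and the 512-byte block size in the sketch below are assumptions made for this example and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOCharacteristic:
    """Characteristic value of an IO request (all names are illustrative)."""
    first_lba: int    # storage location of the data to be written
    length: int       # number of bytes occupied by the data
    timestamp: float  # time of the most recent read, write, or rewrite

    def end_lba(self, block_size: int = 512) -> int:
        """First LBA after this request, assuming fixed-size blocks."""
        blocks = -(-self.length // block_size)  # ceiling division
        return self.first_lba + blocks


# Example: a 4096-byte request starting at LBA 2048 covers 8 blocks.
req = IOCharacteristic(first_lba=2048, length=4096, timestamp=1615800000.0)
```

Making the record immutable keeps the characteristic value safe to use as a dictionary key or to share between the component that caches requests and the component that scores them.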
In one possible example, determining the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model includes: the storage device outputs, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously, and takes each IO request whose probability value reaches a probability threshold as a second IO request. The storage device can use these probability values to determine the association relationships among the IO requests and therefore their data access characteristics, and can then store the IO requests according to those characteristics. This prevents a group of IO requests with strong randomness from being written into the HDD, so that the mechanical hard disk stores IO requests with strong accessibility, which improves the data read/write speed of the HDD and, in turn, of the storage device.
In an optional implementation, the data writing method further includes: the storage device determines, according to the characteristic value of the first read IO request and the data characteristic analysis model, at least one third IO request that does not have the continuous-read association relationship with the first read IO request, and writes the third IO request into a solid state disk of the storage device. The storage device may use the data characteristic analysis model and the characteristic value of the first read IO request to determine whether the first read IO request and an IO request under test have the continuous-read association relationship, and writes any third IO request that lacks this relationship into the solid state disk. Because the storage device writes IO into different types of hard disks according to the data access characteristics of the IO (such as strong accessibility or strong randomness), IO with strong randomness is not written into the HDD, which improves the data read/write speed of the HDD and, in turn, of the storage device.
In an optional implementation, before the storage device determines the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, the data writing method further includes: the storage device obtains the characteristic value of the first read IO request from the memory. Because the memory can be read faster than either the mechanical hard disk or the solid state disk, keeping the characteristic value of the first read IO request in the memory of the storage device, rather than on the mechanical hard disk or the solid state disk, reduces the time the controller of the storage device needs to read the characteristic value and improves the data writing speed of the storage device.
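The selection step in this example can be sketched as a filter over the cached IO requests. In the sketch below the model is passed in as a plain function; `toy_model`, the tuple layout, the request identifiers, and the threshold of 0.8 are placeholders invented for illustration, since the application does not fix any of them.

```python
from typing import Callable, Dict, List, Tuple

# (first LBA, data length in bytes, timestamp); layout is an assumption.
Characteristic = Tuple[int, int, float]
Model = Callable[[Characteristic, Characteristic], float]


def select_second_requests(first: Characteristic,
                           cached: Dict[str, Characteristic],
                           model: Model,
                           threshold: float = 0.8) -> List[str]:
    """Return ids of cached requests whose probability reaches the threshold."""
    return [rid for rid, feat in cached.items()
            if model(first, feat) >= threshold]


def toy_model(a: Characteristic, b: Characteristic) -> float:
    """Placeholder scorer, NOT the model from the application: high
    probability only when b starts where a ends (512-byte blocks)."""
    a_end = a[0] + a[1] // 512
    return 0.9 if b[0] == a_end else 0.1


first = (2048, 4096, 100.0)
cached = {"io-a": (2056, 4096, 101.0), "io-b": (900000, 512, 101.5)}
selected = select_second_requests(first, cached, toy_model)
```

Here only `io-a` is selected, because it begins at the block where the first request ends; any trained model with the same call shape could replace `toy_model`.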
In an optional implementation, the data writing method further includes: the storage device obtains the data characteristic analysis model. For example, the storage device obtains an IO training set and inputs it into a first model to obtain association information; if the association information meets a model convergence condition, the storage device takes the first model as the data characteristic analysis model. The IO training set includes a plurality of test IOs, and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; the association information includes the probability value that any two of the test IOs are read continuously. With the data characteristic analysis model, the storage device can determine the association relationships among IO requests and, from them, the data access modes of the IO requests. This reduces the need to determine the association relationships manually and improves the degree of association among the written data. During data reading, the storage device can then sequentially read the data that has the continuous-read association relationship, which improves the data reading performance of the HDD and the data read/write performance of the storage device in which the HDD is located.
In some examples, the data access modes described above include a random mode and an accessibility mode. For example, the storage device uses the data characteristic analysis model to determine the probability value that any two IO requests are read continuously and compares the probability value with a probability threshold. If the probability value is greater than or equal to the probability threshold, the storage device determines that the two IO requests have the continuous-read association relationship and that their data access mode is the accessibility mode. If the probability value is less than the probability threshold, the storage device determines that the two IO requests do not have the continuous-read association relationship and that their data access mode is the random mode.
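To make the notion of association information concrete, the sketch below estimates, from a recorded read sequence, how often one IO is read immediately after another. This frequency count is a deliberately simple stand-in chosen for illustration; it is not the first model or the convergence condition of the application, and the function name and trace format are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def successor_probabilities(read_trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(b is read right after a) from adjacent pairs in a trace."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for a, b in zip(read_trace, read_trace[1:]):
        pair_counts[a][b] += 1
    return {
        a: {b: n / sum(counts.values()) for b, n in counts.items()}
        for a, counts in pair_counts.items()
    }


# Hypothetical trace of test IO identifiers in the order they were read.
trace = ["x", "y", "x", "y", "x", "z"]
probs = successor_probabilities(trace)
```

In this trace `y` follows `x` in two of three cases and `x` always follows `y`, so a threshold on these values would pair `x` and `y` while leaving `z` unassociated.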
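The comparison described here, together with the placement rule from the earlier implementations (associated IO goes to the mechanical hard disk, unassociated IO to the solid state disk), can be sketched as follows. The enum, function names, and the string labels for the media are illustrative assumptions.

```python
from enum import Enum


class AccessMode(Enum):
    ACCESSIBILITY = "accessibility"  # requests are read continuously
    RANDOM = "random"                # no continuous-read association


def classify(probability: float, threshold: float) -> AccessMode:
    """Compare a model probability with the probability threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= threshold:
        return AccessMode.ACCESSIBILITY
    return AccessMode.RANDOM


def target_medium(mode: AccessMode) -> str:
    """Map the access mode to the medium the request is written to."""
    return "HDD" if mode is AccessMode.ACCESSIBILITY else "SSD"


# A probability exactly at the threshold counts as associated.
medium = target_medium(classify(0.8, threshold=0.8))
```

Note that the boundary case is inclusive: a probability equal to the threshold is treated as the accessibility mode, matching the "greater than or equal to" wording.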
In a second aspect, an embodiment of the present application provides a data writing method, which may be performed by a computing device connected to a storage device. The method includes: the computing device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data characteristic analysis model, determines at least one second IO request that has a continuous-read association relationship with the first read IO request; the computing device obtains a first storage message for each second IO request and sends the first read IO request and all the first storage messages to the storage device. The storage device then writes each second IO request from the memory of the storage device into the mechanical hard disk of the storage device according to the corresponding first storage message. The computing device can thus use the characteristic value of the first read IO request and the data characteristic analysis model to determine the second IO requests that have the continuous-read association relationship with the first read IO request. For a storage device that includes multiple storage media, the computing device sends the first read IO request and the first storage messages of the second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, so that the IO requests that are read continuously are written sequentially.
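The division of work between the two devices can be sketched with in-memory stand-ins. Everything below is hypothetical: the message type, the class and method names, and the use of a dictionary for the storage device's memory and a list for the mechanical hard disk are modelling assumptions, not interfaces defined by the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FirstStorageMessage:
    """Hypothetical message naming a cached second IO request that the
    storage device should move from memory to the mechanical hard disk."""
    request_id: str


class StorageDevice:
    def __init__(self, memory: Dict[str, bytes]) -> None:
        self.memory = dict(memory)              # cached IO requests
        self.hdd: List[Tuple[str, bytes]] = []  # append-only HDD model

    def handle(self, first_id: str, first_data: bytes,
               messages: List[FirstStorageMessage]) -> None:
        """Write the first request and all named second requests together."""
        batch = [(first_id, first_data)]
        for msg in messages:
            # pop() raises KeyError if the request is not cached.
            batch.append((msg.request_id, self.memory.pop(msg.request_id)))
        self.hdd.extend(batch)  # one contiguous append models "together"


def build_messages(second_ids: List[str]) -> List[FirstStorageMessage]:
    """Computing-device side: one first storage message per second request."""
    return [FirstStorageMessage(rid) for rid in second_ids]


device = StorageDevice({"io-a": b"AAAA", "io-b": b"BBBB", "io-c": b"CCCC"})
device.handle("io-1", b"1111", build_messages(["io-a", "io-b"]))
```

After the call, the three associated requests sit next to each other in the HDD model while `io-c`, which was not named in any message, stays in memory.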
In an optional implementation manner, the characteristic value of the first read IO request includes a first LBA of data to be written, a data length, and a timestamp.
In one possible example, determining the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model includes: the computing device outputs, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously, and takes each IO request whose probability value reaches a probability threshold as a second IO request.
In an optional implementation, the data writing method further includes: the computing device determines, according to the characteristic value of the first read IO request and the data characteristic analysis model, at least one third IO request that does not have the continuous-read association relationship with the first read IO request, obtains a second storage message for each third IO request, and sends all the second storage messages to the storage device. The storage device then writes each third IO request into the solid state disk of the storage device according to the corresponding second storage message. The computing device may use the data characteristic analysis model and the characteristic value of the first read IO request to determine whether the first read IO request and an IO request under test have the continuous-read association relationship, and the IO requests are then stored in the mechanical hard disk or the solid state disk of the storage device accordingly. This prevents data with strong randomness from being written into the mechanical hard disk and improves the data read/write speed of the storage device.
In an optional implementation, the data writing method further includes: the computing device obtains the data characteristic analysis model. For example, the computing device obtains an IO training set and inputs it into a first model to obtain association information; if the association information meets a model convergence condition, the computing device takes the first model as the data characteristic analysis model. The IO training set includes a plurality of test IOs, and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; the association information includes the probability value that any two of the test IOs are read continuously. Generally, the processing capability of the computing device is greater than that of the storage device, so having the computing device obtain the data characteristic analysis model reduces the training time and improves training efficiency.
In a third aspect, an embodiment of the present application provides a data writing apparatus; for its beneficial effects, refer to the description of the first aspect, which is not repeated here. The data writing apparatus has the function of implementing the behavior in the method examples of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to the function. In one possible design, the data writing apparatus is applied to a storage device and includes: a transceiver unit, configured to obtain a first read IO request; a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, where the second IO request has a continuous-read association relationship with the first read IO request and is stored in a memory of the storage device; and a storage unit, configured to write the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
In an optional implementation, the characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a timestamp.
In another optional implementation, the processing unit is further configured to determine at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, where the third IO request does not have the continuous-read association relationship with the first read IO request; and the storage unit is further configured to write the third IO request into the solid state disk of the storage device.
In another optional implementation, the processing unit is specifically configured to output, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously, and to take each IO request whose probability value reaches the probability threshold as a second IO request.
In another optional implementation, the transceiver unit is further configured to obtain the characteristic value of the first read IO request from the memory.
In another optional implementation, the data writing apparatus further includes a model obtaining unit, configured to obtain the data characteristic analysis model. The model obtaining unit is specifically configured to: obtain an IO training set, where the IO training set includes a plurality of test IOs and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; input the IO training set into a first model to obtain association information, where the association information includes the probability value that any two of the test IOs are read continuously; and take the first model as the data characteristic analysis model if the association information meets a model convergence condition.
In a fourth aspect, an embodiment of the present application provides another data writing apparatus; for its beneficial effects, refer to the description of the second aspect, which is not repeated here. The data writing apparatus has the function of implementing the behavior in the method examples of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to the function. In one possible design, the data writing apparatus is applied to a computing device connected to a storage device and includes: a transceiver unit, configured to obtain a first read IO request; and a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, where the second IO request has a continuous-read association relationship with the first read IO request and is stored in a memory of the storage device. The processing unit is further configured to obtain a first storage message for each second IO request, where the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; and the transceiver unit is further configured to send the first read IO request and all the first storage messages to the storage device.
In an optional implementation, the characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a timestamp.
In another optional implementation, the processing unit is further configured to determine at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, where the third IO request does not have the continuous-read association relationship with the first read IO request; the processing unit is further configured to obtain a second storage message for each third IO request, where the second storage message instructs the storage device to write the third IO request into the solid state disk of the storage device; and the transceiver unit is further configured to send all the second storage messages to the storage device.
In another optional implementation, the processing unit is specifically configured to output, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously, and to take each IO request whose probability value reaches the probability threshold as a second IO request.
In another optional implementation, the data writing apparatus further includes a model obtaining unit, configured to obtain the data characteristic analysis model. The model obtaining unit is specifically configured to: obtain an IO training set, where the IO training set includes a plurality of test IOs and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; input the IO training set into a first model to obtain association information, where the association information includes the probability value that any two of the test IOs are read continuously; and take the first model as the data characteristic analysis model if the association information meets a model convergence condition.
In a fifth aspect, an embodiment of the present application provides a storage device, which includes a processor, a mechanical hard disk, and a solid state disk, where the processor is configured to implement, through logic circuits or by executing code instructions, the operation steps of the method according to the first aspect or any possible implementation of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computing device, which includes a processor and an interface circuit. The interface circuit is configured to receive signals from computing devices other than the computing device and transmit them to the processor, or to send signals from the processor to computing devices other than the computing device. The processor is configured to implement, through logic circuits or by executing code instructions, the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a seventh aspect, an embodiment of the present application provides a data storage system, which includes a computing device and a storage device connected to each other. The computing device is configured to: obtain a first read IO request; determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, where the second IO request has a continuous-read association relationship with the first read IO request and is stored in a memory of the storage device; obtain a first storage message for each second IO request, where the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; and send the first read IO request and all the first storage messages to the storage device. The storage device is configured to: receive the first read IO request and all the first storage messages; obtain from the memory the second IO request corresponding to each first storage message; and write the first read IO request and all the second IO requests into the mechanical hard disk together. The computing device can thus use the characteristic value of the first read IO request and the data characteristic analysis model to determine the second IO requests that have the continuous-read association relationship with the first read IO request.
For a storage device that includes multiple storage media, the computing device sends the first read IO request and the first storage messages of the second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, so that the IO requests that are read continuously are written sequentially.
Furthermore, during data reading, the mechanical hard disk reads the data in the order of these IO requests. The IO requests are therefore read sequentially from the mechanical hard disk, which increases the data reading speed of the mechanical hard disk and of the storage device.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program or instructions that, when executed by a storage device, implement the operation steps of the method according to the first aspect or any possible implementation of the first aspect, and, when executed by a computing device, implement the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a ninth aspect, an embodiment of the present application provides a computer program product that, when run on a storage device, causes the storage device to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, or, when run on a computing device, causes the computing device to perform the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a tenth aspect, an embodiment of the present application provides a chip, which includes a memory and a processor, where the memory is configured to store computer instructions, and the processor is configured to call the computer instructions from the memory and execute them, so as to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, or of the method according to the second aspect or any possible implementation of the second aspect.
On the basis of the implementations provided in the foregoing aspects, the present application may further combine them to provide more implementations.
Drawings
FIG. 1A is a schematic diagram of a data storage system provided herein;
FIG. 1B is a schematic view of another data storage system provided herein;
FIG. 2 is a schematic diagram of a data writing method provided in the present application;
FIG. 3 is a schematic diagram of data reading and writing provided by the present application;
FIG. 4 is a schematic diagram of a data feature analysis model provided in the present application;
FIG. 5 is a schematic diagram of another data writing method provided in the present application;
FIG. 6 is a schematic diagram of another data writing method provided in the present application;
FIG. 7 is a schematic diagram of a data writing apparatus provided in the present application;
FIG. 8 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
For clarity and conciseness of the following description of various embodiments, a brief introduction to the related art is first given:
Requirements for a redundant array of independent disks (RAID) vary from customer to customer. For example, a RAID built entirely from flash chips offers high performance, but at a very high cost. Conversely, a RAID built entirely from an HDD array is inexpensive, but its performance can hardly meet customer requirements. Therefore, to balance RAID performance and cost, storage devices that include a variety of storage media have emerged. For example, the storage medium may be, but is not limited to, one or more of a storage class memory (SCM), an SSD, and an HDD. SCM is a hybrid storage technology that combines the characteristics of traditional storage devices and memory: it provides faster read and write speeds than hard disks, but is slower and cheaper than dynamic random access memory (DRAM).
Generally, for a storage device including a plurality of storage media, the read-write performance of the storage device depends on the read-write speed of the slowest storage medium in the storage device. For example, for a storage device comprising an SSD and an HDD, the read-write speed of the storage device depends on the read-write speed of the HDD. For example, the HDD may write new data by a copy-on-write (COW) method (new data is written after the original data on the disk is erased). Because writing data to the SSD by the COW method may erase all existing data in the SSD, the SSD generally writes new data by a redirect-on-write (ROW) method (if part of the original data in the SSD is rewritten, the rewritten data is written to a new location in the SSD, and a pointer is reconfigured for the rewritten data). For the detailed COW and ROW processes, reference may be made to the related description of the prior art; details are not described herein.
ROW may cause the storage device, or a computing device connected to the storage device, to perform garbage collection (GC) according to the remaining space of the storage device and release data blocks (blocks) holding useless data, so that new data can be written to those blocks after GC. As the ROW policy and GC are executed repeatedly, the data distribution in the HDD may become scattered. Because the HDD scans the disk with a magnetic head when reading and writing IO data, its data read-write speed is relatively low. With the average seek time of the HDD unchanged, the more scattered the data in the HDD, the more seeks are needed, so the total data read-write time of the HDD increases, the data read-write speed of the HDD decreases, and, in turn, the data read-write speed of the storage device decreases.
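The relationship between scattered data and seek count can be illustrated with a rough model. This is a minimal sketch under stated assumptions: the function name `estimated_read_time_ms`, the 8 ms average seek time, and the 0.1 ms per-block transfer time are all hypothetical and are not taken from the application.

```python
def estimated_read_time_ms(lbas, avg_seek_ms=8.0, transfer_ms_per_block=0.1):
    """Rough HDD read-time model: one seek per non-contiguous jump.

    All parameters are illustrative assumptions, not measured values.
    """
    if not lbas:
        return 0.0
    # The first access always needs a seek; later accesses need one only
    # when the next LBA does not directly follow the previous one.
    seeks = 1 + sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur != prev + 1)
    return seeks * avg_seek_ms + len(lbas) * transfer_ms_per_block


sequential = [0, 1, 2, 3, 4, 5, 6]      # seven contiguous blocks
scattered = [0, 4, 2, 7, 9, 14, 19]     # seven blocks spread over the disk

print(estimated_read_time_ms(sequential))
print(estimated_read_time_ms(scattered))
```

Under these assumed numbers, the sequential layout needs one seek and the scattered layout needs seven, so the same amount of data takes roughly 8.7 ms versus roughly 56.7 ms.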
To solve the above problem, an embodiment of the present application provides a data writing method, including: the storage device obtains a first read IO request, determines at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model, and writes the first read IO request and all the second IO requests into a mechanical hard disk of the storage device. The second IO request and the first read IO request have an association relationship of being read continuously, and the second IO request is stored in a memory of the storage device. With the data writing method provided by the embodiment of the application, the storage device determines the IO requests that have an association relationship of being read continuously with the first read IO request. For a storage device comprising multiple storage media, the storage device writes the first read IO request and all the second IO requests into the HDD together, so that IO requests that are read continuously are written into the HDD together and strongly random IO is kept out of the HDD. This improves the data read-write speed of the HDD and, in turn, of the storage device.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1A is a schematic diagram of a data storage system provided herein, which includes a computing device 100 and a storage device 120. In the application scenario shown in FIG. 1A, a user accesses data through an application. The computer running these applications may be referred to as a "computing device". The computing device 100 may be a physical machine or a virtual machine. Physical computing devices include, but are not limited to, desktop computers, servers, notebook computers, and mobile devices. In one possible example, computing device 100 accesses storage device 120 over a network to access data, e.g., the network may include switch 110. In another possible example, the computing device 100 may also communicate with the storage device 120 through a wired connection, such as a Universal Serial Bus (USB).
The storage device 120 shown in FIG. 1A may be a centralized storage system. The centralized storage system is characterized by a uniform entry through which all data from external devices pass, which is the engine 121 of the centralized storage system. The engine 121 is the most central component in a centralized storage system, in which the high-level functions of many storage systems are implemented.
The storage device 120 shown in fig. 1A may also be a distributed storage system including a computing device cluster and a storage device cluster. The computing device cluster includes one or more computing devices, and the computing devices may communicate with each other. A computing device may be a server, a desktop computer, a controller of a storage array, or the like. In hardware, the computing device may include a processor, memory, a network card, and the like. The processor is a central processing unit (CPU) that processes data access requests from outside the computing device or requests generated inside the computing device. For example, when the processor receives a write data request sent by a user, the data in the write data request is temporarily stored in the memory. When the total amount of data in the memory reaches a certain threshold, the processor sends the data stored in the memory to the storage device for persistent storage. In addition, the processor is used for data computation or processing, such as metadata management, deduplication, data compression, storage space virtualization, and address translation.
In one example, any one computing device may access any one of the storage devices in the storage device cluster over a network. The storage device cluster includes a plurality of storage devices. A storage device includes one or more controllers, a network card for communicating with a computing device, and a plurality of hard disks.
The data writing method provided by the present application may be executed by the computing device 100, or may be executed by the storage device 120, for example, the storage device 120 may be a centralized storage system or a distributed storage system.
As shown in fig. 1A, there may be one or more controllers in the engine 121, and fig. 1A illustrates an example in which the engine 121 includes one controller. In a possible example, if the engine 121 has multiple controllers, a mirror channel may be provided between any two controllers, so as to implement a function that any two controllers backup each other, thereby avoiding unavailability of the entire storage device 120 due to a hardware failure.
The engine 121 also includes a front-end interface 1211 and a back-end interface 1214. The front-end interface 1211 is used to communicate with the computing device 100 to provide storage services for the computing device 100. The back-end interface 1214 is used to communicate with the hard disks to expand the capacity of the storage device 120. Through the back-end interface 1214, the engine 121 can connect more hard disks, thereby forming a very large pool of storage resources.
In hardware, as shown in fig. 1A, the controller includes at least a processor 1212 and a memory 1213. The processor 1212 is a CPU that processes data access requests from outside the storage device 120 (from servers or other storage systems) as well as requests generated inside the storage device 120. For example, when the processor 1212 receives a write data request sent by the computing device 100 through the front-end interface 1211, the data in the write data request is temporarily stored in the memory 1213. When the total amount of data in the memory 1213 reaches a certain threshold, the processor 1212 sends the data stored in the memory 1213 through the back-end interface 1214 to at least one of the mechanical hard disk 1221, the mechanical hard disk 1222, the solid-state hard disk 1223, or the other hard disk 1224 for persistent storage.
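The buffer-then-flush behaviour described above can be sketched as follows. This is only an illustration: the class name `WriteBuffer`, the byte-count threshold, and the use of a plain list as a stand-in for a hard disk are assumptions, not details from the application.

```python
class WriteBuffer:
    """Toy model of the memory 1213: buffer writes, flush at a threshold."""

    def __init__(self, flush_threshold_bytes, backend):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.backend = backend      # hypothetical stand-in for a hard disk
        self.pending = []           # write data held temporarily in memory
        self.pending_bytes = 0

    def write(self, data: bytes):
        # Keep the data in memory first.
        self.pending.append(data)
        self.pending_bytes += len(data)
        # Persist everything once the buffered total reaches the threshold.
        if self.pending_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        self.backend.extend(self.pending)
        self.pending = []
        self.pending_bytes = 0


disk = []
buf = WriteBuffer(flush_threshold_bytes=8, backend=disk)
buf.write(b"abc")      # 3 bytes buffered, below the threshold
buf.write(b"defgh")    # 8 bytes in total, which triggers a flush
```

After the second write, both payloads have been moved from the buffer to the stand-in disk in arrival order.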
The memory 1213 is an internal memory that exchanges data directly with the processor. It can read and write data at any time and at high speed, and serves as temporary data storage for an operating system or other running programs. The memory includes at least two types of memory; for example, it may be a random access memory (RAM) or a read-only memory (ROM). The random access memory is, for example, DRAM or SCM. DRAM is a semiconductor memory and, like most RAM, is a volatile memory device. However, DRAM and SCM are only examples in this embodiment, and the memory may also include other random access memories, such as static random access memory (SRAM). The read-only memory may be, for example, a programmable read-only memory (PROM) or an erasable programmable read-only memory (EPROM). In addition, the memory 1213 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or an SSD. In practice, the controller may be configured with a plurality of memories 1213 of different types. The number and type of the memories 1213 are not limited in this embodiment. In addition, the memory 1213 may be configured to have a power retention function, meaning that data stored in the memory 1213 is not lost when the system is powered off and powered on again. A memory having a power retention function is called a nonvolatile memory.
The memory 1213 stores software programs, and the processor 1212 runs the software programs in the memory 1213 to manage the hard disk. For example, a hard disk is abstracted into a storage resource pool, and the storage resource pool is provided to a server for use in the form of a Logical Unit Number (LUN). The LUN here is in fact the hard disk seen on the server. Of course, some centralized storage systems are themselves file servers, and may provide shared file services for the servers.
As an alternative implementation, fig. 1B is a schematic diagram of another data storage system provided in the present application. In this system, the engine 121 may not have a hard disk slot, so the hard disks need to be placed in the hard disk frame 130, and the back-end interface 1214 communicates with the hard disk frame 130. The back-end interface 1214 exists in the engine 121 in the form of an adapter card, and two or more back-end interfaces 1214 may be used simultaneously on one engine 121 to connect a plurality of hard disk frames. Alternatively, the adapter card may be integrated on the motherboard, in which case the adapter card may communicate with the processor 1212 through the PCIe bus.
It should be noted that only one engine 121 is shown in fig. 1B, however, in practical applications, two or more engines 121 may be included in the storage system, and redundancy or load balancing is performed among the engines 121.
The hard disk frame 130 includes a control unit 1225 and several hard disks. The control unit 1225 may take various forms. In one case, the hard disk frame 130 is an intelligent disk frame and, as shown in fig. 1B, the control unit 1225 includes a CPU and a memory. The CPU performs operations such as address translation and reading and writing data. The memory temporarily stores data to be written to the hard disk, or data read from the hard disk to be sent to the controller. In another case, the control unit 1225 is a programmable electronic component, such as a data processing unit (DPU). The DPU has the generality and programmability of a CPU, but is more specialized and can operate efficiently on network packets, storage requests, or analysis requests. DPUs are distinguished from CPUs by a high degree of parallelism (they need to process large numbers of requests). Optionally, the DPU may be replaced with a graphics processing unit (GPU), an embedded neural-network processor (NPU), or another processing chip. In general, there may be one control unit 1225, or two or more. The functions of the control unit 1225 may also be offloaded to the network card 1226. In other words, in such an embodiment the hard disk frame 130 does not contain the control unit 1225, and the network card 1226 performs data reading and writing, address translation, and other computing functions. In this case, the network card 1226 is an intelligent network card. It may contain a CPU and a memory, where the CPU performs operations such as address translation and reading and writing data, and the memory temporarily stores data to be written to the hard disk, or data read from the hard disk to be sent to the controller. The network card 1226 may also be a programmable electronic component such as a DPU.
There is no affiliation between the network card 1226 and the hard disk in the hard disk frame 130, and the network card 1226 can access any one of the hard disks in the hard disk frame 130 (such as the mechanical hard disk 1221, the mechanical hard disk 1222, the solid state hard disk 1223, and the other hard disks 1224 shown in fig. 1A), so it is convenient to expand the hard disks when the storage space is insufficient.
Depending on the type of communication protocol between the engine 121 and the hard disk frame 130, the hard disk frame 130 may be a serial attached small computer system interface (SAS) hard disk frame, a non-volatile memory express (NVMe) hard disk frame, or another type of hard disk frame. The SAS hard disk frame adopts the SAS 3.0 protocol, and each frame supports 25 SAS hard disks. The engine 121 is connected to the hard disk frame 130 through an onboard SAS interface or a SAS interface module. The NVMe hard disk frame is more like a complete computer system, with NVMe hard disks inserted into it. The NVMe hard disk frame is in turn connected to the engine 121 through an RDMA port.
In order to improve the data read-write speed of RAID while ensuring the rationality of writing IO data into RAID, the following description takes the storage device 120 to execute the data writing method provided in this embodiment as an example, and fig. 2 is a schematic flow chart of a data writing method provided in this application, where the data writing method includes the following steps.
S210, the storage device 120 obtains the first read IO request.
In one possible implementation, as shown in fig. 1A, the first read IO request may be a read IO request sent by the computing device 100 to the storage device 120. For example, the first read IO request may be a read IO request generated when the computing device runs an application.
In another possible implementation manner, as shown in fig. 1B, the first read IO request may be an IO request generated when the storage device 120 performs a service. For example, the service may be a GC process executed by the storage device 120, and the first read IO request includes the valid data after GC. The first read IO request may be stored in the memory 1213 of the storage device 120; for example, the processor 1212 in the storage device 120 may obtain the first read IO request from the memory 1213.
Taking GC performed by the storage device 120 on the solid state disk 1223 shown in fig. 2 as an example, the storage device 120 deletes invalid data (also called garbage data) in the solid state disk 1223 and writes the valid data of the solid state disk 1223 to the mechanical hard disk 1222. Invalid data is data that will not be read by the storage device 120 in subsequent processes after GC; valid data is data that may be read by the storage device 120 in subsequent processes after GC. Fig. 3 is a schematic diagram of data reading and writing provided by this application, where the solid state disk 1223 and the mechanical hard disk 1222 each include multiple storage regions. After GC is performed on one storage region in the solid state disk 1223, the storage region has multiple data blocks (blocks) that include valid data, and each block includes 8 data pages (pages). As shown in fig. 3, the data pages in the data block 311 are "A0C0B00D", the data pages in the data block 312 are "0E0000F0", and the data pages in the data block 313 are "000G0000". Taking the case where the controller reads the data pages in the data block 311 as an example, the first read IO request includes the valid data "ACBD".
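How the valid data "ACBD" follows from the block layout can be sketched in a few lines. The sketch assumes each block is written as a string of eight one-character pages in which "0" marks a page without valid data; the function name `valid_pages` is hypothetical.

```python
def valid_pages(block: str, empty: str = "0") -> str:
    """Return the valid pages of a block, in their on-disk order.

    Assumes "0" marks a page that holds no valid data after GC.
    """
    return "".join(page for page in block if page != empty)


# Block layouts as given for data blocks 311 to 313 in fig. 3.
blocks = {311: "A0C0B00D", 312: "0E0000F0", 313: "000G0000"}
valid = {block_id: valid_pages(layout) for block_id, layout in blocks.items()}
print(valid)
```

For block 311 this yields "ACBD", and for blocks 312 and 313 it yields "EF" and "G".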
In this document, the read-write process shown in fig. 3 is described by taking the granularity of data (the minimum unit of data read-write) of the data blocks 311 to 313 as block and the data included in the data blocks as page as an example. However, in some possible examples, the data granularity of data blocks 311-313 may also be chunk (the data granularity of chunk is typically greater than block), and the data each chunk includes may be block. In other possible examples, the granularity of data blocks 311-313 may also be pages, and each page may include data that is smaller than the page.
The examples provided in fig. 2 and 3 are illustrated with data included in IO requests being stored on solid state disk 1223, but in some possible examples, the data included in the IO requests may also be stored on a mechanical hard disk (e.g., mechanical hard disk 1221 shown in fig. 1A) or other hard disk.
S220, the storage device 120 obtains the characteristic value of the first read IO request in the memory 1213.
In one possible example, if the storage device 120 is a centralized storage system, as shown in fig. 1B, the processor 1212 in the storage device 120 may read the characteristic value of the first read IO request from the memory 1213.
In another possible example, if the storage device 120 is a distributed storage system, the storage device 120 may also read the characteristic value of the first read IO request from the memories of other storage devices of the distributed storage system.
The characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a timestamp. The first LBA indicates the storage location of the data to be written in the storage device 120. LBAs may be numbered from 0 to locate the block in which the data to be written resides in the storage device 120; for example, the first LBA is 0, the second LBA is 1, and so on.
The data length indicates the number of bytes occupied by the data to be written; for example, the data to be written occupies 8 kilobytes (kB). The timestamp may indicate the time of the last change to the data to be written, where the change may be at least one of data reading, data writing, or data rewriting.
In some possible embodiments, the characteristic value of the first read IO request may further include a read-write type of the first read IO request, for example, the first read IO request is a request with the read-write type being read IO.
In other possible embodiments, if the first read IO request is a group of IO data streams, the characteristic value of the first read IO request may further include at least one of a read-write ratio of the data, an IO size distribution, a read-read interval distribution, a write-write interval distribution, a sequential stream characteristic, an interval stream characteristic, and an association stream characteristic. The read-write ratio is the ratio of read IO to write IO in an IO data stream; the IO size distribution is the data length of each IO in the IO data stream; the read-read interval distribution is the interval time between two adjacent read IOs; the write-write interval distribution is the interval time between two adjacent write IOs; the sequential stream characteristic is the association information of two adjacent IOs that are written or read continuously; the interval stream characteristic is the association information of two non-adjacent IOs that are written or read continuously; and the association stream characteristic is the association information of any two IOs that are written or read continuously. The above characteristic values are only possible cases provided by the embodiments of the present application and should not be construed as limiting the present application; in some possible examples, the characteristic value of the first read IO request may include more or fewer features.
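One way to picture these characteristic values is as a small record per IO plus a few statistics over a stream. The sketch below is illustrative only: the class `IOFeature`, its field names, and the helper functions are hypothetical, and the timestamps are treated simply as arrival times in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOFeature:
    """Hypothetical per-IO characteristic value (field names are assumed)."""
    lba: int                # first LBA of the data
    length_bytes: int       # data length
    timestamp: float        # time information, in seconds here
    io_type: str = "read"   # read-write type: "read" or "write"


def read_write_ratio(stream):
    """Ratio of read IOs to write IOs in the stream."""
    reads = sum(1 for io in stream if io.io_type == "read")
    writes = sum(1 for io in stream if io.io_type == "write")
    return reads / writes if writes else float("inf")


def read_read_intervals(stream):
    """Interval times between adjacent read IOs."""
    times = [io.timestamp for io in stream if io.io_type == "read"]
    return [later - earlier for earlier, later in zip(times, times[1:])]


stream = [
    IOFeature(lba=0, length_bytes=8192, timestamp=0.0),
    IOFeature(lba=4, length_bytes=8192, timestamp=1.5),
    IOFeature(lba=9, length_bytes=4096, timestamp=2.0, io_type="write"),
    IOFeature(lba=2, length_bytes=8192, timestamp=4.0),
]
print(read_write_ratio(stream), read_read_intervals(stream))
```

The IO size distribution and the stream characteristics could be derived from the same records in a similar way.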
As an alternative implementation manner, as shown in fig. 2, a characteristic value of the first read IO request may be stored in the memory 1213, and the memory 1213 may further store a plurality of other IO requests. In a possible example, if the storage capacity of the memory 1213 is large, the memory 1213 may also hold characteristic values of a plurality of IO requests.
Compared with storing the characteristic value of the first read IO request in a mechanical hard disk or a solid state disk, storing it in the memory of the storage device, as in the embodiment of the application, shortens the time the controller of the storage device needs to read the characteristic value, because the memory is read faster than either the mechanical hard disk or the solid state disk. This increases the speed at which the storage device writes data.
As an alternative implementation, if many IO requests are stored in the memory 1213 of the storage device 120, the characteristic value of the first read IO request may also be stored in a hard disk of the storage device 120, such as the solid state disk 1223 shown in fig. 2. When the memory 1213 of the storage device 120 is insufficient to store the characteristic value of the first read IO request, the storage device 120 may use a partial area of the solid state disk 1223 as memory to implement a function of the memory 1213 (e.g., caching the characteristic value of the first read IO request).
S230, the storage device 120 determines at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model.
The second IO request has an association relationship of being read continuously with the first read IO request, and the second IO request may be stored in the memory 1213 of the storage device 120. "Being read continuously" means that the data included in the first read IO request and the data included in the second IO request are read continuously in the same process, or are read in a plurality of consecutive processes. As shown in fig. 3, if the first read IO request includes "page ACBD", the 2 second IO requests that have an association relationship of being read continuously with the first read IO request include "page EF" and "page G", respectively. If these data are read in the same process, the controller may rearrange and combine the data included in the IO requests to obtain a block including "ABCDEFG".
By using the data writing method provided by the embodiment of the application, the storage device determines at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model. In other words, the storage device determines a plurality of IO requests that have an association relationship of being read continuously, and can then determine the data access characteristics of the plurality of IO requests according to whether they have such a relationship. For example, if a plurality of IO requests have an association relationship of being read continuously, the storage device determines that the plurality of IO requests need to be read sequentially, and further determines that they are a group of IO requests with strong accessibility; if the plurality of IO requests do not have an association relationship of being read continuously, the storage device determines that the plurality of IO requests are read randomly, and further determines that they are a group of IO requests with strong randomness.
As an optional implementation manner, the foregoing S230 specifically includes: the storage device 120 outputs, according to the characteristic value of the first read IO request and the data feature analysis model, a probability value that a plurality of IO requests are read continuously, and takes each IO request whose probability value reaches the probability threshold as a second IO request. In some examples, "the probability value reaches the probability threshold" means that the probability value is greater than or equal to the probability threshold. The probability threshold is used to determine whether the plurality of IO requests have an association relationship of being read continuously, and its value may be determined according to the performance of the storage device, the service requirements of the storage device, or the service requirements of the computing device. Here, taking pages A to G shown in fig. 3 as an example, table 1 shows the probability values at which any two of pages A to G are read continuously.
TABLE 1 (row: page read first; column: page read next; "-" marks the empty diagonal cell)
     A      B      C      D      E      F      G
A    -      90%    4%     3%     2%     1%     0.4%
B    0.1%   -      85%    3%     2%     1%     0.4%
C    0.1%   0.2%   -      87%    4%     3%     2%
D    0.1%   0.2%   0.3%   -      79%    2%     1%
E    0.1%   0.2%   0.1%   0.2%   -      84%    4%
F    0.1%   0.2%   0.1%   0.2%   0.5%   -      74%
G    0.1%   0.2%   0.1%   0.2%   0.1%   0.2%   -
Wherein after page A is read, page B is read with a probability of 90%.
After page B is read, page C is read with a probability of 85%.
After page C is read, page D is read with a probability of 87%.
After page D is read, page E is read with a probability of 79%.
After page E is read, page F is read with a probability of 84%.
After page F is read, page G is read with a 74% probability.
In one possible implementation manner, each page has a different LBA. Taking the LBA of page A being 0 as an example, as shown in fig. 3, the data block 311 includes the data pages "A0C0B00D" and occupies LBAs 0 to 7, where the LBA of page B is 4, the LBA of page C is 2, and the LBA of page D is 7; the data block 312 includes the data pages "0E0000F0" and occupies LBAs 8 to 15, where the LBA of page E is 9 and the LBA of page F is 14; the data block 313 includes the data pages "000G0000" and occupies LBAs 16 to 23, where the LBA of page G is 19. Table 1 illustrates the probability values for the pages that include valid data; the probability value of two pages being read continuously may also be represented by the probability value of the corresponding two LBAs being read continuously, which is not limited in the present application.
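The LBA assignment in this example can be reproduced by numbering the pages of the three blocks consecutively from 0. The sketch below assumes the blocks are laid out back to back and that "0" marks a page without valid data; the function name `page_lbas` is hypothetical.

```python
def page_lbas(blocks, empty="0"):
    """Map each valid page to its LBA, numbering pages from 0 across blocks."""
    lbas = {}
    lba = 0
    for layout in blocks:
        for page in layout:
            if page != empty:
                lbas[page] = lba
            lba += 1   # every page slot consumes one LBA, valid or not
    return lbas


# Data blocks 311, 312 and 313, in that order.
lbas = page_lbas(["A0C0B00D", "0E0000F0", "000G0000"])
print(lbas)
```

The resulting mapping matches the LBAs listed above: A is 0, C is 2, B is 4, D is 7, E is 9, F is 14, and G is 19.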
In a first possible example, the storage device determines only one second IO request according to the characteristic value of the first read IO request and the data feature analysis model. For example, suppose the probability threshold is 70%, the first read IO request includes page A, and another IO request includes page B. As shown in table 1, the probability that page A and page B are read continuously is 90%, which is greater than 70%, so it is determined that the first read IO request and the other IO request have an association relationship of being read continuously, and the storage device takes that IO request as the second IO request.
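The threshold comparison in this first example can be sketched as a simple filter over one row of table 1. The names `NEXT_AFTER_A` and `select_second_requests` are hypothetical, and each candidate IO request is represented here only by the page it contains.

```python
# Row A of table 1: probability that each page is read right after page A.
NEXT_AFTER_A = {"B": 0.90, "C": 0.04, "D": 0.03, "E": 0.02, "F": 0.01, "G": 0.004}


def select_second_requests(next_probabilities, threshold):
    """Keep the candidates whose probability reaches the threshold (>=)."""
    return [page for page, p in next_probabilities.items() if p >= threshold]


selected = select_second_requests(NEXT_AFTER_A, threshold=0.70)
print(selected)
```

With a 70% threshold only the request containing page B is selected, which corresponds to the single second IO request in the example.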
In a second possible example, the storage device determines a plurality of second IO requests according to the characteristic value of the first read IO request and the data feature analysis model. For example, suppose the probability threshold is 70%, the first read IO request includes page A, one IO request includes page B, and another IO request includes page C. As shown in table 1, the probability that page A and page B are read continuously is 90%, and the probability that page B and page C are read continuously is 85%, so the probability that pages A to C are read continuously is determined to be 90% × 85% = 76.5% > 70%. It is therefore determined that the first read IO request and the 2 IO requests have an association relationship of being read continuously, and the storage device takes the 2 IO requests as second IO requests. The number of second IO requests being 2 is only an example; in some possible examples, more second IO requests may have an association relationship of being read continuously with the first read IO request.
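The product rule used in this second example can be checked with a short calculation. The sketch below uses the adjacent-page probabilities from table 1; the names `ADJACENT` and `chain_probability` are hypothetical.

```python
import math

# Probabilities from table 1 that the second page is read right after the first.
ADJACENT = {
    ("A", "B"): 0.90, ("B", "C"): 0.85, ("C", "D"): 0.87,
    ("D", "E"): 0.79, ("E", "F"): 0.84, ("F", "G"): 0.74,
}


def chain_probability(pages):
    """Probability that the pages are read continuously, as a product."""
    return math.prod(ADJACENT[pair] for pair in zip(pages, pages[1:]))


p_abc = chain_probability("ABC")     # 0.90 * 0.85
p_abcd = chain_probability("ABCD")   # 0.90 * 0.85 * 0.87
print(p_abc >= 0.70, p_abcd >= 0.70)
```

Under the product rule the chain A-B-C reaches the 70% threshold (about 76.5%), while extending it to page D drops the value to about 66.6%, below the threshold.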
In addition, in the above embodiment, the probability value that a plurality of IO requests are read continuously is computed, as an example, as the product of the probability values that adjacent pages are read continuously. In some possible examples, to obtain the probability value that a plurality of IO requests are read continuously, the storage device may instead perform curve fitting, weighting processing, and the like on the plurality of probability values, which is not limited in this application. Alternatively, for a sequence such as "page A-page B-page C-page F", the probability value of each pair of adjacent pages being read continuously may be compared with the probability threshold to determine whether the IO requests have an association relationship of being read continuously. For example, if the probability threshold is 65%, the probability values of page A-page B, page B-page C, and page C-page F being read continuously are 90%, 85%, and 3%, respectively. Because 90% and 85% are greater than the probability threshold of 65%, it is determined that page A-page B-page C have an association relationship of being read continuously; because 3% is less than the probability threshold of 65%, it is determined that page A-page B-page C and page F do not have an association relationship of being read continuously.
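The pairwise comparison described above can be sketched as extending a chain until an adjacent pair falls below the threshold. The function name `associated_prefix` and the dictionary `PAIRS` are hypothetical; the probabilities are the table 1 values quoted in the example.

```python
def associated_prefix(pages, pair_probability, threshold):
    """Extend the chain while every adjacent pair reaches the threshold."""
    chain = [pages[0]]
    for prev, cur in zip(pages, pages[1:]):
        if pair_probability[(prev, cur)] < threshold:
            break              # this pair is not read continuously
        chain.append(cur)
    return chain


PAIRS = {("A", "B"): 0.90, ("B", "C"): 0.85, ("C", "F"): 0.03}
chain = associated_prefix(["A", "B", "C", "F"], PAIRS, threshold=0.65)
print(chain)
```

With a 65% threshold the chain stops after page C, so pages A, B, and C are grouped and page F is left out.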
According to the data writing method provided by the embodiment of the application, the storage device can determine the incidence relation among the plurality of IO requests by utilizing the probability value that the plurality of IO requests are read continuously, so that the data access characteristics of the plurality of IO requests are determined, the storage device is favorable for storing the plurality of IO requests according to the data access characteristics of the plurality of IO requests, a group of IO requests with strong randomness can be prevented from being written into the HDD, the IO requests with strong accessibility are stored by the mechanical hard disk, the data reading and writing speed of the HDD is improved, and further the data reading and writing speed of the storage device is improved.
As an optional implementation manner, before the foregoing S230, the data writing method provided in this embodiment of the present application may further include: the storage device obtains a data feature analysis model. For example, the data feature analysis model may be a neural network model, such as a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, or a Convolutional Neural Network (CNN) model. For another example, the data feature analysis model may also be an index table in which different types of IO are recorded: the feature value of an IO request is input to the index table, and the data feature analysis model outputs, for each IO request in a plurality of IO requests, the probability value that it and the input IO request are read continuously.
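For ease of understanding, the index-table form of the data feature analysis model may be sketched as follows. The class names, the field names, and the choice of the LBA as the table key are hypothetical and are introduced only for this illustration; the present application does not specify the layout of the index table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOFeature:
    """Feature value of one IO request (field names are hypothetical)."""
    rw_type: str      # "read" or "write"
    timestamp: float
    lba: int          # logical block address of the data
    length: int       # data length


class IndexTableModel:
    """Lookup-table stand-in for the data feature analysis model."""

    def __init__(self, table):
        # table: {lba of input request: {lba of candidate: probability}}
        self._table = table

    def predict(self, feature, candidates):
        """Probability that each candidate is read right after `feature`."""
        row = self._table.get(feature.lba, {})
        return {c.lba: row.get(c.lba, 0.0) for c in candidates}


model = IndexTableModel({100: {108: 0.90, 500: 0.03}})
first = IOFeature("read", 0.0, 100, 8)
candidates = [IOFeature("write", 1.0, 108, 8), IOFeature("write", 2.0, 900, 8)]
probs = model.predict(first, candidates)
print(probs)
```

A candidate that does not appear in the table row is given probability 0 in this sketch; a real implementation could choose a different default.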
As shown in fig. 4, fig. 4 is a schematic diagram illustrating an acquisition process of a data feature analysis model provided in the present application, and the acquisition process of the data feature analysis model may include the following steps S410 to S440.
And S410, obtaining an IO training set.
The IO training set comprises a plurality of test IOs, and the characteristic values of the test IOs comprise read-write types, timestamps, second LBAs and data lengths of the test IOs. For IO characteristic values, reference may be made to the above description of S220, which is not described herein.
And S420, inputting the IO training set into the first model to obtain the associated information.
The association information includes a probability value that any two test IOs of the plurality of test IOs are read continuously. For the probability values that any two test IOs are read consecutively, reference may be made to the related description in table 1, which is not described herein.
And S430, judging whether the associated information meets the model convergence condition.
The model convergence condition may be that the number of times of training the first model reaches a threshold number of times (e.g., 50,000 times), or that the similarity between the association information and the actual association among the test IOs in the IO training set reaches a similarity threshold (e.g., 80%), which is not limited in this application.
If the association information meets the model convergence condition, S440 is executed; if the association information does not meet the model convergence condition, the first model continues to be trained by using the IO training set until the model convergence condition is met.
And S440, taking the first model as a data characteristic analysis model.
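The control flow of S420 to S440 can be sketched as the following loop. The callables `train_round` and `similarity` are placeholders for the actual training and evaluation of the first model, which the present application does not restrict; the default thresholds are the example values mentioned above.

```python
def train_until_converged(train_round, similarity,
                          max_rounds=50_000, similarity_threshold=0.80):
    """Repeat training until one of the two convergence conditions holds.

    train_round(): runs one training pass over the IO training set (S420).
    similarity():  returns, in [0, 1], how closely the association
                   information matches the actual association (S430).
    Returns the number of rounds that were executed before S440.
    """
    rounds = 0
    while rounds < max_rounds:
        train_round()
        rounds += 1
        if similarity() >= similarity_threshold:
            break
    return rounds


# Toy stand-in: each round raises the similarity by one tenth.
state = {"rounds": 0}


def toy_round():
    state["rounds"] += 1


def toy_similarity():
    return state["rounds"] / 10


rounds_used = train_until_converged(toy_round, toy_similarity)
print(rounds_used)
```

In this toy run the similarity threshold is reached before the round limit; when the similarity never reaches the threshold, the loop stops at `max_rounds` instead.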
It should be noted that, in the method provided in the embodiment of the present application, the process of training the data feature analysis model may be executed by a storage device or a computing device, and the present application is not limited thereto. In some possible examples, the data feature analysis model may also be obtained by other computing devices that send the trained data feature analysis model to the computing device or storage device provided by embodiments of the present application.
By way of example, the other computing device may include at least one processor, which may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a CPU, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. For example, when the other computing device has only one processor, the processor may implement the above-described S410 to S440 and possible sub-steps thereof. For another example, when the other computing device includes a plurality of processors, the plurality of processors may cooperatively implement the above S410 to S440 and possible sub-steps thereof; for example, if the plurality of processors includes a first processor and a second processor, the first processor may implement the above S410 and S420, and the second processor may implement the above S430 and S440.
Generally, the processing capability of a computing device is stronger than that of a storage device. Therefore, compared with training the data feature analysis model on the storage device, executing the training process on the computing device shortens the training time and improves the training efficiency.
The storage device determines the association relationship among the IO requests by using the data feature analysis model, and then determines the data access mode of the IO requests according to the association relationship. This reduces the manual effort of determining the association relationship among the IO requests and improves the association of the written data. Further, in the data reading process, the storage device can sequentially read the IO requests that have the association relationship of being read continuously, which improves the data reading performance of the HDD and the data read/write performance of the storage device where the HDD is located.
In one possible example, the data access modes described above include a random mode and an accessibility mode. For example, the storage device determines the probability value that any two IO requests are read continuously by using the data feature analysis model, and compares the probability value with a probability threshold. If the probability value is greater than or equal to the probability threshold, the storage device determines that the two IO requests have an association relationship of being read continuously, and the data access mode of the two IO requests is the accessibility mode. If the probability value is smaller than the probability threshold, the storage device determines that the two IO requests do not have the association relationship of being read continuously, and the data access mode of the two IO requests is the random mode.
With continued reference to fig. 2, after S230, in order to increase the data read/write speed of the storage device, the data writing method provided in the embodiment of the present application further includes the following steps.
S240, the storage device 120 writes the first read IO request and all the second IO requests together into the mechanical hard disk of the storage device 120.
As shown in fig. 2, the storage device 120 may write the first read IO request and all second IO requests to the mechanical hard disk 1222 or the mechanical hard disk 1221.
In one possible example, if the storage device 120 sets the storage logic of hard disks of the same type to a balance policy, such as writing data into the hard disk of that type with the larger remaining space, then when the remaining space of the mechanical hard disk 1222 is larger than the remaining space of the mechanical hard disk 1221, the storage device 120 may write the first read IO request and all the second IO requests into the mechanical hard disk 1222.
In another possible example, if the storage device 120 sets the storage logic of hard disks of the same type to a resource optimal utilization policy, such as writing to another hard disk of the same type only after the current hard disk is full, then when the remaining space of the mechanical hard disk 1222 is larger than the remaining space of the mechanical hard disk 1221, the storage device 120 may write the first read IO request and all the second IO requests into the mechanical hard disk 1221.
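The two placement policies above may be sketched as follows. The policy names `"balance"` and `"fill_first"`, and the interpretation of the resource optimal utilization policy as "choose the non-full disk with the least remaining space", are assumptions of this sketch.

```python
def pick_disk(free_space, policy):
    """Choose the target hard disk among disks of the same type.

    free_space: {disk id: remaining space}, e.g. in bytes.
    policy:     "balance"    -> disk with the most remaining space;
                "fill_first" -> non-full disk with the least remaining
                                space, so one disk fills before the next.
    """
    usable = {disk: space for disk, space in free_space.items() if space > 0}
    if not usable:
        raise ValueError("all disks are full")
    if policy == "balance":
        return max(usable, key=usable.get)
    if policy == "fill_first":
        return min(usable, key=usable.get)
    raise ValueError(f"unknown policy: {policy}")


# Mechanical hard disk 1222 has more remaining space than 1221.
free = {"1221": 100, "1222": 300}
print(pick_disk(free, "balance"), pick_disk(free, "fill_first"))
```

Under the balance policy the sketch selects the mechanical hard disk 1222, and under the fill-first interpretation it selects the mechanical hard disk 1221, consistent with the two examples above.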
As an optional implementation manner, if the data to be written included in the first read IO request and the second IO requests is smaller than the minimum data granularity of the data written in the storage device, "write together" in S240 may mean that the controller in the storage device 120 combines the first read IO request and all the second IO requests according to logic information and writes the combined data into the mechanical hard disk, where the logic information indicates the order in which the first read IO request and all the second IO requests are read. For example, with reference to the contents shown in fig. 3 and table 1, suppose the first read IO request includes pages "ACBD", one second IO request includes page E and page F, another second IO request includes page G, and the block obtained by directly combining the pages is "ACBDEFG". According to the probability value information shown in table 1, the controller obtains the probability value that the pages are read in the order "ABCDEFG": 90% × 85% × 87% × 79% × 84% × 74% ≈ 32.68%. If the probability threshold for a plurality of IO requests being read continuously is 25%, then 32.68% > 25%, and the controller rearranges and combines the above pages to obtain a block comprising "ABCDEFG".
In the GC process in the prior art, the storage device moves the valid data obtained by the GC. As shown in the data block 311 in fig. 3, the data block 311 includes 8 pages, "A0C0B00D", and the storage device writes the valid data in the data block 311 to the mechanical hard disk 1222 in the order "page A-page C-page B-page D". However, the 4 pages are read in the order "page A-page B-page C-page D", so during data reading the magnetic head in the mechanical hard disk 1222 has to be repositioned repeatedly to read pages A to D (the magnetic head needs to move 5 unit distances from page A to page D). In other words, in the GC process of the prior art, the storage device does not rearrange the read data, so the data in the HDD becomes more scattered as the number of GC passes increases, the data reading speed of the HDD becomes slower and slower, and the data reading speed of the storage device where the HDD is located decreases.
In contrast, according to the data writing method provided by the embodiment of the application, the storage device may determine, by using the feature value of the first read IO request and the data feature analysis model, the second IO requests having the association relationship of being read continuously with the first read IO request. For a storage device including multiple storage media, the storage device writes the first read IO request and the plurality of second IO requests into the mechanical hard disk together, so that the plurality of continuously read IO requests are written sequentially. Further, in the data reading process, the storage device reads the mechanical hard disk according to the logical order of the plurality of IO requests, so that the plurality of IO requests in the mechanical hard disk are read sequentially, which increases the data reading speed of the mechanical hard disk and of the storage device.
In one possible example, with reference to fig. 3 and table 1, if the first IO request includes page A and page B, the second IO request includes page C and page D, the probability threshold is 45%, and the probability value of page A-page B-page C-page D being read continuously is 90% × 85% × 87% × 79% ≈ 52.58% > 45%, the storage device writes the first IO request and the second IO request to the HDD in the order "page A-page B-page C-page D". In the data reading process, because the first IO request and the second IO request were written in sequence, the storage device reads the 4 pages in the order "page A-page B-page C-page D", so that the number of seeks of the magnetic head of the HDD is reduced (the magnetic head needs to move 3 unit distances from page A to page D), the data reading speed of the HDD is improved, and the data reading speed of the storage device is further improved.
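The combining step described above can be sketched as follows. The function name is hypothetical, and the six probability values are simply the factors listed in the example; the read order is assumed to be supplied as the logic information.

```python
from math import prod


def merge_in_read_order(requests, read_order):
    """Combine the pages of several IO requests into one block whose
    pages follow `read_order` (the logic information)."""
    rank = {page: i for i, page in enumerate(read_order)}
    pages = [page for request in requests for page in request]
    return "".join(sorted(pages, key=rank.__getitem__))


# First read IO request "ACBD", then second IO requests "EF" and "G".
block = merge_in_read_order(["ACBD", "EF", "G"], "ABCDEFG")

# Factors listed in the example for the read order "ABCDEFG".
order_probability = prod([0.90, 0.85, 0.87, 0.79, 0.84, 0.74])

print(block, round(order_probability, 4), order_probability > 0.25)
```

The product of the listed factors is about 32.68%, which exceeds the 25% threshold of the example, so the pages are written in the rearranged order.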
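The unit distances mentioned for the two write orders can be checked with a simplified model in which each page occupies one consecutive position on the disk and the head travel is the sum of position differences between successively read pages. This model and the function name are assumptions of this sketch, not a description of real HDD geometry.

```python
def head_travel(layout, read_order):
    """Total unit distance the head moves when the pages stored in
    `layout` (one page per consecutive position) are read in `read_order`."""
    position = {page: i for i, page in enumerate(layout)}
    return sum(abs(position[b] - position[a])
               for a, b in zip(read_order, read_order[1:]))


# Prior-art GC keeps the order "ACBD"; the method above writes "ABCD".
print(head_travel("ACBD", "ABCD"), head_travel("ABCD", "ABCD"))
```

Under this model, the layout "ACBD" costs 5 unit distances and the layout "ABCD" costs 3 unit distances when the pages are read in the order page A-page B-page C-page D.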
As an alternative implementation, in order to fully utilize the hard disk formed by various storage media in the storage device 120, data stored in the solid state disk 1223 (e.g., data with stronger accessibility) may be written into the mechanical hard disk 1221 or the mechanical hard disk 1222. As shown in fig. 1A, data included in the first read IO request and the second IO request are stored in the solid state disk 1223, the controller reads the data from the solid state disk 1223 to the memory 1213, and further, the controller writes the data to the mechanical hard disk 1221 or the mechanical hard disk 1222 according to the read order by using the data writing method provided in the embodiment of the present application. By writing a group of data with strong accessibility into the mechanical hard disk from the solid state disk, the utilization rate of various different storage medium layers in the storage equipment can be improved, and the balance between the performance and the cost of the storage equipment is realized.
With reference to fig. 2, in order to store IO requests that do not have a continuously read association relationship with the first read IO request and improve resource utilization of each storage medium in the storage device, the data writing method provided in the embodiment of the present application may further include the following steps.
S250, the storage device 120 determines at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model.
The third IO request has no association with the first read IO request to be read continuously.
With reference to the contents shown in fig. 2, fig. 3, and table 1, suppose the block included in the first read IO request is "page ACBD", the block included in the IO request to be tested is "page G", and the probability threshold is 40%. If the probability that any one of the pages included in the first read IO request and page G included in the IO request to be tested are read continuously is smaller than the probability threshold, the storage device determines that the first read IO request and the IO request to be tested do not have the association relationship of being read continuously, and takes the IO request to be tested as the third IO request.
The characteristic value of the first read IO request is stored in the memory of the storage device, so that when the incidence relation of a plurality of IO requests to be tested is confirmed, only the characteristic value of the first read IO request needs to be obtained from the memory.
S260, the storage device 120 writes all the third IO requests into the solid state disk 1223 of the storage device 120.
As shown in fig. 2, storage device 120 may write the third IO request to solid state disk 1223. In one example, the third IO request may also be saved in the memory 1213 of the storage device 120, so as to increase the data writing speed of the storage device.
By using the data writing method provided by the embodiment of the application, the storage device at least comprises two storage media, namely the HDD and the SSD, so that the performance and the cost of the storage device can be balanced, and the rationality of the storage device is ensured. In addition, the storage device may determine whether the first read IO request and the IO request to be tested have a continuously read association relationship by using the data characteristic analysis model and the characteristic value of the first read IO request, and then the storage device stores the plurality of IO requests into a mechanical hard disk or a solid state hard disk of the storage device. For example, the storage device writes a second IO request having an association relationship with the first read IO request being read continuously into the mechanical hard disk, and the storage device writes a third IO request having no association relationship with the first read IO request being read continuously into the solid state hard disk.
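A minimal sketch of the routing decision in S230 to S260 is given below. It assumes that an IO request to be tested is treated as a second IO request when at least one page of the first read IO request reaches the probability threshold with one of its pages, and as a third IO request otherwise; the request names and the probability for page D-page E are illustrative values only.

```python
def route(first_pages, candidates, pair_prob, threshold):
    """Split the IO requests to be tested into those written to the
    mechanical hard disk (second IO requests) and those written to the
    solid state disk (third IO requests)."""
    to_hdd, to_ssd = [], []
    for name, pages in candidates.items():
        best = max((pair_prob.get((a, b), 0.0)
                    for a in first_pages for b in pages), default=0.0)
        (to_hdd if best >= threshold else to_ssd).append(name)
    return to_hdd, to_ssd


# Illustrative probability: only page D -> page E is likely to be continuous.
pair_prob = {("D", "E"): 0.79}
hdd, ssd = route("ACBD", {"req_E": "E", "req_G": "G"}, pair_prob, 0.40)
print(hdd, ssd)
```

In this illustration the request containing page E is routed to the mechanical hard disk and the request containing page G is routed to the solid state disk.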
In the foregoing embodiment of the present application, the data writing method is executed by a controller in the storage device 120, and as an optional implementation manner, the data writing method provided in the present application may also be executed by a hard disk frame 122 in the storage device 120, which is not described herein again.
In the foregoing embodiment of the present application, the data writing method is executed by the storage device 120, as an alternative implementation manner, the data writing method provided by the present application may also be executed by the computing device 100, and fig. 5 is a schematic flow chart of another data writing method provided by the present application, where the data writing method includes the following steps.
S510, the computing device 100 obtains the first read IO request.
As shown in fig. 1A, the first read IO request may be an IO request generated when the computing device 100 runs an application program, or an IO request sent by another device and received by the computing device 100 through the switch 110.
S520, the computing device 100 determines at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model.
The characteristic value of the first read IO request includes a first LBA of data to be written, a data length, and a timestamp, the second IO request has an association relationship with the first read IO request that is read continuously, and the second IO request is stored in a memory of the storage device. For the characteristic value of the first read IO request, reference may be made to the related description of S210, which is not described herein again. For the data characteristic analysis model provided in the embodiment of the present application, reference may be made to the related description of fig. 4, which is not repeated herein.
As an optional implementation manner, the characteristic value of the first read IO request may be stored in a memory of the computing device, and since the data reading speed of the memory is greater than the data reading speed of the mechanical hard disk and the solid state disk, the time for the computing device to read the characteristic value is reduced, and the data writing speed is increased.
As an optional implementation manner, the foregoing S520 specifically includes: outputting the probability values that a plurality of IO requests are read continuously according to the characteristic value of the first read IO request and the data characteristic analysis model; and taking the IO requests whose probability values reach the probability threshold as the second IO requests. For the process and the beneficial effect of the computing device determining at least one second IO request according to the feature value of the first read IO request and the data feature analysis model, reference may be made to the above description of S230, which is not repeated herein.
S530, the computing device 100 obtains the first storage message of each second IO request.
In some examples, "storing a message" may also be referred to as storing an instruction, storing information, or storing an identification, etc. The first storage message in S530 instructs the storage device to write the second IO request from the memory to the mechanical hard disk of the storage device. The first storage message may include the LBA, data length, or LUN of the data to be written in the second IO request. In the data writing method provided by the embodiment of the application, the LUN may be used to determine a target storage resource pool of the data to be written included in the second IO request in the storage device. The LBA and the data length can be referred to in relation to S220, and are not described herein.
S540, the computing device 100 sends the first read IO request and all the first store messages to the storage device 120.
As shown in fig. 1A, computing device 100 may send a first read IO request and all first storage messages to storage device 120 through switch 110 to implement a process to access data. In some possible examples, the switch 110 may be an optional device, the computing device 100 may communicate with the storage device 120 through a network, and the computing device 100 may also communicate with the storage device 120 using a wired connection, which is not limited in the manner in which the computing device and the storage device communicate.
S550, the storage device 120 writes the second IO request from the memory into the mechanical hard disk of the storage device 120 according to the first storage message.
For example, the first storage message includes an LBA and a data length of the second IO request, where the LBA indicates a start address of the data to be written stored in the storage device, and the storage device obtains an end address of the data to be written according to the start address and the data length, and further, the storage device may read the data to be written corresponding to the second IO request according to the start address and the end address.
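The address calculation described above can be sketched as follows. The block size of 4096 bytes and the convention that the end address is exclusive are assumptions of this sketch; the input values are illustrative.

```python
def extent(lba, length_bytes, block_size=4096):
    """Return (start, end) byte addresses of the data to be written.

    lba:          logical block address carried in the storage message.
    length_bytes: data length carried in the storage message.
    block_size:   bytes per logical block (hypothetical value).
    The end address is exclusive: the data occupies [start, end).
    """
    if lba < 0 or length_bytes < 0:
        raise ValueError("lba and length_bytes must be non-negative")
    start = lba * block_size
    return start, start + length_bytes


# A 3 kB piece of data that starts at LBA 3.
print(extent(3, 3 * 1024))
```

The storage device can then read the data to be written from the start address up to, but not including, the end address.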
As an alternative implementation manner, fig. 6 is a schematic flow chart of another data writing method provided in the present application, where the foregoing S550 specifically includes S610 to S630.
S610, the storage device 120 receives the first read IO request and all the first storage messages from the computing device 100.
S620, the storage device 120 determines a second IO request corresponding to each first storage message.
For example, the first storage message includes the LBA, the data length, and the LUN of the second IO request, for example, "LUN 1 LBA 3 3 kB", where the second IO request is located in the 4th logical block of the LUN 1 disk in the storage device, and the data length of the second IO request is 3 kB.
S630, the storage device 120 writes the first read IO request and all the second IO requests into the mechanical hard disk.
For the specific implementation process of S630, reference may be made to the related description of S240 above, and details are not described here.
In the data writing method provided by the embodiment of the application, the computing device may determine, by using the feature value of the first read IO request and the data feature analysis model, the second IO requests having the association relationship of being read continuously with the first read IO request. For a storage device including a plurality of storage media, the computing device sends the first read IO request and the first storage messages of the plurality of second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, so that the plurality of continuously read IO requests are written sequentially. Further, in the data reading process, the mechanical hard disk reads the data sequentially according to the order of the plurality of IO requests, so that the plurality of IO requests in the mechanical hard disk are read sequentially, which increases the data reading speed of the mechanical hard disk and of the storage device.
With reference to fig. 5, in order to store IO requests that do not have a continuously read association relationship with the first read IO request and improve resource utilization of each storage medium in the storage device, the data writing method provided in the embodiment of the present application may further include the following steps.
S561, the computing device 100 determines at least one third IO request according to the feature value of the first read IO request and the data feature analysis model.
The third IO request has no association relationship of being read continuously with the first read IO request. The computing device may determine, according to the probability value, whether an IO request to be tested has the association relationship of being read continuously with the first read IO request. For example, if the block included in the first read IO request is "page a", the block included in the IO request to be tested is "page d", and the probability threshold is 30%, the computing device determines, by using the feature value of the first read IO request, that the probability value of page d being read after page a is 25%. Because 25% < 30%, the computing device determines that page a and page d do not have the association relationship of being read continuously, further determines that the IO request to be tested and the first read IO request do not have the association relationship of being read continuously, and takes the IO request to be tested as the third IO request.
S562, the computing device 100 obtains the second storage message of each third IO request.
The second storage message instructs the storage device to write the third IO request into the solid state disk of the storage device. The second storage message may include the LBA, data length, or LUN of the data to be written in the third IO request. In the data writing method provided by the embodiment of the application, the LUN may be used to determine a target storage resource pool, in the storage device, of the data to be written included in the third IO request. For the LBA and the data length, reference may be made to the related description of S220, which is not repeated herein.
S563, the computing device 100 sends all the second storage messages to the storage device 120.
S564, the storage device 120 writes the third IO request from the memory into the solid state disk of the storage device 120 according to the second storage message.
For example, the second storage message includes an LBA and a data length of the third IO request, where the LBA indicates a start address of the data to be written (e.g., page d) stored in the storage device, and the storage device obtains an end address of the data to be written according to the start address and the data length (e.g., 3 bytes), and further, the storage device may read the data to be written corresponding to the third IO request according to the start address and the end address.
As an alternative implementation manner, as shown in fig. 6, the above S564 specifically includes S640 to S660.
S640, the storage device 120 receives all second storage messages from the computing device.
S650, the storage device 120 determines a third IO request corresponding to each second storage message.
For example, the second storage message includes the LBA, the data length, and the LUN of the third IO request, for example, "LUN 3 LBA 5 2 kB", where the third IO request is located in the 6th logical block of the LUN 3 disk in the storage device, and the data length of the third IO request is 2 kB.
S660, the storage device 120 writes all the third IO requests into the solid state disk.
For the specific implementation process of S660, reference may be made to the related description of S260 above, and details are not described here.
By using the data writing method provided by the embodiment of the application, the storage device includes at least two storage media, namely the HDD and the SSD, so that the performance and the cost of the storage device can be balanced, and the rationality of the storage device is ensured. In addition, the computing device may determine, by using the data characteristic analysis model and the characteristic value of the first read IO request, whether the first read IO request and an IO request to be tested have the association relationship of being read continuously. The computing device then causes the plurality of IO requests to be stored in a mechanical hard disk or a solid state disk of the storage device according to whether they have that association relationship, so that data with strong randomness is prevented from being written into the mechanical hard disk, and the data read/write speed of the storage device is improved.
For example, the computing device sends a first storage message of a second IO request having an association relationship with a first read IO request that is continuously read to the storage device, and the storage device writes the first read IO request and a second IO request corresponding to the first storage message to the mechanical hard disk.
For another example, the computing device sends a second storage message of a third IO request that does not have an association relationship with the first read IO request and is continuously read to the storage device, and the storage device writes the third IO request corresponding to the second storage message into the solid state disk.
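The exchange of storage messages between the computing device and the storage device may be sketched as follows. The message layout, the `kind` field, and the function names are hypothetical and serve only to illustrate the flow of fig. 5; the probability values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageMessage:
    """Hypothetical layout of a storage message."""
    kind: str     # "first": write to mechanical hard disk; "second": to SSD
    lun: int
    lba: int
    length: int   # data length in bytes


def build_messages(candidates, threshold):
    """Computing-device side: one message per IO request to be tested.

    candidates: iterable of (lun, lba, length, probability of being read
                continuously with the first read IO request).
    """
    return [
        StorageMessage("first" if prob >= threshold else "second",
                       lun, lba, length)
        for lun, lba, length, prob in candidates
    ]


def apply_messages(messages):
    """Storage-device side: group the referenced data by target medium."""
    placement = {"hdd": [], "ssd": []}
    for msg in messages:
        target = "hdd" if msg.kind == "first" else "ssd"
        placement[target].append((msg.lun, msg.lba, msg.length))
    return placement


msgs = build_messages([(1, 3, 3072, 0.90), (3, 5, 2048, 0.25)], threshold=0.30)
placement = apply_messages(msgs)
print(placement)
```

In this illustration the request with probability 90% yields a first storage message and is placed on the mechanical hard disk, while the request with probability 25% falls below the 30% threshold, yields a second storage message, and is placed on the solid state disk.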
It is understood that, in order to implement the functions of the above embodiments, the computing device and the storage device include corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software driven hardware depends on the particular application scenario and design constraints imposed on the solution.
The data writing method provided according to the present embodiment is described in detail above with reference to fig. 1A to 6, and the data writing apparatus and the computing device provided according to the present embodiment are described below with reference to fig. 7 and 8. For the storage device provided in the embodiment of the present application, reference may be made to relevant contents in fig. 1A or fig. 1B, which are not described herein again.
Fig. 7 is a schematic diagram of a data writing apparatus provided in the present application. The data writing device can be used for realizing the functions of the storage device and the computing device in the method embodiment, so that the beneficial effects of the method embodiment can be realized. In this embodiment, the data writing apparatus may be the storage device 120 or the computing device 100 shown in fig. 1A or fig. 1B.
The structure and function of the data writing apparatus 700 are described below with reference to fig. 7, and the data writing apparatus 700 may implement the functions of the storage device or the computing device shown in fig. 2 and fig. 4 to 6. It should be understood that the present embodiment only exemplarily divides the structure and the functional modules of the data writing device 700, and the present application does not limit the specific division thereof.
As shown in fig. 7, the data writing apparatus 700 includes a transceiving unit 710, a processing unit 720, a storage unit 730, and a model obtaining unit 740, which may be used to implement the methods corresponding to the operation steps executed by the storage device or the computing device shown in fig. 2 to fig. 6.
When the data writing apparatus 700 is used to implement the functions of the storage device in the method embodiment shown in fig. 2, the transceiving unit 710 is configured to perform S210, the processing unit 720 is configured to perform S220, S230, and S250, and the storage unit 730 is configured to perform S240 and S260.
Optionally, when the data writing apparatus 700 is configured to implement the functions in the method embodiment shown in fig. 4, the model obtaining unit 740 is configured to execute S410 to S440.
When the data writing apparatus 700 is used to implement the functions of the computing device in the method embodiment shown in fig. 5, the transceiving unit 710 is configured to execute S510, S540, S561, and S564, and the processing unit 720 is configured to execute S520 and S562.
When the data writing apparatus 700 is used to implement the function of the storage device in the method embodiment shown in fig. 5, the storage unit 730 is used to execute S550 and S564.
When the data writing apparatus 700 is used to implement the functions of the storage device in the method embodiment shown in fig. 6, the transceiving unit 710 is configured to execute S610 and S640, the processing unit 720 is configured to execute S620 and S650, and the storage unit 730 is configured to execute S630 and S660.
More detailed descriptions about the data writing apparatus 700 can be obtained directly by referring to the related descriptions in the method embodiments shown in fig. 2 and fig. 4 to fig. 6, which are not repeated herein.
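The unit-to-step assignments listed above can be summarized as a lookup table. The following is only an illustrative sketch and is not part of the application: the dictionary layout, the key strings, and the helper `steps_for` are hypothetical, and only the unit numbers and step labels are taken from the description. The role for fig. 4 is marked "unspecified" because the description does not state which device executes those steps.

```python
# Hypothetical lookup table; unit names and step labels are copied from the description.
UNIT_STEPS = {
    ("fig. 2", "storage device"): {
        "transceiving unit 710": ["S210"],
        "processing unit 720": ["S220", "S230", "S250"],
        "storage unit 730": ["S240", "S260"],
    },
    ("fig. 4", "unspecified"): {
        # The description gives this as a range, so it is kept as a single label.
        "model obtaining unit 740": ["S410 to S440"],
    },
    ("fig. 5", "computing device"): {
        "transceiving unit 710": ["S510", "S540", "S561", "S564"],
        "processing unit 720": ["S520", "S562"],
    },
    ("fig. 5", "storage device"): {
        "storage unit 730": ["S550", "S564"],
    },
    ("fig. 6", "storage device"): {
        "transceiving unit 710": ["S610", "S640"],
        "processing unit 720": ["S620", "S650"],
        "storage unit 730": ["S630", "S660"],
    },
}


def steps_for(figure, role, unit):
    """Return the steps a unit performs, or an empty list if none are listed."""
    return UNIT_STEPS.get((figure, role), {}).get(unit, [])


print(steps_for("fig. 2", "storage device", "processing unit 720"))
```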
Fig. 8 is a schematic structural diagram of a computing device provided in the present application, where the computing device 800 includes a processor 810 and a communication interface 820. Processor 810 and communication interface 820 are coupled to one another. It is to be appreciated that the communication interface 820 can be a transceiver or an input-output interface. Optionally, the computing device 800 may also include a memory 830 for storing instructions to be executed by the processor 810 or for storing input data required by the processor 810 to execute the instructions or for storing data generated by the processor 810 after executing the instructions.
As a possible implementation manner, the processor 810 may generate data to be compressed in a tree structure according to the original data, and determine data occupancy information in the tree structure by using a cyclic network layer included in the data compression model. The data occupancy information is used to indicate the data distribution of the original data in the tree structure. Further, the processor 810 compresses the data to be compressed according to the data occupancy information to obtain compressed data.
When the computing device 800 is used to implement the methods shown in fig. 4-6, the processor 810, the communication interface 820 and the memory 830 may also cooperatively implement various operation steps in the data processing method performed by the transmitting end and the receiving end. The computing device 800 may also perform the functions of the data writing apparatus 700 shown in fig. 7, which are not described herein.
The specific connection medium among the communication interface 820, the processor 810 and the memory 830 is not limited in the embodiments of the present application. In fig. 8, the communication interface 820, the processor 810 and the memory 830 are connected by a bus 840, the bus is represented by a thick line in fig. 8, and the connection manner among other components is only schematically illustrated and is not limited. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 830 can be used for storing software programs and modules, such as program instructions/modules corresponding to the data processing method provided in the embodiments of the present application, and the processor 810 executes the software programs and modules stored in the memory 830, so as to execute various functional applications and data processing. The communication interface 820 may be used for signaling or data communication with other devices. The computing device 800 may have multiple communication interfaces 820 in this application.
The memory may be, but is not limited to, RAM, ROM, PROM, EPROM, EEPROM, and the like.
The processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor including a CPU, NP, etc.; but may also be a DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a computing device. Of course, the processor and the storage medium may also reside as discrete components in a computing device.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a network appliance, a user device, or other programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium, such as a floppy disk, hard disk, magnetic tape; or optical media such as Digital Video Disks (DVDs); but may also be a semiconductor medium, such as an SSD.
In the embodiments of the present application, unless otherwise specified or logically conflicting, the terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and technical features in different embodiments may be combined to form a new embodiment according to their inherent logical relationship.
The terms "first," "second," and "third," etc. in the description and claims of this application and the above-described drawings are used for distinguishing between different objects and not for limiting a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may indicate: A exists alone, both A and B exist, or B exists alone, where A and B may each be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the objects before and after it; in the formulas of the present application, the character "/" indicates a "division" relationship between the objects before and after it.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the above processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic.

Claims (25)

1. A data writing method, wherein the method is performed by a storage device and comprises: obtaining a first read IO request; determining at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being read consecutively, and the second IO request is stored in a memory of the storage device; and writing the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
2. The method according to claim 1, wherein the characteristic value of the first read IO request comprises a first logical block address (LBA) of data to be written, a data length, and a timestamp.
3. The method according to claim 1 or 2, wherein the method further comprises: determining at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request and the first read IO request do not have the association relationship of being read consecutively; and writing the third IO request into a solid-state disk of the storage device.
4. The method according to any one of claims 1 to 3, wherein determining the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model comprises: outputting, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read consecutively; and using an IO request whose probability value reaches a probability threshold as the second IO request.
5. The method according to any one of claims 1 to 4, wherein before determining the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, the method further comprises: obtaining the characteristic value of the first read IO request from the memory.
6. The method according to any one of claims 1 to 5, wherein the method further comprises obtaining the data characteristic analysis model, and obtaining the data characteristic analysis model specifically comprises: obtaining an IO training set, wherein the IO training set comprises a plurality of test IOs, and a characteristic value of each test IO comprises a read/write type, a timestamp, a second LBA, and a data length of the test IO; inputting the IO training set into a first model to obtain association information, wherein the association information comprises a probability value that any two test IOs of the plurality of test IOs are read consecutively; and if the association information meets a model convergence condition, using the first model as the data characteristic analysis model.
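The selection and placement logic recited in claims 1, 3 and 4 can be illustrated with a short sketch. This is not the applicant's implementation: the class `IORequest`, the functions `classify` and `place`, and the toy scoring function `adjacent_lba_model` are hypothetical names introduced here, and the scoring function merely stands in for the data characteristic analysis model, whose internals the claims do not specify. Treating every cached request below the threshold as a third IO request is also an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class IORequest:
    """Characteristic values named in claim 2: LBA, data length, timestamp."""
    lba: int
    length: int
    timestamp: float


# Hypothetical model interface: probability that `candidate` is read
# consecutively with `first`.
Model = Callable[[IORequest, IORequest], float]


def adjacent_lba_model(first: IORequest, candidate: IORequest) -> float:
    """Toy stand-in for the trained model: 1.0 if the candidate starts where the first ends."""
    return 1.0 if candidate.lba == first.lba + first.length else 0.0


def classify(first: IORequest, cached: List[IORequest], model: Model,
             threshold: float) -> Tuple[List[IORequest], List[IORequest]]:
    """Split cached requests into second (associated) and third (not associated)."""
    second, third = [], []
    for candidate in cached:
        # Claim 4: a request whose probability reaches the threshold is a second IO request.
        if model(first, candidate) >= threshold:
            second.append(candidate)
        else:
            third.append(candidate)  # assumption: everything else is a third IO request
    return second, third


def place(first: IORequest, cached: List[IORequest], model: Model,
          threshold: float = 0.5) -> Dict[str, List[IORequest]]:
    """Return a placement plan: first + second requests to HDD, third requests to SSD."""
    second, third = classify(first, cached, model, threshold)
    return {"hdd": [first, *second], "ssd": third}


first = IORequest(lba=100, length=8, timestamp=0.0)
near = IORequest(lba=108, length=8, timestamp=1.0)
far = IORequest(lba=5000, length=8, timestamp=2.0)
plan = place(first, [near, far], adjacent_lba_model)
print(plan["hdd"], plan["ssd"])
```

Grouping the associated requests into one HDD write mirrors the "written together" wording of claim 1; the default threshold of 0.5 is arbitrary and chosen only for the example.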
7. A data writing method, wherein the method is performed by a computing device, the computing device is connected to a storage device, and the method comprises: obtaining a first read IO request; determining at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being read consecutively, and the second IO request is stored in a memory of the storage device; obtaining a first storage message for each second IO request, wherein the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; and sending the first read IO request and all of the first storage messages to the storage device.
8. The method according to claim 7, wherein the characteristic value of the first read IO request comprises a first logical block address (LBA) of data to be written, a data length, and a timestamp.
9. The method according to claim 7 or 8, wherein the method further comprises: determining at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request and the first read IO request do not have the association relationship of being read consecutively; obtaining a second storage message for each third IO request, wherein the second storage message instructs the storage device to write the third IO request into a solid-state disk of the storage device; and sending all of the second storage messages to the storage device.
10. The method according to any one of claims 7 to 9, wherein determining the at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model comprises: outputting, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read consecutively; and using an IO request whose probability value reaches a probability threshold as the second IO request.
11. The method according to any one of claims 7 to 10, wherein the method further comprises obtaining the data characteristic analysis model, and obtaining the data characteristic analysis model specifically comprises: obtaining an IO training set, wherein the IO training set comprises a plurality of test IOs, and a characteristic value of each test IO comprises a read/write type, a timestamp, a second LBA, and a data length of the test IO; inputting the IO training set into a first model to obtain association information, wherein the association information comprises a probability value that any two test IOs of the plurality of test IOs are read consecutively; and if the association information meets a model convergence condition, using the first model as the data characteristic analysis model.
12. A data writing apparatus, wherein the apparatus is applied to a storage device and comprises: a transceiving unit, configured to obtain a first read IO request; a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being read consecutively, and the second IO request is stored in a memory of the storage device; and a storage unit, configured to write the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
13. The apparatus according to claim 12, wherein the characteristic value of the first read IO request comprises a first logical block address (LBA) of data to be written, a data length, and a timestamp.
14. The apparatus according to claim 12 or 13, wherein the processing unit is further configured to determine at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request and the first read IO request do not have the association relationship of being read consecutively; and the storage unit is further configured to write the third IO request into a solid-state disk of the storage device.
15. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to output, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of read IO requests are read consecutively, and to use a read IO request whose probability value reaches a probability threshold as the second IO request.
16. The apparatus according to any one of claims 12 to 15, wherein the transceiving unit is further configured to obtain the characteristic value of the first read IO request from the memory.
17. The apparatus according to any one of claims 12 to 16, wherein the apparatus further comprises a model obtaining unit, configured to obtain the data characteristic analysis model; the model obtaining unit is specifically configured to obtain an IO training set, wherein the IO training set comprises a plurality of test IOs, and a characteristic value of each test IO comprises a read/write type, a timestamp, a second LBA, and a data length of the test IO; the model obtaining unit is specifically configured to input the IO training set into a first model to obtain association information, wherein the association information comprises a probability value that any two test IOs of the plurality of test IOs are read consecutively; and the model obtaining unit is specifically configured to use the first model as the data characteristic analysis model if the association information meets a model convergence condition.
18. A data writing apparatus, wherein the apparatus is applied to a computing device, the computing device is connected to a storage device, and the apparatus comprises: a transceiving unit, configured to obtain a first read IO request; and a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being read consecutively, and the second IO request is stored in a memory of the storage device; wherein the processing unit is further configured to obtain a first storage message for each second IO request, the first storage message instructing the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; and the transceiving unit is further configured to send the first read IO request and all of the first storage messages to the storage device.
19. The apparatus according to claim 18, wherein the characteristic value of the first read IO request comprises a first logical block address (LBA) of data to be written, a data length, and a timestamp.
20. The apparatus according to claim 18 or 19, wherein the processing unit is further configured to determine at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request and the first read IO request do not have the association relationship of being read consecutively; the processing unit is further configured to obtain a second storage message for each third IO request, wherein the second storage message instructs the storage device to write the third IO request into a solid-state disk of the storage device; and the transceiving unit is further configured to send all of the second storage messages to the storage device.
21. The apparatus according to any one of claims 18 to 20, wherein the processing unit is specifically configured to output, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of read IO requests are read consecutively; and the processing unit is specifically configured to use a read IO request whose probability value reaches a probability threshold as the second IO request.
22. The apparatus according to any one of claims 18 to 21, wherein the apparatus further comprises a model obtaining unit, configured to obtain the data characteristic analysis model; the model obtaining unit is specifically configured to obtain an IO training set, wherein the IO training set comprises a plurality of test IOs, and a characteristic value of each test IO comprises a read/write type, a timestamp, a second LBA, and a data length of the test IO; the model obtaining unit is specifically configured to input the IO training set into a first model to obtain association information, wherein the association information comprises a probability value that any two test IOs of the plurality of test IOs are read consecutively; and the model obtaining unit is specifically configured to use the first model as the data characteristic analysis model if the association information meets a model convergence condition.
23. A storage device, comprising a processor, a mechanical hard disk, and a solid-state disk, wherein the processor is configured to implement, through a logic circuit or by executing code instructions, the method according to any one of claims 1 to 6.
24. A data storage system, comprising a computing device and a storage device, wherein the computing device is connected to the storage device; the computing device is configured to obtain a first read IO request; the computing device is further configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being read consecutively, and the second IO request is stored in a memory of the storage device; the computing device is further configured to obtain a first storage message for each second IO request, wherein the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; the computing device is further configured to send the first read IO request and all of the first storage messages to the storage device; the storage device is configured to receive the first read IO request and all of the first storage messages; and the storage device is further configured to obtain, from the memory, the second IO request corresponding to each first storage message, and to write the first read IO request and all of the second IO requests together into the mechanical hard disk.
25. A computer storage medium, wherein the storage medium stores a computer program or instructions; when the computer program or instructions are executed by a storage device, the method according to any one of claims 1 to 6 is implemented; and when the computer program or instructions are executed by a computing device, the method according to any one of claims 7 to 11 is implemented.
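The division of work between the computing device and the storage device in claims 7 and 24 can be sketched as a small message exchange. The sketch below is an illustration under stated assumptions, not the claimed implementation: `StorageMessage`, `StorageDevice`, `build_messages`, and the integer request identifiers are hypothetical, IO requests are modelled as byte strings, and the claims do not say whether a request is evicted from memory after it is written, so the sketch leaves the memory untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StorageMessage:
    """Hypothetical first storage message: names one second IO request held in memory."""
    request_id: int
    target: str = "hdd"  # the first storage message always targets the mechanical hard disk


@dataclass
class StorageDevice:
    """Hypothetical storage device with an in-memory cache and an HDD write log."""
    memory: Dict[int, bytes] = field(default_factory=dict)
    hdd: List[bytes] = field(default_factory=list)

    def handle(self, first_request: bytes, messages: List[StorageMessage]) -> None:
        """Look up each referenced second IO request and write the batch in one go."""
        batch = [first_request]
        for message in messages:
            # Claim 24: obtain from memory the second IO request for each storage message.
            batch.append(self.memory[message.request_id])
        # The first request and all second requests are written together.
        self.hdd.extend(batch)


def build_messages(second_ids: List[int]) -> List[StorageMessage]:
    """Computing-device side: one first storage message per second IO request."""
    return [StorageMessage(request_id=i) for i in second_ids]


device = StorageDevice(memory={1: b"a", 2: b"b", 3: b"c"})
device.handle(b"first", build_messages([1, 3]))
print(device.hdd)
```

In this arrangement the computing device only sends identifiers, so the payload of each second IO request never leaves the storage device's memory before it is written.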
CN202110281305.4A 2021-03-16 2021-03-16 Data writing method and device Pending CN115079936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110281305.4A CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110281305.4A CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Publications (1)

Publication Number Publication Date
CN115079936A true CN115079936A (en) 2022-09-20

Family

ID=83246022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110281305.4A Pending CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Country Status (1)

Country Link
CN (1) CN115079936A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865803A (en) * 2023-03-03 2023-03-28 浪潮电子信息产业股份有限公司 IO request processing method, device, equipment and readable storage medium
CN115865803B (en) * 2023-03-03 2023-08-22 浪潮电子信息产业股份有限公司 A kind of IO request processing method, device, equipment and readable storage medium
CN117391149A (en) * 2023-11-30 2024-01-12 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data
CN117391149B (en) * 2023-11-30 2024-03-26 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data
WO2025161381A1 (en) * 2024-01-29 2025-08-07 华为技术有限公司 Storage system, data storage method and storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination