
CN118869393A - Method and device for accessing solid state disk (SSD) across a network


Info

Publication number
CN118869393A
Authority
CN
China
Prior art keywords
transaction
transport layer
message
ssd
data
Prior art date
Legal status
Pending
Application number
CN202310483412.4A
Other languages
Chinese (zh)
Inventor
屈向峰
程传宁
程中武
谭焜
李力军
王智用
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202310483412.4A
Publication of CN118869393A


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00: Data switching networks
    • H04L 12/28: Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L 12/46: Interconnection of networks
    • H04L 12/4633: Interconnection of networks using encapsulation techniques, e.g. tunneling
    • H04L 12/40: Bus networks
    • H04L 69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/22: Parsing or analysis of headers
    • H04L 69/26: Special purpose or proprietary protocols or architectures
    • H04L 2212/00: Encapsulation of packets

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method of accessing an SSD across a network is applied to a first device that includes a transport layer and a transaction layer. The method includes: the transport layer of the first device receives, from the transaction layer of the first device, a first transaction that reads or writes the SSD of a second device, encapsulates the first transaction with a transport layer header to obtain a first message, and sends the first message to the transport layer of the second device. The first message is used to request the SSD controller of the second device to execute the first transaction, and decapsulating the first message yields the first transaction, which can be sent directly to the SSD controller of the second device. After the transport layer of the second device receives a message for reading or writing its SSD, it only needs to strip the transport layer header to obtain the transaction to be sent to the SSD controller of the second device; no conversion to another protocol is required. Protocol conversion is thus avoided, and the latency of accessing the SSD across the network can be reduced.

Description

Method and device for accessing a solid state disk (SSD) across a network
Technical Field
Embodiments of the present application relate to the communication field, and in particular, to a method and a device for accessing a solid state disk (SSD) across a network.
Background
Non-volatile memory express (NVMe) is a controller interface standard that unifies the queue (Queue) transfer mechanism between NVMe devices and hosts (Host) connected through a peripheral component interconnect express (PCIe) bus, optimizes the queue interface, and so on.
Following the great success of the PCIe-based NVMe standard in the industry, the NVMe protocol has been run over networks such as remote direct memory access over converged Ethernet (RoCE) and Fibre Channel (FC) to provide more flexible and broader applications. The industry refers to running the NVMe protocol over RoCE and FC networks as NVMe over Fabrics (NOF).
In the NVMe over Fabric architecture, the host (Host) is responsible for initiating data reads and writes, and the target storage device (Target) is responsible for receiving and executing the commands sent by the Host. After the Target receives a Write Command sent by the Host, it parses the content of the Write Command to obtain the length of the data to be transmitted, and allocates a corresponding buffer space in the network card memory of the Target for caching the data to be transmitted by the Host. After the network card of the Target has allocated the storage space required for caching the data, the Host transmits the data into the allocated buffer space of the network card memory. The data cached in the buffer space of the network card memory is then written to the hard disk of the Target. When the Host sends a Read Command, the implementation process is similar: the data in the hard disk of the Target first needs to be cached in the buffer space of the network card memory of the Target, and the data cached in that buffer space is then sent to the Host.
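To make the buffer-allocation step concrete, the following C sketch shows how a Target-side network card might derive the buffer size from a received Write Command and reserve network card memory before the Host transfers the data. The structure and function names are assumptions made for illustration only and are not taken from the NVMe or NOF specifications.

    #include <stdint.h>
    #include <stdlib.h>

    /* Assumed, simplified view of the fields a Target inspects in a Write Command. */
    struct write_command {
        uint64_t host_src_addr;    /* source address of the data in the Host        */
        uint64_t target_dst_addr;  /* destination address of the data in the Target */
        uint32_t data_len;         /* length of the data to be stored, in bytes     */
    };

    /* Allocate a cache buffer in the network card memory whose size matches the
     * data length carried in the Write Command, so the Host can transfer the data
     * into it before the data is written to the hard disk. */
    static void *allocate_nic_buffer(const struct write_command *cmd)
    {
        return malloc(cmd->data_len);   /* stands in for NIC-memory allocation */
    }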
At present, the NVMe protocol can be carried over the RoCE protocol in the NVMe over Fabric architecture, so that an SSD is accessed across a network. However, current techniques for accessing an SSD across a network require multiple conversions between the remote direct memory access (RDMA) protocol and the NVMe protocol, and the latency is relatively large. How to reduce the latency of accessing an SSD across a network is a problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a method for accessing a solid state disk (SSD) across a network, which aims to reduce the latency of accessing the SSD by avoiding conversions between the RDMA protocol and the NVMe protocol when a first device accesses the SSD of a second device across a network.
In a first aspect, a method for accessing an SSD across a network is provided, applied to a first device. The method may be performed by the first device, or may be performed by a circuit configured in the first device; this is not limited in the present application. For ease of description, the following description takes the first device as the execution body. The first device includes a transaction layer and a transport layer, and the method for accessing an SSD across a network includes the following steps:
The transport layer of the first device receives a first transaction from the transaction layer of the first device, where the first transaction is to read the SSD of the second device or to write to the SSD of the second device; the transport layer of the first device encapsulates the first transaction with a transport layer header (TPH) to obtain a first message; and the transport layer of the first device sends the first message to the transport layer of the second device, where the first message is used to request the SSD controller of the second device to execute the first transaction, and the first message is decapsulated to obtain the first transaction that is sent to the SSD controller of the second device.
Based on the foregoing technical solution, when the transaction layer of the first device has a first transaction to be processed (for example, reading or writing a remote SSD), the transaction layer of the first device sends the first transaction to the transport layer of the first device; the transport layer of the first device directly encapsulates the first transaction to obtain a first message and directly sends the encapsulated first message to the transport layer of the second device. After the second device receives the first message, it only needs to strip the transport layer header of the first message and the information preceding it to obtain the first transaction that the first device requests to execute.
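A minimal sketch of this encapsulation step, assuming a simplified header layout (the real UB transport layer header format is not reproduced here), could look as follows in C: the transport layer simply prepends a TPH to the unchanged transaction, so the receiving transport layer only has to strip the header to recover the transaction.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Assumed, simplified transport layer header (TPH). */
    struct tph {
        uint32_t psn;      /* packet sequence number           */
        uint16_t tpg_id;   /* transport group the message uses */
        uint16_t tp_id;    /* TP connection within the group   */
    };

    /* Encapsulate a first transaction into a first message: the TPH followed by
     * the unchanged transaction payload, so the receiver only needs to strip the
     * header to recover the transaction. */
    static uint8_t *encapsulate(const struct tph *hdr,
                                const uint8_t *txn, size_t txn_len,
                                size_t *msg_len)
    {
        uint8_t *msg = malloc(sizeof(*hdr) + txn_len);
        if (msg == NULL)
            return NULL;
        memcpy(msg, hdr, sizeof(*hdr));
        memcpy(msg + sizeof(*hdr), txn, txn_len);
        *msg_len = sizeof(*hdr) + txn_len;
        return msg;
    }

Because the payload is carried unchanged, no protocol conversion is needed at either end.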
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the transport layer of the first device receives a first response message from the transport layer of the second device, where the first response message is used to indicate whether the SSD controller of the second device successfully receives the first transaction; and after the second device successfully receives the first transaction, the transport layer of the first device receives a second response message from the transport layer of the second device, where the second response message is used to indicate that the SSD controller of the second device has successfully executed the first transaction.
Based on the foregoing technical solution, after the transport layer of the second device receives the first message, it may send the decapsulated first transaction to the SSD controller of the second device, so that the SSD controller of the second device learns of the first transaction that the first device currently requests to execute. In some cases (for example, when the buffer on the second device side is full), the SSD controller of the second device may fail to receive the first transaction; the second device can notify the first device through the first response message whether the first transaction was successfully transmitted, and can additionally notify the first device through the second response message when the first transaction has been successfully executed. In summary, the second device can promptly feed back the results of the transmission stage and the execution stage through this two-step response, so that the first device learns the status of each stage (for example, whether transmission is complete or whether execution is complete) from the corresponding response message, which improves the performance of accessing an SSD across a network.
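The two-step response can be pictured with the following C sketch; the enum values and helper functions are hypothetical stand-ins for the transport layer and transaction layer hooks, and only illustrate the behaviour described above (a transmission-stage acknowledgement followed by an execution-stage acknowledgement).

    #include <stdbool.h>
    #include <stdio.h>

    /* First response: did the SSD controller of the second device receive the
     * first transaction?  Second response: was the transaction executed? */
    enum first_response  { TAACK, TANAK };
    enum second_response { EXEC_OK, EXEC_FAIL };

    /* Stubs standing in for the real transport layer / transaction layer hooks. */
    static void resend_first_message(void) { puts("re-encapsulate and resend the first message"); }
    static enum second_response wait_second_response(void) { return EXEC_OK; }
    static void report_to_transaction_layer(bool executed)
    {
        printf("first transaction %s\n", executed ? "executed" : "not executed");
    }

    /* Hypothetical handling on the transport layer of the first device. */
    static void handle_first_response(enum first_response r1)
    {
        if (r1 == TANAK) {             /* receive failed, e.g. the buffer was full */
            resend_first_message();
            return;
        }
        /* Transmission succeeded; wait for the execution-stage response. */
        report_to_transaction_layer(wait_second_response() == EXEC_OK);
    }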
With reference to the first aspect, in certain implementations of the first aspect, if the first response message is a transaction acknowledgement (TAACK), indicating that the SSD controller of the second device successfully received the first transaction, the method further includes: the transport layer of the first device sends the TAACK to the transaction layer of the first device.
With reference to the first aspect, in certain implementation manners of the first aspect, if the first response message is a transaction negative acknowledgement TANAK, indicating that the SSD controller of the second device did not successfully receive the first transaction, the method further includes: the transport layer of the first device sends the TANAK to the transaction layer of the first device; the transport layer of the first device receives the first transaction from the transaction layer, encapsulates the first transaction again to obtain the first message, and resends the first message to the transport layer of the second device.
With reference to the first aspect, in certain implementations of the first aspect, the transport layer of the first device receiving a first transaction from the transaction layer of the first device includes: a first transport layer group (TPG) in the transport layer of the first device receives the first transaction from the transaction layer of the first device; and the transport layer of the first device sending the first message to the transport layer of the second device includes: the transport layer of the first device selects the transport layer (TP) connection with the lightest load in the first TPG, and sends the first message to the transport layer of the second device through that TP connection.
Based on the above technical solution, the transport layer of the first device may implement load balancing in a certain TPG.
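A sketch of the least-loaded selection within a TPG is shown below; the load metric (the number of unacknowledged packets per TP connection) is an assumption made purely for illustration, and any other per-connection load metric would fit the same selection loop.

    #include <stddef.h>
    #include <stdint.h>

    #define TP_PER_TPG 8   /* e.g. TP connection #0 to TP connection #7, as in fig. 5 */

    struct tp_conn {
        uint32_t outstanding;   /* assumed load metric: unacknowledged packets */
    };

    struct tpg {
        struct tp_conn tp[TP_PER_TPG];
    };

    /* Pick the TP connection with the lightest load in the first TPG. */
    static size_t select_lightest_tp(const struct tpg *g)
    {
        size_t best = 0;
        for (size_t i = 1; i < TP_PER_TPG; i++)
            if (g->tp[i].outstanding < g->tp[best].outstanding)
                best = i;
        return best;
    }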
With reference to the first aspect, in certain implementations of the first aspect, if the first transaction is writing first data to the SSD of the second device, the method further includes: the transport layer of the first device receives a second message from the transport layer of the second device, where the second message is a message obtained by encapsulating a second transaction, the second transaction is a transaction processed by the transaction layer of the second device, and the second transaction is to acquire the first data from the first device; the transport layer of the first device strips the transport layer header (TPH) of the second message and the information preceding the TPH to obtain the second transaction, and determines a target virtual machine (VM) according to an entity identifier (EID) carried in the second transaction; the transport layer of the first device sends the second transaction to a remote command (RC) queue of the VM, so that the RC queue is scheduled and a third response message is generated in response to the second message; and the transport layer of the first device sends the third response message to the transport layer of the second device, where the third response message includes the first data to be written to the SSD of the second device.
With reference to the first aspect, in certain implementations of the first aspect, if the first transaction is to read second data from the SSD of the second device, the method further includes: the transport layer of the first device receives a third message from the transport layer of the second device, where the third message carries the second data; and the transport layer of the first device strips the transport layer header (TPH) of the third message and the information preceding the TPH to obtain the second data, and writes the second data into the memory of the first device.
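The decapsulation and dispatch on the first device described in the two preceding implementations can be sketched as follows; the header layout, the EID position inside the transaction, and the VM lookup table are all hypothetical and only illustrate the "strip the TPH, then route by EID" behaviour.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdio.h>

    struct tph { uint32_t psn; uint16_t tpg_id; uint16_t tp_id; };  /* assumed header  */
    struct vm  { uint16_t eid; };                                   /* target VM       */

    static struct vm vm_table[4];                    /* stand-in EID -> VM table */

    static struct vm *lookup_vm_by_eid(uint16_t eid)
    {
        for (size_t i = 0; i < 4; i++)
            if (vm_table[i].eid == eid)
                return &vm_table[i];
        return NULL;
    }

    static void enqueue_rc(struct vm *vm, const uint8_t *txn, size_t len)
    {
        /* Stand-in for scheduling the VM's RC queue with the transaction. */
        printf("VM %u: RC queue receives a %zu-byte transaction\n",
               (unsigned)vm->eid, len);
    }

    /* Strip the TPH and the information before it, read the EID carried in the
     * transaction, and hand the transaction to the RC queue of the matching VM. */
    static void dispatch_second_message(const uint8_t *msg, size_t msg_len)
    {
        const uint8_t *txn = msg + sizeof(struct tph);
        size_t txn_len = msg_len - sizeof(struct tph);

        uint16_t eid;
        memcpy(&eid, txn, sizeof(eid));              /* assumed EID position */
        struct vm *vm = lookup_vm_by_eid(eid);
        if (vm != NULL)
            enqueue_rc(vm, txn, txn_len);
    }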
In a second aspect, a method for accessing an SSD across a network is provided, applied to a second device. The method may be performed by the second device, or may be performed by a circuit configured in the second device; this is not limited in the present application. For ease of description, the following description takes the second device as the execution body. The second device includes a transaction layer and a transport layer, and the method for accessing an SSD across a network includes the following steps:
The transport layer of the second device receives a first message from the transport layer of the first device, where the first message is a message obtained by encapsulating a first transaction, and the first transaction is to read the SSD of the second device or to write to the SSD of the second device; the transport layer of the second device strips the transport layer header (TPH) of the first message and the information preceding the TPH to obtain the first transaction; and the transport layer of the second device sends the first transaction to the transaction layer of the second device.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the transport layer of the second device sends a first response message to the transport layer of the first device, where the first response message is used to indicate whether the SSD controller of the second device successfully receives the first transaction; and after the second device successfully receives the first transaction, the transport layer of the second device sends a second response message to the transport layer of the first device, where the second response message is used to indicate that the SSD controller of the second device has successfully executed the first transaction.
With reference to the second aspect, in certain implementation manners of the second aspect, if the first response message is a transaction negative acknowledgement TANAK, indicating that the SSD controller of the second device did not successfully receive the first transaction, the method further includes: the transport layer of the second device re-receives the first message from the transport layer of the first device.
With reference to the second aspect, in certain implementations of the second aspect, if the first transaction is writing first data to the SSD of the second device, the method further includes: the transport layer of the second device receives a second transaction from the SSD controller of the second device, where the second transaction is to acquire the first data from the first device; the transport layer of the second device encapsulates the second transaction with a transport layer header (TPH) to obtain a second message; the transport layer of the second device sends the second message to the transport layer of the first device; and the transport layer of the second device receives, from the transport layer of the first device, a third response message in response to the second message, where the third response message includes the first data to be written to the SSD of the second device.
With reference to the second aspect, in some implementations of the second aspect, if the first transaction is to read second data from the SSD of the second device, the method further includes: the transport layer of the second device receives the second data from the SSD controller of the second device; the transport layer of the second device encapsulates the second data with a transport layer header (TPH) to obtain a third message; and the transport layer of the second device sends the third message to the transport layer of the first device.
For the technical effects of the method according to the second aspect and its possible designs, refer to the technical effects of the first aspect and its possible designs.
In a third aspect, there is provided an apparatus for accessing an SSD across a network, the apparatus comprising: the storage module is used for storing programs; and the processing module is used for executing the program stored in the storage module, and when the program stored in the storage module is executed, the processing module is used for executing the method provided by each aspect.
In a fourth aspect, there is provided a computer readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the methods provided in the above aspects.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method provided in the above aspects.
In a sixth aspect, a chip is provided, the chip comprising a processing module and a communication interface, the processing module reading instructions stored on a memory via the communication interface for performing the methods provided in the above aspects.
Optionally, as an implementation manner, the chip may further include a storage module, where the storage module stores instructions, and the processing module is configured to execute the instructions stored on the storage module, where the instructions, when executed, are configured to perform the method provided in the foregoing aspects.
In a seventh aspect, there is provided a chip comprising a first device for performing the method provided in the first aspect and a second device for performing the method provided in the second aspect.
In an eighth aspect, there is provided a computer device comprising a chip as shown in the seventh aspect. For example, computer devices include, but are not limited to, switches or servers in a data center.
In a ninth aspect, there is provided a terminal device comprising the chip shown in the seventh aspect. For example, the terminal device includes, but is not limited to, a mobile phone, a vehicle, and the like.
In a tenth aspect, there is provided a system for accessing an SSD across a network, the system comprising a first device for performing the method provided in the first aspect and a second device for performing the method provided in the second aspect.
Drawings
Fig. 1 (a) is a schematic structural diagram of a computer device according to an embodiment of the present application.
Fig. 1 (b) is a schematic diagram of a data center according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a generic bus protocol message format according to an embodiment of the present application.
FIG. 3 is a schematic diagram of an NVMe over Fabric architecture.
Fig. 4 is a schematic diagram of a cross-network access SSD according to an embodiment of the application.
Fig. 5 is a schematic structural diagram of a host according to an embodiment of the present application.
Fig. 6 is an exemplary flow chart for accessing SSDs across a network.
Fig. 7 is an exemplary flow chart of accessing SSDs across a network provided by an embodiment of the application.
Fig. 8 shows a schematic structural diagram of an apparatus 800 for accessing an SSD across a network according to an embodiment of the application.
Fig. 9 shows a schematic structural diagram of a chip system 900 according to an embodiment of the present application.
Fig. 10 schematically shows a conceptual partial view of a computer program product provided by an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
In many applications, the computer device deploying the application needs to access data to implement the functionality of the application. For example, a computer device deploying a database application may need to have a large amount of data access to update data in the database or, in response to a data query request, return a query result to the user. As another example, a computer device deploying a network (web) application may require extensive data access to return requested content to a user.
The computer device may be a server, a switch, or a terminal. Terminals include, but are not limited to, user devices such as desktop computers, notebook computers, smart phones, and the like. For ease of understanding, the structure of the computer device is described below.
Referring to the schematic structure of the computer device shown in fig. 1 (a), the computer device includes a processor 101, an input/output device (IO device) 102, a memory 103, a cache 104, a memory management unit (MMU) 105, an input/output memory management unit (IOMMU) 106, an external memory 107 (e.g., an SSD, a network card, etc.), and a bus 108.
The processor 101 includes at least one core. A core is also referred to as a compute engine, and each core can independently execute a task. When the processor 101 includes multiple cores, tasks from an application can be partitioned so that the application can leverage the multiple cores to perform more tasks in a given time. In this embodiment, the processor 101 may be a main processor, such as a central processing unit (CPU).
Input-output device 102 refers to a hardware device having the capability to input data and/or output data. The input-output devices 102 may be divided into input devices and output devices. The input device may include a mouse, a keyboard, a joystick, a stylus, a microphone, and the like, and the output device may include a display, a speaker, and the like.
The memory 103 is also referred to as an internal memory or a main memory, and is used for temporarily storing operation data in the processor 101. Further, the memory 103 is also used for temporarily storing data exchanged with the external memory 107. The memory 103 may be implemented using a storage medium such as dynamic random access memory (DRAM) or static random access memory (SRAM).
The cache 104 (in this embodiment, a processor cache, such as a CPU cache) is a means for reducing the average time required for the processor 101 to access the memory 103. Referring to fig. 1 (a), in the pyramid-type storage system, the cache 104 is located at the top-down second level, next to the registers (not shown in fig. 1 (a)) of the processor 101, higher than the memory 103 (the memory 103 is located at the top-down third level). Typically, the capacity of the cache 104 is much smaller than the memory 103, but the access speed may be close to the frequency of the processor 101.
The memory management unit 105 is a type of computer hardware for handling data access requests. The memory management unit 105 is specifically configured to map the virtual address (VA) in a data access request. The memory management unit 105 may intercept a data access request sent by a core of the processor 101, and map (or translate) the virtual address in the data access request to a physical address (PA), so that the memory 103 is accessed according to the physical address.
The input/output memory management unit 106 is essentially also a memory management unit. Similar to the memory management unit 105 mapping virtual addresses visible to the processor 101 to physical addresses, the input/output memory management unit 106 is configured to map virtual addresses visible to the input/output device 102 (which may also be referred to as device addresses or IO addresses) to physical addresses.
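As a toy illustration of the mapping performed by the memory management unit 105 and the input/output memory management unit 106, the following C fragment translates a virtual address to a physical address with a single-level page table; real MMUs use multi-level page tables and hardware TLBs, so this is a simplification.

    #include <stdint.h>

    #define PAGE_SHIFT 12u                 /* 4 KiB pages                    */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  1024u               /* toy single-level page table    */

    static uint64_t page_table[NUM_PAGES]; /* virtual page -> physical frame */

    /* Translate a virtual address (VA) to a physical address (PA):
     * PA = (frame of the virtual page << PAGE_SHIFT) | page offset. */
    static uint64_t translate(uint64_t va)
    {
        uint64_t vpn    = (va >> PAGE_SHIFT) % NUM_PAGES;
        uint64_t offset = va & (PAGE_SIZE - 1u);
        return (page_table[vpn] << PAGE_SHIFT) | offset;
    }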
The external memory 107, also referred to as secondary or auxiliary storage, is typically used to persist data. For example, the external memory 107 may persist operation data from the processor 101. Even if the power supply is abnormal, the data written into the external memory 107 is still retained, which avoids data loss. Specifically, the external memory 107 includes at least one non-volatile memory 1071, and when the external memory includes a plurality of non-volatile memories, the plurality of non-volatile memories may be of the same type or of different types. For example, in the example of fig. 1 (a), the external memory 107 may include two types of non-volatile memory, such as storage class memory (SCM) and solid state disk (SSD).
Bus 108 is used to connect the various functional components of the computer device. Bus 108 is a common communications backbone that carries information among the various functional components of the computer device. Bus 108 may be a transmission harness formed of wires. The bus 108 may be further divided into an internal bus and an external bus according to the connection object.
The internal bus uses an internal bus protocol to transfer information. The internal bus protocol includes a bus protocol for accessing the memory space of the computer device. The external bus uses an external bus protocol to transfer information. The external bus protocol includes a bus protocol for accessing the external memory space of the computer device. The memory space refers to the address space of the memory, and the external memory space refers to the address space of the external memory.
In some embodiments, internal bus protocols include, but are not limited to, the peripheral component interconnect (PCI) bus protocol, the peripheral component interconnect express (PCIe) protocol, the quick path interconnect (QPI) protocol, and the universal bus (UB) protocol. External bus protocols include, but are not limited to, the small computer system interface (SCSI) protocol and the serial attached SCSI (SAS) protocol.
The computer device shown in fig. 1 (a) is illustrated by taking the external memory 107 as a remote external memory as an example. As shown in fig. 1 (a), the external memory 107 includes a network card 1072. The network card 1072 may be, for example, an intelligent network interface card (i.e., a network adapter). The external memory 107 is connected to a network via the network card 1072, and is further connected to the other components of the computer device via the network. The network may be a wired communication network, such as a fiber optic communication network, or a wireless communication network, such as a wireless local area network (WLAN) or a fifth generation (5G) mobile communication network.
In some possible implementations, the external memory 107 of the computer device may also be a local external memory, to which other components of the computer device, such as the processor 101, may be coupled via the bus 108. In other possible implementations, the computer device may include both remote and local external memory. In addition, the embodiments of the present application are applicable to centralized storage or distributed storage scenarios, which is not limited in this embodiment.
The present application relates generally to accessing SSDs across networks, and can illustratively be applied to server clusters that require communication across a network, such as the data center shown in fig. 1 (b). The internal structure of a switch or server shown in fig. 1 (b) is as shown in fig. 1 (a) above; the computer devices referred to hereinafter include, but are not limited to, switches or servers in a data center.
In addition, the SSD controller according to the present application supports a universal bus (UB) protocol. The universal bus protocol may also be called a smart bus or a unified bus; the bus protocol standard is not limited by the name "universal bus".
The universal bus protocol breaks through the barriers between various existing protocols and removes unnecessary conversion overhead in the middle, thereby achieving extremely low latency. The universal bus protocol defines separate transaction and transport layers: connections exist between transport layers, there are no connections between transaction layers, and all transactions within a Host are carried on one transport layer. The transport layer is responsible for retransmission after network packet loss and guarantees reliable transmission, while the transaction layer processes the different transactions. On reception, the transport layer receives a packet from the network, strips the transport layer header, and forwards the payload to the transaction layer.
The universal bus protocol message format is shown in fig. 2. Specifically, the fields in the universal bus protocol message format are defined as shown in table 1 below:
TABLE 1
Specifically, the interface between the transaction layer of the universal bus protocol and the application is called a Jetty: a message of the application can be sent to any destination through a Jetty, and can also be received from any source through a Jetty. A send-only Jetty is defined as a Jetty For Send (JFS); a receive-only Jetty is defined as a Jetty For Receive (JFR).
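For illustration, the Jetty, JFS, and JFR interfaces might be exposed to an application roughly as in the prototypes below; these declarations are invented for this sketch and are not the actual UB programming interface.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t ub_addr_t;   /* assumed peer identifier */

    struct jetty;   /* bidirectional: can send to and receive from any peer */
    struct jfs;     /* Jetty For Send: send-only                            */
    struct jfr;     /* Jetty For Receive: receive-only                      */

    /* Hypothetical interface sketch, not the real UB API. */
    int jetty_send(struct jetty *j, ub_addr_t dst, const void *msg, size_t len);
    int jetty_recv(struct jetty *j, ub_addr_t *src, void *buf, size_t cap);
    int jfs_send(struct jfs *s, ub_addr_t dst, const void *msg, size_t len);
    int jfr_recv(struct jfr *r, ub_addr_t *src, void *buf, size_t cap);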
As can be seen from the description of the external memory 107 in fig. 1, the external memory 107 may be an SSD. With the rapid development of SSDs, data access to an SSD is roughly 100 times faster than to a disk, and the throughput is roughly 100 times greater. Meanwhile, the increase in network bandwidth has given new vitality to IP storage networks and to the development of distributed technologies. SSD pooling allows an SSD to be shared by multiple computing nodes, which effectively improves resource utilization. However, inserting SSDs directly into existing storage systems has one drawback: it cannot fully exploit the performance potential of the underlying technology. Truly exploiting the potential of SSD devices requires re-examining the way the storage system is connected to the server. Storage providers have devised a variety of approaches for SSD-based storage, the most interesting of which is connecting directly over the PCIe bus. After several proprietary devices had been built, the storage and server industry jointly created NVMe in 2011.
NVMe is a protocol, not a form factor or interface specification. Unlike other storage protocols, NVMe treats SSD devices as memory rather than hard disk drives. The NVMe protocol was originally designed for use with the PCIe interface, and is therefore connected almost directly to the CPU and memory subsystem of the server.
The NVMe protocol is not limited to connecting local flash drives within a server; it can also be used over a network. When used in a network environment, the network "fabric" supports any connection between storage and server elements. NVMe over Fabric supports creating an ultra-high-performance storage network with latency comparable to direct-attached storage, so that flash memory devices can be shared as needed between servers. NVMe over Fabric can be regarded as a replacement for the Fibre Channel-based Small Computer System Interface (SCSI) or the Internet Small Computer System Interface (iSCSI), with the advantages of lower latency, higher I/O rates, and better productivity. For ease of understanding, the architecture of NVMe over Fabric is briefly described with reference to fig. 3.
FIG. 3 is a schematic diagram of an NVMe over Fabric architecture. Host 100, Target 200, and Target 210 are included in fig. 3. The Host 100 is a host, mainly responsible for initiating data reads and writes, for example, sending data read and write commands. Target 200 and Target 210 are target storage devices, also referred to as NVM subsystems in the NVMe protocol, mainly responsible for receiving and executing the data read and write commands sent by the Host 100. The specific form of the Host 100 includes, but is not limited to, a physical server or a virtual machine on a physical server, where the physical server may be a computer device including components such as a CPU, a memory, and a network card. For detailed structural diagrams of Host 100, Target 200, and Target 210 in fig. 3, refer to the description of the computer device in fig. 1; details are not repeated here.
Target 200 may be a separate physical hard disk system. As shown in fig. 3, Target 200 includes a network card 201 and one or more hard disks, and the network card 201 is connected to each of the hard disks. Fig. 3 takes three hard disks as an example; in a specific implementation, Target 200 may include more than one hard disk. A hard disk in Target 200 may be a storage medium having a storage function, such as a solid state disk (SSD) or a hard disk drive (HDD). The network card 201 has the function of a network interface card, which may be a remote network interface card (RNIC) in NVMe over Fabric, and the network card 201 communicates with the Host 100 through the fabric for data read/write commands and data transmission.
Target210 is similar in structure to Target 200, including network card 211 and one or more hard disks. The functions and implementations of the constituent elements (network card 211, hard disk, etc.) in Target210 are similar to those of the constituent elements (network card 201, hard disk, etc.) in Target 200. In a specific implementation, there may be multiple targets, and fig. 3 illustrates only two targets (Target 200 and Target 210) as an example.
Taking fig. 3 as an example in which the Host 100 needs to store data to the Target 200, the process in which the Host 100 sends data and the Target receives data is described below, including:
Step one: when the Host 100 needs to store data to the Target 200, the Host 100 sends a Write Command, which usually carries the data to be stored. If the amount of data to be stored is large and the data cannot be carried and sent in the Write Command itself (for example, the amount of data to be stored exceeds the maximum amount of data the Write Command can carry), the Host 100 carries a scatter gather list (SGL) in the Write Command. The SGL includes a field, which may be, for example, an entry, and the entry includes information such as the source address of the data to be stored in the Host 100, the length of the data to be stored, and the destination address of the data to be stored in the Target 200. It should be noted that the SGL may also include a plurality of fields, for example, a plurality of entries, where each entry includes information such as the source address of the data to be stored in the Host 100, the length of the data to be stored, and the destination address of the data to be stored in the Target 200. When the data to be stored spans a plurality of address ranges, that is, the data to be stored is not contiguous in the Host 100 and exists in a plurality of address ranges, a plurality of entries are required to record the data in those address ranges.
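The SGL described in step one can be pictured as an array of entries, one per contiguous address range; the C field names below are assumptions and simply mirror the three pieces of information listed in the text (source address in the Host, data length, destination address in the Target).

    #include <stdint.h>
    #include <stddef.h>

    /* One SGL entry: where the data lives in the Host, how long it is,
     * and where it should land in the Target. */
    struct sgl_entry {
        uint64_t host_src_addr;
        uint32_t length;
        uint64_t target_dst_addr;
    };

    /* Data that is not contiguous in the Host simply uses one entry per
     * address range. */
    struct sgl {
        size_t           num_entries;
        struct sgl_entry entry[8];   /* toy fixed-size list */
    };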
Step two: host 100 sends a Write Command to Target 200 via the network card; the Write Command includes an SGL.
Optionally, the data to be stored may be more than one data block, and since the length of the data block is fixed, the length of the data to be written may be recorded by the number of data blocks.
Step three: after the network card 201 in the Target 200 receives the Write Command, the length of the data to be stored carried in the Write Command is obtained, and a corresponding storage space is allocated in a network card memory (not shown in fig. 3) of the network card 201, that is, a storage space with the same length as the data to be stored carried in the Write Command is allocated in the network card memory of the network card 201, so as to be used for caching the data to be stored sent by the Host 100.
Step four: after the network card 201 has allocated the storage space required for caching the data, it notifies the network card of the Host 100 by means of an RDMA command to transmit the data to be stored in the RDMA manner. That is, the network card of the Host 100 is instructed to read the data to be stored according to the source address of the data to be stored in the Host 100, and the network card 201 receives, over the network, the data to be stored transmitted by the network card of the Host 100 and caches the received data in the allocated storage space of the network card memory.
The data to be stored cached in the network card memory is migrated to the hard disk in Target 200.
Specifically, in the NVMe over Fabric architecture, NVMe can be mapped to various physical network transmission channels, and network protocols such as RDMA and the like can be combined with NVMe to realize cross-network access to SSD.
For ease of understanding, a scenario in which a host accesses an SSD disk (or SSD pool, double Data Rate (DDR)) across a network is briefly described with reference to fig. 4, fig. 4 is a schematic diagram of a cross-network access provided by an embodiment of the application.
Specifically, the host accessing an SSD disk across a network in the present application may be host A in fig. 4 accessing the SSD disk and/or DDR in host B across the network, or may be host C in fig. 4 accessing the SSD disk and/or DDR in host B across the network.
A server (or other host) communicates with the NVMe storage either directly through the network fabric or indirectly through a controller. If the storage solution uses a controller, the controller communicates with its own storage targets, which may use NVMe over Fabric or other proprietary or non-proprietary solutions, depending on the implementation and the choice of the storage vendor.
Fig. 5 is a schematic structural diagram of a host according to an embodiment of the present application. The host (host A or host B as shown in fig. 5) may be used in an application scenario in which an SSD is accessed across a network. As shown in fig. 5, host A includes a plurality of virtual machines (VMs), and one VM includes a plurality of processes (or applications) and a plurality of remote command (RC) tables for receiving remote read commands. One process contains several communication interfaces: bidirectional communication interfaces (e.g., the Jetty shown in fig. 5), send-only communication interfaces (e.g., the JFS shown in fig. 5), and receive-only communication interfaces (e.g., the JFR shown in fig. 5). A Jetty is bidirectional and can both receive and send; a JFS is unidirectional and can only send; a JFR is unidirectional and can only receive.
Jetty, JFS, JFR, and RC each have their own context (CXT). Jetty, JFS, JFR, and RC belong to the transaction layer. Several transport layer (TP) connections (e.g., TP connection #0 to TP connection #7 shown in fig. 5, eight TP connections in total) are established between two hosts, and these eight TP connections may form a transport group (TPG); all traffic between the two hosts passes through this TPG. The eight TP connections can be distributed over different physical ports, and traffic is balanced across them, thereby achieving multi-port and multi-path transmission. The TP connections and the TPG belong to the transport layer. The two hosts communicate through a network, and the network may lose packets; a TP connection is responsible for retransmission after network packet loss, guaranteeing end-to-end reliability. A TP connection is also responsible for end-to-end congestion control.
Host B in fig. 5 includes several VMs and several SSD controllers. The VM and SSD controller belong to the transaction layer.
The scenarios to which the present application can be applied and the related internal logic units of the host have been briefly described above with reference to fig. 1 to 5. To facilitate understanding of the embodiments of the present application, some basic concepts related to the present application are briefly described below.
1. Network card: which may also be referred to as a network interface controller (network interface controller, NIC), network adapter, or local area network receiver, is a type of computer hardware designed to allow a host or computer device to communicate over a network.
2. Packet sequence number (PSN): when the transport layer on the sending side transmits packets, each packet carries a PSN, and the PSN increases monotonically. When the receiving side receives a packet, it returns a TPACK (carrying the PSN of the received packet) to inform the sending side that the transport layer packet was received correctly. If the receiving side receives a packet and finds that a packet with a smaller PSN has not been received, it determines that the packet with the smaller PSN was lost in the network and returns a TPSACK (carrying the PSN of the received packet and the PSN of the lost packet); the transport layer on the sending side receives the TPSACK and retransmits the lost packet.
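The receive-side PSN bookkeeping can be sketched as below; TPACK and TPSACK are modelled as simple print statements, and the loss detection is a simplification of real selective-acknowledgement logic.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t expected_psn;   /* next PSN the receiver expects */

    static void send_tpack(uint32_t psn)
    {
        printf("TPACK  psn=%u\n", (unsigned)psn);
    }

    static void send_tpsack(uint32_t received, uint32_t lost)
    {
        /* Carries both the received PSN and the missing PSN so the sender
         * retransmits only the lost packet. */
        printf("TPSACK received=%u lost=%u\n", (unsigned)received, (unsigned)lost);
    }

    /* Called for every packet arriving at the receiving transport layer. */
    static void on_packet(uint32_t psn)
    {
        if (psn == expected_psn) {
            send_tpack(psn);                 /* in order: acknowledge        */
            expected_psn = psn + 1;
        } else if (psn > expected_psn) {
            send_tpsack(psn, expected_psn);  /* a smaller PSN never arrived  */
        }
        /* psn < expected_psn: duplicate, ignored in this toy model */
    }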
3. Slice sequence number (segment sequence number, SSN): a transaction layer message may be relatively large, e.g., 16 MB. In the UB protocol, multiple transaction layers share one transport layer. To prevent a single transaction layer message from occupying a transport layer connection for a long time, when the transaction layer sends a message to the transport layer, the message is cut into a plurality of slices, for example, 64 KB per slice, and the transaction layer sends only one slice to the transport layer at a time.
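Slicing a large transaction layer message before handing it to the transport layer might look as follows; the 64 KB slice size follows the example in the text, and everything else is illustrative.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define SLICE_SIZE (64u * 1024u)   /* 64 KB per slice, as in the example */

    /* Stand-in for handing one slice to the transport layer. */
    static void send_slice(uint32_t ssn, const uint8_t *data, size_t len)
    {
        (void)data;   /* payload not used in this stub */
        printf("slice ssn=%u len=%zu\n", (unsigned)ssn, len);
    }

    /* Cut a (possibly very large, e.g. 16 MB) transaction layer message into
     * slices so that one transaction cannot monopolise the shared transport
     * layer connection. */
    static void slice_message(const uint8_t *msg, size_t msg_len)
    {
        uint32_t ssn = 0;
        for (size_t off = 0; off < msg_len; off += SLICE_SIZE, ssn++) {
            size_t len = msg_len - off;
            if (len > SLICE_SIZE)
                len = SLICE_SIZE;
            send_slice(ssn, msg + off, len);
        }
    }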
4. Transaction acknowledgement (TAACK): after the receiving side receives a segment (a segment is split into multiple packets at the transport layer) and executes it correctly (e.g., reads or writes memory correctly), it returns a TAACK to inform the sending side that the slice has been executed correctly; or, when the receiving side successfully receives a message from the sending side, it returns a TAACK to inform the sending side that the message has been received successfully.
5. Transaction negative acknowledgement (Transaction No OK ACK, TANAK): when the receiving side receives a segment and an error occurs during execution (e.g., a page fault occurs when reading or writing memory), it returns a TANAK to inform the transaction layer of the sending side to retransmit the slice; or, when the receiving side does not successfully receive a message from the sending side, it returns a TANAK to inform the transaction layer of the sending side to retransmit the message.
6. Page fault: a page fault may occur when memory is read or written using a virtual address.
7. NVMe: a non-volatile memory standard; a protocol standard that runs over the PCIe interface.
8. RDMA: remote direct memory access is a technology in which the transport layer protocol is implemented in hardware and a memory or message primitive interface is exposed to user space, achieving high network throughput and low latency by bypassing the CPU and the kernel network protocol stack. Today, large-scale RDMA deployments mainly use InfiniBand and RoCE; the former is mainly used in the high-performance computing field, and the latter in Internet company data centers.
In addition, in order to facilitate understanding of the embodiments of the present application, the following description is made.
First, the term "at least one" as used herein means one or more, and the term "plurality" means two or more. In addition, in the embodiments of the present application, "first", "second", and various numerical numbers (e.g., "#1", "#2", etc.) are merely for convenience of description and are not intended to limit the scope of the embodiments of the present application. The following sequence numbers of the processes do not mean the order of execution, which should be determined by the functions and internal logic thereof, but should not constitute any limitation on the implementation process of the embodiments of the present application, and it should be understood that the objects thus described may be interchanged where appropriate so as to be able to describe schemes other than the embodiments of the present application. In addition, in the embodiment of the present application, the words "S710" and the like are merely marks for convenience of description, and do not limit the order of executing the steps.
Second, in embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Third, references to "save" in embodiments of the present application may refer to saving in one or more memories. The one or more memories may be provided separately or may be integrated in an encoder or decoder, processor, or communication device. The one or more memories may also be provided separately in part and integrated in the decoder, processor, or communication device. The type of memory may be any form of storage medium, and the application is not limited in this regard.
Fourth, references to "comprising" and/or "includes" in embodiments of the present application when used in this specification specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fifth, references to "if" in embodiments of the present application may be interpreted as meaning "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if a [stated condition or event] is detected" may be interpreted to mean "upon determining" or "in response to determining" or "upon detecting the [stated condition or event]" or "in response to detecting the [stated condition or event]", depending on the context.
Sixth, the terminology used in the description of the various examples in the embodiments of the application is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Seventh, the term "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
The above briefly describes, with reference to fig. 1 to 5, the scenarios to which the method for accessing an SSD across a network provided by the present application is applicable, as well as some basic concepts related to the present application. RDMA was introduced among the basic concepts, and large-scale RDMA deployments mainly use InfiniBand and RoCE. The NVMe over RoCE technology is briefly introduced below with reference to fig. 6. RoCE is an Ethernet-based RDMA technology; NVMe is carried over RoCE, enabling access to SSDs across the network.
Fig. 6 is an exemplary flow chart of accessing an SSD across a network for use in a scenario in which a first device accesses an SSD of a second device across the network.
The first device is a computer device for sending a data transmission instruction in the data transmission process. The first device may be referred to as an initiator device or a source device. The first device may be provided as a host in NVMe protocol, NOF protocol, RDMA protocol, PCI protocol, PCIe protocol. The first device may be provided as a storage client. The first device may be a server, a personal computer, a notebook computer, a terminal, etc. The first device may be a physical device, or the first device may be a virtual machine or container running on the physical device.
The second device is a computer device which receives a data transmission instruction in the data transmission process. The second device may be referred to as a destination (target) device or a receiving device. The second device may be provided as a host (host) in NVMe protocol, network interconnect NVMe (NVMe over Fabrics, NOF) protocol, RDMA protocol, PCI protocol, and PCIe protocol. The second device may be provided as a storage server. The second device may be used to perform Input Output (IO) processing tasks. The second device may be a server, a personal computer, a notebook computer, a terminal, etc. The second device may be a physical device, or the second device may be a virtual machine or container running on the physical device.
It should be understood that the specific form of the first device and the second device in this embodiment is not limited, and may be a computer device supporting NVMe over RoCE technology.
Specifically, the first device accessing the SSD in the second device includes the steps of:
s610, the first device sends an RDMA message to the second device.
In this embodiment, the data transfer instructions may be transferred by way of RDMA, and the data transfer instructions may be carried in RDMA messages. The data transmission instruction is used for indicating transmission data. The data transfer instruction may be an IO instruction.
Optionally, the data transmission instruction may be an IO instruction in an NVMe protocol or a NOF protocol. The data transfer instruction may be, for example, NVME WRITE instructions and NVMe read instructions. The data transfer instruction may include at least one of a write (write) instruction, a read (read) instruction, a send (send) instruction, and a receive (receive) instruction.
Wherein the write instruction may instruct writing data to the memory of the second device to store data through the memory of the second device. The read instruction may be for instructing reading of data from the memory of the second device for retrieval of data by the memory of the second device.
For example, a first device (i.e., initiator) translates a write SSD message into a Send message of the RDMA protocol, and sends the Send message to a second device (i.e., target). The Send message carries the address and length of the data to be written in the Initiator. The following description will be given by taking the example of the first device sending the Send message to the second device for convenience of description.
Further, after the second device receives the RDMA message, the second device parses the RDMA message, and the method flow shown in fig. 6 further includes:
S620, the second device analyzes the Send message.
The second device may receive the Send message of the first device, and parse the Send message to obtain a data transmission instruction carried by the Send message. For example, the second device may receive the RDMA WRITE message of the first device, parse the RDMA WRITE message, and obtain the NVMe instruction carried by the RDMA WRITE message. For another example, the second device may receive an RDMA send message of the first device, parse the RDMA send message, and obtain NVMe instructions carried by the RDMA send message.
Optionally, the second device may include a memory, and the second device may process the data transmission instruction through the memory. Alternatively, the second device may control the memory to complete the processing of the data transmission instruction by writing the content carried in the RDMA message to at least one resource of the memory. For example, after the Send message is received, a completion queue entry (CQE) is generated, the RDMA protocol software is woken up, and the command is handed over to NVMe, the upper-layer application, which passes it to the SSD controller of the second device.
S630, the second device sends a Read command to the first device.
Further, the SSD controller of the second device issues a 'Read' command to the memory of the second device.
Specifically, the second device first applies for memory in the second device and registers a memory region (MR) for receiving the Read Response. Then, the second device issues a Read command to the first device, carrying the address and length of the data to be written in the first device.
S640, the first device executes the read command.
Specifically, the first device executes 'Read', reads 'data to be written' from the memory of the first device, composes a Read Response (Read Response), and sends the Read Response to the second device.
S650, the first device sends a read response to the second device.
The second device receives the 'Read Response', writes it into the previously applied memory of the second device, generates a CQE, wakes up the RDMA protocol software, and hands it over to NVMe, the upper-layer application, which notifies the SSD controller of the second device.
S660, the second device writes the data.
Specifically, the SSD controller of the second device reads the data from the memory of the second device and writes the data to the SSD.
In the access flow shown in fig. 6, the Initiator side may access the SSD of the Target side across the network, but the following problems exist in the cross-network access:
1) Multiple conversions between the RDMA protocol and the NVMe protocol increase overhead and delay. For example, after the Target receives the 'Send' message from the Initiator, the RDMA protocol software needs to be woken up to convert from the RDMA protocol to the NVMe protocol; before the Target sends the 'Read' command to the Initiator, a conversion from the NVMe protocol to the RDMA protocol is needed; and after the Target receives the 'Read Response', the RDMA protocol software again needs to be woken up to convert from the RDMA protocol to the NVMe protocol.
2) The Target needs to apply for and register a Memory Region (MR) to receive the Read Response, which occupies memory resources and memory read/write bandwidth.
To solve the problems of high delay and occupation of memory resources and memory read/write bandwidth in conventional cross-network SSD access, the present application provides a method for accessing an SSD across a network, which addresses the problems of the flow shown in fig. 6 by adopting the UB protocol. The method for accessing an SSD across a network provided by the present application is described in detail below with reference to the accompanying drawings.
It should be understood that the method for accessing an SSD across a network provided by the embodiment of the application can be applied to a computer system, for example, the cross-network access system shown in fig. 3.
It should also be understood that the embodiments shown below do not particularly limit the specific structure of the execution body of the method provided by the embodiments of the present application, as long as the method can be implemented by running a program in which code of the method is recorded. For example, the execution body of the method provided by the embodiment of the application may be a device, or a functional module in the device that can call and execute a program.
Fig. 7 is a schematic flowchart of a method for accessing an SSD across a network provided by the present application. The method can be applied to a scenario in which a first device accesses a second device across a network (for example, the first device accesses, across the network, the SSD medium of the second device or another long-latency storage medium). The first device includes a transport layer and a transaction layer, the second device also includes a transport layer and a transaction layer, and the SSD controller of the second device supports the universal bus protocol.
Specifically, the method for accessing SSD across networks comprises the following steps:
S710, the transaction layer of the first device sends the first transaction to the transport layer of the first device, or the transport layer of the first device receives the first transaction from the transaction layer of the first device.
The first transaction (or first command) is a transaction to be processed by the transaction layer of the first device. Transactions processed by the transaction layer include reading or writing an SSD. This embodiment does not limit the type of transaction processed by the transaction layer; the transaction may be reading or writing the SSD of the second device, or another type of transaction, which is not described in detail here. For ease of description, the following takes reading or writing the SSD as an example of the first transaction. For example, the first transaction received by the transport layer of the first device is reading or writing the SSD.
Illustratively, the transaction layer of the first device sends the first transaction to the transport layer of the first device, including but not limited to the following sending means:
The application of the first device issues a first transaction (e.g., a write-data command, abbreviated as a 'write' command) to the JFS of the first device, which issues the first transaction to the transport layer of the first device. Further, in this embodiment, after the transport layer of the first device receives the first transaction, the first transaction may be encapsulated based on the universal bus protocol to obtain a first message, and the method flow shown in fig. 7 further includes:
S720, the transport layer of the first device encapsulates the first transaction to obtain a first message.
Specifically, the first message is used to request the SSD controller of the second device to execute the first transaction (e.g., read or write data), where the first message includes the address and length information of the data to be read, or the address and length information of the data to be written.
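For illustration, one possible in-memory representation of the transaction carried by the first message is sketched below. The structure, field names, and layout are assumptions made for this sketch; the universal bus protocol wire format is not reproduced here.

#include <stdint.h>

enum ssd_op { SSD_READ, SSD_WRITE };

/* Illustrative payload of the first transaction as carried in the first message. */
struct ssd_txn {
    enum ssd_op op;        /* read or write the remote SSD                   */
    uint8_t     eid[16];   /* 128-bit entity ID of the target SSD controller */
    uint64_t    addr;      /* address of the data to be read or written      */
    uint32_t    len;       /* length of the data in bytes                    */
};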
For example, if the first transaction is "write SSD", the first message may be understood as a "Send" message, which is to be noted that, in this embodiment, the first message is different from the first device sending an RDMA message to the second device in S610 in the cross-network access SSD flow shown in fig. 6, in that:
The first message in this embodiment is encapsulated based on the universal bus protocol rather than the RDMA protocol. As described in the basic concepts above, the universal bus protocol includes a transport layer and a transaction layer: the transport layer is responsible for retransmission after network packet loss to ensure reliable transmission, and the transaction layer processes different transactions. At the receiving end, the transport layer receives the packet from the network, strips off the transport layer header, and forwards the packet to the transaction layer. Accessing an SSD in the present application can be understood as one type of transaction. Therefore, the read or write SSD process based on the universal bus protocol requires no protocol conversion (e.g., it avoids the RDMA-to-NVMe conversion shown in fig. 6); in addition, the process does not require the receiving end to apply for memory as a staging area.
Illustratively, the transport layer of the first device sends the first message to the transport layer of the second device, including but not limited to the following:
The application of the first device issues a command for writing data (e.g., a 'write' command) to the JFS of the first device, the JFS of the first device issues the write command to a first TPG in the transport layer of the first device, the first TPG selects the TP with the lightest load, encapsulates the TPH for the write command to obtain a first message, and sends the first message to the transport layer of the second device through the network.
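A minimal sketch of the lightest-load TP selection just described follows; the TPG and TP bookkeeping structures are assumptions made for this sketch and are not defined by the text.

#include <stddef.h>
#include <stdint.h>

/* Illustrative bookkeeping; the real TPG/TP state is not specified here. */
struct tp  { uint32_t outstanding; /* messages currently in flight on this TP */ };
struct tpg { struct tp *tps; size_t n_tp; };

/* Select the TP connection with the lightest load in the first TPG. */
static struct tp *select_lightest_tp(struct tpg *g)
{
    struct tp *best = &g->tps[0];
    for (size_t i = 1; i < g->n_tp; i++)
        if (g->tps[i].outstanding < best->outstanding)
            best = &g->tps[i];
    return best;
}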
It should be appreciated that accessing an SSD in this embodiment may be understood as a type of transaction, as the universal bus protocol defines separate transaction and transport layers. There is a connection between the transport layers between the first device and the second device, and there is no connection between the transaction layers. All transactions within one host (e.g., the first device described above) are carried on one transport layer, and the SSD controller of the accessed host (e.g., the second device described above) supports the universal bus protocol, so that protocol conversion is not required to access the SSD based on the universal bus protocol.
S730, the transport layer of the first device sends the first message to the transport layer of the second device, or the transport layer of the second device receives the first message from the transport layer of the first device.
Specifically, in this embodiment, after the transport layer of the first device encapsulates the first transaction, the first message obtained by encapsulation is sent to the transport layer of the second device through the network. It should be appreciated that in this embodiment the first device and the second device are both computer devices supporting the universal bus protocol, and thus the second device also includes a transport layer and a transaction layer.
Further, in this embodiment, after the transport layer of the second device receives the first message, the first message may be parsed, and then the method flow shown in fig. 7 further includes:
S740, the transport layer of the second device parses the first message.
Specifically, after the transport layer of the second device receives the first message, it parses the first message. In this embodiment, parsing the first message by the transport layer of the second device includes: stripping the transport layer header from the first message to obtain the decapsulated first message, and sending the decapsulated first message (that is, the first transaction) to the SSD controller of the second device.
Illustratively, the transport layer of the second device may determine, by looking up a table with the EID carried in the first message, the SSD controller of the second device to which the message is to be forwarded. For example, the EID included in the first message is a 128-bit value that identifies a specific SSD controller.
In this embodiment, stripping the transport layer header by the transport layer of the second device includes: the transport layer of the second device strips the TPH of the first message and the part preceding the TPH, forwards the remainder to the SSD controller of the second device, and returns a reply TAACK to the first device.
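For illustration, the receive path just described might look like the following sketch. The table lookup and reply helpers are assumed placeholders, not part of the universal bus specification.

#include <stddef.h>
#include <stdint.h>

struct ssd_ctrl;                                                        /* opaque handle   */
struct ssd_ctrl *lookup_ctrl_by_eid(const uint8_t eid[16]);             /* assumed lookup  */
int  forward_to_ctrl(struct ssd_ctrl *c, const void *txn, size_t len);  /* assumed helper  */
void send_taack(void);                                                  /* assumed helper  */
void send_tanak(void);                                                  /* assumed helper  */

static void on_first_message(const uint8_t *pkt, size_t pkt_len,
                             size_t tph_end, const uint8_t eid[16])
{
    /* Everything up to and including the TPH is stripped; the remainder is the
     * first transaction to be handed to the SSD controller. */
    const void *txn = pkt + tph_end;
    size_t txn_len = pkt_len - tph_end;

    struct ssd_ctrl *c = lookup_ctrl_by_eid(eid);
    if (c != NULL && forward_to_ctrl(c, txn, txn_len) == 0)
        send_taack();   /* controller accepted the transaction  */
    else
        send_tanak();   /* e.g. the controller's buffer is full */
}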
As one possible implementation, if the buffer of the second device is full, the SSD controller of the second device cannot receive the first transaction and replies TANAK to the transport layer of the first device. It should be appreciated that providing a buffer is only one implementation, and there may be no buffer.
S750, the transport layer of the second device sends a first response message to the transport layer of the first device.
The first response message may be the TAACK or TANAK described above. Using the response message formats specified by the current protocol improves the backward compatibility of the scheme.
If the transport layer of the second device successfully receives the first message, parses the first message, and successfully sends the first transaction to the SSD controller of the second device, the first response message is TAACK;
If the SSD controller of the second device cannot receive the processed first message (i.e., the first message with the transport layer header stripped), the first response message is TANAK.
Specifically, after the transport layer of the first device receives the first response message, the first response message is sent to the JFS of the first device.
Optionally, the first response message is TAACK; when the JFS of the first device receives the TAACK, the JFS of the first device generates a CQE informing the application that the second device has correctly received the first transaction.
Optionally, the first response message is TANAK; when the JFS of the first device receives the TANAK, the JFS of the first device retransmits the first transaction, and the transport layer of the first device retransmits the first message, that is, steps S710 to S730 described above are repeated.
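A short sketch of how the JFS of the first device might react to the first response message follows; cqe_post and resend_first_message are placeholder names for this sketch only.

enum first_resp { RESP_TAACK, RESP_TANAK };

void cqe_post(void);              /* assumed: generate a CQE for the application */
void resend_first_message(void);  /* assumed: repeat steps S710 to S730          */

static void on_first_response(enum first_resp r)
{
    if (r == RESP_TAACK)
        cqe_post();               /* second device received the first transaction */
    else
        resend_first_message();   /* retransmit the first transaction and message */
}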
In this embodiment, the case in which the SSD controller of the second device successfully receives the processed first message is taken as an example, and the method flow shown in fig. 7 further includes:
S760, the transport layer of the second device sends the first transaction to the SSD controller of the second device.
Specifically, the first transaction is the message obtained by stripping, from the first message, the transport layer header and the information preceding the transport layer header. The first message encapsulated by the universal bus protocol further includes UBLINK, NPI, and IP information before the TPH, and the first transaction is the message obtained after the transport layer of the second device receives the first message and strips the TPH and the information preceding it.
For example, the first transaction is 'write SSD', and after receiving the first transaction, the SSD controller of the second device determines that the first device has data to be stored in the memory, and may learn the address and the length information of the data to be written based on the first transaction, so that the SSD controller of the second device parses the first transaction, and converts the write command of the first device into a read operation of the SSD controller.
For example, the first transaction is 'read SSD', and after receiving the first transaction, the SSD controller of the second device determines that the first device needs to read data from the SSD, and may learn address and length information of the data to be read based on the first transaction, so that the SSD controller of the second device parses the first transaction, and converts the read command of the first device into a write operation of the SSD controller.
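For illustration, the dispatch inside the SSD controller of the second device described in the two preceding paragraphs could be sketched as follows; the two helper functions are assumptions standing in for whatever the controller actually does.

#include <stdint.h>

enum remote_op { OP_WRITE_SSD, OP_READ_SSD };                /* illustrative */

void fetch_data_from_initiator(uint64_t addr, uint32_t len); /* assumed: pull data to store */
void return_data_to_initiator(uint64_t addr, uint32_t len);  /* assumed: push data back     */

static void handle_first_transaction(enum remote_op op, uint64_t addr, uint32_t len)
{
    if (op == OP_WRITE_SSD)
        fetch_data_from_initiator(addr, len);  /* remote write becomes a local read  */
    else
        return_data_to_initiator(addr, len);   /* remote read becomes a local write  */
}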
For ease of understanding, the following description is provided in connection with a specific example, in which the SSD controller of the second device performs a specific read or write operation procedure:
example one: the first transaction is writing the SSD, e.g., the first transaction is writing first data to the SSD of the second device.
Illustratively, in the case shown in this example one, two possible implementations are included:
Mode one: if the first transaction does not carry the first data of the SSD to be written into the second device, in the case of the first mode, the memory controller read operation of the second device includes the following steps:
S761, the SSD controller of the second device sends the second transaction to the transport layer of the second device.
The first transaction is writing SSD, and the second transaction is reading the first data.
S762, the transport layer of the second device encapsulates the second transaction to obtain a second message.
Specifically, the SSD controller of the second device issues a second transaction (e.g., a read-data command, abbreviated as a 'Read' command), and the transport layer of the second device encapsulates the TPH for the 'Read' command, then encapsulates the network layer and the LINK layer to obtain a second message, and sends the second message to the transport layer of the first device.
S763, the transport layer of the second device sends the second message to the transport layer of the first device.
Specifically, the transport layer of the second device in this embodiment sends the second message to the transport layer of the first device via the network. For example, the second message may be transmitted over the TP that received the first message.
S764, the transport layer of the first device parses the second message.
The transport layer of the first device receives the second message containing the second transaction, strips the TPH and the preceding part of the second message, obtains the destination VM by looking up a table with the EID in the second message, and transfers the remaining part to a Remote Command (RC) queue of the destination VM. The RC queue is scheduled, the 'Read' command is sent to the transport layer TPG of the first device, the TP reads the memory, and a third response message (e.g., a Read Response (RR) packet) is composed and sent to the transport layer of the second device via the network.
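A rough sketch of the memory read and Read Response composition just described is given below; rr_send is a placeholder for composing and sending the RR packet and is not an existing API.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int rr_send(const void *payload, size_t len);   /* assumed helper */

static int serve_remote_read(const uint8_t *local_mem, uint64_t offset, uint32_t len)
{
    uint8_t *payload = malloc(len);
    if (payload == NULL)
        return -1;
    /* Read the requested bytes from the first device's memory ...            */
    memcpy(payload, local_mem + offset, len);
    /* ... and send them to the second device as the third response message. */
    int rc = rr_send(payload, len);
    free(payload);
    return rc;
}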
S765, the transport layer of the first device sends the third response message to the transport layer of the second device.
The transport layer of the second device receives the third response message, where the third response message includes the first data to be written into the SSD. The transport layer of the second device obtains the destination SSD controller by looking up a table with the EID in the third response message, strips the TPH and the preceding part of the third response message, and forwards the remaining part to the SSD controller.
S766, the transport layer of the second device sends the fourth response message to the transport layer of the first device.
Specifically, the fourth response message is used to indicate whether the SSD of the second device successfully receives the third response message.
As one possible implementation, the transport layer of the second device successfully forwards the processed third response message to the SSD controller or DDR controller of the second device, and the fourth response message is TAACK.
In this implementation, the transport layer of the first device forwards the TAACK to the RC, which ends the task.
As another possible implementation, the transport layer of the second device fails to forward the processed third response message to the SSD controller of the second device, for example, because the SSD controller of the second device cannot receive the third response message when its buffer is full, and the fourth response message is TANAK.
In this implementation, the transport layer of the first device forwards TANAK to the RC, and the transport layer of the first device retransmits the third response message, i.e., repeats step S765 described above.
S767, the SSD controller of the second device performs writing the first data.
Mode two: if the first transaction carries the first data to be written into the SSD of the second device, in the case shown in mode two, the second device does not need to perform steps S761 to S766 shown in mode one, and the SSD controller of the second device directly performs step S767. Further, when the first data carried in the third response message or in the first transaction is successfully written into the SSD of the second device, the method flow shown in fig. 7 further includes:
S768, the SSD controller of the second device sends the third transaction to the transport layer of the second device.
The third transaction is used to indicate that the first data has been successfully written to the SSD of the second device. For example, on the second device side, after the data carried in the RR is successfully written into the SSD, the SSD controller sends a 'Send' message.
S769, the transport layer of the second device encapsulates the third transaction to obtain a second response message.
Specifically, the transport layer of the second device encapsulating the third transaction based on the UB protocol to obtain a second response message includes: the transport layer of the second device encapsulates the TPH for the third transaction, and then encapsulates the network layer and the LINK layer to obtain the second response message.
S7691, the transport layer of the second device sends the second response message to the transport layer of the first device, or the transport layer of the first device receives the second response message from the transport layer of the second device.
The transport layer of the first device receives the second response message containing the third transaction, strips the TPH and the preceding part of the second response message, obtains the destination VM by looking up a table with the EID in the second response message, and transfers the remaining part to the JFR of the destination VM. The JFR of the VM generates a CQE informing the application that the SSD write was successful, and the memory is freed.
Example two: the first transaction is to read the SSD, e.g., the first transaction is to read the second data in the SSD of the second device.
The SSD controller write operation of the second device includes the steps of:
S771, the SSD controller of the second device sends the second data to the transport layer of the second device.
S772, the transport layer of the second device encapsulates the second data to obtain a third message.
Specifically, the SSD controller of the second device sends out the second data, and the transport layer of the second device encapsulates the TPH for the second data, then encapsulates the network layer and the LINK layer to obtain a third message, and sends the third message to the transport layer of the first device.
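A minimal sketch of the encapsulation order just described follows: the second data is wrapped with a TPH first, and network- and link-layer headers are then added outside it, so that on the wire the TPH immediately precedes the payload. The header sizes below are illustrative assumptions only.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct link_hdr { uint8_t bytes[14]; };   /* assumed size */
struct net_hdr  { uint8_t bytes[40]; };   /* assumed size */
struct tph      { uint8_t bytes[16]; };   /* assumed size */

static size_t build_third_message(uint8_t *out,
                                  const struct link_hdr *l, const struct net_hdr *n,
                                  const struct tph *t,
                                  const void *second_data, size_t data_len)
{
    size_t off = 0;
    memcpy(out + off, l, sizeof *l); off += sizeof *l;          /* LINK layer    */
    memcpy(out + off, n, sizeof *n); off += sizeof *n;          /* network layer */
    memcpy(out + off, t, sizeof *t); off += sizeof *t;          /* TPH           */
    memcpy(out + off, second_data, data_len); off += data_len;  /* payload       */
    return off;                                                 /* total length  */
}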
S773, the transport layer of the second device sends a third message to the transport layer of the first device.
Specifically, the transport layer of the second device in this embodiment sends the third message to the transport layer of the first device via the network. For example, the third message may be transmitted over the TP that received the first message.
S774, the transport layer of the first device parses the third message.
The transport layer of the first device receives the third message containing the second data, strips the TPH and the preceding part of the third message to obtain the second data, and stores the second data.
It should be understood that, in example two, where the first transaction is reading the SSD, the first device may also indicate, through a response message, whether the third message is successfully received. For details, refer to the description of the fourth response message in example one, where the first transaction is writing the SSD; the details are not repeated here.
After the first transaction is completed, the transport layer of the first device may notify the application of completion of reading or writing through a notification message, and then the method flow shown in fig. 7 further includes:
S780, the transport layer of the first device sends a notification message to the transaction layer of the first device. Specifically, the notification message is used to notify that reading or writing of data is completed.
It should be understood that the specific example shown in fig. 7 in the embodiments of the present application is only for helping those skilled in the art to better understand the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application. It should be further understood that the sequence numbers of the above processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not be construed as limiting the implementation process of the embodiments of the present application.
It is also to be understood that in the various embodiments of the application, where no special description or logic conflict exists, the terms and/or descriptions between the various embodiments are consistent and may reference each other, and features of the various embodiments may be combined to form new embodiments in accordance with their inherent logic relationships.
The foregoing description of the solution provided by the embodiments of the present application has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The following describes in detail the device for accessing SSD across network according to the embodiment of the application with reference to fig. 8 to 10. It should be understood that the descriptions of the apparatus embodiments and the descriptions of the method embodiments correspond to each other, and thus, descriptions of details not shown may be referred to the above method embodiments, and for the sake of brevity, some parts of the descriptions are omitted.
The embodiment of the application can divide the function modules of the sending end device or the receiving end device according to the method example, for example, each function module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation. The following description will take an example of dividing each functional module into corresponding functions.
Fig. 8 shows a schematic structural diagram of an apparatus 800 for accessing an SSD across a network according to an embodiment of the application.
As an example, the apparatus 800 for accessing an SSD across a network may be applied to a first device, and the apparatus 800 for accessing an SSD across a network may be used to perform the method for accessing an SSD across a network described above, e.g. to perform the method shown in fig. 7. Wherein the device 800 for accessing SSD across a network includes a transaction layer and a transport layer. In particular, the apparatus 800 for accessing an SSD across a network may include a transceiving unit 810 and a processing unit 820.
The transceiving unit 810 is configured to receive a first transaction from the transaction layer of the first device, where the first transaction is reading or writing the SSD of a second device. The processing unit 820 is configured to encapsulate the transport layer header TPH for the first transaction to obtain a first message. The transceiving unit 810 is further configured to send the first message to the transport layer of the second device, where the first message is used to request the SSD controller of the second device to execute the first transaction, and the first transaction is obtained after the TPH and the information preceding the TPH are stripped from the first message.
As an example, in connection with fig. 7, the transceiving unit 810 may be used to perform S730, S750, S763, S765, S766, and S7691, and the processing unit 820 may be used to perform S720, S764, and S774.
It should be noted that the apparatus shown in fig. 8 may also be used to perform the method steps related to the embodiment modification shown in the aforementioned drawings, which are not described herein.
As another example, the apparatus 800 for accessing an SSD across a network may be applied to a second device, and the apparatus 800 for accessing an SSD across a network may be used to perform the method for accessing an SSD across a network described above, for example, to perform the method shown in fig. 7. Wherein the device 800 for accessing SSD across a network includes a transaction layer and a transport layer. In particular, the apparatus 800 for accessing an SSD across a network may include a transceiving unit 810 and a processing unit 820.
The transceiving unit 810 is configured to receive a first message from the transport layer of a first device, where the first message is a message obtained by encapsulating a first transaction, and the first transaction is reading or writing the SSD of the second device. The processing unit 820 is configured to strip the transport layer header TPH of the first message and the information preceding the TPH to obtain the first transaction.
As an example, in connection with fig. 7, the transceiving unit 810 may be used to perform S730, S750, S763, S765, S766, and S7691, and the processing unit 820 may be used to perform S740, S762, S767, and S769.
It should be noted that the apparatus shown in fig. 8 may also be used to perform the method steps related to the variations of the embodiments shown in the aforementioned drawings, which are not described herein.
The present application also provides a chip system 900, as shown in fig. 9, where the chip system 900 includes at least one processor and at least one interface circuit. By way of example, when the chip system 900 includes one processor and one interface circuit, the one processor may be the processor 910 shown by the solid line box (or the processor 910 shown by the broken line box) in fig. 9, and the one interface circuit may be the interface circuit 920 shown by the solid line box (or the interface circuit 920 shown by the broken line box) in fig. 9.
When the chip system 900 includes two processors, including a processor 910 shown in a solid line box and a processor 910 shown in a broken line box in fig. 9, and two interface circuits, including an interface circuit 920 shown in a solid line box and an interface circuit 920 shown in a broken line box in fig. 9. This is not limited thereto. The processor 910 and the interface circuit 920 may be interconnected by wires. For example, interface circuit 920 may be used to receive signals (e.g., instructions stored in memory, etc.). For another example, interface circuit 920 may be used to send signals to other devices (e.g., processor 910).
Illustratively, the interface circuit 920 may read instructions stored in the memory and send the instructions to the processor 910. The instructions, when executed by the processor 910, may cause a device accessing an SSD across a network or a device accessing memory to perform the various steps of the embodiments described above. Of course, the system-on-chip 900 may also include other discrete devices, which are not particularly limited in accordance with embodiments of the present application.
Another embodiment of the present application further provides a computer readable storage medium having instructions stored therein, where when the instructions are executed on a device that accesses an SSD across a network, the device that accesses the SSD across the network performs each step performed by the device that accesses the SSD across the network in the method flow shown in the method embodiment. In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles of manufacture.
Fig. 10 schematically shows a conceptual partial view of a computer program product comprising a computer program for executing a computer process on a device, provided by an embodiment of the application.
In one embodiment, a computer program product is provided using signal bearing medium 1000. The signal bearing medium 1000 may include one or more program instructions that when executed by one or more processors may provide the functionality or portions of the functionality described above with respect to fig. 7. Thus, for example, reference to one or more features in fig. 7 may be carried by one or more instructions associated with signal bearing medium 1000. Further, the program instructions in fig. 10 also describe example instructions.
In some examples, signal bearing medium 1000 may include a computer readable medium 1001, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital magnetic tape, a memory, a read-only memory (read-only memory, ROM), or a random access memory (random access memory, RAM), among others.
In some implementations, signal bearing medium 1000 may include a computer recordable medium 1002 such as, but not limited to, memory, read/write (R/W) CD, R/W DVD, and the like.
In some implementations, signal bearing medium 1000 may include communication media 1003 such as, but not limited to, digital and/or analog communication media (e.g., fiber optic cable, waveguide, wired communication link, wireless communication link, etc.). Signal bearing medium 1000 may be conveyed by a communication medium 1003 in a wireless form (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions may be, for example, computer-executable instructions or logic-implemented instructions.
In some examples, an apparatus such as the apparatus for accessing an SSD across a network described with respect to fig. 7 may be configured to provide various operations, functions, or actions in response to program instructions conveyed through one or more of the computer readable medium 1001, the computer recordable medium 1002, and/or the communication medium 1003.
It should be understood that the arrangement described herein is for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and some elements may be omitted altogether depending on the desired results. In addition, many of the elements described are functional entities that may be implemented as discrete or distributed components, or in any suitable combination and location in conjunction with other components.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When a software program is used for implementation, the implementation may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for accessing a solid state disk (SSD) across a network, characterized in that it is applied to a first device, the first device includes a transport layer and a transaction layer, and the method includes:
the transport layer of the first device receives a first transaction from the transaction layer of the first device, where the first transaction is to read the SSD of a second device or to write the SSD of the second device;
the transport layer of the first device encapsulates the first transaction with a transport layer header TPH to obtain a first message;
the transport layer of the first device sends the first message to the transport layer of the second device, where the first message is used to request the SSD controller of the second device to execute the first transaction, and the first message is decapsulated to obtain the first transaction sent to the SSD controller of the second device.

2. The method according to claim 1, characterized in that the method further includes:
the transport layer of the first device receives a first response message from the transport layer of the second device, where the first response message is used to indicate whether the SSD controller of the second device successfully receives the first transaction;
after the SSD controller of the second device successfully receives the first transaction, the transport layer of the first device receives a second response message from the transport layer of the second device, where the second response message is used to indicate that the SSD controller of the second device successfully executed the first transaction.

3. The method according to claim 2, characterized in that, if the first response message is a transaction acknowledgement TAACK indicating that the SSD controller of the second device successfully received the first transaction, the method further includes:
the transport layer of the first device sends the TAACK to the transaction layer of the first device.

4. The method according to claim 2, characterized in that, if the first response message is a transaction negative acknowledgement TANAK indicating that the SSD controller of the second device did not successfully receive the first transaction, the method further includes:
the transport layer of the first device sends the TANAK to the transaction layer of the first device;
the transport layer of the first device receives the first transaction from the transaction layer of the first device, encapsulates the first transaction again to obtain the first message, and re-sends the first message to the transport layer of the second device.

5. The method according to any one of claims 1 to 4, characterized in that the transport layer of the first device receiving the first transaction from the transaction layer of the first device includes:
a first transport layer group TPG in the transport layer of the first device receives the first transaction from the transaction layer of the first device;
and the transport layer of the first device sending the first message to the transport layer of the second device includes:
the transport layer of the first device selects the transport layer TP connection with the lightest load in the first TPG, and sends the first message to the transport layer of the second device through the TP connection.

6. The method according to any one of claims 1 to 5, characterized in that, if the first transaction is writing first data to the SSD of the second device, the method further includes:
the transport layer of the first device receives a second message from the transport layer of the second device, where the second message is a message encapsulated from a second transaction, the second transaction is a transaction processed by the transaction layer of the second device, and the second transaction is obtaining the first data from the first device;
the transport layer of the first device strips the transport layer header TPH of the second message and the information before the TPH to obtain the second transaction, and determines the destination virtual machine VM according to the entity identifier EID carried in the second transaction;
the transport layer of the first device sends the second transaction to the remote control RC queue of the VM to schedule the RC queue and generate a third response message in response to the second message;
the transport layer of the first device sends the third response message to the transport layer of the second device, where the third response message includes the first data to be written into the SSD of the second device.

7. The method according to any one of claims 1 to 5, characterized in that, if the first transaction is reading second data in the SSD of the second device, the method further includes:
the transport layer of the first device receives a third message from the transport layer of the second device, where the third message is a message carrying the second data;
the transport layer of the first device strips the transport layer header TPH of the third message and the information before the TPH to obtain the second data, and writes the second data into the memory of the first device.

8. A method for accessing a solid state disk (SSD) across a network, characterized in that it is applied to a second device, the second device includes a transport layer and a transaction layer, and the method includes:
the transport layer of the second device receives a first message from the transport layer of a first device, where the first message is a message encapsulated from a first transaction, and the first transaction is to read the SSD of the second device or to write the SSD of the second device;
the transport layer of the second device strips the transport layer header TPH of the first message and the information before the TPH to obtain the first transaction;
the transport layer of the second device sends the first transaction to the transaction layer of the second device.

9. The method according to claim 8, characterized in that the method further includes:
the transport layer of the second device sends a first response message to the transport layer of the first device, where the first response message is used to indicate whether the SSD controller of the second device has successfully received the first transaction;
after the SSD controller of the second device successfully receives the first transaction, the transport layer of the second device sends a second response message to the transport layer of the first device, where the second response message is used to indicate that the SSD controller of the second device successfully executed the first transaction.

10. The method according to claim 8 or 9, characterized in that, if the first response message is a transaction negative acknowledgement TANAK indicating that the SSD controller of the second device did not successfully receive the first transaction, the method further includes:
the transport layer of the second device re-receives the first message from the transport layer of the first device.

11. The method according to any one of claims 8 to 10, characterized in that, if the first transaction is writing first data to the SSD of the second device, the method further includes:
the transport layer of the second device receives a second transaction from the SSD controller of the second device, where the second transaction is obtaining the first data from the first device;
the transport layer of the second device encapsulates the second transaction with a transport layer header TPH to obtain a second message;
the transport layer of the second device sends the second message to the transport layer of the first device;
the transport layer of the second device receives, from the transport layer of the first device, a third response message in response to the second message, where the third response message includes the first data to be written into the SSD of the second device.

12. The method according to any one of claims 8 to 10, characterized in that, if the first transaction is reading second data in the SSD of the second device, the method further includes:
the transport layer of the second device receives the second data from the SSD controller of the second device;
the transport layer of the second device encapsulates the second data with a transport layer header TPH to obtain a third message;
the transport layer of the second device sends the third message to the transport layer of the first device.

13. An apparatus for accessing a solid state disk (SSD) across a network, characterized by including: a processor, configured to read instructions stored in a memory, where, when the processor executes the instructions, the apparatus for accessing the SSD across the network implements the method according to any one of claims 1 to 7.

14. An apparatus for accessing a solid state disk (SSD) across a network, characterized by including: a processor, configured to read instructions stored in a memory, where, when the processor executes the instructions, the apparatus for accessing the SSD across the network implements the method according to any one of claims 8 to 12.

15. An apparatus for accessing a solid state disk (SSD) across a network, characterized by including a unit for performing the method according to any one of claims 1 to 7, or a unit for performing the method according to any one of claims 8 to 12.

16. A chip, characterized by including: at least one processing core, configured to perform the method according to any one of claims 1 to 12.

17. A computer device, characterized by including the chip according to claim 16.

18. A computer program product, characterized in that the computer program product includes computer program code, and when the computer program code is run on a computer, the method according to any one of claims 1 to 12 is performed.

19. A computer-readable storage medium, characterized by including a computer program which, when run on a computer device, causes a processing module in the computer device to perform the method according to any one of claims 1 to 12.

20. A system for accessing a solid state disk (SSD) across a network, characterized by including a first device configured to perform the method according to any one of claims 1 to 7 and a second device configured to perform the method according to any one of claims 8 to 12.

