
CN117743253A - Interface for remote memory - Google Patents

Info

Publication number
CN117743253A
Authority
CN
China
Prior art keywords
interface
memory
cxl
circuit
processing circuitry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310780755.7A
Other languages
Chinese (zh)
Inventor
M·加格
R·阿马里
P·里希纳穆尔蒂
崔昌皓
奇亮奭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/054,492 (US20240095171A1)
Application filed by Samsung Electronics Co Ltd
Publication of CN117743253A

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A system having an interface for remote memory is disclosed. In some embodiments, the system includes an interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to be connected to a memory, the first interface including a cache-coherent interface, and the second interface being different from the first interface.

Description

Interface for remote memory

Cross-reference to related applications

This application claims priority to and the benefit of U.S. Provisional Application No. 63/408,725, entitled "REMOTE ACCESS SOLUTION FOR CXL MEMORY CLUSTERING ACROSS SERVERS", filed on September 21, 2022, the entire content of which is incorporated herein by reference.

Technical field

One or more aspects of embodiments according to the present disclosure relate to computing systems, and more particularly to an interface for remote memory.

Background

In a computing system, a host central processing unit (CPU) may be connected to host memory through, for example, an address bus and a data bus, or through a high-speed interconnect such as CXL. Some systems for forming connections to memory may limit the length of the conductors (e.g., cables) that can be used to form these connections.

Aspects of the present disclosure relate to this general technical environment.

Summary

According to an embodiment of the present disclosure, there is provided a system including: an interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to be connected to a memory, the first interface including a cache-coherent interface, and the second interface being different from the first interface.

In some embodiments, the system further includes a memory server connected to the second interface.

In some embodiments, the second interface includes a remote direct memory access interface.

In some embodiments, the second interface includes a computer cluster interconnect interface.

In some embodiments, the computer cluster interconnect interface includes an Ethernet interface.

In some embodiments, the memory server is connected to the second interface by a cable having a length greater than 6 feet.

In some embodiments, the cache-coherent interface includes a Compute Express Link (CXL) interface.

In some embodiments, the first interface is configured to: send data to the processing circuit in response to a load instruction executed by the processing circuit; and receive data from the processing circuit in response to a store instruction executed by the processing circuit.

In some embodiments, the system further includes a Compute Express Link (CXL) root complex connected between the processing circuit and the first interface.

According to an embodiment of the present disclosure, there is provided a system including: an interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to be connected to a memory, the first interface including a Compute Express Link (CXL) interface, and the second interface being different from the first interface.

In some embodiments, the system further includes a memory server connected to the second interface.

In some embodiments, the second interface includes a remote direct memory access interface.

In some embodiments, the second interface includes a computer cluster interconnect interface.

In some embodiments, the computer cluster interconnect interface includes an Ethernet interface.

In some embodiments, the memory server is connected to the second interface by a cable having a length greater than 6 feet.

In some embodiments, the CXL interface includes a cache-coherent interface.

In some embodiments, the first interface is configured to: send data to the processing circuit in response to a load instruction executed by the processing circuit; and receive data from the processing circuit in response to a store instruction executed by the processing circuit.

In some embodiments, the system further includes a CXL root complex connected between the processing circuit and the first interface.

According to an embodiment of the present disclosure, there is provided a method including: executing, by a central processing unit, a store instruction for storing a first value in a first memory location at a first address; and sending, by an interface circuit, in response to the executing of the store instruction, a store command to a memory including the first memory location, the store command being a command to store the first value in the first memory location, wherein the interface circuit has: a first interface connected to the central processing unit; and a second interface connected to the memory, the first interface including a Compute Express Link (CXL) interface, and the second interface being different from the first interface.

In some embodiments, the method further includes: executing, by the central processing unit, a load instruction for loading a value in a second memory location at a second address into a register of the central processing unit; and sending, by the interface circuit, in response to the executing of the load instruction, a read command to the memory, the read command being a command to read the value in the second memory location.

Brief description of the drawings

These and other features and advantages of the present disclosure will be appreciated and understood with reference to the specification, claims, and appended drawings, in which:

FIG. 1A is a block diagram of a single-host computing system, according to an embodiment of the present disclosure;

FIG. 1B is a flowchart of a startup process, according to an embodiment of the present disclosure;

FIG. 1C is a block diagram of a single-host computing system including a graphics processing unit, according to an embodiment of the present disclosure;

FIG. 1D is a block diagram of a single-host computing system including a plurality of different memory pool servers, according to an embodiment of the present disclosure;

FIG. 2 is a block diagram of a multi-host computing system, according to an embodiment of the present disclosure;

FIG. 3 is an operational block diagram of a single-host computing system, according to an embodiment of the present disclosure;

FIG. 4 is a block diagram of a multi-host computing system with a switch, according to an embodiment of the present disclosure; and

FIG. 5 is a flowchart of a method, according to an embodiment of the present disclosure.

Detailed description

The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary embodiments of an interface for remote memory provided in accordance with the present disclosure, and is not intended to represent the only forms in which the present disclosure may be constructed or utilized. The description sets forth the features of the present disclosure in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the scope of the disclosure. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.

In various computing applications, the memory requirements of a host may change over time, for example when a new application starts on the host, when a currently running application is shut down, or when user demand on an application changes. Equipping the host with enough main memory to handle the highest foreseeable demand may, however, be costly. Moreover, cable length limits that may apply to some interfaces between the host central processing unit (CPU) and memory may limit the capacity of memory that is available (e.g., to the capacity available within the same rack as the host CPU).

Accordingly, some embodiments create a mechanism that allows host applications to access memory pools beyond the local rack level using normal load and store commands. Such embodiments may use a cache-coherent interface (e.g., a Compute Express Link (CXL) interface) to an interface circuit, which may operate as the front end of a memory pool, together with a low-latency remote memory access protocol (e.g., RDMA) memory pool operating as the back end. The system may dynamically allocate resources through the interface circuit and a shared memory pool. Such embodiments may enable the disaggregation of memory into physically separate memory resources (e.g., resources that need not be in the same rack as the processing circuits that use them), thereby avoiding the limitations of some memory interface links (e.g., links based on Peripheral Component Interconnect Express (PCIe), such as CXL, which may have cable lengths limited to, e.g., between 8 inches (PCIe Gen 3) and 15 inches (PCIe Gen 1)).

In some embodiments, a host may access a pool of memory using load and store semantics, avoiding the need to implement physical resources (such as dynamic random access memory (DRAM)) on a device within the same rack as the host. Disaggregated memory (accessed through a low-latency remote memory access protocol, e.g., RDMA) may be used to allocate resources at run time. In some embodiments, the interface circuit may be reconfigured to any size and mapped to a remote memory pool, and the remote memory pool may provide a fluid set of memory resources that can be dynamically combined to meet the needs of servers in a composable disaggregated memory architecture. Some embodiments result in a lower total cost of ownership (TCO), and overcome the cable length limits present in some interfaces by supporting long-distance memory disaggregation by means of RDMA (which may be compatible with cables having lengths greater than, e.g., 6 feet).

Referring to FIG. 1A, in some embodiments, a computing system 100 includes a host 102, which includes a central processing unit (CPU) 105 (which may be or include a processing circuit) and a local memory 110 (which may be double data rate (DDR) memory), and a memory system 115. The memory system 115 may include an interface circuit 120, which includes a front-end interface 130 using a cache-coherent memory access protocol (e.g., CXL.mem) and a back-end interface 135 with low-latency remote memory access capability (e.g., an RDMA-enabled network interface card). The front-end interface 130 may be a CXL interface (e.g., CXL.mem); in this case, the interface circuit 120 may be referred to as a CXL device. A memory pool server 125 may include a back-end interface 135 and a memory pool 140. The memory pool 140 may include, for example, banks of dynamic random access memory, configured, e.g., as memory modules, each memory module including a plurality of memory chips on a printed circuit board. The memory may be directly connected to the back-end interface 135 of the memory pool server 125, or some or all of the memory may be implemented in one or more memory servers connected to the memory pool server 125 through a computer cluster interconnect interface.

The front-end interface 130 may be connected to the address bus and the data bus of the CPU 105. As such, from the perspective of the CPU 105, the storage provided by the memory system 115 may be substantially the same as the local memory 110, and, in operation, the memory system 115 may respond directly to load and store instructions executed by the CPU 105 when the addresses of those instructions are within the physical address range assigned to the interface circuit 120. This ability of the memory system 115 to respond directly to load and store instructions executed by the CPU 105 may make it unnecessary for the CPU to call driver functions to store data in, or retrieve data from, the memory system 115.
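The address-range behavior described above can be illustrated with a minimal sketch. The names `AddressRange` and `route`, and the example base addresses and sizes, are hypothetical and are not taken from the disclosure; the sketch only models the decision of which memory responds to a load or store at a given physical address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    """A contiguous physical address range (hypothetical model)."""
    base: int  # first physical address in the range
    size: int  # number of bytes in the range

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def route(addr: int, local: AddressRange, remote: AddressRange) -> str:
    """Return which memory responds to a load or store at addr."""
    if remote.contains(addr):
        return "memory system 115"
    if local.contains(addr):
        return "local memory 110"
    raise ValueError(f"address {addr:#x} is not mapped")


# Assumed layout: 4 GiB of local DDR, followed by 16 GiB assigned
# to the interface circuit. These sizes are illustrative only.
LOCAL = AddressRange(base=0x0, size=4 << 30)
REMOTE = AddressRange(base=4 << 30, size=16 << 30)
```

In this model no driver call appears anywhere: the only input is the address of the instruction, which matches the load and store semantics the passage describes.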

The host 102 may see the interface circuit 120 as a CXL device that advertises its memory resources to the host 102 through CXL discovery. For example, a suitable value stored in a base address register (BAR) in the CXL interface of the interface circuit 120 may determine the size of the memory address range allocated to the interface circuit 120. At startup, to determine the size of the memory available through the interface circuit 120, the CPU may write a word of all binary ones to the appropriate base address register and then read the same base address register; in response, the interface circuit 120 may send a word indicating the size of the available memory region. The host 102 may use the memory resources by executing load instructions or store instructions of the instruction set of the CPU 105.
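The write-ones-then-read-back probe can be sketched as follows. The disclosure states only that a word of all ones is written and a size-indicating word is read back; the decoding shown here (masking the low four flag bits, inverting, and adding one) follows the conventional PCI sizing procedure for a 32-bit memory BAR and is an assumption about how the returned word is interpreted. `SimulatedBar` is a hypothetical stand-in for the device register.

```python
class SimulatedBar:
    """Hypothetical 32-bit memory BAR for a power-of-two sized region."""

    FLAG_MASK = 0xF  # low four bits carry type/prefetch flags, not address bits

    def __init__(self, size: int, flags: int = 0):
        if size < 16 or size & (size - 1):
            raise ValueError("size must be a power of two of at least 16 bytes")
        # Address bits below log2(size) are hardwired to zero by the device.
        self._writable = ~(size - 1) & 0xFFFFFFFF
        self._flags = flags & self.FLAG_MASK
        self._value = self._flags

    def write(self, value: int) -> None:
        self._value = (value & self._writable) | self._flags

    def read(self) -> int:
        return self._value


def probe_bar_size(bar: SimulatedBar) -> int:
    """Size a BAR by writing all ones and decoding the value read back."""
    original = bar.read()
    bar.write(0xFFFFFFFF)            # write a word of all binary ones
    readback = bar.read()            # device returns ones only in writable bits
    bar.write(original)              # restore the previous contents
    address_bits = readback & ~SimulatedBar.FLAG_MASK
    return (~address_bits & 0xFFFFFFFF) + 1
```

For example, a simulated BAR constructed with `size=1 << 20` reads back `0xFFF00000` after the all-ones write, which decodes to a 1 MiB region.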

FIG. 1B shows a method that may be employed at startup. In some embodiments, at startup, the RDMA network interface controller 135 in the interface circuit 120 sends, at 155, a discovery message to the memory pool server 125. The memory pool server 125 then responds to the discovery message, at 160, with a response reporting the capabilities of the memory pool server 125. The reported capabilities may include, for example, the capacity, the bandwidth, and the latency of the memory pool server 125. The RDMA network interface controller 135 may retain this capability information and, at 165, provide it to the memory interface 130 for CXL start-up. The memory interface 130 may then, at 170, perform CXL start-up and provide the memory-related information to the host 102 using a Coherent Device Attribute Table (CDAT), which may be employed under the CXL standard to send memory information to a host.
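The four startup steps (155, 160, 165, 170) can be sketched as a small simulation. All class and field names here are hypothetical, the message exchange is reduced to a method call, and the dictionary returned at the end is only a placeholder for the information a CDAT would carry; it does not reflect the actual CDAT binary layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Capabilities reported by a memory pool server (illustrative fields)."""
    capacity_bytes: int
    bandwidth_gbps: float
    latency_ns: int


class MemoryPoolServer:
    def __init__(self, caps: Capabilities):
        self._caps = caps

    def handle_discovery(self) -> Capabilities:
        # Step 160: respond to the discovery message with the capabilities.
        return self._caps


class RdmaNic:
    """Stands in for the RDMA network interface controller 135."""

    def __init__(self, server: MemoryPoolServer):
        self._server = server
        self.caps = None

    def discover(self) -> Capabilities:
        # Step 155: send the discovery message; retain the response.
        self.caps = self._server.handle_discovery()
        return self.caps


class MemoryInterface:
    """Stands in for the memory interface 130."""

    def cxl_startup(self, caps: Capabilities) -> dict:
        # Step 170: report memory information to the host.
        # The dict is a placeholder, not the real CDAT format.
        return {
            "size_bytes": caps.capacity_bytes,
            "bandwidth_gbps": caps.bandwidth_gbps,
            "latency_ns": caps.latency_ns,
        }


def startup(server: MemoryPoolServer) -> dict:
    nic = RdmaNic(server)
    caps = nic.discover()                       # steps 155 and 160
    return MemoryInterface().cxl_startup(caps)  # steps 165 and 170
```

The ordering is the point of the sketch: the back end must learn the pool's capabilities before the front end can advertise a memory size to the host.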

The connection between the interface circuit 120 and the memory pool server 125 may be a remote direct memory access connection, as shown. The remote direct memory access connection may include a cable 145 (e.g., an electrical cable (having a plurality of conductors) or an optical cable (including a plurality of optical fibers)). The cable 145 may form a connection between the back-end interface 135 in the interface circuit 120 and another back-end interface 135 in the memory pool server 125. The interface between the RDMA network interface controllers 135 may be Ethernet or any other suitable computer cluster interconnect interface, e.g., InfiniBand or Fibre Channel.

Configuring the back-end interface 135 of the interface circuit 120 may include using the Internet Protocol (IP) address of the memory pool server 125 (which may be part of the configuration of the back-end interface 135) to communicate with the memory pool server 125. The interface circuit 120 may negotiate with the memory pool server 125 for memory resources at start-up, and establish a remote direct memory access connection with the RDMA server to perform read or write operations.

From the perspective of the host 102, the interface circuit 120 may be a CXL Type 3 device. In operation, the remote direct memory access system may create one or more queue pairs (QPs) and register one or more memory regions (MRs). The queue pairs and memory regions may then be used to perform remote direct memory access read operations and remote direct memory access write operations in response to load and store operations executed by the CPU 105 of the host 102.

For example, when the CPU 105 of the host 102 executes a store instruction for storing a first value in a first memory location at a first address, the first address being mapped to the interface circuit 120, the interface circuit 120 may receive the store instruction (as a result of the first address being mapped to the interface circuit 120), and, in response to the executing of the store instruction, the interface circuit 120 may send a store command to the memory pool server 125. The store command may be sent via remote direct memory access; e.g., to store the first value in the memory pool 140 of the memory pool server 125, the interface circuit 120 may initiate a remote direct memory access write transfer to store the first value in the memory pool 140.

As another example, if the CPU 105 of the host 102 executes a load instruction for loading a value in a second memory location at a second address into a register of the CPU 105, the second address being mapped to the interface circuit 120, then the interface circuit 120 may receive the load instruction (as a result of the second address being mapped to the interface circuit 120), and, in response to the executing of the load instruction, the interface circuit 120 may send a read command to the memory pool server 125. The read command may be sent via remote direct memory access; e.g., to read a value stored in the memory pool 140 of the memory pool server 125, the interface circuit 120 may initiate a remote direct memory access read transfer to read the value from the memory pool 140.
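The store and load examples above can be sketched as a toy model in which a store becomes a remote write and a load becomes a remote read. The classes below are hypothetical stand-ins: the memory pool is a plain `bytearray`, the "RDMA" transfers are ordinary method calls, and queue pairs and memory regions are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryPoolServer:
    """Stands in for memory pool server 125; pool 140 is a bytearray."""
    pool: bytearray

    def rdma_write(self, offset: int, data: bytes) -> None:
        self.pool[offset:offset + len(data)] = data

    def rdma_read(self, offset: int, length: int) -> bytes:
        return bytes(self.pool[offset:offset + length])


@dataclass
class InterfaceCircuit:
    """Stands in for interface circuit 120."""
    base: int                   # first host physical address mapped to the circuit
    server: MemoryPoolServer
    log: list = field(default_factory=list)

    def _offset(self, addr: int, length: int) -> int:
        offset = addr - self.base
        if offset < 0 or offset + length > len(self.server.pool):
            raise ValueError("address not mapped to the interface circuit")
        return offset

    def store(self, addr: int, value: bytes) -> None:
        """Handle a CPU store by issuing a remote write."""
        offset = self._offset(addr, len(value))
        self.log.append(("RDMA_WRITE", offset, len(value)))
        self.server.rdma_write(offset, value)

    def load(self, addr: int, length: int) -> bytes:
        """Handle a CPU load by issuing a remote read."""
        offset = self._offset(addr, length)
        self.log.append(("RDMA_READ", offset, length))
        return self.server.rdma_read(offset, length)
```

A store to an address inside the mapped window followed by a load from the same address returns the stored bytes, and the log records one write transfer and one read transfer.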

CXL 2.0 may support a hot-plug feature. As such, in embodiments in which the front-end interface 130 is a CXL interface, a new connection to the interface circuit 120 may be made while the host is operating, or the interface circuit 120 may be disconnected while the host is operating, without disturbing the operation of the host. In some embodiments, the latency of the memory system 115 is sufficiently low (e.g., as a result of using a low-latency protocol such as InfiniBand or RDMA) to make it possible to support the cache-coherence-preserving features of CXL.cache.

Referring to FIG. 1C, in some embodiments, a graphics processing unit (GPU) 148 is connected to the CPU 105 and to the interface circuit 120. The connections may be formed, for example, through a CXL switch 150. In some embodiments, the graphics processing unit 148 is part of the host 102, as shown; in other embodiments, the graphics processing unit 148 may be the main processing circuit of another host, or the graphics processing unit 148 may be a separate CXL device (e.g., a Type 2 CXL device) that is not part of the host 102 and that, like the interface circuit 120, is connected to the host through a CXL link. In some embodiments, the graphics processing unit 148 is in a Type 2 CXL device.

Further, as shown in FIG. 1C, in some embodiments, a plurality of memory pool servers 125 are connected to the RDMA network interface controller 135 of the interface circuit 120. Each of the memory pool servers 125 may be connected to the interface circuit 120 via any suitable type of connection (e.g., Ethernet, InfiniBand, or Fibre Channel), as shown. The interface circuit 120 may include a single RDMA network interface controller 135 capable of supporting a plurality of respective connections to the plurality of memory pool servers 125, as shown, or the interface circuit 120 may include a plurality of RDMA network interface controllers 135, or one NIC with a plurality of physical interfaces, each connected to the memory interface 130 of the interface circuit 120 and each (i) connected to a respective one of the memory pool servers 125 and (ii) configured to support the protocol (e.g., Ethernet, InfiniBand, or Fibre Channel) used in the connection between the interface circuit 120 and the respective memory pool server 125. For example, a connection formed using InfiniBand may have lower latency than a connection using Ethernet. Such a connection (e.g., one using InfiniBand) may meet the latency requirements of one or more of the three CXL protocols (CXL.io, CXL.mem, and CXL.cache).

Referring to FIG. 1D, in some embodiments, the memory pool servers 125 are constructed differently from one another; e.g., one of the memory pool servers 125 may include a memory pool 140a optimized for high bandwidth, one of the memory pool servers 125 may include a memory pool 140b optimized for low latency, and one of the memory pool servers 125 may include a memory pool 140c optimized for high capacity. In some embodiments, the type of connection employed to connect a memory pool server 125 to the interface circuit 120 may be selected to provide a certain level of performance to the host 102. For example, an InfiniBand connection, which may have relatively low latency, may be used to connect the memory pool server 125 that includes the memory pool 140b optimized for low latency to the interface circuit 120, so that the total latency experienced by the host 102 is reduced both by (i) the type of memory pool 140 used and (ii) the type of connection used to connect the memory pool server 125 to the interface circuit 120.

In some embodiments, applications running on the host 102 may need memory with different characteristics; for example, an application may need low-latency memory for performance reasons. In some embodiments, such an application may be aware of the different performance characteristics of the different memory pool servers 125 connected to the host 102 through the interface circuit 120. The application may have access to this information as a result of the startup process described above, which may result in the information being stored in the host (e.g., by the operating system of the host 102). The application may then, when it requests memory from the operating system, request memory with performance characteristics that will result in acceptable performance of the application.
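One way to picture an application requesting memory by performance characteristic is the selection function below. It is a sketch only: `PoolInfo`, `select_pool`, and every number in `POOLS` are hypothetical, and the disclosure does not specify how the operating system would expose or match these characteristics. The policy of preferring the lowest-latency candidate is likewise an arbitrary choice for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PoolInfo:
    """Performance characteristics recorded for one memory pool."""
    name: str
    capacity_gib: int
    bandwidth_gbps: float
    latency_ns: int


def select_pool(
    pools: Sequence[PoolInfo],
    *,
    max_latency_ns: Optional[int] = None,
    min_bandwidth_gbps: Optional[float] = None,
    min_capacity_gib: Optional[int] = None,
) -> Optional[PoolInfo]:
    """Return the lowest-latency pool meeting every stated requirement."""
    candidates = [
        p for p in pools
        if (max_latency_ns is None or p.latency_ns <= max_latency_ns)
        and (min_bandwidth_gbps is None or p.bandwidth_gbps >= min_bandwidth_gbps)
        and (min_capacity_gib is None or p.capacity_gib >= min_capacity_gib)
    ]
    return min(candidates, key=lambda p: p.latency_ns, default=None)


# Illustrative values only; the disclosure gives no figures for these pools.
POOLS = [
    PoolInfo("140a", capacity_gib=256, bandwidth_gbps=400.0, latency_ns=900),
    PoolInfo("140b", capacity_gib=128, bandwidth_gbps=100.0, latency_ns=400),
    PoolInfo("140c", capacity_gib=2048, bandwidth_gbps=50.0, latency_ns=1500),
]
```

With these assumed values, a request with `max_latency_ns=500` selects pool 140b, a request with `min_capacity_gib=1024` selects pool 140c, and a request that no pool can satisfy returns `None`.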

Referring to FIG. 2, in some embodiments, a plurality of hosts may each be connected to a memory pool server 125 and share the memory resources of the memory pool server 125. As in the embodiment of FIG. 1A, each interface circuit 120 presents to its respective host a memory device with a dynamically configurable memory size, without physical memory resources (e.g., DRAM) being physically present in the interface circuit 120. A pool of disaggregated memory resources may be created, e.g., in one or more memory servers, and it may be placed at a remote location. A remote collection of memory servers may be referred to as a memory farm. The memory resources may be accessed using a low-latency network protocol such as remote direct memory access; in such embodiments, the memory resources may be used efficiently, and the total cost of ownership (TCO) may be reduced.
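The idea of several hosts sharing one pool, each seeing a memory size that can change over time, can be sketched with a simple capacity ledger. `SharedMemoryPool` and its `resize` method are hypothetical; the disclosure does not describe a specific allocation policy, and this sketch tracks capacity only, not address mappings.

```python
class SharedMemoryPool:
    """Capacity ledger for a memory pool shared by several hosts."""

    def __init__(self, capacity_gib: int):
        self.capacity_gib = capacity_gib
        self.grants = {}  # host name -> GiB currently presented to that host

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - sum(self.grants.values())

    def resize(self, host: str, size_gib: int) -> bool:
        """Set the memory size presented to host; return False if it does not fit."""
        if size_gib < 0:
            raise ValueError("size must be non-negative")
        current = self.grants.get(host, 0)
        if size_gib - current > self.free_gib:
            return False  # growing by this much would exceed the pool
        if size_gib == 0:
            self.grants.pop(host, None)
        else:
            self.grants[host] = size_gib
        return True
```

In a 100 GiB pool, for instance, host B's request for 50 GiB fails while host A holds 60 GiB and succeeds once A shrinks to 30 GiB, which is the kind of reuse that lets the hosts avoid each being provisioned for peak demand.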

Referring to FIG. 3, as mentioned above, the interface circuit 120 may occupy a portion of the physical address space of the CPU 105. The host 102 may further include (in addition to the CPU 105 and the local memory 110) a CXL root complex 305, which may form the interface, on the host side, to the CXL link of FIG. 1A. In some embodiments, in operation, the host writes to CXL-DRAM mapped into the physical address space, the request is sent (through the address bus and the data bus of the CPU 105) to the CXL root complex 305, the CXL root complex 305 generates a transaction layer packet (TLP) and sends the TLP to the interface circuit 120, and the interface circuit 120 converts the transaction layer packet into a remote direct memory access and sends the remote direct memory access through the computer cluster interconnect interface.
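The write path of FIG. 3 can be pictured as a chain of two conversions. The dataclasses below are deliberately simplified and hypothetical: a real transaction layer packet carries header fields that are omitted here, and the remote direct memory access is reduced to an offset and a payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteTlp:
    """Simplified stand-in for a memory-write transaction layer packet."""
    address: int    # host physical address of the write
    payload: bytes


@dataclass(frozen=True)
class RdmaWrite:
    """Simplified stand-in for a remote direct memory access write."""
    remote_offset: int  # offset into the remote memory pool
    payload: bytes


def root_complex_write(address: int, payload: bytes) -> WriteTlp:
    """Stage 1: the root complex wraps the host write in a TLP."""
    return WriteTlp(address, payload)


def to_rdma(tlp: WriteTlp, window_base: int, window_size: int) -> RdmaWrite:
    """Stage 2: the interface circuit converts the TLP to a remote write."""
    offset = tlp.address - window_base
    if offset < 0 or offset + len(tlp.payload) > window_size:
        raise ValueError("TLP address is outside the mapped window")
    return RdmaWrite(offset, tlp.payload)
```

The parameters `window_base` and `window_size` are assumptions that describe the part of the physical address space occupied by the interface circuit; the conversion amounts to rebasing the host address into the remote pool.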

FIG. 4 shows an embodiment including a CXL switch 405. Each of a plurality of hosts 102 is connected to the CXL switch 405 (which may be or include, e.g., a CXL 2.0 switch), which is connected to one or more interface circuits 120 (each labeled "IC" in FIG. 4) and to zero or more other CXL devices 410 (each labeled "D" in FIG. 4). The interface circuits 120 may be connected, through remote direct memory access connections, to a single, shared memory pool server 125 (as shown), or they may be connected to a plurality of memory pool servers 125 (e.g., each interface circuit 120 may be connected to a respective memory pool server 125).

Figure 5 is a flowchart of a method, in some embodiments. The method includes: at 505, executing, by a central processing unit, a store instruction for storing a first value in a first memory location at a first address; and, at 510, in response to the executing of the store instruction, sending, by an interface circuit, a store command to a memory including the first memory location, the store command being a command to store the first value in the first memory location. The method may further include: at 515, executing, by the central processing unit, a load instruction for reading a second value, which may be stored in a second memory location at a second address; and, at 520, in response to the executing of the load instruction, sending, by the interface circuit, a read command to a memory including the second memory location, the read command being a command to read the second value from the second memory location.
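The four steps of Figure 5 can be traced with a short Python sketch. All class and method names are hypothetical, the remote memory is modeled as a plain dictionary, and returning 0 for an unwritten address is an assumption made only so the example is complete.

```python
class RemoteMemory:
    """Stand-in for the memory reached through the interface circuit."""

    def __init__(self):
        self._cells = {}

    def handle(self, command, address, value=None):
        if command == "store":
            self._cells[address] = value
            return None
        if command == "read":
            return self._cells.get(address, 0)  # assumed default for unwritten cells
        raise ValueError(f"unknown command: {command}")


class InterfaceCircuit:
    def __init__(self, memory):
        self.memory = memory
        self.sent = []  # log of commands sent to the memory

    def on_store(self, address, value):
        # Step 510: send a store command in response to the store instruction.
        self.sent.append(("store", address))
        self.memory.handle("store", address, value)

    def on_load(self, address):
        # Step 520: send a read command in response to the load instruction.
        self.sent.append(("read", address))
        return self.memory.handle("read", address)


class Cpu:
    def __init__(self, interface_circuit):
        self.ic = interface_circuit

    def store(self, address, value):
        # Step 505: execute a store instruction.
        self.ic.on_store(address, value)

    def load(self, address):
        # Step 515: execute a load instruction.
        return self.ic.on_load(address)


ic = InterfaceCircuit(RemoteMemory())
cpu = Cpu(ic)
cpu.store(0x1000, 42)
value = cpu.load(0x1000)
print(value, ic.sent)
```

The `sent` log makes the point of the method visible: the software only executes ordinary store and load instructions, and the commands to the memory are generated by the interface circuit.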

As used herein, a computer cluster interconnect interface is any interface suitable for interconnecting computers, such as InfiniBand, Ethernet, or Fibre Channel.

As used herein, "a portion of" something means "at least some of" the thing, and as such may mean less than all of, or all of, the thing. As such, "a portion of" a thing includes the entire thing as a special case, i.e., the entire thing is an example of a portion of the thing. As used herein, when a second quantity is "within Y" of a first quantity X, it means that the second quantity is at least X-Y and the second quantity is at most X+Y. As used herein, when a second number is "within Y%" of a first number, it means that the second number is at least (1-Y/100) times the first number and the second number is at most (1+Y/100) times the first number. As used herein, the term "or" should be interpreted as "and/or", such that, for example, "A or B" means any one of "A" or "B" or "A and B".
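The "within Y" and "within Y%" definitions can be written as two small predicates. This is a minimal Python sketch; the function names are hypothetical, and the percentage form follows the text literally, so it assumes a non-negative first number.

```python
def within(second, first, y):
    """True if `second` is within `y` of `first`: first - y <= second <= first + y."""
    return first - y <= second <= first + y


def within_percent(second, first, y):
    """True if `second` is within `y` percent of `first`.

    Implements (1 - y/100) * first <= second <= (1 + y/100) * first,
    which assumes `first` is non-negative.
    """
    return (1 - y / 100) * first <= second <= (1 + y / 100) * first


print(within(9, 10, 2))             # 9 lies between 8 and 12
print(within_percent(7.4, 10, 35))  # 7.4 lies between 6.5 and 13.5
print(within_percent(14, 10, 35))   # 14 exceeds 13.5
```

Both bounds are inclusive, matching the "at least" and "at most" wording of the definitions.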

The background provided in the Background section of the present disclosure is included only to set context, and the content of this section is not admitted to be prior art. Any of the components or any combination of the components described (e.g., in any system diagrams included herein) may be used to perform one or more of the operations of any flowchart included herein. Further, (i) the operations are example operations, and may involve various additional steps not explicitly covered, and (ii) the temporal order of the operations may be varied.

The terms "processing circuit" and "means for processing" are each used herein to mean any combination of hardware, firmware, and software employed to process data or digital signals. Processing circuit hardware may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processing circuit, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general-purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processing circuit may be fabricated on a single printed circuit board (PCB) or distributed over several interconnected PCBs. A processing circuit may contain other processing circuits; for example, a processing circuit may include two processing circuits, an FPGA and a CPU, interconnected on a PCB.

As used herein, when a method (e.g., an adjustment) or a first quantity (e.g., a first variable) is referred to as being "based on" a second quantity (e.g., a second variable), it means that the second quantity is an input to the method or influences the first quantity; e.g., the second quantity may be an input (e.g., the only input, or one of several inputs) to a function that calculates the first quantity, or the first quantity may be equal to the second quantity, or the first quantity may be the same as (e.g., stored at the same location or locations in memory as) the second quantity.

It will be understood that, although the terms "first", "second", "third", etc., may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section, without departing from the spirit and scope of the inventive concept.

Spatially relative terms, such as "beneath", "below", "lower", "under", "above", "upper" and the like, may be used herein for ease of description to describe the relationship of one element or feature to another element or feature as illustrated in the figures. It will be understood that such spatially relative terms are intended to encompass different orientations of the device in use or in operation, in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the example terms "below" and "under" can encompass both an orientation of above and an orientation of below. The device may be otherwise oriented (e.g., rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein should be interpreted accordingly. In addition, it will also be understood that when a layer is referred to as being "between" two layers, it can be the only layer between the two layers, or one or more intervening layers may also be present.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the inventive concept. As used herein, the terms "substantially", "about", and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art.

As used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Expressions such as "at least one of", when used after a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of "may" when describing embodiments of the inventive concept refers to "one or more embodiments of the present disclosure". Also, the term "exemplary" is intended to refer to an example or illustration. As used herein, the terms "use", "using", and "used" may be considered synonymous with the terms "utilize", "utilizing", and "utilized", respectively.

It will be understood that when an element or layer is referred to as being "on", "connected to", "coupled to", or "adjacent to" another element or layer, it may be directly on, connected to, coupled to, or adjacent to the other element or layer, or one or more intervening elements or layers may be present. In contrast, when an element or layer is referred to as being "directly on", "directly connected to", "directly coupled to", or "directly adjacent to" another element or layer, there are no intervening elements or layers present.

Any numerical range recited herein is intended to include all subranges of the same numerical precision subsumed within the recited range. For example, a range of "1.0 to 10.0" or "between 1.0 and 10.0" is intended to include all subranges between (and including) the recited minimum value of 1.0 and the recited maximum value of 10.0, that is, having a minimum value equal to or greater than 1.0 and a maximum value equal to or less than 10.0, such as, for example, 2.4 to 7.6. Similarly, a range described as "within 35% of 10" is intended to include all subranges between (and including) the recited minimum value of 6.5 (i.e., (1 - 35/100) times 10) and the recited maximum value of 13.5 (i.e., (1 + 35/100) times 10), that is, having a minimum value equal to or greater than 6.5 and a maximum value equal to or less than 13.5, such as, for example, 7.4 to 10.6. Any maximum numerical limitation recited herein is intended to include all lower numerical limitations subsumed therein, and any minimum numerical limitation recited in this specification is intended to include all higher numerical limitations subsumed therein.
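The arithmetic in the examples above can be checked with a short Python sketch. The function names are hypothetical, and exact fractions are used (rather than floating point) so that the endpoints 6.5 and 13.5 are reproduced without rounding error.

```python
from fractions import Fraction


def percent_bounds(center, percent):
    """Return (min, max) of the range "within percent% of center"."""
    c = Fraction(center)
    p = Fraction(percent) / 100
    return (1 - p) * c, (1 + p) * c


def is_subrange(inner, outer):
    """True if the closed range `inner` lies inside the closed range `outer`."""
    lo, hi = inner
    return outer[0] <= lo <= hi <= outer[1]


# "Within 35% of 10": (1 - 35/100) * 10 and (1 + 35/100) * 10.
low, high = percent_bounds(10, 35)
print(float(low), float(high))

# The subranges named in the text fall inside the recited ranges.
print(is_subrange((Fraction("2.4"), Fraction("7.6")), (Fraction(1), Fraction(10))))
print(is_subrange((Fraction("7.4"), Fraction("10.6")), (low, high)))
```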

Some embodiments include features listed in the following numbered statements.

1. A system, comprising:

an interface circuit having:

a first interface configured to be connected to a processing circuit; and

a second interface configured to be connected to a memory,

the first interface including a cache coherence interface, and

the second interface being different from the first interface.

2. The system of statement 1, further comprising a memory server connected to the second interface.

3. The system of statement 1 or statement 2, wherein the second interface includes a remote direct memory access interface.

4. The system of any one of the preceding statements, wherein the second interface includes a computer cluster interconnect interface.

5. The system of statement 4, wherein the computer cluster interconnect interface includes an Ethernet interface.

6. The system of any one of statements 2 to 5, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.

7. The system of any one of the preceding statements, wherein the cache coherence interface includes a Compute Express Link (CXL) interface.

8. The system of any one of the preceding statements, wherein the first interface is configured to:

send data to the processing circuit in response to a load instruction executed by the processing circuit; and

receive data from the processing circuit in response to a store instruction executed by the processing circuit.

9. The system of any one of the preceding statements, further comprising a Compute Express Link (CXL) root complex connected between the processing circuit and the first interface.

10. A system, comprising:

an interface circuit having:

a first interface configured to be connected to a processing circuit; and

a second interface configured to be connected to a memory,

the first interface including a Compute Express Link (CXL) interface, and

the second interface being different from the first interface.

11. The system of statement 10, further comprising a memory server connected to the second interface.

12. The system of statement 10 or statement 11, wherein the second interface includes a remote direct memory access interface.

13. The system of any one of statements 10 to 12, wherein the second interface includes a computer cluster interconnect interface.

14. The system of any one of statements 10 to 13, wherein the computer cluster interconnect interface includes an Ethernet interface.

15. The system of any one of statements 10 to 14, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.

16. The system of any one of statements 10 to 15, wherein the CXL interface includes a cache coherence interface.

17. The system of any one of statements 10 to 16, wherein the first interface is configured to:

send data to the processing circuit in response to a load instruction executed by the processing circuit; and

receive data from the processing circuit in response to a store instruction executed by the processing circuit.

18. The system of any one of statements 10 to 17, further comprising a CXL root complex connected between the processing circuit and the first interface.

19. A method, comprising:

executing, by a central processing unit, a store instruction for storing a first value in a first memory location at a first address; and

sending, by an interface circuit, in response to the executing of the store instruction, a store command to a memory including the first memory location, the store command being a command to store the first value in the first memory location,

wherein the interface circuit has:

a first interface connected to the central processing unit; and

a second interface connected to the memory,

the first interface including a Compute Express Link (CXL) interface, and

the second interface being different from the first interface.

20. The method of statement 19, further comprising:

executing, by the central processing unit, a load instruction for loading a value in a second memory location at a second address into a register of the central processing unit; and

sending, by the interface circuit, in response to the executing of the load instruction, a read command to the memory, the read command being a command to read the value in the second memory location.

Although exemplary embodiments of an interface for remote memory have been specifically described and illustrated herein, many modifications and variations will be apparent to those skilled in the art. Accordingly, it is to be understood that an interface for remote memory constructed according to the principles of the present disclosure may be embodied other than as specifically described herein. The invention is also defined in the following claims, and equivalents thereof.

Claims (20)

1. A system, comprising:
an interface circuit having:
a first interface configured to be connected to a processing circuit; and
a second interface configured to be connected to a memory,
the first interface including a cache coherence interface, and
the second interface being different from the first interface.
2. The system of claim 1, further comprising a memory server connected to the second interface.
3. The system of claim 2, wherein the second interface comprises a remote direct memory access interface.
4. The system of claim 2, wherein the second interface comprises a computer cluster interconnect interface.
5. The system of claim 4, wherein the computer cluster interconnect interface comprises an Ethernet interface.
6. The system of claim 2, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
7. The system of claim 1, wherein the cache coherence interface comprises a Compute Express Link (CXL) interface.
8. The system of claim 1, wherein the first interface is configured to:
send data to the processing circuit in response to a load instruction executed by the processing circuit; and
receive data from the processing circuit in response to a store instruction executed by the processing circuit.
9. The system of claim 1, further comprising a Compute Express Link (CXL) root complex connected between the processing circuit and the first interface.
10. A system, comprising:
an interface circuit having:
a first interface configured to be connected to a processing circuit; and
a second interface configured to be connected to a memory,
the first interface including a Compute Express Link (CXL) interface, and
the second interface being different from the first interface.
11. The system of claim 10, further comprising a memory server connected to the second interface.
12. The system of claim 11, wherein the second interface comprises a remote direct memory access interface.
13. The system of claim 11, wherein the second interface comprises a computer cluster interconnect interface.
14. The system of claim 13, wherein the computer cluster interconnect interface comprises an Ethernet interface.
15. The system of claim 13, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
16. The system of claim 13, wherein the CXL interface comprises a cache coherence interface.
17. The system of claim 10, wherein the first interface is configured to:
send data to the processing circuit in response to a load instruction executed by the processing circuit; and
receive data from the processing circuit in response to a store instruction executed by the processing circuit.
18. The system of claim 10, further comprising a CXL root complex connected between the processing circuit and the first interface.
19. A method, comprising:
executing, by a central processing unit, a store instruction for storing a first value in a first memory location at a first address; and
sending, by an interface circuit, in response to the executing of the store instruction, a store command to a memory including the first memory location, the store command being a command to store the first value in the first memory location,
wherein the interface circuit has:
a first interface connected to the central processing unit; and
a second interface connected to the memory,
the first interface including a Compute Express Link (CXL) interface, and
the second interface being different from the first interface.
20. The method of claim 19, further comprising:
executing, by the central processing unit, a load instruction for loading a value in a second memory location at a second address into a register of the central processing unit; and
sending, by the interface circuit, in response to the executing of the load instruction, a read command to the memory, the read command being a command to read the value in the second memory location.
CN202310780755.7A 2022-09-21 2023-06-28 Interface for remote memory Pending CN117743253A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/408,725 2022-09-21
US18/054,492 2022-11-10
US18/054,492 US20240095171A1 (en) 2022-09-21 2022-11-10 Interface for remote memory

Publications (1)

Publication Number Publication Date
CN117743253A true CN117743253A (en) 2024-03-22

Family

ID=90281857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310780755.7A Pending CN117743253A (en) 2022-09-21 2023-06-28 Interface for remote memory

Country Status (1)

Country Link
CN (1) CN117743253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119402566A (en) * 2024-12-30 2025-02-07 苏州元脑智能科技有限公司 A memory management system, method, program product and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119402566A (en) * 2024-12-30 2025-02-07 苏州元脑智能科技有限公司 A memory management system, method, program product and storage medium
CN119402566B (en) * 2024-12-30 2025-04-29 苏州元脑智能科技有限公司 Memory management system, method, program product and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination