US20220091981A1 - Unified Memory Management for a Multiple Processor System - Google Patents
- Publication number
- US20220091981A1 (application Ser. No. 17/031,432)
- Authority
- US
- United States
- Prior art keywords
- memory
- transaction
- inter
- processor
- chip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/0284—Multiple user address space allocation, e.g. using different base addresses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0873—Mapping of cache memory to specific storage devices or parts thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F12/1045—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4027—Coupling between buses using bus bridges
- G06F13/404—Coupling between buses using bus bridges with address mapping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4265—Bus transfer protocol, e.g. handshake; Synchronisation on a point to point bus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/466—Transaction processing
- G06F9/467—Transactional memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1056—Simplification
Definitions
- Processing systems can use different types of transactions to move data.
- a first processing system may use memory-mapped transactions to move data to a specific memory address of a recipient processing system.
- a second processing system may use a stream transaction to send data to a recipient processing system that is then tasked with determining how to handle the received data.
- Using different forms of transactions within a system can be inefficient and can complicate memory management.
- a multi-processor unified memory management system may comprise a first programmable processor system that may communicate via an inter-chip link with a second programmable processor system.
- the first programmable processor system may comprise a first inter-chip memory management module that may be configured to analyze memory access transactions.
- the first inter-chip memory management module may be configured to translate outbound memory-mapped transactions into non-memory mapped transactions comprising coded memory address data.
- the first inter-chip memory management module may be configured to translate inbound non-memory mapped transactions into memory-mapped transactions based on coded memory address data.
- the system may comprise the second programmable processor system that may communicate via the inter-chip link with the first programmable processor system.
- the second programmable processor system may comprise a second inter-chip memory management module configured to analyze memory access transactions.
- the module may be configured to translate outbound memory-mapped transactions into non-memory mapped transactions comprising coded memory address data.
- the module may be configured to translate inbound non-memory mapped transactions into memory-mapped transactions based on coded memory address data.
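- As a hedged illustration of this translation (the wire format below is an assumption for illustration, not the patent's actual format), a memory-mapped transaction could be flattened into a non-memory mapped stream packet whose leading bytes carry the coded memory address, and recovered on the receiving side:

```python
import struct

# Assumed coded-address format (illustrative only): an 8-byte big-endian
# target address prepended to the data payload of the stream transaction.
ADDR_CODEC = struct.Struct(">Q")

def to_non_memory_mapped(address, payload):
    """Translate an outbound memory-mapped write into a stream packet
    carrying coded memory address data."""
    return ADDR_CODEC.pack(address) + payload

def to_memory_mapped(packet):
    """Translate an inbound stream packet back into an (address, payload)
    pair for a local memory-mapped transaction."""
    (address,) = ADDR_CODEC.unpack_from(packet)
    return address, packet[ADDR_CODEC.size:]
```

- The round trip preserves both the address and the payload, which is what lets the receiving module perform the original memory-mapped transaction locally.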
- the first inter-chip memory management module may be further configured to analyze a memory access transaction.
- the module may be further configured to determine that the memory access transaction may involve a memory address accessible via the second programmable processor system.
- the module may be further configured to, in response to determining that the memory access transaction involves the memory address accessible via the second programmable processor, output a non-memory mapped transaction via an inter-chip high speed link to the second programmable processor.
- the non-memory mapped transaction may comprise coded memory address data.
- the second inter-chip memory management module may be further configured to receive the non-memory mapped transaction from the first inter-chip memory management module.
- the second inter-chip memory management module may be further configured to store data from the received non-memory mapped transaction into a memory, such as random access memory (RAM) or a local buffer, at a memory location based on the coded memory address data included in the non-memory mapped transaction.
- the system may further comprise a first random access memory (RAM) or local buffer directly accessible by only the first programmable processor system.
- the system may further comprise a second RAM or local buffer directly accessible by only the second programmable processor.
- the first inter-chip memory management module and the second inter-chip memory management module may use a common flat memory map.
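- A common flat memory map shared by both modules can be sketched as a range table; the address ranges and processor names below are made-up assumptions for illustration:

```python
# Illustrative common flat memory map; both inter-chip memory management
# modules hold an identical copy. Ranges and owner names are assumptions.
FLAT_MEMORY_MAP = [
    # (start, end_exclusive, processor that directly accesses the range)
    (0x0000_0000, 0x4000_0000, "first"),
    (0x4000_0000, 0x8000_0000, "second"),
]

def managing_processor(address):
    """Return which processor's memory a given address belongs to."""
    for start, end, owner in FLAT_MEMORY_MAP:
        if start <= address < end:
            return owner
    raise ValueError(f"address {address:#010x} is not in the flat memory map")
```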
- the first programmable processor system may comprise a first field programmable gate array (FPGA).
- the second programmable processor system may comprise a second FPGA.
- the second programmable processor system may comprise a reduced instruction set computer (RISC) processor.
- the system may further comprise an inter-chip link between the first programmable processor system and the second programmable processor system.
- RISC: reduced instruction set computer
- a method for using a unified memory management system may comprise outputting, by a native processing module of a first processor, a memory transaction.
- the method may comprise determining, by a first inter-chip memory management module executed by the first processor, that the memory transaction may correspond to a portion of a flat memory map that is managed by another processor.
- the method may comprise translating, by the first inter-chip memory management module, the memory transaction into a non-memory mapped memory transaction.
- the method may comprise transmitting, by the first inter-chip memory management module, the translated memory transaction via an inter-chip link to a second inter-chip memory management module of a second processor.
- the method may comprise translating, by the second inter-chip memory management module, the translated memory transaction into a memory-mapped memory transaction.
- the method may comprise performing, by the second inter-chip memory management module, the memory-mapped memory transaction.
- the non-memory mapped memory transaction may be a stream-based memory transaction.
- the method may further comprise receiving, by the second inter-chip memory management module, the transmitted translated memory transaction via the inter-chip link.
- the method may further comprise determining, by the second inter-chip memory management module, that the transmitted translated memory transaction is not to be forwarded. Determining that the transmitted translated memory transaction is not to be forwarded may be based on the portion of the flat memory map being managed by the second inter-chip memory management module of the second processor.
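- The claimed method can be simulated end to end in a few lines. Everything in this sketch is an illustrative assumption: the packet format, the local address ranges, and the dicts standing in for each processor's RAM; the "link crossing" is just in-process data flow.

```python
import struct

ADDR = struct.Struct(">Q")  # assumed coded-address header format
LOCAL_RANGES = {"first": (0x0, 0x1000), "second": (0x1000, 0x2000)}
MEMORIES = {"first": {}, "second": {}}

def write(addr, data):
    """Native write on the first processor: performed locally if the
    address is in the first processor's portion of the flat memory map,
    otherwise translated into a stream transaction, carried across the
    (simulated) inter-chip link, translated back, and performed against
    the second processor's memory."""
    lo, hi = LOCAL_RANGES["first"]
    if lo <= addr < hi:
        MEMORIES["first"][addr] = data                # memory-mapped, local
        return "first"
    packet = ADDR.pack(addr) + data                   # translate to stream form
    # --- inter-chip link crossing (simulated) ---
    (rx_addr,) = ADDR.unpack_from(packet)             # translate back
    MEMORIES["second"][rx_addr] = packet[ADDR.size:]  # perform memory-mapped write
    return "second"
```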
- the first processor may be a first field-programmable gate array (FPGA) and the second processor may be a second FPGA.
- FPGA: field-programmable gate array
- Performing the memory-mapped memory transaction may comprise storing data from the memory transaction into random access memory (RAM) at a memory location based on coded memory address data included in the non-memory mapped transaction.
- a memory address corresponding to the flat memory map may be transmitted via a side-band communication as part of transmitting the translated memory transaction via the inter-chip link.
- Transmitting, by the first inter-chip memory management module, the translated memory transaction via the inter-chip link to the second inter-chip memory management module of the second processor may comprise forwarding, by a third inter-chip memory management module of a third processor, the translated memory transaction via the inter-chip link to the second inter-chip memory management module of the second processor.
- the first processor may be connected with the second processor only via the third processor.
- FIG. 1 illustrates an embodiment of a multi-processor unified memory management system.
- FIG. 2 illustrates another embodiment of a multi-processor unified memory management system.
- FIG. 3 illustrates an embodiment of a flat memory map created using a unified memory management system.
- FIG. 4 illustrates an embodiment of a method for using a unified memory management system.
- Embodiments detailed herein disclose the use of a memory management module (MMM) that can allow for a flat memory map to be used across a multiple processor system.
- MMM: memory management module
- the MMM can handle both memory-mapped and stream inter-chip transactions. Communication between processors may be performed using stream-based transactions across a high speed interface.
- the memory transaction may be routed to an MMM implemented as part of the processor.
- the MMM may have stored a flat memory map that defines how memory is assigned across multiple processors, including the first processor on which the MMM is implemented. MMMs on the other processors in communication with the first processor can store the same memory map.
- the MMM may determine whether a locally-accessible memory (e.g., random access memory, RAM, local buffer) is to be accessed or if a memory accessible via another processor is to be accessed. If the memory transaction involves the locally-accessible memory, the MMM may perform a memory-mapped transaction directly with the locally-accessible memory.
- the processor may determine the appropriate processor to transmit the memory transaction to and may translate the memory transaction into a stream-based memory transaction.
- This stream-based memory transaction can include coded memory address data.
- the stream-based memory transaction may then be sent via a high speed inter-chip link to the appropriate processor.
- An MMM of the processor that receives the stream-based memory transaction may decode the coded memory address data and store to the appropriate locally-accessible memory. While stream-based memory transactions are typically used for data processing-related memory transactions, all inter-chip memory transactions between MMMs of processors may be handled using stream-based memory transactions.
- Such an arrangement can have one or more distinct advantages.
- a single simple memory map can be implemented that is common across all processors.
- Each processor can be configured to access and use the entire memory, even though various portions of the memory are only directly accessible via a particular IC.
- the MMM of each processor can handle routing of memory transactions to the appropriate processor and can handle both memory-mapped and stream based memory transactions. Therefore, when inter-chip communication is necessary for a memory transaction, the MMMs handle conversion, if needed, of the memory transaction into a stream-based transaction that includes encoded memory address data and decoding of the stream-based transaction upon receipt.
- such an arrangement that uses MMMs can allow for priority-based routing among various memory mapped and non-memory-mapped transactions.
- Such an arrangement can allow for a particular quality of service (QoS) to be realized for particular processes that are dependent on memory transactions being performed within a certain amount of time. Based upon an indicated priority level, certain memory transactions can be performed out-of-turn from other memory transactions in an attempt to realize the QoS.
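- Priority-based servicing of mixed memory-mapped and stream transactions could be sketched with a simple priority queue; the numeric priority scale and the use of a heap are assumptions for illustration, not details from the patent.

```python
import heapq
import itertools

_order = itertools.count()   # FIFO tie-breaker for equal priorities
_pending = []                # heap of (priority, sequence, transaction)

def submit(transaction, priority):
    """Queue a transaction; lower numbers indicate higher priority."""
    heapq.heappush(_pending, (priority, next(_order), transaction))

def dequeue():
    """Serve the highest-priority transaction, possibly out-of-turn
    relative to submission order, to help meet a QoS deadline."""
    return heapq.heappop(_pending)[2]
```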
- QoS quality of service
- FIG. 1 illustrates an embodiment of a multi-processor unified memory management system 100 (“system 100 ”).
- System 100 can include: processor 110 ; processor 120 ; memory 130 ; memory 140 ; and inter-chip link 160 .
- Processor 110 and processor 120 may be various forms of processors on which customized modules can be implemented.
- processor 110 and processor 120 may be various types of FPGAs on which code can be implemented using programmable hardware.
- One or more of the FPGAs may be a multiple processor system on a chip (MPSoC).
- MPSoC: multiple processor system on a chip
- MMMs 150-1 and 150-2
- Each MMM may handle two primary tasks: 1) routing memory transactions appropriately; and 2) performing any conversion or translation needed to the memory transaction.
- Native processing 112 of processor 110 may be implemented as firmware based on code or as executed software written by a person or obtained from some other source.
- MMM 150 - 1 may be implemented as a code module that is similarly implemented as firmware (or software) on processor 110 . In other embodiments, MMMs may be implemented using hardware particularly designed for the purpose.
- Native processing 112 may generate either a memory-mapped or a stream-based memory transaction. Regardless of whether the memory transaction output by native processing 112 is memory-mapped or stream-based, the transaction can be routed to MMM 150 - 1 . (Therefore, MMM 150 - 1 handles all forms of memory transactions for native processing 112 .) MMM 150 - 1 may maintain a memory map that includes memory 130 of processor 110 and memory 140 of processor 120 .
- a first address range may be mapped to processor 110 while a second memory address range may be mapped to processor 120 . If a memory transaction received by MMM 150 - 1 corresponds to a memory address mapped to processor 110 , the memory transaction may be performed directly by MMM 150 - 1 with memory 130 . Memory transactions conducted by MMM 150 - 1 with memory 130 can be memory mapped transactions.
- a memory transaction received by MMM 150 - 1 from native processing 112 indicates a memory address mapped to processor 120
- the memory transaction may be transmitted by MMM 150 - 1 via inter-chip link 160 to MMM 150 - 2 .
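- MMM 150-1's decision can be sketched as a dispatch on the address range. The names `local_range`, `local_memory`, and `send_over_link` are illustrative stand-ins, not identifiers from the patent.

```python
def mmm_dispatch(address, data, local_range, local_memory, send_over_link):
    """Perform a write directly if the address falls in this processor's
    portion of the flat memory map; otherwise hand it to the inter-chip
    link for translation into a stream transaction."""
    low, high = local_range
    if low <= address < high:
        local_memory[address] = data   # direct memory-mapped transaction
        return "local"
    send_over_link(address, data)      # stream transaction toward the peer MMM
    return "forwarded"
```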
- Native processing 112 of processor 110 may be able to generate a transaction using a memory mapped protocol (e.g., AXI4-memory mapped transactions) and/or a transaction using a stream protocol (e.g., AXI4-stream protocol data transfer transactions).
- in a memory mapped protocol, all transactions involve the use of a target memory address.
- a stream transaction does not include a memory address associated with the transaction.
- a stream transaction (e.g., an AXI stream transaction) can allow for a unidirectional channel for data flow.
- a stream transaction may tend to provide better performance compared to a memory mapped transaction due to less overhead data being involved. Therefore, for communications between processors, AXI stream based transactions may be preferable.
- MMM 150 - 1 may serve to convert a transaction to a stream transaction prior to sending via inter-chip link 160 .
- the transaction may then be sent to MMM 150 - 2 .
- MMM 150 - 2 which has the same memory map as MMM 150 - 1 , may then perform the memory transaction using memory 140 .
- MMM 150 - 2 may translate the received stream protocol transaction into a memory-mapped transaction to perform the memory transaction with memory 140 . Encoded within the stream protocol transaction or sent via a sidelink transaction may be memory address information added by MMM 150 - 1 .
- MMM 150-2 may decode this memory address information and use it to create the memory mapped transaction.
- the memory address information may be sent by MMM 150 - 1 to MMM 150 - 2 using in-band signaling.
- In-band signaling can involve a data header being sent before or after the data payload on the inter-chip link as part of the stream protocol transaction.
- side-band signaling may be used.
- An inter-chip link protocol, such as Interlaken, can support built-in low bandwidth sideband bus communications. Such arrangements allow for higher speed data transmissions in-band and lower speed transmissions via a side-band.
- Side-band signaling can include a memory address and control messages being sent on a low-bandwidth (relative to the high-bandwidth link used to transmit the data payload), out-of-band inter-chip link. Therefore, using side-band signaling, the address and control information may be communicated at a different frequency than the data.
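- The difference between the two signaling styles can be sketched with two channel lists standing in for the physical links; the message shapes here are assumptions for illustration.

```python
def send_in_band(high_speed_link, address, payload):
    """In-band: the coded address header shares the high-speed channel
    with the data payload (sent before the payload in this sketch)."""
    high_speed_link.append(("header", address))
    high_speed_link.append(("data", payload))

def send_side_band(high_speed_link, side_band_link, address, target_id, payload):
    """Side-band: address and control travel on a separate low-bandwidth
    channel while the payload uses the full-rate channel."""
    side_band_link.append(("addr", address, "target", target_id))
    high_speed_link.append(("data", payload))
```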
- efficient side-band signaling may be used.
- the destination memory address is sent in-band; however, the identity (ID) of the target processor and routing metadata are sent through a low-bandwidth side-band link.
- ID: identity
- MMM 150 - 2 may function the same as MMM 150 - 1 . Therefore, MMMs 150 may each handle memory read and write transactions to a local memory and remote memory that are part of a common flat memory map, along with handling any protocol translations necessary between a memory-mapped protocol and a stream protocol.
- a common piece of code may be used to implement MMMs 150 .
- a difference between MMMs 150 may be which address range within a common flat memory map each MMM can access directly. From the point-of-view of native processing 112 and native processing 122 , the entire memory map can be treated the same. Each MMM of MMMs 150 properly routes, translates, and responds to the memory transactions as needed.
- FIG. 2 illustrates another embodiment of a multi-processor unified memory management system 200 (“system 200 ”).
- system 200 can include: FPGA 210 ; FPGA 220 ; FPGA 230 ; and MPSoC 242 .
- FPGAs 210 , 220 , and 230 can have various modules that are created as code and used to configure the FPGAs.
- FPGA 210 can include native processing 212 and MMM 270 - 1 .
- MMM 270 - 1 may communicate directly with memory 261 . Only FPGA 210 may be able to directly access memory 261 ; therefore, memory transactions that involve the portion of the system memory map corresponding to memory 261 may be required to be performed via MMM 270 - 1 .
- FPGA 220 can include native processing 222 , MMM 270 - 2 , and local buffer 226 .
- Native processing 222 and MMM 270 - 2 may function as detailed in relation to the native processing and MMMs of system 100 .
- MMM 270 - 2 may be configured to access an additional type of memory, such as local buffer 226 .
- Local buffer 226 can represent high speed memory that is on-board FPGA 220 .
- Local buffer 226 can be included as part of the system-wide common flat memory map and may be accessed via memory mapped transactions by MMM 270 - 2 . Therefore, a memory transaction conducted by any of FPGAs 210 , 220 , 230 , or MPSoC 242 may be routed to and handled by MMM 270 - 2 .
- FPGA 230 can include native processing 232 , MMM 270 - 3 , and data local area network (LAN) 234 .
- Native processing 232 and MMM 270 - 3 may function as detailed in relation to the native processing and MMMs of system 100 .
- MMM 270 - 3 may be additionally configured to communicate with data LAN 234 .
- Data LAN 234 may serve as an interface for input and output of user data, such as via one or more user interfaces. Data exchanged with data LAN 234 may be via a stream protocol; therefore, transactions conducted between MMM 270-3 and data LAN 234 may be converted to a stream protocol, if needed.
- MPSoC 242 includes multiple on-board processors.
- MPSoC 242 can include FPGA 240 and processing subsystem 250 .
- FPGA 240 may include native processing 246 and MMM 270 - 4 .
- Processing subsystem 250 may include one or more other types of processors, such as processor 252 .
- Processor 252 could be a RISC-based processor (e.g., from ARM).
- MPSoC 242 may have multiple dedicated memories.
- MMM 270 - 4 may control access to memory 264 and memory 265 .
- a memory mapped protocol may be used by MMM 270 - 4 for communication with memory 264 and memory 265 .
- MMM 270 - 4 may allow for processor 252 to perform a memory mapped transaction with FPGA 240 or any of FPGAs 210 , 220 , and 230 .
- MMM 270-4, similar to the other instances of MMMs 270, may translate a memory mapped protocol transaction into a stream protocol transaction.
- the memory address information included as part of the memory mapped protocol transaction may be embedded as part of the stream protocol transaction such that the memory address information can be extracted by the receiving MMM.
- FPGA 210 may communicate with FPGA 220 via inter-chip link 214 .
- FPGA 220 may communicate with FPGA 230 via inter-chip link 224 .
- FPGA 220 may communicate with FPGA 240 via inter-chip link 244 . It should be understood that this hub-and-spoke arrangement around FPGA 220 is merely an example. Additional or alternate inter-chip links may be present.
- FPGA 210 may have a second inter-chip link to, for example, FPGA 230 .
- Each MMM of MMMs 270 may only have data stored indicating to which processor a memory transaction should be forwarded.
- the flat memory map maintained by MMM 270 - 1 may indicate a first range of memory addresses that correspond to memory 261 . All other memory addresses may correspond to FPGA 220 and MMM 270 - 2 .
- MMM 270 - 2 may need to perform further forwarding, such as to FPGA 230 or FPGA 240 .
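- This hop-by-hop view can be sketched as per-MMM next-hop tables: each MMM only records whether a block is local or which neighbor to forward toward. The block and module names mirror the figures, but the table contents are illustrative assumptions.

```python
# Per-MMM next-hop table (illustrative). Each entry maps a memory address
# block to "LOCAL" or to the neighboring MMM toward which to forward;
# "default" covers all blocks not listed explicitly.
NEXT_HOP = {
    "mmm-270-1": {"block-301": "LOCAL", "default": "mmm-270-2"},
    "mmm-270-2": {"block-302": "LOCAL", "block-303": "mmm-270-3",
                  "default": "mmm-270-4"},
    "mmm-270-3": {"block-303": "LOCAL", "default": "mmm-270-2"},
}

def route(start_mmm, block):
    """Follow next-hop tables until the block is local; return the path."""
    hops = [start_mmm]
    table = NEXT_HOP[start_mmm]
    nxt = table.get(block, table["default"])
    while nxt != "LOCAL":
        hops.append(nxt)
        table = NEXT_HOP[nxt]
        nxt = table.get(block, table["default"])
    return hops
```

- Note that mmm-270-1 never needs the full route: its table only names mmm-270-2, which in turn knows the next hop toward block 303.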
- each MMM of MMM 270 can handle stream-based memory transactions (or another form of non-memory mapped memory transactions) and memory mapped memory transactions in immediate succession.
- native processing 212 may conduct a memory transaction with a particular memory address.
- the memory transaction may be sent to MMM 270 - 1 by native processing 212 .
- MMM 270 - 1 may determine that the memory transaction corresponds to a memory address in the flat memory map that corresponds to FPGA 220 .
- the memory transaction may be sent via a stream transaction to FPGA 220 and received by MMM 270 - 2 .
- MMM 270 - 2 may analyze the stream transaction to extract memory address information.
- MMM 270 - 2 may access the flat memory map and determine that the memory address corresponds to FPGA 230 and MMM 270 - 3 .
- a second memory transaction may be sent via a stream transaction to FPGA 230 by FPGA 220 and received by MMM 270 - 3 .
- MMM 270 - 3 may analyze the stream transaction to extract memory address information and may then conduct the memory transaction locally with memory 263 . Therefore, from the point-of-view of MMM 270 - 1 , the flat memory map indicates that the memory transaction should be sent to FPGA 220 .
- the flat memory map of MMM 270-2, which corresponds to the same addresses, indicates that the memory transaction is to be sent via inter-chip link 224 to FPGA 230.
- the memory map of MMM 270 - 3 indicates the memory transaction is to be handled directly with memory 263 .
- an advantage to at least some of the arrangements detailed herein is that the MMM transmitting the memory transaction has the memory address destination, but does not need all of the details of the route for the transaction to the memory address. Rather, the MMM transmitting the memory transaction determines the next MMM to which the memory transaction should be transmitted. This next MMM determines the next hop toward the memory address destination (if a next hop is needed).
- Such an arrangement can further allow for a stream transaction to be transmitted without the destination memory address being known. Rather, the stream memory transaction can be routed based on a destination processor identifier. A separate memory address space may be maintained that is mapped to only the processor identifier and the local MMM determines the specific memory addresses.
- FIG. 3 illustrates an embodiment of a flat memory map 300 created using a unified memory management system.
- Flat memory map 300 may be common across all processors of a unified memory management system, such as system 200 of FIG. 2 .
- Memory map 300 indicates five memory address blocks: memory address block 301; memory address block 302; memory address block 303; memory address block 304; and memory address block 305.
- the version of memory map 300 stored by each MMM can include the same data stored at the same memory addresses.
- memory address block 301 corresponds to a memory (e.g., DDR RAM, local buffer) in direct communication with FPGA 210
- memory address block 302 corresponds to a memory in direct communication with FPGA 220
- memory address block 303 corresponds to a memory in direct communication with FPGA 230
- memory address block 304 corresponds to a memory in communication with FPGA 240
- memory address block 305 corresponds to a memory in direct communication with processing subsystem 250 .
- While each memory map may correspond to the same data, the memory maps may differ in how various memory address blocks are mapped for access. From the perspective of MMM 270-1, memory transactions involving memory addresses within memory address block 301 may be directly handled; memory transactions involving memory addresses within memory address blocks 302-305 may be forwarded to FPGA 220 via inter-chip link 214.
- From the perspective of MMM 270-2, memory transactions involving memory addresses within memory address block 301 may be forwarded to FPGA 210 via inter-chip link 214; memory transactions involving memory addresses within memory address block 302 may be directly handled; memory transactions involving memory addresses within memory address block 303 may be forwarded to FPGA 230 via inter-chip link 224; and memory transactions involving memory addresses within memory address blocks 304 and 305 may be forwarded to FPGA 240 via inter-chip link 244. Therefore, while each processor may have access to the entire memory map, the routing of memory transactions within a system can be controlled by MMMs based on stored flat memory maps.
- FIG. 4 illustrates an embodiment of a method 400 for using a unified memory management system.
- Method 400 can involve the use of systems arranged similar to system 100 and system 200 .
- a memory transaction may be received from a local native processing component.
- Block 405 may be performed by an MMM being executed by the processing system that is performing native processing.
- the memory transaction may be a stream-based memory transaction (or another form of non-memory mapped memory transaction) or a memory mapped memory transaction.
- the MMM can be configured to handle both types of memory transactions in immediate succession. Therefore, the native processing process may transmit a memory request to the MMM.
- the native processing process may not have visibility as to whether the memory transaction involves local memory or data stored in memory of another processor.
- the MMM may determine if the memory transaction involves local or remote memory.
- the MMM may make the determination based on the memory address of the request. Since a single memory map is used across the entire multi-processor system, one or more ranges of memory addresses are mapped to the local memory.
- Block 415 is performed if block 410 was determined to involve a local memory transaction.
- the MMM may directly access the local memory and perform the memory transaction, such as writing to the memory address or reading from the memory address.
- Block 420 is performed if block 410 was determined to involve a remote memory transaction. That is, the memory transaction involves accessing memory that is only in direction communication with another processor of the system.
- the MMM may translate the memory transaction into a stream-based memory transaction that includes memory location data encoded as part of the transaction output by the MMM. If the processor is in communication with multiple processors, the appropriate processor to which the memory transaction is to be sent may be selected. The appropriate processor may be selected based on the memory address.
- the memory transaction output by the MMM may be forwarded via an inter-chip link 425 to another processor, which may have been selected as part of block 420 .
- the memory transaction is received by the other processor via the inter-chip link.
- the memory transaction may be analyzed by an MMM of the processor that received the memory transaction.
- the MMM may then determine at block 435 , based on the encoded memory location, whether the memory address is directly accessible by the processor that received the memory transaction or if the memory transaction needs to be forwarded again. If forwarded again, the transaction may be translated and forwarded at blocks 420 through 435 until the memory transaction arrives at the correct processor.
- processors may be chained together, and thus forwarding of a memory transaction may occur at most only a few times. However, in some implementations many more processors may be chained together and forwarding of the memory transaction may need to be performed many times.
- method 400 proceeds to block 440 .
- translation if needed, is performed on the received memory transaction.
- the memory transaction is performed.
- configurations may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.
- examples of the methods may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a non-transitory computer-readable medium such as a storage medium. Processors may perform the described tasks.
Description
- Processing systems can use different types of transactions to move data. A first processing system may use memory-mapped transactions to move data to a specific memory address of a recipient processing system. A second processing system may use a stream transaction to send data to a recipient processing system that is then tasked with determining how to handle the received data. Using different forms of transactions within a system can be inefficient and can complicate memory management.
- Various embodiments are described related to a multi-processor unified memory management system. In some embodiments, a multi-processor unified memory management system is described. The system may comprise a first programmable processor system that may communicate via an inter-chip link with a second programmable processor system. The first programmable processor system may comprise a first inter-chip memory management module that may be configured to analyze memory access transactions. The first inter-chip memory management module may be configured to translate outbound memory-mapped transactions into non-memory mapped transactions comprising coded memory address data. The first inter-chip memory management module may be configured to translate inbound non-memory mapped transactions into memory-mapped transactions based on coded memory address data. The system may comprise the second programmable processor system that may communicate via the inter-chip link with the first programmable processor system. The second programmable processor system may comprise a second inter-chip memory management module configured to analyze memory access transactions. The module may be configured to translate outbound memory-mapped transactions into non-memory mapped transactions comprising coded memory address data. The module may be configured to translate inbound non-memory mapped transactions into memory-mapped transactions based on coded memory address data.
- Embodiments of such a system may include one or more of the following features: the first inter-chip memory management module may be further configured to analyze a memory access transaction. The module may be further configured to determine that the memory access transaction may involve a memory address accessible via the second programmable processor system. The module may be further configured to, in response to determining that the memory access transaction involves the memory address accessible via the second programmable processor, output a non-memory mapped transaction via an inter-chip high speed link to the second programmable processor. The non-memory mapped transaction may comprise coded memory address data. The second inter-chip memory management module may be further configured to receive the non-memory mapped transaction from the first inter-chip memory management module. The second inter-chip memory management module may be further configured to store data from the received non-memory mapped transaction into a memory, such as random access memory (RAM) or a local buffer, at a memory location based on the coded memory address data included in the non-memory mapped transaction. The system may further comprise a first random access memory (RAM) or local buffer directly accessible by only the first programmable processor system. The system may further comprise a second RAM or local buffer directly accessible by only the second programmable processor. The first inter-chip memory management module and the second inter-chip memory management module may use a common flat memory map. The first programmable processor system may comprise a first field programmable gate array (FPGA). The second programmable processor system may comprise a second FPGA. The second programmable processor system may comprise a reduced instruction set computer (RISC) processor.
The system may further comprise an inter-chip link between the first programmable processor system and the second programmable processor system.
- In some embodiments, a method for using a unified memory management system is described. The method may comprise outputting, by a native processing module of a first processor, a memory transaction. The method may comprise determining, by a first inter-chip memory management module executed by the first processor, that the memory transaction may correspond to a portion of a flat memory map that is managed by another processor. The method may comprise translating, by the first inter-chip memory management module, the memory transaction into a non-memory mapped memory transaction. The method may comprise transmitting, by the first inter-chip memory management module, the translated memory transaction via an inter-chip link to a second inter-chip memory management module of a second processor. The method may comprise translating, by the second inter-chip memory management module, the translated memory transaction into a memory-mapped memory transaction. The method may comprise performing, by the second inter-chip memory management module, the memory-mapped memory transaction.
- Embodiments of such a method may include one or more of the following features: the non-memory mapped memory transaction may be a stream-based memory transaction. The method may further comprise receiving, by the second inter-chip memory management module, the transmitted translated memory transaction via the inter-chip link. The method may further comprise determining, by the second inter-chip memory management module, that the transmitted translated memory transaction need not be forwarded. Determining that the transmitted translated memory transaction need not be forwarded may be based on the portion of the flat memory map being managed by the second inter-chip memory management module of the second processor. The first processor may be a first field-programmable gate array (FPGA) and the second processor may be a second FPGA. Performing the memory-mapped memory transaction may comprise storing data from the memory transaction into random access memory (RAM) at a memory location based on coded memory address data included in the non-memory mapped transaction. A memory address corresponding to the flat memory map may be transmitted via a side-band communication as part of transmitting the translated memory transaction via the inter-chip link. Transmitting, by the first inter-chip memory management module, the translated memory transaction via the inter-chip link to the second inter-chip memory management module of the second processor may comprise forwarding, by a third inter-chip memory management module of a third processor, the translated memory transaction via the inter-chip link to the second inter-chip memory management module of the second processor. The first processor may be connected with the second processor only via the third processor.
- A further understanding of the nature and advantages of various embodiments may be realized by reference to the following figures. In the appended figures, similar components or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
-
FIG. 1 illustrates an embodiment of a multi-processor unified memory management system. -
FIG. 2 illustrates another embodiment of a multi-processor unified memory management system. -
FIG. 3 illustrates an embodiment of a flat memory map created using a unified memory management system. -
FIG. 4 illustrates an embodiment of a method for using a unified memory management system. - Embodiments detailed herein disclose the use of a memory management module (MMM) that can allow for a flat memory map to be used across a multiple processor system. The MMM can have the ability to handle both memory-mapped and stream inter-chip transactions. Communication between processors may be performed using stream-based transactions across a high-speed interface.
- When a memory transaction is to be performed by a programmable processor, such as a field programmable gate array (FPGA), the memory transaction may be routed to an MMM implemented as part of the processor. The MMM may have stored a flat memory map that defines how memory is assigned across multiple processors, including the first processor on which the MMM is implemented. MMMs on the other processors in communication with the first processor can store the same memory map. Based on the received memory transaction, the MMM may determine whether a locally-accessible memory (e.g., random access memory, RAM, local buffer) is to be accessed or if a memory accessible via another processor is to be accessed. If the memory transaction involves the locally-accessible memory, the MMM may perform a memory-mapped transaction directly with the locally-accessible memory. If the memory transaction involves a memory of another processor, the processor may determine the appropriate processor to transmit the memory transaction to and may translate the memory transaction into a stream-based memory transaction. This stream-based memory transaction can include coded memory address data. The stream-based memory transaction may then be sent via a high speed inter-chip link to the appropriate processor. An MMM of the processor that receives the stream-based memory transaction may decode the coded memory address data and store to the appropriate locally-accessible memory. While stream-based memory transactions are typically used for data processing-related memory transactions, all inter-chip memory transactions between MMMs of processors may be handled using stream-based memory transactions.
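The routing decision described above can be sketched as follows. This is an illustrative sketch only, assuming a simple write-only transaction model; names such as `MemoryMapManager` and the address ranges are hypothetical, not taken from the patent.

```python
LOCAL = "local"
REMOTE = "remote"

class MemoryMapManager:
    """Routes a memory-mapped write locally, or translates it into a
    stream-style transaction (with the address coded in) for the
    processor that owns the target range of the flat memory map."""

    def __init__(self, local_range, remote_ranges):
        # local_range: (start, end) directly accessible by this processor.
        # remote_ranges: {(start, end): processor_id} for all other processors.
        self.local_range = local_range
        self.remote_ranges = remote_ranges
        self.local_memory = {}

    def handle(self, address, data):
        start, end = self.local_range
        if start <= address < end:
            # Local: perform the memory-mapped write directly.
            self.local_memory[address] = data
            return (LOCAL, None)
        for (rstart, rend), proc in self.remote_ranges.items():
            if rstart <= address < rend:
                # Remote: encode the address into a stream transaction
                # destined for the processor owning that range.
                return (REMOTE, {"dest": proc,
                                 "coded_address": address,
                                 "payload": data})
        raise ValueError("address not in flat memory map")

mmm = MemoryMapManager((0x0000, 0x1000), {(0x1000, 0x2000): "fpga_220"})
print(mmm.handle(0x0010, b"aa"))   # handled locally as a memory-mapped write
print(mmm.handle(0x1800, b"bb"))   # translated for forwarding over the link
```

Both MMMs in a system would hold the same flat map; only which range each treats as local differs.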
- Such an arrangement can have one or more distinct advantages. For a multiple processor system, a single simple memory map can be implemented that is common across all processors. Each processor can be configured to access and use the entire memory, even though various portions of the memory are only directly accessible via a particular IC. The MMM of each processor can handle routing of memory transactions to the appropriate processor and can handle both memory-mapped and stream-based memory transactions. Therefore, when inter-chip communication is necessary for a memory transaction, the MMMs handle conversion, if needed, of the memory transaction into a stream-based transaction that includes encoded memory address data and decoding of the stream-based transaction upon receipt.
- Additionally or alternatively, such an arrangement that uses MMMs can allow for priority-based routing among various memory mapped and non-memory-mapped transactions. Such an arrangement can allow for a particular quality of service (QoS) to be realized for particular processes that are dependent on memory transactions being performed within a certain amount of time. Based upon an indicated priority level, certain memory transactions can be performed out-of-turn from other memory transactions in an attempt to realize the QoS.
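One way to realize the priority-based servicing described above is a simple priority queue in front of the inter-chip link. The sketch below is a hypothetical illustration; the numeric priority levels and the FIFO tie-breaking among equal priorities are assumptions, not specified by the patent.

```python
import heapq

class TransactionQueue:
    """Serves queued memory transactions in priority order, so a
    latency-critical transaction can be performed out-of-turn."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # preserves arrival (FIFO) order among equal priorities

    def submit(self, priority, txn):
        # Lower number = higher priority; served first.
        heapq.heappush(self._heap, (priority, self._seq, txn))
        self._seq += 1

    def next_transaction(self):
        return heapq.heappop(self._heap)[2]

q = TransactionQueue()
q.submit(2, "bulk write")
q.submit(0, "latency-critical read")
q.submit(2, "bulk read")
print(q.next_transaction())  # the high-priority read is served out-of-turn
```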
- Further details and benefits of these embodiments and other embodiments are provided in relation to the figures.
FIG. 1 illustrates an embodiment of a multi-processor unified memory management system 100 (“system 100”). System 100 can include: processor 110; processor 120; memory 130; memory 140; and inter-chip link 160. -
Processor 110 and processor 120 may be various forms of processors on which customized modules can be implemented. For instance, processor 110 and processor 120 may be various types of FPGAs on which code can be implemented using programmable hardware. For example, one or both of the FPGAs may be a multiple processor system on a chip (MPSoC). On each of processors 110 and 120, a separate instance of an MMM (150-1, 150-2) may be implemented. Each MMM may handle two primary tasks: 1) routing memory transactions appropriately; and 2) performing any conversion or translation needed to the memory transaction. -
Native processing 112 of processor 110 may be implemented as firmware based on code or as executed software written by a person or obtained from some other source. MMM 150-1 may be implemented as a code module that is similarly implemented as firmware (or software) on processor 110. In other embodiments, MMMs may be implemented using hardware particularly designed for the purpose. Native processing 112 may generate either a memory-mapped or a stream-based memory transaction. Regardless of whether the memory transaction output by native processing 112 is memory-mapped or stream-based, the transaction can be routed to MMM 150-1. (Therefore, MMM 150-1 handles all forms of memory transactions for native processing 112.) MMM 150-1 may maintain a memory map that includes memory 130 of processor 110 and memory 140 of processor 120. A first address range may be mapped to processor 110 while a second memory address range may be mapped to processor 120. If a memory transaction received by MMM 150-1 corresponds to a memory address mapped to processor 110, the memory transaction may be performed directly by MMM 150-1 with memory 130. Memory transactions conducted by MMM 150-1 with memory 130 can be memory-mapped transactions. - If a memory transaction received by MMM 150-1 from
native processing 112 indicates a memory address mapped to processor 120, the memory transaction may be transmitted by MMM 150-1 via inter-chip link 160 to MMM 150-2. Native processing 112 of processor 110 may be able to generate a transaction using a memory-mapped protocol (e.g., AXI4 memory-mapped transactions) and/or a transaction using a stream protocol (e.g., AXI4-Stream data transfer transactions). A memory-mapped transaction always involves the use of a target memory address. In contrast, a stream transaction does not include a memory address associated with the transaction. A stream transaction (e.g., an AXI stream transaction) can allow for a unidirectional channel for data flow. A stream transaction may tend to provide better performance compared to a memory-mapped transaction due to less overhead data being involved. Therefore, for communications between processors, AXI stream-based transactions may be preferable. - MMM 150-1 may serve to convert a transaction to a stream transaction prior to sending via
inter-chip link 160. The transaction may then be sent to MMM 150-2. MMM 150-2, which has the same memory map as MMM 150-1, may then perform the memory transaction using memory 140. MMM 150-2 may translate the received stream protocol transaction into a memory-mapped transaction to perform the memory transaction with memory 140. Encoded within the stream protocol transaction, or sent via a side-link transaction, may be memory address information added by MMM 150-1. MMM 150-2 may decode this memory address information and use it to create the memory-mapped transaction. - The memory address information may be sent by MMM 150-1 to MMM 150-2 using in-band signaling. In-band signaling can involve a data header being sent before or after the data payload on the inter-chip link as part of the stream protocol transaction. Alternatively, side-band signaling may be used. An inter-chip link protocol, such as Interlaken, can support built-in low-bandwidth side-band bus communications. Such arrangements allow for higher speed data transmissions in-band and lower speed transmissions via a side-band. Side-band signaling can include a memory address and control messages being sent on a low-bandwidth (relative to the high bandwidth used to transmit the data payload), out-of-band inter-chip link. Therefore, using side-band signaling, the memory address and control messages may be communicated at a different frequency than the data.
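The in-band signaling described above, where a data header carrying the destination address precedes the payload, can be sketched as follows. The 8-byte big-endian header layout is an assumed format for illustration only, not a format defined by the patent or by any stream protocol.

```python
import struct

HEADER = ">Q"  # assumed header: one 64-bit big-endian destination address

def encode_stream(address, payload):
    # Sending MMM: prepend the coded memory address to the payload.
    return struct.pack(HEADER, address) + payload

def decode_stream(frame):
    # Receiving MMM: strip the header to recover address and payload.
    size = struct.calcsize(HEADER)
    (address,) = struct.unpack(HEADER, frame[:size])
    return address, frame[size:]

frame = encode_stream(0x1800, b"payload")
addr, data = decode_stream(frame)
# The receiving MMM can now recreate the memory-mapped write at `addr`.
print(hex(addr), data)
```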
- As another alternative, efficient side-band signaling may be used. In efficient side-band signaling, the destination memory address is sent in-band; however, the identity (ID) of the target processor and the routing metadata are sent through a low-bandwidth side-band link. Such an arrangement allows the receiving MMM to avoid needing to decode or analyze the incoming data payload to obtain a memory address. Therefore, the in-band data payload and address can be encrypted when transmitted between MMMs, while the unencrypted metadata passed on the side-band link facilitates routing and handling of the data payload.
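A sketch of the efficient side-band split described above: the address travels in-band with the payload, while only the target-processor ID and routing metadata travel unencrypted on the side-band. The XOR "cipher" here is a placeholder standing in for real encryption, and all names and framing are illustrative assumptions.

```python
KEY = 0x5A  # placeholder key; a real design would use a proper cipher

def xor_bytes(data, key=KEY):
    return bytes(b ^ key for b in data)

def send(address, payload, target_id):
    # In-band: coded address + payload, opaque to intermediate hops.
    in_band = xor_bytes(address.to_bytes(8, "big") + payload)
    # Side-band: plaintext routing metadata only.
    side_band = {"target": target_id, "length": len(in_band)}
    return in_band, side_band

def route(in_band, side_band, my_id):
    # A forwarding MMM consults only the side-band metadata, never the payload.
    if side_band["target"] != my_id:
        return ("forward", None, None)
    decoded = xor_bytes(in_band)
    return ("deliver", int.from_bytes(decoded[:8], "big"), decoded[8:])

ib, sb = send(0x2000, b"data", "fpga_230")
print(route(ib, sb, "fpga_220"))  # intermediate hop forwards without decrypting
print(route(ib, sb, "fpga_230"))  # destination decodes address and payload
```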
- MMM 150-2 may function the same as MMM 150-1. Therefore, MMMs 150 may each handle memory read and write transactions to a local memory and remote memory that are part of a common flat memory map, along with handling any protocol translations necessary between a memory-mapped protocol and a stream protocol. A common piece of code may be used to implement MMMs 150. A difference between MMMs 150 may be which address range within a common flat memory map each MMM can access directly. From the point-of-view of
native processing 112 and native processing 122, the entire memory map can be treated the same. Each MMM of MMMs 150 properly routes, translates, and responds to the memory transactions as needed. -
FIG. 2 illustrates another embodiment of a multi-processor unified memory management system 200 (“system 200”). In system 200, a more complicated multi-chip architecture is present. It should be understood that the number and arrangement of processors is merely an example. System 200 can include: FPGA 210; FPGA 220; FPGA 230; and MPSoC 242. FPGAs 210, 220, and 230 can have various modules that are created as code and used to configure the FPGAs. FPGA 210 can include native processing 212 and MMM 270-1. MMM 270-1 may communicate directly with memory 261. Only FPGA 210 may be able to directly access memory 261; therefore, memory transactions that involve the portion of the system memory map corresponding to memory 261 may be required to be performed via MMM 270-1. -
FPGA 220 can include native processing 222, MMM 270-2, and local buffer 226. Native processing 222 and MMM 270-2 may function as detailed in relation to the native processing and MMMs of system 100. However, MMM 270-2 may be configured to access an additional type of memory, such as local buffer 226. Local buffer 226 can represent high speed memory that is on-board FPGA 220. Local buffer 226 can be included as part of the system-wide common flat memory map and may be accessed via memory-mapped transactions by MMM 270-2. Therefore, a memory transaction conducted by any of FPGAs 210, 220, 230, or MPSoC 242 may be routed to and handled by MMM 270-2. -
FPGA 230 can include native processing 232, MMM 270-3, and data local area network (LAN) 234. Native processing 232 and MMM 270-3 may function as detailed in relation to the native processing and MMMs of system 100. However, MMM 270-3 may be additionally configured to communicate with data LAN 234. Data LAN 234 may serve as an interface for input and output of user data, such as via one or more user interfaces. Data exchanged with data LAN 234 may be via a stream protocol; therefore, transactions conducted between MMM 270-3 and data LAN 234 may be converted to a stream protocol, if needed. -
MPSoC 242 includes multiple on-board processors. For example, MPSoC 242 can include FPGA 240 and processing subsystem 250. FPGA 240 may include native processing 246 and MMM 270-4. Processing subsystem 250 may include one or more other types of processors, such as processor 252. Processor 252 could be a RISC-based processor (e.g., from ARM). MPSoC 242 may have multiple dedicated memories. MMM 270-4 may control access to memory 264 and memory 265. A memory-mapped protocol may be used by MMM 270-4 for communication with memory 264 and memory 265. Further, MMM 270-4 may allow for processor 252 to perform a memory-mapped transaction with FPGA 240 or any of FPGAs 210, 220, and 230. MMM 270-4, similar to the other instances of MMMs 270, may translate a memory-mapped protocol transaction into a stream protocol transaction. When a memory-mapped protocol transaction is translated, the memory address information included as part of the memory-mapped protocol transaction may be embedded as part of the stream protocol transaction such that the memory address information can be extracted by the receiving MMM. - Multiple high-speed inter-chip links are present between FPGAs 210, 220, 230, and MPSoC 242. FPGA 210 may communicate with FPGA 220 via inter-chip link 214. FPGA 220 may communicate with FPGA 230 via inter-chip link 224. FPGA 220 may communicate with FPGA 240 via inter-chip link 244. It should be understood that this hub-and-spoke arrangement around FPGA 220 is merely an example. Additional or alternate inter-chip links may be present. For example, FPGA 210 may have a second inter-chip link to, for example, FPGA 230. - Each MMM of MMMs 270 may only have data stored indicating to which processor a memory transaction should be forwarded. For example, the flat memory map maintained by MMM 270-1 may indicate a first range of memory addresses that correspond to
memory 261. All other memory addresses may correspond to FPGA 220 and MMM 270-2. However, upon receipt of a memory transaction from MMM 270-1, MMM 270-2 may need to perform further forwarding, such as to FPGA 230 or FPGA 240. Further, each MMM of MMMs 270 can handle stream-based memory transactions (or another form of non-memory mapped memory transactions) and memory-mapped memory transactions in immediate succession. - As an example of such an arrangement,
native processing 212 may conduct a memory transaction with a particular memory address. The memory transaction may be sent to MMM 270-1 by native processing 212. MMM 270-1 may determine that the memory transaction corresponds to a memory address in the flat memory map that corresponds to FPGA 220. The memory transaction may be sent via a stream transaction to FPGA 220 and received by MMM 270-2. MMM 270-2 may analyze the stream transaction to extract memory address information. MMM 270-2 may access the flat memory map and determine that the memory address corresponds to FPGA 230 and MMM 270-3. A second memory transaction may be sent via a stream transaction to FPGA 230 by FPGA 220 and received by MMM 270-3. MMM 270-3 may analyze the stream transaction to extract memory address information and may then conduct the memory transaction locally with memory 263. Therefore, from the point-of-view of MMM 270-1, the flat memory map indicates that the memory transaction should be sent to FPGA 220. The flat memory map of MMM 270-2, which corresponds to the same addresses, indicates that the memory transaction is to be sent via inter-chip link 224 to FPGA 230. The memory map of MMM 270-3 indicates the memory transaction is to be handled directly with memory 263. - Therefore, an advantage to at least some of the arrangements detailed herein is that the MMM transmitting the memory transaction has the memory address destination, but does not need all of the details of the route for the transaction to the memory address. Rather, the MMM transmitting the memory transaction determines the next MMM to which the memory transaction should be transmitted. This next MMM determines the next hop toward the memory address destination (if a next hop is needed). Such an arrangement can further allow for a stream transaction to be transmitted without the destination memory address being known. Rather, the stream memory transaction can be routed based on a destination processor identifier.
A separate memory address space may be maintained that is mapped to only the processor identifier, and the local MMM determines the specific memory addresses.
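The hop-by-hop forwarding in the example above (MMM 270-1 to MMM 270-2 to MMM 270-3) can be sketched as follows, with each node knowing only its own address range and a next hop for everything else. The address ranges and topology values are illustrative assumptions, not values from the patent.

```python
class Node:
    def __init__(self, name, local_range, next_hop):
        self.name = name
        self.local_range = local_range  # (start, end) served locally
        self.next_hop = next_hop        # {(start, end): neighbor_name}

    def route(self, address):
        start, end = self.local_range
        if start <= address < end:
            return None  # handle locally; no further forwarding
        for (rstart, rend), neighbor in self.next_hop.items():
            if rstart <= address < rend:
                return neighbor
        raise ValueError("unmapped address")

nodes = {
    "mmm_270_1": Node("mmm_270_1", (0x0000, 0x1000),
                      {(0x1000, 0x5000): "mmm_270_2"}),
    "mmm_270_2": Node("mmm_270_2", (0x1000, 0x2000),
                      {(0x0000, 0x1000): "mmm_270_1",
                       (0x2000, 0x3000): "mmm_270_3",
                       (0x3000, 0x5000): "mmm_270_4"}),
    "mmm_270_3": Node("mmm_270_3", (0x2000, 0x3000), {}),
}

def deliver(start_node, address):
    # Follow next-hop decisions until a node claims the address locally.
    path = [start_node]
    nxt = nodes[start_node].route(address)
    while nxt is not None:
        path.append(nxt)
        nxt = nodes[nxt].route(address)
    return path

print(deliver("mmm_270_1", 0x2400))  # ['mmm_270_1', 'mmm_270_2', 'mmm_270_3']
```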
-
FIG. 3 illustrates an embodiment of a flat memory map 300 created using a unified memory management system. Flat memory map 300 may be common across all processors of a unified memory management system, such as system 200 of FIG. 2. Memory map 300 indicates five memory address blocks: memory address block 301; memory address block 302; memory address block 303; memory address block 304; and memory address block 305. The version of memory map 300 stored by each MMM can include the same data stored at the same memory addresses. - The example of
FIG. 3 corresponds to system 200. In this example, memory address block 301 corresponds to a memory (e.g., DDR RAM, local buffer) in direct communication with FPGA 210; memory address block 302 corresponds to a memory in direct communication with FPGA 220; memory address block 303 corresponds to a memory in direct communication with FPGA 230; memory address block 304 corresponds to a memory in direct communication with FPGA 240; and memory address block 305 corresponds to a memory in direct communication with processing subsystem 250. - While each memory map may correspond to the same data, each memory map may differ in how various memory address blocks are mapped for access. From the perspective of MMM 270-1, memory transactions involving memory addresses within
memory address block 301 may be directly handled; memory transactions involving memory addresses within memory address blocks 302-305 may be forwarded to FPGA 220 via inter-chip link 214. In contrast, from the perspective of MMM 270-2, memory transactions involving memory addresses within memory address block 301 may be forwarded to FPGA 210 via inter-chip link 214; memory transactions involving memory addresses within memory address block 302 may be directly handled; memory transactions involving memory addresses within memory address block 303 may be forwarded to FPGA 230 via inter-chip link 224; and memory transactions involving memory addresses within memory address blocks 304 and 305 may be forwarded to FPGA 240 via inter-chip link 244. Therefore, while each processor may have access to the entire memory map, the routing of memory transactions within a system can be controlled by the MMMs based on the stored flat memory maps. - Various methods may be performed using the systems and memory mapping arrangements detailed in relation to
FIGS. 1-3. FIG. 4 illustrates an embodiment of a method 400 for using a unified memory management system. Method 400 can involve the use of systems arranged similar to system 100 and system 200. - At
block 405, a memory transaction may be received from a local native processing component. Block 405 may be performed by an MMM being executed by the processing system that is performing native processing. The memory transaction may be a stream-based memory transaction (or another form of non-memory mapped memory transaction) or a memory-mapped memory transaction. The MMM can be configured to handle both types of memory transactions in immediate succession. Therefore, the native processing process may transmit a memory request to the MMM. The native processing process may not have visibility as to whether the memory transaction involves local memory or data stored in memory of another processor. - At
block 410, the MMM may determine if the memory transaction involves local or remote memory. The MMM may make the determination based on a memory address of the request. Since a single memory map is used across the entire multi-processor system, one or more ranges of memory addresses are mapped to the local memory. -
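The block 410 determination can be sketched as a range lookup against the flat memory map, here implemented with a binary search over the sorted block base addresses. The ranges and owner labels below are illustrative assumptions.

```python
import bisect

RANGE_STARTS = [0x0000, 0x1000, 0x2000]            # sorted block base addresses
RANGE_OWNERS = ["local", "fpga_220", "fpga_230"]   # owner of each block
MAP_END = 0x3000                                   # one past the last block

def classify(address):
    # Returns "local" for a locally handled address, or the owning
    # processor's identifier for a remote address.
    if not 0 <= address < MAP_END:
        raise ValueError("address outside flat memory map")
    i = bisect.bisect_right(RANGE_STARTS, address) - 1
    return RANGE_OWNERS[i]

print(classify(0x0800))  # falls in the local block, so handled at block 415
print(classify(0x1800))  # falls in a remote block, so translated at block 420
```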
Block 415 is performed if the memory transaction was determined at block 410 to involve local memory. At block 415, the MMM may directly access the local memory and perform the memory transaction, such as writing to the memory address or reading from the memory address. -
Block 420 is performed if the memory transaction was determined at block 410 to involve remote memory. That is, the memory transaction involves accessing memory that is only in direct communication with another processor of the system. At block 420, the MMM may translate the memory transaction into a stream-based memory transaction that includes memory location data encoded as part of the transaction output by the MMM. If the processor is in communication with multiple processors, the appropriate processor to which the memory transaction is to be sent may be selected. The appropriate processor may be selected based on the memory address. - At
block 425, the memory transaction output by the MMM may be forwarded via an inter-chip link to another processor, which may have been selected as part of block 420. At block 430, the memory transaction is received by the other processor via the inter-chip link. The memory transaction may be analyzed by an MMM of the processor that received the memory transaction. The MMM may then determine, at block 435, based on the encoded memory location, whether the memory address is directly accessible by the processor that received the memory transaction or if the memory transaction needs to be forwarded again. If forwarded again, the transaction may be translated and forwarded at blocks 420 through 435 until the memory transaction arrives at the correct processor. In most implementations, no more than four or five processors may be chained together, and thus forwarding of a memory transaction may occur at most only a few times. However, in some implementations, many more processors may be chained together and forwarding of the memory transaction may need to be performed many times. - If at
block 435 it is determined that the memory transaction does not need to be forwarded since the memory address corresponds to an address of the memory map for memory that is directly accessed by the processor that received the request,method 400 proceeds to block 440. Atblock 440, translation, if needed, is performed on the received memory transaction. Atblock 445, the memory transaction is performed. - The methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For instance, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Also, features described with respect to certain configurations may be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. Also, technology evolves and, thus, many of the elements are examples and do not limit the scope of the disclosure or claims.
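The remote path of blocks 420 through 445 can be pulled together in a small end-to-end sketch: each MMM either services the transaction against its own slice of the unified map, or forwards it over the next inter-chip link until the owning processor is reached. The `Processor` class, the three-processor chain, and the 16-byte slices are illustrative assumptions, not the disclosed implementation:

```python
class Processor:
    """One processor's MMM view: a local memory slice plus an inter-chip link."""

    def __init__(self, pid: int, local_start: int, local_end: int):
        self.pid = pid
        self.local_start, self.local_end = local_start, local_end
        self.memory = bytearray(local_end - local_start)
        self.next_hop = None  # peer processor reached over the inter-chip link

    def handle(self, addr: int, value=None, hops: int = 0):
        # Block 435: is the encoded address directly accessible here?
        if self.local_start <= addr < self.local_end:
            offset = addr - self.local_start      # block 440: translation
            if value is None:
                return self.memory[offset], hops  # block 445: perform a read
            self.memory[offset] = value           # block 445: perform a write
            return None, hops
        if self.next_hop is None:
            raise ValueError(f"no route for address {addr:#x}")
        # Blocks 420-430: translate and forward over the inter-chip link.
        return self.next_hop.handle(addr, value, hops + 1)


# A chain of three processors, each owning a 16-byte slice of the map.
p0 = Processor(0, 0, 16)
p1 = Processor(1, 16, 32)
p2 = Processor(2, 32, 48)
p0.next_hop, p1.next_hop = p1, p2

p0.handle(40, value=7)       # owned by p2: forwarded twice before the write
value, hops = p0.handle(40)  # the read takes the same two-hop path
```

With four or five processors chained, as the description suggests is typical, `hops` stays small; longer chains simply repeat the forward step more times.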
- Specific details are given in the description to provide a thorough understanding of example configurations (including implementations). However, configurations may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configurations of the claims. Rather, the preceding description of the configurations will provide those skilled in the art with an enabling description for implementing described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
- Also, configurations may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure. Furthermore, examples of the methods may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a non-transitory computer-readable medium such as a storage medium. Processors may perform the described tasks.
- Having described several example configurations, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may be components of a larger system, wherein other rules may take precedence over or otherwise modify the application of the invention. Also, a number of steps may be undertaken before, during, or after the above elements are considered.
Claims (20)
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/031,432 US11281583B1 (en) | 2020-09-24 | 2020-09-24 | Unified memory management for a multiple processor system |
| EP21791200.5A EP4217874B1 (en) | 2020-09-24 | 2021-09-23 | Unified memory management for a multiple processor system |
| PCT/US2021/051669 WO2022066850A1 (en) | 2020-09-24 | 2021-09-23 | Unified memory management for a multiple processor system |
| CA3193617A CA3193617A1 (en) | 2020-09-24 | 2021-09-23 | Unified memory management for a multiple processor system |
| BR112023005282A BR112023005282A2 (en) | 2020-09-24 | 2021-09-23 | UNIFIED MEMORY MANAGEMENT FOR MULTI-PROCESSOR SYSTEM |
| US17/587,102 US11636036B2 (en) | 2020-09-24 | 2022-01-28 | Unified memory management for a multiple processor system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/031,432 US11281583B1 (en) | 2020-09-24 | 2020-09-24 | Unified memory management for a multiple processor system |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/587,102 Continuation US11636036B2 (en) | 2020-09-24 | 2022-01-28 | Unified memory management for a multiple processor system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US11281583B1 (en) | 2022-03-22 |
| US20220091981A1 (en) | 2022-03-24 |
Family
ID=78135218
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/031,432 Active US11281583B1 (en) | 2020-09-24 | 2020-09-24 | Unified memory management for a multiple processor system |
| US17/587,102 Active US11636036B2 (en) | 2020-09-24 | 2022-01-28 | Unified memory management for a multiple processor system |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/587,102 Active US11636036B2 (en) | 2020-09-24 | 2022-01-28 | Unified memory management for a multiple processor system |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11281583B1 (en) |
| EP (1) | EP4217874B1 (en) |
| BR (1) | BR112023005282A2 (en) |
| CA (1) | CA3193617A1 (en) |
| WO (1) | WO2022066850A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040260889A1 (en) * | 2003-04-11 | 2004-12-23 | Sun Microsystems, Inc. | Multi-node system with response information in memory |
| US20180300931A1 (en) * | 2017-04-17 | 2018-10-18 | Intel Corporation | Scatter gather engine |
| US20190121737A1 (en) * | 2016-04-25 | 2019-04-25 | Netlist, Inc. | Method and apparatus for uniform memory access in a storage cluster |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9940287B2 (en) | 2015-03-27 | 2018-04-10 | Intel Corporation | Pooled memory address translation |
| US11126572B2 (en) | 2018-02-13 | 2021-09-21 | Intel Corporation | Methods and systems for streaming data packets on peripheral component interconnect (PCI) and on-chip bus interconnects |
| KR102879034B1 (en) | 2019-03-11 | 2025-10-29 | 삼성전자주식회사 | Memory Device performing calculation process and Operation Method thereof |
2020
- 2020-09-24 US US17/031,432 patent/US11281583B1/en active Active
2021
- 2021-09-23 BR BR112023005282A patent/BR112023005282A2/en unknown
- 2021-09-23 WO PCT/US2021/051669 patent/WO2022066850A1/en not_active Ceased
- 2021-09-23 CA CA3193617A patent/CA3193617A1/en active Pending
- 2021-09-23 EP EP21791200.5A patent/EP4217874B1/en active Active
2022
- 2022-01-28 US US17/587,102 patent/US11636036B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| US11281583B1 (en) | 2022-03-22 |
| CA3193617A1 (en) | 2022-03-31 |
| US11636036B2 (en) | 2023-04-25 |
| US20220222179A1 (en) | 2022-07-14 |
| EP4217874B1 (en) | 2025-01-29 |
| EP4217874A1 (en) | 2023-08-02 |
| WO2022066850A1 (en) | 2022-03-31 |
| BR112023005282A2 (en) | 2023-04-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109992405B (en) | A method and network card for processing data message | |
| US7609718B2 (en) | Packet data service over hyper transport link(s) | |
| US6757768B1 (en) | Apparatus and technique for maintaining order among requests issued over an external bus of an intermediate network node | |
| US6832279B1 (en) | Apparatus and technique for maintaining order among requests directed to a same address on an external bus of an intermediate network node | |
| US11095626B2 (en) | Secure in-line received network packet processing | |
| US8401000B2 (en) | Method of processing data packets | |
| US9219695B2 (en) | Switch, information processing apparatus, and communication control method | |
| US7596148B2 (en) | Receiving data from virtual channels | |
| US20230185745A1 (en) | Data flow control module for autonomous flow control of multiple dma engines | |
| US9015380B2 (en) | Exchanging message data in a distributed computer system | |
| US11010165B2 (en) | Buffer allocation with memory-based configuration | |
| CN111490946B (en) | FPGA connection realization method and device based on OpenCL framework | |
| US11636036B2 (en) | Unified memory management for a multiple processor system | |
| CN110958216B (en) | Secure online network packet transmission | |
| US6766423B2 (en) | Message-based memory system for DSP storage expansion | |
| US7237044B2 (en) | Information processing terminal and transfer processing apparatus | |
| US20220083485A1 (en) | Data frame interface network device | |
| US8429240B2 (en) | Data transfer device and data transfer system | |
| US20220131837A1 (en) | Secure element and method | |
| CN116303195A (en) | PCIE communication | |
| US20190286575A1 (en) | Network interface device, information processing device having plural nodes including network interface device, and method for transmitting transmission data between nodes of information processing device | |
| CN107249008A (en) | Passage interconnect device and method that a kind of remote data is directly accessed | |
| HK40026403A (en) | Secure in-line network packet transmittal | |
| HK40026403B (en) | Secure in-line network packet transmittal | |
| HK40017505B (en) | Secure in-line received network packet processing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | AS | Assignment | Owner name: U.S. BANK NATIONAL ASSOCIATION, MINNESOTA. Free format text: SECURITY INTEREST;ASSIGNOR:HUGHES NETWORK SYSTEMS, LLC;REEL/FRAME:059987/0168. Effective date: 20220504 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |