US20080270713A1 - Method and System for Achieving Varying Manners of Memory Access - Google Patents
- Publication number: US20080270713A1 (application Ser. No. 11/741,933)
- Authority: US (United States)
- Prior art keywords: cell, memory, computer system, configuration, core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0646—Configuration or reconfiguration
Definitions
- the present invention relates to computer systems and, more particularly, relates to systems and methods within computer systems that govern the accessing of memory.
- each block of memory is controlled by a respective memory controller that is capable of communicating with multiple processing units of the computer system.
- Some computer systems employ sockets that each have multiple processing units and, in addition, also typically each have their own respective memory controllers that manage blocks of memory capable of being accessed by one or more of the processing units of the respective sockets.
- processing units located on a given socket may be able to access memory blocks controlled by memory controllers located on other sockets.
- Such operation, in which one socket directly accesses the memory resources of another socket, is commonly referred to as “memory interleaving”, and systems employing such interleaving capability are commonly referred to as non-uniform memory access (NUMA) systems.
- Memory interleaving as described above is typically restricted to small numbers of sockets, for example, four or fewer sockets.
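The interleaving described above can be sketched as follows. This is an illustrative model, not taken from the patent: a classic NUMA interleave maps successive cache-line-sized blocks of the physical address space across sockets in round-robin fashion, so that consecutive lines land on consecutive sockets' memory controllers. The line size and function name are assumptions.

```python
# Illustrative sketch (not from the patent): round-robin cache-line
# interleaving of physical addresses across sockets.

LINE_SIZE = 64  # bytes per cache line (assumed)

def interleave_target(addr: int, num_sockets: int) -> int:
    """Return the socket whose memory controller owns this address."""
    return (addr // LINE_SIZE) % num_sockets

# With 4 sockets, consecutive cache lines rotate across sockets:
# addresses 0, 64, 128 map to sockets 0, 1, 2.
```

Because consecutive lines are spread across all sockets, sequential accesses from any core exercise every socket's memory controller roughly equally, which is precisely why the effectiveness of a given interleave depends on the socket count.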
- the memory controllers of the sockets cannot be directly connected to the processing units of other sockets but rather typically need to be connected by way of processor agents.
- the use of processor agents tends to be complicated and inefficient, both in terms of the operation of the processor agents themselves and in terms of the extra burdens placed upon the operating system and applications running on such systems.
- the operating system and applications must be capable of adapting to changes in the memory architecture to avoid inefficient operation, something that is often difficult to achieve.
- it is often desirable that computer systems be scalable and otherwise adjustable in terms of their sockets (e.g., in terms of processing power and memory). For example, it may in one circumstance be desirable that a computer system utilize only a small number of sockets, but in another circumstance become desirable or necessary that the computer system be modified to utilize a larger number of sockets. As the computer system is modified to include larger or smaller numbers of sockets, a given manner of interleaving suited for either smaller or larger numbers of sockets may become more or less effective.
- the computer system's memory access performance may vary significantly as the computer system is modified between utilizing four or less sockets and greater than four sockets.
- the present invention relates to a method of operating a computer system.
- the method includes operating a first cell of the computer system in accordance with a first memory access configuration, and migrating a first attribute of a first core of the first cell to a second cell of the computer system.
- the method additionally includes configuring a first portion of the first cell so that the first cell is capable of operating in accordance with a second memory access configuration, and migrating at least one of the first attribute and a second attribute from the second cell back to the first core of the first cell, whereby subsequently the first cell operates in accordance with the second memory access configuration.
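The claimed sequence above can be sketched in code. This is a minimal model with an entirely invented state representation (dicts, key names, and the meaning of "attribute" as execution state are all assumptions): the core's attribute migrates to the second cell, the first cell is reconfigured, and the attribute migrates back so the core resumes under the new configuration.

```python
# Hedged sketch of the claimed transition; all names and the dict-based
# state model are invented for illustration.

def transition(first_cell, second_cell, core_id, new_config):
    attr = first_cell["cores"].pop(core_id)       # migrate attribute out
    second_cell["hosted"][core_id] = attr         # second cell hosts it
    first_cell["config"] = new_config             # reconfigure first cell
    # migrate the attribute back; the core now runs under new_config
    first_cell["cores"][core_id] = second_cell["hosted"].pop(core_id)
    return first_cell

# Example: switch cell 0 from direct access to agent access.
cell0 = {"config": "direct", "cores": {0: "core-state"}, "hosted": {}}
cell1 = {"config": "direct", "cores": {}, "hosted": {}}
transition(cell0, cell1, 0, "agent")
```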
- the present invention relates to a method of operating a computer system.
- the method includes, at a first time, operating the computer system in accordance with an agent access memory configuration in which a first core of a first cell of the computer system communicates with a first memory controller of a second cell of the computer system by way of at least one processor agent and a fabric.
- the method additionally includes, at a second time that either precedes or follows the first time, operating the computer system in accordance with a direct access memory configuration in which the first core of the first cell of the computer system communicates with a second memory controller of the first cell, neither by way of the at least one processor agent nor by way of the fabric.
- the method further includes performing a transitioning procedure by which the computer system switches between the operating of the computer system in accordance with the agent access memory configuration and the operating of the computer system in accordance with the direct access memory configuration.
- the present invention relates to a computer system.
- the computer system includes a first core on a first cell, and first and second memory controllers governing first and second memory segments, respectively, where the first memory controller is also implemented as part of the first cell.
- the computer system further includes a fabric, and at least one processor agent coupled at least indirectly to the first core, at least indirectly to the second memory controller, and to the fabric.
- the first core communicates with the second memory controller by way of the at least one processor agent and the fabric.
- the computer system operates in accordance with a direct access memory configuration
- the first core communicates with the first memory controller independently of the at least one processor agent and the fabric.
- FIG. 1 shows in schematic form components of an exemplary computer system divided into multiple partitions that are linked by a fabric, where each partition includes memory blocks and multiple sockets with multiple processing units and memory controllers, and where the multiple sockets are capable of sharing and accessing, via communication links, the various memory blocks, in accordance with one embodiment of the present invention
- FIG. 2 is a flow chart showing exemplary steps of operation, which in particular relate to dynamically converting a direct access memory configuration of the computer system of FIG. 1 to an agent access memory configuration in accordance with one embodiment of the present invention
- FIG. 3 is a flow chart showing exemplary steps of operation, which in particular relate to dynamically converting an agent access memory configuration of the computer system of FIG. 1 to a direct access memory configuration in accordance with one embodiment of the present invention.
- the computer system 2 in the present embodiment in particular includes two partitions, namely a first cell 4 and a second cell 6 , together with a fabric 8 to facilitate communication between those two cells.
- the two cells 4 , 6 can be understood to be formed on two separate printed circuit boards that can be plugged into, and connected by, a backplane (on which is formed or to which is coupled the fabric 8 ).
- the computer system 2 of the present embodiment includes only the first and second cells 4 and 6 , it is nevertheless intended to be representative of a wide variety of computer systems having an arbitrary number of cells and/or circuit boards. For example, in other embodiments, only a single cell or more than two cells are present.
- the computer system 2 can be an sx1000 super scalable processor chipset available from the Hewlett-Packard Company of Palo Alto, Calif., on which are deployed hard partitions formed by the cells 4 , 6 (also known as “nPars”).
- Hard partitions formed by the cells 4 , 6 allow the resources of a single server to be divided among many enterprise workloads and to provide different operating environments (e.g., HP-UX, Linux, Microsoft Windows Server 2003, OpenVMS) simultaneously.
- Such hard partitions also allow computer resources to be dynamically reallocated.
- the computer system 2 can be the super scalable processor chipset mentioned above, it need not be such a chipset and instead in other embodiments can also take a variety of other forms.
- Each of the cells 4 , 6 is capable of supporting a wide variety of hardware and software components. More particularly as shown, each of the cells 4 , 6 includes a respective pair of sockets, namely, sockets 10 and 12 on the first cell 4 and sockets 14 and 16 on the second cell 6 . Additionally, main memory of the cells 4 , 6 is divided into multiple memory segments including memory segments or blocks 26 , 28 , 30 on the first cell 4 and memory segments or blocks 32 , 34 , 36 on the second cell 6 . Additionally, each of the cells 4 , 6 includes a respective pair of processor agents (PAs), namely, PAs 18 and 20 on the first cell 4 and PAs 22 and 24 on the second cell 6 .
- one or both of the cells 4 , 6 can also include other components not shown, for example, input/output systems, and power management controllers.
- the computer system 2 is capable of supporting different types of memory access configurations for accessing the various multiple memory segments 26 - 36 .
- sockets 10 - 16 serve as a platform for supporting multiple hardware components.
- These hardware components include respective sets of cores or processing units 38 , 40 , 42 , 44 on each respective socket, respective pairs of memory controllers (MCs) 88 and 90 , 92 and 94 , 96 and 98 , and 100 and 102 on each respective socket, and respective switches 80 , 82 , 84 and 86 on each respective socket.
- the socket 10 includes four cores 46 , 48 , 50 , 52
- the socket 12 includes four cores 54 , 56 , 58 , 60
- the socket 14 includes four cores 62 , 64 , 66 , 68
- the socket 16 includes four cores 70 , 72 , 74 , 76 .
- each of the sockets 10 , 12 , 14 and 16 has four cores
- the present invention is intended to encompass a variety of other embodiments of sockets having other numbers of cores, such as sockets having less than four cores (or even only a single core) or more than four cores.
- the switches 80 - 86 on each socket are crossbars capable of routing communications to and from the other components located on that socket. More particularly, the switch 80 allows for the routing of communications from and to any of the cores 46 - 52 and MCs 88 , 90 on the socket 10 , the switch 82 allows for the routing of communications from and to any of the cores 54 - 60 and MCs 92 , 94 on the socket 12 , the switch 84 allows for the routing of communications from and to any of the cores 62 - 68 and MCs 96 , 98 on the socket 14 , and the switch 86 allows for the routing of communications from and to any of the cores 70 - 76 and MCs 100 , 102 on the socket 16 .
- each of the switches 80 - 86 also allows for the routing of communications to and from the respective socket 10 - 16 , on which the respective switch is mounted, from and to the respective pairs of PAs 18 , 20 or 22 , 24 of the cells 4 or 6 , respectively, on which the switch is mounted. That is, each of the switches 80 , 82 is capable of directly communicating with each of the PAs 18 , 20 as shown by dashed paths 81 , 85 , 87 and 89 , while each of the switches 84 , 86 is capable of directly communicating with each of the PAs 22 , 24 as shown by dashed paths 91 , 93 , 95 and 99 . Further, the switches located on a cell are capable of communicating with each other as well. For example, the switches 80 , 82 can communicate with each other as shown by a dashed path 83 and the switches 84 , 86 can also communicate with each other as shown by a dashed path 97 .
- the cores 46 - 76 of the sets of cores 38 - 44 located on the sockets 10 - 16 respectively are chips that are coupled to their respective sockets by way of electrical connectors, and are intended to be representative of a wide variety of central processing units.
- the cores 46 - 76 are Itanium processing units as are available from the Intel Corporation of Santa Clara, Calif.
- one or more of the cores 38 - 44 can take other forms including, for example, Xeon, Celeron and Sempron.
- one or more of the cores can be another type of processing unit other than those mentioned above. Different cores on a given socket, on different sockets, and/or on different cells need not be the same but rather can differ from one another in terms of their types, models, or functional characteristics.
- one or more of the sockets 10 - 16 can include components other than or in addition to those mentioned above. Also, notwithstanding the fact that the present embodiment has two sockets on each of the first and second cells 4 and 6 respectively, one or more cells in other embodiments can have a single socket or possibly more than two. In many embodiments, the number of sockets will exceed (possibly even greatly exceed) the number shown in FIG. 1 ; for example, in some embodiments there could be as many as 64 sockets.
- the present embodiment, with its limited numbers of cells 4 , 6 and sockets 10 , 12 , 14 and 16 is provided as an exemplary embodiment of the present invention due to the ease with which it can be described and illustrated.
- each of the cores of the sets of cores 38 - 44 in the present embodiment includes a variety of hardware and software components capable of supporting a wide variety of applications, as well as tasks relating to the management of the various hardware and software components present on the cores as adapted in accordance with various embodiments of the present invention. More particularly, each of the cores includes a cache memory (not shown), which is smaller and faster in operation than the memory segments 26 - 36 of main memory discussed above, and which is capable of storing blocks of frequently used data accessed from the main memory in order to reduce the impact of memory latency that occurs when accessing the main memory (discussed in more detail below with regard to FIG. 2 ). Notwithstanding the presence of the cache memories, one or more of the memory segments 26 - 36 of the main memory still need to be accessed on a regular basis, for example, upon failures to locate requested information in the cache memories.
- each of the cores 46 - 76 has a respective logic block referred to as a Source Address Decoder (SAD) 78 .
- the SADs 78 can be implemented as hardware components and/or can reside as software.
- each of the SADs 78 is pre-programmed to direct memory request signals to the MCs 88 - 102 either indirectly via the PAs 18 - 24 or directly (not via the PAs) depending upon a memory configuration of the computer system 2 , as discussed in more detail below.
- signals returning from the MCs 88 - 102 are processed in the SADs 78 for receipt by the cores 46 - 76 .
- the SADs 78 associated with any of the cores 46 - 60 of the first cell 4 will only send requests to the PAs 18 , 20 or the MCs 88 - 94 of that cell, while the SADs 78 associated with any of the cores 62 - 76 of the second cell 6 will only send requests to the PAs 22 , 24 or the MCs 96 - 102 of that cell.
- the SADs 78 process signals arising from the cores 46 - 76 and determine how to route at least some of those signals based upon the memory configuration used for accessing the various memory segments 26 - 36 .
- signals arising from the cores 46 - 76 and intended for the MCs 88 - 102 are routed indirectly not only via appropriate ones of the switches 80 - 86 but also via appropriate ones of the PAs 18 - 24 and the fabric 8 .
- all (or substantially all) such signals between cores and MCs/memory segments are routed in such indirect manner regardless of the relative proximity of the cores and MCs/memory segments.
- the routing of signals depends upon the relative locations of the cores 46 - 76 and the MCs 88 - 102 (and associated memory segments).
- signals from respective ones of the cores 46 - 76 that are directed to respective ones of the MCs 88 - 102 that are located on the same respective cells are routed directly to those respective MCs via the respective switches 80 - 86 of the respective sockets of the respective cells, without being routed to any of the PAs 18 - 24 or routed via the fabric.
- Such memory accesses where the requesting core and the intended MC (and desired memory segment) are positioned on the same cell can be referred to as local memory accesses.
- signals from respective ones of the cores 46 - 76 that are directed to respective ones of the MCs 88 - 102 that are located on different cells are routed indirectly via appropriate ones of the switches 80 - 86 and appropriate ones of the PAs 18 - 24 through the fabric 8 , in the same manner as signals are routed according to the agent access memory configuration.
- Such memory accesses where the requesting core and the intended MC (and desired memory segment) are positioned on different cells can be referred to as remote memory accesses.
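The routing rules described above can be condensed into a small decision function. This is a hedged model, with the configuration names and return strings invented for illustration: under the agent access configuration every memory request goes through a processor agent and the fabric, while under the direct access configuration only remote requests (those targeting an MC on a different cell) do.

```python
# Illustrative model of the routing decision the patent attributes to
# each core's Source Address Decoder (SAD); names are stand-ins.

def sad_route(config: str, core_cell: int, target_mc_cell: int) -> str:
    if config == "agent":
        # agent access: always via a PA and the fabric,
        # regardless of where the target MC is located
        return "via_pa_and_fabric"
    if config == "direct":
        if core_cell == target_mc_cell:
            return "direct_via_switches"   # local memory access
        return "via_pa_and_fabric"         # remote memory access
    raise ValueError("unknown memory access configuration")
```

The asymmetry is the key point: "direct access" does not mean every access is direct, only that local accesses bypass the PAs and fabric.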
- although operation in the direct access memory configuration involves some signals being communicated indirectly by way of one or more of the PAs 18 - 24 and the fabric 8 , typically as many signals as possible are routed in a manner that does not involve any of the PAs or the fabric.
- I/O subsystems are accessed by way of the fabric 8 . Therefore, while the paths of signals communicated between the cores 46 - 76 and the MCs 88 - 102 as determined by the SADs 78 can vary depending upon the memory access configuration, in the present embodiment all signals communicated by the cores 46 - 76 that are intended for receipt by I/O subsystems are routed via the respective switches 80 - 86 , appropriate ones of the PAs 18 - 24 and the fabric, irrespective of the memory configuration. In alternate embodiments, it is possible that the I/O subsystems will be coupled to other devices/structures of the computer system 2 , such as directly to the PAs 18 - 24 themselves.
- the MCs 88 - 102 are responsible for managing and accessing the various memory segments 26 - 36 in response to read/write requests received from the cores 46 - 76 , and for relaying signals back from those memory segments to the cores, as described in further detail below.
- the MCs 88 - 102 can be hardware chips such as application-specific integrated circuits (ASICs) that are connected to the sockets 10 - 16 by way of electrical connectors. In other embodiments, one or more of the MCs 88 - 102 can be other type(s) of MCs.
- the number of MCs per socket can vary in other embodiments (e.g., there can be only a single MC on each socket or possibly more than two as well).
- each of the MCs 88 - 102 includes a respective logic block referred to as a Target Address Decoder (TAD) 106 .
- the TADs 106 process signals arriving from the cores 46 - 76 and determine how to convert between (e.g., decode) memory address information received in those signals and memory locations within the memory segments 26 - 36 .
- the TADs 106 also facilitate the return of information from the memory segments 26 - 36 back toward the cores 46 - 76 .
- each of the TADs 106 can be implemented in either hardware or software, and is pre-programmed to convert between memory bank addresses and memory locations inside the memory segments 26 - 36 .
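The TAD's job can be sketched as a range lookup. This is an assumed model (the segment table, its tuple layout, and the offset arithmetic are all invented): the decoder finds which governed segment contains the incoming address and returns the segment plus the offset within it.

```python
# Hedged sketch of a Target Address Decoder (TAD) mapping an incoming
# physical address to a location within a governed memory segment.

def tad_decode(addr, segments):
    """segments: list of (base, size, segment_id) tuples.

    Returns (segment_id, offset_within_segment) for the segment
    whose address range contains addr.
    """
    for base, size, seg_id in segments:
        if base <= addr < base + size:
            return seg_id, addr - base
    raise LookupError("address not governed by this memory controller")
```

A usage example: with segments `[(0, 1024, "seg26"), (1024, 1024, "seg28")]`, address 1536 decodes to segment "seg28" at offset 512.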
- main memory itself, as discussed above, is divided into multiple disjoint memory segments including, for example, the memory segments 26 - 36 of FIG. 1 .
- the particular manner of sub-division of the main memory into multiple memory segments can vary depending upon the embodiment and upon various factors, for example, the requirements of the applications running on the cores 46 - 76 .
- the memory segments 26 - 36 are organized as dual in-line memory modules (DIMMs) that are respectively connected to one or more of the MCs 88 - 102 by way of electrical connectors.
- the memory segments 26 and 30 are controlled by the MCs 88 and 94 , respectively, while the memory segment 28 is controlled by both the MC 90 and the MC 92 .
- the memory segments 32 and 36 are controlled by the MCs 96 and 102 , respectively, while the memory segment 34 is managed by both the MCs 98 and 100 .
- Exemplary communications among the various components of the computer system 2 as occurs in the aforementioned different agent access and direct access memory configurations, as well as between the cores of the computer system and the I/O subsystems, are illustrated in FIG. 1 by several exemplary communication paths 144 , 146 , 148 , 150 , 152 , 154 , 155 , 156 and 157 .
- those of the communication paths that proceed between the sockets 10 - 16 and the PAs 18 - 24 follow the connections provided by the dashed communication paths 81 , 85 , 87 , 89 , 91 , 93 , 95 and 99 connecting the switches 80 - 86 with the PAs 18 - 24 , and those of the communication paths that proceed between neighboring sockets of the same cell follow the connections provided by the dashed communication paths 83 , 97 .
- the communication paths 144 , 146 , 148 , 150 , 152 , 154 , 155 , and 156 illustrate several communication paths some of which are illustrative of direct access memory configuration communications, some of which are illustrative of agent access memory configuration communications, and some of which are consistent with both agent access and direct access memory configuration communications.
- the communication paths 150 , 152 and 155 show three exemplary signal paths that can be taken when the computer system 2 is operating in the direct access memory configuration and when cores are accessing local memory segments.
- the communication path 150 in particular shows the core 62 accessing the memory segment 32 via the MC 96 and the switch 84 .
- the communication path 152 shows the core 70 accessing the memory segment 36 via the MC 102 and the switch 86
- the communication path 155 shows the core 56 accessing the memory segment 28 by way of the MC 90 and switches 80 and 82 (and the communication path 83 ).
- the communication paths 148 , 154 and 156 respectively show three exemplary signal paths that can be taken when the computer system 2 is operating in the agent access memory configuration and when cores are accessing local memory segments.
- although these communication paths 148 , 154 and 156 connect the same cores and MCs as the communication paths 150 , 152 and 155 , respectively, these communication paths proceed via certain of the PAs 18 - 24 and via the fabric 8 .
- the communication path 148 , like the communication path 150 , shows the core 62 accessing the memory segment 32 via the MC 96 , except in this case the communication path proceeds via the switch 84 , the PA 24 and the fabric 8 .
- the communication path 154 , like the communication path 152 , shows the core 70 accessing the memory segment 36 by way of the MC 102 , except in this case the communication path proceeds via the switch 86 , the PA 24 and the fabric 8 .
- the communication path 156 , like the communication path 155 , shows the core 56 accessing the memory segment 28 via the MC 90 , except in this case the communication path proceeds via the switches 80 and 82 , the PA 20 and the fabric 8 .
- a memory request concerning a memory location sent by a given core on a given socket of a given cell is directed first (by way of the switch of the socket) to a first PA that is on the same cell as the requesting core.
- the first PA in response provides a signal to the fabric 8 , which then directs the signal to a second PA that is on the same cell as the MC governing the memory segment on which is located the requested memory location.
- the second PA in turn provides a signal to that MC, which results in the desired accessing of the requested memory location.
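The three-step agent-access flow just described (requesting core to local PA, local PA across the fabric to the remote PA, remote PA to the governing MC) can be written out as an ordered hop list. This is a toy model with invented hop labels, useful only to make the hop ordering explicit.

```python
# Hedged sketch of the hop sequence for one agent-access memory
# request; all labels are stand-ins for the patent's components.

def agent_access_path(core, local_pa, remote_pa, mc):
    """Hops, in order, for an agent-access memory request."""
    return [core,            # requesting core issues the request
            "local switch",  # switch on the requesting core's socket
            local_pa,        # first PA, on the same cell as the core
            "fabric",        # fabric routes to the target cell
            remote_pa,       # second PA, on the same cell as the MC
            "remote switch", # switch on the target MC's socket
            mc]              # MC accesses the requested location
```

The response retraces these hops in reverse order back to the requesting core.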
- a memory request concerning a memory location sent by a respective core on a respective socket on a respective cell is routed directly to a MC on the same socket or another socket of the same respective cell merely by way of the appropriate switch(es) of the socket(s), without being routed through any of the PAs 18 - 24 or the fabric 8 .
- the MC then provides the appropriate result by way of accessing the requested memory location and directly providing a response via the switches without any routing through any PAs or the fabric. It should be further evident from the particular paths 150 , 152 and 155 shown in FIG.
- the direct access memory configuration can involve communications between cores and MCs that are on the same socket (as indicated by the paths 150 and 152 ), as well as cores and MCs that are on the same cell but not the same socket (as indicated by the path 155 ).
- the communication path 144 is employed to allow the core 62 to access the memory segment 26 by way of the MC 88 via the switches 80 and 84 , the PAs 18 and 22 , and the fabric 8 , whether the computer system 2 is operating in the direct access memory configuration or the agent access memory configuration.
- similarly, the communication path 146 is employed to allow the core 64 to access the memory segment 30 by way of the MC 94 via the switches 82 and 84 , the PAs 20 and 22 , and the fabric 8 , in either of the two memory configurations.
- the fabric 8 is a hardware device formed as part of (or connected to) the backplane of the computer system 2 .
- all requests and other messages to and from any of the cores with respect to the I/O subsystems are communicated via the fabric 8 .
- all requests by any of the cores to access any of the MCs and associated memory segments when operating in the agent access memory configuration are directed through the fabric 8 irrespective of the location of the memory segment relative to the originating core (and even if the core and the appropriate MC for controlling that memory segment are on the same cell and/or same socket).
- requests by any of the cores to access memory segments by way of their respective MCs when the MCs and cores are located on different cells are also routed via the fabric 8 , both when the computer system 2 is operating in the agent access memory configuration and also when the computer system is operating in the direct access memory configuration to perform remote memory accesses.
- communications between the cores and MCs do not pass via the fabric 8 during operation according to the direct access memory configuration when accessing a local memory segment.
- each of the cells 4 and 6 is connected to the fabric 8 during configuration, when the cells are installed on the computer system 2 .
- signals communicated onto the fabric 8 must take on fabric (or global) addresses that differ from the physical addresses employed by the signals when outside of the fabric.
- each of the PAs 18 - 24 can be an integrated circuit (IC) chip albeit, in other embodiments, one or more of the PAs 18 - 24 can take other form(s).
- the PAs 18 - 24 form an intermediary by which signals directed from the cores 46 - 76 and/or the MCs 88 - 102 by way of the switches 80 - 86 are provided to the fabric 8 , and vice-versa.
- the PAs 18 - 24 can be directly coupled to one or more of the I/O subsystems rather than by way of paths such as the path 154 of FIG. 1 that passes through the fabric.
- each of the PAs 18 - 24 has located thereon two coherency controllers (CCs), namely, CCs 108 and 112 on the PA 18 , CCs 116 and 120 on the PA 20 , CCs 124 and 128 on the PA 22 , and CCs 132 and 136 on the PA 24 .
- each of the PAs 18 - 24 also has located thereon two caching agents (CAs), namely, CAs 110 and 114 on the PA 18 , CAs 118 and 122 on the PA 20 , CAs 126 and 130 on the PA 22 , and CAs 134 and 138 on the PA 24 .
- the CCs 108 , 112 , 116 , 120 , 124 , 128 , 132 , 136 in particular process signals that are being directed toward, or received from, the MCs 88 - 102 (via the switches 80 - 86 ).
- the CAs 110 , 114 , 118 , 122 , 126 , 130 , 134 and 138 process signals that are being received from, or directed toward, the cores 46 - 76 (via the switches 80 - 86 ).
- the CCs and CAs serve several purposes.
- the CCs are particularly responsible for resolving coherency conflicts within the computer system 2 relating to the accessing of the memory segments 26 - 36 by way of the MCs 88 - 102 in an agent access memory configuration or in collaboration with the MCs 88 - 102 in a direct access memory configuration.
- Conflicts can arise since, in addition to residing within home memory segments, more recent copies of memory locations can also be resident within one or more local cache memories of the cores 46 - 76 .
- the CCs employ a directory based cache coherency control protocol, which is described further below.
- the sockets 10 - 16 also support a coherency protocol, which tracks ownership in both local cores and the CCs (which track ownership for the rest of the cores in the partition).
- the protocol used by the sockets can be, for example, a directory-based protocol or a snooping-based protocol.
- each of the CCs maintains a directory (for example, a table) with an entry for each memory location of the main memory.
- Each row of the directory of a given CC includes information indicating which of the memory segments 26 - 36 has ownership (the home memory segment) of each memory location, as well as information indicating which of the cores 46 - 76 has the most updated copy of that location.
- Each location of the directory can be accessed by a subset of the address bits.
- the given CC can also determine if alternate updated copies of that memory location exist within another one or more of the cores. If so, asynchronous signals or “snoops” can be issued to the core holding the updated copy of the memory location for retrieval, thus resulting in the returning of the most updated copy of the memory location in response to a read/write request.
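The directory lookup and snoop decision described above can be sketched as follows. This is an assumed model (the entry format and class name are invented): each entry records a line's home segment and which core, if any, holds the most recent copy; a lookup that finds an owning core yields a snoop to that core, otherwise the home segment is read directly.

```python
# Hedged sketch of a directory-based coherency lookup; the entry
# layout and names are illustrative assumptions.

class Directory:
    def __init__(self):
        # addr -> {"home": segment_id, "owner": core_id or None}
        self.entries = {}

    def lookup(self, addr):
        """Return ("snoop", core) if a core holds the newest copy,
        else ("memory", home_segment) to read the home segment."""
        entry = self.entries.get(addr)
        if entry is None:
            return ("memory", None)           # untracked: read from memory
        if entry["owner"] is not None:
            return ("snoop", entry["owner"])  # fetch updated copy from core
        return ("memory", entry["home"])      # clean: read home segment
```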
- each of the CCs 108 , 112 , 116 , 120 , 124 , 128 , 132 , and 136 in particular includes an instance of a Memory Translation CAM (MTC) 140 , which can be implemented as a pre-programmed logic block.
- the respective MTC 140 of each CC is responsible for converting fabric addresses into local physical addresses for every memory access signal received by the respective MTC off of the fabric 8 when the computer system 2 is operating according to the agent access memory configuration, and also for performing such conversions from fabric addresses to local physical addresses with respect to every remote memory access signal received off of the fabric when the computer system is operating in the direct access memory configuration.
- These local physical addresses can be then used by the MCs 88 - 102 that are in communication with the respective CCs (e.g., to retrieve the information from the requested memory location).
- the MTCs 140 of the CCs are used to determine the coherency flow of the received requests. To issue snoops, a global address routed via the fabric 8 can be converted to a local physical address by way of one of the MTCs 140 . Each MTC 140 also gives an indication about whether a given memory address corresponds to part of a direct access block of memory or part of an agent access block of memory. If the line is part of an agent access block of memory, then the corresponding CC will issue snoops (if required, as determined by the local directory) to both cores on the cell with which it is associated as well as to cores on different cells. If the line is part of a direct access block of memory, then the CC will only issue snoops to cores on remote cells (if required, as determined by the local directory).
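The snoop-scope rule stated above reduces to a simple dispatch on the block type reported by the MTC. The function and argument names below are invented; the rule itself is as described: agent access blocks may require snoops to both local and remote cores, while direct access blocks only ever require snoops to cores on remote cells.

```python
# Illustrative sketch of the CC's snoop-scope rule keyed off the
# MTC's direct/agent block indication; names are stand-ins.

def snoop_targets(block_kind, local_cores, remote_cores):
    """Cores a CC may need to snoop for a line in the given block."""
    if block_kind == "agent":
        # agent access block: local cores may also hold the line
        return local_cores + remote_cores
    if block_kind == "direct":
        # direct access block: local copies are tracked by the
        # socket-level protocol, so only remote cells are snooped
        return remote_cores
    raise ValueError("unknown block kind")
```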
- each of the PAs 18 - 24 also has located thereon two caching agents (CAs), namely, CAs 110 and 114 on the PA 18 , CAs 118 and 122 on the PA 20 , CAs 126 and 130 on the PA 22 , and CAs 134 and 138 on the PA 24 .
- the CAs 110 , 114 , 118 , 122 , 126 , 130 , 134 and 138 are intended to perform several functions when the computer system is in the agent access memory configuration.
- the CAs are responsible for executing the coherency flow determined by the CCs (e.g., by executing the snoops issued by the CCs).
- the CAs perform address abstraction for signals routed off via the fabric 8 , by which local physical addresses referenced in signals received from the cores 46 - 76 are converted into fabric (global) addresses appropriate for the fabric 8 , and vice-versa.
- one or more of the CAs 110 , 114 , 118 , 122 , 126 , 130 , 134 and 138 can be programmed to perform other functions than those mentioned above.
- each of the CAs 110 , 114 , 118 , 122 , 126 , 130 , 134 and 138 includes a respective Fabric Abstraction Block (FAB) 142 .
- each respective FAB 142 allows its respective CA to convert local physical addresses such as those arriving on memory request signals from the cores 46 - 76 (via the switches 80 - 86 ) into fabric (global) addresses suitable for determining where the signals are sent within the fabric 8 for accessing a memory location.
- the FABs 142 can operate in a variety of ways to perform these conversions and, in the present embodiment, employ interleaving algorithms.
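As a rough illustration of such an interleaving conversion, a FAB-style mapping might distribute consecutive cache lines round-robin across cells; the line size and layout here are assumptions made for the sketch, not details taken from the patent:

```python
def local_to_fabric(local_addr, num_cells, line_bytes=64):
    """Map a local physical address to a (home_cell, offset) pair by
    interleaving cache lines round-robin across the cells."""
    line = local_addr // line_bytes
    home_cell = line % num_cells                     # which cell hosts the line
    offset = (line // num_cells) * line_bytes + local_addr % line_bytes
    return home_cell, offset
```

With two cells, consecutive 64-byte lines alternate between cell 0 and cell 1, which is what spreads accesses across sockets and reduces hot spots.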
- each of the CAs 110 , 114 , 118 , 122 , 126 , 130 , 134 and 138 is pre-programmed for managing a subset of the main memory (e.g., certain subsets of the memory segments 26 - 36 ).
- the allocation of memory to the different CAs is known to the SADs 78 of the cores 46 - 76 , such that the SADs are able to route the memory request signals to the appropriate CAs based upon the memory locations requested by the cores.
- the signals communicated between the cores 46 - 76 and the memory controllers 88 - 102 undergo several conversions as they proceed via the switches 80 - 86 , the PAs 18 - 24 , and the fabric 8 . More particularly, a signal sent by one of the cores 46 - 76 undergoes a first conversion by the SAD 78 of the core, which results in the signal being communicated by the appropriate one of the switches 80 - 86 to an appropriate one of the PAs 18 - 24 .
- the FAB 142 of one of the CAs of the PA converts the signal into a signal appropriate for transmission over the fabric 8 .
- this conversion at least in part involves a conversion of a physical memory address to a fabric address.
- After being transmitted through the fabric 8 , the signal then arrives at another one of the PAs 18 - 24 (or potentially the same PA handling the signal before it entered the fabric), where the MTC 140 of one of the CCs of the PA again converts the fabric address back into a physical memory address.
- the TAD 106 of that MC further converts the signal so that the desired memory location in main memory is accessed. Similar conversion processes occur when signals proceed in the opposite direction from the memory to the cores.
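The chain of conversions described above can be summarized as a pipeline; every function argument here is an illustrative stand-in for the corresponding hardware block, not an interface defined by the patent:

```python
def agent_access_path(core_addr, sad, fab, mtc, tad):
    """Trace one request through the agent access address conversions."""
    pa = sad(core_addr)            # SAD: choose the processor agent / route
    fabric_addr = fab(core_addr)   # FAB: local physical -> fabric address
    dest_local = mtc(fabric_addr)  # MTC: fabric -> destination local address
    return pa, tad(dest_local)     # TAD: local address -> memory location
```

The return path from memory to core applies the inverse conversions in the opposite order.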
- the local physical address generated by the MTC 140 on the destination end of a given request differs from the local physical address generated by the SAD 78 on the request originating end. That is, the address sent to a MC as part of a memory read or write request is not necessarily the same physical address that was generated by the core that made the request.
- Although each memory location of the main memory can be referenced by way of a unique address, multiple locations within each of the memory segments 26 - 36 nevertheless can share the same address. As explained earlier, each of the memory segments 26 - 36 is a small, disjoint subset of the main memory.
- the memory locations hosted within each of those memory segments can be accessed by using a smaller subset of the address that is used to access a location inside the main memory. Additionally, the MC view of the address cannot be used by that MC for coherency operations as the modified address can access an incorrect location if applied to a core.
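The aliasing point can be illustrated with a toy model: if each MC only works with the low-order offset bits within its own segment, two distinct main-memory addresses in different segments collapse to the same MC-local offset. The segment size below is an assumption chosen purely for illustration:

```python
SEGMENT_BYTES = 1 << 20  # assumed segment size, for illustration only

def mc_local_offset(addr):
    """The truncated view of an address that a single MC works with."""
    return addr % SEGMENT_BYTES
```

Addresses `0x10` and `SEGMENT_BYTES + 0x10` name different main-memory locations yet look identical to an MC, which is why the MC's view of an address cannot safely be applied back to a core for coherency operations.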
- a signal sent by one of the cores 46 - 76 undergoes a conversion by the SAD 78 of the core, which results in the signal being communicated by the appropriate one (or possibly two) of the switches 80 - 86 to the TAD 106 of an appropriate one of the MCs 88 - 102 .
- the TAD 106 of the appropriate MC in turn converts the signal so that the desired memory location in main memory is accessed.
- the addresses generated by the cores making the requests are the same as the addresses sent to the MCs 88 - 102 , thus enabling the coherency algorithms to issue snoops to the local cores for maintaining coherency within the computer system 2 .
- the agent access memory configuration provides high bandwidth and scalability to large partition sizes, albeit with higher latency in small partitions than the direct access memory configuration.
- the computer system 2 is capable of being operated so as to switch between the two memory configurations and thereby provide the advantages of both the agent access and the direct access memory configurations (at least at different times). More particularly, in accordance with these embodiments of the present invention, the computer system 2 is capable of dynamic re-sizing of memory in its partitions and, when that occurs, also capable of converting in its operation between the direct access memory configuration and the agent access memory configuration.
- a partition's memory performance can be optimized whenever new cells and/or new sockets are added/removed from the existing partition within the computer system 2 . Based upon the specific needs/types of the computer systems and the applications running on those computer systems, the configuration of memory access in particular can be changed to provide optimal performance in terms of minimizing hot spots and memory latency.
- flowcharts 158 and 184 respectively show exemplary steps of operation in which the computer system 2 of FIG. 1 is dynamically converted between the direct access memory configuration and the agent access memory configuration in response to (or in conjunction with) the conversion of the computer system between being a single cell system having only the first cell 4 and a dual cell system having both the first and second cells 4 and 6 .
- the flowchart 158 of FIG. 2 in particular relates to the conversion of the partition from being a single cell system to a dual cell system, and to the corresponding conversion of the computer system 2 from operating in the direct access memory configuration to the agent access memory configuration, while the flow chart 184 of FIG. 3 relates to the opposite conversions.
- FIGS. 1-3 illustrate the computer system 2 that switches between the direct access and agent access memory configurations depending upon whether the computer system's partition employs one cell or two cells
- this transition typically occurs at a different threshold in terms of partition size, e.g., depending upon whether the partition includes four or fewer sockets or more than four sockets.
- the addition/removal of a cell can, but need not necessarily, involve the physical coupling of a new cell with new memory resources (e.g., a memory card) to the computer system 2 .
- the addition of a cell to a given partition merely involves allocation of an idle cell to the given partition, or reallocation of a cell from another partition to the given partition.
- the removal of a cell from a given partition merely involves causing that cell to become idle, or reallocation of that cell to another partition.
- Adding or removing cells can significantly vary the available memory.
- all of the partition memory across all of the available sockets is re-interleaved. More particularly with respect to the conversion from the direct access memory configuration to the agent access memory configuration when a cell is added to a partition as represented by FIG. 2 , this process involves migrating the cores of the originally-existing cell to the newly-added cell and then back again such that, temporarily, all requests are directed through the PAs of both the old and the newly-added cell. A similar migration of cores is required also in the process of converting the agent access memory configuration to the direct access memory configuration when a cell is deleted from a partition as represented by FIG. 3 .
- the process of adding a cell and converting from the direct access memory configuration to the agent access memory configuration starts at a step 160 and then proceeds to a step 162 .
- a “new” cell is added to an existing partition containing an “old” cell.
- the second cell 6 of FIG. 1 referred to as “cell 1 ” from hereon, is added to the existing single partition containing the first cell 4 , referred to as “cell 0 ” from hereon.
- the “new” cell 1 is a cell board that is similar in structure and characteristics to the existing cell 0 , and in particular includes multiple cores, MCs and partitioned memory segments.
- the cell 1 is initially an unassigned resource with its power shut off, and that is available to be added to the existing partition containing the cell 0 .
- the cell 1 can also initially co-exist in the computer system 2 as part of another existing partition, in which case, the process of FIG. 2 involves removal of that cell from its initially-assigned partition and reassignment of that cell to the partition containing the cell 0 .
- the cell 1 can be installed in the computer system 2 of FIG. 1 at run time.
- the cell 1 is subject to diagnostics testing in which all of the components present on that cell (e.g., memory, cores, switches, PAs, MCs, I/O-related resources and memory segments as well as possibly other components) are tested to ensure that all the components are in working condition and capable of performing their respective functions. Any flawed or non-working components are replaced prior to integrating the cell 1 with the cell 0 into the existing partition of the cell 0 .
- the “new” cell 1 is then integrated with the fabric 8 of the existing partition to facilitate communication between the “old” cell, cell 0 , and the cell 1 , at a step 164 . At this point, the cell 1 becomes a part of the partition ready for communication with the cell 0 and the process proceeds to a step 166 .
- the cell 1 is configured and the partition memory on the cell is allocated.
- properties of that cell including all of its resources, are set so that the cell 1 is formally added to the partition and so that the cell 0 becomes aware of the presence of the cell 1 .
- a portion of the available memory on the cell 1 is configured as a cell local memory which is a non-interleaved memory available for all cores present in the partition.
- all of the other memory on the cell 1 is configured as a global shared memory, also called partition memory, which is capable of being accessed by any of the cores either on the cell 1 or on another cell such as the cell 0 .
- This new partition memory on the cell 1 is allocated to memory blocks adjacent to the pool of existing partition memory blocks of the cell 0 , and thus is invisible to the operating system at this point.
- this partition memory of the cell 1 is added to the cache coherency controller on that cell to track cache line ownership of each line of the partition memory of the cell 1 .
- address maps in the FABs 142 of the cell 1 are set up to configure that cell so that it is capable of accessing memory segments in accordance with the agent access memory configuration.
- the new partition memory of the cell 1 is added to the existing partition memory of the cell 0 . Once the new partition memory is added in this manner, the cell 1 is able to access the existing partition memory on the cell 0 as well.
- the existing cell 0 is configured as well during that step in order to reflect the addition of the cell 1 into the partition and to enable the cell 0 to access the partition memory of the cell 1 .
- the cores and/or sockets on these cells can be actively used for communication with each other.
- the configuration of the cell 0 and the cell 1 during the step 166 is under the control of the firmware, with the integration of the cell 1 invisible to the operating system.
- the process subsequently advances to a series of steps, beginning with a step 168 , at which the address decoders and other supporting hardware components on the cell 0 are re-programmed to convert that cell from the direct access memory configuration to the agent access memory configuration.
- this step involves migrating the cores, one at a time, from the cell 0 to the cell 1 to facilitate changes to the source decoders of the cell 0 in a manner that avoids crashing the computer system 2 or stalling the processes executing on the cores of that cell.
- the architected state of that core including for example, the registers and interrupts, is frozen and captured under the control of the firmware.
- This state information is then migrated to a “spare” core on the cell 1 so that the spare core can handle any processes that were originally being handled by the original core on the cell 0 .
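A highly simplified model of this freeze-and-transfer step is sketched below; the dictionary layout and field names are invented for the illustration and do not correspond to any structure defined in the patent:

```python
def migrate_core(src_core, spare_core):
    """Freeze a core's architected state and hand it to a spare core."""
    src_core["running"] = False           # firmware halts the source core
    state = dict(src_core["arch_state"])  # capture registers, interrupts, etc.
    spare_core["arch_state"] = state      # restore the state on the spare
    spare_core["running"] = True          # spare resumes the frozen processes
    return spare_core
```

Repeating this one core at a time is what lets the source decoders on the old cell be reprogrammed without stalling the processes those cores were running.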
- the process then proceeds to a step 170 , at which all of the address decoders and related hardware components (such as the cores and MTC s) of the cell 0 are reprogrammed, to facilitate conversion from the direct access memory configuration to the agent access memory configuration.
- the MTCs 140 in particular can be reprogrammed to include information as to whether the MTCs are responsible for tracking local cores (e.g., in the agent access memory configuration) or if the MCs will track the local cores (e.g., in the direct access memory configuration).
- the step 170 is performed by way of special, preprogrammed hardware components found within each of the PAs 18 - 24 of the cells (more particularly within the CCs 108 , 112 , 116 , 120 , 124 , 128 , 132 and 136 ), which can be referred to as memory migration engines.
- each memory migration engine is capable of moving one half of the memory segments in the attached socket, such that two memory migration engines are employed to move all of the memory segments in one socket as explained in more detail below.
- each memory migration engine is capable of reviewing/traversing each cache line of the partition memory for both the cell 0 and the new cell 1 (e.g., the memory segments 26 - 36 ) and corresponding lines of the caches of the cores of the cell 0 , as well as clearing tags of the caches.
- the memory migration engine(s) operate by updating the CCs 108 , 112 , 116 and 120 of the cell 0 and transitioning the directories of those CCs (used for tracking coherency) accordingly so that the cores on the cell 0 do not use any of the old copies of data still present in their respective caches. Data from all of the caches on the cell 0 is then flushed out and the CCs 108 , 112 , 116 and 120 gain exclusive access to each line in the memory segments 26 - 30 of that cell and write them back to idle.
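The flush performed by the migration engines can be modeled as writing back dirty lines, clearing cache tags, and dropping the corresponding directory entries; the data structures here are invented for the sketch:

```python
def flush_cell_caches(cores, memory, directory):
    """Write back dirty lines, clear cache tags, and update the directory."""
    for core in cores:
        for addr, (data, dirty) in list(core["cache"].items()):
            if dirty:
                memory[addr] = data      # write the modified line back
            del core["cache"][addr]      # clear the tag
            directory.pop(addr, None)    # no core owns the line any more
```

After a flush like this, every line is back at its home memory and the directories can be rebuilt to cover the combined partition memory of both cells.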
- the directory contents of those flushed cache lines are updated and the new partition memory of the cell 1 is added to the directories to enable the MCs to keep track of the cache lines present in every core on both the cell 0 and the cell 1 .
- No cores on the cell 0 are referenced after this step is completed.
- the firmware controls the cores on the cell 0 so that control is transferred to the cores on the cell 1 for any further processing.
- all of the MTCs 140 on the cell 0 are reprogrammed for operation in accordance with the agent access memory configuration rather than the direct access memory configuration. More particularly, the MTCs 140 are reprogrammed for use in conjunction with the fabric to enable the transmission of memory request signals to the MCs 88 - 102 and thereby to the memory segments 26 - 36 . As mentioned earlier, when the computer system 2 is operating in the direct access memory configuration accessing local memory segments, all memory requests from cores to MCs located on the same cell are directed to those MCs without passing through any of the PAs 18 - 24 or the fabric 8 .
- the only signals that are communicated to the PAs 18 - 24 , and governed by the MTCs 140 in particular, are those intended for the I/O subsystems or cores accessing remote memory segments.
- Reprogramming of the MTCs 140 makes it possible for the PAs 18 - 24 to handle signals arriving from the cores 46 - 76 that are intended eventually for the MCs 88 - 102 .
- the TADs 106 of the MCs 88 - 102 need not be re-programmed, since bank addresses of the memory segments 26 - 36 do not change.
- each of the SADs 78 of the cores 46 - 60 on the original cell 0 are reprogrammed by the firmware to operate according to the agent access memory configuration rather than the direct access memory configuration. More particularly, the SADs 78 are reprogrammed so that all memory requests are directed to one of the PAs 18 - 24 instead of directly to one of the MCs 88 - 102 (via one or more of the switches 80 - 86 ) as during operation according to the direct access memory configuration.
- the operation of the MTCs 140 and SADs 78 is primarily responsible for determining whether the computer system 2 is operating according to the direct access memory configuration or the agent access memory configuration.
- the FAB 142 however remains unchanged when switching between direct access and agent access memory configurations.
- the MTCs 140 and SADs 78 are pre-programmed to be capable of both the direct access memory configuration and the agent access memory configuration.
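Because both components are pre-programmed for both modes, switching configurations amounts to flipping which path a request takes; a sketch, with illustrative routing functions standing in for the hardware:

```python
def route_request(config, addr, same_cell_mc, processor_agent):
    """Where a SAD sends a memory request under each configuration."""
    if config == "direct":
        # Direct access: straight to a memory controller on the same cell,
        # bypassing the processor agents and the fabric entirely.
        return "MC", same_cell_mc(addr)
    # Agent access: every memory request goes through a processor agent.
    return "PA", processor_agent(addr)
```

The same address can thus take either path; only the routing decision, not the address itself, changes between configurations.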
- the internal characteristics of the MTCs 140 of the CCs 108 , 112 , 116 and 120 are modified from being appropriate for the direct access memory configuration to being appropriate for the agent access memory configuration.
- the internal characteristics of the SADs 78 also are changed from being appropriate for the direct access memory configuration to being appropriate for the agent access memory configuration.
- each of the MTCs 140 has entries that each respectively correspond to a respective memory segment.
- Each of the entries can be set independently by the firmware for either agent access or direct access memory configurations.
- the MTC entries include MC addresses for different memory lines, and each memory address typically remains the same regardless of whether the line is to be accessed via the direct access memory configuration or the agent access memory configuration, in order to facilitate the conversion process (e.g., to facilitate the mapping of memory from a deleted cell to a remaining cell).
- firmware sets or clears a bit in each such MTC entry indicating whether the corresponding memory block is for use in the agent access memory configuration or the direct access memory configuration.
- each MTC entry corresponds to a single memory block, and the direct access/agent access attribute can be set independently for each entry.
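In effect each MTC entry carries a per-block mode flag; a sketch of the bit manipulation the firmware performs, with the bit position chosen arbitrarily for illustration:

```python
AGENT_ACCESS_BIT = 1 << 0  # assumed bit position, for illustration only

def set_entry_mode(entry, agent_access):
    """Set or clear an entry's agent/direct access bit, as firmware does."""
    if agent_access:
        return entry | AGENT_ACCESS_BIT
    return entry & ~AGENT_ACCESS_BIT

def is_agent_access(entry):
    """Report whether the entry's memory block uses agent access."""
    return bool(entry & AGENT_ACCESS_BIT)
```

Because the bit is independent per entry, individual memory blocks can be converted between configurations without touching the MC addresses stored in the rest of the entry.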
- when the SADs 78 and related components are re-programmed to facilitate the conversion from the direct access memory configuration to the agent access memory configuration, every direct access memory address used as an alias address for satisfying read requests to the socket home agent remains the same.
- the core(s) that were migrated from the cell 0 to the cell 1 are migrated back to their original location(s) on the cell 0 , at a step 176 .
- the architected states of the cores of the cell 0 are retrieved and the cores can re-establish all active process execution. This procedure is largely the reverse of the procedure performed during the step 168 discussed above.
- the process next advances to a step 178 .
- the step 178 is optional and can be skipped. Nevertheless, for optimal utilization of all of the memory segments 26 - 36 of the computer system 2 that are available when operating in the agent access memory configuration, at the step 178 the partition memory on the cells 0 and 1 is re-interleaved with the memory segments.
- the memory segments 26 - 36 are re-interleaved by collaboration between the firmware and the hardware, such as the memory migration engines and the MTCs, by executing special software to facilitate such a re-interleaving.
- the cell 0 becomes capable of accessing not only the partition memory of the cell 0 but also the partition memory of the cell 1 (and vice-versa). This reduces memory access latency and diminishes the frequency of memory “hot-spots” relative to the performance that could be obtained if the cells 0 , 1 were only able to access their own memory (e.g., by operating in the direct access memory configuration).
- After completing (or skipping) the step 178 , the process then advances to a step 180 .
- an on-line addition operation is performed to expose the newly-added cell 1 , its cores 62 - 76 and its partition memory to the operating system, thereby finalizing the conversion of the computer system 2 from the direct access memory configuration to the agent access memory configuration.
- the new cell 1 with its cores 62 - 76 can now be used for active processing of data in addition to the existing cell 0 .
- the process 158 of conversion from the direct access memory configuration to the agent access memory conversion is complete and the process ends at a step 182 .
- an additional flowchart 184 shows exemplary steps of another process by which the computer system 2 is converted from the agent access memory configuration to the direct access memory configuration, in accordance with at least some embodiments of the present invention.
- This conversion is performed when one or more cells are deleted from a partition so as to leave only a single cell in the partition, in order to achieve the same memory latency on the remaining cell as would have been achieved if the partition had been booted in that single-cell configuration.
- the computer system 2 initially operating in accordance with the agent access memory configuration begins operation with a partition having two cells (namely, the cells 0 and 1 ), but then ends operation with only a single cell (namely, the cell 0 ).
- the present conversion operation is equally applicable to computer systems that are reduced from having a higher number of cells (e.g., more than two, and particularly more than four) to a lesser number of cells.
- the process begins at a step 186 and then proceeds to another step 188 .
- the partition memory on the cell that is being removed (cell 1 ) is deleted by way of an on-line deletion operation. By performing this operation, the partition memory on the cell 1 is made invisible to the operating system, so as to facilitate an on-line removal of the cell 1 without affecting the executing processes on the cores of the cell 0 .
- the active processing on the cores 62 - 76 of the cell 1 is terminated to prevent the computer system 2 of FIG. 1 from crashing while deleting that cell.
- the operating system releases all of the socket local memory of the cell 1 and also the partition memory of the cell 1 (e.g., the segments 32 - 36 ) to ensure that the remaining memory can fit onto the remaining cell 0 .
- the partition memory will typically be in groups of two (such that it can be interleaved across both of the two sockets of the cell 0 ), and the groups are claimed directly by the TADs in the cell 0 .
- at a step 190 , all of the cores 62 - 76 of the cell 1 are deleted from the operating system by way of an on-line deletion operation. After this on-line deletion operation has been performed on the cores of the cell 1 , these cores are no longer used for any active process execution. Rendering the cores 62 - 76 inactive is desirable to prevent the computer system 2 from crashing in an event where the cell 1 still has processes running on it after that cell is removed from the partition. Subsequently, at a step 192 , all of the partition memory previously available to both the cell 0 and the cell 1 is de-interleaved into a 1-way partition memory capable of being accessed only by the remaining cell 0 in the partition. De-interleaving of the partition memory takes place by way of special software under the collaboration of the memory migration engines and the firmware.
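Conceptually, de-interleaving applies the inverse of a round-robin cache-line mapping, gathering each interleaved (cell, offset) pair back into one contiguous local address space; line size and layout are assumptions made for the sketch:

```python
def fabric_to_local(home_cell, offset, num_cells, line_bytes=64):
    """Recover a single linear address from an interleaved (cell, offset)."""
    line = (offset // line_bytes) * num_cells + home_cell
    return line * line_bytes + offset % line_bytes
```

Running this over every line owned by the departing cell yields the 1-way layout that the remaining cell can serve on its own.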
- the cores on the cell 0 are migrated to the cell 1 so as to allow modification of the source decoders and address maps of the cores on the cell 0 to be converted from the agent access memory configuration to the direct access memory configuration.
- migration of each of the cores 46 - 60 of the cell 0 involves capturing (or “freezing”) the present architected state of that core, and then transferring that state of the core to a “spare” core on the cell 1 , which resumes all of the active processing for the core that has been migrated from the cell 0 .
- Each core is migrated in succession, with the same process being repeated until all of the cores from the cell 0 have been migrated to the corresponding spare cores on the cell 1 .
- the migration of the cores 46 - 60 from the cell 0 enables the SADs 78 on those cores to be re-programmed for operation according to the direct access memory configuration, since after the migration those cores are precluded by firmware from performing any active processing.
- the computer system 2 continues to perform the same overall process after the migration (except by different cores) as was occurring before the migration.
- the process then advances to a step 196 at which the memory migration engine(s) are programmed to remove all references to cell 0 cores from both the CC directory (more particularly, the CC filter tag cache or “RTAG”, which is designed to track remote ownership of memory lines) and the MC directory.
- the cache entries in the cores of the cell 0 are cleared and copied back to the appropriate memory segments 26 - 36 to prevent any loss of data, and the directories are updated to reflect the cleared caches.
- the directory entries on the cell 0 are modified to reflect the removal of the partition memory of the cell 1 (e.g., the memory segments 32 - 36 ) so that the directories include only the entries for the partition memory of the cell 0 (e.g., the memory segments 26 - 30 ).
- the TADs 106 in the MCs 88 - 94 are not re-programmed, as the bank addresses of the memory segments 26 - 30 on the cell 0 are not affected by the removal of the partition memory of the cell 1 .
- the MTCs 140 on the CCs 108 , 112 , 116 and 120 of the cell 0 are re-programmed.
- the MTCs 140 are responsible for translating global/fabric addresses received off of the fabric 8 into local physical addresses for accessing memory locations when the computer system 2 is operating according to the agent access memory configuration.
- the MTCs 140 are used only for accessing I/O systems and any possible remaining remote cores, and all of the memory request signals are transferred only to the MCs 88 - 94 of the cell 0 .
- Because the MTCs 140 are pre-programmed for operation according to each of the agent access memory configuration and the direct access memory configuration, at the step 198 the MTCs 140 are re-programmed (with the aid of the firmware) to convert their configuration to the direct access memory configuration, such that the MTCs no longer operate to translate global/fabric addresses into local addresses and the fabric 8 is no longer utilized for memory accesses.
- this re-programming can involve updating by firmware of a particular bit per entry within the MTCs 140 .
- the SADs 78 on each of the cores 46 - 60 on the cell 0 are re-programmed.
- the SADs 78 are pre-programmed for operation in both the agent access memory configuration and the direct access memory configuration.
- the firmware re-programs the SADs 78 on each of the cores 46 - 60 of the cell 0 to route all memory requests directly to one of the MCs 88 - 94 on the same cell rather than to any of the PAs 18 - 24 .
- in comparison with the process of FIG. 2 , the process of FIG. 3 in the present embodiment involves each of the programming of the memory migration engine, the updating of the cache coherency controllers, and the re-programming of the MTCs 140 and the SADs 78 when the cores 46 - 60 of the cell 0 have been migrated to the cell 1 .
- the cores (that is, the states of the cores) are migrated back from the cell 1 to the cell 0 , one at a time, to resume all active processing in the newly-determined direct access memory configuration.
- the directory caches (particularly the RTAGs mentioned above) in the CCs of the cell 0 are updated to prevent any memory accesses to the partition memory of the cell 1 (e.g., the memory segments 32 - 36 ).
- each such directory cache is assigned a subset of memory segments (home segments) and is responsible for tracking ownership of the memory segments other than its own home segments.
- the directory caches in the CCs of the cell 0 are cleared to prevent any further snoops from being issued to cores on cell 1 for local direct access memory.
- all the remaining resources, such as the interrupts and input output systems, of the cell 1 are moved to the cell 0 . From this point onwards, the cell 0 becomes responsible for handling all those resources.
- the cell 1 is then removed from the partition and is no longer exposed to the operating system. Before this point, even though an on-line deletion operation to remove the cell 1 had been performed, the cell 1 was available to aid the transition from the agent access memory configuration to the direct access memory configuration. Depending upon the embodiment or circumstance, the removed cell 1 can be completely removed from the computer system 2 or it can exist within the computer system 2 as an unassigned resource that can again be added to any partition within that computer system at any point. After removal of the cell 1 from the partition, only cell 0 is left behind, which accesses the partition memory according to the direct access memory configuration. The process of FIG. 3 then ends at a step 210 .
Description
- The present invention relates to computer systems and, more particularly, relates to systems and methods within computer systems that govern the accessing of memory.
- Given the high processing speeds that can be achieved by today's computer processing units, memory access speed has become a limiting factor in many computer systems. In order to reduce memory latency and avoid “hot spots” in which certain memory resources are overly taxed, many computer systems employ a shared memory system in which the memory is divided into multiple blocks, and where multiple processing units are allowed to access the same blocks of the memory at different or even substantially the same times. In some such computer systems, each block of memory is controlled by a respective memory controller that is capable of communicating with multiple processing units of the computer system.
- Some computer systems employ sockets that each have multiple processing units and, in addition, also typically each have their own respective memory controllers that manage blocks of memory capable of being accessed by one or more of the processing units of the respective sockets. To reduce memory latency in some such systems, processing units located on a given socket may be able to access memory blocks controlled by memory controllers located on other sockets. Such operation, in which one socket directly accesses the memory resources of another socket, is commonly referred to as “memory interleaving”, and systems employing such interleaving capability are commonly referred to as non-uniform memory access (NUMA) systems.
- Yet the degree to which memory interleaving can be effectively implemented in conventional computer systems is limited. Memory interleaving as described above is typically restricted to small numbers of sockets, for example, to four sockets or fewer. To achieve systems having larger numbers of sockets that are capable of accessing each other's memory resources, the memory controllers of the sockets cannot be directly connected to the processing units of other sockets but rather typically need to be connected by way of processor agents. The implementation of such systems employing processor agents, however, tends to be complicated and inefficient, both in terms of the operation of the processor agents and in terms of the extra burdens placed upon the operating system and applications running on such systems. For example, in such systems it is desirable that the operating system/applications be capable of adapting to changes in the memory architecture to avoid inefficient operation, something which is often difficult to achieve.
- Additionally, it is increasingly desired that computer systems be scalable and otherwise adjustable in terms of their sockets (e.g., in terms of processing power and memory). For example, it may in one circumstance be desirable that a computer system utilize only a small number of sockets but in another circumstance become desirable or necessary that the computer system be modified to utilize a larger number of sockets. As the computer system is modified to include larger or smaller numbers of sockets, a given manner of interleaving suited for either smaller or larger numbers of sockets may become more or less effective. Again for example, supposing that such a computer system employs the manner of interleaving described above as involving direct contact among four or fewer sockets (not involving processor agents, sometimes referred to as glueless), the computer system's memory access performance may vary significantly as the computer system is modified between utilizing four or fewer sockets and greater than four sockets.
- For at least these reasons, it would be advantageous if an improved system and method for achieving enhanced memory access capabilities in computer systems could be developed. More particularly, it would be advantageous if, in at least some embodiments, such a system and method enabled enhanced memory interleave capabilities in computer systems having large numbers of sockets with multiple processors and memory controllers, such that the processors of the various sockets could access different memory blocks controlled by memory controllers of other sockets in a manner that, in comparison with conventional systems, reduced memory latency and/or the occurrence of “hot spots”. Additionally, it would be advantageous if, in at least some embodiments, such a system and method was capable of achieving satisfactory levels of memory interleave capabilities even where the number and/or type of system resources such as processors and memory devices being utilized by the system varied during system operation.
- In at least some embodiments, the present invention relates to a method of operating a computer system. The method includes operating a first cell of the computer system in accordance with a first memory access configuration, and migrating a first attribute of a first core of the first cell to a second cell of the computer system. The method additionally includes configuring a first portion of the first cell so that the first cell is capable of operating in accordance with a second memory access configuration, and migrating at least one of the first attribute and a second attribute from the second cell back to the first core of the first cell, whereby the first cell subsequently operates in accordance with the second memory access configuration.
- Further, in at least some embodiments, the present invention relates to a method of operating a computer system. The method includes, at a first time, operating the computer system in accordance with an agent access memory configuration in which a first core of a first cell of the computer system communicates with a first memory controller of a second cell of the computer system by way of at least one processor agent and a fabric. The method additionally includes, at a second time that either precedes or follows the first time, operating the computer system in accordance with a direct access memory configuration in which the first core of the first cell of the computer system communicates with a second memory controller of the first cell, not by way of the at least one processor agent and not by way of the fabric. The method further includes performing a transitioning procedure by which the computer system switches between the operating of the computer system in accordance with the agent access memory configuration and the operating of the computer system in accordance with the direct access memory configuration.
- Additionally, in at least some embodiments, the present invention relates to a computer system. The computer system includes a first core on a first cell, and first and second memory controllers governing first and second memory segments, respectively, where the first memory controller is also implemented as part of the first cell. The computer system further includes a fabric, and at least one processor agent coupled at least indirectly to the first core, at least indirectly to the second memory controller, and to the fabric. When the computer system operates in accordance with an agent access memory configuration, the first core communicates with the second memory controller by way of the at least one processor agent and the fabric. Also, when the computer system operates in accordance with a direct access memory configuration, the first core communicates with the first memory controller independently of the at least one processor agent and the fabric.
- FIG. 1 shows in schematic form components of an exemplary computer system divided into multiple partitions that are linked by a fabric, where each partition includes memory blocks and multiple sockets with multiple processing units and memory controllers, and where the multiple sockets are capable of sharing and accessing, via communication links, the various memory blocks, in accordance with one embodiment of the present invention;
- FIG. 2 is a flow chart showing exemplary steps of operation, which in particular relate to dynamically converting a direct access memory configuration of the computer system of FIG. 1 to an agent access memory configuration in accordance with one embodiment of the present invention; and
- FIG. 3 is a flow chart showing exemplary steps of operation, which in particular relate to dynamically converting an agent access memory configuration of the computer system of FIG. 1 to a direct access memory configuration in accordance with one embodiment of the present invention.
- Referring to FIG. 1, components of an exemplary multiprocessor computer system 2, divided into multiple partitions, are shown in a simplified schematic form, in accordance with at least one embodiment of the present invention. As shown, the computer system 2 in the present embodiment in particular includes two partitions, namely, a first cell 4 and a second cell 6, as well as a fabric 8 to facilitate communication between those two cells. The two cells 4, 6 can be understood to be formed on two separate printed circuit boards that can be plugged into, and connected by, a backplane (on which is formed, or to which is coupled, the fabric 8). Although the computer system 2 of the present embodiment includes only the first and second cells 4 and 6, it is nevertheless intended to be representative of a wide variety of computer systems having an arbitrary number of cells and/or circuit boards. For example, in other embodiments, only a single cell or more than two cells are present.
- In at least some embodiments, the
computer system 2 can be an sx1000 super scalable processor chipset available from the Hewlett-Packard Company of Palo Alto, Calif., on which are deployed hard partitions (also known as "nPars") formed by the cells 4, 6. Hard partitions formed by the cells 4, 6 allow the resources of a single server to be divided among many enterprise workloads and to provide different operating environments (e.g., HP-UX, Linux, Microsoft Windows Server 2003, OpenVMS) simultaneously. Such hard partitions also allow computer resources to be dynamically reallocated. Although the computer system 2 can be the super scalable processor chipset mentioned above, it need not be such a chipset and instead in other embodiments can take a variety of other forms.
- Each of the
cells 4, 6 is capable of supporting a wide variety of hardware and software components. More particularly as shown, each of the cells 4, 6 includes a respective pair of sockets, namely, sockets 10 and 12 on the first cell 4 and sockets 14 and 16 on the second cell 6. Additionally, main memory of the cells 4, 6 is divided into multiple memory segments, including memory segments or blocks 26, 28, 30 on the first cell 4 and memory segments or blocks 32, 34, 36 on the second cell 6. Additionally, each of the cells 4, 6 includes a respective pair of processor agents (PAs), namely, PAs 18 and 20 on the first cell 4 and PAs 22 and 24 on the second cell 6. In other embodiments, one or both of the cells 4, 6 can also include other components not shown, for example, input/output systems and power management controllers. As will be discussed in further detail below, the computer system 2 is capable of supporting different types of memory access configurations for accessing the various multiple memory segments 26-36.
- With respect to the sockets 10-16, they serve as a platform for supporting multiple hardware components. These hardware components include respective sets of cores or
processing units 38, 40, 42, 44 on each respective socket, respective pairs of memory controllers (MCs) 88 and 90, 92 and 94, 96 and 98, and 100 and 102 on each respective socket, and respective switches 80, 82, 84 and 86 on each respective socket. With respect to the sets of cores 38, 40, 42, 44 on each respective socket 10, 12, 14, 16 in particular, the socket 10 includes four cores 46, 48, 50, 52, the socket 12 includes four cores 54, 56, 58, 60, the socket 14 includes four cores 62, 64, 66, 68, and the socket 16 includes four cores 70, 72, 74, 76. Notwithstanding the fact that, in the present embodiment, each of the sockets 10, 12, 14 and 16 has four cores, the present invention is intended to encompass a variety of other embodiments of sockets having other numbers of cores, such as sockets having fewer than four cores (or even only a single core) or more than four cores.
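For illustration only, the topology just enumerated (two cells, two sockets per cell, and four cores, two MCs and one switch per socket) can be captured as plain data using the reference numerals of FIG. 1. The TOPOLOGY mapping and the cell_of_core helper below are hypothetical constructs introduced for this sketch, not part of the patent.

```python
# Hypothetical encoding of the FIG. 1 topology, keyed by reference numerals.
TOPOLOGY = {
    "cell 4": {
        "socket 10": {"cores": [46, 48, 50, 52], "mcs": [88, 90], "switch": 80},
        "socket 12": {"cores": [54, 56, 58, 60], "mcs": [92, 94], "switch": 82},
    },
    "cell 6": {
        "socket 14": {"cores": [62, 64, 66, 68], "mcs": [96, 98], "switch": 84},
        "socket 16": {"cores": [70, 72, 74, 76], "mcs": [100, 102], "switch": 86},
    },
}

def cell_of_core(core):
    """Return the cell on which a given core resides."""
    for cell, sockets in TOPOLOGY.items():
        if any(core in s["cores"] for s in sockets.values()):
            return cell
    raise KeyError(core)
```

A lookup of this kind underlies the local-versus-remote distinction drawn later in the description: an access is local precisely when the requesting core and the governing MC resolve to the same cell.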
switch 80 allows for the routing of communications from and to any of the cores 46-52 and 88, 90 on theMCs socket 10, theswitch 82 allows for the routing of communications from and to any of the cores 54-60 and 92, 94 on theMCs socket 12, theswitch 84 allows for the routing of communications from and to any of the cores 62-68 and 96, 98 on theMCs socket 14, and theswitch 86 allows for the routing of communications from and to any of the cores 70-76 and 100, 102 on theMCs socket 16. Additionally, each of the switches 80-86 also allows for the routing of communications to and from the respective socket 10-16, on which the respective switch is mounted, from and to the respective pairs of 18, 20 or 22, 24 of thePAs 4 or 6, respectively, on which the switch is mounted. That is, each of thecells 80, 82 is capable of directly communicating with each of theswitches 18, 20 as shown by dashedPAs 81, 85, 87 and 89, while each of thepaths 84, 86 is capable of directly communicating with each of theswitches 22, 24 as shown by dashedPAs 91, 93, 95 and 99. Further, the switches located on a cell are capable of communicating with each other as well. For example, thepaths 80, 82 can communicate with each other as shown by aswitches dashed path 83 and the 84, 86 can also communicate with each other as shown by aswitches dashed path 97. - Typically, the cores 46-76 of the sets of cores 38-44 located on the sockets 10-16 respectively are chips that are coupled to their respective sockets by way of electrical connectors, and are intended to be representative of a wide variety of central processing units. For example, in the present embodiment, the cores 46-76 are Itanium processing units as are available from the Intel Corporation of Santa Clara, Calif. In other embodiments, one or more of the cores 38-44 can take other forms including, for example, Xeon, Celeron and Sempron. 
In alternate embodiments, one or more of the cores can be another type of processing unit other than those mentioned above. Different cores on a given socket, on different sockets, and/or on different cells need not be the same but rather can differ from one another in terms of their types, models, or functional characteristics.
- In other embodiments, one or more of the sockets 10-16 can include components other than or in addition to those mentioned above. Also, notwithstanding the fact that the present embodiment has two sockets on each of the first and
second cells 4 and 6, respectively, one or more cells in other embodiments can either have a single socket or possibly more than two as well. In many embodiments, the number of sockets will exceed (possibly even greatly exceed) the number of sockets shown in FIG. 1. For example, in some embodiments, there could be up to 64 different sockets. The present embodiment, with its limited numbers of cells 4, 6 and sockets 10, 12, 14 and 16, is provided as an exemplary embodiment of the present invention due to the ease with which it can be described and illustrated.
- Internally, each of the cores of the sets of cores 38-44 in the present embodiment includes a variety of hardware and software components capable of supporting a wide variety of applications as well as tasks relating to the management of the various hardware and software components present on the cores as adapted in accordance with various embodiments of the present invention. More particularly, each of the cores includes a cache memory (not shown), which is smaller and faster in operation than the memory segments 26-36 of main memory discussed above, and which is capable of storing blocks of frequently used data accessed from the main memory in order to reduce the impact of memory latency that occurs when accessing the main memory (discussed in more detail below with regard to
FIG. 2). Notwithstanding the presence of the cache memories, one or more of the memory segments 26-36 of the main memory still need to be accessed on a regular basis, for example, upon failures to locate requested information in the cache memories.
- Further as shown, each of the cores 46-76 has a respective logic block referred to as a Source Address Decoder (SAD) 78. Depending upon the embodiment, the
SADs 78 can be implemented as hardware components and/or can reside as software. In the present embodiment, each of the SADs 78 is pre-programmed to direct memory request signals to the MCs 88-102 either indirectly via the PAs 18-24 or directly (not via the PAs), depending upon a memory configuration of the computer system 2, as discussed in more detail below. Conversely, signals returning from the MCs 88-102 (either indirectly via the PAs 18-24 or more directly from the MCs 88-102) are processed in the SADs 78 for receipt by the cores 46-76. Typically, the SADs 78 associated with any of the cores 46-60 of the first cell 4 will only send requests to the PAs 18, 20 or the MCs 88-94 of that cell, while the SADs 78 associated with any of the cores 62-76 of the second cell 6 will only send requests to the PAs 22, 24 or the MCs 96-102 of that cell.
fabric 8. Typically, when operating in accordance with this memory configuration, all (or substantially all) such signals between cores and MCs/memory segments are routed in such indirect manner regardless of the relative proximity of the cores and MCs/memory segments. In contrast, in a second, “direct access” memory configuration, the routing of signals depends upon the relative locations of the cores 46-76 and the MCs 88-102 (and associated memory segments). - More particularly, in the direct access memory configuration, signals from respective ones of the cores 46-76 that are directed to respective ones of the MCs 88-102 that are located on the same respective cells are routed directly to those respective MCs via the respective switches 80-86 of the respective sockets of the respective cells, without being routed to any of the PAs 18-24 or routed via the fabric. Such memory accesses where the requesting core and the intended MC (and desired memory segment) are positioned on the same cell can be referred to as local memory accesses. However, signals from respective ones of the cores 46-76 that are directed to respective ones of the MCs 88-102 that are located on different cells are routed indirectly via appropriate ones of the switches 80-86 and appropriate ones of the PAs 18-24 through the
fabric 8, in the same manner as signals are routed according to the agent access memory configuration. Such memory accesses where the requesting core and the intended MC (and desired memory segment) are positioned on different cells can be referred to as remote memory accesses. Generally speaking, although operation in the direct access memory configuration involves some signals being communicated indirectly by way of one or more of the PAs 18-24 and thefabric 8, typically as many signals as possible are routed in a manner that does not involve any of the PAs or the fabric. - Further as shown, in the present embodiment all input/output (I/O) subsystems are accessed by way of the
fabric 8. Therefore, while the paths of signals communicated between the cores 46-76 and the MCs 88-102 as determined by theSADs 78 can vary depending upon the memory access configuration, in the present embodiment all signals communicated by the cores 46-76 that are intended for receipt by I/O subsystems are routed via the respective switches 80-86, appropriate ones of the PAs 18-24 and the fabric, irrespective of the memory configuration. In alternate embodiments, it is possible that the I/O subsystems will be coupled to other devices/structures of thecomputer system 2, such as directly to the PAs 18-24 themselves. - As for the MCs 88-102, these are responsible for managing and accessing the various memory segments 26-36 in response to read/write requests received from the cores 46-76, and for relaying signals back from those memory segments to the cores, as described in further detail below. The MCs 88-102 can be hardware chips such as application specific integrated circuit (ASICs) that are connected to the sockets 10-16 by way of electrical connectors. In other embodiments, one or more of the MCs 88-102 can be other type(s) of MCs. Additionally, while in the present embodiment two of the MCs 88-102 are provided on each of the sockets 10-16, the number of MCs per socket can vary in other embodiments (e.g., there can be only a single MC on each socket or possibly more than two as well).
- Further as shown, each of the MCs 88-102 includes a respective logic block referred to as a Target Address Decoder (TAD) 106. As will be described in further detail below, the
TADs 106 process signals arriving from the cores 46-76 and determine how to convert between (e.g., decode) memory address information received in those signals and memory locations within the memory segments 26-36. The TADs 106 also facilitate the return of information from the memory segments 26-36 back toward the cores 46-76. In the present embodiment, each of the TADs 106 can be implemented in either hardware or software, and is pre-programmed to convert between memory bank addresses and memory locations inside the memory segments 26-36.
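As a hedged illustration of the TAD's decoding role, the sketch below maps an incoming address to a (memory segment, offset) pair. The segment base addresses and the tad_decode helper are invented for the example; the patent does not specify any actual address map.

```python
# Hypothetical address map: base addresses are invented for illustration.
SEGMENT_BASES = [  # (base address, memory segment numeral), ascending order
    (0x0000, 26),
    (0x4000, 28),
    (0x8000, 30),
]

def tad_decode(addr):
    """Sketch of a TAD 106 lookup: physical address -> (segment, offset)."""
    # Scan from the highest base downward and take the first base at or
    # below the requested address.
    for base, segment in reversed(SEGMENT_BASES):
        if addr >= base:
            return segment, addr - base
    raise ValueError("unmapped address: %#x" % addr)
```

The reverse direction (converting a segment-relative location back into an address carried toward the cores) would apply the same table in the opposite sense.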
FIG. 1 . The particular manner of sub-division of the main memory into multiple memory segments can vary depending upon the embodiment and upon various factors, for example, the requirements of the applications running on the cores 46-76. In at least some embodiments, the memory segments 26-36 are organized as dual in-line memory modules (DIMMs) that are respectively connected to one or more of the MCs 88-102 by way of electrical connectors. More particularly, in the present embodiment, the 26 and 30 are controlled by thememory segment 88 and 94, respectively, while theMCs memory segment 28 is controlled by both theMC 90 and theMC 92. Similarly, the 32 and 36 are respectively controlled by thememory segments 96 and 102, respectively, while theMCs memory segment 34 is managed by both the 98 and 100.MC - Exemplary communications among the various components of the
computer system 2 as occurs in the aforementioned different agent access and direct access memory configurations, as well as between the cores of the computer system and the I/O subsystems, are illustrated inFIG. 1 by several 144, 146, 148, 150, 152, 154, 155, 156 and 157. In general, those of the communication paths that proceed between the sockets 10-16 and the PAs 18-24 follow the connections provided by the dashedexemplary communication paths 81, 85, 87, 89, 91, 93, 95 and 99 connecting the switches 80-86 with the PAs 18-24, and those of the communication paths that proceed between neighboring sockets of the same cell follow the connections provided by the dashedcommunication paths 83, 97. This should be understood to be the case even though thecommunication paths 144, 146, 148, 150, 152, 154, 155, 156 and 157 for clarity of illustration are not shown to directly overlap the dashed communication paths.communication paths - As already noted above, in the present embodiment, all communications between the cores 46-76 and the I/O subsystems occur by way of the PAs 18-24 and the
fabric 8, as represented by theexemplary communication path 157. As for memory request signals, the 144, 146, 148, 150, 152, 154, 155, and 156 illustrate several communication paths some of which are illustrative of direct access memory configuration communications, some of which are illustrative of agent access memory configuration communications, and some of which are consistent with both agent access and direct access memory configuration communications. To begin, thecommunication paths 150, 152 and 155 show three exemplary signal paths that can be taken when thecommunication paths computer system 2 is operating in the direct access memory configuration and when cores are accessing local memory segments. Thecommunication path 150 in particular shows the core 62 accessing thememory segment 32 via theMC 96 and theswitch 84. Additionally, thecommunication path 152 shows the core 70 accessing thememory segment 36 via theMC 102 and theswitch 86, while thecommunication path 155 shows the core 56 accessing thememory segment 28 by way of theMC 90 and switches 80 and 82 (and the communication path 83). - In contrast, the
148, 154 and 156 respectively show three exemplary signal paths that can be taken when thecommunication paths computer system 2 is operating in the agent access memory configuration and when cores are accessing local memory segments. Although these 148, 154 and 156 respectively connect the same cores and MCs as thecommunication paths 150, 152 and 155, respectively, these communication paths proceed via certain of the PAs 18-24 and via thecommunication paths fabric 8. In particular, thecommunication path 148 like thecommunication path 150 shows the core 62 accessing thememory segment 32 via theMC 96, except in this case the communication path proceeds via theswitch 84, thePA 24 and thefabric 8. Additionally, the communication path 154 like thecommunication path 152 shows the core 70 accessing thememory segment 36 by way of theMC 102, except insofar as in this case the communication path proceeds via theswitch 86, thePA 24 and thefabric 8. Further, thecommunication path 155 like thecommunication path 156 shows the core 56 accessing thememory segment 28 via theMC 90, except in this case the communication path proceeds via the 80 and 82, theswitches PA 20 and thefabric 8. - As indicated by each exemplary scenario represented by the
148, 154 and 156, when operating in the agent access memory configuration, a memory request concerning a memory location sent by a given core on a given socket of a given cell is directed first (by way of the switch of the socket) to a first PA that is on the same cell as the requesting core. The first PA in response provides a signal to thepaths fabric 8, which then directs the signal to a second PA that is on the same cell as the MC governing the memory segment on which is located the requested memory location. The second PA in turn provides a signal to that MC, which results in the desired accessing of the requested memory location. It should thus be evident that the accessing of memory in this mode of operation occurs by way of the fabric and two PAs, or possibly by way of the fabric and only one PA where the requesting core and MC governing the requested memory location are on the same cell. - By comparison, when operating in the direct access memory configuration accessing local memory segments, and as indicated by each exemplary scenario represented by the
148, 150 and 152, a memory request concerning a memory location sent by a respective core on a respective socket on a respective cell is routed directly to a MC on the same socket or another socket of the same respective cell merely by way of the appropriate switch(es) of the socket(s), without being routed through any of the PAs 18-24 or thepaths fabric 8. The MC then provides the appropriate result by way of accessing the requested memory location and directly providing a response via the switches without any routing through any PAs or the fabric. It should be further evident from the 148, 150 and 152 shown inparticular paths FIG. 1 that the direct access memory configuration can involve communications between cores and MCs that are on the same socket (as indicated by thepaths 148 and 152), as well as cores and MCs that are on the same cell but not the same socket (as indicated by the path 150). - In addition to the accessing of local memory segments, it is also possible when operating in the direct access memory configuration to conduct remote memory accesses where the request originating core and the desired memory segment are located on different cells. In such cases, the manner of accessing the memory segments is the same with respect to operation in both the direct access and agent access memory configurations. For example, as shown in
FIG. 1 , thecommunication path 144 is employed to allow the core 62 to access thememory segment 26 by way of theMC 88 via the 80 and 84, theswitches 18 and 22, and thePAs fabric 8 when thecomputer system 2 is operating in each of the direct access memory configuration and the agent access memory configuration. Likewise, thecommunication path 146 is employed to allow the core 64 to access thememory segment 30 by way of theMC 94 via the 82 and 84, theswitches 20 and 22, and thePAs fabric 8 when thecomputer system 2 is operating in each of the two memory configurations. - With respect to the
fabric 8 in particular, it is a hardware device formed as part of (or connected to) the backplane of the computer system 2. As discussed above, all requests and other messages to and from any of the cores with respect to the I/O subsystems are communicated via the fabric 8. Also as discussed above, all requests by any of the cores to access any of the MCs and associated memory segments when operating in the agent access memory configuration are directed through the fabric 8, irrespective of the location of the memory segment relative to the originating core (and even if the core and the appropriate MC for controlling that memory segment are on the same cell and/or the same socket).
fabric 8, both when thecomputer system 2 is operating in the agent access memory configuration and also when the computer system is operating in the direct access memory configuration to perform remote memory accesses. However, communications between the cores and MCs do not pass via thefabric 8 during operation according to the direct access memory configuration when accessing a local memory segment. To allow for the above-described operation and usage of thefabric 8, each of the 4 and 6 are connected to thecells fabric 8 during configuration, when the cells are installed on thecomputer system 2. As will be described further below, signals communicated onto thefabric 8 must take on fabric (or global) addresses that differ from the physical addresses employed by the signals when outside of the fabric. - As for the PAs 18-24, each of the PAs 18-24 can be an integrated circuit (IC) chip albeit, in other embodiments, one or more of the PAs 18-24 can take other form(s). As already indicated above, the PAs 18-24 form an intermediary by which signals directed from the cores 46-76 and/or the MCs 88-102 by way of the switches 80-86 are provided to the
fabric 8, and vice-versa. Also, although the present embodiment envisions the I/O subsystems as being coupled to thefabric 8, in alternate embodiments, the PAs 18-24 can be directly coupled to one or more of the I/O subsystems rather than by way of paths such as the path 154 ofFIG. 1 that passes through the fabric. - More particularly as shown in
FIG. 1, each of the PAs 18-24 has located thereon two coherency controllers (CCs), namely, CCs 108 and 112 on the PA 18, CCs 116 and 120 on the PA 20, CCs 124 and 128 on the PA 22, and CCs 132 and 136 on the PA 24. In addition to the CCs, each of the PAs 18-24 also has located thereon two caching agents (CAs), namely, CAs 110 and 114 on the PA 18, CAs 118 and 122 on the PA 20, CAs 126 and 130 on the PA 22, and CAs 134 and 138 on the PA 24. As illustrated by the communication paths 144, 146, 148, 154 and 156, the CCs 108, 112, 116, 120, 124, 128, 132 and 136 in particular process signals that are being directed toward, or received from, the MCs 88-102 (via the switches 80-86). In contrast, the CAs 110, 114, 118, 122, 126, 130, 134 and 138 process signals that are being received from, or directed toward, the cores 46-76 (via the switches 80-86).
- In the
computer system 2, the CCs and CAs serve several purposes. To begin with, the CCs are particularly responsible for resolving coherency conflicts within the computer system 2 relating to the accessing of the memory segments 26-36 by way of the MCs 88-102 in an agent access memory configuration or in collaboration with the MCs 88-102 in a direct access memory configuration. Conflicts can arise since, in addition to residing within home memory segments, more recent copies of memory locations can also be resident within one or more local cache memories of the cores 46-76. To reduce or eliminate such conflicts and maintain a consistent, coherent view of main memory, the CCs employ a directory based cache coherency control protocol, which is described further below. Although such a coherency protocol can be employed, it should be understood that in alternate embodiments other coherency protocols can be used including, for example, invalidate protocols such as MESI and update protocols such as the snooping protocol. When the computer system 2 is operating in a direct access memory configuration in particular, the sockets 10-16 also support a coherency protocol, which tracks ownership in both local cores and the CCs (which track ownership for the rest of the cores in the partition). The protocol used by the sockets can be, for example, a directory-based protocol or a snooping-based protocol. - In the present embodiment in which the CCs employ a directory based cache coherency protocol, each of the CCs maintains a directory (for example, a table) for each memory location of the main memory. Each row of the directory of a given CC includes information indicating which of the memory segments 26-36 has ownership (the home memory segment) of each memory location, as well as information indicating which of the cores 46-76 has the most updated copy of that location. Each location of the directory can be accessed by a subset of the address bits. 
By searching through its directory, the given CC can also determine if alternate updated copies of that memory location exist within another one or more of the cores. If so, asynchronous signals or “snoops” can be issued to the core holding the updated copy of the memory location for retrieval, thus resulting in the returning of the most updated copy of the memory location in response to a read/write request.
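The read flow just described, in which the directory is consulted and a snoop retrieves any newer cached copy, can be sketched as follows. This is a hypothetical model, not the patent's implementation; all names are illustrative:

```python
# Sketch of directory-based snoop resolution (hypothetical model). If the
# directory shows an updated cached copy in some core, a snoop retrieves it;
# otherwise the home memory segment supplies the data.

def resolve_read(directory, address, segments, core_caches):
    entry = directory.get(address)
    if entry is None:
        return None
    owner = entry.get("owner_core")
    if owner is not None:
        # Issue a snoop: the owning core returns its most updated copy.
        return core_caches[owner][address]
    # No cached owner: read from the home memory segment.
    return segments[entry["home_segment"]][address]


segments = {30: {0x2A0: "stale"}}
core_caches = {52: {0x2A0: "fresh"}}
directory = {0x2A0: {"home_segment": 30, "owner_core": 52}}
value = resolve_read(directory, 0x2A0, segments, core_caches)
```

With an owner recorded, the snooped copy wins over the home segment's contents, which is the behavior the paragraph above requires for coherent reads.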
- As shown, each of the
CCs 108, 112, 116, 120, 124, 128, 132, and 136 in particular includes an instance of a Memory Translation CAM (MTC) 140, which can be implemented as a pre-programmed logic block. The respective MTC 140 of each CC is responsible for converting fabric addresses into local physical addresses for every memory access signal received by the respective MTC off of the fabric 8 when the computer system 2 is operating according to the agent access memory configuration, and also for performing such conversions from fabric addresses to local physical addresses with respect to every remote memory access signal received off of the fabric when the computer system is operating in the direct access memory configuration. These local physical addresses can then be used by the MCs 88-102 that are in communication with the respective CCs (e.g., to retrieve the information from the requested memory location). - Also, the
MTCs 140 of the CCs are used to determine the coherency flow of the received requests. To issue snoops, a global address routed via the fabric 8 can be converted to a local physical address by way of one of the MTCs 140. Each MTC 140 also gives an indication about whether a given memory address corresponds to part of a direct access block of memory or part of an agent access block of memory. If the line is part of an agent access block of memory, then the corresponding CC will issue snoops (if required, as determined by the local directory) to both cores on the cell with which it is associated as well as to cores on different cells. If the line is part of a direct access block of memory, then the CC will only issue snoops to cores on remote cells (if required, as determined by the local directory). - In addition to the CCs, each of the PAs 18-24 also has located thereon two caching agents (CAs), namely,
CAs 110 and 114 on the PA 18, CAs 118 and 122 on the PA 20, CAs 126 and 130 on the PA 22, and CAs 134 and 138 on the PA 24. With respect to the CAs 110, 114, 118, 122, 126, 130, 134 and 138, these are intended to perform several functions when the computer system is in the agent access memory configuration. To begin, in the present embodiment, the CAs are responsible for executing the coherency flow determined by the CCs (e.g., by executing the snoops issued by the CCs). Additionally, the CAs perform address abstraction for signals routed off via the fabric 8, by which local physical addresses referenced in signals received from the cores 46-76 are converted into fabric (global) addresses appropriate for the fabric 8, and vice-versa. In other embodiments, one or more of the CAs 110, 114, 118, 122, 126, 130, 134 and 138 can be programmed to perform other functions than those mentioned above. - More particularly with respect to the performing of address abstraction, each of the
CAs 110, 114, 118, 122, 126, 130, 134 and 138 includes a respective Fabric Abstraction Block (FAB) 142. When the computer system 2 is operating according to the agent access memory configuration or accessing remote memory segments when in the direct access memory configuration, each respective FAB 142 allows its respective CA to convert local physical addresses such as those arriving on memory request signals from the cores 46-76 (via the switches 80-86) into fabric (global) addresses suitable for determining where the signals are sent within the fabric 8 for accessing a memory location. The FABs 142 can operate in a variety of ways to perform these conversions and, in the present embodiment, employ interleaving algorithms. In the present embodiment, each of the CAs 110, 114, 118, 122, 126, 130, 134 and 138 is pre-programmed for managing a subset of the main memory (e.g., certain subsets of the memory segments 26-36). The allocation of memory to the different CAs is known to the SADs 78 of the cores 46-76, such that the SADs are able to route the memory request signals to the appropriate CAs based upon the memory locations requested by the cores. - When the
computer system 2 is operating according to the agent access memory configuration or is operating to access a remote memory segment when in the direct access memory configuration, the signals communicated between the cores 46-76 and the memory controllers 88-102 undergo several conversions as they proceed via the switches 80-86, the PAs 18-24, and the fabric 8. More particularly, a signal sent by one of the cores 46-76 undergoes a first conversion by the SAD 78 of the core, which results in the signal being communicated by the appropriate one of the switches 80-86 to an appropriate one of the PAs 18-24. Upon the signal being received at the appropriate one of the PAs 18-24, the FAB 142 of one of the CAs of the PA converts the signal into a signal appropriate for transmission over the fabric 8. As indicated above, this conversion at least in part involves a conversion of a physical memory address to a fabric address. After being transmitted through the fabric 8, the signal then arrives at another one of the PAs 18-24 (or potentially the same PA handling the signal before it entered the fabric), where the MTC 140 of one of the CCs of the PA converts the fabric address back into a physical memory address. Finally, upon passing from that PA via another one of the switches 80-86 (or potentially the same switch as before) and arriving at an appropriate one of the MCs, the TAD 106 of that MC further converts the signal so that the desired memory location in main memory is accessed. Similar conversion processes occur when signals proceed in the opposite direction from the memory to the cores. - Although not necessarily the case, it is nonetheless often the case in the agent access memory configuration that the local physical address generated by the
MTC 140 on the destination end of a given request differs from the local physical address generated by the SAD 78 on the request-originating end. That is, the address sent to an MC as part of a memory read or write request is not necessarily the same physical address that was generated by the core that made the request. At the same time, while each memory location of the main memory can be referenced by way of a unique address, multiple locations within each of the memory segments 26-36 nevertheless can share the same address. As explained earlier, each of the memory segments 26-36 is a small, disjoint subset of the main memory. Consequently, the memory locations hosted within each of those memory segments can be accessed by using a smaller subset of the address that is used to access a location inside the main memory. Additionally, the MC view of the address cannot be used by that MC for coherency operations, as the modified address can access an incorrect location if applied to a core. - By comparison, when the
computer system 2 is operating according to the direct access memory configuration and accessing local memory, a somewhat less complicated set of conversions occurs as signals are provided between the cores 46-76 and the MCs 88-102 directly by way of the switches 80-86 (but not by way of the PAs 18-24 or the fabric 8). In such cases, the cores 46-60 on the cell 4 are at most only able to communicate with the MCs 88-94 on that cell, while the cores 62-76 on the cell 6 are at most only able to communicate with the MCs 96-102 on that cell. Under these circumstances, a signal sent by one of the cores 46-76 undergoes a conversion by the SAD 78 of the core, which results in the signal being communicated by the appropriate one (or possibly two) of the switches 80-86 to the TAD 106 of an appropriate one of the MCs 88-102. The TAD 106 of the appropriate MC in turn converts the signal so that the desired memory location in main memory is accessed. Further, the addresses generated by the cores making the requests are the same as the addresses sent to the MCs 88-102, thus enabling the coherency algorithms to issue snoops to the local cores for maintaining coherency within the computer system 2. - Operation of the
computer system 2 in the agent access memory configuration versus the direct access memory configuration as described above results in different operational attributes and advantages. More particularly, when the computer system 2 is operating according to the direct access memory configuration, only a low order of interleaving occurs (e.g., in which the various cores only have limited access to certain memory segments controlled by MCs on the same cells as those respective cores). While the memory latency associated with this manner of memory access is quite small, this manner of memory access is typically limited to small-configuration computer systems. That is, although the direct access memory configuration can be employed in any size partition (or computer system), in general the direct access memory configuration does not support interleaves greater than what the resources in a given socket support (for example, often interleaves across only up to 4 sockets are possible). - In comparison, when the
computer system 2 is operating according to the agent access memory configuration, a high level of interleaving occurs in which the core(s) of a given socket and cell potentially have access to many memory locations governed by many MCs of many different sockets and cells. As a result, performance of the computer system 2 is enhanced by reducing the frequency with which hot-spots are encountered in large-configuration computer systems such as those having 64 sockets. Although the communication of signals between the cores 46-76 and MCs 88-102 by way of the PAs 18-24 and fabric 8 is somewhat slower than can occur when signals are directly communicated between the cores and MCs (e.g., as in accordance with the direct access memory configuration), the overall or average level of memory access latency is still relatively low. Thus, the agent access memory configuration provides high bandwidth and scalability to large partition sizes, albeit with higher latency in small partitions than is the case using the direct access memory configuration. - While each of the direct access and agent access memory configurations has relative advantages, in accordance with embodiments of the present invention the
computer system 2 is capable of being operated so as to switch between the two memory configurations and thereby provide the advantages of both the agent access and the direct access memory configurations (at least at different times). More particularly, in accordance with these embodiments of the present invention, the computer system 2 is capable of dynamic re-sizing of memory in its partitions and, when that occurs, also capable of converting in its operation between the direct access memory configuration and the agent access memory configuration. By dynamically converting between the direct access and agent access memory configurations, a partition's memory performance can be optimized whenever new cells and/or new sockets are added to or removed from the existing partition within the computer system 2. Based upon the specific needs/types of the computer systems and the applications running on those computer systems, the configuration of memory access in particular can be changed to provide optimal performance in terms of minimizing hot spots and memory latency. - Turning to
FIGS. 2 and 3, flowcharts 158 and 184 respectively show exemplary steps of operation in which the computer system 2 of FIG. 1 is dynamically converted between the direct access memory configuration and the agent access memory configuration in response to (or in conjunction with) the conversion of the computer system between being a single cell system having only the first cell 4 and a dual cell system having both the first and second cells 4 and 6. The flowchart 158 of FIG. 2 in particular relates to the conversion of the partition from being a single cell system to a dual cell system, and to the corresponding conversion of the computer system 2 from operating in the direct access memory configuration to the agent access memory configuration, while the flowchart 184 of FIG. 3 relates to the opposite conversions. - It should further be understood that, although for simplicity
FIGS. 1-3 illustrate the computer system 2 that switches between the direct access and agent access memory configurations depending upon whether the computer system's partition employs one cell or two cells, in practice this transition typically occurs at a different threshold in terms of partition size, e.g., whether the partition includes four or fewer sockets or more than four sockets. Further, the addition/removal of a cell can, but need not necessarily, involve the physical coupling of a new cell with new memory resources (e.g., a memory card) to the computer system 2. Rather, in at least some circumstances, the addition of a cell to a given partition merely involves allocation of an idle cell to the given partition, or reallocation of a cell from another partition to the given partition. Likewise, in at least some circumstances, the removal of a cell from a given partition merely involves causing that cell to become idle, or reallocation of that cell to another partition. - Adding or removing cells can significantly vary the available memory. To enable efficient usage of all the memory blocks that are available to a partition after a conversion involving the addition or removal of one or more cells, all of the partition memory across all of the available sockets is re-interleaved. More particularly with respect to the conversion from the direct access memory configuration to the agent access memory configuration when a cell is added to a partition as represented by
FIG. 2, this process involves migrating the cores of the originally-existing cell to the newly-added cell and then back again such that, temporarily, all requests are directed through the PAs of both the old and the newly-added cell. A similar migration of cores is required also in the process of converting the agent access memory configuration to the direct access memory configuration when a cell is deleted from a partition as represented by FIG. 3. - Referring particularly to
FIG. 2, the process of adding a cell and converting from the direct access memory configuration to the agent access memory configuration starts at a step 160 and then proceeds to a step 162. At the step 162, a "new" cell is added to an existing partition containing an "old" cell. For example, in the present embodiment, the second cell 6 of FIG. 1, referred to as "cell 1" from hereon, is added to the existing single partition containing the first cell 4, referred to as "cell 0" from hereon. As already described above, the "new" cell 1 is a cell board that is similar in structure and characteristics to the existing cell 0, and in particular includes multiple cores, MCs and partitioned memory segments. In the present example, the cell 1 is initially an unassigned resource with its power shut off, and is available to be added to the existing partition containing the cell 0. However, as mentioned earlier, in other embodiments, the cell 1 can also initially co-exist in the computer system 2 as part of another existing partition, in which case the process of FIG. 2 involves removal of that cell from its initially-assigned partition and reassignment of that cell to the partition containing the cell 0. Further, in some embodiments, the cell 1 can be installed in the computer system 2 of FIG. 1 at run time. - As indicated by the
step 162, prior to integrating the cell 1 into the partition containing the cell 0, the cell 1 is subject to diagnostics testing in which all of the components present on that cell (e.g., memory, cores, switches, PAs, MCs, I/O-related resources and memory segments as well as possibly other components) are tested to ensure that all the components are in working condition and capable of performing their respective functions. Any flawed or non-working components are replaced prior to integrating the cell 1 with the cell 0 into the existing partition of the cell 0. After being tested for and demonstrating operability, the "new" cell 1 is then integrated with the fabric 8 of the existing partition to facilitate communication between the "old" cell, cell 0, and the cell 1, at a step 164. At this point, the cell 1 becomes a part of the partition ready for communication with the cell 0, and the process proceeds to a step 166. - At the
step 166, the cell 1 is configured and the partition memory on the cell is allocated. By configuring the cell 1, properties of that cell, including all of its resources, are set so that the cell 1 is formally added to the partition and so that the cell 0 becomes aware of the presence of the cell 1. For example, a portion of the available memory on the cell 1 is configured as cell local memory, which is non-interleaved memory available for all cores present in the partition. At the same time, all of the other memory on the cell 1 is configured as global shared memory, also called partition memory, which is capable of being accessed by any of the cores either on the cell 1 or on another cell such as the cell 0. This new partition memory on the cell 1 is allocated to memory blocks adjacent to the pool of existing partition memory blocks of the cell 0, and thus is invisible to the operating system at this point. In addition, this partition memory of the cell 1 is added to the cache coherency controller on that cell to track cache line ownership of each line of the partition memory of the cell 1. - Further for example, address maps in the
FABs 142 of the cell 1 are set up to configure that cell so that it is capable of accessing memory segments in accordance with the agent access memory configuration. Additionally for example, in at least some circumstances, the new partition memory of the cell 1 is added to the existing partition memory of the cell 0. Once the new partition memory is added in this manner, the cell 1 is able to access the existing partition memory on the cell 0 as well. Also, in addition to the configuration of the cell 1 at the step 166, the existing cell 0 is configured as well during that step in order to reflect the addition of the cell 1 into the partition and to enable the cell 0 to access the partition memory of the cell 1. Thus, after configuring the cell 0 and the cell 1, the cores and/or sockets on these cells can be actively used for communication with each other. The configuration of the cell 0 and the cell 1 during the step 166 is under the control of the firmware, with the integration of the cell 1 invisible to the operating system. - After configuration of the
cells 1 and 0 at the step 166, the process subsequently advances to a series of steps, beginning with a step 168, at which the address decoders and other supporting hardware components on the cell 0 are re-programmed to convert that cell from the direct access memory configuration to the agent access memory configuration. With respect to the step 168 in particular, this step involves migrating the cores, one at a time, from the cell 0 to the cell 1 to facilitate changes to the source decoders of the cell 0 in a manner that avoids crashing the computer system 2 or stalling the processes executing on the cores of that cell. To migrate a core from the cell 0 to the cell 1, the architected state of that core, including for example the registers and interrupts, is frozen and captured under the control of the firmware. This state information is then migrated to a "spare" core on the cell 1 so that the spare core can handle any processes that were originally being handled by the original core on the cell 0. - Upon completion of the migration process for a given original core on the
cell 0, that original core becomes inactive and all of its processes are run on the spare core on the cell 1. Further, the same migration process is then performed again with respect to all of the cores on the cell 0 until all of the cores from that cell have been migrated to corresponding spare cores on the cell 1. The migration of the cores from the old cell 0 to the new cell 1 is facilitated by the firmware, and is invisible to the operating system associated with the partition so as to prevent any disruption to the regular processing of the computer system 2. - After the migration of all of the cores from the
cell 0 to the cell 1 has been completed, the process then proceeds to a step 170, at which all of the address decoders and related hardware components (such as the cores and MTCs) of the cell 0 are reprogrammed to facilitate conversion from the direct access memory configuration to the agent access memory configuration. The MTCs 140 in particular can be reprogrammed to include information as to whether the MTCs are responsible for tracking local cores (e.g., in the agent access memory configuration) or whether the MCs will track the local cores (e.g., in the direct access memory configuration). In the present embodiment, the step 170 is performed by way of special, preprogrammed hardware components found within each of the PAs 18-24 of the cells (more particularly within the CCs 108, 112, 116, 120, 124, 128, 132 and 136), which can be referred to as memory migration engines. Also in the present embodiment, each memory migration engine is capable of moving one half of the memory segments in the attached socket, such that two memory migration engines are employed to move all of the memory segments in one socket, as explained in more detail below. Further, each memory migration engine is capable of reviewing/traversing each cache line of the partition memory for both the cell 0 and the new cell 1 (e.g., the memory segments 26-36) and corresponding lines of the caches of the cores of the cell 0, as well as clearing tags of the caches. In other embodiments, implementations other than that mentioned above can be used. - More particularly, the memory migration engine(s) operate by updating the
CCs 108, 112, 116 and 120 of the cell 0 and transitioning the directories of those CCs (used for tracking coherency) accordingly so that the cores on the cell 0 do not use any of the old copies of data still present in their respective caches. Data from all of the caches on the cell 0 is then flushed out, and the CCs 108, 112, 116 and 120 gain exclusive access to each line in the memory segments 26-30 of that cell and write them back to idle. Further, the directory contents of those flushed cache lines are updated, and the new partition memory of the cell 1 is added to the directories to enable the MCs to keep track of the cache lines present in every core on both the cell 0 and the cell 1. No cores on the cell 0 are referenced after this step is completed. The firmware controls the cores on the cell 0 so that control is transferred to the cores on the cell 1 for any further processing. - Next, at a
step 172, all of the MTCs 140 on the cell 0 are reprogrammed for operation in accordance with the agent access memory configuration rather than the direct access memory configuration. More particularly, the MTCs 140 are reprogrammed for use in conjunction with the fabric to enable the transmission of memory request signals to the MCs 88-102 and thereby to the memory segments 26-36. As mentioned earlier, when the computer system 2 is operating in the direct access memory configuration accessing local memory segments, all memory requests from cores to MCs located on the same cell are directed to those MCs without passing through any of the PAs 18-24 or the fabric 8. At such time, the only signals that are communicated to the PAs 18-24, and governed by the MTCs 140 in particular, are those intended for the I/O subsystems or cores accessing remote memory segments. Reprogramming of the MTCs 140, however, makes it possible for the PAs 18-24 to handle signals arriving from the cores 46-76 that are intended eventually for the MCs 88-102. It should be further noted that, notwithstanding changes to the MTCs 140, the TADs 106 of the MCs 88-102 need not be re-programmed, since bank addresses of the memory segments 26-36 do not change. - Subsequent to the
step 172, at a further step 174 each of the SADs 78 of the cores 46-60 on the original cell 0 is reprogrammed by the firmware to operate according to the agent access memory configuration rather than the direct access memory configuration. More particularly, the SADs 78 are reprogrammed so that all memory requests are directed to one of the PAs 18-24 instead of directly to one of the MCs 88-102 (via one or more of the switches 80-86) as during operation according to the direct access memory configuration. Thus, after the SADs 78 are re-programmed, all requests for memory accesses are directed to respective ones of the PAs 18-24 on the respective cells, which in turn direct such requests onto the fabric 8 and thus further to additional (or possibly the same) ones of the PAs, where those signals are handled by the respective MTCs 140 of those PAs. Typically, firmware running on a management subsystem (or possibly on the cores controlled by firmware) uses CSR writes to change the programming of the SADs 78. Reprogramming of the SADs 78 at this point in the process represented by FIG. 2 is particularly possible since, subsequent to the cleaning of tags in the step 170, there is no traffic to the cores on the cell 0. - From the above description, it is evident that the operation of the
MTCs 140 and SADs 78 is primarily responsible for determining whether the computer system 2 is operating according to the direct access memory configuration or the agent access memory configuration. The FAB 142, however, remains unchanged when switching between the direct access and agent access memory configurations. In the present embodiment, the MTCs 140 and SADs 78 are pre-programmed to be capable of both the direct access memory configuration and the agent access memory configuration. Upon receiving the appropriate request(s) from the firmware, however, the internal characteristics of the MTCs 140 of the CCs 108, 112, 116 and 120 are modified from being appropriate for the direct access memory configuration to being appropriate for the agent access memory configuration. Likewise, upon receiving the appropriate request(s) from the firmware, the internal characteristics of the SADs 78 also are changed from being appropriate for the direct access memory configuration to being appropriate for the agent access memory configuration. - More particularly with respect to the
MTCs 140, each of the MTCs 140 has entries that each respectively correspond to a respective memory segment. Each of the entries can be set independently by the firmware for either the agent access or the direct access memory configuration. The MTC entries include MC addresses for different memory lines, and each memory address typically remains the same regardless of whether the line is to be accessed via the direct access memory configuration or the agent access memory configuration, in order to facilitate the conversion process (e.g., to facilitate the mapping of memory from a deleted cell to a remaining cell). To distinguish whether a given MC address is for use in the agent access memory configuration or the direct access memory configuration, the firmware sets or clears a bit in each such MTC entry indicating whether the corresponding memory block is for use in the agent access memory configuration or the direct access memory configuration. Thus, each MTC entry corresponds to a single memory block, and the direct access/agent access attribute can be set independently for each entry. Additionally, although the SADs 78 and related components are re-programmed to facilitate the conversion from the direct access memory configuration to the agent access memory configuration, every direct access memory address used as an alias address for satisfying read requests to the socket home agent remains the same. - Once the
SADs 78 have been re-programmed in the step 174 such that the memory configuration on the cell 0 has been changed from the direct access memory configuration to the agent access memory configuration, the core(s) that were migrated from the cell 0 to the cell 1 are migrated back to their original location(s) on the cell 0, at a step 176. In particular, during this step, the architected states of the cores of the cell 0 are retrieved and the cores can re-establish all active process execution. This procedure is largely the reverse of the procedure performed during the step 168 discussed above. Upon completion of this step, the process next advances to a step 178. - The
step 178 is optional and can be skipped. Nevertheless, for optimal utilization of all of the memory segments 26-36 of the computer system 2 that are available when operating in the agent access memory configuration, at the step 178 the partition memory on the cells 0 and 1 is re-interleaved with the memory segments. The memory segments 26-36 are re-interleaved by collaboration between the firmware and the hardware, such as the memory migration engines and the MTCs, by executing special software to facilitate such a re-interleaving. By re-interleaving all of the available partition memory, the cell 0 becomes capable of accessing not only the partition memory of the cell 0 but also the partition memory of the cell 1 (and vice-versa). This reduces memory access latency and diminishes the frequency of memory "hot-spots" relative to the performance that could be obtained if the cells 0, 1 were only able to access their own memory (e.g., by operating in the direct access memory configuration). - After completing (or skipping) the
step 178, the process then advances to a step 180. At this step, an on-line addition operation is performed to expose the newly-added cell 1, its cores 62-76 and its partition memory to the operating system, thereby finalizing the conversion of the computer system 2 from the direct access memory configuration to the agent access memory configuration. In particular, by exposing the new cell 1 to the operating system, the new cell 1 with its cores 62-76 can now be used for active processing of data in addition to the existing cell 0. At this point, the process 158 of conversion from the direct access memory configuration to the agent access memory configuration is complete, and the process ends at a step 182. - Turning to
FIG. 3, an additional flowchart 184 shows exemplary steps of another process by which the computer system 2 is converted from the agent access memory configuration to the direct access memory configuration, in accordance with at least some embodiments of the present invention. This conversion is performed when one or more cell(s) is/are deleted from a partition so as to leave only a single cell in the partition, in order to achieve the memory latency on the remaining cell of the partition that would have been achieved if the remaining partition had been booted with that single-cell configuration. In the present example of FIG. 3, it is presumed that the computer system 2, initially operating in accordance with the agent access memory configuration, begins operation with a partition having two cells (namely, the cells 0 and 1), but then ends operation with only a single cell (namely, the cell 0). However, it should be understood (as is the case with respect to the process of FIG. 2) that the present conversion operation is equally applicable to computer systems that are reduced from having a higher number of cells (e.g., more than two, and particularly more than four) to a lesser number of cells. - As shown in FIG. 3, the process begins at a
step 186 and then proceeds to another step 188. At the step 188, the partition memory on the cell that is being removed (the cell 1) is deleted by way of an on-line deletion operation. By performing this operation, the partition memory on the cell 1 is made invisible to the operating system, so as to facilitate an on-line removal of the cell 1 without affecting the executing processes on the cores of the cell 0. Additionally at the step 188, the active processing on the cores 62-76 of the cell 1 is terminated to prevent the computer system 2 of FIG. 1 from crashing while deleting that cell. Further during the on-line deletion operation, the operating system releases all of the socket local memory of the cell 1 and also the partition memory of the cell 1 (e.g., the segments 32-36) to ensure that the remaining memory can fit onto the remaining cell 0. Because there are two sockets on the cell 0, the partition memory will typically be in groups of two (such that it can be interleaved across both of the two sockets of the cell 0), and the groups are claimed directly by the TADs in the cell 0. - Next, at a
step 190, all of the cores 62-76 of the cell 1 are deleted from the operating system by way of an on-line deletion operation. After this on-line deletion operation has been performed on the cores of the cell 1, these cores are no longer used for any active process execution. That the cores 62-76 attain this inactive status is desirable in order to preclude the computer system 2 from crashing in an event where the cell 1 still has processes running on it after that cell is removed from the partition. Subsequently, at a step 192, all of the partition memory previously available to both the cell 0 and the cell 1 is de-interleaved into a 1-way partition memory capable of being accessed only by the remaining cell 0 in the partition. De-interleaving of the partition memory takes place by way of special software under the collaboration of the memory migration engines and the firmware. - Subsequently, at a
step 194, the cores on the cell 0 are migrated to the cell 1 so as to allow the source address decoders and address maps of the cores on the cell 0 to be modified, converting them from the agent access memory configuration to the direct access memory configuration. Similar to the step 168 of FIG. 2, migration of each of the cores 46-60 of the cell 0 involves capturing (or "freezing") the present architected state of that core, and then transferring that state of the core to a "spare" core on the cell 1, which resumes all of the active processing for the core that has been migrated from the cell 0. Each core is migrated in succession, with the same process being repeated until all of the cores from the cell 0 have been migrated to the corresponding spare cores on the cell 1. The migration of the cores 46-60 from the cell 0 enables the SADs 78 on those cores to be re-programmed for operation according to the direct access memory configuration, since after the migration those cores are precluded by firmware from performing any active processing. At the same time, since the migration of the core processing to the cores 62-76 of the cell 1 is invisible to the operating system, the computer system 2 continues to perform the same overall process after the migration (except by different cores) as was occurring before the migration. - After the
step 194, the process then advances to a step 196 at which the memory migration engine(s) are programmed to remove all references to the cell 0 cores from both the CC directory (more particularly, the CC filter tag cache or "RTAG", which is designed to track remote ownership of memory lines) and the MC directory. This is done at least in part so that there are not any references to the cell 0 cores, and so that updating the SAD becomes easier (or becomes possible, if the SAD does not support dynamic changes). More particularly, the cache entries in the cores of the cell 0 are cleared and copied back to the appropriate memory segments 26-36 to prevent any loss of data, and the directories are updated to reflect the cleared caches. Additionally in the step 196, the directory entries on the cell 0 are modified to reflect the removal of the partition memory of the cell 1 (e.g., the memory segments 32-36) so that the directories include only the entries for the partition memory of the cell 0 (e.g., the memory segments 26-30). Again as was the case in the step 172, the TADs 106 in the MCs 82-106 are not re-programmed, as the bank addresses of the memory segments 26-36 on the cell 0 are not affected by the removal of the partition memory of the cell 1. - Next at a
step 198, the MTCs 140 on the CCs 108, 112, 116 and 120 of the cell 0 are re-programmed. As previously mentioned, the MTCs 140 are responsible for translating global/fabric addresses received off of the fabric 8 into local physical addresses for accessing memory locations when the computer system 2 is operating according to the agent access memory configuration. However, when the computer system 2 is operating in the direct access memory configuration, the MTCs 140 are used only for accessing I/O systems and any possible remaining remote cores, and all of the memory request signals are transferred only to the MCs 88-94 of the cell 0. Thus, while the MTCs 140 are pre-programmed for operation according to each of the agent access memory configuration and the direct access memory configuration, at the step 198 (with the aid of the firmware) the MTCs 140 are re-programmed to convert their configuration particularly to the direct access memory configuration, such that the MTCs no longer operate to translate global/fabric addresses into local addresses and the fabric 8 is no longer utilized for memory accesses. As discussed above, this re-programming can involve firmware updating a particular bit per entry within the MTCs 140. - Further at a
step 200, the SADs 78 on each of the cores 46-60 on the cell 0 are re-programmed. As is the case with the MTCs 140, the SADs 78 are pre-programmed for operation in both the agent access memory configuration and the direct access memory configuration. When converting from the agent access memory configuration to the direct access memory configuration, the firmware re-programs the SADs 78 on each of the cores 46-60 of the cell 0 to route all memory requests directly to one of the MCs 88-94 on the same cell rather than to any of the PAs 18-24. Thus, as was the case with respect to the process of FIG. 2, the process of FIG. 3 in the present embodiment involves each of the programming of the memory migration engine, the updating of the cache coherency controllers, and the re-programming of the MTCs 140 and the SADs 78 when the cores 46-60 of the cell 0 have been migrated to the cell 1. Once the MTCs 140 and the SADs 78 have been re-programmed, at a step 202 the cores (that is, the states of the cores) are migrated back from the cell 1 to the cell 0, one at a time, to resume all active processing in the newly-determined direct access memory configuration. - Next, at a
step 204, the directory caches (particularly the RTAGs mentioned above) in the CCs of the cell 0 are updated to prevent any memory accesses to the partition memory of the cell 1 (e.g., the memory segments 32-36). As mentioned above, each such directory cache is assigned a subset of memory segments (home segments) and is responsible for tracking ownership of the memory segments other than its own home segments. After the cell 0 has been converted to the direct access memory configuration and the cores 62-76 on the cell 1 have become inactive, the directory caches in the CCs of the cell 0 are cleared to prevent any further snoops from being issued to cores on the cell 1 for local direct access memory. Subsequently, at a step 206, all of the remaining resources, such as the interrupts and input/output systems, of the cell 1 are moved to the cell 0. From this point onwards, the cell 0 becomes responsible for handling all of those resources. - At a
step 208, the cell 1 is then removed from the partition and is no longer exposed to the operating system. Before this point, even though an on-line deletion operation to remove the cell 1 had been performed, the cell 1 was available to aid the transition from the agent access memory configuration to the direct access memory configuration. Depending upon the embodiment or circumstance, the removed cell 1 can be completely removed from the computer system 2 or it can exist within the computer system 2 as an unassigned resource that can again be added to any partition within that computer system at any point. After removal of the cell 1 from the partition, only the cell 0 is left behind, which accesses the partition memory according to the direct access memory configuration. The process of FIG. 3 then ends at a step 210. - While the processes described above with respect to the
flow charts 158 and 184 of FIG. 2 and FIG. 3, respectively, show examples of processes by which the memory performance of the computer system 2 can be optimized by converting dynamically between a direct access memory configuration and an agent access memory configuration, the present invention is also intended to encompass a variety of other processes, including modifications and/or refinements of the above-described processes. In particular, the present invention is intended to encompass a variety of processes that allow for dynamic on-line conversion of the memory configuration, without rebooting of the computer system. The particular process steps employed above to facilitate conversion between the agent access and direct access memory configurations, the cache coherency protocols employed by the CCs, the interleave algorithms employed by the PAs, and other features can all be varied depending upon the type/needs of the computer system being used and the memory being used. - It is specifically intended that the present invention not be limited to the embodiments and illustrations contained herein, but include modified forms of those embodiments including portions of the embodiments and combinations of elements of different embodiments as come within the scope of the following claims.
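To make the re-interleaving of the step 178 concrete, the following Python sketch (illustrative only, not part of the claimed method; the 64-byte line granularity and modulo placement policy are assumptions, as the patent does not specify the interleave algorithm) shows how consecutive cache lines alternate between two cells under 2-way interleaving, which is what spreads accesses and suppresses hot-spots:

```python
LINE_SIZE = 64  # bytes per cache line (assumed granularity)

def interleave_target(phys_addr: int, num_cells: int = 2) -> int:
    """Return the cell whose memory segment holds this address when
    partition memory is interleaved num_cells ways across the cells."""
    return (phys_addr // LINE_SIZE) % num_cells

# Consecutive cache lines alternate between cell 0 and cell 1.
targets = [interleave_target(a) for a in range(0, 4 * LINE_SIZE, LINE_SIZE)]
# -> [0, 1, 0, 1]
```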
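The de-interleaving of the step 192 can likewise be sketched. This fragment is a loose, illustrative model (names and data structures are assumptions, not the patent's implementation) of the memory migration engines collapsing a 2-way interleaved line map into a 1-way map homed entirely on the remaining cell 0:

```python
def deinterleave_to_one_way(lines_by_cell: dict) -> dict:
    """Collapse an N-way interleaved map {cell: {addr: data}} into a
    1-way map owned entirely by cell 0, the cell remaining in the
    partition after the others are deleted."""
    merged = {}
    for lines in lines_by_cell.values():
        merged.update(lines)          # migrate every line to cell 0
    return {0: merged}
```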
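The one-at-a-time core migration of the steps 168, 194 and 202 (freeze the architected state of each active core, transfer it to a spare core, resume there) can be sketched as follows; the Core class and its fields are purely illustrative assumptions, not the patent's hardware interface:

```python
class Core:
    """Toy model of a processor core with an architected state."""
    def __init__(self, name: str):
        self.name = name
        self.state = None      # architected state (registers, etc.)
        self.active = False

    def freeze(self):
        self.active = False    # halt execution and capture the state
        return self.state

    def restore(self, state):
        self.state = state
        self.active = True     # spare core resumes the workload

def migrate_cores(source_cores, spare_cores):
    """Migrate each source core's state to a spare core, one at a time,
    leaving the source cores idle so their SADs can be re-programmed."""
    for src, dst in zip(source_cores, spare_cores):
        dst.restore(src.freeze())
```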
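The cache clearing and write-back of the step 196 (dirty lines copied back to their home memory segments so no data is lost, directory entries dropped so no stale ownership references remain) might be sketched like this; the dictionaries standing in for the cache, memory and directory are illustrative assumptions:

```python
def flush_and_writeback(cache: dict, memory: dict, directory: dict) -> None:
    """Write dirty cached lines back to memory, clear the cache, and
    drop the corresponding ownership entries from the directory."""
    for addr, (data, dirty) in cache.items():
        if dirty:
            memory[addr] = data        # copy back to prevent data loss
        directory.pop(addr, None)      # remove stale ownership reference
    cache.clear()                      # directory now reflects empty cache
```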
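Finally, the effect of re-programming the SADs at the step 200 (route memory requests straight to a local MC in the direct access configuration, or through a PA in the agent access configuration) can be sketched as a routing decision; the modulo choice of controller and the string identifiers are assumptions for illustration only:

```python
def route_request(addr: int, direct_access: bool, local_mcs: list, pas: list):
    """Toy source address decoder: pick a local memory controller when
    operating in the direct access memory configuration, or a processor
    agent (PA) when operating in the agent access memory configuration."""
    pool = local_mcs if direct_access else pas
    return pool[(addr // 64) % len(pool)]
```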
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/741,933 US7904676B2 (en) | 2007-04-30 | 2007-04-30 | Method and system for achieving varying manners of memory access |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20080270713A1 true US20080270713A1 (en) | 2008-10-30 |
| US7904676B2 US7904676B2 (en) | 2011-03-08 |
Family
ID=39888400
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US11/741,933 Expired - Fee Related US7904676B2 (en) | 2007-04-30 | 2007-04-30 | Method and system for achieving varying manners of memory access |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US7904676B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9727464B2 (en) | 2014-11-20 | 2017-08-08 | International Business Machines Corporation | Nested cache coherency protocol in a tiered multi-node computer system |
| US9886382B2 (en) | 2014-11-20 | 2018-02-06 | International Business Machines Corporation | Configuration based cache coherency protocol selection |
Patent Citations (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5842031A (en) * | 1990-11-13 | 1998-11-24 | International Business Machines Corporation | Advanced parallel array processor (APAP) |
| US5787095A (en) * | 1994-10-25 | 1998-07-28 | Pyramid Technology Corporation | Multiprocessor computer backlane bus |
| US5822618A (en) * | 1994-11-21 | 1998-10-13 | Cirrus Logic, Inc. | System for automatically switching to DMA data transfer mode to load and unload data frames when there are excessive data frames in memory buffer |
| US6421775B1 (en) * | 1999-06-17 | 2002-07-16 | International Business Machines Corporation | Interconnected processing nodes configurable as at least one non-uniform memory access (NUMA) data processing system |
| US6457100B1 (en) * | 1999-09-15 | 2002-09-24 | International Business Machines Corporation | Scaleable shared-memory multi-processor computer system having repetitive chip structure with efficient busing and coherence controls |
| US6487619B1 (en) * | 1999-10-14 | 2002-11-26 | Nec Corporation | Multiprocessor system that communicates through an internal bus using a network protocol |
| US6848003B1 (en) * | 1999-11-09 | 2005-01-25 | International Business Machines Corporation | Multi-node data processing system and communication protocol that route write data utilizing a destination ID obtained from a combined response |
| US6671792B1 (en) * | 2000-04-28 | 2003-12-30 | Hewlett-Packard Development Company, L.P. | Share masks and alias for directory coherency |
| US6918052B2 (en) * | 2000-04-29 | 2005-07-12 | Hewlett-Packard Development Company, L.P. | Managing operations of a computer system having a plurality of partitions |
| US6725317B1 (en) * | 2000-04-29 | 2004-04-20 | Hewlett-Packard Development Company, L.P. | System and method for managing a computer system having a plurality of partitions |
| US6684343B1 (en) * | 2000-04-29 | 2004-01-27 | Hewlett-Packard Development Company, Lp. | Managing operations of a computer system having a plurality of partitions |
| US6668308B2 (en) * | 2000-06-10 | 2003-12-23 | Hewlett-Packard Development Company, L.P. | Scalable architecture based on single-chip multiprocessing |
| US20020087813A1 (en) * | 2000-07-31 | 2002-07-04 | Harris Kevin W. | Technique for referencing distributed shared memory locally rather than remotely |
| US6973517B1 (en) * | 2000-08-31 | 2005-12-06 | Hewlett-Packard Development Company, L.P. | Partition formation using microprocessors in a multiprocessor computer system |
| US20020144063A1 (en) * | 2001-03-29 | 2002-10-03 | Jih-Kwon Peir | Multiprocessor cache coherence management |
| US20030009641A1 (en) * | 2001-06-21 | 2003-01-09 | International Business Machines Corp. | Dynamic history based mechanism for the granting of exclusive data ownership in a non-uniform memory access (numa) computer system |
| US6910062B2 (en) * | 2001-07-31 | 2005-06-21 | International Business Machines Corporation | Method and apparatus for transmitting packets within a symmetric multiprocessor system |
| US20030131067A1 (en) * | 2002-01-09 | 2003-07-10 | International Business Machines Corporation | Hardware support for partitioning a multiprocessor system to allow distinct operating systems |
| US20050021913A1 (en) * | 2003-06-25 | 2005-01-27 | International Business Machines Corporation | Multiprocessor computer system having multiple coherency regions and software process migration between coherency regions without cache purges |
| US20040268044A1 (en) * | 2003-06-25 | 2004-12-30 | International Business Machines Corporation | Multiprocessor system with dynamic cache coherency regions |
| US20050240649A1 (en) * | 2004-04-12 | 2005-10-27 | Hewlett-Packard Development Company, L.P. | Resource management system |
| US20050246508A1 (en) * | 2004-04-28 | 2005-11-03 | Shaw Mark E | System and method for interleaving memory |
| US20060031672A1 (en) * | 2004-08-03 | 2006-02-09 | Soltis Donald C Jr | Resource protection in a computer system with direct hardware resource access |
| US7467204B2 (en) * | 2005-02-10 | 2008-12-16 | International Business Machines Corporation | Method for providing low-level hardware access to in-band and out-of-band firmware |
| US7398360B2 (en) * | 2005-08-17 | 2008-07-08 | Sun Microsystems, Inc. | Multi-socket symmetric multiprocessing (SMP) system for chip multi-threaded (CMT) processors |
| US20070245334A1 (en) * | 2005-10-20 | 2007-10-18 | The Trustees Of Columbia University In The City Of New York | Methods, media and systems for maintaining execution of a software process |
| US20080162873A1 (en) * | 2006-12-28 | 2008-07-03 | Zimmer Vincent J | Heterogeneous multiprocessing |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110138159A1 (en) * | 2009-12-09 | 2011-06-09 | Sanyo Electric Co., Ltd. | Memory control apparatus |
| US8812810B2 (en) * | 2009-12-09 | 2014-08-19 | Semiconductor Components Industries, Llc | Memory control apparatus |
| US20140281457A1 (en) * | 2013-03-15 | 2014-09-18 | Elierzer Weissmann | Method for booting a heterogeneous system and presenting a symmetric core view |
| US9727345B2 (en) * | 2013-03-15 | 2017-08-08 | Intel Corporation | Method for booting a heterogeneous system and presenting a symmetric core view |
| US10503517B2 (en) | 2013-03-15 | 2019-12-10 | Intel Corporation | Method for booting a heterogeneous system and presenting a symmetric core view |
| US20150331797A1 (en) * | 2014-05-17 | 2015-11-19 | International Business Machines Corporation | Memory access tracing method |
| US9928175B2 (en) | 2014-05-17 | 2018-03-27 | International Business Machines Corporation | Identification of a computing device accessing a shared memory |
| US9940237B2 (en) * | 2014-05-17 | 2018-04-10 | International Business Machines Corporation | Identification of a computing device accessing a shared memory |
| US10169237B2 (en) | 2014-05-17 | 2019-01-01 | International Business Machines Corporation | Identification of a computing device accessing a shared memory |
| US10241917B2 (en) | 2014-05-17 | 2019-03-26 | International Business Machines Corporation | Identification of a computing device accessing a shared memory |
| US11163681B2 (en) | 2014-05-17 | 2021-11-02 | International Business Machines Corporation | Identification of a computing device accessing a shared memory |
Also Published As
| Publication number | Publication date |
|---|---|
| US7904676B2 (en) | 2011-03-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7469321B2 (en) | Software process migration between coherency regions without cache purges | |
| US10891228B2 (en) | Cache line states identifying memory cache | |
| US7484043B2 (en) | Multiprocessor system with dynamic cache coherency regions | |
| US7814279B2 (en) | Low-cost cache coherency for accelerators | |
| US6088769A (en) | Multiprocessor cache coherence directed by combined local and global tables | |
| US6374331B1 (en) | Distributed directory cache coherence multi-processor computer architecture | |
| JP3849951B2 (en) | Main memory shared multiprocessor | |
| US20050144399A1 (en) | Multiprocessor system, and consistency control device and consistency control method in multiprocessor system | |
| US6654858B1 (en) | Method for reducing directory writes and latency in a high performance, directory-based, coherency protocol | |
| KR102092660B1 (en) | Cpu and multi-cpu system management method | |
| JPH05128071A (en) | Apparatus and method for optimizing performance of multiplex processor system | |
| KR20030033109A (en) | Error recovery | |
| KR20000076539A (en) | Non-uniform memory access (numa) data processing system having shared intervention support | |
| US7149922B2 (en) | Storage system | |
| US7904676B2 (en) | Method and system for achieving varying manners of memory access | |
| US6721852B2 (en) | Computer system employing multiple board sets and coherence schemes | |
| CN116414563A (en) | Memory control device, cache consistency system and cache consistency method | |
| CN101216781A (en) | A multi-processor system, device and method | |
| US11487654B2 (en) | Method for controlling write buffer based on states of sectors of write buffer and associated all flash array server | |
| US20240370370A1 (en) | Dynamic Migration Of Point-Of-Coherency And Point-Of-Serialization In NUMA Coherent Interconnects | |
| US6961827B2 (en) | Victim invalidation | |
| JP2005250830A (en) | Processor and main memory shared multiprocessor | |
| US8028130B1 (en) | Pipeline structure for a shared memory protocol | |
| US6397295B1 (en) | Cache mechanism for shared resources in a multibus data processing system | |
| US12511235B2 (en) | Storage apparatus and control method for storage apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HORNUNG, BRYAN;SHAW, MARK;REEL/FRAME:019747/0055;SIGNING DATES FROM 20070821 TO 20070822 Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HORNUNG, BRYAN;SHAW, MARK;SIGNING DATES FROM 20070821 TO 20070822;REEL/FRAME:019747/0055 |
|
| CC | Certificate of correction | ||
| REMI | Maintenance fee reminder mailed | ||
| LAPS | Lapse for failure to pay maintenance fees | ||
| STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
| FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20150308 |