US20090132750A1 - Cache memory system - Google Patents
- Publication number
- US20090132750A1 (application US 12/284,336)
- Authority
- US
- United States
- Prior art keywords
- data
- memory
- cache
- address
- access
- Prior art date
- Legal status: Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0853—Cache with multiport tag or data arrays
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
- G06F12/0895—Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6024—History based prefetching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6028—Prefetching based on hints or prefetch instructions
Definitions
- According to a further aspect, the present disclosure provides a method for pre-fetching data into a cache memory system, the method comprising the steps of: retrieving a portion of data from a system memory; and storing a copy of the retrieved portion of data in a cache memory; wherein the method comprises the further steps of: monitoring accesses to data in the system memory; comparing the access address of an access to data in the system memory with a first memory address; and, if a relationship between the access address and the first memory address is satisfied, retrieving the portion of data stored in the system memory at a second memory address, and storing the retrieved portion of data in the cache memory; wherein the first and second memory addresses correspond to different physical memory addresses of the system memory.
- FIG. 1 is a schematic diagram of a cache memory system in a first embodiment of the invention;
- FIG. 2 is a schematic diagram of a system comprising the cache shown in FIG. 1;
- FIG. 3 is a schematic diagram of the monitoring circuit comprised in the system illustrated in FIG. 1;
- FIG. 4 shows a system topology comprising a level 2 cache;
- FIG. 5 shows the internal structure of a level 2 cache;
- FIG. 6 shows a flow diagram for a pre-fetch procedure;
- FIG. 7 shows the fields of a 32-bit physical address and how they are interpreted by the L2 cache lookup logic;
- FIG. 8 shows internal buffering and logic for a level 2 cache; and
- FIG. 9 shows the internal structure of a level 2 cache for a further embodiment.
- FIG. 1 is a schematic diagram of an exemplary cache memory system embodying the present disclosure.
- The system, referred to below simply as cache 1, comprises a data memory 3 for storing one or more cache lines 5 of data and a tag memory 7 for storing address information in the form of a series of tags 9.
- For each cache line 5 in the data memory 3 there is a corresponding tag 9 in the tag memory 7.
- The cache 1 also comprises a cache load circuit 19 used to store data in the data memory 3. It is understood that the present disclosure may be used in a variety of cache systems and is not limited to the arrangement illustrated in FIG. 1.
- FIG. 2 illustrates a system 100 comprising the cache 1 shown in FIG. 1.
- In this arrangement, the cache 1 is a level 2 cache functionally located between a processor 101, comprising a level 1 cache 103, and a system memory 105.
- However, the cache shown in FIG. 1 may be used as any level of cache, in any cache hierarchy arrangement, or as a sole cache.
- The term system memory may refer to a specific memory device or to a group of two or more memory devices. In general, the system memory represents a general memory space formed from the whole, or part, of the individual memory spaces of one or more memory devices.
- The processor 101 directly accesses the level 1 cache 103.
- The level 1 cache 103 communicates with the level 2 cache 1 via bus lines 11, 15 and 25, and the level 2 cache 1 communicates with the system memory 105 via bus line 29.
- The system 100 also comprises other modules, including a module 107 having DMA (Direct Memory Access) capability.
- The module 107 accesses the level 2 cache 1 via bus line 109.
- Other parts of the system may also access the level 2 cache 1 via further bus lines (not shown), which may be separate from or integrated with bus line 109.
- When the processor 101 issues a request for retrieval of data stored in the system memory 105, the following process occurs.
- The data access request is transmitted to the level 1 cache 103, which determines whether it stores a copy of the requested data. If so, the copy of the requested data is retrieved from the level 1 cache 103 and provided to the processor 101. In this case, no data retrieval involving the level 2 cache 1 or the system memory 105 is made. If the level 1 cache 103 does not store a copy of the requested data, the data access request is forwarded from the level 1 cache 103 to the level 2 cache 1. In this case, the level 2 cache 1 determines whether it stores a copy of the requested data.
- If so, the copy of the requested data is retrieved from the level 2 cache 1 and provided to the level 1 cache 103, which in turn provides the data to the processor 101. If the level 2 cache 1 does not store a copy of the requested data, the data is retrieved from the system memory 105. In this case, the level 2 cache 1 requests the data from the system memory 105 and provides the retrieved data to the level 1 cache 103, which in turn provides it to the processor 101.
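- As an illustrative model of this cascade, the following C sketch checks each level in turn and fills the levels that missed on the way back. All structure and function names here are hypothetical; the patent does not prescribe any particular implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One level of the hierarchy: a lookup hook that returns true on a hit,
 * and a fill hook used to store data retrieved from further down. */
typedef struct {
    bool (*lookup)(uint32_t addr, uint32_t *data, void *ctx);
    void (*fill)(uint32_t addr, uint32_t data, void *ctx);
    void *ctx;
} cache_level;

/* Check L1 first (levels[0]), then L2, then fall back to system memory,
 * filling each level that missed on the way back up. */
static uint32_t hierarchical_read(cache_level *levels, size_t n,
                                  uint32_t (*mem_read)(uint32_t),
                                  uint32_t addr)
{
    uint32_t data = 0;
    size_t i;
    for (i = 0; i < n; i++)
        if (levels[i].lookup(addr, &data, levels[i].ctx))
            break;                   /* hit at level i */
    if (i == n)
        data = mem_read(addr);       /* miss everywhere: go to memory */
    while (i-- > 0)
        levels[i].fill(addr, data, levels[i].ctx);
    return data;
}
```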
- The level 2 cache 1 performs the following process when a data access request is received. First, a determination is made as to whether a copy of the data specified in the data access request is already present in the data memory 3 of the cache 1.
- The data access request identifies the address of the system memory 105 at which the requested data is located.
- The address of the requested data is supplied to the tag memory 7 via line 11 and compared to the tags 9 stored in the tag memory 7.
- Each tag 9 comprises an address of the system memory 105 from which a corresponding cache line 5 of data was originally retrieved. If the address of the data presently being requested matches an address specified by a tag 9, this indicates that the data memory 3 does contain a copy of the requested data.
- A match is indicated by asserting a hit signal on line 13, which is received by the data memory 3 and the cache load circuit 19.
- When the hit signal is asserted, the cache line 5 of data corresponding to the tag 9 causing the hit is retrieved from the data memory 3 and output from the data memory 3 and cache 1 on line 15.
- If no tag matches, the hit signal is not asserted.
- In this case, the requested data is retrieved from the system memory 105 using the cache load circuit 19 in the manner described below.
- A copy of the data retrieved from the system memory 105 by the cache load circuit is stored in the data memory 3.
- The data is then output from the data memory 3 and cache 1 on line 15.
- The cache load circuit 19 comprises a memory 21 which stores a queue of pending cache load operations. Each cache load operation represents an item of data to be retrieved from the system memory 105 and includes the memory address of the data item. A cache load operation may also contain other relevant information, such as whether the data is required as the result of a pre-fetch or some other type of data access.
- The address received on line 11 is provided to the cache load circuit 19 via line 17.
- The cache load circuit 19 also receives the hit signal via line 13. When the hit signal on line 13 is not asserted, the cache load circuit 19 adds a cache load operation to the queue stored in the memory 21 based on the address received on line 17.
- The cache load circuit 19 processes each cache load operation in turn, for example in the order in which they were added to the queue. A newly added cache load operation will eventually be processed by the cache load circuit, resulting in the data being retrieved from the system memory 105, stored in the data memory 3 and output from the cache 1.
- To process a cache load operation, the cache load circuit identifies the address of the data to be cached and issues a suitable data access request on line 29, which is received by the system memory 105.
- Next, the cache load circuit identifies one or more suitable cache lines in the data memory in which to store the received data. These may comprise currently vacant cache lines. However, if there are insufficient free cache lines, it may be necessary to remove one or more existing cache lines of data to make room for the new data, in which case the write-back process described above may be required.
- The cache load circuit then transmits a load command to the data memory via line 31, comprising a copy of the data to be cached, the system memory address from which the data was retrieved and the cache lines identified to store the data.
- The copy of the data is then stored in the cache lines specified in the load command, and corresponding tags are added to the tag memory based on the address information specified in the load command.
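- The queue in memory 21 can be pictured as a simple FIFO of pending load operations. The following C sketch is purely illustrative; the queue depth, type names and stored attributes are assumptions rather than details taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* One pending cache load operation: the system memory address to fetch,
 * plus a flag recording whether it was queued by a pre-fetch. */
typedef struct {
    uint32_t addr;
    bool     is_prefetch;
} load_op;

#define QUEUE_DEPTH 16

typedef struct {
    load_op  ops[QUEUE_DEPTH];
    unsigned head, tail;             /* FIFO: the oldest entry is at head */
} load_queue;

/* Called when the hit signal on line 13 is not asserted. */
static bool enqueue_load(load_queue *q, uint32_t addr, bool is_prefetch)
{
    unsigned next = (q->tail + 1) % QUEUE_DEPTH;
    if (next == q->head)
        return false;                /* queue full: caller must stall */
    q->ops[q->tail] = (load_op){ addr, is_prefetch };
    q->tail = next;
    return true;
}

/* Operations are processed in the order in which they were added. */
static bool dequeue_load(load_queue *q, load_op *out)
{
    if (q->head == q->tail)
        return false;                /* nothing pending */
    *out = q->ops[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    return true;
}
```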
- A technique by which the embodiment illustrated in FIG. 1 implements pre-fetching of data into the cache will now be described.
- In this context, data accesses may include both read and write accesses.
- For example, in a typical processing task, data is input from a first buffer and computation is then applied to that data. The resulting data is then output to a different buffer. Since this task is typically carried out repetitively, a write to the destination buffer can be used as an indication of a subsequent access to the source buffer.
- Data reads, as well as data writes, involving a specific region of memory may also be indicative that a data access involving another region of memory will occur in the near future.
- When such an access is detected, data from a second location or region is automatically pre-fetched into the cache.
- If data from the second location is required later, it will already be available from the cache as a result of the pre-fetch. This avoids the need to access the data from system memory at the point it is actually required, thereby reducing memory latency.
- The memory address or region which triggers a pre-fetch of data may or may not be one from which data has been cached.
- To this end, the cache 1 comprises monitoring means, which in this embodiment is in the form of a monitoring circuit 33, arranged to monitor data accesses within the system and, if a data access involving a first address or region of memory occurs, to cause a pre-fetch of data from a second address or region.
- The monitoring circuit 33, illustrated in more detail in FIG. 3, comprises a first memory, mem1 35, a second memory, mem2 37, and a comparator 39.
- Memory mem1 35 stores a first address which, if accessed, is highly indicative that a data access involving a second address will be made imminently. This second address is stored in memory mem2 37.
- The comparator 39 is used to compare the addresses of data accesses within the system with the contents of mem1 35.
- The contents of memories mem1 35 and mem2 37 may be accessed via lines 41 and 43 respectively.
- When the processor requests data from the system memory, it issues a data access request which includes the system memory address of the requested data. As shown in FIG. 1, this access address is received by the cache 1 via line 25. The access address is transmitted to the monitoring circuit 33 and received at a first input of the comparator 39. The comparator 39 also receives the value stored in mem1 35 at a second input. The comparator is arranged to compare the first and second inputs, and to assert an output signal on line 45 if the two inputs match. For example, a bitwise XOR operation between the access address and the value stored in mem1 may be performed, and the comparator output asserted if all bits of the result are equal to zero. It is understood that different ways to perform the comparison may be used.
- The signal on line 45 is received by mem2, which is arranged to output its stored value on line 47 when the signal on line 45 is asserted.
- Thus, the value stored in mem2 will be output from the monitoring circuit 33 if the data access address received on line 25 matches the value stored in mem1.
- The value output from the monitoring circuit on line 47 is used to initiate a pre-fetch of data from the address defined by that value, using any suitable method.
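- Functionally, the monitoring circuit reduces to one comparison and one conditional output. A minimal C model, with hypothetical names, might look as follows.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mem1;   /* trigger address (first memory 35) */
    uint32_t mem2;   /* pre-fetch address (second memory 37) */
} monitor_circuit;

/* Comparator 39: XOR the access address with mem1 and assert the output
 * (line 45) only if every bit of the result is zero. On a match, the
 * value of mem2 is driven onto line 47 as the pre-fetch address. */
static bool monitor_access(const monitor_circuit *m, uint32_t access_addr,
                           uint32_t *prefetch_addr)
{
    if ((access_addr ^ m->mem1) == 0) {
        *prefetch_addr = m->mem2;
        return true;
    }
    return false;
}
```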
- For example, the output value may be input as an address into the tag memory 7. This causes the cache to determine whether a copy of the data located at that address is already present in the data memory 3. If so, no further action is taken, since the data has already been cached. However, if a copy of the data is not located in the data memory 3, a cache line of data located at the address is retrieved and stored in the data memory in the manner described above.
- Alternatively, the output value may be input as an address into the cache load circuit, causing a new entry to be added to the queue of pending cache load operations.
- More generally, a pre-fetch of data may be initiated in any other suitable manner using the address based on the value output from the monitoring circuit.
- In some embodiments, only a single cache line of data may be pre-fetched from the address defined by the monitoring circuit output.
- In other embodiments, any specified number of cache lines may be pre-fetched from the address defined by the monitoring circuit output.
- The number of cache lines to be pre-fetched may be specified in any suitable way, for example by means of a stored value.
- In one embodiment, the contents of memory mem2 are divided into two parts, the first defining the address of the data to be pre-fetched and the second specifying the number of cache lines to be pre-fetched from that address.
- For example, bits [31:5] of the 32-bit value stored in mem2 define bits [31:5] of the address, leaving the remaining bits [4:0] to specify the number of cache lines.
- Alternatively, the value stored in mem2 defining the memory address of data to be pre-fetched may comprise an offset value or other means of defining a memory address or region.
- For example, the value may represent a signed offset relative to the memory address defined by the value stored in mem1.
- If the address defined by the value stored in mem1 is addr and the offset is os, then a data access involving the address addr will cause data to be pre-fetched from the address addr+os.
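- Both encodings of mem2 amount to simple bit manipulation. The sketch below assumes the two-field layout just described, with the line address in bits [31:5] and (by inference from the text above) the line count in bits [4:0], alongside the signed-offset variant; it is illustrative rather than definitive.

```c
#include <stdint.h>

/* Variant 1: mem2 split into an address part and a count part. */
static uint32_t mem2_line_address(uint32_t mem2) { return mem2 & ~0x1fu; }
static uint32_t mem2_line_count(uint32_t mem2)   { return mem2 & 0x1fu; }

/* Variant 2: mem2 holds a signed offset os relative to the address in
 * mem1, so an access to addr triggers a pre-fetch from addr + os. */
static uint32_t prefetch_from_offset(uint32_t addr, uint32_t mem2)
{
    int32_t os = (int32_t)mem2;     /* reinterpret as a signed offset */
    return addr + (uint32_t)os;     /* wraparound implements addr + os */
}
```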
- In some embodiments, a data pre-fetch may be performed even if there is not an identical match between the address of a data access and the address stored in mem1.
- Instead, a data pre-fetch may be initiated if a certain relationship holds between the data access address and the address defined by the value stored in mem1.
- In one such embodiment, a data pre-fetch is initiated if the bits of the data access address match the corresponding bits of mem1 at a set of specified bit positions.
- To this end, the monitoring circuit comprises a third memory, mem3 49, which stores a mask defining those bit positions at which the access address and the contents of mem1 must match for a data pre-fetch to be initiated. For each bit of the mask stored in mem3 equal to 1, the corresponding bits of the data access address and the contents of mem1 must match. When a specific bit of the mask is equal to 0, the corresponding bits do not need to match.
- The comparator receives the contents of mem3 at a third input and asserts its output if the bits of the access address and the corresponding bits of the value stored in mem1 match at those bit positions at which the mask has a value of 1. For example, a bitwise XOR operation between the access address and the value stored in mem1 may be performed, followed by a bitwise AND operation between the resulting value and the mask; the comparator output is asserted if all bits of the final result are equal to zero. It is understood that different ways to perform the comparison based on the mask may be used.
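- The masked comparison is one XOR, one AND and a test against zero, as in this one-line C model:

```c
#include <stdbool.h>
#include <stdint.h>

/* A '1' bit in the mem3 mask means that bit position must match. */
static bool masked_match(uint32_t access_addr, uint32_t mem1, uint32_t mem3)
{
    return ((access_addr ^ mem1) & mem3) == 0;
}
```

- For example, a mask of 0xFFFFFF00 requires only the 24 highest order bits to match, so any access falling within the same 256-byte region as the address in mem1 asserts the match.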
- For example, the mask may be set so that the p highest order bits are set to 1 while the remaining q lowest order bits are set to 0. Memory addresses having the same values for the p highest order bits span a memory range of 2^q bytes (i.e. those memory addresses differing only in the q lowest order bits). Consequently, setting the mask in this way means that the memory address of a data access need not be exactly equal to the address stored in mem1.
- Instead, with q=8 for example, the address of the data access would only need to be located within a region of memory 256 bytes in size beginning at the address defined by bits [31:8] of the value of mem1 (with the remaining bits equal to zero). In this way, the system can be configured so that data accesses involving a specified region of memory initiate an automatic pre-fetch of data into the cache.
- Alternatively, the mask may be set so that the p highest order bits are set to 0 while the remaining q bits are set to 1.
- In this case, the memory address of a data access would only need to match the address stored in mem1 in the q lowest order bits.
- For example, the address stored in mem1 may be one aligned to an address boundary of a particular size (such as a cache line sized boundary).
- Then a data access would only initiate a pre-fetch if its address was aligned to the same boundary as the address stored in mem1.
- In this way, the system may be configured so that only data accesses acting on addresses naturally aligned on a cache line sized boundary trigger pre-fetching of data.
- In the embodiments described above, a data pre-fetch is initiated if the bits of the data access address match the corresponding bits of mem1 at a set of specified bit positions.
- In other embodiments, a data pre-fetch may be initiated if the data access address is within a certain range of the address defined by the value stored in mem1. Other relationships are possible.
- In some embodiments, a pre-fetch of data is automatically performed only if the data access causes a cache miss. For example, even if the address of the data access matches the address stored in mem1 under the mask stored in mem3, a pre-fetch is only carried out if the hit signal on line 13 is not asserted. In this case, the procedure described above to retrieve and cache the requested data takes place. Concurrently, if the address of the data access matches the address stored in mem1 under the mask, data from the location defined by the value stored in mem2 is pre-fetched into the cache.
- This additional condition may be applied for the following reason.
- If a data access involving a first location does not result in a cache miss, this indicates that an access to the first location has already been made. In that case, at least the first such access would have triggered a pre-fetch of data from the second location into the cache in the manner described above. It is therefore not necessary to pre-fetch the data again from the second location following subsequent accesses to the first location. Pre-fetching of data from the second location into the cache is thus only performed upon the first access to the first location, as indicated by a cache miss.
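- Combining the masked comparison with the hit signal gives the qualified trigger just described; again this is a hedged sketch, not the patent's logic design.

```c
#include <stdbool.h>
#include <stdint.h>

/* The pre-fetch fires only when the triggering access both matches the
 * masked address in mem1 and itself misses in the cache, i.e. on the
 * first touch of the monitored region. */
static bool should_prefetch(bool hit_signal, uint32_t access_addr,
                            uint32_t mem1, uint32_t mem3)
{
    bool match = ((access_addr ^ mem1) & mem3) == 0;
    return match && !hit_signal;
}
```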
- In this embodiment, memories mem1, mem2 and mem3 are 32-bit registers arranged to store a 32-bit value and whose contents are modifiable. For example, values may be written to mem1, mem2 and mem3 during an initialisation period following a system reset, or may be modified dynamically by the processor or another system module. In alternative embodiments, different types and sizes of memory may be used.
- The memories mem1, mem2 and mem3 may comprise dedicated memories of any suitable type, or may comprise reserved locations within a larger memory space.
- Alternatively, mem1, mem2 and mem3 may comprise read-only memories, or memories that can be written to only once, for example at the time of manufacture.
- Allowing the contents of mem1, mem2 and mem3 to be modified provides a greater degree of flexibility, for example in systems in which the indicator of an imminent data access may change over time or with the application. In applications in which the indicator of an imminent data access remains fixed, providing read-only memories increases security by preventing the values from being modified inappropriately.
- The skilled person would appreciate that various further modifications of the embodiments described above may be made.
- For example, the values stored in memories mem1 and mem2 may define addresses in the form of physical memory addresses, virtual memory addresses, or in any other suitable way.
- The kinds of memory address defined by the values stored in mem1 and mem2 may be the same, or may be different. In the embodiments described, the values stored in memories mem1 and mem2 correspond to different physical memory addresses of the system memory. In other words, a data access involving one physical memory address causes a pre-fetch of data from a different physical memory address, regardless of how the memory addresses are actually represented.
- There is thus provided a system and method which automatically monitors data accesses and initiates pre-fetches as described above. Since pre-fetches are not initiated using pre-fetch instructions, it is not necessary to modify existing code to include such pre-fetch instructions. Also, since data accesses are being continuously monitored, it is not necessary to know in advance every occasion on which a pre-fetch is required. Any relevant data accesses will be detected dynamically and a pre-fetch initiated if necessary. Furthermore, any delays associated with executing special pre-fetch instructions are eliminated. Data is pre-fetched quickly and efficiently by dedicated autonomous hardware upon detecting a relevant data access.
- There is also provided a cache memory comprising storage means and comparison means arranged to compare a first address, provided to the cache memory when a write or load access is made to the cache, with at least one predetermined address and, responsive to the first address corresponding to one of the at least one predetermined addresses, to cause the cache memory to fetch data from a further address of an external memory device and store the data in the storage means.
- A disadvantage of known systems is that they require the use of one or more special instructions to pre-fetch data into an L1 cache.
- Standard names for these instructions are pre-fetch, preload or touch instructions. It is commonplace to extend this functionality to L2 caches so that the aforementioned instructions can effect a similar operation on an attached L2 cache. This is an example of encoding the operation in the op-code of the instruction.
- The L1 and L2 caches normally communicate via a special interface which allows the L2 to perform actions when a special instruction is executed by the CPU.
- The further embodiment addresses this disadvantage, namely that special instructions have to be used to pre-fetch information into the cache.
- Instead, an ordinary load access is used to trigger the pre-fetch of multiple cache lines.
- The buffer pre-fetch differs from other types of pre-fetch in that the software does not need to issue any special pre-fetch instructions or writes to any special registers. This is advantageous in situations involving significant amounts of legacy code which is difficult to modify (e.g. to add pre-fetch instructions), or where the software engineer optimising the code cannot identify all the places at which the buffer may be accessed. In this case it is possible, following buffer creation, to associate the buffer with a pre-fetch. Thereafter, the L2 cache will behave automatically. Typically, the buffer pre-fetch scheme results in a whole buffer, and therefore multiple cache lines, being pre-fetched into the cache.
- FIGS. 4 to 8 illustrate a system comprising a level 2 cache.
- FIG. 9 illustrates a further embodiment of the present disclosure.
- The level 2 (L2) cache has a target port dedicated to accessing a special register called the L2PFR (L2 pre-fetch register).
- The L2PFR may be implemented as a 32-bit write-only register. Writing a 32-bit value to this register may cause the naturally aligned 32-byte block whose address is specified by bits [31:5] of the value to be fetched into the L2 cache. The pre-fetch operation can therefore be initiated by a CPU with a standard word write operation.
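- From software, a pre-fetch is therefore just a store to a memory-mapped register. In the sketch below the register's physical address is invented for illustration; only the bits [31:5] semantics come from the text.

```c
#include <stdint.h>

/* Hypothetical physical address of the memory-mapped L2PFR. */
#define L2PFR_ADDR 0xF0000000u

/* Ask the L2 cache to fetch the naturally aligned 32-byte block that
 * contains addr; a standard word write starts the pre-fetch. */
static inline void l2_prefetch(uint32_t addr)
{
    volatile uint32_t *l2pfr = (volatile uint32_t *)L2PFR_ADDR;
    *l2pfr = addr & ~0x1fu;          /* bits [4:0] carry no address */
}
```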
- The procedure followed is that the address is first looked up in the L2 cache. If there is a hit, that is, the 32-byte block associated with the address is present in the cache, there is no further activity and no data is fetched. If there is a miss, implying that the data is not in the cache, space is allocated in the cache and the 32-byte block is fetched from main memory and placed in the level 2 cache. This pre-fetch mechanism is therefore simple to use within the structure of conventional software and conventional DMA engines.
- A common use arises when a data buffer is to be transferred from an I/O interface to main memory, whereupon the CPU will perform some computation on the data contained in the buffer.
- A DMA engine may be deployed to transfer data from an I/O interface (e.g. an Ethernet port, a USB port, a SATA disk interface, etc.) into system dynamic random access memory (DRAM).
- Upon completion of the data transfer, the DMA engine would send an interrupt to the CPU to signal that the data transfer has finished.
- The interrupt handler in the CPU would schedule the execution of an appropriate routine to deal with the computation to be performed on the data buffer.
- The routine may then execute in an expedited manner by using one of two methods (a sketch of the second method follows this list):
- In the first method, a linked list which specifies the set of transfers to be performed by the DMA is extended by one or more additional items.
- The first additional item specifies that a single 32-bit datum is to be transferred from system memory to the address of the L2PFR register.
- The value of the datum is the address of the first byte of the data buffer which has been transferred.
- Subsequent additional items are similar, except that the value of the datum transferred to the L2PFR register is numerically 32 larger than in the previous item. If n additional items are specified (where 1 ≤ n ≤ buffer size/32), this has the effect of pre-fetching some or all of the data buffer into the L2 cache.
- The transfer proceeds as in a conventional system, and an interrupt is sent to the CPU on completion of the DMA.
- In the second method, the interrupt handler writes the address of one or more blocks which contain the data buffer to the L2PFR register. This causes some or all of the data buffer to be pre-fetched into the L2 cache before the computation routine associated with the data buffer is executed.
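- The second method might look like the following C fragment. The register address is a placeholder, and the loop mirrors the "numerically 32 larger" stepping used by the first method's descriptor items.

```c
#include <stdint.h>

#define L2PFR_ADDR 0xF0000000u       /* illustrative register address */

/* After the DMA-complete interrupt: write the address of each 32-byte
 * block of the received buffer to the L2PFR, so the buffer is being
 * pre-fetched before the computation routine runs. */
static void prefetch_buffer(uint32_t buf_addr, uint32_t buf_bytes)
{
    volatile uint32_t *l2pfr = (volatile uint32_t *)L2PFR_ADDR;
    uint32_t end = buf_addr + buf_bytes;
    for (uint32_t block = buf_addr & ~0x1fu; block < end; block += 32u)
        *l2pfr = block;              /* one word write per cache line */
}
```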
- FIG. 4 illustrates a hierarchical memory arrangement.
- The system comprises a CPU 1102, which optionally has a level 1 cache, and a separate module known as a level 2 cache 1104.
- The term level 2 should not be taken to imply exclusive use in systems which have level 1 caches, nor is there an implication that there are no level 3 or higher level caches. Nonetheless, the level 2 terminology is retained purely for simplicity of exposition.
- The level 2 cache (L2 cache) 1104 is functionally located between the CPU 1102 and the rest of the system 1106, so that all of the CPU's high performance memory requests have to go via the L2 cache 1104.
- The L2 cache 1104 is able to service some of these requests from its own contents; other requests it passes on to the rest of the system to be serviced.
- The L2 cache 1104 also contains a number of configuration and status registers (CSRs) 1108 through which the operation of the L2 cache 1104 may be controlled and monitored.
- A top-level diagram of a cache such as the L2 cache 1104 is shown in FIG. 5.
- The diagram shows an access address 1202, which is the address presented by the CPU 1102 to the L2 cache 1104, and a tag RAM 1204, which is the memory with which the access address 1202 is associated.
- The access address 1202 is compared with the contents of the tag RAM 1204 to determine which data RAM 1206 array (line) should be selected.
- The data RAM 1206 holds the data which is supplied to the L2 cache 1104.
- In a set-associative cache, an address can only reside in a limited number of places in the cache. The collection of places in which a single address may reside is called a set 1208.
- A block of data associated with a single address in the tag RAM 1204 is a line 1212.
- A refill engine 1214 is also present, which is a functional unit whose responsibility is fetching from main memory data which is not already held in the cache. It does this on demand from a standard access or a pre-fetch.
- The L2PFR 1110 is an operational register used to initiate a pre-fetch.
- The L2PFR 1110 is writable by both the CPU 1102 (using the target 1 port 1112) and modules with DMA capability 1114 in the rest of the system (using the target 2 port 1116).
- When the register is written with a 32-bit operand, the operand is interpreted as a cache line address (see FIG. 7).
- When an address is submitted to the cache for lookup, the address is broken down into a number of fields that are used for different purposes by the hardware. The size and location of each of the fields depend on the size and internal organisation of the cache. An example arrangement of the fields is shown in FIG. 7.
- A word selection field 1402 specifies which of the eight 4-byte words in the line is the requested word.
- A tag field 1404 is stored in the tag RAM to uniquely identify the address of the data held in the associated line.
- A set selection field 1406 is used to determine which set in the cache is looked up.
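- A concrete decode of these fields is sketched below. The split assumes, purely for illustration, a 32-byte line and 512 sets; the patent fixes only the principle, since the field sizes depend on the cache organisation.

```c
#include <stdint.h>

#define LINE_SHIFT 5u   /* 32-byte line: bits [4:0] address within line */
#define SET_BITS   9u   /* assumed 512 sets: bits [13:5] select the set */

/* Word selection field 1402: which of the eight 4-byte words. */
static uint32_t word_select(uint32_t addr) { return (addr >> 2) & 0x7u; }

/* Set selection field 1406: which set is looked up. */
static uint32_t set_select(uint32_t addr)
{
    return (addr >> LINE_SHIFT) & ((1u << SET_BITS) - 1u);
}

/* Tag field 1404: stored in the tag RAM to identify the line. */
static uint32_t tag_field(uint32_t addr)
{
    return addr >> (LINE_SHIFT + SET_BITS);
}
```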
- The procedure following a write to the L2PFR 1110 is outlined in the flow diagram in FIG. 6, with further reference to FIG. 8, which illustrates the internal logic and buffering of the L2 cache.
- A write is made into the L2PFR in step S1302. This is interpreted as a request to fetch the addressed block into the L2 cache.
- The operand is latched into the target 2 incoming buffer (shown in FIG. 8) and transferred to the first part of the control pipeline C1 (1504), whereupon logic signals are generated such that the address is looked up in the tags (see 1204 of FIG. 5).
- A lookup of the L2PFR address is made in step S1304. If the lookup yields a match (in step S1306), as indicated by assertion of the "HIT" signal (1216 in FIG. 5), this indicates that the data is already held in the cache and no further action is taken.
- If there is a miss, a fetch request is passed to the refill engine (1214 in FIG. 5) in step S1308.
- The refill engine ensures that an appropriate entry is added to the bus queue (1506) and also to the pending request buffer (1508).
- The pending request buffer holds the address and allocation attributes of all outstanding requests.
- Entries in the bus queue (1506) will eventually be realised as memory requests on the system interconnect (1118 in FIG. 4) in a standard manner.
- The request will eventually elicit a response containing the requested data in step S1310.
- The requested data is buffered in the response queue (1510).
- The request attributes contained in the pending request buffer (1508) are used to identify where in the cache the pre-fetched data is to be located and the tag which is to accompany it into the cache (step S1312).
- The data and tags are loaded into the cache using the line fill buffer (1512).
- In step S1314 it is checked whether a write-back is needed; if so, in step S1316 the L2 arranges for the write-back in a manner common to the design of caches, utilising a write-back buffer to hold the data whose place in the cache will have been taken by the pre-fetched data.
- In step S1318 the victim is replaced by the fetched data and, in step S1320, the process halts.
- The pre-fetch address must also be searched against the pending request buffer 1508. If there is a match in the pending request buffer, the pre-fetch request is discarded and no further action is taken.
- If L2PFR_J bits [4:0] are not zero, a pre-fetch request is issued to the line specified by L2PFR_J bits [31:5]; this occurs in the manner described previously for simple L2PFR register writes. Following the pre-fetch, L2PFR_J[4:0] is decremented by 1 and L2PFR_J[31:5] is incremented by 1. In this way a sequence of pre-fetches can be implemented with a single write to the L2PFR_J.
- This logic may be implemented by an additional two adders and a comparator, with a simple modification to the L2 cache state machine, in a manner known to skilled logic designers.
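- The L2PFR_J behaviour can be expressed as a small loop. This is a software model of the state machine, not the hardware itself, and the pre-fetch helper passed in is hypothetical.

```c
#include <stdint.h>

/* While the count in bits [4:0] is non-zero, issue a pre-fetch for the
 * line in bits [31:5], then decrement the count and advance the line. */
static void l2pfr_j_run(uint32_t l2pfr_j, void (*prefetch_line)(uint32_t))
{
    uint32_t count = l2pfr_j & 0x1fu;        /* L2PFR_J[4:0]  */
    uint32_t line  = l2pfr_j >> 5;           /* L2PFR_J[31:5] */
    while (count != 0) {
        prefetch_line(line << 5);            /* byte address of the line */
        count -= 1;
        line  += 1;
    }
}
```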
- FIG. 9 illustrates the internal structure of an L2 cache with the buffer pre-fetch scheme of the further embodiment.
- The structure shown in FIG. 9 is similar to that described above with reference to FIG. 5.
- The L2 cache shown in FIG. 9 makes use of three further 32-bit registers.
- A level 2 comparison register (L2CR_B) 1702 contains the value against which accessed addresses are compared.
- A level 2 match register (L2MR_B) 1704 contains a mask which governs how the comparison is performed.
- In order for the match signal 1706 to be asserted, the access address 1708 must match the contents of the L2CR_B register 1702 in those bit positions where the L2MR_B register 1704 is a '1'.
- The comparison operation is performed by a comparator 1710.
- A level 2 pre-fetch register (L2PFR_B) 1712 indicates the cache line(s) which should be fetched when the match signal 1706 is asserted.
- The format of this register is the same as that described with reference to the system described above, whose encoding is shown in Table 1.
- In operation, when a cacheable load access made to the L2 cache results in a miss (i.e. the hit signal 1216 is not asserted), the L2 cache will implement its standard miss handling procedure and fetch the requested cache line into the cache. Concurrently with the miss processing, the access address is also presented to the comparison logic 1710. If the access address matches the value in the L2CR_B register 1702 in all bit positions in which the L2MR_B register 1704 is a binary '1', a pre-fetch is initiated. The size and base address of the pre-fetch are indicated in the L2PFR_B register 1712 in the manner described previously.
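- Pulling the pieces together, a C model of this miss-path hook might read as follows. The L2PFR_B layout is assumed to follow the L2PFR_J encoding (line address in bits [31:5], count in bits [4:0]), which the text attributes to Table 1 but does not reproduce here.

```c
#include <stdint.h>

typedef struct {
    uint32_t l2cr_b;    /* comparison register 1702 */
    uint32_t l2mr_b;    /* match (mask) register 1704 */
    uint32_t l2pfr_b;   /* pre-fetch register 1712 */
} buffer_prefetch_regs;

/* Invoked on a cacheable load miss, concurrently with the normal miss
 * handling: compare the access address against L2CR_B under the L2MR_B
 * mask and, on a match, pre-fetch the line(s) encoded in L2PFR_B. */
static void on_l2_miss(const buffer_prefetch_regs *r, uint32_t access_addr,
                       void (*prefetch_line)(uint32_t))
{
    if (((access_addr ^ r->l2cr_b) & r->l2mr_b) != 0)
        return;                               /* match signal 1706 low */
    uint32_t count = r->l2pfr_b & 0x1fu;      /* assumed count field */
    uint32_t line  = r->l2pfr_b & ~0x1fu;     /* first line address */
    for (; count != 0; count--, line += 32u)
        prefetch_line(line);
}
```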
- The term "couple" and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another.
- The term "or" is inclusive, meaning and/or.
- The phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
Abstract
The present disclosure provides systems and methods for a cache memory and a cache load circuit. The cache load circuit is capable of retrieving a portion of data from the system memory and of storing a copy of the retrieved portion of data in the cache memory. In addition, the systems and methods comprise a monitoring circuit for monitoring accesses to data in the system memory.
Description
- The present application is related to United Kingdom Patent Application No. 0722707.7, filed Nov. 19, 2007, entitled “CACHE MEMORY SYSTEM”. United Kingdom Patent Application No. 0722707.7 is assigned to the assignee of the present application and is hereby incorporated by reference into the present disclosure as if fully set forth herein. The present application hereby claims priority under 35 U.S.C. §119(a) to United Kingdom Patent Application No. 0722707.7.
- The present invention relates to systems comprising cache memories, and in particular to systems employing data pre-fetching.
- A very large number of systems involve the retrieval of data from a system memory by a device such as a processor. Many of these systems employ a technique known as data caching which exploits a property of data access known as temporal locality. Temporal locality means data that has been accessed recently is the data most likely to be accessed again in the near future. Data caching involves storing, or caching, a copy of recently accessed data in a cache memory that is accessible more quickly and efficiently than the system memory. If the same data is requested again in the future, the cached copy of the data can be retrieved from the cache memory rather than retrieving the original data from the system memory. As the cache memory can be accessed more quickly than the system memory, this scheme generally increases the overall speed of data retrieval.
- To implement caching techniques, processor circuitry typically includes an internal cache memory which is located physically closer to the CPU than the system memory, so can be accessed more quickly than the system memory. When the processor requests data from the system memory a copy of the retrieved data is stored in the cache memory, if it is not stored there already. Some systems provide two or more caches arranged between the CPU and the system memory in a hierarchical structure. Caches further up the hierarchy are typically smaller in size, but can be accessed more quickly by the CPU than caches lower down the hierarchy. Caches within such a structure are usually referred to as level 1 (L1), level 2 (L2), level 3 (L3), . . . caches with the L1 cache usually being the smallest and fastest.
- A typical cache memory comprises a series of cache lines, each storing a predetermined sized portion of data. For example, a typical cache memory is divided into 1024 cache lines, each 32 bytes in size, giving a total capacity of 32 kB. Data is usually cached in portions equal to the size of a whole number of cache lines. When an item of data smaller than a cache line is cached, a block of data equal to the size of one or more cache lines containing the data item is cached. For example, the data item may be located at the beginning of the cache line sized portion of data, at the end or somewhere in the middle. Such an approach can improve the efficiency of data accesses exploiting a principle known as spatial locality. The principle of spatial locality means that addresses referenced by programs in a short space of time are likely to span a relatively small portion of the entire address space. By caching one or more entire cache lines, not only is the requested data item cached, but also data located nearby, which, by the principle of spatial locality is more likely to be required in the near future than other data.
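- The "naturally aligned block" arithmetic in this example is a single mask operation, sketched below with the 32-byte line size quoted above.

```c
#include <stdint.h>

#define LINE_BYTES 32u   /* line size from the example above */

/* A data item smaller than a line is cached as the whole naturally
 * aligned line-sized block that contains it; 1024 such lines give the
 * 32 kB capacity quoted above. */
static uint32_t enclosing_line(uint32_t item_addr)
{
    return item_addr & ~(LINE_BYTES - 1u);   /* clear the offset bits */
}
```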
- Each cache line of the cache memory is associated with address information, known as tags, identifying the region of the system memory from which the data stored in each cache line was retrieved. For example, the tag associated with a particular cache line may comprise the address of the system memory from which the cache line sized portion of data stored in that cache line was retrieved. The cache lines may be stored in a data memory portion of the cache, while the tags may be stored in a tag memory portion of the cache.
- When a processor requests data from the system memory, the address of the requested data is first compared to the address information in the tag memory to determine whether a copy of the requested data is already located in the cache as the result of a previous data access. If so, a cache hit occurs and the copy of the data is retrieved from the cache. If not, a cache miss occurs, in which case the data is retrieved from the system memory. In addition, a copy of the retrieved data may be stored in the cache in one or more selected cache lines and the associated tags updated accordingly. In a system comprising a cache hierarchy, when data is requested from the system memory, the highest level cache is first checked to determine if a copy of the data is located there. If not, then the next highest level cache is checked, and so on, until the lowest level cache has been checked. If the data is not located in any of the caches then the data is retrieved from the system memory. A copy of the retrieved data may be stored in any of the caches in the hierarchy.
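- The hit/miss decision can be modelled as a tag comparison, as in the following illustrative C sketch. A real cache would index the tag memory by set rather than scan it, and the names and sizes here are assumptions, but the decision is the same.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  1024u
#define LINE_BYTES 32u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];     /* source address of each cached line */
} tag_memory;

/* Compare the requested address, rounded down to its line, against the
 * stored tags: a match is a cache hit, otherwise a miss. */
static bool is_cache_hit(const tag_memory *t, uint32_t req_addr,
                         uint32_t *line_index)
{
    uint32_t line_addr = req_addr & ~(LINE_BYTES - 1u);
    for (uint32_t i = 0; i < NUM_LINES; i++)
        if (t->valid[i] && t->tag[i] == line_addr) {
            *line_index = i;
            return true;         /* hit: serve the copy from the cache */
        }
    return false;                /* miss: retrieve from system memory */
}
```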
- When applying caching techniques, it is important to ensure that the data stored in a cache represents a true copy of the corresponding data stored in the system memory. This requirement may be referred to as maintaining coherency between the data stored in the system memory and the data stored in the cache. Data coherency may be destroyed, for example, if data in one of the system memory and cache is modified or replaced without modifying or replacing the corresponding data in the other. For example, when the processor wishes to modify data, a copy of which is stored in the cache, the processor will typically modify the cached copy without modifying the original data stored in the system memory. This is because it is the cached copy of the data that the processor would retrieve in future accesses and so, for efficiency reasons, the original data stored in the system memory is not modified. However, without taking steps to maintain coherency, any other devices which access the data from the system memory would access the unmodified, and therefore out of date, data.
- Various techniques may be applied to maintain data coherency in cache memory systems. For example, one process, referred to as write-back or copy-back, involves writing or copying data stored in one or more cache lines back to the region of system memory from which the cache lines were originally retrieved (as specified in the address information). This process may be performed in a variety of circumstances. For example, when data stored in a cache line has been modified, the cache line may be copied back to the system memory to ensure that the data stored in the cache line and the corresponding data in the system memory are identical. In another example, when data is copied into the cache as a result of a cache miss, an existing cache line of data may need to be removed to make space for the new entry. This process is known as eviction and the cache line of data that needs to be removed is known as the victim. If the victim comprises modified data, then the victim would need to be written back to the system memory to ensure that the modifications made to the data are not lost when the victim is deleted from the cache.
- In some systems, special data coherency routines implemented in software are executed to maintain data coherency. Such routines may periodically sweep the cache to ensure that data coherency is maintained, or may act only when specifically required, for example when data is modified or replaced. These routines may include write-back or copy-back processes.
- Some systems employ a technique known as data pre-fetching in which data may be retrieved, possibly speculatively, before it is actually needed in order to increase the overall speed of memory access. Data pre-fetches may be speculative in the sense that the pre-fetched data may not eventually be required. In one example of data pre-fetching, when executing a code loop in which an item of data needs to be retrieved within each iteration of the loop, the data required for a particular iteration may be pre-fetched during the preceding iteration. In this way, at the point the data is actually required, it does not need to be retrieved at that time. In another example, in highly integrated multimedia systems, very large quantities of data are manipulated, typically in a linear fashion, in a technique known as data streaming. In such applications, the future access patterns of data may be known some time in advance. In this case, data required in the future may be pre-fetched so that it is immediately available when eventually required.
- Typically, pre-fetched data is stored in a cache and treated as cached data. In this way, when the pre-fetched data is actually requested, the cache will be checked to determine whether the requested data is located there. Due to the earlier data pre-fetch, a copy of the data can be retrieved from the cache, rather than accessing the system memory. Pre-fetching data into a cache is useful even in applications involving data accesses where the property of temporal locality does not apply. For example, in data streaming applications, data may only be used a single time, so temporal locality does not apply in this case. However, for the reasons given above, caching pre-fetched data is advantageous.
- Many processor architectures provide special pre-fetch instructions which allow software to cause data to be pre-fetched into a cache in advance of its use. Examples of such instructions include pre-fetch, preload or touch instructions. In such cases a cache normally communicates via a special interface which allows the cache to perform actions when a special instruction is executed by the processor. Data may be pre-fetched into any cache present in a cache hierarchy, such as a level 1 cache or level 2 cache. In some systems, pre-fetching data into a level 2 cache may be performed as a consequence of issuing a request to pre-fetch data into the level 1 cache.
- A limiting factor in the performance of many systems is the delay between a CPU requesting data from memory and the data actually being supplied to it. This delay is known as memory latency. For example, the memory latency of highly integrated systems is typically 10-100 times the duration of the execution of a single instruction by the CPU. With the continuing development of processors, CPU clock rates are increasing rapidly, resulting in increasing demand for higher rates of data access. Even with improvements in the speed of memory access, the effects of memory latency are becoming more significant as a result.
- There is a need, therefore, for a system and method for pre-fetching data which is as fast and efficient as possible. One problem with existing systems is that it may be difficult to ensure that data is pre-fetched sufficiently in advance so that it is available immediately when needed. Often, it is known in advance that data will be required, and in these situations a pre-fetch can be initiated, for example using special pre-fetch instructions. However, the execution of such instructions may take a significant period of time to complete so that the data may not be available by the time it is needed. Furthermore, modifying code to include all necessary pre-fetch instructions may be difficult. It may also be difficult to identify all occasions when pre-fetching will be necessary, particularly in a dynamic system in which the patterns of data access may not be consistent.
- The present invention solves these and other problems associated with existing techniques.
- According to a first aspect, the present disclosure provides a cache memory system for caching data comprising: a cache memory for storing a copy of a portion of data stored in a system memory; and a cache load circuit capable of retrieving the portion of data from the system memory and of storing a copy of the retrieved portion of data in the cache memory; wherein the system further comprises: means for monitoring accesses to data in the system memory; a first memory for storing a first value defining a first memory address; a comparator for comparing the access address of an access to data in the system memory with the first memory address; and a second memory for storing a second value defining a second memory address; the system being arranged such that, if a relationship between the access address and the first memory address is satisfied, the cache load circuit retrieves the portion of data stored in the system memory at the second memory address defined by the second value, and stores the retrieved portion of data in the cache memory; wherein the first and second memory addresses correspond to different physical memory addresses of the system memory.
- According to a second aspect, the present disclosure provides a method for pre-fetching data into a cache memory system, the method comprising the steps of: retrieving a portion of data from a system memory; and storing a copy of the retrieved portion of data in a cache memory; wherein the method comprises the further steps of: monitoring accesses to data in the system memory; comparing the access address of an access to data in the system memory with a first memory address; and if a relationship between the access address and the first memory address is satisfied, retrieving the portion of data stored in the system memory at a second memory address, and storing the retrieved portion of data in the cache memory; wherein the first and second memory addresses correspond to different physical memory addresses of the system memory.
- Other technical features may be readily apparent to one skilled in the art from the following FIGURES, descriptions and claims.
- For a more complete understanding of the present disclosure and its features, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
-
FIG. 1 is a schematic diagram of a cache memory system in a first embodiment of the invention; -
FIG. 2 is a schematic diagram of a system comprising the cache shown in FIG. 1; -
FIG. 3 is a schematic diagram of the monitoring circuit comprised in the system illustrated in FIG. 1; -
FIG. 4 shows a system topology comprising a level 2 cache; -
FIG. 5 shows the internal structure of a level 2 cache; -
FIG. 6 shows a flow diagram for a pre-fetch procedure; -
FIG. 7 shows the fields of a 32-bit physical address and how they are interpreted by the L2 cache lookup logic; -
FIG. 8 shows internal buffering and logic for a level 2 cache; and -
FIG. 9 shows the internal structure of a level 2 cache for a further embodiment. -
FIG. 1 is a schematic diagram of an exemplary cache memory system embodying the present disclosure. The system, referred to below simply as cache 1, comprises a data memory 3 for storing one or more cache lines 5 of data and a tag memory 7 for storing address information in the form of a series of tags 9. For each cache line 5 in the data memory 3, there is a corresponding tag 9 in the tag memory 7. The cache 1 also comprises a cache load circuit 19 used to store data in the data memory 3. It is understood that the present disclosure may be used in a variety of cache systems and is not limited to the arrangement illustrated in FIG. 1.
- FIG. 2 illustrates a system 100 comprising the cache 1 shown in FIG. 1. As shown in FIG. 2, in this embodiment, the cache 1 is a level 2 cache functionally located between a processor 101 comprising a level 1 cache 103 and a system memory 105. However, it is understood that the cache shown in FIG. 1 may be used as any level of cache, in any cache hierarchy arrangement or as a sole cache. The term system memory may refer to a specific memory device or to a group of two or more memory devices. In general the system memory represents a general memory space formed from the whole, or part of, the individual memory spaces of one or more memory devices. The processor 101 directly accesses the level 1 cache 103. The level 1 cache 103 communicates with the level 2 cache 1 via bus lines 11, 15 and 25, and the level 2 cache 1 communicates with the system memory 105 via bus line 29. The system 100 also comprises other modules, including a module 107 having DMA (Direct Memory Access) capability. The module 107 accesses the level 2 cache 1 via bus line 109. Other parts of the system (not shown) may also access the level 2 cache 1 via further bus lines (not shown) which may be separate from or integrated with bus line 109.
- With reference to FIG. 2, when the processor 101 issues a request for retrieval of data stored in the system memory 105 the following process occurs. First, the data access request is transmitted to the level 1 cache 103 which determines whether it stores a copy of the requested data. If so then the copy of the requested data is retrieved from the level 1 cache 103 and provided to the processor 101. In this case, no data retrieval involving the level 2 cache 1 or the system memory 105 is made. If the level 1 cache 103 does not store a copy of the requested data then the data access request is forwarded from the level 1 cache 103 to the level 2 cache 1. In this case, the level 2 cache 1 determines whether it stores a copy of the requested data. If so then the copy of the requested data is retrieved from the level 2 cache 1 and provided to the level 1 cache 103, which in turn provides the data to the processor 101. If the level 2 cache 1 does not store a copy of the requested data then the data is retrieved from the system memory 105. In this case, the level 2 cache 1 requests the data from the system memory 105 and provides the retrieved data to the level 1 cache 103, which in turn provides it to the processor 101.
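- A minimal sketch of this search order, assuming hypothetical helper functions (l1_lookup, l2_lookup, system_memory_read and the fill routines) standing in for the hardware paths described above:
```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers: each lookup returns true on a hit and fills *out. */
extern bool     l1_lookup(uint32_t addr, uint32_t *out);
extern bool     l2_lookup(uint32_t addr, uint32_t *out);
extern uint32_t system_memory_read(uint32_t addr);
extern void     l1_fill(uint32_t addr, uint32_t data);
extern void     l2_fill(uint32_t addr, uint32_t data);

/* Level 1 first, then level 2, then system memory, with each cache
 * filled on the way back to the processor. */
uint32_t read_data(uint32_t addr)
{
    uint32_t data;
    if (l1_lookup(addr, &data))
        return data;                     /* L1 hit: no L2 or memory access */
    if (!l2_lookup(addr, &data)) {
        data = system_memory_read(addr); /* L2 miss: fetch from memory */
        l2_fill(addr, data);
    }
    l1_fill(addr, data);                 /* provide the data to L1 as well */
    return data;
}
```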
- With reference to FIG. 1, the level 2 cache 1 performs the following process when a data access request is received by it. First, a determination is made as to whether a copy of the data specified in the data access request is already present in the data memory 3 of the cache 1. The data access request identifies the address of the system memory 105 at which the requested data is located. The address of the requested data is supplied to the tag memory 7 via line 11 and compared to the tags 9 stored in the tag memory 7. Each tag 9 comprises an address of the system memory 105 from which a corresponding cache line 5 of data was originally retrieved. If the address of the data presently being requested matches an address specified by a tag 9, this indicates that the data memory 3 does contain a copy of the requested data. A match is indicated by asserting a hit signal on line 13, which is received by the data memory 3 and the cache load circuit 19. When the hit signal is asserted, the cache line 5 of data corresponding to the tag 9 causing the hit is retrieved from the data memory 3 and output from the data memory 3 and cache 1 on line 15.
- If no match is found between the address of the requested data and any of the tags 9 in the tag memory, the hit signal is not asserted. In this case the requested data is retrieved from the system memory 105 using the cache load circuit 19 in the manner described below. A copy of the data retrieved from the system memory 105 by the cache load circuit is stored in the data memory 3. The data is then output from the data memory 3 and cache 1 on line 15.
- The cache load circuit 19 comprises a memory 21 which stores a queue of pending cache load operations. Each cache load operation represents an item of data to be retrieved from the system memory 105 and includes the memory address of the data item. A cache load operation may also contain other relevant information, such as whether the data is required as the result of a pre-fetch or some other type of data access. The address received on line 11 is provided to the cache load circuit 19 via line 17. As mentioned above, the cache load circuit 19 also receives the hit signal via line 13. When the hit signal on line 13 is not asserted, the cache load circuit 19 adds a cache load operation to the queue stored in the memory 21 based on the address received on line 17. The cache load circuit 19 processes each cache load operation in turn, for example in the order in which they were added to the queue. A newly added cache load operation will eventually be processed by the cache load circuit resulting in the data being retrieved from the system memory 105, stored in the data memory 3 and output from the cache 1.
- To process a cache load operation, the cache load circuit identifies the address of the data to be cached and issues a suitable data access request on line 29 which is received by the system memory 105. When the requested data is provided back to the cache load circuit, the cache load circuit identifies one or more suitable cache lines in the data memory in which to store the received data. These may comprise currently vacant cache lines. However, if there are insufficient free cache lines, it may be necessary to remove one or more existing cache lines of data to make room for the new data, in which case the write-back process described above may be required. The cache load circuit then transmits a load command to the data memory via line 31 comprising a copy of the data to be cached, the system memory address from which the data was retrieved and the cache lines identified to store the data. The copy of the data is then stored in the cache lines specified in the load command and corresponding tags are added to the tag memory based on the address information specified in the load command.
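- The queue of pending cache load operations may be modelled, purely for illustration, as a small FIFO; the names load_op_t, enqueue_load, process_next_load and fetch_and_fill are assumptions of this sketch:
```c
#include <stdint.h>
#include <stdbool.h>

#define QUEUE_DEPTH 8

/* One pending cache load operation, as held in memory 21: the address to
 * fetch plus a flag recording whether it was queued by a pre-fetch. */
typedef struct {
    uint32_t addr;
    bool     is_prefetch;
} load_op_t;

static load_op_t queue[QUEUE_DEPTH];
static unsigned  head, tail, count;

/* Placeholder for the request-on-line-29 and line-fill sequence above. */
extern void fetch_and_fill(uint32_t addr);

static void enqueue_load(uint32_t addr, bool is_prefetch)
{
    if (count == QUEUE_DEPTH)
        return;                           /* full: a real design would stall */
    queue[tail] = (load_op_t){ addr, is_prefetch };
    tail = (tail + 1) % QUEUE_DEPTH;
    count++;
}

static void process_next_load(void)
{
    if (count == 0)
        return;
    fetch_and_fill(queue[head].addr);     /* oldest operation first (FIFO) */
    head = (head + 1) % QUEUE_DEPTH;
    count--;
}
```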
- A technique by which the embodiment illustrated in FIG. 1 implements pre-fetching of data into the cache will now be described. In many systems, when a specific address or region of memory is accessed, this is highly indicative that data accesses involving certain other addresses or regions of memory will be made shortly afterwards. Such data accesses may include both read and write accesses. For example, in some applications, data is input from a first buffer and computation then applied to that data. The resulting data is then output to a different buffer. Since this task is typically carried out repetitively, a write to the destination buffer can be used as an indication of a subsequent access to the source buffer. Data reads, as well as data writes, from a specific region of memory may also be indicative that a data access involving another region of memory will occur in the near future.
- As shown in
FIG. 1 , thecache 1 comprises monitoring means, which in this embodiment is in the form of amonitoring circuit 33, arranged to monitor data accesses within the system and, if a data access involving a first address or region of memory occurs, to cause a pre-fetch of data from a second address or region. Themonitoring circuit 33, illustrated in more detail inFIG. 3 , comprises a first memory,mem1 35, a second memory,mem2 37 and acomparator 39.Memory mem1 35 stores a first address which, if accessed is highly indicative that a data access involving a second address will be made imminently. This second address is stored inmemory mem2 37. Thecomparator 39 is used to compare the addresses of data accesses within the system to the contents ofmem1 35. The contents of memories mem1 35 andmem2 37 may be accessed via 41 and 43 respectively.lines - When the processor requests data from the system memory, the processor issues a data access request which includes the system memory address of the requested data. As shown in
- When the processor requests data from the system memory, the processor issues a data access request which includes the system memory address of the requested data. As shown in FIG. 1, this access address is received by the cache 1 via line 25. The access address is transmitted to the monitoring circuit 33 and received at a first input of the comparator 39. The comparator 39 also receives the value stored in mem1 35 at a second input. The comparator is arranged to compare the first and second inputs, and to assert an output signal on line 45 if the two inputs match. For example, a bitwise XOR operation between the access address and the value stored in mem1 may be performed and the comparator output asserted if all bits of the result are equal to zero. It is understood that different ways to perform the comparison may be used. The signal on line 45 is received by mem2 which is arranged to output the value stored by mem2 on line 47 when the signal on line 45 is asserted. Thus, the value stored in mem2 will be output from the monitoring circuit 33 if the data access address received on line 25 matches the value stored in mem1.
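- A sketch of the exact-match comparison just described, using the XOR idiom given in the text; the function name monitor_access is hypothetical:
```c
#include <stdint.h>

/* Exact-match form of the comparator of FIG. 3: the XOR of the access
 * address and mem1 is all zeroes only when every bit matches, in which
 * case the value held in mem2 is emitted as the pre-fetch address. */
static int monitor_access(uint32_t access_addr, uint32_t mem1,
                          uint32_t mem2, uint32_t *prefetch_addr)
{
    if ((access_addr ^ mem1) == 0) {
        *prefetch_addr = mem2;   /* value driven onto line 47 */
        return 1;                /* comparator output on line 45 asserted */
    }
    return 0;
}
```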
- The value output from the monitoring circuit on line 47 is used to initiate a pre-fetch of data from the address defined by the output value using any suitable method. For example, the output value may be input as an address into the tag memory 7. This causes the cache to determine whether a copy of the data located at that address is already present in the data memory 3. If so, then no further action is taken since the data has already been cached. However, if a copy of the data is not located in the data memory 3 then a cache line of data located at the address is retrieved and stored in the data memory in the manner described above. Alternatively, the output value may be input as an address into the cache load circuit causing a new entry to be added to the queue of pending cache load operations. A pre-fetch of data may be initiated in any other suitable manner using the address based on the value output from the monitoring circuit.
- In some cases, only a single cache line of data may be pre-fetched from the address defined by the monitoring circuit output. Alternatively, any specified number of cache lines may be pre-fetched from the address defined by the monitoring circuit output. The number of cache lines to be pre-fetched may be specified in any suitable way, for example by means of a stored value. In one embodiment, the contents of memory mem2 are divided into two parts, the first defining the address of data to be pre-fetched, and the second specifying the number of cache lines to be pre-fetched from that address. As an example, bits 31:5 of the 32-bit value stored in mem2 define bits 31:5 of the address. Bits 4:0 of the address are implicitly assumed to be zero, in which case the addresses are ones aligned on 2^5 = 32 byte boundaries. Bits 4:0 of the value stored in mem2 are interpreted as a binary value specifying the number of cache lines to be pre-fetched from the address.
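- A sketch of this division of mem2 into an address part and a line count, under the bit assignments given above (bits 31:5 address, bits 4:0 count); issue_prefetch stands in for the cache load mechanism:
```c
#include <stdint.h>

/* Placeholder for the tag-memory lookup / cache load queue entry above. */
extern void issue_prefetch(uint32_t line_addr);

/* Bits 31:5 give a base address aligned on a 2^5 = 32 byte boundary;
 * bits 4:0 give the number of cache lines to fetch from that address. */
static void prefetch_from_mem2(uint32_t mem2)
{
    uint32_t base  = mem2 & ~0x1Fu;   /* bits 31:5 of the address */
    uint32_t lines = mem2 &  0x1Fu;   /* bits 4:0: line count */
    for (uint32_t i = 0; i < lines; i++)
        issue_prefetch(base + 32u * i);
}
```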
- The value stored in mem2 defining the memory address of data to be pre-fetched may comprise an offset value or other means to define a memory address or region. For example, the value may represent a signed offset value defining an offset relative to a memory address defined by the value stored in mem1. In this case, if the address defined by the value stored in mem1 is addr and the offset is os, then a data access involving the address addr will cause data to be pre-fetched from the address addr+os.
- An additional feature will now be described in which a data pre-fetch may be performed even if there is not an identical match between the address of a data access and the address stored in mem1. Specifically, a data pre-fetch may be initiated if a certain relationship between the data access address and the address defined by the value stored in mem1 is satisfied.
- In one example, a data pre-fetch is initiated if there is a match between the bits of the data access address and the corresponding bits of mem1 at a set of specified bit positions. To this end, the monitoring circuit comprises a third memory, mem3 49, which stores a mask defining those bit positions at which the access address and the contents of mem1 must match for a data pre-fetch to be initiated. For example, for each bit of the mask stored in mem3 equal to 1, the corresponding bits of the data access address and the contents of mem1 must match. When a specific bit of the mask is equal to 0, then the corresponding bits of the data access address and mem1 do not need to match. The comparator receives the contents of mem3 at a third input and asserts an output if the bits of the access address and the corresponding bits of the value stored in mem1 match at those bit positions at which the mask has a value of 1. For example, a bitwise XOR operation between the access address and the value stored in mem1 may be performed. Then a bitwise AND operation between the resulting value and the mask is performed. The comparator output is asserted if all bits of the final result are equal to zero. It is understood that different ways to perform the comparison based on the mask may be used.
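- The masked comparison may be expressed compactly as XOR followed by AND, exactly as described; the example mask values in the comments are illustrative assumptions anticipating the two settings discussed next:
```c
#include <stdint.h>

/* XOR exposes the differing bits, AND keeps only the positions that must
 * match, and a zero result means the access address matches mem1
 * wherever the mem3 mask holds a 1. */
static int masked_match(uint32_t access_addr, uint32_t mem1, uint32_t mem3)
{
    return ((access_addr ^ mem1) & mem3) == 0;
}

/* Illustrative mask settings (values are assumptions, not from the text):
 *   0xFFFFFF00  match the 24 high-order bits only, so any access within
 *               a 256-byte region starting at mem1 triggers a pre-fetch;
 *   0x0000001F  match the 5 low-order bits only, e.g. to restrict
 *               triggering to accesses with a particular line alignment;
 *   0xFFFFFFFF  exact match, equivalent to the simple comparator above. */
```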
- Using this technique, the mask value may be set, for example, so that the p highest order bits of the mask are set to 1 while the remaining q bits are set to 0 (where p+q=N is the size in bits of a memory address). This means that the memory address of a data access would only need to match the address stored in mem1 in the p highest order bits. Memory addresses having the same values for the p highest order bits span a memory range of 2^q bytes (i.e. those memory addresses differing only in the q lowest order bits). Consequently, setting the mask in this way means that the memory address of a data access would not need to be exactly equal to the address stored in mem1. Rather, the address of the data access would need to be located within a region of memory of 2^q bytes in size beginning with the address formed from the p highest order bits of the value of mem1 (with the remaining bits being equal to zero). In this way, the system can be configured so that data accesses involving a specified region of memory initiate an automatic pre-fetch of data into the cache.
- In another example, the mask may be set so that the p highest order bits are set to 0 while the remaining q bits are set to 1. In this case, the memory address of a data access would only need to match the address stored in mem1 in the q lowest order bits. The address stored in mem1 may be one aligned to an address boundary of a particular size (such as a cache line sized boundary). In this case, a data access would only initiate a pre-fetch if the address of the data access was aligned to the same boundary as the address stored in mem1. For example, the system may be configured so that only data accesses acting on addresses naturally aligned on cache line sized boundaries trigger pre-fetching of data.
- In the example described above, a data pre-fetch is initiated if there is a match between the bits of the data access address and the corresponding bits of mem1 at a set of specified bit positions. However, other relationships may be applied. For example, a data pre-fetch may be initiated if the data access address is within a certain range from the address defined by the value stored in mem1. Other relationships are possible.
- In some embodiments, a pre-fetch of data is automatically performed only if the data access causes a cache miss. For example, even if the address of the data access matches the address stored in mem1 based on the mask stored in mem3, a pre-fetch is only carried out if the hit signal on line 13 is not asserted. In this case, the procedure described above to retrieve and cache the requested data takes place. Concurrently, if the address of the data access matches the address stored in mem1 based on the mask, data from the location defined by the value stored in mem2 is pre-fetched into the cache.
- This additional condition may be applied for the following reason. When a data access involving a first location does not result in a cache miss, this indicates that an access to the first location has already been made. If this is the case then at least the first such access would have triggered a pre-fetch of data from a second location into the cache in the manner described above. Therefore, it would not be necessary to pre-fetch the data again from the second location following subsequent accesses to the first location. Pre-fetching of data from the second location into the cache is only performed upon the first access to the first location, indicated by a cache miss.
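- A sketch of this miss-gated behaviour, with handle_miss and prefetch_from_mem2 as hypothetical stand-ins for the mechanisms described above:
```c
#include <stdint.h>

/* Standard miss handling and the mem2 pre-fetch described earlier. */
extern void handle_miss(uint32_t access_addr);
extern void prefetch_from_mem2(uint32_t mem2);

/* The pre-fetch is initiated only when the access both misses (hit signal
 * on line 13 de-asserted) and matches mem1 under the mem3 mask; in
 * hardware the two actions proceed concurrently. */
static void on_access(uint32_t access_addr, int hit,
                      uint32_t mem1, uint32_t mem2, uint32_t mem3)
{
    if (hit)
        return;                   /* a first access already triggered it */
    handle_miss(access_addr);     /* normal retrieve-and-cache procedure */
    if (((access_addr ^ mem1) & mem3) == 0)
        prefetch_from_mem2(mem2);
}
```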
- In the embodiment described above, memories mem1, mem2 and mem3 are 32-bit registers arranged to store a 32-bit value and whose contents are modifiable. For example, values may be written to mem1, mem2 and mem3 during an initialisation period following a system reset, or may be modified dynamically by the processor or other system module. In alternative embodiments, different types and sizes of memory may be used. For example, mem1, mem2 and mem3 may comprise dedicated memories of any suitable type, or may comprise reserved locations within a larger memory space. In alternative embodiments, mem1, mem2 and mem3 may comprise read only memories, or may comprise memories that can be written to one time only, for example at the time of manufacture. Allowing the contents of mem1, mem2 and mem3 to be modified provides a greater degree of flexibility, for example in systems in which the indicator of an imminent data access may change over time or depending on the application. In other applications in which the indicator of an imminent data access remains fixed, providing read only memories increases security by preventing the appropriate values from being modified inappropriately. The skilled person would appreciate that various further modifications of the embodiments described above may be made.
- In the embodiments described above, the values stored in memories mem1 and mem2 may define addresses in the form of physical memory addresses, virtual memory addresses or may define addresses in any other suitable way. The kind of memory addresses defined by the values stored in mem1 and mem2 may be the same, or may be different. It can be seen that, in the embodiments described, the values stored in memories mem1 and mem2 correspond to different physical memory addresses of the system memory. In other words, a data access involving one physical memory address causes a pre-fetch of data from a different physical memory address, regardless of how the memory addresses are actually represented.
- Using the techniques described above, it is not necessary to use special pre-fetch instructions to initiate pre-fetching of data. Instead, a system and method is provided which automatically monitors data fetches and initiates pre-fetches as described above. Since pre-fetches are not initiated using pre-fetch instructions, it is not necessary to modify existing code to include such pre-fetch instructions. Also, since data accesses are being continuously monitored, it is not necessary to know in advance every occasion on which a pre-fetch is required. Any relevant data accesses will be detected dynamically and a pre-fetch initiated if necessary. Furthermore, any delays associated with executing special pre-fetch instructions are eliminated. Data is pre-fetched quickly and efficiently by dedicated autonomous hardware upon detecting a relevant data access.
- A further embodiment of the present disclosure will now be described with reference to
FIGS. 4 to 9. In broad terms, in this embodiment there is provided a cache memory comprising storage means and a comparison means arranged to compare a first address, provided to the cache memory when a write or load access is made to the cache, with at least one predetermined address, and, responsive to the first address corresponding to one of the at least one predetermined addresses, cause the cache memory to fetch data from a further address of an external memory device and store the data in the storage means.
- As mentioned above, a disadvantage of known systems is that they require the use of one or more special instructions to pre-fetch data into an L1 cache. Standard names for these instructions are pre-fetch, preload or touch instructions. It is commonplace to extend this functionality to L2 caches so that the aforementioned instructions can effect a similar operation on an attached L2 cache. This is an example of encoding the operation in the op-code of the instruction. In such cases the L1 and L2 caches normally communicate via a special interface which allows the L2 to perform actions when a special instruction is executed by the CPU. The further embodiment addresses this disadvantage, that special instructions have to be used to pre-fetch information into the cache.
- In this embodiment, an ordinary load access is used to trigger the pre-fetch of multiple cache lines. This is achieved because the L2 cache is configured to be aware that loads from certain addresses are highly indicative of imminent loads from a buffer. This is known as a buffer pre-fetch scheme. The buffer pre-fetch differs from other types of pre-fetch as the software does not need to issue any special pre-fetch instructions nor issue writes to any special registers. This is advantageous in situations which have significant amounts of legacy code which is difficult to modify (e.g. to add pre-fetch instructions) or where the software engineer optimising the code cannot identify all the places that the buffer may be accessed. In this case it is possible following buffer creation to associate it with a pre-fetch. Thereafter, the L2 cache will behave automatically. Typically, the buffer pre-fetch scheme results in a buffer being pre-fetched to the cache, and, therefore, multiple cache lines are fetched.
-
FIGS. 4 to 8 illustrate a system comprising a level 2 cache. FIG. 9 illustrates a further embodiment of the present disclosure.
- In the system shown in FIG. 5, the level 2 (L2) cache has a target port dedicated to accessing a special register called an L2PFR (L2 pre-fetch register). The use of this register allows CPU and non-CPU requesters to cause data to be fetched into the L2 cache before it is used, thereby avoiding the delay incurred when the CPU fetches on demand.
- The L2PFR may be implemented as a 32-bit write-only register. Writing a 32-bit value to this register may cause the naturally-aligned 32-byte block—whose address is specified by bits [31:5] of the value—to be fetched into the L2 cache. The pre-fetch operation can therefore be initiated by a CPU with a standard word write operation.
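- Purely as an illustration, software might initiate such a pre-fetch as follows; the register location L2PFR_ADDRESS is an assumption, as the actual mapping is platform-specific:
```c
#include <stdint.h>

/* Assumed memory-mapped location of the L2PFR; not given in the text. */
#define L2PFR_ADDRESS 0xF0000000u

/* A standard word write to the register requests that the naturally-
 * aligned 32-byte block containing byte_addr be fetched into the L2. */
static inline void l2_prefetch_block(uint32_t byte_addr)
{
    volatile uint32_t *l2pfr = (volatile uint32_t *)(uintptr_t)L2PFR_ADDRESS;
    *l2pfr = byte_addr & ~0x1Fu;  /* bits [31:5] select the block */
}
```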
- The procedure followed is that first the address is looked up in the L2 cache. If there is a hit, that is, the 32-byte block associated with the address is present in the cache, then there is no further activity and no data is fetched. If there is a miss, which implies that the data is not in the cache, then space is allocated in the cache and the 32-byte block is fetched from main memory and placed in the level 2 cache. This pre-fetch mechanism is therefore simple to use within the structure of conventional software and conventional DMA engines.
- A common use is when a data buffer is to be transferred from an I/O interface to main memory whereupon the CPU will perform some computation on the data contained in the buffer. In a conventional system a DMA engine may be deployed to transfer data from an I/O interface (e.g. an Ethernet port, a USB port, a SATA disk interface etc.) into system dynamic random access memory (DRAM). Upon completion of the data transfer the DMA engine would send an interrupt to the CPU to signal that the data transfer has finished. The interrupt handler in the CPU would schedule the execution of an appropriate routine to deal with the computation to be performed on the data buffer.
- The routine may then execute in an expedited manner by using one of two methods:
- 1). A linked list which specifies the set of transfers to be performed by the DMA is extended by one or more additional items. The first additional item specifies that a single 32-bit datum is to be transferred from system memory to the address of the L2PFR register. The value of the datum is the address of the first byte of the data buffer which has been transferred. Optionally, subsequent additional items are similar except that the value of the datum transferred to the L2PFR register is numerically 32 larger than the previous item. If n additional items were specified (where 1≦n≦(buffer size/32)) then this has the effect of pre-fetching some or all of the data buffer into the L2 cache.
- 2). The transfer proceeds as in a conventional system and an interrupt is sent to the CPU on completion of the DMA. In addition to the conventional actions the interrupt handler writes the address of one or more blocks which contain the data buffer to the L2PFR register. This causes some or all of the data buffer to be pre-fetched into the L2 cache before the computation routine associated with the data buffer is executed. Illustrative sketches of both methods are given below.
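- The following sketch illustrates the two methods above; the descriptor layout, register location and handler signature are assumptions made for the sketch, not details taken from the embodiments:
```c
#include <stdint.h>

#define L2PFR_ADDRESS 0xF0000000u   /* assumed register location */
#define BLOCK_BYTES   32u

/* Method 1 (sketch): a hypothetical DMA descriptor appended to the linked
 * list; it transfers one 32-bit datum - the address of a block of the
 * buffer - to the L2PFR, so the DMA engine itself triggers the pre-fetch. */
struct dma_desc {
    uint32_t src;                   /* where the 32-bit datum is held */
    uint32_t dst;                   /* set to L2PFR_ADDRESS */
    uint32_t len;                   /* 4 bytes */
    struct dma_desc *next;
};

/* Method 2 (sketch): on DMA completion, the interrupt handler writes the
 * address of each 32-byte block of the buffer to the L2PFR before the
 * computation routine runs. */
static void dma_complete_handler(uint32_t buf_addr, uint32_t buf_bytes)
{
    volatile uint32_t *l2pfr = (volatile uint32_t *)(uintptr_t)L2PFR_ADDRESS;
    for (uint32_t a = buf_addr; a < buf_addr + buf_bytes; a += BLOCK_BYTES)
        *l2pfr = a;                 /* each write pre-fetches one block */
}
```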
- Reference is now made to
FIG. 4, which illustrates a hierarchical memory arrangement. In this arrangement a CPU 1102 (which optionally has a level 1 cache) is supplemented by a separate module known as a level 2 cache 1104. Use of the term level 2 should not be taken to imply exclusive use in systems which have level 1 caches. Nor is there an implication that there are no level 3 or higher level caches. Nonetheless, the level 2 terminology is retained purely for simplicity of exposition.
CPU 1102 and the rest of thesystem 1106 so that all of its high performance memory requests have to go via theL2 cache 1104. TheL2 cache 1104 is able to service some of its requests from its own contents and other requests is passes on to the rest of the system to be serviced. TheL2 cache 1104 also contains a number of configuration and status registers (CSRs) 1108 through which the operation of theL2 cache 1104 may be controlled and monitored. - A top-level diagram of a cache such as the
L2 cache 1104 is shown inFIG. 5 . The cache comprises anaccess address 1202, which is the address which is presented by theCPU 1102 to theL2 cache 1104, and atag RAM 1204 which is the memory to which theaccess address 1202 is associated. In other words theaccess address 1202 is compared with the contents of thetag RAM 1204 to determine whichdata RAM 1206 array (line) should be selected.Data RAM 1206 holds the data which is supplied to theL2 cache 1104. In a set-associative cache an address can only reside in a limited number of places in the cache. The collection of places which a single address may reside is called aset 1208. The collection of addresses which are in the same set is called away 1210. A block of data associated with a single address in thetag RAM 1204 is aline 1212. Arefill engine 1214 is present, which is a functional unit whose responsibility is fetching from main memory data which is not already held in the cache. It does this on demand from a standard access or a pre-fetch. - As mentioned, this system makes use of a special register called
L2PFR 1110, which is an operational register used to initiate a pre-fetch. TheL2PFR 1110 is writable by both the CPU 1102 (using thetarget 1 port 1112) and modules withDMA capability 1114 in the rest of the system (using the target 2 port 1116). When the register is written with a 32-bit operand, the operand is interpreted as a cache line address (seeFIG. 7 ). When an address is submitted to the cache for lookup the address is broken down into a number of fields that are used for different purposes by the hardware. The size and location of each of the fields depends on the size and internal organisation of the cache. An example arrangement of the fields is shown inFIG. 7 . Aword selection field 1402 specifies which of the 8 4-byte words in the line is the requested word. Atag field 1404 is stored in the tag RAM to uniquely identify the address of the data held in the associated line. Aset selection field 1406 is used to determine which set in the cache is looked up. - The procedure following a write to the L2PFR 108 is outlined in the flow diagram in
FIG. 6 , with further reference toFIG. 8 which illustrates internal logic and buffering of the L2 cache. A write is made into the L2PFR in step S1302. This is interpreted as a request to fetch the address into the L2 cache. The operand is latched into the target 2 incoming buffer inFIG. 8 ) and transferred to the first part of the control pipeline C1 (1504) whereupon logic signals are generated such that the address is looked-up in the tags (see 1204 ofFIG. 6 ). - A lookup of the L2PFR is made in step S1304. If the lookup of the L2PFR address does yields a match (in step S1306), as indicated by assertion of the “HIT” signal (1216 in
FIG. 6 ) then this indicates that the data is already held in the cache and no further action is taken. - If the lookup of the L2PFR address does not yield a match this is indicated by de-assertion of the HIT signal (1216 in
FIG. 6 ). In this case a fetch request is passed to the refill engine (1214 inFIG. 6 ) in step S1308. The refill engine ensures that an appropriate entry is added to the bus queue (1506) and also to the Pending request buffer (1508). The Pending request buffer holds address and allocation attributes of all outstanding requests. - Entries in the bus queue (1506) will eventually be realized as memory requests on the system interconnect (1118 in
FIG. 4 ) in a standard manner. The request will eventually illicit a response containing the requested data in step S1310. The requested data is buffered in the response queue (1510). The request attributes contained in the pending request buffer (1508) are used to identify where in the cache the pre-fetched data is to be located and the tag which is to accompany it into the cache (step S1312). The data and tags are loaded into the cache using the line fill buffer (1512). - If the L2 cache is operated in copy-back mode there is a possibility that the place selected for the fetched data was previously occupied by a cache line (the victim) which has been modified since being fetched from memory (i.e. is termed dirty). A dirty victim will require writing back to memory—a process sometimes referred to as eviction. In step S1314 it is checked whether the write-back is needed, and if so, in step S1316 the L2 arranges for the write-back in a manner common to the design of caches and utilizing a write-back buffer to hold the data whose place in the cache will have been taken by the pre-fetched data. In step S1318 the victim is replaced by the fetched data, and, in step S1320, the process halts.
- There is also the possibility that the data to be pre-fetched, although not currently present in the cache, is in the process of being fetched into the cache by a preceding data access miss or indeed an earlier pre-fetch. For this reason, in addition to being looked up in the TAG array of the cache, the pre-fetch address must also be searched for in the pending request buffer 1508. If there is a match in the pending request buffer then the pre-fetch request is discarded and no further action is taken.
- Data access misses to the L2PFR address which occur when the pre-fetch request is pending will be detected by searching the pending request buffer. The pending request buffer is able to link together subsequent data accesses, so that when the fetched data returns it can be used to satisfy each of these accesses in turn. This functionality is easily implemented in standard logic and is known to the designers of caches which are able to deal with multiple misses.
- An enhancement to the system described above can be achieved through the use of "jumbo pre-fetch". In this case, low-order bits in the L2PFR are used to specify the number of cache lines to be fetched. In the preceding description it should be appreciated that the low order bits are not required to specify the cache line to be fetched as they normally indicate the byte-in-line to be accessed. This is extended to allow multiple cache lines to be fetched efficiently.
- This can be performed by a decrement and fetch system. In this encoding all accesses to a dedicated jumbo pre-fetch register (denoted L2PFR_J) are interpreted as in Table 1 below, where bits [4:0] function as a simple count of the cache lines remaining to be fetched, or as in Table 2 below, where bits [4:0] function as a power-of-2 count of the lines to be fetched. On each L2 cache clock cycle the following procedure happens.
- If L2PFR_J bits [4:0] are not zero, a pre-fetch request is issued to the line specified by L2PFR_J bits [31:5]—this occurs in the manner described previously for simple L2PFR register writes. Following the pre-fetch, L2PFR_J [4:0] is decremented by 1 and L2PFR_J [31:5] is incremented by 1. In this way a sequence of pre-fetches can be implemented with a single write to the L2PFR_J.
- This logic may be implemented by an additional two adders and a comparator with simple modification to the L2 cache state machine in a manner known to all skilled logic designers.
-
TABLE 1
L2PFR[4:0]   Lines Fetched   Bytes fetched
00000        0               0
00001        1               32
00010        2               64
00011        3               96
...          ...             ...
11111        31              992
TABLE 2
L2PFR[4:0]   Lines Fetched   Bytes fetched
0000         1               32
0001         2               64
0010         4               128
0011         8               256
0100         16              512
0101         32              1024
0110-1111    Reserved        —
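- One L2 clock cycle of the decrement-and-fetch procedure might be modelled as follows, using the Table 1 (simple count) encoding; issue_line_prefetch is a placeholder for the pre-fetch path described earlier:
```c
#include <stdint.h>

/* Issues one line pre-fetch for the block addressed by bits [31:5]. */
extern void issue_line_prefetch(uint32_t line_addr);

/* If the count in bits [4:0] is non-zero, pre-fetch the line specified
 * by bits [31:5], then advance the line field and reduce the count. */
static uint32_t l2pfr_j_step(uint32_t l2pfr_j)
{
    if ((l2pfr_j & 0x1Fu) != 0) {              /* lines remaining? */
        issue_line_prefetch(l2pfr_j & ~0x1Fu); /* line from bits [31:5] */
        l2pfr_j += 32u;                        /* bits [31:5] incremented by 1 */
        l2pfr_j -= 1u;                         /* bits [4:0] decremented by 1 */
    }
    return l2pfr_j;
}
```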
- FIG. 9 illustrates the internal structure of an L2 cache with a buffer pre-fetch scheme of the further embodiment. The structure shown in FIG. 9 is similar to that described above with reference to FIG. 4. However, the L2 cache shown in FIG. 9 makes use of three further 32-bit registers.
- A level 2 comparison register (L2CR_B) 1702 contains the value against which accessed addresses are compared. A level 2 match register (L2MR_B) 1704 contains a mask which governs how the comparison is performed. In order for a match signal 1706 to be asserted, the access address 1708 must match the contents of the L2CR_B register 1702 in those bit positions where the L2MR_B register 1704 is a '1'. The comparison operation is performed by a comparator 1710.
- A level 2 pre-fetch register (L2PFR_B) 1712 indicates the cache line(s) which should be fetched when the match signal 1706 is asserted. The format of this register is the same as that described with reference to the system described above and whose encoding is shown in Table 1.
- In operation, when a cacheable load access is made to the L2 cache which results in a miss (i.e. the hit signal 1216 is not asserted) the L2 cache will implement its standard miss handling procedure and fetch the requested cache line into the cache. Concurrent with the miss processing the access address is also presented to the comparison logic 1710. If the access address matches the value in the L2CR_B register 1702 in all bit positions in which the L2MR_B register 1704 is a binary '1' then a pre-fetch is initiated. The size and base address of the pre-fetch is indicated in the L2PFR_B register 1712 in a manner described previously.
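- A sketch of this miss-time behaviour, with the three registers passed as parameters purely for illustration and standard_miss_handling and prefetch_lines standing in for the mechanisms described above:
```c
#include <stdint.h>

/* The standard miss path and the Table 1 style pre-fetch decode above. */
extern void standard_miss_handling(uint32_t addr);
extern void prefetch_lines(uint32_t l2pfr_b);

/* On a cacheable load miss, miss handling proceeds as normal while the
 * access address is concurrently compared with L2CR_B under the L2MR_B
 * mask; on a match the pre-fetch indicated by L2PFR_B is initiated. */
static void on_cacheable_load_miss(uint32_t access_addr, uint32_t l2cr_b,
                                   uint32_t l2mr_b, uint32_t l2pfr_b)
{
    standard_miss_handling(access_addr);
    if (((access_addr ^ l2cr_b) & l2mr_b) == 0)   /* match signal 1706 */
        prefetch_lines(l2pfr_b);
}
```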
- It is understood that the features of any of the embodiments described above may be used in any of the other embodiments, where this is possible and appropriate. For example, the address fields illustrated in FIG. 7 may be used in the embodiment shown in FIG. 1.
- It may be advantageous to set forth definitions of certain words and phrases used in this patent document. The term "couple" and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation. The term "or" is inclusive, meaning and/or. The phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
- While the present disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain the present disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of the present disclosure, as defined by the following claims.
Claims (20)
1. A cache memory system for caching data comprising:
a cache memory for storing a copy of a portion of data stored in a system memory;
a cache load circuit capable of retrieving the portion of data from the system memory and of storing a copy of the retrieved portion of data in the cache memory;
a monitoring circuit, wherein the monitoring circuit monitors accesses to the system memory;
a first memory for storing a first value defining a first memory address;
a comparator for comparing an access address of an access to the system memory with the first memory address; and
a second memory for storing a second value defining a second memory address, wherein upon a determination that a relationship between the access address and the first memory address is satisfied, the cache load circuit retrieves the portion of data stored in the system memory at the second memory address defined by the second value, and stores the retrieved portion of data in the cache memory;
wherein the first and second memory addresses correspond to different physical memory addresses of the system memory.
2. The cache memory system according to claim 1 , wherein the relationship between the access address and the first address is based upon a determination that the access address and the first address are equal.
3. The cache memory system according to claim 1 , wherein the relationship between the access address and the first address is based upon the determination that the access address and the first address are equal at one or more defined bit positions.
4. The cache memory system according to claim 3 , wherein the relationship between the access address and the first address is based upon a determination that the p most significant bits of the access address and the corresponding p most significant bits of the first address are equal.
5. The cache memory system according to claim 3 , wherein the relationship between the access address and the first address is based upon a determination that the q least significant bits of the access address and the corresponding q least significant bits of the first address are equal.
6. The cache memory system according to claim 3 , wherein the bit positions are defined by a value stored in a third memory.
7. The cache memory system according to claim 1 , wherein the relationship between the access address and the first address is satisfied if the access address is within a defined range from the first address.
8. The cache memory system according to claim 1 , wherein the access to the system memory is a data read.
9. The cache memory system according to claim 1 , wherein the access to the system memory is a data write.
10. The cache memory system according to claim 1 , wherein a size of the retrieved portion of data is variable.
11. The cache memory system according to claim 10 , wherein the size of the retrieved data portion is defined by the first value.
12. The cache memory system according to claim 1 , wherein the second value is an offset value.
13. The cache memory system according to claim 1 , wherein one or more contents of the first and second memories are modifiable.
14. The cache memory system according to claim 1 , wherein the cache memory system is part of a level 2 cache.
15. An integrated circuit comprising a cache memory system according to claim 1 .
16. A system comprising:
a processor;
a system memory;
a cache memory for storing a copy of a portion of data stored in the system memory; and
a cache load circuit capable of retrieving the portion of data from the system memory and of storing a copy of the retrieved portion of data in the cache memory located between the processor and the system memory.
17. The system according to claim 16 , wherein the processor is capable of requesting an access to data in the system memory.
18. A method for pre-fetching data into a cache memory system, the method comprising the steps of:
retrieving a portion of data from a system memory; and
storing a copy of the retrieved portion of data in a cache memory;
monitoring accesses to the system memory; and
comparing an access address of an access to the system memory with a first memory address, wherein upon a determination that a relationship between the access address and the first memory address is satisfied, retrieving the portion of data stored in the system memory at a second memory address, and storing the retrieved portion of data in the cache memory, wherein the first and second memory addresses correspond to different physical memory addresses of the system memory.
19. The method according to claim 18 , wherein the access to the system memory is a data write.
20. The method according to claim 18 , wherein a size of the retrieved portion of data is variable.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0821078.3A GB2454808B (en) | 2007-11-19 | 2008-11-18 | Cache memory system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0722707.7 | 2007-11-19 | ||
| GBGB0722707.7A GB0722707D0 (en) | 2007-11-19 | 2007-11-19 | Cache memory |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20090132750A1 true US20090132750A1 (en) | 2009-05-21 |
Family
ID=38896579
Family Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/284,331 Active 2030-08-26 US9311246B2 (en) | 2007-11-19 | 2008-09-19 | Cache memory system |
| US12/284,332 Active 2031-01-26 US9208096B2 (en) | 2007-11-19 | 2008-09-19 | Cache pre-fetching responsive to data availability |
| US12/284,329 Active 2031-08-04 US8725987B2 (en) | 2007-11-19 | 2008-09-19 | Cache memory system including selectively accessible pre-fetch memory for pre-fetch of variable size data |
| US12/284,336 Abandoned US20090132750A1 (en) | 2007-11-19 | 2008-09-19 | Cache memory system |
Family Applications Before (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/284,331 Active 2030-08-26 US9311246B2 (en) | 2007-11-19 | 2008-09-19 | Cache memory system |
| US12/284,332 Active 2031-01-26 US9208096B2 (en) | 2007-11-19 | 2008-09-19 | Cache pre-fetching responsive to data availability |
| US12/284,329 Active 2031-08-04 US8725987B2 (en) | 2007-11-19 | 2008-09-19 | Cache memory system including selectively accessible pre-fetch memory for pre-fetch of variable size data |
Country Status (2)
| Country | Link |
|---|---|
| US (4) | US9311246B2 (en) |
| GB (1) | GB0722707D0 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090132768A1 (en) * | 2007-11-19 | 2009-05-21 | Stmicroelectronics (Research & Development) Limited | Cache memory system |
| US20150089352A1 (en) * | 2013-09-25 | 2015-03-26 | Akamai Technologies, Inc. | Key Resource Prefetching Using Front-End Optimization (FEO) Configuration |
| US20150186289A1 (en) * | 2013-12-26 | 2015-07-02 | Cambridge Silicon Radio Limited | Cache architecture |
Families Citing this family (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102290050B (en) * | 2010-06-18 | 2014-07-30 | 北京中星微电子有限公司 | Audio data transmission method and device |
| WO2012014015A2 (en) * | 2010-07-27 | 2012-02-02 | Freescale Semiconductor, Inc. | Apparatus and method for reducing processor latency |
| US20130166805A1 (en) * | 2010-12-14 | 2013-06-27 | Mitsubishi Electric Corporation | Interrupt cause management device and interrupt processing system |
| US9792218B2 (en) * | 2011-05-20 | 2017-10-17 | Arris Enterprises Llc | Data storage methods and apparatuses for reducing the number of writes to flash-based storage |
| US8429315B1 (en) | 2011-06-24 | 2013-04-23 | Applied Micro Circuits Corporation | Stashing system and method for the prevention of cache thrashing |
| US9336162B1 (en) | 2012-02-16 | 2016-05-10 | Applied Micro Circuits Corporation | System and method for pre-fetching data based on a FIFO queue of packet messages reaching a first capacity threshold |
| US10019390B2 (en) * | 2012-03-30 | 2018-07-10 | Intel Corporation | Using memory cache for a race free interrupt scheme without the use of “read clear” registers |
| US9311251B2 (en) | 2012-08-27 | 2016-04-12 | Apple Inc. | System cache with sticky allocation |
| US8886886B2 (en) | 2012-09-28 | 2014-11-11 | Apple Inc. | System cache with sticky removal engine |
| US9372811B2 (en) * | 2012-12-13 | 2016-06-21 | Arm Limited | Retention priority based cache replacement policy |
| US9519586B2 (en) | 2013-01-21 | 2016-12-13 | Qualcomm Incorporated | Methods and apparatus to reduce cache pollution caused by data prefetching |
| US9454486B2 (en) | 2013-07-12 | 2016-09-27 | Apple Inc. | Cache pre-fetch merge in pending request buffer |
| US9967309B2 (en) * | 2014-10-06 | 2018-05-08 | Microsoft Technology Licensing, Llc | Dynamic loading of routes in a single-page application |
| US20160124786A1 (en) * | 2014-11-04 | 2016-05-05 | Netapp, Inc. | Methods for identifying race condition at runtime and devices thereof |
| US9720827B2 (en) * | 2014-11-14 | 2017-08-01 | Intel Corporation | Providing multiple memory modes for a processor including internal memory |
| US9971693B2 (en) * | 2015-05-13 | 2018-05-15 | Ampere Computing Llc | Prefetch tag for eviction promotion |
| US9934149B2 (en) | 2016-03-31 | 2018-04-03 | Qualcomm Incorporated | Prefetch mechanism for servicing demand miss |
| US10963388B2 (en) | 2019-06-24 | 2021-03-30 | Samsung Electronics Co., Ltd. | Prefetching in a lower level exclusive cache hierarchy |
| US11210225B2 (en) * | 2019-11-25 | 2021-12-28 | Micron Technology, Inc. | Pre-fetch for memory sub-system with cache where the pre-fetch does not send data and response signal to host |
| US11372645B2 (en) * | 2020-06-12 | 2022-06-28 | Qualcomm Incorporated | Deferred command execution |
| KR20220023649A (en) * | 2020-08-21 | 2022-03-02 | 에스케이하이닉스 주식회사 | Memory controller and operating method thereof |
| CN114201120B (en) * | 2022-02-18 | 2022-05-10 | 苏州浪潮智能科技有限公司 | A data reading and writing method, device and related equipment |
| US12254196B2 (en) * | 2022-11-21 | 2025-03-18 | Advanced Micro Devices, Inc. | System and method to reduce power consumption when conveying data to a device |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000353146A (en) | 1999-06-11 | 2000-12-19 | Nec Corp | Input/output control device and method for prefetching data |
| JP2003242027A (en) | 2002-02-13 | 2003-08-29 | Sony Corp | Interface device, data processing system, and data processing method |
| US20100121935A1 (en) | 2006-10-05 | 2010-05-13 | Holt John M | Hybrid replicated shared memory |
| US9208095B2 (en) | 2006-12-15 | 2015-12-08 | Microchip Technology Incorporated | Configurable cache for a microprocessor |
| KR100868766B1 (en) * | 2007-01-31 | 2008-11-17 | 삼성전자주식회사 | Method and apparatus for determining priority of a direct memory access device having a plurality of DMA request blocks |
| GB0722707D0 (en) | 2007-11-19 | 2007-12-27 | St Microelectronics Res & Dev | Cache memory |
| GB2454810B8 (en) | 2007-11-19 | 2012-11-21 | St Microelectronics Res & Dev | Cache memory system |
| GB2454808B (en) | 2007-11-19 | 2012-12-19 | St Microelectronics Res & Dev | Cache memory system |
| GB2454809B (en) | 2007-11-19 | 2012-12-19 | St Microelectronics Res & Dev | Cache memory system |
| GB2454811B8 (en) | 2007-11-19 | 2012-11-21 | St Microelectronics Res & Dev | Cache memory system |
2007
- 2007-11-19 GB application GBGB0722707.7A, published as GB0722707D0 (status: Ceased)
2008
- 2008-09-19 US application US12/284,331, granted as US9311246B2 (status: Active)
- 2008-09-19 US application US12/284,332, granted as US9208096B2 (status: Active)
- 2008-09-19 US application US12/284,329, granted as US8725987B2 (status: Active)
- 2008-09-19 US application US12/284,336, published as US20090132750A1 (status: Abandoned)
Patent Citations (40)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5361391A (en) * | 1992-06-22 | 1994-11-01 | Sun Microsystems, Inc. | Intelligent cache memory and prefetch method based on CPU data fetching characteristics |
| US5787475A (en) * | 1992-07-21 | 1998-07-28 | Digital Equipment Corporation | Controlled prefetching of data requested by a peripheral |
| US5761706A (en) * | 1994-11-01 | 1998-06-02 | Cray Research, Inc. | Stream buffers for high-performance computer memory system |
| US5713003A (en) * | 1994-12-13 | 1998-01-27 | Microsoft Corporation | Method and system for caching data |
| US5956744A (en) * | 1995-09-08 | 1999-09-21 | Texas Instruments Incorporated | Memory configuration cache with multilevel hierarchy least recently used cache entry replacement |
| US5983324A (en) * | 1996-03-28 | 1999-11-09 | Hitachi, Ltd. | Data prefetch control method for main storage cache for protecting prefetched data from replacement before utilization thereof |
| US6173392B1 (en) * | 1997-04-12 | 2001-01-09 | Nec Corporation | Prefetch controller automatically updating history addresses |
| US20010011330A1 (en) * | 1997-06-09 | 2001-08-02 | John H. Hughes | DMA driven processor cache |
| US5944815A (en) * | 1998-01-12 | 1999-08-31 | Advanced Micro Devices, Inc. | Microprocessor configured to execute a prefetch instruction including an access count field defining an expected number of accesses |
| US6738867B1 (en) * | 1999-06-02 | 2004-05-18 | Hitachi, Ltd. | Disk array system reading ahead operand data |
| US20050125644A1 (en) * | 1999-06-21 | 2005-06-09 | PTS Corporation | Specifying different type generalized event and action pair in a processor |
| US6792508B1 (en) * | 1999-12-06 | 2004-09-14 | Texas Instruments Incorporated | Cache with multiple fill modes |
| US6862657B1 (en) * | 1999-12-21 | 2005-03-01 | Intel Corporation | Reading data from a storage medium |
| US6643743B1 (en) * | 2000-03-31 | 2003-11-04 | Intel Corporation | Stream-down prefetching cache |
| US6697909B1 (en) * | 2000-09-12 | 2004-02-24 | International Business Machines Corporation | Method and apparatus for performing data access and refresh operations in different sub-arrays of a DRAM cache memory |
| US20020116584A1 (en) * | 2000-12-20 | 2002-08-22 | Intel Corporation | Runahead allocation protection (RAP) |
| US20060294322A1 (en) * | 2000-12-20 | 2006-12-28 | Fujitsu Limited | Multi-port memory based on DRAM core |
| US20020087801A1 (en) * | 2000-12-29 | 2002-07-04 | Zohar Bogin | Method and system for servicing cache line in response to partial cache line request |
| US20030191900A1 (en) * | 2002-04-09 | 2003-10-09 | IP-First, LLC | Microprocessor with repeat prefetch instruction |
| US20070067577A1 (en) * | 2002-06-18 | 2007-03-22 | IP-First, LLC | Microprocessor, apparatus and method for selective prefetch retire |
| US20040064648A1 (en) * | 2002-09-26 | 2004-04-01 | International Business Machines Corporation | Cache prefetching |
| US20040148473A1 (en) * | 2003-01-27 | 2004-07-29 | Hughes William A. | Method and apparatus for injecting write data into a cache |
| US20040199727A1 (en) * | 2003-04-02 | 2004-10-07 | Narad Charles E. | Cache allocation |
| US20040205300A1 (en) * | 2003-04-14 | 2004-10-14 | Bearden Brian S. | Method of detecting sequential workloads to increase host read throughput |
| US7512740B2 (en) * | 2003-05-30 | 2009-03-31 | Mips Technologies, Inc. | Microprocessor with improved data stream prefetching |
| US20070043907A1 (en) * | 2003-05-30 | 2007-02-22 | Mips Technologies, Inc. | Microprocessor with improved data stream prefetching |
| US7177985B1 (en) * | 2003-05-30 | 2007-02-13 | Mips Technologies, Inc. | Microprocessor with improved data stream prefetching |
| US20050216666A1 (en) * | 2004-03-24 | 2005-09-29 | Sih Gilbert C | Cached memory system and cache controller for embedded digital signal processor |
| US20060075142A1 (en) * | 2004-09-29 | 2006-04-06 | Linden Cornett | Storing packet headers |
| US20060085602A1 (en) * | 2004-10-15 | 2006-04-20 | Ramakrishna Huggahalli | Method and apparatus for initiating CPU data prefetches by an external agent |
| US20060112229A1 (en) * | 2004-11-19 | 2006-05-25 | Moat Kent D | Queuing cache for vectors with elements in predictable order |
| US20060123195A1 (en) * | 2004-12-06 | 2006-06-08 | Intel Corporation | Optionally pushing I/O data into a processor's cache |
| US20060179258A1 (en) * | 2005-02-09 | 2006-08-10 | International Business Machines Corporation | Method for detecting address match in a deeply pipelined processor design |
| US20070113018A1 (en) * | 2005-11-14 | 2007-05-17 | Brink Peter C | Method, apparatus, and a system for efficient context switch |
| US20070124736A1 (en) * | 2005-11-28 | 2007-05-31 | Ron Gabor | Acceleration threads on idle OS-visible thread execution units |
| US20070204087A1 (en) * | 2006-02-24 | 2007-08-30 | Birenbach Michael E | Two-level interrupt service routine |
| US20080104325A1 (en) * | 2006-10-26 | 2008-05-01 | Charles Narad | Temporally relevant data placement |
| US20080168191A1 (en) * | 2007-01-10 | 2008-07-10 | Giora Biran | Barrier and Interrupt Mechanism for High Latency and Out of Order DMA Device |
| US20080256328A1 (en) * | 2007-04-12 | 2008-10-16 | Massachusetts Institute Of Technology | Customizable memory indexing functions |
| US20080263257A1 (en) * | 2007-04-17 | 2008-10-23 | International Business Machines Corporation | Checkpointed Tag Prefetcher |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090132768A1 (en) * | 2007-11-19 | 2009-05-21 | STMicroelectronics (Research & Development) Limited | Cache memory system |
| US9208096B2 (en) | 2007-11-19 | 2015-12-08 | STMicroelectronics (Research & Development) Limited | Cache pre-fetching responsive to data availability |
| US9311246B2 (en) | 2007-11-19 | 2016-04-12 | STMicroelectronics (Research & Development) Limited | Cache memory system |
| US20150089352A1 (en) * | 2013-09-25 | 2015-03-26 | Akamai Technologies, Inc. | Key Resource Prefetching Using Front-End Optimization (FEO) Configuration |
| US9477774B2 (en) * | 2013-09-25 | 2016-10-25 | Akamai Technologies, Inc. | Key resource prefetching using front-end optimization (FEO) configuration |
| US20150186289A1 (en) * | 2013-12-26 | 2015-07-02 | Cambridge Silicon Radio Limited | Cache architecture |
Also Published As
| Publication number | Publication date |
|---|---|
| US8725987B2 (en) | 2014-05-13 |
| US9208096B2 (en) | 2015-12-08 |
| US20090307433A1 (en) | 2009-12-10 |
| GB0722707D0 (en) | 2007-12-27 |
| US20090132749A1 (en) | 2009-05-21 |
| US20090132768A1 (en) | 2009-05-21 |
| US9311246B2 (en) | 2016-04-12 |
Similar Documents
| Publication | Title |
|---|---|
| US20090132750A1 (en) | Cache memory system |
| US8180981B2 (en) | Cache coherent support for flash in a memory hierarchy |
| KR100262906B1 (en) | Data prefetch method and system |
| JP4128878B2 (en) | Method and system for speculatively invalidating cached lines |
| US8706969B2 (en) | Variable line size prefetcher for multiple memory requestors |
| USRE45078E1 (en) | Highly efficient design of storage array utilizing multiple pointers to indicate valid and invalid lines for use in first and second cache spaces and memory subsystems |
| US7107384B1 (en) | Dynamic PCI-bus pre-fetch with separate counters for commands of different data-transfer lengths |
| US9286221B1 (en) | Heterogeneous memory system |
| US12332790B2 (en) | Multi-level cache security |
| US7493452B2 (en) | Method to efficiently prefetch and batch compiler-assisted software cache accesses |
| US20110072218A1 (en) | Prefetch promotion mechanism to reduce cache pollution |
| US20100064107A1 (en) | Microprocessor cache line evict array |
| JP2000250813A (en) | Data managing method for I/O cache memory |
| JPH0962572A (en) | Device and method for stream filter |
| US7657667B2 (en) | Method to provide cache management commands for a DMA controller |
| EP1304619A1 (en) | Cache memory operation |
| CN113641598A (en) | Microprocessor, cache memory system and method implemented therein |
| US5926841A (en) | Segment descriptor cache for a processor |
| JP2007200292A (en) | Disowning cache entries on aging out of the entry |
| EP0741356A1 (en) | Cache architecture and method of operation |
| US8108621B2 (en) | Data cache with modified bit array |
| US6965962B2 (en) | Method and system to overlap pointer load cache misses |
| US5835945A (en) | Memory system with write buffer, prefetch and internal caches |
| GB2454810A (en) | Cache memory which evicts data which has been accessed in preference to data which has not been accessed |
| US8108624B2 (en) | Data cache with modified bit array |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: STMICROELECTRONICS (RESEARCH & DEVELOPMENT) LIMITED; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: JONES, ANDREW MICHAEL; RYAN, STUART; REEL/FRAME: 022050/0062; Effective date: 20081215 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |