WO2011078014A1 - Cache memory and cache memory control device - Google Patents
Cache memory and cache memory control device
- Publication number
- WO2011078014A1 (PCT/JP2010/072475)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- tag
- storage unit
- address
- entry
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0808—Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/128—Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/31—Providing disk cache in a specific location of a storage system
- G06F2212/314—In storage network, e.g. network attached cache
- G06F2212/70—Details relating to dynamic memory management
Definitions
- the present invention relates to a cache memory, and more particularly to a cache memory that can be used as a shared FIFO (First-In First-Out).
- FIFO First-In First-Out
- the present invention has been made in view of such a situation, and an object thereof is to efficiently transfer data between processors in a multiprocessor having a shared cache memory. Another object of the present invention is to realize synchronization by a shared cache memory when data is transferred between processors in a multiprocessor.
- the cache memory according to the first aspect of the present invention includes: a tag storage unit in which at least one of a plurality of entries, each including a tag address and a remaining reference count, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the plurality of entries; a tag control unit that compares a second address portion of the access address, different from the first address portion, with the tag address included in each indexed entry to detect a matching entry, and that, for a read access, invalidates the matching entry without write-back after the read access when its remaining reference count indicates one remaining, or decrements the remaining reference count by one when it indicates more than one remaining; and a data control unit that, for the read access, selects the data corresponding to the matching entry from the data storage unit. This brings about the effect that the data is invalidated after being read the number of times indicated by the remaining reference count.
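The read-access rule above can be summarized in a short sketch (a hypothetical software model, not the patented circuit; the `Entry` class and function name are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    tag: int
    valid: bool
    dirty: bool
    ref_count: int  # remaining reference count (the "R" field)

def on_read_hit(entry: Entry) -> None:
    """Apply the remaining-reference-count rule after a read hit."""
    if entry.ref_count == 1:
        # Last permitted reference: invalidate without writing back.
        entry.valid = False
        entry.dirty = False
        entry.ref_count = 0
    elif entry.ref_count > 1:
        entry.ref_count -= 1
    # ref_count == 0: ordinary cache behaviour, nothing changes.
```

A count of zero leaves the entry under the normal cache algorithm, as the text notes later.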
- the tag control unit may, for a write access, when the remaining reference counts of all entries in the tag storage unit corresponding to the first address portion indicate values greater than zero, control the data related to the write access and its reference count to be saved to an external memory without accessing the tag storage unit or the data storage unit.
- a prefetch control unit may further be provided that prefetches the saved data and reference count from the memory into the data storage unit and the tag storage unit when free space becomes available in the data storage unit.
- the first aspect may further include an area designation register for designating a specific area of the memory, and the tag control unit may apply the above control to a write access whose access address falls within the designated area.
- a prefetch control unit may further be provided that prefetches the saved data from the memory into the data storage unit and sets the remaining reference count in the tag storage unit to one. This brings about the effect of prompting data transfer from the memory to the cache memory.
- the cache memory control device according to the second aspect includes: a tag storage unit in which at least one of a plurality of entries, each including a tag address and a remaining reference count, is indexed by a first address portion of an access address; and a tag control unit that compares a second address portion of the access address, different from the first address portion, with the tag address included in each indexed entry to detect a matching entry, and that, for a read access, invalidates the matching entry without write-back after the read access when its remaining reference count indicates one remaining, or decrements the remaining reference count by one when it indicates more than one remaining. This brings about the effect that the data is invalidated after being read the number of times indicated by the remaining reference count.
- the cache memory according to the third aspect includes: a tag storage unit in which at least one of a plurality of entries, each including a tag address and a lifetime flag indicating whether a lifetime is attached, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the plurality of entries and, when the lifetime flag indicates that a lifetime is attached, also stores the remaining reference count; a tag control unit that compares a second address portion of the access address, different from the first address portion, with the tag address included in each indexed entry to detect a matching entry, and that, for a read access, invalidates the matching entry without write-back after the read access when its lifetime flag indicates that a lifetime is attached and the corresponding remaining reference count indicates one remaining; and a data control unit that, for the read access, selects the data corresponding to the matching entry from the data storage unit and, when the lifetime flag of the matching entry indicates that a lifetime is attached and the remaining reference count indicates more than one remaining, decrements the remaining reference count by one. This brings about the effect that the data is invalidated after being read the number of times indicated by the remaining reference count.
- the tag control unit may, for a write access, when the remaining reference counts of all entries of the data storage unit corresponding to the first address portion indicate values greater than zero, control the data related to the write access and its reference count to be saved to an external memory without accessing the tag storage unit or the data storage unit.
- a prefetch control unit may further be provided that prefetches the saved data and reference count from the memory into the data storage unit when free space becomes available. This brings about the effect of prompting data transfer from the memory to the cache memory.
- the third aspect may further include an area designation register for designating a specific area of the memory, and the tag control unit may apply the above control to a write access whose access address falls within the designated area.
- a prefetch control unit may further be provided that prefetches the saved data from the memory into the data storage unit and sets the remaining reference count in the data storage unit to one. This brings about the effect of prompting data transfer from the memory to the cache memory.
- the cache memory according to the fourth aspect includes: a tag storage unit in which at least one of a plurality of entries, each including a tag address and a data amount field, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the entries; a tag control unit that compares a second address portion of the access address, different from the first address portion, with the tag address included in each indexed entry to detect a matching entry, and that, for a write access, waits until free space is secured based on the value of the data amount field of the matching entry, writes the data related to the write access to the matching entry of the data storage unit, and after the write access adds the amount of data related to the write access to the data amount field, and that, for a read access, waits until the amount of data targeted by the read access is available, and after the read access subtracts the amount of data related to the read access from the data amount field; and a data control unit that, for the read access, selects the data corresponding to the matching entry from the data storage unit.
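The wait-then-update flow control that the data amount field provides can be modelled as a bounded counter (a hypothetical sketch; `capacity`, `amount`, and the method names are illustrative, and the `can_*` checks stand in for the hardware's wait):

```python
class DataAmountField:
    """Models the fourth aspect's flow control: a write waits for free
    space, a read waits for enough buffered data."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.amount = 0  # current value of the data amount field

    def can_write(self, n: int) -> bool:
        # Free space is judged from the data amount field's value.
        return self.amount + n <= self.capacity

    def write(self, n: int) -> None:
        assert self.can_write(n), "write access must wait for free space"
        self.amount += n  # add the written amount after the access

    def can_read(self, n: int) -> bool:
        return self.amount >= n  # enough data buffered for the read

    def read(self, n: int) -> None:
        assert self.can_read(n), "read access must wait for data"
        self.amount -= n  # subtract the read amount after the access
```

This is the usual producer/consumer bounded-buffer invariant, expressed through a single field per entry.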
- the tag control unit may have a mode in which the data amount is added at a delayed timing, namely when write accesses have been executed for a predetermined number of entries after the write access. This delays the update of the data amount field and thereby permits rewriting of the data. Independently of this, the tag control unit may, in the mode in which the data amount is added at the delayed timing, add the data amount immediately upon receiving a flush instruction.
- the tag storage unit may include, in each entry, a lock bit indicating whether the entry is locked, and the tag control unit may set the lock bit of the matching entry to locked at the time of the write access and to unlocked at the time of the read access.
- the tag control unit may have a mode in which the lock bit is set to locked at a delayed timing, namely when write accesses have been executed for a predetermined number of entries after the write access. This delays the update of the lock bit and thereby permits the data to be read again. Independently of this, the tag control unit may, in this mode, unlock the lock bit immediately upon receiving a flush instruction.
- the tag control unit may, for the write access, when the data amount fields of all entries of the tag storage unit corresponding to the first address portion indicate values greater than zero, or when the lock bits are locked, control the data related to the write access and the amount of write data to be saved to an external memory without accessing the tag storage unit or the data storage unit. As a result, data exceeding the capacity of the cache memory is saved to the external memory, and the subsequent delivery continues.
- the cache memory control device according to the fifth aspect includes: a tag storage unit in which at least one of a plurality of entries, each including a tag address and a data amount field, is indexed by a first address portion of an access address; and a tag control unit that compares a second address portion of the access address, different from the first address portion, with the tag address included in each indexed entry to detect a matching entry, and that, for a write access, waits until free space is secured based on the value of the data amount field of the matching entry and, after the write access, adds the amount of data related to the write access to the data amount field.
- the present invention it is possible to obtain an excellent effect that data can be efficiently transferred between processors in a multiprocessor having a shared cache memory. Further, according to the present invention, when data is transferred between processors in a multiprocessor, it is possible to achieve an excellent effect that synchronization can be realized by a shared cache memory.
- FIG. 1 is a diagram showing a configuration example of an information processing system in an embodiment of the present invention. FIG. 2 is a diagram showing a functional configuration example of the shared cache in the embodiment. FIG. 3 is a diagram showing a circuit configuration example of the shared cache. FIG. 4 is a diagram showing an example of the correspondence between the data storage unit and the main memory. FIG. 5 is a diagram showing a field configuration example of the tag storage unit.
- FIG. 1 is a diagram illustrating a configuration example of an information processing system according to an embodiment of the present invention.
- This information processing system includes p processors 100-1 to 100-p (hereinafter these may be referred to collectively as the processor 100), a shared cache (secondary cache) 200, and a main memory 300.
- the processors 100-1 to 100-p and the shared cache 200 are mutually connected by a system bus 190.
- Each of the processors 100 includes primary caches 110-1 to 110-p (hereinafter, these may be collectively referred to as a primary cache 110).
- the processor 100 performs data access using the primary cache 110 as long as the primary cache 110 hits, and accesses the shared cache 200 when a miss occurs in the primary cache 110.
- when a miss occurs in the primary cache 110, the processor 100 performs data access using the shared cache 200 as long as the shared cache 200 hits; when the shared cache 200 also misses, the main memory 300 is accessed.
- a three-level storage structure of the primary cache 110, the shared cache (secondary cache) 200, and the main memory 300 corresponding to each of the processors 100 is employed.
- FIG. 2 is a diagram illustrating a functional configuration example of the shared cache 200 according to the embodiment of the present invention.
- the shared cache 200 includes an arbitration unit 210, a tag storage unit 220, a tag control unit 230, a data storage unit 240, a data control unit 250, and a response unit 260.
- the arbitration unit 210 arbitrates accesses from the processors 100-1 to 100-p and the main memory 300, and grants access permission to any of them.
- the arbitration in the arbitration unit 210 for example, it is conceivable that the processors 100-1 to 100-p and the main memory 300 are sequentially assigned by the round robin method. The permitted access is supplied to the tag control unit 230.
- the tag storage unit 220 is a memory made up of a plurality of entries, and holds a tag address or the like in each entry.
- the tag address indicates a part of the accessed address, as will be described later.
- Each entry in the tag storage unit 220 is indexed by another part of the accessed address.
- the tag control unit 230 selects and controls an entry to be accessed in the tag storage unit 220 based on the accessed address. The entry selected by the tag control unit 230 is notified to the data control unit 250.
- the data storage unit 240 stores data corresponding to each entry in the tag storage unit 220. Data stored in the data storage unit 240 is managed for each cache line, and transfer between the main memory 300 and the processor 100 is also performed for each cache line.
- the data control unit 250 accesses the data (cache line) stored in the data storage unit 240 according to the entry selected by the tag control unit 230. In the case of read access or write back operation, the data read from the data storage unit 240 is supplied to the response unit 260. In the case of write access, the write data is embedded at the corresponding position in the data read from the data storage unit 240 and stored again in the data storage unit 240.
- the response unit 260 outputs the data supplied from the data control unit 250 to the processors 100-1 to 100-p or the main memory 300. If the response is a read access from the processor 100, the data is output to the accessed processor 100. In the case of a write-back operation to the main memory 300, the data is output to the main memory 300.
- FIG. 3 is a diagram illustrating a circuit configuration example of the shared cache 200 according to the embodiment of the present invention.
- the shared cache 200 is a 2-way set associative cache having 128 lines and a line size of 64 B (bytes). That is, a maximum of two cache lines can be stored for the same index address, and the size of data corresponding to each cache line is 64 bytes.
- the access address is assumed to be 28 bits. Since the line size is 64 bytes, a total of 6 bits (bits 0 to 5) of the access address are allocated to the in-line address. Since the number of lines is 128, the index address for indexing the entries in the tag storage unit 220 is allocated a total of 7 bits (bits 6 to 12) of the access address. The tag address is therefore allocated the remaining 15 bits (bits 13 to 27) of the access address.
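The address split above can be written as a small helper (a sketch using the field widths from the text: 6-bit in-line address, 7-bit index, 15-bit tag; the function name is illustrative):

```python
def split_address(addr: int):
    """Split a 28-bit access address into (tag, index, offset)."""
    offset = addr & 0x3F            # bits 0-5: byte within the 64 B line
    index = (addr >> 6) & 0x7F      # bits 6-12: selects 1 of 128 sets
    tag = (addr >> 13) & 0x7FFF     # bits 13-27: stored as the tag address
    return tag, index, offset
```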
- the tag address is supplied to the shared cache 200 via the signal line 201
- the index address is supplied via the signal line 202
- the in-line address is supplied to the shared cache 200 via the signal line 203.
- the tag storage unit 220 includes two ways # 0 and # 1 each having 128 entries. Each way of the tag storage unit 220 is indexed by an index address supplied via the signal line 202. Thus, in this example, two entries will be indexed.
- the tag control unit 230 includes comparators 231 and 232 and an OR operation unit 233, and detects an entry with a matching tag address among the entries indexed in the tag storage unit 220.
- the comparator 231 detects a match by comparing the tag address included in the entry indexed in the way # 0 of the tag storage unit 220 with the tag address supplied via the signal line 201.
- the comparator 232 detects a match by comparing the tag address included in the entry indexed in the way # 1 of the tag storage unit 220 with the tag address supplied via the signal line 201.
- the comparison results of the comparators 231 and 232 are supplied to the OR operation unit 233 and the data control unit 250.
- the OR operation unit 233 outputs a hit notification via the signal line 298 when a match is detected by either comparator 231 or 232. However, as described later, when the valid bit of the corresponding entry indicates invalid, the access is judged a miss.
- the data storage unit 240 includes two ways # 0 and # 1 each having 128 cache lines, and stores data corresponding to each entry in the tag storage unit 220.
- the data storage unit 240 is also indexed by an index address supplied via the signal line 202 in the same manner as the tag storage unit 220. As a result, two 64-byte line data are supplied to the data control unit 250.
- the data control unit 250 includes selectors 251 and 252.
- the selector 251 selects one of the two 64-byte line data supplied from the data storage unit 240. That is, when a match is detected in the comparator 231, the line data of the way # 0 in the data storage unit 240 is selected, and when a match is detected in the comparator 232, the line data of the way # 1 is selected. However, as will be described later, when the valid bit of the entry in which a match is detected indicates invalid, the cache line data is not selected. If no match is detected in either of the comparators 231 and 232, no cache line data is selected.
- the selector 252 selects data at a position specified as an in-line address among the selected line data.
- the in-line address is supplied via the signal line 203.
- the function of the selector 252 may be provided on the processor 100 side. In either case, the entire line data or a part thereof is output to the response unit 260 via the signal line 299.
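The compare-and-select path of FIG. 3 can be sketched as follows (a hypothetical model, not the circuit itself; each way's entry is a `(valid, tag)` pair and `way_lines` holds the two 64-byte line data):

```python
def lookup(way_entries, way_lines, tag):
    """2-way tag compare: per-way match gated by the valid bit.

    The hit signal is the OR of the two matches (OR operation unit 233),
    and the matching way's line data is selected (selector 251).
    """
    for way in (0, 1):
        valid, stored_tag = way_entries[way]
        if valid and stored_tag == tag:   # comparators 231/232 + valid gate
            return True, way_lines[way]   # hit: select this way's line
    return False, None                    # miss: no line data selected
```

Selecting the word inside the line (selector 252) would then index the returned 64 bytes with the in-line address.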
- FIG. 4 is a diagram illustrating an example of a correspondence relationship between the data storage unit 240 and the main memory 300 in the embodiment of the present invention.
- a 2-way set associative cache having 128 lines and a block size of 64 bytes is assumed as the shared cache 200.
- Each cache line of the data storage unit 240 is indexed by the index address as described above.
- the index address of the 0th line is “0”
- the index address of the 1st line is “1”
- the index address of the 127th line is “127”.
- the 0th line of the data storage unit 240 stores a line whose lower 13 bits of the address are “0b0000000000000” (“0b” means that the following number is a binary number, and so on).
- the first line of the data storage unit 240 stores a line whose lower 13 bits of the address are “0b0000001000000”.
- the second line of the data storage unit 240 stores a line whose lower 13 bits of the address is “0b0000010000000”.
- the third line of the data storage unit 240 stores a line whose lower 13 bits of the address are “0b0000011000000”.
- the fourth line of the data storage unit 240 stores a line whose lower 13 bits of the address are “0b0000100000000”.
- the 127th line of the data storage unit 240 stores a line whose lower 13 bits of the address are “0b1111111000000”.
- the number of cache lines that can be stored in the shared cache 200 for a given index address is limited to two. Therefore, in order to store new data in a set in which both cache lines are already occupied, one of the cache lines must be evicted and replaced.
- as a method of selecting the cache line to be replaced, for example, the method of evicting the least recently used cache line (the LRU method) is known.
- FIG. 5 is a diagram illustrating a field configuration example of the tag storage unit 220 according to the first embodiment of the present invention.
- Each entry in the tag storage unit 220 includes fields for a tag address 221, a valid 222, a dirty 223, and a reference count 224.
- the tag address 221 stores the tag address (upper 15 bits of the address) of the cache line corresponding to the entry.
- the tag address 221 is abbreviated as “TAG”.
- the valid 222 stores a valid bit (Valid) indicating the validity of the entry. If the valid 222 indicates “1”, the cache line data corresponding to the entry is valid. If the valid 222 indicates “0”, even if a match is detected in the comparator 231 or 232, the access is not judged a hit. In the figure, this valid 222 is abbreviated as “V”.
- the dirty 223 stores a dirty bit (Dirty) indicating that the data in the cache line corresponding to the entry does not match the data in the main memory 300.
- the dirty 223 indicates “1”, the data in the cache line corresponding to the entry does not match the data in the main memory 300, and the data in the shared cache 200 is the latest data. means.
- the dirty 223 indicates “0”, it means that the data in the cache line corresponding to the entry matches the data in the main memory 300.
- this dirty 223 is abbreviated as “D”.
- the reference count 224 stores the remaining number of times (Reference number) the cache line corresponding to the entry is to be referenced.
- the reference count 224 is abbreviated as “R”. This reference count 224 is set at the same time when data to be transferred is written to the cache line.
- when a read access is performed and the reference count 224 indicates “2” or more, “1” is subtracted (decremented) from the value stored in the reference count 224.
- when the reference count 224 indicates “1”, the cache line is invalidated after the read access. At this time, write-back to the main memory 300 is not performed.
- when the reference count 224 indicates “0”, the value does not change even if a read access is performed. As a result, operation according to a normal cache algorithm is possible. That is, no matter how many read accesses are made, the line is not invalidated as long as it hits, and write-back to the main memory 300 is performed when necessary.
- FIG. 6 is a diagram illustrating a configuration example relating to the update of the tag storage unit 220 according to the first embodiment of the present invention.
- the tag storage unit 220 or the tag control unit 230 includes a comparator 511, a subtracter 512, and a comparator 513.
- when a read access is performed, the reference count 224 of the target cache line is read, and when the comparator 511 detects that the reference count 224 indicates “2” or more, the subtracter 512 decrements the reference count 224 by 1. Also, when the comparator 513 detects that the reference count 224 indicates “1”, the cache line is invalidated; that is, the valid 222 and the dirty 223 are cleared to zero.
- data can be transferred between the processors 100 by using the shared cache 200 as a shared FIFO. At this time, since the transferred data is invalidated without being written back, it does not remain in the shared cache 200.
- the processor 100-1 writes data with the reference count to the shared cache 200.
- when the tag control unit 230 performs the tag comparison for the write access and finds, as a result, that the reference count 224 is already set in all ways, an overflow event occurs. In this case, the data is stored directly in the main memory 300, passing through the shared cache 200 and using the uncached path. At this time, the reference count accompanying the write data is also saved to the main memory 300.
- the processor 100-2 reads the data from the shared cache 200, and invalidates the cache line whose reference count 224 has changed from “1” to “0”. Then, when there is a read access request from the processor 100-2, data is filled from the main memory 300 to the shared cache 200. At this time, the saved reference count is also set to the reference count 224. As a result, read access from the processor 100-2 becomes possible.
- FIG. 7 is a diagram showing a first example of the relationship between the main memory 300 and the FIFO storage area 310 according to the first embodiment of the present invention.
- a continuous space of the main memory 300 is used as the FIFO storage area 310 shared among the processors 100. Therefore, the FIFO storage area 310 is specified by the start address and the size. In this example, it is assumed that the FIFO storage area 310 is predetermined as a specified value.
- the write data is saved through the shared cache 200.
- a reference count storage area 320 for saving the reference count is allocated to the main memory 300.
- in the reference count storage area 320, when the reference count 224 is already set in all ways as described above, the reference count associated with the write data is saved through the shared cache 200.
- FIG. 8 is a diagram showing a processing procedure during writing of the shared cache 200 according to the first embodiment of this invention.
- when a cache hit is detected as a result of the comparison in the tag control unit 230 (step S910), “1” is set in the dirty 223 of the cache line, and the reference count accompanying the write data is set in the reference count 224 (step S918). Then, the write data is written into the data storage unit 240 (step S919).
- when a miss is detected (step S910) and there is an unused way (step S911), a cache line is added (step S912) and “1” is set in the valid 222 of the cache line (step S917). Then, “1” is set in the dirty 223, and the reference count accompanying the write data is set in the reference count 224 (step S918). Then, the write data is written into the data storage unit 240 (step S919).
- when a miss is detected (step S910), all ways are in use (step S911), and the reference count 224 of every way is set to a value greater than zero (step S913), the access passes through the shared cache 200; that is, the data and the reference count are saved to the main memory 300 using the uncached path (steps S915 and S916).
- when a miss is detected, all ways are in use (step S911), and the reference count 224 of some way is set to zero (step S913), that cache line is replaced (step S914).
- the operation after the cache line replacement is the same as when a cache line is added (steps S917 to S919).
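The write-side decision tree of FIG. 8 can be condensed into a short sketch (a hypothetical software model of steps S910 to S919; the set is modelled as two `Entry` objects and the `saved_to_memory` list stands in for the uncached path to the main memory 300):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    ref_count: int = 0

def handle_write(ways, tag, ref_count, saved_to_memory):
    """Steps S910-S919: hit -> update; miss -> add, replace, or save."""
    for e in ways:                            # S910: tag comparison
        if e.valid and e.tag == tag:          # cache hit
            e.dirty = True                    # S918
            e.ref_count = ref_count
            return "write"                    # S919: write the data
    for e in ways:                            # miss
        if not e.valid:                       # S911: unused way exists
            e.tag, e.valid = tag, True        # S912, S917
            e.dirty, e.ref_count = True, ref_count  # S918
            return "write"                    # S919
    if all(e.ref_count > 0 for e in ways):    # S913: all still to be read
        saved_to_memory.append((tag, ref_count))  # S915-S916: uncached path
        return "saved"
    victim = next(e for e in ways if e.ref_count == 0)  # S914: replace
    victim.tag, victim.valid = tag, True      # S917
    victim.dirty, victim.ref_count = True, ref_count  # S918
    return "write"                            # S919
```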
- FIG. 9 is a diagram showing a processing procedure when the shared cache 200 is read according to the first embodiment of this invention.
- when a miss is detected (step S921), a cache line is secured (step S922) and “1” is set in the valid 222 of the cache line (step S923). At this time, “0” is set in the dirty 223, and the reference count saved in the reference count storage area 320 of the main memory 300 is set in the reference count 224 (step S923).
- the data storage unit 240 is filled with data from the FIFO storage area 310 of the main memory 300 (step S924). Thereafter, data is read from the data storage unit 240 of the cache line (step S925).
- When a cache hit is detected (step S921), data is read from the data storage unit 240 of the cache line (step S926). At this time, if the value of the reference count 224 of the cache line is “1” (step S927), the valid 222 is set to “0” and the line is invalidated (step S928). If the value of the reference count 224 is “2” or more (step S927), the value of the reference count 224 is decremented by 1 (step S929). If the value of the reference count 224 is “0” (step S927), the value of the reference count 224 is left unchanged.
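The cache-hit read behavior of steps S926 to S929 above can be sketched as follows (an illustrative model, not the hardware; the names are assumptions):

```python
# Model of the cache-hit read path (steps S926-S929): the reference count 224
# governs invalidation without write-back.

class Line:
    def __init__(self, valid=0, refcount=0, data=None):
        self.valid = valid          # valid 222
        self.refcount = refcount    # reference count 224
        self.data = data

def read_line(line):
    """Return the line data (S926), then update the tag per S927-S929."""
    data = line.data                 # step S926
    if line.refcount == 1:           # step S927
        line.valid = 0               # step S928: invalidate, no write-back
    elif line.refcount >= 2:
        line.refcount -= 1           # step S929: decrement by 1
    # refcount == 0: left unchanged (line not used for FIFO passing)
    return data
```

A line written with a reference count of 2 thus survives exactly two reads and then vanishes from the cache without ever being written back.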
- FIG. 10 is a diagram showing a first example of a data passing sequence between processors in the first embodiment of the present invention.
- in this example, it is assumed that the shared cache 200 has sufficient capacity.
- the shared cache 200 secures a cache line and writes the write data and the reference count in the entry (step S951).
- When the processor 100-2 issues a read request (step S952), the shared cache 200 performs the comparison in the tag control unit 230 (step S953) and, on a cache hit, outputs the data to the processor 100-2 (step S954).
- Assuming that the value of the reference count 224 is “2” or more, it is decremented by “1” (step S955).
- When the processor 100-2 issues a further read request (step S956), a cache hit occurs (step S957), and the data is output to the processor 100-2 (step S958). At this time, assuming that the value of the reference count 224 is “1”, the cache line is invalidated without being written back (step S959).
- FIG. 11 is a diagram showing a second example of a data passing sequence between processors in the first embodiment of the present invention. In this example, data having a size exceeding the capacity of the shared cache 200 is transferred.
- When the processor 100-1 issues a write request (step S960) and a cache line cannot be secured in the shared cache 200 (step S961), the data and the reference count are saved in the main memory 300 (step S962).
- the processor 100-2 issues a read request for other data (step S963), hits the cache (step S964), and outputs the data to the processor 100-2 (step S965).
- the cache line is invalidated without being written back (step S966).
- When a read request for the saved data is issued by the processor 100-2 (step S967), a miss occurs and a fill request is issued from the shared cache 200 to the main memory 300 (step S968).
- When the data and reference count saved in the main memory 300 are output to the shared cache 200 (step S969), the shared cache 200 writes them into an entry of a cache line (step S970). As a result, the saved data can be read from the shared cache 200, and the data is output to the processor 100-2 (step S971).
- FIG. 12 is a diagram illustrating a configuration example in which a prefetch function is provided in the information processing system according to the first embodiment of this invention.
- a prefetch control unit 400 is connected between the processors 100-1 to 100-p and the shared cache 200. Prior to the read access from the processor 100, the prefetch control unit 400 issues a read request to the shared cache 200 to perform prefetching. In other words, this prefetch prompts data transfer from the main memory 300 to the shared cache 200.
- FIG. 13 is a diagram illustrating a configuration example of the prefetch control unit 400 according to the first embodiment of the present invention.
- the prefetch control unit 400 includes a prefetch address register 410, a FIFO capacity register 420, a shared cache capacity counter 430, a main memory capacity counter 440, a bus interface 450, and a control unit 490.
- the prefetch address register 410 is a register that holds a prefetch address for issuing a read request to the shared cache 200.
- the value of the prefetch address register 410 is sequentially updated and controlled to prepare for the next prefetch.
- the FIFO capacity register 420 is a register that holds the total capacity of the shared cache 200 and the main memory 300 used as a shared FIFO.
- the shared cache capacity counter 430 is a counter that holds the data size stored in the shared cache 200.
- the main memory capacity counter 440 is a counter that holds the size of the transfer target data stored in the main memory 300.
- the bus interface 450 is a bus interface to the system bus 190 on the shared cache 200 side.
- the control unit 490 is responsible for overall control of the prefetch control unit 400.
- the capacity to be handled as the FIFO is set in the FIFO capacity register 420.
- the processor 100-1 writes data to transfer the first data.
- a value obtained by adding the value of the shared cache capacity counter 430, which increases with each write, to the write address from the processor 100-1 is stored in the prefetch address register 410.
- the value of the prefetch address register 410 represents an address for which a prefetch request may be issued.
- the prefetch address register 410 is advanced by the increment of the shared cache capacity counter 430 caused by the write from the processor 100-1. Then, when a prefetch generation condition described later is satisfied, a prefetch request is issued to the address held in the prefetch address register 410.
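The relationship between the write address, the shared cache capacity counter 430, and the prefetch address register 410 described above can be sketched as a one-line computation. This is an assumption for illustration: the text does not state the arithmetic unit, so a 64-byte cache line per counted entry is assumed here.

```python
# Sketch of the prefetch address update. Assumption: the counter counts
# cache-line-sized entries, so the address advances by 64 bytes per entry.

LINE_SIZE = 64  # bytes per cache line (illustrative assumption)

def update_prefetch_address(write_addr, shared_cache_count):
    """Prefetch address register 410 = write address from the processor
    + (entries already held, i.e. shared cache capacity counter 430) * line size."""
    return write_addr + shared_cache_count * LINE_SIZE
```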
- the FIFO capacity register 420 is reset or cleared. With this as a trigger, the prefetch address register 410 is also cleared.
- FIG. 14 is a diagram illustrating an example of state transition of the prefetch control unit 400 according to the first embodiment of the present invention.
- the prefetch control unit 400 is in one of five states: an empty state 10, an L2 limited state 21, an L2 full / main save state 22, an L2 non-full / main save state 23, and a full state 30.
- the value of the shared cache capacity counter 430 is expressed as cnt, the value of the main memory capacity counter 440 as excnt, the value of the FIFO capacity register 420 as size, and the total capacity of the shared cache 200 as L2size. It is assumed that each piece of data is referenced once.
- the L2 non-full / main save state 23 is a state in which the shared cache 200 has a cache line whose reference count 224 has a value of “0”, but the data to be transferred is also stored in the main memory 300. That is, cnt ≠ L2size and excnt ≠ 0.
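The five states can be distinguished from the counters alone. The sketch below is illustrative: only the condition for the L2 non-full / main save state 23 (cnt ≠ L2size and excnt ≠ 0) is stated in the text, and the conditions for the other four states are assumptions consistent with their names.

```python
def classify(cnt, excnt, size, l2size):
    """Classify the prefetch control unit 400 state from the counter values.

    cnt    -- shared cache capacity counter 430
    excnt  -- main memory capacity counter 440
    size   -- FIFO capacity register 420
    l2size -- total capacity of the shared cache 200
    Only the state-23 condition comes from the text; the rest are assumed.
    """
    if cnt == 0 and excnt == 0:
        return "empty"                      # state 10 (assumed: nothing held)
    if cnt + excnt == size:
        return "full"                       # state 30 (assumed: FIFO capacity reached)
    if excnt == 0:
        return "L2 limited"                 # state 21 (assumed: data only in L2)
    if cnt == l2size:
        return "L2 full / main save"        # state 22 (assumed: L2 full, rest in main)
    return "L2 non-full / main save"        # state 23: cnt != l2size and excnt != 0
```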
- the shared cache capacity counter 430 is incremented by 1, and the state transitions to the L2 limited state 21.
- the timing of addition and state transition of the shared cache capacity counter 430 is when a transaction between the prefetch control unit 400 and the shared cache 200 is completed.
- the update and state transition timings of the shared cache capacity counter 430 and the main memory capacity counter 440 are when a transaction between the prefetch control unit 400 and the shared cache 200 is completed.
- the shared cache capacity counter 430 is decremented by “1”. At this time, if the value of the shared cache capacity counter 430 is decremented from “1” to “0”, the state transitions to the empty state 10.
- the shared cache capacity counter 430 is incremented by “1”.
- the main memory capacity counter 440 is incremented from “0” to “1”, and the state transitions to the L2 full / main save state 22. At that time, the data is stored directly in the main memory 300 instead of the shared cache 200.
- the shared cache capacity counter 430 is decremented by 1, and the state transitions to the L2 non-full / main save state 23.
- the read data at this time is data stored in the shared cache 200, and since the number of references is assumed to be one as described above, the cache line after the read is invalidated.
- the shared cache capacity counter 430 is decremented by “1”, and the state transitions to the L2 non-full / main save state 23. Further, in the full state 30, a case may occur in which no data to be transferred is stored in the shared cache 200 and all of it is stored in the main memory 300. In this case, prefetch is performed from the main memory 300 to the shared cache 200, the shared cache capacity counter 430 is incremented by “1”, the main memory capacity counter 440 is decremented by “1”, and the state transitions to the L2 non-full / main save state 23.
- a prefetch request is automatically issued from the prefetch control unit 400 to the shared cache 200.
- the shared cache capacity counter 430 is incremented by 1
- the main memory capacity counter 440 is decremented by 1.
- the main memory capacity counter 440 is incremented by “1”, and the data is directly stored in the main memory.
- the shared cache capacity counter 430 is decremented by “1”. However, when the value of the shared cache capacity counter 430 is “0” at the time of reading, completion of the prefetch is awaited. When the value of the main memory capacity counter 440 becomes “0” after the prefetch operation, the state transitions to the L2 limited state 21.
- the prefetch generation condition is satisfied. Then, by prefetching, data is filled from the main memory 300 into the shared cache 200.
- FIG. 15 is a diagram illustrating a second example of the relationship between the main memory 300 and the FIFO storage area 310 according to the first embodiment of this invention.
- a head address register 521 that holds a head address and a size register 522 that holds a size are provided in the shared cache 200 in order to designate the storage area used for the FIFO application.
- the head address register 521 and the size register 522 are an example of an area designation register described in the claims.
- when data is filled, the reference count 224 is set to “1”. In a normal FIFO, data that has been read once is no longer needed, so fixing the number of references to one poses no problem; the reference count need not be saved to the main memory 300, which reduces the area required on the LSI.
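The role of the head address register 521 and size register 522 can be sketched as a simple range test that fixes the reference count at “1” on fill inside the designated FIFO area. The half-open address range and function names are assumptions for illustration.

```python
def in_fifo_area(addr, head, size):
    """True if addr falls in the area designated by the head address
    register 521 and the size register 522 (half-open range assumed)."""
    return head <= addr < head + size

def refcount_on_fill(addr, head, size, saved_refcount):
    """Reference count 224 to set when filling a line: fixed at 1 inside the
    designated FIFO area (so nothing is saved to the reference count storage
    area 320); outside it, the saved reference count is used."""
    return 1 if in_fifo_area(addr, head, size) else saved_refcount
```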
- the processing procedure at the time of writing is the same as that described with reference to FIG. 8, and thus the description thereof will be omitted.
- the processing procedure at the time of reading will be described below.
- FIG. 16 is a diagram showing a processing procedure when the shared cache 200 is read when the FIFO storage area is designated in the first embodiment of the present invention.
- the operation when a cache hit is detected (step S931) is the same as the processing procedure (steps S926 to S929) described with reference to FIG. 9 (steps S936 to S939).
- the operation when a miss hit is detected (step S931) is almost the same as the processing procedure (steps S922 to S925) described with reference to FIG. 9 (steps S932 to S935).
- this example is different in that the reference count 224 is set to “1” in step S933. Thereby, it is unnecessary to save the reference count in the reference count storage area 320.
- the reference count 224 in the tag storage unit 220 is decremented on every read access, and the cache line is invalidated when its value changes from “1” to “0”.
- the cache memory can be operated as a shared FIFO between processors.
- in the first embodiment, the reference count 224 field is provided in the tag storage unit 220.
- in the second embodiment, the reference count is stored in the data storage unit 240 instead.
- the premise of the information processing system and the configuration of the shared cache is the same as that of the first embodiment described with reference to FIGS.
- FIG. 17 is a diagram illustrating a field configuration example of the tag storage unit 220 according to the second embodiment of the present invention.
- Each entry of the tag storage unit 220 includes fields of a tag address 221, a valid 222, a dirty 223, and a lifetime 225. Since the tag address 221, the valid 222, and the dirty 223 are the same as the fields of the first embodiment described with reference to FIG. 5, the description thereof is omitted here.
- the lifetime 225 stores a lifetime-limited flag (Time limited) indicating whether or not the cache line corresponding to the entry has a lifetime.
- this lifetime 225 is abbreviated as “T”.
- for a cache line whose lifetime 225 indicates that it has a lifetime, the data storage unit 240 stores the reference count, as will be described later.
- each field of the tag storage unit 220 is set at the time of processing for dealing with a cache miss accompanying the occurrence of a cache miss, and is updated as appropriate in the subsequent processing.
- FIG. 18 is a diagram illustrating a field configuration example of the data storage unit 240 according to the second embodiment of the present invention.
- the data storage unit 240 includes two ways # 0 and # 1 each consisting of 128 cache lines, and holds 64-byte line data. Of the 64-byte line data, the upper 1 byte is the reference count 242, and the lower 63 bytes is the data 241. This allocation of 1 byte and 63 bytes is an example, and may be changed as appropriate.
- the reference count 242 stores the remaining number of times (Reference Number) the cache line corresponding to the entry is to be referenced. In this example, an integer value from “0” to “255” is stored. In the figure, the reference count 242 is abbreviated as “RN”. This reference count 242 is valid only when the lifetime 225 of the corresponding cache line indicates “1”. When the lifetime 225 indicates “0”, the reference count 242 has no special meaning, and the entire 64-byte line data is handled as data. That is, according to the value of the lifetime 225, the cache line takes one of two configurations.
- the reference count 242 is set at the same time when the data to be transferred is written as the data 241 to the cache line.
- on each read access, the value stored in the reference count 242 is decremented by “1”.
- the cache line is invalidated after the read access. At this time, write back to the main memory 300 is not performed.
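The 64-byte line layout of FIG. 18 (upper 1 byte for the reference count 242, lower 63 bytes for the data 241) can be modeled as a pack/unpack pair. This is a sketch; zero-padding short data to 63 bytes is an assumption for illustration.

```python
# Model of the lifetime cache-line layout: 1-byte reference count 242
# followed by 63 bytes of data 241 (total 64 bytes per line).

LINE_BYTES = 64
RN_BYTES = 1                         # reference count 242
DATA_BYTES = LINE_BYTES - RN_BYTES   # data 241 (63 bytes)

def pack_line(refcount, data):
    """Build a 64-byte line from a reference count (0-255) and data."""
    if not 0 <= refcount <= 255:
        raise ValueError("reference count 242 is an integer 0-255")
    if len(data) > DATA_BYTES:
        raise ValueError("data 241 is at most 63 bytes")
    return bytes([refcount]) + data.ljust(DATA_BYTES, b"\x00")

def unpack_line(line):
    """Split a 64-byte line into (reference count 242, data 241)."""
    return line[0], line[1:]
```

Because the count shares the line with the data, decrementing it on a read means rewriting the upper byte and writing the line back into the data storage unit 240, which matches steps S844 and S845 described later.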
- FIG. 19 is a diagram showing a processing procedure at the time of cache line write of the tag control unit 230 in the second embodiment of the present invention.
- When the tag control unit 230 receives a cache line write request from the processor 100 (step S811), it reads the tag storage unit 220 based on the address of the cache line included in the request, and determines a cache hit or a miss hit.
- the cache line write request includes designation of the cache line address and type.
- If it is a cache hit (step S812), the tag control unit 230 updates the tag information of the hit cache line stored in the tag storage unit 220 (step S816). In the cache line to be updated, the valid 222 is set to “1”. Then, the tag control unit 230 notifies the data control unit 250 of the storage location of the hit cache line in the data storage unit 240, and instructs a cache line write (step S818).
- If it is a miss hit (step S812), the tag control unit 230 determines whether or not the missed cache line can be added to the data storage unit 240.
- the tag control unit 230 adds the tag information of the missed cache line to the tag storage unit 220 (step S815).
- the valid 222 is set to “1”
- the dirty 223 is set to “1”.
- the lifetime 225 is set to “1” if the line has a lifetime and “0” if it does not, according to the type included in the cache line write request.
- the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240 and instructs cache line write (step S818).
- step S813 If the missed cache line cannot be added to the data storage unit 240 (step S813), a cache line replacement process is performed to secure an additional area for the cache line (step S814). Then, the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240 and instructs cache line write (step S818).
- FIG. 20 is a diagram showing a processing procedure at the time of cache line reading of the tag control unit 230 in the second embodiment of the present invention.
- When the tag control unit 230 receives a cache line read request from the processor 100 (step S821), it reads the tag storage unit 220 based on the address of the cache line included in the request, and determines a cache hit or a miss hit.
- This cache line read request includes designation of the address and type of the cache line. If the tag address 221 matches, the valid 222 is “1”, and the lifetime 225 matches the type included in the request, it is determined to be a cache hit; otherwise, it is determined to be a cache miss.
- If it is a cache hit (step S822), the tag control unit 230 notifies the data control unit 250 of the storage location of the hit cache line in the data storage unit 240, the type of the cache line, and the supply destination of the cache line, thereby requesting a cache line read (step S828).
- If it is a miss hit (step S822), the tag control unit 230 determines whether or not the missed cache line can be added to the data storage unit 240.
- the tag control unit 230 adds tag information of the missed cache line to the tag storage unit 220 (step S825).
- a tag calculated from the address of the missed cache line is stored in the tag address 221.
- the valid 222 is set to “1”
- the dirty 223 is set to “0”.
- the lifetime 225 is set to “1” if the line has a lifetime and “0” if it does not, according to the type included in the cache line read request.
- the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240 and the address in the main memory 300, and requests cache line fetch (step S827).
- the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240, the type of the cache line, and the supply destination of the cache line, and requests a cache line read (step S828).
- the tag control unit 230 secures an additional area for the cache line by executing the cache line replacement process (step S824).
- the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240 and the address in the main memory 300, and requests cache line fetch (step S827).
- the tag control unit 230 notifies the data control unit 250 of the storage location of the missed cache line in the data storage unit 240, the type of the cache line, and the supply destination of the cache line, and requests a cache line read (step S828).
- FIG. 21 is a diagram showing a processing procedure at the time of cache line replacement of the tag control unit 230 in the second embodiment of the present invention. This process corresponds to step S814 in FIG. 19 or step S824 in FIG.
- the tag control unit 230 determines whether a cache line can be added to the data storage unit 240 when a cache line needs to be added due to a cache miss. If it cannot be added, one of the currently held cache lines is selected and written back to the main memory 300 to secure a free area, and the new cache line is stored there. This is the cache line replacement process.
- the tag control unit 230 refers to the tag information in the tag storage unit 220 and selects the cache line to be written back to the main memory 300 (step S831). As described above, this cache line can be selected by, for example, the LRU method, which evicts the least recently used cache line.
- the tag control unit 230 notifies the data control unit 250 of the storage position of the selected cache line in the tag storage unit 220, the type of the cache line, and the write-back destination address of the cache line in the main memory 300, thereby requesting a cache line write-back (step S832).
- the tag control unit 230 replaces the tag information of the selected cache line with the tag information of the missed cache line (step S833).
- the tag address 221 stores a tag calculated from the address of the missed cache line.
- the valid 222 is set to “1”.
- the dirty 223 stores “1” if the cache miss is caused by a write access, and “0” if the cache miss is caused by a read access.
- the lifetime 225 is set to “1” if the line has a lifetime and “0” if it does not, according to the type included in the cache line request.
- FIG. 22 is a diagram showing a processing procedure at the time of cache line read of the data control unit 250 in the second embodiment of the present invention.
- When the data control unit 250 receives a cache line read instruction from the tag control unit 230 (step S841), it reads the cache line at the position in the data storage unit 240 designated by the tag control unit 230 (step S842). If the target of the read instruction from the tag control unit 230 is a cache line with a lifetime (step S843), the data control unit 250 decrements the read reference count 242 value by “1” (step S844) and writes it back to the data storage unit 240 (step S845). Then, the data control unit 250 outputs the cache line with a lifetime to the processor 100 side (step S846). If the target of the read instruction is a normal cache line (step S843), the data control unit 250 outputs the cache line read from the position in the data storage unit 240 designated by the tag control unit 230 (step S846).
- FIG. 23 is a diagram illustrating a processing procedure at the time of cache line write-back of the data control unit 250 according to the second embodiment of the present invention.
- When the data control unit 250 receives a cache line write-back instruction from the tag control unit 230 (step S851), it reads the cache line at the position in the data storage unit 240 designated by the tag control unit 230 (step S852). If the target of the write-back instruction is a cache line with a lifetime (step S853), the data control unit 250 checks the value of the reference count 242 read from the data storage unit 240. If the value of the reference count 242 is zero, the cache line write-back process is terminated at this point (step S854).
- If the value of the reference count 242 is not zero (step S854), the data control unit 250 outputs the cache line with a lifetime and the address designated by the tag control unit 230 to the main memory 300 side (step S855). As a result, the cache line with a lifetime is written to the designated address of the main memory 300.
- FIG. 24 is a diagram illustrating a processing procedure at the time of cache line fetching of the data control unit 250 according to the second embodiment of the present invention.
- When the data control unit 250 receives a cache line fetch instruction from the tag control unit 230 (step S861), it outputs the address designated by the tag control unit 230 to the main memory 300 side. As a result, a read of cache-line-size data from the designated address is requested of the main memory 300 (step S862).
- the data control unit 250 receives the transferred cache line (step S863) and writes the received cache line to the position in the data storage unit 240 designated by the tag control unit 230 (step S864).
- FIG. 25 is a diagram showing a processing procedure at the time of cache line write of the data control unit 250 according to the second embodiment of the present invention.
- When the data control unit 250 receives a cache line write instruction from the tag control unit 230 (step S871), it receives a cache line from the primary cache 110 side of the processor 100 (step S872). Then, the data control unit 250 writes the received cache line to the position in the data storage unit 240 designated by the tag control unit 230 (step S873).
- FIG. 26 is a diagram illustrating an example of a data write sequence to the shared cache 200 according to the second embodiment of this invention.
- the processor 100-1 writes “1” as the reference count in the upper 1 byte, and writes the data to be passed to the processor 100-2 in the lower 63 bytes (step S881). Then, the primary cache 110-1 is instructed to write this work area to the shared cache 200 as a cache line with a lifetime (step S882).
- the primary cache 110-1 requests the shared cache 200 to write a cache line by designating a cache line with a lifetime as a cache line type (step S883).
- When the shared cache 200 receives the write request (step S884), it performs a cache hit or miss hit determination and replaces a cache line as necessary (step S885). Then, it receives the cache line with a lifetime and stores it in the data storage unit 240 (step S886).
- When the transmission of the cache line with a lifetime is completed (step S887), the primary cache 110-1 reports the completion of the write of the cache line with a lifetime to the processor 100-1 (step S888). The write process ends when the processor 100-1 receives this report (step S889).
- FIG. 27 is a diagram illustrating an example of a data read sequence from the shared cache 200 according to the second embodiment of this invention.
- the processor 100-2 instructs the primary cache 110-2 to read the cache line in order to refer to the data written by the processor 100-1 (step S890).
- the primary cache 110-2 requests the shared cache 200 to read a cache line by designating a cache line with a lifetime as a cache line type (step S891).
- When the shared cache 200 receives the read request from the primary cache 110-2 (step S892), it determines a cache hit or a miss (step S893). Then, the hit cache line with a lifetime is read from the data storage unit 240 and the value of the reference count 242 is decremented by “1” (step S894), and the cache line with a lifetime is transmitted to the primary cache 110-2 (step S895).
- When the reception of the cache line with a lifetime is completed (step S896), the primary cache 110-2 reports the completion of the read of the cache line with a lifetime to the processor 100-2 (step S897).
- When the processor 100-2 receives the read completion report of the cache line with a lifetime from the primary cache 110-2 (step S898), it starts the shared data read processing (step S899).
- the cache line with a lifetime whose reference count 242 has become zero in step S894 is an unnecessary cache line that will not be referenced in the future, and will later be selected as a replacement target cache line by the tag control unit 230.
- when the value of the reference count 242 is zero, the data control unit 250 discards the cache line as it is without writing it back to the main memory 300.
- the reference count 242 in the data storage unit 240 is decremented on each read access, and the cache line is invalidated when the value changes from “1” to “0”. As a result, the cache memory can be operated as a shared FIFO between processors.
- in the first embodiment, the reference count 224 field is provided in the tag storage unit 220.
- in the third embodiment, the usable amount and a lock bit are stored in the tag storage unit 220 instead.
- the premise of the information processing system and the configuration of the shared cache is the same as that of the first embodiment described with reference to FIGS.
- FIG. 28 is a diagram illustrating a field configuration example of the tag storage unit 220 according to the third embodiment of the present invention.
- Each entry of the tag storage unit 220 includes fields of a tag address 221, a valid 222, a dirty 223, a lock 226, and an available amount 227. Since the tag address 221, the valid 222, and the dirty 223 are the same as the fields of the first embodiment described with reference to FIG. 5, the description thereof is omitted here.
- the lock 226 stores a lock bit for locking the entry so that it is not a replacement target. If the lock 226 is set to a locked state (e.g., “1”) by one processor, the entry is not replaced by accesses from other unrelated processors. That is, the write-side processor sets the lock 226 to the locked state when it needs a new cache line, and the read-side processor sets the lock 226 to the unlocked state when it no longer needs the cache line. In the drawing, the lock 226 is abbreviated as “L”.
- the usable amount 227 stores the amount of usable data (usable amount) in the data storage unit 240 of the entry.
- as the unit of the data amount, an arbitrary unit can be used as long as it is used consistently.
- for example, a byte or a block (4 bytes or 8 bytes) can be used as the unit.
- when 1 byte is used as the unit, a 6-bit width needs to be allocated in order to express 64 bytes in the available amount 227.
- this available amount 227 is abbreviated as “U”.
- the available amount 227 can be used alone without being used in combination with the lock 226. However, by using it in combination with the lock 226, a later-described delay mechanism can be used effectively.
- FIG. 29 is a diagram illustrating a processing procedure during writing of the shared cache 200 according to the third embodiment of this invention.
- When a cache hit is detected as a result of the comparison in the tag control unit 230 (step S710), it is determined whether or not there is room to write the data in the cache line (step S718). Specifically, when the value obtained by subtracting the available amount 227 from the line size (64 bytes) is less than the write data amount, the write operation is made to wait. On the other hand, if the value obtained by subtracting the available amount 227 from the line size is equal to or larger than the write data amount, the write data is written into the data storage unit 240 (step S719). At that time, the write data amount is added to the available amount 227 (step S720).
- When a miss hit is detected (step S710) and there is an unused way (step S711), a cache line is added (step S712) and “1” is set in the valid 222 of the cache line (step S717). Subsequent operations are the same as when a cache hit occurs (steps S718 to S720).
- When a miss hit is detected (step S710), all the ways are in use (step S711), and for every way the lock 226 is locked or the available amount 227 is set to a value greater than zero (step S713), the shared cache 200 is bypassed. That is, the write data and the write data amount are saved to the main memory 300 using the uncached path (steps S715 and S716).
- the save area on the main memory 300 is the same as that of the first embodiment described with reference to FIG. 7, and a write data amount storage area (not shown) is secured in addition to the FIFO storage area 310.
- If all the ways are in use but there is a cache line in some way whose lock 226 is unlocked and whose available amount 227 is not set to a value greater than zero (step S713), that cache line is replaced (step S714).
- the operation after the cache line replacement is the same as the operation when the cache line is added (steps S717 to S720).
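The room check of steps S718 to S720 above can be sketched as follows (an illustrative model; the function name and return convention are assumptions):

```python
# Write-side room check for the available amount 227 (steps S718-S720).

LINE_SIZE = 64  # bytes per cache line

def try_write(usable, write_len):
    """Return the new available amount 227 after a write, or None when the
    line has insufficient free space and the write must wait (S718)."""
    if LINE_SIZE - usable < write_len:
        return None              # wait: free space (line size - usable) too small
    return usable + write_len    # S719 write data, S720 add amount to usable 227
```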
- FIG. 30 is a diagram showing a processing procedure when the shared cache 200 is read in the third embodiment of the present invention.
- When a cache hit is detected as a result of the comparison in the tag control unit 230 (step S721), it is determined whether or not the data can be read from the cache line (step S725). Specifically, when the value of the available amount 227 is less than the read data amount, the read operation is made to wait. On the other hand, if the value of the available amount 227 is equal to or larger than the read data amount, the data is read from the data storage unit 240 of the cache line (step S726). At that time, the read data amount is subtracted from the available amount 227 (step S727), and the lock 226 is unlocked by setting it to “0” (step S728).
- when a miss is detected (step S721), a cache line is secured (step S722), and "1" is set in the valid 222 of the cache line (step S723).
- "0" is set in the dirty 223, the lock 226 is set to "1" (locked), and the saved write data amount is set in the available amount 227 (step S723).
- the data storage unit 240 is filled with data from the FIFO storage area of the main memory 300 (step S724). Subsequent operations are the same as when a cache hit occurs (steps S725 to S728).
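The read-hit flow (steps S725 to S728) can be sketched similarly; the dictionary keys and the function name are illustrative assumptions, not from the patent:

```python
def try_read(line, read_amount):
    """Read-hit flow of FIG. 30. 'line' is an illustrative dict; the patent
    names only the tag fields (lock 226, available amount 227)."""
    if line["available"] < read_amount:
        return None                     # step S725: not enough data - wait
    data = line["data"][:read_amount]   # step S726: read from data storage 240
    line["available"] -= read_amount    # step S727: subtract the read amount
    line["lock"] = 0                    # step S728: unlock to "0"
    return data
```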
- FIG. 31 is a diagram illustrating an aspect of the delay setting mechanism for the available amount 227 according to the third embodiment of this invention. Assume that a mechanism for delaying the update timing of the available amount 227 by N lines is added when a new cache line is needed and allocation is performed. Such a mechanism is referred to as the delay setting mechanism for the available amount 227. With this delay setting mechanism, data within N lines of the last write position can be rewritten. The figure shows an example assuming a delay of two lines.
- a write line pointer register 581 and a plurality of write data amount registers 582 are provided.
- the write line pointer register 581 is a register that stores how far the cache line currently performing write access has progressed.
- the write data amount register 582 is a register that holds the write data amount accumulated so far, to be set when the usable amount 227 of a cache line whose update is delayed is finally determined.
- the write line pointer register 581 points to the fifth cache line, indicating that the usable amount 227 has been determined up to the third cache line. Immediately after the fifth cache line is written, the usable amounts 227 of the fourth and fifth cache lines are not yet set, so those cache lines can still be rewritten.
- the write data amount to be set in the usable amount 227 of the fourth and fifth cache lines is stored in the write data amount register 582 and is referred to in accordance with the change of the write line pointer register 581.
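Under stated assumptions, the delay setting mechanism can be sketched as follows: writes record their data amount in a pending queue (standing in for the write data amount registers 582), and the usable amount 227 of a line is finalized only once the write line pointer register 581 has advanced N lines past it. The class and method names are illustrative:

```python
from collections import deque

N = 2  # delay in lines, as in the two-line example of FIG. 31

class DelayedAvailableSetter:
    """Sketch of the delay setting mechanism for the available amount 227."""
    def __init__(self, num_lines):
        self.available = [None] * num_lines  # None = not yet finalized
        self.pending = deque()               # write data amount registers 582
        self.write_line_ptr = -1             # write line pointer register 581

    def write_line(self, line, amount):
        self.write_line_ptr = line
        self.pending.append((line, amount))
        # finalize every line that is now at least N behind the pointer
        while self.pending and self.pending[0][0] <= line - N:
            l, a = self.pending.popleft()
            self.available[l] = a
```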
- FIG. 32 is a diagram illustrating an aspect of the delay release mechanism of the lock 226 according to the third embodiment of the present invention. It is assumed that when the read cache line becomes unnecessary and is unlocked, a mechanism for delaying the unlock timing by N lines is added. Such a mechanism is referred to as a delay release mechanism of the lock 226. With this delay release mechanism, data within N lines from the final read position can be read again. This figure shows an example assuming a delay of two lines.
- the read line pointer register 591 is a register that stores how far the cache line currently being read-accessed has progressed.
- the read line pointer register 591 points to the fifth cache line, indicating that the lock 226 has been unlocked up to the third cache line. Immediately after the fifth cache line is read, the locks 226 of the fourth and fifth cache lines are not yet released, so those cache lines can be read again.
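A matching sketch of the delay release mechanism, with illustrative names: the lock 226 of a line is released only once the read line pointer register 591 has advanced N lines past it.

```python
N = 2  # delay in lines, as in the two-line example of FIG. 32

class DelayedUnlocker:
    """Sketch of the delay release mechanism for the lock 226."""
    def __init__(self, num_lines):
        self.lock = [1] * num_lines   # all lines locked after being written
        self.read_line_ptr = -1       # read line pointer register 591

    def read_line(self, line):
        self.read_line_ptr = line
        behind = line - N
        if behind >= 0:
            self.lock[behind] = 0     # unlock only the line N behind the pointer
```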
- FIG. 33 is a diagram illustrating an example of data order change using the delayed update mechanism according to the third embodiment of the present invention.
- Each instruction to be executed and a virtual FIFO state are shown as a pair.
- a virtual FIFO entity is stored in the shared cache 200.
- a FIFO corresponding to eight cache lines is shown.
- FIG. 33A shows an instruction executed by the write side processor and a state immediately after the execution.
- the write side processor sequentially writes to the FIFO from the left.
- immediately after writing the cache line of the data D2, the write position returns to the previous cache line and the data D1 is written.
- if the usable amount 227 were fixed immediately, the data could not be rewritten until the read side processor had read it, but by delaying the setting of the usable amount 227 the data D1 can still be written.
- FIG. 33B shows an instruction executed by the read side processor and a state immediately after the execution.
- the read side processor reads the FIFO sequentially from the left.
- the data D1 can be read first, and then the data D2. That is, the data D1 and D2 can be read in an order different from the order in which the write side processor wrote them. As a result, the cost of reordering data stored in memory on the write side or read side processor can be reduced.
- FIG. 34 is a diagram illustrating an example of data size compression using a delayed update mechanism according to the third embodiment of the present invention.
- Each instruction to be executed and a virtual FIFO state are shown as a pair.
- a virtual FIFO entity is stored in the shared cache 200.
- a FIFO corresponding to 8 bytes in the cache line is shown.
- FIG. 34 (a) shows instructions executed by the write side processor and a state immediately after execution when the delayed update mechanism is not used.
- 1-byte data D1 is written to the 0th byte of the FIFO.
- the 2-byte data D2 is written in the second to third bytes of the FIFO.
- since the available amount 227 stores the final write position in the cache line, the next 1-byte data D3 is written in the fourth byte.
- FIG. 34 (b) shows the instructions executed by the write side processor and the state immediately after execution when the delayed update mechanism is used.
- since the usable amount 227 is not fixed when the data D1 and D2 are written, the 1-byte data D3 can be written to the first byte.
- the free area in the FIFO can be used. Thereby, an unused area for data alignment can be reduced.
- FIG. 35 is a diagram showing an IDCT (Inverse Discrete Cosine Transform) coefficient decoding processing algorithm in a general codec.
- an IDCT coefficient of an 8 pixel ⁇ 8 pixel block is acquired from a bit stream, and the acquired IDCT coefficient is zigzag scanned as shown in FIG. 36 and output to the FIFO as a one-dimensional coefficient string.
- MPEG Motion Picture Experts Group
- JPEG Joint Photographic Experts Group
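The zigzag scan of an 8 pixel × 8 pixel block mentioned above can be sketched as follows. The scan is assumed to follow the standard JPEG/MPEG order (starting at the top-left corner and traversing anti-diagonals alternately); the exact order of FIG. 36 is not reproduced here, and the function name is illustrative.

```python
def zigzag_indices(n=8):
    """Zigzag scan order of an n x n block, assuming the standard
    JPEG/MPEG traversal of anti-diagonals."""
    order = []
    for s in range(2 * n - 1):                          # anti-diagonals 0..2n-2
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left -> top-right
        order.extend(diag)
    return order
```

Applying `zigzag_indices(8)` to an 8×8 coefficient block yields the one-dimensional coefficient string that is output to the FIFO.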
- FIG. 37 is a diagram showing an IDCT coefficient decoding processing algorithm of a codec optimized by a conventional method.
- optimization is performed by executing a zigzag scan simultaneously with the IDCT coefficient decoding.
- in this method, it is necessary to change the order when the coefficient sequence after the zigzag scan is output to the FIFO, and a buffer QF for holding the intermediate result must be provided.
- FIG. 38 is a diagram showing an IDCT coefficient decoding processing algorithm of the codec using the delay update mechanism in the third embodiment of the present invention.
- zigzag scanning is executed simultaneously with the IDCT coefficient decoding, and the result is output to the FIFO without providing a buffer for holding intermediate results. That is, after the initialization data has once been output to the FIFO, only the non-zero coefficients need to be written to the FIFO again, so the buffer for holding the intermediate result can be omitted.
- the IDCT coefficient is 128 bytes (8 pixels ⁇ 8 pixels ⁇ 2 bytes), and when the cache line size is 64 bytes, 2 lines are used. According to the delayed update mechanism according to the third embodiment of the present invention, it is possible to write to an arbitrary place in two lines a plurality of times, so that the algorithm can be flexibly optimized.
- the processor 100-1 writes data to the shared cache 200.
- the tag control unit 230 detects a tag match for the write access, but an event occurs in which the available amount 227 and the lock 226 are already set in all ways.
- data is directly stored in the main memory 300, bypassing the shared cache 200 and using the uncached path. At this time, the write data amount accompanying the write data is also saved in the main memory 300.
- the processor 100-2 reads the data from the shared cache 200, and the data is filled from the main memory 300 to the shared cache 200.
- the saved write data amount is also set to the available amount 227. As a result, read access from the processor 100-2 becomes possible.
- when the delayed update mechanism is used together with the uncached path, it is necessary to finalize the available amount 227 and the lock 226 whose updates have been delayed. That is, with the delay setting mechanism for the available amount 227, the available amount 227 is not finalized unless a later write operation occurs; similarly, with the delay release mechanism for the lock 226, the lock 226 is not finalized unless a later read operation occurs. Therefore, it is necessary to forcibly finalize the available amount 227 and the lock 226 for the last N lines, not only when the cache capacity is exceeded but also during normal access. For this purpose, the following flush function is provided.
- Flush function: in the shared cache 200 having the delayed update mechanism, a flush function for finalizing the available amount 227 and the lock 226 is provided.
- when a flush is instructed for a write operation, the write data amount held in the write data amount register 582 is set into the usable amount 227 and finalized, and any lock 226 that has not been finalized is finalized in the locked state.
- when a flush is instructed for a read operation, all the available amounts 227 are set to "0", all the locks 226 are set to unlocked, and the cache lines are released.
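A minimal sketch of the flush function described above, assuming a simple dict-based representation of the tag fields; all names are illustrative, not from the patent:

```python
def flush(lines, pending, mode):
    """'lines' is a list of dicts with 'available' and 'lock' keys;
    'pending' maps line index -> saved write data amount (standing in
    for the write data amount registers 582)."""
    if mode == "write":
        for idx, amount in pending.items():
            lines[idx]["available"] = amount   # finalize the delayed amounts
            lines[idx]["lock"] = 1             # finalize in the locked state
        pending.clear()
    else:  # mode == "read": release all cache lines
        for line in lines:
            line["available"] = 0
            line["lock"] = 0
```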
- as described above, a shared FIFO can be realized on the cache memory. Next, a digital television broadcast system will be described as an application example using the shared FIFO.
- FIG. 39 is a diagram showing an example of a digital television broadcast system as an application example of the embodiment of the present invention.
- a digital television broadcast signal is transmitted from a transmitter 601 to a receiver 603 via a channel 602.
- the transmitter 601 transmits the stream data of the transport stream.
- the receiver 603 receives the stream data of the transport stream transmitted from the transmitter 601.
- this digital television broadcasting system performs byte interleaving on the transmitted transport stream packets.
- the interleaving depth is 12 bytes, and the next byte of the synchronization byte passes through a reference path without delay.
- the transmitter 601 is provided with an interleaver 610, and the receiver 603 is provided with a deinterleaver 630.
- FIG. 40 is a diagram showing a configuration example of the interleaver 610 in the application example of the embodiment of the present invention.
- This interleaver 610 has twelve paths # 0 to # 11, and the switches 611 and 613 are simultaneously switched so as to pass any one of the paths.
- the interleaver 610 switches so that each byte passes through a different path. That is, the paths are selected in order from path #0 to path #1, path #2, and so on, and after path #11 the switch returns to path #0 again.
- This interleaver 610 includes FIFOs 612-1 to 612-11 in paths #1 to #11 among the twelve paths #0 to #11.
- FIG. 41 is a diagram showing a configuration example of the deinterleaver 630 in the application example of the embodiment of the present invention. Similar to the interleaver 610, this deinterleaver 630 has twelve paths #0 to #11, and the switches 631 and 633 are switched simultaneously so that data passes through one of the paths. In the deinterleaver 630, as in the interleaver 610, the switch selects a different path for each byte. That is, the paths are selected in order from path #0 to path #1, path #2, and so on, and after path #11 the switch returns to path #0 again.
- the deinterleaver 630 includes FIFOs 632-0 to 632-10 in paths #0 to #10 among the twelve paths #0 to #11.
- the FIFOs 612 and 632 in the interleaver 610 and the deinterleaver 630 operate assuming that they are initially filled with dummy data. Therefore, a process for pushing out the dummy data is required in the first stage of processing.
- the lengths of the FIFOs in corresponding paths of the two are set as a pair, and each pair has a combined length of 187 bytes. Therefore, the data arrangement matches between the input of the interleaver 610 and the output of the deinterleaver 630. On the other hand, since the data on the channel 602 is scattered, even when a burst error occurs, the receiver 603 is in a state convenient for error correction using an error correction code.
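The pairing of FIFO lengths described above can be checked with a small simulation. The per-path cell size M = 17 is an assumption (it is the standard value for depth-12 byte interleaving and satisfies the 187-byte pair total stated above: 17·j + 17·(11−j) = 187); all function names are illustrative:

```python
from collections import deque

B, M = 12, 17   # 12 paths; assumed sizes so that path pair j sums to 187 bytes

def make_fifos(lengths):
    # each FIFO starts filled with dummy bytes, as noted above
    return [deque([0] * n, maxlen=n) if n else None for n in lengths]

def run(fifos, stream):
    out = []
    for i, byte in enumerate(stream):
        f = fifos[i % B]          # the switches select path i mod 12
        if f is None:
            out.append(byte)      # the reference path has no delay
        else:
            out.append(f[0])      # emit the oldest byte in the FIFO
            f.append(byte)        # deque with maxlen drops the emitted byte
    return out
```

Running the interleaver and deinterleaver back to back reproduces the input after a fixed latency of M·(B−1)·B = 2244 bytes, once the dummy data has been pushed out.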
- the FIFOs 612 and 632 in the interleaver 610 and the deinterleaver 630 can be realized as a shared FIFO in the above-described embodiment of the present invention.
- the FIFOs 612 and 632 do not necessarily have to be retained in the cache memory, and the cache memory can be used according to the processing status. That is, when many cache lines are used for purposes other than the FIFO, data is saved to the main memory; in the opposite case, the FIFO data is kept alive on the cache lines. Therefore, processing can be performed efficiently with a small cache capacity.
- the cache line size of the shared cache 200 is assumed to be 64 bytes, but the present invention is not limited to this. Further, in the embodiments of the present invention, the cache line sizes of the shared cache 200 and the primary cache 110 are assumed to be the same 64 bytes; however, the present invention is not limited to this, and combinations of different cache line sizes may be used.
- the shared cache 200 is assumed to be a write-back cache memory, but the present invention is not limited to this, and other schemes such as a write-through scheme may be used.
- the data transfer between the primary cache 110 and the shared cache 200 is performed in units of the cache line size, but the present invention is not limited to this; transfer in arbitrary sizes may be enabled.
- the embodiments of the present invention show examples for embodying the present invention, and the matters in the embodiments of the present invention have corresponding relationships with the invention-specifying matters in the claims. Similarly, the invention-specifying matters in the claims have corresponding relationships with the matters in the embodiments of the present invention bearing the same names.
- the present invention is not limited to the embodiments, and can be embodied by making various modifications to the embodiments without departing from the gist of the present invention.
- the processing procedures described in the embodiments of the present invention may be regarded as a method having this series of procedures, as a program for causing a computer to execute the series of procedures, or as a recording medium storing the program.
- as this recording medium, for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, a Blu-ray Disc (registered trademark), or the like can be used.
Description
The description proceeds in the following order.
1. First embodiment (example in which a reference count field is provided in the tag storage unit)
2. Second embodiment (example in which the reference count field is provided in the data storage unit)
3. Third embodiment (example in which an available amount field and a lock field are provided in the tag storage unit)
4. Application example (application to a digital television broadcast system)
5. Modifications
[Configuration of the information processing system]
FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present invention. This information processing system includes p processors 100-1 to 100-p (where p is an integer of 1 or more; hereinafter these may be collectively referred to as the processors 100), a shared cache (secondary cache) 200, and a main memory 300. The processors 100-1 to 100-p and the shared cache 200 are interconnected by a system bus 190.
FIG. 2 is a diagram showing a functional configuration example of the shared cache 200 according to the embodiment of the present invention. The shared cache 200 includes an arbitration unit 210, a tag storage unit 220, a tag control unit 230, a data storage unit 240, a data control unit 250, and a response unit 260.
FIG. 5 is a diagram showing a field configuration example of the tag storage unit 220 in the first embodiment of the present invention. Each entry of the tag storage unit 220 has tag address 221, valid 222, dirty 223, and reference count 224 fields.
In order to transfer data whose size exceeds the capacity of the shared cache 200 between the processors 100, it is useful to add an uncached path that bypasses the shared cache 200 and the following control functions in the data control unit 250.
FIG. 8 is a diagram showing the write processing procedure of the shared cache 200 in the first embodiment of the present invention.
FIG. 12 is a diagram showing a configuration example in which a prefetch function is provided in the information processing system according to the first embodiment of the present invention. In this configuration example, a prefetch control unit 400 is connected between the processors 100-1 to 100-p and the shared cache 200. The prefetch control unit 400 issues a read request to the shared cache 200 ahead of a read access from a processor 100, causing a prefetch to be performed. That is, this prefetch promotes data transfer from the main memory 300 to the shared cache 200.
As another technique for realizing the transfer of data whose size exceeds the cache capacity, designation of a FIFO storage area will be described. This technique assumes that the reference count is one.
Thus, according to the first embodiment of the present invention, the reference count 224 in the tag storage unit 220 is decremented on each read access, and the cache line can be invalidated when the count changes from "1" to "0". This makes it possible to operate the cache memory as a shared FIFO between processors.
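The reference-count behavior summarized above (decrement on each read, invalidate without write-back on the 1-to-0 transition) can be sketched as follows. This is an illustrative model; the dict keys and function name are assumptions:

```python
def read_with_refcount(entry, data_store):
    """One read access against a lifetime-limited line: decrement the
    reference count 224 and invalidate the entry, without write-back,
    when the count changes from 1 to 0."""
    data = data_store[entry["index"]]
    if entry["refcount"] == 1:
        entry["valid"] = 0        # invalidate; no write-back is needed
        entry["refcount"] = 0
    else:
        entry["refcount"] -= 1
    return data
```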
In the first embodiment described above, the reference count 224 field is provided in the tag storage unit 220; in this second embodiment, the reference count is stored in the data storage unit 240. The configurations of the underlying information processing system and shared cache are the same as in the first embodiment described with reference to FIGS. 1 to 4, so their description is omitted here.
FIG. 17 is a diagram showing a field configuration example of the tag storage unit 220 in the second embodiment of the present invention. Each entry of the tag storage unit 220 has tag address 221, valid 222, dirty 223, and lifetime flag 225 fields. The tag address 221, valid 222, and dirty 223 are the same as the fields of the first embodiment described with reference to FIG. 5, so their description is omitted here.
FIG. 18 is a diagram showing a field configuration example of the data storage unit 240 in the second embodiment of the present invention. As described above, the data storage unit 240 has two ways #0 and #1, each consisting of 128 cache lines holding 64 bytes of line data. Of the 64 bytes of line data, the upper 1 byte is the reference count 242 and the lower 63 bytes are the data 241. This 1-byte/63-byte allocation is an example and may be changed as appropriate.
In the second embodiment of the present invention, the operation of the shared cache 200 is described separately for the tag control unit 230 and the data control unit 250.
To reference the data written by the processor 100-1, the processor 100-2 instructs the primary cache 110-2 to read the cache line (step S890). The primary cache 110-2 requests the shared cache 200 to read the cache line, designating a lifetime-limited cache line as the cache line type (step S891).
Also in this second embodiment, the modifications described in the first embodiment, namely transfer of data exceeding the cache capacity, prefetch, and designation of a FIFO storage area, can be applied as appropriate.
Thus, according to the second embodiment of the present invention, the reference count 242 in the data storage unit 240 is decremented on each read access, and the cache line can be invalidated when the count changes from "1" to "0". This makes it possible to operate the cache memory as a shared FIFO between processors.
In the first embodiment described above, the reference count 224 field is provided in the tag storage unit 220; in this third embodiment, an available amount and a lock bit are stored in the tag storage unit 220. The configurations of the underlying information processing system and shared cache are the same as in the first embodiment described with reference to FIGS. 1 to 4, so their description is omitted here.
FIG. 28 is a diagram showing a field configuration example of the tag storage unit 220 in the third embodiment of the present invention. Each entry of the tag storage unit 220 has tag address 221, valid 222, dirty 223, lock 226, and available amount 227 fields. The tag address 221, valid 222, and dirty 223 are the same as the fields of the first embodiment described with reference to FIG. 5, so their description is omitted here.
FIG. 29 is a diagram showing the write processing procedure of the shared cache 200 in the third embodiment of the present invention.
FIG. 31 is a diagram showing an aspect of the delay setting mechanism for the available amount 227 in the third embodiment of the present invention. Assume that a mechanism for delaying the update timing of the available amount 227 by N lines is added when a new cache line is needed and allocation is performed. Such a mechanism is referred to as the delay setting mechanism for the available amount 227. With this delay setting mechanism, data within N lines of the last write position can be rewritten. The figure shows an example assuming a delay of two lines.
The following describes an application example in which the delayed update mechanism described so far is applied to a codec algorithm for optimization.
Also in this third embodiment, data whose size exceeds the capacity of the shared cache 200 can be transferred between the processors 100. That is, as in the first embodiment, it is useful to add an uncached path that bypasses the shared cache 200 and the following control functions in the data control unit 250.
In the shared cache 200 having the delayed update mechanism, a flush function for finalizing the available amount 227 and the lock 226 is provided. When a flush is instructed for a write operation, the write data amount held in the write data amount register 582 is set into the available amount 227 and finalized, and any lock 226 left unfinalized is finalized in the locked state. When a flush is instructed for a read operation, all the available amounts 227 are set to "0", all the locks 226 are set to unlocked, and the cache lines are released.
Also in this third embodiment, the modifications described in the first embodiment, namely prefetch and designation of a FIFO storage area, can be applied as appropriate.
Thus, according to the third embodiment of the present invention, by adding to the available amount 227 on each write access and subtracting from it on each read access, overtaking of data when operating as a shared FIFO can be prevented. Further, by setting the lock 226 to the locked state on a write access and to the unlocked state on a read access, eviction of the cache line by a third party can be prevented. Moreover, by providing the delayed update mechanism for the available amount 227 and the lock 226, data can be exchanged when the cache is used as a shared FIFO.
As described above, according to the embodiments of the present invention, a shared FIFO can be realized on a cache memory. Next, a digital television broadcast system will be described as an application example using the shared FIFO.
Although the embodiments of the present invention have been described above, the present invention is not limited to these embodiments. For example, in the embodiments of the present invention, the cache line size of the shared cache 200 is assumed to be 64 bytes, but the present invention is not limited to this. Also, in the embodiments of the present invention, the cache line sizes of the shared cache 200 and the primary cache 110 are assumed to be the same 64 bytes, but the present invention is not limited to this, and combinations of different cache line sizes may be used.
Claims (19)
1. A cache memory comprising: a tag storage unit in which at least one of a plurality of entries, each containing a tag address and a remaining reference count, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the plurality of entries; a tag control unit that detects a matching entry by comparing a second address portion of the access address, different from the first address portion, with the tag address contained in each indexed entry, and that, for a read access, invalidates the matching entry after the read access without writing it back when the remaining reference count contained in the matching entry indicates that one reference remains, and decrements the remaining reference count by one when it indicates a number greater than one; and a data control unit that, for the read access, selects the data corresponding to the matching entry from the data storage unit.
2. The cache memory according to claim 1, wherein, for a write access, when the remaining reference counts of the entries of the tag storage unit corresponding to the first address portion all indicate numbers greater than zero, the tag control unit performs control such that the data and reference count of the write access are saved to an external memory without accessing the tag storage unit and the data storage unit.
3. The cache memory according to claim 2, further comprising a prefetch control unit that performs control such that, when free space exists in the data storage unit, the saved data and reference count are prefetched from the memory into the data storage unit and the tag storage unit, respectively.
4. The cache memory according to claim 1, further comprising an area designation register that designates a specific area on a memory, wherein, when the access address is contained in the area and, for a write access, the remaining reference counts of the entries of the tag storage unit corresponding to the first address portion all indicate numbers greater than zero, the tag control unit performs control such that the data of the write access is saved to an external memory without accessing the tag storage unit and the data storage unit.
5. The cache memory according to claim 4, further comprising a prefetch control unit that performs control such that, when free space exists in the data storage unit, the saved data is prefetched from the memory into the data storage unit and the remaining reference count in the tag storage unit is set to one.
6. A cache memory control device comprising: a tag storage unit in which at least one of a plurality of entries, each containing a tag address and a remaining reference count, is indexed by a first address portion of an access address; and a tag control unit that detects a matching entry by comparing a second address portion of the access address, different from the first address portion, with the tag address contained in each indexed entry, and that, for a read access, invalidates the matching entry after the read access without writing it back when the remaining reference count contained in the matching entry indicates that one reference remains, and decrements the remaining reference count by one when it indicates a number greater than one.
7. A cache memory comprising: a tag storage unit in which at least one of a plurality of entries, each containing a tag address and a lifetime flag indicating whether the entry is lifetime-limited, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the plurality of entries and that stores a remaining reference count when the lifetime flag indicates that the entry is lifetime-limited; a tag control unit that detects a matching entry by comparing a second address portion of the access address, different from the first address portion, with the tag address contained in each indexed entry, and that, for a read access, when the lifetime flag contained in the matching entry indicates that the entry is lifetime-limited and the corresponding remaining reference count indicates that one reference remains, invalidates the entry after the read access without writing it back; and a data control unit that, for the read access, selects the data corresponding to the matching entry from the data storage unit and, when the lifetime flag contained in the matching entry indicates that the entry is lifetime-limited and the corresponding remaining reference count indicates a number greater than one, decrements the remaining reference count by one.
8. The cache memory according to claim 7, wherein, for a write access, when the remaining reference counts of the entries of the data storage unit corresponding to the first address portion all indicate numbers greater than zero, the tag control unit performs control such that the data and reference count of the write access are saved to an external memory without accessing the tag storage unit and the data storage unit.
9. The cache memory according to claim 8, further comprising a prefetch control unit that performs control such that, when free space exists in the data storage unit, the saved data and reference count are prefetched from the memory into the data storage unit.
10. The cache memory according to claim 7, further comprising an area designation register that designates a specific area on a memory, wherein, when the access address is contained in the area and, for a write access, the remaining reference counts of the entries of the data storage unit corresponding to the first address portion all indicate numbers greater than zero, the tag control unit performs control such that the data of the write access is saved to an external memory without accessing the tag storage unit and the data storage unit.
11. The cache memory according to claim 10, further comprising a prefetch control unit that performs control such that, when free space exists in the data storage unit, the saved data is prefetched from the memory into the data storage unit and the remaining reference count in the data storage unit is set to one.
12. A cache memory comprising: a tag storage unit in which at least one of a plurality of entries, each containing a tag address and a data amount field, is indexed by a first address portion of an access address; a data storage unit that stores data corresponding to the plurality of entries; a tag control unit that detects a matching entry by comparing a second address portion of the access address, different from the first address portion, with the tag address contained in each indexed entry, and that, in the case of a write access, waits until free space is secured based on the value of the data amount field contained in the matching entry and, after the write access, adds the data amount of the write access to the data amount field, and, in the case of a read access, waits until the data amount targeted by the read access is secured based on the value of the data amount field contained in the matching entry and, after the read access, subtracts the data amount of the read access from the data amount field; and a data control unit that, for the write access, writes the data of the write access to the matching entry of the data storage unit and, for the read access, selects the data corresponding to the matching entry from the data storage unit.
13. The cache memory according to claim 12, wherein the tag control unit has a mode in which the addition of the data amount is performed at a delayed timing at which write accesses have been executed for a predetermined number of entries following the write access.
14. The cache memory according to claim 13, wherein, in the mode in which the addition of the data amount is performed at the delayed timing, the tag control unit performs the addition of the data amount promptly upon receiving a flush instruction.
15. The cache memory according to claim 12, wherein the tag storage unit includes in each entry a lock bit indicating whether that entry is locked, and the tag control unit locks the lock bit contained in the matching entry on the write access and unlocks the lock bit contained in the matching entry on the read access.
16. The cache memory according to claim 15, wherein the tag control unit has a mode in which the locking of the lock bit is performed at a delayed timing at which write accesses have been executed for a predetermined number of entries following the write access.
17. The cache memory according to claim 16, wherein, in the mode in which the locking of the lock bit is performed at the delayed timing, the tag control unit promptly unlocks the lock bit upon receiving a flush instruction.
18. The cache memory according to claim 15, wherein, for the write access, when the data amount fields of the entries of the tag storage unit corresponding to the first address portion all indicate numbers greater than zero or the lock bits are all locked, the tag control unit performs control such that the data and write data amount of the write access are saved to an external memory without accessing the tag storage unit and the data storage unit.
19. A cache memory control device comprising: a tag storage unit in which at least one of a plurality of entries, each containing a tag address and a data amount field, is indexed by a first address portion of an access address; and a tag control unit that detects a matching entry by comparing a second address portion of the access address, different from the first address portion, with the tag address contained in each indexed entry, and that, in the case of a write access, waits until free space is secured based on the value of the data amount field contained in the matching entry and, after the write access, adds the data amount of the write access to the data amount field, and, in the case of a read access, waits until the data amount targeted by the read access is secured based on the value of the data amount field contained in the matching entry and, after the read access, subtracts the data amount of the read access from the data amount field.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP10839247A EP2518633A1 (en) | 2009-12-21 | 2010-12-14 | Cache memory and cache memory control device |
| US13/515,315 US9535841B2 (en) | 2009-12-21 | 2010-12-14 | Cache memory and cache memory control unit |
| CN201080055593.1A CN102667737B (zh) | 2009-12-21 | 2010-12-14 | 缓冲存储器和缓冲存储器控制单元 |
| US15/364,596 US10102132B2 (en) | 2009-12-21 | 2016-11-30 | Data transfer in a multiprocessor using a shared cache memory |
Applications Claiming Priority (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2009-288649 | 2009-12-21 | ||
| JP2009288647 | 2009-12-21 | ||
| JP2009-288648 | 2009-12-21 | ||
| JP2009288648 | 2009-12-21 | ||
| JP2009-288647 | 2009-12-21 | ||
| JP2009288649 | 2009-12-21 | ||
| JP2010-212516 | 2010-09-22 | ||
| JP2010212516A JP2011150684A (ja) | 2009-12-21 | 2010-09-22 | キャッシュメモリおよびキャッシュメモリ制御装置 |
Related Child Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/515,315 A-371-Of-International US9535841B2 (en) | 2009-12-21 | 2010-12-14 | Cache memory and cache memory control unit |
| US15/364,596 Continuation US10102132B2 (en) | 2009-12-21 | 2016-11-30 | Data transfer in a multiprocessor using a shared cache memory |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2011078014A1 true WO2011078014A1 (ja) | 2011-06-30 |
Family
ID=44195543
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2010/072475 Ceased WO2011078014A1 (ja) | 2009-12-21 | 2010-12-14 | キャッシュメモリおよびキャッシュメモリ制御装置 |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US9535841B2 (ja) |
| EP (1) | EP2518633A1 (ja) |
| JP (1) | JP2011150684A (ja) |
| KR (1) | KR20120106748A (ja) |
| CN (1) | CN102667737B (ja) |
| WO (1) | WO2011078014A1 (ja) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013064935A1 (en) * | 2011-10-31 | 2013-05-10 | International Business Machines Corporation | Dynamically adjusted threshold for population of secondary cache |
| JP2014115851A (ja) * | 2012-12-10 | 2014-06-26 | Canon Inc | データ処理装置及びその制御方法 |
| JP2015118638A (ja) * | 2013-12-19 | 2015-06-25 | キヤノン株式会社 | 情報処理装置及びその制御方法、プログラム |
Families Citing this family (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8930624B2 (en) * | 2012-03-05 | 2015-01-06 | International Business Machines Corporation | Adaptive cache promotions in a two level caching system |
| WO2014142852A1 (en) * | 2013-03-13 | 2014-09-18 | Intel Corporation | Vulnerability estimation for cache memory |
| JP5998998B2 (ja) * | 2013-03-22 | 2016-09-28 | 富士通株式会社 | 演算処理装置、情報処理装置、及び演算処理装置の制御方法 |
| WO2014180112A1 (zh) * | 2013-05-06 | 2014-11-13 | 华为技术有限公司 | 一种数据读写方法、存储控制器及计算机 |
| CN104346295B (zh) * | 2013-08-09 | 2017-08-11 | 华为技术有限公司 | 一种缓存刷新方法和装置 |
| KR20150113657A (ko) * | 2014-03-31 | 2015-10-08 | 삼성전자주식회사 | 가변 크기의 데이터를 기록하는 방법 및 프로세서와 가변 크기의 데이터를 판독하는 방법 및 프로세서 및 기록매체 |
| WO2015170550A1 (ja) * | 2014-05-09 | 2015-11-12 | ソニー株式会社 | 記憶制御装置、記憶装置、および、その記憶制御方法 |
| KR102354848B1 (ko) | 2014-11-28 | 2022-01-21 | 삼성전자주식회사 | 캐시 메모리 장치 및 이를 포함하는 전자 시스템 |
| WO2016203629A1 (ja) * | 2015-06-19 | 2016-12-22 | 株式会社日立製作所 | ストレージシステム及びキャッシュ制御方法 |
| CN107025130B (zh) * | 2016-01-29 | 2021-09-03 | 华为技术有限公司 | 处理节点、计算机系统及事务冲突检测方法 |
| US10049044B2 (en) * | 2016-06-14 | 2018-08-14 | Advanced Micro Devices, Inc. | Asynchronous cache flushing |
| CN107526535B (zh) * | 2016-06-22 | 2020-07-10 | 伊姆西Ip控股有限责任公司 | 用于管理存储系统的方法和系统 |
| US10606599B2 (en) * | 2016-12-09 | 2020-03-31 | Advanced Micro Devices, Inc. | Operation cache |
| CN110321997B (zh) * | 2018-03-31 | 2021-10-19 | 赛灵思公司 | 高并行度计算平台、系统及计算实现方法 |
| DE102018005618B4 (de) * | 2018-07-17 | 2021-10-14 | WAGO Verwaltungsgesellschaft mit beschränkter Haftung | Vorrichtung zur gepufferten Übertragung von Daten |
| CN111340678A (zh) * | 2018-12-19 | 2020-06-26 | 华为技术有限公司 | 一种数据缓存系统、图形处理器及数据缓存方法 |
| CN116897335A (zh) * | 2021-02-26 | 2023-10-17 | 华为技术有限公司 | 一种缓存替换方法和装置 |
| CN113222115B (zh) * | 2021-04-30 | 2024-03-01 | 西安邮电大学 | 面向卷积神经网络的共享缓存阵列 |
| US11907722B2 (en) * | 2022-04-20 | 2024-02-20 | Arm Limited | Methods and apparatus for storing prefetch metadata |
| US11994994B2 (en) * | 2022-04-25 | 2024-05-28 | Analog Devices International Unlimited Company | Smart prefetch buffer and queue management |
| GB2619288B (en) * | 2022-05-27 | 2024-09-25 | Advanced Risc Mach Ltd | Writing beyond a pointer |
| US12197329B2 (en) * | 2022-12-09 | 2025-01-14 | Advanced Micro Devices, Inc. | Range-based cache flushing |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000010862A (ja) * | 1998-06-23 | 2000-01-14 | Hitachi Software Eng Co Ltd | キャッシュメモリ制御方法 |
| JP2002236614A (ja) * | 2001-02-09 | 2002-08-23 | Nec Corp | キャッシュ制御方法及びキャッシュ制御回路 |
| JP2003030051A (ja) * | 2001-07-19 | 2003-01-31 | Sony Corp | データ処理装置及びデータアクセス方法 |
| JP2003248625A (ja) * | 2002-02-25 | 2003-09-05 | Seiko Epson Corp | キャッシュ回路、情報処理装置及び電子機器 |
| JP2003271455A (ja) * | 2002-03-19 | 2003-09-26 | Fujitsu Ltd | キャッシュメモリ制御装置およびキャッシュメモリシステム |
| JP2004355365A (ja) * | 2003-05-29 | 2004-12-16 | Fujitsu Ltd | キャッシュ管理装置およびキャッシュメモリ管理方法 |
| JP2005346215A (ja) * | 2004-05-31 | 2005-12-15 | Sony Computer Entertainment Inc | 情報処理装置および情報処理方法 |
| JP2009015509A (ja) * | 2007-07-03 | 2009-01-22 | Renesas Technology Corp | キャッシュメモリ装置 |
| JP2009037615A (ja) | 2007-07-31 | 2009-02-19 | Intel Corp | 複数のコアキャッシュ・クラスタ間の包括的共有キャッシュの提供 |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5944815A (en) * | 1998-01-12 | 1999-08-31 | Advanced Micro Devices, Inc. | Microprocessor configured to execute a prefetch instruction including an access count field defining an expected number of access |
| JPH11339464A (ja) * | 1998-05-28 | 1999-12-10 | Sony Corp | Fifo記憶回路 |
| JP3439350B2 (ja) * | 1998-10-02 | 2003-08-25 | Necエレクトロニクス株式会社 | キャッシュ・メモリ制御方法及びキャッシュ・メモリ制御装置 |
| US7707321B2 (en) * | 1999-08-04 | 2010-04-27 | Super Talent Electronics, Inc. | Chained DMA for low-power extended USB flash device without polling |
| US6868472B1 (en) * | 1999-10-01 | 2005-03-15 | Fujitsu Limited | Method of Controlling and addressing a cache memory which acts as a random address memory to increase an access speed to a main memory |
| US6847990B2 (en) * | 2002-05-17 | 2005-01-25 | Freescale Semiconductor, Inc. | Data transfer unit with support for multiple coherency granules |
| US6976131B2 (en) * | 2002-08-23 | 2005-12-13 | Intel Corporation | Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system |
| US7225301B2 (en) * | 2002-11-22 | 2007-05-29 | Quicksilver Technologies | External memory controller node |
| TWI227853B (en) * | 2003-08-29 | 2005-02-11 | Rdc Semiconductor Co Ltd | Data accessing method and system for processing unit |
| JP2007272336A (ja) * | 2006-03-30 | 2007-10-18 | Toshiba Corp | 命令処理装置及び命令処理方法 |
| US7792805B2 (en) * | 2006-05-30 | 2010-09-07 | Oracle America, Inc. | Fine-locked transactional memory |
| US8769207B2 (en) * | 2008-01-16 | 2014-07-01 | Via Technologies, Inc. | Caching method and apparatus for a vertex shader and geometry shader |
| US8145768B1 (en) * | 2008-02-26 | 2012-03-27 | F5 Networks, Inc. | Tuning of SSL session caches based on SSL session IDS |
- 2010
- 2010-09-22 JP JP2010212516A patent/JP2011150684A/ja active Pending
- 2010-12-14 WO PCT/JP2010/072475 patent/WO2011078014A1/ja not_active Ceased
- 2010-12-14 CN CN201080055593.1A patent/CN102667737B/zh not_active Expired - Fee Related
- 2010-12-14 KR KR1020127015222A patent/KR20120106748A/ko not_active Withdrawn
- 2010-12-14 EP EP10839247A patent/EP2518633A1/en not_active Withdrawn
- 2010-12-14 US US13/515,315 patent/US9535841B2/en active Active
- 2016
- 2016-11-30 US US15/364,596 patent/US10102132B2/en not_active Expired - Fee Related
Patent Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000010862A (ja) * | 1998-06-23 | 2000-01-14 | Hitachi Software Eng Co Ltd | Cache memory control method |
| JP2002236614A (ja) * | 2001-02-09 | 2002-08-23 | Nec Corp | Cache control method and cache control circuit |
| JP2003030051A (ja) * | 2001-07-19 | 2003-01-31 | Sony Corp | Data processing device and data access method |
| JP2003248625A (ja) * | 2002-02-25 | 2003-09-05 | Seiko Epson Corp | Cache circuit, information processing device, and electronic apparatus |
| JP2003271455A (ja) * | 2002-03-19 | 2003-09-26 | Fujitsu Ltd | Cache memory control device and cache memory system |
| JP2004355365A (ja) * | 2003-05-29 | 2004-12-16 | Fujitsu Ltd | Cache management device and cache memory management method |
| JP2005346215A (ja) * | 2004-05-31 | 2005-12-15 | Sony Computer Entertainment Inc | Information processing device and information processing method |
| JP2009015509A (ja) * | 2007-07-03 | 2009-01-22 | Renesas Technology Corp | Cache memory device |
| JP2009037615A (ja) | 2007-07-31 | 2009-02-19 | Intel Corp | Providing an inclusive shared cache among multiple core-cache clusters |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013064935A1 (en) * | 2011-10-31 | 2013-05-10 | International Business Machines Corporation | Dynamically adjusted threshold for population of secondary cache |
| GB2513741A (en) * | 2011-10-31 | 2014-11-05 | Ibm | Dynamically adjusted threshold for population of secondary cache |
| JP2014535106A (ja) * | 2011-10-31 | 2014-12-25 | International Business Machines Corporation | Method, controller, and program for populating data into a secondary cache of a storage system |
| US8972661B2 (en) | 2011-10-31 | 2015-03-03 | International Business Machines Corporation | Dynamically adjusted threshold for population of secondary cache |
| US8972662B2 (en) | 2011-10-31 | 2015-03-03 | International Business Machines Corporation | Dynamically adjusted threshold for population of secondary cache |
| GB2513741B (en) * | 2011-10-31 | 2016-11-02 | Ibm | Dynamically adjusted threshold for population of secondary cache |
| JP2014115851A (ja) * | 2012-12-10 | 2014-06-26 | Canon Inc | Data processing device and control method thereof |
| JP2015118638A (ja) * | 2013-12-19 | 2015-06-25 | Canon Inc | Information processing device, control method thereof, and program |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102667737A (zh) | 2012-09-12 |
| US20120331234A1 (en) | 2012-12-27 |
| CN102667737B (zh) | 2015-02-25 |
| JP2011150684A (ja) | 2011-08-04 |
| KR20120106748A (ko) | 2012-09-26 |
| EP2518633A1 (en) | 2012-10-31 |
| US9535841B2 (en) | 2017-01-03 |
| US10102132B2 (en) | 2018-10-16 |
| US20170083440A1 (en) | 2017-03-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2011078014A1 (ja) | Cache memory and cache memory control device | |
| US8495301B1 (en) | System and method for scatter gather cache processing | |
| US8745334B2 (en) | Sectored cache replacement algorithm for reducing memory writebacks | |
| CN102687128B (zh) | Arithmetic processing device | |
| CN102110058B (zh) | Cache method and device with low miss rate and low miss penalty | |
| KR20160141735A (ko) | Adaptive cache prefetching based on competing dedicated prefetch policies in dedicated cache sets to reduce cache pollution | |
| CN101593161A (zh) | Apparatus and method for ensuring data coherency of a microprocessor's cache memory hierarchy | |
| US20110167224A1 (en) | Cache memory, memory system, data copying method, and data rewriting method | |
| US8122216B2 (en) | Systems and methods for masking latency of memory reorganization work in a compressed memory system | |
| WO2010100679A1 (ja) | Computer system, control method, recording medium, and control program | |
| US20100030966A1 (en) | Cache memory and cache memory control apparatus | |
| US7454575B2 (en) | Cache memory and its controlling method | |
| CN119179656A (zh) | Data cache control method and device, medium, program product, and terminal | |
| US20080055323A1 (en) | Systems and methods for reducing latency for accessing compressed memory using stratified compressed memory architectures and organization | |
| US7555610B2 (en) | Cache memory and control method thereof | |
| JP2022015514A (ja) | Semiconductor device | |
| CN104364776B (zh) | Providing cache replacement notice using a cache miss request | |
| US9983994B2 (en) | Arithmetic processing device and method for controlling arithmetic processing device | |
| JP2014186579A (ja) | Cache memory, cache memory control device, and cache memory control method thereof | |
| US12197342B2 (en) | Arithmetic processing device and arithmetic processing method | |
| US9053030B2 (en) | Cache memory and control method thereof with cache hit rate | |
| JP5040121B2 (ja) | Information processing device, cache control method, and program | |
| CN113791989A (zh) | Cache-based cached data processing method, storage medium, and chip | |
| US9430397B2 (en) | Processor and control method thereof | |
| US20120102271A1 (en) | Cache memory system and cache memory control method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201080055593.1; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10839247; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20127015222; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 5257/DELNP/2012; Country of ref document: IN; Ref document number: 2010839247; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 13515315; Country of ref document: US |