TWI228681B - Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache - Google Patents
Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache
- Publication number
- TWI228681B (application TW92129620A)
- Authority
- TW
- Taiwan
- Prior art keywords
- cache memory
- cache
- snoop
- array
- memory
- Prior art date
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
1228681 V. Description of the Invention

[Field of the Invention]

[0002] The present invention relates to cache memories in microprocessors, and in particular to the effects of external snoop operations on a multi-pass, pipelined cache memory.

[Prior Art]

[0003] Many modern computer systems are multiprocessor systems; that is, they include multiple processors coupled to a shared bus so as to distribute the system's computational load. The processors also typically share a common system memory. Furthermore, each processor includes a cache memory, commonly a hierarchical cache.

[0004] A cache memory (or cache) is memory internal to the processor that stores a subset of the data in system memory and is usually much smaller than system memory. Data transfers between the processor and its cache are much faster than data transfers between the processor and system memory. When a processor reads data from system memory, it also stores the data in its cache, so that the next time it needs the data it can read it from the cache much more quickly. Likewise, when the processor writes data to a system memory address whose data is stored in the cache, it need only write the cache rather than system memory; such a cache is commonly called a write-back cache. This ability to access the cache instead of memory reduces overall access time and substantially improves system performance.

[0005] A cache stores data in cache lines. A typical cache line size is 32 bytes. The cache line is the smallest unit of data that can be transferred between the cache and system memory. That is, when the processor reads a cacheable datum from memory, it reads the entire cache line containing the datum and stores the whole line in the cache. Likewise, when a new cache line is written into the cache, replacing a modified cache line, the processor writes the entire replaced line back to memory.
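The patent text contains no code. As an illustrative sketch only, the whole-line transfer behavior described above can be expressed in a few lines of Python; the 32-byte line size is taken from the text, while the function names and addresses are invented for illustration:

```python
LINE_SIZE = 32  # bytes per cache line, as stated in the description above


def line_address(addr: int) -> int:
    """Return the address of the cache line containing addr.

    A whole line is always transferred, so the low offset bits are masked off.
    """
    return addr & ~(LINE_SIZE - 1)


def line_offset(addr: int) -> int:
    """Byte offset of addr within its cache line."""
    return addr & (LINE_SIZE - 1)


# Two addresses in the same 32-byte line share a line address:
assert line_address(0x1234) == line_address(0x1220) == 0x1220
assert line_offset(0x1234) == 0x14
```

Masking off the offset bits in this way is what makes "the smallest unit of transfer" a line rather than a byte: any access within the line resolves to the same line address.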
[0006] When multiple processors each have their own cache caching data from a shared memory, the problem of cache coherence arises. That is, the view of memory one processor sees through its cache may differ from the view another processor sees through its cache. For example, suppose memory location X contains the value 1. Processor A reads from address X and caches the value 1 in its cache. Next, processor B reads from address X and caches the value 1 in its cache. Processor A then writes the value 0 to its cache and updates address X of memory to the value 0. Now if processor A reads address X, it will receive 0 from its cache; but if processor B reads address X, it will receive 1 from its cache.

[0007] The example above shows the need to track the state of any cache line shared by more than one cache in the system. One common scheme for maintaining cache coherence is generally referred to as snooping. With snooping, every cache keeps a copy of the sharing state of each cache line it holds. Each cache monitors, or snoops, every transaction on the bus shared with the other processors, to determine whether it holds a copy of the cache line involved in a bus transaction initiated by another processor. The cache performs different actions depending on the type of transaction snooped and the state of the affected cache line. One commonly used cache coherence state protocol is the MESI protocol. MESI stands for Modified, Exclusive, Shared, Invalid: the four possible states, or state values, of a cache line in the cache.

[0008] A method commonly used together with snooping to maintain cache coherence is to ensure that a processor has exclusive access to a cache line before it writes data to the line. This is generally called a write-invalidate protocol, because on a write it invalidates any copies of the affected line in the other caches. Requiring exclusive access ensures that, when the writing processor writes the data, no other readable or writable copies of the written line exist.

[0009] To invalidate the other copies of a cache line in the other caches, the invalidating processor obtains access to the bus and drives onto the bus the address of the line to be invalidated. The other caches snoop the bus and check whether they currently cache that address; if so, they change the state of the line to Invalid.

[0010] In addition, each cache also snoops the bus to determine whether it holds a modified cache line that is being read by another processor. If so, the cache provides the modified line, either by writing it to memory, by forwarding it to the requesting processor, or both. A transaction that reads a cache line may allow the line to be shared, or it may require the other caches to invalidate the line.

[0011] A processor's cache is usually a hierarchical cache. For example, a processor may have a first-level (L1) cache and a second-level (L2) cache. The L1 cache is closer to the processor's computation elements than the L2 cache and can supply data to them faster than the L2 cache can. Furthermore, the cache may be split into a separate instruction cache and data cache, for caching instructions and data respectively.
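The snoop reactions of paragraphs [0009] and [0010] can be sketched as a small state machine. This is a minimal illustration of the textbook MESI write-invalidate behavior the passage describes, not the patent's specific implementation; the function and operation names are assumptions:

```python
# MESI states, as named in the description above.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"


def snoop(state: str, bus_op: str) -> tuple:
    """React to a snooped bus transaction for a line held in `state`.

    Returns (new_state, supply_data). supply_data is True when this cache
    must provide the modified line, as described in paragraph [0010].
    `bus_op` is "read" (another processor reads the line) or
    "read_invalidate" (another processor wants exclusive/write access).
    """
    if state == INVALID:
        return INVALID, False          # we hold no copy; nothing to do
    if bus_op == "read_invalidate":
        # Paragraph [0009]: all other copies are invalidated before a write.
        return INVALID, state == MODIFIED
    if bus_op == "read":
        # A snooped read demotes the line to Shared; a Modified holder
        # must supply the data (write it back and/or forward it).
        return SHARED, state == MODIFIED
    raise ValueError(bus_op)


assert snoop(MODIFIED, "read") == (SHARED, True)
assert snoop(EXCLUSIVE, "read") == (SHARED, False)
assert snoop(SHARED, "read_invalidate") == (INVALID, False)
```

Real protocols add further transitions (for example, supplying data on a snooped read of an Exclusive line), but the sketch captures the two cases the passage singles out: invalidation on a write, and demotion plus data supply when a Modified line is read.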
[0012] Within a processor's cache hierarchy, the various caches transfer cache lines among themselves. For example, when a cache address misses in the L1 cache, the L1 loads the missing cache line from the processor's L2 cache if the line is present in the L2. Furthermore, if the L1 cache must replace a valid cache line with a newer line, the L1 casts out the replaced line to the L2 cache rather than writing it to system memory. This approach is typically used in write-back cache configurations.

[0013] Transferring a cache line between two caches of a processor may take several processor clock cycles, for several possible reasons. One reason is that a cache typically comprises a pipeline with multiple stages, each processing one step of an operation per clock cycle, so that an access to the cache requires multiple clock cycles. In addition, the cache may be a multi-pass cache, in which the first pass through the pipeline (commonly called the query pass) obtains the state of the affected cache line.
The operation then passes through the pipeline one or more additional times, to update the cache according to the state obtained, or to read data not obtained during the query pass. Furthermore, on the processor's integrated circuit the caches may lie a considerable distance from one another, so additional clock cycles may be needed to accommodate long signal paths and/or signals subject to propagation delay through many logic gates.

[0014] For example, suppose the processor stores a new cache line into its L1 cache, forcing the L1 to replace a modified cache line. The L1 casts out the modified line being replaced to the L2 cache on the processor: the L1 reads the cast-out line and places it in a castout buffer between the two caches. The L1 notifies the L2 of the castout, whereupon the L2 reads the cast-out line from the castout buffer and writes the line into itself.

[0015] As long as, during the castout, the cache does not snoop on the bus a transaction that collides with the address of the cast-out line (that is, a transaction whose address is the same as that of the cast-out line), the foregoing scheme poses no problem. However, if a colliding transaction is snooped while the castout is in flight, an evident design problem arises that must be solved. For example, if the snooped transaction is a read, and the in-flight cache line contains modified data that has not yet been written to memory, which of the two caches is to supply the line's data to the snooped transaction on the bus? Which cache owns the in-flight line, for the purpose of updating its state?

[0016] The conventional approach to this problem is to cancel, or kill, the in-flight operation. This approach, however, has negative consequences.
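The collision condition of paragraph [0015] can be stated concretely. In the following sketch, all class and function names are hypothetical (the patent defines no code); only the notion of an address match against a line still in flight comes from the text:

```python
LINE_MASK = ~(32 - 1)  # 32-byte lines, as in the background section


class Castout:
    """A cache line in flight from L1 to L2 (hypothetical model)."""

    def __init__(self, addr: int, modified: bool):
        self.line_addr = addr & LINE_MASK
        self.modified = modified
        self.done = False  # becomes True once L2 has accepted the line


def snoop_collides(castout: Castout, snoop_addr: int) -> bool:
    """A snoop collides when it hits the in-flight line's address."""
    return (not castout.done) and (snoop_addr & LINE_MASK) == castout.line_addr


co = Castout(0x1000, modified=True)
assert snoop_collides(co, 0x101C)      # same 32-byte line: collision
assert not snoop_collides(co, 0x1020)  # next line: no collision
co.done = True
assert not snoop_collides(co, 0x101C)  # transfer complete: no collision
```

The design problem the passage raises is precisely the window in which `done` is False and `modified` is True: during that window it is ambiguous which cache should answer the snoop and update the line's state.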
For the canceled in-flight operation, this approach adds time and complexity to the cache design. In the example above, the L1 cache must hold off overwriting the cast-out line with the new line until the L2 signals that it is safe to proceed. The longer the L1 must wait, the more complex canceling and/or retrying the whole operation becomes. Moreover, the added latency may degrade performance. In addition, because handshaking between the two caches is required, if the two cache blocks are far apart the signals by which they communicate may be quite long and suffer significant propagation delay, possibly creating critical timing paths.
[0017] What is needed, therefore, is a cache that can internally handle the effects of a collision between an external snoop and an in-flight operation, without canceling the in-flight operation.

[Summary of the Invention]

[0018] To achieve the above object, one feature of the present invention is a multi-pass cache in a microprocessor that can detect a collision between a snooped transaction and an in-flight operation and handle the collision internally, without canceling the in-flight operation. The cache includes a tag array for receiving a snoop query that arrives, in time, between the query pass and the finish pass of an operation that transfers a cache line between the cache and another cache in the microprocessor. The snoop query includes a snoop address. The cache also includes control logic, coupled to the tag array, that detects a collision between the snoop address and the address of the cache line. When it detects a collision, the control logic completes the finish pass by updating the tag array, without canceling the finish pass.

[0019] In another aspect, a feature of the present invention is a second-level (L2) cache in a microprocessor that can internally handle a snoop operation, received in response to a transaction snooped on an external bus of the microprocessor, whose address collides with an in-flight operation transferring a cache line between the L2 cache and another cache in the microprocessor, without the L2 cache canceling the in-flight operation. The L2 cache includes snoop collision logic that generates a snoop tag state based on the in-flight tag state of the in-flight operation and on detection of an address collision between the snoop operation and the in-flight operation. The L2 cache also includes snoop action logic, coupled to the snoop collision logic, that generates a snoop action based on the snoop tag state. The snoop action updates the cache coherence state of the cache line after the in-flight operation has updated that state to the in-flight tag state.

[0020] In another aspect, a feature of the present invention is a method for enabling a first cache to internally handle a snoop operation whose affected cache line is in flight between a second cache and the first cache, without the first cache canceling the in-flight operation. The method includes querying the first cache's tag array, by the in-flight operation, to obtain a first state of the cache line; querying the tag array, by the snoop operation, to obtain a second state of the cache line; and, after the second state has been queried, updating the tag array, by the in-flight operation, with a third state of the cache line.
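The claimed method resolves the collision by ordering tag-array updates rather than by canceling anything. The following sketch illustrates that ordering only; the concrete state values chosen are hypothetical, since the claims name them merely as first through fourth states:

```python
class TagArray:
    """Minimal tag array: line address -> MESI state (illustrative only)."""

    def __init__(self):
        self.state = {}

    def query(self, addr):
        return self.state.get(addr, "I")

    def update(self, addr, new_state):
        self.state[addr] = new_state


tags = TagArray()
tags.update(0x1000, "M")          # the line is Modified in this cache

first = tags.query(0x1000)        # in-flight operation's query pass
second = tags.query(0x1000)       # snoop's query pass (address collision)
tags.update(0x1000, "E")          # in-flight finish pass writes a third state
# The snoop action is deferred until after the finish pass, so it sees the
# post-transfer line and then applies the fourth state (here: invalidate):
tags.update(0x1000, "I")

assert (first, second) == ("M", "M")
assert tags.query(0x1000) == "I"  # snoop took effect; nothing was canceled
```

The point of the ordering is that the in-flight operation always completes its finish pass, and the snoop action's final update is computed from the states observed earlier, so no cancel/retry handshake between the caches is needed.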
The method further includes generating a snoop action based on the second and third states and on the detected address collision between the snoop operation and the in-flight operation; and, after the update with the third state, updating the tag array with a fourth state of the cache line by the snoop action, thereby avoiding cancellation of the in-flight operation.

[0021] An advantage of the present invention is that the cache can resolve a snoop collision internally. In particular, it avoids the problems associated with communication between cache blocks on the processor's integrated circuit that arise under the conventional approach, which must cancel an in-flight operation whose address collides with an external snoop operation. In addition, the present invention reduces the complexity of the other caches that initiate the in-flight operations.

[0022] Other features and advantages of the present invention will become more apparent from the following description and the accompanying drawings.

[Embodiments]

[0030] Referring now to FIG. 1, a block diagram of the cache hierarchy of a microprocessor 100 according to the present invention is shown.

[0031] The cache hierarchy of the microprocessor 100 includes a first-level instruction (L1I) cache 102, a first-level data (L1D) cache 104, and a second-level (L2) cache 106. The L1I 102 and L1D 104 cache instructions and data respectively, while the L2 cache 106 caches both instructions and data, to reduce the time the microprocessor 100 needs to fetch instructions and data. In the system's memory hierarchy, the L2 cache 106 resides between system memory and the L1I 102 and L1D 104.
The L1I 102, L1D 104, and L2 cache 106 are coupled together. Cache lines are transferred between the L1I 102 and the L2 cache 106, and likewise between the L1D 104 and the L2 cache 106. For example, the L1I 102 and L1D 104 cast out cache lines to the L2 cache 106, and load cache lines from the L2 cache 106.

[0032] The microprocessor 100 also includes a bus interface unit 108, coupled to the L1I 102, the L1D 104, and the L2 cache 106. The bus interface unit 108 couples the caches 102-106 and the other functional blocks of the microprocessor 100 to a processor bus 112. The processor bus 112 couples the microprocessor 100 to other system elements, such as other microprocessors, I/O devices, and memory elements such as system memory. The microprocessor 100 and the other devices perform bus transactions on the processor bus 112 to transfer data and to maintain coherence among the caches.

[0033] The bus interface unit 108 responds to requests from functional blocks within the microprocessor 100 (such as the caches 102-106) by generating transactions on the processor bus 112. For example, if the L2 cache 106 receives, from another block within the microprocessor 100, a request that misses in the L2 cache 106, the L2 cache 106 requests the bus interface unit 108 to initiate a transaction on the processor bus 112 to read the missing cache line from the processor bus 112. Similarly, if the L2 cache 106 needs to write a cache line to system memory, the L2 cache 106 requests a transaction on the processor bus 112 to write the cache line to the processor bus 112.

[0034] In addition, the bus interface unit 108 monitors transactions on the processor bus 112 and reflects these transactions to the caches 102-106. In particular, if the bus interface unit 108 observes on the processor bus 112 one or more invalidating read or write memory transactions, the bus interface unit 108 reflects the transaction to the caches 102-106 in the form of a snoop operation request.

[0035] The cache hierarchy of the microprocessor 100 of FIG. 1 is representative of microprocessors employing the present invention; however, the invention is not limited to the embodiment of FIG. 1. Rather, the invention may be applied in any cache hierarchy configuration in which two caches transfer data to one another and in which a cache, while a transfer is in progress (that is, in flight), can receive a snoop operation whose address collides with the transfer. Advantageously, if a snoop operation generated by a transaction on the processor bus 112 has an address that collides with an in-flight operation, the L2 cache 106 of the present invention handles it internally, rather than canceling the in-flight operation as conventional methods do.

[0036] Referring now to FIG. 2, a block diagram of the L2 cache 106 of FIG. 1 according to the present invention is shown.
[0037] The L2 cache 106 includes a data array 208. The data array 208 comprises an array of storage elements for storing cache lines. The data array 208 receives a memory address 212, which indexes the data array 208 to select one of the storage elements. The data array 208 outputs, on a data output 218, the cache line selected by the address 212. In particular, the data array 208 stores the cache lines transferred between the L2 cache 106 and the L1 caches 102-104.

[0038] The L2 cache 106 also includes a tag array 206. The tag array 206 comprises an array of storage elements for storing state information associated with the cache lines stored in the data array 208. This state information includes cache coherence state information. In one embodiment, the cache coherence information comprises MESI state information, or states. The tag array 206 also receives the address 212, which indexes the tag array 206 to select one of the storage elements. The tag array 206 outputs, on a state output 216, the state selected by the address 212.

[0039] The L2 cache 106 also includes control logic 202, coupled to the data array 208 and the tag array 206. The control logic 202 is further coupled to the L1I 102, the L1D 104, and the bus interface unit 108, from which it receives operation requests and to which it generates responses. The control logic 202 controls the operation of the L2 cache 106, and is described in more detail below in conjunction with the remaining figures.

[0040] The L2 cache 106 is a multi-pass cache. That is, most operations require two or more passes through the L2 cache 106 to complete. On the first pass through the L2 cache 106, the tag state 216 is read from the tag array 206 and, in the case of read-type operations, data 218 may also be read from the data array 208. The first pass through the L2 cache 106 is also called the query pass, because it queries the tag array 206 for the state 216 of the cache line. The second pass, and any required subsequent passes, through the L2 cache 106 constitute the finish pass; because the finish pass updates the cache line state in the tag array 206, it is also referred to as the action pass or update pass.
In write-type operations, the finish pass may also write data into the data array 208. The finish pass of a snoop operation is called the snoop action.

[0041] The L2 cache 106 also includes a snoop action queue 204, coupled to the control logic 202. The snoop action queue 204 stores the snoop actions performed by the L2 cache 106. The operation of the snoop action queue 204 is described in more detail below with reference to the remaining figures.

[0042] Referring now to FIG. 3, a block diagram of the L2 cache 106 of FIG. 2 is shown, illustrating the control logic 202 of FIG. 2 in more detail according to the present invention. The L2 cache 106 of FIG. 3 includes the control logic 202, snoop action queue 204, tag array 206, and data array 208 of FIG. 2. In the embodiment of FIG. 3, the L2 cache 106 pipeline includes four stages, denoted the J-stage 322, K-stage 324, L-stage 326, and M-stage 328. The tag array 206 and the data array 208 each span the four stages J through M, 322-328.

[0043] The control logic 202 includes an arbiter 302. The arbiter 302 receives a plurality of requester inputs requesting access to the L2 cache 106. One of the requesters is a snoop query 336. The bus interface unit 108 of FIG. 1 generates a snoop query 336 request in response to a transaction snooped on the external processor bus 112 of FIG. 1.

[0044] Another group of requesters comprises new operations 334. New operation 334 requests include the query passes of L2 cache 106 operations, but not the snoop queries 336 of snoop operations. In one embodiment, the new operations include a load operation from the L1D 104, a load operation from the L1I 102, a castout operation from the L1D 104, a castout operation from the L1I 102, and a store operation from the L1D 104. An L1D load operation comprises a data transfer from the L2 cache 106 to the L1D 104. An L1I load operation comprises a data transfer from the L2 cache 106 to the L1I 102. An L1D castout operation comprises a cache line transfer from the L1D 104 to the L2 cache 106. An L1I castout operation comprises a cache line transfer from the L1I 102 to the L2 cache 106. An L1D store operation comprises a data transfer from the L1D 104 to the L2 cache 106.

[0045] Another requester is the snoop action 338. When a snoop query pass reaches the bottom of the L2 cache 106 pipeline, snoop action generation logic 314, described below, responds by generating a snoop action 338.

[0046] Another group of requesters comprises finish operations 332. Finish operations 332 include the finish passes of L2 cache 106 operations, but not the snoop actions 338 of snoop operations. In one embodiment, the finish operations 332 include L1 load finish, L1 castout finish, L1 store finish, and L2 castout. An L1 load finish comprises the finish pass of an L1D or L1I load operation. An L1 castout finish comprises the finish pass of an L1D or L1I castout operation. An L1 store finish comprises the finish pass of an L1D store operation. An L2 castout comprises the L2 cache 106, in response to a write-type operation that writes the L2 cache 106, casting out to system memory a victim cache line that the L2 cache 106 has selected for replacement.

[0047] If a snoop query with a colliding address enters the L2 cache 106 pipeline after an operation's query pass but before the operation's final finish pass, that operation is in flight. A snoop operation may itself be an in-flight operation, as when a second snoop query with a colliding address enters the L2 cache 106 pipeline after a first snoop query but before the first snoop action.
L2 removal includes L2 cache memory 106 responding to a write type operation that writes to L2 cache $ memory body 106, and sacrifice cache configured to be replaced by L2 cache memory 106 Line to move to system memory. [0047] If a peep query with a conflicting address is entered into the 12 cache memory 106 pipeline after the query of an operation passes but before the last end of the operation passes, the operation is in transition. A peek operation may also be a transition operation. For example, a second peek query with a conflicting address is entered into the L2 cache memory 106 pipeline after the first peek query but before the first peek action. ~
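The multi-pass behavior of paragraph [0040] and the in-flight definition of paragraph [0047] can be illustrated with a brief software model. The sketch below is purely illustrative — the patent describes hardware, and all of the Python names here are hypothetical:

```python
# Minimal model of the multi-pass behavior of [0040] and the
# in-flight window of [0047]. All names are illustrative.

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class TagArray:
    def __init__(self):
        self.status = {}          # cache line address -> MESI status

    def query_pass(self, addr):
        # First pass: only read the tag status (and possibly data).
        return self.status.get(addr, INVALID)

    def finish_pass(self, addr, update_status):
        # Second/subsequent pass: update the cache coherency status.
        self.status[addr] = update_status

def collides(op, snoop_entry_time):
    # [0047]: the operation is in-flight if the snoop query entered
    # the pipeline after its query pass but before its last finish pass.
    return op["query_time"] < snoop_entry_time < op["last_finish_time"]

tags = TagArray()
tags.finish_pass(0x1000, EXCLUSIVE)            # line installed earlier
op = {"query_time": 10, "last_finish_time": 14}
assert tags.query_pass(0x1000) == EXCLUSIVE
assert collides(op, 12)                        # snoop arrives mid-operation
assert not collides(op, 15)                    # snoop after last finish pass
```

The key point the model captures is that the tag state a snoop observes depends on where it lands relative to another operation's query and finish passes.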
[0048] The arbiter 302 selects one of the requesters 332-338 for access to the tag array 206 and the data array 208 according to a priority. That is, the arbiter 302 selects one of the requesters 332-338 and forwards its memory address 212 to the tag array 206 and the data array 208. In addition, if the requester 332-338 winning arbitration is a finish pass, its update status 342, or in-flight status 342, is forwarded to the tag array 206. The update status 342 specifies the cache coherency status to which an in-flight operation will subsequently update the cache line specified by the address 212 in the tag array 206. The in-flight status, or update status, is included in each of the finish operations 332 and snoop actions 338. Finally, if the requester 332-338 winning arbitration is a write-type operation, it provides its data to the data array 208 on data signal 344. The operation type of the arbitration winner is specified on operation type signal 346. The operation type 346 specifies one of the eleven operation types listed in Table 1 below. In one embodiment, the priority of the operation types employed by the arbiter 302 is predetermined, from highest to lowest, as shown in Table 1 below.

1. snoop query
2. L1 load finish (finish pass of an in-flight operation)
3. L2 castout (finish pass of an in-flight operation)
4. L1 castout finish (finish pass of an in-flight operation)
5. L1 store finish (finish pass of an in-flight operation)
6. snoop action
7. L1D load (new operation)
8. L1I load (new operation)
9. L1D castout (new operation)
10. L1I castout (new operation)
11. L1D store (new operation)

Table 1

[0049] As shown in Table 1, the snoop action 338 has the lowest priority of the finish passes; that is, the snoop action 338 has lower priority than the finish operations 332. The finish operations 332 and the snoop action 338 have higher priority than any of the new operations 334, but lower priority than the snoop query 336.

[0050] The control logic 202 also includes an operation pipeline 304 coupled to the arbiter 302. The operation pipeline 304 comprises four stages of storage elements for storing an operation selected by the arbiter 302 as it proceeds through the corresponding stages of the tag array 206 and data array 208 of the L2 cache 106 pipeline. Each stage of the operation pipeline 304 stores a memory address 356, an operation type 364, and an in-flight status 362 or update status 362. The memory address 356 proceeds down the pipeline from the memory address 212. The operation type 364 proceeds down the pipeline from the operation type 346. The in-flight status 362 proceeds down the pipeline from the update status 342.

[0051] The control logic 202 also includes a plurality of address comparators 306 coupled to the operation pipeline 304. The address comparators 306 receive the memory address 356 from each stage of the operation pipeline 304. In addition, the address comparators 306 receive the memory address 352 of the operation currently being arbitrated by the arbiter 302 for access to the L2 cache 106. Finally, the address comparators 306 also receive a victim address 354. The victim address 354 is the memory address of the cache line implicated by an L1 castout operation (that is, an operation that allocates a new cache line). The address comparators 306 compare the various addresses received to determine whether an address collision has occurred between a snoop query 336 address and any of the other addresses received by the address comparators 306, as described in more detail in the discussion of Table 2 below. The address comparators 306 indicate an address collision via address collision signals 348. In one embodiment, an address collision occurs when the most significant bits of the snoop address, to the extent needed to specify a cache line, match those of an in-flight operation address.
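The fixed-priority arbitration of Table 1 can be modeled as a simple priority search. The sketch below is illustrative only; the requester names are paraphrases of the Table 1 entries, not identifiers from the patent:

```python
# Fixed-priority arbiter sketch per Table 1 (highest priority first).
# Illustrative model of the hardware arbitration described in [0048].

PRIORITY = [
    "snoop_query",        # 1
    "l1_load_finish",     # 2  finish passes of
    "l2_castout",         # 3  in-flight
    "l1_castout_finish",  # 4  operations
    "l1_store_finish",    # 5
    "snoop_action",       # 6
    "l1d_load",           # 7  new
    "l1i_load",           # 8  operations
    "l1d_castout",        # 9
    "l1i_castout",        # 10
    "l1d_store",          # 11
]

def arbitrate(pending):
    """Select the highest-priority pending requester, or None."""
    for op in PRIORITY:
        if op in pending:
            return op
    return None

assert arbitrate({"l1d_load", "snoop_action"}) == "snoop_action"
assert arbitrate({"l1d_store", "snoop_query"}) == "snoop_query"
assert arbitrate(set()) is None
```

Note how the ordering encodes paragraph [0049]: a pending snoop action loses to any finish operation but beats any new operation.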
[0052] The control logic 202 also includes snoop collision logic 308 coupled to the address comparators 306. The snoop collision logic 308 receives the address collision signals 348. In addition, the snoop collision logic 308 receives the tag status 216 from the tag array 206, receives an in-flight status value 362 from each stage of the operation pipeline 304, and receives the in-flight status 366 of the operation being arbitrated by the arbiter 302 for access to the L2 cache 106. Furthermore, the snoop collision logic 308 receives the operation type 364 from each stage of the operation pipeline 304 and the operation type 368 of the operation being arbitrated by the arbiter 302 for access to the L2 cache 106. Finally, the snoop collision logic 308 receives a victim valid signal 372, which indicates whether the victim memory address 354 is valid, i.e., whether the victim of an allocation is valid.

[0053] The control logic 202 also includes a snoop tag status 312 coupled to the snoop collision logic 308. The snoop collision logic 308 generates the snoop tag status 312 in response to the various inputs received. The snoop tag status 312 is used to generate snoop actions and bus actions, as described below. The equations of Table 2 below describe how the snoop collision logic 308 generates the snoop tag status 312, denoted ESnpTagStatus_M[1:0].

    ESnpTagStatus_M[1:0] =
        ESnp_M & L1LdFin_L     & L2MEqL_P     ? 2'b00 :
        ESnp_M & L1LdFin_K     & L2MEqK_P     ? 2'b00 :
        ESnp_M & L1LdFinReq_P  & L1LdFinEqM_P ? 2'b00 :
        ESnp_M & L1StFin_L     & L2MEqL_P     ? L1StFinWrStatus_L[1:0] :
        ESnp_M & L1StFin_K     & L2MEqK_P     ? L1StFinWrStatus_K[1:0] :
        ESnp_M & L1StFinReq_P  & L1StFinEqM_P ? L1StFinWrStatus_P[1:0] :
        ESnp_M & L1COFinLast_L & L2MEqL_P     ? L1COFinWrStatus_L[1:0] :
        ESnp_M & L1COFinLast_K & L2MEqK_P     ? L1COFinWrStatus_K[1:0] :
        ESnp_M & L1COFinReq_P  & L1COFinEqM_P ? L1COFinWrStatus_P[1:0] :
        ESnpVicCollEarly_M                    ? 2'b00 :
        ESnp_M & L1COFinReq_P & L1COFinVicVld_P & L2COEqM_P ? 2'b00 :
        ESnp_M & ESnpFin_L     & L2MEqL_P     ? 2'b00 :
        ESnp_M & ESnpFin_K     & L2MEqK_P     ? 2'b00 :
        ESnp_M & ESnpFinReq_P  & ESnpFinEqM_P ? 2'b00 :
        HitStatus_M[1:0];

Table 2

[0054] A signal suffix of _J, _K, _L, or _M corresponds to the J-stage 322, K-stage 324, L-stage 326, or M-stage 328 of the L2 cache 106 pipeline. A signal suffix of _P is not stage-specific. The status values in Table 2 correspond to MESI status values as follows: 2'b11 = Modified; 2'b10 = Exclusive; 2'b01 = Shared; 2'b00 = Invalid. The signals in the equations of Table 2 are defined as follows.

[0055] ESnp_M is one of the operation type signals 364 and, if true, indicates that an external snoop query type operation is in the M-stage 328.
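The cascaded conditional of Table 2 is, in effect, a priority multiplexer: the first collision term that is true selects an override status, and otherwise the tag array lookup result passes through unchanged. A minimal software model of that structure follows; it is illustrative only, and the per-term signals of Table 2 are abstracted into an ordered condition list rather than reproduced individually:

```python
# Software model of the Table 2 priority mux that produces
# ESnpTagStatus_M[1:0]. MESI encodings: 0b11=Modified, 0b10=Exclusive,
# 0b01=Shared, 0b00=Invalid. Structure is illustrative.

INVALID = 0b00

def esnp_tag_status(esnp_m, cases, hit_status):
    """cases: ordered (condition, override_status) pairs standing in for
    the collision terms of Table 2; hit_status: HitStatus_M[1:0]."""
    if esnp_m:
        for condition, override in cases:
            if condition:
                return override        # first true term wins
    # Default: no collision with an in-flight operation, so the snoop
    # tag status is simply the tag array lookup result.
    return hit_status

# A snoop in the M-stage colliding with an L1 load finish in the L-stage:
# Table 2 forces the snoop tag status to 2'b00 for this term.
assert esnp_tag_status(True, [(True, INVALID)], 0b11) == INVALID
# No collision term fires: the status defaults to HitStatus_M (Modified).
assert esnp_tag_status(True, [], 0b11) == 0b11
```

The store-finish and castout-finish terms differ from the load-finish terms only in that the override value is the colliding operation's pending write status (L1StFinWrStatus / L1COFinWrStatus) rather than a constant 2'b00.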
[0056] L2MEqL_P is one of the address collision signals 348 and, if true, indicates that the memory address 356 of the operation or action in the M-stage 328 is the same as the memory address 356 of the operation in the L-stage 326. L2MEqK_P is one of the address collision signals 348 and, if true, indicates that the memory address 356 of the operation or action in the M-stage 328 is the same as the memory address 356 of the operation in the K-stage 324. L1LdFin_L is one of the operation type signals 364 and, if true, indicates that an L1 load finish type operation is in the L-stage 326. L1LdFin_K is one of the operation type signals 364 and, if true, indicates that an L1 load finish type operation is in the K-stage 324. L1LdFinReq_P is one of the arbitrating operation type signals 368 and, if true, indicates that an L1 load finish type operation is being arbitrated by the arbiter 302 for access to the L2 cache 106. L1LdFinEqM_P is one of the address collision signals 348 and, if true, indicates that the memory address 352 of an arbitrating L1 load finish operation is the same as the memory address 356 of the operation in the M-stage 328. L1StFin_L is one of the operation type signals 364 and, if true, indicates that an L1 store finish type operation is in the L-stage 326. L1StFin_K is one of the operation type signals 364 and, if true, indicates that an L1 store finish type operation is in the K-stage 324. L1StFinReq_P is one of the arbitrating operation type signals 368 and, if true, indicates that an L1 store finish type operation is being arbitrated by the arbiter 302 for access to the L2 cache 106. L1StFinEqM_P is one of the address collision signals 348 and, if true, indicates that the memory address 352 of an arbitrating L1 store finish operation is the same as the memory address 356 of the operation in the M-stage 328. L1StFinWrStatus_L[1:0] is one of the in-flight status signals 362 and indicates the cache coherency status value to which the L1 store finish operation in the L-stage 326 will update the tag array 206. L1StFinWrStatus_K[1:0] is one of the in-flight status signals 362 and indicates the cache coherency status value to which the L1 store finish operation in the K-stage 324 will update the tag array 206. L1StFinWrStatus_P[1:0] is one of the arbitrating in-flight status signals 366 and indicates the cache coherency status value to which the L1 store finish operation being arbitrated by the arbiter 302 will update the tag array 206. L1COFinLast_L is one of the operation type signals 364 and, if true, indicates that the last pass of an L1 castout finish type operation is in the L-stage 326. L1COFinLast_K is one of the operation type signals 364 and, if true, indicates that the last pass of an L1 castout finish type operation is in the K-stage 324. L1COFinReq_P is one of the arbitrating operation type signals 368 and, if true, indicates that an L1 castout finish type operation is being arbitrated by the arbiter 302. L1COFinEqM_P is one of the address collision signals 348 and, if true, indicates that the memory address 352 of an arbitrating L1 castout finish operation is the same as the memory address 356 of the operation in the M-stage 328. L1COFinVicVld_P is the victim valid signal 372. L1COFinWrStatus_L[1:0] is one of the in-flight status signals 362 and indicates the cache coherency status value to which the L1 castout finish operation in the L-stage 326 will update the tag array 206. L1COFinWrStatus_K[1:0] is one of the in-flight status signals 362 and indicates the cache coherency status value to which the L1 castout finish operation in the K-stage 324 will update the tag array 206. L1COFinWrStatus_P[1:0] is one of the arbitrating in-flight status signals 366 and indicates the cache coherency status value to which the L1 castout finish operation being arbitrated by the arbiter 302 will update the tag array 206. L2COEqM_P is one of the address collision signals 348 and, if true, indicates that the L2 castout allocation victim memory address 354 is the same as the memory address 356 of the operation in the M-stage 328. ESnpFin_L is one of the operation type signals 364 and, if true, indicates that a snoop finish (or snoop action) type operation is in the L-stage 326. ESnpFin_K is one of the operation type signals 364 and, if true, indicates that a snoop finish (or snoop action) type operation is in the K-stage 324. ESnpFinReq_P is one of the arbitrating operation type signals 368 and, if true, indicates that a snoop action type operation is being arbitrated by the arbiter 302 for access to the L2 cache 106. ESnpFinEqM_P is one of the address collision signals 348 and, if true, indicates that the memory address 352 of an arbitrating snoop finish operation is the same as the memory address 356 of the operation in the M-stage 328.

[0057] HitStatus_M[1:0] is the tag status 216 output of the tag array 206. As may be observed from the equations of Table 2, if no address collision occurs between a snoop query and an in-flight operation, the default value of the snoop tag status 312 (denoted ESnpTagStatus_M[1:0] in Table 2) is the tag status 216, denoted HitStatus_M[1:0].
[0058] ESnpVicCollEarly_M is a signal generated internally by the snoop collision logic 308 and used in generating ESnpTagStatus_M[1:0]. ESnpVicCollEarly_M, if true, indicates that a snoop query is in the M-stage 328 and that, when the snoop query was previously in the K-stage 324 or L-stage 326, it collided with the valid victim address 354 of an L1 castout allocation whose victim will be overwritten by the L2 castout operation whose finish pass is being arbitrated by the arbiter 302. When the snoop query is in the K-stage 324, the snoop collision logic 308 stores the term (ESnp_K & L1COFinReq_P & L1COFinVicVld_P & L2COEqK_P) in a register to produce ESnpVicCollEarly_M; then, when the snoop query is in the L-stage 326, the stored value is ORed with the term (ESnp_L & L1COFinReq_P & L1COFinVicVld_P & L2COEqL_P), the result is stored in a register, and this second register is output when the snoop query reaches the M-stage 328. ESnp_K is one of the operation type signals 364 and, if true, indicates that an external snoop query type operation is in the K-stage 324. ESnp_L is one of the operation type signals 364 and, if true, indicates that an external snoop query type operation is in the L-stage 326. L2COEqK_P is one of the address collision signals 348 and, if true, indicates that the L2 castout allocation victim memory address 354 is the same as the memory address 356 of the operation in the K-stage 324. L2COEqL_P is one of the address collision signals 348 and, if true, indicates that the L2 castout allocation victim memory address 354 is the same as the memory address 356 of the operation in the L-stage 326.

[0059] As may be observed from Table 2, the tag status 216 of a snoop query reaching the bottom of the L2 cache 106 pipeline, and the in-flight statuses 362 and 366 of the finish passes of operations in the pipeline or being arbitrated for access to the pipeline, are all factors in the generation of the snoop tag status 312. Advantageously, the present invention uses the intermediate snoop tag status 312 to generate a snoop action that updates the cache coherency status of the cache line implicated by the colliding address, and to generate a bus action in response to the external snoop transaction on the processor bus 112 of FIG. 1, as described below, thereby avoiding the cancellation of in-flight operations required by conventional methods.

[0060] The control logic 202 also includes snoop action generation logic 314 coupled to the snoop tag status 312. The snoop action generation logic 314 generates snoop actions based on the snoop tag status 312. A snoop action generated by the snoop action generation logic 314 is stored in the snoop action queue 204 of FIG. 2 for provision to the arbiter 302 via snoop action signal 338. A snoop action comprises the fields shown in FIG. 4.

[0061] Referring now to FIG. 4, a block diagram of the snoop action queue 204 of FIG. 2 according to the present invention is shown. The snoop action queue 204 of FIG. 4 is populated with example values for ease of illustration. The snoop action queue 204 comprises a queue of storage elements. Each storage element contains a valid bit 402, a memory address 404, snoop update status bits 406, and a provide data bit 408.

[0062] The valid bit 402 indicates that the entry in the snoop action queue 204 contains a valid snoop action. Once a valid entry in the snoop action queue 204 has been output to the arbiter 302 and has won arbitration, the entry is marked invalid until a new valid snoop action is stored into it. The address 404 specifies the memory address of the cache line implicated by the snoop operation. The address 404 is provided by the M-stage 328 via the address 356 of FIG. 3. The snoop action updates the status of the cache line specified by the address 404 in the tag array 206 to the cache coherency status stored in the snoop update status 406. In one embodiment, the snoop update status 406 comprises one of the four MESI status values. The provide data bit 408 specifies whether the snoop action of the entry will provide data from the data array 208, such as a modified cache line required by the external snoop transaction on the processor bus 112. The snoop update status 406 and the provide data bit 408 are generated as described in Table 3 below.

[0063] Referring again to FIG. 3, the equations of Table 3 below describe how the snoop action generation logic 314 generates the snoop actions stored in the snoop action queue 204.
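The entry format of FIG. 4 can be sketched as a small record type. The following Python model is illustrative only; the field names mirror the reference numerals of paragraphs [0061]-[0062], and the class itself is not part of the patent:

```python
# Model of a snoop action queue entry per FIG. 4 and [0061]-[0062].
from dataclasses import dataclass

SHARED, INVALID = 0b01, 0b00

@dataclass
class SnoopQueueEntry:
    valid: bool = False           # 402: entry holds a valid snoop action
    address: int = 0              # 404: cache line address from M-stage 328
    update_status: int = INVALID  # 406: MESI status to write to tag array 206
    provide_data: bool = False    # 408: supply data (e.g. a modified line)

    def win_arbitration(self):
        # [0062]: once the entry is output to the arbiter and wins
        # arbitration, it is marked invalid until a new action is stored.
        self.valid = False

entry = SnoopQueueEntry(valid=True, address=0x2000, update_status=SHARED)
assert entry.valid
entry.win_arbitration()
assert not entry.valid
```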
    ESnpFinLd = (ESnp_M & ESnpTagStatus[1]) | (ESnp_M & ESnpTagStatus[0]);
    ESnpProvideData = ESnp_M & ESnpTagStatus[1] & ESnpTagStatus[0];
    ESnpUpdateStatus[1] = 1'b0;  // update only to Invalid or Shared
    ESnpUpdateStatus[0] = ESnp_M & ShOK &
        ((ESnpTagStatus[1] & ~ESnpTagStatus[0]) |   // Exclusive
         (~ESnpTagStatus[1] & ESnpTagStatus[0]));   // Shared

Table 3

[0064] ESnpFinLd, if true, instructs the snoop action queue 204 to load the snoop finish pass, or snoop action, generated by the snoop action generation logic 314. As may be observed, the snoop action queue 204 loads a snoop action when the snoop tag status 312 is Modified, Exclusive, or Shared, but not when the snoop tag status 312 is Invalid.

[0065] As may be observed from Table 3, the snoop action provides data only when the implicated cache line is in the Modified state.

[0066] As may be observed from Table 3, if the implicated cache line is already in the Exclusive or Shared state, the L2 cache 106 allows the cache line to be shared. In another embodiment, the equation for the snoop update status 406 of FIG. 4 is: ESnpUpdateStatus = 2'b00. That is, the L2 cache 106 invalidates the cache line implicated by a snoop collision rather than allowing the cache line to be shared.

[0067] ShOK is a signal from the processor bus 112 of FIG. 1 indicating that the external snoop transaction allows the microprocessor 100 to keep the implicated cache line in the Shared state, such as in the case of a snoop transaction for an instruction fetch rather than an invalidation.
[0068] The control logic 202 also includes bus action generation logic 316 coupled to the snoop tag status 312 and to the bus interface unit 108 of FIG. 1. The bus action generation logic 316 generates a bus action 374 based on the snoop tag status 312 for provision to the bus interface unit 108. The bus action 374 instructs the bus interface unit 108 how to respond on the processor bus 112 to the external snoop transaction that caused generation of the snoop operation, and consequently of the snoop tag status 312.

[0069] The control logic 202 also includes finish pass generation logic 318 coupled to the tag array 206 and the operation pipeline 304. The finish pass generation logic 318 generates the finish passes, or finish actions, of the new operations 334 (that is, of non-snoop operations). In one embodiment, the finish pass generation logic 318 generates L1 load finish, L1 castout finish, L1 store finish, and L2 castout passes, or operations. A finish operation 332 includes an update status for updating the tag array 206, a memory address for indexing the tag array 206 and the data array 208, an operation type, and data (if the finish operation is a write action); when the finish operation is selected by the arbiter 302, all of the foregoing are provided via signals 342, 212, 346, and 344, respectively.

[0070] The control logic 202 also includes a finish action queue 382 coupled to the finish pass generation logic 318. The finish action queue 382 receives finish actions from the finish pass generation logic 318, stores them, and provides them to the arbiter 302 via finish pass signal 332.
[0071] Referring now to FIG. 5, a flowchart illustrating operation of the L2 cache 106 of FIG. 1 in internally handling a collision between a snoop operation, generated for an externally snooped transaction, and an in-flight operation according to the present invention is shown. Flow begins at block 502.

[0072] At block 502, the arbiter 302 of FIG. 3 selects a new operation 334 of FIG. 3 for access to the L2 cache 106, and the new operation enters the J-stage 322 of the pipeline of FIG. 3. That is, the memory address 212 of the new operation 334 is provided to the tag array 206, the data array 208, and the operation pipeline 304 of FIG. 3, and the operation type 346 of the new operation 334 is provided to the operation pipeline 304. Flow proceeds to block 504.

[0073] At block 504, the arbiter 302 receives a snoop query operation 336 of FIG. 3 from the bus interface unit 108 of FIG. 1, and the snoop query operation 336 is arbitrated for access to the L2 cache 106. The bus interface unit 108 generates the snoop query 336 in response to an external transaction snooped on the processor bus 112 of FIG. 1. Flow proceeds to block 506.
1228681 五、發明說明(26) 即,到達M-階段328),並從標記陣列206取得標記狀態 216。因為產生衝突之窺視詢問已在新運算334的最後結束 通過前進入L2快取記憶體106管線,所以新運算334此時為 轉移中運算。結束通過產生邏輯3 1 8根據所取得的標記狀 態216及運算類型364,產生轉移中運算的結束通過,其包 括用以更新標記陣列2 0 6的轉移中狀態。流程繼續進行方 塊 5 0 8 〇1228681 V. Description of the invention (26) That is, the M-stage 328 is reached, and the mark state 216 is obtained from the mark array 206. Since the conflicting peep inquiry entered the L2 cache memory 106 pipeline before the end of the new operation 334, the new operation 334 is a branch-in operation at this time. End-of-pass generation logic 3 1 8 generates an end-of-transition operation based on the obtained flag state 216 and operation type 364, which includes updating the state of the flag array 206 in transition. The process continues to block 5 0 8 〇
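The pipeline-entry steps of blocks 502-506 can be sketched behaviorally as follows. This is an illustrative model only: the `Arbiter` class, its request kinds, and the fixed selection priority are assumptions made for the sketch, not details taken from the patent.

```python
# Behavioral sketch of arbitration into the J-stage (blocks 502-506).
# All names and the selection policy below are illustrative assumptions.

class Arbiter:
    """Selects one pending operation per clock to enter the cache pipeline."""

    # Assumed fixed priority among request kinds (highest first).
    PRIORITY = ("snoop_action", "finish_pass", "snoop_query", "new_op")

    def __init__(self):
        self.pending = {kind: [] for kind in self.PRIORITY}

    def request(self, kind, op):
        self.pending[kind].append(op)

    def grant(self):
        """Return the operation entering the J-stage this clock, or None."""
        for kind in self.PRIORITY:
            if self.pending[kind]:
                return self.pending[kind].pop(0)
        return None


arb = Arbiter()
# Clock 1: only the new store operation is pending, so it is selected and its
# address and type are driven to the tag array, data array, and operation
# pipeline (block 502).
arb.request("new_op", {"addr": "A", "type": "store"})
clock1 = arb.grant()
# Clock 2: the bus interface unit presents a snoop query for the same address
# (block 504); it enters the pipeline one clock behind the store query.
arb.request("snoop_query", {"addr": "A", "type": "snoop_query"})
clock2 = arb.grant()
```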
[0075] At block 508, the finish action queue 382 conveys the finish pass of the in-flight operation generated at block 506 to the arbiter 302 via signal 332, and the in-flight operation is arbitrated through the arbiter 302. Flow proceeds to block 512.

[0076] At block 512, the snoop query reaches the M-stage 328. The snoop conflict logic 308 detects the address collision between the snoop operation and the in-flight operation. Flow proceeds to block 514.

[0077] At block 514, the snoop conflict logic 308 generates the snoop tag state 312 based on the tag state 216 received from the tag array 206 for the snoop query and on the relevant in-flight state 362 of the in-flight operation, as described above with respect to Table 2. Flow proceeds to block 516.
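The heart of blocks 512-514, substituting the in-flight operation's pending update state for the stale state the snoop query read from the tag array, can be sketched as below. The function name and state letters are assumptions; Table 2 of the patent covers more cases than this single central one.

```python
def snoop_tag_state(tag_array_state, in_flight_state, addr_match, finish_pending):
    """Sketch of the snoop conflict logic (blocks 512-514), simplified.

    If the snoop query's address collides with an in-flight operation whose
    finish pass will update the tag array, the state the snoop read from the
    array is stale (or about to become stale), so the newer in-flight update
    state is forwarded to the snoop instead.
    """
    if addr_match and finish_pending:
        return in_flight_state  # forward the pending update state
    return tag_array_state      # no collision: the array state is current


# FIG. 7 scenario: the snoop reads Exclusive ("E") from the tag array, but
# the colliding store's finish pass is about to write Modified ("M").
state = snoop_tag_state("E", "M", addr_match=True, finish_pending=True)
```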
[0078] At block 516, the snoop conflict logic 308 generates a snoop action 338 according to the snoop tag state 312 obtained per Table 3, for storage into the snoop action queue 204. The snoop action 338 includes a snoop update state with which the tag array 206 will be updated. Flow proceeds to block 518.

[0079] At block 518, the snoop action 338 generated at block 516 is arbitrated by the arbiter 302 for access to the L2 cache 106. Flow proceeds to block 522.

[0080] At block 522, the finish pass of the in-flight operation updates the tag array 206 with the in-flight state carried on the update state signal 342. In addition, if the in-flight operation is a write-type operation, its data is written into the data array 208 via the data signal 344. Flow proceeds to block 524.

[0081] At block 524, the snoop action 338 updates the tag array 206 with the snoop update state generated at block 516. In addition, if the provide-data field 408 of the snoop action 338 indicates that the snoop operation must supply data to the external snooped transaction on the processor bus 112, the snoop action 338 obtains the data from the data array 208 via the data signal 218 for provision to the bus interface unit 108. Flow proceeds to block 526.

[0082] At block 526, the bus action generation logic 316 generates a bus action 374 according to the snoop tag state 312. In one embodiment, block 526 occurs substantially concurrently with block 516. Flow proceeds to block 528.

[0083] At block 528, the bus interface unit 108 responds to the external snooped transaction on the processor bus 112 with the bus action 374 generated during block 526, which may include supplying the data obtained during block 524. Flow ends at block 528.

[0084] To illustrate the present invention, various timing diagrams are described below. For a more complete understanding of the invention, a timing diagram of the operation of a conventional L2 cache is discussed first.

[0085] Referring now to FIG. 6, a related-art timing diagram is shown illustrating an example in which a conventional L2 cache cancels an in-flight operation that collides with a snoop operation. The example assumes the conventional L2 cache has a four-stage pipeline similar to the stages of the L2 cache 106 of FIG. 3. The example also assumes the conventional L2 cache is a multi-pass cache. The timing diagram comprises nine columns corresponding to nine successive clock cycles, and four rows corresponding to the four pipeline stages of the conventional L2 cache, denoted J, K, L, and M. Each entry in the diagram shows the operation occupying a particular pipeline stage during a particular clock cycle.
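The four-stage pipeline and the row-and-column timing tables just described can be modeled as a simple shift structure. The stage names J, K, L, and M come from the text; the function itself is an illustrative assumption.

```python
def run_pipeline(entries, clocks, stages=("J", "K", "L", "M")):
    """Build a {clock: {stage: op}} occupancy table like FIG. 6 / FIG. 7.

    `entries` maps a clock number to the operation entering the J-stage on
    that clock; operations advance one stage per clock and fall out after
    the bottom (M) stage.
    """
    table = {}
    occupancy = {}  # stage index -> operation label
    for clk in range(1, clocks + 1):
        # Shift every resident operation down one stage.
        occupancy = {i + 1: op for i, op in occupancy.items() if i + 1 < len(stages)}
        if clk in entries:
            occupancy[0] = entries[clk]
        table[clk] = {stages[i]: op for i, op in occupancy.items()}
    return table


# Store query enters on clock 1, snoop query on clock 2 (as in FIG. 6):
# the store reaches the bottom (M) stage on clock 4, the snoop on clock 5.
t = run_pipeline({1: "StQ A", 2: "SnpQ A"}, clocks=5)
```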
[0086] In the example of FIG. 6, the conventional L2 cache receives a store operation having memory address A, initiated by another cache of the conventional microprocessor, such as an L1 cache. While the store operation is still in flight, it is closely followed by a snoop operation having the same memory address A; hence, a collision arises between the store operation and the snoop operation. In this example, the snoop operation is an invalidating snoop; that is, the external transaction snooped on the processor bus is an invalidating transaction, such as a write-invalidate or read-invalidate transaction.

[0087] During clock cycle 1, the store operation's query pass (denoted StQ A) enters pipeline stage J. During clock cycle 2, the snoop operation's query pass (denoted SnpQ A) enters the pipeline behind the store query. During clock cycle 3, the two operations proceed down the pipeline to the next stage.

[0088] During clock cycle 4, the store query reaches the bottom of the pipeline, and the cache provides an Exclusive tag state for the cache line specified by address A. Had there been no collision with the snoop query, the store operation's finish pass would proceed, updating the cache line at address A to the Modified state and writing the data into the cache. However, because the cache detects the collision between the store query and the snoop query, the conventional cache cancels the store operation.

[0089] During clock cycle 5, the snoop query reaches the bottom of the pipeline, and the cache provides an Exclusive tag state for the cache line specified by address A. Because the store operation has been canceled, Exclusive is the correct state of the cache line; that is, the cache line is in whatever state it would have been in had the store operation never been initiated. During clock cycle 6, the snoop operation's action pass (denoted SnpA A) enters stage J of the pipeline. During clock cycles 7 through 9, the snoop action proceeds through the remaining stages of the pipeline to update the cache line state to Invalid, as dictated by the external transaction snooped by the cache.

[0090] As noted above, the conventional L2 cache must cancel the store operation. Otherwise, the store's finish pass would write valid data into the cache and update the state to Modified, after which the snoop action, which obtained an Exclusive state indicating the line was unmodified (read before the store finish pass updated the state to Modified), would invalidate the cache line. As a result, valid store data would be lost. Therefore, the conventional L2 cache must cancel the store operation so that the snoop query receives the correct state. Canceling the store operation, that is, an in-flight operation, has negative consequences, as described herein.

[0091] Referring now to FIG. 7, a timing diagram is shown illustrating operation of the L2 cache 106 of FIG. 1 according to the flow of FIG. 5, in accordance with the present invention. As may be observed from FIG. 7, the L2 cache 106 of the present invention advantageously handles internally the effects of a colliding snoop operation without canceling the in-flight operation, thereby alleviating the negative effects of canceling in-flight operations.
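The hazard of paragraphs [0088]-[0090], why a conventional cache must either cancel the store or lose its data, and how forwarding the in-flight Modified state lets the store complete safely, can be demonstrated with a toy cache line. Everything below is an illustrative model, not the patented circuit.

```python
def invalidating_snoop(forward_in_flight_state):
    """Toy replay of FIG. 6 vs FIG. 7 for a store/snoop collision at one line.

    Returns the data the snoop response supplies on the bus, or None when the
    just-written store data is silently lost.
    """
    line = {"state": "E", "data": "old"}   # line at address A, Exclusive
    snooped = line["state"]                # snoop query reads the tag array

    # The store finish pass completes first: write data, mark Modified.
    line["state"], line["data"] = "M", "new"

    # Without forwarding, the snoop action acts on the stale Exclusive state:
    # the line looks unmodified, so it is invalidated without supplying data
    # and the just-written value is lost. With forwarding, the snoop sees
    # Modified and supplies the data before invalidating.
    effective = "M" if forward_in_flight_state else snooped
    supplied = line["data"] if effective == "M" else None
    line["state"] = "I"                    # invalidating snoop either way
    return supplied


lost = invalidating_snoop(forward_in_flight_state=False)  # hazard behind FIG. 6
kept = invalidating_snoop(forward_in_flight_state=True)   # behavior of FIG. 7
```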
[0092] In the timing diagram of FIG. 7, the L2 cache 106 of FIG. 3 receives, via the new operation signal 334, a store operation initiated by the L1D 104 of FIG. 1, that is, a store query having memory address A. While the store operation is in flight, it is closely followed by a snoop operation having the same memory address A, arriving from the bus interface unit 108 via the snoop query signal 336. The snoop query creates a collision between the store operation and the snoop operation. In this example, the snoop operation is an invalidating snoop; that is, the external transaction snooped on the processor bus 112 is an invalidating transaction, such as a write-invalidate or read-invalidate transaction. In FIG. 7, the store query pass is denoted StQ A, the snoop query pass SnpQ A, and the snoop action pass SnpA, as in FIG. 6. In addition, the store action pass, or store finish pass, of the store operation to address A is denoted StA A.

[0093] During clock 1, per block 502 of FIG. 5, the arbiter 302 selects the store query, which enters the J-stage 322 of FIG. 3. During clock cycle 2, per block 504, the store query proceeds to the K-stage 324, and the snoop query is granted access to the L2 cache 106. During clock 3, the store query and the snoop query proceed to the L-stage 326 and the K-stage 324, respectively.

[0094] During clock 4, the store query reaches the M-stage 328 and receives an Exclusive tag state 216 value from the tag array 206. The finish pass generation logic 318 receives the Exclusive tag state 216 and the L1D-store operation type 364 value from the operation pipeline 304 and, per block 506, generates the store finish pass, including its in-flight state, based on the obtained tag state 216 and operation type 364. In this example, the generated in-flight state is Modified.

[0095] During clock 5, per block 508, the finish action queue 382 conveys the store finish pass to the arbiter 302, and the store finish pass is arbitrated for access to the L2 cache 106.

[0096] Also during clock 5, per block 512, the snoop conflict logic 308 detects the collision between the store operation and the snoop operation. Furthermore, the snoop query reaches the M-stage 328 and obtains an Exclusive tag state 216 value from the tag array 206. However, during clocks 6 through 8 described below, once the store action pass updates the state to Modified, the Exclusive state of the cache line is, or will be, incorrect. Therefore, the snoop conflict logic 308 of the present invention advantageously generates a snoop tag state 312 that is newer than the incorrect Exclusive tag state 216 value received by the snoop query. That is, per block 514, the snoop conflict logic 308 generates the snoop tag state 312, namely the in-flight state, from the obtained snoop query tag state 216 and the Modified store-action update state value, as described in Table 2. In this example, per Table 2, the snoop conflict logic 308 generates a Modified snoop tag state 312 value because: the snoop query is in the M-stage 328, as indicated by the operation type signal 364; the store finish is in the L-stage 326, as indicated by the operation type signal 364; the addresses 356 in the M-stage 328 and the L-stage 326 collide, as indicated by the address collision signal 348; and the in-flight store-finish update state in the L-stage 326 is Modified.

[0097] Per block 516, in response to generation of the snoop tag state 312, the snoop action generation logic 314 generates a snoop action according to the snoop tag state 312 obtained per Table 3. In this example, per Table 3, the snoop action generation logic 314 asserts the EsnpFinLd signal so that a snoop action is loaded into the snoop action queue 204. Because the snoop tag state 312 is Modified (2'b11), the snoop action generation logic 314 asserts the provide-data field 408. Furthermore, because the external bus transaction in this example is an invalidating-type transaction (that is, not shareable), the snoop action generation logic 314 generates an Invalid (2'b00) snoop update state 406 value.

[0098] During clock 6, per block 518, the snoop action is arbitrated by the arbiter 302 and enters the J-stage 322. During clocks 6 through 8, per block 522, the store action pass proceeds through the K-stage 324, the L-stage 326, and the M-stage 328, updating the tag array 206 with the Modified in-flight state value and writing the store data into the data array 208.

[0099] During clocks 7 through 9, per block 524, the snoop action proceeds through the K-stage 324, the L-stage 326, and the M-stage 328, updating the tag array 206 with the Invalid snoop update state value. In addition, during clock 9, if the provide-data field 408 of the snoop action indicates that the transaction snooped on the processor bus 112 is to be supplied with data, the snoop action obtains the cache line data from the data array 208 via the data signal 218. In one embodiment, one or more subsequent action passes may be generated to obtain the data requested by the transaction snooped on the processor bus 112.

[00100] During a subsequent clock cycle, per block 526, the bus action generation logic 316 generates a bus action according to the snoop tag state 312 and conveys the bus action to the bus interface unit 108; and per step 528, the bus interface unit 108 responds with that bus action to the external snooped transaction on the processor bus 112. Advantageously, the bus action responding to the external transaction snooped on the processor bus may occur in a subsequent clock cycle and is not timing-critical in the way that the signals between the L2 cache 106 and the L1 caches 102 and 104 are. Consequently, snoop collisions affect only the internal control logic of the L2 cache 106, or other non-timing-critical logic, which enables a higher operating frequency for the microprocessor 100; moreover, complexity is reduced because no communication among the caches 102-106 is required to cancel in-flight operations.
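The snoop-action generation of paragraphs [0097]-[0099] can be sketched as follows. The field names echo FIG. 4 (valid bit 402, memory address 404, snoop update state 406, provide-data field 408), and the 2-bit encodings come from the text (2'b11 Modified, 2'b00 Invalid); the function itself, and reducing Table 3 to these few cases, are assumptions for illustration.

```python
M, E, S, I = 0b11, 0b10, 0b01, 0b00  # assumed encodings; 2'b11 = Modified, 2'b00 = Invalid

def make_snoop_action(snoop_tag_state, addr, invalidating):
    """Sketch of the snoop action generation logic for an external snoop.

    Builds a snoop action to load into the snoop action queue, or None when
    the line is already Invalid and nothing need be updated or supplied.
    """
    if snoop_tag_state == I:
        return None
    return {
        "valid": True,                           # valid bit (402)
        "addr": addr,                            # memory address (404)
        # Snoop update state (406): Invalid for an invalidating
        # (non-shareable) transaction, Shared otherwise.
        "update_state": I if invalidating else S,
        # Provide-data field (408): a Modified line must supply its data.
        "provide_data": snoop_tag_state == M,
    }


# FIG. 7 example: Modified snoop tag state, invalidating external transaction.
action = make_snoop_action(M, 0xA000, invalidating=True)
```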
[00101] Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed within the scope of the invention. For example, although the invention has been described with reference to a write-invalidate snoop protocol, it is also applicable to other protocols, such as a write-update protocol. Furthermore, although the invention has been described with reference to the MESI cache coherency protocol, it is also applicable to other cache coherency protocols. Finally, although the L2 cache of the present invention has been described in a system context in which the L2 resides between the L1 caches and system memory, the L2 cache may reside at any level of the microprocessor's cache hierarchy at which in-flight operations may collide with snoop operations.

In summary, the foregoing are merely preferred embodiments of the present invention and should not be taken to limit the scope of its practice. All equivalent changes and modifications made within the scope of the appended claims remain within the coverage of this patent. The examiners are respectfully requested to take note and to grant the application accordingly.
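The write-invalidate versus write-update distinction drawn in the paragraph above can be illustrated briefly. This sketch is explanatory only and is not part of the patented design.

```python
def snoop_remote_store(policy, new_data):
    """Effect of a remote store on a local Shared copy under each protocol.

    Write-invalidate: local copies are invalidated and must be refetched.
    Write-update: local copies are refreshed with the written data instead.
    """
    if policy == "write-invalidate":
        return {"state": "I", "data": None}
    if policy == "write-update":
        return {"state": "S", "data": new_data}
    raise ValueError("unknown snoop policy: " + policy)


inv = snoop_remote_store("write-invalidate", "y")
upd = snoop_remote_store("write-update", "y")
```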
Brief Description of the Drawings

[0023] FIG. 1 is a block diagram illustrating a cache hierarchy of a microprocessor according to the present invention.

[0024] FIG. 2 is a block diagram illustrating the L2 cache of FIG. 1 according to the present invention.

[0025] FIG. 3 is a block diagram of the L2 cache of FIG. 1 illustrating the control logic of FIG. 2 in more detail, according to the present invention.

[0026] FIG. 4 is a block diagram illustrating the snoop action queue of FIG. 2 according to the present invention.

[0027] FIG. 5 is a flowchart illustrating how the L2 cache of FIG. 1 internally handles collisions between in-flight operations and the snoop operations generated for externally snooped transactions, according to the present invention.

[0028] FIG. 6 is a related-art timing diagram illustrating an example of a conventional L2 cache canceling an in-flight operation that collides with a snoop operation.

[0029] FIG. 7 is a timing diagram illustrating operation of the L2 cache of FIG. 1 according to the flow of FIG. 5, in accordance with the present invention.

Description of reference numerals:
100: microprocessor
102: level-1 instruction (L1I) cache
104: level-1 data (L1D) cache
106: level-2 (L2) cache
108: bus interface unit
112: processor bus
202: control logic
204: snoop action queue
206: tag array
208: data array
212: memory address
216: state output
218: data output
302: arbiter
304: operation pipeline
306: address comparator
308: snoop conflict logic
312: snoop tag state
314: snoop action generation logic
316: bus action generation logic
318: finish pass generation logic
322: J-stage
324: K-stage
326: L-stage
328: M-stage
332: finish operation
334: new operation
336: snoop query
338: snoop action
342: update state or in-flight state
344: data signal
346: operation type signal
348: address collision signal
352: memory address
354: victim address
356: memory address
362: in-flight state or update state
364, 368: operation type
366: in-flight state
372: victim valid signal
374: bus action
382: finish action queue
402: valid bit
404: memory address
406: snoop update state bits
408: provide-data bit
Claims (1)

Priority Applications (1)
- TW92129620A (priority and filing date 2003-10-23): Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache

Publications (2)
- TWI228681B, publication date 2005-03-01
- TW200515281A, publication date 2005-05-01

Family
- ID=36013511

Family Applications (1)
- TW92129620A, filed 2003-10-23, granted as TWI228681B (not active; IP right cessation)

Country Status (1)
- TW: TWI228681B (en)

Cited By (1)
- TWI411915B (VIA Technologies Inc., 2013-10-11): Microprocessor, memory subsystem and method for caching data

Also Published As
- TW200515281A (2005-05-01)

Similar Documents
- US7146468B2: Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache
- US7398361B2: Combined buffer for snoop, store merging, load miss, and writeback operations
- US7032074B2: Method and mechanism to use a cache to translate from a virtual bus to a physical bus
- US6272602B1: Multiprocessing system employing pending tags to maintain cache coherence
- US5784590A: Slave cache having sub-line valid bits updated by a master cache
- US7284097B2: Modified-invalid cache state to reduce cache-to-cache data transfer operations for speculatively-issued full cache line writes
- US8195881B2: System, method and processor for accessing data after a translation lookaside buffer miss
- EP1311956B1: Method and apparatus for pipelining ordered input/output transactions in a cache coherent, multi-processor system
- KR100228940B1: How to keep memory consistent
- US20040039880A1: Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system
- US6871267B2: Method for increasing efficiency in a multi-processor system and multi-processor system with increased efficiency
- GB2260628A: Line buffer for cache memory
- US20060085603A1: Processor, data processing system and method for synchronizing access to data in shared memory
- JP4594900B2: Processor, data processing system, and method for initializing a memory block
- US6349366B1: Method and apparatus for developing multiprocessor cache control protocols using a memory management system generating atomic probe commands and system data control response commands
- TWI275992B: A method to reduce memory latencies by performing two levels of speculation
- US20080086594A1: Uncacheable load merging
- US9465740B2: Coherence processing with pre-kill mechanism to avoid duplicated transaction identifiers
- US10802968B2: Processor to memory with coherency bypass
- US8108621B2: Data cache with modified bit array
- TWI228681B: Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache
- JP5319049B2: Cash system
- GB2502858A: A method of copying data from a first memory location and storing it in a cache line associated with a different memory location
- US6314496B1: Method and apparatus for developing multiprocessor cache control protocols using atomic probe commands and system data control response commands
- US8108624B2: Data cache with modified bit array

Legal Events
- MK4A: Expiration of patent term of an invention patent
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW92129620A TWI228681B (en) | 2003-10-23 | 2003-10-23 | Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW92129620A TWI228681B (en) | 2003-10-23 | 2003-10-23 | Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI228681B true TWI228681B (en) | 2005-03-01 |
| TW200515281A TW200515281A (en) | 2005-05-01 |
Family
ID=36013511
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW92129620A TWI228681B (en) | 2003-10-23 | 2003-10-23 | Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI228681B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI411915B (en) * | 2009-07-10 | 2013-10-11 | Via Tech Inc | Microprocessor, memory subsystem and method for caching data |
-
2003
- 2003-10-23 TW TW92129620A patent/TWI228681B/en not_active IP Right Cessation
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI411915B (en) * | 2009-07-10 | 2013-10-11 | Via Tech Inc | Microprocessor, memory subsystem and method for caching data |
Also Published As
| Publication number | Publication date |
|---|---|
| TW200515281A (en) | 2005-05-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7146468B2 (en) | Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache | |
| US7398361B2 (en) | Combined buffer for snoop, store merging, load miss, and writeback operations | |
| US7032074B2 (en) | Method and mechanism to use a cache to translate from a virtual bus to a physical bus | |
| US6272602B1 (en) | Multiprocessing system employing pending tags to maintain cache coherence | |
| US5784590A (en) | Slave cache having sub-line valid bits updated by a master cache | |
| US7284097B2 (en) | Modified-invalid cache state to reduce cache-to-cache data transfer operations for speculatively-issued full cache line writes | |
| US8195881B2 (en) | System, method and processor for accessing data after a translation lookaside buffer miss | |
| EP1311956B1 (en) | Method and apparatus for pipelining ordered input/output transactions in a cache coherent, multi-processor system | |
| KR100228940B1 (en) | How to keep memory consistent | |
| US20040039880A1 (en) | Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system | |
| US6871267B2 (en) | Method for increasing efficiency in a multi-processor system and multi-processor system with increased efficiency | |
| GB2260628A (en) | Line buffer for cache memory | |
| US20060085603A1 (en) | Processor, data processing system and method for synchronzing access to data in shared memory | |
| JP4594900B2 (en) | Processor, data processing system, and method for initializing a memory block | |
| US6349366B1 (en) | Method and apparatus for developing multiprocessor cache control protocols using a memory management system generating atomic probe commands and system data control response commands | |
| TWI275992B (en) | A method to reduce memory latencies by performing two levels of speculation | |
| US20080086594A1 (en) | Uncacheable load merging | |
| US9465740B2 (en) | Coherence processing with pre-kill mechanism to avoid duplicated transaction identifiers | |
| US10802968B2 (en) | Processor to memory with coherency bypass | |
| US8108621B2 (en) | Data cache with modified bit array | |
| TWI228681B (en) | Cache memory and method for handling effects of external snoops colliding with in-flight operations internally to the cache | |
| JP5319049B2 (en) | Cash system | |
| GB2502858A (en) | A method of copying data from a first memory location and storing it in a cache line associated with a different memory location | |
| US6314496B1 (en) | Method and apparatus for developing multiprocessor cache control protocols using atomic probe commands and system data control response commands | |
| US8108624B2 (en) | Data cache with modified bit array |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MK4A | Expiration of patent term of an invention patent |