TW201301032A - High-performance cache system and method - Google Patents
- Publication number
- TW201301032A (TW100122199A)
- Authority
- TW
- Taiwan
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
The present invention relates to the fields of integrated circuits and computers.
In general, a cache holds a copy of part of the contents of memory so that the processor core can access that content quickly, keeping the pipeline running without interruption.
Conventional caches are all addressed in the following way. First, the index field of the address is used to read a tag out of the tag memory. At the same time, the index field and the intra-block offset field of the address are used together to read content out of the cache. The tag read from the tag memory is then compared with the tag field of the address. If the two are the same, the content read from the cache is valid, which is called a cache hit. Otherwise, if the two differ, a cache miss occurs and the content read from the cache is invalid. In a multi-way set-associative cache, these operations are performed on all ways in parallel to detect which way hits. The content read from the hitting way is the valid content. If all ways miss, none of the content read is valid. After a cache miss, the cache control logic fills the cache with content from a lower-level storage medium.
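The conventional lookup described above can be sketched as follows. This is a minimal illustration, not taken from the patent: the field widths, the two-way organisation, and the round-robin replacement policy are all assumptions chosen to keep the example small.

```python
OFFSET_BITS = 4   # assumed: 16-byte blocks
INDEX_BITS = 2    # assumed: 4 sets
WAYS = 2          # assumed: 2-way set-associative


def split_address(addr):
    """Split an address into its (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        sets = 1 << INDEX_BITS
        # tags[index][way] is the stored tag, or None while the way is empty.
        self.tags = [[None] * WAYS for _ in range(sets)]
        self.data = [[None] * WAYS for _ in range(sets)]
        # Round-robin victim selection is an assumption, not from the text.
        self.next_victim = [0] * sets

    def lookup(self, addr):
        """Return (hit, value). Hardware compares all ways in parallel;
        the loop below stands in for that parallel comparison."""
        tag, index, offset = split_address(addr)
        for way in range(WAYS):
            if self.tags[index][way] == tag:
                return True, self.data[index][way][offset]
        return False, None

    def fill(self, addr, block):
        """After a miss, install the block fetched from lower-level storage."""
        tag, index, _ = split_address(addr)
        way = self.next_victim[index]
        self.tags[index][way] = tag
        self.data[index][way] = block
        self.next_victim[index] = (way + 1) % WAYS
```

A lookup on an empty cache misses; after `fill` is called with the same address, the lookup hits and returns the byte selected by the offset field.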
Cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses. In existing cache structures, compulsory misses are unavoidable except for the small portion of content that is successfully prefetched, and existing prefetch operations carry a considerable cost. In addition, although a multi-way set-associative cache can reduce conflict misses, power and speed constraints make it difficult to exceed a certain number of ways, because a set-associative structure requires the content and tags addressed by the same index in every way to be read out and compared at the same time. Furthermore, it is difficult to increase the capacity of a cache while keeping its speed matched to that of the processor core. Multi-level caches are therefore used, in which a lower-level cache is larger but slower than a higher-level cache.
Modern cache systems are therefore usually built from multiple levels of multi-way set-associative caches. Newer cache structures, such as victim caches, trace caches, and prefetching (fetching the next cache block into a cache buffer whenever a cache block is fetched, or using prefetch instructions), are used to make up for some of the existing shortcomings. However, as the processor/memory speed gap keeps widening, current architectures, and in particular the many possible kinds of cache miss, remain the most serious bottleneck limiting the performance of modern processors.
The method and system apparatus proposed by the present invention can directly address one or more of the above difficulties, or other difficulties.
The present invention provides a digital system that includes a processor core and a cache control unit. The processor core is coupled to a first memory containing executable instructions and to a second memory that is faster than the first memory, and the processor core is configured to execute one or more executable instructions stored in the second memory. The cache control unit is coupled to the first memory, the second memory, and the processor core, and is configured to fill one or more instructions from the first memory into the second memory before the processor core executes them. The cache control unit is further configured to examine instructions as they are being filled from the first memory into the second memory, to extract instruction information that includes at least branch information, to build a plurality of tracks from the extracted instruction information, and to fill at least one instruction according to one or more of those tracks.
The present invention also provides a method for assisting the operation of a processor core, the processor core being coupled to a first memory containing executable instructions and to a second memory that is faster than the first memory. The method includes examining instructions as they are being filled from the first memory into the second memory and extracting instruction information that includes at least branch information; building a plurality of tracks from the extracted instruction information; and, according to one or more of those tracks, filling at least one instruction from the first memory into the second memory before the processor core executes it, so that the processor core can obtain the instruction without depending on the first memory.
The present invention also provides a method by which a cache control device controls the cache operation of a processor core. The processor core is coupled to a first memory containing executable instructions and to a second memory that is faster than the first memory, and the processor core is configured to execute one or more of the executable instructions in the second memory. The method includes examining instructions as they are being filled from the first memory into the second memory and extracting instruction information from the examined instructions. The method further includes determining a branch point from the extracted instruction information before the processor core executes that branch point, and filling the instruction segment containing the branch target instruction of that branch point from the first memory into the second memory, so that the second memory contains whichever instruction the processor core's execution of the branch point leads to.
The present invention also provides a method of using a cache control unit to control a plurality of cache memories, including a first memory coupled to a processor core and a second memory coupled to the first memory. The method includes examining the instructions being filled into the plurality of memories and extracting instruction information from the examined instructions. The method further includes creating track points in a track table according to the extracted instruction information, and representing the entry of a branch target track point by either a low-level cache memory block number or a high-level cache memory block number. When a low-level cache memory block number is used to represent the branch target track point, the instruction segment corresponding to that track point is filled into the first memory; when a high-level cache memory block number is used, the instruction segment corresponding to that track point is filled into the second memory rather than the first memory.
The present invention also provides a cache control device for controlling the cache operation of a processor core. The processor core is coupled to a first memory containing executable instructions and to a second memory that is faster than the first memory, and the processor core is configured to execute one or more of the executable instructions in the second memory. The device includes a first fill/generation unit, a tracker, and an allocator. The first fill/generation unit examines the instructions being filled from the first memory into the second memory and extracts instruction information from the examined instructions. The tracker controls a look-ahead pointer according to the extracted instruction information, so that a branch point is determined before the processor core executes up to it. The allocator fills the instruction segment containing the target instruction of the branch point from the first memory into the second memory, so that the second memory contains whichever instruction the processor core's execution of the branch point leads to.
Those skilled in the art can understand and appreciate other aspects of the present invention in light of its description, claims, and drawings.
Although the invention can be extended through modifications and substitutions in many forms, the specification sets out some specific embodiments and describes them in detail. It should be understood that the inventor's intent is not to limit the invention to the particular embodiments described. On the contrary, the intent is to protect all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims.
FIG. 1 shows an embodiment of a computing environment according to the present invention. As shown in FIG. 1, computing environment 1000 includes a processor core 125, a high-level memory 124, a fill/generator 123, a low-level memory 122, and a tracking engine 320. It should be understood that the components or devices shown in the figure are given for illustration rather than limitation; some may be omitted and others may be added. In addition, this embodiment describes only the apparatus used for fetching instructions; the apparatus used for reading and storing data is similar.
High-level memory 124 and low-level memory 122 may be built from any suitable storage devices, such as static memory (SRAM), dynamic memory (DRAM), and flash memory. In this embodiment, the level of a memory indicates how close its connection is to the processor core: the closer a memory is to the processor core, the higher its level. A higher-level memory is also generally faster and smaller in area. High-level memory 124 may serve as the system's cache, or as the level-one cache when other caches are present. It may also be divided into a plurality of storage fragments called blocks (for example, memory blocks) that store the data (that is, instructions and data) to be accessed by processor core 125.
Processor core 125 may be any suitable processor that operates in a pipelined manner and works together with the cache system. Processor core 125 may use separate instruction and data caches, and may include some instructions for cache operations. When processor core 125 executes an instruction, it first needs to read the instruction and/or data from memory. Tracking engine 320 and fill/generator 123 fill the instructions that processor core 125 is about to execute into high-level memory 124, so that processor core 125 can read the instructions it needs from high-level memory 124 with a very low cache miss rate. In this embodiment, the term "fill" means moving data or instructions from a lower-level memory to a higher-level memory, and the term "memory access" means that processor core 125 reads from or writes to the closest memory (that is, high-level memory 124 or the level-one cache).
Tracking engine 320 and other components such as fill/generator 123 may be implemented in the same integrated circuit as part of the processor chip, may be separate chips, may run as a program on the processor chip, or may be a combination of software and hardware.
In this embodiment, tracking engine 320 uses the information sent by fill/generator 123 and processor core 125 to generate the appropriate addresses for fetching the required instruction or the instruction block that contains it. Tracking engine 320 can also supply an appropriate address to fill/generator 123, so that fill/generator 123 can use the address to fetch the corresponding instruction, or the instruction block containing it, from low-level memory 122 and store it in high-level memory 124. In addition, tracking engine 320 can generate a block number for high-level instruction memory 124. The block number and the intra-block offset generated by processor core 125 together form the instruction address, with which, when no cache miss occurs, the corresponding instruction is fetched from high-level instruction memory 124 and sent to processor core 125.
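The addressing scheme above, in which a block number from the tracking engine is combined with an intra-block offset from the processor core, can be sketched as below. This is only a sketch: the class name `HighLevelMemory`, its methods, and the block size are hypothetical and are not defined in the patent text.

```python
BLOCK_SIZE = 4  # instructions per block (assumed for illustration)


class HighLevelMemory:
    """Toy model of an instruction memory addressed by (block number, offset)."""

    def __init__(self, num_blocks):
        # None marks a block that has not been filled yet.
        self.blocks = [None] * num_blocks

    def fill(self, block_number, instructions):
        """Store one instruction block fetched from lower-level memory."""
        if len(instructions) != BLOCK_SIZE:
            raise ValueError("a block must hold exactly BLOCK_SIZE instructions")
        self.blocks[block_number] = list(instructions)

    def fetch(self, block_number, offset):
        """block_number would come from the tracking engine and
        offset from the processor core."""
        block = self.blocks[block_number]
        if block is None:
            raise LookupError("block not filled (would be a cache miss)")
        return block[offset]
```

Because the block number already identifies the location in the high-level memory, this model needs no tag comparison on fetch, which is the point the paragraph makes about combining the two fields into one instruction address.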
Specifically, fill/generator 123 includes a generator 130 and a fill engine 132. Fill engine 132 fetches instructions or instruction blocks according to the appropriate address. Generator 130 examines every instruction fetched from low-level memory 122 and extracts certain information, such as the instruction type, the instruction address, and the branch target information of branch instructions. The instructions and the extracted information, including the branch target information, are sent to tracking engine 320. In this embodiment, a branch instruction or branch point means any suitable form of instruction that can cause processor core 125 to change the execution flow (for example, to execute an instruction out of sequence). From the instructions and the branch target information, tracking engine 320 can determine address information such as the instruction type, the branch source address, and the branch target address. Instruction types may include, for example, conditional branch instructions, unconditional branch instructions, and other instructions. In particular, an unconditional branch instruction can be regarded as a special case of a conditional branch instruction in which the condition always holds, so instruction types can be divided simply into branch instructions and other instructions. The branch source address is the address of the branch instruction itself, and the branch target address is the address to which control transfers when the branch is taken. Other information may also be included.
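The extraction step performed by the generator can be sketched as follows. The instruction format here is a toy one invented for the example: each instruction is a `(mnemonic, operand)` pair, the mnemonics in `BRANCH_OPS` are hypothetical, and branch targets are assumed to be relative to the branch's own address in units of one instruction. None of this encoding comes from the patent.

```python
from typing import NamedTuple, Optional


class InstrInfo(NamedTuple):
    kind: str               # "branch" or "other"
    source: int             # address of the instruction itself
    target: Optional[int]   # branch target address, None for non-branches


# Hypothetical mnemonics. Unconditional jumps are grouped with conditional
# branches, as the text treats them as a special case.
BRANCH_OPS = {"beq", "bne", "jmp"}


def examine(address, instr):
    """Extract type, source address, and target address from one instruction."""
    op, operand = instr
    if op in BRANCH_OPS:
        return InstrInfo("branch", address, address + operand)
    return InstrInfo("other", address, None)


def examine_block(base, block):
    """Examine every instruction of a block as it is being filled."""
    return [examine(base + i, instr) for i, instr in enumerate(block)]
```

For a block based at address 100 that contains `("beq", 5)` as its second instruction, `examine_block` reports a branch with source 101 and target 106.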
In addition, tracking engine 320 can build an address tree or a track table from the address information, derived from the determined information, that is used to fill high-level memory 124. FIG. 2A shows an embodiment of an address tree implemented according to the method of the present invention.
As shown in FIG. 2A, address tree 300 may include tree nodes 310 and 312, trunks 301, 302, 304, 305, and 307, and branches 303 and 306. A trunk corresponds to an instruction sequence of fixed or variable length. A tree node is a branch instruction after which a transfer of control may occur. If the branch is taken, a tree branch connecting the tree node to the branch target address is established. For example, 301, 302, 304, 305, and 307 are ordinary instruction sequences corresponding to trunks; 310 and 312 are branch instructions corresponding to tree nodes; and 311 and 313 are branch targets, from which tree branches 303 and 306 are established. Other structures may also be used.
During program execution, address tree 300, or any portion of it, can serve as the trajectory, or track, of an instruction sequence executed by processor core 125. The first instruction of the instruction sequence can be regarded as the track head (HOL), and the instruction segment containing that first instruction is filled into high-level memory 124 for use by processor core 125. During execution, the current instruction becomes the first instruction of the sequence being executed, so the HOL moves along the track. In addition, one or more predicted track heads (PHOLs) can be generated to indicate instruction sequences that processor core 125 may use. For example, at a tree node (that is, a branch instruction) there may be two PHOLs, one for each outcome of the branch. During execution, a PHOL moves according to the branch points on the track and is usually ahead of the HOL.
Address tree 300 can provide different depths, depending on the number of levels of branch nodes. For example, a one-level address tree supports only one level of branching (for example, the next branch). A two-level address tree supports two levels of branching (for example, the branch that follows the first-level branch when the first-level branch is not taken, or the branch on the target track of the first-level branch when it is taken). A multi-level address tree supports multiple levels of branching.
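One way to picture the depth of the address tree is as the number of branch levels a look-ahead walk is allowed to cross. The sketch below assumes a program whose instructions are numbered consecutively and whose branches are given as a hypothetical `branch_targets` mapping from branch address to target address; the function name and this representation are illustrative only. Following the text, every branch is treated like a conditional branch, so both the fall-through path and the taken path are explored.

```python
def lookahead_targets(branch_targets, start, depth, length):
    """Collect the branch targets reachable from `start` by crossing
    at most `depth` levels of branch nodes."""
    found = set()

    def walk(addr, remaining):
        if remaining == 0:
            return
        # Follow the trunk until the next tree node (branch instruction).
        while addr < length and addr not in branch_targets:
            addr += 1
        if addr >= length:
            return
        target = branch_targets[addr]
        found.add(target)
        walk(addr + 1, remaining - 1)  # path when the branch is not taken
        walk(target, remaining - 1)    # path when the branch is taken

    walk(start, depth)
    return found
```

With branches at addresses 2, 5, and 12 targeting 10, 20, and 0 respectively, a depth of one from address 0 finds only target 10, while a depth of two also finds 20 (the next branch on the fall-through path) and 0 (the next branch on the target path).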
FIG. 2B shows an embodiment of operation based on the address tree of the present invention. In FIG. 2B, straight lines represent program code, curves represent branch paths, heavy dots represent branch instructions, and dashed lines mark the division of the program into instruction segments of fixed or approximately equal length.
At the start, processor core 125 executes program segment 30 up to conditional branch instruction 31. If the branch condition of branch instruction 31 does not hold, processor core 125 executes program segment 33 up to unconditional branch instruction 36, and then branches unconditionally along branch path 34 to program segment 37. If, on the other hand, the branch condition holds when conditional branch instruction 31 is executed, processor core 125 follows branch path 32 to program segment 35, executes it, and then continues with program segment 37.
After executing program segment 37, processor core 125 executes program segment 38 up to conditional branch instruction 39, which implements a loop. If the loop condition of conditional branch instruction 39 holds, program segment 38 is executed again along branch path 40. Program segment 38 is executed repeatedly until the loop condition no longer holds, after which processor core 125 executes program segment 41.
These program segments can be represented by instruction segments 11, 12, 13, 14, 15, 16, and 17. Each instruction segment may contain the same number of instructions, or a different number of instructions in the case of a variable-length instruction set. For example, instruction segment 11 may include all the instructions of program segment 30 and some of the instructions of program segment 33; instruction segment 12 may include the remaining instructions of program segment 33; instruction segment 13 may include some of the instructions of program segment 35; instruction segment 14 may include the remaining instructions of program segment 35 and some of the instructions of program segment 37; instruction segment 15 may include the remaining instructions of program segment 37; instruction segment 16 may include some of the instructions of program segment 38; and instruction segment 17 may include the remaining instructions of program segment 38 and some of the instructions of program segment 41. The size of each instruction segment can be determined according to the target application or the available hardware resources.
For ease of description, this embodiment assumes that instruction segments are not filled in an interleaved manner: the next instruction segment to be filled is filled into high-level instruction memory 124 only after the filling of the previous one has completed. It is also assumed that address tree 300 has a depth of one level, that is, only one level of branching is used to fill instruction segments into high-level memory 124. Other configurations can be used in a similar way. When processor core 125 starts running, fill/generator 123 begins filling instruction segment 11 into high-level memory 124 and scans every instruction being filled. In some cases, two or more instructions can be scanned for every instruction executed, so that the instructions being scanned stay ahead of the instructions being executed. For example, two instructions can be examined per clock cycle while processor core 125 executes one instruction; or, in a multiple-issue processor, eight instructions can be examined per clock cycle while processor core 125 executes four. Other configurations can also be used to scan instructions ahead of execution.
When fill/generator 123 scans conditional branch instruction 31, it determines that instruction 31 is a branch instruction and extracts its target address, which lies in program segment 35. Tracking engine 320 then directs fill/generator 123 to fill the instruction segment corresponding to that target address, namely instruction segment 13, into high-level memory 124. Instruction segment 13 is thus filled into high-level memory 124 before conditional branch instruction 31 is executed. The instruction that sequentially follows branch instruction 31 lies in instruction segment 11, which has already been filled into high-level memory 124, so no additional fill operation is needed for it.
When execution reaches branch instruction 31, suppose the branch condition does not hold, so that execution continues in instruction segment 11. By the time the last instruction of instruction segment 11 is executed, the following instruction segment 12 has already been filled into high-level instruction memory 124, so the next instruction can be executed without a cache miss once the last instruction of instruction segment 11 completes.
While instruction segment 12 is being filled, the instructions being filled are scanned. Fill/generator 123 finds that the last instruction of instruction segment 12 is a branch instruction (namely, unconditional branch instruction 36). The instruction segment corresponding to the target address of branch instruction 36 (namely, instruction segment 14) is therefore filled into high-level instruction memory 124.
Similarly, before the last instruction of instruction segment 13 has finished executing, it is already known that the instruction following it lies in instruction segment 14. Because instruction segment 14 has already been filled, no additional fill operation is needed. In the same way, instruction segments 15, 16, and 17 are filled into high-level instruction memory 124 before processor core 125 executes any instruction in them.
Further, when branch instruction 39, which implements the loop in instruction segment 17, is scanned, both the target instruction segment (namely instruction segment 16) and the instruction segment corresponding to the next sequential instruction address have already been filled into higher-level memory 124, so no additional fill operation is needed. When the branch condition of instruction 39 is no longer satisfied, the loop ends and execution continues with the subsequent instructions in instruction segment 17.
In summary, tracking engine 320 and the other components can control the above operations according to the address-tree concept, thereby substantially reducing the cache miss rate. Tracking engine 320 and the other components (e.g., fill/generator 123) may also refer to an interface to various components, similar to a cache controller.
FIG. 3A shows an embodiment 2000 of the cache system of the present invention.
As shown in FIG. 3A, tracking engine 320 includes a track table 126 and a tracker 170. Track table 126 may contain the tracks of the instructions to be executed by processor core 125, and tracker 170 may provide various addresses based on track table 126. In this embodiment, a track represents a sequence of instructions to be executed (e.g., an instruction segment). The representation may use any suitable data type, such as an address, a block number, or another number. A new track is created when a track contains a branch point whose branch target may change the program flow or lies in a different instruction segment, for example an instruction in the next instruction segment, an exception handler, or a different program thread. The instruction sequences may contain the same number of instructions or, for example with a variable-length instruction set, different numbers of instructions.
Track table 126 may contain a plurality of tracks. Each track corresponds to one row of track table 126 and has a row number or block number corresponding to a memory block. A track may include a plurality of track points, each corresponding to a single address. Just as a track corresponds to a single row of track table 126, a track point corresponds to a single entry (e.g., a storage unit) in the corresponding row. The total number of track points in a track may therefore equal the total number of entries in a row of track table 126. Other configurations may also be used.
A track point (i.e., a single entry in the table) may contain information about a branch instruction whose branch target may lie in another track. The content of a track point may thus include information about the corresponding instruction and information about the branch target address; the latter may include the track number of the target track and an offset that locates the entry within the target track. By examining the content of a track point, the target track can be determined from the track number, and a specific entry in the target track from the offset. The track table thereby becomes a table in which the address of a branch entry corresponds to the branch source address and the content of that entry corresponds to the branch target address.
For example, in FIG. 3A, processor core 125 uses an (M+Z)-bit instruction address to fetch and execute instructions, where M and Z are integers. The M-bit part of the address indicates the upper address, and the Z-bit part indicates the offset. Track table 126 may contain 2^M rows, i.e., 2^M tracks, and the M-bit upper address can be used to address a track in track table 126. Each row may contain 2^Z track entries, i.e., 2^Z track points, and the Z-bit offset can be used to address a specific track point (entry) in the corresponding row.
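The address split described above can be illustrated with a minimal sketch. The function name `split_address`, the example widths M = 4 and Z = 3, and the list-of-lists table are assumptions made only for illustration; they are not taken from the patent.

```python
from typing import Tuple


def split_address(addr: int, m_bits: int, z_bits: int) -> Tuple[int, int]:
    """Split an (M+Z)-bit instruction address into (upper address, offset)."""
    if not 0 <= addr < (1 << (m_bits + z_bits)):
        raise ValueError("address does not fit in M+Z bits")
    upper = addr >> z_bits                 # M bits: selects the track (row)
    offset = addr & ((1 << z_bits) - 1)    # Z bits: selects the track point (entry)
    return upper, offset


# Illustrative widths only (assumed, not from the source).
M, Z = 4, 3
track_table = [[None] * (1 << Z) for _ in range(1 << M)]  # 2^M rows of 2^Z entries

row, col = split_address(0b1010110, M, Z)
```

With these assumed widths, the address 0b1010110 selects row 0b1010 and entry 0b110 of the table.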
When a new track is created, it may be placed in an available row of the track table. If the new track contains a branch track point (corresponding to a branch source instruction), a branch track point is created in one entry of that row. The row and entry of the branch point in track table 126 can be determined from the branch source address: for example, the row from the upper address of the branch source address, and the entry from its offset.
Each entry or track point in a row may have a content format comprising a type field 57, an XADDR field 58, and a YADDR field 59. Other fields may also be included. Type field 57 indicates the type of the instruction corresponding to the track point. As described earlier, instruction types may include conditional branch instructions, unconditional branch instructions, and other instructions. XADDR field 58 may contain an M-bit address, also called the first-dimension address or simply the first address. YADDR field 59 may contain a Z-bit address, also called the second-dimension address or simply the second address.
The content of a new track point may correspond to the branch target instruction. In other words, the content of a branch track point stores the branch target address information. For example, the row number or block number of the particular row in track table 126 that corresponds to the branch target instruction is stored in the branch track point as first address 58, and the offset of the branch target is stored in the branch track point as second address 59. The offset can be calculated from the branch source instruction address and the branch displacement (distance). When the branch target is addressed, first address XADDR 58 stored in the branch track point (i.e., the branch source instruction) is used as the row address, and second address YADDR 59 stored in the branch track point is used as the column address.
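A minimal sketch of how a branch track point might be recorded follows. It assumes the direct addressing of FIG. 3A (row = upper address bits, entry = offset bits) and a displacement counted in entries; the names `TrackPoint`, `InsnType`, and `record_branch` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InsnType(Enum):
    OTHER = 0
    COND_BRANCH = 1
    UNCOND_BRANCH = 2


@dataclass
class TrackPoint:
    kind: InsnType   # type field
    xaddr: int       # first address: row of the branch target
    yaddr: int       # second address: entry of the branch target within that row


def record_branch(table: List[List[Optional[TrackPoint]]], source_addr: int,
                  displacement: int, kind: InsnType, z_bits: int) -> TrackPoint:
    """Store the target of a branch in the entry addressed by the branch source."""
    mask = (1 << z_bits) - 1
    target = source_addr + displacement            # target = source + displacement
    point = TrackPoint(kind, target >> z_bits, target & mask)
    table[source_addr >> z_bits][source_addr & mask] = point
    return point


Z_BITS = 3  # assumed width for the example
table: List[List[Optional[TrackPoint]]] = [[None] * (1 << Z_BITS) for _ in range(16)]
p = record_branch(table, 21, 50, InsnType.COND_BRANCH, Z_BITS)
```

In this example the branch source at address 21 lands in row 2, entry 5, and its content points to row 8, entry 7 (target address 71).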
Instruction memory 46 may be the part of higher-level memory 124 used for instruction access, and may be built from any suitable high-performance memory. Instruction memory 46 may contain 2^M memory blocks, each containing 2^Z bytes or words.
Tracker 170 may be built from various components or devices, such as registers, selectors, stacks, and/or other storage modules, and determines the next track to be executed by processor core 125. Tracker 170 may determine the next track from information such as the current track in track table 126, track point information, and whether a branch is taken during execution by processor core 125.
For example, during operation, an (M+Z)-bit instruction address is carried on bus 55. The M-bit address is sent to track table 126 over bus 56 as the first address, or XADDR (X address), and the Z-bit address is sent to track table 126 over bus 53 as the second address, or YADDR (Y address). Based on the first and second addresses, an entry in the track table is found and its content is output on bus 51. If the entry corresponds to a branch instruction (a branch track point, or branch source instruction), the content of the entry on bus 51 is used as the branch target address.
If the branch condition of the branch instruction is not satisfied, the branch is not taken. The not-taken signal from processor core 125 controls selector 49 to select, as the new second address, the value on bus 54, which is the YADDR on bus 53 increased by one (1) byte or word by incrementer 48; the new address is output on bus 52. Register 50 keeps the first address unchanged while incrementer 48 keeps incrementing the second address by one (1) until it points to the next branch instruction on the current track. The first and second addresses are then held in register 50 and provided on bus 55.
On the other hand, if the branch condition of the branch instruction is satisfied, the branch is taken. The taken signal from processor core 125 controls selector 49 to select the new target address on bus 51, which is stored in the content of the track entry corresponding to the branch point, and to output it on bus 52. Register 50 holds the changed first address and provides the new (M+Z)-bit address on bus 55. The signal from processor core 125 that controls selector 49 is also called the "taken" signal and indicates whether the branch is taken.
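The taken and not-taken cases can be summarized in a short behavioral sketch. It models only the choice made by the selector and the search for the next branch point; the function names and the representation of an entry as either `None` (not a branch) or an `(xaddr, yaddr)` pair are assumptions for illustration, and the handling of the end of a track is deliberately left out.

```python
from typing import List, Optional, Tuple

# None = not a branch; (xaddr, yaddr) = stored branch target.
Entry = Optional[Tuple[int, int]]


def select_next(table: List[List[Entry]], x: int, y: int, taken: bool) -> Tuple[int, int]:
    """Pick the stored target on a taken branch, otherwise the incremented address."""
    entry = table[x][y]
    if taken and entry is not None:
        return entry          # new first and second address from the entry content
    return x, y + 1           # first address unchanged, second address plus one


def advance_to_branch(table: List[List[Entry]], x: int, y: int) -> Optional[Tuple[int, int]]:
    """Keep incrementing the second address until a branch entry on track x is found."""
    while y < len(table[x]):
        if table[x][y] is not None:
            return x, y
        y += 1
    return None               # end of track reached; not modeled further here


table: List[List[Entry]] = [[None] * 4 for _ in range(2)]
table[0][1] = (1, 2)          # branch at row 0, entry 1, targeting row 1, entry 2
```

Here `select_next(table, 0, 1, True)` yields the stored target `(1, 2)`, while the not-taken case yields `(0, 2)`.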
Thus, while processor core 125 supplies only the offset, tracking engine 320 supplies the block address on bus 56, so that instruction memory 46 can be addressed. Processor core 125 feeds the outcome of branch instruction execution (i.e., the "taken" signal) back to tracker 170, so that tracker 170 can determine how to proceed.
Before a new track is executed, the instruction segment corresponding to that track has already been filled into instruction memory 46. This process is repeated so that all instructions can be executed by processor core 125 without a cache miss. In addition, a two-level pointer (PHOL) may be used to examine the two subsequent branch points after the first branch point, and tracker 170 and/or fill/generator 123 may fill the instruction segments corresponding to the two tracks of those two branch points into instruction memory 46, further hiding the cache fill latency.
FIG. 3B shows another embodiment 3000 of the cache system of the present invention. Components similar to those of the FIG. 3A embodiment are omitted. As shown in FIG. 3B, the XADDR or block address on bus 56, which addresses track table 126 and instruction memory 46, may come from several different sources. That is, tracker 170 can select a track from a plurality of address sources. For example, multiplexer 65 replaces selector 49 of FIG. 3A and selects among four sources: the target address of the current branch instruction on bus 51 (the track table content); a normal address on bus 54, produced by keeping the first address unchanged and incrementing the second address by one (1); an address from stack 61 on bus 64; and the track position of an exception handler entry on bus 62.
Multiplexer 65 selects a track (the current track or a new track) according to the current instruction and the operating state. For example, if the track point at the second address is not a branch instruction, the first address remains unchanged and incrementer 48 keeps incrementing the second address by one (1) until the next branch instruction. If the track point at the second address is a branch instruction, or a branch instruction has been reached, and the branch condition is not satisfied, the first address still remains unchanged and the second address is incremented until the next branch instruction. If, on the other hand, the branch condition is satisfied or the branch is unconditional, the target address is used as the new first address, leading to a new track. Finally, when the last instruction is reached, execution likewise enters a new track corresponding to the next instruction segment.
Some special programs, such as exception handlers, may also be filled into higher-level memory 124 and have corresponding tracks created. The track point addresses of the entry points of these special programs may be stored in special registers (e.g., EXCP). When an event occurs (e.g., an exception), the track point address of the corresponding special program (e.g., an exception handler) is selected by selector 65 via bus 62 in order to enter that program.
Stack 61 may contain a plurality of separate stacks. Each separate stack provides stack operations, such as push and pop, to save thread context or the state of a "CALL" to a routine. When a program calls a routine, the address of the track point corresponding to the return address and/or other information may be pushed onto the stack; on return from the called routine, the saved track point address and/or other information may be popped, and the track is forcibly changed according to that track point (selector 65 selects bus 64). In some cases, processor core 125 may execute a "jump and link" type of instruction (i.e., one that branches or calls and goes to the return address when the routine completes). A stack can likewise be used to save the return address for this type of instruction. Processor core 125 may also execute multiple levels of nested "call" or "jump and link" instructions, and a separate stack may contain multiple levels so that multiple return addresses are saved at different stack levels. The plurality of stacks can also support multithreaded programs. Track table 126 may include a plurality of stacks corresponding to different threads, and thread identifier 63 may be used to identify the current program thread; thread identifier 63 points to the current stack supporting the current thread. Other sources or arrangements may also be used in this embodiment.
Thus, multithreaded programs can be supported by using a plurality of stacks; as identified by thread identifier 63, each stack can be used exclusively by one thread or program.
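The per-thread stacks can be pictured with the following sketch, in which a thread identifier selects the stack used for pushes and pops. The class `ReturnStacks` and its method names are hypothetical, and the saved values are assumed to be (first address, second address) pairs of the return track point.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

TrackPointAddr = Tuple[int, int]  # (first address, second address)


class ReturnStacks:
    """One return stack per thread; thread_id plays the role of the thread identifier."""

    def __init__(self) -> None:
        self._stacks: DefaultDict[int, List[TrackPointAddr]] = defaultdict(list)
        self.thread_id = 0

    def call(self, return_point: TrackPointAddr) -> None:
        # Push the track point of the return address on the current thread's stack.
        self._stacks[self.thread_id].append(return_point)

    def ret(self) -> TrackPointAddr:
        # Pop the most recent return track point of the current thread.
        return self._stacks[self.thread_id].pop()


stacks = ReturnStacks()
stacks.call((0, 3))      # outer call in thread 0
stacks.call((1, 5))      # nested call in thread 0
```

Nested calls return in last-in, first-out order within a thread, and switching `thread_id` leaves the other threads' saved return points untouched.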
FIG. 4 shows another embodiment 4000 of the cache system of the present invention. Embodiment 4000 is similar to embodiment 2000 of FIG. 3A, except that instruction memory 78 is used in place of instruction memory 46. As shown in FIG. 4, instruction memory 78 may contain 2^N memory blocks, where N is an integer and N ≤ M. That is, instruction memory 78 may contain fewer memory blocks than instruction memory 46. The first address on bus 56 is therefore used only to address track table 126.
A mapping unit 79 maps the first address to an N-bit block number or block address 80. The addresses sent to the higher-level memory are thus mapped, which reduces the size of the higher-level memory. Because processor core 125 is very unlikely to use every instruction address in the whole address space, this mapping-based approach need not provide memory blocks for the entire address space, which reduces the size of instruction memory 78.
FIG. 5 shows another embodiment 5000 of the cache system of the present invention. Embodiment 5000 is similar to embodiment 4000 of FIG. 4, except that track table 126 may contain only 2^N rows. That is, the first address on bus 56 is mapped by mapping unit 82 and then addresses both track table 126 and instruction memory 78, reducing the capacity requirements.
In addition, where the total number of rows in track table 126 and instruction memory 78 is smaller than the full addressable space of processor core 125, the rows of track table 126 may still use M bits as the first address and Z bits as the second address, so that the memory capacity of both track table 126 and instruction memory 78 is reduced.
FIG. 6 shows another embodiment 6000 of the cache system of the present invention. Embodiment 6000 is similar to embodiment 5000 of FIG. 5. As shown in FIG. 6, however, a mapping unit 83 is placed outside track table 126 and instruction memory 78, so that the M-bit first address 84 is mapped to an N-bit first address 85 before it is used by track table 126 and instruction memory 78. The addresses sent to track table 126, instruction memory 78, and tracker 170 are thus all mapped to reduce capacity.
The rows of track table 126 can thus use an N-bit first address and a Z-bit second address, and the total number of rows in track table 126 and instruction memory 78 can be smaller than the full address space addressable by processor core 125, which reduces the memory capacity of both. A shorter first address can also improve overall system performance.
Although the above mapping methods reduce the capacity of the cache and the track table, each instruction segment can still correspond to a track. Additional structures may also be used to avoid re-creating tracks that have already been created, without discarding existing track information. FIG. 7A shows another embodiment 8000 of the cache system of the present invention, implemented using one or more of the above mapping methods.
As shown in FIG. 7A, cache system 8000 includes lower-level memory 122, higher-level instruction memory 124, and processor core 125. Cache system 8000 also includes fill/generator 123, allocator 1200, track table 126, and tracker 170. Allocator 1200, track table 126, and tracker 170 form the main part of tracking engine 320 (not shown in the figure). As described earlier, tracking engine 320, fill/generator 123, and other related logic may serve as a cache control unit. It should be understood that the components are listed here only for ease of description; other components may be included, and some components may be combined or omitted. The components may be distributed over multiple systems, may be physical or virtual, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
Fill/generator 123 may include a fill engine 132, a generator 130, and an address translation unit 131, and tracker 170 may include a multiplexer 137, a register 138, an incrementer 136, and a stack 135. Other components may be included, and some components may be combined or omitted. For ease of description, depending on the particular application and configuration, higher-level memory 124 may be regarded as a level-one (L1) cache and lower-level memory 122 as a level-two (L2) cache or main memory. As described earlier, generator 130 extracts the branch instruction (source) address (the track table address of the branch instruction), the branch type, and the branch target address (the track table content of the branch track point) in order to build track table 126.
Allocator 1200 may be used to store track information, or to allocate and then store track information, in order to reduce the capacity requirements of track table 126 and higher-level memory 124. For example, allocator 1200 may include an active list 121. The active list may store information about an established track and establish a mapping between an address (or part of an address) and a block number, represented for example by the available row that the track occupies in track table 126. For example, when a track is created, the address information of the track is stored in the active list. Other arrangements may also be used.
As shown in FIG. 7A, active list 121 may be used to store the block addresses of the instruction segments in higher-level memory 124, each block address corresponding to a block number (BNX). The block number for a particular address is obtained by content matching between the address and the entries of active list 121. The position of the matching content is encoded into a block number, which can be used to index a row of the track table and a block of higher-level memory 124. If the match fails, the track for that address has not yet been created. In that case, the instruction segment corresponding to the address is filled into higher-level memory 124, a new track is created in the row of track table 126 indexed by address pointer 129 over bus 153, and the entry of active list 121 indexed by address pointer 129 over bus 153 is updated (written) with the corresponding block address. FIG. 8 shows an embodiment of the active list of the present invention.
As shown in FIG. 8, active list 121 may include a data/address bidirectional addressing unit 100. In one direction, unit 100 outputs a BNX for an input block address, generating the BNX by matching the input block (upper) address against its stored contents. In the other direction, unit 100 outputs the corresponding block address for an input BNX, which indexes the entry storing that block address. Data/address bidirectional addressing unit 100 may include a plurality of entries 101, each containing a register, a comparator, a flag bit 111 (the V bit), a flag bit 112 (the A bit), and a flag bit 113 (the U bit). The comparator results may be sent to encoder 102 to generate the number of the matching entry.
Control logic 107 controls the read/write state. The V (valid) bit of each entry 101 may be initialized to "0", and the A (active) bit of each entry 101 may be written by an active signal on signal line 119. A write pointer 105, generated by wrap-around increment unit 110 (129 in FIG. 7A), points to an entry in data/address bidirectional addressing unit 100. The maximum value that wrap-around increment unit 110 can generate equals the total number of entries 101. After the maximum is reached, the next increment wraps around to "0", and incrementing continues until the maximum is reached again.
During operation, when write pointer 105 points to a current entry 101, the V bit and the A bit of that entry are checked. If both are "0", the current entry is free and can be written; after the write completes, wrap-around increment unit 110 increments the pointer by one (1) to point to the next entry. If either the V bit or the A bit is not "0", the current entry cannot be used for a new write; wrap-around increment unit 110 increments the pointer by one (1) to point to the next entry, and that entry is checked to see whether it can be used for the new write.
During matching, the input block address data 104 is compared with the register content of every entry 101. The registers contain only the upper bits of the address (corresponding to a memory block in higher-level memory 124). If a match is found, encoder 102 encodes the match result into an entry number and sends it to matched-address output 109. If no match is found, the input block address is written into the register of the entry 101 pointed to by pointer 105, the V bit of that entry is set to "1", and the entry number is sent to matched-address output 109. The output entry number is subsequently used as the BNX (it indexes a memory block and is therefore a block number). The lower bits of the input address (i.e., the offset within a memory block) are subsequently used as the BNY. BNX and BNY together form the BN, which is later stored in a track entry and used to index track table 126, higher-level memory 124, and active list 121. Although "BN" as used here generally denotes the "block number" comprising both BNX and BNY, in some special cases, as those skilled in the art will understand, it may refer only to the upper part of the address, i.e., the BNX. Wrap-around increment unit 110 may then increment the write pointer by one (1) so that it points to the next entry.
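The match-or-allocate behavior of the active list can be sketched as follows. This is a software model under stated assumptions: the hardware compares all entries in parallel whereas the sketch scans them in a loop, the class and method names are hypothetical, and the U bit and the read-side update of the V bit are not modeled.

```python
from typing import List, Optional, Tuple


class ActiveList:
    """Maps block (upper) addresses to entry numbers (BNX), allocating on a miss."""

    def __init__(self, n_entries: int) -> None:
        self.block_addr: List[Optional[int]] = [None] * n_entries
        self.valid = [False] * n_entries    # V bit
        self.active = [False] * n_entries   # A bit
        self.write_ptr = 0                  # wrap-around write pointer

    def lookup(self, block_addr: int) -> Tuple[int, bool]:
        """Return (BNX, hit). On a miss, write the address into a free entry."""
        n = len(self.block_addr)
        for bnx in range(n):
            if self.valid[bnx] and self.block_addr[bnx] == block_addr:
                return bnx, True
        for _ in range(n):
            ptr = self.write_ptr
            self.write_ptr = (ptr + 1) % n  # wrap around after the last entry
            if not self.valid[ptr] and not self.active[ptr]:
                self.block_addr[ptr] = block_addr
                self.valid[ptr] = True
                return ptr, False
        raise RuntimeError("no free entry; replacement is not modeled here")

    def read(self, bnx: int) -> Optional[int]:
        """Reverse direction: block address stored at entry BNX."""
        return self.block_addr[bnx]


active_list = ActiveList(4)
bnx, hit = active_list.lookup(0x12)   # first lookup of this address allocates an entry
```

A caller would use `hit` to decide whether the instruction segment still has to be filled and a new track created in the row given by `bnx`.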
For a read operation, read address 106 selects one of the entries 101; the register content of the selected entry is read out and sent to data output 108, and the V bit of the selected entry 101 is set to "1".
The U bit of an entry 101 may be used to indicate storage status. When write pointer 105 points to an entry 101, the U bit of that entry is set to "0". When an entry 101 is read, its U bit is set to "1". When wrap-around increment unit 110 moves write pointer 105 to a new entry, the U bit of the new entry is checked. If it is "0", the new entry may be replaced, and write pointer 105 stays at that entry for a possible write. If it is "1", pointer 105 moves on to the next entry.
Optionally, a window pointer 116 may be used to set the U bit of the entry it points to to "0". The window (clear) pointer 116 is located N entries ahead of the write pointer 105 (N being an integer); its value may be obtained by using the adder 115 to add N to the value of the write pointer 105. The N entries between the write pointer 105 and the window pointer 116 may be regarded as a window. The clear pointer thus sets the U bit of an entry to "0", after which any read of that entry sets the U bit back to "1". When the write pointer 105 later points to that entry, the U bit is checked. If the U bit is "0", the entry has not been used since it was cleared by the clear pointer 116, so the write pointer 105 stays at that entry and uses it for the next write. If the U bit is "1", the entry has been used recently, and the write pointer moves on to the next entry. The frequency with which entries 101 are replaced can be changed by changing the size of the window (i.e., the value of N). This method may serve as a usage-based replacement policy for the entries of the active table 121.
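The window-based replacement policy above can be sketched in software as follows. This is a minimal model assuming a single U bit per entry and a window size smaller than the number of entries; the class name `UsageWindow` and its methods are hypothetical names introduced for illustration only.

```python
class UsageWindow:
    """Toy model of a write pointer followed N entries ahead by a clear pointer.

    Hypothetical illustration: names are assumptions, not from the source.
    """

    def __init__(self, num_entries, window):
        self.used = [False] * num_entries  # U bit per entry
        self.n = num_entries
        self.window = window               # N, assumed to satisfy 0 <= N < n
        self.write_ptr = 0                 # stands in for write pointer 105

    def touch(self, index):
        """A read of an entry sets its U bit."""
        self.used[index] = True

    def next_victim(self):
        """Advance until the write pointer rests on an entry whose U bit is 0."""
        while True:
            # The clear pointer (write pointer + N) resets the U bit it reaches.
            clear_ptr = (self.write_ptr + self.window) % self.n
            self.used[clear_ptr] = False
            candidate = self.write_ptr
            self.write_ptr = (self.write_ptr + 1) % self.n
            if not self.used[candidate]:
                return candidate  # not read since it was last cleared
```

The loop always terminates in this model: after at most N steps the write pointer reaches an entry that the clear pointer reset during the same call. A larger N gives entries more time to be read again before the write pointer arrives, so fewer recently used entries are replaced.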
Optionally, the U field may be wider than one bit, giving a multi-bit U value. The multi-bit U value may be cleared by the write pointer 105 or by the window (clear) pointer 116, and each read operation may increase the corresponding U value by "1". During a write operation, the U value of the current entry is compared with a preset value. If the U value is smaller than the preset value, the current entry may be replaced. If the U value is greater than the preset value, pointer 105 moves on to the next entry.
Returning to FIG. 7A, when the processor core 125 starts up, a reset signal (not shown) sets the valid bits of all entries in the active table 121 to "0". When the reset signal is released, a reset vector (the instruction address at which execution starts after reset) is placed on bus 141 and sent to the active table 121 for matching. Because no match is found among the entries of the active table 121, the active table 121 writes the upper part of that address (i.e., of the reset vector) into the entry of the active table 121 pointed to by WXADDR 153, which is generated by pointer 129, sets the valid bit of that entry to "1", and sends the reset vector to the fill engine 132 over bus 144.
The fill engine 132 fetches the instructions from the lower-level memory 122 over bus 154 according to the instruction address given by the reset vector. The fetched instructions are filled into the memory block of the higher-level memory 124 indexed by WXADDR 153, which is generated by pointer 129. Meanwhile, as the instructions are fetched from the lower-level memory 122 over bus 140, the generator 130 may scan and examine them. The track information corresponding to the instructions is written into the corresponding entries, or track points, in the row of the track table 126 pointed to by WXADDR 153.
When the fill operation is complete, pointer 129 moves to the next available entry in the active table 121. Optionally, the address translation unit 131 may translate between virtual addresses and physical addresses. The address translation unit 131 may also be placed outside the lower-level memory 122, so as to reduce the latency of fetches from the lower-level memory 122 into the higher-level memory 124.
The generator 130 scans every instruction in the instruction segment being filled into the higher-level memory 124. When the generator 130 finds a branch instruction, it calculates the target address of that branch instruction. The target address may be obtained by adding the start address of the instruction segment containing the branch instruction, the offset of the branch instruction within that segment, and the branch distance from the branch instruction to the target instruction. The lower bits of the target address are the offset of the branch target instruction within its row (denoted BNY below). The upper bits of the calculated target address are matched against the contents of the active table 121. If the match fails, the active table 121 sends this value over bus 144 to the fill engine 132 so that the fill operation can be carried out.
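As a worked illustration of the target address calculation, the sketch below adds the three terms and splits the result into the upper bits used for matching and the lower bits used as BNY. The function name `branch_target_parts` and the parameter `offset_bits` (the width of the in-block offset) are assumptions for this example; the source does not fix a block size here.

```python
def branch_target_parts(segment_start, branch_offset, branch_distance, offset_bits):
    """Return (target, upper_bits, bny) for a branch instruction.

    Hypothetical helper: target = segment start + offset of the branch in its
    segment + signed distance from the branch to its target.
    """
    target = segment_start + branch_offset + branch_distance
    upper_bits = target >> offset_bits          # matched against the active table
    bny = target & ((1 << offset_bits) - 1)     # offset of the target in its row
    return target, upper_bits, bny
```

For example, with an assumed 256-byte block (`offset_bits=8`), a branch at offset 0x20 of the segment starting at 0x1000 with a distance of 0x148 has target 0x1168, upper bits 0x11, and BNY 0x68.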
On the other hand, if the match succeeds, the instruction segment containing the branch target is already stored in the higher-level memory 124, and the matched row number (BNX) together with the offset of the branch target instruction within that row (BNY), jointly referred to as BN, is sent on bus 149 to be written into a track table entry. That entry is indexed jointly by WXADDR 153 (the row address) and by the value on bus 143 from the generator 130 (the column address), which gives the offset of the branch instruction within its own instruction segment. Thus, once all instructions in an instruction segment have been scanned and processed, the entries of the active table 121, the track table 126, and the higher-level memory 124 that correspond to the same instruction segment are all indexed by the same WXADDR.
More specifically, the higher-level memory 124 contains the entire instruction segment to be used by the processor core 125; the active table 121 contains the block (upper) address against which subsequent instruction segments are matched; and the track table 126 contains all branch track points of the instruction segment, including their positions within the segment and the BN values of their target addresses. A BN value comprises a row address BNX and a column address BNY.
FIG. 9 shows an embodiment of a method for establishing new tracks using the track table 126 according to the present invention. As shown in FIG. 9, an established track 66 (denoted BNX0) may contain three branch instructions, or branch points, 67, 68, and 69. When branch point 67 is examined, a new track 70 (the next available row, denoted BNX1) is established to hold the target instruction of branch point 67, and the number of that track, i.e., its row number in the track table 126 (BNX1), is recorded in branch point 67 as the first address. Similarly, when branch point 68 is examined, another new track 71 (denoted BNX2) is established in the track table 126 and its track number is recorded in branch point 68; when branch point 69 is examined, another new track 72 (denoted BNX3) is established in the track table 126 and its track number is recorded in branch point 69.
In this way, new tracks can be established for all branch points of a single track. Furthermore, the track table 126 may be large enough to accommodate all block numbers, and the number of a new track may be obtained by adding one (1) to the largest track number already in use. Optionally, the number of instructions represented by one track may vary with the chosen track granularity (a coarser granularity allows a single track or row to represent an instruction segment containing more instructions with fewer entries).
Returning to FIG. 7A and continuing the operation described above, the tracker 170 may output a BN 151 to address the track table 126 and the higher-level memory 124. That is, the tracker 170 coordinates the operation of the track table 126, the higher-level memory 124, and the processor core 125. FIG. 7B shows an embodiment of the part of cache system 8000 that implements this operation.
As shown in FIG. 7B, the tracker 170 includes a stack 135, an incrementer 136, a multiplexer 137, a register 138, and an exception handler address register 139. During operation, the tracker 170 controls a read pointer of the track table 126. That is, the tracker 170 outputs an address (i.e., BN 151) used to address the track table 126 and the higher-level memory 124. BN 151 comprises BNX 152 and BNY 156. BNX 152 may be used to address a row, or track, of the track table 126 and a memory block of the higher-level memory 124, while BNY 156 may be used to address an entry within the track or row of the track table 126 pointed to by BNX 152.
The tracker 170 may use the multiplexer 137 to select the output BN 151 from different sources. For example, the multiplexer 137 may have four BN inputs: a BN stored in the stack 135, delivered over bus 164; a new BN, delivered over bus 165, formed from the current BNX 152 and the BNY obtained by incrementing the current BNY 156 in the incrementer 136; a BN from the track table 126, delivered over bus 150; and a BN from the exception handler address register 139. Other sources are also possible. As described earlier, the BN stored in the stack 135 may be the BN value corresponding to the program address of a function call or return, and the BN stored in the exception handler address register 139 may be the BN value corresponding to the address of the exception handler. Every BN value output by the multiplexer 137 contains both BNX and BNY.
In addition, the multiplexer 137 is controlled by signal 381 from the processor core 125 to select a particular BN for output 418. For example, when an exception occurs in the processor core 125, the multiplexer 137, under control of signal 381, selects the BN from the exception handler address register 139 as output 418; when the processor core 125 returns from a function call, the multiplexer 137, under control of signal 381, selects the BN from the stack 135 as output 418; when the processor core 125 takes a branch (signal 381 acting as a branch-taken signal), the multiplexer 137, under control of signal 381, selects the BN from the track table 126 as output 418; and when the processor core 125 does not take a branch or performs other ordinary operations, the multiplexer 137, under control of signal 381, selects BN 165, i.e., the BN with BNX 152 unchanged and BNY incremented by one in the incrementer 136, as output 418.
The bus or output 418 from the multiplexer 137 (i.e., the next BN) may be stored into register 138 under control of signal 417 from the processor core 125 and used to update the BN 151 output by the tracker. When signal 417 directs register 138 to hold the current BN 151, register 138 does not pass output 418 through. When signal 417 directs register 138 to update the current BN 151, output 418 is placed on bus 151 and becomes the current BN 151, thereby updating BNX 152 and BNY 156.
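The selection and update logic of the tracker can be summarized in a short sketch. It is a behavioral model only: the `Select` enumeration stands in for the encoding of signal 381, and the `update` flag stands in for signal 417; neither encoding is specified in the text, so both are assumptions, as are the function names.

```python
from enum import Enum


class Select(Enum):
    """Assumed encoding of signal 381 (hypothetical, for illustration)."""
    EXCEPTION = 0
    RETURN = 1
    BRANCH_TAKEN = 2
    SEQUENTIAL = 3


def next_bn(current, select, stack_bn, table_bn, exception_bn):
    """Model of multiplexer 137: choose the next BN as a (BNX, BNY) pair."""
    bnx, bny = current
    if select is Select.EXCEPTION:
        return exception_bn       # from exception handler address register 139
    if select is Select.RETURN:
        return stack_bn           # from stack 135
    if select is Select.BRANCH_TAKEN:
        return table_bn           # from track table 126
    return (bnx, bny + 1)         # BNX unchanged, BNY incremented


def clock_register(current, candidate, update):
    """Model of register 138: hold the current BN unless told to update."""
    return candidate if update else current
```

With a current BN of (410, 8), the sequential case yields (410, 9), while a taken branch whose track table entry holds (411, 2) yields (411, 2).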
The BN 151 provided by the tracker 170 comprises BNX 152 and BNY 156. BNX 152 is used to address the instruction segment, and the processor core 125 uses the offset of the PC to fetch the instruction to be executed. BNX 152 and BNY 156 are also sent to the track table 126 so that the track table 126 can place the next BN on bus 150.
As shown in FIG. 7B, to illustrate the interaction between the track table 126 and the tracker 170, assume that the track table 126 contains tracks (i.e., rows) 410, 411, and 412. Each track may contain 16 entries, or track points, numbered 0 through 15. Further, track point 413 (entry 8 of track 410) may be a branch point whose branch target is track point 414 (entry 2 of track 411), and track point 415 (entry 14 of track 411) may be another branch point whose branch target is track point 416 (entry 5 of track 412).
Assume that the instruction segment corresponding to track 410 has already been filled into the higher-level memory 124 and that the processor core 125 starts executing instructions from the beginning of track 410. That is, the program counter (PC) of the processor core 125 starts from the instruction address corresponding to entry 0 of track 410.
Assume also that the tracker 170 outputs a read pointer 151, comprising BNX and BNY, that points to entry 0 of track 410 in the track table 126. Other entries of track 410 may also be used. The type information, address information, and so on of an instruction can be determined by examining the contents of its entry.
As described above, when operation starts from entry 0 of track 410, entry 0 is not a branch point, so the tracker 170 keeps BNX 152 unchanged and increments BNY by one in the incrementer 136, obtaining the next BN, which corresponds to the next entry of track 410 in the track table 126. The tracker 170 keeps incrementing BNY, moving to successive entries of track 410, until it reaches a branch point such as track point 413 (entry 8 of track 410). During this process BNX does not change, so the instruction segment address does not change, and the processor core 125 can keep fetching instructions from the higher-level memory 124 using the offset of the PC.
When the tracker 170 reaches track point 413 (entry 8 of track 410), both the source address and the target address are analyzed, because track point 413 is a branch point. If the instruction segment containing the instruction that follows the branch point's source address and/or the instruction segment containing the target address has not yet been filled into the higher-level memory 124, those instruction segments, which may be executed by the processor core 125, are filled into the higher-level memory 124.
In some cases, because the entries of the active table 121 are established when the track table row is established, the instruction segment containing the instruction after the source address and the instruction segment containing the target address may already have been filled into the higher-level memory 124 by the time the tracker 170 reaches track point 413. In this example, the next instruction is entry 9 of track 410, and the instruction segment corresponding to track 410 has already been filled into the higher-level memory 124, so no fill is needed for the instruction following track point 413. Moreover, because track point 414 has already been established in the track table 126 and the active table 121, the instruction segment corresponding to track 411, which contains the branch target (entry 2 of track 411), has already been filled into the higher-level memory 124.
Because the processor core 125 executes instructions more slowly than the tracker 170 moves along the corresponding track points, the tracker 170 may wait at the branch point for the processor core 125, or synchronize with it. In addition, the track table 126 may place the branch target on bus 150 as the next BN (entry 2 of track 411), i.e., BNX is 411 and BNY is 2, and signal 381 may indicate whether the branch is taken when the processor core 125 executes the branch instruction at track point 413.
As shown in this embodiment, when the branch is taken, the tracker 170 uses the next BN supplied by the track table over bus 150 as BN 151, i.e., BNX points to track 411 and BNY points to entry 2 of track 411. BNX is also used to address the corresponding instruction segment in the higher-level memory 124, so that the processor core 125 can start executing from the instruction corresponding to entry 2 of track 411. If the branch is not taken, however, the branch point is simply treated as a non-branch point and the tracker 170 moves forward.
Similarly, starting from entry 2 of track 411, the tracker 170 finds the next branch track point 415 (entry 14 of track 411), whose branch target is track point 416 (entry 5 of track 412). Track point 415 is handled in the same way as described above for track point 413. If the branch at track point 415 (entry 14 of track 411) is taken, the processor core 125 starts executing from track point 416. If the branch at track point 415 is not taken, the tracker 170 moves to entry 15 of track 411, the last entry of that track.
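The walk over tracks 410, 411, and 412 described above can be reproduced with a small simulation. The data layout is an assumption made for illustration: each track is a list of 16 entries, `None` marks a non-branch entry, and a branch point stores its target as a (BNX, BNY) pair. The function `walk`, its `branch_taken` callback (standing in for the branch decision from the processor core), and the `max_steps` guard are hypothetical, and end-of-track handling is omitted.

```python
def walk(track_table, start, branch_taken, max_steps=1000):
    """Follow BN values through the track table.

    Returns the branch points encountered and the BNX of the final track.
    """
    bnx, bny = start
    branches = []
    for _ in range(max_steps):          # guard against endless branch loops
        if bny >= len(track_table[bnx]):
            break                       # ran off the end of the track
        target = track_table[bnx][bny]
        if target is not None:
            branches.append((bnx, bny))
            if branch_taken((bnx, bny)):
                bnx, bny = target       # BN from the track table
                continue
        bny += 1                        # BNX unchanged, BNY incremented
    return branches, bnx


# Tracks from the example: entry 8 of 410 -> entry 2 of 411,
# entry 14 of 411 -> entry 5 of 412.
table = {n: [None] * 16 for n in (410, 411, 412)}
table[410][8] = (411, 2)
table[411][14] = (412, 5)
```

Calling `walk(table, (410, 0), lambda bn: True)` visits branch points (410, 8) and (411, 14) and ends in track 412, matching the sequence in the text; with a callback that always returns `False`, the walk stays in track 410.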
When an entry is not a branch point but is the last instruction of its track, operation continues from the track point corresponding to the next instruction, which lies in the next track; the tracker 170 then keeps BNX 152 unchanged and repeatedly increments BNY 156 by one (1), generating new BNY values until a BNY points to the first branch point of the new track.
In this way, the track table 126 can be built before the processor core 125 actually executes the instructions, so that instructions can be filled into the higher-level memory 124 ahead of time, avoiding or reducing the latency caused by cache misses. Other mechanisms, such as increasing the operating speed of the track table, increasing the granularity of BNY, or reducing the number of track table entries by letting one entry represent several instructions, may be applied to the above embodiments individually or in combination.
The cache miss rate can be improved further by using multiple branch levels in the track table 126. For example, when entries are read from a row of the track table 126, a branch track point is found, and the instruction segment containing the branch target instruction of that branch track point is filled into the higher-level memory 124. At the same time, a new track (level one) is established in the track table 126. The new track is then examined as well: its first branch track point is found, and the instruction segment containing the branch target instruction of that branch track point is filled into the higher-level memory 124. Another new track (level two) is thereby established in the track table 126. In this way, two levels of branch points are used to fill the higher-level memory 124, and the fill operations are hidden still further from the processor core 125. Level-two tracks may also be established for every possible execution outcome of the level-one track. In that case the level-two tracks are established not only from the first branch point in the new track of the branch target instruction of the current branch point, but also from the first branch point in the new track corresponding to the instruction that follows the current branch point.
In addition, a variable number of track levels, one or more, may be established according to the distance from the current program counter (PC). The distance may be expressed as the number of instructions ahead of the instruction currently being executed by the processor core 125. That is, however many levels of tracks are needed to keep the filled instructions ahead of the executing instruction by a preset amount, tracks may be established so as to fill instruction segments that contain at least all of the instructions covered by that distance. The distance may also be expressed as a distance from the current branch point. That is, however many levels of tracks are needed to keep the filled instructions ahead of the executing instruction by a preset amount and to hide the fill latency, tracks may be established so as to fill instruction segments that contain at least the instructions within that distance of the branch point. Other parameters may also be used.
Furthermore, in some examples, a plurality of memory blocks (e.g., instruction segments and data segments) may be filled into the higher-level memory 124 at the same time. When such a plurality of instruction or data segments is filled, each segment may be divided into several sub-segments, and a priority may be assigned to each sub-segment, so that a whole segment need not be filled in one pass. The priorities may be set according to how soon the processor core 125 needs each sub-segment, and the sub-segments of different segments may be filled in an interleaved manner according to their priorities.
For example, if an instruction segment is 256 words (1024 bytes) long, it may be divided into four sub-segments of 64 words (256 bytes) each. For an instruction segment starting at address 0x1FC00000, the four sub-segments then start at 0x1FC00000, 0x1FC00100, 0x1FC00200, and 0x1FC00300. If the instruction needed by the processor core 125 lies in the second sub-segment, 0x1FC00100, that sub-segment may be given high priority, and the fill order for the segment may be 0x1FC00100, 0x1FC00200, 0x1FC00300, and 0x1FC00000. If, while the segment starting at 0x1FC00000 is being filled, a second instruction segment starting at 0x90000000 also needs to be filled, the second segment may likewise be divided into four sub-segments starting at 0x90000000, 0x90000100, 0x90000200, and 0x90000300. If the instruction needed by the processor core 125 lies in the fourth sub-segment (0x90000300), that sub-segment may be given high priority, and the overall fill order may interleave the two segments as 0x1FC00100, 0x90000300, 0x1FC00200, 0x90000000, 0x1FC00300, 0x90000100, 0x1FC00000, and 0x90000200. More segments and sub-segments may be used to fill the higher-level memory 124, and other configurations are possible. Although the above embodiment describes filling instruction segments, data segments may be filled in a similar way, and instruction segments and data segments may also be filled together in interleaved sub-segments.
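The fill order in this example can be reproduced with a few lines of code. The sketch assumes that sub-segments are filled starting with the one the core needs and then wrapping around in address order, and that segments are interleaved round-robin; both rules are read off the example above rather than stated as a general algorithm, and the function names are hypothetical.

```python
def sub_segment_order(base, piece_bytes, num_pieces, needed_index):
    """Start addresses of the sub-segments, needed one first, then wrapping."""
    return [base + ((needed_index + i) % num_pieces) * piece_bytes
            for i in range(num_pieces)]


def interleave(*orders):
    """Round-robin merge of several equally long fill orders."""
    merged = []
    for group in zip(*orders):
        merged.extend(group)
    return merged


first = sub_segment_order(0x1FC00000, 0x100, 4, needed_index=1)
second = sub_segment_order(0x90000000, 0x100, 4, needed_index=3)
fill_order = interleave(first, second)
```

Here `fill_order` comes out as 0x1FC00100, 0x90000300, 0x1FC00200, 0x90000000, 0x1FC00300, 0x90000100, 0x1FC00000, 0x90000200, which is the interleaved order given in the text.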
FIG. 10A shows another embodiment 9000 of the cache system of the present invention. Cache system 9000 is similar to cache system 8000 of FIG. 7A. As shown in FIG. 10A, however, cache system 9000 includes a switch 133, and the allocator 1200 of cache system 9000 includes a reserved table 120 in addition to an active table 121.
The reserved table is similar to the active table and, together with the active table, stores the track information of all branch instructions in the program, thereby reducing the required capacity of the active table 121 and of the level-one cache. More specifically, when a track containing a branch point has been established, the branch target of that branch point may be stored in the reserved table. The branch target track is then established from the information stored in the reserved table only when the execution flow approaches the branch point.
In some examples, the active table stores tracks that have been established (e.g., those corresponding to instruction segments already filled into the higher-level memory 124), while the reserved table stores tracks yet to be established (e.g., those corresponding to instruction segments not yet filled into the higher-level memory 124). Thus, when a track is established, a track point may refer to an entry of the active table (i.e., a BN) or to an entry of the reserved table (i.e., a TBN). "TBN" as used here means "temporary block number" or "temporary BN" and denotes a number in a number space different from that of BN; the number space used by the reserved table is therefore distinct from the number space used by the active table, so that a TBN can be told apart from a BN. For example, the most significant bit of the number may be used to distinguish a TBN from a BN. When a track point (such as a branch point) contains a BN, the instruction segment containing the branch target instruction has already been filled into the higher-level memory 124. When a track point contains a TBN, the instruction segment containing the branch target instruction has not yet been filled into the higher-level memory 124. Thus, when a track contains several branch points, some of which may never be reached, using TBN instead of BN reduces the amount of memory filling and saves level-one cache space.
A reserved table can therefore be used to improve system performance and reduce storage capacity requirements. FIG. 12 shows an embodiment in which new tracks are established using the track table 126, the reserved table 120, and the active table 121.
如圖12所示,已建立的軌道66(BNX0)可以包含三個分支點67、68和69。為了便於描述,BNX號碼被用於標記軌道表126中的軌道或行。當審查分支點67時,分支點67的目標指令的位址被存儲到保留表120的表項73(標記為TBNX0)中,且表項73的號碼(即TBNX0)被作為第一位址存儲在分支點67中。當審查到分支點68和分支點69時,分支點68和分支點69的目標指令的位址也被存儲到保留表120中(被標記為TBNX1和TBNX2)。類似地,這兩個表項的號碼作為第一位址被分別存儲到分支點68和69中。As shown in FIG. 12, the established track 66 (BNX0) may include three branch points 67, 68, and 69. For ease of description, the BNX number is used to mark tracks or lines in the track table 126. When the branch point 67 is examined, the address of the target instruction of the branch point 67 is stored in the entry 73 of the reservation table 120 (labeled as TBNX0), and the number of the entry 73 (ie, TBNX0) is stored as the first address. In branch point 67. When the branch point 68 and the branch point 69 are examined, the addresses of the target instructions of the branch point 68 and the branch point 69 are also stored in the reservation table 120 (labeled as TBNX1 and TBNX2). Similarly, the numbers of the two entries are stored as the first address in branch points 68 and 69, respectively.
此外,當處理器核125即將執行分支指令67時,保留表120的表項73中的目標位址被轉移到主動表121的表項74中。在某些實施例中,主動表121的表項總數等於軌道表126的總行數,使得主動表121的表項與軌道表126的行能有建立一個一一對應的關係。這樣,基於對應關係75,可以根據主動表121中的對應表項(BNX1)在軌道表126中建立一條包含分支點67的分支目標新軌道70。分支點67中的TBNX0號碼也被替換為BNX1,使得下次這條指令將被執行時,所述BNX1可以在不用到保留表的情況下直接索引到目標軌道以及相應的記憶塊。Further, when the processor core 125 is about to execute the branch instruction 67, the target address in the entry 73 of the reservation table 120 is transferred to the entry 74 of the active table 121. In some embodiments, the total number of entries in the active table 121 is equal to the total number of rows in the track table 126, such that the entries of the active table 121 can have a one-to-one correspondence with the rows of the track table 126. Thus, based on the correspondence 75, a branch target new track 70 including the branch point 67 can be established in the track table 126 according to the corresponding entry (BNX1) in the active table 121. The TBNX0 number in branch point 67 is also replaced with BNX1 so that the next time this instruction is to be executed, the BNX1 can directly index to the target track and the corresponding memory block without going to the reserved list.
因此,只有當分支指令即將被執行時才建立對應的新軌道。這樣,在分支點67被執行前,分支點68和69的目標位址被存儲到保留表120中,並不建立分支點68和69對應的新軌道。Therefore, the corresponding new track is only established when the branch instruction is about to be executed. Thus, before the branch point 67 is executed, the target addresses of the branch points 68 and 69 are stored in the reservation table 120, and new tracks corresponding to the branch points 68 and 69 are not established.
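The deferred track creation of FIG. 12 can be sketched as below, under simplifying assumptions: the tables are plain dictionaries, entry numbers come from counters, and every function name is hypothetical. It illustrates the ordering of events only, not the hardware.

```python
import itertools

reserved = {}   # TBNX -> branch target address (segment not yet filled)
active = {}     # BNX -> segment address (segment filled)
tracks = {}     # BNX -> list of track points, one row per active entry
_tbnx_numbers = itertools.count()
_bnx_numbers = itertools.count()

def create_track(segment_address):
    """Allocate an active-table entry and the matching track-table row."""
    bnx = next(_bnx_numbers)
    active[bnx] = segment_address
    tracks[bnx] = []
    return bnx

def examine_branch(point, target_address):
    """Examining a branch point only records its target in the reserved table."""
    tbnx = next(_tbnx_numbers)
    reserved[tbnx] = target_address
    point["target"] = ("TBNX", tbnx)

def about_to_execute(point):
    """Promote the target to the active table just before the branch executes."""
    kind, number = point["target"]
    if kind == "TBNX":
        bnx = create_track(reserved.pop(number))
        point["target"] = ("BNX", bnx)  # later executions index the track directly
    return point["target"]
```

Examining three branch points creates three reserved entries but no tracks; only the branch that is about to execute gets a new track and has its TBNX rewritten to a BNX.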
Returning to FIG. 10A, when the processor core 125 is powered on, a reset signal (not shown) sets the valid bit of every entry in the active table 121 to '0'. When the processor core's reset signal is released, the reset vector (the instruction address at which execution starts after reset) is placed on bus 141. Because no match is found in either the reserved table 120 or the active table 121, the reserved table 120 places the address on bus 144 and sends it to the fill engine 132, which fetches the instruction segment (e.g., the segment at the reset vector) from the lower-level memory 122 over bus 154.
Pointer 129 points through bus 153 to the current entry of the active table 121, and at the same time points to an instruction in the higher-level memory 124 or to the memory block that stores the fetched instruction segment.
The generator 130 extracts track information about the instructions in the instruction segment and writes it to the corresponding entry of the track table 126, which pointer 129 addresses through address bus 153. When the fill operation completes, the valid bit of the current entry of the active table 121 is set to '1'. Pointer 129 then moves to the next valid entry of the active table 121.
The generator 130 scans every instruction in the instruction block being filled into the higher-level memory 124. When the generator 130 finds a branch instruction, it computes that instruction's target address. The target address can be expressed as the start address of the instruction segment containing the branch instruction (the source segment address), plus the offset of the branch instruction from that start, plus the branch distance from the source instruction to the target instruction (usually the branch offset). The upper bits of the computed target address are matched against the contents of the reserved table 120 and the active table 121.
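The target-address arithmetic can be illustrated with a short sketch. The 64-byte segment size (6 offset bits) and the function names are assumptions made for the example; the description does not fix a segment size at this point.

```python
# Assumption: 64-byte instruction segments, so the low 6 bits are the offset.
OFFSET_BITS = 6

def branch_target(segment_start: int, offset_in_segment: int, branch_offset: int) -> int:
    """Source segment address + position of the branch + signed branch offset."""
    return segment_start + offset_in_segment + branch_offset

def split_target(target: int):
    """Split a target into the upper bits (matched against the tables) and the offset."""
    upper = target >> OFFSET_BITS
    lower = target & ((1 << OFFSET_BITS) - 1)
    return upper, lower
```

For instance, a branch at offset 0x20 of the segment starting at 0x4000 with a branch offset of -0x2058 targets 0x1FC8; its upper bits identify the segment starting at 0x1FC0 and its lower bits give offset 8.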
If no match is found in either the reserved table 120 or the active table 121, the upper address bits are written to the entry of the reserved table 120 indicated by pointer 127. At the same time, the value of pointer 127 and the lower bits of the target address (the target offset), which together form a TBN, are placed in the entry of the track table 126 indicated by bus 153 (the row address of the branch source) and bus 143 (the offset address of the branch source). Bus 143 can supply a column address corresponding to the offset of the branch instruction within its instruction segment.
If there is a match in the reserved table 120, the value of pointer 127 pointing to the matching entry is written together with the target offset, as a TBN, into the entry of the track table 126 determined by bus 153 (row address) and bus 143 (offset). If there is a match in the active table 121, the number of the matching active-table entry is written together with the offset, as a BN, into the entry of the track table 126 indicated by bus 153 (row address) and bus 143 (offset). An instruction whose target address appears as a TBN has not yet been filled into the higher-level memory 124, whereas an instruction whose target address appears as a BN has already been filled into it.
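The three matching outcomes (hit in the active table, hit in the reserved table, miss in both) can be sketched as follows. The tables are modeled as Python lists of upper address bits, `free_slot` stands in for pointer 127, and the function name is hypothetical; a real implementation would match associatively in hardware.

```python
def record_branch_target(upper, offset, reserved, active, free_slot):
    """Return the (kind, entry number, offset) tuple to store in the track table."""
    if upper in active:
        # Segment already filled: the track point holds a BN.
        return ("BN", active.index(upper), offset)
    if upper in reserved:
        # Known but not yet filled: reuse the existing reserved entry as a TBN.
        return ("TBN", reserved.index(upper), offset)
    # No match anywhere: allocate the reserved entry indicated by the pointer.
    reserved[free_slot] = upper
    return ("TBN", free_slot, offset)
```

Note that a miss never triggers a fill here; it only allocates a reserved entry, which matches the deferred-fill behavior described above.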
This process repeats until the entire instruction segment has been fetched and filled into the higher-level memory 124. The reserved table 120, the active table 121, and the track table 126 then hold the information about the instruction segment, and the higher-level memory 124 holds the whole segment for execution by the processor core 125. The active table 121 holds the start (segment) address of the instruction segment for matching against later segments, and the track table 126 holds all branch points in the segment together with their target TBN or BN values.
When the tracker 170 outputs BN 151 to select an entry of the track table 126, the content of that entry is read out through read port 161. If the content indicates that the entry is not a branch point, subsequent operation is the same as in the embodiment of FIG. 7A. If the content indicates a branch point, the branch target address (a BN or a TBN) is read out and sent to the exchanger 133.
Because a branch target address may correspond to an entry in the reserved table 120 (a TBN) or to an entry in the active table 121 (a BN), the exchanger 133 can be used to exchange entries between the reserved table 120 and the active table 121. The exchanger 133 sends a TBN over bus 180 to the reserved table 120 to start filling the memory block from the lower-level memory into the higher-level memory 124, and outputs a BN once the exchange completes. This prefilling ensures that instructions are already in the higher-level memory 124 when the processor needs to execute them.
As shown in FIG. 13, the exchanger 133 contains a TBNX table 190 and a BNX table 191. The entries of the TBNX table 190 correspond to the entries of the active table 121 and can be used to map entries that have been moved from the active table 121 to the reserved table 120. Each entry of the TBNX table 190 may contain the entry number of the corresponding entry in the reserved table 120 and a flag bit, the G bit.
The entries of the BNX table 191 correspond to the entries of the reserved table 120 and can be used to map entries that have been moved from the reserved table 120 to the active table 121. Each entry of the BNX table 191 may contain the entry number of the corresponding entry in the active table 121 (i.e., a BN) and a valid bit.
In addition, the track information output from the track table 126 onto bus 150 may also contain a G bit 192 that corresponds to the G bit in the TBNX table 190. It indicates whether the BNX value is currently present in the active table, in which case the BNX value can be output directly; otherwise a mapping is required.
When an entry of the active table 121 is moved to the reserved table 120, the corresponding entry of the TBNX table 190 is used to record the entry number (BN) 172. Similarly, when an entry of the reserved table 120 is moved to the active table 121, the corresponding entry of the BNX table 191 records that entry's number, and its valid bit is set to valid.
When the track point information on bus 150 contains an entry number of the reserved table 120, that entry number, TBNX, is used as an index to read a BNX value and a valid bit from the BNX table 191. If the BNX value is valid (i.e., the valid bit is set), it is output as the BNX of the next BN 166 and sent to the tracker. If the BNX value is invalid, the TBNX is used as an index to read the reserved table 120 over bus 180, which starts filling the memory block corresponding to that TBNX from the lower-level memory 122 into the higher-level memory 124.
When the track point information on bus 150 contains an entry number of the active table 121 (i.e., a BN), and the G bit in the track point information equals the G bit of the corresponding entry in the TBNX table 190, the BNX value is output as the BNX of the next BN 166. If the two G bits differ, the reserved-table entry number is read from the TBNX table 190 and used as an index to read a BNX value and a valid bit from the corresponding row of the BNX table 191. If that BNX value is valid, it is output as the BNX of the next BN 166. If it is invalid, the reserved-table entry number is used as an index to read the reserved table 120 over bus 180.
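The two lookup paths through the exchanger can be summarized in a small decision function. This is a sketch under assumptions: the TBNX table 190 and BNX table 191 are modeled as dictionaries of small records, the field names (`tbnx`, `g`, `bnx`, `valid`) and the `("FILL", tbnx)` return convention are invented for illustration.

```python
def resolve(kind, number, g_bit, tbnx_table, bnx_table):
    """Return ("BNX", n) if the block is present, else ("FILL", tbnx)."""
    if kind == "BNX":
        entry = tbnx_table[number]
        if entry["g"] == g_bit:
            # G bits agree: the BNX still refers to a live active-table entry.
            return ("BNX", number)
        # G bits differ: the entry was moved, continue with its reserved number.
        number = entry["tbnx"]
    mapped = bnx_table[number]
    if mapped["valid"]:
        # The reserved entry has since been promoted to the active table.
        return ("BNX", mapped["bnx"])
    # Not present: the reserved table must be read and a fill started.
    return ("FILL", number)
```

Either path ends the same way: a valid mapping yields a BNX for the next BN, and an invalid one falls back to the reserved table and a fill.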
Thus, as long as there are valid entries in the TBNX table 190 and the BNX table 191, a replacement module 193 keeps scanning the track table 126, reading track point information from bus 159. If the track point information of a track point contains an entry number of the active table 121, and that entry number corresponds to a valid entry in the TBNX table 190, the reserved-table entry number is output on bus 158 and the track point information is changed to that reserved-table entry number. Similarly, if the track point information contains an entry number of the reserved table 120, and that entry number corresponds to a valid entry in the BNX table 191, the active-table entry number is output on bus 158 and the track point information is changed to that active-table entry number.
Scanning the entire track table accomplishes the exchange between the entries in the TBNX table 190 and those in the BNX table 191. Such an exchange can take place at various times. For example, a full active table means that the higher-level memory 124 is also full. Some memory blocks in the higher-level memory 124 can then be replaced, along with the corresponding active-table entries. The replaced active-table entries can be moved to the reserved table, and the BNX references in the track table must be exchanged for the new TBNX references. Once the exchange completes, the earlier entries in the TBNX table 190 and the BNX table 191 can be marked invalid.
Returning to FIG. 10A, when the branch point content has been sent to the exchanger 133 and the exchanger 133 has completed the corresponding operation on the track table read port 161, the higher-level memory 124 already holds the instruction segment containing the branch target instruction, and the resulting BN is output directly to the tracker 170. Subsequent operation is similar to that of FIG. 7A. FIG. 10B shows part of the cache system 9000, illustrating an embodiment in which the reserved table 120 and the active table 121 are used in operating the track table 126, the higher-level memory 124, and the processor core 125.
As shown in FIG. 10B, and as in FIG. 7B, the tracker 170 includes an incrementer 136, a multiplexer 137, and a register 138. Other components are omitted for ease of description. In operation, the tracker 170 outputs an address (BN 151) that addresses the track table 126 and the higher-level memory 124. BN 151 consists of BNX 152 and BNY 156. BNX 152 addresses a row, or track, in the track table 126 and a memory block in the higher-level memory 124, while BNY 156 addresses an entry within the track or row that BNX 152 points to.
The multiplexer 137 is controlled by signal 381 from the processor core 125 and selects either the next BN 166 from the exchanger 133 or the BN from the incrementer 136 as output 418. Output 418 of the multiplexer 137 (the next BN) can be stored in register 138 under control of signal 417 from the processor core 125. When signal 417 directs register 138 to hold the current BN 151, register 138 is not updated with output 418. When signal 417 directs register 138 to update the current BN 151, output 418 is driven onto bus 151 as the current BN 151, updating BNX 152 and BNY 156.
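One step of the tracker datapath (incrementer 136, multiplexer 137, register 138) can be modeled as a pure function. The tuple representation of a BN as `(BNX, BNY)` and the parameter names are assumptions for illustration.

```python
def tracker_step(current, branch_target, branch_taken, update):
    """Compute the BN held in register 138 after one step.

    current       -- (BNX, BNY) currently on bus 151
    branch_target -- next BN 166 supplied by the exchanger
    branch_taken  -- models signal 381 (selects the multiplexer input)
    update        -- models signal 417 (lets the register latch the new value)
    """
    bnx, bny = current
    sequential = (bnx, bny + 1)                                # incrementer 136
    selected = branch_target if branch_taken else sequential  # multiplexer 137
    return selected if update else current                    # register 138
```

For example, `tracker_step((410, 8), (411, 2), True, True)` moves to the branch target, while passing `update=False` leaves the current BN unchanged regardless of the multiplexer selection.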
To describe the interaction between the track table 126 and the tracker 170, assume, as in FIG. 7B, that the track table 126 contains tracks (rows) 410, 411, and 412. Each track may contain 16 entries, or track points, numbered 0 to 15. Track point 413 (entry 8 of track 410) may be a branch point whose branch target is track point 414 (entry 2 of track 411), and track point 415 (entry 14 of track 411) may be another branch point whose branch target is track point 416 (entry 5 of track 412).
Assume that the instruction segment corresponding to track 410 has already been filled into the higher-level memory 124 and that the processor core 125 starts executing from the beginning of track 410. That is, the program counter (PC) of the processor core 125 starts at the instruction address corresponding to entry 0 of track 410.
Assume also that the tracker 170 issues a read pointer 151, consisting of BNX and BNY, that points to entry 0 of track 410 in the track table 126. Other entries of track 410 could be used as well. The instruction type, address information, and so on can be determined by examining the content of the entry.
As described earlier, starting from entry 0 of track 410, which is not a branch point, the tracker 170 keeps BNX 152 unchanged and increments BNY by one through the incrementer 136, yielding the next BN, which corresponds to the next entry of track 410 in the track table 126. The tracker 170 keeps incrementing BNY, moving to the next entry of track 410, until it reaches a branch point such as track point 413 (entry 8 of track 410). Because BNX does not change during this process, the instruction segment address does not change either, and the processor core 125 can keep fetching instructions from the higher-level memory 124 using the PC offset.
When the tracker 170 reaches track point 413 (entry 8 of track 410), which is a branch point, both the source address and the entry content, i.e., the target address, are analyzed. The exchanger 133 checks whether the target address is in BN form or TBN form. If it is a BN, the corresponding instruction segment has already been filled into the higher-level memory 124 and is ready to be read by the processor core 125. If it is a TBN, the corresponding instruction segment has not yet been filled into the higher-level memory 124, and it is filled in. As described earlier, the exchanger 133 converts the TBNX to a BNX and sets BNY to the value of TBNY. The exchanger 133 thus provides a BN, which is sent out as the next BN 166, whether the entry content is a BN or a TBN.
Further, if the instruction segment containing the instruction that follows the source address has not yet been filled into the higher-level memory 124, that segment is also filled in for possible execution by the processor core 125. For track point 413, however, the next instruction is entry 9 of track 410, and the instruction segment corresponding to track 410 is already in the higher-level memory 124, so no fill is needed for it. Only the instruction segment corresponding to track 411, which contains the branch target (entry 2 of track 411), is filled into the higher-level memory 124, and only if it is not already there.
Because the tracker 170 moves along the track points faster than the processor core 125 executes instructions, both instruction segments that the processor core 125 might execute can be filled into the higher-level memory 124 before the processor core 125 executes any instruction from either of them, so no cache miss occurs. BNY 156 can be regarded as part of a look-ahead pointer (BNX stays the same within a track) that fills the instructions the processor core 125 might execute into the higher-level memory 124 before the processor core 125 executes them.
Because a TBNX in the reserved table 120 does not automatically trigger a fill of the higher-level memory 124, a large number of tracks, or entries in the track table 126, can be generated in a short time. An instruction (such as a branch target instruction) is filled into the higher-level memory 124 only when the execution flow approaches it.
Further, when track point 413 is reached and the relevant instruction segments are already in the higher-level memory 124, the track table 126 or the exchanger 133 can provide the branch target, BNX 411 and BNY 2 (entry 2 of track 411), as the next BN 166, and the tracker 170 can wait for signal 381, which the processor core 125 sends when it executes the branch instruction of track point 413 to indicate whether the branch is taken.
If the branch is taken, as in this embodiment, the track table 126 or the exchanger 133 supplies the next BN 166 as BN 151 for the tracker 170, with BNX pointing to track 411 and BNY pointing to entry 2 of track 411. BNX is also used as the address of the corresponding instruction segment in the higher-level memory 124, so that the processor core 125 can execute from entry 2 of track 411. If the branch is not taken, the tracker 170, or look-ahead pointer, moves forward as though the branch point were an ordinary non-branch point.
Similarly, starting from entry 2 of track 411, the tracker 170 finds the next branch track point 415 (entry 14 of track 411), whose branch target is track point 416 (entry 5 of track 412). If the instruction segment corresponding to track 412 has not yet been filled into the higher-level memory 124, it is filled in, and, as above, the look-ahead pointer waits for the branch instruction at track point 415 to execute.
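The walk through tracks 410, 411, and 412 can be replayed with a small simulation. The dictionary of branch points mirrors the example above; the function names and the way branch outcomes are supplied as a list are assumptions made for illustration.

```python
TRACK_LENGTH = 16  # entries 0..15 per track, as in the example

# Branch points from the example: source (BNX, BNY) -> target (BNX, BNY).
branches = {(410, 8): (411, 2), (411, 14): (412, 5)}

def advance_to_branch(bn):
    """Increment BNY (BNX unchanged) until a branch point or the track end."""
    bnx, bny = bn
    while (bnx, bny) not in branches and bny < TRACK_LENGTH - 1:
        bny += 1
    return (bnx, bny)

def walk(start, outcomes):
    """Follow the look-ahead pointer; outcomes lists taken/not-taken per branch."""
    stops, bn = [], start
    for taken in outcomes:
        bn = advance_to_branch(bn)
        if bn not in branches:
            break  # reached the end of a track without finding a branch
        stops.append(bn)
        bn = branches[bn] if taken else (bn[0], bn[1] + 1)
    return stops, bn
```

Starting at entry 0 of track 410 with both branches taken, the pointer stops at (410, 8) and (411, 14) and ends at (412, 5).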
The discussion above is based on one-level track operation. That is, the look-ahead pointer stops at the first branch point, and fills are performed for the two possible outcomes of that branch point. The track table 126 can also support two-level or multi-level track operation. For example, in two-level track operation the look-ahead pointer can stop at the first branch point after the first branch point, so that the instructions corresponding to the four possible outcomes of the two branch points are all filled into the higher-level memory 124. Multi-level track operation can likewise fill still more instructions.
It should be understood that although the reserved table 120, the active table 121, and the exchanger 133 are used to fill the higher-level memory 124 more flexibly and efficiently, a single table or another structure can also be used, as described earlier.
Returning again to FIG. 10A, during operation more tracks may be added to the track table 126 and the corresponding instructions filled into the higher-level memory 124. The capacity of the track table 126 and/or the higher-level memory 124 is limited, however, so a replacement mechanism is needed to replace tracks in the track table 126 and/or instruction segments in the higher-level memory 124. For example, a replacement mechanism based on the active table 121, the reserved table 120, and the track table 126 can be used. In particular, the entries of the active table 121 that may be replaced can be identified.
Suppose the TBNX value '118', sent from the track table 126 over bus 180, is used to fill an instruction segment into the higher-level memory 124; the segment address 0x1FC0 corresponding to TBNX '118' is stored in the reserved table 120; and the entry of the active table 121 pointed to by bus 153 holds BNX value '006', whose segment address is 0x4000. Address 0x1FC0 is then read from the reserved table 120 onto bus 144 to replace address 0x4000 in the active table 121, and is also sent to the fill engine 132 so that the instruction segment starting at 0x1FC0 is filled into the higher-level memory 124 in place of the segment starting at 0x4000. In addition, the entry with BNX value '006', corresponding to address 0x4000, is moved to the entry of the reserved table 120 pointed to by pointer 127.
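The replacement example with addresses 0x1FC0 and 0x4000 can be traced with a minimal sketch. The dictionaries, the function name, and the choice of slot 119 for pointer 127 are assumptions; what happens to the old reserved entry 118 afterward is not stated in the text and is left untouched here.

```python
def replace_block(active, reserved, bnx, tbnx, reserved_slot):
    """Swap an incoming reserved address into the active table.

    Returns the address that would be handed to the fill engine.
    """
    evicted = active[bnx]              # e.g. 0x4000, the segment being replaced
    incoming = reserved[tbnx]          # e.g. 0x1FC0, the segment to be filled
    active[bnx] = incoming             # the active entry now names the new segment
    reserved[reserved_slot] = evicted  # the evicted address moves to the reserved table
    return incoming
```

With `active = {6: 0x4000}` and `reserved = {118: 0x1FC0}`, calling `replace_block(active, reserved, 6, 118, 119)` leaves BNX 6 mapped to 0x1FC0 and stores 0x4000 in reserved entry 119.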
A replacement policy can also be used to decide which entry or storage unit of the track table 126 should be replaced, for example a least recently used (LRU) policy or a least frequently used (LFU) policy. Under LRU, each track or track point contains a use bit (U bit); under LFU, each track or track point contains a counter that records the number of uses.
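Victim selection under the two policies might look like the sketch below. The text only says that a U bit or a use counter is kept per track; the clock-style sweep over U bits is an assumed approximation of LRU, and both function names are hypothetical.

```python
def pick_victim_u_bit(u_bits, hand):
    """Clock-style sweep (assumed): clear set U bits until a clear one is found."""
    n = len(u_bits)
    while u_bits[hand]:
        u_bits[hand] = 0        # give a recently used track a second chance
        hand = (hand + 1) % n
    return hand

def pick_victim_lfu(counters):
    """Least frequently used: the track with the smallest use counter."""
    return min(range(len(counters)), key=counters.__getitem__)
```

The sweep always terminates, because in the worst case it clears every U bit and returns to its starting position.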
In some cases, a cache structure with more than one level can be used. The allocator 1200 or the active table 121 can be used to support such a structure. FIG. 11 shows an embodiment of an allocator or reserved table for a multi-level cache structure.
本實施例以三層存儲層次為例,分別為三級,二級與一級。為了便於說明,假設這三層記憶體都被用做指令記憶體(資料記憶體也類似)。二級記憶體的容量是一級記憶體容量的兩倍(即,一個二級記憶體塊可以包含兩個一級記憶體塊),三級記憶體容量是二級記憶體容量的兩倍(即,一個三級記憶體塊可以包含兩個二級記憶體塊或四個一級記憶體塊)。一級記憶體作為高層次記憶體直接連接至處理器核125。對於更多存儲層次的情況,也可用應用本發明所述方法。In this embodiment, a three-layer storage hierarchy is taken as an example, which is three levels, two levels, and one level. For the sake of explanation, it is assumed that these three layers of memory are used as instruction memory (data memory is similar). The capacity of the secondary memory is twice the capacity of the primary memory (ie, one secondary memory block can contain two primary memory blocks), and the tertiary memory capacity is twice the secondary memory capacity (ie, A three-level memory block can contain two secondary memory blocks or four primary memory blocks). The primary memory is directly connected to the processor core 125 as a high level memory. For the case of more storage tiers, the method of the invention can also be applied.
此外,為便於描述,三級記憶體包含了二級記憶體和一級記憶體中的所有內容,但二級記憶體不一定包含一級記憶體中的內容。雖然沒有在圖中顯示,可以使用一個軌道表建立在這三層記憶體中的指令的軌道,且每個軌跡點(如分支點)可以用圖11中所示的兩種格式之一表示。一種格式包括兩個部分,從高位到低位元分別是一級記憶體索引位址的塊位址部分,和軌道內或記憶塊內的偏移量部分。另一種格式包括三個部分,分別是三級記憶體索引位址的塊位址部分、索引和偏移量部分。In addition, for convenience of description, the third-level memory contains all the contents of the secondary memory and the primary memory, but the secondary memory does not necessarily contain the content in the primary memory. Although not shown in the figure, a track table can be used to build the tracks of the instructions in the three-layer memory, and each track point (such as a branch point) can be represented by one of the two formats shown in FIG. One format consists of two parts, the high-order to low-order bits being the block address portion of the first-level memory index address, and the offset portion within the track or within the memory block. The other format consists of three parts, the block address part of the three-level memory index address, the index and the offset part.
As shown in Figure 11, allocator 1200 or reservation table 120 may include a content-addressable memory (CAM) 87 and a random access memory (RAM) 98. CAM 87 contains one column of entries, and each entry corresponds to one level-3 memory block number BNX3. Each entry can therefore hold the address of the level-3 memory block associated with a particular BNX3.
RAM 98 may contain six columns. Two columns 88 store the two level-2 memory block numbers BNX2, with valid bits, that correspond to a given level-3 memory block; the other four columns 89 store the four level-1 track numbers BNX1, with valid bits, that correspond to that level-3 memory block. Multiplexer 93 selects one particular level-1 memory block number (track number) for the level-3 memory block according to index bits 97. Similarly, multiplexer 92 selects one particular level-2 memory block number (track number) according to index bits 97, or more precisely according to the high-order bit LSB1 90 of index bits 97.
The table can be accessed in two ways. The first is to search CAM 87 with a memory address (for example, a level-3 memory block address); if an address matches, the matching CAM entry is selected and the corresponding contents of RAM 98 are read out. The second is to address CAM 87 and RAM 98 directly with the first address BNX3 94 of a level-3 memory block number (TBN) and read out the contents of the selected row of CAM 87 and/or RAM 98.
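The two access modes can be illustrated with a small software model. This is only a sketch: the class name `LookupTable`, its methods, and the use of `None` for an empty entry are assumptions made for illustration, and a hardware CAM compares all entries in parallel rather than in a loop.

```python
class LookupTable:
    """Toy model of a CAM column paired with a RAM row per entry (hypothetical names)."""

    def __init__(self, rows):
        self.cam = [None] * rows  # level-3 block address per BNX3; None means empty
        self.ram = [None] * rows  # RAM row (block numbers, valid bits) per BNX3

    def write(self, bnx3, block_addr, row):
        """Store a block address and its RAM row at entry BNX3."""
        self.cam[bnx3] = block_addr
        self.ram[bnx3] = row

    def search(self, block_addr):
        """Associative access: return (BNX3, RAM row) on a match, else None."""
        for bnx3, stored in enumerate(self.cam):
            if stored is not None and stored == block_addr:
                return bnx3, self.ram[bnx3]
        return None

    def read(self, bnx3):
        """Direct access: return the CAM and RAM contents of row BNX3."""
        return self.cam[bnx3], self.ram[bnx3]
```

Searching by address is what happens when a branch target is first examined; reading by BNX3 is what happens later, when a track point that already stores a TBN is resolved.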
As in the earlier examples, instructions are scanned and examined while an instruction segment is being filled from main memory or any external memory into the three levels of memory. When a branch instruction is detected, its branch target address is compared with the level-3 memory block addresses stored in CAM 87.
If no match is found, the instruction segment for the branch target address is not yet in the level-3 memory. In that case, a level-3 memory block is selected according to some criterion, such as a replacement policy, and the instruction segment containing the branch target is filled into that block. At the same time, the address information of the selected level-3 memory block is written, as track point content, into the entry for that branch point in the level-1 track table. The block number of the selected level-3 memory block serves as the first address BNX3 94, the index part of the memory address serves as index 97, and the offset part of the memory address serves as the offset (BNY) 96. Index 97 may contain 2 bits: the high-order bit LSB1 90 distinguishes the two memory blocks in the level-2 memory, and the high-order bit LSB1 together with the low-order bit LSB0 distinguishes the four memory blocks in the level-1 memory.
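The three-part address split can be sketched as simple bit slicing. The 2-bit index comes from the text; the 4-bit offset width and the function name `split_address` are assumptions chosen only to make the example concrete.

```python
OFFSET_BITS = 4  # assumed width of the offset (BNY) field; not given in the text
INDEX_BITS = 2   # index width stated in the text (LSB1, LSB0)


def split_address(addr):
    """Split a memory address into (block address, index, offset, LSB1, LSB0)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    block_addr = addr >> (OFFSET_BITS + INDEX_BITS)
    lsb1 = index >> 1  # selects one of the two level-2 blocks
    lsb0 = index & 1   # with LSB1, selects one of the four level-1 blocks
    return block_addr, index, offset, lsb1, lsb0
```

For example, under these assumed widths the address `0b1011_10_0101` splits into block address 11, index 2 (LSB1 = 1, LSB0 = 0), and offset 5.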
If a match is found, on the other hand, the required instruction block is present at least in the level-3 memory. The matched BNX3, together with the index and the offset, is then written into the track table entry as track point content.
During operation, when the look-ahead pointer reaches such a track table entry, the branch target address indicated by that entry (track point) is a level-3 TBN. The first address (94) of the TBN can be used to address CAM 87 and/or RAM 98.
Specifically, the first address 94 (BNX3) held in the level-1 track can be used to address RAM 98 and read out the corresponding two level-2 track numbers with their valid bits and the four level-1 track numbers with their valid bits. Multiplexer 93 selects a valid level-1 track number from the four level-1 block numbers according to index bits 97 (that is, LSB1 and LSB0) and the valid bits V. Multiplexer 92 selects a valid level-2 track number from the two level-2 block numbers according to the high-order index bit 90 (that is, LSB1) and the valid bits V.
If a valid level-1 track number is selected, the instruction segment for the target address has already been filled into the level-1 memory, and the valid level-1 track number is driven onto bus 99 to replace the first address of the branch instruction. The index is discarded and the offset within the block (BNY) is unchanged, so the TBN becomes a BN. Because one level-3 memory block contains four level-1 memory blocks, BNX3 94 alone cannot identify a level-1 memory block number; BNX3 and index 97 together identify a particular level-1 memory block number. Of the four level-1 memory blocks that correspond to a level-3 memory block, zero, one, two, three, or all four may be present. Similarly, of the two level-2 memory blocks that correspond to a level-3 memory block, zero, one, or both may be present.
If no valid level-1 track number is selected, the instruction segment for the target address has not yet been filled into the level-1 memory. If a valid level-2 block number is selected, the instruction segment has already been filled into the level-2 memory, and the valid level-2 block number can be driven onto bus 91. The instruction segment corresponding to that level-2 memory block number can then be filled from the level-2 memory into the level-1 memory, and the block number and valid bit of the corresponding level-1 memory block in RAM 98 are updated to reflect the filled segment. For example, the level-1 block number (BNX1) and its valid bit in the RAM 98 entry addressed by BNX3 and the index can be updated, and the track table entry is rewritten in the BN format used by level-1 tracks. A BN consists of a first address (BNX1) and a second address (the offset, or BNY).
If no valid level-2 track number is selected, the instruction segment for the target address has not yet been filled into the level-2 memory, and the instruction segment corresponding to the level-3 track number is filled from the level-3 memory into the level-2 and level-1 memories. The corresponding parts of RAM 98 are also updated to reflect the segments filled into the level-1 and level-2 memories. For example, the level-1 block number (BNX1) and its valid bit in the RAM 98 entry addressed by BNX3 and the index can be updated, and the track table entry is rewritten in the BN format used by level-1 tracks. If a level-2 memory block was also filled, the level-2 block number (BNX2) and its valid bit in the RAM 98 entry addressed by BNX3 and the index can be updated as well.
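The resolution of a TBN into a BN can be summarized in a short sketch. It models one RAM 98 row as two level-2 slots and four level-1 slots, with `None` standing in for a cleared valid bit. The names `Row`, `resolve`, `alloc_l1`, and `alloc_l2` are hypothetical, and the actual movement of instruction segments between memories is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """One table row for a level-3 block: None models an invalid slot."""
    bnx2: list = field(default_factory=lambda: [None, None])
    bnx1: list = field(default_factory=lambda: [None, None, None, None])


def resolve(row, index, alloc_l1, alloc_l2):
    """Return (BNX1, how it was obtained) for the 2-bit index within this row.

    alloc_l1 and alloc_l2 are assumed callbacks that pick a block to fill
    and return its block number.
    """
    lsb1 = index >> 1  # high-order index bit selects the level-2 slot
    if row.bnx1[index] is not None:
        return row.bnx1[index], "L1 hit"
    if row.bnx2[lsb1] is not None:
        source = "filled from L2"
    else:
        row.bnx2[lsb1] = alloc_l2()  # fill level 2 from level 3, record BNX2
        source = "filled from L3"
    row.bnx1[index] = alloc_l1()     # fill level 1, record BNX1
    return row.bnx1[index], source
```

In every case the caller ends up with a level-1 block number, which replaces the BNX3 and index in the track point while BNY stays the same.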
When an instruction segment is filled, it can first be filled from the level-3 memory into the level-2 memory and then from the level-2 memory into the level-1 memory. Alternatively, if there is a separate path between the level-3 and level-1 memories, the segment can be filled from the level-3 memory into the level-1 memory at the same time as it is filled from the level-3 memory into the level-2 memory. If the track points in the level-1 memory contain only level-1 track information, a similar method can also be used.
Figure 14A shows another embodiment, cache system 10000, of the cache system of the invention. Cache system 10000 is similar to cache system 9000 in Figure 10A, but it includes certain features for supporting multi-threaded programs.
Different tracks in track table 126 may correspond to one thread or to several threads. Because thread state must be saved and restored on a thread context switch, a plurality of stacks 135 are used, each holding the information pushed by one thread. A thread identifier (PID) 188 stores the current thread identifier or thread number. When tracker 170 uses the stacks 135, PID 188 supplies a pointer to the appropriate stack so that the stack operation is performed correctly.
In addition, a second fill/generator 187 can be provided outside lower-level memory 122. Generator 186 in fill/generator 187 is similar to generator 130 in fill/generator 123 but has higher bandwidth; that is, generator 186 can scan and examine more instructions at a time. Fill/generator 187 operates on reservation table 120 in much the same way that fill/generator 123 operates on active table 121. Fill engine 185 fills the instruction segments corresponding to addresses in reservation table 120 from a still lower-level memory (not shown) into lower-level memory 122. The instruction segments corresponding to addresses in reservation table 120 are thus stored in lower-level memory 122, which reduces or eliminates the time processor core 125 spends waiting for instruction fetches.
Different tracks can also correspond to the same instruction segment (the same segment may be stored in different level-1 cache blocks because it has different virtual addresses). Fill/generator 187 may also include a translation lookaside buffer (TLB) 131 located outside fill engine 185, so that the instructions in both lower-level memory 122 and higher-level memory 124 are in physical address mode and processor core 125 can fetch instructions directly from higher-level memory 124 without virtual-to-physical address translation.
Figure 14B shows part of cache system 10000. As shown in Figure 14B, each entry in active table 121 corresponds to one memory block (instruction segment) in higher-level memory 124 and to one track in track table 126, so higher-level memory 124 can be managed through active table 121. Lower-level memory 122 can likewise be used as a cache and managed through reservation table 120, with each entry in reservation table 120 corresponding to one memory block (instruction segment) in lower-level memory 122. For ease of description, assume that higher-level memory 124 and lower-level memory 122 are mutually exclusive; in other words, the content or memory block for any given memory address is never present in both higher-level memory 124 and lower-level memory 122 at the same time.
When instructions are filled into higher-level memory 124, the generator scans and examines them and may create a track containing branch points in track table 126. The branch target address is matched against the entries in active table 121. If the match succeeds, a corresponding memory block has already been filled into higher-level memory 124, and the matching block number in higher-level memory 124 is recorded in track table 126 as the branch target address in BN format. If the match fails, the corresponding memory block has not yet been filled into higher-level memory 124, and the branch target address is matched against reservation table 120 to begin the fill process. Optionally, the branch target address can be matched against the entries of reservation table 120 and active table 121 at the same time.
If the match in reservation table 120 succeeds, the corresponding instruction segment has already been filled into lower-level memory 122, and the matching block number in lower-level memory 122 is recorded in track table 126 as the branch target address in TBN format. If the match fails in both reservation table 120 and active table 121, fill engine 185 fills the corresponding instruction segment from external memory (not shown) into lower-level memory 122 over bus 423. The virtual-to-physical address translator 131 can translate between virtual and physical addresses. The memory block filled into lower-level memory 122 then contains the instruction segment, and the number of the filled block in lower-level memory 122 is recorded in track table 126 as the branch target address in TBN format.
During operation, when look-ahead pointer 156 reaches a branch track point in track table 126 whose branch target address is in TBN format, a BN is generated in active table 121 as described earlier, and the corresponding instruction segment is filled from lower-level memory 122 into higher-level memory 124. The TBN in track table 126 is replaced with that BN, and the TBNX corresponding to that TBN in reservation table 120 is cleared.
Thus, when an instruction segment corresponding to an entry in reservation table 120 is filled into higher-level memory 124, the associated TBN is replaced with a BN. Similarly, when an instruction segment corresponding to an entry in active table 121 is replaced or written back to lower-level memory 122, the associated BN is replaced with a TBN. Exchanging entries between reservation table 120 and active table 121 in this way enables efficient multi-level cache operation.
Although different embodiments are shown in different figures, they can be implemented independently or in combination. The components of these embodiments can therefore be used separately or together without departing from the principles of the invention. For ease of description, some specific examples follow.
For example, generator 130 can be used to extract the branch source address, which provides the address index for writes to track table 126. A source address (such as the address of an instruction) can be expressed in two formats. In one format, used where there is a multi-level cache or storage hierarchy, the address is represented by a high-order address part, an index part, and an offset part. In the other format, the address is represented by a high-order (block) address part and an offset part. In some cases, the branch source address is represented by a high-order address part, an index part, and an offset part. BNY can be used directly as the offset part, while the high-order address and index are sent to allocator 1200 to be converted into a block number. Generator 130 can also be used to extract the instruction type (for example, unconditional branch, conditional branch, or non-branch, the last including load and store instructions).
Generator 130 can also compute the branch target address by adding the branch offset to the branch source address. The branch source address can be the block address of the instruction segment containing the branch source instruction plus the offset of that instruction within the segment, and the branch offset is the distance of the jump. The high-order address and index of the branch target address are sent over bus 141 to be matched against the contents of the CAM in allocator 1200 (for example, active table 121 or reservation table 120). The offset address is sent over bus 143 (WYADDR) as the Y write address of track table 126. A write address for track table 126 is an address used to create a track point entry in track table 126; it consists of a row address (X address) corresponding to XADDR and a column address (Y address) corresponding to YADDR.
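The target calculation can be sketched as follows. The 4-bit offset width and the function name `branch_target` are assumptions for illustration, and any index bits are treated here as part of the high-order field.

```python
OFFSET_BITS = 4  # assumed width of the in-block offset; not given in the text


def branch_target(block_addr, src_offset, branch_offset):
    """Return (target address, high-order part, offset part) of a branch.

    The source address is the block address of the segment plus the offset
    of the branch instruction inside it; the target is the source plus the
    signed branch offset.
    """
    source = (block_addr << OFFSET_BITS) | src_offset
    target = source + branch_offset
    high = target >> OFFSET_BITS                # matched against the CAM
    offset = target & ((1 << OFFSET_BITS) - 1)  # becomes the target BNY
    return target, high, offset
```

With these assumed widths, a branch at offset 5 of block 3 that jumps back by 10 lands at address 43, which is offset 11 of block 2.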
Generator 130 thus supplies the branch source address as the write address for track table 126 and supplies the instruction type and branch target address as the content to be written. Generator 130 produces every part of the write address except the X address, which is modified or assigned by allocator 1200. The X address can be a block number (BN) corresponding to a particular high-order address, since the high-order address itself may be too long and non-contiguous. For example, an 18-bit high-order address can designate 256K different memory blocks, but if BNX numbers are used to map high-order addresses onto 256 blocks, only 8 bits are needed.
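The saving from mapping long high-order addresses onto short block numbers can be sketched as below. The 18-bit and 256-block figures come from the text; the class `BlockNumberMap` and its round-robin replacement are assumptions standing in for whatever replacement policy an implementation actually uses.

```python
HIGH_ADDR_BITS = 18                       # from the text: 2**18 = 256K blocks
NUM_BLOCKS = 256                          # from the text
BNX_BITS = (NUM_BLOCKS - 1).bit_length()  # bits needed to number 256 blocks


class BlockNumberMap:
    """Maps sparse high-order addresses onto a small dense range of BNX values."""

    def __init__(self, num_blocks=NUM_BLOCKS):
        self.num_blocks = num_blocks
        self.high_to_bnx = {}
        self.next_bnx = 0  # assumed round-robin allocation pointer

    def lookup_or_assign(self, high_addr):
        """Return the BNX for high_addr, assigning (and evicting) if needed."""
        if high_addr in self.high_to_bnx:
            return self.high_to_bnx[high_addr]
        bnx = self.next_bnx
        # Drop whichever address previously owned this block number.
        for old_addr, old_bnx in list(self.high_to_bnx.items()):
            if old_bnx == bnx:
                del self.high_to_bnx[old_addr]
        self.high_to_bnx[high_addr] = bnx
        self.next_bnx = (bnx + 1) % self.num_blocks
        return bnx
```

The track table then only needs to store the 8-bit BNX rather than the 18-bit high-order address.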
Track table 126 can be configured as a two-dimensional table in which each row is indexed by the X address (first address BNX) and corresponds to one memory block or memory line, and each column is indexed by the Y address (second address BNY) and corresponds to the offset of an instruction (or data item) within the memory block. Put simply, the write address of the track table corresponds to the branch source instruction address. For a particular branch source address (high-order address, index, offset), allocator 1200 (that is, active table 121) assigns a BNX according to the high-order address and index and drives it onto bus 153, and BNY equals the offset. The BNX and BNY together form the write address that points to the entry being written.
For a branch point, the branch target address (high-order address, offset) is sent to active table 121, where the high-order address is matched and active table 121 may assign a BNX. The assigned BNX, together with the instruction type and offset (BNY) from generator 130, forms the content of the track table entry for that branch source instruction.
Track table 126 can also serve other purposes. For example, in a system it can be used to implement automatic power management for processor core 125. One track in track table 126 can be designated to hold an idle task (an idle track) that is executed when processor core 125 is idle. The system can then record the percentage of use or accesses that fall on the idle track and, by comparing this percentage with a preset value or a set of preset values, adjust the power consumption of processor core 125 and of the system. Adjustment methods include changing the clock frequency or adjusting the supply voltage of processor core 125 and the system.
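One way to picture the idle-track policy is the sketch below. The threshold values, the clock frequencies, and both function names are invented for illustration; the text only says that the percentage is compared with one or more preset values.

```python
def idle_percentage(track_accesses, idle_track):
    """Percentage of recorded track accesses that hit the idle track."""
    total = len(track_accesses)
    if total == 0:
        return 0.0
    idle = sum(1 for bnx in track_accesses if bnx == idle_track)
    return 100.0 * idle / total


def choose_clock_mhz(idle_pct, levels=((75.0, 100), (25.0, 400)), full_speed=800):
    """Pick a clock frequency from assumed (threshold, MHz) pairs.

    levels must be sorted from the highest threshold to the lowest.
    """
    for threshold, mhz in levels:
        if idle_pct >= threshold:
            return mhz
    return full_speed
```

A mostly idle core would be slowed down, and a busy one would run at full speed; supply voltage could be scaled in the same way.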
Tracker 170 can be used to supply a read pointer 151 to track table 126. Read pointer 151 can also take the form of a BNX and a BNY. The content of the track table entry addressed by the read pointer is read out together with the entry's own BNX and BNY (the source BNX and source BNY) and examined by exchanger 133. If the entry content is a TBN, its TBNX is sent to allocator 1200 to be processed or converted into a BNX and the level-1 cache is filled; the resulting BN (whose BNY equals the TBNY) is then sent by exchanger 133 to tracker 170. Tracker 170 takes different steps depending on the content. For example, if the entry is not a branch point, tracker 170 updates the read pointer by setting the new BNX to the source BNX and the new BNY to the source BNY plus one.
If the entry is a conditional branch, tracker 170 obtains the target BNX and BNY (the first and second addresses) and sends them to allocator 1200 (that is, active table 121) so that higher-level memory 124, or the level-1 cache, can be filled. Tracker 170 then waits for the control signal for that branch point from processor core 125. If the control signal indicates that the branch is not taken, tracker 170 updates the read pointer by setting the new BNX to the source BNX and the new BNY to the source BNY plus one. If the branch is taken, tracker 170 updates the read pointer by setting the new BNX to the target BNX and the new BNY to the target BNY.
If the entry is an unconditional branch (jump), tracker 170 can treat it as a conditional branch whose condition is satisfied; that is, it updates the read pointer by setting the new BNX to the target BNX and the new BNY to the target BNY.
If the entry is a "call" instruction, tracker 170 can push the BNX and BNY pair of the current pointer onto a stack and read out the entry content, or the target BNX indicating that the corresponding instruction segment is already stored in the level-1 cache. If the entry is a "return" instruction (for example, the end of a subroutine), tracker 170 can pop the BNX and BNY pair from the stack and update the read pointer by setting the new BNX to the popped BNX and the new BNY to the popped BNY. In some cases, if the subroutine must return to the instruction following the "call" instruction, the new BNY equals the popped BNY plus one.
If the entry is an exception-handling instruction, tracker 170 can read the block number BNX and offset BNY held in the exception BN register (EXCP) and update the read pointer by setting the new BNX to the exception BNX and the new BNY to the exception BNY. The start address of the exception handler for a given processor is usually fixed, so the first segment of the exception handler can be filled into the level-1 cache and the corresponding track created in the track table (both can be marked as not replaceable).
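The read pointer update rules for the different entry types can be gathered into one sketch. The dictionary layout of a track entry, the type strings, and the function name `next_read_pointer` are assumptions made for illustration; the return case shown is the variant that resumes at the instruction after the call.

```python
def next_read_pointer(source, entry, stack, branch_taken=False, excp=(0, 0)):
    """Return the new (BNX, BNY) read pointer after examining one track entry.

    source is the (BNX, BNY) of the entry itself, entry is a dict with a
    "type" and, for branches and calls, a "target" (BNX, BNY) pair, and
    excp models the contents of the exception BN register.
    """
    src_bnx, src_bny = source
    kind = entry["type"]
    if kind == "conditional":
        # Follow the target only if the core reports the branch as taken.
        return entry["target"] if branch_taken else (src_bnx, src_bny + 1)
    if kind == "unconditional":
        return entry["target"]
    if kind == "call":
        stack.append(source)      # remember where the call came from
        return entry["target"]
    if kind == "return":
        bnx, bny = stack.pop()
        return (bnx, bny + 1)     # resume after the call instruction
    if kind == "exception":
        return excp
    return (src_bnx, src_bny + 1)  # not a branch point: step to next entry
```

In the multi-threaded embodiment, `stack` would be whichever of the per-thread stacks the current thread identifier selects.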
Allocator 1200 can be built as a one-dimensional list of entries. Each entry includes a CAM holding a high-order address and a RAM holding a BN, a valid bit, a U bit, and other flag bits. Allocator 1200 contains an auto-incrementer (APT) 129 and an adder for pointing to an entry, and the list can be indexed (addressed) by a TBNX (see Figure 10A). When a cache fill is needed, the entry pointed to by APT 129, its corresponding memory block, and the track table entry are filled.
In some cases, allocator 1200 (for example, reservation table 120 or active table 121) can provide a mapping among address, BNX, and TBNX. For example, a TBNX can be used to index a high-order address or a BNX, and a high-order address can be used to find a BNX or TBNX through high-order address matching. When the level-1 cache is being filled, generator 130 computes the branch target address and sends its high-order address over bus 141 to the CAM part of reservation table 120 for matching. If the match fails, allocator 1200 can take the number of the entry pointed to by pointer 127 as the TBNX and use that TBNX as the track table content; at the same time, allocator 1200 can fill the level-2 cache block corresponding to that TBNX. If the match succeeds, allocator 1200 finds the corresponding TBNX and uses it as the track table content.
While tracker 170 is running, when track table read pointer 151 points to a track table entry containing a TBN, the TBN is read out through read port 161 and sent over bus 180 to index reservation table 120 (that is, to check whether the level-2 cache holds the corresponding instruction segment). If no valid BN exists, the BNX pointed to by APT 129 is stored in the RAM part of the entry for that TBN, the TBN in track table 126 is replaced with the resulting BN, and the corresponding instruction segment in the level-2 cache is filled into the level-1 cache block indexed by that BN. If a valid BN exists, the instruction segment for that entry is already in the level-1 cache, and the TBN is replaced with the valid BN. When track table read pointer 151 points to an entry whose content is a BN, allocator 1200 does not need to perform any check, because the corresponding instruction segment is already stored in the level-1 cache.
Allocator 1200 can also support different organizations of active table 121 and reservation table 120. For example, with respect to the inclusion relationship between the entries of active table 121 and reservation table 120, allocator 1200 can be configured in two ways.
In one configuration, as depicted in Figure 13, active table 121 and reservation table 120 are mutually exclusive. To achieve this, reservation table 120 and active table 121 each have a CAM for storing high-order addresses. The address from generator 130 is sent to active table 121 and reservation table 120 simultaneously for matching, yielding a TBNX or a BNX. The match can succeed in only one of the two tables, never in both; that is, a given instruction can reside in either the level-1 cache or the level-2 cache, but not in both. As shown in Figure 11, reservation table 120 is indexed by TBNX, its CAM stores high-order addresses, and its RAM stores the corresponding BNX numbers; an index can be used to select among the several BNX values in the same row (entry). Active table 121 is indexed by BNX, its CAM stores high-order addresses, and its RAM stores TBNX numbers.
In the other configuration, the active table 121 and the reserved table 120 have an inclusive relationship. In this case only the CAM of the reserved table 120 stores upper address bits, and the reserved table 120 can be built with a structure similar to that of FIG. 11. The active table 121 has no CAM portion, so an address sent by the generator is matched only in the reserved table 120. This means that if a given instruction is present in the L1 cache, it must also be present in the L2 cache. The active table 121 is indexed by BNX, and its content is just the TBNX. When an L1 cache block is evicted (or replaced), the old BNX is sent to the active table 121 to look up a TBNX, which is then stored into the track table 126. For a data memory, the L1 cache block must be written back to the cache memory corresponding to the reserved table 120.
In some cases a single-level cache system can be used. The reserved table entries can then be indexed by a TBNX that corresponds to a memory block in main memory rather than in a cache memory, with the upper bits of the main memory address stored in the corresponding CAM entry. As usual, the RAM portion contains the BNX. The TBNX is held temporarily in the track table entry until the read pointer of the track table 126 approaches that entry, at which point the memory block corresponding to the upper address bits can be filled into the cache (the L1 cache). A BNX can then be assigned to replace the TBNX in the track table 126. This BNX can also be saved in the RAM portion of the reserved table entry indexed by the TBNX.
In addition, the allocator 1200 can be used to help implement the L1 cache replacement policy. For example, the allocator 1200 can support a least recently used (LRU) policy and a least frequently used (LFU) policy.
Under the LRU policy, the allocator 1200 can use an LRU window formed by the main pointer 129 (APT) and a clear pointer to find the next memory block that can be replaced. The clear pointer moves N entries ahead of the main pointer 129 (APT), where N is variable, and clears the U bit (sets it to '0') of each entry it points to. Conversely, whenever an entry is accessed, its U bit is set back to '1'. The U bit of the entry pointed to by the main pointer 129 (APT) is examined to decide whether to replace that entry. If the U bit is '1', the entry has been accessed recently and is not the least recently used, so the main pointer 129 advances and the next entry is examined. If the U bit is '0', the main pointer 129 can stop at that entry, which is then replaced.
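The LRU window can be modelled with a small sketch. The class name `LruWindow`, the circular list of U bits, and the decision to advance the main pointer by one after a victim is chosen are assumptions for illustration; the text only states that the main pointer stops at the entry to be replaced.

```python
class LruWindow:
    """Hypothetical model of the LRU window: a main pointer (APT) plus a
    clear pointer that runs n entries ahead of it."""

    def __init__(self, size: int, n: int):
        self.size = size
        self.n = n
        self.u = [0] * size  # one U bit per entry
        self.apt = 0         # stands in for main pointer 129

    def touch(self, idx: int) -> None:
        """An access sets the entry's U bit back to 1."""
        self.u[idx] = 1

    def next_victim(self) -> int:
        """Advance until the main pointer finds an entry whose U bit is 0."""
        while True:
            clear_ptr = (self.apt + self.n) % self.size
            self.u[clear_ptr] = 0          # clear pointer resets the U bit
            if self.u[self.apt] == 0:
                victim = self.apt
                # Assumption: step past the victim so the next search starts after it.
                self.apt = (self.apt + 1) % self.size
                return victim
            self.apt = (self.apt + 1) % self.size  # recently used, skip it


w = LruWindow(size=4, n=2)
w.touch(0)
w.touch(1)
print(w.next_victim())  # entries 0 and 1 were touched, so entry 2 is chosen
```

A larger N gives an entry more time to be re-accessed between being cleared and being examined, which is why the text treats N as adjustable.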
Under the LFU policy, the allocator 1200 can use the same window as above, but with a counter that records the number of accesses (representing the access frequency) in place of the U bit. The counter value in the entry pointed to by the main pointer 129 is compared with a threshold (adjustment) value set by the processor core 125 or another device. If the count is less than the threshold, the main pointer 129 can stop at that entry, which is then replaced.
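A minimal sketch of the counter-based variant follows. The names (`LfuWindow`, `threshold`) are hypothetical, and two details are assumptions not stated in the text: the clear pointer resets the counter to zero in the same way it clears the U bit, and the threshold is at least 1 so that a cleared entry always qualifies.

```python
class LfuWindow:
    """Hypothetical model of the LFU window: access counters replace U bits."""

    def __init__(self, size: int, n: int, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.size = size
        self.n = n
        self.threshold = threshold  # value set by the core or another device
        self.count = [0] * size     # one access counter per entry
        self.apt = 0                # stands in for main pointer 129

    def touch(self, idx: int) -> None:
        """Each access increments the entry's counter."""
        self.count[idx] += 1

    def next_victim(self) -> int:
        """Advance until the main pointer finds a count below the threshold."""
        while True:
            clear_ptr = (self.apt + self.n) % self.size
            self.count[clear_ptr] = 0  # assumption: clear pointer resets the count
            if self.count[self.apt] < self.threshold:
                victim = self.apt
                self.apt = (self.apt + 1) % self.size
                return victim
            self.apt = (self.apt + 1) % self.size
```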
The exchanger 133 can be used to assist the interaction between the track table 126 and the allocator 1200. For example, when a BN is assigned in the track table 126 to replace a TBN (e.g., when an L2 cache block is filled into an L1 cache block), or when a TBN is assigned to replace a BN (e.g., when, because L1 cache space is insufficient, an L1 cache block that is not present in the L2 cache is replaced back into the L2 cache), the exchanger 133 replaces every old TBNX (BNX) in the track table 126 with the new BNX (TBNX) before the old TBNX (BNX) is reused. In this way the same BNX never corresponds to two different PC addresses.
In particular, the exchanger 133 can store a set of old-TBNX/new-BNX pairs at the start of the allocation operation. The exchanger 133 then moves down the track table to the bottom and continues from the top of the track table 126 until it reaches its starting point, using the additional read port bus 159 and the additional write port bus 158 to replace every old TBNX with the new BNX. Meanwhile, the exchanger 133 also replaces the old TBNX with the new BNX in any content that is read out, before the BN is sent to the tracker 170.
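The wrap-around sweep and the on-the-fly substitution can be sketched as below. The flat list used for the track table, the `(kind, block number, offset)` tuple layout, and the function names `exchange` and `read_through` are assumptions made for illustration.

```python
from typing import List, Tuple

Entry = Tuple[str, int, int]  # (kind, block number, offset); hypothetical layout
Key = Tuple[str, int]         # (kind, block number), e.g. ("TBN", 5)


def exchange(track_table: List[Entry], start: int, old: Key, new: Key) -> int:
    """Sweep from `start` to the bottom, wrap to the top, and stop on reaching
    `start` again, rewriting every entry that still holds the old number.
    Returns how many entries were rewritten."""
    n = len(track_table)
    replaced = 0
    for step in range(n):
        i = (start + step) % n
        kind, x, y = track_table[i]
        if (kind, x) == old:
            track_table[i] = (new[0], new[1], y)  # offset is left unchanged
            replaced += 1
    return replaced


def read_through(entry: Entry, old: Key, new: Key) -> Entry:
    """Substitute the new number in content read out while the sweep is still
    in progress, before it is passed on to the tracker."""
    kind, x, y = entry
    return (new[0], new[1], y) if (kind, x) == old else entry
```

The second function reflects the point that a reader never sees the stale number even if the sweep has not yet reached the entry being read.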
Other components can also provide functions needed for the operations above. For example, the processor core 125 can provide a control signal "TAKEN" to control the multiplexer 137 in the tracker 170.
The processor core 125 can also provide a control signal "BRANCH/JUMP" to control the register 138 in the tracker 170. The read pointer 151 moves forward (e.g., by incrementing BNY) until the track table content it reads out is of branch/jump type; the read pointer 151 then stops and waits for the processor core 125 to catch up. Meanwhile, the branch target address in that content is used to check whether an L1 cache fill is needed. The BRANCH/JUMP signal tells the tracker 170 that the processor core 125 has reached the branch instruction; at that point the TAKEN signal reflects the actual outcome of program execution, so the correct next address can be selected. Thus, when the BRANCH/JUMP signal is detected, the tracker 170 causes the register 138 to latch the new address and output it as BN 151. In addition, the processor core 125 can provide a partial address "OFFSET" to the L1 cache 124 to index an instruction within the cache block selected by the BNX of BN 151.
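The interplay of the read pointer, BRANCH/JUMP, and TAKEN can be modelled roughly as follows. The list-of-dicts track representation and the names `advance_to_branch` and `next_bn` are hypothetical, and the sketch assumes that every track contains a branch or jump point at or after the starting offset.

```python
from typing import List, Tuple

BN = Tuple[int, int]  # (BNX, BNY)


def advance_to_branch(track: List[dict], bny: int) -> int:
    """Move the read pointer along one track by incrementing BNY until the
    entry read out is of branch/jump type, then stop there.
    Raises IndexError if the track has no such entry (assumed not to happen)."""
    while track[bny]["type"] not in ("branch", "jump"):
        bny += 1
    return bny


def next_bn(current: BN, target: BN, taken: bool) -> BN:
    """Model of the multiplexer choice made when BRANCH/JUMP is asserted:
    the branch target if TAKEN, otherwise the next point on the same track."""
    bnx, bny = current
    return target if taken else (bnx, bny + 1)


track = [{"type": "other"}, {"type": "other"},
         {"type": "branch", "target": (4, 1)}, {"type": "other"}]
stop = advance_to_branch(track, 0)
print(stop, next_bn((0, stop), track[stop]["target"], taken=True))
```

Because the pointer waits at the branch point, the target block can be checked and filled into L1 before the core actually reaches the branch.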
The level-one memory 124, or higher-level memory 124, can be used as the cache blocks or memory blocks indexed by BNX. The level-one memory 124 can have a write port that receives data from bus 140. For the write address, the X address (WXADDR) is provided by the allocator 1200, generated from APT 129 and delivered over bus 153, while the Y address (WYADDR, the offset address) is provided by the fetch engine (in step with the data being filled). The level-one memory 124 can also include a read port for outputting data to the processor core 125. For the read address, the X address (BNX) comes from BN 151 provided by the tracker 170, and the Y address comes from the OFFSET provided by the processor core 125.
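The two-dimensional addressing of the level-one memory can be pictured with a short sketch. The class `L1Memory` and its method names are assumptions; the point is only that the write side is addressed by (WXADDR, WYADDR) and the read side by (BNX, OFFSET), with each half of the address coming from a different unit.

```python
class L1Memory:
    """Hypothetical model of a block-organised memory with separate
    write-side and read-side address pairs."""

    def __init__(self, blocks: int, block_size: int):
        self.cells = [[None] * block_size for _ in range(blocks)]

    def write(self, wxaddr: int, wyaddr: int, data) -> None:
        """Fill path: X address from the allocator, Y address from the fetch side."""
        self.cells[wxaddr][wyaddr] = data

    def read(self, bnx: int, offset: int):
        """Read path: X address from the tracker's BN, Y address from the core."""
        return self.cells[bnx][offset]


mem = L1Memory(blocks=4, block_size=8)
for y, word in enumerate(["i0", "i1", "i2"]):
    mem.write(2, y, word)  # fill block 2 word by word
print(mem.read(2, 1))
```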
FIG. 15 shows another embodiment 11000 of the cache system of the present invention. Similar to cache system 9000 in FIG. 10A, cache system 11000 can be used to fetch data rather than instructions. Consequently, the reserved table 120 and the exchanger 133 may not be needed.
The active table 195 used for data storage has the same structure as the active table 121. Each entry in the active table 195 corresponds to one data segment in the higher-level memory 196. In addition, a base address pointer memory 197 stores the data segment numbers corresponding to the base addresses. The number of base address pointers in the base address pointer memory 197 equals the number of base addresses used by the processor core 125, for example 8; other numbers can also be used. The processor core 125 can address the higher-level memory 196 using a base address plus an offset. The offset is such that the addressed data does not fall outside the data segment corresponding to the base address.
Multi-threaded programs can also be supported. For example, as described earlier, a plurality of stacks 135 can be used to fill instructions in multi-threaded programs, and a plurality of base address pointer memories 197 can likewise be used to fill instructions in multi-threaded programs. PID 188 can then point to a current stack 135 and a current base address pointer memory 197. If only one thread is supported, however, a single stack 135 and a single base address pointer memory 197 suffice, and PID 188 may not be needed.
When the generator 130 scans and analyzes fetched instructions, if an instruction changes a data base address, information such as the corresponding base address, immediate value, and register number is stored in the corresponding track point of the track table 126. Then, when the processor core 125 executes that instruction, the base address or the modified base address can be sent to the active table 195 to be matched against its contents.
If the match succeeds, the entry number of the matching entry is sent to the base address pointer memory 197 as the content of the base address pointer. Because entries in the active table 195 correspond to data segments in the higher-level memory 196, the current base address pointer then holds the base address of the corresponding data segment in the higher-level memory 196.
If the match fails, on the other hand, the base address is sent to the fill engine 132 to fill the corresponding data segment. When the data segment corresponding to the base address has been fetched, the base address is stored in the entry of the active table 195 pointed to by pointer 198. The entry number of that entry in the active table 195 is stored in the corresponding base address pointer in the base address pointer memory 197. As with instruction filling, pointer 198 then moves to the next valid entry in the active table 195.
When the processor core 125 executes an instruction that accesses data in the higher-level memory 196, the instruction's base address 189 is used as an index to read the data segment number from the base address pointer memory 197. The data read/write address offset 194 is then used as an index to locate a data item within the data segment identified by that data segment number. The processor core 125 can then read or write that data item.
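The base-address matching, segment fill, and offset-indexed access described in the preceding paragraphs can be put together in one sketch. Everything named here is an assumption for illustration: the class `DataCache`, the list `backing` standing in for lower-level memory, and the round-robin advance of the fill pointer. Invalidation of stale base pointers when an entry is overwritten is not modelled.

```python
from typing import List, Optional


class DataCache:
    """Hypothetical model: an active table of base addresses, a set of data
    segments, and a small base address pointer memory."""

    def __init__(self, num_segments: int, segment_size: int, num_base_pointers: int = 8):
        self.segment_size = segment_size
        self.active: List[Optional[int]] = [None] * num_segments  # entry -> base address
        self.segments = [[0] * segment_size for _ in range(num_segments)]
        self.base_ptr: List[Optional[int]] = [None] * num_base_pointers  # reg -> segment no.
        self.fill_ptr = 0  # stands in for pointer 198

    def set_base(self, reg: int, base_addr: int, backing: List[int]) -> int:
        """Match the base address; on a miss, fill a segment from `backing`."""
        if base_addr in self.active:
            seg = self.active.index(base_addr)  # match succeeded
        else:
            seg = self.fill_ptr                 # match failed: fill a segment
            self.active[seg] = base_addr
            self.segments[seg] = list(backing[base_addr:base_addr + self.segment_size])
            self.fill_ptr = (self.fill_ptr + 1) % len(self.active)
        self.base_ptr[reg] = seg                # record the segment number
        return seg

    def load(self, reg: int, offset: int) -> int:
        """Read: base pointer selects the segment, offset selects the item."""
        return self.segments[self.base_ptr[reg]][offset]

    def store(self, reg: int, offset: int, value: int) -> None:
        self.segments[self.base_ptr[reg]][offset] = value


backing = list(range(100))
cache = DataCache(num_segments=2, segment_size=16)
cache.set_base(0, 32, backing)
print(cache.load(0, 5))  # item 5 of the segment that starts at address 32
```

Note that the access path itself involves no address comparison: once the base pointer holds a segment number, a load or store is a plain two-level index.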
FIG. 16 shows an embodiment of a storage hierarchy implemented with the high-performance cache structure of the present invention. The cache structure can be similar to the cache control unit described earlier. As shown in FIG. 16, the storage devices used by the processor core 201 are, from fastest to slowest: first-level memory 202, second-level memory 203, main memory 204, and hard disk storage 205. Typically the capacity of the first-level memory 202 is smaller than that of the second-level memory 203, which is smaller than that of the main memory 204, which in turn is smaller than that of the hard disk 205. A storage device at any level can, however, be of any size.
In addition, a cache structure 206 is placed between the processor core 201 and the first-level memory 202; a cache structure 207 is placed between the first-level memory 202 and the second-level memory 203; a cache structure 208 is placed between the second-level memory 203 and the main memory 204; and a cache structure 209 is placed between the main memory 204 and the hard disk 205. Other placements can also be used. Such a multi-level cache structure can improve the performance of the processor core 201.
For example, consider the cache structure 207 between the first-level memory 202 and the second-level memory 203. The processor core 201 needs to fetch instructions from the first-level memory 202, and the instructions in the first-level memory 202 come from the second-level memory 203. Therefore, as instructions pass through the cache structure 207, they can be scanned and analyzed, and the related instructions can also be fetched into the first-level memory 202 before they are executed, which improves the cache hit rate for both instructions and data.
The cache structure 207 can be similar to the cache structure 206. Its interface to the first-level memory includes address bus 210, read data bus 212, and write data bus 211; its interface to the second-level memory 203 includes address bus 213, read data bus 214, and write data bus 215. The cache structure 207 can thus improve the hit rate of the first-level memory 202.
Similarly, the cache structure 208 between the second-level memory 203 and the main memory 204 can improve the hit rate of the second-level memory 203, and the cache structure 209 between the main memory 204 and the hard disk 205 can improve the hit rate of the main memory 204. If the hard disk 205 contains all the instructions needed by the processor core 201, this multi-level cache structure allows the processor core 201 to achieve a high hit rate or high performance.
In addition, the cache structures between slower memories can have wider bandwidth, that is, they can fetch more instructions or data at a time. For example, the bandwidth of cache structure 209 is wider than that of cache structure 208, which is wider than that of cache structure 207, which in turn is wider than that of cache structure 206. Other configurations are also possible.
In addition, a separate bypass path 216 can be provided between the cache structure 208 and the first-level memory 202. Instructions or data from the main memory 204 can then be filled into the second-level memory 203 and the first-level memory 202 at the same time, further improving the performance of the overall system.
The system and method of the present invention can provide a fundamental solution for the cache structures used in digital systems. Unlike conventional cache systems, which fill only after a cache miss, the system and method of the present invention fill the instruction cache and the data cache before the processor executes an instruction or accesses a piece of data, and can thereby avoid or sufficiently hide compulsory misses. In other words, the cache system of the present invention integrates the prefetch process and eliminates the tag comparison process required by conventional caches. In addition, the system and method essentially provide a fully associative cache structure, avoiding or sufficiently hiding conflict misses and capacity misses. The system and method also support searching a multi-level cache structure simultaneously, which reduces the miss penalty of multi-level caches. Because tag matching is removed from the latency-critical path of cache access, the system and method can run at a higher clock frequency. Because fewer matching operations are needed and the miss rate is lower, efficiency at equal power consumption is also significantly better than that of conventional cache systems. Other advantages and applications of the present invention will be apparent to those skilled in the art.
11, 12, 13, 14, 15, 16, 17 ... instruction segment
30, 33, 35, 37, 38, 41 ... program segment
31, 39 ... conditional branch instruction
32, 34, 40 ... branch path
36 ... unconditional branch instruction
46, 78 ... instruction memory
48, 136 ... increment-by-one logic
49 ... selector
50, 138 ... register
51, 52, 53, 54, 55, 56, 62, 64, 91, 99, 140, 141, 143, 144, 149, 150, 153, 154, 158, 159, 164, 165, 180, 423 ... bus
57 ... type field
58 ... first address XADDR
59 ... second address YADDR
61, 135 ... stack
63 ... thread identifier
65, 92, 93, 137 ... multiplexer
66, 70, 71, 72, 410, 411, 412 ... track
67, 68, 69 ... branch point
75 ... correspondence
79, 83 ... mapping unit
80 ... block number or block address
84 ... M-bit first address
85 ... N-bit first address
87 ... content-addressable memory (CAM)
88, 89 ... RAM column
90 ... upper index bit LSB1
94 ... first address BNX3
96 ... offset (BNY)
97 ... index bit
98 ... random access memory (RAM)
100 ... bidirectional addressing unit
101 ... entry
102 ... encoder
104 ... block address data
105 ... write pointer
106 ... read address
107 ... control logic
108 ... data output
109 ... matched address output
110 ... circular auto-increment unit
111 ... V flag bit
112 ... A flag bit
113 ... U flag bit
115 ... adder
116 ... window (clear) pointer
119 ... signal line
120 ... reserved table
121, 195 ... active table
126 ... track table
127, 198 ... pointer
129 ... address pointer
130, 186 ... generator
131 ... address translation unit (TLB)
132, 185 ... fill engine
133 ... exchanger
139 ... exception handler address register
151 ... BN
152 ... BNX
156 ... BNY (look-ahead pointer)
161 ... read port
166 ... next BN
170 ... tracker
172 ... entry number (BN)
187 ... second fill/generator
188 ... thread identifier (PID)
189 ... base address
190 ... TBNX table
191 ... BNX table
192 ... G bit
193 ... replacement module
194 ... data read/write address offset
196 ... higher-level memory
197 ... base address pointer memory
201 ... processor core
202 ... first-level memory
203 ... second-level memory
204 ... main memory
205 ... hard disk storage
206, 207, 208, 209 ... cache structure
213 ... address bus
214 ... read data bus
215 ... write data bus
216 ... bypass path
300 ... address tree
301, 302, 304, 305, 307 ... trunk
303, 306 ... branch
310, 312 ... tree node
311, 313 ... branch target
381, 417 ... signal
413, 414, 415, 416 ... track point
418 ... output
1200 ... allocator
2000, 3000, 4000, 5000, 6000, 8000, 9000, 10000, 11000 ... cache system
FIG. 1 is an embodiment of the computing environment of the present invention.
FIG. 2A is an embodiment of an address tree implemented according to the method of the present invention.
FIG. 2B is an embodiment of operation based on the address tree of the present invention.
FIG. 3A is an embodiment of the cache system of the present invention.
FIG. 3B is another embodiment of the cache system of the present invention.
FIG. 4 is another embodiment of the cache system of the present invention.
FIG. 5 is another embodiment of the cache system of the present invention.
FIG. 6 is another embodiment of the cache system of the present invention.
FIG. 7A is another embodiment of the cache system of the present invention.
FIG. 7B is an embodiment of a component of the cache system of the present invention.
FIG. 8 is an embodiment of the active table of the present invention.
FIG. 9 is an embodiment of establishing a new track according to the present invention.
FIG. 10A is another embodiment of the cache system of the present invention.
FIG. 10B is an embodiment of a component of the cache system of the present invention.
FIG. 11 is an embodiment of the allocator or reserved table for a multi-level cache structure according to the present invention.
FIG. 12 is an embodiment of establishing a new track according to the present invention.
FIG. 13 is an embodiment of the exchanger of the present invention.
FIG. 14A is another embodiment of the cache system of the present invention.
FIG. 14B is an embodiment of a component of the cache system of the present invention.
FIG. 15 is another embodiment of the cache system of the present invention.
FIG. 16 is an embodiment of a storage hierarchy implemented with the high-performance cache of the present invention.
122 ... lower-level memory
123 ... fill/generator
124 ... higher-level memory
125 ... processor core
320 ... tracking engine
Claims (78)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW100122199A TWI636362B (en) | 2011-06-24 | 2011-06-24 | High-performance cache system and method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW100122199A TWI636362B (en) | 2011-06-24 | 2011-06-24 | High-performance cache system and method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201301032A true TW201301032A (en) | 2013-01-01 |
| TWI636362B TWI636362B (en) | 2018-09-21 |
Family
ID=48137492
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW100122199A TWI636362B (en) | 2011-06-24 | 2011-06-24 | High-performance cache system and method |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI636362B (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105893319A (en) * | 2014-12-12 | 2016-08-24 | 上海芯豪微电子有限公司 | Multi-lane/multi-core system and method |
| TWI649693B (en) * | 2013-10-09 | 2019-02-01 | Arm Limited | Data processing device, method and computer program product for controlling speculative vector computing performance |
| TWI716425B (en) * | 2015-07-31 | 2021-01-21 | 英商Arm股份有限公司 | An apparatus and method for performing a splice operation |
| TWI788641B (en) * | 2020-03-03 | 2023-01-01 | 瑞昱半導體股份有限公司 | Data storage system and method for operating a data storage system |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7441110B1 (en) * | 1999-12-10 | 2008-10-21 | International Business Machines Corporation | Prefetching using future branch path information derived from branch prediction |
| CN1248109C (en) * | 2002-10-22 | 2006-03-29 | 富士通株式会社 | Information processing unit and information processing method |
| JP2004171177A (en) * | 2002-11-19 | 2004-06-17 | Renesas Technology Corp | Cache system and cache memory controller |
| KR101076815B1 (en) * | 2004-05-29 | 2011-10-25 | 삼성전자주식회사 | Cache system having branch target address cache |
| JP2011065503A (en) * | 2009-09-18 | 2011-03-31 | Renesas Electronics Corp | Cache memory system and control method for way prediction of cache memory |
-
2011
- 2011-06-24 TW TW100122199A patent/TWI636362B/en active
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI649693B (en) * | 2013-10-09 | 2019-02-01 | Arm Limited | Data processing device, method and computer program product for controlling speculative vector computing performance |
| US10261789B2 (en) | 2013-10-09 | 2019-04-16 | Arm Limited | Data processing apparatus and method for controlling performance of speculative vector operations |
| CN105893319A (en) * | 2014-12-12 | 2016-08-24 | 上海芯豪微电子有限公司 | Multi-lane/multi-core system and method |
| TWI716425B (en) * | 2015-07-31 | 2021-01-21 | 英商Arm股份有限公司 | An apparatus and method for performing a splice operation |
| US12061906B2 (en) | 2015-07-31 | 2024-08-13 | Arm Limited | Apparatus and method for performing a splice of vectors based on location and length data |
| TWI788641B (en) * | 2020-03-03 | 2023-01-01 | 瑞昱半導體股份有限公司 | Data storage system and method for operating a data storage system |
Also Published As
| Publication number | Publication date |
|---|---|
| TWI636362B (en) | 2018-09-21 |