201234180 六、發明說明: 【發明所屬之技術領域】 本發明係關於快取記憶體同調控制,且特定言之係關 於用於控制共享記憶體多處理器之椒& 处商之呎取圮憶體同調之方 法、系統及程式。 【先前技術】 多處理器系統同時執行複數個任務或程序(以下簡稱 「程序」)。複數個程序中之每一者通常具有用於執行程 序之虛擬位址空間。該虛擬位址空間中之—位置含有映 射在系統記憶體中之實體位址中之位址。 憶體中之單個空間映射至多處理器中之複數個虛擬位址 中》當複數個程序中之每—者使用虛擬位址時,該等位 址轉換為系統記憶體中之實體位址,且若處理器中的快 取記憶體中不存在用於執行該等程序中之每一者之適當 指令或資料’㈣料料自纟統記,隐體擷取且儲存於 快取記憶體中。 為了將多處理器系統中之虛擬位址快速轉換成為系統 。己隐體中之貫體位址且獲得適當指令或資料,使用了與 快取冗憶體相關之所謂的轉換後備緩衝器( look-aside bUffer ’以下簡稱「TLB」)。TLB為一種緩衝 器,該緩衝器含有使用轉換演算法產生之在虛擬位址與 實體位址之間的轉換關係。TLB之使用令位址轉換非常 有效’然而,若該緩衝器用於對稱多處理(以下簡稱 4 201234180 「SMP」)系統中,則將出現非同調問題。對於其中複數 個處理器可自共享系統記憶體讀取資訊且向該共享系統 記憶體寫人資訊之資㈣理系統,必須小。以確保記憶 體系統關調方式操作。亦即,不允許複數個處理器執 行之程序導致的記憶體系怂 系統之非同調。在該多處理器系 統中之處理器中之每一者通常含有用於與快取記憶體相 關的位址轉換的TLB。為了維持同調,在該系統中之共 享記憶體模式必須將多處理器中之單一處理器之TLB 上的變化小心且一致地映射至其他處理器之則。 對於多處理器,TLB之同調可例如藉由在對所有丁LB 之修改中使用處理11間中斷(「intewessor interrupt; Η」)及軟體同步來維持。此技術可確保在整個多處理 器系統上的記憶體同調。在典型頁記憶體系統中,在多 處理器系統中之各TLB之内容反映一部分,該部分與留 存在系統記憶射之頁表之内容的快取記憶體相關。頁 ㈣常為記憶體映射表’該記憶體映射表含有虛擬位址 或虛擬位址的分段,以及與虛擬位址相關聯之實體位 X頁表通吊進一步含有其他各種類型之管理資料, _括頁面保護位兀、有效項目位元及各種存取控制位 'i如T將明確指不同調之需要(要求記憶體同調 :屬性)之位元定義為管理資料以靜態配置該頁面是否 正需要同調。然而,此位元靜態配置方法僅可有效地 振;某—特殊程4 ’可允許該等特殊程式可向軟體控制 、取汜It體重寫’因為,除需要靜態配置上述位元之外, 201234180 必須在整個系統記憶體中靜態配置該位元。 近年來,具有複數個t央處理2元(cpus)& SMP-Linux ( Linux在美國及其他國家為以画τ〇⑽心 之商標)之桌上個人電腦已變得風行,並且許多應用程 式已支援共享記憶體多處理器,亦即,SMp系統。因此, 在系統中處理器之數目之增加改良了應用程式之處理量 且無需重寫軟體。推進支援SMP (例如SMp Unux)之 通用操作系統(OS)已增大至可控制不少於1〇24個處理 器之多處理器系統。處理量可由處理器之數目之增加而 改良且無需重寫軟體之特徵為一種優點,該優點不存在 於不共享記憶體之多處理器系統中,諸如使用訊息傳遞 程式設计之叢集。因此’ SMP為適於保護軟體資產之多 處理器系統》 然而,SMP系統之可擴縮性低於基於訊息傳遞之叢集 之可擴縮性。此乃是因為支援快取記憶體同調之硬體之 成本隨著SMP系統用於改良可擴縮性之處理器之數目 的增加而顯著增加。支援用於SMP系統之快取記憶體同 調之硬體的實例可包括修改、互斥、共享且無效的 (modified,exclusive,shared,invalid; MESI)探聽協定 (snoop protocol),該協定用於桌上PC之共享匯流排且 由便宜的硬體達成;以及基於目錄之協定,該協定用於 大規模分散式共享記憶體(distributed shared memory; DSM)中之快取記憶體同調、非均勻記憶體存取(以下簡 稱「CC-NUMA」)且由昂貴的硬體達成,該硬體將特殊 201234180 的節點間連接與例如協定處理器及目錄記憶體整合。使 用CC-NUMA之處理器之數目的增加導致硬體成本之增 加,且因此,多處理器之成本效能隨著處理器之數目: 增加而降低。亦即,CC_NUMA之經濟可擴縮性較低。 與此相反,因為叢集可由標準元件製造,所以叢集之每 個處理器之硬體成本比需要專用元件之CC NUMA之每 個處理器的硬體成本更為便宜。特定言之,若叢集使用 訊息傳遞介面重寫具有高度平行性之麻煩地平行應用程 式,則具有每處理器恆定硬體成本之叢集可大規模執行 平行處理。 非專利文獻1描述了基於虛擬記憶體(virtual mem〇ry; VM)之共享記憶體技術,該共享記憶體技術利用包括在 處理器令之記憶體管理單元(mem〇ry management (以下簡稱「MMU」)中之硬體,以改良SMP系統之可 擴縮性及成本效能。將此技術應用於在非專利文獻2中 4田述之非快取記憶體同調NUMA (以下簡稱 「NCC-NUMA」),該非快取記憶體同調NUMA可使用與 叢集之硬體一樣便宜之硬體。基於VM之共享記憶體技 術在相同程序中處理快取記憶體同調,但是該基於VM 之共享記憶體技術無法在不同程序之間處理快取記憶體 门調特足5之’因為常見使用寫入複製(C〇py_〇n_Write) 技術將相同實體頁映射至複數個程序來支援虛擬位址且 管理記憶體之通用OS,所以基於VM之共享記憶體技術 適用之資料限於確保應用程式並非由不同程序共享之資 201234180 料’並且無法實施自應用程式透通之快取記憶體同調。 換S之’出現明確指示具有相同虛擬位址空間之資料由 複數個處理器共享之需要,且為了將該技術應用於現有 軟體,必須重寫應用程式,因此產生與重寫應用程式有 關之附加軟體成本。因此,基於VM之共享記憶體技術 無法應用於通用電腦,並且彼技術之適用性限於特定用 途及允許新設計程式之科學計算。 專利文獻1描述主記憶體共享型多處理器,在該多處 理器中藉由提供實體頁映射表來增加少量硬體可消除或 顯著降低在重寫頁表時廣播TLB清除異動以控制tlb 一致性之需要,且可消除或顯著降低網路和節點及與 TLB清除相關聯之處理器之管線棧(pipeline staii)中 的匯流排中之訊務。 專利文獻2描述回應於資料傳送指令(諸如M〇v指 令),賦能存取内容可定址記憶冑(諸如快取記憶體 (CACHE-M)或位址轉換緩衝器(TLB)),且使項目無效 之操作。 專利文獻3描述引入軟體指人拟 子人瓶ί日令對(諸如位址轉換對) 以使轉換資訊能夠經由軟體吉妞你x t = 瑕罝接插入,而頁面錯誤處理 機能夠將轉換資訊插入至頁面曰你士 王只面目錄中並且將彼資訊插入 至TLB中,並且破保在完成百二ω ν. 凡成頁面錯誤處理機常式之執行 之後’當在下一次提供相同;%概& t,士 J湿擬位址時,不發生TLB遣 漏而發生TLB命中。 引用清單 201234180 _專利文獻 [專利文獻(PTL) 1]曰本未審查專利申請公開案第 2000-67009 號 [專利文獻(PTL) 2]曰本未審查專利申請公開案第 8-320829 號 [專利文獻(PTL) 3]曰本未審查專利申請公開案第 62-3357 號 _非專利文獻 [非專利文獻(NPL) 1] Karin Petersen 及 Kai Li 所著 之1993年四月之Newport Beach,CA之第七屆國際平 行處理討論會之論文集中的第1頁至第18頁「Cache Coherence for Shared Memory Multiprocessors Based on Virtual Memory Support »」 _ [非專利文獻(NPL) 2] Leonidas Kontothanassis 等人 所者之「Shared Memory Computing on Clusters with Symmetric Multiprocessors and System Area Networks,」,ACM之電腦系統會刊(TOCS),2005年8 月’第3期,第23卷,第301頁至第335頁 【發明内容】 技術問題 因此’本發明之目標是達成使共享記憶體多處理器系 統之可擴縮性能夠增加並且成本效能能夠改良,同時保 留較低之硬體及軟體之成本之快取記憶體同調控制。本 201234180 發明之目標包括提供為達成該快取記憶體同調控制之方 法、系統及程式產品。本發明之目標亦包括經由使用便 宜硬體配置之軟體達成該快取記憶體同調控制。本發明 之目標進一步包括經由自應用程式透通(亦即,無需重 寫應用程式)之軟體達成該快取記憶體同調控制。 問題之解決方案 根據本發明之一個實施例用於控制快取記憶體同調之 方法控制多處理器系統之快取記憶體同調,多處理器系 統中複數個處理器共享系統記憶體,該複數個處理器中 之每者包括快取記憶體及TLB。當複數個處理器之一 處理Is決足TLB中斷不是發生在 誤時,該方法包括··當發生具有匹配位址之註冊資訊 存在於TLB中時,經由處理器執行T]LB遺漏異常處 步驟,該處理步驟將TLB中斷處理為TLB遺漏中斷 或當發生具有匹配位址之註冊資訊存在於tlb中但 取權限無效時’經由處理器執行儲存異常處理步驟, 處理步驟將TLB中斷處理為儲存中斷。遺漏異常 理步驟可包括以下步驟:清除快取記憶體之資料快取 憶體線’該快取記憶體之資料快取記憶體線屬於由犧 ,項目覆蓋之實體頁’當執行咖替換時該犧 項目被逐出且丢棄。TLB遺漏異常處理步驟或 子異常處理㈣可包括以下步驟:決定造成咖遺漏 斷或儲存中斷之#己體在取 tTW 口己隐體存取疋為資料存取還是指令 取,且當決㈣記憶體絲為資料存料,向由加 10 201234180201234180 VI. Description of the Invention: [Technical Field of the Invention] The present invention relates to coherency control of cache memory, and in particular, regarding the control of the shared memory multi-processor pepper & Method, system and program for coherence. [Prior Art] A multiprocessor system simultaneously executes a plurality of tasks or programs (hereinafter referred to as "programs"). Each of the plurality of programs typically has a virtual address space for executing the program. The location in the virtual address space contains the address mapped in the physical address in the system memory. A single space in the memory is mapped to a plurality of virtual addresses in the multiprocessor. When each of the plurality of programs uses a virtual address, the addresses are converted to physical addresses in the system memory, and If there is no appropriate instruction or data for executing each of the programs in the cache memory of the processor, the hidden entity is captured and stored in the cache memory. In order to quickly convert virtual addresses in a multiprocessor system into a system. The so-called conversion look-aside buffer ( look-aside bUffer ' hereinafter referred to as "TLB") associated with the cache memory is used to obtain the appropriate address or information in the hidden body. The TLB is a buffer containing the conversion relationship between the virtual address and the physical address generated using the conversion algorithm. The use of TLB makes address translation very efficient. However, if the buffer is used in a symmetric multiprocessing (hereafter referred to as "201234180 "SMP") system, a non-coherence problem will occur. For a system in which a plurality of processors can read information from the shared system memory and write information to the shared system memory, the system must be small. To ensure that the memory system is turned off. That is, the non-coherence of the memory system 怂 system caused by the execution of a plurality of processors is not allowed. Each of the processors in the multiprocessor system typically contains a TLB for address translation associated with the cache memory. In order to maintain coherence, the shared memory mode in the system must carefully and consistently map changes on the TLB of a single processor in a multiprocessor to other processors. For multiprocessors, the coherence of the TLB can be maintained, for example, by using a process 11 interrupt ("intewessor interrupt") and software synchronization in the modification of all DLBs. This technology ensures memory coherence across the entire multiprocessor system. In a typical page memory system, the content of each TLB in a multiprocessor system reflects a portion associated with the cache memory that retains the contents of the page table of the system memory. Page (4) is often a memory mapping table 'The memory mapping table contains a virtual address or a virtual address segment, and the physical bit X page table associated with the virtual address further contains various other types of management materials, _ including page protection bit, valid item bit and various access control bits 'i such as T will clearly refer to the need for different tune (requires memory coherence: attribute) bit is defined as management data to statically configure whether the page is positive Need to be the same. However, this bit static configuration method can only effectively vibrate; a special program 4' allows these special programs to be controlled to the software, and the It body is rewritten 'because, except that the above bits need to be statically configured, 201234180 This bit must be statically configured in the entire system memory. In recent years, there have been a number of desktop PCs with a total of 2 yuan (cpus) & SMP-Linux (Linux is the trademark of τ〇(10) in the United States and other countries), and many applications have become popular. Shared memory multiprocessor, that is, SMp system, has been supported. Therefore, the increase in the number of processors in the system improves the throughput of the application and does not require rewriting the software. The general-purpose operating system (OS) that promotes SMP support (such as SMp Unux) has grown to a multiprocessor system that can control no less than one 24 processors. The amount of processing that can be improved by an increase in the number of processors and without the need to rewrite the features of the software is an advantage that does not exist in multiprocessor systems that do not share memory, such as clusters that use message passing programming. Therefore, SMP is a multiprocessor system suitable for protecting software assets. However, the scalability of SMP systems is lower than the scalability of packet-based messaging. This is because the cost of supporting the memory of the cache memory is significantly increased as the number of processors used by the SMP system to improve scalability is increased. Examples of hardware that supports cache coherency for SMP systems may include modified, exclusive, shared, invalid; MESI snoop protocol, which is used for tables. A shared bus on the PC and achieved by inexpensive hardware; and a directory-based agreement for cached coherent, non-uniform memory in large-scale distributed shared memory (DSM) Access (hereafter referred to as "CC-NUMA") and achieved by expensive hardware that integrates the inter-node connections of the special 201234180 with, for example, a protocol processor and directory memory. The increase in the number of processors using CC-NUMA leads to an increase in hardware costs, and as a result, the cost performance of multiprocessors decreases as the number of processors: increases. That is, CC_NUMA has low economic scalability. In contrast, because the cluster can be fabricated from standard components, the hardware cost per processor of the cluster is less expensive than the hardware cost of each processor that requires dedicated components of CC NUMA. In particular, if the cluster uses the messaging interface to rewrite the troublesome parallel application with high parallelism, the cluster with constant hardware cost per processor can perform parallel processing on a large scale. Non-Patent Document 1 describes a shared memory technology based on virtual memory (VM), which utilizes a memory management unit (mem〇ry management (hereinafter referred to as "MMU") included in a processor. In the hardware of the SMP system to improve the scalability and cost performance of the SMP system. This technique is applied to the non-cache memory coherence NUMA (hereinafter referred to as "NCC-NUMA" in Non-Patent Document 2). The non-cache memory coherent NUMA can use the same hardware as the cluster hardware. The VM-based shared memory technology handles the cache coherency in the same program, but the VM-based shared memory technology cannot. Handling cache memory gates between different programs 'Because the common use of write copy (C〇py_〇n_Write) technology maps the same physical page to multiple programs to support virtual addresses and manage memory Universal OS, so the data applicable to VM-based shared memory technology is limited to ensuring that the application is not shared by different programs. The cache is coherent. The change of S's clearly indicates that the data with the same virtual address space is shared by multiple processors, and in order to apply the technique to existing software, the application must be rewritten, thus generating Rewriting the additional software costs associated with the application. Therefore, VM-based shared memory technology cannot be applied to general-purpose computers, and the applicability of the technology is limited to specific uses and allows scientific calculations of new design programs. Patent Document 1 describes main memory A shared multiprocessor in which the addition of a small amount of hardware by providing a physical page mapping table eliminates or significantly reduces the need to broadcast TLB cleanups to control tlb consistency when rewriting page tables, and eliminates Or significantly reducing the traffic in the bus in the pipeline and the node and the pipeline staii associated with the TLB clearing. Patent Document 2 describes responding to a data transfer instruction (such as the M〇v instruction). Ability to access content addressable memory (such as cache memory (CACHE-M) or address translation buffer (TLB)) and invalidate the project Operation Patent Document 3 describes the introduction of a software to refer to a person's person's bottle, such as an address conversion pair, so that the conversion information can be inserted via the software, and the page fault handler can convert. The information is inserted into the page of your king's face-only directory and the information is inserted into the TLB, and the break-up is completed after the implementation of the page error processor routine 'when the next offer is the same; % When the T-wet address is located, there is no TLB missed and a TLB hit occurs. Citation List 201234180 _ Patent Literature [Patent Document (PTL) 1] Unexamined Patent Application Publication No. 2000-67009 [Patent Document (PTL) 2] Unexamined Patent Application Publication No. 8-320829 [Patent Document (PTL) 3] Unexamined Patent Application Publication No. 62-3357 - Non-Patent Literature (NPL) 1] Cache Coherence for Shared Memory Multiprocessors B from Karin Petersen and Kai Li, Newport Beach, April 1993, at the 7th International Parallel Processing Workshop of CA, Pages 1 to 18. Ased on Virtual Memory Support »" _ [Non-patent Literature (NPL) 2] "Shared Memory Computing on Clusters with Symmetric Multiprocessors and System Area Networks" by Leonidas Kontothanassis et al., ACM Computer Systems Journal (TOCS), August 2005 'No. 3, Vol. 23, pp. 301 to 335. [Technical Problem] Therefore, the object of the present invention is to achieve an increase in the scalability of a shared memory multiprocessor system and The cost-effectiveness of the cache can be improved while retaining the cost of the lower hardware and software. The objectives of the 201234180 invention include providing methods, systems, and program products for achieving coherent control of the cache memory. It is also an object of the present invention to achieve this cache memory coherence control via the use of an inexpensive hardware configuration. The object of the present invention further includes achieving the cache coherency control via software that is transparent from the application (i.e., without rewriting the application). Solution to Problem According to one embodiment of the present invention, a method for controlling coherency of a cache memory controls a cache coherency of a multiprocessor system in which a plurality of processors share system memory, the plurality of processors Each of the processors includes a cache memory and a TLB. When one of the plurality of processors processes Is to determine that the TLB interrupt does not occur, the method includes: when the registration information with the matching address exists in the TLB, the step of performing the T]LB missing exception via the processor The processing step processes the TLB interrupt as a TLB miss interrupt or when a registration information having a matching address exists in tlb but the authority is invalid. 'The storage exception processing step is performed via the processor, and the processing step processes the TLB interrupt as a storage interrupt. . The missing abnormality step may include the following steps: clearing the data of the cache memory, the cache memory line, the data of the cache memory, the cache memory line belongs to the physical page covered by the project, and when the coffee replacement is performed, The sacrifice project was evicted and discarded. The TLB missing exception processing step or the sub-anomaly processing (4) may include the following steps: determining whether the cause of causing a coffee leak or a storage interruption is to access the tTW port as a data access or an instruction fetch, and the decision (4) memory Body silk for data storage, to the plus 10 201234180
目覆蓋之實體頁提供寫入、讀取及執行之權限,該TLB 項目與具有互斥約束之存取結合來替換或更新,該互斥 約束係排除對另一處理器之TLB中之實體頁的存取權 限0 較佳地,提供具有互斥約束之寫入、讀取及執行之權 限的步驟可包括以下步驟:提供具有寫人無效約束之寫 入、4取及執行之權限的處理步驟。提供具有寫入無效 約束之寫人、讀取及執行之權限的步驟可包括mesi模 擬處理步驟,該步驟提供具# MESI協定之約束之寫 入、讀取及執行之權限。 較佳地MESI模擬處理步驟可包括以下步驟:決定 記憶體存取為資料寫人還是資料讀取,#決定記憶體存 取為資料讀取時,將對處判之則中之存取的實體頁 之讀取屬性設定為開啟,且該TLB目錄記憶體留存複數 個處理器之TLB之註冊資訊,纟Τ£Β目錄記㈣Μ 尋該存取之實體頁以及決定其他處理器之tlb是否且 有對存取之實體頁寫人之權限其他處理器具有 寫入之權限時,經由處理器間中斷將清理命令通知其他 處理器且造成其他處理器清除對存取 限’並且清除對則目錄記憶體中之其他處;= 的絲之實體頁的寫入屬性。造成其他處理器清除對存 取之實體頁寫入之權限的步驟可包括以下步驟.、: ===資^取_體複製回存並且禁用對二處 理益之TLB中之存取的實體頁的寫入屬性。 201234180 較佳地,MESI模擬處理可包括以下步驟:當決定記 憶體存取為資料寫入時,將對處理器之tlb中之存取的 實體頁以及對TLB目錄記憶體之寫入屬性設定為開 啟’在TLB目錄記憶體中搜尋該存取之實體頁以及決定 其他處理器之TLB是否具有對存取之實體頁讀取、寫入 或執行之權限’當決定其他處理器具有對實體頁寫入、 讀取或執行之權限時,經由處理器間中斷將清除命令通 A其他處理器並且造成其他處理器清除對存取之實體頁 讀取冑人及執行之權限’並且清除對tlb目錄記憶體 令之其他處理器之TLB的存取之實體頁的讀取、寫入及 執行屬〖生•成其他處理器清除對存取之實體頁讀取、 寫入及執行之權限的步驟可包括以下步驟:造成其他處 理器將資料快取記憶體複製回存並且無效化,並且禁用 對其他處理器之TLB中之存取的實體頁之讀取寫入及 執行屬性。 較佳地,TLB遺漏異常處理步驟或儲存異常處理步驟 可包括以下步驟:決定造成TLB遺漏中斷或儲存中斷之 記憶體存取為資料存取還是指令存取,當決定該記憶體 存取為指令存取時’決定在系統記憶體中之頁表中之項 目是否具有對由指令提取而使·!^^遺漏中斷產生之實 體頁的使用者寫人許可權限,當決定頁表中之項目具有 使用者寫入許可權限時’決定其他處理器之KB是:具 有對實體頁之使用者寫人許可權限,並且,當決定其他 處理器之TLB具有使用者寫人許可權限時,經由處理器 12 201234180 間中斷將清理命令通知其他處理器並且造成其他處理器 清除使用者寫入許可權限。當決定其他處理器之TLB不 具有使用者寫入許可權限時或在造成其他處理器清除使 用者寫入許可權限之步驟之後,TLB遺漏異常處理步驟 或儲存異常處理步驟可包括以下步驟:使進行存取之處 理器之指令快取記憶體無效。當決定頁表中之項目不具 有使用者寫入許可權限或在使進行存取之處理器之指令 快取记憶體無效的步驟之後,TLB遺漏異常處理步驟或 儲存異常處理步驟可包括以下步驟:將對在TLB遺漏中 斷由進行存取之處理器之TLB中的指令提取產生之實 體頁之執行屬性設定為開啟,且該TLB目錄記憶體留存 複數個處理器之TLB之註冊資訊。 較佳地,MESI模擬處理步驟可進一步包括以下步驟: 在搜尋存取之實體頁之TLB目錄記憶體時使用旗號進 行順序存取。 通過本發明之一個實施例,提供一種用於快取記憶體 同調控制之電腦程式產品,該電腦程式產品造成處理器 執行如上所述步驟中之每一者。 根據本發明之另—實施例之用於控制快取記憶體同調 之系統控制多處理器系統之快取記憶體同調,在該多處 理益系統中之複數個處理器共享系統記憶體,該複數個 處理器中之每一者包括快取記憶體及TLB。處理器中之 每一者進一步包括TLB控制器,該TLB控制器包括tlb 搜尋單元及同調處理機,該TLB搜尋單元執行TLB搜 201234180 尋,且該同調處理機當在TLB搜尋令未發生命中且發生 TLB _斷時執行TLB註冊資訊處理。同調處理機包括 TLB替換處理機、TLB遺漏異常處理單元及儲存異常處 理單元TLB替換處理機在系統記憶體中搜尋頁表並且 對TLB註冊資訊執行替換。當TLB _斷並非頁面錯誤 時,當具有匹配位址之註冊資訊不存在於TLB中時,TLB 遺漏異常處理單元將TLB中斷處理為發生TLB遺漏中 斷,並且當具有匹配位址之註冊資訊存在於tlb _但存 取權限無效時,儲存異常處理單元將TLB令斷處理為發 生儲存中斷。TLB遺漏異常處理單元可清除快取記憶體 之資料快取記憶體線’該快取記憶體之資料快取記憶體 線屬於由犧牲纟TLB項目覆蓋之實體頁,#執行;:lb 替換時該犧牲者TLB項目被逐出且丟棄。則遺漏異常 處理單元及儲存異常處理單元中之每一者可決定造成 TLB遺漏巾斷或料中斷之錢體存取是㈣存取還是 才曰7存取’並且’當決定記憶體存取為資料存取時,可 對由TLB項目覆蓋之實體百坦祉 貝遛貝拉供寫入、頃取及執行之權 限,該TLB項目與具有互斥約束之存取結合來替換或更 新’該互斥約束係排除對另-處理器之TLB中之實體頁 之存取權限。 TLB遺漏異常處理單元式 平凡次儲存異常處理單元中之每一 者可決定造成TLB遺漏中磨 斷或儲存中斷之記憶體存取 為資料存取還是指令存取,去 田決疋§亥5己憶體存取為指令 存取時’可決定在系統記憶體 k遐中之頁表中之項目是否具 14 201234180 有對由指令提取而冑TLB遺漏中斷產生之實體頁之使 用者寫人許可餘,當衫^巾之項目具有使用者寫 入許可權限時,可決^其他處理器之TLB是否具有對實 體頁之使用者寫入許可權限,並且,當決定其他處理器 之TLB具有使用者寫人許可權限時,可經由處理器間中 斷將’月理命令通知其他處理器並且造成其他處理器清除 使用者寫入許可權限。 用於控制快取記憶體同調之系統可進一步包括 目錄記憶體,該TLB目錄記憶體留存用於複數個處理器 之TLB之註冊資訊並且由複數個處理器在該目錄 記憶體中搜尋實體頁。 較佳地,多處理器系統可包括複數個節點,該複數個 節點中之每一者可包括複數個處理器、系統記憶體以及 TLB目錄s己憶體及旗號處理機,該系統記憶體經由同調 共享匯流排連接至複數個處理器,該旗號處理機用於由 使用旗號之複數個處理器順序存取TLB目錄記憶體, TLB目錄記憶體及旗號處理機經由橋接機構連接至同調 共享匯流排。該複數個節點可經由NCC-NUMA機構彼 此連接。 發明之有利效果 使用本發明’可以達成使共享記憶體多處理器系統之 可擴縮性能夠增加並且成本效能能夠改良,同時保留較 低之硬體及軟體之成本之快取記憶體同調控制。特定言 之,提供用於達成該快取記憶體同調控制之方法、系統 15 201234180 及程式產品’該快取記憶體同調控制可經由使用便宜硬 體配置之軟體達成,並且該快取記憶體同調控制可在不 重寫應用程式之情況下達成。 【實施方式】 用於執行本發明之最佳方式將在下文參看圖式詳細描 述。下文之實施例並非意欲限制本發明之申請專利範圍 之範疇,且並非需要在實施例中描述之特徵之所有組合 才能解決問題。本發明可以許多不同形式實施且不應理 解為限於本文所陳述之實施例之内容。相同部分及元件 具有遍佈實施例之描述之相同元件符號。The entity page covered by the object provides the right to write, read, and execute. The TLB item is replaced or updated in conjunction with an access with a mutually exclusive constraint that excludes the physical page in the TLB of the other processor. Access Rights 0 Preferably, the step of providing permission to write, read, and execute with mutually exclusive constraints may include the steps of: providing processing steps with write, 4 fetch, and execute permissions with write invalid constraints . The step of providing the write, read, and execute permissions with write invalid constraints may include a mesi simulation processing step that provides permission to write, read, and execute with the constraints of the # MESI agreement. Preferably, the MESI analog processing step may include the steps of: determining whether the memory access is a data writer or a data read, and # determining whether the memory access is an entity that is accessed in the judgment when the data is read. The page read attribute is set to on, and the TLB directory memory retains the registration information of the TLB of the plurality of processors, Β Β 记 (4) 寻 find the physical page of the access and determine whether the tlb of the other processor has Permission to the physical page of the accessed entity. When the other processor has the write permission, the cleanup command is notified to the other processor via the interprocessor interrupt and the other processor clears the access limit and clears the directory memory. Others in ;; the write attribute of the silk physical page. The step of causing other processors to clear the permission to write to the accessed physical page may include the following steps:, === 复制 复制 复制 回 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且 并且Write attribute. 201234180 Preferably, the MESI analog processing may include the following steps: when determining that the memory access is data writing, setting the physical page of the access in the processor tlb and the writing attribute to the TLB directory memory to Enable 'search for the physical page of the access in the TLB directory memory and determine whether the TLB of other processors has the right to read, write or execute the physical page of the access' when deciding that other processors have written to the physical page When entering, reading, or executing permissions, the inter-processor interrupt will clear the command to other processors and cause other processors to clear the read and execute permissions on the accessed physical page' and clear the tlb directory memory. The steps of reading, writing, and executing the physical page of the access of the TLB of other processors are the steps of the other processor to clear the right to read, write, and execute the accessed physical page. The following steps: causing other processors to copy and cache the data cache memory and invalidate, and disable the read and write of the physical page accessed in the TLB of other processors . Preferably, the TLB miss exception processing step or the storage exception processing step may include the steps of: determining whether the memory access causing the TLB miss interrupt or the storage interrupt is a data access or an instruction access, when determining the memory access as an instruction When accessing, 'determines whether the item in the page table in the system memory has the user write permission of the entity page generated by the instruction fetching ·^^ omission interrupt, when determining the item in the page table has When the user writes the permission, 'the KB of the other processor is determined to have the permission of the user to write to the entity page, and when the TLB of the other processor is determined to have the user's permission to write the user, via the processor 12 Interrupts between 201234180 notify the other processors of the cleanup command and cause other processors to clear the user write permission. When it is determined that the TLB of the other processor does not have the user write permission or after the step of causing the other processor to clear the user write permission, the TLB miss exception processing step or the storage exception processing step may include the following steps: The instruction cache memory of the accessing processor is invalid. The TLB Missing Exception Handling Step or the Storing Exception Handling Step may include the following steps when it is determined that the item in the page table does not have the user write permission or the step of invalidating the instruction cache memory of the processor that is accessed; : The execution attribute of the entity page generated by the instruction extraction in the TLB of the processor that accesses the TLB miss interrupt is set to ON, and the TLB directory memory retains the registration information of the TLB of the plurality of processors. Preferably, the MESI analog processing step may further comprise the step of: performing sequential access using the flag when searching for the TLB directory memory of the accessed physical page. With one embodiment of the present invention, a computer program product for cache coherency control is provided that causes a processor to perform each of the steps described above. According to another embodiment of the present invention, the system for controlling the coherency of the cache memory controls the cache coherency of the multiprocessor system, and the plurality of processors in the multiprocessing system share the system memory, the complex number Each of the processors includes a cache memory and a TLB. Each of the processors further includes a TLB controller including a tlb search unit and a coherent processor, the TLB search unit performing a TLB search 201234180 seek, and the coherent processor is not in life when the TLB search command is TLB registration information processing is performed when TLB_break occurs. The coherent processor includes a TLB replacement processor, a TLB missing exception handling unit, and a storage exception handling unit TLB replacement processor that searches the system memory for page tables and performs replacement on the TLB registration information. When the TLB_break is not a page fault, when the registration information with the matching address does not exist in the TLB, the TLB missing exception handling unit processes the TLB interrupt as a TLB miss interrupt, and when the registration information with the matching address exists Tlb _ When the access authority is invalid, the storage exception handling unit processes the TLB assertion as a storage interrupt. The TLB Missing Exception Handling Unit can clear the data of the cache memory. The cache memory line of the cache memory is the physical page covered by the victim 纟TLB project, #execution; The victim TLB project was evicted and discarded. Then each of the missing exception handling unit and the stored exception handling unit can determine whether the money access to the TLB missed or interrupted is (4) access or access 7 and 'when determining memory access is When accessing data, the entity can be written, retrieved, and executed by the entity covered by the TLB project. The TLB project is replaced with an access with a mutually exclusive constraint to replace or update the mutual The exclusion constraint excludes access to the physical page in the TLB of the other processor. Each of the TLB Missing Exception Handling Units, the Quaternary Storage Exception Handling Unit, can determine whether the memory access for the TLB miss in the wear or storage interrupt is a data access or an instruction access. When the memory access is an instruction access, it can determine whether the item in the page table in the system memory is 14 201234180. There is a user write permission for the physical page generated by the instruction and the TLB miss interrupt is generated. When the item of the shirt has the user write permission, it can be determined whether the TLB of the other processor has the permission to write to the user of the physical page, and when the TLB of the other processor is determined to have the user's writer When permissions are granted, the 'monthly commands' can be notified to other processors via interprocessor interrupts and cause other processors to clear the user write permission. The system for controlling the coherency of the cache memory may further include a directory memory that retains registration information for the TLBs of the plurality of processors and searches for the physical pages in the directory memory by the plurality of processors. Preferably, the multiprocessor system may include a plurality of nodes, each of the plurality of nodes may include a plurality of processors, system memory, and TLB directory s memory and flag processor, the system memory via The coherent shared bus is connected to a plurality of processors for sequentially accessing the TLB directory memory by a plurality of processors using the flag, and the TLB directory memory and the flag processor are connected to the coherent shared bus via the bridging mechanism. . The plurality of nodes can be connected to each other via the NCC-NUMA mechanism. Advantageous Effects of Invention With the present invention, it is possible to achieve a cache memory coherent control which enables an increase in the scalability of a shared memory multiprocessor system and which can be improved in cost performance while retaining the cost of a lower hardware and software. Specifically, a method for implementing the coherency control of the cache memory, a system 15 201234180 and a program product 'the cache coherency control can be achieved by using a software with a cheap hardware configuration, and the cache memory is coherent Control can be achieved without rewriting the application. [Embodiment] The best mode for carrying out the invention will be described in detail below with reference to the drawings. The following examples are not intended to limit the scope of the invention, and all combinations of the features described in the embodiments are not required to solve the problems. The invention may be embodied in many different forms and should not be construed as being limited to the embodiments disclosed herein. Identical parts and elements have the same element symbols throughout the description of the embodiments.
第1圖為示意圖示根據本發明可用於達成快取記憶體 同調控制之多處理器系統100之方塊圖。多處理器系統 100包括複數個處理器101、記憶體匯流排1〇2及系統記 憶體103。處理器101由記憶體匯流排1〇2連接至系統 記憶體103。處㈣101中之每一者包括CPU104、MMU 105及快取記憶體1〇6。MMU 1〇5包括TLB 1〇7。處理 器101中之快取記憶體1〇6保持系統記憶體1〇3之内容 之:部分。對於諸如多處理器系、统100之SMP系統,處 理器101可自系統記憶冑1〇3讀取資訊並且向系統記憶 寫入資汛,並且需要使系統記憶體丨03中與快取 記憶體106中之資料及指令同調。較佳地,系統記憶體 103可在系統記憶體1()31具有頁表108β藉由使用複數 個項目亦即’在頁表108中之註冊資訊之片段使虛擬 16 201234180 位址得以有效地映射至系統記憶體t 中之實體位址。 系統記憶體103包括記憶體控制器1〇9並且與連接至系 統記憶體1G3之外部儲存裝置12Q交換與儲存資訊相關 之資訊,亦#,自夕卜部儲存裝i 12〇絲資訊並且向外 部儲存裝置120寫入資訊。處理器1〇1中之每一者可藉 由使用TLB 107複製包含於頁纟1〇8中之每一項目中之 資訊’將指令或資料中之虛擬位址轉換至系統記憶體 103中之實體位址。因為TLB 1〇7提供在記憶體空間中 的位址資訊,所以為了確保TLB 1〇7之正確操作,維持 在多處理器系統100中之TLB 1〇7之間的同調很重要。 第2圖為示意圖示根據本發明之一個實施例具有快取 記憶體同調控制系統之處理器丨〇丨之方塊圖。處理器i 〇丄 之快取記憶體106包括指令快取記憶體丨〇6,及資料快取 記憶體106"。處理器101連接至TLB目錄記憶體m, 除5己憶體匯流排102可存取之外,所有處理器1 〇丨皆可 存取TLB目錄記憶體121。TLB目錄記憶體1S1為複製 保持在所有處理器101之TLB 107中之諸如實體頁編 號、讀取/寫入/執行存取權限及有效狀態的資訊中之一 者’且將該複製之資訊映射至所有處理器1〇1之CPU 104 可參考之全局位址,以便本地端處理器1〇1可在不使用 處理器間中斷中斷遠端處理器101之情況下檢查遠端處 理器101之TLB 107之内容。CPU 104中之每一者皆具 有執行應用程式(application program; AP)處理122的操 作模式(使用者模式)、執行OS核心處理124的操作模 17 201234180 式(管理員桓十、 、,)及執行中斷處理機的操作模式。同調處 理機126以第二榀从>、 一操作杈式執行。TLB控制器123包括TLB 搜尋單元12s R _ 及同調處理機126,該TLB搜尋單元125 用於當AP處理 22存取快取記憶體ι〇6時或當〇s核心 他24存取快取記憶體106時執行TLB搜尋,且該同 調處理機126用於當在則搜尋中未發生命中時且發生 中斷時執行TLB 1〇7之註冊資訊處理。同調處理機 126位於0S核心處理124外部以用於處理頁面錯誤,如 第2圖中所示。 TLB S Λ ^ r-. 早凡125包括快取記憶體標籤搜尋單元 1 7該J·夬取5己憶體標籤搜尋單元⑺用於當搜尋 中,生命中時搜尋快取記憶體標籤。當快取記憶體標籤 搜哥中發生命中時’快取記憶體標藏搜尋單it 127指示 處理122存取快取記憶豸1()6。當快取記憶體標鐵搜 哥中未發生命中而發生快取記憶體標籤遺漏時,快取記 憶體標籤搜尋單元127指示Ap處理122不存取快取記 憶體106 ’而存取系統記憶體1 〇3。 同調處理機126包括TLB替換處理機128、則遺漏 異常處理單70 129及儲存異常處理單元13G。TLB替換 處理機128包括頁表搜尋單元131及頁面錯誤決定單元 132。在TLB中斷由TLB搜尋單元125發現之狀況下, 頁表搜尋單元131對系統記憶體1〇3中之頁表ι〇8執行 搜尋。頁面錯誤決定單元132由頁表搜尋單元Η〗執行 之搜尋決定是否發生頁面錯誤。當頁面錯誤決定單 18 201234180 决疋自由頁表搜尋單元131執 發生時,亦即,決定執仃之搜寻無頁面錯誤 tlb m * 頁面存在於頁表108中時, TLB遺漏異常處理 行同調控制。在其中二 常處理單元130執 於頁表108中且無百月,下’儘官TLB項目頁面存在 頁面錯誤發生,在TLB搜尋未發現命 中發生TLB中斷,其中匹配位址之項目(亦即註冊資 訊)不存在於TLB中之情況被稱為「TLB遺漏中斷」, 且其中匹配位址之項目(亦即註冊資訊)存在於⑽ 中、但存取權限無效之情職㈣「儲存巾斷^」則 遺漏異常處理單元129處理TLB遺漏中斷,而儲存異常 處理單元130處理儲存中斷。因為同調處理機126當無 頁面錯誤發生時執行同調控制,此技術與基於VM之共 享記憶體技術不同,後者當頁面錯誤發生時執行同調: 制。 〇S核心處理124包括記憶體管理單元Π3。當頁面錯 誤決定單元U2由頁表搜尋單元131執行之搜尋決定頁3 面錯誤發生時,TLB替換處理機128產生頁面錯誤中斷 且OS核心處理124之記憶體管理單元1:33處理頁面錯 誤0 同調處理機126之TLB遺漏異常處理單元129及儲存 異常處理單元130執行同調控制,以使得僅註冊於本地 端處理器101中之TLB 107中之實體位址保持在快取記 憶體106中。因此,當同調處理機126執行TLB替換時, 將清除由TLB項目(亦即將作為犧牲者逐出且丟棄之註 19 201234180 冊資訊)覆蓋之貫體頁,亦即自快取記憶體複製回存 (copied back)且無效化。此外,對由添加之tlb項目 覆蓋之實體頁之讀取/寫入/執行權限(亦即註冊資訊) 受到互斥約束,該互斥約束係排除對遠端處理器1〇1之 TLB 107中之實體頁的存取權限。互斥約束之實例包括 寫入無效,特定言之,MESI協定之約束。MESI協定係 分類為寫入無效類型之同調協定。亦有寫入更新類型。 兩種類型皆可使用。MESI協定之約束描述如下。若添 加該約束,則不需要處理同調,直到發生TLB遺漏。因 為基於VM之共享記憶體技術互斥地快取保持於頁表中 之邏輯頁,所以在相同實體頁映射至不同邏輯頁之情況 下,該技術無法解決同調。 當TLB項目亦即註冊資訊被替換時,亦即為Tlb遺 漏中斷及儲存中斷取代或更新時,提供符合MESI協定 之約束之讀取/寫入/執行權限。對於其中由硬體協助頁 表搜尋之處理器,TLB僅為其中快取頁表之一部分的一 者,而對於使用由軟體控制之TLB之圖示於第2圖中的 快取記憶體同調控制,僅存取權限在TLB 1 〇7中設定, 該存取權限與來自記錄於頁表中之存取權限中之mesi 協定的互斥約束相對應。因此’記錄於TLB 107中之存 取權限與記錄於系統記憶體1 〇3之頁表丨〇8中之存取權 限或對存取權限添加約束之存取權限相同。 對於TLB遺漏中斷或儲存中斷,本地端處理器I。〗搜 尋TLB項目(亦即需要更新之遠端處理器丨〇1之註冊資 20 201234180 訊)以便藉由參考TLB目錄記憶體以符合meSI協 定之互斥約束。為了防止複數個處理器1〇1同步更新 TLB目錄記憶體121,可以較佳地將使用旗號之順序存 取用於存取TLB目錄記憶體1;21。TLB目錄記憶體121 可較佳地由内容可定址記憶體(c〇ntent addressable memory; CAM)實施。對於CAM,含有實體頁編號及讀 取/寫入/執行存取許可位元之搜尋字及其中處理器ID及 TLB項目編號相連之搜尋字為CAM之位址輸入。用於 CAM存取中之匯流排可較佳地為獨立於記憶體匯流排 且專用於CPU之匯流排。該匯流排之實例包括裝置控制 暫存器(device control register; DCR)匯流排。 第3圖示意圖示TLB目錄記憶體121之配置。tlb目 錄記憶體12 1留存項目(亦即,處理器i 〇丨之^ w 中之註冊資訊)以允許處理器1〇1中之每一者在無處理 器間中斷之情況下追蹤項目(亦即,其他處理器之 TLB 107之註冊資訊)。控制快取記憶體以使得僅允許項 目之頁面(亦即,在處理器1〇1中之每一者之tlb 中註冊的註冊資訊)得以快取,因此使每一快取記憶體 中之頁面之使用狀態得以藉由搜尋TLB〖〇7決定。將 TLB目錄s己憶體121映射至全局位址空間以便允許所有 處理器101對TLB目錄記憶體121存取。丁LB目錄記憶 體121中之每一項目包括由vs(有效狀態)指示之有效 狀態資訊300、由pPN (實體頁編號)指示之實體頁編 號資訊3〇1及由R/W/Ep(讀取/寫入/執行保護)指示之 21 201234180 讀取/寫入/執行存取權限保護資訊3 02。該等資訊為自保 持於所有處理器101之TLB 107中之相應資訊複製之資 訊。在左端,指示由處理器ID及TLB項目編號之組合 形成之TLB目錄記憶體121的位址;在右端,指示對應 於處理器0至處理器N之項目之群組。使用TLB目錄記 憶體121搜尋處理器1〇1之TLB 107中之實體頁編號, 因此使在不同程序之中的同調能夠得以處理。較佳地, TLB目錄δ己憶體121可藉由由CAM實施而更加快速以 達成接下來兩個搜尋操作:一個搜尋操作為搜尋匹配實 體頁編號且具有寫入許可之頁面,而另一個搜尋操作為 搜尋匹配實體頁編號且具有讀取、寫入,或執行許可之 頁面。在搜尋中’ CAM之搜尋字輸入含有實體頁編號及 存取頁面之許可’並且其中處理器ID與TLB項目編號 相連之輸入係在CAM之位址輸入中之輸入。由處理器 佔據之匯流排,諸如DCR匯流排係適合於用於存取cam 之匯流排。 第4圖為示意圖示根據本發明之一個實施例之用於控 制快取汜憶體同調的方法之流程圖(4〇〇)。此方法可由處 理器ιοί達成,該處理器10丨之TLB經由軟體控制,如 第2圖中所不。當應用程式存取快取記憶體(步驟4〇工) 時程序開始’並且處理器101執行TLB搜尋(步驟4〇2)。 當在TLB搜尋中發生命中時’處理器1〇1搜尋命中tlb .項目之快取記憶體標籤(步驟4〇3 )。當在快取記憶體標 籤搜尋中發生命令日夺,處理器1〇1指示對快取記憶體進 22 201234180 行存取並且對快取記憶體進行存取(步驟404)。冬 取記憶體標籤搜尋中未發生命中而發生快取記憶:標: 遺漏時’處理器1()1指示對系統記憶體進行存取並:對 系統§己憶體進行存取(步驟4〇5 )。當在搜 、, 驟402)中未發生命中而發生TLB中斷日夺處二(ι〇步 1 決定該TLB中斷是否為頁面錯誤(步驟406)。當決— TLB中斷並非頁面錯誤時,亦即㈣項目(亦即註^ 訊)之頁面存在於頁表中(步驟4〇6中之「否」)時斤 理器1CH使用同調處理機執行TLB遺漏異常處理或儲: 異常處理之子常式(步驟術)。當決定TLB中斷為頁 面錯誤(步驟406中之「是」)時,處理器ι〇ι產生頁面 錯誤中斷並且使用〇s核心處理之記憶體 頁面錯誤處理之子常式(步驟彻早讀仃 第5圖為圖示對犧牲者TLB項目的逐出處理之流程圖 ()該犧牲者TLB項目亦即在由同調處理機之tlb 遺漏:常處理及儲存異常處理(見第4圖中之步驟4〇7) 之:㊉式中的註冊資訊。由同調處理機進行之KB遺漏 “處理之子常式開始於進入tlb遺漏異常處理(步驟 5〇1),而由同調處理機之儲存異常處理之子常式開始於 進入儲存異常處理(步驟502 )。對於TLB遺漏異常處 理’因為無匹配位址之$目(亦即註冊資訊)存在於TLB 中所以處理器101執行導入匹配項目之丁 LB替換, 亦即將註冊資訊自頁表1〇8導入至TLB 1〇7 (步驟 )此時’ TLB目錄記憶體121更新項目亦即註冊 23 201234180 資訊。當執行TLB替換睹,由_ 、仃τ換時,處理器1G1清除(複製回存 且無效化)本地端資料快 。隐體線,該本地端資料快 取記憶體線屬於由犧牲者TLB招n , 、 I i LB項目(亦即,待逐出且丟 棄之註冊資訊)覆蓋之實 1遐負(步驟504 )。此舉使得僅 只項目(亦即,註冊於TT R由 几6中之註冊資訊)得以可靠地 快取於本地端處理器中,且 u此’同調控制之需要可簡 單地藉由在TLB遺漏中斷·式紗―上 甲斷次儲存中斷之狀況下檢查遠 端處理器之TLB來決定。A牛眺“ 疋在步驟504之後,處理器ι〇1 決定造成TLB遺漏中斷或储存中斷之記憶體存取為資 料存取還是指令存取(步驟如)。當決定該記憶體存取 為資料存取時,處理器1 〇丨冷> ^、 处益1U1進仃至MESI模擬處理之子BRIEF DESCRIPTION OF THE DRAWINGS Figure 1 is a block diagram showing a multiprocessor system 100 that can be used to achieve cache coherency control in accordance with the present invention. The multiprocessor system 100 includes a plurality of processors 101, a memory bus 1 〇 2, and a system memory 103. The processor 101 is connected to the system memory 103 by a memory bus 〇2. Each of the (four) 101 includes a CPU 104, an MMU 105, and a cache memory 1〇6. MMU 1〇5 includes TLB 1〇7. The cache memory 1〇6 in the processor 101 holds the contents of the system memory 1〇3: part. For an SMP system such as a multiprocessor system, the processor 101 can read information from the system memory 并且1〇3 and write the data to the system memory, and needs to make the system memory 丨03 and the cache memory. The information and instructions in 106 are the same. Preferably, system memory 103 can have virtual page 16 201234180 address effectively mapped in system memory 1() 31 with page table 108β by using a plurality of items, ie, fragments of registration information in page table 108. To the physical address in system memory t. The system memory 103 includes a memory controller 1〇9 and exchanges and stores information related to the external storage device 12Q connected to the system memory 1G3, and also stores the information and stores it to the outside. Device 120 writes information. Each of the processors 101 can convert the virtual address in the instruction or data into the system memory 103 by copying the information contained in each of the items in the page 〇8 using the TLB 107. Physical address. Since TLB 1〇7 provides address information in the memory space, it is important to maintain the coherence between TLBs 1〇7 in multiprocessor system 100 in order to ensure proper operation of TLB 1〇7. Figure 2 is a block diagram showing a processor block having a cache memory coherent control system in accordance with one embodiment of the present invention. The cache memory 106 of the processor i 指令 includes an instruction cache 丨〇6, and a data cache memory 106". The processor 101 is connected to the TLB directory memory m, and all of the processors 1 can access the TLB directory memory 121 except for the 5 memory bus 102. The TLB directory memory 1S1 is one of copying information such as a physical page number, read/write/execute access authority, and valid state held in the TLB 107 of all the processors 101 and mapping the copied information. The global address can be referenced by the CPU 104 of all the processors 101, so that the local end processor 101 can check the TLB of the remote processor 101 without interrupting the inter-processor interrupt to the remote processor 101. 107 content. Each of the CPUs 104 has an operation mode (user mode) for executing an application program (AP) process 122, an operation mode 17 for executing the OS core process 124, and a management mode (administrator 桓10, ,,) Execute the operating mode of the interrupt handler. The coherent processor 126 is executed in a second mode from > The TLB controller 123 includes a TLB search unit 12s R _ and a coherency processor 126 for accessing the cache memory when the AP process 22 accesses the cache memory ι 6 or when the 〇 s core 24 accesses the cache memory The TLB search is performed at the time of the body 106, and the coherency processor 126 is configured to perform the registration information processing of the TLB 1〇7 when no interruption occurs in the search and an interruption occurs. The coherent processor 126 is external to the OS core processing 124 for processing page faults, as shown in FIG. TLB S Λ ^ r-. The first 125 includes the cache memory tag search unit. The 7 J. capture 5 recall tag search unit (7) is used to search for the cache tag in the middle of the search. When the cache memory tag is searched for in the middle of life, the cache memory search list is shown 127. The process 122 accesses the cache memory 1 () 6. When the cache memory tag is missing in the cache memory, the cache memory tag search unit 127 instructs the Ap process 122 not to access the cache memory 106' and access the system memory. 1 〇 3. The coherent processor 126 includes a TLB replacement processor 128, a missing exception processing list 70 129, and a storage exception processing unit 13G. The TLB replacement processor 128 includes a page table search unit 131 and a page fault decision unit 132. In the case where the TLB interrupt is found by the TLB search unit 125, the page table search unit 131 performs a search on the page table ι8 in the system memory 1〇3. The page error determining unit 132 determines whether or not a page fault has occurred by the search performed by the page table search unit. When the page fault decision sheet 18 201234180 decides that the free page table search unit 131 is executed, that is, the search for the stub is not page fault tlb m * when the page exists in the page table 108, the TLB misses the exception processing line coherency control. In the case where the two-time processing unit 130 is in the page table 108 and there is no one hundred months, a page fault occurs in the next TLV project page, and a TLB interrupt is not found in the TLB search, wherein the matching address item (ie, registration) Information) The situation that does not exist in the TLB is called "TLB Missing Interruption", and the item matching the address (that is, the registration information) exists in (10), but the access authority is invalid (4) "Storage towel break ^ Then, the missing exception processing unit 129 processes the TLB miss interrupt, and the storage exception processing unit 130 processes the storage interrupt. Because coherent processor 126 performs coherent control when no page faults occur, this technique differs from VM-based shared memory technology, which performs coherence when a page fault occurs. The 〇S core processing 124 includes a memory management unit Π3. When the page fault determination unit U2 performs a search decision page 3 face error by the page table search unit 131, the TLB replacement handler 128 generates a page fault interrupt and the memory management unit 1:33 of the OS core process 124 processes the page error 0 coherence. The TLB Missing Exception Handling Unit 129 and the Store Exception Handling Unit 130 of the processor 126 perform coherency control such that only the physical address registered in the TLB 107 in the local side processor 101 remains in the cache memory 106. Therefore, when the coherent processor 126 performs the TLB replacement, the overlaid page covered by the TLB item (which will be eviction and discarded as a victim) will be cleared, that is, the self-cache memory is copied back to the memory. (copied back) and invalidated. In addition, the read/write/execute permission (ie, registration information) of the physical page covered by the added tlb item is mutually exclusive, and the mutual exclusion constraint is excluded from the TLB 107 of the remote processor 1〇1. The access rights of the physical page. Examples of mutually exclusive constraints include write invalidation, specifically, the constraints of the MESI protocol. The MESI protocol is classified as a coherent protocol that writes invalid types. There are also write update types. Both types are available. The constraints of the MESI Agreement are described below. If you add this constraint, you don't need to handle the coherence until a TLB miss occurs. Because VM-based shared memory technology mutually exclusive caches logical pages held in page tables, this technique cannot resolve cohomology if the same physical page is mapped to a different logical page. When the TLB project, ie, the registration information is replaced, that is, when the Tlb miss interrupt and the storage interrupt are replaced or updated, the read/write/execute authority conforming to the MESI agreement is provided. For processors that are searched by the hardware-assisted page table, the TLB is only one of the cached page tables, and the cache coherency control is shown in Figure 2 for the use of the software-controlled TLB. The access-only access is set in TLB 1 〇 7, which corresponds to the mutual exclusion constraint from the mesi protocol in the access rights recorded in the page table. Therefore, the access rights recorded in the TLB 107 are the same as the access rights recorded in the page table 8 of the system memory 1 〇 3 or the access rights to the access rights. For TLB miss interrupt or storage interrupt, local processor I. The search for the TLB project (that is, the registration of the remote processor 需要1 20 201234180) is required to reference the TLB directory memory to conform to the mutual exclusion constraint of the meSI protocol. In order to prevent the plurality of processors 101 to synchronously update the TLB directory memory 121, it is preferable to use the order of the flag for accessing the TLB directory memory 1; The TLB directory memory 121 can preferably be implemented by a content addressable memory (CAM). For CAM, the search word containing the physical page number and the read/write/execute access permission bit and its search processor ID and TLB item number are the address input of the CAM. The bus used in the CAM access may preferably be a bus that is independent of the memory bus and dedicated to the CPU. Examples of the bus bar include a device control register (DCR) bus. Figure 3 is a schematic diagram showing the configuration of the TLB directory memory 121. The tlb directory memory 12 1 retains the item (ie, the registration information in the processor i) to allow each of the processors 1〇1 to track the item without interruption between processors (also That is, the registration information of the TLB 107 of other processors). Controlling the cache memory so that only pages of the item (i.e., registration information registered in tlb of each of the processors 1〇1) are cached, thus causing pages in each cache memory The state of use can be determined by searching for the TLB [〇7]. The TLB directory s-resonance 121 is mapped to the global address space to allow all processors 101 to access the TLB directory memory 121. Each item in the D-LB directory memory 121 includes valid status information 300 indicated by vs (active status), physical page number information indicated by pPN (physical page number) information 3〇1, and R/W/Ep (read Fetch/Write/Execute Protection) Indication 21 201234180 Read/Write/Execute Access Rights Protection Information 3 02. This information is the information that is reproduced from the corresponding information in the TLB 107 of all processors 101. At the left end, the address of the TLB directory memory 121 formed by the combination of the processor ID and the TLB item number is indicated; at the right end, the group corresponding to the items of the processor 0 to the processor N is indicated. The TLB directory memory 121 is used to search for the physical page number in the TLB 107 of the processor 101, thus enabling coherency in different programs to be processed. Preferably, the TLB directory δ mnemonic 121 can be implemented more quickly by the CAM to achieve the next two search operations: one search operation is to search for a matching page number and has a write permission page, and another search The action is to search for pages that match the physical page number and have read, write, or execute permissions. In the search, the search input word of the CAM contains the physical page number and permission to access the page' and the input in which the processor ID is connected to the TLB item number is input in the address input of the CAM. A bus that is occupied by the processor, such as a DCR bus, is suitable for the bus for accessing the cam. Fig. 4 is a flow chart (4A) schematically showing a method for controlling coherency of a cache memory according to an embodiment of the present invention. This method can be achieved by the processor ιοί, the TLB of the processor 10 is controlled via software, as shown in Figure 2. The program starts when the application accesses the cache memory (step 4 is completed) and the processor 101 performs the TLB search (step 4〇2). When the lifetime is in the TLB search, the processor 1〇1 searches for the memory tag of the hit tlb. item (step 4〇3). When a command day occurs in the cache memory tag search, the processor 101 indicates an access to the cache memory and access to the cache memory (step 404). In the winter memory tag search, there is no life in the cache memory: the standard: When the processor is missing, 'processor 1 () 1 indicates access to the system memory and: access to the system § memory (step 4〇 5). When the TLB interrupt is not generated in the search, step 402), the TLB interrupt is taken (the ι step 1 determines whether the TLB interrupt is a page fault (step 406). When the decision - the TLB interrupt is not a page fault, that is, (4) The page of the project (ie, the note) exists in the page table ("No" in step 4〇6). The processor 1CH uses the coherent processor to perform TLB miss exception handling or storage: the subroutine of exception handling ( When the TLB interrupt is determined to be a page fault ("YES" in step 406), the processor ι〇ι generates a page fault interrupt and uses the subroutine of the memory page error handling of the 〇s core processing (step early) Reading Figure 5 is a flow chart illustrating the eviction process for the victim TLB project () The victim TLB project is also missing from the tlb of the coherent processor: normal processing and storage exception handling (see Figure 4) Step 4〇7): Registration information in the ten formula. KB omission by the coherent processor “The subroutine of the processing begins with the entry of the tlb omission exception processing (step 5〇1), and the storage exception of the coherent processor The handling of the child's routine begins with Store exception handling (step 502). For TLB miss exception handling 'because the $ mesh (ie registration information) of the unmatched address exists in the TLB, the processor 101 performs the LB replacement of the imported matching item, and will also register the information from Page 1〇8 is imported to TLB 1〇7 (step) At this time, the TLB directory memory 121 update item is also registered 23 201234180 information. When TLB replacement is performed, when _, 仃τ is changed, processor 1G1 is cleared ( Copy and restore and invalidate) Local data is fast. The hidden line, the local data cache memory line belongs to the victim TLB, n, I i LB project (that is, the registration information to be evicted and discarded) The coverage is 1遐 (step 504). This allows only the project (that is, the registration information registered in TT R from 6) to be reliably cached in the local processor, and this is the same The need for control can be determined simply by checking the TLB of the remote processor in the event that the TLB misses the interrupted yarn, and the storage is interrupted. A calf "After step 504, the processor ι〇1 Decided to cause a TLB missed interruption or Whether the memory access of the interrupt is a data access or an instruction access (steps). When it is determined that the memory access is a data access, the processor 1 is cold > ^, and the benefit 1U1 is entered into the MESI. Simulated child
常式506 (步驟505中之「咨刺. L 甲之資枓」);且當決定該記憶體 存取為指令存取時’處理器1G1進行至指令快取記憶體 同調處理之子常式507 (步驟5〇5中之「指令」)。 如上文簡要所述,對於TLB遺漏中斷及儲存中斷兩 者’在替換或更新TLB項目(亦即,註冊資訊)時,將 符合如下所述用於互斥約束(例如,寫入無效)之mesi 協定之讀取/寫入/執行權限設定在本地端TLB與遠端 TLB之間。 •共享唯讀資料 複數個處理器可共享讀取且執行對相同實體頁之權 限。若在資料讀取或指令提取中發生TLB中斷且遠端處 理器具有對只體頁寫入之權限,則由IP I向遠端處理器 通知清理命令,並且使彼遠端處理器清除對彼實體頁寫 24 201234180 入之權限。 •寫入資料之互斥控制 當處理器具有對實體頁寫入之權限時,其他處理器不 具有對彼頁面之任何種類之存取權限。換言之遠端tlb 不具有對本地g TLB具有寫入權限之實體頁之任何種 類之存取。因此’當對寫人之存取造成則遺漏中斷或 儲存中斷時,檢查遠端TLB以決定遠端處理器是否具有 對實體頁之存取權限;若遠端處理器具有該存取權限, 則利用IPI使遠編處理器自遠端快取記憶體清除實體頁 之資料。 第6圖為由軟體控制施加MESI協定約束之作為一實 例中的MESI模擬處理的流程圖(6〇〇)。當在第5圖中決 定記憶體存取為資料存取時(在步驟5〇5中),處理器 101進行至MESI模擬處理之子常式5〇6並且開始彼處 理(步驟601)。首先,處理器1〇1決定造成TLB中斷 之錯誤存取疋為資料寫入還是讀取(步驟6〇2 )。當決定 該錯誤存取為資料讀取時,處理器1〇1由頁表1〇8中頁 表項目(page table entry; PTE)之使用者唯讀(UR)及管理 員唯讀(SR)位元遮住項目(亦即’對應於在本地端TLB 及TLB目錄記憶體121中之錯誤存取之實體頁的註 冊資讯)之讀取(R)屬性,並且將該讀取(R)屬性設定開 啟(步驟603 )。然後’處理器ι〇1在TLB目錄記憶體 121中搜尋錯誤存取之實體頁並且決定遠端tlb是否具 有對彼實體頁寫入(W)之權限(步驟6〇4)。當決定遠端 25 201234180 TLB不具有寫入(W)之權限時(步驟6〇4中之「否」), 處理結束(步請)。冑決定遠端TLB具有寫入㈤之 權限時(步驟6〇4中之「是」),處理器1〇1經由Ιρι向 遂端處理H通知清理命令並且造成遠端處理器清除對實 體頁寫人之權限。亦即,遠端處理器將資料快取記憶體 複製回存並且禁用項目(亦即,對應於遠端TLB中之實 體頁之註冊資訊)之寫入(W)屬性(步驟6〇6 )。自邏輯 至實體位址之轉換留存在遠端TLB中之項目中。隨後, 處理器101清除項目(亦即,對應於TLB目錄記憶體 121中之遠端TLB之實體頁的註冊資訊)之w屬性(步 驟607),並且處理結束(步驟 當決定(在步驟602中)該錯誤存取為寫入時,處理 器1〇1由頁表108中之頁表項目(page table entin stepy; PTE)之使用者寫入(UW)及管理員寫入(sw)位元遮住項 目(亦即’對應於在本地端TLB 1 07及TLB目錄記憶體 121中之錯誤存取之實體頁的註冊資訊)之寫入(w)屬性 並且將該寫入(W)屬性設定開啟(步驟609 )。然後,處 理益101在TLB目錄記憶體121中搜尋錯誤存取之實體 頁並且決定遠端TLB是否具有對彼實體頁讀取(R)、寫 入(W)或執行(X)之權限(步驟610)。當決定遠端TLB 不具有讀取(R)、寫入(W)或執行(X)之權限時(步驟610 中之「否」),則處理結束(步驟605 )。當決定遠端TLB 具有讀取(R)、寫入(W)或執行(X)之權限時(步驟610中 之「是」)時,處理器1〇1經由IPI將清除命令通知遠端 26 201234180 處理器,並且在不向遠媲# ^ ^處理益提供對實體頁之存取權 &情況下造Μ端處理器清除來自遠端快取記憶體之 實體頁資料。亦即’遠端處理器將資料快取記憶體複製 回存且無效化,並且荦用珀 rThe routine 506 ("the thorns of the L-A" in step 505); and when the memory access is determined to be an instruction access, the processor 1G1 proceeds to the subroutine of the instruction cache memory coherent processing 507 ("Command" in step 5〇5). As briefly mentioned above, for both TLB miss interrupts and store interrupts, when replacing or updating a TLB entry (ie, registration information), it will conform to the mesi for mutual exclusion constraints (eg, write invalid) as described below. The protocol read/write/execute permission is set between the local end TLB and the remote TLB. • Shared Read-Only Data A number of processors can share reads and execute permissions on the same physical page. If a TLB interrupt occurs in the data read or instruction fetch and the remote processor has the right to write to the body page, the IP I notifies the remote processor of the cleanup command and causes the remote processor to clear the other Entity page write 24 201234180 Access rights. • Mutually exclusive control of written data When a processor has permission to write to a physical page, other processors do not have any kind of access to each page. In other words, the remote tlb does not have any type of access to the physical page that has write access to the local g TLB. Therefore, when the access to the writer is caused by an interruption or a storage interruption, the remote TLB is checked to determine whether the remote processor has access to the physical page; if the remote processor has the access authority, The IPI is used to enable the remote processor to clear the physical page data from the remote cache. Fig. 6 is a flow chart (6〇〇) of the MESI simulation process in an example in which the MESI agreement is imposed by the software control. When it is determined in Fig. 5 that the memory access is data access (in step 5〇5), the processor 101 proceeds to the sub-routine 5〇6 of the MESI analog processing and starts the processing (step 601). First, the processor 101 determines whether the erroneous access causing the TLB interrupt is a data write or a read (step 6 〇 2). When it is determined that the error access is data reading, the processor 101 uses the user-only read (UR) and the administrator-only read (SR) of the page table entry (PTE) in the page table 1-8. The bit masks the read (R) attribute of the item (ie, the registration information corresponding to the erroneously accessed physical page in the local TLB and TLB directory memory 121), and reads (R) The attribute setting is turned on (step 603). The processor 〇1 searches the TLB directory memory 121 for the erroneously accessed physical page and determines whether the remote tlb has the right to write (W) to the physical page (step 6〇4). When it is determined that the remote end 201234180 TLB does not have the right to write (W) ("No" in step 6〇4), the processing ends (step request).胄 When the remote TLB has the permission to write (5) (YES in step 6〇4), the processor 1〇1 processes the H notification cleanup command via the Ιρι and causes the remote processor to clear the write to the physical page. Human authority. That is, the remote processor copies the data cache memory back to the memory and disables the write (W) attribute of the entry (i.e., the registration information corresponding to the physical page in the remote TLB) (step 6-6). The transition from logical to physical address is left in the project in the remote TLB. Subsequently, the processor 101 clears the w attribute of the item (ie, the registration information corresponding to the physical page of the remote TLB in the TLB directory memory 121) (step 607), and the process ends (step when deciding (in step 602) When the error access is write, the processor 1〇1 is written by the user of the page table entry (PTE) in the page table 108 (UW) and the administrator writes (sw) the bit. The write (w) attribute of the item (ie, the registration information corresponding to the erroneously accessed physical page in the local end TLB 107 and the TLB directory memory 121) is hidden and the write (W) attribute is set. Turning on (step 609). Then, the processing benefit 101 searches for the erroneously accessed physical page in the TLB directory memory 121 and determines whether the remote TLB has read (R), write (W) or execute on the physical page ( X) authority (step 610). When it is determined that the remote TLB does not have the right to read (R), write (W), or execute (X) ("NO" in step 610), the process ends (step 605) when determining that the remote TLB has the right to read (R), write (W), or execute (X) (step 610) In the case of "Yes", the processor 101 indicates the clear command to the remote terminal 201234180 via the IPI, and does not provide access to the physical page to the remote device. The end processor clears the physical page data from the remote cache memory, that is, the remote processor copies the data cache memory back to the memory and invalidates it, and uses the device.
不用項目(亦即,對應於遠端TLB 中之實體頁之註冊資訊)之R、Wn 、 貝代)之R、W及X屬性(步驟611)。 自邏輯至實體位址之轉換留存在遠端Μ中之項目 中。隨後,處理器HH清除項目(亦即,對應於tlb目 錄記憶體121中之遠端則之實體頁的註冊資訊)之 R W及X屬f生(步驟612),並且處理結束(步驟6〇8 )。 以此方式,執行藉由根據MESI協定約束而設定讀 取寫人執行權限之探聽過渡(亦即,使用之探 聽删除)。增加了限制出現廣播探聽請求之決定步驟,對 於覆蓋彼資料之實體頁亦可註冊於遠#則中之情 況’該決定步驟在經由硬體實施Mesi協定時是一個問 題。因此’與由硬體實施MESI協定相比其中廳81 協定約束由軟體控制施加之MESI模擬處理可具有更高 可擴縮性。 利用根據本發明之-個實施例之快取記憶體同調控制 之同調處理機,資料快取記憶體之間的同調以及指令快 取記憶體與資料快取記憶體之間的同調兩者皆可得到控 制。此舉藉由當對具有寫人許可權限之可寫人頁面的指 令提取導致TLB遺漏中斷時,使指令快取記憶體線無效 達成。類似於Linux,為了支援例如動態鏈接程式庫, 指令快取記憶體需要與資料快取記憶體同調。對於 27 201234180The R, W, and X attributes of R, Wn, and Bed's (e.g., corresponding to the registration information of the physical page in the remote TLB) are not used (step 611). The conversion from logical to physical address remains in the project in the remote location. Subsequently, the processor HH clears the RW and X genre of the item (i.e., the registration information corresponding to the physical page of the remote end in the tlb directory memory 121) (step 612), and the process ends (step 6〇8). ). In this way, a snoop transition (i.e., use of the snoop delete) of the read writer execution authority is set by constraining according to the MESI agreement. The decision step of restricting the occurrence of a broadcast snoop request is added, and the physical page covering the data may also be registered in the far case. The decision step is a problem when the Mesi agreement is implemented via hardware. Therefore, the MESI analog processing applied by the software control can be more scalable than the hardware-implemented MESI protocol. With the coherent processor of the cache coherency control according to an embodiment of the present invention, the coherence between the data cache and the coherence between the instruction cache and the data cache can be used. Get control. This is accomplished by invalidating the instruction cache memory line when the instruction fetch for the writable person page with the permission of the writer permits the TLB miss interrupt. Similar to Linux, in order to support, for example, a dynamic link library, the instruction cache needs to be co-located with the data cache. For 27 201234180
Linux,僅當提取使用者空間中之可寫入頁面時,才需要 使指令快取記憶體無效。 第7圖為圖示經由軟體控制之指令快取記憶體同調處 理之流程圖(7〇〇)。當在第5圖中決定記憶體存取為指令 存取時(在步驟505中)’處理器101進行至指令快取記 憶體同調處理之子常式507並且開始彼處理(步驟 701)。首先’處理器101決定頁表108之PTE是否具有 對由指令提取而使TLB遺漏中斷產生之實體頁的使用 者寫入許可權限(步驟702)。當決定PTE具有使用者寫 入許可權限(步驟702中之「是」)時,處理器ι〇1決定 遠端TLB是否具有對實體頁之使用者寫入許可權限(步 驟703 )。當決定遠端TLB具有使用者寫入許可權限(步 驟703中之「是」)時,處理器1〇1藉由Ιρι將清理命令 通知遠端處理器並且造成遠端處理器清除使用者寫入許 可權限。亦即,遠端處理器為資料快取記憶體提供資料 快取記憶體區塊儲存(data cache block store; dcbst)指 令,儲存資料快取記憶體線,並且禁用遠端TLB中之w 屬性(步驟704)。自邏輯至實體位址之轉換留存在遠端 TLB中之項目中。然後,由於在步驟7〇3中決定tlb不 具有使用者寫入許可權限之情況(步驟7〇3中之「否」), 處理器ιοί藉由指令快取記憶體全等類無效(instructi〇n cache congruence Class invalidate; “。叫使本地端指令快 取記憶體無效(步驟705 )。隨後,由於在步驟7〇2中決 定PTE不具有使用者寫入許可權限之情況(步驟7〇2中 28 201234180 之「否」),處理器101由百* ^ 由頁表項目(PTE)之使用者執行(UX)Linux only needs to invalidate the instruction cache when extracting a writable page in user space. Figure 7 is a flow chart showing the coherent processing of the memory of the instruction cache via software control (7〇〇). When it is determined in Fig. 5 that the memory access is an instruction access (in step 505), the processor 101 proceeds to the subroutine 507 of the instruction cache coherency processing and starts the processing (step 701). First, the processor 101 determines whether the PTE of the page table 108 has a write permission to the user of the physical page that caused the TLB miss interrupt to be generated by the instruction fetch (step 702). When it is determined that the PTE has the user write permission ("YES" in step 702), the processor ι〇1 determines whether the remote TLB has a write permission to the user of the physical page (step 703). When it is determined that the remote TLB has the user write permission ("YES" in step 703), the processor 110 indicates the cleanup command to the remote processor by the operation and causes the remote processor to clear the user write. Permissions. That is, the remote processor provides a data cache block store (dcbst) instruction for the data cache memory, stores the data cache memory line, and disables the w attribute in the remote TLB ( Step 704). The conversion from logical to physical address remains in the project in the remote TLB. Then, since it is determined in step 7〇3 that tlb does not have the user write permission (NO in step 7〇3), the processor ιοί is invalidated by the instruction cache memory class (instructi〇) n cache congruence Class invalidate; ". The local command cache memory is invalid (step 705). Subsequently, since it is determined in step 7〇2 that the PTE does not have the user write permission (step 7〇2) 28 201234180 "No"), processor 101 is executed by a user of the page table entry (PTE) (UX)
及管理員執行(SX)位开:A 、M疋遮住項目(亦即,對應於由本地 端 TLB 107 Ά TLB 曰紅 ^ 13目錄C憶體121中之指令提取產生 TLB遺漏中斷的實體頁的註冊資訊)之執行⑻屬性並且 將執行(X)屬性设定為開啟(步驟繼)。然後處理結 束(步驟707 )。 TLB目錄s己憶體丨2丨係使用旗號順序存取。此舉保護 TLB目錄3己憶冑! 2 i免於由複數個處理胃丄〇丄同時更 新。第如何將旗號用於同調處理機之進入及退 出之流程( 800 )。對於同調處理機之進入係為處理開始 (步驟801 )、獲取旗號(步驟8〇2 ),並且處理結束(步 驟803)»對於同調處理機之退出係為處理開始(步驟 804 )、提供旗號之通知(步驟8〇5 ),並且處理結束(步 驟806 )〇儘管整個tlb目錄記憶體121可由單一旗號 互斥存取,但是為了增強複數個處理器中之每一者之可 擴縮性並且允許該等處理器同步存取TLB目錄記憶體 12 1 ’而將旗號分割以形成多個旗號並且將該等旗號分配 至實體頁所分割成為的各個群組中係為較佳的。例如, 虽貫體頁編號除以S時’剩餘數系統為旗號丨D,形成s 旗5虎’並且為每個群組獨立地保護所分割之實體頁。此 時,滿足以下關係: 旗號ID=mod (實體頁編號,S) 其中mod(a,b)表示當a除以b時的餘數。 右將此概念應用於numa(numa為分佈式丘享記憶 29 201234180 體系充)貝!可為各個NUMA節點分配不同旗號。僅當 進行遠端存取時’才可以參考遠端TLB目錄記憶體並且 才可以獲取旗就;否則可以參考本地端TLB目錄記憶體 並且可獲取旗號。 對於NUMA系統’將對處理器及實體記憶體之工作之 分配最佳化以使得對本地端系統記憶體存取之頻率高於 對遠知系統記憶體存取之頻率。對於該NUMA系統之應 用,將TLB目錄記憶體及旗號兩者分佈給NUMA節點 係較佳的。分佈式TLB目錄記憶體記錄本地端系統記憶 體之貫體頁編號,以及快取本地端系統記憶體之處理器 之ID ’並且分佈式旗號保護相應分佈式tlb目錄記憶 體。因此’僅當發生遠端存取時才參考遠端TLB目錄記 憶體及遠端旗號。僅可使用本地端TLB目錄記憶體及本 地端旗號處理其他本地端存取。 由硬體支援之同調之成本對於對本地端系統記憶體之 存取較低’但該成本對於向遠端系統記憶體之存取較 高。為解決此問題,可將便宜探聽匯流排用於對本地端 系統記憶體之存取及根據本發明之快取記憶體同調控制 用於對遠端系統記憶體之存取,而適用於SMP及 NCC-NUMA之延伸混合系統中。換言之由硬體支援之同 調用於對本地端系統記憶體之存取及根據本發明之快取 記憶體同調控制用於對遠端系統記憶體之存取,而可以 配置總體上同調之共享記憶體多處理器系統。第9圖圖 示作為一實例延伸至SMP及NCC-NUMA之混合系統之 30 201234180 同調’、享。己憶體多處理器系統9〇〇。每一節點包括複數 個處理器901、系統記憶體903,以& TLB目錄記憶體 905及旗號處理機9〇6,該系統記憶體9〇3由同調共享匯 流排(亦即,共享匯流排同調SMP 902 )連接至處理器 9〇1,而t亥TLB目錄記憶體9〇5及該旗號處理機9〇6兩 者皆由橋接機構904連接至共享匯流排同調SMP 902。 提供旗號處理機906以允許複數個處理器9()1藉由旗號 以順序存取TLB目錄記憶體9〇5。節點藉* ncc numa 機構9G7彼此連接。因為節點藉由便宜的ncc numa 機構907彼此連接,所以共享記憶體多處理器系統_ 可增加節點之數目,亦即,可改良可擴縮性同時保留較 低之硬體之成本。 右TLB目錄記憶體中之項目之數目並非受限並且本 地端系統記憶體及遠端系統記憶體兩者可自由地相關 聯,則TLB目錄記憶體之大小與處理器之數目成比例增 加。例如,若1024個痄。 個處理态中之每一者具有1〇24個項 目之T L B並且1個js η Λ 個項目為4位元組,則tlb目錄記憶 體之大小經由以下計算為4 " (1024個處理器、*〆,。〜, 、 15 ; ( 1024個項目)* ( 4位元組)叫And the administrator executes (SX) bit open: A, M疋 hides the item (that is, corresponds to the physical page corresponding to the instruction in the local end TLB 107 Ά TLB ^ red ^ 13 directory C memory 121 to generate a TLB miss interrupt. The registration information) executes the (8) attribute and sets the execution (X) attribute to ON (step by step). Processing then ends (step 707). The TLB directory is a sequential access using the flag. This protection protects the TLB directory 3! 2 i is free from multiple treatments for stomach cramps and updates. The first is how to use the flag for the process of entering and exiting the coherent processor (800). The entry into the coherent processor is the processing start (step 801), the acquisition of the flag (step 8〇2), and the end of the process (step 803) » the exit to the coherent processor is the process start (step 804), providing the banner Notifying (step 8〇5), and processing ends (step 806), although the entire tlb directory memory 121 can be accessed by a single flag, in order to enhance the scalability of each of the plurality of processors and allow It is preferred that the processors simultaneously access the TLB directory memory 12 1 'and divide the flags to form a plurality of flags and assign the flags to the respective groups into which the physical pages are divided. For example, although the number of the page numbers is divided by S, the remaining number system is the flag 丨D, forming the s flag 5 tiger' and the divided physical pages are independently protected for each group. At this time, the following relationship is satisfied: Flag ID=mod (physical page number, S) where mod(a,b) represents the remainder when a is divided by b. Right apply this concept to numa (numa is distributed Chuanxiang memory 29 201234180 system charge) shell! Different RANA nodes can be assigned different flags. Only when remote access is performed can the remote TLB directory memory be referenced and the flag can be obtained; otherwise, the local TLB directory memory can be referred to and the flag can be obtained. For NUMA systems, the allocation of processor and physical memory operations is optimized such that the frequency of access to the local system memory is higher than the frequency of access to the remote system memory. For the application of the NUMA system, it is preferable to distribute both the TLB directory memory and the flag to the NUMA node. The distributed TLB directory memory records the local page number of the local system memory, and the ID of the processor that caches the local end system memory and the distributed flag protects the corresponding distributed tlb directory memory. Therefore, the remote TLB directory memory and the remote flag are referenced only when remote access occurs. Only local-side TLB directory memory and local-end flag can be used to handle other local-side access. The cost of co-ordinated support by hardware is lower for access to local system memory 'but the cost is higher for access to remote system memory. To solve this problem, a cheap snoop bus can be used for accessing the local end system memory and the cache coherency control according to the present invention is used for accessing the remote system memory, and is suitable for SMP and NCC-NUMA in an extended hybrid system. In other words, the hardware-assisted homology is used for accessing the local end system memory and the cache coherency control according to the present invention is used for accessing the remote system memory, and the overall coherent shared memory can be configured. Multiprocessor system. Figure 9 shows an example of a hybrid system that extends to SMP and NCC-NUMA. It has a multi-processor system. Each node includes a plurality of processors 901, system memory 903, & TLB directory memory 905 and flag processor 9〇6, the system memory 9〇3 is shared by the coherent bus (ie, shared bus) The coherent SMP 902) is coupled to the processor 〇1, and both the TH TLB directory memory 〇5 and the flag processor 〇6 are connected by the bridging mechanism 904 to the shared bus coherent SMP 902. A flag processor 906 is provided to allow a plurality of processors 9() 1 to sequentially access the TLB directory memory 〇5 by a flag. The nodes are connected to each other by the *ncc numa mechanism 9G7. Since the nodes are connected to each other by the inexpensive ncc numa mechanism 907, the shared memory multiprocessor system _ can increase the number of nodes, i.e., can improve scalability while preserving the cost of lower hardware. The number of items in the right TLB directory memory is not limited and the local system memory and the remote system memory are freely associated, and the size of the TLB directory memory is increased in proportion to the number of processors. For example, if 1024 are. Each of the processing states has a TLB of 1 24 items and 1 js η Λ items are 4 bytes, and the size of the tlb directory memory is calculated as 4 " (1024 processors, *〆,.~, ,15; (1024 items)* (4 bytes) called
MB 為了降低TLB目錄記憶體之大小,例如,如第1 〇圖 中所不’若將系統應用於NUMA系統,則由每一處理器 1〇01分配給遠端系統記憶體(RSM)之用於rSM之TLb 項目1 〇〇2的數目文到限制,並且剩餘之TLB項目用作 31 201234180 分配給本地端系統記憶體(LSΜ)之用於lsm之TLB項目 1003。用於LSM之本地端TLB目錄記憶體包括自 遠端處理器(RP)之TLB目錄複製之項目、被限制之TLb 項目之數目’以及自分配剩餘之TLB項目之本地端處理 器(LP)之TLB目錄複製的項目,並且可以降低該等項目 及數目之大小。特定言之,其中NUMA節點之數目為N, 每CPU之TLB項目之數目為E’並且分配給來自tlb 項目之遠端系統記憶體之項目的數目為R,分配給本地 知系統§己憶體之TLB項目之數目為,並且因此,每 節點之TLB目錄記憶體之項目之數目自e*n減少至 (N-1)*R+1*(E-R)。對於上述實例,當將1024値處理器 分佈給256個NUMA節點並且將4路SMP結構用於每 一節點時,若分配給遠端TLB之TLB項目之數目限制 為16 ’則TLB目錄記憶體之大小經由以下計算為8 i 4 KB : (1020個處理器)( 16個項目)* ( 4位元組)+ ( 4 個處理器)* ( 1008個項目)* ( 4位元組)=8 1.4 KB 當實施在CAM上時,在45 nm半導體技術之狀況下, TLB目錄記憶體所要求之區域之面積僅為1 mm2 u 如上所述’當執行根據本發明之經由軟體之快取記憶 體同調控制時’因為共享記憶體多處理器系統可由便宜 的元件形成’諸如通用元件,所以可將硬體成本控制至 相當於叢集之程度’並且可改良可擴縮性。搜尋僅管理 實體頁之每一處理器之TLB資訊的小規模TLB目錄記 32 201234180 憶體賦能處理複數個程序並且消除改變應用程式之需 要,從而在不產生附加軟體成本之情況下改良可擴縮性。 本發明係如上描述以使用實施例。然而,本發明之技 術範圍不限於針對實施例描述之範圍。可將各種改變或 修改添加至實施例中,並且具有該等改變或修改之模式 亦可包含於本發明之技術範圍中。 【圖式簡單說明】 第1圖為示意圖示根據本發明可用於達成快取記憶體 同調控制之多處理器系統之方塊圖。 第2圖為示意圖示根據本發明之一個實施例之快取記 憶體同調控制系統之方塊圖。 第3圖圖示TLB目錄記憶體之示意配置。 第4圖為不思圖不根據本發明之一個實施例用於控制 快取記憶體同調之方法的流程圖。 第5圖為圖不對由同調處理機執行之TLB遺漏異常處 理及儲存異常處理中备 ^ ^ ? ^ ^In order to reduce the size of the TLB directory memory, for example, as shown in the first figure, if the system is applied to the NUMA system, it is allocated to the remote system memory (RSM) by each processor 1〇01. The number of TLb items 1 〇〇 2 for rSM is limited to the limit, and the remaining TLB items are used as the TLB project 1003 for lsm assigned to the local end system memory (LSΜ) by 31 201234180. The local-side TLB directory memory for the LSM includes the TLB directory replication project from the remote processor (RP), the number of restricted TLb entries', and the local-end processor (LP) that allocates the remaining TLB entries. The TLB catalog replicates the items and can reduce the size of these items and the number. Specifically, where the number of NUMA nodes is N, the number of TLB items per CPU is E' and the number of items assigned to the remote system memory from the tlb item is R, assigned to the local knowledge system § Recall The number of TLB items is, and therefore, the number of items of TLB directory memory per node is reduced from e*n to (N-1)*R+1*(ER). For the above example, when a 1024 値 processor is distributed to 256 NUMA nodes and a 4-way SMP structure is used for each node, if the number of TLB entries allocated to the remote TLB is limited to 16 ', then the TLB directory memory is The size is calculated as 8 i 4 KB by: (1020 processors) (16 items)* (4 bytes) + (4 processors)* (1008 items)* (4 bytes) = 8 1.4 KB When implemented on CAM, in the case of 45 nm semiconductor technology, the area required for the TLB directory memory is only 1 mm2 u as described above 'When performing the software-based cache memory according to the present invention In coherent control, 'because the shared memory multiprocessor system can be formed from inexpensive components such as general purpose components, the hardware cost can be controlled to the extent equivalent to the cluster' and the scalability can be improved. Search for small-scale TLB catalogs that only manage TLB information for each processor on the physical page. 32 201234180 Replica can handle multiple programs and eliminate the need to change applications, thereby improving scalability without additional software costs. Shrinkage. The invention has been described above to use embodiments. However, the technical scope of the present invention is not limited to the scope described for the embodiments. Various changes or modifications may be added to the embodiments, and modes having such changes or modifications may also be included in the technical scope of the present invention. BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 is a block diagram showing a multiprocessor system that can be used to achieve cache coherency control in accordance with the present invention. Figure 2 is a block diagram showing a cache memory coherent control system in accordance with one embodiment of the present invention. Figure 3 illustrates a schematic configuration of the TLB directory memory. Figure 4 is a flow diagram of a method for controlling coherency of a cache memory in accordance with one embodiment of the present invention. Figure 5 is a diagram of the TLB missing exception handling and storage exception handling performed by the coherent processor. ^ ^ ? ^ ^
甲之母—者之子常式中的犧牲者TLB 項目進行逐出處理之流程圖。 第6圖為圖不由同調處理機在几b遺漏異常處理及儲 存異常處理中之每—者 有之子常式中的MESI模擬處理之 流程圖。 第7圖為圖示p]綱南 調處理機之TLB遺漏異常處理及儲存 異常處理中之每_去 有之子常式中的指令快取記憶體同調 處理之流程圖。 33 201234180 第8圖圖示如何將旗號用於同調處理機之進入及退出 之流程。 第9圖圖示延伸至SMP及NCC-NUMA之混合系統之 同調共享記憶體多處理器系統的示意配置。 第1 〇圖圖示LSM之本地端TLB目錄記憶體之示意配 置。 【主要元件符號說明】 100 多處理器系統 Γόι 處理器 102 記憶體 Γ03 系統記憶體 104 CPU ' --一 T05 MMU 106 快取記憶體 「06’ ^旨令快取記憶體 106" 資料快取記憶_體 107 TLB 108 頁表 109 記憶體控制器 120 外部儲存裝置 121 TLB目錄記憶體 122 應用程式(AP、廬理 123 TLB控制器 124 〇s核心處理 125 TLB搜尋單元 126 同調處理機 127 快取記憶體標籤搜 尋單元 128 TLB替換處理機 129 TLB遺漏異常處理 〇〇 單7L 130 储存異常處理單元 131 頁表搜尋單瓦~~ ~~~' 132 負面錯誤決定單元 133 記憶體管理 300 有效狀態資訊 301 實體頁編! 302 讃取/寫入/執行存 取權限保護資訊 4〇〇 ---—. 流程圖 401 步驟 402 403 404 步驟 405 步驟 406 #驟 ~ 407 408 步驟 — 500 流程圖 501 步驟 ~~~ 502 503 步驟 ~~~~~~ 504 步驟 505 步驟 ~ 506 冬常式 507 子常式 ~~~ 34 201234180 600 流程圖 601 步驟 602 步驟 603 步驟 604 步驟 605 步驟 606 步驟 607 步驟 608 步驟 609 步驟 610 步驟 611 步驟 612 步驟 700 流程圖 701 步驟 702 步驟 703 步驟 704 步驟 705 步驟 706 步驟 707 步驟 800 流程 801 步驟 802 步驟 803 步驟 804 步驟 805 步驟 806 步驟 900 共享記憶體多處理 器系統 901 處理器 902 共享匯流排同調 SMP 903 糸統記憶體 904 橋接機構 905 TLB目錄記憶體 906 旗號處理機 1000 本地端TLB目錄記 憶體 1001 處理器 1002 TLB項目 1003 TLB項目 35The flow chart of the eviction process of the victim TLB project in the parent of the child. Fig. 6 is a flow chart showing the MESI simulation processing in the subroutine of each of the missing exception processing and the storage exception processing by the coherent processor. Fig. 7 is a flow chart showing the processing of the coherency processing of the instruction cache in the subroutine of the T) missing exception processing and the storage exception processing in the p] outline processing machine. 33 201234180 Figure 8 shows how to use the flag for the process of entering and exiting the coherent processor. Figure 9 illustrates a schematic configuration of a coherent shared memory multiprocessor system extending to a hybrid system of SMP and NCC-NUMA. Figure 1 illustrates the schematic configuration of the local end TLB directory memory of the LSM. [Main component symbol description] 100 multiprocessor system Γόι processor 102 memory Γ 03 system memory 104 CPU ' -- one T05 MMU 106 cache memory "06 ' ^ to make cache memory 106 " data cache memory _body 107 TLB 108 page table 109 memory controller 120 external storage device 121 TLB directory memory 122 application (AP, processing 123 TLB controller 124 〇s core processing 125 TLB search unit 126 coherent processor 127 cache memory Body Tag Search Unit 128 TLB Replacement Processor 129 TLB Missing Exception Handling Page 7L 130 Storage Exception Handling Unit 131 Page Table Search Single Tile ~~~~~' 132 Negative Error Determination Unit 133 Memory Management 300 Valid Status Information 301 Entity Page !! 302 Capture/Write/Execute Access Rights Protection Information 4〇〇----. Flowchart 401 Step 402 403 404 Step 405 Step 406 #骤~ 407 408 Step - 500 Flowchart 501 Step ~~~ 502 503 Steps ~~~~~~ 504 Step 505 Step ~ 506 Winter 507 Sub-Function ~~~ 34 201234180 600 Flowchart 601 Step 602 Step 603 Step 604 Step 605 Step 606 Step 607 Step 608 Step 609 Step 610 Step 611 Step 612 Step 700 Flowchart 701 Step 702 Step 703 Step 704 Step 705 Step 706 Step 707 Step 800 Flow 801 Step 802 Step 803 Step 804 Step 805 Step 806 Step 900 shared memory multiprocessor system 901 processor 902 shared bus coherent SMP 903 记忆 memory 904 bridging mechanism 905 TLB directory memory 906 flag processor 1000 local end TLB directory memory 1001 processor 1002 TLB project 1003 TLB project 35