201035867

VI. Description of the Invention:

TECHNICAL FIELD OF THE INVENTION

The present invention relates to microprocessors, and more particularly to methods of prefetching data in a microprocessor.

[Prior Art]

Many microprocessors today are capable of using virtual memory, and in particular of employing a memory paging mechanism.
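As a rough illustration of such a paging mechanism — a simplified two-level sketch whose 10/10/12-bit split and 4 KB pages are assumptions for clarity, not the exact x86 page-table format — a virtual address is translated by indexing successive levels of a page-table hierarchy:

```python
# Simplified two-level page-table walk: virtual address -> physical address.
# Assumed layout (illustrative only): 10-bit directory index,
# 10-bit table index, 12-bit page offset (4 KB pages).

PAGE_SHIFT = 12
IDX_BITS = 10

def translate(vaddr, page_directory):
    dir_idx = (vaddr >> (PAGE_SHIFT + IDX_BITS)) & 0x3FF
    tbl_idx = (vaddr >> PAGE_SHIFT) & 0x3FF
    offset = vaddr & 0xFFF
    page_table = page_directory[dir_idx]   # first physical memory access
    phys_page = page_table[tbl_idx]        # second access: reads the PTE
    return (phys_page << PAGE_SHIFT) | offset

# One directory entry pointing at one page table that maps
# virtual page 0x00400 to physical page 0x12345.
pd = {1: {0: 0x12345}}
assert translate(0x00400ABC, pd) == 0x12345ABC
```

Each level of the walk costs a physical memory access; a translation lookaside buffer caches the final virtual-to-physical pairing so that, on a hit, both accesses are skipped.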
As is understood by those skilled in the art, the page tables that the operating system establishes in system memory are used to translate virtual addresses into physical addresses. According to the x86 processor architecture described in the IA-32 Intel® Architecture Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, June 2006 (which reference is incorporated herein in its entirety), the page tables may be arranged in a hierarchical fashion. Specifically, a page table contains a plurality of page table entries (PTEs), each of which stores the physical page address of a physical memory page together with attributes of that physical memory page. A so-called tablewalk takes a virtual memory page address and uses it to traverse the page table hierarchy in order to obtain the page table entry corresponding to that virtual memory page address, so that the virtual address can be translated into a physical address.

Because the latency of a physical memory access is relatively long, and a tablewalk may require multiple accesses to physical memory, performing a tablewalk is quite time-consuming. To avoid this cost, a processor typically includes a translation lookaside buffer (TLB) that stores virtual addresses together with the physical addresses into which they have been translated. However, the TLB is of limited size, and when a miss occurs in the TLB a tablewalk must still be performed. A method of shortening the execution time of tablewalks is therefore needed.

[Summary]

The present invention provides a microprocessor including a cache memory, a load unit, and a prefetch unit.
The load unit is configured to receive a first load request signal, the first load request signal indicating that a first page table entry is being loaded. The prefetch unit is coupled to the load unit and is configured to receive from the load unit a physical address of a first cache line, the first cache line containing the first page table entry specified by the first load request signal; the prefetch unit further generates a first request signal to prefetch the next cache line after the first cache line.

The present invention also provides a method of shortening tablewalk time, applicable to a microprocessor having a cache memory and supporting paged virtual memory. The method includes detecting a first load request signal for a first page table entry. The method further includes, based on the result of detecting the first load request signal, prefetching a second cache line into the cache memory, wherein the second cache line is the next cache line after a first cache line, and the first cache line contains the first page table entry specified by the first load request signal.

The present invention provides another microprocessor, including a cache memory, a load unit, and a prefetch unit. The load unit is configured to receive a first load request signal, the first load request signal indicating that a first page table entry is being loaded. The prefetch unit is coupled to the load unit and is configured to receive from the load unit a physical address of a first cache line, the first cache line containing the first page table entry specified by the first load request signal; the prefetch unit further generates a first request signal to prefetch a second cache line into the cache memory, wherein the second cache line is the cache line immediately preceding the first cache line.

The present invention further provides another method of shortening tablewalk time, applicable to a microprocessor having a cache memory and supporting paged virtual memory.
The method includes detecting a first load request signal for a first page table entry. The method further includes, based on the result of detecting the first load request signal, prefetching a second cache line into the cache memory, wherein the second cache line is the cache line immediately preceding a first cache line, and the first cache line contains the first page table entry specified by the first load request signal.

In order to make the above and other objects, features, and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.

[Embodiments]

Please refer to Fig. 1, which is a block diagram of a microprocessor 100 according to an embodiment of the present invention; the microprocessor 100 is a pipelined microprocessor. The microprocessor 100 includes an instruction cache 102 for providing a plurality of instructions to an instruction translator 104, and the instruction translator 104 translates the received instructions and provides the translated instructions to an instruction dispatcher 106. The instruction dispatcher 106 provides the instructions, which may include memory access instructions (for example, load instructions or store instructions), to a load unit 108. The load unit 108 provides the virtual address 132 specified by a memory access instruction to a translation lookaside buffer 116, and the translation lookaside buffer 116 performs a lookup on the virtual address 132. If the virtual address 132 is present in the translation lookaside buffer 116, the translation lookaside buffer 116 returns to the load unit 108 the physical address 144 into which the virtual address 132 has been translated. If the virtual address 132 is not present in the translation lookaside buffer 116, the translation lookaside buffer 116 generates a miss signal 134 and transmits it to a tablewalk engine 118. The tablewalk engine 118 is coupled to the load unit 108 and the translation lookaside buffer 116.

As shown in Fig. 1, a prefetch unit 122 and a data cache 112 are also coupled to the load unit 108, and a bus interface unit 114 is coupled to the data cache 112. The bus interface unit 114 couples the microprocessor 100 to a processor bus, and the processor bus is coupled to the physical memory 128 of the computer system in which the microprocessor 100 resides. Specifically, the physical memory 128 stores a plurality of page tables, wherein one page table includes a first cache line 124 located at a physical address P and a second cache line 126 located at physical address P+64, and the first cache line 124 and the second cache line 126 each store eight page table entries. In this embodiment the size of a cache line is 64 bytes and the size of a page table entry is 8 bytes, so each cache line can store eight page table entries.

Please refer to Fig. 2, which is a flowchart of the operation of the microprocessor 100 of Fig. 1, illustrating how the cache line that follows the one holding a page table entry loaded by the load unit is prefetched. The flow begins at step 202.

In step 202, when the virtual address 132 is not present in the translation lookaside buffer 116, the translation lookaside buffer 116 generates a miss signal 134 and transmits it to the tablewalk engine 118. Upon receiving the miss signal 134, the tablewalk engine 118 performs a tablewalk in order to obtain the physical address into which the virtual address 132 that missed in the translation lookaside buffer 116 is translated. The tablewalk engine 118 performs the tablewalk by generating a page table entry load request signal (PTE load request) 136, which the tablewalk engine 118 transmits to the load unit 108 in order to load the page table entries needed to perform the address translation. Flow proceeds to step 204.
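Since the embodiment assumes 64-byte cache lines and 8-byte page table entries, the cache line holding a given PTE and the PTE's slot within that line follow directly from the PTE's physical address. A minimal sketch (the helper names are illustrative, not identifiers from the patent):

```python
LINE_SIZE = 64  # bytes per cache line, as in the embodiment
PTE_SIZE = 8    # bytes per page table entry

def pte_line(pte_paddr):
    """Physical address of the cache line containing the PTE."""
    return pte_paddr & ~(LINE_SIZE - 1)

def pte_slot(pte_paddr):
    """Index (0..7) of the PTE within its cache line."""
    return (pte_paddr & (LINE_SIZE - 1)) // PTE_SIZE

# A PTE at physical address P + 24 lies in the line at P, in slot 3,
# alongside seven neighbouring PTEs of the same page table.
P = 0x4000
assert pte_line(P + 24) == P
assert pte_slot(P + 24) == 3
```

This is why a single PTE load implicates a whole line of eight entries: the load unit fetches the line at `pte_line(...)`, and the seven neighbouring PTEs come along with it.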
In step 204, the load unit 108 detects the page table entry load request signal 136 and loads the page table entry located in the physical memory 128. In addition, the load unit 108 informs the prefetch unit 122 via an acknowledge signal 138 that it has seen the page table entry load request signal 136, and provides to the prefetch unit 122 the physical address of the first cache line 124, which holds the page table entry loaded by the load unit 108; in the embodiment of Fig. 1 this physical address is P. Flow proceeds to step 206.

In step 206, the prefetch unit 122 generates a prefetch request signal 142 and transmits it to the load unit 108. The prefetch request signal 142 commands the load unit 108 to prefetch the second cache line 126, located at physical address P+64, into the data cache 112. In other words, the load unit 108 prefetches into the data cache 112 the next cache line (the second cache line 126) after the first cache line 124 that holds the page table entry loaded by the load unit 108. Flow proceeds to step 208.

In step 208, the load unit 108 prefetches the next cache line (the second cache line 126) into the data cache 112 according to the prefetch request signal 142. In some cases, however, the load unit 108 of the microprocessor 100 does not perform the load of the second cache line 126. For example, this may be a functional requirement, such as when the cache line falls in a non-cacheable memory region, or it may be that the microprocessor 100 is to perform only non-speculative allocations. If the load unit 108 decides to load the second cache line 126 from the physical memory 128, the load unit 108 commands the bus interface unit 114 to perform the load. The flow ends at step 208.
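The flow of steps 202 through 208 can be summarized behaviorally as follows. This is a minimal sketch: the class and method names are illustrative stand-ins rather than identifiers from the patent, and the non-cacheable check stands in for whatever functional checks the load unit applies before honoring a prefetch:

```python
LINE = 64  # bytes per cache line

class LoadUnit:
    def __init__(self, non_cacheable=()):
        self.non_cacheable = set(non_cacheable)
        self.prefetched = []  # lines brought into the data cache

    def load_pte(self, line_addr, prefetch_unit):
        # Step 204: the load unit sees the PTE load request and passes
        # the physical address of the PTE's cache line to the prefetch unit.
        prefetch_unit.on_pte_load(line_addr, self)

    def prefetch(self, line_addr):
        # Step 208: the prefetch is skipped for, e.g., non-cacheable regions.
        if line_addr not in self.non_cacheable:
            self.prefetched.append(line_addr)

class PrefetchUnit:
    def on_pte_load(self, line_addr, load_unit):
        # Step 206: request the next cache line after the PTE's line.
        load_unit.prefetch(line_addr + LINE)

lu, pu = LoadUnit(), PrefetchUnit()
lu.load_pte(0x1000, pu)  # tablewalk engine's PTE load (step 202)
assert lu.prefetched == [0x1040]
```

Constructing the load unit with `non_cacheable={0x1040}` would model the step-208 exception: the prefetch request is made but not honored.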
Although the embodiment of the present invention describes prefetching the next cache line, in other embodiments the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the previous cache line, or commanding the load unit 108 to prefetch both the next and the previous cache lines. Such embodiments suit the case in which a program marches through the memory pages in the other direction.

Furthermore, although the embodiment of the present invention describes prefetching the next cache line of page table entries, in other embodiments the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the next cache line of paging information at another level of the paging hierarchy, for example page descriptor entries (PDEs). It is worth noting that although the access patterns of some programs would benefit from this approach, it is uncommon for a large amount of physical memory to be laid out beneath a single page descriptor entry, and a program walks through that much memory only slowly, so the approach is not only inefficient but also carries risk. In addition, in other embodiments the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the next cache line of another page table hierarchy, different from the PDE/PTE hierarchy described above.

As described above, the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the cache line that follows the cache line holding the page table entry needed to complete a tablewalk. Assume that the size of each page table is 4 kilobytes (KB), the size of each page table entry is 8 bytes, and the size of each cache line is 64 bytes; a page table then comprises 64 cache lines, each holding eight page table entries.
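The page-table geometry assumed above can be checked with a few lines of arithmetic (the constants are those stated in the text; the 32 KB coverage figure follows from 4 KB pages):

```python
PAGE_TABLE_SIZE = 4096  # bytes: one 4 KB page table
PTE_SIZE = 8            # bytes per page table entry
LINE_SIZE = 64          # bytes per cache line

ptes_per_table = PAGE_TABLE_SIZE // PTE_SIZE    # 512 PTEs per page table
ptes_per_line = LINE_SIZE // PTE_SIZE           # 8 PTEs per cache line
lines_per_table = PAGE_TABLE_SIZE // LINE_SIZE  # 64 cache lines per table

assert ptes_per_table == 512
assert ptes_per_line == 8
assert lines_per_table == 64

# With 4 KB pages, one prefetched line of eight PTEs maps
# 8 * 4 KB = 32 KB of contiguous virtual address space.
assert ptes_per_line * 4096 == 32 * 1024
```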
Therefore, the probability that the next cache line prefetched in step 208 holds the next eight page table entries of the page table is quite high, particularly where the operating system configures the page tables as physically contiguous page tables.

Where small pages (typically 4 KB) are used, a program will eventually access several of the eight memory pages, and the pages so accessed are very likely to lie beyond the page accessed through the translation lookaside buffer 116 in step 202. In another embodiment, additional logic may be added to the prefetch unit 122 and the load unit 108 so that the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the eight page table entries; this greatly reduces the clock cycles needed to perform the tablewalks that bring the eight memory pages, whose physical addresses are stored in the eight page table entries, into the translation lookaside buffer 116. Specifically, when the tablewalk engine 118 must perform a tablewalk that involves loading any one of the eight page table entries located in the second cache line 126, those page table entries will already be present in the data cache 112 (unless they have since been evicted from the data cache 112), which shortens the latency of reading the physical memory 128 to obtain the page table entries.

Conventional prefetchers detect the patterns of a program's memory accesses, that is, of its load instructions and store instructions. If the prefetcher detects that the program is accessing memory according to a pattern, it anticipates the address of a subsequent load or store instruction and prefetches from that address. If the program accesses memory sequentially, the prefetcher typically prefetches the next cache line based on the virtual address of a load instruction or store instruction.
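Such a conventional program-load/store-based sequential prefetcher might be sketched as follows (a minimal ascending-stream detector; the two-consecutive-lines trigger condition is an assumption for illustration, not a mechanism taken from the patent):

```python
LINE = 64  # bytes per cache line

class SequentialPrefetcher:
    """Prefetch the next line after two consecutive ascending line accesses."""
    def __init__(self):
        self.last_line = None
        self.requests = []  # prefetch requests issued

    def on_load(self, vaddr):
        line = vaddr & ~(LINE - 1)
        if self.last_line is not None and line == self.last_line + LINE:
            self.requests.append(line + LINE)  # prefetch the next line
        self.last_line = line

p = SequentialPrefetcher()
for addr in (0x2000, 0x2040, 0x2080):  # program loads marching upward
    p.on_load(addr)
assert p.requests == [0x2080, 0x20C0]
```

Note that a hardware tablewalk's PTE load never passes through `on_load`, since it is not a program load instruction; such a prefetcher therefore never fires for it, which is precisely the gap the embodiment's prefetch unit 122 closes.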
In a processor architecture in which the operating system performs the tablewalk, a program load/store-based prefetcher would prefetch the next cache line after a page table entry is loaded. However, in a processor that performs the tablewalk in hardware rather than through software load or store instructions, a load/store-based prefetcher does not trigger off the load of a page table entry (because it is not a load instruction) and therefore does not prefetch the next cache line after a page table entry is loaded. By contrast, in the processor of the present invention, which performs the tablewalk in hardware, the prefetch unit 122 can be triggered by a non-program page table entry load, that is, by a physical memory access initiated by the tablewalk engine 118. Thus, unlike mechanisms based on load or store instructions, the prefetch unit 122 of the present invention commands the load unit 108 to prefetch the next cache line, and that cache line may contain several page table entries of the page table.

While the present invention has been disclosed above by way of various embodiments, they are offered as examples for reference rather than to limit the scope of the invention, and those skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention. For example, software may be used to realize the functions, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (for example, C or C++), hardware description languages (including
Verilog or VHDL hardware description languages, and so on), or other available programs. The software may be disposed in any computer-usable medium, such as a semiconductor, a magnetic disk, or an optical disc (for example, a CD-ROM or DVD-ROM). The apparatus and methods described in the embodiments of the present invention may be included in a semiconductor intellectual property core, for example a microprocessor core realized in a hardware description language (HDL), and converted into hardware-form integrated circuit products. In addition, the apparatus and methods described herein may be realized through a combination of hardware and software. Accordingly, the invention should not be limited by any of the embodiments herein, but should be defined by the appended claims and their equivalents. In particular, the invention may be implemented in a microprocessor device for use in a general-purpose computer. Finally, those skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a microprocessor according to an embodiment of the present invention.

Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1.

[Main component symbol description]

100~microprocessor;
102~instruction cache;
104~instruction translator;
106~instruction dispatcher;
108~load unit;
112~data cache;
114~bus interface unit;
116~translation lookaside buffer;
118~tablewalk engine;
122~prefetch unit;
124~first cache line;
126~second cache line;
128~physical memory;
132~virtual address;
134~miss signal;
136~page table entry load request signal;
138~acknowledge signal;
142~prefetch request signal;
144~physical address.