
TW201035867A - Microprocessor and method for reducing tablewalk time - Google Patents

Microprocessor and method for reducing tablewalk time

Info

Publication number
TW201035867A
Authority
TW
Taiwan
Prior art keywords
page
cache line
request signal
paging
page table
Prior art date
Application number
TW99107221A
Other languages
Chinese (zh)
Other versions
TWI437490B (en)
Inventor
Rodney E Hooker
Colin Eddy
Original Assignee
Via Tech Inc
Priority date
Filing date
Publication date
Application filed by Via Tech Inc filed Critical Via Tech Inc
Publication of TW201035867A publication Critical patent/TW201035867A/en
Application granted granted Critical
Publication of TWI437490B publication Critical patent/TWI437490B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 Address translation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/60 Details of cache memory
    • G06F 2212/6028 Prefetching based on hints or prefetch instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A microprocessor includes a cache memory, a load unit, and a prefetch unit coupled to the load unit. The load unit is configured to receive a load request that indicates it is loading a page table entry. The prefetch unit is configured to receive from the load unit the physical address of a first cache line that includes the page table entry specified by the load request, and to responsively generate a request to prefetch a second cache line into the cache memory. The second cache line is the next physically sequential cache line after the first cache line.
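The address arithmetic implied by the abstract can be sketched in a few lines of C. This is an illustrative model, not circuitry from the patent; the 64-byte line size is taken from the embodiment described below, and the function names are invented here.

```c
#include <stdint.h>

/* 64-byte cache lines, as in the described embodiment. */
#define CACHE_LINE_BYTES 64u

/* Physical address of the cache line containing a given PTE:
 * clear the low-order offset bits of the PTE's physical address. */
static uint64_t pte_cache_line(uint64_t pte_paddr) {
    return pte_paddr & ~(uint64_t)(CACHE_LINE_BYTES - 1);
}

/* The next physically sequential cache line, i.e. the line the
 * prefetch unit would request into the cache (address P + 64). */
static uint64_t next_sequential_line(uint64_t line_paddr) {
    return line_paddr + CACHE_LINE_BYTES;
}
```

For a PTE at physical address 0x1038, the containing line is 0x1000 and the prefetch target is 0x1040.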

Description

VI. Description of the Invention

[Technical Field]

The present invention relates to microprocessors, and more particularly to methods of prefetching data in a microprocessor.

[Prior Art]

Many modern microprocessors support virtual memory, and in particular a memory paging mechanism. As is well understood in the art, the operating system builds page tables in system memory that are used to translate virtual addresses into physical addresses. According to the x86 architecture described in the IA-32 Intel Architecture Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, June 2006 (which is incorporated herein by reference in its entirety), the page tables may be arranged in a hierarchical fashion. Specifically, a page table contains a plurality of page table entries (PTEs), each of which stores the physical page address of a physical memory page together with attributes of that page. A tablewalk is the process of taking a virtual memory page address and traversing the page table hierarchy to obtain the PTE corresponding to that address, in order to translate the virtual address into a physical address.

Because the latency of a physical memory access is relatively long, and a tablewalk may require multiple accesses to physical memory, performing a tablewalk is time-consuming. To avoid this cost, a processor typically includes a translation lookaside buffer (TLB) that caches virtual addresses together with their translated physical addresses. However, the TLB is of limited size, and when a miss occurs in the TLB a tablewalk must still be performed. A way to reduce tablewalk time is therefore needed.

[Summary]

The present invention provides a microprocessor that includes a cache memory, a load unit, and a prefetch unit. The load unit receives a first load request signal indicating that it is loading a first page table entry. The prefetch unit, coupled to the load unit, receives from the load unit the physical address of a first cache line that contains the first page table entry specified by the first load request signal, and generates a first request signal to prefetch a second cache line into the cache memory, the second cache line being the next cache line after the first cache line.

The present invention also provides a method for reducing tablewalk time, suitable for a microprocessor having a cache memory and supporting paged virtual memory. The method includes detecting a first load request signal of a first page table entry and, in response, prefetching into the cache memory a second cache line that is the next cache line after a first cache line, where the first cache line contains the first page table entry specified by the first load request signal.

The present invention further provides another microprocessor and another corresponding method in which the prefetched second cache line is instead the previous cache line before the first cache line containing the specified page table entry.

The above and other objects, features, and advantages of the invention will become more apparent from the following detailed description of the preferred embodiments, read together with the accompanying drawings.

[Embodiments]

Referring to Fig. 1, a block diagram of a microprocessor 100 according to an embodiment of the invention is shown. The microprocessor 100 is a pipelined microprocessor. It includes an instruction cache 102 that provides instructions to an instruction translator 104, which translates the received instructions and provides the translated instructions to an instruction dispatcher 106. The instruction dispatcher 106 provides the instructions, which may include memory access instructions such as load and store instructions, to a load unit 108. The load unit 108 provides the virtual address 132 specified by a memory access instruction to a translation lookaside buffer (TLB) 116, which performs a lookup of the virtual address 132. If the virtual address 132 is present in the TLB 116, the TLB 116 returns the translated physical address 144 to the load unit 108. If the virtual address 132 is not present in the TLB 116, the TLB 116 generates a miss signal 134 to a tablewalk engine 118. The tablewalk engine 118 is coupled to the load unit 108 and the TLB 116.

As shown in Fig. 1, a prefetch unit 122 and a data cache 112 are also coupled to the load unit 108, and a bus interface unit 114 is coupled to the data cache 112. The bus interface unit 114 couples the microprocessor 100 to a processor bus, which is coupled to the physical memory 128 of the computer system in which the microprocessor 100 resides. Specifically, the physical memory 128 stores a plurality of page tables, one of which includes a first cache line 124 at physical address P and a second cache line 126 at physical address P+64, the first cache line 124 and second cache line 126 each storing eight page table entries. In this embodiment a cache line is 64 bytes and a page table entry is 8 bytes, so each cache line can hold eight page table entries.

Referring to Fig. 2, a flowchart of the operation of the microprocessor 100 of Fig. 1 is shown, illustrating how the next cache line after the cache line holding a page table entry loaded by the load unit is prefetched. Flow begins at step 202.

At step 202, when the virtual address 132 is not present in the TLB 116, the TLB 116 generates a miss signal 134 to the tablewalk engine 118. Upon receiving the miss signal 134, the tablewalk engine 118 performs a tablewalk to obtain the physical address translation of the virtual address 132 that missed in the TLB 116. The tablewalk engine 118 performs the tablewalk by generating a page table entry load request (PTE load request) 136 to the load unit 108 to load the page table entry needed to perform the address translation. Flow proceeds to step 204.

At step 204, the load unit 108 detects the PTE load request 136 and loads the page table entry from the physical memory 128. Additionally, the load unit 108 informs the prefetch unit 122 via a confirmation signal 138 that it has seen the PTE load request 136, and provides the physical address of the first cache line 124, which holds the loaded page table entry, to the TLB 116; in the embodiment of Fig. 1 that physical address is P. Flow proceeds to step 206.

At step 206, the prefetch unit 122 generates a prefetch request signal 142 to the load unit 108. The prefetch request signal 142 commands the load unit 108 to prefetch the second cache line 126, at physical address P+64, into the data cache 112. In other words, the load unit 108 is commanded to prefetch the next cache line (second cache line 126) after the first cache line 124 that holds the loaded page table entry. Flow proceeds to step 208.

At step 208, the load unit 108 prefetches the next cache line (second cache line 126) into the data cache 112 in response to the prefetch request signal 142. In some situations, however, the load unit 108 in the microprocessor 100 will not perform the load of the second cache line 126. For example, a functional requirement may forbid it, such as the cache line falling in a non-cacheable memory region, or the microprocessor 100 may be required to perform only non-speculative allocations. If the load unit 108 decides to load the second cache line 126 from the physical memory 128, it commands the bus interface unit 114 to perform the load. Flow ends at step 208.

Although the embodiment described prefetches the next cache line, in other embodiments the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the previous cache line, or both the next and the previous cache lines. These embodiments suit programs that walk through memory pages in the other direction.

Furthermore, although the embodiment described prefetches the next cache line of page table entries, in other embodiments the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the next cache line at another level of the paging information hierarchy, such as page descriptor entries (PDEs). It is worth noting that although the access patterns of some programs would benefit from this, it is uncommon for a large amount of physical memory to be allocated beneath a single page descriptor entry, and a program would have to walk through memory quite slowly for it to matter, so this approach may be both inefficient and risky. In still other embodiments, the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the next cache line of a different page table hierarchy than the PDE/PTE hierarchy described above.

As described above, the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the cache line that follows the cache line holding the page table entry needed to complete the tablewalk. Assuming each page table is 4 kilobytes (KB), each page table entry is 8 bytes, and each cache line is 64 bytes, a page table comprises 64 cache lines of eight page table entries each. Consequently, the likelihood is quite high that the next cache line prefetched at step 208 holds the next eight page table entries of the page table, particularly where the operating system allocates page tables as physically contiguous pages.

Where small pages (typically 4 KB) are used, the program will eventually access several of the eight memory pages whose translations are held in those entries, and there is a good chance that those pages are other than the page accessed via the TLB 116 at step 202. In another embodiment, additional logic may be added to the prefetch unit 122 and load unit 108 so that the prefetch unit 122 generates a request signal commanding the load unit 108 to prefetch the eight page table entries themselves, greatly reducing the clock cycles required to perform the tablewalks that bring into the TLB 116 the translations of the eight memory pages whose physical addresses are stored in those eight page table entries. Specifically, when the tablewalk engine 118 must perform a tablewalk that includes loading any one of the eight page table entries in the second cache line 126, those entries will already be present in the data cache 112 (unless they have since been evicted), shortening the latency otherwise required to read the physical memory 128 to obtain the page table entry.

Conventional prefetchers examine the memory access pattern of program memory accesses, that is, of load and store instructions. If the prefetcher detects that the program is accessing memory according to a pattern, it anticipates the address of a subsequent load or store and prefetches from that address. If the program accesses memory sequentially, the prefetcher typically prefetches the next cache line based on the virtual address of the load or store instruction. In a processor architecture in which the operating system performs tablewalks in software, a program-load/store-based prefetcher will prefetch the next cache line after a page table entry load, because that load is an ordinary program load. However, in a processor that performs tablewalks in hardware rather than through program load or store instructions, a load/store-based prefetcher does not trigger off the page table entry load (because it is not a program load instruction), and therefore does not prefetch the next cache line after the page table entry load. By contrast, in the hardware-tablewalk processor of the present invention, the prefetch unit 122 can trigger off a non-program page table entry load, namely a physical memory access initiated by the tablewalk engine 118. Thus, unlike load/store-based mechanisms, the prefetch unit 122 of the present invention commands the load unit 108 to prefetch the next cache line, which is likely to contain several more page table entries of the page table.

While the invention has been disclosed above by way of various embodiments, they are examples for reference only and are not intended to limit the scope of the invention; those skilled in the art may make modifications and refinements without departing from the spirit and scope of the invention. For example, software may be used to realize the functions, construction, modules, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (including Verilog and VHDL), or other available programs. Such software can be disposed in any computer-usable medium such as semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM). The apparatus and methods described in the embodiments of the invention may be included in a semiconductor intellectual property core, such as a microprocessor core realized in a hardware description language (HDL), and transformed into a hardware-type integrated circuit product. Additionally, the apparatus and methods described herein may be realized as a combination of hardware and software. Therefore, the invention should not be limited by any of the embodiments herein, but should be defined in accordance with the appended claims and their equivalents; in particular, the invention may be implemented within a microprocessor device used in a general-purpose computer. Finally, the scope of protection of the invention is defined by the appended claims.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a microprocessor according to an embodiment of the invention.
Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1.

[Reference Numerals]

100: microprocessor; 102: instruction cache; 104: instruction translator; 106: instruction dispatcher; 108: load unit; 112: data cache; 114: bus interface unit; 116: translation lookaside buffer; 118: tablewalk engine; 122: prefetch unit; 124: first cache line; 126: second cache line; 128: physical memory; 132: virtual address; 134: miss signal; 136: page table entry load request signal; 138: confirmation signal; 142: prefetch request signal; 144: physical address.
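The sizing argument in the embodiment (a 4 KB page table of 8-byte entries, split into 64-byte cache lines) can be checked with a short worked example. The constants below are taken from the description; the helper functions are illustrative names introduced here, not part of the patent.

```c
/* Parameters from the described embodiment (x86-style small pages). */
enum {
    PAGE_TABLE_BYTES = 4096, /* one page table occupies a 4 KB page */
    PTE_BYTES        = 8,    /* size of one page table entry        */
    LINE_BYTES       = 64,   /* cache line size                     */
    PAGE_BYTES       = 4096  /* memory page mapped by each PTE      */
};

/* PTEs that arrive together in one prefetched cache line. */
static int ptes_per_line(void)   { return LINE_BYTES / PTE_BYTES; }

/* Cache lines making up one full page table. */
static int lines_per_table(void) { return PAGE_TABLE_BYTES / LINE_BYTES; }

/* Virtual address span whose translations are covered by the eight
 * PTEs in one prefetched line: 8 entries times 4 KB per page. */
static long bytes_mapped_per_line(void) {
    return (long)ptes_per_line() * PAGE_BYTES;
}
```

With these numbers, one prefetched line delivers 8 PTEs out of the table's 512, i.e. one of its 64 lines, and covers 32 KB of sequential virtual address space.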

Claims (20)

VII. Claims:

1. A microprocessor, comprising: a load unit, configured to receive a first load request signal, the first load request signal indicating that it is loading a first page table entry; and a prefetch unit, coupled to the load unit, configured to receive from the load unit a first physical page address of a first cache line, wherein the first cache line includes the first page table entry specified by the first load request signal, and the prefetch unit further generates a first request signal to prefetch a second cache line into a cache memory, wherein the second cache line is the next cache line after the first cache line.

2. The microprocessor of claim 1, wherein the second cache line includes a plurality of second page table entries, and each of the second page table entries stores a second physical page address of a corresponding physical memory page.

3. The microprocessor of claim 1, further comprising: a tablewalk engine, coupled to the load unit, wherein the tablewalk engine generates the first load request signal to the load unit.

4. The microprocessor of claim 3, wherein the tablewalk engine generates the first load request signal to the load unit according to an indication signal from a translation lookaside buffer, the indication signal indicating that a first virtual address associated with the first page table entry is not present in the translation lookaside buffer.

5. The microprocessor of claim 4, wherein the tablewalk engine further performs a tablewalk according to the indication signal, wherein the tablewalk engine loads into the translation lookaside buffer the first physical page address from the cache memory, the first physical page address being stored in the first page table entry.

6. The microprocessor of claim 1, wherein the first load request signal received by the load unit is generated in response to a miss occurring in a translation lookaside buffer of the microprocessor.

7. A method for reducing tablewalk time, suitable for a microprocessor having a cache memory and supporting paged virtual memory, the method comprising: detecting a first load request signal of a first page table entry; and prefetching, according to the result of detecting the first load request signal, a second cache line into the cache memory, wherein the second cache line is the next cache line after a first cache line, and the first cache line includes the first page table entry specified by the first load request signal.

8. The method of claim 7, wherein the second cache line includes a plurality of second page table entries, and each of the second page table entries stores a second physical page address of a corresponding physical memory page.

9. The method of claim 7, wherein the microprocessor further includes a translation lookaside buffer, the method further comprising: generating the first load request signal of the first page table entry according to an indication signal, wherein the indication signal indicates that a first virtual address associated with the first page table entry is not present in the translation lookaside buffer.

10. The method of claim 9, further comprising: performing a tablewalk according to the indication signal, wherein performing the tablewalk includes loading into the translation lookaside buffer the first physical page address from the cache memory, the first physical page address being stored in the first page table entry.

11. The method of claim 7, wherein the first load request signal is generated in response to a miss occurring in a translation lookaside buffer of the microprocessor.

12. A microprocessor, comprising: a load unit, configured to receive a first load request signal, the first load request signal indicating that it is loading a first page table entry; and a prefetch unit, coupled to the load unit, configured to receive from the load unit a first physical page address of a first cache line, wherein the first cache line includes the first page table entry specified by the first load request signal, and the prefetch unit further generates a first request signal to prefetch a second cache line into a cache memory, wherein the second cache line is the previous cache line before the first cache line.

13. The microprocessor of claim 12, wherein the second cache line includes a plurality of second page table entries, and each of the second page table entries stores a second physical page address of a corresponding physical memory page, the microprocessor further comprising: a tablewalk engine, coupled to the load unit, wherein the tablewalk engine generates the first load request signal to the load unit.

14. The microprocessor of claim 13, wherein the tablewalk engine generates the first load request signal to the load unit according to an indication signal from a translation lookaside buffer, the indication signal indicating that a first virtual address associated with the first page table entry is not present in the translation lookaside buffer.

15. The microprocessor of claim 14, wherein the tablewalk engine further performs a tablewalk according to the indication signal, wherein the tablewalk engine loads into the translation lookaside buffer the first physical page address from the cache memory, the first physical page address being stored in the first page table entry.

16. The microprocessor of claim 12, wherein the first load request signal received by the load unit is generated in response to a miss occurring in a translation lookaside buffer of the microprocessor.

17. A method for reducing tablewalk time, suitable for a microprocessor having a cache memory and supporting paged virtual memory, the method comprising: detecting a first load request signal of a first page table entry; and prefetching, according to the result of detecting the first load request signal, a second cache line into the cache memory, wherein the second cache line is the previous cache line before a first cache line, and the first cache line includes the first page table entry specified by the first load request signal.

18. The method of claim 17, wherein the second cache line includes a plurality of second page table entries, and each of the second page table entries stores a second physical page address of a corresponding physical memory page, wherein the microprocessor further includes a translation lookaside buffer, the method further comprising: generating the first load request signal of the first page table entry according to an indication signal, wherein the indication signal indicates that a first virtual address associated with the first page table entry is not present in the translation lookaside buffer.

19. The method of claim 18, further comprising: performing a tablewalk according to the indication signal, wherein performing the tablewalk includes loading into the translation lookaside buffer the first physical page address from the cache memory, the first physical page address being stored in the first page table entry.

20. The method of claim 17, wherein the first load request signal is generated in response to a miss occurring in a translation lookaside buffer of the microprocessor.
The shortened page table search as described in claim 17 The method of time, wherein the second cache line comprises a plurality of second page table items, and each of the second page table items stores one of the physical memory pages corresponding to the second entity page address, wherein the micro The processor further includes a translation query buffer. The method further includes: generating the first load request signal of the first page table item according to an indication signal, wherein the indication signal displays that one of the first virtual address addresses associated with the first page table item is not located in the translation query In the buffer. 19. The method of shortening the page lookup time as described in claim 18, wherein the method further comprises: performing a page table search according to the indication signal, wherein the step of performing the paging table search comprises: The first entity paging address of the memory is loaded into the translation query buffer, and the first entity paging address is stored in the first paging table item. 20. The method of shortening a page lookup time as described in claim 17, wherein the first load request signal is generated corresponding to a loss of one of the microprocessors in the translation query buffer. CNTR2456I00-TW/ 0608-A42266-TW/Final/ 15
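The mechanism the claims describe can be illustrated with a minimal software sketch: on a TLB miss, the tablewalk loads the needed page table entry (PTE), and the prefetcher then brings the physically sequential neighboring cache line — which holds the PTEs of adjacent virtual pages — into the cache. All sizes, names, and the identity page mapping below are illustrative assumptions for the sketch, not details taken from the patent.

```python
# Toy model of PTE-load-triggered sequential cache line prefetching.
# direction=+1 prefetches the next line (claim 7); direction=-1
# prefetches the immediately preceding line (claims 12 and 17).

PAGE_SIZE = 4096   # bytes per page (assumption)
LINE_SIZE = 64     # bytes per cache line (assumption)
PTE_SIZE = 8       # bytes per page table entry (assumption)

def line_of(addr):
    """Physical address of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)

class Machine:
    def __init__(self, page_table_base, direction=+1):
        self.tlb = {}               # virtual page number -> physical page address
        self.cache = set()          # physical line addresses present in the cache
        self.prefetched = []        # lines brought in speculatively, in order
        self.pt_base = page_table_base
        self.direction = direction

    def prefetch(self, line_addr):
        if line_addr not in self.cache:
            self.cache.add(line_addr)
            self.prefetched.append(line_addr)

    def tablewalk(self, vpn):
        """Load the PTE for vpn, then prefetch the sequential neighbor line."""
        pte_addr = self.pt_base + vpn * PTE_SIZE
        self.cache.add(line_of(pte_addr))                     # demand fill of PTE's line
        self.prefetch(line_of(pte_addr) + self.direction * LINE_SIZE)
        self.tlb[vpn] = vpn * PAGE_SIZE                       # identity mapping for the toy
        return self.tlb[vpn]

    def translate(self, vaddr):
        vpn = vaddr // PAGE_SIZE
        if vpn not in self.tlb:                               # TLB miss triggers the walk
            self.tablewalk(vpn)
        return self.tlb[vpn] + (vaddr % PAGE_SIZE)

m = Machine(page_table_base=0x40000)
m.translate(0x5123)   # vpn 5 -> PTE at 0x40028, in line 0x40000;
                      # the next line 0x40040 (PTEs for vpns 8..15) is prefetched
```

The payoff the patent targets follows from PTE packing: with 8-byte PTEs and 64-byte lines, one line covers eight consecutive virtual pages, so a sequentially advancing access pattern that misses into the next group of pages finds the neighboring PTE line already cached, shortening the tablewalk.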
TW99107221A 2009-03-30 2010-03-12 Microprocessor and method for reducing tablewalk time TWI437490B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16458809P 2009-03-30 2009-03-30
US12/604,998 US8161246B2 (en) 2009-03-30 2009-10-23 Prefetching of next physically sequential cache line after cache line that includes loaded page table entry

Publications (2)

Publication Number Publication Date
TW201035867A true TW201035867A (en) 2010-10-01
TWI437490B TWI437490B (en) 2014-05-11

Family

ID=42785701

Family Applications (2)

Application Number Title Priority Date Filing Date
TW99107221A TWI437490B (en) 2009-03-30 2010-03-12 Microprocessor and method for reducing tablewalk time
TW101143382A TWI451334B (en) 2009-03-30 2010-03-12 Microprocessor and method for reducing tablewalk time

Family Applications After (1)

Application Number Title Priority Date Filing Date
TW101143382A TWI451334B (en) 2009-03-30 2010-03-12 Microprocessor and method for reducing tablewalk time

Country Status (3)

Country Link
US (3) US8161246B2 (en)
CN (2) CN102999440B (en)
TW (2) TWI437490B (en)


Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8442059B1 (en) 2008-09-30 2013-05-14 Gridiron Systems, Inc. Storage proxy with virtual ports configuration
US8838850B2 (en) * 2008-11-17 2014-09-16 Violin Memory, Inc. Cluster control protocol
US8417895B1 (en) 2008-09-30 2013-04-09 Violin Memory Inc. System for maintaining coherency during offline changes to storage media
US8443150B1 (en) 2008-11-04 2013-05-14 Violin Memory Inc. Efficient reloading of data into cache resource
US8788758B1 (en) 2008-11-04 2014-07-22 Violin Memory Inc Least profitability used caching scheme
US9569363B2 (en) 2009-03-30 2017-02-14 Via Technologies, Inc. Selective prefetching of physically sequential cache line to cache line that includes loaded page table entry
US8417871B1 (en) 2009-04-17 2013-04-09 Violin Memory Inc. System for increasing storage media performance
US8667366B1 (en) 2009-04-17 2014-03-04 Violin Memory, Inc. Efficient use of physical address space for data overflow and validation
US8713252B1 (en) 2009-05-06 2014-04-29 Violin Memory, Inc. Transactional consistency scheme
US9069676B2 (en) 2009-06-03 2015-06-30 Violin Memory, Inc. Mapping engine for a storage device
US8402198B1 (en) 2009-06-03 2013-03-19 Violin Memory, Inc. Mapping engine for a storage device
US8402246B1 (en) 2009-08-28 2013-03-19 Violin Memory, Inc. Alignment adjustment in a tiered storage system
US9418011B2 (en) * 2010-06-23 2016-08-16 Intel Corporation Region based technique for accurately predicting memory accesses
US8832384B1 (en) * 2010-07-29 2014-09-09 Violin Memory, Inc. Reassembling abstracted memory accesses for prefetching
US8959288B1 (en) 2010-07-29 2015-02-17 Violin Memory, Inc. Identifying invalid cache data
US8972689B1 (en) 2011-02-02 2015-03-03 Violin Memory, Inc. Apparatus, method and system for using real-time performance feedback for modeling and improving access to solid state media
US8635416B1 (en) 2011-03-02 2014-01-21 Violin Memory Inc. Apparatus, method and system for using shadow drives for alternative drive commands
WO2013095401A1 (en) * 2011-12-20 2013-06-27 Intel Corporation System and method for out-of-order prefetch instructions in an in-order pipeline
US20130262779A1 (en) * 2012-03-30 2013-10-03 Jayaram Bobba Profile-based hardware prefetching
CN102722451B (en) * 2012-06-25 2015-04-15 杭州中天微系统有限公司 Device for accessing cache by predicting physical address
US20140108766A1 (en) * 2012-10-17 2014-04-17 Advanced Micro Devices, Inc. Prefetching tablewalk address translations
US9201806B2 (en) 2013-01-04 2015-12-01 International Business Machines Corporation Anticipatorily loading a page of memory
KR102069273B1 (en) 2013-03-11 2020-01-22 삼성전자주식회사 System on chip and operating method thereof
US9880842B2 (en) 2013-03-15 2018-01-30 Intel Corporation Using control flow data structures to direct and track instruction execution
CN104424117B (en) * 2013-08-20 2017-09-05 华为技术有限公司 Memory physical address query method and device
US9645934B2 (en) * 2013-09-13 2017-05-09 Samsung Electronics Co., Ltd. System-on-chip and address translation method thereof using a translation lookaside buffer and a prefetch buffer
GB2528842B (en) 2014-07-29 2021-06-02 Advanced Risc Mach Ltd A data processing apparatus, and a method of handling address translation within a data processing apparatus
TWI590053B (en) * 2015-07-02 2017-07-01 威盛電子股份有限公司 Selective prefetching of physically sequential cache line to cache line that includes loaded page table
US9910780B2 (en) 2015-10-28 2018-03-06 International Business Machines Corporation Pre-loading page table cache lines of a virtual machine
US10175987B2 (en) 2016-03-17 2019-01-08 International Business Machines Corporation Instruction prefetching in a computer processor using a prefetch prediction vector
US10386933B2 (en) 2016-08-30 2019-08-20 International Business Machines Corporation Controlling navigation of a visual aid during a presentation
CN111198827B (en) * 2018-11-16 2022-10-28 展讯通信(上海)有限公司 Page table prefetching method and device
US10936281B2 (en) 2018-12-19 2021-03-02 International Business Machines Corporation Automatic slide page progression based on verbal and visual cues
US10909045B2 (en) * 2018-12-20 2021-02-02 Arm Limited System, method and apparatus for fine granularity access protection
WO2021184141A1 (en) 2020-03-15 2021-09-23 Micron Technology, Inc. Pre-load techniques for improved sequential read
KR20220135560A (en) * 2021-03-30 2022-10-07 삼성전자주식회사 Electronic device for managing memory and operating method thereof
CN114218132B (en) * 2021-12-14 2023-03-24 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666509A (en) * 1994-03-24 1997-09-09 Motorola, Inc. Data processing system for performing either a precise memory access or an imprecise memory access based upon a logical address value and method thereof
US5613083A (en) * 1994-09-30 1997-03-18 Intel Corporation Translation lookaside buffer that is non-blocking in response to a miss for use within a microprocessor capable of processing speculative instructions
US5752274A (en) * 1994-11-08 1998-05-12 Cyrix Corporation Address translation unit employing a victim TLB
US5963984A (en) * 1994-11-08 1999-10-05 National Semiconductor Corporation Address translation unit employing programmable page size
TW460524B (en) * 1998-09-16 2001-10-21 Morton Int Inc Combination of an organothio compound and a zinc mercapto ester as heat stabilizer in PVC processing
US6681311B2 (en) * 2001-07-18 2004-01-20 Ip-First, Llc Translation lookaside buffer that caches memory type information
US6832296B2 (en) * 2002-04-09 2004-12-14 Ip-First, Llc Microprocessor with repeat prefetch instruction
US7194582B1 (en) * 2003-05-30 2007-03-20 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US7099999B2 (en) * 2003-09-30 2006-08-29 International Business Machines Corporation Apparatus and method for pre-fetching data to cached memory using persistent historical page table data
CN100517274C (en) * 2004-03-24 2009-07-22 松下电器产业株式会社 Cache memory and its control method
US7383418B2 (en) * 2004-09-01 2008-06-03 Intel Corporation Method and apparatus for prefetching data to a lower level cache memory
US20060136696A1 (en) * 2004-12-16 2006-06-22 Grayson Brian C Method and apparatus for address translation
US7383391B2 (en) * 2005-05-18 2008-06-03 International Business Machines Corporation Prefetch mechanism based on page table attributes
US7409524B2 (en) * 2005-08-17 2008-08-05 Hewlett-Packard Development Company, L.P. System and method for responding to TLB misses
US7822941B2 (en) * 2006-06-05 2010-10-26 Oracle America, Inc. Function-based virtual-to-physical address translation
US7949834B2 (en) * 2007-01-24 2011-05-24 Qualcomm Incorporated Method and apparatus for setting cache policies in a processor
US8103832B2 (en) * 2007-06-26 2012-01-24 International Business Machines Corporation Method and apparatus of prefetching streams of varying prefetch depth
US8078827B2 (en) * 2007-07-05 2011-12-13 International Business Machines Corporation Method and apparatus for caching of page translations for virtual machines
US7793070B2 (en) * 2007-07-12 2010-09-07 Qnx Software Systems Gmbh & Co. Kg Processing system implementing multiple page size memory organization with multiple translation lookaside buffers having differing characteristics
US7958316B2 (en) * 2008-02-01 2011-06-07 International Business Machines Corporation Dynamic adjustment of prefetch stream priority
US7996650B2 (en) * 2008-07-14 2011-08-09 Via Technologies, Inc. Microprocessor that performs speculative tablewalks
US7958317B2 (en) * 2008-08-04 2011-06-07 International Business Machines Corporation Cache directed sequential prefetch
US8291202B2 (en) * 2008-08-08 2012-10-16 Qualcomm Incorporated Apparatus and methods for speculative interrupt vector prefetching

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI511036B (en) * 2013-03-11 2015-12-01 Via Tech Inc Microprocessor and operation method thereof
US9251083B2 (en) 2013-03-11 2016-02-02 Via Technologies, Inc. Communicating prefetchers in a microprocessor
US9483406B2 (en) 2013-03-11 2016-11-01 Via Technologies, Inc. Communicating prefetchers that throttle one another

Also Published As

Publication number Publication date
CN102999440A (en) 2013-03-27
TW201312461A (en) 2013-03-16
US20120198176A1 (en) 2012-08-02
US20140013058A1 (en) 2014-01-09
US8433853B2 (en) 2013-04-30
TWI437490B (en) 2014-05-11
CN101833515A (en) 2010-09-15
US20100250859A1 (en) 2010-09-30
US8161246B2 (en) 2012-04-17
CN102999440B (en) 2016-08-24
CN101833515B (en) 2013-01-02
TWI451334B (en) 2014-09-01

Similar Documents

Publication Publication Date Title
TWI437490B (en) Microprocessor and method for reducing tablewalk time
JP4699666B2 (en) Store buffer that forwards data based on index and optional way match
US7783835B2 (en) System and method of improving task switching and page translation performance utilizing a multilevel translation lookaside buffer
US5265236A (en) Method and apparatus for increasing the speed of memory access in a virtual memory system having fast page mode
US9569363B2 (en) Selective prefetching of physically sequential cache line to cache line that includes loaded page table entry
JP2003514299A5 (en)
TWI590053B (en) Selective prefetching of physically sequential cache line to cache line that includes loaded page table
US11500779B1 (en) Vector prefetching for computing systems
CN112416817A (en) Prefetching method, information processing apparatus, device, and storage medium
TW201224923A (en) Region based technique for accurately predicting memory accesses
EP0365117B1 (en) Data-processing apparatus including a cache memory
US7984263B2 (en) Structure for a memory-centric page table walker
CN101326499A Updating multiple levels of translation lookaside buffer (TLB) fields
US8019968B2 (en) 3-dimensional L2/L3 cache array to hide translation (TLB) delays
CN110941565A (en) Memory management method and device for chip storage access
US8019969B2 (en) Self prefetching L3/L4 cache mechanism
KR960007833B1 (en) Method and apparatus for fast page mode selection
US7171540B1 (en) Object-addressed memory hierarchy that facilitates accessing objects stored outside of main memory
CN113760783A (en) Joint offset prefetching method, apparatus, computing device and readable storage medium
JP4037806B2 (en) Cache memory device
CN113641403B (en) Microprocessor and method implemented in microprocessor
US20040059887A1 (en) Cache memory
CN111198827A (en) Page table prefetching method and device
JP4796580B2 (en) Apparatus and method for providing information to cache module using fetch burst