
CN102156628B - Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor - Google Patents


Info

Publication number
CN102156628B
CN102156628B (application CN201110094809.1A)
Authority
CN
China
Prior art date
Legal status: Active (assumed; not a legal conclusion)
Application number
CN201110094809.1A
Other languages
Chinese (zh)
Other versions
CN102156628A (en)
Inventor
罗德尼·E·虎克 (Rodney E. Hooker)
柯林·艾迪 (Colin Eddy)
Current Assignee
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority claimed from US12/869,386 external-priority patent/US8291172B2/en
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN102156628A: patent/CN102156628A/en
Application granted
Publication of CN102156628B: patent/CN102156628B/en

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A microprocessor and a method of prefetching data into the cache memory hierarchy of the microprocessor. The microprocessor includes a first cache memory and a second cache memory belonging to different levels of the hierarchy, the second cache memory being at a lower level than the first. A data prefetcher monitors load operations and records the loads of the current cache line as a recent history. The data prefetcher determines whether the recent history indicates a clear direction: if so, it prefetches one or more cache lines into the first cache memory; otherwise, it prefetches one or more cache lines into the second cache memory. The data prefetcher also determines whether the recent history indicates that the loads involve a large data volume; when it does, a larger number of cache lines is prefetched than otherwise. The data prefetcher further determines whether the recent history indicates that the loads occurred in consecutive clock cycles. The invention thereby provides additional prefetching modes.

Description

Microprocessor and Method of Prefetching Data into the Cache Memory Hierarchy of the Microprocessor

Technical Field

The present invention relates to the field of microprocessors, and in particular to data prefetching techniques for microprocessors.

Background

The Intel Core microarchitecture implements a hardware prefetcher, the Data Cache Unit (DCU) Prefetcher, which prefetches into the level-1 data cache (L1D cache). By recognizing a pattern in the accesses to the contents of a cache line, the DCU prefetcher prefetches the next succeeding cache line into the L1 data cache. If each successive load is to an address lower than the previous one, the preceding adjacent cache line is prefetched instead.

Summary of the Invention

One embodiment of the present invention provides a microprocessor. The microprocessor includes a first cache memory and a second cache memory belonging to different levels of a cache memory hierarchy of the microprocessor, the second cache memory being at a lower level of the hierarchy than the first. The microprocessor also includes a load unit for receiving load operations directed at a memory, and a data prefetcher coupled to the first and second cache memories. The data prefetcher monitors the load operations and records those directed at the current cache line as a recent history. The data prefetcher then determines whether the recent history indicates that the loads of the current cache line have a clear direction. If the recent history indicates a clear direction, the data prefetcher prefetches one or more cache lines into the first cache memory; if not, it prefetches one or more cache lines into the second cache memory.

Another embodiment of the present invention discloses a method of prefetching data into a cache memory hierarchy of a microprocessor, where the hierarchy includes a first cache memory and a second cache memory at different levels, the second cache memory being at a lower level than the first. The method includes monitoring load operations, directed at a memory, that are received by a load unit of the microprocessor, and recording the loads of the current cache line as a recent history. The method further includes determining whether the recent history indicates that the loads of the current cache line have a clear direction. When the recent history indicates a clear direction, the method prefetches one or more cache lines into the first cache memory; when it does not, the method prefetches one or more cache lines into the second cache memory.

Compared with the prior art, the present invention provides additional prefetch modes.

Brief Description of the Drawings

FIG. 1 is a block diagram illustrating a microprocessor according to the present invention, which includes a data prefetcher;

FIG. 2 is a flowchart describing the operation of the microprocessor of FIG. 1;

FIG. 3 is a block diagram illustrating another microprocessor according to the present invention, which includes a data prefetcher;

FIG. 4 is a flowchart describing how the data prefetcher of the FIG. 3 embodiment implements step 204 of FIG. 2.

The reference numerals in the drawings are briefly described as follows:

100: microprocessor; 102: instruction cache;
112: instruction translator; 116: register alias table;
118: reservation stations; 122: load unit;
126: bus interface unit; 132: level-1 data cache;
134: level-2 cache;
136: data prefetcher; 142: history queue;
144: history entry; 146: control logic;
148: clock cycle counter; 152: address field;
154: size field; 156: consecutive field;
158: direction field; 162: cache line counter;
164: most recent previous clock cycle register;
304: min pointer; 306: max pointer;
308: min pointer change counter; 312: max pointer change counter.

Detailed Description

The present invention describes a data prefetcher that, compared with Intel's Data Cache Unit Prefetcher, provides additional prefetch modes. First, the disclosed data prefetcher considers whether a clear load direction exists; if no clear direction can be established, it prefetches data into the level-2 cache (L2) rather than the level-1 data cache (L1D). Second, the data prefetcher judges the time spacing between loads of the same cache line: if the spacing is short (for example, the loads occur in consecutive clock cycles), it prefetches a larger number of cache lines than it otherwise would. Third, the data prefetcher observes the amount of data loaded: if the data volume is large, it likewise prefetches a larger number of cache lines than it otherwise would.

Referring to FIG. 1, a block diagram illustrates a microprocessor 100, according to one embodiment of the present invention, that includes a data prefetcher 136. The microprocessor 100 includes an instruction cache 102 coupled to an instruction translator 112; the instruction translator 112 is coupled to a register alias table (RAT) 116; the RAT 116 is coupled to reservation stations 118; and the reservation stations 118 are coupled to a load unit 122. The reservation stations 118 issue instructions to the load unit 122 (and to other execution units, not shown) for out-of-program-order execution. A retire unit (not shown) includes a reorder buffer that retires instructions in program order. The load unit 122 reads data from a level-1 data cache (L1D cache) 132. A level-2 cache (L2 cache) 134 backs the L1 data cache 132 and the instruction cache 102. The L2 cache 134 reads and writes system memory through a bus interface unit 126, which interfaces the microprocessor 100 to a bus (for example, a local bus or a memory bus). The microprocessor 100 also includes the data prefetcher 136, also referred to as a prefetch unit, which prefetches data from system memory into the L2 cache 134 and the L1 data cache 132, as discussed in detail below.

The data prefetcher 136 includes control logic 146. The control logic 146 is coupled to and controls a history queue 142, a cache line counter 162, a clock cycle counter 148, and a most recent previous clock cycle register 164. The history queue 142 records history entries 144 in queue order. Each history entry 144 includes an address field 152, a size field 154, a consecutive field 156, and a direction field 158. The address field 152 stores the load address of the load operation recorded by the corresponding history entry 144. The size field 154 stores the size of the load operation in bytes. The consecutive field 156 indicates whether the load operation was received by the data prefetcher 136 in the clock cycle consecutive to the most recent previous load operation. The direction field 158 indicates the direction of the load operation relative to the most recent previous load operation.
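The per-entry layout of the history queue described above can be sketched as a small software model (an illustrative sketch only; the class, field names, and queue depth are assumptions, not taken from the patent):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class HistoryEntry:
    """One record in a model of the prefetcher's history queue 142."""
    address: int       # load address (address field 152)
    size: int          # bytes loaded (size field 154)
    consecutive: bool  # back-to-back with the previous load? (field 156)
    direction: int     # +1 ascending, -1 descending, 0 none (field 158)
    valid: bool = True

# A bounded deque stands in for the hardware queue; depth 8 is arbitrary.
history = deque(maxlen=8)
history.append(HistoryEntry(address=0x1000, size=4, consecutive=False, direction=0))
history.append(HistoryEntry(address=0x1008, size=4, consecutive=True, direction=+1))
```

The later decision steps (direction, data volume, consecutiveness) can then be phrased as queries over this queue.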

The cache line counter 162 counts the total number of load operations of the current cache line since the data prefetcher 136 began tracking accesses to it, as discussed below with respect to step 204 of FIG. 2. The clock cycle counter 148 increments with each clock cycle of the microprocessor 100. Thus, when a load operation is processed at step 204, the control logic 146 samples the clock cycle counter 148 to determine in which clock cycle the new load operation was received relative to the other load operations and, in particular, whether it was received in a clock cycle consecutive to the previous load operation, in order to set the consecutive field 156 of the history entry 144. The functions of the clock cycle counter 148 and the most recent previous clock cycle register 164 are discussed further with respect to FIG. 2.

Referring to FIG. 2, a flowchart describes the operation of the microprocessor 100 of FIG. 1. Flow begins at step 202.

At step 202, a new load operation is issued by the load unit 122 to the L1 data cache 132. The load operation specifies a load address, indicating the memory address of the data to be loaded, and also specifies the size of the data to be loaded, for example 1, 2, 4, 8, or 16 bytes. Flow proceeds to step 204.

At step 204, the data prefetcher 136 snoops the L1 data cache 132 to detect the new load operation and its associated information. Based on what it detects, the data prefetcher 136 allocates and populates a history entry 144 in the history queue 142. Specifically, the control logic 146 fills in the address field 152 with the load address and the size field 154 with the size of the loaded data. The control logic 146 also reads and compares the values of the clock cycle counter 148 and the most recent previous clock cycle register 164. If the current value of the clock cycle counter 148 is one greater than the value of the most recent previous clock cycle register 164, the control logic 146 sets the consecutive field 156 to indicate that the new load operation occurred in the clock cycle consecutive to the previous load operation; otherwise, the control logic 146 clears the consecutive field 156 to indicate that it did not. In another embodiment, the control logic 146 sets the consecutive field 156 when the current value of the clock cycle counter 148 exceeds the value of the most recent previous clock cycle register 164 by at most a predetermined value N, and clears the consecutive field 156 otherwise. In one embodiment, N is 2. However, N is a design parameter that may be chosen based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134. In one embodiment, N may be set via a model specific register (MSR) of the microprocessor 100. After reading the most recent previous clock cycle register 164, the control logic 146 updates it with the value it read from the clock cycle counter 148. Additionally, the control logic 146 compares the load address of the new load operation with the address field 152 of the most recent previous load operation recorded in the history queue 142, and fills in the direction field 158 of the new entry accordingly to indicate the direction of the new load operation relative to the most recent previous one. The control logic also marks the history entry 144 valid by setting a valid bit within it, and increments the cache line counter 162. Furthermore, before allocating, populating, and validating the history entry 144 and incrementing the cache line counter 162, the control logic 146 determines whether the load address of the new load operation falls within the same cache line as the other load operations in the history queue 142; if not, the control logic 146 invalidates all history entries 144 in the history queue 142 in order to begin recording the new cache line implicated by the new load operation, and clears the cache line counter 162. Flow proceeds to step 206.
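The continuity test in step 204 can be modeled as below (a sketch only; reading the second embodiment's "greater by N" as "within N cycles" is an assumption, and the function name is illustrative):

```python
def is_consecutive(clock_now: int, clock_prev: int, n: int = 1) -> bool:
    """Model of the step-204 continuity test: the new load is marked
    consecutive when the sampled clock cycle counter exceeds the most
    recent previous clock cycle register by at least 1 and by no more
    than n (n = 1 in the first embodiment; a larger, MSR-settable N
    in the second)."""
    return 0 < clock_now - clock_prev <= n
```

After the test, the register holding the previous load's cycle would be updated with the sampled counter value.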

At step 206, the data prefetcher 136 recognizes a load access pattern within the cache line implicated by the new load operation. In one embodiment, the prefetcher 136 recognizes a load access pattern in the current cache line when the cache line counter 162 (incremented at step 204) is greater than or equal to a predetermined value P. In one embodiment, P is 4. However, P is a design parameter that may be chosen based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134. In one embodiment, P may be set via a model specific register (MSR) of the microprocessor. A load access pattern in the current cache line may also be detected in other ways. Flow proceeds to step 208.

At step 208, the data prefetcher 136 determines whether the load access pattern has a clear direction. In one embodiment, the data prefetcher 136 determines that a clear direction exists if the direction fields 158 of the history entries 144 of at least the most recent D load operations indicate the same direction, where D is a predetermined value. In one embodiment, D is 3. However, D is a design parameter that may be adjusted based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134. In one embodiment, D may be set via a model specific register (MSR) of the microprocessor 100. An alternative embodiment, discussed below with respect to FIG. 3, determines whether a clear direction exists in a different way. If the data prefetcher 136 determines that a clear direction exists, flow proceeds to decision step 218; otherwise, flow proceeds to decision step 212.
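The direction check of step 208 amounts to asking whether the last D direction fields agree; a minimal sketch (function name and encoding of directions are illustrative assumptions):

```python
def clear_direction(directions, d=3):
    """Step-208 model: a clear direction exists when the direction
    fields (158) of at least the most recent d history entries all
    agree. directions holds +1 (ascending), -1 (descending), or 0
    per load; returns the shared direction or None."""
    if len(directions) < d:
        return None
    recent = directions[-d:]
    first = recent[0]
    if first != 0 and all(x == first for x in recent):
        return first
    return None
```

With the default d = 3, three ascending loads in a row establish an upward direction, while any disagreement among the last three yields no clear direction.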

At decision step 212, the data prefetcher 136 determines whether the load operations of the current cache line involve a large data volume. In one embodiment, the data prefetcher 136 determines that the loads involve a large data volume if the size fields 154 of the valid history entries 144 indicate that the loads total at least Y bytes of data, where Y is a predetermined value. In one embodiment, Y is 8 bytes. However, Y is a design parameter that may be chosen based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134. In one embodiment, Y may be set via a model specific register (MSR) of the microprocessor 100. In another embodiment, the data prefetcher 136 determines that the loads involve a large data volume when a majority of them load at least Y bytes; this may be done at step 204 by tracking the numbers of large-volume and non-large-volume loads with two counters and comparing them. If the loads involve a large data volume, flow proceeds to step 214; otherwise, flow proceeds to step 216.
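Both embodiments of the large-data-volume test can be sketched directly from the description above (function names are illustrative; Y = 8 as in the stated embodiment):

```python
def is_large_total(sizes, y=8):
    """First embodiment of step 212: loads are 'large' when the size
    fields (154) of the valid history entries total at least Y bytes."""
    return sum(sizes) >= y

def is_large_majority(sizes, y=8):
    """Alternative embodiment: loads are 'large' when a majority of
    them individually load at least Y bytes (two counters in hardware)."""
    large = sum(1 for s in sizes if s >= y)
    return large > len(sizes) - large
```

Note the two tests can disagree: many tiny loads can satisfy the total-bytes test while failing the majority test.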

At step 214, the data prefetcher 136 prefetches the next two cache lines into the L2 cache 134. Because decision step 208 found no clear direction, the data prefetcher 136 prefetches into the L2 cache 134 rather than the L1 data cache 132: the prefetched data has a relatively high chance of being unneeded, so the data prefetcher 136 prefers not to place merely potentially useful data in the L1 data cache 132. Flow ends at step 214.

At step 216, the data prefetcher 136 prefetches only the next one cache line into the L2 cache 134. Flow ends at step 216.

At decision step 218, the data prefetcher 136 determines whether the load operations associated with the current cache line were received in consecutive clock cycles. Loads received in consecutive clock cycles indicate that the program is scanning through memory very quickly, so the data prefetcher 136 must prefetch more data to stay ahead of the program, for example by having the data of subsequent cache lines already present in the L1 data cache 132 before the program requests them. In one embodiment, the data prefetcher 136 determines that the loads were received in consecutive clock cycles if the consecutive fields 156 of the history entries 144 of at least the most recent C load operations of the current cache line are all set, where C is a predetermined value. In one embodiment, C is 3; however, C is a design parameter that may be chosen based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134. In one embodiment, C may be set via a model specific register (MSR) of the microprocessor 100. If the loads occurred in consecutive clock cycles, flow proceeds to decision step 232; otherwise, flow proceeds to decision step 222.
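The step-218 test mirrors the direction test but over the consecutive fields; a minimal sketch (names are illustrative, C = 3 as in the stated embodiment):

```python
def scanning_fast(consecutive_flags, c=3):
    """Step-218 model: the program is deemed to be scanning memory
    quickly when the consecutive fields (156) of at least the most
    recent c history entries for the current cache line are all set."""
    return len(consecutive_flags) >= c and all(consecutive_flags[-c:])
```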

At decision step 222, the data prefetcher 136 determines whether the load operations of the current cache line involve a large data volume, in a manner similar to decision step 212 above. If so, flow proceeds to step 224; otherwise, flow proceeds to step 226.

At step 224, the data prefetcher 136 prefetches the next two cache lines, in the clear direction determined at step 208, into the L1 data cache 132. Because decision step 208 found a clear direction, the data prefetcher 136 prefetches into the L1 data cache 132 rather than the L2 cache 134: the prefetched data is very likely to actually be used, so the prefetcher 136 prefers to place likely-useful data in the L1 data cache 132. Flow ends at step 224.

At step 226, the data prefetcher 136 prefetches the next one cache line, in the clear direction determined at step 208, into the L1 data cache 132. Flow ends at step 226.

At decision step 232, the data prefetcher 136 determines whether the load operations of the current cache line involve a large data volume, in a manner similar to decision step 212 above. If so, flow proceeds to step 234; otherwise, flow proceeds to step 236.

At step 234, the data prefetcher 136 prefetches the next three cache lines, in the clear direction determined at step 208, into the L1 data cache 132. Flow ends at step 234.

At step 236, the data prefetcher 136 prefetches the next two cache lines, in the clear direction determined at step 208, into the L1 data cache 132. Flow ends at step 236.
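Steps 208 through 236 together form a small decision tree mapping the three conditions to a target cache and a line count; it can be summarized in one function (a software sketch of the flowchart, with illustrative names):

```python
def prefetch_decision(clear_dir: bool, consecutive: bool, large: bool):
    """Combined model of steps 208-236 of FIG. 2: returns the target
    cache level and the number of next cache lines to prefetch."""
    if not clear_dir:
        return ("L2", 2 if large else 1)    # steps 214 / 216
    if consecutive:
        return ("L1D", 3 if large else 2)   # steps 234 / 236
    return ("L1D", 2 if large else 1)       # steps 224 / 226
```

Note that when no clear direction exists, the consecutiveness test never runs: the flowchart goes from step 208 straight to step 212.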

Turning now to FIG. 3, a block diagram describes a microprocessor 100, implemented according to another embodiment of the present invention, that includes a data prefetcher 136. The data prefetcher 136 of FIG. 3 is similar to the data prefetcher 136 of FIG. 1 and operates largely as described in FIG. 2; the differences between the two embodiments are discussed below. The FIG. 3 data prefetcher 136 adjusts the history update of step 204 and the clear-direction determination of decision step 208 of FIG. 2 as follows. In the FIG. 3 embodiment, the history entries 144 of the history queue 142 do not include the direction field 158. Instead, the data prefetcher 136 includes a min pointer register 304 and a max pointer register 306, controlled by the control logic 146, which point respectively to the lowest and highest address offsets within the current cache line that have been accessed since the data prefetcher 136 began tracking loads of that cache line. The data prefetcher 136 also includes a min pointer change counter 308 and a max pointer change counter 312, which count the number of changes to the min pointer register 304 and the max pointer register 306, respectively, since the data prefetcher 136 began tracking loads of the current cache line. The clear-direction determination of step 208 is then implemented as follows: the control logic 146 determines whether a clear direction exists by examining the difference between the min pointer change counter 308 and the max pointer change counter 312 against a predetermined value. In one embodiment, the predetermined value is 1; however, it is a design parameter that may be chosen based on various factors, such as the sizes of the L1 data cache 132 and/or the L2 cache 134, and in one embodiment it may be set via a model specific register (MSR) of the microprocessor 100. If the value of the min pointer change counter 308 exceeds the value of the max pointer change counter 312 by more than the predetermined value, the clear direction determined is downward; if the value of the max pointer change counter 312 exceeds the value of the min pointer change counter 308 by more than the predetermined value, the clear direction determined is upward; otherwise, no clear direction is determined. Additionally, if the load address of a new load operation does not fall within the same cache line as the other load operations recorded in the history queue 142, the control logic 146 clears the max pointer change counter 312 and the min pointer change counter 308.

Referring now to FIG. 4, a flowchart describes how the data prefetcher 136 of the embodiment of FIG. 3 implements the operation of step 204 of FIG. 2. The flow of FIG. 4 begins at decision step 404.

At decision step 404, the control logic 146 determines whether the new load address, specifically the offset of the newest load address within the current cache line, is greater than the value of the max pointer register 306. If so, flow proceeds to step 406; otherwise, flow proceeds to decision step 408.

At step 406, the control logic 146 updates the max pointer register 306 with the new load address offset and increments the max pointer change counter 312. In this case, the flow of FIG. 4 ends at step 406.

At decision step 408, the control logic 146 determines whether the offset of the newest load address within the current cache line is less than the value of the min pointer register 304. If so, flow proceeds to step 412; otherwise, the flow of FIG. 4 ends.

At step 412, the control logic 146 updates the min pointer register 304 with the newest load address offset and increments the min pointer change counter 308. The flow of FIG. 4 ends at step 412.
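The decision flow of steps 404 through 412 can be summarized as a small function (a hypothetical model for illustration; the variable names mirror the reference numerals above, and passing the state as plain values rather than registers is an artifact of the sketch):

```python
def fig4_update(offset, max_ptr, min_ptr, max_count, min_count):
    """Model of the FIG. 4 flow: compare the new load-address offset
    against the max/min pointer values and update the matching
    pointer and change counter."""
    if offset > max_ptr:                                # decision step 404
        max_ptr, max_count = offset, max_count + 1      # step 406
    elif offset < min_ptr:                              # decision step 408
        min_ptr, min_count = offset, min_count + 1      # step 412
    return max_ptr, min_ptr, max_count, min_count
```

For example, starting from max pointer = min pointer = 32, a load at offset 40 takes the step-404/406 path and bumps the max pointer, a load at offset 16 takes the step-408/412 path and bumps the min pointer, and a load at offset 32 changes nothing.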

Although the embodiments above primarily discuss load operations, in other embodiments the disclosed prefetch techniques may be suitably adapted for application to store operations.

Although various embodiments of the present invention are described above, it should be noted that the foregoing describes only some applications of the present technique and is not intended to limit the scope of the invention. Those skilled in the art may develop many variants based on the features of the invention using existing technology. For example, the disclosure may be implemented in software, e.g., the function, fabrication, modeling, simulation, description and/or testing of the disclosed apparatus and methods. Such software may be written in a general programming language (e.g., C, C++), a hardware description language (HDL) including Verilog HDL, VHDL, etc., or other available programming languages. The software may be stored on any known computer storage medium, such as magnetic tape, semiconductor memory, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM), and may also be transmitted over a network, wireline system, or other communication medium. The various apparatus and methods disclosed herein may be embodied in a semiconductor intellectual property core, such as a microprocessor core, implemented and protected in a hardware description language, and may be transformed to hardware and fabricated as an integrated circuit. In addition, the disclosed apparatus and methods may be realized as a combination of hardware and software. Therefore, the invention should not be limited by any of the embodiments described above, but should be interpreted according to the scope defined by the claims. In particular, the invention may be implemented within a microprocessor used in a general-purpose computer. Those skilled in the art may, based on the disclosed concepts and the specific embodiments described, design or modify other structures to achieve the same purposes as the invention without departing from the scope defined by the claims.

Claims (31)

1. A microprocessor, comprising:
a first cache memory and a second cache memory, belonging respectively to different levels of a cache memory hierarchy of the microprocessor, wherein the level of the second cache memory in the cache memory hierarchy is lower than the level of the first cache memory in the cache memory hierarchy;
a load unit, configured to receive load operations directed at a memory; and
a data prefetcher, coupled to the first cache memory and the second cache memory, configured to:
monitor the load operations, and record the load operations of a current cache line as a recent history;
determine whether the recent history indicates that the load operations of the current cache line have a definite direction; and
when the recent history indicates that the definite direction exists, prefetch one or more cache lines into the first cache memory; and when the recent history does not indicate the definite direction, prefetch one or more cache lines into the second cache memory.
2. The microprocessor of claim 1, wherein the one or more cache lines prefetched into the first cache memory are one or more cache lines that succeed the current cache line in the definite direction.
3. The microprocessor of claim 2, wherein the one or more cache lines prefetched into the second cache memory are one or more cache lines immediately following the current cache line.
4. The microprocessor of claim 1, wherein the recent history indicates that the definite direction exists if at least the D most recent load operations are in the same direction, wherein D is a predetermined integer greater than 1.
5. The microprocessor of claim 4, wherein D is user-settable.
6. The microprocessor of claim 1, wherein the recent history comprises a lowest address offset and a highest address offset of the load operations of the current cache line, and a lowest count that counts changes of the lowest address offset and a highest count that counts changes of the highest address offset, the lowest count and the highest count being maintained since the data prefetcher began recording the load operations of the current cache line as the recent history, wherein the recent history indicates that a definite direction exists when the difference between the lowest count and the highest count is greater than a predetermined value.
7. The microprocessor of claim 6, wherein the predetermined value is user-settable.
8. The microprocessor of claim 1, wherein the data prefetcher is further configured to determine whether the recent history indicates that the load operations are of large data size, wherein, other conditions being equal, if the load operations are of large data size, the number of cache lines the data prefetcher prefetches is greater than the number of cache lines the data prefetcher prefetches when the load operations are not of large data size,
wherein the recent history indicates that the load operations are of large data size if the data size of all of the load operations is at least Y bytes, wherein Y is a predetermined value.
9. The microprocessor of claim 8, wherein the predetermined value Y is a user-settable parameter.
10. The microprocessor of claim 8, wherein, when the recent history does not indicate the definite direction, the data prefetcher prefetches the two cache lines following the current cache line into the second cache memory if the load operations are of large data size, and prefetches the one cache line following the current cache line into the second cache memory if the load operations are not of large data size.
11. The microprocessor of claim 1, wherein, when the recent history indicates the definite direction, the data prefetcher further determines whether the recent history indicates that the load operations were received in consecutive clock cycles, wherein, other conditions being equal, if the load operations were received in consecutive clock cycles, the number of cache lines prefetched into the first cache memory by the data prefetcher is greater than the number of cache lines prefetched into the first cache memory by the data prefetcher when the load operations were not received in consecutive clock cycles.
12. The microprocessor of claim 11, wherein the recent history indicates that the load operations of the current cache line were all received in consecutive clock cycles if at least the C most recent load operations of the current cache line were each received in a clock cycle consecutive with that of the previous load operation, wherein C is a predetermined integer greater than 1.
13. The microprocessor of claim 12, wherein the predetermined integer C is a user-settable parameter.
14. The microprocessor of claim 11, wherein the data prefetcher is further configured to determine whether the recent history indicates that the load operations are of large data size, wherein, other conditions being equal, if the load operations are of large data size, the number of cache lines prefetched into the first cache memory is greater than the number of cache lines prefetched into the first cache memory when the load operations are not of large data size,
wherein the recent history indicates that the load operations are of large data size if the data size of all of the load operations is at least Y bytes, wherein Y is a predetermined value.
15. The microprocessor of claim 14, wherein:
when the load operations were received in consecutive clock cycles and are of large data size, the data prefetcher prefetches the three cache lines that succeed the current cache line in the definite direction into the first cache memory; and
when the load operations were received in consecutive clock cycles but are not of large data size, the data prefetcher prefetches the two cache lines that succeed the current cache line in the definite direction into the first cache memory.
16. The microprocessor of claim 15, wherein:
when the load operations were not received in consecutive clock cycles but are of large data size, the data prefetcher prefetches the two cache lines that succeed the current cache line in the definite direction into the first cache memory; and
when the load operations were not received in consecutive clock cycles and are not of large data size, the data prefetcher prefetches the one cache line that succeeds the current cache line in the definite direction into the first cache memory.
17. The microprocessor of claim 1, wherein the data prefetcher suppresses its prefetching of the one or more cache lines unless the recent history indicates that the number of load operations of the current cache line is at least P, wherein P is a predetermined value.
18. The microprocessor of claim 17, wherein P is a user-settable parameter.
19. A method of prefetching data into a cache memory hierarchy of a microprocessor, the cache memory hierarchy comprising a first cache memory and a second cache memory belonging to different levels, wherein the level of the second cache memory in the cache memory hierarchy is lower than the level of the first cache memory in the cache memory hierarchy, the method comprising:
monitoring load operations directed at a memory and received by a load unit of the microprocessor, and recording the load operations of a current cache line as a recent history;
determining whether the recent history indicates that the load operations of the current cache line have a definite direction; and
when the recent history indicates the definite direction, prefetching one or more cache lines into the first cache memory, and when the recent history does not indicate the definite direction, prefetching one or more cache lines into the second cache memory.
20. The method of claim 19, wherein the one or more cache lines prefetched into the first cache memory are one or more cache lines that succeed the current cache line in the definite direction.
21. The method of claim 20, wherein the one or more cache lines prefetched into the second cache memory are one or more cache lines immediately following the current cache line.
22. The method of claim 19, wherein the recent history indicates that the definite direction exists if at least the D most recent load operations are in the same direction, wherein D is a predetermined integer greater than 1.
23. The method of claim 19, wherein the recent history comprises a lowest address offset and a highest address offset of the load operations of the current cache line, and a lowest count that counts changes of the lowest address offset and a highest count that counts changes of the highest address offset, the lowest count and the highest count being maintained since the load operations of the current cache line began to be recorded in the recent history, wherein the recent history indicates that the definite direction exists if the difference between the lowest count and the highest count is greater than a predetermined value.
24. The method of claim 19, further comprising:
determining whether the recent history indicates that the load operations are of large data size, wherein, other conditions being equal, if the load operations are of large data size, the number of cache lines prefetched is greater than the number of cache lines prefetched when the load operations are not of large data size,
wherein the recent history indicates that the load operations are of large data size if the data size of all of the load operations is at least Y bytes, wherein Y is a predetermined value.
25. The method of claim 24, wherein, when the recent history does not indicate the definite direction, the prefetching of the one or more cache lines into the second cache memory comprises prefetching the two cache lines following the current cache line into the second cache memory when the load operations are of large data size, and prefetching the one cache line following the current cache line into the second cache memory otherwise.
26. The method of claim 19, further comprising:
when the recent history indicates the definite direction, determining whether the recent history indicates that the load operations were received in consecutive clock cycles, wherein, other conditions being equal, if the load operations were received in consecutive clock cycles, the number of cache lines prefetched into the first cache memory is greater than the number of cache lines prefetched into the first cache memory when the load operations were not received in consecutive clock cycles.
27. The method of claim 26, wherein the recent history indicates that the load operations of the current cache line were all received in consecutive clock cycles if at least the C most recent load operations of the current cache line were each received in a clock cycle consecutive with that of the previous load operation, wherein C is a predetermined integer greater than 1.
28. The method of claim 26, further comprising:
determining whether the recent history indicates that the load operations are of large data size, wherein, other conditions being equal, if the load operations are of large data size, the number of cache lines prefetched into the first cache memory is greater than the number of cache lines prefetched into the first cache memory when the load operations are not of large data size,
wherein the recent history indicates that the load operations are of large data size if the data size of all of the load operations is at least Y bytes, wherein Y is a predetermined value.
29. The method of claim 28, further comprising:
when the load operations were received in consecutive clock cycles and are of large data size, prefetching the three cache lines that succeed the current cache line in the definite direction into the first cache memory; and
when the load operations were received in consecutive clock cycles but are not of large data size, prefetching the two cache lines that succeed the current cache line in the definite direction into the first cache memory.
30. The method of claim 29, further comprising:
when the load operations were not received in consecutive clock cycles but are of large data size, prefetching the two cache lines that succeed the current cache line in the definite direction into the first cache memory; and
when the load operations were not received in consecutive clock cycles and are not of large data size, prefetching the one cache line that succeeds the current cache line in the definite direction into the first cache memory.
31. The method of claim 19, further comprising:
suppressing the prefetching of the one or more cache lines unless the recent history indicates at least P load operations of the current cache line, wherein P is a predetermined value.
CN201110094809.1A 2010-04-27 2011-04-14 Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor Active CN102156628B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US32853010P 2010-04-27 2010-04-27
US61/328,530 2010-04-27
US12/869,386 US8291172B2 (en) 2010-04-27 2010-08-26 Multi-modal data prefetcher
US12/869,386 2010-08-26

Publications (2)

Publication Number Publication Date
CN102156628A CN102156628A (en) 2011-08-17
CN102156628B true CN102156628B (en) 2014-04-02

Family

ID=44438137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110094809.1A Active CN102156628B (en) 2010-04-27 2011-04-14 Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor

Country Status (1)

Country Link
CN (1) CN102156628B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372006B (en) * 2015-07-20 2019-11-05 华为技术有限公司 A kind of data prefetching method and device
US10866897B2 (en) * 2016-09-26 2020-12-15 Samsung Electronics Co., Ltd. Byte-addressable flash-based memory module with prefetch mode that is adjusted based on feedback from prefetch accuracy that is calculated by comparing first decoded address and second decoded address, where the first decoded address is sent to memory controller, and the second decoded address is sent to prefetch buffer
CN109783399B (en) * 2018-11-19 2021-01-19 西安交通大学 Data cache prefetching method of dynamic reconfigurable processor

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634971A (en) * 2009-09-01 2010-01-27 威盛电子股份有限公司 Data pre-extraction method and device and computer system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177985B1 (en) * 2003-05-30 2007-02-13 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US7238218B2 (en) * 2004-04-06 2007-07-03 International Business Machines Corporation Memory prefetch method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634971A (en) * 2009-09-01 2010-01-27 威盛电子股份有限公司 Data pre-extraction method and device and computer system

Also Published As

Publication number Publication date
CN102156628A (en) 2011-08-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant