CN102156628B - Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor - Google Patents
- Publication number: CN102156628B
- Application number: CN201110094809.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of microprocessors, and in particular to data prefetching in microprocessors.
Background
The Intel Core microarchitecture implements a hardware prefetcher, known as the Data Cache Unit Prefetcher, that prefetches into the level-1 data (L1D) cache. Upon recognizing a pattern of loads within a cache line, the Data Cache Unit Prefetcher prefetches the next sequential cache line into the L1D cache. If each successive load is to a lower address than the previous one, the preceding sequential cache line is prefetched instead.
Summary of the Invention
One embodiment of the present invention provides a microprocessor. The microprocessor includes a first cache memory and a second cache memory at different levels of the microprocessor's cache memory hierarchy, the second cache memory being at a lower level of the hierarchy than the first. The microprocessor also includes a load unit that receives load operations directed to memory, and a data prefetcher coupled to the first and second cache memories. The data prefetcher monitors the load operations and records the load operations to the current cache line as a recent history. The data prefetcher also determines whether the recent history indicates a clear direction for the loads to the current cache line. If it does, the data prefetcher prefetches one or more cache lines into the first cache memory. If it does not, the data prefetcher prefetches one or more cache lines into the second cache memory.
Another embodiment of the present invention provides a method for prefetching data into the cache memory hierarchy of a microprocessor, where the hierarchy includes a first cache memory and a second cache memory at different levels, the second cache memory being at a lower level than the first. The method includes monitoring load operations directed to memory that are received by a load unit of the microprocessor, and recording the load operations to the current cache line as a recent history. The method also includes determining whether the recent history indicates a clear direction for the loads to the current cache line. When the recent history indicates a clear direction, the method includes prefetching one or more cache lines into the first cache memory. When it does not, the method includes prefetching one or more cache lines into the second cache memory.
Compared with the prior art, the present invention provides additional prefetching modes.
Brief Description of the Drawings
FIG. 1 is a block diagram of a microprocessor having a data prefetcher according to the present invention;
FIG. 2 is a flowchart illustrating the operation of the microprocessor of FIG. 1;
FIG. 3 is a block diagram of a microprocessor having a data prefetcher according to an alternate embodiment of the present invention;
FIG. 4 is a flowchart illustrating how the data prefetcher of the FIG. 3 embodiment performs the operation of step 204 of FIG. 2.
The reference numerals in the drawings are briefly described as follows:
100: microprocessor; 102: instruction cache;
112: instruction translator; 116: register alias table;
118: reservation stations; 122: load unit;
126: bus interface unit; 132: level-1 data (L1D) cache;
134: level-2 (L2) cache;
136: data prefetcher; 142: history queue;
144: history queue entry; 146: control logic;
148: clock cycle counter; 152: address field;
154: size field; 156: consecutive field;
158: direction field; 162: cache line counter;
164: most recent previous clock cycle register;
304: min pointer; 306: max pointer;
308: min change counter; 312: max change counter.
Detailed Description
The present invention describes a data prefetcher that provides prefetching modes beyond those of Intel's Data Cache Unit Prefetcher. First, the data prefetcher considers whether the loads have a clear direction; if it is not sure that a clear load direction exists, it prefetches into the level-2 (L2) cache rather than the level-1 data (L1D) cache. Second, the data prefetcher examines how closely in time the loads to the same cache line occur. If they are close together (for example, in consecutive clock cycles), the data prefetcher prefetches more cache lines than it otherwise would. Third, the data prefetcher examines the size of the loads. If the loads are relatively large, the data prefetcher prefetches more cache lines than it otherwise would.
Referring to FIG. 1, a block diagram illustrates a microprocessor 100 having a data prefetcher 136 according to one embodiment of the present invention. The microprocessor 100 includes an instruction cache 102 coupled to an instruction translator 112, which is coupled to a register alias table (RAT) 116, which is coupled to reservation stations 118, which are coupled to a load unit 122. The reservation stations 118 issue instructions to the load unit 122 (or to other execution units, not shown) for execution out of program order. A retire unit (not shown) includes a reorder buffer that retires instructions in program order. The load unit 122 reads data from a level-1 data (L1D) cache 132. A level-2 (L2) cache 134 backs the L1D cache 132 and the instruction cache 102. The L2 cache 134 reads and writes system memory through a bus interface unit 126, which interfaces the microprocessor 100 to a bus such as a local bus or a memory bus. The microprocessor 100 also includes a data prefetcher 136, or prefetch unit, that prefetches data from system memory into the L2 cache 134 and the L1D cache 132, as described in detail below.
The data prefetcher 136 includes control logic 146. The control logic 146 is coupled to and controls a history queue 142, a cache line counter 162, a clock cycle counter 148, and a most recent previous clock cycle register 164. The history queue 142 is a queue of entries 144. Each entry 144 includes an address field 152, a size field 154, a consecutive field 156, and a direction field 158. The address field 152 stores the load address of the load operation recorded in the entry 144. The size field 154 stores the size of the load operation in bytes. The consecutive field 156 indicates whether the data prefetcher 136 received the load operation in a clock cycle consecutive to that of the most recent previous load operation. The direction field 158 indicates the direction of the load operation relative to the most recent previous load operation.
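As an illustration only, the entry layout described above could be modeled in software as follows. This is a minimal sketch, not the patented hardware: the names `HistoryEntry` and `Direction`, the queue depth of 8, and the use of an explicit `valid` attribute are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Direction of a load relative to the most recent previous load."""
    NONE = 0    # first tracked load in the line, or same address
    UP = 1      # higher address than the previous load
    DOWN = -1   # lower address than the previous load


@dataclass
class HistoryEntry:
    """Hypothetical model of one history queue entry 144."""
    address: int           # address field 152
    size: int              # size field 154, in bytes
    consecutive: bool      # consecutive field 156
    direction: Direction   # direction field 158
    valid: bool = True     # valid bit mentioned at step 204


# The depth of 8 is an assumption; the text does not give a queue depth.
history_queue = deque(maxlen=8)
history_queue.append(HistoryEntry(0x1000, 4, False, Direction.NONE))
history_queue.append(HistoryEntry(0x1004, 4, True, Direction.UP))
```

A bounded `deque` is used here only because it drops the oldest entry automatically once full; how the hardware queue replaces entries is not specified in the text.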
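The cache line counter 162 counts the total number of load operations to the current cache line since the data prefetcher 136 began tracking accesses to it, as discussed below with respect to step 204 of FIG. 2. The clock cycle counter 148 increments with each clock cycle of the microprocessor 100. Thus, when the control logic 146 samples the clock cycle counter 148 while processing a load operation at step 204, the sampled value indicates the clock cycle in which the new load operation was received relative to other load operations. In particular, it is used to determine whether the load operation was received in a clock cycle consecutive to that of the previous load operation, in order to set the consecutive field 156 of the entry 144. The use of the clock cycle counter 148 and the most recent previous clock cycle register 164 is discussed further below with respect to FIG. 2.
Referring to FIG. 2, a flowchart illustrates the operation of the microprocessor 100 of FIG. 1. Flow begins at step 202.
At step 202, a new load operation is passed from the load unit 122 to the L1D cache 132. The load operation specifies a load address, which is the memory address of the data to be loaded, and the size of the data to be loaded, for example 1, 2, 4, 8, or 16 bytes. Flow proceeds to step 204.
At step 204, the data prefetcher 136 snoops the L1D cache 132 to detect the new load operation and its associated information. In response, the data prefetcher 136 allocates and populates an entry 144 in the history queue 142. Specifically, the control logic 146 writes the load address into the address field 152 and the load size into the size field 154. The control logic 146 also reads the clock cycle counter 148 and the most recent previous clock cycle register 164 and compares their values. If the current value of the clock cycle counter 148 is one greater than the value in the register 164, the control logic 146 sets the consecutive field 156 to indicate that the new load operation occurred in a clock cycle consecutive to that of the previous load operation; otherwise, the control logic 146 clears the consecutive field 156 to indicate that it did not. In an alternate embodiment, the control logic 146 sets the consecutive field 156 when the current value of the clock cycle counter 148 is N greater than the value in the register 164, where N is a predetermined value, and clears it otherwise. In one embodiment, N is 2. However, N is a design parameter that may be chosen based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, N is programmable via a model specific register (MSR) of the microprocessor 100. After reading the register 164, the control logic 146 updates it with the value it read from the clock cycle counter 148. The control logic 146 also compares the load address of the new load operation with the address field 152 of the most recent previous load operation recorded in the history queue 142, and populates the direction field 158 accordingly to indicate the direction of the new load operation relative to the most recent previous one. The control logic 146 may mark the entry 144 valid by setting a valid bit in it, and it increments the cache line counter 162. Before allocating, populating, and validating the entry 144 and incrementing the cache line counter 162, the control logic 146 determines whether the load address of the new load operation lies in the same cache line as the other load operations in the history queue 142; if not, the control logic 146 invalidates all entries 144 in the history queue 142 and clears the cache line counter 162, so that it begins recording history for the new cache line. Flow proceeds to step 206.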
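The bookkeeping of step 204 can be sketched in software roughly as below. This is an illustrative model under stated assumptions, not the hardware design: the class name `LoadTracker`, the 64-byte line size, the dictionary-based entries, and the treatment of the very first load as non-consecutive are all choices made for this example. Only the first embodiment of the consecutive test (counter exactly one greater than the saved value) is modeled.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size; the text does not state one


class LoadTracker:
    """Hypothetical software model of the step-204 history update."""

    def __init__(self):
        self.entries = []        # stands in for history queue 142
        self.line_count = 0      # cache line counter 162
        self.prev_clock = None   # most recent previous clock cycle register 164

    def record_load(self, address, size, clock):
        # A load to a different cache line invalidates the old history first.
        if self.entries:
            tracked_line = self.entries[-1]["address"] // CACHE_LINE_BYTES
            if address // CACHE_LINE_BYTES != tracked_line:
                self.entries.clear()
                self.line_count = 0

        # Consecutive: the clock counter is exactly one past the saved value.
        consecutive = (self.prev_clock is not None
                       and clock == self.prev_clock + 1)
        self.prev_clock = clock  # register 164 is updated after being read

        # Direction relative to the previous load: +1 up, -1 down, 0 none.
        if self.entries:
            prev = self.entries[-1]["address"]
            direction = (address > prev) - (address < prev)
        else:
            direction = 0

        self.entries.append({"address": address, "size": size,
                             "consecutive": consecutive,
                             "direction": direction})
        self.line_count += 1


tracker = LoadTracker()
for addr, size, clk in [(0x1000, 4, 10), (0x1004, 4, 11), (0x1008, 4, 13)]:
    tracker.record_load(addr, size, clk)
```

In this run the second load is flagged consecutive (cycle 11 follows cycle 10) while the third is not (cycle 13 skips cycle 12); both are recorded as upward relative to their predecessor.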
At step 206, the data prefetcher 136 recognizes a load access pattern within the cache line implicated by the new load operation. In one embodiment, the data prefetcher 136 recognizes a load access pattern in the current cache line when the cache line counter 162 (incremented at step 204) is greater than or equal to a predetermined value P. In one embodiment, P is 4. However, P is a design parameter that may be chosen based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, P is programmable via a model specific register (MSR) of the microprocessor 100. Other ways of detecting a load access pattern within the current cache line may also be employed. Flow proceeds to decision step 208.
At decision step 208, the data prefetcher 136 determines whether the load access pattern has a clear direction. In one embodiment, the data prefetcher 136 finds a clear direction if the direction fields 158 of the entries 144 for at least the D most recent load operations indicate the same direction, where D is a predetermined value. In one embodiment, D is 3. However, D is a design parameter that may be adjusted based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, D is programmable via a model specific register (MSR) of the microprocessor 100. An alternate embodiment, which determines whether a clear direction exists in a different way, is discussed below with respect to FIG. 3. If the data prefetcher 136 determines that a clear direction exists, flow proceeds to decision step 218; otherwise, flow proceeds to decision step 212.
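The direction test of step 208 can be sketched as a check over the most recent direction fields. This is a minimal sketch under assumptions: the function name `clear_direction` is hypothetical, directions are encoded as +1 (up), -1 (down), and 0 (none), and a run of zeros is treated as no clear direction, which the text does not address.

```python
def clear_direction(directions, d=3):
    """Return +1 or -1 if the last `d` direction fields agree, else 0.

    `directions` lists the direction field of each valid entry, oldest
    first, encoded as +1 (up), -1 (down), or 0 (no direction).
    """
    if len(directions) < d:
        return 0  # not enough history to decide
    recent = directions[-d:]
    first = recent[0]
    if first != 0 and all(x == first for x in recent):
        return first
    return 0


ascending = clear_direction([0, 1, 1, 1])   # three most recent loads go up
mixed = clear_direction([0, 1, -1, 1])      # directions disagree
```

The default `d=3` mirrors the value of D given for one embodiment; in hardware it would be a configurable parameter rather than a function argument.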
At decision step 212, the data prefetcher 136 determines whether the loads to the current cache line are large. In one embodiment, the data prefetcher 136 deems the loads large if the size fields 154 of the valid entries 144 all indicate a load size of at least Y, where Y is a predetermined value. In one embodiment, Y is 8 bytes. However, Y is a design parameter that may be chosen based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, Y is programmable via a model specific register (MSR) of the microprocessor 100. In an alternate embodiment, the data prefetcher 136 deems the loads large if a majority of them have a size of at least Y; this may be done at step 204 by keeping two counters, one for large loads and one for non-large loads, and comparing them. If the loads are large, flow proceeds to step 214; otherwise, flow proceeds to step 216.
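The two size tests described for step 212 can be sketched as follows. This is illustrative only: the function names are hypothetical, and the first function reads the primary embodiment as requiring every valid entry to show a size of at least Y, which is an interpretation of the text rather than a confirmed detail.

```python
def loads_are_large(sizes, y=8):
    """First embodiment (as interpreted here): every valid entry is >= y bytes."""
    return bool(sizes) and all(s >= y for s in sizes)


def loads_mostly_large(sizes, y=8):
    """Alternate embodiment: large loads outnumber non-large loads.

    Mirrors the two-counter scheme: one count of large loads, one of
    non-large loads, compared at the end.
    """
    large = sum(1 for s in sizes if s >= y)
    not_large = len(sizes) - large
    return large > not_large


sizes = [8, 4, 16]  # size fields 154 of the valid entries, in bytes
strict = loads_are_large(sizes)      # one 4-byte load defeats the strict test
majority = loads_mostly_large(sizes)  # two of three loads are large
```

The majority variant tolerates an occasional small load, at the cost of two extra counters in the hardware.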
At step 214, the data prefetcher 136 prefetches the next two sequential cache lines into the L2 cache 134. Because no clear direction was found at decision step 208, the data prefetcher 136 prefetches into the L2 cache 134 rather than the L1D cache 132: the prefetched data is more likely to prove unnecessary, so the data prefetcher 136 is reluctant to place data that only might be used into the L1D cache 132. Flow ends at step 214.
At step 216, the data prefetcher 136 prefetches only the next sequential cache line into the L2 cache 134. Flow ends at step 216.
At decision step 218, the data prefetcher 136 determines whether the loads to the current cache line were received in consecutive clock cycles. If they were, the program is walking through memory very quickly, so the data prefetcher 136 needs to prefetch more data to stay ahead of the program, that is, to have subsequent cache lines ready in the L1D cache 132 before the program requests them. In one embodiment, the data prefetcher 136 deems the loads consecutive if the consecutive field 156 is set in the entries 144 for at least the C most recent load operations to the current cache line, where C is a predetermined value. In one embodiment, C is 3. However, C is a design parameter that may be chosen based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, C is programmable via a model specific register (MSR) of the microprocessor 100. If the loads occurred in consecutive clock cycles, flow proceeds to decision step 232; otherwise, flow proceeds to decision step 222.
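The consecutive-cycle test of step 218 reduces to checking the most recent consecutive fields. A minimal sketch, assuming a hypothetical function name and a list of booleans ordered oldest first:

```python
def loads_are_consecutive(flags, c=3):
    """True if the consecutive field is set in the last `c` entries.

    `flags` holds the consecutive field 156 of each valid entry, oldest first.
    """
    return len(flags) >= c and all(flags[-c:])


fast_scan = loads_are_consecutive([False, True, True, True])
interrupted = loads_are_consecutive([True, True, False, True])
```

The default `c=3` follows the value of C given for one embodiment.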
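At decision step 222, the data prefetcher 136 determines whether the loads to the current cache line are large, in a manner similar to decision step 212. If they are, flow proceeds to step 224; otherwise, flow proceeds to step 226.
At step 224, the data prefetcher 136 prefetches the next two sequential cache lines, in the direction determined at decision step 208, into the L1D cache 132. Because a clear direction was found at decision step 208, the data prefetcher 136 prefetches into the L1D cache 132 rather than the L2 cache 134: the prefetched data is very likely to be used, so the data prefetcher 136 is willing to place it in the L1D cache 132. Flow ends at step 224.
At step 226, the data prefetcher 136 prefetches the next sequential cache line, in the direction determined at decision step 208, into the L1D cache 132. Flow ends at step 226.
At decision step 232, the data prefetcher 136 determines whether the loads to the current cache line are large, in a manner similar to decision step 212. If they are, flow proceeds to step 234; otherwise, flow proceeds to step 236.
At step 234, the data prefetcher 136 prefetches the next three sequential cache lines, in the direction determined at decision step 208, into the L1D cache 132. Flow ends at step 234.
At step 236, the data prefetcher 136 prefetches the next two sequential cache lines, in the direction determined at decision step 208, into the L1D cache 132. Flow ends at step 236.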
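Taken together, steps 208 through 236 form a small decision tree that selects a target cache and a number of cache lines. The sketch below summarizes that tree; the function name `choose_prefetch` and the string labels `"L1D"` and `"L2"` are assumptions for illustration, while the line counts follow the steps as described.

```python
def choose_prefetch(has_direction, consecutive, large):
    """Return (target_cache, line_count) following the FIG. 2 decision tree."""
    if not has_direction:
        # Steps 214 / 216: no clear direction, so prefetch into the L2 cache.
        return ("L2", 2 if large else 1)
    if consecutive:
        # Steps 234 / 236: clear direction and loads in consecutive cycles.
        return ("L1D", 3 if large else 2)
    # Steps 224 / 226: clear direction, loads not in consecutive cycles.
    return ("L1D", 2 if large else 1)


# A fast, large, clearly directed scan gets the most aggressive prefetch.
target, count = choose_prefetch(has_direction=True, consecutive=True, large=True)
```

Note that when no clear direction exists, the consecutive test is never consulted, so `consecutive` has no effect on that branch.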
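Referring now to FIG. 3, a block diagram illustrates a microprocessor 100 having a data prefetcher 136 according to an alternate embodiment of the present invention. The data prefetcher 136 of FIG. 3 is similar to the data prefetcher 136 of FIG. 1 and operates in a manner similar to FIG. 2; the differences are discussed below. The data prefetcher 136 of FIG. 3 differs in how it updates the history at step 204 of FIG. 2 and how it determines a clear direction at decision step 208. In the FIG. 3 embodiment, the entries 144 of the history queue 142 do not include the direction field 158. Instead, the data prefetcher 136 includes a min pointer register 304 and a max pointer register 306, controlled by the control logic 146, that point to the lowest and highest address offsets, respectively, accessed within the current cache line since the data prefetcher 136 began tracking accesses to it. The data prefetcher 136 also includes a min change counter 308 and a max change counter 312 that count the number of changes to the min pointer register 304 and the max pointer register 306, respectively, since the data prefetcher 136 began tracking accesses to the current cache line. How the FIG. 3 embodiment performs the data prefetcher 136 operation of step 204 of FIG. 2 is discussed below. The control logic 146 determines whether a clear direction exists by checking whether the difference between the min change counter 308 and the max change counter 312 is greater than a predetermined value. In one embodiment, the predetermined value is 1; however, it is a design parameter that may be chosen based on various factors, such as the size of the L1D cache 132 and/or the L2 cache 134. In one embodiment, the predetermined value is programmable via a model specific register (MSR) of the microprocessor 100. If the min change counter 308 exceeds the max change counter 312 by more than the predetermined value, the clear direction is downward; if the max change counter 312 exceeds the min change counter 308 by more than the predetermined value, the clear direction is upward; otherwise, there is no clear direction. In addition, if the load address of the new load operation does not lie in the same cache line as the other load operations recorded in the history queue 142, the control logic 146 clears the max change counter 312 and the min change counter 308.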
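The counter-based direction test of the FIG. 3 embodiment can be sketched as a comparison of the two change counters. This is an illustrative sketch: the function name and the +1 / -1 / 0 encoding are assumptions, and the comparison is taken as strictly greater than the threshold, following the wording of the text.

```python
def direction_from_counters(min_changes, max_changes, threshold=1):
    """Return -1 (downward), +1 (upward), or 0 (no clear direction).

    `min_changes` models the min change counter 308 and `max_changes`
    the max change counter 312.
    """
    if min_changes - max_changes > threshold:
        return -1  # the lowest offset keeps moving down
    if max_changes - min_changes > threshold:
        return 1   # the highest offset keeps moving up
    return 0


upward = direction_from_counters(min_changes=0, max_changes=3)
unclear = direction_from_counters(min_changes=2, max_changes=1)
```

Compared with the per-entry direction field, this scheme needs only two counters, which is presumably why the direction field 158 can be dropped from the entries in this embodiment.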
Referring now to FIG. 4, a flowchart illustrates how the data prefetcher 136 of the FIG. 3 embodiment performs step 204 of FIG. 2. Flow begins at decision step 404.
At decision step 404, the control logic 146 determines whether the new load address, specifically its offset within the current cache line, is greater than the value of the max pointer register 306. If so, flow proceeds to step 406; otherwise, flow proceeds to decision step 408.
At step 406, the control logic 146 updates the max pointer register 306 with the new load address offset and increments the max change counter 312. Flow ends at step 406.
At decision step 408, the control logic 146 determines whether the new load address offset within the current cache line is less than the value of the min pointer register 304. If so, flow proceeds to step 412; otherwise, flow ends.
At step 412, the control logic 146 updates the min pointer register 304 with the new load address offset and increments the min change counter 308. Flow ends at step 412.
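The FIG. 4 flow can be modeled with a small tracker that holds the two pointers and their change counters. This is a sketch under assumptions: the class name `PointerTracker` is hypothetical, and seeding both pointers with the offset of the first tracked load is a choice made here because the text does not say how the registers are initialized.

```python
class PointerTracker:
    """Hypothetical model of registers 304/306 and counters 308/312."""

    def __init__(self, first_offset):
        # Assumption: both pointers start at the first load's offset.
        self.min_ptr = first_offset   # min pointer register 304
        self.max_ptr = first_offset   # max pointer register 306
        self.min_changes = 0          # min change counter 308
        self.max_changes = 0          # max change counter 312

    def update(self, offset):
        if offset > self.max_ptr:      # decision step 404
            self.max_ptr = offset      # step 406
            self.max_changes += 1
        elif offset < self.min_ptr:    # decision step 408
            self.min_ptr = offset      # step 412
            self.min_changes += 1
        # Otherwise neither pointer moves and the flow ends.


scan = PointerTracker(first_offset=0)
for off in (4, 8, 12):   # an ascending walk through one cache line
    scan.update(off)
```

After this ascending walk the max change counter has advanced three times and the min change counter not at all, which is the signature of an upward direction in the FIG. 3 scheme.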
Although the embodiments above are described primarily with respect to load operations, in other embodiments the disclosed prefetching techniques may be adapted for use with store operations.
While various embodiments of the present invention have been described above, they are presented as examples of the technology and do not limit the scope of the invention. Those skilled in the art may, following the features of the invention, develop many variations using known techniques. For example, software can implement the function, fabrication, modeling, simulation, description, and/or testing of the disclosed apparatus and methods. Such software may be written in general programming languages (for example, C or C++), in hardware description languages (HDL) including Verilog HDL and VHDL, or in other available programming languages. The software can be stored on any known computer storage medium, such as magnetic tape, semiconductor memory, magnetic disk, or optical disc (for example, CD-ROM or DVD-ROM), or carried over a network, a wired system, or another communication medium. The apparatus and methods disclosed herein may be included in a semiconductor intellectual property core, such as a microprocessor core, embodied in a hardware description language and transformed into hardware in the production of integrated circuits. The disclosed apparatus and methods may also be implemented as a combination of hardware and software. The present invention should therefore not be limited by any of the embodiments described above, but should be interpreted according to the scope defined by the claims. In particular, the present invention may be implemented in a microprocessor used in a general-purpose computer. Those skilled in the art may use the disclosed concepts and specific embodiments as a basis for designing or modifying other structures that carry out the same purposes as the present invention without departing from the scope defined by the claims.
Claims (31)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US32853010P | 2010-04-27 | 2010-04-27 | |
| US61/328,530 | 2010-04-27 | ||
| US12/869,386 US8291172B2 (en) | 2010-04-27 | 2010-08-26 | Multi-modal data prefetcher |
| US12/869,386 | 2010-08-26 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102156628A | 2011-08-17 |
| CN102156628B | 2014-04-02 |
Family
ID=44438137
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201110094809.1A (Active; published as CN102156628B) | 2010-04-27 | 2011-04-14 | Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102156628B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106372006B (en) * | 2015-07-20 | 2019-11-05 | 华为技术有限公司 | A kind of data prefetching method and device |
| US10866897B2 (en) * | 2016-09-26 | 2020-12-15 | Samsung Electronics Co., Ltd. | Byte-addressable flash-based memory module with prefetch mode that is adjusted based on feedback from prefetch accuracy that is calculated by comparing first decoded address and second decoded address, where the first decoded address is sent to memory controller, and the second decoded address is sent to prefetch buffer |
| CN109783399B (en) * | 2018-11-19 | 2021-01-19 | 西安交通大学 | Data cache prefetching method of dynamic reconfigurable processor |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101634971A (en) * | 2009-09-01 | 2010-01-27 | 威盛电子股份有限公司 | Data pre-extraction method and device and computer system |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7177985B1 (en) * | 2003-05-30 | 2007-02-13 | Mips Technologies, Inc. | Microprocessor with improved data stream prefetching |
| US7238218B2 (en) * | 2004-04-06 | 2007-07-03 | International Business Machines Corporation | Memory prefetch method and system |
- 2011-04-14: application CN201110094809.1A filed (patent CN102156628B, status: active)
Also Published As
| Publication number | Publication date |
|---|---|
| CN102156628A (en) | 2011-08-17 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| TWI428746B (en) | Multi-modal data prefetcher | |
| US11842049B2 (en) | Dynamic cache management in hard drives | |
| US10558569B2 (en) | Cache controller for non-volatile memory | |
| US8909866B2 (en) | Prefetching to a cache based on buffer fullness | |
| US20140108740A1 (en) | Prefetch throttling | |
| CN105183663B (en) | Prefetch unit and data prefetch method | |
| US8473680B1 (en) | Hotspot detection and caching for storage devices | |
| US9223705B2 (en) | Cache access arbitration for prefetch requests | |
| US9304919B2 (en) | Detecting multiple stride sequences for prefetching | |
| CN103226521B (en) | Multimode data prefetching device and management method thereof | |
| CN102111448A (en) | Data prefetching method of DHT memory system and node and system | |
| US10002079B2 (en) | Method of predicting a datum to be preloaded into a cache memory | |
| US9256544B2 (en) | Way preparation for accessing a cache | |
| CN105095104B (en) | Data buffer storage processing method and processing device | |
| CN102156628B (en) | Microprocessor, method of prefetching data to the cache memory hierarchy of the microprocessor | |
| CN109196487A (en) | Up/down prefetcher | |
| JP7038656B2 (en) | Access to cache | |
| US9058277B2 (en) | Dynamic evaluation and reconfiguration of a data prefetcher | |
| CN109791469B (en) | Apparatus and method for setting clock speed/voltage of cache memory | |
| US20230342154A1 (en) | Methods and apparatus for storing prefetch metadata | |
| CN106326146B (en) | Check the method whether cache hits | |
| US12423243B2 (en) | Systems and methods for reducing cache fills | |
| JP5609657B2 (en) | Low power design support apparatus and method for semiconductor integrated circuit |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant |