CN121070813A - Hardware data prefetching method, system, and storage medium based on spatial pattern matching - Google Patents
- Publication number: CN121070813A (application CN202511626952.9A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
- Classification landscape: Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention provides a hardware data prefetching method, system, and storage medium based on spatial pattern matching. The method comprises: obtaining a memory access request, searching a memory region information table for the matching memory sub-region according to the access address of the request, and obtaining the corresponding sub-region information; verifying, according to the access address and the sub-region information, whether the current prefetch mode used for hardware data prefetching is correct, to obtain verification information, where a prefetch mode performs hardware data prefetching at a specific prefetch distance; if the verification information indicates that the current prefetch mode is correct, adjusting the prefetch distance of that mode; if it indicates that the current prefetch mode is incorrect, selecting one of the candidate prefetch modes as the new correct mode according to a preset pattern matching method; and performing hardware data prefetching according to the resulting mode. The invention improves prefetch accuracy and reduces bus pressure.
Description
Technical Field
The invention belongs to the technical field of processors, and particularly relates to a hardware data prefetching method, system, and storage medium based on spatial pattern matching.
Background
In modern processor microarchitecture design, hardware data prefetching is a critical optimization technique for mitigating the high latency gap between the processor and memory. With the continuous increase of processor clock frequency and the popularization of multi-core architectures, memory access bottlenecks have become a major factor restricting system performance. Hardware data prefetching predicts data that may be needed in the future and loads it in advance from main memory or higher-level caches (e.g., the L3 cache) into lower-level caches (e.g., the L2 or L1 cache), reducing stalls caused by cache misses during instruction execution.
Typical data prefetching mechanisms include stride-based prefetchers, correlation-based prefetchers, machine-learning-based prefetchers, and the like. For example, Intel's Sandy Bridge microarchitecture and its successors introduce a multi-level prefetcher capable of detecting linear prefetch patterns (e.g., constant-stride walks) and prefetching subsequent data blocks, and AMD's Zen architecture similarly employs an adaptive prefetch strategy to match the access patterns of different workloads.
Prior art prefetchers are typically integrated in a Load/Store Unit (Load/Store Unit) or cache controller of a processor, which processing structures generate prefetch requests by monitoring address flows, historical access records, or context information.
However, although hardware data prefetching has achieved significant gains in processor performance, existing hardware data prefetching schemes still have a number of defects, mainly in accuracy, resource consumption, and adaptability, as follows:
1. Low prefetch accuracy leads to cache pollution. Many conventional prefetchers (e.g., stride prefetchers) rely on simple pattern matching, and prefetch accuracy may fall below 50% when the workload's access patterns are complex or irregular (e.g., random access or branch-intensive code). Invalid data is then loaded into the cache, occupying space needed by useful data, causing cache pollution, and further increasing the cache miss rate and memory bandwidth pressure.
2. Increased processor power consumption and heat. Prefetch operations involve additional memory accesses and cache updates, which consume power even when the prefetch is useless. In mobile device or data center scenarios, this can increase power consumption by 10%–20% and exacerbate thermal management problems, especially in high-density multi-core processors.
3. Lack of adaptability to dynamic workloads. Existing prefetchers are often based on static or semi-static configurations and cannot adapt to diverse application environments in real time. For example, in virtualized or multithreaded environments, prefetchers may ignore inter-thread interference, resulting in prefetch conflicts or excessive prefetching. Furthermore, for emerging workloads such as AI inference or graph computation, existing rule-based prefetching mechanisms have difficulty capturing complex data dependencies, limiting their effectiveness in heterogeneous computing architectures.
4. Excessive hardware overhead. Implementing advanced prefetchers requires additional hardware resources, such as larger history buffers or dedicated prediction logic, which increases chip area and manufacturing cost. Under a limited silicon budget, this may sacrifice the optimization space of other microarchitectural components.
Therefore, it is necessary to optimize the above defects and propose a new hardware data prefetching scheme.
Disclosure of Invention
The invention provides a hardware data prefetching method, system, and storage medium based on spatial pattern matching, aiming to solve the technical problems of low accuracy and poor adaptability caused by existing hardware data prefetching schemes' reliance on fixed matching patterns.
In order to solve the technical problem, in a first aspect, the present invention provides a hardware data prefetching method based on spatial pattern matching, where the hardware data prefetching method includes the following steps:
acquiring a memory access request, searching a memory region information table for the matching memory sub-region according to the access address corresponding to the request, and acquiring the sub-region information corresponding to that sub-region;
verifying the correctness of the prefetch mode currently used for hardware data prefetching according to the access address and the sub-region information, to obtain verification information indicating whether the current prefetch mode is correct, where a prefetch mode performs hardware data prefetching at a specific prefetch distance, comprising the following steps:
if the verification information indicates that the current prefetch mode is correct, adjusting the prefetch distance of the prefetch mode;
if the verification information indicates that the current prefetch mode is incorrect, and a preset retraining counter has reached a threshold, selecting one of the candidate prefetch modes as the new valid prefetch mode according to a preset pattern matching method, where the preset retraining counter is incremented each time the verification information indicates that the current prefetch mode is incorrect;
and performing hardware data prefetching according to the prefetch mode.
Further, the memory region includes a plurality of memory sub-regions, and each memory sub-region includes a plurality of cache lines. The sub-region information maintained for each memory sub-region records the accessed cache lines of the sub-region through an access history bitmap, records the prefetched cache lines through a prefetch history bitmap, records the access direction of the sub-region through a direction counter, and points to the last accessed cache line through a latest-access pointer.
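As a concrete illustration, the per-sub-region bookkeeping described above can be modelled in Python as follows (the 16-line sub-region size, all field names, and the unsaturated direction counter are assumptions for this sketch, not values taken from the patent):

```python
class SubRegionInfo:
    LINES = 16  # cache lines per sub-region (assumed size)

    def __init__(self):
        self.access_bitmap = 0    # bit i set => line i was accessed
        self.prefetch_bitmap = 0  # bit i set => line i was prefetched
        self.direction = 0        # >0: ascending accesses, <0: descending
        self.last_line = None     # index of the most recently accessed line

    def record_access(self, line):
        self.access_bitmap |= 1 << line
        if self.last_line is not None:
            self.direction += 1 if line > self.last_line else -1
        self.last_line = line

    def record_prefetch(self, line):
        self.prefetch_bitmap |= 1 << line

# Example: two accesses in ascending order, one prefetch.
info = SubRegionInfo()
info.record_access(3)
info.record_access(5)
info.record_prefetch(7)
```

A real implementation would bound the direction counter and bitmap width in hardware registers; the sketch only shows which events update which fields.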
Further, verifying the correctness of the prefetch mode currently used for hardware data prefetching according to the access address and the sub-region information, to obtain verification information indicating whether the current prefetch mode is correct, comprises the following steps:
if the sub-region information indicates that the cache line corresponding to the access address has been prefetched, or that it could be prefetched under the current prefetch mode, judging that the current prefetch mode is correct;
if the sub-region information indicates that the cache line corresponding to the access address has not been prefetched and could not be prefetched under the current prefetch mode, or if the distance between the access address and the last accessed memory address exceeds a specific distance, judging that the current prefetch mode is incorrect.
Further, the step of adjusting the prefetch distance of the prefetch mode defines a confirmation counter and an adjustment counter, and includes the following sub-steps:
acquiring the current prefetch distance of the prefetch mode;
when the verification information indicates that the current prefetch mode is correct, incrementing the confirmation counter; once the confirmation counter reaches a preset second threshold, holding its value and incrementing the adjustment counter instead;
when the confirmation counter has reached the preset second threshold and the adjustment counter has not reached a preset third threshold, increasing the prefetch distance of the prefetch mode;
and when the confirmation counter has reached the preset second threshold and the adjustment counter has reached the preset third threshold, doubling the prefetch distance.
Further, the step of selecting one of the candidate prefetch modes as the current correct prefetch mode according to a preset pattern matching method includes the following sub-steps:
defining a pattern matching register whose window length is a preset matching length; shifting and reversing the cache-line records of the access history bitmap according to the latest-access pointer and the direction counter in the sub-region information, so that the record position of the cache line indicated by the latest-access pointer aligns with the first window position of the pattern matching register; and copying the aligned access history bitmap data into the pattern matching register;
taking the maximum pattern length among all candidate prefetch modes as the initial step, repeatedly extracting two groups of data of the current step length from the pattern matching register in a step-decrementing loop and comparing whether the two groups are identical; if they are, marking the candidate prefetch mode whose prefetch distance equals the current step value as a matched prefetch mode;
after the step-decrementing loop completes, selecting the matched prefetch mode with the longest step as the new valid prefetch mode.
Further, the preset matching length is twice the maximum prefetch pattern length.
Further, after the step of verifying the correctness of the prefetch mode currently used for hardware data prefetching according to the access address and the sub-region information, to obtain the verification information indicating whether the current prefetch mode is correct, the method further comprises:
updating the access history bitmap of the corresponding memory sub-region according to the access address, so as to record the cache line accessed by the request.
In a second aspect, the present invention also provides a hardware data prefetching system based on spatial pattern matching, including:
the memory area matching module is used for acquiring a memory access request, searching a memory subarea corresponding to the memory access request in a memory area information table according to a memory access request address corresponding to the memory access request, and acquiring subarea information corresponding to the memory subarea;
the prefetch verification module is used for verifying the correctness of a prefetch mode of the current hardware data prefetching according to the access request address and the subarea information to obtain verification information of whether the current prefetch mode is correct or not, and the prefetch mode is used for prefetching the hardware data at a specific prefetch distance;
the self-adaptive adjustment module is used for adjusting the prefetching distance of the prefetching mode if the verification information indicates that the current prefetching mode is correct;
The pattern matching module is used for selecting one from all the alternative prefetching modes as a new and effective prefetching mode according to a preset pattern matching method when the verification information indicates that the current prefetching mode is incorrect and the preset retraining counter reaches a preset first threshold value;
the prefetch generation module is used for performing hardware data prefetching according to the prefetch mode, and is also used for incrementing the preset retraining counter each time the verification information indicates that the current prefetch mode is incorrect.
In a third aspect, the present invention also provides a computer device, including a memory, a processor, and a spatial pattern matching based hardware data prefetching program stored on the memory and executable on the processor, where the processor implements the steps of the spatial pattern matching based hardware data prefetching method according to any one of the above embodiments when executing the spatial pattern matching based hardware data prefetching program.
In a fourth aspect, the present invention further provides a storage medium, where a hardware data prefetching program based on spatial pattern matching is stored, where the hardware data prefetching program based on spatial pattern matching, when executed by a processor, implements the steps in the hardware data prefetching method based on spatial pattern matching according to any one of the above embodiments.
The beneficial effects of the invention are as follows: the hardware data prefetching method based on spatial pattern matching can adaptively adjust the prefetch distance and match a more favorable prefetch mode according to the characteristics of the cached memory data, thereby improving prefetch accuracy; meanwhile, recording the accessed and prefetched history information in bitmaps provides efficient data support for pattern matching, reduces hardware cost, and further reduces the bus pressure on the processor.
Drawings
The present invention will be described in detail with reference to the accompanying drawings. The foregoing and other aspects of the invention will become more apparent and more readily appreciated from the following detailed description taken in conjunction with the accompanying drawings. In the accompanying drawings:
FIG. 1 is a flowchart of steps of a method for prefetching hardware data based on spatial pattern matching according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a memory region partition map structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a memory region partition lookup process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a prefetch pattern matching process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware data prefetching system based on spatial pattern matching according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Referring to fig. 1, fig. 1 is a step flowchart of a hardware data prefetching method based on spatial pattern matching according to an embodiment of the present invention, where the hardware data prefetching method based on spatial pattern matching includes the following steps:
S101, acquiring a memory access request, searching a memory region information table for the matching memory sub-region according to the access address corresponding to the request, and acquiring the sub-region information corresponding to that sub-region.
Specifically, referring to the memory region partition map structure shown in fig. 2, the memory region includes a plurality of memory sub-regions, the sub-region information included in each memory sub-region records the accessed cache line in the memory sub-region through an access history bitmap, records the prefetched cache line in the memory sub-region through a prefetch history bitmap, records the access direction of the memory sub-region through a direction counter, and points to the last accessed cache line through a latest access pointer.
In the embodiment of the invention, the memory region history information is recorded in a fully-associative cache structure, and another fully-associative cache structure records the history information of the sub-regions within each region; the entry counts of the two cache structures can be configured to match each other, so as to adapt to processor hardware of different specifications.
When a new memory access request is received, memory region matching is performed according to its address: the cache of region information (i.e., the memory region information table) is searched first to find the memory region to which the address belongs, and then the memory sub-region to which the address belongs is searched within it. In particular, if the lookup for the address fails across the whole memory region table, a suitable cache entry is chosen for replacement in the corresponding cache structure according to a replacement algorithm, a new entry is created, and the sub-region information of the new entry is output.
For example, referring to the memory region partition lookup process shown in fig. 3, the region information table may be divided into M entries, each storing the information of one memory region; each memory region's information may in turn be divided into N entries, each storing the information of one sub-region belonging to that region. Based on this partition design, the lookup of sub-region information proceeds as follows:
1. search the first-level memory region information table for the memory region information according to the access address; if the lookup succeeds, go to step 2, otherwise go to step 3;
2. search the found memory region information for the sub-region information according to the access address; if the lookup succeeds, finish, otherwise go to step 4;
3. the memory region lookup has failed: according to the replacement algorithm, evict one of the M memory region entries, create the region information for the current access address, and go to step 4;
4. the memory sub-region lookup has failed: according to the replacement algorithm, evict one of the N sub-region entries of the memory region and create the sub-region information for the current access address.
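The four-step lookup above can be sketched as a software model (the table capacities M and N, the address bit split, and the LRU policy are all illustrative assumptions; real hardware would use associative lookups rather than dictionaries):

```python
from collections import OrderedDict

M, N = 4, 4  # assumed capacities of the region table and per-region sub-region table

def lru_get_or_create(table, key, capacity, make):
    """Return (entry, hit). On a miss, evict via LRU if full and create a new entry."""
    if key in table:
        table.move_to_end(key)        # LRU touch
        return table[key], True
    if len(table) >= capacity:
        table.popitem(last=False)     # evict the least recently used entry
    table[key] = make()
    return table[key], False

regions = OrderedDict()               # first-level memory region information table

def lookup(addr, region_bits=12, subregion_bits=8):
    """Two-level lookup: region first, then sub-region within it."""
    region_key = addr >> region_bits
    sub_key = (addr >> subregion_bits) & ((1 << (region_bits - subregion_bits)) - 1)
    region, _ = lru_get_or_create(regions, region_key, M, OrderedDict)
    sub, hit = lru_get_or_create(region, sub_key, N, dict)
    return sub, hit
```

On a miss at either level the entry is created in place, mirroring steps 3 and 4; the second lookup of the same address then hits.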
It should be noted that the replacement algorithm described in the embodiment of the present invention is a standard algorithm in processor memory design, used to decide which existing cache entry to evict to make room when the cache is full and a new entry must be loaded, such as LRU (least recently used), FIFO (first in, first out), or LFU (least frequently used), and may be selected as needed during implementation.
S102, verifying the correctness of the prefetch mode currently used for hardware data prefetching according to the access address and the sub-region information, to obtain verification information indicating whether the current prefetch mode is correct, where a prefetch mode performs hardware data prefetching at a specific prefetch distance. If the verification information indicates that the current prefetch mode is correct, step S103 is executed; if it indicates that the current prefetch mode is incorrect, step S104 is executed.
In the embodiment of the present invention, the prefetch mode determines which cache lines are to be prefetched, and the prefetch distance determines how far ahead cache lines are loaded. For example, "+64B" indicates prefetching the cache line 64B ahead of the current one when the cache line length is 64B, and similarly "+128B" indicates prefetching the cache line 128B ahead when the cache line length is 128B. Different prefetch granularities suit different application scenarios: small granularity (e.g., 64B) suits scalar and random access, medium granularity (e.g., 128B) suits vector operations and medium-stride access, and large granularity (e.g., 256B) suits tensor operations and regular strided access. Therefore, the cache-line loading amount of the prefetch mode is adjusted according to actual needs during hardware data prefetching, so that the prefetched cache lines match the memory segment size of the device and the bus utilization is improved. To this end, in the embodiment of the present invention, on each access request the relationship between the current prefetch mode and the cache lines of the memory region is compared and verified, to determine whether the current prefetch mode achieves the best utilization.
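The address arithmetic implied by the "+64B"/"+128B" examples can be made concrete with a small helper (the function name is hypothetical; it assumes a power-of-two cache line size, as is universal in practice):

```python
def prefetch_addr(addr, distance_lines, line_size=64):
    """Target address for a prefetch `distance_lines` cache lines ahead of `addr`."""
    line_base = addr & ~(line_size - 1)        # align down to the start of the line
    return line_base + distance_lines * line_size
```

For instance, with 64B lines a "+64B" prefetch from address 0x100 targets 0x140, while with 128B lines the same one-line distance targets 0x180.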
Specifically, verifying the correctness of the prefetch mode currently used for hardware data prefetching according to the access address and the sub-region information, to obtain verification information indicating whether the current prefetch mode is correct, comprises the following steps:
if the sub-region information indicates that the cache line corresponding to the access address has been prefetched, or that it could be prefetched under the current prefetch mode, judging that the current prefetch mode is correct;
if the sub-region information indicates that the cache line corresponding to the access address has not been prefetched and could not be prefetched under the current prefetch mode, or if the distance between the access address and the last accessed memory address exceeds a specific distance, judging that the current prefetch mode is incorrect.
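A minimal Python sketch of these two judgment rules (the function name, the `pattern_offsets` representation of the current mode's reachable lines, and the `max_gap` threshold are illustrative assumptions, not fixed by the patent):

```python
def verify_prefetch_mode(prefetch_bitmap, last_line, line, pattern_offsets, max_gap=8):
    """Return True if the current prefetch mode is judged correct for an access
    to cache line `line`, given the sub-region's prefetch history bitmap and
    the index of the last accessed line."""
    # A jump larger than max_gap from the last access invalidates the mode.
    if last_line is not None and abs(line - last_line) > max_gap:
        return False
    # Correct if the accessed line was already prefetched...
    if (prefetch_bitmap >> line) & 1:
        return True
    # ...or if the current mode could prefetch it from the last accessed line.
    if last_line is not None and (line - last_line) in pattern_offsets:
        return True
    return False
```

Either success condition suffices; both failure conditions (unreachable line, or an over-large jump) yield "incorrect", which drives the retraining counter of step S104.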
In addition, after the step of verifying the correctness of the prefetching mode of the current hardware data prefetching according to the access request address and the sub-region information to obtain the verification information of whether the current prefetching mode is correct, the method further comprises the following steps:
and updating the access history bitmap of the corresponding memory subarea according to the access request address so as to record the cache line accessed by the access request address.
In the embodiment of the invention, the access history bitmap may be updated out of order, because whether execution is out of order does not affect the selection of the prefetch mode; the update scheme can be chosen according to the performance of the prefetch unit in actual use.
In particular, in the embodiment of the present invention, the number of times the current prefetch mode is judged incorrect in step S102 is recorded by a retraining counter, which drives the decision in step S104: when the value of the retraining counter reaches the preset first threshold, the relevant sub-steps of step S104 are executed.
And S103, if the verification information indicates that the current prefetching mode is correct, adjusting the prefetching distance of the prefetching mode.
In the embodiment of the invention, the correct prefetching mode can send out the prefetching request in advance by dynamically adjusting the prefetching distance, thereby improving the performance of the prefetching unit.
To this end, the step of adjusting the prefetch distance of the prefetch mode comprises the sub-steps of defining an acknowledge counter and an adjustment counter:
s1031, acquiring the current prefetching distance of the prefetching mode;
S1032, when the verification information indicates that the current prefetch mode is correct, incrementing the confirmation counter; once the confirmation counter reaches a preset second threshold, keeping its value unchanged and incrementing the adjustment counter instead;
s1033, when the value of the confirmation counter reaches the preset second threshold value and the value of the adjustment counter does not reach a preset third threshold value, increasing the prefetching distance of the prefetching mode;
S1034, doubling the prefetching distance when the value of the confirmation counter reaches the preset second threshold and the value of the adjustment counter reaches the preset third threshold.
Generally, in step S1031, a prefetch mode whose distance has not yet been adjusted uses the default initial distance. In the embodiment of the present invention, the two counters mark different stages of increasing the prefetch distance: step S1033 increases the prefetch distance by 1 each time, while step S1034 doubles the prefetch distance each time. With this design, the prefetch distance is increased gradually, on the premise that the prefetch mode correctly matches the characteristics of the cache-line data, achieving the effect of optimizing prefetch performance.
Meanwhile, as a limiting measure, during the prefetch distance adjustment of step S103, a maximum threshold may be set according to the initial prefetch distance of the current prefetch mode; the distances adjusted in steps S1033 and S1034 will not exceed this maximum threshold, so as to prevent premature or excessive prefetching.
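Steps S1031–S1034 can be sketched as follows (the threshold values, the distance cap, and all names are assumptions for illustration; the patent leaves them configurable):

```python
SECOND_T, THIRD_T, MAX_DIST = 4, 3, 32  # assumed second/third thresholds and distance cap

class DistanceAdjuster:
    def __init__(self, initial_distance=1):
        self.distance = initial_distance  # S1031: start from the default distance
        self.confirm = 0                  # confirmation counter
        self.adjust = 0                   # adjustment counter

    def on_correct(self):
        """Called each time the current prefetch mode is verified correct."""
        if self.confirm < SECOND_T:
            self.confirm += 1             # S1032: count confirmations first
            return
        # Confirmation counter saturated: move to the adjustment stage.
        if self.adjust < THIRD_T:
            self.adjust += 1
            self.distance = min(self.distance + 1, MAX_DIST)   # S1033: +1 per step
        else:
            self.distance = min(self.distance * 2, MAX_DIST)   # S1034: double, capped
```

The cap corresponds to the maximum threshold described above, preventing premature or excessive prefetching.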
S104, if the verification information indicates that the current prefetching mode is incorrect, and when a preset retraining counter reaches a threshold value, selecting one of all the alternative prefetching modes as a new and effective prefetching mode according to a preset mode matching method, wherein the preset retraining counter increases the value when the verification information indicates that the current prefetching mode is incorrect each time.
In the embodiment of the present invention, step S104 selects a new valid prefetch mode when the current one is incorrect, and whether it executes is decided by whether the preset retraining counter has reached the threshold. It should be understood that the conditions described in step S102 — the cache line corresponding to the access address has not been prefetched and cannot be prefetched under the current prefetch mode, or the access address is more than a specific distance beyond the last accessed memory address — apply to the case where a prefetch mode already exists. When no prefetch mode is currently in use (for example, at first system start-up, before any prefetch mode has been activated), a new prefetch mode must be selected, and in that case the value of the preset retraining counter may be ignored.
Specifically, the step of selecting one of all the candidate prefetch patterns as the current correct prefetch pattern according to a preset pattern matching method includes the following sub-steps:
S1041, defining a pattern matching register whose window length is a preset matching length; shifting and reversing the cache-line records of the access history bitmap according to the latest-access pointer and the direction counter in the sub-region information, so that the record position of the cache line indicated by the latest-access pointer aligns with the first window position of the pattern matching register; and copying the aligned access history bitmap data into the pattern matching register;
S1042, taking the maximum pattern length among all candidate prefetch modes as the initial step, repeatedly extracting two groups of data of the current step length from the pattern matching register in a step-decrementing loop and comparing whether the two groups are identical; if they are, marking the candidate prefetch mode whose prefetch distance equals the current step value as a matched prefetch mode;
S1043, after the step-decrementing loop completes, selecting the matched prefetch mode with the longest step as the new valid prefetch mode.
For example, referring to the schematic diagram of the prefetch pattern matching flow shown in fig. 4, in the embodiment of the present invention, a pattern matching register is used to perform prefetch pattern matching, and the value of the preset matching length of the pattern matching register is 2 times of the value of the maximum prefetch distance, and in the implementation process, the content of the prefetch pattern can be determined by the alternative prefetch pattern. In the embodiment of the present invention, the preset matching length of the pattern matching register is 16, and the maximum prefetch distance of the alternative prefetch pattern is 8.
In step S1041, as shown in FIG. 4B, the access history is shifted and inverted according to the latest access pointer and access direction of the memory sub-region, thereby ensuring the alignment of the latest access pointer and the pattern matching register.
In step S1042, as shown in fig. 4C, pattern matching is performed by looping over all step lengths. For example, when the preset matching length of the pattern matching register is 16 and the maximum prefetch distance of the candidate prefetch patterns is 8, the contents of the pattern matching register are compared in two halves of decreasing length to find a pattern of length p, where p is greater than 1, and the step-length loop proceeds as follows:
a. comparing the two groups of data [0:7] and [8:15]: they are not equal, so the pattern with step length 8 is not a matched prefetch pattern;
b. comparing the two groups of data [0:6] and [7:13]: they are not equal, so the pattern with step length 7 is not a matched prefetch pattern;
c. ......
d. comparing the two groups of data [0:2] and [3:5]: they are equal, so the pattern with step length 3 is a matched prefetch pattern;
e. comparing the two groups of data [0:1] and [2:3]: they are not equal, so the pattern with step length 2 is not a matched prefetch pattern;
f. in practice, a pattern with step length 1 requires three consecutive set bits to identify.
After the step-length loop is completed, the matched pattern with the largest prefetch distance is selected as the current correct prefetch pattern; in the example shown in fig. 4, the pattern with step length 3 is selected and output as the prefetch pattern.
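The halving comparisons in the example above can be sketched in software as follows. This is a simplified model, not a description of the actual hardware: the bit layout (bit [0] holding the most recent access) and the example history are assumptions drawn from the figure-4 example.

```python
# Sketch of the step-length-decrementing pattern match of steps
# S1041-S1043, modeled on the 16-bit example above. The bit layout
# (bit [0] = most recent access) is an assumption for illustration.
MATCH_LEN = 16   # preset matching length
MAX_STEP = 8     # maximum prefetch pattern length (= max prefetch distance)

def match_pattern(reg):
    """Return the longest step length p (MAX_STEP >= p > 1) for which
    reg[0:p] equals reg[p:2p], or None when no pattern matches."""
    for p in range(MAX_STEP, 1, -1):   # step-length-decrementing loop
        if reg[0:p] == reg[p:2 * p]:
            return p
    return None

# A history whose most recent six bits repeat with period 3, while the
# older bits break every longer period (so only step length 3 matches):
reg = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(match_pattern(reg))   # -> 3
```

Because the loop starts from the longest step, a purely periodic history also matches at multiples of its period; the loop's first hit is therefore the longest matching pattern, which is exactly the selection rule of step S1043.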
It can be understood that the prefetch pattern selection in step S104 is implemented based on the cache line access history recorded in the memory sub-region, so a precondition for this step is that the history recorded in the memory sub-region satisfies the condition required for pattern matching, which is determined by the maximum prefetch pattern length among all the preset candidate prefetch patterns. In the embodiment of the invention, the amount of history recorded by the memory sub-region is required to be at least 2 times the maximum prefetch pattern length; otherwise, the number of candidate prefetch patterns and the prefetch pattern lengths need to be adjusted so that the method of the embodiment can execute normally.
S105, performing hardware data prefetching according to the prefetching mode.
Specifically, for step S105, the implementation process is performed according to the following substeps:
S1051, when a matched prefetch pattern exists and the prefetch queue is not full, starting prefetch address generation;
S1052, after prefetch address generation is started, first generating a prefetch address according to the prefetch pattern and the prefetch distance, then accessing the memory sub-region corresponding to the prefetch address, and updating the prefetch information into the prefetch history bitmap of that memory sub-region;
S1053, according to the idle condition of the L1 data cache, taking prefetch addresses from the prefetch queue in enqueue order and issuing the prefetch requests to the lower memory hierarchy. At the end of step S1053, one complete hardware data prefetch flow ends.
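Substeps S1051-S1053 can be sketched as a small queue-driven model. The queue depth, cache-line size, 4 KB sub-region size, and helper names here are illustrative assumptions, not values fixed by this embodiment.

```python
from collections import deque

CACHE_LINE = 64      # assumed cache-line size in bytes
QUEUE_DEPTH = 8      # assumed prefetch queue depth
SUBREGION = 0x1000   # assumed 4 KB memory sub-region

prefetch_queue = deque()
prefetch_bitmaps = {}  # sub-region base address -> bitmap of prefetched lines

def generate_prefetch(last_addr, step, direction, distance):
    """S1051/S1052: when a matched pattern exists and the queue is not
    full, generate prefetch addresses from the pattern (step, direction)
    and the prefetch distance, and record them in the prefetch history
    bitmap of the corresponding memory sub-region."""
    if len(prefetch_queue) >= QUEUE_DEPTH:
        return
    for d in range(1, distance + 1):
        addr = last_addr + direction * step * d * CACHE_LINE
        prefetch_queue.append(addr)
        base = addr & ~(SUBREGION - 1)
        line = (addr & (SUBREGION - 1)) // CACHE_LINE
        prefetch_bitmaps[base] = prefetch_bitmaps.get(base, 0) | (1 << line)

def issue_prefetches(l1_idle_slots):
    """S1053: dequeue in enqueue (FIFO) order and issue the requests
    downstream while the L1 data cache has idle slots."""
    issued = []
    while prefetch_queue and l1_idle_slots > 0:
        issued.append(prefetch_queue.popleft())
        l1_idle_slots -= 1
    return issued
```

For instance, with a matched step length of 3, direction +1, and prefetch distance 2, an access at 0x4000 would enqueue prefetches for 0x40C0 and 0x4180, issued later in FIFO order as L1 slots free up.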
In practice, steps S101-S105 are performed in stages in the processor. In particular, step S105 performs hardware data prefetching only after the front end has confirmed a correct prefetch pattern; conversely, when no correct prefetch pattern is available, the execution entity of step S105 can notify the front end to perform prefetch pattern matching. To achieve this, the execution entity of step S105 implements a retraining counter: each time the prefetch pattern is judged incorrect in step S103, the value of the retraining counter is incremented by 1, and when the retraining counter reaches the preset retraining threshold, the execution entity of step S105 sets the current prefetch pattern to invalid and notifies the execution entity of step S104 at the front end to perform a new pattern match. In particular, while the retraining counter increases but has not yet reached the preset first threshold, the execution entity of step S105 does not notify the execution entity of step S104 to perform a new pattern match, and the flow of the embodiment continues to use the existing prefetch pattern.
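The retraining handshake described above can be sketched as follows. The threshold value and the reset-on-correct behavior are assumptions for illustration; the patent specifies only that the counter increments on each incorrect verification and triggers rematching at the threshold.

```python
RETRAIN_THRESHOLD = 4   # assumed value of the preset retraining threshold

class RetrainControl:
    """Sketch of the retraining counter of step S105: each incorrect
    verification increments the counter; once it reaches the threshold,
    the current pattern is invalidated and the front end (step S104) is
    asked to perform a new pattern match."""
    def __init__(self):
        self.counter = 0
        self.pattern_valid = True

    def on_verify(self, correct):
        """Return True when a new pattern match should be triggered."""
        if correct:
            self.counter = 0            # assumption: reset on a correct hit
            return False
        self.counter += 1
        if self.counter >= RETRAIN_THRESHOLD:
            self.pattern_valid = False  # invalidate the current pattern
            self.counter = 0
            return True                 # notify the front end to rematch
        return False                    # below threshold: keep the pattern

rc = RetrainControl()
print([rc.on_verify(False) for _ in range(4)])  # -> [False, False, False, True]
```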
The beneficial effects of the invention are as follows: the hardware data prefetching method based on spatial pattern matching can adaptively adjust the prefetch distance and match a more favorable prefetch pattern according to the characteristics of the cached memory data, thereby improving prefetch accuracy; at the same time, recording the accessed and prefetched history based on bitmaps provides efficient data support for pattern matching, reduces hardware cost, and further reduces the bus pressure on the processor.
Example two
Referring to fig. 5, fig. 5 is a schematic structural diagram of a hardware data prefetching system based on spatial pattern matching according to an embodiment of the present invention, which includes:
the memory area matching module 201 is configured to obtain a memory access request, search a memory area information table for a corresponding memory subarea according to a memory access request address corresponding to the memory access request, and obtain subarea information corresponding to the memory subarea;
A prefetch verification module 202, configured to verify, according to the access request address and the sub-region information, the correctness of a prefetch mode of the current hardware data prefetch, to obtain verification information about whether the current prefetch mode is correct, where the prefetch mode is used for performing hardware data prefetch at a specific prefetch distance;
the adaptive adjustment module 203 is configured to adjust a prefetch distance of the prefetch mode if the verification information indicates that the current prefetch mode is correct;
A pattern matching module 204, configured to select one of all the alternative prefetch patterns as a new and valid prefetch pattern according to a preset pattern matching method when the verification information indicates that the current prefetch pattern is incorrect and the preset retraining counter reaches a preset first threshold;
the prefetch generation module 205 is configured to prefetch hardware data according to the prefetch mode, where the prefetch generation module 205 is further configured to increment the value of the preset retrain counter each time the verification information indicates that the current prefetch mode is incorrect.
The spatial-pattern-matching-based hardware data prefetching system 200 can implement the steps of the spatial-pattern-matching-based hardware data prefetching method in the above embodiment and achieve the same technical effects; reference is made to the description in the above embodiment, which is not repeated here.
Example III
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention, where the computer device 300 includes a memory 302, a processor 301, and a hardware data prefetching program stored in the memory 302 and capable of running on the processor 301 and based on spatial pattern matching.
The processor 301 invokes the spatial pattern matching-based hardware data prefetching program stored in the memory 302 to execute the steps in the spatial pattern matching-based hardware data prefetching method provided in the embodiment of the present invention, please refer to fig. 1, specifically including the following steps:
S101, acquiring a memory access request, searching a memory area information table for a matching memory subarea according to the memory access request address corresponding to the memory access request, and acquiring subarea information corresponding to the memory subarea.
The memory area comprises a plurality of memory subareas, each memory subarea comprises a plurality of cache lines, wherein the subarea information contained in each memory subarea records the accessed cache lines in the memory subarea through an access history bitmap, records the prefetched cache lines in the memory subarea through a prefetching history bitmap, records the access direction of the memory subarea through a direction counter, and points to the last accessed cache line through a latest access pointer.
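As a hypothetical illustration of the sub-region information described above, the two history bitmaps, direction counter, and latest access pointer can be sketched as follows; the field names and the sub-region size are assumptions for illustration, not values fixed by the patent.

```python
from dataclasses import dataclass

LINES_PER_SUBREGION = 64  # assumed number of cache lines per sub-region

@dataclass
class SubRegionInfo:
    """Sketch of the per-sub-region information described above: two
    history bitmaps, a direction counter, and a latest access pointer."""
    base_addr: int            # base address of the memory sub-region
    access_bitmap: int = 0    # one bit per accessed cache line
    prefetch_bitmap: int = 0  # one bit per prefetched cache line
    direction: int = 0        # direction counter: >0 ascending, <0 descending
    last_ptr: int = 0         # latest access pointer (cache-line index)

    def record_access(self, line_idx):
        """Update the access history bitmap, direction counter, and
        latest access pointer for a newly accessed cache line."""
        self.direction += 1 if line_idx >= self.last_ptr else -1
        self.last_ptr = line_idx
        self.access_bitmap |= 1 << line_idx

    def record_prefetch(self, line_idx):
        self.prefetch_bitmap |= 1 << line_idx
```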
S102, verifying the correctness of a prefetching mode of the current hardware data prefetching according to the access request address and the subarea information to obtain verification information of whether the current prefetching mode is correct or not, wherein the prefetching mode is used for prefetching the hardware data at a specific prefetching distance. If the verification information indicates that the current prefetching mode is correct, step S103 is executed, and if the verification information indicates that the current prefetching mode is incorrect, step S104 is executed.
Verifying the correctness of the prefetching mode of the current hardware data prefetching according to the access request address and the subarea information, and obtaining verification information of whether the current prefetching mode is correct or not, wherein the verification information comprises the following steps:
if the sub-region information indicates that the cache line corresponding to the access request address is prefetched, or the cache line can be prefetched according to the current prefetching mode, judging that the current prefetching mode is correct;
if the sub-region information indicates that the cache line corresponding to the access request address is not prefetched and cannot be prefetched according to the current prefetching mode, or if the distance between the access request address and the last accessed memory address is greater than a specific distance, judging that the current prefetching mode is incorrect.
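Under stated assumptions (a simplified reachability check and an assumed value for the "specific distance"), the two verification rules can be sketched as a single predicate:

```python
def verify_pattern(line_idx, last_ptr, prefetch_bitmap, step, direction,
                   distance, max_gap=8):
    """Return True when the current prefetch pattern is judged correct.
    line_idx: cache line of the incoming access; prefetch_bitmap: bits of
    already-prefetched lines; max_gap: assumed 'specific distance'."""
    # Rule 2: too large a jump from the last access -> judged incorrect.
    if abs(line_idx - last_ptr) > max_gap:
        return False
    # Rule 1a: the accessed line was already prefetched -> correct.
    if prefetch_bitmap & (1 << line_idx):
        return True
    # Rule 1b: the line can still be prefetched by the current pattern,
    # i.e. it lies a whole number of steps ahead within the prefetch
    # distance (a simplified reading of "can be prefetched").
    delta = (line_idx - last_ptr) * direction
    return delta > 0 and delta % step == 0 and delta // step <= distance
```

For example, with step length 3, ascending direction, and prefetch distance 2, an access three lines ahead of the latest access pointer satisfies rule 1b even before the prefetch lands.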
And after the step of verifying the correctness of the prefetching mode of the current hardware data prefetching according to the access request address and the subarea information to obtain the verification information of whether the current prefetching mode is correct or not, the method further comprises the following steps:
and updating the access history bitmap of the corresponding memory subarea according to the access request address so as to record the cache line accessed by the access request address.
And S103, if the verification information indicates that the current prefetching mode is correct, adjusting the prefetching distance of the prefetching mode.
The step of adjusting the prefetch distance of the prefetch mode includes the substeps of defining an acknowledge counter and an adjust counter:
acquiring the current prefetching distance of the prefetching mode;
When the verification information indicates that the current prefetch mode is correct, the value of the confirmation counter is increased; when the value of the confirmation counter reaches a preset second threshold value, the confirmation counter is cleared and the value of the adjustment counter is increased;
When the value of the confirmation counter reaches the preset second threshold value and the value of the adjustment counter does not reach a preset third threshold value, increasing the prefetching distance of the prefetching mode;
and when the value of the confirmation counter reaches the preset second threshold value and the value of the adjustment counter reaches the preset third threshold value, doubling the prefetching distance.
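A minimal sketch of this adjustment scheme, under one reading of the counter wording and with assumed threshold values and distance cap:

```python
CONFIRM_THRESHOLD = 4   # assumed preset second threshold
ADJUST_THRESHOLD = 2    # assumed preset third threshold
MAX_DISTANCE = 8        # assumed cap on the prefetch distance

class DistanceAdapter:
    """Sketch of step S103: repeated correct verifications grow the
    prefetch distance, first additively, then by doubling."""
    def __init__(self, distance=1):
        self.distance = distance
        self.confirm = 0    # confirmation counter
        self.adjust = 0     # adjustment counter

    def on_correct(self):
        """Called when the current prefetch pattern is verified correct;
        returns the (possibly adjusted) prefetch distance."""
        self.confirm += 1
        if self.confirm < CONFIRM_THRESHOLD:
            return self.distance
        # Confirmation threshold reached: clear it and consume the hit
        # in the adjustment counter (one reading of the wording above).
        self.confirm = 0
        if self.adjust < ADJUST_THRESHOLD:
            self.adjust += 1
            self.distance = min(self.distance + 1, MAX_DISTANCE)  # grow by 1
        else:
            self.distance = min(self.distance * 2, MAX_DISTANCE)  # double
        return self.distance
```

With these assumed thresholds, the distance grows 1 → 2 → 3 over the first two adjustment rounds and then doubles (3 → 6 → 8, capped) once the adjustment counter saturates.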
S104, if the verification information indicates that the current prefetching mode is incorrect, selecting one from all the alternative prefetching modes according to a preset mode matching method as the current correct prefetching mode.
A step of selecting one of all the alternative pre-fetch modes as a new and valid pre-fetch mode according to a preset pattern matching method, comprising the sub-steps of:
Defining a pattern matching register whose window length is a preset matching length, shifting and reversing the access history bitmap recorded in the sub-region information according to the latest access pointer and the direction counter, so that the recording position of the cache line corresponding to the latest access pointer is aligned with the first window of the pattern matching register, and copying the aligned access history bitmap data into the pattern matching register;
Taking the maximum prefetch pattern length among all the candidate prefetch patterns as the initial step length, repeatedly extracting two groups of data from the pattern matching register according to the current step length in a step-length-decrementing loop, and comparing whether the two groups of data are identical; if so, marking the candidate prefetch pattern whose prefetch distance equals the current step length value as a matched prefetch pattern;
After the step-length-decrementing loop ends, selecting the matched prefetch pattern with the longest step length as the current correct prefetch pattern.
The value of the preset matching length is 2 times the value of the maximum prefetch pattern length.
S105, performing hardware data prefetching according to the prefetching mode.
The computer device 300 provided in the embodiment of the present invention can implement the steps in the hardware data prefetching method based on spatial pattern matching in the above embodiment, and can implement the same technical effects, and is not described herein again with reference to the description in the above embodiment.
Example IV
The embodiment of the invention also provides a storage medium, on which a hardware data prefetching program based on space pattern matching is stored, and when the hardware data prefetching program based on space pattern matching is executed by a processor, the process and the steps in the hardware data prefetching method based on space pattern matching provided by the embodiment of the invention are realized, and the same technical effects can be realized, so that repetition is avoided, and no further description is provided herein.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a program or by instructions controlling the relevant hardware; the spatial-pattern-matching-based hardware data prefetching program may be stored on a computer-readable storage medium, and when executed, may include the steps of the above-described methods. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
While the embodiments of the present invention have been illustrated and described in connection with the drawings, what is presently considered to be the most practical and preferred embodiments of the invention, it is to be understood that the invention is not limited to the disclosed embodiments, but on the contrary, is intended to cover various equivalent modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. The hardware data prefetching method based on the space pattern matching is characterized by comprising the following steps of:
acquiring a memory access request, searching a memory area information table for a matching memory subarea according to the memory access request address corresponding to the memory access request, and acquiring subarea information corresponding to the memory subarea;
Verifying the correctness of a prefetching mode of the current hardware data prefetching according to the access request address and the subarea information to obtain verification information of whether the current prefetching mode is correct or not, wherein the prefetching mode is used for prefetching the hardware data at a specific prefetching distance, and the method comprises the following steps:
if the verification information indicates that the current prefetching mode is correct, adjusting the prefetching distance of the prefetching mode;
If the verification information indicates that the current prefetching mode is incorrect, and when a preset retraining counter reaches a threshold value, selecting one from all the alternative prefetching modes as a new and effective prefetching mode according to a preset mode matching method, wherein the preset retraining counter increases in value every time the verification information indicates that the current prefetching mode is incorrect;
And performing hardware data prefetching according to the prefetching mode.
2. The method of claim 1, wherein the memory region includes a plurality of memory sub-regions, each of the memory sub-regions includes a plurality of cache lines, wherein the sub-region information included in each of the memory sub-regions records the accessed cache line in the memory sub-region via an access history bitmap, records the prefetched cache line in the memory sub-region via a prefetch history bitmap, records the access direction of the memory sub-region via a direction counter, and points to the last accessed cache line via a last access pointer.
3. The method for prefetching hardware data based on spatial pattern matching according to claim 2, wherein the step of verifying the correctness of the prefetching mode of the current hardware data prefetching according to the access request address and the sub-region information, and obtaining the verification information of whether the current prefetching mode is correct or not, is characterized in that:
if the sub-region information indicates that the cache line corresponding to the access request address is prefetched, or the cache line can be prefetched according to the current prefetching mode, judging that the current prefetching mode is correct;
if the sub-region information indicates that the cache line corresponding to the access request address is not prefetched and cannot be prefetched according to the current prefetching mode, or if the distance between the access request address and the last accessed memory address is greater than a specific distance, judging that the current prefetching mode is incorrect.
4. The method of claim 1, wherein the step of adjusting the prefetch distance of the prefetch pattern comprises the substeps of defining an acknowledge counter and an adjust counter:
acquiring the current prefetching distance of the prefetching mode;
When the verification information indicates that the current pre-fetch mode is correct, the value of the confirmation counter is increased, and when the value of the confirmation counter reaches a preset second threshold value, the value of the confirmation counter is cleared and the value of the adjustment counter is increased;
when the value of the confirmation counter reaches the preset second threshold value and the value of the adjustment counter does not reach a preset third threshold value, increasing the prefetching distance of the prefetching mode;
and when the value of the confirmation counter reaches the preset second threshold value and the value of the adjustment counter reaches the preset third threshold value, doubling the prefetching distance.
5. The method for prefetching hardware data based on spatial pattern matching according to claim 2, wherein the step of selecting one of all the candidate prefetching patterns as the current correct prefetching pattern according to a preset pattern matching method comprises the sub-steps of:
Defining a pattern matching register with a window length being a preset matching length, shifting and reversing the cache line recorded by the access history bitmap according to the latest access pointer and the direction counter in the sub-region information, enabling the recording position of the cache line corresponding to the latest access pointer to be aligned with a first window of the pattern matching register, and copying the aligned data of the access history bitmap to the pattern matching register;
Taking the maximum prefetching pattern length among all the candidate prefetching modes as the initial step length, repeatedly extracting two groups of data from the pattern matching register according to the current step length in a step-length-decrementing loop, and comparing whether the two groups of data are identical; if so, marking the candidate prefetching mode whose prefetch distance equals the current step length value as a matched prefetching mode;
After the step-length-decrementing loop is completed, selecting the matched prefetching mode with the longest step length as the new and valid prefetching mode.
6. The method for prefetching hardware data based on spatial pattern matching of claim 5 wherein said predetermined match length has a value that is 2 times the value of said maximum prefetch pattern length.
7. The method for prefetching hardware data based on spatial pattern matching according to claim 2, wherein the step of verifying the correctness of the prefetching mode of the current hardware data prefetching according to the access request address and the sub-region information, and obtaining the verification information of whether the current prefetching mode is correct or not, further comprises the steps of:
and updating the access history bitmap of the corresponding memory subarea according to the access request address so as to record the cache line accessed by the access request address.
8. A hardware data prefetch system based on spatial pattern matching, comprising:
the memory area matching module is used for acquiring a memory access request, searching a memory subarea corresponding to the memory access request in a memory area information table according to a memory access request address corresponding to the memory access request, and acquiring subarea information corresponding to the memory subarea;
the prefetch verification module is used for verifying the correctness of a prefetch mode of the current hardware data prefetching according to the access request address and the subarea information to obtain verification information of whether the current prefetch mode is correct or not, and the prefetch mode is used for prefetching the hardware data at a specific prefetch distance;
the self-adaptive adjustment module is used for adjusting the prefetching distance of the prefetching mode if the verification information indicates that the current prefetching mode is correct;
The pattern matching module is used for selecting one from all the alternative prefetching modes as a new and effective prefetching mode according to a preset pattern matching method when the verification information indicates that the current prefetching mode is incorrect and the preset retraining counter reaches a preset first threshold value;
the system comprises a prefetch generation module, a preset retraining counter and a verification module, wherein the prefetch generation module is used for prefetching hardware data according to the prefetch mode, and the prefetch generation module is also used for increasing the value of the preset retraining counter when the verification information shows that the current prefetch mode is incorrect each time.
9. A computer device comprising a memory, a processor and a spatial pattern matching based hardware data prefetch program stored on the memory and executable on the processor, the processor implementing the steps in the spatial pattern matching based hardware data prefetch method as claimed in any one of claims 1 to 7 when executing the spatial pattern matching based hardware data prefetch program.
10. A storage medium having stored thereon a spatial pattern matching based hardware data pre-fetching program, which when executed by a processor implements the steps of the spatial pattern matching based hardware data pre-fetching method according to any of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511626952.9A CN121070813A (en) | 2025-11-07 | 2025-11-07 | Hardware data prefetching method, system, and storage medium based on spatial pattern matching |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN121070813A true CN121070813A (en) | 2025-12-05 |
Family
ID=97851495
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202511626952.9A Pending CN121070813A (en) | 2025-11-07 | 2025-11-07 | Hardware data prefetching method, system, and storage medium based on spatial pattern matching |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN121070813A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102169429A (en) * | 2010-03-29 | 2011-08-31 | 威盛电子股份有限公司 | Prefetch unit, data prefetch method and microprocessor |
| US20240184581A1 (en) * | 2022-12-02 | 2024-06-06 | SiFive, Inc. | Bit pattern matching hardware prefetcher |
| CN118245512A (en) * | 2024-05-22 | 2024-06-25 | 北京开源芯片研究院 | Prefetch control method, device, electronic device and readable storage medium |
| CN119883954A (en) * | 2025-03-26 | 2025-04-25 | 山东云海国创云计算装备产业创新中心有限公司 | Data prefetching method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |