WO2023236443A1 - Processor, electronic device and multi-thread shared instruction prefetching method - Google Patents
- Publication number
- WO2023236443A1 (application PCT/CN2022/130379)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- prefetch
- prefetch request
- request
- cache
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application belongs to the field of computer technology, and specifically relates to a processor, an electronic device, and a multi-thread shared instruction prefetching method.
- the instruction cache is usually designed to alleviate instruction fetch latency. If a fetch request hits the instruction cache, the instruction is obtained immediately; if it misses, the request must be sent to the next-level cache or main memory to obtain the instruction, and this kind of handling usually requires a long response time, which degrades performance.
- instruction prefetching technology was proposed in order to improve the hit rate of fetch requests in the instruction cache, reduce fetch latency, and prevent the system from stalling for lack of instructions.
- the key factors in prefetch quality are controlling when prefetching happens (prefetching too early or too late reduces its benefit) and avoiding repeated prefetching across thread groups (which wastes prefetch bandwidth).
- current instruction prefetching techniques mainly prefetch sequentially or jump by a fixed stride, that is, they prefetch the next fixed number of instructions relative to the position of an existing fetch request.
- this style of prefetching leaves too little time margin, so the prefetch is not issued early enough.
- the purpose of this application is to provide a processor, an electronic device, and a multi-thread shared instruction prefetching method, to address the problem that existing instruction prefetching techniques do not prefetch early enough, so instruction accesses miss with high probability and a long wait is needed to obtain the corresponding instructions.
- Some embodiments of the present application provide a processor including an instruction cache and a thread group unit; the thread group unit is configured to send a first instruction prefetch request to the instruction cache, where the first instruction prefetch request is used to obtain instructions corresponding to a thread group that will be created in the future; the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request.
- before the thread group is created, the first instruction prefetch request for the instructions of the thread group to be created in the future is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch the instructions ahead of use; this reduces the probability that subsequent instruction accesses miss, and the effect is particularly pronounced for new threads.
- the first instruction prefetch request may include fetch address information and a prefetch instruction count; the instruction cache performing instruction prefetching in response to the first instruction prefetch request may include: in response to the first instruction prefetch request, obtaining the indicated number of instructions from a lower-level cache according to the fetch address information and storing them in the instruction cache.
- the processor may further include an instruction fetching unit; the thread group unit may also be configured to distribute the created thread group to the instruction fetching unit;
- the instruction fetch unit may be configured to issue a corresponding instruction fetch request based on the received thread group;
- the instruction cache may also be configured to respond to the instruction fetch request and return the instruction hit by the instruction fetch request.
- the created thread group is distributed to the instruction fetch unit, so that the instruction fetch unit issues a corresponding fetch request based on the received thread group and obtains the corresponding instructions from the instruction cache for subsequent operations; in addition, because the fetch request is sent after the first instruction prefetch request, the probability of subsequent instruction-access misses is reduced, and the effect is particularly pronounced for new threads.
- there may be multiple instruction fetching units.
- the instruction fetching unit may also be configured to monitor the storage amount of instructions corresponding to each thread group that have been prefetched into the instruction cache. When the amount is less than the preset threshold, a second instruction prefetch request is sent to the instruction cache; the instruction cache may also be configured to perform instruction prefetching in response to the second instruction prefetch request.
- by monitoring, for each thread group, how many prefetched instructions remain in the instruction cache and sending a second instruction prefetch request when that amount falls below a preset threshold, instructions are prefetched ahead of time, which increases the probability that subsequent fetch requests hit and shortens the fetch wait time.
- the instruction cache may be further configured to, before performing instruction prefetching in response to the second instruction prefetch request, determine upon receiving the second instruction prefetch request that it does not already exist in a prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a recent period of time.
- before prefetching in response to a second instruction prefetch request, the instruction cache first checks that the request is not already in the prefetch status table and only then performs the prefetch; this avoids repeated prefetching and wasted prefetch bandwidth.
- the instruction cache may be further configured to, when receiving a second instruction prefetch request, record the received second instruction prefetch request in the prefetch status table.
- recording the received second instruction prefetch request in the prefetch status table makes it easier to manage second instruction prefetch requests and to track their prefetch status.
- the instruction cache may be specifically configured such that: if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request, the received second instruction prefetch request is recorded directly in the prefetch status table; or, if the prefetch status table already contains another instruction prefetch request from that thread group, the prefetch range of that other request is updated so that the updated range covers the prefetch range of the received second instruction prefetch request.
- in other words, when recording a received second instruction prefetch request in the prefetch status table, the request is recorded directly if no other prefetch request from the same thread group is present; if such a request is present, its prefetch range is updated to cover the range of the received request, which avoids repeated prefetching and wasted prefetch bandwidth later on.
- the priority of the instruction cache in responding to the first instruction prefetch request may be lower than the priority of responding to the second instruction prefetch request.
- because the first instruction prefetch request has lower priority, instructions for the thread group can be prefetched using the instruction cache's idle bandwidth, so the normal memory access performance of the instruction cache is not affected.
- the instruction cache may be configured to delete expired prefetch ranges, as well as expired second instruction prefetch requests, from the prefetch status table.
- the second instruction prefetch request may include instruction fetch address information and the number of prefetched instructions.
- other embodiments of the present application also provide an electronic device, including a body and a processor as provided in any of the above embodiments of the present application and/or any possible implementation of some embodiments of the present application.
- Some embodiments of the present application also provide a multi-thread shared instruction prefetching method, including: the instruction cache obtains a first instruction prefetch request sent by a thread group unit, where the first instruction prefetch request is sent by the thread group unit before the thread group is created and is used to obtain instructions corresponding to a thread group that will be created in the future; and the instruction cache performs instruction prefetching in response to the first instruction prefetch request.
- the first instruction prefetch request may include fetch address information and a prefetch instruction count; the instruction cache performing instruction prefetching in response to the first instruction prefetch request may include: in response to the request, obtaining the indicated number of instructions from a lower-level cache according to the fetch address information and storing them in the instruction cache.
- the method may further include: the instruction cache receives a second instruction prefetch request issued by the instruction fetch unit, where the second instruction prefetch request is sent when the amount of instructions already prefetched into the instruction cache for the thread group is less than a preset threshold; and the instruction cache performs instruction prefetching in response to the second instruction prefetch request.
- the method may further include: when the instruction cache receives the second instruction prefetch request, determining that the second instruction prefetch request does not exist in the prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a recent period of time.
- the method may further include: upon receiving a second instruction prefetch request, if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received request, recording the received second instruction prefetch request directly in the prefetch status table; or, if the prefetch status table already contains another instruction prefetch request from that thread group, updating the prefetch range of that other request so that the updated range covers the prefetch range of the received second instruction prefetch request.
- Figure 1 shows a schematic structural diagram of a processor provided by an embodiment of the present application.
- FIG. 2 shows a schematic structural diagram of yet another processor provided by an embodiment of the present application.
- FIG. 3 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 4 shows a schematic flowchart of a multi-thread shared instruction prefetching method provided by an embodiment of the present application.
- the term "connection" should be understood in a broad sense.
- it may be a fixed connection, a detachable connection, or an integral connection; it may be an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal connection between two components.
- the specific meanings of the above terms in this application can be understood on a case-by-case basis.
- because the prefetch is not issued early enough, when a fetch request accesses a prefetched instruction there is a high probability that the prefetch has only just been issued or is still in progress (that is, the corresponding instruction has not yet been obtained); this is especially true for a newly started thread, whose cold start takes relatively long, so the initial fetch requests of the new thread all miss and must wait a long time for the corresponding instructions.
- embodiments of the present application provide a new multi-thread shared instruction prefetching method that effectively reduces the probability of instruction-access misses and shortens the wait time of fetch requests; the effect is particularly pronounced for new threads.
- the processor may include: an instruction cache and a thread group unit, and the instruction cache and the thread group unit may be connected.
- the thread group unit is responsible for creating thread groups. Because the cold start time of a new thread is relatively long, in order to reduce the probability of instruction-access misses and shorten the wait time of fetch requests, the thread group unit may be configured to send the first instruction prefetch request to the instruction cache in advance, before the thread group is created.
- the first instruction prefetch request can be used to obtain the instructions corresponding to a thread group that will be created within a future period of time (the length of which can be configured according to application requirements), so that there is enough time to prefetch the instructions in advance; this reduces the probability of instruction-access misses, and the effect is particularly pronounced for new threads.
- the first instruction prefetch request carries control information for instruction prefetching, for example fetch address information (the global access addresses of the instructions to obtain) and a prefetch instruction count (indicating how many instructions need to be prefetched; its value can be flexibly configured as needed).
- when the access addresses of the instructions to be prefetched are consecutive, the fetch address information may include only the first address, and it suffices to fetch, starting from that first address, as many consecutive instructions as the prefetch instruction count indicates.
- if the access addresses of the instructions to be prefetched are not consecutive, the fetch address information needs to include an access address for each prefetched instruction; for example, if the prefetch instruction count is 32, the fetch address information needs to include the access addresses of all 32 instructions.
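For illustration only, the sketch below models the two forms of fetch address information described above as a small Python structure. The field names (prefetch_count, first_address, addresses) are assumptions made for this sketch, not terms taken from the application, and addresses are treated as abstract instruction indices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefetchRequest:
    """Illustrative model of the control information carried by a prefetch request."""
    prefetch_count: int                     # number of instructions to prefetch
    first_address: Optional[int] = None     # used when the target addresses are consecutive
    addresses: Optional[List[int]] = None   # used when the target addresses are not consecutive

    def target_addresses(self) -> List[int]:
        """Expand the request into the concrete list of instruction addresses to prefetch."""
        if self.addresses is not None:
            # Non-consecutive case: one explicit address per prefetched instruction.
            assert len(self.addresses) == self.prefetch_count
            return list(self.addresses)
        # Consecutive case: only the first address is carried.
        return [self.first_address + i for i in range(self.prefetch_count)]


# Consecutive form: 32 instructions starting at address 0x1000.
req_a = PrefetchRequest(prefetch_count=32, first_address=0x1000)
# Non-consecutive form: one address per instruction.
req_b = PrefetchRequest(prefetch_count=4, addresses=[0x1000, 0x1010, 0x2000, 0x2040])
print(len(req_a.target_addresses()), req_b.target_addresses())
```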
- a thread group can include multiple threads (such as 16, 32, or 64) that fetch the same instructions; threads that fetch the same instructions are grouped into the same thread group so that a single fetch request can obtain the instructions required by all of them, improving fetch efficiency.
- the instruction cache may be configured to perform instruction prefetching in response to the first instruction prefetch request.
- the first instruction prefetch request may include: fetch address information and the number of prefetched instructions.
- the instruction cache is specifically configured to, in response to the first instruction prefetch request, obtain the indicated number of instructions from the lower-level cache (next-level cache or main memory) according to the fetch address information and store them in the instruction cache; for example, if the prefetch instruction count is 64, 64 instructions are fetched from the lower-level cache based on the fetch address information.
- the prefetch instruction count can be flexibly configured as needed and is not limited to the 32 or 64 used in the examples above; the 32 and 64 in the examples should not be understood as limiting the present application.
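As a rough sketch of this prefetch path (not the actual hardware design), the following Python fragment shows how an instruction cache could service such a prefetch request: it walks the requested addresses, fetches any instruction not already present from a simulated next-level memory, and stores it. The NextLevelMemory class and the dictionary-based cache are assumptions made for illustration.

```python
class NextLevelMemory:
    """Stand-in for the next-level cache or main memory (illustrative only)."""
    def read(self, address: int) -> str:
        return f"insn@{address:#x}"        # pretend every address holds an instruction


class InstructionCache:
    def __init__(self, next_level: NextLevelMemory):
        self.next_level = next_level
        self.lines = {}                    # address -> instruction (simplified, no eviction)

    def service_prefetch(self, addresses):
        """Prefetch the given instruction addresses into the cache ahead of use."""
        for addr in addresses:
            if addr not in self.lines:     # skip instructions already cached
                self.lines[addr] = self.next_level.read(addr)


cache = InstructionCache(NextLevelMemory())
# First instruction prefetch request: e.g. 64 consecutive instructions from 0x4000.
cache.service_prefetch(range(0x4000, 0x4000 + 64))
print(len(cache.lines))   # 64 instructions now resident before the thread group exists
```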
- the priority of the first instruction prefetch request is relatively low, so it does not affect the normal memory accesses of the instruction cache; the idle bandwidth of the instruction cache is used to prefetch instructions for the new thread group, reducing the thread group's cold start time.
- how far in advance the request is sent and the number of prefetched instructions can be flexibly configured to obtain the best performance in different application scenarios.
- the instruction cache may be busy serving fetches of an earlier-created thread group, so the first instruction prefetch request may back up without a response; if the thread group corresponding to the first instruction prefetch request has already finished executing, the first instruction prefetch request becomes invalid and is automatically deleted.
- the first instruction prefetch request for the instructions of the thread group to be created in the future is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch the instructions ahead of use; this reduces the probability of subsequent instruction-access misses, and the effect is particularly pronounced for new threads.
- the processor further includes an instruction fetching unit, the schematic diagram of which is shown in Figure 2 .
- the instruction fetch unit is connected to the thread group unit and the instruction cache respectively.
- the thread group unit can also be configured to distribute created thread groups to the instruction fetch unit: a period of time after the thread group unit sends the first instruction prefetch request (the length of which can be configured as needed), it formally creates the thread group corresponding to that request and then distributes the created thread group to the instruction fetch unit.
- there can be multiple instruction fetch units (two or more).
- the specific number of instruction fetch units can be determined by the requirements of parallel instruction fetching; for example, if 8 thread groups need to fetch instructions concurrently, the number of instruction fetch units is 8, and if 16 thread groups need to fetch concurrently, the number is 16.
- Each instruction fetch unit has the same function, for example, it is used to issue an instruction fetch request to the instruction cache based on the thread group.
- Each instruction fetch request carries a global access address, and the instruction can be fetched based on the access address.
- the instruction cache may also be configured to respond to an instruction fetch request and return to the instruction fetch unit the instruction hit by the instruction fetch request. If the instruction fetch request hits the instruction cache, the instruction can be obtained immediately. If the instruction fetch request does not hit the instruction cache, the instruction cache needs to send the instruction fetch request to the subsequent level cache (next level cache or main memory) to obtain the instruction.
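Complementing the prefetch sketch above, the demand-fetch path just described can be modeled as a lookup that returns immediately on a hit and falls back to the next level on a miss. This is again an illustrative assumption, not the cache's actual implementation.

```python
def demand_fetch(cache_lines: dict, address: int, read_next_level):
    """Return (instruction, hit) for a fetch request carrying a global access address."""
    if address in cache_lines:
        return cache_lines[address], True          # hit: instruction returned immediately
    instruction = read_next_level(address)         # miss: forward to next-level cache / memory
    cache_lines[address] = instruction             # fill the cache on the way back
    return instruction, False


lines = {0x4000: "insn@0x4000"}
print(demand_fetch(lines, 0x4000, lambda a: f"insn@{a:#x}"))   # hit
print(demand_fetch(lines, 0x5000, lambda a: f"insn@{a:#x}"))   # miss: long latency in practice
```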
- the instruction fetch unit can also be configured to monitor, for each thread group, the amount of instructions that have been prefetched into the instruction cache, and to send a second instruction prefetch request to the instruction cache when it detects that this amount is less than a preset threshold (which is flexibly configurable).
- the instruction cache may also be configured to perform instruction prefetching in response to a second instruction prefetch request.
- the instruction fetch unit can monitor how the prefetched instructions of each thread group are consumed; when it detects that the amount of prefetched instructions remaining is insufficient (less than the preset threshold), it initiates a second instruction prefetch request for that thread group and sends it to the instruction cache.
- if the preset threshold is 32, instruction prefetching is required whenever the amount of a thread group's instructions already prefetched into the instruction cache falls below 32. For example, suppose a thread group requires 192 instructions in total and, before the thread group is created, the instruction cache prefetches the first 128 of them in advance; when the instruction fetch unit starts executing the thread group, it issues fetch requests to the instruction cache to obtain the corresponding instructions. When the 97th instruction is obtained, the amount of prefetched instructions remaining drops below the preset threshold (32 at this point), so a second instruction prefetch request is sent to the instruction cache to obtain the following instructions for the thread group, for example the 129th to 160th instructions; likewise, when the remaining amount again falls below the threshold, another second instruction prefetch request is sent to the instruction cache to obtain the subsequent instructions, for example the 161st to 192nd instructions.
- the above description, in which the instruction fetch unit monitors the amount of each thread group's instructions already prefetched into the instruction cache and issues a second instruction prefetch request when that amount is insufficient, is intended to explain the prefetching principle; the specific values in the examples should not be understood as limiting this application.
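To make the numbers in the example above concrete, here is a small, purely illustrative simulation of the monitoring logic: 192 instructions are needed in total, 128 are prefetched up front, the threshold is 32, and a second prefetch request for the next batch is triggered whenever the remaining prefetched amount drops below the threshold. All names, and the batch size of 32, are assumptions for this sketch.

```python
TOTAL_NEEDED = 192        # instructions the thread group will consume in total
PREFETCH_BATCH = 32       # instructions requested by each second prefetch request (assumed)
THRESHOLD = 32            # preset threshold on the remaining prefetched amount

prefetched_upto = 128     # the first prefetch request already covered instructions 1..128
issued_requests = []      # (triggering fetch index, first and last instruction of the new request)

for fetched in range(1, TOTAL_NEEDED + 1):        # the fetch unit consumes one instruction at a time
    remaining = prefetched_upto - fetched         # prefetched instructions not yet consumed
    if remaining < THRESHOLD and prefetched_upto < TOTAL_NEEDED:
        first = prefetched_upto + 1
        last = min(prefetched_upto + PREFETCH_BATCH, TOTAL_NEEDED)
        issued_requests.append((fetched, first, last))
        prefetched_upto = last                    # assume the prefetch completes in time

print(issued_requests)
# [(97, 129, 160), (129, 161, 192)] -- the requests fire while fetching the 97th and
# 129th instruction and cover instructions 129-160 and 161-192, as in the example.
```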
- the second instruction prefetch request is similar to the first instruction prefetch request and also carries control information for instruction prefetching, for example fetch address information (the global access addresses of the instructions to obtain) and a prefetch instruction count (indicating how many instructions need to be prefetched; its value can be flexibly configured as needed).
- the instruction cache may also be configured to notify the instruction fetch unit of the instruction prefetching status; the instruction fetch unit is notified periodically or on demand, so that it can track in real time how many of each thread group's instructions have already been prefetched into the instruction cache.
- the fetch range of the first instruction prefetch request and the fetch range of the second instruction prefetch request may or may not overlap. For example, the fetch range of the first instruction prefetch request may be addresses 0 to 31 while the fetch range of the second instruction prefetch request is addresses 32 to 63; alternatively, the two ranges may overlap or even be identical. In other words, the fetch ranges of the first and second instruction prefetch requests can be set flexibly as needed.
- the priority of the first instruction prefetch request is relatively low; this may mean that the priority with which the instruction cache responds to the first instruction prefetch request is lower than the priority with which it responds to the second instruction prefetch request. When the instruction cache is heavily loaded, it may not respond to a thread group's first instruction prefetch request at all, in which case the number of that thread group's instructions prefetched into the instruction cache is zero; when the instruction fetch unit then sends a second instruction prefetch request, its fetch range can start from the first address or from some address after it. If the instruction cache has already responded to the first instruction prefetch request, then when the instruction fetch unit sends a second instruction prefetch request its fetch range can start immediately after the fetch range of the first instruction prefetch request.
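One way to read the priority rule above is as a simple fixed-priority arbiter: demand fetch requests are served first, then second instruction prefetch requests, and first instruction prefetch requests only use whatever bandwidth is left. The sketch below is an assumed illustration of that ordering, not a description of the actual arbitration hardware; the queue names and per-cycle bandwidth are invented for the example.

```python
from collections import deque

# Three queues of pending work for the instruction cache (illustrative).
demand_fetches = deque(["fetch@0x100", "fetch@0x104"])
second_prefetches = deque(["prefetch2: group A, 129-160"])
first_prefetches = deque(["prefetch1: future group B, 1-128"])

BANDWIDTH_PER_CYCLE = 2   # requests the cache can accept each cycle (assumed)

def arbitrate_one_cycle():
    """Pick up to BANDWIDTH_PER_CYCLE requests, highest-priority queues first."""
    chosen = []
    for queue in (demand_fetches, second_prefetches, first_prefetches):
        while queue and len(chosen) < BANDWIDTH_PER_CYCLE:
            chosen.append(queue.popleft())
    return chosen

print(arbitrate_one_cycle())   # demand fetches win; prefetches wait
print(arbitrate_one_cycle())   # idle bandwidth now serves the prefetch requests
```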
- the instruction cache can also be configured to: before performing instruction prefetching in response to the second instruction prefetch request, determine upon receiving the second instruction prefetch request that it is not present in the prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a period of time.
- when the instruction cache receives a second instruction prefetch request, it does not immediately prefetch in response to it; it first determines whether the second instruction prefetch request already exists in the prefetch status table, and only performs the prefetch if it does not. If the second instruction prefetch request is already in the prefetch status table, this indicates that prefetching for it is in progress or has already completed, and the received request can simply be discarded.
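A minimal sketch of that check, assuming the prefetch status table is keyed by an identifier of the second instruction prefetch request (the tuple-shaped key is an assumption for illustration):

```python
def should_service(prefetch_status_table: dict, request_id) -> bool:
    """Drop a second instruction prefetch request that is already in progress or recently done."""
    if request_id in prefetch_status_table:
        # Already recorded: prefetching for this request is in progress or has completed
        # within the retention window, so the duplicate request is discarded.
        return False
    return True


table = {("group A", 129, 160): "in_progress"}
print(should_service(table, ("group A", 129, 160)))   # False -> discard duplicate
print(should_service(table, ("group A", 161, 192)))   # True  -> go ahead and prefetch
```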
- the instruction cache may be further configured to record the received second instruction prefetch request in the prefetch status table when the second instruction prefetch request is received.
- the instruction cache is configured such that if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request (i.e., no other second instruction prefetch request from the same thread group as the current one), the received second instruction prefetch request is recorded directly in the prefetch status table; if the prefetch status table does contain another instruction prefetch request from that thread group, the prefetch range of that other request is updated, and the updated prefetch range covers the prefetch range of the received second instruction prefetch request.
- for example, suppose the instruction fetch unit issues a second instruction prefetch request for a certain thread group for the first time; since the prefetch status table then contains no other instruction prefetch request from that thread group, this second instruction prefetch request is recorded directly in the prefetch status table. Suppose the instruction fetch unit later issues a second instruction prefetch request for the same thread group a second time; because the prefetch status table already contains another instruction prefetch request from that thread group (namely the first-issued second instruction prefetch request), only the prefetch range of that recorded request needs to be updated so that it also covers the prefetch range of the second-issued request, i.e., the control information for instruction prefetching (for example the fetch address information and the prefetch instruction count) recorded for the first-issued second instruction prefetch request is updated.
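The recording-and-merging behavior can be sketched as follows, with one entry per thread group that stores a single prefetch range. Representing the range as a (first, last) pair and widening it to cover the new request are simplifying assumptions made for this illustration.

```python
def record_second_prefetch(table: dict, group_id, first: int, last: int):
    """Record a second instruction prefetch request, merging with an existing entry if any."""
    entry = table.get(group_id)
    if entry is None:
        # No other prefetch request from this thread group: record the new request directly.
        table[group_id] = (first, last)
    else:
        # An earlier request from the same thread group exists: widen its prefetch
        # range so that it also covers the newly received request.
        old_first, old_last = entry
        table[group_id] = (min(old_first, first), max(old_last, last))


status_table = {}
record_second_prefetch(status_table, "group A", 129, 160)   # first request: recorded as-is
record_second_prefetch(status_table, "group A", 161, 192)   # second request: range is merged
print(status_table)   # {'group A': (129, 192)}
```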
- the instruction cache can also be configured to delete expired prefetch ranges and expired second instruction prefetch requests from the prefetch status table, that is, to delete second instruction prefetch requests whose completion time exceeds a preset age and to retain those whose completion time is within the preset age.
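Aging out completed entries could look like the following sketch, where each completed request carries a completion timestamp and entries older than a preset retention time are removed; the timestamp-based representation and the retention value are assumptions for illustration.

```python
import time

RETENTION_SECONDS = 1.0   # preset time a completed request stays in the table (assumed)

def expire_completed(table: dict, now: float):
    """Delete completed second prefetch requests whose completion time is too old."""
    for request_id, completed_at in list(table.items()):
        if completed_at is not None and now - completed_at > RETENTION_SECONDS:
            del table[request_id]         # expired: drop it from the prefetch status table


table = {
    ("group A", 129, 160): time.time() - 5.0,   # completed long ago -> will be deleted
    ("group A", 161, 192): time.time(),         # recently completed -> retained
    ("group B", 1, 32): None,                   # still in progress -> retained
}
expire_completed(table, time.time())
print(sorted(k[0] for k in table))   # entries for the recent and in-progress requests remain
```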
- the processor described in this application can be built as an improvement on the architecture of existing mainstream processors, so that while supporting highly concurrent access to the instruction cache it effectively reduces the probability of instruction-access misses and shortens the time fetch requests wait for their instructions.
- the existing mainstream processors may be general-purpose processors, including central processing units (CPU), network processors (NP), graphics processing units (GPU), and the like; they may also be digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), microprocessors, or any other conventional processors.
- embodiments of the present application also provide an electronic device, which may include a body and the above-mentioned processor.
- the body may include a transceiver, a communication bus, a memory, and so on.
- the structure of the electronic device is shown in Figure 3 .
- the components of the transceiver, memory, and processor are directly or indirectly electrically connected to each other to realize data transmission or interaction.
- these components may be electrically connected to each other through one or more communication buses or signal lines.
- the transceiver can be used to send and receive data.
- Memory can be used to store data.
- the memory can be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on.
- the above-mentioned electronic devices include but are not limited to smartphones, tablets, computers, servers, etc.
- embodiments of the present application also provide a multi-thread shared instruction prefetching method, as shown in Figure 4 .
- the multi-thread shared instruction prefetching method provided by the embodiment of the present application can be applied to the above-mentioned processor.
- the multi-thread shared instruction prefetching method provided by the embodiment of the present application will be described below with reference to FIG. 4 .
- the instruction cache obtains the first instruction prefetch request sent by the thread group unit, where the first instruction prefetch request is sent by the thread group unit before the thread group is created.
- the first instruction prefetch request is used to obtain instructions corresponding to the thread group to be created in the future.
- the instruction cache responds to the first instruction prefetch request and performs instruction prefetching.
- the first instruction prefetch request includes: fetch address information and the number of prefetched instructions.
- the process by which the instruction cache prefetches in response to the first instruction prefetch request may be: in response to the first instruction prefetch request, obtaining the indicated number of instructions from the lower-level cache (next-level cache or main memory) according to the fetch address information and storing them in the instruction cache.
- the instruction cache is specifically configured to, in response to the first instruction prefetch request, obtain the indicated number of instructions from the lower-level cache (next-level cache or main memory) according to the fetch address information and store them in the instruction cache; for example, if the prefetch instruction count is 64, 64 instructions are fetched from the lower-level cache based on the fetch address information. It should be noted that the prefetch instruction count can be flexibly configured as needed and is not limited to the 64 used here, so 64 should not be understood as limiting this application.
- the multi-thread shared instruction prefetching method may also include: the instruction cache receives a second instruction prefetch request issued by the instruction fetch unit, where the second instruction prefetch request is sent when the amount of the thread group's instructions already prefetched into the instruction cache is less than the preset threshold; and the instruction cache performs instruction prefetching in response to the second instruction prefetch request.
- the multi-thread shared instruction prefetching method may also include: when receiving the second instruction prefetch request, the instruction cache determines that the second instruction prefetch request does not exist in the prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a period of time.
- the multi-thread shared instruction prefetching method may also include: upon receiving a second instruction prefetch request, if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received request, recording the received second instruction prefetch request directly in the prefetch status table; or, if the prefetch status table already contains another instruction prefetch request from that thread group, updating the prefetch range of that other request so that the updated range covers the prefetch range of the received second instruction prefetch request.
- the processor includes an instruction cache and a thread group unit; the thread group unit is configured to send a first instruction prefetch request to the instruction cache, the first instruction prefetch request is used to obtain instructions corresponding to a thread group that will be created in the future, and the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request.
- the first instruction prefetch request for the instructions of the thread group to be created in the future is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch the instructions ahead of use; this reduces the probability of subsequent instruction-access misses, and the effect is particularly pronounced for new threads.
- the processor, electronic device and multi-thread shared instruction prefetching method of the present application are reproducible and can be used in a variety of industrial applications.
- the processor, electronic device, and multi-thread shared instruction prefetching method of the present application can be used in any computer that requires a processor.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
Abstract
Description
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 202210649455.0, titled "A processor, electronic device and multi-thread shared instruction prefetching method", filed with the China National Intellectual Property Administration on June 10, 2022, the entire contents of which are incorporated herein by reference.
The present application belongs to the field of computer technology, and specifically relates to a processor, an electronic device, and a multi-thread shared instruction prefetching method.
Processor instruction fetch performance is very important, and an instruction cache is usually provided to alleviate instruction fetch latency. If a fetch request hits the instruction cache, the instruction is obtained immediately; if it misses, the request must be sent to the next-level cache or main memory to obtain the instruction, and this kind of handling usually requires a long response time, which degrades performance.
Instruction prefetching was proposed to improve the hit rate of fetch requests in the instruction cache, reduce fetch latency, and prevent the system from stalling for lack of instructions. The key factors in prefetch quality are controlling when prefetching happens (prefetching too early or too late reduces its benefit) and avoiding repeated prefetching across thread groups (which wastes prefetch bandwidth).
Current instruction prefetching techniques mainly prefetch sequentially or jump by a fixed stride, that is, they prefetch the next fixed number of instructions relative to the position of an existing fetch request. This style of prefetching leaves too little time margin and the prefetch is not issued early enough, so when a fetch request accesses a prefetched instruction there is a high probability that the prefetch has only just been issued or is still in progress (that is, the corresponding instruction has not yet been obtained). This is especially true for a newly started thread: because the cold start of a new thread takes relatively long, the initial fetch requests of the new thread all miss and must wait a long time for the corresponding instructions, so the purpose of prefetching is not achieved.
Summary of the invention
In view of this, the purpose of this application is to provide a processor, an electronic device, and a multi-thread shared instruction prefetching method, to address the problem that existing instruction prefetching techniques do not prefetch early enough, so instruction accesses miss with high probability and a long wait is needed to obtain the corresponding instructions.
Some embodiments of the present application provide a processor including an instruction cache and a thread group unit. The thread group unit is configured to send a first instruction prefetch request to the instruction cache, where the first instruction prefetch request is used to obtain instructions corresponding to a thread group that will be created within a future period of time; the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request.
In some embodiments of the present application, before a thread group is created, the first instruction prefetch request for the instructions of the thread group to be created in the future is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch the instructions ahead of use. This reduces the probability that subsequent instruction accesses miss, and the effect is particularly pronounced for new threads.
In a possible implementation of some embodiments of the present application, the first instruction prefetch request may include fetch address information and a prefetch instruction count, and the instruction cache performing instruction prefetching in response to the first instruction prefetch request may include: in response to the first instruction prefetch request, obtaining the indicated number of instructions from a lower-level cache according to the fetch address information and storing them in the instruction cache.
In a possible implementation of some embodiments of the present application, the processor may further include an instruction fetch unit; the thread group unit may also be configured to distribute created thread groups to the instruction fetch unit; the instruction fetch unit may be configured to issue a corresponding instruction fetch request based on the received thread group; and the instruction cache may also be configured to respond to the instruction fetch request and return the instructions hit by it.
In some embodiments of the present application, the created thread group is distributed to the instruction fetch unit, so that the instruction fetch unit issues the corresponding fetch request based on the received thread group and obtains the corresponding instructions from the instruction cache for subsequent operations. In addition, because the fetch request is sent after the first instruction prefetch request, the probability of subsequent instruction-access misses is reduced, and the effect is particularly pronounced for new threads.
In a possible implementation of some embodiments of the present application, there may be multiple instruction fetch units.
In a possible implementation of the embodiments of the present application, the instruction fetch unit may also be configured to monitor, for each thread group, the amount of instructions already prefetched into the instruction cache, and to send a second instruction prefetch request to the instruction cache when it detects that this amount is less than a preset threshold; the instruction cache may also be configured to perform instruction prefetching in response to the second instruction prefetch request.
In some embodiments of the present application, by monitoring the amount of each thread group's instructions already prefetched into the instruction cache and sending a second instruction prefetch request to the instruction cache when that amount is less than the preset threshold, instructions are prefetched ahead of time, which increases the probability that subsequent fetch requests hit and shortens the fetch wait time.
In a possible implementation of some embodiments of the present application, the instruction cache may be further configured to: before performing instruction prefetching in response to the second instruction prefetch request, determine upon receiving the second instruction prefetch request that it does not exist in the prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a period of time.
In some embodiments of the present application, before prefetching in response to a second instruction prefetch request, the instruction cache first determines that the request is not present in the prefetch status table and only then performs the prefetch, which avoids repeated prefetching and wasted prefetch bandwidth.
In a possible implementation of some embodiments of the present application, the instruction cache may be further configured to, when receiving a second instruction prefetch request, record the received second instruction prefetch request in the prefetch status table.
In some embodiments of the present application, recording the received second instruction prefetch request in the prefetch status table makes it easier to manage second instruction prefetch requests and to track their prefetch status.
In a possible implementation of some embodiments of the present application, the instruction cache may be specifically configured such that: if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request, the received second instruction prefetch request is recorded directly in the prefetch status table; or, if the prefetch status table already contains another instruction prefetch request from that thread group, the prefetch range of that other request is updated so that the updated range covers the prefetch range of the received second instruction prefetch request.
In some embodiments of the present application, when recording a received second instruction prefetch request in the prefetch status table, the request is recorded directly if no other prefetch request from the same thread group is present; if such a request is present, its prefetch range is updated to cover the range of the received request, which avoids repeated prefetching and wasted prefetch bandwidth later on.
In a possible implementation of some embodiments of the present application, the priority with which the instruction cache responds to the first instruction prefetch request may be lower than the priority with which it responds to the second instruction prefetch request.
In some embodiments of the present application, by making the instruction cache respond to the first instruction prefetch request at a lower priority than the second instruction prefetch request, instructions can be prefetched for the thread group using the instruction cache's idle bandwidth, so the normal memory access performance of the instruction cache is not affected.
In a possible implementation of some embodiments of the present application, the instruction cache may be configured to delete expired prefetch ranges and expired second instruction prefetch requests from the prefetch status table.
In a possible implementation of some embodiments of the present application, the second instruction prefetch request may include fetch address information and a prefetch instruction count.
Other embodiments of the present application also provide an electronic device, including a body and a processor as provided in any of the above embodiments of the present application and/or any possible implementation of some embodiments of the present application.
Further embodiments of the present application also provide a multi-thread shared instruction prefetching method, including: the instruction cache obtains a first instruction prefetch request sent by a thread group unit, where the first instruction prefetch request is sent by the thread group unit before the thread group is created and is used to obtain instructions corresponding to a thread group that will be created within a future period of time; and the instruction cache performs instruction prefetching in response to the first instruction prefetch request.
In a possible implementation of further embodiments of the present application, the first instruction prefetch request may include fetch address information and a prefetch instruction count, and the instruction cache performing instruction prefetching in response to the first instruction prefetch request may include: in response to the first instruction prefetch request, obtaining the indicated number of instructions from a lower-level cache according to the fetch address information and storing them in the instruction cache.
In a possible implementation of further embodiments of the present application, the method may further include: the instruction cache receives a second instruction prefetch request issued by the instruction fetch unit, where the second instruction prefetch request is sent when the amount of the thread group's instructions already prefetched into the instruction cache is less than a preset threshold; and the instruction cache performs instruction prefetching in response to the second instruction prefetch request.
In a possible implementation of further embodiments of the present application, before the instruction cache performs instruction prefetching in response to the second instruction prefetch request, the method may further include: when the instruction cache receives the second instruction prefetch request, determining that the second instruction prefetch request does not exist in the prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a period of time.
In a possible implementation of further embodiments of the present application, the method may further include: upon receiving a second instruction prefetch request, if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received request, recording the received second instruction prefetch request directly in the prefetch status table; or, if the prefetch status table already contains another instruction prefetch request from that thread group, updating the prefetch range of that other request so that the updated range covers the prefetch range of the received second instruction prefetch request.
Additional features and advantages of the present application will be set forth in the description that follows, and in part will be apparent from the description or may be learned by practicing the embodiments of the present application. The objectives and other advantages of the present application may be realized and attained by the structures particularly pointed out in the written description and the accompanying drawings.
To describe the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings required by the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort. The above and other objectives, features and advantages of the present application will become clearer from the drawings. The same reference numerals denote the same parts throughout the drawings. The drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present application.
Figure 1 shows a schematic structural diagram of a processor provided by an embodiment of the present application.
Figure 2 shows a schematic structural diagram of another processor provided by an embodiment of the present application.
Figure 3 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Figure 4 shows a schematic flowchart of a multi-thread shared instruction prefetching method provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
In the description of the present application, it should also be noted that, unless otherwise expressly specified and limited, the terms "connected" and "connection" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection or an integral connection; it may also be an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or an internal communication between two elements. For a person of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to the specific situation.
Furthermore, the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, A and B exist simultaneously, and B exists alone. Unless otherwise specified, the term "a plurality of" refers to two or more.
Existing instruction prefetching technology has defects. For example, because prefetching is not timely enough, when an instruction fetch request accesses a prefetched instruction, there is a high probability that the prefetch of that instruction has only just been issued or is still in progress (that is, the corresponding prefetched instruction has not yet been obtained). This is especially true for a newly started thread: since the cold start time of a new thread is relatively long, the initial instruction fetch requests of the new thread are all misses, and a long wait is required before the corresponding instructions can be obtained.
In view of this, embodiments of the present application provide a new multi-thread shared instruction prefetching method, which can effectively reduce the probability of instruction access misses and shorten the time an instruction fetch request waits to obtain instructions; the effect is especially pronounced for new threads.
For a better understanding, the following description is given with reference to the processor shown in Figure 1. The processor may include an instruction cache and a thread group unit, and the instruction cache and the thread group unit may be connected.
The thread group unit is responsible for creating thread groups. Since the cold start time of a new thread is relatively long, in order to reduce the probability of instruction access misses and shorten the time an instruction fetch request waits to obtain instructions, the thread group unit may be configured to send a first instruction prefetch request to the instruction cache in advance, before creating a thread group. The first instruction prefetch request may be used to obtain instructions corresponding to a thread group that will be created within an upcoming period of time (the length of which can be configured according to application requirements), so that there is enough time to prefetch instructions ahead of time. This reduces the probability of instruction access misses, and the effect is especially pronounced for new threads.
The first instruction prefetch request carries control information for instruction prefetching, for example, instruction fetch address information (the global access address for obtaining instructions) and a number of instructions to prefetch (indicating how many instructions need to be prefetched; the value can be flexibly configured as needed). If the access addresses of the instructions to be prefetched are contiguous, the instruction fetch address information may include only the start address, and the specified number of instructions are fetched consecutively starting from that start address. If the access addresses of the instructions to be prefetched are not contiguous, the instruction fetch address information needs to include an access address for each of the instructions to be prefetched. For example, if the number of instructions to prefetch is 32, the instruction fetch address information needs to contain the access addresses corresponding to the 32 instructions.
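Purely as an illustration (and not as part of the claimed hardware), the following Python sketch models how such control information might be packaged; the PrefetchRequest structure and the helper names are assumptions introduced here for the example, not elements of the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PrefetchRequest:
    """Hypothetical model of the control information carried by a prefetch request."""
    base_address: Optional[int]        # start address when the prefetch range is contiguous
    address_list: Optional[List[int]]  # one address per instruction when the range is scattered
    num_instructions: int              # number of instructions to prefetch (freely configurable)

def make_contiguous_request(base_address: int, num_instructions: int) -> PrefetchRequest:
    # Contiguous case: only the start address is carried; the cache fetches
    # num_instructions consecutive instructions beginning at that address.
    return PrefetchRequest(base_address, None, num_instructions)

def make_scattered_request(addresses: List[int]) -> PrefetchRequest:
    # Non-contiguous case: the request carries an access address for every instruction.
    return PrefetchRequest(None, list(addresses), len(addresses))

# Example: request 32 consecutive instructions starting at global address 0x1000.
req = make_contiguous_request(0x1000, 32)
```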
A thread group may include a plurality of threads (for example, 16, 32 or 64 threads) that fetch the same instruction. Threads that fetch the same instruction are placed in the same thread group, so that the instructions needed by multiple threads can be obtained at the same time by sending a single instruction fetch request, which improves instruction fetch efficiency.
The instruction cache may be configured to perform instruction prefetching in response to the first instruction prefetch request. The first instruction prefetch request may include instruction fetch address information and a number of instructions to prefetch. Specifically, in response to the first instruction prefetch request, the instruction cache obtains the specified number of instructions from the lower-level cache (the next-level cache or the main memory) according to the instruction fetch address information, and stores them in the instruction cache. For example, if the number of instructions to prefetch is 64, 64 instructions are obtained from the lower-level cache according to the instruction fetch address information.
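For illustration only, the sketch below models how the instruction cache might service a first instruction prefetch request for a contiguous range; the dictionary-based cache model and the next_level.read(address) interface are assumptions made for this example, not a description of the actual hardware.

```python
def service_first_prefetch(icache: dict, next_level, base_address: int,
                           num_instructions: int, instr_size: int = 4) -> None:
    """Fetch `num_instructions` instructions from the lower-level cache (or main
    memory) starting at `base_address` and store them in the instruction cache,
    which is modeled here as a simple address -> instruction dictionary."""
    for i in range(num_instructions):
        addr = base_address + i * instr_size
        if addr not in icache:                # only fetch lines that are not already cached
            icache[addr] = next_level.read(addr)
```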
It should be noted that the number of instructions to prefetch can be flexibly configured as needed and is not limited to the 64 or 32 in the above examples; therefore, the 32 and 64 in the examples should not be understood as limiting the present application.
Optionally, the first instruction prefetch request has a relatively low priority, so that it does not affect the normal memory access work of the instruction cache: the idle bandwidth of the instruction cache is used to prefetch instructions for the new thread group, which reduces the cold start time of the thread group, and the number of prefetched instructions can be flexibly configured to obtain the best performance in different application scenarios. When the workload of the instruction cache is heavy, for example, when a first instruction prefetch request is received while the instruction cache is serving instruction fetches of thread groups created earlier, the first instruction prefetch request is backlogged because it receives no response; if the thread group corresponding to that first instruction prefetch request has already finished executing, the first instruction prefetch request becomes invalid and is automatically deleted.
In the embodiments of the present application, before a thread group is created, a first instruction prefetch request for obtaining the instructions corresponding to the thread group that will be created within an upcoming period of time is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch instructions ahead of time. This reduces the probability of subsequent instruction access misses, and the effect is especially pronounced for new threads.
In one implementation, the processor further includes an instruction fetch unit, as schematically shown in Figure 2. The instruction fetch unit is connected to the thread group unit and to the instruction cache, respectively.
The thread group unit may also be configured to distribute created thread groups to the instruction fetch unit. Some period of time after the thread group unit sends the first instruction prefetch request (configurable as needed), the thread group unit formally creates the thread group corresponding to the first instruction prefetch request, and then distributes the created thread group to the instruction fetch unit.
Optionally, in order to support parallel multi-threaded instruction fetching, there may be a plurality of instruction fetch units (two or more); the specific number of instruction fetch units can be determined according to the requirements of parallel instruction fetching. For example, if 8 thread groups need to fetch instructions concurrently, the number of instruction fetch units is 8; if 16 thread groups need to fetch instructions concurrently, the number of instruction fetch units is 16.
The function of each instruction fetch unit is the same: for example, each is used to issue instruction fetch requests to the instruction cache based on a thread group, and each instruction fetch request carries a global access address from which instructions can be fetched. The instruction cache may also be configured to respond to an instruction fetch request by returning to the instruction fetch unit the instruction hit by the request. If the instruction fetch request hits the instruction cache, the instruction can be obtained immediately; if it misses, the instruction cache needs to forward the request to the lower-level cache (the next-level cache or the main memory) to obtain the instruction.
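The hit/miss path just described can be summarized with the following illustrative sketch, again assuming the dictionary-based cache model and the hypothetical next_level.read(address) call used above.

```python
def handle_fetch_request(icache: dict, next_level, address: int):
    """Serve a demand instruction fetch request: on a hit the instruction is
    returned immediately from the instruction cache; on a miss it is obtained
    from the lower-level cache (or main memory), filled into the instruction
    cache, and then returned."""
    if address in icache:                      # hit: instruction available at once
        return icache[address]
    instruction = next_level.read(address)     # miss: long-latency access
    icache[address] = instruction              # fill so that later fetches hit
    return instruction
```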
Optionally, the instruction fetch unit may also be configured to monitor, for each thread group, the amount of instructions that have been prefetched into the instruction cache, and to send a second instruction prefetch request to the instruction cache when the monitored amount is less than a preset threshold (which can be flexibly configured). The instruction cache may also be configured to perform instruction prefetching in response to the second instruction prefetch request. The instruction fetch unit can monitor the consumption state of the prefetched instructions of each thread group; when it detects that the amount of stored prefetched instructions is insufficient (less than the preset threshold), it initiates the second instruction prefetch request for that thread group and sends it to the instruction cache.
For ease of understanding, an example is given below. Assume that the preset threshold is 32, which means that instruction prefetching is needed whenever the amount of instructions already prefetched into the instruction cache for a thread group falls below 32. For example, assume that before a thread group (which needs 192 instructions in total) is created, the instruction cache has prefetched 128 of the instructions it needs. When the instruction fetch unit starts executing this thread group, it issues instruction fetch requests to the instruction cache to obtain the corresponding instructions. By the time the 97th instruction is fetched, the amount of cached instructions remaining falls below the preset threshold (32 here), so a second instruction prefetch request is issued to the instruction cache to obtain the subsequent instructions of the thread group, for example the 129th to 160th instructions. Later, by the time the 129th instruction is fetched, the amount of cached instructions remaining (31) is again below the preset threshold (32), so another second instruction prefetch request is issued to obtain the subsequent instructions of the thread group, for example the 161st to 192nd instructions. It should be noted that this example is given only to facilitate understanding of how the instruction fetch unit monitors, for each thread group, the amount of instructions prefetched into the instruction cache and issues a second instruction prefetch request when that amount is insufficient; the specific values in the example should not be understood as limiting the present application.
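A minimal sketch of this monitoring logic is given below; the counter names and the issue_second_prefetch callback are hypothetical and only illustrate the threshold-triggered behavior of the example, where the remaining amount is the number of instructions prefetched minus the number consumed.

```python
class FetchUnitMonitor:
    """Illustrative model of an instruction fetch unit tracking, per thread group,
    how many prefetched instructions remain in the instruction cache and issuing
    a second instruction prefetch request once that amount drops below a preset
    threshold."""

    def __init__(self, threshold: int, prefetch_batch: int, issue_second_prefetch):
        self.threshold = threshold                          # e.g. 32 in the example above
        self.prefetch_batch = prefetch_batch                # instructions per second prefetch request
        self.issue_second_prefetch = issue_second_prefetch  # callback into the instruction cache
        self.prefetched = {}                                # thread group id -> instructions prefetched
        self.consumed = {}                                  # thread group id -> instructions consumed

    def on_prefetch_done(self, group_id: int, count: int) -> None:
        # Called when the instruction cache reports that `count` more instructions
        # have been prefetched for this thread group.
        self.prefetched[group_id] = self.prefetched.get(group_id, 0) + count

    def on_instruction_fetched(self, group_id: int, next_address: int) -> None:
        # Called each time an instruction of the thread group is consumed.
        self.consumed[group_id] = self.consumed.get(group_id, 0) + 1
        remaining = self.prefetched.get(group_id, 0) - self.consumed[group_id]
        if remaining < self.threshold:
            # Ask the instruction cache to prefetch the next batch for this thread group.
            self.issue_second_prefetch(group_id, next_address, self.prefetch_batch)
```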
The second instruction prefetch request is similar to the first instruction prefetch request and also carries control information for instruction prefetching, for example, instruction fetch address information (the global access address for obtaining instructions) and a number of instructions to prefetch (indicating how many instructions need to be prefetched; the value can be flexibly configured as needed).
The instruction fetch unit can calculate, for each thread group, the amount of instructions that have been prefetched into the instruction cache from the number of instructions already prefetched and the number of instructions consumed (the number normally fetched by the instruction fetch unit). For example, for a certain thread group, if the number of instructions prefetched by the instruction fetch unit is 64 and the number of instructions consumed is 32, the amount of instructions prefetched into the instruction cache and still available for that thread group is 64 - 32 = 32.
In an optional implementation, the instruction cache may also be configured to inform the instruction fetch unit of the prefetch status both after performing instruction prefetching in response to the first instruction prefetch request and after performing instruction prefetching in response to the second instruction prefetch request; in addition, even when it is not responding to a first or second instruction prefetch request, it may inform the instruction fetch unit of the prefetch status at regular or irregular intervals, so that the instruction fetch unit knows in real time, for each thread group, the amount of instructions that have been prefetched into the instruction cache.
It should be noted that, for the same thread, the fetch range of the first instruction prefetch request and the fetch range of the second instruction prefetch request may or may not overlap. The fetch range of the second instruction prefetch request may immediately follow the fetch range of the first instruction prefetch request; for example, if the fetch range of the first instruction prefetch request is address 0 to address 31, the fetch range of the second instruction prefetch request may be address 32 to address 63. The two fetch ranges may also overlap or even be identical. In other words, the fetch ranges of the first and second instruction prefetch requests can both be set flexibly as needed.
Optionally, the first instruction prefetch request has a relatively low priority; that is, the priority with which the instruction cache responds to the first instruction prefetch request may be lower than the priority of the second instruction prefetch request. If the workload of the instruction cache is heavy, the instruction cache may never respond to the first instruction prefetch request of a thread group, in which case the amount of instructions prefetched into the instruction cache for that thread group is zero; in this case, when the instruction fetch unit sends the second instruction prefetch request, its fetch range may start from the start address or from some address after the start address. If the instruction cache has already responded to the first instruction prefetch request, then when the instruction fetch unit sends the second instruction prefetch request, its fetch range may immediately follow the fetch range of the first instruction prefetch request.
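To illustrate the priority relationship described above, the following sketch arbitrates pending requests so that demand fetches are served first, second instruction prefetch requests next, and first instruction prefetch requests only consume leftover (idle) bandwidth; the queue structure and the request dictionaries are assumptions made for the example and do not prescribe a particular hardware arbitration scheme.

```python
from collections import deque

class PrefetchArbiter:
    """Illustrative arbitration model: demand instruction fetches are served first,
    then second instruction prefetch requests, and first instruction prefetch
    requests are served only when nothing else is pending."""

    def __init__(self):
        self.demand_fetches = deque()     # normal fetch requests from the fetch units
        self.second_prefetches = deque()  # second instruction prefetch requests
        self.first_prefetches = deque()   # first instruction prefetch requests (lowest priority)

    def next_request(self, finished_groups: set):
        if self.demand_fetches:
            return self.demand_fetches.popleft()
        if self.second_prefetches:
            return self.second_prefetches.popleft()
        # A backlogged first prefetch request whose thread group has already
        # finished executing is stale and is simply discarded.
        while self.first_prefetches:
            request = self.first_prefetches.popleft()
            if request["group_id"] not in finished_groups:
                return request
        return None  # idle cycle: no pending work
```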
In order to avoid repeated prefetching, which wastes prefetch bandwidth, the instruction cache may optionally also be configured to determine, upon receiving a second instruction prefetch request and before performing instruction prefetching in response to it, that the second instruction prefetch request does not exist in the prefetch status table, where the prefetch status table is used to record second instruction prefetch requests that are in progress, or second instruction prefetch requests that have been completed within a period of time. In this implementation, when the instruction cache receives a second instruction prefetch request, it does not immediately perform instruction prefetching in response to it; instead, it performs instruction prefetching in response to the second instruction prefetch request only after determining that the second instruction prefetch request does not exist in the prefetch status table. If the second instruction prefetch request already exists in the prefetch status table, this indicates that instruction prefetching for that request is in progress or has already been completed, and the request can simply be discarded.
The instruction cache may also be configured to record a received second instruction prefetch request in the prefetch status table when the second instruction prefetch request is received. In order to avoid repeated prefetching, which wastes prefetch bandwidth, the instruction cache may specifically be configured to: if no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request (that is, another second instruction prefetch request from the same thread group as the current one) exists in the prefetch status table, record the received second instruction prefetch request in the prefetch status table directly; or, if another instruction prefetch request from the thread group corresponding to the received second instruction prefetch request exists in the prefetch status table, update the prefetch range of that other instruction prefetch request, where the updated prefetch range covers the prefetch range of the received second instruction prefetch request.
For ease of understanding, an example is given. Assume the instruction fetch unit issues, for the first time, a second instruction prefetch request for a certain thread group; at this time, since no other instruction prefetch request from that thread group exists in the prefetch status table, the second instruction prefetch request is recorded in the prefetch status table directly. Assume the instruction fetch unit then issues, for the second time, a second instruction prefetch request for the same thread group; at this time, since another instruction prefetch request from that thread group (namely the second instruction prefetch request issued the first time) already exists in the prefetch status table, it is only necessary to update the prefetch range of that other instruction prefetch request in the prefetch status table (here, the second instruction prefetch request issued the first time) so that it covers the prefetch range of the second instruction prefetch request issued the second time; that is, the control information for instruction prefetching (for example, the instruction fetch address information and the number of instructions to prefetch) in the second instruction prefetch request issued the first time is updated.
In addition, the instruction cache may also be configured to delete expired prefetch ranges and expired second instruction prefetch requests from the prefetch status table, that is, to delete from the prefetch status table the second instruction prefetch requests whose completion time exceeds a preset time, while retaining the second instruction prefetch requests whose completion time is within the preset time.
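The behavior of the prefetch status table described above can be sketched as follows; the containment/merge policy and the wall-clock expiry used here are simplifying assumptions made for illustration, not a statement of how the embodiments implement the table.

```python
import time

class PrefetchStatusTable:
    """Illustrative model of the prefetch status table: it records second
    instruction prefetch requests that are in progress or recently completed,
    merges prefetch ranges of requests from the same thread group, and expires
    entries whose completion time exceeds a preset window."""

    def __init__(self, expiry_seconds: float):
        self.expiry_seconds = expiry_seconds
        # thread group id -> {"start": int, "end": int, "completed_at": float or None}
        self.entries = {}

    def should_prefetch(self, group_id: int, start: int, end: int) -> bool:
        entry = self.entries.get(group_id)
        if entry is None:
            # No request from this thread group yet: record it and go ahead with prefetching.
            self.entries[group_id] = {"start": start, "end": end, "completed_at": None}
            return True
        if entry["start"] <= start and end <= entry["end"]:
            # The range is already being (or has recently been) prefetched:
            # discard the request to avoid wasting prefetch bandwidth.
            return False
        # Another request from the same thread group exists: widen its recorded
        # prefetch range so that it covers the newly requested range.
        entry["start"] = min(entry["start"], start)
        entry["end"] = max(entry["end"], end)
        return True

    def mark_completed(self, group_id: int) -> None:
        if group_id in self.entries:
            self.entries[group_id]["completed_at"] = time.monotonic()

    def expire(self) -> None:
        # Delete entries whose completion time is older than the expiry window,
        # keeping those completed within the window (or still in progress).
        now = time.monotonic()
        for group_id in list(self.entries):
            done = self.entries[group_id]["completed_at"]
            if done is not None and now - done > self.expiry_seconds:
                del self.entries[group_id]
```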
The processor shown in the present application can be obtained by improving the architecture of an existing mainstream processor, so that while supporting highly concurrent access to the instruction cache, it can effectively reduce the probability of instruction access misses and shorten the time an instruction fetch request waits to obtain instructions. The existing mainstream processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microprocessor, or any other conventional processor.
Based on the same inventive concept, embodiments of the present application further provide an electronic device, which may include a body and the above-described processor. The body may include a transceiver, a communication bus, a memory, and the like. In one implementation, the structure of the electronic device is schematically shown in Figure 3.
The transceiver, the memory and the processor are directly or indirectly electrically connected to each other to implement data transmission or interaction. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines. The transceiver may be used to send and receive data, and the memory may be used to store data.
The memory may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
The above-mentioned electronic device includes, but is not limited to, a smartphone, a tablet, a computer, a server, and the like.
The implementation principle and the resulting technical effects of the processor provided in the electronic device embodiment are the same as those of the foregoing processor embodiment. For brevity, for matters not mentioned in the electronic device embodiment, reference may be made to the corresponding content in the processor embodiment.
Based on the same inventive concept, embodiments of the present application further provide a multi-thread shared instruction prefetching method, as shown in Figure 4. The multi-thread shared instruction prefetching method provided by the embodiments of the present application can be applied to the above-described processor, and is described below with reference to Figure 4.
S1: The instruction cache obtains a first instruction prefetch request sent by the thread group unit, where the first instruction prefetch request is sent by the thread group unit before a thread group is created.
The first instruction prefetch request is used to obtain instructions corresponding to a thread group that will be created within an upcoming period of time. By sending the first instruction prefetch request to the instruction cache in advance, before the thread group is created, there is enough time to prefetch instructions ahead of time, which reduces the probability of instruction access misses, especially for new threads.
S2: The instruction cache performs instruction prefetching in response to the first instruction prefetch request.
The first instruction prefetch request includes instruction fetch address information and a number of instructions to prefetch. The process of the instruction cache performing instruction prefetching in response to the first instruction prefetch request may be: in response to the first instruction prefetch request, obtaining the specified number of instructions from the lower-level cache (the next-level cache or the main memory) according to the instruction fetch address information, and storing them in the instruction cache.
Specifically, the instruction cache is configured to respond to the first instruction prefetch request by obtaining the specified number of instructions from the lower-level cache (the next-level cache or the main memory) according to the instruction fetch address information and storing them in the instruction cache. For example, if the number of instructions to prefetch is 64, 64 instructions are obtained from the lower-level cache according to the instruction fetch address information. It should be noted that the number of instructions to prefetch can be flexibly configured as needed and is not limited to the 64 in this example; therefore, 64 should not be understood as limiting the present application.
Optionally, the multi-thread shared instruction prefetching method may further include: the instruction cache receives a second instruction prefetch request issued by the instruction fetch unit, where the second instruction prefetch request is sent when the amount of instructions already prefetched into the instruction cache for a thread group is less than a preset threshold; and the instruction cache performs instruction prefetching in response to the second instruction prefetch request.
Optionally, before the instruction cache performs instruction prefetching in response to the second instruction prefetch request, the multi-thread shared instruction prefetching method may further include: upon receiving the second instruction prefetch request, the instruction cache determines that the second instruction prefetch request does not exist in the prefetch status table, where the prefetch status table is used to record second instruction prefetch requests that are in progress, or second instruction prefetch requests that have been completed within a period of time.
Optionally, the multi-thread shared instruction prefetching method may further include: upon receiving a second instruction prefetch request, if no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request exists in the prefetch status table, recording the received second instruction prefetch request in the prefetch status table directly; or, if another instruction prefetch request from the thread group corresponding to the received second instruction prefetch request exists in the prefetch status table, updating the prefetch range of that other instruction prefetch request, where the updated prefetch range covers the prefetch range of the received second instruction prefetch request.
The implementation principle and the resulting technical effects of the multi-thread shared instruction prefetching method provided by the embodiments of the present application are the same as those of the foregoing processor embodiments. For brevity, for matters not mentioned in the method embodiments, reference may be made to the corresponding content in the foregoing processor embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between the embodiments, reference may be made to one another.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that can be readily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a processor, an electronic device and a multi-thread shared instruction prefetching method, and belongs to the field of computer technology. The processor includes an instruction cache and a thread group unit; the thread group unit is configured to send a first instruction prefetch request to the instruction cache, where the first instruction prefetch request is used to obtain instructions corresponding to a thread group that will be created within an upcoming period of time; and the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request. In the embodiments of the present application, before a thread group is created, the first instruction prefetch request for obtaining the instructions corresponding to the thread group that will be created within an upcoming period of time is sent to the instruction cache in advance, so that the instruction cache has enough time to prefetch instructions ahead of time. This reduces the probability of subsequent instruction access misses, and the effect is especially pronounced for new threads.
In addition, it can be understood that the processor, the electronic device and the multi-thread shared instruction prefetching method of the present application are reproducible and can be used in a variety of industrial applications. For example, the processor, the electronic device and the multi-thread shared instruction prefetching method of the present application can be used in any computer that requires a processor.
Claims (17)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210649455.0A CN114721727B (en) | 2022-06-10 | 2022-06-10 | Processor, electronic equipment and multithreading shared instruction prefetching method |
CN202210649455.0 | 2022-06-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023236443A1 (en) | 2023-12-14 |
Family
ID=82232958
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/130379 WO2023236443A1 (en) | 2022-06-10 | 2022-11-07 | Processor, electronic device and multi-thread shared instruction prefetching method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114721727B (en) |
WO (1) | WO2023236443A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114721727B (en) * | 2022-06-10 | 2022-09-13 | 成都登临科技有限公司 | Processor, electronic equipment and multithreading shared instruction prefetching method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102446087A (en) * | 2010-10-12 | 2012-05-09 | 无锡江南计算技术研究所 | Instruction prefetching method and prefetching device |
CN103218309A (en) * | 2011-12-06 | 2013-07-24 | 辉达公司 | Multi-level instruction cache prefetching |
CN105159654A (en) * | 2015-08-21 | 2015-12-16 | 中国人民解放军信息工程大学 | Multi-thread parallelism based integrity measurement hash algorithm optimization method |
CN105786448A (en) * | 2014-12-26 | 2016-07-20 | 深圳市中兴微电子技术有限公司 | Instruction scheduling method and device |
US20200218539A1 (en) * | 2019-01-09 | 2020-07-09 | Intel Corporation | Instruction prefetch based on thread dispatch commands |
CN114721727A (en) * | 2022-06-10 | 2022-07-08 | 成都登临科技有限公司 | Processor, electronic equipment and multithreading shared instruction prefetching method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6965982B2 (en) * | 2001-06-29 | 2005-11-15 | International Business Machines Corporation | Multithreaded processor efficiency by pre-fetching instructions for a scheduled thread |
CN1269036C (en) * | 2003-04-24 | 2006-08-09 | 英特尔公司 | Methods and appats, for generating speculative helper therad spawn-target points |
JP4374221B2 (en) * | 2003-08-29 | 2009-12-02 | パナソニック株式会社 | Computer system and recording medium |
US7730263B2 (en) * | 2006-01-20 | 2010-06-01 | Cornell Research Foundation, Inc. | Future execution prefetching technique and architecture |
US8312442B2 (en) * | 2008-12-10 | 2012-11-13 | Oracle America, Inc. | Method and system for interprocedural prefetching |
US10599571B2 (en) * | 2017-08-07 | 2020-03-24 | Intel Corporation | Instruction prefetch mechanism |
CN114327641B (en) * | 2021-12-31 | 2025-08-19 | 海光信息技术股份有限公司 | Instruction prefetching method, instruction prefetching device, processor and electronic equipment |
2022
- 2022-06-10 CN CN202210649455.0A patent/CN114721727B/en active Active
- 2022-11-07 WO PCT/CN2022/130379 patent/WO2023236443A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN114721727B (en) | 2022-09-13 |
CN114721727A (en) | 2022-07-08 |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22945572; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 22945572; Country of ref document: EP; Kind code of ref document: A1