WO2025130918A1 - Service processing apparatus and method, and device and storage medium - Google Patents
- Publication number
- WO2025130918A1 (PCT Application No. PCT/CN2024/140283)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- operation information
- historical operation
- information
- processing
- buffer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0715—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0721—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
- G06F11/0724—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU] in a multiprocessor or a multi-core unit
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
Definitions
- the present application relates to the field of electronic technology, and in particular to a service processing apparatus, method, device and storage medium.
- a central processing unit (CPU) subsystem is usually used for business processing.
- the CPU subsystem includes multiple processing cores, which can be used to perform benchmark performance testing, business performance testing, similar business deployment, or business suite deployment.
- the scale of instructions and data of the above-mentioned businesses is large, and the cache resources of the multiple processing cores in the CPU subsystem are limited, so the CPU subsystem runs the above-mentioned businesses inefficiently.
- the present application provides a business processing apparatus, method, device and storage medium for improving the operating efficiency when multiple processing cores run businesses.
- a business processing device comprising: a global predictor, and multiple processing cores coupled to the global predictor; the global predictor is used to cache historical operation information of at least one processing core among the multiple processing cores, and the at least one processing core may be part or all of the multiple processing cores; a first processing core among the multiple processing cores is used to obtain first historical operation information among the historical operation information, and operate a first business according to the first historical operation information to obtain first operation information, and the first historical operation information may be operation information related to the first business, and the first processing core is any processing core among the multiple processing cores; the global predictor is also used to cache the first operation information, that is, the global predictor can dynamically update the historical operation information according to the operation information obtained by any processing core among the multiple processing cores when operating the business.
- historical operation information is cached by a global predictor, as well as operation information generated by any processing core among multiple processing cores when running a business, so as to update the cached operation information for subsequent use.
- Any processing core among the multiple processing cores can obtain the required operation information from the global predictor when running a business.
- this solution can reduce the cache miss rate when the multiple processing cores process the business, improve the prediction accuracy and instructions per clock (IPC), and thus improve the operation efficiency of the business.
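The sharing mechanism described above can be sketched in code. The names below (GlobalPredictor, Core, lookup, update) are illustrative assumptions for exposition, not terms from the application; the "business" is modelled trivially so the warm-up effect is visible.

```python
# Hypothetical sketch of a global predictor shared by multiple processing cores.
# All class and method names are illustrative, not claim language.

class GlobalPredictor:
    """Caches historical operation information for all processing cores."""
    def __init__(self):
        self._history = {}  # business id -> list of operation records

    def lookup(self, business_id):
        # Any core may read the history relevant to the business it will run.
        return list(self._history.get(business_id, []))

    def update(self, business_id, operation_info):
        # Dynamically append operation information produced by any core,
        # so the cached history is updated for subsequent use.
        self._history.setdefault(business_id, []).append(operation_info)


class Core:
    def __init__(self, core_id, predictor):
        self.core_id = core_id
        self.predictor = predictor

    def run_business(self, business_id):
        # 1. Obtain the first historical operation information.
        history = self.predictor.lookup(business_id)
        # 2. Run the business "warmed" by that history (modelled trivially).
        operation_info = {"core": self.core_id, "warm_entries": len(history)}
        # 3. Feed the new operation information back for other cores to reuse.
        self.predictor.update(business_id, operation_info)
        return operation_info


gp = GlobalPredictor()
cores = [Core(i, gp) for i in range(4)]
first = cores[0].run_business("svc-A")   # cold start: no history yet
second = cores[1].run_business("svc-A")  # warm: sees core 0's record
print(first["warm_entries"], second["warm_entries"])  # 0 1
```

Because the history lives in the shared predictor rather than per-core structures, a different core running the same business immediately sees a non-empty history.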
- the historical operation information includes historical operation information of multiple services, and the historical operation information of the multiple services includes first historical operation information;
- the global predictor includes: multiple buffers, used to cache the historical operation information of the multiple services respectively, wherein the first buffer of the multiple buffers is used to cache the first historical operation information, for example, the historical operation information of different services can be cached in different buffers accordingly.
- the first processing core is further used to: obtain configuration information for the first service, the configuration information is used to indicate the first buffer (for example, to indicate the size of the first buffer, and/or to indicate the address space corresponding to the first buffer), the configuration information may be sent by the software running on the multiple processing cores, or may be pre-configured in the first processing core; configure the first buffer for the first service according to the configuration information, wherein configuring the first buffer includes enabling, disabling or clearing the first buffer, for example, enabling the first buffer through the configuration information when it is needed, and disabling or clearing the first buffer through the configuration information after use.
- the resources in the global predictor can be configured to the key services running on the processing core, thereby avoiding resource bottlenecks and further improving the operating efficiency of the services; in addition, by enabling, disabling or clearing the buffer in the global predictor, the utilization rate of the resources in the global predictor can also be improved.
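A minimal sketch of the per-business buffer configuration described above, assuming a configuration record that names the business, the buffer size, and an action (enable, disable or clear). All names and the configuration-record format are assumptions for illustration.

```python
# Hypothetical sketch: configuring per-business buffers in the global
# predictor via configuration information (field names are illustrative).

class Buffer:
    def __init__(self, size):
        self.size = size        # determined according to the business served
        self.enabled = False
        self.entries = []

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def clear(self):
        self.entries.clear()


def apply_config(buffers, config):
    """Indicate which buffer serves which business, its size, and an action."""
    buf = buffers.setdefault(config["business"], Buffer(config["size"]))
    action = config["action"]   # one of: "enable", "disable", "clear"
    getattr(buf, action)()      # enable, disable or clear the buffer
    return buf


buffers = {}
b = apply_config(buffers, {"business": "svc-A", "size": 4096, "action": "enable"})
b.entries.append("branch record")
apply_config(buffers, {"business": "svc-A", "size": 4096, "action": "clear"})
```

Enabling only the buffers of key businesses, and clearing or disabling them after use, is what lets limited predictor resources be steered where they matter.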
- the size (or capacity) of each buffer in the multiple buffers is determined according to the business to which the historical operation information cached in the buffer belongs, for example, the size of the first buffer is determined according to the first business.
- determining the size of the buffer used to cache the historical operation information of a business according to a certain business can improve the accuracy of the configured buffer size, thereby avoiding the problem of wasting resources or insufficient resources.
- the multiple processing cores also include: a second processing core, which is used to: when the first service is switched from the first processing core to the second processing core, obtain the first historical operation information from the first buffer, and run the first service according to the first historical operation information.
- the second processing core can still obtain the first historical operation information from the first buffer that caches the first service, thereby avoiding the problem of a large number of cold misses in the second processing core when the first service is switched.
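The migration behaviour above can be illustrated as follows. The point of the sketch (all names hypothetical) is only that the history survives the switch because it lives in the shared first buffer rather than in per-core state.

```python
# Illustrative: a business switched between cores keeps access to the same
# shared buffer, avoiding cold misses on the destination core.

shared_first_buffer = {
    "svc-A": ["branch@0x400 taken", "branch@0x41c not-taken"],
}

def run_on_core(core_id, business_id):
    # The destination core reads the same shared buffer the source core
    # filled, so the history is not lost across the switch.
    history = shared_first_buffer.get(business_id, [])
    return {"core": core_id, "cold_misses_avoided": len(history)}

before = run_on_core(0, "svc-A")  # business runs on the first core
after = run_on_core(1, "svc-A")   # switched to the second core
print(after["cold_misses_avoided"])  # 2
```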
- the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, instruction trace, data trace, cache hot and cold information, scheduling recommendation information, page table information, hardware prefetching regularity information or difficult-to-predict address information.
- the historical operation information or the first operation information further includes a branch block index, and the branch block index is used to index the at least one item; in this case, when the branch block corresponding to the branch block index is accessed, a search of the global predictor can be triggered.
- the speculation accuracy and speculation efficiency of the branch blocks in the business can be greatly improved, thereby improving the operation efficiency of the business.
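The branch block index can be pictured as a lookup key: touching a branch block triggers a search of the global predictor for the information cached under that index. The dictionary layout and field names below are assumptions for illustration.

```python
# Hypothetical sketch: cached operation information indexed by branch block,
# so accessing a branch block triggers a search of the global predictor.

records = {
    # branch block index -> cached operation information for that block
    0x4000: {"target": 0x4A00, "taken": True},
    0x4100: {"target": 0x4C40, "taken": False},
}

def on_branch_block(block_index):
    # Triggered when the branch block corresponding to the index is accessed.
    return records.get(block_index)

hit = on_branch_block(0x4000)   # found: speculation can use the record
miss = on_branch_block(0x9000)  # not found: fall back to normal prediction
```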
- the first processing core includes: a buffer, which is used to obtain at least part of the first historical operation information from the global predictor and cache the at least part.
- the first historical operation information is historical operation information of other services that are the same as or belong to the same type as the first service.
- the operation information of the first service has a large degree of duplication and similarity with the first historical operation information.
- the first processing core runs the first service according to the first historical operation information, which can greatly improve the operation efficiency of the first service.
- a business processing method is provided, which is applied to a business processing device, the device comprising a global predictor and a plurality of processing cores coupled to the global predictor; the method comprising: the global predictor caches historical operation information of at least one processing core among the plurality of processing cores; a first processing core among the plurality of processing cores obtains first historical operation information in the historical operation information, and runs a first business according to the first historical operation information to obtain first operation information, the first processing core being any processing core among the plurality of processing cores; the global predictor caches the first operation information.
- the historical operation information includes historical operation information of multiple businesses, and the historical operation information of the multiple businesses includes first historical operation information; the global predictor caches historical operation information of at least one processing core among the multiple processing cores, including: multiple buffers of the global predictor respectively cache the historical operation information of the multiple businesses, wherein the first buffer among the multiple buffers is used to cache the first historical operation information.
- the method also includes: the first processing core obtains configuration information for the first service, and the configuration information is used to indicate a first buffer; the first processing core configures the first buffer for the first service according to the configuration information, wherein configuring the first buffer includes enabling, disabling or clearing the first buffer.
- the size of the first buffer is determined according to the first service.
- the multiple processing cores also include a second processing core
- the method also includes: when the first business is switched from the first processing core to the second processing core, the second processing core obtains the first historical operation information from the first buffer and runs the first business according to the first historical operation information.
- the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, instruction trace, data trace, cache hot and cold information, scheduling recommendation information, page table information, hardware prefetching regularity information or difficult-to-predict address information.
- the historical operation information or the first operation information further includes a branch block index, and the branch block index is used to index the at least one item.
- the first processing core obtains first historical operation information in the historical operation information, including: a buffer of the first processing core obtains at least part of the first historical operation information from the global predictor and caches the at least part.
- the first historical operation information is historical operation information of other services that are the same as or belong to the same type as the first service.
- an electronic device which includes a memory and a business processing device, the business processing device includes a global predictor and multiple processing cores, the memory is used to store computer instructions, and the business processing device is used to execute the computer instructions so that the electronic device implements the business processing method provided by the second aspect or any possible implementation of the second aspect.
- a computer-readable storage medium in which a computer program or instruction is stored.
- the computer program or instruction is executed, the business processing method provided by the second aspect or any possible implementation of the second aspect is implemented.
- a computer program product which includes: a computer program (also referred to as code, or instruction), which, when executed, enables a computer to execute a business processing method provided in the second aspect or any possible implementation of the second aspect.
- FIG1 is a schematic diagram of the structure of a multi-core system provided in an embodiment of the present application.
- FIG2 is a schematic diagram of the structure of a service processing device provided in an embodiment of the present application.
- FIG3 is a schematic diagram of the structure of an LBTB provided in an embodiment of the present application.
- FIG4 is a schematic diagram of the structure of another LBTB provided in an embodiment of the present application.
- FIG5 is a schematic diagram of a processing core acquiring historical operation information provided by an embodiment of the present application.
- FIG6 is a schematic diagram of the structure of an sBTB provided in an embodiment of the present application.
- FIG7 is a schematic diagram of a method of sharing global prediction among multiple processing cores provided in an embodiment of the present application.
- FIG8 is a schematic diagram of configuring a global predictor provided in an embodiment of the present application.
- FIG9 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- circuits or other components may be described or referred to as being “configured to” perform one or more tasks.
- “configured to” is used to imply structure by indicating that the circuit/component includes structure (e.g., circuitry) that performs the one or more tasks during operation. Thus, even when the specified circuit/component is not currently operational (e.g., not turned on), the circuit/component may be referred to as being configured to perform the task.
- Circuits/components used with the phrase “configured to” include hardware, such as circuits that perform an operation, etc.
- "At least one of a, b or c" can represent: a; b; c; a and b; a and c; b and c; or a, b and c; where a, b and c can each be single or multiple.
- the central processing unit (CPU) subsystem usually has the problem of low efficiency when processing business.
- the specific manifestations of this problem are: a high cache miss rate, low branch prediction accuracy, and a low number of instructions per clock (IPC).
- the scale of instructions and data of the business running on the CPU subsystem is large, while the capacity of resources such as cache, branch target buffer (BTB) and branch direction predictor in the CPU subsystem is limited, which easily causes more cache misses.
- the software in the CPU subsystem does not perceive speculative means such as front-end prefetching and back-end prefetching, which makes the prefetching information unable to assist other predictions, resulting in information waste.
- the number of threads of an application running on the CPU subsystem is far greater than the number of processing cores (or kernels, also called processor cores) included in the CPU subsystem, which can easily cause a large number of thread switches (including switches within the same processing core and switches between different processing cores).
- when threads switch, the data stored in resources such as the cache, BTB, and branch direction predictor is not the data required by the current thread, resulting in a large number of cold misses for the current thread.
- an embodiment of the present application provides a business processing device, which includes a global predictor (GP) and multiple processing cores coupled to the global predictor.
- the global predictor can be used to cache historical operation information, and any one of the multiple processing cores can be used to obtain the required operation information in the historical operation information, and run the business according to the obtained operation information.
- the global predictor is also used to cache the operation information obtained by running the business.
- the multiple processing cores obtain the required operation information from the global predictor when running the business, which can reduce the cache miss rate of the multiple processing cores when processing the business, improve the prediction accuracy and IPC, and thus improve the operation efficiency of the business.
- the business processing device provided in the embodiment of the present application can be applied to a multi-core system, which can be called a processor subsystem (for example, a CPU subsystem).
- the structure of the multi-core system is introduced and described below.
- the service scenarios of the multi-core system include but are not limited to: server cluster benchmark performance testing, server cluster bidding service performance testing, server cluster similar service deployment (e.g., high performance computing (HPC)), server cluster service suite deployment, terminal field benchmark performance testing, service switching, service migration, etc.
- the number of multiple processing cores included in the multi-core system can be configured according to actual needs.
- the number of the multiple processing cores can be 2, 4, 6 or 8; in addition, the multiple processing cores can be a homogeneous structure (i.e., including identical processing cores) or a heterogeneous structure (i.e., including different processing cores).
- the multi-core system includes: a global predictor, and 4 processing cores coupled to the global predictor, the 4 processing cores are respectively represented as core 0 to core 3, and the 4 processing cores are a homogeneous structure.
- the multi-core system includes: a global predictor, and 8 processing cores coupled to the global predictor, the 8 processing cores are respectively represented as core 0 to core 7, and the 8 processing cores are a homogeneous structure.
- the multi-core system includes: a global predictor, and 8 processing cores coupled to the global predictor, the 8 processing cores may be a heterogeneous structure, and the 8 processing cores may include super large core 0, large core 1 to large core 3, and small core 4 to small core 7.
- the multiple processing cores may include multiple different processing cores of the same processor, or the multiple processing cores may include processing cores of multiple different processors, that is, the multiple processing cores may form one or more processors.
- the processors include, but are not limited to, CPU, general-purpose processor, graphic processing unit (GPU), image signal processor (ISP), digital signal processor (DSP), network processor (NPU), artificial intelligence (AI) processor, etc.
- the structures of any two processing cores among the multiple processing cores and the sizes of their respective hardware resources may be the same or different, and the embodiments of the present application do not impose specific restrictions on this.
- the multi-core system may also include a memory coupled to a plurality of processing cores, the memory may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory may be a random access memory (RAM), which is used as an external cache.
- the RAM may be a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), and a double data rate synchronous dynamic random access memory (DDR SDRAM).
- the multi-core system can be an electronic device, or a system on chip (SoC) or a chipset including multiple chips applied to an electronic device, or a module including the SoC or chipset.
- the electronic device can be used as a terminal device or a server.
- the electronic device includes but is not limited to: mobile phones, tablet computers, laptop computers, desktop computers, PDAs, ultra-mobile personal computers (UMPCs), mobile internet devices (MIDs), netbooks, cameras, wearable devices (such as smart watches and smart bracelets), vehicle-mounted devices (such as cars, bicycles, electric vehicles, airplanes, ships, trains, high-speed trains), virtual reality (VR) devices, augmented reality (AR) devices, industrial control devices, etc.
- the wireless terminals include wireless terminals in industrial control, smart home devices (e.g., refrigerators, TVs, air conditioners, electric meters, etc.), intelligent robots, workshop equipment, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, or wireless terminals in smart homes, and flying equipment (e.g., intelligent robots, hot air balloons, drones, airplanes), etc.
- FIG2 is a schematic diagram of the structure of a service processing device provided by an embodiment of the present application, wherein the service processing device includes a global predictor 10 and a plurality of processing cores 20 coupled to the global predictor 10, wherein the plurality of processing cores 20 share the global predictor 10, for example, by configuring the plurality of processing cores 20 through software to share different resources of the global predictor 10.
- FIG2 is taken as an example in which the plurality of processing cores 20 include processing cores 21 to 2n, where n is an integer greater than 1.
- the global predictor 10 is used to: cache historical operation information of at least one processing core among the multiple processing cores 20.
- the at least one processing core includes one or more processing cores; when the at least one processing core includes multiple processing cores, it may be a part of the multiple processing cores 20, or all of the multiple processing cores 20.
- the historical operation information cached in the global predictor 10 may be obtained by the at least one processing core running a service. For example, each of the at least one processing core may run one or more services, and transmit the operation information obtained by the operation to the global predictor 10.
- the global predictor 10 caches the operation information transmitted by the at least one processing core as the historical operation information.
- any one of the multiple processing cores 20 (hereinafter referred to as the first processing core 21 for the convenience of description) is used to: obtain the first historical operation information in the historical operation information, and run the first business according to the first historical operation information to obtain the first operation information.
- the first historical operation information may be part or all of the historical operation information.
- the first historical operation information is the historical operation information of other businesses that are the same as or belong to the same type as the first business.
- the first historical operation information may be the historical operation information of the first business, that is, the operation information obtained by the previous operation of the first business; or, the first business and the second business are two businesses in the same high-performance computing, and the first historical operation information is the historical operation information of the second business.
- the following description takes the first historical operation information as the historical operation information of the first business as an example.
- the global predictor 10 is also used to: cache the first operation information. For example, when the first processing core 21 obtains the first operation information, it transmits the first operation information to the global predictor 10, and the global predictor 10 receives and caches the first operation information. After the global predictor 10 caches the first operation information, when any processing core of the multiple processing cores 20 (for example, the first processing core 21 or other processing cores) obtains historical operation information from the global predictor 10, the historical operation information includes the first operation information. That is, the historical operation information is dynamically updated, and the global predictor 10 can dynamically update the historical operation information according to the operation information obtained by the multiple processing cores 20 running the business.
- the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, instruction trace, data trace, cache hot and cold information, scheduling recommendation information, page table information, hardware prefetching regularity information or difficult-to-predict address information.
- the branch address pair information refers to the address of the executed branch in the memory, and the address includes a source address and a target address, and the source address and the target address form an address pair.
- the jump information may include a jump direction and a jump destination, where the jump direction indicates whether the jump instruction is taken (i.e., jump or not jump), and the jump destination is the destination address after the jump.
- the data trace is used to indicate the trace of the accessed data.
- the instruction trace is used to indicate the trace of the accessed instruction.
- the cache hot and cold information is used to indicate how hot or cold the data and/or instructions in the cache are (e.g., in the first-, second- and third-level caches).
- the scheduling recommendation information is used to indicate the recommended information in the scheduling process, such as the scheduling policy recommended based on the quality of service (QoS).
- the page table information is used to indicate the correspondence between the virtual address and the physical address corresponding to the accessed data and/or instruction.
- the regularity information of hardware prefetching is used to indicate the regularity of hardware prefetching data and/or instructions, which may include front-end prefetching and/or back-end prefetching.
- the difficult-to-predict address information is used to indicate the address information that is difficult to predict for the hardware prefetcher (hardware prefetch, HWP).
- the memory targeted by the above address may be located on the same or different chip as the above one or more processing cores, and may be a volatile or non-volatile memory, which is not limited in this embodiment.
- the above-mentioned historical operation information or first operation information may also include other information required in the business operation process (such as value prediction information, which may refer to the numerical value obtained by prediction when the memory access is not completed).
- the above historical operation information or the first operation information also includes a branch block index.
- the branch block index is used to index at least one of the above information.
- when the branch block corresponding to the branch block index is accessed, the search of the global predictor 10 can be triggered.
- the branch block index can also be called a branch block program counter (program counter, PC) index (block PC index).
- the historical operation information of each business may include at least one branch block index, and historical operation sub-information corresponding to each branch block index, and each historical operation sub-information may include at least one of the following: branch address pair information, jump information, instruction trace, data trace, cache hot and cold information, scheduling recommendation information, page table information, hardware prefetching regularity information or difficult-to-predict address information.
- Each of the above-mentioned multiple businesses may include at least one branch block, and each branch block (or subroutine or program segment) in the at least one branch block may correspond to a branch block index, and the branch block may correspond to a historical operation sub-information after running, and the branch block index of a branch block may be used to index the historical operation sub-information obtained after the branch block runs.
- the branch block index may be an instruction fetch address, and the instruction fetch address may be a source address corresponding to the branch block.
- the historical operation information may be shown in the following Table 1.
- the multiple services include M services, and the M services are represented as services 1 to M, at least one branch block index of service 1 is represented as ID1-1, ID1-2, ..., at least one branch block index of service 2 is represented as ID2-1, ID2-2, ..., and at least one branch block index of service M is represented as IDM-1, IDM-2, ...
- Table 1 uses the example that the information types included in the historical operation sub-information corresponding to different branch block indexes are the same. In actual applications, the information types included in the historical operation sub-information corresponding to different branch block indexes may also be different, and the embodiments of the present application do not impose specific restrictions on this.
- the global predictor 10 may also include other information of each of the multiple services.
- the other information may include a service identifier and/or a service context, etc. This embodiment of the present application does not impose any specific limitation on this.
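The per-service organization described above (each service contributes one or more branch block indexes, and each index points at a bundle of historical operation sub-information, as in Table 1) can be sketched as a nested mapping. All names and field choices below are illustrative, not part of the claimed embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class SubInfo:
    """Historical operation sub-information for one branch block (illustrative fields)."""
    branch_pairs: List[Tuple[int, int]] = field(default_factory=list)  # (source, target) pairs
    jump_info: Optional[Tuple[bool, int]] = None                       # (taken?, destination)
    # instruction/data traces, cache hot/cold info, etc. would go here

class GlobalPredictorTable:
    """Per-service history, keyed first by service identifier, then by branch block index.

    The branch block index here is the instruction-fetch (source) address of the block.
    """
    def __init__(self) -> None:
        self._table: Dict[int, Dict[int, SubInfo]] = {}

    def record(self, service_id: int, block_index: int, info: SubInfo) -> None:
        self._table.setdefault(service_id, {})[block_index] = info

    def lookup(self, service_id: int, block_index: int) -> Optional[SubInfo]:
        # A lookup only sees entries of the requesting service.
        return self._table.get(service_id, {}).get(block_index)
```

Keying by service identifier first mirrors the isolation property discussed below: a lookup for one service never returns another service's history.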
- the global predictor 10 includes a plurality of buffers.
- the plurality of buffers are used to cache the historical operation information of the plurality of services respectively.
- Each of the plurality of buffers may be used to cache the historical operation information obtained by the operation of one processing core (that is, historical operation information obtained by different processing cores may be cached in different buffers), or to cache the historical operation information of one service (that is, historical operation information of different services may be cached in different buffers).
- the multiple buffers include a first buffer, the first buffer is used to cache historical operation information of the first business, and the historical operation information of the first business may include first historical operation information.
- the second processing core 22 can be used to obtain the first historical operation information from the first buffer, and run the first business according to the first historical operation information.
- when the first service is switched from the first processing core 21 to the second processing core 22, the second processing core 22 can still obtain the first historical operation information from the first buffer that caches the historical operation information of the first service, thereby avoiding a large number of cold misses in the second processing core 22 when the first service is switched.
- the size (or dimension) of each buffer in the multiple buffers may be determined according to the business to which the historical operation information cached in the buffer belongs.
- the size of the first buffer is determined according to the first business. That is, the size of the buffer used to cache the historical operation information of a business may be dynamically determined according to the size of the business.
- the size of the corresponding buffer may be dynamically determined for each business by software (e.g., an operating system) running on the multiple processing cores 20, and the embodiments of the present application do not impose specific restrictions on this.
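As a rough illustration of software-determined, per-service buffer sizing, the sketch below clamps a size estimate to the predictor's limits; the entry size and bounds are invented for the example and would be implementation-specific.

```python
def buffer_size_for_service(working_set_entries: int,
                            entry_bytes: int = 16,
                            min_bytes: int = 4096,
                            max_bytes: int = 1 << 20) -> int:
    """Pick a buffer size for one service's history, clamped to predictor limits.

    working_set_entries is an estimate supplied by software (e.g. the OS);
    all constants here are illustrative, not taken from the embodiment.
    """
    need = working_set_entries * entry_bytes
    size = min_bytes
    while size < need and size < max_bytes:
        size <<= 1  # keep sizes power-of-two for simple indexing
    return min(size, max_bytes)
```

A small service thus gets the minimum allocation, a large one grows up to the predictor's capacity, avoiding both waste and shortfall.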
- the global predictor 10 includes a multi-way buffer, and each of the above-mentioned multiple buffers may include at least one way of the multi-way buffer.
- each way of the multi-way buffer can be used to cache the historical operation information of one service, and the historical operation information of different services is cached in different ways; the historical operation information of the same service may occupy one or more ways.
- the ways of the multi-way buffer are independent of each other; each way can be accessed separately, and accesses to different ways do not affect each other.
- the multi-way buffer includes a first buffer, and the first buffer is used to cache the historical operation information of the first service.
- each buffer in the global predictor 10 can be implemented by the following structure, taking as an example that the first processing core 21 runs the first service, that the first buffer is used to cache the historical operation information of the first service, and that the historical operation information of the first service includes branch address pair information.
- the first buffer can also be called a link branch target buffer (LBTB).
- the LBTB may include: a history queue, a storage circuit, a search queue, and a backfill queue.
- the history queue, the search queue, and the backfill queue are all coupled to the storage circuit.
- the history queue is used to: when a branch prediction of a target address fails during the running of the first service (shown as a prediction failure in the figure; a prediction failure may mean that the target address corresponding to the branch is not found in the caches of the first processing core 21, e.g., the first buffer and the second buffer described below), obtain the correct branch address pair (including the source address and the target address) from the pipeline corresponding to the first service, and output the branch address pair to the storage circuit.
- the history queue can be used to accumulate a preset number of branch address pairs (e.g., 4 address pairs) and then output them as a group to the storage circuit in a certain storage format.
- Take as an example that the preset number of branch address pairs includes brn add0-tgt add0, brn add1-tgt add1, brn add2-tgt add2 and brn add3-tgt add3, and that the storage format also includes a link address link add, where link add is used to indicate the next group of branch address pairs.
- the storage circuit is used to receive and cache the branch address pairs output by the history queue.
- the storage circuit can use the source address of the first branch address pair as an index and write the group of branch address pairs into the corresponding storage location.
- the storage circuit can include a multi-way set-associative RAM, so that search efficiency can be improved when the storage circuit is searched.
- the search queue is used to obtain the instruction fetch address of the branch block and output the instruction fetch address to the storage circuit.
- the search queue can receive and filter duplicate address information; in addition, the search queue can be a multi-input and one-output queue, so that the bandwidth difference between input and output can be balanced.
- the search queue can obtain the instruction fetch address when the first processing core 21 meets certain conditions.
- the conditions may include but are not limited to: querying and missing in the main branch target buffer (main branch target buffer, mBTB), or querying and hitting in the stream branch target buffer (stream branch target buffer, sBTB).
- the storage circuit is also used for: when receiving the instruction fetch address of the branch block of the first processing core 21, obtaining the corresponding branch address pair according to the instruction fetch address, and outputting the branch address pair to the backfill queue.
- the storage circuit can split the obtained group of branch address pairs, output the preset number of branch address pairs obtained by the splitting to the backfill queue, and output the link add of the group to the search queue, so that the search queue performs the next search based on the link add.
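The queue-and-storage flow above (accumulate a preset number of correct branch address pairs, write them as a group indexed by the first pair's source address, chain groups with a link address, and let searches follow the chain) can be modeled in a few lines. This is a behavioral sketch under the assumption that each group's link address is patched to point at the next group written; the hardware may chain groups differently.

```python
from collections import deque

GROUP_SIZE = 4  # preset number of branch address pairs per group (example value)

class LBTB:
    """Minimal behavioral model of the link branch target buffer described above."""

    def __init__(self) -> None:
        self._pending = deque()   # history queue: pairs awaiting grouping
        self._store = {}          # first source address -> (pairs, link_address)
        self._last_key = None

    def record_pair(self, src: int, tgt: int) -> None:
        """Accept one correct (source, target) pair from the pipeline."""
        self._pending.append((src, tgt))
        if len(self._pending) == GROUP_SIZE:
            pairs = [self._pending.popleft() for _ in range(GROUP_SIZE)]
            key = pairs[0][0]                 # index by first pair's source address
            self._store[key] = (pairs, None)  # link filled in when the next group arrives
            if self._last_key is not None:
                prev_pairs, _ = self._store[self._last_key]
                self._store[self._last_key] = (prev_pairs, key)
            self._last_key = key

    def search(self, fetch_addr: int):
        """Return (pairs, link) for a branch block fetch address, or None on a miss.

        The caller (search queue) can reissue search(link) to walk the chain.
        """
        return self._store.get(fetch_addr)
```

A search that hits returns both the group (destined for the backfill queue) and the link address the search queue would use for its next lookup.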
- the global predictor 10 may include a history queue, a storage circuit, a search queue, and a backfill queue corresponding to each processing core.
- the different storage circuits in the global predictor 10 may be integrated together as one storage module; in other words, the global predictor includes a storage module, the storage module includes a storage circuit corresponding to each processing core, and each storage circuit includes a multi-way cache.
- the first processing core 21 includes a first buffer, and the first buffer is used to obtain at least part of the first historical operation information from the global predictor 10 and cache the at least part.
- the first processing core 21 may also include a second buffer, and the second buffer is different from the first buffer.
- the second buffer may refer to a buffer set in the first processing core 21 for caching prefetch information, and the second buffer is not used to cache the historical operation information obtained from the global predictor 10.
- the first buffer may also be referred to as a candidate buffer (e.g., sBTB), and the second buffer may be referred to as a primary buffer (e.g., mBTB).
- the output end of the mBTB and the output end of the sBTB may also be coupled by a selector, which is used to select the output result of the mBTB when the mBTB hits, and select the output result of the sBTB when the sBTB hits.
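A minimal sketch of the selector at the two buffers' outputs, assuming the primary buffer (mBTB) takes priority when both hit (the text does not specify the both-hit case):

```python
def select_target(mbtb_hit: bool, mbtb_target,
                  sbtb_hit: bool, sbtb_target):
    """Selector coupled to the outputs of the mBTB and the sBTB.

    The mBTB (primary buffer) result is used when it hits; otherwise the
    sBTB (candidate buffer, backed by the global predictor) result is used.
    None models a miss in both buffers.
    """
    if mbtb_hit:
        return mbtb_target
    if sbtb_hit:
        return sbtb_target
    return None
```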
- Figure 8 is an example in which core 0, core 1 and core 2 all include an HWP and an sBTB.
- enabling the global predictor 10 through configuration information includes: S11. the software running on the multiple processing cores 20 sends first configuration information for service a to core 0; S12. the HWP of core 0 enables the global predictor 10 according to the first configuration information, for example, by sending service information and enable information to the global predictor 10, so that the global predictor 10 caches the historical operation information of service a; S13. core 0 can also enable the sBTB when the information currently cached in the sBTB matches service a; S14. the global predictor 10 starts working.
- shutting down the global predictor 10 through configuration information includes: S21. the software running on the multiple processing cores 20 sends second configuration information for service b to core 1; S22. the HWP of core 1 shuts down the global predictor 10 according to the second configuration information, for example, by sending service information and shutdown information to the global predictor 10; S23. the global predictor 10 stops providing services for service b on core 1, that is, stops providing the historical operation information of service b to core 1.
- clearing the global predictor 10 through configuration information includes: S31. the software running on the multiple processing cores 20 sends third configuration information for service c to core 2; S32. the HWP of core 2 clears the global predictor 10 according to the third configuration information, for example, by sending thread information and clearing information to the global predictor 10; S33. the global predictor 10 clears the information corresponding to the thread, that is, clears the historical operation information of service c.
- the global predictor 10 can be cleared using a row-by-row clearing method.
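The three configuration flows (enable, shut down, clear) and the row-by-row clearing note can be combined into one small control model. The command names, the row-based store, and the policy of keeping cached rows on shutdown are assumptions for illustration only.

```python
class PredictorControl:
    """Per-service control of the global predictor via configuration information."""

    def __init__(self, rows_per_service: int = 4) -> None:
        self.enabled = set()        # services the predictor currently serves
        self.rows = {}              # service -> list of history rows
        self.rows_per_service = rows_per_service

    def configure(self, service: str, command: str) -> None:
        if command == "enable":
            self.enabled.add(service)
            self.rows.setdefault(service, [None] * self.rows_per_service)
        elif command == "shutdown":
            # stop serving history for this service; cached rows are kept (assumed)
            self.enabled.discard(service)
        elif command == "clear":
            # row-by-row clearing, as described above
            for i in range(len(self.rows.get(service, []))):
                self.rows[service][i] = None
            self.enabled.discard(service)

    def fetch_history(self, service: str):
        """A core only receives history while the service is enabled."""
        return self.rows[service] if service in self.enabled else None
```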
- the global predictor 10 caches historical operation information and the operation information generated by any processing core running the business, so as to update the cached operation information for subsequent use.
- the multiple processing cores 20 obtain the operation information they need from the global predictor 10. In this way, compared with expanding a larger capacity cache for the multiple processing cores, the cache miss rate of the multiple processing cores 20 when processing the business can be reduced, the prediction accuracy and IPC can be improved, and the operation efficiency of the business can be improved.
- the global predictor 10 can be shared by the multiple processing cores 20, when the business is switched from one processing core to another, the other processing core can still obtain the historical operation information of the business from the global predictor 10 for running the business, thereby avoiding the problem of a large number of cold misses in the other processing core.
- the resources in the global predictor 10 used by any of the multiple processing cores 20 are obtained through software configuration, and the services that are allowed to use the global predictor 10 can also be configured for the multiple processing cores 20, so as to prevent information in the global predictor 10 from overflowing and leaking sensitive information.
- key services can be identified and the resources in the global predictor 10 can be configured to the key services running on the processing core, thereby avoiding resource bottlenecks and further improving the operating efficiency of the services.
- an embodiment of the present application also provides a service processing method, which can be applied to a service processing device including a global predictor and multiple processing cores coupled to the global predictor.
- the method includes: the global predictor caches historical operation information of at least one processing core among the multiple processing cores; the first processing core among the multiple processing cores obtains the first historical operation information in the historical operation information, and runs the first service according to the first historical operation information to obtain the first operation information; the global predictor caches the first operation information.
- the historical operation information includes historical operation information of multiple businesses, and the historical operation information of the multiple businesses includes first historical operation information.
- the global predictor includes multiple buffers, and the multiple buffers cache the historical operation information of the multiple services respectively, wherein the first buffer of the multiple buffers caches the first historical operation information.
- the method may also include: the first processing core obtains configuration information for the first service, and the configuration information is used to indicate the first buffer; the first processing core configures the first buffer for the first service according to the configuration information. Configuring the first buffer includes enabling, disabling or clearing the first buffer.
- historical operation information and operation information generated by any processing core running a business are cached by a global predictor to update the cached operation information for subsequent use.
- Any processing core among the multiple processing cores can obtain the required operation information from the global predictor when running a business. In this way, compared with expanding a larger capacity cache for the multiple processing cores, this solution can reduce the cache miss rate when the multiple processing cores process the business, improve the prediction accuracy and IPC, and thus improve the operation efficiency of the business.
- the disclosed devices and methods can be implemented in other ways.
- the device embodiments described above are merely illustrative; for example, the division into modules or units is only a division by logical function, and there may be other division manners in actual implementation, for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed.
- the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. The readable storage medium can include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
- an embodiment of the present application also provides a readable storage medium in which computer-executable instructions are stored; when the instructions run on a device (which may be a single-chip microcomputer, a chip, or the like), a processor of the device executes the steps in the above method embodiment.
- an embodiment of the present application also provides a computer program product, which includes computer instructions stored in a readable storage medium; at least one processor of a device can read the computer instructions from the readable storage medium and execute them, so that the device performs the steps in the above method embodiment.
Description
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on December 22, 2023, with application number 202311795739.1 and application name "A service processing device, method, equipment and storage medium", all contents of which are incorporated by reference in this application.
The present application relates to the field of electronic technology, and in particular to a service processing device, method, equipment and storage medium.
At present, a central processing unit (central processing unit, CPU) subsystem is usually used for service processing. The CPU subsystem includes multiple processing cores, which can be used to perform services such as benchmark performance testing, service performance testing, similar-service deployment, or service-suite deployment. The scale of the instructions and data of the above services is large, while the cache resources of the multiple processing cores in the CPU subsystem are limited, so the CPU subsystem runs the above services with low efficiency.
The present application provides a service processing device, method, equipment and storage medium for improving the operating efficiency when multiple processing cores run services.
To achieve the above objectives, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a service processing device is provided, comprising: a global predictor, and multiple processing cores coupled to the global predictor; the global predictor is used to cache historical operation information of at least one processing core among the multiple processing cores, where the at least one processing core may be some or all of the multiple processing cores; a first processing core among the multiple processing cores is used to obtain first historical operation information from the historical operation information, and to run a first service according to the first historical operation information to obtain first operation information, where the first historical operation information may be operation information related to the first service, and the first processing core is any one of the multiple processing cores; the global predictor is further used to cache the first operation information, that is, the global predictor can dynamically update the historical operation information according to the operation information obtained when any one of the multiple processing cores runs a service.
In the above technical solution, the global predictor caches historical operation information as well as the operation information generated when any one of the multiple processing cores runs a service, so that the cached operation information is updated for subsequent use; any one of the multiple processing cores can obtain the operation information it needs from the global predictor when running a service. In this way, compared with extending a larger-capacity cache for the multiple processing cores, this solution can reduce the cache miss rate when the multiple processing cores process services, improve the prediction accuracy and the IPC, and thus improve the operating efficiency of services.
In a possible implementation of the first aspect, the historical operation information includes historical operation information of multiple services, and the historical operation information of the multiple services includes the first historical operation information; the global predictor includes multiple buffers for respectively caching the historical operation information of the multiple services, wherein a first buffer among the multiple buffers is used to cache the first historical operation information; for example, the historical operation information of different services may be cached in different buffers. In the above possible implementation, by caching the historical operation information of the multiple services in multiple buffers respectively, any one of the multiple processing cores can be prevented from obtaining historical operation information that does not belong to its own service, and overflow of the historical operation information in the global predictor causing leakage of security information can be avoided, thereby improving the security of the historical operation information in the global predictor.
In a possible implementation of the first aspect, the first processing core is further used to: obtain configuration information for the first service, where the configuration information is used to indicate the first buffer (for example, to indicate the size of the first buffer and/or the address space corresponding to the first buffer), and the configuration information may be sent by software running on the multiple processing cores or pre-configured in the first processing core; and configure the first buffer for the first service according to the configuration information, where configuring the first buffer includes enabling, disabling or clearing the first buffer, for example, enabling the first buffer through the configuration information when it is needed, and disabling or clearing it through the configuration information after use. In the above possible implementation, by using the configuration information to configure, in the global predictor, a corresponding buffer for a service of any one of the multiple processing cores, the resources in the global predictor can be allocated to the key services running on the processing cores, thereby avoiding resource bottlenecks and further improving the operating efficiency of services; in addition, by enabling, disabling or clearing the buffers in the global predictor, the utilization of the resources in the global predictor can also be improved.
In a possible implementation of the first aspect, the size (or dimension) of each of the multiple buffers is determined according to the service to which the historical operation information cached in the buffer belongs; for example, the size of the first buffer is determined according to the first service. In the above possible implementation, determining the size of the buffer used to cache the historical operation information of a service according to that service can improve the accuracy of the configured buffer size, thereby avoiding wasted or insufficient resources.
In a possible implementation of the first aspect, the multiple processing cores further include a second processing core, which is used to: when the first service is switched from the first processing core to the second processing core, obtain the first historical operation information from the first buffer, and run the first service according to the first historical operation information. In the above possible implementation, when the first service is switched from the first processing core to the second processing core, the second processing core can still obtain the first historical operation information from the first buffer that caches the historical operation information of the first service, thereby avoiding a large number of cold misses in the second processing core when the first service is switched.
In a possible implementation of the first aspect, the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, an instruction trace, a data trace, cache hot and cold information, scheduling recommendation information, page table information, hardware prefetching regularity information or difficult-to-predict address information. In the above possible implementation, the speculation accuracy and speculation efficiency of the above at least one item of information can be greatly improved, thereby improving the operating efficiency of services.
In a possible implementation of the first aspect, the historical operation information or the first operation information further includes a branch block index, and the branch block index is used to index the at least one item; in this case, when the branch block corresponding to the branch block index is accessed, a search of the global predictor can be triggered. In the above possible implementation, the speculation accuracy and speculation efficiency of branch blocks in a service can be greatly improved, thereby improving the operating efficiency of the service.
在第一方面的一种可能的实现方式中,第一处理核包括:缓冲器,用于从该全局预测器获取第一历史运行信息的至少部分并缓存该至少部分。上述可能的实现方式中,通过在第一处理核中设置容量较小的缓冲区,并且通过该缓冲器动态地从全局预测器的第一历史运行信息中为第一业务获取运行所需的运行信息,从而避免了第一历史运行信息的传输时延对第一业务的运行效率的影响,进而提高了第一业务的运行效率。In a possible implementation of the first aspect, the first processing core includes: a buffer, which is used to obtain at least part of the first historical operation information from the global predictor and cache the at least part. In the above possible implementation, by setting a buffer with a smaller capacity in the first processing core, and dynamically obtaining the operation information required for the operation of the first service from the first historical operation information of the global predictor through the buffer, the influence of the transmission delay of the first historical operation information on the operation efficiency of the first service is avoided, thereby improving the operation efficiency of the first service.
在第一方面的一种可能的实现方式中,第一历史运行信息为与第一业务相同或属于同一类型的其他业务的历史运行信息。上述可能的实现方式中,当第一历史运行信息为与第一业务相同或属于同一类型的其他业务的历史运行信息时,第一业务的运行信息与第一历史运行信息存在较大的重复性和相似性,此时第一处理核根据第一历史运行信息运行第一业务,可以大大提高第一业务的运行效率。In a possible implementation of the first aspect, the first historical operation information is historical operation information of other services that are the same as or belong to the same type as the first service. In the above possible implementation, when the first historical operation information is historical operation information of other services that are the same as or belong to the same type as the first service, the operation information of the first service has a large degree of duplication and similarity with the first historical operation information. At this time, the first processing core runs the first service according to the first historical operation information, which can greatly improve the operation efficiency of the first service.
In a second aspect, a service processing method is provided, applied to a service processing apparatus, the apparatus including a global predictor and a plurality of processing cores coupled to the global predictor. The method includes: the global predictor caches historical operation information of at least one of the plurality of processing cores; a first processing core among the plurality of processing cores obtains first historical operation information from the historical operation information and runs a first service according to the first historical operation information to obtain first operation information, the first processing core being any one of the plurality of processing cores; and the global predictor caches the first operation information.
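The flow of the second-aspect method can be sketched as follows. This is a minimal illustrative model only: the class and method names (GlobalPredictor, ProcessingCore, run_service) are hypothetical and do not appear in the application; the actual apparatus is hardware.

```python
# Hypothetical sketch of the second-aspect method flow: the global predictor
# caches history, a core warm-starts a service from that history, and the
# newly produced operation information is cached back. All names are
# illustrative assumptions, not terms from the application.

class GlobalPredictor:
    """Caches per-service historical operation information shared by all cores."""
    def __init__(self):
        self.history = {}  # service id -> list of operation-information records

    def cache(self, service_id, operation_info):
        self.history.setdefault(service_id, []).append(operation_info)

    def fetch(self, service_id):
        return list(self.history.get(service_id, []))

class ProcessingCore:
    def __init__(self, core_id, predictor):
        self.core_id = core_id
        self.predictor = predictor

    def run_service(self, service_id):
        # Step 1: obtain the first historical operation information.
        hist = self.predictor.fetch(service_id)
        # Step 2: run the service, warm-started by the history (stubbed here).
        new_info = {"core": self.core_id, "warm_entries": len(hist)}
        # Step 3: the global predictor caches the newly produced information.
        self.predictor.cache(service_id, new_info)
        return new_info

gp = GlobalPredictor()
core0, core1 = ProcessingCore(0, gp), ProcessingCore(1, gp)
first = core0.run_service("svc-A")   # cold start: no history yet
second = core1.run_service("svc-A")  # warm start: reuses core 0's history
```

The point of the sketch is the sharing: a second core running the same service sees the history produced by the first core, rather than starting from an empty predictor state.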
In a possible implementation of the second aspect, the historical operation information includes historical operation information of a plurality of services, and the historical operation information of the plurality of services includes the first historical operation information; the caching, by the global predictor, of the historical operation information of at least one of the plurality of processing cores includes: a plurality of buffers of the global predictor respectively caching the historical operation information of the plurality of services, where a first buffer among the plurality of buffers is used to cache the first historical operation information.
In a possible implementation of the second aspect, the method further includes: the first processing core obtains configuration information for the first service, the configuration information being used to indicate the first buffer; and the first processing core configures the first buffer for the first service according to the configuration information, where configuring the first buffer includes enabling, disabling, or clearing the first buffer.
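The three configuration actions on the first buffer can be sketched as below. The Buffer class and action strings are illustrative assumptions; the application specifies only that configuring includes enabling, disabling, or clearing.

```python
# Illustrative sketch of per-service buffer configuration. The three actions
# mirror enable/disable/clear from the text; the Buffer class itself is a
# hypothetical stand-in for a hardware buffer in the global predictor.

class Buffer:
    def __init__(self, size):
        self.size = size         # per the text, sized according to the service
        self.enabled = False
        self.entries = []

    def configure(self, action):
        if action == "enable":
            self.enabled = True
        elif action == "disable":
            self.enabled = False
        elif action == "clear":
            self.entries.clear()
        else:
            raise ValueError(f"unknown configuration action: {action}")

buf = Buffer(size=64)
buf.configure("enable")
buf.entries.append({"block_pc": 0x1000})
buf.configure("clear")    # e.g. wipe stale history when the service is torn down
buf.configure("disable")
```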
In a possible implementation of the second aspect, the size of the first buffer is determined according to the first service.
In a possible implementation of the second aspect, the plurality of processing cores further includes a second processing core, and the method further includes: when the first service is switched from the first processing core to the second processing core, the second processing core obtains the first historical operation information from the first buffer and runs the first service according to the first historical operation information.
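The migration case above can be sketched as follows: because the first buffer lives in the shared global predictor rather than in the first core, the destination core can read the same history and start warm. The function and field names here are illustrative assumptions.

```python
# Sketch of service migration between cores. The first buffer (held by the
# shared global predictor) survives the switch, so the second core does not
# suffer the cold misses described in the background section. Names are
# illustrative, not from the application.

first_buffer = [{"block_pc": 0x40001000, "taken": True},
                {"block_pc": 0x40001040, "taken": False}]

def migrate(service_history, dst_core_state):
    # The second processing core fetches the first historical operation
    # information from the shared first buffer...
    dst_core_state["history"] = list(service_history)
    # ...and runs the first service according to it (stubbed as a count).
    dst_core_state["warm_entries"] = len(dst_core_state["history"])
    return dst_core_state

core1_state = migrate(first_buffer, {"core": 1})
```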
In a possible implementation of the second aspect, the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, an instruction trace, a data trace, cache hot/cold information, scheduling recommendation information, page table information, hardware prefetching pattern information, or hard-to-predict address information.
In a possible implementation of the second aspect, the historical operation information or the first operation information further includes a branch block index, and the branch block index is used to index the at least one item.
In a possible implementation of the second aspect, the obtaining, by the first processing core, of the first historical operation information from the historical operation information includes: a buffer of the first processing core obtains at least part of the first historical operation information from the global predictor and caches that part.
In a possible implementation of the second aspect, the first historical operation information is historical operation information of the first service or of another service of the same type.
In a third aspect, an electronic device is provided. The electronic device includes a memory and a service processing apparatus, the service processing apparatus including a global predictor and a plurality of processing cores. The memory is configured to store computer instructions, and the service processing apparatus is configured to execute the computer instructions, so that the electronic device implements the service processing method provided in the second aspect or any possible implementation of the second aspect.
In another aspect of this application, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program or instructions which, when run, implement the service processing method provided in the second aspect or any possible implementation of the second aspect.
In another aspect of this application, a computer program product is provided. The computer program product includes a computer program (which may also be referred to as code or instructions) which, when run, causes a computer to perform the service processing method provided in the second aspect or any possible implementation of the second aspect.
It can be understood that, for the beneficial effects achievable by any of the service processing methods, the electronic device, the computer-readable storage medium, and the computer program product provided above, reference may be made to the corresponding beneficial effects of the service processing apparatus provided above; details are not repeated here.
FIG. 1 is a schematic structural diagram of a multi-core system according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a service processing apparatus according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of an LBTB according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another LBTB according to an embodiment of this application;
FIG. 5 is a schematic diagram of a processing core acquiring historical operation information according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an sBTB according to an embodiment of this application;
FIG. 7 is a schematic diagram of a plurality of processing cores sharing a global predictor according to an embodiment of this application;
FIG. 8 is a schematic diagram of configuring a global predictor according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of this application.
The making and use of the embodiments are discussed in detail below. It should be understood, however, that the many applicable inventive concepts provided by this application can be embodied in a wide variety of specific contexts. The specific embodiments discussed merely illustrate specific ways to make and use this application and the present technology, and do not limit the scope of this application.
Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.
Various circuits or other components may be described as or referred to as "configured to" perform one or more tasks. In such contexts, "configured to" connotes structure by indicating that the circuit/component includes structure (for example, circuitry) that performs the one or more tasks during operation. As such, the circuit/component can be said to be configured to perform the task even when the specified circuit/component is not currently operational (for example, is not turned on). Circuits/components used with the "configured to" language include hardware, for example, circuits that perform the operation.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In this application, "at least one" means one or more, and "a plurality of" means two or more. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent the following cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be singular or plural.
The embodiments of this application use terms such as "first" and "second" to distinguish between objects with similar names, functions, or effects. A person skilled in the art will understand that such terms do not limit quantity or execution order. The term "coupled" denotes an electrical connection, including a direct connection through wires or connection terminals and an indirect connection through other devices; "coupled" should therefore be regarded as an electronic communication connection in a broad sense.
It should be noted that, in this application, words such as "exemplary" or "for example" are used to represent an example, an illustration, or a description. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as being preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present a related concept in a specific manner.
At present, a central processing unit (CPU) subsystem is usually inefficient when processing services. Specifically, this inefficiency manifests as a high cache miss rate, low branch prediction accuracy, and low instructions per clock cycle (instructions per clock, IPC). For example, the instructions and data of the services running on the CPU subsystem are large in scale, while the capacity of resources such as the caches, the branch target buffer (BTB), and the branch direction predictor in the CPU subsystem is limited, which easily causes many cache misses. For another example, the software in the CPU subsystem is unaware of speculative mechanisms such as front-end prefetching and back-end prefetching, so the prefetching information cannot assist other predictions, resulting in wasted information. For another example, the number of threads of an application running on the CPU subsystem is far greater than the number of processing cores (also referred to as cores or processor cores) included in the CPU subsystem, which easily causes a large number of thread switches (including switches within the same processing core and switches between different processing cores). When a thread switch occurs, the data stored in resources such as the caches, the BTB, and the branch direction predictor is not the data required by the current thread, so the current thread suffers a large number of cold misses.
In view of this, an embodiment of this application provides a service processing apparatus. The apparatus includes a global predictor (GP) and a plurality of processing cores coupled to the global predictor. The global predictor may be configured to cache historical operation information; any one of the plurality of processing cores may be configured to obtain the required operation information from the historical operation information and run a service according to the obtained operation information; and the global predictor is further configured to cache the operation information obtained by running the service. In this way, because the global predictor caches both the historical operation information and the operation information produced by any processing core running a service, the plurality of processing cores obtain the operation information they each require from the global predictor when running services. This can reduce the cache miss rate of the plurality of processing cores during service processing and improve the prediction accuracy and the IPC, thereby improving the running efficiency of the services.
The service processing apparatus provided in the embodiments of this application can be applied to a multi-core system, which may be referred to as a processor subsystem (for example, a CPU subsystem). The structure of the multi-core system is described below.
FIG. 1 is a schematic structural diagram of a multi-core system according to an embodiment of this application. The multi-core system includes a global predictor and a plurality of processing cores coupled to the global predictor, and the plurality of processing cores can share the global predictor through software configuration. The plurality of processing cores can be used to deploy different services, the same service, or partially identical services, where partially identical services may mean that the data and/or instructions that two services need to access during running are partially the same. In the embodiments of this application, the plurality of processing cores can be used to perform services such as benchmark performance testing, service performance testing, similar-service deployment, or service suite deployment. For example, the service scenarios of the multi-core system include but are not limited to: server cluster benchmark performance testing, server cluster bidding service performance testing, server cluster similar-service deployment (for example, high performance computing (HPC)), server cluster service suite deployment, benchmark performance testing in the terminal field, service switching, service migration, and the like.
The number of processing cores included in the multi-core system can be configured according to actual requirements; for example, the number of processing cores may be 2, 4, 6, or 8. In addition, the plurality of processing cores may have a homogeneous structure (that is, the processing cores are identical) or a heterogeneous structure (that is, different processing cores are included). In one example, as shown in (a) of FIG. 1, the multi-core system includes a global predictor and 4 processing cores coupled to the global predictor; the 4 processing cores are denoted core 0 to core 3 and have a homogeneous structure. In another example, as shown in (b) of FIG. 1, the multi-core system includes a global predictor and 8 processing cores coupled to the global predictor; the 8 processing cores are denoted core 0 to core 7 and have a homogeneous structure. In yet another example, as shown in (c) of FIG. 1, the multi-core system includes a global predictor and 8 processing cores coupled to the global predictor; the 8 processing cores may have a heterogeneous structure and may include super-large core 0, large cores 1 to 3, and small cores 4 to 7.
In the multi-core system, the plurality of processing cores may include a plurality of different processing cores of a same processor, or may include processing cores of a plurality of different processors; that is, the plurality of processing cores may form one or more processors. The processor includes but is not limited to: a CPU, a general-purpose processor, a graphics processing unit (GPU), an image signal processor (ISP), a digital signal processor (DSP), a network processing unit (NPU), an artificial intelligence (AI) processor, and the like. Optionally, any two of the plurality of processing cores may be the same or different in structure and in the size of their respective hardware resources (for example, cache resources); this is not specifically limited in the embodiments of this application.
Further, the multi-core system may also include a memory coupled to the plurality of processing cores. The memory may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. For example, the RAM may be a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), or the like. The memory is not shown in the figure.
The multi-core system may be an electronic device, a system on chip (SoC) applied to an electronic device, a chipset including a plurality of chips, or a module including the SoC or the chipset. The electronic device may serve as a terminal device or as a server. Optionally, the electronic device includes but is not limited to: a mobile phone, a tablet computer, a laptop computer, a desktop computer, a palmtop computer, an ultra-mobile personal computer (UMPC), a mobile internet device (MID), a netbook, a video camera, a camera, a wearable device (for example, a smart watch or a smart band), a vehicle-mounted device (for example, in a car, bicycle, electric vehicle, airplane, ship, train, or high-speed railway), a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a smart home device (for example, a refrigerator, a television, an air conditioner, or an electricity meter), an intelligent robot, workshop equipment, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a flight device (for example, an intelligent robot, a hot-air balloon, a drone, or an airplane), and the like.
FIG. 2 is a schematic structural diagram of a service processing apparatus according to an embodiment of this application. The service processing apparatus includes a global predictor 10 and a plurality of processing cores 20 coupled to the global predictor 10. The plurality of processing cores 20 share the global predictor 10; for example, the plurality of processing cores 20 are configured through software to share different resources of the global predictor 10. In FIG. 2, the plurality of processing cores 20 including processing cores 21 to 2n is used as an example for description, where n is an integer greater than 1.
The global predictor 10 is configured to cache historical operation information of at least one of the plurality of processing cores 20. The at least one processing core includes one or more processing cores. When the at least one processing core includes a plurality of processing cores, the at least one processing core may be some or all of the plurality of processing cores 20. The historical operation information cached in the global predictor 10 may be obtained by the at least one processing core running services. For example, each of the at least one processing core may run one or more services and transmit the resulting operation information to the global predictor 10, and the global predictor 10 caches the operation information transmitted by the at least one processing core as the historical operation information.
Any one of the plurality of processing cores 20 (referred to below as the first processing core 21 for ease of description) is configured to obtain first historical operation information from the historical operation information and run a first service according to the first historical operation information to obtain first operation information. The first historical operation information may be part or all of the historical operation information. Optionally, the first historical operation information is historical operation information of the first service or of another service of the same type. For example, the first historical operation information may be historical operation information of the first service, that is, operation information obtained from a previous run of the first service; or the first service and a second service are two services in a same high-performance computing workload, and the first historical operation information is historical operation information of the second service. The following description takes as an example the case where the first historical operation information is historical operation information of the first service.
The global predictor 10 is further configured to cache the first operation information. For example, when the first processing core 21 obtains the first operation information, it transmits the first operation information to the global predictor 10, and the global predictor 10 receives and caches the first operation information. After the global predictor 10 caches the first operation information, when any one of the plurality of processing cores 20 (for example, the first processing core 21 or another processing core) obtains historical operation information from the global predictor 10, that historical operation information includes the first operation information. That is, the historical operation information is dynamically updated: the global predictor 10 can dynamically update the historical operation information according to the operation information obtained by the plurality of processing cores 20 running services.
Optionally, the historical operation information or the first operation information includes at least one of the following: branch address pair information, jump information, an instruction trace, a data trace, cache hot/cold information, scheduling recommendation information, page table information, hardware prefetching pattern information, or hard-to-predict address information. The branch address pair information refers to the address, in a memory, of an executed branch; the address includes a source address and a target address, and the source address and the target address form an address pair. The jump information may include a jump direction and a jump destination, where the jump direction indicates the direction of a jump instruction (that is, jump or no jump), and the jump destination refers to the destination address after the jump. The data trace indicates the trace of accessed data. The instruction trace indicates the trace of accessed instructions. The cache hot/cold information indicates how hot or cold the data and/or instructions in the caches (for example, the level-1, level-2, and level-3 caches) are. The scheduling recommendation information indicates information recommended during scheduling, for example, a scheduling policy recommended based on quality of service (QoS). The page table information indicates the correspondence between the virtual addresses and the physical addresses of the accessed data and/or instructions. The hardware prefetching pattern information indicates the pattern followed when hardware prefetches data and/or instructions, where the prefetching may include front-end prefetching and/or back-end prefetching. The hard-to-predict address information indicates address information that is difficult for a hardware prefetcher (hardware prefetch, HWP) to predict. The memory targeted by the foregoing addresses may be located on the same chip as, or on a different chip from, the foregoing one or more processing cores, and may be a volatile or non-volatile memory; this is not limited in this embodiment.
It can be understood that, in addition to the above, the historical operation information or the first operation information may also include other information required during service running (for example, value prediction information, which may refer to a value obtained through prediction before a memory access completes); this is not specifically limited in the embodiments of this application.
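The kinds of operation information enumerated above can be pictured as one record type. The sketch below is a hypothetical data layout for illustration only; the field names do not come from the application, and a hardware implementation would pack these fields very differently.

```python
# Hypothetical record gathering the kinds of operation information the text
# enumerates. Field names are illustrative assumptions, not patent terms.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OperationInfo:
    branch_addr_pair: Optional[tuple] = None    # (source address, target address)
    jump_info: Optional[dict] = None            # jump direction + destination
    instruction_trace: list = field(default_factory=list)
    data_trace: list = field(default_factory=list)
    cache_hotness: Optional[dict] = None        # hot/cold degree per cache level
    sched_recommendation: Optional[str] = None  # e.g. a QoS-based policy hint
    page_table_info: Optional[dict] = None      # virtual -> physical mapping
    prefetch_pattern: Optional[dict] = None     # hardware-prefetching regularity
    hard_to_predict_addrs: list = field(default_factory=list)

rec = OperationInfo(branch_addr_pair=(0x1000, 0x2000),
                    jump_info={"taken": True, "target": 0x2000})
```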
Optionally, the historical operation information or the first operation information further includes a branch block index, which is used to index the at least one item of information above. In this case, a lookup in the global predictor 10 can be triggered when the branch block corresponding to the branch block index is accessed. The branch block index may also be called a branch block program counter (PC) index (block PC index). For ease of understanding, the following uses the historical operation information as an example to describe the branch block index, and the at least one item corresponding to it, that are included in the historical operation information and the first operation information.
In one example, assuming that the historical operation information includes the historical operation information of multiple services, the historical operation information of each service may include at least one branch block index and historical operation sub-information corresponding to each branch block index, and each piece of historical operation sub-information may include at least one of the following: branch address pair information, jump information, an instruction trace, a data trace, cache hot/cold information, scheduling recommendation information, page table information, hardware prefetching regularity information, or difficult-to-predict address information.
Each of the multiple services may include at least one branch block, each branch block (also called a subroutine or a program segment) may correspond to one branch block index, and a branch block may correspond to one piece of historical operation sub-information after it runs; the branch block index of a branch block can be used to index the historical operation sub-information obtained after the branch block runs. Optionally, the branch block index may be an instruction fetch address, and the instruction fetch address may be the source address corresponding to the branch block.
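The indexing relationship above can be sketched as a per-service mapping from branch block index to sub-information. This is a hypothetical illustration only; the service names, addresses, and field names are invented for the example:

```python
# Hypothetical sketch: per-service historical operation information keyed by
# branch block index (here, the block's instruction fetch / source address).
history = {
    "service_1": {
        0x1000: {"branch_pairs": [(0x1000, 0x1040)]},  # block index -> sub-info
        0x2000: {"branch_pairs": [(0x2000, 0x2400)]},
    },
}

def lookup(service: str, block_index: int):
    """Index the historical operation sub-information of one branch block."""
    return history.get(service, {}).get(block_index)

hit = lookup("service_1", 0x1000)   # block present -> its sub-information
miss = lookup("service_1", 0x3000)  # block never recorded -> nothing to return
```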
For example, the historical operation information may be as shown in Table 1 below. In Table 1, it is assumed that the multiple services include M services denoted as service 1 to service M, the at least one branch block index of service 1 is denoted as ID1-1, ID1-2, ..., the at least one branch block index of service 2 is denoted as ID2-1, ID2-2, ..., and the at least one branch block index of service M is denoted as IDM-1, IDM-2, ....
Table 1
It can be understood that Table 1 uses, as an example, a case in which the historical operation sub-information corresponding to different branch block indexes includes the same information types. In actual applications, the historical operation sub-information corresponding to different branch block indexes may also include different information types; this is not specifically limited in the embodiments of this application.
In addition, the global predictor 10 may further include other information of each of the multiple services; for example, the other information may include a service identifier and/or a service context. This is not specifically limited in the embodiments of this application.
Further, the global predictor 10 includes multiple buffer areas, which are used to cache the historical operation information of the multiple services respectively. Each buffer area can be used to cache the historical operation information produced by one processing core, that is, the historical operation information produced by different processing cores can be cached in different buffer areas; alternatively, each buffer area can be used to cache the historical operation information of one service, that is, the historical operation information of different services can be cached in different buffer areas.
Optionally, the multiple buffer areas include a first buffer area, the first buffer area is used to cache the historical operation information of a first service, and the historical operation information of the first service may include the first historical operation information. In a possible example, for any processing core among the multiple processing cores 20 other than the first processing core 21 (referred to herein as a second processing core 22), when the first service is switched from a first thread on the first processing core 21 to a second thread on the second processing core 22, the second processing core 22 can obtain the first historical operation information from the first buffer area and run the first service according to the first historical operation information. In this example, when the first service is switched from the first processing core 21 to the second processing core 22, the second processing core 22 can still obtain the first historical operation information from the first buffer area that caches the first service, thereby avoiding a large number of cold misses on the second processing core 22 when the first service is switched.
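The core-switch scenario can be sketched as follows. This is a minimal, hypothetical model (the function and variable names are invented): because the predictor state is keyed by service rather than by core, a service migrating to a new core finds its history already warm.

```python
# Hypothetical sketch of the core-switch scenario: per-service entries in a
# shared global predictor survive migration of a service between cores.
global_predictor = {}  # service name -> cached historical operation info

def run_service(core_id: int, service: str) -> bool:
    """Run `service` on core `core_id`; return True if it started cold."""
    warm = global_predictor.get(service)  # shared across all cores
    cold_miss = warm is None              # only cold on the first ever run
    # ... run the service, then publish what was learned ...
    global_predictor[service] = {"learned_on_core": core_id}
    return cold_miss

first = run_service(core_id=0, service="svc_a")   # first run: cold
second = run_service(core_id=1, service="svc_a")  # after switching cores: warm
```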
Optionally, the size of each of the multiple buffer areas may be determined according to the service to which the historical operation information cached in that buffer area belongs. For example, the size of the first buffer area is determined according to the first service. That is, the size of the buffer area used to cache the historical operation information of a service may be dynamically determined according to the size of the service; for example, in actual applications, software (for example, an operating system) running on the multiple processing cores 20 may dynamically determine the size of the corresponding buffer area for each service. This is not specifically limited in the embodiments of this application.
In a possible embodiment, the global predictor 10 includes a multi-way buffer, and each of the above-mentioned buffer areas may include at least one way of the multi-way buffer. Optionally, each way of the multi-way buffer can be used to cache the historical operation information of one service; the historical operation information of different services is cached in different ways, and the historical operation information of the same service may occupy one or more ways. In addition, the ways are independent of each other: each way can be accessed separately, and accesses to different ways do not affect each other. For example, the multi-way buffer includes a first-way buffer, and the first-way buffer is used to cache the historical operation information of the first service.
When the historical operation information of a service includes at least one of the foregoing items of information, such as branch address pair information, jump information, and an instruction trace, each way of buffer in the global predictor 10 can cache any one of the at least one item using the following structure. For ease of description, the following uses an example in which the first processing core 21 runs the first service, the first-way buffer is used to cache the historical operation information of the first service, and the historical operation information of the first service includes branch address pair information. The first-way buffer may also be called a link branch target buffer (LBTB).
In one example, as shown in FIG. 3, the LBTB may include a history queue, a storage circuit, a search queue, and a fill queue. The history queue, the search queue, and the fill queue are all coupled to the storage circuit.
The history queue is used to: when a branch prediction target address failure occurs while the first service is running (shown as a prediction failure in the figure; the prediction failure may mean that the target address corresponding to a branch is not found in the caches of the first processing core 21 (for example, the first buffer and the second buffer described below)), obtain the correct branch address pair (that is, the source address and the target address) from the pipeline corresponding to the first service, and output the branch address pair to the storage circuit. Optionally, to reduce the number of updates to the LBTB, and thereby reduce power consumption and access conflicts, the history queue can wait until a preset number of branch address pairs (for example, 4 address pairs) has accumulated, and then output that preset number of branch address pairs to the storage circuit as one group in a certain storage format. In FIG. 3, the preset number of branch address pairs includes brn add0-tgt add0, brn add1-tgt add1, brn add2-tgt add2, and brn add3-tgt add3, and the storage format further includes a link address link add, which indicates the next group of branch address pairs.
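The batching behavior of the history queue can be sketched as follows. This is a hypothetical illustration: the group size of 4 matches FIG. 3, but the choice of using the next group's first source address as the link address is an assumption made for the example, since the embodiment only states that link add indicates the next group.

```python
# Hypothetical sketch of the history queue's batching: accumulate a preset
# number of branch address pairs (4, as in FIG. 3) and emit them as one group
# whose link address points at the next group.
GROUP_SIZE = 4

class HistoryQueue:
    def __init__(self):
        self.pending = []
        self.groups = []  # what would be written to the storage circuit

    def push(self, source: int, target: int):
        self.pending.append((source, target))
        if len(self.pending) == GROUP_SIZE:
            # Assumption: the link address is the source address of the next
            # group's first pair; it is filled in once that group is formed.
            self.groups.append({"pairs": self.pending, "link_add": None})
            if len(self.groups) > 1:
                self.groups[-2]["link_add"] = self.pending[0][0]
            self.pending = []

q = HistoryQueue()
for s, t in [(0x10, 0x90), (0x20, 0xA0), (0x30, 0xB0), (0x40, 0xC0),
             (0x50, 0xD0), (0x60, 0xE0), (0x70, 0xF0), (0x80, 0x100)]:
    q.push(s, t)
```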
The storage circuit is used to receive and cache the branch address pairs output by the history queue. When a group of branch address pairs is cached in the above storage format, the storage circuit can use the source address of the first branch address pair as an index to write the group of branch address pairs into the corresponding storage. Optionally, the storage circuit may include set-associative RAM, which improves search efficiency when the storage circuit is searched.
The search queue is used to obtain the instruction fetch address of a branch block and output the instruction fetch address to the storage circuit. The search queue can receive address information and filter out duplicates; in addition, the search queue may be a multiple-in, one-out queue, which balances the bandwidth difference between its input and output. Optionally, the search queue can obtain the instruction fetch address when the first processing core 21 meets a certain condition; for example, the condition may include, but is not limited to, a lookup miss in the main branch target buffer (mBTB) or a lookup hit in the stream branch target buffer (sBTB). For a detailed description of the mBTB and the sBTB, refer to the description below; details are not repeated here.
The storage circuit is further used to: when receiving the instruction fetch address of a branch block from the first processing core 21, obtain the corresponding branch address pairs according to the instruction fetch address and output the branch address pairs to the fill queue. Optionally, when a group of branch address pairs is cached in the above storage format, the storage circuit can unpack the obtained group, output the preset number of branch address pairs obtained by unpacking to the fill queue, and output the link add in the group to the search queue, so that the search queue performs the next search based on the link add.
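The lookup-and-chain behavior of the search queue and the storage circuit can be sketched as follows. This is a hypothetical model (the dictionary layout and the bound on followed groups are invented for the example): an index hit yields one group of pairs for the fill queue, and the group's link address drives the next search.

```python
# Hypothetical sketch of the lookup path: the storage circuit is indexed by the
# first pair's source address; a hit returns the group's pairs for the fill
# queue and feeds the group's link address back to the search queue.
storage = {
    0x10: {"pairs": [(0x10, 0x90), (0x20, 0xA0), (0x30, 0xB0), (0x40, 0xC0)],
           "link_add": 0x50},
    0x50: {"pairs": [(0x50, 0xD0), (0x60, 0xE0), (0x70, 0xF0), (0x80, 0x100)],
           "link_add": None},
}

def search(fetch_address: int, max_groups: int = 8):
    """Follow link addresses, collecting pairs to backfill into the core."""
    fill, addr = [], fetch_address
    for _ in range(max_groups):
        group = storage.get(addr)
        if group is None:          # miss: nothing cached under this index
            break
        fill.extend(group["pairs"])  # these go to the fill queue
        addr = group["link_add"]     # this goes back to the search queue
        if addr is None:
            break
    return fill

backfilled = search(0x10)  # follows the chain 0x10 -> 0x50
```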
The fill queue is used to backfill the branch address pairs output by the storage circuit into the first processing core 21. Optionally, the fill queue can backfill the preset number of branch address pairs output by the storage circuit into a buffer of the first processing core 21, for example, into the sBTB of the first processing core 21.
In another example, with reference to FIG. 3 and as shown in FIG. 4, for each of the multiple processing cores 20, the global predictor 10 may include a history queue, a storage circuit, a search queue, and a fill queue corresponding to that processing core. The different storage circuits in the global predictor 10 may be integrated together as a storage module; alternatively, the global predictor includes a storage module, the storage module includes a storage circuit corresponding to each processing core, and each storage circuit includes multiple cache ways. In FIG. 4, the multiple processing cores 20 include four processing cores denoted as core 0 to core 3: core 0 corresponds to history queue 0, search queue 0, fill queue 0, and two cache ways w0 and w1; core 1 corresponds to history queue 1, search queue 1, fill queue 1, and two cache ways w2 and w3; core 2 corresponds to history queue 2, search queue 2, fill queue 2, and two cache ways w4 and w5; and core 3 corresponds to history queue 3, search queue 3, fill queue 3, and four cache ways w6 to w9.
Further, the following uses an example in which the first processing core 21 obtains the first historical operation information from the global predictor 10 to describe the process by which any of the multiple processing cores 20 obtains historical operation information from the global predictor 10.
In a possible embodiment, the first processing core 21 includes a first buffer, and the first buffer is used to obtain at least part of the first historical operation information from the global predictor 10 and cache that part. The first processing core 21 may further include a second buffer different from the first buffer; the second buffer may refer to a buffer provided in the first processing core 21 for caching prefetch information, and the second buffer is not used to cache the historical operation information obtained from the global predictor 10. In the embodiments of this application, the first buffer may also be called a candidate buffer (for example, the sBTB), and the second buffer may be called a main buffer (for example, the mBTB).
Optionally, the first buffer can specifically obtain at least part of the first historical operation information from the global predictor 10, and cache that part, when a certain condition is met. For example, the condition may include a lookup miss in the second buffer (that is, the second buffer does not contain the required information) or a lookup hit in the first buffer (the first buffer contains the required information); this is not specifically limited in the embodiments of this application.
When the first historical operation information includes at least one of the foregoing items of information, such as branch address pair information, jump information, and an instruction trace, the first processing core 21 may include at least one first buffer, and each of the at least one first buffer can be used to cache one of the foregoing items of historical operation information. For ease of description, the following uses an example in which the first historical operation information includes branch address pair information and the first processing core 21 includes a main branch target buffer mBTB (corresponding to the second buffer) and a stream branch target buffer sBTB (corresponding to the first buffer).
For example, as shown in FIG. 5, the first processing core 21 includes an mBTB and an sBTB; the mBTB caches a first set of branch address pairs, prefetched by the first processing core 21, corresponding to the first service, and the sBTB caches a second set of branch address pairs obtained from the LBTB of the global predictor 10. While the first processing core 21 runs the first service, when the first processing core 21 obtains an instruction fetch address FIVA through instruction fetching, the first processing core 21 may first query whether the target branch address corresponding to the FIVA exists in the first set of branch address pairs cached in the mBTB; if it exists (a hit), execution continues according to that target branch address; if it does not exist (a miss), the first processing core 21 queries whether the target branch address corresponding to the FIVA exists in the second set of branch address pairs cached in the sBTB, and if it exists (a hit), execution continues according to the queried target branch address. In addition, a lookup miss in the mBTB, or a lookup hit in the sBTB, indicates that the historical operation information is accurate and usable as prefetch information, and the sBTB can obtain more branch address pairs corresponding to the first service from the LBTB of the global predictor 10; for example, the sBTB sends the next instruction fetch address to the global predictor 10 to obtain those additional branch address pairs for the first processing core 21 to use when running the first service. Optionally, when the first service finishes running, or when another service needs to use the sBTB, the first processing core 21 can also clear (or flush) the sBTB; for example, the first processing core 21 can clear the sBTB through valid bits.
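The two-level lookup in FIG. 5 can be sketched as follows. This is a hypothetical illustration (the dictionaries, addresses, and the exact prefetch-request payload are invented for the example): the mBTB is consulted first, the sBTB serves as the fallback, and an sBTB hit triggers a request for more pairs from the global predictor's LBTB.

```python
# Hypothetical sketch of the two-level lookup in FIG. 5: query the mBTB first,
# fall back to the sBTB, and on an sBTB hit ask the LBTB for more pairs.
mBTB = {0x100: 0x180}          # prefetched pairs: fetch address -> target
sBTB = {0x200: 0x280}          # pairs streamed in from the global predictor
prefetch_requests = []         # next fetch addresses sent to the LBTB

def predict(fiva: int):
    if fiva in mBTB:                      # hit in the main buffer
        return mBTB[fiva]
    target = sBTB.get(fiva)               # miss -> try the candidate buffer
    if target is not None:                # sBTB hit: history is proving useful,
        prefetch_requests.append(target)  # so request more pairs from the LBTB
    return target                         # None means a miss in both buffers

t1 = predict(0x100)   # mBTB hit
t2 = predict(0x200)   # mBTB miss, sBTB hit -> triggers an LBTB request
t3 = predict(0x300)   # miss in both
```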
It can be understood that the conditions under which the sBTB obtains branch address pairs from the LBTB of the global predictor 10 may also include other conditions, for example, fetching after a certain duration elapses, or fetching when all pipeline jumps of the first processing core 21 occur; this is not specifically limited in the embodiments of this application.
In addition, FIG. 6 shows a possible structure of the above sBTB. As shown in FIG. 6, the sBTB may include a storage circuit and a valid bit array; the storage circuit can cache the historical operation information from the global predictor 10, and the valid bit array can indicate the validity of the information cached in the storage circuit. The storage circuit may include set-associative RAM, which improves search efficiency when the storage circuit is searched. Optionally, the sBTB may further include a match-and-select circuit, which can produce an output when the service to which the FIVA belongs matches the service currently cached in the sBTB and the target branch address hit in the sBTB is valid. In addition, the output of the mBTB and the output of the sBTB may be coupled through a selector, which selects the output of the mBTB on an mBTB hit and the output of the sBTB on an sBTB hit.
Optionally, when the first processing core 21 queries the first buffer and gets a hit, the first processing core 21 can reload the hit operation information, as speculative information, into the modules or components that the pipeline needs to use. For example, branch address pairs are loaded back into the branch pipeline, the hot/cold and criticality information of the instruction cache and the data cache is loaded back into the instruction prefetch and replacement components, the historical operation information of the page table is loaded back into the page table prefetch component, and the hardware prefetching regularity information and the difficult-to-predict address information are loaded back into the hardware prefetch component.
Further, in a possible example, as shown in FIG. 7, when the historical operation information cached in the global predictor includes multiple items of information such as branch address pair information, jump information, and an instruction trace, the second buffers used to cache these items in any processing core (for example, core 0) among the multiple processing cores 20 (for example, core 0, core 1, and core 2) may include a branch target buffer (BTB), an instruction cache (Icache), a data cache (Dcache), a translation lookaside buffer (TLB), and the like; the processing core may further include other components such as an execution pipeline. FIG. 7 uses an example in which the processing core caches the items of information obtained from the global predictor 10 in a first buffer, and the first buffer is a stream buffer.
Correspondingly, in one example, as shown in FIG. 7, the global predictor 10 includes a multi-way buffer (for example, N ways denoted as GP w0 to GP wN), each way of which can be used to cache the historical operation information of one service; the historical operation information of any service may include multiple branch block indexes and the foregoing items of information corresponding to each branch block index. FIG. 7 uses an example in which the branch block indexes in the historical operation information of service 1 include block-ID1 and block-ID2. When the same service switches between different processing cores, the processing core to which the service has switched can share the historical operation information of the service in the global predictor 10.
After introducing the structures of the global predictor 10 and the multiple processing cores 20, the following describes in detail the process of allocating resources in the global predictor 10 to any processing core among the multiple processing cores 20.
Optionally, the resources in the global predictor 10 used by any of the multiple processing cores 20 may be obtained through configuration. When any of the multiple processing cores 20 needs to use the global predictor 10 to cache the historical operation information of a service, the global predictor 10 can be configured for that processing core; for example, a corresponding buffer area in the global predictor 10 is configured for the processing core. In addition, the services that are allowed to use the global predictor 10 can also be configured for the multiple processing cores 20; that is, some key services are configured for any of the multiple processing cores 20, and only the historical operation information of the configured key services can be cached in the global predictor. The following uses the first processing core 21 as an example to describe the process of configuring the global predictor.
In a possible embodiment, the first processing core 21 is further used to: obtain configuration information for the first service, where the configuration information indicates the first buffer area; and configure the first buffer area for the first service according to the configuration information. Optionally, configuring the first buffer area may include enabling, disabling, or clearing the first buffer area. The configuration information may be sent to the first processing core 21 by software running on the multiple processing cores 20, or may be preconfigured in the first processing core 21; this is not specifically limited in the embodiments of this application. The configuration information can indicate the size of the first buffer area and/or the address space corresponding to the first buffer area. Further, the configuration information can also indicate the first service; for example, the configuration information may include the service identifier of the first service.
For ease of understanding, as shown in FIG. 8, the following uses the cases in which the first processing core 21 is core 0, core 1, and core 2, respectively, to illustrate the processes of enabling, disabling, and clearing the global predictor 10 through configuration information. FIG. 8 uses an example in which core 0, core 1, and core 2 each include an HWP and an sBTB.
In one example, assuming that the first processing core 21 is core 0, enabling the global predictor 10 through configuration information includes: S11. When service a needs to run on core 0, the software running on the multiple processing cores 20 can send first configuration information for service a to core 0. S12. When core 0 receives the first configuration information, the HWP of core 0 enables the global predictor 10 according to the first configuration information, for example, by sending service information and enable information to the global predictor 10, so that the global predictor 10 is enabled to cache the historical operation information of service a. S13. Core 0 can also enable the sBTB when the information currently cached in the sBTB matches service a. S14. The global predictor 10 starts working.
In another example, assuming that the first processing core 21 is core 1, disabling the global predictor 10 through configuration information includes: S21. When service b running on core 1 needs to stop using the global predictor 10, the software running on the multiple processing cores 20 can send second configuration information for service b to core 1. S22. When core 1 receives the second configuration information, the HWP of core 1 disables the global predictor 10 according to the second configuration information, for example, by sending service information and disable information to the global predictor 10. S23. The global predictor 10 stops serving service b on core 1, that is, stops providing the historical operation information of that service to core 1.
In yet another example, assuming that the first processing core 21 is core 2, clearing the global predictor 10 through configuration information includes: S31. When the information of service c running on core 2 needs to be cleared from the global predictor 10, the software running on the multiple processing cores 20 can send third configuration information for service c to core 2. S32. When core 2 receives the third configuration information, the HWP of core 2 clears the global predictor 10 according to the third configuration information, for example, by sending thread information and clear information to the global predictor 10. S33. The global predictor 10 clears the information corresponding to the thread, that is, clears the historical operation information of service c. Optionally, the global predictor 10 can perform the clearing row by row.
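The three configuration flows above (S11-S14, S21-S23, S31-S33) can be sketched as a single dispatch on the command carried by the configuration information. This is a hypothetical model only; the class, method, and command names are invented for the example:

```python
# Hypothetical sketch of S11-S33: the HWP of a core forwards enable, disable,
# and clear commands from software configuration information to the global
# predictor, keyed by service.
class GlobalPredictor:
    def __init__(self):
        self.enabled = {}   # service -> whether the predictor serves it
        self.history = {}   # service -> cached historical operation info

    def configure(self, service: str, command: str):
        if command == "enable":       # S12/S14: start caching for the service
            self.enabled[service] = True
        elif command == "disable":    # S22/S23: stop serving the service
            self.enabled[service] = False
        elif command == "clear":      # S32/S33: drop the cached history
            self.history.pop(service, None)

gp = GlobalPredictor()
gp.configure("service_a", "enable")            # core 0, service a
gp.history["service_c"] = {"pairs": [(1, 2)]}  # pretend service c ran already
gp.configure("service_b", "disable")           # core 1, service b
gp.configure("service_c", "clear")             # core 2, service c
```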
In the embodiments of the present application, the global predictor 10 caches the historical operation information, as well as the operation information generated by any processing core when running a service, so that the cached operation information is kept up to date for subsequent use. When running services, the multiple processing cores 20 each obtain the operation information they need from the global predictor 10. Compared with giving the multiple processing cores a larger private cache, this reduces the cache miss rate of the multiple processing cores 20 when processing services, improves the prediction accuracy and the instructions per cycle (IPC), and thereby improves the operating efficiency of the services. In the service switching scenario, because the global predictor 10 is shared by the multiple processing cores 20, when a service is switched from one processing core to another, the other processing core can still obtain the historical operation information of the service from the global predictor 10 and use it to run the service, which avoids a large number of cold misses on that core. In addition, the resources in the global predictor 10 used by any of the multiple processing cores 20 are assigned through software configuration, and the services allowed to use the global predictor 10 can likewise be configured for the multiple processing cores 20. This prevents information in the global predictor 10 from spilling over and leaking security-sensitive information. At the same time, key services can be identified from the software perspective, and the resources in the global predictor 10 can be allocated to the key services running on the processing cores, which avoids resource bottlenecks and further improves the operating efficiency of the services.
On this basis, an embodiment of the present application further provides a service processing method, which may be applied to a service processing apparatus that includes a global predictor and multiple processing cores coupled to the global predictor; for a description of the service processing apparatus, see the explanation above. The method includes: the global predictor caches historical operation information of at least one of the multiple processing cores; a first processing core among the multiple processing cores obtains first historical operation information from the historical operation information and runs a first service according to the first historical operation information to obtain first operation information; and the global predictor caches the first operation information.
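The three method steps above can be sketched as follows; a minimal illustration with hypothetical names (`fetch`, `store`, `run_service` are not from the publication), where running the service is a stand-in that simply records what it was given.

```python
# Minimal sketch of the claimed method steps, with hypothetical names:
# 1) the global predictor caches historical operation information;
# 2) a first core fetches the first historical operation information and
#    runs the first service with it, producing first operation information;
# 3) the global predictor caches that newly produced operation information.

class GlobalPredictor:
    def __init__(self):
        self.cache = {}  # service id -> cached operation information

    def fetch(self, service_id):
        return self.cache.get(service_id)

    def store(self, service_id, info):
        self.cache[service_id] = info

def run_service(core_id, history):
    # Stand-in for running the first service guided by the history;
    # returns the operation information the run produced.
    return {"core": core_id, "based_on": history, "result": "first_op_info"}

gp = GlobalPredictor()
gp.store("svc1", "first_historical_info")   # step 1: predictor caches history
info = gp.fetch("svc1")                     # step 2: first core fetches it
op_info = run_service("core1", info)        #         and runs the service
gp.store("svc1", op_info)                   # step 3: predictor caches result
```

After the final `store`, the cached entry for the service is the freshly produced operation information, which is what a later core (or the same core) would fetch next, matching the update-for-subsequent-use behavior described above.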
Optionally, the historical operation information includes historical operation information of multiple services, and the historical operation information of the multiple services includes the first historical operation information.
In a possible embodiment, the global predictor includes multiple buffers that respectively cache the historical operation information of the multiple services, where a first buffer among the multiple buffers caches the first historical operation information. Accordingly, the method may further include: the first processing core obtains configuration information for the first service, the configuration information indicating the first buffer; and the first processing core configures the first buffer for the first service according to the configuration information. Configuring the first buffer includes enabling, disabling, or clearing the first buffer.
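The per-service buffer arrangement above can be sketched as follows; the `Buffer` and `configure` names and the dictionary-shaped configuration information are illustrative assumptions, not the publication's own interface.

```python
# Sketch of the multi-buffer arrangement: each buffer caches one service's
# historical operation information, and the configuration information names
# a buffer and an action (enable, disable, or clear). All names are
# hypothetical.

class Buffer:
    def __init__(self):
        self.rows = []          # cached historical operation information
        self.enabled = False

class GlobalPredictor:
    def __init__(self, n_buffers):
        self.buffers = [Buffer() for _ in range(n_buffers)]

def configure(predictor, config):
    # The configuration information indicates which buffer to act on,
    # e.g. the first buffer for the first service.
    buf = predictor.buffers[config["buffer"]]
    if config["action"] == "enable":
        buf.enabled = True
    elif config["action"] == "disable":
        buf.enabled = False
    elif config["action"] == "clear":
        buf.rows.clear()

gp = GlobalPredictor(n_buffers=4)
configure(gp, {"buffer": 0, "action": "enable"})  # first buffer, first service
gp.buffers[0].rows.append("hist_row")
configure(gp, {"buffer": 0, "action": "clear"})   # clear keeps it enabled
```

Note that in this sketch clearing a buffer empties its rows but does not change its enabled state, reflecting that enable, disable, and clear are three distinct configuration actions in the text.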
It can be understood that all of the content of the apparatus embodiments above may be incorporated into the corresponding embodiments of this service processing method, and is not repeated here.
In the embodiments of the present application, the global predictor caches the historical operation information, as well as the operation information generated by any processing core when running a service, so that the cached operation information is kept up to date for subsequent use. Any of the multiple processing cores can obtain the operation information it needs from the global predictor when running a service. Compared with giving the multiple processing cores a larger private cache, this solution reduces the cache miss rate when the multiple processing cores process services, improves the prediction accuracy and IPC, and thereby improves the operating efficiency of the services.
In another aspect of the present application, an electronic device is further provided. As shown in FIG. 9, the electronic device includes a memory and a service processing apparatus, the service processing apparatus including a global predictor and multiple processing cores. The memory is configured to store computer instructions, and the service processing apparatus is configured to execute the computer instructions so that the electronic device implements any one of the service processing methods provided above. For a detailed description of the memory, see the preceding embodiments.
It can be understood that all content related to the steps of the method embodiments above may be incorporated into the embodiments of the service processing method and of the electronic device, and is not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, and some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc. Based on this understanding, the technical solution of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product.
In another embodiment of the present application, a readable storage medium is further provided, the readable storage medium storing computer-executable instructions that, when executed by a device (which may be a single-chip microcomputer, a chip, or the like) or a processor, cause the device or processor to perform the steps of the method embodiments above.
In yet another embodiment of the present application, a computer program product is further provided. The computer program product includes computer instructions stored in a readable storage medium; at least one processor of a device can read the computer instructions from the readable storage medium, and execution of the computer instructions by the at least one processor causes the device to perform the steps of the method embodiments above.
Finally, it should be noted that the above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto; any change or substitution within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be as defined by the claims.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311795739.1 | 2023-12-22 | | |
| CN202311795739.1A (published as CN120196428A) | 2023-12-22 | 2023-12-22 | A business processing device, method, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025130918A1 | 2025-06-26 |
Family ID: 96066633
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/140283 (pending, published as WO2025130918A1) | Service processing apparatus and method, and device and storage medium | 2023-12-22 | 2024-12-18 |
Country Status (2)
| Country | Link |
|---|---|
| CN | CN120196428A |
| WO | WO2025130918A1 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090287903A1 | 2008-05-16 | 2009-11-19 | Sun Microsystems, Inc. | Event address register history buffers for supporting profile-guided and dynamic optimizations |
| CN112231243A | 2020-10-29 | 2021-01-15 | | A data processing method, processor and electronic device |
| CN114518900A | 2020-11-20 | 2022-05-20 | | Instruction processing method applied to multi-core processor and multi-core processor |
- 2023-12-22: Chinese application CN202311795739.1A filed; patent CN120196428A pending.
- 2024-12-18: PCT application PCT/CN2024/140283 filed; publication WO2025130918A1 pending.
Also Published As
| Publication Number | Publication Date |
|---|---|
| CN120196428A | 2025-06-24 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24906397; Country of ref document: EP; Kind code of ref document: A1 |