GB2632311A - Hints in a data processing apparatus - Google Patents
- Publication number
- GB2632311A GB2311878.9A GB202311878A
- Authority
- GB
- United Kingdom
- Prior art keywords
- loop
- program
- instruction
- hint
- instructions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30065—Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30083—Power or thermal control instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
An apparatus has processing circuitry to perform processing operations specified by program instructions and loop control circuitry to identify a program loop specified by the program instructions. On identification of a program loop, the loop control circuitry stores loop control data that indicates a program loop body, excluding any program flow control instructions that specify the program loop. The loop control circuitry controls the processing circuitry to carry out loop iterations of the program loop body indicated by the loop control data. The processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body to cause loop control data for the given program loop to exclude the hint instruction from a modified program loop body and to perform a performance modifying operating procedure specified by the hint instruction when carrying out loop iterations of the modified program loop body. The omission of the hint instruction avoids needless repeated execution of the instruction on subsequent loop iterations.
Description
HINTS IN A DATA PROCESSING APPARATUS
BACKGROUND
The present technique relates to the field of data processing. More particularly, the present technique relates to handling hints in a data processing apparatus.
Hint instructions may be used within a sequence of program instructions that are to be executed by processing circuitry in order to indicate to the processing circuitry opportunities to carry out performance modifying operations. For example, a hint instruction may indicate certain properties of the data to be operated on, or of an upcoming sequence of instructions to be executed, where the processing circuitry can make use of these properties to modify how data processing operations are carried out with a view to improving the performance of those operations.
SUMMARY
In one example arrangement, there is provided an apparatus comprising: processing circuitry to perform processing operations specified by program instructions; and loop control circuitry to identify a program loop specified by the program instructions and to store loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; wherein the loop control circuitry is to control the processing circuitry to carry out loop iterations of the program loop body indicated by the loop control data; and wherein the processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions, to cause: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of a modified program loop body, an instruction following the hint instruction, and the processing circuitry to perform, when carrying out loop iterations of the modified program loop body, the performance modifying operating procedure as specified by the hint instruction.
In another example arrangement, there is provided a system comprising: the above apparatus, implemented in at least one packaged chip; at least one system component; and a board; wherein the at least one packaged chip and the at least one system component are assembled on the board.
In a further example arrangement, there is provided a chip-containing product comprising the above system assembled on a further board with at least one other product component.
In a yet further example arrangement, there is provided a computer-readable medium to store computer-readable code for fabrication of an apparatus comprising: processing circuitry to perform processing operations specified by program instructions; and loop control circuitry to identify a program loop specified by the program instructions and to store loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; wherein the loop control circuitry is to control the processing circuitry to carry out loop iterations of the program loop body indicated by the loop control data; and wherein the processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions, to cause: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of a modified program loop body, an instruction following the hint instruction, and the processing circuitry to perform, when carrying out loop iterations of the modified program loop body, the performance modifying operating procedure as specified by the hint instruction.
In another example arrangement, there is provided a method comprising: performing processing operations specified by program instructions; identifying a program loop specified by the program instructions and storing loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; controlling performance of loop iterations of the program loop body indicated by the loop control data; and causing, responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of the modified program loop body, an instruction following the hint instruction, and the performance modifying operating procedure to be performed as specified by the hint instruction when carrying out loop iterations of the modified program loop body.
BRIEF DESCRIPTION OF THE DRAWINGS
Further aspects, features, and advantages of the present technique will be apparent from the following description of examples, which is to be read in conjunction with the accompanying drawings, in which:
Figure 1 schematically illustrates an example of a data processing apparatus;
Figure 2 illustrates an item of loop control data;
Figure 3 schematically illustrates zero-overhead loop behaviour;
Figure 4 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body of a zero-overhead loop;
Figure 5 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body of a zero-overhead loop with a loop-start instruction;
Figure 6 schematically illustrates loop behaviour where two hint instructions are excluded from a modified program loop body;
Figure 7 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body of a loop;
Figure 8 is a flowchart illustrating the operation of a data processing apparatus in an example;
Figure 9 is a flowchart illustrating the operation of a data processing apparatus in another example; and
Figure 10 schematically illustrates a system comprising a packaged chip.
DESCRIPTION OF EXAMPLES
Before discussing the examples with reference to the accompanying figures, the following description of examples is provided.
Program code commonly includes instructions that set up program loops, causing repeated execution of a series of program instructions forming a program loop body. The program loops may be controlled using branch instructions at the end of the program loop that cause program flow to be diverted back to the start of the loop when executed or may, for example, be controlled using dedicated loop instructions that define a program loop. In accordance with the techniques described herein, there is provided an apparatus with processing circuitry to perform processing operations specified by program instructions. The apparatus also has loop control circuitry that is able to identify a program loop specified by the program instructions. On detection of such a program loop, the loop control circuitry stores loop control data that identifies a program loop body of the program loop. Where the program loop is specified using program flow control instructions, such as the branch instructions or dedicated loop instructions discussed above, those program flow control instructions are excluded from the program loop body. Thus, the program loop body comprises only the instructions that are to be repeatedly executed as part of the loop.
The loop control circuitry then controls the processing circuitry, when executing the program instructions, to carry out loop iterations of only the program loop body, and in doing so prevents execution of the program flow control instructions on at least some executions of the loop. By controlling the processing circuitry to carry out the loop iterations in this way, the loop control circuitry is able to reduce the overhead involved in controlling the program loop that would otherwise be incurred by repeated execution of the program flow control instructions on every iteration of the program loop. Instead, by storing loop control data that identifies the program loop body separately from the program flow control instructions, the loop control circuitry is able to ensure that the program loop behaviour defined by the program flow control instructions is observed while preventing execution of the program flow control instructions themselves on at least some iterations of the loop.
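By way of illustration only, the loop behaviour described above can be modelled in software. The following Python sketch is not taken from the described apparatus: the names `LoopControlData` and `run`, the `("LE", target)` encoding of a loop-end instruction, and the choice to skip the loop-end instruction on exit from the final iteration are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LoopControlData:
    start: int  # address of the first instruction in the loop body
    end: int    # address of the last instruction in the loop body


def run(program, count):
    """Execute `program` and return the trace of executed instructions.

    `("LE", target)` is a hypothetical loop-end instruction that branches
    back to `target` while iterations remain; `count` is the iteration count.
    """
    trace, lcd, pc = [], None, 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple):       # loop-end instruction
            trace.append("LE")
            count -= 1
            if count > 0:
                # Record the body, excluding the loop-end instruction itself.
                lcd = LoopControlData(start=instr[1], end=pc - 1)
                pc = instr[1]
            else:
                pc += 1
            continue
        trace.append(instr)
        if lcd is not None and pc == lcd.end:
            count -= 1
            if count > 0:
                pc = lcd.start             # loop back without running LE
            else:
                lcd, pc = None, pc + 2     # fall through past LE
            continue
        pc += 1
    return trace


print(run(["A", "B", ("LE", 0), "C"], 3))
```

In this model the loop-end instruction is executed once, at the end of the first iteration, after which the stored loop control data alone steers the remaining iterations of the body.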
Where an apparatus comprising such loop control circuitry is used, and a programmer wants to include one or more hint instructions that indicate a performance modifying operating procedure that is to be employed for instructions in a program loop, a number of possible locations for the hint instruction(s) can be considered.
Firstly, the hint instruction could be included inside the program loop body, for example, as the first instruction in the program loop body. The hint would therefore apply to subsequent instructions occurring in the program loop body (until the hint instruction is cleared, is replaced by another hint instruction, or otherwise no longer applies). On each iteration of the program loop, the hint instruction would be executed, allowing the performance modifying operating procedure to be performed. However, this approach requires execution of the hint instruction on each iteration of the program loop, which can impact performance and in some cases may negate any performance benefit provided by the performance modifying operating procedure.
To address this, another approach involves positioning the hint instruction outside the program loop body (e.g., before a first instruction in the program loop body or before a program flow control instruction that marks the start of the program loop). This approach means that the hint instruction is executed before the first iteration of the program loop but is not executed on subsequent iterations of the program loop. Since the instruction does not form part of the program loop body, the performance impact associated with executing the hint instruction repeatedly is thereby reduced.
The hint information associated with a hint instruction can be cached and used to indicate the performance modifying operating procedure that should be performed while executing one or more subsequent instructions. When an interrupt or other type of exception occurs causing the processing circuitry to perform a context switch to handle the interrupt, the cached hint information is flushed to avoid the performance modifying operating procedure being used in the context of handling the interrupt, where it may inadvertently reduce performance. However, the flushing of this hint information means that if an interrupt occurs while the processing circuitry is executing a loop, the hint information will no longer be cached when the program flow returns to the loop after handling the interrupt. In the case where a hint instruction is positioned outside the program loop body, the hint instruction will not form part of the program flow on return to the program loop and so subsequent iterations of the program loop will be executed without employing the performance modifying operating procedure specified by the hint instruction. This itself can lead to a performance impact since any performance benefits associated with the performance modifying operating procedure will no longer be obtained.
In accordance with the techniques described herein therefore, the processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body to cause loop control data for the given program loop to exclude the hint instruction from a modified program loop body. This is done by identifying, as the start of a modified program loop body, an instruction following the hint instruction. By excluding the hint instruction from the modified program loop body and then carrying out loop iterations of this modified program loop body, the processing circuitry is able to prevent the hint instruction being executed on subsequent iterations of the program loop. During the iterations of the modified program loop body, the processing circuitry performs the performance modifying operating procedure as specified by the hint instruction.
The apparatus is therefore able to benefit from any performance gains associated with the performance modifying operating procedure while also avoiding the need to execute the hint instruction on each iteration of the program loop. In response to an interrupt occurring while the processing circuitry is executing iterations of the program loop, both the loop control data and the hint control data will be flushed. Therefore, on return to the program loop after the interrupt has been handled, the loop control data will no longer define the modified program loop body that excludes the hint instruction. Consequently, a first full iteration of the program loop that is executed after return from the interrupt will lead to re-execution of the hint instruction (since the loop control data no longer excludes the hint instruction from the modified program loop body). Execution of this hint instruction will therefore enable the performance modifying operating procedure to be performed for subsequent program instructions (e.g., subsequent iterations of the program loop). The re-execution of the hint instruction may also cause the processing circuitry to again exclude the hint instruction from a modified program loop body such that the processing circuitry will perform loop iterations of this modified program loop body while employing the performance modifying operating procedure.
In this way, the present techniques are able to optimise the execution of program loops containing hint instructions by preventing repeated execution of the hint instruction on every iteration of the program loop while ensuring that any performance benefits of the performance modifying operating procedure are not lost when an interrupt is taken while the processing circuitry is executing iterations of the program loop.
It is important to appreciate that the above-described instruction is a "hint" instruction, and the present techniques therefore are not concerned with changing the data processing operations carried out in response to the sequence of data processing instructions in a functional manner, i.e., to change the data processing results which the data processing operations produce. Rather, the present techniques are concerned with modifying the manner in which the data processing operations are carried out (in terms of "performance"), such that the data processing operations may, for example, be carried out in a manner for which the efficiency, power consumption, latency, and so on may vary with respect to the manner in which they would have been carried out had the hint instruction not been used.
The apparatus is arranged to produce the same data processing results whether the sequence of data processing instructions is carried out according to the operating procedure (i.e., when not preceded by the register identifying hint instruction) or is carried out according to the modified operating procedure (i.e., when preceded by the hint instruction).
The same results are produced by either procedure, but the manner in which those results are produced can vary in a range of ways between the operating procedure and the modified operating procedure.
Accordingly, the present techniques provide the programmer, and indeed the instruction set architect, with a mechanism for modifying the manner in which the apparatus responds to one or more data processing instructions which form the sequence of data processing instructions, without having to redefine a number of data processing instructions as part of the instruction set of the apparatus in order to achieve that modification.
There are many different types of performance measure that could be implemented. For instance, in one example use case, the hint instructions can be used in the application of non-temporal behaviour to memory operations. Non-temporal memory operations are a variant of normal load and store operations, where the accessed data is not expected to be accessed again soon, and therefore does not need to be retained in the caches. Hence, if certain accesses can be flagged as being non-temporal, the cache allocation/eviction policy of one or more caches can take that information into account in order to seek to make more optimal use of the cache resources, thereby improving performance, reducing energy consumption, etc. As another illustrative example use case, the hint instructions can be used to flag prefetch behaviour in respect of certain instructions. For instance, the hint instructions could be used to trigger prefetching behaviour to occur, or indeed in some instances to disable prefetching behaviour, in association with certain instructions that reference a register(s) indicated by a hint instruction, with the aim of seeking to improve overall performance of the apparatus.
The point in time at which any given performance measure is implemented based on the hint instruction may vary in dependence on a variety of factors, for example the type of hint instruction. In one example implementation, at least one of the one or more performance measures is implemented in association with performing one or more data processing operations defined by one or more given instructions in the sequence of instructions that accesses a given register identified by the hint instruction. Hence, in such implementations, the performance measure may be implemented in combination with the performance of at least a certain type of data processing operation that accesses a given register identified by the hint instruction. As a particular example of where such an approach may be appropriate, this may be used when the hint instruction provides a non-temporal hint in association with the data stored in the given register. For instance, a memory access operation may be used to load data from memory into the given register, or to store data from the given register to memory, and when that memory access activity causes the data to be cached within a cache structure of the apparatus, the flagging of the data as being non-temporal can enable a more efficient implementation of the cache structure, for example by marking that data as being able to be evicted more quickly than might otherwise be the case, thereby freeing up space for the data that will in fact be used repeatedly.
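As a purely illustrative sketch of how a non-temporal flag might influence a cache replacement policy, the following Python model inserts non-temporal lines at the position that will be evicted first, so that streaming data does not displace data that is reused. The class `Cache`, its `access` method, and the least-recently-used policy are assumptions made for the example and are not features of the described apparatus.

```python
from collections import OrderedDict


class Cache:
    """Tiny fully associative cache; the first entry is the next victim."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr, non_temporal=False):
        """Access `addr`, returning True on a hit and False on a miss."""
        hit = addr in self.lines
        if hit:
            if not non_temporal:
                self.lines.move_to_end(addr)   # refresh a normal line
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the next victim
            self.lines[addr] = True
            if non_temporal:
                # Assumed policy: a non-temporal line becomes the next victim.
                self.lines.move_to_end(addr, last=False)
        return hit


hinted, plain = Cache(3), Cache(3)
for cache in (hinted, plain):
    cache.access(1)
    cache.access(2)
for addr in range(100, 110):                   # streaming accesses
    hinted.access(addr, non_temporal=True)
    plain.access(addr)
print(hinted.access(1), plain.access(1))
```

With the flag set, the two reused lines survive the streaming accesses; without it, they are evicted.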
In one example implementation, the one or more given instructions may be instructions that use a value held in the given register in a predetermined manner. Hence, the use of the hint instruction to trigger a performance measure may be dependent on how the value in the given register is used. By way of one specific example, it may be decided to trigger a performance measure when the value held in the given register is used as a pointer by a memory access instruction. For example, it may be considered that this is the use case where it is appropriate to mark the access as being non-temporal.
Similarly, the one or more given instructions may be instructions of a predetermined type, and hence only when an instruction is of that predetermined type, and optionally also uses the data held in the given register in a particular way, will the hint come into play and cause an associated performance measure to be implemented. By way of example, the use of the hint instruction to trigger a performance measure may be limited to memory access instructions.
Whilst in the above examples the performance measure is triggered in association with the performance of a data processing operation defined by a particular instruction, in addition, or alternatively, one or more performance measures may be implemented in response to execution of the hint instruction. In particular, for some types of hint instruction, it may be possible to act on the hint instruction once that hint instruction has been executed, to trigger one or more performance measures without needing to wait for one or more subsequent instructions to access a register(s) identified by the hint instruction. By way of specific example, when performing prefetching, it may be flagged using the hint instruction that the data to be used by one or more subsequent instructions that reference a given register identified by the hint instruction should be prefetched into a cache to improve performance when those one or more subsequent instructions are actually executed, and accordingly the prefetching activity can be implemented before those instructions are executed. In some examples, a subsequent instruction that modifies the value of a register that has been flagged using a hint instruction would trigger a further prefetch, thereby improving the performance of further subsequent instructions that use the flagged register as a pointer. It will therefore be appreciated that in some examples the one or more performance measures may be triggered both in response to the execution of the hint instruction, and in response to subsequent instructions that modify the value stored in the register referenced by the hint instruction. As another example, the hint instruction could be used to disable prefetch training on a given load event. For example, pointer chasing workloads are known to be adversarial to prefetching and can pollute prefetch training structures. Thus, bypassing training for these load events may improve overall prefetch performance.
As mentioned above, the processing circuitry maintains hint control data indicative of the performance modifying operating procedures that have been specified by instructions that have been encountered by the processing circuitry. This hint control data is then referenced by the processing circuitry in order to determine the performance modifying operating procedures to be performed. This control data is in addition to the loop control data maintained by the loop control circuitry that identifies the (modified) program loop body of program loops.
In response to an exception (e.g., an interrupt), the processing circuitry flushes at least some loop control data and at least some hint control data. Flushing the loop control data and/or hint control data may comprise deleting the data from a storage structure, or invalidating the data, e.g., by clearing respective enable bits associated with the hint control and/or loop control data entries.
Since the loop control data identifies the modified program loop body (excluding the hint instruction), flushing the loop control data when the exception is taken re-enables execution of the hint instruction on return from the exception. Consequently, re-execution of this hint instruction when encountered during execution of the program loop will cause the hint control data to be repopulated (thereby re-enabling the performance modifying operating procedure) and the loop control data to be set so as to exclude the hint instruction from the modified program loop body. This prevents execution of the hint instruction on subsequent iterations of the program loop, unless another exception occurs. It will also be appreciated that if the exception occurs on the last iteration of the program loop, even if the hint instruction is re-enabled, the hint instruction will not be executed again since program flow will not return to the top of the loop.
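The flush-and-recover behaviour can be illustrated with a small software model. This is a sketch under stated assumptions rather than a description of the circuitry: the function `run`, the `"HINT"` mnemonic, the `("LE", target)` loop-end encoding, the `interrupt_after` parameter, and the use of a `*` suffix to mark instructions executed while the hint is in effect are all hypothetical.

```python
def run(program, count, interrupt_after=None):
    """Return the trace of executed instructions.

    "HINT" is a hint instruction; ("LE", target) is a hypothetical loop-end
    instruction. Body instructions run under the hint are traced with "*".
    If `interrupt_after` is set, an exception is modelled once that many
    instructions have executed: both kinds of control data are flushed.
    """
    trace = []
    lcd = None          # loop control data: (start, end) of the loop body
    hint_valid = False  # enable bit of the hint control data
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple):            # loop-end instruction
            trace.append("LE")
            count -= 1
            if count > 0:
                lcd = (instr[1], pc - 1)        # (re)populate loop control data
                pc = instr[1]
            else:
                pc += 1
        else:
            if instr == "HINT":
                trace.append("HINT")
                hint_valid = True               # (re)populate hint control data
                if lcd is not None and lcd[0] == pc:
                    lcd = (pc + 1, lcd[1])      # exclude the hint from the body
            else:
                trace.append(instr + "*" if hint_valid else instr)
            if lcd is not None and pc == lcd[1]:
                count -= 1
                if count > 0:
                    pc = lcd[0]
                else:
                    lcd, pc = None, pc + 2      # skip the loop-end instruction
            else:
                pc += 1
        if len(trace) == interrupt_after:       # exception taken: flush both
            lcd, hint_valid = None, False
    return trace


loop = ["HINT", "A", ("LE", 0)]
print(run(loop, 5))
print(run(loop, 5, interrupt_after=6))
```

In the interrupted run of this model, one execution of `A` proceeds without the hint after the flush; the loop-end instruction then repopulates the loop control data, the hint is re-executed, and the hint is once more excluded from the modified loop body.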
The processing circuitry may detect the hint instruction occurring at the beginning of the program loop body in a number of ways. In some examples, the processing circuitry is responsive to execution of a hint instruction to reference the loop control data and to determine whether the loop control data identifies the executed hint instruction as the beginning of a program loop body. This may be done by comparing a program counter (indicating the address of a current instruction) with an indication in the loop control data of the address of the start of a program loop body. If the loop control data does identify the hint instruction at the start of the program loop body, the processing circuitry may modify the loop control data to exclude the hint instruction from a modified program loop body. Another way in which the hint instruction occurring at the start of the program loop body may be detected is by setting a flag at the start of the loop which is cleared on execution of subsequent instructions. Therefore, if the flag is set when the hint instruction is executed, the processing circuitry can determine that the instruction occurs at the start of the program loop body.
It should be appreciated that the first time that the hint instruction at the start of the program loop body is encountered, the loop control data may not yet have been updated to exclude the hint instruction from the modified program loop body. This may be the case, for example, where the loop control data is populated at the end of a program loop (e.g., where program flow control instructions specifying the program loop occur at the end of the program loop). On the second iteration of the program loop, however, the loop control data for the loop will have been populated and so the processing circuitry will detect that the loop control data identifies the hint instruction at the start of the program loop body and so will modify the loop control data to exclude the hint instruction from the modified program loop body. Therefore, for subsequent iterations of the program loop, the hint instruction will not be executed and so the overhead of the instruction will no longer be incurred.
In some cases more than one hint instruction will be present at the start of the program loop. The processing circuitry may therefore be responsive to execution of a further hint instruction occurring at the beginning of the program loop body (i.e., after the initial instruction) to identify that further instruction as occurring at the beginning of the modified program loop body. That is, after executing the initial hint instruction and causing the loop control data to exclude that hint instruction from the modified program loop body, the further hint instruction will then form the start of the modified program loop body. The processing circuitry may detect this for example by comparison of the program counter of the further hint instruction with the loop control data or by referencing the flag mentioned above (where the processing circuitry is configured not to clear the flag on execution of hint instructions). In response to identifying the further hint instruction, the processing circuitry may further modify the loop control data to exclude that further hint instruction from the modified program loop body. The processing circuitry may then carry out loop iterations of the modified program loop body while employing the performance modifying operating procedures as specified by both the initial hint instruction and the further hint instruction. In this way the processing circuitry is able to benefit from any performance benefits associated with the hint instructions while avoiding the overhead associated with executing the hint instructions on each iteration of the program loop. This approach may also be extended to even further instances of hint instructions so that in general the processing circuitry is able to handle a plurality of hint instructions occurring at the beginning of the program loop body.
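The program counter comparison, applied to successive hint instructions, might be sketched as follows. The names `LoopControlData`, `on_hint_executed`, and `INSTR_SIZE`, the fixed four-byte instruction width, and the example addresses are assumptions made for illustration only.

```python
from dataclasses import dataclass

INSTR_SIZE = 4  # assumed fixed instruction width in bytes


@dataclass
class LoopControlData:
    start: int          # address of the first instruction in the loop body
    end: int            # address of the last instruction in the loop body
    valid: bool = True  # cleared when the entry is flushed


def on_hint_executed(pc, lcd):
    """Called when a hint instruction at address `pc` is executed.

    If the loop control data names this hint as the start of the loop body,
    move the start to the following instruction and return True.
    """
    if lcd.valid and pc == lcd.start:
        lcd.start = pc + INSTR_SIZE
        return True
    return False


lcd = LoopControlData(start=0x100, end=0x110)
print(on_hint_executed(0x100, lcd), hex(lcd.start))  # initial hint
print(on_hint_executed(0x104, lcd), hex(lcd.start))  # further hint
print(on_hint_executed(0x10C, lcd), hex(lcd.start))  # hint not at the start
```

Because each exclusion moves the recorded start onto the next instruction, a further hint instruction at that position is detected by the same comparison, so a run of leading hints is removed one at a time.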
Another way in which the processing circuitry could detect the hint instruction occurring at the beginning of a program loop body is based on a loop-start instruction. The loop-start instruction is a program flow control instruction that occurs before a program loop body and identifies the start of the program loop. Therefore, when the processing circuitry encounters a loop-start instruction followed by one or more hint instructions, the processing circuitry may indicate to the loop control circuitry that the loop control data should be modified by providing to the loop control circuitry, loop modification information that identifies an instruction following the one or more hint instructions as the start of a modified program loop body. The loop control circuitry can then make use of this loop modification information when a subsequent loop-end instruction is encountered (at which point the extent of the loop body is known) by storing loop control data that identifies the start of the modified program loop body based on the received loop modification information. In this way, the loop control circuitry can set the loop control data to exclude the one or more hint instructions from the modified program loop body. Notably, with this approach the loop control data can be set so as to exclude the hint instruction from the modified program loop body on the second iteration of the program loop.
The program loop may be identified by the loop control circuitry in a number of ways. In some examples, the loop control circuitry identifies the program loop based on a loop-end instruction occurring at the end of the program loop which specifies the extent of the program loop body of the program loop. For example, the loop-end instruction may identify the address of a first instruction in the program loop such that the program loop body comprises instructions occurring between the first instruction and an instruction preceding the loop-end instruction. The loop-end instruction may or may not be conditional and/or identify a number of loop iterations to perform.
In some examples, the loop control circuitry is configured to define a program loop additionally based on a loop-start instruction that controls a number of times the program loop body is to be executed. The loop-start instruction may specify directly a number of iterations of the program loop to be performed or may specify a condition under which execution of the loop is to be continued. In this sense, the loop-start instruction may represent a do or a while loop-start instruction.
The program loop may therefore represent a zero overhead loop (also referred to as a low overhead loop). Execution of the program flow control instructions that define loops incurs an overhead due to the need to execute the program flow control instructions, which occupy positions in a processing circuitry pipeline. This can be particularly impactful for workloads involving a large number of iterations of small loops, for which a greater proportion of the executed instructions will be the program flow control instructions.
To combat this, zero overhead loops may be used for which the information defined in program flow control instructions (such as loop-start and loop-end instructions) that control the loop is cached and execution of the program loop body (excluding the loop control instructions) is carried out based on the cached information. In this way, the overhead associated with executing the program flow control instructions themselves can be reduced.
Such zero overhead loops provide an opportunity to implement the present techniques, whereby hint instructions can be used to cause performance modifying operating procedures to be employed while taking advantage of the caching of loop control information to prevent repeated execution of the hint instructions and reduce the impact of execution of those instructions.
As well as, or instead of, zero overhead loops which explicitly invoke the loop control circuitry to store loop control data and exclude program flow control instructions from execution, the loop control circuitry may implicitly detect the presence of loops in program code and consequently identify the opportunity to avoid execution of program flow control instructions that define these loops. The loop control circuitry may therefore comprise loop detection circuitry to detect program loops defined by one or more branch instructions (where a branch instruction represents a program flow control instruction). In response to detecting a program loop, the loop detection circuitry may store loop control data indicative of a program loop body of the detected program loop in order to perform a loop performance modifying operation that seeks to improve the performance of subsequent iterations of the detected program loop. This performance modifying operation may for example seek to reduce the power consumed when executing the program loop or improve the performance of the program loop (e.g., by avoiding execution of the branch instructions).
In this case the loop control circuitry may comprise loop buffer circuitry for storing loop control data indicative of at least some of the instructions of a program loop body.
Alternatively, the loop control circuitry may comprise loop metadata circuitry for storing loop control data indicative of the address of the start of the program loop body. In both cases, this loop control data may then be modified on detection of a hint instruction occurring at the beginning of a program loop as described herein.
In this case the processing circuitry may be responsive to a plurality of hint instructions occurring at the beginning of a given program loop body to cause the loop control data to exclude the plurality of hint instructions from the modified program loop body. This is done by identifying an instruction following the plurality of hint instructions as the start of the modified program loop body. The processing circuitry can then perform loop iterations of the modified program loop body excluding those hint instructions while employing the performance modifying operating procedures specified by the respective hint instructions of the plurality of hint instructions.
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define a HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc.
An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
Particular examples will now be described with reference to the figures.
Figure 1 schematically illustrates an example of a data processing apparatus 2. The data processing apparatus has a processing pipeline 4 which includes a number of pipeline stages, each implemented by corresponding circuitry. In this example, the pipeline stages include a fetch stage 6 for fetching instructions from an instruction cache 8; a decode stage 10 for decoding the fetched program instructions to generate micro-operations (decoded instructions) to be processed by remaining stages of the pipeline; an issue stage 12 for checking whether operands required for the micro-operations are available in a register file 14 and issuing micro-operations for execution once the required operands for a given micro-operation are available; an execute stage 16 for executing data processing operations corresponding to the micro-operations, by processing operands read from the register file 14 to generate result values; and a write-back stage 18 for writing the results of the processing back to the register file 14. It will be appreciated that this is merely one example of possible pipeline architecture, and other systems may have a different number of stages or a different configuration of stages. For example in an out-of-order processor an additional register renaming stage could be included for mapping architectural registers specified by program instructions or micro-operations to physical register specifiers identifying physical registers in the register file 14.
The execute stage 16 includes a number of processing units, for executing different classes of processing operation. For example the execution units may include an arithmetic/logic unit (ALU) 20 for performing arithmetic or logical operations; a floating-point unit 22 for performing operations on floating-point values; a branch unit 24 for evaluating the outcome of branch operations and adjusting the program counter which represents the current point of execution accordingly; and a load/store unit 28 for performing load/store operations to access data in a memory system 8, 30, 32, 34. In this example the memory system includes a level one data cache 30, the level one instruction cache 8, a shared level two cache 32 and main system memory 34. It will be appreciated that this is just one example of a possible memory hierarchy and other arrangements including different cache organisations can be provided. The specific types of processing unit 20 to 28 shown in the execute stage 16 are just one example, and other implementations may have a different set of processing units or could include multiple instances of the same type of processing unit so that multiple micro-operations of the same type can be handled in parallel. It will be appreciated that Figure 1 is merely a simplified representation of some components of a possible processor pipeline architecture, and the processor may include many other elements not illustrated for conciseness, such as branch prediction mechanisms or address translation or memory management mechanisms.
The apparatus 2 also comprises loop control circuitry 40 that is arranged to help control the execution of program loops in certain situations. The loop control circuitry 40 is arranged to detect the presence of certain types of program loops, such as zero-overhead loops defined by dedicated loop instructions (e.g., a loop-start and loop-end instruction). The loop control circuitry stores loop control data in a loop control cache 42 (also referred to as loop control storage circuitry). With reference to this loop control data, the loop control circuitry 40 is able to control the apparatus 2 to execute the loops in a more efficient manner.
For example, with a zero-overhead loop, the loop control circuitry 40 may cause the apparatus 2 to execute only a loop body of the loop on at least some iterations of the loop, thereby avoiding repeated execution of the loop instructions themselves. The loop control circuitry 40 may also be responsive to other forms of loop such that the loop control circuitry 40 can detect the presence of such a loop and implement a loop performance modifying operation (e.g., by inhibiting execution of control flow changing instructions that define the loop and instead relying on the loop control data, or by using a more efficient performance mode to carry out execution of the loop iterations).
The apparatus 2 supports one or more performance modifying operating procedures that may be employed to seek to improve the performance of the apparatus 2 (e.g., by executing the workload more quickly and/or consuming less power). The apparatus implements these performance modifying operating procedures based on hint instructions included in program code, which may signal the opportunity to perform such performance modifying operating procedures on the basis, for example, of an expected frequency of access to certain data locations, the type of data that is to be executed, or expected values of data items. The modified operating procedure differs from the normal operating procedure by implementing one or more performance measures in dependence on the hint instructions.
In response to execution of a hint instruction by the apparatus 2, the hint control circuitry 50 populates a hint control cache 52 (also referred to as hint control storage circuitry) with hint control data identifying the performance modifying operating procedure to be performed. The apparatus 2 may then operate using this performance measure.
A wide variety of different performance measures may be implemented, and indeed the timing at which those performance measures take place may vary, dependent on the type of performance measure being implemented. However, purely by way of illustrative example, one form of hint that may be generated may be referred to as a non-temporal hint, and if such a non-temporal hint is associated with a given register, this may trigger the implementation of a performance measure when the data held in that register is used in a particular way. In one particular example implementation, when the data held in the given register is used as a pointer for a memory access operation, then the accessed data may be flagged as being non-temporal. This may allow an improvement in the overall performance of the apparatus, by allowing the data cache(s) to be used more efficiently, since such data marked as being non-temporal can be allowed to be evicted more promptly from the cache than might otherwise be the case, thereby increasing the likelihood that other data that will be reused is retained within the cache.
As another example of a hint that may be generated, a prefetch hint may be generated in association with a given register, to identify that the data to be used by one or more subsequent instructions that reference that given register should be prefetched into a cache to improve performance when those one or more subsequent instructions are actually executed. Such prefetching activity can be implemented before those instructions are executed. As a result of such prefetching, it is expected that the overall performance of the apparatus will be improved.
As another example of the type of hint that could be supported, a data value hint could be generated for association with the given register, to provide a hint as to the value that is likely to be stored in that given register. Such a data value hint could be used in a variety of ways. Purely by way of example, such a data value hint might influence branch prediction mechanisms, for example by enabling a more likely branch path to be distinguished from a less likely branch path (for instance if which path is taken is dependent on whether the value in a given register matches a particular value, for example a 0 value, and the hint metadata indicates that the expected value in that given register is indeed a 0 value).
As a yet further example of the type of hint that could be supported, a branch hint could be generated for associating with a given register. This could for example be used to indicate that the value stored in the given register contains a function pointer, i.e. will be used to identify an address of a function that may be branched to upon encountering a subsequent indirect branch instruction. By using such a branch hint, branch prediction circuitry is provided with a hint to cause instructions to be prefetched starting from the address identified in the given register. If in due course an indirect branch instruction is indeed encountered that identifies the given register as containing an address indication used to identify where to branch to, then if the branch predictor predicts that branch is taken those prefetched instructions can start to be used, thereby improving performance.
An important point to appreciate here is that irrespective of whether or not a hint is present, and hence irrespective of whether one or more performance modifying operating procedures are employed, which may modify the manner in which the data processing operations defined by data processing instructions are carried out, this will not change the data processing results of those data processing operations. In other words, the data processing results which are produced as a consequence of the data processing instructions (other than the hint instruction) which the decoding circuitry receives do not change in dependence on whether a hint has been used. The data processing results thus remain the same (dependent only on the data processing instructions received and the data values to which those instructions refer).
In addition to the earlier mentioned hint instruction, the sequence of instructions may also include one or more instances of a register hint clear instruction. This can result in the generation of control signals sent to the hint control circuitry 50 to cause it to update the hint control data stored in the hint control cache 52, in particular by clearing one or more existing items of hint control data. The register hint clear instruction may for example identify one or more registers, resulting in hint metadata associated with those registers being cleared, or may alternatively take the form of a "clear all" hint clear instruction that could for example cause all existing hint control data stored in the hint control cache 52 to be cleared. In instances where multiple different types of hint are supported, the register hint clear instructions could identify type information, such that the clearing activities are restricted to one or more given types of hint.
The hint control cache 52 will also be cleared when an exception is taken. When an exception occurs (e.g., due to an interrupt being raised/in order to handle external inputs), the apparatus 2 carries out a context switch to replace the contents of at least some of the registers 14 (e.g. the program counter register) with data relevant to a new context associated with handling the exception. To avoid performance modifying operating procedures that may have been appropriate when executing program code in one context being used for handling the exception (where such performance modifying operating procedures may not be appropriate), the hint control circuitry 50 is arranged to flush the hint control cache 52 when an exception is taken.
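The clearing behaviour described above can be pictured with a small software model. The sketch below is hypothetical: the class name, the method names and the choice of a per-register set of hint types are assumptions made for illustration, and the disclosure does not prescribe how the hint control cache 52 is organised.

```python
class HintControlCache:
    """Toy model of hint control storage keyed by register (assumed layout)."""

    def __init__(self):
        self._hints = {}  # register name -> set of hint type names

    def apply_hint(self, register, hint_type):
        """Record hint control data in response to a hint instruction."""
        self._hints.setdefault(register, set()).add(hint_type)

    def hints_for(self, register):
        return frozenset(self._hints.get(register, ()))

    def clear(self, registers=None, hint_type=None):
        """Model a register hint clear instruction.

        registers=None targets every register ("clear all");
        hint_type=None clears every type of hint for the targeted registers.
        """
        targets = list(self._hints) if registers is None else list(registers)
        for reg in targets:
            if reg not in self._hints:
                continue
            if hint_type is None:
                del self._hints[reg]
            else:
                self._hints[reg].discard(hint_type)
                if not self._hints[reg]:
                    del self._hints[reg]

    def flush_on_exception(self):
        """Discard all hint control data when an exception is taken."""
        self._hints.clear()


cache = HintControlCache()
cache.apply_hint("r1", "non_temporal")
cache.apply_hint("r1", "prefetch")
cache.apply_hint("r2", "prefetch")
cache.clear(hint_type="prefetch")      # type-restricted clear
print(sorted(cache.hints_for("r1")))   # ['non_temporal']
cache.flush_on_exception()
print(sorted(cache.hints_for("r1")))   # []
```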
Figure 2 illustrates an item of loop control data 60. The loop control data 60 is stored in the loop control cache 42 and used by the loop control circuitry 40 when a program loop is detected in program code to represent details of the loop. The loop control data 60 may then be used to control execution of loop iterations such that only a program loop body is executed and further execution of program flow control instructions that define the loop can be avoided. As shown in Figure 2, the loop control data 60 comprises a loop start field 62 that identifies the start of the program loop and a loop end field 64 that identifies the end of a program loop. The loop start and loop end may be identified directly by specifying the addresses of the instructions forming the loop start and loop end. In other examples, the loop end may be identified as an offset from an address of the loop start or by the number of instructions in the loop body in order to reduce the size of the loop control data 60. The item of loop control data 60 as depicted also includes a valid bit 68 to indicate whether the item of loop control data 60 is a valid item of loop control data 60. This valid bit 68 therefore provides a way to quickly invalidate items of loop control data 60 by toggling the valid bit 68 from a valid state to an invalid state. In some examples, the loop control data 60 may also comprise a count field 70 to identify an expected number of further iterations of the loop to perform. However, since the loop control data can be flushed from the loop control cache 42 when an exception is taken, loop count data will typically also be stored separately in such a way that the loop count data is maintained even when taking an exception, to ensure that loops are executed the intended number of times even when an exception is taken during the execution of such loops.
Figure 3 schematically illustrates zero-overhead loop behaviour. Figure 3 shows a sequence of instructions where instructions 05, 06, and 07 (representing load, load, and multiply instructions respectively) are instructions to be executed repeatedly. A loop-end instruction is included as instruction 08, the loop-end instruction indicating that the program flow should be diverted to the position marked loopStart. This sequence of instructions therefore forms a program loop whereby at least instructions 05, 06, 07 will be repeatedly executed. The execution of the hint instructions at instruction 03 and instruction 04 is discussed below.
The loop-end instruction is a specific form of instruction that is used to define a zero-overhead loop (also referred to as a low-overhead loop). The loop control circuitry 40 identifies the zero-overhead loop and stores loop control data indicative of the loop. The loop control data for this zero-overhead loop is depicted in Figure 3 where instruction 04 is identified as the start of the loop body and instruction 07 is identified as the end of the loop body. The valid bit is set to 1 to indicate that the item of loop control data is valid. The loop control circuitry then controls the processing circuitry to iterate over only the program loop body, thereby omitting the loop-end instruction itself from execution.
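For readers who find a concrete data structure helpful, the fields of Figure 2 might be modelled along the following lines. This is only a sketch: the class and method names are hypothetical, instruction indices stand in for addresses, and the optional count field is represented by a value that may be absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopControlData:
    """Hypothetical model of one item of loop control data 60."""
    loop_start: int               # loop start field 62
    loop_end: int                 # loop end field 64
    valid: bool = True            # valid bit 68
    count: Optional[int] = None   # optional count field 70

    def invalidate(self) -> None:
        """Quickly invalidate the item by toggling the valid bit."""
        self.valid = False

    def end_offset(self) -> int:
        """Loop end expressed as an offset from the loop start (compact form)."""
        return self.loop_end - self.loop_start


item = LoopControlData(loop_start=4, loop_end=7)
print(item.end_offset())  # 3
item.invalidate()
print(item.valid)         # False
```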
It will be appreciated that zero-overhead program loop behaviour does not mean that there is no overhead associated with supporting loop behaviour, but that this overhead is reduced. For example, loop control program instructions (such as the loop-end instruction) will occupy slots within the pipeline during a first pass through the program loop but will be omitted on subsequent passes with the loop behaviour controlled by the loop control circuitry 40.
The use of the zero-overhead loop in this way therefore reduces the overhead associated with repeated execution of the loop-end instruction. This can be seen on the right-hand side of Figure 3 which schematically illustrates program flow through the sequence of instructions. As can be seen in Figure 3, on a first pass of the loop, all of the instructions from instruction 03 to instruction 08 are executed. The presence of the loop-end instruction then causes the program flow to be diverted to the start of the program loop body at instruction 04. However, on this next pass of the program loop, the loop-end instruction is omitted and program flow progresses directly from instruction 07 to the next iteration of the loop at instruction 04.
In the third pass of the loop, an exception occurs as illustrated with an asterisk. The apparatus 2 will then handle the exception. In switching to the exception handler however, the loop control cache 42 is cleared. Consequently, on return from the exception, while execution of the loop will continue, the loop control circuitry 40 will not be able to prevent execution of the loop-end instruction. Instead, the loop-end instruction is re-executed, at which point the loop control data can be repopulated so that further iterations of the loop will not execute the loop-end instruction.
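The program flow of Figure 3 can be reproduced with a short behavioural model. The following Python sketch is an assumption-laden illustration rather than a description of the circuitry: the function name, the dictionary encoding of the instruction sequence, the separately held pass count and the `exception_at` parameter (a pass number and the instruction after which the exception is taken) are all introduced here for the example.

```python
def trace_zero_overhead_loop(program, loop_start, passes, exception_at=None):
    """Return the indices of executed instructions, in execution order."""
    trace = []
    cached = None      # loop control data as (start, end); None when invalid
    pc = min(program)
    done = 0           # completed passes (loop count kept outside the cache)
    while pc in program:
        trace.append(pc)
        if program[pc] == "loop_end":
            cached = (loop_start, pc - 1)   # populate loop control data
            done += 1
            pc = loop_start if done < passes else pc + 1
            continue
        if exception_at == (done + 1, pc):
            cached = None                   # exception clears the loop control cache
        if cached is not None and pc == cached[1]:
            done += 1                       # end of body reached: loop without loop-end
            pc = cached[0] if done < passes else pc + 2
        else:
            pc += 1
    return trace


figure3 = {3: "hint", 4: "hint", 5: "load", 6: "load", 7: "mul", 8: "loop_end"}
print(trace_zero_overhead_loop(figure3, loop_start=4, passes=4, exception_at=(3, 5)))
# [3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 4, 5, 6, 7, 8, 4, 5, 6, 7]
```

In this run the loop-end instruction 08 appears only on the first pass and on the pass interrupted by the exception, while hint instruction 04 is executed on every pass because it lies inside the loop body.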
To indicate that the processing circuitry should employ a performance modifying operating procedure, hint instructions can be included in the sequence of instructions. When using a hint instruction to indicate a performance modifying operating procedure that is to be applied to instructions that form part of the loop, the hint instruction could be placed either inside or outside of the loop. An example of each approach is shown in Figure 3 with instruction 03 placed before the loopStart position (and so outside the loop) and hint instruction 04 placed after the loopStart position (and so inside the loop).
Hint instruction 03 will be executed before the first pass of the loop. Hint control data corresponding to the hint instruction 03 will be stored and so the performance modifying operating procedure applied as specified. However, on taking of the exception, the hint control data will be cleared. If the exception occurs during iterations of the program loop, on return from the exception, the hint instruction 03 will not be re-executed and so the rest of the loop iterations will be performed without using the performance modifying operating procedure specified by the hint instruction 03. Consequently, any performance benefits associated with using that performance modifying operating procedure will be lost.
For hint instruction 04, even if an exception is taken during execution of the loop iterations, hint instruction 04 will be re-executed on the next iteration of the loop since it forms part of the loop body and so the performance modifying operating procedure will still be performed. However, since the hint instruction 04 is included in the program loop body and executed on each iteration, the apparatus 2 incurs a performance impact associated with the repeated execution of this instruction.
This performance impact due to repeated execution of the hint instruction can be mitigated in accordance with the techniques described herein while ensuring the hint instruction will be executed following an exception, thereby allowing the performance modifying operating procedure to be employed for iterations of the program loop following the exception.
Figure 4 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body of a zero-overhead loop in accordance with an example. In Figure 4, a similar program loop body involving two load instructions 04-05, a multiply instruction 06 and the loop-end instruction 07 is shown. Here, a hint instruction 03 is included as part of the program loop body following the loopStart location. The program flow when this sequence of instructions is executed on an apparatus incorporating the techniques of the present disclosure according to one example will now be described.
On the first pass through the loop, the hint instruction 03 will be executed causing the hint control cache 52 to be populated with hint control data indicative of a performance modifying operating procedure to be performed for one or more subsequent instructions. In this example, instruction 04-06 will then be executed, followed by the loop-end instruction 07. As in the previous example, the loop-end instruction will cause loop control data to be populated into the loop control cache 42. In this example, on the first pass of the loop, the loop control data (not shown) will identify the hint instruction 03 as the first instruction in the program loop body and multiply instruction 06 as the last instruction in the program loop body.
The next iteration of the loop therefore begins with the hint instruction 03. Since the hint control cache 52 has already been populated based on the hint instruction 03, the hint instruction has no hint effect here. However, in this example, the processing circuitry detects the hint instruction at the start of the program loop body and so modifies the loop control data to the form depicted in Figure 4 to identify the instruction following the hint instruction as the start of a modified program loop body, with instruction 04 now being identified as the first instruction in the loop body. The program flow then proceeds to the last instruction of the modified program loop body (instruction 06) before looping to the start of the modified program loop body based on the modified loop control data.
The third iteration of the loop therefore begins with load instruction 04. In this way, execution of the hint instruction can be excluded from subsequent iterations of the loop and the loop will iterate over only the instructions of the modified program loop body while the already populated hint control data will enable the performance modifying operating procedure to be carried out.
If an exception (e.g. an interrupt) occurs, the hint control data and the loop control data will be cleared/invalidated. The first iteration of the loop following the exception will therefore involve re-execution of the loop-end instruction 07, causing repopulation of the loop control data, identifying the hint instruction 03 as the first instruction of the program loop body (as specified by the loop-end instruction). On the next iteration of the loop, the hint instruction 03 will be executed again. The processing circuitry can detect that the hint instruction is the first instruction in the program loop body (e.g., by comparison of the program counter with the loop control data or by monitoring of a flag set at the start of the loop and cleared on execution of an instruction that is not a hint instruction). The loop control data is then modified to again remove the hint instruction 03 from the modified program loop body such that further iterations of the program loop involve execution of only instructions 04-06.
This approach therefore reduces the number of times that the hint instruction needs to be executed while still ensuring that the hint instruction is executed following an exception in order to restore the hint control data cleared when taking the exception.
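The Figure 4 behaviour described above can be sketched as a small Python model. This is an illustrative simulation only, not the circuitry itself. The instruction layout (hint at 03, ordinary instructions 04-06, loop-end at 07), the name `simulate_fig4`, and the choice to model an exception as clearing both stores at an iteration boundary are all assumptions made for the sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical layout for illustration: 03 is the hint, 04-06 are ordinary
# instructions, 07 is the loop-end instruction naming 03 as the loop start.
PROGRAM = {3: "hint", 4: "op", 5: "op", 6: "op", 7: "loop_end"}
NAMED_START = 3


def simulate_fig4(iterations: int,
                  exception_before: Optional[int] = None
                  ) -> Tuple[List[List[int]], bool]:
    """Return the addresses executed on each iteration, plus whether the
    hint control data is valid at the end of the run."""
    loop_ctrl: Optional[Tuple[int, int]] = None  # (start, last) of loop body
    hint_valid = False                           # hint control data populated?
    pc = NAMED_START
    trace: List[List[int]] = []
    for iteration in range(1, iterations + 1):
        if iteration == exception_before:
            # Assumed model of an exception: both stores are invalidated.
            loop_ctrl, hint_valid = None, False
        executed: List[int] = []
        while True:
            kind = PROGRAM[pc]
            executed.append(pc)
            if kind == "hint":
                hint_valid = True
                if loop_ctrl is not None and loop_ctrl[0] == pc:
                    # Hint sits at the start of the body: exclude it.
                    loop_ctrl = (pc + 1, loop_ctrl[1])
            elif kind == "loop_end":
                # Loop-end (re)populates the loop control data.
                loop_ctrl = (NAMED_START, pc - 1)
                pc = NAMED_START
                break
            if loop_ctrl is not None and pc == loop_ctrl[1]:
                pc = loop_ctrl[0]  # loop back without executing loop-end
                break
            pc += 1
        trace.append(executed)
    return trace, hint_valid
```

In this model the hint at 03 appears in the trace for the first two iterations only. When `exception_before` is set, the following iteration runs through the loop-end instruction again and the hint is executed once more on the iteration after that.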
However, with the approach of Figure 4, the hint instruction is executed on both the first and second iterations of the loop. Figure 5 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body on the first iteration of the program loop, thereby preventing its execution on the second iteration.
In this example, the sequence of instructions includes a loop-start instruction 03. The loop-start instruction may for example specify conditions under which further iterations of the program loop are to be performed or specify a number of iterations of the program loop to perform. The processing circuitry is responsive to execution of the loop-start instruction immediately followed by the hint instruction to identify that modified loop control data could be used once a loop-end instruction has been encountered in order to prevent future execution of the hint instruction 04. Accordingly, loop modification information is provided to the loop control circuitry 40. When the loop-end instruction is encountered, the modified loop control data can be populated which excludes the hint instruction 04 from the modified program loop body. Thus, on the second iteration of the program loop the hint instruction can be excluded.
When an exception such as an interrupt occurs, the loop control data will be cleared such that program flow will progress to the hint instruction. The processing circuitry at that point can detect that the hint instruction occurs at the start of the program loop and cause repopulated loop control data to exclude the hint instruction from the modified program loop body.
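The Figure 5 behaviour can be modelled in a similar way. The sketch below is a simplified Python simulation under stated assumptions: the layout (loop-start at 03, hint at 04, ordinary instructions 05-07, loop-end at 08) is hypothetical beyond the two instruction numbers given above, the name `simulate_fig5` is invented, the loop modification information is modelled as a single pending start address, and an exception is assumed to clear that pending information along with the loop control data.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: 03 loop-start, 04 hint, 05-07 ordinary instructions,
# 08 loop-end naming 04 (the instruction after loop-start) as the loop start.
PROGRAM = {3: "loop_start", 4: "hint", 5: "op", 6: "op", 7: "op", 8: "loop_end"}
NAMED_START = 4


def simulate_fig5(iterations: int,
                  exception_before: Optional[int] = None) -> List[List[int]]:
    """Return the addresses executed on each iteration of the loop."""
    loop_ctrl: Optional[Tuple[int, int]] = None  # (start, last) of loop body
    pending_start: Optional[int] = None          # loop modification information
    pc = 3
    trace: List[List[int]] = []
    for iteration in range(1, iterations + 1):
        if iteration == exception_before:
            # Assumption: the exception clears both pieces of state.
            loop_ctrl, pending_start = None, None
        executed: List[int] = []
        while True:
            kind = PROGRAM[pc]
            executed.append(pc)
            if kind == "loop_start":
                # Simplification: look ahead for hints directly after loop-start.
                nxt = pc + 1
                while PROGRAM.get(nxt) == "hint":
                    nxt += 1
                if nxt != pc + 1:
                    pending_start = nxt
            elif kind == "hint":
                if loop_ctrl is not None and loop_ctrl[0] == pc:
                    loop_ctrl = (pc + 1, loop_ctrl[1])  # fallback after exception
            elif kind == "loop_end":
                start = pending_start if pending_start is not None else NAMED_START
                loop_ctrl = (start, pc - 1)  # stored already excluding the hint
                pc = start
                break
            if loop_ctrl is not None and pc == loop_ctrl[1]:
                pc = loop_ctrl[0]
                break
            pc += 1
        trace.append(executed)
    return trace
```

Here the hint is executed once on the first iteration and is already excluded on the second. After a modelled exception, the loop-end instruction repopulates the unmodified body and the hint is detected at the start of the body on the next pass.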
Figure 6 schematically illustrates loop behaviour where two hint instructions are excluded from a modified program loop body. It should be appreciated that in general, these techniques may be applied where a plurality of hint instructions are used. Here, the loop control data is modified in a similar manner to that discussed above; however, the presence of two hint instructions occurring at the start of the program loop is detected and modified loop control data is stored identifying, as the start of the modified program loop body, an instruction 05 following the last of the hint instructions. As such, the program flow will proceed through the sequence of instructions from instruction 03 to the loop-end instruction 08 on the first iteration of the loop.
Although not depicted, the approach of Figure 5 could be used here if a loop-start instruction had been present to provide loop modification information to the loop control circuitry 40 on detection of the hint instructions at the start of the loop.
Instead, the hint instructions are executed for a second time on the second iteration of the program loop. At this point, it will be detected that the hint instructions occur at the start of the program loop body. (This may be done by comparison of the program counter when executing the hint instructions with the loop control data, or by checking a flag which is set at the start of the loop and cleared on execution of any instruction other than a hint instruction. Thus, if the flag is set when executing a hint instruction, the hint instruction occurs at the start of the program loop body.) With the hint instructions excluded from the modified program loop body, subsequent iterations of the loop will proceed without re-executing the hint instructions.
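The flag-based detection mentioned above can be illustrated with a short Python sketch. The function name `modified_loop_start` and the representation of the loop body as a list of `(address, kind)` pairs are hypothetical. The flag is set on entry to the loop body and cleared by the first non-hint instruction, so only hints at the very start of the body move the start address.

```python
from typing import List, Tuple


def modified_loop_start(body: List[Tuple[int, str]]) -> int:
    """Return the start address of the modified loop body.

    `body` lists (address, kind) pairs in program order, beginning at the
    unmodified loop start; kind is "hint" or any other string.
    """
    at_loop_start = True          # flag set at the start of the loop
    start = body[0][0]
    for index, (_address, kind) in enumerate(body):
        if kind != "hint":
            at_loop_start = False  # cleared by any non-hint instruction
        if at_loop_start and index + 1 < len(body):
            # Flag still set while executing a hint: the hint is at the
            # start of the body, so the body now begins after it.
            start = body[index + 1][0]
    return start


# Figure 6 style layout (assumed): two hints, then ordinary instructions.
FIG6_BODY = [(3, "hint"), (4, "hint"), (5, "op"), (6, "op"), (7, "op")]
```

A hint that appears after an ordinary instruction does not change the result, because the flag has already been cleared by then.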
The present techniques may also be applied in situations that do not make use of zero-overhead loop instructions. Figure 7 schematically illustrates loop behaviour where a hint instruction is excluded from a modified program loop body of a loop that does not use a zero-overhead loop instruction. In this example, instruction 08 is a conditional branch instruction for which, if the branch is taken, program flow is diverted to instruction 03. The loop control circuitry 40 may comprise loop detection circuitry that is able to detect loops such as the loop in Figure 7 and take steps to optimise execution of the loops. For example, the loop detection circuitry may perform a loop performance modifying operation that involves storing loop control data for the loop which identifies a program loop body for the loop. The loop control circuitry 40 can then use this loop control data to execute only the program loop body (and not the branch instruction that sets up the loop). Thus, in a similar manner to the zero-overhead loops, the loop control circuitry 40 may make use of the loop detection circuitry to execute only a loop body of the loop.
In such cases, the processing circuitry may be able to detect the hint instruction 03 occurring at the start of the program loop body and exclude the hint instruction from a modified program loop body which is executed on subsequent iterations of the loop. Thus, once the hint control data associated with the hint instruction has been stored, further execution of the hint instruction can be avoided.
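As a rough illustration of the Figure 7 case, the following Python sketch detects a loop from a taken backward branch, stores loop control data for the body without the branch, and then excludes leading hint instructions. The names `detect_loop` and `exclude_leading_hints`, the rule that a single taken backward branch is enough to identify a loop, and the instruction layout are assumptions for the sketch. How the loop detection circuitry decides when the loop terminates is not modelled.

```python
from typing import Dict, Optional, Tuple

LoopCtrl = Tuple[int, int]  # (start, last) addresses of the program loop body


def detect_loop(branch_addr: int, target_addr: int,
                taken: bool) -> Optional[LoopCtrl]:
    """Treat a taken backward branch as defining a loop (assumed heuristic).

    The stored body runs from the branch target up to the instruction
    before the branch, so the branch itself is excluded.
    """
    if taken and target_addr < branch_addr:
        return (target_addr, branch_addr - 1)
    return None


def exclude_leading_hints(loop: LoopCtrl, program: Dict[int, str]) -> LoopCtrl:
    """Move the body start past any hints found at the start of the body."""
    start, last = loop
    while start < last and program[start] == "hint":
        start += 1
    return (start, last)


# Assumed Figure 7 style layout: hint at 03, conditional branch at 08 to 03.
FIG7_PROGRAM = {3: "hint", 4: "op", 5: "op", 6: "op", 7: "op", 8: "branch"}
fig7_loop = detect_loop(branch_addr=8, target_addr=3, taken=True)
fig7_modified = exclude_leading_hints(fig7_loop, FIG7_PROGRAM)
```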
Figure 8 is a flowchart illustrating the operation of a data processing apparatus according to a first approach. At step 802, it is determined whether a program loop has been identified. As discussed above, the program loop may be a zero-overhead loop identified using loop control instructions such as the loop-start/loop-end instruction or the program loop may otherwise be identified based on the sequence of instructions to be executed.
Once a program loop has been identified, loop control data identifying the loop body of the program loop is stored at step 804. This loop control data may be stored in the loop control cache 42 and may cause further iterations of the loop to execute only the loop body (and not any program flow control instructions that define the loop).
If one or more hint instructions are subsequently identified at the beginning of the loop body at step 806, the loop control data is modified at step 810 so as to exclude the one or more hint instructions from a modified loop body that is then executed on subsequent iterations of the program loop. This approach therefore prevents further execution of the hint instructions when hint control data has already been populated based on the hint instructions. However, if an exception such as an interrupt is taken causing the hint control data and the loop control data to be cleared, the hint instructions will be executed again, thereby allowing performance modifying operating procedures associated with the hint instructions to be employed and any associated performance benefits achieved.
Figure 9 is a flowchart illustrating the operation of a data processing apparatus according to a second approach. In this second approach, the processing circuitry is arranged to detect at step 902 one or more hint instructions occurring following a loop-start instruction. Since the hint instruction(s) follow the loop-start instruction, it is known that they will occur at the start of the program loop body. Accordingly, after those hint instruction(s) have been executed, they can be excluded from the program loop body. At step 904, therefore, loop modification information is provided to the loop control circuitry 40. When a loop-end instruction is reached as established at step 906, loop control data is stored identifying a modified loop body that excludes the one or more hint instructions based on the provided loop modification information. Thus, the modified loop body can be represented in the loop control data without needing to first populate the loop control data with the unmodified loop body. This approach avoids executing the hint instruction(s) on a second iteration of the program loop, thereby further reducing the overhead incurred as a result of the hint instruction(s).
Concepts described herein may be embodied in a system comprising at least one packaged chip. The apparatus described earlier is implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).
As shown in Figure 10, one or more packaged chips 400, with the apparatus described above implemented on one chip or distributed over two or more of the chips, are manufactured by a semiconductor chip manufacturer. In some examples, the chip product 400 made by the semiconductor chip manufacturer may be provided as a semiconductor package which comprises a protective casing (e.g. made of metal, plastic, glass or ceramic) containing the semiconductor devices implementing the apparatus described above and connectors, such as lands, balls or pins, for connecting the semiconductor devices to an external environment. Where more than one chip 400 is provided, these could be provided as separate integrated circuits (provided as separate packages), or could be packaged by the semiconductor provider into a multi-chip semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chip product comprising two or more vertically stacked integrated circuit layers).
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips 400 are assembled on a board 402 together with at least one system component 404 to provide a system 406. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component 404 comprises one or more external components which are not part of the one or more packaged chip(s) 400. For example, the at least one system component 404 could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
A chip-containing product 416 is manufactured comprising the system 406 (including the board 402, the one or more chips 400 and the at least one system component 404) and one or more product components 412. The product components 412 comprise one or more further components which are not part of the system 406. As a non-exhaustive list of examples, the one or more product components 412 could include a user input/output device such as a keypad, touch screen, microphone, loudspeaker, display screen, haptic device, etc.; a wireless communication transmitter/receiver; a sensor; an actuator for actuating mechanical motion; a thermal control device; a further packaged chip; an interface module; a resistor; a capacitor; an inductor; a transformer; a diode; and/or a transistor. The system 406 and one or more product components 412 may be assembled on to a further board 414.
The board 402 or the further board 414 may be provided on or within a device housing or other structural support (e.g. a frame or blade) to provide a product which can be handled by a user and/or is intended for operational use by a person or company.
The system 406 or the chip-containing product 416 may be at least one of: an end-user product, a machine, a medical device, a computing or telecommunications infrastructure product, or an automation control system. As a non-exhaustive list of examples, the chip-containing product could be any of the following: a telecommunications device, a mobile phone, a tablet, a laptop, a computer, a server (e.g. a rack server or blade server), an infrastructure device, networking equipment, a vehicle or other automotive product, industrial machinery, consumer device, smart card, credit card, smart glasses, avionics device, robotics device, camera, television, smart television, DVD player, set top box, wearable device, domestic appliance, smart meter, medical device, heating/lighting control device, sensor, and/or a control system for controlling public infrastructure equipment such as smart motorways or traffic lights.
In the present application, the words "configured to..." are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase "at least one of" mean that any one or more of those features can be provided either individually or in combination. For example, "at least one of: A, B and C" encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
Although illustrative examples of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise examples, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
Claims (18)
- 1. An apparatus comprising: processing circuitry to perform processing operations specified by program instructions; and loop control circuitry to identify a program loop specified by the program instructions and to store loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; wherein the loop control circuitry is to control the processing circuitry to carry out loop iterations of the program loop body indicated by the loop control data; and wherein the processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions, to cause: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of a modified program loop body, an instruction following the hint instruction, and the processing circuitry to perform, when carrying out loop iterations of the modified program loop body, the performance modifying operating procedure as specified by the hint instruction.
- 2. The apparatus according to claim 1, wherein excluding the hint instruction from the modified program loop body prevents the hint instruction being executed on subsequent loop iterations of the given program loop.
- 3. The apparatus according to claim 1 or claim 2, wherein: the processing circuitry is configured to maintain hint control data indicative of the performance modifying operating procedures indicated by one or more hint instructions; and the processing circuitry is responsive to an exception to flush at least some loop control data and at least some hint control data.
- 4. The apparatus according to claim 3, wherein flushing the at least some loop control data re-enables execution of the hint instruction on return from the exception.
- 5. The apparatus according to any preceding claim, wherein: the processing circuitry is responsive to execution of the hint instruction to determine whether the loop control data identifies the hint instruction as the beginning of the given program loop body; and in response to the loop control data identifying the hint instruction as the start of the given program loop body, the processing circuitry is configured to modify the loop control data to exclude the hint instruction from the modified program loop body.
- 6. The apparatus according to any preceding claim, wherein: the processing circuitry is responsive to execution of a further hint instruction to determine whether the loop control data identifies the further hint instruction as the beginning of the modified program loop body; and in response to the loop control data identifying the further hint instruction as the start of the modified program loop body, the processing circuitry is configured to further modify the loop control data to exclude the further hint instruction from the modified program loop body.
- 7. The apparatus according to any preceding claim, wherein the loop control circuitry is configured to identify the program loop based on a loop-end instruction that specifies the program loop body of the program loop.
- 8. The apparatus according to claim 7, wherein the loop control circuitry is configured to identify a loop-start instruction that controls a number of times the program loop body is to be executed.
- 9. The apparatus according to claim 7 or claim 8, wherein the program loop is a zero-overhead loop.
- 10. The apparatus according to any preceding claim, wherein: the processing circuitry is responsive to a loop-start instruction followed by one or more hint instructions to provide, to the loop control circuitry, loop modification information identifying an instruction following the one or more hint instructions as the start of a modified program loop body; and the loop control circuitry is configured to make use of received loop modification information when storing loop control data in response to a loop-end instruction by adjusting the stored loop control data identifying the start of the modified program loop body based on the received loop modification information.
- 11. The apparatus according to any preceding claim, wherein: the loop control circuitry comprises loop detection circuitry configured to detect program loops defined by one or more branch instructions; and the loop detection circuitry is responsive to detecting a program loop to store loop control data indicative of a program loop body of the detected program loop in order to perform a loop performance modifying operation that seeks to improve the performance of subsequent iterations of the detected program loop.
- 12. The apparatus according to claim 11, wherein: the loop control circuitry further comprises loop buffer circuitry for storing loop instruction data indicative of at least some of the instructions of a program loop body; and the loop buffer circuitry is configured to store the loop instruction data in response to the loop detection circuitry detecting a program loop.
- 13. The apparatus according to any preceding claim, wherein: the processing circuitry is responsive to a plurality of hint instructions occurring at the beginning of the given program loop body to cause: the loop control data for the given program loop to exclude the plurality of hint instructions from the modified program loop body by identifying, as the start of the modified program loop body, an instruction following the plurality of hint instructions, and the processing circuitry to perform, when carrying out the loop iterations of the modified program loop body, respective performance modifying operating procedures as specified by the plurality of hint instructions.
- 14. The apparatus according to any preceding claim, the apparatus comprising loop control data storage circuitry to store the loop control data.
- 15. A computer-readable medium to store computer-readable code for fabrication of an apparatus comprising: processing circuitry to perform processing operations specified by program instructions; and loop control circuitry to identify a program loop specified by the program instructions and to store loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; wherein the loop control circuitry is to control the processing circuitry to carry out loop iterations of the program loop body indicated by the loop control data; and wherein the processing circuitry is responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions, to cause: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of a modified program loop body, an instruction following the hint instruction, and the processing circuitry to perform, when carrying out loop iterations of the modified program loop body, the performance modifying operating procedure as specified by the hint instruction.
- 16. A system comprising: the apparatus of any of claims 1-14, implemented in at least one packaged chip; at least one system component; and a board; wherein the at least one packaged chip and the at least one system component are assembled on the board.
- 17. A chip-containing product comprising the system of claim 16 assembled on a further board with at least one other product component.
- 18. A method comprising: performing processing operations specified by program instructions; identifying a program loop specified by the program instructions and storing loop control data indicative of a program loop body of the program loop, the program loop body excluding any program flow control instructions that specify the program loop; controlling performance of loop iterations of the program loop body indicated by the loop control data; and causing, responsive to a hint instruction occurring at the beginning of a given program loop body, the hint instruction indicative of a performance modifying operating procedure that is to be employed for one or more subsequent program instructions: loop control data for the given program loop to exclude the hint instruction from a modified program loop body by identifying, as the start of the modified program loop body, an instruction following the hint instruction, and the performance modifying operating procedure to be performed as specified by the hint instruction when carrying out loop iterations of the modified program loop body.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2311878.9A GB2632311A (en) | 2023-08-02 | 2023-08-02 | Hints in a data processing apparatus |
| PCT/GB2024/051843 WO2025027270A1 (en) | 2023-08-02 | 2024-07-15 | Hints in a data processing apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2311878.9A GB2632311A (en) | 2023-08-02 | 2023-08-02 | Hints in a data processing apparatus |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202311878D0 (en) | 2023-09-13 |
| GB2632311A (en) | 2025-02-05 |
Family
ID=87929690
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB2311878.9A Pending GB2632311A (en) | 2023-08-02 | 2023-08-02 | Hints in a data processing apparatus |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2632311A (en) |
| WO (1) | WO2025027270A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080010635A1 (en) * | 2006-07-07 | 2008-01-10 | O'brien John Kevin | Method, Apparatus, and Program Product for Improving Branch Prediction in a Processor Without Hardware Branch Prediction but Supporting Branch Hint Instruction |
| US20150089141A1 (en) * | 2013-09-26 | 2015-03-26 | Andes Technology Corporation | Microprocessor and method for using an instruction loop cache thereof |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7051193B2 (en) * | 2001-03-28 | 2006-05-23 | Intel Corporation | Register rotation prediction and precomputation |
| US10241794B2 (en) * | 2016-12-27 | 2019-03-26 | Intel Corporation | Apparatus and methods to support counted loop exits in a multi-strand loop processor |
| US10572259B2 (en) * | 2018-01-22 | 2020-02-25 | Arm Limited | Hints in a data processing apparatus |
- 2023-08-02 GB GB2311878.9A patent/GB2632311A/en active Pending
- 2024-07-15 WO PCT/GB2024/051843 patent/WO2025027270A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| GB202311878D0 (en) | 2023-09-13 |
| WO2025027270A1 (en) | 2025-02-06 |
Similar Documents
| Publication | Title |
|---|---|
| US12182574B2 (en) | Technique for predicting behaviour of control flow instructions |
| US12411692B2 (en) | Storage of prediction-related data |
| US12288073B2 (en) | Instruction prefetch throttling |
| US20250231880A1 (en) | Operational modes for prefetch generation circuitry |
| US20250053421A1 (en) | Register clearing |
| GB2632311A (en) | Hints in a data processing apparatus |
| US12293189B2 (en) | Data value prediction and pre-alignment based on prefetched predicted memory access address |
| US12423100B1 (en) | Prefetch pattern selection |
| US12423223B2 (en) | Access requests to local storage circuitry |
| US20250390309A1 (en) | Technique for generating predictions of a target address of branch instructions |
| US12417104B2 (en) | Switching a predicted branch type following a misprediction of a number of loop iterations |
| US12373218B2 (en) | Technique for predicting behaviour of control flow instructions |
| US12554646B2 (en) | Prefetch training circuitry |
| US20260050443A1 (en) | Predicting an outcome of a branch instruction |
| US12423109B2 (en) | Storing load predictions |
| US20250245157A1 (en) | Prefetch training circuitry |
| US12292834B2 (en) | Cache prefetching |
| US12405898B1 (en) | Memory synchronisation subsequent to a page table walk |
| US12340220B2 (en) | Register mapping to map architectural registers to corresponding physical registers based on a mode indicating a register length |
| US12411771B2 (en) | Combiner cache structure |
| US20250068939A1 (en) | Suppression of lookup of second predictor |
| US12405800B2 (en) | Branch prediction based on a predicted confidence that a corresponding function of sampled register state correlates to a later branch instruction outcome |
| US20260044349A1 (en) | Identification of prediction identifiers |
| US12277063B1 (en) | Bypassing program counter match conditions |
| US20250245010A1 (en) | Return address restoration |