HK1112980A

HK1112980A - Method and apparatus for managing a return stack

Info

Publication number: HK1112980A
Application number: HK08108305.5A
Authority: HK
Inventors: 罗德尼‧韦恩‧史密斯; 杰弗里‧托德‧布里奇斯; 詹姆斯‧诺里斯‧迪芬德尔费尔; 托马斯‧安德鲁‧萨托里乌斯
Original assignee: 高通股份有限公司
Priority date: 2005-02-18
Filing date: 2006-02-17
Publication date: 2008-09-19

Abstract

A processor includes a return stack circuit used for predicting procedure return addresses for instruction pre-fetching, wherein a return stack controller determines the number of return levels associated with a given return instruction, and pops that number of return addresses from the return stack. Popping multiple return addresses from the return stack permits the processor to pre-fetch the return address of the original calling procedure in a chain of successive procedure calls. In one embodiment, the return stack controller reads the number of return levels from a value embedded in the return instruction. A complementary compiler calculates the return level values for given return instructions and embeds those values in them at compile-time. In another embodiment, the return stack circuit dynamically tracks the number of return levels by counting the procedure calls (branches) in a chain of successive procedure calls.

Description

Method and apparatus for managing return stack

Technical Field

The present invention relates generally to microprocessors, and in particular, to managing the hardware return stack used by certain types of microprocessors in order to accelerate returns from procedure calls.

Background

As microprocessors are deployed in an ever increasing array of applications requiring complex functionality, there is a need to increase the execution speed of the microprocessors. Furthermore, in embedded applications, such as portable electronic devices with limited battery power, there is a need to reduce power consumption of the microprocessor. However, simply increasing the clock speed of a microprocessor may not produce the desired system performance increase, as various input/output bottlenecks impose constraints on the real-world performance of the microprocessor. For example, off-chip memory accesses typically run slower than on-chip memory accesses, resulting in the use of instruction and data caching techniques. Reduced Instruction Set Computers (RISC) typically issue one or more instructions per clock cycle and typically use instruction caches to enhance performance. Pipelined RISC processors can issue multiple instructions per clock cycle and typically use data and instruction caches extensively.

An instruction cache ("prefetch") predicts future instructions and places them into an on-chip instruction cache before the microprocessor executes the instructions. Prefetching may eliminate much of the latency associated with slower off-chip instruction memory when the correct instruction is prefetched. Most instructions are executed in sequence and can be prefetched with confidence. Conditional branch instructions may "take" or not "take" branches depending on the branch condition, which is typically evaluated only deep in the pipeline. To avoid the delay of waiting for this evaluation, the behavior of the branch instruction is typically predicted early in the pipeline, and instructions are prefetched from the predicted branch target address. Instruction prefetching approaches include both static and dynamic instruction prefetching.

Dynamic instruction prefetching relies on instruction execution history and may involve, for example, tracking the accuracy of previously taken or not-taken predictions for a given number of most recent conditional branch instructions. Static prefetching generally does not depend on execution history and may be used, for example, when a conditional branch is first encountered. One type of branch instruction for which static prefetching provides a performance advantage is a return instruction from a called program, where the return address of the program is predicted to support prefetching of instructions beginning at the predicted return address.

A "return stack" may be used to support static prediction of the return address of a procedure call return instruction. A typical return stack includes multiple levels of buffers. When a program call instruction is predicted or recognized, the corresponding return address may be fetched from the execution stage of the microprocessor's instruction pipeline and pushed onto the return stack. Conversely, when a program return instruction is predicted or recognized, the return address currently at the top of the return stack is popped from the stack and used as the predicted return address for instruction prefetching.

Thus, in conventional methods of managing the return stack, the corresponding predicted return addresses are pushed onto the return stack in order when a procedure call is encountered. Conversely, when a program return instruction is encountered, the return addresses are popped from the stack in order. Such conventional approaches incorrectly predict program return addresses in multi-level procedure calls, where successive procedure calls are "chained" together, because the return instruction of each subsequent procedure call in the chain points back to the return instruction of the previous procedure call.

Optimally, the return address that should be predicted for the return instruction of the last program in the string is the return address corresponding to the first program call in the string. However, since successive procedure calls result in the return address of each nested procedure call being pushed onto the return stack in sequence, the return address popped for the return instruction of the last procedure call is the return address of the immediately preceding calling procedure in the string. If prefetching continues from that address, the next prefetched instruction will be another return, which will pop the return stack again. Sequentially popping the return stack in this manner unnecessarily reduces processor performance and wastes power.

Disclosure of Invention

The present invention includes a method and apparatus for enabling a microprocessor to correctly predict the return address of an initial top-level calling program in a multi-level program call. Such multi-level procedure calls comprise a chain of two or more consecutive procedure calls, with the return address of each subsequent procedure pointing back to the return instruction of the immediately preceding procedure. Instead of providing the last pushed return address as the target of the return instruction in the last procedure called in the chain, the return stack circuit according to one or more embodiments of the present invention sequentially pops up a number of return addresses from the stack that is equal to the number of return levels in the chain of successive calls. In doing so, the return stack circuitry retrieves the return address of the initial caller, which may be provided to the instruction prefetch unit as a target for instruction prefetching.

Accordingly, in one embodiment, the present invention includes a method of managing a return stack based on: the number of return levels associated with the return instruction is determined and the number of return addresses are popped from the return stack. The last popped address may be provided to the instruction prefetch unit as a target address for instruction prefetching. The number of return addresses to be popped from the stack to obtain the predicted return address for a given return instruction may be read from a static level indicator value associated with the return instruction or may be determined on the fly during program execution.

In one embodiment, the level indicator may comprise a field or tag embedded in the return instruction, the value of which has been determined at compile time. Accordingly, the present invention includes a method of program compilation in which compiler logic determines a number of return levels associated with a return instruction and sets a return-level indicator associated with the return instruction to a value corresponding to the number. To determine the number of return levels, the compiler may count the number of procedure calls in a chained sequence of procedure calls. In general, a compiler determines the number of return levels associated with a return instruction by: detecting chained procedure calls; tracking the nesting depth of a given program call string; and setting a number of return levels for a last return instruction in a given program call string according to the nesting depth.

In another embodiment, a return stack circuit or other support logic in a microprocessor dynamically determines the number of return levels associated with a given return instruction at run-time. With this configuration, it is not necessary to store the return-level indicator at compile-time. The runtime determination of the return level may be based on methods similar to those used for compile-time embodiments. For example, the support logic may count the number of procedure calls in a chained procedure call sequence and set a return level value for the return instruction of the last called procedure based on the count. More generally, dynamic runtime tracking includes: detecting chained procedure calls; tracking the nesting depth of a given program call string; and setting a number of return levels for a last return instruction in a given program call string according to the nesting depth.

Drawings

FIG. 1 is a block diagram illustrating a microprocessor.

FIG. 2 is a block diagram illustrating a return stack used in the microprocessor of FIG. 1.

FIG. 3 is a program instruction flow diagram illustrating a multi-level procedure call comprising a series of procedure calls.

FIG. 4 is a logic flow diagram illustrating a process by which a return stack is popped according to the number of return levels in a given multi-level procedure call.

FIG. 5 is a block diagram of a compiler-generated return instruction including an embedded return-level indicator.

FIG. 6 is a logic flow diagram illustrating a process whereby a return stack is popped a number of times as indicated by a level indicator embedded in or associated with a given return instruction.

FIG. 7 is a logic flow diagram illustrating a process for detecting multi-level procedure calls, tracking call nesting depths within such multi-level procedure calls, and setting return levels based on the tracked depths.

Detailed Description

FIG. 1 illustrates, at least in part, a microprocessor 10 including a processor core 12, an instruction prefetch unit 14, an instruction cache 16, an instruction cache controller 18, a load/store unit 20, a data cache 22, a data cache controller 24, and a main translation look aside buffer 26. By way of non-limiting example, the microprocessor 10 may be a pipelined processor based on a Reduced Instruction Set Computer (RISC) architecture.

In one or more embodiments, the core 12 includes an instruction execution unit (not shown) that includes one or more multi-stage instruction pipelines. In operation, the core 12 executes program instructions and performs corresponding load/store data operations. The translation look-aside buffer 26 accepts input from the core 12 and provides output to the core 12. More particularly, the translation look aside buffer 26 interfaces the core 12 to the instruction cache 16 and the data cache 22, respectively. The instruction and data caches 16 and 22 comprise fast on-board memory, and the microprocessor 10 uses instruction and data prefetching via the instruction and data cache controllers 18 and 24 to keep the caches filled with the next needed instructions and data.

In particular, instruction prefetch operations of the microprocessor 10 include predicting the instruction stream returned from the called program. Predicting the address of the program return improves performance by enabling the microprocessor 10 to begin prefetching the most likely required instructions after the program is completed.

FIG. 2 illustrates one embodiment of a return stack circuit 30 used by the core 12 to predict the target address of a program return instruction. The illustrated return stack circuit 30 includes a return stack 32 and an associated return stack controller 34. The temporary storage registers may be logically linked together under the control of a return stack controller 34 to form a return stack 32. Various buffer arrangements may be implemented, but in one embodiment, the return stack 32 is configured as a last-in-first-out memory stack.

When an instruction execution unit in the instruction pipeline (not shown) of the core predicts or otherwise recognizes a program call, the return stack controller 34 receives the corresponding return address and pushes it onto the return stack 32. The return stack controller 34 thus pushes program return addresses onto the return stack 32 one at a time in sequence when predicting program calls. When a program return instruction is encountered or predicted, the return stack controller 34 typically pops up the program return address, providing it to the instruction prefetch unit 14.

However, in contrast to conventional return stack management methods, the return stack controller 34 does not always obtain a return address for instruction prefetching by popping only one return address from the return stack 32 at a time. In fact, the return stack controller 34 is generally configured to sequentially push return addresses onto the return stack 32 and to sequentially pop return addresses from the return stack 32, but is specifically configured to determine the number of return levels associated with a given return instruction and to pop that number of return addresses from the return stack 32. The last popped return address is then provided to prefetch unit 14 as the predicted program return address.

The ability to pop more than one return address at a time from the return stack 32 allows the return stack controller 34 to correctly predict the return address of the initial top-level caller in a multi-level procedure call. Such multi-level procedure calls include a chain of two or more consecutive procedure calls, with the return address of each subsequent procedure pointing back to the return instruction of the immediately preceding procedure.

Fig. 3 illustrates a series of program instructions comprising a number of procedure calls including multi-level procedure calls. According to the illustrated program flow, the "main" program contains a call to a program called "proc 1". The Proc1 program contains a call to a program called "Proc 2". The Proc2 program contains a call to a program called "Proc 3".

Thus, proc1 includes a first top-level program in a chain of consecutive program calls. The program string is characterized by the return address of the return instruction in each successively called program "pointing" back to the return instruction of the immediately preceding program. That is, the return address of the return instruction of proc3 is the address of the return instruction of proc2, and the return address of the return instruction of proc2 is the address of the next instruction after the initial caller proc 1.

FIG. 4 illustrates the operation of the return stack controller 34 of the program flow of FIG. 3. The main program includes a call instruction to the program proc1 at address "b 1", so the return stack controller 34 pushes the return address b1+1 onto the return stack 32 (step 100). In turn, the program proc1 includes a call instruction to the program proc2 at address "b 2", so the return stack controller 34 pushes the return address b2+1 onto the return stack 32 (step 102). Finally, the program proc2 includes a call instruction to the program proc3 at address b3, so the return stack controller 34 pushes the return address b3+1 onto the return stack 32 (step 104). The push sequence causes the return stack 32 to hold return addresses b3+1, b2+1, and b1+1, with b3+1 held at the top of the return stack 32.

For the return instruction of proc3, conventional return stack management methods pop the topmost stack value b3+1 and use that value as the predicted return address for instruction prefetching. The address will pop the return instruction of proc2, resulting in popping address b2+1 on the return stack 32, and the address b2+1 will pop the return instruction of proc1, resulting in popping b1+1 on the return stack 32. This sequential return stack popping and instruction prefetching wastes both power and processor execution time, negatively impacting performance. Due to the chained nature of successive procedure calls, the actual return address location is b1+1 in the initial top-level caller proc 1. Thus, the return stack controller 34 optimizes performance by determining the number of levels it must pop from the return stack 32 to obtain the correct top-level return address for the return instruction of proc3 (step 106).

The return stack controller 34 poppes the number of return addresses from the return stack 32 (step 108) and uses the last popped return address (here b1+1) as the predicted return address for the prefetch unit (PFU)14 (step 110). While the present discussion constructs operational descriptions in terms of pushing return addresses onto the return stack 32 and popping return addresses from the return stack 32, it should be appreciated that such operations may be based on logically moving a stack pointer as needed, such that the pointing value in the return stack 32 represents a return address for instruction pre-fetching.

FIG. 4 is a non-limiting example of the ability of the return stack controller to skip intermediate return addresses stored on the return stack 32 and provide a return address corresponding to the first top-level program in a chain of consecutive procedure calls when a return instruction of the last program in the chain is predicted or otherwise encountered. More generally, the return stack controller 34 is configured to determine the number of return levels associated with a given return instruction, and pop the return stack 32 a corresponding number of times to obtain a return address for instruction prefetching.

Fig. 5 illustrates a mechanism for determining the number of return levels associated with any given return instruction, where the return instruction includes or is associated with a return-level indicator whose value indicates the number of return levels. Thus, in one or more embodiments, the instruction set recognized and used by the microprocessor 10 includes a return instruction that includes a return opcode ("opcode") and a return-level indicator field or tag embedded with or otherwise linked to the opcode. Thus, as shown in FIG. 6, determining the number of return levels associated with any given return instruction includes reading the return-level field or tag of the instruction.

The return stack controller 34 may be configured to implement the processing logic of FIG. 6, where the return stack controller 34 determines whether a given return instruction includes or is associated with a return-level indicator, step 120. If so (step 122), the return stack controller 34 pops the return stack 32 a corresponding number of times (step 124) to obtain a return address for instruction prefetching. If not (step 122), the return stack controller 34 pops the return stack 32 a default number of times (step 126) to obtain a return address for instruction prefetching. For example, the default number of times may be one.

In one embodiment, the program compiler may be configured to generate a return-level indicator for each return instruction, and the return stack controller 34 may thus always be configured to read the respective return-level indicator value for each return instruction and control return stack popping accordingly. However, as shown in FIG. 6, the program compiler may only generate a return-level indicator if the number of return levels exceeds the default value of "one". In other words, the first type of return instruction does not include a return-level indicator, and the return stack controller 34 is configured to recognize this as an implicit indicator that the return stack 32 should be popped once. The second type of return instruction does include a return-level indicator, and the return stack controller is configured to use the value of the return-level indicator to control stack popping.

One or more embodiments of a program compiler may be configured to generate a return-level indicator for a return instruction according to the processing logic illustrated in fig. 7. The compiler may be configured to determine whether two or more consecutive procedure calls are chained together into a multi-level procedure call, step 130. The compiler may recognize multi-level procedure calls, for example, by determining whether the return instruction of the called procedure points back to the return instruction of the calling procedure.

If the given program call is not part of a multi-level program call string, the compiler may clear return-level trace information, such as a call-level counter (step 132), and continue normal compilation operations as needed. However, if the compiler detects a multi-level procedure call, it tracks the nesting (call) depth of the multi-level procedure call. A method of tracking nesting depth includes counting consecutive chained procedure calls (step 134). Thus, the compiler may maintain a counter to accumulate the call depth of the multi-level program.

When the return instruction of the last program in the string is detected (step 136), the compiler sets a return-level indicator for the instruction based on the tracked count (step 138). Thus, as explained above, the compiler may set the value of the return-level indicator for the last program's return instructions based on the accumulated count. Setting this value at compile-time, therefore, enables the return stack controller 34 to perform multi-level return address prediction at runtime (i.e., while the microprocessor 10 is executing compiled program code).

It should also be appreciated that the processing logic of FIG. 7 may be implemented as dynamic runtime processing carried out by the return stack controller 34 or by other support logic within the microprocessor 10. For example, the return stack controller 34 may be configured to cooperate with the instruction execution units or other logic of the kernel during runtime processing to detect multi-level programs, track the call depth of those programs, and generate appropriate return-level indicators for predicting the top-level return address. Thus, the return stack controller 34 or other circuitry within the core 12 may be configured with the necessary logic and memory circuit elements needed to discern chained procedure calls, track/count the call depth of chained procedure calls, and generate corresponding return-level indicator values for controlling return address prediction.

In general, the number of return levels may be determined at compile time or at run time. In either case, the number of return levels associated with a given instruction may be determined by counting procedure calls in a chained sequence, such that the return address predicted for the return instruction of the last procedure in the chain is the return address of the initial calling procedure at the top level, rather than the address of the immediately preceding procedure in the chain.

In any case, the microprocessor 10 implements a method of predicting return addresses for instruction prefetching for a chained sequence of program calls based on replacing the return address of the last program call in the chained sequence of program calls with the return address of the first program call in the chained sequence of program calls and providing the replaced return address for instruction prefetching. A chained procedure call may be detected by recognizing that the call to the next procedure for a given current procedure is the last instruction before the return instruction of the current procedure. In this manner, the performance of the microprocessor 10 is optimized with respect to execution of chained procedure calls because redundant intermediate return instructions are not fetched into the microprocessor's instruction pipeline and executed.

Those skilled in the art will appreciate that the foregoing discussion of one or more embodiments, as well as the accompanying drawings, do not limit the present invention. Rather, the present invention is limited only by the following claims and their legal equivalents.

Claims

1. A method of managing a return stack, comprising:

determining a number of return levels associated with the return instruction; and

popping the number of return addresses from the return stack.

2. The method of claim 1, further comprising providing a last popped return address from the return stack to an instruction prefetch unit as a predicted return address for instruction prefetching.

3. The method of claim 1, wherein determining a number of return levels associated with a return instruction comprises reading a return-level indicator value associated with the return instruction.

4. The method of claim 3, wherein reading a return-level indicator value associated with the return instruction comprises reading a value embedded in the return instruction.

5. The method of claim 1, wherein determining a number of return levels associated with a return instruction comprises determining whether a return instruction includes an embedded return-level indicator, if so, determining the number of return levels by reading the return-level indicator, if not, determining a number of return-level indicators based on a default return-level setting for the return stack.

6. The method of claim 5, further comprising setting the default return level setting for the return stack to one such that the return stack poppes one return address from the return stack for return instructions lacking an embedded return-level indicator and pops the indicated number of return addresses from the return stack for return addresses including an embedded return-level indicator.

7. The method of claim 1, wherein determining a number of return levels associated with a return instruction comprises counting a number of procedure calls in a chained sequence of procedure calls.

8. The method of claim 1, wherein determining a number of return levels associated with a return instruction comprises: detecting chained procedure calls, tracking a nesting depth of a given chain of procedure calls, and setting a number of return levels for a last return instruction in the given chain of procedure calls according to the nesting depth.

9. A return stack circuit, comprising:

a return stack configured to store a plurality of return addresses;

a return stack controller generally configured to sequentially push and pop return addresses onto and from the return stack, and in particular configured to determine the number of return levels associated with a given return instruction and to pop that number of return addresses from the return stack.

10. The return stack circuit of claim 9, wherein the return stack controller is further configured to provide a return address last popped from the return stack to an instruction prefetch unit as a predicted return address for instruction prefetching.

11. The return stack circuit of claim 9, wherein the return stack controller is configured to determine a number of return levels associated with a given return instruction by reading a return-level indicator value associated with the return instruction.

12. The return stack circuit of claim 11, wherein the return stack controller is configured to read a value embedded in the given return instruction as the return-level indicator.

13. The return stack circuit of claim 9, wherein the return stack controller is configured to determine a number of return levels associated with the given return instruction by determining whether the given return instruction includes an embedded return-level indicator, if so, by reading the return-level indicator, if not, determining a number of return-level indicators based on a default return-level setting for the return stack.

14. The return stack circuit of claim 13, further comprising setting the default return level setting for the return stack to one such that the return stack poppes one return address from the return stack for return instructions lacking an embedded return-level indicator and pops an indicated number of return addresses from the return stack for return instructions including an embedded return-level indicator.

15. The return stack circuit of claim 9, wherein the return stack controller is configured to determine the number of return levels associated with the given return instruction by counting a number of procedure calls in a sequence of chained procedure calls.

16. The return stack circuit of claim 9, wherein the return stack controller is configured to determine the number of return levels associated with the given return instruction by: detecting chained procedure calls, tracking a nesting depth of a given chain of procedure calls, and setting a number of return levels for a last return instruction in the given chain of procedure calls according to the nesting depth.

17. A computer program compilation method, comprising:

setting a return-level indicator associated with the return instruction to a value corresponding to the number.

18. The method of claim 17, wherein determining a number of return levels associated with a return instruction comprises counting a number of procedure calls in a chained sequence of procedure calls.

19. The method of claim 17, wherein determining a number of return levels associated with a return instruction comprises: detecting chained procedure calls; tracking the nesting depth of a given program call string; and setting a number of return levels for a last return instruction in the given program call string according to the nesting depth.

20. The method of claim 17, wherein setting a return-level indicator associated with the return instruction to a value corresponding to the number comprises setting a value embedded in the return instruction.

21. A method of predicting return addresses for instruction prefetching for a chained procedure call sequence, comprising:

replacing a return address of a last procedure call in the chained procedure call sequence with a return address of a first procedure call in the chained procedure call sequence; and

the replacement return address is provided for instruction prefetching.

22. The method of claim 22, further comprising detecting a chained procedure call sequence by recognizing that a call to a next procedure for a given current procedure is the last instruction before a return instruction of the current procedure.

23. The method of claim 22, further comprising detecting a chained sequence of procedure calls during program compilation.

24. The method of claim 22, further comprising detecting a chained sequence of procedure calls during procedure execution.

25. The method of claim 21, wherein replacing the return address of the last procedure call in the sequence of chained procedure calls with the return address of the first procedure call in the sequence of chained procedure calls comprises: for successive procedure calls in the chained procedure call sequence, pushing return addresses onto a return stack in sequence; and in response to the return instruction of the last procedure call, popping the return stack the number of times required to obtain the return address of the first procedure call.