HK1069648B - A multi-threaded processor and method for assigning thread priority - Google Patents
Description
Technical Field
The present invention relates to the operation of processors or similar devices. More particularly, the present invention relates to resolving instruction starvation for threads in a multithreaded processor.
Background
As is well known in the art, a processor includes various sub-modules, each of which is adapted to perform a specific task. In a known processor, the sub-modules include the following: an instruction cache; an instruction fetch unit for reading the appropriate instruction from the instruction cache; decode logic to decode the instruction into a final format or an intermediate format; micro-operation logic that converts intermediate instructions into a final format for execution; and an execution unit that executes final format instructions (in some examples, the instructions come from the decode logic; in other examples, from the micro-operation logic). A final format instruction is referred to herein as a micro-operation.
Program code to be executed by a processor may sometimes be broken down into several smaller portions, referred to as "threads." A thread is a series of instructions whose execution accomplishes a specified task. For example, in a video telephony application, a processor may be requested to execute code to process video image data as well as audio data. There may be mutually independent code sequences designed to process each of these data types. Thus, a first thread may include instructions for processing video image data, while a second thread may include instructions for processing audio data. Stated another way, a thread is a self-contained program that is typically associated with a thread identifier and, during execution in a multi-threaded environment, is capable of maintaining its architectural state while instructions from another thread execute.
In most processors, threads are processed sequentially by the processor. A generally accepted rule for processing a thread is that execution of already-decoded micro-operations should take precedence over the fetching of new, undecoded instructions. This is because the decoded micro-operations are likely to execute properly, whereas newly fetched instructions may eventually be "killed" due to, for example, branch mispredictions. However, when instruction fetching is initiated only after these micro-operations have executed, the fetch latency is typically exposed, causing a delay period while decoded instructions refill the execution pipeline. Thus, favoring execution of one thread's instructions can have a negative impact.
In the art, the use of multithreaded processors has been proposed. In such a processor, execution may switch between two or more threads. In other multithreaded processors, threads may be executed simultaneously. In either type of processor, no distinction may be made in how the threads are processed: code from one thread is assigned the same priority as code from another thread. This can negatively impact overall system performance, especially when execution of critical code is suspended or slowed by execution of non-critical code. In particular, the preference for executing micro-operations over fetching new instructions can amplify the negative impact in a multithreaded processor when the decoded micro-operations belong to a first thread and the instruction to be fetched belongs to a second thread processed concurrently (or in parallel) with the first. Situations may arise in which processing of the first thread blocks, or unduly delays, instruction fetching for the second thread. For the second thread, this may be referred to as instruction-side (I-side) starvation.
In view of the foregoing, there is a need to detect and resolve instruction starvation in a multithreaded processor.
Disclosure of Invention
In one aspect of the present invention, there is provided a method for assigning thread priorities in a multithreaded processor executing instructions for at least first and second threads, comprising:
determining whether instruction fetch operations of a first thread will be blocked due to processing of instructions of a second thread;
setting a threshold counter to perform a counting operation in response to the determining operation;
assigning a priority to the first thread in the processor if it is determined that an instruction fetch operation of the first thread will be blocked due to processing of an instruction of the second thread and the counting operation has finished.
Preferably, the method further comprises:
after the threshold counter ends its count operation, an instruction fetch operation of the first thread is performed.
Preferably, the method further comprises:
moving an instruction in an execution pipeline of the processor from a second thread to a temporary storage area.
In another aspect of the invention, a multithreaded processor is provided comprising:
first and second thread queues;
a threshold counter;
control logic coupled to the first and second thread queues, the control logic to determine whether instruction fetch operations of a first thread will be blocked due to processing of instructions of a second thread and to set the threshold counter to perform count operations,
wherein the control logic assigns a priority to a first thread in the processor if an instruction fetch operation of the first thread will be blocked due to processing of an instruction of a second thread and the count operation is complete.
Preferably, the control logic assigns a priority to the first thread after the threshold counter has finished its counting operation.
Preferably, the processor further comprises an execution pipeline and a temporary storage area, wherein the control logic is to move an instruction in the execution pipeline of the processor from the second thread to the temporary storage area.
Drawings
FIG. 1 is a block diagram of a computer system operating in accordance with an embodiment of the present invention.
FIG. 2 is a block diagram of a portion of a processor system constructed in accordance with an embodiment of the invention.
FIG. 3 is a state diagram for detecting and resolving instruction starvation according to an embodiment of the present invention.
Detailed Description
Referring now to FIG. 1, shown is a block diagram of a computer system operating in accordance with an embodiment of the present invention. In this example, computer system 1 includes a processor 10 capable of executing code stored in memory 5. In this embodiment, memory 5 stores code for several threads, such as code for thread 0 (8), code for thread 1 (9), and so on. As is well known in the art, the code for the two threads may be part of a user application or of the operating system.
Referring now to FIG. 2, shown is a block diagram of a processor system (e.g., a microprocessor, digital signal processor, or the like) operating in accordance with an embodiment of the present invention. In this embodiment, the processor is a multi-threaded processor, in which processor 10 is logically divided into two or more logical processors. The term "thread" as used herein refers to a sequence of instruction code. For example, in a video telephony application, the processor may be requested to execute code to process video image data as well as audio data. There may be mutually independent code sequences designed to handle each of these data types. Thus, a first thread may include instructions for processing video image data, while a second thread may include instructions for processing audio data. In this example, one or more execution units (e.g., including execution unit 41) may execute one or more instructions at a time. However, processor system 10 may be treated as two logical processors: a first logical processor that executes instructions from the first thread, and a second logical processor that executes instructions from the second thread.
In this embodiment of processor system 10, instruction and/or data bytes are fetched per thread by fetch unit 11 and provided to queue 13, where they are stored as part of either the thread 0 queue or the thread 1 queue. Those skilled in the art will recognize that the queues used in processor system 10 may store more than two threads. Instructions from both threads are provided to a multiplexer (MUX) 15, and control logic 17 controls whether instructions from thread 0 or thread 1 are provided to decode unit 21. Decode unit 21 may convert an instruction into two or more microinstructions and provide these microinstructions to queue 23 (in a RISC (reduced instruction set computing) processor, the instructions may already be in decoded format, and decode unit 21 converts them into an execution format). The output of queue 23 is provided to MUX 25, which provides instructions from either thread 0 or thread 1 to rename/allocate unit 31 in accordance with the operation of control logic 27. Rename/allocate unit 31 in turn provides instructions to queue 33. MUX 35 selects between the thread 0 queue and the thread 1 queue according to the operation of schedule control logic 37; for example, MUX 35 may select instructions from thread 0 and thread 1 according to the available resources in execution unit 41. In this embodiment, the output of MUX 35 is provided to out-of-order execution unit 41, which executes the instruction. The instruction is then placed in queue 43. The output of queue 43 is provided to MUX 45, which issues instructions from thread 0 and thread 1 to retirement unit 51 according to the operation of control logic 47.
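The per-thread queue-and-MUX arrangement described above can be sketched in software. The following minimal Python model uses illustrative names (`DualThreadStage`, `priority_control`) that do not appear in the patent: two per-thread queues feed a MUX whose select input is driven by control logic.

```python
from collections import deque

class DualThreadStage:
    """Model of one pipeline stage: two per-thread queues feeding a MUX
    whose select input is driven by a control-logic callable."""
    def __init__(self):
        self.queues = {0: deque(), 1: deque()}  # thread 0 / thread 1 queues

    def push(self, thread_id, instr):
        self.queues[thread_id].append(instr)

    def mux_select(self, control_logic):
        """Pop one instruction from the thread chosen by the control logic."""
        tid = control_logic(self.queues)
        if self.queues[tid]:
            return tid, self.queues[tid].popleft()
        return tid, None

def priority_control(priority_tid):
    """A simple control policy: prefer the higher-priority thread, falling
    back to the other thread when the preferred queue is empty."""
    def select(queues):
        if queues[priority_tid]:
            return priority_tid
        return 1 - priority_tid
    return select

stage = DualThreadStage()
stage.push(0, "t0_i0")
stage.push(1, "t1_i0")
tid, instr = stage.mux_select(priority_control(1))  # thread 1 has priority
```

This is only a functional sketch of the selection behavior, not of the hardware timing.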
In FIG. 2, branch prediction circuitry may be added to improve the efficiency of processor system 10. For example, branch prediction circuitry may be added to fetch unit 11. As is known in the art, branch prediction may be based on the past history of an executing code sequence, e.g., whether a branch instruction (e.g., BNE, branch if not equal) will be taken. Once a branch is predicted, the next instructions are loaded into the "pipeline" (e.g., the units that precede execution unit 41) so that if the branch is taken as predicted, the appropriate instructions are immediately available to the execution unit. If the branch prediction is wrong, the instructions in the pipeline are also wrong and must be evicted, and the appropriate instructions loaded into the pipeline.
In one example of a multithreaded processor, two threads may be processed in parallel. Based on the teachings herein, the present invention may be extended to process three or more threads in parallel. In this embodiment, the term "parallel" includes simultaneous and/or sequential processing/execution of instructions. As used herein, when both threads need to use the same resource at the same time, thread priority determines which thread may begin using the shared resource. Thread priority may be indicated by one or more signals stored in processor 10 (e.g., in memory area 4 of FIG. 1). For example, a thread 0 priority signal and a thread 1 priority signal would indicate which of the two threads (thread 0 or thread 1) has the higher priority. In one example, if both signals are deasserted, neither thread has a higher priority than the other.
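The two priority signals just described can be sketched as a small decode function (the function and parameter names are hypothetical): exactly one asserted signal selects that thread, and both signals deasserted means neither thread outranks the other.

```python
def thread_priority(t0_priority: bool, t1_priority: bool):
    """Return the higher-priority thread id, or None when neither thread
    has higher priority. The text describes the one-asserted and
    both-deasserted cases; treating both-asserted as 'no preference'
    is an assumption made here for completeness."""
    if t0_priority and not t1_priority:
        return 0
    if t1_priority and not t0_priority:
        return 1
    return None
```

A caller would consult this result only when both threads contend for the same shared resource at the same time.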
As described above, it may be the case that one thread gains more access to shared resources than another. For example, a first thread may have many decoded micro-operations executing in the processor while a second thread is waiting for a specified result of the first thread's execution. If the first thread fills many shared resources while the second thread waits for the result, processing of the second thread may be severely affected and possibly blocked altogether. For example, the first thread's domination of those resources may effectively prevent the fetching of instructions for the second thread. Significant stalls in the processing of the first and second threads thus result in poor execution performance by the processor. As another example of this problem, a first thread may be performing store operations to a higher-level cache or main memory while a second thread attempts to retrieve an instruction from that higher-level cache or main memory. Data operations on a memory are typically assigned a higher priority than instruction fetch operations on the same memory. Thus, if the first thread has a large number of store operations to execute, the second thread will be effectively blocked from fetching instructions and prevented from making forward progress.
According to an embodiment of the invention, instruction-side starvation can be detected for any one of a plurality of threads. Referring now to FIG. 3, a state diagram for detecting and resolving I-side starvation is shown. In one embodiment, the indication that I-side starvation may be approaching (i.e., "concern over I-side starvation") is based on the satisfaction of a number of conditions. Generally, I-side starvation refers to a thread being unable to fetch instructions because other threads have effectively blocked it from doing so. As used herein, an indication of approaching I-side starvation is an indication that this situation may be imminent for a thread. The first condition for approaching I-side starvation is that the processor is in a multi-threaded processing mode, as opposed to single-threaded processor mode, and more than one thread is active. In FIG. 3, block 101 indicates that the processor is in single-threaded (ST) mode. This may mean that a control signal indicates this mode has been set, or that the processor, while handling two threads, has one of those threads halted in execution. In this case, control starts at block 103 (ST mode). If both threads are active (block 105), because the processor is at least attempting to fetch and/or execute instructions from at least the first and second threads, control transfers to block 107 (normal MT (multi-threaded) mode). As indicated above, the indication that a thread may become instruction-side starved is based on the satisfaction of several conditions. When all of these conditions are met (block 109), control moves to block 111. The first condition, mentioned above, is that the processor is in multi-threaded mode. The other conditions may include:
2. The thread under consideration (e.g., thread 0 or thread 1) does not have any instructions in the execution pipeline (e.g., no instruction of that thread is waiting at MUX 35 for schedule control logic 37 to pass a micro-operation to execution unit 41 (FIG. 2)).
3. The issuance of new instructions into the execution pipeline is not blocked because the thread under consideration has filled a required resource. In this embodiment, the execution pipeline includes the processing of instructions from MUX 35 through execution unit 41. For example, execution unit 41 may include a store buffer for the thread under consideration that is filled with store instructions. In this case, the thread's processing, while not necessarily suffering from a lack of instruction fetching, is affected by the delay in executing the store instructions. Furthermore, taking measures to increase instruction fetching would generally not appreciably improve the thread's performance, since the lack of resource availability would negatively affect execution of those instructions anyway.
4. No thread other than the thread under consideration has been given full or exclusive access to the processor. In that case, any instruction starvation of the thread under consideration is intentional.
5. The thread under consideration is in a state in which it is attempting to fetch instructions. For example, many processors, including those produced by Intel Corporation (Santa Clara, California), include a "Stop Clock" pin. An active signal on this pin causes the processor to drain its resources. In such a case, all resources may be drained of executable instructions for the thread under consideration, so the lack of instruction fetching would not be considered starvation. Switching from multi-threaded mode to single-threaded mode is another example in which instruction starvation should not be considered a problem.
6. No higher-order performance-saving protocol is active. For example, if another protocol is in effect that switches priority from one thread to another, running such a protocol together with the instruction starvation handling of the present invention could negatively impact processor performance.
7. The instruction starvation enable bit is set (i.e., a bit that control logic can clear to turn off I-side starvation detection/resolution).
8. The thread under consideration is not currently waiting on an instruction fetch that has gone off-chip (e.g., off the processor, such as to main memory).
In this embodiment, if all of the monitored conditions are met, there is an indication of approaching I-side starvation for the thread. Although eight conditions are given above, the present invention may be practiced with additional or fewer conditions. For example, an indication of approaching I-side starvation may be based solely on conditions 1, 2, and 5 above being true. Additionally, the flow of FIG. 3 may be implemented through appropriately configured control logic (e.g., control logic 37 in FIG. 2). Alternatively, the control logic may be a sub-module of processor 10 that executes instructions to implement the flow of FIG. 3.
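The monitored conditions can be expressed as a single predicate over the thread's state. The sketch below uses hypothetical field names for that state; the patent specifies the conditions themselves, not this encoding.

```python
from dataclasses import dataclass

@dataclass
class ThreadState:
    """Hypothetical per-thread state bits corresponding to the eight
    conditions; defaults describe a thread approaching starvation."""
    multithreaded_mode: bool = True            # 1. MT mode, >1 thread active
    has_instructions_in_pipeline: bool = False # 2. nothing in execution pipeline
    blocked_by_own_full_resource: bool = False # 3. not blocked by its own full resource
    other_thread_exclusive: bool = False       # 4. no other thread has exclusive access
    attempting_fetch: bool = True              # 5. trying to fetch instructions
    higher_order_protocol_active: bool = False # 6. no overriding protocol running
    starvation_detection_enabled: bool = True  # 7. enable bit set
    waiting_off_chip_fetch: bool = False       # 8. no off-chip fetch pending

def approaching_i_side_starvation(s: ThreadState) -> bool:
    """All eight conditions must hold before the threshold counter starts."""
    return (s.multithreaded_mode
            and not s.has_instructions_in_pipeline
            and not s.blocked_by_own_full_resource
            and not s.other_thread_exclusive
            and s.attempting_fetch
            and not s.higher_order_protocol_active
            and s.starvation_detection_enabled
            and not s.waiting_off_chip_fetch)
```

A reduced variant (conditions 1, 2, and 5 only, as the text allows) would simply drop the other terms.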
Returning now to FIG. 3, in this embodiment, if all eight conditions are satisfied, control passes to block 111, which means that there is an indication that instruction starvation of the thread may occur. Accordingly, the I-side starvation threshold counter is started to perform a counting operation. In this embodiment, threshold counter 53 (FIG. 1) may be a down counter that counts down from a loaded value to 0 according to the system clock. The value to be loaded into this counter may be set, for example, by control logic or by microcode operating in the processor (or implemented by any other hardware or firmware). If any of the above conditions is no longer valid (block 112), control passes to block 113, which indicates that I-side starvation is no longer a concern. If threshold counter 53 reaches a predetermined value (e.g., a timeout, or counting down to 0) (block 114), control passes to block 115, indicating that the thread is instruction-side starved. In this embodiment, threshold counter 53 gives the thread under consideration an opportunity to load instructions, thereby negating one or more of the above conditions.
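The threshold counter's behavior just described, loading a value, counting down each clock, and signaling starvation at zero, can be sketched as follows (the class name is illustrative):

```python
class ThresholdCounter:
    """Down counter: loaded with a value, decremented once per clock tick;
    tick() returns True once the count has reached 0 (starvation declared)."""
    def __init__(self, load_value: int):
        self.value = load_value

    def tick(self) -> bool:
        if self.value > 0:
            self.value -= 1
        return self.value == 0

counter = ThresholdCounter(3)
expired = [counter.tick() for _ in range(3)]  # False until the third tick
```

In the state diagram, each tick corresponds to a clock cycle spent in block 111; clearing any condition before expiry would abandon the count (block 112).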
According to embodiments of the invention, instruction-side starvation of a thread may be resolved in order to restart instruction fetch operations for the starved thread. Returning to FIG. 3, control stays at block 115 while the instruction-starved thread does not have priority (e.g., as indicated by the thread 0 priority and thread 1 priority signals) and any lock instruction is active (block 116). In this embodiment, a lock instruction is an instruction that requires exclusive access to a memory location. For example, an "atomic" operation is one in which a data value is retrieved from a memory location, modified, and then stored back to the same memory location. During such an atomic operation, the specified memory location must be locked so that there is no intervening access to it until the operation completes. When priority is assigned to the instruction-starved thread and no locking mechanism is active (block 117), control passes to block 118 to actively resolve the I-side starvation. In embodiments of the invention, resolution of I-side starvation includes execution of one or more operations that enable instruction fetching for the starved thread. This may be accomplished by performing one or more of the following operations:
1. Moving instructions of the non-starved thread from the execution pipeline to a temporary storage area (e.g., replay queue 33a) for later execution. Alternatively, the active instructions may be "killed" and reissued at a later time.
2. Preventing lock instructions of the non-starved thread from being initiated.
3. Evicting all write-back buffers in the cache to free those resources for the instruction-starved thread.
4. Resetting reservation registers for the cache (e.g., removing exclusive access to resources that may have been set for the non-starved thread).
5. Having control logic 37 not select instructions from the non-starved thread.
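Resolution operation 1 above (moving the non-starved thread's instructions out of the execution pipeline into a temporary storage area) can be sketched as follows; `Instr` and the function name are illustrative, not the patent's.

```python
from collections import namedtuple

# Minimal instruction record: which thread owns it, and its operation.
Instr = namedtuple("Instr", ["thread", "op"])

def resolve_i_side_starvation(pipeline, replay_queue, nonstarved_tid):
    """Move the non-starved thread's instructions from the pipeline into a
    temporary storage area (modeling replay queue 33a) for later execution.
    Returns the number of instructions moved."""
    moved = [i for i in pipeline if i.thread == nonstarved_tid]
    pipeline[:] = [i for i in pipeline if i.thread != nonstarved_tid]
    replay_queue.extend(moved)
    return len(moved)

pipeline = [Instr(0, "load"), Instr(1, "store"), Instr(1, "store")]
replay = []
moved_count = resolve_i_side_starvation(pipeline, replay, nonstarved_tid=1)
```

After this step, the starved thread (thread 0 here) has the pipeline to itself, and the moved instructions can be reissued from the replay queue later.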
Once any one of the conditions indicating approaching I-side starvation is no longer met, control moves back to block 113 to reset the state of the processor, indicating that there is no longer an immediate concern for instruction-side starvation of any thread.
With the method and apparatus of the present invention, detection and resolution of instruction-side starvation of threads can be handled in an efficient manner. Embodiments of the invention can significantly reduce the time a thread denied access to processor resources must wait before gaining access.
Although several embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Claims (5)
1. A method for assigning thread priority in a multithreaded processor executing instructions of at least first and second threads, comprising:
determining whether instruction fetch operations of a first thread will be blocked due to processing of instructions of a second thread;
setting a threshold counter to perform a counting operation in response to the determining operation;
assigning a priority to the first thread in the processor if it is determined that an instruction fetch operation of the first thread will be blocked due to processing of an instruction of the second thread and the counting operation has finished.
2. The method of claim 1, further comprising:
after the threshold counter ends its count operation, an instruction fetch operation of the first thread is performed.
3. The method of claim 2, further comprising:
moving an instruction in an execution pipeline of the processor from a second thread to a temporary storage area.
4. A multithreaded processor comprising:
first and second thread queues;
a threshold counter;
control logic coupled to the first and second thread queues, the control logic to determine whether instruction fetch operations of a first thread will be blocked due to processing of instructions of a second thread and to set the threshold counter to perform count operations,
wherein the control logic assigns a priority to a first thread in the processor if an instruction fetch operation of the first thread will be blocked due to processing of an instruction of a second thread and the count operation is complete.
5. The processor of claim 4, further comprising an execution pipeline and a temporary storage area, wherein the control logic is to move an instruction in the execution pipeline of the processor from the second thread to the temporary storage area.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/888,274 | 2001-06-22 | ||
| US09/888,274 US6651158B2 (en) | 2001-06-22 | 2001-06-22 | Determination of approaching instruction starvation of threads based on a plurality of conditions |
| PCT/US2002/012340 WO2003001368A1 (en) | 2001-06-22 | 2002-04-18 | Method and apparatus for resolving instruction starvation in a multithreaded processor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1069648A1 HK1069648A1 (en) | 2005-05-27 |
| HK1069648B true HK1069648B (en) | 2010-06-25 |