US20030163678A1 - Method and apparatus for reducing branch prediction table pollution - Google Patents
Method and apparatus for reducing branch prediction table pollution Download PDFInfo
- Publication number
- US20030163678A1 US20030163678A1 US09/507,499 US50749900A US2003163678A1 US 20030163678 A1 US20030163678 A1 US 20030163678A1 US 50749900 A US50749900 A US 50749900A US 2003163678 A1 US2003163678 A1 US 2003163678A1
- Authority
- US
- United States
- Prior art keywords
- branch
- address
- branch target
- bits
- btb
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
Definitions
- the technical field is computer architectures that use branch prediction as a means to improve processing performance.
- Modem microprocessors frequently use long pipelines to process instructions.
- a side effect of these long pipelines is an increase in the penalty for branches, which must redirect the instruction sequence.
- this branching behavior requires flushing at least a portion of the pipeline, thereby degrading pipeline performance.
- Branch prediction structures are commonly implemented in hardware to mitigate this penalty.
- a branch prediction structure may predict branch targets and may store the branch target information in a branch prediction table.
- some branch target information that is stored in the branch target structure may be incorrect. These errors may occur because in some cases, only a portion of a target address is stored in the branch prediction table. In these cases, the remainder of the target address is inferred, typically using bits from the current instruction address. If this assumption is incorrect, entries in the branch prediction structure can be wasted and/or cause inefficient branch prediction. This incorrect information cannot be used for subsequent branch predictions and so is useless. The presence of this useless information is referred to as branch pollution.
- a comparator compares aliasing bits of a predicted branch target to corresponding bits of a current instruction address.
- the address comparison of the aliasing bits is made to determine if a branch target address is outside of a branch target range for a branch prediction structure. If the aliasing bits match, then assumptions about the branch target address being in a same memory block as the current instruction are correct, and the branch prediction is useable. If the aliasing bits do not match, then the branch prediction will be incorrect.
- the results of the comparison are stored in a branch resolution table.
- the branch resolution table stores branches that are in the pipeline but that have not yet retired.
- a branch instruction retires a corresponding branch entry is accessed and a comparison result bit is examined. If the comparison result bit indicates that the branch target did not alias, the branch entry is allowed to update into the branch prediction structure so that future occurrences of the branch can be predicted. Otherwise, the branch entry will not be inserted. Avoiding insertion of the branch entry when the entry would have provided an incorrect branch target saves entry space in the branch prediction structure that can be used for more useful predictions, and potentially prevents additional incorrect predictions that may result from using an incorrect branch target.
- the same comparison result bit flows down the pipeline with the rest of the instruction until retirement of the instruction. At retirement, if the comparison result bit indicates that the aliasing bits match, then the entry is allowed to be inserted into the branch prediction structure.
- FIG. 1 is a block diagram of a computer system using a branch target buffer
- FIG. 2 illustrates a branch target buffer
- FIG. 3 illustrates a processing pipeline used in conjunction with the branch target buffer of FIG. 2;
- FIG. 4 is a flowchart illustrating processes executed in conjunction with the branch target buffer of FIG. 1.
- a branch prediction structure predicts the target of a branch as an instruction fetch unit fetches an instruction.
- the prediction function is speculative and may be wrong.
- the processor is able to detect and recover when an incorrect prediction is made. Predictions made by the branch prediction structure of targets of direct branches may be verified downstream by a branch address calculator. If the branch prediction structure does not provide a prediction, the branch address calculator may calculate the targets and re-steer the fetch unit. Finally, once a branch is identified, the branch prediction structure may predict the target of that branch instruction.
- branch targets for many branches are known early in the pipeline, but the branch targets are not entered into the branch prediction structure until retirement of the branch instruction.
- One of the data fields that may be included in such branch prediction structures is the branch target address. Due to space or timing constraints, the computer system, in particular the branch prediction structure, may only store a portion of the branch target address. The remaining data bits (referred to as aliasing bits) are implicit from the address of the branch itself. The tacit assumption is that the branch instruction targets another address within a same memory range as the current instruction.
- the predicted branch target is only valid if the branch target is in the same 1 MByte range (2 to the 20 th power) as the branch instruction itself. If any of the upper 12 bits do not match, then this assumption is incorrect. Allowing incorrect branch target predictions to enter the branch prediction structure wastes an entry because the entry is not likely to ever correctly predict the branch target. This condition is called pollution of the branch prediction structure.
- a branch target buffer can be used to provide dynamic branch prediction. That is, the BTB predicts branches early in a fetch pipeline to minimize the penalty that results from flushing and re-steering the target of the branch, once the branch target address is determined.
- instruction execution may be predicted to continue without branching. Any predicted taken branches may have a clock delay of one or, often, more, cycles.
- the BTB may store a history of branch predictions. Then, during the process of instruction fetch, the instruction address is checked with the entries in the BTB. If the address is not in the BTB, instruction execution is predicted to continue to the next instruction without branching behavior.
- FIG. 1 shows a computer system 5 that incorporates branch prediction.
- the system 5 includes one or more processors 12 i and a memory subsystem 16 .
- Each processor 12 i may also include an on-chip memory controller and/or cache memory 17 , as is well known in the art.
- An instruction fetch unit (IFU) 18 in a processor 12 initiates an instruction fetch request for one or more instructions to the memory controller 17 , which may also access the memory subsystem 16 according to principles well known in the art, and controls processing according to a specified pipeline design.
- a branch target buffer (BTB) 10 uses the instruction fetch address to predict whether the fetched instructions may contain a branch or not. If a branch is predicted to be taken, the IFU 18 will redirect program flow to the target of the branch.
- BTB branch target buffer
- BAC 14 branch address calculator
- the BAC 14 decodes the instruction returned from the memory controller 17 , and calculates branch sense and/or target address information.
- the BAC 14 calculated information may be more accurate than the BTB 10 information, since actual instruction data is being used to perform the calculations. For example, branch targets that are encoded in the instruction, e.g., direct branches, can be accurately determined by the BAC 14 .
- the BAC 14 will compare the calculated branch information against the prediction made by the BTB 10 . If the BTB 10 failed to predict a branch, or if the BTB 10 predicted sense and/or target address is determined to be incorrect, the BAC 14 will cause the IFU 18 to redirect the program flow in accordance with the calculated BAC 14 information.
- the BAC 14 includes a Branch Resolution Table (BRT) 15 .
- the BRT 15 is used to store information about the branch. This information is used during processing in the pipeline 19 , through a retirement stage, at which time actual branch taken/not taken sense and branch target address is known for certainty. Note that the sense and/or branch target addresses for some branches may be known with certainty before retirement. For example, the branch target address for direct branches may be known with certainty by the BAC 14 .
- branch information can be pipelined along with the instruction to the execution and retirement pipeline 19 .
- Branch information stored in the BRT 15 and/or in the pipeline 19 is often used to update the BTB 10 with branch sense and target information. In an embodiment, this information may not be stored until the actual sense and/or target address is known, i.e., at retirement. As an example, retirement logic in the execution and retirement pipeline 19 can be sent to the BAC 14 . This information, combined with information stored in the BRT 15 , can be used to update the BTB 10 .
- BTB predictions are made solely on the basis of an instruction address, whereas the BAC 14 actually examines the instruction data and determines what the branch target is for direct branches where the target is encoded in the instruction itself.
- FIG. 1 illustrates one possible arrangement of the computer system 5 .
- other component arrangements are possible that will allow reduction of branch prediction table pollution.
- FIG. 2 shows an example of a BTB, such as the BTB 10 , that may be used for dynamic branch prediction.
- each such processor such as the processor 12
- an instruction address 30 is generated.
- a portion of the bits, such as BTB index bits 32 are used to index into the BTB 10 using a decoder 44 .
- the BTB has 128 entries, so that 7 index bits 32 are required to uniquely index each entry in the BTB 10 .
- tag bits 31 are compared to entry tag 21 to determine whether an entry selected by the index bits 32 in the BTB 10 pertain to a current instruction address 30 . As is common in the art, only a portion of the tag bits 31 may be stored in the entry tag 21 of an entry 20 .
- Additional fields 24 are provided in each BTB entry which are well known in the art. For instance, additional fields may include branch prediction taken/not taken history or branch type.
- a branch target field 23 in the BTB 10 indicates that only a portion of a branch target 40 is stored in the BTB.
- One or more alias bits 41 are not stored in the BTB 10 ; the remaining bits will be implied from the current instruction address 30 when the BTB entry 20 i is used to predict a branch. Only storing a partial branch target, often chosen due to space or timing constraints, results in the potential to incorrectly predict a branch if the alias bits do not, in fact, match the address of the branch instruction itself 31 .
- FIG. 3 illustrates a simplified processing pipeline 101 that maybe used in conjunction with the BTB 10 .
- the pipeline 101 includes a main processing pipeline 110 , a branch target pipeline 120 and a branch address pipeline 130 . Processing in the pipelines 110 , 120 and 130 may occur in parallel.
- the main pipeline 110 may include one or more instruction fetch stages 112 , an instruction execute stage 114 , and a retirement stage 116 . As indicated in FIG. 3, numerous other stages may be included in the main stage 110 .
- the branch target pipeline 120 may include one or more branch target stages 122 in which the BTB 10 predicts a branch taken or not taken.
- the branch address pipeline 130 includes one or more branch address stages 132 , in which the branch address is checked by the branch address comparator 14 .
- an instruction address as stored in the memory 16 may comprise 32 bits. However, instead of storing all 32 bits of an instruction address for a target branch, the BTB 10 may implement only a subset of the address bits, under the assumption that a target branch address is likely to be close to a current instruction address. In an embodiment, only 20 bits of the branch target address are stored in the branch target field 23 of the BTB 10 . The remaining 12 bits of the branch target address 41 are implied based on the address of the current instruction. Thus, when a predicted branch is taken, the upper 12 bits of the current instruction address are prepended to the lower 20 bits of the branch target address, with the lower 20 bits of the branch target address stored in the branch target field 26 of the BTB 10 . In an embodiment, the branch target address is then assumed to be within a 1 Mbyte memory block, or branch target range.
- a normal sequence may start with the BTB 10 empty of any entries.
- the IFU 18 reads through the BTB 10 , but because the BTB 10 contains no entries, the processing continues to the branch address calculator (BAC) stage of the pipeline.
- the BAC 14 determines if an address of the branch target is more than 1 Mbyte away from the current instruction address. This determination is made by comparing the upper bits of the BAC 14 calculated address to the corresponding upper bits of the current instruction, that is, the branch instruction. Since the BTB 10 had no entry for the branch, and was therefore unable to predict the branch at all, the BAC 14 will need to re-steer instruction fetch to the target of the branch. After the retirement stage 116 , the branch target address is written to the BTB 10 .
- the address written to the BTB 10 is truncated to 20 bits. This may cause unnecessary flushing and re-steering, unless a mechanism is provided to detect this error.
- the BTB 10 and BAC 14 will again encounter the branch target instruction. However, this time the instruction address has an entry in the BTB 10 .
- the BTB 10 will construct the predicted branch target by concatenating the partial target address bits stored in the BTB 10 and the implied (or aliasing) bits from the current fetch address. That is, the remaining 12 bits of the current instruction address are prepended to the lower 20 bits of the branch target address.
- the processor will then re-steer to that target address, which is within 1 MByte of the current instruction address, but which is incorrect.
- the BAC 14 will note the incorrect address and flush the pipeline, invalidate the BTB entry, and re-steer again. Then, processing of the instruction will continue through the pipeline 110 . When the instruction processing reaches the retirement stage 116 , the branch target instruction address will be allocated back into the BTB 10 . The next instance of this instruction will therefore also result in a flush and re-steer.
- the BAC 14 compares the upper unimplemented bits of the target address (e.g., the upper 12 bits or aliasing bits) with corresponding bits in the current instruction address. If the aliasing bits match the corresponding bits in the address of the branch instruction, then the assumptions about the branch target address being in the same memory block as the address of the current instruction are correct, and the prediction is usable. If the result of the comparison is no match, the branch target prediction is incorrect.
- the target address e.g., the upper 12 bits or aliasing bits
- the result of comparing the aliasing bits is stored in the BRT 15 (see FIG. 1).
- Each result or entry includes a comparison bit that indicates if the aliasing bits in the predicted target and the address of the branch instruction match.
- the BRT 15 stores branches that are in the pipeline 19 , but that have not been retired. When the branch retires, the corresponding branch entry is accessed and the comparison bit is examined. If the comparison bit indicates that the branch target address did not alias, the BTB 10 is updated with the branch information. If the comparison bit indicates no match, the BTB 10 is not updated with the branch information. This prevents the recording of a branch target address that will cause an extra flush and re-steer.
- the comparison bit may be set to 1 if the comparison indicates no match.
- Other encoding mechanisms may be used to suppress updating of the BTB 10 .
- a comparison bit may be set with the instruction.
- the comparison bit will flow down the pipeline with the rest of the instruction until the instruction retires. At retirement, if the comparison bit indicates that the aliasing bits match, then the entry is allowed to be inserted into the BTB 10 .
- the BTB 10 may be updated before retirement.
- the result of the aliasing bit comparison is used to determine whether an entry should be allocated to the BTB 10 .
- FIG. 4 illustrates a process used to reduce branch prediction table pollution.
- the process starts at 100 .
- the BTB 10 predicts a branch target address, Block 110 .
- the instruction is fetched, further processed and sent to the BAC 14 , Block 120 .
- the BAC 14 then computes the branch target address 130 and compares the aliasing bits of the computed and predicted branch target address, Block 140 .
- the results of the aliasing bit comparison are encoded and stored, e.g., in the pipeline 19 or in the BRT 15 , Blocks 150 , 160 .
- the stored comparison bit is examined, Block 180 .
- Block 190 If the encoding of the comparison bit indicates that the aliasing bits matched, the BTB 10 is updated with the branch information, Block 190 . If the encoding of the comparison bit indicates that the aliasing bits do not match, the BTB update is suppressed, Block 195 . In Block 200 , the process ends.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Description
- The technical field is computer architectures that use branch prediction as a means to improve processing performance.
- Modem microprocessors frequently use long pipelines to process instructions. A side effect of these long pipelines is an increase in the penalty for branches, which must redirect the instruction sequence. Usually, this branching behavior requires flushing at least a portion of the pipeline, thereby degrading pipeline performance. Branch prediction structures are commonly implemented in hardware to mitigate this penalty.
- A branch prediction structure may predict branch targets and may store the branch target information in a branch prediction table. However, some branch target information that is stored in the branch target structure may be incorrect. These errors may occur because in some cases, only a portion of a target address is stored in the branch prediction table. In these cases, the remainder of the target address is inferred, typically using bits from the current instruction address. If this assumption is incorrect, entries in the branch prediction structure can be wasted and/or cause inefficient branch prediction. This incorrect information cannot be used for subsequent branch predictions and so is useless. The presence of this useless information is referred to as branch pollution.
- A comparator compares aliasing bits of a predicted branch target to corresponding bits of a current instruction address. The address comparison of the aliasing bits is made to determine if a branch target address is outside of a branch target range for a branch prediction structure. If the aliasing bits match, then assumptions about the branch target address being in a same memory block as the current instruction are correct, and the branch prediction is useable. If the aliasing bits do not match, then the branch prediction will be incorrect.
- The results of the comparison are stored in a branch resolution table. The branch resolution table stores branches that are in the pipeline but that have not yet retired. When a branch instruction retires, a corresponding branch entry is accessed and a comparison result bit is examined. If the comparison result bit indicates that the branch target did not alias, the branch entry is allowed to update into the branch prediction structure so that future occurrences of the branch can be predicted. Otherwise, the branch entry will not be inserted. Avoiding insertion of the branch entry when the entry would have provided an incorrect branch target saves entry space in the branch prediction structure that can be used for more useful predictions, and potentially prevents additional incorrect predictions that may result from using an incorrect branch target.
- In an alternative embodiment, the same comparison result bit flows down the pipeline with the rest of the instruction until retirement of the instruction. At retirement, if the comparison result bit indicates that the aliasing bits match, then the entry is allowed to be inserted into the branch prediction structure.
- The detailed description refers to the following drawings in which like numerals refer to like items, and wherein:
- FIG. 1 is a block diagram of a computer system using a branch target buffer;
- FIG. 2 illustrates a branch target buffer;
- FIG. 3 illustrates a processing pipeline used in conjunction with the branch target buffer of FIG. 2; and
- FIG. 4 is a flowchart illustrating processes executed in conjunction with the branch target buffer of FIG. 1.
- During instruction processing in modem computer systems, the processing may follow one or more branches that cannot be predicted with certainty in advance. An incorrect branch prediction may result in a significant processing penalty. In particular, with a deeply pipelined machine, a branch penalty, on the order of several cycles, may occur. Clock cycles are wasted if the computer system waits until the branch target is determined to start fetching instructions after the branch. To avoid this delay, a branch prediction structure predicts the target of a branch as an instruction fetch unit fetches an instruction. The prediction function is speculative and may be wrong. However, the processor is able to detect and recover when an incorrect prediction is made. Predictions made by the branch prediction structure of targets of direct branches may be verified downstream by a branch address calculator. If the branch prediction structure does not provide a prediction, the branch address calculator may calculate the targets and re-steer the fetch unit. Finally, once a branch is identified, the branch prediction structure may predict the target of that branch instruction.
- In one implementation of such a branch prediction structure, branch targets for many branches are known early in the pipeline, but the branch targets are not entered into the branch prediction structure until retirement of the branch instruction. One of the data fields that may be included in such branch prediction structures is the branch target address. Due to space or timing constraints, the computer system, in particular the branch prediction structure, may only store a portion of the branch target address. The remaining data bits (referred to as aliasing bits) are implicit from the address of the branch itself. The tacit assumption is that the branch instruction targets another address within a same memory range as the current instruction. For instance, if the lower 20 bits out of 32 are stored for the branch target, then the predicted branch target is only valid if the branch target is in the same 1 MByte range (2 to the 20th power) as the branch instruction itself. If any of the upper 12 bits do not match, then this assumption is incorrect. Allowing incorrect branch target predictions to enter the branch prediction structure wastes an entry because the entry is not likely to ever correctly predict the branch target. This condition is called pollution of the branch prediction structure.
- Enhancements to the branch prediction structure help correctly predict a branch to be followed, thereby increasing the efficiency of the processing. In particular, a branch target buffer (BTB) can be used to provide dynamic branch prediction. That is, the BTB predicts branches early in a fetch pipeline to minimize the penalty that results from flushing and re-steering the target of the branch, once the branch target address is determined. In general, if an instruction address is not recorded in the BTB, instruction execution may be predicted to continue without branching. Any predicted taken branches may have a clock delay of one or, often, more, cycles. Finally, the BTB may store a history of branch predictions. Then, during the process of instruction fetch, the instruction address is checked with the entries in the BTB. If the address is not in the BTB, instruction execution is predicted to continue to the next instruction without branching behavior.
- FIG. 1 shows a
computer system 5 that incorporates branch prediction. Thesystem 5 includes one or more processors 12 i and amemory subsystem 16. Each processor 12 i may also include an on-chip memory controller and/orcache memory 17, as is well known in the art. An instruction fetch unit (IFU) 18 in a processor 12 initiates an instruction fetch request for one or more instructions to thememory controller 17, which may also access thememory subsystem 16 according to principles well known in the art, and controls processing according to a specified pipeline design. A branch target buffer (BTB) 10 uses the instruction fetch address to predict whether the fetched instructions may contain a branch or not. If a branch is predicted to be taken, theIFU 18 will redirect program flow to the target of the branch. Information about taken branches, including the predicted sense of the branch (i.e., taken or not taken) and the predicted target of the branch, is sent down the pipeline to a branch address calculator (BAC) 14. TheBAC 14 decodes the instruction returned from thememory controller 17, and calculates branch sense and/or target address information. TheBAC 14 calculated information may be more accurate than theBTB 10 information, since actual instruction data is being used to perform the calculations. For example, branch targets that are encoded in the instruction, e.g., direct branches, can be accurately determined by theBAC 14. TheBAC 14 will compare the calculated branch information against the prediction made by theBTB 10. If theBTB 10 failed to predict a branch, or if theBTB 10 predicted sense and/or target address is determined to be incorrect, theBAC 14 will cause theIFU 18 to redirect the program flow in accordance with thecalculated BAC 14 information. - In an embodiment, the
BAC 14 includes a Branch Resolution Table (BRT) 15. TheBRT 15 is used to store information about the branch. This information is used during processing in thepipeline 19, through a retirement stage, at which time actual branch taken/not taken sense and branch target address is known for certainty. Note that the sense and/or branch target addresses for some branches may be known with certainty before retirement. For example, the branch target address for direct branches may be known with certainty by theBAC 14. - In another embodiment, the branch information can be pipelined along with the instruction to the execution and
retirement pipeline 19. - Branch information stored in the
BRT 15 and/or in thepipeline 19 is often used to update theBTB 10 with branch sense and target information. In an embodiment, this information may not be stored until the actual sense and/or target address is known, i.e., at retirement. As an example, retirement logic in the execution andretirement pipeline 19 can be sent to theBAC 14. This information, combined with information stored in theBRT 15, can be used to update theBTB 10. - A key distinction between the BTB predictions and the BAC predictions is that the BTB predictions are made solely on the basis of an instruction address, whereas the
BAC 14 actually examines the instruction data and determines what the branch target is for direct branches where the target is encoded in the instruction itself. - FIG. 1 illustrates one possible arrangement of the
computer system 5. As would be obvious to those skilled in the art, other component arrangements are possible that will allow reduction of branch prediction table pollution. - FIG. 2 shows an example of a BTB, such as the
BTB 10, that may be used for dynamic branch prediction. In a computer system with multiple processors, each such processor, such as the processor 12, may include aBTB 10. During instruction fetch by anIFU 18, aninstruction address 30 is generated. A portion of the bits, such asBTB index bits 32, are used to index into theBTB 10 using adecoder 44. For the example shown, the BTB has 128 entries, so that 7index bits 32 are required to uniquely index each entry in theBTB 10. Once an entry is selected,tag bits 31 are compared toentry tag 21 to determine whether an entry selected by theindex bits 32 in theBTB 10 pertain to acurrent instruction address 30. As is common in the art, only a portion of thetag bits 31 may be stored in theentry tag 21 of an entry 20. -
Additional fields 24 are provided in each BTB entry which are well known in the art. For instance, additional fields may include branch prediction taken/not taken history or branch type. - A
branch target field 23 in theBTB 10 indicates that only a portion of abranch target 40 is stored in the BTB. One ormore alias bits 41 are not stored in theBTB 10; the remaining bits will be implied from thecurrent instruction address 30 when the BTB entry 20 i is used to predict a branch. Only storing a partial branch target, often chosen due to space or timing constraints, results in the potential to incorrectly predict a branch if the alias bits do not, in fact, match the address of the branch instruction itself 31. - FIG. 3 illustrates a
simplified processing pipeline 101 that maybe used in conjunction with theBTB 10. Thepipeline 101 includes amain processing pipeline 110, abranch target pipeline 120 and abranch address pipeline 130. Processing in thepipelines main pipeline 110 may include one or more instruction fetchstages 112, an instruction executestage 114, and aretirement stage 116. As indicated in FIG. 3, numerous other stages may be included in themain stage 110. Thebranch target pipeline 120 may include one or more branch target stages 122 in which theBTB 10 predicts a branch taken or not taken. Finally, thebranch address pipeline 130 includes one or more branch address stages 132, in which the branch address is checked by thebranch address comparator 14. - As noted above, an instruction address as stored in the
memory 16 may comprise 32 bits. However, instead of storing all 32 bits of an instruction address for a target branch, theBTB 10 may implement only a subset of the address bits, under the assumption that a target branch address is likely to be close to a current instruction address. In an embodiment, only 20 bits of the branch target address are stored in thebranch target field 23 of theBTB 10. The remaining 12 bits of thebranch target address 41 are implied based on the address of the current instruction. Thus, when a predicted branch is taken, the upper 12 bits of the current instruction address are prepended to the lower 20 bits of the branch target address, with the lower 20 bits of the branch target address stored in the branch target field 26 of theBTB 10. In an embodiment, the branch target address is then assumed to be within a 1 Mbyte memory block, or branch target range. - A normal sequence may start with the
BTB 10 empty of any entries. TheIFU 18 reads through theBTB 10, but because theBTB 10 contains no entries, the processing continues to the branch address calculator (BAC) stage of the pipeline. TheBAC 14 determines if an address of the branch target is more than 1 Mbyte away from the current instruction address. This determination is made by comparing the upper bits of theBAC 14 calculated address to the corresponding upper bits of the current instruction, that is, the branch instruction. Since theBTB 10 had no entry for the branch, and was therefore unable to predict the branch at all, theBAC 14 will need to re-steer instruction fetch to the target of the branch. After theretirement stage 116, the branch target address is written to theBTB 10. However, the address written to theBTB 10 is truncated to 20 bits. This may cause unnecessary flushing and re-steering, unless a mechanism is provided to detect this error. In particular, theBTB 10 andBAC 14 will again encounter the branch target instruction. However, this time the instruction address has an entry in theBTB 10. TheBTB 10 will construct the predicted branch target by concatenating the partial target address bits stored in theBTB 10 and the implied (or aliasing) bits from the current fetch address. That is, the remaining 12 bits of the current instruction address are prepended to the lower 20 bits of the branch target address. The processor will then re-steer to that target address, which is within 1 MByte of the current instruction address, but which is incorrect. TheBAC 14 will note the incorrect address and flush the pipeline, invalidate the BTB entry, and re-steer again. Then, processing of the instruction will continue through thepipeline 110. When the instruction processing reaches theretirement stage 116, the branch target instruction address will be allocated back into theBTB 10. The next instance of this instruction will therefore also result in a flush and re-steer. - To avoid this problem, the
BAC 14 compares the upper unimplemented bits of the target address (e.g., the upper 12 bits or aliasing bits) with corresponding bits in the current instruction address. If the aliasing bits match the corresponding bits in the address of the branch instruction, then the assumptions about the branch target address being in the same memory block as the address of the current instruction are correct, and the prediction is usable. If the result of the comparison is no match, the branch target prediction is incorrect. - The result of comparing the aliasing bits is stored in the BRT15 (see FIG. 1). Each result or entry includes a comparison bit that indicates if the aliasing bits in the predicted target and the address of the branch instruction match. The
BRT 15 stores branches that are in thepipeline 19, but that have not been retired. When the branch retires, the corresponding branch entry is accessed and the comparison bit is examined. If the comparison bit indicates that the branch target address did not alias, theBTB 10 is updated with the branch information. If the comparison bit indicates no match, theBTB 10 is not updated with the branch information. This prevents the recording of a branch target address that will cause an extra flush and re-steer. Subsequent comparison of the branch target address will also result in assertion of a bit assertion of a bit to suppress update of an entry in theBTB 10 for the particular instruction address. Thus, at most one re-steer will be required for a mis-predicted branch target address. - In an embodiment, the comparison bit may be set to 1 if the comparison indicates no match. Other encoding mechanisms may be used to suppress updating of the
BTB 10. - As an alternative to setting the comparison bit upon completion of the comparison by the
BAC 14, a comparison bit may be set with the instruction. In this alternative, the comparison bit will flow down the pipeline with the rest of the instruction until the instruction retires. At retirement, if the comparison bit indicates that the aliasing bits match, then the entry is allowed to be inserted into theBTB 10. - As another alternative, the
BTB 10 may be updated before retirement. As in other alternatives described above, the result of the aliasing bit comparison is used to determine whether an entry should be allocated to theBTB 10. - FIG. 4 illustrates a process used to reduce branch prediction table pollution. The process starts at100. The
BTB 10 predicts a branch target address,Block 110. The instruction is fetched, further processed and sent to theBAC 14,Block 120. TheBAC 14 then computes thebranch target address 130 and compares the aliasing bits of the computed and predicted branch target address,Block 140. The results of the aliasing bit comparison are encoded and stored, e.g., in thepipeline 19 or in theBRT 15,Blocks Block 180. If the encoding of the comparison bit indicates that the aliasing bits matched, theBTB 10 is updated with the branch information,Block 190. If the encoding of the comparison bit indicates that the aliasing bits do not match, the BTB update is suppressed,Block 195. InBlock 200, the process ends. - The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the invention as defined in the following claims, and their equivalents, in which all terms are to be understood in their broadest possible sense unless otherwise indicated.
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/507,499 US6622241B1 (en) | 2000-02-18 | 2000-02-18 | Method and apparatus for reducing branch prediction table pollution |
JP2001041819A JP3840382B2 (en) | 2000-02-18 | 2001-02-19 | Branch prediction table contamination reduction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/507,499 US6622241B1 (en) | 2000-02-18 | 2000-02-18 | Method and apparatus for reducing branch prediction table pollution |
Publications (2)
Publication Number | Publication Date |
---|---|
US20030163678A1 true US20030163678A1 (en) | 2003-08-28 |
US6622241B1 US6622241B1 (en) | 2003-09-16 |
Family
ID=24018872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/507,499 Expired - Fee Related US6622241B1 (en) | 2000-02-18 | 2000-02-18 | Method and apparatus for reducing branch prediction table pollution |
Country Status (2)
Country | Link |
---|---|
US (1) | US6622241B1 (en) |
JP (1) | JP3840382B2 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7360064B1 (en) | 2003-12-10 | 2008-04-15 | Cisco Technology, Inc. | Thread interleaving in a multithreaded embedded processor |
US20080276070A1 (en) * | 2005-04-19 | 2008-11-06 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US20090249048A1 (en) * | 2008-03-28 | 2009-10-01 | Sergio Schuler | Branch target buffer addressing in a data processor |
US20130205124A1 (en) * | 2012-02-06 | 2013-08-08 | Microsoft Corporation | Branch target computation |
US10860484B2 (en) * | 2016-05-27 | 2020-12-08 | Nxp Usa, Inc. | Data processor having a memory-management-unit which sets a deterministic-quantity value |
US20220350608A1 (en) * | 2019-09-27 | 2022-11-03 | Nec Corporation | Branch prediction circuit and instruction processing method |
US20240095034A1 (en) * | 2022-09-21 | 2024-03-21 | Arm Limited | Selective control flow predictor insertion |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004505345A (en) * | 2000-07-21 | 2004-02-19 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Data processor with branch target buffer |
US7249243B2 (en) * | 2003-08-06 | 2007-07-24 | Intel Corporation | Control word prediction and varying recovery upon comparing actual to set of stored words |
WO2007099605A1 (en) * | 2006-02-28 | 2007-09-07 | Fujitsu Limited | Processing device by predicting branch from compressed address information |
US8898426B2 (en) | 2012-06-11 | 2014-11-25 | International Business Machines Corporation | Target buffer address region tracking |
US11106466B2 (en) | 2018-06-18 | 2021-08-31 | International Business Machines Corporation | Decoupling of conditional branches |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0661625B1 (en) * | 1994-01-03 | 1999-09-08 | Intel Corporation | Method and apparatus for implementing a four stage branch resolution system in a computer processor |
SG47981A1 (en) * | 1994-03-01 | 1998-04-17 | Intel Corp | Pipeline process of instructions in a computer system |
GB9521980D0 (en) * | 1995-10-26 | 1996-01-03 | Sgs Thomson Microelectronics | Branch target buffer |
US5860017A (en) | 1996-06-28 | 1999-01-12 | Intel Corporation | Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction |
US5859999A (en) | 1996-10-03 | 1999-01-12 | Idea Corporation | System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers |
US5826074A (en) * | 1996-11-22 | 1998-10-20 | S3 Incorporated | Extenstion of 32-bit architecture for 64-bit addressing with shared super-page register |
-
2000
- 2000-02-18 US US09/507,499 patent/US6622241B1/en not_active Expired - Fee Related
-
2001
- 2001-02-19 JP JP2001041819A patent/JP3840382B2/en not_active Expired - Fee Related
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7360064B1 (en) | 2003-12-10 | 2008-04-15 | Cisco Technology, Inc. | Thread interleaving in a multithreaded embedded processor |
US20080276070A1 (en) * | 2005-04-19 | 2008-11-06 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US20080276071A1 (en) * | 2005-04-19 | 2008-11-06 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US7836287B2 (en) * | 2005-04-19 | 2010-11-16 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US20090249048A1 (en) * | 2008-03-28 | 2009-10-01 | Sergio Schuler | Branch target buffer addressing in a data processor |
US20130205124A1 (en) * | 2012-02-06 | 2013-08-08 | Microsoft Corporation | Branch target computation |
US10331891B2 (en) * | 2012-02-06 | 2019-06-25 | Microsoft Technology Licensing, Llc | Branch target computation in secure start-up using an integrity datum and an adjustment datum |
US10860484B2 (en) * | 2016-05-27 | 2020-12-08 | Nxp Usa, Inc. | Data processor having a memory-management-unit which sets a deterministic-quantity value |
US20220350608A1 (en) * | 2019-09-27 | 2022-11-03 | Nec Corporation | Branch prediction circuit and instruction processing method |
US20240095034A1 (en) * | 2022-09-21 | 2024-03-21 | Arm Limited | Selective control flow predictor insertion |
US12159141B2 (en) * | 2022-09-21 | 2024-12-03 | Arm Limited | Selective control flow predictor insertion |
Also Published As
Publication number | Publication date |
---|---|
JP2001236224A (en) | 2001-08-31 |
US6622241B1 (en) | 2003-09-16 |
JP3840382B2 (en) | 2006-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6088793A (en) | Method and apparatus for branch execution on a multiple-instruction-set-architecture microprocessor | |
KR100234648B1 (en) | In-processor instruction execution method and system and data processing system | |
US6170054B1 (en) | Method and apparatus for predicting target addresses for return from subroutine instructions utilizing a return address cache | |
US8171260B2 (en) | Fetching all or portion of instructions in memory line up to branch instruction based on branch prediction and size indicator stored in branch target buffer indexed by fetch address | |
US7783870B2 (en) | Branch target address cache | |
US6351796B1 (en) | Methods and apparatus for increasing the efficiency of a higher level cache by selectively performing writes to the higher level cache | |
US6151672A (en) | Methods and apparatus for reducing interference in a branch history table of a microprocessor | |
JP2744890B2 (en) | Branch prediction data processing apparatus and operation method | |
US9021240B2 (en) | System and method for Controlling restarting of instruction fetching using speculative address computations | |
US20050262332A1 (en) | Method and system for branch target prediction using path information | |
JPH0820950B2 (en) | Multi-predictive branch prediction mechanism | |
US6622241B1 (en) | Method and apparatus for reducing branch prediction table pollution | |
US7844807B2 (en) | Branch target address cache storing direct predictions | |
JP2006520964A (en) | Method and apparatus for branch prediction based on branch target | |
JP2006520964A5 (en) | ||
US20070204142A1 (en) | Method and apparatus for repairing a link stack | |
TWI397816B (en) | Method and apparatus for reducing cache search in branch target address | |
KR100276138B1 (en) | Digital processor with branch history table with branch pattern fields | |
US7865705B2 (en) | Branch target address cache including address type tag bit | |
WO2001042927A9 (en) | Memory access device and method using address translation history table | |
US20030204705A1 (en) | Prediction of branch instructions in a data processing apparatus | |
US7603545B2 (en) | Instruction control method and processor to process instructions by out-of-order processing using delay instructions for branching | |
JP3683439B2 (en) | Information processing apparatus and method for suppressing branch prediction | |
US6421771B1 (en) | Processor performing parallel operations subject to operand register interference using operand history storage | |
US10620960B2 (en) | Apparatus and method for performing branch prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BROCKMANN, RUSSELL C.;KELLY, BRIAN M.;FERNANDO, M A SUSITH ROHANA;REEL/FRAME:010908/0364;SIGNING DATES FROM 20000524 TO 20000526 Owner name: HEWLETT-PACKARD COMPANY, COLORADO Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BROCKMANN, RUSSELL C.;KELLY, BRIAN M.;FERNANDO, M A SUSITH ROHANA;REEL/FRAME:010908/0364;SIGNING DATES FROM 20000524 TO 20000526 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:026945/0699 Effective date: 20030131 |
|
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20150916 |
|
AS | Assignment |
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001 Effective date: 20151027 |