JP2006031697A

JP2006031697A - Branch target buffer and usage

Info

Publication number: JP2006031697A
Application number: JP2005198036A
Authority: JP
Inventors: Kigo Boku; 基豪朴
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2004-07-16
Filing date: 2005-07-06
Publication date: 2006-02-02
Also published as: GB2416412A; TW200617777A; GB0514599D0; TWI285841B; GB2416412B

Abstract

[Problem] To provide a branch target buffer that stores data entries related to branch instructions, and a method of using it.
[Solution] A branch target buffer conditionally enables access to a data entry in response to a word line gating circuit associated with a word line in the branch target buffer. The word line gating circuit stores a word line gating value derived from branch history data associated with an instruction. A branch prediction unit and a processor incorporating the branch target buffer are also provided, together with a method of operating the branch target buffer. This enables the implementation and operation of a processor with reduced power consumption, faster branch instruction processing, and reduced overall complexity.
[Selected drawing] FIG. 2

Description

The present invention relates to a processor having a branch prediction unit, and more particularly to a branch target buffer within the branch prediction unit. The invention is well suited to processors having a pipelined architecture, which here covers both single-pipeline and superscalar pipelined architectures.

Early computers were designed to finish one instruction in a sequence before starting the next. Over the decades, major architectural advances have greatly improved on the performance of these early designs. Pipelined and superscalar architectures are good examples of such advances.

Processor performance has been further improved by the use of caches. A cache is a memory element used to store and supply frequently used information, where “information” broadly covers both data and instructions. Unlike a conventional memory access, which takes many cycles, a cache can supply information within a single clock cycle. The so-called branch target buffer (BTB) is one kind of processor cache.

The usefulness of a branch target buffer is most easily understood in the context of a pipelined architecture. “Pipelining” (or “speculative execution”) is a term that generally refers to an operating approach in which a sequence of instructions is processed using a series of functional or processing stages. Each processing stage normally performs its constituent operations within a single clock cycle.

Unlike a non-pipelined processor, which completes each instruction before starting the next, a pipelined processor processes many instructions simultaneously, each at a different processing stage. The pipeline stages may be defined arbitrarily by the designer, but they generally include an instruction fetch stage, an instruction decode stage, an instruction execution stage, and an execution resolution stage.

The instruction fetch stage retrieves an instruction from wherever it is stored (e.g., main system memory or an instruction queue). The fetched instruction passes to the instruction decode stage, which determines the instruction address and/or the instruction operands. The instruction then passes from the decode stage to the execution stage, which performs the one or more operations the instruction specifies. The execution resolution stage generally involves writing back the results of execution (e.g., result data) to one or more registers or memories for later use.

An instruction advances from one pipeline stage to the next during each prescribed cycle. During the first cycle, the instruction fetch stage fetches the first instruction from storage and aligns it in the associated hardware register for decoding. During the second cycle, the fetch stage fetches and aligns the second instruction while the decode stage decodes the first. During the third cycle, the first instruction begins its execution operation (e.g., a logical, arithmetic, addressing, or indexing operation) in the execution stage, while the decode stage decodes the second instruction and the fetch stage fetches the third. Pipelining continues through execution resolution, and in this way the overall operating speed of the processor is much higher than that of a non-pipelined architecture.
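
The cycle-by-cycle overlap described above can be illustrated with a short Python sketch. It is only an idealized model under stated assumptions: four stages named after the ones listed earlier, exactly one cycle per stage, and no stalls or branches. The function name `pipeline_schedule` and the stage labels are hypothetical and do not appear in the patent.

```python
# Idealized pipeline model (assumption: one cycle per stage, no stalls).
STAGES = ["fetch", "decode", "execute", "resolve"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction number (1-based)."""
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            # The instruction in a stage entered the pipeline `depth` cycles earlier.
            instr = cycle - depth + 1
            if 1 <= instr <= num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for number, occupancy in enumerate(pipeline_schedule(5), start=1):
        print(f"cycle {number}: {occupancy}")
```

Under these assumptions, five instructions finish in 8 cycles, whereas completing all four stages of each instruction before starting the next would take 20.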

In a superscalar architecture, two or more instructions are processed or executed at the same time. That is, a superscalar system has two or more execution (or decode/execution) paths that can execute multiple instructions independently and in parallel. A scalar system, by contrast, executes only one instruction per cycle, whether the instruction comes from a pipelined sequence of instructions or is executed in a non-pipelined manner. Executing multiple instructions simultaneously further improves processor performance.

Pipelining offers a clear performance advantage only to the extent that the sequence of instructions being executed is highly linear or predictable. Unfortunately, in the great majority of cases, an instruction sequence contains many instructions that can introduce non-sequential execution paths. So-called “branch instructions” (including, for example, jumps, returns, and conditional branches) impose a significant performance penalty on a pipelined processor unless some effective form of branch prediction is in place. The penalty arises when an unpredicted (or mispredicted) branch instruction causes execution to depart from the sequence of instructions currently in the pipeline. When this happens, the instructions currently in the pipeline must be discarded, or “flushed,” and a new sequence of instructions must be loaded into the pipeline. Pipeline flushes waste many clock cycles and generally slow execution.

One way to improve execution performance in the presence of branch instructions is to predict the outcome of a branch instruction and insert the predicted instruction into the pipeline immediately after the branch. If such a branch prediction mechanism is implemented successfully, the penalty associated with a pipeline flush is incurred only when the outcome of a branch is predicted incorrectly. Fortunately, conventional techniques and analyses show that branch outcomes can be predicted correctly with a high degree of confidence, nearly 80%.

As a result, several forms of conventional branch prediction mechanism have been developed. One form uses a branch target buffer to store a number of data entries, each associated with a branch instruction. The branch target buffer stores a number of so-called “branch address tags,” each of which serves as an index for its corresponding branch instruction. In addition to the branch address tag, each branch target buffer entry further includes a target address, an instruction opcode, branch history information, and possibly other data. In a processor that uses a branch target buffer, the branch prediction mechanism monitors every instruction entering the pipeline. Typically the instruction address is monitored; when the instruction address matches an entry in the branch target buffer, the instruction is identified as a branch instruction. From the associated branch history information, the branch prediction mechanism determines whether or not the branch is likely to be taken. Branch history information is typically determined by a state machine that monitors each branch instruction indexed in the branch target buffer and defines the stored branch history data according to whether or not the branch was taken in previous operations.
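
The conventional lookup just described can be sketched in Python as follows. This is a simplified model, not the circuit of any particular processor. Several details are assumptions introduced for illustration only: a direct-mapped table, 4-byte instructions, and a 2-bit saturating counter standing in for the unspecified history state machine. The names `BTBEntry` and `SimpleBTB` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    tag: int          # branch address tag
    target: int       # target address
    counter: int = 1  # assumed 2-bit history state: 0-1 not taken, 2-3 taken


class SimpleBTB:
    """Direct-mapped branch target buffer model (illustrative assumptions only)."""

    def __init__(self, num_entries: int = 8) -> None:
        self.num_entries = num_entries
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def _index_and_tag(self, pc: int):
        word = pc >> 2  # assumes 4-byte instructions
        return word % self.num_entries, word // self.num_entries

    def predict(self, pc: int) -> int:
        """Return the predicted next address for the instruction at pc."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries[index]
        if entry is not None and entry.tag == tag and entry.counter >= 2:
            return entry.target  # tag hit and history says "taken"
        return pc + 4            # fall through to the sequential address

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome of the branch at pc."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            self.entries[index] = BTBEntry(tag, target, 2 if taken else 1)
            return
        if taken:
            entry.target = target
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
```

In this model every call to `predict` reads the table, whatever the stored history says, which matches the conventional behavior described in the surrounding text.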

When the branch history information indicates that the branch will be taken, one or more predicted instructions are inserted into the pipeline. Conventionally, each branch target buffer data entry includes an opcode associated with the branch instruction being evaluated against the branch history information. On a suitable indication from the branch prediction mechanism, this opcode can be inserted directly into the pipeline. Each branch target buffer data entry also includes a “target address” associated with the branch instruction being evaluated. Again on a suitable indication from the branch prediction mechanism, this target address is output by the branch prediction unit as a “predicted address” and is used to fetch the next instruction in the instruction sequence.

Processing of the branch instruction and the instructions that follow it proceeds through the pipeline for several cycles until the branch instruction is executed in the execution stage. Only at that point is the accuracy of the branch prediction known. If the outcome was predicted correctly, the branch target address has already moved through the pipeline, and execution continues without interruption. If the outcome was predicted incorrectly, however, the pipeline is flushed and the correct instruction or instruction sequence is inserted into the pipeline. In a superscalar processor, which has two or more pipelines processing multiple instruction sequences, the penalty for a misprediction is even greater, because in most cases at least twice as many instructions must be flushed.

FIG. 1 shows a branch target buffer 1 coupled to branch prediction logic 2 and associated hardware. Branch target buffer 1 includes an address decoder 3, a memory array 4, and a sense amplifier 5. Address decoder 3 receives an instruction address from the instruction fetch unit and selects the word line associated with the decoded instruction address. A word line voltage is applied to the selected word line.

Memory array 4 consists of a large number of memory cells, each storing at least one data bit. Data entries, each comprising a number of data bits, are conveniently stored in an ordered arrangement, so that selecting a particular word line in principle accesses the corresponding data entry. A data entry includes at least one data field defining the branch address tag and another data field defining the target address. The data entry on the selected word line is ordinarily output from memory array 4 through sense amplifier 5.

From sense amplifier 5, the branch address tag is passed to tag compare register 6, which also receives the instruction address. The target address from sense amplifier 5 is passed to multiplexer 7, together with the address associated with the non-branching instruction sequence (for example, the program counter (PC) value + 4 in a processor with 32-bit instruction words). One of the two multiplexer inputs is selected, by operation of logic gate 9, for delivery to the instruction queue (PC MUX 8). Logic gate 9 receives the result from tag compare register 6 and the taken/not-taken indication from branch prediction logic 2.
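
The selection path of FIG. 1 can be summarized in a few lines of Python. The sketch assumes that logic gate 9 behaves as an AND of the tag match and the taken indication; the text does not state the gate type, so this is an illustrative assumption, as is the function name `select_next_pc`.

```python
def select_next_pc(pc, pc_tag, stored_tag, stored_target, predicted_taken,
                   instr_bytes=4):
    """Model of the FIG. 1 next-address selection (gate type is an assumption)."""
    tag_match = stored_tag == pc_tag                # tag compare register 6
    select_target = tag_match and predicted_taken   # logic gate 9, assumed AND
    # Multiplexer 7: stored target address, or the sequential address PC + 4.
    return stored_target if select_target else pc + instr_bytes
```

For example, `select_next_pc(0x100, 8, 8, 0x200, True)` selects the stored target, while a tag mismatch or a not-taken indication selects the sequential address.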

Conventional branch target buffers have many drawbacks. First, in the structure shown in FIG. 1, the memory array of the branch target buffer is accessed for every branch instruction regardless of the instruction's outcome. An access to the branch target buffer ordinarily involves a read operation on the word line selected by the address decoder. Each read operation draws power from the power supply to energize the memory cells associated with the selected word line and to output data from those cells.

To reduce wasted power, some prior-art branch target buffers integrate an enable line into the memory array design. An example is disclosed in [Patent Document 1]. According to [Patent Document 1], the branch target buffer includes an enable line that enables or disables the word line drivers associated with the memory array during a read operation. The word line drivers are enabled or disabled according to a taken or not-taken state that predicts whether a particular branch is likely to be taken. For example, if the taken/not-taken state for a particular instruction indicates “strongly not taken,” the enable line goes to an inactive level and the memory array word line drivers are disabled.

Unfortunately, this conventional way of reducing power during branch target buffer accesses comes at a high cost. The branch prediction mechanism that generates the enable signal needs time and resources to pre-decode the instruction, determine the branch history data and the taken/not-taken state, and change the level of the enable signal as required.

As instruction execution rates and pipeline depths increase, the accuracy and speed of branch prediction become increasingly important. Recognizing this, many conventional processors adopt extensive pre-decoding schemes in which every instruction is evaluated, branch instructions are identified, and branch history information is retrieved or dynamically computed for the branch instruction currently being evaluated and pre-decoded. Needless to say, predicting branch behavior in this way takes considerable time and requires substantial additional resources. Added delay and complexity in processing the instruction sequence are undesirable in a branch prediction mechanism, yet that is exactly what many conventional approaches deliver.

Power consumption further complicates the design of a capable branch prediction mechanism. Many modern processors operate under strict limits on power consumption. Laptop computers and mobile devices such as handsets and PDAs are examples of devices that need processors consuming minimal power.

As noted above, a branch target buffer is a cache-type memory that stores a potentially large number of data entries. At its core it therefore has a memory array, preferably a volatile memory array. Such memory arrays, especially ones large enough to store many data entries, consume considerable power. Every access to the branch target buffer by the branch prediction mechanism implies a read operation on the branch target buffer memory array. Such accesses are becoming more frequent, and by some estimates read operations on the branch target buffer memory array account for 10% or more of total power consumption in conventional processors.

A better approach to implementing branch prediction is needed. Conventional approaches that require lengthy real-time evaluation of branch instructions and/or dynamic retrieval or computation of branch history information are too complex and too slow. Moreover, because they access the branch target buffer memory array constantly, and unavoidably so, they waste a great deal of power.
[Patent Document 1] US Patent No. 5,740,417

The technical object of the present invention is to provide a branch target buffer, and a method of using it, that can readily be applied to superscalar processors. The decoder/execution unit of a superscalar processor includes multiple execution paths, each comprising a decoder and an execution unit. By way of example, the processor may be a vector processor or a single-instruction-multiple-data (SIMD) processor.

To achieve the above and other technical objects, a branch target buffer memory array according to one embodiment of the invention includes a word line and a word line gating circuit associated with the word line. The word line gating circuit includes a memory circuit that stores a word line gating value.

The memory array is arranged to store a data entry associated with the word line. The word line gating circuit further includes a gating logic circuit, which enables an access operation on the data entry according to the word line voltage applied to the word line and the word line gating value. The access operation is either a read operation or a write operation applied to the word line.

The memory array consists of an array of volatile memory cells. For example, the memory array is an SRAM array and the memory circuit is a 1-bit SRAM cell.

The gating logic circuit includes a first logic gate that receives a write signal and the word line gating value and generates a first logic output, and a second logic gate that receives the first logic output and the word line voltage and generates a second logic signal.

A branch target buffer memory array according to the invention stores a data entry in response to a write operation and outputs the data entry in response to a read operation. The memory array includes a word line gating circuit that enables access to the data entry during a write operation and conditionally enables access to the data entry during a read operation in response to a word line gating value stored in the word line gating circuit. The word line gating circuit includes a memory circuit for storing the word line gating value, and a gating logic circuit that receives a write signal and the word line gating value as inputs, enables access to the data entry during a write operation, and conditionally enables access to the data entry during a read operation when the word line gating value gives a positive indication.
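
The behavior of the two-gate gating logic described above can be sketched as a Boolean function. The gate types are not stated in this passage; the sketch assumes the first gate is an OR and the second an AND, which is one arrangement consistent with "always enabled on write, conditionally enabled on read." The function and parameter names are hypothetical.

```python
def gated_word_line(word_line_selected: bool,
                    write_signal: bool,
                    gating_value: bool) -> bool:
    """Return True when the gated portion of the word line is driven."""
    # First logic gate (assumed OR): a write, or a positive gating value,
    # permits access.
    first_output = write_signal or gating_value
    # Second logic gate (assumed AND): access also requires that the decoder
    # has applied the word line voltage to this word line.
    return word_line_selected and first_output
```

With these assumptions, a read of a selected word line whose stored gating value is 0 leaves the gated portion undriven, so the memory cells on that line are not accessed.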

A branch target buffer according to the invention includes a memory array and a decoder. The memory array comprises gated word lines. Each gated word line includes a selection word line portion, a word line gating circuit including a memory circuit for storing a word line gating value, and a gated word line portion. The decoder is adapted to receive an instruction portion and to select one of the gated word lines in response to it. The branch target buffer further includes a sense amplifier arranged to receive the data entry from the gated word line selected by the decoder.

The sense amplifier includes circuitry for communicating the word line gating value to the respective memory circuit associated with each word line, and/or circuitry for communicating the write signal to the respective word line gating circuit associated with each gated word line.

A branch prediction unit according to the invention includes a branch history unit for storing branch history data; branch prediction logic that receives an instruction address, provides a predicted address, and updates the branch history data; and a branch target buffer that receives the instruction address. The branch target buffer includes a memory array having gated word lines. Each gated word line stores a data entry and includes a word line gating circuit having a memory circuit that stores a word line gating value derived from the branch history data.

The branch history unit includes a state machine that computes the branch history data for an instruction from its branch execution history.

A branch prediction unit according to the invention includes a branch history unit for storing branch history data and a branch target buffer having a plurality of gated word lines, each of which is accessed through the operation of a corresponding word line gating circuit. The branch target buffer is adapted to output a data entry in response to an instruction portion received at the branch target buffer and a word line gating value derived from the branch history data.
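
A behavioral sketch of such a gated branch target buffer is given below in Python. It models only the idea that a stored one-bit gating value decides whether a read reaches the memory array. The rule used to derive the gating value from branch history (`gating_from_history`, which treats an assumed 2-bit counter value of 2 or 3 as a positive indication) is an assumption introduced for illustration, as are the class name `GatedBTB` and the `array_reads` counter used as a rough stand-in for read power.

```python
class GatedBTB:
    """Behavioral model of a BTB with one gating bit per word line (illustrative)."""

    def __init__(self, num_lines=4):
        self.data = [None] * num_lines
        self.gating = [False] * num_lines  # 1-bit memory circuit per word line
        self.array_reads = 0               # counts reads that reach the array

    def write(self, line, entry, gating_value):
        # Writes are always enabled; the gating value is stored with the entry.
        self.data[line] = entry
        self.gating[line] = gating_value

    def read(self, line):
        # Reads are conditional: with a gating value of 0 the gated word line
        # is not driven, so no array read takes place and nothing is output.
        if not self.gating[line]:
            return None
        self.array_reads += 1
        return self.data[line]


def gating_from_history(counter):
    """Assumed derivation: 2-bit history states 2 and 3 give a positive value."""
    return counter >= 2


if __name__ == "__main__":
    btb = GatedBTB()
    btb.write(0, {"tag": 8, "target": 0x200}, gating_from_history(3))
    btb.write(1, {"tag": 9, "target": 0x300}, gating_from_history(0))
    print(btb.read(0), btb.read(1), btb.array_reads)
```

The point of the model is that the decision to skip a read is made from a value already stored alongside the word line, rather than from branch history retrieved or computed at access time.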

A processor according to the invention includes an instruction fetch unit that receives an instruction and provides a corresponding instruction address; a branch prediction unit that receives the instruction address and provides a predicted address to the instruction fetch unit; and an instruction decoder/execution unit that receives the instruction, provides a decoded instruction, and provides an updated address in response to execution of the decoded instruction.

The branch prediction unit includes a branch history unit for storing branch history data; branch prediction logic that receives the instruction address and the updated address, provides the predicted address, and updates the branch history data; and a branch target buffer that receives the instruction address and outputs a data entry. The branch target buffer includes a memory array having gated word lines, each coupled to a corresponding word line gating circuit, and each word line gating circuit includes a memory circuit that stores a word line gating value derived from the branch history data.

The invention provides a branch target buffer that enables the implementation and operation of a processor with lower power consumption, faster branch instruction processing, and lower overall complexity. Processor power consumption is reduced by conditionally suppressing read operations on the branch target buffer. Relatedly, branch instructions are processed without the delay that would otherwise be caused by retrieving or computing branch history information. The processor also benefits from reduced complexity in its branch prediction unit.

Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings. These embodiments are presented as examples; the actual scope of the invention is determined by the claims that follow.

The term “processor” broadly covers any digital logic device or system capable of executing, or responding to, a sequence of instructions. It includes central processing units (CPUs), microprocessors, digital signal processors (DSPs), reduced instruction set computer (RISC) processors, vector processors, and single-instruction-multiple-data (SIMD) processors.

Pipelined processors are particularly well suited to incorporating a branch prediction unit designed according to the teachings of the present invention. Accordingly, several pipelined processors are described as working examples that illustrate both the embodiments of the invention and the advantages it provides. FIG. 2 is a block diagram of a pipelined processor according to a first embodiment.

Processor 10 communicates data with main memory 12 via bus 14 using any one of a number of conventional data transfer techniques. Memory 12 is assumed to store one or more software programs or routines, each comprising a sequence of instructions. Memory 12 is further assumed to store data related to the sequence of instructions. This data includes input data used by processor 10 and result data stored in memory 12 by processor 10. Instructions are returned from memory 12 to processor 10 in response to an address indication from the processor. The address indication may take many forms, but a program counter (PC) is one well-known technique by which processor 10 indicates to memory 12 the location in memory that stores the next instruction to be fetched.

As noted above, the simple process of indicating the next instruction to be retrieved from memory becomes very complicated when one or more of the instructions is a branch instruction, which can specify one next address under one condition and a different next address under another condition. This is especially true for pipelined processors.

Referring again to FIG. 2, pipelined processor 10 generally includes an instruction fetch unit 13 that receives an instruction (IR) from memory 12 and provides a predicted address to memory 12. Instruction fetch unit 13 passes the instruction to instruction decoder unit 15. Instruction decoder unit 15 decodes the instruction and generally provides the opcode portion of the instruction to execution unit 17. The decoded instruction (or instruction portion) received by execution unit 17 initiates one or more operations within the execution unit. These operations typically generate result data that is written back to memory 12 or to some other location in the system.

In addition to providing instructions to instruction decoder unit 15, instruction fetch unit 13 provides an instruction portion to branch prediction unit 19. This instruction portion generally includes the instruction address but may include other information as well. Branch prediction unit 19 also receives a “next address” indication from execution unit 17. That is, once an instruction has been executed, the next instruction to be executed in the sequence is actually known (i.e., the condition of the branch instruction has been resolved). The next address indication is therefore fed back to branch prediction unit 19. Using this information, branch prediction unit 19 determines whether the previously predicted next instruction is in fact the correct next instruction. The next address indication from execution unit 17 is typically an instruction address.

If the next address indication from execution unit 17 matches the previously predicted instruction address (a “HIT” condition), the processor continues with the pipelined sequence of instructions. If, however, the next address indication differs from the previously predicted instruction address (a “MISS” condition), the processor flushes the pipeline and loads the instruction indicated by the next address indication.
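The HIT/MISS check described above can be sketched in a few lines. This is a minimal illustration only, assuming addresses are plain integers and the pipeline is a list of in-flight instructions; the names `resolve_prediction` and `after_resolution` are hypothetical and do not come from the patent.

```python
from typing import List, Optional, Tuple


def resolve_prediction(predicted_address: int, next_address: int) -> str:
    """Compare the predicted address with the next address reported by the execution unit."""
    return "HIT" if predicted_address == next_address else "MISS"


def after_resolution(
    pipeline: List[str], predicted_address: int, next_address: int
) -> Tuple[List[str], Optional[int]]:
    """Return the pipeline contents and the redirect address (None when no redirect is needed).

    On a HIT the pipeline continues unchanged. On a MISS the pipeline is flushed
    and fetching restarts at the address given by the next address indication.
    """
    if resolve_prediction(predicted_address, next_address) == "HIT":
        return pipeline, None
    return [], next_address
```

In this sketch a MISS discards every in-flight instruction, which is why avoiding mispredictions matters for throughput in a pipelined design.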

The comparison between the previously predicted instruction address and the next address indication is preferably performed within branch prediction unit 19. As discussed in additional detail below, branch prediction unit 19 is also provided within pipelined processor 10 to supply the predicted address to instruction fetch unit 13.

Before continuing with a more detailed description of the preferred embodiments, it should be noted that the present invention is also particularly well suited to superscalar processors. A greatly simplified superscalar processor is shown in FIG. 3. Here, memory 12 again provides instructions and/or data to the superscalar processor via bus 14. Branch prediction unit 39 and instruction fetch unit 33 generally operate as described above, except that instruction fetch unit 33 provides instructions to multiple execution paths 34, 35, and 36. Similarly, each execution path 34, 35, 36 provides a next address indication to branch prediction unit 39.

The superscalar processor example shows three execution paths. This number is merely exemplary, however, and is arbitrarily chosen. In addition, each execution path is characterized by a combined decoder/execution unit that receives instructions from a common instruction fetch unit.

The hardware and functional boundaries associated with the exemplary stages of a processor are entirely a matter of routine design choice by the designer. For example, the decoding and execution functions may readily be performed in a single piece of hardware (e.g., an integrated circuit, IC) or in multiple cooperating ICs. Decoding and/or execution may be performed in hardware, software, firmware, or any combination of these three general platform types. The same applies to the hardware and functional boundaries described in these embodiments between the instruction fetch unit, the instruction decoder unit, and/or the branch prediction unit.

Regardless of the type of processor into which it is incorporated, the present invention preferably provides a branch prediction unit that includes some form of branch prediction logic, some mechanism for storing data entries related to branch instructions, and some form of branch history data storage and/or computation. FIG. 4 is a block diagram showing branch prediction unit 19 of FIG. 2 in more detail.

In FIG. 4, branch prediction logic 20 provides the predicted address to instruction fetch unit 13. Branch prediction logic 20 receives the instruction address from instruction fetch unit 13 and generally exchanges information with branch target buffer 22 and branch history unit 24. These three functional blocks are chosen for purposes of explanation; the present invention is not limited to any particular grouping of components in hardware. For example, in a practical implementation, the data storage functionality associated with branch history unit 24 may be combined within a memory array associated with branch target buffer 22 or within a memory device associated with branch prediction logic 20. Similarly, the computational functionality associated with branch history unit 24 may be implemented using hardware or software resources provided by branch prediction logic 20.

More specifically, branch prediction logic 20 receives an instruction portion, typically an instruction address (e.g., the current program counter (PC) value), from instruction fetch unit 13, and then predicts whether the processor should branch to a target address associated with the instruction or execute the next instruction in the instruction sequence. The term “predict” generally refers to a logical or computational output made by the branch prediction logic in relation to the received instruction address, preferably in relation to branch history information associated with that address. Accordingly, branch prediction logic 20 may be constructed from many specific combinations of logic structures, computational circuits, data registers, comparison circuits, and/or similar hardware resources, together with embedded controller software for driving those resources.

In the arrangement described so far, branch prediction logic 20 provides a write signal to branch target buffer 22. The write signal controls write operations within branch target buffer 22. “Read” and “write” are used here in the sense in which they generally apply to the operation of ordinary memory devices such as SRAM and DRAM.

A decision by branch prediction logic 20 to branch to the target address is termed a “Taken” condition. A decision by branch prediction logic 20 not to branch is termed a “Not-Taken” condition. Branch prediction logic 20 predicts a Taken or Not-Taken condition according to the state of the branch history data associated with the instruction indicated by the received instruction address.

Branch history unit 24 is responsible for computing, storing, and/or providing branch history data to branch prediction logic 20. Branch history data is any data useful in predicting between Taken and Not-Taken for a given instruction. Many conventional algorithms and approaches have been proposed that suggest different ways of computing data indicating whether or not a branch instruction will be taken. The present invention is suited to use with any of these approaches, so long as the algorithm or approach provides an accurate prediction of branch instruction behavior.

The storage and provision of branch history data are handled by memory elements associated with the branch history unit. Each branch instruction having a corresponding data entry stored in the branch target buffer also has some form of branch history data stored in the branch history unit. (As noted above, however, the branch history data may instead be stored in the branch target buffer along with the corresponding data entry.) Branch history data for an instruction may be determined experimentally by running one or more programs containing the instruction in order to determine how frequently the instruction actually branches. Once initially determined, this branch history data is stored in the branch history unit for subsequent reference. The initially determined branch history data is then updated with each subsequent execution of the instruction. In this way, current branching behavior is used to update the existing branch history data. Of course, the branch history data need not be predetermined in any way; it may instead be generated “on the fly” in response to actual instruction execution.

Whenever it is determined and however it is updated, branch history data can readily be maintained using a state machine. The complexity and design of a suitable state machine are a matter of routine design choice. As described above, however, the present invention uses a 2-bit up/down saturating counter as the computational portion of branch history unit 24. The operation and use of the 2-bit up/down saturating counter are illustrated in the flow diagram of FIG. 5. Here, the 2-bit branch history data value is incremented or decremented after each execution of a branch instruction, depending on whether the instruction was actually Taken or Not-Taken during that execution. The branch history data thus indicates a particular degree of “Taken-ness” for the instruction.

For example, a branch instruction that was previously not taken, once it is taken, moves from the “Strongly Not-Taken” state to the “Weakly Not-Taken” state. This state change is indicated by incrementing the corresponding branch history data value from “00” to “01”. Likewise, an instruction previously in the “Strongly Taken” state that is then not taken is moved to the “Weakly Taken” state by decrementing the corresponding branch history data value.

In the preferred embodiment, two bits are considered sufficient for most applications to predict accurately whether an instruction is likely to be taken. This is not true of every application, however; some may require branch history data with a greater number of bits to make an accurate Taken/Not-Taken determination. Accordingly, the specific definition of the branch history data is a matter of design choice, as are the selection of the algorithm that computes the branch history data and the definition of the state machine that implements the selected algorithm.

The branch target buffer shown in FIG. 4 is described in more detail with reference to FIG. 6. Here, decoder 43 receives an instruction portion, preferably an instruction address, and selects the word line indicated by the decoded instruction portion. As in a conventional branch target buffer, a plurality of word lines extends from decoder 43 across memory array 40. The nature, operation, and implementation of these word lines, however, are changed in the present invention. The term “gated word line” is used in connection with several embodiments to refer to a word line as defined by the present invention.

Memory array 40 is preferably an array of volatile memory cells such as SRAM cells, although other forms of memory may be used. Like a conventional branch target buffer memory array, the memory array of the present invention preferably stores a plurality of data entries. Each data entry corresponds to an instruction and preferably includes a branch address tag and a target address. Other types of data may be associated with each data entry, but generally speaking, some form of branch address tag and target address is required.

The gated word line mentioned above is shown in FIG. 7. A gated word line generally comprises the combination of a word line 70 and a word line gating circuit 60. As shown in FIG. 6, the word line gating circuits are preferably in one-to-one correspondence with the word lines. The plurality of word line gating circuits is preferably arranged as a column within memory array 40. With this arrangement, the word line gating value stored in each word line gating circuit can easily be updated using conventional write techniques. In the preferred embodiment, memory array 40 comprises an SRAM array together with the word line gating circuits, and each word line gating circuit includes a memory circuit formed from a single-bit SRAM cell.

Whatever the actual construction of the word line gating circuit, each such circuit functions to enable or disable access to its corresponding word line on the basis of a “word line gating value” derived from the branch history data for the instruction associated with that word line. That is, each branch instruction portion received by branch target buffer 22 selects a corresponding word line through the operation of decoder 43. This word line stores a data entry including the branch address tag and the target address associated with the received branch instruction portion. The word line selected by decoder 43 is a gated word line; that is, it can be accessed only through the operation of the associated word line gating circuit. The operation of the word line gating circuit is controlled by the word line gating value stored in it, and this value is derived from the branch history data associated with the instruction.

For each branch instruction, the word line gating value is preferably derived from the branch history data associated with that instruction. An exemplary derivation method is described in the context of the embodiment above. Many different methods may be used to derive a word line gating value from branch history data. These methods vary with the nature of the branch history data, the size of the word line gating value, and/or the structure of the word line gating circuit and its constituent memory circuit.

As described with reference to FIG. 5, assuming 2-bit branch history data and a single-bit memory cell associated with each word line gating circuit, a suitable word line gating value can be derived simply by taking the most significant bit of the branch history data. In this example, a logic value of “1” for the most significant bit indicates that the instruction is in the “Strongly Taken” or “Weakly Taken” state. A logic value of “0” for the most significant bit indicates the “Weakly Not-Taken” or “Strongly Not-Taken” state. By storing this bit value in the single-bit memory cell associated with the word line gating circuit, an acceptably accurate indication of the instruction's degree of Taken-ness is made available to control access to the gated word line.
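The most-significant-bit derivation described above amounts to a single shift and mask. The sketch below assumes the 2-bit history value is held as an integer from 0 to 3; `derive_wlgv` is a hypothetical helper name, not a term from the patent.

```python
def derive_wlgv(history: int) -> int:
    """Return the word line gating value: the most significant bit of 2-bit history data."""
    if not 0 <= history <= 0b11:
        raise ValueError("branch history data must be a 2-bit value")
    return (history >> 1) & 1


# Histories 00 and 01 (Not-Taken states) gate the word line off;
# histories 10 and 11 (Taken states) leave it enabled.
for history in range(4):
    print(format(history, "02b"), "->", derive_wlgv(history))
```

Using only the top bit discards the strong/weak distinction, which matches the text's point that a single stored bit is an acceptably accurate indication for gating purposes.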

Referring again to FIG. 7, word line 70 is selected by the decoder, and a word line voltage is applied to word line 70. Ordinarily, such an applied word line voltage raises the voltage potential along the entire length of the word line. In the present invention, however, application of the word line voltage along the length of the selected word line is conditionally enabled by the associated word line gating circuit 60. Word line gating circuit 60 preferably comprises a memory circuit 61 and a gating logic circuit 62.

The size of memory circuit 61 varies with the size of the word line control value to be stored. In the example above, a single bit is stored. However, a word line control value of any size may be stored and used to control access to the gated word line. In FIG. 7, an SRAM memory cell comprising two P-type transistors and four N-type transistors is used as the memory circuit that stores the word line gating value.

The logic value (“1” or “0”) of the stored word line gating value is used as an input to gating logic circuit 62. Specifically, the word line gating value is applied as one input to first logic gate 82. First logic gate 82 also receives the write signal from the branch prediction logic. Because first logic gate 82 is an OR-type logic gate, its output, the first logic output, has a logic value of “1” whenever at least one of its inputs has a logic value of “1”. The first logic output is applied to second logic gate 80 together with the word line voltage. Because second logic gate 80 is an AND-type logic gate, the second logic output is “1” only when all of its inputs are “1”. In this embodiment, the second logic output from second logic gate 80 serves as the word line voltage for the portion of word line 70 that lies “after” word line gating circuit 60.
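The OR/AND gating described above can be checked with a small truth-table model. This is a sketch at the logic level only: signals are modeled as the integers 0 and 1, the word line voltage is treated as a logic level, and the function name `gated_word_line` is illustrative.

```python
from itertools import product


def gated_word_line(select: int, wlgv: int, write: int) -> int:
    """Logic level on the gated portion of the word line.

    select: level driven onto the selected word line portion by the decoder
    wlgv:   stored word line gating value
    write:  write signal from the branch prediction logic
    """
    first_logic_output = wlgv | write      # OR-type first logic gate (82)
    return first_logic_output & select     # AND-type second logic gate (80)


# Print the full truth table for inspection.
for select, wlgv, write in product((0, 1), repeat=3):
    print(select, wlgv, write, "->", gated_word_line(select, wlgv, write))
```

The table shows the two properties the text relies on: an unselected word line is never driven, and an asserted write signal opens a selected word line regardless of the stored gating value.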

Word line 70 thus has two distinct portions. Selected word line portion 71 can be selected by the decoder in response to an instruction address, whereas gated word line portion 72 can be accessed only through the operation of the word line gating circuit. In this embodiment, the selected word line portion is arranged to receive the word line voltage from the decoder. This word line voltage is used as an input to the word line gating circuit associated with word line 70. Conditionally enabled by the word line gating value stored in word line gating circuit 60, the word line voltage passes through to the corresponding gated word line portion 72.

The conditional passing (or “gating”) of the word line voltage from the selected word line portion to the corresponding gated word line portion preferably applies only to read operations on the word line. That is, where the word line gating value indicates that the branch history data predicts the branch instruction will be taken, a read operation applied to the selected word line is enabled. Where the word line gating value indicates that the branch history data predicts the branch instruction will not be taken, a read operation applied to the selected word line is not enabled.

Such conditional access is generally unnecessary during a write operation, in which a data entry stored in memory array 40 is updated. Accordingly, when a gated word line is selected by the decoder, applying the write signal to the first logic gate immediately permits access to the gated word line. That is, the write operation proceeds regardless of the word line gating value. In this way, conditional read operations and unconditional write operations are both supported efficiently with minimal hardware resources.

The branch target buffer shown in FIG. 6 also includes a sense amplifier 45 that receives a data entry from memory array 40 during a read operation. As described above, sense amplifier 45 is also used to load the word line control value (WLCV) into the memory circuit associated with each word line gating circuit.

A method of operating a branch prediction unit according to the present invention is described with reference to FIG. 8. Data entries corresponding to a plurality of branch instructions are stored in the memory array of the branch target buffer (100). Branch history data for each instruction is developed using a suitable algorithm (101). A word line gating value (WLGV) for each instruction is derived from the branch history data (102) and then stored in the memory circuit of the corresponding word line gating circuit (103).

With the data entries and word line gating values stored, the branch prediction unit is ready to receive an instruction portion, such as an instruction address (104). The instruction portion is decoded (105) and the corresponding word line is selected (106). Once the word line is selected, the stored word line gating value conditionally determines whether the gated portion of the word line is accessed. A “positive” word line gating value (e.g., one indicating that the branch instruction is likely to be taken) enables word line access (108), and the corresponding data entry is output (109). A “negative” word line gating value (e.g., one indicating that the branch instruction is likely not to be taken) results in no further access to, or output from, the memory array (110). These positive and negative indications generally correspond to the Taken/Not-Taken state of the particular instruction.
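The method steps above can be tied together in a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: the class `GatedBTB`, the modulo-based decoder, the tag computation, and the `array_reads` counter are all hypothetical choices made for illustration, and the gating value is taken as the most significant bit of 2-bit history data as in the single-bit example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tag: int      # branch address tag
    target: int   # target address


class GatedBTB:
    """Behavioral model of a branch target buffer with gated word lines."""

    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.entries: List[Optional[Entry]] = [None] * num_lines
        self.wlgv: List[int] = [0] * num_lines   # one gating bit per word line
        self.array_reads = 0                     # reads that actually reach the array

    def _decode(self, address: int) -> int:
        # Assumed decoder: low-order address bits select the word line.
        return address % self.num_lines

    def write(self, address: int, target: int, history: int) -> None:
        """Writes are unconditional: they ignore the current gating value."""
        line = self._decode(address)
        self.entries[line] = Entry(tag=address // self.num_lines, target=target)
        self.wlgv[line] = (history >> 1) & 1     # MSB of 2-bit history data

    def read(self, address: int) -> Optional[Entry]:
        """Reads are conditional: a gating value of 0 blocks the array access."""
        line = self._decode(address)
        if self.wlgv[line] == 0:
            return None                          # no access, no output
        self.array_reads += 1
        return self.entries[line]


btb = GatedBTB(num_lines=8)
btb.write(0x10, target=0x80, history=0b11)   # Strongly Taken: reads enabled
btb.write(0x11, target=0x90, history=0b01)   # Weakly Not-Taken: reads gated off
print(btb.read(0x10), btb.read(0x11), btb.array_reads)
```

Counting `array_reads` separately from calls to `read` makes the power argument visible: a gated lookup returns without touching the array at all.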

A branch prediction unit according to the present invention receives a branch instruction and conditionally enables access to the corresponding data entry stored in the branch target buffer. A data entry is read from the branch target buffer only when the corresponding branch history data predicts that the data entry will be needed. An instruction that is unlikely to take its branch does not enable a read operation on the branch target buffer. As a result, the power that would otherwise be consumed by unnecessary read operations is conserved.

For example, FIG. 9 shows the results of an EEMBC benchmark simulation using a branch prediction unit designed according to the present invention. For a series of benchmark routines arranged along the horizontal axis, the figure compares the predicted rate of branch instructions with the actual rate of branch instructions. This particular simulation shows that about 40% of branch predictions are associated with the Not-Taken state. Accordingly, the branch target buffer power consumption associated with read operations on the branch target buffer memory array can be reduced by up to about 40%.

However, the present invention does not achieve this power saving at the cost of increased complexity or reduced operating speed. When an instruction is input to the branch prediction unit, it is processed immediately by the decoder, and, if access is enabled, the corresponding data entry is output immediately. No delay is incurred to pre-decode the instruction, to retrieve or calculate branch history data for it, and only then to generate a signal that enables or disables the corresponding read operation on the branch target buffer memory array. The present invention requires no complex additional circuitry or functions to conditionally enable read operations on the branch target buffer memory array.

Instead, the corresponding word line gating value waits in the branch target buffer for each instruction to be processed. Applying the word line gating value within a simple word line gating circuit enables or disables access to the word line storing the data entry that corresponds to the incoming instruction. The word line gating value can be updated easily and accurately in the word line gating circuit after each instruction is executed.
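One way to picture the per-execution update is with the 2-bit up/down saturating counter that the claims mention as a possible state machine. In the sketch below, the class name `GatingValueTracker`, the initial state, and the rule that the gating value is the upper bit of the counter are assumptions; the source states only that the gating value is derived from the branch history data.

```python
class GatingValueTracker:
    """2-bit up/down saturating counter with a derived 1-bit gating value."""

    def __init__(self, state: int = 1):
        # States 0..3; the starting state is an arbitrary assumption.
        self.state = state

    @property
    def gating_value(self) -> bool:
        # Assumed mapping: states 2 and 3 give a positive indication.
        return self.state >= 2

    def update(self, taken: bool) -> bool:
        """Apply one executed branch outcome and return the new gating value."""
        if taken:
            self.state = min(3, self.state + 1)   # count up, saturate at 3
        else:
            self.state = max(0, self.state - 1)   # count down, saturate at 0
        return self.gating_value


tracker = GatingValueTracker(state=0)
history = [tracker.update(outcome)
           for outcome in (True, True, True, True, False, False)]
```

Because the gating value is refreshed when the branch outcome becomes known, it is already in place when the same word line is next selected.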

Although specific embodiments have been described in the detailed description of the present invention, various modifications can be made without departing from the scope of the invention. Therefore, the scope of the present invention should not be limited to the embodiments described above, but should be determined by the claims set out below and by their equivalents.

FIG. 1 is a diagram illustrating a branch target buffer and the related components required to output a data entry from the branch target buffer memory array.
FIG. 2 is a block diagram illustrating a processor according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating a superscalar processor according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the branch prediction unit shown in FIG. 2.
FIG. 5 is a state diagram illustrating the states of a state machine that can be included in a branch history unit according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating a branch target buffer memory array according to an embodiment of the present invention.
FIG. 7 is a circuit diagram illustrating a gate word line structure according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of operating a branch prediction unit according to the present invention.
FIG. 9 is a diagram showing benchmark simulation results for a branch prediction unit according to the present invention.

Explanation of symbols

10: processor
12: main memory
13: fetch unit
15: instruction decoder unit
17: execution unit
19: prediction unit

Claims

A branch target buffer memory array comprising:
a word line and a word line gating circuit associated with the word line,
wherein the word line gating circuit includes a memory circuit for storing a word line gating value.

The memory array of claim 1, wherein the memory array is provided to store a data entry associated with the word line.

The memory array of claim 2, wherein the word line gating circuit further includes a gating logic circuit.

The memory array of claim 3, wherein the gating logic circuit activates an access operation to the data entry in response to a word line voltage applied to the word line and the word line gating value.

The memory array of claim 4, wherein the gating logic circuit is responsive to a write signal.

The memory array according to claim 4, wherein the access operation is a read operation applied to the word line.

The memory array according to claim 5, wherein the access operation is a write operation applied to the word line.

The memory array according to claim 1, wherein the memory array includes a volatile memory cell array.

The memory array according to claim 8, wherein the memory array is an SRAM array.

The memory array of claim 9, wherein the memory circuit is a 1-bit SRAM cell.

The memory array according to claim 2, wherein the data entry includes a branch target tag and a branch address.

The memory array of claim 5, wherein the memory circuit includes a 1-bit SRAM cell that stores the word line gating value, and
the gating logic circuit includes a first logic gate that receives the write signal and the word line gating value and generates a first logic output, and a second logic gate that receives the first logic output and the word line voltage and generates a second logic signal.

The memory array of claim 5, wherein the word line includes a selected word line portion to which the word line voltage is input and a corresponding gate word line portion.

The memory array of claim 13, wherein the selected word line portion and the gate word line portion are electrically connected through the word line gating circuit.

The memory array of claim 12, wherein the word line includes a selected word line portion to which the word line voltage is input and a corresponding gate word line portion.

The memory array of claim 15, wherein the selected word line portion and the gate word line portion are electrically connected through the word line gating circuit, and
the second logic signal includes a word line voltage applied to the gate word line portion.

A branch target buffer memory array that stores data entries in response to a write operation and outputs the data entries in response to a read operation, the memory array comprising:
a word line gating circuit that enables access to the data entry during the write operation and conditionally enables access to the data entry during the read operation in response to a word line gating value stored in the word line gating circuit.

The memory array of claim 17, wherein the word line gating circuit includes:
a memory circuit for storing the word line gating value; and
a gating logic circuit that receives a write signal and the word line gating value as inputs, enables access to the data entry during the write operation, and conditionally enables access to the data entry during the read operation when the word line gating value gives a positive indication.

The memory array according to claim 18, wherein the gating logic circuit further receives a word line voltage as an input, and
access to the data entry is conditionally enabled during the read operation when a positive indication is given by the word line gating value and the word line voltage.

The memory array of claim 19, wherein the memory array comprises an array of volatile memory cells.

The memory array of claim 20, wherein the memory array is an SRAM array and the memory circuit is a 1-bit SRAM cell.

A branch target buffer comprising a memory array and a decoder,
wherein the memory array includes gate word lines, each of the gate word lines including:
a selected word line portion;
a word line gating circuit including a memory circuit for storing a word line gating value; and
a gate word line portion,
and wherein the decoder is configured to receive an instruction part and select one of the gate word lines in response to the instruction part.

The branch target buffer according to claim 22, wherein the decoder is provided to apply a word line voltage to a selected word line portion of a gate word line selected by the decoder.

The branch target buffer of claim 23, wherein the word line gating circuit further comprises a gating logic circuit that receives the word line voltage and the word line gating value as inputs.

The branch target buffer of claim 24, wherein the word line gating circuit is provided to conditionally enable an access operation to the gate word line portion of the gate word line selected by the decoder in response to the word line voltage and the word line gating value.

The branch target buffer according to claim 25, wherein the access operation is a read operation.

The branch target buffer according to claim 25, wherein the word line gating circuit is provided to accept a write signal at an input, and
the access operation is a write operation.

The branch target buffer of claim 26, wherein the memory array includes an SRAM array, the memory circuit includes a 1-bit SRAM cell, and the gating logic circuit includes:
an OR gate that receives the write signal and the word line gating value as inputs and outputs a first logic signal; and
an AND gate that receives the word line voltage and the first logic signal as inputs and outputs a second logic signal.

The branch target buffer of claim 28, wherein the second logic signal is a word line voltage applied to the gate word line portion of the gate word line selected by the decoder.

The branch target buffer according to claim 22, further comprising a sense amplifier adapted to receive a data entry from the gate word line selected by the decoder.

The branch target buffer of claim 30, wherein the sense amplifier includes a circuit for transmitting a word line gating value to each memory circuit associated with the word line.

The branch target buffer of claim 31, wherein the memory array includes an SRAM array, and
the memory circuit includes a 1-bit SRAM cell for accepting a word line gating value.

The branch target buffer of claim 32, wherein the word line gating circuit includes:
an OR gate that receives the write signal and the word line gating value from the 1-bit SRAM cell as inputs and outputs a first logic signal; and
an AND gate that receives the word line voltage and the first logic signal as inputs and outputs a second logic signal.

A branch prediction unit comprising:
a branch history unit for storing branch history data;
branch prediction logic that receives an instruction address, provides a predicted address, and updates the branch history data; and
a branch target buffer that receives the instruction address,
wherein the branch target buffer includes a memory array including gate word lines, each gate word line storing a data entry and including a word line gating circuit that includes a memory circuit storing a word line gating value derived from the branch history data.

The branch prediction unit of claim 34, wherein the branch prediction logic provides a write signal to the branch target buffer.

The branch prediction unit of claim 35, wherein the branch history unit includes a state machine that calculates branch history data for an instruction according to the branch execution history of that instruction.

The branch prediction unit according to claim 36, wherein the branch history unit includes a branch history table that stores the branch history data, and the state machine includes a 2-bit up/down saturating counter.

The branch prediction unit according to claim 34, wherein the memory array includes an SRAM array, the memory circuit includes a 1-bit SRAM cell, and the word line gating value includes a single data bit derived from the branch history data.

The branch prediction unit according to claim 38, wherein the branch target buffer further includes:
a decoder that receives the instruction address and selects a gate word line in response to the instruction address; and
a sense amplifier that receives a data entry from the selected gate word line and includes a circuit for transmitting a word line gating value to each word line gating circuit associated with the gate word lines.

A branch prediction unit comprising:
a branch history unit for storing branch history data; and
a branch target buffer including a plurality of gate word lines, each gate word line being accessed through operation of a corresponding word line gating circuit,
wherein the branch target buffer is provided to output a data entry in response to an instruction part received by the branch target buffer and a word line gating value derived from the branch history data.

The branch prediction unit of claim 40, further comprising branch prediction logic that receives the instruction part and provides a predicted address in response to the data entry output by the branch target buffer.

The branch prediction unit of claim 41, wherein the branch prediction logic provides a write signal to the branch target buffer.

The branch prediction unit according to claim 41, wherein the branch history unit includes a branch history table that stores the branch history data.

The branch prediction unit according to claim 43, wherein the memory array includes an SRAM array, the memory circuit includes a 1-bit SRAM cell, and the word line gating value includes a single data bit derived from the branch history data.

A processor comprising:
an instruction fetch unit that receives an instruction and provides a corresponding instruction address;
a branch prediction unit that receives the instruction address and provides a predicted address to the instruction fetch unit; and
an instruction decoder / execution unit that receives the instruction, provides a decoded instruction, and provides an updated address in response to execution of the decoded instruction,
wherein the branch prediction unit includes:
a branch history unit for storing branch history data;
branch prediction logic that receives the instruction address and the updated address, provides the predicted address, and updates the branch history data; and
a branch target buffer that receives the instruction address and outputs a data entry,
and wherein the branch target buffer includes a memory array including gate word lines, each gate word line is connected to a corresponding word line gating circuit, and the word line gating circuit includes a memory circuit for storing a word line gating value derived from the branch history data.

The processor of claim 45, wherein the decoder / execution unit includes a plurality of execution paths, each execution path including a decoder and an execution unit.

The processor of claim 46, wherein the processor is a superscalar processor.

The processor of claim 47, wherein the processor is a vector processor or a single-instruction-multiple-data processor.

The processor of claim 45, wherein the branch prediction logic provides a write signal to the branch target buffer.

The processor of claim 49, wherein the branch history unit includes a state machine that calculates branch history data for an instruction according to the branch execution history of that instruction.

The processor of claim 50, wherein the branch history unit includes a branch history table that stores the branch history data.

The processor of claim 45, wherein the memory array includes an SRAM array, the memory circuit includes a 1-bit SRAM cell, and the word line gating value includes a single data bit derived from the branch history data.

The processor of claim 52, wherein the branch target buffer includes:
a decoder that receives the instruction address and selects a gate word line in response to the instruction address; and
a sense amplifier that receives the data entry from the selected gate word line and includes a circuit for transmitting a word line gating value to each word line gating circuit associated with the gate word lines.