JPH04219825A

JPH04219825A - Data processor and method for loading multi-port register file

Info

Publication number: JPH04219825A
Application number: JP6245091A
Authority: JP
Inventors: James M. Arnold; Glen J. Hinton; Frank S. Smith
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 1990-03-05
Filing date: 1991-03-05
Publication date: 1992-08-10
Also published as: GB9320089D0; GB9101089D0; GB2241801B; GB2241801A

Abstract

PURPOSE: To ensure that the most recent data is used when an instruction stream is executed in a pipeline. CONSTITUTION: A register file for a pipelined microprocessor is provided with bypass structures 16 and 24 that drive the correct source data from the immediately preceding write result. Load data and execution result data 52 are returned to the RAM array in the second phase of a clock cycle but are actually written into the RAM array in the first phase of the next cycle. To avoid delaying an instruction by one cycle while it waits for the data to be written to the RAM and then read back out, bypass logic drives the returning load or execution result data onto the column lines of the read port for the source bus in the second phase of the cycle in which the data is returned.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to data processors and, more particularly, to a register file having a bypass structure that drives the correct source data from the immediately preceding write result, so that the most recent data is always used.

[0002]

[Prior Art] U.S. Pat. No. 4,981,733, entitled "Register Scoreboarding on a Microprocessor Chip," issued January 2, 1990 to David Budde et al. and assigned to Intel Corporation, describes an apparatus that minimizes idle time when an instruction stream is executed in a pipelined microprocessor by applying a scoreboarding technique to load instructions in a register file containing user-accessible registers. Other prior art extends scoreboarding to all multi-cycle operations. Several instructions are issued in each clock cycle and executed concurrently. To satisfy the register access requests for the multiple operands, a multiport register file is provided, which allows multiple operations to access the data they need simultaneously. A new RAM array cell is also provided to do this in the most efficient manner.

[0003]

[Problem to Be Solved by the Invention] In the prior art, bypassing was performed outside the basic register file array by multiplexing the correct data directly onto the source data bus after the sense amplifiers. An object of the present invention is to provide an apparatus incorporating a bypass structure that drives the correct source data from the immediately preceding write result, so that the most recent data is used in subsequent operations.

[0004]

[Means for Solving the Problem] The above object is achieved by the present invention, in which a bypass circuit is connected between a memory interface and a random access memory (RAM) array. The memory interface includes the LdData bus 106, and the RAM array includes a plurality of word registers having a plurality of exit ports. Alignment logic connected to the memory interface and the RAM array arranges data destined for the memory interface (for stores) and prepares data from the memory interface to be sent to the RAM array (for loads). Means are also provided for clearing a register in the RAM array just before the column select lines are driven. Load bypass logic bypasses the LdData bus so that incoming data returning from the memory interface is placed directly on the output RAM lines of the read ports of the RAM array.

[0005] The bypass logic includes register address compare logic that compares the address of the requested source register with the register addresses of all registers for which data is returning and produces a match output signal. In response to the match output signal, the data from the load alignment logic or the execution unit is placed directly on the RAM column lines of the source.
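The address comparison described in this paragraph can be sketched in software. This is a minimal behavioral model under stated assumptions, not the patented circuit: the function name `read_source`, the dictionary representation of the RAM and of the returning data, and the register numbers in the usage note are all hypothetical.

```python
def read_source(src_addr, ram, returning):
    """Return (value, bypassed) for one read port.

    ram:       dict mapping register address -> value currently stored
    returning: dict mapping register address -> value returning this cycle
               (from load alignment or the execution unit)
    Both containers are illustrative assumptions, not hardware structures.
    """
    # Compare the requested source address with every returning register.
    match = src_addr in returning
    if match:
        # Match: the returning data is driven in place of the RAM contents.
        return returning[src_addr], True
    # No match: the port reads the RAM cell as usual.
    return ram[src_addr], False
```

For example, with `ram = {4: 111, 5: 222}` and `returning = {5: 999}`, a read of register 5 sees the returning value 999 rather than the stale 222, while a read of register 4 is unaffected.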

[0006]

[Embodiment] An embodiment of the present invention is described below with reference to the accompanying drawings. In FIG. 1, the register file (RF) has 16 local and 16 global registers and is connected to the memory interface unit/instruction decoder 8 and the execution unit 4. The RF has four independent read ports and two write ports to support parallel operation. It also checks and maintains the register scoreboarding logic 21.

[0007] The circuit of FIG. 1 is driven by a clock with a two-phase non-overlapping design, such as the clock described in U.S. Pat. No. 4,816,700. Four clocks, PH1, PH1I, PH2, and PH2I, are distributed to the chip. PH1 and PH2 are conventional NMOS non-overlapping clocks with equal duty cycles. PH1I and PH2I are the PMOS analogs of PH1 and PH2 and are exact inversions of PH1 and PH2, respectively.

[0008] The register file (RF) is the focal point for all data operands in the microprocessor. The microprocessor implements a load/store architecture: all data operands associated with a program (except special function register operands) must reside in the RF at one time or another. The RF contains the macrocode-visible and microcode-visible RAM registers. It provides a high-performance interface to these registers through a multiport access structure that allows four reads and two writes on different registers in the same machine cycle.

[0009] The register file consists of six main logic blocks: load/store alignment 10, 12; base MUX 14; load bypass 16; RAM array 18; destination bypass 24; and Src1/Src2 MUX 26. Four reads are possible: store 58, base 50, Src1 54, and Src2 56. Likewise, two writes are possible: load 52 and destination 60.

[0010] In FIG. 2, the entire data path, including the actual RAM array 18, is organized as a 128-bit-wide data path of 4 words × 32 bits per word, with like bits of the words grouped together (word 3 bit 31, word 2 bit 31, word 1 bit 31, and so on). This arrangement is advantageous both for the width of the RAM cell and for the ease of aligning load/store data.
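The grouped-bit layout can be illustrated with a small index calculation. The sketch below assumes that columns are numbered from the least significant end and that each bit position occupies four adjacent columns, one per word; the text gives only the ordering "word 3 bit 31, word 2 bit 31, word 1 bit 31", so the numbering direction and the function names are illustrative assumptions.

```python
WORDS = 4
BITS_PER_WORD = 32

def column_index(word, bit):
    """Physical column of (word, bit), counted from the least significant end.

    Assumed numbering: each bit position is a group of four adjacent
    columns, one column per word.
    """
    if not (0 <= word < WORDS and 0 <= bit < BITS_PER_WORD):
        raise ValueError("word or bit out of range")
    return bit * WORDS + word

def slice_of(column):
    """Inverse mapping: column -> (word, bit)."""
    bit, word = divmod(column, WORDS)
    return word, bit
```

Under this numbering the same bit of all four words sits within one four-column group, which is why moving a word to a different word position only needs a small multiplexer inside each bit cell.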

[0011] FIG. 3 shows the basic timing for reading and writing the RAM array and for checking and setting the scoreboard bits. In the chart of FIG. 3, load data is returned after an arbitrary number of cycles. At that time the signal LdValid 104 is asserted, indicating that valid data is on the LdData bus 106. Registers in the RAM array are read in PH2 and written in PH1. When load data is written to the register file, the following occurs. In pipe stage 2, phase 2 ("q22"), the zeros of the data are written to the RAM; the ones are written one phase later, in q31. Because a 0 cannot overwrite a 1 in a RAM cell, the register to be written must be cleared just before the data is actually written.
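The clear-then-write sequence can be modeled with two small functions. This is a behavioral sketch only: it assumes a 32-bit register and represents "a 0 cannot overwrite a 1" as a bitwise OR; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def write_ones_only(cell, data):
    """A write that can set bits to 1 but cannot force a stored 1 back to 0."""
    return (cell | data) & MASK32

def load_write(cell, data):
    """Two-step write: clear in q22 (this writes the zeros), then set the ones in q31.

    The previous contents of `cell` are discarded by the clear, which is the
    point of the sequence.
    """
    cleared = 0                              # q22: clear the register
    return write_ones_only(cleared, data)    # q31: write the ones
```

Without the clear, writing `0x0F` into a cell holding `0xF0` would leave `0xFF`; with the clear, the cell ends up holding `0x0F` as intended.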

[0012] The load data is returned in q22 but is actually written to the RAM array in q31. To avoid delaying an Add instruction by one cycle while it waits for the data to be written to the RAM and then read back out, the RF bypasses the returning load data onto the Src1 bus in q22. The load data is written in q31 as usual while, at the same time, the Add instruction is executed by the EU 4.
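The one-cycle saving can be checked with a little phase arithmetic. The sketch assumes the "qNP" labels mean pipe stage N, phase P, with two phases per cycle, and that without a bypass the earliest read is the first PH2 after the PH1 write. That last point is inferred from the statement that registers are read in PH2 and written in PH1; the function names are illustrative.

```python
def phase_index(stage, phase):
    """Linear count of phases for label q<stage><phase>, two phases per stage."""
    return 2 * stage + (phase - 1)

def earliest_read(return_stage, bypass):
    """(stage, phase) at which data returned in PH2 of return_stage can first be read."""
    if bypass:
        # Bypassed data is available in the same phase in which it returns.
        return (return_stage, 2)
    # Otherwise the data is written in PH1 of the next stage and can only
    # be read in the PH2 that follows the write.
    return (return_stage + 1, 2)
```

For data returning in q22, the bypassed read happens in q22, whereas the read without a bypass would happen in q32, two phases (one full cycle) later.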

[0013] Load/Store Alignment

The load and store alignment logic blocks 10, 12 arrange data destined for the memory interface (for stores) and prepare data from the memory interface to be sent to the RAM array (for loads). Because the procedure is nearly identical in both cases, merely reversed in direction, only the load alignment process is described.

[0014] Load data returning from the memory interface is arranged so that it is word-aligned to the least significant word (LSW), i.e., word 0. For example, a word returning from word 2 of a four-word memory block is shifted to word 0 before it is placed on the LdData bus. Just as the RF data path is organized as 4 words × 32 bits per word with like bits of the words grouped together (all the bit 0s, all the bit 1s, and so on), so are the LdData and StData buses. A word shift to word 0 is therefore simply a multiplexing operation within each bit cell. The memory interface performs only this partial, word-level alignment, so sub-word (byte and short-word) accesses are treated the same as word accesses from the memory interface's point of view. For example, a byte returning from byte 13 of a 16-byte (0-15) memory block returns in word 3, bits 8-15. The memory interface then aligns it to the least significant word, word 0, although the byte is still in bits 8-15.
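The byte 13 example can be reproduced with a short calculation. The sketch assumes that bytes are numbered from the least significant end of each word, which is consistent with the example in the text (byte 13 lands in word 3, bits 8-15). The function names and the list representation of the four-word block are hypothetical.

```python
def byte_location(byte_index):
    """Locate a byte of a 16-byte block as (word, low_bit, high_bit)."""
    if not 0 <= byte_index < 16:
        raise ValueError("byte index must be in 0..15")
    word, byte_in_word = divmod(byte_index, 4)
    low = 8 * byte_in_word
    return word, low, low + 7

def word_align(block_words, word):
    """Word-level alignment: move the selected word of the block to word 0.

    The bits are not moved within the word, so a sub-word quantity keeps
    its bit position after this step.
    """
    return block_words[word]
```

After `word_align`, the byte from the example is in word 0 but still occupies bits 8-15; moving it down to bits 0-7 is a separate byte-alignment step.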

[0015] The first step performed by the RF load alignment logic 10 is to byte-align the incoming data to the least significant byte (LSB). This needs to be done only when the returning data is a sub-word quantity. The RF determines this from the TypeIn (or length) field, which is returned a phase ahead of the data. Byte-aligning the incoming data requires physically moving the data to the lowest byte, that is, actually "steering" the data perpendicular to the RF data path. All data destined for the register file is then fully aligned to the LSB.

[0016] For loads, zero extension is performed at this point. This is unique to the load alignment block; stores need not perform these operations. If the data returning from memory is a byte or short word and bit 3 of the TypeIn field is zero, zero extension must be performed to pad the remainder of the 32-bit word to be written to the register. The final step is register alignment, which places the word in its intended word location. The word is then written to the RAM array.
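Byte alignment followed by zero extension amounts to a shift and a mask. The sketch below is an illustration under assumptions: it takes the bit position and size of the sub-word quantity as explicit arguments, because the encoding of the TypeIn field is not given in the text, and the function name is hypothetical.

```python
def byte_align_zero_extend(word, low_bit, size_bits):
    """Steer a sub-word quantity to the least significant bits of a 32-bit
    word and fill the remaining upper bits with zeros.

    word:      32-bit value holding the sub-word quantity
    low_bit:   bit position of the quantity's least significant bit
    size_bits: 8 for a byte, 16 for a short word
    """
    if size_bits not in (8, 16):
        raise ValueError("only byte and short-word quantities are sub-word")
    mask = (1 << size_bits) - 1
    # The shift performs the byte alignment; the mask performs the zero extension.
    return (word >> low_bit) & mask
```

For a byte sitting in bits 8-15, `byte_align_zero_extend(word, 8, 8)` returns a 32-bit value whose low byte is that byte and whose upper 24 bits are zero.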

[0017] Base MUX

The base MUX 14 contains a 2:1 multiplexer that reduces the 64-bit field from the RAM array read port shown in FIG. 2 to the 32-bit base required by the memory interface 8. The base MUX is needed to handle the 64-bit base bus 50 of the RAM array. It also saves area in the RAM array that would otherwise be needed to multiplex the 64-bit value down to the 32-bit base. The multiplexer is controlled by bit 1 of the BaseAdr bus, which specifies which word is to be placed on the base bus.
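A 2:1 word multiplexer of this kind can be modeled in a few lines. In this sketch the select value is passed in directly, and the mapping of select 0 to the low word and select 1 to the high word is an assumption. The text says only that bit 1 of BaseAdr chooses the word; it does not give the bit-numbering convention or the polarity.

```python
def base_mux(field64, select):
    """2:1 multiplexer: return the low word (select=0) or the high word
    (select=1) of a 64-bit field. The polarity is an illustrative assumption."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return (field64 >> (32 * select)) & 0xFFFFFFFF
```

For the 64-bit field `0x1122334455667788`, select 0 yields `0x55667788` and select 1 yields `0x11223344`.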

[0018] Load Bypass

The load bypass logic block 16 contains the logic that bypasses the LdData bus 106 returning from the memory interface to the various exit ports: StData 58, base 50, Src1 54, and Src2 56. The register addresses of all registers for which data is returning are compared with the address of the requested source register. If there is a match, the bypass logic places the data from the load alignment logic block directly on the RAM column lines of the source. From the point of view of the Src bus, the result is the same as if the data had been read from the RAM array cells; no difference can be detected.

[0019] This method of driving the column lines is possible because the register being bypassed has already been cleared by the clear line 67 shown in FIG. 2, which is asserted just before the column lines are driven. Were it not, the stale contents of the register would be driven onto the column lines, because the decode logic still enables the RAM cells to drive their data. The column lines are precharged negative true, which means that a zero in a cell does not affect the state of the line.
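Why the clear matters can be seen in a simple model of a precharged, negative-true column line. The sketch treats the line as a logical OR of everything driving it: a driver presenting 1 discharges the line, and a driver presenting 0 leaves the precharged state alone. The function names and the list-of-bits representation are illustrative assumptions.

```python
def column_line(drivers):
    """Logical value on a precharged, negative-true column line.

    Any driver presenting a 1 discharges the line (logical 1); drivers
    presenting 0 do not affect it.
    """
    return 1 if any(drivers) else 0

def bypass_read(cell_bits, bypass_bits, cleared):
    """Per-bit value seen at the read port while bypass data is driven.

    The enabled RAM cell and the bypass driver share each column line.
    """
    if cleared:
        cell_bits = [0] * len(cell_bits)   # clear line asserted beforehand
    return [column_line((c, b)) for c, b in zip(cell_bits, bypass_bits)]
```

With stale cell contents `[1, 1, 0, 0]` and bypass data `[0, 1, 0, 1]`, the port sees the bypass data exactly when the cell was cleared first; without the clear, the stale 1 in the first position corrupts the result.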

[0020] RAM Array

The RAM array logic block consists of the literal generation logic 19, the register RAM array 18, the address decoders 20, 22, and the register scoreboard bits 21. The register file provides 32 literals, the values 0 through 31, for use by the programmer or microprogrammer. The literal logic 19 that generates these values sits directly above the RAM array, and the RAM column lines 50, 58 run through that section and continue on to the load bypass logic block 16. When a literal is requested as a Src1 or Src2 operand (literals are not allowed as sources for base and store use), its corresponding "register address" is placed on the S1Adr or S2Adr bus.

[0021] Destination Bypass

The destination bypass logic block 24 contains the circuitry that bypasses the Dst bus 110 returning from the EU or the REG coprocessor to the various exit ports: the StData bus, the base bus, Src1, and Src2. The destination bypass is nearly identical to the load bypass 16, with a few minor differences. Because the Dst bus 110 is only 64 bits wide, at most two registers can be bypassed, so the logic is actually simpler in the destination bypass. In the load bypass, the register address compare logic must handle the possibility of four registers being bypassed, because the LdData bus 106 is 128 bits wide. Apart from these differences, the logic is much the same as the load bypass circuit.
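The relationship between bus width and the number of registers the compare logic must consider can be sketched as follows. The model assumes 32-bit registers and assumes that a multi-register return covers consecutive registers beginning at a starting register; the text does not describe the comparators at this level of detail, and all names are hypothetical.

```python
REGISTER_BITS = 32

def max_bypassed_registers(bus_width_bits):
    """Number of 32-bit registers that one return on the bus can carry."""
    return bus_width_bits // REGISTER_BITS

def match_slot(src_addr, start_reg, count):
    """Return the word slot on the returning bus that holds src_addr,
    or None if no returning register matches.

    Assumes the return covers `count` consecutive registers from start_reg.
    """
    offset = src_addr - start_reg
    return offset if 0 <= offset < count else None
```

A 128-bit bus gives four candidate registers per return and a 64-bit bus gives two, which matches the statement that the destination bypass has fewer cases to handle than the load bypass.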

[0022] Src1 and Src2 Multiplexers

The Src1 and Src2 multiplexers 26 contain the multiplexers needed to select either one of the two words of the 64-bit source RAM data or the SFRInBus to drive the 32-bit operand onto the Src1 and Src2 buses. The logic block also contains buffers that drive the Src1Hi bus, providing the full 64-bit source when needed. The control needed to multiplex the three possible sources into the single-word Src operand is the LSB of S1Adr (or S2Adr), together with the upper two bits of the S1Adr (S2Adr) field, which, when equal to "10", tell the logic when to enable the SFRInBus.

[0023] The Src1Hi bus is driven regardless of whether the data is needed by the EU or the REG coprocessor. The buses and signals that connect the RF to the other logic blocks, as shown in FIG. 1, are summarized below.

[0024] Memory Interface Buses

The following buses carry the actual data to and from the RF.

LdData (0:127): The 128-bit load data bus that returns information from the memory interface (external memory, data cache, and so on).

StData (0:127): The 128-bit store data bus that sends information to the memory interface.

Base (0:31): The 32-bit base address bus sent to the memory interface to specify the memory address of a load or store.

[0025] The following buses carry control and register address information and specify the type and location information for the data buses above. All register addresses are 7 bits.

BaseAdr: The address of the register to be used to drive the base bus.

LdAdrOut: The load address out bus is used in several cases. It is sent to the RF by the IS together with the opcode and specifies the starting register into which data is to be returned and which is to be scoreboarded for a load instruction (for example, "GO" for a quad-word access). It is also used to specify the starting register to be sent onto the StData bus for a store instruction. Finally, it contains the address of the register for the data returned by an LDA (load effective address) instruction.

[0026] LdAdrIn: The register address of the load or LDA data returning from the memory interface or the IS. It is driven when the data is about to return to the register file.

TypeOut (0:3): This 4-bit field specifies the length of the access and the type of extension used for sub-word accesses. It is driven by the IS together with the opcode and the LdAdrOut bus. It is used to determine which registers to scoreboard for a load and which registers drive the StData bus for a store.

TypeIn (0:3): The TypeOut field as latched by the memory interface while it waits for the data to return, whether from the data cache or from external memory. It is returned together with the LdAdrIn bus.

[0027] LdStOut (0:3): Determines which kind of memory operation is requested: load, LDA, store, or instruction fetch. It is sent together with the TypeIn and LdAdrIn fields.

LdValid: This signal, driven by the memory interface, is asserted when valid data is placed on the LdData bus.

MemScbok: This signal, driven by the RF, indicates to the rest of the logic that a register used by the current memory-type instruction is not free and that the instruction must be reissued once the register is no longer in use.

[0028] Register Execution Buses

The following buses carry data to and from the RF.

Src1Hi, Src1: These two 32-bit buses form the 64-bit source operand #1 sent to the EU and the coprocessors.

Src2Hi, Src2: These two 32-bit buses form the 64-bit source operand #2 sent to the EU and the coprocessors.

DstHi#, DstLo#: These form the 64-bit destination bus that the EU and the coprocessors use to return the results of executed operations.

SFRInBus (0:31)#: The 32-bit special function register bus, which allows external core logic functions to be read as if they were registers. When the register address field matches an SFR register address, the RF allows the SFRInBus to drive the Src1 or Src2 bus. It is also an asserted-low bus.

[0029] The following buses carry register address information to and from the RF. Each is 7 bits.

S1Adr: The address of the register used to drive the Src1 bus.

S2Adr: The address of the register used to drive the Src2 bus.

DstAdrOut: The address of the register that will be used to store the result of the operation to be executed. It is used by the RF to scoreboard (and check) the appropriate register.

DstAdrIn: The register address for the data returning on the DstHi and DstLo buses.

[0030] Although the present invention has been described with reference to an embodiment, it will be apparent to those skilled in the art that various modifications can be made without departing from the spirit of the invention.

[Brief explanation of the drawing]

FIG. 1 is a functional block diagram of a register file according to the present invention.

FIG. 2 is a detailed block diagram of the RAM array and compare logic in the RAM array of the register file of FIG. 1.

FIG. 3 is a timing diagram showing the operation of the circuits of FIGS. 1 and 2.

[Explanation of symbols]

4 Execution unit
8 Memory interface unit/instruction decode unit
10 Load alignment unit
12 Store alignment unit
14 Base MUX
16 Load bypass unit
18 Register RAM array
19 Literal generation logic
20 Address decoder
22 Address decoder

Claims

[Claims]

1. A data processor comprising: a memory interface including an LdData bus; a RAM array including a plurality of word registers; alignment logic, connected to the memory interface and the RAM array, for arranging data destined for the memory interface (for stores) and preparing data from the memory interface to be sent to the RAM array (for loads); means for clearing the registers in the RAM array; load bypass logic for bypassing the LdData bus returning from the memory interface onto the column lines of the RAM array exit ports StData, base, Src1, and Src2; register address compare logic that compares the address of the requested source register with the register addresses of all registers for which data is being returned and generates a match output signal; and means, responsive to the match output signal, for placing the data from the load alignment logic directly onto the RAM column lines of the source.

2. A method of loading a multiport register file having a plurality of RAM cells in a pipelined microprocessor, comprising the steps of: A. enabling the RAM cells with a decoded address to select the particular cells that are to receive data; B. asserting a clear line to set the selected cells to a zero state; C. precharging the column lines negative true so that zeros in the cells do not affect the state of the cell output lines; D. comparing the addresses of all registers for which data is being returned with the address of the requested source register; E. generating a match signal if they are equal; and F. placing the data on the data bus directly onto the source RAM column lines for those cells for which a match signal is generated.