JPH09146835A - System for prefetch of data

Info

Publication number: JPH09146835A
Application number: JP8225449A
Authority: JP
Inventors: Michael J. Mayfield; Dwain Alan Hicks; David Scott Ray; Shih-Hsiung Stephen Tung
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1995-11-06
Filing date: 1996-08-27
Publication date: 1997-06-06
Also published as: KR100262906B1; TW371331B; KR970029103A; US6085291A

Abstract

PROBLEM TO BE SOLVED: To reduce the latency of fetching data from memory in a microprocessor system. SOLUTION: In a data processing system implementing primary and secondary caches (L1 and L2), a stream filter, and stream buffers, cache lines are prefetched progressively. In a first mode, data is not prefetched. In a second mode, two cache lines are prefetched: one line into the L1 cache and the next line into a stream buffer. In a third mode, three or more cache lines are prefetched at a time. Prefetching may be performed on cache misses or on hits. Cache misses on successive cache lines can cause a stream of cache lines to be allocated to the stream buffers.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION This invention relates generally to data processing systems, and more particularly to a system for prefetching data from memory.

[0002]

DESCRIPTION OF THE PRIOR ART Special very-high-speed memory is sometimes used to increase the processing speed of a data processing system by making current programs and data available to the processor (CPU) at high speed. Such a high-speed memory is known as a cache and is sometimes used in large computer systems to compensate for the speed difference between main memory access time and processor logic. Processor logic is usually faster than main memory access time, so processing speed is mostly limited by the speed of main memory. One technique for compensating for this mismatch in operating speeds is to place, between the CPU and main memory, a small and extremely fast memory whose access time is close to the propagation delay of the processor logic. It is used to store segments of the program currently executing in the CPU and temporary data frequently needed in the current computation. By making programs (instructions) and data available at high speed, processor performance can be increased.

[0003] Analysis of a large number of typical programs has shown that memory references in any given interval of time tend to be confined to a few localized areas of memory. This phenomenon is known as the property of "locality of reference." The reason for this property can be understood by considering that a typical computer program flows in a straight line and frequently encounters program loops and subroutine calls. When a program loop is executed, the CPU repeatedly refers to the set of instructions in memory that make up the loop. Each time a given subroutine is called, its set of instructions is fetched from memory. Thus loops and subroutines tend to localize the memory references for instruction fetches. To a lesser degree, memory references to data also tend to be localized. Table lookup procedures repeatedly refer to the portion of memory where the table is stored. Iterative procedures refer to common memory locations, and arrays of numbers are stored in a local portion of memory. The result of these observations is locality of reference: over a short interval of time, the addresses of instructions generated by a typical program refer repeatedly to a few localized areas of memory, while the remainder of memory is accessed relatively infrequently.

[0004] If the active portions of the program and data are placed in a small, fast memory, the average memory access time is reduced, and so the total execution time of the program is reduced. Such a small, fast memory is referred to as a cache, as noted above. Cache access time is often 5 to 10 times shorter than main memory access time. The cache is the fastest component in the memory hierarchy and approaches the speed of the CPU components.

[0005] The basic idea of cache organization is that, by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time approaches the access time of the cache. Although the cache is only a small fraction of the size of main memory, a large fraction of memory requests are found in the fast cache memory because of the locality of reference of programs.

[0006] The basic operation of the cache is as follows. When the CPU needs to access memory, the cache is examined. If the word is found in the cache, it is read from the fast memory. If the word addressed by the CPU is not found in the cache, main memory is accessed to read the word. A block of words containing the word just accessed is then transferred from main memory to the cache. In this way, some data is transferred to the cache so that future memory references find the required words in the fast cache memory.

[0007] The average memory access time of a computer system can be improved considerably by the use of a cache. The performance of cache memory is frequently measured in terms of a quantity called the "hit ratio." When the CPU refers to memory and finds the word in the cache, it is said to produce a "hit." If the word is not found in the cache, it is in main memory and counts as a "miss." If the hit ratio is high enough that most of the time the CPU accesses the cache instead of main memory, the average access time is close to the access time of the fast cache memory. For example, a computer with a cache access time of 100 ns, a main memory access time of 1000 ns, and a hit ratio of 0.9 produces an average access time of 200 ns. This is a considerable improvement over a similar computer without a cache memory, whose access time is 1000 ns.
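The 200 ns figure in the example above can be reproduced with a short calculation. The sketch below is illustrative only: the text does not state the formula, so the cost model (a miss pays for the cache probe and then for the main memory access) and the function name `average_access_time` are assumptions chosen because they yield the quoted result.

```python
def average_access_time(hit_ratio, cache_ns, main_ns):
    """Average access time in ns.

    Assumed model (not stated in the text): a hit costs one cache access,
    a miss costs the cache probe plus one main memory access.
    """
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ns + miss_ratio * (cache_ns + main_ns)


# Figures from paragraph [0007]: 100 ns cache, 1000 ns main memory, hit ratio 0.9.
avg = average_access_time(0.9, 100, 1000)
print(round(avg))  # 200
```

Under this model, the cacheless machine simply pays 1000 ns on every access, which is the comparison the paragraph draws.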

[0008] In modern microprocessors, processor cycle times continue to improve as technology advances. Design techniques such as speculative execution, deeper pipelines, and more execution elements also continue to improve microprocessor performance. The improved performance places a heavier burden on the memory interface, because the processor demands that more data and instructions be supplied from memory to the microprocessor. To reduce memory latency, large on-chip caches (primary, or L1, caches) are implemented, and these are often augmented by larger off-chip caches (secondary, or L2, caches).

[0009] Prefetching techniques are often implemented to supply memory data to the primary cache ahead of time and so reduce latency. Ideally, a program prefetches data and instructions far enough in advance that a copy of the memory data is always in the primary cache when the processor needs it.

[0010] The problem is that microprocessor architectures do not provide enough advance information to explicitly determine the addresses of data that might be needed in all cases. For example, the address of a data operand in memory may itself be in memory and must be fetched by a first instruction before it can be used by the memory instruction. In such a sequence, the processor does not have the address in advance with which to perform a prefetch.

[0011] Prefetching of instructions and/or data is known. However, existing prefetching techniques often prefetch instructions and/or data too early. The problems with prefetching instructions and/or data and then not using them are that (1) the prefetched data may displace data needed by the processor, and (2) the prefetch memory accesses may cause subsequent processor cache reloads to wait for the prefetch accesses, increasing the latency of needed data. Both of these effects lower the efficiency of the CPU. What is needed, therefore, is an improved prefetching system that reduces the latency of data and instruction accesses to the primary cache caused by cache misses without degrading microprocessor performance.

[0012]

SUMMARY OF THE INVENTION It is an object of the present invention to use a stream filter in conjunction with primary and secondary caches in a microprocessor to provide prefetched data from memory and so reduce data latency in a microprocessor system.

[0013] Another object of the present invention is to provide a unique stream filter apparatus that can support multiple streams simultaneously and can progressively increment the prefetch data to control the depth of prefetching.

[0014]

MEANS FOR SOLVING THE PROBLEMS The foregoing has outlined rather broadly the features and technical advantages of the present invention so that the detailed description that follows may be better understood. Additional features and advantages of the invention are described in detail below.

[0015]

DETAILED DESCRIPTION OF THE INVENTION In the following description, numerous specific details, such as specific word or byte lengths, are set forth to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without such specific details. In other instances, well-known circuits are shown in block diagram form so as not to obscure the present invention in unnecessary detail. For the most part, details concerning timing and the like are omitted, since they are not necessary to an understanding of the present invention and are within the knowledge of those skilled in the art.

[0016] Throughout the drawings, depicted elements are not necessarily shown to scale, and identical or similar elements are designated by the same reference numeral.

[0017] Referring to FIG. 1, a data processing system 100 that advantageously embodies the present invention is described. Multiprocessor system 100 includes a plurality of processing units 106, 108, 110 operatively connected to system bus 124. Any number of processing units may be used within system 100. Also connected to system bus 124 is memory controller 104, which controls access to system memory 102. Memory controller 104 is also connected to I/O controller 126, which is connected to I/O device 128. Processing units 106, 108, 110, I/O controller 126, and I/O device 128 are all referred to herein as bus devices. As shown, each processing unit 106, 108, 110 may include a processor and an L1 (primary) cache 112, 114, 116, respectively. The L1 caches may be located on the same chip as their respective processors. L2 (secondary) caches 118, 120, 122 are connected to processing units 106, 108, 110, respectively. These L2 caches are connected to system bus 124 through the processors to which they are attached.

[0018] Each L1 and L2 cache pair is normally serially related. The L1 caches may be implemented as store-in or write-through caches, while the larger and slower L2 caches may be implemented as write-back caches. Both the L1 and L2 cache controllers (not shown) are physically implemented as part of the processing unit and are connected to it through buses internal to the processing unit. The L2 cache controller may instead be off-chip.

[0019] Referring to FIG. 2, there is shown a data processing system 200 that may also be configured to operate in accordance with the present invention. System 200 is an alternative architecture to system 100. Within systems 100 and 200, the basic operation of the processors and caches is similar. The control and functions of memory controller 104 and node controller 205 are similar with respect to the present invention.

[0020] In system 200, processor 201 has an internal L1 cache 202, which is connected to an external L2 cache 203. Processor 201 is connected by bus 204 to node controller 205. Node controller 205 performs the basic, well-known functions of connecting processor 201 to the rest of system 200. Node controller 205 is connected by bus 206 to switch 207. Switch 207 may be, for example, a cross-point switch, and connects other processors and/or I/O devices (not shown) to system memory 209 by bus 208. The remainder of the discussion refers to system 200; however, the discussion of the invention that follows is also relevant to, and may be implemented within, system 100.

[0021] A goal of the present invention is to provide an efficient and accurate technique for prefetching cache lines (data portions, or blocks of data) into L1 cache 202 so that processor 201 hits on these cache lines in L1 cache 202 a significant proportion of the time, thereby minimizing retrievals of address and data information from system memory 209, which degrade the performance of processor 201.

[0022] Referring to FIG. 3, stream filters are used to reduce the occurrence of prefetching data that is never used. These stream filters are history buffers that contain address and direction information. A stream filter contains the address of the cache line that is next sequentially higher than a line that missed in the L1 cache. If the processor then accesses that next higher cache line, a stream condition is detected and a stream buffer is allocated. The stream filter is written with line address "X+1" when there is an access to line address "X." If a subsequent access is made to address "X+1" while "X+1" is still resident in the stream filter (a filter hit), "X+1" is allocated as a stream.

[0023] Stream buffers are prefetch buffers that hold potential cache data. The idea is that if the program executing in the processor is working through a sequential stream of data or instructions, prefetching additional lines from system memory into a stream buffer could be useful. A subsequent cache miss can then find the data in the stream buffer, which supplies the data to the L1 cache and/or the processor.

[0024] The stream filter and stream buffers cooperate so that when there is an L1 cache miss that also misses the stream buffers, the miss address is compared against the addresses stored in the stream filter. If there is a hit in the stream filter (meaning that there was a sequential access to sequential lines of data), there is a good possibility that the next line will also be needed in the future.

[0025] Referring to FIG. 4, there is shown a more detailed diagram of system 200 configured in accordance with the present invention, together with the data flow within CPU 201. Variations of the data flow are known, including, for example, the use of separate L1 caches for instructions and data. L1 cache 202 holds copies of frequently used data from memory 209, using any known replacement policy. The larger L2 cache 203 holds more data than L1 cache 202 and ordinarily controls the memory coherency protocol. The data in L1 cache 202 may be a subset of the data in L2 cache 203. L1 cache 202 and L2 cache 203 are "store-in" caches. Other functional elements (including I/O) contend for data using a known snoop protocol. One form of snooping is disclosed in U.S. patent application Ser. No. 442740.

[0026] The boundary shown for CPU 201 represents a chip boundary and a functional boundary, but is not meant to restrict the scope of the invention. PCC 404 is the processor cache controller; it controls fetching (prefetching) from and storing to the memory subsystem. PCC 404 has other known functions, such as implementing directory 410 for L1 cache 202 and translating effective addresses to real addresses and vice versa. Prefetch buffer (PBFR) 402 holds some number of lines of memory data to be staged to CPU 201 and L1 cache 202. PBFR 402 is a stream buffer.

[0027] When PCC 404 fetches data, if the data is in L1 cache 202 (an L1 hit), it is sent to PCC 404. If it is not in L1 cache 202 (an L1 miss) but is in L2 cache 203 (an L2 hit), a line in L1 cache 202 is replaced with the requested data from L2 cache 203. In this case the data is sent simultaneously to L1 cache 202 and PCC 404. If there is also a miss in L2 cache 203, the data is fetched from memory 209 into BIU 401 and loaded simultaneously into L1 cache 202, L2 cache 203, and PCC 404. Variations on this operation are known. Data store operations are similar to fetch operations, except that the data is stored into an L1 line to complete the operation.
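The fetch path described in paragraph [0027] can be sketched as a small model. This is a simplified illustration, not the patented hardware: the class name `Hierarchy`, the use of dictionaries keyed by line address, and the absence of any capacity limit or replacement policy are all assumptions made to keep the example short.

```python
class Hierarchy:
    """Toy model of the L1 -> L2 -> memory fetch path (no eviction modelled)."""

    def __init__(self, memory):
        self.l1 = {}          # line address -> data held in the L1 cache
        self.l2 = {}          # line address -> data held in the L2 cache
        self.memory = memory  # line address -> data in system memory

    def fetch(self, line):
        """Return (data, source), where source names the level that supplied the line."""
        if line in self.l1:
            return self.l1[line], "L1 hit"
        if line in self.l2:
            data = self.l2[line]
            self.l1[line] = data   # L2 hit: the line is also placed in L1
            return data, "L2 hit"
        data = self.memory[line]   # miss in both caches: go to system memory
        self.l1[line] = data       # the line is loaded into L1 and L2 together
        self.l2[line] = data
        return data, "memory"


h = Hierarchy({7: "line seven"})
first = h.fetch(7)    # supplied by memory, fills both caches
second = h.fetch(7)   # now an L1 hit
```

A real L1 cache would have to pick a victim line on each fill; that choice is left out here because the text allows any known replacement policy.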

[0028] In the following discussion, the various portions of the stream buffer (see FIG. 3) are located in various portions of system 200. In the present implementation, the stream buffer can store four cache lines, although any number of cache lines may be implemented within the stream buffer. One cache line of the stream buffer is implemented in L1 cache 202; essentially, one of the cache lines within L1 cache 202 serves as one of the cache lines of the stream buffer. A second cache line of the stream buffer is located in PBFR (prefetch buffer) 402. The other two cache lines of the stream buffer are located in PBFR2 405 and PBFR3 406 within node controller 205. Node controller 205 may be located on a chip downstream from CPU 201 along bus 204. If the architecture of system 100 is used, memory controller 104 may contain these stream buffer lines.

[0029] As discussed in the IEEE article referenced above, the basic operation of the stream filter and stream buffer is as follows. When there is an L1 cache miss on a requested cache line, the address of that cache line is incremented (generally by one line address) and the incremented address is inserted into stream filter 403. On a subsequent miss of a cache line in L1 cache 202, the address of this L1 cache miss is compared with the addresses contained in stream filter 403. If a match with at least one address is observed, a stream of cache lines is allocated within the stream buffer.

[0030] As noted above, when there is a cache miss, the stream filter buffer is written with the address of the next sequential cache line. The stream filter (see FIGS. 3 and 5) contains a number of locations that can hold such addresses, which together make up a "history" of these events. They may be replaced on a least recently used (LRU) basis. Whenever there is a cache miss, the addresses in the stream filter are compared with the address of the cache line miss. If there is a match, a filter hit is said to have occurred and a stream is allocated. In stream mode, extra cache lines are prefetched into the stream buffer (for example, lines within L1 cache 202, PBFR 402, PBFR2 405, and PBFR3 406) in the expectation that they will be needed by L1 cache 202 as part of the stream.
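The filter behaviour described in paragraphs [0029] and [0030] can be sketched in a few lines. This is a minimal model under stated assumptions, not the circuit of FIG. 5: the class name `StreamFilter`, the default of four entries, the restriction to incrementing streams, and the choice to drop an entry once it has produced a filter hit are all hypothetical simplifications.

```python
from collections import OrderedDict


class StreamFilter:
    """Toy LRU history of guessed next-line addresses (incrementing streams only)."""

    def __init__(self, entries=4):
        self.entries = entries          # assumed filter depth
        self.guesses = OrderedDict()    # guessed line address -> True, oldest first

    def on_miss(self, line):
        """Record an L1 miss on `line`; return True on a filter hit."""
        if line in self.guesses:
            # The miss address matches an earlier guess: allocate a stream.
            del self.guesses[line]
            return True
        guess = line + 1                # next sequential cache line
        if guess in self.guesses:
            self.guesses.move_to_end(guess)       # refresh its LRU position
        else:
            if len(self.guesses) >= self.entries:
                self.guesses.popitem(last=False)  # evict the least recently used entry
            self.guesses[guess] = True
        return False


f = StreamFilter(entries=2)
hits = [f.on_miss(a) for a in (10, 11, 20, 30, 40, 21, 41)]
```

In this run the miss on line 11 is a filter hit because the miss on line 10 left the guess 11 behind, while the miss on line 21 is not, because its guess was evicted by the later misses on lines 30 and 40.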

[0031] FIG. 5 is a high-level functional diagram of the operation of the stream filter and stream buffer in accordance with the present invention. CPU 201 generates an effective address (EA) according to the architecture used. The EA is the requested data address, potentially with offsets. By means of translator 503, CPU 201 generates a translated address, or real address (RA), corresponding to the EA. The real address is used by filter queue 502, but it is within the scope of the present invention for filter queue 502 to use the effective address instead. Comparator 504 compares the RA against every RA held in filter queue 502; if the matching entry is marked valid by its valid bit (V), the match is called a filter hit.

[0032] Filter queue 502 also contains, for each entry, a guessed direction indicator, which indicates whether the guessed stream should be incremented or decremented (±1, or up/down). This is discussed further below.

[0033] Each filter queue entry also contains a field indicating whether there is a stream that corresponds to that address and, if so, the identifying stream number of that stream (more than one stream may be allocated at a time).

[0034] When a filter hit occurs, a stream is allocated in stream address buffer 501 and a corresponding allocation is made in stream data buffer 506. The stream address entry contains the guessed effective address of the next line of data for a particular allocated stream. Here again, a real address may be used instead of the effective address. The stream address entry also contains a valid bit, which indicates that the stream is allocated. There is also a state field used to keep track of the state of the stream. In addition, a copy of the guessed direction (± or up/down) is kept in the stream buffer. Comparator 505 compares EAs issued by the processor with the page and line addresses contained in stream address buffer 501. If there is a match, it is called a stream hit.

[0035] The functionality illustrated in FIG. 5 may be implemented in alternative ways and still be within the scope of the present invention.

[0036] Memory space within system memory 209 may be divided into 128-byte lines. Each line may be divided in half, so that the even half of a line runs from address 0 to 63 and the odd half from address 64 to 127. As described above, CPU 201 generates a logical address (EA), which is translated into a real address that points to a cacheable line in memory. Memory is divided into pages of 2N bytes. The pages are divided into lines that correspond in size to a cache entry. Each time there is a cache miss, the associated real address is analyzed. If the real address is in the even half of the line, the potential stream is an incrementing one. The LRU filter queue entry within filter queue 502 is marked with a direction of "up," and the line miss RA is incremented by "1" and saved in the entry. If the RA is in the odd half of the line, the RA entry in queue 502 is decremented by one and "down" is marked in the entry.
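The direction guess in paragraph [0036] reduces to a test on the byte offset within the 128-byte line. The sketch below illustrates that rule only; the function name `guess_stream` and the representation of the filter entry as a `(line_address, direction)` pair are assumptions, not taken from the source.

```python
LINE_BYTES = 128  # line size given in paragraph [0036]


def guess_stream(real_address):
    """Return (guessed line address, direction) for a miss at `real_address`."""
    line = real_address // LINE_BYTES    # line address of the miss
    offset = real_address % LINE_BYTES   # byte offset within that line
    if offset < LINE_BYTES // 2:
        # Even half (bytes 0 to 63): guess an incrementing stream.
        return line + 1, "up"
    # Odd half (bytes 64 to 127): guess a decrementing stream.
    return line - 1, "down"


low_half = guess_stream(5 * LINE_BYTES + 63)   # last byte of the even half of line 5
high_half = guess_stream(5 * LINE_BYTES + 64)  # first byte of the odd half of line 5
```

The returned pair corresponds to what the text says is saved in the LRU filter queue entry: the miss line address adjusted by one, together with the "up" or "down" mark.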

[0037] As an alternative implementation, it is within the scope of the present invention to save the RA in the filter entry on a miss and to compare subsequent misses against the entry to determine the direction, up or down.

[0038] It can be seen that when a stream is allocated, the "next" effective line address is saved in stream address buffer 501. Buffer 501 contains an entry for each active stream.

【００３９】Ｌ１キャッシュ２０２及びＬ２キャッシュ
２０３のキャッシュ・ミスが発生する場合、システム・
メモリ２０９をアクセスする以前に、ストリーム・バッ
ファに対する問い合わせが生じる。フィルタ・キュー５
０２回路及びストリーム・アドレス・バッファ５０１回
路を結合することも、本発明の１つの態様である。When a cache miss occurs in both L1 cache 202 and L2 cache 203, the stream buffers are interrogated before system memory 209 is accessed. Combining the filter queue 502 circuit and the stream address buffer 501 circuit is also an aspect of the present invention.

【００４０】図６乃至図９を参照すると、本発明の進歩
的プリフェッチ・モードのフロー図が示される。本発明
は３つの進歩的プリフェッチ・モード及びそれらの変形
を可能にする。それらは正常、データ・プリフェッチ、
及びブラスト（Blast）である。正常モードでは、デー
タはプリフェッチされない。データ・プリフェッチ・モ
ードでは、２ラインがプリフェッチされ、１ラインがＬ
１キャッシュに、１ラインがストリーム・バッファにプ
リフェッチされる。ブラスト・モードでは、３ライン以
上が１度にプリフェッチされる。本発明の１つの態様で
は、ブラスト・モードにおいて、４ラインがプリフェッ
チされ、２ラインがデータ・プリフェッチ・モードでプ
リフェッチされ、追加の２ラインがストリーム・バッフ
ァにプリフェッチされる。任意のモードにおいて、プリ
フェッチ・バッファはプロセッサ・チップ、キャッシュ
・チップ、外部チップ、及び（または）メモリ・カード
上に実装され得る。図６乃至図９は、ストリームが流れ
る推測方向が増分方向の場合の例を示す。減分方向の場
合の例は、この例の変更として明らかであろう。図６乃
至図９のフロー図は、データ・プリフェッチ・モード及
びブラスト・モードに入る様子を示す。Referring to FIGS. 6-9, flow diagrams of the progressive prefetch modes of the present invention are shown. The present invention allows three progressive prefetch modes and variants of them: Normal, Data Prefetch, and Blast. In Normal mode, no data is prefetched. In Data Prefetch mode, two lines are prefetched: one line into the L1 cache and one line into a stream buffer. In Blast mode, three or more lines are prefetched at a time. In one aspect of the invention, four lines are prefetched in Blast mode: two lines as in Data Prefetch mode, and two additional lines into a stream buffer. In any mode, the prefetch buffers may be implemented on the processor chip, the cache chip, an external chip, and/or a memory card. FIGS. 6-9 show an example in which the guessed direction of the stream is incrementing. The decrementing case would be an obvious modification of this example. The flow diagrams of FIGS. 6-9 show how Data Prefetch mode and Blast mode are entered.
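As a rough illustration of the three modes, the sketch below lists which lines would be prefetched and where they would go. The names `Mode` and `prefetch_plan` are hypothetical, the Blast case follows the four-line aspect mentioned in the text, and only the incrementing direction is modeled.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = 0          # no prefetching
    DATA_PREFETCH = 1   # two lines ahead
    BLAST = 2           # four lines ahead in the aspect described in the text


def prefetch_plan(mode, first_line):
    """Return a list of (line number, destination) pairs to prefetch.

    first_line is the next line of an incrementing stream; the destination
    labels are illustrative strings, not hardware names.
    """
    if mode is Mode.NORMAL:
        return []
    plan = [(first_line, "L1 cache"), (first_line + 1, "stream buffer")]
    if mode is Mode.BLAST:
        # Two additional lines, both held in stream buffers.
        plan += [(first_line + 2, "stream buffer"),
                 (first_line + 3, "stream buffer")]
    return plan
```

Modeling the modes as a widening list keeps the progressive nature visible: each mode prefetches everything the previous one does, plus more.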

【００４１】ステップ６０１で、ＣＰＵ２０１はキャッ
シュ・ラインＡにおけるデータ・アクセスを開始する。
ステップ６０２で、キャッシュ・ラインＡがＬ１キャッ
シュ２０２内に存在するか否かを判断する。存在する場
合、プロセスはステップ６０３に移行し、キャッシュ・
ラインＡがＣＰＵ２０１に戻され、プロセスはステップ
６０４で終了する。In step 601, CPU 201 begins a data access to cache line A. At step 602, it is determined whether cache line A exists in L1 cache 202. If so, the process moves to step 603, where cache line A is returned to CPU 201, and the process ends at step 604.

【００４２】しかしながら、キャッシュ・ラインＡにお
いてミスが発生すると、プロセスはステップ６０５に移
行し、キャッシュ・ラインＡのアドレスが、ストリーム
・フィルタ４０３に含まれる全てのアドレスと比較され
る。However, if a miss occurs in cache line A, the process moves to step 605, where the address of cache line A is compared with all the addresses contained in stream filter 403.

【００４３】キャッシュ・ラインＡがストリーム・フィ
ルタ４０３内に存在しない場合、プロセスはステップ６
０６に移行し、キャッシュ・ラインＡのアドレスが１増
分され、ストリーム・フィルタ４０３内に挿入される。
従って、ステップ６０７で、キャッシュ・ラインＡがＬ
２キャッシュ２０３またはメモリ２０９から、Ｌ１キャ
ッシュ２０２にフェッチされる。If cache line A is not in stream filter 403, the process proceeds to step 606, where the address of cache line A is incremented by 1 and inserted into stream filter 403. Then, in step 607, cache line A is fetched from L2 cache 203 or memory 209 into L1 cache 202.

【００４４】図６乃至図９内のステップ６０７からステ
ップ６０８への破線矢印は、ステップ６０８がステップ
６０７の直後に、必ずしも発生しないことを示す。一般
に多くのミスが発生するほど、キャッシュ・ラインＡ＋
１に対する要求（ステップ６０８）以前に、ストリーム
・フィルタ４０３内のアドレス・エントリが発生し得
る。The dashed arrow from step 607 to step 608 in FIGS. 6-9 indicates that step 608 does not necessarily occur immediately after step 607. In general, many further misses, and the corresponding address entries in stream filter 403, may occur before the request for cache line A+1 (step 608).

【００４５】やがて、ＣＰＵ２０１がキャッシュ・ライ
ンＡ＋１を要求し得る（ステップ６０８）。再度、ＰＣ
Ｃ４０４は、キャッシュ・ラインＡ＋１がＬ１キャッシ
ュ２０２内に存在するか否かを判断する（ステップ６０
９）。存在する場合、ステップ６１０でキャッシュ・ラ
インＡ＋１がＣＰＵ２０１に返却され、プロセスはステ
ップ６１１で終了する。すなわち、キャッシュ・ライン
Ａ＋１がＬ１キャッシュ２０２内に存在するので、スト
リーム・フィルタ４０３との比較は実行されず、Ａ＋１
エントリはフィルタ置換アルゴリズムにより退却される
まで、フィルタ４０３内に留まる。フィルタ置換アルゴ
リズムは、係属中の米国特許出願番号第５１９０３２号
における教示に従い実行され得る。しかしながら、Ｌ１
キャッシュ２０２内でキャッシュ・ラインＡ＋１に対す
るミスが発生すると、フィルタ・ヒットが発生し（ステ
ップ６３７）、プロセスはステップ６１２に移行して、
キャッシュ・ラインＡ＋２で開始するキャッシュ・ライ
ン・ストリームが割当てられる。なぜなら、要求キャッ
シュ・ラインＡ＋１のアドレスが、ストリーム・フィル
タ４０３内に存在するアドレスＡ＋１と比較され、結果
的にストリーム・フィルタ４０３内でヒットが生じるか
らである。次にステップ６１３で、キャッシュ・ライン
Ａ＋１がＬ２キャッシュ２０３またはメモリ２０９か
ら、Ｌ１キャッシュ２０２にフェッチされる。また、キ
ャッシュ・ラインＡ＋２が、Ｌ１キャッシュ２０２内に
存在するか否かがチェックされる。存在しない場合、キ
ャッシュ・ラインＡ＋２がＬ２キャッシュ２０３または
メモリ２０９から、Ｌ１キャッシュ２０２にプリフェッ
チされる。Eventually, CPU 201 may request cache line A+1 (step 608). PCC 404 again determines whether cache line A+1 exists in L1 cache 202 (step 609). If so, cache line A+1 is returned to CPU 201 at step 610 and the process ends at step 611. In that case, because cache line A+1 is in L1 cache 202, no comparison with stream filter 403 is performed, and the A+1 entry remains in filter 403 until it is retired by the filter replacement algorithm. The filter replacement algorithm may be implemented in accordance with the teachings of pending US Patent Application No. 519032. However, if a miss occurs for cache line A+1 in L1 cache 202, a filter hit occurs (step 637) and the process moves to step 612, where a stream of cache lines starting with cache line A+2 is allocated. This is because the address of the requested cache line A+1 matches the address A+1 held in stream filter 403, resulting in a hit in stream filter 403. Then, in step 613, cache line A+1 is fetched from L2 cache 203 or memory 209 into L1 cache 202. L1 cache 202 is also checked for cache line A+2; if it is not present, cache line A+2 is prefetched from L2 cache 203 or memory 209 into L1 cache 202.
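The filter behavior in steps 605 to 612 can be sketched as follows. This is a simplified model under assumptions: the class name `StreamFilter`, the capacity of four entries, oldest-first eviction as a stand-in for the filter replacement algorithm, and removal of an entry once it has produced a stream are all illustrative choices, not details taken from the text.

```python
from collections import OrderedDict


class StreamFilter:
    """Toy model of stream filter 403 for an incrementing stream."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed size
        self.entries = OrderedDict()      # predicted line number, oldest first

    def on_l1_miss(self, line):
        """Handle an L1 miss on `line`.

        Returns the first line of a newly allocated stream on a filter hit,
        or None when the miss only created a new filter entry.
        """
        if line in self.entries:
            # Filter hit: the miss matches a predicted address, so a stream
            # starting at the following line is allocated (step 612).
            del self.entries[line]
            return line + 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry
        self.entries[line + 1] = True          # step 606: store A+1
        return None
```

With this model, a miss on line 100 records 101 in the filter, and a later miss on line 101 returns 102 as the start of the allocated stream.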

【００４６】その後、ステップ６１４で、キャッシュ・
ラインＡ＋３がＬ２キャッシュ２０３内に存在するか否
かが判断される。存在しない場合、プロセスはステップ
６１５に移行し、キャッシュ・ラインＡ＋３がメモリ２
０９からプリフェッチされ、プリフェッチ・バッファＰ
ＢＦＲ４０２内に挿入される。しかしながら、キャッシ
ュ・ラインＡ＋３がＬ２キャッシュ２０３内に存在する
場合には、プロセスはステップ６１５をスキップする。Then, in step 614, it is determined whether cache line A+3 exists in L2 cache 203. If not, the process moves to step 615, where cache line A+3 is prefetched from memory 209 and inserted into prefetch buffer PBFR 402. However, if cache line A+3 is present in L2 cache 203, the process skips step 615.

【００４７】再度、ステップ６１５からステップ６１６
への破線矢印は、ステップ６１６がステップ６１５の直
後に必ずしも発生しないことを示す。Again, the dashed arrow from step 615 to step 616 indicates that step 616 does not necessarily occur immediately after step 615.

【００４８】ステップ６１６では、プロセッサ２０１は
キャッシュ・ラインＡ＋２を要求し得り、その場合、ラ
インＡ＋２に対するアクセスが、Ｌ１キャッシュ２０２
から実行される。ステップ６１３で、キャッシュ・ライ
ンＡ＋２がＬ１キャッシュ２０２にフェッチされたの
で、Ｌ１キャッシュ２０２はこのキャッシュ・ラインを
ＣＰＵ２０１に供給することができる。ステップ６１７
で、アドレスＡ＋３をストリームの先頭に有するよう
に、ストリーム・アドレス・バッファ５０１内のストリ
ーム・アドレスが更新される。その後、ステップ６１８
で、キャッシュ・ラインＡ＋３がＬ１キャッシュ２０２
内に存在するか否かがチェックされ、存在しない場合、
キャッシュ・ラインＡ＋３がＬ２キャッシュ２０３また
はＰＢＦＲ４０２から、Ｌ１キャッシュ２０２にフェッ
チされる。次にステップ６１９で、キャッシュ・ライン
Ａ＋４がＬ２キャッシュ２０３またはメモリ２０９か
ら、ＰＢＦＲ４０２にフェッチされる。At step 616, processor 201 may request cache line A+2, in which case the access to line A+2 is served from L1 cache 202. Because cache line A+2 was fetched into L1 cache 202 at step 613, L1 cache 202 can supply this cache line to CPU 201. At step 617, the stream address in stream address buffer 501 is updated so that address A+3 is at the head of the stream. Then, at step 618, it is checked whether cache line A+3 exists in L1 cache 202; if not, cache line A+3 is fetched from L2 cache 203 or PBFR 402 into L1 cache 202. Next, at step 619, cache line A+4 is fetched from L2 cache 203 or memory 209 into PBFR 402.
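The steady-state behavior of Data Prefetch mode (steps 616 to 619) might be modeled as below. The sketch is deliberately simplified and rests on assumptions: the class name `DataPrefetchStream` is hypothetical, L2 cache 203 is left out, the L1 cache is an unbounded set of line numbers, and PBFR 402 holds a single line.

```python
class DataPrefetchStream:
    """Toy model of one incrementing stream in Data Prefetch mode."""

    def __init__(self, head):
        # State just after stream allocation: line `head` is already in L1
        # and the following line sits in the prefetch buffer.
        self.head = head
        self.l1 = {head}
        self.pbfr = head + 1

    def access(self, line):
        """CPU reads `line` from L1 (step 616) and the stream advances."""
        if line not in self.l1:
            raise LookupError("line %d is not in the modeled L1" % line)
        self.head = line + 1              # step 617: update the stream head
        if self.head not in self.l1:      # step 618: bring the new head into L1
            self.l1.add(self.head)        # here it comes from the buffer
        self.pbfr = self.head + 1         # step 619: refill the buffer
        return line
```

Starting the model at line A+2, one call to `access` leaves A+3 at the head and in L1, with A+4 in the buffer, which mirrors the sequence described in the text.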

【００４９】その後、システム２００内でブラスト・モ
ードが許可されていなければ（ステップ６２０）、プロ
セスはステップ６１６に戻り、ＣＰＵ２０１が図示のよ
うに、キャッシュ・ラインを順次増分し続ける限り（"
ストリーミング"として参照される）、ステップ６１６
乃至６２１をループする。ステップ６２１は、その後ス
テップ６１６において、ラインＡ＋３に対するＬ１キャ
ッシュ・アクセスが発生し、ステップ６１７でストリー
ムがアドレスＡ＋３により更新され、ステップ６１８
で、ラインＡ＋４がＬ１キャッシュ２０２からフェッチ
され、ステップ６１９で、キャッシュ・ラインＡ＋４が
ＰＢＦＲ４０２からフェッチされることを示す。Thereafter, if Blast mode is not enabled in system 200 (step 620), the process returns to step 616 and loops through steps 616 to 621 for as long as CPU 201 continues to increment through the cache lines sequentially as shown (referred to as "streaming"). Step 621 indicates that, on the following pass, an L1 cache access to line A+3 occurs at step 616, the stream is updated with address A+3 at step 617, line A+4 is fetched from L1 cache 202 at step 618, and cache line A+4 is fetched from PBFR 402 at step 619.

【００５０】上述の説明はデータ・プリフェッチ・モー
ドを示すものである。ステップ６２０で、ブラスト・モ
ードがシステム２００において許可されている場合、プ
ロセスはステップ６２２に移行し、ＣＰＵ２０１からキ
ャッシュ・ラインＡ＋３に対する要求が発生する。ステ
ップ６２２で、こうした要求に対して、ＰＣＣ４０４が
Ｌ１キャッシュ２０２内でキャッシュ・ラインＡ＋３を
探索する。キャッシュ・ラインＡ＋３は、ステップ６１
８によりＬ１キャッシュ２０２内に存在するので、キャ
ッシュ・ラインＡ＋３がＣＰＵ２０１に返却される。そ
の後、ステップ６２３で、ストリーム・アドレス・バッ
ファ５０１内のストリーム・アドレスが、Ａ＋４に更新
される。ステップ６２４で、Ｌ１キャッシュ２０２にラ
インＡ＋４が存在するか否かがテストされる。存在しな
い場合、キャッシュ・ラインＡ＋４が、ＰＢＦＲ４０２
から、Ｌ１キャッシュ２０２内のプリフェッチ・バッフ
ァ位置にフェッチされる。The above description covers Data Prefetch mode. If Blast mode is enabled in system 200 at step 620, the process moves to step 622, where CPU 201 issues a request for cache line A+3. At step 622, in response to this request, PCC 404 searches L1 cache 202 for cache line A+3. Because cache line A+3 is in L1 cache 202 as a result of step 618, cache line A+3 is returned to CPU 201. Then, in step 623, the stream address in stream address buffer 501 is updated to A+4. At step 624, L1 cache 202 is tested for line A+4. If it is not present, cache line A+4 is fetched from PBFR 402 into the prefetch buffer location in L1 cache 202.

【００５１】その後、ステップ６２５で、キャッシュ・
ラインＡ＋５がＬ２キャッシュ２０３内に存在するか否
かが判断される。存在する場合、プロセスはステップ６
２６または６２７に移行する。本技法は、ノード制御装
置２０５があらゆるストリーム・バッファ・アクセスを
通知されることを要求する。通知が、次のストリーム・
バッファがＬ２キャッシュ２０３内に存在せず、従って
フェッチされる必要がある場合に限られると、ノード制
御装置バッファ４０５及び４０６が一時的にプロセッサ
２０１との同期を逸する。この設計のトレードオフの利
点は、ステップ６２６及び６２７が結合され、ノード制
御装置２０５に対するアドレス・バス・トラフィックを
低減することである。最初の場合では、Ａ、Ａ＋１など
のラインは、プリフェッチ以前にキャッシュ２０２内に
存在せず、従って通常、キャッシュ・ラインＡ＋５がＬ
２キャッシュ２０３内に存在することは期待できない。Then, in step 625, it is determined whether cache line A+5 exists in L2 cache 203. If so, the process moves to step 626 or 627. This technique requires that node controller 205 be notified of every stream buffer access. If notification is limited to cases where the next stream buffer line is not in L2 cache 203 and therefore must be fetched, node controller buffers 405 and 406 temporarily fall out of synchronization with processor 201. The benefit of this design trade-off is that steps 626 and 627 can be combined, reducing address bus traffic to node controller 205. In the first case, lines A, A+1, and so on were not in cache 202 before prefetching, so cache line A+5 cannot normally be expected to be in L2 cache 203.

【００５２】ステップ６２６及び６２７が上述の理由か
ら結合されるとき、ステップ６２７の通知は、ステップ
６２６のプリフェッチに追加される４ビットの制御ビッ
トにより実現され得る。４ビットは、１ビットの有効プ
リフェッチと、２ビットのストリーム識別と、１ビット
のプリフェッチ方向を含む。キャッシュ・ラインＡ＋５
のアドレス及びこれらのビットから、ノード制御装置２
０５は、キャッシュ・ラインＡ＋６及びＡ＋７に対する
メモリ要求を生成する。前述のように、ノード制御装置
２０５は任意の数のキャッシュ・ラインをプリフェッチ
するように実現され得る。ステップ６２８では、ノード
制御装置２０５はキャッシュ・ラインＡ＋６をプリフェ
ッチ・バッファＰＢＦＲ２４０５にプリフェッチし、
キャッシュ・ラインＡ＋７を、プリフェッチ・バッファ
ＰＢＦＲ３４０６にプリフェッチする。When steps 626 and 627 are combined for the reason given above, the notification of step 627 may be implemented with four control bits added to the prefetch of step 626. The four bits comprise a 1-bit valid prefetch flag, a 2-bit stream identifier, and a 1-bit prefetch direction. From the address of cache line A+5 and these bits, node controller 205 generates memory requests for cache lines A+6 and A+7. As mentioned above, node controller 205 may be implemented to prefetch any number of cache lines. In step 628, node controller 205 prefetches cache line A+6 into prefetch buffer PBFR2 405 and cache line A+7 into prefetch buffer PBFR3 406.
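One way to picture the four control bits is the sketch below. The bit positions (valid in bit 3, stream identifier in bits 2-1, direction in bit 0) and all function names are assumptions made for illustration; the text only gives the field widths.

```python
def encode_hint(valid, stream_id, up):
    """Pack the prefetch hint into 4 bits (assumed layout: V SS D)."""
    if not 0 <= stream_id <= 3:
        raise ValueError("stream_id must fit in 2 bits")
    return (int(valid) << 3) | (stream_id << 1) | int(up)


def decode_hint(bits):
    """Unpack the 4-bit hint into (valid, stream_id, up)."""
    return bool((bits >> 3) & 1), (bits >> 1) & 0b11, bool(bits & 1)


def node_controller_requests(line, bits):
    """Return the two line numbers the node controller would request.

    `line` is the line carried by the fetch (A+5 in the text); for an
    incrementing stream the result corresponds to A+6 and A+7.
    """
    valid, _stream_id, up = decode_hint(bits)
    if not valid:
        return []
    step = 1 if up else -1
    return [line + step, line + 2 * step]
```

Carrying the hint on the fetch itself is what lets the notification and the prefetch share one bus transaction, as the paragraph above describes.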

【００５３】ステップ６２８とステップ６２９との間の
破線矢印は、キャッシュ・ラインＡ＋４に対するＣＰＵ
２０１からの要求が、ステップ６２８の直後に必ずしも
発生しないことを示す。The dashed arrow between steps 628 and 629 indicates that the request from CPU 201 for cache line A+4 does not necessarily occur immediately after step 628.

【００５４】ステップ６２９では、ＣＰＵ２０１による
キャッシュ・ラインＡ＋４要求に対して、Ｌ１キャッシ
ュ２０２がアクセスされる。キャッシュ・ラインＡ＋４
はステップ６２４でＬ１キャッシュ２０２に挿入されて
いるので、キャッシュ・ラインＡ＋４がＣＰＵ２０１に
返却される。ステップ６３０では、ストリーム・アドレ
スが増分され、アドレスＡ＋５により先導される。ステ
ップ６３１では、キャッシュ・ラインＡ＋５がＬ１キャ
ッシュ２０２内に存在するか否かがチェックされ、存在
しない場合、キャッシュ・ラインＡ＋５がＬ２キャッシ
ュ２０３またはバッファ４０２から、Ｌ１キャッシュ２
０２にフェッチされる。In step 629, L1 cache 202 is accessed in response to a request for cache line A+4 from CPU 201. Because cache line A+4 was inserted into L1 cache 202 in step 624, cache line A+4 is returned to CPU 201. In step 630, the stream address is incremented so that the stream is headed by address A+5. In step 631, L1 cache 202 is checked for cache line A+5; if it is not present, cache line A+5 is fetched from L2 cache 203 or buffer 402 into L1 cache 202.

【００５５】その後、ステップ６３２で、キャッシュ・
ラインＡ＋６がＰＢＦＲ２４０５からＰＢＦＲ４０２
に転送される。ステップ６３３では、キャッシュ・ライ
ンＡ＋７がＰＢＦＲ３４０６からＰＢＦＲ２４０５
に転送される。その後、ステップ６３４で、ノード制御
装置２０５が、キャッシュ・ラインＡ＋８をプリフェッ
チするように通知される。本技法では、ステップ６３２
におけるキャッシュ・ラインＡ＋６のフェッチは、ノー
ド制御装置２０５に、キャッシュ・ラインＡ＋８をプリ
フェッチするように通知する（ステップ６３４）。ステ
ップ６３５で、ノード制御装置２０５は、キャッシュ・
ラインＡ＋８をメモリ２０９からＰＢＦＲ３４０６に
プリフェッチする。Then, in step 632, cache line A+6 is transferred from PBFR2 405 to PBFR 402. In step 633, cache line A+7 is transferred from PBFR3 406 to PBFR2 405. Then, in step 634, node controller 205 is notified to prefetch cache line A+8. In this technique, the fetch of cache line A+6 in step 632 is itself the notification to node controller 205 to prefetch cache line A+8 (step 634). In step 635, node controller 205 prefetches cache line A+8 from memory 209 into PBFR3 406.
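The buffer movement in steps 631 to 635 behaves like a three-stage pipeline. The sketch below models it with plain integers for line numbers; the class name `BlastBuffers` and the method `advance` are hypothetical, and only the incrementing direction is covered.

```python
class BlastBuffers:
    """Toy model of the three prefetch buffers used in Blast mode."""

    def __init__(self, first_line):
        self.pbfr = first_line        # PBFR 402, closest to the L1 cache
        self.pbfr2 = first_line + 1   # PBFR2 405 in the node controller
        self.pbfr3 = first_line + 2   # PBFR3 406 in the node controller

    def advance(self):
        """Move every line one stage closer to the L1 cache.

        Returns the line handed to L1 (step 631); afterwards the buffers
        have shifted (steps 632-633) and a new line has been prefetched
        from memory into PBFR3 (step 635).
        """
        to_l1 = self.pbfr
        self.pbfr = self.pbfr2
        self.pbfr2 = self.pbfr3
        self.pbfr3 = self.pbfr3 + 1
        return to_l1
```

Starting from buffers holding A+5, A+6, and A+7, one call hands A+5 to the L1 cache and leaves A+6, A+7, and A+8 in the three buffers.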

【００５６】その後、ＣＰＵ２０１がキャッシュ・ライ
ンを順次増分しながらアクセスし続ける限り（すなわ
ち、ＣＰＵ２０１が割当てられたストリーム内のキャッ
シュ・ラインをアクセスし続ける）、プロセスは増分方
式で（ステップ６３６）、ステップ６２９乃至６３６を
ループし続ける。Thereafter, as long as CPU 201 continues to access cache lines in sequentially incrementing order (that is, CPU 201 continues to access cache lines in the allocated stream), the process keeps looping through steps 629 to 636 in incrementing fashion (step 636).

【００５７】上述の議論では、バス・インタフェース
（ＢＩＵ）４０１は、システム・メモリ２０９からキャ
ッシュ・ラインのフェッチを実行する。In the above discussion, bus interface (BIU) 401 performs a cache line fetch from system memory 209.

【００５８】ノード制御装置２０５はスイッチ２０７上
のポートであり得る。The node controller 205 can be a port on the switch 207.

【００５９】有効アドレスはページ境界に跨り連続的で
あり、実アドレスは連続的でないので、ストリーム・ア
ドレス・バッファ５０１内で２つのアドレスを比較する
とき、有効アドレスを使用することがしばしば有利であ
る。更に、増分アドレスを生成するために、カウンタが
使用されてもよい。Since effective addresses are contiguous across page boundaries and real addresses are not, it is often advantageous to use effective addresses when comparing two addresses in stream address buffer 501. Further, a counter may be used to generate the incremented address.
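A small numeric illustration of why effective addresses compare more easily across a page boundary. Everything here is hypothetical: the 4096-byte page size (the text only says 2^N bytes), the page table contents, and the `translate` helper.

```python
PAGE_BYTES = 4096   # assumed page size for the example
LINE_BYTES = 128    # line size given in the text


def translate(ea, page_table):
    """Map an effective address to a real address via a toy page table."""
    page, offset = divmod(ea, PAGE_BYTES)
    return page_table[page] * PAGE_BYTES + offset


# Hypothetical mapping: effective pages 0 and 1 live in real pages 7 and 3.
page_table = {0: 7, 1: 3}

last_line_ea = PAGE_BYTES - LINE_BYTES     # last line of effective page 0
next_line_ea = last_line_ea + LINE_BYTES   # first line of effective page 1

# The effective addresses differ by exactly one line, but the real
# addresses produced by translation are not adjacent.
ea_gap = next_line_ea - last_line_ea
ra_gap = translate(next_line_ea, page_table) - translate(last_line_ea, page_table)
```

In this example `ea_gap` equals one line, while `ra_gap` does not, so a stream that is sequential in effective addresses would look broken if only real addresses were compared.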

【００６０】上述のように、ＢＩＵ４０１からメモリ２
０９へのフェッチにおいて、制御ビットが使用され、ノ
ード制御装置２０５に、キャッシュ・ラインをＰＢＦＲ
２４０５及びＰＢＦＲ３４０６にプリフェッチするよ
うに通知する。１ビットがノード制御装置２０５に、こ
の特定のライン要求が、ノード制御装置２０５によるそ
のバッファへのプリフェッチの実行を要求する旨を通知
し得る。別の２ビットは、ノード制御装置に、プリフェ
ッチに関連付けられるストリーム番号を通知し得る。別
の１ビットは、アドレスが進行するキャッシュ・ライン
の方向を示し得る。ノード制御装置２０５はプリフェッ
チを実行するように通知されるとき、こうしたプリフェ
ッチを、ＣＰＵ２０１の動作とは独立に実行し得る。As mentioned above, control bits are used in fetches from BIU 401 to memory 209 to notify node controller 205 to prefetch cache lines into PBFR2 405 and PBFR3 406. One bit may inform node controller 205 that this particular line request requires node controller 205 to perform a prefetch into its buffers. Another two bits may inform the node controller of the stream number associated with the prefetch. Another bit may indicate the direction in which the cache line addresses are progressing. When node controller 205 is notified to perform a prefetch, it may perform the prefetch independently of the operation of CPU 201.

【００６１】Ｌ１キャッシュ２０２及びＬ２キャッシュ
２０３を含む場合、前記プロシジャにおいて、キャッシ
ュ・ラインがＰＢＦＲ４０２からＬ１キャッシュ２０２
に転送されるとき、同一のキャッシュ・ラインがＬ２キ
ャッシュ２０３内にも含まれることになる。With respect to inclusion between L1 cache 202 and L2 cache 203, when a cache line is transferred from PBFR 402 to L1 cache 202 in the above procedure, the same cache line is also placed in L2 cache 203.

【００６２】Ｌ１キャッシュ２０２内にストリーム・バ
ッファ・ラインの１つを有する利点は、Ｌ１キャッシュ
２０２内のそのバッファ・ライン内に含まれる特定のキ
ャッシュ・ラインが、プロセッサ２０１により要求され
るときに、Ｌ１キャッシュ２０２内でミスではなく、ヒ
ットが発生することである。技術的には、たとえ要求さ
れるキャッシュ・ラインが、Ｌ１キャッシュ２０２に接
続される別のバッファ内に含まれていても、ミスは発生
する。こうしたミスにより、そのキャッシュ・ラインを
そのストリーム・バッファ・ラインからＣＰＵ２０１に
取り出すために、余分なハードウェア及びサイクル時間
が要求される。論理的には、ストリーム・バッファ・キ
ャッシュ・ラインの１つとして作用するＬ１キャッシュ
２０２内のキャッシュ・ラインは、プリフェッチ・スト
リーム・バッファ内に含まれると称される。The advantage of having one of the stream buffer lines in L1 cache 202 is that when a particular cache line contained within that buffer line in L1 cache 202 is requested by processor 201, A hit occurs in the L1 cache 202 instead of a miss. Technically, a miss will occur even if the requested cache line is contained within another buffer connected to the L1 cache 202. These misses require extra hardware and cycle time to fetch the cache line from the stream buffer line to the CPU 201. Logically, the cache line in L1 cache 202 that acts as one of the stream buffer cache lines is said to be contained within the prefetch stream buffer.

【００６３】まとめとして、本発明の構成に関して以下
の事項を開示する。In summary, the following items are disclosed regarding the configuration of the present invention.

【００６４】（１）プロセッサと、前記プロセッサにバ
スを介して接続されるシステム・メモリと、前記プロセ
ッサに接続される第１のキャッシュと、前記プロセッサ
に接続される第２のキャッシュと、前記システム・メモ
リからプリフェッチされる１つ以上のデータ・ラインを
記憶するストリーム・バッファ回路と、プリフェッチ状
態を示すストリーム・フィルタ回路と、前記ストリーム
・フィルタ回路に接続され、前記システム・メモリから
前記第１及び第２のキャッシュ、並びに前記ストリーム
・バッファ回路へのデータのフェッチ及びプリフェッチ
を選択的に制御する制御回路と、を含む、データ処理シ
ステム。（２）前記制御回路が、第１のキャッシュ・ラインに対
する要求を前記プロセッサから受信する回路と、前記第
１のキャッシュ・ラインが前記第１のキャッシュ内に存
在するか否かを判断する回路と、前記第１のキャッシュ
・ラインが前記第１のキャッシュ内に存在せず、前記第
１のキャッシュ・ラインが前記第２のキャッシュ内に存
在する場合、前記第１のキャッシュ・ラインを前記第２
のキャッシュから前記第１のキャッシュにフェッチする
回路と、前記第１のキャッシュ・ラインが前記第２のキ
ャッシュ内に存在しない場合、前記第１のキャッシュ・
ラインを前記システム・メモリから前記第１のキャッシ
ュにフェッチする回路と、前記第１のキャッシュ・ライ
ンのアドレスを増分し、第１の増分アドレスを生成する
回路と、前記増分アドレスを前記ストリーム・フィルタ
に記憶する回路と、を含む、前記（１）記載のシステ
ム。（３）前記制御回路が、前記増分アドレスを有する第２
のキャッシュ・ラインに対する要求を前記プロセッサか
ら受信する回路と、前記第２のキャッシュ・ラインが前
記第１のキャッシュ内に存在するか否かを判断する回路
と、前記増分アドレスが前記ストリーム・フィルタ内に
存在するか否かを判断する回路と、前記増分アドレスを
増分し、第２の増分アドレスを生成する回路と、前記第
２の増分アドレスで開始するストリームを割当てる回路
と、前記第２の増分アドレスを有する第３のキャッシュ
・ラインが、前記第１のキャッシュ内に存在するか否か
を判断する回路と、前記第２のキャッシュ・ラインが前
記第２のキャッシュ内に存在し、前記第２のキャッシュ
・ラインが前記第１のキャッシュ内に存在しない場合、
前記第２のキャッシュ・ラインを前記第２のキャッシュ
から前記第１のキャッシュにフェッチする回路と、前記
第２のキャッシュ・ラインが前記第２のキャッシュ内に
存在しない場合、前記第２のキャッシュ・ラインを前記
システム・メモリから前記第１のキャッシュにフェッチ
する回路と、前記第３のキャッシュ・ラインが前記第２
のキャッシュ内に存在し、前記第３のキャッシュ・ライ
ンが前記第１のキャッシュ内に存在しない場合、前記第
３のキャッシュ・ラインを前記第２のキャッシュから前
記第１のキャッシュにプリフェッチする回路と、前記第
３のキャッシュ・ラインが前記第２のキャッシュ内に存
在しない場合、前記第３のキャッシュ・ラインを前記シ
ステム・メモリから前記第１のキャッシュにプリフェッ
チする回路と、を含む、前記（２）記載のシステム。（４）前記第２の増分アドレスを増分し、第３の増分ア
ドレスを生成する回路と、前記第３の増分アドレスを有
する第４のキャッシュ・ラインが、前記第２のキャッシ
ュ内に存在するか否かを判断する回路と、前記第４のキ
ャッシュ・ラインが前記第２のキャッシュ内に存在しな
い場合、前記第４のキャッシュ・ラインを前記システム
・メモリから前記ストリーム・バッファ回路にプリフェ
ッチする回路と、前記第３のキャッシュ・ラインに対す
る要求を前記プロセッサから受信する回路と、前記第３
のキャッシュ・ラインを前記第１のキャッシュから前記
プロセッサに送信する回路と、前記第４のキャッシュ・
ラインが前記第１のキャッシュ内に存在するか否かを判
断する回路と、前記第４のキャッシュ・ラインが前記第
２のキャッシュ内に存在する場合、前記第４のキャッシ
ュ・ラインを前記第２のキャッシュから前記第１のキャ
ッシュにフェッチする回路と、前記第４のキャッシュ・
ラインが前記第２のキャッシュ内に存在しない場合、前
記第４のキャッシュ・ラインを前記ストリーム・バッフ
ァ回路から前記第１のキャッシュにフェッチする回路
と、前記第３の増分アドレスを増分し、第４の増分アド
レスを生成する回路と、前記第４の増分アドレスを有す
る第５のキャッシュ・ラインが、前記第２のキャッシュ
内に存在するか否かを判断する回路と、前記第５のキャ
ッシュ・ラインが前記第２のキャッシュ内に存在する場
合、前記第５のキャッシュ・ラインを前記第２のキャッ
シュから前記ストリーム・バッファ回路にフェッチする
回路と、前記第５のキャッシュ・ラインが前記第２のキ
ャッシュ内に存在しない場合、前記第５のキャッシュ・
ラインを前記システム・メモリから前記ストリーム・バ
ッファ回路にプリフェッチする回路と、を含む、前記
（３）記載のシステム。（５）前記ストリーム・バッファ回路に記憶される１つ
以上のキャッシュ・ラインが、前記第１のキャッシュ内
に配置される、前記（１）記載のシステム。（６）前記ストリーム・バッファ回路に記憶される１つ
以上のキャッシュ・ラインが、前記プロセッサを含むチ
ップ内に配置される、前記（１）記載のシステム。（７）前記ストリーム・バッファ回路に記憶される１つ
以上のキャッシュ・ラインが、前記プロセッサ及び前記
システム・メモリに接続されるノード制御装置内に配置
される、前記（１）記載のシステム。（８）前記第１のキャッシュが前記プロセッサと同一チ
ップ上に配置される１次キャッシュであり、前記第２の
キャッシュが前記チップの外部に配置される２次キャッ
シュである、前記（１）記載のシステム。（９）前記ストリーム・フィルタ回路が複数のストリー
ムを追跡可能であり、前記ストリーム・フィルタ回路内
の各エントリが、前記複数のストリームの１つを追跡
し、前記各エントリが、当該エントリにより追跡される
前記ストリームの妥当性を示す第１の標識と、当該エン
トリのアドレスの増分方向を示す第２の標識とを含む、
前記（１）記載のシステム。（１０）前記ストリーム・バッファ回路が各エントリに
対して、１）ページ・アドレスと、２）ライン・アドレ
スと、３）妥当性標識とを含む、前記（１）記載のシス
テム。（１１）前記ストリーム・フィルタ回路内のエントリが
実アドレスを有し、前記ストリーム・バッファ回路内の
エントリが有効アドレスを有する、前記（１）記載のシ
ステム。（１２）前記プロセッサ、前記制御回路、前記ストリー
ム・フィルタ回路、前記第１のキャッシュ、及び前記ス
トリーム・バッファ回路の一部が、同一チップ上に配置
される、前記（１）記載のシステム。（１３）第２のプロセッサと、前記第２のプロセッサに
接続される前記システム・メモリと、前記第２のプロセ
ッサに接続される第３のキャッシュと、前記第２のプロ
セッサに接続される第４のキャッシュと、前記第２のプ
ロセッサに接続され、前記システム・メモリからプリフ
ェッチされる１つ以上のデータ・ラインを記憶する第２
のストリーム・バッファ回路と、前記第２のプロセッサ
に接続され、プリフェッチ状態を示す第２のストリーム
・フィルタ回路と、前記第２のストリーム・フィルタ回
路に接続され、前記システム・メモリから前記第３及び
第４のキャッシュ、並びに前記第２のストリーム・バッ
ファ回路へのデータのフェッチ及びプリフェッチを選択
的に制御する、第２の制御回路と、を含む、前記（１）
記載のシステム。（１４）データ処理システムにおいて、第１のキャッシ
ュ・ラインに対する要求を前記プロセッサから受信する
ステップと、前記第１のキャッシュ・ラインが第１のキ
ャッシュ内に存在するか否かを判断するステップと、前
記第１のキャッシュ・ラインが前記第１のキャッシュ内
に存在せず、前記第１のキャッシュ・ラインが前記第２
のキャッシュ内に存在する場合、前記第１のキャッシュ
・ラインを前記第２のキャッシュから前記第１のキャッ
シュにフェッチするステップと、前記第１のキャッシュ
・ラインが前記第２のキャッシュ内に存在しない場合、
前記第１のキャッシュ・ラインを前記システム・メモリ
から前記第１のキャッシュにフェッチするステップと、
前記第１のキャッシュ・ラインのアドレスを増分し、第
１の増分アドレスを生成するステップと、前記増分アド
レスをストリーム・フィルタに記憶するステップと、を
含む、方法。（１５）前記増分アドレスを有する第２のキャッシュ・
ラインに対する要求を前記プロセッサから受信するステ
ップと、前記第２のキャッシュ・ラインが前記第１のキ
ャッシュ内に存在するか否かを判断するステップと、前
記増分アドレスが前記ストリーム・フィルタ内に存在す
るか否かを判断するステップと、前記増分アドレスを増
分し、第２の増分アドレスを生成するステップと、前記
第２の増分アドレスで開始するストリームを割当てるス
テップと、前記第２の増分アドレスを有する第３のキャ
ッシュ・ラインが、前記第１のキャッシュ内に存在する
か否かを判断するステップと、前記第２のキャッシュ・
ラインが前記第２のキャッシュ内に存在し、前記第２の
キャッシュ・ラインが前記第１のキャッシュ内に存在し
ない場合、前記第２のキャッシュ・ラインを前記第２の
キャッシュから前記第１のキャッシュにフェッチするス
テップと、前記第２のキャッシュ・ラインが前記第２の
キャッシュ内に存在しない場合、前記第２のキャッシュ
・ラインを前記システム・メモリから前記第１のキャッ
シュにフェッチするステップと、前記第３のキャッシュ
・ラインが前記第２のキャッシュ内に存在し、前記第３
のキャッシュ・ラインが前記第１のキャッシュ内に存在
しない場合、前記第３のキャッシュ・ラインを前記第２
のキャッシュから前記第１のキャッシュにプリフェッチ
するステップと、前記第３のキャッシュ・ラインが前記
第２のキャッシュ内に存在しない場合、前記第３のキャ
ッシュ・ラインを前記システム・メモリから前記第１の
キャッシュにプリフェッチするステップと、を含む、前
記（１４）記載の方法。（１６）前記第２の増分アドレスを増分し、第３の増分
アドレスを生成するステップと、前記第３の増分アドレ
スを有する第４のキャッシュ・ラインが、前記第２のキ
ャッシュ内に存在するか否かを判断するステップと、前
記第４のキャッシュ・ラインが前記第２のキャッシュ内
に存在しない場合、前記第４のキャッシュ・ラインを前
記システム・メモリから前記ストリーム・バッファ回路
にプリフェッチするステップと、前記第３のキャッシュ
・ラインに対する要求を前記プロセッサから受信するス
テップと、前記第３のキャッシュ・ラインを前記第１の
キャッシュから前記プロセッサに送信するステップと、
前記第４のキャッシュ・ラインが前記第１のキャッシュ
内に存在するか否かを判断するステップと、前記第４の
キャッシュ・ラインが前記第２のキャッシュ内に存在す
る場合、前記第４のキャッシュ・ラインを前記第２のキ
ャッシュから前記第１のキャッシュにフェッチするステ
ップと、前記第４のキャッシュ・ラインが前記第２のキ
ャッシュ内に存在しない場合、前記第４のキャッシュ・
ラインを前記ストリーム・バッファ回路から前記第１の
キャッシュにフェッチするステップと、前記第３の増分
アドレスを増分し、第４の増分アドレスを生成するステ
ップと、前記第４の増分アドレスを有する第５のキャッ
シュ・ラインが、前記第２のキャッシュ内に存在するか
否かを判断するステップと、前記第５のキャッシュ・ラ
インが前記第２のキャッシュ内に存在する場合、前記第
５のキャッシュ・ラインを前記第２のキャッシュから前
記ストリーム・バッファ回路にフェッチするステップ
と、前記第５のキャッシュ・ラインが前記第２のキャッ
シュ内に存在しない場合、前記第５のキャッシュ・ライ
ンを前記システム・メモリから前記ストリーム・バッフ
ァ回路にプリフェッチするステップと、を含む、前記
（１５）記載の方法。（１７）前記ストリーム・バッファ回路に記憶される１
つ以上のキャッシュ・ラインが、前記第１のキャッシュ
内に配置される、前記（１６）記載の方法。（１８）前記第１のキャッシュが前記プロセッサと同一
チップ上に配置される１次キャッシュであり、前記第２
のキャッシュが前記チップの外部に配置される２次キャ
ッシュである、前記（１７）記載の方法。（１９）前記ストリーム・フィルタ回路が複数のストリ
ームを追跡可能であり、前記ストリーム・フィルタ回路
内の各エントリが、前記複数のストリームの１つを追跡
し、前記各エントリが、当該エントリにより追跡される
前記ストリームの妥当性を示す第１の標識と、当該エン
トリのアドレスの増分方向を示す第２の標識とを含む、
前記（１８）記載の方法。（２０）前記ストリーム・フィルタ回路内のエントリが
実アドレスを有し、前記ストリーム・バッファ回路内の
エントリが有効アドレスを有する、前記（１９）記載の
方法。(1) A data processing system comprising: a processor; a system memory connected to the processor via a bus; a first cache connected to the processor; a second cache connected to the processor; a stream buffer circuit that stores one or more data lines prefetched from the system memory; a stream filter circuit that indicates a prefetch state; and a control circuit, connected to the stream filter circuit, that selectively controls fetching and prefetching of data from the system memory to the first and second caches and to the stream buffer circuit.
(2) The system according to (1), wherein the control circuit comprises: a circuit that receives a request for a first cache line from the processor; a circuit that determines whether the first cache line is in the first cache; a circuit that fetches the first cache line from the second cache into the first cache if the first cache line is not in the first cache and is in the second cache; a circuit that fetches the first cache line from the system memory into the first cache if the first cache line is not in the second cache; a circuit that increments the address of the first cache line to generate a first incremented address; and a circuit that stores the incremented address in the stream filter.
(3) The system according to (2), wherein the control circuit comprises: a circuit that receives from the processor a request for a second cache line having the incremented address; a circuit that determines whether the second cache line is in the first cache; a circuit that determines whether the incremented address is in the stream filter; a circuit that increments the incremented address to generate a second incremented address; a circuit that allocates a stream starting at the second incremented address; a circuit that determines whether a third cache line having the second incremented address is in the first cache; a circuit that fetches the second cache line from the second cache into the first cache if the second cache line is in the second cache and is not in the first cache; a circuit that fetches the second cache line from the system memory into the first cache if the second cache line is not in the second cache; a circuit that prefetches the third cache line from the second cache into the first cache if the third cache line is in the second cache and is not in the first cache; and a circuit that prefetches the third cache line from the system memory into the first cache if the third cache line is not in the second cache.
(4) The system according to (3), comprising: a circuit that increments the second incremented address to generate a third incremented address; a circuit that determines whether a fourth cache line having the third incremented address is in the second cache; a circuit that prefetches the fourth cache line from the system memory into the stream buffer circuit if the fourth cache line is not in the second cache; a circuit that receives a request for the third cache line from the processor; a circuit that sends the third cache line from the first cache to the processor; a circuit that determines whether the fourth cache line is in the first cache; a circuit that fetches the fourth cache line from the second cache into the first cache if the fourth cache line is in the second cache; a circuit that fetches the fourth cache line from the stream buffer circuit into the first cache if the fourth cache line is not in the second cache; a circuit that increments the third incremented address to generate a fourth incremented address; a circuit that determines whether a fifth cache line having the fourth incremented address is in the second cache; a circuit that fetches the fifth cache line from the second cache into the stream buffer circuit if the fifth cache line is in the second cache; and a circuit that prefetches the fifth cache line from the system memory into the stream buffer circuit if the fifth cache line is not in the second cache.
(5) The system according to (1), wherein one or more cache lines stored in the stream buffer circuit are located in the first cache.
(6) The system according to (1), wherein one or more cache lines stored in the stream buffer circuit are located in the chip that contains the processor.
(7) The system according to (1), wherein one or more cache lines stored in the stream buffer circuit are located in a node controller connected to the processor and the system memory.
(8) The system according to (1), wherein the first cache is a primary cache located on the same chip as the processor, and the second cache is a secondary cache located outside the chip.
(9) The system according to (1), wherein the stream filter circuit can track a plurality of streams, each entry in the stream filter circuit tracks one of the plurality of streams, and each entry includes a first indicator indicating the validity of the stream tracked by that entry and a second indicator indicating the direction in which the address of that entry is incremented.
(10) The system according to (1), wherein the stream buffer circuit includes, for each entry, 1) a page address, 2) a line address, and 3) a validity indicator.
(11) The system according to (1), wherein entries in the stream filter circuit have real addresses and entries in the stream buffer circuit have effective addresses.
(12) The system according to (1), wherein the processor, the control circuit, the stream filter circuit, the first cache, and part of the stream buffer circuit are located on the same chip.
(13) The system according to (1), comprising: a second processor; the system memory connected to the second processor; a third cache connected to the second processor; a fourth cache connected to the second processor; a second stream buffer circuit, connected to the second processor, that stores one or more data lines prefetched from the system memory; a second stream filter circuit, connected to the second processor, that indicates a prefetch state; and a second control circuit, connected to the second stream filter circuit, that selectively controls fetching and prefetching of data from the system memory to the third and fourth caches and to the second stream buffer circuit.
(14) A method in a data processing system, comprising the steps of: receiving a request for a first cache line from the processor; determining whether the first cache line is in a first cache; fetching the first cache line from the second cache into the first cache if the first cache line is not in the first cache and is in the second cache; fetching the first cache line from the system memory into the first cache if the first cache line is not in the second cache; incrementing the address of the first cache line to generate a first incremented address; and storing the incremented address in a stream filter.
(15) The method according to (14), comprising the steps of: receiving from the processor a request for a second cache line having the incremented address; determining whether the second cache line is in the first cache; determining whether the incremented address is in the stream filter; incrementing the incremented address to generate a second incremented address; allocating a stream starting at the second incremented address; determining whether a third cache line having the second incremented address is in the first cache; fetching the second cache line from the second cache into the first cache if the second cache line is in the second cache and is not in the first cache; fetching the second cache line from the system memory into the first cache if the second cache line is not in the second cache; prefetching the third cache line from the second cache into the first cache if the third cache line is in the second cache and is not in the first cache; and prefetching the third cache line from the system memory into the first cache if the third cache line is not in the second cache.
(16) The method according to (15), comprising the steps of: incrementing the second incremented address to generate a third incremented address; determining whether a fourth cache line having the third incremented address is in the second cache; prefetching the fourth cache line from the system memory into the stream buffer circuit if the fourth cache line is not in the second cache; receiving a request for the third cache line from the processor; sending the third cache line from the first cache to the processor; determining whether the fourth cache line is in the first cache; fetching the fourth cache line from the second cache into the first cache if the fourth cache line is in the second cache; fetching the fourth cache line from the stream buffer circuit into the first cache if the fourth cache line is not in the second cache; incrementing the third incremented address to generate a fourth incremented address; determining whether a fifth cache line having the fourth incremented address is in the second cache; fetching the fifth cache line from the second cache into the stream buffer circuit if the fifth cache line is in the second cache; and prefetching the fifth cache line from the system memory into the stream buffer circuit if the fifth cache line is not in the second cache.
(17) The method according to (16), wherein one or more cache lines stored in the stream buffer circuit are located in the first cache.
(18) The method according to (17), wherein the first cache is a primary cache located on the same chip as the processor, and the second cache is a secondary cache located outside the chip.
(19) The method according to (18), wherein the stream filter circuit can track a plurality of streams, each entry in the stream filter circuit tracks one of the plurality of streams, and each entry includes a first indicator indicating the validity of the stream tracked by that entry and a second indicator indicating the direction in which the address of that entry is incremented.
(20) The method according to (19), wherein entries in the stream filter circuit have real addresses and entries in the stream buffer circuit have effective addresses.

[Brief description of the drawings]

FIG. 1 is a diagram showing a multiprocessor system configurable according to the present invention.

FIG. 2 is a diagram showing a data processing system configurable according to the present invention.

FIG. 3 is a diagram showing a stream filter and a stream buffer.

FIG. 4 is a diagram showing details of the system shown in FIG. 2.

FIG. 5 is a functional diagram according to the present invention.

FIG. 6 is a flow diagram of the present invention.

FIG. 7 is a flow diagram of the present invention.

FIG. 8 is a flow diagram of the present invention.

FIG. 9 is a flow diagram of the present invention.

[Explanation of symbols]

100, 200: data processing system
202: L1 cache
203: L2 cache
204, 206, 208: bus
402: prefetch buffer

Continuation of front page
(72) Inventor: Dwain Alan Hicks, 2405 Dunes Drive, Pflugerville, Texas 78660, USA
(72) Inventor: David Scott Ray, 700 Young Ranch Road, Georgetown, Texas 78728, USA
(72) Inventor: Shih Shin Stephan Tan, 9923 Burbrook Drive, Austin, Texas 78726, USA

Claims

[Claims]

1. A system comprising: a processor; a system memory connected to the processor via a bus; a first cache connected to the processor; a second cache connected to the processor; a stream buffer circuit for storing one or more data lines prefetched from the system memory; a stream filter circuit indicating a prefetch state; and a control circuit, connected to the stream filter circuit, that selectively controls fetching and prefetching of data to the stream buffer circuit.

2. The system of claim 1, wherein the control circuit includes: a circuit for receiving a request for a first cache line from the processor; a circuit for determining whether the first cache line exists in the first cache; a circuit for fetching the first cache line from the second cache to the first cache if the first cache line is not in the first cache and is in the second cache; a circuit for fetching the first cache line from the system memory to the first cache if the first cache line is not present in the second cache; a circuit for incrementing the address of the first cache line to generate a first incremented address; and a circuit for storing the incremented address in the stream filter.
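The sequence recited in claim 2 can be illustrated with a minimal Python sketch. It is not the claimed circuit: the dictionaries standing in for the caches and memory, the `LINE_SIZE` value, and the function name `handle_request` are all assumptions made for illustration, and the sketch records an incremented address only when the first cache misses, which is one possible reading of the claim.

```python
LINE_SIZE = 64  # assumed cache line size in bytes; the claims do not fix one


def handle_request(addr, l1, l2, memory, stream_filter):
    """Serve one cache-line request and record the next line address.

    l1, l2 and memory are dicts mapping line address -> data (hypothetical
    stand-ins for the first cache, second cache and system memory);
    stream_filter is a list of guessed next-line addresses.
    Returns the data and the place it was found.
    """
    if addr in l1:                      # first cache hit: nothing to fetch
        return l1[addr], "L1"
    if addr in l2:                      # miss in first cache, hit in second
        l1[addr] = l2[addr]
        source = "L2"
    else:                               # miss in both: go to system memory
        l1[addr] = memory[addr]
        source = "memory"
    # Increment the line address and store it in the stream filter.
    stream_filter.append(addr + LINE_SIZE)
    return l1[addr], source
```

A later miss whose address matches a stored entry would then suggest that the processor is walking through consecutive lines.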

3. The system of claim 2, wherein the control circuit further includes: a circuit for receiving a request from the processor for a second cache line having the incremented address; a circuit for determining whether the second cache line is in the first cache; a circuit for determining whether the incremented address is present in the stream filter; a circuit for incrementing the incremented address to generate a second incremented address; a circuit for allocating a stream starting at the second incremented address; a circuit for determining whether a third cache line having the second incremented address is present in the first cache; a circuit for fetching the second cache line from the second cache to the first cache if the second cache line is present in the second cache and is not present in the first cache; a circuit for fetching the second cache line from the system memory into the first cache if the second cache line is not present in the second cache; a circuit for prefetching the third cache line from the second cache to the first cache if the third cache line is present in the second cache and is not present in the first cache; and a circuit for prefetching the third cache line from the system memory into the first cache if the third cache line is not present in the second cache.
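As a rough illustration of claim 3, the following Python sketch allocates a stream when a first-cache miss matches an address held in the stream filter, then prefetches the following line into the first cache. The names `fetch_into_l1` and `handle_second_request`, the dict-based caches, and the 64-byte `LINE_SIZE` are hypothetical; the condition "miss and filter hit" is an interpretation of the claim, not a statement of the actual circuit.

```python
LINE_SIZE = 64  # assumed line size in bytes


def fetch_into_l1(addr, l1, l2, memory):
    """Copy a line into the first cache from the second cache or memory."""
    if addr not in l1:
        l1[addr] = l2[addr] if addr in l2 else memory[addr]


def handle_second_request(addr, l1, l2, memory, stream_filter, streams):
    """On a first-cache miss whose address is in the filter, allocate a
    stream and prefetch the following line into the first cache."""
    miss = addr not in l1
    fetch_into_l1(addr, l1, l2, memory)           # demand fetch (second line)
    if miss and addr in stream_filter:
        stream_filter.remove(addr)
        next_addr = addr + LINE_SIZE              # second incremented address
        streams.append(next_addr)                 # the stream starts here
        fetch_into_l1(next_addr, l1, l2, memory)  # prefetch (third line)
    return l1[addr]
```

In this sketch only one line is prefetched ahead of the demand fetch, and it lands directly in the first cache.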

4. The system of claim 3, further comprising: a circuit for incrementing the second incremented address to generate a third incremented address; a circuit for determining whether a fourth cache line having the third incremented address is present in the second cache; a circuit for prefetching the fourth cache line from the system memory into the stream buffer circuit if the fourth cache line is not present in the second cache; a circuit for receiving a request for the third cache line from the processor; a circuit for sending the third cache line from the first cache to the processor; a circuit for determining whether the fourth cache line is present in the first cache; a circuit for fetching the fourth cache line from the second cache to the first cache if the fourth cache line is present in the second cache; a circuit for fetching the fourth cache line from the stream buffer circuit to the first cache if the fourth cache line is not present in the second cache; a circuit for incrementing the third incremented address to generate a fourth incremented address; a circuit for determining whether a fifth cache line having the fourth incremented address is present in the second cache; a circuit for fetching the fifth cache line from the second cache to the stream buffer circuit if the fifth cache line is present in the second cache; and a circuit for prefetching the fifth cache line from the system memory into the stream buffer circuit if the fifth cache line is not present in the second cache.
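The steady-state behaviour recited in claim 4 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the claimed hardware: `prefetch_to_buffer`, `advance_stream`, the dict-based caches and buffer, and the 64-byte `LINE_SIZE` are all hypothetical names and values.

```python
LINE_SIZE = 64  # assumed line size in bytes


def prefetch_to_buffer(addr, l2, memory, stream_buffer):
    """Stage a line in the stream buffer, from the second cache if it is
    present there, otherwise from system memory."""
    stream_buffer[addr] = l2[addr] if addr in l2 else memory[addr]


def advance_stream(requested, l1, l2, memory, stream_buffer):
    """Serve `requested` from the first cache, promote the next line into
    the first cache, and stage the line after that in the stream buffer."""
    data = l1[requested]                       # line was prefetched earlier
    nxt = requested + LINE_SIZE
    if nxt not in l1:
        if nxt in l2:
            l1[nxt] = l2[nxt]                  # second cache -> first cache
        else:
            l1[nxt] = stream_buffer.pop(nxt)   # stream buffer -> first cache
    prefetch_to_buffer(nxt + LINE_SIZE, l2, memory, stream_buffer)
    return data
```

Each call keeps one line ahead in the first cache and one further line staged in the stream buffer, which is the pattern the claim walks through for the third, fourth and fifth cache lines.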

5. The system of claim 1, wherein one or more cache lines stored in the stream buffer circuit are located in the first cache.

6. The system of claim 1, wherein one or more cache lines stored in the stream buffer circuit are located in a chip containing the processor.

7. The system of claim 1, wherein one or more cache lines stored in the stream buffer circuit are located in a node controller connected to the processor and the system memory.

8. The system as described, wherein the first cache is a primary cache located on the same chip as the processor, and the second cache is a secondary cache located outside the chip.

9. The system of claim 1, wherein the stream filter circuit is capable of tracking a plurality of streams, each entry in the stream filter circuit tracks one of the plurality of streams, and each entry includes a first indicator of the validity of the stream tracked by the entry and a second indicator of the increment direction of the address of the entry.
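One way to picture the filter entry of claim 9 is the small Python model below. The class name `FilterEntry`, its field names, the helper `find_stream`, and the 64-byte `LINE_SIZE` are assumptions chosen for illustration; the claim itself only requires a validity indicator and a direction indicator per entry.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed line size in bytes


@dataclass
class FilterEntry:
    address: int            # guessed address of the next line in the stream
    valid: bool = True      # first indicator: entry tracks a live stream
    ascending: bool = True  # second indicator: direction of increment

    def next_address(self):
        """Address one line further along the stream's direction."""
        step = LINE_SIZE if self.ascending else -LINE_SIZE
        return self.address + step


def find_stream(entries, miss_addr):
    """Return the valid entry whose address matches the miss, if any."""
    for entry in entries:
        if entry.valid and entry.address == miss_addr:
            return entry
    return None
```

Because each entry carries its own direction flag, the same filter can follow streams that walk upward through memory and streams that walk downward.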

10. The system of claim 1, wherein the stream buffer circuit includes, for each entry: 1) a page address, 2) a line address, and 3) a validity indicator.
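The three fields of claim 10 can be modelled as below. The split of a byte address into page and line fields, the 4096-byte `PAGE_SIZE`, the 64-byte `LINE_SIZE`, and the names `BufferEntry`, `make_entry` and `full_address` are assumptions for the sake of a runnable sketch; the claim does not specify field widths.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed line size in bytes


@dataclass
class BufferEntry:
    page_address: int    # page number of the staged line
    line_address: int    # line index within that page
    valid: bool = False  # set once the prefetched data has arrived


def make_entry(byte_address):
    """Split a byte address into the page and line fields of an entry."""
    page, offset = divmod(byte_address, PAGE_SIZE)
    return BufferEntry(page, offset // LINE_SIZE)


def full_address(entry):
    """Rebuild the line-aligned byte address from an entry's fields."""
    return entry.page_address * PAGE_SIZE + entry.line_address * LINE_SIZE
```

Keeping the page and line fields separate makes it easy to notice when incrementing the line address would cross a page boundary.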

11. The system of claim 1, wherein the entry in the stream filter circuit has a real address and the entry in the stream buffer circuit has a valid address.

12. The system of claim 1, wherein the processor, the control circuit, the stream filter circuit, the first cache, and a portion of the stream buffer circuit are located on the same chip.

13. The system of claim 1, further comprising: a second processor; the system memory connected to the second processor; a third cache connected to the second processor; a fourth cache connected to the second processor; a second stream buffer circuit, connected to the second processor, for storing one or more data lines prefetched from the system memory; a second stream filter circuit, connected to the second processor, indicating a prefetch state; and a second control circuit, connected to the second stream filter circuit, for selectively controlling fetching and prefetching of data from the system memory to the second stream buffer circuit.

14. A method in a data processing system, comprising: receiving a request for a first cache line from a processor; determining whether the first cache line is in a first cache; fetching the first cache line from a second cache to the first cache if the first cache line is not present in the first cache and is present in the second cache; fetching the first cache line from a system memory to the first cache if the first cache line is not present in the second cache; incrementing the address of the first cache line to generate a first incremented address; and storing the incremented address in a stream filter.

15. The method of claim 14, further comprising: receiving a request from the processor for a second cache line having the incremented address; determining whether the second cache line is in the first cache; determining whether the incremented address is in the stream filter; incrementing the incremented address to generate a second incremented address; allocating a stream starting at the second incremented address; determining whether a third cache line having the second incremented address is present in the first cache; fetching the second cache line from the second cache to the first cache if the second cache line is present in the second cache and is not present in the first cache; fetching the second cache line from the system memory into the first cache if the second cache line is not present in the second cache; prefetching the third cache line from the second cache to the first cache if the third cache line is present in the second cache and is not present in the first cache; and prefetching the third cache line from the system memory into the first cache if the third cache line is not present in the second cache.

16. The method of claim 15, further comprising: incrementing the second incremented address to generate a third incremented address; determining whether a fourth cache line having the third incremented address is present in the second cache; prefetching the fourth cache line from the system memory into the stream buffer circuit if the fourth cache line is not present in the second cache; receiving a request for the third cache line from the processor; sending the third cache line from the first cache to the processor; determining whether the fourth cache line is in the first cache; fetching the fourth cache line from the second cache to the first cache if the fourth cache line is present in the second cache; fetching the fourth cache line from the stream buffer circuit into the first cache if the fourth cache line is not present in the second cache; incrementing the third incremented address to generate a fourth incremented address; determining whether a fifth cache line having the fourth incremented address is present in the second cache; fetching the fifth cache line from the second cache into the stream buffer circuit if the fifth cache line is present in the second cache; and prefetching the fifth cache line from the system memory into the stream buffer circuit if the fifth cache line is not present in the second cache.

17. The method of claim 16, wherein one or more cache lines stored in the stream buffer circuit are located in the first cache.

18. The method of claim 17, wherein the first cache is a primary cache arranged on the same chip as the processor, and the second cache is a secondary cache arranged outside the chip.

19. The method of claim 18, wherein the stream filter circuit is capable of tracking a plurality of streams, each entry in the stream filter circuit tracks one of the plurality of streams, and each entry includes a first indicator of the validity of the stream tracked by the entry and a second indicator of the increment direction of the address of the entry.

20. The method of claim 19, wherein the entry in the stream filter circuit has a real address and the entry in the stream buffer circuit has a valid address.