
WO2016155623A1 - Information-push-based information system and method - Google Patents


Info

Publication number
WO2016155623A1
WO2016155623A1 (PCT/CN2016/077853, CN2016077853W)
Authority
WO
WIPO (PCT)
Prior art keywords
address
memory
generated
address generator
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/077853
Other languages
French (fr)
Chinese (zh)
Inventor
林正浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Original Assignee
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201510154529.3A external-priority patent/CN106155946A/en
Priority claimed from CN201510178436.4A external-priority patent/CN106155928A/en
Application filed by Shanghai Xinhao Bravechips Micro Electronics Co Ltd filed Critical Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Publication of WO2016155623A1 publication Critical patent/WO2016155623A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • the invention relates to the field of computers, communications and integrated circuits.
  • in particular, it relates to an information processing system, an information processing method, and a storage system.
  • FIG. 1 is a simplified block diagram of a conventional stored program computer
  • the processor 12 includes an execution unit and a control unit; 12 generates an address to access the memory 10 via the bus 11, and 10 provides information to 12 via the bus 13 according to the address on bus 11. The information here includes computer instructions and data.
  • the execution result data of the processor 12 is also stored back to 10 via the bus 13.
  • in Figure 1, the part above the dotted line is the memory and its associated logic, hereinafter referred to as the storage system; the part below the dotted line is the processor and its associated logic, hereinafter referred to as the processing system.
  • the storage system and the processing system are collectively referred to as an information system or computer.
  • the input and output units of the computer are omitted in FIG. 1 and other embodiments of the present disclosure for ease of explanation.
  • the memory access delay includes the access delay inside the memory device and the transmission channel delay between the memory and the processor, such as the transmission delay between multiple chips on a circuit board, a shared substrate, TSV vias or an interposer chip; the delay of the north bridge; the delay of indirectly connecting multiple memory chips; the delay caused by transmission format conversion; the delay caused by the communication protocol; or even the delay on a long-distance communication channel, such as network and wireless communication delays.
  • the stored program computer uses a buffer to mask the memory access latency to alleviate this bottleneck.
  • FIG. 2 is a simplified schematic diagram of a conventional stored program computer using a buffer.
  • the processor 12 includes an execution unit and a control unit, and may or may not include higher-level buffers; 14 is the tag unit of the lowest-level buffer in the processing system, that is, the last level cache (Last Level Cache, hereinafter referred to as LLC); 16 is the data memory RAM of the LLC.
  • the stored program computer uses a buffer to try to mask the memory access latency and alleviate this bottleneck, but the latency still cannot be completely masked. For this reason, some stored program computers use prefetching to try to further mask memory access latency, but they also encounter difficulties. Some of these difficulties are caused by multi-level caching: prefetch requests from a higher cache level must pass through multiple storage tiers, delivered level by level, before reaching the lowest tier of memory.
  • an address generator is used to generate and output further addresses according to the currently output information block, so that the memory can output further information blocks accordingly. This is equivalent to providing an information block that the processor (the information block demand device) may need before it requests that information, thereby speeding up the processor's acquisition of information blocks and, in turn, the speed of the processor and the information processing system.
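The push idea above can be sketched as follows; the class name, the block size, and the sequential-prediction policy are illustrative assumptions for this example, not the patent's actual address generator:

```python
# Hypothetical sketch: the storage side derives further addresses from the
# block it just served and pushes those blocks before the demand side asks.

class PushMemory:
    def __init__(self, contents, block_size=4):
        self.contents = contents          # address -> word
        self.block_size = block_size

    def read_block(self, addr):
        base = addr - addr % self.block_size
        return [self.contents.get(base + i) for i in range(self.block_size)]

    def push_sequence(self, addr, extra_blocks=2):
        """Serve the requested block, then push the next sequential blocks
        the demand side is likely to need (simple sequential prediction)."""
        served = [self.read_block(addr)]
        base = addr - addr % self.block_size
        for n in range(1, extra_blocks + 1):
            served.append(self.read_block(base + n * self.block_size))
        return served

mem = PushMemory({a: a * 10 for a in range(32)})
blocks = mem.push_sequence(5, extra_blocks=2)   # one demanded + two pushed
```

A real address generator would follow the program's control flow rather than a fixed sequential policy, which is what the embodiments below elaborate.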
  • the present invention proposes adding an address generator to the memory hierarchy (Memory Hierarchy) accessed by a processor.
  • the additional address generator can work in conjunction with the processor to address one or more of the above or other difficulties.
  • the additional address generator may generate a predicted address prior to the processor core, and read instructions and data from the lower memory level into the memory of the hierarchy for use by the processor core.
  • the method and system apparatus of the present invention can mask the delay caused by the processor accessing the memory via the latency channel.
  • Other advantages and applications of the present invention will be apparent to those skilled in the art.
  • FIG. 1 is a block diagram showing a structure of an existing information processing system
  • FIG. 2 is a block diagram showing another structure of an existing information processing system
  • FIG. 3 is a block diagram showing the structure of an information processing system according to Embodiment 1 of the present invention.
  • FIG. 4 is a block diagram showing the structure of an information processing system according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic diagram of an implementation manner of an address generator and related logic according to Embodiment 2 of the present invention.
  • FIG. 6 is a block diagram showing the structure of an information processing system according to Embodiment 3 of the present invention.
  • FIG. 7 is a block diagram showing the structure of an information processing system according to Embodiment 4 of the present invention.
  • FIG. 8 is a block diagram showing a structure of a storage hierarchy system according to Embodiment 5 of the present invention.
  • FIG. 9 is a block diagram showing a structure of a storage hierarchy system according to Embodiment 6 of the present invention.
  • FIG. 10 is a block diagram showing a structure of a storage hierarchy system according to Embodiment 7 of the present invention.
  • FIG. 11 is a block diagram showing a structure of a storage hierarchy system according to Embodiment 8 of the present invention.
  • FIG. 12 is a block diagram showing the structure of a storage hierarchy system according to Embodiment 9 of the present invention.
  • FIG. 13 is a block diagram showing the structure of a storage hierarchy system according to Embodiment 10 of the present invention.
  • FIG. 14 is a schematic structural diagram of a scanner of a storage hierarchy system according to Embodiment 11 of the present invention.
  • FIG. 15 is a schematic diagram showing the structure of a scanner of a storage hierarchy system according to Embodiment 12 of the present invention.
  • the processor may be a general purpose processor, central processing unit (CPU), microprocessor, processor core, microcontroller (MCU), digital signal processor (DSP), graphics processor (GPU), system on chip (SOC), application specific integrated circuit (ASIC), etc.
  • the memory or cache may be constructed of any suitable storage device, such as a register or register file (Register file), static memory (SRAM), dynamic memory (DRAM), flash memory (Flash memory), hard disk (HD), solid state drive (SSD), any other suitable storage device, or a future new form of memory.
  • the invention provides a method for adding an address generator in a storage system and a corresponding system device thereof.
  • the additional address generator can work in conjunction with the processor to address one or more difficulties in the prior art.
  • the additional address generator may, before the processor executes an instruction, or as a scanner parses the information output by the memory, and the like, generate an address prior to the processor and read the corresponding instructions and data from the memory into the processing system.
  • the processor executes the pushed instructions according to the program and its internal state, processes the data pushed from the memory or data on the processor side, and writes the result of the processing back to the memory.
  • the processor may not output a memory address to the storage system at all; instead the address generator in the storage system independently generates memory addresses to access the memory and pushes (Push) the corresponding instructions and data to the processing system. It is also possible for the memory address to be generated independently by the address generator in most cases, while in a few cases the processor in the processing system outputs control flow information (control flow information), such as branch decisions, to the storage system to simplify the address generator structure.
  • the address generator determines the program execution path according to the control flow information, generates the location information of subsequent instructions, and pushes (Push) the corresponding instructions or data to the processor core.
  • the processor may provide a memory address to an address generator in the memory system under certain conditions, and the address generator independently executes the program from the memory address provided by the processor.
  • FIG. 3 is a block diagram of an information processing system according to Embodiment 1 of the present invention.
  • the address generator in this embodiment is a dedicated computing unit including a plurality of registers and some or all of adders, shifters and logical operators; it can perform at least part of the functions of the processor, calculate and generate instruction or data addresses, and make branch decisions.
  • instructions related to memory address generation can thus be executed directly in the storage system, and the corresponding instructions or data can be read according to the calculated instruction or data memory address and sent to the processor for execution.
  • the processing system below the dotted line in FIG. 3 is similar to that in the embodiment of FIG. 1; 12 is the processor, which may or may not contain a buffer.
  • 10 is the memory
  • 22 is the address generator added to the storage system
  • 18 is the selector. 22 may also contain a buffer, or not.
  • the address generator 22 generates an address via the bus 21 to access the memory 10, and 10 provides data and instructions via the bus 23 to the address generator 22 and the processor 12.
  • the selector 18 selects the execution result 25 of the address generator 22 or the execution result 15 of the processor 12 to write back to the memory 10 via the bus 29.
  • processor 12 in this embodiment does not send out addresses; memory addresses are not transferred from the processing system to the storage system via the bus 11 as in the embodiment of FIG. 1.
  • the address generator 22 can execute the same instruction set as the processor 12, that is, 22 and 12 have the same function; 22 and 12 can also execute different instruction sets, where one of 22 or 12 executes the complete instruction set and the other executes a restricted instruction set. A processor that executes a restricted instruction set does not execute instructions outside the scope of its restricted instruction set, but only sets flag bits to record such out-of-range events.
  • one application, in which the address generator 22 executes the same instruction set as the processor 12, or in which the address generator 22 executes the complete instruction set and the processor 12 executes the restricted instruction set, may be that the address generator 22 pushes the program instructions and data required by processor 12 to the processing system; processor 12 executes the program to process data collected by or stored in the processing system, and the final result of the processing by processor 12 is sent back to the storage system memory 10 for storage.
  • alternatively, the address generator 22 executes the restricted instruction set and the processor 12 executes the complete instruction set.
  • in this application, the address generator 22 likewise pushes the program instructions and data required by the processor 12 to the processing system.
  • the address generator 22 executes those of its instructions that are in the restricted instruction set and writes the results produced by those instructions back to the memory 10; the processor 12 executes all instructions in the complete instruction set, writing back to the memory 10 of the storage system the results other than those that can be generated by executing the restricted instruction set.
  • the restricted instruction set of address generator 22 contains only the instructions necessary to change the flow of the program, such as load LD, store ST, add ADD, subtract SUB, shift SHIFT, logic operations, compare COMP, branch BR, jump JUMP, etc. In this case the main function of the address generator 22 is to generate memory addresses: the address generator 22 executes the program, and the generated addresses are sent to the memory 10 to read information, which is sent to the address generator 22 and the processor 12 for execution and processing.
  • the above-described restricted instruction set also enables the address generator 22 to perform operations such as moving data from one area of the memory 10 to another; performing a simple operation on data in one area of the memory 10 and writing it back to the same area or another area; or storing data from an input unit into memory 10 and moving data in 10 to the output unit; thereby reducing the bandwidth these operations demand of the channel between the storage system and the processing system.
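As an illustration of such a memory-side operation, the sketch below (hypothetical helper, Python used for clarity) shows a copy performed entirely inside the memory — the LD/ST loop the address generator would run itself, so the moved data never crosses the channel to the processing system:

```python
# Hypothetical sketch of a memory-side copy executed by the address
# generator: a load/store loop whose data stays inside memory 10.

def memcopy_in_storage(mem, src, dst, n):
    """Copy n words from src to dst entirely within the memory."""
    for i in range(n):
        mem[dst + i] = mem[src + i]     # LD then ST, no channel traffic

mem = {i: i * 2 for i in range(8)}      # stand-in for memory 10
memcopy_in_storage(mem, src=0, dst=100, n=4)
```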
  • the timing at which the address generator 22 executes an instruction is generally earlier, by one storage-system-to-processing-system channel delay, than the timing at which the processor 12 executes the same instruction.
  • the time at which the address generator 22 generates an address via the bus 21 to access the memory 10 is earlier, by the bidirectional channel delay between the storage system and the processing system, than the time at which the same address generated by the processor 12 would reach the memory 10 after passing through the processing-system-to-storage-system channel. Therefore, the information system structure of the present invention disclosed in Embodiment 1 and FIG. 3 masks the bidirectional channel delay between the storage system and the processing system, compared with existing system structures such as those of the embodiments of FIG. 1 or FIG. 2.
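The latency claim can be illustrated with assumed cycle counts (the numbers below are invented for the example):

```python
# Hypothetical numbers illustrating the masking claim: conventionally the
# processor's address crosses the channel and the data crosses back, so both
# trips sit on the critical path; with the storage-side address generator
# running one round trip ahead, only the memory access itself remains visible.

CHANNEL_DELAY = 5      # cycles, storage <-> processing channel (assumed)
MEMORY_ACCESS = 10     # cycles inside the memory device (assumed)

# Conventional: address out + memory access + data back
conventional_latency = CHANNEL_DELAY + MEMORY_ACCESS + CHANNEL_DELAY

# Push-based: the address generator issues the address one round trip early
masked_latency = conventional_latency - 2 * CHANNEL_DELAY

assert masked_latency == MEMORY_ACCESS
```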
  • FIG. 4 is a block diagram of an information processing system according to Embodiment 2 of the present invention, in which, on the basis of FIG. 3, the same last level cache LLC is added to both the processing system and the storage system.
  • 10 is the memory in the storage system
  • 22 is the address generator
  • 24 is the tag unit of the storage system LLC
  • 26 is the data memory RAM of the LLC in the storage system
  • 28 is a first in first out (FIFO) that temporarily holds output data of the address generator 22 to be stored into the memory 10 and the LLC RAM 26; 18 is a selector that selects stored data from 28 or from processor 12 for storage into memory 10. The processing system under the dashed line is similar to that in the embodiment of FIG. 2, where 12 is the processor
  • 14 is the tag unit of the processing system LLC
  • 16 is the data memory RAM of the processing system LLC.
  • the address generator 22 in the storage system generates an address on bus 21, which is sent to 24 and 26; 24 screens it. If the address on 21 matches the tag stored in 24, then 26 sends the corresponding information directly to 22 via bus 27 for execution. The processor 12 in the processing system generates an address on bus 11, which is sent to 14 and 16; if the address on bus 11 matches the tag stored in 14, then 16 sends the corresponding information directly to 12 via bus 13 for execution.
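The tag screening in units 14 and 24 works like an ordinary cache tag compare; below is a minimal sketch with invented field widths (the patent does not specify them):

```python
# Hypothetical sketch of a tag unit: an address hits when its tag matches
# the tag stored for its set; a miss fills the entry (see bus 23 fill path).

TAG_BITS, SET_BITS, OFFSET_BITS = 20, 8, 4   # assumed widths

def split(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_idx = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_idx, offset

class TagUnit:
    def __init__(self):
        self.tags = {}                        # set index -> stored tag

    def lookup(self, addr):
        tag, set_idx, _ = split(addr)
        return self.tags.get(set_idx) == tag  # hit?

    def fill(self, addr):
        tag, set_idx, _ = split(addr)
        self.tags[set_idx] = tag              # replacement writes the tag

tags = TagUnit()
addr = 0x12345
first_hit = tags.lookup(addr)   # first access misses
tags.fill(addr)                 # miss fills the tag entry
second_hit = tags.lookup(addr)  # subsequent access hits
```

Because 24 and 14 apply the same replacement rule, the same fills happen on both sides and their tag contents stay identical, as the text goes on to explain.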
  • the bus 11 is thus only used inside the processing system, unlike the prior art shown in the embodiments of FIG. 1 or FIG. 2, where the processor in the processing system sends the generated address to the storage system to access the memory 10. If the address 21 generated by the address generator 22 in the storage system does not match the tag stored in the tag unit 24 of the storage system, the screened-out address on the bus 21 accesses the memory 10, and the read cache block is filled via the bus 23 into the storage system LLC RAM 26; the output of 26, or the information on the bus 23 bypassed via bus 27, is sent to the address generator 22 for execution. The read information block is also sent to the processing system via bus 23; at this time the address 11 generated by the processor 12 in the processing system does not match the tag stored in the LLC tag unit 14 of the processing system, so the LLC RAM 16 of the processing system is filled with the cache block on the bus 23, and either the output of 16 or the data bypassed directly from bus 23 is sent to processor 12 via bus 13 for execution.
  • there is also a bus 19 from the processing system to the storage system; the branch decision and/or branch target address of the processor 12 is sent over it to the address generator 22 for use in the address generator's branch operations.
  • the cache replacement logic of the storage system and that of the processing system are the same, so cache blocks of the same address in 26 and 16 are replaced by cache blocks from bus 23.
  • the tag portion of the address 21 is also written to the entry indicated by the cache replacement logic in the storage system LLC tag unit 24; the tag portion of the same address 11 is also written to the processing system LLC tag unit 14 by the cache replacement logic.
  • in this way the storage system LLC RAM 26 and the processing system LLC RAM 16 hold the same data, and the storage system LLC tag unit 24 has the same tag content as the processing system LLC tag unit 14. The same content is first filled into the tag unit 24 and RAM 26 of the storage system and, one storage-system-to-processing-system channel delay later, into the tag unit 14 and RAM 16 of the processing system.
  • when comparison of the address on bus 21 with the tag content in tag unit 24 finds a mismatch, the memory 10 is accessed, and the read data begins to fill the storage system LLC RAM 26 and, after the channel delay, the processing system LLC RAM 16. Normally, the time at which the processing system's address on bus 11 is compared with the tag content in tag unit 14 and found to mismatch is later than the time at which the same address on bus 21 is compared with the tag content in tag unit 24 in the storage system; the prior art, by contrast, must additionally spend one processing-system-to-storage-system channel delay sending the address on bus 11 to the storage system before it can access memory 10.
  • the architecture disclosed in the embodiment of FIG. 4 thus masks the round trip (Round Trip) channel delay between the storage system and the processing system as compared with the existing architectures of the embodiments of FIG. 1 or FIG. 2.
  • FIG. 5 is a schematic diagram of an implementation manner of an address generator and related logic according to Embodiment 2 of the present invention, showing a valid chain.
  • 22 is the address generator, in which 30 is the register file, 34 is the instruction decoder, 36 is the AND gate, 38 is the execution unit, 42 is the branch judgment unit, and 44, 46, 48, 50 are selectors; each entry in the register file 30 has an additional valid bit 32, and the branch condition flag stored in the branch judgment unit 42, or each of its entries, has a corresponding valid bit 40.
  • outside the address generator 22, 28 is the first in first out (FIFO) and 18 is the selector.
  • the 'valid chain' rule is: an operation performed by a valid instruction on valid operands yields a valid result; if any one of the operands, or the operation instruction itself, is invalid, the operation result is invalid.
  • a valid instruction is an instruction in the restricted instruction set, that is, an instruction that the processor executing the restricted instruction set can execute. An operand from memory 10 or the buffer is defined as valid, so the result of the load instruction LD is valid, and the valid bit in 32 of the load instruction's target entry in register file 30 is set to valid. Thereafter, an operation taking that target register as an operand yields a valid result if the instruction is a valid instruction and the other operands used by the instruction are also valid.
  • the instruction on bus 27 is decoded by instruction decoder 34. If the instruction is valid, 34 controls the register file 30 to read the corresponding operands by the operand addresses in the instruction and send them via the buses 35 and 25 to the execution unit 38, which processes them according to the instruction's operation type and writes the result back via bus 39 to the register file entry pointed to by the destination operand address in the instruction. At the same time, the valid bits of the above operands in 32 ('0' is invalid, '1' is valid) are read out to AND gate 36 and ANDed with the instruction valid signal 47 generated by decoder 34; the result 37 is written back to the valid bit in 32 of the register file entry pointed to by the destination operand address in the instruction.
  • if all operands are valid, the valid bit in the target entry is set 'valid'; if any operand is invalid, the valid bit in the target entry is set 'invalid'. If the instruction itself is invalid, the decoder 34 sends an instruction valid signal 47 meaning 'invalid' to the AND gate 36, making its output 37 represent 'invalid' and controlling the register file 30 to write the 'invalid' signal into the valid bit 32 of the invalid instruction's destination register file entry. Alternatively, when 37 is 'invalid' the execution unit can be controlled to take no action, to save power. Such valid signals are passed between register entries via the AND gate 36, forming a valid chain, and the valid chain can be interrupted by an invalid instruction or an invalid operand.
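The valid-chain rule above can be sketched in a few lines; the register count, stub load value, and instruction subset are illustrative assumptions, and "FMUL" merely stands for any out-of-set instruction:

```python
# Hypothetical sketch of the valid chain: loaded operands are valid, an
# in-set instruction's result is valid only if all operands are valid, so
# one out-of-range instruction poisons every dependent result.

RESTRICTED_SET = {"LD", "ST", "ADD", "SUB", "SHIFT", "COMP", "BR", "JUMP"}

class RegisterFile:
    def __init__(self, n=8):
        self.value = [0] * n
        self.valid = [False] * n            # the extra valid bits 32

def execute(rf, op, dst, srcs):
    instr_valid = op in RESTRICTED_SET      # decoder output 47
    ops_valid = all(rf.valid[s] for s in srcs)
    rf.valid[dst] = instr_valid and ops_valid   # AND gate 36 -> result 37
    if rf.valid[dst]:                       # skip work on invalid results
        if op == "LD":
            rf.value[dst] = 42              # value from memory (stubbed)
        elif op == "ADD":
            rf.value[dst] = rf.value[srcs[0]] + rf.value[srcs[1]]

rf = RegisterFile()
execute(rf, "LD", 0, [])          # load: result defined as valid
execute(rf, "LD", 1, [])
execute(rf, "FMUL", 2, [0, 1])    # out-of-set: target marked invalid
execute(rf, "ADD", 3, [0, 1])     # valid op on valid operands -> valid
execute(rf, "ADD", 4, [2, 3])     # depends on invalid r2 -> invalid
```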
  • when a store instruction is decoded, the decoder 34 controls the register file 30 to send the base address via the bus 35; the execution unit 38 adds to it the immediate in the store instruction, selected at this time by selector 50, and the sum is stored via bus 39 into the 56 field of an entry of the first in first out 28 as the store address. 34 also controls the register file 30 to output via bus 25 the data to be stored into the 54 field of the same entry of the first in first out, and controls reading the valid bit 32 of the stored data's register entry onto bus 49 and into the 52 field of the same entry.
  • when the entry reaches the head of the first in first out 28 and is output from 28, if the signal in the entry's 52 field is 'valid', that signal controls selector 18 to select the data in the entry's 54 field to be sent via bus 29 to the memory 10 and the LLC RAM 26 and written to the storage location addressed by the store address sent from the entry's 56 field via bus 31. If the signal in the entry's 52 field is 'invalid', that signal controls selector 18 to send the data from processor 12 in the processing system via bus 15 onto bus 29 to the memory 10 and the LLC RAM 26, likewise written to the storage location addressed via bus 31.
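The FIFO-and-selector write-back path can be sketched as follows; the field names 52/54/56 follow the text, while the Python structures and sample values are illustrative:

```python
# Hypothetical sketch: FIFO 28 carries {valid 52, data 54, address 56};
# at the FIFO head the valid bit drives selector 18 to choose the address
# generator's data or the processor's data arriving on bus 15.

from collections import deque

fifo = deque()            # FIFO 28
memory = {}               # memory 10 (LLC RAM 26 kept identical here)

def ag_store(valid, data, addr):
    """Address generator enqueues a pending store."""
    fifo.append({"52_valid": valid, "54_data": data, "56_addr": addr})

def drain(processor_data):
    """Pop the FIFO head; selector 18 picks field 54 when valid,
    else the processor's result from bus 15."""
    entry = fifo.popleft()
    chosen = entry["54_data"] if entry["52_valid"] else processor_data
    memory[entry["56_addr"]] = chosen     # write via bus 29 / address 31
    return chosen

ag_store(True, 111, 0x10)     # address generator produced this value itself
ag_store(False, None, 0x20)   # out-of-set result: processor must supply it
drain(processor_data=999)     # valid entry: 111 wins, 999 ignored
drain(processor_data=222)     # invalid entry: processor's 222 is stored
```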
  • the processor 12 executing the complete instruction set maintains the same valid bits and valid chain; its valid chain generation is exactly the same as that of address generator 22, but its usage is exactly the opposite.
  • for a store instruction, when the valid signal of the data read from the register file is 'invalid' (i.e., the data is one that the address generator 22 executing the restricted instruction set could not generate), the processor 12 sends the data to the storage system via the bus 15, and it is selected by the selector 18 to be stored via the bus 29 into the memory 10 and the LLC RAM 26. If the valid signal of the data read from the register file is 'valid', processor 12 does not send the data to the storage system, because address generator 22 has generated the same data and has stored it, or is ready to store it, into 10 and 26.
  • in the embodiment of FIG. 5, the address generation unit 58 generates a sequential instruction address 41 (current instruction address plus '1') or a direct branch target address 43 (current instruction address plus the branch offset in the branch instruction) for selection.
  • when a non-branch instruction is executed, the instruction decoder 34 controls the selector 46 to select the sequential address 41 as the next instruction address; when a branch instruction is executed, 34 causes the branch decision 45 to control the selector 46 to select the sequential address 41 or the branch target address 43.
  • the instruction address, and the data address or indirect branch target address 39 output by the execution unit 38, are selected by the selector 48 as the memory address 21 sent to address the memory 10, the LLC tag unit 24 and the LLC RAM 26.
  • when the address generator 22 makes branch decisions independently, the branch judgment unit 42 makes the decision according to the branch type of the branch instruction and the execution result 39 produced by the execution unit 38.
  • when part of the execution result 39 is stored in 42 as a branch flag or branch decision, its corresponding valid chain result 37 is also stored in the register 40.
  • the output of 40 controls selector 44: if the content of 40 is valid, 44 selects the branch decision output by 42 of the address generator to control the selector 46 to select the sequential address 41 or the branch target address 43.
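The next-instruction-address selection performed by 58 and 46 can be sketched as follows (word-addressed instructions and the instruction encoding are assumptions for the example):

```python
# Hypothetical sketch of address generation unit 58 + selector 46:
# sequential address 41 is PC+1, direct branch target 43 is PC+offset,
# and the branch decision picks between them.

def next_pc(pc, instr, branch_taken=False):
    sequential = pc + 1                       # address 41
    if instr["op"] == "BR":
        target = pc + instr["offset"]         # address 43
        return target if branch_taken else sequential
    return sequential

assert next_pc(100, {"op": "ADD"}) == 101
assert next_pc(100, {"op": "BR", "offset": 8}, branch_taken=True) == 108
assert next_pc(100, {"op": "BR", "offset": 8}, branch_taken=False) == 101
```

When the branch decision's valid bit in 40 is invalid, `branch_taken` would instead come from the processor over bus 19, as described below for floating-point branches.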
  • This situation corresponds to most branch operations, such as when performing branch operations on all integers. However, some special branch operations may depend on the result of executing an instruction that is outside the scope of the restricted instruction set.
  • for example, the branch operation may be based on comparing the magnitudes of two floating point numbers.
  • the usual practice is that the floating point unit compares the two floating point numbers, the comparison result is moved from the floating point register file to the integer register file, and the branch judgment unit in the integer unit then makes the decision according to the branch type of the branch instruction. If the restricted instruction set of address generator 22 does not include floating point operations, the instruction that moves data from the floating point register file to the integer register file 30 interrupts the valid chain, causing the valid bit of the integer register file entry into which the data is moved to be 'invalid'.
  • the processor 12 also maintains a valid chain for branch judgment, and only when its valid chain result is 'invalid' (meaning that the address generator 22 executing the restricted instruction set cannot make a correct branch decision) does it send the processor's branch judgment via bus 19 for use by the address generator 22.
  • in that case control logic under the decoder 34 controls the selector 46 to select the indirect branch target address generated by processor 12 from bus 19, and also controls selector 48 to select the output of 46, so that the indirect branch target address generated by processor 12 is placed on the memory address bus 21 to address 10, 24, 26, etc.
  • the indirect branch target address on bus 21 is also stored back into address generation unit 58 to generate the subsequent sequential address 41 and the next direct branch address 43.
  • the address generator 22 in the storage system executes instructions at least one storage-system-to-processing-system channel delay earlier than the processor in the processing system executes the same instructions, and can push the required instructions and data to processor 12 in advance. When the address generator 22 waits on bus 19 as described above, the time at which it executes the post-branch instructions is later, by one processing-system-to-storage-system channel delay, than the time at which processor 12 executes the same instructions.
  • the address generator 22, however, does not execute out-of-range instructions, so it can catch up with and again lead the processor in executing the same instructions, and push the required instructions and data to processor 12 in advance.
  • one implementation is to increase the read and write bandwidth of the entry valid bits 32 of the register file 30, for example so that all bits of 32 can be read and written independently in parallel; the instruction decoder 34 can then decode multiple instructions in parallel and, for out-of-range invalid instructions, directly set the corresponding valid bits of the target register file entries to 'invalid' without the instructions themselves needing to be executed.
  • the instruction decoder reads in parallel, according to the instruction packet, the valid bits 32 of the operands used by the plurality of instructions, and ANDs each with the corresponding instruction valid signal (similar to the function of the AND gate in FIG. 5) to obtain intermediate results.
  • the instruction decoder further performs dependency detection on the plurality of instructions; where a later instruction depends on an earlier one, the intermediate result of the earlier instruction is ANDed with the intermediate result of the later instruction as the later instruction's final result.
  • instructions whose final result is 'valid' are executed in order; for an instruction whose final result is 'invalid', it is only necessary to set the valid bit 32 in its target register file entry to 'invalid', and the instruction itself does not need to be executed. A priority check is required when writing each valid bit: if there are multiple writes to the same location, only the valid chain result of the last instruction in program order may be written.
  • In this way the instructions to be executed by the address generator 22 are detected first, and invalid instructions, or instructions that are themselves valid but whose operands are invalid, are filtered out, so that the address generator 22 executes only instructions that can produce valid results. The address generator 22 can therefore execute the same instructions ahead of the processor 12, enabling the storage system to provide the processor 12 with the required instructions or data in advance. If the address generator 22 can also execute the complete instruction set (even though its execution efficiency is not necessarily the same as that of the processor 12), the branch destination address bus 19 from the processor 12 to the address generator 22 can be omitted.
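The validity-chain filtering described above can be sketched as a small software model. The instruction format, register numbering, and function names below are illustrative assumptions, not part of the disclosure:

```python
def filter_instructions(instructions, valid_bits):
    """Model of the parallel validity filtering: decide, for each decoded
    instruction, whether the address generator should execute it or merely
    invalidate its target register file entry.

    instructions: list of dicts with 'srcs' (source register numbers),
                  'dst' (target register number), 'valid' (instruction valid bit).
    valid_bits:   per-register valid bits of the register file (mutated in place).
    Returns the list of instructions the address generator actually executes.
    """
    final_results = []
    for insn in instructions:
        # Intermediate result: instruction valid AND all source operands valid
        # (the AND-gate function referred to in Figure 5), computed in parallel
        # against the valid bits as they stood before this packet.
        final_results.append(
            insn['valid'] and all(valid_bits[s] for s in insn['srcs']))
    # Dependency (correlation) detection within the packet: a later instruction
    # that reads the target of an earlier one ANDs that earlier final result
    # into its own final result.
    for i, insn in enumerate(instructions):
        for j in range(i):
            if instructions[j]['dst'] in insn['srcs']:
                final_results[i] = final_results[i] and final_results[j]
    # Apply the final results; for multiple writes to the same valid bit,
    # priority goes to the last instruction in program order (later iterations
    # simply overwrite earlier ones here).
    executed = []
    for insn, final in zip(instructions, final_results):
        valid_bits[insn['dst']] = final
        if final:
            executed.append(insn)
    return executed
```

Running the model on a packet in which one instruction reads an invalid register shows the invalidity propagating down the dependency chain while only the valid instruction is executed.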
  • FIG. 6 is a block diagram of an information system according to Embodiment 3 of the present invention. As shown in FIG. 6, the LLC RAM 26 in the storage system is omitted relative to the previous embodiment; the rest is the same.
  • Above the dotted line, 10 is the memory, 22 is the address generator, 24 is the tag unit of the storage system LLC, 28 is the first-in-first-out (FIFO) buffer that temporarily stores the output of the address generator 22, and 18 is the selector that selects the store data from 28 or from the processor 12. In the processing system below the dotted line, 12 is the processor, 14 is the tag unit of the processing system LLC, and 16 is the data memory RAM of the processing system LLC.
  • The instruction or data input of the address generator 22 comes directly from the output bus 23 of the memory 10.
  • The address generator 22 outputs a memory address on bus 21 to address the memory 10, reading the instruction or data onto bus 23 for execution by the address generator 22.
  • The memory address on bus 21 also addresses the LLC of the storage system, where the tag unit 24 performs filtering: if the address hits in 24, the instruction or data on bus 23 is used only by the address generator 22 and is not sent to the processing system; if it misses in 24, the memory address on bus 21 is written into 24 according to the replacement rule.
  • FIG. 7 is a block diagram of an information system according to Embodiment 4 of the present invention.
  • In this embodiment the processor in the processing system generates addresses to access the memory and read information. The address generator of the storage system does not make branch decisions; instead, from the address generated by the processor and the information block output by the memory for that address, it generates the addresses of information the processor may need and sends them to the memory, so that the memory outputs information in advance to the processing system for selection by the processor.
  • The information system includes a storage system and a processing system.
  • The processing system includes: a processor 12 for obtaining information, which may include a buffer and which is responsible for generating the current address 11; an information buffer tag unit 64 and an information buffer (Information Buffer) 66, where 64 and 66 together are also referred to as the second memory; and a selector 68.
  • The storage system includes: a memory 10 for storing information and outputting an information block via bus 23 according to the received address; an address generator 60 for generating addresses from the current address 11 and the information block on bus 23 and providing them to the memory; and a selector 62.
  • The information system performs information processing by the following method:
  • The processor 12 sends out a memory address 11; the address 11 is matched against the contents of the information buffer tag unit 64. If it hits, the information buffer 66 outputs the information block to the processor 12; if it misses, the address 11 is sent via the selector 62 to the storage system to access the memory 10.
  • The memory 10 outputs an information block via bus 23 according to the address 11 sent by the processor 12, and this block passes through the selector 68 for use by the processor 12. The address generator 60 in the storage system generates an address 61 based on the information block currently output by the memory 10, and provides the address 61 to the memory 10 via the selector 62; the information block output by the memory 10 for the address 61 provided by the address generator 60 is stored via bus 23 into the information buffer 66.
  • That is, the processor 12 issues one address at which information needs to be obtained (i.e., makes one request), and the storage system's memory 10, in addition to outputting the information block for the address issued by the processor 12 for use by the processor 12, also outputs, for the addresses generated by the address generator 60, one or more further information blocks, which are stored in advance in the information buffer 66 for subsequent use by the processor 12.
  • In a conventional system the memory sends out one information block per address, so the processor can obtain only the single information block corresponding to the address it sends out (the block pointed to by that address). Here, the processor 12 issues one address requiring information, and the memory 10 outputs multiple information blocks (one requested by the processor 12's request/address, the others requested by the requests/addresses of the address generator 60), so that the processor 12 obtains multiple information blocks with one request/address, masking the delay of obtaining subsequent blocks after the first instruction block is fetched.
  • Here an 'information block (block)' is a unit of information, and includes instruction blocks whose content is instructions and data blocks whose content is data. The size of an 'information block' is not limited and may be defined according to system requirements, for example as a multiple of the cache block size in the processor 12.
  • The address generator 60 accepts the memory address on bus 11, the information block on bus 23, and the signal on bus 63 sent by the processor 12 indicating whether the address on bus 11 is an instruction address or a data address. The address generator 60 outputs an address 61, which is selected by the selector 62 to access the memory 10 and is also stored in the information buffer tag unit 64.
  • The address generator 60 scans and parses the current information block as follows: the address generator 60 adds the block offset (i.e., the address difference between two adjacent information blocks) to the memory address on bus 11 to obtain the address of the information block adjacent to the current one. It determines, from the signal on bus 63, whether the current information block is an instruction block; if so, it extracts the instruction type field (OP) of each instruction in the block and decodes it to determine whether the instruction is a direct branch instruction.
  • For each direct branch instruction, the memory address on bus 11 is added to the branch offset contained in the instruction to obtain the address of the instruction block in which the branch target lies. The address generator screens the generated branch target addresses, determining whether each branch target address is the same as the address of the current instruction block; targets that fall within the current instruction block are filtered out, since that block needs no further fetch.
  • The adjacent block address or branch target address so obtained is supplied to the memory 10 via bus 61 and the selector 62, and the memory 10 outputs the corresponding information block.
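The scan-and-parse step above can be sketched as follows. The block size, instruction encoding ('op' field, branch offset), and function name are illustrative assumptions:

```python
BLOCK_SIZE = 64  # assumed address difference between two adjacent information blocks

def scan_block(block_addr, info_block, is_instruction_block):
    """Generate candidate prefetch addresses for the current information block:
    the sequentially adjacent block, plus the target block of every direct
    branch instruction found when the block is an instruction block. Branch
    targets falling in the current block are filtered out."""
    addresses = [block_addr + BLOCK_SIZE]  # adjacent information block
    if is_instruction_block:
        for insn in info_block:
            # 'op' models the instruction type field (OP); 'offset' models the
            # branch offset carried by a direct branch instruction.
            if insn['op'] == 'direct_branch':
                target = block_addr + insn['offset']
                target_block = target - (target % BLOCK_SIZE)
                if target_block != block_addr:   # same-block targets are filtered
                    addresses.append(target_block)
    return addresses
```

A data block thus yields only the adjacent block address, while an instruction block can additionally yield one branch target block address per direct branch.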
  • The information block output by the memory 10 according to the memory address sent from the processor 12 via bus 11 is sent over bus 23 and through the selector 68 directly to the processor 12 for execution. The information block output by the memory 10 according to a memory address sent from the address generator 60 via bus 61 is sent over bus 23 to the information buffer 66 for temporary storage, while the memory address on bus 61 is also sent to the information buffer tag unit 64 for temporary storage. Replacement logic selects the entries in 64 and 66 used for this temporary storage; for example, when the buffer 66 is full, the replacement logic can select the entry that has resided longest, according to each entry's residence time.
  • The memory address sent by the processor 12 via bus 11 is first matched in 64. If there is a match, the corresponding entry in 66 is selected by the selector 68 and sent to the processor 12 for execution, and is stored in the buffer in 12 if the processor 12 contains one; the entry just read in the buffer 66 and the tag unit 64 can then be marked 'replaceable'. If the memory address on bus 11 has no match in the tag unit 64, the memory address is sent to the memory 10 to read the information block, as described above.
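A minimal software model of the information buffer tag unit (64) and information buffer (66) described above, with an oldest-entry replacement policy; the class and its interface are assumptions for illustration:

```python
class InfoBuffer:
    """Prefetched information blocks are tagged by address; a processor hit
    delivers the block and marks the entry 'replaceable'; a miss falls
    through to the memory (10)."""
    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # fallback: address -> information block
        self.entries = []             # [address, block, replaceable], oldest first

    def store(self, addr, block):
        if len(self.entries) >= self.capacity:
            # Replacement logic: evict the longest-resident entry when full.
            # (A fuller model would prefer entries already marked replaceable.)
            self.entries.pop(0)
        self.entries.append([addr, block, False])

    def read(self, addr):
        for entry in self.entries:
            if entry[0] == addr:
                entry[2] = True       # mark the read entry 'replaceable'
                return entry[1]
        return self.memory[addr]      # miss in the tag unit: access memory 10
```

The buffer thus absorbs prefetched blocks while misses still reach the memory exactly as in the base flow.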
  • The address generator 60 of this embodiment can operate in a variety of modes. For example, when the content of the information buffer 66 is below a certain capacity, the address generator 60 scans and parses all information blocks on the bus and generates addresses to access the memory 10 accordingly. When the content of the information buffer 66 exceeds a certain capacity, the address generator 60 scans and parses only the memory addresses sent by the processor 12 via bus 11 and the information blocks output by the memory 10 for those addresses, and not the memory addresses sent by the address generator 60 itself via bus 61 or the information blocks output for them. There are also other modes of operation, which are not repeated here.
  • FIG. 8 is a block diagram of a storage hierarchy system according to Embodiment 5 of the present invention.
  • In FIG. 8, 110 is the buffer tag unit (Tag Unit) and 120 is the buffer memory (RAM), which together constitute one memory level of the processor's memory hierarchy; 118 is the added address generator, and 114 is a selector.
  • 113 is the memory access address, with which the processor core or a higher memory level accesses this memory level. The address 113 is sent to one input of the selector 114 and, when selected by the selector 114, is compared via bus 111 with the tags stored in the tag unit 110. If the address on bus 113 matches in the tag unit 110, the corresponding information in the memory 120 is sent out via bus 123.
  • If the access address 113 does not match in the tag unit 110, the address is sent over bus 111 to access the lower memory level, and the information obtained is stored via bus 121 into the memory 120 of this memory level, while the address on bus 111 is also stored in the corresponding entry in the tag unit 110. Thereafter the access address 113 matches in the tag unit 110, and the corresponding information is sent via bus 123 to the higher memory level or processor core for use. It is also possible to bypass the information on bus 121 directly onto bus 123 while storing it in the memory 120.
  • The address generator 118 adds an increment 115 of one block size to the address on bus 111 to produce a predicted address (Predicted Address) 119, which is sent to the other input of the selector 114.
  • An arbiter (not shown in the figure) controls the selector 114: if the access address 113 is valid, the arbiter selects the access address 113 and puts it on bus 111 to the tag unit 110 for matching; if the access address 113 is invalid and the predicted address 119 is valid, the arbiter selects the predicted address 119 and puts it on bus 111 to the tag unit 110 for matching.
  • If the predicted address does not match in the tag unit 110, the predicted address 119 is sent via bus 111 to access the lower memory level, the information obtained is stored via bus 121 in the memory 120 of this level, and the address on bus 111 is also stored in the corresponding entry of the tag unit 110. In this case the information on bus 121 need not be bypassed onto bus 123.
  • The address generator 118 thus cycles: it increments the address on bus 111 to generate a predicted address 119; whenever the access address 113 is invalid, the arbiter controls the selector 114 to select the predicted address 119 and send it via bus 111 to the tag unit 110; whenever the predicted address does not match the tags in the tag unit 110, the predicted address on bus 111 is sent out to access the lower memory level and the information obtained fills the memory 120; the address generator 118 then adds the increment 115 to the address on bus 111 to generate the next predicted address 119. When the predicted address 119 matches a tag in the tag unit 110, the address generator 118 terminates further operation based on the predicted address.
  • In other words, the address generator 118 of this embodiment is triggered by an access address 113 that fails to match in the tag unit 110, and continuously generates predicted addresses 119 to fetch information to fill the memory 120 of this level, until a predicted address matches in the tag unit 110 of this level. That is, whether an address matches the tags in the tag unit 110 controls the operation of the address generator: an unmatched access address triggers the operation of the address generator 118, and a matched predicted address terminates it.
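The miss-triggered prefetch loop of Embodiment 5 can be sketched as follows. The block size and the representation of the tag unit, memory, and lower level as Python containers are illustrative assumptions:

```python
BLOCK = 64  # increment 115: one block size

def prefetch_on_miss(access_addr, tags, cache, lower_level):
    """A miss on the access address (113) triggers the address generator
    (118), which keeps producing predicted addresses (119) in block-size
    increments and fills the cache (120) from the lower level until a
    predicted address matches in the tag unit (110)."""
    fetched = []
    if access_addr in tags:
        return fetched                      # hit: the address generator stays idle
    addr = access_addr
    while addr not in tags:                 # a matched predicted address terminates
        tags.add(addr)
        cache[addr] = lower_level[addr]
        fetched.append(addr)
        addr += BLOCK                       # next predicted address
    return fetched
```

Note that a single miss pulls in every sequential block up to the first one already resident, which is exactly the termination rule stated above.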
  • The address generator 118 of Embodiment 5 continues to operate as long as the predicted address 119 it generates does not match a tag in the tag unit 110. In some cases the address generator 118 must be made to terminate even though the predicted address still does not match a tag, for example at the end of a program; the following three embodiments show three different termination methods.
  • FIG. 9 is a block diagram of a storage hierarchy system according to Embodiment 6 of the present invention.
  • In FIG. 9, 110 is the buffer tag unit, 120 is the buffer memory, 118 is the address generator, 113 is the access address, 119 is the predicted address, 114 is the selector that selects 113 or 119, 111 is the address bus selected by the selector 114, 121 is the information bus between the lower storage level and this level, and 123 is the information bus between this level and the higher storage level or processor core, the same as in FIG. 8.
  • the operation is similar to that of the fifth embodiment and will not be described again.
  • The instruction decoder 122 is added in FIG. 9. The access address 113 from the higher storage level or processor core is accompanied by a signal indicating whether the information to be obtained is instructions or data. This signal is sent via bus 111 to the address generator 118 and is maintained while the address generator 118 generates predicted addresses 119. Thus, from this signal accompanying the predicted address on bus 111, it can be determined whether an information block arriving from the lower memory level via bus 121 is an instruction block or a data block. The instruction decoder 122 decodes the instructions in the instruction blocks on bus 121; if it decodes an indirect branch instruction in such an instruction block, it causes the address generator 118 to terminate generating predicted addresses. Because a compiled program ends with an indirect branch instruction, this avoids prefetching memory locations beyond the end of the program.
  • FIG. 10 is a block diagram of a storage hierarchy system according to Embodiment 7 of the present invention.
  • In FIG. 10, 110 is the buffer tag unit, 120 is the buffer memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, 121 is the information bus between the lower storage level and this level, and 123 is the information bus between this level and the higher storage level or processor core, the same as in FIG. 8. The operation is similar to that of Embodiment 5 and will not be described again.
  • The counter 112 is added in FIG. 10. The counter 112 is assigned an initial count value 'N' when the address generator 118 starts generating predicted addresses 119 from an access address on bus 113. Each time a predicted address is issued, the count value in the counter 112 is decremented by '1'; when the count reaches '0', the address generator 118 is terminated, so that the number of prefetched information blocks does not exceed a preset maximum. This prevents prefetching too many instruction segments that the processor core's branch decisions will not execute, or data segments that will not be used; it also prevents prefetching too much invalid information beyond the end of a program.
  • FIG. 11 is a block diagram of a storage hierarchy system according to Embodiment 8 of the present invention.
  • In FIG. 11, 110 is the buffer tag unit, 120 is the buffer memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, 121 is the information bus between the lower storage level and this level, and 123 is the information bus between this level and the higher storage level or processor core, the same as in FIG. 8. The operation is similar to that of Embodiment 5 and will not be described again.
  • The comparator 116 is added in FIG. 11. The comparator 116 compares the predicted address generated by the address generator 118 from the access address on bus 113 with the address boundary stored in 116. If the predicted address 119 crosses the address boundary stored in the comparator 116, the address generator 118 is made to terminate, so that prefetched information blocks do not cross the address boundary.
  • The address boundary can be the smallest memory interval allocated to the processor, such as a page or a segment. This prevents prefetching beyond the memory range allocated to the thread; it can also prevent prefetching too many instruction segments that the processor core's branch decisions will not execute, or unused data segments.
  • The address boundary may be pre-stored in the comparator 116, or written by the program into a register in the comparator 116.
  • FIG. 12 is a block diagram of a storage hierarchy system according to Embodiment 9 of the present invention.
  • In FIG. 12, 110 is the buffer tag unit, 120 is the buffer memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, 121 is the information bus between the lower storage level and this level, and 123 is the information bus between this level and the higher storage level or processor core, the same as in FIG. 8. The operation is similar to that of Embodiment 5 and will not be described again.
  • FIG. 12 combines the instruction decoder 122, the counter 112, and the boundary comparator 116; some or all of these devices may be selected as needed to terminate the operation of the address generator 118. If all of the above devices are used, a valid access address that obtains no match in the tag unit 110 triggers the address generator 118 to generate predicted addresses 119; each predicted address is sent over bus 111 to access the lower memory level, and the information obtained fills the memory 120. The address generator 118 continues to generate new predicted addresses 119 until a predicted address matches a tag in the tag unit, or the predicted address crosses the preset memory address boundary, or the number of generated predicted addresses reaches the preset maximum, or the obtained instruction block contains an indirect branch instruction.
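The combined termination conditions of Embodiment 9 can be sketched in one loop. The block size, the container types, and the `has_indirect_branch` predicate (standing in for instruction decoder 122) are illustrative assumptions:

```python
BLOCK = 64

def prefetch(access_addr, tags, cache, lower, max_blocks, boundary,
             has_indirect_branch):
    """A missed access address triggers prefetching; the address generator
    stops when the predicted address matches in the tag unit (110), crosses
    the preset address boundary (comparator 116), the count reaches the
    preset maximum (counter 112), or the fetched instruction block contains
    an indirect branch (instruction decoder 122)."""
    if access_addr in tags:
        return []                               # hit: no prefetch triggered
    fetched = []
    addr, count = access_addr, max_blocks
    while addr not in tags and count > 0 and addr < boundary:
        tags.add(addr)
        cache[addr] = lower[addr]
        fetched.append(addr)
        if has_indirect_branch(cache[addr]):
            break                               # end of program reached
        addr += BLOCK
        count -= 1
    return fetched
```

Any subset of the three added devices can be modeled by disabling the corresponding condition (e.g., an effectively infinite `max_blocks` or `boundary`).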
  • FIG. 13 is a block diagram of a storage hierarchy system according to Embodiment 10 of the present invention.
  • In FIG. 13, 110 is the buffer tag unit, 120 is the buffer memory, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, 121 is the information bus between the lower storage level and this level, and 123 is the information bus between this level and the higher storage level or processor core; modules with the same numbers are the same as before.
  • A new scanner 128 is introduced, which includes the functions of the address generator 118, the counter 112, and the comparator 116 of Embodiments 5 through 8, and can also scan the information on bus 123. Embodiment 10 can therefore implement all the functions of Embodiment 9.
  • The biggest difference between Embodiment 10 and Embodiments 5 through 9 is that the address generator's operation can be triggered not only by a valid access address 113 failing to match in the tag unit, but also by a valid access address 113 obtaining a match in the tag unit 110.
  • When a valid access address 113 matches in the tag unit 110 via bus 111, the corresponding information block is read from the memory 120 and sent via bus 123 to the higher storage level or processor core, and the valid access address on bus 111 is also sent to the scanner 128. The address generator in the scanner then generates predicted addresses starting from this valid access address that matched in the tag unit, and terminates generating predicted addresses under the conditions of Embodiments 5 through 9.
  • The address generator in the scanner can add the increment to the block address (the upper part of the memory address, i.e., the tag and index address) in the above valid address to obtain the initial predicted address of the next sequential block. In addition, the scanner 128 can decode each instruction in the instruction blocks on bus 123; for the branch instructions among them, the address generator in 128 obtains the branch target address, as the initial predicted address, by adding the branch offset in the branch instruction to the address of the branch instruction itself. The address generator then generates the sequential predicted addresses following each initial predicted address until a termination condition causes it to stop generating predicted addresses. Some memory levels use buffer addresses to address the memory directly; in this case, the cache block address (the higher bits of the cache address, such as the way number and index address) can be used to read the tag in 110, and the tag, merged with the index address, is sent to the scanner 128 as the memory block address.
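The hit-triggered scanning described above can be sketched as follows. The block size, the `(kind, offset)` instruction encoding, and the simplified branch-target arithmetic (block address plus offset) are illustrative assumptions:

```python
BLOCK = 64

def scan_on_hit(access_addr, instruction_block, tags, cache, lower, max_blocks):
    """A tag-unit hit also triggers the scanner (128): it forms the next
    sequential block address as one initial predicted address, decodes the
    delivered instruction block to turn each direct branch into a
    branch-target initial predicted address, and then prefetches
    sequentially from each initial address until a tag hit or the
    per-stream block budget (counter function) runs out."""
    initial = [access_addr + BLOCK]               # sequential next block
    for kind, offset in instruction_block:
        if kind == 'direct_branch':
            target = access_addr + offset         # simplified source + offset
            initial.append(target - target % BLOCK)   # block-align the target
    fetched = []
    for addr in initial:
        budget = max_blocks
        while addr not in tags and budget > 0:
            tags.add(addr)
            cache[addr] = lower[addr]
            fetched.append(addr)
            addr += BLOCK
            budget -= 1
    return fetched
```

Each initial predicted address thus starts its own bounded sequential stream, matching the description of sequential and branch-target initial predicted addresses.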
  • FIG. 14 is a schematic diagram of the scanner structure of a storage hierarchy system according to Embodiment 11 of the present invention. Embodiment 11 performs the operation of using the next sequential block address as the initial predicted address.
  • In FIG. 14, 110 is the buffer tag unit, 120 is the buffer memory, 114 is the selector, 122 is the instruction decoder, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, and 121 is the information bus between the lower storage level and this level. These modules and buses are the same as the identically numbered modules in FIG. 12.
  • The scanner 128 contains an address generator composed of a register 130, an adder 132 and a selector 134, similar in function to the address generator 118 of Embodiment 5; a counter composed of a register 136, a subtractor 138 and a selector 140, similar in function to the counter 112 of Embodiment 7; an address boundary comparator 116; and a lossy stack (Loosy Stack) composed of the storage units 146 and 148.
  • At the start of prefetching, the selector 140 selects the preset maximum prefetch count 'N' 133 and sends it to the subtractor 138 to be decremented by '1', the difference being stored in the register 136; at the same time the selector 134 selects the address on bus 111 and sends it to the adder 132 to be added to the increment 115, the sum being stored in the register 130. The output of 130 is the predicted address 119. At this point the access address 113 is invalid, and the selector 114 selects the predicted address 119 and sends it via bus 111 to the tag unit 110 for matching. If the predicted address 119 matches a tag in 110, further operation of the address generator is terminated.
  • If the predicted address does not match, the address on bus 111 is sent to the lower memory level to read information to fill the memory 120; at the same time the selector 140 selects the output 137 of the register 136 and sends it to the subtractor 138 to be decremented by '1', the difference being stored in the register 136; the selector 134 selects the predicted address on bus 119 and sends it to the adder 132 to be added to the increment 115, the sum being stored in the register 130. This loop operates until the output of the subtractor 138 is '0', or the predicted address 119 matches a tag in the tag unit 110, or the predicted address 119 crosses the boundary preset in the comparator 116, or the instruction decoder 122 decodes an indirect branch instruction.
  • When a new valid access address 113 arrives during prefetching, the selector 114 still selects the predicted address 119 and puts it on bus 111 to the tag unit 110 for matching during that clock cycle; at the same time, the predicted address 119 is pushed into the memory 146 of the lossy stack, and the count on bus 137 is pushed into the same row of the memory 148 of the lossy stack. On the next clock cycle, the selector 114 selects the valid access address 113 and puts it on bus 111 to the tag unit 110 for matching, while the selector 134 selects the new access address on bus 111 and sends it to the adder 132 to generate predicted addresses based on the new access address; at the same time the selector 140 selects the maximum prefetch block count 133 for the subtractor 138 to count down from, starting prefetching based on the new access address.
  • When the prefetching based on the new access address terminates, the selector 134 selects, via bus 131, the predicted address at the top of the stack from the memory 146 of the lossy stack and sends it to the adder 132 to be added to the increment 115; at the same time the selector 140 selects, via bus 135, the count value at the top of the stack from the memory 148 of the lossy stack and sends it to the subtractor 138 to be decremented by '1'; the interrupted prefetching based on the old access address is thereby resumed.
  • The use of the stack gives priority to prefetching the information the processor core most probably needs (the information following the request at the current valid access address 113), while the addresses for prefetching information it will need with high probability (the information following the requests at the most recent valid access addresses 113) stay at or near the top of the stack. The depth of the lossy stack is limited; when it is exceeded, the entries at the bottom of the stack are discarded. This abandons the prefetching of information that the processor core would use only with low probability in the near term (whose addresses may be at the bottom of the stack).
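The lossy stack built from memories 146 and 148 can be modeled directly. The class name and fixed-depth list representation are illustrative assumptions:

```python
class LossyStack:
    """Saves the (predicted address, remaining count) of an interrupted
    prefetch stream so it can be resumed later; when the fixed depth is
    exceeded, the entry at the bottom of the stack is silently discarded,
    abandoning the least recently interrupted (least likely needed) stream."""
    def __init__(self, depth):
        self.depth = depth
        self.items = []                  # bottom ... top

    def push(self, predicted_addr, count):
        if len(self.items) >= self.depth:
            self.items.pop(0)            # discard the bottom entry (lossy)
        self.items.append((predicted_addr, count))

    def pop(self):
        return self.items.pop()          # resume the most recent stream first
```

Pushing beyond the depth loses the oldest stream rather than stalling, which is the 'lossy' behavior described above.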
  • The scanner 128 can generate sequential predicted addresses based on an access address, which is completely equivalent to the corresponding modules in FIG. 12; the scanner 128 can therefore be applied to FIG. 12 in place of the address generator 118, the counter 112 and the boundary comparator 116 of FIG. 12. With a slight change, the embodiment of FIG. 14 can also perform branch target prefetching for the branch instructions passed on bus 123.
  • FIG. 15 is a schematic diagram of the scanner structure of a storage hierarchy system according to Embodiment 12 of the present invention. Embodiment 12 performs both the operation of using the next sequential block address as the initial predicted address and the operation of using the branch target address as the initial predicted address.
  • In FIG. 15, 110 is the buffer tag unit, 120 is the buffer memory, 114 is the selector, 122 is the instruction decoder, 113 is the access address, 119 is the predicted address, 111 is the address bus selected by the selector 114, and 121 is the information bus between the lower storage level and this level. The scanner 128 contains an address generator composed of a register 130, an adder 132 and a selector 134; a counter composed of a register 136, a subtractor 138 and a selector 140; an address boundary comparator 116; and a lossy stack (Loosy Stack) composed of the storage units 146 and 148.
  • These modules and buses are the same as the identically numbered modules in FIG. 14. Compared with FIG. 14, the information bus 123 between this level and the higher storage level or processor core is added, and the instruction decoder 150 and the selector 152 are added to the scanner 128, in order to support the operation of using the branch target address as the initial predicted address.
  • The scanner 128 in this embodiment generates two kinds of initial predicted addresses: sequential initial predicted addresses and branch target initial predicted addresses. After an initial predicted address is generated, the next predicted address (i.e., the address of the next information block in address order) is generated by adding the increment to the current predicted address. Regardless of whether the access address matches in the tag unit 110, the scanner 128 adds the increment to the access address on bus 111 to generate a sequential initial predicted address, generates the subsequent predicted addresses after it, and prefetches information from the lower storage level into the memory 120 of this level until a termination condition causes the address generator in the scanner 128 to stop generating predicted addresses. When a sequential address is the predicted address, the instruction decoder 150 in FIG. 15 controls the selector 152 to select the increment 115 for one input of the adder 132; the operation is otherwise the same as in Embodiment 11 and FIG. 14 and is not described here.
  • Only when an instruction access address on bus 111 matches in the tag unit 110 does the memory 120 output the corresponding instruction block via bus 123, and then the scanner 128 generates branch target initial predicted addresses. The instruction decoder 150 decodes the instructions in that instruction block on bus 123 and calculates the branch target address for each direct branch instruction among them.
  • Specifically, the instruction decoder 150 controls the selector 152 to select the branch offset in the branch instruction on bus 123 and send it to one input of the adder 132; the intra-block address offset of the branch instruction within the instruction block is sent via bus 154 and combined with the block address on bus 111 to form the source address of the branch instruction, which is sent to the other input of the adder 132. The adder 132 adds the branch instruction's source address to the branch offset and outputs the branch target address. The branch target address is registered by the register 130 and output as the branch target initial address 119. If the access address 113 is invalid at this time, the branch target initial address is selected by the selector 114 and sent via bus 111 to the tag unit 110 for matching. If the branch target initial address matches a tag in the tag unit 110, the branch target instruction is already in the memory 120 and no prefetching based on this branch target initial address is needed, so the instruction decoder 150 can process the next direct branch instruction.
  • the selector 134 selects the branch target initial address 119. It is supplied to an input of the adder 132, at which time the command decoder 150 controls the selector 152 to select the increment 115 to be sent to the other input of the adder 132, so the adder 132 The output is the address of the next instruction block in the order of the branch target initial address.
  • the matching of the tag unit 110 and the pair of memories 120 Access uses only the block address, the tag portion of the address and the index address, so the intra-block address offset on the predicted address 119 has no effect on the above match and addressing; however, the adder 132 When calculating the branch target initial address, the intra-block address offset of the branch instruction needs to be sent to the adder 132 via the bus 154 and the block address on the bus 111. The input end, otherwise the resulting branch target initial address may have errors and fall on the wrong instruction block. The operation thereafter is the same as that of the seventh embodiment and FIG. 14, and will not be described again.
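The importance of the intra-block offset in this calculation can be illustrated with a small sketch; the field width and all names are assumptions, and the plain addition stands in for adder 132:

```python
# Hypothetical sketch of the branch-target initial address calculation.
# The block address on bus 111 carries no intra-block offset, so the branch
# instruction's offset within its block (bus 154) must be merged in before
# adding the branch offset; otherwise the target falls on the wrong block.
OFFSET_BITS = 6  # assumed log2(instruction block size)

def branch_target(block_address, intra_block_offset, branch_offset):
    source = (block_address << OFFSET_BITS) | intra_block_offset  # branch source address
    return source + branch_offset  # adder 132: source + branch offset = target

correct = branch_target(0x40, 0x3C, 8)  # intra-block offset included
wrong = branch_target(0x40, 0x0, 8)     # offset dropped: wrong instruction block
```

Dropping the offset here yields a target whose block address differs from the true one, which is exactly the error the embodiment guards against.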
  • The methods and systems disclosed by the present invention can also be applied to memory hierarchies of different structures. Some cache systems use a buffer address to access the memory hierarchy directly. A buffer address does not carry the tag of a memory address; the tag has already been mapped to a way number, so the buffer block address consists only of a way number and an index address. When scanner 128 works with such a cache system, tag unit 110 can be addressed with the buffer block address to read out the corresponding tag, and this tag is combined with the index address in the buffer block address to form a memory block address that is sent via bus 111 to scanner 128 for generating predicted addresses. The rest of the operation is the same as in Embodiment 8 above.
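Reconstructing the memory block address from a buffer block address can be sketched as below; the tag-array layout and the bit widths are assumptions made for illustration:

```python
# Hypothetical sketch: in a cache accessed directly by buffer address, the tag
# has been mapped to a way number, so the buffer block address is (way, index).
# Tag unit 110 is addressed with it to read the stored tag, and tag ++ index
# forms the memory block address that bus 111 carries to scanner 128.
INDEX_BITS = 4  # assumed number of index-address bits

def memory_block_address(tag_array, way, index):
    tag = tag_array[way][index]         # read the tag for this buffer block
    return (tag << INDEX_BITS) | index  # concatenate tag and index

tag_array = [[0] * 16 for _ in range(2)]  # 2 ways x 16 sets (assumed geometry)
tag_array[1][5] = 0xAB
```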
  • The disclosed method and system are applicable to any memory hierarchy, and are particularly suitable for a memory hierarchy level with a large memory capacity, such as the lowest level of the memory hierarchy in a processor, i.e. the last level cache (LLC).
  • The system and method proposed by the present invention can be applied to a wide range of computing, data processing, and storage applications, where they can improve efficiency and mask memory access latency and cache misses.


Abstract

An information processing system and method. In the method, an independent address generator (22) generates a memory address in advance to access a memory (10); information read from the memory (10) is pushed to a processor (12) for execution or processing, thereby reducing the time the processor (12) spends waiting for the information.

Description

Information system and method based on information push

Technical field

The present invention relates to the fields of computers, communications, and integrated circuits, and in particular to an information processing system, an information processing method, and a storage system.

Background art

The processor in a stored-program computer reads instructions or data from memory for execution, and the results of execution are sent back to the memory for storage. Figure 1 is a simplified block diagram of a conventional stored-program computer, in which processor 12 contains the execution unit and control unit. Processor 12 generates addresses to access memory 10 via bus 11, and memory 10 provides information to processor 12 via bus 13 according to the address on bus 11; the information referred to here and below includes computer instructions and data. The execution result data of processor 12 is also stored back to memory 10 via bus 13. Above the dotted line in Figure 1 is the memory and its associated logic, hereinafter referred to as the storage system; below the dotted line is the processor and its associated logic, hereinafter referred to as the processing system. The storage system and the processing system together are referred to as an information system or computer. The input and output units of the computer are omitted in Figure 1 and in the other embodiments of this disclosure for ease of explanation. As technology advances, memory capacity increases, and with it the internal access latency and the channel latency of memory accesses; meanwhile processor execution speed keeps increasing, so memory access latency has increasingly become a serious bottleneck for computer performance. Memory access latency here includes the access latency inside the memory device, as well as the transfer-channel latency between memories and between memory and processor, such as the transfer latency between multiple chips on a circuit board, on a shared substrate, or through silicon vias (TSVs); the latency of intermediate-level chips such as a north bridge; the latency of daisy-chained memory chips; latency caused by transfer-format conversion; latency caused by communication protocols; or even latency on long-distance communication channels, such as networks and wireless links.

Technical problem

To alleviate this bottleneck, stored-program computers use caches to mask memory access latency. Figure 2 is a simplified schematic diagram of a conventional stored-program computer using a cache, in which processor 12 contains the execution unit and control unit and may or may not include higher-level caches; 14 is the tag unit of the lowest cache level in the processing system, i.e. the tag unit of the last level cache (hereinafter LLC); 16 is the data RAM of the LLC. Processor 12 sends address 11 to 14 and 16. If address 11 matches a tag stored in 14, then 16 sends the corresponding data directly to 12 via bus 13 for execution, thus avoiding the latency of accessing memory 10. If address 11 does not match any tag stored in 14, address 11 is sent via bus 17 to the storage system to access memory 10; the cache block read out is sent via bus 19 to the processing system to fill 16, and the output of 16, or the data on bus 19 bypassed onto bus 13, is sent to 12 for execution. The execution results of 12 are stored back to 16 via 13 and written back to memory 10 of the storage system via bus 19 according to the write-back policy of cache 16. In the information system of Figure 2, when address 11 does not match a tag in LLC tag unit 14, the memory access latency still cannot be masked, and as technology advances this latency increasingly becomes a bottleneck for the improvement of computer performance. The method and system apparatus proposed by the present invention directly address one or more of the above or other difficulties.
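A minimal sketch of this conventional hit/miss flow follows; the data structures and the replacement choice are assumptions for illustration, not the embodiment's hardware:

```python
# Hypothetical sketch of the Figure 2 LLC lookup: a tag hit returns data from
# the LLC RAM; a miss goes out to memory 10 and fills the LLC on the way back.
def llc_lookup(tag_unit, llc_ram, memory, block_address):
    if block_address in tag_unit:             # address matches a tag in unit 14
        return llc_ram[tag_unit[block_address]], True  # hit: data via bus 13
    data = memory[block_address]              # miss: access memory 10 via bus 17
    slot = len(llc_ram)                       # stand-in for the replacement policy
    tag_unit[block_address] = slot
    llc_ram[slot] = data                      # fill LLC RAM 16 via bus 19
    return data, False

tags, ram = {}, {}
mem = {0x10: "block-A"}
first = llc_lookup(tags, ram, mem, 0x10)   # miss, fills the LLC
second = llc_lookup(tags, ram, mem, 0x10)  # hit
```

The miss path is exactly where the unmasked memory latency of the prior art appears.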

Stored-program computers use caches in an attempt to mask memory access latency and alleviate this bottleneck, but the latency still cannot be completely masked. For this reason, some stored-program computers use prefetching in an attempt to further mask memory access latency, but this too encounters difficulties. Some of these difficulties arise from multi-level caching, because a prefetch request from a higher cache level must pass through multiple storage levels, handed down level by level, before it reaches the lowest-level memory.

Technical solution

It is an object of the present invention to provide an information processing system, an information processing method, and a storage system that improve system operating speed.

In the information processing system, information processing method, and storage system provided by the present invention, an address generator generates and outputs further addresses according to the currently output information block, so that the memory/registers can output further information blocks accordingly. This is equivalent to providing the processor (the information-block-demanding device) with the information blocks it may need before it issues a request, which speeds up the processor's acquisition of information blocks and thus increases the operating speed of the processor and of the information processing system.

Further, the present invention proposes a method of adding an address generator to the memory hierarchy of a processor, and a corresponding system apparatus. The added address generator can work in cooperation with the processor to resolve one or more of the above or other difficulties. The added address generator can generate predicted addresses ahead of the processor core and read instructions and data from a lower memory level into the memory of the present level, ready for use by the processor core.

Beneficial effects

The method and system apparatus of the present invention can mask the latency caused by the processor's accesses to memory through a delay channel. Other advantages and applications of the present invention will be apparent to those skilled in the art.

Brief description of the drawings

Figure 1 is a block diagram of an existing information processing system;

Figure 2 is a block diagram of another existing information processing system;

Figure 3 is a block diagram of the information processing system of Embodiment 1 of the present invention;

Figure 4 is a block diagram of the information processing system of Embodiment 2 of the present invention;

Figure 5 is a schematic diagram of the address generator implementation and related logic of Embodiment 2 of the present invention;

Figure 6 is a block diagram of the information processing system of Embodiment 3 of the present invention;

Figure 7 is a block diagram of the information processing system of Embodiment 4 of the present invention;

Figure 8 is a block diagram of the storage hierarchy system of Embodiment 5 of the present invention;

Figure 9 is a block diagram of the storage hierarchy system of Embodiment 6 of the present invention;

Figure 10 is a block diagram of the storage hierarchy system of Embodiment 7 of the present invention;

Figure 11 is a block diagram of the storage hierarchy system of Embodiment 8 of the present invention;

Figure 12 is a block diagram of the storage hierarchy system of Embodiment 9 of the present invention;

Figure 13 is a block diagram of the storage hierarchy system of Embodiment 10 of the present invention;

Figure 14 is a schematic diagram of the scanner structure of the storage hierarchy system of Embodiment 11 of the present invention;

Figure 15 is a schematic diagram of the scanner structure of the storage hierarchy system of Embodiment 12 of the present invention.

Those skilled in the art can also understand and appreciate other aspects of the present invention in light of its description, claims, and drawings.

Best mode for carrying out the invention

The best mode for carrying out the invention is shown in Figure 3.

Embodiments of the invention

The present invention is described in further detail below with reference to the drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description and claims. It should be noted that the drawings are in a greatly simplified form and use imprecise proportions, and serve only to conveniently and clearly illustrate the embodiments of the present invention.

It should be noted that, in order to describe the content of the present invention clearly, multiple embodiments are presented to further illustrate different implementations of the invention; these embodiments are illustrative rather than exhaustive. In addition, for brevity of explanation, content already described in an earlier embodiment is often omitted in later embodiments; content not mentioned in a later embodiment may therefore be found in the corresponding earlier embodiment.

Although the invention is open to various modifications and alternative forms, some specific embodiments are shown in the specification and described in detail. It should be understood that the inventor's intent is not to limit the invention to the particular embodiments described; on the contrary, the intent is to cover all improvements, equivalent transformations, and modifications made within the spirit and scope defined by the claims. The same component numbers may be used throughout the figures to represent the same or similar parts.

In addition, although this specification uses an architecture containing a processor as an example, the technical solution of the present invention can be applied to an architecture containing any suitable processor. For example, the processor may be a general purpose processor, a central processing unit (CPU), a microprocessor, a processor core, a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SOC), an application-specific integrated circuit (ASIC), and so on. The memory or cache may be constructed from any suitable storage device, such as a register or register file, static memory (SRAM), dynamic memory (DRAM), flash memory, a hard disk (HD), a solid-state drive (SSD), any other suitable storage device, or a future new form of memory.

The present invention proposes a method of adding an address generator to a storage system, and a corresponding system apparatus. The added address generator can work in cooperation with the processor to resolve one or more difficulties of the prior art. The added address generator can generate addresses ahead of the processor in several ways, for example by executing instructions as a processor does, or by a scanner parsing the information output by the memory, and can read instructions and data from the memory to push to the processor. The processor executes the pushed instructions according to the program and its internal state, processes the data pushed from the memory or data already on the processor side, and writes the results of the processing back to the memory.

In the information system of the present invention, the processor may output no memory addresses to the storage system at all; instead, the address generator in the storage system independently generates memory addresses to access the memory and pushes the corresponding instructions and data to the processing system. Alternatively, the address generator may generate memory addresses independently in most cases, with the processor in the processing system outputting control flow information (such as branch decisions) to the storage system in a minority of cases, to simplify the structure of the address generator. The address generator determines the program execution path according to the control flow information, generates the location information of subsequent instructions, and accordingly pushes the corresponding instructions or data to the processor core. To simplify the address generator structure still further, the processor may also be allowed, under certain specific conditions, to provide a memory address to the address generator in the storage system; the address generator then executes the program independently starting from the memory address provided by the processor.

[Embodiment 1] Please refer to Figure 3, which is a block diagram of the information processing system of Embodiment 1 of the present invention. In this embodiment the address generator is a dedicated computing unit containing several registers and some or all of an adder, a shifter, and a logic unit; it can perform at least part of the functions of the processor, compute instruction or data addresses, and make branch decisions. In this way, instructions related to memory address generation can be executed directly in the storage system, and the memory can be accessed according to the computed instruction or data memory address to read the corresponding instructions or data and push them to the processor for execution.

In Figure 3 and in the embodiments described below, the processing system below the dotted line is similar to that of the embodiment of Figure 1: 12 is the processor, which may or may not contain a cache. In the storage system (above the dotted line), 10 is the memory, 22 is the address generator added to the storage system, and 18 is a selector. 22 may likewise contain a cache or not. Address generator 22 generates addresses on bus 21 to access memory 10, and 10 provides data and instructions via bus 23 to address generator 22 and processor 12. Selector 18 selects the execution result 25 of address generator 22 or the execution result 15 of processor 12 to be written back to memory 10 via bus 29. In this embodiment processor 12 sends out no addresses; there is no transfer of memory addresses from the processing system to the storage system over bus 11 as in the embodiment of Figure 1. Address generator 22 may execute the same instruction set as processor 12, i.e. 22 and 12 have the same functions; 22 and 12 may also execute instruction sets that are not identical, where one of 22 and 12 executes the complete instruction set and the other executes a restricted instruction set. A processor executing a restricted instruction set does not execute instructions outside the scope of its restricted set, but only sets a flag bit to record the out-of-range event. An application in which address generator 22 and processor 12 execute the same instruction set, or in which address generator 22 executes the complete instruction set while processor 12 executes the restricted instruction set, may be as follows: address generator 22 pushes the program instructions and data needed by processor 12 to the processing system; processor 12 executes the program to process data collected by the processing system or already stored in the processing system, and the final results of processor 12's processing are sent back to memory 10 of the storage system for storage.

An application in which address generator 22 executes the restricted instruction set while processor 12 executes the complete instruction set may be as follows: address generator 22 pushes the program instructions and data needed by processor 12 to the processing system. Address generator 22 executes the instructions in its restricted instruction set and writes the results produced by those instructions back to memory 10; processor 12 executes all instructions in the complete instruction set, and writes back to memory 10 of the storage system those results that cannot be produced by executing the restricted instruction set. For example, the restricted instruction set of address generator 22 may contain only the instructions necessary for operations that change program flow, such as load LD, store ST, add ADD, subtract SUB, shift SHIFT, logic operations, compare COMP, branch BR, jump JUMP, and so on. In this case the main function of address generator 22 is to generate memory addresses: address generator 22 executes the program, and the generated addresses are sent to memory 10 to read information that is sent to address generator 22 and processor 12 for execution and processing. The above restricted instruction set also makes address generator 22 sufficient to perform operations such as moving data from one region of memory 10 to another; performing simple operations on data in a region of memory 10 and writing it back to that region or another region; or storing data from an input unit into memory 10 and moving data from 10 to an output unit, thereby reducing the bandwidth these operations demand of the channel between the storage system and the processing system. Meanwhile, the moment at which address generator 22 executes an instruction is generally one storage-system-to-processing-system channel delay earlier than the moment at which processor 12 would execute the same instruction. Furthermore, the time at which address generator 22 generates an address on bus 21 to access memory 10 is earlier, by the round-trip channel delay between the storage system and the processing system, than the time at which the same address generated by processor 12 would reach memory 10 after passing through the processing-system-to-storage-system channel. Therefore, compared with existing system structures such as those of the embodiments of Figure 1 or Figure 2, the information system structure of the present invention disclosed in Embodiment 1 and Figure 3 can mask the round-trip channel delay between the storage system and the processing system.
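The division of labour between the restricted and complete instruction sets can be sketched as follows; the opcode names follow the examples in the text, while the class structure and flag handling are assumptions made for illustration:

```python
# Hypothetical sketch of the address generator's restricted-instruction-set
# behaviour: instructions in the set are executed; anything outside it is not
# executed, and only a flag bit is set to record the out-of-range event.
RESTRICTED_SET = {"LD", "ST", "ADD", "SUB", "SHIFT", "LOGIC", "COMP", "BR", "JUMP"}

class AddressGenerator:
    def __init__(self):
        self.out_of_range = False  # flag recording out-of-range instructions

    def step(self, opcode):
        if opcode in RESTRICTED_SET:
            return True            # executed here (address generation, moves, ...)
        self.out_of_range = True   # e.g. MUL: left entirely to processor 12
        return False

gen = AddressGenerator()
```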

[Embodiment 2] Please refer to Figure 4, which is a block diagram of the information processing system of Embodiment 2 of the present invention. On the basis of the embodiment of Figure 3, identical last level caches (LLC) are added to both the processing system and the storage system. In Figure 4, in the storage system, 10 is the memory, 22 is the address generator, 24 is the tag unit of the storage-system LLC, 26 is the data RAM of the LLC in the storage system, 28 is a first-in-first-out (FIFO) buffer that temporarily holds data output by address generator 22 awaiting storage into memory 10 and LLC RAM 26, and 18 is a selector that selects stored data from 28 or from processor 12 for storage into memory 10. The processing system below the dotted line is similar to that of the embodiment of Figure 2: 12 is the processor, 14 is the tag unit of the processing-system LLC, and 16 is the data RAM of the processing-system LLC. In the storage system, address generator 22 generates address 21, which is sent to 26 and to 24 for screening. If address 21 matches a tag stored in 24, then 26 sends the corresponding information directly to 22 for execution via bus 27. In the processing system, processor 12 generates an address sent via bus 11 to 14 and 16; if the address on bus 11 matches a tag stored in 14, then 16 sends the corresponding information directly to 12 for execution via bus 13. Bus 11 is used only inside the processing system, unlike the prior art, where, as in the embodiments of Figure 1 or Figure 2, the address generated by the processor in the processing system is sent to the storage system to access memory 10. If address 21 generated by address generator 22 in the storage system does not match the tags stored in tag unit 24 of the storage system, the address on bus 21 is screened out to access memory 10; the cache block read out fills storage-system LLC RAM 26 via bus 23, and the output of 26, or the information on bus 23 bypassed onto bus 27, is sent to address generator 22 for execution. The read information block is also sent via bus 23 to the processing system; at this moment the address 11 generated by processor 12 in the processing system will likewise fail to match the tags stored in LLC tag unit 14 of the processing system, so the cache block on bus 23 fills processing-system LLC RAM 16, and the output of 16, or the data on bus 23 bypassed directly onto bus 13, is sent to processor 12 for execution. In addition there is a bus 15 from the processing system to the storage system, over which the computation results of processor 12 are sent to the storage system, selected by selector 18, and stored via bus 29 into memory 10 and LLC RAM 26. There is also a bus 19 from the processing system to the storage system, over which the branch decision and/or branch target address of processor 12 is sent to address generator 22 for use when the address generator cannot complete a branch operation independently.

The cache replacement logic of the storage system and the processing system is identical, so the cache blocks at the same address in 26 and 16 are replaced by the cache block from bus 23. The tag portion of address 21 is also written into the entry of storage-system LLC tag unit 24 indicated by the cache replacement logic; likewise the tag portion of address 11 is written into the same entry of processing-system LLC tag unit 14. In this way LLC RAMs 26 and 16 of the storage system and the processing system hold the same data, and storage-system LLC tag unit 24 and processing-system LLC tag unit 14 hold the same tag content. The same content is first filled into tag unit 24 and RAM 26 of the storage system and, after the storage-system-to-processing-system channel delay, into tag unit 14 and RAM 16 of the processing system. When the comparison of the storage system's address on bus 21 with the tag content in tag unit 24 finds a mismatch, memory 10 is accessed to read the data, which begins to fill storage-system LLC RAM 26 and, after the channel delay, fills processing-system LLC RAM 16. Normally, the moment at which the comparison of the processing system's address on bus 11 with the tag content in tag unit 14 finds a mismatch is one storage-system-to-processing-system channel delay later than the comparison of the same address of the storage system on bus 21 with the tag content in tag unit 24; if, as in the embodiment of Figure 2, the address on bus 11 were sent to the storage system to access memory 10, it would pass through a further processing-system-to-storage-system channel delay. Thus, compared with the existing architectures of the embodiments of Figure 1 or Figure 2, the architecture of the present invention in the embodiment of Figure 4 masks the round-trip channel delay between the storage system and the processing system.

Please refer to FIG. 5, a schematic diagram of the address generator implementation and related logic of Embodiment 2 of the present invention, illustrating the concept of a 'Valid Chain' and operations based on it. In FIG. 5, 22 is the address generator, in which 30 is the register file, 34 the instruction decoder, 36 an AND gate, 38 the execution unit, 42 the branch judgment unit, and 44, 46, 48 and 50 are selectors. Each entry of register file 30 has an additional valid bit 32; each entry of the branch judgments, or of the branch condition flags stored therein, in branch judgment unit 42 has a corresponding valid bit 40. Outside address generator 22, 28 is a first-in first-out (FIFO) queue and 18 is a selector. The rule of the valid chain is: an operation performed by a valid instruction on valid operands produces a valid result; if any operand or the instruction used by the operation is invalid, the result is invalid. Here, a valid instruction means an instruction in the restricted instruction set, i.e., an instruction that a processor executing the restricted instruction set is able to execute. Operands coming from memory 10 or from the cache are defined as valid; therefore the result of a load instruction (LD) is valid, which is recorded in the valid bit 32 of the load instruction's target entry in register file 30.
Thereafter, for an operation that uses this target register as an operand, if the instruction is a valid instruction and the other operands used by the instruction are also valid, the result of the operation is valid.

An instruction on bus 27 is decoded by instruction decoder 34. If the instruction is valid, 34 controls register file 30 to read the operands addressed by the operand fields of the instruction out over buses 35 and 25 to execution unit 38, which processes them according to the instruction's operation type; the result is written back via bus 39 into the register file 30 entry pointed to by the instruction's destination operand address. At the same time, the valid bits 32 of those operands ('0' meaning invalid, '1' meaning valid) are read out to AND gate 36 and ANDed with the instruction valid signal 47 produced by decoder 34; the result 37 is written back into the valid bit 32 of the entry in register file 30 pointed to by the instruction's destination operand address. If all operands and the instruction are valid, the valid bit of the destination entry is 'valid'; if any operand is invalid, the valid bit of the destination entry is 'invalid'. If the instruction is invalid, decoder 34 sends an instruction valid signal 47 meaning 'invalid' to AND gate 36 so that its output 37 represents 'invalid', and controls register file 30 to write this 'invalid' signal into the valid bit 32 of the invalid instruction's destination register entry. Alternatively, when 37 is 'invalid', execution unit 38 may be controlled not to operate, to save power.
A valid signal propagated in this way between register entries through AND gate 36 forms the valid chain; the valid chain can be broken by an invalid instruction or an invalid operand.
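The propagation rule above can be sketched as a small software model. This is an illustrative sketch only, not the disclosed hardware: register file 30 is modeled as a list of (value, valid bit 32) pairs, AND gate 36 is the `and` combining the instruction valid signal 47 with the operand valid bits, and the instruction encoding is an assumption.

```python
# Minimal software model of valid-chain propagation (illustrative only).
# Register file 30 -> list of (value, valid-bit-32) pairs.

def execute(regfile, instr, in_restricted_set, alu):
    instr_valid = in_restricted_set(instr)                   # signal 47 from decoder 34
    srcs = [regfile[r] for r in instr["src"]]
    result_valid = instr_valid and all(v for _, v in srcs)   # AND gate 36, output 37
    # When 37 is 'invalid', execution unit 38 may skip the operation to save power.
    result = alu(instr["op"], [val for val, _ in srcs]) if result_valid else None
    regfile[instr["dst"]] = (result, result_valid)           # write-back via bus 39

regfile = [(0, False)] * 4
regfile[0] = (5, True)                                       # loaded from memory: valid
execute(regfile, {"op": "add", "src": [0, 0], "dst": 1},
        in_restricted_set=lambda i: True, alu=lambda op, vals: sum(vals))
execute(regfile, {"op": "fmov", "src": [1], "dst": 2},
        in_restricted_set=lambda i: False, alu=lambda op, vals: vals[0])
# regfile[1] is now (10, True); regfile[2] is (None, False): the chain is broken
# by the out-of-set instruction, exactly as described above.
```

The second call shows how a single instruction outside the restricted set invalidates its destination and everything downstream of it.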

When address generator 22 executes a store instruction on bus 27, decoder 34 controls register file 30 to send out the base address via bus 35; execution unit 38 adds it to the immediate in the store instruction selected at this time by selector 50 under control of 34, and the sum is stored via bus 39 into field 56 of an entry of FIFO 28 as the store address. 34 also controls register file 30 to output the store data via bus 25 into field 54 of the same entry, and controls the valid bit 32 of that data's register entry to be read out via bus 49 into field 52 of the same entry. When the entry reaches the head of FIFO 28 and is output, if the signal in its field 52 is 'valid', that signal controls selector 18 to select the data in field 54 of the entry and send it via bus 29 to memory 10 and LLC RAM 26, writing it into the storage location addressed by the store address sent from field 56 of the entry over bus 31.
If the signal in field 52 of the entry is 'invalid', the signal controls selector 18 to select the data sent by processor 12 of the processing system via bus 15, which is sent via bus 29 to memory 10 and LLC RAM 26 and likewise written into the storage location addressed by the store address on bus 31. Processor 12, which executes the complete instruction set, has the same valid bits and valid chain; its valid chain is generated exactly as in address generator 22, but its usage is exactly the opposite. When processor 12 executes a store instruction, if the valid signal of the data read from its register file is 'invalid' (i.e., the data cannot be produced correctly by address generator 22 executing the restricted instruction set), processor 12 sends the data via bus 15 to the storage system, where selector 18 selects it to be stored via bus 29 into memory 10 and LLC RAM 26. If the valid signal of the data read from the register file is 'valid', processor 12 does not send the data to the storage system, because address generator 22 has already produced the same data and has stored it, or is about to store it, into 10 and 26.
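The selection performed by selector 18 at the FIFO head can be sketched as follows. This is an illustrative model with assumed names: each FIFO-28 entry carries a valid flag (field 52), store data (field 54) and a store address (field 56), and the valid flag decides whether the address generator's data or the data processor 12 sends on bus 15 is committed.

```python
# Illustrative sketch of the store-commit path (names are assumptions).

def commit_store(memory, fifo_entry, processor_data):
    valid, ag_data, addr = fifo_entry        # fields 52, 54, 56 of the entry
    data = ag_data if valid else processor_data   # selector 18
    memory[addr] = data                      # write via buses 29 and 31
    return data

memory = {}
commit_store(memory, (True, 0xAB, 0x100), processor_data=None)   # AG data is valid
commit_store(memory, (False, None, 0x104), processor_data=0xCD)  # fall back to bus 15
# memory now holds {0x100: 0xAB, 0x104: 0xCD}.
```

The second call models the case where the valid chain was broken and only the full-instruction-set processor could produce the store data.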

The valid chain can also be used for branch judgment. In the embodiment of FIG. 5, address generation unit 58 produces the sequential instruction address 41 (the current instruction address plus '1') and the direct branch target address 43 (the current instruction address plus the branch offset in the branch instruction) for selection. When executing a non-branch instruction, instruction decoder 34 controls selector 46 to select the sequential address 41 as the next instruction address; when executing a branch instruction, 34 lets branch decision 45 control selector 46 to select either the sequential address 41 or the branch target address 43 as the instruction address. The instruction address, and the data address or indirect branch target address 39 output by execution unit 38, are then selected by selector 48 as memory address 21, which addresses memory 10, LLC tag unit 24 and LLC RAM 26.
In this example the branch decision generated independently by address generator 22 is produced by branch judgment unit 42 from the branch type of the branch instruction and the execution result 39 of the instruction executed by execution unit 38. When part of execution result 39 is stored into 42 as a branch flag or branch decision, the corresponding valid-chain result 37 is also stored into register 40. The output of 40 controls selector 44: if the content of 40 is 'valid', 44 selects the address generator's branch decision output by 42 to control selector 46 in choosing the sequential address 41 or the branch target address 43. This covers most branch operations, for example all-integer branch operations. Some special branch operations, however, may depend on the result of executing instructions outside the restricted instruction set. For example, a branch operation may be based on comparing two floating-point numbers; the usual approach is that the floating-point unit compares the two numbers, the comparison result is moved from the floating-point register file to the integer register file, and the branch judgment unit in the integer unit then decides according to the branch type of the branch instruction. If the restricted instruction set of address generator 22 does not include floating-point operations, the instruction moving data from the floating-point register file to integer register file 30 breaks the valid chain, setting the valid bit 32 of the integer register entry the data is moved into to 'invalid'. When this 'invalid' bit is processed by AND gate 36 and judged 'invalid', it is stored via 37 into register 40, so that selector 44 selects the branch decision sent by processor 12 on bus 19
as branch decision 45, to control selector 46 in choosing the sequential address 41 or the branch target address 43. Similar to the sending of data from the processing system to the storage system described above, processor 12 can also maintain a valid chain for branch judgment, and sends its branch decision via bus 19 for use by address generator 22 only when its valid-chain result is 'invalid' (meaning that address generator 22, executing the restricted instruction set, cannot make a correct branch decision).
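The two-level selection just described can be sketched in a few lines. This is an illustrative model under assumed names: register 40 holds the valid bit of the locally computed decision from unit 42; selector 44 picks that decision when valid and otherwise the decision from bus 19; selector 46 then picks the sequential address 41 or the branch target address 43.

```python
# Illustrative model of selectors 44 and 46 (names are assumptions).

def next_address(seq_addr, target_addr, local_taken, local_valid, bus19_taken):
    taken = local_taken if local_valid else bus19_taken   # selector 44, register 40
    return target_addr if taken else seq_addr             # selector 46

# Integer branch: the locally computed decision is valid and is used.
a1 = next_address(0x11, 0x40, local_taken=True, local_valid=True, bus19_taken=False)
# Branch depending on a floating-point compare: the valid chain is broken,
# so the processor's decision arriving on bus 19 is used instead.
a2 = next_address(0x11, 0x40, local_taken=True, local_valid=False, bus19_taken=False)
# a1 == 0x40 (local decision: taken); a2 == 0x11 (bus 19 decision: not taken).
```

Note that in the second call the stale local decision is ignored entirely, matching the role of register 40 described above.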

In FIG. 5, when instruction decoder 34 decodes an indirect branch instruction on bus 27, it controls register file 30 to read out the base address via bus 35 and controls selector 50 to select the branch offset in the indirect branch instruction on bus 27; execution unit 38 adds the base address to the branch offset, and the sum is placed on result bus 39. If the valid-chain result 37 corresponding to this computation is 'valid', selector 48 is controlled to select the value on bus 39 and place it on memory address bus 21 as the branch target address of the indirect branch, addressing 10, 24, 26, etc. If the value on 37 is 'invalid', an out-of-range instruction may have been used in producing the base address; in that case control logic such as instruction decoder 34 controls selector 46 to select the indirect branch target address produced by processor 12 and sent over bus 19, and also controls selector 48 to select the output of 46, so that the indirect branch target address produced by processor 12 is placed on memory address bus 21 to address 10, 24, 26, etc. The indirect branch target address on bus 21 is also stored back into address generation unit 58 to produce the subsequent sequential address 41 and the next direct branch address 43.

Normally, address generator 22 in the storage system executes an instruction at least one storage-system-to-processing-system channel delay earlier than processor 12 in the processing system executes the same instruction, so the instructions and data the processor needs can be pushed to processor 12 in advance. While address generator 22 is waiting for a branch decision or indirect branch target address on bus 19, the time at which address generator 22 executes the instructions after the branch is a processing-system-to-storage-system channel delay later than the time at which processor 12 executes the same instructions. However, address generator 22 does not execute out-of-range instructions, so it can catch up and again run ahead of processor 12 in executing the same instructions, pushing the instructions and data the processor needs in advance.
One implementation is to enlarge the read/write bandwidth of the entry valid bits 32 of register file 30, for example so that all bits of 32 can be read and written independently in parallel. Instruction decoder 34 can decode multiple instructions in parallel; for invalid out-of-range instructions it directly sets the corresponding valid bit 32 of their destination register entries to 'invalid', and the instructions themselves need not be executed. As another example, the instruction decoder reads out in parallel, grouped by instruction, the valid bits 32 of the operands used by multiple instructions, and ANDs each group with the corresponding instruction valid signal 47 (similar to the function of AND gate 36 in FIG. 5) to obtain intermediate results. The instruction decoder also performs dependency detection on these instructions: if the destination register entry written by an earlier instruction in program order is used as an operand by a later instruction, the intermediate result of the earlier instruction is ANDed with the intermediate result of the later instruction as the later instruction's final result. Instructions whose final result is 'valid' are executed in order; for instructions whose final result is 'invalid', it suffices to set the valid bit 32 of their destination register entries to 'invalid', and the instructions themselves need not be executed. A priority check is needed when writing each valid bit: if there are multiple writes to the same location, only the valid-chain result of the last instruction in program order is written.
That is, the instructions to be executed by address generator 22 are first examined, and invalid instructions, or instructions that are valid but have invalid operands, are filtered out, so that address generator 22 executes only instructions capable of producing valid results; this allows address generator 22 to run ahead of processor 12 in executing the same instructions, letting the storage system provide the needed instructions or data to processor 12 in advance. If address generator 22 can also execute the complete instruction set (although not necessarily with the same execution efficiency as processor 12), then bus 19 from processor 12 to address generator 22 carrying branch decisions or branch target addresses can be omitted.
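The group decode with dependency detection and the write-priority check can be sketched as follows. This is an illustrative model only: the instruction records and register indices are assumptions, and the in-group `pending` map plays the role of the intermediate results forwarded between dependent instructions.

```python
# Illustrative sketch of parallel valid filtering with dependency detection.

def filter_group(instrs, valid_bits):
    pending = {}                      # reg -> final validity of the last in-group writer
    execute = []
    for ins in instrs:
        v = ins["instr_valid"]        # per-instruction signal 47
        for r in ins["src"]:
            # AND with valid bit 32, or with an earlier producer's result.
            v = v and pending.get(r, valid_bits[r])
        execute.append(v)             # only 'valid' final results are executed
        pending[ins["dst"]] = v       # later writes take priority
    for r, v in pending.items():
        valid_bits[r] = v             # one committed write per register: last wins
    return execute, valid_bits

valid_bits = [True, True, False]
instrs = [
    {"instr_valid": True,  "src": [0, 1], "dst": 2},  # in-set, operands valid
    {"instr_valid": False, "src": [2],    "dst": 1},  # out-of-range instruction
    {"instr_valid": True,  "src": [1, 0], "dst": 1},  # depends on the broken chain
]
execute, valid_bits = filter_group(instrs, valid_bits)
# execute == [True, False, False]; valid_bits == [True, False, True]
```

Only the first instruction would be handed to the execution units; the other two merely invalidate their destinations, which is what lets the address generator skip ahead.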

[Embodiment 3] Please refer to FIG. 6, a block diagram of the information system of Embodiment 3 of the present invention, which is the same as the embodiment of FIG. 4 except that LLC RAM 26 in the storage system is omitted. In FIG. 6, in the storage system above the dotted line, 10 is the memory, 22 the address generator, 24 the tag unit of the storage-system LLC, 28 the first-in first-out (FIFO) queue temporarily storing the output of address generator 22, and 18 the selector choosing store data from 28 or from processor 12; in the processing system below the dotted line, 12 is the processor, 14 the tag unit of the processing-system LLC, and 16 the data RAM of the processing-system LLC. The instruction or data input of address generator 22 comes directly from output bus 23 of memory 10. The memory address output by address generator 22 via bus 21 addresses memory 10, which reads instructions or data onto bus 23 for address generator 22 to execute. At the same time, the memory address on bus 21 also addresses LLC tag unit 24 of the storage system for filtering: if it hits in 24, the instructions or data on bus 23 are used only by address generator 22 and are not sent to the processing system; if it misses in 24, the memory address on bus 21 is stored
into LLC tag unit 24 according to the replacement rule; the instructions or data on bus 23 are sent to the processing system and stored into LLC RAM 16 according to the replacement rule, and the memory address of those instructions or data, which processor 12 also generates, is stored via bus 11 (the same as the address on bus 21) into LLC tag unit 14. All other operations are exactly the same as in the embodiments of FIG. 4 and FIG. 5.

[Embodiment 4] Please refer to FIG. 7, a block diagram of the information system of Embodiment 4 of the present invention. In this embodiment the processor in the processing system generates addresses to access the memory and read information; the address generator of the storage system makes no branch decisions, but merely produces, from the address generated by the processor and from the information block the memory outputs for that address, the addresses of information the processor may need, and sends them to the memory, so that the memory outputs information to the processing system in advance for the processor to select from. As shown in FIG. 7, the information system includes a storage system and a processing system. The processing system includes: processor 12, for obtaining information, which may contain a cache and is responsible for generating the current address 11; information buffer tag unit 64 and information buffer (Information Buffer) 66, where 64 and 66 are together also called the second memory; and selector 68. The storage system includes: memory 10, for storing information and outputting information blocks via bus 23 according to received addresses; address generator 60, for generating addresses from the current address 11 and the information blocks on bus 23 and providing the addresses to the memory; and selector 62. The information processing system performs information processing by the following method:

Processor 12 sends memory address 11; address 11 is matched against the contents of information buffer tag unit 64. If it hits, information buffer 66 outputs the information block to processor 12; if it misses, address 11 is sent to the storage system and selected by selector 62 to access memory 10. Memory 10 outputs an information block via bus 23 and selector 68, according to the address 11 sent by processor 12, for use by processor 12. Address generator 60 in the storage system generates address 61 from the information block currently output by memory 10 and provides address 61 to memory 10 through the selection of selector 62; memory 10 outputs an information block according to the address 61 provided by address generator 60, which is stored via bus 23 into information buffer 66.

That is, in this embodiment, when processor 12 issues an address at which information needs to be obtained (i.e., issues a request / request address), storage-system memory 10, besides outputting one information block for the address issued by processor 12 for the processor's use, also outputs one or more further information blocks for the addresses produced by address generator 60, which are stored in advance in information buffer 66 for processor 12's subsequent use. In the prior art, a processor issues an address at which information is needed and the memory outputs one information block for that address, so the processor obtains only the one information block corresponding to the address it issued (i.e., the block its address points to). In the information processing system provided by this embodiment, by contrast, processor 12 issues one address at which information is needed and memory 10 outputs (sends / provides) multiple information blocks, one sent in response to processor 12's request / address and the others sent in response to address generator 60's requests / addresses, so that processor 12 obtains multiple information blocks for a single request / address, hiding the latency of fetching the other instruction blocks after the first instruction block is fetched.
It should be noted that in the terminology of the present invention, an 'information block (Block)' is a unit of information, including instruction blocks whose content is instructions and data blocks whose content is data. The size of an 'information block' (i.e., how many bits it specifically contains) is not limited here and may be defined according to different system requirements, for example a multiple of the cache block size in processor 12.

In this embodiment, address generator 60 accepts the memory address on bus 11, the information block on bus 23, and the signal on bus 63, which is sent by processor 12 to indicate whether the address on bus 11 is an instruction address or a data address; address generator 60 outputs address 61, which is selected by selector 62 to access memory 10 and is also stored into information buffer tag unit 64. Address generator 60 scans and parses the current information block as follows: it adds the block offset (i.e., the address difference between two adjacent information blocks) to the memory address on bus 11 to obtain the address of the information block adjacent to the current one; it also judges from signal 63 whether the current information block is an instruction block, and if so, obtains the instruction type information (OP) of each instruction in the block and decodes each instruction's type information to judge whether that instruction is a direct branch instruction.
If an instruction is judged to be a direct branch instruction, the branch offset contained in the instruction is added to the memory address on bus 11 to obtain the address of the instruction block containing the branch target of that direct branch instruction. The address generator screens the generated branch target addresses, judging whether a branch target address is the same as the address of the current instruction block; if not, the generated address passes the screening. Address generator 60 provides the obtained adjacent-block address or branch target address to memory 10 via bus 61 and selector 62, and memory 10 outputs the information block for that address.
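The scan-and-screen step can be sketched as follows. This is an illustrative model only: the instruction encoding, the `BLOCK_SIZE` of 8 address units, and the field names are assumptions, not part of the disclosure; the sketch merely shows how one instruction block yields the sequential-block address plus the screened direct-branch target block addresses.

```python
# Illustrative sketch of the scan performed by address generator 60.

BLOCK_SIZE = 8                                   # assumed block offset

def prefetch_addresses(block_addr, block, is_instruction_block):
    addrs = [block_addr + BLOCK_SIZE]            # adjacent information block
    if is_instruction_block:                     # judged from signal 63
        for ins in block:
            if ins["op"] == "BR_DIRECT":         # decoded type information (OP)
                target = block_addr + ins["offset"]
                target_block = target - target % BLOCK_SIZE
                if target_block != block_addr:   # screening: same block is skipped
                    addrs.append(target_block)
    return addrs

block = [{"op": "ADD"}, {"op": "BR_DIRECT", "offset": 3},
         {"op": "BR_DIRECT", "offset": 21}]
addrs = prefetch_addresses(0x40, block, True)
# addrs == [0x48, 0x50]: the next block, plus the out-of-block branch target;
# the in-block branch (offset 3) is screened out.
```

Each returned address would be driven onto bus 61 and, through selector 62, used to read memory 10 into information buffer 66.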

The information block output by memory 10 for the memory address sent by processor 12 via bus 11 is sent directly to processor 12 for execution via bus 23 and selector 68. The information block output by memory 10 for the memory address sent by address generator 60 via bus 61 is sent via bus 23 to information buffer 66 for temporary storage, while the memory address on bus 61 is also sent to information buffer tag unit 64 for temporary storage. The replacement logic selects the corresponding entries in 64 and 66 for this temporary storage; for example, it may, based on how long each entry has existed, replace the longest-existing entry when buffer 66 is full. Thereafter, a memory address sent by processor 12 via bus 11 is first matched against 64; if it matches, the corresponding entry in 66 is selected by selector 68 and sent to processor 12 for execution and for storage in the cache in 12. If 12 contains a cache, the entries just read out of buffer 66 and tag unit 64 can at that point be marked 'replaceable'. If the memory address on bus 11 does not match in tag unit 64, the memory address is sent to memory 10 to read the information block, as described above.

Address generator 60 of this embodiment can operate in multiple modes. For example, when the content of information buffer 66 is below a certain capacity, address generator 60 scans and parses all information blocks on the bus and accordingly generates addresses to access memory 10. When the content of information buffer 66 exceeds a certain capacity, address generator 60 scans and parses only the memory addresses sent by processor 12 via bus 11 and the information blocks output by memory 10 for those addresses, and not the memory addresses sent by address generator 60 via bus 61 or the information blocks output by memory 10 for those addresses. Other operating modes are also possible and are not enumerated here.

[Embodiment 5] Please refer to FIG. 8, a block diagram of the memory hierarchy system of Embodiment 5 of the present invention. Here 110 is the cache tag unit (Tag Unit) and 120 the cache memory (RAM), which together form one memory hierarchy level of the processor; 118 is the added address generator, and 114 a selector. 113 is the memory access address (Memory Access Address), coming from the processor core or a higher memory hierarchy level to access this level; address 113 is sent to one input of selector 114 and, after selection by selector 114, compared via bus 111 against the tags stored in tag unit 110. If the address on bus 113 obtains a match in tag unit 110, the corresponding information in memory 120 is sent via bus 123 to the higher memory level or the processor core for use. If access address 113 does not obtain a match in tag unit 110, the address is sent out via bus 111 to access the lower memory level; the information obtained is stored via bus 121 into memory 120 of this level, and the address on bus 111 is also stored into the corresponding entry of tag unit 110.
Access address 113 then obtains a match in tag unit 110, and the corresponding information is sent via bus 123 to the higher memory level or processor core for use. The information on bus 121 may also be bypassed directly onto bus 123 while being stored into memory 120. Address generator 118, based on the address on bus 111, adds an increment 115 to that address, the increment being the size of one information block, producing a predicted address (Predicted Address) 119 which is sent to the other input of selector 114. An arbiter not shown in the figure controls selector 114: if access address 113 is valid, the arbiter selects access address 113 onto bus 111 for matching in tag unit 110; if access address 113 is invalid and predicted address 119 is valid, the arbiter selects predicted address 119 onto bus 111 for matching in tag unit 110.

If the predicted address 119 does not match in the tag unit, it is sent out over the bus 111 to access the lower memory level; the information obtained is stored over the bus 121 into the memory 120 of this level, and the address on the bus 111 is also stored into the corresponding entry of the tag unit 110. In this case the information on the bus 121 need not be bypassed onto the bus 123. The address generator 118 cycles in this way, adding the increment 115 to the address on the bus 111 to produce the predicted address 119. As long as the access address 113 is invalid, the arbiter directs the selector 114 to select the predicted address 119 onto the bus 111 for matching in the tag unit 110; as long as the predicted address does not match the tags in the tag unit 110, the predicted address on the bus 111 is sent out to access the lower memory level and the information obtained is stored into the memory 120, and the address generator 118 adds the increment 115 to the address on the bus 111 to produce the next predicted address 119. When a predicted address 119 matches a tag in the tag unit 110, the address generator 118 terminates further operation based on that predicted address.
That is, the address generator 118 of this embodiment is triggered by an access address 113 that fails to match in the tag unit 110, and keeps producing predicted addresses 119 to access the lower memory level and fill the memory 120 of this level with the fetched information, until a predicted address matches in the tag unit 110 of this level. In other words, matching of an address against the tags in the tag unit 110 controls the address generator: an unmatched access address triggers the operation of the address generator 118, and a matched predicted address terminates the operation of the address generator 118.
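The trigger-and-terminate rule of this embodiment can be sketched in software. The model below is illustrative only: the block size and all names are assumptions, and a Python set stands in for the tag unit 110.

```python
# Minimal software model of Embodiment 5's control rule: a miss on the
# access address triggers the address generator, which keeps issuing
# predicted addresses (each the previous one plus one block) until a
# predicted address hits in the tag set.

BLOCK_SIZE = 64  # increment 115: one information block (size assumed)

def prefetch_on_miss(access_addr, tags, lower_level):
    """tags models tag unit 110 (a set of block addresses already cached);
    lower_level(addr) models fetching a block from the lower level."""
    fetched = []
    if access_addr in tags:              # hit: generator is not triggered
        return fetched
    tags.add(access_addr)                # fill for the missed access itself
    fetched.append(lower_level(access_addr))
    predicted = access_addr + BLOCK_SIZE
    while predicted not in tags:         # a matching predicted address terminates
        tags.add(predicted)
        fetched.append(lower_level(predicted))
        predicted += BLOCK_SIZE
    return fetched

tags = {0x1000, 0x10C0}                  # 0x10C0 will stop the prefetch run
got = prefetch_on_miss(0x1040, tags, lambda a: ('block', a))
assert got == [('block', 0x1040), ('block', 0x1080)]
```

Note that this loop stops only when a predicted address hits in the tag set, which is exactly why the additional termination conditions of the following embodiments are needed, for example at the end of a program.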

The address generator 118 of Embodiment 5 keeps operating as long as the predicted addresses 119 it produces fail to match the tags in the tag unit 110. In some cases the address generator 118 must terminate even though the predicted addresses still fail to match the tags, for example at the end of a program. The following three embodiments show three different termination methods.

[Embodiment 6] Please refer to FIG. 9, a block diagram of the storage hierarchy system according to Embodiment 6 of the present invention. Here 110 is the cache tag unit, 120 is the cache memory, 118 is the address generator, 113 is the access address, 119 is the predicted address, 114 is the selector that selects between 113 and 119, 111 is the address bus after selection by the selector 114, 121 is the information bus between the lower memory level and this level, and 123 is the information bus between this level and the higher memory level or processor core; these are the same as in FIG. 8. The operation is also similar to Embodiment 5 and is not repeated. FIG. 9 adds an instruction decoder 122. The access address 113 from the higher memory level or processor core is accompanied by a signal indicating whether the information to be fetched is instructions or data. This signal is sent over the bus 111 to the address generator 118, and the address generator 118 preserves it when producing the predicted address 119. From this signal accompanying the predicted address on the bus 111 it can therefore be determined whether the information block arriving from the lower memory level over the bus 121 is an instruction block or a data block. The instruction decoder 122 decodes the instructions in the instruction blocks on the bus 121. If 122 decodes an indirect branch instruction in such a block, it makes the address generator 118
stop producing predicted addresses. Since compiled programs end with an indirect branch instruction, this prevents prefetching memory locations beyond the end of the program.

[Embodiment 7] Please refer to FIG. 10, a block diagram of the storage hierarchy system according to Embodiment 7 of the present invention. Here 110 is the cache tag unit, 120 is the cache memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, 121 is the information bus between the lower memory level and this level, and 123 is the information bus between this level and the higher memory level or processor core; these are the same as in FIG. 8. The operation is also similar to Embodiment 5 and is not repeated. FIG. 10 adds a counter 112. The counter 112 assigns an initial count value 'N' when the address generator 118 produces a predicted address 119 from the access address on the bus 113. Each time the address generator produces a new predicted address, the count value in the counter 112 is decremented by '1'. When the count value in the counter 112 reaches '0', the address generator 118 is made to terminate, so that the number of prefetched information blocks does not exceed a preset maximum. This prevents prefetching too many instruction segments that the processor core's branch decisions will not execute, or data segments it will not use, and also prevents prefetching too much invalid information beyond the end of the program.

[Embodiment 8] Please refer to FIG. 11, a block diagram of the storage hierarchy system according to Embodiment 8 of the present invention. Here 110 is the cache tag unit, 120 is the cache memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, 121 is the information bus between the lower memory level and this level, and 123 is the information bus between this level and the higher memory level or processor core; these are the same as in FIG. 8. The operation is also similar to Embodiment 5 and is not repeated. FIG. 11 adds a comparator 116. The comparator 116 compares the predicted address 119, produced by the address generator 118 from the access address on the bus 113, against the address boundary stored in 116. If the predicted address 119 crosses the address boundary stored in the comparator 116, the address generator 118 is made to terminate, so that prefetched information blocks do not cross the address boundary. The address boundary may be the smallest memory region allocated to the processor, for example a memory page or a memory segment. This prevents prefetching beyond the memory range allocated to the thread, and likewise prevents prefetching too many instruction segments the processor core will not execute or data segments it will not use.
The address boundary may be pre-stored in the comparator 116 or written by a program into a register in the comparator 116.
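The boundary test performed by the comparator 116 amounts to comparing the high-order (page-number) bits of the two addresses. A hedged sketch, assuming a 4 KiB page as the boundary granularity:

```python
# Illustrative check corresponding to comparator 116: stop when a
# predicted address leaves the memory region that the triggering access
# address fell in. The 4 KiB region size is an assumption.

PAGE_SIZE = 4096  # assumed boundary granularity

def crosses_boundary(base_addr, predicted_addr, region=PAGE_SIZE):
    # Two addresses are in the same region iff their high-order
    # (region-number) bits agree.
    return (base_addr // region) != (predicted_addr // region)

assert crosses_boundary(0x1F80, 0x1FC0) is False   # same 4 KiB page
assert crosses_boundary(0x1FC0, 0x2000) is True    # next page: stop
```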

[Embodiment 9] Please refer to FIG. 12, a block diagram of the storage hierarchy system according to Embodiment 9 of the present invention. Here 110 is the cache tag unit, 120 is the cache memory, 118 is the address generator, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, 121 is the information bus between the lower memory level and this level, and 123 is the information bus between this level and the higher memory level or processor core; these are the same as in FIG. 8. The operation is also similar to Embodiment 5 and is not repeated. FIG. 12 adds the instruction decoder 122 of FIG. 9, the counter 112 of FIG. 10, and the comparator 116 of FIG. 11. Some or all of these devices may be selected as needed to terminate the operation of the address generator 118. If all of them are used, then a valid access address 113 that fails to match in the tag unit 110 triggers the address generator 118 to produce predicted addresses 119; each predicted address is sent out over the bus 111 to access the lower memory level, and the fetched information fills the memory 120.
The address generator 118 keeps producing new predicted addresses 119 until a predicted address matches a tag in the tag unit, or a predicted address crosses the preset memory address boundary, or the number of predicted addresses produced reaches the preset maximum, or a fetched instruction block contains an indirect branch instruction.
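The combined control flow of this embodiment can be modeled as a single loop with four exit conditions. The sketch below is an illustration, not the hardware: the block size, the instruction encoding, and all names are assumptions.

```python
# Software model of Embodiment 9: a missed access address starts the
# prefetch run, and any one of four conditions ends it. Returning the
# name of the condition that fired makes the model easy to test; the
# hardware simply stops driving predicted addresses onto bus 111.

BLOCK = 64  # increment 115 (size assumed)

def prefetch(access_addr, tags, fetch, max_blocks, boundary, is_instr):
    predicted = access_addr + BLOCK
    count = max_blocks                    # counter 112 preloaded with 'N'
    while count > 0:
        if predicted in tags:             # 1) predicted address hits
            return 'hit'
        if predicted // boundary != access_addr // boundary:
            return 'boundary'             # 2) comparator 116 boundary
        block = fetch(predicted)          # fill memory 120 via bus 121
        tags.add(predicted)
        if is_instr and 'branch_indirect' in block:
            return 'indirect_branch'      # 3) decoder 122 stop rule
        predicted += BLOCK
        count -= 1                        # counter 112 decremented
    return 'count'                        # 4) preset maximum reached

tags = set()
r = prefetch(0x0, tags, lambda a: ['add'], max_blocks=3,
             boundary=4096, is_instr=True)
assert r == 'count' and tags == {0x40, 0x80, 0xC0}
r = prefetch(0x0, set(), lambda a: ['branch_indirect'],
             max_blocks=8, boundary=4096, is_instr=True)
assert r == 'indirect_branch'
```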

[Embodiment 10] Please refer to FIG. 13, a block diagram of the storage hierarchy system according to Embodiment 10 of the present invention. Here 110 is the cache tag unit, 120 is the cache memory, 114 is the selector, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, 121 is the information bus between the lower memory level and this level, and 123 is the information bus between this level and the higher memory level or processor core; these modules and buses are the same as the identically numbered modules in FIG. 8. A scanner 128 is added, which incorporates the functions of the address generator 118, the counter 112, and the comparator 116 of Embodiments 5 through 8, and can also scan the information blocks transferred over the bus 123 and produce branch target initial addresses from the branch instructions in them. Embodiment 10 can implement all the functions of Embodiment 9. The main difference between Embodiment 10 and Embodiments 5 through 9 is that the address generator can be triggered not only by a valid access address 113 failing to match in the tag unit, but also by a valid access address 113 matching in the tag unit 110.

When a valid access address 113 fails to match in the tag unit, the process that triggers the address generator, and the operation of the address generator, are the same as in Embodiment 5 and are not repeated. When a valid access address 113 matches in the tag unit 110 via the bus 111, the corresponding information block is read from the memory 120 and sent over the bus 123 to the higher memory level or processor core, and the valid access address on the bus 111 is also sent to the scanner 128. The address generator in the scanner keeps producing predicted addresses starting from this tag-matched valid access address, and stops producing them under the conditions of Embodiments 5 through 8. The address generator in the scanner 128 can add the increment to the block address portion of this valid address (the high-order part of the memory address, for example the tag and index address) to obtain the initial predicted address of the sequentially next information block. Further, the scanner 128 can also decode the instructions in the instruction blocks on the bus 123; for a branch instruction among them, the address generator in 128 adds the branch offset in the branch instruction to the address of the branch instruction itself to obtain the branch target address as the initial predicted address.
The address generator then produces sequential predicted addresses following the initial predicted address until a termination condition stops it from producing predicted addresses. Some memory hierarchies use cache addresses to address the memory directly; in that case the cache block address (the high-order bits of the cache address, for example the way number and index address) can be used to read the tag out of the tag unit 110, and the tag, concatenated with the index address, is sent to the scanner 128 as the memory block address.

[Embodiment 11] Please refer to FIG. 14, a schematic diagram of the scanner of the storage hierarchy system according to Embodiment 11 of the present invention. Embodiment 11 implements operation with the sequentially next block address as the initial predicted address. In FIG. 14, 110 is the cache tag unit, 120 is the cache memory, 114 is the selector, 122 is the instruction decoder, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, and 121 is the information bus between the lower memory level and this level; these modules and buses are the same as the identically numbered modules in FIG. 12. The scanner 128 contains an address generator formed by the register 130, the adder 132, and the selector 134, whose function is similar to the address generator 118 of Embodiment 5; a counter formed by the register 136, the subtractor 138, and the selector 140, whose function is similar to the counter 112 of Embodiment 7; the address boundary comparator 116; and a lossy stack formed by the storage units 146 and 148.

When the trigger condition described in Embodiment 10 is met (that is, when the access address 113 is valid), the selector 140 selects the preset maximum prefetch count 'N' 133 into the subtractor 138 to be decremented by '1', the difference being stored into the register 136; at the same time the selector 134 selects the address on the bus 111 into the adder 132 to be added to the increment 115, the sum being stored into the register 130. The output of 130 is the predicted address 119; if the access address 113 is now invalid, the selector 114 selects the predicted address 119 onto the bus 111 for matching in the tag unit 110. If the predicted address 119 matches a tag in 110, further operation of the address generator is terminated. If the predicted address 119 does not match the tags in 110, the address on the bus 111 is sent to the lower memory level and the information read back fills the memory 120; at the same time the selector 140 selects the output 137 of the register 136 into the subtractor 138 to be decremented by '1', the difference being stored into the register 136, and the selector 134 selects the predicted address on the bus 119 into the adder 132 to be added to the increment 115, the sum being stored into the register 130. The loop continues in this way until the output of the subtractor 138 is '0'; or the predicted address 119 matches a tag in the tag unit 110; or the predicted address 119 crosses
the boundary preset in the boundary comparator 116; or the instruction decoder 122 decodes an indirect branch instruction.

If, while the address generator is operating, it becomes known that the access address 113 will be valid, the selector 114 still selects the predicted address 119 onto the bus 111 for matching in the tag unit 110 during the current clock cycle. At the same time the predicted address 119 is pushed into the storage unit 146 of the lossy stack, and the count on the bus 137 is pushed into the same row of the storage unit 148 of the lossy stack. In the next clock cycle, the selector 114 selects the valid access address 113 onto the bus 111 for matching in the tag unit 110. If it does not match, then as described above, the selector 134 in the scanner 128 selects the new access address on the bus 111 into the adder 132 to produce predicted addresses based on the new access address, while the selector 140 selects the maximum prefetch block count 133 into the subtractor 138 to count, starting a prefetch based on the new access address. When the prefetch based on the new access address terminates for any of the reasons above, the selector 134 selects the bus 131 from the storage unit 146 of the lossy stack,
feeding the predicted address at the top of the stack into the adder 132 to be added to the increment 115; at the same time the selector 140 selects the bus 135 from the storage unit 148 of the lossy stack, feeding the count value at the top of the stack into the subtractor 138 to be decremented by '1'. The interrupted prefetch based on the old access address is thereby resumed. The stack gives priority to prefetching the information the processor core most likely needs (prefetches following the current valid access address 113 that issued an access request) or fairly likely needs (prefetch addresses following recent valid access addresses 113 sit at or near the top of the stack). The depth of the lossy stack is limited; if pushing continues after the lossy stack is full, the entries at the bottom of the stack are discarded. Prefetching of information the processor core is less likely to use in the near term (whose addresses may be at the bottom of the stack) can thus be abandoned.
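The lossy stack built from the storage units 146 and 148 differs from an ordinary stack only in its overflow behavior: a push onto a full stack discards the bottom entry instead of failing. A minimal sketch, with the depth chosen arbitrarily:

```python
# Hedged model of the lossy stack (units 146/148): a fixed-depth stack of
# (predicted_address, remaining_count) pairs. Pushing onto a full stack
# drops the bottom entry, so the oldest suspended prefetch is abandoned.

class LossyStack:
    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []            # bottom ... top

    def push(self, addr, count):
        if len(self.entries) == self.depth:
            self.entries.pop(0)      # full: drop the bottom entry
        self.entries.append((addr, count))

    def pop(self):
        return self.entries.pop()    # resume the most recent prefetch

s = LossyStack(depth=2)
s.push(0x100, 5)
s.push(0x200, 3)
s.push(0x300, 7)                     # 0x100 falls out of the bottom
assert s.pop() == (0x300, 7)
assert s.pop() == (0x200, 3)
assert s.entries == []
```

Discarding from the bottom abandons the oldest suspended prefetch, which by the priority argument above is the one the processor core is least likely to need soon.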

The scanner 128 in the embodiment of FIG. 14 can produce sequential predicted addresses from the access address; it is fully equivalent to the corresponding modules in FIG. 12, and the scanner 128 can be applied to FIG. 12 in place of the address generator 118, the counter 112, and the boundary comparator 116 of FIG. 12. With slight changes, the embodiment of FIG. 14 can also prefetch branch targets for the branch instructions transferred over the bus 123.

[Embodiment 12] Please refer to FIG. 15, a schematic diagram of the scanner of the storage hierarchy system according to Embodiment 12 of the present invention. Embodiment 12 implements operation with the sequentially next block address as the initial predicted address, as well as with a branch target address as the initial predicted address. In FIG. 15, 110 is the cache tag unit, 120 is the cache memory, 114 is the selector, 122 is the instruction decoder, 113 is the access address, 119 is the predicted address, 111 is the address bus after selection by the selector 114, and 121 is the information bus between the lower memory level and this level. The scanner 128 contains an address generator formed by the register 130, the adder 132, and the selector 134; a counter formed by the register 136, the subtractor 138, and the selector 140; the address boundary comparator 116; and a lossy stack formed by the storage units 146 and 148. These modules and buses are the same as the identically numbered modules in FIG. 14. Compared with FIG. 14, the information bus 123 between this level and the higher memory level or processor core is added, and an instruction decoder 150 and a selector 152 are added to the scanner 128 to support operation with a branch target address as the initial predicted address.

In this embodiment the scanner 128 produces two kinds of initial predicted address: the sequential initial predicted address and the branch target initial predicted address. After an initial predicted address is produced, each subsequent predicted address (that is, the address of the next information block in address order) is produced by adding the increment to the current predicted address. Whether or not the access address matches in the tag unit 110, the scanner 128 produces a sequential initial predicted address by adding the increment 115 to the access address on the bus 111, produces subsequent predicted addresses from that sequential initial predicted address, and prefetches information from the lower memory level into the memory 120 of this level, until a termination condition stops the address generator in the scanner 128 from producing predicted addresses. When sequential addresses are used as predicted addresses, the instruction decoder 150 in FIG. 15 directs the selector 152 to select the increment 115 into the input of the adder 132; the subsequent operation is identical to that of Embodiment 7 and FIG. 14 and is not repeated here.

Only when an instruction access address on the bus 111 matches in the tag unit 110 and the memory 120 outputs the corresponding instruction block over the bus 123 does the scanner 128 produce a branch target initial predicted address. The instruction decoder 150 then decodes the instructions in that instruction block on the bus 123 and computes the branch target address of each direct branch instruction in it. To do so, the instruction decoder 150 directs the selector 152 to select the branch offset of the branch instruction on the bus 123 into one input of the adder 132, and sends the intra-block address offset of the branch instruction within the instruction block out over the bus 154; concatenated with the block address on the bus 111, this forms the source address of the branch instruction, which is fed into the other input of the adder 132. The adder 132 adds the address of the branch instruction (the source address) to the branch offset and outputs the branch target address. The branch target address is latched in the register 130 and then output as the branch target initial address 119. If the access address 113 is invalid at this time, the branch target initial address is selected by the selector 114 and sent over the bus 111 to the tag unit 110 for matching.
If the branch target initial address matches a tag in the tag unit 110, the branch target instruction is already in the memory 120 and no prefetch based on this branch target initial address is needed; the instruction decoder 150 can then process the next direct branch instruction.
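The adder arithmetic on this path can be illustrated as follows. The field widths are assumptions (64-byte information blocks); what matters is that the source address fed to the adder 132 is the block address from the bus 111 concatenated with the branch's intra-block offset from the bus 154.

```python
# Illustrative arithmetic for the branch-target path through adder 132.
# BLOCK_BITS (64-byte blocks) is an assumed field width.

BLOCK_BITS = 6

def branch_target(block_addr, intra_block_offset, branch_offset):
    # Source address: block address from bus 111 concatenated with the
    # branch instruction's intra-block offset from bus 154.
    source = (block_addr << BLOCK_BITS) | intra_block_offset
    return source + branch_offset        # adder 132: source + branch offset

# Branch at byte 0x24 of block 0x10, jumping forward 0x90 bytes:
assert branch_target(0x10, 0x24, 0x90) == 0x4B4
```

Dropping the intra-block offset would compute, for a branch offset of 0x30, the value 0x400 + 0x30 = 0x430 instead of 0x424 + 0x30 = 0x454; the result falls in block 0x10 rather than the correct block 0x11, which is the error the following paragraph warns about.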

If the branch target initial address does not match the tags in the tag unit 110, the selector 134 selects the branch target initial address 119 into one input of the adder 132, and the instruction decoder 150 directs the selector 152 to select the increment 115 into the other input of the adder 132, so the output of the adder 132 is the address of the instruction block sequentially following the branch target initial address. Note that matching in the tag unit 110 and accessing the memory 120 use only the block address, that is, the tag portion of the address together with the index address, so the intra-block address offset in the predicted address 119 does not affect that matching and addressing. When computing the branch target initial address, however, the adder 132 must receive at its input the intra-block address offset of the branch instruction, sent over the bus 154 and concatenated with the block address on the bus 111; otherwise the branch target initial address produced may be in error and fall in the wrong instruction block. The operation thereafter is the same as in Embodiment 7 and FIG. 14 and is not repeated.

According to the technical solution of the present invention, the method and system disclosed herein can also be applied to memory hierarchies of different structures. Some cache systems access the memory level directly with cache addresses; a cache address contains no tag as a memory address does, the tag having already been mapped to a way number, so the cache block address consists only of a way number and an index address. When the scanner 128 works with such a cache system, the cache block address can be used to address the tag unit 110 and read out the corresponding tag, and the tag, concatenated with the index address from the cache block address, forms the memory block address sent over the bus 111 to the scanner 128 for producing predicted addresses. The remaining operations are the same as in Embodiment 8 above. The method and system disclosed herein apply to any memory hierarchy, and are especially suitable for memory levels of larger capacity, for example the lowest memory level in the processor, the last level cache (Last Level Cache).
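The address recombination described here, for hierarchies addressed directly by cache addresses, can be sketched as follows; the field widths and the tag-RAM layout are assumptions for illustration.

```python
# Sketch of the translation described above: a cache block address made
# of a way number and an index reads the tag out of tag unit 110, and
# tag concatenated with index forms the memory block address for bus 111.
# INDEX_BITS and the dict-of-dicts tag RAM are assumed.

INDEX_BITS = 8                       # number of sets = 256 (assumed)

def memory_block_address(way, index, tag_ram):
    tag = tag_ram[way][index]        # read tag unit 110 at (way, index)
    return (tag << INDEX_BITS) | index

tag_ram = {0: {0x3A: 0x7F}, 1: {0x3A: 0x12}}
assert memory_block_address(0, 0x3A, tag_ram) == (0x7F << 8) | 0x3A
assert memory_block_address(1, 0x3A, tag_ram) == 0x123A
```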

Other suitable modifications may be made in accordance with the technical solutions and concepts of the present invention. For those of ordinary skill in the art, all such substitutions, adjustments, and improvements shall fall within the scope of protection of the appended claims.

Industrial Applicability

The system and method proposed by the present invention can be used in various computing, data-processing, and storage-system-related applications to improve efficiency and to mask memory access latency and cache misses.

Sequence Listing Free Text

Claims (26)

1. An information processing method for an information system, comprising:
Step A: an address generator generates an address according to an information block and sends the address to a memory, the information block being the current information block output by the memory or an information block stored in the address generator;
Step B: the memory outputs an information block according to the address sent by the address generator;
Step C: a processor acquires and processes the information block output by the memory.

2. The information processing method according to claim 1, wherein the address generator is capable of performing at least part of the functions of the processor, the partial functions including at least a partial branch-judgment function.

3. The information processing method according to claim 2, wherein the address generator generates addresses by the following generation method one: the address generator has at least part of the address-generation function of the processor, and processes the information block to generate an address.

4. The information processing method according to claim 3, wherein the address generator contains a cache and filters the generated addresses, the addresses provided to the memory being those that pass the filtering; the address generator filters a generated address as follows: determine whether the address generated by generation method one matches in the cache of the address generator; if not, the generated address passes the filtering.

5. The information processing method according to claim 4, wherein the processor contains a cache, and the cache in the processor is organized and replaced in the same manner as the cache in the address generator.

6. The information processing method according to claim 2, wherein: all instructions executable by the one of the address generator and the processor that has fewer functions are called valid instructions; a result refers to an execution result produced by the address generator or the processor executing instructions; and a result produced entirely by executing valid instructions is a valid result.

7. The information processing method according to claim 6, wherein: if the branch judgment and the branch target address produced by the address generator are valid results, the address generator completes the branch operation independently; if the branch judgment and/or the branch target address produced by the address generator is not a valid result, the address generator completes the branch operation based on the branch judgment and/or branch target address provided by the processor.

8. The information processing method according to claim 6, wherein: if a result output by the address generator is a valid result, the valid result is written into the memory; if a result output by the address generator is not a valid result, that result is discarded, and the processor outputs the result corresponding to the invalid result and writes it into the memory.

9. The information processing method according to claim 6, wherein: a valid flag is added to the registers of the address generator and the processor; an instruction valid flag is added to the instruction decoders of the address generator and the processor; when executing an instruction, the address generator and the processor check the validity of the instruction's valid flag and of the register valid flags of its operands; if the valid flag of any instruction or operand is 'invalid', the execution result of the instruction is 'invalid', and an 'invalid' value is written into the valid flag of the target register; if the valid flags of all instructions and operands are 'valid', the execution result of the instruction is 'valid', and a 'valid' value is written into the valid flag of the target register.

10. The information processing method according to claim 1, wherein the address generator generates addresses by the following generation method two: the address generator parses the current information block, and if it determines that the current information block contains a branch instruction, computes the target address of the branch instruction to generate an address.

11. The information processing method according to claim 10, wherein the address generator filters the generated addresses, the addresses provided to the memory being those that pass the filtering; the address generator filters a generated address as follows: determine whether the information block pointed to by the address generated by generation method two is the current information block; if not, the generated address passes the filtering.

12. The information processing method according to claim 10, wherein the address generator further generates addresses by the following generation method three: the address generator obtains the address of the current information block and adds an offset to it to generate an address.

13. The information processing method according to claim 12, wherein the address generator filters the generated addresses, the addresses provided to the memory being those that pass the filtering; the address generator filters the generated addresses as follows:
Step 1: the address generated by generation method three is deemed to pass the filtering;
Step 2: determine whether the information block pointed to by the address generated by generation method two is the current information block or the information block pointed to by the address generated by generation method three; if not, the address generated by generation method two passes the filtering.

14. The information processing method according to claim 10, wherein a second memory is added, and the second memory stores at least one of the information blocks and its corresponding address.

15. The information processing method according to claim 14, wherein the addresses generated by the address generator are stored in the second memory, and the corresponding information blocks output by the memory according to the addresses generated by the address generator are stored in the second memory.

16. The information processing method according to claim 14, wherein the processor sends an address to the information buffer to be matched against the addresses stored therein; if there is a match, the information buffer outputs the information block corresponding to the matching address for the processor to acquire; if there is no match, the memory outputs an information block according to the address sent by the processor for the processor to acquire.

17. A memory-hierarchy prefetch system, comprising:
a lower memory hierarchy, for storing information and providing information blocks to a memory according to a received access address or predicted address;
the memory, for storing information and outputting the current information block according to a received access address;
an address generator, for adding an increment to an initial address or a predicted address so as to continuously generate predicted addresses, filtering the generated addresses, providing the predicted addresses that pass the filtering to the lower memory hierarchy, and obtaining the corresponding information to fill the memory, the address generator terminating operation when a predicted address fails the filtering;
and a higher memory hierarchy, for providing access addresses to the memory and receiving the current information block output by the memory.

18. The memory-hierarchy prefetch system according to claim 17, wherein the address generator generates the initial address by the following generation method one: the address generator adds an increment to the access address to generate the initial address.

19. The memory-hierarchy prefetch system according to claim 17, wherein the address generator generates the initial address by the following generation method two: the address generator parses the current information block, and if it determines that the current information block contains a branch instruction, computes the target address of the branch instruction to generate the initial address.

20. The memory-hierarchy prefetch system according to claim 17, wherein the address generator contains an address filter that filters the predicted addresses generated by the address generator, using the following filtering method one: determine whether the information corresponding to a predicted address generated by the address generator is already in the memory; if not, the generated predicted address passes the filtering.

21. The memory-hierarchy prefetch system according to claim 17, wherein the address generator contains an address filter that filters the predicted addresses generated by the address generator, using the following filtering method two: determine whether a predicted address generated by the address generator crosses a preset address boundary; if not, the generated predicted address passes the filtering.

22. The memory-hierarchy prefetch system according to claim 17, wherein the address generator contains an address filter that filters the predicted addresses generated by the address generator, using the following filtering method three: determine whether the count of predicted addresses continuously generated by the address generator from one access address has reached a preset maximum; if not, the generated predicted address passes the filtering.

23. The memory-hierarchy prefetch system according to claim 17, wherein the address generator contains an address filter that filters the predicted addresses generated by the address generator, using the following filtering method four: determine whether the information block obtained from the lower memory hierarchy according to a predicted address generated by the address generator contains an indirect branch instruction; if not, the predicted address under examination passes the filtering.

24. The memory-hierarchy prefetch system according to claim 17, wherein prefetch priority is assigned according to the time at which the access addresses access the memory, a later access address having priority over an earlier one, and prefetching is abandoned for earlier accesses beyond a certain limit.

25. A memory-hierarchy prefetch method, comprising:
Step A: the address generator continuously generates predicted addresses according to an initial address;
Step B: the address generator filters the generated predicted addresses;
Step C: the predicted addresses that pass the filtering are sent to a lower memory hierarchy;
Step D: the lower memory hierarchy outputs an information block;
Step E: the information block is stored into a memory;
Step F: a predicted address that fails the filtering causes the address generator to terminate operation.

26. The memory-hierarchy prefetch method according to claim 25, wherein the address generator filters the generated predicted addresses using the following filtering method one: determine whether the information corresponding to a predicted address generated by the address generator is already in the memory; if not, the generated predicted address passes the filtering.
PCT/CN2016/077853 2015-03-30 2016-03-30 Information-push-based information system and method Ceased WO2016155623A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201510154529.3A CN106155946A (en) 2015-03-30 2015-03-30 Information system based on information pushing and method
CN201510154529.3 2015-03-30
CN201510178436.4 2015-04-13
CN201510178436.4A CN106155928A (en) 2015-04-13 2015-04-13 A kind of storage hierarchy pre-fetching system and method

Publications (1)

Publication Number Publication Date
WO2016155623A1

Family

ID=57004255

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/077853 Ceased WO2016155623A1 (en) 2015-03-30 2016-03-30 Information-push-based information system and method

Country Status (1)

Country Link
WO (1) WO2016155623A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11481219B2 (en) 2020-05-07 2022-10-25 International Business Machines Corporation Store prefetches for dependent loads in a processor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1349160A (en) * 2001-11-28 2002-05-15 中国人民解放军国防科学技术大学 Correlation delay eliminating method for streamline control
US7234046B2 (en) * 2004-12-01 2007-06-19 Faraday Technology Corp. Branch prediction using precedent instruction address of relative offset determined based on branch type and enabling skipping
CN103838550A (en) * 2012-11-26 2014-06-04 上海芯豪微电子有限公司 Branch treatment system and method
CN104050092A (en) * 2013-03-15 2014-09-17 上海芯豪微电子有限公司 Data caching system and method
CN104423929A (en) * 2013-08-21 2015-03-18 华为技术有限公司 Branch prediction method and related device

Legal Events

Code — Description
121 — EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 16771378; country of ref document: EP; kind code of ref document: A1)
NENP — Non-entry into the national phase (ref country code: DE)
122 — EP: PCT application non-entry in European phase (ref document number: 16771378; country of ref document: EP; kind code of ref document: A1)