CN1382280B - Automatic processor generation system and method for designing configurable processors

Info

Publication number: CN1382280B
Application number: CN00812731.XA
Authority: CN
Inventors: Earl A. Killian; Ricardo E. Gonzalez; Ashish B. Dixit; Monica Lam; Walter D. Lichtenstein; Christopher Rowen; John Ruttenberg; Robert P. Wilson; Albert R.-R. Wang; Dror E. Maydan; Weng K. Tjiang; Richard Ruddell
Original assignee: Tensilica Inc
Current assignee: Tensilica Inc
Filing date: 2000-02-04
Publication date: 2016-11-30
Anticipated expiration: 2020-02-04

Abstract

A configurable RISC processor implements a user-defined instruction set with high-performance fixed and variable length coding. The process of defining new instruction sets is supported by various tools that allow users to add new instructions and quickly evaluate them to maintain and switch between instruction sets. A standardized language is used to develop configurable definitions of target instruction sets and HDL descriptions of the hardware needed to implement the instruction sets, as well as various development tools for validation and application development, thereby achieving a high degree of automation in the design process.

Description

Automatic processor generation system and method for designing configurable processors

Background of the invention

1. Field of the invention

The present invention relates to microprocessor systems. More particularly, it relates to the design of an application solution containing one or more processors, where the processors in the system are configured and enhanced during their design to improve their suitability for a particular application. The invention is also directed to a system in which application developers can rapidly develop instruction extensions to an existing instruction set architecture, such as new instructions, including new instructions that manipulate user-defined processor state, and immediately measure the impact of such extensions on application run time and on processor cycle time.

2. Description of the related art

Traditionally, processors have been difficult to design and to modify. For this reason, most systems that contain processors use ones that were designed and verified once for general-purpose use and then reused across many applications over time. As a result, their suitability for a particular application is not always ideal. It would often be appropriate to modify the processor to execute a particular application's code better (for example, to run faster, consume less power, or cost less). However, the difficulty of modifying even an existing processor design, and therefore the time, cost, and risk involved, is so high that this is typically not done.

To better understand the difficulty of making a prior art processor configurable, consider how such a processor is developed. First, the instruction set architecture (ISA) is developed. This step is essentially done once, and the result is used by many systems for decades. For example, the instruction set of the Intel Pentium(TM) processor traces its heritage back to the 8008 and 8080 microprocessors introduced in the mid-1970s. In this process, based on predetermined ISA design goals, the ISA instructions, syntax, and so on are developed, along with software development tools for the ISA such as assemblers, debuggers, and compilers. Next, a simulator for the particular ISA is developed, various benchmarks are run to evaluate the effectiveness of the ISA, and the ISA is revised according to the results. At some point the ISA is considered satisfactory, and the ISA process ends with a fully developed ISA specification, an ISA simulator, an ISA verification suite, and a development suite that includes, for example, an assembler, a debugger, and a compiler.

Then processor design begins. Because processors can have a useful life of many years, this process is also performed infrequently: typically, a processor is designed once and used by many systems for many years. Given the ISA, its verification suite and simulator, and the various processor development goals, the microarchitecture of the processor is designed, simulated, and revised. Once the microarchitecture is finalized, it is implemented in a hardware description language (HDL), and a microarchitecture verification suite is developed and used to verify the HDL implementation (more on this later). Then, in contrast to the manual processes described up to this point, automated design tools can synthesize a circuit from the HDL description and place and route its components. The layout can then be revised to optimize chip area usage and timing. Alternatively, additional manual processes can be used to create a floor plan based on the HDL description, convert the HDL to circuitry, and then verify and lay out the circuitry both manually and automatically. Finally, the layout is verified with an automated tool to confirm that it matches the circuitry, and the circuits are verified against the layout parameters.

After processor development is complete, the overall system is designed. Unlike the design of the ISA and the processor, system design (which may include the design of chips that now contain the processor) is quite common, and systems are typically designed continually. Each system is used for a relatively short period (one or two years) by a particular application. Based on predetermined system goals such as cost, performance, power, and functionality, on specifications of pre-existing processors, and on specifications of chip foundries (usually closely tied to the processor vendors), the overall system architecture is designed, a processor is chosen to match the design goals, and a chip foundry is chosen (this is closely tied to the processor selection).

Then, given the chosen processor, ISA, and foundry, together with the previously developed simulation, verification, and development tools (as well as a standard cell library for the chosen foundry), an HDL implementation of the system is designed, a verification suite is developed for the system HDL implementation, and the implementation is verified. Next, the system circuitry is synthesized, placed and routed on circuit boards, and the layout and timing are re-optimized. Finally, the boards are designed and laid out, the chips are fabricated, and the boards are assembled.

Another difficulty with prior art processor design is that it is not appropriate simply to design a traditional processor with more features to cover all applications, because any given application requires only a particular set of features, and a processor with features the application does not need is overly costly, consumes more power, and is harder to fabricate. In addition, it is not possible to know all of the application targets when a processor is first designed. If the processor modification process could be automated and made reliable, the ability of a system designer to create application solutions would be significantly enhanced.

As an example, consider a device designed to transmit and receive data over a channel using a complex protocol. Because the protocol is complex, the processing cannot reasonably be accomplished entirely in hard-wired logic (for example, combinational logic), so a programmable processor is introduced into the system for protocol processing. Programmability also allows bug fixes and later protocol upgrades to be made by loading new software into the instruction memory. However, the traditional processor was probably not designed for this particular application (the application may not even have existed when the processor was designed), and there may be operations it needs to perform that require many instructions but that could be done with one or a few instructions given additional processor logic.

Because the processor cannot easily be enhanced, many system designers do not attempt to do so, and instead choose to execute an inefficient pure-software solution on an available processor. The inefficiency results in a solution that may be slower, or require more power, or be costlier (for example, it may require a larger, more powerful processor to execute the program at sufficient speed). Other designers choose to provide some of the processing requirements in special-purpose hardware that they design for the application, such as a coprocessor, and then have the programmer code up access to the special-purpose hardware at various points in the program. However, the time needed to transfer data between the processor and such special-purpose hardware limits the usefulness of this approach for system optimization, because only fairly large units of work can be sped up enough that the time saved by using the special-purpose hardware exceeds the additional time required to transfer data to and from it.
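
The break-even condition in the preceding paragraph can be illustrated with a minimal sketch. The function name and all cycle counts below are hypothetical and chosen only for illustration; they are not measurements of any real system.

```python
def offload_net_saving(software_cycles, hardware_cycles, transfer_cycles):
    """Cycles saved by moving one unit of work to special-purpose hardware.

    A positive result means the offload pays off; a negative result means
    moving the data costs more than the hardware saves. All inputs are
    hypothetical cycle counts for a single unit of work.
    """
    return (software_cycles - hardware_cycles) - transfer_cycles


# Made-up numbers: a small unit of work loses, a large one wins.
small = offload_net_saving(software_cycles=20, hardware_cycles=1, transfer_cycles=50)
large = offload_net_saving(software_cycles=2000, hardware_cycles=100, transfer_cycles=50)
```

With these assumed numbers the small unit of work comes out negative and the large one positive, which is the sense in which only fairly large units of work benefit from a separate coprocessor.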

In the communication channel example, the protocol might require encryption, error correction, or compression/decompression processing. Such processing often operates on individual bits rather than on the processor's larger words. The circuitry for a computation may be rather modest, but the need for the processor to extract each bit, process it sequentially, and then repack the bits adds considerable overhead.

As a very specific example, consider Huffman decoding using the rules shown in Table 1 (a similar encoding is used in the MPEG compression standard).

Table 1

Pattern            Value   Length
0 0 X X X X X X    0       2
0 1 X X X X X X    1       2
1 0 X X X X X X    2       2
1 1 0 X X X X X    3       3
1 1 1 0 X X X X    4       4
1 1 1 1 0 X X X    5       5
1 1 1 1 1 0 X X    6       6
1 1 1 1 1 1 0 X    7       7
1 1 1 1 1 1 1 0    8       8
1 1 1 1 1 1 1 1    9       8

Both the value and the length must be computed, so that length bits can be removed from the stream to find the start of the next element to be decoded.
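
The rules of Table 1 can be written out as a sequence of bit tests. The following is a minimal sketch, not an implementation from the patent; the function name is hypothetical, and the input is assumed to be the next eight bits of the stream with the first stream bit in the most significant position.

```python
def huff_decode_bitwise(window):
    """Return (value, length) for an 8-bit window, following Table 1."""
    if not 0 <= window <= 0xFF:
        raise ValueError("window must fit in 8 bits")
    if window & 0x80 == 0:
        # Patterns 00 and 01: the second bit is the value, length is 2.
        return (window >> 6) & 1, 2
    if window & 0x40 == 0:
        # Pattern 10.
        return 2, 2
    # Pattern starts with 11: count how many leading ones there are in total.
    ones = 2
    mask = 0x20
    while mask and (window & mask):
        ones += 1
        mask >>= 1
    if ones == 8:
        # All ones: the last row of the table, value 9 with length 8.
        return 9, 8
    # n leading ones followed by a zero: value n + 1, length n + 1.
    return ones + 1, ones + 1
```

Each test in this sketch corresponds to one or more instructions on a conventional processor, which is the cost the next paragraph discusses.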

There are many ways to code this for a conventional instruction set, but all of them require many instructions, because there are many tests to be done and because, in contrast with a single gate delay for combinational logic, each software test requires multiple processor cycles. For example, an efficient prior art implementation using the MIPS instruction set might require six logical operations, six conditional branches, one arithmetic operation, and the associated register loads. With an advantageously designed instruction set the coding is better, but still expensive in terms of time: one logical operation, six conditional branches, one arithmetic operation, and the associated register loads.

In terms of processor resources this is so expensive that a 256-entry lookup table is typically used instead of coding the process as a sequence of bit-by-bit comparisons. However, a 256-entry lookup table takes up significant space, and accessing it can also take many cycles. For longer Huffman codes the table size would become unworkable, leading to more complex and slower code.
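
The 256-entry lookup table mentioned above can be generated directly from the rules of Table 1. This is a sketch under stated assumptions: the names RULES, build_table, and TABLE are hypothetical, and each rule is written as (prefix bits, prefix length, value, code length).

```python
# Rules transcribed from Table 1: (prefix, prefix_len, value, length).
RULES = [
    (0b00, 2, 0, 2),
    (0b01, 2, 1, 2),
    (0b10, 2, 2, 2),
    (0b110, 3, 3, 3),
    (0b1110, 4, 4, 4),
    (0b11110, 5, 5, 5),
    (0b111110, 6, 6, 6),
    (0b1111110, 7, 7, 7),
    (0b11111110, 8, 8, 8),
    (0b11111111, 8, 9, 8),
]


def build_table():
    """Expand every rule over its don't-care bits into a 256-entry table."""
    table = [None] * 256
    for prefix, prefix_len, value, length in RULES:
        free_bits = 8 - prefix_len
        for low in range(1 << free_bits):
            table[(prefix << free_bits) | low] = (value, length)
    return table


TABLE = build_table()
```

Decoding then becomes a single indexed load, TABLE[window], at the cost of storing all 256 entries. For a code with a longer maximum length the same construction needs 2 to the power of that length entries, which is why the text calls the table size unworkable for longer codes.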

A possible solution to the problem of accommodating specific application requirements in processors is to use configurable processors, whose instruction sets and architectures can be easily modified and extended to enhance and customize the processor's functionality. Configurability allows the designer to specify whether, or how much, additional functionality is required for the product. The simplest kind of configurability is a binary choice: a feature is either present or absent. For example, a processor might be offered with or without floating-point hardware.

Flexibility can be improved by configuration choices with finer gradations. For example, the processor might allow the system designer to specify the number of registers in the register file, the memory width, the cache size, the cache associativity, and so on. However, these options still do not reach the level of customizability desired by system designers. For example, in the Huffman decoding example above, although this is not known in the prior art, the system designer might like to include a specific instruction to perform the decode, for example:

huff8 t1, t0

Here the most significant eight bits of the result are the decoded value and the least significant eight bits are the length. In contrast to the software implementations described above, a direct hardware implementation of the Huffman decode is quite simple: exclusive of instruction decode and the like, the logic for the combinational function amounts to roughly thirty gates, or less than 0.1% of the gate count of a typical processor, and can be computed by a special-purpose processor instruction in a single cycle, an improvement by a factor of 4-20 over using general-purpose instructions only.
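
The behavior of such an instruction can be modeled in software to show how a decode loop would use it. This is a sketch only: the text does not state the register width, so the model assumes a 16-bit result with the value in the upper eight bits and the length in the lower eight bits, and it assumes that t0 holds the next eight stream bits with the first bit most significant. The helper decode_stream and its bit-string input are hypothetical.

```python
def huff8(t0):
    """Software model of the hypothetical huff8 instruction (assumed 16-bit result)."""
    window = t0 & 0xFF
    ones = 0
    mask = 0x80
    while mask and (window & mask):   # count leading one bits
        ones += 1
        mask >>= 1
    if ones == 0:
        value, length = (window >> 6) & 1, 2   # patterns 00 and 01
    elif ones == 1:
        value, length = 2, 2                   # pattern 10
    elif ones == 8:
        value, length = 9, 8                   # all ones
    else:
        value, length = ones + 1, ones + 1     # 110 through 11111110
    return (value << 8) | length


def decode_stream(bits):
    """Decode a string of '0'/'1' characters into a list of values."""
    values = []
    pos = 0
    while pos < len(bits):
        window = int(bits[pos:pos + 8].ljust(8, "0"), 2)  # zero-pad the tail
        result = huff8(window)
        value, length = result >> 8, result & 0xFF
        if pos + length > len(bits):
            break                                          # truncated final code
        values.append(value)
        pos += length
    return values
```

In the loop, one modeled instruction replaces the whole chain of tests, and the low byte of the result tells the caller how far to advance in the stream.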

Prior art efforts in configurable processor generation generally fall into two categories: logic synthesis used with parameterized hardware descriptions, and automatic retargeting of compilers and assemblers from abstract machine descriptions. The first category includes synthesizable processor hardware designs such as the Synopsys DW8051 processor, the ARM/Synopsys ARM7-S, the Lexra LX-4080, and the ARC configurable RISC core, and, to some extent, the Synopsys synthesizable/configurable PCI bus interface.

Of these, the Synopsys DW8051 is a binary-compatible implementation of an existing processor architecture with a small number of synthesis parameters: 128 or 256 bytes of internal RAM, a ROM address range determined by the parameter rom_addr_size, an optional interval timer, a variable number (0-2) of serial ports, and an interrupt unit that supports either six or thirteen sources. Although the DW8051 architecture can be varied somewhat, no changes can be made to its instruction set architecture.
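
A parameterized hardware description of this kind amounts to a small set of values with fixed legal ranges. The sketch below models the parameters listed above as a checked record. Only rom_addr_size is a name taken from the text; the class name, the other field names, and the requirement that rom_addr_size be positive are assumptions made for illustration and are not the core's actual parameter interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisParams:
    rom_addr_size: int                 # name from the text; legal range assumed
    internal_ram_bytes: int = 128      # hypothetical name: 128 or 256
    interval_timer: bool = False       # hypothetical name: optional timer
    serial_ports: int = 0              # hypothetical name: 0 to 2
    interrupt_sources: int = 6         # hypothetical name: 6 or 13

    def __post_init__(self):
        if self.rom_addr_size <= 0:
            raise ValueError("rom_addr_size must be positive")
        if self.internal_ram_bytes not in (128, 256):
            raise ValueError("internal RAM must be 128 or 256 bytes")
        if not 0 <= self.serial_ports <= 2:
            raise ValueError("serial port count must be 0, 1, or 2")
        if self.interrupt_sources not in (6, 13):
            raise ValueError("interrupt unit supports 6 or 13 sources")


params = SynthesisParams(rom_addr_size=12, internal_ram_bytes=256,
                         serial_ports=2, interrupt_sources=13)
```

Every field selects among options the designer of the core anticipated. Nothing in such a record can express a new instruction, which is the limitation the text points out.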

The ARM/Synopsys ARM7-S processor is a binary-compatible implementation of an existing architecture and microarchitecture. It has two configurable parameters: the choice of a high-performance or low-performance multiplier, and the inclusion of debug and in-circuit emulation logic. Although changes to the instruction set architecture of the ARM7-S are possible, they are subsets of existing non-configurable processor implementations, so no new software is required.

The Lexra LX-4080 processor is a configurable variant of the standard MIPS architecture and provides no software support for instruction set extensions. Its options include a custom engine interface that allows MIPS ALU opcodes to be extended with application-specific operations; an internal hardware interface that includes a register source and a register or 16-bit-wide immediate source, together with destination and stall signals; a simple memory management unit option; three MIPS coprocessor interfaces; a flexible local memory interface to cache, scratchpad RAM, or ROM; a bus controller that connects peripheral functions and memories to the processor's own local bus; and a write buffer of configurable depth.

The ARC configurable RISC core has a user interface with on-the-fly gate count estimation based on target technology and clock speed, instruction cache configuration, instruction set extensions, a timer option, a scratchpad memory option, and memory controller options; an instruction set with selectable options such as local scratchpad RAM with block move to memory, special registers, up to sixteen extra condition code choices, a 32x32-bit scoreboarded multiply block, a single-cycle 32-bit barrel shifter/rotate block, a normalize (find first bit) instruction, writing results directly to a command buffer (not to the register file), a 16-bit MUL/MAC block with a 36-bit accumulator, and sliding pointer access to local SRAM using linear arithmetic; and user instructions defined by manually editing the VHDL source code. The ARC design has no facility for implementing an instruction set description language, nor does it generate software tools specific to the configured processor.

The Synopsys configurable PCI interface includes a GUI or command-line interface for installation, configuration, and synthesis activities; checking at each step that the prerequisite user actions have been taken; installation of selected design files based on the configuration (for example, Verilog versus VHDL); selective configuration, such as parameter setting, in which the user is prompted for configuration values with checking of the validity of combinations, and HDL generation in which the HDL source code is updated from the user's choices without the user editing the HDL source files; and synthesis functions, such as a user interface that analyzes a technology library to select I/O buffers, technology-independent constraints and synthesis scripts, buffer insertion and prompts for technology-specific buffers, and translation of technology-independent formulas into technology-dependent scripts. The configurable PCI bus interface is notable because it implements consistency checking of parameters, configuration-based installation, and automatic modification of HDL files.

In addition, prior art synthesis techniques choose different mappings based on user goal specifications, allowing the mapping to be optimized for speed, power, area, or target components. However, in the prior art it is not possible to get feedback on the effect of reconfiguring the processor in this way without taking the design through the entire mapping process. Such feedback could be used to direct further reconfiguration of the processor until the system design goals are met.

The second category of prior art work in configurable processor generation (that is, automatic retargeting of compilers and assemblers) encompasses a rich area of academic research. See, for example, Hanono et al., "Instruction Selection, Resource Allocation and Scheduling in the AVIV Retargetable Code Generator" (representation of machine instructions used for automatic creation of code generators); Fauth et al., "Describing Instruction Set Processors Using nML"; Ramsey et al., "Machine Descriptions to Build Tools for Embedded Systems"; Aho et al., "Code Generation Using Tree Matching and Dynamic Programming" (algorithms for matching the transformations associated with each machine instruction, such as add, load, store, and branch, against a sequence of program operations expressed in some machine-independent intermediate form, using methods such as pattern matching); and Cattell, "Formalization and Automatic Derivation of Code Generators" (abstract descriptions of machine architectures used for compiler research).

Once the processor has been designed, its operation must be verified. Processors generally execute instructions from a stored program using a pipeline in which each stage is suited to one phase of instruction execution. Therefore, changing or adding an instruction, or changing the configuration, may require widespread changes in the processor's logic so that each of the multiple pipeline stages performs the appropriate action on each such instruction. Configuring a processor requires that it be re-verified, and that this verification adapt to the changes and additions. This is not a simple task. Processors are complex logic devices with extensive internal data and control state, and the combinatorics of control, data, and program make processor verification a demanding art. Adding to the difficulty of processor verification is the difficulty of developing appropriate verification tools. Because verification is not automated in prior art techniques, its flexibility, speed, and reliability are less than optimal.

In addition, once the processor is designed and verified, it is not particularly useful unless it can be programmed easily. Processors are generally programmed with the aid of extensive software tools, including compilers, assemblers, linkers, debuggers, simulators, and tracers. When the processor changes, the software tools must change as well. It does no good to add an instruction if that instruction cannot be compiled, assembled, simulated, or debugged. The cost of the software changes associated with processor modifications and enhancements has been a major impediment to flexible processor design in the prior art.

Thus, it can be seen that prior art processor design is difficult to such a degree that processors are generally not designed or modified for a specific application. It can also be seen that considerable improvements in system efficiency would be possible if processors could be configured or extended for specific applications. Further, the efficiency and effectiveness of the design process could be enhanced if feedback on implementation characteristics (such as power consumption and speed) could be used to refine a processor design. Moreover, in the prior art, once a processor is modified, a great deal of effort is required to verify the correct operation of the modified processor. Finally, although prior art techniques provide limited processor configurability, they do not provide for the generation of software development tools tailored to the configured processor.

Although a system meeting the above criteria would certainly be an improvement over the art, further improvements could be made. For example, there is a need for a processor system whose instructions can access or modify information stored in special registers (that is, processor state); the absence of such instructions significantly restricts the range of instructions obtainable and therefore limits the performance improvement achievable.

Also, inventing new application-specific instructions involves complicated tradeoffs among cycle count reduction, additional hardware resources, and the impact on CPU cycle time. Another challenge is to obtain efficient hardware implementations of the new instructions without involving application developers in the often intricate details of high-performance microprocessor implementation.

The above system gives the user the flexibility to design a processor well suited to her application, but it is still cumbersome for interactive development of hardware and software. To understand this problem more fully, consider the typical approach many software designers use to tune the performance of their software applications. They typically think of a potential improvement, modify their software to use it, recompile their source to produce a runnable application containing it, and then evaluate it. Depending on the results of the evaluation, they keep or discard the potential improvement. Typically, the whole process can be completed in just a few minutes. This lets users experiment freely, quickly trying out ideas and keeping or discarding them. In some cases, properly evaluating a potential idea is complicated, and the user may want to test it in a variety of situations. In such cases, users often keep multiple versions of the compiled application: an original version and another containing the potential improvement. In some cases, potential improvements may interact, and the user may keep more than two copies of the application, each using a different subset of the potential improvements. By keeping multiple versions, the user can easily test the different versions repeatedly under different circumstances.

Users of configurable processors would like to develop hardware and software jointly and interactively, in a manner similar to the way software developers develop software on traditional processors. Consider the case of a user adding custom instructions to a configurable processor. Users would like to add potential instructions to their processor interactively and to test and evaluate those instructions on their particular application. With prior art systems this is difficult for three reasons.

First, after proposing a potential instruction, the user must wait an hour or more before obtaining a compiler and simulator that can take advantage of the instruction.

Second, when the user wishes to experiment with many potential instructions, the user must create and keep a software development system for each one. A software development system can be very large, and keeping many versions can become unmanageable.

Finally, the software development system is configured for the entire processor, which makes it difficult to divide the development process among different engineers. Consider an example in which two developers work on a particular application. One developer may be responsible for deciding on the characteristics of the processor's cache, and the other for adding custom instructions. Although the work of the two developers is related, each piece is sufficiently separable that each developer can work on her task in isolation. The cache developer might initially propose a particular configuration. The other developer starts with that configuration and tries out several instructions, building a software development system for each potential instruction. Now the cache developer modifies the proposed cache configuration. The other developer must then rebuild every one of her configurations, because each of them assumed the original cache configuration. With many developers working on a project, organizing the different configurations can quickly become unmanageable.
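
The rebuild problem in this example can be made concrete with a small counting sketch. The function name and the configuration labels are hypothetical; the only assumption taken from the text is that one complete software development system is built per combination of cache configuration and candidate instruction.

```python
from itertools import product


def full_toolchain_builds(cache_configs, candidate_instructions):
    """List every (cache config, instruction) pair that needs its own build."""
    return list(product(cache_configs, candidate_instructions))


candidates = ["insn_a", "insn_b", "insn_c"]

# Builds made against the cache developer's first proposal.
builds = full_toolchain_builds(["cache_v1"], candidates)

# A revised cache proposal invalidates all of them, so each is rebuilt.
rebuilds = full_toolchain_builds(["cache_v2"], candidates)
```

Every change by one developer costs the other a full set of rebuilds, and keeping both cache proposals alive doubles the number of development systems to store.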

Summary of the invention

The present invention overcomes these problems of the prior art. One object of the invention is to provide a system that automatically configures a processor by generating, from the same configuration specification, both a description of a hardware implementation of the processor and a set of software development tools for programming the processor.

Another object of the present invention is to provide such a system that can optimize the hardware implementation and the software development tools for different performance criteria.

Yet another object of the present invention is to provide such a system that allows different types of configurability for the processor, including extensibility, binary selection and parameter modification.

A further object of the present invention is to provide such a system that describes the instruction set architecture of the processor in a language that can easily be implemented in hardware.

A still further object of the present invention is to provide a system and method for developing and implementing instruction set extensions that modify processor state.

Another object of the present invention is to provide a system and method for developing and implementing instruction set extensions that modify the registers of a configurable processor.

Yet another object of the present invention is to allow a user to customize a processor configuration by adding new instructions and to evaluate that feature within minutes.

The above objects are achieved by providing an automatic processor generation system that uses a description of customized processor instruction set options and extensions, written in a standardized language, to develop a configured definition of a target instruction set, a hardware description language (HDL) description of the circuitry needed to implement that instruction set, and development tools such as a compiler, assembler, debugger and simulator, which can be used to generate software for the processor and to verify the processor. The implementation of the processor circuitry can be optimized for different criteria such as area, power consumption and speed. Once a processor configuration has been developed, it can be tested and fed back into the system as input to be modified, so that the processor implementation is optimized iteratively.

To develop an automatic processor generation system according to the present invention, an instruction set architecture (ISA) description language is defined and development tools such as an assembler, linker, compiler and debugger are developed. This is part of the development process because, although most of the tools are standard, they must be modified so that they can be configured automatically from the ISA description. This part of the design process is typically carried out by the designer or manufacturer of the automatic processor design tool itself.

An automatic processor generation system according to the present invention operates as follows. A user, for example a system designer, develops a configured instruction set architecture. That is, using the ISA definition and the previously developed tools, a configurable instruction set architecture that follows certain ISA design goals is developed. The development tools and the simulator are then configured for this instruction set architecture. Using the configured simulator, benchmarks are run to evaluate the effectiveness of the configured instruction set architecture, and the core is revised on the basis of the evaluation results. Once the configured instruction set architecture is in a satisfactory state, a verification suite is developed for it.

While attending to the software side of the process, the system also attends to the hardware side by developing a configurable processor. Next, using system goals such as cost, performance, power and functionality, together with information about available processor manufacturers, the system designs an overall system architecture that takes into account the configurable ISA options, extensions and processor features. Using the overall system architecture, the development software, the simulator, the configurable instruction set architecture and the processor HDL implementation, the system configures the processor ISA, the HDL implementation, the software and the simulator, and the system HDL is designed for a system-on-a-chip design. Likewise, based on the system architecture and the specifications of chip foundries, a chip foundry is chosen by evaluating foundry capabilities with respect to the system HDL (rather than with respect to processor selection, as in the prior art). Finally, using the foundry's standard cell library, the configuration system synthesizes the circuitry, places and routes it, and provides the ability to re-optimize the layout and timing. Then, if the design is not of the single-chip type, circuit board layouts are designed, the chips are fabricated, and the boards are assembled.

As can be seen above, several techniques are used to enable extensive automation of the processor design process. The first technique is to design and implement specific mechanisms that are not as flexible as arbitrary modification or extension, but that still allow significant functional improvements. By limiting the arbitrariness of a change, the problems associated with it are also constrained.

The second technique is to provide a single description of the changes and to generate automatically the modifications or extensions to all affected components. Processors designed with prior-art techniques do not do this, because doing something once by hand is usually cheaper than writing a tool to do it automatically and then using the tool once. The advantage of automation appears when the task is repeated many times.

The third technique is to build a database that assists estimation and automatic configuration for subsequent user evaluation.

Finally, the fourth technique is to provide hardware and software in a form that lends itself to configuration. In one embodiment of the invention, some of the hardware and software is not written directly in standard hardware and software languages, but in a language augmented by a preprocessor that allows the configuration database to be queried and standard hardware and software language code to be generated with substitution, conditionals, replication and other modifications. The processor design is then completed with hooks through which the enhancements can be connected.
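The fourth technique can be illustrated with a minimal sketch of such a preprocessor. The directive syntax (`;if`, `;for`, `$name`), the database keys and the module names below are all hypothetical and chosen only for illustration; the source does not specify the preprocessor language. The sketch queries a configuration database and emits plain HDL text using substitution, conditional inclusion and replication.

```python
import re

# Hypothetical configuration database; keys and values are illustrative only.
CONFIG_DB = {"num_timers": 2, "has_mac16": True, "data_width": 32}

# Hypothetical template: directive lines start with ';', '$name' is substituted.
TEMPLATE = """\
module core(input clk);
  wire [$data_width-1:0] result;
;if has_mac16
  mac16 mac(.clk(clk));
;endif
;for i num_timers
  timer timer$i(.clk(clk));
;endfor
endmodule"""


def substitute(line, env):
    """Replace every $name in the line with the value bound to name in env."""
    return re.sub(r"\$(\w+)", lambda m: str(env[m.group(1)]), line)


def expand(template, db):
    """Expand a template into plain HDL text (directives are not nested)."""
    lines = template.splitlines()
    out = []
    i = 0
    while i < len(lines):
        words = lines[i].split()
        if words and words[0] in (";if", ";for"):
            end = ";endif" if words[0] == ";if" else ";endfor"
            j = i + 1
            while lines[j].strip() != end:   # find the closing directive
                j += 1
            body = lines[i + 1:j]
            if words[0] == ";if":
                if db[words[1]]:             # conditional inclusion
                    out += [substitute(l, db) for l in body]
            else:
                var, key = words[1], words[2]
                for n in range(db[key]):     # replication, var = 0..db[key]-1
                    out += [substitute(l, {**db, var: n}) for l in body]
            i = j + 1
        else:
            out.append(substitute(lines[i], db))
            i += 1
    return "\n".join(out)


hdl = expand(TEMPLATE, CONFIG_DB)
print(hdl)
```

Changing a value in the database and re-running the expansion regenerates the HDL text, which is the property the fourth technique relies on: the source is written once and specialized per configuration.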

To illustrate these techniques, consider the addition of application-specific instructions. By restricting the method to instructions that have register and constant operands and produce a register result, the operation of an instruction can be specified using only combinational (stateless, feedback-free) logic. The input specifies the opcode assignment, the instruction name, the assembler syntax and the combinational logic for the instruction, and from it the tools generate:

- instruction decode logic for the processor to recognize the new opcode;

- a functional unit that performs the combinational logic function on the register operands;

- inputs to the processor's instruction scheduling logic to ensure that the instruction is issued only when its operands are valid;

- assembler modifications to accept the new opcode and its operands and to generate correct machine code;

- compiler modifications that add new intrinsic functions for accessing the new instruction;

- disassembler/debugger modifications to translate the machine code back into the new instruction;

- simulator modifications to accept the new opcode and perform the specified logic function; and

- a diagnostic generator that produces both directed and random code sequences that include the added instructions and check their results.

All of the above techniques are used to add application-specific instructions. The input is restricted to the input and output operands and the logic that evaluates them. The changes are described in one place, and all hardware and software modifications are derived from that description. This arrangement shows how a single input can be used to enhance multiple components.
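The idea of deriving several components from one description can be sketched as follows. This is only an illustration under stated assumptions: the `InstructionSpec` type, the 32-bit word layout (8-bit opcode and three 8-bit register fields) and the `absdiff` instruction are hypothetical and are not taken from the patent. A single record per instruction drives an assembler, a disassembler and a simulator step.

```python
from dataclasses import dataclass
from typing import Callable

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class InstructionSpec:
    """Single description of an instruction (hypothetical format)."""
    name: str                             # assembler mnemonic
    opcode: int                           # assumed 8-bit opcode field
    semantics: Callable[[int, int], int]  # combinational: two operands -> result


def build_tools(specs):
    """Derive assembler, disassembler and simulator from the same specs."""
    by_name = {s.name: s for s in specs}
    by_opcode = {s.opcode: s for s in specs}
    if len(by_opcode) != len(specs):
        raise ValueError("duplicate opcode assignment")

    def assemble(text):
        # Assumed syntax: "<name> rD, rS, rT"
        name, operands = text.split(None, 1)
        rd, rs, rt = (int(r.strip().lstrip("r")) for r in operands.split(","))
        return (by_name[name].opcode << 24) | (rd << 16) | (rs << 8) | rt

    def disassemble(word):
        spec = by_opcode[word >> 24]
        rd, rs, rt = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        return f"{spec.name} r{rd}, r{rs}, r{rt}"

    def simulate(word, regs):
        # Stateless evaluation: the result depends only on the two operands.
        spec = by_opcode[word >> 24]
        rd, rs, rt = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        regs[rd] = spec.semantics(regs[rs], regs[rt]) & MASK32

    return assemble, disassemble, simulate


# Usage: one description, three derived tools.
absdiff = InstructionSpec("absdiff", 0x40, lambda a, b: abs(a - b))
assemble, disassemble, simulate = build_tools([absdiff])

word = assemble("absdiff r3, r1, r2")
regs = [0, 10, 25, 0]
simulate(word, regs)
print(hex(word), disassemble(word), regs[3])
```

Because every tool is built from the same record, changing the opcode or the semantics in one place changes all three consistently, which is the point of the single-description technique.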

The result of this process is a system that meets its application's needs better than prior-art systems, because trade-offs between the processor and the rest of the system logic can be made much later in the design process. It is superior to many of the prior-art approaches discussed above in that its configuration can be applied to many more forms of representation. A single source can be used for all ISA encoding, the software tools and high-level simulation can be included in one configurable package, and the flow can be designed for iteration to find the best combination of configuration values. Moreover, whereas the earlier methods concentrated on hardware configuration alone or software configuration alone, with no single user interface for control and no measurement system for user-directed redefinition, the present invention provides a complete flow for configuring both processor hardware and software, including feedback from hardware design results and software performance to help select the best configuration.

According to one aspect of the present invention, these objects are achieved by providing an automated processor design tool that uses a description of customized processor instruction set extensions, written in a standardized language, to develop a configurable definition of a target instruction set, a hardware description language description of the circuitry needed to implement that instruction set, and development tools such as a compiler, assembler, debugger and simulator, which can be used to develop applications for the processor and to verify it. The standardized language can handle instruction set extensions that modify processor state or use configurable processors. By providing a constrained domain of extensions and optimizations, the process can be automated to a high degree, which promotes fast and reliable development.

According to another aspect of the present invention, the above objects are further achieved by providing a system in which the user can keep multiple sets of potential instructions or state (hereinafter, combinations of potential configurable instructions or state are referred to collectively as "processor enhancements") and easily switch between them when evaluating an application.

The user selects and builds a base processor using the methods described here. The user creates a new set of user-defined processor enhancements and places them in a file directory. The user then invokes a tool that processes the user enhancements and converts them into a form usable by the base software development tools. This conversion is fast, because it involves only the user-defined enhancements and does not build a complete software system. The user then invokes the base software development tools, telling them to use the processor enhancements created in the new directory dynamically. Preferably, the location of the directory is given to the tools through a command-line option or through an environment variable. To simplify the process further, the user can use standard software makefiles. These allow users to modify their processor instructions and then, with a single make command, process the enhancements and use the base software development tools to rebuild and evaluate their application in the context of the new processor enhancements.
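A minimal sketch of how a base tool might locate the enhancement directory is given below. The option name `--enhancement-dir` and the variable name `PROC_ENHANCEMENT_DIR` are hypothetical, and the rule that the command-line option takes precedence over the environment variable is an assumption; the source says only that either mechanism may be used.

```python
import argparse
from pathlib import Path

ENV_VAR = "PROC_ENHANCEMENT_DIR"   # hypothetical environment variable name


def resolve_enhancement_dir(argv, environ):
    """Return (enhancement directory or None, remaining tool arguments).

    The command-line option wins over the environment variable
    (an assumed precedence, not stated in the source).
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--enhancement-dir")          # hypothetical option name
    args, remaining = parser.parse_known_args(argv)
    chosen = args.enhancement_dir or environ.get(ENV_VAR)
    return (Path(chosen) if chosen else None), remaining


# Usage: the same tool invocation evaluated against two enhancement sets.
d1, rest1 = resolve_enhancement_dir(["--enhancement-dir", "enh/set_a", "app.c"], {})
d2, rest2 = resolve_enhancement_dir(["app.c"], {ENV_VAR: "enh/set_b"})
print(d1, rest1)
print(d2, rest2)
```

Switching between sets of enhancements then amounts to pointing the tools at a different directory, without rebuilding the base software system.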

The present invention thus overcomes the three limitations of the prior-art approach. Given a new set of potential enhancements, the user can evaluate them within minutes. By creating a new directory for each set, the user can keep multiple versions of potential enhancements. Because the directory contains only a description of the new enhancements, and not the entire software system, the storage required is minimal. Finally, the new enhancements are decoupled from the rest of the configuration. Once the user has created a directory containing a set of potential new enhancements, she can use that directory with any base configuration.

Brief description of the drawings

The above and other objects of the present invention will become more apparent from the following detailed description when read in conjunction with the accompanying drawings, in which:

Figure 1 is a block diagram of a processor implementing an instruction set according to a preferred embodiment of the present invention;

Figure 2 is a block diagram of a pipeline used in the processor according to this embodiment;

Figure 3 shows a configuration manager in the graphical user interface (GUI) according to this embodiment;

Figure 4 shows a configuration editor in the GUI according to this embodiment;

Figure 5 shows the different types of configurability according to this embodiment;

Figure 6 is a block diagram showing the flow of processor configuration in this embodiment;

Figure 7 is a block diagram of an instruction set according to this embodiment;

Figure 8 is a block diagram of an emulation board for a processor configured according to the present invention;

Figure 9 is a block diagram showing the logical architecture of the configurable processor according to this embodiment;

Figure 10 is a block diagram showing the addition of a multiplier to the architecture of Figure 9;

Figure 11 is a block diagram showing the addition of a multiply-accumulate unit to the architecture of Figure 9;

Figures 12 and 13 are diagrams showing the memory configuration in this embodiment;

Figures 14 and 15 are diagrams showing the addition of user-defined functional units to the architecture of Figure 8;

Figure 16 is a block diagram showing the flow of information between system components in another preferred embodiment;

Figure 17 is a block diagram showing how customized code for the software development tools is generated in this embodiment;

Figure 18 is a block diagram showing the generation of the software modules used in another preferred embodiment of the present invention;

Figure 19 is a block diagram of the pipeline structure in a configurable processor according to this embodiment;

Figure 20 shows an implementation of a state register according to this embodiment;

Figure 21 is a diagram showing the additional logic needed to implement the state register in this embodiment;

Figure 22 is a diagram showing how the next-state outputs for one state from several semantic blocks are combined, and how one of them is selected as the input to a state register, according to this embodiment;

Figure 23 shows the logic corresponding to the semantic logic according to this embodiment; and

Figure 24 shows the logic for one bit of a state when it is mapped to one bit of a user register in this embodiment.

Detailed description of the preferred embodiments

In general, the automatic processor generation process starts from a configurable processor definition and user-specified modifications to it, together with a user-specified application for which the processor is to be configured. This information is used to generate a configured processor that incorporates the user modifications, and to generate software development tools for it, for example a compiler, simulator, assembler and disassembler. The application is then recompiled using the new software development tools. The recompiled application is simulated with the simulator to produce a software profile describing the performance of the configured processor running the application, and the configured processor is evaluated for silicon area, power consumption, speed and the like to produce a hardware profile characterizing the processor circuit implementation. The software and hardware profiles are fed back to the user for further iterative configuration, so that the processor is optimized for that particular application.

An automatic processor generation system 10 according to a preferred embodiment of the present invention has four main components, as shown in Figure 1: a user configuration interface 20, through which a user who wishes to design a processor enters configurability and extensibility options and other design constraints; a suite of software development tools 30, which can be customized for a processor designed to the criteria selected by the user; a parameterized, extensible description of a hardware implementation of the processor 40; and a build system 50, which receives input data from the user interface, generates a customized, synthesizable hardware description of the requested processor, and modifies the software development tools to suit the chosen design. Preferably, the build system 50 additionally generates diagnostic tools for verifying the hardware and software designs, and an estimator for evaluating hardware and software characteristics.

As used herein and in the appended claims, "hardware implementation description" means one or more descriptions that describe aspects of the physical implementation of a processor design and that, alone or in combination with one or more other descriptions, facilitate the production of chips according to that design. The parts of a hardware implementation description may therefore be at different levels of abstraction, from relatively high levels such as a hardware description language, through netlists and microcode, down to mask descriptions. In this embodiment, the main parts of the hardware implementation description are written in HDL, netlists and scripts.

Further, HDL as used herein and in the appended claims refers to the general class of hardware description languages used to describe microarchitectures and the like, and is not intended to denote any particular instance of such a language.

In this embodiment, the basis for processor configuration is the architecture 60 shown in Figure 2. Many elements of the architecture are basic features that the user cannot modify directly. These include the processor control section 62, the align and decode section 64 (although parts of this section depend on the user-specified configuration), the ALU and address generation section 66, the branch logic and instruction fetch section 68, and the processor interface 70. Other units are part of the basic processor but are user-configurable. These include the interrupt control section 72, the data and instruction address watch sections 74 and 76, the window register file 78, the data and instruction cache and tag sections 80, the write buffer 82 and the timers 84. The remaining sections shown in Figure 2 may optionally be included by the user.

The central component of the processor configuration system 10 is the user configuration interface 20. This is a module that preferably presents the user with a graphical user interface (GUI) through which the user can select processor functionality, including reconfiguration of the compiler, assembler, disassembler and instruction set simulator (ISS), and prepare the input for full processor synthesis, placement and routing. It also gives the user fast estimates of processor area, power consumption, cycle time, application performance and code size, for further iteration and refinement of the processor configuration. Preferably, the GUI also accesses a configuration database to obtain default values and to perform error checking based on user input.

To design a processor 60 with the automatic processor generation system 10 according to this embodiment, the user enters design parameters into the user configuration interface 20. The automatic processor generation system 10 may be a stand-alone system running on a computer system under the user's control; preferably, however, it runs primarily on a system under the control of the manufacturer of the automatic processor generation system 10. User access can then be provided over a communication network. For example, the GUI can be provided through a web browser with data entry screens written in HTML and Java. This has several advantages, such as keeping any proprietary back-end software confidential and simplifying maintenance and updating of the back-end software. In this case, to access the GUI the user first logs in to the system 10 to prove their identity.

Once the user has been granted access, the system displays a configuration manager screen 86, as shown in Figure 3. The configuration manager screen 86 is a directory listing all configurations accessible to the user. The configuration manager screen 86 in Figure 3 shows that the user has two configurations, "just intr" and "high prio"; the former has already been built, that is, finalized for production, and the latter has yet to be built. From this screen 86 the user can build a selected configuration, delete it, edit it, generate a report specifying which configuration and extension options were chosen for it, or create a new configuration. For configurations that have been built, such as "just intr", the suite of software development tools 30 customized for them can be downloaded.

Figure 4 shows the configuration editor 88, which is used to create a new configuration or edit an existing one. The configuration editor 88 has an "Options" selection menu on the left that shows the general aspects of the processor 60 that can be configured and extended. When an option section is selected, a screen with the configuration options for that section appears on the right, and these options can be set with pull-down menus, memo boxes, check boxes, radio buttons and the like, as is well known in the art. Although the user can select options and enter data in any order, it is preferable to enter the data section by section in sequence, because there are logical dependencies between the sections; for example, to display the options in the "Interrupts" section properly, the number of interrupts should already have been chosen in the "ISA Options" section.

In this embodiment, the following configuration options are available for each section:

Target
  Technology for estimation
    Target ASIC technology: .18, .25, .35 micron
    Target operating condition: typical, worst case
  Implementation goals
    Target speed: arbitrary
    Gate count: arbitrary
    Target function: arbitrary
    Goal prioritization: speed, area, function; speed, function, area

ISA options
  Numeric options
    MAC16 with 40-bit accumulator: yes, no
    16-bit multiplier: yes, no
  Exception options
    Number of interrupts: 0-32
    High-priority interrupt levels: 0-14
    Enable debugging: yes, no
    Number of timers: 0-3
  Other
    Byte order: little-endian, big-endian
    Number of registers available for call windows: 32, 64

Processor cache and memory
  Processor interface read width (bits): 32, 64, 128
  Write buffer entries (address/value pairs): 4, 8, 16, 32
  Processor cache
    Instruction/data cache size (kB): 1, 2, 4, 8, 16
    Instruction/data cache line size (kB): 16, 32, 64

Peripheral components
  Timers
    Timer interrupt numbers
    Timer interrupt levels

Debugging support
  Number of instruction address breakpoint registers: 0-2
  Number of data address breakpoint registers: 0-2
  Debug interrupt level
  Trace port: yes, no
  On-chip debug module: yes, no
  Full scan: yes, no

Interrupts
  Source: external, software
  Priority level

System memory addresses
  Vector and address calculation method: XTOS, manual
  Configuration parameters
    RAM size, start address: arbitrary
    ROM size, start address: arbitrary
    XTOS: arbitrary
  Configuration-specific addresses
    User exception vector: arbitrary
    Kernel exception vector: arbitrary
    Register window overflow/underflow vector base address: arbitrary
    Reset vector: arbitrary
    XTOS start address: arbitrary
    Application start address: arbitrary

TIE instructions
  (define ISA extensions)

Target CAD environment
  Simulation
    Verilog™: yes, no
  Synthesis
    Design Compiler™: yes, no
  Place and route
    Apollo™: yes, no

In addition, the system 10 provides options for adding other functional units, such as a 32-bit integer multiply/divide unit or a floating-point arithmetic unit; a memory management unit; on-chip RAM and ROM options; cache associativity; enhanced DSP and coprocessor instruction sets; a write-back cache; multiprocessor synchronization; compiler-directed speculation; and support for additional CAD packages. The configuration options available for a given configurable processor are preferably listed in a definition file (such as the one shown in Appendix A), which the system 10 uses for syntax checking and the like once the user has selected the appropriate options.
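The use of an option definition for checking user input can be sketched as follows. The dictionary format and the key names are hypothetical (the format of the Appendix A definition file is not reproduced here); the allowed values are the ones listed in the option table above.

```python
# Hypothetical option definitions; the allowed values follow the option list
# in the text, but the key names and the dictionary format are assumptions.
OPTION_DEFS = {
    "num_interrupts": range(0, 33),          # 0-32
    "high_priority_levels": range(0, 15),    # 0-14
    "num_timers": range(0, 4),               # 0-3
    "byte_order": {"little", "big"},
    "call_window_registers": {32, 64},
    "interface_read_width_bits": {32, 64, 128},
    "write_buffer_entries": {4, 8, 16, 32},
    "cache_size_kb": {1, 2, 4, 8, 16},
    "mac16": {True, False},
}


def check_config(config):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []
    for key, value in config.items():
        allowed = OPTION_DEFS.get(key)
        if allowed is None:
            errors.append(f"unknown option: {key}")
        elif value not in allowed:
            errors.append(f"{key}={value!r} is not an allowed value")
    return errors


# Usage: one valid and one invalid configuration.
ok = check_config({"num_interrupts": 8, "num_timers": 3, "byte_order": "little"})
bad = check_config({"num_interrupts": 40, "cache_size_kb": 3, "fpu": True})
print(ok)
print(bad)
```

Keeping the allowed values in data rather than in code means that the same checker can serve any configurable processor whose options are listed in such a definition.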

As can be seen from the above, the automated processor configuration system 10 provides the user with two broad types of configurability 300, as shown in FIG. 5: extensibility 302, which allows the user to define arbitrary functions and structures from scratch, and modifiability 304, which allows the user to select from a predetermined, constrained set of options. Within modifiability, the system allows binary selection 306 of certain features, for example whether a MAC16 or a DSP should be added to processor 60, and parametric specification 308 of other processor features, for example the number of interrupts and the cache size.

Many of the above configuration options will be familiar to those skilled in the art; however, others merit particular attention. For example, the RAM and ROM options allow the designer to include scratchpad memory or firmware on the processor itself. Processor 10 can fetch instructions or read and write data from these memories. The size and placement of the memories are configurable. In this embodiment, each of these memories is accessed as an additional set in a set-associative cache. A hit in the memory can be detected by comparison with a single tag entry.

Because each high-priority interrupt requires three special registers and is therefore more expensive, system 10 provides separate configuration options for the interrupt option (implementing level-1 interrupts) and the high-priority interrupt option (implementing interrupt levels 2-15 and non-maskable interrupts).

The MAC16 with 40-bit accumulator option (shown at 90 in FIG. 2) adds a 16-bit multiplier/adder function with a 40-bit accumulator, eight 16-bit operand registers, and a set of compound instructions that combine multiply, accumulate, operand load, and address update operations. The operand registers can be loaded with pairs of 16-bit values from memory in parallel with multiply/accumulate operations. This unit can sustain algorithms with two loads and one multiply/accumulate per cycle.

The on-chip debug module (shown at 92 in FIG. 2) is used to access the internal, software-visible state of processor 60 through JTAG port 94. Module 92 provides support for generating an exception that places processor 60 in debug mode; access to all program-visible memory or memory locations; execution of any instruction that processor 60 is configured to execute; modification of the program counter (PC) to jump to a desired location in the code; and a program that allows a return to normal operating mode, triggered from outside processor 60 via JTAG port 94.

Once processor 10 enters debug mode, it waits for an indication from the outside world that a valid instruction has been scanned in via JTAG port 94. Once a hardware implementation of processor 10 has been manufactured, module 92 is used to debug the system. Execution of processor 10 can be controlled by a debugger running on a remote host. The debugger interfaces with the processor via JTAG port 94 and uses the capabilities of on-chip debug module 92 to determine and control the state of processor 10 and to control the execution of instructions.

Up to three 32-bit counter/timers 84 can be configured. This entails a 32-bit register that increments every clock cycle and, for each configured timer, a compare register and a comparator that compares the contents of the compare register with the current clock register count, for use in interrupts and similar functions. The counter/timers can be configured as edge-triggered and can generate normal or high-priority internal interrupts.

The speculation option gives the compiler greater scheduling flexibility by allowing loads to be moved speculatively across control flow to points where they are not always executed. Because loads may cause exceptions, such load movement could introduce exceptions into a valid program that would not otherwise have occurred. Speculative loads prevent these exceptions from occurring when the load is executed, but raise an exception when the data is required. Instead of causing an exception on a load error, a speculative load resets the valid bit of the destination register (new processor state associated with this option).

Although core processor 60 preferably has some basic pipeline synchronization capability, when multiple processors are used in a system, some form of communication or synchronization between the processors is required. In some cases, self-synchronizing communication techniques such as input and output queues are used. In other cases, a shared memory model is used for communication, and because shared memory does not provide the required semantics, instruction set support for synchronization is necessary. For example, load and store instructions with acquire and release semantics can be added. These are useful for controlling the ordering of memory references in multiprocessor systems in which different memory locations may be used for synchronization and for data, so that precise ordering between synchronization references must be maintained. Other instructions may be used to create semaphores well known in the art.

In some cases, a shared memory model is used for communication, and because shared memory does not provide the required semantics, instruction set support for synchronization is necessary. This is provided by the multiprocessor synchronization option.

Perhaps the most notable of the configuration options is the definition of TIE instructions, from which the designer-defined instruction execution unit 96 is built. The TIE™ (Tensilica Instruction Set Extension) language, developed by Tensilica, Inc. of Santa Clara, California, allows the user to describe custom functions for his applications in the form of extensions and new instructions that augment the base ISA. Additionally, because of TIE's flexibility, it may be used to describe portions of the ISA that cannot be changed by the user; in this way, the entire ISA can be used to generate the software development tools 30 and the hardware implementation description 40 uniformly. A TIE description uses a number of building blocks to describe the attributes of new instructions, as follows:

- Instruction fields

- Instruction classes

- Instruction opcodes

- Instruction semantics

- Instruction operands

- Constant tables

The instruction field statement, field, is used to improve the readability of TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by a name. The complete set of bits in an instruction is the top-level superset field inst, and this field can be divided into smaller fields. For example,

    field x inst[11:8]
    field y inst[15:12]
    field xy {x, y}

defines two 4-bit fields, x and y, as sub-fields of the top-level field inst (bits 8-11 and 12-15, respectively) and an 8-bit field xy as the concatenation of the x and y fields.
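The field definitions above can be mirrored in software. The following is a minimal sketch, assuming a 24-bit instruction word held in a Python integer; the helper names `bits` and `decode_fields` are hypothetical and not part of TIE.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo], an inclusive bit range in Verilog-style notation."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_fields(inst: int) -> dict:
    """Extract the x, y and xy fields from an instruction word."""
    x = bits(inst, 11, 8)    # field x inst[11:8]
    y = bits(inst, 15, 12)   # field y inst[15:12]
    xy = (x << 4) | y        # field xy {x, y}: x forms the upper four bits
    return {"x": x, "y": y, "xy": xy}


fields = decode_fields(0x00A500)
print(fields)
```

In a concatenation such as `{x, y}` the leftmost operand occupies the most significant bits, which is why `x` is shifted left before `y` is merged in.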

The opcode statement defines opcodes for encoding specific fields. Instruction fields intended to specify operands (for example, registers or immediate constants) for use by the opcodes so defined must first be defined with field statements and then with operand statements.

For example,

    opcode acs op2=4'b0000 CUST0
    opcode adse1 op2=4'b0001 CUST0

define two new opcodes, acs and adse1, in terms of the previously defined opcode CUST0 (4'b0000 denotes a 4-bit binary constant 0000). The TIE description of the preferred core ISA has the statements

    field op0 inst[3:0]
    field op1 inst[19:16]
    field op2 inst[23:20]
    opcode QRST op0=4'b0000
    opcode CUST0 op1=4'b1000 QRST

as part of its base definition. Thus, the definitions of acs and adse1 cause the TIE compiler to generate instruction decoding logic represented, respectively, by:

    inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
    inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000

The instruction operand statement, operand, identifies registers and immediate constants. Before a field can be defined as an operand, however, it must already have been defined as a field as described above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table as described below. For example, to encode an immediate operand, the TIE code

    field offset inst[23:6]
    operand offsets4 offset {
        assign offsets4 = {{14{offset[17]}}, offset} << 2;
    } {
        wire [31:0] t;
        assign t = offsets4 >> 2;
        assign offset = t[17:0];
    }

defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. The last part of the operand statement describes the circuitry that performs the computation, written in a subset of the Verilog™ HDL for describing combinational circuits, as will be apparent to those skilled in the art.

Here, the wire statement defines a set of logical wires named t, 32 bits wide. The first assign statement after the wire statement specifies that the logic signals driving the wires are the offsets4 constant shifted right, and the second assign statement specifies that the lower 18 bits of t are placed into the offset field. The very first assign statement directly specifies the value of the offsets4 operand as the concatenation of 14 copies of the sign bit of offset (bit 17) with offset itself, followed by a left shift of two bits.
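The two halves of the operand statement form a decode/encode pair. The sketch below reproduces that arithmetic in Python under the assumption of 32-bit operand values; the function names are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK18 = 0x3FFFF


def decode_offsets4(offset: int) -> int:
    """Field -> operand: {{14{offset[17]}}, offset} << 2, truncated to 32 bits."""
    offset &= MASK18
    sign = (offset >> 17) & 1
    extended = ((0x3FFF if sign else 0) << 18) | offset  # 14 copies of the sign bit
    return (extended << 2) & MASK32


def encode_offset(offsets4: int) -> int:
    """Operand -> field: t = offsets4 >> 2; offset = t[17:0]."""
    t = (offsets4 & MASK32) >> 2
    return t & MASK18


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


print(to_signed32(decode_offsets4(0x3FFFF)))  # field holds -1
```

Because the operand is always a multiple of four, encoding followed by decoding returns the original value for any offset representable in the 18-bit field.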

For a constant table operand, the TIE code

    table prime 16 {
        2, 3, 5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
        53
    }
    operand prime_s s {
        assign prime_s = prime[s];
    } {
        assign s = prime_s == prime[0] ? 4'b0000 :
                   prime_s == prime[1] ? 4'b0001 :
                   prime_s == prime[2] ? 4'b0010 :
                   prime_s == prime[3] ? 4'b0011 :
                   prime_s == prime[4] ? 4'b0100 :
                   prime_s == prime[5] ? 4'b0101 :
                   prime_s == prime[6] ? 4'b0110 :
                   prime_s == prime[7] ? 4'b0111 :
                   prime_s == prime[8] ? 4'b1000 :
                   prime_s == prime[9] ? 4'b1001 :
                   prime_s == prime[10] ? 4'b1010 :
                   prime_s == prime[11] ? 4'b1011 :
                   prime_s == prime[12] ? 4'b1100 :
                   prime_s == prime[13] ? 4'b1101 :
                   prime_s == prime[14] ? 4'b1110 :
                   4'b1111;
    }

uses the table statement to define an array prime of constants (the number following the table name is the number of elements in the table) and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
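The table lookup and its inverse can be sketched as follows. This is an illustration only: it assumes that the table holds the first 16 values of the listing, matching the declared size of 16, and the names `decode_prime_s` and `encode_s` are hypothetical.

```python
# First 16 values from the listing (assumption: the declared size of 16 governs).
PRIME = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def decode_prime_s(s: int) -> int:
    """Field -> operand: prime_s = prime[s], with s a 4-bit index."""
    return PRIME[s & 0xF]


def encode_s(prime_s: int) -> int:
    """Operand -> field: the chain of comparisons, defaulting to 4'b1111."""
    for index in range(15):
        if prime_s == PRIME[index]:
            return index
    return 0b1111


print(encode_s(13), decode_prime_s(6))
```

As in the TIE code, the final index is a default rather than a comparison, so a value absent from the table also encodes to 15.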

The instruction class statement, iclass, associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the code used in the preceding example defining the opcodes acs and adse1, the additional statements

    operand art t { assign art = AR[t]; } {}
    operand ars s { assign ars = AR[s]; } {}
    operand arr r { assign AR[r] = arr; } {}

use the operand statement to define three register operands art, ars, and arr (again, note the use of Verilog™ statements in the definition). Then, the iclass statement

    iclass viterbi {adse1, acs} {out arr, in art, in ars}

specifies that the opcodes adse1 and acs belong to a common class of instructions, viterbi, which take two register operands art and ars as input and write their output to a register operand arr.

The instruction semantic statement, semantic, describes the behavior of one or more instructions using the same subset of Verilog™ used for coding operands. By defining multiple instructions in a single semantic statement, some common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes defined in the statement's opcode list, and a single-bit variable for each opcode specified in the opcode list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.

For example, the following TIE code defines a new instruction ADD8_4, which adds the four 8-bit operands in one 32-bit word to the corresponding four 8-bit operands in another 32-bit word, and a new instruction MIN16_2, which selects the minimum of each of the two 16-bit operands in one 32-bit word and the corresponding 16-bit operand in another 32-bit word:

    opcode ADD8_4 op2=4'b0000 CUST0
    opcode MIN16_2 op2=4'b0001 CUST0
    iclass add_min {ADD8_4, MIN16_2} {out arr, in ars, in art}
    semantic add_min {ADD8_4, MIN16_2} {
        wire [31:0] add, min;
        wire [7:0] add3, add2, add1, add0;
        wire [15:0] min1, min0;
        assign add3 = art[31:24] + ars[31:24];
        assign add2 = art[23:16] + ars[23:16];
        assign add1 = art[15:8] + ars[15:8];
        assign add0 = art[7:0] + ars[7:0];
        assign add = {add3, add2, add1, add0};
        assign min1 = art[31:16] < ars[31:16] ? art[31:16] :
                      ars[31:16];
        assign min0 = art[15:0] < ars[15:0] ? art[15:0] :
                      ars[15:0];
        assign min = {min1, min0};
        assign arr = (({32{{ADD8_4}}}) & (add)) |
                     (({32{{MIN16_2}}}) & (min));
    }

Here, op2, CUST0, arr, art, and ars are predefined operands as noted above, and the opcode and iclass statements function as described above.

The semantic statement specifies the computations performed by the new instructions. As will be readily apparent to those skilled in the art, the second line of the semantic statement specifies the computation performed by the new ADD8_4 instruction, the third and fourth lines specify the computation performed by the new MIN16_2 instruction, and the last line of the section specifies the result written to the arr register.
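The behavior described by the semantic statement can be modeled in software. The sketch below is a bit-level approximation in Python, assuming unsigned comparison for the minimum selection and that at most one of the two opcode variables is 1 at a time; the function names are hypothetical.

```python
def add8_4(art: int, ars: int) -> int:
    """Four independent 8-bit additions; the carry out of each lane is dropped."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (art >> shift) & 0xFF
        b = (ars >> shift) & 0xFF
        result |= ((a + b) & 0xFF) << shift
    return result


def min16_2(art: int, ars: int) -> int:
    """Two independent 16-bit unsigned minimum selections."""
    result = 0
    for lane in range(2):
        shift = 16 * lane
        a = (art >> shift) & 0xFFFF
        b = (ars >> shift) & 0xFFFF
        result |= (a if a < b else b) << shift
    return result


def add_min(is_add8_4: int, is_min16_2: int, art: int, ars: int) -> int:
    """Mirror of the final assign: replicate each opcode bit into a 32-bit mask."""
    add_mask = 0xFFFFFFFF if is_add8_4 else 0
    min_mask = 0xFFFFFFFF if is_min16_2 else 0
    return (add_mask & add8_4(art, ars)) | (min_mask & min16_2(art, ars))


print(hex(add_min(1, 0, 0x01FF10F0, 0x01011020)))
print(hex(add_min(0, 1, 0x00050009, 0x00070003)))
```

The masking in `add_min` corresponds to the `{32{...}}` replication in the TIE code: the one-bit opcode variable gates which partial result reaches the output register.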

Returning to the discussion of user input interface 20, once the user has entered all of the configuration and extension options she desires, build system 50 takes over. As shown in FIG. 5, build system 50 receives a configuration specification consisting of the parameters set by the user and the extensible features designed by the user, and combines them with additional parameters defining the core processor architecture (for example, features that the user cannot modify) to create a configuration specification 100 describing the entire processor. For example, in addition to the configuration settings 102 chosen by the user, build system 50 might add parameters specifying the number of physical address bits for the processor's physical address space, the first instruction to be executed by processor 60 after reset, and the like.

As an example of the instructions implemented as core instructions in a configurable processor and of the instructions made available through the selection of configuration options, the Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, by Tensilica, Inc. is incorporated herein by reference.

Configuration specification 100 also includes an ISA package containing TIE language statements specifying the base ISA, any additional packages that the user has selected, such as coprocessor package 98 (see FIG. 2) or a DSP package, and any TIE extensions supplied by the user. Additionally, configuration specification 100 may contain a number of statements setting flags that indicate whether certain structural features are to be included in processor 60. For example,

    IsaUseDebug 1
    IsaUseInterrupt 1
    IsaUseHighPriorityInterrupt 0
    IsaUseException 1

indicates that the processor will include on-chip debug module 92, interrupt facilities 72, and exception handling, but not high-priority interrupt facilities.

Using configuration specification 100, the following can be generated automatically:

- the instruction decode logic of processor 60;

- the illegal instruction detection logic for processor 60;

- the ISA-specific portion of assembler 110;

- the ISA-specific support routines for compiler 108;

- the ISA-specific portion of disassembler 110 (used by the debugger); and

- the ISA-specific portion of simulator 112.

Automatic generation of these items is valuable because an important configuration capability is specifying which instruction packages are included. For some purposes, this could be implemented with conditional code in each tool that handles an instruction only if it has been configured, but this is awkward; more importantly, it would not allow the system designer to easily add instructions to his system.

In addition to taking a configuration specification 100 as input from the designer, it is also possible to accept goals and have build system 50 determine the configuration automatically. The designer can specify goals for processor 60. For example, clock rate, area, cost, typical power consumption, and maximum power consumption may all be goals. Some of these goals conflict (for example, performance can often be increased only by increasing area, power consumption, or both). Build system 50 therefore consults search engine 106 to determine the set of available configuration options and how to set each option, using an algorithm that attempts to meet the input goals simultaneously.

Search engine 106 includes a database whose entries describe the effects of options on the various metrics. An entry can specify that a particular configuration setting has an additive, multiplicative, or limiting effect on a metric. Entries can also be marked as requiring other configuration options as prerequisites, or as being incompatible with other options. For example, the simple branch prediction option can specify a multiplicative or additive effect on cycles per instruction (CPI, a determinant of performance), a limit on clock rate, an additive effect on area, and an additive effect on power. It can be marked as incompatible with a more elaborate branch predictor, and as dependent on the instruction fetch queue size being set to at least two entries. The values of these effects can be a function of a parameter, such as the branch prediction table size. In general, the database entries are represented by functions that can be evaluated.

Various algorithms can be used to find the configuration settings that come closest to achieving the input goals. For example, a simple knapsack packing algorithm considers each option in sorted order of value divided by cost and accepts any option specification that increases value while keeping cost below a specified limit. Thus, to maximize performance while keeping power below a specified value, the options would be sorted by performance divided by power, and each option that increases performance without exceeding the power limit would be accepted. More sophisticated knapsack algorithms provide some amount of backtracking.
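The simple knapsack heuristic just described can be sketched as follows. The option names and their value and cost figures are invented for illustration and are not taken from any real design database; `Option` and `greedy_select` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    value: float  # benefit, e.g. estimated performance gain
    cost: float   # expense, e.g. estimated additional power


def greedy_select(options, cost_limit):
    """Accept options in order of value/cost while the cost limit is respected."""
    def ratio(option):
        return option.value / option.cost if option.cost > 0 else float("inf")

    chosen, total_cost = [], 0.0
    # Only options that increase value are candidates at all.
    candidates = [o for o in options if o.value > 0]
    for option in sorted(candidates, key=ratio, reverse=True):
        if total_cost + option.cost <= cost_limit:
            chosen.append(option.name)
            total_cost += option.cost
    return chosen, total_cost


# Hypothetical figures, for illustration only.
catalog = [
    Option("mac16", 30, 12),
    Option("cache_8k", 20, 5),
    Option("fpu", 25, 40),
    Option("timer", 1, 1),
]
print(greedy_select(catalog, cost_limit=20))
```

This version never revisits a decision, which is exactly the limitation that the backtracking variants mentioned above are meant to address.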

A very different class of algorithm for determining the configuration from the goals and the design database is based on simulated annealing. A random initial set of parameters is used as the starting point, and changes to individual parameters are then accepted or rejected by evaluating a global utility function. Improvements in the utility function are generally accepted, while negative changes are accepted probabilistically according to a threshold that declines as the optimization proceeds. In this system, the utility function is constructed from the input goals. For example, given the goals performance > 200, power < 100, and area < 4, with power, area, and performance prioritized in that order, the following utility function could be used:

    max((1 - Power/100) * 0.5, 0)
      + (max((1 - Area/4) * 0.3, 0) * (if Power < 100 then 1 else (1 - Power/100)**2))
      + (max(Performance/200 * 0.2, 0) * (if Power < 100 then 1 else (1 - Power/100)**2) * (if Area < 4 then 1 else (1 - Area/4)**2))

It rewards decreases in power consumption until power is below 100 and is then neutral, rewards decreases in area until area is below 4 and is then neutral, and rewards increases in performance until performance is above 200 and is then neutral. There are also components that reduce the weight of area when power is out of specification, and reduce the weight of performance when power or area is out of specification.
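The utility expression can be transcribed directly into code. The sketch below implements the expression as written and adds a hypothetical acceptance rule for a simulated-annealing step; the `accept` function and its `threshold` parameter are assumptions made for illustration, not details given in the text.

```python
import random


def utility(power: float, area: float, performance: float) -> float:
    """Direct transcription of the example utility expression."""
    power_factor = 1.0 if power < 100 else (1 - power / 100) ** 2
    area_factor = 1.0 if area < 4 else (1 - area / 4) ** 2
    power_term = max((1 - power / 100) * 0.5, 0)
    area_term = max((1 - area / 4) * 0.3, 0) * power_factor
    perf_term = max(performance / 200 * 0.2, 0) * power_factor * area_factor
    return power_term + area_term + perf_term


def accept(delta: float, threshold: float, rng: random.Random) -> bool:
    """Assumed rule: keep improvements; keep a worse move with probability `threshold`."""
    return delta >= 0 or rng.random() < threshold


before = utility(power=80, area=3, performance=150)
after = utility(power=60, area=3, performance=150)
print(before, after, accept(after - before, threshold=0.1, rng=random.Random(0)))
```

In an annealing loop, `threshold` would be lowered as the search proceeds so that worsening moves become progressively rarer.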

Both of these algorithms, and others, can be used to search for configurations that satisfy the specified goals. What matters is that the configurable processor design has been described in a design database that records prerequisites and incompatibilities among options, as well as the effect of each configuration option on the various metrics.

The examples given so far have used hardware goals that are general and do not depend on the particular algorithm being run on processor 60. The algorithms described can also be used to select configurations well suited to specific user programs. For example, the user programs can be run on a cache-accurate simulator to measure the number of cache misses for different types of caches with different characteristics, such as different sizes, different line sizes, and different set associativities. The results of these simulations can be added to the database used by the search algorithm 106 described above to help select the hardware implementation description 40.

Similarly, the user algorithm can be profiled for the presence of certain instructions that can optionally be implemented in hardware. For example, if the user algorithm spends significant time performing multiplications, search engine 106 might automatically suggest including a hardware multiplier. Such algorithms need not be limited to considering a single user algorithm. The user can feed a set of algorithms into the system, and search engine 106 can select a configuration that is useful on average for the set of user programs.

In addition to selecting preconfigured features of processor 60, the search algorithms can also be used to automatically select or suggest possible TIE extensions to the user. Given the input goals and examples of user programs, perhaps written in the C programming language, these algorithms suggest potential TIE extensions. For TIE extensions without state, compiler-like tools are equipped with pattern matchers. These pattern matchers walk the expression nodes bottom-up, searching for multiple-instruction patterns that could be replaced by a single instruction. For example, suppose the user's C program contains the following statements:

    x = (y + z) << 2;
    x2 = (y2 + z2) << 2;

The pattern matcher would discover that the user adds two numbers and shifts the result left by two bits in two different places. The system would add to the database the possibility of generating a TIE instruction that adds two numbers and shifts the result left by two bits.
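A toy version of such a pattern matcher is sketched below. It uses Python's own `ast` module as a stand-in for a compiler's expression tree, so the input is written as Python assignments rather than C; the function name `count_add_shift_patterns` is hypothetical. Since only occurrences are counted, the traversal order of `ast.walk` does not matter here.

```python
import ast
from collections import Counter


def count_add_shift_patterns(source: str) -> Counter:
    """Count expressions of the form (a + b) << k, keyed by the constant k."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.LShift)
            and isinstance(node.left, ast.BinOp)
            and isinstance(node.left.op, ast.Add)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.right.value, int)
        ):
            counts[node.right.value] += 1
    return counts


program = """
x = (y + z) << 2
x2 = (y2 + z2) << 2
w = y << 2
"""
print(count_add_shift_patterns(program))
```

The resulting counts are the kind of evidence that could be recorded as a candidate "add then shift left by two" instruction; the plain shift on the last line does not match the pattern and is ignored.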

Build system 50 keeps track of many potential TIE instructions, along with a count of how many times each appears. Using a profiling tool, system 50 also tracks how often each instruction is executed over the total execution of the algorithm. Using a hardware simulator, system 50 tracks how expensive it would be in hardware to implement each potential TIE instruction. These numbers are fed into the search heuristic to select a set of potential TIE instructions that best meets the input goals, such as performance, code size, and hardware complexity.

Similar but more powerful algorithms are used to discover potential TIE instructions with state. Several different algorithms are used to detect different types of opportunities. One algorithm uses a compiler-like tool to scan the user program and detect whether it requires more registers than the hardware provides. As is well known to practitioners in the art, this can be detected by counting the register spills and restores in the compiled version of the user code. The compiler-like tool suggests to the search engine a coprocessor with additional hardware registers 98 that supports only the operations used in the portions of the user code that have many spills and restores. The tool is responsible for informing the database used by search engine 106 of an estimate of the hardware cost of the coprocessor and an estimate of how much the performance of the user algorithm would improve. As described above, search engine 106 makes a global decision as to whether the suggested coprocessor 98 leads to a better configuration.

Alternatively, or in combination, the compiler-like tool checks whether the user program uses bit-mask operations to guarantee that certain variables never exceed certain limits. In this case, the tool suggests to the search engine 106 a coprocessor 98 that uses a data type matching the user's limits (for example, 12-bit or 20-bit integers, or integers of any other size). A third algorithm, used in another embodiment, applies to user programs written in C++: the compiler-like tool finds that much of the time is spent in operations on user-defined abstract data types. If all operations on a data type are suitable for TIE, the algorithm proposes to the search engine that all operations on that data type be implemented in a TIE coprocessor.
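The bit-mask check can be illustrated with a short Python sketch. It assumes, as a simplification not stated in the description, that the tool only recognises masks of the form 2^n - 1 (a contiguous run of low-order ones); the function name is hypothetical.

```python
def width_from_mask(mask):
    """Return n if mask == 2**n - 1 (n low-order one bits), otherwise None."""
    if mask > 0 and (mask & (mask + 1)) == 0:
        return mask.bit_length()
    return None


# A variable always masked with 0xFFF never needs more than 12 bits,
# so a 12-bit coprocessor data type could be suggested for it.
suggested_width = width_from_mask(0xFFF)
```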

To generate the instruction decode logic of processor 60, a set of signals is generated, one for each opcode defined in the configuration specification. The code is generated by simply rewriting the statement

opcode NAME FIELD=VALUE

to the HDL statement

assign NAME = FIELD == VALUE;

and rewriting

opcode NAME FIELD=VALUE PARENTNAME[FIELD2=VALUE2]

to

assign NAME = PARENTNAME & (FIELD == VALUE);
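The rewriting rule can be sketched in Python as below. This is only an illustration: the regular expression, the function name and the simplification that the parent is given by name alone (without the bracketed field test) are assumptions, and both output forms are parenthesised for uniformity.

```python
import re

# opcode NAME FIELD=VALUE [PARENTNAME]
OPCODE_RE = re.compile(r"opcode\s+(\w+)\s+(\w+)=(\S+)(?:\s+(\w+))?$")


def opcode_to_assign(statement):
    """Rewrite one opcode statement into a Verilog decode assignment."""
    m = OPCODE_RE.match(statement.strip())
    if m is None:
        raise ValueError(f"not an opcode statement: {statement!r}")
    name, field, value, parent = m.groups()
    test = f"({field} == {value})"
    if parent:
        # A child opcode is decoded only when its parent is also decoded.
        return f"assign {name} = {parent} & {test};"
    return f"assign {name} = {test};"


root = opcode_to_assign("opcode QRST op0=4'b0000")
child = opcode_to_assign("opcode ADD op2=4'b1000 RST0")
```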

The generation of register interlock and pipeline stall signals has also been automated. This logic is likewise generated from information in the configuration specification. Based on the register usage information contained in the iclass statement and the latency of the instruction, the generated logic inserts a stall (or bubble) when a source operand of the current instruction depends on a destination operand of a previous instruction that has not yet completed. The mechanism that implements this stall function is part of the core hardware.
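The interlock condition can be expressed compactly. The following Python sketch models it in software only; the representation of in-flight instructions as (destination register, cycles remaining) pairs is an assumption made for illustration and is not taken from the generated hardware.

```python
def needs_stall(current_sources, in_flight):
    """Return True if the current instruction must stall.

    current_sources: set of register names read by the current instruction.
    in_flight: iterable of (dest_register, cycles_until_result_ready) pairs
               for earlier instructions still in the pipeline.
    """
    return any(
        dest in current_sources and remaining > 0
        for dest, remaining in in_flight
    )


# a3 is still one cycle away from being written, so a reader of a3 stalls.
stall = needs_stall({"a2", "a3"}, [("a3", 1)])
```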

The illegal instruction detection logic is generated by NORing the individual generated instruction signals and ANDing the result with their field constraints:

assign illegalinst = !(INST1 | INST2 | ... | INSTn);

The instruction decode signals and the illegal instruction signal are available as outputs of the decode module and as inputs to the hand-written processor logic.

To generate the other processor features, this embodiment uses a Verilog™ description of the configurable processor 60, enhanced with a Perl-based preprocessor language. Perl is a full-featured language with complex control structures, subroutines and I/O facilities. The preprocessor, called TPP in one embodiment of the invention (TPP is itself a Perl program, as shown in the source listing in Appendix B), scans its input, identifies certain lines as preprocessor code written in the preprocessor language (Perl for TPP; for TPP these are the lines prefixed with a semicolon), and constructs a program consisting of the extracted lines together with statements that generate the text of the other lines. The non-preprocessor lines may contain embedded expressions, which are replaced in place by the values produced during TPP processing. The resulting program is then executed to produce the source code, that is, the Verilog™ code describing the detailed processor logic 40 (as will be seen below, TPP is also used to configure the software development tools 30).

Used in this context, TPP is a powerful preprocessing language because it allows constructs such as configuration specification queries, conditional expressions and iterative constructs to be included in the Verilog™ code and, as noted above, allows expressions that depend on the configuration specification 100 to be embedded in the Verilog™ code. For example, a TPP assignment based on a database query looks like

; $endian = config_get_value("IsaMemoryOrder")

where config_get_value is the TPP function used to query the configuration specification 100, IsaMemoryOrder is a flag set in the configuration specification 100, and $endian is a TPP variable used later in generating the Verilog™ code.

A TPP conditional expression may be

; if (config_get_value("IsaMemoryOrder") eq "LittleEndian")

; {do Verilog™ code for little-endian ordering}

; else

; {do Verilog™ code for big-endian ordering}

Iterative loops can be implemented with TPP constructs such as

; for ($i = 0; $i < $ninterrupts; $i++)

; {do Verilog™ code for each of 1...N interrupts}

where $i is a TPP loop index variable and $ninterrupts is the number of interrupts specified for processor 60 (obtained from the configuration specification 100 using config_get_value).

Finally, TPP code can be embedded in Verilog™ expressions, for example

wire [`$ninterrupts-1`:0] srInterruptEn;

xtscenflop #(`$ninterrupts`) srintrenreg (srInterruptEn, srDataIn_W[`$ninterrupts-1`:0], srIntrEnWEn, !cReset, CLK);

where:

$ninterrupts defines the number of interrupts and determines the width (in bits) of the xtscenflop module (a flip-flop primitive module);

srInterruptEn is the output of the flip-flop, defined as a wire of the appropriate number of bits;

srDataIn_W is the input to the flip-flop, of which only the relevant bits are used, according to the number of interrupts;

srIntrEnWEn is the write enable of the flip-flop;

cReset is the clear input to the flip-flop; and

CLK is the input clock to the flip-flop.

For example, given the following input to TPP:

; # Timer Interrupt

; if ($IsaUseTimer) {

wire [`$width-1`:0] srCCount;

wire ccountWEn;

//--------------------------------------------------------------

// CCOUNT Register

//--------------------------------------------------------------

assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);

xtflop #(`$width`) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);

; for ($i = 0; $i < $TimerNumber; $i++) {

// CCOMPARE Register

//--------------------------------------------------------------

wire [`$width-1`:0] srCCompare`$i`;

wire ccompWEn`$i`;

assign ccompWEn`$i` = srWEn_W && (srWrAdr_W == `SRCCOMPARE`$i`);

xtenflop #(`$width`) srccmp`$i`reg (srCCompare`$i`, srDataIn_W, ccompWEn`$i`, CLK);

assign setCCompIntr`$i` = (srCCompare`$i` == srCCount);

assign clrCCompIntr`$i` = ccompWEn`$i`;

; }

; } ## IsaUseTimer

and the declarations

$IsaUseTimer = 1

$TimerNumber = 2

$width = 32

TPP generates

wire [31:0] srCCount;

wire ccountWEn;

// CCOUNT Register

xtflop #(32) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);

// CCOMPARE Register

wire [31:0] srCCompare0;

wire ccompWEn0;

assign ccompWEn0 = srWEn_W && (srWrAdr_W == `SRCCOMPARE0);

xtenflop #(32) srccmp0reg (srCCompare0, srDataIn_W, ccompWEn0, CLK);

assign setCCompIntr0 = (srCCompare0 == srCCount);

assign clrCCompIntr0 = ccompWEn0;

// CCOMPARE Register

wire [31:0] srCCompare1;

wire ccompWEn1;

assign ccompWEn1 = srWEn_W && (srWrAdr_W == `SRCCOMPARE1);

xtenflop #(32) srccmp1reg (srCCompare1, srDataIn_W, ccompWEn1, CLK);

assign setCCompIntr1 = (srCCompare1 == srCCount);

assign clrCCompIntr1 = ccompWEn1;
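The two-pass scheme used by TPP (build a program from the template, then execute it) can be imitated in a few lines. The sketch below is not TPP: it is written in Python rather than Perl, control lines therefore contain Python rather than Perl, blocks are closed with a hypothetical `; end` marker, and Verilog macro backticks are not handled. It is offered only to make the mechanism concrete.

```python
def expand(template, env):
    """Expand a template: ';' lines are control code, other lines are text.

    Text lines may embed expressions between backticks, which are replaced
    by their values. Every opened block must contain at least one line.
    """
    program, depth = [], 0
    for line in template.splitlines():
        stripped = line.strip()
        if stripped.startswith(";"):
            code = stripped[1:].strip()
            if code == "end":              # close the innermost block
                depth -= 1
                continue
            program.append("    " * depth + code)
            if code.endswith(":"):         # 'if ...:' or 'for ...:' opens a block
                depth += 1
        else:
            # Splitting on backticks leaves embedded expressions at odd indices.
            parts = line.split("`")
            pieces = [f"str({p})" if i % 2 else repr(p)
                      for i, p in enumerate(parts)]
            program.append("    " * depth + "_out.append(" + " + ".join(pieces) + ")")
    scope = dict(env, _out=[])
    exec("\n".join(program), scope)        # second pass: run the built program
    return "\n".join(scope["_out"])


TEMPLATE = """\
; if use_timer:
wire [`width-1`:0] srCCount;
; for i in range(timer_number):
wire [`width-1`:0] srCCompare`i`;
assign clrCCompIntr`i` = ccompWEn`i`;
; end
; end
"""
generated = expand(TEMPLATE, {"use_timer": True, "timer_number": 2, "width": 32})
```

As in the timer example, the loop body is emitted once per timer with the loop index substituted into the signal names.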

The HDL description 114 generated in this way is used to synthesize the hardware implementing the processor, for example in block 122 using Design Compiler™ from Synopsys. The result is then placed and routed in block 128 using, for example, Silicon Ensemble™ from Cadence or Apollo™ from Avant!. Once the components have been routed, the result is used in block 132 for wire back-annotation and timing verification using, for example, PrimeTime™ from Synopsys. The product of this process is a hardware profile 134, which the user can use to provide further input to the configuration capture routine 20 for further configuration iterations.

As described above in connection with logic synthesis block 122, one of the results of configuring processor 60 is a set of customized HDL files from which a specific gate-level implementation can be obtained using any of a number of commercial synthesis tools. Design Compiler™ from Synopsys is one such tool. To ensure a correct and high-performance gate-level implementation, this embodiment provides the scripts needed to automate the synthesis process in the customer's environment. The challenge in providing such scripts is to support a variety of synthesis methodologies and the differing implementation objectives of users. To meet the first challenge, this embodiment breaks the scripts into smaller, functionally complete scripts. One example is to provide a read script that reads all HDL files relevant to the particular processor configuration 60, a timing constraint script that sets the timing requirements unique to processor 60, and a script that writes out the synthesis results in a form usable for placement and routing of the gate-level netlist. To meet the second challenge, this embodiment provides a script for each implementation objective. One example is to provide a script for achieving the fastest cycle time, a script for achieving the smallest silicon area, and a script for achieving the lowest power consumption.

Scripts are also used in other phases of processor configuration. For example, once the HDL model of processor 60 has been written, a simulator can be used to verify correct operation of processor 60, as described above in connection with block 132. This is usually done by running a variety of test programs, or diagnostics, on the simulated processor 60. Running a test program on the simulated processor 60 may involve many steps, such as generating an executable image of the test program, generating a representation of that image which can be read by the simulator 112, creating a temporary location in which to collect simulation results for later analysis, analyzing the simulation results, and so on. In the prior art this was done with a number of throw-away scripts. These scripts have built-in knowledge of the simulation environment, such as which HDL files should be included, where in the directory structure those files can be found, which files the test bench requires, and so on. In the current design, the preferred mechanism is to write a script template that is configured by parameter substitution. The configuration mechanism also uses TPP to generate the list of files needed for simulation.

Moreover, during the verification process of block 132 it is often necessary to write further scripts that allow the designer to run a series of test programs. These are typically used to run regression suites, which give the designer confidence that a given change to the HDL model does not introduce new errors. Such regression scripts were also often thrown away, because they embed many assumptions about file names, locations and so on. As described above for generating a run script for a single test program, the regression script is written as a template. At configuration time, the template is configured by replacing its parameters with actual values.
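Parameter substitution into a script template can be illustrated with Python's standard `string.Template`. The command line, the parameter names and the file names below are invented for the sketch and do not correspond to any actual simulator or to the scripts shipped with the embodiment.

```python
from string import Template

# Hypothetical run-script template; $name placeholders are filled in
# at configuration time.
RUN_SCRIPT = Template(
    "#!/bin/sh\n"
    "# Illustrative only: not a real simulator invocation.\n"
    "$simulator -f $file_list +test=$test_image > $results_dir/$test_name.log\n"
)


def make_run_script(params):
    """Return the run script with every parameter replaced by its value."""
    # substitute() raises KeyError if a parameter is missing, which catches
    # incomplete configurations early.
    return RUN_SCRIPT.substitute(params)


script = make_run_script({
    "simulator": "hdl_sim",
    "file_list": "config_files.f",
    "test_image": "diag01.img",
    "results_dir": "results",
    "test_name": "diag01",
})
```

Because the environment-specific knowledge lives in the parameters rather than in the script body, the same template can be reused for every configuration instead of being thrown away.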

The final step in converting an RTL description into a hardware implementation is to use place and route (P&R) software to convert the abstract netlist into a geometric representation. The P&R software analyzes the connectivity of the netlist and decides on the placement of the cells. It then attempts to route the connections between all the cells. The clock net usually receives special attention and is routed as a last step. This process can be helped by providing the tools with certain information, such as which cells are expected to be close together (known as soft grouping), the relative placement of cells, which nets are expected to have small propagation delays, and so on.

To make this process easier and to ensure that the desired performance goals (cycle time, area and power consumption) are met, the configuration mechanism generates a set of scripts or input files for the P&R software. These scripts also contain information such as how many power and ground connections are needed, how these connections should be distributed along the boundary, and so on. The scripts are generated by querying a database that contains information on how many soft groups to create, which cells should be included in them, which nets are timing critical, and so on. These parameters change according to which options have been selected. The scripts must be configured for the particular tools that will be used for placement and routing.

Optionally, the configuration mechanism can request further information from the user and pass it to the P&R scripts. For example, the interface can ask the user for the desired aspect ratio of the final layout, how many levels of buffering should be inserted in the clock tree, on which side the input and output pins should be located, the relative or absolute placement of those pins, the width and location of the power and ground straps, and so on. These parameters are then passed to the P&R scripts to produce the desired layout.

More sophisticated scripts can be used which support, for example, a more sophisticated clock tree. A common optimization for reducing power consumption is to gate the clock signal. However, this makes clock tree synthesis a harder problem, because it is more difficult to balance the delays of all the branches. The configuration interface can ask the user for the correct cells to use in the clock tree and perform part or all of the clock tree synthesis. It does this by knowing where the gated clocks are located in the design and by estimating the delay from the qualifying gate to the clock inputs of the flip-flops. It then gives the clock tree synthesis tool a constraint that the delay of the clock buffers match the delay of the gating cells. In the current embodiment this is done by a general-purpose Perl script. The script reads gated clock information produced by the configuration agent according to which options have been selected. The Perl script is run once the design has been placed and routed, but before final clock tree synthesis is done.

Further improvements can be made to the hardware profiling process described above. In particular, we will describe a process by which the user can obtain similar hardware profile information almost instantaneously, without spending hours running the CAD tools. This process has several steps.

The first step in this process is to partition the set of all configuration options into groups of orthogonal options, such that the effect on the hardware profile of an option in one group is independent of the options in any other group. For example, the effect of the MAC16 unit on the hardware profile is independent of any other option, so an option group containing only the MAC16 option is formed. A more complex example is an option group containing the interrupt options, the high-level interrupt options and the timer options, because the effect on the hardware profile depends on the particular combination of these options.

The second step is to characterize the effect of each option group on the hardware profile. The characterization is done by obtaining the hardware profile impact of various combinations of the options in the group. For each combination, the profile is obtained using the process described earlier, in which an actual implementation is derived and its hardware profile is measured. This information is stored in an estimation database.

The last step is to derive specific formulas, using curve fitting and interpolation techniques, for computing the hardware profile impact of a particular combination of options within each option group. Different formulas are used depending on the nature of the options. For example, since each additional interrupt vector adds roughly the same logic to the hardware, a linear function is used to model its hardware impact. In another example, having a timer unit requires the high-level interrupt option, so the formula for the hardware impact of the timer option is a conditional formula involving several options.
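A minimal Python sketch of such an estimator follows. All numbers are made up for illustration, as are the option names and the way the groups are combined; only the structure (a fitted linear term for interrupts, an independent MAC16 term, a conditional timer term) follows the description.

```python
def fit_line(points):
    """Least-squares fit of y = intercept + slope * x through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical characterisation data: (number of interrupts, area in gates).
INTERRUPT_SAMPLES = [(1, 1200), (4, 1800), (8, 2600)]
BASE_AREA = 25_000   # assumed area of the unconfigured core
MAC16_AREA = 6_000   # assumed area of the orthogonal MAC16 option group
TIMER_AREA = 1_500   # assumed area of one timer unit


def estimate_area(config):
    """Sum the contributions of the orthogonal option groups."""
    intercept, slope = fit_line(INTERRUPT_SAMPLES)
    area = BASE_AREA
    if config.get("interrupts", 0) > 0:
        area += intercept + slope * config["interrupts"]  # linear model
    if config.get("mac16"):
        area += MAC16_AREA                                # independent group
    # The timer requires the high-level interrupt option, so its term is
    # conditional on both options being selected.
    if config.get("timer") and config.get("high_level_interrupts"):
        area += TIMER_AREA
    return area


estimated = estimate_area({"interrupts": 6, "mac16": True})
```

Because the groups are orthogonal, their contributions can simply be added, which is what makes the estimate nearly instantaneous.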

It is also useful to provide fast feedback on how architectural choices affect the run-time performance and code size of applications. Several sets of benchmark programs from multiple application domains are chosen. For each domain, a database is built in advance that estimates how different architectural design decisions affect the run-time performance and code size of the applications in that domain. As the user varies the architectural design, the database is queried for the application domain of interest to the user, or for multiple domains. The results of the evaluation are presented to the user, so that she can obtain an estimate of the trade-off between software benefit and hardware cost.

The fast estimation system can easily be extended to suggest how a configuration might be modified to optimize the processor further. One example is to associate with each configuration option a set of numbers representing the incremental impact of that option on various cost metrics such as area, delay and power. The fast estimation system makes it easy to compute the incremental cost impact of a given option: it takes just two calls to the estimation system, one with the option and one without. The difference in cost between the two estimates is the incremental impact of the option. For example, the incremental area impact of the MAC16 option is computed by estimating the area cost of two configurations, with and without the MAC16 option. The difference is then displayed alongside the MAC16 option in the interactive configuration system. Such a system can guide the user toward an optimal solution through a series of single-step improvements.
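The two-call computation can be written down directly. In this Python sketch the estimator is a toy stand-in with invented costs; only `incremental_cost` reflects the procedure described above.

```python
def incremental_cost(estimate, config, option):
    """Estimate with the option enabled minus the estimate with it disabled."""
    with_option = dict(config, **{option: True})
    without_option = dict(config, **{option: False})
    return estimate(with_option) - estimate(without_option)


def toy_area_estimate(config):
    """Stand-in for the fast estimation system (all costs are invented)."""
    area = 25_000
    if config.get("mac16"):
        area += 6_000
    if config.get("timer"):
        area += 1_500
    return area


mac16_delta = incremental_cost(toy_area_estimate, {"timer": True}, "mac16")
```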

Turning now to the software side of the automated processor configuration process, this embodiment of the invention configures the software development tools 30 so that they are specific to the processor. The configuration process begins with software tools 30 that can be ported to a variety of different systems and instruction set architectures. Such retargetable tools have been widely studied and are well known in the art. This embodiment uses the GNU family of tools, which is free software and includes, for example, the GNU C compiler, GNU assembler, GNU debugger, GNU linker, GNU profiler and various utility programs. These tools 30 are then configured automatically by generating portions of the software directly from the ISA description and by using TPP to modify the hand-written portions of the software.

The GNU C compiler is configured in several different ways. Given the core ISA description, much of the machine-dependent logic in the compiler can be written by hand. This portion of the compiler is common to all instruction sets of the configurable processor, and retargeting it by hand allows fine tuning for best results. Even for this hand-written portion of the compiler, however, some code is generated automatically from the ISA description. In particular, the ISA description defines the sets of constant values that can be used in the immediate fields of the various instructions. For each immediate field, a predicate function is generated that tests whether a particular constant value can be encoded in that field. The compiler uses these predicate functions when generating code for processor 60. Automating this aspect of compiler configuration eliminates an opportunity for inconsistency between the ISA description and the compiler, and it allows the constants in the ISA to be changed with minimal effort.
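Generating one predicate per immediate field might look like the following Python sketch. The actual compiler predicates would be C functions; the field names and value sets here are hypothetical examples rather than fields of the actual ISA.

```python
def make_immediate_predicate(legal_values):
    """Build a predicate that tests whether a constant fits an immediate field."""
    legal = frozenset(legal_values)

    def can_encode(value):
        return value in legal

    return can_encode


# Hypothetical fields: a signed 8-bit immediate and a small lookup-table field.
fits_simm8 = make_immediate_predicate(range(-128, 128))
fits_table4 = make_immediate_predicate([-1, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 16, 32, 64, 128, 256])
```

Because each predicate is derived from the same value set that defines the field, changing the set in the ISA description changes the compiler's behaviour with no hand editing.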

Several portions of the compiler are configured by preprocessing with TPP. For configuration options controlled by parameter selection, the corresponding parameters in the compiler are set via TPP. For example, the compiler has a flag variable indicating whether the target processor 60 uses big-endian or little-endian byte ordering, and this variable is set automatically using a TPP command that reads the endianness parameter from the configuration specification 100. TPP is also used to conditionally enable or disable the hand-coded portions of the compiler that generate code for optional ISA packages, according to whether the corresponding package is enabled in the configuration specification 100. For example, the code that generates multiply/accumulate instructions is included in the compiler only if the configuration specification includes the MAC16 option 90.

The compiler is also configured to support designer-defined instructions specified via the TIE language. There are two levels of this support. At the lowest level, the designer-defined instructions are available as macros, intrinsic functions, or inline (extrinsic) functions in the code being compiled. This embodiment of the invention generates a C header file that defines the inline functions as "inline assembly" code (a standard feature of the GNU C compiler). Given the TIE specification of the designer-defined opcodes and their operands, generating the header file is a straightforward translation into the inline assembly syntax of the GNU C compiler. An alternative implementation generates a header file containing C preprocessor macros that specify the inline assembly instructions. Yet another alternative uses TPP to add intrinsic functions directly to the compiler.
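The header generation step can be sketched as a small text generator. The Python below emits a GNU C extended-asm wrapper; it assumes, purely for illustration, that every operand is a 32-bit integer held in a general register and that the assembler mnemonic is the lower-cased opcode name. Neither assumption comes from the TIE specification.

```python
def inline_wrapper(opcode, n_inputs):
    """Emit a C inline function wrapping one designer-defined instruction."""
    params = ", ".join(f"int in{i}" for i in range(n_inputs))
    # %0 is the output operand; %1.. are the inputs, in GNU extended-asm order.
    asm_operands = ", ".join(f"%{i}" for i in range(n_inputs + 1))
    inputs = ", ".join(f'"r"(in{i})' for i in range(n_inputs))
    return (
        f"static inline int {opcode}({params})\n"
        "{\n"
        "    int out;\n"
        f'    __asm__ ("{opcode.lower()} {asm_operands}" : "=r"(out) : {inputs});\n'
        "    return out;\n"
        "}\n"
    )


header_text = inline_wrapper("BYTESWAP", 1)
```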

The second level of support for designer-defined instructions is provided by having the compiler automatically recognize opportunities to use them. These TIE instructions may be defined directly by the user or generated automatically during the configuration process. Before the user application is compiled, the TIE code is automatically examined and converted into equivalent C functions. This is the same step used to allow fast simulation of TIE instructions. The equivalent C functions are partially compiled into the tree-based intermediate representation used by the compiler. The representation for each TIE instruction is stored in a database. When the user application is compiled, part of the compilation process is a pattern matcher. The user application is compiled into the tree-based intermediate representation. The pattern matcher walks each tree in the user program from the bottom up. At each step of the walk, the pattern matcher checks whether the intermediate representation rooted at the current point matches any of the TIE instructions in the database. If there is a match, the match is recorded. After the walk of each tree is finished, the set of maximally sized matches is selected. Each maximal match in the tree is replaced by the equivalent TIE instruction.
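A simplified version of the matcher is sketched below in Python. Trees are nested tuples of the form (operator, child, ...), `ANY` is a wildcard that captures an operand, and the hypothetical pattern database holds a single fused multiply-add. As a simplification that differs from the bottom-up walk described above, the sketch tries each node from the root downward, which also prefers the largest match when patterns are listed from largest to smallest.

```python
ANY = object()  # wildcard: matches any subtree and captures it as an operand


def match(pattern, tree, operands):
    """Return True if tree matches pattern, appending captured operands."""
    if pattern is ANY:
        operands.append(tree)
        return True
    if isinstance(pattern, tuple):
        return (
            isinstance(tree, tuple)
            and len(tree) == len(pattern)
            and tree[0] == pattern[0]
            and all(match(p, t, operands) for p, t in zip(pattern[1:], tree[1:]))
        )
    return pattern == tree


def rewrite(tree, patterns):
    """Replace matching subtrees with (instruction_name, operand, ...)."""
    if not isinstance(tree, tuple):
        return tree
    for name, pattern in patterns:
        operands = []
        if match(pattern, tree, operands):
            return (name, *[rewrite(op, patterns) for op in operands])
    return (tree[0], *[rewrite(child, patterns) for child in tree[1:]])


# Hypothetical database: MULADD(a, b, c) computes a * b + c.
PATTERNS = [("MULADD", ("add", ("mul", ANY, ANY), ANY))]
source_tree = ("sub", ("add", ("mul", "a", "b"), "c"), "d")
rewritten = rewrite(source_tree, PATTERNS)
```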

The algorithm described above automatically recognizes opportunities to use stateless TIE instructions. Additional approaches can be used to automatically recognize opportunities to use TIE instructions with state. An earlier section described algorithms for automatically selecting candidate TIE instructions with state. The same algorithms are used to automatically use such TIE instructions in C or C++ applications. When a TIE coprocessor has been defined to have more registers but a limited set of operations, regions of code are scanned to see whether they suffer from register spilling and whether they use only the set of available operations. If such regions are found, the code in those regions is automatically changed to use the instructions and registers 98 of the coprocessor. Conversion operations are generated at the boundaries of the region to move data into and out of the coprocessor 98. Similarly, if a TIE coprocessor has been defined to operate on integers of a different size, regions of code are examined to see whether all data in the region is accessed as if it were of that size. For matching regions, the code is converted and glue code is added at the boundaries. Similarly, if a TIE coprocessor has been defined to implement a C++ abstract data type, all operations on that data type are replaced by instructions of the TIE coprocessor.

Note that automatically suggesting TIE instructions and automatically using TIE instructions are each useful independently. Suggested TIE instructions can also be used manually by the user via the intrinsic mechanism, and the utilization algorithms can be applied to manually designed TIE instructions or coprocessors 98.

Regardless of how designer-defined instructions are generated, whether via inline functions or by automatic recognition, the compiler needs to know the potential side effects of the designer-defined instructions so that it can optimize and schedule them. To improve performance, traditional compilers optimize user code to maximize desired characteristics such as run-time performance, code size, or power consumption. As is well known to one skilled in the art, such optimizations include rearranging instructions or replacing certain instructions with other, semantically equivalent instructions. To optimize well, the compiler must know how every instruction affects different portions of the machine. Two instructions that read and write different portions of the machine state can be freely reordered. Two instructions that access the same portion of the machine state cannot always be reordered. For traditional processors, the state read and/or written by the different instructions is hardwired, sometimes by table, into the compiler. In one embodiment of the present invention, TIE instructions are conservatively assumed to read and write all state of the processor 60. This allows the compiler to generate correct code, but limits its ability to optimize code in the presence of TIE instructions. In another embodiment of the invention, a tool automatically reads the TIE definition and discovers, for each TIE instruction, which state is read or written by that instruction. The tool then modifies the tables used by the compiler's optimizer to accurately model the effect of each TIE instruction.
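The difference between the conservative and the precise embodiment can be sketched as follows. The state names (`ACC`, `SAR`, `MODE`) and the instruction names are hypothetical, and the check covers only designer-defined state; a real scheduler would also track register operands and memory.

```python
# Hypothetical designer-defined state of the configured processor.
ALL_STATE = frozenset({"ACC", "SAR", "MODE"})

# (reads, writes) per instruction, as a tool might derive them from the TIE definition.
PRECISE = {
    "mac":     (frozenset({"ACC"}), frozenset({"ACC"})),
    "shift":   (frozenset({"SAR"}), frozenset()),
    "setmode": (frozenset(), frozenset({"MODE"})),
}

def effects(instr, precise=True):
    """Return (reads, writes); the conservative model touches all state."""
    return PRECISE[instr] if precise else (ALL_STATE, ALL_STATE)

def can_reorder(first, second, precise=True):
    """Two instructions may swap if neither writes state the other reads or writes."""
    reads_a, writes_a = effects(first, precise)
    reads_b, writes_b = effects(second, precise)
    conflict = (writes_a & (reads_b | writes_b)) | (writes_b & reads_a)
    return not conflict
```

With the precise table, `can_reorder("mac", "shift")` is true because the two instructions touch disjoint state; under the conservative assumption the same pair cannot be reordered.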

Like the compiler, the machine-dependent portions of the assembler 110 include both automatically generated parts and hand-coded parts configured with TPP. Some features common to all configurations are supported by hand-written code. However, the primary task of the assembler 110 is to encode machine instructions, and the instruction encoding and decoding software can be generated automatically from the ISA description.

Because encoding and decoding instructions is useful in several different software tools, this embodiment of the invention groups the software that performs those tasks into a separate software library. This library is generated automatically using the information in the ISA description. The library defines an enumeration of the opcodes; a function that efficiently maps strings for opcode mnemonics onto members of the enumeration (stringToOpcode); and tables that specify, for each opcode, the instruction length (instructionLength), the number of operands (numberOfOperands), the operand fields, the operand types (i.e., register or immediate) (operandType), the binary encoding (encodeOpcode), and the mnemonic string (opcodeName). For each operand field, the library provides accessor functions to encode (fieldSetFunction) and decode (fieldGetFunction) the corresponding bits in the instruction word. All of this information is readily available in the ISA description; generating the library software is merely a matter of translating the information into executable C code. For example, the instruction encodings are recorded in a C array variable in which each entry is the encoding for a particular instruction, produced by setting each opcode field to the value specified for that instruction in the ISA description; the encodeOpcode function simply returns the array value for a given opcode.

The library also provides a function to decode the opcode in a binary instruction (decodeInstruction). This function is generated as a sequence of nested switch statements, where the outermost switch tests the subopcode field at the top of the opcode hierarchy, and the nested switch statements test the subopcode fields progressively lower in the opcode hierarchy. The code generated for this function therefore has the same structure as the opcode hierarchy itself.
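A decoder of this shape can be produced mechanically from a description of the opcode hierarchy. The sketch below is an illustration only: the two-level `HIERARCHY`, the field positions, and the mnemonics are invented, and chained `if` tests stand in for the nested C switch statements.

```python
# Hypothetical opcode hierarchy: (field name, low bit, width, {value: subtree or mnemonic}).
HIERARCHY = ("op0", 0, 4, {
    0x0: ("op1", 4, 4, {0x0: "ADD", 0x1: "SUB"}),
    0x1: "LOAD",
    0x2: "BEQ",
})

def emit(node, indent="    "):
    """Emit source lines whose nesting mirrors the opcode hierarchy."""
    if isinstance(node, str):                      # leaf opcode
        return [f"{indent}return {node!r}"]
    name, low, width, cases = node
    lines = [f"{indent}{name} = (word >> {low}) & {(1 << width) - 1}"]
    for value, child in cases.items():
        lines.append(f"{indent}if {name} == {value}:")
        lines += emit(child, indent + "    ")
    lines.append(f"{indent}return 'UNDEFINED'")
    return lines

def generate_decoder(hierarchy):
    """Return (callable, source text) for a generated decode_instruction function."""
    source = "\n".join(["def decode_instruction(word):"] + emit(hierarchy))
    namespace = {}
    exec(source, namespace)
    return namespace["decode_instruction"], source

decode_instruction, generated_source = generate_decoder(HIERARCHY)
```

Printing `generated_source` shows one level of nesting per level of the hierarchy, which is the property the paragraph above relies on.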

Given this library for encoding and decoding, implementing the assembler 110 is straightforward. For example, the instruction encoding logic in the assembler is quite simple:

AssembleInstruction(String mnemonic, int arguments[])
begin
    opcode = stringToOpcode(mnemonic);
    if (opcode == UNDEFINED)
        Error("Unknown opcode");
    instruction = encodeOpcode(opcode);
    numArgs = numberOfOperands(opcode);
    for i = 0, numArgs-1 do
    begin
        setFun = fieldSetFunction(opcode, i);
        setFun(instruction, arguments[i]);
    end
    return instruction;
end

Implementing a disassembler 110, which converts binary instructions into a readable form that closely resembles assembly code, is equally straightforward:

DisassembleInstruction(BinaryInstruction instruction)
begin
    opcode = decodeInstruction(instruction);
    instructionAddress += instructionLength(opcode);
    print opcodeName(opcode);
    // Loop through the operands, disassembling each
    numArgs = numberOfOperands(opcode);
    for i = 0, numArgs-1 do
    begin
        type = operandType(opcode, i);
        getFun = fieldGetFunction(opcode, i);
        value = getFun(opcode, i, instruction);
        if (i != 0) print ",";  // Comma separate operands
        // Print based on the type of the operand
        switch (type)
            case register:
                print registerPrefix(type), value;
            case immediate:
                print value;
            case pc_relative_label:
                print instructionAddress + value;
            // etc. for more different operand types
    end
end
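To make the assembler and disassembler pseudocode concrete, here is a small runnable Python sketch of the same table-driven approach. The two-instruction ISA, the 16-bit word layout, and the `a` register prefix are assumptions for illustration; in the described system these tables would be generated from the ISA description.

```python
from enum import IntEnum

class Opcode(IntEnum):          # enumeration of opcodes (hypothetical ISA)
    ADD = 0                     # add  r, s
    ADDI = 1                    # addi r, imm

# Encoding with all operand fields zero; the opcode occupies bits 0-3.
ENCODING = {Opcode.ADD: 0x0, Opcode.ADDI: 0x1}
# Per opcode: (operand type, bit position of its 4-bit field).
OPERANDS = {
    Opcode.ADD:  [("register", 4), ("register", 8)],
    Opcode.ADDI: [("register", 4), ("immediate", 12)],
}

def string_to_opcode(mnemonic):
    return Opcode.__members__.get(mnemonic.upper())

def field_set(word, shift, value):
    return word | ((value & 0xF) << shift)

def field_get(word, shift):
    return (word >> shift) & 0xF

def assemble_instruction(mnemonic, arguments):
    opcode = string_to_opcode(mnemonic)
    if opcode is None:
        raise ValueError("Unknown opcode")
    if len(arguments) != len(OPERANDS[opcode]):
        raise ValueError("Wrong number of operands")
    word = ENCODING[opcode]
    for (_, shift), value in zip(OPERANDS[opcode], arguments):
        word = field_set(word, shift, value)
    return word

def decode_instruction(word):
    by_encoding = {bits: opcode for opcode, bits in ENCODING.items()}
    return by_encoding.get(word & 0xF)

def disassemble_instruction(word):
    opcode = decode_instruction(word)
    if opcode is None:
        raise ValueError("Unknown encoding")
    parts = []
    for kind, shift in OPERANDS[opcode]:
        value = field_get(word, shift)
        parts.append(f"a{value}" if kind == "register" else str(value))
    return f"{opcode.name.lower()} " + ", ".join(parts)
```

Because both directions read the same tables, assembling a line and disassembling the result should reproduce the original text, which is a convenient self-check for a generated library.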

This disassembler algorithm is used in a standalone disassembler tool and also in the debugger 130 to support debugging of machine code.

Compared to the compiler and the assembler 110, the linker is relatively insensitive to the configuration. Most of the linker is standard, and even the machine-dependent portions depend primarily on the core ISA description and can be hand-coded for a particular core ISA. Parameters such as endianness are set from the configuration specification 100 using TPP. The memory map of the target processor 60 is one other aspect of the configuration that the linker needs. As before, the parameters that specify the memory map are inserted into the linker using TPP. In this embodiment of the invention, the GNU linker is driven by a set of linker scripts, and it is these linker scripts that contain the memory map information. An advantage of this approach is that additional linker scripts can be generated later, without reconfiguring the processor 60 and without rebuilding the linker, if the memory map of the target system differs from the memory map specified when the processor 60 was configured. This embodiment therefore includes a tool for configuring new linker scripts with different memory map parameters.
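As a rough illustration of such a tool, the following sketch renders a GNU linker script from memory map parameters. The region names, addresses, and section placement are invented for the example, and the real tool uses TPP rather than Python; only the `MEMORY` and `SECTIONS` syntax is taken from GNU ld.

```python
def make_linker_script(regions, placement):
    """regions: {name: (attributes, origin, length)}; placement: {section: region}."""
    lines = ["MEMORY", "{"]
    for name, (attrs, origin, length) in regions.items():
        lines.append(f"  {name} ({attrs}) : ORIGIN = {origin:#x}, LENGTH = {length:#x}")
    lines += ["}", "SECTIONS", "{"]
    for section, region in placement.items():
        lines.append(f"  {section} : {{ *({section}) }} > {region}")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical memory map for one target system.
script = make_linker_script(
    regions={"rom": ("rx", 0x40000000, 0x20000), "ram": ("rwx", 0x60000000, 0x40000)},
    placement={".text": "rom", ".data": "ram", ".bss": "ram"},
)
```

Calling the function again with different region parameters yields a new script for a different board without touching the processor configuration or the linker binary.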

The debugger 130 provides mechanisms to observe the state of a program as it runs, to single-step the execution one instruction at a time, to introduce breakpoints, and to perform other standard debugging tasks. The program being debugged can run either on a hardware implementation of the configured processor or on the ISS 126. The debugger presents the same interface to the user in either case. When the program runs on a hardware implementation, a small monitor program is included in the target system to control the execution of the user's program and to communicate with the debugger over a serial port. When the program runs on the simulator 126, the simulator 126 itself performs those functions. The debugger 130 depends on the configuration in several ways. It is linked with the instruction encoding/decoding library described above to support disassembling machine code from within the debugger 130. The part of the debugger 130 that displays the processor's register state, and the parts of the debug monitor and the ISS 126 that provide that information to the debugger 130, are generated by scanning the ISA description to find which registers exist in the processor 60.

Other software development tools 30 are standard and need not be changed for each processor configuration. The profile viewer and various utility programs fall into this category. These tools may need to be retargeted once to operate on files in the binary format shared by all configurations of the processor 60, but they depend neither on the ISA description nor on the other parameters in the configuration specification 100.

The configuration specification is also used to configure a simulator called the ISS 126, shown in FIG. 13. The ISS 126 is a software application that models the functional behavior of the configurable processor instruction set. Unlike its counterpart processor hardware model simulators, such as the Synopsys VCS and Cadence Verilog XL and NC simulators, the ISS HDL model is an abstraction of the CPU as it executes instructions. Because it does not need to model every signal transition of every gate and register in the complete processor design, the ISS 126 can run much faster than a hardware simulation.

The ISS 126 allows programs generated for the configured processor 60 to be executed on a host computer. It accurately reproduces the processor's reset and interrupt behavior, allowing low-level programs such as device drivers and initialization code to be developed. This is particularly useful when porting native code to an embedded application.

The ISS 126 can be used to identify potential problems, such as architectural assumptions and memory ordering considerations, without needing to download the code to the actual embedded target.

In this embodiment, ISS semantics are expressed textually in a C-like language to build C operator building blocks that turn instructions into functions. This language can be used to model the rudimentary functionality of interrupts, for example, interrupt registers, bit setting, interrupt levels, and vectors.

The configurable ISS 126 is used for the following four purposes or goals as part of the system design and verification process:

- debugging software applications before hardware becomes available;

- debugging system software (e.g., compilers and operating system components);

- comparing with HDL simulation for hardware design verification. The ISS serves as a reference implementation of the ISA: during processor design verification, the ISS and the processor HDL are both run on diagnostics and applications, and traces from the two are compared; and

- analyzing software application performance (this may be part of the configuration process, or it may be used for further application tuning after a processor configuration has been selected).

All of these goals require that the ISS 126 be able to load and decode programs produced with the configurable assembler 110 and linker. They also require that ISS execution of instructions be semantically equivalent to the corresponding hardware execution and to the compiler's expectations. For these reasons, the ISS 126 derives its decode and execution behavior from the same ISA files used to define the hardware and system software.

For the first and last goals listed above, it is important for the ISS 126 to be as fast as possible for the required accuracy. The ISS 126 therefore permits dynamic control of the level of detail of the simulation. For example, cache details are not modeled unless requested, and cache modeling can be turned off and on dynamically. In addition, parts of the ISS 126 (e.g., the cache and pipeline models) are configured before the ISS 126 is compiled, so that the ISS 126 makes very few configuration-dependent choices of behavior at run time. In this way, all configurable ISS behavior is derived from well-defined sources related to other parts of the system.

For the first and third goals listed above, it is important for the ISS 126 to provide operating system services to applications when these services are not yet available from the OS of the system under design (the target). It is equally important for these services to be provided by the target OS when that is a relevant part of the debugging process. The system therefore provides a design for flexibly moving these services between the ISS host and the simulation target. The current design relies on a combination of ISS dynamic control (trapping of SYSCALL instructions can be turned on and off) and the use of a special SIMCALL instruction to request host OS services.

The last goal requires the ISS 126 to model some aspects of processor and system behavior that are below the level specified by the ISA. In particular, the ISS cache models are constructed by generating C code for the models from Perl scripts that extract parameters from the configuration database 100. In addition, details of the pipeline behavior of instructions (e.g., interlocks based on register use and functional-unit availability requirements) are also derived from the configuration database 100. In the current implementation, a special pipeline description file specifies this information in a LISP-like syntax.

The third goal requires precise control of interrupt behavior. For this purpose, a special non-architectural register in the ISS 126 is used to suppress interrupt enables.

The ISS 126 provides several interfaces to support the different goals for its use:

- a batch or command line mode (generally used in connection with the first and last goals);

- a command loop mode, which provides non-symbolic debugging capabilities such as breakpoints, watchpoints, and stepping, and is frequently used for all four goals;

- a socket interface that allows the ISS 126 to be used by a software debugger as an execution back end (the debugger must itself be configured to read and write the register state for the particular configuration selected); and

- a scriptable interface that allows very detailed debugging and performance analysis. In particular, this interface can be used to compare application behavior on different configurations. For example, at any breakpoint the state from a run on one configuration can be compared with, or transferred to, the state from a run on another configuration.

The simulator 126 also has both hand-coded and automatically generated portions. The hand-coded portions are conventional, except for instruction decode and execution, which are created from tables generated from the ISA description language. The tables decode the instruction by starting from the primary opcode found in the instruction word to be executed, indexing into a table with the value of that field, and continuing until a leaf opcode (i.e., an opcode that is not defined in terms of other opcodes) is found. The tables then give a pointer to the code translated from the TIE code specified in the semantics declaration for the instruction. This code is executed to simulate the instruction.
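The table walk can be sketched in a few lines of Python. Everything specific here is assumed for illustration: the 20-bit word layout, the two-level tables, and the `sem_add`/`sem_sub` functions, which stand in for code translated from TIE semantics.

```python
MASK32 = 0xFFFFFFFF

def field(word, low, width=4):
    return (word >> low) & ((1 << width) - 1)

# Stand-ins for semantic code translated from TIE; operands r, s, t at bits 8, 12, 16.
def sem_add(regs, word):
    regs[field(word, 8)] = (regs[field(word, 12)] + regs[field(word, 16)]) & MASK32

def sem_sub(regs, word):
    regs[field(word, 8)] = (regs[field(word, 12)] - regs[field(word, 16)]) & MASK32

# Each table is (low bit of the field to test, {field value: sub-table or semantic function}).
ARITH_TABLE = (4, {0x0: sem_add, 0x1: sem_sub})
ROOT_TABLE = (0, {0x0: ARITH_TABLE})

def simulate(regs, word):
    """Index table after table until a leaf opcode (a callable) is reached, then run it."""
    entry = ROOT_TABLE
    while not callable(entry):
        low, entries = entry
        entry = entries.get(field(word, low))
        if entry is None:
            raise ValueError(f"illegal instruction {word:#x}")
    entry(regs, word)
```

A leaf is recognized simply by being callable, so adding an instruction to this sketch means adding one table entry and one semantic function.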

The ISS 126 can optionally profile the execution of the program being simulated. This profiling uses a program counter (PC) sampling technique well known in the art. At regular intervals, the simulator 126 samples the program counter of the processor being simulated. It builds a histogram of the number of samples in each region of code. The simulator 126 also counts the number of times each edge in the call graph is executed, by incrementing a counter whenever a call instruction is simulated. When the simulation is complete, the simulator 126 writes an output file containing both the histogram and the call graph edge counts, in a format that can be read by a standard profile viewer. Because the program 118 being simulated need not be modified with instrumentation code (as in standard profiling techniques), the profiling overhead does not affect the simulation results, and the profiling is completely non-invasive.
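A minimal sketch of this bookkeeping is shown below. The trace format (one `(pc, cycles, call_target)` tuple per simulated instruction), the sampling interval, and the region size are assumptions; a real simulator would update the counters inside its main loop and write them out in the profile viewer's file format.

```python
from collections import Counter

def profile(executed, sample_interval=4, region_size=16):
    """executed: iterable of (pc, cycles, call_target or None), one per instruction."""
    histogram = Counter()   # samples per code region, keyed by region start address
    call_edges = Counter()  # executions per (call site, callee) edge
    cycle = 0
    next_sample = sample_interval
    for pc, cycles, call_target in executed:
        cycle += cycles
        while cycle >= next_sample:          # take a PC sample at each regular interval
            histogram[pc // region_size * region_size] += 1
            next_sample += sample_interval
        if call_target is not None:          # count the call graph edge
            call_edges[(pc, call_target)] += 1
    return histogram, call_edges

trace = [(0x100, 1, None), (0x104, 1, 0x200), (0x200, 3, None), (0x204, 3, None)]
histogram, call_edges = profile(trace)
```

The simulated program is never modified; all counting happens on the simulator side, which is what makes the technique non-invasive.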

Preferably, the system makes available hardware processor emulation as well as software processor simulation. For this purpose, this embodiment provides an emulation board. As shown in FIG. 6, the emulation board 200 uses a complex programmable logic device (CPLD) 202, such as the Altera Flex 10K200E, to emulate a processor configuration 60 in hardware. Once programmed with the processor netlist generated by the system, the CPLD device 202 is functionally equivalent to the final ASIC product. It offers the advantage that a physical implementation of the processor 60 is available that runs much faster than other simulation methods (such as the ISS or HDL simulation) and is cycle accurate. However, it cannot reach the high frequency targets that the final ASIC device can achieve.

This board enables the designer to evaluate various processor configuration options and to begin software development and debugging early in the design cycle. It can also be used for functional verification of the processor configuration.

The emulation board 200 carries several resources that make software development, debugging, and verification easy. These include the CPLD device 202 itself, EPROM 204, SRAM 206, synchronous SRAM 208, flash memory 210, and two RS232 serial channels 212. The serial channels 212 provide a communication link to a UNIX or PC host for downloading and debugging user programs. The configuration of the processor 60, in the form of a CPLD netlist, is downloaded into the CPLD 202 either through a dedicated serial link to the device's configuration port 214 or through dedicated configuration ROMs 216.

The resources available on the board 200 are also configurable to a degree. The memory map of the various memory elements on the board can be changed easily, because the mapping is done through a programmable logic device (PLD) 217 that can itself be changed easily. Likewise, the caches 218 and 228 used by the processor core can be made expandable by using larger memory devices and by appropriately sizing the tag buses 222 and 224 that connect to the caches 218 and 228.

Using the board to evaluate a particular processor configuration involves several steps. The first step is to obtain a set of RTL files describing the particular configuration of the processor. The next step is to synthesize a gate-level netlist from the RTL description using any of a number of commercial synthesis tools; one example is FPGA Express from Synopsys. The gate-level netlist is then used to obtain a CPLD implementation using tools typically provided by the device vendor; one such tool is Maxplus2 from Altera Corporation. The final step is to download the implementation onto the CPLD chip on the emulation board using a programmer, again provided by the CPLD vendor.

Because one of the purposes of the emulation board is to support quick prototype implementation for debugging purposes, it is important that the CPLD implementation process outlined in the previous paragraph be automated. To achieve this, the files delivered to users are customized by grouping all relevant files into a single directory. A fully customized synthesis script is then provided that can synthesize the particular processor configuration to the particular FPGA device selected by the customer. A fully customized implementation script to be used by the vendor tools is also generated. These synthesis and implementation scripts guarantee a functionally correct implementation with optimal performance. Functional correctness is achieved by including appropriate commands in the script to read in all RTL files relevant to the specific processor configuration, by including appropriate commands to assign chip pin locations based on the I/O signals in the processor configuration, and by including commands to obtain specific logic implementations for certain critical portions of the processor logic, such as gated clocks. The script also improves the performance of the implementation by assigning detailed timing constraints to all processor I/O signals and by special handling of certain critical signals. An example of a timing constraint is assigning a specific input delay to a signal by taking into account the delay of that signal on the board. An example of critical signal handling is assigning the clock signal to dedicated global wires in order to achieve low clock skew on the CPLD chip.

Preferably, the system also configures a verification suite for the configured processor 60. Verification of most complex designs, such as microprocessors, consists of the following flow:

- build a testbench to simulate the design and compare the outputs, either within the testbench or using an external model such as the ISS 126;

- write diagnostics to generate the stimulus;

- measure the coverage of the verification using schemes such as line coverage or finite state machine coverage of the HDL, declining bug rate, and the number of vectors run on the design; and

- if the coverage is not sufficient, write more diagnostics, and possibly use tools to generate diagnostics, to exercise the design further.

The present invention uses a flow that is somewhat similar, but all components of the flow are modified to account for the configurability of the design. The methodology consists of the following steps:

- build a testbench for a particular configuration. Configuration of the testbench uses an approach similar to that described for the HDL and supports all the options and extensions supported there, i.e., cache sizes, bus interface, clocking, interrupt generation, etc.;

- run self-checking diagnostics on a particular configuration of the HDL. The diagnostics themselves are configurable so that they can be tailored to a particular piece of hardware. The selection of which diagnostics to run also depends on the configuration;

- run pseudo-randomly generated diagnostics and compare the processor state against the ISS 126 after the execution of each instruction; and

- measure the coverage of the verification using coverage tools that measure functional as well as line coverage. In addition, monitors and checkers are run along with the diagnostics to look for illegal states and conditions. All of these are configurable for a particular configuration specification.
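The third step above, lock-step comparison against a reference model after every instruction, can be sketched as follows. The single `addi` instruction, the register names, and the deliberately buggy model are all invented for the example; in the described flow the two sides would be the HDL simulation and the ISS 126.

```python
import random

MASK32 = 0xFFFFFFFF

def random_program(seed, length, registers=("a1", "a2", "a3")):
    """Pseudo-random but reproducible stream of ("addi", register, immediate) tuples."""
    rng = random.Random(seed)
    return [("addi", rng.choice(registers), rng.randrange(1 << 16)) for _ in range(length)]

def step_reference(state, instr):
    _, reg, imm = instr
    state[reg] = (state[reg] + imm) & MASK32

def step_buggy(state, instr):
    _, reg, imm = instr
    state[reg] = (state[reg] + imm) & 0xFFFF   # deliberately wrong: drops the upper bits

def cosimulate(program, step_a, step_b, initial):
    """Run both models in lock step; return (index, differences) at the first mismatch."""
    a, b = dict(initial), dict(initial)
    for index, instr in enumerate(program):
        step_a(a, instr)
        step_b(b, instr)
        if a != b:
            return index, {k: (a[k], b[k]) for k in a if a[k] != b[k]}
    return None   # the models agreed after every instruction
```

Comparing after each instruction pinpoints the first diverging instruction, rather than only showing that the final states differ.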

All of the verification components are configurable. The configurability is implemented using TPP.

The testbench is a Verilog™ model of a system in which the configured processor 60 is placed. In the case of the present invention, the testbench includes:

- caches, bus interface, and external memory;

- external interrupt and bus error generation; and

- clock generation.

Because almost all of the above characteristics are configurable, the testbench itself needs to support configurability. Thus, for example, the cache size and width and the number of external interrupts are adjusted automatically according to the configuration.

The testbench provides stimulus to the device under test, the processor 60. It does this by providing assembly-level instructions that are preloaded into memory. It also generates signals that control the behavior of the processor 60, for example, interrupts. The frequency and timing of these signals are likewise controlled by the testbench, which generates them automatically.

The diagnostics have two types of configurability. First, diagnostics use TPP to determine what to test. For example, a diagnostic has been written to test software interrupts. This diagnostic needs to know how many software interrupts there are in order to generate the right assembly code.

Second, the processor configuration system 10 must determine which diagnostics are suitable for a given configuration. For example, a diagnostic written to test the MAC unit is not applicable to a processor 60 that does not include that unit. In this embodiment, this is accomplished using a database containing information about each diagnostic. The database may contain the following information for each diagnostic:

- whether to use the diagnostic if a certain option has been selected;

- whether the diagnostic cannot be run with interrupts;

- whether the diagnostic needs special libraries or handlers in order to run; and

- whether the diagnostic cannot be run in co-simulation with ISS 126.
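By way of illustration only, the per-diagnostic database record and the applicability decision described above might be sketched in C as follows. The structure, field names, and option bits are hypothetical and are not taken from the described system; they simply mirror the four items of information listed above.

```c
#include <stdbool.h>

/* Hypothetical feature bits of a processor configuration (illustrative only). */
enum { OPT_MAC = 1u << 0, OPT_INTERRUPTS = 1u << 1, OPT_DATA_BREAK = 1u << 2 };

/* One hypothetical database record per diagnostic. */
struct diag_info {
    const char *name;
    unsigned required_opts;   /* use the diagnostic only if these options are selected */
    bool no_interrupts;       /* diagnostic cannot be run with interrupts */
    bool needs_special_libs;  /* diagnostic needs special libraries or handlers */
    bool no_cosim;            /* diagnostic cannot run in co-simulation with the ISS */
};

/* Returns true if the diagnostic applies to this configuration and run mode. */
static bool diag_applies(const struct diag_info *d, unsigned config_opts,
                         bool interrupts_on, bool special_libs_available,
                         bool cosim)
{
    if ((d->required_opts & config_opts) != d->required_opts)
        return false;                        /* a required option is missing */
    if (d->no_interrupts && interrupts_on)
        return false;
    if (d->needs_special_libs && !special_libs_available)
        return false;
    if (d->no_cosim && cosim)
        return false;
    return true;
}
```

With such a record, a MAC diagnostic would be skipped automatically for any configuration whose option set lacks the MAC bit.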

Preferably, the processor hardware description includes three types of test tools: test generator tools, monitors and coverage tools (or checkers), and a co-simulation mechanism. Test generator tools create a series of processor instructions in an intelligent fashion; they are pseudo-random generators of test sequences. This embodiment uses two types internally: a specially developed one called RTPG, and another, based on an external tool, called VERA (VSG). Both have configurability built around them. They generate a series of instructions based on the instructions that are valid for a configuration. These tools can also handle instructions newly defined in TIE, so that the new instructions are generated randomly for testing. This embodiment also includes monitors and checkers that measure the coverage of design verification.

Monitors and coverage tools run alongside a regression run. Coverage tools monitor what the diagnostics are doing and which functions and logic of the HDL are being exercised. All of this information is collected throughout the regression run and analyzed afterwards to obtain hints about which parts of the logic need further testing. This embodiment uses several configurable functional coverage tools. For example, a particular finite state machine may not include all of its states under a given configuration. For that configuration, the functional coverage tool need not try to check those states or transitions. This is accomplished by making the tool configurable through TPP.

Similarly, there are monitors that check for illegal states occurring during HDL simulation. These illegal states can be reported as errors. For example, on a three-state bus, two drivers should not both be high at the same time. These monitors are configurable: checks are added or removed depending on whether a particular piece of logic is included in the configuration.

The co-simulation mechanism connects the HDL to ISS 126. It is used to check that the processor state is the same in the HDL and in ISS 126 at the end of each instruction. It is also configurable, to the extent that it knows which features are included in each configuration and which state needs to be compared. Thus, for example, the data breakpoint feature adds a special register, and the mechanism needs to know how to compare this new special register.

Instruction semantics specified via TIE can be translated into functionally equivalent C functions for use in ISS 126 and for use by system designers in testing and verification. The semantics of an instruction in configuration database 106 are translated into a C function by tools that build a parse tree using standard parser tools, then walk the tree, check it against the language rules, and output the corresponding expressions in C. The translation requires a pre-pass to assign bit widths to all expressions and to rewrite the parse tree so that some translations are simplified. These translators are relatively simple compared with other translators, such as HDL-to-C or C-to-assembly compilers, and can be written by those skilled in the art starting from the TIE and C language specifications.

Using the compiler and the assembler/disassembler 100 configured by configuration file 100, the benchmark application source code 118 is compiled and assembled and, using sample data set 124, simulated to obtain a software profile 130, which is also sent to the user configuration capture routine for feedback to the user.

The ability to obtain hardware and software cost/benefit characteristics for any choice of configuration parameters opens up opportunities for the designer to optimize the system further. In particular, it lets the designer select the configuration parameters that optimize the overall system according to some figure-of-merit function. One possible procedure is based on a greedy strategy that repeatedly selects or deselects one configuration parameter. At each step, the parameter with the best effect on overall system performance and cost is chosen. This step is repeated until no single parameter can be changed to improve the performance and cost of the system. Other extensions include looking at a group of configuration parameters at a time, or using more sophisticated search algorithms.
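A minimal sketch of such a greedy search is given below in C. It assumes, purely for illustration, that a configuration is a bit set in which bit i indicates whether parameter i is selected, and that a figure-of-merit callback returns a value for which lower is better. The names greedy_configure, merit_fn, and example_merit, and the toy cost numbers, are hypothetical.

```c
/* Hypothetical figure of merit: lower is better. */
typedef double (*merit_fn)(unsigned config);

/* Toy merit used only for illustration: estimated cycles plus estimated area. */
static double example_merit(unsigned config)
{
    double cycles = 100.0, area = 10.0;
    if (config & 1u) { cycles -= 40.0; area += 20.0; }  /* parameter 0 */
    if (config & 2u) { cycles -=  5.0; area += 25.0; }  /* parameter 1 */
    if (config & 4u) { cycles -= 30.0; area += 10.0; }  /* parameter 2 */
    return cycles + area;
}

/* Greedy search: at each step toggle the single parameter that improves the
 * merit the most; stop when no single toggle improves it. Terminates because
 * the merit strictly decreases and the number of configurations is finite. */
static unsigned greedy_configure(unsigned config, int nparams, merit_fn merit)
{
    for (;;) {
        double best = merit(config);
        int best_bit = -1;
        for (int i = 0; i < nparams; i++) {
            double m = merit(config ^ (1u << i));  /* select or deselect i */
            if (m < best) {
                best = m;
                best_bit = i;
            }
        }
        if (best_bit < 0)
            return config;                         /* local optimum reached */
        config ^= 1u << best_bit;
    }
}
```

In a real flow the merit callback would be fed by the hardware and software profiles described above rather than by fixed constants. As with any greedy method, the result is a local optimum, which is why the text mentions examining groups of parameters or using more sophisticated searches.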

Besides finding the best choice of configuration parameters, this procedure can also be used to construct optimal processor extensions. Because the space of possible processor extensions is large, it is important to limit the number of extension candidates. One technique is to analyze the application software and consider only those instruction extensions that can improve system performance or cost.

Having described the operation of an automatic processor configuration system according to this embodiment, examples of processor macroarchitecture configuration will now be given. The first example shows the advantages of applying the invention to image compression.

Motion estimation is an important part of many image compression algorithms, including MPEG video and H.263 conferencing applications. Video compression exploits the similarity between one frame and the next to reduce the storage required for each frame. In the simplest case, each block of the image to be compressed is compared with the corresponding block (same X, Y position) of a reference image (an image that closely precedes or follows the one being compressed). Compressing the differences between frames is generally more bit-efficient than compressing the individual images. In a video sequence, distinctive image features usually move from frame to frame, so the closest match between blocks in different frames is usually not at exactly the same X, Y position but at some offset. If significant parts of the image move between frames, the motion must be identified and compensated before the differences are computed. This means that the most compact representation is obtained by encoding the differences between successive images together with, for distinctive features, the X, Y offset of the sub-image used in the computed difference. The offset in position used to compute the image difference is called the motion vector.

In this kind of image compression, the most computationally intensive task is determining the most appropriate motion vector for each block. A common way to select the motion vector is to find the vector with the lowest average pixel-by-pixel difference between each block of the image being compressed and a set of candidate blocks of the previous image. The candidate blocks are the set of all blocks in a neighborhood around the position of the block being compressed. The size of the image, the size of the block, and the size of the neighborhood all affect the running time of the motion estimation algorithm.

Simple block-based motion estimation compares each sub-image of the image to be compressed against a reference image. The reference image may precede or follow the subject image in the video sequence. In every case, the reference image must be available to the decompression system before the subject image is decompressed. The comparison of one block of the image under compression with the candidate blocks of the reference image is described below.

For each block of the subject image, a search is performed around the corresponding location in the reference image. Typically, each color component of the image (e.g., YUV) is analyzed separately. Sometimes only one component, such as luminance, is analyzed. The average pixel-by-pixel difference is computed between the subject block and every possible block in the search region of the reference image. Each difference is the absolute value of the difference between the two pixel values. The average is proportional to the sum over the N² pixel pairs of the two blocks (where N is the dimension of the block). The block of the reference image that yields the smallest average pixel difference defines the motion vector for that block of the subject image.
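The criterion just described can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the original text: S is the subject image, R the reference image, (x_0, y_0) the top-left corner of the subject block, N the block dimension, and W the set of candidate offsets in the search region.

```latex
\mathrm{SAD}(d_x, d_y) \;=\; \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
  \bigl|\, S(x_0 + i,\; y_0 + j) \;-\; R(x_0 + d_x + i,\; y_0 + d_y + j) \,\bigr|,
\qquad
(d_x^{*}, d_y^{*}) \;=\; \operatorname*{arg\,min}_{(d_x, d_y) \in W} \mathrm{SAD}(d_x, d_y)
```

Since every candidate sums the same number of pixel pairs (N²), minimizing the sum is equivalent to minimizing the average, and the minimizing offset is the motion vector of the block.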

The following example shows a simple form of the motion estimation algorithm and then optimizes the algorithm using TIE for a small application-specific functional unit. The optimization yields a speedup of more than a factor of 10, making processor-based compression feasible for many video applications. It illustrates the power of a processor that combines the ease of programming in a high-level language with the efficiency of special-purpose hardware.

This example uses two matrices, OldB and NewB, to represent the old and new images. The image size is given by NX and NY, and the block size by BLOCKX and BLOCKY. The image therefore consists of NX/BLOCKX by NY/BLOCKY blocks. The search region around a block is given by SEARCHX and SEARCHY. The best motion vectors and values are stored in VectX, VectY, and VectB. The best motion vectors and values computed by the base (reference) implementation are stored in BaseX, BaseY, and BaseB; these are used to check the vectors computed by the implementation that uses the instruction extension. These basic definitions are captured in the following C code segment:

#include <limits.h>  /* UINT_MAX */
#include <stdio.h>   /* printf */

#define NX 64        /* image width */
#define NY 32        /* image height */
#define BLOCKX 16    /* block width */
#define BLOCKY 16    /* block height */
#define SEARCHX 4    /* search region width */
#define SEARCHY 4    /* search region height */

unsigned char OldB[NX][NY];                   /* old image */
unsigned char NewB[NX][NY];                   /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];   /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];   /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY];   /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY];   /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY];   /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY];   /* Base absolute difference */

#define ABS(x)    (((x) < 0) ? (-(x)) : (x))
#define MIN(x,y)  (((x) < (y)) ? (x) : (y))
#define MAX(x,y)  (((x) > (y)) ? (x) : (y))
#define ABSD(x,y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

The motion estimation algorithm consists of three nested loops:

1. For each source block in the old image.

2. For each destination block of the new image in the region surrounding the source block.

3. Compute the absolute difference between each pair of pixels.

The complete code for the algorithm is listed below.

Reference software implementation

void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            /* clip the search region to the image */
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX-BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY-BLOCKY, by*BLOCKY + SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    /* sum of absolute differences for this candidate block */
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}

The basic implementation is simple, but it fails to exploit much of the inherent parallelism in this block-to-block comparison. The configurable processor architecture provides two key tools that significantly speed up execution of this application.

First, the instruction set architecture includes powerful funnel shift primitives that allow fast extraction of unaligned fields from memory. This lets the inner loop of the pixel comparison fetch groups of adjacent pixels from memory efficiently. The loop can be rewritten to operate on four pixels (bytes) at a time. In particular, for the purposes of this example, it is desirable to define a new instruction that computes the absolute differences of four pixel pairs at once. Before defining this new instruction, however, the algorithm must be re-implemented to make use of such an instruction.

The presence of this instruction improves the inner-loop difference computation so much that loop unrolling also becomes attractive. The C code for the inner loop is rewritten to take advantage of the new sum-of-absolute-differences instruction and the efficient shifting. Parts of four overlapping blocks of the reference image can then be compared in the same loop. SAD(x, y) is a new intrinsic function corresponding to the added instruction. SRC(x, y) performs a right shift of the concatenation of x and y by the shift amount stored in the SAR register.
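For readers who wish to run the listing that follows on a host machine, the funnel shift might be modeled in C roughly as shown below. This is a simplified software stand-in and not the processor's definition: it assumes that the first operand of SRC supplies the upper 32 bits of the concatenation, that SSAI(n) simply stores a 5-bit shift amount, and it keeps SAR in an ordinary static variable.

```c
#include <stdint.h>

/* Software stand-in for the shift-amount register (SAR); an assumption made
 * for host-side testing only. */
static unsigned sar_model;

/* Model of SSAI(n): set the shift amount used by later SRC operations. */
static void SSAI(unsigned n)
{
    sar_model = n & 31u;
}

/* Model of SRC(hi, lo): right-shift the 64-bit concatenation hi:lo by the
 * amount in SAR and keep the low 32 bits. */
static uint32_t SRC(uint32_t hi, uint32_t lo)
{
    uint64_t cat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(cat >> sar_model);
}
```

On a little-endian machine, with A and B holding two consecutive 32-bit words of a pixel row, SSAI(8) followed by SRC(B, A) then yields the four pixels that start one byte later than A, which is how the listing obtains the overlapping candidate blocks without unaligned loads.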

Fast version of motion estimation using the SAD instruction

void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX-BLOCKX, bx*BLOCKX + SEARCHX);
            /* endy is used below but was never assigned in the listing;
               set as in the reference implementation */
            endy = MIN(NY-BLOCKY, by*BLOCKY + SEARCHY);
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        /* one row of the new block: 16 pixels in 4 words */
                        N = (unsigned *)&(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0];
                        N2 = N[1];
                        N3 = N[2];
                        N4 = N[3];
                        /* matching row of the old image, plus one extra word */
                        O = (unsigned *)&(OldB[cx+x][cy]);
                        A = O[0];
                        B = O[1];
                        C = O[2];
                        D = O[3];
                        E = O[4];
                        /* candidate at offset cy */
                        diff0 += SAD(A, N1) + SAD(B, N2) +
                                 SAD(C, N3) + SAD(D, N4);
                        /* candidate at offset cy+1 */
                        SSAI(8);
                        diff1 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        /* candidate at offset cy+2 */
                        SSAI(16);
                        diff2 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        /* candidate at offset cy+3 */
                        SSAI(24);
                        diff3 += SAD(SRC(B,A), N1) + SAD(SRC(C,B), N2) +
                                 SAD(SRC(D,C), N3) + SAD(SRC(E,D), N4);
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}

This implementation uses the following SAD function to emulate the eventual new instruction:

Sum of absolute differences of four bytes

static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}

To debug the new implementation, the following test program compares the motion vectors and values computed by the new implementation with those computed by the base implementation:

Main test program

int
main(int argc, char **argv)
{
    int passwd;
#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    init();
    motion_estimate_base();
    motion_estimate_tie();
    passwd = check();
#ifndef NOPRINTF
    printf(passwd ? "TIE version passed\n" : "TIE version failed\n");
#endif
    return passwd;
}

This simple test program is used throughout the development process. One convention that must be followed here is that the main program returns 0 when an error is detected and 1 otherwise.
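The test program calls a check() routine that is not listed in this section. A hypothetical sketch of what such a routine might look like is given below; it follows the convention just stated (1 on agreement, 0 on any mismatch). The small array dimensions NBX and NBY stand in for NX/BLOCKX and NY/BLOCKY so that the sketch is self-contained; the actual routine may differ.

```c
#define NBX 4  /* stands in for NX/BLOCKX (assumption for this sketch) */
#define NBY 2  /* stands in for NY/BLOCKY (assumption for this sketch) */

static unsigned short VectX[NBX][NBY], VectY[NBX][NBY], VectB[NBX][NBY];
static unsigned short BaseX[NBX][NBY], BaseY[NBX][NBY], BaseB[NBX][NBY];

/* Hypothetical check(): returns 1 if the new implementation's results match
 * the base implementation for every block, 0 on the first mismatch. */
static int check(void)
{
    for (int bx = 0; bx < NBX; bx++) {
        for (int by = 0; by < NBY; by++) {
            if (VectX[bx][by] != BaseX[bx][by] ||
                VectY[bx][by] != BaseY[bx][by] ||
                VectB[bx][by] != BaseB[bx][by])
                return 0;
        }
    }
    return 1;
}
```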

TIE allows rapid specification of new instructions. The configurable processor generator fully implements these instructions in both the hardware implementation and the software development tools. Hardware synthesis generates an optimal integration of the new function into the hardware datapath. The configurable processor's software environment fully supports the new instructions in the C and C++ compilers, the assembler, the symbolic debugger, the tracer, and the cycle-accurate instruction set simulator. Rapid regeneration of hardware and software makes application-specific instructions a fast and reliable tool for application acceleration.

This example uses TIE to implement a simple instruction that performs pixel differencing, absolute value, and accumulation on four pixels in parallel. This single instruction performs 11 basic operations (which in a conventional processor might each require a separate instruction) as one atomic operation. The complete description follows:

// define a new opcode for Sum of Absolute Difference (SAD)
// from which instruction decoding logic is derived
opcode SAD op2=4'b0000 CUST0

// define a new instruction class
// from which compiler, assembler, disassembler
// routines are derived
iclass sad {SAD} {out arr, in ars, in art}

// semantic definition from which instruction-set
// simulation and RTL descriptions are derived
semantic sad_logic {SAD} {
    wire [8:0] diff01, diff11, diff21, diff31;
    wire [7:0] diff0r, diff1r, diff2r, diff3r;
    assign diff01 = art[7:0] - ars[7:0];
    assign diff11 = art[15:8] - ars[15:8];
    assign diff21 = art[23:16] - ars[23:16];
    assign diff31 = art[31:24] - ars[31:24];
    assign diff0r = ars[7:0] - art[7:0];
    assign diff1r = ars[15:8] - art[15:8];
    assign diff2r = ars[23:16] - art[23:16];
    assign diff3r = ars[31:24] - art[31:24];
    assign arr =
        (diff01[8] ? diff0r : diff01) +
        (diff11[8] ? diff1r : diff11) +
        (diff21[8] ? diff2r : diff21) +
        (diff31[8] ? diff3r : diff31);
}

This description represents the minimum steps needed to define a new instruction. First, a new opcode must be defined for the instruction. In this case, the new opcode SAD is defined as a sub-opcode of CUST0. As noted above, CUST0 is predefined as:

opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST

It is easy to see that QRST is the top-level opcode, CUST0 is a sub-opcode of QRST, and SAD is in turn a sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the opcode space. One important thing to remember is that CUST0 (and CUST1) are defined as opcode space reserved for users to add new instructions. Users should preferably stay within this allocated opcode space to ensure future reusability of TIE descriptions.

The second step in the TIE description is to define a new instruction class containing the new instruction SAD. This is where the operands of the SAD instruction are defined. In this case, SAD has three register operands: the destination register arr and the source registers ars and art. As noted earlier, arr is defined as the register indexed by the r field of the instruction, and ars and art as the registers indexed by the s and t fields.

The last block of the description gives the formal semantic definition of the SAD instruction. The description uses a subset of the Verilog HDL language for describing combinational logic. It is this block that specifies precisely how the ISS will simulate the SAD instruction and how the additional circuitry is synthesized and added to the configurable processor hardware to support the new instruction.
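To make the semantic block concrete, one way to mirror its combinational logic in C is sketched below. This is a hand-written illustration and not the output of the TIE-to-C translator; the function names sad_lane and sad_model are hypothetical. Each byte lane forms a 9-bit difference and a reverse 8-bit difference, and bit 8 of the 9-bit difference (the borrow) selects which one is the magnitude.

```c
#include <stdint.h>

/* One byte lane of the semantic block, modeled bit for bit. */
static uint32_t sad_lane(uint32_t ars_byte, uint32_t art_byte)
{
    uint32_t diff_l = (art_byte - ars_byte) & 0x1FFu;  /* wire [8:0] */
    uint32_t diff_r = (ars_byte - art_byte) & 0xFFu;   /* wire [7:0] */
    /* Bit 8 set means art < ars, so the reverse difference is the magnitude. */
    return (diff_l & 0x100u) ? diff_r : diff_l;
}

/* Sum of the four lane magnitudes, as assigned to arr. */
static uint32_t sad_model(uint32_t ars, uint32_t art)
{
    uint32_t sum = 0;
    for (int shift = 0; shift < 32; shift += 8)
        sum += sad_lane((ars >> shift) & 0xFFu, (art >> shift) & 0xFFu);
    return sum;
}
```

For any pair of 32-bit inputs this model should agree with the C SAD function listed earlier, which is the kind of equivalence the verification steps described in the following paragraphs are meant to establish.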

Next, the TIE description is debugged and verified using the tools described earlier. After the TIE description has been verified as correct, the next step is to estimate the impact of the new instruction on hardware size and performance. As noted above, this can be done using, for example, Design Compiler™. When Design Compiler finishes, the user can examine its output for detailed area and speed reports.

After the TIE description has been verified as correct and efficient, it is time to configure and build a configurable processor that also supports the new SAD instruction. As described above, this is done using the graphical user interface (GUI).

Next, the motion estimation code is compiled for the configurable processor, and the instruction set simulator is used to verify the correctness of the program and, more importantly, to measure its performance. This is done in three steps: run the test program using the simulator; run the base implementation to obtain its instruction count; and run the new implementation to obtain its instruction count.

The following is the simulation output of the second step:

Block=(16,16), Search=(4,4), size=(32,32)
TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.98 seconds

Events                          Number    Number per 100 instrs
Instructions                    226005    (100.00)
Unconditional taken branches       454    (0.20)
Conditional branches             37149    (16.44)
    Taken                        26947    (11.92)
    Not taken                    10202    (4.51)
Window Overflows                    20    (0.01)
Window Underflows                   19    (0.01)

The following is the simulation output of the last step:

TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.36 seconds

Events                          Number    Number per 100 instrs
Instructions                     51743    (100.00)
Unconditional taken branches       706    (1.36)
Conditional branches              3541    (6.84)
    Taken                         2759    (5.33)
    Not taken                      782    (1.51)
Window Overflows                    20    (0.04)
Window Underflows                   19    (0.04)

From these two reports, a speedup of roughly 4x can be seen. Note that the configurable processor instruction set simulator can provide much more useful information as well.
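As a quick check on the figures reported in the two simulation outputs, the ratios of the instruction counts and of the conditional branch counts work out as follows (the arithmetic is added here for the reader and is not part of the simulator output):

```latex
\frac{226005}{51743} \approx 4.37,
\qquad
\frac{37149}{3541} \approx 10.49
```

The first ratio is the roughly 4x reduction in executed instructions. The second shows that conditional branches fall by a larger factor, which is consistent with the inner loop processing four pixels per SAD operation and four candidate offsets per iteration.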

After verifying the correctness and performance of the program, the next step is to run the test program using the Verilog simulator as described above. Those skilled in the art can find the details of this process in the makefile in Appendix C (the associated files are also shown in Appendix C). The purpose of this simulation is to further verify the correctness of the new implementation and, more importantly, to make this test program part of the regression tests for the configured processor.

Finally, the processor logic can be synthesized using, for example, Design Compiler™ and placed and routed using, for example, Apollo™.

For clarity and simplicity of explanation, this example takes a simplified view of video compression and motion estimation. In practice, standard compression algorithms involve many additional subtleties. For example, MPEG2 typically performs motion estimation and compensation at sub-pixel resolution. Two adjacent rows or columns of pixels can be averaged to generate a set of pixels interpolated to an imaginary position midway between the two rows or columns. Here the user-defined instructions of the configurable processor are again useful, because a parallel pixel-averaging instruction is easily implemented in only three or four lines of TIE code. Averaging between pixels in a row again uses the efficient alignment operations of the processor's standard instruction set.

Thus, incorporating a simple sum-of-absolute-differences instruction adds only a few hundred gates, yet improves motion estimation performance by more than a factor of ten. This speedup represents a significant improvement in the cost and power efficiency of the final system.

Moreover, the seamless extension of the software development tools to incorporate the new motion estimation instruction allows rapid prototyping, performance analysis and release of a complete software application solution. The solution of the present invention makes the configuration of application-specific processors simple, reliable and complete, and offers dramatic improvements in the cost, performance, functionality and power efficiency of the final system product.

As an example focused on adding a hardware functional unit, consider the base configuration shown in FIG. 6, which includes the processor control function, program counter (PC), branch selection, instruction memory or cache and instruction decoder, together with the basic integer datapath, which includes the main register file, bypass multiplexers, pipeline registers, ALU, address generator and data memory or cache.

The HDL is written with the multiplier logic conditionally present (when the "multiplier" parameter is set), and a multiplier unit is added as a new pipeline stage as shown in FIG. 7 (changes to exception handling may be required if precise exceptions are to be supported). Of course, instructions that use the multiplier are preferably added along with the new unit.

As a second example, a full coprocessor may be added to the base configuration as shown in FIG. 8, serving as a digital signal processor such as a multiply/accumulate unit. This entails changes to the processor control, such as adding decoding control signals for multiply-accumulate operations, including decoding of register sources and destinations from the extended instructions; adding appropriate pipeline delays for the control signals; extending the register destination logic; adding control for a register bypass multiplexer for moves from the accumulate registers; and including the multiply-accumulate unit as a possible source of an instruction result. In addition, it requires the addition of the multiply-accumulate unit itself, which entails additional accumulator registers, a multiply-accumulate array and source-select multiplexers for the main register sources. Likewise, adding the coprocessor entails extending the register bypass multiplexer to take a source from the accumulate registers, and extending the load/alignment multiplexer to take a source from the multiplier result. Again, the system preferably adds instructions for using the new functional unit along with the actual hardware.

Another option that is particularly useful in combination with digital signal processors is a floating point unit. Such a functional unit, implementing for example the IEEE 754 single-precision floating point standard, may be added along with the instructions for accessing it. The floating point unit may be used, e.g., in digital signal processing applications such as audio compression and decompression.

As yet another example of the system's flexibility, consider the 4 kB memory interface shown in FIG. 9. Using the configurability of the present invention, the coprocessor registers and datapaths may be wider or narrower than the main integer register file and datapath, and the local memory width may be varied so that the memory width equals the width of the widest processor or coprocessor (the addressing of the memory on reads and writes being adjusted accordingly). For example, FIG. 10 shows a local memory system for a processor that supports 32-bit loads and stores to a processor/coprocessor combination addressing the same array, but in which the coprocessor supports 128-bit loads and stores. This can be implemented using the TPP code

function memory(Select, A1, A2, DI1, DI2, W1, W2, DO1, DO2)
; $B1 = config_get_value("width_of_port_1"); $B2 = config_get_value("width_of_port_2");
; $Bytes = config_get_value("size_of_memory");
; $Max = max($B1, $B2); $Min = min($B1, $B2);
; $Banks = $Max/$Min;
; $Wide1 = ($Max == $B1); $Wide2 = ($Max == $B2);
; $Depth = $Bytes/(log2($Banks)*log2($Max));
    wire [`$Max`*8-1:0] Data1 = `$Wide1` ? DI1 : {`$Banks`{DI1}};
    wire [`$Max`*8-1:0] Data2 = `$Wide2` ? DI2 : {`$Banks`{DI2}};
    wire [`$Max`*8-1:0] D = Select ? Data1 : Data2;
    wire Wide = Select ? Wide1 : Wide2;
    wire [log2(`$Bytes`)-1:0] A = Select ? A1 : A2;
    wire [log2(`$Bytes`)-1:0] Address = A[log2(`$Bytes`)-1:log2(`$Banks`)];
    wire [log2(`$Banks`)-1:0] Lane = A[log2(`$Banks`)-1:0];
; for ($i=0; $i<$Banks; $i++) {
    wire WrEnable`$i` = Wide | (Lane == `$i`);
    wire [log2(`$Min`)-1:0] WrData`$i` = D[(`$i`+1)*`$Min`*8-1:`$i`*`$Min`*8];
    ram(RdData`$i`, Depth, Address, WrData`$i`, WrEnable`$i`);
; }
    wire [`$Max`*8-1:0] RdData = {
; for ($i=0; $i<$Banks; $i++) {
        RdData`$i`,
; }
    }
    wire [`$B1`*8-1:0] DO1 = Wide1 ? RdData : RdData[(Lane+1)*B1*8-1:Lane*B1*8];
    wire [`$B2`*8-1:0] DO2 = Wide2 ? RdData : RdData[(Lane+1)*B2*8-1:Lane*B2*8];

Here, $Bytes is the total memory size, accessed either with width B1 at byte address A1 on data bus D1 under control of write signal W1, or using the corresponding parameters B2, A2, D2 and W2. Only one set of signals, chosen by Select, is active in a given cycle. The TPP code implements the memory as a collection of memory banks. The width of each bank is given by the minimum access width, and the number of banks by the ratio of the maximum to the minimum access width. A for loop instantiates each memory bank together with its associated write signals, i.e., write enable and write data. A second for loop gathers the data read from all the banks onto a single bus.
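The banking arithmetic described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the helper names (bank_layout, write_enables, split_write_data) are hypothetical, port widths are given in bytes, and the lane is taken from the low-order address bits exactly as the listing slices them. It models neither the RAM contents nor the read path.

```python
# Behavioural sketch of the banked local memory (hypothetical helper names).
# Port widths are in bytes; the wider port must be a multiple of the narrower.

def bank_layout(b1, b2):
    """Return (max_width, min_width, banks) for the two port widths."""
    widest, narrowest = max(b1, b2), min(b1, b2)
    if widest % narrowest != 0:
        raise ValueError("wider port must be a multiple of the narrower one")
    return widest, narrowest, widest // narrowest


def write_enables(address, port_width, b1, b2):
    """One flag per bank, mirroring WrEnable = Wide | (Lane == i)."""
    widest, _, banks = bank_layout(b1, b2)
    wide = port_width == widest      # a wide access touches every bank
    lane = address % banks           # low address bits select the bank
    return [wide or lane == i for i in range(banks)]


def split_write_data(data, b1, b2):
    """Slice a full-width word into per-bank chunks, bank 0 least significant."""
    _, narrowest, banks = bank_layout(b1, b2)
    mask = (1 << (narrowest * 8)) - 1
    return [(data >> (i * narrowest * 8)) & mask for i in range(banks)]


if __name__ == "__main__":
    print(bank_layout(4, 16))            # 32-bit and 128-bit ports
    print(write_enables(6, 4, 4, 16))    # narrow write selects a single bank
    print(write_enables(6, 16, 4, 16))   # wide write enables all banks
```

With a 32-bit port and a 128-bit port the model yields four banks of four bytes each, which corresponds to the FIG. 10 arrangement described in the text.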

FIG. 11 shows an example of including user-defined instructions in the base configuration. As shown in the figure, simple instructions can be added to the processor pipeline with timing and interfaces similar to those of the ALU. Instructions added in this way should generate no stalls or exceptions, contain no state, use only the two normal source register values and the instruction word as inputs, and generate a single output value. Such restrictions become unnecessary, however, if the TIE language has provisions for specifying processor state.

FIG. 12 shows another example of implementing a user-defined unit in this system. The functional unit shown in the figure, an 8/16 parallel data unit extension of the ALU, is generated from the following ISA code:

Instruction {
    Opcode ADD8_4 CUSTOM op2=0000
    Opcode MIN16_2 CUSTOM op2=0001
    Opcode SHIFT16_2 CUSTOM op2=0002
    iclass MY ADD8_4, MIN16_2, SHIFT16_2
    a<t, a<s, a>t_
}
Implementation {
    input [31:0] art, ars;
    input [23:0] inst;
    input ADD8_4, MIN16_2, SHIFT16_2;
    output [31:0] arr;
    wire [31:0] add, min, shift;
    assign add = {art[31:24]+ars[31:24], art[23:16]+ars[23:16],
                  art[15:8]+ars[15:8], art[7:0]+ars[7:0]};
    assign min[31:16] = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];
    assign min[15:0] = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];
    assign shift[31:16] = art[31:16] << ars[31:16];
    assign shift[15:0] = art[15:0] << ars[15:0];
    assign arr = {32{ADD8_4}} & add | {32{MIN16_2}} & min |
                 {32{SHIFT16_2}} & shift;
}
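To make the lane-wise behaviour of the three packed operations concrete, the following sketch models them on 32-bit unsigned integers. It is an illustration, not the generated hardware. It assumes that every 8-bit lane of the add takes one operand from art and one from ars, that the minimum is an unsigned comparison, and that shifted-out bits are discarded. The function names are hypothetical.

```python
# Behavioural sketch of the packed ALU extension (hypothetical function names).
# All values are treated as 32-bit unsigned integers.

MASK8, MASK16 = 0xFF, 0xFFFF


def add8_4(art, ars):
    """Four independent 8-bit additions; each lane wraps modulo 256."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = ((art >> shift) + (ars >> shift)) & MASK8
        result |= lane << shift
    return result


def min16_2(art, ars):
    """Two independent unsigned 16-bit minimums."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & MASK16
        b = (ars >> shift) & MASK16
        result |= min(a, b) << shift
    return result


def shift16_2(art, ars):
    """Two independent 16-bit left shifts; bits shifted past bit 15 are lost."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & MASK16
        amount = (ars >> shift) & MASK16
        lane = (a << amount) & MASK16 if amount < 16 else 0
        result |= lane << shift
    return result


if __name__ == "__main__":
    print(hex(add8_4(0x01FF0280, 0x01010180)))
    print(hex(min16_2(0x0005FFFF, 0x00030001)))
    print(hex(shift16_2(0x00010001, 0x00040010)))
```

The final multiplexing in the listing (ANDing each result with a replicated one-hot opcode signal and ORing the terms) corresponds to calling exactly one of these functions per instruction.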

Of particular interest in another aspect of the invention is the designer-defined instruction execution unit 96, in which TIE-defined instructions, including those that modify processor state, are decoded and executed. In this aspect of the invention, several constructs have been added to the language to make it possible to declare additional processor state that can be read and written by the new instructions. The "state" statement is used to declare additional processor state. The declaration begins with the keyword state. The next part of the state statement describes the size of the state, i.e., the number of bits, and how the bits of the state are indexed. The part after that is the state name, which identifies the state in other declaration sections. The last part of the state statement is a list of attributes associated with the state. For example,

state [63:0] DATA cpn=0 autopack
state [27:0] KEYC cpn=1 nopack
state [27:0] KEYD cpn=1

define three new processor states, DATA, KEYC and KEYD. State DATA is 64 bits wide, with its bits indexed from 63 to 0. KEYC and KEYD are both 28-bit states. DATA has a coprocessor-number attribute cpn indicating which coprocessor DATA belongs to.

The attribute "autopack" indicates that state DATA will be automatically mapped to some registers in the user register file, so that the value of DATA can be read and written by software tools.

The user_register section is defined to specify the mapping of states to registers in the user register file. A user_register section begins with the keyword user_register, followed by a number indicating the register number, and ends with an expression indicating the state bits to be mapped onto that register. For example,

user_register 0 DATA[31:0]
user_register 1 DATA[63:32]
user_register 2 KEYC
user_register 3 KEYD
user_register 4 {X, Y, Z}

specify that the low-order word of DATA is mapped to the first user register file entry and the high-order word to the second. The next two user register file entries hold the values of KEYC and KEYD. Clearly, the state information used in this section must be consistent with that used in the state section. This consistency can be checked automatically by a computer program.

In another embodiment of the invention, the state bits are assigned to user register file entries automatically using a bin-packing algorithm. In yet another embodiment, a combination of manual and automatic assignment can be used, for example to ensure upward compatibility.
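The text does not say which bin-packing algorithm is used. As one possible illustration, the sketch below applies the common first-fit-decreasing heuristic, assuming 32-bit user register file entries and splitting any state wider than one entry into 32-bit slices. The function name pack_states and the extra states X and Y are hypothetical.

```python
# First-fit-decreasing sketch for packing state bits into user-register
# entries. Illustrative only; the actual allocator is not described in the text.

REG_BITS = 32  # assumed width of one user register file entry


def pack_states(states):
    """states: dict of name -> width in bits.

    Returns a list of entries; each entry is a list of (name, hi, lo)
    bit slices whose total width fits in REG_BITS.
    """
    # Split anything wider than one entry into REG_BITS-sized slices.
    pieces = []
    for name, width in states.items():
        for lo in range(0, width, REG_BITS):
            hi = min(lo + REG_BITS, width) - 1
            pieces.append((name, hi, lo))

    # Place the largest pieces first, each into the first entry with room.
    pieces.sort(key=lambda p: p[1] - p[2] + 1, reverse=True)
    entries, free = [], []
    for piece in pieces:
        size = piece[1] - piece[2] + 1
        for i, room in enumerate(free):
            if size <= room:
                entries[i].append(piece)
                free[i] -= size
                break
        else:
            entries.append([piece])
            free.append(REG_BITS - size)
    return entries


if __name__ == "__main__":
    layout = pack_states({"DATA": 64, "KEYC": 28, "KEYD": 28, "X": 2, "Y": 2})
    for number, entry in enumerate(layout):
        print(number, entry)
```

A fully automatic assignment like this one may move states between entries when the description changes, which is one reason a mixed manual and automatic scheme can help preserve upward compatibility.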

The instruction field statement field is used to improve the readability of the TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by a name. The complete set of bits in an instruction is the highest-level superset field inst, and this field can be divided into smaller fields. For example,

field x inst[11:8]
field y inst[15:12]
field xy {x, y}

define two 4-bit fields, x and y, as subfields of the highest-level field inst (bits 8-11 and bits 12-15, respectively), and an 8-bit field xy as the concatenation of the x and y fields.
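The field definitions amount to simple bit slicing and concatenation. A minimal sketch, assuming a 24-bit instruction word held in a Python integer and hypothetical helper names:

```python
# Bit-slice model of the field statements (hypothetical helper names).

def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer (Verilog-style slice)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(inst):
    """Extract x, y and the concatenated field xy from an instruction word."""
    x = bits(inst, 11, 8)
    y = bits(inst, 15, 12)
    xy = (x << 4) | y          # {x, y}: x forms the more significant half
    return {"x": x, "y": y, "xy": xy}


if __name__ == "__main__":
    print(decode_fields(0xABCDEF))
```

Note that the concatenation places x above y even though y occupies the higher bit positions in inst, so xy is not simply inst[15:8].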

The statement opcode defines opcodes for encoding specific fields. Instruction fields intended to specify operands, e.g., registers or immediate constants, to be used by the opcodes thus defined must first be defined with field statements and then with operand statements.

For example,

opcode acs op2=4'b0000 CUST0
opcode adsel op2=4'b0001 CUST0

define two new opcodes, acs and adsel, based on the previously defined opcode CUST0 (4'b0000 denotes a four-bit binary constant 0000). The TIE description of the preferred core ISA has the statements

field op0 inst[3:0]
field op1 inst[19:16]
field op2 inst[23:20]
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST

as part of its base definitions, so that acs and adsel correspond, respectively, to the instruction encodings

inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000

The instruction operand statement operand identifies registers and immediate constants. Before a field is defined as an operand, however, it must have been previously defined as a field as described above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table, defined as described below. For example, to encode an immediate operand, the TIE code

field offset inst[23:6]
operand offsets4 offset {
    assign offsets4 = {{14{offset[17]}}, offset} << 2;
} {
    wire [31:0] t;
    assign t = offsets4 >> 2;
    assign offset = t[17:0];
}

defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. As those skilled in the art will appreciate, the last part of the operand statement describes the circuitry used to perform the computation, written in the subset of Verilog™ HDL used for describing combinational circuits.

Here, the wire statement defines a set of logical wires named t that is 32 bits wide. The first assign statement after the wire statement specifies that the logic signal driving the wires is offsets4 shifted right by two bits, and the second assign statement specifies that the lower 18 bits of t are placed in the offset field. The very first assign statement directly specifies the value of the operand offsets4 as the concatenation of 14 copies of the sign bit of offset (bit 17) with offset itself, followed by a left shift of two bits.
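The two halves of the operand statement are inverses of each other: one decodes the 18-bit field into a 32-bit operand value, the other encodes an operand value back into the field. A minimal sketch of both directions, with hypothetical function names and 32-bit wraparound modelled by masking:

```python
# Decode/encode model of the offsets4 operand (hypothetical function names).

def decode_offsets4(offset):
    """18-bit field -> 32-bit operand: sign-extend, then shift left by 2."""
    offset &= 0x3FFFF
    sign = (offset >> 17) & 1
    # {{14{offset[17]}}, offset}: 14 copies of the sign bit above the field
    extended = ((0x3FFF << 18) if sign else 0) | offset
    return (extended << 2) & 0xFFFFFFFF


def encode_offsets4(offsets4):
    """32-bit operand -> 18-bit field: t = offsets4 >> 2; offset = t[17:0]."""
    t = (offsets4 & 0xFFFFFFFF) >> 2
    return t & 0x3FFFF


if __name__ == "__main__":
    for field in (1, 0x20000, 0x3FFFF):
        value = decode_offsets4(field)
        print(hex(field), hex(value), hex(encode_offsets4(value)))
```

Every 18-bit field value survives a decode followed by an encode, which is the property an assembler and disassembler rely on.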

For a constant-table operand, the TIE code

table prime 16 {
    53
}
operand prime_s s {
    assign prime_s = prime[s];
} {
    assign s = prime_s == prime[0] ? 4'b0000 :
               prime_s == prime[1] ? 4'b0001 :
               prime_s == prime[2] ? 4'b0010 :
               prime_s == prime[3] ? 4'b0011 :
               prime_s == prime[4] ? 4'b0100 :
               prime_s == prime[5] ? 4'b0101 :
               prime_s == prime[6] ? 4'b0110 :
               prime_s == prime[7] ? 4'b0111 :
               prime_s == prime[8] ? 4'b1000 :
               prime_s == prime[9] ? 4'b1001 :
               prime_s == prime[10] ? 4'b1010 :
               prime_s == prime[11] ? 4'b1011 :
               prime_s == prime[12] ? 4'b1100 :
               prime_s == prime[13] ? 4'b1101 :
               prime_s == prime[14] ? 4'b1110 :
               4'b1111;
}

uses the table statement to define an array of constants prime (the number following the table name is the number of elements in the table), and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
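The table operand works as a lookup in one direction and a reverse lookup in the other. The sketch below models both. The table contents are an assumption: the listing as reproduced here shows only the value 53, so the sketch fills the table with the first 16 primes purely for illustration. Function names are hypothetical.

```python
# Lookup / reverse-lookup model of a constant-table operand.
# ASSUMPTION: table contents (first 16 primes) are illustrative only.

PRIME = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]


def decode_prime_s(s):
    """Field s -> operand value: prime_s = prime[s]."""
    return PRIME[s & 0xF]


def encode_prime_s(value):
    """Operand value -> field s, mirroring the chained conditional:
    the first match among entries 0..14 wins, anything else gives 4'b1111."""
    for index in range(15):
        if value == PRIME[index]:
            return index
    return 15


if __name__ == "__main__":
    print(decode_prime_s(3), encode_prime_s(7), encode_prime_s(4))
```

As in the chained conditional of the listing, a value that is not in the table silently encodes as index 15, so a real assembler would want a separate range check before encoding.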

The instruction class statement iclass associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the preceding example that defines opcodes acs and adsel, the additional statements

operand art t {assign art = AR[t];} {}
operand ars s {assign ars = AR[s];} {}
operand arr r {assign AR[r] = arr;} {}

use the operand statement to define three register operands art, ars and arr (note again the use of Verilog™ statements in the definitions). Then the iclass statement

iclass viterbi {adsel, acs} {out arr, in art, in ars}

specifies that the opcodes adsel and acs belong to a common instruction class viterbi, which takes two register operands art and ars as inputs and writes its output to a register operand arr.

In the present invention, the instruction class statement iclass is modified to allow the specification of state-access information for instructions. It begins with the keyword "iclass", followed by the name of the instruction class, the list of opcodes belonging to the class and a list of operand-access information, and ends with a newly defined list of state-access information. For example,

iclass lddata {LDDATA} {out arr, in imm4} {in DATA}
iclass stdata {STDATA} {in ars, in art} {out DATA}
iclass stkey {STKEY} {in ars, in art} {out KEYC, out KEYD}
iclass des {DES} {out arr, in imm4} {inout KEYC, inout DATA, inout KEYD}

define several instruction classes and how the new instructions access the states. The keywords "in", "out" and "inout" indicate that a state is read, written or modified (read and written) by the instructions in the iclass. In this example, state "DATA" is read by instruction "LDDATA", states "KEYC" and "KEYD" are written by instruction "STKEY", and "KEYC", "KEYD" and "DATA" are modified by instruction "DES".

The instruction semantics statement semantic describes the behavior of one or more instructions using the same subset of Verilog™ that is used for encoding operands. By defining multiple instructions in a single semantic statement, some common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes listed in the statement's opcode list, plus a single-bit variable for each opcode in that list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.

// define a new opcode for BYTESWAP based on
//  - a predefined instruction field op2
//  - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0

// declare state SWAP and COUNT
state COUNT 32
state SWAP 1

// map COUNT and SWAP to user register file entries
user_register 0 COUNT
user_register 1 SWAP

// define a new instruction class that
//  - reads data from ars (predefined to be AR[s])
//  - uses and writes state COUNT
//  - uses state SWAP
iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

// semantic definition of byteswap
//  COUNT the number of byte-swapped words
//  Return the swapped or un-swapped data depending on SWAP
semantic bs {BYTESWAP} {
    wire [31:0] ars_swapped =
        {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
    assign arr = SWAP ? ars_swapped : ars;
    assign COUNT = COUNT + SWAP;
}

The first section of the above code defines an opcode for the new instruction, called BYTESWAP.

// define a new opcode for BYTESWAP based on
//  - a predefined instruction field op2
//  - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0

Here, the new opcode is defined as a sub-opcode of CUST0. From the Xtensa™ Instruction Set Architecture Reference Manual, described in more detail below, it can be seen that CUST0 is defined as

opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST

Here, op0 and op1 are fields in the instruction. Opcodes are typically organized in a hierarchical fashion. Here QRST is the top-level opcode, CUST0 is a sub-opcode of QRST, and BYTESWAP is in turn a sub-opcode of CUST0. This hierarchical organization allows logical grouping and management of the opcode space.
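One way to picture the hierarchy is that an instruction word matches an opcode only if it also matches every ancestor of that opcode. The sketch below is a hypothetical decoder model, not the decode logic produced by the TIE compiler. It uses the field positions and opcode values given in the opcode and field statements above.

```python
# Hierarchical opcode matching (illustrative model only).

# field name -> (high bit, low bit) within the 24-bit instruction word
FIELDS = {"op0": (3, 0), "op1": (19, 16), "op2": (23, 20)}

# opcode name -> (field, required value, parent opcode or None)
OPCODES = {
    "QRST": ("op0", 0b0000, None),
    "CUST0": ("op1", 0b0100, "QRST"),
    "BYTESWAP": ("op2", 0b0000, "CUST0"),
}


def field(inst, name):
    """Extract the named field from the instruction word."""
    hi, lo = FIELDS[name]
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def matches(inst, opcode):
    """True when inst satisfies the opcode and all of its ancestors."""
    while opcode is not None:
        name, value, parent = OPCODES[opcode]
        if field(inst, name) != value:
            return False
        opcode = parent
    return True


if __name__ == "__main__":
    print(matches(0x040000, "BYTESWAP"))   # op2=0, op1=4, op0=0
    print(matches(0x140000, "BYTESWAP"))   # op2=1, so only CUST0 matches
    print(matches(0x140000, "CUST0"))
```

Because each level constrains a different field, sixteen sub-opcodes can be placed under CUST0 by varying op2 without disturbing the rest of the opcode space.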

The second section declares the additional processor state required by the BYTESWAP instruction:

// declare state SWAP and COUNT
state COUNT 32
state SWAP 1

Here, COUNT is declared as a 32-bit state and SWAP as a 1-bit state. The TIE language specifies that the bits in COUNT are indexed from 31 to 0, with bit 0 being the least significant.

The Xtensa™ ISA provides two instructions, RSR and WSR, for saving and restoring special system registers. Similarly, it provides two other instructions, RUR and WUR (described in more detail below), for saving and restoring states declared in TIE. To save and restore states declared in TIE, the mapping of the states to entries in the user register file that RUR and WUR can access must be specified. The following section of the above code specifies this mapping:

// map COUNT and SWAP to user register file entries
user_register 0 COUNT
user_register 1 SWAP

so that the following instructions save the value of COUNT to a2 and the value of SWAP to a5:

RUR a2, 0;
RUR a5, 1;

This mechanism is in fact used in the test program to verify the contents of the states. In C, the above two instructions take the following form:

x = RUR(0);
y = RUR(1);

The next section of the TIE description is the definition of a new instruction class containing the new instruction BYTESWAP:

// define a new instruction class that
//  - uses and writes state COUNT
//  - uses state SWAP
iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

Here, iclass is the keyword and bs is the name of the iclass. The next clause lists the instructions in this instruction class (BYTESWAP). The clause after that specifies the operands used by the instructions in this class (in this case an input operand ars and an output operand arr). The last clause of the iclass definition specifies the states accessed by the instructions in this class (in this case the instruction reads state SWAP and reads and writes state COUNT).

The last block of the above code gives the formal semantic definition of the BYTESWAP instruction:

// semantic definition of byteswap
//  COUNT the number of byte-swapped words
//  Return the swapped or un-swapped data depending on SWAP
semantic bs {BYTESWAP} {
    wire [31:0] ars_swapped =
        {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
    assign arr = SWAP ? ars_swapped : ars;
    assign COUNT = COUNT + SWAP;
}

The description uses a subset of Verilog HDL to describe combinational logic. It is this block that precisely defines how the instruction set simulator will simulate the BYTESWAP instruction, and how the additional circuitry is synthesized and added to the Xtensa™ processor hardware to support the new instruction.
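To show what such a semantic block means for a simulator, here is a minimal behavioural sketch of one BYTESWAP execution. It is not the instruction set simulator generated by the system. The function name byteswap_step is hypothetical, and the 32-bit COUNT state is modelled as wrapping on overflow.

```python
# Behavioural sketch of the BYTESWAP semantics (hypothetical function name).

def byteswap_step(ars, count, swap):
    """Execute BYTESWAP once.

    ars   : 32-bit source register value
    count : current value of the 32-bit COUNT state
    swap  : current value of the 1-bit SWAP state
    Returns (arr, new_count).
    """
    ars &= 0xFFFFFFFF
    # {ars[7:0], ars[15:8], ars[23:16], ars[31:24]}
    swapped = (
        ((ars & 0x000000FF) << 24)
        | ((ars & 0x0000FF00) << 8)
        | ((ars & 0x00FF0000) >> 8)
        | ((ars & 0xFF000000) >> 24)
    )
    swap &= 1
    arr = swapped if swap else ars            # assign arr = SWAP ? ars_swapped : ars
    new_count = (count + swap) & 0xFFFFFFFF   # assign COUNT = COUNT + SWAP
    return arr, new_count


if __name__ == "__main__":
    arr, count = byteswap_step(0x11223344, 0, 1)
    print(hex(arr), count)
```

When SWAP is 0 the instruction passes its operand through unchanged and leaves COUNT alone, so COUNT ends up holding the number of words that were actually swapped.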

In the present invention, which implements user-defined states, a declared state can be used like any other variable to access the information stored in it. A state identifier appearing on the right-hand side of an expression indicates a read from that state. A state is written by assigning a value or an expression to its identifier. For example, the following semantic code segment shows how an instruction reads and writes various states:

    assign KEYC = sr == 8'd2 ? art[27:0] : KEYC;
    assign KEYD = sr == 8'd3 ? art[27:0] : KEYD;
    assign DATA = sr == 8'd0 ? {DATA[63:32], art} : {art, DATA[63:32]};

To illustrate examples of the instructions that can be executed as core instructions in the configurable processor, and of the instructions that become available through the selection of configuration options, the Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, published by Tensilica, Inc., is incorporated herein by reference. Likewise, to illustrate examples of the TIE language statements that can be used to implement such user-defined instructions, the Instruction Extension Language (TIE) Reference Manual, Revision 1.3, also published by Tensilica, Inc., is incorporated herein by reference.

From the TIE description, a hardware implementation of these instructions can be generated using, for example, a program similar to the one shown in Appendix D. Appendix E shows the code used to produce the header file required to support the new instructions as intrinsic functions.

From the configuration specification, the following can be generated automatically:

- the ISA-specific portion of the assembler;

- the ISA-specific support routines for the compiler;

- the ISA-specific portion of the disassembler (used by the debugger); and

- the ISA-specific portion of the simulator.

FIG. 16 is a diagram showing how the ISA-specific portions of these software tools are generated. From the user-created TIE description file 400, a TIE parser program 410 generates C code for several programs, each of which produces a file that is accessed by one or more of the software development tools to obtain information about the user-defined instructions and states. For example, the program tie2gcc 420 generates a C header file 470 called xtensa_tie.h, which contains intrinsic function definitions for the new instructions. The program tie2isa 430 generates a dynamic link library (DLL) 480 containing information about the user-defined instruction formats (in the Wilson et al. patent application discussed below, this is effectively a combination of the encode and decode DLLs discussed therein). The program tie2iss 440 generates performance modeling routines and produces a DLL 490 containing the instruction semantics which, as discussed in the Wilson et al. application, is used by a host compiler to produce the simulator DLL used by the simulator. The program tie2ver 450 produces the necessary descriptions 500 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 460 produces the save and restore code 510 that uses the RUR and WUR instructions.

The detailed description of the instructions and of how they access states makes it possible to generate efficient logic that can be plugged into existing high-performance microprocessor designs. The methods described in connection with this embodiment of the invention deal specifically with new instructions that read from or write to one or more state registers. In particular, this embodiment shows how to derive the hardware logic for the state registers in the context of a class of microprocessor implementations that all use pipelining as a technique to achieve high performance.

In a pipelined implementation such as the one shown in FIG. 17, a state register is typically duplicated several times, each instance holding the value of the state at a particular pipeline stage. In this embodiment, a state is translated into multiple copies of registers in a way that is consistent with the preferred core processor implementation. Additional bypass and forwarding logic is also generated, again in a manner consistent with the preferred core processor implementation. For example, to target a core processor implementation with three execution stages, this embodiment translates a state into three registers connected as shown in FIG. 18. In this implementation, each register 610-630 holds the value of the state in one of the three pipeline stages. ctrl-1, ctrl-2 and ctrl-3 are control signals that enable the data latching function of the corresponding flip-flops 610-630.

Making the multiple copies of a state register work consistently with the preferred processor implementation requires additional logic and control signals. "Consistently" means that the state should behave exactly like the rest of the processor's state under interrupts, exceptions and pipeline stalls. Typically, a given processor implementation defines certain signals that represent various pipeline conditions. These signals are required for the pipelined state register to work properly.

In a typical pipelined implementation, the execution unit consists of multiple pipeline stages, and the computation of an instruction is carried out across several of these stages. Instruction streams flow through the pipeline in a sequence directed by the control logic. At any given time, up to n instructions may be executing in the pipeline, where n is the number of stages. In a superscalar processor, which can also be implemented using the present invention, the number of instructions in the pipeline can be n×w, where w is the issue width of the processor.

The role of the control logic is to ensure that the dependencies between instructions are obeyed and that any interference between instructions is resolved. If an instruction uses data computed by an earlier instruction, special hardware is needed to forward the data to the later instruction without stalling the pipeline. If an interrupt occurs, all instructions in the pipeline need to be killed and re-executed later. When an instruction cannot be executed because its required input data or computational hardware is not available, it must be stalled. One inexpensive way of stalling an instruction is to kill it in its first execution stage and re-execute it in the next cycle. A consequence of this technique is that it creates an invalid stage (a bubble) in the pipeline. The bubble flows through the pipeline along with the other instructions. At the end of the pipeline, where instructions are committed, the bubbles are discarded.

Using the three-stage pipeline example above, the additional logic and connections required for a typical implementation of such a processor state are shown in FIG. 19.

Under normal circumstances, a value computed in one stage is forwarded immediately to the next instruction without waiting for it to reach the end of the pipeline, in order to reduce the number of pipeline stalls introduced by data dependencies. This is accomplished by sending the output of the first flip-flop 610 directly to the semantic block so that it can be used immediately by the next instruction. To handle abnormal conditions such as interrupts and exceptions, this implementation requires the following three control signals: Kill_1, Kill_all and Valid_3.

The signal Kill_1 indicates that the instruction currently in the first pipeline stage 110 must be killed because, for example, the data it needs is not available. The signal Kill_all indicates that all instructions in the pipeline must be killed because an instruction ahead of them has raised an exception or an interrupt has occurred. The signal Valid_3 indicates whether the instruction currently in the last stage 630 is valid. An invalid instruction there is usually the result of killing an instruction in the first pipeline stage 610, which leaves a bubble (an invalid instruction) in the pipeline. Valid_3 simply indicates whether the instruction in the third pipeline stage is valid or a bubble. Clearly, only valid instructions should be latched.

FIG. 20 shows the additional logic and connections required to implement the state register. It also shows how the control logic is constructed to drive the signals ctrl-1, ctrl-2 and ctrl-3 so that the state register implementation meets the requirements above. The following is sample HDL code, generated automatically, that implements the state register shown in FIG. 19.

    module tie_enflop(tie_out, tie_in, en, clk);
    parameter size = 32;
    output [size-1:0] tie_out;
    input  [size-1:0] tie_in;
    input  en;
    input  clk;
    reg    [size-1:0] tmp;
    assign tie_out = tmp;
    always @(posedge clk) begin
        if (en)
            tmp <= #1 tie_in;
    end
    endmodule

    module tie_athens_state(ns, we, ke, kp, vw, clk, ps);
    parameter size = 32;
    input  [size-1:0] ns;   // next state
    input  we;              // write enable
    input  ke;              // Kill E state
    input  kp;              // Kill Pipeline
    input  vw;              // Valid W state
    input  clk;             // clock
    output [size-1:0] ps;   // present state
    wire   [size-1:0] se;   // state at E stage
    wire   [size-1:0] sm;   // state at M stage
    wire   [size-1:0] sw;   // state at W stage
    wire   [size-1:0] sx;   // state at X stage
    wire   ee;              // write enable for EM register
    wire   ew;              // write enable for WX register
    assign se = kp ? sx : ns;
    assign ee = kp | we & ~ke;
    assign ew = vw & ~kp;
    assign ps = sm;
    tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));
    tie_enflop #(size) state_MW(.tie_out(sw), .tie_in(sm), .en(1'b1), .clk(clk));
    tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));
    endmodule
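To make the behaviour of the tie_athens_state module easier to follow, the following is a minimal cycle-level sketch of it in Python. It is an illustrative model only, not part of the generated HDL: the class name PipelinedState and the method name clock are assumptions, and the #1 delay of tie_enflop is ignored. Each call to clock() corresponds to one rising clock edge, with the three enabled flip-flops updated simultaneously from the values held before the edge.

```python
class PipelinedState:
    """Cycle-level sketch of tie_athens_state (names are illustrative)."""

    def __init__(self, size=32):
        self.mask = (1 << size) - 1
        self.sm = 0  # output of state_EM (state at M stage)
        self.sw = 0  # output of state_MW (state at W stage)
        self.sx = 0  # output of state_WX (committed state)

    @property
    def ps(self):
        # assign ps = sm;
        return self.sm

    def clock(self, ns=0, we=0, ke=0, kp=0, vw=0):
        # Combinational logic, evaluated before the clock edge.
        se = self.sx if kp else (ns & self.mask)  # se = kp ? sx : ns
        ee = bool(kp) or (bool(we) and not ke)    # ee = kp | we & ~ke
        ew = bool(vw) and not kp                  # ew = vw & ~kp

        # All three flip-flops sample their inputs on the same edge.
        next_sm = se if ee else self.sm
        next_sw = self.sm                         # state_MW is always enabled
        next_sx = self.sw if ew else self.sx
        self.sm, self.sw, self.sx = next_sm, next_sw, next_sx
```

In this model a written value is visible on ps one cycle after the write, which corresponds to the forwarding described above, and asserting kp reloads the first register from the committed copy sx, discarding any value that has not yet reached the last stage.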

If a semantic block specifies the state as one of its inputs, the present-state value from the pipelined state register model above is passed to the semantic block as an input variable. If a semantic block contains logic that produces a new value for a state, an output signal is created. This output signal is used as the next-state input to the pipelined state register.

This embodiment allows multiple semantic description blocks, each of which describes the behavior of multiple instructions. With this unrestricted style of description, it is possible that only a subset of the semantic blocks produces a next-state output for a given state. It is also possible that a given semantic block produces its next-state output conditionally, depending on which instruction it is executing at a given time. Additional hardware logic is therefore needed to combine the next-state outputs from all semantic blocks to form the input to the pipelined state register. In this embodiment of the invention, a signal is derived automatically for each semantic block to indicate whether that block has produced a new value for the state. In another embodiment, such a signal can be left for the designer to specify.

FIG. 20 shows how the next-state outputs for a state from several semantic blocks s1-sn are combined, and how one of them is selected as the input to the state register. In this figure, op1_1 and op1_2 are the opcode signals for the first semantic block, op2_1 and op2_2 are the opcode signals for the second semantic block, and so on. The next-state output of semantic block i is si (if there are multiple state registers, the block has multiple next-state outputs). The signal indicating that semantic block i has produced a new value for the state is si_we. The signal s_we indicates whether any semantic block has produced a new value for the state, and is used as the write-enable input to the pipelined state register.
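The combination of next-state outputs can be sketched as follows. This is a hedged illustration of an AND-OR selection, not the circuit of FIG. 20 itself: the function names are hypothetical, and the sketch assumes that at most one semantic block asserts its si_we in a given cycle and that si_we is simply the OR of the opcode signals of those instructions in block i that write the state.

```python
def block_write_enable(opcode_signals):
    """si_we for one semantic block: OR of the opcode signals of the
    instructions in that block that write the state (assumed derivation)."""
    return 1 if any(opcode_signals) else 0


def combine_next_state(block_outputs):
    """Combine per-block (si, si_we) pairs into (s, s_we).

    Assumes at most one block asserts si_we per cycle, so gating each si
    with its si_we and ORing the results selects the active block's value.
    """
    s = 0
    s_we = 0
    for si, si_we in block_outputs:
        if si_we:
            s |= si      # gated value contributes to the OR
            s_we = 1     # s_we = s1_we | s2_we | ... | sn_we
    return s, s_we
```

For example, if only the second of three blocks is executing an instruction that writes the state, combine_next_state returns that block's value with s_we set; if no block writes, s_we stays clear and the state register holds its value.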

Even though multiple semantic blocks are no more expressive than a single one, they typically provide a more structured description by grouping related instructions into a single block. Multiple semantic blocks can also make the effects of instructions simpler to analyze, because each instruction is implemented within a more restricted scope. On the other hand, there are often good reasons for a single semantic block to describe the behavior of multiple instructions. Most commonly, the hardware implementations of these instructions share common logic. Describing multiple instructions in a single semantic block usually leads to a more efficient hardware design.

Because of interrupts and exceptions, software must be able to load the values of the states into data memory and to restore them from it. Based on the formal description of the new states and new instructions, such restore and load instructions can be generated automatically. In one embodiment of the invention, the logic for restore and load is generated automatically as two semantic blocks, which can then be translated recursively into actual hardware just like any other block. For example, from the following state declarations:

    state [63:0] DATA cpn=0 autopack
    state [27:0] KEYC cpn=1 nopack
    state [27:0] KEYD cpn=1
    user_register 0 = DATA[31:0];
    user_register 1 = DATA[63:32];
    user_register 2 = KEYC;
    user_register 3 = KEYD;

the following semantic block can be generated to read the values of DATA, KEYC and KEYD into general-purpose registers:

    iclass rur {RUR} {out arr, in st} {in DATA, in KEYC, in KEYD}
    semantic rur {RUR} {
        wire sel_0 = (st == 8'd0);
        wire sel_1 = (st == 8'd1);
        wire sel_2 = (st == 8'd2);
        wire sel_3 = (st == 8'd3);
        assign arr = {32{sel_0}} & DATA[31:0] |
                     {32{sel_1}} & DATA[63:32] |
                     {32{sel_2}} & KEYC |
                     {32{sel_3}} & KEYD;
    }
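The select-and-mask structure of the RUR semantic block can be mirrored in a few lines of Python. This is only a behavioural sketch under the state declarations given above (a 64-bit DATA and 28-bit KEYC and KEYD mapped to user registers 0 to 3); the function name rur and the helper gate are illustrative, and the narrower states are assumed to be zero-extended to 32 bits, as Verilog does for a narrower operand.

```python
MASK32 = 0xFFFFFFFF
MASK28 = 0x0FFFFFFF


def rur(st, DATA, KEYC, KEYD):
    """Return the 32-bit value read for user register number st."""
    sel = [1 if st == n else 0 for n in range(4)]  # sel_0 .. sel_3

    def gate(select, value):
        # {32{sel_n}} & value: all ones when selected, all zeros otherwise.
        return (MASK32 if select else 0) & value

    return (gate(sel[0], DATA & MASK32)             # DATA[31:0]
            | gate(sel[1], (DATA >> 32) & MASK32)   # DATA[63:32]
            | gate(sel[2], KEYC & MASK28)           # KEYC, zero-extended
            | gate(sel[3], KEYD & MASK28))          # KEYD, zero-extended
```

Because exactly one select signal can be active, at most one term of the OR is non-zero, which is why a plain OR of the gated terms acts as a multiplexer; an st value that matches no user register reads as zero in this sketch.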

FIG. 21 shows a block diagram of the logic corresponding to this kind of semantic logic. The input signal st is compared with various constants to form select signals, which are used to select bits from the state registers in a way consistent with the user_register declarations. With the state declarations above, bit 32 of DATA maps to bit 0 of the second user register. Therefore, the second input of the MUX in this diagram should be connected to bit 32 of the DATA state.

The following semantic block can be generated to write values from general-purpose registers into the states DATA, KEYC and KEYD:

    iclass wur {WUR} {in art, in sr} {out DATA, out KEYC, out KEYD}
    semantic wur {WUR} {
        wire sel_0 = (st == 8'd0);
        wire sel_1 = (st == 8'd1);
        wire sel_2 = (st == 8'd2);
        wire sel_3 = (st == 8'd3);
        assign DATA = {sel_1 ? art : DATA[63:32], sel_0 ? art : DATA[31:0]};
        assign KEYC = art;
        assign KEYD = art;
        assign DATA_we = WUR;
        assign KEYC_we = WUR & sel_2;
        assign KEYD_we = WUR & sel_3;
    }
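The effect of the WUR semantic block on the three states can be sketched in Python as follows. This is a behavioural illustration only: the function name wur and the dictionary representation of the states are assumptions, and the 32-bit value written to the 28-bit KEYC and KEYD is assumed to be truncated to the state width.

```python
MASK32 = 0xFFFFFFFF
MASK28 = 0x0FFFFFFF


def wur(st, art, state):
    """Return the states after one WUR of value art to user register st.

    state is a dict with keys "DATA", "KEYC" and "KEYD"; it is not modified.
    """
    art &= MASK32
    hi = (state["DATA"] >> 32) & MASK32
    lo = state["DATA"] & MASK32
    # DATA is always rewritten (DATA_we = WUR); the half that is not
    # selected is recirculated unchanged.
    if st == 1:
        hi = art
    if st == 0:
        lo = art
    new_state = dict(state)
    new_state["DATA"] = (hi << 32) | lo
    # KEYC and KEYD are written only when their select is active
    # (KEYC_we = WUR & sel_2, KEYD_we = WUR & sel_3).
    if st == 2:
        new_state["KEYC"] = art & MASK28
    if st == 3:
        new_state["KEYD"] = art & MASK28
    return new_state
```

Writing user register 0 or 1 replaces one half of DATA and leaves the other half, KEYC and KEYD untouched, which matches the recirculation of unselected bits described for the generated logic.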

FIG. 22 shows the logic for bit j of state S when it is mapped to bit k of the i-th user register. In a WUR instruction, if the user_register number st is i, bit k of ars is loaded into the S[j] register; otherwise, the original value of S[j] is recirculated. In addition, if any bit of state S is reloaded, the signal S_we is asserted.

The TIE user_register declaration specifies a mapping from the additional processor state defined by state declarations to an identifier used by the RUR and WUR instructions, so that this state can be read and written independently of the TIE instructions.

Appendix F shows the code used to generate the RUR and WUR instructions.

The primary purpose of the RUR and WUR instructions is task switching. In a multitasking environment, multiple software tasks share the processor and run according to some scheduling algorithm. While a task is active, its state resides in the processor registers. When the scheduling algorithm decides to switch to another task, the state held in the processor registers is saved to memory, and the state of the other task is loaded from memory into the processor registers. The Xtensa™ Instruction Set Architecture (ISA) includes the RSR and WSR instructions to read and write the state defined by the ISA. For example, the following code is part of the task save-to-memory routine:

    // save special registers
    rsr  a0, SAR
    rsr  a1, LCOUNT
    s32i a0, a3, UEXCSAVE+0
    s32i a1, a3, UEXCSAVE+4
    rsr  a0, LBEG
    rsr  a1, LEND
    s32i a0, a3, UEXCSAVE+8
    s32i a1, a3, UEXCSAVE+12
    ; if (config_get_value("IsaUseMAC16")) {
    rsr  a0, ACCLO
    rsr  a1, ACCHI
    s32i a0, a3, UEXCSAVE+16
    s32i a1, a3, UEXCSAVE+20
    rsr  a0, MR_0
    rsr  a1, MR_1
    s32i a0, a3, UEXCSAVE+24
    s32i a1, a3, UEXCSAVE+28
    rsr  a0, MR_2
    rsr  a1, MR_3
    s32i a0, a3, UEXCSAVE+32
    s32i a1, a3, UEXCSAVE+36
    ; }

and the following code is part of the task restore-from-memory routine:

    // restore special registers
    l32i a2, a1, UEXCSAVE+0
    l32i a3, a1, UEXCSAVE+4
    wsr  a2, SAR
    wsr  a3, LCOUNT
    l32i a2, a1, UEXCSAVE+8
    l32i a3, a1, UEXCSAVE+12
    wsr  a2, LBEG
    wsr  a3, LEND
    ; if (config_get_value("IsaUseMAC16")) {
    l32i a2, a1, UEXCSAVE+16
    l32i a3, a1, UEXCSAVE+20
    wsr  a2, ACCLO
    wsr  a3, ACCHI
    l32i a2, a1, UEXCSAVE+24
    l32i a3, a1, UEXCSAVE+28
    wsr  a2, MR_0
    wsr  a3, MR_1
    l32i a2, a1, UEXCSAVE+32
    l32i a3, a1, UEXCSAVE+36
    wsr  a2, MR_2
    wsr  a3, MR_3
    ; }

Here, SAR, LCOUNT, LBEG and LEND are processor state registers that are part of the core Xtensa™ ISA, and ACCLO, ACCHI, MR_0, MR_1, MR_2 and MR_3 are part of the MAC16 Xtensa™ ISA option. (The registers are saved and restored in pairs to avoid pipeline interlocks.)

When a designer defines new state with TIE, that state must also be task-switched like the state above. One possibility would be for the designer simply to edit the task-switch code (part of which is given above) and add RUR/S32I and L32I/WUR instructions similar to the code above. However, a configurable processor is most effective when its software is generated automatically and is correct by construction. The present invention therefore includes a facility to augment the task-switch code automatically. The following tpp lines are added to the save task above:

    ; my $off = 0;
    ; my $i;
    ; for ($i = 0; $i < $#user_registers; $i += 2) {
    rur  a2, `$user_registers[$i+0]`
    rur  a3, `$user_registers[$i+1]`
    s32i a2, UEXCUREG + `$off+0`
    s32i a3, UEXCUREG + `$off+4`
    ;   $off += 8;
    ; }
    ; if (@user_registers & 1) {
    ;   # odd number of user registers
    rur  a2, `$user_registers[$#user_registers]`
    s32i a2, UEXCUREG + `$off+0`
    ;   $off += 4;
    ; }

and the following lines are added to the restore task above:

    ; my $off = 0;
    ; my $i;
    ; for ($i = 0; $i < $#user_registers; $i += 2) {
    l32i a2, UEXCUREG + `$off+0`
    l32i a3, UEXCUREG + `$off+4`
    wur  a2, `$user_registers[$i+0]`
    wur  a3, `$user_registers[$i+1]`
    ;   $off += 8;
    ; }
    ; if (@user_registers & 1) {
    ;   # odd number of user registers
    l32i a2, UEXCUREG + `$off+0`
    wur  a2, `$user_registers[$#user_registers]`
    ;   $off += 4;
    ; }
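The expansion performed by the tpp save lines can be illustrated with a short Python sketch. This is not tpp itself and not the actual generated output; the function name emit_user_register_save is hypothetical, and the emitted instruction text simply mirrors the operand forms shown in the tpp lines above. The sketch follows the same structure: registers are handled in pairs, with a single trailing register when the count is odd.

```python
def emit_user_register_save(user_registers):
    """Return (lines, final_offset) for saving the given user registers.

    user_registers is the list of user register numbers, in the same
    order as the tpp variable @user_registers.
    """
    lines = []
    off = 0
    n = len(user_registers)
    # Pairs: equivalent to for ($i = 0; $i < $#user_registers; $i += 2),
    # where $#user_registers is the last index, n - 1.
    for i in range(0, n - 1, 2):
        lines.append(f"rur a2, {user_registers[i]}")
        lines.append(f"rur a3, {user_registers[i + 1]}")
        lines.append(f"s32i a2, UEXCUREG+{off}")
        lines.append(f"s32i a3, UEXCUREG+{off + 4}")
        off += 8
    # Odd number of user registers: save the last one on its own.
    if n & 1:
        lines.append(f"rur a2, {user_registers[-1]}")
        lines.append(f"s32i a2, UEXCUREG+{off}")
        off += 4
    return lines, off
```

For a list of three user registers the sketch emits one paired group followed by a single save, and the final offset equals four bytes per register, which is the amount of space the save area has to reserve for user registers.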

Finally, the task state area in memory must have additional space allocated for the user register storage, and the offset of this space from the base of the task save pointer is defined as the assembler constant UEXCUREG. This save area was previously defined by the following code:

    #define UEXCREGSIZE (16*4)
    #define UEXCPARMSIZE (4*4)
    ; if (&config_get_value("IsaUseMAC16")) {
    #define UEXCSAVESIZE (10*4)
    ; } else {
    #define UEXCSAVESIZE (4*4)
    ; }
    #define UEXCMISCSIZE (2*4)
    #define UEXCPARM 0
    #define UEXCREG (UEXCPARM+UEXCPARMSIZE)
    #define UEXCSAVE (UEXCREG+UEXCREGSIZE)
    #define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
    #define UEXCWIN (UEXCMISC+0)
    #define UEXCFRAME \
        (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE)

which is changed to

    #define UEXCREGSIZE (16*4)
    #define UEXCPARMSIZE (4*4)
    ; if (&config_get_value("IsaUseMAC16")) {
    #define UEXCSAVESIZE (10*4)
    ; } else {
    #define UEXCSAVESIZE (4*4)
    ; }
    #define UEXCMISCSIZE (2*4)
    #define UEXCUREGSIZE `@user_registers*4`
    #define UEXCPARM 0
    #define UEXCREG (UEXCPARM+UEXCPARMSIZE)
    #define UEXCSAVE (UEXCREG+UEXCREGSIZE)
    #define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
    #define UEXCUREG (UEXCMISC+UEXCMISCSIZE)
    #define UEXCWIN (UEXCUREG+0)
    #define UEXCFRAME \
        (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE+UEXCUREGSIZE)

This code relies on the existence of a tpp variable @user_registers holding a list of the user register numbers, which is simply a list built from the first argument of every user_register statement.
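The resulting frame layout can be checked with a small calculation. The Python sketch below reproduces the arithmetic of the modified #define lines for a given number of user registers; the function name exception_frame_layout and its parameters are illustrative assumptions, while the constant names and sizes are taken from the definitions above.

```python
def exception_frame_layout(num_user_registers, use_mac16):
    """Compute the offsets defined by the modified save-area macros."""
    sizes = {
        "UEXCREGSIZE": 16 * 4,
        "UEXCPARMSIZE": 4 * 4,
        "UEXCSAVESIZE": (10 if use_mac16 else 4) * 4,
        "UEXCMISCSIZE": 2 * 4,
        "UEXCUREGSIZE": num_user_registers * 4,  # `@user_registers*4`
    }
    layout = {"UEXCPARM": 0}
    layout["UEXCREG"] = layout["UEXCPARM"] + sizes["UEXCPARMSIZE"]
    layout["UEXCSAVE"] = layout["UEXCREG"] + sizes["UEXCREGSIZE"]
    layout["UEXCMISC"] = layout["UEXCSAVE"] + sizes["UEXCSAVESIZE"]
    layout["UEXCUREG"] = layout["UEXCMISC"] + sizes["UEXCMISCSIZE"]
    layout["UEXCWIN"] = layout["UEXCUREG"] + 0
    layout["UEXCFRAME"] = sum(sizes.values())
    return layout
```

With the MAC16 option and four user registers, for instance, the user register area starts at byte offset 128 and the frame grows from 128 to 144 bytes, that is, by four bytes per user register.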

In some more complex microprocessor implementations, a state may be computed in different pipeline stages. Handling this requires several extensions (albeit simple ones) to the process described here. First, the description language needs to be extended so that a semantic block can be associated with a pipeline stage. This can be accomplished in one of several ways. In one embodiment, the associated pipeline stage can be specified explicitly for each semantic block. In another embodiment, a range of pipeline stages can be specified for each semantic block. In yet another embodiment, the pipeline stage for a given semantic block can be derived automatically from the required computational delay.

The second task in supporting state generation in different pipeline stages is to handle interrupts, exceptions and stalls. This usually involves adding appropriate bypass and forwarding logic under the control of pipeline control signals. In one embodiment, a diagram can be generated that indicates the relationship between when the state is produced and when it is used. Based on application analysis, appropriate forwarding logic can be implemented to handle the common cases, and interlock logic can be generated to stall the pipeline for the cases not handled by the forwarding logic.

The method for modifying the instruction issue logic of the base processor depends on the algorithm used by that base processor. In general, however, whether the processor is single-issue or superscalar, and whether instructions are single-cycle or multi-cycle, the instruction issue logic in most cases depends only on the instruction being tested for issue, from which it produces:

1. for each processor state element, a signal indicating whether the instruction uses that state as a source;

2. for each processor state element, a signal indicating whether the instruction uses that state as a destination;

3. for each functional unit, a signal indicating whether the instruction uses that functional unit.

These signals are used to perform the issue-to-pipeline and cross-issue checks, and to update the pipeline state in the pipeline-dependent issue logic. TIE contains all the information needed to augment these signals and their equations for the new instructions.

First, each TIE state declaration causes a new set of signals to be created for the instruction issue logic. Each in or inout operand or state listed in the third or fourth argument of an iclass declaration adds the instruction decode signals of the instructions listed in the second argument to the first set of equations for the specified processor state element.

Second, each out or inout operand or state listed in the third or fourth argument of an iclass declaration adds the instruction decode signals of the instructions listed in the second argument to the second set of equations for the specified processor state element.

Third, the logic generated from each TIE semantic block represents a new functional unit, so a new set of unit signals is created, and the decode signals of the TIE instructions specified for that semantic block are ORed together to form the third set of equations.
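For illustration only, the derivation of the three signal sets can be modeled in software as follows. The generated logic is hardware, so this is merely a sketch: the one-bit-per-instruction decode mask, the type state_usage and all function names are assumptions introduced here and are not part of TIE.

```c
#include <stdint.h>

/* Hypothetical decode mask: one bit per user-defined instruction. */
#define DECODE_FOO (1u << 0)
#define DECODE_BAR (1u << 1)

/* Per state element: which instructions list it in their iclass. */
typedef struct {
    uint32_t in_insns;   /* instructions naming the state as in or inout  */
    uint32_t out_insns;  /* instructions naming the state as out or inout */
} state_usage;

/* First set: the decoded instruction uses the state as a source. */
int state_used_as_source(const state_usage *s, uint32_t decode)
{
    return (s->in_insns & decode) != 0;
}

/* Second set: the decoded instruction uses the state as a destination. */
int state_used_as_dest(const state_usage *s, uint32_t decode)
{
    return (s->out_insns & decode) != 0;
}

/* Third set: OR of the decode signals of all instructions that are
   implemented by one semantic block (one functional unit). */
int unit_used(uint32_t semantic_block_insns, uint32_t decode)
{
    return (semantic_block_insns & decode) != 0;
}
```

In this model a state element read by both foo and bar but written only by bar would carry in_insns = DECODE_FOO | DECODE_BAR and out_insns = DECODE_BAR.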

When an instruction is issued, the pipeline status must be updated for future issue decisions. Again, the method used to modify the instruction issue logic of the base processor depends on the algorithm the base processor uses, but some general observations are possible. The pipeline status must provide the following back to the issue logic:

4. signals that indicate, for the destination of each issued instruction, when the result is available for bypass; and

5. signals that indicate, for each functional unit, that the functional unit is ready for another instruction.

The embodiment described here is a single-issue processor in which designer-defined instructions are limited to a single cycle of logic computation. In this case the above simplifies considerably. No functional unit checks or cross-issue checks are needed, and no single-cycle instruction can cause a processor state element not to be pipe-ready for the next instruction. The issue equation therefore becomes simply

issue = (~src1use | src1pipeready) & (~src2use | src2pipeready) & ... & (~srcNuse | srcNpipeready);

where the src[i]pipeready signals are unaffected by the added instructions and the src[i]use signals are the first set of equations, declared and modified as described above. In this embodiment the fourth and fifth sets of signals are not needed. For an alternative embodiment that is multi-issue and multi-cycle, the TIE description would be extended with a latency specification for each instruction, giving the number of cycles over which the computation is to be pipelined.
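The issue equation can be restated as a short C function, given here only as a software sketch of what is in practice combinational hardware. The function name can_issue and the array representation of the src[i]use and src[i]pipeready signals are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* issue = AND over all i of (~src[i]use | src[i]pipeready).
   An instruction may issue unless it uses some source that is
   not yet pipe-ready. */
bool can_issue(const bool *src_use, const bool *src_pipe_ready, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src_use[i] && !src_pipe_ready[i])
            return false;   /* needed source not ready: interlock */
    }
    return true;
}
```

A source that the instruction does not use never blocks issue, whatever its ready status, which is why each term is written as (~use | ready).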

The fourth set of signals would be generated in each semantic block pipeline stage by ORing together the instruction decode signals of every instruction that, according to its specification, completes in that stage.

By default the generated logic is fully pipelined, so the functional units generated by TIE are always ready again one cycle after accepting an instruction. In this case the fifth set of signals for the TIE semantic blocks is always asserted. When the logic in a semantic block must be reused over multiple cycles, a further specification states for how many cycles such instructions occupy the functional unit. In that case the fifth set of signals would be generated in each semantic block pipeline stage by ORing together the instruction decode signals of every instruction that finishes in that stage with the specified cycle count.

Alternatively, in yet another embodiment, the designer may be left to specify the result-ready and functional-unit-ready signals as an extension to TIE.

Examples of code processed according to this embodiment are given in the appendices. For brevity they are not explained in detail, but those skilled in the art will understand them after consulting the reference manuals mentioned above. Appendix G is an example of implementing an instruction using the TIE language. Appendix H shows what the TIE compiler generates for the compiler from such code. Similarly, Appendix I shows what the TIE compiler generates for the simulator; Appendix J shows what it generates for macro expansion of TIE instructions in a user program; Appendix K shows what it generates to simulate TIE instructions in native mode; Appendix L shows what it generates as a Verilog HDL description of the additional hardware; and Appendix M shows what it generates as a Design Compiler script for optimizing that Verilog HDL description, so that the impact of the TIE instructions on CPU size and performance can be evaluated in terms of area and speed.

As noted above, the user begins the processor configuration process by selecting a base processor through the GUI described above. As part of this process, software development tools 30 are built and delivered to the user, as shown in FIG. 1. The software development tools 30 contain four key components that relate to another aspect of the present invention, shown in detail in FIG. 6: the compiler 108, the assembler 110, the instruction set simulator 112 and the debugger 130.

As is well known to those skilled in the art, a compiler translates a user application written in a high-level programming language such as C or C++ into processor-specific assembly language. High-level languages such as C or C++ are designed to let application writers describe their applications in a form that is easy for them to write precisely. They are not languages that processors understand. The application writer need not be concerned with all the particular characteristics of the processor that will be used. Typically, the same C or C++ program can be used on many different types of processors with little or no modification.

The compiler translates the C or C++ program into assembly language. Assembly language is much closer to machine language, which is what the processor supports directly. Each type of processor has its own assembly language. Each assembly instruction usually corresponds directly to one machine instruction, but the two are not necessarily identical. Assembly instructions are designed to be human-readable strings. Each instruction and operand is given a meaningful name or mnemonic, so that a person can read the assembly instructions and readily understand what the machine will do. The assembler translates assembly language into machine language: it encodes each assembly instruction string into one or more machine instructions that the processor can execute directly and efficiently.

Machine code can run directly on the processor, but a physical processor is not always immediately available. Building physical processors is a time-consuming and expensive process. When choosing among possible processor configurations, the user cannot build a physical processor for each candidate. Instead, the user is provided with a software program called a simulator. The simulator runs on an ordinary computer and simulates the effect of running the user application on the user-configured processor. It models the semantics of the simulated processor and can tell the user how fast the real processor would run the user's application.

A debugger is a tool that lets users interactively find problems in their software. It allows users to run their programs interactively. The user can stop execution at any time and examine the C source code, the resulting assembly code or the machine code. At a breakpoint the user can also inspect or modify the values of any or all of her variables or hardware registers. The user can then continue execution, perhaps one statement at a time, perhaps one machine instruction at a time, or perhaps to a new breakpoint of the user's choosing.

All four components 108, 110, 112 and 130 need to know about the user-defined instructions 750 (see FIG. 3), and the simulator 112 and the debugger 130 must additionally know about the user-defined state 752. The system lets the user access the user-defined instructions 750 through intrinsics added to the user's C and C++ applications. The compiler 108 must translate the intrinsic calls into assembly language instructions 738 for the user-defined instructions 750. The assembler 110 must take the new assembly language instructions 738, whether written directly by the user or produced by the compiler 108, and encode them into the machine instructions 740 corresponding to the user-defined instructions 750. The simulator 112 must decode the user-defined machine instructions 740. It must model the semantics of the instructions and their performance on the configured processor. The simulator 112 must also model the values and performance implications of the user-defined state. The debugger 130 must let the user display assembly language instructions 738, including the user-defined instructions 750, and must let the user inspect and modify the values of the user-defined state.

In this aspect of the invention, the user invokes a tool, the TIE compiler 702, to process the current set of candidate user-defined enhancements 736. The TIE compiler 702 is distinct from the compiler 708, which translates the user application into assembly language 738. The TIE compiler 702 builds components that enable the already built base software system 30 (compiler 708, assembler 710, simulator 712 and debugger 730) to use the new user-defined enhancements 736. Each element of the software system 30 uses a slightly different set of these components.

FIG. 24 illustrates how the TIE-specific portions of these software tools are generated. From the user-defined extension file 736, the TIE compiler 702 generates C code for several programs, each of which produces a file that one or more of the software development tools access to obtain information about the user-defined instructions and state. For example, the program tie2gcc 800 generates a C header file 842 called xtensa-tie.h (described in detail below), which contains the intrinsic function definitions for the new instructions. The program tie2isa 810 generates a dynamic link library (DLL) 844/848 containing information about the format of the user-defined instructions (the combination of the encode DLL 844 and the decode DLL 848, described in detail below). The program tie2iss 840 generates C code 870 for performance modeling and instruction semantics; as discussed below, a host compiler 846 compiles this code into the simulator DLL 849 used by the simulator. The program tie2ver 850 generates the necessary description 850 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 860 generates save and restore code 810, which saves and restores the user-defined state on context switches. Additional information on the implementation of user-defined state can be found in the above-mentioned Wang et al. application.

Compiler 708

In this embodiment, the compiler 708 translates the intrinsic calls in the user application into assembly language instructions 738 for the user-defined enhancements 736. The compiler 708 implements this on top of the macro and inline assembly mechanisms found in compilers such as the GNU compilers. For more information on these mechanisms see, for example, the GNU C and C++ Compiler User's Guide, EGCS version 1.0.3.

Consider a user who wishes to create a new instruction foo that operates on two registers and returns a result in a third register. The user places the instruction description in a user-defined instruction file 750 in a particular directory and invokes the TIE compiler 702. The TIE compiler 702 generates a file 742 with a standard name such as xtensa-tie.h. This file contains the following definition of foo.

#define foo(ars, art) \
    ({int arr; asm volatile("foo %0,%1,%2" : "=a"(arr) : \
      "a"(ars), "a"(art)); arr;})

When the user invokes the compiler 708 on her application, she tells the compiler 708, through a command-line option or an environment variable, the name of the directory holding the user-defined enhancements 736. That directory also contains the xtensa-tie.h file 742. The compiler 708 automatically includes the file xtensa-tie.h in the C or C++ application being compiled, just as if the user had written the definition of foo herself. The user places intrinsic calls to the instruction foo in her application. Because of the included definition, the compiler 708 treats those intrinsic calls as invocations of the included macro. Following its standard macro mechanism, the compiler 708 processes an invocation of the macro foo as if the user had written the assembly language instruction 738 directly instead of the macro call. That is, following the standard inline assembly mechanism, the compiler 708 translates the call into the single assembly instruction foo. For example, the user might have a function that contains a call to the intrinsic foo:

int fred(int a, int b)
{
    return foo(a, b);
}

The compiler translates this function into the following assembly language subroutine, which uses the user-defined instruction foo.

fred:
    .frame sp, 32
    entry sp, 32
#APP
    foo a2, a2, a3
#NO_APP
    retw.n

When the user creates a new set of user-defined enhancements 736, no new compiler needs to be written. The TIE compiler 702 simply creates the file xtensa-tie.h 742, which the pre-built compiler automatically includes in the user's application.

Assembler 710

In this embodiment, the assembler 710 uses an encode library 744 to encode assembly instructions 750. The interface to this library 744 includes functions that:

- translate an opcode mnemonic string into an internal opcode representation;

- provide, for each opcode, the bit pattern to be generated for the opcode fields of a machine instruction 740; and

- encode the operand value of each instruction operand and insert the encoded operand bit pattern into the operand fields of the machine instruction 740.

As an example, consider the user function above that calls the intrinsic foo. The assembler might take the instruction "foo a2, a2, a3" and translate it into the machine instruction represented by the hexadecimal number 0x62230, in which the high-order 6 and the low-order 0 together represent the opcode of foo, and 2, 2 and 3 represent the three registers a2, a2 and a3 respectively.

The internal implementation of these functions is based on a combination of tables and internal functions. Tables are easy for the TIE compiler 702 to generate, but their expressive power is limited. When more flexibility is needed, for example to express operand encoding functions, the TIE compiler 702 can generate arbitrary C code to be included in the library 744.

Consider again the "foo a2, a2, a3" example. Each register field is simply encoded with the number of the register. The TIE compiler 702 creates the following function, which checks that the register value is legal and, if it is, returns the register number.

xtensa_encode_result encode_r(valp)
    u_int32_t *valp;
{
    u_int32_t val = *valp;
    if ((val >> 4) != 0)
        return xtensa_encode_result_too_high;
    *valp = val;
    return xtensa_encode_result_ok;
}

If all encodings were this simple, no encoding functions would be needed and a table would suffice. However, the user may choose more complex encodings. The following encoding, written in the TIE language, encodes each operand as the operand value divided by 1024. Such an encoding is useful for compactly encoding values that are required to be multiples of 1024.

operand tx10 t {t << 10} {tx10 >> 10}

The TIE compiler translates this operand encoding description into the following C function.

xtensa_encode_result encode_tx10(valp)
    u_int32_t *valp;
{
    u_int32_t t, tx10;
    tx10 = *valp;
    t = (tx10 >> 10) & 0xf;
    tx10 = decode_tx10(t);
    if (tx10 != *valp) {
        return xtensa_encode_result_not_ok;
    } else {
        *valp = t;
    }
    return xtensa_encode_result_ok;
}

Because the range of possible operand values is very large, a table cannot be used for such an encoding; the table would have to be enormous.

In one embodiment of the encode library 744, one table maps opcode mnemonic strings to the internal opcode representation. For efficiency this table may be sorted, or it may be a hash table or some other data structure that allows efficient lookup. Another table maps each opcode to a template of a machine instruction in which the opcode fields are initialized to the appropriate bit pattern for that opcode. Opcodes with the same operand fields and operand encodings are grouped together. For each operand in these groups, the library contains one function that encodes the operand value into a bit pattern and another function that inserts that bit pattern into the appropriate field of the machine instruction. A separate internal table maps each instruction operand to these functions. Consider an example in which the result register number is encoded in bits 12..15 of the instruction. The TIE compiler 702 generates the following function, which sets bits 12..15 of the instruction to the value (number) of the result register:

void set_r_field(insn, val)
    xtensa_insnbuf insn;
    u_int32_t val;
{
    insn[0] = (insn[0] & 0xffff0fff) | ((val << 12) & 0xf000);
}

So that user-defined instructions can be changed without rebuilding the assembler 710, the encode library 744 is implemented as a dynamic link library (DLL). DLLs are a standard way of letting a program extend its functionality dynamically. The details of handling DLLs differ between host operating systems, but the basic concept is the same. The DLL is loaded dynamically into the running program as an extension of the program's code. A run-time linker resolves symbolic references between the DLL and the main program and between the DLL and other DLLs that are already loaded. In the case of the encode library or DLL 744, a small portion of the code is statically linked into the assembler 710. This code is responsible for loading the DLL, combining the information in the DLL with the existing encoding information for the pre-built instruction set 746 (which may itself have been loaded from a separate DLL), and making that information accessible through the interface functions described above.

When the user creates new enhancements 736, she invokes the TIE compiler 702 on a description of the enhancements 736. The TIE compiler 702 generates C code defining the internal tables and functions that implement the encode DLL. The TIE compiler 702 then invokes the host system's native compiler 746 (which compiles code to run on the host rather than on the processor being configured) to create the encode DLL 744 for the user-defined instructions 750. The user invokes the pre-built assembler 710 on her application with a flag or environment variable pointing to the directory that contains the user-defined enhancements 736. The pre-built assembler 710 dynamically opens the DLL 744 in that directory. For each assembly instruction, the pre-built assembler 710 uses the encode DLL 744 to look up the opcode mnemonic string, find the bit pattern for the opcode fields of the machine instruction, and encode each instruction operand.

For example, when the assembler 710 encounters the TIE instruction "foo a2, a2, a3", it finds from a table that the "foo" opcode translates into the number 6 in bit positions 16 to 23. From a table it finds the encoding function for each register. These functions encode a2 as the number 2, the other a2 as the number 2, and a3 as the number 3. From a table it then finds the appropriate set functions. set_r_field places the result value 2 into bit positions 12..15 of the instruction. Similar set functions place the other 2 and the 3 in their appropriate positions.
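The sequence just described (mnemonic lookup, operand encoding, field insertion) can be sketched in C as follows. This is a simplified illustration and not the code the TIE compiler generates: the table layout, the name assemble3 and the placement of the two source registers in bits 8..11 and 4..7 are assumptions, the latter inferred from the example encoding 0x62230. Only the opcode value in bits 16 to 23 and the result field in bits 12..15 are stated in the text.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode table: mnemonic -> instruction template with the
   opcode fields already filled in (foo = 6 in bits 16..23, 0 in bits 0..3). */
typedef struct { const char *mnemonic; uint32_t template_bits; } opcode_entry;

static const opcode_entry opcode_table[] = {
    { "foo", 0x06u << 16 },
};

/* Register operands are encoded as their own number; only 0..15 is legal. */
static int encode_reg(uint32_t *valp) { return (*valp >> 4) != 0 ? -1 : 0; }

/* Field setters; the s and t positions are assumed for this sketch. */
static void set_r_field(uint32_t *insn, uint32_t v) { *insn = (*insn & 0xffff0fffu) | ((v << 12) & 0xf000u); }
static void set_s_field(uint32_t *insn, uint32_t v) { *insn = (*insn & 0xfffff0ffu) | ((v << 8) & 0x0f00u); }
static void set_t_field(uint32_t *insn, uint32_t v) { *insn = (*insn & 0xffffff0fu) | ((v << 4) & 0x00f0u); }

/* Assemble "mnemonic aR, aS, aT"; returns 0 on success, -1 on error. */
int assemble3(const char *mnemonic, uint32_t r, uint32_t s, uint32_t t,
              uint32_t *out)
{
    size_t n = sizeof opcode_table / sizeof opcode_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opcode_table[i].mnemonic, mnemonic) != 0)
            continue;
        if (encode_reg(&r) || encode_reg(&s) || encode_reg(&t))
            return -1;                      /* illegal register number */
        uint32_t insn = opcode_table[i].template_bits;
        set_r_field(&insn, r);
        set_s_field(&insn, s);
        set_t_field(&insn, t);
        *out = insn;
        return 0;
    }
    return -1;                              /* unknown mnemonic */
}
```

Under these assumptions, assembling foo with registers 2, 2 and 3 yields the word 0x62230 used in the example above.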

Simulator 712

The simulator 712 interacts with the user-defined enhancements 736 in several ways. Given a machine instruction 740, the simulator 712 must decode the instruction, that is, break it up into its opcode and operands. The user-defined enhancements 736 are decoded by a function in the decode DLL 748 (the encode DLL 744 and the decode DLL 748 may in fact be the same DLL). For example, suppose the user defines three opcodes, foo1, foo2 and foo3, encoded as 0x6, 0x16 and 0x26 respectively in bits 16 to 23 of the instruction and as 0 in bits 0 to 3. The TIE compiler 702 generates the following decode function, which compares the opcode against all of the user-defined instructions 750:

int decode_insn(const xtensa_insnbuf insn)
{
    if ((insn[0] & 0xff000f) == 0x60000) return xtensa_foo1_op;
    if ((insn[0] & 0xff000f) == 0x160000) return xtensa_foo2_op;
    if ((insn[0] & 0xff000f) == 0x260000) return xtensa_foo3_op;
    return XTENSA_UNDEFINED;
}

When there are many user-defined instructions, comparing the opcode against every possible user-defined instruction 750 can be expensive, so the TIE compiler may instead use a hierarchical set of switch statements.

switch (get_op0_field(insn)) {
case 0x0:
    switch (get_op1_field(insn)) {
    case 0x6:
        switch (get_op2_field(insn)) {
        case 0x0: return xtensa_foo1_op;
        case 0x1: return xtensa_foo2_op;
        case 0x2: return xtensa_foo3_op;
        default: return XTENSA_UNDEFINED;
        }
    default: return XTENSA_UNDEFINED;
    }
default: return XTENSA_UNDEFINED;
}
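To make the relationship between the two decoding styles concrete, the following self-contained sketch implements both for a single 32-bit instruction word. The field extractors are assumptions: the text only says that the opcode occupies bits 16 to 23 and bits 0 to 3, so the split into op0 (bits 0..3), op1 (bits 16..19) and op2 (bits 20..23) is inferred from the case labels above, and the names decode_linear and decode_switch are introduced here for illustration.

```c
#include <stdint.h>

enum { XTENSA_UNDEFINED = -1, xtensa_foo1_op, xtensa_foo2_op, xtensa_foo3_op };

/* Assumed field layout (not stated explicitly in the text). */
static uint32_t get_op0_field(uint32_t insn) { return insn & 0xf; }
static uint32_t get_op1_field(uint32_t insn) { return (insn >> 16) & 0xf; }
static uint32_t get_op2_field(uint32_t insn) { return (insn >> 20) & 0xf; }

/* Linear decode: compare the masked opcode bits against every opcode. */
int decode_linear(uint32_t insn)
{
    uint32_t op = insn & 0xff000f;
    if (op == 0x060000) return xtensa_foo1_op;
    if (op == 0x160000) return xtensa_foo2_op;
    if (op == 0x260000) return xtensa_foo3_op;
    return XTENSA_UNDEFINED;
}

/* Hierarchical decode: one switch per opcode field. */
int decode_switch(uint32_t insn)
{
    switch (get_op0_field(insn)) {
    case 0x0:
        switch (get_op1_field(insn)) {
        case 0x6:
            switch (get_op2_field(insn)) {
            case 0x0: return xtensa_foo1_op;
            case 0x1: return xtensa_foo2_op;
            case 0x2: return xtensa_foo3_op;
            default:  return XTENSA_UNDEFINED;
            }
        default: return XTENSA_UNDEFINED;
        }
    default: return XTENSA_UNDEFINED;
    }
}
```

With this layout the two functions return the same result for every instruction word; the hierarchical form simply replaces a chain of comparisons with at most three table-style branches.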

In addition to decoding instruction opcodes, the decode DLL 748 includes functions for decoding instruction operands. This is done in the same way as operands are encoded in the encode DLL 744. First, the decode DLL 748 provides functions that extract the operand fields from a machine instruction. Continuing the example above, the TIE compiler 702 generates the following function to extract a value from bits 12 to 15 of an instruction:

u_int32_t get_r_field(insn)
    xtensa_insnbuf insn;
{
    return ((insn[0] & 0xf000) >> 12);
}

The TIE description of an operand includes both an encoding and a decoding specification, so whereas the encode DLL 744 uses the operand encoding specification, the decode DLL 748 uses the operand decoding specification. For example, the TIE operand description

operand tx10 t {t << 10} {tx10 >> 10}

generates the following operand decode function:

u_int32_t decode_tx10(val)
    u_int32_t val;
{
    u_int32_t t, tx10;
    t = val;
    tx10 = t << 10;
    return tx10;
}

当用户启用仿真程序712时，她告诉仿真程序712含有用户定义的各项改进736的解码DLL748的目录。仿真程序712打开适当的DLL。每当仿真程序712对一条指令进行解码时，如果该指令没有通过预先编写的指令系统的解码函数成功地进行解码，那么仿真程序712就启用DLL748中的解码函数。When the user starts the emulator 712, she tells the emulator 712 the directory containing the decoding DLL 748 for the user-defined enhancements 736. Emulator 712 opens the appropriate DLL. Whenever emulator 712 decodes an instruction, emulator 712 enables the decode function in DLL 748 if the instruction is not successfully decoded by the pre-programmed decode function of the instruction set.

给出一个已解码的指令750之后，仿真程序712必须对指令750的语义进行解释和模拟。这用函数方式完成。每条指令750都有对应的函数，让仿真程序712对该指令750的语义进行模拟。仿真程序712在内部对被模拟的处理器的全部状态保持跟踪。仿真程序712有固定的接口用于更新或查询处理器的状态。如上所述，用户定义的各项改进736是由TIE硬件描述语言写成的，该语言是Verilog的一个子集。TIE编译程序702将硬件描述语言转换为C语言函数，仿真程序712利用上述的C语言函数来模拟新的改进736。硬件描述语言运算符直接地转换为对应的C语言运算符。读状态或写状态的操作被转换为仿真程序的界面，用于对处理器状态进行更新或查询。Given a decoded instruction 750, the emulator 712 must interpret and simulate the semantics of the instruction 750. This is done in a functional way. Each instruction 750 has a corresponding function, allowing the simulation program 712 to simulate the semantics of the instruction 750 . The simulation program 712 internally keeps track of the overall state of the processor being simulated. The emulation program 712 has a fixed interface for updating or querying the state of the processor. As noted above, user-defined enhancements 736 are written in the TIE Hardware Description Language, which is a subset of Verilog. The TIE compiler 702 converts the hardware description language into C language functions, and the simulation program 712 uses the above C language functions to simulate the new improvement 736 . Hardware description language operators translate directly to corresponding C language operators. The operation of reading state or writing state is translated into the interface of emulator for updating or querying processor state.

As an example in this embodiment, assume that a user creates an instruction that adds two registers. This example is chosen only for simplicity. The user might describe the semantics of the add in the hardware description language as follows:

semantic add {add} { assign arr = ars + art; }

The output register, designated by the built-in name arr, is assigned the sum of the two input registers, designated by the built-in names ars and art. The TIE compiler 702 takes this description and generates a semantic function used by the simulator 712:

void add_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    set_ar(_OPND0_, ar(_OPND1_) + ar(_OPND2_));
    pc_incr(3);
}

The hardware operator "+" is translated directly into the C operator "+". Reads of the hardware registers ars and art are translated into calls to the simulator 712 function "ar". The write of the hardware register arr is translated into a call to the simulator 712 function "set_ar". Because every instruction implicitly increments the program counter pc by the size of the instruction, the TIE compiler 702 also generates a call to the simulator 712 function that increments the simulated pc by 3, the size of the add instruction.
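To make the translation concrete, the following sketch supplies a toy version of the state interface that the generated function calls. It is an assumption-laden illustration: the flat 32-entry register array, the `sim_pc` variable and the `pc_value` helper are invented for the example, register windowing is ignored, and the operand parameters are renamed `opnd0` to `opnd3` to avoid C's reserved identifiers.

```c
#include <stdint.h>

typedef uint32_t u32;

#define AR_COUNT 32u            /* assumed size of the AR register file */

static u32 ar_file[AR_COUNT];   /* simulated AR registers */
static u32 sim_pc;              /* simulated program counter */

/* Query interface: read one AR register. */
u32 ar(u32 index) { return ar_file[index % AR_COUNT]; }

/* Update interface: write one AR register. */
void set_ar(u32 index, u32 value) { ar_file[index % AR_COUNT] = value; }

/* Advance the simulated pc by the instruction size in bytes. */
void pc_incr(u32 bytes) { sim_pc += bytes; }

/* Read the simulated pc (hypothetical helper, for inspection only). */
u32 pc_value(void) { return sim_pc; }

/* Same shape as the generated add_func: operand 0 is the destination,
 * operands 1 and 2 are the sources, operand 3 is unused by add. */
void add_func(u32 opnd0, u32 opnd1, u32 opnd2, u32 opnd3)
{
    (void)opnd3;
    set_ar(opnd0, ar(opnd1) + ar(opnd2));
    pc_incr(3);
}
```

Because the semantic function touches processor state only through `ar`, `set_ar` and `pc_incr`, the simulator is free to change how that state is stored without regenerating the user's functions.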

When the TIE compiler 702 is invoked, it creates a semantic function as described above for every user-defined instruction. It also creates a table that maps all the opcode names to the associated semantic functions. The table and functions are compiled into the simulator DLL 749 using a standard compiler 746. When the user invokes the simulator 712, she tells the simulator 712 the directory containing the user-defined enhancements 736, and the simulator 712 opens the appropriate DLL. Whenever the simulator 712 is invoked, it decodes all the instructions in the program and creates a table that maps each instruction to its associated semantic function. When creating the mapping, the simulator 712 opens the DLL and looks up the appropriate semantic functions. When simulating the semantics of a user-defined instruction 736, the simulator 712 directly invokes the function in the DLL.

To tell the user how long an application would take to run on the simulated hardware, the simulator 712 must model the performance effects of an instruction 750. The simulator 712 uses a pipeline model for this purpose. Every instruction executes over several cycles, and in each cycle it uses different resources of the machine. The simulator 712 begins by trying to execute all the instructions in parallel. If multiple instructions try to use the same resource in the same cycle, the later instruction is stalled until the resource is free. If a later instruction reads state that an earlier instruction writes in a later cycle, the later instruction is stalled until the value has been written. The simulator 712 uses a functional interface to model the performance of each instruction: a function is created for every type of instruction, and that function contains calls to the simulator interface that models the performance of the processor.

For example, consider a simple three-register instruction foo. The TIE compiler might create the following simulator function:

void foo_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    pipe_use_ifetch(3);
    pipe_use(REGF32_AR, op1, 1);
    pipe_use(REGF32_AR, op2, 1);
    pipe_def(REGF32_AR, op0, 2);
    pipe_def_ifetch(-1);
}

The call to pipe_use_ifetch tells the simulator 712 that the instruction requires 3 bytes to be fetched. The two calls to pipe_use tell the simulator 712 that the two input registers will be read in cycle 1. The call to pipe_def tells the simulator 712 that the output register will be written in cycle 2. The call to pipe_def_ifetch tells the simulator 712 that the instruction is not a branch, so the next instruction can be fetched in the next cycle.
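One way a simulator could turn such use and def declarations into stall cycles is sketched below. It is a simplified model under stated assumptions, not the patent's implementation: only register dependences are tracked (fetch and other shared resources are ignored), a read is taken to succeed if it occurs in the same cycle as the write or later, and the names `reg_ref`, `write_cycle` and `schedule` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 64   /* assumed upper bound on register numbers */

/* One register reference: which register, and in which pipeline stage
 * (counted from the issue cycle) it is read or written. */
struct reg_ref {
    unsigned reg;
    unsigned stage;
};

/* Cycle in which the most recent write to each register completes. */
static unsigned write_cycle[NUM_REGS];

/* Compute the issue cycle of an instruction that could issue no earlier
 * than `earliest`, then record its register writes. A read is assumed to
 * be satisfied if it happens in the same cycle as the write or later. */
unsigned schedule(unsigned earliest,
                  const struct reg_ref *uses, size_t n_uses,
                  const struct reg_ref *defs, size_t n_defs)
{
    unsigned issue = earliest;
    size_t i;

    for (i = 0; i < n_uses; i++) {
        assert(uses[i].reg < NUM_REGS);
        unsigned ready = write_cycle[uses[i].reg];
        if (issue + uses[i].stage < ready)
            issue = ready - uses[i].stage;   /* stall until the value exists */
    }
    for (i = 0; i < n_defs; i++) {
        assert(defs[i].reg < NUM_REGS);
        write_cycle[defs[i].reg] = issue + defs[i].stage;
    }
    return issue;
}
```

Under this model, an instruction like foo (uses in stage 1, def in stage 2) can be followed immediately by a dependent instruction without a stall, whereas a producer that defines its result in stage 3 delays a dependent consumer by one cycle.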

Pointers to these functions are placed in the same table as the semantic functions. The functions themselves are compiled into the same DLL 749 as the semantic functions. When the simulator 712 is invoked, it creates a mapping between instructions and performance functions. When creating the mapping, the simulator 712 opens the DLL 749 and looks up the appropriate performance functions. When simulating the performance of a user-defined instruction 736, the simulator 712 directly invokes the function in the DLL 749.
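The shared table can be pictured as an array of rows, each holding an opcode name together with its semantic and performance function pointers. The sketch below is illustrative only: the struct layout, the names `opcode_entry`, `user_opcodes`, `lookup_opcode` and `foo_func`, and the empty function bodies are assumptions, and the step of locating the table inside the DLL 749 is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef void (*insn_fn)(u32 op0, u32 op1, u32 op2, u32 op3);

/* One table row per user-defined opcode. */
struct opcode_entry {
    const char *name;      /* opcode name, e.g. "foo" */
    insn_fn     semantic;  /* models what the instruction computes */
    insn_fn     sched;     /* models how long the instruction takes */
};

/* Placeholder bodies; generated code would call the simulator interface. */
void foo_func(u32 op0, u32 op1, u32 op2, u32 op3)
{
    (void)op0; (void)op1; (void)op2; (void)op3;
}

void foo_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    (void)op0; (void)op1; (void)op2; (void)op3;
}

const struct opcode_entry user_opcodes[] = {
    { "foo", foo_func, foo_sched },
};

/* Linear search by opcode name; returns NULL for an unknown opcode. */
const struct opcode_entry *lookup_opcode(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof user_opcodes / sizeof user_opcodes[0]; i++)
        if (strcmp(user_opcodes[i].name, name) == 0)
            return &user_opcodes[i];
    return NULL;
}
```

A simulator would typically perform this lookup once per opcode at start-up and cache the two pointers with each decoded instruction, so that no string comparison is needed while the program is being simulated.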

Debugger 730

The debugger interacts with the user-defined enhancements 750 in two ways. First, the user can display the assembly instructions 738 for user-defined instructions 736. To do this, the debugger 730 must decode machine instructions 740 into assembly instructions 738. This is the same mechanism that the simulator 712 uses to decode instructions, and the debugger 730 preferably uses the same DLL that the simulator 712 uses for decoding. In addition to decoding the instructions, the debugger must convert each decoded instruction into a string. For this purpose, the decode DLL 748 includes a function that maps each internal opcode representation to the corresponding mnemonic string. This can be implemented with a simple table.

The user can invoke the pre-built debugger with a flag or environment variable that points to the directory containing the user-defined enhancements 750. The pre-built debugger dynamically opens the appropriate DLL 748.

The debugger 730 also interacts with the user-defined state 752. The debugger 730 must be able to read and modify this state 752. To do so, the debugger 730 communicates with the simulator 712: it asks the simulator 712 how large the state is and what the names of the state variables are. Whenever the debugger 730 is asked to display the value of some user state, it asks the simulator 712 for the value in the same way that it asks for predefined state. Similarly, to modify user state, the debugger 730 tells the simulator 712 to set the state to a given value.
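The exchange between debugger and simulator amounts to a small query interface: enumerate the states, then read or write one by name. The following sketch shows one possible shape for it. Everything here is hypothetical: the `user_state` struct, the function names, the example state "ACC" with its 24-bit width, and the choice to truncate written values to the declared width are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one user-defined state register. */
struct user_state {
    const char *name;   /* state variable name */
    unsigned    bits;   /* width in bits, at most 32 in this sketch */
    uint32_t    value;  /* current simulated value */
};

/* Example state; the name and width are made up for illustration. */
static struct user_state states[] = {
    { "ACC", 24, 0 },
};

#define STATE_COUNT (sizeof states / sizeof states[0])

/* How many user states exist (asked once by the debugger). */
size_t state_count(void) { return STATE_COUNT; }

/* Name and size of state number `index`, or NULL if out of range. */
const struct user_state *state_info(size_t index)
{
    return index < STATE_COUNT ? &states[index] : NULL;
}

static struct user_state *find_state(const char *name)
{
    size_t i;
    for (i = 0; i < STATE_COUNT; i++)
        if (strcmp(states[i].name, name) == 0)
            return &states[i];
    return NULL;
}

/* Read a state by name; returns 0 on success, -1 if the name is unknown. */
int state_get(const char *name, uint32_t *out)
{
    const struct user_state *s = find_state(name);
    if (s == NULL)
        return -1;
    *out = s->value;
    return 0;
}

/* Write a state by name, truncating the value to the declared width. */
int state_set(const char *name, uint32_t value)
{
    struct user_state *s = find_state(name);
    if (s == NULL)
        return -1;
    uint32_t mask = s->bits >= 32 ? 0xFFFFFFFFu : ((1u << s->bits) - 1u);
    s->value = value & mask;
    return 0;
}
```

Because the debugger learns names and sizes at run time, it needs no rebuild when the user adds or removes state; only the simulator side changes.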

It can thus be seen that, according to this aspect of the invention, support for user-defined instruction sets and state can be accomplished using modules that define the user functionality and that are plugged in to the core software development tools. A system can therefore be developed in which the plug-in modules for a particular set of user-defined enhancements are maintained as a group within the system for ease of organization and manipulation.

Moreover, the core software development tools may be specific to a particular core instruction set and processor state, and a single set of plug-in modules for user-defined enhancements may be evaluated in combination with multiple sets of core software development tools resident on the system.

Annex A

# Xtensa Configuration Database Specification
# $Id: Definition,v 1.65 1999/02/04 15:30:45 adixit Exp $
#
# These coded instructions, statements, and computer programs are
# Confidential Proprietary Information of Tensilica Inc. and may not be
# disclosed to third parties or copied in any form, in whole or in part,
# without the prior written consent of Tensilica Inc.
#
# This is the configuration parameter definition file.
# - All supported configurations must be declared in this file
# - All tools parsing configurations must check against this file for validity
# - Changes to this file must be kept minimal and dealt with care
#
# Naming Conventions
# Most parameter names begin with a category name from the following list:
#   Addr        Addressing and translation parameters
#   Build       ?
#   Cad         Target CAD environment
#   DV          Design Verification parameters
#   Data        One of the following:
#     DataCache   Data Cache parameters
#     DataRAM     Data RAM parameters
#     DataROM     Data ROM parameters
#   Debug       Debug option parameters
#   Impl        Implementation goals
#   Inst        One of the following:
#     InstCache   Instruction Cache parameters
#     InstRAM     Instruction RAM parameters
#     InstROM     Instruction ROM parameters
#   Interrupt   Interrupt parameters
#   Isa         Instruction Set Architecture parameters
#   Iss         Instruction Set Simulator parameters
#   PIF         Processor Interface parameters
#   Sys         System parameters (e.g. memory map)
#   TIE         Application-specific instruction parameters
#   Test        Manufacturing Test parameters
#   Timer       Cycle count/compare option parameters
#   Vector      Reset/Exception/Interrupt vector addresses
#
# Many parameters end in a suffix giving the units in which they are measured:
#   Bits
#   Bytes       (i.e. 8 bits)
#   Count       used as a generic "number of" suffix
#   Entries     similar to Count
#   Filename    absolute pathname of file
#   Interrupt   interrupt id (0..31)
#   Level       interrupt level (1..15)
#   Max         maximum value
#   Paddr       physical address
#   Type        an enumeration of possible values
#   Vaddr       virtual address
#
# Format of this file:
#   Column 1    configuration parameter name
#   Column 2    default value of the parameter
#   Column 3    Perl expression used to check the validity of the value

############
# ISA options
############
IsaUseClamps               0             0|1
IsaUseMAC16                0             0|1
IsaUseMul16                0             0|1
IsaUseException            1             1
IsaUseInterrupt            0             0|1
IsaUseHighLevelInterrupt   0             0|1
IsaUseDebug                0             0|1
IsaUseTimer                0             0|1
IsaUseWindowedRegisters    1             1
IsaMemoryOrder             LittleEndian  LittleEndian|BigEndian
IsaARRegisterCount         32            32|64

############
# Addressing and translation
############
AddrPhysicalAddressBits    32   1[6-9]|2[0-9]|3[0-2]
AddrVirtualAddressBits     32   1[6-9]|2[0-9]|3[0-2]

############
# Data Cache/RAM/ROM
############
DataCacheBytes             1k   0k|1k|2k|4k|8k|16k
DataCacheLineBytes         16   16|32|64
DataRAMBytes               0k   0k|1k|2k|4k|8k|16k
DataROMBytes               0k   0k|1k|2k|4k|8k|16k
DataWriteBufferEntries     4    4|8|16|32
DataCacheAccessBits        32   32|64|128

############
# Instruction Cache/RAM/ROM
############
InstCacheBytes             1k   0k|1k|2k|4k|8k|16k
InstCacheLineBytes         16   16|32|64
InstRAMBytes               0k   0k|1k|2k|4k|8k|16k
InstROMBytes               0k   0k|1k|2k|4k|8k|16k
InstCacheAccessBits        32   32|64|128

############
# Processor Interface
############
PIFReadDataBits            32   32|64|128
PIFWriteDataBits           32   32|64|128
PIFTracePort               0    0|1

############
# System
############
SysAppStartVAddr           0x40001000   0x[0-9a-fA-F]+
SysDefaultCacheAttr        0xfff21122   0x[0-9a-fA-F]+
SysROMBytes                128k         [0-9]+(k|m)
SysROMPAddr                0x20000000   0x[0-9a-fA-F]+
SysRAMBytes                1m           [0-9]+(k|m)
SysRAMPAddr                0x40000000   0x[0-9a-fA-F]+
SysStackBytes              16k          [0-9]+(k|m)
SysXMONBytes               0x0000fd00   0x[0-9a-fA-F]+
SysXMONVAddr               0x20000300   0x[0-9a-fA-F]+
SysXTOSBytes               0x00000c00   0x[0-9a-fA-F]+
SysXTOSVAddr               0x40000400   0x[0-9a-fA-F]+

############
# Vector addresses
############
VectorResetVAddr             0x20000020   0x[0-9a-fA-F]+
VectorUserExceptionVAddr     0x40000214   0x[0-9a-fA-F]+
VectorKernelExceptionVAddr   0x40000204   0x[0-9a-fA-F]+
VectorWindowBaseVAddr        0x40000000   0x[0-9a-fA-F]+
VectorLevel2InterruptVAddr   0x40000224   0x[0-9a-fA-F]+
VectorLevel3InterruptVAddr   0x40000234   0x[0-9a-fA-F]+

############
# Interrupt options
############
InterruptCount      1          [1-9]|1[0-9]|2[0-9]|3[0-2]
InterruptLevelMax   1          [1-3]
Interrupt0Type      External   External|Internal|Software
Interrupt1Type      External   External|Internal|Software
Interrupt2Type      External   External|Internal|Software
Interrupt3Type      External   External|Internal|Software
Interrupt4Type      External   External|Internal|Software
Interrupt5Type      External   External|Internal|Software
Interrupt6Type      External   External|Internal|Software
Interrupt7Type      External   External|Internal|Software
Interrupt8Type      External   External|Internal|Software
Interrupt9Type      External   External|Internal|Software
Interrupt10Type     External   External|Internal|Software
Interrupt11Type     External   External|Internal|Software
Interrupt12Type     External   External|Internal|Software
Interrupt13Type     External   External|Internal|Software
Interrupt14Type     External   External|Internal|Software
Interrupt15Type     External   External|Internal|Software
Interrupt16Type     External   External|Internal|Software
Interrupt17Type     External   External|Internal|Software
Interrupt18Type     External   External|Internal|Software
Interrupt19Type     External   External|Internal|Software
Interrupt20Type     External   External|Internal|Software
Interrupt21Type     External   External|Internal|Software
Interrupt22Type     External   External|Internal|Software
Interrupt23Type     External   External|Internal|Software
Interrupt24Type     External   External|Internal|Software
Interrupt25Type     External   External|Internal|Software
Interrupt26Type     External   External|Internal|Software
Interrupt27Type     External   External|Internal|Software
Interrupt28Type     External   External|Internal|Software
Interrupt29Type     External   External|Internal|Software
Interrupt30Type     External   External|Internal|Software
Interrupt31Type     External   External|Internal|Software
Interrupt0Level     1          [1-3]
Interrupt1Level     1          [1-3]
Interrupt2Level     1          [1-3]
Interrupt3Level     1          [1-3]
Interrupt4Level     1          [1-3]
Interrupt5Level     1          [1-3]
Interrupt6Level     1          [1-3]
Interrupt7Level     1          [1-3]
Interrupt8Level     1          [1-3]
Interrupt9Level     1          [1-3]
Interrupt10Level    1          [1-3]
Interrupt11Level    1          [1-3]
Interrupt12Level    1          [1-3]
Interrupt13Level    1          [1-3]
Interrupt14Level    1          [1-3]
Interrupt15Level    1          [1-3]
Interrupt16Level    1          [1-3]
Interrupt17Level    1          [1-3]
Interrupt18Level    1          [1-3]
Interrupt19Level    1          [1-3]
Interrupt20Level    1          [1-3]
Interrupt21Level    1          [1-3]
Interrupt22Level    1          [1-3]
Interrupt23Level    1          [1-3]
Interrupt24Level    1          [1-3]
Interrupt25Level    1          [1-3]
Interrupt26Level    1          [1-3]
Interrupt27Level    1          [1-3]
Interrupt28Level    1          [1-3]
Interrupt29Level    1          [1-3]
Interrupt30Level    1          [1-3]
Interrupt31Level    1          [1-3]

############
# Other processor component options
# Processor timer options
############
TimerCount        0   [0-3]
Timer0Interrupt   0   [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer1Interrupt   0   [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer2Interrupt   0   [0-9]|1[0-9]|2[0-9]|3[0-1]

############
# Debug options
############
DebugDataVAddrTrapCount   0   [0-2]
DebugInstVAddrTrapCount   0   [0-2]
DebugInterruptLevel       2   [2-3]
DebugUseOnChipDebug       0   0|1

############
# Instruction Set Simulator
############
ISSArgcPAddr   0x00012000   0x[0-9a-fA-F]+
ISSArgvPAddr   0x00012004   0x[0-9a-fA-F]+

############
# Design Verification
############
DVMagicLocPAddr          0x00010000   0x[0-9a-fA-F]+
DVSerialRXADataPAddr     0x00011000   0x[0-9a-fA-F]+
DVSerialRXBDataPAddr     0x00011010   0x[0-9a-fA-F]+
DVSerialRXStatusPAddr    0x00011020   0x[0-9a-fA-F]+
DVSerialRXRequestPAddr   0x00011030   0x[0-9a-fA-F]+
DVCachedVAddr            0x60000000   0x[0-9a-fA-F]+
DVNonCachedVAddr         0x80000000   0x[0-9a-fA-F]+

############
# Test options
############
TestFullScan             0   0|1
TestLatchesTransparent   0   0|1

############
# Processor implementation configuration
############
ImplTargetSpeed          250       [1-9][0-9]*
ImplTargetSize           20000     [1-9][0-9]*
ImplTargetPower          75        [1-9][0-9]*
ImplSpeedPriority        High      High|Medium|Low
ImplPowerPriority        Medium    High|Medium|Low
ImplSizePriority         Low       High|Medium|Low
ImplTargetTechnology     25m       18m|25m|35m|cx3551|cx3301|acb25typ|acb25wst|t25typical|t25worst|
ImplOperatingCondition   Typical   Worst|Typical

############
# CAD options
############
CadParUseApollo           1   0|1
CadParUseSiliconEnsembl   0   0|1
CadSimUseVCS              1   0|1
CadSimUseVerilogXL        1   0|1
CadSimUseVerilogNC        1   0|1
CadSimUseVantage          0   0|1
CadSimUseMTI              0   0|1
CadStvUseMotive           0   0|1
CadStvUsePrimeTime        1   0|1
CadSynUseBuildGates       0   0|1
CadSynUseDesignCompiler   1   0|1

############
# TIE instruction file. It must be an absolute pathname.
############
TIEFilename   \/.*|-

############
# The following section is for internal use only. Before promoting any
# internal parameter to the sections above, make sure that all product
# components can support it.
############
# Constants for Athens implementation
IsaUseAthensCacheTest      1          0|1
IsaUseSpeculation          0          0
IsaUseCoprocessor          0          0
IsaUseFloatingPoint        0          0
IsaUseDSP                  0          0
IsaUseDensityInstruction   1          1
IsaUse32bitMulDiv          0          0
IsaUseAbsdif               0          0
IsaUseCRC                  0          0
IsaUsePopCount             0          0
IsaUseLeadingZeros         0          0
IsaUseMinMax               0          0
IsaUseSignExtend           0          0
IsaUseSynchronization      0          0
DataCacheIndexLock         0          0
DataCacheIndexType         physical   physical
DataCacheMaxMissCount      1          1
DataCacheMissStart         32         32
DataCacheParityBits        0          0
DataCacheSectorSize        16         16
DataCacheTagParityBits     0          0
DataCacheTagType           physical   physical
DataCacheWayLock           0          0
InstCacheIndexLock         0          0
InstCacheIndexType         physical   physical
InstCacheMaxMissCount      1          1
InstCacheMissStart         32         32
InstCacheParityBits        0          0
InstCacheSectorSize        16         16
InstCacheTagParityBits     0          0
InstCacheTagType           physical   physical
InstCacheWayLock           0          0

############
# Build mode... for Web customers. They can run a limited number of
# production builds, but as many eval builds as they like.
# UserCID is used for fingerprinting
############
BuildMode      Evaluation   Evaluation|Production
BuildUserCID   999          [0-9]+

############
# Values used by the GUI - basically persistent state
############
SysAddressLayout   Xtos   Xtos|Manual
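The third column of the definition file lends itself to mechanical checking: a proposed value is acceptable if it matches the expression as a whole. The sketch below shows one way to do that in C. It rests on several assumptions: the expression is treated as anchored at both ends, POSIX extended regular expressions from `<regex.h>` (a POSIX interface, not ISO C) stand in for Perl's engine, which is adequate only for the simple patterns used in this annex, and the function name `config_value_ok` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Check `value` against a column-3 pattern, requiring a whole-string match.
 * Returns 1 if the value is valid, 0 if it is not, and -1 if the pattern
 * is too long or cannot be compiled. */
int config_value_ok(const char *pattern, const char *value)
{
    char anchored[512];
    regex_t re;

    /* Wrap the pattern so that alternatives cannot match a mere substring. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int ok = (regexec(&re, value, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, the pattern given for AddrPhysicalAddressBits accepts the widths 16 through 32, so `config_value_ok("1[6-9]|2[0-9]|3[0-2]", "32")` reports a valid value while the same call with "33" or "15" does not. Anchoring matters here: without it, "33" would match through the alternative `3[0-2]`... only if followed by 0, 1 or 2, but a value such as "320" would slip through.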

Annex B

#!/usr/xtensa/tools/bin/perl
# Tensilica PreProcessor
# $Id: tpp,v 1.15 1998/12/17 19:36:03 earl Exp $
# Modified: Kaushik Sheth
# The original code was taken from Iain McClatchie.
# perl preprocessor
# ... warrantee implied.
# Author: Iain McClatchie
# You can redistribute and/or modify this software under the terms of the
# GNU General Public License as published by the Free Software Foundation;
# either version 2, or (at your option) any later version.

use lib "@xtools@/lib";
package tpp;

# Standard perl modules
use strict;
use Exporter ();
use Getopt::Long;

# Module stuff
@tpp::ISA = qw(Exporter);
@tpp::EXPORT = qw(
    include
    error
);
@tpp::EXPORT_OK = qw(
    include
    gen
    error
);
%tpp::EXPORT_TAGS = ();

use vars qw(
    $debug
    $lines
    @incdir
    $config
    $output
    @global_file_stack
);

# Main program
{
    $::myname = 'tpp';    # for error messages
    # parse command line
    $debug = 0;           # -debug command line option
    $lines = 0;           # -lines command line option
    @incdir = ();         # -I command line options
    $config = "";         # -c command line option
    $output = undef;      # -o command line option
    my @eval = ();
    if (!GetOptions(
            "debug!"  => \$debug,
            "lines!"  => \$lines,
            "I=s@"    => \@incdir,
            "c=s"     => \$config,
            "o=s"     => \$output,
            "eval=s@" => \@eval)
        || @ARGV <= 0) {
        # command line error
        print STDERR <<"END";
tpp [args] file
    Applies a perl preprocessor to the indicated file, and any files
    included therein; the output of the preprocessor is written to
    stdout. Perl is embedded in the source text by one of two means.
    Whole lines of perl can be embedded by preceding them with a
    semicolon (you would typically do this for looping statements or
    subroutine calls). Alternatively, perl expressions can be embedded
    into the middle of other text by escaping them with backticks.
    -debug          Print perl code to STDERR, so you can figure out why your embedded
                    perl statements are looping forever.
    -lines          Embed \'#line 43 \"foo.w\"\' directives in output, for more
                    comprehensible error and warning messages from later tools.
    -I dir          search for include files in directory dir
    -o output_file  Redirect the output to a file rather than stdout.
    -c config_file  Read the specified config file.
    -e eval         Eval eval before running program
NOTE:
    the lines with only ";" and "; //" will go unaltered.
END
        exit(1);
    }
    # Initialize
    push(@INC, @incdir);
    @global_file_stack = ();
    # Read configuration file
    tppcode::init($config);
    # Open the output file
    if ($output) {
        open(STDOUT, ">$output")
            || die("$::myname: $!, opening '$output'\n");
    }
    # Process evals
    foreach (@eval) {
        tppcode::execute($_);
    }
    # Process the input files
    foreach (@ARGV) {
        include($_);
    }
    # Done
    exit(0);
}

sub include {
    my($file) = @_;
    my($buf, $tempname, @chunks, $chunk, $state, $lasttype);
    if ($file =~ m|^/|) {
        if (!open(INP, "<$file")) {
            error($file, "$!, opening $file");
        }
    } else {
        my $path;
        foreach $path (".", @incdir) {
            if (open(INP, "<$path/$file")) {
                $file = "$path/$file";
                last;
            }
        }
        error($file, "Couldn't find $file in @INC")
            if tell(INP) == -1;
    }
    $lasttype = "";
    while (<INP>) {
        if (/^\s*;(.*)$/) {
            my $l = $1;
            if ($lasttype ne "perl") {
                $lasttype = "perl";
            }
            if ((/^\s*;\s*\/\//) || (/^\s*;\s*$/)) {
                $buf .= "print STDOUT \"$_\";\n";
            } else {
                $buf .= $1 . "\n";
            }
        } else {
            if ($lines and $lasttype ne "text") {
                $buf .= "print STDOUT \"\#line $. \\\"$file\\\"\\n\";\n";
                $lasttype = "text";
            }
            chomp;
            if (m/^$/) {
                $buf .= "print STDOUT \"\\n\";\n";
                next;
            }
            @chunks = split("\`");
            $state = 0;
            $tempname = "00";
            foreach $chunk (@chunks) {
                if ($state == 0) {
                    $chunk = quotemeta($chunk);
                    $state = 1;
                } else {
                    if ($chunk =~ m/^\W/) {    # Perl expression
                        $buf .= "\$temp$tempname = $chunk;\n";
                        $chunk = "\$\{temp$tempname\}";
                        $tempname++;
                        $state = 0;
                    } else {                   # Backquoted something
                        $chunk = "\\\`" . quotemeta($chunk);
                        $state = 1;
                    }
                }
            }
            # check if the line ends with a backquote
            if (m/\`$/) {
                $state = 1 - $state;
            }
            error($file, "Unterminated embedded perl expression, line $.")
                if ($state == 0);
            $buf .= "print STDOUT \"" . join("", @chunks) . "\\n\";\n";
        }
    }
    close(INP);
    print STDERR $buf if ($debug);
    push(@global_file_stack, $file);
    tppcode::execute($buf);
    pop(@global_file_stack);
    if ($@) {
        chomp($@);
        error($file, $@);
    }
}

sub gen {
    print STDOUT (@_);
}

sub error {
    my($file, $err) = @_;
    print STDERR "$::myname: Error ($err) while preprocessing file \"$file\"\n";
    my $fn;
    foreach $fn (@global_file_stack) {
        print STDERR "    included from \"$fn\"\n";
    }
    exit(1);
}

# This package is used to execute the tpp code
package tppcode;
no strict;
use Xtensa::Config;

sub ppp_require {
    print STDERR ("tpp: Warning: ppp_require used instead of tpp::include\n");
    tpp::include(@_);
}

sub init {
    my($cfile) = @_;
    config_set($cfile);
}

sub execute {
    my($code) = @_;
    eval($code);
}

##
# Local Variables:
# mode: perl
# perl-indent-level: 4
# cperl-indent-level: 4
# End:
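The way include() sorts each input line can be restated compactly. The following is only a sketch in C of that three-way decision, written to mirror the regular expressions in the Perl listing; the enum and function names are hypothetical and do not appear in tpp.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical names: they mirror the three cases handled by include(). */
enum tpp_line_kind { TPP_TEXT, TPP_PERL, TPP_PASSTHROUGH };

/* Advance past leading whitespace (including a trailing newline). */
static const char *skip_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Classify one input line the way the listing's patterns do:
 *   /^\s*;\s*$/  or  /^\s*;\s*\/\//  -> copied to the output unaltered
 *   /^\s*;(.*)$/                     -> remainder is executed as Perl
 *   anything else                    -> ordinary text, with `expr` escapes */
enum tpp_line_kind tpp_classify(const char *line)
{
    const char *p;

    assert(line != NULL);
    p = skip_space(line);
    if (*p != ';')
        return TPP_TEXT;
    p = skip_space(p + 1);
    if (*p == '\0' || strncmp(p, "//", 2) == 0)
        return TPP_PASSTHROUGH;
    return TPP_PERL;
}
```

For instance, a line such as `; foreach $i (0..3) {` would be classified as Perl, while a Verilog line containing a backtick expression is classified as text and handled by the chunk-splitting loop.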

Annex C

# Change XTENSA to point to your local installation
XTENSA = /usr/xtensa/awang/s8
##
# No need to change the rest
##
GCC = /usr/xtensa/stools/bin/gcc
XTCC = $(XTENSA)/bin/xt-gcc
XTRUN = $(XTENSA)/bin/xt-run
XTGO = $(XTENSA)/Hardware/scripts/xtgo
MFILE = $(XTENSA)/Hardware/diag/Makefile.common

all: run-base run-tie-cstub run-iss run-iss-old run-iss-new run-ver

##
# Rules to build various versions of me
##
me-base: me.c me_base.c me_tie.c src.c sad.c
	$(GCC) -o me-base -g -O2 -DNX=64 -DNY=64 me.c

me-tie-cstub: me.c me_base.c me_tie.c src.c sad.c
	$(GCC) -o me-tie-cstub -g -O2 -DTIE -DNX=64 -DNY=64 me.c

me-xt: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt -g -O2 -DXTENSA -DNX=32 -DNY=32 me.c

me-xt-old: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt-old -g -O3 -DOLD -DXTENSA -DNX=32 -DNY=32 me.c

me-xt-new: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt-new -g -O3 -DNEW -DXTENSA -DNX=32 -DNY=32 me.c

me-xt.s: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt.s -S -O3 -DNOPRINTF -DXTENSA -DNX=16 -DNY=16 me.c

##
# Rules for various runs of me
##
run-base: me-base
	me-base; exit 0

run-tie-cstub: me-tie-cstub
	me-tie-cstub; exit 0

run-iss: me-xt
	$(XTRUN) me-xt

run-iss-old: me-xt-old
	$(XTRUN) --verbose me-xt-old

run-iss-new: me-xt-new
	$(XTRUN) --verbose me-xt-new

run-ver: me-xt.s testdir
	cp me-xt.s testdir/me-xt
	$(XTGO) -vcs -testdir `pwd`/testdir -test me-xt > run-ver.out 2>&1
	grep Status run-ver.out

testdir:
	mkdir -p testdir/me-xt
	@echo 'all: me-xt.dat me-xt.bfd' > testdir/me-xt/Makefile
	@echo "include $(MFILE)" >> testdir/me-xt/Makefile

clean:
	rm -rf me-* *.out testdir results

APPENDIX I: TEST PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#ifndef NX
#define NX 32        /* image width */
#endif
#ifndef NY
#define NY 32        /* image height */
#endif
#define BLOCKX 16    /* block width */
#define BLOCKY 16    /* block height */
#define SEARCHX 4    /* search region width */
#define SEARCHY 4    /* search region height */

unsigned char OldB[NX][NY];                    /* old image */
unsigned char NewB[NX][NY];                    /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];    /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];    /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY];    /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY];    /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY];    /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY];    /* Base absolute difference */

#define ABS(x)     (((x) < 0) ? (-(x)) : (x))
#define MIN(x, y)  (((x) < (y)) ? (x) : (y))
#define MAX(x, y)  (((x) > (y)) ? (x) : (y))
#define ABSD(x, y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

/**********************************************************************
 Initialize the OldB and NewB arrays for testing purposes
 **********************************************************************/
void init()
{
    int x, y, x1, y1;
    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            OldB[x][y] = x ^ y;
        }
    }
    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            x1 = (x + 3) % NX;
            y1 = (y + 4) % NY;
            NewB[x][y] = OldB[x1][y1];
        }
    }
}

/**********************************************************************
 Check the results against the reference data
 **********************************************************************/
unsigned check()
{
    int bx, by;
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) {
            if (VectX[bx][by] != BaseX[bx][by]) return 0;
            if (VectY[bx][by] != BaseY[bx][by]) return 0;
            if (VectB[bx][by] != BaseB[bx][by]) return 0;
        }
    }
    return 1;
}

/**********************************************************************
 Various implementations of motion estimation
 **********************************************************************/
#include "me_base.c"
#include "me_tie.c"

/**********************************************************************
 Main test program
 **********************************************************************/
int
main(int argc, char **argv)
{
    int passed;
#ifndef NOPRINTF
#endif
    init();
#ifdef OLD
    motion_estimate_base();
    passed = 1;
#elif NEW
    motion_estimate_tie();
    passed = 1;
#else
    motion_estimate_base();
    motion_estimate_tie();
    passed = check();
#endif
#ifndef NOPRINTF
    printf(passed ? "TIE version passed\n" : "** TIE version failed\n");
#endif
    return passed;
}
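The test pattern built by init() makes NewB a copy of OldB displaced by 3 in x and 4 in y (with wrap-around), so a block of NewB has a zero-difference match in OldB at that displacement. The sketch below illustrates this for the block at the image origin; it assumes the default 32 by 32 image, and the function names init_images and block_sad are hypothetical, not part of the listing.

```c
#include <assert.h>

#define NX 32      /* image width, as in the listing's default */
#define NY 32      /* image height */
#define BLOCKX 16  /* block width */
#define BLOCKY 16  /* block height */

static unsigned char OldB[NX][NY];
static unsigned char NewB[NX][NY];

/* Same test pattern as init() in the listing. */
void init_images(void)
{
    int x, y;
    for (x = 0; x < NX; x++)
        for (y = 0; y < NY; y++)
            OldB[x][y] = (unsigned char)(x ^ y);
    for (x = 0; x < NX; x++)
        for (y = 0; y < NY; y++)
            NewB[x][y] = OldB[(x + 3) % NX][(y + 4) % NY];
}

/* Sum of absolute differences between the NewB block (bx, by) and the
 * OldB block whose top-left corner is at (cx, cy). */
unsigned block_sad(int bx, int by, int cx, int cy)
{
    unsigned diff = 0;
    int x, y;

    assert(bx >= 0 && (bx + 1) * BLOCKX <= NX);
    assert(by >= 0 && (by + 1) * BLOCKY <= NY);
    assert(cx >= 0 && cx + BLOCKX <= NX);
    assert(cy >= 0 && cy + BLOCKY <= NY);
    for (x = 0; x < BLOCKX; x++) {
        for (y = 0; y < BLOCKY; y++) {
            unsigned a = OldB[cx + x][cy + y];
            unsigned b = NewB[bx * BLOCKX + x][by * BLOCKY + y];
            diff += (a > b) ? (a - b) : (b - a);
        }
    }
    return diff;
}
```

After init_images(), block_sad(0, 0, 3, 4) is zero because no wrap-around occurs inside that 16 by 16 window, whereas the undisplaced candidate block_sad(0, 0, 0, 0) is nonzero.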

APPENDIX II: ME_BASE.C

/**********************************************************************
 Reference software implementation
 **********************************************************************/
void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;
    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}

APPENDIX III: ME_TIE.C

#include "src.c"
#include "sad.c"

/**********************************************************************
 Fast version of motion estimation that uses the SAD instruction
 **********************************************************************/
void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;
    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                diff0 = diff1 = diff2 = diff3 = 0;
                for (x = 0; x < BLOCKX; x++) {
                    N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                    N1 = N[0];
                    N2 = N[1];
                    N3 = N[2];
                    N4 = N[3];
                    O = (unsigned *) &(OldB[cx+x][cy]);
                    A = O[0];
                    B = O[1];
                    C = O[2];
                    D = O[3];
                    E = O[4];
                    diff0 += SAD(A, N1) + SAD(B, N2) +
                             SAD(C, N3) + SAD(D, N4);
#ifdef BIG_ENDIAN
                    SSAI(24);
                    diff1 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                             SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                    SSAI(16);
                    diff2 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                             SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                    SSAI(8);
                    diff3 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                             SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
#else
                    SSAI(8);
                    diff1 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                             SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                    SSAI(16);
                    diff2 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                             SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                    SSAI(24);
                    diff3 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                             SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
#endif
                    O += NY/4;
                    N += NY/4;
                }
                if (diff0 < best) {
                    best = diff0;
                    bestx = cx;
                    besty = cy;
                }
                if (diff1 < best) {
                    best = diff1;
                    bestx = cx;
                    besty = cy + 1;
                }
                if (diff2 < best) {
                    best = diff2;
                    bestx = cx;
                    besty = cy + 2;
                }
                if (diff3 < best) {
                    best = diff3;
                    bestx = cx;
                    besty = cy + 3;
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}

APPENDIX IV: SAD.C

#if defined(XTENSA)
#include <machine/Customer.h>
#elif defined(TIE)
#include "../dk/me_cstub.c"
#else
/**********************************************************************
 Sum of absolute differences of 4 bytes
 **********************************************************************/
static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}
#endif
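The software fallback for SAD treats each 32-bit operand as four packed 8-bit pixels and sums the per-byte absolute differences. The sketch below restates that computation as a loop over the four byte lanes; the names absd and sad4 are hypothetical, and the fixed-width uint32_t type is an assumption made here for portability.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute difference of two unsigned values without going negative. */
static unsigned absd(unsigned x, unsigned y)
{
    return (x > y) ? (x - y) : (y - x);
}

/* Sum of absolute differences over the four byte lanes of two words. */
unsigned sad4(uint32_t ars, uint32_t art)
{
    unsigned sum = 0;
    int shift;

    assert(sizeof(uint32_t) == 4);
    for (shift = 0; shift < 32; shift += 8)
        sum += absd((ars >> shift) & 255u, (art >> shift) & 255u);
    return sum;
}
```

For the operands 0x01020304 and 0x04030201 the lane differences are 3, 1, 1 and 3, so the result is 8; the largest possible result is 4 * 255 = 1020, which fits comfortably in the unsigned accumulators used by the motion estimation loops.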

APPENDIX V: SRC.C

/**********************************************************************
 If the target code is native code, use a global variable to store the
 shift amount set by SSAI.
 **********************************************************************/

/**********************************************************************
 Direct access to the Shift Right Concatenate Instruction.
 The shift amount register must be loaded separately with SSAI().
 **********************************************************************/
static inline unsigned
SRC(unsigned ars, unsigned art)
{
    unsigned arr;
#ifndef XTENSA
    arr = (ars << (32 - sar)) | (art >> sar);
#else
    asm volatile("src\t%0, %1, %2" : "=a" (arr) : "a" (ars), "a" (art));
#endif
    return arr;
}

/**********************************************************************
 Set the shift amount register
 **********************************************************************/
static inline void
SSAI(int count)
{
#ifndef XTENSA
    sar = count;
#else
    switch (count) {
    case 8:
        asm volatile("ssai\t8");
        break;
    case 16:
        asm volatile("ssai\t16");
        break;
    case 24:
        asm volatile("ssai\t24");
        break;
    default:
        exit(-1);
    }
#endif
}
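The native fallback of SRC is a funnel shift: it concatenates two words and extracts 32 bits starting sar bits up from the bottom of the low word. The sketch below models it with the shift amount passed as a parameter instead of a global; the name src_model is hypothetical, and the restriction to shift amounts of 1 to 31 is an assumption made here to avoid an undefined 32-bit shift in C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the listing's fallback: (ars << (32 - sar)) | (art >> sar).
 * ars supplies the high-order bits of the result, art the low-order bits. */
uint32_t src_model(uint32_t ars, uint32_t art, unsigned sar)
{
    assert(sar >= 1 && sar <= 31);
    return (ars << (32 - sar)) | (art >> sar);
}
```

As a worked case, suppose memory holds the bytes 0x10, 0x11, ..., 0x17 at increasing addresses and is read on a little-endian machine, so A = 0x13121110 and B = 0x17161514. With a shift amount of 8, src_model(B, A, 8) yields 0x14131211, which is the word starting one byte into A. This corresponds to the SSAI(8) followed by SRC(B, A) pairing in the little-endian branch of ME_TIE.C; shift amounts of 16 and 24 give the words starting two and three bytes in.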

APPENDIX VI: SOURCE CODE

/*
Block Motion Estimation:
The purpose of motion estimation is to find the unaligned 8x8 block of
an existing (old) image that most closely resembles an aligned 8x8
block. The search here is at any byte offset in +/- 16 bytes in x and
+/- 16 bytes in y. The search is a set of six nested loops.
OldB is pointer to a byte array of old block
NewB is pointer to a byte array of base block
*/
#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16
unsigned char OldB[NX][NY];
unsigned char NewB[NX][NY];
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];
#define MIN(x,y) ((x<y)?x:y)
#define MAX(x,y) ((x>y)?x:y)
#define ABS(x) (((x)<0)?(-(x)):(x))

/* initialization with reference image data for test purposes */
void init()
{
    int x, y;
    for (x = 0; x < NX; x++) for (y = 0; y < NY; y++) {
        OldB[x][y] = x ^ y;
        NewB[x][y] = x + 2*y + 2;
    }
}

main()
{
    int by, bx, cy, cx, yo, xo;
    unsigned short best, bestx, besty, sumabsdiff0;
    init();
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each 8x8 block in the image */
            best = 0xffff; /* look for the minimum difference */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                 cy < MIN(NY-BLOCKY, (by*BLOCKY) + SEARCHY);
                 cy++) { /* for the old block at each line */
                for (cx = MAX(0, (bx*BLOCKX) - SEARCHX);
                     cx < MIN(NX-BLOCKX, (bx*BLOCKX) + SEARCHX);
                     cx++) {
                    /* test the NxN block at (bx, by) against NxN blocks */
                    /* at (cx, cy) */
                    sumabsdiff0 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of N rows in block */
                        for (xo = 0; xo < BLOCKX; xo++) { /* for each of N pixels in row */
                            sumabsdiff0 +=
                                ABS(OldB[cx+xo][cy+yo] -
                                    NewB[bx*BLOCKX+xo][by*BLOCKY+yo]);
                        }
                    }
                    if (sumabsdiff0 < best) {
                        best = sumabsdiff0; bestx = cx; besty = cy; }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
    }
}
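The inner loop above passes a difference of two pixels to ABS, so the macro argument is an expression rather than a single name. A function-like macro applied this way needs its parameter parenthesized at every use; otherwise the unary minus binds only to the first operand. The sketch below contrasts the two forms. The macro and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Parameter used bare: -x expands to -a - b when x is "a - b". */
#define ABS_UNPARENTHESIZED(x) ((x < 0) ? (-x) : (x))

/* Parameter parenthesized at every use: -(x) expands to -(a - b). */
#define ABS_PARENTHESIZED(x) (((x) < 0) ? (-(x)) : (x))

int abs_diff_unparenthesized(int a, int b)
{
    assert(a >= 0 && b >= 0); /* pixel values are non-negative */
    return ABS_UNPARENTHESIZED(a - b);
}

int abs_diff_parenthesized(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return ABS_PARENTHESIZED(a - b);
}
```

With a = 3 and b = 5 the unparenthesized form evaluates -3 - 5 and returns -8, while the parenthesized form returns 2. The two agree whenever a is at least b, which is why the problem can go unnoticed on some test images.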

APPENDIX VII: OPTIMIZING C CODE WITH TIE

/*
Pixel values are packed 4 per word
OldW is a pointer to a word array of the old block
NewW is a pointer to a word array of the base block
*/
#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16
#define MIN(x,y) ((x<y)?x:y)
#define MAX(x,y) ((x>y)?x:y)
unsigned long OldW[NY][NX/sizeof(long)];
unsigned long NewW[NY][NX/sizeof(long)];
unsigned short VectX[NY/BLOCKY][NX/BLOCKX];
unsigned short VectY[NY/BLOCKY][NX/BLOCKX];

void init()
{
    int x, y;
    for (x = 0; x < NX/sizeof(long); x++) for (y = 0; y < NY; y++) {
        OldW[y][x] = ((x<<2)^y)<<24 | (((x<<2)+1)^y)<<16 | (((x<<2)+2)^y)<<8
                     | ((x<<2)+3)^y;
        NewW[y][x] = ((x<<2)+2*y+2)<<24 | (((x<<2)+1)+2*y+2)<<16 |
                     (((x<<2)+2)+2*y+2)<<8 | ((x<<2)+3)+2*y+2;
    }
}

main()
{
    register int by, bx, cy, cx, yo, xo;
    register unsigned short
        best, bestx, besty, sumabsdiff0, sumabsdiff1, sumabsdiff2, sumabsdiff3;
    init();
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each NxN block in the image */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                for (cx = MAX(0, (bx*BLOCKX - SEARCHX)/sizeof(long));
                     cx < MIN((NX-BLOCKX-2)/sizeof(long), (bx*BLOCKX + SEARCHX)/sizeof(long));
                     cx++) { /* and each word (4 byte) offset in line */
                    /* test the NxN block at (bx, by) against four NxN blocks */
                    /* at (cx, cy), (cx+1B, cy), (cx+2B, cy), (cx+3B, cy) */
                    sumabsdiff0 = sumabsdiff1 = sumabsdiff2 = sumabsdiff3 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of the N lines in the block */
                        for (xo = 0; xo < BLOCKX/8; xo += 2) {
                            register unsigned long *N, N1, N2, *O, A, B, C, W, X;
                            N = &NewW[by+yo][bx*BLOCKX/sizeof(long)+xo];
                            N1 = *N; N2 = *(N+1); /* 2 words of subject image */
                            O = &OldW[cy+yo][cx+xo];
                            A = *O; B = *(O+1); C = *(O+2); /* 3 words of reference */
                            sumabsdiff0 += sad(A, N1) + sad(B, N2);
                            SHIFT(24) /* shift A, B, C left by one byte into W, X */
                            sumabsdiff1 += sad(W, N1) + sad(X, N2);
                            SHIFT(16) /* shift A, B, C left by two bytes into W, X */
                            sumabsdiff2 += sad(W, N1) + sad(X, N2);
                            SHIFT(8) /* shift A, B, C left by three bytes into W, X */
                            sumabsdiff3 += sad(W, N1) + sad(X, N2);
                        }
                    }
                    if (sumabsdiff0 < best) {
                    if (sumabsdiff1 < best) {
                        best = sumabsdiff1; bestx = cx+1; besty = cy; }
                    if (sumabsdiff2 < best) {
                        best = sumabsdiff2; bestx = cx+2; besty = cy; }
                    if (sumabsdiff3 < best) {
                        best = sumabsdiff3; bestx = cx+3; besty = cy; }
                }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
    }
}
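The inner loop above evaluates four candidate positions per iteration: it loads three reference words and two subject words once, then forms the reference windows at byte offsets 0 to 3 by shifting adjacent words together. The sketch below checks that idea against a plain byte-by-byte sum for one row of eight pixels. It assumes pixels are packed with the first pixel in the most significant byte, as the comments in the listing suggest; pack4, window, sad4 and both row functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four pixels into a word, first pixel in the most significant byte. */
static uint32_t pack4(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Sum of absolute differences over the four byte lanes of two words. */
static unsigned sad4(uint32_t a, uint32_t b)
{
    unsigned sum = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        unsigned x = (a >> shift) & 255u, y = (b >> shift) & 255u;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Word that starts k bytes (0..3) into the concatenation hi:lo. */
static uint32_t window(uint32_t hi, uint32_t lo, unsigned k)
{
    assert(k <= 3);
    if (k == 0)
        return hi; /* avoids an undefined shift by 32 bits */
    return (hi << (8 * k)) | (lo >> (32 - 8 * k));
}

/* SAD of eight new pixels against old pixels at byte offset k, using
 * three reference words and two subject words. old_row needs 12 pixels. */
unsigned sad_row8_packed(const unsigned char *old_row,
                         const unsigned char *new_row, unsigned k)
{
    uint32_t A = pack4(old_row), B = pack4(old_row + 4), C = pack4(old_row + 8);
    uint32_t N1 = pack4(new_row), N2 = pack4(new_row + 4);
    return sad4(window(A, B, k), N1) + sad4(window(B, C, k), N2);
}

/* Reference: the same sum computed one pixel at a time. */
unsigned sad_row8_bytewise(const unsigned char *old_row,
                           const unsigned char *new_row, unsigned k)
{
    unsigned i, sum = 0;
    for (i = 0; i < 8; i++) {
        unsigned a = old_row[k + i], b = new_row[i];
        sum += (a > b) ? (a - b) : (b - a);
    }
    return sum;
}
```

For any row contents, sad_row8_packed and sad_row8_bytewise should agree for k = 0, 1, 2 and 3. The packed form touches memory five times per row segment instead of once per pixel per candidate, which is the saving the listing is after.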

附件DAnnex D

/＊/*

＊ TIE to Verilog translation routines＊ TIE to Verilog translation routines

＊/*/

/＊ SId：tie2ver_write.c，v 1.27 1999/05/11 00:10:18 awang Exp S＊//* SId: tie2ver_write.c, v 1.27 1999/05/11 00:10:18 awang Exp S */

/＊/*

＊ These coded instructions，statements，and computer programs are* These coded instructions, statements, and computer programs are

＊Confidential Proprietary Information of Tensilica Inc.and may not＊Confidential Proprietary Information of Tensilica Inc. and may not

bebe

＊disclosed to third parties or copied in any form，in whole or in＊disclosed to third parties or copied in any form, in whole or in

part，part,

＊without the prior written consent of Tensilica Inc.＊without the prior written consent of Tensilica Inc.

＊/*/

#include <math.h>#include <math.h>

#include″tie.h″#include "tie.h"

#include″st.h″#include "st.h"

#define COMMENTS″//Do not modify this automatically generated file.″#define COMMENTS″//Do not modify this automatically generated file.”

static void tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os);

#define tie2ver_program_foreach_instruction(_prog, _inst) { \
    tie_t *_iclass; \
    tie_program_foreach_iclass(_prog, _iclass) { \
        if (tie_get_predefined(_iclass)) continue; \
        tie_iclass_foreach_instruction(_iclass, _inst) {

#define end_tie2ver_program_foreach_instruction \
        } end_tie_iclass_foreach_instruction; \
    } end_tie_program_foreach_iclass; \
}

#define TIE_ENFLOP "\n\
module tie_enflop(tie_out, tie_in, en, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input en;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
    if (en)\n\
        tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_FLOP "\n\
module tie_flop(tie_out, tie_in, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
    tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_ATHENS_STATE "\n\
module tie_athens_state(ns, we, ke, kp, vw, clk, ps);\n\
parameter size = 32;\n\
input [size-1:0] ns;   // next state\n\
input we;              // write enable\n\
input ke;              // Kill E state\n\
input kp;              // Kill Pipeline\n\
input vw;              // Valid W state\n\
input clk;             // clock\n\
output [size-1:0] ps;  // present state\n\
\n\
wire [size-1:0] se;    // state at E stage\n\
wire [size-1:0] sm;    // state at M stage\n\
wire [size-1:0] sw;    // state at W stage\n\
wire [size-1:0] sx;    // state at X stage\n\
wire ee;               // write enable for EM register\n\
wire ew;               // write enable for WX register\n\
\n\
assign se = kp ? sx : ns;\n\
assign ee = kp | we & ~ke;\n\
assign ew = vw & ~kp;\n\
assign ps = sm;\n\
\n\
tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));\n\
tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));\n\
tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));\n\
\n\
endmodule\n"
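As an illustrative aid only, and not part of the original listing, the cycle-level behavior of the tie_athens_state module above can be modeled in C roughly as follows. The names athens_state_model and athens_state_step are hypothetical; the sketch assumes a 32-bit state, control inputs restricted to 0 or 1, and it ignores the #1 delays of the flops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the three registers in tie_athens_state. */
typedef struct {
    uint32_t sm;  /* EM register: state at the M stage, driven onto ps */
    uint32_t sw;  /* MW register: state at the W stage */
    uint32_t sx;  /* WX register: committed state */
} athens_state_model;

/* Advance the model by one rising clock edge and return the new ps.
   All control inputs are assumed to be 0 or 1. */
static uint32_t athens_state_step(athens_state_model *s, uint32_t ns,
                                  int we, int ke, int kp, int vw)
{
    uint32_t se;
    int ee, ew;
    athens_state_model next;

    assert(s != 0);
    se = kp ? s->sx : ns;        /* assign se = kp ? sx : ns;   */
    ee = kp || (we && !ke);      /* assign ee = kp | we & ~ke;  */
    ew = vw && !kp;              /* assign ew = vw & ~kp;       */

    next = *s;                   /* all flops sample the old values */
    if (ee) next.sm = se;        /* tie_enflop state_EM */
    next.sw = s->sm;             /* tie_flop   state_MW */
    if (ew) next.sx = s->sw;     /* tie_enflop state_WX */
    *s = next;
    return s->sm;                /* assign ps = sm; */
}
```

In this model a write becomes visible on ps one clock later, reaches the committed copy sx only when vw is asserted without kp, and a pipeline kill (kp) reloads the EM register from sx, which discards any uncommitted value.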

/**************
 Build and return the global program -> operand table for the operands of
 user-defined instructions. The returned table does not contain operands
 used by predefined instructions.
******/
static st_table *
tie2ver_program_get_operand_table(tie_t *prog)
{
    static st_table *tie2ver_program_args = 0;
    tie_t *inst;
    char *key, *value;
    st_table *operand_table;
    st_generator *gen;

    if (tie2ver_program_args == 0) {
        tie2ver_program_args = st_init_table(strcmp, st_strhash);
        tie2ver_program_foreach_instruction(prog, inst) {
            operand_table = tie_instruction_get_operand_table(inst);
            st_foreach_item(operand_table, gen, &key, &value) {
                st_insert(tie2ver_program_args, key, value);
            }
        } end_tie2ver_program_foreach_instruction;
    }
    return tie2ver_program_args;
}

/**************
 Print a wire statement
******/
static void
tie2ver_write_wire(FILE *fp, tie_t *wire)
{
    int from, to, write_comma;
    tie_t *first, *second, *var;

    first = tie_get_first_child(wire);
    ASSERT(tie_get_type(first) == TIE_INT);
    from = tie_get_integer(first);
    second = tie_get_next_sibling(first);
    ASSERT(tie_get_type(second) == TIE_INT);
    to = tie_get_integer(second);
    fprintf(fp, "wire ");
    if (!(from == 0 && to == 0)) {
        fprintf(fp, "[%d:%d] ", from, to);
    }
    write_comma = 0;
    var = tie_get_next_sibling(second);
    while (var != 0) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        fprintf(fp, "%s", tie_get_identifier(var));
        var = tree_get_next_sibling(var);
    }
    fprintf(fp, ";\n");
}

/**************
 Print a unary expression
******/
static void
tie2ver_write_unary(
    FILE *fp, const char *op, tie_t *exp, int lhs, st_table *is,
    st_table *os)
{
    fprintf(fp, "%s(", op);
    tie2ver_write_expression(fp, exp, lhs, is, os);
    fprintf(fp, ")");
}

/**************
 Print a binary expression
******/
static void
tie2ver_write_binary(
    FILE *fp, const char *op, tie_t *exp1, tree_t *exp2,
    int lhs, st_table *is, st_table *os)
{
    fprintf(fp, "(");
    tie2ver_write_expression(fp, exp1, lhs, is, os);
    fprintf(fp, ")%s(", op);
    tie2ver_write_expression(fp, exp2, lhs, is, os);
    fprintf(fp, ")");
}

/**************
 Print an identifier
******/
static void
tie2ver_write_identifier(
    FILE *fp, tie_t *id, int lhs, st_table *is, st_table *os)
{
    tie_t *prog, *first, *second;
    char *name, *dummy;

    name = tie_get_identifier(id);
    if ((is != 0) && st_lookup(is, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else if ((os != 0) && st_lookup(os, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else {
        fprintf(fp, "%s", name);
    }
    first = tie_get_first_child(id);
    if (first == 0) {
        return;
    }
    /* detect whether this is a table access */
    prog = tie_get_program(id);
    if (tie_program_get_table_by_name(prog, name) != 0) {
        switch (tie_get_type(first)) {
        case TIE_ID:
            fprintf(fp, "(%s)", tie_get_identifier(first));
            break;
        case TIE_INT:
            fprintf(fp, "(%d)", tie_get_integer(first));
            break;
        default:
            DIE("Error: expected type\n");
        }
        return;
    }
    second = tie_get_next_sibling(first);
    if (second == 0) {
        fprintf(fp, "[%d]", tie_get_integer(first));
        return;
    }
    fprintf(fp, "[%d:%d]", tie_get_integer(first),
            tie_get_integer(second));
}

/**************
 Print a concatenation expression
******/
static void
tie2ver_write_concatenation(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *comp;
    int write_comma;

    write_comma = 0;
    fprintf(fp, "{");
    tie_foreach_child(exp, comp) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        tie2ver_write_expression(fp, comp, lhs, is, os);
    } end_tie_foreach_child;
    fprintf(fp, "}");
}

/**************
 Print a conditional expression
******/
static void
tie2ver_write_conditional(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *cond_exp, *then_exp, *else_exp;

    cond_exp = tie_get_first_child(exp);
    then_exp = tie_get_next_sibling(cond_exp);
    else_exp = tie_get_next_sibling(then_exp);
    ASSERT(tie_get_last_child(exp) == else_exp);
    fprintf(fp, "(");
    tie2ver_write_expression(fp, cond_exp, lhs, is, os);
    fprintf(fp, ") ? (");
    tie2ver_write_expression(fp, then_exp, lhs, is, os);
    fprintf(fp, ") : (");
    tie2ver_write_expression(fp, else_exp, lhs, is, os);
    fprintf(fp, ")");
}

/**************
 Print a replication expression
******/
static void
tie2ver_write_replication(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *num, *comp;

    num = tie_get_first_child(exp);
    comp = tie_get_next_sibling(num);
    ASSERT(tie_get_last_child(exp) == comp);
    ASSERT(tie_get_type(num) == TIE_INT);
    fprintf(fp, "{%d{", tie_get_integer(num));
    tie2ver_write_expression(fp, comp, lhs, is, os);
    fprintf(fp, "}}");
}

/**************
 Print an expression
******/
static void
tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_type_t type;
    tie_t *first, *second;

    first = tie_get_first_child(exp);
    second = first == 0 ? 0 : tie_get_next_sibling(first);
    switch (type = tie_get_type(exp)) {
    case TIE_ID:
        tie2ver_write_identifier(fp, exp, lhs, is, os);
        break;
    case TIE_INT:
        fprintf(fp, "%d", tie_get_integer(exp)); break;
    case TIE_CONST:
        fprintf(fp, "%s", tie_get_constant(exp)); break;
    case TIE_LOGICAL_NEGATION:
        tie2ver_write_unary(fp, "!", first, lhs, is, os); break;
    case TIE_LOGICAL_AND:
        tie2ver_write_binary(fp, "&&", first, second, lhs, is, os);
        break;
    case TIE_LOGICAL_OR:
        tie2ver_write_binary(fp, "||", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_NEGATION:
        tie2ver_write_unary(fp, "~", first, lhs, is, os); break;
    case TIE_BITWISE_AND:
        tie2ver_write_binary(fp, "&", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_OR:
        tie2ver_write_binary(fp, "|", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XOR:
        tie2ver_write_binary(fp, "^", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XNOR:
        tie2ver_write_binary(fp, "~^", first, second, lhs, is, os);
        break;
    case TIE_ADD:
        tie2ver_write_binary(fp, "+", first, second, lhs, is, os);
        break;
    case TIE_SUB:
        tie2ver_write_binary(fp, "-", first, second, lhs, is, os);
        break;
    case TIE_MULT:
        tie2ver_write_binary(fp, "*", first, second, lhs, is, os);
        break;
    case TIE_GT:
        tie2ver_write_binary(fp, ">", first, second, lhs, is, os);
        break;
    case TIE_GEQ:
        tie2ver_write_binary(fp, ">=", first, second, lhs, is, os);
        break;
    case TIE_LT:
        tie2ver_write_binary(fp, "<", first, second, lhs, is, os);
        break;
    case TIE_LEQ:
        tie2ver_write_binary(fp, "<=", first, second, lhs, is, os);
        break;
    case TIE_EQ:
        tie2ver_write_binary(fp, "==", first, second, lhs, is, os);
        break;
    case TIE_NEQ:
        tie2ver_write_binary(fp, "!=", first, second, lhs, is, os);
        break;
    case TIE_REDUCTION_AND:
        tie2ver_write_unary(fp, "&", first, lhs, is, os); break;
    case TIE_REDUCTION_OR:
        tie2ver_write_unary(fp, "|", first, lhs, is, os); break;
    case TIE_REDUCTION_XOR:
        tie2ver_write_unary(fp, "^", first, lhs, is, os); break;
    case TIE_SHIFT_LEFT:
        tie2ver_write_binary(fp, "<<", first, second, lhs, is, os);
        break;
    case TIE_SHIFT_RIGHT:
        tie2ver_write_binary(fp, ">>", first, second, lhs, is, os);
        break;
    case TIE_REPLICATION:
        tie2ver_write_replication(fp, exp, lhs, is, os);
        break;
    case TIE_CONCATENATION:
        tie2ver_write_concatenation(fp, exp, lhs, is, os);
        break;
    case TIE_CONDITIONAL:
        tie2ver_write_conditional(fp, exp, lhs, is, os);
        break;
    default:
        fprintf(stderr, "Wrong type: %d\n", type);
        DIE("Error: wrong expression type\n");
    }
}
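The expression writer above recurses over the parse tree and wraps every operand in parentheses, so the emitted Verilog never depends on operator precedence. A minimal self-contained sketch of that idea follows. It does not use the tie_t API of the listing: the types ex and sbuf and the functions sb_puts and ex_write are hypothetical stand-ins, and the output goes to a fixed-size buffer (silently truncated when full) rather than to a FILE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature expression tree (not the tie_t of the listing). */
typedef enum { EX_ID, EX_INT, EX_UNARY, EX_BINARY } ex_kind;

typedef struct ex {
    ex_kind kind;
    const char *text;        /* identifier name or operator spelling */
    int value;               /* used by EX_INT only */
    const struct ex *a, *b;  /* operands (b is unused for EX_UNARY) */
} ex;

/* Fixed-size output buffer; text that does not fit is dropped. */
typedef struct { char data[256]; size_t len; } sbuf;

static void sb_puts(sbuf *sb, const char *s)
{
    size_t n = strlen(s);
    size_t room = sizeof sb->data - 1 - sb->len;
    if (n > room) n = room;
    memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
}

/* Append the text of e, parenthesizing every operand. */
static void ex_write(sbuf *sb, const ex *e)
{
    assert(sb != 0 && e != 0);
    switch (e->kind) {
    case EX_ID:
        sb_puts(sb, e->text);
        break;
    case EX_INT: {
        char tmp[16];
        snprintf(tmp, sizeof tmp, "%d", e->value);
        sb_puts(sb, tmp);
        break;
    }
    case EX_UNARY:               /* op(operand) */
        sb_puts(sb, e->text);
        sb_puts(sb, "(");
        ex_write(sb, e->a);
        sb_puts(sb, ")");
        break;
    case EX_BINARY:              /* (left)op(right) */
        sb_puts(sb, "(");
        ex_write(sb, e->a);
        sb_puts(sb, ")");
        sb_puts(sb, e->text);
        sb_puts(sb, "(");
        ex_write(sb, e->b);
        sb_puts(sb, ")");
        break;
    }
}
```

The full parenthesization makes the output verbose, but it keeps the generator free of any precedence table, which is a reasonable trade-off for machine-generated Verilog.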

/**************
 Print an assignment statement
******/
static void
tie2ver_write_assignment(
    FILE *fp, tie_t *assign, st_table *in_states, st_table *out_states)
{
    tie_t *lval, *rval;

    ASSERT(tie_get_type(assign) == TIE_ASSIGNMENT);
    lval = tie_get_first_child(assign);
    rval = tie_get_last_child(assign);
    ASSERT(tie_get_next_sibling(lval) == rval);
    ASSERT(tie_get_prev_sibling(rval) == lval);
    fprintf(fp, "assign ");
    tie2ver_write_expression(fp, lval, 1, in_states, out_states);
    fprintf(fp, " = ");
    tie2ver_write_expression(fp, rval, 0, in_states, out_states);
    fprintf(fp, ";\n");
}

/**************
 Print a statement list
******/
static void
tie2ver_write_statement(
    FILE *fp, tie_t *statement, st_table *in_states, st_table *out_states)
{
    tie_t *child;

    ASSERT(tie_get_type(statement) == TIE_STATEMENT);
    tie_foreach_child(statement, child) {
        switch (tie_get_type(child)) {
        case TIE_WIRE:
            tie2ver_write_wire(fp, child);
            break;
        case TIE_ASSIGNMENT:
            tie2ver_write_assignment(fp, child, in_states, out_states);
            break;
        default:
            DIE("Error: illegal program statement\n");
        }
    } end_tie_foreach_child;
}

/**************
 Write a module definition for an "iclass"
******/
static void
tie2ver_write_module_declaration(FILE *fp, tie_t *semantic)
{
    st_table *operand_table, *state_table;
    st_generator *gen;
    tie_t *ilist, *inst;
    char *c, *key, *value;

    fprintf(fp, "\n");
    fprintf(fp, "module %s(", tie_semantic_get_name(semantic));
    c = "";
    operand_table = tie_semantic_get_operand_table(semantic);
    st_foreach_item(operand_table, gen, &key, &value) {
        fprintf(fp, "%s%s", c, key);
        c = ",";
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ps", c, key);
        c = ",";
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ns", c, key);
        fprintf(fp, "%s%s_we", c, key);
        c = ",";
    }
    ilist = tie_semantic_get_inst_list(semantic);
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, ", %s", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
    fprintf(fp, ");\n");

    st_foreach_item(operand_table, gen, &key, &value) {
        switch ((tie_type_t) value) {
        case TIE_ARG_IN:
            fprintf(fp, "input [31:0] %s;\n", key); break;
        case TIE_ARG_OUT:
            fprintf(fp, "output [31:0] %s;\n", key); break;
        case TIE_ARG_INOUT:
            fprintf(fp, "inout [31:0] %s;\n", key); break;
        default:
            DIE("Error: unexpected arg type\n");
        }
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "input [%d:0] %s_ps;\n", (int) value - 1, key);
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "output [%d:0] %s_ns;\n", (int) value - 1, key);
        fprintf(fp, "output %s_we;\n", key);
    }
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, "input %s;\n", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
}

/**************
 Print a "table" to a TIE file
******/
static void
tie2ver_write_table(FILE *fp, tie_t *table)
{
    int i, width, size, bits, ivalue;
    char *oname, *iname, *cvalue;
    tie_t *value;

    oname = tie_table_get_name(table);
    iname = "index";
    width = tie_table_get_width(table);
    size = tie_table_get_depth(table);
    bits = (int) ceil(log(size) / log(2));
    fprintf(fp, "\nfunction [%d:0] %s;\n", width - 1, oname);
    fprintf(fp, "input [%d:0] %s;\n", bits - 1, iname);
    fprintf(fp, "case (%s)\n", iname);
    i = 0;
    tie_table_foreach_value(table, value) {
        fprintf(fp, "%d'd%d: %s = ", bits, i, oname);
        switch (tie_get_type(value)) {
        case TIE_CONST:
            cvalue = tie_get_constant(value);
            fprintf(fp, "%d'b%s;\n", width,
                    tie_constant_get_binary_string(cvalue));
            break;
        case TIE_INT:
            ivalue = tie_get_integer(value);
            fprintf(fp, "%d'd%d;\n", width, ivalue);
            break;
        default:
            DIE("Internal Error: unexpected type\n");
        }
        i++;
    } end_tie_table_foreach_value;
    fprintf(fp, "default: %s = %d'd0;\n", oname, width);
    fprintf(fp, "endcase\n");
    fprintf(fp, "endfunction\n");
}
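The table writer above derives the width of the index input from the table depth as ceil(log2(size)), computed in floating point. As a hedged alternative sketch, the same quantity can be obtained with integer arithmetic only, which avoids any dependence on how log() rounds. The function name index_bits is hypothetical and does not appear in the listing.

```c
#include <assert.h>

/* Number of index bits needed to address `size` table entries: the
   smallest b such that (1 << b) >= size. This is an integer-only
   equivalent of (int) ceil(log(size) / log(2)) for size >= 1. */
static int index_bits(unsigned size)
{
    int bits = 0;

    assert(size > 0);
    /* 64-bit shift so the comparison is well defined for any unsigned size. */
    while ((1ull << bits) < size) {
        bits++;
    }
    return bits;
}
```

Note that a table with a single entry yields zero index bits under either formulation, in which case the emitted range `[bits-1:0]` would be `[-1:0]`; a generator built on this sketch would presumably need to treat that case separately.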

/**************
 Write the write-enable logic for each state modified by a "semantic"
 statement
******/
static void
tie2ver_semantic_write_we(FILE *fp, tie_t *semantic)
{
    tie_t *inst;
    st_table *semantic_state_table, *inst_state_table;
    st_generator *gen;
    char *key, *value, *c, *iname;
    int found;

    semantic_state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(semantic_state_table, gen, &key, &value) {
        fprintf(fp, "assign %s_we = ", key);
        c = "";
        tie_semantic_foreach_instruction(semantic, inst) {
            iname = tie_instruction_get_name(inst);
            inst_state_table = tie_instruction_get_state_table(inst);
            found = st_lookup(inst_state_table, key, &value);
            if (found && ((tie_type_t) value != TIE_ARG_IN)) {
                fprintf(fp, "%s1'b1 & %s", c, iname);
            } else {
                fprintf(fp, "%s1'b0 & %s", c, iname);
            }
            c = "\n| ";
        } end_tie_semantic_foreach_instruction;
        fprintf(fp, ";\n");
    }
}
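The write-enable of a state is thus the OR, over all instructions of the semantic block, of each instruction's decode signal ANDed with a constant that records whether that instruction writes the state. The following sketch shows the shape of the generated text for one state. It is illustrative only: inst_info and format_we are hypothetical names, the output goes to a caller-supplied buffer instead of a FILE, and the instruction and state names used with it are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: an instruction decode signal and whether that
   instruction writes the state in question. */
typedef struct {
    const char *name;
    int writes_state;
} inst_info;

/* Format "assign <state>_we = 1'bX & I0\n| 1'bY & I1 ...;\n" into buf.
   Returns 0 on success, -1 if buf is too small. */
static int format_we(char *buf, size_t cap, const char *state,
                     const inst_info *insts, size_t n)
{
    size_t used;
    int k;

    assert(buf != 0 && strlen(state) > 0);
    k = snprintf(buf, cap, "assign %s_we = ", state);
    if (k < 0 || (size_t) k >= cap) return -1;
    used = (size_t) k;
    for (size_t i = 0; i < n; i++) {
        k = snprintf(buf + used, cap - used, "%s1'b%d & %s",
                     i ? "\n| " : "",
                     insts[i].writes_state ? 1 : 0, insts[i].name);
        if (k < 0 || (size_t) k >= cap - used) return -1;
        used += (size_t) k;
    }
    k = snprintf(buf + used, cap - used, ";\n");
    if (k < 0 || (size_t) k >= cap - used) return -1;
    return 0;
}
```

For a state named acc with instructions ADD8 (writes acc) and LD8 (does not), the sketch produces `assign acc_we = 1'b1 & ADD8` followed on the next line by `| 1'b0 & LD8;`. The constant-zero terms are redundant, but emitting one term per instruction keeps the generator uniform and leaves the simplification to logic synthesis.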

/**************
 Write the "semantic" statement to a TIE file
******/
static void
tie2ver_write_semantic(FILE *fp, tie_t *semantic)
{
    tie_t *table, *statement;
    ls_t *tables;
    st_table *in_state_table, *out_state_table;

    ASSERT(tie_get_type(semantic) == TIE_SEMANTIC);
    tie2ver_write_module_declaration(fp, semantic);
    statement = tie_semantic_get_statement(semantic);
    in_state_table = tie_semantic_get_in_state_table(semantic);
    out_state_table = tie_semantic_get_out_state_table(semantic);
    tie2ver_write_statement(fp, statement, in_state_table,
                            out_state_table);
    tables = tie_expression_get_tables(statement,
                                       tie_get_program(semantic));
    ls_foreach_data(tie_t *, tables, table) {
        tie2ver_write_table(fp, table);
    } end_ls_foreach_data;
    ls_free(tables);
    tie2ver_semantic_write_we(fp, semantic);
    fprintf(fp, "endmodule\n");
}

/**************
 Print the top-level module declaration for the combined semantics
******/
static void
tie2ver_write_top_module(FILE *fp, tie_t *prog)
{
    st_generator *gen;
    char *key, *value;
    st_table *operand_table;
    tie_t *inst, *iclass;

    fprintf(fp, "\n");
    fprintf(fp, "module UserInstModule(clk, out_E, ars_E, art_E, inst_R");
    fprintf(fp, ", Kill_E, killPipe_W, valid_W");
    tie_program_foreach_iclass(prog, iclass) {
        if (tie_get_predefined(iclass)) continue;
        tie_iclass_foreach_instruction(iclass, inst) {
            fprintf(fp, ", %s_R", tie_instruction_get_name(inst));
        } end_tie_iclass_foreach_instruction;
    } end_tie_program_foreach_iclass;
    fprintf(fp, ", en_R);\n");
    fprintf(fp, "input clk;\n");
    fprintf(fp, "output [31:0] out_E;\n");
    fprintf(fp, "input [31:0] ars_E;\n");
    fprintf(fp, "input [31:0] art_E;\n");
    fprintf(fp, "input [23:0] inst_R;\n");
    fprintf(fp, "input en_R;\n");
    fprintf(fp, "input Kill_E, killPipe_W, valid_W;\n");
    tie2ver_program_foreach_instruction(prog, inst) {
        fprintf(fp, "input %s_R;\n", tie_instruction_get_name(inst));
    } end_tie2ver_program_foreach_instruction;
    tie2ver_program_foreach_instruction(prog, inst) {
        fprintf(fp, "wire %s_E;\n", tie_instruction_get_name(inst));
    } end_tie2ver_program_foreach_instruction;
    operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(operand_table, gen, &key, &value) {
        if ((tie_type_t) value != TIE_ARG_IN) {
            fprintf(fp, "wire [31:0] %s_E;\n", key);
        }
    }
}

/**************
 Write a wire declaration for every output of every semantic block and
 for every select signal
******/
static void
tie2ver_write_wire_declaration(FILE *fp, tie_t *prog)
{
    tie_t *semantic, *state;
    st_table *operand_table, *global_operand_table;
    st_table *state_table;
    st_generator *gen;
    char *key, *value, *sname;
    int width;

    global_operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(global_operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                fprintf(fp, "wire [31:0] %s_R, %s_E;\n", key, key);
            }
        }
    }
    tie_program_foreach_state(prog, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        width = tie_state_get_width(state);
        fprintf(fp, "wire [%d:0] %s_ps, %s_ns;\n", width - 1, sname,
                sname);
        fprintf(fp, "wire %s_we;\n", sname);
    } end_tie_program_foreach_state;
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        operand_table = tie_semantic_get_operand_table(semantic);
        st_foreach_item(operand_table, gen, &key, &value) {
            if ((tie_type_t) value != TIE_ARG_IN) {
                fprintf(fp, "wire [31:0] %s_%s;\n", sname, key);
            }
        }
        state_table = tie_semantic_get_out_state_table(semantic);
        st_foreach_item(state_table, gen, &key, &value) {
            fprintf(fp, "wire [%d:0] %s_%s_ns;\n", (int) value - 1,
                    sname, key);
            fprintf(fp, "wire %s_%s_we;\n", sname, key);
        }
        fprintf(fp, "wire %s_select;\n", sname);
    } end_tie_program_foreach_semantic;
}

/**************
 Write a flop (flip-flop) instantiation statement
******/
static void
tie2ver_write_flop_instance(FILE *fp, char *name, int num)
{
    char *fmt;

    fmt = "tie_flop #(%d) f%s(.tie_out(%s_E), .tie_in(%s_R), .clk(clk));\n";
    fprintf(fp, fmt, num, name, name, name);
}

/**************
 Latch all instruction signals for the R stage
******/
static void
tie2ver_write_flop(FILE *fp, tie_t *prog)
{
    char *name;
    tie_t *inst;

    tie2ver_program_foreach_instruction(prog, inst) {
        name = tie_instruction_get_name(inst);
        tie2ver_write_flop_instance(fp, name, 1);
    } end_tie2ver_program_foreach_instruction;
}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为每一个语义块编写一个实例 Write an instance for each semantic block

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_semantic_instance(FILE*fp，tie_t＊prog)tie2ver_write_semantic_instance(FILE*fp, tie_t*prog)

{{

tie_t＊semantic，＊ilist，＊inst； tie_t *semantic, *ilist, *inst;

const char＊iname，＊aname，＊c； const char *iname, *aname, *c;

st_generator＊gen； st_generator * gen;

char＊key，＊value； char*key,*value;

iname＝tie_semantic_get_name(semantic)； iname = tie_semantic_get_name(semantic);

fprintf(fp，″％s i％s(″，iname，iname)； fprintf(fp, "%s i%s(", iname, iname);

c＝″″； c = "";

if((tie_type_t)value＝＝TIE_ARG_IN){ if((tie_type_t)value==TIE_ARG_IN){

fprintf(fp，″％s\n.％s(％s_E)″，c，key，key)； fprintf(fp, "%s\n.%s(%s_E)", c, key, key);

}else{ }else{

fprintf(fp，″％s\n.％s(％s_％s)″，c，key，iname，key)； fprintf(fp, "%s\n.%s(%s_%s)", c, key, iname, key);

} }

c＝″，″； c = ",";

} }

fprintf(fp，″％s\n.％s_ps(％s_ps)″，c，key，key)； fprintf(fp, "%s\n.%s_ps(%s_ps)", c, key, key);

c＝″，″； c = ",";

} }

fprintf(fp，″％s\n.％s_ns(％s_％s_ns)″，c，key，iname，key)； fprintf(fp, "%s\n.%s_ns(%s_%s_ns)", c, key, iname, key);

fprintf(fp, "%s\n.%s_we(%s_%s_we)", c, key, iname, key);

c＝″，″； c = ",";

} }

aname＝tie_instruction_get_name(inst)； aname = tie_instruction_get_name(inst);

fprintf(fp，″，\n .％s(％s_E)″，aname，aname)； fprintf(fp, ",\n.%s(%s_E)", aname, aname);

fprintf(fp，″)；\n″)； fprintf(fp, ");\n");

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为每一种状态编写一个实例 Write an instance for each state

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_state_instance(FILE＊fp，tie_t＊prog)tie2ver_write_state_instance(FILE*fp, tie_t*prog)

{{

tie_t＊state； tie_t *state;

char＊sname； char*sname;

int width； int width;

if(tie_get_predefined(state))continue； if(tie_get_predefined(state))continue;

sname＝tie_state_get_name(state)； sname = tie_state_get_name(state);

width＝tie_state_get_width(state)； width = tie_state_get_width(state);

fprintf(fp，″tie_athens_state #(％d)i％s(\n″，width，sname)； fprintf(fp, "tie_athens_state #(%d)i%s(\n", width, sname);

fprintf(fp，″.ns(％s_ns)，\n″，sname)； fprintf(fp, ".ns(%s_ns), \n", sname);

fprintf(fp，″.we(％s_we)，\n″，sname)； fprintf(fp, ".we(%s_we),\n", sname);

fprintf(fp，″.ke(Kill_E)，\n″)； fprintf(fp, ".ke(Kill_E),\n");

fprintf(fp，″.kp(killPipe_W)，\n″)； fprintf(fp, ".kp(killPipe_W),\n");

fprintf(fp，″.vw(valid_W)，\n″)； fprintf(fp, ".vw(valid_W),\n");

fprintf(fp，″.clk(clk)，\n″)； fprintf(fp, ".clk(clk),\n");

fprintf(fp，″.ps(％s_ps))；\n″，sname)； fprintf(fp, ".ps(%s_ps));\n", sname);

}end_tie_program_foreach_state； } end_tie_program_foreach_state;

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为一个输出编写操作数选择逻辑 Write operand selection logic for an output

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_operand_selection_logic_one(FILE＊fp，tie_t＊prog，chartie2ver_write_operand_selection_logic_one(FILE*fp, tie_t*prog, char

＊name)*name)

{{

tie_t＊semantic； tie_t * semantic;

char＊c，＊dummy； char *c, *dummy;

st_table＊operand_table； st_table * operand_table;

fprintf(fp，″assign％s_E＝″，name)； fprintf(fp, "assign %s_E=", name);

c＝″″； c = "";

fprintf(fp，″％s″，c)； fprintf(fp, "%s", c);

if(st_lookup(operand_table，name，&dummy)){ if(st_lookup(operand_table, name, &dummy)){

fprintf(fp，″％s_″，tie_semantic_get_name(semantic))； fprintf(fp, "%s_", tie_semantic_get_name(semantic));

fprintf(fp，″％s &″，name)； fprintf(fp, "%s &", name);

}else{ }else{

fprintf(fp，″{32{1′b0}}&″)； fprintf(fp, "{32{1'b0}}&");

} }

fprintf(fp，″{32{％s_select}}″，tie_semantic_get_name(semantic))； fprintf(fp, "{32{%s_select}}", tie_semantic_get_name(semantic));

c＝″\n|″； c = "\n|";

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

}}

为一种状态编写状态选择逻辑 Write state selection logic for a state

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_state_selection_logic_one(tie2ver_write_state_selection_logic_one(

FILE＊fp，tie_t＊prog，char＊name，int width)FILE*fp, tie_t*prog, char*name, int width)

{{

tie_t＊semantic； tie_t * semantic;

char＊c，＊value，＊sname； char *c, *value, *sname;

st_table＊state_table； st_table * state_table;

fprintf(fp，″assign ％s_ns＝″，name)； fprintf(fp, "assign %s_ns=", name);

c＝″″； c = "";

fprintf(fp，″％s″，c)； fprintf(fp, "%s", c);

if(st_lookup(state_table，name，&value)){ if(st_lookup(state_table, name, &value)){

fprintf(fp，″％s_％s_ns &″，sname，name)； fprintf(fp, "%s_%s_ns &", sname, name);

}else{ }else{

fprintf(fp，″{％d{1′b0}}&″，width)； fprintf(fp, "{%d{1'b0}}&", width);

} }

fprintf(fp，″{％d{％s_select}}″，width，sname)； fprintf(fp, "{%d{%s_select}}", width, sname);

c＝″\n|″； c = "\n|";

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

fprintf(fp，″assign ％s_we＝″，name)； fprintf(fp, "assign %s_we = ", name);

c＝″″； c = "";

fprintf(fp，″％s″，c)； fprintf(fp, "%s", c);

fprintf(fp，″％s_％s_we &″，sname，name)； fprintf(fp, "%s_%s_we &", sname, name);

}else{ }else{

fprintf(fp，″1′b0 &″)； fprintf(fp, "1'b0 &");

} }

fprintf(fp，″％s_select″，sname)； fprintf(fp, "%s_select", sname);

c＝″\n|″； c = "\n|";

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

}}
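The selection logic written by tie2ver_write_state_selection_logic_one is an AND-OR multiplexer: each semantic block contributes its next-state value (or zeros if it does not write the state), masked by that block's replicated select signal, and the terms are ORed together. The sketch below reproduces that text generation under simplifying assumptions: `struct sem_block`, `append`, and `state_ns_select_text` are hypothetical names, and a flag replaces the `st_lookup` call on the block's state table.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a semantic block: its name and whether it
 * writes the state in question (the listing uses st_lookup for this). */
struct sem_block { const char *name; int writes_state; };

/* Append formatted text to buf, truncating silently if the buffer fills. */
static void append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*len + 1 >= cap) return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0) return;
    *len += ((size_t)n < cap - *len) ? (size_t)n : cap - *len - 1;
}

/* Build "assign <state>_ns = ..." as an OR of per-block terms, each term
 * ANDed with that block's select signal replicated to the state width. */
static const char *state_ns_select_text(const char *state, int width,
                                        const struct sem_block *blocks, int n)
{
    static char buf[1024];
    size_t len = 0;
    const char *sep = "";
    int i;

    assert(state != NULL && blocks != NULL && width > 0);
    buf[0] = '\0';
    append(buf, sizeof buf, &len, "assign %s_ns = ", state);
    for (i = 0; i < n; i++) {
        append(buf, sizeof buf, &len, "%s", sep);
        if (blocks[i].writes_state)
            append(buf, sizeof buf, &len, "%s_%s_ns & ", blocks[i].name, state);
        else
            append(buf, sizeof buf, &len, "{%d{1'b0}} & ", width);
        append(buf, sizeof buf, &len, "{%d{%s_select}}", width, blocks[i].name);
        sep = "\n | ";
    }
    append(buf, sizeof buf, &len, ";\n");
    return buf;
}
```

Because at most one `_select` signal is expected to be active for a given instruction, the OR of masked terms behaves as a one-hot multiplexer; that one-hot property is an assumption about the generated hardware, not something this sketch checks.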

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为顶级模块编写选择逻辑 Write selection logic for top-level modules

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_selection_logic(FILE＊fp，tie_t＊prog)tie2ver_write_selection_logic(FILE*fp, tie_t*prog)

{{

tie_t＊semantic，＊ilist，＊inst，＊state； tie_t *semantic, *ilist, *inst, *state;

char＊key，＊value，＊c，＊sname； char *key, *value, *c, *sname;

st_table＊global_operand_table； st_table * global_operand_table;

st_generator＊gen； st_generator * gen;

int width； int width;

fprintf(fp，″assign ％s_select＝″， fprintf(fp, "assign %s_select=",

tie_semantic_get_name(semantic))；tie_semantic_get_name(semantic));

c＝″″； c = "";

fprintf(fp，″％s％s_E″，c，tie_instruction_get_name(inst))； fprintf(fp, "%s%s_E", c, tie_instruction_get_name(inst));

c＝″\n|″； c = "\n|";

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

st_foreach_item(global_operand_table，gen，&key，&value){ st_foreach_item(global_operand_table, gen, &key, &value){

if((tie_type_t)value！＝TIE_ARG_IN){ if((tie_type_t)value!=TIE_ARG_IN){

tie2ver_write_operand_selection_logic_one(fp，prog，key)； tie2ver_write_operand_selection_logic_one(fp, prog, key);

fprintf(fp，″assign out_E＝％s_E；\n″，key)； fprintf(fp, "assign out_E = %s_E;\n", key);

} }

if(tie_get_predefined(state))continue； if(tie_get_predefined(state))continue;

sname＝tie_state_get_name(state)； sname = tie_state_get_name(state);

width＝tie_state_get_width(state)； width = tie_state_get_width(state);

tie2ver_write_state_selection_logic_one(fp，prog，sname，width)； tie2ver_write_state_selection_logic_one(fp, prog, sname, width);

}end_tie_program_foreach_state； } end_tie_program_foreach_state;

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

Write a series of assignment statements to extract the "fields" from the instruction

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_field_recur(FILE＊fp，tie_t*prog，tie_t＊field，chartie2ver_write_field_recur(FILE*fp, tie_t*prog, tie_t*field, char

＊suffix)*suffix)

{{

tie_t＊subfield，＊newfield； tie_t *subfield, *newfield;

char＊c，＊name； char *c, *name;

c＝″″； c = "";

fprintf(fp，″{″)； fprintf(fp, "{");

tie_field_foreach_subfield(field，subfield){ tie_field_foreach_subfield(field, subfield){

fprintf(fp，″％s″，c)； fprintf(fp, "%s", c);

switch(tie_get_type(subfield)){ switch(tie_get_type(subfield)){

case TIE_ID： case TIE_ID:

name＝tie_get_identifier(subfield)； name = tie_get_identifier(subfield);

newfield＝tie_program_get_field_by_name(prog，name)； newfield = tie_program_get_field_by_name(prog, name);

if(newfield＝＝0){ if(newfield==0){

fprintf(fp, "inst_R");

}else{ }else{

tie2ver_write_field_recur(fp，prog，newfield，suffix)； tie2ver_write_field_recur(fp, prog, newfield, suffix);

} }

break； break;

case TIE_SUBFIELD： case TIE_SUBFIELD:

name＝tie_subfield_get_name(subfield)； name = tie_subfield_get_name(subfield);

if(newfield＝＝0){ if(newfield==0){

fprintf(fp，″inst_R″)； fprintf(fp, "inst_R");

}else{ }else{

DIE(″Error：unexpected subfield name(expect′inst′)\n″)； DIE("Error: unexpected subfield name(expect'inst')\n");

} }

fprintf(fp，″[％d：″，tie_subfield_get_from_index(subfield))； fprintf(fp, "[%d:", tie_subfield_get_from_index(subfield));

fprintf(fp，″％d]″，tie_subfield_get_to_index(subfield))； fprintf(fp, "%d]", tie_subfield_get_to_index(subfield));

break； break;

default： default:

DIE(″Error：unexpected subfield type\n″)； DIE("Error: unexpected subfield type\n");

} }

c＝″，″； c = ",";

}end_tie_field_foreach_subfield； } end_tie_field_foreach_subfield;

fprintf(fp，″}″)； fprintf(fp, "}");

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_field(FILE＊fp，tie_t＊prog，tie_t＊field，char＊suffix)tie2ver_write_field(FILE*fp, tie_t*prog, tie_t*field, char*suffix)

{{

fprintf(fp, "assign %s%s = ", tie_field_get_name(field), suffix);

tie2ver_write_field_recur(fp，prog，field，suffix)； tie2ver_write_field_recur(fp, prog, field, suffix);

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

}}
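The two routines above turn a field definition into a Verilog concatenation of bit ranges of `inst_R`. As a rough illustration, the sketch below handles only the flat case, where every subfield is a direct bit range of the instruction word; the recursion through named fields is omitted. `struct bit_range`, `append`, and `field_assign_text` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical, simplified subfield: bits [from:to] of the instruction word. */
struct bit_range { int from, to; };

/* Append formatted text to buf, truncating silently if the buffer fills. */
static void append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*len + 1 >= cap) return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0) return;
    *len += ((size_t)n < cap - *len) ? (size_t)n : cap - *len - 1;
}

/* Build "assign <field><suffix> = {inst_R[a:b],inst_R[c:d],...};". */
static const char *field_assign_text(const char *field, const char *suffix,
                                     const struct bit_range *parts, int n)
{
    static char buf[512];
    size_t len = 0;
    const char *sep = "";
    int i;

    assert(field != NULL && suffix != NULL && parts != NULL && n > 0);
    buf[0] = '\0';
    append(buf, sizeof buf, &len, "assign %s%s = {", field, suffix);
    for (i = 0; i < n; i++) {
        append(buf, sizeof buf, &len, "%sinst_R[%d:%d]", sep,
               parts[i].from, parts[i].to);
        sep = ",";
    }
    append(buf, sizeof buf, &len, "};\n");
    return buf;
}
```

For a field `st` built from two 4-bit ranges and the suffix `_R`, this produces a single `assign st_R = {...};` line, which matches the shape of the output of tie2ver_write_field when no nested fields are involved.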

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为“操作数”编写一个模块 Write a module for "operand"

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_one_immediate(FILE＊fp，tie_t＊prog，tie_t*operand)tie2ver_write_one_immediate(FILE*fp, tie_t*prog, tie_t*operand)

{{

tie_t＊decoding，＊field，＊table； tie_t *decoding, *field, *table;

char＊oname，＊fname； char *oname, *fname;

ls_t＊tables； ls_t *tables;

int width； int width;

ASSERT(tie_get_type(operand)＝＝TIE_OPERAND)； ASSERT(tie_get_type(operand)==TIE_OPERAND);

oname = tie_operand_get_name(operand);

fname＝tie_operand_get_field_name(operand)； fname = tie_operand_get_field_name(operand);

field＝tie_program_get_field_by_name(prog，fname)； field = tie_program_get_field_by_name(prog, fname);

width＝tie_field_get_width(field)； width = tie_field_get_width(field);

fprintf(fp，″\n″)； fprintf(fp, "\n");

fprintf(fp，″module ％s(inst_R，％s)；\n″，oname，oname)； fprintf(fp, "module %s(inst_R, %s);\n", oname, oname);

fprintf(fp，″output[31∶0]％s；\n″，oname)； fprintf(fp, "output[31:0] %s;\n", oname);

fprintf(fp，″wire[％d∶0]％s；\n″，tie_field_get_width(field)-1， fprintf(fp, "wire[%d:0]%s;\n", tie_field_get_width(field)-1,

fname)；fname);

tie2ver_write_field(fp, prog, field, "");

decoding＝tie_operand_get_decoding_expression(operand)； decoding = tie_operand_get_decoding_expression(operand);

fprintf(fp, "assign %s = ", oname);

tie2ver_write_expression(fp，decoding，0，0，0)； tie2ver_write_expression(fp, decoding, 0, 0, 0);

fprintf(fp，″；\n″)； fprintf(fp, ";\n");

tables＝tie_expression_get_tables(decoding，prog)； tables = tie_expression_get_tables(decoding, prog);

ls_foreach_data(tie_t＊，tables，table){ ls_foreach_data(tie_t*, tables, table) {

tie2ver_write_table(fp，table)； tie2ver_write_table(fp, table);

}end_ls_foreach_data； } end_ls_foreach_data;

ls_free(tables)； ls_free(tables);

fprintf(fp，″endmodule\n″)； fprintf(fp, "endmodule\n");

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为每一种立即操作数解码逻辑编写一个模块 Write a module for each immediate operand decoding logic

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_immediate(FILE＊fp，tie_t＊prog)tie2ver_write_immediate(FILE*fp, tie_t*prog)

{{

st_table＊operand_table； st_table * operand_table;

char＊key，＊value； char*key,*value;

st_generator＊gen； st_generator * gen;

tie_t*operand； tie_t *operand;

tie_t*field； tie_t*field;

if((tie_type_t)value＝＝TIE_ARG_IN){ if((tie_type_t)value==TIE_ARG_IN){

operand＝tie_program_get_operand_by_name(prog，key)； operand = tie_program_get_operand_by_name(prog, key);

if(operand！＝0){ if(operand!=0){

if(！tie_get_predefined(operand)){ if(!tie_get_predefined(operand)){

tie2ver_write_one_immediate(fp，prog，operand)； tie2ver_write_one_immediate(fp, prog, operand);

} }

}else{ }else{

field = tie_program_get_field_by_name(prog, key);

if(field==0){

fprintf(stderr, "Error: invalid operand %s\n", key);

} }

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

Write an instance of the module for one "operand"

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_one_operand_instance(FILE＊fp，tie_t＊prog，tie_ttie2ver_write_one_operand_instance(FILE*fp, tie_t*prog, tie_t

＊operand)＊operand)

{{

char＊oname； char * oname;

oname = tie_operand_get_name(operand);

fprintf(fp，″％s i％s(.inst(inst_R)，.％s(％s_R))；\n″，oname，oname， fprintf(fp, "%s i%s(.inst(inst_R), .%s(%s_R));\n", oname, oname,

oname，oname)；oname,oname);

tie2ver_write_flop_instance(fp，oname，32)； tie2ver_write_flop_instance(fp, oname, 32);

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

编写一个语句，以便从inst_R中提取“字段名” Write a statement to extract the "field name" from inst_R

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_one_field_instance(FILE*fp，tie_t＊prog，tie_t＊field)tie2ver_write_one_field_instance(FILE*fp, tie_t*prog, tie_t*field)

{{

char＊name； char*name;

tie2ver_write_field(fp，prog，field，″_R″)； tie2ver_write_field(fp, prog, field, "_R");

name＝tie_field_get_name(field)； name = tie_field_get_name(field);

tie2ver_write_flop_instance(fp，name，32)； tie2ver_write_flop_instance(fp, name, 32);

}}

为每一种立即操作数解码逻辑编写一个实例 Write an instance for each immediate operand decoding logic

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2ver_write_immediate_instance(FILE＊fp，tie_t＊prog)tie2ver_write_immediate_instance(FILE*fp, tie_t*prog)

{{

char＊key，＊value； char*key,*value;

st_table＊operand_table； st_table * operand_table;

st_generator＊gen； st_generator * gen;

tie_t*operand，＊field； tie_t *operand, *field;

if((tie_type_t)value＝＝TIE_ARG_IN){ if((tie_type_t)value==TIE_ARG_IN){

if(operand！＝0 && tie_operand_is_immediate(operand)){ if(operand!=0 && tie_operand_is_immediate(operand)){

tie2ver_write_one_operand_instance(fp，prog，operand)； tie2ver_write_one_operand_instance(fp, prog, operand);

}else if(operand＝＝0){ }else if(operand==0){

field＝tie_program_get_field_by_name(prog，key)； field = tie_program_get_field_by_name(prog, key);

if(field！＝0){ if (field != 0) {

tie2ver_write_one_field_instance(fp，prog，field)； tie2ver_write_one_field_instance(fp, prog, field);

} }

}}

将“prog”打印到TIE文件 Print "prog" to a TIE file

＊＊＊＊＊＊/＊＊＊＊＊＊/

voidvoid

tie2ver_write_verilog(FILE＊fp，tie_t＊prog)tie2ver_write_verilog(FILE*fp, tie_t*prog)

{{

tie_t*semantic； tie_t *semantic;

/＊write tie primitives＊/ /*write tie primitives*/

fprintf(fp，COMMENTS)； fprintf(fp, COMMENTS);

fprintf(fp，TIE_ENFLOP)； fprintf(fp, TIE_ENFLOP);

fprintf(fp，TIE_FLOP)； fprintf(fp, TIE_FLOP);

fprintf(fp，TIE_ATHENS_STATE)； fprintf(fp, TIE_ATHENS_STATE);

/＊write each semantic block as a verilog module＊/ /*write each semantic block as a verilog module*/

ASSERT(tie_get_type(prog)＝＝TIE_PROGRAM)； ASSERT(tie_get_type(prog)==TIE_PROGRAM);

tie2ver_write_semantic(fp，semantic)； tie2ver_write_semantic(fp, semantic);

}end_tie_program_foreach_semantic； } end_tie_program_foreach_semantic;

/＊write each immediate operand as a verilog module＊/ /*write each immediate operand as a verilog module*/

tie2ver_write_immediate(fp，prog)； tie2ver_write_immediate(fp, prog);

/*write the top_level Verilog module＊/ /*write the top_level Verilog module*/

tie2ver_write_top_module(fp，prog)； tie2ver_write_top_module(fp, prog);

tie2ver_write_wire_declaration(fp，prog)； tie2ver_write_wire_declaration(fp, prog);

tie2ver_write_flop(fp，prog)； tie2ver_write_flop(fp, prog);

tie2ver_write_immediate_instance(fp，prog)； tie2ver_write_immediate_instance(fp, prog);

tie2ver_write_semantic_instance(fp，prog)； tie2ver_write_semantic_instance(fp, prog);

tie2ver_write_state_instance(fp，prog)； tie2ver_write_state_instance(fp, prog);

tie2ver_write_selection_logic(fp，prog)； tie2ver_write_selection_logic(fp, prog);

fprintf(fp，″endmodule\n″)； fprintf(fp, "endmodule\n");

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

将“prog”打印到TIE文件 Print "prog" to a TIE file

＊＊＊＊＊＊/＊＊＊＊＊＊/

voidvoid

tie2ver_write_instruction(FILE＊fp，tie_t＊prog)tie2ver_write_instruction(FILE*fp, tie_t*prog)

{{

tie_t*inst； tie_t*inst;

int first＝1； int first = 1;

if(first){ if(first){

fprintf(fp，″％s″，tie_instruction_get_name(inst))； fprintf(fp, "%s", tie_instruction_get_name(inst));

first＝0； first=0;

}else{ }else{

} }

}}

/＊/*

＊Local Variables：＊Local Variables:

＊mode：c*mode:c

＊c-basic-offset：4*c-basic-offset: 4

＊End：*End:

＊/*/

Annex E

#include″tie.h″#include "tie.h"

#define COMMENTS "/* Do not modify. This is automatically generated. */"

#define tie2gcc_program_foreach_instruction(_prog，_inst){ \#define tie2gcc_program_foreach_instruction(_prog, _inst) { \

tie_t＊_iclass；\ tie_t*_iclass; \

if(tie_get_predefined(_iclass)) continue; \

tie_iclass_foreach_instruction(_iclass，_inst){ tie_iclass_foreach_instruction(_iclass, _inst){

#define end_tie2gcc_program_foreach_instruction \#define end_tie2gcc_program_foreach_instruction \

}end_tie_iclass_foreach_instruction；\ }end_tie_iclass_foreach_instruction;\

}end_tie_program_foreach_iclass；\ }end_tie_program_foreach_iclass;\

}}

Build and return the global (program-wide) argument table for the user-defined instructions.

The returned table does not contain the arguments used in the predefined instructions.

＊＊＊＊＊＊/＊＊＊＊＊＊/

static st_table＊static st_table*

tie2gcc_program_get_operand_table(tie_t＊prog)tie2gcc_program_get_operand_table(tie_t*prog)

{{

static st_table＊tie2gcc_program_args＝0； static st_table *tie2gcc_program_args = 0;

tie_t＊inst； tie_t * inst;

char＊key，＊value； char*key,*value;

st_table＊arg_table； st_table * arg_table;

st_generator＊gen； st_generator * gen;

if(tie2gcc_program_args＝＝0){ if(tie2gcc_program_args==0){

tie2gcc_program_args＝st_init_table(strcmp，st_strhash)； tie2gcc_program_args = st_init_table(strcmp, st_strhash);

tie2gcc_program_foreach_instruction(prog，inst){ tie2gcc_program_foreach_instruction(prog, inst) {

arg_table＝tie_instruction_get_operand_table(inst)； arg_table = tie_instruction_get_operand_table(inst);

st_foreach_item(arg_table，gen，&key，&value){ st_foreach_item(arg_table, gen, &key, &value){

st_insert(tie2gcc_program_args，key，value)； st_insert(tie2gcc_program_args, key, value);

} }

st_free_table(arg_table)； st_free_table(arg_table);

}end_tie2gcc_program_foreach_instruction； } end_tie2gcc_program_foreach_instruction;

} }

return tie2gcc_program_args； return tie2gcc_program_args;

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

产生函数和自变量说明 Generate Function and Argument Specifications

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2gcc_write_function(FILE＊fp，tie_t＊inst，tie_t＊args)tie2gcc_write_function(FILE*fp, tie_t*inst, tie_t*args)

{{

tie_t＊arg； tie_t * arg;

char＊c； char*c;

c＝″″； c = "";

fprintf(fp，″\n#define ％s(″，tie_instruction_get_name(inst))； fprintf(fp, "\n#define %s(", tie_instruction_get_name(inst));

tie_args_foreach_arg(args，arg){ tie_args_foreach_arg(args, arg){

if(tie_get_type(arg)！＝TIE_ARG_OUT){ if(tie_get_type(arg) != TIE_ARG_OUT){

fprintf(fp, "%s%s", c, tie_arg_get_name(arg));

c＝″，″； c = ",";

} }

}end_tie_args_foreach_arg； } end_tie_args_foreach_arg;

fprintf(fp，″)\\\n″)； fprintf(fp, ")\\\n");

}}

Return a list of the arguments in "args", with the output args first. The returned list should be freed by the caller.

＊＊＊＊＊＊/＊＊＊＊＊＊/

ls_t＊ls_t*

tie2gcc_args_get_ordered(tie_t＊args)tie2gcc_args_get_ordered(tie_t*args)

{{

tie_t＊arg； tie_t * arg;

ls_t＊arglist； ls_t *arglist;

arglist＝ls_alloc()； arglist = ls_alloc();

tie_args_foreach_arg(args，arg){ tie_args_foreach_arg(args, arg){

if(tie_get_type(arg)！＝TIE_ARG_IN){ if(tie_get_type(arg) != TIE_ARG_IN){

ls_append(arglist，arg)； ls_append(arglist, arg);

} }

}end_tie_args_foreach_arg； } end_tie_args_foreach_arg;

tie_args_foreach_arg(args，arg){ tie_args_foreach_arg(args, arg){

if(tie_get_type(arg)！＝TIE_ARG_OUT){ if(tie_get_type(arg) != TIE_ARG_OUT){

ls_append(arglist，arg)； ls_append(arglist, arg);

} }

}end_tie_args_foreach_arg； } end_tie_args_foreach_arg;

return arglist； return arglist;

}}
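The ordering step matters because GCC extended asm lists output operands before input operands. The sketch below shows the same two-pass idea on a plain array, assuming every argument is either an input or an output (in/out arguments are not modelled). `enum arg_kind`, `struct arg`, `MAX_ARGS`, and `ordered_args` are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>

enum arg_kind { ARG_IN, ARG_OUT };   /* simplified stand-ins for TIE_ARG_IN / TIE_ARG_OUT */
struct arg { const char *name; enum arg_kind kind; };

#define MAX_ARGS 16                  /* arbitrary limit chosen for the sketch */

/* Return the arguments reordered so that all outputs come first, followed by
 * all inputs; relative order within each group is preserved. The result lives
 * in a static array, so it is only valid until the next call. */
static const struct arg *ordered_args(const struct arg *args, size_t n)
{
    static struct arg out[MAX_ARGS];
    size_t i, k = 0;

    assert(args != NULL && n <= MAX_ARGS);
    for (i = 0; i < n; i++)          /* first pass: outputs */
        if (args[i].kind == ARG_OUT) out[k++] = args[i];
    for (i = 0; i < n; i++)          /* second pass: inputs */
        if (args[i].kind == ARG_IN) out[k++] = args[i];
    assert(k == n);
    return out;
}
```

Two passes over the same list keep the code simple and stable; a single pass with two cursors would also work but is harder to read for such short argument lists.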

写出一个ASM语句 Write an ASM statement

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2gcc_write_one_asm(tie2gcc_write_one_asm(

FILE＊fp，tie_t＊prog，tie_t＊inst，tie_t＊args，int value)FILE*fp, tie_t*prog, tie_t*inst, tie_t*args, int value)

{{

tie_t＊arg，＊operand，＊state； tie_t *arg, *operand, *state;

tie_type_t type，ptype； tie_type_t type, ptype;

ls_t＊arglist； ls_t *arglist;

char＊t，s，c，＊name，＊n； char *t, s, c, *name, *n;

int i； int i;

/＊write the asm statement＊/ /*write the asm statement*/

fprintf(fp，″asm volatile(\″％s\t″， fprintf(fp, "asm volatile(\"%s\t",

tie_instruction_get_name(inst))； tie_instruction_get_name(inst));

i＝0； i=0;

tie_args_foreach_arg(args，arg){ tie_args_foreach_arg(args, arg){

fprintf(fp，″％s％％％d″，i＝＝0？″″：″，″，i)； fprintf(fp, "%s%%%d", i==0? "":", ", i);

i++； i++;

}end_tie_args_foreach_arg； } end_tie_args_foreach_arg;

fprintf(fp，″\″″)； fprintf(fp, "\"");

ptype＝TIE_UNKNOWN； ptype = TIE_UNKNOWN;

arglist＝tie2gcc_args_get_ordered(args)； arglist = tie2gcc_args_get_ordered(args);

ls_foreach_data(tie_t＊，arglist，arg){ ls_foreach_data(tie_t*, arglist, arg) {

name＝tie_arg_get_name(arg)； name = tie_arg_get_name(arg);

operand＝tie_program_get_operand_by_name(prog，name)； operand = tie_program_get_operand_by_name(prog, name);

if(operand！＝0){ if(operand!=0){

state＝tie_operand_get_state(operand)； state = tie_operand_get_state(operand);

if(state！＝0){ if (state != 0) {

n＝tie_state_get_name(state)； n = tie_state_get_name(state);

if(strcmp(n, "AR")==0){

c＝′a′； c = 'a';

}else if(strcmp(n，″FR″)＝＝0){ }else if(strcmp(n, "FR")==0){

c＝′f′； c = 'f';

}else if(strcmp(n，″DR″)＝＝0){ }else if(strcmp(n, "DR")==0){

c＝′d′； c = 'd';

}else if(strcmp(n，″BR″)＝＝0){ }else if(strcmp(n, "BR")==0){

c＝′b′； c = 'b';

}else{ }else{

DIE(″Internal Error：invalid state\n″)； DIE("Internal Error: invalid state\n");

} }

}else{ }else{

c＝′i′； c = 'i';

} }

}else{ }else{

c＝′i′； c = 'i';

} }

type＝tie_get_type(arg)； type = tie_get_type(arg);

if(ptype＝＝TIE_UNKNOWN && type＝＝TIE_ARG_IN){ if(ptype==TIE_UNKNOWN && type==TIE_ARG_IN){

fprintf(fp，″：″)； fprintf(fp, ":");

} }

s = type == ptype ? ',' : ':';

t = type == TIE_ARG_IN ? "" : "=";

fprintf(fp, "%c\"%s%c\"(%s)", s, t, c, name);

ptype＝type； ptype = type;

}end_ls_foreach_data； } end_ls_foreach_data;

ls_free(arglist)； ls_free(arglist);

fprintf(fp，″)；″)； fprintf(fp, "); ");

}}
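The separator logic in tie2gcc_write_one_asm is easier to follow with a concrete result. The sketch below rebuilds the same GCC extended-asm text from an already ordered argument array (outputs first). It is a simplified reading of the listing, not the original code: `struct asm_arg`, `append`, and `asm_text` are hypothetical, and the register-file constraint letter is passed in directly instead of being derived from the operand's state.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

enum arg_kind { ARG_IN, ARG_OUT };   /* stand-ins for TIE_ARG_IN / TIE_ARG_OUT */
struct asm_arg { const char *name; enum arg_kind kind; char constraint; };

/* Append formatted text to buf, truncating silently if the buffer fills. */
static void append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*len + 1 >= cap) return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (n < 0) return;
    *len += ((size_t)n < cap - *len) ? (size_t)n : cap - *len - 1;
}

/* Build: asm volatile("OP<TAB>%0,%1,...":outputs:inputs);
 * 'ordered' must list output arguments before input arguments. */
static const char *asm_text(const char *opcode, const struct asm_arg *ordered, int n)
{
    static char buf[512];
    size_t len = 0;
    int i, have_prev = 0;
    enum arg_kind prev = ARG_OUT;

    assert(opcode != NULL && (ordered != NULL || n == 0));
    buf[0] = '\0';
    append(buf, sizeof buf, &len, "asm volatile(\"%s\t", opcode);
    for (i = 0; i < n; i++)                       /* operand placeholders */
        append(buf, sizeof buf, &len, "%s%%%d", i == 0 ? "" : ",", i);
    append(buf, sizeof buf, &len, "\"");
    for (i = 0; i < n; i++) {
        enum arg_kind kind = ordered[i].kind;
        if (!have_prev && kind == ARG_IN)         /* no outputs: empty output section */
            append(buf, sizeof buf, &len, ":");
        append(buf, sizeof buf, &len, "%c\"%s%c\"(%s)",
               (have_prev && kind == prev) ? ',' : ':',   /* ',' within a section, ':' between */
               kind == ARG_IN ? "" : "=",
               ordered[i].constraint, ordered[i].name);
        prev = kind;
        have_prev = 1;
    }
    append(buf, sizeof buf, &len, ");");
    return buf;
}
```

A colon starts each operand section and a comma separates operands within a section; when an instruction has inputs only, the extra colon leaves the output section empty, which is what GCC's extended-asm syntax expects.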

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为“inst”产生在线函数 generate inline functions for 'inst'

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2gcc_write_asm(FILE＊fp，tie_t＊prog，tie_t＊inst，tie_t＊args)tie2gcc_write_asm(FILE*fp, tie_t*prog, tie_t*inst, tie_t*args)

{{

tie_t＊arg，＊out_arg； tie_t *arg, *out_arg;

/* declare the output variable and find the immediate operand */

fprintf(fp, "({");

out_arg = 0;

tie_args_foreach_arg(args, arg) {

if(tie_get_type(arg)==TIE_ARG_OUT){

fprintf(fp, "int %s;", tie_arg_get_name(arg));

out_arg = arg;

}

} end_tie_args_foreach_arg;

tie2gcc_write_one_asm(fp, prog, inst, args, -1);

/＊return the results＊/ /*return the results*/

if(out_arg！＝0){ if(out_arg != 0){

fprintf(fp，″％s；″，tie_arg_get_name(out_arg))； fprintf(fp, "%s;", tie_arg_get_name(out_arg));

} }

fprintf(fp，″})\n″)； fprintf(fp, "})\n");

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

为“inst”产生一个宏 generate a macro for "inst"

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2gcc_write_inst(FILE＊fp，tie_t＊prog，tie_t＊inst，tie_t＊args)tie2gcc_write_inst(FILE*fp, tie_t*prog, tie_t*inst, tie_t*args)

{{

tie2gcc_write_function(fp，inst，args)； tie2gcc_write_function(fp, inst, args);

tie2gcc_write_asm(fp，prog，inst，args)； tie2gcc_write_asm(fp, prog, inst, args);

}}

Generate the gcc header file, which is included in application code in order to use the user-defined instructions.

＊＊＊＊＊＊/＊＊＊＊＊＊/

voidvoid

tie2gcc_write_gcc(FILE＊fp，tie_t＊prog)tie2gcc_write_gcc(FILE*fp, tie_t*prog)

{{

tie_t＊iclass，＊ilist，＊inst，＊args； tie_t *iclass, *ilist, *inst, *args;

fprintf(fp，″％s\n″，COMMENTS)； fprintf(fp, "%s\n", COMMENTS);

ilist＝tie_iclass_get_inst_list(iclass)； ilist = tie_iclass_get_inst_list(iclass);

args = tie_iclass_get_io_args(iclass);

tie2gcc_write_inst(fp，prog，inst，args)； tie2gcc_write_inst(fp, prog, inst, args);

}end_tie_program_foreach_iclass； } end_tie_program_foreach_iclass;

}}

写出各函数以测试各立即数值的正确值 Write functions to test for the correct value of each immediate value

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie2gcc_write_operand_check_one(FILE＊fp，char＊name)tie2gcc_write_operand_check_one(FILE*fp, char*name)

{{

fprintf(fp，″\nint\n″)； fprintf(fp, "\nint\n");

fprintf(fp，″tensilica_％s(int v)\n″，name)； fprintf(fp, "tensilica_%s(int v)\n", name);

fprintf(fp，″{\n″)； fprintf(fp, "{\n");

fprintf(fp，″tensilica_insnbuf_type insn；\n″)； fprintf(fp, "tensilica_insnbuf_type insn;\n");

fprintf(fp，″int new_v；\n″)； fprintf(fp, "int new_v;\n");

fprintf(fp, "if(!set_%s_field(insn, v)) return 0;\n", name);

fprintf(fp，″new_v＝get_％s_field(insn)；\n″，name)； fprintf(fp, "new_v=get_%s_field(insn);\n", name);

fprintf(fp，″return new_v＝＝v；\n″)； fprintf(fp, "return new_v==v;\n");

fprintf(fp，″}\n″)； fprintf(fp, "}\n");

}}

＊＊＊＊＊＊＊＊＊＊＊＊

＊＊＊＊＊＊/＊＊＊＊＊＊/

voidvoid

tie2gcc_write_operand_check(FILE＊fp，tie_t＊prog)tie2gcc_write_operand_check(FILE*fp, tie_t*prog)

{{

st_table＊arg_table； st_table * arg_table;

st_generator＊gen； st_generator * gen;

char＊key，＊value； char*key,*value;

arg_table＝tie2gcc_program_get_operand_table(prog)； arg_table = tie2gcc_program_get_operand_table(prog);

if((tie_type_t)value＝＝TIE_ARG_IN){ if((tie_type_t)value==TIE_ARG_IN){

if(strcmp(key，″art″)！＝0&&strcmp(key，″ars″)！＝0){ if(strcmp(key, "art") != 0 && strcmp(key, "ars") != 0){

tie2gcc_write_operand_check_one(fp，key)； tie2gcc_write_operand_check_one(fp, key);

} }

}}

Annex F

/＊/*

＊TIE user_register routines *TIE user_register routines

＊/ */

/＊$Id＊//*$Id*/

/＊/*

* These coded instructions, statements, and computer programs are

* Confidential Proprietary Information of Tensilica Inc. and may not be

* disclosed to third parties or copied in any form, in whole or in part,

* without the prior written consent of Tensilica Inc.

＊/ */

#include<math.h>#include <math.h>

#include″tie.h″#include "tie.h"

#include″tie_int.h″#include "tie_int.h"

typedef struct ureg_struct {

int statef;

int statet;

int uregf;

int uregt;

int ureg;

char *name;

} ureg_t;

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

返回“ureg”的索引 Returns the index of "ureg"

＊＊＊＊＊＊/＊＊＊＊＊＊/

intint

tie_ureg_get_index(tie_t＊ureg)tie_ureg_get_index(tie_t*ureg)

{{

ASSERT(tie_get_type(ureg)＝＝TIE_UREG)； ASSERT(tie_get_type(ureg)==TIE_UREG);

return tie_get_integer(tie_get_first_child(ureg))； return tie_get_integer(tie_get_first_child(ureg));

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

Return the expression of "ureg"

＊＊＊＊＊＊/＊＊＊＊＊＊/

tie_t＊tie_t*

tie_ureg_get_expression(tie_t *ureg)

{{

tie_t＊index； tie_t * index;

index＝tie_get_first_child(ureg)； index = tie_get_first_child(ureg);

return tie_get_next_sibling(index)； return tie_get_next_sibling(index);

}}
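The two accessors above imply that a "ureg" node stores its index as the first child and its expression as the next sibling of that child. The real `tie_t` type is not shown in this annex, so the sketch below uses a hypothetical first-child/next-sibling node (`struct node`) purely to illustrate that layout; the integer `value` stands in for both the index and the expression payload.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal tree node; not the real tie_t. */
struct node {
    int value;
    struct node *first_child, *last_child, *next_sibling;
};

static struct node *node_alloc(int value)
{
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->value = value;
    return n;
}

/* Append child at the end of parent's child list. */
static void node_append_child(struct node *parent, struct node *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->last_child) parent->last_child->next_sibling = child;
    else parent->first_child = child;
    parent->last_child = child;
}

/* Free a node together with its children and following siblings. */
static void node_free(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next_sibling;
        node_free(n->first_child);
        free(n);
        n = next;
    }
}

/* Build a ureg-like node: child 0 is the index, child 1 the expression. */
static struct node *make_ureg(int index, int expr_tag)
{
    struct node *u = node_alloc(0);
    node_append_child(u, node_alloc(index));
    node_append_child(u, node_alloc(expr_tag));
    return u;
}

static int ureg_index(const struct node *u)
{
    return u->first_child->value;
}

static const struct node *ureg_expression(const struct node *u)
{
    return u->first_child->next_sibling;
}
```

A caller would build a node with `make_ureg`, query it with the two accessors, and release it with `node_free` when done.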

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

产生一个表示“ureg”的常数索引的字符串 yields a string representing the constant index of "ureg"

＊＊＊＊＊＊/＊＊＊＊＊＊/

static char ureg_index[10]；static char ureg_index[10];

char＊tie_ureg_get_index_constant(tie_t＊ureg)char*tie_ureg_get_index_constant(tie_t*ureg)

{{

sprintf(ureg_index, "8'd%d", tie_ureg_get_index(ureg));

return ureg_index； return ureg_index;

}}
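Since the routine above formats into a fixed 10-byte static buffer with `sprintf`, the index has to stay small for the result to fit. The sketch below is a defensive variant using `snprintf` and an explicit range check; the 0 to 255 bound is an assumption derived from the 8-bit Verilog literal, and `ureg_index_constant` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bounded variant: formats a user-register index as an 8-bit
 * Verilog decimal literal such as 8'd42. The result lives in a static
 * buffer and is overwritten by the next call. */
static const char *ureg_index_constant(int index)
{
    static char buf[10];                 /* "8'd255" plus NUL needs 7 bytes */

    assert(index >= 0 && index <= 255);  /* assumed: index must fit in 8 bits */
    snprintf(buf, sizeof buf, "8'd%d", index);
    return buf;
}
```

Because the buffer is shared, a caller that needs two constants at once has to copy the first result before requesting the second.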

＊＊＊＊＊＊＊＊＊＊＊＊

产生针对RUR指令的st字段 Generate st field for RUR instruction

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie_program_generate_st_field(tie_t＊program)tie_program_generate_st_field(tie_t*program)

{{

tie_t＊field； tie_t*field;

field＝tie_alloc(TIE_FIELD)； field = tie_alloc(TIE_FIELD);

tie_append_child(field, tie_create_identifier("st"));

tie_append_child(field, tie_create_identifier("s"));

tie_append_child(field, tie_create_identifier("t"));

tie_program_add(program，field)； tie_program_add(program, field);

}}

产生RUR操作码 Generate RUR opcode

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie_program_generate_rur_opcode(tie_t＊program)tie_program_generate_rur_opcode(tie_t*program)

{{

tie_t＊opcode，＊encode； tie_t *opcode, *encode;

opcode＝tie_alloc(TIE_OPCODE)； opcode = tie_alloc(TIE_OPCODE);

tie_append_child(opcode，tie_create_identifier(″RUR″))； tie_append_child(opcode, tie_create_identifier("RUR"));

encode＝tie_alloc(TIE_ENCODING)； encode = tie_alloc(TIE_ENCODING);

tie_append_child(opcode，encode)； tie_append_child(opcode, encode);

tie_append_child(encode，tie_create_identifier(″op2″))； tie_append_child(encode, tie_create_identifier("op2"));

tie_append_child(encode，tie_create_constant(″4′b1110″))； tie_append_child(encode, tie_create_constant("4'b1110"));

encode＝tie_alloc(TIE_ENCODING)； encode = tie_alloc(TIE_ENCODING);

tie_append_child(opcode，encode)； tie_append_child(opcode, encode);

tie_append_child(encode，tie_create_identifier(″RST3″))； tie_append_child(encode, tie_create_identifier("RST3"));

tie_program_add(program，opcode)； tie_program_add(program, opcode);

}}

＊＊＊＊＊＊＊＊＊＊＊＊＊＊

产生WUR操作码 Generate WUR opcode

＊＊＊＊＊＊/＊＊＊＊＊＊/

static voidstatic void

tie_program_generate_wur_opcode(tie_t＊program)tie_program_generate_wur_opcode(tie_t*program)

{{

tie_t＊opcode，＊encode； tie_t *opcode, *encode;

opcode＝tie_alloc(TIE_OPCODE)； opcode = tie_alloc(TIE_OPCODE);

tie_append_child(opcode，tie_create_identifier(″WUR″))； tie_append_child(opcode, tie_create_identifier("WUR"));

encode＝tie_alloc(TIE_ENCODING)； encode = tie_alloc(TIE_ENCODING);

tie_append_child(opcode，encode)； tie_append_child(opcode, encode);

tie_append_child(encode，tie_create_constant(″4′b1111″))； tie_append_child(encode, tie_create_constant("4'b1111"));

encode＝tie_alloc(TIE_ENCODING)； encode = tie_alloc(TIE_ENCODING);

tie_append_child(opcode，encode)； tie_append_child(opcode, encode);

tie_program_add(program，opcode)； tie_program_add(program, opcode);

}}

/****************************************************************
 Generate the RUR iclass
****************************************************************/
static void
tie_program_generate_rur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;

    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("RUR"));

    /* operand list: out arr, in st */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_OUT);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("arr"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("st"));

    /* state list: every user-defined state is read */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_IN);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}

/****************************************************************
 Generate the WUR iclass
****************************************************************/
static void
tie_program_generate_wur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;

    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("WUR"));

    /* operand list: in art, in sr */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("art"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("sr"));

    /* state list: every user-defined state is read and written */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_INOUT);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}

/****************************************************************
 Generate a set of select signals, one for each ureg
****************************************************************/
static void
tie_program_generate_selection_signals(tie_t *prog, tie_t *stmt, char *fname)
{
    tie_t *ureg, *wire, *assign, *equal, *id;
    int index, max_index, width;
    char wname[80];

    /* number of bits needed to encode the largest ureg index */
    max_index = 0;
    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        max_index = MAX(max_index, index);
    } end_tie_program_foreach_ureg;
    width = (int) ceil(log(max_index + 1) / log(2));

    /* for each ureg: wire ureg_sel_<index> = (fname[width-1:0] == index) */
    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        wire = tie_alloc(TIE_WIRE);
        sprintf(wname, "ureg_sel_%d", index);
        tie_append_child(wire, tie_create_integer(0));
        tie_append_child(wire, tie_create_identifier(wname));
        tie_append_child(stmt, wire);
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(assign, tie_create_identifier(wname));
        tie_append_child(stmt, assign);
        equal = tie_alloc(TIE_EQ);
        sprintf(wname, "%d'd%d", width, index);
        id = tie_create_identifier(fname);
        tie_append_child(id, tie_create_integer(width - 1));
        tie_append_child(id, tie_create_integer(0));
        tie_append_child(equal, id);
        tie_append_child(equal, tie_create_constant(wname));
        tie_append_child(assign, equal);
    } end_tie_program_foreach_ureg;
}
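The width calculation used for the select signals can be checked in isolation. The following is a minimal sketch and not part of the listing: the helper names select_width and format_select_constant are hypothetical, and an integer loop stands in for the floating-point ceil(log(max_index+1)/log(2)) expression, on the assumption that both are meant to yield the number of bits needed to encode the indices 0..max_index.

```c
#include <stdio.h>

/* Hypothetical helper: number of bits needed to encode the indices
 * 0..max_index, i.e. ceil(log2(max_index + 1)), using integers only. */
static int select_width(int max_index)
{
    int width = 0;
    while ((1 << width) < max_index + 1)
        width++;
    return width;
}

/* Hypothetical helper: format the sized decimal constant that a select
 * signal is compared against, for example "2'd3". Returns the number
 * of characters snprintf would write. */
static int format_select_constant(char *buf, size_t size, int width, int index)
{
    return snprintf(buf, size, "%d'd%d", width, index);
}
```

With the two user registers of Annex G (indices 0 and 1), select_width(1) is 1, so the comparison constants would be 1'd0 and 1'd1. With a single user register at index 0 the width is 0, a corner case worth keeping in mind when reading the generated bit range.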

/****************************************************************
 Return the RUR selection logic for "ureg" and all uregs that
 follow it in the list
****************************************************************/
static tie_t *
tie_program_rur_semantic_recur(ls_handle_t *ureg_handle)
{
    tie_t *and, *node, *or, *rep;
    tie_t *ureg = (tie_t *) ls_handle_get_data(ureg_handle);
    ls_handle_t *ureg_next;
    char sname[80];

    /* {32{ureg_sel_<index>}} & <ureg expression> */
    and = tie_alloc(TIE_BITWISE_AND);
    rep = tie_alloc(TIE_REPLICATION);
    tie_append_child(and, rep);
    tie_append_child(rep, tie_create_integer(32));
    sprintf(sname, "ureg_sel_%d", tie_ureg_get_index(ureg));
    tie_append_child(rep, tie_create_identifier(sname));
    tie_append_child(and, tie_dup(tie_ureg_get_expression(ureg)));
    ureg_next = ls_handle_get_next_handle(ureg_handle);
    if (ureg_next == 0) {
        return and;
    } else {
        node = tie_program_rur_semantic_recur(ureg_next);
        or = tie_alloc(TIE_BITWISE_OR);
        tie_append_child(or, and);
        tie_append_child(or, node);
        return or;
    }
}

/****************************************************************
 Generate the RUR semantic block
****************************************************************/
static void
tie_program_generate_rur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *node;
    ls_t *ureg_list;
    ls_handle_t *handle;

    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(semantic, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_append_child(semantic, statement);
    tie_program_generate_selection_signals(program, statement, "st");
    assign = tie_alloc(TIE_ASSIGNMENT);
    tie_append_child(statement, assign);
    tie_append_child(assign, tie_create_identifier("arr"));
    ureg_list = ls_alloc();
    tie_program_foreach_ureg(program, ureg) {
        ls_append(ureg_list, ureg);
    } end_tie_program_foreach_ureg;
    handle = ls_get_first_handle(ureg_list);
    node = tie_program_rur_semantic_recur(handle);
    tie_append_child(assign, node);
    ls_free(ureg_list);
    tie_program_add(program, semantic);
}

/****************************************************************
 Put all components of "ureg" into "list"
****************************************************************/
static void
tie_ureg_exp_get_components(tie_t *exp, ls_t *list)
{
    tie_t *child;

    if (tie_get_type(exp) == TIE_ID) {
        ls_prepend(list, exp);
    }
    tie_foreach_child(exp, child) {
        tie_ureg_exp_get_components(child, list);
    } end_tie_foreach_child;
}

/****************************************************************
 Insert a state-to-ur mapping entry into the list
****************************************************************/
static void
tie_state_list_insert(ls_t *list, ureg_t *ur)
{
    ureg_t *item;
    ls_handle_t *handle;

    handle = 0;
    ls_foreach_handle(list, handle) {
        item = (ureg_t *) ls_handle_get_data(handle);
        if (item->statef < ur->statet) {
            break;
        }
    } end_ls_foreach_handle;
    if (handle == 0) {
        ls_append(list, ur);
    } else {
        ls_insert_before(handle, ur);
    }
}

/****************************************************************
 Add to "list" the mapping entries for the parts of "ureg" that
 refer to "state"
****************************************************************/

static void
tie_state_get_ur_mapping(tie_t *prog, tie_t *state, tie_t *ureg, ls_t *list)
{
    tie_t *exp, *child, *s, *id;
    int num, uregf, uregt, statef, statet;
    ls_t *id_list;
    char *sname, *iname;
    ureg_t *ur;

    exp = tie_ureg_get_expression(ureg);
    num = tie_ureg_get_index(ureg);
    sname = tie_state_get_name(state);
    id_list = ls_alloc();
    tie_ureg_exp_get_components(exp, id_list);
    uregt = uregf = -1;
    ls_foreach_data(tie_t *, id_list, id) {
        iname = tie_get_identifier(id);
        child = tie_get_first_child(id);
        /* compute the next uregf and uregt */
        if (child == 0) {
            s = tie_program_get_state_by_name(prog, iname);
            ASSERT(s != 0);
            statet = 0;
            statef = tie_state_get_width(s) - 1;
        } else {
            statef = tie_get_integer(child);
            child = tie_get_next_sibling(child);
            if (child == 0) {
                statet = statef;
            } else {
                statet = tie_get_integer(child);
            }
        }
        uregt = uregf + 1;
        uregf = uregt + (statef - statet);
        if (strcmp(iname, sname) == 0) {
            ur = ALLOC(ureg_t, 1);
            ur->statef = statef;
            ur->statet = statet;
            ur->uregf = uregf;
            ur->uregt = uregt;
            ur->ureg = num;
            ur->name = "art";
            tie_state_list_insert(list, ur);
        }
    } end_ls_foreach_data;
}

/****************************************************************
 Fill gaps in the state-to-ur mapping table
****************************************************************/
static void
tie_state_fill_gap(tie_t *state, ls_t *list)
{
    int width, statet, statef;
    ls_handle_t *handle;
    ureg_t *ur, *gap;
    char *name;

    width = tie_state_get_width(state);
    name = tie_state_get_name(state);
    statet = statef = width;
    ls_foreach_handle(list, handle) {
        ur = (ureg_t *) ls_handle_get_data(handle);
        if (ur->statef < (statet - 1)) {
            /* bits between the previous entry and this one are unmapped */
            gap = ALLOC(ureg_t, 1);
            gap->statef = statet - 1;
            gap->statet = ur->statef + 1;
            gap->uregf = gap->uregt = gap->ureg = -1;
            gap->name = 0;
            ls_insert_before(handle, gap);
        }
        statet = ur->statet;
        statef = ur->statef;
    } end_ls_foreach_handle;
    handle = ls_get_last_handle(list);
    if (ur->statet > 0) {
        /* bits below the last entry are unmapped */
        gap = ALLOC(ureg_t, 1);
        gap->statef = ur->statet - 1;
        gap->statet = 0;
        gap->uregf = gap->uregt = gap->ureg = -1;
        gap->name = 0;
        ls_insert_after(handle, gap);
    }
}
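The gap-filling step can be illustrated without the list library. The sketch below is an assumption-laden restatement rather than the listing's code: range_t and fill_gaps are hypothetical names, arrays replace the ls_t list, and the input ranges are assumed to be sorted from high bits to low bits and not to overlap, which is the order tie_state_list_insert appears to maintain.

```c
#include <stddef.h>

/* One bit range [from:to] of a state, with from >= to.
 * ureg < 0 marks a gap that no user register maps onto. */
typedef struct {
    int from;   /* most significant bit of the range   */
    int to;     /* least significant bit of the range  */
    int ureg;   /* user register index, or -1 for a gap */
} range_t;

/* Hypothetical helper: copy the n mapped ranges in `in` to `out`,
 * inserting gap entries so that the result covers bits [width-1:0].
 * Returns the number of entries written, or 0 if `out` (capacity
 * `cap`) is too small. */
static size_t fill_gaps(const range_t *in, size_t n, int width,
                        range_t *out, size_t cap)
{
    size_t count = 0;
    int next = width - 1;              /* highest bit not yet covered */

    for (size_t i = 0; i < n; i++) {
        if (in[i].from < next) {       /* unmapped bits above this range */
            if (count == cap) return 0;
            out[count++] = (range_t){ next, in[i].from + 1, -1 };
        }
        if (count == cap) return 0;
        out[count++] = in[i];
        next = in[i].to - 1;
    }
    if (next >= 0) {                   /* unmapped bits below the last range */
        if (count == cap) return 0;
        out[count++] = (range_t){ next, 0, -1 };
    }
    return count;
}
```

For a 32-bit state with mapped ranges [23:16] and [7:0], the result is [31:24] (gap), [23:16], [15:8] (gap), [7:0]. In the WUR semantic block a gap entry would presumably keep the old value of those state bits, while a mapped entry selects between the register operand and the old value.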

/****************************************************************
 Generate the WUR semantic block
****************************************************************/
static void
tie_program_generate_wur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *cond;
    tie_t *state, *concat, *id;
    ureg_t *ur;
    char *sname, selname[80];
    ls_t *list;

    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(program, semantic);
    tie_append_child(semantic, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_program_generate_selection_signals(program, statement, "sr");

    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        list = ls_alloc();
        tie_program_foreach_ureg(program, ureg) {
            tie_state_get_ur_mapping(program, state, ureg, list);
        } end_tie_program_foreach_ureg;
        tie_state_fill_gap(state, list);
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(statement, assign);
        tie_append_child(assign, tie_create_identifier(sname));
        concat = tie_alloc(TIE_CONCATENATION);
        tie_append_child(assign, concat);
        ls_foreach_data(ureg_t *, list, ur) {
            if (ur->name != 0) {
                /* ureg_sel_<n> ? art[uregf:uregt] : state[statef:statet] */
                cond = tie_alloc(TIE_CONDITIONAL);
                tie_append_child(concat, cond);
                sprintf(selname, "ureg_sel_%d", ur->ureg);
                id = tie_create_identifier(selname);
                tie_append_child(cond, id);
                id = tie_create_identifier(ur->name);
                tie_append_child(id, tie_create_integer(ur->uregf));
                tie_append_child(id, tie_create_integer(ur->uregt));
                tie_append_child(cond, id);
                id = tie_create_identifier(sname);
                tie_append_child(id, tie_create_integer(ur->statef));
                tie_append_child(id, tie_create_integer(ur->statet));
                tie_append_child(cond, id);
            } else {
                id = tie_create_identifier(sname);
                tie_append_child(concat, id);
            }
        } end_ls_foreach_data;
        ls_free(list);
    } end_tie_program_foreach_state;
}

/****************************************************************
 Generate the RUR and WUR opcodes, iclasses, and semantic blocks
****************************************************************/
void
tie_program_generate_rurwur(tie_t *program)
{
    tie_t *ureg;
    int num = 0;

    /* nothing to generate if the program declares no user registers */
    tie_program_foreach_ureg(program, ureg) {
        num++;
    } end_tie_program_foreach_ureg;
    if (num == 0) {
        return;
    }
    tie_program_generate_st_field(program);
    tie_program_generate_rur_opcode(program);
    tie_program_generate_wur_opcode(program);
    tie_program_generate_rur_iclass(program);
    tie_program_generate_wur_iclass(program);
    tie_program_generate_rur_semantic(program);
    tie_program_generate_wur_semantic(program);
}

Annex G

//   - a predefined instruction field op2
//   - a predefined opcode CUST0
opcode BYTESWAP op2=4'b0000 CUST0

// declare a state ACC used to accumulate byte-swapped data
state ACC 32

// declare a mode bit SWAP to control the swap
state SWAP 1

// use "RUR ar, 0" and "WUR ar, 0" to move data between AR and ACC
user_register 0 ACC

// use "RUR ar, 1" and "WUR ar, 1" to move data between AR and SWAP
user_register 1 SWAP

// define a new instruction class that
//   - reads data from ars (predefined to be AR[s])
//   - uses and writes state ACC
//   - uses state SWAP
iclass bs {BYTESWAP} {in ars} {inout ACC, in SWAP}

// semantic definition of byteswap
//   Accumulates to ACC the byte-swapped ars (AR[s]) or
//   ars depending on the SWAP bit
semantic bs {BYTESWAP} {
    wire [31:0] ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
    assign ACC = ACC + (SWAP ? ars_swap : ars);
}
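The behaviour described by the semantic block can be modelled in plain C for quick experiments. This is only an illustrative sketch: bs_state_t, byteswap32 and byteswap_step are hypothetical names, and the model assumes that ACC wraps modulo 2^32 and that only bit 0 of SWAP is significant.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word, mirroring
 * {ars[7:0], ars[15:8], ars[23:16], ars[31:24]}. */
static uint32_t byteswap32(uint32_t x)
{
    return (x << 24) | ((x & 0x0000ff00u) << 8) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Hypothetical software model of the two states declared above. */
typedef struct {
    uint32_t acc;   /* state ACC, 32 bits */
    uint32_t swap;  /* state SWAP, 1 bit  */
} bs_state_t;

/* Model of one BYTESWAP execution: ACC = ACC + (SWAP ? ars_swap : ars). */
static void byteswap_step(bs_state_t *st, uint32_t ars)
{
    st->acc += (st->swap & 1u) ? byteswap32(ars) : ars;
}
```

Starting from ACC = 0 with SWAP = 1, one step with ars = 0x11223344 leaves ACC = 0x44332211. Clearing SWAP and stepping with ars = 1 then gives 0x44332212.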

Annex H

#define PARAMS(_arg) _arg

typedef signed int int32_t;
typedef unsigned int u_int32_t;
typedef void *xtensa_isa;
typedef void *xtensa_operand;
typedef int xtensa_opcode;

#define XTENSA_UNDEFINED -1

typedef u_int32_t xtensa_insnbuf_word;
typedef xtensa_insnbuf_word *xtensa_insnbuf;

typedef enum {
    xtensa_encode_result_ok,
    xtensa_encode_result_align,
    xtensa_encode_result_not_in_table,
    xtensa_encode_result_too_low,
    xtensa_encode_result_too_high,
    xtensa_encode_result_not_ok
} xtensa_encode_result;

typedef u_int32_t (*xtensa_immed_decode_fn) PARAMS((u_int32_t val));
typedef xtensa_encode_result (*xtensa_immed_encode_fn)
    PARAMS((u_int32_t *valp));
typedef u_int32_t (*xtensa_get_field_fn) PARAMS((const xtensa_insnbuf insn));
typedef void (*xtensa_set_field_fn) PARAMS((xtensa_insnbuf insn,
                                            u_int32_t val));
typedef int (*xtensa_insn_decode_fn) PARAMS((const xtensa_insnbuf insn));

typedef struct xtensa_operand_internal_struct {
    char operand_kind;
    char inout;
    xtensa_get_field_fn get_field;
    xtensa_set_field_fn set_field;
    xtensa_immed_encode_fn encode;
    xtensa_immed_decode_fn decode;
} xtensa_operand_internal;

typedef struct xtensa_iclass_internal_struct {
    int num_operands;
    xtensa_operand_internal **operands;
} xtensa_iclass_internal;

typedef struct xtensa_opcode_internal_struct {
    const char *name;
    int length;
    xtensa_insnbuf encoding_template;
    xtensa_iclass_internal *iclass;
} xtensa_opcode_internal;

typedef struct opname_lookup_entry_struct {
    const char *key;
    xtensa_opcode opcode;
} opname_lookup_entry;

typedef struct xtensa_isa_internal_struct {
    int insn_size;
    int insnbuf_size;
    int num_opcodes;
    xtensa_opcode_internal **opcode_table;
    int num_modules;
    int *module_opcode_base;
    xtensa_insn_decode_fn *module_decode_fn;
    opname_lookup_entry *opname_lookup_table;
} xtensa_isa_internal;

extern u_int32_t get_r_field(const xtensa_insnbuf insn);
extern void set_r_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_s_field(const xtensa_insnbuf insn);
extern void set_s_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_sr_field(const xtensa_insnbuf insn);
extern void set_sr_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_t_field(const xtensa_insnbuf insn);
extern void set_t_field(xtensa_insnbuf insn, u_int32_t val);

extern xtensa_encode_result encode_r(u_int32_t *valp);
extern u_int32_t decode_r(u_int32_t val);
extern xtensa_encode_result encode_s(u_int32_t *valp);
extern u_int32_t decode_s(u_int32_t val);
extern xtensa_encode_result encode_sr(u_int32_t *valp);
extern u_int32_t decode_sr(u_int32_t val);
extern xtensa_encode_result encode_t(u_int32_t *valp);
extern u_int32_t decode_t(u_int32_t val);

static u_int32_t get_st_field(insn)
    const xtensa_insnbuf insn;
{
    u_int32_t temp;

    temp = 0;
    temp |= ((insn[0] & 0xf00) >> 8) << 4;  /* bits 11..8 -> high nibble */
    temp |= ((insn[0] & 0xf0) >> 4) << 0;   /* bits 7..4  -> low nibble  */
    return temp;
}

static void set_st_field(insn, val)
    xtensa_insnbuf insn;
    u_int32_t val;
{
    insn[0] = (insn[0] & 0xfffff0ff) | ((val & 0xf0) << 4);  /* high nibble -> bits 11..8 */
    insn[0] = (insn[0] & 0xffffff0f) | ((val & 0xf) << 4);   /* low nibble  -> bits 7..4  */
}

static u_int32_t decode_st(u_int32_t val)
{
    return val;
}

static xtensa_encode_result encode_st(u_int32_t *valp)
{
    if ((*valp >> 8) != 0) {
        return xtensa_encode_result_too_high;
    } else {
        return xtensa_encode_result_ok;
    }
}
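The st field accessors form a pair whose correctness is easiest to confirm as a round trip. The sketch below restates them on a bare 32-bit word. The names st_get, st_set and st_fits are hypothetical, and the bit positions are an assumption read off the masks used by get_st_field: bits 11..8 hold the high nibble and bits 7..4 hold the low nibble of the 8-bit field.

```c
#include <stdint.h>

/* Read the 8-bit st field from an instruction word. */
static uint32_t st_get(uint32_t word)
{
    return (((word & 0xf00u) >> 8) << 4) | ((word & 0xf0u) >> 4);
}

/* Write the 8-bit st field, leaving all other bits of the word unchanged. */
static uint32_t st_set(uint32_t word, uint32_t val)
{
    word = (word & 0xfffff0ffu) | ((val & 0xf0u) << 4);  /* high nibble -> bits 11..8 */
    word = (word & 0xffffff0fu) | ((val & 0x0fu) << 4);  /* low nibble  -> bits 7..4  */
    return word;
}

/* Range check in the spirit of encode_st: the value must fit in 8 bits. */
static int st_fits(uint32_t val)
{
    return (val >> 8) == 0;
}
```

For any 8-bit value v, st_get(st_set(w, v)) returns v, and st_set touches only bits 11..4 of w. Writing 0xa5 into the RUR template word 0xe30000 gives 0xe30a50.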

static xtensa_operand_internal aor_operand = {
    'a', '>', get_r_field, set_r_field, encode_r, decode_r
};

static xtensa_operand_internal ais_operand = {
    'a', '<', get_s_field, set_s_field, encode_s, decode_s
};

static xtensa_operand_internal ait_operand = {
    'a', '<', get_t_field, set_t_field, encode_t, decode_t
};

static xtensa_operand_internal iisr_operand = {
    'i', '<', get_sr_field, set_sr_field, encode_sr, decode_sr
};

static xtensa_operand_internal iist_operand = {
    'i', '<', get_st_field, set_st_field, encode_st, decode_st
};

static xtensa_operand_internal *bs_operand_list[] = {
    &ais_operand
};

static xtensa_iclass_internal bs_iclass = {
    1,
    &bs_operand_list[0]
};

static xtensa_operand_internal *rur_operand_list[] = {
    &aor_operand,
    &iist_operand
};

static xtensa_iclass_internal rur_iclass = {
    2,
    &rur_operand_list[0]
};

static xtensa_operand_internal *wur_operand_list[] = {
    &ait_operand,
    &iisr_operand
};

static xtensa_iclass_internal wur_iclass = {
    2,
    &wur_operand_list[0]
};

static xtensa_insnbuf_word BYTESWAP_template[] = { 0x60000 };
static xtensa_opcode_internal BYTESWAP_opcode = {
    "byteswap",
    3,
    &BYTESWAP_template[0],
    &bs_iclass
};

static xtensa_insnbuf_word RUR_template[] = { 0xe30000 };
static xtensa_opcode_internal RUR_opcode = {
    "rur",
    3,
    &RUR_template[0],
    &rur_iclass
};

static xtensa_insnbuf_word WUR_template[] = { 0xf30000 };
static xtensa_opcode_internal WUR_opcode = {
    "wur",
    3,
    &WUR_template[0],
    &wur_iclass
};

static xtensa_opcode_internal *opcodes[] = {
    &BYTESWAP_opcode,
    &RUR_opcode,
    &WUR_opcode
};

xtensa_opcode_internal **get_opcodes() { return &opcodes[0]; }
const int get_num_opcodes() { return 3; }

#define xtensa_BYTESWAP_op 0
#define xtensa_RUR_op 1
#define xtensa_WUR_op 2

{
    if ((insn[0] & 0xff000f) == 0x60000) return xtensa_BYTESWAP_op;
    if ((insn[0] & 0xff000f) == 0xe30000) return xtensa_RUR_op;
    if ((insn[0] & 0xff000f) == 0xf30000) return xtensa_WUR_op;
    return XTENSA_UNDEFINED;
}
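The decode step above is a mask-and-compare match against each opcode template. A table-driven form of the same idea might look like the following sketch. The names decode_entry, decode_table and decode_word are hypothetical, the OP_* constants merely mirror the xtensa_*_op values, and the instruction is assumed to fit in the low 24 bits of a single 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_BYTESWAP = 0, OP_RUR = 1, OP_WUR = 2, OP_UNDEFINED = -1 };

/* One decode rule: the word matches when (word & mask) == match. */
typedef struct {
    uint32_t mask;
    uint32_t match;
    int opcode;
} decode_entry;

static const decode_entry decode_table[] = {
    { 0xff000fu, 0x060000u, OP_BYTESWAP },
    { 0xff000fu, 0xe30000u, OP_RUR },
    { 0xff000fu, 0xf30000u, OP_WUR },
};

/* Return the opcode number of the first matching rule, or OP_UNDEFINED. */
static int decode_word(uint32_t word)
{
    for (size_t i = 0; i < sizeof decode_table / sizeof decode_table[0]; i++)
        if ((word & decode_table[i].mask) == decode_table[i].match)
            return decode_table[i].opcode;
    return OP_UNDEFINED;
}
```

Because the mask 0xff000f ignores bits 15..4, operand fields such as st, s and t do not affect which opcode is recognised: 0xe30a50 still decodes as RUR.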

Annex I

typedef unsigned u32;
typedef struct u64str { unsigned int lo; unsigned int hi; } u64;

extern u32 state32(int i);
extern u64 state64(int i);
extern void set_state32(int i, u32 v);
extern void set_state64(int i, u64 v);
extern void set_ar(int i, u32 v);
extern u32 ar(int i);
extern void pc_incr(int i);
extern int aux32_fetchfirst(void);
extern void pipe_use_ifetch(int n);
extern void pipe_use_dcache(void);
extern void pipe_def_ifetch(int n);
extern int arcode(void);
extern void pipe_use(int n, int v, int i);
extern void pipe_def(int n, int v, int i);

struct state_tbl_entry {
    const char *name;
    int numbits;
};

#define STATE_ACC 0
#define STATE_SWAP 1
#define NUM_STATES 2

struct state_tbl_entry local_state_tbl[NUM_STATES + 1] = {
    { "ACC", 32 },
    { "SWAP", 1 },
    { "", 0 }
};

extern "C" struct state_tbl_entry *get_state_tbl(void);

struct state_tbl_entry *get_state_tbl(void)
{
    return &local_state_tbl[0];
}

/* constant table ai4const */
static const unsigned CONST_TBL_AI4CONST[] = {
    0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf
};

/* constant table b4const */
static const unsigned CONST_TBL_B4CONST[] = {
    0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100
};

/* constant table b4constu */
static const unsigned CONST_TBL_B4CONSTU[] = {
    0x8000, 0x10000, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100
};

/* constant table d01tab */
static const unsigned CONST_TBL_D01TAB[] = {
    0, 0x1
};

/* constant table d23tab */
static const unsigned CONST_TBL_D23TAB[] = {
    0x2, 0x3
};

/* constant table i4p1const */
static const unsigned CONST_TBL_I4P1CONST[] = {
    0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
    0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10
};

/* constant table mip32const */
static const unsigned CONST_TBL_MIP32CONST[] = {
    0x20, 0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19,
    0x18, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11,
    0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9,
    0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1
};
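Each table above appears to map a small encoded operand field to the immediate value it stands for. The following is only a sketch of such a lookup, assuming a 4-bit field is used directly as the index into the b4const table. The helper name `decode_b4const` and the masking step are illustrative assumptions and do not appear in the generated file.

```c
#include <stdint.h>

/* Copy of the b4const table from the listing above. */
static const uint32_t CONST_TBL_B4CONST[16] = {
    0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100
};

/* Hypothetical helper: use the low 4 bits of `field` as the table index. */
static uint32_t decode_b4const(uint32_t field)
{
    return CONST_TBL_B4CONST[field & 0xfu];
}
```

Under this assumption, field value 0 yields the all-ones word and field value 15 yields 0x100.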

void
BYTESWAP_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned ars = ar(_OPND0_);
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned ACC_ns;
    unsigned ars_swap;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) <<
        16) | ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));
    if (SWAP_ps) {
        _tmp0 = ars_swap;
    } else {
        _tmp0 = ars;
    }
    ACC_ns = ACC_ps + _tmp0;
    ACC = ACC_ns;
    set_state32(STATE_ACC, ACC);
    pc_incr(3);
}

void
RUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned arr;
    unsigned st = _OPND1_;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    if (st == 1) {
        _tmp0 = SWAP_ps;
    } else {
        _tmp0 = 0;
    }
    if (st == 0) {
        _tmp1 = ACC_ps;
    } else {
        _tmp1 = _tmp0;
    }
    arr = _tmp1;
    set_ar(_OPND0_, arr);
    pc_incr(3);
}

void
WUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned art = ar(_OPND0_);
    unsigned sr = _OPND1_;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned SWAP_ns;
    unsigned ACC_ns;
    unsigned ureg_sel_0;
    unsigned ureg_sel_1;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ureg_sel_0 = sr == 0;
    ureg_sel_1 = sr == 1;
    if (ureg_sel_0) {
        _tmp0 = art;
    } else {
        _tmp0 = ACC_ps;
    }
    ACC_ns = _tmp0;
    if (ureg_sel_1) {
        _tmp1 = (art & 0x1);
    } else {
        _tmp1 = (SWAP_ps & 0x1);
    }
    SWAP_ns = _tmp1;
    ACC = ACC_ns;
    SWAP = SWAP_ns;
    set_state32(STATE_ACC, ACC);
    set_state32(STATE_SWAP, SWAP);
    pc_incr(3);
}

void BYTESWAP_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    pipe_use(arcode(), op0, 1);
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def_ifetch(-1);
}

void RUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def(arcode(), op0, 2);
    pipe_def_ifetch(-1);
}

void WUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    pipe_use(arcode(), op0, 1);
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def_ifetch(-1);
}

typedef void (SEMFUNC)(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_);

struct isafunc_tbl_entry {
    const char *opname;
    SEMFUNC *semfn;
    SEMFUNC *schedfn;
};

static struct isafunc_tbl_entry local_fptr_tbl[] = {
    {"byteswap", BYTESWAP_func, BYTESWAP_sched},
    {"rur", RUR_func, RUR_sched},
    {"wur", WUR_func, WUR_sched},
    {"", 0, 0}
};

extern "C" struct isafunc_tbl_entry *get_isafunc_tbl(void);

struct isafunc_tbl_entry *get_isafunc_tbl(void)
{
    return &local_fptr_tbl[0];
}
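The function table above ends with an entry whose name is the empty string, so a consumer can walk it without knowing its length. The following sketch shows one way such a lookup by opcode name might be written. The names `find_isafunc`, `demo_tbl`, `demo_func` and `demo_sched` are hypothetical stand-ins introduced for illustration; how the simulator actually consumes the table is not stated in the listing.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned u32;
typedef void (SEMFUNC)(u32, u32, u32, u32);

struct isafunc_tbl_entry {
    const char *opname;
    SEMFUNC *semfn;
    SEMFUNC *schedfn;
};

/* Stand-in functions so the sketch builds without the simulator. */
static void demo_func(u32 a, u32 b, u32 c, u32 d) { (void)a; (void)b; (void)c; (void)d; }
static void demo_sched(u32 a, u32 b, u32 c, u32 d) { (void)a; (void)b; (void)c; (void)d; }

static struct isafunc_tbl_entry demo_tbl[] = {
    {"byteswap", demo_func, demo_sched},
    {"rur", demo_func, demo_sched},
    {"", 0, 0}   /* empty name terminates the table */
};

/* Hypothetical lookup: return the entry for `name`, or NULL if absent. */
static const struct isafunc_tbl_entry *
find_isafunc(const struct isafunc_tbl_entry *tbl, const char *name)
{
    for (; tbl->opname[0] != '\0'; ++tbl) {
        if (strcmp(tbl->opname, name) == 0)
            return tbl;
    }
    return NULL;
}
```

A caller would pass the pointer returned by `get_isafunc_tbl()` as `tbl` and check the result for NULL before calling `semfn` or `schedfn`.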

Annex J

/* Do not modify. This is generated automatically. */

#define BYTESWAP(ars) \
    ({ asm volatile("BYTESWAP %0" : : "a"(ars)); })

#define RUR(st) \
    ({ int arr; asm volatile("RUR %0,%1" : "=a"(arr) : "i"(st)); arr; })

#define WUR(art, sr) \
    ({ asm volatile("WUR %0,%1" : : "a"(art), "i"(sr)); })

Annex K

#ifdef TIE_DEBUG
#define BYTESWAP TIE_BYTESWAP
#define RUR TIE_RUR
#define WUR TIE_WUR
#endif

typedef unsigned u32;

#define STATE32_ACC 0
#define STATE_ACC STATE32_ACC
#define STATE32_SWAP 1
#define STATE_SWAP STATE32_SWAP
#define NUM_STATE32 2

static u32 state32_table[NUM_STATE32];
static char *state32_name_table[NUM_STATE32] = {
    "ACC",
    "SWAP"
};

static u32 state32(int rn) { return state32_table[rn]; }
static void set_state32(int rn, u32 s) { state32_table[rn] = s; }
static int num_state32(void) { return NUM_STATE32; }
static char *state32_name(int rn) { return state32_name_table[rn]; }

void
BYTESWAP(unsigned ars)
{
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned ACC_ns;
    unsigned ars_swap;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) <<
        16) | ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));
    if (SWAP_ps) {
        _tmp0 = ars_swap;
    } else {
        _tmp0 = ars;
    }
    ACC_ns = ACC_ps + _tmp0;
    ACC = ACC_ns;
    set_state32(STATE_ACC, ACC);
}

unsigned
RUR(unsigned st)
{
    unsigned arr;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    if (st == 1) {
        _tmp0 = SWAP_ps;
    } else {
        _tmp0 = 0;
    }
    if (st == 0) {
        _tmp1 = ACC_ps;
    } else {
        _tmp1 = _tmp0;
    }
    arr = _tmp1;
    return arr;
}

void
WUR(unsigned art, unsigned sr)
{
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned SWAP_ns;
    unsigned ACC_ns;
    unsigned ureg_sel_0;
    unsigned ureg_sel_1;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ureg_sel_0 = sr == 0;
    ureg_sel_1 = sr == 1;
    if (ureg_sel_0) {
        _tmp0 = art;
    } else {
        _tmp0 = ACC_ps;
    }
    ACC_ns = _tmp0;
    if (ureg_sel_1) {
        _tmp1 = (art & 0x1);
    } else {
        _tmp1 = (SWAP_ps & 0x1);
    }
    SWAP_ns = _tmp1;
    ACC = ACC_ns;
    SWAP = SWAP_ns;
    set_state32(STATE_ACC, ACC);
    set_state32(STATE_SWAP, SWAP);
}

#ifdef TIE_DEBUG
#undef BYTESWAP
#undef RUR
#undef WUR
#endif
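To see how the three native-mode functions of Annex K interact, it can help to trace a short instruction sequence. The following is a condensed sketch, not the generated file: it keeps the two states in plain variables, and the names `model_wur`, `model_byteswap`, `model_rur`, `swap_bytes` and `demo_sequence` are assumptions introduced for illustration.

```c
#include <stdint.h>

/* Condensed stand-in for the Annex K model: two user states. */
static uint32_t acc_state;   /* models ACC (32 bits) */
static uint32_t swap_state;  /* models SWAP (1 bit) */

/* Reverse the four bytes of a 32-bit word. */
static uint32_t swap_bytes(uint32_t v)
{
    return ((v & 0xffu) << 24) | (((v >> 8) & 0xffu) << 16) |
           (((v >> 16) & 0xffu) << 8) | ((v >> 24) & 0xffu);
}

/* BYTESWAP: accumulate ars, byte-reversed when SWAP is set. */
static void model_byteswap(uint32_t ars)
{
    acc_state += swap_state ? swap_bytes(ars) : ars;
}

/* RUR: st 0 reads ACC, st 1 reads SWAP, any other value reads 0. */
static uint32_t model_rur(uint32_t st)
{
    return st == 0 ? acc_state : (st == 1 ? swap_state : 0);
}

/* WUR: sr 0 writes ACC, sr 1 writes bit 0 of art to SWAP. */
static void model_wur(uint32_t art, uint32_t sr)
{
    if (sr == 0) acc_state = art;
    if (sr == 1) swap_state = art & 0x1u;
}

/* Hypothetical driver: clear ACC, enable swapping, accumulate two words. */
static uint32_t demo_sequence(void)
{
    model_wur(0, 0);
    model_wur(1, 1);
    model_byteswap(0x11223344u);  /* adds 0x44332211 */
    model_byteswap(0x00000001u);  /* adds 0x01000000 */
    return model_rur(0);
}
```

With SWAP set, the two operands are added in byte-reversed form, so the sequence leaves 0x45332211 in the accumulator.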

Annex L

// Do not modify this automatically generated file.

module tie_enflop(tie_out, tie_in, en, clk);
parameter size = 32;
output [size-1:0] tie_out;
input [size-1:0] tie_in;
input en;
input clk;
reg [size-1:0] tmp;
assign tie_out = tmp;
always @(posedge clk) begin
    if (en)
        tmp <= #1 tie_in;
end
endmodule

module tie_flop(tie_out, tie_in, clk);
parameter size = 32;
output [size-1:0] tie_out;
input [size-1:0] tie_in;
input clk;
reg [size-1:0] tmp;
assign tie_out = tmp;
always @(posedge clk) begin
    tmp <= #1 tie_in;
end
endmodule

parameter size = 32;
input [size-1:0] ns;    // next state
input we;               // write enable
input ke;               // Kill E state
input kp;               // Kill Pipeline
input vw;               // Valid W state
input clk;              // clock
output [size-1:0] ps;   // present state
wire [size-1:0] se;     // state at E stage
wire [size-1:0] sm;     // state at M stage
wire [size-1:0] sw;     // state at W stage
wire [size-1:0] sx;     // state at X stage
assign se = kp ? sx : ns;
assign ee = kp | we & ~ke;
assign ew = vw & ~kp;
assign ps = sm;
.clk(clk));
tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));
.clk(clk));
endmodule

module bs(ars, ACC_ps, SWAP_ps, ACC_ns, ACC_we, BYTESWAP);
input [31:0] ars;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
output [31:0] ACC_ns;
output ACC_we;
input BYTESWAP;
wire [31:0] ars_swap;
assign ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
assign ACC_ns = (ACC_ps) + ((SWAP_ps) ? (ars_swap) : (ars));
assign ACC_we = 1'b1 & BYTESWAP;
endmodule

module rur(arr, st, ACC_ps, SWAP_ps, RUR);
output [31:0] arr;
input [31:0] st;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
input RUR;
assign arr = ((st) == (8'd0)) ? (ACC_ps) : (((st) == (8'd1)) ?
    (SWAP_ps) : (32'b0));
endmodule

module wur(art, sr, ACC_ps, SWAP_ps, ACC_ns, ACC_we, SWAP_ns, SWAP_we, WUR);
input [31:0] art;
input [31:0] sr;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
output [31:0] ACC_ns;
output ACC_we;
output [0:0] SWAP_ns;
output SWAP_we;
input WUR;
wire ureg_sel_0;
assign ureg_sel_0 = (sr) == (8'h0);
wire ureg_sel_1;
assign ureg_sel_1 = (sr) == (8'h1);
assign ACC_ns = {(ureg_sel_0) ? (art[31:0]) : (ACC_ps[31:0])};
assign SWAP_ns = {(ureg_sel_1) ? (art[0:0]) : (SWAP_ps[0:0])};
assign ACC_we = 1'b1 & WUR;
assign SWAP_we = 1'b1 & WUR;
endmodule

module UserInstModule(clk, out_E, ars_E, art_E, inst_R, Kill_E,
    killPipe_W, valid_W, BYTESWAP_R, RUR_R, WUR_R, en_R);
input clk;
output [31:0] out_E;
input [31:0] ars_E;
input [31:0] art_E;
input [23:0] inst_R;
input en_R;
input Kill_E, killPipe_W, valid_W;
input BYTESWAP_R;
input RUR_R;
input WUR_R;
wire BYTESWAP_E;
wire RUR_E;
wire WUR_E;
wire [31:0] arr_E;
wire [31:0] sr_R, sr_E;
wire [31:0] st_R, st_E;
wire [31:0] ACC_ps, ACC_ns;
wire ACC_we;
wire [0:0] SWAP_ps, SWAP_ns;
wire SWAP_we;
wire [31:0] bs_ACC_ns;
wire bs_ACC_we;
wire bs_select;
wire [31:0] rur_arr;
wire rur_select;
wire [31:0] wur_ACC_ns;
wire wur_ACC_we;
wire [0:0] wur_SWAP_ns;
wire wur_SWAP_we;
wire wur_select;

tie_enflop #(1) fBYTESWAP(.tie_out(BYTESWAP_E), .tie_in(BYTESWAP_R),
    .en(en_R), .clk(clk));
tie_enflop #(1) fRUR(.tie_out(RUR_E), .tie_in(RUR_R), .en(en_R),
    .clk(clk));
tie_enflop #(1) fWUR(.tie_out(WUR_E), .tie_in(WUR_R), .en(en_R),
    .clk(clk));

assign sr_R = {{inst_R[11:8]}, {inst_R[15:12]}};
tie_enflop #(32) fsr(.tie_out(sr_E), .tie_in(sr_R), .en(en_R),
    .clk(clk));
assign st_R = {{inst_R[11:8]}, {inst_R[7:4]}};
tie_enflop #(32) fst(.tie_out(st_E), .tie_in(st_R), .en(en_R),
    .clk(clk));

bs ibs(
    .ars(ars_E),
    .ACC_ps(ACC_ps),
    .SWAP_ps(SWAP_ps),
    .ACC_ns(bs_ACC_ns),
    .ACC_we(bs_ACC_we),
    .BYTESWAP(BYTESWAP_E));

rur irur(
    .arr(rur_arr),
    .st(st_E),
    .ACC_ps(ACC_ps),
    .SWAP_ps(SWAP_ps),
    .RUR(RUR_E));

wur iwur(
    .art(art_E),
    .sr(sr_E),
    .ACC_ps(ACC_ps),
    .SWAP_ps(SWAP_ps),
    .ACC_ns(wur_ACC_ns),
    .ACC_we(wur_ACC_we),
    .SWAP_ns(wur_SWAP_ns),
    .SWAP_we(wur_SWAP_we),
    .WUR(WUR_E));

tie_athens_state #(32) iACC(
    .ns(ACC_ns),
    .we(ACC_we),
    .ke(Kill_E),
    .kp(killPipe_W),
    .vw(valid_W),
    .clk(clk),
    .ps(ACC_ps));

tie_athens_state #(1) iSWAP(
    .ns(SWAP_ns),
    .we(SWAP_we),
    .ke(Kill_E),
    .kp(killPipe_W),
    .vw(valid_W),
    .clk(clk),
    .ps(SWAP_ps));

assign bs_select = BYTESWAP_E;
assign rur_select = RUR_E;
assign wur_select = WUR_E;

assign arr_E = {32{1'b0}} & {32{bs_select}}
    | rur_arr & {32{rur_select}}
    | {32{1'b0}} & {32{wur_select}};
assign out_E = arr_E;

assign ACC_ns = bs_ACC_ns & {32{bs_select}}
    | {32{1'b0}} & {32{rur_select}}
    | wur_ACC_ns & {32{wur_select}};
assign ACC_we = bs_ACC_we & bs_select
    | 1'b0 & rur_select
    | wur_ACC_we & wur_select;

assign SWAP_ns = {1{1'b0}} & {1{bs_select}}
    | {1{1'b0}} & {1{rur_select}}
    | wur_SWAP_ns & {1{wur_select}};
assign SWAP_we = 1'b0 & bs_select
    | 1'b0 & rur_select
    | wur_SWAP_we & wur_select;
endmodule
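The top-level module merges the per-instruction results with an AND-OR structure: each candidate value is masked by its replicated select bit and the masked values are ORed together. A C sketch of the ACC_ns selection and of the st operand extraction follows. The function names `rep32`, `acc_next` and `decode_st` are illustrative assumptions, and the sketch assumes at most one select is active at a time, which the listing itself does not state.

```c
#include <stdint.h>

/* Replicate a 1-bit select across 32 bits, like Verilog {32{sel}}. */
static uint32_t rep32(unsigned sel)
{
    return (sel & 1u) ? 0xffffffffu : 0u;
}

/* Model of the ACC_ns AND-OR mux in UserInstModule. */
static uint32_t acc_next(uint32_t bs_ns, uint32_t wur_ns,
                         unsigned bs_sel, unsigned rur_sel, unsigned wur_sel)
{
    return (bs_ns & rep32(bs_sel)) |
           (0u & rep32(rur_sel)) |      /* RUR contributes zero to ACC_ns */
           (wur_ns & rep32(wur_sel));
}

/* Model of st_R = {inst_R[11:8], inst_R[7:4]} from the instruction word. */
static uint32_t decode_st(uint32_t inst)
{
    return (((inst >> 8) & 0xfu) << 4) | ((inst >> 4) & 0xfu);
}
```

When no select is active the mux yields zero, which is harmless in the listing because the matching write enable is then also zero.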

Annex M

/*
 You need to fill in the necessary information for this section
*/

/* Set the search path to include the library directories */
SYNOPSYS = get_unix_variable("SYNOPSYS")
search_path = SYNOPSYS + /libraries/syn

/* Set the path and name of target library */
search_path = <...> + search_path
target_library = <name of the library>

/* Constraint information */
OPERATING_CONDITION = <name of the operating condition>
WIRE_LOAD = <name of the wire-load model>
BOUNDARY_LOAD = <library name>/<smallest inverter name>/<input pin name>
DRIVE_CELL = <a large FF name>
DRIVE_PIN = <Q pin name of the FF>
DRIVE_PIN_FROM = <clock pin name of the FF>

/* target processor clock period */
CLOCK_PERIOD = <target clock period>

/*
 You don't need to make any changes below
*/

link_library = {"*"} + target_library
symbol_library = generic.sdb

/* prepare workdir for hdl compiler */
hdlin_auto_save_templates = "TRUE"
define_design_lib WORK -path workdir
sh mkdir -p workdir

read -f verilog ./prim.v
read -f verilog ./ROOT.v
current_design UserInstModule
link

set_operating_conditions OPERATING_CONDITION
set_wire_load WIRE_LOAD
create_clock clk -period CLOCK_PERIOD
set_dont_touch_network clk
set_load {2 * load_of(BOUNDARY_LOAD)} all_outputs()
set_load {2 * load_of(BOUNDARY_LOAD)} all_inputs()
set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs()
set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to find(clock, clk)
set_max_delay 0.5 * CLOCK_PERIOD -from find(clock, clk) -to all_outputs()
set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to all_outputs()
set_drive -rise 0 clk
set_drive -fall 0 clk

compile -ungroup_all
report_timing
report_constraint -all_viol
report_area

Claims

1. A system for designing a configurable processor, the system comprising:

means for generating a description of a hardware implementation of a processor based on a configuration specification, wherein the configuration specification includes: a binary selection portion for determining whether certain features are included in the processor; and a parameter selection portion for specifying parameters of certain predetermined features; and

means for generating software development tools specific to the hardware implementation based on the configuration specification,

wherein the configuration specification includes at least one extension specification of an extensible feature of the processor, and the extension specification specifies the inclusion of a user-defined instruction and an implementation of that instruction.

2. The system of claim 1, wherein the software development tool is used to generate code to run on the processor.

3. The system of claim 1, wherein the software development tool includes a compiler, adapted to the configuration specification, for compiling the application into code executable by the processor.

4. The system of claim 1, wherein the software development tool includes an assembler program adapted to the configuration specification for assembling the application into code executable by the processor.

5. The system of claim 1, wherein the software development tool includes a linker, adapted to the configuration specification, for linking code executable by the processor.

6. The system of claim 1, wherein the software development tool includes a disassembler adapted to the configuration specification for disassembling code executable by the processor.

7. The system of claim 1, wherein the software development tool includes a debugger adapted to the configuration specification for debugging code executable by the processor.

8. The system of claim 7, wherein the debugger has a common interface and configuration for the instruction set simulator and the hardware implementation.

9. The system of claim 1, wherein the software development tool includes an instruction set simulator adapted to the configuration specification for simulating code executable by the processor.

10. The system of claim 9, wherein the instruction set simulator is capable of simulating the execution of the simulated code to measure one or more performance specifications including execution cycles.

11. The system of claim 10, wherein the performance specification is based on specific configurable microarchitectural features.

12. The system of claim 10, wherein the instruction set simulator is capable of profiling the execution of the simulated program to record standard profiling statistics including the number of cycles executed in each simulated function.

13. The system of claim 1, wherein the hardware implementation description includes at least one of the following: a detailed HDL hardware implementation description; a synthesis script; a place-and-route script; a programmable logic device script; test benches; diagnostic tests for verification; scripts for running the diagnostic tests on a simulator; and test tools.

14. The system according to claim 1, wherein the means for generating a hardware implementation description comprises:

means for generating a hardware description language description of a hardware implementation description from a configuration specification;

means for synthesizing logic for a hardware implementation based on a hardware description language description; and

means for placing and routing components on a chip to form a circuit based on the synthesized logic.

15. The system according to claim 14, the means for generating a hardware implementation description further comprising:

means for verifying the timing of the circuit; and

means for determining the area, cycle time, and power consumption of the circuit.

16. The system of claim 1, further comprising means for generating the configuration specification.

17. The system of claim 16, wherein the means for generating the configuration specification is responsive to selection of configuration parameters by a user.

18. The system of claim 16, wherein the means for generating a configuration specification is for generating a specification based on a processor design target.

19. The system of claim 1, wherein the configuration specification includes at least one parametric specification of a modifiable characteristic of the processor.

20. The system of claim 19, wherein at least one parameter specification specifies inclusion of a functional unit, and at least one processor instruction to execute the functional unit.

21. The system of claim 19, wherein at least one parameter specification specifies one of inclusion, exclusion, and characteristics of a structure that affects processor state.

22. The system of claim 21, wherein the structure is a register file and the parameter specification specifies the number of registers in the register file.

23. The system of claim 21, wherein the structure is an instruction cache.

24. The system of claim 21, wherein the structure is a data cache.

25. The system of claim 21, wherein the structure is a write cache.

26. The system of claim 21, wherein the structure is one of on-chip ROM and on-chip RAM.

27. The system of claim 19, wherein at least one parameter specification specifies a semantic property that controls interpretation of at least one of data and instructions in the processor.

28. The system of claim 19, wherein at least one parameter specification specifies an execution characteristic that controls execution of instructions in the processor.

29. The system of claim 19, wherein at least one parameter specifies a debug characteristic of a given processor.

30. The system of claim 19, wherein the configuration specification includes a parameter specification specifying at least one selected from among predetermined features, size or number of processor elements, and assignment of values.

31. The system of claim 1, further comprising means for evaluating the applicability of the configuration specification.

32. The system of claim 31, wherein the means for evaluating comprises an interactive evaluation tool.

33. The system of claim 31, wherein the means for evaluating is to evaluate hardware characteristics of the processor described by the configuration specification.

34. The system of claim 31, wherein the means for evaluating is to evaluate applicability of the configuration specification based on the evaluated performance characteristics of the processor.

35. A system according to claim 34, further comprising means for providing information which modifies the configuration specification based on the evaluated performance characteristics.

36. The system of claim 34, wherein the performance characteristics include at least one of an area required to implement the processor on a chip, power consumed by the processor, and a clock speed of the processor.

37. The system of claim 31, wherein the means for evaluating is to evaluate the applicability of the configuration specification based on the evaluated software characteristics of the processor.

38. The system of claim 37, wherein the means for evaluating interactively provides a suitability assessment to the user by evaluating at least one of the code size and the number of cycles required to execute a suite of benchmark programs on the processor described by the configuration specification.

39. The system according to claim 31, wherein the means for evaluating evaluates each hardware characteristic and each software characteristic of the processor described by the configuration specification.

40. The system of claim 1, wherein the means for generating a description of a hardware implementation of a processor also provides performance and cost characteristics of the hardware, and the means for generating software development tools, together with the means for generating the processor hardware implementation description, generates software application performance information for use in modifying the configuration specification.

41. The system of claim 1, wherein the means for generating a description of a hardware implementation of a processor also simultaneously provides performance and cost characteristics of the hardware, and the means for generating software development tools, together with the means for generating the processor hardware implementation description, generates software application performance information for use in extending the configuration specification.

42. The system of claim 1, wherein the means for generating a hardware implementation description of a processor also simultaneously provides performance and cost characteristics of the hardware, and the means for generating software development tools, together with the means for generating the processor hardware implementation description, generates software application performance information to facilitate describing the configuration specification; and the means for generating a hardware implementation description of a processor provides performance and cost characteristics of the hardware, and the means for generating software development tools, together with the means for generating the processor hardware implementation description, generates software application performance information for describing extensions to the configuration specification.

43. The system of claim 1, further comprising means for generating a configuration of processors by extending a basic configuration of processors.

44. The system of claim 1, wherein the extended specification specifies additional instructions.

45. The system of claim 1, wherein the means for generating a software development tool comprises means for suggesting to a user possible user-defined instructions suitable for at least one application.

46. The system of claim 1, wherein the software development tool includes a compiler to generate user-defined instructions.

47. The system of claim 46, wherein the compiler is capable of optimizing code containing user-defined instructions.

48. The system of claim 1, wherein the software development tools include at least one of: an assembler capable of generating user-defined instructions; a simulator capable of simulating execution of user code using user-defined instructions; and a tool capable of validating user implementations of user-defined instructions.

49. The system of claim 46, wherein the compiler is capable of automatically generating additional instructions.

50. The system of claim 1, wherein:

an extension specification specifies a new feature whose functionality is designed by the user in abstract form; and

the means for generating the hardware implementation description also refines the new feature and integrates it into the detailed hardware implementation description.

51. The system of claim 50, wherein the extension specification is a statement in an instruction set architecture language that specifies an opcode assignment and instruction semantics.

52. The system of claim 51, wherein the means for generating a hardware implementation description comprises means for generating instruction decode logic from an instruction set architecture language definition.
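As an illustration of generating instruction decode logic from an instruction set definition, the sketch below emits one Verilog-style `assign` line per instruction. The table format (instruction name mapped to a bit field and a required value), the signal name `inst`, and the function name are assumptions made for this example; they are not the format of the instruction set architecture language referred to in the claims.

```python
def generate_decode_logic(opcodes):
    """Emit one decode signal per instruction.

    `opcodes` maps an instruction name to (hi, lo, value): the instruction
    is recognized when bits [hi:lo] of the instruction word equal `value`.
    This table layout is hypothetical.
    """
    lines = []
    for name, (hi, lo, value) in sorted(opcodes.items()):
        width = hi - lo + 1  # number of bits in the opcode field
        lines.append(
            f"assign {name} = (inst[{hi}:{lo}] == {width}'h{value:x});"
        )
    return lines

# Example: a hypothetical user-defined instruction with a 4-bit opcode field.
for line in generate_decode_logic({"byteswap": (3, 0, 0xE)}):
    print(line)
```

The same table could in principle also drive the assembler and simulator generators, which is the kind of single-source reuse the claims describe.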

53. The system of claim 52, wherein the means for generating a hardware implementation description further comprises means for generating, from the instruction set architecture language definition, signals specifying register operand usage for instruction interlock and stall logic.

54. The system of claim 50, wherein the means for generating a software development tool includes means for generating an instruction decoding method for use in an instruction set simulation program adapted to a configuration specification.

55. The system of claim 50, wherein the means for generating a software development tool includes means for generating an encoding table for use in an assembler that generates object code for the processor adapted to the configuration specification.

56. The system of claim 50, wherein the means for generating a hardware implementation description is further configured to generate a hardware description of a data path for the new feature, the data path hardware being consistent with the particular pipeline architecture of the processor.

57. The system of claim 44, wherein additional instructions add no new state to the processor.

58. The system of claim 44, wherein additional instructions add state to the processor.

59. The system of claim 1, wherein the configuration description includes at least a portion specified by an instruction set architecture description language description.

60. The system of claim 59, wherein the means for generating a hardware implementation description comprises means for automatically generating instruction decode logic from an instruction set architecture language description.

61. The system of claim 59, wherein the means for generating a software development tool includes means for automatically generating an assembler kernel from an instruction set architecture language description.

62. The system of claim 59, wherein the means for generating a software development tool includes means for automatically generating a compiler from an instruction set architecture language description.

63. The system of claim 59, wherein the means for generating a software development tool includes means for automatically generating a disassembler from an instruction set architecture language description.

64. The system of claim 59, wherein the means for generating a software development tool includes means for automatically generating an instruction set simulation program from an instruction set architecture language description.

65. The system of claim 1, wherein at least one of the means for generating a hardware implementation description and the means for generating software development tools includes means for preprocessing a portion of the hardware implementation description or of the software development tools, respectively, so as to modify it according to the configuration specification.

66. The system of claim 65, wherein the means for preprocessing evaluates, based on the configuration specification, an expression in one of the hardware implementation description and the software development tools, and replaces the expression with a value.

67. The system of claim 66, wherein the expression includes at least one of an iteration structure, a conditional structure, and a database query.
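The preprocessing described in claims 65 to 67 can be pictured as template expansion driven by the configuration specification. The following minimal sketch handles a conditional structure and expression substitution only (iteration and database queries are omitted). The `;if` / `;endif` directive syntax, the `{...}` expression markers, and the configuration keys are all hypothetical and chosen for this example.

```python
import re

# Hypothetical configuration specification; keys and values are illustrative.
CONFIG = {"num_interrupts": 3, "has_multiplier": True}

TEMPLATE = """wire [{num_interrupts - 1}:0] irq;
;if has_multiplier
mult16 m0(a, b, p);
;endif
;if num_interrupts > 8
irq_expander x0(irq);
;endif"""

def preprocess(template: str, config: dict) -> str:
    """Expand ';if EXPR' / ';endif' blocks and replace '{EXPR}' with its value.

    eval() is used for brevity and is not safe for untrusted templates.
    """
    out = []
    keep = [True]  # stack of "are we inside an enabled block?"
    for line in template.splitlines():
        stripped = line.strip()
        if stripped.startswith(";if "):
            cond = bool(eval(stripped[4:], {}, dict(config)))
            keep.append(keep[-1] and cond)
        elif stripped == ";endif":
            keep.pop()
        elif keep[-1]:
            # Replace each {expression} with its value under the configuration.
            out.append(re.sub(
                r"\{([^{}]+)\}",
                lambda m: str(eval(m.group(1), {}, dict(config))),
                line,
            ))
    return "\n".join(out)

print(preprocess(TEMPLATE, CONFIG))
```

With the configuration shown, the multiplier instance is kept, the interrupt expander is dropped, and the bus width expression is replaced by a number.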

68. The system of claim 1, wherein the configuration specification includes at least one parameter specification specifying modifiable characteristics of the processor.

69. The system of claim 68, wherein the modifiable feature is one of a modification to the core specification and an optional feature not specified in the core specification.

70. The system of claim 1, wherein the configuration specification includes at least one parameter specification specifying a binary selectable characteristic of the processor and at least one processor characteristic that can be specified with parameters.

71. A method for designing a configurable processor, the method comprising:

generating a description of a hardware implementation of a processor from a configuration specification, wherein the configuration specification includes a binary selection portion for determining whether certain features are included in the processor and a parameter selection portion for specifying parameters of certain predetermined features of the processor; and

generating software development tools specific to the hardware implementation according to the configuration specification.
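A configuration specification with a binary selection portion and a parameter selection portion can be modeled as two small tables. This is a sketch under stated assumptions: the option names, parameter names, and allowed values below are invented for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, field

# Hypothetical catalog of what may be configured.
ALLOWED_OPTIONS = {"mac16", "debug_port"}            # binary: include or not
ALLOWED_PARAMS = {                                   # parameterized features
    "icache_bytes": {1024, 2048, 4096, 8192, 16384},
    "num_interrupts": set(range(1, 33)),
}

@dataclass(frozen=True)
class ConfigSpec:
    options: dict = field(default_factory=dict)  # binary selection portion
    params: dict = field(default_factory=dict)   # parameter selection portion

def validate(spec: ConfigSpec) -> list:
    """Return a list of error strings; an empty list means the spec is valid."""
    errors = []
    for name, enabled in spec.options.items():
        if name not in ALLOWED_OPTIONS:
            errors.append(f"unknown option: {name}")
        elif not isinstance(enabled, bool):
            errors.append(f"option {name} must be True or False")
    for name, value in spec.params.items():
        if name not in ALLOWED_PARAMS:
            errors.append(f"unknown parameter: {name}")
        elif value not in ALLOWED_PARAMS[name]:
            errors.append(f"bad value for {name}: {value}")
    return errors
```

Both the hardware description generator and the tool generator would then read the same validated `ConfigSpec`, which keeps the two outputs consistent with each other.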

72. A system for designing a configurable processor, the system comprising:

means for generating a configuration specification having a user-definable portion, the user-definable portion comprising:

a specification of user-defined processor state, and

at least one user-defined instruction and an associated user-defined function that includes at least one of reading from and writing to a user-defined processor state; and

Means for generating a hardware implementation description of a processor based on a configuration specification, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing user-defined instructions.

73. The system of claim 72, wherein the hardware implementation description of the processor includes a description of the control logic required to execute at least one user-defined instruction and to implement the user-defined processor state.

74. The system of claim 73, wherein:

the hardware implementation description of the processor describes an instruction execution pipeline; and

Control logic includes portions associated with each stage of the pipeline for instruction execution.

75. The system of claim 74, wherein:

the hardware implementation description includes a description of the circuitry used to abort execution of instructions; and

the control logic includes circuitry for preventing modification of the user-defined state by an aborted instruction.

76. The system of claim 75, wherein the control logic includes circuitry for performing at least one of instruction issue, operand bypass, and operand write enable for at least one user-defined instruction.

77. The system of claim 74, wherein the hardware implementation description includes registers for implementing user-defined states in a number of stages of a pipeline of instruction execution.

78. The system of claim 74, wherein:

the hardware implementation description includes state registers that are written in a pipeline stage different from the pipeline stage in which each output operand is generated; and

the hardware implementation description specifies bypassing such writes to subsequent instructions that reference the user-defined processor state before the write to the state register is committed.
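The behavior in claims 75 and 78 (a state write that commits in a later pipeline stage, is bypassed to younger instructions in the meantime, and is discarded if the writing instruction is aborted) can be modeled in software. This is a behavioral sketch only; the class name, the fixed two-cycle commit latency, and the method names are assumptions for illustration, and for simplicity `kill` discards every uncommitted write.

```python
class BypassedState:
    """Toy model of one user-defined state register with write bypass."""

    def __init__(self, value=0, latency=2):
        self.committed = value   # architecturally visible value
        self.latency = latency   # cycles between result and commit (assumed)
        self.pending = []        # [cycles_left, value] entries, oldest first

    def write(self, value):
        """An instruction produced a new value that has not yet committed."""
        self.pending.append([self.latency, value])

    def read(self):
        """A younger instruction reads: bypass the newest pending write."""
        return self.pending[-1][1] if self.pending else self.committed

    def tick(self):
        """Advance one cycle and commit writes that reached the commit stage."""
        for entry in self.pending:
            entry[0] -= 1
        while self.pending and self.pending[0][0] <= 0:
            self.committed = self.pending.pop(0)[1]

    def kill(self):
        """Abort in-flight instructions: uncommitted writes never take effect."""
        self.pending.clear()
```

For example, after `s = BypassedState(); s.write(5)`, a call to `s.read()` returns the bypassed value while `s.committed` is still the old value until two `tick()` calls have elapsed.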

79. The system of claim 72, wherein:

the configuration description includes a predetermined section other than the user-defined section; and

The predetermined portion of the description includes an instruction that facilitates storing a user-defined state into memory, and an instruction that facilitates retrieving the user-defined state from memory.

80. The system of claim 79, further comprising means for generating software for context switching a user-defined state using said instructions that facilitate storing the user-defined state in memory.

81. The system of claim 72, further comprising means for generating at least one of:

an assembler program for assembling user-defined processor states and at least one user-defined instruction;

a compiler for compiling user-defined processor states and at least one user-defined instruction;

a simulation program that simulates a user-defined processor state and at least one user-defined instruction; and

A debugger for debugging user-defined processor states and at least one user-defined instruction.

82. The system of claim 72, further comprising means for generating: an assembler for assembling user-defined processor states and at least one user-defined instruction; a compiler for compiling user-defined processor states and at least one user-defined instruction; a simulator for simulating user-defined processor states and at least one user-defined instruction; and a debugger for debugging user-defined processor states and at least one user-defined instruction.

83. The system of claim 72, wherein the user-defined portion of the specification includes at least one statement specifying a size and an index of the user-defined state.

84. The system of claim 83, wherein the user-defined portion of the specification includes at least one attribute, associated with the user-defined state, that specifies the packing of the user-defined state in processor registers.

85. The system of claim 72, wherein the user-defined portion of the specification includes at least one statement specifying a mapping of user-defined states to processor registers.

86. The system of claim 72, wherein the means for generating a hardware implementation description comprises means for automatically mapping user-defined states to registers of the processor.

87. The system of claim 72, wherein the user-defined portion of the description includes at least one statement to describe a type of user-defined instruction and its effect on a user-defined state.

88. The system of claim 72, wherein the user-defined portion of the description includes at least one assignment statement to assign a value to the user-defined state.

89. A system for designing a configurable processor, the system comprising:

core software tools for generating software development tools specific to an instruction set architecture specification from that specification; and

a user-defined instruction module for generating, from a user-defined instruction specification, at least one module for use by the core software tools in implementing the user-defined instruction, wherein the hardware implementation of the configurable processor includes a user-defined instruction execution unit for executing the user-defined instruction.

90. The system of claim 89, wherein the core software tools include software tools capable of generating code to run on the processor.

91. The system of claim 89, wherein at least one module is implemented as a dynamic link library.

92. The system of claim 89, wherein at least one module is implemented as a table.

93. The system of claim 89, wherein the core software tools include a compiler that uses user-defined instruction modules for compiling the application into code that uses the user-defined instructions and is executable by the processor.

94. The system of claim 93, wherein at least one module includes a module used by a compiler to compile user-defined instructions.

95. The system of claim 89, wherein the core software tools include an assembler that uses user-defined modules for assembling the application into code that uses user-defined instructions and that is executable by the processor.

96. The system of claim 95, wherein at least one module includes a module used by an assembler to map assembly language instructions into user-defined instructions.

97. The system of claim 96, wherein:

the system also includes a core instruction set description for non-user-defined instructions; and

the core instruction set description is used by the assembler to assemble the application into code that can be executed by the processor.

98. The system of claim 89, wherein the core software tool includes an instruction set simulator for simulating code executable by the processor.

99. The system of claim 98, wherein at least one module includes a simulator module used by the simulator to simulate execution of user-defined instructions.

100. The system of claim 99, wherein the modules used by the simulation program include data for decoding user-defined instructions.

101. The system of claim 100, wherein the simulator uses the simulator module to decode an instruction when the instruction cannot be decoded as a predefined instruction.
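The fallback decoding in claims 99 to 101 can be sketched as a two-step lookup: the simulator first tries its predefined instruction table and consults the user-defined module only when that fails. The opcode values, instruction names, and function names here are hypothetical.

```python
from typing import Callable, Optional

def core_decode(word: int) -> Optional[str]:
    """Decode predefined (core) instructions; returns None if unknown."""
    table = {0x0: "add", 0x1: "sub"}   # illustrative core opcodes
    return table.get(word & 0xF)

def user_decode(word: int) -> Optional[str]:
    """Stand-in for the generated user-defined module (for example a DLL)."""
    table = {0xE: "byteswap"}          # illustrative user-defined opcode
    return table.get(word & 0xF)

def make_simulator_decoder(
    user_module: Callable[[int], Optional[str]],
) -> Callable[[int], str]:
    """Build a decoder that falls back to the user module on a core miss."""
    def decode(word: int) -> str:
        name = core_decode(word)
        if name is None:               # not a predefined instruction
            name = user_module(word)
        if name is None:
            raise ValueError(f"illegal instruction {word:#x}")
        return name
    return decode

decode = make_simulator_decoder(user_decode)
```

Because the user module is passed in as a parameter, the same core simulator can be paired with different sets of user-defined instructions without being rebuilt.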

102. The system of claim 89, wherein the core software tools include a debugger that uses user-defined modules to debug code that uses user-defined instructions and is executable by the processor.

103. The system of claim 102, wherein at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

104. The system of claim 102, wherein at least one module includes a module used by the debugger to convert assembly instructions to strings.

105. The system of claim 102, wherein:

the core software tools include an instruction set simulator that simulates code executable by the processor; and

the debugger communicates with the simulator to obtain information about user-defined states for debugging.

106. The system of claim 89, wherein a single user-defined instruction can be used without modification by multiple core software tools according to different core instruction set specifications.

107. A system for designing a configurable processor, the system comprising:

core software tools for generating, from an instruction set architecture specification, software development tools specific to that specification;

a user-defined instruction module for generating, from a user-defined instruction specification, a group of at least one module that is used by the core software tools to implement each user-defined instruction, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions; and

storage means for simultaneously storing the groups generated by the user-defined instruction module, each group corresponding to a different set of user-defined instructions.
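Storing several groups of generated modules side by side, one group per set of user-defined instructions, lets a user switch between candidate instruction sets without regenerating anything. The sketch below is one possible shape for such storage; the class name, the keys, and the file names are hypothetical.

```python
class ModuleStore:
    """Holds several groups of generated modules at once (illustrative only)."""

    def __init__(self):
        self._groups = {}    # instruction-set name -> {tool name: module}
        self._active = None

    def add(self, name, modules):
        """Store the group generated for one set of user-defined instructions."""
        self._groups[name] = dict(modules)

    def select(self, name):
        """Make one stored group the active one for all core tools."""
        if name not in self._groups:
            raise KeyError(name)
        self._active = name

    def module(self, tool):
        """Return the module a given core tool should load right now."""
        return self._groups[self._active][tool]

store = ModuleStore()
store.add("crypto_ext", {"simulator": "crypto_sim.dll", "assembler": "crypto_asm.tbl"})
store.add("dsp_ext", {"simulator": "dsp_sim.dll", "assembler": "dsp_asm.tbl"})
store.select("dsp_ext")
```

Switching the active group is then a single `select` call, which fits the stated aim of quickly evaluating and switching between instruction sets.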

108. The system of claim 107, wherein at least one module is implemented as a dynamic link library.

109. The system of claim 107, wherein at least one module is implemented as a table.

110. The system of claim 107, wherein the core software tools include a compiler that uses user-defined instruction modules for compiling an application into code executable by the processor.

111. The system of claim 110, wherein at least one module comprises a module used by a compiler to compile user-defined instructions.

112. The system of claim 107, wherein the core software tools include an assembler that uses user-defined instruction modules for assembling the application into code that uses the user-defined instructions and that is executable by the processor.

113. The system of claim 112, wherein at least one module comprises a module used by an assembler to map assembly language instructions to user-defined instructions.

114. The system of claim 107, wherein the core software tool includes an instruction set simulator for simulating code executable by the processor.

115. The system of claim 114, wherein at least one module includes a module used by the simulation program to simulate execution of user-defined instructions.

116. The system of claim 115, wherein the modules used by the simulation program include data for decoding user-defined instructions.

117. The system of claim 116, wherein the simulator uses the simulator module to decode an instruction when the instruction cannot be decoded as a predefined instruction.

118. The system of claim 107, wherein the core software tools include a debugger that uses user-defined modules to debug code that uses user-defined instructions and is executable by the processor.

119. The system of claim 118, wherein at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

120. The system of claim 118, wherein at least one module includes a module used by the debugger to convert assembly instructions to strings.

121. A system for designing a configurable processor, the system comprising:

a set of core software tools for generating, from an instruction set architecture specification, software development tools specific to that specification; and

a user-defined instruction module for generating, from a user-defined instruction set specification, at least one module that is used by the set of core software tools to implement the user-defined instructions, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions.

122. The system of claim 121, wherein at least one module is implemented as a dynamic link library.

123. The system of claim 121, wherein at least one module is implemented as a table.

124. The system of claim 121, wherein at least one set of core software tools includes a compiler that uses user-defined instruction modules for compiling an application into code that uses the user-defined instructions and is executable by the processor.

125. The system of claim 124, wherein at least one module comprises a module used by a compiler to compile user-defined instructions.

126. The system of claim 121, wherein at least one set of core software tools includes an assembler that uses user-defined instruction modules for assembling an application into code executable by the processor.

127. The system of claim 126, wherein at least one module includes a module used by an assembler to map assembly language instructions to user-defined instructions.

128. The system of claim 121, wherein at least one set of core software tools includes an instruction set simulator for simulating code executable by the processor.

129. The system of claim 128, wherein at least one module includes a module used by the simulation program to simulate execution of user-defined instructions.

130. The system of claim 129, wherein the modules used by the simulation program include data for decoding user-defined instructions.

131. The system of claim 130, wherein the simulator uses the simulator module to decode an instruction when the instruction cannot be decoded as a predefined instruction.

132. The system of claim 121, wherein at least one set of core software tools includes a debugger that uses user-defined modules to debug code executable by the processor using user-defined instructions.

133. The system of claim 132, wherein at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

134. The system of claim 132, wherein at least one module includes a module used by the debugger to convert assembly instructions to character strings.