CN107832083B

CN107832083B - Microprocessor with conditional instruction and processing method thereof

Info

Publication number: CN107832083B
Application number: CN201711069237.5A
Authority: CN
Inventors: G. Glenn Henry; Terry Parks; Rodney E. Hooker
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2011-04-07
Filing date: 2012-04-09
Publication date: 2020-06-12
Anticipated expiration: 2032-04-09
Also published as: CN107832083A; CN102707988B; CN103218203A; CN102707988A; CN103218203B

Abstract

A microprocessor has an instruction set architecture that defines an instruction with an immediate field. The immediate field has a first part specifying a first value and a second part specifying a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands, where the fixed value is obtained by rotating/shifting the first value by a number of bits based on the second value. The microprocessor includes: an instruction translator, which translates the instruction into at least one immediate ALU microinstruction, wherein the immediate ALU microinstruction is encoded differently from the instruction encoding defined by the instruction set architecture; and an execution pipeline, which executes the microinstructions generated by the instruction translator to generate a result defined by the instruction set architecture. The instruction translator, rather than the execution pipeline, generates the fixed value from the first value and the second value and supplies it as a source operand of the immediate ALU microinstruction for execution by the execution pipeline.

Description

Microprocessor with conditional instruction and processing method thereof

This application is a divisional of the application entitled "Microprocessor with Conditional Instructions and Processing Method Thereof", filed on April 9, 2012 with application number 201610126292.2 (the original application of which was filed on April 9, 2012 with application number 201210102141.5).

Technical Field

The present invention relates to the technical field of microprocessors, and in particular to microprocessors whose instruction set includes conditional instructions.

Background

The x86 processor architecture, developed by Intel Corporation of Santa Clara, California, and the Advanced RISC Machines (ARM) architecture, developed by ARM Ltd. of Cambridge, UK, are two well-known processor architectures in the computing field. Many computer systems that use ARM or x86 processors exist, and demand for them is growing rapidly. Today, ARM architecture processing cores dominate the low-power, low-cost segment of the computing market, such as mobile phones, handheld electronics, tablet computers, network routers and hubs, and set-top boxes. For example, the primary processing power of the Apple iPhone and iPad is supplied by an ARM architecture processing core. x86 architecture processors, on the other hand, dominate the higher-cost segment that requires higher performance, such as laptops, desktops, and servers. However, as the performance of ARM cores increases and the power consumption and cost of some x86 processors improve, the line between the two segments is blurring. In the mobile computing market, such as smartphones, the two architectures already compete fiercely. In the laptop, desktop, and server markets, more frequent competition between the two architectures can be expected.

This competition leaves computing device manufacturers and consumers in a dilemma, because there is no way to tell which architecture will dominate the market or, more precisely, for which architecture software developers will write more software. For example, some large consumers that purchase large numbers of computer systems every month or every year tend to buy systems with the same configuration for reasons of cost efficiency, such as volume discounts and simpler system maintenance. However, the users within these large entities often have diverse computing needs for these identically configured systems. Specifically, some users want to run programs on an ARM architecture processor, others want to run programs on an x86 architecture processor, and some want to run programs on both. In addition, new and unanticipated computing needs may arise that call for the other architecture. In these situations, part of the investment made by these large entities is wasted.

In another example, a user has an important application that runs only on the x86 architecture, so he purchases an x86 architecture computer system (or vice versa). Later, however, a version of the application is developed for the ARM architecture that is superior to the original x86 version. The user would like to switch architectures to run the new version but has, unfortunately, already made a considerable investment in the architecture he no longer prefers. Likewise, a user who originally invested in applications that run only on the ARM architecture may later want to use applications that were developed for the x86 architecture and either do not exist for ARM or are superior to their ARM counterparts, and will face the same problem (and vice versa). It is worth noting that although small entities or individuals invest smaller amounts than large entities, their proportional loss may be even higher. Similar examples of wasted investment can arise in many computing markets when switching from the x86 architecture to the ARM architecture or from the ARM architecture to the x86 architecture. Finally, computing device manufacturers that invest large amounts of resources in developing new products, such as OEMs, are caught in the same dilemma of architecture choice. If a manufacturer develops and manufactures a large volume of products around either the x86 or the ARM architecture and user demand then shifts suddenly, much valuable development effort is wasted.

It would be beneficial for manufacturers and consumers of computing devices to be able to preserve their investment regardless of which of the two architectures prevails. A solution is therefore needed that allows system manufacturers to develop computing devices on which users can run both x86 architecture programs and ARM architecture programs.

The need for systems that can execute programs of more than one instruction set is long-standing, primarily because customers make a significant investment in software that runs on old hardware whose instruction set is often incompatible with new hardware. For example, the IBM System/360 Model 30 included a compatibility feature for the IBM 1401 system to ease the pain of moving from the 1401 to the higher-performance, feature-enhanced System/360. The Model 30 included Read Only Storage (ROS) control for both the System/360 and the 1401, which allowed it to be used as a 1401 provided the required information was loaded into auxiliary storage beforehand. Furthermore, where software was developed in a high-level language, new hardware developers have little or no control over the software compiled for the old hardware, and software developers have little incentive to re-compile the source code for the new hardware, particularly when the software developer and the hardware developer are different entities.

In the article "An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures," Computer, June 1993, No. 6, Silberman and Ebcioglu disclose a technique for improving the execution efficiency of existing complex instruction set (CISC) architectures (for example, the IBM S/390) by running them on systems with reduced instruction set (RISC), superscalar, and very long instruction word (VLIW) architectures (hereinafter the native architecture). The disclosed system includes a native engine that executes native code and a migrant engine that executes object code, and switches between the two kinds of code as needed depending on how effectively the translation software translates object code into native code. See also US Patent No. 7,047,394, issued May 16, 2006, in which Van Dyke et al. disclose a processor with an execution pipeline that executes program instructions of a native RISC instruction set (Tapestry) and that translates x86 program instructions into instructions of the native RISC instruction set through a combination of hardware translation and software translation.

Nakada et al. propose a heterogeneous simultaneous multithreading (SMT) processor with an ARM architecture front-end pipeline for irregular software (such as an operating system) and a Fujitsu FR-V (VLIW) architecture front-end pipeline for multimedia applications; both feed an FR-V VLIW back-end pipeline that has an added VLIW queue to hold instructions from the front-end pipelines. See "OROCHI: A Multiple Instruction Set SMT Processor," in Proceedings of the First International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08), Lake Como, Italy, November 2008 (in conjunction with MICRO-41), Buchty and Weiß, eds., Universitätsverlag Karlsruhe, ISBN 978-3-86644-298-6. That approach is proposed in order to reduce the total system footprint relative to heterogeneous system-on-chip (SOC) devices, such as the Texas Instruments OMAP application processors, which include an ARM processor core plus one or more co-processors (for example, the TMS320, various digital signal processors, or various graphics processing units (GPUs)). These co-processors do not share instruction execution resources; they are simply distinct processing cores integrated on the same chip.

Software translators, also called software emulators, software simulators, or dynamic binary translators, have also been used to run software programs on a processor whose architecture differs from that of the program. Popular commercial examples are the Motorola 68K-to-PowerPC emulator that accompanied Apple Macintosh computers, which allowed 68K programs to run on a Macintosh with a PowerPC processor, and the later PowerPC-to-x86 emulator, which allowed PowerPC programs to run on a Macintosh with an x86 processor. Transmeta Corporation of Santa Clara, California, combined VLIW core hardware with "a pure software-based instruction translator [referred to as Code Morphing Software] [that] dynamically compiles or emulates x86 code sequences" in order to execute x86 code; see the 2011 Wikipedia entry for Transmeta, <http://en.wikipedia.org/wiki/Transmeta>. See also US Patent No. 5,832,205, issued November 3, 1998 to Kelly et al.

IBM's DAISY (Dynamically Architected Instruction Set from Yorktown) system includes a VLIW machine and dynamic binary software translation to provide 100% software-compatible emulation of old architectures. DAISY includes a Virtual Machine Monitor residing in ROM that parallelizes VLIW primitives and saves them to a portion of main memory that is not visible to the old architecture, in order to avoid re-translating the same code fragments of the old architecture on subsequent executions. DAISY includes fast compiler optimization algorithms to improve performance.

QEMU is a machine emulator that includes a software dynamic translator. QEMU emulates a number of CPUs (for example, x86, PowerPC, ARM, and SPARC) on various hosts (for example, x86, PowerPC, ARM, SPARC, Alpha, and MIPS). See "QEMU, a Fast and Portable Dynamic Translator," Fabrice Bellard, USENIX Association, FREENIX Track: 2005 USENIX Annual Technical Conference. As its author states, "A dynamic translator performs a runtime conversion of the target CPU instructions into the host instruction set. The resulting binary code is stored in a translation cache so that it can be reused. ... QEMU is much simpler [than other dynamic translators] because it just concatenates pieces of machine code generated off line by the GNU C Compiler." See also the thesis "ARM Instruction Set Simulation on Multi-core x86 Hardware" by Lee Wang Hao, University of Adelaide, June 19, 2009. Although solutions based on software translation may deliver enough performance for a subset of computing needs, they are unlikely to deliver the performance that many users require.

Static binary translation is another technique with the potential for high performance. However, its use raises technical issues (for example, self-modifying code and indirect branches whose target values are known only at run time) as well as commercial and legal barriers (for example, the hardware developer may need to cooperate in building the channels required to distribute the new programs, and there is a risk of license or copyright violations with respect to the distributors of the original programs).

The ARM instruction set architecture (ISA) features conditional instruction execution. As stated on page A4-3 of the ARM Architecture Reference Manual: "Most ARM instructions can be executed conditionally. This means that they only have their normal effect on the programmers' model operation, memory and coprocessors if the N, Z, C and V flags in the APSR satisfy a condition specified in the instruction. If the flags do not satisfy this condition, the instruction acts as a NOP, that is, execution advances to the next instruction as normal, including any relevant checks for exceptions being taken, but has no other effect."
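The conditional execution behavior quoted above can be illustrated with a minimal Python sketch. The condition mnemonics and their flag tests follow the standard ARM condition table; the `Flags` class, the `CONDITIONS` table, and `execute_conditionally` are illustrative names that do not come from the patent, and a register write stands in for an arbitrary instruction effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """The four APSR condition flags."""
    n: bool  # negative
    z: bool  # zero
    c: bool  # carry
    v: bool  # overflow


# Standard ARM condition mnemonics mapped to their flag tests.
CONDITIONS = {
    "EQ": lambda f: f.z,
    "NE": lambda f: not f.z,
    "CS": lambda f: f.c,
    "CC": lambda f: not f.c,
    "MI": lambda f: f.n,
    "PL": lambda f: not f.n,
    "VS": lambda f: f.v,
    "VC": lambda f: not f.v,
    "HI": lambda f: f.c and not f.z,
    "LS": lambda f: (not f.c) or f.z,
    "GE": lambda f: f.n == f.v,
    "LT": lambda f: f.n != f.v,
    "GT": lambda f: (not f.z) and f.n == f.v,
    "LE": lambda f: f.z or f.n != f.v,
    "AL": lambda f: True,
}


def execute_conditionally(cond, flags, regs, dest, value):
    """Write value to regs[dest] only if the condition passes.

    When the flags do not satisfy the condition, the instruction acts as a
    NOP: nothing is written and execution simply moves on.
    Returns True if the instruction had its normal effect.
    """
    if CONDITIONS[cond](flags):
        regs[dest] = value
        return True
    return False
```

For example, with Z set, an `EQ`-conditioned write takes effect while an `NE`-conditioned write leaves the register file untouched.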

Conditional execution helps reduce code size and can improve performance by reducing the number of branch instructions and, with them, the performance penalty associated with mispredicting those branches. How to execute conditional instructions efficiently, particularly in a design that supports high microprocessor clock rates, is therefore a problem that needs to be solved.

Summary of the Invention

One embodiment of the present invention provides a microprocessor that executes conditional non-branch instructions. Each conditional non-branch instruction specifies a condition and instructs the microprocessor to perform an operation if the condition is satisfied, and not to perform the operation if the condition flags of the microprocessor do not satisfy the condition. The microprocessor may include a predictor that provides a prediction about a conditional non-branch instruction, and an instruction translator that: when the prediction indicates that the condition will not be satisfied, translates the conditional non-branch instruction into a no-operation microinstruction with a condition code, where this microinstruction performs no operation other than causing an execution unit to check the prediction; and, when the prediction indicates that the condition will be satisfied, translates the conditional non-branch instruction into a single operative microinstruction with a condition code that performs the operation unconditionally. The instruction translator translates instructions of x86 instruction set architecture (ISA) programs and Advanced RISC Machines (ARM) ISA programs into microinstructions defined by the microinstruction set of the microprocessor, and the microinstructions are encoded differently from the way in which the instructions defined by the x86 ISA and ARM ISA instruction sets are encoded. The microprocessor also includes an execution pipeline comprising an instruction issue unit and a plurality of execution units; the instruction issue unit issues the single operative microinstruction with a condition code to a selected one of the execution units, and the selected execution unit executes it.

Another embodiment of the present invention provides a method for executing conditional non-branch instructions with a microprocessor. The microprocessor has an instruction translator that translates instructions of x86 instruction set architecture (ISA) programs and Advanced RISC Machines (ARM) ISA programs into microinstructions defined by the microinstruction set of the microprocessor, and the microinstructions are encoded differently from the way in which the instructions defined by the x86 ISA and ARM ISA instruction sets are encoded. Each conditional non-branch instruction specifies a condition and instructs the microprocessor to perform an operation if the condition is satisfied, and not to perform the operation if the condition flags of the microprocessor do not satisfy the condition. The method includes: providing a prediction about a conditional non-branch instruction; when the prediction indicates that the condition will not be satisfied, translating the conditional non-branch instruction into a no-operation microinstruction with a condition code, where this microinstruction performs no operation other than causing an execution unit to check the prediction; when the prediction indicates that the condition will be satisfied, translating the conditional non-branch instruction into a single operative microinstruction with a condition code that performs the operation unconditionally; and issuing, by an instruction issue unit, the single operative microinstruction with a condition code to a selected one of a plurality of execution units, which executes it. The instruction issue unit and the selected execution unit are part of a hardware execution pipeline of the microprocessor.
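The prediction-driven translation in this method can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the micro-op names `NOP_CC` and `OP_CC`, the `MicroOp` record, and both function names are hypothetical, the operation is reduced to writing a value to a destination register, and the condition outcome is passed in as a boolean. What the microprocessor does after a misprediction is not described in this passage, so the sketch only reports whether the prediction was correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    kind: str       # "NOP_CC" or "OP_CC" (hypothetical names)
    cond: str       # condition code carried so the prediction can be checked
    dest: str = ""  # destination register (unused for NOP_CC)
    value: int = 0  # value the operation would write (unused for NOP_CC)


def translate_conditional(cond, dest, value, predicted_satisfied):
    """Translator: pick exactly one micro-op based on the prediction."""
    if predicted_satisfied:
        # Predicted satisfied: a single micro-op that performs the operation
        # without waiting on the condition, but still carries the condition code.
        return MicroOp("OP_CC", cond, dest, value)
    # Predicted not satisfied: a no-op that exists only to check the prediction.
    return MicroOp("NOP_CC", cond)


def execute(uop, cond_holds):
    """Execution unit: return (pending_write, prediction_correct).

    pending_write is a (dest, value) pair or None. A caller would commit the
    write only when prediction_correct is True (assumption of this sketch).
    """
    if uop.kind == "OP_CC":
        return (uop.dest, uop.value), cond_holds
    return None, not cond_holds
```

In both cases only one micro-op is issued to an execution unit, and checking the carried condition code against the actual flags is what detects a wrong prediction.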

Yet another embodiment of the present invention provides a computer program product encoded in a computer-readable storage medium. The product contains computer-readable program code that specifies a microprocessor for executing conditional non-branch instructions. Each conditional non-branch instruction specifies a condition and instructs the microprocessor to perform an operation if the condition is satisfied, and not to perform the operation if the condition flags of the microprocessor do not satisfy the condition. The computer-readable program code includes first program code that specifies a predictor, which provides a prediction about a conditional non-branch instruction. It also includes second program code that specifies an instruction translator, which translates the conditional non-branch instruction into a no-operation microinstruction when the prediction indicates that the condition will not be satisfied, and into a set of one or more microinstructions that perform the operation unconditionally when the prediction indicates that the condition will be satisfied. It further includes third program code that specifies an execution pipeline, which executes the no-operation microinstruction or the set of microinstructions provided by the instruction translator.

One embodiment of the present invention provides a microprocessor having an instruction set architecture. The instruction set architecture defines an instruction that includes an immediate field, which has a first part specifying a first value and a second part specifying a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands, where the fixed value is obtained by rotating/shifting the first value by a number of bits based on the second value. The microprocessor includes: an instruction translator that translates the instruction into at least one immediate ALU microinstruction, wherein the immediate ALU microinstruction is encoded differently from the instruction encoding defined by the instruction set architecture; and an execution pipeline that executes the microinstructions generated by the instruction translator to produce a result defined by the instruction set architecture. The instruction translator, rather than the execution pipeline, generates the fixed value from the first value and the second value and supplies it as a source operand of the immediate ALU microinstruction for execution by the execution pipeline.
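As a hedged illustration of the translator computing the fixed value itself, the following Python sketch assumes the ARM A32 "modified immediate" layout: a 12-bit immediate field whose low 8 bits are the first value and whose high 4 bits are the second value, with the fixed value formed by rotating the 8-bit value right by twice the 4-bit value within 32 bits. The passage itself only says that the bit count is based on the second value; the field layout, the micro-op dictionary, and the function names are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def expand_immediate(imm12):
    """Compute the fixed value from a 12-bit immediate field.

    Assumed layout: bits [7:0] hold the first value, bits [11:8] hold the
    second value, and the rotation amount is twice the second value.
    """
    imm8 = imm12 & 0xFF          # first value
    rot4 = (imm12 >> 8) & 0xF    # second value
    return ror32(imm8, 2 * rot4)


def translate_alu_imm(opcode, dest, src, imm12):
    """Translator: emit one immediate ALU micro-op (hypothetical format)
    whose operand is already the fully expanded 32-bit fixed value."""
    return {
        "uop": opcode + "_IMM",
        "dest": dest,
        "src": src,
        "imm32": expand_immediate(imm12),
    }
```

Because the rotation happens at translation time, the execution pipeline in this sketch receives an ordinary 32-bit immediate and needs no extra rotate step.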

Another embodiment of the present invention provides a method performed by a microprocessor having an instruction set architecture. The instruction set architecture defines an instruction that includes an immediate field, which has a first part specifying a first value and a second part specifying a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands, where the fixed value is obtained by rotating/shifting the first value by a number of bits based on the second value. The method includes: translating the instruction into at least one immediate ALU microinstruction, wherein the immediate ALU microinstruction is encoded differently from the instruction encoding defined by the instruction set architecture, and wherein the translating is performed by an instruction translator of the microprocessor; and executing the microinstructions generated by the instruction translator to produce a result defined by the instruction set architecture, wherein the executing is performed by an execution pipeline of the microprocessor. The instruction translator, rather than the execution pipeline, generates the fixed value from the first value and the second value and supplies it as a source operand of the immediate ALU microinstruction for execution by the execution pipeline.

Another embodiment of the present invention provides a microprocessor having an instruction set architecture. The instruction set architecture defines an instruction that includes an immediate field, which has a first part specifying a first value and a second part specifying a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands, where the fixed value is obtained by rotating/shifting the first value by a number of bits based on the second value. The microprocessor includes: an instruction translator that translates the instruction into microinstructions; and an execution pipeline that executes the microinstructions generated by the instruction translator to produce a result defined by the instruction set architecture. When the value of the immediate field falls within a predetermined subset of values: the instruction translator translates the instruction into at least one immediate ALU microinstruction; the instruction translator, rather than the execution pipeline, generates the fixed value from the first and second values; and the execution pipeline executes the immediate ALU microinstruction using the fixed value generated by the instruction translator as one of its source operands. When the value of the immediate field does not fall within the predetermined subset of values: the instruction translator translates the instruction into at least a first and a second microinstruction; the execution pipeline, rather than the instruction translator, generates the fixed value by executing the first microinstruction; and the execution pipeline executes the second microinstruction using the fixed value generated by the first microinstruction as one of its source operands.

Another embodiment of the present invention provides a method performed by a microprocessor having an instruction set architecture. The instruction set architecture defines an instruction that includes an immediate field, the immediate field having a first portion that specifies a first value and a second portion that specifies a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands, the fixed value being the first value rotated/shifted by a number of bits based on the second value. The microprocessor includes an instruction translator and an execution pipeline. The method includes: determining, by the instruction translator, whether the value of the immediate field falls within a predetermined subset of values. When the value of the immediate field falls within the predetermined subset of values: translating, by the instruction translator, the instruction into at least one immediate ALU microinstruction; generating, by the instruction translator rather than the execution pipeline, the fixed value from the first and second values; and executing, by the execution pipeline, the immediate ALU microinstruction using the fixed value generated by the instruction translator as one of its source operands. When the value of the immediate field does not fall within the predetermined subset of values: translating, by the instruction translator, the instruction into at least first and second microinstructions; generating, by the execution pipeline rather than the instruction translator, the fixed value by executing the first microinstruction; and executing, by the execution pipeline, the second microinstruction using the fixed value generated by the first microinstruction as one of its source operands.
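The two translation paths of the method can be sketched as follows. This is a software illustration under stated assumptions, not the disclosed hardware: the membership test `in_fast_subset` (here, a rotate amount of zero), the `MicroOp` record, the `_IMM` opcode suffix, and the `TEMP` register name are all hypothetical, and the immediate layout is the same assumed ARM-style layout of an 8-bit first value and a 4-bit second value.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class MicroOp:
    opcode: str
    dest: str
    sources: Tuple[Union[str, int], ...]


def rotate_right_32(value: int, amount: int) -> int:
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def in_fast_subset(immediate_field: int) -> bool:
    # Hypothetical "predetermined subset": encodings whose second value is 0.
    return ((immediate_field >> 8) & 0xF) == 0


def translate(alu_op: str, dest: str, src: str, immediate_field: int) -> List[MicroOp]:
    first_value = immediate_field & 0xFF
    rotate_amount = 2 * ((immediate_field >> 8) & 0xF)
    if in_fast_subset(immediate_field):
        # The translator itself produces the fixed value.
        fixed_value = rotate_right_32(first_value, rotate_amount)
        return [MicroOp(alu_op + "_IMM", dest, (src, fixed_value))]
    # Otherwise the pipeline produces it by executing a ROR microinstruction,
    # and a second microinstruction consumes the temporary result.
    return [
        MicroOp("ROR", "TEMP", (first_value, rotate_amount)),
        MicroOp(alu_op, dest, (src, "TEMP")),
    ]
```

In the first path a single microinstruction carries the already-computed constant; in the second path the constant exists only after the pipeline executes the first microinstruction.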

Yet another embodiment of the present invention provides a computer program product encoded in at least one computer-readable storage medium for use with a computing device. The computer program product includes computer-readable program code, embodied in the medium, for specifying a microprocessor. The microprocessor has an instruction set architecture that defines at least one instruction. The instruction includes an immediate field, the immediate field having a first portion that specifies a first value and a second portion that specifies a second value. The instruction instructs the microprocessor to perform an operation that uses a fixed value as one of its source operands. The fixed value is the first value rotated/shifted by a number of bits based on the second value. The computer-readable program code includes first program code for specifying an instruction translator that translates the at least one instruction into one or more microinstructions, wherein the microinstructions are encoded differently from the instruction encoding defined by the instruction set architecture. The computer-readable program code also includes second program code for specifying an execution pipeline that executes the microinstructions generated by the instruction translator to produce a result defined by the instruction set architecture. The instruction translator, rather than the execution pipeline, generates the fixed value from the first and second values as a source operand of at least one of the microinstructions for execution by the execution pipeline.

The advantages and spirit of the present invention can be further understood from the following detailed description and the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a block diagram of an embodiment of a microprocessor according to the present invention that executes x86 ISA and ARM ISA machine language programs.

FIG. 2 is a block diagram showing the hardware instruction translator of FIG. 1 in detail.

FIG. 3 is a block diagram showing the instruction formatter of FIG. 2 in detail.

FIG. 4 is a block diagram showing the execution pipeline of FIG. 1 in detail.

FIG. 5 is a block diagram showing the register file of FIG. 1 in detail.

FIGS. 6A and 6B are a flowchart showing the operation of the microprocessor of FIG. 1.

FIG. 7 is a block diagram of a dual-core microprocessor according to the present invention.

FIG. 8 is a block diagram of another embodiment of a microprocessor according to the present invention that executes x86 ISA and ARM ISA machine language programs.

FIG. 9 is a block diagram showing a portion of the microprocessor of FIG. 1 in detail.

FIGS. 10A and 10B are a flowchart showing how the hardware instruction translator of FIG. 1 translates conditional ALU instructions.

FIG. 11 is a flowchart showing how the execution units of FIG. 4 execute a shift microinstruction.

FIGS. 12A and 12B are a flowchart showing how the execution units of FIG. 4 execute a conditional ALU microinstruction.

FIG. 13 is a flowchart showing how the execution units of FIG. 4 execute a conditional move microinstruction.

FIGS. 14 through 20 are block diagrams showing how the execution pipeline 112 of FIG. 1 executes the various forms of conditional ALU instructions translated according to FIGS. 10A and 10B.

FIGS. 21A and 21B are a flowchart showing how the hardware instruction translator of FIG. 1 translates conditional ALU instructions that specify one of the source registers to be the same register as the destination register.

FIGS. 22 through 28 are block diagrams showing how the execution pipeline 112 of FIG. 1 executes the various forms of conditional ALU instructions translated according to FIGS. 21A and 21B.

FIG. 29 is a block diagram showing an embodiment of a microprocessor 100 according to the present invention that makes predictions for non-conditional-branch instructions, that is, instructions other than conditional branch instructions.

FIG. 30 is a block diagram showing an embodiment of the translation of conditional ALU instructions by the instruction translator of FIG. 29.

FIGS. 31A and 31B are a flowchart showing an embodiment in which the microprocessor of FIG. 29 executes a conditional ALU instruction of FIG. 30.

FIG. 32 is a block diagram showing an embodiment of a microprocessor according to the present invention that processes modified immediate constants at translation time.

FIG. 33 is a block diagram showing an embodiment of the present invention in which an immediate operand instruction is selectively translated either into a ROR microinstruction and an ALU microinstruction or into an immediate ALU microinstruction.

FIGS. 34A and 34B are a flowchart showing an embodiment in which the microprocessor 100 of FIG. 32 executes an immediate operand instruction of FIG. 33.

[Description of main reference numerals]

Microprocessor (processing core) 100
Instruction cache 102
Hardware instruction translator 104
Register file 106
Memory subsystem 108
Execution pipeline 112
Instruction fetch unit and branch predictor 114
ARM program counter (PC) register 116
x86 instruction pointer (IP) register 118
Configuration registers 122
ISA instructions 124
Microinstructions 126
Results 128
Instruction mode indicator 132
Fetch address 134
Environment mode indicator 136
Instruction formatter 202
Simple instruction translator (SIT) 204
Complex instruction translator (CIT) 206
Multiplexer (mux) 212
x86 simple instruction translator 222
ARM simple instruction translator 224
Micro-program counter (micro-PC) 232
Microcode ROM 234
Microsequencer 236
Instruction indirection register (IIR) 235
Microtranslator 237
Formatted ISA instructions 242
Implementing microinstructions 244
Implementing microinstructions 246
Select input 248
Microcode address 252
ROM address 254
ISA instruction information 255
Pre-decoder 302
Instruction byte queue (IBQ) 304
Length decoders and ripple logic 306
Mux queue (MQ) 308
Multiplexer 312
Formatted instruction queue (FIQ) 314
ARM instruction set state 322
Microinstruction queue 401
Register allocation table (RAT) 402
Instruction dispatcher 404
Reservation station 406
Instruction issue unit 408
Integer/branch unit 412
Media unit 414
Load/store unit 416
Floating point unit 418
Reorder buffer (ROB) 422
Execution unit 424
ARM-specific registers 502
x86-specific registers 504
Shared registers 506
Dual-core microprocessor 700
Microinstruction cache 892
Condition flag register 926
Multiplexer 922
Flag bus 928
Condition flag value 928/924
ISA condition flags 902
Condition satisfied (SAT) bit 904
Pre-shift carry (PSC) bit 906
Use shift carry (USE) bit 908
Dynamic predictor 2932
Predictor selector 2934
Static predictor 2936
Dynamic prediction 2982
Prediction selection 2984
Static prediction 2986
History update 2974
Misprediction 2976
ALU microinstruction 3044
Conditional move microinstruction 3046
Conditional ALU microinstruction with condition code 3045
No-op microinstruction with condition code 3047
Opcode fields a202, a212, a222, a252, a272
Condition code fields a204, a224, a254, a274
Source register 1 and 2 fields a206, a216, a256
Destination register fields a208, a218, a232, a258
Source register 1 field a226
Source register 2 field a228
Immediate operand 3266
ROR microinstruction 3344
ALU microinstruction 3346
Immediate ALU microinstruction 3348
Opcode fields b202, b212, b222, b232
Source register 1 fields b204, b214, b234
Source register 2 field b235
Destination register fields b206, b216, b226, b236
Immediate field b207
immed_8 fields b208, b228
rotate_imm fields b209, b229
immediate-32 field b218

Detailed Description

Definitions of Terms

An instruction set defines the mapping between a set of binary-encoded values (that is, machine language instructions) and the operations the microprocessor performs. Machine language programs are generally encoded in binary, although systems of other radixes may be used; for example, the machine language programs of some early IBM computers were encoded in decimal, even though they were ultimately represented by physical signals whose voltage levels took binary values. Examples of operations that machine language instructions instruct the microprocessor to perform are: add the operand in register 1 to the operand in register 2 and write the result to register 3; subtract the immediate operand specified in the instruction from the operand at memory address 0x12345678 and write the result to register 5; shift the value in register 6 by the number of bits specified in register 7; branch to the instruction 36 bytes after this instruction if the zero flag is set; load the value at memory address 0xABCD0000 into register 8. Thus, the instruction set defines the binary-encoded value each machine language instruction must have to cause the microprocessor to perform the desired operation. It should be understood that the fact that the instruction set defines a mapping between binary values and microprocessor operations does not imply that a single binary value maps to a single microprocessor operation. Specifically, in some instruction sets, multiple binary values may map to the same microprocessor operation.
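The many-to-one nature of this mapping can be made concrete with a toy example. The 4-bit encodings and operation names below are invented purely for illustration and do not correspond to any real instruction set.

```python
# Toy instruction set: each binary-encoded value maps to one operation,
# but two different encodings (0b0100 and 0b0101) map to the same operation.
TOY_INSTRUCTION_SET = {
    0b0001: "ADD",
    0b0010: "SUB",
    0b0011: "SHIFT",
    0b0100: "NOP",
    0b0101: "NOP",
}


def decode(encoding: int) -> str:
    """Return the operation that a binary-encoded value maps to."""
    return TOY_INSTRUCTION_SET[encoding]
```

Here `decode(0b0100)` and `decode(0b0101)` both yield the same operation, so the mapping is a function from encodings to operations but is not one-to-one.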

An instruction set architecture (ISA), in the context of a family of microprocessors, comprises: (1) the instruction set; (2) the set of resources accessible by the instructions of the instruction set (for example, the registers and modes used for addressing memory); and (3) the set of exceptions the microprocessor generates in response to executing instructions of the instruction set (for example, divide by zero, page fault, memory protection violation). Because programmers, such as the authors of assemblers and compilers, need the ISA definition of a microprocessor family in order to produce machine language programs that run on that family, the manufacturer of the family typically defines the ISA in a programmer's manual. For example, the Intel 64 and IA-32 Architectures Software Developer's Manual, published in March 2009, defines the ISA of the Intel 64 and IA-32 processor architectures. The manual consists of five volumes: Volume 1, Basic Architecture; Volume 2A, Instruction Set Reference, A-M; Volume 2B, Instruction Set Reference, N-Z; Volume 3A, System Programming Guide; and Volume 3B, System Programming Guide, Part 2. This series of manuals is incorporated herein by reference. This processor architecture is commonly referred to as the x86 architecture and is referred to herein as x86, x86 ISA, x86 ISA family, x86 family, or similar terms. As another example, the ARM Architecture Reference Manual, ARM v7-A and ARM v7-R edition Errata markup, published in 2010, defines the ISA of the ARM processor architecture. This reference manual is also incorporated herein by reference. The ISA of the ARM processor architecture is referred to herein as ARM, ARM ISA, ARM ISA family, ARM family, or similar terms. Other well-known ISA families include IBM System/360/370/390 and z/Architecture, DEC VAX, Motorola 68k, MIPS, SPARC, PowerPC, and DEC Alpha. The definition of an ISA covers a family of processors because, over the life of the family, the manufacturer may enhance the ISA of the original processor by, for example, adding new instructions to the instruction set and/or adding new registers to the register set. For example, as the x86 ISA evolved, it introduced in the Intel Pentium III processor family a set of 128-bit XMM registers as part of the Streaming SIMD Extensions (SSE), and x86 ISA machine language programs have been developed that use the XMM registers to improve performance, although x86 ISA machine language programs also exist that do not use the XMM registers. In addition, other manufacturers have designed and built microprocessors that run x86 ISA machine language programs. For example, Advanced Micro Devices (AMD) and VIA Technologies have added new features to the x86 ISA, such as AMD's 3DNow! single instruction multiple data (SIMD) vector processing instructions and VIA's PadLock security engine with its random number generator and advanced cryptography engine features. These features are used by some x86 ISA machine language programs but are not implemented in current Intel microprocessors. As another example, the ARM ISA originally defined the ARM instruction set state, which has 4-byte instructions. As the ARM ISA evolved, however, other instruction set states were added, such as the Thumb instruction set state, which has 2-byte instructions to increase code density, and the Jazelle instruction set state, which accelerates Java bytecode programs. ARM ISA machine language programs have been developed that use some or all of these other instruction set states, although ARM ISA machine language programs also exist that do not use them.

An instruction set architecture (ISA) machine language program comprises a sequence of ISA instructions, that is, a sequence of the binary-encoded values that the ISA instruction set maps to the sequence of operations the programmer wants the program to perform. Thus, an x86 ISA machine language program comprises a sequence of x86 ISA instructions, and an ARM ISA machine language program comprises a sequence of ARM ISA instructions. The machine language program instructions reside in memory and are fetched and executed by the microprocessor.

A hardware instruction translator comprises an arrangement of transistors that receives an ISA machine language instruction (for example, an x86 ISA or ARM ISA machine language instruction) as input and in response outputs one or more microinstructions to the execution pipeline of the microprocessor. The results of the execution of the microinstructions by the execution pipeline are defined by the ISA instruction. Thus, the execution pipeline "implements" the ISA instruction through the collective execution of these microinstructions. That is, by collectively executing the implementing microinstructions output by the hardware instruction translator, the execution pipeline performs the operation specified by the input ISA instruction and produces the result defined by that ISA instruction. The hardware instruction translator is therefore said to "translate" the ISA instruction into one or more implementing microinstructions. The microprocessor described in the present embodiments has a hardware instruction translator that translates x86 ISA instructions and ARM ISA instructions into microinstructions. It should be understood, however, that the hardware instruction translator is not necessarily capable of translating the entire instruction set defined by the x86 programmer's manual or the ARM programmer's manual; rather, it may translate only a subset of those instructions, just as the vast majority of x86 ISA and ARM ISA processors support only a subset of the instructions defined by their respective programmer's manuals. More specifically, the subset of instructions defined by the x86 programmer's manual that the hardware instruction translator translates does not necessarily correspond to that of any existing x86 ISA processor, and the subset of instructions defined by the ARM programmer's manual that the hardware instruction translator translates does not necessarily correspond to that of any existing ARM ISA processor.

An execution pipeline is a sequence of stages. Each stage of the sequence has hardware logic and a hardware register. The hardware register holds the output of the hardware logic and, according to the clock signal of the microprocessor, provides that output to the next stage of the sequence. An execution pipeline may have more than one such sequence of stages, that is, multiple pipelines. The execution pipeline receives microinstructions as input and in response performs the operations specified by the microinstructions to output results. The operations specified by the microinstructions and performed by the hardware logic of the execution pipeline include, but are not limited to, arithmetic, logical, memory load/store, compare, test, and branch resolution operations; the data formats operated upon include, but are not limited to, integer, floating point, character, binary-coded decimal (BCD), and packed formats. The execution pipeline executes microinstructions that implement ISA instructions (such as x86 and ARM instructions) to produce the results defined by the ISA instructions. The execution pipeline is distinct from the hardware instruction translator. Specifically, the hardware instruction translator generates the implementing microinstructions, whereas the execution pipeline executes them but does not generate them.
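The relationship between per-stage logic, per-stage registers, and the clock can be modeled with a short simulation. This is a simplified software sketch, not the disclosed hardware: the `Pipeline` class, the use of `None` to represent an empty stage (a bubble), and the three arithmetic stage functions in the usage example are all illustrative assumptions.

```python
from typing import Callable, List, Optional


class Pipeline:
    """Toy model of a sequence of stages, each with logic and a register."""

    def __init__(self, stage_logic: List[Callable[[int], int]]):
        self.stage_logic = list(stage_logic)
        # One register per stage, holding that stage's latched output.
        self.registers: List[Optional[int]] = [None] * len(self.stage_logic)

    def clock(self, new_input: Optional[int] = None) -> Optional[int]:
        """Advance one clock cycle and return the last stage's register."""
        # Each stage's logic sees the previous stage's register from the
        # prior cycle; the first stage sees the new input.
        inputs = [new_input] + self.registers[:-1]
        self.registers = [
            None if value is None else logic(value)
            for logic, value in zip(self.stage_logic, inputs)
        ]
        return self.registers[-1]


# Usage: three stages computing (x + 1) * 2 - 3, one stage per cycle.
pipe = Pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
outputs = [pipe.clock(5), pipe.clock(10), pipe.clock(), pipe.clock()]
```

In this model a value entering the pipeline appears at the output three clock cycles later, while a second value can already occupy the earlier stages, which is the overlap that pipelining provides.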

An instruction cache is a random access memory device within a microprocessor into which the microprocessor places instructions of an ISA machine language program (such as x86 ISA and ARM ISA machine language instructions) that were fetched from system memory and are executed by the microprocessor according to the flow of the ISA machine language program. Specifically, the ISA defines an instruction address register that holds the memory address of the next ISA instruction to be executed (defined by the x86 ISA as the instruction pointer (IP) and by the ARM ISA as the program counter (PC), for example), and the microprocessor updates the contents of the instruction address register as it runs the machine language program in order to control the flow of the program. The ISA instructions are cached for subsequent fetching: when the register holds the address of a next ISA instruction of the machine language program that is present in the instruction cache, the ISA instruction can be fetched quickly from the instruction cache based on the contents of the instruction address register, rather than from system memory. In particular, an instruction cache is accessed based on the memory address held in the instruction address register (such as the IP or PC), rather than exclusively based on a memory address specified by a load or store instruction. Thus, a dedicated data cache that holds ISA instructions as data (such as may be present in the hardware portion of a system that employs a software translator), and that is accessed exclusively based on a load/store address rather than based on the value of the instruction address register, is not an instruction cache as the term is used herein. Conversely, a unified cache that caches both instructions and data, that is, a cache accessed based on the value of the instruction address register and also based on a load/store address, but not exclusively based on a load/store address, is included in the definition of an instruction cache for the purposes of this disclosure. In this disclosure, a load instruction is an instruction that reads data from memory into the microprocessor, and a store instruction is an instruction that writes data from the microprocessor to memory.

A microinstruction set is the set of instructions (microinstructions) that the execution pipeline of the microprocessor can execute.

Description of the Embodiments

The microprocessor disclosed in the embodiments of the present invention is capable of running both x86 ISA and ARM ISA machine language programs by using hardware to translate their x86 ISA and ARM ISA instructions into microinstructions that are directly executed by the execution pipeline of the microprocessor. The microinstructions are defined by a microinstruction set of the microarchitecture of the microprocessor, which is distinct from both the x86 ISA and the ARM ISA. As the microprocessor described herein runs x86 and ARM machine language programs, its hardware instruction translator translates the x86 and ARM instructions into microinstructions and provides them to the execution pipeline, which executes the microinstructions to implement the x86 and ARM instructions. Because the implementing microinstructions are provided directly from the hardware instruction translator to the execution pipeline for execution, unlike a system that employs a software translator and must store host instructions to memory before the execution pipeline executes them, the microprocessor has the potential to run x86 and ARM machine language programs at a relatively high speed.

FIG. 1 is a block diagram showing an embodiment of a microprocessor 100 according to the present invention that runs x86 ISA and ARM ISA machine language programs. The microprocessor 100 includes: an instruction cache 102; a hardware instruction translator 104 that receives x86 ISA instructions and ARM ISA instructions 124 from the instruction cache 102 and translates them into microinstructions 126; an execution pipeline 112 that executes the microinstructions 126 received from the hardware instruction translator 104 to produce microinstruction results 128, which are fed back to the execution pipeline 112 as operands; a register file 106 and a memory subsystem 108, each of which provides operands to the execution pipeline 112 and receives microinstruction results 128 from it; an instruction fetch unit and branch predictor 114 that provides a fetch address 134 to the instruction cache 102; an ARM ISA-defined program counter (PC) register 116 and an x86 ISA-defined instruction pointer (IP) register 118, which are updated based on the microinstruction results 128 and provide their contents to the instruction fetch unit and branch predictor 114; and configuration registers 122 that provide an instruction mode indicator 132 and an environment mode indicator 136 to the hardware instruction translator 104 and to the instruction fetch unit and branch predictor 114, and that are updated based on the microinstruction results 128.

Because the microprocessor 100 runs x86 ISA and ARM ISA machine language instructions, it fetches the instructions from system memory (not shown) according to the program flow. The microprocessor 100 caches the most recently fetched x86 ISA and ARM ISA machine language instructions in the instruction cache 102. The instruction fetch unit 114 generates a fetch address 134 from which to fetch a block of x86 or ARM instruction bytes from system memory. If the fetch address 134 hits in the instruction cache 102, the instruction cache 102 provides the block of x86 or ARM instruction bytes at the fetch address 134 to the hardware instruction translator 104; otherwise, the ISA instructions 124 are fetched from system memory. The instruction fetch unit 114 generates the fetch address 134 based on the values of the ARM program counter 116 and the x86 instruction pointer 118. Specifically, the instruction fetch unit 114 maintains a fetch address in a fetch address register. Each time the instruction fetch unit 114 fetches a new block of ISA instruction bytes, it advances the fetch address by the size of the block and continues sequentially in this manner until a control flow event occurs. Control flow events include: the generation of an exception; a prediction by the branch predictor 114 that the fetched block contains a taken branch; and an update of the ARM program counter 116 or the x86 instruction pointer 118 by the execution pipeline 112 in response to an executed taken branch instruction that the branch predictor 114 did not predict as taken. In response to a control flow event, the instruction fetch unit 114 updates the fetch address to the exception handler address, the predicted target address, or the executed target address, respectively. In one embodiment, the instruction cache 102 is a unified cache that caches both ISA instructions 124 and data. Note that in the unified cache embodiment, although the cache may be written or read based on a load/store address, when the microprocessor 100 fetches ISA instructions 124 from the unified cache, the cache is accessed based on the values of the ARM program counter 116 and the x86 instruction pointer 118 rather than a load/store address. The instruction cache 102 may be a random access memory (RAM) device.
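The fetch address behavior described above can be illustrated with a minimal sketch. It is not the patented hardware: the class names, the `kind` labels, and the 16-byte block size are assumptions made only for illustration. The sketch advances the fetch address by the block size after each fetch and redirects it when a control flow event (exception, predicted taken branch, or executed taken branch) supplies a new target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFlowEvent:
    """A redirect of the fetch stream; `target` becomes the new fetch address."""
    kind: str    # hypothetical labels: "exception", "predicted_taken", "executed_taken"
    target: int  # handler address, predicted target, or executed target


class FetchUnit:
    """Toy model of a fetch address register (block size is an assumed value)."""

    def __init__(self, start_address: int, block_size: int = 16):
        self.fetch_address = start_address
        self.block_size = block_size

    def next_fetch(self, event: Optional[ControlFlowEvent] = None) -> int:
        """Return the address fetched now, then advance sequentially by one block."""
        if event is not None:
            # A control flow event overrides the sequential fetch address.
            self.fetch_address = event.target
        address = self.fetch_address
        self.fetch_address += self.block_size
        return address
```

For example, a unit created with `FetchUnit(0x1000)` fetches 0x1000 and then 0x1010, and a `ControlFlowEvent("predicted_taken", 0x2000)` passed to the next call redirects the stream to 0x2000, after which sequential fetching resumes from there.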

The instruction mode indicator 132 is state that indicates whether the microprocessor 100 is currently fetching, formatting/decoding, and translating x86 ISA or ARM ISA instructions 124 into microinstructions 126. In addition, the execution pipeline 112 and the memory subsystem 108 receive the instruction mode indicator 132, which affects how the microinstructions 126 are executed, although only for a small subset of the microinstruction set. The x86 instruction pointer register 118 holds the memory address of the next x86 ISA instruction 124 to be executed, and the ARM program counter register 116 holds the memory address of the next ARM ISA instruction 124 to be executed. To control program flow, the microprocessor 100 updates the x86 instruction pointer register 118 and the ARM program counter register 116 as it runs x86 and ARM machine language programs, respectively, to the address of the next sequential instruction, the target address of a branch instruction, or an exception handler address. As the microprocessor 100 executes instructions of x86 and ARM ISA machine language programs, it fetches the ISA instructions of those programs from system memory and places them into the instruction cache 102, replacing less recently fetched and executed instructions. The instruction fetch unit 114 generates the fetch address 134 based on the value of the x86 instruction pointer register 118 or the ARM program counter register 116, depending on whether the instruction mode indicator 132 indicates that the microprocessor 100 is currently fetching ISA instructions 124 in x86 or ARM mode. In one embodiment, the x86 instruction pointer register 118 and the ARM program counter register 116 are implemented as a shared hardware instruction address register that provides its contents to the instruction fetch unit and branch predictor 114 and that is updated by the execution pipeline 112 according to x86 or ARM semantics, depending on whether the instruction mode indicator 132 indicates x86 or ARM.

The environment mode indicator 136 is state that indicates whether the microprocessor 100 applies x86 or ARM ISA semantics to the various aspects of the execution environment in which it operates, such as virtual memory, exceptions, cache control, and global execution-time protection. The instruction mode indicator 132 and the environment mode indicator 136 thus together define multiple execution modes. In a first mode, both the instruction mode indicator 132 and the environment mode indicator 136 indicate the x86 ISA, and the microprocessor 100 operates as a normal x86 ISA processor. In a second mode, both indicators indicate the ARM ISA, and the microprocessor 100 operates as a normal ARM ISA processor. In a third mode, the instruction mode indicator 132 indicates the x86 ISA but the environment mode indicator 136 indicates the ARM ISA; this mode is useful for running user-mode x86 machine language programs under the control of an ARM operating system or hypervisor. Conversely, in a fourth mode, the instruction mode indicator 132 indicates the ARM ISA but the environment mode indicator 136 indicates the x86 ISA; this mode is useful for running user-mode ARM machine language programs under the control of an x86 operating system or hypervisor. The values of the instruction mode indicator 132 and the environment mode indicator 136 are initially determined at reset. In one embodiment, the initial values are encoded as microcode constants but may be modified by blowing configuration fuses and/or by a microcode patch. In another embodiment, the initial values are provided to the microprocessor 100 by an external input. In one embodiment, the environment mode indicator 136 can be changed only after a reset performed by a reset-to-ARM instruction 124 or a reset-to-x86 instruction 124 (see FIGS. 6A and 6B below); that is, the environment mode indicator 136 does not change during normal operation of the microprocessor 100 unless the microprocessor 100 is reset, whether by a normal reset or by a reset-to-x86 or reset-to-ARM instruction 124.
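The four execution modes above follow from combining two two-valued indicators. The short sketch below enumerates them; the `ISA` enum and the `describe_mode` function are hypothetical names introduced only for this illustration and do not correspond to any hardware interface in the disclosure.

```python
from enum import Enum


class ISA(Enum):
    X86 = "x86"
    ARM = "ARM"


def describe_mode(instruction_mode: ISA, environment_mode: ISA) -> str:
    """Summarize the execution mode implied by the two indicators."""
    if instruction_mode is environment_mode:
        # First and second modes: both indicators agree.
        return f"normal {instruction_mode.value} ISA processor"
    # Third and fourth modes: user-mode programs of one ISA run under
    # an operating system or hypervisor of the other ISA.
    return (f"user-mode {instruction_mode.value} programs under an "
            f"{environment_mode.value} operating system or hypervisor")


if __name__ == "__main__":
    for instruction_mode in ISA:
        for environment_mode in ISA:
            print(instruction_mode.value, environment_mode.value, "->",
                  describe_mode(instruction_mode, environment_mode))
```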

The hardware instruction translator 104 receives x86 and ARM ISA machine language instructions 124 as input and, in response, provides as output one or more microinstructions 126 that implement each x86 or ARM ISA instruction 124. The execution pipeline 112 executes the one or more microinstructions 126, and the collective result of their execution implements the x86 or ARM ISA instruction 124. That is, the collective execution of the microinstructions 126 performs the operation specified by the x86 or ARM ISA instruction 124 on the inputs specified by that instruction to produce the result defined by that instruction. In this way, the hardware instruction translator 104 translates the x86 or ARM ISA instruction 124 into one or more microinstructions 126. The hardware instruction translator 104 comprises a set of transistors arranged in a predetermined manner to translate x86 ISA and ARM ISA machine language instructions 124 into implementing microinstructions 126. The hardware instruction translator 104 includes Boolean logic gates that generate the implementing microinstructions 126 (such as the simple instruction translator 204 of FIG. 2). In one embodiment, the hardware instruction translator 104 also includes a microcode read-only memory (such as element 234 of the complex instruction translator 206 of FIG. 2), which it uses to generate the implementing microinstructions 126 for complex ISA instructions 124, as described further with respect to FIG. 2. Preferably, the hardware instruction translator 104 need not be able to translate the entire set of ISA instructions 124 defined by the x86 user manual or the ARM user manual; it need only be able to translate a subset of those instructions. Specifically, the subset of ISA instructions 124 defined by the x86 user manual that the hardware instruction translator 104 translates does not necessarily correspond to any existing x86 ISA processor developed by Intel, and the subset defined by the ARM user manual that it translates does not necessarily correspond to any existing ISA processor developed by ARM Ltd. The one or more implementing microinstructions 126 that implement an x86 or ARM ISA instruction 124 may be provided by the hardware instruction translator 104 to the execution pipeline 112 all at once or in sequence. Advantageously, the hardware instruction translator 104 provides the implementing microinstructions 126 directly to the execution pipeline 112 for execution, without requiring that they be stored in a memory between the two. In the embodiment of the microprocessor 100 of FIG. 1, as the microprocessor 100 runs an x86 or ARM machine language program, the hardware instruction translator 104 translates an x86 or ARM machine language instruction 124 into one or more microinstructions 126 each time the microprocessor 100 executes that instruction. The embodiment of FIG. 8, however, uses a microinstruction cache to avoid the repeated translation that would otherwise occur each time the microprocessor 100 executes an x86 or ARM ISA instruction 124. Embodiments of the hardware instruction translator 104 are described in more detail with respect to FIG. 2.
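The difference between translating on every execution (FIG. 1) and reusing cached translations (FIG. 8) can be sketched in software as follows. This is only an analogy under stated assumptions: the microinstruction names, the instruction strings, and the choice to key the cache by the instruction text are all hypothetical simplifications, not details taken from the disclosure.

```python
class CachingTranslator:
    """Toy translator that optionally remembers earlier translations."""

    def __init__(self, translate, use_cache):
        self._translate = translate
        self._cache = {} if use_cache else None
        self.translations = 0  # how many times translation work was actually done

    def fetch_microinstructions(self, isa_instruction):
        if self._cache is not None and isa_instruction in self._cache:
            return self._cache[isa_instruction]  # reuse, no re-translation
        self.translations += 1
        microinstructions = tuple(self._translate(isa_instruction))
        if self._cache is not None:
            self._cache[isa_instruction] = microinstructions
        return microinstructions


def toy_translate(isa_instruction):
    # Hypothetical mapping: one ISA instruction becomes one or more microinstructions.
    table = {
        "x86:ADD [mem], reg": ["uLOAD", "uADD", "uSTORE"],
        "ARM:ADD r0, r1, r2": ["uADD"],
    }
    return table[isa_instruction]
```

Running the same instruction twice through `CachingTranslator(toy_translate, use_cache=False)` performs two translations, whereas the cached variant performs one and returns the stored microinstructions on the second request.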

The execution pipeline 112 executes the implementing microinstructions 126 provided by the hardware instruction translator 104. Broadly speaking, the execution pipeline 112 is a general-purpose high-speed microinstruction processor. Although the functions described herein are performed by an execution pipeline 112 that has some x86/ARM-specific features, most x86/ARM-specific functions are performed by other portions of the microprocessor 100, such as the hardware instruction translator 104. In one embodiment, the execution pipeline 112 performs register renaming, superscalar issue, and out-of-order execution of the implementing microinstructions 126 received from the hardware instruction translator 104. The execution pipeline 112 is described in more detail with respect to FIG. 4.

The microarchitecture of the microprocessor 100 includes: (1) the microinstruction set; (2) a set of resources accessible by the microinstructions 126 of the microinstruction set, which is a superset of the x86 ISA and ARM ISA resources; and (3) a set of micro-exceptions that the microprocessor 100 is defined to generate in response to executing the microinstructions 126, which is a superset of the x86 ISA and ARM ISA exceptions. The microarchitecture is distinct from the x86 ISA and the ARM ISA. More specifically, the microinstruction set differs from the x86 ISA and ARM ISA instruction sets in several respects. First, there is no one-to-one correspondence between the set of operations that the microinstructions of the microinstruction set may instruct the execution pipeline 112 to perform and the set of operations that the instructions of the x86 ISA and ARM ISA instruction sets may instruct the microprocessor to perform. Although many of the operations are the same, some operations specifiable by the microinstruction set are not specifiable by the x86 ISA and/or ARM ISA instruction sets; conversely, some operations specifiable by the x86 ISA and/or ARM ISA instruction sets are not specifiable by the microinstruction set. Second, the microinstructions of the microinstruction set are encoded differently from the instructions of the x86 ISA and ARM ISA instruction sets. That is, although many of the same operations (for example, add, shift, load, and return) are specifiable by both the microinstruction set and the x86 and ARM ISA instruction sets, there is no one-to-one correspondence between the binary opcode value-to-operation mappings of the microinstruction set and those of the x86 or ARM ISA instruction sets. Where a binary opcode value-to-operation mapping of the microinstruction set matches one of the x86 or ARM ISA instruction sets, the match is generally a coincidence, and there is still no one-to-one correspondence between them. Third, the bit fields of the microinstructions of the microinstruction set do not correspond one-to-one with the bit fields of the instructions of the x86 or ARM ISA instruction sets.

Taken as a whole, the microprocessor 100 can run x86 ISA and ARM ISA machine language program instructions. However, the execution pipeline 112 itself cannot execute x86 or ARM ISA machine language instructions; rather, it executes the implementing microinstructions 126 of the microinstruction set of the microarchitecture of the microprocessor 100, into which the x86 ISA and ARM ISA instructions are translated. Although the microarchitecture is distinct from the x86 ISA and the ARM ISA, other embodiments are contemplated in which the microinstruction set and other microarchitecture-specific resources are exposed to the user. In these embodiments, the microarchitecture effectively serves as a third ISA, in addition to the x86 ISA and the ARM ISA, whose machine language programs the microprocessor can run.

The following table (Table 1) describes some of the bit fields of the microinstructions 126 of the microinstruction set of one embodiment of the microprocessor 100 of the present invention.

The following table (Table 2) describes some of the microinstructions of the microinstruction set of one embodiment of the microprocessor 100 of the present invention.

The microprocessor 100 also includes some microarchitecture-specific resources, such as microarchitecture-specific general-purpose registers, media registers, and segment registers (for example, registers used for register renaming or by microcode), control registers not found in the x86 or ARM ISA, and a private random access memory (PRAM). In addition, the microarchitecture can generate exceptions, namely the micro-exceptions mentioned above. These exceptions are not found in or specified by the x86 or ARM ISA; typically they cause a replay of a microinstruction 126 and of the microinstructions 126 that depend on it. Examples include: a load miss, in which the execution pipeline 112 speculatively assumes that a load hits and replays the load microinstruction 126 if it misses; a translation lookaside buffer (TLB) miss, in which the microinstruction 126 is replayed after the page table walk and TLB fill; a floating-point microinstruction 126 that receives a denormal operand that was speculated to be normal, in which case the microinstruction 126 is replayed after the execution pipeline 112 normalizes the operand; and a load microinstruction 126 that has executed, after which an older store microinstruction 126 with a colliding address is detected, requiring the load microinstruction 126 to be replayed. It should be understood that the bit fields listed in Table 1, the microinstructions listed in Table 2, and the microarchitecture-specific resources and microarchitecture-specific exceptions described here merely illustrate the microarchitecture of the present invention and are not an exhaustive list of all possible embodiments.

The register file 106 includes hardware registers used by the microinstructions 126 to hold source and/or destination operands. The execution pipeline 112 writes its results 128 to the register file 106 and receives operands for the microinstructions 126 from the register file 106. The hardware registers instantiate the x86 ISA-defined and ARM ISA-defined registers, and some registers of the register file 106 are shared by the x86 ISA-defined and ARM ISA-defined general-purpose registers. For example, in one embodiment, the register file 106 instantiates fifteen 32-bit registers that are shared by the ARM ISA registers R0 through R14 and the x86 ISA accumulator (EAX) through R14D registers. Thus, if a first microinstruction 126 writes a value to the ARM R2 register, a subsequent second microinstruction 126 that reads the x86 accumulator register receives the same value that the first microinstruction 126 wrote, and vice versa. This feature advantageously allows x86 ISA and ARM ISA machine language programs to communicate quickly through registers. For example, an ARM machine language program running under an ARM machine language operating system may change the instruction mode 132 to x86 ISA and transfer control to an x86 machine language routine to perform a particular function; this can improve execution speed because the x86 ISA supports some instructions that perform an operation faster than the ARM ISA can. The ARM program provides the needed data to the x86 routine through the shared registers of the register file 106. Conversely, the x86 routine places its results in the shared registers of the register file 106 so that the ARM program can see them after the x86 routine returns. Similarly, an x86 machine language program running under an x86 machine language operating system may change the instruction mode 132 to ARM ISA and transfer control to an ARM machine language routine; the x86 program provides the needed data to the ARM routine through the shared registers of the register file 106, and the ARM routine provides its results through the shared registers so that the x86 program can see them after the ARM routine returns. A sixteenth 32-bit register, which instantiates the x86 R15D register, is not shared with the ARM R15 register, because ARM R15 is the ARM program counter register 116, which is instantiated separately. In addition, in one embodiment, 32-bit portions of the sixteen 128-bit x86 XMM0 through XMM15 registers and of the sixteen 128-bit Advanced SIMD ("Neon") registers are shared with the thirty-two 32-bit ARM VFPv3 floating-point registers. The register file 106 also instantiates flag registers (namely the x86 EFLAGS register and the ARM condition flags register), as well as the various control and status registers defined by the x86 ISA and ARM ISA, including x86 model specific registers (MSRs) and the ARM-reserved coprocessor (8-15) registers. The register file 106 also instantiates non-architectural registers, such as non-architectural general-purpose registers used for register renaming or by the microcode 234, as well as non-architectural x86 MSRs and implementation-defined, or vendor-specific, ARM coprocessor registers. The register file 106 is described further with respect to FIG. 5.
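The register sharing described above amounts to two architectural names resolving to one physical storage location. The sketch below models that idea only. The class, the slot numbering, and the alias table are hypothetical; the single pairing shown simply mirrors the example given in the text (ARM R2 and the x86 accumulator) and is not a statement of the actual register mapping.

```python
class SharedRegisterFile:
    """Toy model: register names from both ISAs alias shared 32-bit slots."""

    def __init__(self, aliases, num_slots=15):
        self._slots = [0] * num_slots
        self._aliases = dict(aliases)  # (isa, register name) -> slot index

    def write(self, isa, name, value):
        # Values are truncated to 32 bits, matching the 32-bit shared registers.
        self._slots[self._aliases[(isa, name)]] = value & 0xFFFFFFFF

    def read(self, isa, name):
        return self._slots[self._aliases[(isa, name)]]


# Hypothetical alias table; slot 0 is an arbitrary choice for this sketch.
ALIASES = {("ARM", "R2"): 0, ("x86", "EAX"): 0}
```

With this table, a value written through the ARM name is visible through the x86 name and vice versa, which is the property that lets routines of the two ISAs pass data to each other without going through memory.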

The memory subsystem 108 includes a cache memory hierarchy (in one embodiment, a level-1 instruction cache 102, a level-1 data cache, and a unified level-2 cache). The memory subsystem 108 also includes various memory request queues, such as load, store, fill, and snoop queues and write-combining buffers. The memory subsystem 108 also includes a memory management unit (MMU) that has translation lookaside buffers (TLBs), preferably separate instruction and data TLBs. The memory subsystem 108 further includes a table walk engine that obtains virtual-to-physical address translations in response to a TLB miss. Although the instruction cache 102 and the memory subsystem 108 are shown as separate in FIG. 1, the instruction cache 102 is logically part of the memory subsystem 108. The memory subsystem 108 is configured so that x86 and ARM machine language programs share a common memory space, which allows them to communicate easily with each other through memory.

The memory subsystem 108 is aware of the instruction mode 132 and the environment mode 136, which enables it to perform various operations in the appropriate ISA context. For example, the memory subsystem 108 performs certain memory access violation checks (such as limit violation checks) depending on whether the instruction mode indicator 132 indicates the x86 or ARM ISA. In another embodiment, the memory subsystem 108 flushes the TLBs in response to a change of the environment mode indicator 136 but does not flush them in response to a change of the instruction mode indicator 132, which yields better performance in the third and fourth modes described above, in which one of the instruction mode indicator 132 and the environment mode indicator 136 indicates x86 and the other indicates ARM. In another embodiment, in response to a TLB miss, the table walk engine performs a page table walk to fill the TLB using either x86 page tables or ARM page tables, depending on whether the environment mode indicator 136 indicates the x86 or ARM ISA. In another embodiment, the memory subsystem 108 examines the architectural state of the x86 ISA control registers that affect cache policy (such as the CR0 CD and NW bits) if the environment mode indicator 136 indicates the x86 ISA, and examines the architectural state of the corresponding ARM ISA control registers (such as the SCTLR I and C bits) if it indicates the ARM ISA. In another embodiment, the memory subsystem 108 examines the architectural state of the x86 ISA control registers that affect memory management (such as the CR0 PG bit) if the environment mode indicator 136 indicates the x86 ISA, and examines the architectural state of the corresponding ARM ISA control registers (such as the SCTLR M bit) if it indicates the ARM ISA. In another embodiment, the memory subsystem 108 examines the architectural state of the x86 ISA control registers that affect alignment checking (such as the CR0 AM bit) if the environment mode indicator 136 indicates the x86 ISA, and examines the architectural state of the corresponding ARM ISA control registers (such as the SCTLR A bit) if it indicates the ARM ISA. In another embodiment, the memory subsystem 108 (as well as the hardware instruction translator 104, for privileged instructions) examines the architectural state of the x86 ISA control registers that specify the current privilege level (CPL) if the environment mode indicator 136 indicates the x86 ISA, and examines the architectural state of the corresponding ARM ISA control registers that indicate user or privileged mode if it indicates the ARM ISA. However, in one embodiment, the x86 ISA and the ARM ISA share control bits/registers of the microprocessor 100 that have analogous functions, rather than the microprocessor 100 instantiating separate control bits/registers for each ISA.
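Two of the behaviors above lend themselves to a small sketch: the TLB is flushed when the environment mode changes but not when the instruction mode changes, and the control bit consulted for memory management is chosen by the environment mode (CR0 PG for x86, SCTLR M for ARM). The class below is a hypothetical software model written for illustration; its names and its dictionary-based TLB are assumptions, not the disclosed hardware.

```python
class MemorySubsystemModel:
    """Toy model of mode-aware memory subsystem behavior."""

    def __init__(self, instruction_mode="x86", environment_mode="x86"):
        self.instruction_mode = instruction_mode
        self.environment_mode = environment_mode
        self.tlb = {}  # virtual page -> physical page (assumed representation)
        self.x86_cr0 = {"CD": 0, "NW": 0, "PG": 0, "AM": 0}
        self.arm_sctlr = {"I": 0, "C": 0, "M": 0, "A": 0}

    def set_instruction_mode(self, isa):
        # Changing only the instruction mode keeps cached translations.
        self.instruction_mode = isa

    def set_environment_mode(self, isa):
        # Changing the environment mode invalidates cached translations.
        if isa != self.environment_mode:
            self.environment_mode = isa
            self.tlb.clear()

    def paging_enabled(self):
        # The environment mode selects which ISA's control bit is consulted.
        if self.environment_mode == "x86":
            return bool(self.x86_cr0["PG"])
        return bool(self.arm_sctlr["M"])
```

Keeping the TLB across instruction mode changes is what benefits the mixed modes, where user programs of one ISA run under an operating system of the other and the address translations remain valid.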

Although the configuration registers 122 and the register file 106 are shown separately in the figures, the configuration registers 122 may be understood as part of the register file 106. The configuration registers 122 include a global configuration register that controls various aspects of the operation of the microprocessor 100 with respect to the x86 ISA and the ARM ISA, such as enabling or disabling various features. The global configuration register may be used to disable the ability of the microprocessor 100 to execute ARM ISA machine language programs, that is, to make the microprocessor 100 an x86-only microprocessor 100, and to disable other related ARM-specific capabilities (such as the launch-x86 and reset-to-x86 instructions 124 and the implementation-defined coprocessor registers referred to herein). The global configuration register may also be used to disable the ability of the microprocessor 100 to execute x86 ISA machine language programs, that is, to make the microprocessor 100 an ARM-only microprocessor 100, and to disable other related capabilities (such as the launch-ARM and reset-to-ARM instructions 124 and the new non-architectural model-specific registers referred to herein). In one embodiment, the microprocessor 100 is manufactured with default configuration settings, such as hard-coded values in the microcode 234, which the microcode 234 uses at initialization time to configure the microprocessor 100, for example by writing the configuration registers 122. However, some of the configuration registers 122 are set by hardware rather than by the microcode 234. Additionally, the microprocessor 100 includes fuses that can be read by the microcode 234. These fuses may be blown to modify the default configuration values. In one embodiment, the microcode 234 reads the fuse values, performs an exclusive-OR operation on the default value and the fuse value, and writes the result to the configuration registers 122. Furthermore, the effect of the fuse value modification may be reversed by a microcode 234 patch. Where the microprocessor 100 is capable of executing both x86 and ARM programs, the global configuration register may be used to determine whether the microprocessor 100 (or a particular core 100 of a multi-core part of the processor, as shown in FIG. 7) boots as an x86 microprocessor or as an ARM microprocessor upon reset or, as shown in FIG. 6A and FIG. 6B, in response to an x86-style INIT. The global configuration register also includes bits that provide initial default values for certain architectural control registers, such as the ARM ISA SCTLR and CPACR registers. The multi-core embodiment shown in FIG. 7 has only one global configuration register, although each core may be configured separately, for example to boot as either an x86 core or an ARM core, that is, with the instruction mode indicator 132 and the environment mode indicator 136 both set to x86 or both set to ARM. Furthermore, the launch-ARM instruction 126 and the launch-x86 instruction 126 may be used to switch dynamically between the x86 and ARM instruction modes 132. In one embodiment, the global configuration register is readable via an x86 RDMSR instruction to a new non-architectural model-specific register (MSR), and a portion of its control bits is writable via an x86 WRMSR instruction to that new non-architectural MSR. The global configuration register is also readable via an ARM MRC/MRRC instruction to an ARM coprocessor register mapped to the new non-architectural MSR, and the same portion of its control bits is writable via an ARM MCR/MCRR instruction to the ARM coprocessor register mapped to the new non-architectural MSR.
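
The fuse-based override described above amounts to a bitwise exclusive-OR of two values. The following is a minimal sketch under stated assumptions: the 32-bit register width, the example bit patterns, and the function name `boot_configuration` are hypothetical and are not taken from the disclosure.

```python
def boot_configuration(default_value: int, fuse_value: int, width: int = 32) -> int:
    """XOR the hard-coded default with the fuse value; a blown fuse (1) flips a bit."""
    mask = (1 << width) - 1  # assumed register width
    return (default_value ^ fuse_value) & mask


default = 0b1010  # hypothetical hard-coded default held in microcode
fuses = 0b0110    # hypothetical fuse pattern: bits 1 and 2 blown
config = boot_configuration(default, fuses)  # bits 1 and 2 are inverted
```

Because the exclusive-OR is its own inverse, applying the same fuse pattern a second time restores the default, which is consistent with a microcode patch being able to reverse the effect of the fuses.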

The configuration registers 122 also include various control registers that control different aspects of the operation of the microprocessor 100. These non-x86/ARM control registers include what are referred to herein as global control registers, non-ISA control registers, non-x86/ARM control registers, generic control registers, and the like. In one embodiment, these control registers are accessible via x86 RDMSR/WRMSR instructions to non-architectural model-specific registers (MSRs) and via ARM MCR/MRC (or MCRR/MRRC) instructions to new implementation-defined coprocessor registers. For example, the microprocessor 100 includes non-x86/ARM control registers that provide fine-grained cache control, that is, control finer than that provided by the x86 ISA and ARM ISA control registers.

In one embodiment, the microprocessor 100 allows ARM ISA machine language programs to access the x86 ISA model-specific registers (MSRs) through implementation-defined ARM ISA coprocessor registers that are mapped directly to the corresponding x86 MSRs. The MSR address is specified in the ARM ISA R1 register. The data is read from or written to the ARM ISA register specified by the MRC/MRRC/MCR/MCRR instruction. In one embodiment, a subset of the MSRs is password protected, that is, an instruction must supply a password when attempting to access such an MSR. In this embodiment, the password is specified in the ARM R7:R6 registers. If the access would cause an x86 general protection fault, the microprocessor 100 generates an ARM ISA undefined instruction abort mode (UND) exception. In one embodiment, ARM coprocessor 4 (address: 0, 7, 15, 0) is used to access the corresponding x86 MSRs.
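
The access path described above can be illustrated with a small behavioral model. This is a sketch only: the classes `MsrBank`, `GeneralProtectionFault`, and `UndefinedInstruction` and the function `arm_read_msr` are hypothetical names. It also assumes, beyond what the text states, that a wrong password or an unknown MSR address raises the x86 general protection fault and that R7 holds the upper 32 bits of the password.

```python
class GeneralProtectionFault(Exception):
    """Stands in for an x86 general protection fault."""


class UndefinedInstruction(Exception):
    """Stands in for the ARM ISA UND exception."""


class MsrBank:
    """Hypothetical MSR storage with an optional password-protected subset."""

    def __init__(self, values, protected=(), password=0):
        self._values = dict(values)
        self._protected = set(protected)
        self._password = password

    def read(self, address, password):
        if address not in self._values:
            raise GeneralProtectionFault(hex(address))
        # Assumption: a wrong password on a protected MSR also faults.
        if address in self._protected and password != self._password:
            raise GeneralProtectionFault(hex(address))
        return self._values[address]


def arm_read_msr(regs, msrs):
    """Model an MRC/MRRC read: R1 holds the MSR address, R7:R6 the password."""
    password = (regs["R7"] << 32) | regs["R6"]  # assumed ordering of R7:R6
    try:
        return msrs.read(regs["R1"], password)
    except GeneralProtectionFault as fault:
        # The x86 fault is reported to the ARM program as a UND exception.
        raise UndefinedInstruction(f"MSR access {fault} faulted") from None
```

The point of the sketch is the translation of the fault type: the ARM program never observes an x86 exception, only the ARM-defined one.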

The microprocessor 100 also includes an interrupt controller (not shown) coupled to the execution pipeline 112. In one embodiment, the interrupt controller is an x86-style advanced programmable interrupt controller (APIC). The interrupt controller maps x86 ISA interrupt events to ARM ISA interrupt events. In one embodiment, x86 INTR maps to the ARM IRQ interrupt; x86 NMI maps to the ARM IRQ interrupt; x86 INIT causes an INIT-reset sequence, regardless of which instruction set architecture (x86 or ARM) the microprocessor 100 originally started in out of hardware reset; x86 SMI maps to the ARM FIQ interrupt; and x86 STPCLK, A20, Thermal, PREQ, and Rebranch are not mapped to ARM interrupts. ARM machine language programs can access the functions of the APIC through new implementation-defined ARM coprocessor registers. In one embodiment, the APIC register address is specified in the ARM R0 register, and the APIC register addresses are the same as the x86 addresses. In one embodiment, ARM coprocessor 6 is used for privileged mode functions typically executed by the operating system; its address is 0, 7, nn, 0, where nn is 15 to access the APIC and nn is 12 to 14 to access the bus interface unit in order to perform 8-bit, 16-bit, and 32-bit input/output cycles on the processor bus. The microprocessor 100 also includes a bus interface unit (not shown), coupled to the memory subsystem 108 and the execution pipeline 112, that interfaces the microprocessor 100 to the processor bus. In one embodiment, the processor bus conforms to one of the microprocessor buses of the Intel Pentium family of microprocessors. ARM machine language programs can access the functions of the bus interface unit through new implementation-defined ARM coprocessor registers in order to generate input/output cycles on the processor bus, that is, input/output bus transfers to a specified address in input/output space, which are used to communicate with the system chipset. For example, an ARM machine language program can generate an SMI acknowledge special cycle or the input/output cycles associated with C-state transitions. In one embodiment, the input/output address is specified in the ARM R0 register. In one embodiment, the microprocessor 100 has power management capabilities, such as the well-known P-state and C-state management. ARM machine language programs can perform power management through new implementation-defined ARM coprocessor registers. In one embodiment, the microprocessor 100 includes an encryption unit (not shown) located within the execution pipeline 112. In one embodiment, the encryption unit is substantially similar to the encryption unit of VIA microprocessors that include the Padlock security technology. ARM machine language programs can access the functions of the encryption unit, such as encryption instructions, through new implementation-defined ARM coprocessor registers. In one embodiment, ARM coprocessor 5 is used for user mode functions typically executed by user mode application programs, such as those that use the features of the encryption unit.
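
The interrupt mapping and the coprocessor 6 address decoding described above can be tabulated as follows. This is an illustrative sketch; the names `X86_TO_ARM_INTERRUPT`, `map_x86_interrupt`, and `coprocessor6_target` are hypothetical, and the text does not say which of the values 12 to 14 selects which transfer width, so the sketch does not assign one.

```python
from typing import Optional

# x86 interrupt event -> ARM interrupt event (None means not mapped).
X86_TO_ARM_INTERRUPT = {
    "INTR": "IRQ",
    "NMI": "IRQ",
    "SMI": "FIQ",
    "STPCLK": None,
    "A20": None,
    "Thermal": None,
    "PREQ": None,
    "Rebranch": None,
}


def map_x86_interrupt(event: str) -> Optional[str]:
    """Return the ARM-side effect of an x86 interrupt event."""
    if event == "INIT":
        # INIT is not delivered as an ARM interrupt; it starts a reset sequence.
        return "INIT-reset sequence"
    return X86_TO_ARM_INTERRUPT[event]


def coprocessor6_target(nn: int) -> str:
    """Decode the third field of the coprocessor 6 address 0, 7, nn, 0."""
    if nn == 15:
        return "APIC"
    if 12 <= nn <= 14:
        return "bus interface unit"
    raise ValueError(f"nn={nn} is not described in the text")
```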

When the microprocessor 100 executes x86 ISA and ARM ISA machine language programs, the hardware instruction translator 104 performs the hardware translation each time the microprocessor 100 executes an x86 or ARM ISA instruction 124. In contrast, a system that uses software translation can reuse the same translation in multiple instances rather than re-translating a previously translated machine language instruction, which helps improve performance. Furthermore, the embodiment of FIG. 8 uses a microinstruction cache to avoid the repeated translation that might otherwise occur each time the microprocessor executes an x86 or ARM ISA instruction 124. The approaches described in the foregoing embodiments suit different program characteristics and execution environments, and each may therefore help improve performance.

The branch predictor 114 caches history information about previously executed x86 and ARM branch instructions. Based on the cached history, the branch predictor 114 predicts the presence and target address of x86 and ARM branch instructions within a cache line fetched from the instruction cache 102. In one embodiment, the cached history includes the memory address of the branch instruction 124, the branch target address, a direction indicator, the type of the branch instruction, the start byte of the branch instruction within the cache line, and an indication of whether the instruction wraps across multiple cache lines. In one embodiment, the branch predictor 114 is enhanced to predict the direction of ARM ISA conditional non-branch instructions, as described in U.S. Provisional Application No. 61/473,067, filed April 7, 2011, entitled "APPARATUS AND METHOD FOR USING BRANCH PREDICTION TO EFFICIENTLY EXECUTE CONDITIONAL NON-BRANCH INSTRUCTIONS". In one embodiment, the hardware instruction translator 104 also includes a static branch predictor that predicts the direction and branch target address of x86 and ARM branch instructions based on the opcode, the condition code type, whether the branch is backward or forward, and so forth.

The present invention also contemplates various embodiments that implement different combinations of the features defined by the x86 ISA and the ARM ISA. For example, in one embodiment, the microprocessor 100 implements the ARM, Thumb, ThumbEE, and Jazelle instruction set states, but provides only a trivial implementation of the Jazelle extension. The microprocessor 100 also implements the following instruction set extensions: Thumb-2, VFPv3-D32, Advanced SIMD (Neon), multiprocessing, and VMSA. It does not implement the following extensions: the security extensions, the fast context switch extension, ARM debug features (ARM programs can, however, access the x86 debug functions via ARM MCR/MRC instructions to new implementation-defined coprocessor registers), and performance monitoring counters (ARM programs can, however, access the x86 performance counters via new implementation-defined coprocessor registers). As a further example, in one embodiment, the microprocessor 100 treats the ARM SETEND instruction as a no-operation (NOP) instruction and supports only the little-endian data format. In another embodiment, the microprocessor 100 does not implement the x86 SSE 4.2 capabilities.

The present invention contemplates embodiments in which the microprocessor 100 is an enhancement of an existing microprocessor, such as the commercially available VIA Nano™ microprocessor manufactured by VIA Technologies, Inc. of Taipei, Taiwan. The Nano microprocessor is capable of executing x86 ISA machine language programs but not ARM ISA machine language programs. The Nano microprocessor includes a high-performance register-renaming, superscalar, out-of-order execution pipeline and a hardware translator that translates x86 ISA instructions into microinstructions for execution by the execution pipeline. The present invention enhances the Nano hardware instruction translator so that, in addition to x86 machine language instructions, it also translates ARM ISA machine language instructions into microinstructions for execution by the execution pipeline. The enhancements to the hardware instruction translator include enhancements to both the simple instruction translator and the complex instruction translator (including the microcode). Additionally, new microinstructions may be added to the microinstruction set to support the translation of ARM ISA machine language instructions into microinstructions, and the execution pipeline may be enhanced to execute the new microinstructions. Furthermore, the Nano register file and memory subsystem may be enhanced to support the ARM ISA, including the sharing of certain registers. The branch prediction units may also be enhanced to accommodate ARM branch instruction prediction in addition to x86 branch prediction. An advantage of this embodiment is that, because the execution pipeline of the Nano microprocessor is already largely ISA-agnostic, only relatively modest modifications to it are required to accommodate ARM ISA instructions. The enhancements to the execution pipeline include the manner in which condition code flags are generated and used, the semantics used to update and report the instruction pointer register, the access privilege protection method, and various memory management related functions, such as access violation checks, paging and translation lookaside buffer (TLB) use, and cache policies. The foregoing are given only as illustrative examples and do not limit the present invention; some of these features are described in more detail below. Finally, as mentioned above, some of the features defined by the x86 ISA and the ARM ISA may not be supported in the embodiments that enhance the Nano microprocessor, such as x86 SSE 4.2 and the ARM security extensions, fast context switch extension, debug features, and performance counters; some of these are described in more detail below. Furthermore, the enhancement of the Nano processor to support ARM ISA machine language programs is an example of an embodiment that makes integrated use of design, testing, and manufacturing resources to bring to market a single integrated circuit product capable of running both x86 and ARM machine language programs, which together represent the vast majority of existing machine language programs, and is thus in line with current market trends. The embodiments of the microprocessor 100 described herein may be configured essentially as an x86 microprocessor, as an ARM microprocessor, or as a microprocessor that concurrently executes both x86 ISA and ARM ISA machine language programs. The ability to execute x86 ISA and ARM ISA machine language programs concurrently may be achieved through dynamic switching between the x86 and ARM instruction modes 132 on a single microprocessor 100 (or core 100 of FIG. 7), or by configuring one or more cores of a multi-core microprocessor 100 (as shown in FIG. 7) as ARM cores and one or more cores as x86 cores, that is, with dynamic switching between the x86 and ARM instruction modes on each of the cores of the multi-core microprocessor 100. Furthermore, ARM ISA cores have traditionally been designed as intellectual property cores that various third-party vendors incorporate into their applications, such as systems on chip and/or embedded applications. Consequently, the ARM ISA does not specify a standard processor bus to interface the ARM core to the rest of the system, such as a chipset or other peripheral devices. Advantageously, the Nano processor already includes a high-speed x86-style processor bus that interfaces to memory and peripheral devices, as well as a memory coherency structure that may work together with the microprocessor 100 to support the execution of ARM ISA machine language programs in an x86 computer system environment.

Referring to FIG. 2, a block diagram illustrates the hardware instruction translator 104 of FIG. 1 in more detail. The hardware instruction translator 104 comprises hardware, more specifically a collection of transistors. The hardware instruction translator 104 includes: an instruction formatter 202 that receives the instruction mode indicator 132 and blocks of x86 ISA and ARM ISA instruction bytes 124 from the instruction cache 102 of FIG. 1 and outputs formatted x86 ISA and ARM ISA instructions 242; a simple instruction translator (SIT) 204 that receives the instruction mode indicator 132 and the environment mode indicator 136 and outputs implementing microinstructions 244 and a microcode address 252; a complex instruction translator (CIT) 206 (also referred to as a microcode unit) that receives the microcode address 252 and the environment mode indicator 136 and provides implementing microinstructions 246; and a multiplexer 212 that receives the microinstructions 244 from the simple instruction translator 204 on one input and the microinstructions 246 from the complex instruction translator 206 on the other input, and provides implementing microinstructions 126 to the execution pipeline 112 of FIG. 1. The instruction formatter 202 is described in more detail with respect to FIG. 3. The simple instruction translator 204 includes an x86 simple instruction translator 222 and an ARM simple instruction translator 224. The complex instruction translator 206 includes a micro-program counter (micro-PC) 232 that receives the microcode address 252, a microcode read-only memory (ROM) 234 that receives a ROM address 254 from the micro-PC 232, a microsequencer 236 that updates the micro-PC 232, an instruction indirection register (IIR) 235, and a microtranslator 237 that generates the implementing microinstructions 246 output by the complex instruction translator 206. Both the implementing microinstructions 244 generated by the simple instruction translator 204 and the implementing microinstructions 246 generated by the complex instruction translator 206 are microinstructions 126 of the microinstruction set of the microarchitecture of the microprocessor 100, and both are directly executable by the execution pipeline 112.

The multiplexer 212 is controlled by a select input 248. Normally, the multiplexer 212 selects the microinstructions 244 from the simple instruction translator 204; however, when the simple instruction translator 204 encounters a complex x86 or ARM ISA instruction 242 and transfers control, or traps, to the complex instruction translator 206, the simple instruction translator 204 controls the select input 248 so that the multiplexer 212 selects the microinstructions 246 from the complex instruction translator 206. When the register allocation table (RAT) 402 (see FIG. 4) encounters a microinstruction 126 with a particular bit set to indicate that it is the last microinstruction 126 in the sequence implementing the complex ISA instruction 242, the RAT 402 controls the select input 248 so that the multiplexer 212 returns to selecting the microinstructions 244 from the simple instruction translator 204. Additionally, when the reorder buffer 422 (see FIG. 4) is ready to retire a microinstruction 126 whose status indicates that microinstructions from the complex instruction translator 206 must be selected, for example because the microinstruction 126 has caused an exception condition, the reorder buffer 422 controls the select input 248 so that the multiplexer 212 selects the microinstructions 246 from the complex instruction translator 206.
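
The control of the select input 248 described above behaves like a two-state machine with three sources of transitions. The following sketch models only that behavior; the class `TranslatorMux` and its method names are hypothetical and do not correspond to signals named in the disclosure.

```python
class TranslatorMux:
    """Behavioral model of multiplexer 212 and its select input 248."""

    SIMPLE = "simple"    # select microinstructions 244 from the SIT 204
    COMPLEX = "complex"  # select microinstructions 246 from the CIT 206

    def __init__(self):
        self.select = self.SIMPLE  # the normal case

    def on_sit_trap(self):
        """The SIT met a complex ISA instruction and trapped to the CIT."""
        self.select = self.COMPLEX

    def on_rat_last_microinstruction(self):
        """The RAT saw the microinstruction flagged as last in the sequence."""
        self.select = self.SIMPLE

    def on_rob_retire(self, needs_microcode: bool):
        """The ROB is retiring a microinstruction, e.g. one that raised an exception."""
        if needs_microcode:
            self.select = self.COMPLEX

    def output(self, sit_microinstruction, cit_microinstruction):
        """Forward the microinstruction chosen by the current select value."""
        if self.select == self.COMPLEX:
            return cit_microinstruction
        return sit_microinstruction
```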

The simple instruction translator 204 receives the ISA instructions 242 and decodes them as x86 ISA instructions if the instruction mode indicator 132 indicates x86, and as ARM ISA instructions if the instruction mode indicator 132 indicates ARM. The simple instruction translator 204 also determines whether each ISA instruction 242 is a simple or a complex ISA instruction. For a simple ISA instruction 242, the simple instruction translator 204 is able to output all of the implementing microinstructions 126 that implement the ISA instruction 242; that is, the complex instruction translator 206 does not provide any of the implementing microinstructions 126 for a simple ISA instruction 124. In contrast, a complex ISA instruction 124 requires the complex instruction translator 206 to provide at least some, if not all, of the implementing microinstructions 126. In one embodiment, for a subset of the instructions 124 of the ARM and x86 ISA instruction sets, the simple instruction translator 204 outputs microinstructions 244 that implement part of the x86/ARM ISA instruction 124 and then transfers control to the complex instruction translator 206, which subsequently outputs the remaining microinstructions 246 that implement the x86/ARM ISA instruction 124. The multiplexer 212 is controlled to first provide the implementing microinstructions 244 from the simple instruction translator 204 as the microinstructions 126 supplied to the execution pipeline 112, and then to provide the implementing microinstructions 246 from the complex instruction translator 206 as the microinstructions 126 supplied to the execution pipeline 112. The simple instruction translator 204 knows the starting microcode ROM 234 address of each of the microcode routines employed by the hardware instruction translator 104 to generate the implementing microinstructions 126 for the various complex ISA instructions 124, and when the simple instruction translator 204 decodes a complex ISA instruction 242, it provides the corresponding microcode routine address 252 to the micro-PC 232 of the complex instruction translator 206. The simple instruction translator 204 outputs all of the microinstructions 244 needed to implement a relatively large percentage of the instructions 124 of the ARM and x86 ISA instruction sets, particularly the ISA instructions 124 that x86 ISA and ARM ISA machine language programs execute most frequently, and only a relatively small number of the instructions 124 require the complex instruction translator 206 to provide implementing microinstructions 246. According to one embodiment, examples of x86 instructions implemented primarily by the complex instruction translator 206 are the RDMSR/WRMSR, CPUID, complex arithmetic (such as FSQRT and the transcendental instructions), and IRET instructions; examples of ARM instructions implemented primarily by the complex instruction translator 206 are the MCR, MRC, MSR, MRS, SRS, and RFE instructions. The foregoing list does not limit the present invention; it merely indicates the kinds of ISA instructions that the complex instruction translator 206 may implement.
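
The division of labor described above can be sketched as a decode table that yields the microinstructions supplied by the simple instruction translator together with an optional microcode entry address. Everything in this sketch is hypothetical: the table contents, the ROM address `0x40`, the microinstruction names, and the placeholder instruction `PARTIAL` are chosen only to show the three cases (all from the SIT, all from the CIT, and a SIT prefix followed by a CIT remainder), and the classification of `ADD` as a simple instruction is an assumption.

```python
# Hypothetical microcode ROM: entry address -> microinstructions of the routine.
MICROCODE_ROM = {0x40: ["cit_uop0", "cit_uop1"]}

# Hypothetical decode table: ISA instruction -> (SIT microinstructions,
# microcode entry address, or None if no microcode routine is needed).
DECODE_TABLE = {
    "ADD": (["add_uop"], None),      # simple: the SIT supplies everything
    "RDMSR": ([], 0x40),             # complex: the CIT supplies everything
    "PARTIAL": (["sit_uop"], 0x40),  # the SIT emits a prefix, the CIT the rest
}


def translate(instruction: str):
    """Return the (source, microinstruction) stream sent to the pipeline."""
    sit_uops, microcode_address = DECODE_TABLE[instruction]
    stream = [("SIT", uop) for uop in sit_uops]
    if microcode_address is not None:
        # The SIT hands the routine's start address to the micro-PC of the CIT.
        stream += [("CIT", uop) for uop in MICROCODE_ROM[microcode_address]]
    return stream
```

The ordering of the returned stream mirrors the multiplexer behavior: microinstructions from the simple instruction translator come first, followed by those from the complex instruction translator.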

When the instruction mode indicator 132 indicates x86, the x86 simple instruction translator 222 decodes the x86 ISA instructions 242 and translates them into the implementing microinstructions 244; when the instruction mode indicator 132 indicates ARM, the ARM simple instruction translator 224 decodes the ARM ISA instructions 242 and translates them into the implementing microinstructions 244. In one embodiment, the simple instruction translator 204 is a block of Boolean logic gates synthesized using well-known synthesis tools. In one embodiment, the x86 simple instruction translator 222 and the ARM simple instruction translator 224 are separate blocks of Boolean logic gates; in another embodiment, however, they reside within a single block of Boolean logic gates. In one embodiment, the simple instruction translator 204 translates up to three ISA instructions 242 and provides up to six implementing microinstructions 244 to the execution pipeline 112 in a single clock cycle. In one embodiment, the simple instruction translator 204 includes three sub-translators (not shown), each of which translates a single formatted ISA instruction 242: the first sub-translator can translate a formatted ISA instruction 242 that requires no more than three implementing microinstructions 126; the second can translate a formatted ISA instruction 242 that requires no more than two implementing microinstructions 126; and the third can translate a formatted ISA instruction 242 that requires no more than one implementing microinstruction 126. In one embodiment, the simple instruction translator 204 includes a hardware state machine that enables it to output multiple microinstructions 244 over multiple clock cycles to implement a single ISA instruction 242.
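
The per-cycle limits described above (three sub-translators that handle at most three, two, and one microinstruction respectively, for a total of at most six) can be illustrated with a short model. The in-order assignment policy shown here, in which translation for the cycle stops at the first instruction that does not fit its sub-translator, is an assumption made for illustration; the text does not describe how instructions are assigned to sub-translators. The names `SLOT_CAPACITY` and `translate_cycle` are hypothetical.

```python
# Maximum microinstructions each of the three sub-translators can emit.
SLOT_CAPACITY = (3, 2, 1)


def translate_cycle(pending_uop_counts):
    """Return the microinstruction counts of the instructions translated this cycle.

    pending_uop_counts lists, in program order, how many microinstructions
    each waiting ISA instruction needs. Assumed policy: instruction i goes to
    sub-translator i, and the cycle ends at the first one that does not fit.
    """
    issued = []
    for capacity, uops in zip(SLOT_CAPACITY, pending_uop_counts):
        if uops > capacity:
            break  # left for a later cycle (or another mechanism)
        issued.append(uops)
    return issued
```

Under this policy the best case is the sequence 3, 2, 1, which yields three ISA instructions and six microinstructions in one cycle, matching the stated maximum.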

In one embodiment, the simple instruction translator 204 also performs a number of different exception checks based on the instruction mode pointer 132 and/or the environment mode pointer 136. For example, if the instruction mode pointer 132 indicates x86 and the x86 simple instruction translator 222 decodes an ISA instruction 124 that is invalid for the x86 ISA, the simple instruction translator 204 generates an x86 invalid opcode exception; similarly, if the instruction mode pointer 132 indicates ARM and the ARM simple instruction translator 224 decodes an ISA instruction 124 that is invalid for the ARM ISA, the simple instruction translator 204 generates an ARM undefined instruction exception. As another example, if the environment mode pointer 136 indicates the x86 ISA, the simple instruction translator 204 checks whether each x86 ISA instruction 242 it encounters requires a particular privilege level and, if so, whether the current privilege level (CPL) satisfies the privilege level required by that x86 ISA instruction 242, and generates an exception if it does not; similarly, if the environment mode pointer 136 indicates the ARM ISA, the simple instruction translator 204 checks whether each formatted ARM ISA instruction 242 is a privileged-mode instruction and, if so, whether the current mode is a privileged mode, and generates an exception if the current mode is user mode. The complex instruction translator 206 performs similar functions for certain complex ISA instructions 242.

The complex instruction translator 206 outputs a sequence of implementing microinstructions 246 to the multiplexer 212. The microcode ROM 234 stores the ROM instructions 247 of microcode routines. The microcode ROM 234 outputs a ROM instruction 247 in response to the address of the next ROM instruction 247 to be fetched from the microcode ROM 234, which is held by the microprogram counter 232. Typically, the microprogram counter 232 receives its initial value 252 from the simple instruction translator 204 in response to the simple instruction translator 204 decoding a complex ISA instruction 242. In other cases, such as in response to a reset or an exception, the microprogram counter 232 receives the address of the reset microcode routine or of the appropriate microcode exception handler, respectively. The microsequencer 236 normally advances the microprogram counter 232 by the size of a ROM instruction 247 in order to step sequentially through a microcode routine, and alternatively updates it to a target address generated by the execution pipeline 112 in response to execution of a control-type microinstruction 126 (such as a branch instruction), thereby effecting branches to non-sequential addresses within the microcode ROM 234. The microcode ROM 234 is fabricated within the semiconductor die of the microprocessor 100.
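
The sequencing behavior described above can be sketched as a toy model. The class name `MicroSequencer`, its method names, and the convention that one ROM instruction occupies one address are illustrative assumptions; the text states only that the microprogram counter is loaded with an initial value and is then either advanced sequentially or redirected to a branch target.

```python
class MicroSequencer:
    """Toy model of a microprogram counter stepping through a microcode ROM.

    Assumption: each ROM instruction occupies exactly one address, so
    advancing by the size of a ROM instruction means adding 1.
    """

    def __init__(self, rom):
        self.rom = rom      # list of ROM instructions, indexed by address
        self.micro_pc = 0   # address of the next ROM instruction to fetch

    def start(self, address):
        # Initial value: the routine entry supplied when a complex ISA
        # instruction is decoded, or a reset / exception handler address.
        self.micro_pc = address

    def fetch(self, branch_target=None):
        """Fetch one ROM instruction, then update the microprogram counter."""
        instruction = self.rom[self.micro_pc]
        if branch_target is None:
            self.micro_pc += 1               # sequential flow
        else:
            self.micro_pc = branch_target    # non-sequential flow (branch)
        return instruction
```

Note that the counter in this sketch indexes only the ROM list, which mirrors the point that the microprogram counter holds neither ISA instruction addresses nor system memory addresses.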

In addition to the microinstructions 244 that implement a simple ISA instruction 124 or a portion of a complex ISA instruction 124, the simple instruction translator 204 also generates ISA instruction information 255 that is written to the instruction indirect register 235. The ISA instruction information 255 stored in the instruction indirect register 235 describes the ISA instruction 124 being translated; for example, it identifies the source and destination registers specified by the ISA instruction 124 and the form of the ISA instruction 124, such as whether the ISA instruction 124 operates on an operand in memory or in an architectural register 106 of the microprocessor 100. This allows the microcode routines to be generic, that is, there is no need for a different microcode routine for each different source and/or destination architectural register 106. In particular, the simple instruction translator 204 is aware of the organization of the register file 106, including which registers are shared registers 506, and can translate the register information provided in x86 ISA and ARM ISA instructions 124, by way of the ISA instruction information 255, to the appropriate registers in the register file 106. The ISA instruction information 255 also includes a shift field, an immediate field, a constant field, rename information for each source operand and for the microinstruction 126 itself, information indicating the first and last microinstruction 126 in the sequence of microinstructions 126 that implements the ISA instruction 124, and other bits that store useful information gathered by the hardware instruction translator 104 while translating the ISA instruction 124.

The microtranslator 237 receives the ROM instructions 247 from the microcode ROM 234, together with the contents of the instruction indirect register 235, and generates implementing microinstructions 246 in response. The microtranslator 237 translates a given ROM instruction 247 into different sequences of microinstructions 246 depending on the information received from the instruction indirect register 235, such as the form of the ISA instruction 124 and the combination of source and/or destination architectural registers 106 that it specifies. In some embodiments, much of the ISA instruction information 255 is merged with the ROM instruction 247 to generate the implementing microinstructions 246. In one embodiment, each ROM instruction 247 is approximately 40 bits wide and each microinstruction 246 is approximately 200 bits wide. In one embodiment, the microtranslator 237 can generate up to three microinstructions 246 from a single ROM instruction 247. The microtranslator 237 comprises Boolean logic gates that generate the implementing microinstructions 246.

An advantage of using the microtranslator 237 is that, because the simple instruction translator 204 itself generates the ISA instruction information 255, the microcode ROM 234 need not store the ISA instruction information 255 provided by the instruction indirect register 235, which reduces the size of the microcode ROM 234. Furthermore, because the microcode ROM 234 need not provide a separate routine for each different ISA instruction form and for each combination of source and/or destination architectural registers 106, the microcode ROM 234 routines can contain fewer conditional branch instructions. For example, if the complex ISA instruction 124 is a memory form, the simple instruction translator 204 generates a prologue of microinstructions 244 that includes a microinstruction 244 to load the source operand from memory into a temporary register 106, and the microtranslator 237 generates a microinstruction 246 to store the result from the temporary register 106 to memory; if the complex ISA instruction 124 is a register form, however, the prologue moves the source operand from the source register specified by the ISA instruction 124 to the temporary register, and the microtranslator 237 generates a microinstruction 246 to move the result from the temporary register to the architectural destination register 106 specified by the instruction indirect register 235. In one embodiment, the microtranslator 237 is similar in many respects to the microtranslator described in U.S. Patent Application Serial No. 12/766,244, filed April 23, 2010, which is incorporated herein by reference. However, the microtranslator 237 of the present disclosure is modified to translate ARM ISA instructions 124 in addition to x86 ISA instructions 124.
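
The way a generic ROM instruction can be combined with the contents of the instruction indirect register may be easier to see in a minimal sketch. Everything concrete here is hypothetical: the tuple layout `(opcode, destination, source)`, the placeholder names `IIR_DST` and `IIR_SRC`, and the function `microtranslate` are invented for illustration and say nothing about the actual 40-bit or 200-bit encodings.

```python
def microtranslate(rom_instruction, iir):
    """Fill register placeholders of a ROM instruction from IIR contents.

    rom_instruction: hypothetical (opcode, destination, source) tuple whose
        operands are either fixed names (e.g. a temporary register "T0")
        or placeholders that appear as keys of `iir`.
    iir: dict mapping placeholder names to the architectural registers
        named by the ISA instruction currently being translated.
    """
    opcode, destination, source = rom_instruction

    def resolve(operand):
        # Placeholders are replaced; fixed operands pass through unchanged.
        return iir.get(operand, operand)

    return (opcode, resolve(destination), resolve(source))
```

With this sketch, the single ROM instruction `("add", "IIR_DST", "IIR_SRC")` yields `("add", "EAX", "EBX")` for one ISA instruction and `("add", "R3", "R7")` for another, which is the sense in which one microcode routine can serve many register combinations.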

It is noted that the microprogram counter 232 is distinct from the ARM program counter 116 and the x86 instruction pointer 118; that is, the microprogram counter 232 does not hold the address of an ISA instruction 124, and the addresses it holds do not lie within the system memory address space. It is further noted that the microinstructions 246 are generated by the hardware instruction translator 104 and provided directly to the execution pipeline 112 for execution, rather than being results 128 of the execution pipeline 112.

Referring to FIG. 3, a block diagram shows the instruction formatter 202 of FIG. 2 in more detail. The instruction formatter 202 receives blocks of x86 ISA and ARM ISA instruction bytes 124 from the instruction cache 102 of FIG. 1. Because x86 ISA instructions are variable in length, an x86 instruction 124 may begin at any byte within a block of instruction bytes 124. Determining the length and location of an x86 ISA instruction within a cache block is further complicated by the fact that the x86 ISA permits prefix bytes and that the instruction length may be affected by the current default address size and operand size. In addition, an ARM ISA instruction is either 2 bytes or 4 bytes long, depending on the current ARM instruction set state 322 and the opcode of the ARM ISA instruction 124, and is accordingly either 2-byte aligned or 4-byte aligned. The instruction formatter 202 therefore extracts distinct x86 ISA and ARM ISA instructions from the stream of instruction bytes 124 formed by the blocks received from the instruction cache 102. That is, the instruction formatter 202 formats the stream of x86 ISA and ARM ISA instruction bytes, which greatly simplifies the otherwise difficult task of the simple instruction translator 204 of FIG. 2 in decoding and translating the ISA instructions 124.

The instruction formatter 202 includes a predecoder 302. When the instruction mode pointer 132 indicates x86, the predecoder 302 predecodes the instruction bytes 124 as x86 instruction bytes to generate predecode information; when the instruction mode pointer 132 indicates ARM, the predecoder 302 predecodes the instruction bytes 124 as ARM instruction bytes to generate predecode information. An instruction byte queue (IBQ) 304 receives the blocks of ISA instruction bytes 124 and the associated predecode information generated by the predecoder 302.

An array of length decoders and ripple logic 306 receives the contents of the bottom entry of the instruction byte queue 304, namely a block of ISA instruction bytes 124 and the associated predecode information. The length decoders and ripple logic 306 also receive the instruction mode pointer 132 and the ARM ISA instruction set state 322. In one embodiment, the ARM ISA instruction set state 322 comprises the J and T bits of the ARM ISA CPSR register. In response to their inputs, the length decoders and ripple logic 306 generate decode information. The decode information includes the lengths of the x86 and ARM instructions within the block of ISA instruction bytes 124, x86 prefix information, and, for each ISA instruction byte 124, indicators of whether the byte is the start byte of an ISA instruction 124, the end byte of an ISA instruction 124, and/or a valid byte. A mux queue (MQ) 308 receives the block of ISA instruction bytes 124, the associated predecode information generated by the predecoder 302, and the associated decode information generated by the length decoders and ripple logic 306.

Control logic (not shown) examines the contents of the bottom entry of the mux queue (MQ) 308 and controls the multiplexers 312 to extract distinct, or formatted, ISA instructions together with the associated predecode and decode information; the extracted information is provided to a formatted instruction queue (FIQ) 314. The formatted instruction queue 314 buffers the formatted ISA instructions 242 and the related information for provision to the simple instruction translator 204 of FIG. 2. In one embodiment, the multiplexers 312 extract up to three formatted ISA instructions and the associated information per clock cycle.

In one embodiment, the instruction formatter 202 is similar in many respects to the XIBQ, instruction formatter, and FIQ collectively described in U.S. Patent Application Serial Nos. 12/571,997, 12/572,002, 12/572,045, 12/572,024, 12/572,052 and 12/572,058, filed October 1, 2009, which are incorporated herein by reference. However, the XIBQ, instruction formatter, and FIQ of those applications are modified so that they format ARM ISA instructions 124 in addition to x86 ISA instructions 124. The length decoder 306 is modified to decode ARM ISA instructions 124 in order to generate their lengths and the start, end, and valid byte indicators. In particular, if the instruction mode pointer 132 indicates the ARM ISA, the length decoder 306 examines the current ARM instruction set state 322 and the opcode of the ARM ISA instruction 124 to determine whether the ARM instruction 124 is a 2-byte or a 4-byte instruction. In one embodiment, the length decoder 306 includes separate length decoders that generate the length data for x86 ISA instructions 124 and for ARM ISA instructions 124, respectively, and the outputs of these separate length decoders are wire-ORed together to provide the output to the ripple logic 306. In one embodiment, the formatted instruction queue 314 comprises separate queues that hold separate portions of the formatted instructions 242. In one embodiment, the instruction formatter 202 provides up to three formatted ISA instructions 242 to the simple instruction translator 204 in a single clock cycle.
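
The 2-byte versus 4-byte determination for ARM instructions described above can be sketched as follows. The function name and parameters are hypothetical. The Thumb rule used here (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction) comes from the ARM architecture's Thumb encoding rather than from this document, and the Jazelle and ThumbEE states selected by the J bit are deliberately left unmodeled.

```python
def arm_instruction_length(first_halfword, t_bit, j_bit=0):
    """Return the length in bytes (2 or 4) of an ARM ISA instruction.

    first_halfword: first 16 bits of the instruction (only inspected in
        Thumb state).
    t_bit, j_bit: the T and J bits of the CPSR (the instruction set state).
    Assumption: only ARM state (J=0, T=0) and Thumb state (J=0, T=1) are
    modeled; other states raise ValueError.
    """
    if j_bit:
        raise ValueError("Jazelle/ThumbEE states are not modeled in this sketch")
    if not t_bit:
        return 4  # ARM state: every instruction is 4 bytes long
    top_five_bits = (first_halfword >> 11) & 0x1F
    if top_five_bits in (0b11101, 0b11110, 0b11111):
        return 4  # first halfword of a 32-bit Thumb instruction
    return 2      # 16-bit Thumb instruction
```

A hardware length decoder would evaluate this for every candidate start position in the block in parallel; the sketch shows only the decision for a single position.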

Referring to FIG. 4, a block diagram shows the execution pipeline 112 of FIG. 1 in more detail. The execution pipeline 112 is coupled to the hardware instruction translator 104 to receive implementing microinstructions directly from the hardware instruction translator 104 of FIG. 2. The execution pipeline 112 includes a microinstruction queue 401 that receives the microinstructions 126; a register allocation table 402 that receives the microinstructions from the microinstruction queue 401; an instruction scheduler 404 coupled to the register allocation table 402; a plurality of reservation stations 406 coupled to the instruction scheduler 404; an instruction issue unit 408 coupled to the reservation stations 406; a reorder buffer 422 coupled to the register allocation table 402, the instruction scheduler 404, and the reservation stations 406; and execution units 424 coupled to the reservation stations 406, the instruction issue unit 408, and the reorder buffer 422. The register allocation table 402 and the execution units 424 receive the instruction mode pointer 132.

The microinstruction queue 401 serves as a buffer when the rate at which the hardware instruction translator 104 generates implementing microinstructions 126 differs from the rate at which the execution pipeline 112 executes them. In one embodiment, the microinstruction queue 401 comprises an M-to-N compressible microinstruction queue. The compressible microinstruction queue enables the execution pipeline 112 to receive up to M microinstructions 126 (in one embodiment, M is six) from the hardware instruction translator 104 in a given clock cycle and to store the received microinstructions 126 in a queue structure of width N (in one embodiment, N is three), in order to provide up to N microinstructions 126 per clock cycle to the register allocation table 402, which can process up to N microinstructions 126 per clock cycle. The microinstruction queue 401 is compressible in that it leaves no holes among its entries: it fills empty entries sequentially with the microinstructions 126 as they arrive from the hardware instruction translator 104, regardless of the particular clock cycle in which they are received. This advantageously enables high utilization of the execution units 424 (see FIG. 4) while providing more efficient instruction storage than a non-compressible queue of width N or width M. Specifically, a non-compressible queue of width N would require the hardware instruction translator 104, and in particular the simple instruction translator 204, to re-translate in a subsequent clock cycle one or more ISA instructions 124 that it had already translated in a previous clock cycle, because a non-compressible queue of width N cannot receive more than N microinstructions 126 in a single clock cycle; this re-translation wastes power. A non-compressible queue of width M, although it would not require re-translation by the simple instruction translator 204, would leave holes among its entries, wasting space and therefore requiring more rows of entries, and thus a larger and more power-hungry queue, to provide comparable buffering capacity.
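
The M-to-N compressible behavior described above can be modeled in a few lines. This is a behavioral sketch, not the queue's hardware structure: the class name, the `capacity` parameter (total number of entries, which the text does not specify), and the method names are assumptions. The values M = 6 and N = 3 are those of the embodiment.

```python
from collections import deque


class CompressibleQueue:
    """Accepts up to M microinstructions per cycle, delivers up to N."""

    def __init__(self, m=6, n=3, capacity=12):
        self.m = m                # maximum microinstructions accepted per cycle
        self.n = n                # maximum microinstructions delivered per cycle
        self.capacity = capacity  # assumed total number of entries
        self.entries = deque()    # densely packed, so no holes can form

    def push(self, uops):
        """Store the microinstructions produced in one cycle; return how many fit."""
        if len(uops) > self.m:
            raise ValueError("translator produces at most M microinstructions per cycle")
        free = self.capacity - len(self.entries)
        accepted = uops[:free]
        self.entries.extend(accepted)  # appended right behind the last entry
        return len(accepted)

    def pop(self):
        """Deliver up to N of the oldest microinstructions in program order."""
        count = min(self.n, len(self.entries))
        return [self.entries.popleft() for _ in range(count)]
```

Pushing four microinstructions in one cycle and two in the next, for instance, lets the consumer drain two full groups of three; the leftover entry from the first cycle is packed together with the later arrivals instead of leaving a hole.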

The register allocation table 402 receives the microinstructions 126 from the microinstruction queue 401 and generates dependency information with respect to the other microinstructions 126 in flight within the microprocessor 100; it also performs register renaming to increase microinstruction-level parallelism and thereby exploit the superscalar, out-of-order execution capability of the execution pipeline 112. If the ISA instruction 124 is indicated as x86, the register allocation table 402 generates the dependency information and performs the register renaming with respect to the x86 ISA registers 106 of the microprocessor 100; if the ISA instruction 124 is indicated as ARM, the register allocation table 402 generates the dependency information and performs the register renaming with respect to the ARM ISA registers 106 of the microprocessor 100. As mentioned above, however, some of the registers 106 may be shared by the x86 ISA and the ARM ISA. The register allocation table 402 also allocates an entry in the reorder buffer 422 to each microinstruction 126 in program order, so that the reorder buffer 422 can retire the microinstructions 126 and their associated x86 ISA and ARM ISA instructions 124 in program order even though the microinstructions 126 may execute out of order with respect to the x86 ISA and ARM ISA instructions 124 they implement. The reorder buffer 422 comprises a circular queue, each entry of which stores information about an in-flight microinstruction 126, including, among other things, the execution status of the microinstruction 126, a tag identifying whether the microinstruction 126 was translated from an x86 or an ARM ISA instruction 124, and storage for the result of the microinstruction 126.
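
In-order retirement after out-of-order completion, as described above, can be sketched with a small model. The class and method names are hypothetical, a `deque` stands in for the circular queue, and, as a simplification, each microinstruction is retired individually; the grouping of all microinstructions that implement one ISA instruction is not modeled.

```python
from collections import deque


class ReorderBuffer:
    """Entries are allocated in program order and retired in program order."""

    def __init__(self):
        self.entries = deque()  # stand-in for the circular queue

    def allocate(self, uop, isa):
        """Allocate the next entry; `isa` is the tag ("x86" or "ARM")."""
        entry = {"uop": uop, "isa": isa, "done": False, "result": None}
        self.entries.append(entry)
        return entry

    def complete(self, entry, result):
        """Record a result; completion may happen in any order."""
        entry["done"] = True
        entry["result"] = result

    def retire(self):
        """Retire completed entries from the head only, preserving order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["uop"])
        return retired
```

If the third of three allocated microinstructions completes first, `retire()` returns nothing until the first one completes, which is how program-order retirement is preserved.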

The instruction scheduler 404 receives the register-renamed microinstructions 126 and the dependency information from the register allocation table 402 and, based on the type of each instruction and the availability of the execution units 424, dispatches each microinstruction 126 and its dependency information to the reservation station 406 associated with the appropriate execution unit 424, which will execute the microinstruction 126.

For each microinstruction 126 waiting in a reservation station 406, the instruction issue unit 408 detects when the associated execution unit 424 is available and the dependencies of the microinstruction 126 are satisfied (for example, its source operands are available), and then issues the microinstruction 126 to the execution unit 424 for execution. As mentioned above, the microinstructions 126 issued by the instruction issue unit 408 may execute out of program order and in a superscalar fashion.

In one embodiment, the execution units 424 include integer/branch units 412, media units 414, load/store units 416, and floating point units 418. The execution units 424 execute the microinstructions 126 to generate results 128, which are provided to the reorder buffer 422. Although the execution units 424 are largely agnostic of whether the microinstructions 126 they execute were translated from x86 or ARM ISA instructions 124, they use the instruction mode pointer 132 and the environment mode pointer 136 in executing a relatively small subset of the microinstructions 126. For example, the execution pipeline 112 handles the generation of flags slightly differently depending on whether the instruction mode pointer 132 indicates the x86 ISA or the ARM ISA, and, depending on that indication, it updates either the x86 EFLAGS register or the ARM condition code flags in the program status register (PSR). As another example, the execution pipeline 112 samples the instruction mode pointer 132 to decide whether to update the x86 instruction pointer (IP) 118 or the ARM program counter (PC) 116, or to update a common instruction address register. The execution pipeline 112 likewise uses the instruction mode pointer 132 to determine whether to perform these operations with x86 or ARM semantics. Once a microinstruction 126 becomes the oldest completed microinstruction 126 in the microprocessor 100 (that is, it is at the head of the reorder buffer 422 queue and is marked completed) and all other microinstructions 126 that implement the associated ISA instruction 124 have completed, the reorder buffer 422 retires the ISA instruction 124 and frees the entries associated with the implementing microinstructions 126. In one embodiment, the microprocessor 100 can retire up to three ISA instructions 124 per clock cycle. Advantageously, the execution pipeline 112 is a high-performance, general-purpose execution engine that executes the microinstructions 126 of the microarchitecture of the microprocessor 100, which supports both x86 ISA and ARM ISA instructions 124.

Referring to FIG. 5, a block diagram shows the register file 106 of FIG. 1 in more detail. Preferably, the register file 106 is implemented as separate physical blocks of registers. In one embodiment, the general-purpose registers are implemented in one physical register file having a plurality of read ports and write ports; other registers may be physically separate from the general-purpose register file, located near the functional blocks that access them, and may have fewer read and write ports. In one embodiment, some of the non-general-purpose registers, particularly those that do not directly control hardware of the microprocessor 100 but merely store values used by the microcode 234 (for example, some x86 MSRs or ARM coprocessor registers), are implemented in a private random access memory (PRAM) that is accessible to the microcode 234. The PRAM is invisible to x86 ISA and ARM ISA programmers, however; that is, it does not lie within the ISA system memory address space.

In summary, as shown in FIG. 5, the register file 106 is logically divided into three categories: ARM-specific registers 502, x86-specific registers 504, and shared registers 506. In one embodiment, the shared registers 506 include fifteen 32-bit registers shared by the ARM ISA registers R0 through R14 and the x86 ISA registers EAX through R14D, as well as sixteen 128-bit registers shared by the x86 ISA registers XMM0 through XMM15 and the ARM ISA Advanced SIMD (Neon) registers, portions of which also overlap the thirty-two 32-bit ARM VFPv3 floating-point registers. As described above with respect to FIG. 1, sharing of the general-purpose registers means that a value written to a shared register by an x86 ISA instruction 124 will be seen by an ARM ISA instruction 124 that subsequently reads the shared register, and vice versa. This advantageously enables x86 ISA and ARM ISA routines to communicate with one another through registers. Additionally, as described above, certain bits of the architectural control registers of the x86 ISA and the ARM ISA may also be instantiated as shared registers 506. As described above, in one embodiment the x86 model-specific registers (MSRs) can be accessed by ARM ISA instructions 124 through implementation-defined coprocessor registers and are thus shared by the x86 ISA and the ARM ISA. The shared registers 506 may also include non-architectural registers, for example non-architectural equivalents of the condition flags, which are likewise renamed by the register allocation table 402. The hardware instruction translator 104 is aware of which registers are shared by the x86 ISA and the ARM ISA, and therefore generates implementing microinstructions 126 that access the correct registers.
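
The register sharing described above can be illustrated with a small model of the fifteen shared 32-bit registers. The class name is hypothetical, and the pairing of ARM R0 through R14 with the x86 registers in their conventional encoding order (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D through R14D) is an assumption made for this sketch; the text states only that R0 through R14 and EAX through R14D share the same registers.

```python
# Assumed pairing: ARM Rn shares storage with the n-th x86 register in
# encoding order. This pairing is an illustration, not stated in the text.
X86_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"] + [
    f"R{i}D" for i in range(8, 15)
]
ARM_NAMES = [f"R{i}" for i in range(15)]


class SharedRegisters:
    """Fifteen 32-bit registers visible under both x86 and ARM names."""

    def __init__(self):
        self._values = [0] * 15  # one physical register per shared pair

    @staticmethod
    def _index(name):
        if name in ARM_NAMES:
            return ARM_NAMES.index(name)
        return X86_NAMES.index(name)  # raises ValueError for unknown names

    def write(self, name, value):
        self._values[self._index(name)] = value & 0xFFFFFFFF  # keep 32 bits

    def read(self, name):
        return self._values[self._index(name)]
```

A value written under the name EAX is then read back under the name R0, and vice versa, which mirrors how x86 ISA and ARM ISA routines could pass values to each other through the shared registers.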

The ARM-specific registers 502 include the other registers defined by the ARM ISA that are not included in the shared registers 506, and the x86-specific registers 504 include the other registers defined by the x86 ISA that are not included in the shared registers 506. For example, the ARM-specific registers 502 include the ARM program counter 116, CPSR, SCTRL, FPSCR, CPACR, the coprocessor registers, and the banked general-purpose registers and saved program status registers (SPSRs) of the various exception modes. This list of ARM-specific registers 502 is given by way of illustration and is not intended to limit the invention. Likewise, for example, the x86-specific registers 504 include the x86 instruction pointer (EIP or IP) 118, EFLAGS, R15D, the upper 32 bits of the 64-bit R0 through R15 registers (that is, the portion not included in the shared registers 506), the segment registers (SS, CS, DS, ES, FS, GS), the x87 FPU registers, the MMX registers, and the control registers (for example, CR0 through CR3 and CR8). This list of x86-specific registers 504 is given by way of illustration and is not intended to limit the invention.

In one embodiment, the microprocessor 100 includes new implementation-defined ARM coprocessor registers that may be accessed, when the instruction mode indicator 132 indicates the ARM ISA, to perform x86 ISA-related operations. These operations include, but are not limited to: the ability to reset the microprocessor 100 to an x86 ISA processor (a reset-to-x86 instruction); the ability to initialize the microprocessor 100 to the x86-specific state, switch the instruction mode indicator 132 to x86, and begin fetching x86 instructions 124 at a specified x86 target address (a launch-x86 instruction); the ability to access the global configuration registers described above; the ability to access x86-specific registers (for example EFLAGS), where the x86 register to be accessed is identified in the ARM R0 register; and access to power management (for example P-state and C-state transitions), processor bus functions (for example I/O cycles), the interrupt controller, and encryption acceleration functions. Furthermore, in one embodiment, the microprocessor 100 includes new x86 non-architectural MSRs that may be accessed, when the instruction mode indicator 132 indicates the x86 ISA, to perform ARM ISA-related operations. These operations include, but are not limited to: the ability to reset the microprocessor 100 to an ARM ISA processor (a reset-to-ARM instruction); the ability to initialize the microprocessor 100 to the ARM-specific state, switch the instruction mode indicator 132 to ARM, and begin fetching ARM instructions 124 at a specified ARM target address (a launch-ARM instruction); the ability to access the global configuration registers described above; and the ability to access ARM-specific registers (for example the CPSR), where the ARM register to be accessed is identified in the EAX register.

Referring to FIGS. 6A and 6B, a flowchart illustrating operation of the microprocessor 100 of FIG. 1 is shown. Flow begins at step 602.

At step 602, the microprocessor 100 is reset. The reset may be signaled on the reset input of the microprocessor 100. Additionally, in an embodiment in which the microprocessor bus is an x86-style processor bus, the reset may be signaled by an x86-style INIT command. In response to the reset, the reset routine of the microcode 234 is invoked. The reset microcode: (1) initializes the x86-specific state 504 to the default values specified by the x86 ISA; (2) initializes the ARM-specific state 502 to the default values specified by the ARM ISA; (3) initializes the non-ISA-specific state of the microprocessor 100 to the default values specified by the manufacturer of the microprocessor 100; (4) initializes the shared ISA state 506, such as the GPRs, to the default values specified by the x86 ISA; and (5) sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the x86 ISA. In an alternate embodiment, instead of operations (4) and (5), the reset microcode initializes the shared ISA state 506 to the default values specified by the ARM ISA and sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the ARM ISA. In that embodiment, the operations of steps 638 and 642 need not be performed, and before step 614 the reset microcode initializes the shared ISA state 506 to the default values specified by the x86 ISA and sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the x86 ISA. Flow proceeds to step 604.
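The five reset operations of step 602 can be summarized in a short sketch. The function name `reset_microcode`, the dictionary keys, and the placeholder strings are hypothetical; actual default values are defined by the respective ISAs and by the manufacturer and are not modeled here.

```python
# Sketch of the reset microcode of step 602 (names and values are assumptions).
def reset_microcode(default_isa="x86"):
    """Return a dictionary describing processor state after reset.

    default_isa="x86" models operations (4) and (5) as first described;
    default_isa="ARM" models the alternate embodiment.
    """
    if default_isa not in ("x86", "ARM"):
        raise ValueError("default_isa must be 'x86' or 'ARM'")
    return {
        "x86_specific": "x86 ISA defaults",        # operation (1), state 504
        "arm_specific": "ARM ISA defaults",        # operation (2), state 502
        "non_isa": "manufacturer defaults",        # operation (3)
        "shared": f"{default_isa} ISA defaults",   # operation (4), state 506
        "instruction_mode": default_isa,           # operation (5), indicator 132
        "environment_mode": default_isa,           # operation (5), indicator 136
    }
```

In this sketch the only difference between the two embodiments is which ISA supplies the shared-state defaults and the initial mode indicators; the ISA-specific and non-ISA-specific state is initialized the same way in both.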

At step 604, the reset microcode determines whether the microprocessor 100 is configured to boot as an x86 processor or as an ARM processor. In one embodiment, as described above, the default ISA boot mode is hard-coded in the microcode but may be modified by blowing a configuration fuse or by a microcode patch. In one embodiment, the default ISA boot mode is provided to the microprocessor 100 as an external input, such as an external input pin. Flow proceeds to step 606. At step 606, if the default ISA boot mode is x86, flow proceeds to step 614; if the default ISA boot mode is ARM, flow proceeds to step 638.

At step 614, the reset microcode causes the microprocessor 100 to begin fetching x86 instructions 124 at the reset vector address specified by the x86 ISA. Flow proceeds to step 616.

At step 616, the x86 system software (for example the BIOS) configures the microprocessor 100 using, for example, x86 ISA RDMSR and WRMSR instructions 124. Flow proceeds to step 618.

At step 618, the x86 system software executes a reset-to-ARM instruction 124. The reset-to-ARM instruction causes the microprocessor 100 to reset and to come out of reset as an ARM processor. However, because the x86-specific state 504 and the non-ISA-specific configuration state are not changed by the reset-to-ARM instruction 124, the x86 system firmware can advantageously perform the initial configuration of the microprocessor 100 and then reboot the microprocessor 100 as an ARM processor while keeping intact the non-ARM configuration performed by the x86 system software. This enables "small" micro-boot code to boot an ARM operating system without requiring the micro-boot code to deal with the complexities of configuring the microprocessor 100. In one embodiment, the reset-to-ARM instruction is an x86 WRMSR instruction to a new non-architectural MSR. Flow proceeds to step 622.

At step 622, the simple instruction translator 204 traps to the reset microcode in response to the complex reset-to-ARM instruction 124. The reset microcode initializes the ARM-specific state 502 to the default values specified by the ARM ISA. However, the reset microcode does not modify the non-ISA-specific state of the microprocessor 100, which advantageously preserves the configuration performed at step 616. Additionally, the reset microcode initializes the shared ISA state 506 to the default values specified by the ARM ISA. Finally, the reset microcode sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the ARM ISA. Flow proceeds to step 624.

At step 624, the reset microcode causes the microprocessor 100 to begin fetching ARM instructions 124 at the address specified in the x86 ISA EDX:EAX registers. Flow ends at step 624.

At step 638, the reset microcode initializes the shared ISA state 506, such as the GPRs, to the default values specified by the ARM ISA. Flow proceeds to step 642.

At step 642, the reset microcode sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the ARM ISA. Flow proceeds to step 644.

At step 644, the reset microcode causes the microprocessor 100 to begin fetching ARM instructions 124 at the reset vector address specified by the ARM ISA. The ARM ISA defines two reset vector addresses, one of which is selected by an input. In one embodiment, the microprocessor 100 includes an external input that selects between the two ARM ISA-defined reset vector addresses. In another embodiment, the microcode 234 includes a default selection between the two ARM ISA-defined reset vector addresses, which may be modified by blowing a fuse and/or by a microcode patch. Flow proceeds to step 646.

At step 646, the ARM system software configures the microprocessor 100 using, for example, ARM ISA MCR and MRC instructions 124. Flow proceeds to step 648.

At step 648, the ARM system software executes a reset-to-x86 instruction 124, which causes the microprocessor 100 to reset and to come out of reset as an x86 processor. However, because the ARM-specific state 502 and the non-ISA-specific configuration state are not changed by the reset-to-x86 instruction 124, the ARM system firmware can advantageously perform the initial configuration of the microprocessor 100 and then reboot the microprocessor 100 as an x86 processor while keeping intact the non-x86 configuration performed by the ARM system software. This enables "small" micro-boot code to boot an x86 operating system without requiring the micro-boot code to deal with the complexities of configuring the microprocessor 100. In one embodiment, the reset-to-x86 instruction is an ARM MRC/MRCC instruction to a new implementation-defined coprocessor register. Flow proceeds to step 652.

At step 652, the simple instruction translator 204 traps to the reset microcode in response to the complex reset-to-x86 instruction 124. The reset microcode initializes the x86-specific state 504 to the default values specified by the x86 ISA. However, the reset microcode does not modify the non-ISA-specific state of the microprocessor 100, which advantageously preserves the configuration performed at step 646. Additionally, the reset microcode initializes the shared ISA state 506 to the default values specified by the x86 ISA. Finally, the reset microcode sets the instruction mode indicator 132 and the environment mode indicator 136 to indicate the x86 ISA. Flow proceeds to step 654.

At step 654, the reset microcode causes the microprocessor 100 to begin fetching x86 instructions 124 at the address specified in the ARM ISA R1:R0 registers. Flow ends at step 654.
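The two paths through the flowchart of FIGS. 6A and 6B, and the effect of the reset-to-ARM and reset-to-x86 instructions on processor state, can be sketched as follows. The function names `boot_flow` and `reset_to`, the dictionary keys, and the placeholder strings are hypothetical and serve only to make the step ordering and the preserved state explicit.

```python
# Sketch of the flow of FIGS. 6A and 6B (all names are assumptions).
def boot_flow(default_boot_isa):
    """Return (steps visited, ISA whose instructions are fetched at the end)."""
    steps = [602, 604, 606]
    if default_boot_isa == "x86":
        # x86 firmware configures the part, then resets to ARM.
        return steps + [614, 616, 618, 622, 624], "ARM"
    if default_boot_isa == "ARM":
        # ARM firmware configures the part, then resets to x86.
        return steps + [638, 642, 644, 646, 648, 652, 654], "x86"
    raise ValueError("default_boot_isa must be 'x86' or 'ARM'")


def reset_to(state, target_isa):
    """Model steps 622 and 652: reinitialize only what the text says changes."""
    new_state = dict(state)  # non-ISA state and the other ISA's state carry over
    key = "arm_specific" if target_isa == "ARM" else "x86_specific"
    new_state[key] = f"{target_isa} ISA defaults"
    new_state["shared"] = f"{target_isa} ISA defaults"
    new_state["instruction_mode"] = target_isa
    new_state["environment_mode"] = target_isa
    return new_state
```

The sketch highlights the point made in steps 618 and 648: configuration written by the first ISA's firmware into non-ISA-specific state survives the switch, so the boot code for the second ISA can remain small.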

Referring to FIG. 7, a block diagram illustrating a dual-core microprocessor 700 according to the present invention is shown. The dual-core microprocessor 700 includes two processing cores 100, each of which includes the elements of the microprocessor 100 of FIG. 1, so that each core can execute both x86 ISA and ARM ISA machine language programs. The cores 100 may be configured such that both cores 100 run x86 ISA programs, both cores 100 run ARM ISA programs, or one core 100 runs x86 ISA programs while the other core 100 runs ARM ISA programs. These three configurations may be mixed and changed dynamically during operation of the microprocessor 700. As described with respect to FIGS. 6A and 6B, each core 100 has a default value for its instruction mode indicator 132 and environment mode indicator 136, which may be modified by a fuse or a microcode patch, so that each core 100 may independently come out of reset as an x86 processor or as an ARM processor. Although the embodiment of FIG. 7 has only two cores 100, in other embodiments the microprocessor 700 may have more than two cores 100, each capable of executing both x86 ISA and ARM ISA machine language programs.

Referring to FIG. 8, a block diagram illustrating a microprocessor 100 capable of executing x86 ISA and ARM ISA machine language programs according to an alternate embodiment of the present invention is shown. The microprocessor 100 of FIG. 8 is similar to the microprocessor 100 of FIG. 1, and like-numbered elements are similar. However, the microprocessor 100 of FIG. 8 also includes a microinstruction cache 892, which caches microinstructions 126 that are generated by the hardware instruction translator 104 and provided directly to the execution pipeline 112. The microinstruction cache 892 is indexed by the fetch address 134 generated by the instruction fetch unit 114. If the fetch address 134 hits in the microinstruction cache 892, a multiplexer (not shown) within the execution pipeline 112 selects the microinstructions 126 from the microinstruction cache 892 rather than those from the hardware instruction translator 104; otherwise, the multiplexer selects the microinstructions 126 provided directly by the hardware instruction translator 104. The operation of a microinstruction cache, also commonly referred to as a trace cache, is well known in the art of microprocessor design. An advantage of the microinstruction cache 892 is that fetching microinstructions 126 from the microinstruction cache 892 typically takes less time than fetching instructions 124 from the instruction cache 102 and translating them into microinstructions 126 with the hardware instruction translator 104. In the embodiment of FIG. 8, as the microprocessor 100 runs an x86 or ARM ISA machine language program, the hardware instruction translator 104 need not perform the hardware translation each time an x86 or ARM ISA instruction 124 is executed; that is, no hardware translation is needed when the implementing microinstructions 126 are already present in the microinstruction cache 892.
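The hit and miss behavior of the microinstruction cache 892 can be sketched as a lookup keyed by fetch address. This is an illustration under stated assumptions: the class name `MicroinstructionCache`, the unbounded dictionary, the absence of eviction, and filling the cache on every miss are not taken from the text.

```python
# Sketch of microinstruction cache 892 (structure and fill policy are assumptions).
class MicroinstructionCache:
    def __init__(self, translate):
        # `translate` stands in for hardware instruction translator 104.
        self.translate = translate
        self.lines = {}          # fetch address -> cached microinstructions
        self.translations = 0    # counts how often the translator was used

    def fetch(self, fetch_address, isa_instruction):
        if fetch_address in self.lines:
            # Hit: the multiplexer selects the cached microinstructions.
            return self.lines[fetch_address]
        # Miss: the multiplexer selects the translator output.
        self.translations += 1
        microinstructions = self.translate(isa_instruction)
        self.lines[fetch_address] = microinstructions  # assumed fill on miss
        return microinstructions
```

Fetching the same address twice invokes the translator only once in this model, which corresponds to the statement that hardware translation is not repeated when the implementing microinstructions are already cached.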

An advantage of the microprocessor embodiments described herein is that they can run both x86 ISA and ARM ISA machine language programs by using a built-in hardware instruction translator to translate x86 ISA and ARM ISA instructions into microinstructions of a microinstruction set that is distinct from the x86 ISA and ARM ISA instruction sets, the microinstructions being executed by a shared execution pipeline of the microprocessor to which the implementing microinstructions are provided. A further advantage is that, by synergistically using a largely ISA-agnostic execution pipeline to execute the microinstructions hardware-translated from both x86 ISA and ARM ISA instructions, the design and manufacture of the microprocessor may require fewer resources than two separately designed and manufactured microprocessors, one capable of running x86 ISA machine language programs and one capable of running ARM ISA machine language programs. Additionally, embodiments of the microprocessor, particularly those that employ a superscalar out-of-order execution pipeline, have the potential to provide higher performance than existing ARM ISA processors. Furthermore, embodiments of the microprocessor also have the potential to provide higher performance in executing x86 and ARM programs than a system that employs a software translator. Finally, because the microprocessor can run both x86 ISA and ARM ISA machine language programs, it lends itself to the construction of a system on which x86 and ARM machine language programs can run concurrently with high performance.

Conditional Arithmetic and Logic Unit (Conditional ALU) Instructions

It is desirable for a microprocessor to include in its instruction set the ability for instructions to be conditionally executed. That is, an instruction specifies a condition (for example zero, negative, or greater than); the microprocessor executes the instruction if the condition flags satisfy the condition and does not execute it otherwise. As mentioned above, the ARM ISA provides this capability not only for branch instructions but for most of the instructions of its instruction set. A conditionally executed instruction may specify source operands supplied from general-purpose registers to generate a result to be written to a general-purpose destination register. U.S. Patent No. 7,647,480, whose assignee is ARM Limited of Cambridge, Great Britain, describes a data processing apparatus for handling conditional instructions. Generally, a pipelined processing unit executes a conditional instruction to produce a result data value. The result data value represents the result of the computation specified by the conditional instruction if the condition is satisfied, and represents the current data value stored in the destination register if the condition is not satisfied. Two possible solutions are described in the following paragraphs.

In the first solution, each conditional instruction in the instruction set is constrained so that the register it specifies is both a source register and the destination register. In this way, a conditional instruction consumes only two read ports of the register file: one to supply the current destination register value as a source operand, and one to supply the other source operand. The first solution thus reduces the minimum number of register file read ports required to support execution of conditional instructions by the pipelined processing unit.

The second solution removes the constraint of the first solution, so that a conditional instruction may specify a destination register separate from its source registers. The second solution requires an additional read port on the register file so that all the operand data values required by the conditional instruction (that is, the source operands and the destination operand from the register file) can be read in a single cycle. U.S. Patent No. 7,647,480 takes the first solution as its subject matter because the second solution incurs not only the cost of the additional read port but also a larger number of bits to specify the conditional instruction and a more complex data path. Specifically, the data path would need to provide logic for three input paths from the register file, and might also need forwarding logic coupled to any of the three paths.

An advantage of the embodiments presented herein is that they allow a conditional instruction to specify source operand registers that are different from the destination register without requiring an additional read port on the register file. Generally speaking, according to embodiments of the present invention, the hardware instruction translator 104 of the microprocessor 100 of FIG. 1 translates a conditionally executed ISA instruction 124 into a sequence of one or more microinstructions 126 for execution by the execution pipeline 112. The execution unit 424 that executes the last microinstruction 126 of the sequence receives the original value of the destination register specified by the conditional instruction 124, together with the means to determine whether the condition is satisfied. A preceding microinstruction 126, or the last microinstruction 126 itself, performs an operation on the source operands to generate a result. If the condition is not satisfied, the execution unit 424 that executes the last microinstruction 126 of the sequence writes the original value back to the destination register rather than writing the result value to the destination register.
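The idea that no single microinstruction needs more than two register reads can be sketched as a two-step sequence. The function name `run_conditional_alu`, the use of a Python dictionary as the register file, and the split into exactly two microinstructions are assumptions chosen for illustration; the text states only that the sequence contains one or more microinstructions.

```python
import operator


# Sketch: a conditional ALU instruction with a destination distinct from its
# sources, executed as two hypothetical microinstructions of two reads each.
def run_conditional_alu(regs, dest, src1, src2, op, condition_met):
    # Microinstruction 1 (assumed): read two sources, write a temporary.
    temp = op(regs[src1], regs[src2]) & 0xFFFFFFFF

    # Microinstruction 2 (assumed): read the temporary and the ORIGINAL
    # destination value, then write either the result or the original back.
    original = regs[dest]
    regs[dest] = temp if condition_met else original
    return regs[dest]
```

When the condition is not satisfied, the destination register ends up holding its original value, which is written back by the last microinstruction rather than simply being left untouched.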

In the embodiments described herein, a conditional ALU instruction is an ISA instruction 124 that instructs the microprocessor 100 to perform an arithmetic or logical operation on one or more source operands to generate a result and to write the result to a destination register. The ISA instruction set of the microprocessor 100 may also support other types of conditional instructions 124, such as conditional branch instructions 124 or conditional load/store instructions 124, which are distinct from conditional ALU instructions 124.

The number and types of microinstructions 126 in the sequence emitted by the hardware instruction translator 104 in response to encountering a conditional ALU instruction 124 depend on two characteristics. The first characteristic is whether the conditional ALU instruction 124 specifies that one of the source operands is to have a pre-shift operation applied to it. In one embodiment, the pre-shift operations include, for example, those described on pages A8-10 through A8-12 of the ARM Architecture Reference Manual. If the conditional ALU instruction 124 specifies a pre-shift operation, the hardware instruction translator 104 generates a shift microinstruction 126 (denoted SHF from FIG. 10 onward) as the first microinstruction 126 of the sequence. The shift microinstruction 126 performs the pre-shift operation to generate a shifted result that is written to a temporary register for use by a subsequent microinstruction 126 of the sequence. The second characteristic is whether the destination register specified by the conditional ALU instruction 124 is also one of the source operand registers. If so, the hardware instruction translator 104 performs an optimization that translates the conditional ALU instruction 124 into one fewer microinstruction 126 than would be generated for a conditional ALU instruction 124 whose destination register is not one of the source operand registers. This is described primarily with respect to FIGS. 21 through 28.
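How the two characteristics shape the emitted sequence can be sketched as follows. Only the name SHF comes from the text; the names `ALU`, `CONDITIONAL_MOVE`, and `CONDITIONAL_ALU`, and the assumption that the unoptimized, unshifted case uses exactly two microinstructions, are hypothetical placeholders.

```python
# Sketch of the translator's choice of microinstruction sequence.
# ASSUMPTION: apart from SHF, the microinstruction names and the base count
# of two microinstructions are illustrative, not taken from the text.
def translate_conditional_alu(dest, sources, has_preshift):
    sequence = []
    if has_preshift:
        # Characteristic 1: a pre-shift adds a leading shift microinstruction
        # whose result goes to a temporary register.
        sequence.append("SHF")
    if dest in sources:
        # Characteristic 2: destination is also a source, one fewer
        # microinstruction is needed.
        sequence.append("CONDITIONAL_ALU")
    else:
        sequence.append("ALU")               # compute result into a temporary
        sequence.append("CONDITIONAL_MOVE")  # select result or old destination
    return sequence
```

Under these assumptions the sequence length ranges from one microinstruction (no pre-shift, destination is a source) to three (pre-shift, destination distinct from the sources).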

Additionally, a conditional ALU instruction 124 specifies a condition that the architectural condition flags must satisfy in order for the microprocessor 100 to perform the conditional ALU instruction 124. The conditional ALU instruction 124 may specify that the architectural condition flags are to be updated with the result of the ALU operation and/or with a carry flag generated by the pre-shift. However, if the condition is not satisfied, the architectural condition flags must not be updated. Accomplishing this is complicated by the fact that the hardware instruction translator 104 translates the conditional ALU instruction 124 into a sequence of multiple microinstructions 126. Specifically, if the condition is satisfied, at least one of the microinstructions 126 must write the new condition flag values; however, the old values of the condition flags may be needed by microinstructions 126 of the sequence to determine whether the condition specified by the conditional ALU instruction 124 is satisfied and/or to perform the ALU operation. Advantageously, in these embodiments the microprocessor 100 employs techniques that ensure that the condition flags are not updated when the condition is not satisfied, and that they are updated with the correct values, including the pre-shift carry flag value, only when the condition is satisfied.
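The rule that the flags change only when the condition is satisfied can be illustrated with a flag-setting conditional add. The condition table follows the standard ARM condition codes over the N, Z, C, and V flags; the function names, the dictionary representation of the flags, and the restriction to an addition without pre-shift are assumptions of this sketch.

```python
# Sketch: evaluate an ARM condition code, then update flags only if it passes.
def condition_passes(cond, n, z, c, v):
    table = {
        "EQ": z, "NE": not z, "CS": c, "CC": not c,
        "MI": n, "PL": not n, "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(table[cond])


def conditional_add_setting_flags(flags, cond, a, b):
    """Return (result or None, flags). Operands are 32-bit unsigned values."""
    if not condition_passes(cond, **flags):
        return None, flags  # condition fails: architectural flags unchanged
    total = a + b
    result = total & 0xFFFFFFFF
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    new_flags = {
        "n": bool(sign_r),
        "z": result == 0,
        "c": total > 0xFFFFFFFF,                      # unsigned carry out
        "v": sign_a == sign_b and sign_r != sign_a,   # signed overflow
    }
    return result, new_flags
```

Note that the old flag values are consumed by the condition check before any new values are produced, which mirrors the ordering hazard the paragraph describes for a multi-microinstruction sequence.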

In embodiments of the microprocessor 100 of FIG. 1 described herein, the register file 106 that holds the general-purpose registers has only enough read ports to supply at most two source operands to the execution units 424 that execute the microinstructions 126 implementing the conditional ALU instructions 124. As described above with respect to FIG. 1, embodiments of the microprocessor 100 are contemplated as an enhancement of a commercially available microprocessor. The register file that holds the general-purpose registers of that commercially available microprocessor has only enough read ports to supply at most two source operands to the execution units that execute the microinstructions 126 described herein to implement conditional ALU instructions 124. The embodiments described herein are therefore particularly advantageous for adaptation to the microarchitecture of that commercially available microprocessor. As also described above with respect to FIG. 1, the commercially available microprocessor was originally designed for the x86 ISA, in which conditional execution of instructions is not a key feature; because the x86 ISA is accumulator-based and generally requires one of the source operands to be the destination operand, that processor would not appear to need the additional read port.

An advantage of the embodiments described herein is that, although in some cases there is a two-clock-cycle execution latency associated with executing the two microinstructions into which a conditional ALU instruction 124 is translated, and in other cases a three-clock-cycle execution latency associated with executing the two microinstructions into which a conditional ALU instruction 124 is translated, the operation performed by each microinstruction is relatively simple, which lends itself to a pipelined implementation capable of supporting a relatively high core clock frequency.

Although in the embodiments described herein the microprocessor 100 is capable of executing both ARM ISA and x86 ISA instructions, the invention is not limited thereto. Embodiments of the invention also apply to a microprocessor that executes instructions of only a single ISA. Furthermore, although in the embodiments described herein the microprocessor 100 translates ARM ISA conditional ALU instructions into microinstructions 126, the embodiments also apply to a microprocessor that executes instructions of an ISA other than ARM that likewise includes conditional ALU instructions in its instruction set.

Referring to FIG. 9, a block diagram illustrating the microprocessor 100 of FIG. 1 in more detail is shown. The microprocessor 100 includes an architectural condition flags register 926 within the register file 106 of FIG. 1, as well as the execution units 424 and the reorder buffer (ROB) 422 of FIG. 4. The condition flags register 926 stores the architectural condition flags. In one embodiment, when the instruction mode indicator 132 indicates the ARM ISA, the condition flags register 926 stores values according to the semantics of the ARM ISA condition flags, and when the instruction mode indicator 132 indicates the x86 ISA, the condition flags register 926 stores values according to the semantics of the x86 ISA condition flags, that is, the x86 EFLAGS. As described above with respect to FIG. 5, the register file 106 is preferably implemented as separate physical blocks of registers; in particular, the condition flags register 926 may, for example, be a physical register file separate from the physical register file of the general-purpose registers. Thus, even though the condition flags are provided to the execution units 424 for execution of the microinstructions 126, as described below, the read ports of the condition flags register file may be distinct from the read ports of the general-purpose register file.

The condition flags register 926 outputs its condition flag values to a first data input of a three-input multiplexer 922. A second data input of the multiplexer 922 receives condition flag results from the appropriate entry of the reorder buffer 422. A third data input of the multiplexer 922 receives condition flag results from a flag bus 928. The multiplexer 922 selects the appropriate data input and provides it on its output 924 to the execution unit 424 that is executing a microinstruction 126 that reads the condition flags, as described in more detail below. Although only a single flag bus 928 is described, according to one embodiment each execution unit 424 capable of generating condition flags has its own flag bus 928, and each execution unit 424 capable of reading condition flags has its own condition flag input 924. The various execution units 424 can therefore concurrently execute different microinstructions 126 that read and/or write the condition flags.

The flag bus 928 is part of the result bus 128 of FIG. 1 and conveys the condition flag results output by the execution units 424. The condition flag results are written to the reorder buffer 422, more precisely to the reorder buffer 422 entry allocated to the microinstruction 126 being executed by the execution unit 424 whose results are conveyed on the flag bus 928. The flag bus 928 also forwards the condition flag results to the third data input of the multiplexer 922.

FIG. 9 also shows, in block diagram form, the condition flag values output by the execution units 424 on the flag bus 928 and the condition flag values 924 received by the execution units 424 from the multiplexer 922. The condition flag values 928/924 include the ISA condition flags 902, a condition-satisfied (SAT) bit 904, a pre-shift carry (PSC) bit 906, and a use-shift-carry (USE) bit 908. When the instruction mode indicator 132 indicates the ARM ISA, the ISA condition flags 902 include the ARM carry flag (C), zero flag (Z), overflow flag (V), and negative flag (N). When the instruction mode indicator 132 indicates the x86 ISA, the ISA condition flags 902 include the x86 EFLAGS carry flag (CF), zero flag (ZF), overflow flag (OF), sign flag (SF), parity flag (PF), and auxiliary flag (AF). The condition flags register 926 includes storage for the ISA condition flags 902, the SAT bit 904, the PSC bit 906, and the USE bit 908. In one embodiment, the condition flags register 926 shares storage between the x86 ISA and ARM ISA carry, zero, overflow, and negative/sign flags.
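The layout of a condition flag value can be pictured with a small data model. The following Python sketch is illustrative only: the class names, the `pack` helper, and the bit order are assumptions made for this example, not details given in the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArmIsaFlags:
    """ARM view of the ISA condition flags 902 (hypothetical container)."""
    n: bool = False  # negative flag
    z: bool = False  # zero flag
    c: bool = False  # carry flag
    v: bool = False  # overflow flag


@dataclass(frozen=True)
class FlagValue:
    """One condition flag value as conveyed on flag bus 928 or input 924."""
    isa: ArmIsaFlags = field(default_factory=ArmIsaFlags)
    sat: bool = False  # SAT bit 904: condition satisfied
    psc: bool = False  # PSC bit 906: carry produced by a pre-shift
    use: bool = False  # USE bit 908: use the pre-shift carry


def pack(value: FlagValue) -> int:
    """Pack the seven bits into an integer (bit order is an assumption)."""
    bits = (value.isa.n, value.isa.z, value.isa.c, value.isa.v,
            value.sat, value.psc, value.use)
    word = 0
    for bit in bits:
        word = (word << 1) | int(bit)
    return word
```

An x86 view would carry CF, ZF, OF, SF, PF, and AF in place of `ArmIsaFlags`; the shared-storage embodiment would overlay the four flags that the two ISAs have in common.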

Each microinstruction 126 indicates, in addition to its basic operation (e.g., add, load/store, shift, Boolean AND, branch), whether it performs one or more of three additional operations: (1) reading the condition flags 926 (denoted RDFLAGS in FIG. 10 and subsequent figures), (2) writing the condition flags 926 (denoted WRFLAGS in FIG. 10 and subsequent figures), and (3) generating a carry flag value and writing it to the PSC bit 906 of the condition flags 926 (denoted WRCARRY in FIG. 10 and subsequent figures). In one embodiment, the microinstruction 126 includes respective bits that indicate the three additional operations. In another embodiment, the microinstruction 126 indicates the three additional operations through its opcode; that is, distinct microinstruction 126 types with distinct opcodes exist for the combinations of additional operations that the respective microinstruction types are capable of performing.
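The two encodings of the additional operations can be contrasted in a short Python sketch. All names here (`FlagOps`, `MicroInstruction`, `OPCODE_FLAG_OPS`, `flag_ops_of`) are hypothetical, and the opcode table lists only combinations that the translation steps described later attribute to each microinstruction type.

```python
from dataclasses import dataclass
from enum import Flag, auto


class FlagOps(Flag):
    """The three additional operations a microinstruction may indicate."""
    NONE = 0
    RDFLAGS = auto()  # read the condition flags
    WRFLAGS = auto()  # write the condition flags
    WRCARRY = auto()  # generate a carry and write it to the PSC bit


@dataclass(frozen=True)
class MicroInstruction:
    opcode: str
    flag_ops: FlagOps = FlagOps.NONE  # first embodiment: explicit bits


# Second embodiment: the opcode alone implies the additional operations.
# This table is an assumption assembled from the steps of FIG. 10.
OPCODE_FLAG_OPS = {
    "ALUOP": FlagOps.NONE,
    "ALUOPUC": FlagOps.RDFLAGS,
    "XMOV": FlagOps.RDFLAGS,
    "CMOV": FlagOps.RDFLAGS,
    "ALUOP CC": FlagOps.RDFLAGS | FlagOps.WRFLAGS,
}


def flag_ops_of(uop: MicroInstruction, opcode_implies_ops: bool = False) -> FlagOps:
    """Return the additional operations under either embodiment."""
    if opcode_implies_ops:
        return OPCODE_FLAG_OPS[uop.opcode]
    return uop.flag_ops
```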

If an execution unit 424 executes a conditional ALU microinstruction 126 (denoted ALUOP CC, CUALUOP CC, or NCUALUOP CC in FIG. 10 and subsequent figures) that instructs it to write the condition flags 926 (denoted WRFLAGS), and the condition flags 924 read by the execution unit 424 satisfy the condition specified by the microinstruction 126, the execution unit 424 sets the SAT bit 904 to one; otherwise, it clears the SAT bit 904 to zero. Furthermore, if an execution unit 424 executes any microinstruction 126 that instructs it to write the condition flags 926 and that is not a conditional ALU microinstruction 126, the execution unit 424 clears the SAT bit 904 to zero. Some conditional microinstructions 126 specify a condition based on the ISA condition flags 902 (denoted XMOV CC in FIG. 10 and subsequent figures), whereas others specify a condition based on the SAT bit 904 (denoted CMOV in FIG. 10 and subsequent figures), as described in more detail below.
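The SAT bit rule reduces to a small function. This is a minimal Python sketch with hypothetical names; it additionally assumes that a microinstruction that does not write the condition flags leaves the SAT bit unchanged, which the description does not state explicitly.

```python
def next_sat(writes_flags: bool, is_conditional_alu: bool,
             condition_met: bool, old_sat: bool) -> bool:
    """Return the SAT bit 904 value after one microinstruction executes."""
    if not writes_flags:
        # Assumption: no WRFLAGS means the SAT bit is not touched.
        return old_sat
    # A flag-writing conditional ALU microinstruction sets SAT only when its
    # condition is met; any other flag-writing microinstruction clears it.
    return is_conditional_alu and condition_met
```

A later CMOV microinstruction then needs to examine only this one bit rather than re-evaluating the condition against flags that the conditional ALU microinstruction may already have overwritten.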

If an execution unit 424 executes a shift microinstruction 126 that instructs it to write the carry flag (denoted WRCARRY), the execution unit 424 sets the USE bit 908 to one and writes the carry value generated by the shift microinstruction 126 to the PSC bit 906; otherwise, it clears the USE bit 908 to zero. Furthermore, if an execution unit 424 executes any microinstruction 126 that instructs it to write the condition flags 926 and that is not a shift microinstruction 126, the execution unit 424 clears the USE bit 908 to zero. A subsequent conditional ALU microinstruction 126 consumes the USE bit 908 to determine whether to update the architectural carry flag 902 with the value of the PSC bit 906 or with the carry flag value generated by the ALU operation that the conditional ALU microinstruction 126 performs, as described in more detail below. In an alternate embodiment, the USE bit 908 does not exist; instead, the hardware instruction translator 104 directly generates a functional equivalent of the USE bit 908 as an indicator within the conditional ALU microinstruction 126.
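The interplay of the USE and PSC bits can be sketched as follows. The function names are hypothetical, and two behaviors are assumptions rather than statements from the description: the PSC bit is left unchanged when a non-shift microinstruction writes the flags, and both bits are left unchanged by microinstructions that write neither the flags nor the carry.

```python
from typing import Tuple


def next_use_psc(is_shift: bool, wrcarry: bool, wrflags: bool,
                 shift_carry: bool, old_use: bool, old_psc: bool) -> Tuple[bool, bool]:
    """Return (USE bit 908, PSC bit 906) after one microinstruction executes."""
    if is_shift and wrcarry:
        # A carry-writing shift records its carry and marks it for later use.
        return True, shift_carry
    if wrflags:
        # Any other flag-writing microinstruction clears USE.
        return False, old_psc  # assumption: PSC itself is preserved
    return old_use, old_psc    # assumption: untouched otherwise


def carry_to_commit(use: bool, psc: bool, alu_carry: bool) -> bool:
    """Carry with which a later conditional ALU microinstruction updates flag 902."""
    return psc if use else alu_carry
```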

Referring now to FIG. 10 (comprising FIGS. 10A and 10B), a flowchart illustrating an embodiment of the operation of the hardware instruction translator 104 of FIG. 1 to translate conditional ALU instructions 124 according to the present invention is shown. Generally, FIGS. 10A and 10B describe how the hardware instruction translator 104 decodes a conditional ALU instruction 124 to determine its type in order to translate it into the appropriate sequence of microinstructions 126 for execution by the execution pipeline 112. Specifically, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 updates the architectural condition flags 902, whether it performs a pre-shift operation on a source operand, whether it uses the carry flag as an input to the ALU operation, and whether the ALU operation is a carry-updating or a non-carry-updating operation. As described further below, this last distinction indicates whether the ALU operation updates only a subset of the architectural condition flags 902 or all of them. Flow begins at step 1002.

At step 1002, the hardware instruction translator 104 encounters a conditional ALU instruction 124, decodes it, and translates it into the appropriate sequence of microinstructions 126, as described with respect to steps 1024, 1026, 1034, 1036, 1044, 1054, and 1056. A conditional ALU instruction 124 instructs the microprocessor 100 to perform an arithmetic or logical operation (ALU operation) on one or more source operands to generate a result and to write the result to a destination register. Some types of ALU operations specified by conditional ALU instructions 124 use the architectural carry flag 902 as an input (e.g., add with carry), although most do not. The conditional ALU instruction 124 also specifies a condition with respect to the architectural condition flags 902 of the ISA. If the architectural condition flags 902 satisfy the specified condition, the microprocessor 100 executes the conditional ALU instruction 124, i.e., performs the ALU operation and writes the result to the destination register. Otherwise, the microprocessor 100 treats the conditional ALU instruction 124 as a no-op; specifically, it does not change the value in the destination register. Additionally, the conditional ALU instruction 124 may specify either that the architectural condition flags 902 are to be updated based on the result of the ALU operation or that they are not to be updated. However, even if the instruction specifies that the flags are to be updated, the microprocessor 100 does not change the values of the architectural condition flags 902 when they do not satisfy the specified condition. Finally, the conditional ALU instruction 124 may additionally specify that one of the source operands of the ALU operation is to be pre-shifted, as described with respect to step 1012.

In one embodiment, the conditional ALU instructions 124 translated by the hardware instruction translator 104 are ARM ISA instructions. Specifically, in one embodiment, the ARM ISA data processing instructions and multiply instructions are translated by the hardware instruction translator 104 as shown in FIG. 10. In one embodiment, these include, but are not limited to, the AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, ORN, MOV, LSL, LSR, ASR, RRX, ROR, BIC, MVN, MUL, MLA, and MLS instructions. In each of steps 1024, 1026, 1034, 1036, 1044, 1054, and 1056, for illustration purposes, the relevant type of ARM ISA conditional ALU instruction 124 is shown on the first line, and the microinstructions 126 into which the hardware instruction translator 104 translates it are shown on the following lines. The suffix "CC" indicates that the instruction 124 is a conditional instruction. Additionally, the type of ALU operation is shown with examples of the specified source and destination operands. The programmer may specify a destination register that is the same as a register providing one of the source operands; in this case, the hardware instruction translator 104 is configured to take advantage of the situation and optimize the sequence of microinstructions 126 into which it translates the conditional ALU instruction 124, as described with respect to FIG. 21. Flow proceeds to step 1004.

At step 1004, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 specifies that the architectural condition flags 902 are to be updated. That is, in some cases the programmer may choose a form of the conditional ALU instruction 124 that updates the architectural condition flags 902 based on the result of the ALU operation, and in other cases a form that leaves the architectural condition flags 902 unchanged regardless of the result of the ALU operation. In ARM ISA assembly language, an "S" suffix on the instruction mnemonic indicates that the architectural condition flags 902 are to be updated, and FIG. 10 and subsequent figures adopt this convention. For example, step 1044 denotes the ARM ISA conditional ALU instruction 124 as "ALUOP S" to indicate that the architectural condition flags 902 are to be updated, whereas step 1024 denotes it as "ALUOP" (i.e., without the "S") to indicate that they are not to be updated. If the conditional ALU instruction 124 specifies that the architectural condition flags 902 are to be updated, flow proceeds to step 1042; otherwise, flow proceeds to step 1012.

At step 1012, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 is of a type that specifies a pre-shift of one of the ALU operation operands. The pre-shift may be applied to an immediate field to generate a constant source operand, or it may be applied to a source operand provided by a register. The pre-shift amount may be specified as a constant within the conditional ALU instruction 124. Alternatively, in the case of a register-shifted operand, the pre-shift amount may be specified by a value in a register. In the ARM ISA, a constant source operand generated by pre-shifting an immediate value by an immediate shift amount is referred to as a modified immediate constant. The pre-shift operation generates a carry flag value. For some types of ALU operations, the architectural carry flag 902 is updated with the carry flag value generated by the shift operation, whereas for other types it is updated with the carry flag value generated by the ALU operation. However, the carry flag value generated by the pre-shift is not used to determine whether the condition specified by the conditional ALU instruction 124 is satisfied; rather, the current architectural carry flag 902 is used. It is noted that the ARM ISA MUL, ASR, LSL, LSR, ROR, and RRX instructions cannot specify a pre-shift operation and are therefore processed as described with respect to steps 1024, 1026, or 1044. Additionally, the forms of the MOV and MVN instructions that specify a modified immediate constant operand may specify a pre-shift operation, whereas the forms that do not (i.e., that specify a register operand) cannot, and are therefore processed as described with respect to steps 1024, 1026, or 1044. As mentioned above, the pre-shift may be applied to an immediate field to generate a constant source operand or to a source operand provided by a register. If the conditional ALU instruction 124 specifies a pre-shift operation, flow proceeds to step 1032; otherwise, flow proceeds to step 1022.
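As a concrete instance of a pre-shift applied to an immediate field, the following Python sketch expands an ARM (A32) modified immediate constant: an 8-bit value rotated right by twice a 4-bit rotation field, together with the carry the rotation produces. The function name is hypothetical, and the sketch models only the A32 encoding, not how the hardware instruction translator 104 computes the constant.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def expand_modified_immediate(imm12: int, carry_in: bool) -> Tuple[int, bool]:
    """Expand a 12-bit A32 modified immediate into (32-bit constant, carry out).

    Bits [7:0] hold the 8-bit value and bits [11:8] hold the rotation field;
    the value is rotated right by twice the rotation field.
    """
    imm8 = imm12 & 0xFF
    rotation = ((imm12 >> 8) & 0xF) * 2
    if rotation == 0:
        # No rotation: the constant is the 8-bit value, carry is unchanged.
        return imm8, carry_in
    value = ((imm8 >> rotation) | (imm8 << (32 - rotation))) & MASK32
    # After a nonzero rotation the carry out is bit 31 of the result.
    return value, bool(value >> 31)
```

For example, the field value 0x4FF (rotation field 4, value 0xFF) expands to 0xFF000000 with a carry out of one.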

At step 1022, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 specifies an ALU operation that uses the carry flag. Examples of ARM ISA instructions 124 that use the carry flag are the add with carry (ADC), reverse subtract with carry (RSC), and subtract with carry (SBC) instructions, as well as instructions that specify a shifted register operand whose shift operation uses the carry flag, namely those with the RRX shift type. If the conditional ALU instruction 124 specifies an ALU operation that uses the carry flag, flow proceeds to step 1026; otherwise, flow proceeds to step 1024.

At step 1024, the hardware instruction translator 104 translates the non-flag-updating, non-pre-shifting, non-carry-using conditional ALU instruction 124 into first and second microinstructions 126: (1) an ALU operation microinstruction 126 (denoted ALUOP); and (2) a conditional move microinstruction 126 (denoted XMOV). In the example of step 1024, the conditional ALU instruction 124 specifies a first source register (R1) and a second source register (R2), an ALU operation (denoted ALUOP) to be performed on R1 and R2 to generate a result, and a destination register (RD) to which the result is conditionally written. The ALUOP microinstruction 126 specifies the same ALU operation and source operands as the conditional ALU instruction 124. It performs the ALU operation on the two source operands and writes the result to a temporary register (denoted T2). The conditional move microinstruction 126 specifies the same condition as the conditional ALU instruction 124. It receives the value of the temporary register written by the ALUOP microinstruction 126 and the old, or current, value of the destination register (RD). It also receives the condition flags 924 and determines whether they satisfy the condition. If so, the conditional move microinstruction 126 writes the temporary register value to the destination register (RD); otherwise, it writes the old destination register value back to the destination register. It is noted that although this example specifies two source register operands, the invention is not limited thereto; one of the source operands may be a constant operand specified in an immediate field of the conditional ALU instruction 124 rather than provided by a register. The execution of the microinstructions 126 is described in more detail with respect to FIG. 20.

The term "old" as used in FIGS. 10A and 10B and subsequent figures with respect to a flag or destination register value refers, unless otherwise indicated, to the value received by the execution unit 424 when it executes the microinstruction 126; it may equivalently be called the current value. For the destination register, the old or current value is received from the forwarding result bus 128 of FIG. 1, the reorder buffer 422, or the architectural register file 106. For the flags, as described with respect to FIG. 9, the old or current value is received from the forwarding flag bus 928, the reorder buffer 422, or the architectural condition flags register 926. Flow ends at step 1024.
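The two-microinstruction sequence of step 1024 can be simulated in a few lines. This Python sketch is a behavioral illustration under assumptions: registers are a plain dictionary, the condition is evaluated by the caller and passed in as `condition_met`, and the names `ALU_OPS` and `run_step_1024` are hypothetical.

```python
import operator

MASK32 = 0xFFFFFFFF

# A few non-carry-using ALU operations, keyed by ARM mnemonic.
ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "ORR": operator.or_,
    "EOR": operator.xor,
}


def run_step_1024(regs: dict, aluop: str, rd: str, r1: str, r2: str,
                  condition_met: bool) -> dict:
    """Execute ALUOP then XMOV and return the updated register state."""
    regs = dict(regs)  # leave the caller's state untouched
    # Microinstruction 1: ALUOP T2, R1, R2 (unconditional, temporary result).
    regs["T2"] = ALU_OPS[aluop](regs[r1], regs[r2]) & MASK32
    # Microinstruction 2: XMOV CC RD, T2 (condition on the ISA flags).
    old_rd = regs[rd]
    regs[rd] = regs["T2"] if condition_met else old_rd
    return regs
```

Writing the old value back when the condition fails means the conditional move always produces a value for RD, so later microinstructions that read RD depend on it uniformly whether or not the condition held.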

At step 1026, the hardware instruction translator 104 translates the non-flag-updating, non-pre-shifting, carry-using conditional ALU instruction 124 into first and second microinstructions 126: (1) a carry-using ALU operation microinstruction 126 (denoted ALUOPUC); and (2) a conditional move microinstruction 126 (denoted XMOV). In the example of step 1026, the conditional ALU instruction 124 is similar to that described with respect to step 1024, except that the specified ALU operation uses the carry flag. The two microinstructions 126 are also similar to those described with respect to step 1024; however, the ALUOPUC microinstruction 126 additionally receives the condition flags 924 in order to obtain the current value of the carry flag for use in the carry-using ALU operation. The execution of the microinstructions 126 is described in more detail with respect to FIG. 19. Flow ends at step 1026.
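The difference from step 1024 is only that the first microinstruction consumes the current carry flag. The sketch below models the three ARM carry-using arithmetic operations with their standard definitions (SBC and RSC add the bitwise complement of the subtrahend plus the carry); the function names are hypothetical and the condition is again supplied by the caller.

```python
MASK32 = 0xFFFFFFFF


def carry_using_alu(aluop: str, a: int, b: int, carry: bool) -> int:
    """Compute a 32-bit ADC, SBC, or RSC result from operands and carry flag."""
    c = int(carry)
    if aluop == "ADC":
        return (a + b + c) & MASK32
    if aluop == "SBC":                      # a - b - NOT(carry)
        return (a + (~b & MASK32) + c) & MASK32
    if aluop == "RSC":                      # b - a - NOT(carry)
        return (b + (~a & MASK32) + c) & MASK32
    raise ValueError(f"not a carry-using operation: {aluop}")


def run_step_1026(aluop: str, r1: int, r2: int, old_rd: int,
                  carry_flag: bool, condition_met: bool) -> int:
    """Execute ALUOPUC then XMOV and return the new destination value."""
    t2 = carry_using_alu(aluop, r1, r2, carry_flag)  # ALUOPUC reads the flags
    return t2 if condition_met else old_rd           # XMOV CC
```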

At step 1032, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 specifies an ALU operation that uses the carry flag. If the ALU operation uses the carry flag, flow proceeds to step 1036; otherwise, flow proceeds to step 1034.

At step 1034, the hardware instruction translator 104 translates the non-flag-updating, pre-shifting, non-carry-using conditional ALU instruction 124 into first, second, and third microinstructions 126: (1) a shift microinstruction 126 (denoted SHF); (2) an ALU operation microinstruction 126; and (3) a conditional move microinstruction 126. In the example of step 1034, the conditional ALU instruction 124 is similar to that described with respect to step 1024; however, it also specifies a pre-shift operation, with a shift amount, on the second source operand (R2). In the example of step 1034, the shift amount is held in a third source register (R3) specified by the conditional ALU instruction 124. However, if the conditional ALU instruction 124 is of a type that specifies the shift amount as a constant within the instruction 124, the third source register is not used. The pre-shift operations that a conditional ALU instruction 124 may specify include, but are not limited to, logical shift left (LSL), logical shift right (LSR), arithmetic shift right (ASR), rotate right (ROR), and rotate right with extend (RRX). In one embodiment, the hardware instruction translator 104 emits the shift microinstruction 126 such that the shifted value is generated according to the semantics of the ARM ISA, in particular as described in the ARM Architecture Reference Manual with respect to the individual ARM instructions and, for example, at pages A8-10 through A8-12 and A5-10 through A5-11.

The shift microinstruction 126 specifies the same pre-shift operation as the conditional ALU instruction 124, as well as the same second source operand R2 and third source operand R3. It shifts the second source operand R2 by the shift amount and writes the result to a temporary register (denoted T3). Although at step 1034 the condition flag values generated by the shift microinstruction 126 are not used, because the conditional ALU instruction 124 specifies that the architectural condition flags 902 are not updated, at step 1056, for example, the flag value generated by the shift microinstruction 126 is used, as described in more detail below. Additionally, the pre-shift operation may require the old carry flag to be rotated into the shifted result; for example, the rotate right with extend (RRX) pre-shift operation shifts the carry flag into the most significant bit of the result. In this case, although not shown in FIGS. 10A and 10B (except at step 1056), the shift microinstruction 126 also reads the condition flags 924 to obtain the current carry flag value. The ALUOP microinstruction 126 is similar to that described with respect to step 1024; however, it receives the value of temporary register T3 rather than the second source operand R2, performs the ALU operation on the first source operand R1 and temporary register T3, and writes the result to temporary register T2. The XMOV microinstruction 126 is similar to that described with respect to step 1024. The execution of the microinstructions 126 is described in more detail with respect to FIG. 18. Flow ends at step 1034.
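The three-microinstruction sequence of step 1034 adds a shift in front of the step 1024 pair. The Python sketch below is illustrative: the helper names are hypothetical, only the shifted value (not the shifter's carry out) is modeled because step 1034 does not use it, and the shift amount is taken from the low byte of R3 on the assumption that this follows ARM register-shifted operand semantics.

```python
import operator

MASK32 = 0xFFFFFFFF
ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "AND": operator.and_,
           "ORR": operator.or_, "EOR": operator.xor}


def pre_shift(kind: str, value: int, amount: int, carry_in: bool = False) -> int:
    """Return the 32-bit pre-shifted operand for the given shift type."""
    if kind == "RRX":
        # Rotate right by one with the carry flag entering at bit 31.
        return ((int(carry_in) << 31) | (value >> 1)) & MASK32
    if amount == 0:
        return value
    if kind == "LSL":
        return (value << amount) & MASK32
    if kind == "LSR":
        return value >> amount
    if kind == "ASR":
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":
        amount %= 32
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift type: {kind}")


def run_step_1034(regs: dict, aluop: str, shift: str, rd: str, r1: str, r2: str,
                  r3: str, condition_met: bool, carry_flag: bool = False) -> dict:
    """Execute SHF, ALUOP, and XMOV and return the updated register state."""
    regs = dict(regs)
    regs["T3"] = pre_shift(shift, regs[r2], regs[r3] & 0xFF, carry_flag)  # SHF
    regs["T2"] = ALU_OPS[aluop](regs[r1], regs["T3"]) & MASK32           # ALUOP
    regs[rd] = regs["T2"] if condition_met else regs[rd]                 # XMOV CC
    return regs
```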

At step 1036, the hardware instruction translator 104 translates the non-flag-updating, pre-shifting, carry-using conditional ALU instruction 124 into first, second, and third microinstructions 126: (1) a shift microinstruction 126; (2) a carry-using ALU operation microinstruction 126; and (3) a conditional move microinstruction 126. In the example of step 1036, the conditional ALU instruction 124 is similar to that described with respect to step 1034, except that the ALU operation specified by the instruction 124 uses the carry flag. The three microinstructions 126 are similar to those described with respect to step 1034; however, the ALUOPUC microinstruction 126 additionally receives the condition flags 924 in order to obtain the current value of the carry flag for use in the carry-using ALU operation. The execution of the microinstructions 126 is described in more detail with respect to FIG. 17. Flow ends at step 1036.

At step 1042, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 is of a type that specifies a pre-shift of one of the ALU operation operands. If the conditional ALU instruction 124 specifies a pre-shift, flow proceeds to step 1052; otherwise, flow proceeds to step 1044.
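The decision steps 1004, 1012, 1022, 1032, and 1042 described so far form a small decision tree, which can be summarized as a function. This is a minimal sketch with a hypothetical function name and boolean inputs; it returns the number of the step to which flow proceeds, and for flag-updating, pre-shifting instructions it returns 1052, which is a further decision step rather than a translation step.

```python
def select_translation_step(updates_flags: bool, pre_shifts: bool,
                            uses_carry: bool) -> int:
    """Return the FIG. 10 step reached for a decoded conditional ALU instruction."""
    if updates_flags:                            # step 1004
        return 1052 if pre_shifts else 1044      # step 1042
    if pre_shifts:                               # step 1012
        return 1036 if uses_carry else 1034      # step 1032
    return 1026 if uses_carry else 1024          # step 1022
```

For example, a non-flag-updating ADC with no shifted operand reaches step 1026, whereas an ADDS with no shifted operand reaches step 1044 regardless of carry use.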

At step 1044, the hardware instruction translator 104 translates the flag-updating, non-pre-shifting conditional ALU instruction 124 into first and second microinstructions 126, namely: (1) a conditional ALU operation microinstruction 126 (labeled ALUOP CC); and (2) a conditional move microinstruction 126 (labeled CMOV). In the example of step 1044, the conditional ALU instruction 124 is similar to that of step 1024, except that here it updates the architectural condition flags 902. The conditional ALU operation microinstruction 126 specifies the same condition and source operands as the conditional ALU instruction 124. It performs the ALU operation on the two source operands and writes the result to a temporary register (labeled T2). It also receives the architectural condition flags 902 and determines whether they satisfy the condition. It further writes the condition flags register 926; specifically, it writes the SAT bit 904 to indicate whether the architectural condition flags 902 satisfy the condition. If the condition is not satisfied, the conditional ALU operation microinstruction 126 writes the old condition flag values to the architectural condition flags 902; if the condition is satisfied, it updates the architectural condition flags 902 based on the result of the ALU operation. The updated values depend on the type of ALU operation. For some types of ALU operation, all of the architectural condition flags 902 are updated with new values based on the result; for other types, some of the flags (in one embodiment, the Z and N flags) are updated with new values based on the result, while the old values are retained for the others (in one embodiment, the V and C flags). The updating of the architectural condition flags 902 is described in more detail with respect to FIG. 14. The conditional move (CMOV) microinstruction 126 receives the value written to the temporary register (T2) by the ALUOP microinstruction 126 and receives the old, or current, value of the destination register (RD). The CMOV microinstruction 126 also receives the condition flags 924 and examines the SAT bit 904 to determine whether the conditional ALU operation microinstruction 126 indicated that the architectural condition flags 902 satisfy the condition. If the condition is satisfied, the CMOV microinstruction 126 writes the temporary register value to the destination register; otherwise, it writes the old destination register value back to the destination register. The execution of the microinstructions 126 is described in more detail with respect to FIG. 14. Note that the ALU operation performed by the conditional ALU operation microinstruction 126 generated at step 1044 (and at steps 1054 and 1056) may be one that uses a condition flag, specifically the carry flag (similar to those described for steps 1026 and 1036); because the microinstruction 126 reads the flags (as indicated by RDFLAGS), the execution unit 424 has the carry flag available to perform the carry-using ALU operation. Flow ends at step 1044.

At step 1052, the hardware instruction translator 104 determines whether the conditional ALU instruction 124 specifies an ALU operation of a type that updates the architectural carry flag 902. The hardware instruction translator 104 must make this distinction because, if the ALU operation does not update the architectural carry flag 902, the carry flag value generated by the pre-shift operation, rather than a condition flag value generated by the ALU operation, must be used to update the architectural carry flag 902. In one embodiment, the ARM ISA instructions 124 that specify an ALU operation that does not update the architectural carry flag 902, but that do specify a pre-shift operation, include but are not limited to AND, BIC, EOR, ORN, ORR, TEQ, and TST, as well as MOV/MVN instructions 124 that specify a modified immediate constant with a non-zero rotation value. If the ALU operation updates the architectural carry flag 902, flow proceeds to step 1054; otherwise, flow proceeds to step 1056.

At step 1054, the hardware instruction translator 104 translates the flag-updating, pre-shifting, carry-updating conditional ALU instruction 124 into first, second, and third microinstructions 126, namely: (1) a shift microinstruction 126; (2) a conditional carry-updating ALU operation microinstruction 126 (labeled CU ALUOP CC); and (3) a conditional move microinstruction 126. In the example of step 1054, the conditional ALU instruction 124 is similar to that described for step 1034; however, it also specifies that the architectural condition flags 902 are to be updated. The shift microinstruction 126 is similar to that described for step 1034. The conditional carry-updating ALU operation microinstruction 126 specifies the same condition as the conditional ALU instruction 124. It performs the ALU operation on the first source operand R1 and the temporary register T3 and writes the result to a temporary register (labeled T2). It also receives the architectural condition flags 902 and determines whether they satisfy the condition. It further writes the condition flags register 926; specifically, it writes the SAT bit 904 to indicate whether the architectural condition flags 902 satisfy the condition. If the condition is not satisfied, the conditional carry-updating ALU operation microinstruction 126 writes the old condition flag values to the architectural condition flags 902; if the condition is satisfied, it updates the architectural condition flags 902 based on the result of the ALU operation. The updating of the architectural condition flags 902 is described in more detail with respect to FIG. 16. The conditional move (CMOV) microinstruction 126 is similar to that described for step 1044. Flow ends at step 1054.

At step 1056, the hardware instruction translator 104 translates the flag-updating, pre-shifting, non-carry-updating conditional ALU instruction 124 into first, second, and third microinstructions 126, namely: (1) a shift microinstruction 126; (2) a conditional non-carry-updating ALU operation microinstruction 126 (labeled NCU ALUOP CC); and (3) a conditional move microinstruction 126. In the example of step 1056, the conditional ALU instruction 124 is similar to that described for step 1054; however, it specifies a non-carry-updating ALU operation. Consequently, when the condition is satisfied, the architectural carry flag 902 is updated with the pre-shift carry flag value. The shift microinstruction 126 is similar to that described for step 1034; however, it reads and writes the condition flags register 926. Specifically, the shift microinstruction 126: (1) writes the carry flag value generated by the pre-shift operation to the PSC bit 906; (2) sets the USE bit 908 to instruct the conditional non-carry-updating ALU operation microinstruction 126 to use the PSC bit 906 to update the architectural carry flag 902; and (3) writes the old architectural condition flags 902 back to the condition flags register 926, so that the conditional non-carry-updating ALU operation microinstruction 126 can evaluate the old values of the architectural condition flags 902 to determine whether they satisfy the condition. The conditional non-carry-updating ALU operation microinstruction 126 specifies the same condition as the conditional ALU instruction 124. It performs the ALU operation on the source operand R1 and the temporary register T3 and writes the result to a temporary register (labeled T2). It also receives the architectural condition flags 902 and determines whether they satisfy the condition. It further writes the condition flags register 926; specifically, it writes the SAT bit 904 to indicate whether the architectural condition flags 902 satisfy the condition. If the condition is not satisfied, the conditional non-carry-updating ALU operation microinstruction 126 writes the old condition flag values to the architectural condition flags 902; if the condition is satisfied, it updates the architectural condition flags 902 based on the result of the ALU operation. Specifically, the architectural overflow (V) flag 902 is written with the old overflow flag value. In addition, the architectural carry flag 902 is updated with the pre-shift carry flag value in the PSC bit 906 if the USE bit 908 so indicates, and with the old carry flag value 924 otherwise. The updating of the architectural condition flags 902 is described in more detail with respect to FIG. 15. The CMOV microinstruction 126 is similar to that described for step 1044. In an alternate embodiment, the USE bit 908 does not exist; instead, the hardware instruction translator 104 directly generates a functional equivalent of the USE bit 908 as an indicator within the conditional non-carry-updating ALU operation microinstruction 126. The execution unit 424 examines this indicator to determine whether to update the architectural carry flag 902 with the pre-shift carry flag value in the PSC bit 906 or with the old carry flag value 924. Flow ends at step 1056.

In one embodiment, the hardware instruction translator 104 is configured to generate and provide a modified immediate constant itself, rather than emitting a shift microinstruction 126 to perform the pre-shift. In this embodiment, processing is similar to that described for steps 1024, 1026, and 1044 rather than steps 1034, 1036, and 1054/1056. Furthermore, in this embodiment the hardware instruction translator 104 also generates the carry flag value that the pre-shift operation would produce and provides it to the conditional ALU operation microinstruction 126 for use in updating the architectural carry flag 902.
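The translation choices of steps 1042 through 1056 can be summarized as a small decision function. The following Python sketch is illustrative only and is not the patented translator: the function names, the `SHF` label for the shift microinstruction, and the representation of microinstructions as strings are assumptions made for this example. The mnemonic set is taken from the non-exhaustive list given for step 1052.

```python
# Mnemonics the text lists (non-exhaustively) as not updating the carry flag
# when they specify a pre-shift. MOV/MVN qualify only with a modified
# immediate constant that has a non-zero rotation value.
NON_CARRY_UPDATING = {"AND", "BIC", "EOR", "ORN", "ORR", "TEQ", "TST", "MOV", "MVN"}


def is_carry_updating(mnemonic):
    """Assumed helper: classify an ALU operation for the step 1052 decision."""
    return mnemonic.upper() not in NON_CARRY_UPDATING


def translate_flag_updating(pre_shift, carry_updating):
    """Return microinstruction labels for a flag-updating conditional ALU instruction.

    "SHF" is a hypothetical label for the shift microinstruction; the other
    labels follow the names used in the description.
    """
    if not pre_shift:
        # Step 1044: no pre-shift, two microinstructions.
        return ["ALUOP CC", "CMOV"]
    if carry_updating:
        # Step 1054: the ALU operation itself produces the new carry flag.
        return ["SHF", "CU ALUOP CC", "CMOV"]
    # Step 1056: the shift must save its carry-out (WRCARRY) so that the
    # ALU microinstruction can use it to update the architectural carry flag.
    return ["SHF WRCARRY", "NCU ALUOP CC", "CMOV"]
```

For example, under these assumptions `translate_flag_updating(True, is_carry_updating("EOR"))` selects the three-microinstruction sequence of step 1056.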

Referring now to FIG. 11, a flowchart illustrates operation of the execution units 424 of FIG. 4 to execute a shift microinstruction 126. Flow begins at step 1102.

At step 1102, one of the execution units 424 of FIG. 4 receives a shift microinstruction 126, such as one described with respect to FIG. 10 that the hardware instruction translator 104 generated in response to encountering a conditional ALU instruction 124. The execution unit 424 also receives the source operands specified by the microinstruction 126, including the condition flag values 924, which the microinstruction 126 may or may not use. Flow proceeds to step 1104.

At step 1104, the execution unit 424 performs the shift operation specified by the shift microinstruction 126 on the operand specified by the shift microinstruction 126 to generate a result, and outputs the result on the result bus 128. In one embodiment, the shift operation may include, but is not limited to, a logical shift left (LSL), logical shift right (LSR), arithmetic shift right (ASR), rotate right (ROR), and rotate right with extend (RRX). In addition, the execution unit 424 generates new condition flag values based on the result of the shift operation. Specifically, the execution unit 424 generates a carry flag value based on the result of the shift operation. In one embodiment: for a logical shift left (LSL), the carry flag value is bit N of an extended value formed by concatenating M least-significant zero bits with the operand being left-shifted, where N is the number of bits in the original operand and M is the specified positive shift amount; for a logical shift right (LSR), the carry flag value is bit (M-1) of an extended value that is the original operand zero-extended by (M+N) bits, where M is the specified positive shift amount and N is the number of bits in the original operand; for an arithmetic shift right (ASR), the carry flag value is bit (M-1) of an extended value that is the original operand sign-extended by (M+N) bits, where M is the specified positive shift amount and N is the number of bits in the original operand; for a rotate right (ROR), the carry flag value is bit (N-1) of the result of rotating the operand right by the specified non-zero shift amount modulo N, where N is the number of bits in the original operand; and for a rotate right with extend (RRX), the carry flag value is bit zero of the original operand. Flow proceeds to step 1106.
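The carry-out rules of step 1104 can be checked with a small model. The following Python sketch is illustrative only: the function name `shift_carry_out`, the string mnemonics, and the default 32-bit operand width are assumptions of this example, and Python's unbounded integers stand in for the extended values described above.

```python
def shift_carry_out(op, operand, amount=1, nbits=32):
    """Carry flag produced by a shift, following the rules stated for step 1104.

    op      -- one of "LSL", "LSR", "ASR", "ROR", "RRX" (names assumed here)
    operand -- unsigned value of the original operand, 0 <= operand < 2**nbits
    amount  -- M, the specified positive shift amount (ignored for RRX)
    nbits   -- N, the number of bits in the original operand
    """
    mask = (1 << nbits) - 1
    operand &= mask
    if op == "RRX":
        # Bit zero of the original operand.
        return operand & 1
    if amount <= 0:
        raise ValueError("shift amount must be positive")
    if op == "LSL":
        # Operand with M zero bits appended on the right; take bit N.
        extended = operand << amount
        return (extended >> nbits) & 1
    if op == "LSR":
        # Zero-extended operand; take bit M-1.
        return (operand >> (amount - 1)) & 1
    if op == "ASR":
        # Sign-extended operand; Python's >> on negative ints is arithmetic.
        signed = operand - (1 << nbits) if operand >> (nbits - 1) else operand
        return (signed >> (amount - 1)) & 1
    if op == "ROR":
        # Rotate right by M mod N; take bit N-1 of the rotated result.
        r = amount % nbits
        rotated = ((operand >> r) | (operand << (nbits - r))) & mask
        return (rotated >> (nbits - 1)) & 1
    raise ValueError("unknown shift operation: " + op)
```

In this model, shifting `0x80000000` left by one position moves the top bit out of a 32-bit operand, so the LSL rule yields a carry of one.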

At step 1106, the execution unit 424 determines whether the shift microinstruction 126 output by the hardware instruction translator 104 indicates that the execution unit 424 should write the carry flag, as indicated by WRCARRY in step 1056 of FIG. 10B. Specifically, the shift microinstruction 126 indicates that the PSC bit 906 on the flag bus output 928 should be written with the carry flag value generated by the shift operation and that the USE bit 908 should be set, so that the subsequent conditional non-carry-updating ALU operation microinstruction 126 conditionally writes the PSC bit 906 value to the architectural carry flag 902. If the execution unit 424 should write the carry flag, flow proceeds to step 1114; otherwise, flow proceeds to step 1108.

At step 1108, the execution unit 424 determines whether the shift microinstruction 126 output by the hardware instruction translator 104 indicates that the execution unit 424 should write the condition flags (labeled WRFLAGS). Although none of the shift microinstructions 126 of FIG. 10 indicates that the execution unit 424 should write the condition flags without also indicating that it should write the PSC bit 906 (labeled WRCARRY), the hardware instruction translator 104 does generate such a shift microinstruction 126 when translating other ISA instructions 124. If the execution unit 424 should write the condition flags, flow proceeds to step 1112; otherwise, flow ends.

At step 1112, the execution unit 424 outputs values on the flag bus 928 to clear the PSC bit 906, the USE bit 908, and the SAT bit 904 to zero and to write the new architectural condition flag values generated at step 1104 to the architectural condition flags 902. Flow ends at step 1112.

At step 1114, the execution unit 424 outputs values on the flag bus 928 to write the carry flag value generated at step 1104 to the PSC bit 906, set the USE bit 908 to one, clear the SAT bit 904 to zero, and write the architectural condition flags 902 with the old architectural condition flag values received at step 1102. Flow ends at step 1114.
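The flag-bus outputs chosen at steps 1106 through 1114 can be sketched as follows. This Python model is an assumption-laden illustration, not the hardware: the `Flags` dataclass, its field names, and the convention of returning `None` when the shift microinstruction writes no flags are all choices made for this example. The carry-out is taken to be the value produced by the shift itself, and the new N, Z, C, and V values are passed in rather than computed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flags:
    """Assumed model of the condition flags register 926."""
    n: int = 0
    z: int = 0
    c: int = 0
    v: int = 0
    sat: int = 0  # SAT bit 904
    psc: int = 0  # PSC bit 906 (pre-shift carry)
    use: int = 0  # USE bit 908


def shift_flag_output(old, new_nzcv, carry_out, wrcarry, wrflags):
    """Return the flags a shift microinstruction drives on the flag bus.

    old       -- Flags received with the microinstruction (step 1102)
    new_nzcv  -- (n, z, c, v) values produced by the shift
    carry_out -- carry flag value produced by the shift
    """
    if wrcarry:
        # Step 1114: save the shift carry in PSC, set USE, clear SAT,
        # and keep the old architectural N/Z/C/V values.
        return replace(old, psc=carry_out, use=1, sat=0)
    if wrflags:
        # Step 1112: write the new architectural flags, clear PSC/USE/SAT.
        n, z, c, v = new_nzcv
        return Flags(n=n, z=z, c=c, v=v, sat=0, psc=0, use=0)
    # Neither indicator set: the microinstruction does not write the flags.
    return None
```

Keeping the old N, Z, C, and V values in the WRCARRY case is what lets the following conditional ALU microinstruction still evaluate its condition against the pre-instruction flags.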

Referring now to FIG. 12 (comprising FIGS. 12A and 12B), a flowchart illustrates operation of the execution units 424 of FIG. 4 to execute a conditional ALU microinstruction 126. Flow begins at step 1202.

At step 1202, one of the execution units 424 of FIG. 4 receives a conditional ALU microinstruction 126, such as one described with respect to FIG. 10 that the hardware instruction translator 104 generated in response to encountering a conditional ALU instruction 124. The execution unit 424 also receives the source operands specified by the microinstruction 126, including the condition flag values 924, whether or not the microinstruction 126 uses them. It should be understood that the execution unit 424 also executes unconditional ALU microinstructions 126 according to a process similar to that described with respect to FIG. 12, excluding the operations performed at steps 1209, 1212, 1214, and 1216; such a microinstruction may be one that the hardware instruction translator 104 generated in response to encountering a conditional ALU instruction 124, as described with respect to FIG. 10. Furthermore, the execution unit 424 that executes the conditional ALU microinstruction 126 may be the same as, or different from, the execution unit 424 that executes the associated shift microinstruction 126 and/or XMOV/CMOV microinstruction 126. Flow proceeds to step 1204.

At step 1204, the execution unit 424 performs the ALU operation specified by the conditional ALU microinstruction 126 on the operands specified by the conditional ALU microinstruction 126 to generate a result, and outputs the result on the result bus 128. The execution unit 424 also generates new architectural condition flag 902 values based on the result of the ALU operation. If the ALU operation uses the carry flag, the execution unit 424 uses the old value of the received architectural carry flag 924 rather than the new carry flag value generated by the ALU operation. Flow proceeds to step 1206.

At step 1206, the execution unit 424 determines whether the architectural condition flags 924 received at step 1202 satisfy the specified condition. The determination is used subsequently at steps 1212 and 1214. Flow proceeds to step 1208.

At step 1208, the execution unit 424 determines whether the conditional ALU microinstruction 126 instructs the execution unit 424 to write the condition flags register 926, as indicated by WRFLAGS in various steps of FIGS. 10A and 10B. If so, flow proceeds to step 1214; otherwise, flow proceeds to step 1209.

At step 1209, if the determination at step 1206 was that the condition is satisfied, flow proceeds to step 1211; otherwise, flow proceeds to step 1212.

At step 1211, because the condition is satisfied, the execution unit 424 outputs the result generated at step 1204 on the result bus 128. However, the conditional ALU microinstruction 126 does not update the condition flags register 926, because it was specified as not updating the architectural condition flags 902. As described above, the result and condition flag values that the execution unit 424 outputs on the result bus 128/928 are forwarded to the other execution units 424 of the execution pipeline 112 and are written to the entry of the reorder buffer 422 associated with the conditional ALU microinstruction 126. It should be understood that even though the microinstruction 126 specifies that the architectural condition flags 902 are not to be updated, the execution unit 424 still outputs some values on the flag result bus 928 that are written to the reorder buffer 422 entry associated with the conditional ALU microinstruction 126; these values, however, are not retired from the reorder buffer 422 to the destination register 106 and/or the condition flags register 926. That is, the determination of whether the values written to a reorder buffer 422 entry are ultimately retired is made by the retire unit of the execution pipeline 112, based on the type of the microinstruction 126, the occurrence of an exception, a branch misprediction, or another invalidating event, rather than by the execution unit 424 itself. Flow ends at step 1211.

At step 1212, the execution unit 424 outputs the first source operand on the result bus 128. Note that when the condition is not satisfied, the first source operand output here is not used for the various conditional ALU microinstructions 126 described with respect to FIGS. 10A and 10B. Specifically, the XMOV and CMOV microinstructions 126 of FIGS. 10A and 10B write back the old destination register value rather than the value of the temporary register T2. However, as described with respect to FIGS. 21A and 21B and subsequent figures, when translating conditional ALU instructions 124 of another format, namely same-source-destination conditional ALU instructions 124 (or other ISA instructions 124), the hardware instruction translator 104 generates a conditional ALU microinstruction 126 whose first source operand is also the destination register specified by the ISA instruction 124, so that the original destination register value is written back when the condition is not satisfied. As described for step 1211, the conditional ALU microinstruction 126 does not update the condition flags register 926, because it was specified as not updating the architectural condition flags 902. Flow ends at step 1212.

At step 1214, if the determination at step 1206 was that the condition is satisfied, flow proceeds to step 1218; otherwise, flow proceeds to step 1216.

At step 1216, the execution unit 424 outputs the first source operand, clears the USE bit 908, the PSC bit 906, and the SAT bit 904 to zero, and outputs the old architectural condition flag 924 values received at step 1202 on the flag bus 928. As a result, the conditional ALU instruction 124 as a whole executes as a no-operation instruction (that is, the conditional ALU instruction 124 is not performed) and the values of the architectural condition flags 902 are left unmodified. Flow ends at step 1216.

At step 1218, the execution unit 424 determines whether the conditional ALU microinstruction 126 specifies a carry-updating ALU operation. In one embodiment, the execution unit 424 decodes the opcode of the conditional ALU microinstruction 126 to make the determination. In an alternate embodiment, the hardware instruction translator 104 determines whether the ALU operation is a carry-updating operation at step 1052 of FIG. 10A and provides a corresponding indicator to the execution unit 424. In one embodiment, the non-carry-updating ALU operations include, but are not limited to, those specified by the AND, BIC, EOR, ORN, ORR, TEQ, TST, MUL, MOV, MVN, ASR, LSL, LSR, ROR, and RRX ARM ISA instructions 124. If the ALU operation is a carry-updating operation, flow proceeds to step 1222; otherwise, flow proceeds to step 1224.

At step 1222, the execution unit 424 outputs the result generated at step 1204, clears the USE bit 908 and the PSC bit 906 to zero, sets the SAT bit 904 to one, and outputs the new architectural condition flag values generated at step 1204 on the flag bus 928. Note that conditional ALU microinstructions 126 that do not update the overflow flag but that specify a carry-updating ALU operation (such as the ASR, LSL, LSR, ROR, and RRX operations) are processed slightly differently from what is described for step 1222. In particular, the execution unit 424 outputs the old V flag value rather than the new V flag value. Flow ends at step 1222.

At step 1224, the execution unit 424 examines the USE bit 908. If the USE bit 908 is set to one, flow proceeds to step 1228; otherwise, flow proceeds to step 1226. In an alternate embodiment, as described above, the USE bit 908 does not exist; instead, the execution unit 424 examines an indicator within the conditional ALU microinstruction 126 to determine whether to update the architectural carry flag 902 with the pre-shift carry flag value in the PSC bit 906 or with the old carry flag value 924.

At step 1226, the execution unit 424 outputs the result generated at step 1204, clears the USE bit 908 and the PSC bit 906 to zero, sets the SAT bit 904 to one, and outputs the architectural condition flags on the flag bus 928 as follows: the C flag and V flag are written with the old C flag and V flag values received at step 1202; the N flag and Z flag are written with the new N flag and Z flag values, respectively, generated at step 1204. Flow ends at step 1226.

At step 1228, the execution unit 424 outputs the result generated at step 1204, clears the USE bit 908 and the PSC bit 906 to zero, sets the SAT bit 904 to one, and outputs the architectural condition flags on the flag bus 928 as follows: the C flag is written with the value of the PSC bit 906 received at step 1202; the V flag is written with the old V flag value received at step 1202; the N flag and Z flag are written with the new N flag and Z flag values, respectively, generated at step 1204. Flow ends at step 1228.
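The flag outputs of steps 1208 through 1228 can be collected into one function. The Python sketch below is a simplified model under stated assumptions: the function name, the use of dictionaries keyed by `N`, `Z`, `C`, `V`, `SAT`, `PSC`, and `USE`, and the `None` return for the no-flag-write path are choices made for this example. It does not model the special case noted for step 1222, in which shift-type carry-updating operations keep the old V flag.

```python
def cond_alu_flag_output(old, new, satisfied, wrflags, carry_updating):
    """Return the flags a conditional ALU microinstruction drives on the flag bus.

    old            -- dict with N, Z, C, V, SAT, PSC, USE as received at step 1202
    new            -- dict with N, Z, C, V as generated at step 1204
    satisfied      -- step 1206 result: do the old flags satisfy the condition?
    wrflags        -- step 1208: does the microinstruction write the flags register?
    carry_updating -- step 1218: is the ALU operation a carry-updating one?
    """
    if not wrflags:
        # Steps 1209-1212: the condition flags register is not updated.
        return None
    if not satisfied:
        # Step 1216: keep the old architectural flags, clear SAT/PSC/USE.
        out = {k: old[k] for k in "NZCV"}
        out.update(SAT=0, PSC=0, USE=0)
        return out
    out = {"SAT": 1, "PSC": 0, "USE": 0}
    if carry_updating:
        # Step 1222: all four flags come from the ALU operation.
        out.update({k: new[k] for k in "NZCV"})
    elif old["USE"]:
        # Step 1228: C comes from the saved pre-shift carry, V is preserved.
        out.update(N=new["N"], Z=new["Z"], C=old["PSC"], V=old["V"])
    else:
        # Step 1226: both C and V are preserved.
        out.update(N=new["N"], Z=new["Z"], C=old["C"], V=old["V"])
    return out
```

The SAT value written here is what a later CMOV microinstruction examines, which is why it is set to one on every satisfied path and cleared otherwise.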

In one embodiment, the values output on the flag bus 928 differ depending on whether the instruction mode indicator 132 indicates x86 or ARM, so the execution unit 424 executes the conditional ALU microinstruction 126 differently in each mode. Specifically, if the instruction mode indicator 132 indicates x86, the execution unit 424 does not distinguish between carry-updating and non-carry-updating ALU operations, does not consider the USE bit 908, and updates the condition code flags according to x86 semantics.

Referring now to FIG. 13, operation of the execution units 424 of FIG. 4 to execute a conditional move microinstruction 126 is illustrated. Flow begins at step 1302.

At step 1302, one of the execution units 424 of FIG. 4 receives a conditional move microinstruction 126 (denoted CMOV or XMOV), such as one generated by the hardware instruction translator 104 in response to encountering a conditional ALU instruction 124, as described with respect to FIG. 10. The execution unit 424 also receives the source operands specified by the microinstruction 126, including the condition flag values 924, whether or not the microinstruction 126 uses them. Flow proceeds to step 1304.

At step 1304, the execution unit 424 decodes the microinstruction 126 to determine whether it is an XMOV microinstruction 126 or a CMOV microinstruction 126. If it is a CMOV microinstruction 126, flow proceeds to step 1308; otherwise, flow proceeds to step 1306.

At step 1306, the execution unit 424 examines the architectural condition flags 902 received at step 1302 and determines whether the condition is satisfied. Flow proceeds to step 1312.

At step 1308, the execution unit 424 examines the SAT bit 904 received at step 1302 and determines from it whether the condition is satisfied, as previously determined by the corresponding conditional ALU microinstruction 126 that wrote the SAT bit 904, as described with respect to steps 1044, 1054, and 1056 of FIG. 10. Flow proceeds to step 1312.

At step 1312, if step 1306 or step 1308 determined that the condition is satisfied, flow proceeds to step 1316; otherwise, flow proceeds to step 1314.

At step 1314, the execution unit 424 outputs the value of the first source operand on the result bus 128. In the context of FIG. 10, the first source operand is the old destination register value. Thus, when the condition is not satisfied, the destination register keeps its value and the conditional ALU instruction 124 as a whole effectively executes as a no-op. Flow ends at step 1314.

At step 1316, the execution unit 424 outputs the value of the second source operand on the result bus 128. In the context of FIG. 10, the second source operand is the temporary register value written by the associated conditional ALU microinstruction 126. Thus, when the condition is satisfied, the result is written to the destination register, which completes execution of the conditional ALU instruction 124. Flow ends at step 1316.
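The selection performed in FIG. 13 can be sketched as follows. This is only an illustrative model: the function name, the dictionary representation of the flags, and the small subset of ARM condition codes shown are assumptions made for the example, not part of the original description.

```python
# A few ARM condition codes expressed over the N, Z, C, V flags
# (illustrative subset only).
ARM_CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
}


def conditional_move(kind, cond, flags, old_dest, temp):
    """Return the value driven on the result bus by a CMOV or XMOV."""
    if kind == "CMOV":
        # Step 1308: trust the SAT bit written by the earlier
        # conditional ALU microinstruction.
        satisfied = flags["SAT"] == 1
    elif kind == "XMOV":
        # Step 1306: evaluate the condition on the architectural flags.
        satisfied = ARM_CONDITIONS[cond](flags)
    else:
        raise ValueError(f"unknown conditional move: {kind}")
    # Step 1316 (second source operand) or step 1314 (first source operand).
    return temp if satisfied else old_dest
```

CMOV needs the SAT bit because, by the time it executes, a flag-updating ALU microinstruction may already have overwritten the flags on which the condition was originally evaluated.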

Referring now to FIG. 14, a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a flag-updating, non-pre-shifting, conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1044 of FIG. 10. The register allocation table (RAT) 402 of FIG. 4 generates dependency information for the CMOV microinstruction 126 on the values of temporary register T2 and the condition flags register 926 written by the conditional ALUOP microinstruction 126, among other things. The instruction dispatcher 404 dispatches the microinstructions 126 to the appropriate reservation stations 406 of FIG. 4. When all source operand values of a microinstruction 126 are available (from the forwarding bus 128, the reorder buffer (ROB) 422, or the register file 106), the instruction issue unit 408 detects that the microinstruction 126 is ready to be issued from its reservation station 406 to the corresponding execution unit 424 for execution. The microinstructions 126 are executed as described with respect to FIG. 12 (including FIG. 12A and FIG. 12B) and FIG. 13.

The execution unit 424 receives the conditional ALUOP microinstruction 126 generated at step 1044 from the reservation station 406, receives the source operand values from registers R1 and R2 of the register file 106 of FIG. 1, and receives the condition flags 924 from the condition flags register 926 of FIG. 9 (or from the forwarding bus 128 and/or the ROB 422) according to step 1202 of FIG. 12A. The execution unit 424 performs the ALU operation on registers R1 and R2 (and on the received C flag 902 if the ALU operation is a carry-using operation) to generate a result, which is written to temporary register T2 according to step 1204. Additionally: (1) if the architectural condition flags 902 do not satisfy the specified condition (denoted NOT SATISFIED in FIG. 14), the execution unit 424 generates the new condition flag 928 values according to step 1216 of FIG. 12B for writing to the condition flags register 926; (2) if the architectural condition flags 902 satisfy the specified condition and the ALU operation is non-carry-updating (denoted NCUALUOP SAT in FIG. 14), the execution unit 424 generates the new condition flag 928 values according to step 1226 of FIG. 12 for writing to the condition flags register 926; and (3) if the architectural condition flags 902 satisfy the specified condition and the ALU operation is carry-updating (denoted CU ALUOP SAT in FIG. 14), the execution unit 424 generates the new condition flag 928 values according to step 1222 of FIG. 12 for writing to the condition flags register 926. The temporary register T2 value and the condition flags 928 are provided on the forwarding bus 128 for consumption by the CMOV microinstruction 126; they are written to the ROB 422 entry for consumption by the CMOV microinstruction 126 if it does not obtain them from the forwarding bus 128; and, barring an exception, a branch misprediction, or another invalidating event, they are eventually retired to their appropriate architectural state for consumption by the CMOV microinstruction 126 if it obtains them from neither the forwarding bus 128 nor the ROB 422. In particular, the multiplexer 922 of FIG. 9 operates to select the appropriate condition flags 924 for the execution unit 424.

The execution unit 424 receives the CMOV microinstruction 126 of step 1044, the source operand values of temporary register T2 and the destination register (RD), and the condition flags 924 according to step 1302 of FIG. 13. According to steps 1316 and 1314 of FIG. 13, the execution unit 424 outputs the temporary register T2 source operand value if the SAT bit 904 is set, and outputs the destination register RD source operand value if the SAT bit 904 is clear. The result value is provided on the forwarding bus 128 for consumption by subsequent microinstructions 126, is written to the ROB 422 entry, and, barring an exception, a branch misprediction, or another invalidating event, is eventually retired to its appropriate architectural state.

As described with respect to step 1222, a flag-updating conditional ALU instruction 124 that specifies a carry-updating ALU operation that does not update the overflow flag, such as the ARM ISA ASR, LSL, LSR, ROR, and RRX instructions 124, is processed slightly differently from what FIG. 14 shows. In particular, the execution unit 424 outputs the old V flag value rather than a new V flag value. Finally, as noted above, the flag-updating ARM ISA MUL and MOV/MVN (register) instructions 124 are non-carry-updating instructions that cannot specify a pre-shift operation, so they are processed as described with respect to step 1044; step 1226 of FIG. 12B describes this case more specifically.

As may be observed from the foregoing, the ALU operation microinstruction 126 indicates to the CMOV microinstruction 126, via the SAT bit 904, whether the old condition flags 902 satisfy the specified condition. This frees the ALU operation microinstruction 126 to overwrite the old values of the condition flags 902 with the appropriate values generated from the ALU operation result when the condition is satisfied.
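To make the hand-off through the SAT bit concrete, here is a minimal sketch of the two-microinstruction sequence of FIG. 14 for an ARM-style `ADDSEQ` (add, update flags, execute only if Z is set). All function names are hypothetical, 32-bit operands are assumed, and the not-satisfied branch (old flags kept, SAT cleared) is an assumption about step 1216, which this passage does not spell out.

```python
MASK32 = 0xFFFFFFFF


def cond_adds(flags, r1, r2, condition):
    """Conditional, flag-updating ADD microinstruction (carry-updating case)."""
    total = r1 + r2
    t2 = total & MASK32  # step 1204: result is written to temporary T2
    if not condition(flags):
        # Assumed behaviour when the condition fails: old flags kept, SAT = 0.
        return t2, {**flags, "SAT": 0}
    new_flags = {
        "N": t2 >> 31,
        "Z": int(t2 == 0),
        "C": int(total > MASK32),
        "V": ((r1 ^ t2) & (r2 ^ t2)) >> 31,  # signed overflow of the add
        "SAT": 1,
    }
    return t2, new_flags


def cmov(flags, rd_old, t2):
    # The CMOV can no longer test Z itself (it may have been overwritten),
    # so it relies on SAT.
    return t2 if flags["SAT"] else rd_old


def adds_eq(flags, rd_old, r1, r2):
    """ADDSEQ RD, R1, R2 modeled as conditional ALUOP followed by CMOV."""
    t2, flags = cond_adds(flags, r1, r2, lambda f: f["Z"] == 1)
    return cmov(flags, rd_old, t2), flags
```

Note that in the satisfied case the add may itself clear Z, so a CMOV that re-evaluated EQ on the new flags could wrongly discard the result; the SAT bit avoids that.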

Referring now to FIG. 15 (including FIG. 15A and FIG. 15B), a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a flag-updating, pre-shifting, non-carry-updating conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1056 of FIG. 10B. The operation of FIG. 15 is similar in many respects to the operation of FIG. 14; like operations are not repeated here, and only the differences are described. The RAT 402 of FIG. 4 generates dependency information for the NCUALUOP microinstruction 126 on the values of temporary register T3 and the condition flags register 926 written by the shift microinstruction 126, among other things. The microinstructions 126 are executed as described with respect to FIGS. 11, 12, and 13.

The execution unit 424 receives the shift microinstruction 126 generated at step 1056 from the reservation station 406, receives the source operand values from registers R2 and R3 of the register file 106, and receives the condition flags 924 from the condition flags register 926 (or from the forwarding bus 128 and/or the ROB 422) according to step 1102 of FIG. 11. The execution unit 424 performs the shift operation on registers R2 and R3 (and on the received C flag 902 if the ALU operation is a carry-using operation) to generate a result, which is written to temporary register T3 according to step 1104. Additionally, the execution unit 424 generates the new architectural condition flag 902 values according to step 1104 and writes the new condition flags 928 to the condition flags register 926 according to step 1114 of FIG. 11. The temporary register T3 value and the condition flags 928 are provided on the forwarding bus 128 for consumption by the NCUALUOP microinstruction 126; they are written to the ROB 422 entry for consumption by the NCUALUOP microinstruction 126 if it does not obtain them from the forwarding bus 128; and, barring an exception, a branch misprediction, or another invalidating event, they are eventually retired to their appropriate state for consumption by the NCUALUOP microinstruction 126 if it obtains them from neither the forwarding bus 128 nor the ROB 422. In particular, the multiplexer 922 of FIG. 9 operates to select the appropriate condition flags 924 for the execution unit 424.

The execution unit 424 receives the NCUALUOP microinstruction 126 generated at step 1056 from the reservation station 406, receives the source operand values from register R1 of the register file 106 and from temporary register T3, and receives the condition flags 924 from the condition flags register 926 according to step 1202. The execution unit 424 performs the ALU operation on register R1 and temporary register T3 (and on the received C flag 902 if the ALU operation is a carry-using operation) to generate a result, which is written to temporary register T2 according to step 1204. Additionally: (1) if the architectural condition flags 902 do not satisfy the specified condition (denoted NOT SATISFIED in FIG. 15), the execution unit 424 generates the new condition flag 928 values according to step 1216 for writing to the condition flags register 926; (2) if the architectural condition flags 902 satisfy the specified condition and the USE bit 908 is clear (denoted SAT., USE==0 in FIG. 15), the execution unit 424 generates the new condition flag 928 values according to step 1226 of FIG. 12B for writing to the condition flags register 926; and (3) if the architectural condition flags 902 satisfy the specified condition and the USE bit 908 is set (denoted SAT., USE==1 in FIG. 15), the execution unit 424 generates the new condition flag 928 values according to step 1228 of FIG. 12 for writing to the condition flags register 926. Execution of the CMOV microinstruction 126 of FIG. 15 is similar to that described with respect to FIG. 14. In an alternate embodiment, as described above, the USE bit 908 does not exist; instead, the execution unit 424 examines an indicator within the conditional ALU microinstruction 126 to determine whether to update the architectural carry flag 902 with the pre-shift carry flag value in the PSC bit 906 or with the old carry flag value 924.

As may be observed from the foregoing, the shift microinstruction 126 does not overwrite the old values of the condition flags 902; instead, it writes the old values back to the condition flags register 926. Consequently, the conditional ALU operation microinstruction 126, which receives the condition flags register 926 result from the shift microinstruction 126, can determine whether the old condition flags 902 satisfy the condition specified by the ISA conditional ALU instruction 124. If the shift microinstruction 126 instead replaced the old carry flag 902 with the newly generated carry flag value, the conditional ALU operation microinstruction 126 would be unable to determine whether the old condition flags 902 satisfy the specified condition.

Referring now to FIG. 16 (including FIG. 16A and FIG. 16B), a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a flag-updating, pre-shifting, carry-updating conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1054 of FIG. 10. The operation of FIG. 16 is similar in many respects to the operation of FIG. 15; like operations are not repeated here, and only the differences are described. The RAT 402 of FIG. 4 generates dependency information for the CU ALUOP microinstruction 126 on the value of temporary register T3 written by the shift microinstruction 126; however, because the shift microinstruction 126 does not write the condition flags register 926, the RAT 402 generates no dependency information with respect to it.

The execution unit 424 receives the shift microinstruction 126 generated at step 1054 from the reservation station 406 and receives the source operand values from registers R2 and R3 of the register file 106 according to step 1102, but does not receive the condition flags 924 (unless the ALU operation is a carry-using operation). The execution unit 424 performs the shift operation on registers R2 and R3 (and on the received C flag 902 if the ALU operation is a carry-using operation) to generate a result, which is written to temporary register T3 according to step 1104. The temporary register T3 value is provided on the forwarding bus 128 for consumption by the CU ALUOP microinstruction 126; it is written to the ROB 422 entry for consumption by the CU ALUOP microinstruction 126 if it does not obtain the value from the forwarding bus 128; and, barring an exception, a branch misprediction, or another invalidating event, it is eventually retired to its appropriate state for consumption by the CU ALUOP microinstruction 126 if it obtains the value from neither the forwarding bus 128 nor the ROB 422.

The execution unit 424 receives the CU ALUOP microinstruction 126 generated at step 1054 from the reservation station 406, receives the source operand values from register R1 of the register file 106 and from temporary register T3, and receives the condition flags 924 from the condition flags register 926 according to step 1202. The execution unit 424 performs the ALU operation on register R1 and temporary register T3 (and on the received C flag 902 if the ALU operation is a carry-using operation) to generate a result, which is written to temporary register T2 according to step 1204. Additionally: (1) if the architectural condition flags 902 do not satisfy the specified condition (denoted NOT SATISFIED in FIG. 16), the execution unit 424 generates the new condition flag 928 values according to step 1216 for writing to the condition flags register 926; and (2) if the architectural condition flags 902 satisfy the specified condition (denoted SATISFIED in FIG. 16), the execution unit 424 generates the new condition flag 928 values according to step 1222 of FIG. 12 for writing to the condition flags register 926. Execution of the CMOV microinstruction 126 of FIG. 16 is similar to that described with respect to FIG. 14.

Referring now to FIG. 17, a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a non-flag-updating, pre-shifting, carry-using conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1036 of FIG. 10. The operation of FIG. 17 is similar in many respects to the operation of FIG. 16; like operations are not repeated here, and only the differences are described. Execution of the shift microinstruction 126 of FIG. 17 is similar to that described with respect to FIG. 16.

The execution unit 424 receives the ALUOPUC microinstruction 126 generated at step 1036 from the reservation station 406, receives the source operand values from register R1 of the register file 106 and from temporary register T3, and receives the condition flags 924 from the condition flags register 926 according to step 1202. Because the ALU operation is a carry-using operation, the execution unit 424 performs the ALU operation on register R1, temporary register T3, and the received C flag 902 to generate a result, which is written to temporary register T2 according to step 1204. The execution unit 424 does not write the condition flags register 926.

The execution unit 424 receives the XMOV microinstruction 126 generated at step 1036, the source operand values of temporary register T2 and the destination register RD, and the condition flags 924 according to step 1302 of FIG. 13. According to steps 1316 and 1314 of FIG. 13, the execution unit 424 outputs the temporary register T2 source operand value as its result if the condition flags 924 satisfy the specified condition, and outputs the destination register RD source operand value as its result if they do not. The result value is provided on the forwarding bus 128 for consumption by subsequent microinstructions 126, is written to the ROB 422 entry, and, barring an exception, a branch misprediction, or another invalidating event, is eventually retired to its appropriate architectural state.

Referring now to FIG. 18, a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a non-flag-updating, pre-shifting, non-carry-using conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1034 of FIG. 10. The operation of FIG. 18 is similar in many respects to the operation of FIG. 17; like operations are not repeated here, and only the differences are described. Execution of the shift microinstruction 126 of FIG. 18 is similar to that described with respect to FIG. 16. Execution of the ALUOP microinstruction 126 of FIG. 18 is similar to execution of the ALUOPUC microinstruction 126 of FIG. 17, except that the ALUOP microinstruction 126 of FIG. 18 does not use the C flag 902 to generate its result. Execution of the XMOV microinstruction 126 of FIG. 18 is similar to execution of the XMOV microinstruction 126 of FIG. 17.

Referring now to FIG. 19, a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a non-flag-updating, non-pre-shifting, carry-using conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1026 of FIG. 10. The operation of FIG. 19 is similar in many respects to the operation of FIG. 17; like operations are not repeated here, and only the differences are described. Because the conditional ALU instruction 124 is non-pre-shifting, its translation does not include a shift microinstruction 126.

The execution unit 424 receives the ALUOPUC microinstruction 126 of step 1026 from the reservation station 406, receives the source operand values from registers R1 and R2 of the register file 106, and receives the condition flags 924 from the condition flags register 926 according to step 1202. Because the ALU operation is a carry-using operation, the execution unit 424 performs the ALU operation on registers R1 and R2 and the received C flag 902 to generate a result, which is written to temporary register T2 according to step 1204. The execution unit 424 does not write the condition flags register 926. Execution of the XMOV microinstruction 126 of FIG. 19 is similar to execution of the XMOV microinstruction 126 of FIG. 17.

Referring now to FIG. 20, a block diagram illustrating the operation of the execution pipeline 112 of FIG. 1 in executing a conditional ALU instruction 124 is shown. Specifically, the conditional ALU instruction 124 is a non-flag-updating, non-pre-shifting, non-carry-using conditional ALU operation ISA instruction 124, which the hardware instruction translator 104 translates into the microinstructions 126 of step 1024 of FIG. 10. The operation of FIG. 20 is similar in many respects to the operation of FIG. 19; like operations are not repeated here, and only the differences are described. Execution of the ALUOP microinstruction 126 of FIG. 20 is similar to execution of the ALUOPUC microinstruction 126 of FIG. 19, except that the ALUOP microinstruction 126 of FIG. 20 does not use the C flag 902 to generate its result. Execution of the XMOV microinstruction 126 of FIG. 20 is similar to execution of the XMOV microinstruction 126 of FIG. 17.

As may be observed from the foregoing, the embodiments described herein avoid the disadvantages of allowing microinstructions 126 to specify an additional source operand. First, an additional read port on the general purpose register file would be required for each execution unit 424 that executes microinstructions 126 with the additional source operand. Second, an additional read port on the ROB 422 would be required for each such execution unit 424. Third, more wires would be required on the forwarding bus 128 for each such execution unit 424. Fourth, an additional, relatively large multiplexer would be required for each such execution unit 424. Fifth, Q additional tag comparators would be required, where:

$$Q = \sum_{i=1}^{n} \left( R[i] \times P[i] \times J[i] \right)$$

where n is the number of execution units 424, R[i] is the number of reservation station 406 entries for execution unit [i] 424, P[i] is the maximum number of source operands that can be specified by a microinstruction executed by execution unit [i] 424, and J[i] is the number of execution units 424 capable of forwarding to execution unit [i] 424. Sixth, an additional rename lookup would be required in the RAT 402 for the additional source operand. Seventh, the reservation stations 406 would need to be expanded to handle the additional source operand. These additional costs in speed, power, and area are undesirable.
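As a quick numeric illustration of the comparator count Q, the sum can be evaluated directly. The function name and the per-unit figures below are invented purely for the example and do not describe any particular microprocessor.

```python
def tag_comparator_count(units):
    """Evaluate Q = sum over units of R[i] * P[i] * J[i].

    Each element of `units` is a tuple (R, P, J):
      R: reservation station entries for the unit
      P: maximum source operands per microinstruction on the unit
      J: number of execution units that can forward to the unit
    """
    return sum(r * p * j for r, p, j in units)


# Hypothetical machine with three execution units.
example_units = [(8, 3, 4), (8, 3, 4), (12, 2, 4)]
q = tag_comparator_count(example_units)  # 96 + 96 + 96
```

Because each term is a product, raising P[i] by one for a unit adds R[i] × J[i] comparators for that unit, which shows how quickly the cost of an extra source operand grows.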

SAME-SOURCE-DESTINATION Optimization Embodiments

Referring now to FIG. 21, a flowchart illustrating the operation of the hardware instruction translator 104 of FIG. 1 in translating conditional ALU instructions 124 is shown. The operation of the hardware instruction translator 104 according to FIG. 21 is similar in many respects to its operation according to FIG. 10, particularly with respect to the various decision steps, which are therefore given the same numbers here.

In FIG. 21, step 1002 of FIG. 10 is replaced by step 2102. The conditional ALU instruction 124 that the hardware instruction translator 104 encounters at step 2102 differs from the one encountered at step 1002 in that it specifies one of its source registers as the destination register. The hardware instruction translator 104 is configured to recognize this situation and to optimize the microinstructions 126 that it emits. In particular, the hardware instruction translator 104 decodes the same-source-destination conditional ALU instruction 124 and translates it into a sequence of microinstructions 126 different from those described with respect to steps 1024, 1026, 1034, 1036, 1044, 1054 and 1056 of FIG. 10 (the 10XX steps). The different sequences of microinstructions 126 are described in steps 2124, 2126, 2134, 2136, 2144, 2154 and 2156 of FIG. 21 (the 21XX steps), which replace their corresponding 10XX steps. In particular, the sequence of microinstructions 126 in each 21XX step contains one fewer microinstruction 126 than the sequence in the corresponding 10XX step. Specifically, the 21XX sequences do not include a CMOV or XMOV microinstruction 126; instead, the selective writing of either the original destination register value or the result value is performed by the conditional ALU microinstruction 126 at the end of the sequence. This operation is described in more detail in the following paragraphs.

At step 2124, the hardware instruction translator 104 translates the same-source-destination, non-flag-updating, non-pre-shifting, non-carry-using conditional ALU instruction 124 into a single microinstruction 126, namely a conditional ALU operation microinstruction 126 (labeled ALUOP CC). In the example of step 2124, the conditional ALU instruction 124 is similar to that described with respect to step 1024, except that the first source operand is the destination register (RD). Thus, the conditional ALU instruction 124 specifies a first source register (RD) and a second source register (R2), an ALU operation (labeled ALUOP) to be performed on RD and R2 to generate a result, and a destination register (RD), which is the same as the first source register and to which the result is conditionally written. The conditional ALUOP microinstruction 126 specifies the same ALU operation and condition as the conditional ALU instruction 124. The execution unit 424 that executes the conditional ALUOP microinstruction 126 receives the old, or current, value of the destination register (RD) and the value of the second source operand R2 according to step 1202, and performs the ALU operation on the two source operands according to step 1204 to generate a result. The execution unit 424 also receives the condition flags 924 and examines them according to step 1204 to determine whether they satisfy the specified condition. If so, the execution unit 424 outputs the result according to step 1211; otherwise, it outputs the old destination register value according to step 1212. The execution of the conditional ALUOP microinstruction 126 is illustrated in the block diagram of FIG. 28. Flow ends at step 2124.
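The behavior of the single ALUOP CC microinstruction of step 2124 can be modeled in a few lines. This is a minimal sketch, not the hardware implementation: it assumes 32-bit registers, models only the EQ and NE conditions (Z flag set or clear, as in the ARM ISA), and the function and table names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers

# Assumed subset of condition codes, evaluated on a flags dict with keys N, Z, C, V.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
}


def exec_cond_aluop(alu_op, cond, flags, rd_old, r2):
    """Model of the single 'ALUOP CC' microinstruction of step 2124."""
    result = alu_op(rd_old, r2) & MASK32   # ALU operation on RD and R2
    if CONDITIONS[cond](flags):            # condition satisfied
        return result                      # the result is written to RD
    return rd_old                          # the old RD value is written back


def add(a, b):
    return a + b


flags = {"N": 0, "Z": 1, "C": 0, "V": 0}
print(exec_cond_aluop(add, "EQ", flags, 5, 7))  # condition met
print(exec_cond_aluop(add, "NE", flags, 5, 7))  # condition not met
```

Because RD is already a source operand, the old value needed for the not-satisfied case arrives with the operands, which is why no separate conditional move is required.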

At step 2126, the hardware instruction translator 104 translates the same-source-destination, non-flag-updating, non-pre-shifting, carry-using conditional ALU instruction 124 into a single microinstruction 126, namely a carry-using conditional ALU operation microinstruction 126 (labeled ALUOPUC CC). In the example of step 2126, the conditional ALU instruction 124 is similar to that described with respect to step 2124, except that the ALU operation it specifies uses the carry flag, and is similar to that described with respect to step 1026, except that the first source operand is the destination register (RD). The conditional ALUOPUC microinstruction 126 is similar to the microinstruction described with respect to step 2124; however, the ALU operation it specifies uses the carry flag. The execution of the conditional ALUOPUC microinstruction 126, illustrated in the block diagram of FIG. 27, is similar to the execution of the conditional ALUOP microinstruction 126 of step 2124, except that the execution unit 424 uses the carry flag to perform the ALU operation. Flow ends at step 2126.

At step 2134, the hardware instruction translator 104 translates the same-source-destination, non-flag-updating, pre-shifting, non-carry-using conditional ALU instruction 124 into first and second microinstructions 126, namely: (1) a shift microinstruction 126; and (2) a conditional ALUOP microinstruction 126. In the example of step 2134, the conditional ALU instruction 124 is similar to that described with respect to step 1034, except that the first source operand is the destination register (RD), and is similar to that described with respect to step 2124, except that it also specifies a pre-shift operation, with a shift amount, on the second source operand (R2). In the example of step 2134, the shift amount is held in a third source register (R3) specified by the conditional ALU instruction 124. However, if the conditional ALU instruction 124 is of the type that specifies the shift amount as a constant within the instruction 124, the third source register is not used. The shift microinstruction 126 is similar to that described with respect to step 1034, and the execution unit 424 executes it in a manner similar to that described with respect to step 1034 and FIG. 18. In step 2134 the carry flag value generated by the shift microinstruction 126 is not used, because the conditional ALU instruction 124 indicates that the architectural condition flags 902 are not updated; in step 2156, by contrast, the carry flag value generated by the shift microinstruction 126 is used. Furthermore, the pre-shift operation may require the old carry flag to be rotated into the shifted result; for example, the RRX pre-shift operation shifts the carry flag into the most significant bit of the result. In that case, although not shown in FIG. 21 (except in step 2156), the execution unit 424 also reads the condition flags 924 when it executes the shift microinstruction 126 in order to obtain the current carry flag value. The conditional ALUOP microinstruction 126 and its execution are similar to those described with respect to step 2124; however, it receives the value of temporary register T3 rather than the value of register R2, and performs the ALU operation on the destination register RD and temporary register T3 to generate the result to be written to the destination register. The execution of the shift microinstruction 126 and the conditional ALUOP microinstruction 126 is illustrated in FIG. 26. Flow ends at step 2134.
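The two-microinstruction sequence of step 2134 can be sketched for the RRX case mentioned above. The sketch assumes 32-bit registers and the ARM definition of RRX (the carry flag enters bit 31 and bit 0 becomes the carry out); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def rrx(value, carry_in):
    """Rotate right with extend: carry enters bit 31, bit 0 becomes carry out."""
    result = ((carry_in & 1) << 31) | ((value & MASK32) >> 1)
    carry_out = value & 1
    return result, carry_out


def exec_step_2134(alu_op, cond_met, rd_old, r2, carry_flag):
    # Microinstruction 1: the shift writes its result to temporary register T3.
    # The carry out is discarded because the flags are not updated in this case.
    t3, _unused_carry = rrx(r2, carry_flag)
    # Microinstruction 2: conditional ALUOP on RD and T3.
    result = alu_op(rd_old, t3) & MASK32
    return result if cond_met else rd_old


def orr(a, b):
    return a | b


print(hex(exec_step_2134(orr, True, 0x10, 0x3, 1)))
print(hex(exec_step_2134(orr, False, 0x10, 0x3, 1)))
```

The shift needs the current carry flag as an input only for operations such as RRX, which is why the paragraph notes that the shift microinstruction reads the condition flags in that case.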

At step 2136, the hardware instruction translator 104 translates the same-source-destination, non-flag-updating, pre-shifting, carry-using conditional ALU instruction 124 into first and second microinstructions 126, namely: (1) a shift microinstruction 126; and (2) a carry-using conditional ALUOP microinstruction 126 (labeled ALUOPUC CC). In the example of step 2136, the conditional ALU instruction 124 is similar to that described with respect to step 2134, except that the specified ALU operation uses the carry flag, and is similar to that described with respect to step 1036, except that the first source operand is the destination register (RD). The two microinstructions 126 and their execution are similar to those described with respect to step 2134; however, the ALUOPUC microinstruction 126 also receives the condition flags 924 in order to obtain the current value of the carry flag for use in the carry-using ALU operation. The execution of the shift microinstruction 126 and the conditional ALUOPUC microinstruction 126, illustrated in FIG. 25, is similar to the execution of the shift microinstruction 126 and the conditional ALUOP microinstruction 126 of step 2134, except that the execution unit 424 uses the carry flag to perform the ALU operation. Flow ends at step 2136.

At step 2144, the hardware instruction translator 104 translates the same-source-destination, flag-updating, non-pre-shifting conditional ALU instruction 124 into a single microinstruction 126, namely a conditional ALU operation microinstruction 126 (labeled ALUOP CC). In the example of step 2144, the conditional ALU instruction 124 is similar to that of step 2124, except that it updates the architectural condition flags 902, and is similar to that described with respect to step 1044, except that the first source operand is the destination register. The conditional ALU operation microinstruction 126 of step 2144 and its operation are similar to those described with respect to step 2124, except that it also updates the architectural condition flags 902, and are similar to the conditional ALU microinstruction 126 of step 1044, except that its first operand is the destination register rather than register R1 and its destination is the destination register RD rather than temporary register T2. The execution unit 424 that executes the conditional ALU microinstruction 126 receives the destination register RD and register R2 as source operands according to step 1202, and performs the specified ALU operation on the two source operands according to step 1204 to generate a result. The execution unit 424 also receives the architectural condition flags 902 and determines according to step 1206 whether they satisfy the specified condition. If so, the execution unit 424 outputs the result of the ALU operation for writing to the destination register RD according to step 1222 or step 1226, depending on whether the ALU operation is a carry-updating operation; otherwise, it outputs the old value of the destination register RD according to step 1216. In addition, the execution unit 424 writes the condition flags register 926 according to step 1216, 1222 or 1226, depending on whether the condition is satisfied and whether the ALU operation is a carry-updating operation. If the condition is not satisfied, the execution unit 424 writes the old condition flag values to the architectural condition flags 902 according to step 1216. If the condition is satisfied, the execution unit 424 updates the architectural condition flags 902 based on the result of the ALU operation, according to step 1222 for a carry-updating ALU operation and according to step 1226 for a non-carry-updating ALU operation. The execution of the conditional ALUOP microinstruction 126 is illustrated in FIG. 22. It is noted that the ALU operation performed by the conditional ALU operation microinstruction 126 generated at step 2144 (and at steps 1054 and 1056) may be an ALU operation that uses the carry flag (similar to those described with respect to steps 1026 and 1036); because the microinstruction 126 reads the flags (as indicated by RDFLAGS), the execution unit 424 has the carry flag available to perform the carry-using ALU operation. Flow ends at step 2144.
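The flag handling of step 2144 can be illustrated for one carry-updating operation. The sketch below assumes 32-bit registers and uses addition with ARM-style N, Z, C and V results as the example ALU operation; it covers only the step 1216 and step 1222 outcomes, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def add_with_flags(a, b):
    """32-bit add returning the result and ARM-style N, Z, C, V flags."""
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                                   # sign bit of the result
        "Z": int(result == 0),                               # result is zero
        "C": int(full > MASK32),                             # unsigned carry out
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,      # signed overflow
    }
    return result, flags


def exec_flag_updating_add(cond_met, rd_old, r2, old_flags):
    """Model of a flag-updating 'ALUOP CC' (step 2144) whose ALU operation is ADD."""
    result, new_flags = add_with_flags(rd_old & MASK32, r2 & MASK32)
    if not cond_met:
        # Step 1216: old RD value and old flag values are written back.
        return rd_old, dict(old_flags)
    # Step 1222: carry-updating operation, so all flags come from the result.
    return result, new_flags


old = {"N": 1, "Z": 0, "C": 0, "V": 0}
print(exec_flag_updating_add(True, 0xFFFFFFFF, 1, old))
print(exec_flag_updating_add(False, 0xFFFFFFFF, 1, old))
```

In both outcomes a complete set of flag values is written, so later microinstructions that depend on the flags always have a single producer.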

At step 2154, the hardware instruction translator 104 translates the same-source-destination, flag-updating, pre-shifting, carry-updating conditional ALU instruction 124 into first and second microinstructions 126, namely: (1) a shift microinstruction 126; and (2) a conditional carry-updating ALU operation microinstruction 126 (labeled CU ALUOP CC). In the example of step 2154, the conditional ALU instruction 124 is similar to that described with respect to step 2134, except that it also specifies that the architectural condition flags 902 are to be updated, and is similar to that described with respect to step 1054, except that the first source operand is the destination register. The shift microinstruction 126 is similar to that described with respect to step 1034, and the execution unit 424 executes it in a manner similar to that described with respect to step 1034 and FIG. 18. The CU ALUOP microinstruction 126 and its execution are similar to the conditional ALU microinstruction 126 of step 2124, except that the CU ALUOP microinstruction 126 of step 2154 also updates the architectural condition flags 902, and are similar to the conditional ALU microinstruction 126 of step 1054, except that its first operand is the destination register rather than register R1 and its destination is the destination register RD rather than temporary register T2. The execution unit 424 that executes the CU ALUOP microinstruction 126 receives the destination register RD and temporary register T3 as source operands according to step 1202, and performs the specified ALU operation on the destination register and temporary register T3 according to step 1204 to generate a result. In addition, the execution unit 424 receives the architectural condition flags 902 according to step 1202 and determines according to step 1206 whether they satisfy the specified condition. Depending on whether the condition is satisfied, the execution unit 424 updates the condition flags register 926 according to step 1216 or step 1222. If the condition is not satisfied, the execution unit 424 writes the old condition flag values to the architectural condition flags 902; if the condition is satisfied, the execution unit 424 updates the architectural condition flags 902 based on the result of the ALU operation. The execution of the shift microinstruction 126 and the CU ALUOP microinstruction 126 is illustrated in FIG. 24. Flow ends at step 2154.

At step 2156, the hardware instruction translator 104 translates the same-source-destination, flag-updating, pre-shifting, non-carry-updating conditional ALU instruction 124 into first and second microinstructions 126, namely: (1) a shift microinstruction 126; and (2) a conditional non-carry-updating ALU operation microinstruction 126 (labeled NCU ALUOP CC). In the example of step 2156, the conditional ALU instruction 124 is similar to that described with respect to step 2154, except that it specifies a non-carry-updating ALU operation, and is similar to that described with respect to step 1056, except that the first source operand is the destination register. Consequently, when the condition is satisfied, the architectural carry flag 902 is updated with the pre-shift carry flag value. The shift microinstruction 126 is similar to that described with respect to step 2134; however, it both reads and writes the condition flags register 926. Specifically, the execution unit 424 that executes the shift microinstruction 126: (1) writes the carry flag value generated by the pre-shift operation to the PSC bit 906; (2) sets the USE bit 908 to instruct the conditional NCU ALUOP microinstruction 126 to use the PSC bit 906 to update the architectural carry flag 902; and (3) writes the old architectural condition flags 902 back to the condition flags register 926 according to step 1114, so that the NCU ALUOP microinstruction 126 can evaluate the old values of the architectural condition flags 902 to determine whether they satisfy the specified condition. The NCU ALUOP microinstruction 126 specifies the same condition as the conditional ALU instruction 124. The execution unit 424 that executes the NCU ALUOP microinstruction 126 performs the ALU operation on the destination register and temporary register T3 according to step 1204 to generate a result. In addition, the execution unit 424 receives the architectural condition flags 902 and determines according to step 1206 whether they satisfy the condition. The execution unit 424 then writes the condition flags register 926 according to step 1216, 1226 or 1228, depending on whether the condition is satisfied and whether the USE bit 908 is set. Specifically, if the condition is not satisfied, the execution unit 424 writes the old condition flag values to the architectural condition flags 902 according to step 1216. If the condition is satisfied, the execution unit 424 updates the architectural condition flags 902 based on the result of the ALU operation according to step 1226 or step 1228, depending on whether the USE bit 908 is set. Specifically, the architectural overflow (V) flag 902 is written with the old overflow flag value 924, and the N and Z flags are written with new values based on the result. Furthermore, if the USE bit 908 so indicates, the architectural carry flag 902 is updated with the pre-shift carry flag value in the PSC bit 906 according to step 1228; otherwise, it is updated with the old carry flag value 924 according to step 1226. The execution of the shift microinstruction 126 and the NCU ALUOP microinstruction 126 is illustrated in FIG. 23. Flow ends at step 2156.
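The hand-off of the pre-shift carry through the PSC and USE bits in step 2156 can be sketched as follows. This is an illustrative model only: it assumes 32-bit registers, uses a logical shift left by 1 to 31 bits as the pre-shift, and represents the condition flags register as a Python dict; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def exec_shift_lsl(r2, amount, old_flags):
    """Shift microinstruction: writes T3, PSC and USE, and preserves the old flags.

    Assumes 1 <= amount <= 31.
    """
    t3 = (r2 << amount) & MASK32
    psc = (r2 >> (32 - amount)) & 1           # last bit shifted out of R2
    flags_reg = dict(old_flags, PSC=psc, USE=1)
    return t3, flags_reg


def exec_ncu_aluop(alu_op, cond_met, rd_old, t3, flags_reg):
    """NCU ALUOP CC: conditional, non-carry-updating ALU operation."""
    old = {name: flags_reg[name] for name in "NZCV"}
    if not cond_met:
        return rd_old, old                    # step 1216: old RD and old flags
    result = alu_op(rd_old, t3) & MASK32
    new_flags = {
        "N": result >> 31,                    # new value from the result
        "Z": int(result == 0),                # new value from the result
        # Step 1228 if USE is set (take PSC), otherwise step 1226 (keep old C).
        "C": flags_reg["PSC"] if flags_reg["USE"] else old["C"],
        "V": old["V"],                        # old overflow flag is kept
    }
    return result, new_flags


def and_op(a, b):
    return a & b


old_flags = {"N": 1, "Z": 0, "C": 0, "V": 1}
t3, flags_reg = exec_shift_lsl(0x80000001, 1, old_flags)
print(exec_ncu_aluop(and_op, True, 0x2, t3, flags_reg))
print(exec_ncu_aluop(and_op, False, 0x2, t3, flags_reg))
```

Keeping the pre-shift carry in a separate PSC bit lets the shift microinstruction leave the architectural flags intact, so the second microinstruction can still evaluate the condition against their old values.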

An advantage of this approach is that, when the conditional ALU instruction 124 specifies the destination register to be the same as one of the source registers, the hardware instruction translator 104 can optimize its output and generate a sequence of microinstructions 126 that is one microinstruction 126 shorter. First, this may increase the lookahead capability of the microprocessor 100 to exploit the instruction-level parallelism of the program being executed, and thereby increase utilization of the execution units 424. Fewer microinstructions 126 means more free slots in the reorder buffer 422 for additional microinstructions 126, which creates a larger pool of microinstructions 126 that can become ready to issue for execution, and this improves the lookahead capability. Second, the hardware instruction translator 104 can emit microinstructions 126 into only a predetermined number of slots per clock cycle and, in at least one embodiment, must emit in the same clock cycle all of the microinstructions 126 required to implement a given ISA instruction 124. Reducing the number of microinstructions 126 into which a conditional ALU instruction 124 is translated therefore also reduces the average number of empty microinstruction 126 slots per clock cycle, which likewise helps increase the lookahead capability of the microprocessor 100 and the utilization of the execution units 424.
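The seven cases of steps 2124 through 2156 and the resulting one-microinstruction saving can be summarized in a small lookup sketch. The ALUOP labels follow the description; the SHIFT mnemonic and the function names are hypothetical, and the decision order is only one plausible reading of the flowchart.

```python
# Microinstruction sequences emitted for same-source-destination instructions.
SEQUENCES_21XX = {
    2124: ["ALUOP CC"],
    2126: ["ALUOPUC CC"],
    2134: ["SHIFT", "ALUOP CC"],
    2136: ["SHIFT", "ALUOPUC CC"],
    2144: ["ALUOP CC"],               # flag-updating variant
    2154: ["SHIFT", "CU ALUOP CC"],
    2156: ["SHIFT", "NCU ALUOP CC"],
}


def select_21xx_step(updates_flags, pre_shift, uses_carry=False, carry_updating=False):
    """Pick the step of FIG. 21 that handles a same-source-destination instruction."""
    if not updates_flags:
        if not pre_shift:
            return 2126 if uses_carry else 2124
        return 2136 if uses_carry else 2134
    if not pre_shift:
        return 2144
    return 2154 if carry_updating else 2156


def microinstruction_counts(step):
    """Return (count for the 21XX step, count for the corresponding 10XX step)."""
    n = len(SEQUENCES_21XX[step])
    return n, n + 1   # the 10XX sequence ends with an extra CMOV or XMOV


step = select_21xx_step(updates_flags=True, pre_shift=True, carry_updating=False)
print(step, SEQUENCES_21XX[step], microinstruction_counts(step))
```

In the single-microinstruction cases the saving halves the number of reorder buffer slots the instruction occupies; in the two-microinstruction cases it reduces the count from three to two.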

Conditional Non-Branch Instruction Prediction

The embodiments described above translate a conditional non-branch instruction, referred to herein as a conditional ALU instruction, into microinstructions in a read port-limited pipelined microprocessor. A first microinstruction performs the ALU operation and writes the result to a temporary register. A second microinstruction receives the result from the temporary register and the current value of the destination register, and writes the result to the destination register if the condition is satisfied or writes the current value back to the destination register if the condition is not satisfied. Similarly, the embodiments described in U.S. Provisional Application 61/473,062 translate a conditional non-branch instruction, referred to therein as a conditional load instruction, into microinstructions in a read port-limited pipelined microprocessor. The instruction translator translates the conditional load instruction into two microinstructions: (1) a load microinstruction that receives the condition code and flags and that, if the condition is not satisfied, does not update architectural state (for example, through the memory-write side effects of a page table walk or through the generation of an exception) and loads a dummy value into a temporary register, but, if the condition is satisfied, loads the real value from memory into the temporary register; and (2) a conditional move microinstruction that receives the current value of the destination register and moves that current value back to the destination register if the condition is not true, or moves the value from the temporary register to the destination register if the condition is true.

Although this solution is an improvement over conventional techniques, it incurs additional costs. First, there is the second microinstruction itself and the latency associated with its dependency on the first microinstruction. Second, the second microinstruction consumes instruction slots in other structures of the microprocessor, such as the microinstruction queue, the reorder buffer, the reservation stations, and the execution units. In addition, the presence of the second microinstruction reduces the average number of instructions per clock cycle that are emitted by the instruction translator, issued by the instruction issue unit, and retired by the instruction retire unit, and thereby limits the throughput of the microprocessor.

A potentially higher-performance solution is described here. It incorporates a prediction mechanism, similar to branch prediction, that predicts the direction of a conditional non-branch instruction, that is, whether its condition will be satisfied and therefore whether the conditional non-branch instruction will need to be executed. This solution allows the instruction translator to emit a single microinstruction, rather than multiple microinstructions, based on the prediction. The microprocessor also includes a mechanism for recovering from a misprediction.

Embodiments with both static and dynamic prediction mechanisms are described below. The static prediction mechanism is similar to static branch prediction. The dynamic (history-based) prediction mechanism examines the program counter/instruction pointer value of the conditional non-branch instruction when it is fetched from the instruction cache, in a manner similar to a branch target address cache (BTAC).

In the static prediction mechanism, a static predictor examines the operation and/or the condition code specified by the conditional non-branch instruction (for example, the ALU operation is ADD and the condition code is EQUAL) and predicts, based on profiling data, whether the instruction will be executed. For example, if the empirical data shows that conditional non-branch instructions with a given combination of operation and condition code are executed a sufficiently large percentage of the time, the static predictor predicts that the instruction will be executed, and the instruction translator emits a single unconditional microinstruction, for example:

addcc dst, src1, src2

The condition code and flags are provided to the microinstruction (hence addcc) so that the execution unit can determine whether the prediction was correct and, if it was not, generate a misprediction indication.

Conversely, if the empirical data shows that conditional non-branch instructions with a given combination of operation and condition code are not executed a sufficiently large percentage of the time, the static predictor predicts that the instruction will not be executed, and the instruction translator emits a single no-operation (nop) microinstruction, for example:

nopcc

Again, the condition code and flags are provided to the microinstruction (hence nopcc) so that the execution unit can generate a misprediction indication when necessary.

If the executed/not-executed ratio is not large enough to justify a static prediction, the instruction translator falls back to the lower-performance multiple-microinstruction solution described above; for example, the translator emits two microinstructions:

add tmp, src1, src2

movcc dst, src-dst, tmp // src-dst is the current dst reg value
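The three translation outcomes of the static scheme can be sketched as a lookup on profiling data. The profile table, the threshold values, and the function name are hypothetical and serve only to illustrate the decision; the description does not specify how large the percentage must be.

```python
# Hypothetical profiling data: fraction of the time the condition was satisfied
# for each (operation, condition code) combination.
PROFILE = {
    ("add", "eq"): 0.95,
    ("sub", "ne"): 0.03,
    ("and", "gt"): 0.55,
}
EXECUTE_THRESHOLD = 0.90   # assumed: predict "executed" at or above this
SKIP_THRESHOLD = 0.10      # assumed: predict "not executed" at or below this


def translate(op, cond, profile=PROFILE):
    """Return the microinstruction sequence emitted for a conditional ALU instruction."""
    p = profile.get((op, cond))
    if p is not None and p >= EXECUTE_THRESHOLD:
        # Predicted executed: one microinstruction that still checks the condition.
        return [f"{op}cc dst, src1, src2"]
    if p is not None and p <= SKIP_THRESHOLD:
        # Predicted not executed: one checking no-op.
        return ["nopcc"]
    # No confident prediction: fall back to the two-microinstruction sequence.
    return [f"{op} tmp, src1, src2", "movcc dst, src-dst, tmp"]


for key in PROFILE:
    print(key, translate(*key))
```

A combination that is absent from the table is treated like an inconclusive one, which keeps the fallback path as the safe default.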

In the dynamic prediction mechanism, a BTAC-like structure, referred to herein as a conditional ALU direction cache (CADC), captures the direction history of previously executed conditional non-branch instructions together with their program counter/instruction pointer values, and predicts the direction of a subsequently fetched conditional non-branch instruction based on the history held in the CADC entry that the fetch address hits. The CADC provides its prediction to the instruction translator, which emits microinstructions according to the prediction in the manner described above for the static predictor.
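A CADC-like structure can be sketched as a table indexed by program counter. The description states only that direction history is kept per entry; the 2-bit saturating counter used here is a common branch-prediction choice adopted as an assumption, and the class and method names are hypothetical.

```python
class CADC:
    """Sketch of a conditional ALU direction cache keyed by program counter.

    Assumption: each entry holds a 2-bit saturating counter (0..3);
    values of 2 or 3 predict that the instruction will be executed.
    """

    def __init__(self):
        self.entries = {}

    def predict(self, pc):
        """Return True/False on a hit, or None on a miss (no history yet)."""
        counter = self.entries.get(pc)
        if counter is None:
            return None
        return counter >= 2

    def update(self, pc, executed):
        """Record the resolved direction of the instruction at pc."""
        if pc not in self.entries:
            self.entries[pc] = 2 if executed else 1   # allocate weakly biased
        elif executed:
            self.entries[pc] = min(3, self.entries[pc] + 1)
        else:
            self.entries[pc] = max(0, self.entries[pc] - 1)


cadc = CADC()
print(cadc.predict(0x1000))      # miss
cadc.update(0x1000, True)
print(cadc.predict(0x1000))      # hit after one executed outcome
```

On a miss the translator has no dynamic prediction to act on and could use the static prediction or the two-microinstruction sequence instead.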

A recovery mechanism flushes from the pipeline the conditional non-branch instruction and all instructions that follow it (more precisely, the microinstructions translated from them), or at least all instructions that depend directly or indirectly on the conditional non-branch instruction, and then replays all of the flushed instructions. When the conditional non-branch instruction is replayed, the translator preferably issues the multiple-microinstruction form.

One embodiment of the present invention uses both the static and the dynamic predictor and keeps a history of which predictor has been more accurate for each program counter/instruction pointer value. Following well-known two-level hybrid branch prediction methods, this history is used to select dynamically between the two predictors to provide the final prediction.

It should be noted that mispredicting a conditional non-branch instruction incurs a penalty (namely, flushing the pipeline and replaying the conditional non-branch instruction and the instructions after it, or at least those that depend on it directly or indirectly). This penalty varies and is a function of the application code and/or data set. Consequently, for some mixes of application code and/or data sets, predicting conditional non-branch instructions may be the lower-performance solution.

A non-branch instruction is defined herein as an instruction that does not write the program counter of the microprocessor, so that the microprocessor fetches and executes the instruction that follows the non-branch instruction. The term program counter is used by the ARM architecture; other architectures use different terms for the equivalent element. For example, the x86 ISA uses the term instruction pointer, and other ISAs use the term instruction address register. A non-branch instruction is distinct from a branch instruction, which writes an address to the program counter/instruction pointer in order to redirect the microprocessor to that address. The microprocessor then begins fetching instructions from the address written to the program counter/instruction pointer by the branch instruction and executes the fetched instructions. This differs from fetching and executing the instruction that sequentially follows the branch instruction, which is the default operation of the microprocessor and is also what occurs when a non-branch instruction is encountered. Examples of conditional non-branch instructions include conditional ALU instructions and conditional load/store instructions.

Referring to FIG. 29, a block diagram is shown of a microprocessor 100 that makes predictions for conditional non-branch instructions according to the present invention. The microprocessor 100 of FIG. 29 is similar to the microprocessor 100 of FIG. 1 and includes elements similar to those of FIG. 1 and FIG. 4, namely the instruction cache 102, the instruction translator 104, the configuration registers 122, the register allocation table 402, the instruction issue unit 408, the execution units 424, and the reorder buffer (ROB) 422. The execution units 424 include one or more units that execute the microinstructions 126 described herein. In addition, the execution units 424 execute no-operation (nop) microinstructions 126. A nop microinstruction 126 instructs the execution unit 424 to perform no operation. More specifically, a nop microinstruction 126 as referred to herein includes a condition, or condition code, specified by the conditional ALU instruction 124 from which the nop microinstruction 126 was translated. The nop microinstruction 126 is described in more detail below. The microprocessor 100 also includes architectural registers, the temporary registers 106, and the flags 926 of FIG. 9.

The microprocessor 100 of FIG. 29 also includes a dynamic predictor 2932, a static predictor 2936, and a predictor selector 2934. These elements are coupled to the instruction translator 104 and are used to predict the direction (executed or not executed) of a conditional ALU instruction 124 (of FIG. 2). The fetch address 134 of FIG. 1 is also provided to the dynamic predictor 2932 and to the predictor selector 2934.

The dynamic predictor 2932 and the predictor selector 2934 each comprise a cache memory having a plurality of entries. Each entry caches the memory address of a previously executed ARM conditional ALU instruction 124. That is, when the microprocessor 100 retires a conditional ALU instruction 124, the dynamic predictor 2932 and the predictor selector 2934 are checked to determine whether they contain an entry holding the address of that conditional ALU instruction 124. If so, the entry is updated according to the correct direction of the conditional ALU instruction 124 as indicated by a history update indication 2974; if not, an entry is allocated for the conditional ALU instruction 124 in each of the dynamic predictor 2932 and the predictor selector 2934. Although the dynamic predictor 2932 and the predictor selector 2934 are shown as separate elements in FIG. 29, in one embodiment the two are integrated into a single cache array. That is, each entry of the single array contains both the direction prediction of the dynamic predictor 2932 and the selector field of the predictor selector 2934, as described further below.

Each entry of the dynamic predictor 2932 stores the address of a conditional ALU instruction 124 and has a field that stores a direction prediction for that instruction. The direction prediction is updated with the correct direction each time the conditional ALU instruction 124 at that address retires. The direction prediction may take various forms. For example, it may be a single bit that indicates executed or not executed: the bit is set to one value if the instruction was executed and to the other value if it was not. As another example, the direction prediction may be a multi-bit counter that is incremented in a saturating fashion when the instruction is executed and decremented in a saturating fashion when it is not. A counter value above the midpoint predicts executed, and a value below the midpoint predicts not executed.
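The multi-bit counter form of the direction prediction can be sketched as follows. This is a minimal illustration only: the class name, the default width of two bits, and the initial value are assumptions made for the example and are not specified in the description.

```python
class SaturatingCounter:
    """n-bit saturating up/down counter holding a direction prediction.

    Hypothetical helper for illustration; the width and the initial
    value are assumptions, not taken from the description.
    """

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        self.midpoint = self.max_value / 2
        # Assumed starting point: just below the midpoint.
        self.value = self.max_value // 2

    def update(self, executed):
        """Move toward the retired direction without wrapping around."""
        if executed:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)

    def predict(self):
        """'E' above the midpoint, 'NE' below it."""
        return "E" if self.value > self.midpoint else "NE"
```

Because the counter saturates, a single atypical outcome after a long run in one direction does not flip the prediction.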

Each time a block of instructions is fetched from the instruction cache 102, the fetch address 134 is provided to the dynamic predictor 2932. The dynamic predictor 2932 looks up the fetch address 134 to determine whether it matches a valid tag in its cache array, that is, whether it hits or misses. If the fetch address 134 misses, the dynamic predictor 2932 outputs on its dynamic prediction output 2982 a value indicating no prediction (NP). If the fetch address 134 hits, the dynamic predictor 2932 outputs on its dynamic prediction output 2982 a value indicating either an executed (E) direction or a not executed (NE) direction, depending on the value of the direction prediction field stored in the matching entry. In one embodiment, the dynamic prediction output 2982 may indicate no prediction (NP) even when the fetch address 134 hits, for example when the history shows that the conditional ALU instruction 124 is almost equally likely to be executed as not executed, that is, the condition is almost equally likely to be satisfied as not. The direction prediction 2982 is provided to the instruction translator 104.
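The lookup and retirement behavior of the dynamic predictor can be sketched as below. This is an illustrative model only: a Python dictionary stands in for the tagged cache array, and the class name, the counter width, and the value chosen when an entry is allocated are assumptions rather than details given in the description.

```python
class DynamicPredictor:
    """Toy model of a direction cache indexed by instruction address."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        self.entries = {}  # address -> saturating counter value

    def predict(self, fetch_address):
        """Return 'NP' on a miss, otherwise 'E' or 'NE' from the entry."""
        counter = self.entries.get(fetch_address)
        if counter is None:
            return "NP"
        return "E" if counter > self.max_value / 2 else "NE"

    def retire(self, address, executed):
        """Update the entry with the correct direction, allocating on a miss."""
        if address not in self.entries:
            # Assumed policy: start weakly biased toward the observed direction.
            self.entries[address] = self.max_value // 2 + (1 if executed else 0)
        elif executed:
            self.entries[address] = min(self.entries[address] + 1, self.max_value)
        else:
            self.entries[address] = max(self.entries[address] - 1, 0)
```

A real cache array has finite capacity and a replacement policy; both are omitted here for brevity.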

Each entry of the predictor selector 2934 also includes a field that stores a selector for the conditional ALU instruction 124 whose address is stored in the entry. The selector indicates whether the dynamic predictor 2932 or the static predictor 2936 is more likely to predict the direction of the conditional ALU instruction 124 correctly. The selector is updated when the conditional ALU instruction 124 at that address retires, based on the correct direction and on information, carried by the history update indication 2974, about whether the prediction was made by the dynamic predictor 2932 or by the static predictor 2936. The selector may take various forms. For example, it may be a single bit that indicates either the dynamic predictor 2932 or the static predictor 2936: the bit is set to one value when the dynamic predictor 2932 predicted the direction correctly and to the other value when the static predictor 2936 predicted correctly, and it retains the previously selected predictor when both predicted correctly. As another example, the selector may be a multi-bit counter that is incremented in a saturating fashion when the dynamic predictor 2932 predicted the direction correctly, decremented in a saturating fashion when the static predictor 2936 predicted correctly, and left unchanged when both predicted correctly. A counter value above the midpoint predicts that the dynamic predictor 2932 will predict the direction correctly, and a value below the midpoint predicts that the static predictor 2936 will predict the direction correctly.
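The multi-bit selector update can be sketched as follows. The function names and the two-bit width are illustrative assumptions. The description does not say what happens when both predictors are wrong; this sketch assumes the counter is left unchanged in that case as well.

```python
def update_selector(counter, dynamic_correct, static_correct, max_value=3):
    """Return the new selector counter value after a retirement.

    Only a disagreement in correctness moves the counter; when both
    predictors were correct (or, by assumption, both wrong) it is unchanged.
    """
    if dynamic_correct and not static_correct:
        return min(counter + 1, max_value)   # favor the dynamic predictor
    if static_correct and not dynamic_correct:
        return max(counter - 1, 0)           # favor the static predictor
    return counter


def selected_predictor(counter, max_value=3):
    """'D' (dynamic) above the midpoint, 'S' (static) below it."""
    return "D" if counter > max_value / 2 else "S"
```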

Each time a block of instructions is fetched from the instruction cache 102, the fetch address 134 is provided to the predictor selector 2934, which looks up the fetch address 134 to determine whether it matches a valid tag in its cache array, that is, whether it hits or misses. If the fetch address 134 misses, the predictor selector 2934 outputs on its prediction selection output 2984 a value indicating no prediction (NP). If the fetch address 134 hits, the predictor selector 2934 outputs on its prediction selection output 2984 a value indicating either the dynamic predictor 2932 (D) or the static predictor 2936 (S), according to the value of the selector field stored in the matching entry. In one embodiment, the prediction selection output 2984 may indicate no prediction even when the fetch address 134 hits, for example when the history shows that neither the dynamic predictor 2932 nor the static predictor 2936 is likely to predict correctly. The prediction selection 2984 is provided to the instruction translator 104.

The static predictor 2936 receives the instructions 124 fetched from the instruction cache 102 and analyzes the condition code of the instruction 124 and/or the particular ALU function it specifies in order to predict the direction of the conditional ALU instruction 124. The static predictor 2936 essentially comprises a lookup table that associates an E, NE, or NP indication with each possible condition code/ALU function combination. In one embodiment, the E, NE, and NP indications are configured in the static predictor 2936 based on empirical data gathered from the execution of programs written for the ARM instruction set architecture. The static prediction 2986 is provided to the instruction translator 104. In one embodiment, the static predictor 2936 is integrated within the instruction translator 104.
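The lookup-table organization of the static predictor can be sketched as below. The table contents are placeholders invented purely for illustration; the description states only that the indications are derived from empirical data and does not give any actual entries. The default of NP for combinations absent from the table is likewise an assumption.

```python
# Placeholder entries only: the real indications would come from profiling
# ARM programs, and none of these values are taken from the description.
STATIC_TABLE = {
    ("EQ", "ADD"): "E",
    ("NE", "SUB"): "NE",
    ("GT", "MOV"): "NP",
}


def static_predict(condition_code, alu_function):
    """Return 'E', 'NE', or 'NP' for a condition code/ALU function pair.

    Combinations not present in the table yield 'NP' (assumed default).
    """
    return STATIC_TABLE.get((condition_code, alu_function), "NP")
```

Because the prediction depends only on the instruction encoding, no per-address state is needed, which is what distinguishes this predictor from the dynamic one.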

The instruction translator 104 uses the predictions 2982/2984/2986 to translate the conditional ALU instruction 124 into microinstructions 126, as described in more detail below with respect to FIGS. 30 and 31. The predictions 2982/2984/2986 are also passed down the pipeline of the microprocessor 100 along with the conditional ALU instruction 124 so that the execution units 424 can use them to determine whether each of the predictors 2932/2934/2936 correctly predicted the direction of the conditional ALU instruction 124. In one embodiment, because the block of instructions fetched from the instruction cache 102 in each clock cycle may contain multiple conditional ALU instructions 124, the dynamic predictor 2932, the predictor selector 2934, and the static predictor 2936 each generate multiple predictions 2982/2984/2986 per clock cycle.

In one embodiment, the microarchitecture of the microprocessor 100 is similar in many respects to that of the VIA Nano™ processor manufactured by VIA Technologies, Inc., of Taiwan, except that the microprocessor 100 of this embodiment is modified to support the ARM instruction set architecture. The microarchitecture of the VIA Nano™ processor is a high-performance, out-of-order execution, superscalar microarchitecture that supports the x86 instruction set architecture. Modified as described herein, the processor additionally supports the ARM instruction set architecture and, in particular, the ARM conditional ALU instruction 124 of FIG. 2, as detailed below.

The register allocation table 402 indicates that a conditional move microinstruction 3046 (described in detail with respect to FIG. 30) depends on the result of an ALU microinstruction 3044 (also described with respect to FIG. 30). Both microinstructions are issued by the instruction translator 104 when it translates a conditional ALU instruction 124 under certain conditions, namely when no prediction is available for the conditional ALU instruction 124, or when the conditional ALU instruction 124 was mispredicted and is being replayed, as described below.

The temporary registers 106 store non-architectural state of the microprocessor 100. The microarchitecture may use the temporary registers 106 to hold, temporarily, the intermediate values needed to execute instructions of the instruction set architecture. More specifically, the microinstructions issued by the instruction translator 104 may specify temporary registers 106 as source and/or destination operand locations. In particular, the ALU microinstruction 3044 of FIG. 30 may specify a temporary register 106 as its destination register, and the associated conditional move microinstruction 3046 specifies the same temporary register 106 as one of its source registers, as described in more detail below.

At least one of the execution units 424 includes an arithmetic logic unit (ALU) (not shown) that executes various microinstructions, including the ALU microinstruction 3044 and the unconditional ALU microinstruction with condition code (CC) 3045 of FIG. 30. In addition, at least one of the execution units 424 executes the conditional move microinstruction 3046 and the nop microinstruction with condition code (CC) 3047 of FIG. 30. For the conditional move microinstruction 3046, the unconditional ALU microinstruction with condition code 3045, and the nop microinstruction with condition code 3047, the execution unit 424 receives as inputs the condition code value a224, a254, or a274 (see FIG. 30) and the current value of the flags 926. The execution unit 424 determines whether the value of the flags 926 satisfies the condition specified by the condition code a224, a254, or a274.
The execution unit 424 thereby determines the correct direction of the conditional ALU instruction 124 and whether the dynamic predictor 2932 and/or the static predictor 2936 mispredicted that direction. The outcome is conveyed in a misprediction indication 2976 that is provided to the reorder buffer (ROB) 422. In addition, the execution unit 424 determines whether the predictor 2932 or 2936 selected by the predictor selector 2934 predicted the direction correctly; this outcome is used to update the dynamic predictor 2932 and the predictor selector 2934. For the conditional move microinstruction 3046 of FIG. 30, if the condition is satisfied, the execution unit 424 moves the value of the temporary register 106 specified by the source register 1 field a226 into the architectural register 106 specified by the destination register field a232 of FIG. 30. If the condition is not satisfied, the execution unit 424 moves the value of the architectural register 106 specified by the source register 2 field a228, which is the original value of the destination register, into the architectural register 106 specified by the destination register field a232.
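The two checks performed at execution can be sketched as follows: comparing the predicted direction with the evaluated condition, and selecting the value written by the conditional move. The function names are hypothetical, and treating a no-prediction (NP) translation as never mispredicted is an assumption consistent with, but not stated in, the text.

```python
def is_mispredicted(prediction, condition_satisfied):
    """Compare a predicted direction ('E', 'NE', 'NP') with the real outcome."""
    if prediction == "E":
        return not condition_satisfied
    if prediction == "NE":
        return condition_satisfied
    return False  # 'NP': nothing was predicted, so nothing can be wrong


def conditional_move(condition_satisfied, src1_temp_value, src2_old_dst_value):
    """Value written to the destination register by the conditional move.

    src1 holds the ALU result from the temporary register; src2 holds the
    destination register's original value.
    """
    return src1_temp_value if condition_satisfied else src2_old_dst_value
```

Writing back the old destination value when the condition fails lets the conditional move always produce a result, so later microinstructions can depend on it unconditionally.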

The reorder buffer 422 receives results from the execution units 424, including an indication of whether the direction of the conditional ALU instruction 124 was mispredicted. If the direction was not mispredicted, the reorder buffer 422 updates the architectural state of the microprocessor 100 with the result produced by performing the ALU operation specified by the opcode a202 of the conditional ALU instruction 124 on the source operands specified by the source register 1 and source register 2 fields a206. That is, it uses the result to update the flags 926 and the architectural register 106 specified by the destination register field a208 of the conditional ALU instruction, which is reflected in the destination register field a232 of the conditional move microinstruction 3046 and in the destination register field a258 of the unconditional ALU microinstruction with condition code 3045 of FIG. 30. If the direction was mispredicted, however, the reorder buffer 422 generates a true value on the misprediction indication 2976. The misprediction indication 2976 is provided to the instruction translator 104 so that, when the mispredicted conditional ALU instruction 124 is replayed, the instruction translator 104 knows that it must revert to the multiple-microinstruction technique according to the no-prediction (NP) regime.
The misprediction indication 2976 is also provided to the other relevant pipeline units, such as the register allocation table 402 and the instruction issue unit 408, so that they can flush microinstructions as necessary. The reorder buffer 422 also generates the history update indication 2974, based on the outcome of the conditional ALU instruction 124 and the result of the direction prediction, to update the dynamic predictor 2932 and the predictor selector 2934.

Referring to FIG. 30, a block diagram is shown illustrating the translation of a conditional ALU instruction 124 by the instruction translator 104 of FIG. 29. As described herein, the instruction translator 104 of FIG. 29 may translate the conditional ALU instruction 124 into any of three different sets of microinstructions, depending on the circumstances under which it translates the conditional ALU instruction 124. As shown in FIG. 30, these are: the conditional ALU instruction 124 is predicted executed (E), predicted not executed (NE), or not predicted (NP). In one embodiment, the conditional ALU instruction 124 is a conditional ALU instruction defined by the ARM instruction set architecture.

The conditional ALU instruction 124 includes an opcode field a202, a condition code field a204, source register 1 and source register 2 fields a206, and a destination register field a208. The opcode field a202 contains a value that distinguishes the conditional ALU instruction from other instructions in the instruction set architecture.

The condition code field a204 specifies a condition under which the destination register is selectively updated with the result of the ALU microinstruction 3044 described below, depending on whether the current value of the flags 926 satisfies the condition. In an embodiment compatible with the ARM instruction set architecture, the condition code field a204 occupies the upper four bits (that is, bits [31:28]) of the conditional ALU instruction 124, which allows sixteen different possible values to be encoded according to Table 3 below. For the architecture-version-dependent value (0b1111), the instruction is unpredictable in one architecture version and is used to indicate the unconditional instruction extension space in other architecture versions.

Table 3.
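The contents of Table 3 are not reproduced in this text. As a hedged illustration of how a four-bit condition code is checked against the flags, the sketch below uses the standard ARM condition encodings (EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL) and the N, Z, C, V flags as defined by the ARM architecture; it is not taken from the table itself. The function name and the choice to raise an error for 0b1111 are assumptions made for the example.

```python
# Standard ARM condition encodings, expressed as predicates over N, Z, C, V.
CONDITIONS = {
    0b0000: lambda n, z, c, v: z,                  # EQ: equal
    0b0001: lambda n, z, c, v: not z,              # NE: not equal
    0b0010: lambda n, z, c, v: c,                  # CS: carry set
    0b0011: lambda n, z, c, v: not c,              # CC: carry clear
    0b0100: lambda n, z, c, v: n,                  # MI: negative
    0b0101: lambda n, z, c, v: not n,              # PL: positive or zero
    0b0110: lambda n, z, c, v: v,                  # VS: overflow
    0b0111: lambda n, z, c, v: not v,              # VC: no overflow
    0b1000: lambda n, z, c, v: c and not z,        # HI: unsigned higher
    0b1001: lambda n, z, c, v: (not c) or z,       # LS: unsigned lower or same
    0b1010: lambda n, z, c, v: n == v,             # GE: signed >=
    0b1011: lambda n, z, c, v: n != v,             # LT: signed <
    0b1100: lambda n, z, c, v: (not z) and n == v, # GT: signed >
    0b1101: lambda n, z, c, v: z or n != v,        # LE: signed <=
    0b1110: lambda n, z, c, v: True,               # AL: always
}


def condition_satisfied(cond, n, z, c, v):
    """Evaluate a 4-bit condition code against boolean N, Z, C, V flags."""
    if cond not in CONDITIONS:
        # 0b1111 is version dependent; this sketch declines to evaluate it.
        raise ValueError("condition code 0b1111 is not handled in this sketch")
    return bool(CONDITIONS[cond](n, z, c, v))
```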

The source register 1 and source register 2 fields a206 specify the immediate values and the architectural registers 106 that hold the input operands on which the ALU operation specified by the opcode a202 (for example, add, subtract, multiply, divide, AND, OR) is performed to produce a result. The result is conditionally loaded into the architectural register 106 specified by the destination register field a208 if the condition is satisfied.

In the no-prediction (NP) case, the instruction translator 104 translates the conditional ALU instruction 124 into an ALU microinstruction 3044 and a conditional move microinstruction 3046 for execution by the execution units 424.

The ALU microinstruction 3044 includes an opcode field a212, source register 1 and source register 2 fields a216, and a destination register field a218. The opcode field a212 contains a value that distinguishes the ALU microinstruction 3044 from other microinstructions in the microinstruction set architecture of the microprocessor 100. The ALU function specified by the opcode a202 of the conditional ALU instruction 124 is conveyed in the opcode field a212 of the ALU microinstruction 3044. The source register 1 and source register 2 fields a216 specify the immediate values and the architectural registers 106 that hold the operands. The ALU operation specified by the opcode a212 is performed on these operands to produce a result, which is loaded into the architectural or temporary register 106 specified by the destination register field a218. In the no-prediction case, when the instruction translator 104 translates the conditional ALU instruction 124, it populates the source register 1 and source register 2 fields a216 of the ALU microinstruction 3044 with the same values as the source register 1 and source register 2 fields a206 of the conditional ALU instruction 124, and it populates the destination register field a218 to specify a temporary register 106 that receives the result of the ALU operation.

The conditional move microinstruction 3046 includes an opcode field a222, a condition code field a224, a source register 1 field a226, a source register 2 field a228, and a destination register field a232. The opcode field a222 contains a value that distinguishes the conditional move microinstruction 3046 from other microinstructions in the microinstruction set architecture of the microprocessor 100. The condition code field a224 specifies a condition so that the move operation is performed selectively, depending on whether the current value of the flags 926 satisfies the condition given by the condition code field a204 of the conditional ALU instruction 124. Indeed, when translating the conditional ALU instruction 124, the instruction translator 104 populates the condition code field a224 of the conditional move microinstruction 3046 with the same value as the condition code field a204 of the conditional ALU instruction 124.
The source register 1 field a226 specifies the architectural or temporary register 106 that supplies the first source operand to the conditional move microinstruction 3046, and the source register 2 field a228 specifies the architectural or temporary register 106 that supplies the second source operand. When the instruction translator 104 translates the conditional ALU instruction 124, it populates the source register 1 field a226 with the same value it places in the destination register field a218 of the ALU microinstruction 3044, and it populates the source register 2 field a228 with the same value as the destination register field a208 of the conditional ALU instruction 124. That is, the source register 2 field a228 causes the conditional move microinstruction 3046 to receive the current value of the destination register so that this value can be written back to the destination register if the condition is not satisfied. The instruction translator 104 populates the destination register field a232 with the same value as the destination register field a208 of the conditional ALU instruction 124. Consequently, the destination register specified by the conditional ALU instruction 124 is loaded either with its own current value, if the condition is not satisfied, or with the value of the temporary register holding the result of the ALU microinstruction 3044, if the condition is satisfied.
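The field population described above, together with the three translation cases of FIG. 30, can be sketched as follows. This is an illustrative model under stated assumptions: the `Uop` tuple, the `translate` function, the operation names, and the temporary register name `"tmp"` are all hypothetical, and the pre-shift and flag-updating variants are ignored.

```python
from collections import namedtuple

# Hypothetical microinstruction record: operation, condition code,
# destination register, and two source operands.
Uop = namedtuple("Uop", ["op", "cond", "dst", "src1", "src2"])


def translate(alu_op, cond, dst, src1, src2, prediction):
    """Return the microinstruction list for a conditional ALU instruction."""
    if prediction == "E":
        # Single unconditional ALU microinstruction; the condition code is
        # carried along only so the execution unit can check the prediction.
        return [Uop(alu_op, cond, dst, src1, src2)]
    if prediction == "NE":
        # Single nop that still carries the condition code for checking.
        return [Uop("nop", cond, None, None, None)]
    # No prediction: the ALU result goes to a temporary register, then a
    # conditional move selects between that result (src1) and the
    # destination register's current value (src2).
    tmp = "tmp"
    return [
        Uop(alu_op, None, tmp, src1, src2),
        Uop("cmov", cond, dst, tmp, dst),
    ]
```

For example, `translate("add", "EQ", "r0", "r1", "r2", "NP")` yields an `add` into `tmp` followed by a `cmov` whose sources are `tmp` and `r0`, mirroring the add/movcc pair shown earlier in the description.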

In one embodiment, in the no-prediction (NP) case, the instruction translator 104 translates the conditional ALU instruction 124 into the microinstructions 126 described with respect to FIGS. 10 through 28. As described there, the set of microinstructions 126 varies with the characteristics of the conditional ALU instruction 124, for example: whether one of the source registers is the destination register, whether the instruction updates the flags, whether it specifies a pre-shift, whether it uses the current carry flag value, and, in the flag-updating pre-shift case, whether the ALU operation updates the carry flag. In particular, for some pre-shifting conditional ALU instructions 124, the microinstruction set includes three microinstructions 126 as shown in FIG. 10, rather than two microinstructions 126 as shown in FIG. 30. Furthermore, when the conditional ALU instruction 124 specifies one of the source registers as the destination register, the microinstruction set contains one fewer microinstruction 126, as can be seen by comparing FIG. 21 with FIG. 10. More specifically, the microinstruction set does not include a conditional move microinstruction 126; instead, the conditional move function is provided by a conditional ALU microinstruction 126.
As a result, in some cases the microinstruction set contains only a single microinstruction 126, as shown in FIG. 21, rather than two microinstructions 126 as shown in FIG. 30. In addition, for a flag-updating conditional ALU instruction 124, the conditional move microinstruction 126 included in the microinstruction set differs slightly from the conditional move microinstruction 126 of FIG. 30. In particular, to determine whether the condition is satisfied, the conditional move (CMOV) microinstruction 126 described in steps 1044, 1054, and 1056 of FIG. 10 examines a non-architectural flag that was updated by a preceding microinstruction 126 in the microinstruction set based on whether the architectural flags satisfy the condition. By contrast, the conditional move microinstruction 126 of FIG. 30 examines the architectural flags to determine whether the condition is satisfied. Finally, although the ALU microinstruction 126 of FIG. 30 is an unconditional ALU microinstruction 126, the ALU microinstructions 126 of FIGS. 10 and 21 may be conditional ALU microinstructions 126 in some cases.

In the predicted-executed (E) case, the instruction translator 104 translates the conditional ALU instruction 124 into an unconditional ALU microinstruction with condition code 3045 for execution by the execution unit 424. The unconditional ALU microinstruction with condition code 3045 includes an opcode field a252, a condition code field a254, source register 1 and source register 2 fields a256, and a destination register field a258. The opcode field a252 contains a value that distinguishes the unconditional ALU microinstruction with condition code 3045 from other microinstructions in the microinstruction set architecture of the microprocessor 100. The ALU function specified by the opcode of the conditional ALU instruction 124 is conveyed in the opcode field a252 of the unconditional ALU microinstruction with condition code 3045. The source register 1 and source register 2 fields a256 specify immediate values and architectural registers 106 that hold the operands on which the ALU operation specified by the opcode a252 is performed to produce a result. The result is loaded into the architectural or temporary register 106 specified by the destination register field a258. In the executed case, the instruction translator 104 fills the source register 1 and source register 2 fields a256 with the same values as the source register 1 and source register 2 fields a206 of the conditional ALU instruction 124. When translating the conditional ALU instruction 124, the instruction translator 104 fills the condition code field a254 with the same value as the condition code field a204 of the conditional ALU instruction 124. The execution unit 424 uses the condition code a254 to determine whether the direction of the associated conditional ALU instruction 124 was mispredicted. When translating the conditional ALU instruction 124, the instruction translator 104 fills the destination register field a258 with the same value as the destination register field a208 of the conditional ALU instruction 124. Thus, because the associated conditional ALU instruction 124 was predicted executed, the unconditional ALU microinstruction with condition code 3045 is an unconditional microinstruction: it is executed regardless of whether the condition is satisfied. However, it is speculative in the same way as a predicted branch instruction, because the executed prediction must still be verified. If a misprediction is found, the architectural register 106 specified by the destination register field a258 is not updated with the ALU result; instead, the microinstruction is flushed and the associated conditional ALU instruction 124 is replayed, this time without prediction. Conversely, if the executed prediction was correct, the architectural register 106 specified by the destination register field a258 is updated with the ALU result. In one embodiment, when the conditional ALU instruction 124 specifies a pre-shift operation as described with respect to FIGS. 10 through 28, the instruction translator 104 emits, in addition to the unconditional ALU microinstruction with condition code 126 of FIG. 30, a shift microinstruction 126 that precedes it. For example, the shift microinstruction 126 is similar to the shift microinstruction described at step 1034 of FIG. 10, and the unconditional ALU microinstruction with condition code 126 of FIG. 30 is modified to specify, as a source operand register, the temporary register that is the destination register of the shift microinstruction 126. In the case of a misprediction, the shift microinstruction 126 is flushed along with the unconditional ALU microinstruction with condition code 126 at step 3134 of FIG. 31 (described below).

In the predicted-not-executed (NE) case, the instruction translator 104 translates the conditional ALU instruction 124 into a no-op microinstruction with condition code 3047 for execution by the execution unit 424. The no-op microinstruction with condition code 3047 includes an opcode field a272 and a condition code field a274. The opcode field a272 contains a value that distinguishes the no-op microinstruction with condition code 3047 from other microinstructions in the microinstruction set architecture of the microprocessor 100. When translating the conditional ALU instruction 124, the instruction translator 104 fills the condition code field a274 with the same value as the condition code field a204 of the conditional ALU instruction 124. The execution unit 424 uses the condition code a274 to determine whether the direction of the associated conditional ALU instruction 124 was mispredicted. The no-op microinstruction with condition code 3047 performs no operation other than causing the execution unit 424 to check the direction prediction of the conditional ALU instruction.

Referring now to FIG. 31 (comprising FIGS. 31A and 31B), a flowchart illustrates an embodiment in which the microprocessor 100 of FIG. 29 executes a conditional ALU instruction 124 of FIG. 30. Flow begins concurrently at steps 3102, 3104 and 3106.

At step 3102, a block of instructions containing the conditional ALU instruction 124 of FIG. 30 is fetched from the instruction cache 102 at the fetch address 134 of FIG. 29. Flow proceeds to step 3108.

At step 3104, the dynamic predictor 2932 looks up the fetch address 134 and provides the dynamic prediction 2982 to the instruction translator 104 of FIG. 29. Flow proceeds to step 3108.

At step 3106, the predictor selector 2934 looks up the fetch address 134 and provides a predictor selection 2984 to the instruction translator 104 of FIG. 29. Flow proceeds to step 3108.

At step 3108, the static predictor 2936 receives the conditional ALU instruction 124, evaluates it, and provides the static prediction 2986 to the instruction translator 104 of FIG. 29. Flow proceeds to step 3112.

At step 3112, the instruction translator 104 encounters the conditional ALU instruction 124 and receives the predictions 2982/2984/2986 from the dynamic predictor 2932, the predictor selector 2934 and the static predictor 2936. Based on these, the instruction translator 104 produces a direction prediction for the conditional ALU instruction 124. Flow proceeds to step 3114.

At decision step 3114, the instruction translator 104 determines whether at step 3112 it predicted that the conditional ALU instruction 124 will be executed. If so, flow proceeds to step 3116; otherwise, flow proceeds to decision step 3118.

At step 3116, in accordance with the executed prediction, the instruction translator 104 emits the unconditional ALU microinstruction with condition code 3045 shown in FIG. 30. Flow proceeds to step 3126.

At decision step 3118, the instruction translator 104 determines whether at step 3112 it predicted that the conditional ALU instruction 124 will not be executed. If so, flow proceeds to step 3122; otherwise, flow proceeds to step 3124.

At step 3122, in accordance with the not-executed prediction, the instruction translator 104 emits the no-op microinstruction with condition code 3047 shown in FIG. 30. Flow proceeds to step 3126.

At step 3124, in the no-prediction case, the instruction translator 104 emits the ALU microinstruction 3044 and the conditional move microinstruction 3046 shown in FIG. 30. Flow proceeds to step 3126.
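The decision made at steps 3114 through 3124 can be summarized as a small dispatch function. The following is a minimal sketch, assuming the prediction is represented by the strings "E", "NE" and "NP"; the function name and the microinstruction labels are illustrative and not taken from the patent.

```python
def translate_conditional_alu(prediction):
    """Return labels for the microinstructions emitted for one conditional ALU instruction.

    prediction is "E" (predicted executed), "NE" (predicted not executed)
    or "NP" (no prediction); these string encodings are an assumption.
    """
    if prediction == "E":
        # Step 3116: a single unconditional ALU microinstruction carrying the condition code.
        return ["ALU_WITH_CC_3045"]
    if prediction == "NE":
        # Step 3122: a single no-op microinstruction carrying the condition code.
        return ["NOP_WITH_CC_3047"]
    if prediction == "NP":
        # Step 3124: the ALU microinstruction followed by the conditional move.
        return ["ALU_3044", "CMOV_3046"]
    raise ValueError(f"unknown prediction: {prediction!r}")
```

The point of the sketch is the count: either predicted case yields one microinstruction, while the no-prediction case yields two.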

At step 3126, the execution unit 424 executes the microinstructions 126 emitted by the instruction translator 104 at step 3116, 3122 or 3124. In the no-prediction case, the execution unit 424 executes the ALU microinstruction 3044 by performing the ALU function specified by the opcode field a212 on the source operands specified by the fields a216 to produce a result. The result is output on the result bus 128 and written to the reorder buffer entry allocated to the ALU microinstruction 3044, to be written later to the temporary register 106 specified by the field a218. Once the result of the ALU microinstruction 3044 is available, the conditional move microinstruction 3046 can be issued to the execution unit 424, which determines whether the flags 926 satisfy the condition specified by the condition code 244. If so, the result of the ALU microinstruction 3044 (from either the forwarding bus or the temporary register 106) is output on the result bus 128 and written to the reorder buffer entry allocated to the conditional move microinstruction 3046, to be written later to the architectural register 106 specified by the field a232. If the condition is not satisfied, however, the original value of the architectural register 106 specified by the source register 2 field a228, that is, the architectural register specified by the destination register field a208 of the conditional ALU instruction 124, is output on the result bus and written to the reorder buffer entry allocated to the conditional move microinstruction 3046, to be written later to the architectural register 106 specified by the field a232. The execution unit 424 also indicates a correct prediction to the reorder buffer 422, because the instruction translator 104 generated the ALU microinstruction 3044 and the conditional move microinstruction 3046 in response to no prediction; since no prediction was made, there can never be a misprediction.

In the predicted-executed case, the execution unit 424 executes the unconditional ALU microinstruction with condition code 3045 by performing the ALU function specified by the opcode field a252 on the source operands specified by the fields a256 to produce a result. The result is output on the result bus 128 and written to the reorder buffer entry allocated to the unconditional ALU microinstruction with condition code 3045, to be written later to the architectural register 106 specified by the field a258. The execution unit 424 also determines whether the flags 926 satisfy the condition specified by the condition code a254 and provides an indication to the reorder buffer 422 accordingly. More specifically, because the instruction translator 104 generated the unconditional ALU microinstruction with condition code 3045 in response to an executed prediction, the execution unit 424 indicates a misprediction to the reorder buffer 422 only if the flags 926 do not satisfy the condition specified by the condition code a254; otherwise, it indicates a correct prediction.

In the predicted-not-executed case, the execution unit 424 performs no operation in response to the no-op microinstruction with condition code 3047. The execution unit 424 does, however, determine whether the flags 926 satisfy the condition specified by the condition code a274 and provides an indication to the reorder buffer 422 accordingly. More specifically, because the instruction translator 104 generated the no-op microinstruction with condition code 3047 in response to a not-executed prediction, the execution unit 424 indicates a misprediction to the reorder buffer 422 only if the flags 926 satisfy the condition specified by the condition code a274; otherwise, it indicates a correct prediction. Flow proceeds to decision step 3128.
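The misprediction rule applied by the execution unit in the three cases of step 3126 reduces to a short truth table. The sketch below assumes the same "E"/"NE"/"NP" string encoding of the prediction and a boolean for whether the flags satisfy the condition; the function name is hypothetical.

```python
def is_mispredicted(prediction, condition_met):
    """Return True when the direction prediction turns out to be wrong.

    prediction: "E", "NE" or "NP" (assumed encoding).
    condition_met: True if the flags satisfy the instruction's condition code.
    """
    if prediction == "E":
        # Predicted executed: wrong only if the condition is not satisfied.
        return not condition_met
    if prediction == "NE":
        # Predicted not executed: wrong only if the condition is satisfied.
        return condition_met
    if prediction == "NP":
        # No prediction was made, so there is nothing to mispredict.
        return False
    raise ValueError(f"unknown prediction: {prediction!r}")
```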

At decision step 3128, the reorder buffer 422 determines, based on the misprediction indicator 2976 received from the execution unit 424, whether the direction of the conditional ALU instruction 124 was mispredicted. If so, flow proceeds to step 3134; otherwise, flow proceeds to step 3132.

At step 3132, the reorder buffer 422 updates the architectural state of the microprocessor 100, namely the architectural registers 106 and the flags 926, with the result of the conditional ALU instruction 124. More specifically, because the reorder buffer 422 must retire instructions in program order, it updates the architectural state when the conditional move microinstruction 3046 (in the no-prediction case), the unconditional ALU microinstruction with condition code 3045 (in the predicted-executed case), or the no-op microinstruction with condition code 3047 (in the predicted-not-executed case) becomes the oldest microinstruction in the microprocessor 100. Flow proceeds to step 3136.

At step 3134, the reorder buffer 422 generates a true value on the misprediction indicator 2976, which causes the microinstructions translated from the conditional ALU instruction 124, and all microinstructions associated with them, to be flushed. The true value on the misprediction indicator 2976 also causes the conditional ALU instruction 124 to be replayed. That is, the instruction translator 104 translates the conditional ALU instruction 124 again, this time under the no-prediction regime of step 3124. In an alternative embodiment, when the conditional ALU instruction 124 is replayed, the instruction translator 104 inverts the incorrect prediction and translates according to the inverted prediction. That is, if the executed prediction was the misprediction, the instruction translator 104 translates according to the not-executed prediction; if the not-executed prediction was the misprediction, the instruction translator 104 translates according to the executed prediction. Note, however, that this alternative embodiment is prone to livelock.
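The flush-and-replay path of steps 3128 through 3134 can be modeled as a loop that retranslates under no prediction after a misprediction. This is a sketch only, assuming the "E"/"NE"/"NP" encoding and a condition outcome that does not change between the first attempt and the replay; `execute_with_replay` is a hypothetical name, and the alternative inverted-prediction embodiment is not modeled.

```python
def execute_with_replay(prediction, condition_met, old_dest, alu_result):
    """Return (value retired to the destination register, list of translation attempts)."""
    attempts = []
    while True:
        attempts.append(prediction)
        mispredicted = (prediction == "E" and not condition_met) or (
            prediction == "NE" and condition_met
        )
        if not mispredicted:
            break
        # Step 3134: flush the speculative microinstructions and replay the
        # instruction, this time translated under the no-prediction regime.
        prediction = "NP"
    # Only the attempt that was not mispredicted retires, so the architectural
    # result depends solely on the actual condition outcome.
    retired = alu_result if condition_met else old_dest
    return retired, attempts
```

Because a replay under "NP" can never itself mispredict, the loop in this model runs at most twice.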

At step 3136, the reorder buffer 422 provides appropriate values on the history update indicator 2974 to the dynamic predictor 2932 and the predictor selector 2934, which are updated based on the correct direction evaluated by the execution unit 424 and the prediction information 2982/2984/2986 that flowed down the pipeline.

As may be observed from the foregoing, when the direction can be predicted, the microprocessor 100 translates a conditional ALU instruction 124 into a single microinstruction rather than multiple microinstructions, which can provide significant advantages.

First, it eliminates one or more microinstructions that would otherwise occupy additional instruction slots in the resources of the out-of-order execution microprocessor 100. These resources include the register allocation table 402, the reorder buffer, the reservation stations (not shown) and the execution units 424. Consequently, the resources required may be reduced and simplified, and the power they consume may also be reduced.

Second, the average number of instruction set architecture program instructions (for example, ARM instructions) that the instruction translator 104 can translate per clock cycle may increase. Assume that the instruction translator 104 can translate up to three ARM instructions per clock cycle but can emit at most three microinstructions per clock cycle, with the further restriction that it must emit all the microinstructions associated with an ARM instruction in the same clock cycle. That is, the instruction translator 104 cannot emit one microinstruction associated with an ARM instruction in a first clock cycle and a second microinstruction associated with the same ARM instruction in the next clock cycle. Assume the following ARM instruction sequence, in which CAI is a conditional ALU instruction 124 and the "Rx" values are general-purpose registers:

CAI EQ R1, R2, R3

CAI NE R4, R5, R6

CAI CS R7, R8, R9

In a processor that does not have the predictors 2932/2934/2936 (or that has them but makes no prediction), the instruction translator 104 must take three clock cycles to translate the three CAI instructions. In a processor in which the predictors 2932/2934/2936 make predictions, however, the instruction translator 104 can translate all three CAI instructions in the same clock cycle. Furthermore, the advantage also holds when the CAI instructions are mixed with non-CAI instructions, that is, other ARM instructions. For example, assume that an ARM instruction D, which is translated into two microinstructions, is followed by a CAI instruction whose direction is predicted by the predictors 2932/2934/2936, which is followed by an ARM instruction E, which is translated into two microinstructions, which is followed by an ARM instruction F, which is translated into a single microinstruction. In this case, the instruction translator 104 can translate ARM instruction D and the CAI instruction in the same clock cycle and then translate ARM instructions E and F in the next clock cycle, that is, four ARM instructions in two clock cycles. By contrast, without the capability provided by this embodiment, the instruction translator 104 would need three clock cycles to translate the four instructions. Similar advantages may be realized in the instruction issue unit 408 and the reorder buffer 422.
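The cycle counts in these examples follow from a greedy, in-order packing of instructions under the two stated limits. The sketch below reproduces the arithmetic; the function `translation_cycles` and its parameters are illustrative assumptions, not a description of the actual translator hardware.

```python
def translation_cycles(uop_counts, max_instrs=3, max_uops=3):
    """Count clock cycles needed to translate instructions in program order.

    uop_counts[i] is the number of microinstructions that instruction i
    translates into. Each cycle accepts at most max_instrs instructions and
    emits at most max_uops microinstructions, and the microinstructions of
    one instruction may not be split across cycles.
    """
    cycles = 0
    i = 0
    while i < len(uop_counts):
        instrs = uops = 0
        # Pack as many consecutive instructions as fit into this cycle.
        while (i < len(uop_counts) and instrs < max_instrs
               and uops + uop_counts[i] <= max_uops):
            uops += uop_counts[i]
            instrs += 1
            i += 1
        if instrs == 0:
            raise ValueError("instruction exceeds the per-cycle microinstruction limit")
        cycles += 1
    return cycles
```

For the three CAI instructions, `translation_cycles([2, 2, 2])` gives 3 cycles without prediction and `translation_cycles([1, 1, 1])` gives 1 with prediction. For the D, CAI, E, F sequence, `[2, 1, 2, 1]` gives 2 cycles with prediction and `[2, 2, 2, 1]` gives 3 without.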

Third, when the predictors 2932, 2934 and 2936 predict the direction so that the instruction translator 104 needs to emit only a single microinstruction, the latency of the conditional ALU instruction 124 may be reduced.

Fourth, the absence of the extra microinstruction in the reorder buffer and reservation stations may increase the lookahead capability of the microprocessor, and thereby its ability to exploit the instruction-level parallelism of the program being executed, which may improve the utilization of the execution units 424 and so increase the throughput of the microprocessor 100. More specifically, omitting the second microinstruction leaves more room in the reorder buffer for other microinstructions. The advantage is a larger pool of microinstructions from which to issue microinstructions to the execution units 424. A microinstruction cannot be issued for execution until it is ready, that is, until all of its source operands from preceding microinstructions are available. Therefore, the larger the pool in which the microprocessor 100 looks for ready microinstructions, the more likely it is to find one, and the more likely the execution units 424 are to be utilized. This is commonly referred to as the lookahead capability of a microprocessor, that is, its ability to exploit the instruction-level parallelism of the program it is executing. Greater lookahead capability generally improves the utilization of the execution units 424. Therefore, the microprocessor 100 has the potential to increase its lookahead capability by translating the conditional ALU instruction 124 into a single microinstruction rather than multiple microinstructions.

Although the foregoing embodiments describe a microarchitecture that supports the x86 instruction set architecture in addition to the conditional ALU instructions of the ARM instruction set architecture, the invention may also be applied to embodiments that support conditional ALU instructions of instruction set architectures other than ARM. It may also be applied where there is no pre-existing microarchitecture, or where the instruction set architecture supported by the pre-existing microarchitecture is not the x86 instruction set architecture. Furthermore, what is described here is a broad processor concept: supporting the conditional ALU instructions of an instruction set architecture by predicting the direction of a conditional ALU instruction early in the pipeline, before the instruction is executed. In one embodiment this resembles branch prediction, except that the prediction, or its absence, determines which microinstruction sequence is emitted for the fetched instruction stream. In addition, although the embodiments described herein include both a dynamic predictor and a static predictor, the invention may also be applied to embodiments having only a static predictor or only a dynamic predictor. It may also be applied to embodiments having multiple dynamic and/or static predictors, in which the predictor selector selects among the multiple dynamic and static predictors. Further still, it may be applied to embodiments in which the dynamic predictor is integrated into a branch prediction array, such as a branch target address cache. A disadvantage of such an embodiment is that the space in each entry for storing the target address of a branch instruction is wasted for conditional ALU instructions, whose target address does not need to be predicted. Depending on the instruction mix of the program, branch instructions and conditional ALU instructions may also interfere with one another. Nevertheless, such an embodiment may have the advantage that the storage of the integrated cache is used more efficiently, and the integrated array may have more entries than the sum of the entries of the separate arrays.

Although the foregoing embodiments are directed to conditional non-branch instructions that are conditional ALU instructions, the predictors may also be used to predict other types of conditional non-branch instructions. For example, a conditional load instruction may be predicted. If it is predicted executed, the instruction translator generates an unconditional load microinstruction with condition code. The unconditional load microinstruction with condition code includes the condition specified by the conditional load instruction, which enables the execution pipeline to detect a misprediction. If the execution pipeline detects a misprediction, it refrains from performing any action that updates architectural state, such as a page table walk that updates memory when the load causes a translation lookaside buffer (TLB) miss, or the generation of an architectural exception when the load causes an exception condition. In addition, if the load misses in the cache, the execution pipeline refrains from generating a transaction on the processor bus to fill the missing cache line. If no prediction is made, the instruction translator generates a set of microinstructions that perform the load operation conditionally. In one embodiment, in the no-prediction case the set of microinstructions is similar to that described in U.S. Provisional Patent Application 61/473,062.

Although the above embodiments relate to conditional non-branch instructions of the ARM ISA, the predictors may also be used to predict conditional non-branch instructions of other ISAs. For example, conditional non-branch instructions of the x86 ISA, such as CMOVcc and SETcc, may be predicted.

Modified immediate value applied in instruction translation

The ARM instruction set architecture defines a set of data processing instructions that allow the instruction to specify an immediate source operand; such an instruction is referred to herein as an "immediate operand instruction". The immediate source operand is a 32-bit value produced by rotating an 8-bit value to the right by twice a 4-bit value. The 8-bit value is specified in the field of the instruction denoted immed_8, and the 4-bit value is specified in the field of the instruction denoted rotate_imm. Therefore:

Immediate operand value = immed_8 >> (2 * rotate_imm), where >> here denotes a 32-bit right rotation
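By way of illustration only, the rotation that produces the immediate operand value can be sketched in a few lines of Python. The function names `ror32` and `immediate_operand` are hypothetical and are not part of the described microprocessor; the sketch merely reproduces the arithmetic stated above.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    # Bits shifted out on the right re-enter on the left.
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def immediate_operand(immed_8: int, rotate_imm: int) -> int:
    """Return the 32-bit immediate: immed_8 rotated right by 2 * rotate_imm."""
    if not 0 <= immed_8 <= 0xFF:
        raise ValueError("immed_8 must fit in 8 bits")
    if not 0 <= rotate_imm <= 0xF:
        raise ValueError("rotate_imm must fit in 4 bits")
    return ror32(immed_8, 2 * rotate_imm)
```

For instance, an immed_8 of 0xFF with a rotate_imm of 4 is rotated right by 8 bits, which places the eight set bits in the most significant byte of the 32-bit result.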

One way to handle immediate operand instructions within an existing microarchitecture is to have the instruction translator generate two microinstructions. The first microinstruction rotates the immed_8 value by twice the rotate_imm value to produce a result, and the second microinstruction receives the result of the first as a source operand for the ALU function specified by the immediate operand instruction. See FIGS. 10 and 21 for such an embodiment. For example, in step 1034 of FIG. 10, the instruction translator generates an SHF microinstruction that performs a shift operation (in this case, a rotate operation) and writes the shifted result to a temporary register, and the subsequent ALUOP microinstruction then uses the shifted result that the SHF microinstruction placed in the temporary register. The shift operation may be performed on an immediate value specified in an immediate operand instruction (corresponding, for example, to steps 1012 and 1024 of FIG. 10). However, compared with having the instruction translator generate only a single microinstruction, this approach may have the following disadvantages when applied to an out-of-order execution processor.

First, the additional microinstruction occupies an additional slot or entry in each of the resources of the out-of-order execution processor, such as the register allocation table, the reorder buffer, the reservation stations, and the execution units, so that larger and more complex resources are required and power consumption is higher.

Second, some functional units are limited in the maximum number of instructions they can process per clock cycle. For example, according to one embodiment, the instruction translator has a maximum number of microinstructions it can emit per clock cycle (for example, three microinstructions per clock cycle), the issue unit has a maximum number of instructions it can issue to the execution units per clock cycle (for example, four microinstructions per clock cycle), and the retire unit has a maximum number of instructions it can retire per clock cycle (for example, three microinstructions per clock cycle). The additional microinstruction therefore reduces the average number of instructions that these functional units can emit, issue, or retire per clock cycle, which limits the performance of the processor.
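By way of illustration only, the effect of the additional microinstruction on a width-limited stage can be estimated with simple arithmetic. The sketch below assumes, which the description does not state, that every other instruction translates into exactly one microinstruction and that the stage in question is the only bottleneck; the function name and its parameters are hypothetical.

```python
def avg_instructions_per_cycle(width: int, two_uop_fraction: float) -> float:
    """Estimate instructions per cycle through a stage that handles
    `width` microinstructions per cycle, when `two_uop_fraction` of the
    instructions need two microinstructions and the rest need one
    (an assumption made for this sketch only)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0.0 <= two_uop_fraction <= 1.0:
        raise ValueError("two_uop_fraction must be between 0 and 1")
    avg_uops_per_instruction = 1.0 + two_uop_fraction
    return width / avg_uops_per_instruction
```

Under these assumptions, a stage that handles three microinstructions per clock cycle, fed a stream in which one quarter of the instructions need two microinstructions, sustains 3 / 1.25 = 2.4 instructions per clock cycle rather than 3.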

Third, the immediate operand instruction cannot retire until all of its constituent microinstructions have completed execution. Because the second microinstruction depends on the result of the first, the second microinstruction cannot be issued to an execution unit until the first has produced its result. This adds latency to the total execution time of the immediate operand instruction.

Fourth, the presence of additional microinstructions in the reorder buffer and/or the reservation stations reduces the look-ahead capability of the processor. This reduces the ability of the processor to exploit instruction-level parallelism in the program being executed, which lowers execution unit utilization and the overall performance of the processor.

The embodiments described herein have the potential to perform better when executing immediate operand instructions. The immed_8 field and the rotate_imm field are together referred to herein as the "immediate field." In particular, the instruction translator knows a predetermined subset of the immediate field values, together with the 32-bit immediate operand value that each of those immediate field values produces. When the instruction translator encounters an immediate operand instruction, it determines whether the specified immediate field value falls within the predetermined subset. If so, the instruction translator emits the correct 32-bit immediate operand on an immediate operand bus, and the operand travels down the pipeline along with the immediate operand instruction for execution. If the immediate field value does not fall within the predetermined subset, the instruction translator falls back to the lower-performance approach of emitting two microinstructions. The predetermined subset may be chosen by running application software, observing the relative frequency with which the different immediate field values occur, and selecting a small number of the most frequently observed values, so that the size, power consumption, and complexity of the instruction translator remain within bounds.
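By way of illustration only, the selection of the predetermined subset from observed immediate field values, and the lookup the instruction translator performs, might be sketched as follows. The function names, the use of a dictionary, and the trace format are hypothetical; the 12-bit field layout (rotate_imm in bits 11:8, immed_8 in bits 7:0) follows the ARM data processing immediate encoding.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def expand_immediate_field(field: int) -> int:
    """Expand a 12-bit immediate field into its 32-bit operand value."""
    immed_8 = field & 0xFF
    rotate_imm = (field >> 8) & 0xF
    return ror32(immed_8, 2 * rotate_imm)


def build_predetermined_subset(observed: Iterable[int], size: int) -> Dict[int, int]:
    """Keep the `size` most frequently observed immediate field values,
    each mapped to its precomputed 32-bit operand value."""
    counts = Counter(observed)
    return {field: expand_immediate_field(field)
            for field, _ in counts.most_common(size)}


def lookup(subset: Dict[int, int], field: int) -> Optional[int]:
    """Return the precomputed operand, or None if the two-microinstruction
    fallback is needed."""
    return subset.get(field)
```

A small subset keeps the lookup structure compact, which is the trade-off the paragraph above describes: values outside the subset are still handled correctly, only less efficiently.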

Referring now to FIG. 32, a block diagram illustrates a microprocessor 100 according to the present invention that handles modified immediate constants during instruction translation. The microprocessor 100 of FIG. 32 is similar to the microprocessor of FIG. 1 and includes elements similar to those shown in FIGS. 1 through 4, including the instruction cache 102, the instruction translator 104, the configuration registers 122, the register allocation table 402, the instruction issue unit 408, and the execution units 424. The execution units 424 include one or more units that execute the microinstructions 126 described below. More specifically, the execution units 424 include one or more units that execute the rotate right (ROR) microinstruction 3344 (also referred to herein as a shift microinstruction), the ALU microinstruction 3346, and the immediate ALU microinstruction 3348 shown in FIG. 33. The microprocessor 100 also includes the architectural and temporary registers 106 and the flags 926 shown in FIG. 33. The instruction cache 102 fetches the immediate operand instruction 124 shown in FIG. 33.

In one embodiment, the microarchitecture of the microprocessor 100 is similar in many respects to the microarchitecture of the VIA Nano™ processor manufactured by VIA Technologies, Inc. of Taiwan, except that the microprocessor 100 of this embodiment is modified to support the ARM instruction set architecture. The microarchitecture of the VIA Nano™ processor is a high-performance, out-of-order execution, superscalar microarchitecture that supports the x86 instruction set architecture. It is modified as described herein to additionally support the ARM instruction set architecture, and in particular to support the ARM immediate operand instruction 124 described in detail with respect to FIG. 33. More specifically, when the instruction translator 104 encounters an immediate operand instruction 124 whose immediate field b207 (see FIG. 33) specifies a value that falls within a predetermined subset of values known to the instruction translator 104, it responds by emitting an immediate operand 3366 on an immediate operand bus. The immediate operand 3366 is passed down the stages of the microprocessor 100 pipeline until it reaches the execution units 424.

The register allocation table 402 receives the microinstructions 164 from the instruction translator 104 and generates dependency information for each of them. More specifically, the register allocation table 402 indicates that the ALU microinstruction 3346 (see FIG. 33) depends on the result of the ROR microinstruction 3344 (see FIG. 33). The instruction translator 104 emits these two microinstructions together when it translates an immediate operand instruction whose immediate field b207 value does not fall within the predetermined subset of immediate field b207 values. In addition, as shown in FIG. 34 (comprising FIGS. 34A and 34B), when the instruction translator 104 additionally emits a conditional move microinstruction 126 (for example, as described with respect to FIG. 10), the register allocation table 402 indicates that the conditional move microinstruction 126 depends on the result of the ALU microinstruction 3346.

The temporary registers 106 store non-architectural state of the microprocessor 100 and are available to the microarchitecture for temporarily holding intermediate values needed to execute the instructions 124 of the instruction set architecture. More specifically, the microinstructions 126 emitted by the instruction translator 104 may specify the temporary registers 106 as source and/or destination operand locations. The ROR microinstruction 3344 of FIG. 33 specifies a temporary register 106 as its destination register, and the ALU microinstruction 3346 specifies the same temporary register 106 as one of its source registers, as described in more detail below.

At least one of the execution units 424 includes an arithmetic logic unit (not shown) for executing various microinstructions, including the ROR microinstruction 3344, the ALU microinstruction 3346, and the immediate ALU microinstruction 3348 shown in FIG. 33. In the case of the immediate ALU microinstruction 3348, the execution unit 424 receives the value of the immediate operand 3366 from the instruction translator 104 as an input. The execution unit 424 performs the ALU function specified by the opcode field b212, which is the same ALU function specified by the immediate operand instruction 124, on the immediate operand 3366 and a second source operand. In the case of the ALU microinstruction 3346, the execution unit 424 performs the ALU function specified by the opcode field b232, which is the same ALU function specified by the immediate operand instruction 124, on two source operands, one of which comes from the temporary register 106 to which the associated ROR microinstruction 3344 wrote its result. In the case of the ROR microinstruction 3344, the execution unit 424 rotates an 8-bit value to the right by twice a 4-bit value to produce a 32-bit immediate value, which it writes to a temporary register 106 for use by the subsequent dependent ALU microinstruction 3346. The 8-bit value is the value specified by the immed_8 field b208 of the immediate operand instruction 124, and the 4-bit value is the value specified by the rotate_imm field b209 of the immediate operand instruction 124.

Referring now to FIG. 33, a block diagram illustrates an embodiment in which an immediate operand instruction 124 is selectively translated either into a ROR microinstruction 3344 and an ALU microinstruction 3346, or into an immediate ALU microinstruction 3348. As described herein, when the value specified by the immediate field b207 falls within the predetermined subset known to the instruction translator 104, the instruction translator 104 translates the immediate operand instruction 124 into an immediate ALU microinstruction 3348 for execution by the execution units 424, and emits the corresponding precomputed immediate operand value 3366. As shown in FIG. 32, when the value specified by the immediate field b207 does not fall within the predetermined subset, the instruction translator 104 translates the immediate operand instruction 124 into a ROR microinstruction 3344 followed by an ALU microinstruction 3346 for execution by the execution units 424. In one embodiment, the immediate operand instruction 124 is an immediate operand instruction defined by the ARM instruction set architecture, that is, in ARM terminology, an instruction with a data processing immediate encoding.

The immediate operand instruction 124 includes an opcode field b202, a source register 1 field b204, a destination register field b206, an immed_8 field b208, and a rotate_imm field b209. As shown in FIG. 33, the immed_8 field b208 and the rotate_imm field b209 together constitute the immediate field b207. The opcode field b202 holds a value that distinguishes the immediate operand instruction 124 from other instructions of the instruction set architecture and that specifies an ALU function to be performed on the source operands. For an ARM immediate operand instruction 124, the ALU functions may include, for example, add (ADD), add with carry (ADC), logical AND (AND), logical bit clear (BIC), compare negative (CMN), compare (CMP), logical exclusive-OR (EOR), move (MOV), move not (MVN), logical OR (ORR), reverse subtract (RSB), reverse subtract with carry (RSC), subtract with carry (SBC), subtract (SUB), test equivalence (TEQ), and test (TST). The source register 1 field b204 specifies an architectural register 106 or a temporary register 106 from which the execution unit 424 receives a source operand. The destination register field b206 specifies an architectural register 106 or a temporary register 106 to which the result is written. The immed_8 field b208 holds an 8-bit constant that is rotated to the right by twice the value of the 4-bit rotate_imm field b209 to produce an immediate source operand. As described above with respect to the embodiments of FIGS. 9 through 28, the immediate operand instruction 124 may be a conditional ALU instruction. For example, the immediate operand instruction 124 may be an ARM NCUALUOP instruction 124, as described with respect to step 1056, that specifies a modified immediate constant rather than a register as its source operand.

The ROR microinstruction 3344 includes an opcode field b222, a destination register field b226, and two source operand fields, labeled in FIG. 33 as the immed_8 field b228 and the rotate_imm field b229, which are used to implement the immediate operand instruction 124. The opcode field b222 holds a value that distinguishes the ROR microinstruction 3344 from other microinstructions of the microinstruction set architecture of the microprocessor 100. The destination register field b226 specifies an architectural register 106 or a temporary register 106 to which the result of the ROR microinstruction 3344 is written. When the instruction translator 104 translates an immediate operand instruction 124 whose immediate field b207 specifies a value that does not fall within the predetermined subset, the instruction translator 104 populates the immed_8 field b228 and the rotate_imm field b229 with the corresponding values of the immed_8 field b208 and the rotate_imm field b209 of the immediate operand instruction, and populates the destination register field b226 to specify a temporary register 106 that receives the result of the rotate operation and that is subsequently used by the ALU microinstruction 3346 as a source operand. In addition, the ROR microinstruction 3344 may be a shift microinstruction 126 (designated SHF beginning with FIG. 10) that specifies a modified immediate constant, as described in more detail with respect to FIGS. 10 and 11. For example, if the immediate operand instruction 124 being translated is the ARM NCUALUOP instruction 124 of step 1056 that specifies a modified immediate constant, the ROR microinstruction 3344 may be the SHF microinstruction 126 of step 1056.

The ALU microinstruction 3346 includes an opcode field b232, a source register 1 field b234, a source register 2 field b235, and a destination register field b236. The opcode field b232 holds a value that distinguishes the ALU microinstruction 3346 from other microinstructions of the microinstruction set architecture of the microprocessor 100 and that specifies an ALU function to be performed on the source operands, namely the same ALU function as that of the immediate operand instruction 124 from which it was translated. The source register 1 field b234 specifies an architectural register 106 or a temporary register 106 from which the first source operand is provided to the ALU microinstruction 3346. The source register 2 field b235 specifies an architectural register 106 or a temporary register 106 from which the second source operand is provided to the ALU microinstruction 3346. The destination register field b236 specifies an architectural register 106 or a temporary register 106 to which the result of the ALU microinstruction 3346 is written. When the instruction translator 104 translates an immediate operand instruction 124 whose immediate field b207 specifies a value that does not fall within the predetermined subset, the instruction translator 104 populates the source register 1 field b234 to specify the same register as the source register 1 field b204 of the immediate operand instruction 124, populates the destination register field b236 to specify the same register as the destination register field b206 of the immediate operand instruction 124, and populates the source register 2 field b235 to specify the same temporary register 106 as the destination register field b226 of the ROR microinstruction 3344. As noted above, the ALU microinstruction 3346 may be any of the ALU operation microinstructions 126, designated ALUOP, ALUOPUC, CALUOP, and NCALUOP, including the conditional versions described in detail with respect to FIGS. 10 and 12. For example, if the immediate operand instruction 124 being translated is the ARM NCUALUOP instruction 124 of step 1056, and the modified immediate constant it specifies does not fall within the predetermined subset, the ALU microinstruction 3346 may be the NCUALUOP microinstruction 126 of step 1056.

The immediate ALU microinstruction 3348 includes an opcode field b212, a source register 1 field b214, a destination register field b216, and an immediate-32 field b218. In one embodiment, the immediate-32 field b218 is the immediate operand 3366 received by the execution unit 424 that executes the immediate ALU microinstruction 3348. That is, an operand multiplexer (not shown) operates to select the immediate operand 3366 for provision to the execution unit 424 that receives the immediate ALU microinstruction 3348. The opcode field b212 holds a value that distinguishes the immediate ALU microinstruction 3348 from other microinstructions of the microinstruction set architecture of the microprocessor 100 and that specifies an ALU function to be performed on the source operands, namely the same ALU function as that of the immediate operand instruction 124 from which it was translated. The source register 1 field b214 specifies an architectural register 106 or a temporary register 106 from which a first source operand is provided to the immediate ALU microinstruction 3348, and the destination register field b216 specifies an architectural register 106 or a temporary register 106 to which the result of the immediate ALU microinstruction 3348 is written. When the instruction translator 104 translates an immediate operand instruction 124 whose immediate field b207 specifies a value that falls within the predetermined subset, the instruction translator 104 populates the source register 1 field b214 to specify the same register as the source register 1 field b204 of the immediate operand instruction 124, and populates the destination register field b216 to specify the same register as the destination register field b206 of the immediate operand instruction 124. As noted above, the immediate ALU microinstruction 3348 may be any of the ALU operation microinstructions 126, designated ALUOP, ALUOPUC, CALUOP, and NCALUOP, including the conditional versions described in detail with respect to FIGS. 10 and 12, that specifies an immediate source operand. For example, if the immediate operand instruction 124 being translated is the ARM NCUALUOP instruction 124 of step 1056, and the modified immediate constant it specifies falls within the predetermined subset, the immediate ALU microinstruction 3348 may be the NCUALUOP microinstruction 126 of step 1056, and the instruction translator 104 does not emit the SHF microinstruction 126 of step 1056, thereby providing the advantages described above of having the instruction translator 104 handle the modified immediate constant.
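By way of illustration only, the way the instruction translator populates the fields of the three microinstruction formats can be sketched with simple record types. The class names, the register names such as "T0", and the dictionary that stands in for the predetermined subset are hypothetical; the field numerals in the comments refer to FIG. 33.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class ImmediateOperandInstruction:      # instruction 124
    opcode: str        # b202
    src_reg1: str      # b204
    dst_reg: str       # b206
    immed_8: int       # b208
    rotate_imm: int    # b209

    @property
    def immediate_field(self) -> int:   # b207: rotate_imm above immed_8
        return (self.rotate_imm << 8) | self.immed_8


@dataclass(frozen=True)
class RorMicroinstruction:              # 3344
    dst_reg: str       # b226
    immed_8: int       # b228
    rotate_imm: int    # b229


@dataclass(frozen=True)
class AluMicroinstruction:              # 3346
    opcode: str        # b232
    src_reg1: str      # b234
    src_reg2: str      # b235
    dst_reg: str       # b236


@dataclass(frozen=True)
class ImmediateAluMicroinstruction:     # 3348
    opcode: str        # b212
    src_reg1: str      # b214
    dst_reg: str       # b216
    immediate_32: int  # b218

Microinstruction = Union[RorMicroinstruction, AluMicroinstruction,
                         ImmediateAluMicroinstruction]
TEMP_REG = "T0"  # hypothetical name for a temporary register 106


def translate(insn: ImmediateOperandInstruction,
              subset: Dict[int, int]) -> List[Microinstruction]:
    """Emit one microinstruction if the immediate field is in the
    predetermined subset, otherwise a dependent ROR/ALU pair."""
    value = subset.get(insn.immediate_field)
    if value is not None:
        return [ImmediateAluMicroinstruction(
            opcode=insn.opcode, src_reg1=insn.src_reg1,
            dst_reg=insn.dst_reg, immediate_32=value)]
    return [
        RorMicroinstruction(dst_reg=TEMP_REG, immed_8=insn.immed_8,
                            rotate_imm=insn.rotate_imm),
        AluMicroinstruction(opcode=insn.opcode, src_reg1=insn.src_reg1,
                            src_reg2=TEMP_REG, dst_reg=insn.dst_reg),
    ]
```

The sketch omits the conditional move microinstruction that may accompany either translation for conditional ALU instructions, and it models only the field assignments, not the encoding of the microinstructions.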

Referring now to FIG. 34, a flowchart illustrates an embodiment of the operation of the microprocessor 100 of FIG. 32 in executing an immediate operand instruction of FIG. 33. Flow begins at step 3402.

At step 3402, the instruction translator 104 encounters an immediate operand instruction 124 of FIG. 33 and checks the immediate field b207 (for an ARM immediate operand instruction 124, the low-order 12 bits) against the predetermined subset of values. Flow proceeds to decision step 3404.

At decision step 3404, the instruction translator 104 determines whether the value of the immediate field b207 falls within the predetermined subset of values. If so, flow proceeds to step 3406; otherwise, flow proceeds to step 3414.

At step 3406, the instruction translator 104 emits a single immediate ALU microinstruction 3348, as shown in FIG. 33, in response to the immediate operand instruction 124. In one embodiment, if the immediate operand instruction 124 is a conditional ALU instruction 124 that specifies a shared source/destination register, the immediate ALU microinstruction 3348 is one of the ALU microinstructions 126 described with respect to steps 2134, 2136, 2154 and 2156 of FIG. 21, without the SHF microinstruction. If the conditional ALU instruction 124 does not specify a shared source/destination register, the instruction translator 104 emits the immediate ALU microinstruction 3348 together with a conditional move microinstruction 126 (XMOV and CMOV) as described with respect to steps 1034, 1036, 1054 and 1056 of FIG. 10, again without the SHF microinstruction. In this case, the dependency information that the register allocation table 402 generates for the conditional move microinstruction 126 indicates that the conditional move microinstruction 126 depends on the result of the immediate ALU microinstruction 3348. Flow proceeds to step 3408.

At step 3408, the instruction issue unit 408 issues the immediate ALU microinstruction 3348 to the execution unit 424. Flow proceeds to step 3412.

At step 3412, the execution unit 424 receives from the immediate operand bus the value of the 32-bit immediate operand 3366 that was carried down the pipeline, together with the source operand specified by the source register 1 field b214. The execution unit 424 executes the immediate ALU microinstruction 3348 by performing the ALU function specified by the opcode field b212 on the 32-bit immediate operand 3366 and the other source operand, producing a result on the result bus 128 for subsequent retirement to the architectural register 106 specified by the destination register field b216, which is the same architectural register 106 specified by the destination register field b206 of the immediate operand instruction 124. If the instruction translator 104 emitted a conditional move microinstruction 126 at step 3406, the destination of the immediate ALU microinstruction 3348 is a temporary register 106 rather than the destination register 106 specified by the immediate operand instruction 124. In that case, in response to the execution unit 424 completing the immediate ALU microinstruction at step 3412, and as described above, particularly with respect to FIGS. 10 through 20, the instruction issue unit 408 issues the conditional move microinstruction 126 to the execution unit 424, which executes it to produce the result of the immediate operand instruction 124. Flow ends at step 3412.
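By way of illustration only, the following sketch models the execution of the single immediate ALU microinstruction and checks that it yields the same value as the two-microinstruction sequence in which a ROR microinstruction first writes a temporary register. Only a handful of ALU functions are modeled, condition flags are ignored, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# A small, illustrative subset of ALU functions on 32-bit operands.
ALU_FUNCTIONS = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "SUB": lambda a, b: (a - b) & MASK32,
    "AND": lambda a, b: a & b,
    "ORR": lambda a, b: a | b,
    "EOR": lambda a, b: a ^ b,
}


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def run_immediate_alu(opcode: str, src1_value: int, immediate_32: int) -> int:
    """Single-microinstruction path: the 32-bit operand arrives precomputed."""
    return ALU_FUNCTIONS[opcode](src1_value & MASK32, immediate_32 & MASK32)


def run_ror_then_alu(opcode: str, src1_value: int,
                     immed_8: int, rotate_imm: int) -> int:
    """Two-microinstruction path: ROR writes a temporary, then the ALU
    microinstruction reads it as its second source operand."""
    temp = ror32(immed_8, 2 * rotate_imm)
    return ALU_FUNCTIONS[opcode](src1_value & MASK32, temp)
```

Both paths compute the same architectural result; the difference described in this section lies in how many microinstructions occupy pipeline resources, not in the value produced.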

At step 3414, the instruction translator 104 emits two microinstructions, namely a ROR microinstruction 3344 and an ALU microinstruction 3346 of FIG. 33, in response to the immediate operand instruction 124. In one embodiment, if the immediate operand instruction 124 is a conditional ALU instruction 124 that specifies a modified immediate constant, the ROR microinstruction 3344 is the SHF microinstruction 126 described with respect to steps 1034, 1036, 1054 and 1056 of FIG. 10 or steps 2134, 2136, 2154 and 2156 of FIG. 21. For example, if the immediate operand instruction 124 being translated is the ARM NCUALUOP instruction 124 of step 1056, and the modified immediate constant it specifies does not fall within the predetermined subset, the ROR microinstruction 3344 may be the SHF microinstruction 126 of step 1056. In one embodiment, if the immediate operand instruction 124 is a conditional ALU instruction 124 that specifies a shared source/destination register, the ALU microinstruction 3346 may be one of the ALU microinstructions 126 described with respect to steps 2134, 2136, 2154 and 2156 of FIG. 21. If the immediate operand conditional ALU instruction 124 does not specify a shared source/destination register, the instruction translator 104 emits the ALU microinstruction 3346 together with a conditional move microinstruction 126 (XMOV and CMOV) as described with respect to steps 1034, 1036, 1054 and 1056 of FIG. 10. Flow proceeds to step 3416.

In step 3416, the register allocation table 402 generates dependency information for the ALU microinstruction 3346, indicating that the ALU microinstruction 3346 depends on the result of the ROR microinstruction 3344. If the instruction translator 104 emitted a conditional move microinstruction 126 in step 3414, the register allocation table 402 also generates dependency information for the conditional move microinstruction 126, indicating that it depends on the result of the ALU microinstruction 3346. Flow proceeds to step 3418.
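
The dependency information generated in step 3416 can be pictured with a small Python sketch. This is only an illustration of the idea of recording, for each microinstruction, which earlier microinstruction produces each of its sources; the tuple layout, the register names (T1, T2, R1, R2) and the function name are hypothetical, and an actual register allocation table is considerably more involved.

```python
def build_dependencies(uops):
    """For each micro-op, list the indices of earlier micro-ops whose results it reads."""
    last_writer = {}   # register name -> index of the most recent micro-op writing it
    deps = []
    for index, (name, dest, sources) in enumerate(uops):
        deps.append(sorted({last_writer[s] for s in sources if s in last_writer}))
        last_writer[dest] = index
    return deps

# (name, destination, register sources); the ROR micro-op carries immed_8 and
# rotate_imm itself, so it has no register sources in this sketch.
uops = [
    ("ROR", "T1", []),
    ("ALU", "T2", ["R1", "T1"]),
    ("CMOV", "R2", ["T2", "R2"]),
]
deps = build_dependencies(uops)
```

Here the ALU micro-op ends up dependent on the ROR micro-op, and the conditional move on the ALU micro-op, which is the ordering the issue unit enforces in steps 3418 through 3426.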

In step 3418, the instruction issue unit 408 issues the ROR microinstruction 3344 to the execution unit 424. The execution unit 424 thus receives the values of the immed_8 field b208 and the rotate_imm field b209 specified by the immediate operand instruction 124. Flow proceeds to step 3422.

In step 3422, the execution unit 424 executes the ROR microinstruction 3344 to generate the immediate operand result, which is written to the temporary register 106 specified by the destination register field b226. Flow proceeds to step 3424.
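
The value produced by the ROR microinstruction in step 3422 can be sketched in Python as follows. The sketch assumes the standard ARM data-processing immediate encoding, in which the 12-bit immediate field holds rotate_imm in bits 11:8 and immed_8 in bits 7:0, and the constant is immed_8 rotated right by twice rotate_imm; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def expand_modified_immediate(imm12):
    """Expand a 12-bit ARM modified immediate field into its 32-bit constant."""
    rotate_imm = (imm12 >> 8) & 0xF   # upper 4 bits: rotation count (in units of 2 bits)
    immed_8 = imm12 & 0xFF            # lower 8 bits: value to rotate
    return ror32(immed_8, 2 * rotate_imm)

no_rotation = expand_modified_immediate(0x0FF)   # rotate_imm = 0
top_byte = expand_modified_immediate(0x4FF)      # rotate right by 8 bits
```

The same computation applies whether the execution unit performs it, as in this step, or the instruction translator performs it ahead of time.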

In step 3424, in response to the execution unit 424 completing the ROR microinstruction 3344 in step 3422, the instruction issue unit 408 issues the ALU microinstruction 3346 to the execution unit 424. The execution unit 424 (integer unit 124) thus receives the result of the ROR microinstruction 3344 generated in step 3422 and the operand value specified by the source register 1 field b234 of the ALU microinstruction 3346, which designates the same architectural register 106 as the source register 1 field b204 of the immediate operand instruction 124. Flow proceeds to step 3426.

In step 3426, the execution unit 424 executes the ALU microinstruction 3346 by performing the ALU function specified by the opcode field b232 on the two source operands, and places the result on the result bus 128 for subsequent retirement to the architectural register 106 specified by the destination register field b236, which is the same architectural register 106 specified by the destination register field b206 of the immediate operand instruction 124. If the instruction translator 104 emitted a conditional move microinstruction 126 in step 3414, the result of the ALU microinstruction 3346 is instead destined for a temporary register 106 rather than the destination register 106 specified by the immediate operand instruction 124; in response to the execution unit 424 completing the ALU microinstruction 3346 in step 3426, the instruction issue unit 408 issues the conditional move microinstruction 126 to the execution unit 424, which executes it to produce the result of the immediate operand instruction 124, as described above, particularly with respect to FIGS. 10 to 20. The flow ends at step 3426.
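
The two translation paths, a single immediate ALU microinstruction versus a ROR microinstruction followed by an ALU microinstruction, can be contrasted in a short Python sketch. It is a simplified model under stated assumptions: the membership test `in_predetermined_subset` is hypothetical (this passage does not define the subset), the instruction is an unconditional ADD, and the micro-op names and tuple layout are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def in_predetermined_subset(immed_8, rotate_imm):
    # Hypothetical rule: only unrotated constants are evaluated by the translator.
    return rotate_imm == 0

def translate(dest, src, immed_8, rotate_imm):
    """Translate 'ADD dest, src, #imm' into a list of micro-ops."""
    if in_predetermined_subset(immed_8, rotate_imm):
        constant = ror32(immed_8, 2 * rotate_imm)    # evaluated by the translator
        return [("IMM_ADD", dest, src, constant)]
    return [
        ("ROR", "T1", immed_8, 2 * rotate_imm),      # evaluated by the execution pipeline
        ("ADD", dest, src, "T1"),
    ]

def execute(uops, regs):
    regs = dict(regs)
    for op, dest, a, b in uops:
        if op == "IMM_ADD":
            regs[dest] = (regs[a] + b) & MASK32
        elif op == "ROR":
            regs[dest] = ror32(a, b)
        elif op == "ADD":
            regs[dest] = (regs[a] + regs[b]) & MASK32
    return regs

fast = translate("R1", "R1", 0x10, 0)   # one micro-op
slow = translate("R1", "R1", 0x10, 2)   # two micro-ops: ROR then ADD
fast_result = execute(fast, {"R1": 5})
slow_result = execute(slow, {"R1": 5})
```

Either path yields the architecturally defined result; the difference lies only in how many microinstructions occupy the pipeline and in which unit evaluates the constant.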

As can be seen from the foregoing, the microprocessor 100 of the present invention, under certain conditions, translates the immediate operand instruction 124 into a single immediate ALU microinstruction 3348 rather than into multiple microinstructions. Specifically, when the immediate field b207 falls within a predetermined subset of values, the instruction translator 104 can directly emit the corresponding evaluated immediate operand 3366, which can provide several significant advantages.

First, the present invention avoids having an additional microinstruction occupy an extra instruction slot or entry in each resource of the out-of-order execution processor, such as the register allocation table 402, the reorder buffer 422, the reservation stations 406 and the execution units 424. This may allow those resources to be made smaller and simpler, and may reduce power consumption.

Second, the average number of instructions of an instruction set architecture program (for example, ARM instructions) that the instruction translator 104 can translate per clock cycle may increase. For example, assume that the instruction translator 104 can translate up to three ARM instructions per clock cycle but can emit at most three microinstructions per clock cycle, and that it must emit all of the microinstructions associated with a given ARM instruction in the same clock cycle; that is, the instruction translator 104 cannot emit one microinstruction associated with an ARM instruction in a first clock cycle and then emit a second microinstruction associated with the same ARM instruction in the next clock cycle. Assume the following ARM instruction sequence, in which IOI is an immediate operand instruction 124, such as a conditional ALU instruction that specifies a destination register that is also a source register, and each "Rx" value is a general-purpose register:

IOI R1, R1, immediate field value A

IOI R3, R3, immediate field value B

IOI R5, R5, immediate field value C

If the immediate field values A, B and C do not fall within the predetermined subset, the instruction translator 104 must take three clock cycles to translate the three IOI instructions. If, however, the immediate field values A, B and C fall within the predetermined subset, the instruction translator 104 may need only one clock cycle to translate the three IOI instructions. The advantage also holds in other examples in which IOI instructions are mixed with non-IOI instructions, that is, other ARM instructions. For example, assume an ARM instruction D that is translated into two microinstructions, followed by an IOI instruction that specifies an immediate field value within the predetermined subset, followed by an ARM instruction E that is translated into two microinstructions, followed by an ARM instruction F that is translated into a single microinstruction. In this case, the instruction translator 104 can translate ARM instruction D and the IOI instruction in one clock cycle and then translate ARM instructions E and F in the next clock cycle, so that four ARM instructions are translated in two clock cycles. By contrast, without the capability described in this embodiment, the instruction translator 104 would need three clock cycles to translate the four instructions. Similar advantages apply to the instruction issue unit 408 and the retire unit 422. Similar advantages also arise for a four-wide instruction translator when the conditional ALU instruction does not specify a destination register that is also a source register: in that case two such instructions can be translated in the same clock cycle, whereas without the capability described in this embodiment two clock cycles would be needed.
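
The cycle counts in the examples above can be checked with a small Python sketch of the stated translator constraints: instructions are translated in program order, and all microinstructions of one instruction must be emitted in the same clock cycle. The function name and the greedy packing policy are assumptions made for illustration; the four-wide case additionally assumes a limit of four instructions and four microinstructions per cycle.

```python
def translation_cycles(uop_counts, max_instructions=3, max_uops=3):
    """Count clock cycles needed to translate a sequence of instructions in order.

    uop_counts[i] is the number of micro-ops instruction i translates into.
    """
    cycles = 0
    i = 0
    while i < len(uop_counts):
        taken = used = 0
        # Pack as many whole instructions as fit into this cycle.
        while (i < len(uop_counts) and taken < max_instructions
               and used + uop_counts[i] <= max_uops):
            used += uop_counts[i]
            taken += 1
            i += 1
        if taken == 0:
            raise ValueError("instruction needs more micro-ops than one cycle can emit")
        cycles += 1
    return cycles

three_ioi_slow = translation_cycles([2, 2, 2])     # each IOI becomes ROR + ALU
three_ioi_fast = translation_cycles([1, 1, 1])     # each IOI becomes one immediate ALU
mixed_fast = translation_cycles([2, 1, 2, 1])      # D, IOI, E, F with the IOI as one micro-op
mixed_slow = translation_cycles([2, 2, 2, 1])      # D, IOI, E, F with the IOI as two micro-ops
```

Under these assumptions the sketch gives three cycles versus one for the three IOI instructions, and three cycles versus two for the D, IOI, E, F sequence, matching the figures in the text.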

Third, when the value of the immediate field b207 falls within the predetermined subset and the instruction translator 104 can emit a single microinstruction (or two microinstructions rather than three), the latency of the immediate operand instruction 124 may be reduced because the second (or third) microinstruction is eliminated.

Fourth, the absence of the additional microinstruction from the reorder buffer and/or the reservation stations may increase the lookahead capability of the processor, thereby improving the ability of the microprocessor 100 to exploit the instruction-level parallelism of the program being executed, which may increase utilization of the execution units 424 and improve the overall performance of the microprocessor 100. More specifically, eliminating the second microinstruction frees space in the reorder buffer for other microinstructions, which yields a larger pool of microinstructions that can be dispatched to the execution units 424 for execution. A microinstruction cannot be issued for execution until it is "ready", that is, until all of its source operands produced by earlier microinstructions are available. Consequently, the larger the pool of microinstructions in which the microprocessor 100 can look for ready microinstructions, the more likely it is to find one, and the more likely the execution units 424 are to be used. This is commonly referred to as the lookahead capability of the microprocessor, namely its ability to exploit the instruction-level parallelism of the program it is executing. Greater lookahead capability generally increases utilization of the execution units 424. Therefore, by translating the immediate operand instruction 124 into a single immediate ALU microinstruction 3348 rather than multiple microinstructions, depending on the value of the immediate field b207, the microprocessor 100 of the present invention can potentially increase its lookahead capability.
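
The effect on lookahead can be made concrete with a rough Python sketch that counts how many program instructions fit in a reorder buffer of a given size. The buffer size of 48 entries and the function name are hypothetical, and the model ignores everything except entry occupancy.

```python
def instructions_covered(uop_counts, rob_entries):
    """Count how many leading instructions fit entirely in the reorder buffer."""
    used = 0
    covered = 0
    for count in uop_counts:
        if used + count > rob_entries:
            break
        used += count
        covered += 1
    return covered

window_two_uops = instructions_covered([2] * 100, rob_entries=48)  # ROR + ALU per instruction
window_one_uop = instructions_covered([1] * 100, rob_entries=48)   # one immediate ALU each
```

In this idealized case the window spans twice as many program instructions when each immediate operand instruction occupies one entry instead of two, which is the sense in which the pool of candidate microinstructions grows.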

Although the immediate operand instruction in the foregoing embodiments is an ARM instruction with a data-processing immediate encoding, the technique may also be applied to translate immediate operand instructions of other instruction set architectures. It should also be noted that the present invention may be applied where there is no pre-existing microarchitecture, or where the instruction set architecture supported by the pre-existing microarchitecture is not the x86 instruction set architecture. Furthermore, what is described herein is a broad processor concept: supporting an immediate operand instruction of an instruction set architecture by translating it into different microinstruction sequences of an out-of-order execution microarchitecture, depending on whether the immediate field value specified by the immediate operand instruction falls within a predetermined subset.

In another embodiment, the instruction translator 104 generates the immediate operand 3266 of FIG. 32 for all values of the immediate field b207 of the immediate operand instruction 124 of FIG. 33. That is, the predetermined subset of values of the immediate field b207 comprises all possible values of the immediate field b207. The following is Verilog hardware description language code for this embodiment.

The foregoing describes only preferred embodiments of the present invention and should not be taken to limit the scope in which the invention may be practiced; simple equivalent changes and modifications made in accordance with the claims and the description of the invention all remain within the scope covered by the claims. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished using general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL and VHDL, or other available programs. Such software can be disposed in any known computer-usable medium, such as magnetic tape, semiconductor, magnetic disk or optical disc (e.g., CD-ROM, DVD-ROM), or in a network or other communications medium. Embodiments of the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in a hardware description language), and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, no embodiment described herein is intended to limit the scope of the present invention. In addition, the present invention may be implemented within a microprocessor device used in a general-purpose computer. Finally, those skilled in the art may use the disclosed concepts and embodiments as a basis for designing or modifying other structures that achieve the same purposes without departing from the scope of the present invention.

Furthermore, no single embodiment or claim of the present invention need achieve all of the objects, advantages or features disclosed herein. The abstract and the title are provided only to assist patent document searching and are not intended to limit the scope of the claims.

[Cross-reference to related applications]

This application is a continuation-in-part of the following co-pending U.S. non-provisional patent applications, each of which is incorporated herein by reference in its entirety:

Serial No.                Filing date
13/224,310 (CNTR.2575)    09/01/2011
13/333,520 (CNTR.2569)    12/21/2011
13/333,572 (CNTR.2572)    12/21/2011
13/333,631 (CNTR.2618)    12/21/2011

This application claims priority to the following U.S. provisional patent applications, each of which is incorporated herein by reference in its entirety:

Serial No.                Filing date
61/473,062 (CNTR.2547)    04/07/2011
61/473,067 (CNTR.2552)    04/07/2011
61/473,069 (CNTR.2556)    04/07/2011
61/537,473 (CNTR.2569)    09/21/2011
61/541,307 (CNTR.2585)    09/30/2011
61/547,449 (CNTR.2573)    10/14/2011
61/555,023 (CNTR.2564)    11/03/2011
61/604,561 (CNTR.2552)    02/29/2012

U.S. non-provisional patent application

13/224,310 (CNTR.2575)    09/01/2011

claims priority to the following U.S. provisional applications:

61/473,062 (CNTR.2547)    04/07/2011
61/473,067 (CNTR.2552)    04/07/2011
61/473,069 (CNTR.2556)    04/07/2011

Each of the following three U.S. non-provisional applications

13/333,520 (CNTR.2569)    12/21/2011
13/333,572 (CNTR.2572)    12/21/2011
13/333,631 (CNTR.2618)    12/21/2011

is a continuation of the following U.S. non-provisional application:

13/224,310 (CNTR.2575)    09/01/2011

and claims priority to the following U.S. provisional applications:

61/473,062 (CNTR.2547)    04/07/2011
61/473,067 (CNTR.2552)    04/07/2011
61/473,069 (CNTR.2556)    04/07/2011
61/537,473 (CNTR.2569)    09/21/2011

This application is related to the following U.S. non-provisional patent applications:

13/413,258 (CNTR.2552)    03/06/2012
13/412,888 (CNTR.2580)    03/06/2012
13/412,904 (CNTR.2583)    03/06/2012
13/412,914 (CNTR.2585)    03/06/2012
13/413,346 (CNTR.2573)    03/06/2012
13/413,300 (CNTR.2564)    03/06/2012
13/413,314 (CNTR.2568)    03/06/2012

Claims

1. A microprocessor having an instruction set architecture defining an instruction including an immediate field having a first portion specifying a first value and a second portion specifying a second value, the instruction directing the microprocessor to perform an operation with a fixed value as one of the source operands, the fixed value being obtained by rotating/shifting the first value by a number of bits based on the second value, the microprocessor comprising:

an instruction translator for translating the instruction into at least one immediate ALU micro-instruction, wherein the immediate ALU micro-instruction is encoded in a different instruction encoding than that defined by the instruction set architecture; and

an execution pipeline that executes micro instructions generated by the instruction translator to generate results defined by the instruction set architecture;

wherein the instruction translator, but not the execution pipeline, generates the fixed value as a source operand for the immediate ALU micro-instruction based on the first and second values for execution by the execution pipeline.

2. The microprocessor of claim 1, wherein the instruction translator translates the instruction into different microinstructions based on whether a value of the immediate field falls within a predetermined subset of values.

3. The microprocessor of claim 1, wherein the execution pipeline comprises:

a plurality of execution units that execute the microinstructions to generate the result; and

an issue unit that issues the fixed value generated by the instruction translator to at least one of the execution units as the source operand of the immediate ALU micro-instruction executed by the at least one of the execution units.

4. The microprocessor of claim 1, wherein the execution pipeline comprises:

a plurality of execution units that execute the microinstructions to generate the result;

wherein the microprocessor further comprises:

one or more first buses for transmitting execution results of the microinstructions from the execution units back to the execution units as source operands of other microinstructions; and

a second bus providing the fixed value generated by the instruction translator to the execution pipeline, wherein the second bus is different from the one or more first buses.

5. The microprocessor of claim 4, further comprising:

a plurality of registers to receive results of the execution of the microinstructions from the execution unit, wherein the fixed value generated by the instruction translator is not written into the registers by the microprocessor.

6. The microprocessor of claim 1, wherein the fixed value is obtained by rotating/shifting the first value by a number of bits equal to twice the second value.

7. The microprocessor of claim 1, wherein the instruction set architecture of the microprocessor defines a plurality of instructions each including an immediate field, including data processing instructions of the Advanced RISC Machines (ARM) instruction set architecture (ISA) that specify a modified immediate constant.

8. The microprocessor of claim 7, wherein the data processing instruction of the ARM instruction set architecture specifying a modified immediate constant comprises a conditional ALU instruction specifying a modified immediate constant.

9. A processing method performed by a microprocessor having an instruction set architecture defining an instruction, the instruction including an immediate field having a first portion specifying a first value and a second portion specifying a second value, the instruction directing the microprocessor to perform an operation with a fixed value as one of the source operands, the fixed value being obtained by rotating/shifting the first value by a number of bits based on the second value, the method comprising:

translating the instruction into at least one immediate ALU microinstruction encoded in an instruction encoding manner different from that defined by the instruction set architecture, wherein the translating is performed by an instruction translator of the microprocessor; and

executing the microinstructions generated by the instruction translator to generate a result defined by the instruction set architecture, wherein the executing step is performed by an execution pipeline of the microprocessor;

wherein the fixed value is generated by the instruction translator, but not the execution pipeline, as a source operand to the immediate ALU micro-instruction based on the first and second values for execution by the execution pipeline.

10. The method of claim 9, wherein the translating step includes translating the instruction into different micro instructions based on whether a value of the immediate field falls within a predetermined subset of values.

11. The method of claim 9, wherein the fixed value is obtained by rotating/shifting the first value by a number of bits equal to twice the second value.

12. The method of claim 9, wherein the instruction set architecture of the microprocessor defines a plurality of instructions each including an immediate field, including data processing instructions of the Advanced RISC Machines (ARM) instruction set architecture (ISA), the data processing instructions specifying a modified immediate constant.

13. A microprocessor having an instruction set architecture that defines an instruction that includes an immediate field having a first portion specifying a first value and a second portion specifying a second value, the instruction directing the microprocessor to perform an operation with a fixed value as one of source operands, the fixed value being obtained by rotating/shifting the first value by a number of bits based on the second value, the microprocessor comprising:

an instruction translator for translating the instruction into micro instructions; and

an execution pipeline that executes the microinstructions generated by the instruction translator to generate a result defined by the instruction set architecture;

wherein when a value of the immediate field falls within a predetermined subset of values:

the instruction translator translates the instruction into at least one immediate ALU micro-instruction;

the instruction translator, rather than the execution pipeline, generates the fixed value according to the first value and the second value; and

the execution pipeline executes the immediate ALU microinstruction using the fixed value generated by the instruction translator as one of the source operands; and

wherein when the value of the immediate field does not fall within the predetermined subset of values:

the instruction translator translates the instruction into at least a first micro instruction and a second micro instruction;

the execution pipeline, rather than the instruction translator, generates the fixed value by executing the first micro instruction; and

the execution pipeline executes the second micro instruction by using the fixed value generated by the execution of the first micro instruction as one of the source operands.

14. The microprocessor of claim 13, wherein the execution pipeline comprises:

a register allocation table for generating the association between the second micro instruction and the fixed value generated by the first micro instruction.

15. The microprocessor of claim 13, wherein all of the microinstructions are defined by a microarchitecture of the microprocessor and are encoded in an instruction encoding different from that defined by the instruction set architecture.

16. The microprocessor of claim 13, wherein the first micro instruction is a shift/rotate micro instruction.

17. A processing method performed by a microprocessor having an instruction set architecture defining an instruction, the instruction including an immediate field having a first portion specifying a first value and a second portion specifying a second value, the instruction directing the microprocessor to perform an operation with a fixed value as one of the source operands, the fixed value being obtained by rotating/shifting the first value by a number of bits based on the second value, the microprocessor including an instruction translator and an execution pipeline, the method comprising:

determining, by the instruction translator, whether a value of the immediate field falls within a predetermined subset of values;

when the value of the immediate field falls within the predetermined subset of values:

translating the instruction into at least an immediate ALU micro-instruction using the instruction translator;

generating the fixed value according to the first value and the second value using the instruction translator instead of the execution pipeline; and

executing the immediate ALU microinstruction using the fixed value generated by the instruction translator as one of the source operands using the execution pipeline; and

when the value of the immediate field does not fall within the predetermined subset of values:

translating the instruction into at least a first micro instruction and a second micro instruction using the instruction translator;

generating the fixed value by executing the first micro instruction using the execution pipeline instead of the instruction translator; and

executing the second micro instruction, using the execution pipeline, with the fixed value generated by execution of the first micro instruction as one of the source operands.

18. The method of claim 17, further comprising:

generating an association between the second micro instruction and the fixed value generated by the execution of the first micro instruction, wherein the generating the association is performed by a register allocation table of the microprocessor.

19. The method of claim 17, wherein all of the microinstructions are defined by a microarchitecture of the microprocessor and are encoded in an instruction encoding different from that defined by the instruction set architecture.