CN1308813C

CN1308813C - Non-temporal memory reference control mechanism

Info

Publication number: CN1308813C
Application number: CNB031030408A
Authority: CN
Inventors: G. Glenn Henry; Rodney E. Hooker; Terry Parks
Original assignee: INTELLIGENCE FIRST CO
Current assignee: INTELLIGENCE FIRST CO
Priority date: 2002-08-22
Filing date: 2003-01-28
Publication date: 2007-04-04
Anticipated expiration: 2023-01-28
Also published as: TWI220042B; CN1431586A

Abstract

The invention relates to an apparatus, structure and method for controlling non-temporal memory references, which extend a microprocessor instruction set so that non-temporal memory references can be specified at the instruction level. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a microinstruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a non-temporal access operation for a memory reference specified by the extended instruction, where the non-temporal access operation cannot be specified by an existing instruction of an existing instruction set. The extended prefix tag identifies the extended prefix and is an otherwise architecturally specified opcode in the existing instruction set. The extended execution logic is coupled to the translation logic; it receives the microinstruction sequence and applies the non-temporal access operation in executing the operation indicated by the memory reference.

Description

Apparatus, structure and method for controlling non-temporal memory references

Cross-Reference to Related Applications

(0001) This application claims priority to the following US application: Serial No. 10/227583, filed August 22, 2002.

(0002) This application is related to the following co-pending US patent applications, all of which have the same applicant and inventors.

Taiwan application number; Filing date; Docket number; Title
91116957; 7/30/02; CNTR:2176; Apparatus and method for extending a microprocessor instruction set
91116958; 7/30/02; CNTR:2186; Apparatus and method for executing conditional instructions
91124008; 10/18/02; CNTR:2187; Apparatus and method for selectively controlling memory attributes
91116956; 7/30/02; CNTR:2188; Apparatus and method for selectively controlling condition code write-back
91116959; 7/30/02; CNTR:2189; Mechanism for increasing the number of registers in a microprocessor
91124005; 10/18/02; CNTR:2190; Apparatus and method for extending microprocessor data modes
91124006; 10/18/02; CNTR:2191; Apparatus and method for extending microprocessor address modes
(not listed); (not listed); CNTR:2192; Suppression of store checking
(not listed); (not listed); CNTR:2193; Selective interrupt suppression
91116672; 7/26/02; CNTR:2198; Apparatus and method for selectively controlling result write-back

Technical Field

(0003) The present invention relates to the field of microelectronics, and in particular to a technique for incorporating instruction-level non-temporal memory attribute control into an existing microprocessor instruction set architecture.

Background

(0004) Since the early 1970s, the use of microprocessors has grown exponentially. From their earliest applications in science and technology, microprocessors have moved out of those specialized fields into commercial consumer products such as desktop and laptop computers, video game controllers, and many other common household and business devices.

(0005) This explosive growth in use has been accompanied by a corresponding advance in technology, driven by ever-increasing demands for faster speed, greater addressing capability, faster memory access, larger operands, more types of general-purpose operations (such as floating-point, single-instruction multiple-data (SIMD), and conditional move operations), and additional special-purpose operations (such as digital signal processing functions and other multimedia operations). These demands have produced remarkable advances in the field, all of which have been applied to microprocessor design: extensive pipelining, super-scalar architecture, cache structures, out-of-order processing, burst access mechanisms, branch prediction, and speculative execution. In short, today's microprocessors are remarkably complex and powerful compared with those that first appeared 30 years ago.

(0006) Unlike many other products, however, microprocessors are subject to another very important factor that has constrained, and continues to constrain, the evolution of their architecture. This factor, legacy software compatibility, accounts for much of the complexity of today's microprocessors. For market reasons, many manufacturers choose to incorporate new architectural features into their latest microprocessor designs while retaining in those same products all of the capabilities needed to remain compatible with older, so-called "legacy" application programs.

(0007) Nowhere is this burden of legacy software compatibility more apparent than in the history of x86-compatible microprocessors. It is well known that today's 32/16-bit virtual-mode x86 microprocessors can still execute 8-bit real-mode application programs written in the 1980s. Those skilled in the art also acknowledge that a good deal of architectural "baggage" is carried in the x86 architecture solely to support compatibility with legacy applications and operating modes. In the past, developers could add newly developed architectural features to an existing instruction set architecture, but the means by which those features are used, namely programmable instructions, have now become scarce. Put more simply, some important instruction sets have no "spare" instructions left with which designers could bring newer features into an existing architecture.

(0008) In the x86 instruction set architecture, for example, no undefined one-byte opcode states remain. In the primary one-byte x86 opcode map, all 256 opcode states are already occupied by existing instructions. As a result, designers of x86 microprocessors must now choose between providing new features and preserving legacy software compatibility. New programmable features require opcode states to be assigned to them. If the existing instruction set architecture has no spare opcode states, some existing opcode states must be redefined to provide for the new features. Providing new features therefore comes at the cost of legacy software compatibility.

(0009) One area of current concern to microprocessor designers is how efficiently application programs use cache structures. As cache technology has evolved, more and more features have been provided that allow system programmers to control when and how the caches in a system are used. Early cache control features provided only an on/off capability: by setting an internal register of the microprocessor, or by asserting an external signal pin on its package, a designer could either enable caching of memory or make the entire memory space uncacheable. Uncacheable memory references (i.e., loads/reads and stores/writes) are always sent to the system memory bus and therefore incur the latency of the external bus architecture. Conversely, a reference to memory for which caching is enabled is sent to the system memory bus only when a cache miss occurs (that is, when the target of the memory reference is not valid in the internal cache). Caching greatly increases the execution speed of application programs, particularly those that repeatedly reference the same data structures in memory.

(0010) More recent improvements in microprocessor architecture give system designers more precise control over how cache features are used. These improvements allow a designer to define, for a range of addresses within a microprocessor's address space, how the microprocessor executes references to that range with respect to its cache hierarchy. In general, references to an address range can be defined as uncacheable, write combining, write through, write back, or write protected. These properties are called memory attributes, or memory traits. Thus a store reference to an address with the write-back attribute is sent to the cache and is speculatively allocated to a location therein, whereas a store reference to another address with the uncacheable attribute is sent to the system bus and is not speculatively executed.

(0011) An in-depth description of memory attributes, and of how a microprocessor handles particular attributes through its caches, is beyond the scope of this application. It is sufficient here to understand that the current state of the art enables a designer to assign a memory attribute to a region of memory, and that all subsequent memory references to addresses within that region are handled according to the cache policy associated with the assigned attribute.

(0012) Although modern microprocessor designs allow different regions of memory to be assigned different memory traits, these designs remain limited in two significant respects. First, microprocessor instruction set architectures restrict the instructions that define or change memory traits to a privilege level that user-level application programs cannot access. Thus, when a desktop or laptop microprocessor boots, its operating system establishes the memory traits of the virtual memory space before any user-level application is started, and user-level applications cannot change the memory traits of the host system. Second, in modern microprocessors the finest granularity at which memory traits can be established is the page level. In commonly used architectures that support memory paging, the memory attribute of each memory page is set by the operating system in an entry of the page directory/table. Consequently, all references to addresses within a particular page use the memory attribute assigned to that page when the associated memory access operation is executed.
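The page-level limitation can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the 4096-byte page size, the `page_traits` table, and the `trait_for` helper do not appear in the application. The sketch only shows that every address within one page resolves to the same memory attribute, so two data structures sharing a page cannot be treated differently.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed page size, for illustration only


class MemoryTrait(Enum):
    """The memory attributes named in the background discussion."""
    UNCACHEABLE = "uncacheable"
    WRITE_COMBINING = "write combining"
    WRITE_THROUGH = "write through"
    WRITE_BACK = "write back"
    WRITE_PROTECTED = "write protected"


# Hypothetical page table: page number -> trait, as set up by the operating system.
page_traits = {
    0: MemoryTrait.WRITE_BACK,
    1: MemoryTrait.UNCACHEABLE,
}


def trait_for(address: int) -> MemoryTrait:
    """Return the trait of the page that contains `address`."""
    return page_traits[address // PAGE_SIZE]
```

In this model, `trait_for(0)` and `trait_for(4095)` necessarily agree because both addresses fall in page 0; only an address in a different page can carry a different attribute.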

(0013) For many application programs, the control features described above allow user-level applications to execute significantly faster, but the present inventors have observed that for other applications their benefit is limited. This is because modern memory trait control features cannot be used at the user level, and because memory attributes can be established only at page-level granularity. For example, consider a user program that repeatedly accesses a first data structure and makes an occasional reference to a second data structure. If cache entries for the first data structure must be evicted to make room in the cache for the second data structure, the execution efficiency of the program suffers. Because the operating system has no advance knowledge of how frequently a user-level application references its data structures, the application's data space is generally assigned the write-back trait, which creates the conditions for the conflict just described. The programmer has no means of changing the traits of the data space so as to force the occasional reference onto the memory bus (for example, by assigning the uncacheable trait to the second data structure) and thereby avoid the conflict.

(0014) In this field, data that an application accesses repeatedly is called temporal data, and data that is accessed only occasionally is called non-temporal data. Those skilled in the art will appreciate that filling a cache with non-temporal data (i.e., cache pollution) is highly detrimental. More recent techniques have therefore added a limited set of non-temporal store instructions to existing instruction sets, allowing application programmers to move data from internal registers to memory without polluting the cache. However, no means currently exists that allows a programmer to direct that a memory reference specified by an existing instruction (for example, an instruction specifying an arithmetic or logical operation on one or more operands) be executed non-temporally, thereby bypassing the cache entirely.
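The effect of cache pollution can be sketched with a toy simulation. This is not the mechanism of the invention: the `LRUCache` class, its four-line capacity, the least-recently-used replacement policy, and the access pattern are all assumptions chosen for illustration. The sketch compares a run in which occasional one-off references are allocated in the cache with a run in which they bypass it.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, address: int, non_temporal: bool = False) -> None:
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return
        self.misses += 1
        if non_temporal:
            return  # bypass: go to memory without allocating a cache line
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[address] = True


def run(bypass: bool) -> int:
    """Return the number of cache hits for a fixed access pattern."""
    cache = LRUCache(capacity=4)
    hot = [0, 1, 2, 3]  # temporal data, reused on every iteration
    for i in range(10):
        for address in hot:
            cache.access(address)
        # one occasional reference to data that is never reused
        cache.access(1000 + i, non_temporal=bypass)
    return cache.hits
```

With this particular pattern, `run(bypass=False)` yields no hits at all, because each one-off reference evicts a line that is needed on the next pass, while `run(bypass=True)` hits on every reuse of the hot data after the first pass. Real caches are set associative and use other replacement policies, so the contrast is rarely this extreme.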

(0015) What is needed, therefore, is an apparatus and method for incorporating instruction-level non-temporal memory reference control into an existing microprocessor instruction set architecture whose opcode set is completely occupied by defined opcodes, such that a conforming microprocessor retains the ability to execute legacy application programs while also giving programmers the ability to specify non-temporal memory accesses.

Summary of the Invention

(0016) The present invention, like the other applications noted above, is directed to overcoming these and other problems and disadvantages of the prior art. The present invention provides a superior technique for extending a microprocessor instruction set beyond its current capabilities to provide instruction-level control of non-temporal memory references. In one embodiment, an apparatus is provided for instruction-level control of memory references within a microprocessor. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a microinstruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a non-temporal access operation for a memory reference specified by the extended instruction, where the non-temporal access operation cannot be specified by an existing instruction of the existing instruction set. The extended prefix tag identifies the extended prefix and is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic; it receives the microinstruction sequence and applies the non-temporal access operation in executing the operation indicated by the memory reference.

(0017) One object of the present invention is to provide a microprocessor mechanism that extends an existing instruction set to provide instruction-level control of non-temporal memory accesses. The mechanism holds an extended instruction and has a translator. The extended instruction specifies a non-temporal access operation for a memory reference. The extended instruction includes a selected opcode of the existing instruction set followed by an n-bit extended prefix. The selected opcode identifies the extended instruction, and the n-bit extended prefix identifies the non-temporal access operation. The non-temporal access operation for the memory reference cannot otherwise be specified by instructions of the existing instruction set. The translator receives the extended instruction and generates a microinstruction sequence directing the microprocessor to execute the operation indicated by the memory reference using the non-temporal access operation.
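The byte layout described above, a selected opcode acting as a tag followed by an n-bit extended prefix, can be sketched as a simple splitting step. Several points are assumptions rather than statements of the application: the tag value F1H is taken from the list of reference numerals, where its role is not yet explained; the prefix is assumed to be 8 bits wide, as in the embodiment of FIG. 4; the tag is assumed to be the first byte of the instruction; and the names `ESCAPE_TAG`, `Decoded`, and `split_extended` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ESCAPE_TAG = 0xF1  # assumed tag opcode; the reference numerals name opcode F1H


@dataclass
class Decoded:
    extended_prefix: Optional[int]  # None for a legacy instruction
    rest: bytes                     # remaining bytes of the instruction


def split_extended(instruction: bytes) -> Decoded:
    """Separate an escape tag and its 8-bit extended prefix from the rest."""
    if len(instruction) >= 2 and instruction[0] == ESCAPE_TAG:
        # Tag found: the next byte is the extended prefix.
        return Decoded(extended_prefix=instruction[1], rest=instruction[2:])
    # No tag: the instruction is handled as a legacy instruction.
    return Decoded(extended_prefix=None, rest=instruction)
```

A legacy byte sequence passes through unchanged with `extended_prefix` set to `None`, which mirrors the stated goal that legacy application programs keep working.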

(0018) Another object of the present invention is to provide an apparatus for adding instruction-level non-temporal access control to an existing instruction set. The apparatus includes translation logic and extended execution logic. The translation logic receives an escape tag indicating that accompanying parts of a corresponding instruction specify a memory reference, where the escape tag is a first opcode within the existing instruction set. A non-temporal access specifier is coupled to the escape tag and is one of the accompanying parts; it specifies a non-temporal access operation to be used in executing the operation indicated by the memory reference. The extended execution logic is coupled to the translation logic and executes the operation indicated by the memory reference using the non-temporal access operation.

(0019) A further object of the present invention is to provide a method for extending an existing instruction set architecture to provide control of non-temporal memory references at the instruction level. The method includes providing an extended instruction that includes an extended tag and an extended prefix, where the extended tag is a first opcode of the existing instruction set architecture; specifying, by means of the extended prefix, a non-temporal access operation to be applied to a corresponding memory reference, where the memory reference is specified by the remainder of the extended instruction; and applying the non-temporal access operation in executing the operation indicated by the memory reference, where the applying precludes caching of data associated with the memory reference.

Brief Description of the Drawings

(0020) The foregoing and other objects, features, and advantages of the present invention will be better understood in light of the following description and the accompanying drawings:

(0021) FIG. 1 is a block diagram of a related art microprocessor instruction format;

(0022) FIG. 2 is a table showing how instructions in an instruction set architecture map to the logic states of the bits of an 8-bit opcode byte in the instruction format of FIG. 1;

(0023) FIG. 3 is a block diagram of an extended instruction format according to the present invention;

(0024) FIG. 4 is a table showing how extended architectural features map to the logic states of the bits of an 8-bit extended prefix embodiment according to the present invention;

(0025) FIG. 5 is a block diagram of a pipelined microprocessor that employs non-temporal memory reference control according to the present invention;

(0026) FIG. 6 is a block diagram of one embodiment of an extended prefix for specifying a non-temporal access operation for a programmed memory reference in a microprocessor according to the present invention;

(0027) FIG. 7 is a block diagram detailing the translate stage logic within the microprocessor of FIG. 5;

(0028) FIG. 8 is a block diagram of the extended execute stage logic within the microprocessor of FIG. 5; and

(0029) FIG. 9 is a flow chart of a method according to the present invention for controlling non-temporal memory references in a microprocessor.

Description of reference numerals:

100 instruction format    101 prefix

102 opcode    103 address specifier

200 8-bit opcode map    201 opcode value

202 opcode F1H

300 extended instruction format    301 prefix

302 opcode    303 address specifier

304 extended instruction tag    305 extended prefix

400 8-bit prefix map    401 architectural feature

500 pipelined microprocessor    501 fetch logic

502 instruction cache/external memory

503 instruction queue    504 translation logic

505 extended translation logic    506 microinstruction queue

507 execution logic    508 extended execution logic

600 extended prefix    601 source field

602 destination field    603 spare field

700 translate stage logic    701 boot state signal

702 machine specific register    703 extended feature field

704 instruction buffer    705 translation logic

706 translation controller    707 disable signal

708 escape instruction detector    709 extended prefix decoder

710 instruction decoder    711 control read-only memory

712 microinstruction buffer    713 opcode extension field

714 micro-opcode field    715 destination field

716 source field    717 displacement field

800 extended execute stage logic    801 microinstruction buffer

802 address buffer    803 address buffer

804 destination operand buffer    805 extended access logic

806 memory trait descriptor    807 cache

808 bus unit    809 access controller

810 store logic    811 load logic

812 bus    813 bus

814 extended microinstruction buffer    815 source operand buffer

816 non-temporal load buffer    817 write-combining buffer

900-932 steps of the method for controlling non-temporal memory references in a microprocessor

Detailed Description

(0030) The following description is presented in the context of a particular embodiment and its requirements, to enable one of ordinary skill in the art to make use of the present invention. Various modifications to the preferred embodiment will nevertheless be apparent to those skilled in the art, and the general principles discussed here may be applied to other embodiments. The present invention is therefore not limited to the particular embodiments shown and described here, but is to be accorded the widest scope consistent with the principles and novel features disclosed.

(0031) The background discussion above described the techniques used in today's microprocessors to extend architectural features beyond the capabilities of the associated instruction set. In view of that discussion, a related art example is discussed with reference to FIGS. 1 and 2. The discussion highlights the dilemma that microprocessor designers continually face: on the one hand they want to incorporate newly developed architectural features into a microprocessor design, while on the other they must retain the ability to execute legacy application programs. In the example of FIGS. 1 and 2, a completely occupied opcode map rules out adding new opcodes to the example architecture, forcing designers either to incorporate new features at the expense of some degree of legacy software compatibility, or to forgo the latest architectural advances altogether in order to keep the microprocessor compatible with legacy applications. Following the related art discussion, the present invention is discussed with reference to FIGS. 3 through 9. By using an existing but unused opcode as the prefix tag of an extended instruction, the present invention allows microprocessor designers to overcome the limitations of a completely occupied instruction set architecture, giving programmers the ability to specify non-temporal memory access for a particular memory reference at the instruction level while retaining all the features needed to execute legacy application programs.

(0032) Referring to FIG. 1, a block diagram of a related art microprocessor instruction format 100 is shown. The related art instruction 100 has a variable number of data entities 101-103, each set to a particular value, which together make up a specific instruction 100 for a microprocessor. The specific instruction 100 directs the microprocessor to perform a specific operation, such as adding two operands, or moving an operand from memory to an internal register or from the internal register to memory. In general, an opcode entity 102 within the instruction 100 specifies the operation to be performed, and optional address specifier entities 103 follow the opcode 102 to provide additional information about the operation, such as how it is to be performed and where its operands are located. The instruction format 100 also allows a programmer to place prefix entities 101 before an opcode 102. The prefixes 101 indicate whether particular architectural features are to be used when the operation specified by the opcode 102 is executed. In general, these architectural features apply to most of the operations that any opcode 102 in the instruction set can specify. For example, prefixes 101 exist today in a number of microprocessors that can perform operations using virtual addresses of different sizes (e.g., 8-bit, 16-bit, 32-bit). While many such processors are programmed to a default address size (say, 32-bit), the prefixes 101 provided in their respective instruction sets still let programmers selectively override the default address size on an instruction-by-instruction basis (for example, to generate 16-bit virtual addresses). Selectable address size is only one example of an architectural feature that, in many modern microprocessors, applies to numerous operations (e.g., add, subtract, multiply, Boolean logic) that can be specified by opcodes 102.

(0033) A well-known example of the instruction format 100 shown in FIG. 1 is the x86 instruction format 100, which is used by all modern x86-compatible microprocessors. More specifically, the x86 instruction format 100 (also known as the x86 instruction set architecture 100) uses 8-bit prefixes 101, 8-bit opcodes 102, and 8-bit address specifiers 103. The x86 architecture 100 also has several prefixes 101: two of them override the default address/data size of an x86 microprocessor (i.e., opcode states 66H and 67H); another directs the microprocessor to interpret the following opcode byte 102 according to different translation rules (i.e., prefix value 0FH, which causes translation to follow the so-called two-byte opcode rules); and others cause particular operations to be repeated until a repetition condition is satisfied (i.e., the REP opcodes: F0H, F2H, and F3H).
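The in-order handling of prefix bytes ahead of the opcode can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an x86 decoder: the function name `split_prefixes` and the sample byte strings are hypothetical, and the prefix set holds only the byte values named in the paragraph above.

```python
# Prefix byte values named in the text: the two size overrides and the
# three values the text lists as repeat-type prefixes.
LEGACY_PREFIXES = {0x66, 0x67, 0xF0, 0xF2, 0xF3}
# 0FH is kept separate because it changes how the NEXT opcode byte is read.
TWO_BYTE_ESCAPE = 0x0F


def split_prefixes(code: bytes):
    """Return (prefixes, two_byte, opcode) for the instruction at the start of `code`."""
    i = 0
    prefixes = []
    # Entries are interpreted in order, so prefix bytes are consumed first.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        prefixes.append(code[i])
        i += 1
    # An optional 0FH selects the two-byte opcode rules.
    two_byte = i < len(code) and code[i] == TWO_BYTE_ESCAPE
    if two_byte:
        i += 1
    if i >= len(code):
        raise ValueError("no opcode byte after prefixes")
    return prefixes, two_byte, code[i]
```

Calling `split_prefixes(bytes([0x66, 0x14, 0x05]))` separates the 66H override from opcode 14H; address specifier bytes after the opcode are deliberately left unparsed in this sketch.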

(0034) Refer now to FIG. 2, which shows a table 200 depicting how instructions 201 of an instruction set architecture are mapped to the bit values of an 8-bit opcode byte 102 in the instruction format of FIG. 1. The table 200 presents an exemplary 8-bit opcode map 200 that associates up to 256 values of an 8-bit opcode entry 102 with corresponding microprocessor opcode instructions 201. The table 200 maps a particular value of the opcode entry 102, say 02H, to a corresponding opcode instruction 201 (i.e., instruction I02 201). In the case of the x86 opcode map, it is well known in the art that opcode value 14H maps to the x86 Add With Carry (ADC) instruction, which adds an 8-bit immediate operand to the contents of architectural register AL. Those skilled in the art will also appreciate that the x86 prefixes 101 mentioned above (i.e., 66H, 67H, 0FH, F0H, F2H, and F3H) are actual opcode values 201 that, in a different context, specify that particular architectural extensions be applied to the operation specified by the opcode entry 102 that follows. For example, preceding opcode 14H (normally the ADC opcode described above) with prefix 0FH causes an x86 processor to perform an Unpack and Interleave Low Packed Single-Precision Floating-Point Values operation instead of the ADC operation. Features such as those described in this x86 example are enabled in part because the instruction translation/decoding logic in a modern microprocessor interprets the entries 101-103 of an instruction 100 in order. Hence, in the past, using specific opcode values as prefixes 101 in an instruction set architecture has allowed microprocessor designers to incorporate a significant number of advanced architectural features into a legacy-compatible microprocessor design without adversely affecting the execution of older programs that do not use those specific opcode states. For example, a legacy program that never uses x86 opcode 0FH will still run on today's x86 microprocessors, while a newer application, by using x86 opcode 0FH as a prefix 101, can employ many newly incorporated x86 architectural features such as single instruction multiple data (SIMD) operations, conditional move operations, and so on.
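The 14H example above can be illustrated with two deliberately tiny lookup tables. The sketch below is written in Python under stated assumptions: the table names and the `lookup` function are hypothetical, and each table contains only the single entry mentioned in the text rather than a full opcode map.

```python
TWO_BYTE_ESCAPE = 0x0F

# Partial maps holding only the entries named in the text.
ONE_BYTE_MAP = {0x14: "ADC AL, imm8"}  # Add With Carry, 8-bit immediate to AL
TWO_BYTE_MAP = {0x14: "UNPCKLPS"}      # Unpack and Interleave Low Packed Single-Precision FP Values


def lookup(code: bytes) -> str:
    """Name the instruction selected by the leading byte(s) of `code`."""
    if not code:
        raise ValueError("empty instruction")
    if code[0] == TWO_BYTE_ESCAPE:
        if len(code) < 2:
            raise ValueError("missing opcode byte after 0FH")
        # Same opcode value, different map: 0FH changes the translation rules.
        return TWO_BYTE_MAP.get(code[1], "not in this sketch")
    return ONE_BYTE_MAP.get(code[0], "not in this sketch")
```

The same byte value 14H resolves to two different operations depending only on whether 0FH precedes it, which is why a program that never emits 0FH is unaffected by the features added behind it.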

(0035) Although architectural features have been provided in the past by designating available/spare opcode values 201 as prefixes 101 (also known as architectural feature tags/indicators 101 or escape instructions 101), many instruction set architectures 100 have hit a wall in providing enhancements for a very straightforward reason: all of the available/spare opcode values have been used up; that is, every opcode value in the opcode map 200 has been architecturally specified. When all available values have been assigned as either opcode entries 102 or prefix entries 101, no opcode values remain for incorporating new features. This serious problem exists in many microprocessor architectures today and consequently forces designers to choose between adding architectural features and retaining compatibility with legacy programs.

(0036) It is noted that the instructions 201 shown in FIG. 2 are depicted generically (i.e., I24, I86) rather than specifically (e.g., add with carry, subtract, exclusive-OR). This is because a fully occupied opcode map 200 architecturally precludes the incorporation of newer advances in a number of different microprocessor architectures. Although an 8-bit opcode entry 102 is referred to in the example of FIG. 2, those skilled in the art will appreciate that the specific size of the opcode 102 is irrelevant to the problem, other than serving as a particular case in which to discuss the problem posed by a fully occupied opcode structure 200. Accordingly, a fully occupied 6-bit opcode map would have 64 architecturally specified opcodes/prefixes 201 and would likewise provide no available/spare opcode values for expansion.

(0037) An alternative approach, rather than scrapping the existing instruction set entirely and replacing it with a new format 100 and opcode map 200, is to substitute new instruction meanings for only a subset of the existing opcodes 201, such as opcodes 40H through 4FH of FIG. 2. Under this hybrid technique, a microprocessor operates exclusively in one of two modes: a legacy-compatible mode, in which opcodes 40H-4FH are interpreted according to the legacy rules, or an enhanced mode, in which opcodes 40H-4FH are interpreted according to enhanced architectural rules. This technique does allow designers to incorporate new features, but a drawback remains: while a legacy-compatible microprocessor is running in the enhanced mode, it cannot execute any application program that uses opcodes 40H-4FH. Hence, from the standpoint of preserving legacy software compatibility, the legacy-compatible/enhanced-mode technique is also unacceptable.

(0038) The present inventors, however, have examined how opcodes 201 are used in instruction sets 200 whose opcode space is fully occupied, across the full range of application programs that execute on legacy-compatible microprocessors, and have observed that some instructions 202, although architecturally specified, are not used in application programs that the microprocessor can execute. Instruction IF1 202 in FIG. 2 is one example of this phenomenon. In fact, the same opcode value 202 (i.e., F1H) maps to a valid instruction 202 that is unused in the x86 instruction set architecture. Although this unused x86 instruction 202 is a valid x86 instruction 202 that directs an architecturally specified operation on an x86 microprocessor, it is not used in any application program that can be executed on a modern x86 microprocessor. This particular x86 instruction 202 is known as In Circuit Emulation Breakpoint (i.e., ICE BKPT, opcode value F1H), and it was formerly used exclusively in a class of microprocessor emulation equipment that no longer exists. ICE BKPT 202 was never used in application programs outside of in-circuit emulators, and the in-circuit emulation equipment that formerly used ICE BKPT 202 no longer exists. Hence, in the x86 case, the present inventors have found a means within a fully occupied instruction set architecture 200 whereby a valid but unused opcode 202 allows advanced architectural features to be incorporated into a microprocessor design without sacrificing legacy software compatibility. Within a fully occupied instruction set architecture 200, the present invention uses an architecturally specified but unused opcode 202 as an indicator tag that identifies an n-bit prefix following it, thereby allowing microprocessor designers to incorporate up to 2ⁿ newly developed architectural features into a microprocessor design while retaining full compatibility with all legacy software.

(0039) The present invention employs the prefix tag/extended prefix concept by providing an n-bit extended non-temporal access specifier prefix, which allows a programmer to specify, on an instruction-by-instruction basis in a microprocessor, a non-temporal memory access for a corresponding memory reference operation. When the corresponding memory reference operation is executed, the non-temporal memory access is used in place of the cache-based access that would otherwise be performed according to a default attribute, where the default attribute is specified by memory characteristic descriptor tables/structures previously established by operating system programs. The present invention will now be discussed with reference to FIGS. 3 through 9.

(0040) Refer now to FIG. 3, which is a block diagram of an extended instruction format 300 according to the present invention. Much like the format 100 discussed with reference to FIG. 1, the extended instruction format 300 has a variable number of instruction entries 301-305, each set to a specific value, which together make up a specific instruction 300 for a microprocessor. The specific instruction 300 directs the microprocessor to perform a specific operation, such as adding two operands or moving an operand from memory to a register within the microprocessor. In general, the opcode entry 302 of the instruction 300 specifies the operation to be performed, and optional address specifier entries 303 follow the opcode 302 to supply additional information about that operation, such as how the operation is to be performed, which registers hold the operands, and the direct and indirect data used to compute the memory addresses of source/result operands. The instruction format 300 also allows a programmer to place prefix entries 301 before an opcode 302. The prefix entries 301 indicate whether existing architectural features are to be used when the operation specified by the opcode 302 is executed.

(0041) The extended instruction 300 of the present invention, however, is a superset of the instruction format 100 described above with reference to FIG. 1. It has two additional entries 304 and 305, which may optionally be provided as an instruction extension and which precede all of the remaining entries 301-303 in a formatted extended instruction 300. These two additional entries 304 and 305 allow a programmer to specify a non-temporal memory access for a memory reference prescribed by the extended instruction 300, where the non-temporal memory access for that memory reference cannot otherwise be specified by the existing instruction set of a legacy-compatible microprocessor. The optional entries 304 and 305 are an extended instruction tag 304 and an extended non-temporal specifier prefix 305. The extended instruction tag 304 is an otherwise architecturally specified opcode within a microprocessor instruction set. In an x86 embodiment, the extended instruction tag 304, or escape tag 304, is opcode state F1H, the formerly used ICE BKPT instruction. The escape tag 304 indicates to microprocessor logic that the extended prefix 305, or extended feature specifier 305, follows, where the extended prefix 305 specifies a non-temporal access corresponding to a specified memory reference (i.e., a load operation, a store operation, or both). In one embodiment, the escape tag 304 indicates that the accompanying parts 301-303 and 305 of the corresponding extended instruction 300 prescribe a memory reference to be performed by the microprocessor. The non-temporal access specifier 305, or extended prefix 305, specifies whether the non-temporal access is to be employed for a source operand load operation, a destination operand store operation, or both. Extended execution logic within the microprocessor then executes the operation prescribed by the memory reference by performing the non-temporal memory access, overriding the cacheable default memory attribute that would otherwise be specified by other means. These other means include the control register bits, memory type registers, page tables, and other types of memory attribute descriptors found in modern microprocessor architectures.
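The byte-level layout described above (escape tag 304, then extended prefix 305, then the legacy entries 301-303) can be sketched as follows. This is an illustrative Python sketch, not the patented decoder: the `Parsed` type, the `parse_extended` function, and the extended prefix value 03H in the usage note are hypothetical, and only the tag value F1H comes from the x86 embodiment in the text.

```python
from typing import NamedTuple, Optional

ESCAPE_TAG = 0xF1  # x86 embodiment: the formerly used ICE BKPT opcode


class Parsed(NamedTuple):
    extended_prefix: Optional[int]  # entry 305, or None for a legacy instruction
    remainder: bytes                # entries 301-303, left for the legacy decoder


def parse_extended(code: bytes) -> Parsed:
    """Split an instruction into its optional extension and its legacy remainder."""
    if code and code[0] == ESCAPE_TAG:
        # Tag, extended prefix, and at least one remaining byte are required.
        if len(code) < 3:
            raise ValueError("truncated extended instruction")
        return Parsed(code[1], code[2:])
    # No tag: the whole byte string follows the legacy format.
    return Parsed(None, code)
```

For instance, `parse_extended(bytes([0xF1, 0x03, 0xF3, 0xA4]))` yields the extended prefix 03H and leaves `F3 A4` (REP MOVSB) to be decoded by the unchanged legacy rules, which is what makes the extended format a superset of the legacy one.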

(0042) To summarize the non-temporal reference control technique of the present invention: an extended instruction is used to specify a non-temporal memory access for a memory reference of an existing microprocessor instruction set, where the non-temporal access for that memory reference cannot otherwise be specified by instructions of the existing instruction set. The extended instruction includes one of the opcodes/instructions 304 of the existing instruction set and an n-bit extended prefix 305. The selected opcode/instruction serves as an indicator 304 that the instruction 300 is an extended feature instruction 300 (i.e., it specifies extensions to the microprocessor architecture), and the n-bit feature prefix 305 indicates whether the non-temporal access applies to a source operand, a destination operand, or both. In one embodiment, the extended prefix 305 is eight bits in size and can specify the non-temporal access control feature in combination with up to 64 other extended features. An n-bit prefix embodiment can specify up to 2ⁿ⁻² other extended features in addition to the non-temporal access control feature.

(0043) Refer now to FIG. 4, where a table 400 shows how non-temporal access control features for a specified memory reference are mapped to the logic states of bits in an 8-bit extended prefix embodiment according to the present invention. Similar to the opcode map 200 discussed with reference to FIG. 2, the table 400 of FIG. 4 presents an exemplary 8-bit extended prefix map 400 that associates up to 256 values of an 8-bit extended prefix entry 305 with corresponding extended features 401 (e.g., E34, E4D) of a legacy-compatible microprocessor, two of which direct that non-temporal accesses be performed. In an x86 embodiment, the 8-bit extended feature prefix 305 of the present invention provides instruction-level control of non-temporal memory accesses 401 (i.e., E00-EFF), which the current x86 instruction set architecture cannot specify at the instruction level.

(0044) The extended features 401 shown in FIG. 4 are depicted generically rather than specifically, because the technique of the present invention is applicable to a variety of different architectural extensions 401 and specific instruction set architectures. Those skilled in the art will appreciate that many different architectural features 401, some of which are noted above, can be incorporated into an existing instruction set using the escape tag 304/extended prefix 305 technique described herein. The 8-bit prefix embodiment of FIG. 4 provides up to 256 different features 401, while an n-bit prefix embodiment allows programming of up to 2ⁿ different features 401.

(0045) Refer now to FIG. 5, which is a block diagram of a pipelined microprocessor 500 for performing non-temporal memory reference operations according to the present invention. The microprocessor 500 has three notable stage categories: fetch, translate, and execute. The fetch stage has fetch logic 501 that retrieves instructions from an instruction cache 502 or external memory 502. The retrieved instructions are passed to the translate stage through an instruction queue 503. The translate stage has translation logic 504 coupled to a micro instruction queue 506. The translation logic 504 includes extended translation logic 505. The execute stage has execution logic 507, which contains extended execution logic 508.

(0046) In operation according to the present invention, the fetch logic 501 retrieves formatted instructions from the instruction cache/external memory 502 and places them in the instruction queue 503 in execution order. The instructions are then retrieved from the instruction queue 503 and provided to the translation logic 504. The translation logic 504 translates/decodes each provided instruction into a corresponding sequence of micro instructions that directs the microprocessor 500 to perform the operations prescribed by the instruction. According to the present invention, the extended translation logic 505 detects instructions having an extended prefix tag and translates/decodes the corresponding extended non-temporal memory reference specifier prefixes. In an x86 embodiment, the extended translation logic 505 is configured to detect an extended prefix tag of value F1H, which is the x86 ICE BKPT opcode. Micro instruction fields are provided in the micro instruction queue 506 to allow specification of source/destination non-temporal accesses for the memory references prescribed by the accompanying parts of the instruction.

(0047) The micro instructions are provided from the micro instruction queue 506 to the execution logic 507. The extended execution logic 508 either performs a specified memory reference according to a default memory characteristic (as defined by existing memory characteristic descriptor means) or performs a non-temporal memory access programmed at the user level by the extended prefix of the present invention; in the latter case, as specified by the extended micro instruction fields, the access overrides the default memory characteristic and bypasses the cache entirely. In one embodiment, non-temporal store operations are processed in the same manner as store operations to address ranges having the write-combining attribute.
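The choice made by the extended execution logic can be pictured as a small selection function. The Python sketch below is an assumption-laden illustration: the function name `resolve_access` and the attribute strings are hypothetical labels, not values defined by the patent or by any x86 specification.

```python
def resolve_access(default_attribute: str, non_temporal: bool, is_store: bool) -> str:
    """Pick the memory behavior for one access.

    default_attribute: label for the descriptor-defined default (hypothetical string).
    non_temporal:      True if the extended micro instruction field requests it.
    is_store:          True for a destination (store) access, False for a load.
    """
    if not non_temporal:
        # No override requested: follow the default memory characteristic.
        return default_attribute
    if is_store:
        # In one embodiment, non-temporal stores are handled like stores to
        # address ranges with the write-combining attribute.
        return "write-combining"
    # Non-temporal load: bypass the cache entirely.
    return "non-temporal load (cache bypassed)"
```

The point of the sketch is only that the per-instruction request takes precedence over the default when present, and that the default is returned unchanged otherwise.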

(0048) Those skilled in the art will appreciate that the microprocessor 500 shown in FIG. 5 is a simplified representation of a modern pipelined microprocessor 500. In fact, a modern pipelined microprocessor 500 may comprise as many as 20 to 30 different pipeline stages. These stages, however, can be generally categorized into the three stage groups shown in the block diagram, so the block diagram 500 of FIG. 5 serves to identify the essential elements needed to implement the embodiments of the present invention described above. For clarity, extraneous elements of the microprocessor 500 are not shown.

(0049) Refer now to FIG. 6, which is a block diagram of one embodiment of an extended prefix 600 for specifying a non-temporal access for a programmed memory reference in a microprocessor according to the present invention. The non-temporal access specifier prefix 600 is 8 bits in size and includes a source field 601, a destination field 602, and a spare field 603. The source field 601 specifies that a non-temporal access is to be applied to the source operand memory access (i.e., load, read) prescribed by the remainder of the associated extended instruction, and the destination field 602 specifies that a non-temporal access is to be applied to the destination operand memory access (i.e., store, write) prescribed by that remainder. Those skilled in the art will appreciate that being able to specify source and destination non-temporal accesses separately is particularly useful in combination with repeated string instructions such as REP MOVS in the x86 architecture.
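A possible decoding of the three fields is sketched below in Python. The text does not give field widths or bit positions, so the layout used here (bit 0 for the source field 601, bit 1 for the destination field 602, bits 7:2 for the spare field 603) is purely an assumption for illustration, as are the names `SRC_BIT`, `DST_BIT`, and `decode_prefix`.

```python
# ASSUMED layout, not taken from the patent text:
#   bit 0    -> source field 601
#   bit 1    -> destination field 602
#   bits 7:2 -> spare field 603
SRC_BIT = 0x01
DST_BIT = 0x02


def decode_prefix(prefix: int) -> dict:
    """Decode an 8-bit non-temporal access specifier prefix under the assumed layout."""
    if not 0 <= prefix <= 0xFF:
        raise ValueError("prefix must fit in 8 bits")
    return {
        "source_non_temporal": bool(prefix & SRC_BIT),       # applies to loads/reads
        "destination_non_temporal": bool(prefix & DST_BIT),  # applies to stores/writes
        "spare": prefix >> 2,
    }
```

Under this assumed layout, a value of 03H would mark both the load and the store side of a string move as non-temporal, while 01H or 02H would mark only one side.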

(0050) Refer now to FIG. 7, which is a detailed block diagram of translate stage logic 700 within the microprocessor of FIG. 5. The translate stage logic 700 has an instruction buffer 704 that, according to the present invention, provides extended instructions to translation logic 705. The translation logic 705 is coupled to a machine specific register 702 having an extended feature field 703. The translation logic 705 has a translation controller 706 that provides a disable signal 707 to an escape instruction detector 708 and an extended decoder 709. The escape instruction detector 708 is coupled to the extended decoder 709 and to an instruction decoder 710. The extended decoder 709 and the instruction decoder 710 access a control read-only memory (ROM) 711, which stores template micro instruction sequences corresponding to certain extended instructions. The translation logic 705 also includes a micro instruction buffer 712 having an opcode extension field 713, a micro opcode field 714, a destination field 715, a source field 716, and a displacement field 717.

(0051) In operation, during power-up of the microprocessor, the state of the extended field 703 within the machine specific register 702 is established via a signal power-up state 701 to indicate whether the particular microprocessor is capable of translating and executing the extended instructions of the present invention for performing instruction-level non-temporal memory references. In one embodiment, the signal 701 is derived from a feature control register (not shown) that reads a fuse array (not shown) configured at fabrication. The machine specific register 702 provides the state of the extended feature field 703 to the translation controller 706. The translation controller 706 controls whether instructions retrieved from the instruction buffer 704 are interpreted according to extended translation rules or conventional translation rules. Providing such a control feature allows supervisory applications (e.g., BIOS) to enable or disable the extended execution features of the microprocessor. If the extended features are disabled, instructions having the opcode state selected as the extended feature tag are translated according to the conventional translation rules. In an x86 embodiment in which opcode state F1H is selected as the tag, encountering F1H under the conventional translation rules causes an illegal instruction exception. With extended translation disabled, the instruction decoder 710 translates/decodes all provided instructions and configures all fields 713-717 of the micro instruction 712. Under the extended translation rules, however, an occurrence of the tag is detected by the escape instruction detector 708. The escape instruction detector 708 directs the extended decoder 709 to translate/decode the extended prefix portion of the extended instruction according to the extended translation rules and to configure the opcode extension field 713 to indicate that the non-temporal memory access is to be applied to the memory reference prescribed by the remainder of the extended instruction. The instruction decoder 710 decodes/translates the remainder of the extended instruction and configures the micro opcode field 714, source field 716, destination field 715, and displacement field 717 of the micro instruction 712. Certain instructions cause an access to the control ROM 711 to obtain corresponding micro instruction sequence templates. Configured micro instructions 712 are provided to a micro instruction queue (not shown) for subsequent execution by the processor.
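The gating of extended translation by the extended feature field can be sketched as follows. This Python sketch is a simplification under stated assumptions: `MicroInstruction`, `IllegalInstruction`, and `translate` are hypothetical names, the micro opcode is simply the first remaining byte, and fields 715-717 are omitted because their encoding is not described in the text.

```python
from dataclasses import dataclass

ESCAPE_TAG = 0xF1  # x86 embodiment: opcode state selected as the tag


class IllegalInstruction(Exception):
    """Stands in for the illegal instruction exception raised by the processor."""


@dataclass
class MicroInstruction:
    opcode_extension: int  # models field 713; 0 means no non-temporal override
    micro_opcode: int      # models field 714; here just the legacy opcode byte


def translate(code: bytes, extended_enabled: bool) -> MicroInstruction:
    """Translate one instruction under conventional or extended rules."""
    if not code:
        raise ValueError("empty instruction")
    if code[0] == ESCAPE_TAG:
        if not extended_enabled:
            # Conventional rules: F1H is not accepted as a tag.
            raise IllegalInstruction("F1H encountered with extended features disabled")
        if len(code) < 3:
            raise ValueError("truncated extended instruction")
        # Escape detector found the tag; the extended decoder takes the prefix.
        return MicroInstruction(opcode_extension=code[1], micro_opcode=code[2])
    # No tag: the instruction decoder configures every field on its own.
    return MicroInstruction(opcode_extension=0, micro_opcode=code[0])
```

With `extended_enabled=False` the same byte stream that would otherwise produce an extended micro instruction raises the exception instead, which mirrors how a supervisory program could switch the feature off without changing how legacy code is translated.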

(0052) Refer now to FIG. 8, which is a block diagram of extended execute stage logic 800 within the microprocessor of FIG. 5. The extended execute stage logic 800 has extended access logic 805 that is coupled to a cache 807 via bus 812 and to a bus unit 808 via bus 813. The bus unit 808 directs memory transactions over a memory bus (not shown). According to the present invention, the extended access logic 805 receives micro instructions from an extended micro instruction buffer 801 in the preceding stage of the microprocessor, two address operands from address buffers 802 and 803, and a destination operand from a destination operand buffer 804. The extended access logic 805 is also coupled to a number of memory characteristic descriptors 806 configured according to the architectural protocol of the host microprocessor. The extended access logic 805 includes an access controller 809, store logic 810, and load logic 811. The load logic 811 includes a non-temporal load buffer 816 and outputs a source operand to a source operand buffer 815. The store logic 810 has a write-combining buffer 817.

(0053) In operation, the extended execution logic device 800 performs memory accesses, reading operands from memory and writing operands to memory, as directed by the microinstructions in the extended microinstruction buffer 801. For a read/load operation, the access controller 809 receives one or more memory addresses from address buffers 802 and 803 and reads the memory property descriptors 806 to determine the memory attributes associated with the load operation. In an x86 embodiment, the memory property descriptors 806 include the x86 cache and paging control registers, page directory and page table entries, the memory type range registers (MTRRs), the page attribute table (PAT), and the external signal pins KEN#, WB/WT#, PCD, and PWT. The access controller 809 uses the information obtained from these sources 806 to determine the default memory attribute for the load operation according to the x86 hierarchical memory attribute protocol. For non-x86 embodiments, the access controller 809 uses the information obtained from the memory property descriptors 806 to determine the default memory attribute for the load operation according to the hierarchical memory attribute protocol of the host microprocessor's particular architecture. The memory address, together with its corresponding access attribute, is sent to the load logic device 811. Depending on the attribute provided, the load logic device 811 obtains the source operand either from the cache 807 via bus 812 or directly from system memory (not shown) via the bus unit 808. The source operand is sent to the source operand buffer 815 in synchronization with a pipeline clock signal (not shown), and the extended microinstruction is likewise piped, in synchronization with the same clock signal, to the extended microinstruction register 814. In this way the source operand is passed to the next stage of the microprocessor.

(0054) For a write/store operation directed by an extended microinstruction, the access controller 809 receives the address data for the operation from address buffers 802 and 803 and receives the operand to be stored from buffer 804. The access controller 809 accesses the memory property descriptors 806, as described above, to determine the memory property that corresponds to the store access. The memory property, the address data, and the destination operand are then sent to the store logic device 810. Depending on the particular attribute provided, the store logic device 810 writes the destination operand to the cache 807 via bus 812, or directly to system memory via the bus unit 808.

(0055) The store logic device 810 and the load logic device 811 of the present invention perform store and load references in accordance with the processing requirements of the host processor's memory attribute model, where those requirements include strong/weak ordering protocols (e.g., speculative execution rules) and cache access policies. In one embodiment, load and store operations are performed in different pipeline stages of the host microprocessor.

(0056) For an extended instruction that uses a non-temporal memory parameter preamble, the non-temporal operand specifier for the associated memory parameter (i.e., load, store, or both load and store) is passed to the access controller 809 through the opcode extension field (not shown) of the extended microinstruction in the extended microinstruction buffer 801. As described above, the access controller 809 determines the default memory property of the specified memory access from the information obtained from the memory property descriptors 806. If the default property permits a non-temporal access operation (i.e., it is a cacheable property, such as write-back), the access controller 809 sends the non-temporal specifier, together with the address and/or destination operand, to the store logic device 810/load logic device 811. If the default property does not permit a non-temporal access operation (i.e., it is an uncacheable property), the access controller 809 instead sends the default property, together with the address and/or destination operand, to the store logic device 810/load logic device 811.
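The decision made by the access controller 809 can be illustrated with a short Python sketch. It is a hedged model only: the property names ("WB", "UC") and the contents of the `CACHEABLE` set are assumptions, since the text names write-back as one cacheable property and leaves the full set to the host architecture.

```python
# Assumed set of default properties that permit a non-temporal override.
# The text gives write-back as an example; other members are architecture-specific.
CACHEABLE = {"WB"}


def attribute_to_forward(default_property: str, nt_requested: bool) -> str:
    """Return what the access controller forwards to the load/store logic."""
    if nt_requested and default_property in CACHEABLE:
        return "NT"            # forward the non-temporal specifier
    return default_property    # otherwise the default property is kept
```

Under these assumptions a non-temporal request against write-back memory is forwarded as "NT", while the same request against uncacheable memory leaves the default property in force.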

(0057) If the default property permits a non-temporal access operation, then for a non-temporal load parameter the load logic device 811 first queries the cache 807 via bus 812 to determine whether the corresponding load operand is present and valid in the cache 807 (i.e., a load hit). If so, the load operation is performed according to the default memory property. If the load operand is not present in the cache 807, however, the load logic device 811 fetches the cache line containing the load operand from memory via the bus unit 808 and holds that cache line in the non-temporal load buffer 816, thereby bypassing the cache 807 entirely. The load operand is thus supplied to the source operand buffer 815 non-temporally.
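The load path just described can be modeled in a few lines of Python. This is an illustrative sketch rather than the disclosed circuit: the 32-byte line size, the single-line capacity of the buffer, and the names `LoadLogic` and `nt_load` are assumptions introduced for the example.

```python
LINE_SIZE = 32  # hypothetical cache line size in bytes


class LoadLogic:
    """Toy model of load logic 811 with cache 807 and non-temporal load buffer 816."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.cache = {}      # line address -> line bytes (stands in for cache 807)
        self.nt_buffer = {}  # line address -> line bytes (stands in for buffer 816)

    def nt_load(self, addr: int) -> int:
        line_addr = addr - addr % LINE_SIZE
        offset = addr - line_addr
        if line_addr in self.cache:
            # Load hit: served from the cache per the default property.
            return self.cache[line_addr][offset]
        # Load miss: fetch the line from memory and hold it only in the
        # non-temporal buffer, so the cache is never allocated.
        line = self.memory[line_addr:line_addr + LINE_SIZE]
        self.nt_buffer = {line_addr: line}  # assumed one-line buffer
        return line[offset]
```

After a missing `nt_load`, the model's `cache` is unchanged and the fetched line sits only in `nt_buffer`, which mirrors the bypass described in the paragraph above.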

(0058) For a non-temporal store parameter, the store logic device 810 first queries the cache 807 to determine whether a cache line corresponding to the store operand provided via the destination operand buffer 804 is present and valid in the cache 807 (i.e., a store hit). If so, the store operation is performed according to the default memory property rather than non-temporally. If the cache line is not present in the cache 807 (i.e., a store miss), however, the store logic device 810 does not allocate space in the cache 807 for the line and instead sends the store operand to the write-combining buffer 817. The contents of the write-combining buffer 817 are then written directly to memory via the bus unit 808, in accordance with the processor-specific hierarchical memory attribute processing protocol that applies to the write-combining memory property. In an x86 embodiment, the write-combining attribute allows writes to memory to be delayed and combined, without requiring coherency. The store operand is thus delivered to memory in a non-temporal manner.
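A comparable Python sketch illustrates the store path. It is a simplified model under stated assumptions: the default property on a hit is taken to be write-back (the store updates the cached line), the write-combining buffer is modeled as a per-address map that is drained explicitly, and the names `StoreLogic`, `nt_store`, and `drain` are hypothetical.

```python
LINE_SIZE = 32  # hypothetical cache line size in bytes


class StoreLogic:
    """Toy model of store logic 810 with cache 807 and write-combining buffer 817."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.cache = {}      # line address -> bytearray (stands in for cache 807)
        self.wc_buffer = {}  # address -> byte value (stands in for buffer 817)

    def nt_store(self, addr: int, value: int) -> None:
        line_addr = addr - addr % LINE_SIZE
        if line_addr in self.cache:
            # Store hit: handled per the default property (assumed write-back).
            self.cache[line_addr][addr - line_addr] = value
            return
        # Store miss: no cache line is allocated; the write is posted to the
        # write-combining buffer, where later writes to the same address merge.
        self.wc_buffer[addr] = value

    def drain(self) -> None:
        """Write the combined contents of the buffer directly to memory."""
        for addr, value in sorted(self.wc_buffer.items()):
            self.memory[addr] = value
        self.wc_buffer.clear()
```

Two missing stores to the same address occupy a single buffer entry, and memory changes only when `drain` runs, which is the delayed and combined behavior the paragraph attributes to the write-combining property.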

(0059) Referring now to FIG. 9, a flow chart 900 depicts the operation of a method according to the present invention for translating and executing instructions that enable a programmer to override, at the instruction level, default memory handling with non-temporal memory parameters within a microprocessor. Flow begins at block 902, where a program configured with extended feature instructions is provided to the microprocessor. Flow then proceeds to block 904.

(0060) At block 904, the next instruction is fetched from cache/external memory. Flow then proceeds to decision block 906.

(0061) At decision block 906, the instruction fetched at block 904 is examined to determine whether it contains an extended escape code according to the present invention. In an x86 embodiment, the check detects opcode value F1H (ICE BKPT). If the extended escape code is detected, flow proceeds to block 908. If it is not detected, flow proceeds to block 912.

(0062) At block 908, the extended preamble portion of the extended instruction is decoded/translated to determine whether a non-temporal access operation is to be applied, the non-temporal access operation being specified to override the default memory attribute of the associated memory parameter specified by the instruction fetched at block 904. Flow then proceeds to block 910.

(0063) At block 910, a non-temporal access operation specifier for the associated memory parameter is configured in the extension field of a corresponding microinstruction sequence. Flow then proceeds to block 912.

(0064) At block 912, all remaining portions of the instruction are decoded/translated to determine the specified memory parameter, the locations of register operands, the memory address specifiers, and the use of existing architectural features specified by prefixes according to the existing microprocessor instruction set. Flow then proceeds to block 914.

(0065) At block 914, a microinstruction sequence is configured to specify the specified memory parameter and its corresponding opcode extension. Flow then proceeds to block 916.

(0066) At block 916, the microinstruction sequence is sent to a microinstruction queue for execution by the microprocessor. Flow then proceeds to block 918.

(0067) At block 918, the microinstruction sequence is retrieved by an address logic device according to the present invention. The address logic device generates the address of the memory parameter and sends the address to the extended execution logic device. Flow then proceeds to block 920.

(0068) At block 920, the extended execution logic device uses the memory property descriptor mechanisms of the microprocessor architecture to determine a default memory property. Flow then proceeds to decision block 922.

(0069) At decision block 922, an evaluation is made to determine whether the cache/memory model of the microprocessor architecture allows the non-temporal access operation to override the default attribute. If the non-temporal access operation is allowed, flow proceeds to decision block 926. If it is not allowed, flow proceeds to block 924.

(0070) At block 924, the memory access is performed using the default memory attribute determined at block 920. Flow then proceeds to block 932.

(0071) At decision block 926, an evaluation is made to determine whether the cache line corresponding to the specified memory parameter is present and valid in the cache. If so, flow proceeds to block 928. If a cache miss occurs, flow proceeds to block 930.

(0072) At block 928, because the cache line corresponding to the memory parameter is present and valid in the cache, the memory access is performed through the cache using the default memory attribute determined at block 920. Flow then proceeds to block 932.

(0073) At block 930, the operation indicated by the memory parameter is performed using the non-temporal mechanisms (e.g., the non-temporal load buffer and/or the write-combining buffer). Flow then proceeds to block 932.

(0074) At block 932, the method completes.
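The branching of flow chart 900 can be summarized by a small Python function that returns the sequence of blocks visited. This is only a reading aid: the function name `flow_900` and its three boolean inputs are hypothetical, and the sketch assumes that an instruction without the extended escape code never requests an override and therefore takes the default path through block 924, a case the text does not spell out.

```python
def flow_900(has_escape: bool, override_allowed: bool, cache_hit: bool) -> list:
    """Return the block numbers of flow chart 900 visited for one instruction."""
    path = [902, 904, 906]
    if has_escape:
        # Extended escape code detected: decode the preamble and
        # configure the non-temporal specifier.
        path += [908, 910]
    path += [912, 914, 916, 918, 920, 922]
    if not (has_escape and override_allowed):
        path.append(924)       # default memory attribute is used
    elif cache_hit:
        path += [926, 928]     # line present and valid: access via the cache
    else:
        path += [926, 930]     # miss: non-temporal mechanisms are used
    path.append(932)
    return path
```

For an extended instruction whose override is allowed and whose line misses the cache, the path ends 922, 926, 930, 932, which is the only route that exercises the non-temporal load buffer or the write-combining buffer.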

(0075) Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are also encompassed by the invention. For example, the invention has been described in terms of a technique that uses a single, unused opcode state within a fully occupied instruction set architecture as a flag to identify a subsequent extended feature preamble. The scope of the invention, however, is in no way limited to fully occupied instruction set architectures, to unused instructions, or to a single flag. On the contrary, the invention covers instruction sets that are not fully mapped, embodiments that use opcodes already in use, and embodiments that use more than one instruction flag. Consider, for example, an instruction set architecture that has no unused opcode states. One embodiment of the invention selects an opcode state to serve as the escape flag, where the selection criteria are determined by market factors. Another embodiment uses a particular combination of opcodes as the flag, such as consecutive occurrences of opcode state 7FH. The essence of the invention is thus the use of a flag sequence followed by an n-bit extended preamble, which allows a programmer to specify, at the instruction level, memory attributes for memory accesses that cannot otherwise be provided by existing instructions of the microprocessor instruction set.
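The alternative of a multi-opcode flag can be sketched in Python as follows. The example is hypothetical in its details: it assumes the flag is exactly two consecutive 7FH bytes and that the extended preamble is a single byte (n = 8), neither of which is fixed by the text, and the name `split_extended` is introduced only for illustration.

```python
def split_extended(stream: bytes, flag: bytes = b"\x7f\x7f"):
    """Return (preamble, rest) if the stream begins with the flag sequence, else None."""
    if not stream.startswith(flag) or len(stream) <= len(flag):
        return None
    preamble = stream[len(flag)]      # assumed 8-bit extended preamble
    rest = stream[len(flag) + 1:]     # conventional remainder of the instruction
    return preamble, rest
```

A stream containing only a single 7FH is left to conventional decoding in this model, since the assumed flag requires two consecutive occurrences.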

(0076) In addition, although the present invention and its objects, features, and advantages have been explained above using a microprocessor as an example, those skilled in the art will appreciate that the scope of the invention is not limited to microprocessor architectures and extends to all forms of programmable devices, such as signal processors, industrial controllers, array processors, and the like.

In summary, the foregoing describes only preferred embodiments of the present invention and shall not be taken to limit the scope in which the invention may be practiced. All equivalent changes and modifications made in accordance with the claims of the present invention remain within the scope covered by this patent.

Claims

1. An apparatus providing instruction-level control of memory references in a microprocessor, characterized in that it comprises:

a translation logic device, for translating an extended instruction into a microinstruction sequence, wherein the extended instruction comprises:

an extended preamble, for specifying a non-temporal access operation for a memory reference specified by the extended instruction, wherein the non-temporal access operation cannot be specified by an existing instruction of an existing instruction set; and

an extended preamble flag, for identifying the extended preamble, wherein the extended preamble flag is an otherwise architecturally specified opcode within the existing instruction set; and

an extended execution logic device, coupled to the translation logic device, for receiving the microinstruction sequence and applying the non-temporal access operation to perform the operation indicated by the memory reference.

2. The apparatus of claim 1, characterized in that the extended instruction further comprises an instruction of the existing instruction set.

3. The apparatus of claim 2, characterized in that the instruction of the existing instruction set included in the extended instruction specifies an operation to be performed by the microprocessor, wherein the operation comprises the memory reference operation.

4. The apparatus of claim 1, characterized in that the memory reference comprises a parameter of a load operation, of a store operation, or of both the load operation and the store operation.

5. The apparatus of claim 1, characterized in that the non-temporal access operation directs the microprocessor to inhibit caching of data associated with the memory reference.

6. The apparatus of claim 1, characterized in that the extended execution logic device comprises a non-temporal access buffer, and the non-temporal access operation directs the microprocessor to inhibit caching of load data associated with the memory reference and to retain the load data in the non-temporal access buffer.

7. The apparatus of claim 1, characterized in that the non-temporal access operation directs the microprocessor to inhibit caching of store data associated with the memory reference and to perform the operation indicated by the memory reference as a store parameter, the store parameter having a write-combining memory attribute.

8. The apparatus of claim 1, characterized in that the extended preamble comprises:

a source field, for specifying that the non-temporal access operation is to be applied to a load parameter specified by the extended instruction; and

a destination field, for specifying that the non-temporal access operation is to be applied to a store parameter specified by the extended instruction.

9. The apparatus of claim 1, characterized in that the translation logic device comprises:

an escape instruction detection logic device, for detecting the extended preamble flag;

an instruction decode logic device, for determining the operation to be performed, wherein the operation comprises the memory reference operation; and

an extended decode logic device, coupled to the escape instruction detection logic device and the instruction decode logic device, for determining the non-temporal access operation and specifying the non-temporal access operation in the microinstruction sequence.

10. A microprocessor architecture that extends an existing instruction set to provide instruction-level non-temporal memory access control, characterized in that:

the microprocessor architecture includes an extended instruction for specifying a non-temporal access operation for a memory reference, wherein the extended instruction comprises a selected opcode of the existing instruction set followed by an n-bit extended preamble, the selected opcode identifying the extended instruction and the n-bit extended preamble identifying the non-temporal access operation, and wherein the non-temporal access operation for the memory reference cannot be specified by an instruction of the existing instruction set; and

the microprocessor architecture comprises a translator, for receiving the extended instruction and generating a microinstruction sequence that directs a microprocessor to perform the operation indicated by the memory reference by means of the non-temporal access operation, the microprocessor further comprising an extended execution logic device, coupled to the translator, for receiving the microinstruction sequence and applying the non-temporal access operation to perform the operation indicated by the memory reference.

11. The microprocessor architecture of claim 10, characterized in that the extended instruction further comprises:

a remaining instruction portion, for specifying the memory reference, wherein the memory reference comprises a load parameter, a store parameter, or both the load parameter and the store parameter.

12. The microprocessor architecture of claim 11, characterized in that the n-bit extended preamble comprises:

a first non-temporal specification field, for indicating that the non-temporal access operation is to be applied to the load parameter; and

a second non-temporal specification field, for indicating that the non-temporal access operation is to be applied to the store parameter.

13. The microprocessor architecture of claim 10, characterized in that the non-temporal access operation directs the microprocessor to inhibit caching of data associated with the memory reference.

14. The microprocessor architecture of claim 10, characterized in that the extended execution logic device comprises a non-temporal access buffer, and the non-temporal access operation directs the microprocessor to inhibit caching of load data associated with the memory reference and to send the load data to the non-temporal access buffer.

15. The microprocessor architecture of claim 10, characterized in that the non-temporal access operation directs the microprocessor to inhibit caching of store data associated with the memory reference and to perform the operation indicated by the memory reference using a write-combining access mode.

16. The microprocessor architecture of claim 10, characterized in that the translator comprises:

an escape instruction detector, for detecting the selected opcode in the extended instruction;

an instruction decoder, for decoding the portion of the extended instruction other than the selected opcode to determine the memory reference; and

an extended preamble decoder, coupled to the escape instruction detector and the instruction decoder, for decoding the n-bit extended preamble and specifying the non-temporal access operation in the microinstruction sequence.

17. An apparatus for adding instruction-level non-temporal access operation control features to an existing instruction set, characterized in that it comprises:

a translation logic device that receives an escape flag, the escape flag identifying that accompanying parts of a corresponding instruction specify a memory reference, wherein the escape flag is a first opcode of the existing instruction set, and wherein a non-temporal access operation specifier, coupled to the escape flag and being one of the accompanying parts, specifies that a non-temporal access operation is to be applied to the memory reference; and

an extended execution logic device, coupled to the translation logic device, that performs the operation indicated by the memory reference by means of the non-temporal access operation.

18. The apparatus of claim 17, characterized in that the remainder of the accompanying parts comprises a second opcode for specifying the memory reference.

19. The apparatus of claim 17, characterized in that the translation logic device translates the escape flag and the accompanying parts into corresponding microinstructions, the corresponding microinstructions directing the extended execution logic device to perform the operation indicated by the memory reference by means of the non-temporal access operation.

20. The apparatus of claim 17, characterized in that the translation logic device comprises:

an escape flag detection logic device, for detecting the escape flag and indicating that translation of the accompanying parts is to follow an extended translation protocol; and

a decode logic device, coupled to the escape flag detection logic device, for translating instructions according to the protocol of the existing instruction set and for translating the corresponding instruction according to the extended translation protocol, so as to direct that the operation indicated by the memory reference be performed by means of the non-temporal access operation.

21. A method of extending an existing instruction set architecture to provide instruction-level control of non-temporal memory references, characterized in that the method comprises:

providing an extended instruction, the extended instruction comprising an extended flag and an extended preamble, wherein the extended flag is a first opcode of the existing instruction set architecture;

specifying, by means of the extended preamble, a non-temporal access operation to be applied to a corresponding memory reference, wherein the memory reference is specified by the remainder of the extended instruction; and

applying the non-temporal access operation to perform the operation indicated by the memory reference, wherein said applying inhibits caching, in advance, of the data corresponding to the memory reference.

22. The method of claim 21, characterized in that said specifying comprises:

first specifying the memory reference in the remainder of the extended instruction, wherein said first specifying comprises using a second opcode of the existing instruction set architecture.

23. The method of claim 21, characterized in that it further comprises:

translating the extended instruction into a microinstruction sequence, the microinstruction sequence directing an extended execution logic device to perform the operation indicated by the memory reference by means of the non-temporal access operation.

24. The method of claim 23, characterized in that said translating the extended instruction comprises:

detecting the extended flag in a translation logic device; and

decoding the extended preamble and the remainder of the extended instruction according to extended translation rules to produce the microinstruction sequence.

25. The method of claim 21, characterized in that said specifying comprises:

specifying that the non-temporal access operation is to be applied to a source operand, a destination operand, or both.