
CN1842770A - A holistic mechanism for suspending and releasing threads of computation during execution in a processor - Google Patents


Info

Publication number
CN1842770A
CN1842770A · CN 200480024800 · CN200480024800A
Authority
CN
China
Prior art keywords
thread
parameter
instruction
parameters
relevant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200480024800
Other languages
Chinese (zh)
Inventor
Kevin D. Kissell (凯文·基塞尔)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MIPS Tech LLC
Original Assignee
MIPS Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MIPS Technologies Inc filed Critical MIPS Technologies Inc
Priority to CN201210164802.7A priority Critical patent/CN102880447B/en
Publication of CN1842770A publication Critical patent/CN1842770A/en
Pending legal-status Critical Current

Landscapes

  • Executing Machine-Instructions (AREA)
  • Multi Processors (AREA)
  • Advance Control (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A processing mechanism in a processor capable of supporting and executing multiple program threads, comprising a parameter for scheduling a program thread and an instruction in the program thread, the instruction having access to the parameter. When the parameter equals a first value and the program thread issues the instruction, the instruction reschedules the program thread in accordance with one or more conditions encoded in the parameter.

Description

A Holistic Mechanism for Suspending and Releasing Threads of Computation During Execution in a Processor

Cross-Reference to Related Applications

This application claims priority from the following applications:

(1) U.S. Provisional Application No. 60/499,180, filed August 28, 2003, entitled "Multithreading Application Specific Extension" (attorney docket P3865, inventor Kevin D. Kissell, Express Mail No. EV 315085819 US);

(2) U.S. Provisional Application No. 60/502,358, filed September 12, 2003, entitled "Multithreading Application Specific Extension to a Processor Architecture" (attorney docket 0188.02US, inventor Kevin D. Kissell, Express Mail No. ER456368993US); and

(3) U.S. Provisional Application No. 60/502,359, filed September 12, 2003, entitled "Multithreading Application Specific Extension to a Processor Architecture" (attorney docket 0188.03US, inventor Kevin D. Kissell, Express Mail No. ER456369013US). The entire contents of each of the above applications are incorporated herein by reference.

This application is also related to co-pending U.S. Nonprovisional Application No. (not yet received), filed October 10, 2003, entitled "Mechanisms for Assuring Quality of Service for Programs Executing on a Multithreaded Processor" (attorney docket 3865.01, inventor Kevin D. Kissell, Express Mail No. EL988990749 US), the entire contents of which are incorporated herein by reference.

Technical Field

The present invention is in the field of digital processors (e.g., microprocessors, digital signal processors, microcontrollers, and the like), and pertains more particularly to apparatus and methods for managing the execution of multiple threads in a single processor.

Technical Background

In the field of digital computing, the history of computing power shows continuous progress on all fronts. Advances occur constantly, for example in processor device density and interconnect technology, and can be applied to improve computation speed, fault tolerance, clock frequency, and much else. Another area of research that can improve overall computing power is parallel processing, which includes, but is not limited to, performing parallel operations on multiple separate processors.

The concept of parallel processing includes distributing tasks over several separate processors, but it also includes schemes in which multiple programs execute concurrently on a single processor. Such schemes are generally referred to as multithreading.

The concept of multithreading can be introduced as follows: as processor operating frequencies rise, it becomes increasingly difficult to hide the latencies inherent in the operation of a computer system. A high-end processor that misses in its high-speed data cache on one percent of the instructions of a given application, and that sees a 50-cycle latency to off-chip RAM, can end up stalled roughly fifty percent of the time. If instructions belonging to a different application program could execute while the processor is stalled on such a cache miss, processor performance would improve and some or all of the memory-related latency would effectively be hidden. For example, FIG. 1A shows a single instruction stream 101 stalled on a cache miss. A machine supporting only this stream can execute just one thread or task at a time. By contrast, FIG. 1B shows instruction stream 102 executing while instruction stream 101 is stalled. Here the machine supports two threads at once and therefore makes more effective use of its resources.
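The figures in this example can be sanity-checked with a short calculation. The sketch below is not part of the patent; it computes the fraction of cycles spent stalled from a miss rate, a miss penalty, and an assumed base CPI (the fifty-percent figure implicitly depends on how many instructions per cycle the core would otherwise retire).

```python
def stall_fraction(miss_rate, penalty_cycles, base_cpi=1.0):
    """Fraction of total cycles spent stalled on cache misses."""
    stall_per_instr = miss_rate * penalty_cycles  # expected stall cycles per instruction
    return stall_per_instr / (base_cpi + stall_per_instr)

# 1% miss rate, 50-cycle off-chip penalty:
print(stall_fraction(0.01, 50, base_cpi=1.0))  # about one third for a single-issue core
print(stall_fraction(0.01, 50, base_cpi=0.5))  # one half for an ideal 2-wide core
```

With a base CPI of 1.0 the stall fraction is about a third; the fifty-percent figure emerges for a wider core with a lower base CPI, such as 0.5.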

More generally, individual computer instructions have specific semantics, so that different kinds of instructions require different resources to carry out the desired operation. An integer load makes no use of the logic or registers of a floating-point unit, and register-to-register operations such as shifts make no use of the load/store unit's resources. No single instruction uses all of a processor's resources, and as more pipeline stages and parallel functional units are added in pursuit of higher-performance designs, the average fraction of the processor's resources actually consumed per instruction shrinks further.

Multithreading developed in large part from the observation that, if a single sequential program fundamentally cannot make full and efficient use of a processor's resources, the processor should be able to share some of those resources among several threads of program execution. The result does not necessarily make any particular program run faster; indeed, some multithreading schemes actually degrade the performance of a single-threaded program. It does, however, allow a collection of concurrent instruction streams to run in less total time and/or on a smaller number of processors. This concept is illustrated in FIGS. 2A and 2B, which show a single-threaded processor 210 and a dual-threaded processor 250, respectively. Processor 210 supports a single thread 212, shown using load/store unit 214. If a miss occurs on an access to cache 216, processor 210 stalls (as described with respect to FIG. 1A) until the missing data is retrieved. During this time, multiplier/divider unit 218 sits idle and unused. Processor 250, by contrast, supports two threads, 212 and 262. If thread 212 stalls, processor 250 can still execute thread 262 using multiplier/divider unit 218, thereby making more effective use of all its resources (as described with respect to FIG. 1B).

Multithreading a single processor brings the obvious benefit of better multitasking. Beyond that, binding program threads to critical events can reduce event response time, and thread-level parallelism can, in principle, be exploited within a single application program.

Various approaches to multithreading have been proposed. One is interleaved multithreading, a time-division multiplexed (TDM) scheme that switches from one thread to another on each instruction issued. The scheme imposes a degree of "fairness" in scheduling, but statically allocating issue slots among threads generally limits the performance of any single program thread. Dynamic interleaving mitigates this problem but is more complex to implement.

Another multithreading scheme is blocked multithreading, which issues consecutive instructions from a single program thread until some designated blocking event, such as a cache miss or a reset, causes that thread to be suspended and another thread activated. Because blocked multithreading changes threads less frequently, it can be simpler to implement. On the other hand, blocking is less "fair" in scheduling threads: a single thread can monopolize the processor for a long time if it is lucky enough to find all the data it needs in the cache. Hybrid scheduling schemes combining properties of blocked and interleaved multithreading have also been built and studied.
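The contrast between the two policies can be made concrete with a toy scheduler. This is a hypothetical sketch, not anything defined in the patent: threads are modeled as plain instruction counts, and the blocking event of blocked multithreading is approximated by a fixed quantum.

```python
def interleaved(threads):
    """TDM interleaving: switch to the next thread on every issued instruction."""
    trace = []
    while any(threads.values()):
        for name in list(threads):
            if threads[name] > 0:
                trace.append(name)        # issue one instruction from this thread
                threads[name] -= 1
    return trace

def blocked(threads, block_every):
    """Blocked multithreading: run one thread until a blocking event occurs
    (approximated here as happening every `block_every` instructions)."""
    trace = []
    names = list(threads)
    i = 0
    while any(threads.values()):
        name = names[i % len(names)]
        issued = 0
        while threads[name] > 0 and issued < block_every:
            trace.append(name)
            threads[name] -= 1
            issued += 1
        i += 1                            # "blocked": hand off to the next thread
    return trace

print(interleaved({"A": 3, "B": 3}))      # A and B alternate on every slot
print(blocked({"A": 3, "B": 3}, 2))       # A runs until "blocked", then B, and so on
```

The interleaved trace alternates strictly, while the blocked trace shows runs of the same thread, which is exactly the fairness trade-off described above.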

Yet another form of multithreading is simultaneous multithreading, a scheme implemented on superscalar processors. In simultaneous multithreading, multiple instructions from different threads may be issued concurrently. Consider, for example, a superscalar reduced instruction set computer (RISC) processor that issues up to two instructions per cycle, and a simultaneously multithreaded superscalar pipeline that issues up to two instructions per cycle, drawn from either of two threads. Cycles in which a single program thread has dependencies or stalls would leave the processor underutilized; under simultaneous multithreading, those issue slots can be filled with instructions from another thread.

Simultaneous multithreading is thus a very powerful technique for recovering the efficiency otherwise lost in superscalar pipelines. It is also arguably the most complex of the multithreading systems to implement, since having more than one thread active in a given cycle complicates, among other things, the implementation of memory access protection. It is also worth noting that, for a given workload, the more perfectly pipelined a CPU's operation already is, the smaller the potential efficiency gain a multithreaded implementation can provide.

Multithreading and multiprocessing are closely related. Indeed, it could be argued that the difference is one of degree: multiprocessors share only memory and/or interconnect, whereas multithreaded processors share instruction fetch and issue logic, and potentially other processor resources, in addition to memory and/or interconnect. In a single multithreaded processor, the various threads compete with one another for issue slots and other resources, which limits parallelism. Some multithreaded programming and architectural models assume that new threads are assigned to distinct processors, so that the program can be executed with genuine parallelism.

At the time of filing of the present application, many multithreading solutions existed to address a variety of problems in the field. One such problem concerns real-time threads. Real-time multimedia algorithms are generally run on dedicated processors or digital signal processors (DSPs) to guarantee quality of service (QoS) and response time, and such threads are not mixed into and shared within a multithreading scheme, because real-time software cannot easily obtain guarantees that it will execute in a timely manner.

In this regard, it is clearly desirable to have a scheme and mechanism by which one or more real-time threads or virtual processors can be guaranteed a specified proportion of the instruction issue slots of a multithreaded processor, at a specified maximum interval between instructions, so that compute bandwidth and response time are well defined. If such a mechanism were available, threads with strict QoS requirements could be included in the multithreaded mix. Moreover, real-time threads in such a system (e.g., DSP-related threads) could be exempted, to a greater or lesser degree, from the variation in execution time caused by interrupts displacing critical resources. In some cases this technology would allow a single RISC processor core with DSP enhancements to replace the separate RISC and DSP cores commonly used in consumer multimedia applications.

Another problem with state-of-the-art multithreading schemes at the time of filing concerns the creation and destruction of active threads in a processor. To support reasonably fine-grained multithreading, it is desirable to create and destroy parallel threads of program execution with the smallest possible overhead and, at least in the usual case, without disturbing necessary operating-system functions. What is clearly needed in this regard are instructions such as FORK (thread create) and JOIN (thread terminate). A further problem in multithreaded processors is that, under a scheduling policy in which a thread runs until it is blocked on some resource, a thread that is not blocked on any resource still needs a way to make the processor switch to other threads. In this regard there is clearly also a need for a PAUSE or YIELD instruction.
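As an informal software analogy only (the patent describes hardware instructions operating on thread contexts, not an operating-system threading API), the FORK/JOIN lifecycle it motivates resembles thread creation and joining in a conventional threading library:

```python
import threading

results = []

def worker(n):
    # The "forked" thread performs some independent work.
    results.append(n * n)

# FORK: spawn a parallel thread of execution with minimal ceremony.
t = threading.Thread(target=worker, args=(7,))
t.start()

# JOIN: wait for the child thread to terminate before continuing.
t.join()
print(results)  # [49]
```

The point of the hardware instructions is to obtain this spawn/terminate behavior without the cost of an operating-system call on every thread creation.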

Summary of the Invention

A basic object of the present invention is to provide a robust system suited to fine-grained multithreading, in which threads can be created and destroyed with minimal overhead. To that end, in a preferred embodiment of the invention, in a processor capable of supporting and executing multiple program threads, a processing mechanism is provided comprising: a parameter for scheduling a program thread; and an instruction, situated in the program thread, having access to the parameter. When the parameter equals a first value, the instruction reschedules the program thread in accordance with one or more conditions encoded in the parameter. In a preferred embodiment of the mechanism, the parameter is stored in a data storage device. In another preferred embodiment, when the parameter equals a second value, the second value not being equal to the first value, the instruction releases the program thread. In some embodiments the second value is zero.

In some embodiments, when the parameter equals the second value, the second value not being equal to the first value, the instruction unconditionally reschedules the program thread. In some such embodiments the second value is an odd value; in some other preferred embodiments the second value is negative one.
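Taken together, the encodings described so far (a zero parameter releases the thread; negative one, or in some embodiments any odd value, requeues it unconditionally; other values act as a bit vector of wait conditions) suggest a dispatch rule like the toy model below. The function and result labels are illustrative, not taken from the patent.

```python
RELEASE, REQUEUE, WAIT = "release", "requeue", "wait"

def yield_action(param, asserted_conditions=0):
    """Toy model of the thread-rescheduling instruction's parameter handling.

    param == 0   -> release (terminate) the issuing thread
    param == -1  -> unconditionally requeue the thread for scheduling
    otherwise    -> treat param as a bit vector of conditions; requeue the
                    thread only once at least one encoded condition is asserted
    """
    if param == 0:
        return RELEASE
    if param == -1:
        return REQUEUE
    return REQUEUE if (param & asserted_conditions) else WAIT

print(yield_action(0))               # release
print(yield_action(-1))              # requeue
print(yield_action(0b0100, 0b0100))  # requeue: the awaited condition is asserted
print(yield_action(0b0100, 0b0010))  # wait: condition not yet asserted
```

In the conditional case, once a condition fires, execution would resume at the point following the instruction, as the embodiments below describe.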

In some preferred embodiments, one of the one or more conditions relates to the program thread yielding execution to other threads until that condition is satisfied. In some embodiments the condition is encoded in a bit vector or bit field within the parameter. Further, in some embodiments, where the program thread is rescheduled, execution of the program thread continues at the point in the thread following the instruction. In still other preferred embodiments, when the parameter equals a third value, the third value being equal to neither the first value nor the second value, the instruction unconditionally reschedules the program thread.

In certain preferred embodiments of the mechanism, one of the one or more conditions is a hardware interrupt. In some embodiments, one of the one or more conditions is a software interrupt. And in many embodiments, where the program thread is rescheduled, execution of the program thread continues at the location in the thread following the instruction.

According to another aspect of the invention, in a processor capable of supporting and executing multiple program threads, a method is provided by which a thread reschedules its own execution or releases itself, comprising: (a) issuing an instruction that accesses a portion of a record in a data storage device, the portion encoding one or more parameters relating to one or more conditions that determine whether the thread will be rescheduled; and (b) subject to those conditions, rescheduling or releasing the thread according to the one or more parameters in the portion of the record. In a preferred embodiment the record resides in a general-purpose register (GPR). Also, in a preferred embodiment, one of the parameters relates to the thread being released rather than rescheduled. In some preferred embodiments the value of the parameter associated with releasing the thread is zero.

In some embodiments of the method, one of the parameters relates to the thread being requeued for scheduling. In some embodiments that parameter is any odd value; in other embodiments it is the two's-complement value of negative one. In some embodiments one of the parameters relates to yielding execution to other threads until a particular condition is satisfied. And in other embodiments that condition is encoded in a bit vector or in one or more bit fields within the record.

In many embodiments of the method, where the thread issues the instruction and is rescheduled, execution of the thread continues, once the one or more conditions are satisfied, at the point in the thread's instruction stream following the issued instruction. In some embodiments, one of the parameters relates to the thread being released rather than rescheduled, and another relates to the thread being requeued for scheduling. In other embodiments, one parameter relates to the thread being released rather than rescheduled, and another relates to yielding execution to other threads until a particular condition is satisfied. In still other embodiments, one parameter relates to the thread being requeued for rescheduling, and another relates to yielding execution to other threads until a particular condition is satisfied. And in yet other embodiments, one parameter relates to the thread being released rather than rescheduled, another relates to the thread being requeued for scheduling, and yet another relates to yielding execution to other threads until a particular condition is satisfied.

According to another aspect of the invention, a digital processor capable of supporting and executing multiple software entities is provided, comprising a portion of a record in a data storage device, the portion encoding one or more parameters relating to one or more conditions, the one or more conditions determining whether a thread that yields execution to other threads will be rescheduled.

In some embodiments of the processor, the portion of the record resides in a general-purpose register (GPR). In some other preferred embodiments, one of the parameters relates to the thread being released rather than rescheduled. In further preferred embodiments, the value of the parameter associated with releasing the thread is zero.

In other embodiments of the processor, one of the parameters relates to the thread being requeued for scheduling. In some embodiments that parameter is any odd value; in others it is the two's-complement value of negative one. In still other embodiments, one of the parameters relates to a thread yielding execution to other threads until a particular condition is satisfied. Additionally, in some cases, that parameter may be encoded in a bit vector or in one or more bit fields within the record.

In other embodiments of the processor, one of the parameters relates to the thread being released rather than rescheduled, and another relates to the thread being requeued for scheduling. In other embodiments, one parameter relates to the thread being released rather than rescheduled, and another relates to yielding execution to other threads until a particular condition is satisfied. In still other embodiments, one parameter relates to the thread being requeued for rescheduling, and another relates to yielding execution to other threads until a particular condition is satisfied.

In yet other embodiments, one of the parameters relates to the thread being released rather than rescheduled, another relates to the thread being requeued for scheduling, and yet another relates to yielding execution to other threads until a particular condition is satisfied.

According to another aspect of the invention, a processing system capable of supporting and executing multiple program threads is provided, comprising: a digital processor; a portion of a record in a data storage device, the portion encoding one or more parameters relating to one or more conditions that determine whether a thread will be rescheduled; and an instruction set including an instruction for rescheduling or releasing the thread. When the thread issues the instruction, the instruction accesses the one or more parameters in the record, and the system, subject to the one or more conditions, reschedules or releases the issuing thread according to the one or more parameters in the portion of the record.

In some preferred embodiments of the processing system, the record resides in a general-purpose register (GPR). In other preferred embodiments, one of the parameters relates to the thread being released rather than rescheduled; in some embodiments the value of that parameter is zero. In still other embodiments, one of the parameters relates to the thread being requeued for scheduling. In some embodiments the rescheduling parameter is any odd value; in others it is the two's-complement value of negative one.

In some embodiments of the system, one of the parameters relates to a thread yielding execution to other threads until a particular condition is satisfied. In some embodiments that parameter is encoded in a bit vector or in one or more bit fields within the record. In many embodiments of the system, where a thread issues the instruction and is conditionally rescheduled, execution of the thread continues, once the one or more conditions are satisfied, at the point in the thread's instruction stream following the instruction.

In some embodiments of the processing system, one of the parameters relates to the thread being released rather than rescheduled, and another relates to the thread being requeued for scheduling. In other embodiments, one parameter relates to the thread being released rather than rescheduled, and another relates to yielding execution to other threads until a particular condition is satisfied.

In still other embodiments, one of the parameters relates to the thread being requeued for rescheduling, and another relates to yielding execution to other threads until a particular condition is satisfied. And in yet other embodiments, one parameter relates to the thread being released rather than rescheduled, another relates to the thread being requeued for scheduling, and yet another relates to yielding execution to other threads until a particular condition is satisfied.

In addition, according to yet another aspect of the present invention, a digital storage medium is provided, having written thereon instructions from an instruction set for executing individual software threads of a plurality of software threads on a digital processor. The instruction set includes an instruction that causes the thread issuing it to relinquish execution and to access a parameter in a portion of a record in a data storage device, wherein a condition for release or rescheduling is associated with the parameter, and, subject to that condition, release or rescheduling is performed according to the parameter in the portion of the record.

In some embodiments of the medium, the record resides in a general-purpose register (GPR). In some other embodiments of the medium, one of the parameters relates to a thread that is released rather than rescheduled. In some embodiments, the value of the parameter associated with the released thread is zero. In some other embodiments of the medium, one of the parameters relates to a thread that is requeued for scheduling. In some embodiments, that parameter is any odd value. In other embodiments, that parameter is the two's-complement value of negative one.

In other embodiments of the medium, one of the parameters relates to a thread that yields execution opportunities to other threads until a specific condition is satisfied. In other embodiments, the parameter is encoded in a bit vector or in one or more bit fields in the record. In still other embodiments, one of the parameters relates to a thread that is released rather than rescheduled, and another of the parameters relates to a thread that is requeued for scheduling. In yet other embodiments, one of the parameters relates to a thread that is released rather than rescheduled, and another of the parameters relates to a thread that yields execution opportunities to other threads until a specific condition is satisfied.

In some embodiments of the mechanism, one of the parameters relates to a thread that is requeued for rescheduling, and another of the parameters relates to a thread that yields execution opportunities to other threads until a specific condition is satisfied. Additionally, in some embodiments of the digital storage medium, one of the parameters relates to a thread that is released rather than rescheduled, another of the parameters relates to a thread that is requeued for scheduling, and yet another of the parameters relates to a thread that yields execution opportunities to other threads until a specific condition is satisfied.

In some embodiments of the mechanism, the instruction is a YIELD instruction. Also, in some embodiments of the mechanism, the portion of the record comprises a bit vector. Additionally, in other embodiments of the mechanism, the portion of the record comprises one or more multi-bit fields.

In some embodiments of the method, the instruction is a YIELD instruction. Also, in some embodiments of the processing system, the instruction is a YIELD instruction.

In embodiments of the digital storage medium, the instruction is a YIELD instruction.

According to another aspect of the present invention, a computer data signal embodied in a transmission medium is provided, comprising computer-readable program code describing a processor capable of supporting and executing multiple program threads and including a mechanism for releasing and rescheduling a thread. The program code comprises: a first program code segment describing a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether a thread will be rescheduled; and a second program code segment describing an instruction capable of accessing the one or more parameters in the record, wherein, when the thread issues the instruction, the instruction accesses the one or more values in the record and, subject to the one or more conditions, reschedules or releases the thread according to the one or more values.

According to another aspect of the present invention, in a processor capable of supporting multiple program threads, a method is provided, comprising: executing an instruction that accesses a parameter related to thread scheduling, wherein the instruction is contained in a program thread; and, when the parameter equals a first value, releasing the program thread according to the instruction. In some embodiments of the method, the first value is zero. In other embodiments, the method further comprises suspending execution of the program thread according to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value. In some embodiments of the method, the second value indicates that a condition required for executing the program thread is not satisfied.

In some other embodiments of the method, the condition is encoded in the parameter in the form of a bit vector or a value field. In some other embodiments, the program thread is rescheduled according to the instruction when the parameter equals a third value, wherein the third value is equal to neither the first value nor the second value. In some other embodiments, the third value is negative one. In still other embodiments, the third value is an odd value.
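The value-based dispatch described in the embodiments above — zero releases the thread, an odd value such as negative one reschedules it, and other values encode wait conditions — can be sketched as follows. This is a minimal illustrative model, not part of the specification; the function name and the assumption of a 32-bit register are the author's additions.

```python
def yield_disposition(rs_value, width=32):
    """Hypothetical dispatch on a YIELD-style scheduling parameter.

    Assumed encoding, following the embodiments in the text: zero releases
    the thread, any odd value (including the two's-complement encoding of
    -1) reschedules it, and any other value is treated as a condition mask
    the thread suspends on.
    """
    rs = rs_value & ((1 << width) - 1)  # model a fixed-width register
    if rs == 0:
        return "release"      # thread context freed for a later FORK
    if rs & 1:
        return "reschedule"   # odd values, e.g. -1, requeue the thread
    return "suspend"          # even, nonzero: wait on encoded conditions
```

Note that the two's-complement encoding of -1 is all ones and therefore odd, so the "negative one" and "odd value" embodiments select the same branch here.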

According to another aspect of the present invention, in a processor capable of supporting multiple program threads, a method is provided, comprising: executing an instruction that accesses a parameter related to thread scheduling, wherein the instruction is contained in a program thread; and, when the parameter equals a first value, suspending execution of the program thread according to the instruction. In some other embodiments of the method, the method further comprises rescheduling the program thread according to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value.

According to another aspect of the present invention, in a processor capable of supporting multiple program threads, a method is provided, comprising: executing an instruction that accesses a parameter related to thread scheduling, wherein the instruction is contained in a program thread; and, when the parameter equals a first value, rescheduling the program thread according to the instruction. In some embodiments of the method, the method further comprises releasing the program thread according to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value.

Embodiments of the present invention, described in more detail below, provide for the first time a truly robust system for fine-grained multithreading, one that minimizes the overhead of creating and destroying threads.

Brief Description of the Drawings

FIG. 1A is a diagram showing a single instruction stream stalled by a cache miss;

FIG. 1B is a diagram showing an instruction stream that can still execute while the instruction stream of FIG. 1A is stalled;

FIG. 2A is a diagram showing a single-threaded processor;

FIG. 2B is a diagram showing a dual-threaded processor 250;

FIG. 3 is a diagram depicting a processor supporting first and second VPEs, according to an embodiment of the present invention;

FIG. 4 is a diagram depicting a processor supporting a single VPE, which in turn supports three threads, according to an embodiment of the present invention;

FIG. 5 shows the format of a FORK instruction according to an embodiment of the present invention;

FIG. 6 shows the format of a YIELD instruction according to an embodiment of the present invention;

FIG. 7 is a table showing a sixteen-bit qualifier mask for GPR rs;

FIG. 8 shows the format of an MFTR instruction according to an embodiment of the present invention;

FIG. 9 is a table describing the fields of an MFTR instruction according to an embodiment of the present invention;

FIG. 10 shows the format of an MTTR instruction according to an embodiment of the present invention;

FIG. 11 is a table describing the u and sel bits of an MTTR instruction according to an embodiment of the present invention;

FIG. 12 shows the format of an EMT instruction according to an embodiment of the present invention;

FIG. 13 shows the format of a DMT instruction according to an embodiment of the present invention;

FIG. 14 shows the format of an ECONF instruction according to an embodiment of the present invention;

FIG. 15 is a table describing system coprocessor privileged resources according to an embodiment of the present invention;

FIG. 16 shows the layout of a ThreadControl register according to an embodiment of the present invention;

FIG. 17 is a table describing the fields of the ThreadControl register layout according to an embodiment of the present invention;

FIG. 18 shows the layout of a ThreadStatus register according to an embodiment of the present invention;

FIG. 19 is a table describing the fields of the ThreadStatus register layout according to an embodiment of the present invention;

FIG. 20 shows the layout of a ThreadContext register according to an embodiment of the present invention;

FIG. 21 shows the layout of a ThreadConfig register according to an embodiment of the present invention;

FIG. 22 is a table describing the fields of the ThreadConfig register layout according to an embodiment of the present invention;

FIG. 23 shows the layout of a ThreadSchedule register according to an embodiment of the present invention;

FIG. 24 shows the layout of a VPESchedule register according to an embodiment of the present invention;

FIG. 25 shows the layout of a Config4 register according to an embodiment of the present invention;

FIG. 26 is a table describing the fields of the Config4 register layout according to an embodiment of the present invention;

FIG. 27 is a table defining the Cause register exception code values required for thread exceptions;

FIG. 28 is a table defining ITC indicators;

FIG. 29 is a table defining the fields of the Config3 register layout;

FIG. 30 is a table describing the VPE inhibit bit of each VPE context;

FIG. 31 is a table describing the operation of ITC storage;

FIG. 32 is a diagram depicting the operation of the YIELD function according to an embodiment of the present invention;

FIG. 33 is a diagram depicting a computer operating system according to an embodiment of the present invention;

FIG. 34 is a diagram depicting scheduling implemented using VPEs within a processor and threads within a VPE, according to an embodiment of the present invention.

Detailed Description

According to a preferred embodiment of the present invention, a processor architecture includes an instruction set comprising features, functions, and instructions that enable multithreaded operation on a compatible processor. The present invention is not limited to any particular processor architecture or instruction set, but may be broadly described with reference to the well-known MIPS architecture, instruction set, and processor technology (collectively, MIPS technology), and the embodiments described in detail herein are framed in terms of MIPS technology. Additional information on MIPS technology (including the documents referenced below) is available from MIPS Technologies, Inc. (Mountain View, California) and on its website, www.mips.com (the company website).

The terms "processor" and "digital processor" are meant to include any programmable device (for example, a microprocessor, microcontroller, digital signal processor, central processing unit, processor core, and so on), whether implemented in hardware (e.g., dedicated silicon, field-programmable gate arrays (FPGAs), etc.), in software (e.g., a hardware description language, C, C++, etc.), or in any combination thereof.

The terms "thread" and "program thread" are used synonymously herein.

General Description

In embodiments of the present invention, a "thread context" is a collection of processor state used to describe the execution state of an instruction stream on a processor. That state is typically reflected in the contents of processor registers. For example, in a processor compatible with the industry-standard MIPS32 and/or MIPS64 instruction set architectures (a MIPS processor), a thread context consists of the general-purpose registers (GPRs), the Hi/Lo multiply result registers, some program counter (PC) functionality, and some associated privileged system control state. The system control state is held in the portion of a MIPS processor commonly called Coprocessor 0 (CP0), and is largely maintained in the system control registers and the Translation Lookaside Buffer (TLB), if a TLB is used. By contrast, a "processor context" is a larger collection of processor state, comprising at least one thread context. Referring again to the MIPS processor example, a processor context comprises at least one thread context (as described above), together with the CP0 and system state necessary to describe the known MIPS32 and MIPS64 Privileged Resource Architectures (PRAs). (Briefly, a PRA is the set of environment and capability parameters under which an instruction set architecture operates. The PRA provides the mechanisms an operating system needs to manage processor resources, such as virtual memory, caches, exceptions, and user contexts.)

According to one embodiment of the present invention, multithreading application-specific extensions (a Multithreading ASE) to an instruction set architecture and PRA allow a processor to incorporate two distinct, but not mutually exclusive, multithreading capabilities. First, a single processor may contain some number of processor contexts, each of which operates as an independent processing unit, sharing certain processor resources and supporting the instruction set architecture. These independent processing units are referred to herein as virtual processing elements (VPEs). To software, a processor with N VPEs appears as an N-way symmetric multiprocessor (SMP). This allows existing SMP-capable operating systems to manage the set of VPEs, which transparently share the processor's execution units.

FIG. 3 illustrates this capability with a single processor 301 that supports a first VPE (VPE0), comprising register state zero 302 and system coprocessor state zero 304. Processor 301 also supports a second VPE (VPE1), comprising register state one 306 and system coprocessor state one 308. VPE0 and VPE1 share those portions of processor 301 comprising instruction fetch, decode, pipelined execution, and caches 310. An SMP-capable operating system 320 runs on processor 301, supporting both VPE0 and VPE1. As shown, software process A 322 and process C 326 execute on VPE0 and VPE1, respectively, as if they were executing on two different processors. Process B 324 is queued and may execute on either VPE0 or VPE1.

The second capability allowed by the Multithreading ASE is that each processor or VPE may contain some number of thread contexts beyond the single thread context required by the base architecture. Multithreaded VPEs require explicit operating system support, but with that support they provide a lightweight, fine-grained multithreaded programming model in which threads can be created and destroyed without operating system intervention in the common case, and in which system service threads can be scheduled in response to external conditions (e.g., events) without the latency of an interrupt.

FIG. 4 depicts this second capability, using a processor 401 to support a single VPE comprising register states 402, 404, and 406 (supporting three threads 422) and system coprocessor state 408. Unlike FIG. 3, in this example the three threads share a single application address space and share the CP0 resources (and hardware resources) of a single VPE. A dedicated multithreading operating system 420 is also depicted. In this example, the multithreaded VPE is processing packets from a broadband network 450, with the packet loads distributed across a set of first-in-first-out (FIFO) buffers 452, each FIFO having a distinct address in the multithreaded VPE's input/output memory space. The controlling application creates as many threads as there are FIFOs in use and places each thread in a tight loop reading its FIFO.

A thread context may be in one of four states: free, activated, halted, or wired. A free thread context has no valid content and cannot be scheduled to issue instructions. An activated thread context may be scheduled, according to implementation-defined rules, to fetch and issue instructions from its program counter. A halted thread context has valid content but cannot fetch or issue instructions. A wired thread context has been designated for use as shadow register storage; that is, it is reserved for the exclusive use of exception handlers, to avoid the overhead of saving and restoring register context in those handlers. A free thread context may not be activated, halted, or wired. Only activated thread contexts may be scheduled. Only free thread contexts may be allocated to create new threads.
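The four states and the legality rules just stated can be captured in a small sketch. The state names come from the text; the representation and function names are illustrative.

```python
# Sketch of the four thread-context states and the rules stated above.
FREE, ACTIVATED, HALTED, WIRED = "free", "activated", "halted", "wired"

def may_be_scheduled(state):
    # Only activated thread contexts may be scheduled to fetch and issue.
    return state == ACTIVATED

def may_allocate_for_new_thread(state):
    # Only free thread contexts may be allocated to create new threads.
    return state == FREE
```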

To allow fine-grained synchronization of cooperating threads, a memory space for inter-thread communication (ITC) is created in virtual memory, with empty/full bit semantics that allow a thread to block on a load or store until the data has been produced or consumed by another thread.
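The empty/full gating described above can be modeled as a one-element cell. This is only an illustrative sketch: a real ITC implementation stalls the hardware thread, whereas this model merely reports whether an access would block.

```python
class ITCCell:
    """Illustrative model of an inter-thread-communication storage cell."""

    def __init__(self):
        self.full = False   # the empty/full bit
        self.value = None

    def store(self, value):
        """Producer side: returns False if the store would block (cell full)."""
        if self.full:
            return False
        self.value, self.full = value, True
        return True

    def load(self):
        """Consumer side: returns (ok, value); ok is False if it would block."""
        if not self.full:
            return False, None
        self.full = False
        return True, self.value
```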

The thread creation/destruction and synchronization capabilities operate without operating system intervention in the general case, but the resources they manipulate can be virtualized by the operating system. This allows multithreaded programs to execute with more virtual threads than there are thread contexts on a VPE, and allows threads to be migrated to balance load in multiprocessor systems.

At any particular point in its execution, a thread is bound to a particular thread context on a particular VPE. The index of that thread context within the VPE provides a unique identifier at that point in time. But context switching and migration can cause a single sequential thread to execute under a series of different thread indices, for example on a series of different VPEs.

The dynamic binding of thread contexts, TLB entries, and other resources to multiple VPEs on the same processor is performed in a particular processor reset configuration state. Each VPE enters its reset vector as if it were an independent processor.

Multithreaded Execution and Exception Model

The Multithreading ASE does not impose any particular implementation or scheduling model for the execution of parallel threads and VPEs. Scheduling may be round-robin, time-sliced at any granularity, or simultaneous. An implementation must not, however, allow a blocked thread to monopolize any shared processor resource and thereby deadlock the hardware.

In a MIPS processor, multiple threads executing on a single VPE all share the same system coprocessor (CP0), the same TLB, and the same virtual address space. Each thread has an independent kernel/supervisor/user state for memory access and instruction decoding. When an exception occurs, all threads other than the one executing the exception handler are stopped or suspended until the EXL and ERL bits of the Status word are cleared or, in the case of an EJTAG debug exception, until the debug state is exited. The Status word resides in the Status register of CP0. Details on the EXL and ERL bits and on EJTAG debug exceptions are available in the following two publications, each available from MIPS Technologies, Inc. and incorporated herein by reference in its entirety: MIPS32™ Architecture for Programmers Volume III: The MIPS32™ Privileged Resource Architecture, Rev. 2.00, MIPS Technologies, Inc. (2003), and MIPS64™ Architecture for Programmers Volume III: The MIPS64™ Privileged Resource Architecture, Rev. 2.00, MIPS Technologies, Inc. (2003).

Exception handlers for synchronous exceptions caused by the execution of an instruction stream, such as TLB misses and floating-point exceptions, are executed by the thread executing that instruction stream. When an unmasked asynchronous exception, such as an interrupt, is raised to a VPE, it is implementation-dependent which thread executes the exception handler.

Each exception is associated with a thread context, even when the exception handler executes using a shadow register set. That associated thread context is the target of the RDPGPR and WRPGPR instructions executed by the exception handler. Detailed descriptions of the RDPGPR and WRPGPR instructions (used to access shadow registers) are available in the following two publications, each available from MIPS Technologies, Inc. and incorporated herein by reference in its entirety: MIPS32™ Architecture for Programmers Volume III: The MIPS32™ Instruction Set, Rev. 2.00, MIPS Technologies, Inc. (2003), and MIPS64™ Architecture for Programmers Volume III: The MIPS64™ Instruction Set, Rev. 2.00, MIPS Technologies, Inc. (2003).

The Multithreading ASE includes two exception conditions. The first is a thread-unallocatable condition, in which a thread allocation request cannot be satisfied. The second is a thread-underflow condition, in which the termination and release of a thread leaves no threads allocated on a VPE. Both exception conditions are mapped to a single new thread exception. When the exception is raised, the two conditions can be distinguished by bit settings in a CP0 register.

Instructions

In a preferred embodiment, the Multithreading ASE includes seven instructions. The FORK and YIELD instructions control thread allocation, release, and scheduling, and are available in all execution modes if implemented and enabled. The MFTR and MTTR instructions are system coprocessor (Cop0) instructions available to privileged system software for managing thread state. A new EMT instruction and a new DMT instruction are privileged Cop0 instructions for enabling and disabling multithreaded operation of a VPE. Finally, a new ECONF instruction is a privileged Cop0 instruction for exiting a special processor configuration state and re-initializing the processor.

FORK - Allocate and Schedule a New Thread

The FORK instruction causes a free thread context to be allocated and activated. Its format 500 is shown in FIG. 5. The FORK instruction takes two operand values from the GPRs (general-purpose registers) identified by fields 502 (rs) and 504 (rt). The contents of GPR rs are used as the address at which the new thread begins fetching and executing. The contents of GPR rt are a value to be transferred into a GPR of the new thread. The destination GPR is determined by the value of the ForkTarget field of the CP0 ThreadConfig register, which is illustrated in FIG. 21 and described later. The new thread's kernel/supervisor/user state is set to that of the thread executing the FORK. If no free thread context is available for the FORK, a thread exception is raised for the FORK instruction.
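The allocation behavior just described can be sketched as follows. The dictionary representation, function name, and error type are illustrative assumptions; per the text, the target GPR index would come from the ForkTarget field of the ThreadConfig register.

```python
def fork(thread_contexts, rs_pc, rt_value, fork_target):
    """Illustrative FORK: claim a free context, seed its PC and target GPR.

    thread_contexts: list of dicts with 'state', 'pc', and 'gpr' entries.
    fork_target: GPR index, modeled after ThreadConfig.ForkTarget.
    """
    for tc in thread_contexts:
        if tc["state"] == "free":
            tc["state"] = "activated"
            tc["pc"] = rs_pc                    # new thread fetches from GPR rs address
            tc["gpr"][fork_target] = rt_value   # GPR rt value handed to the new thread
            return tc
    raise RuntimeError("thread exception: no free thread context")
```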

YIELD - Reschedule and Conditionally Release a Thread

The YIELD instruction causes the current thread to be rescheduled. Its format 600 is shown in FIG. 6, and flowchart 3200 of FIG. 32 describes the operation of a system according to one embodiment of the present invention, illustrating the function of the YIELD instruction.

The YIELD instruction takes a single operand value, for example from the GPR specified in field 602 (rs). A GPR is used in a preferred embodiment, but in other embodiments the operand value may be stored in, and retrieved from, essentially any data storage accessible to the system (e.g., a non-GPR register, memory, etc.). In one embodiment, the contents of GPR rs may be regarded as a descriptor of the circumstances under which the issuing thread should be rescheduled. If the contents of GPR rs are zero (i.e., the operand value is zero), as shown in step 3202 of FIG. 32, the thread is not rescheduled but is instead deallocated (i.e., terminated or otherwise permanently halted from further execution), as shown in step 3204, and its associated thread context storage (i.e., the registers used to hold its state, as noted above) is freed for allocation by a subsequent FORK instruction issued by some other thread. If the least significant bit of GPR rs is set (i.e., rs0 = 1), the thread is rescheduled immediately, as shown in step 3206 of FIG. 32, and continues execution unless preempted by another runnable thread. Otherwise, in this embodiment, the contents of GPR rs are treated as a 15-bit qualifier mask, described in table 700 of FIG. 7 (i.e., a bit vector encoding a variety of conditions).

Referring to table 700, bits 15 through 10 of register rs designate hardware interrupt signals presented to the processor, bits 9 and 8 designate software interrupts generated within the processor, bits 7 and 6 designate the operation of the Load Linked and Store Conditional synchronization primitives fundamental to the MIPS architecture, and bits 5 through 2 designate non-interrupt external signals presented to the processor.

If the contents of GPR rs are even (i.e., bit 0 is not set) and any other bits of the GPR rs qualifier mask are set (step 3208), the thread is suspended until at least one corresponding condition is satisfied. If and when that occurs, the thread is rescheduled (step 3210) and resumes execution at the instruction following the YIELD. This capability is unaffected by the CP0.Status.IMn interrupt mask bits, so the ten external conditions (e.g., events, etc.) encoded by bits 15 through 10 and 5 through 2 (as shown in FIG. 7), together with the four software conditions encoded by bits 9 through 6 (as shown in FIG. 7), can be used in the present embodiment to enable independent threads in response to external signals without requiring the processor to take an exception. In this particular example there are six hardware interrupt signals and four non-interrupt signals, plus two software interrupts and two non-interrupt signals, and finally one signal dedicated to the rescheduling function (i.e., rs0), corresponding to fifteen conditions in all. (The CP0.Status.IMn interrupt mask bits are a set of eight bits in the CP0 Status register that selectively mask the eight basic interrupt inputs of a MIPS processor. If an IM bit is set, the associated interrupt input cannot cause an exception on the processor.)
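The three dispositions described above — deallocate on a zero descriptor, immediate reschedule when rs0 is set, otherwise suspend on the qualifier mask — can be sketched as a small software model. The function name, the `pending` signal model, and the immediate-pass behavior when a qualifying condition is already raised are illustrative assumptions, not part of the architecture:

```c
#include <stdint.h>

/* Hypothetical software model of the YIELD dispositions.  'rs' is the
 * scheduling descriptor; 'pending' models the current state of the
 * condition signals enumerated in table 700 (bit 1 reserved). */
typedef enum { YIELD_DEALLOCATE, YIELD_RESCHEDULE, YIELD_SUSPEND } yield_action;

yield_action yield_model(uint32_t rs, uint32_t pending)
{
    if (rs == 0)
        return YIELD_DEALLOCATE;      /* step 3204: free the thread context */
    if (rs & 1u)
        return YIELD_RESCHEDULE;      /* step 3206: requeue immediately */
    if (rs & pending & 0xFFFCu)
        return YIELD_RESCHEDULE;      /* a qualifying condition is already raised */
    return YIELD_SUSPEND;             /* step 3208: wait for a qualifier condition */
}
```

A thread issuing `yield_model(0, ...)` thus never runs again, while a nonzero even descriptor selects the blocking behavior.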

In the EIC interrupt mode, bits IP2 through IP7 encode the highest-priority pending interrupt rather than expressing an orthogonal vector of indications. When the processor is using the EIC interrupt mode, the bits of GPR rs associated with IP2 through IP7 in a YIELD instruction can therefore no longer be used to re-enable a thread on a specific external event. In the EIC interrupt mode, only the system-dependent external event indications (e.g., bits 5 through 2 of GPR rs in the present embodiment) may be used as YIELD qualifiers. The EIC interrupt mode and bits IP2 through IP7 are further described in the following publications, identified above and incorporated by reference in their entirety: MIPS32™ Architecture for Programmers Volume III: The MIPS32™ Privileged Resource Architecture, and MIPS64™ Architecture for Programmers Volume III: The MIPS64™ Privileged Resource Architecture.

If the execution of a YIELD results in the deallocation of the last allocated thread on a processor or VPE, a thread exception is raised on the YIELD instruction, with an underflow indication in the ThreadStatus register of CP0 (shown in FIG. 18 and described later).

The embodiment described above uses the operand contained in GPR rs of the YIELD instruction as a thread scheduling parameter. In this instance, the parameter is treated as a 15-bit vector of orthogonal indications (referring to FIG. 7, bit 1 is reserved, so only fifteen conditions are encoded in this preferred embodiment). This embodiment also treats the parameter as a designated value (i.e., zero, used to determine whether a given thread should be deallocated; see step 3202 of FIG. 32). The characteristics of such a parameter may, however, be varied to suit different embodiments of the instruction. For example, rather than relying on the least significant bit (i.e., rs0) to determine whether a thread should be immediately rescheduled, the value of the parameter itself (e.g., negative one {-1} in two's-complement form) may be used to determine whether a thread should be immediately rescheduled (i.e., re-queued for scheduling).

In other embodiments of the instruction, such a thread scheduling parameter may be treated as one or more fields containing multi-bit values, so that a thread may designate that it be re-enabled on a single event out of a large event name space (e.g., 32 bits or more). In such an embodiment, at least the bits associated with the target event would be accessible to the current YIELD instruction. Of course, additional bit fields may be passed to the instruction (associated with additional events) as desired for a particular embodiment.

Other embodiments of the YIELD instruction may include, within a thread scheduling parameter accessed by the instruction, a combination of the foregoing bit-vector and value fields, or other application-specific refinements and enhancements, for example to satisfy the needs of a particular implementation. Alternative embodiments of the YIELD instruction may access such a thread scheduling parameter by any known means, for example from a GPR (as shown in FIG. 6), from any other data storage (including memory), or as an immediate value within the instruction itself.

MFTR - Move from Thread Register

The MFTR instruction is a privileged (Cop0) instruction that allows an operating system executing on one thread to access a different thread context. Its format 800 is shown in FIG. 8.

The thread context to be accessed is determined by the value of the AlternateThread field of the ThreadControl register of CP0, which is shown in FIG. 16 and described later. The register to be read within the selected thread context is determined by the value of the rt operand register identified by field 802, in conjunction with the u and sel bits in fields 804 and 806 of the MFTR instruction, respectively, interpreted according to table 900 of FIG. 9. The resulting value is written into the destination register rd identified by field 808.

MTTR - Move to Thread Register

The MTTR instruction is the inverse of MFTR. It is a privileged Cop0 instruction that copies a register value from the thread context of the current thread into a register within another thread context. Its format 1000 is shown in FIG. 10.

The thread context to be accessed is determined by the value of the AlternateThread field of the ThreadControl register of CP0, shown in FIG. 16 and described later. The register to be written within the selected thread context is determined by the value of the rd operand register identified by field 1002, in conjunction with the u and sel bits provided in fields 1004 and 1006 of the MTTR instruction, respectively, interpreted according to table 1100 of FIG. 11 (the encoding is the same as for MFTR). The value in register rt, identified by field 1008, is copied to the selected register.

EMT - Enable Multithreading

The EMT instruction is a privileged Cop0 instruction that enables the concurrent execution of multiple threads by setting the TE bit of the ThreadControl register of CP0, shown in FIG. 16 and described later. Its format 1200 is shown in FIG. 12. The value of the ThreadControl register, containing the TE (Threads Enabled) bit value prior to the execution of the EMT, is returned in register rt.

DMT - Disable Multithreading

The DMT instruction is a privileged Cop0 instruction that inhibits the concurrent execution of multiple threads by clearing the TE bit of the ThreadControl register of CP0, shown in FIG. 16 and described later. Its format 1300 is shown in FIG. 13.

All threads other than the one issuing the DMT instruction are inhibited from further instruction fetch and execution. This is independent of any per-thread halted state. The value of the ThreadControl register, containing the TE (Threads Enabled) bit value prior to the execution of the DMT, is returned in register rt.

ECONF - End Processor Configuration

The ECONF instruction is a privileged Cop0 instruction that signals the end of VPE configuration and enables multi-VPE execution. Its format 1400 is shown in FIG. 14.

When an ECONF instruction is executed, the VPC bit of the Config3 register (described later) is cleared, the MVP bit of the same register becomes read-only at its current value, and all VPEs of the processor, including the one executing the ECONF, take a Reset exception.

Privileged Resources

Table 1500 of FIG. 15 lists the privileged resources of the system coprocessor associated with the multithreading ASE. Except where otherwise noted, the new and modified coprocessor zero (CP0) registers described below are accessible (i.e., readable and writable) in the same manner as the conventional system control registers of coprocessor zero of a MIPS processor.

New Privileged Resources

(A) ThreadControl Register (CP0 Register Number 7, Select 1)

The ThreadControl register is instantiated per VPE as part of the system coprocessor. Its layout 1600 is shown in FIG. 16. The fields of the ThreadControl register are defined according to table 1700 of FIG. 17.

(B) ThreadStatus Register (CP0 Register Number 12, Select 4)

The ThreadStatus register is instantiated per thread context. Each thread sees its own copy of ThreadStatus, and privileged code may access the ThreadStatus of other threads via the MFTR and MTTR instructions. Its layout 1800 is shown in FIG. 18. The fields of the ThreadStatus register are defined according to table 1900 of FIG. 19.

Writing a 1 to the Halted bit of an activated thread causes that thread to cease fetching instructions and to set its internal restart program counter (PC) to the next instruction to be issued. Writing a 0 to the Halted bit of an activated thread allows the thread to be scheduled, fetching and executing instructions from the internal restart program counter (PC) address. As long as either the Activated bit or the Halted bit of a non-activated thread is set to 1, that thread is protected from being allocated and activated by a FORK instruction.

(C) ThreadContext Register (CP0 Register Number 4, Select 1)

The ThreadContext register 2000 is instantiated per thread context and has the same width as the processor's GPRs, as shown in FIG. 20. It is purely a software read/write register, usable by the operating system as a pointer to thread-specific storage, e.g., a thread context save area.

(D) ThreadConfig Register (CP0 Register Number 6, Select 1)

The ThreadConfig register is instantiated per processor or VPE. Its layout 2100 is shown in FIG. 21. The fields of the ThreadConfig register are defined in table 2200 of FIG. 22.

The WiredThread field of the ThreadConfig register allows the set of thread contexts available on a VPE to be partitioned between shadow register sets and parallel execution threads. Thread contexts with indices less than the value of the WiredThread field are available as shadow register sets.

(E) ThreadSchedule Register (CP0 Register Number 6, Select 2)

The ThreadSchedule register is optional, but when implemented it is preferably instantiated per thread. Its layout 2300 is shown in FIG. 23.

The Schedule Vector (which, as shown, is 32 bits wide in a preferred embodiment) is a description of the issue bandwidth requested for the scheduling of the associated thread. In this embodiment, each bit represents 1/32 of the issue bandwidth of the processor or VPE, and each bit position represents a distinct slot in a 32-slot scheduling cycle.

If a bit of a thread's ThreadSchedule register is set, that thread is guaranteed the corresponding issue slot out of every 32 consecutive possible issues on the associated processor or VPE. Writing a 1 to a bit of a thread's ThreadSchedule register when another thread of the same processor or VPE already has the same ThreadSchedule bit set raises a thread exception. Although the preferred width of the ThreadSchedule register here is 32 bits, it is anticipated that this width may be altered (i.e., increased or decreased) in other embodiments.
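The exclusivity rule above — no two threads of the same VPE may claim the same issue slot — amounts to checking the new vector against the OR of the other threads' vectors before accepting the write. The checker below is an illustrative software sketch of that rule, not the hardware mechanism:

```c
#include <stdint.h>

/* Returns 1 (write accepted) if 'new_vec' claims only issue slots not
 * already claimed by any other thread's ThreadSchedule vector on the
 * same processor or VPE; returns 0 (thread exception) otherwise. */
int threadschedule_write_ok(uint32_t new_vec,
                            const uint32_t *other_vecs, int nthreads)
{
    uint32_t claimed = 0;
    for (int i = 0; i < nthreads; i++)
        claimed |= other_vecs[i];   /* slots already guaranteed elsewhere */
    return (new_vec & claimed) == 0;
}
```

The same check, applied across VPESchedule vectors instead of ThreadSchedule vectors, models the analogous rule for VPEs described below.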

(F) VPESchedule Register (CP0 Register Number 6, Select 3)

The VPESchedule register is optional and is preferably instantiated per VPE. It is writable only when the MVP bit of the Config3 register is set (see FIG. 29). Its format 2400 is shown in FIG. 24.

The Schedule Vector (which, as shown, is 32 bits wide in a preferred embodiment) is a description of the issue bandwidth requested for the scheduling of the associated VPE. In this embodiment, each bit represents 1/32 of the total issue bandwidth of a multi-VPE processor, and each bit position represents a distinct slot in a 32-slot scheduling cycle.

If a bit of a VPE's VPESchedule register is set, that VPE is guaranteed the corresponding issue slot out of every 32 consecutive possible issues on the processor. Writing a 1 to a bit of a VPE's VPESchedule register when another VPE already has the same VPESchedule bit set raises a thread exception.

Issue slots that are not specifically scheduled by any thread remain available for free allocation to any runnable VPE/thread according to the processor's current default thread scheduling policy (e.g., round-robin, etc.).

The VPESchedule and ThreadSchedule registers create a hierarchy of issue bandwidth allocation. The settings of the VPESchedule registers assign bandwidth to VPEs as a proportion of the total bandwidth available on a processor or core, while the ThreadSchedule register assigns bandwidth to threads as a proportion of the bandwidth available to the VPE containing those threads.
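Under this hierarchy, a thread's guaranteed share of the whole core works out to the product of the two vector densities. The arithmetic can be sketched as follows; the function is a hypothetical illustration of the proportionality, not an architected computation:

```c
#include <stdint.h>

/* Count set bits in a 32-bit Schedule Vector. */
static int popcount32(uint32_t v)
{
    int n = 0;
    while (v) { v &= v - 1; n++; }
    return n;
}

/* Guaranteed fraction of total core issue bandwidth, in 1/1024ths:
 * (VPE slots / 32) * (thread slots within that VPE / 32). */
int guaranteed_bandwidth_1024ths(uint32_t vpe_schedule, uint32_t thread_schedule)
{
    return popcount32(vpe_schedule) * popcount32(thread_schedule);
}
```

For example, a VPE holding half of the 32 core slots whose thread holds two of that VPE's 32 slots is guaranteed 32/1024, i.e., 1/32 of the core's issue bandwidth.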

Although the preferred width of the VPESchedule register here is 32 bits, it is anticipated that this width may be altered (i.e., increased or decreased) in other embodiments.

(G) Config4 Register (CP0 Register Number 16, Select 4)

The Config4 register is instantiated per processor. It contains the configuration information necessary for dynamic multi-VPE processor configuration. If the processor is not in a VPE configuration state (i.e., the VMC bit of the Config3 register is set), the values of all fields other than the M (continuation) field are implementation-dependent and unpredictable. Its layout 2500 is shown in FIG. 25. The fields of the Config4 register are defined in table 2600 of FIG. 26. In some implementations or embodiments, the VMC bit of the Config3 register may be a previously reserved/unassigned bit.

Modifications to the Existing Privileged Resource Architecture

The multithreading ASE modifies some elements of the current MIPS32 and MIPS64 PRAs.

(A) Status Register

The CU bits of the Status register take on additional meaning in a multithreaded configuration. The act of setting a CU bit is a request that a coprocessor context be bound to the thread associated with that CU bit. If a coprocessor context is available, it is bound to the thread, so that instructions issued by the thread are dispatched to that coprocessor, and the CU bit retains the 1 written to it. If no coprocessor context is available, the CU bit reads back as 0. Writing a 0 to a set CU bit causes any associated coprocessor context to be released.

(B) Cause Register

As shown in FIG. 27, a new Cause register exception code value is required for thread exceptions.

(C) EntryLo Register

As shown in FIG. 28, a previously reserved cache attribute becomes the ITC indicator.

(D) Config3 Register

As shown in table 2900 of FIG. 29, new Config3 register fields are defined to indicate the presence of the multithreading ASE and the availability of multiple thread contexts.

(E) EBase

As shown in FIG. 30, a previously reserved bit 30 of the EBase register becomes a VPE inhibit bit in each VPE context.

(F) SRSCtl

The formerly preset HSS field now becomes a function of the WiredThread field of the ThreadConfig register.

Thread Allocation and Initialization Without the FORK Instruction

In a preferred embodiment, the procedure by which an operating system creates a thread "by hand" is as follows:

1. Execute a DMT to stop other threads from executing, and from possibly executing FORK instructions.

2. Identify an available thread context by setting successive values in the AlternateThread field of the ThreadControl register and reading the ThreadStatus register with MFTR instructions. A free thread has neither the Halted nor the Activated bit set in its ThreadStatus register.

3. Set the Halted bit of the selected thread's ThreadStatus register to prevent it from being allocated by another thread.

4. Execute an EMT instruction to re-enable multithreading.

5. Copy any needed GPRs into the selected thread context using MTTR instructions with the u field set to 1.

6. Write the desired starting execution address into the thread's internal restart address register, using an MTTR instruction with the u and sel fields set to 0 and the rd field set to 14 (EPC).

7. Write a 0 and a 1, respectively, to the Halted and Activated bits of the selected ThreadStatus register, using MTTR instructions.

The newly allocated thread will then be schedulable. The steps of executing the DMT, setting the new thread's Halted bit, and executing the EMT can be omitted if EXL or ERL is set during the procedure, since these implicitly inhibit multithreaded execution.
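Step 2's scan for a free context can be modeled in software: a context is allocatable only if neither its Halted nor its Activated bit is set. The structure of the model and the bit positions below are illustrative assumptions, not the architected ThreadStatus layout:

```c
#include <stdint.h>

/* Assumed ThreadStatus bit positions, for illustration only. */
#define TS_ACTIVATED (1u << 3)
#define TS_HALTED    (1u << 4)

/* Returns the index of the first free thread context, or -1 if none is
 * available (the condition under which FORK raises a thread exception). */
int find_free_context(const uint32_t *thread_status, int nthreads)
{
    for (int i = 0; i < nthreads; i++)
        if ((thread_status[i] & (TS_HALTED | TS_ACTIVATED)) == 0)
            return i;
    return -1;
}
```

This is also the same availability test a FORK implementation applies before allocating and activating a context.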

Thread Termination and Deallocation Without the YIELD Instruction

In a preferred embodiment of the present invention, the procedure by which an operating system terminates the current thread is as follows:

1. If the operating system does not support thread exceptions on thread underflow, scan the ThreadStatus register settings using MFTR instructions to verify that another runnable thread exists on the processor, and otherwise signal an error to the program.

2. Write the values of any important GPRs to memory.

3. Set Kernel mode in the Status/ThreadStatus register.

4. Clear EXL/ERL to allow other threads to be scheduled while the current thread remains in a privileged state.

5. Write 0 values to the Halted and Activated bits of the ThreadStatus register, using a standard MTC0 instruction.

The normal case is that of a thread terminating itself in this manner. In a privileged mode, one thread may also terminate another using MTTR instructions, but an additional problem then arises: the operating system must determine which thread context should be freed, and at which point the computational state of that thread is stable.

Inter-Thread Communication Storage

Inter-thread communication (ITC) storage is an optional feature that provides an alternative to the Load Linked/Store Conditional synchronization method for fine-grained multithreading. Because it is manipulated through loads and stores, ITC storage is invisible to the instruction set architecture, but it is visible to the privileged resource architecture and requires significant microarchitectural support.

References to virtual memory pages whose TLB entries are tagged as ITC storage resolve to a store with special attributes. Each page maps a set of 1-128 64-bit storage locations, each of which has an Empty/Full bit associated with it and can be accessed in one of four ways using standard load and store instructions. The access mode is encoded in the least significant (and untranslated) bits of the generated virtual address, as shown in table 3100 of FIG. 31.
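Since table 3100 is a figure not reproduced here, the exact encoding of the access mode in the low address bits is not available in this text; but given the four-view cell structure shown in the C structure below, the decoding can be sketched under the assumption that bits 4:3 of the untranslated virtual address select among four consecutive 64-bit views of a cell. That bit assignment is an illustrative assumption:

```c
#include <stdint.h>

/* The four access views of a single ITC cell, in the order of the
 * ITC_location structure fields. */
enum itc_view { ITC_EF_SYNC = 0, ITC_FORCE_EF = 1, ITC_BYPASS = 2, ITC_EF_STATE = 3 };

/* Assumed decoding: bits 4:3 of the untranslated virtual address select
 * the view; higher bits select the cell within the page. */
enum itc_view itc_view_of(uint64_t vaddr)
{
    return (enum itc_view)((vaddr >> 3) & 0x3u);
}
```

Under this assumption, four loads at offsets 0x00, 0x08, 0x10, and 0x18 of a cell reach the ef_sync, force_ef, bypass, and ef_state views respectively.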

Each storage location can thus be described by a C-language structure:

    struct {
        uint64 ef_sync_location;
        uint64 force_ef_location;
        uint64 bypass_location;
        uint64 ef_state;
    } ITC_location;

where all four locations reference the same 64 bits of underlying storage. References to the storage may have access types of less than 64 bits (e.g., LW, LH, LB), with the same Empty/Full protocol enforced on each access.

The Empty and Full bits are distinct, so that multi-entry data buffers, such as FIFOs, whose Empty and Full states are not coupled to one another, can be mapped onto ITC storage.

ITC storage can be saved and restored by copying the {bypass_location, ef_state} pairs to and from general storage. Strictly speaking, only the least significant bits of ef_state need be manipulated, while the full 64-bit bypass_location must be preserved. In the case of multi-entry data buffers, each location must be read until Empty in order to drain the buffer's contents via the copy.
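The save/restore convention above — preserve the full 64-bit bypass view plus the low bit of ef_state for each cell — can be sketched as a pair of copy routines over a software model of a cell. The types and field names are illustrative, following the ITC_location structure:

```c
#include <stdint.h>

typedef struct {
    uint64_t data;      /* models the underlying 64 bits (the bypass view) */
    uint64_t ef_state;  /* only bit 0 (Empty/Full) is architecturally significant */
} itc_cell;

typedef struct { uint64_t bypass; uint64_t ef; } itc_save_pair;

/* Context save: capture the bypass view and the Empty/Full bit. */
void itc_save(const itc_cell *c, itc_save_pair *out)
{
    out->bypass = c->data;
    out->ef     = c->ef_state & 1u;  /* only the LSB needs preserving */
}

/* Context restore: the inverse copy. */
void itc_restore(itc_cell *c, const itc_save_pair *in)
{
    c->data     = in->bypass;
    c->ef_state = in->ef & 1u;
}
```

An operating system switching a software context over a set of ITC cells would apply these two routines across every location of the relevant pages.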

The number of locations per 4K page and the number of ITC pages per VPE are parameters that may be set per VPE or per processor.

The "physical address space" of ITC storage may be made global across all VPEs and processors in a multiprocessor system, so that a thread can synchronize on a location of a VPE other than the one on which it is executing. Global ITC storage addresses are derived from the CPUNum field of each VPE's EBase register. The 10 bits of CPUNum correspond to 10 significant bits of the ITC storage address. Processors or cores designed for uniprocessor applications need not export a physical interface to the ITC storage, and may treat it as a processor-internal resource.

Multi-VPE Processors

A single core or processor may implement multiple VPEs sharing resources such as functional units. Each VPE sees its own instantiation of the MIPS32 or MIPS64 instruction and privileged resource architectures. Each sees its own register file or thread context array, its own CP0 system coprocessor, and its own TLB state. Two VPEs on the same processor are indistinguishable to software from a 2-CPU cache-coherent SMP multiprocessor.

Each VPE on a processor sees a distinct value in the CPUNum field of the EBase register of CP0.

Processor architectural resources such as thread contexts, TLB storage, and coprocessors may be bound to VPEs in a hardwired configuration, or they may be configured dynamically in a processor that supports the necessary configuration capabilities.

Reset and Virtual Processor Configuration

To be backward compatible with the MIPS32 and MIPS64 PRAs, a configurable multithreaded/multi-VPE processor must come out of reset with a sane default thread/VPE configuration. Typically, but not necessarily, this would be that of a single VPE with a single thread context. The MVP bit of the Config3 register can be sampled at reset to determine whether dynamic VPE configuration is possible. If this capability is ignored, as it would be by legacy software, the processor operates according to the default configuration.

If the MVP bit is set, the VPC (Virtual Processor Configuration) bit of the Config3 register can be set by software. This puts the processor into a configuration state in which the contents of the Config4 register can be read to determine the number of available VPE contexts, thread contexts, TLB entries, and coprocessors, and in which the "preset" fields of certain normally read-only Config registers become writable. Restrictions may be imposed on the configuration-state instruction stream; for example, it may be forbidden to use cached or TLB-mapped memory addresses.

In the configuration state, the total number of configurable VPEs is encoded in the PVPE field of the Config4 register. Each VPE can be selected by writing its index into the CPUNum field of the EBase register. For the selected VPE, the following register fields can be set by writing to them:

· Config1.MMU_Size

· Config1.FP

· Config1.MX

· Config1.C2

· Config3.NThreads

· Config3.NITC_Pages

· Config3.NITC_PLocs

· Config3.MVP

· VPESchedule

Not all of the configuration parameters mentioned above need be configurable. For example, the number of ITC locations per page might be fixed even if the number of ITC pages per VPE is configurable, or both parameters might be fixed; FPUs might be preallocated and hardwired per VPE; and so on.

Coprocessors are allocated to VPEs as discrete units. The degree to which a coprocessor may be multithreaded should be indicated and controlled via coprocessor-specific control and status registers.

A VPE is enabled for post-configuration execution by clearing the VPI inhibit bit of its EBase register.

The configuration state is exited by issuing an ECONF instruction. This instruction causes all uninhibited VPEs to take a reset exception and begin executing concurrently. If the MVP bit of the Config3 register was cleared during configuration and is latched to zero by an ECONF instruction, the VPC bit can no longer be set, and the processor configuration is effectively frozen until the next processor reset. If MVP remains set, an operating system may re-enter the configuration mode by again setting the VPC bit. The results may be unpredictable, however, if a running VPE of the processor re-enters configuration mode.

Quality of Service Scheduling for Multithreaded Processors

The specification thus far has described an application-specific extension to a MIPS-compatible system for implementing multithreading. As noted above, the MIPS implementation described is exemplary only and does not limit the scope of the invention; the functions and mechanisms described may be applied in systems other than MIPS systems.

The issue of special service for real-time and near-real-time threads in multithreading, raised in the Background section, was touched on briefly in the earlier descriptions of the ThreadSchedule register (FIG. 23) and the VPESchedule register (FIG. 24). The following portion of this specification deals with the issue in greater detail, and makes clearer the specific extensions provided to deal with thread-level Quality of Service (QoS).

Background

Network designs for transporting multimedia data generally involve the concept of Quality of Service (QoS), which describes the need for different policies to handle the different data flows in a network. Voice transmission, for example, makes relatively low demands on bandwidth, but cannot tolerate delays of more than a few tens of milliseconds. In broadband multimedia networks, QoS protocols guarantee that time-critical transfers get whatever special handling and priority are required for timely delivery.

One of the main problems impeding combined RISC/DSP program execution on a single chip is that it is very difficult to guarantee strictly real-time execution of DSP code in a combined multitasking environment. A DSP application can thus be regarded as requiring a QoS guarantee on processor bandwidth.

Multithreading and QoS

There are many ways to schedule instruction issue from multiple threads. Interleaved schedulers change threads every cycle, while block-interleaved schedulers change threads when a cache miss or other major stall occurs. The Multithreading ASE described in detail above provides a framework for multithreaded processors that avoids any dependence on a specific thread scheduling mechanism or policy. The scheduling policy may, however, have a significant effect on what QoS guarantees can be made for the execution of the various threads.

A RISC with DSP extensions becomes far more useful if QoS guarantees can be made that real-time DSP code will be executed on schedule. Implementing multithreading on such a processor, so that the DSP code executes in a distinct thread, possibly even on a distinct virtual processor, allows the hardware scheduling of the DSP thread to be programmably specified so as to guarantee QoS, arguably removing a key obstacle to DSP-enhanced RISC.

QoS Thread Scheduling Algorithms

Quality-of-service thread scheduling can be loosely defined as a set of scheduling mechanisms and policies that allow a programmer or system designer to make confident, predictable statements about the execution time of a particular piece of code. These statements generally take the form "this code will execute in no more than Nmax and no less than Nmin cycles." In many cases only the Nmax number matters in practice, but in some applications running ahead of schedule is also a problem, so Nmin must be considered as well. The smaller the gap between Nmax and Nmin, the more precisely the behavior of the overall system can be predicted.

Simple Priority Schemes

One simple model proposed for providing some degree of QoS in multithreaded issue scheduling is simply to assign the highest priority to a designated real-time thread, so that whenever that thread is runnable, it is selected to issue instructions. This provides a minimal Nmin, and seemingly the minimum possible Nmax, for the designated thread, but it has some unfortunate consequences.

First, only one thread in such a scheme can have any QoS guarantee. The algorithm implies that Nmax for arbitrary code in any thread other than the designated real-time thread becomes effectively unbounded. Second, while Nmin for a code block within the designated thread is minimized, exceptions must be factored into the model. If the designated thread takes the exceptions, the value of Nmax becomes more complex and in some cases impossible to determine. If exceptions are taken by a thread other than the designated one, Nmax for code in the designated thread remains strictly bounded, but the interrupt response time of the processor becomes unbounded.

Such simple priority schemes may be useful in some cases, and have the practical advantage of simple hardware implementation, but they do not provide a general QoS scheduling solution.

Reservation-Based Schemes

Another, more powerful and distinct thread scheduling model is based on reserving issue slots. In such a scheme, the hardware scheduling mechanism allows one or more threads to be assigned N out of each M consecutive issue slots. In an interrupt-free environment, such a scheme does not provide as low an Nmin for a real-time code fragment as a priority scheme does, but it has other advantages:

· More than one thread can be given QoS guarantees.

· Interrupt latency can be bounded even when interrupts are bound to a thread other than the one with the highest priority. This allows a lower Nmax for real-time code blocks.

One simple form of reservation-based scheduling assigns every Nth issue slot to a real-time thread. Since there is no intermediate value of N between 1 and 2, a real-time thread in such a multithreaded environment can obtain at most 50% of the processor's issue slots. Where a real-time task needs more than 50% of an embedded processor's bandwidth, a scheme allowing a more flexible allocation of issue bandwidth is highly desirable.

Hybrid Thread Scheduling with QoS

The multithreading system described above is deliberately neutral with respect to scheduling policy, but it can be extended to allow a hybrid thread scheduling model, in which real-time threads are given fixed scheduling of some proportion of the thread issue slots, with the remaining slots assigned by an implementation-dependent default scheduling scheme.

Binding Threads to Issue Slots

Instructions issue from a processor in rapid succession. In a multithreaded environment, the bandwidth consumed by one thread among many can be expressed as the proportion of issue slots it occupies out of some given number of slots. Conversely, the invention recognizes that an arbitrary number of slots can be declared, and the processor constrained to reserve some number of that fixed set of slots for a particular thread. A fixed fraction of the bandwidth can thereby be specified to guarantee a real-time thread.

Clearly, slots can be allocated proportionally to more than one real-time thread, and the granularity at which the scheme operates is bounded by the fixed number of issue slots on which the proportion is based. If 32 slots are chosen, for example, any given thread can be guaranteed from 1/32 to 32/32 of the issue bandwidth.

Perhaps the most general model for allocating fixed issue bandwidth to threads would be to associate each thread with a pair of integers {N, D} representing the numerator and denominator of the fraction of issue slots allocated to the thread, e.g. 1/2 or 4/5. If the range of allowed integers is large enough, this allows almost arbitrarily fine-grained tuning of thread priority allocation, but it has some substantial disadvantages. One problem is that converting a large set of pairs {{N0, D0}, {N1, D1}, ... {Nn, Dn}} into an issue schedule is non-trivial for hardware logic, and the error case in which more than 100% of the slots are allocated cannot easily be detected. Another problem is that while such a scheme allows a thread to be allocated an N/D proportion of issue slots over a long run, it does not necessarily allow arbitrary statements to be made about which issue slots are assigned to a thread executing a shorter code fragment.
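For contrast, over-allocation in the {N, D} model can only be detected by summing rational fractions across all threads, rather than by a simple bitwise test. The following is an illustrative software sketch of that check (the function name and structure are assumptions for illustration, not part of the specification):

```python
# Illustrative: detecting over-allocation in the {N, D} fraction model
# requires rational arithmetic over all pairs, which is why hardware
# detection is non-trivial compared with the bit-vector scheme below.
from fractions import Fraction

def over_allocated(pairs):
    """pairs: list of (N, D) issue-slot fractions, one per thread.
    Returns True if the threads together claim more than 100% of slots."""
    return sum(Fraction(n, d) for n, d in pairs) > 1

assert not over_allocated([(1, 2), (1, 4)])  # 3/4 of slots claimed: fine
assert over_allocated([(1, 2), (4, 5)])      # 13/10 of slots: impossible
```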

Therefore, in a preferred embodiment of the invention, rather than integer pairs, a bit vector is associated with each thread requiring real-time QoS bandwidth, representing the scheduling slots to be allocated to that thread. In the preferred embodiment this vector is the content of the ThreadSchedule register described above (FIG. 23), and is visible to system software. While the ThreadSchedule register provides a 32-bit scheduling "mask," other embodiments may have longer or shorter mask widths. A 32-bit thread scheduling mask allows a thread to be allocated from 1/32 to 32/32 of the processor's issue bandwidth, and further allows a particular pattern of issue bandwidth to be given to a particular issuing thread. Given a 32-bit mask, the value 0xaaaaaaaa assigns every second slot to the thread. The value 0x0000ffff also assigns 50% of the issue bandwidth to the thread, but in a block of 16 consecutive slots. Assigning the value 0xeeeeeeee to thread X and the value 0x01010101 to thread Y gives thread X three out of every four cycles (24 of each 32) and thread Y one of every eight cycles (4 of each 32), with the remaining 4 slots of each group of 32 assigned to other threads by some other, possibly less deterministic, hardware algorithm. Further, thread X is known to have three out of every four cycles, and thread Y to have no more than eight cycles between two consecutive instructions.
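The bandwidth arithmetic of these example masks is easy to verify: a thread's guaranteed share of issue bandwidth is simply the number of set bits in its 32-bit mask divided by 32. The following is a purely illustrative software sketch (the patent describes hardware registers, not code; the function name is an assumption):

```python
# Illustrative only: model a 32-bit ThreadSchedule mask in software and
# compute the fraction of issue bandwidth it guarantees.

def guaranteed_share(mask):
    """Fraction of issue slots guaranteed by a 32-bit schedule mask."""
    return bin(mask & 0xFFFFFFFF).count("1") / 32

# The example masks from the text:
assert guaranteed_share(0xAAAAAAAA) == 0.5      # every second slot
assert guaranteed_share(0x0000FFFF) == 0.5      # 16 consecutive slots
assert guaranteed_share(0xEEEEEEEE) == 24 / 32  # thread X: 3 of every 4
assert guaranteed_share(0x01010101) == 4 / 32   # thread Y: 1 of every 8
```

Note that 0xaaaaaaaa and 0x0000ffff guarantee the same share but very different slot patterns, which is exactly the flexibility the bit-vector representation buys over an {N, D} pair.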

Scheduling conflicts in this embodiment are quite simple to detect, because no bit may be set in the ThreadSchedule register of more than one thread. That is, if a particular bit is set for one thread, that bit must be zero for every other thread that is assigned an issue mask. Any conflict can therefore easily be detected.
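Because the masks must be pairwise disjoint, the conflict check reduces to a bitwise AND. A minimal illustrative model (names are assumptions for illustration):

```python
# Illustrative model: ThreadSchedule masks conflict exactly when two of
# them share a set bit, i.e. their bitwise AND is non-zero.

def has_conflict(masks):
    """Return True if any issue slot is claimed by more than one mask."""
    claimed = 0
    for m in masks:
        if claimed & m:   # some bit already claimed by an earlier mask
            return True
        claimed |= m
    return False

assert not has_conflict([0xEEEEEEEE, 0x01010101])  # X and Y are disjoint
assert has_conflict([0xAAAAAAAA, 0x0000FFFF])      # both claim odd low slots
```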

The real-time thread issue logic is relatively straightforward: each issue opportunity is associated with a modulo-32 index, which is presented to all ready threads; at most one of those ready threads will have been assigned the associated issue slot. If the slot is claimed, the associated thread issues its next instruction. If no thread owns the slot, the processor selects a runnable non-real-time thread.
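This issue logic can be modeled in a few lines of software. The sketch below (illustrative only; the actual mechanism is combinational hardware) steps a modulo-32 slot index and grants each slot to the real-time thread owning it, falling back to a default choice otherwise:

```python
# Illustrative software model of the modulo-32 issue logic.

def schedule(masks, cycles):
    """masks: dict of thread name -> 32-bit ThreadSchedule mask (disjoint).
    Returns per-thread counts of issue slots granted over `cycles` cycles."""
    counts = {name: 0 for name in masks}
    counts["other"] = 0
    for i in range(cycles):
        slot = i % 32                  # modulo-32 slot index
        for name, mask in masks.items():
            if (mask >> slot) & 1:     # at most one thread owns the slot
                counts[name] += 1
                break
        else:
            counts["other"] += 1       # unclaimed slot: default scheduler
    return counts

# The thread X / thread Y example from the text:
counts = schedule({"X": 0xEEEEEEEE, "Y": 0x01010101}, 32)
assert counts == {"X": 24, "Y": 4, "other": 4}
```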

A hardware implementation of the ThreadSchedule register with fewer than 32 bits would reduce per-thread storage and logic, at the cost of scheduling flexibility. In principle, the register could also be extended to 64 bits, or even implemented (in the case of a MIPS processor) as a series of registers at incrementing select values in the MIPS32 CP0 register space, to provide longer scheduling vectors.

Exempting Threads from Interrupt Service

As noted previously, interrupt service can introduce considerable variability into the execution time of the thread that runs the exception handlers. It is therefore desirable to exempt threads requiring strict QoS guarantees from interrupt service. A preferred embodiment uses a single bit per thread, visible to the operating system, which causes any asynchronous exception to be deferred until a non-exempt thread is scheduled (i.e., the IXMT bit of the ThreadStatus register; see FIGS. 18 and 19). This increases interrupt latency, but the latency can be bounded and controlled through the selection of ThreadSchedule register values. If interrupt handlers execute only in those issue slots not assigned to exempt real-time QoS threads, interrupt service has no first-order effect on the execution time of the real-time code.
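The bound on interrupt latency follows directly from the ThreadSchedule mask: an interrupt raised while the exempt thread holds its slots is deferred until the first slot not reserved for it. A toy model (all names here are illustrative assumptions, not part of the specification):

```python
# Toy model: an interrupt raised at cycle `raised_at` is deferred until
# the first subsequent slot NOT owned by the exempt real-time thread.

def service_latency(rt_mask, raised_at):
    """Cycles of deferral until a non-reserved slot comes around
    (modulo 32), given the exempt thread's ThreadSchedule mask."""
    for delay in range(32):
        slot = (raised_at + delay) % 32
        if not (rt_mask >> slot) & 1:  # slot free: interrupt can be taken
            return delay
    return None                        # mask 0xffffffff: never serviced

# With 0x0000ffff (16 consecutive reserved slots), worst-case deferral
# equals the longest run of reserved slots:
worst = max(service_latency(0x0000FFFF, t) for t in range(32))
assert worst == 16
```

This illustrates why a spread-out mask such as 0xaaaaaaaa bounds interrupt latency much more tightly than a blocked mask of the same total bandwidth.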

Issue Slot Allocation to Threads and Virtual Processing Units

The Multithreading ASE described in detail above provides a hierarchical allocation of thread resources, in which some number of VPEs (Virtual Processing Units) each contain some number of threads. Each VPE has its own instantiation of CP0 and the privileged resource architecture (when configured on a MIPS processor), so operating system (OS) software running on one VPE cannot have direct knowledge and control of the issue slots requested by other VPEs. The issue slot name space of each VPE is therefore relative to that VPE, which implies a hierarchy of issue slot allocation.

FIG. 34 is a block diagram of scheduling circuit 3400, illustrating this hierarchical allocation of thread resources. Processor scheduler 3402 (i.e., the master scheduling logic of the host processor) communicates an issue slot number to all VPESchedule registers of all VPEs within the host processor via "Slot Select" signal 3403. Signal 3403 corresponds to a bit position within a VPESchedule register (one of thirty-two positions, in the preferred embodiment). Scheduler 3402 repeatedly cycles signal 3403 by advancing the bit position by one as each issue slot occurs, resetting to the least significant bit position (i.e., bit 0) after the most significant bit position (i.e., bit 31, in this preferred embodiment) is reached.

Referring again to FIG. 34 by way of example, bit position 1 (i.e., "slot 1") is communicated via signal 3403 to all VPESchedule registers of the host processor, i.e., registers 3414 and 3416. If the corresponding bit of any VPESchedule register is "set" (i.e., a logic 1), that register signals the processor scheduler with a "VPE Issue Request" signal. In response, the scheduler grants that VPE the current issue slot with a "VPE Issue Grant" signal. In FIG. 34, bit position 1 of VPESchedule register 3414 (within VPE 0) is set, so VPE Issue Request signal 3415 is asserted to processor scheduler 3402, which responds with VPE Issue Grant signal 3405.

When a VPE is granted an issue slot, similar logic is applied at the VPE level. Referring again to FIG. 34, VPE scheduler 3412 (i.e., the scheduling logic of VPE 0, 3406), responding to signal 3405, communicates an issue slot number via Slot Select signal 3413 to all ThreadSchedule registers within the VPE. Each of these ThreadSchedule registers is associated with a thread supported by that VPE. Signal 3413 corresponds to a bit position within a ThreadSchedule register (one of thirty-two positions, in this embodiment). Scheduler 3412 repeatedly cycles signal 3413 by advancing the bit position by one as each issue slot occurs, resetting to the least significant bit position (i.e., bit 0) after the most significant bit position (i.e., bit 31, in this preferred embodiment) is reached. This slot number is independent of the slot number used at the VPESchedule level.

Referring to FIG. 34 by way of example, bit position 0 (i.e., "slot 0") is communicated via signal 3413 to all ThreadSchedule registers within the subject VPE, i.e., registers 3418 and 3420. If the bit at the selected position of any thread's ThreadSchedule register is set, that thread signals the VPE scheduler and is thereby granted the current issue slot. In FIG. 34, bit position 0 of ThreadSchedule register 3418 (of thread 0) is set, so Thread Issue Request signal 3419 is asserted to VPE scheduler 3412, which responds with Thread Issue Grant signal 3417 (thereby granting thread 0 the current issue slot). On any cycle in which no bit corresponding to the indicated slot is set in any VPESchedule register, or in any ThreadSchedule register, the processor or VPE scheduler allocates the next issue slot according to some other default scheduling algorithm.

As noted above, in a preferred embodiment each VPE, e.g., VPE 0 (3406) and VPE 1 (3404) of FIG. 34, has a VPESchedule register (formatted as shown in FIG. 24) that allows particular slots, modulo the length of the register contents, to be deterministically assigned to that VPE. The VPESchedule registers of FIG. 34 are register 3414 of VPE 0 and register 3416 of VPE 1. Issue slots not assigned to any VPE are assigned by an implementation-specific allocation policy.

Also as described above, the slots assigned to threads within a VPE are allocated from the slots granted to that VPE. As a concrete example, if a processor has two VPEs, as shown in FIG. 34, one with a VPESchedule register containing 0xaaaaaaaa and the other with a VPESchedule register containing 0x55555555, issue slots will be assigned to the two VPEs alternately. If a thread on one of the two VPEs has a ThreadSchedule register containing 0x55555555, that thread obtains every second issue slot of the VPE containing it, i.e., every fourth issue slot of the processor as a whole.
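This two-level example can be checked with a small software model: the processor-level index walks the VPESchedule masks, while each granted VPE advances its own independent modulo-32 counter over its ThreadSchedule masks. The sketch below is purely illustrative (the names and structure are assumptions, not the hardware design):

```python
# Illustrative two-level model of the hierarchical slot allocation.

def run(vpe_masks, thread_masks, cycles):
    """vpe_masks: list of VPESchedule values; thread_masks: per-VPE dict
    of ThreadSchedule values. Returns processor-wide per-thread counts."""
    vpe_slot = [0] * len(vpe_masks)        # independent per-VPE counters
    counts = {}
    for i in range(cycles):
        slot = i % 32                      # processor-level slot index
        for v, vmask in enumerate(vpe_masks):
            if (vmask >> slot) & 1:        # VPE v is granted this slot
                tslot = vpe_slot[v] % 32   # VPE-local slot index
                vpe_slot[v] += 1
                for name, tmask in thread_masks[v].items():
                    if (tmask >> tslot) & 1:
                        counts[name] = counts.get(name, 0) + 1
                break
    return counts

# VPE 0 = 0xaaaaaaaa, VPE 1 = 0x55555555: alternating slots.  A thread on
# VPE 1 with ThreadSchedule 0x55555555 gets every second VPE 1 slot, i.e.
# every fourth processor slot: 8 of 32.
counts = run([0xAAAAAAAA, 0x55555555], [{}, {"T": 0x55555555}], 32)
assert counts == {"T": 8}
```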

The value of the VPESchedule register associated with each VPE thus determines which processing slots each VPE receives. Specific threads are assigned to each VPE, such as thread 0 and thread 1 shown within VPE 0; other threads, not shown, are similarly assigned to VPE 1. Each thread has an associated ThreadSchedule register, e.g., register 3418 for thread 0 and register 3420 for thread 1. The values of the ThreadSchedule registers determine the allocation of processing slots among the threads of a VPE.

Schedulers 3402 and 3412 can be implemented with simple combinational logic to perform the functions described above; given the present disclosure, their construction is within the ability of those skilled in the art without undue experimentation. For example, the schedulers may be built using conventional techniques such as combinational logic, programmable logic, software, and so forth, to achieve the described functionality.

FIG. 33 depicts, in general form, a computer system 3300 on which various embodiments of the invention may be practiced. The system includes a processor 3302 with the decode and execution logic (as will be clear to those of ordinary skill in the art) needed to support one or more of the instructions described above (i.e., FORK, YIELD, MFTR, MTTR, EMT, DMT, and ECONF). In a preferred embodiment, core 3302 also includes scheduling circuit 3400 of FIG. 34, and represents the "host processor" referred to above. System 3300 also includes a system interface controller 3304 in two-way communication with the processor, RAM 3316 and ROM 3314 accessible by the system interface controller, and three I/O devices 3306, 3308, and 3310 communicating with the system interface controller over bus 3312. Through application of the apparatus and code described in detail herein, system 3300 may operate as a multithreaded system. It will be apparent to those skilled in the art that the general form shown in FIG. 33 admits many alternatives. For example, bus 3312 may be implemented in many forms, and in some embodiments may be an on-chip bus. Likewise, the number of I/O devices shown is merely illustrative and may vary arbitrarily from system to system. Further, although only device 3306 is shown asserting an interrupt request, other devices may obviously assert interrupt requests as well.

Further Refinements

In the embodiments described so far, the 32-bit ThreadSchedule and VPESchedule registers do not allow odd fractions of issue bandwidth to be specified exactly. A programmer wishing to assign exactly one third of all issue slots to a given thread can only approximate the assignment as 10/32 or 11/32. In one embodiment, a register with a programmable mask or length allows the programmer to specify that only a subset of the bits of the ThreadSchedule and/or VPESchedule registers is to be used by the issue logic before the sequence restarts. In the example given, the programmer would indicate that only 30 bits are valid, and program the appropriate VPESchedule and/or ThreadSchedule registers with the value 0x24924924.
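The arithmetic of this example can be verified directly: restricted to its low 30 bits, the value 0x24924924 sets every third bit, so exactly 10 of the 30 active slots are claimed. A quick illustrative check (variable names are assumptions for illustration):

```python
# Check the 30-bit example: 0x24924924, restricted to its low 30 bits,
# sets every third bit (positions 2, 5, 8, ..., 29), i.e. exactly 1/3
# of the active schedule -- which 10/32 or 11/32 can only approximate.

mask = 0x24924924
active_bits = 30                        # programmable schedule length
low = mask & ((1 << active_bits) - 1)   # only the low 30 bits are used

ones = bin(low).count("1")
assert ones == 10                       # 10 of the 30 slots are claimed
assert ones * 3 == active_bits          # exactly one third
```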

The Multithreading ASE described herein may, of course, be embodied in hardware, e.g., within or coupled to a central processing unit (CPU), microprocessor, digital signal processor, processor core, System on Chip (SOC), or any other programmable device. Additionally, the Multithreading ASE may be embodied in software (e.g., computer-readable program code, instructions, and/or data in any form, such as source, object, or machine language) disposed in a computer-usable (e.g., readable) medium configured to store the software. Such software enables the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and processes described herein. This can be accomplished, for example, through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, AHDL (Altera HDL), and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer-usable medium, including semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), or as a computer data signal embodied in a computer-usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the software can be transmitted over communication networks, including the Internet and intranets.

A multithreading ASE embodied in software may be included in a semiconductor intellectual-property core, such as a processor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. Additionally, a multithreading ASE as described herein may be embodied as a combination of hardware and software.

It will be apparent to those skilled in the art that the embodiments disclosed above may be altered and amended in many ways without departing from the spirit and scope of the invention. For example, the embodiments described above largely use MIPS processors, architecture, and technology as specific examples; the invention in its various embodiments is more broadly applicable and is not limited to these specific examples. Further, a skilled artisan may find ways to vary the described functionality slightly while remaining within the spirit and scope of the invention. In the teaching of QoS, for example, the contents of the ThreadSchedule and VPESchedule registers are not limited to the lengths described, and may be amended within the spirit and scope of the invention.

Accordingly, the scope of the invention should be limited essentially only by the scope of the appended claims.
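The parameter semantics recited in the claims that follow (a zero value releases the issuing thread, a minus-one value unconditionally requeues it, and other values encode wait conditions as a bit vector) can be illustrated with a hedged software model. The `Outcome` enum, `yield_thread` function, and `raised_conditions` argument below are invented for illustration and do not appear in the patent:

```python
# Hedged software model (not the patent's hardware) of the YIELD-style
# parameter semantics: 0 releases the thread, -1 requeues it
# unconditionally, and any other value is treated as a bit vector of
# conditions; the thread stays suspended until a waited-on bit is raised.
from enum import Enum, auto


class Outcome(Enum):
    RELEASED = auto()      # thread terminated and its resources freed
    RESCHEDULED = auto()   # thread requeued for execution
    SUSPENDED = auto()     # thread blocked on its condition mask


def yield_thread(param: int, raised_conditions: int = 0) -> Outcome:
    if param == 0:
        return Outcome.RELEASED
    if param == -1:
        return Outcome.RESCHEDULED
    # Otherwise: bit vector of conditions the thread is waiting on.
    if param & raised_conditions:
        return Outcome.RESCHEDULED  # a waited-on condition is satisfied
    return Outcome.SUSPENDED


if __name__ == "__main__":
    print(yield_thread(0))               # Outcome.RELEASED
    print(yield_thread(-1))              # Outcome.RESCHEDULED
    print(yield_thread(0b0100, 0b0100))  # Outcome.RESCHEDULED
    print(yield_thread(0b0100, 0b0010))  # Outcome.SUSPENDED
```

Note that any negative two's-complement value is odd in its low bit only if so encoded; the claims treat -1 (all ones) as the unconditional-reschedule encoding.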

Claims (87)

1. In a processor capable of supporting and executing a plurality of program threads, a processing mechanism comprising:
a parameter for scheduling a program thread; and
an instruction within the program thread, the instruction having access to the parameter;
wherein, when the parameter equals a first value, the instruction reschedules the program thread according to one or more conditions encoded in the parameter.
2. The mechanism of claim 1, wherein the parameter is held in a data storage device.
3. The mechanism of claim 1, wherein the instruction releases the program thread when the parameter equals a second value, the second value not equal to the first value.
4. The mechanism of claim 3, wherein the second value is zero.
5. The mechanism of claim 1, wherein the instruction unconditionally reschedules the program thread when the parameter equals a second value, the second value not equal to the first value.
6. The mechanism of claim 5, wherein the second value is an odd number.
7. The mechanism of claim 5, wherein the second value is minus one.
8. The mechanism of claim 1, wherein one of the one or more conditions is associated with the program thread yielding execution opportunity to other threads until the condition is satisfied.
9. The mechanism of claim 8, wherein the condition is encoded in a bit vector or bit field within the parameter.
10. The mechanism of claim 5, wherein, in the event the program thread is rescheduled, execution of the program thread resumes at a point in the thread following the instruction.
11. The mechanism of claim 3, wherein the instruction unconditionally reschedules the program thread when the parameter equals a third value, the third value not equal to either the first value or the second value.
12. The mechanism of claim 1, wherein one of the one or more conditions is a hardware interrupt.
13. The mechanism of claim 1, wherein one of the one or more conditions is a software interrupt.
14. The mechanism of claim 1, wherein, in the event the program thread is rescheduled, execution of the program thread resumes at a point in the thread following the instruction.
15. In a processor capable of supporting and executing a plurality of program threads, a method by which a thread reschedules its own execution or releases itself, comprising:
(a) issuing an instruction that accesses a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether the thread is rescheduled; and
(b) rescheduling the thread according to the conditions, or releasing the thread, according to the one or more parameters in the portion of the record.
16. The method of claim 15, wherein the record resides in a general-purpose register (GPR).
17. The method of claim 15, wherein one of the parameters is associated with the thread being released rather than rescheduled.
18. The method of claim 17, wherein the parameter associated with the released thread is a zero value.
19. The method of claim 15, wherein one of the parameters is associated with the thread being requeued to await scheduling.
20. The method of claim 19, wherein the parameter is any odd value.
21. The method of claim 19, wherein the parameter is a two's-complement value of minus one.
22. The method of claim 15, wherein one of the parameters is associated with the thread yielding execution opportunity to other threads until specific conditions are satisfied.
23. The method of claim 22, wherein the parameter is encoded in a bit vector or one or more value fields within the record.
24. The method of claim 15, wherein, in the event the thread issues the instruction and is rescheduled, execution of the thread resumes, once the one or more conditions are satisfied, at a point in the thread's instruction stream following the instruction the thread issued.
25. The method of claim 15, wherein one of the parameters is associated with the thread being released rather than rescheduled, and another of the parameters is associated with the thread being requeued to await scheduling.
26. The method of claim 15, wherein one of the parameters is associated with the thread being released rather than rescheduled, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
27. The method of claim 15, wherein one of the parameters is associated with the thread being requeued to await rescheduling, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
28. The method of claim 15, wherein one of the parameters is associated with the thread being released rather than rescheduled, another of the parameters is associated with the thread being requeued to await scheduling, and yet another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
29. A digital processor supporting and executing a plurality of software entities, comprising:
a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions, the one or more conditions determining whether a thread that yields execution opportunity to other threads is rescheduled.
30. The digital processor of claim 29, wherein the portion of the record resides in a general-purpose register (GPR).
31. The digital processor of claim 29, wherein one of the parameters is associated with a thread being released rather than rescheduled.
32. The digital processor of claim 31, wherein the parameter associated with the released thread is a zero value.
33. The digital processor of claim 29, wherein one of the parameters is associated with a thread being requeued to await scheduling.
34. The digital processor of claim 33, wherein the value of the parameter is any odd value.
35. The digital processor of claim 33, wherein the value of the parameter is a two's-complement value of minus one.
36. The digital processor of claim 29, wherein one of the parameters is associated with a thread yielding execution opportunity to other threads until specific conditions are satisfied.
37. The digital processor of claim 36, wherein the parameter is encoded in a bit vector or one or more value fields within the record.
38. The digital processor of claim 29, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with a thread being requeued to await scheduling.
39. The digital processor of claim 29, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
40. The digital processor of claim 29, wherein one of the parameters is associated with a thread being requeued to await rescheduling, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
41. The digital processor of claim 29, wherein one of the parameters is associated with a thread being released rather than rescheduled, another of the parameters is associated with a thread being requeued to await scheduling, and yet another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
42. A processing system capable of supporting and executing a plurality of program threads, comprising:
a digital processor;
a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether a thread is rescheduled; and
an instruction set comprising an instruction for rescheduling and releasing the thread;
wherein, when the thread issues the instruction, the instruction accesses the one or more parameters in the record, and the system reschedules the issuing thread according to the one or more conditions, or releases it, according to the one or more parameters in the portion of the record.
43. The processing system of claim 42, wherein the portion of the record resides in a general-purpose register (GPR).
44. The processing system of claim 41, wherein one of the parameters is associated with a thread being released rather than rescheduled.
45. The processing system of claim 44, wherein the parameter associated with the released thread is a zero value.
46. The processing system of claim 44, wherein one of the parameters is associated with a thread being requeued to await scheduling.
47. The processing system of claim 46, wherein the value of the parameter is any odd value.
48. The processing system of claim 46, wherein the value of the parameter is a two's-complement value of minus one.
49. The processing system of claim 41, wherein one of the parameters is associated with a thread yielding execution opportunity to other threads until specific conditions are satisfied.
50. The processing system of claim 49, wherein the parameter is encoded in a bit vector or one or more value fields within the record.
51. The processing system of claim 44, wherein, in the event a thread issues the instruction and is conditionally rescheduled, execution of the thread resumes, once the one or more conditions are satisfied, at a point in the thread's instruction stream following the instruction.
52. The processing system of claim 42, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with a thread being requeued to await scheduling.
53. The processing system of claim 42, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
54. The processing system of claim 42, wherein one of the parameters is associated with a thread being requeued to await rescheduling, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
55. The processing system of claim 42, wherein one of the parameters is associated with a thread being released rather than rescheduled, another of the parameters is associated with a thread being requeued to await scheduling, and yet another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
56. A digital storage medium having written thereon instructions from an instruction set for executing each of a plurality of software threads on a digital processor, the instruction set comprising an instruction for causing an issuing thread to relinquish execution and for accessing a parameter in a portion of a record in a data storage device, wherein conditions for releasing or rescheduling are associated with the parameter, and the thread is released or rescheduled, in accordance with the conditions, according to the parameter in the portion of the record.
57. The digital storage medium of claim 56, wherein the record resides in a general-purpose register (GPR).
58. The digital storage medium of claim 57, wherein one of the parameters is associated with a thread being released rather than rescheduled.
59. The digital storage medium of claim 58, wherein the parameter associated with the released thread is a zero value.
60. The digital storage medium of claim 56, wherein one of the parameters is associated with a thread being requeued to await scheduling.
61. The digital storage medium of claim 60, wherein the value of the parameter is any odd value.
62. The digital storage medium of claim 60, wherein the value of the parameter is a two's-complement value of minus one.
63. The digital storage medium of claim 16, wherein one of the parameters is associated with a thread yielding execution opportunity to other threads until specific conditions are satisfied.
64. The digital storage medium of claim 63, wherein the parameter is encoded in a bit vector or one or more bit fields within the record.
65. The digital storage medium of claim 56, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with a thread being requeued to await scheduling.
66. The digital storage medium of claim 56, wherein one of the parameters is associated with a thread being released rather than rescheduled, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
67. The digital storage medium of claim 56, wherein one of the parameters is associated with a thread being requeued to await rescheduling, and another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
68. The digital storage medium of claim 56, wherein one of the parameters is associated with a thread being released rather than rescheduled, another of the parameters is associated with a thread being requeued to await scheduling, and yet another of the parameters is associated with yielding execution opportunity to other threads until specific conditions are satisfied.
69. The mechanism of claim 1, wherein the instruction is a YIELD instruction.
70. The mechanism of claim 1, wherein the portion of the record comprises a bit vector.
71. The mechanism of claim 1, wherein the portion of the record comprises one or more multi-bit fields.
72. The method of claim 15, wherein the instruction is a YIELD instruction.
73. The processing system of claim 42, wherein the instruction is a YIELD instruction.
74. The digital storage medium of claim 56, wherein the instruction is a YIELD instruction.
75. A computer data signal embodied in a transmission medium, comprising:
computer-readable program code describing a processor capable of supporting and executing a plurality of program threads, the processor including a mechanism for releasing and rescheduling a thread, the program code comprising:
a first program code segment describing a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether a thread is rescheduled; and
a second program code segment describing an instruction having access to the one or more parameters in the record, wherein, when the thread issues the instruction, the instruction accesses one or more values in the record, and the thread is rescheduled according to the one or more conditions, or released, according to the one or more values.
76. In a processor capable of supporting a plurality of program threads, a method comprising:
executing an instruction having access to a parameter associated with thread scheduling, wherein the instruction is included in a program thread; and
releasing the program thread in response to the instruction when the parameter equals a first value.
77. The method of claim 76, wherein the first value is zero.
78. The method of claim 76, further comprising:
suspending execution of the program thread in response to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value.
79. The method of claim 78, wherein the second value indicates that a condition required for execution of the program thread is not satisfied.
80. The method of claim 79, wherein the condition is encoded in the parameter as a bit vector or a value field.
81. The method of claim 78, further comprising:
rescheduling the program thread in response to the instruction when the parameter equals a third value, wherein the third value is not equal to either the first value or the second value.
82. The method of claim 81, wherein the third value is minus one.
83. The method of claim 81, wherein the third value is an odd value.
84. In a processor capable of supporting a plurality of program threads, a method comprising:
executing an instruction that accesses a parameter associated with thread scheduling, wherein the instruction is contained in a program thread; and
suspending execution of the program thread in response to the instruction when the parameter equals a first value.
85. The method of claim 84, further comprising:
rescheduling the program thread in response to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value.
86. In a processor capable of supporting a plurality of program threads, a method comprising:
executing an instruction that accesses a parameter associated with thread scheduling, wherein the instruction is contained in a program thread; and
rescheduling the program thread in response to the instruction when the parameter equals a first value.
87. The method of claim 86, further comprising:
releasing the program thread in response to the instruction when the parameter equals a second value, wherein the second value is not equal to the first value.
CN 200480024800 2003-08-28 2004-08-26 A holistic mechanism for suspending and releasing threads of computation during execution in a processor Pending CN1842770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210164802.7A CN102880447B (en) 2003-08-28 2004-08-26 A kind of integrated mechanism hung up within a processor and discharge computational threads in implementation procedure

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US49918003P 2003-08-28 2003-08-28
US60/499,180 2003-08-28
US60/502,359 2003-09-12
US60/502,358 2003-09-12
US10/684,348 2003-10-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201210164802.7A Division CN102880447B (en) 2003-08-28 2004-08-26 A kind of integrated mechanism hung up within a processor and discharge computational threads in implementation procedure

Publications (1)

Publication Number Publication Date
CN1842770A true CN1842770A (en) 2006-10-04

Family

ID=37031160

Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200480024800 Pending CN1842770A (en) 2003-08-28 2004-08-26 A holistic mechanism for suspending and releasing threads of computation during execution in a processor
CNB2004800247988A Expired - Fee Related CN100489784C (en) 2003-08-28 2004-08-27 Multithreading microprocessor and its novel threading establishment method and multithreading processing system
CN2004800248529A Expired - Fee Related CN1846194B (en) 2003-08-28 2004-08-27 Method and apparatus for executing parallel program threads
CNB2004800248016A Expired - Fee Related CN100538640C (en) 2003-08-28 2004-08-27 Apparatus for dynamically configuring virtual processor resources

Family Applications After (3)

Application Number Title Priority Date Filing Date
CNB2004800247988A Expired - Fee Related CN100489784C (en) 2003-08-28 2004-08-27 Multithreading microprocessor and its novel threading establishment method and multithreading processing system
CN2004800248529A Expired - Fee Related CN1846194B (en) 2003-08-28 2004-08-27 Method and apparatus for executing parallel program threads
CNB2004800248016A Expired - Fee Related CN100538640C (en) 2003-08-28 2004-08-27 Apparatus for dynamically configuring virtual processor resources

Country Status (1)

Country Link
CN (4) CN1842770A (en)





Also Published As

Publication number Publication date
CN100538640C (en) 2009-09-09
CN1846194A (en) 2006-10-11
CN100489784C (en) 2009-05-20
CN1842771A (en) 2006-10-04
CN1846194B (en) 2010-12-15
CN1842769A (en) 2006-10-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20061004