CN103425460A - Writing back and discarding method of low-power-consumption register file - Google Patents
- Publication number
- CN103425460A
- Authority
- CN
- China
- Prior art keywords
- instruction
- register
- algorithm
- life
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Executing Machine-Instructions (AREA)
Abstract
The invention belongs to the technical field of microprocessors and relates to a low-power write-back discarding method for a register file. Starting from an existing microprocessor, the method comprises the following steps: the original MIPS instruction set is extended by adding a 3-bit "life length" tag to instructions that have redundant bits, indicating how many subsequent instructions will use the current register variable; "life length" adjustment logic is added to the execute, memory-access and align stages, so that the life length is decremented by 1 whenever the current register variable is used by a subsequent instruction, and once the life length is found to be 0 the variable is discarded through selector-register based masking logic. The static estimation algorithm for the register life length of each instruction is implemented by a software tool. Compared with existing architectures, the invention finds register variables that can be discarded with almost no additional hardware overhead, thereby reducing the power consumption and power density of the register file.
Description
Technical Field
The invention belongs to the technical field of microprocessors and relates to a low-power write-back discarding method for a register file.
Background Art
The register file is the first-level storage unit in a processor and a core component of modern microprocessors. Because it is accessed at high speed and high frequency, its power consumption and power density are both considerable, making it a major energy consumer and a power hotspot of the microprocessor. High energy consumption is a challenge for microprocessors, especially those used in embedded applications, and power hotspots further reduce circuit stability and lifetime. Research on reducing register file power consumption is therefore of great practical significance.
Figure 1 shows the structure of a conventional 6-stage pipelined microprocessor. It comprises the instruction fetch, decode, execute, memory access, align and write-back stages.
In a conventional microprocessor architecture, no dedicated circuit controls write-back to the register file. Useless write-back operations can occur during instruction execution, but a conventional architecture does not mask them, which wastes energy. To address this shortcoming, register file write-back needs to be controlled: once a useless write-back operation is detected, it is discarded, which reduces the access power of the register file.
Summary of the Invention
The purpose of the invention is to provide a register file write-back discarding method that reduces access power consumption.
The invention inserts into certain instructions a tag representing the life length of the instruction's destination register, and decrements this life length in the execute, memory-access and align stages in order to decide whether the destination register needs to be written back (if the tag is 0, the write-back is abandoned). This reduces the power spent on useless write-backs, lowers the power density of the register file, and improves circuit stability and lifetime.
The register file write-back discarding method provided by the invention is based on an existing MIPS microprocessor with a basic pipeline structure. The existing microprocessor comprises the instruction fetch, decode, execute, memory access, align and write-back stages (see Figure 1). The specific steps are:
(1) For this microprocessor, extend the original MIPS instruction set by adding a 3-bit "life length" to instructions that have redundant bits, indicating how many subsequent instructions will use the current register variable.
In the invention, the life length of a register X is defined as follows. While register X is in the E (execute), M (memory access) or A (align) stage, if 1, 2 or 3 subsequent instructions within its feedback range need register X, its life length is set to 1, 2 or 3 respectively; if a subsequent instruction beyond the feedback range needs register X, its life length is defined as 4. The feedback range is defined as follows: when an instruction Y in the decode stage (D) uses register X while the instruction Z that produces register X is in the E, M or A stage, Y is said to be within the feedback range of the register X produced by Z. For example, in Figure 2, register $1 is within the feedback range while it is in the E, M or A stage; once this range is exceeded, for instance when the bottom instruction uses register $1, that subsequent instruction is defined as being beyond the feedback range of register $1.
(2) On top of the extended MIPS instruction set, add "life length" adjustment logic to the execute, memory-access and align stages: if the current register variable is used by a subsequent instruction, its life length is decremented by 1; once the life length is found to be 0, the variable is discarded through selector-register based masking logic. Specifically: 1. if the destination register of the instruction currently in the execute, memory-access or align stage is used by the instruction in the decode stage, the "life length" tag of the current instruction is decremented by 1; 2. if the tag is reduced to 0, the data passed to the next stage is held unchanged and the signal that enables register file write-back is deasserted, so that the write-back stage does not write the register variable back. The same structure is used in the execute, memory-access and align stages.
Further, the invention provides an algorithm for statically estimating the register life length of each instruction. The algorithm is implemented by a software tool that statically traverses the generated assembly code, determines the life length of the register variable in each instruction that needs a "life length" tag, and embeds it into the redundant bits of that instruction.
The static estimation algorithm consists of a main algorithm and two sub-algorithms. The main algorithm is referred to as Algorithm I. The two sub-algorithms are the in-group lifetime calculation (inGroupLifetimeCalculation), referred to as Algorithm II, and the out-of-group write-discarding judgement (outOfGroupWriteDiscardingJudgement), referred to as Algorithm III. The code of the static estimation algorithm is given in the appendix.
The main algorithm calls the two sub-algorithms. Its steps are as follows:
(1) For each instruction in the assembly code of the program, first call Algorithm II to compute the in-group lifetime life and the write-discard signal wd. If Algorithm II returns a valid wd signal (meaning Algorithm II has established within the group that the register can be discarded), Algorithm I sets the register life of the instruction to life. If life equals 4, it has been established within the group that the register of the instruction cannot be discarded, and Algorithm I sets life to 4. Otherwise, Algorithm II alone cannot decide whether to discard.
(2) In that case Algorithm III is called. If Algorithm III determines that the write can be discarded, the register lifetime of the instruction is set to the return value of Algorithm III; otherwise it is set to 4.
Algorithm II is the in-group estimation. Its steps are as follows:
(1) Select the 3 instructions that follow the current instruction to form a group. When the current instruction is in a branch delay slot, two groups are obtained, one for the branch being taken and one for it not being taken, as shown in Appendix (II-a).
(2) Define three concepts, as shown in Appendix (II-b). distance: the number of clock cycles after which the register of the first instruction is used by a subsequent instruction; any distance greater than 3 is set to 4. dependent: whether the instruction under examination depends on the first instruction, that is, whether one of its operands comes from the destination register of the first instruction. rewrite: whether the destination register of the instruction under examination is the same as that of the first instruction; if so it is an overwrite, otherwise it is not.
(3) Algorithm II first obtains the instruction group for the current instruction, fills in the table shown in Appendix (II-b), and then checks the subsequent instructions one by one, as shown in Appendix (II-c): A. if an instruction depends on the first instruction and its distance is 4, set the lifetime of the first instruction to 4 and exit; B. if there is a dependency and the distance is less than 4, increment the lifetime by 1; C. if an overwrite occurs, check whether the lifetime is 4: if it is, the write cannot be discarded, otherwise it can be discarded; return the lifetime.
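The in-group steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's tool: the function name `in_group_lifetime` and the representation of the (II-b) table as three parallel lists are assumptions, and a bounds check on the index has been added so the final overwrite test never reads past the group.

```python
from typing import List, Tuple

def in_group_lifetime(dependent: List[bool],
                      rewrite: List[bool],
                      distance: List[int]) -> Tuple[bool, int]:
    """Return (wd, life) for the first instruction of a group.

    The three lists describe the (up to 3) instructions that follow the
    producer: does it read the producer's register, does it overwrite it,
    and how many cycles away it is (values above 3 are stored as 4).
    """
    life = 0
    i = 0
    n = len(dependent)
    while i < n and (dependent[i] or not rewrite[i]):
        if dependent[i] and distance[i] == 4:
            return False, 4          # consumer out of feedback range: must write back
        if dependent[i]:
            life += 1                # one more consumer inside the feedback range
        if rewrite[i]:
            break                    # reads and overwrites the register in one instruction
        i += 1
    # An overwrite inside the group means the value is dead after `life` uses.
    wd = i < n and rewrite[i] and life != 4
    return wd, life

# One consumer, then an overwrite: the write-back can be discarded after 1 use.
print(in_group_lifetime([True, False, False], [False, True, False], [1, 2, 3]))
# Three consumers and no overwrite: undecided in-group, Algorithm III must decide.
print(in_group_lifetime([True, True, True], [False, False, False], [1, 2, 3]))
```

When `wd` comes back false with a life below 4, the group alone is inconclusive, which is the case the main algorithm hands to Algorithm III.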
Algorithm III is the out-of-group judgement. Because this algorithm may traverse the entire program and the program may contain branch points, a container is introduced to store the branch points of the program. The specific steps of Algorithm III are:
(1) If the register of the current instruction is found to be used by a subsequent instruction, the register cannot be discarded; return 0 and finish.
(2) Otherwise, if the register of the current instruction is found to be overwritten by a subsequent instruction, or the program exit is reached, start the next round of judgement.
(3) If the next instruction is a conditional branch, both paths (branch taken and branch not taken) must be examined, so the branch node is stored in the branch container. Note that if the branch node is already found in the container, this branch point has been reached before, which means a loop has been encountered, and the loop must be skipped. Otherwise the branch is stored in the container, the taken path is examined first and then the not-taken path, until the condition described in step (2) is reached and the algorithm ends.
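The traversal described in these steps can be sketched in Python as below. The `Instr` model, the use of instruction indices, and the function name are illustrative assumptions. The sketch also deviates from the appendix in one respect that should be stated plainly: it keeps a separate set of already-visited branches for loop detection instead of searching the pending-branch container, and it ignores delay slots and unconditional jumps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Instr:
    dest: Optional[str]                    # destination register, if any
    srcs: Tuple[str, ...] = ()             # source registers
    branch_target: Optional[int] = None    # index of the taken path (conditional branches only)

def can_discard_out_of_group(program: List[Instr], src: int, start: int) -> bool:
    """True if no path after program[start] reads the register written by program[src]."""
    reg = program[src].dest
    if reg is None:
        return True
    pending = [start + 1]      # instruction indices where a path still has to be explored
    seen_branches = set()      # branches already expanded (loop detection)
    while pending:
        pc = pending.pop()
        while pc < len(program):           # running off the end counts as program exit
            ins = program[pc]
            if reg in ins.srcs:
                return False               # a consumer exists: the write-back is needed
            if ins.dest == reg:
                break                      # overwritten on this path: path is safe
            if ins.branch_target is not None:
                if pc in seen_branches:
                    break                  # loop detected: this path adds nothing new
                seen_branches.add(pc)
                pending.append(pc + 1)     # not-taken path, explored later
                pc = ins.branch_target     # taken path first
            else:
                pc += 1
    return True

# $1 is overwritten on both sides of the branch, so the first write can be discarded.
prog = [Instr("$1", ("$2",)), Instr("$3", ("$4",)),
        Instr(None, ("$3",), branch_target=4),
        Instr("$1", ("$5",)), Instr("$1", ("$6",))]
print(can_discard_out_of_group(prog, src=0, start=0))
```

Exploring the taken path first and deferring the fall-through path mirrors the order given in step (3); the result does not depend on that order, only the traversal sequence does.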
After the compiler has generated the software code, the method of the invention uses a global traversal strategy to determine the "life length" of a register variable, that is, how it is accessed by subsequent instructions. The proposed algorithm infers the life length of register variables statically at compile time and, with support from the instruction set architecture, embeds it in the instruction. At run time the life length is adjusted dynamically and used to decide whether to write back: if the life length is zero, the write-back to the register is masked. The invention eliminates unnecessary register write-back operations and thereby reduces the power consumption of the register file.
Compared with existing architectures, the software-directed register file write-back discarding method provided by the invention finds register variables that can be discarded with almost no additional hardware overhead, thereby reducing the power consumption and power density of the register file.
Description of the Drawings
Figure 1 is a conventional 6-stage pipelined microprocessor architecture.
Figure 2 is an example illustrating the definition of register life length.
Figure 3 is the decision logic for register file write discarding.
Figure 4 is the implementation strategy for inserting the register life length tag into instructions.
Detailed Description
The invention describes a software-directed register file write-back discarding technique. Examples of the invention and the design ideas behind them are set out below.
Figure 2 is an example illustrating the definition of register life length. If 1, 2 or 3 subsequent instructions within the feedback range use the destination register of the current instruction, its life is defined as 1, 2 or 3; if an instruction beyond the feedback range uses the register, its life is defined as 4. Concretely, in Figure 2, if register $1 is used only by the three instructions that follow (subu $3, $1, $7; lw $10, 4($1); and mul $11, $6, $1), its life is 3. If, however, the instruction shown in red (slt $3, $1, $8) also uses $1, that instruction lies beyond the feedback range of $1 and the life length of $1 is set to 4. The feedback range is defined as follows: when an instruction Y in the decode stage (D) uses register X while the instruction Z that produces register X is in the E, M or A stage, Y is said to be within the feedback range of the register X produced by Z.
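The Figure 2 example can be reproduced with a small Python sketch of the definition. The tuple-based instruction model and the function name `life_length` are assumptions made for illustration, the producer `addu $1, $2, $3` is hypothetical (the text does not name it), and the sketch covers straight-line code without stalls only.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction model: (mnemonic, destination register, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

FEEDBACK_RANGE = 3   # consumers decoded while the producer is in E, M or A
OUT_OF_RANGE = 4     # tag value meaning "a later consumer exists, write back"

def life_length(program: List[Instr], producer: int) -> int:
    """Life length of the register written by program[producer]."""
    dest = program[producer][1]
    if dest is None:
        return 0
    life = 0
    for dist, (_, d, srcs) in enumerate(program[producer + 1:], start=1):
        if dest in srcs:
            if dist > FEEDBACK_RANGE:
                return OUT_OF_RANGE      # used beyond the feedback range
            life += 1                    # used inside the feedback range
        if d == dest:
            break                        # register overwritten: later uses see a new value
    return life

fig2 = [
    ("addu", "$1", ("$2", "$3")),        # hypothetical producer of $1
    ("subu", "$3", ("$1", "$7")),
    ("lw",   "$10", ("$1",)),
    ("mul",  "$11", ("$6", "$1")),
]
print(life_length(fig2, 0))                                   # 3
print(life_length(fig2 + [("slt", "$3", ("$1", "$8"))], 0))   # 4
```

With only the three in-range consumers the tag is 3; adding the fourth instruction, which reads $1 at distance 4, forces the tag to 4.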
Figure 3 shows the decision logic for register file write discarding. Compared with the conventional structure of Figure 1, it adds decision logic for the lifetime tag. Figure 3 uses the E stage as an example; the M and A stages have exactly the same structure. The logic first checks whether the register of the current instruction is being fed back to the D stage: the feedback hit signal must be valid (the bypass-hit signal in Figure 3) and the D stage must not be stalled (the d_stall signal in Figure 3 is 0). If so, the current lifetime tag is decremented. The logic then checks whether the tag is 0. A tag of 0 means the life of the current instruction's register is 0 and it can be discarded. Discarding uses a register-selector circuit structure: if the life tag is 0, the register data passed to the next stage is held unchanged, and the register file write-back signal X_wr_sig (X stands for E, M or A; Figure 3 shows the E stage, so X is E) is set to 0, which masks the write-back to the register file.
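A behavioural Python model of one such stage may make the signal flow easier to follow. It is a sketch only: the function names, the argument list and the guard that stops the tag from going below 0 are assumptions, while `bypass_hit`, `d_stall` and the write-back signal follow the names used for Figure 3.

```python
from typing import Sequence, Tuple

def stage_logic(tag: int, wr_sig: bool, new_data: int, held_data: int,
                bypass_hit: bool, d_stall: bool) -> Tuple[int, bool, int]:
    """One E/M/A stage: returns (tag, wr_sig, data) latched for the next stage."""
    # Decrement only when the D stage really consumes the forwarded value.
    if bypass_hit and not d_stall and tag > 0:
        tag -= 1
    if tag == 0:
        # Selector keeps the old pipeline-register contents and masks write-back.
        return tag, False, held_data
    return tag, wr_sig, new_data

def writes_back(tag: int, hits: Sequence[bool]) -> bool:
    """Run a producer through E, M and A; hits[k] is bypass_hit in stage k."""
    wr_sig, data, held = True, 0x1234, 0
    for hit in hits:
        tag, wr_sig, held = stage_logic(tag, wr_sig, data, held, hit, d_stall=False)
    return wr_sig

print(writes_back(2, [True, True, False]))   # False: both consumers served by forwarding
print(writes_back(4, [True, True, True]))    # True: tag 4 can never reach 0 in three stages
```

The second call shows why 4 works as the "must write back" value: at most one decrement happens per stage, so after E, M and A the tag is still at least 1.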
Figure 4 shows the implementation strategy for inserting the register life length tag into instructions. In the actual logic design, R-type and I-type instructions are treated differently. An R-type instruction always has 5 redundant bits, so the tag can be inserted into them. For an I-type instruction the range of the immediate must be checked. If the immediate lies between -4096 and 4095, the tag can be inserted into the upper 3 bits of the immediate field. Otherwise a new instruction, lli, is introduced and the original instruction is split into an lli instruction and an R-type instruction. The lli instruction has a lifetime of 1, while the R-type instruction carries the same lifetime tag as the original I-type instruction and has 5 redundant bits in which to insert it.
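The I-type case can be illustrated with a few lines of Python bit manipulation. This is a sketch under stated assumptions: it uses the standard 32-bit MIPS I-type layout with a 16-bit immediate in bits 15..0, places the tag in bits 15..13, and has the decoder rebuild the immediate by sign-extending from bit 12. The function names are hypothetical, and the patent text does not spell out the exact bit positions beyond "the upper 3 bits of the immediate field".

```python
from typing import Tuple

def tag_i_type(word: int, tag: int) -> int:
    """Store a 3-bit life tag in bits 15..13 of an I-type instruction word."""
    if not 0 <= tag <= 4:
        raise ValueError("life tag must be 0..4")
    imm = word & 0xFFFF
    signed = imm - 0x10000 if imm & 0x8000 else imm
    if not -4096 <= signed <= 4095:
        # The text handles this case by splitting into lli + an R-type instruction.
        raise ValueError("immediate does not fit in 13 bits")
    return (word & 0xFFFF1FFF) | (tag << 13)

def untag_i_type(word: int) -> Tuple[int, int]:
    """Return (tag, original instruction word) by sign-extending the 13-bit immediate."""
    tag = (word >> 13) & 0x7
    imm13 = word & 0x1FFF
    signed = imm13 - 0x2000 if imm13 & 0x1000 else imm13
    return tag, (word & 0xFFFF0000) | (signed & 0xFFFF)

addiu = 0x2441FFFB                    # addiu $1, $2, -5
tagged = tag_i_type(addiu, 3)
print(hex(tagged))                    # 0x24417ffb
print(untag_i_type(tagged) == (3, addiu))   # True
```

The range -4096 to 4095 is exactly what a 13-bit two's-complement field can hold, which is why the top three bits of such an immediate carry no information and can be reused for the tag.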
The appendix shows the algorithm that statically determines or estimates the lifetime of instruction register variables at compile time. Algorithm I is the overall algorithm and calls two sub-algorithms: inGroupLifetimeCalculation (in-group lifetime calculation, Appendix (II)) and outOfGroupWriteDiscardingJudgement (out-of-group write-discarding judgement, Appendix (III)). The steps of Algorithm I are: for each instruction in the assembly code of the program, first call Algorithm II to compute the in-group lifetime life and the write-discard signal wd. If Algorithm II returns a valid wd signal (meaning Algorithm II has established within the group that the register can be discarded), Algorithm I sets the register life of the instruction to life. If life equals 4, it has been established within the group that the register cannot be discarded, and Algorithm I sets life to 4. Otherwise Algorithm II alone cannot decide whether to discard, and Algorithm III is called: if Algorithm III determines that the write can be discarded, the register lifetime of the instruction is set to the return value of Algorithm III; otherwise it is set to 4.
Algorithm II is the in-group estimation. Its steps are: select the 3 instructions that follow the current instruction to form a group; when the current instruction is in a branch delay slot, two groups are obtained, one for the branch being taken and one for it not being taken, as shown in Appendix (II-a). As shown in Appendix (II-b), three concepts are then defined: distance (the number of clock cycles after which the register of the first instruction is used by a subsequent instruction; any distance greater than 3 is set to 4), dependent (whether the instruction under examination depends on the first instruction, that is, whether one of its operands comes from the destination register of the first instruction), and rewrite (whether the destination register of the instruction under examination is the same as that of the first instruction; if so it is an overwrite, otherwise it is not). As shown in Appendix (II-c), Algorithm II first obtains the instruction group for the current instruction, fills in the table shown in Appendix (II-b), and then checks the subsequent instructions one by one: 1. if an instruction depends on the first instruction and its distance is 4, set the lifetime of the first instruction to 4 and exit; 2. if there is a dependency and the distance is less than 4, increment the lifetime by 1; 3. if an overwrite occurs, check whether the lifetime is 4: if it is, the write cannot be discarded, otherwise it can be discarded; return the lifetime.
Algorithm III is the out-of-group judgement. Because this algorithm may traverse the entire program and the program may contain branch points, a container is introduced to store the branch points of the program. The steps of Algorithm III are: 1. if the register of the current instruction is found to be used by a subsequent instruction, the register cannot be discarded; return 0 and finish; 2. otherwise, if the register of the current instruction is found to be overwritten by a subsequent instruction, or the program exit is reached, start the next round of judgement; 3. if the next instruction is a conditional branch, both paths (branch taken and branch not taken) must be examined, so the branch node is stored in the branch container. Note that if the branch node is already found in the container, this branch point has been reached before, which means a loop has been encountered, and the loop must be skipped. Otherwise the branch is stored in the container, the taken path is examined first and then the not-taken path, until the condition described in step 2 is reached and the algorithm ends.
Appendix
Algorithm: static estimation of the lifetime
Input: assembly code of the source program
Parameters: current instruction curr_instr, source instruction src_instr,
            lifetime life, write discard wd, counter i, number of instructions instr_#
Output: lifetime array life_arr
Initialization: convert the assembly code into an intermediate representation
foreach (i ∈ [0, instr_#))
    curr_instr = getInstr(i); src_instr = curr_instr
    wd = inGroupLifetimeCalculation(&curr_instr, &life)
    if (wd) life_arr[i] = life; continue; endif      /* wd valid, take the next instruction */
    if (life == 4) life_arr[i] = 4; continue; endif  /* cannot be discarded, take the next instruction */
    wd = outOfGroupWriteDiscardingJudgement(src_instr, curr_instr)
    if (wd) life_arr[i] = life; continue; endif      /* wd valid, take the next instruction */
    life_arr[i] = 4                                  /* conservative estimate: lifetime 4, take the next instruction */
endfor
(I)
Algorithm: in-group lifetime calculation
Input: current instruction curr_instr, lifetime life
Parameters: four-instruction group group, index i, columns distance, dependent, rewrite
Output: write discard wd
Initialization: group = getGroup(curr_instr); fillThreeColumns(group); i = 0; life = 0; wd = 0
while (i ≤ 2 && (dependent[i] || !rewrite[i]))
    if (dependent[i] && distance[i] == 4) life = 4; wd = 0; break; endif
    if (dependent[i] && distance[i] != 4) life++; endif
    if (rewrite[i]) break; endif
    i++
endwhile
if (i ≤ 2 && rewrite[i] && life != 4) wd = 1; return wd; endif   /* rewrite in-range detected */
updateCurrInstrToLastInstrOfGroup(group); return wd
(c)
(II)
Algorithm: out-of-group write-discarding judgement
Input: source instruction src_instr, current instruction curr_instr
Parameters: next instruction next_instr, branch point container bp
bp.clear(); bp.push_back(curr_instr)
while (!bp.empty())
    next_instr = getNextInstr(bp.pop_back())
    while (1)
        if (isConsumer(src_instr, next_instr)) return 0; endif   /* a later read of the register was found */
        if (isRewrite(src_instr, next_instr) || isReachEnd(next_instr))
            break; endif                                         /* no later read on this path, next round of estimation */
        if (isConditionalBranch(next_instr))
            if (bp.find(next_instr)) break; endif                /* loop detected, next round of estimation */
            bp.push_back(next_instr)                             /* no loop, save the branch point in the container */
            next_instr = getInstrFromBranchSucc(next_instr)      /* follow the branch-taken path */
        endif
        else next_instr = getNextInstr(next_instr) endelse
    endwhile
endwhile
return 1
(III)
Claims (2)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2013103638857A CN103425460A (en) | 2013-08-20 | 2013-08-20 | Writing back and discarding method of low-power-consumption register file |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN103425460A (en) | 2013-12-04 |
Family
ID=49650267
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2013103638857A Pending CN103425460A (en) | 2013-08-20 | 2013-08-20 | Writing back and discarding method of low-power-consumption register file |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103425460A (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1758229A (en) * | 2005-10-28 | 2006-04-12 | 中国人民解放军国防科学技术大学 | Local space shared memory method of heterogeneous multi-kernel microprocessor |
| US20120093237A1 (en) * | 2009-06-24 | 2012-04-19 | Vixs Systems, Inc. | Processing system with register arbitration and methods for use therewith |
- 2013-08-20: CN application CN2013103638857A (CN103425460A), status: Pending
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1758229A (en) * | 2005-10-28 | 2006-04-12 | 中国人民解放军国防科学技术大学 | Local space shared memory method of heterogeneous multi-kernel microprocessor |
| US20120093237A1 (en) * | 2009-06-24 | 2012-04-19 | Vixs Systems, Inc. | Processing system with register arbitration and methods for use therewith |
Non-Patent Citations (1)
| Title |
|---|
| ZHENG YU et al.: "A Low Power Register File with Asynchronously Controlled Read-Isolation and Software-Directed Write-Discarding", Circuits and Systems (ISCAS), 2013 IEEE International Symposium on * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108958453A (en) * | 2018-07-03 | 2018-12-07 | 中国人民解放军国防科技大学 | Low-power-consumption access method and device for register file |
| CN108958453B (en) * | 2018-07-03 | 2020-06-05 | 中国人民解放军国防科技大学 | Low-power-consumption access method and device for register file |
| WO2021147449A1 (en) * | 2020-01-23 | 2021-07-29 | Huawei Technologies Co., Ltd. | Method and apparatus for predicting and scheduling copy instruction for software pipelined loops |
| US11366646B2 (en) | 2020-01-23 | 2022-06-21 | Huawei Technologies Co., Ltd. | Method and apparatus for predicting and scheduling copy instruction for software pipelined loops |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11900113B2 (en) | | Data flow processing method and related device |
| CN104423929B (en) | | A kind of branch prediction method and relevant apparatus |
| CN101373427B (en) | | Program Execution Control Device |
| CN104854560B (en) | | Method and device for software and hardware cooperative prefetching |
| CN101504618B (en) | | Real-time thread migration method for multi-core processors |
| US8751823B2 (en) | | System and method for branch function based obfuscation |
| JP2021103577A (en) | | Processing method for circulation instruction, electronic device, computer-readable storage medium, and computer program |
| US8954775B2 (en) | | Power gating functional units of a processor |
| CN102508635B (en) | | Processor device and loop processing method thereof |
| KR101817459B1 (en) | | Instruction for shifting bits left with pulling ones into less significant bits |
| KR20180021812A (en) | | Block-based architecture that executes contiguous blocks in parallel |
| Abbaspour et al. | | A time-predictable stack cache |
| TWI469046B (en) | | Register allocation in rotation based alias protection register |
| CN112667289B (en) | | CNN reasoning acceleration system, acceleration method and medium |
| CN102360306A (en) | | Method for extracting and optimizing information of cyclic data flow charts in high-level language codes |
| CN103425460A (en) | | Writing back and discarding method of low-power-consumption register file |
| Anand et al. | | Instruction cache locking inside a binary rewriter |
| US20110320781A1 (en) | | Dynamic data synchronization in thread-level speculation |
| WO2009024907A2 (en) | | Data processing with protection against soft errors |
| CN103425498B (en) | | A kind of long instruction words command memory of low-power consumption and its method for optimizing power consumption |
| CN117348936A (en) | | Processor, finger fetching method and computer system |
| CN109683959B (en) | | Instruction execution method of processor and processor thereof |
| CN104484286B (en) | | Data prefetching method based on location aware in Cache networks on piece |
| US9411724B2 (en) | | Method and apparatus for a partial-address select-signal generator with address shift |
| CN102789428B (en) | | Instruction cache device and control method thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| WD01 | Invention patent application deemed withdrawn after publication | ||
| WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20131204 |