CN1783009A - Hot route searching method in assembly code hot function - Google Patents
Hot route searching method in assembly code hot function Download PDFInfo
- Publication number
- CN1783009A CN1783009A CN 200410009960 CN200410009960A CN1783009A CN 1783009 A CN1783009 A CN 1783009A CN 200410009960 CN200410009960 CN 200410009960 CN 200410009960 A CN200410009960 A CN 200410009960A CN 1783009 A CN1783009 A CN 1783009A
- Authority
- CN
- China
- Prior art keywords
- path
- block
- hot
- loop
- basic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 43
- 239000012141 concentrate Substances 0.000 abstract description 2
- 230000009286 beneficial effect Effects 0.000 abstract 1
- 238000004458 analytical method Methods 0.000 description 13
- KDYFGRWQOYBRFD-UHFFFAOYSA-N succinic acid Chemical compound OC(=O)CCC(O)=O KDYFGRWQOYBRFD-UHFFFAOYSA-N 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 10
- 208000033986 Device capturing issue Diseases 0.000 description 5
- 238000010586 diagram Methods 0.000 description 5
- 230000003068 static effect Effects 0.000 description 5
- 230000009191 jumping Effects 0.000 description 4
- 238000005457 optimization Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 230000007547 defect Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
Images
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
本发明公开一种汇编代码热函数中的热路径搜寻方法,用于搜寻汇编代码中执行频率高的路径,包括:将所述汇编代码还原成控制流图,所述控制流图由多个基本块组成,并且包括所述多个基本块的信息;根据所述基本块的信息识别所述控制流图中基本块的循环信息以及循环嵌套信息;分别将所述控制流图中的每一层循环作为当前循环来搜寻该循环中所有的路径,在搜寻当前循环的路径时,不考虑当前循环的子循环的基本块;计算搜寻出的各个路径的执行频率,根据所述路径的执行频率挑选出所述热路径。本发明方法的优点:能从汇编代码中,快速准确地找出对性能影响最密切、执行频率最高的若干条热路径;有利于开发人员专注分析热路径上的代码,节省工作量。
The invention discloses a method for searching hot paths in hot functions of assembly codes, which is used to search for paths with high execution frequency in assembly codes, including: restoring the assembly codes to control flow graphs, the control flow graphs are composed of multiple basic blocks, and includes the information of the plurality of basic blocks; identify the cycle information and loop nesting information of the basic blocks in the control flow graph according to the information of the basic blocks; The layer cycle is used as the current cycle to search for all the paths in the cycle. When searching for the path of the current cycle, the basic blocks of the sub-cycles of the current cycle are not considered; the execution frequency of each path found is calculated, and according to the execution frequency of the path Pick out the thermal path. The method of the present invention has the advantages of being able to quickly and accurately find out a number of hot paths that most closely affect performance and have the highest execution frequency from assembly codes; it is beneficial for developers to concentrate on analyzing codes on hot paths and save workload.
Description
技术领域technical field
本发明涉及一种热路径搜寻方法,特别是涉及一种汇编代码热函数中的热路径搜寻方法。The invention relates to a method for searching hot paths, in particular to a method for searching hot paths in hot functions of assembly codes.
背景技术Background technique
在编译器的开发过程中,开发人员经常会遇到这样的一些问题:编译器的性能不如预期的好,却不知道问题出在哪里;做了某种改进之后,结果发现在有些测试例子上性能得到了提高,但是在有些例子上却有所下降;拿到性能不同的两种编译器,想知道它们在优化手段上的差异却不知如何下手。这类问题通常都要靠性能分析来解决。During the development of the compiler, developers often encounter such problems: the performance of the compiler is not as good as expected, but they don't know where the problem is; The performance has been improved, but it has declined in some cases; I got two compilers with different performances, and I want to know the difference in their optimization methods, but I don't know how to start. Such problems are usually resolved by performance analysis.
性能分析指的是动态地或者静态地分析程序生成的代码,找出其中的不完善之处或者是有性能潜力之处。性能分析工具可分为动态和静态两种。Performance analysis refers to dynamically or statically analyzing the code generated by the program to find out the imperfections or performance potential. Performance analysis tools can be divided into dynamic and static.
动态分析指的是在程序运行时,利用软件或硬件收集运行时的信息,最后给出统计信息,辅助开发人员找出性能瓶颈。比如Intel公司的VTune(Intel Corp,VTunePerformance Analyzers)可以按时间或事件对当前执行的指令采样,或者对程序进行插桩(Instrumentation),给出在各个函数、各行代码上CPU执行时间的统计信息。再如HP公司的perfmon(HP Corp,Overview of the Perfmon),它利用IA-64 Itanium处理器上的性能管理部件(PMU),统计CPU时间在各类事件上的分布,如Cache不命中、流水线清空和寄存器栈溢出等。Dynamic analysis refers to the use of software or hardware to collect runtime information when the program is running, and finally gives statistical information to assist developers in finding performance bottlenecks. For example, Intel Corporation's VTune (Intel Corp, VTunePerformance Analyzers) can sample the currently executed instructions by time or event, or perform instrumentation on the program, and give statistical information on the CPU execution time of each function and each line of code. Another example is HP's perfmon (HP Corp, Overview of the Perfmon), which uses the performance management unit (PMU) on the IA-64 Itanium processor to count the distribution of CPU time on various events, such as Cache miss, pipeline Flush and register stack overflow, etc.
静态分析指的是分析编译器生成的汇编代码,找出问题。Static analysis refers to analyzing the assembly code generated by the compiler to find problems.
ORC是IA64/Linux平台上一个面向研究的开放源码优化编译器。在开发ORC编译器的过程中,先使用VTune和perfmon做动态性能分析。但是perfmon等给出的宏观统计信息对微观问题并无多大帮助。比如虽然它指出了程序多少时间花在了指令停等上,但却无从得知哪些地方可以通过指令调度去克服。再如它指出了程序多少时间花在了执行指令上,但却不知哪些指令可以通过复写传播(CopyPropagation)、死代码删除(Dead-Code Elimination)等技术优化掉。这些信息都要靠从汇编代码中静态分析得来。ORC is a research-oriented open source optimizing compiler for the IA64/Linux platform. In the process of developing the ORC compiler, first use VTune and perfmon for dynamic performance analysis. But the macro statistics given by perfmon et al. don't help much with micro problems. For example, although it points out how much time the program spends on instruction stopping and waiting, it has no way of knowing which places can be overcome by instruction scheduling. Another example is that it points out how much time the program spends executing instructions, but it does not know which instructions can be optimized through technologies such as Copy Propagation and Dead-Code Elimination. This information is obtained from static analysis of assembly code.
但在现实中,分析汇编码的工作量很大,开发人员常常会有无从下手的感觉。比如SPEC2000中测试例子的汇编码行数动辄几十万行,无法全部分析。为了解决上述问题,一个易想到的方法是根据程序执行的2/8分布原则:程序80%的时间都花在执行20%的代码,因此只需分析那些对性能影响最密切、执行频率最高的少数几个函数,即热函数。但是通常碰到的情况是这些热函数的汇编码也有数万行,特别是在做了内联(Inlining)之后,分析量依然很大。对这种思路继续拓展,可以集中精力去分析热函数中的热代码。要找出热代码,就需搜寻出热路径。所谓热路径,指的是程序执行最常走的几条路径。由于程序绝大部分时间都是花在循环上,如果把整个函数看作最外层的循环,那么每个热路径必然位于某个循环之中。因此对于选定的热函数,先要找出函数中的循环,然后在每个循环中搜寻热路径,最后按执行频率排序,选出最热的几条路径作为性能分析的基础。But in reality, the workload of analyzing assembly code is very large, and developers often feel that they have no way to start. For example, the number of assembler code lines in the test examples in SPEC2000 is often hundreds of thousands of lines, which cannot be fully analyzed. In order to solve the above problems, an easy-to-think method is based on the 2/8 distribution principle of program execution: 80% of the program time is spent executing 20% of the code, so only those that have the most impact on performance and the highest execution frequency need to be analyzed A handful of functions, the hot functions. However, what is usually encountered is that the assembly code of these hot functions also has tens of thousands of lines, especially after inlining (Inlining), the analysis volume is still very large. Continuing to expand this idea, we can concentrate on analyzing the hot code in the hot function. To find the hot code, you need to search for the hot path. The so-called hot paths refer to the most common paths for program execution. Since most of the program's time is spent in loops, if the entire function is regarded as the outermost loop, then each hot path must be located in a certain loop. Therefore, for the selected hot function, it is necessary to find out the loops in the function first, then search for hot paths in each loop, and finally sort by execution frequency, and select the hottest paths as the basis for performance analysis.
发明内容Contents of the invention
本发明的目的是克服现有的编译器静态性能分析时汇编代码数量大,不容易查找出关键代码进行分析的缺陷,提供一种编译器静态性能分析方法,即汇编代码热函数中的热路径搜寻方法,它能在编译器编译生成的汇编代码中,快速准确地找出对性能影响最密切、执行频率最高的若干条路径。The purpose of the present invention is to overcome the large number of assembly codes in the static performance analysis of existing compilers, and it is not easy to find out the defects of key codes for analysis, and provide a static performance analysis method for compilers, that is, the hot path in the assembly code hot function Search method, which can quickly and accurately find several paths that have the most impact on performance and the highest execution frequency in the assembly code generated by the compiler.
为实现上述目的,本发明提供了一种汇编代码热函数中的热路径搜寻方法,用于搜寻汇编代码中执行频率高的路径,包括:In order to achieve the above object, the present invention provides a method for searching hot paths in assembly code hot functions, which is used to search for paths with high execution frequency in assembly codes, including:
步骤a):将所述汇编代码还原成控制流图,所述控制流图由多个基本块组成,并且包括所述多个基本块的信息;Step a): restoring the assembly code into a control flow graph, the control flow graph is composed of a plurality of basic blocks and includes information of the plurality of basic blocks;
步骤b):根据所述基本块的信息识别所述控制流图中基本块的循环信息以及循环嵌套信息;Step b): identifying the loop information and loop nesting information of the basic block in the control flow diagram according to the information of the basic block;
步骤c):分别将所述控制流图中的每一层循环作为当前循环来搜寻该循环中所有的路径;其中,在搜寻当前循环的路径时,不考虑当前循环的子循环的基本块;Step c): using each loop in the control flow graph as the current loop to search for all the paths in the loop; wherein, when searching for the path of the current loop, the basic blocks of the sub-loops of the current loop are not considered;
步骤d):计算搜寻出的各个路径的执行频率,根据所述路径的执行频率挑选出所述热路径。Step d): Calculate the execution frequency of each searched path, and select the hot path according to the execution frequency of the path.
上述技术方案中,所述基本块的信息包括:基本块的执行频率、基本块的前趋块和后继块、基本块前进到后继块的概率。In the above technical solution, the information of the basic block includes: the execution frequency of the basic block, the predecessor block and the successor block of the basic block, and the probability that the basic block advances to the successor block.
上述技术方案中,在步骤c)中,所述路径从当前循环的入口块开始,并沿着基本块的执行路径前进,如果在前进过程中遇到当前循环的子循环,则跳过该子循环的基本块继续前进,当在前进过程中,一当前基本块满足一终止条件时,将该当前基本块作为该路径的最后一个基本块;所述终止条件为:In the above technical solution, in step c), the path starts from the entry block of the current loop, and advances along the execution path of the basic block, if a sub-loop of the current loop is encountered during the forward process, the sub-loop is skipped. The basic block of circulation moves on, and when in advancing process, when a current basic block satisfies a termination condition, this current basic block is used as the last basic block of this path; Said termination condition is:
1)所述当前基本块没有后继块;1) The current basic block has no successor block;
2)所述当前基本块至少有一个后继块不在当前循环内。2) At least one successor block of the current basic block is not in the current cycle.
上述技术方案中,在步骤c)中,还包括设定一阀值,在搜寻路径时,当基本块前进到其一后继块的概率小于该阀值时,则不考虑该基本块到该后继块的执行路径。In the above-mentioned technical solution, in step c), it also includes setting a threshold value. When searching for a path, when the probability that the basic block advances to a successor block is less than the threshold value, then the basic block is not considered to go to the successor block. The block's execution path.
在步骤d)中,所述路径的执行频率是组成该路径的所有基本块的执行频率以及组成该路径的相邻基本块之间的执行概率的乘积。In step d), the execution frequency of the path is the product of the execution frequencies of all the basic blocks making up the path and the execution probabilities between adjacent basic blocks making up the path.
本发明所述方法的优点在于:The advantage of method of the present invention is:
1.能从编译器编译生成的大量汇编代码中,快速准确地找出对性能影响最密切、执行频率最高的若干条热路径。1. From the large amount of assembly code generated by the compiler, quickly and accurately find out several hot paths that have the most impact on performance and the highest execution frequency.
2.有利于编译器开发人员专注精力分析热路径上的代码,节省工作量。2. It helps compiler developers to focus on analyzing the code on the hot path, saving workload.
附图说明Description of drawings
图1为本发明方法的流程图Fig. 1 is the flowchart of the inventive method
图2为实施例中的hot_path_enum函数的伪代码示意图。Fig. 2 is a schematic diagram of the pseudocode of the hot_path_enum function in the embodiment.
图3和图4为实施例中的enum_paths函数的伪代码示意图。Fig. 3 and Fig. 4 are schematic diagrams of pseudo codes of the enum_paths function in the embodiment.
图5为对CPU2000gap中的一个函数CollectGarb做热路径搜寻时的一个输出。Figure 5 is an output of a hot path search for a function CollectGarb in CPU2000gap.
图6为一个实施例中的代码控制流图Fig. 6 is a code control flow diagram in an embodiment
具体实施方式Detailed ways
下面结合附图和具体实施方式对本发明作进一步详细描述。The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.
如图1所示,一个汇编代码热函数中的热路径搜寻方法包括以下步骤:As shown in Figure 1, a hot path search method in an assembly code hot function includes the following steps:
在步骤10中,将汇编代码还原成控制流图。控制流图是指编译器中所使用的一种抽象数据结构,它是一个程序或过程的抽象表述。在控制流图中,每个节点表示一个基本块,有向边表示控制流中的调转。控制流图还有两个特殊块(block):入口块(entry block),出口块(exit block),控制流通过入口块进入控制流图,通过出口块结束控制流。还原控制流图的目的是为识别循环做准备。还原控制流图的操作是利用编译器profiling技术实现的。编译器profiling是一种取得程序执行信息的技术,它是一种现有技术,利用编译器profiling技术可以动态获取如基本块的执行次数,程序中每个函数执行的时间、控制流转移概率等信息。在还原控制流图操作执行完毕以后,每个基本块至少标记有下列信息:In
Block ID:基本块的ID;Block ID: the ID of the basic block;
Frequency:基本块的执行频率;Frequency: the execution frequency of the basic block;
Predecessors:基本块在控制流图上所有前驱的ID;Predecessors: IDs of all predecessors of the basic block on the control flow graph;
Successors:基本块在控制流图上所有后继的ID;Successors: IDs of all successors of the basic block on the control flow graph;
Probabilities:依次对应控制流转移到它的每个后继的概率。Probabilities: in turn correspond to the probability that the control flow is transferred to each of its successors.
图6就是一个还原以后的控制流图,在该图中包含了基本块的个数与标识,后继块的执行频率,基本块转移到其后继块的概率等信息。Figure 6 is a restored control flow graph, which contains information such as the number and identification of the basic block, the execution frequency of the successor block, and the probability of transferring the basic block to its successor block.
由于本步骤可以采用现有技术实现,在此不再详细描述。Since this step can be implemented using existing technologies, it will not be described in detail here.
在步骤20中,由基本块的信息识别控制流图中基本块的循环信息。编译器在做循环优化时在基本块上记录循环信息,并且在代码输出时随基本块一同输出。识别循环的结果是,每个循环都有如下几种信息:In
Loop entry:简称为entry,循环的唯一入口;Loop entry: referred to as entry, the only entry of the loop;
Loop tail:简称为tail,循环的唯一结尾基本块,它有指向入口的循环回边;Loop tail: referred to as tail, the only ending basic block of the loop, which has a loop back edge pointing to the entry;
Loop side exits:s0…sm,循环的0个或多个旁出口;Loop side exits: s 0 …s m , 0 or more side exits of the loop;
Loop blocks:bb1…bbn,位于此循环的所有基本块。Loop blocks: bb 1 …bb n , all basic blocks located in this loop.
上述的循环的结尾基本块或旁出口定义为循环出口。整个函数可以被当作是最外层循环,函数的入口就是最外层循环的入口,函数的出口就是最外层循环的结尾。最外层循环没有旁出口,它的基本块即是函数中的所有基本块。The above-mentioned ending basic blocks or side exits of the loop are defined as loop exits. The entire function can be regarded as the outermost loop, the entry of the function is the entry of the outermost loop, and the exit of the function is the end of the outermost loop. The outermost loop has no side exits, and its basic blocks are all basic blocks in the function.
在图6中,所要识别的函数就是最外层循环loop1,它包含所有的基本块。循环loop2由基本块b,e组成。In Figure 6, the function to be identified is the outermost loop loop1, which contains all basic blocks. The loop loop2 consists of basic blocks b, e.
由于循环的识别可以在编译器中实现,故具体的循环识别过程在此也不再详细描述。Since the identification of the cycle can be implemented in the compiler, the specific process of identifying the cycle will not be described in detail here.
在步骤30中,识别循环嵌套信息。循环的嵌套信息,也就是要标明每个循环的紧邻的父循环是哪个。循环嵌套信息的识别利用如下两个原理:1)如果loop1的entry和tail都在loop2之中,那么loop2必然是loop1的祖先循环;2)如果loop2是loop1的祖先循环中大小最小的,那么loop2就是loop1的父循环。对所有循环按大小排序,由小到大扫描小循环是否在大循环之中,识别出循环的嵌套信息。利用上述两个原理,对某个循环可以具体识别出其子循环是哪些,父循环是哪些。In
在图6中,可知loop2是loop1的子循环。In Figure 6, it can be seen that loop2 is a sub-loop of loop1.
在步骤40中,搜寻循环中的热路径。在发明内容中已经提到,搜寻热路径的过程中的一个优选方案是利用阀值技术快速搜寻热路径。在本实施例中,即以该优选方案为例,对本步骤进行详细描述。In
在对本操作步骤进行详细描述以前,先介绍如何实现对子循环的处理。Before describing this operation step in detail, first introduce how to realize the processing of the sub-cycle.
热路径的判断与识别是以路径的执行频率为标准的,但由于子循环能够自我循环,会得出错误的当前路径的执行频率,因此在对一个循环进行热路径的查找过程中应当跳过子循环。具体操作步骤如下:设当前路径cur_path={entry,…,bbi},并且bbi的一个后继cur_block是某子循环的入口,并设该子循环的k个出口后继(post exit)分别为pe1,…,pek。所谓循环的出口后继是指当前循环的所有循环出口的后继块,且这些块不在当前循环之中。对子循环的处理,我们只需分别跳到子循环的每个出口后继pei(i=0,1,2…k),让当前路径cur_path={entry,…,cur_block,pei}。相应地,当前路径的频率要乘以控制流从子循环跳到pei的概率。The judgment and identification of the hot path is based on the execution frequency of the path, but because the sub-loop can loop itself, it will get the wrong execution frequency of the current path, so it should be skipped during the hot path search for a loop sub loop. The specific operation steps are as follows: set the current path cur_path={entry,...,bb i }, and a successor cur_block of bb i is the entry of a certain sub-loop, and set the k post exits of the sub-loop to be pe 1 ,..., pe k . The so-called loop exit successor refers to the successor blocks of all loop exits of the current loop, and these blocks are not in the current loop. For sub-loop processing, we only need to jump to each exit successor pe i (i=0, 1, 2...k) of the sub-loop, let the current path cur_path={entry,..., cur_block, pe i }. Correspondingly, the frequency of the current path is multiplied by the probability of control flow jumping from the subloop to pe i .
对子循环的处理,应用于下述的快速搜寻热路径的方法。The processing of sub-loops is applied to the method of fast hot path search described below.
热路径快速搜寻的实现方法是:The implementation method of hot path fast search is:
所述路径从当前循环的入口块开始,并沿着基本块的执行路径前进,如果在前进过程中遇到当前循环的子循环,则跳过该子循环的基本块继续前进,当在前进过程中,一当前基本块满足一终止条件时,将该当前基本块作为该路径的最后一个基本块;所述终止条件为:The path starts from the entry block of the current loop and advances along the execution path of the basic block. If a sub-loop of the current loop is encountered during the forward process, the basic block of the sub-loop is skipped and continues to move forward. In, when a current basic block satisfies a termination condition, the current basic block is taken as the last basic block of the path; the termination condition is:
1)所述当前基本块没有后继块;1) The current basic block has no successor block;
2)所述当前基本块至少有一个后继块不在当前循环内。2) At least one successor block of the current basic block is not in the current cycle.
路径的执行频率是组成该路径的所有基本块的执行频率以及组成该路径的相邻基本块之间的执行概率的乘积。The execution frequency of a path is the product of the execution frequencies of all the basic blocks that make up the path and the execution probabilities between adjacent basic blocks that make up the path.
在热路径的搜寻过程中采用阀值技术,在搜寻过程中,设定一阀值,若基本块跳转到一后继基本块的概率小于该阀值,在不考虑该基本块到该后继块的执行路径,这样做加快了热路径的搜寻速度。The threshold technology is used in the search process of the hot path. In the search process, a threshold value is set. If the probability of a basic block jumping to a subsequent basic block is less than the threshold value, the basic block to the subsequent block is not considered. Execution path, which speeds up the search for hot paths.
对上述的热路径快速搜寻方法,在本实施例中,用函数hot_path_enum(loop)来实现。该函数的作用是找出当前循环loop及其子循环中的所有热路径,用它对函数最外层循环进行调用即可找出函数中的所有热路径。参照图2,它包含以下步骤:In this embodiment, the above-mentioned fast hot path search method is realized by using the function hot_path_enum(loop). The function of this function is to find out all the hot paths in the current loop loop and its sub-loops, and use it to call the outermost loop of the function to find out all the hot paths in the function. Referring to Figure 2, it includes the following steps:
步骤41:定义一个变量cur_path,将它置空,表示目前尚无热路径。Step 41: Define a variable cur_path, and leave it empty, indicating that there is no hot path yet.
步骤42:用变量eblock来标记循环的入口块。Step 42: Use the variable eblock to mark the entry block of the loop.
步骤43:用变量cur_path_freq来标记eblock的执行频率。Step 43: Use the variable cur_path_freq to mark the execution frequency of the eblock.
步骤44:调用函数enum_paths(eblock,cur_path,cur_path_freq,loop),计算当前循环的热路径。其中的变量cur_path_freq用来记录路径的执行频率,变量loop表示当前循环。Step 44: Call the function enum_paths(eblock, cur_path, cur_path_freq, loop) to calculate the hot path of the current loop. The variable cur_path_freq is used to record the execution frequency of the path, and the variable loop indicates the current loop.
步骤45:计算循环的所有子循环的热路径。Step 45: Compute the thermal paths of all sub-loops of the loop.
在步骤44中调用了函数enum_paths(eblock,cur_path,cur_path_freq,loop),该函数是用来计算热路径的,在该函数中运用了阀值技术。它的操作流程较为复杂,下面参照图3和图4对其具体操作进行详细说明。In
调用函数enum_paths(cur_block,cur_path,cur_path_freq,loop),其中cur_block表示当前块,cur_path表示当前路径,cur_path_freq表示当前路径的执行频率,loop表示所在循环。Call the function enum_paths(cur_block, cur_path, cur_path_freq, loop), where cur_block represents the current block, cur_path represents the current path, cur_path_freq represents the execution frequency of the current path, and loop represents the loop it is in.
步骤100:判断当前块cur_block是否在循环中,若为否,则返回,结束当前函数,若为是,继续执行下面操作。Step 100: Determine whether the current block cur_block is in a loop, if not, return and end the current function, if yes, continue to perform the following operations.
步骤200:将当前块cur_block加入当前路径cur_path中。Step 200: Add the current block cur_block into the current path cur_path.
步骤300:判断cur_block是否有后继块,若有后继块,跳转到步骤500,若没有后继块,继续执行下一步操作。Step 300: Determine whether cur_block has a successor block, if there is a successor block, jump to step 500, if there is no successor block, proceed to the next step.
步骤400:输出当前的热路径cur_path及其执行频率cur_path_freq,跳转到步骤700。Step 400: Output the current hot path cur_path and its execution frequency cur_path_freq, and jump to step 700.
步骤500:将标记alread_dump置0,表示尚未输出热路径及其执行频率。Step 500: Set the flag already_dump to 0, indicating that the hot path and its execution frequency have not been output.
步骤600:将curr_block的后继块记为succ,并对后继块进行以下操作。Step 600: Record the successor block of curr_block as succ, and perform the following operations on the successor block.
步骤610:判断后继块succ是否是当前循环loop的一个子循环的入口,若为否,跳转到步骤620,若为是,继续执行下面操作。Step 610: Determine whether the successor block succ is the entry of a sub-loop of the current loop, if no, jump to step 620, if yes, continue to perform the following operations.
步骤611:设置一变量tmp_cur_path_freq,该变量的大小为当前路径执行频率cur_path_freq和由当前块cur_block到该后继块succ的边的概率之积。Step 611: Set a variable tmp_cur_path_freq, which is the product of the current path execution frequency cur_path_freq and the probability of the edge from the current block cur_block to the successor block succ.
步骤612:用inner_loop来标识后继块succ所在的子循环,该子循环的出口若有后继块,则用post_exit来标识这新的后继块,对新的后继块执行以下操作。Step 612: Use inner_loop to identify the sub-loop where the successor block succ is located. If there is a successor block at the exit of the sub-loop, use post_exit to identify the new successor block, and perform the following operations on the new successor block.
步骤612a:将tmp_cur_path_freq和由inner_loop跳转到post_exit的概率相乘,所得结果重新置入tmp_cur_path_freq中。
步骤612b:调用enum_paths(post_exit,cur_path,tmp_cur_path_freq,loop);计算由post_exit开始的所有的热路径。
步骤613:判断子循环inner_loop的出口的其他后继块post_exit中的热路径是否已经计算完毕,若没有计算完,则跳转到步骤612a,若已经计算完,则继续以下操作。Step 613: Determine whether the hot paths in other subsequent blocks post_exit of the exit of the sub-loop inner_loop have been calculated. If not, jump to step 612a. If calculated, continue the following operations.
步骤620:如果后继块succ不在循环之内或者cur_block的后继路径(即cur_block->succ)是一条回边,则执行以下操作,若不是上述情况,则跳转到步骤630。Step 620: If the successor block succ is not within the loop or the successor path of cur_block (namely cur_block->succ) is a return edge, then perform the following operations; if not, go to step 630.
步骤621:判断already_dump是否为0,若为0,则输出当前的热路径集合和执行频率,并继续执行以下操作。如果不为0,则不做任何操作,并跳转到步骤630。Step 621: Determine whether already_dump is 0, if it is 0, output the current set of hot paths and execution frequency, and continue to perform the following operations. If not 0, do nothing and jump to step 630.
步骤622:将already_dump置为1。Step 622: Set already_dump to 1.
步骤630:判断cur_block的后继路径(即cur_block->succ)所在边的执行概率是否小于阀值,所述的阀值大小是用户根据实际需要而给出的。若为是,则跳转到步骤660,若为否,继续下列操作。Step 630: Determine whether the execution probability of the side where the successor path of cur_block (ie cur_block->succ) is located is less than a threshold value, the threshold value is given by the user according to actual needs. If yes, jump to step 660, if no, continue the following operations.
步骤640:将cur_path_freq与cur_block->succ这条边的执行概率相乘,并将相乘所得的结果置于tmp_cur_path_freq。Step 640: Multiply cur_path_freq by the execution probability of the edge cur_block->succ, and place the multiplied result in tmp_cur_path_freq.
步骤650:调用函数enum_paths(succ,cur_path,tmp_cur_path_freq,loop)以计算从succ开始的所有热路径集合。Step 650: Call the function enum_paths(succ, cur_path, tmp_cur_path_freq, loop) to calculate the set of all hot paths starting from succ.
步骤660:判断对于当前块curr_block的所有后继块succ是否都已经计算其内部的热路径,若为否,选择一内部热路径未计算的后继块,并跳转到步骤600,若为是,继续以下操作。Step 660: Determine whether the internal heat paths of all successor blocks succ of the current block curr_block have been calculated, if not, select a successor block whose internal heat paths have not been calculated, and jump to step 600, if yes, continue Do the following.
步骤700:结束操作,退出函数。Step 700: End the operation and exit the function.
上述步骤中的步骤610至步骤613就是处理子循环的具体实现。Step 610 to step 613 in the above steps are the specific realization of the processing sub-loop.
以上所述即为本发明的方法。The above is the method of the present invention.
利用编译过程中的profiling可以对热路径按执行频率排序,并选出其中最热的几条路径(即执行频率最高的几条路径)作为性能分析的基础。Using the profiling in the compilation process, the hot paths can be sorted by execution frequency, and the hottest paths (that is, the paths with the highest execution frequency) can be selected as the basis for performance analysis.
下面举一个具体的例子来说明本实施例的方法。A specific example is given below to illustrate the method of this embodiment.
如图6所示,为一段代码的控制流图,在该图中,有a,b,c,d,e,f,g七个基本块,基本块旁边的整数代表这个基本块的执行频率,块与块之间连有边,边上的数字代表走这个边的概率。在图中共有两个循环,一个是最外层循环,记为loop1,它包含所有的基本块。另一个是子循环loop2,它由基本块b,e组成。从a开始搜寻loop1中的路径,a有2个后继块,分别为b和c,a的执行频率为10,从a走向b的概率为0.2,从a走向c的概率为0.8。为说明对子循环的处理方法,暂不使用阈值技术。先从a走向b,当前路径cur_path={a},频率为10。因为b是子循环loop2的入口,从前述的对子循环处理可知,应当跳过子循环loop2,所以直接跳到loop2的出口后继d和f,由此分裂出两条路径:{a,d}和{a,f},{a,d}的频率为10×0.2×0.01=0.02;{a,f}的频率为10*0.2*0.01=0.02。这两条路继续向下遍历,生成两条路径:{a,d,g}和{a,f,g},频率分别为0.02和0.02。此时再向下遍历,由于基本块g没有后继块,因此路径被终止,最终得到两条路径{a,d,g}和{a,f,g},频率分别为0.02和0.02。a若走向c,当前路径cur_path={a},频率为10,a的后继块c不满足路径的终止条件,重复使用本发明所述的热路径搜寻方法,以基本块c作为当前块,其后继块为g,将c加入当前路径中,当前路径cur_path={a,c},频率为10×0.8=8,同样地向下遍历,得到路径{a,c,g},频率为10×0.8×1.0=8,由于g没有后继块,满足路径的终止条件,最终得到的路径为{a,c,g},频率为8。对最外层循环loop1实现热路径搜寻后,对在第一次热路径搜寻中跳过的子循环loop2进行热路径搜寻。loop2中有两个基本块b和e。在该循环中,循环的入口块为b,循环的出口块为b和e。若b既做入口块又做出口块,很容易得到一条路径{b},其频率为100。若b为入口块,e为出口块,则根据本发明所述方法也能够容易地得到路径{b,e},其频率为100×0.99=99。故在循环loop2中有两条路径{b}和{b,e},它们的频率分别为100和99。从本段代码的控制流图中,根据本发明所述的方法可以得到5条路径,即:{a,d,g},{a,f,g},{a,c,g},{b}和{b,e}。如果对热路径进行排序,取前3个最热路径,那么它们依次是{b},{b,e},{a,c,g}。以上所述是在不使用阀值技术的条件下。若使用阈值技术,假设设定阀值为0.1,由于从a跳转到d和f的概率小于0.1,因此就不会去遍历{a,d,g}和{a,f,g}两条路径,最终得到的路径为{a,c,g},{b}和{b,e}。这样做显然可以节省遍历相关基本块的时间,加快搜寻速度,从而达到快速搜寻热路径的目的。As shown in Figure 6, it is a control flow graph of a piece of code. In this graph, there are seven basic blocks a, b, c, d, e, f, and g. The integer next to the basic block represents the execution frequency of this basic block , There are edges between blocks, and the numbers on the edges represent the probability of walking this edge. There are two loops in the figure, one is the outermost loop, denoted as loop1, which contains all the basic blocks. The other is the sub-loop loop2, which consists of basic blocks b, e. Start searching for the path in loop1 from a, a has two successor blocks, namely b and c, the execution frequency of a is 10, the probability of going from a to b is 0.2, and the probability of going from a to c is 0.8. To illustrate the processing of subloops, the thresholding technique is not used. First go from a to b, the current path cur_path={a}, the frequency is 10. Because b is the entrance of the sub-loop loop2, we can see from the above-mentioned sub-loop processing that the sub-loop loop2 should be skipped, so it directly jumps to the exit of loop2 to follow d and f, thus splitting into two paths: {a, d} The frequency of and {a, f}, {a, d} is 10*0.2*0.01=0.02; the frequency of {a, f} is 10*0.2*0.01=0.02. These two paths continue to traverse downwards, generating two paths: {a, d, g} and {a, f, g} with frequencies of 0.02 and 0.02, respectively. At this time, traverse down again, because the basic block g has no successor block, so the path is terminated, and finally two paths {a, d, g} and {a, f, g} are obtained, and the frequencies are 0.02 and 0.02 respectively. If a goes to c, the current path cur_path={a}, the frequency is 10, the successor block c of a does not meet the termination condition of the path, and the hot path search method described in the present invention is repeatedly used, with the basic block c as the current block, its The successor block is g, add c to the current path, the current path cur_path={a,c}, the frequency is 10×0.8=8, similarly traverse downwards, and get the path {a,c,g}, the frequency is 10× 0.8×1.0=8, since g has no successor block, the termination condition of the path is met, and the final obtained path is {a, c, g} with a frequency of 8. After the hot path search is implemented for the outermost loop loop1, the hot path search is performed on the sub-loop loop2 skipped in the first hot path search. There are two basic blocks b and e in loop2. In this loop, the entry block of the loop is b, and the exit blocks of the loop are b and e. If b is both an entry block and an exit block, it is easy to obtain a path {b} with a frequency of 100. If b is an entry block and e is an exit block, the method of the present invention can also easily obtain the path {b, e}, and its frequency is 100×0.99=99. Therefore, there are two paths {b} and {b, e} in loop2, and their frequencies are 100 and 99 respectively. From the control flow graph of this code, five paths can be obtained according to the method of the present invention, namely: {a, d, g}, {a, f, g}, {a, c, g}, { b} and {b, e}. If the hot paths are sorted and the top 3 hottest paths are taken, then they are {b}, {b, e}, {a, c, g} in order. The above is under the condition that the threshold technology is not used. If the threshold technique is used, assuming that the threshold value is set to 0.1, since the probability of jumping from a to d and f is less than 0.1, it will not traverse {a, d, g} and {a, f, g}. Path, the final path is {a, c, g}, {b} and {b, e}. Doing so can obviously save the time of traversing related basic blocks and speed up the search, so as to achieve the purpose of quickly searching for hot paths.
利用本发明所述的方法可以较为方便地实现对热路径的搜索,如图5所示为利用本发明所述方法对CPU2000gap中的一个函数CollectGarb做热路径搜寻时的一个输出。The method of the present invention can be used to search for the hot path more conveniently. As shown in FIG. 5 , an output of a function CollectGarb in the CPU2000gap using the method of the present invention is used to search for the hot path.
在图中,Path#表示路径的序号,Loop#表示循环序号,Freq表示执行的频率,Hot Path &(Source Line#)就是表示具体的热路径,后面的括号内部分表示源程序的行号。在图5中,列出了CollectGarb函数中执行频率最高的5条路径,以路径2为例,该路径所在的循环序号为1,它的执行频率为5.21e+06,它的具体热路径为29 30 116 117 47,其中的数字标号表示基本块的序号,也就是说热路径2通过了基本块29、30、116、117和47。上述基本块的起始源程序的行号分别为774 789790 791 792 799 800 834。In the figure, Path# represents the sequence number of the path, Loop# represents the loop sequence number, Freq represents the frequency of execution, Hot Path &(Source Line#) represents the specific hot path, and the part in the back brackets represents the line number of the source program. In Figure 5, the five paths with the highest execution frequency in the CollectGarb function are listed. Taking
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004100099600A CN100337202C (en) | 2004-12-03 | 2004-12-03 | Hot route searching method in assembly code hot function |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004100099600A CN100337202C (en) | 2004-12-03 | 2004-12-03 | Hot route searching method in assembly code hot function |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1783009A true CN1783009A (en) | 2006-06-07 |
| CN100337202C CN100337202C (en) | 2007-09-12 |
Family
ID=36773238
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB2004100099600A Expired - Fee Related CN100337202C (en) | 2004-12-03 | 2004-12-03 | Hot route searching method in assembly code hot function |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN100337202C (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103399741A (en) * | 2013-07-24 | 2013-11-20 | 中国科学院声学研究所 | Method and device for profiling assembly-level static paths |
| CN103473319A (en) * | 2013-09-12 | 2013-12-25 | 中国科学院声学研究所 | Statistical method for hotspot data |
| CN103473168A (en) * | 2013-09-12 | 2013-12-25 | 中国科学院声学研究所 | Statistical method for hotspot program |
| CN109582444A (en) * | 2017-09-29 | 2019-04-05 | 英特尔公司 | Method and apparatus for performing region formation of dynamic binary translation processor |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101655782B (en) * | 2009-09-10 | 2012-07-18 | 浙江大学 | Implementation method of data flow graph of program obtained on the basis of assembly codes of basic block |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4782444A (en) * | 1985-12-17 | 1988-11-01 | International Business Machine Corporation | Compilation using two-colored pebbling register allocation method such that spill code amount is invariant with basic block's textual ordering |
| US6064820A (en) * | 1998-05-27 | 2000-05-16 | Hewlett-Packard Company | Apparatus and method to incrementally update single static assignment (SSA) form |
-
2004
- 2004-12-03 CN CNB2004100099600A patent/CN100337202C/en not_active Expired - Fee Related
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103399741A (en) * | 2013-07-24 | 2013-11-20 | 中国科学院声学研究所 | Method and device for profiling assembly-level static paths |
| CN103399741B (en) * | 2013-07-24 | 2016-05-25 | 中国科学院声学研究所 | A kind of assembly level static path method for decomposing and device |
| CN103473319A (en) * | 2013-09-12 | 2013-12-25 | 中国科学院声学研究所 | Statistical method for hotspot data |
| CN103473168A (en) * | 2013-09-12 | 2013-12-25 | 中国科学院声学研究所 | Statistical method for hotspot program |
| CN103473168B (en) * | 2013-09-12 | 2016-05-04 | 中国科学院声学研究所 | A kind of statistical method of focus program |
| CN103473319B (en) * | 2013-09-12 | 2017-02-15 | 中国科学院声学研究所 | Statistical method for hotspot data |
| CN109582444A (en) * | 2017-09-29 | 2019-04-05 | 英特尔公司 | Method and apparatus for performing region formation of dynamic binary translation processor |
| CN109582444B (en) * | 2017-09-29 | 2025-05-09 | 英特尔公司 | Method and apparatus for performing region formation of a dynamic binary translation processor |
Also Published As
| Publication number | Publication date |
|---|---|
| CN100337202C (en) | 2007-09-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1294486C (en) | Method and system for transparent dynamic optimization in multiple treating environment | |
| Li et al. | Hardware-software co-design of embedded reconfigurable architectures | |
| CN1302384C (en) | Recckoning multiroute operation quided by controlling quasi-independent point | |
| CN1308826C (en) | Systems and methods for CPI scheduling on SMT processors | |
| CN1308825C (en) | System and method for CPI load balancing in SMT processors | |
| CN101273332B (en) | Thread-data affinity optimization method and compiler | |
| Banerjee et al. | Automatic program parallelization | |
| CN1127016C (en) | Realizing self deserialization by register rename | |
| CN1614555A (en) | Apparatus and method for autonomic hardware assisted thread stack tracking | |
| US20120324454A1 (en) | Control Flow Graph Driven Operating System | |
| CN101044457A (en) | System, method and apparatus for dependency chain processing | |
| CN1853165A (en) | Methods and apparatuses for compiler-creating helper threads for multi-threading | |
| CN1906578A (en) | Apparatus and method for automatic thread partitioning compiler | |
| CN1877532A (en) | Compiler apparatus | |
| CN1922574A (en) | Method and system for link-time code optimization without additional code analysis | |
| CN1949185A (en) | Parallel adjusting and performance analyzing method of supporting multi-language multi-platform under isomerized environment | |
| CN119396406B (en) | Automatic increment compiling method and system for Java development | |
| CN100337202C (en) | Hot route searching method in assembly code hot function | |
| CN1834907A (en) | System, method and program product to optimize code during run time | |
| CN1542616A (en) | Compiling device, compiling method, communication terminal device, compiling program, and program product | |
| CN1716202A (en) | Method and device for processing incomplete static information in dynamic-static combined binary translation | |
| US8117604B2 (en) | Architecture cloning for power PC processors | |
| CN1670699A (en) | A Micro-scheduling Method Supporting Directed and Cyclic Graphs | |
| Alsayyah et al. | Energy efficient software development techniques for cloud based applications | |
| CN101231598B (en) | A Method of Constructing Semantic Flowchart Based on Assembler |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20070912 Termination date: 20201203 |
|
| CF01 | Termination of patent right due to non-payment of annual fee |