[go: up one dir, main page]

CN101630268B - Synchronous optimization method and synchronous optimization equipment - Google Patents

Synchronous optimization method and synchronous optimization equipment Download PDF

Info

Publication number
CN101630268B
CN101630268B CN2009101629791A CN200910162979A CN101630268B CN 101630268 B CN101630268 B CN 101630268B CN 2009101629791 A CN2009101629791 A CN 2009101629791A CN 200910162979 A CN200910162979 A CN 200910162979A CN 101630268 B CN101630268 B CN 101630268B
Authority
CN
China
Prior art keywords
synchronization
synchronous
point
version
marked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101629791A
Other languages
Chinese (zh)
Other versions
CN101630268A (en
Inventor
张昱
史成荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN2009101629791A priority Critical patent/CN101630268B/en
Publication of CN101630268A publication Critical patent/CN101630268A/en
Application granted granted Critical
Publication of CN101630268B publication Critical patent/CN101630268B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

本发明提出了一种同步优化方法及设备。该方法包括以下步骤:对被编译方法进行静态程序分析,根据分析结果确定所述被编译方法中无须对同步对象进行同步操作的同步方法调用点,并对所述同步方法调用点进行标记;根据所述标记的同步方法调用点为其调用的同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码;根据所述标记的同步方法调用点执行所述本地代码。本发明所提出的同步优化的方法及设备具有高度的灵活性和良好的可扩展性。

Figure 200910162979

The invention provides a synchronization optimization method and equipment. The method includes the following steps: performing static program analysis on the compiled method, determining a synchronization method call point in the compiled method that does not need to perform a synchronization operation on a synchronization object according to the analysis result, and marking the synchronization method call point; The marked synchronous method call point compiles and generates a local code that allows no synchronization operation to be performed on the synchronization object of the synchronous method for the called synchronous method; executes the local code according to the marked synchronous method call point. The method and device for synchronous optimization proposed by the present invention have high flexibility and good scalability.

Figure 200910162979

Description

同步优化的方法及设备Method and device for synchronous optimization

技术领域 technical field

本发明涉及计算机技术,更具体地涉及同步优化技术。The present invention relates to computer technology, and more particularly to synchronization optimization technology.

背景技术 Background technique

Java是一种在语言级别上支持多线程编程的程序设计语言,它提供基于监视器机制的同步方法和同步语句来用于线程间的同步。Java中每个对象有一个对应的监视器,线程可以通过同步方法调用和同步语句对某对象进行加锁/解锁,每次只有一个线程可以持有某监视器上的锁,其他试图对这个监视器加锁的线程都将被阻塞直到它们获得这个监视器上的锁。Java is a programming language that supports multi-threaded programming at the language level. It provides synchronization methods and synchronization statements based on the monitor mechanism for synchronization between threads. Each object in Java has a corresponding monitor. Threads can lock/unlock an object through synchronous method calls and synchronous statements. Only one thread can hold the lock on a certain monitor at a time. Others try to monitor this Threads that lock the monitor will be blocked until they acquire the lock on the monitor.

为了保证多线程安全,Java标准库中的很多类都必须设计成线程安全的,也就是在访问共享数据时需要添加同步操作,如java.util.Vector类、java.util.Hashtable类等,应用程序可以在多线程环境中简单安全地使用这些库所提供的功能。除了可以通过调用Java标准库中各个线程安全的类和方法外,程序员还可以自己利用同步方法和同步语句来实现多线程对共享数据的互斥访问,以确保程序的正确性。虽然Java中的同步机制能方便地保证多线程的安全,但是也给多线程程序的运行带来非常大的开销。据研究表明,同步开销通常占总执行时间的5%-10%;在对Marmot(一种Java优化编译器)的测试中,5个中等大小的单线程程序在同步上花费了26%~60%的时间。由此,开展同步优化(synchronization optimization)是十分重要的。In order to ensure multi-thread safety, many classes in the Java standard library must be designed to be thread-safe, that is, synchronization operations need to be added when accessing shared data, such as java.util.Vector class, java.util.Hashtable class, etc., application Programs can simply and safely use the functions provided by these libraries in a multi-threaded environment. In addition to calling various thread-safe classes and methods in the Java standard library, programmers can also use synchronization methods and synchronization statements to achieve mutual exclusive access to shared data by multiple threads to ensure the correctness of the program. Although the synchronization mechanism in Java can easily ensure the safety of multi-threading, it also brings a very large overhead to the operation of multi-threaded programs. According to research, synchronization overhead usually accounts for 5%-10% of the total execution time; in a test of Marmot (a Java optimizing compiler), 5 medium-sized single-threaded programs spent 26%-60% on synchronization. %time. Therefore, it is very important to carry out synchronization optimization.

同步优化旨在降低同步操作的开销,提高程序的整体性能。为了减少同步操作的开销,一些方法直接改良同步原语的实现,比如,瘦锁(thin lock)技术和锁保留(lock reservation)技术。还有一些方法利用静态程序分析技术自动分析并消除不必要的同步操作,比如,锁取消(lock elision)技术和锁粗化(lock coarsening,lock coalescence)技术。锁取消技术利用逃逸分析(escapeanalysis)技术确定出线程局部(thread-local)的对象,然后取消这些对象上的同步操作。锁粗化技术合并某些作用于同一同步对象上的多个同步操作,合并之后会导致被同步的代码区域范围增大,这在多线程的环境下会引起程序并发度的下降,尤其是在线程数目非常多的情况下,对并发度的影响会更加明显。Synchronization optimization aims to reduce the overhead of synchronization operations and improve the overall performance of the program. In order to reduce the overhead of synchronization operations, some methods directly improve the implementation of synchronization primitives, such as thin lock technology and lock reservation technology. There are also some methods that use static program analysis techniques to automatically analyze and eliminate unnecessary synchronization operations, such as lock cancellation (lock elimination) technology and lock coarsening (lock coarsening, lock coalescence) technology. Lock cancellation technology uses escape analysis (escape analysis) technology to determine thread-local (thread-local) objects, and then cancels the synchronization operations on these objects. The lock coarsening technology merges some multiple synchronization operations that act on the same synchronization object. After the merger, the range of the code area to be synchronized will increase, which will cause a decrease in program concurrency in a multi-threaded environment, especially in When the number of threads is very large, the impact on concurrency will be more obvious.

因此目前需要一种具有良好的灵活性和可扩展性并对并发度具有较小影响的同步优化的方案。Therefore, a synchronization optimization scheme with good flexibility and scalability and little impact on concurrency is needed at present.

发明内容 Contents of the invention

为了解决上述问题之一,本发明提出了一种同步优化方法,包括以下步骤:对被编译方法进行静态程序分析,根据分析结果确定所述被编译方法中无须对同步对象进行同步操作的同步方法调用点,并对所述同步方法调用点进行标记;根据所述标记的同步方法调用点为其调用的同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码;根据所述标记的同步方法调用点执行所述本地代码。In order to solve one of the above-mentioned problems, the present invention proposes a synchronization optimization method, which includes the following steps: performing static program analysis on the compiled method, and determining a synchronization method in the compiled method that does not need to perform a synchronization operation on the synchronization object according to the analysis result call point, and mark the synchronization method call point; according to the marked synchronization method call point, compile and generate a local code that allows not to perform a synchronization operation on the synchronization object of the synchronization method; according to the The marked synchronous method call site executes the native code.

根据本发明的实施例,所述对被编译方法进行静态程序分析,确定无须对同步对象进行同步操作的同步方法调用点的步骤包括:合并作用于同一个同步对象上的相邻同步区域和/或嵌套同步区域,将包含在所述合并后的同步区域内的同步方法调用点确定为所述冗余同步方法调用点。According to an embodiment of the present invention, the step of performing static program analysis on the compiled method to determine a synchronization method call point that does not need to perform a synchronization operation on a synchronization object includes: merging adjacent synchronization areas and/or functions on the same synchronization object Or a nested synchronization area, determining a synchronization method call point included in the merged synchronization area as the redundant synchronization method call point.

根据本发明的实施例,所述对被编译方法进行静态程序分析,确定无须对同步对象进行同步操作的同步方法调用点的步骤包括:根据对象是否在多个线程间共享确定对所述对象的同步方法调用点是否是无须对同步对象进行同步操作的同步方法调用点。According to an embodiment of the present invention, the step of performing static program analysis on the compiled method to determine the call point of the synchronization method that does not need to perform a synchronization operation on the synchronization object includes: determining the call point of the object according to whether the object is shared among multiple threads Whether the synchronous method call point is a synchronous method call point that does not need to perform a synchronous operation on the synchronization object.

根据本发明的实施例,根据对象是否在多个线程间共享确定对所述对象的同步方法调用点是否是无须对同步对象进行同步操作的同步方法调用点的步骤包括:如果所述对象只在创建所述对象的线程中被访问,则将作用于所述对象的同步方法调用点确定为所述无须对同步对象进行同步操作的同步方法调用点。According to an embodiment of the present invention, according to whether the object is shared between multiple threads, the step of determining whether the synchronization method call point for the object is a synchronization method call point that does not need to perform a synchronization operation on the synchronization object includes: if the object is only in the If the object is accessed in the thread that created the object, the synchronization method call point acting on the object is determined as the synchronization method call point that does not need to perform a synchronization operation on the synchronization object.

根据本发明的实施例,根据所述标记的同步方法调用点为同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码的步骤包括:根据所述标记的同步方法调用点为所述同步方法编译生成所述同步方法的普通版本和所述同步方法的非同步版本,其中所述同步方法的非同步版本包括忽略对所述同步方法的同步对象上的同步操作而生成的本地代码。According to an embodiment of the present invention, the step of compiling and generating a local code for the synchronization method according to the marked synchronization method call point that allows not to perform a synchronization operation on the synchronization object of the synchronization method includes: according to the marked synchronization method call point is Compilation of the synchronized method generates a normal version of the synchronized method and an unsynchronized version of the synchronized method, wherein the unsynchronized version of the synchronized method includes local code.

根据本发明的实施例,根据所述标记的同步方法调用点为其调用的同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码的步骤包括:根据所述标记的同步方法调用点为所述同步方法编译生成所述同步方法的分支版本,并在所述被标记的同步方法调用点前插入一条设置标记信息的指令。其中所述同步方法的分支版本通过以下步骤获得:在所述同步方法入口处插入获得标记信息的指令和清除标记的指令;在作用在所述同步方法的同步对象上的同步操作之前插入分支指令,所述分支指令判断获得的标记信息,若所述标记信息为真,则跳转到所述同步操作之后,如果所述标记信息为假,则执行所述同步操作。According to an embodiment of the present invention, the step of compiling and generating a local code that allows no synchronization operation on the synchronization object of the synchronization method for the synchronization method called by the marked synchronization method call point includes: according to the synchronization method of the mark The call point compiles and generates a branch version of the synchronization method for the synchronization method, and inserts an instruction for setting marking information before the marked synchronization method call point. Wherein the branch version of the synchronization method is obtained through the following steps: inserting an instruction for obtaining tag information and an instruction for clearing the tag at the entry of the synchronization method; inserting a branch instruction before the synchronization operation acting on the synchronization object of the synchronization method , the branch instruction judges the obtained flag information, if the flag information is true, jump to after the synchronization operation, and if the flag information is false, execute the synchronization operation.

根据本发明的实施例,执行所述本地代码的步骤包括:当执行同步方法调用点时,如果所述同步方法调用点为标记的同步方法调用点,则调用所述非同步版本的本地代码,如果所述同步方法调用点为没有标记的同步方法调用点,则调用所述普通版本的本地代码。According to an embodiment of the present invention, the step of executing the local code includes: when executing a synchronous method call point, if the synchronous method call point is a marked synchronous method call point, calling the asynchronous version of the local code, If the synchronous method call point is an unmarked synchronous method call point, then call the normal version of the native code.

根据本发明的实施例,执行所述本地代码的步骤包括:当执行到同步方法调用点时,根据所述同步方法调用点前设置的标记信息指令调用所述分支版本的本地代码。According to an embodiment of the present invention, the step of executing the native code includes: calling the local code of the branch version according to the flag information set before the synchronous method call point when execution reaches the synchronous method call point.

根据本发明的实施例,对被编译方法进行静态程序分析中出现以下情况之一时不进行将同步方法调用点确定为无须对同步对象进行同步操作的同步方法调用点的步骤:作用于同一个同步对象上的相邻同步方法调用点和同步操作之间包含作用于其他对象上的同步操作;所述作用于同一个同步对象上的相邻同步方法调用点和同步操作之间包含访问易变volatile数据的操作;所述作用于同一个同步对象上的相邻同步方法调用点和同步操作之间包含静态无法解析的访问或方法调用;所述作用于同一个同步对象上的相邻同步方法调用点和同步操作之间包含循环;所述作用于同一个同步对象上的相邻同步方法调用点和同步操作之间包含锁定次数不一致的操作,所述锁定次数不一致的操作所属的控制流图CFG节点的各个前驱节点被锁定的次数不同。According to an embodiment of the present invention, when one of the following situations occurs in the static program analysis of the compiled method, the step of determining the synchronization method call point as a synchronization method call point that does not need to perform a synchronization operation on the synchronization object is not performed: acting on the same synchronization Between adjacent synchronization method call points and synchronization operations on the object include synchronization operations acting on other objects; between adjacent synchronization method call points and synchronization operations acting on the same synchronization object include access variable volatile Operation of data; the call point of the adjacent synchronization method acting on the same synchronization object and the synchronization operation contain static unresolvable access or method call; the call of the adjacent synchronization method acting on the same synchronization object There is a loop between the point and the synchronization operation; between the adjacent synchronization method call point and the synchronization operation acting on the same synchronization object, there are operations with inconsistent locking times, and the control flow graph CFG to which the operations with inconsistent locking times belong Each predecessor node of a node is locked a different number of times.

本发明还提出了一种同步优化设备,包括分析模块、编译模块和执行模块。其中,所述分析模块用于对被编译方法进行静态程序分析,根据分析结果确定所述被编译方法中无须对同步对象进行同步操作的同步方法调用点,并对所述同步方法调用点进行标记;所述编译模块用于根据所述标记的同步方法调用点为其调用的同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码;所述执行模块用于根据所述标记的同步方法调用点执行所述本地代码。The invention also proposes a synchronous optimization device, which includes an analysis module, a compilation module and an execution module. Wherein, the analysis module is used to perform static program analysis on the compiled method, determine the synchronization method call points in the compiled method that do not need to perform synchronization operations on the synchronization object according to the analysis results, and mark the synchronization method call points ; The compiling module is used to compile and generate a local code that allows not to perform a synchronization operation on the synchronization object of the synchronization method according to the synchronization method call point of the mark for the synchronization method called by it; A synchronous method call site executes the native code.

本发明所提出的同步优化的方法及设备具有高度的灵活性和良好的可扩展性。The method and device for synchronous optimization proposed by the present invention have high flexibility and good scalability.

附图说明 Description of drawings

本发明上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, wherein:

图1是根据本发明的一个实施例的同步优化方法的流程图;Fig. 1 is a flow chart of a synchronization optimization method according to an embodiment of the present invention;

图2是根据本发明的一个实施例的同步方法的代码版本的示意图;Fig. 2 is a schematic diagram of a code version of a synchronization method according to an embodiment of the present invention;

图3是根据本发明的一个实施例的双版本算法示意图;Fig. 3 is a schematic diagram of a dual-version algorithm according to an embodiment of the present invention;

图4是根据本发明的一个实施例的分支版本算法示意图;Fig. 4 is a schematic diagram of a branch version algorithm according to an embodiment of the present invention;

图5是根据本发明的一个实施例的同步优化设备的示意图;FIG. 5 is a schematic diagram of a synchronization optimization device according to an embodiment of the present invention;

图6是根据本发明的一个实施例的未使用方法内联时的吞吐量对比示意图;Fig. 6 is a schematic diagram of throughput comparison when method inlining is not used according to an embodiment of the present invention;

图7是根据本发明的一个实施例的使用方法内联后的吞吐量对比示意图。Fig. 7 is a schematic diagram of throughput comparison after inlining using methods according to an embodiment of the present invention.

具体实施方式 Detailed ways

下面详细描述本发明的实施例,所述实施例的示例在附图中示出。下面通过参考附图描述的实施例是示例性的,仅用于解释本发明,而不能解释为对本发明的限制。Embodiments of the invention are described in detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

同步优化在改善应用程序的同步开销的同时,也耗费了较大的程序分析开销。为了权衡过程间同步优化的开销和收益,本发明对同步方法的调用点进行优化,因为:1)在实际的代码中,同步方法调用所花的同步开销在总同步开销中占很大比重;2)对同步方法的调用点的分析相对比较简单。While the synchronization optimization improves the synchronization overhead of the application program, it also consumes a large program analysis overhead. In order to balance the cost and benefit of synchronous optimization between processes, the present invention optimizes the calling point of the synchronous method, because: 1) in the actual code, the synchronous overhead spent by the synchronous method call accounts for a large proportion in the total synchronous overhead; 2) The analysis of the calling point of the synchronization method is relatively simple.

作为本发明的一个实施例,一个static的同步方法(即类同步方法)的同步对象是指该方法所属的类对应的Class对象;一个非static同步方法(即实例同步方法)的同步对象是指该方法的this对象;一个同步语句的同步对象是指在该同步语句中的圆括号内显式指定的对象,即同步语句synchronized(O){...}中的O。As an embodiment of the present invention, the synchronization object of a static synchronization method (i.e. class synchronization method) refers to the Class object corresponding to the class to which the method belongs; the synchronization object of a non-static synchronization method (i.e. instance synchronization method) refers to The this object of this method; the synchronization object of a synchronization statement refers to the object explicitly specified in the parentheses in the synchronization statement, that is, the O in the synchronization statement synchronized(O){...}.

在字节码级,语言级的同步语句变换成以monitorenter(加锁,即获得同步对象上的监视器)指令开始、以monitorexit(解锁,即释放相应的监视器)指令结束的指令序列。同步方法在字节码级则是在其常量池条目中设置有同步标志ACC_SYNCRONIZED,方法调用指令会检查这个标志,当调用一个设置有该标志的方法时(即调用同步方法时),当前线程将获取监视器,然后调用执行方法体本身,不论方法调用是正常结束还是异常结束都要释放这个监视器。为便于描述,本发明中统一用lock(O)和unlock(O)分别表示对对象O进行加锁和解锁。At the bytecode level, the language-level synchronization statement is transformed into a sequence of instructions starting with the monitorenter (lock, that is, obtaining the monitor on the synchronization object) instruction and ending with the monitorexit (unlocking, that is, releasing the corresponding monitor) instruction. At the bytecode level, the synchronization method is set with the synchronization flag ACC_SYNCRONIZED in its constant pool entry. The method call instruction will check this flag. When calling a method with this flag (that is, when calling the synchronization method), the current thread will Get the monitor, and then call to execute the method body itself, regardless of whether the method call ends normally or abnormally, the monitor must be released. For ease of description, in the present invention, lock(O) and unlock(O) are collectively used to represent locking and unlocking of the object O, respectively.

作为本发明的一个实施例,作用于对象O的一个同步区域是指从对该对象加锁到对该对象解锁的一段连续执行的指令序列形成的代码区域。As an embodiment of the present invention, a synchronization area acting on the object O refers to a code area formed by a sequence of continuously executed instructions from locking the object to unlocking the object.

本发明提出一种优化同步方法及其调用点的同步方法优化框架,其基本思想是分析出无须对同步对象进行同步的同步方法调用点,将其改成对该同步方法的非同步版本的调用,或者是设置标记信息并调用该同步方法的分支版本。该框架由程序分析、编译和执行3个阶段组成。下面分别简述这3个阶段。The present invention proposes a synchronization method optimization framework for optimizing the synchronization method and its calling point. The basic idea is to analyze the calling point of the synchronization method that does not need to synchronize the synchronization object, and change it into a calling of the non-synchronized version of the synchronization method. , or a branched version that sets the tag information and calls the synchronous method. The framework consists of three stages: program analysis, compilation and execution. These three stages are briefly described below.

如图1所示为根据本发明的一个实施例的同步优化方法100的流程图。如图所示,该方法100包括以下步骤:FIG. 1 is a flowchart of a synchronization optimization method 100 according to an embodiment of the present invention. As shown, the method 100 includes the following steps:

S101:对被编译方法进行静态程序分析,根据分析结果确定被编译方法中无须对同步对象进行同步操作的同步方法调用点,并对该同步方法调用点进行标记。该步骤或称程序分析阶段。S101: Perform static program analysis on the compiled method, determine a synchronization method call point in the compiled method that does not need to perform a synchronization operation on a synchronization object according to the analysis result, and mark the synchronization method call point. This step is also called the program analysis phase.

作为本发明的一个实施例,当编译器对Java程序中的一个方法进行编译时,对被编译方法进行静态程序分析,根据分析结果确定被编译方法中的哪些同步方法调用点在其同步对象上的同步操作是冗余的,并对这些同步方法调用点进行标记。As an embodiment of the present invention, when the compiler compiles a method in the Java program, static program analysis is performed on the compiled method, and according to the analysis result, it is determined which synchronization method call points in the compiled method are on its synchronization object The synchronous operations of are redundant and mark the sites of these synchronous method calls.

作为本发明的一个实施例,在该阶段,可以采用不同的程序分析技术来对被编译方法进行分析以确定哪些同步方法调用点需要标记,这些分析技术可以基于锁粗化技术,也可以基于逃逸分析技术,或者是其他技术。As an embodiment of the present invention, at this stage, different program analysis techniques can be used to analyze the compiled method to determine which synchronous method call points need to be marked. These analysis techniques can be based on lock coarsening technology or escape Analytical techniques, or other techniques.

作为本发明的一个实施例,程序分析阶段可以采用基于锁粗化的程序分析。As an embodiment of the present invention, the program analysis stage may adopt program analysis based on lock coarsening.

作为本发明的一个实施例,基于锁粗化的同步优化的主要思想是:如果线程T在获得锁而未释放该锁期间又获取同一个锁,那么第2个锁获取操作是不必要的;如果线程T连续多次获得并释放同一个锁,那么可以将这些锁获取和锁释放操作合并成一对,以减少锁操作的数量。As an embodiment of the present invention, the main idea of synchronization optimization based on lock coarsening is: if thread T acquires the same lock while acquiring the lock but not releasing the lock, then the second lock acquisition operation is unnecessary; If thread T acquires and releases the same lock multiple times in a row, then these lock acquisition and lock release operations can be combined into a pair to reduce the number of lock operations.

作为本发明的一个实施例,基于锁粗化技术分析识别无须同步的同步方法调用点的主要方法包括:合并作用于同一个同步对象上的相邻同步区域或嵌套同步区域以减少同步操作的数量。如果把同步方法调用看作是一种特殊的同步区域并让它与相邻的同步区域或者是包含它的同步区域相合并,那么包含在合并后的同步区域内的同步方法调用点在其同步对象上的同步操作是冗余的,从而可以对这些调用点进行标记。As an embodiment of the present invention, the main method for analyzing and identifying synchronization method call points that do not need to be synchronized based on lock coarsening technology includes: merging adjacent synchronization regions or nested synchronization regions that act on the same synchronization object to reduce the cost of synchronization operations quantity. If you treat a synchronized method call as a special kind of synchronization area and merge it with an adjacent synchronization area or the synchronization area that contains it, then the synchronization method call site contained in the merged synchronization area is synchronized at its synchronization area. Synchronous operations on objects are redundant, allowing these call sites to be marked.

作为本发明的一个实施例,程序分析阶段可以采用基于逃逸分析的程序分析。As an embodiment of the present invention, the program analysis stage may adopt program analysis based on escape analysis.

作为本发明的一个实施例,基于逃逸分析的同步优化的主要思想是:如果某个线程T对对象O进行同步操作期间没有其他线程T′对O进行访问,那么就可以删除线程T对O的同步操作。在Java应用程序中,常有些看似是单线程的程序使用JDK中的帮助线程,这些程序实际上是多线程的,但是帮助线程并不和应用程序的单线程共享数据;基于逃逸分析会识别出一些线程局部的对象,这些对象只被创建其的线程访问,对这些对象的访问是不需要同步的。As an embodiment of the present invention, the main idea of synchronization optimization based on escape analysis is: if no other thread T' accesses O during the synchronization operation of a certain thread T on the object O, then the thread T's access to O can be deleted. Synchronous operation. In Java applications, some seemingly single-threaded programs often use the helper threads in the JDK. These programs are actually multi-threaded, but the helper threads do not share data with the single thread of the application; based on escape analysis, it will be identified Create some thread-local objects, which are only accessed by the thread that created them, and access to these objects does not need to be synchronized.

作为本发明的一个实施例,根据一个对象是否在多个线程间共享来决定是否删除作用于该对象上的同步操作。如果一个对象O只在创建它的线程中被访问(即O未逃逸出线程),则所有作用于O的同步操作(包括O的同步方法调用所涉及的、对O的同步操作)是冗余的,从而可以对O的同步方法的任意调用点进行标记。As an embodiment of the present invention, it is determined whether to delete the synchronization operation acting on the object according to whether the object is shared among multiple threads. If an object O is accessed only in the thread that created it (that is, O has not escaped from the thread), then all synchronization operations on O (including synchronization operations on O involved in O's synchronization method calls) are redundant , so that any invocation site of O's synchronous method can be marked.

需要说明的是,程序分析阶段的目的是分析识别出无须同步的同步方法调用点,除了基于锁粗化的程序分析技术和逃逸分析技术外,还可以采用其他的技术。只要这项技术能识别出无须同步的同步方法调用点,这项技术就可以被本发明所述的同步方法优化框架的程序分析阶段所使用。It should be noted that the purpose of the program analysis phase is to analyze and identify the synchronization method call points that do not need to be synchronized. In addition to the program analysis technology and escape analysis technology based on lock coarsening, other technologies can also be used. This technique can be used by the program analysis phase of the synchronization method optimization framework described in the present invention as long as the technique can identify synchronous method call points that do not require synchronization.

S102:根据所述标记的同步方法调用点为同步方法编译生成允许不对同步方法的同步对象执行同步操作的本地代码,该步骤或称编译阶段。S102: Compile and generate local code for the synchronization method according to the marked synchronization method call point, which allows no synchronization operation to be performed on the synchronization object of the synchronization method. This step is also called the compilation phase.

作为本发明的一个实施例,根据需要,可以采用双版本策略或分支版本策略为同步方法编译生成不同版本的本地代码。As an embodiment of the present invention, as required, a dual version strategy or a branch version strategy can be used to compile and generate different versions of local codes for the synchronization method.

作为本发明的一个实施例,双版本策略可以包括:为一个同步方法编译生成两个版本的本地代码,一个为该方法的普通版本,另一个为该方法的非同步版本。作为本发明的一个实施例,一个同步方法的普通版本是指按常规方式对该方法的代码进行编译生成的本地代码。一个同步方法的非同步版本是指在编译该方法的代码时,忽略对该方法的同步对象上的所有同步操作而生成的本地代码。As an embodiment of the present invention, the dual-version strategy may include: compiling and generating two versions of native code for a synchronous method, one is the normal version of the method, and the other is the asynchronous version of the method. As an embodiment of the present invention, the common version of a synchronization method refers to the local code generated by compiling the code of the method in a conventional manner. An unsynchronized version of a synchronized method is the native code generated by ignoring all synchronization operations on the method's synchronized object when the method's code is compiled.

作为本发明的一个实施例,分支版本策略可以包括:为一个同步方法编译生成一个分支版本的本地代码,并在被标记的同步方法调用点前插入一条设置标记信息的指令。作为本发明的一个实施例,一个同步方法的分支版本是指在编译该方法对应的代码时,增加如下处理:(a)在方法入口处插入一条获得标记信息的指令和一条清除标记的指令;(b)对于每个作用在该方法的同步对象上的同步操作,在其之前插入一条分支指令,该分支指令判断获得的标记信息,若为真,则跳转到该同步操作之后,否则,执行该同步操作。As an embodiment of the present invention, the branch version strategy may include: compiling and generating a branch version of local code for a synchronization method, and inserting an instruction for setting marking information before the marked synchronization method call point. As an embodiment of the present invention, the branch version of a synchronous method refers to adding the following processing when compiling the code corresponding to the method: (a) inserting an instruction for obtaining tag information and an instruction for clearing the tag at the method entry; (b) For each synchronization operation acting on the synchronization object of the method, insert a branch instruction before it, the branch instruction judges the obtained flag information, if it is true, jump to after the synchronization operation, otherwise, Execute the sync operation.

如图2所示为根据本发明的一个实施例的同步方法的不同代码版本的示意图。以图2(a)中的实例同步方法setSharedField()为例,图2中(b)、(c)和(d)分别给出其普通版本、非同步版本和分支版本的伪代码。在(b)图中,执行方法体中的代码之前和之后需要分别对this对象加锁和解锁;在(c)图中,直接执行方法体中的代码;在(d)图中,首先获取当前方法被调用时的标记信息marked,然后清除标记,接着对每个作用在this对象上的加锁和解锁操作增加分支指令以保证这些操作在marked为假时才被执行。FIG. 2 is a schematic diagram of different code versions of a synchronization method according to an embodiment of the present invention. Taking the instance synchronization method setSharedField() in Figure 2(a) as an example, Figure 2(b), (c) and (d) give the pseudocodes of its normal version, asynchronous version and branch version respectively. In (b) diagram, the this object needs to be locked and unlocked before and after executing the code in the method body; in (c) diagram, the code in the method body is directly executed; in (d) diagram, first obtain The mark information when the current method is called is marked, then clear the mark, and then add branch instructions to each lock and unlock operation on this object to ensure that these operations are executed when marked is false.

双版本策略和分支版本策略有各自的优缺点。双版本策略为一个同步方法生成并保存两个版本的本地代码,而分支版本策略只需要为一个同步方法生成并保存一个版本的本地代码,故双版本策略需要额外的空间开销并且需要虚拟机内核支持对多版本本地代码的管理。另一方面,双版本策略为一个同步方法生成的每个版本的本地代码中没有添加判断该方法的当前调用是否被标记的分支操作,而分支版本策略则添加了额外的这类分支操作,故分支版本会在一定程度上影响程序的运行时性能。The dual version strategy and the branch version strategy have their own advantages and disadvantages. The dual-version strategy generates and saves two versions of local code for a synchronization method, while the branch version strategy only needs to generate and save one version of the local code for a synchronization method, so the double-version strategy requires additional space overhead and requires a virtual machine kernel Supports management of multi-version native code. On the other hand, the double-version strategy does not add a branch operation to determine whether the current invocation of the method is marked in each version of the local code generated by a synchronous method, while the branch version strategy adds additional such branch operations, so The branch version will affect the runtime performance of the program to a certain extent.

双版本策略适合那些被标记的同步方法数目少但调用频率高的情况,分支版本策略则适合那些被标记的同步方法数目多且调用频率低的情况。在进行同步方法优化时,可以根据应用程序的特点来决定是使用双版本策略还是使用分支版本策略,或者是设计一种启发式算法,让编译器自动选择最合适的策略。The dual-version strategy is suitable for situations where the number of marked synchronization methods is small but the calling frequency is high, and the branch version strategy is suitable for those situations where the number of marked synchronization methods is large and the calling frequency is low. When optimizing the synchronization method, you can decide whether to use the double version strategy or the branch version strategy according to the characteristics of the application, or design a heuristic algorithm to let the compiler automatically select the most appropriate strategy.

S103:根据标记的同步方法调用点执行本地代码,该步骤或称执行阶段。该阶段负责执行本地代码,它根据编译阶段采取的是双版本策略还是分支版本策略而略有不同。S103: Execute the local code according to the marked synchronous method call point, this step is called the execution phase. This phase is responsible for executing native code, and it varies slightly depending on whether the compilation phase adopts a double-version strategy or a branch-version strategy.

作为本发明的一个实施例,如果编译阶段采取的是双版本策略,当执行到一个同步方法的调用点时,若在程序分析阶段标记过该调用点,则调用其非同步版本的本地代码,否则,调用其普通版本的本地代码。As an embodiment of the present invention, if the compiling stage adopts a dual-version strategy, when the invocation point of a synchronous method is executed, if the invocation point is marked in the program analysis stage, the local code of its asynchronous version is called, Otherwise, call its normal version of the native code.

作为本发明的一个实施例,如果编译阶段采取的是分支版本策略,当执行到一个同步方法的调用点时,根据程序分析阶段是否标记过该调用点来决定是否设置标记信息,然后调用其分支版本的本地代码。As an embodiment of the present invention, if the compilation stage adopts a branch version strategy, when the call point of a synchronous method is executed, it is determined whether to set the mark information according to whether the call point has been marked in the program analysis stage, and then call its branch version of the native code.

以上所述的同步方法优化方法包括程序分析、编译和执行三个阶段,前两个阶段分别可以采取不同的策略。以下介绍同步方法优化框架的几个应用实例,即基于锁粗化的双版本同步方法优化算法(简称双版本粗化算法)、基于锁粗化的分支版本同步方法优化算法(简称分支版本粗化算法)、基于逃逸分析的双版本同步方法优化算法(简称双版本逃逸算法)和基于逃逸分析的分支版本同步方法优化算法(简称分支版本逃逸算法)等。The synchronization method optimization method described above includes three stages of program analysis, compilation and execution, and different strategies can be adopted in the first two stages. The following introduces several application examples of the synchronization method optimization framework, namely, the dual-version synchronization method optimization algorithm based on lock coarsening (referred to as the dual-version coarsening algorithm), and the branch version synchronization method optimization algorithm based on lock coarsening (referred to as the branch version coarsening algorithm). Algorithm), the double-version synchronization method optimization algorithm based on escape analysis (referred to as the dual-version escape algorithm), and the branch version synchronization method optimization algorithm based on escape analysis (referred to as the branch version escape algorithm), etc.

双版本粗化算法是指在同步方法优化框架的程序分析阶段采用锁粗化分析技术、编译阶段采用双版本策略所形成的同步方法优化算法;分支版本粗化算法指在同步方法优化框架的程序分析阶段采用锁粗化分析技术、编译阶段采用分支版本策略所形成的同步方法优化算法;双版本逃逸算法指在同步方法优化框架的程序分析阶段采用逃逸分析技术、编译阶段采用双版本策略所形成的同步方法优化算法;分支版本逃逸算法指在同步方法优化框架的程序分析阶段采用逃逸分析技术、编译阶段采用分支版本策略所形成的同步方法优化算法。The double-version coarsening algorithm refers to the synchronization method optimization algorithm formed by using the lock coarsening analysis technology in the program analysis stage of the synchronization method optimization framework and the dual-version strategy in the compilation stage; the branch version coarsening algorithm refers to the program in the synchronization method optimization framework Synchronous method optimization algorithm formed by using lock coarse analysis technology in the analysis stage and branch version strategy in the compilation stage; double-version escape algorithm refers to the use of escape analysis technology in the program analysis stage of the synchronization method optimization framework and the dual-version strategy in the compilation stage. The synchronization method optimization algorithm; the branch version escape algorithm refers to the synchronization method optimization algorithm formed by using the escape analysis technology in the program analysis stage of the synchronization method optimization framework and the branch version strategy in the compilation stage.

作为本发明的一个实施例,双版本粗化算法包括以下两个部分:As an embodiment of the present invention, the dual-version coarsening algorithm includes the following two parts:

1)利用Java虚拟机的即时编译器(JIT)对每个被编译方法的CFG(控制流图)进行程序分析和变换,执行如下操作:1) Use the just-in-time compiler (JIT) of the Java virtual machine to analyze and transform the CFG (control flow graph) of each compiled method, and perform the following operations:

(a)找到作用于同一个同步对象上的、可以合并的相邻同步方法调用点以及同步操作,将它们进行合并并标记在合并区域内的同步方法调用点。(a) Find adjacent synchronization method call points and synchronization operations that act on the same synchronization object and can be merged, merge them and mark the synchronization method call points in the merged area.

(b)在作用于某同步对象的同步语句块内部,找到作用于该同步对象上的、可以合并的同步操作和同步方法调用点,删除这些同步操作并标记这些同步方法调用点。(b) In the synchronization statement block acting on a certain synchronization object, find the synchronous operations and synchronization method call points that can be combined on the synchronization object, delete these synchronization operations and mark these synchronization method call points.

2)修改Java虚拟机中负责协调Java方法的编译和执行的组件,即编译流水线及虚拟机内核VMCore。当执行到一个同步方法调用点时,如果该调用点未被标记,检查该同步方法的普通版本的本地代码是否存在,如果存在,则执行之,否则启动JIT编译流水线为之编译生成普通版本的本地代码,然后执行之;如果该调用点被标记,检查该同步方法的非同步版本的本地代码是否存在,如果存在,则执行之,否则启动JIT编译生成非同步版本的本地代码,然后执行之。2) Modify the component responsible for coordinating the compilation and execution of the Java method in the Java virtual machine, that is, the compilation pipeline and the virtual machine kernel VMCore. When executing a synchronous method call point, if the call point is not marked, check whether the normal version of the native code of the synchronous method exists, if it exists, execute it, otherwise start the JIT compilation pipeline to compile and generate the normal version for it Native code, and then execute it; if the call point is marked, check whether the asynchronous version of the native code of the synchronous method exists, if it exists, execute it, otherwise start JIT compilation to generate an asynchronous version of the native code, and then execute it .

作为本发明的一个实施例,如果作用于同一个同步对象上的相邻同步方法调用点和同步操作之间存在下列情况之一,则不能进行上述的优化过程:As an embodiment of the present invention, if one of the following conditions exists between the adjacent synchronization method call site and the synchronization operation acting on the same synchronization object, the above optimization process cannot be performed:

条件1:包含作用于其他对象上的同步操作。Condition 1: Contains synchronous operations acting on other objects.

条件2:包含访问易变(volatile)数据的操作。Condition 2: Operations involving access to volatile data.

条件3:包含静态无法解析的访问或方法调用。Condition 3: Contains a static unresolvable access or method call.

条件4:包含循环。Condition 4: contains loops.

条件5:某操作对应于CFG中的节点的各个前驱节点被锁定的次数不同。Condition 5: A certain operation corresponds to a node in the CFG whose predecessor nodes are locked for different times.

其中,条件1~3是为了防止执行同步区域合并后的代码以不同的次序获得不同的锁而可能产生死锁;条件3~4是为了防止同步区域合并产生超大的同步区域而降低多线程程序的并发度;条件5是一种复杂的情况,这种情况出现的概率几乎为0,故该算法不处理这种情况。Among them, conditions 1 to 3 are to prevent deadlocks that may occur in codes that acquire different locks in different orders after merging synchronization areas; conditions 3 to 4 are to prevent synchronization area merging from generating a large synchronization area and reduce the number of multi-threaded programs. The degree of concurrency; condition 5 is a complex situation, the probability of this situation is almost 0, so the algorithm does not deal with this situation.

作为本发明的一个实施例,分支版本粗化算法利用Java虚拟机的即时编译器对每个被编译方法的CFG进行程序分析和变换,执行如下操作:As an embodiment of the present invention, the branch version coarsening algorithm utilizes the just-in-time compiler of the Java virtual machine to carry out program analysis and transformation to the CFG of each compiled method, and performs the following operations:

(a)找到作用于同一个同步对象上的、可以合并的相邻同步方法调用点以及同步操作,将它们进行合并并在这些同步方法调用点前插入一条标记指令(该指令将标记信息置为真)。(a) Find adjacent synchronization method call points and synchronization operations that act on the same synchronization object and can be merged, merge them and insert a mark instruction before these synchronization method call points (this instruction sets the mark information to real).

(b)在作用于某同步对象的同步语句块内部,找到作用于该同步对象上的、可以合并的同步操作和同步方法调用点,删除这些同步操作,并在这些同步方法调用点前插入一条标记指令(该指令将标记信息置为真)。(b) In the synchronization statement block acting on a synchronization object, find the synchronous operations and synchronization method call points that can be combined on the synchronization object, delete these synchronization operations, and insert a line before these synchronization method call points Flag command (this command sets flag information to true).

(c)如果当前被编译方法是标记过的同步方法,为其变换生成分支版本的本地代码。(c) If the currently compiled method is a marked synchronous method, generate a branched version of the native code for its transformation.

与双版本粗化算法一样,如果作用于同一个同步对象上的相邻同步方法调用点和同步操作之间存在条件1~5之一,则不能进行上述的优化过程。Like the double-version coarsening algorithm, if one of the conditions 1-5 exists between the adjacent synchronization method call site and the synchronization operation acting on the same synchronization object, the above optimization process cannot be performed.

作为本发明的一个实施例,双版本逃逸算法和分支版本逃逸算法在同步方法优化框架的程序分析阶段采用逃逸分析技术,如果逃逸分析的结果表明对象O未逃逸出某个线程,即对象O只被该线程访问,则可以对作用于同步对象O的所有同步方法调用点进行标记。As an embodiment of the present invention, the double-version escape algorithm and the branch version escape algorithm adopt escape analysis technology in the program analysis stage of the synchronization method optimization framework. If accessed by this thread, all synchronization method call points acting on the synchronization object O can be marked.

作为本发明的一个实施例,对于双版本逃逸算法,与双版本粗化算法类似,还需要修改编译流水线及虚拟机内核VMCore。当执行到一个同步方法调用点时,如果该调用点未被标记,检查该同步方法的普通版本的本地代码是否存在,如果存在,则执行之,否则启动JIT编译流水线编译生成普通版本的本地代码,然后执行之;如果该调用点被标记,检查该同步方法的非同步版本的本地代码是否存在,如果存在,则执行之,否则启动JIT编译生成非同步版本的本地代码,然后执行之。As an embodiment of the present invention, for the double-version escape algorithm, similar to the double-version coarsening algorithm, it is also necessary to modify the compilation pipeline and the virtual machine kernel VMCore. When executing a synchronous method call point, if the call point is not marked, check whether the normal version of the native code of the synchronous method exists, if it exists, execute it, otherwise start the JIT compilation pipeline to compile and generate the normal version of the native code , and then execute it; if the call point is marked, check whether the native code of the asynchronous version of the synchronous method exists, if it exists, execute it, otherwise start JIT compilation to generate the native code of the asynchronous version, and then execute it.

作为本发明的一个实施例,对于分支版本逃逸算法,还需要在每个标记过的同步方法调用点之前插入一条标记指令mark(),并将被标记过的每个同步方法编译成分支版本的本地代码。As an embodiment of the present invention, for the branch version escape algorithm, it is also necessary to insert a marking instruction mark() before each marked synchronization method call point, and compile each marked synchronization method into a branch version local code.

需要注意的是,本发明所提出的同步方法优化框架非常灵活,它的前两个阶段均可以分别采取不同的策略,每个阶段选择不同策略可以组合成不同的优化算法。可以根据不同的应用需求,使用不同的组合。It should be noted that the optimization framework of the synchronization method proposed by the present invention is very flexible, and different strategies can be adopted in the first two stages, and different strategies can be combined into different optimization algorithms in each stage. Different combinations can be used according to different application requirements.

除了高度的灵活性之外,该框架还具有另一个优点,那就是良好的可扩展性。该框架的程序分析阶段除了可以采用锁粗化和逃逸分析之外,还可以采用其他的策略,只要该策略能识别出无须同步的同步方法调用点,都可以作为该同步方法优化框架的第一阶段。同步方法优化框架的程序分析阶段每增加一种新的策略,就可以用这种策略与编译阶段的2种策略分别进行组合,从而生成新的优化算法。Besides high flexibility, another advantage of this framework is good scalability. In addition to lock coarsening and escape analysis, other strategies can be used in the program analysis phase of the framework. As long as the strategy can identify the synchronization method call point that does not need to be synchronized, it can be used as the first step in the synchronization method optimization framework. stage. Whenever a new strategy is added in the program analysis stage of the synchronization method optimization framework, this strategy can be combined with the two strategies in the compilation stage to generate a new optimization algorithm.

作为本发明的一个实施例,可以在开源的Java SE平台Apache Harmony上,用C++实现双版本粗化算法和分支版本粗化算法,下面分别举例介绍这两个算法的实现。As an embodiment of the present invention, can be on the open source Java SE platform Apache Harmony, realize double-version coarsening algorithm and branch version coarsening algorithm with C++, introduce the realization of these two algorithms with examples respectively below.

双版本粗化算法指在同步方法优化框架的程序分析阶段采用锁粗化技术,编译阶段采用双版本策略的算法,该算法包括两个部分:The dual-version coarsening algorithm refers to the algorithm that adopts the lock coarsening technology in the program analysis phase of the synchronization method optimization framework, and adopts the dual-version strategy in the compilation phase. The algorithm includes two parts:

1)利用Harmony的即时编译器对每个被编译方法的CFG进行程序分析和变换,该过程的算法描述如下:1) Use Harmony's just-in-time compiler to analyze and transform the CFG of each compiled method. The algorithm of this process is described as follows:

输入:当前被编译的方法的CFGInput: CFG of the method currently being compiled

输出:变换后的CFGOutput: Transformed CFG

过程:包括程序分析和代码变换两个阶段,其基本框架如下:Process: including two stages of program analysis and code transformation, the basic framework is as follows:

doubleOptimization(){doubleOptimization(){

  programAnalyse();//程序分析programAnalyse();//program analysis

  transform();//代码变换transform();//code transformation

}}

如图3所示为根据本发明的一个实施例的双版本算法的示意图。其中O.syncCall()指令是对该同步方法的普通版本的本地代码的调用,O.Call()指令是对非同步版本的本地代码的调用。以图3展示的伪CFG(即图中的节点未必是严格的基本块节点)为例对其进行简单描述。FIG. 3 is a schematic diagram of a dual-version algorithm according to an embodiment of the present invention. The O.syncCall() instruction is a call to the native code of the normal version of the synchronous method, and the O.Call() instruction is a call to the native code of the asynchronous version. Take the pseudo-CFG shown in Figure 3 (that is, the nodes in the figure may not necessarily be strict basic block nodes) as an example to briefly describe it.

programAnalyse():利用JIT对当前被编译方法的中间表示进行程序分析,找到作用于同一个同步对象上的相邻同步方法调用点,如图3中的O.syncCall1()、O.syncCall2()和O.syncCall3(),以及同步操作,如图3中的lock(O)和unlock(O)。programAnalyse(): Use JIT to perform program analysis on the intermediate representation of the currently compiled method, and find the adjacent synchronization method call points acting on the same synchronization object, such as O.syncCall1() and O.syncCall2() in Figure 3 And O.syncCall3 (), and synchronous operations, such as lock (O) and unlock (O) in Figure 3.

transform():根据找到的这些同步方法调用点和同步操作对当前被编译方法的中间表示进行代码变换,其具体过程如下:transform(): Perform code transformation on the intermediate representation of the currently compiled method according to the found synchronous method call points and synchronous operations. The specific process is as follows:

(a)找到这些相邻的同步方法调用点与同步操作中的第一个,如图3中的O.syncCall1(),在它之前插入lock(O)指令;找到这些相邻的同步方法调用点与同步操作中的最后一个,如图3中的O.syncCall3(),在它之后插入unlock(O)指令。(a) Find the first of these adjacent synchronous method call sites and synchronous operations, such as O.syncCall1() in Figure 3, insert the lock (O) instruction before it; find these adjacent synchronous method calls The last one in the point and synchronization operation, such as O.syncCall3() in Figure 3, inserts the unlock(O) instruction after it.

(b)标记在programAnalyse()中找到的这些同步方法调用点,如图3中的O.syncCall1()、O.syncCall2()和O.syncCall3();删除在programAnalyse()中找到的这些同步操作,如图3中的lock(O)和unlock(O)。(b) mark these synchronization method call points found in programAnalyse(), such as O.syncCall1(), O.syncCall2() and O.syncCall3() in Figure 3; delete these synchronizations found in programAnalyse() Operation, such as lock (O) and unlock (O) in Figure 3.

(c)对于在代码变换之前不在同步区域中、但在代码变换之后被包含到同步区域中的基本块,如图3中的基本块D,如果在代码变换之后,存在从不在同步区域中的基本块到该基本块的控制流边,如图3中从基本块C到基本块D的控制流边,则需要在该边上插入lock(O)指令;如果在代码变换之后,存在从该基本块到不在同步区域中的基本块的控制流边,如图3中从基本块D到基本块E的控制流边,需要在该边上插入unlock(O)指令。(c) For a basic block that is not in the synchronization area before the transcoding but is included in the synchronous area after the transcoding, such as basic block D in Figure 3, if after the transcoding, there is a The control flow edge from the basic block to the basic block, such as the control flow edge from the basic block C to the basic block D in Figure 3, then you need to insert a lock (O) instruction on this edge; The control flow edge from the basic block to the basic block not in the synchronization area, such as the control flow edge from basic block D to basic block E in Figure 3, needs to insert an unlock (O) instruction on this edge.

(d)修改编译流水线及虚拟机内核VMCore。当执行到一个同步方法调用点时,如果该调用点未被标记,则执行其普通版本的本地代码,然后执行之;如果该调用点被标记,如图3中的O.syncCall1()、O.syncCall2()和O.syncCall3(),则执行其非同步版本的本地代码,如图3中的O.Call1()、O.Call2()和O.Call3()。(d) Modify the compilation pipeline and the virtual machine kernel VMCore. When a synchronous method call point is executed, if the call point is not marked, then execute its normal version of the local code, and then execute it; if the call point is marked, as shown in O.syncCall1(), O in Figure 3 .syncCall2() and O.syncCall3() execute the local codes of their asynchronous versions, such as O.Call1(), O.Call2() and O.Call3() in Figure 3.

作为本发明的一个实施例,分支版本粗化算法指在同步方法优化框架的程序分析阶段采用锁粗化技术,编译阶段采用分支版本策略的算法,其算法描述如下:As an embodiment of the present invention, the branch version coarsening algorithm refers to the algorithm that adopts the lock coarsening technology in the program analysis stage of the synchronization method optimization framework, and adopts the branch version strategy in the compilation stage, and its algorithm is described as follows:

输入:当前被编译的方法的CFGInput: CFG of the method currently being compiled

输出:变换后的CFGOutput: Transformed CFG

过程:包括程序分析、代码变换以及分支变换三个阶段,其基本框架如下:Process: Including three stages of program analysis, code transformation and branch transformation, the basic framework is as follows:

branchOptimization(){branchOptimization(){

  programAnalyse();//程序分析programAnalyse();//program analysis

  transform();//代码变换transform();//code transformation

  branch();//分支变换branch();//branch transformation

}}

如图4所示为根据本发明的一个实施例的分支版本算法的示意图。其中O.syncCall()指令是对该同步方法普通版本本地代码的调用,O.brCall()指令是对分支版本本地代码的调用。下面以图4展示的伪CFG为例进行简单描述。FIG. 4 is a schematic diagram of a branch version algorithm according to an embodiment of the present invention. The O.syncCall() instruction is a call to the native code of the normal version of the synchronization method, and the O.brCall() instruction is a call to the branch version native code. The pseudo CFG shown in Fig. 4 is taken as an example for a brief description below.

programAnalyse():利用JIT对当前被编译方法的中间表示进行程序分析,找到作用于同一个同步对象上的相邻同步方法调用点,如图4中的O.syncCall1()、O.syncCall2()和O.syncCall3(),以及同步操作,如图4中的lock(O)和unlock(O)。programAnalyse(): Use JIT to perform program analysis on the intermediate representation of the currently compiled method, and find the adjacent synchronization method call points acting on the same synchronization object, such as O.syncCall1() and O.syncCall2() in Figure 4 And O.syncCall3 (), and synchronous operations, such as lock (O) and unlock (O) in Figure 4.

transform():根据找到的这些同步方法调用点和同步操作对当前被编译方法的中间表示进行代码变换,其具体过程如下:transform(): Perform code transformation on the intermediate representation of the currently compiled method according to the found synchronous method call points and synchronous operations. The specific process is as follows:

(a)找到这些相邻的同步方法调用点与同步操作中的第一个,如图4中的O.syncCall1(),在它之前插入lock(O)指令;找到这些相邻的同步方法调用点与同步操作中的最后一个,如图4中的O.syncCall3(),在它之后插入unlock(O)指令。(a) Find the first of these adjacent synchronous method call points and synchronous operations, such as O.syncCall1() in Figure 4, insert the lock (O) instruction before it; find these adjacent synchronous method calls The last one in the point and synchronization operation, such as O.syncCall3() in Figure 4, inserts the unlock(O) instruction after it.

(b)在programAnalyse()中找到的这些同步方法调用点之前插入一条标记指令mark(),如图4中的O.syncCall1()、O.syncCall2()和O.syncCall3(),删除在programAnalyse()中找到的这些同步操作,如图4中的lock(O)和unlock(O)。(b) Insert a mark instruction mark() before the synchronous method call points found in programAnalyse(), such as O.syncCall1(), O.syncCall2() and O.syncCall3() in Figure 4, and delete them in programAnalyse These synchronous operations found in () are lock(O) and unlock(O) in Figure 4.

(c)对于在代码变换之前不在同步区域中、但在代码变换之后被包含到同步区域中的基本块,如图4中的基本块D,如果在代码变换之后,存在从不在同步区域中的基本块到该基本块的控制流边,如图4中从基本块C到基本块D的控制流边,需要在该边上插入lock(O)指令,如果在代码变换之后,存在从该基本块到不在同步区域中的基本块的控制流边,如图4中从基本块D到基本块E的控制流边,需要在该边上插入unlock(O)指令。(c) For a basic block that is not in the synchronization area before the transcoding but is included in the synchronous area after the transcoding, such as basic block D in Figure 4, if after transcoding, there is a block that is never in the synchronous area The control flow edge from the basic block to the basic block, such as the control flow edge from basic block C to basic block D in Figure 4, needs to insert a lock (O) instruction on this edge. The control flow edge from the block to the basic block not in the synchronization area, such as the control flow edge from basic block D to basic block E in Figure 4, needs to insert an unlock (O) instruction on this edge.

branch():如果当前被编译方法是同步方法,将其变换成分支版本,其具体过程如下:branch(): If the currently compiled method is a synchronous method, transform it into a branch version. The specific process is as follows:

(a)在方法入口处插入一条获得标记信息的指令和一条清除标记的指令,如图4中入口块中的marked=getMark()与clearMark()指令。(a) Insert an instruction for obtaining mark information and an instruction for clearing the mark at the entry of the method, such as the marked=getMark() and clearMark() instructions in the entry block in Figure 4 .

(b)对于作用在该方法的同步对象上的同步操作,如图4中的lock(O1)和unlock(O1),在其之前插入一条分支指令,该分支指令判断获得的标记信息marked是否为真,若为真,则跳转到该同步操作之后,否则,执行该同步操作。(b) For the synchronization operation acting on the synchronization object of this method, such as lock (O1) and unlock (O1) in Figure 4, a branch instruction is inserted before it, and the branch instruction judges whether the obtained mark information marked is True, if true, jump to after the synchronous operation, otherwise, execute the synchronous operation.

需要注意的是,作为本发明的一个实施例,无论是双版本粗化算法还是分支版本粗化算法,如果作用于同一个同步对象上的相邻同步方法调用点和同步操作之间满足上述的条件1~5之一,则不能进行优化。如图3及图4中的基本块A、B和D,如果满足上述条件1~5中的任何一条,都不能进行优化。It should be noted that, as an embodiment of the present invention, whether it is a double-version coarsening algorithm or a branch version coarsening algorithm, if the above-mentioned One of conditions 1 to 5 cannot be optimized. The basic blocks A, B and D in Fig. 3 and Fig. 4 cannot be optimized if any one of the above conditions 1-5 is satisfied.

如图5所示为根据本发明的一个实施例的同步优化设备的示意图。如图所示,该同步优化设备包括分析模块、编译模块和执行模块。其中,FIG. 5 is a schematic diagram of a synchronization optimization device according to an embodiment of the present invention. As shown in the figure, the synchronization optimization device includes an analysis module, a compilation module and an execution module. in,

分析模块用于对被编译方法进行静态程序分析,根据分析结果确定被编译方法中在同步对象上存在无须同步的同步方法调用点,并对该同步方法调用点进行标记。The analysis module is used for static program analysis of the compiled method, and according to the analysis result, it is determined that there is a synchronous method call point that does not need to be synchronized on the synchronization object in the compiled method, and the call point of the synchronous method is marked.

编译模块用于根据标记的同步方法调用点为同步方法编译生成允许不对所述同步方法的同步对象执行同步操作的本地代码。The compiling module is used for compiling and generating a local code for synchronous method compilation according to the marked synchronous method call point, allowing no synchronous operation to be performed on the synchronous object of the synchronous method.

执行模块用于根据标记的同步方法调用点执行所述本地代码。The execution module is used to execute the local code according to the marked synchronous method call point.

本发明提出的同步方法优化方法及设备主要有以下三个优点:The synchronization method optimization method and equipment proposed by the present invention mainly have the following three advantages:

第一,本发明的同步方法优化方法及设备具有高度的灵活,可以根据不同的需求,自由组合各阶段的不同策略,生成不同的优化算法。First, the synchronization method optimization method and equipment of the present invention are highly flexible, and can freely combine different strategies at each stage to generate different optimization algorithms according to different requirements.

第二,本发明的同步方法优化方法及设备具有良好的可扩展性,只要一种程序分析方法能识别出无须同步的同步方法调用点,就可以将其应用于该同步方法优化框架的第一阶段。Second, the synchronization method optimization method and device of the present invention have good scalability, as long as a program analysis method can identify synchronization method call points that do not need to be synchronized, it can be applied to the first synchronization method optimization framework stage.

第三,在本发明的同步方法优化方法及设备的基础上设计的优化算法是过程间的同步优化算法,不仅能删除大量过程内同步算法无法删除的同步操作,而且能用过程内同步优化算法的开销获得接近过程间算法的优化效果,缓解了开销和优化效果之间的矛盾。Third, the optimization algorithm designed on the basis of the synchronization method optimization method and equipment of the present invention is a synchronization optimization algorithm between processes, which can not only delete a large number of synchronization operations that cannot be deleted by the synchronization algorithm in the process, but also use the synchronization optimization algorithm in the process. The cost of the algorithm is close to the optimization effect of the inter-procedural algorithm, which alleviates the contradiction between the cost and the optimization effect.

发明人在开源的Java SE平台Apache Harmony上,基于该同步方法优化框架实现了双版本粗化算法和分支版本粗化算法,该框架可以同样地应用在其他Java虚拟机上。发明人以SPECjbb2005作为测试程序对同步方法优化算法的实现进行了测试。The inventor realized the dual-version coarsening algorithm and the branch-version coarsening algorithm based on the synchronization method optimization framework on the open source Java SE platform Apache Harmony, and the framework can be similarly applied to other Java virtual machines. The inventor used SPECjbb2005 as a test program to test the realization of the synchronization method optimization algorithm.

表1给出了SPECjbb2005中各种同步操作的数量,其中静态和动态分别表示通过编译时分析和运行时分析统计出来的各种同步操作的数量,内联(inline)前和内联后分别表示执行方法内联前后SPECjbb2005中各种同步操作的数量。静态编译时分析只对应用程序中的每个方法分析一次,它不能准确反映程序在实际运行时执行的各种同步操作数量,但能反映出SPECjbb2005中静态存在的可优化的机会有多少;而动态运行时分析所收集、统计的信息(动态列)能准确反映程序在实际运行时执行的各种同步操作数量,从而能准确反映出同步优化算法对整个程序性能的影响。表中的Locks表示所有monitorenter的数量,Unlocks表示所有monitorexit的数量,inserted Locks表示采用锁粗化技术时插入的monitorenter的数量,insertedUnlocks表示采用锁粗化技术时插入的monitorexit的数量,removed Locks表示优化算法删除掉的monitorenter的数量,removed Unlocks表示优化算法删除掉的monitorexit的数量,Sync-Methods是所有同步方法的数量,optSync-Methods是在双版本策略中存在非同步版本的本地代码的同步方法的数量。对于静态信息而言,sync method Callsites表示所有同步方法调用点的数量,opt sync methods Callsites表示在同步方法优化算法中所有被标记过的同步方法调用点的数量,即被优化的同步方法调用点的数量。对于动态信息而言,sync method Callsites表示同步方法被执行的总次数,opt syncmethods Callsites表示所有被优化的同步方法调用点被执行的总次数。Table 1 shows the number of various synchronization operations in SPECjbb2005, where static and dynamic represent the number of various synchronization operations counted by compile-time analysis and run-time analysis respectively, and before inline and after inline represent respectively The number of various synchronization operations in SPECjbb2005 before and after method inlining is performed. Static compile-time analysis only analyzes each method in the application program once. It cannot accurately reflect the number of various synchronization operations performed by the program during actual runtime, but it can reflect how many opportunities exist for static optimization in SPECjbb2005; and The information collected and counted by dynamic runtime analysis (dynamic columns) can accurately reflect the number of various synchronization operations performed by the program during actual runtime, thereby accurately reflecting the impact of the synchronization optimization algorithm on the performance of the entire program. Locks in the table indicates the number of all monitorenters, Unlocks indicates the number of all monitorexits, inserted Locks indicates the number of monitorenters inserted when lock coarsening technology is used, insertedUnlocks indicates the number of monitor exits inserted when lock coarsening technology is used, and removed Locks indicates optimization The number of monitorenters deleted by the algorithm, removed Unlocks indicates the number of monitorexits deleted by the optimization algorithm, Sync-Methods is the number of all synchronization methods, and optSync-Methods is the synchronization method of local codes with non-synchronized versions in the dual-version strategy quantity. For static information, sync method Callsites indicates the number of all synchronization method call sites, and opt sync methods Callsites indicates the number of all marked synchronization method call sites in the synchronization method optimization algorithm, that is, the number of optimized synchronization method call sites quantity. For dynamic information, sync method Callsites indicates the total number of times the synchronization method is executed, and opt syncmethods Callsites indicates the total number of times all optimized synchronization method call sites are executed.

表1SPECjbb2005中各种同步操作的数量统计Table 1 Statistics of various synchronous operations in SPECjbb2005

Figure G2009101629791D00161
Figure G2009101629791D00161

从表1可以看出,在内联之前只能删除被标记的同步方法中同步对象上的同步操作,同步方法优化算法能删除61个同步方法中的61个加锁操作,除此之外不能删除任何加锁操作。在内联之后由于大量的同步方法被内联到其调用者的方法中成为同步语句块,在调用者方法中便出现了大量可以删除加锁操作的机会,从表中的数据可以看出,内联后可以删除431个加锁操作,其中包含29个同步方法中的29个加锁操作。It can be seen from Table 1 that only the synchronization operations on the synchronization object in the marked synchronization method can be deleted before inlining, and the synchronization method optimization algorithm can delete 61 locking operations in the 61 synchronization methods, otherwise it cannot Remove any locking operations. After inlining, because a large number of synchronization methods are inlined into the caller's method to become a synchronization statement block, there are a lot of opportunities to delete the locking operation in the caller's method. As can be seen from the data in the table, After inlining, 431 locking operations can be removed, including 29 locking operations in 29 synchronized methods.

从表1中的静态信息可以看出,不管是在内联前还是内联后,被优化的同步方法调用点的数量都相当大,说明内联前后都存在大量同步方法优化的机会。从表1中的动态信息可以看出,无论内联前后,优化过的同步方法实际被执行的次数都相当之多。因此,无论编译器是否支持内联,同步方法优化算法都能起到较好的优化同步方法的效果。From the static information in Table 1, it can be seen that the number of optimized synchronization method call points is quite large no matter before or after inlining, indicating that there are a large number of synchronization method optimization opportunities before and after inlining. From the dynamic information in Table 1, it can be seen that the optimized synchronization method is actually executed quite a lot, no matter before or after inlining. Therefore, no matter whether the compiler supports inlining or not, the synchronization method optimization algorithm can achieve a better effect of optimizing the synchronization method.

从表1中还可以看出,为了保证程序语义不被改变而插入的同步操作的数量远小于删除掉的同步操作的数量;被优化的同步方法的数量相比同步方法的总数量要小得多,而且一般情况下同步方法的代码都非常少,对于双版本粗化算法意味着保存非同步版本的本地代码的额外存储空间开销会很小。It can also be seen from Table 1 that the number of synchronization operations inserted in order to ensure that the program semantics are not changed is much smaller than the number of deleted synchronization operations; the number of optimized synchronization methods is much smaller than the total number of synchronization methods There are many, and in general, the code of the synchronization method is very small. For the double-version coarsening algorithm, it means that the additional storage space overhead for saving the non-synchronized version of the local code will be very small.

采用双版本策略为每个需要优化的同步方法生成两个版本的本地代码,需要一些额外的存储空间开销。Using the dual-version strategy generates two versions of native code for each synchronization method that needs to be optimized, requiring some additional storage space overhead.

表2给出了SPECjbb2005在各种情况下所耗费的本地代码的存储空间,其中none表示执行时不进行方法内联也不使用双版本粗化算法时所有本地代码需要的存储空间,none&syncOpt表示执行时不进行方法内联但使用双版本粗化算法时所有本地代码需要的存储空间。从表中所示的数据可以看出,在不使用内联时双版本策略产生的额外存储空间开销只有83324字节,仅增加约5.6%的存储空间。表中的inline表示进行方法内联但不使用双版本粗化算法时所有本地代码需要的存储空间,inline&syncOpt表示进行方法内联而且使用双版本粗化算法时所有本地代码需要的存储空间,从表中所示的数据可以看出,在使用方法内联时双版本策略产生的额外存储空间开销只有114428字节,仅增加约7.5%的存储空间。由此可见,双版本策略对SPECjbb2005产生的额外存储开销不大。假使针对一些应用程序,双版本算法会产生非常大的额外存储空间开销,则可以考虑用分支版本策略。Table 2 shows the storage space of the local code consumed by SPECjbb2005 in various situations, where none indicates the storage space required by all local codes when the method is not inlined and the double-version coarsening algorithm is not used during execution, and none&syncOpt indicates the execution The storage space required by all native code when not doing method inlining but using the double version coarsening algorithm. From the data shown in the table, it can be seen that the additional storage space overhead generated by the dual-version strategy is only 83324 bytes when inlining is not used, which only increases the storage space by about 5.6%. inline in the table indicates the storage space required by all native codes when method inlining is performed but does not use the dual-version coarsening algorithm, and inline&syncOpt indicates the storage space required by all local codes when method inlining is performed and the dual-version coarsening algorithm is used, as shown in the table From the data shown in , it can be seen that when the method is inlined, the additional storage space overhead generated by the double-version strategy is only 114428 bytes, which only increases the storage space by about 7.5%. It can be seen that the additional storage overhead of SPECjbb2005 caused by the dual-version strategy is not large. If for some applications, the double-version algorithm will generate very large additional storage space overhead, you can consider using the branch version strategy.

表2各种情况下SPECjbb2005所耗费的本地代码存储空间(单位:byte)Table 2 Local code storage space consumed by SPECjbb2005 in various situations (unit: byte)

Figure G2009101629791D00171
Figure G2009101629791D00171

发明人构造了不同版本的Java虚拟机对SPECjbb2005进行了测试,实验使用的平台为Intel(R)Core(TM)2 Quad CPU Q66002.40GHz,3GB内存,操作系统为Windows XP,部分测试结果如图6和图7所示。The inventor constructed different versions of Java virtual machines to test SPECjbb2005. The platform used in the experiment was Intel(R) Core(TM) 2 Quad CPU Q6600 2.40GHz, 3GB memory, and the operating system was Windows XP. Part of the test results are shown in the figure 6 and Figure 7.

图6为根据本发明的一个实施例的3种不使用方法内联的Java虚拟机执行SPECjbb2005所得的吞吐量(bops:business operation per second)对比示意图。其中,none表示执行时未应用任何同步优化,branch表示执行时应用分支版本粗化同步优化算法,double表示执行时应用双版本粗化同步优化算法,improvement1和improvement2分别表示branch相对于none、double相对于none的吞吐量提高率。图中数据表的第1~3行分别为使用未应用任何同步优化的虚拟机(记为none)、应用分支版本粗化算法的虚拟机(记为branch)、应用双版本粗化算法的虚拟机(记为double)执行SPECjbb2005的6组测试结果。数据表中的后2行分别为branch相对于none、double相对于none的吞吐量提高率。由于Stoodley等的过程内锁粗化算法在不使用方法内联时几乎对同步无改进,故这里未与之对比。从图6中可见,分支版本粗化算法和双版本粗化算法均能改进SPECjbb2005的吞吐量,而双版本粗化算法对吞吐量的提高率比分支版本粗化算法的要高些,这是因为分支版本粗化算法会在一些同步方法中加入判断是否需要执行同步操作的分支指令,这些分支指令将耗用运行时间,从而影响吞吐量的改进。不过,双版本粗化算法的较高性能改进是以更多的编译时开销和存储开销为代价的。FIG. 6 is a schematic diagram of a comparison of throughput (bops: business operation per second) obtained by executing SPECjbb2005 with three Java virtual machines that do not use method inlining according to an embodiment of the present invention. Among them, none indicates that no synchronization optimization is applied during execution, branch indicates that the branch version coarse synchronization optimization algorithm is applied during execution, double indicates that the double version coarse synchronization optimization algorithm is applied during execution, improvement1 and improvement2 indicate that branch is relative to none and double respectively Throughput improvement over none. Rows 1 to 3 of the data table in the figure are respectively the virtual machine using no synchronization optimization (denoted as none), the virtual machine applying the branch version coarsening algorithm (denoted as branch), and the virtual machine applying the dual version coarsening algorithm. The machine (denoted as double) executes 6 sets of test results of SPECjbb2005. The last two rows in the data table are the throughput improvement rates of branch relative to none and double relative to none. Since the procedural lock coarsening algorithm of Stoodley et al. has almost no improvement on synchronization when method inlining is not used, it is not compared with it here. It can be seen from Figure 6 that both the branch version coarsening algorithm and the dual version coarsening algorithm can improve the throughput of SPECjbb2005, and the double version coarsening algorithm improves the throughput higher than the branch version coarsening algorithm, which is Because the branch version coarsening algorithm will add branch instructions in some synchronization methods to determine whether to perform synchronization operations, these branch instructions will consume running time, thereby affecting the improvement of throughput. However, the higher performance improvement of the dual-version coarsening algorithm comes at the cost of more compile-time and storage overhead.

图7为根据本发明的一个实施例的4种使用方法内联的Java虚拟机执行SPECjbb2005所得的吞吐量对比示意图,其中,inline表示执行时未应用任何同步优化,inline&Stoodley表示执行时应用了Stoodley等的同步优化算法,inline&branch表示执行时应用分支版本粗化同步优化算法,inline&double表示执行时应用双版本粗化同步优化算法的测试结果,improvement1、improvement2、improvement3分别表示inline&Stoodley相对于inline、inline&branch相对于inline、inline&double相对于inline的吞吐量提高率。图中数据表的第1~4行分别为使用应用方法内联但未应用任何同步优化的虚拟机(记为inline)、在内联后应用Stoodley等的粗化算法的虚拟机(记为inline&Stoodley)、在内联后应用分支版本粗化算法的虚拟机(记为inline&branch)、在内联后应用双版本粗化算法的虚拟机(记为inline&double)执行SPECjbb2005的6组测试结果。数据表中的后3行分别为inline&Stoodley相对于inline、inline&branch相对于inline、inline&double相对于inline的吞吐量提高率。从中可见,三种同步优化算法都能改进吞吐量,而双版本粗化算法和分支版本粗化算法所带来的改进高于Stoodley等的算法所带来的改进。从表1中可知,在内联后同步方法调用的执行次数由612136827减到87182262,这是由于在执行了内联后绝大多数同步方法调用点被替换成这些同步方法的方法体。另外,由于在内联后应用分支版本粗化算法所产生的判断是否需要执行同步操作的分支指令数显著减少了,故分支版本粗化算法所带来的分支指令对性能的影响也减轻了。从图7中可以看到,inline&branch和inline&double对吞吐量的提高率是差不多的。Fig. 7 is a schematic diagram of the throughput comparison obtained by executing SPECjbb2005 using the inline Java virtual machine of four methods according to an embodiment of the present invention, wherein inline means that no synchronization optimization is applied during execution, and inline & Stoodley means that Stoodley etc. are applied during execution inline&branch indicates that the branch version coarsening synchronization optimization algorithm is applied during execution, inline&double indicates the test result of applying the double version coarsening synchronization optimization algorithm during execution, improvement1, improvement2, and improvement3 respectively indicate that inline&Stoodley is relative to inline, and inline&branch is relative to inline , The throughput improvement rate of inline&double relative to inline. Rows 1 to 4 of the data table in the figure are respectively the virtual machine using method inlining but not applying any synchronization optimization (denoted as inline), and the virtual machine applying Stoodley’s coarsening algorithm after inlining (denoted as inline&Stoodley ), the virtual machine (denoted as inline&branch) that applies the branch version coarsening algorithm after inlining, and the virtual machine that applies the double version coarsening algorithm after inlining (denoted as inline&double) execute six groups of test results of SPECjbb2005. The last three rows in the data table are the throughput improvement rates of inline&Stoodley relative to inline, inline&branch relative to inline, and inline&double relative to inline. It can be seen that the three synchronous optimization algorithms can improve the throughput, and the improvement brought by the double-version coarsening algorithm and the branch version coarsening algorithm is higher than that brought by the algorithm of Stoodley et al. It can be seen from Table 1 that the number of executions of synchronous method calls after inlining is reduced from 612136827 to 87182262. This is because most of the synchronous method call points are replaced by the method bodies of these synchronous methods after inlining. In addition, since the number of branch instructions for judging whether a synchronous operation needs to be performed by applying the branch version coarsening algorithm after inlining is significantly reduced, the impact of branch instructions brought by the branch version coarsening algorithm on performance is also reduced. As can be seen from Figure 7, inline&branch and inline&double have similar throughput improvement rates.

由图6和图7中分支版本粗化算法和双版本粗化算法对吞吐量的改进情况可知,两种算法在内联后的改进比对应的内联前的改进都要高。这主要是因为:当不执行内联时,锁粗化算法形成的新的同步区域会包含方法调用和返回、甚至是对未编译过的同步方法的编译等操作,这些操作导致新的同步区域较长,从而严重影响多线程程序的并发性。在内联后,由于大部分同步方法调用点被替换成同步方法的方法体,故锁粗化算法形成的新的同步区域中的方法调用和返回以及对未编译过的同步方法的编译等操作显著减少了,从而新的同步区域相对较短,并发性将得到改善。It can be seen from the improvement of the throughput of the branch version coarsening algorithm and the double version coarsening algorithm in Figure 6 and Figure 7 that the improvement of the two algorithms after inlining is higher than the corresponding improvement before inlining. This is mainly because: when inlining is not performed, the new synchronization area formed by the lock coarsening algorithm will contain operations such as method calls and returns, and even compilation of uncompiled synchronization methods, which lead to new synchronization areas Longer, which seriously affects the concurrency of multi-threaded programs. After inlining, since most of the synchronous method call points are replaced by the method body of the synchronous method, the method call and return in the new synchronous area formed by the lock coarsening algorithm, as well as the compilation of the uncompiled synchronous method, etc. significantly reduced so that new sync regions are relatively short and concurrency will improve.

尽管已经示出和描述了本发明的实施例,对于本领域的普通技术人员而言,可以理解在不脱离本发明的原理和精神的情况下可以对这些实施例进行多种变化、修改、替换和变型,本发明的范围由所附权利要求及其等同限定。Although the embodiments of the present invention have been shown and described, those skilled in the art can understand that various changes, modifications and substitutions can be made to these embodiments without departing from the principle and spirit of the present invention. and modifications, the scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A method of synchronization optimization, comprising the steps of:
performing static program analysis on the compiled method, determining a synchronous method calling point which does not need to perform synchronous operation on a synchronous object in the compiled method according to an analysis result, and marking the synchronous method calling point;
compiling and generating a local code which allows synchronous operation not to be executed on a synchronous object of the synchronous method according to the marked synchronous method calling point;
and executing the local code according to the marked synchronous method calling point.
2. The synchronization optimization method of claim 1, wherein the step of performing static program analysis on the compiled method and determining the calling points of the synchronization method without performing synchronization operation on the synchronization object comprises:
and merging adjacent synchronous areas and/or nested synchronous areas acting on the same synchronous object, and determining a synchronous method calling point contained in the merged synchronous area as the synchronous method calling point which does not need to carry out synchronous operation on the synchronous object.
3. The synchronization optimization method of claim 1, wherein the step of performing static program analysis on the compiled method and determining the calling points of the synchronization method without performing synchronization operation on the synchronization object comprises:
and determining whether the synchronous method call point of the object is a synchronous method call point without synchronous operation on the synchronous object according to whether the object is shared among a plurality of threads.
4. The synchronization optimization method of claim 3, wherein the step of determining whether the synchronization method call point for the object is a synchronization method call point that does not require a synchronization operation for a synchronization object based on whether the object is shared among a plurality of threads comprises:
and if the object is accessed only in the thread for creating the object, determining the synchronous method call point acting on the object as the synchronous method call point without synchronous operation on the synchronous object.
5. The synchronization optimization method according to claim 1, wherein the step of generating native code allowing not to perform synchronization operations on synchronization objects of the synchronization method according to the marked synchronization method call point for the compilation of the synchronization method called by the marked synchronization method call point comprises:
compiling a normal version of the synchronization method and an unsynchronized version of the synchronization method for the synchronization method according to the marked synchronization method call points,
wherein the unsynchronized version of the synchronization method includes native code generated by ignoring synchronization operations on synchronization objects of the synchronization method.
6. The synchronization optimization method according to claim 1, wherein the step of generating native code allowing not to perform synchronization operations on synchronization objects of the synchronization method according to the marked synchronization method call point for the compilation of the synchronization method called by the marked synchronization method call point comprises:
compiling and generating a branch version of the synchronous method for the synchronous method according to the marked synchronous method calling point, inserting an instruction for setting marking information in front of the marked synchronous method calling point,
wherein a branched version of the synchronization method is obtained by: inserting an instruction for obtaining mark information and an instruction for clearing marks at an entrance of the synchronization method; inserting a branch instruction before the synchronous operation acted on a synchronous object of the synchronous method, wherein the branch instruction judges the obtained marking information, if the marking information is true, jumping to the back of the synchronous operation, and if the marking information is false, executing the synchronous operation.
7. The synchronous optimization method of claim 5, wherein the step of executing the native code comprises:
when executing the synchronous method call point, if the synchronous method call point is the marked synchronous method call point, calling the local code of the asynchronous version, and if the synchronous method call point is the unmarked synchronous method call point, calling the local code of the common version.
8. The synchronous optimization method of claim 6, wherein the step of executing the native code comprises:
and when the synchronous method calling point is executed, calling the local code of the branch version according to the marking information instruction set in front of the synchronous method calling point.
9. The synchronization optimization method according to claim 1, wherein the step of determining the synchronization method call point as not requiring a synchronization operation on the synchronization object is not performed when one of the following occurs in the static program analysis of the compiled method:
the synchronization operation acted on other objects is included between the adjacent synchronization method calling points acted on the same synchronization object and the synchronization operation;
the operation of accessing variable data is included between the adjacent synchronous method calling points acting on the same synchronous object and the synchronous operation;
the adjacent synchronous method call points acting on the same synchronous object and the synchronous operation contain static unresolvable access or method call;
the adjacent synchronous method call points acting on the same synchronous object and the synchronous operation contain circulation;
the adjacent synchronous method call points acting on the same synchronous object and the synchronous operation comprise operations with inconsistent locking times, and the locking times of all precursor nodes of the control flow graph nodes to which the operations with inconsistent locking times belong are different.
10. A synchronous optimization device is characterized by comprising an analysis module, a compiling module and an execution module, wherein,
the analysis module is used for performing static program analysis on the compiled method, determining a synchronous method calling point which does not need to perform synchronous operation on a synchronous object in the compiled method according to an analysis result, and marking the synchronous method calling point;
the compiling module is used for compiling and generating a local code which allows synchronous operation not to be executed on a synchronous object of the synchronous method according to the marked synchronous method calling point;
the execution module is used for executing the local code according to the marked synchronous method calling point.
CN2009101629791A 2009-08-20 2009-08-20 Synchronous optimization method and synchronous optimization equipment Expired - Fee Related CN101630268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101629791A CN101630268B (en) 2009-08-20 2009-08-20 Synchronous optimization method and synchronous optimization equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101629791A CN101630268B (en) 2009-08-20 2009-08-20 Synchronous optimization method and synchronous optimization equipment

Publications (2)

Publication Number Publication Date
CN101630268A CN101630268A (en) 2010-01-20
CN101630268B true CN101630268B (en) 2012-07-04

Family

ID=41575386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101629791A Expired - Fee Related CN101630268B (en) 2009-08-20 2009-08-20 Synchronous optimization method and synchronous optimization equipment

Country Status (1)

Country Link
CN (1) CN101630268B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170543B (en) * 2017-12-26 2021-06-01 上海展扬通信技术有限公司 Kernel code and synchronization processing method and device of upper layer code of Kernel code
CN112313626B (en) 2018-06-22 2024-07-19 华为技术有限公司 Deadlock detection and synchronous perception optimization method on asynchronous processor architecture
CN112269581B (en) * 2020-12-24 2021-07-02 北京清微智能科技有限公司 Memory coupling compiling method and system for reconfigurable chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1591335A (en) * 2003-08-26 2005-03-09 微软公司 Data flow analysis of transactional processes
WO2006128891A2 (en) * 2005-06-03 2006-12-07 International Business Machines Corporation Shared memory synchronization
CN1961292A (en) * 2004-06-03 2007-05-09 英特尔公司 Thread synchronization method and apparatus for managed runtime environment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1591335A (en) * 2003-08-26 2005-03-09 微软公司 Data flow analysis of transactional processes
CN1961292A (en) * 2004-06-03 2007-05-09 英特尔公司 Thread synchronization method and apparatus for managed runtime environment
WO2006128891A2 (en) * 2005-06-03 2006-12-07 International Business Machines Corporation Shared memory synchronization

Also Published As

Publication number Publication date
CN101630268A (en) 2010-01-20

Similar Documents

Publication Publication Date Title
Chen et al. The Jrpm system for dynamically parallelizing Java programs
Mehrara et al. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
Mehrara et al. Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism
US9424013B2 (en) System and method for reducing transactional abort rates using compiler optimization techniques
US8677331B2 (en) Lock-clustering compilation for software transactional memory
Halpert et al. Component-based lock allocation
Zhao et al. Intermediate language extensions for parallelism
ElTantawy et al. MIMD synchronization on SIMT architectures
Cheramangalath et al. Falcon: A graph manipulation language for heterogeneous systems
Dolan et al. Compiler support for lightweight context switching
Barik et al. Interprocedural load elimination for dynamic optimization of parallel programs
Liu et al. Safe-by-default concurrency for modern programming languages
CN101630268B (en) Synchronous optimization method and synchronous optimization equipment
Yiapanis et al. Compiler-driven software speculation for thread-level parallelism
Jenke et al. Mapping high-level concurrency from openmp and mpi to threadsanitizer fibers
Navabi et al. Quasi-static scheduling for safe futures
Spear et al. Reducing memory ordering overheads in software transactional memory
Afek et al. Lowering STM overhead with static analysis
Matsunaga et al. Shelving a code block for thread-level speculation
Noll et al. Online feedback-directed optimizations for parallel Java code
Nie et al. Parallel region reconstruction technique for sunway high-performance multi-core processors
Yu et al. General data structure expansion for multi-threading
D. Jardim et al. An extension for Transactional Memory in OpenMP
Honorio et al. On the efficiency of transactional code generation: A GCC case study
Ravi et al. Semi-automatic restructuring of offloadable tasks for many-core accelerators

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120704

Termination date: 20150820

EXPY Termination of patent right or utility model