
GB2539961A - Code hotspot encapsulation

Info

Publication number: GB2539961A
Application number: GB1511705.4A
Authority: GB
Inventors: Wilson Nicholas; Bhaskaran Balakrishnan; Al-Jarro Ahmed
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-07-03
Filing date: 2015-07-03
Publication date: 2017-01-04
Anticipated expiration: 2035-07-03
Also published as: GB201511705D0; GB2539961B

Abstract

A method is disclosed of encapsulating code hotspots and corresponding data from a large scale computer application and isolating the hotspots into standalone micro-routines. The hotspots are identified S102, S104 and selected S106 for encapsulation S108. Encapsulation involves extracting hotspots and their data dependencies, into standalone micro-routines that can be modified, compiled and run independently from the original application. The data dependencies are captured by applying a plurality of different data sets to the same hotspot. Optimising strategies may be applied S110 to the micro routines in isolation and the results tested S112, S114 without having to re-execute the entire application. The method can be applied at source code level S108_I, compile and link level S108_II or executable level S108_III. Hook functions may be used to perform the extraction of the hotspots.

Description

Code Hotspot Encapsulation

Field of the Invention

The present invention relates to performance analysis, tuning and optimisation of large-scale software applications and more particularly to so-called code hotspots within such applications.

Background of the Invention

Development of a large-scale software application starts with a programmer writing source code in a high-level language, taking advantage of existing code in the form of “libraries” for performing commonly-used functions.

Figure 1 schematically shows a computer system 10 suitable for use in developing a large-scale application and for implementing the present invention or parts thereof. It includes a memory 14 for temporary storage of various programs and data, including the application code of an application (which may be in any of a number of forms or levels as discussed later). The memory is connected to a CPU 12 for executing programs held in the memory (as will be understood by those skilled in the art, the CPU may in fact be many separate CPUs or cores). An input/output section 16 performs communications over a network 50 with entities outside the computer system 10, in particular a client workstation 20 employed by a programmer, developer or end-user, as well as with long-term (e.g., hard disk) storage 30 and a library 40 of re-usable code.

Figure 2 schematically illustrates stages in development of a large-scale application. Source code 100 is created by the programmer. Prior to compiling the source code, preprocessing 110 is often performed to remove redundant spaces, comments and so on, and to give effect to preprocessor directives. The programmer (or alternatively the end-user) then compiles the source code as indicated at 120, and links the resulting object code as shown at 130 to any libraries being used, these being stored in library 40 already in object code form. Compiling and linking may involve multiple stages giving rise to intermediate code (also called compile and link level code) 200. The end result of compiling and linking is to form an executable file, usually simply called an “executable”, containing object code 300. The above stages may also be regarded as distinct functions or modules, even if in reality each is accomplished by suitable programming of computer system 10. Thus, the pre-processing stage 110 may be referred to as a “pre-processor”, and compiling 120 as being performed by a “compiler”.

Performance analysis, tuning and optimisation are important parts of developing large-scale applications. A typical procedure is first to analyse the performance of the unmodified application, for example by “profiling”. Profiling may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods. Profiling is typically achieved by instrumenting either the application source code or the executable using a tool called a profiler (or code profiler), then executing the code to analyze the performance. Instrumenting involves addition of code in comparison with the original application, with the primary function of evaluating the correctness and efficiency of the original application.

One form of evaluation is to monitor the progress of execution through the sequence of instructions in the program code. A program counter is useful for this purpose, since it indicates where a computer is in its program sequence. Typically, the program counter is incremented after fetching an instruction, and holds the memory address of the next instruction to be executed. Thus, if during execution the program counter appears to be not changing or barely increasing for some time, this is indicative of a “bottleneck” or “hotspot” in the program code.
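As a minimal sketch of this idea, assume that program-counter values have already been sampled at regular intervals during execution. The sample values, the `region_size` bucket width, the `threshold` and the function name below are hypothetical and are not taken from any particular profiler. The fraction of samples landing in each address region can then be tallied as follows:

```python
from collections import Counter

def hotspot_regions(pc_samples, region_size=0x100, threshold=0.25):
    """Return (region_start, fraction) for regions holding at least `threshold` of the samples."""
    if not pc_samples:
        return []
    # Bucket each sampled address into the region that contains it.
    counts = Counter((pc // region_size) * region_size for pc in pc_samples)
    total = len(pc_samples)
    hot = [(start, n / total) for start, n in counts.items() if n / total >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)

# Hypothetical samples: most of them fall in the region starting at 0x4000.
samples = [0x1000, 0x4010, 0x4014, 0x4018, 0x4010, 0x4014, 0x2000, 0x4018]
regions = hotspot_regions(samples)
```

A region that collects a large share of the samples is one where the program counter is barely advancing relative to elapsed time, which is the behaviour described above.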

As is well known, a software application consists of many parts, which may be referred to as routines, functions and the like. Profiling allows identification of the bottlenecks or hotspots that are critical for improving the performance. A hotspot is a part of the application code that is executed many times such that its performance matters to the overall application performance. Put another way, a hotspot is a region of code where the program counter spends a good fraction of its time. The hotspot code will, in general, refer to results of computation in other parts of the application, for example to find the values of certain variables employed by the hotspot code. The hotspot is said to have “data dependencies” in the sense that the code needs access to these results (i.e., to values of the respective variables) in order to be executed. A set of values of the data referred to by a hotspot is referred to below as a “data set” or “data instance”. Within a single execution of the application, the hotspot code may be executed many times (in other words, in many iterations) each time with different data instances. “Hook functions” may be provided within the programming language used for the source code, to facilitate profiling. A hook function inserted at a given point in program code allows the current instruction and data being executed at that point to be read out and recorded for later analysis. One hook function is sufficient to extract one hotspot, by capturing the data when entering the hotspot and extracting the code in the scope of the hotspot. An example of a prior-proposed tool for this purpose is called Codelet Finder and was developed by CAPS Enterprises. However, this tool is applicable only to source code, and does not capture different data sets for the same hotspot.

Tuning and optimisation means to modify the code to remove bottlenecks, for example by improving performance at the hotspots, then to measure the performance of the system after modification to verify that performance has indeed improved.

Benchmarking, in the software sense, means running an application in order to assess its performance. A micro-benchmark attempts to measure the performance of a relatively small section of code, such as a single hotspot. Thus, it is possible to extract code sections from a larger application for benchmarking purposes; however, attention needs to be paid to data dependencies, where different lines of code access or modify the same resource.

Another important technique in managing large-scale applications is called checkpointing and restarting. Checkpoint/restart mechanisms allow an application which has terminated (either due to a hardware crash, or intentionally) and is subsequently restarted, to continue from the checkpoint with no loss of data, just as if no failure had occurred.

Checkpointing can occur either within the operating system or at the application level.

In the latter case, the application uses operating system hooks that enable it to save the relevant resources and data needed for a restart. Conventionally, an application-level checkpoint provides a “snapshot” of the entire state of the application, including data. With large-scale applications requiring a long time (up to several weeks) to complete execution, it is common practice to break up execution into several batches. Then, at certain time intervals or at the beginning of the business day, these long-running programs are intentionally stopped after a checkpoint so that smaller jobs can then be processed. When the smaller jobs have finished, the larger program restarts at the last checkpoint.

Although a checkpoint of a large-scale application (and likewise the subsequent restart) may take a long time, it is also possible to isolate sections of code forming part of a larger application and checkpoint them individually. Again, data dependencies have to be taken into account.

Data layout can be an important consideration for a large-scale application. Data used by an application are frequently saved in a default layout, which is inefficient because the data the programs need at a particular time may not reside near to each other in the memory space. As applications require increasing amounts of memory, provided at different levels of cache, main memory and external storage, it becomes necessary to consider how data is saved and how it will later be accessed. By matching the code to data layout, data can be accessed more quickly and programs will run faster. Thus, in addition to optimisation of the program code itself, the data layout can also be optimised (this is also called memory optimisation).

Large-scale applications are laborious to tune: their size makes them challenging to manipulate, and many hotspots may exist within the same application. As large-scale applications typically have long elapsed times, particularly when simulating real-life problems, the turnaround time required for testing potentially multiple optimisation strategies on the entire application is often prohibitive.

There is consequently a need for improved techniques for applying optimisation technologies to large-scale applications.

Summary of the Invention

An object of this invention is to provide a mechanism for encapsulating targeted code hotspots into standalone micro-benchmark routines (also referred to below simply as micro-routines). In effect, an encapsulated routine is walled off (within a “wrapper”) to ensure that all the code that changes its behaviour is contained within the object itself. Encapsulation encompasses both data and code or executable, and takes into account that one code segment can use multiple sets of data.

This invention is particularly applicable in cases where external technologies are to be applied to large-scale applications, such technologies including, for example, performance analysis, tuning and optimising.

The extraction of hotspots and all possible corresponding data dependencies as well as their encapsulation into standalone micro-benchmark routines will therefore make them accessible for use externally and without having to rely on re-executing the entire application. Importantly, this encapsulation mechanism of the hotspots makes them suitable for deployment within an external technology and away from the full application.

Data layout is an important consideration for a large-scale application as already mentioned. The above mentioned extraction of the hotspots will allow for the more in-depth examination of the layout of data structures for a particular hotspot without needing to modify the data layout for the full application. That is, various data layouts can be tested for each hotspot without having to change the data layout for the whole application.
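By way of illustration only, the following sketch shows how an alternative data layout might be tried on an extracted hotspot using a captured data instance. The kinetic-energy calculation, the record fields `m` and `v`, and all function names are hypothetical stand-ins chosen for the example. Python itself does not expose the memory-locality effects discussed above, so the sketch only shows the checking pattern: convert the captured data, run the variant, and compare against the original result.

```python
def kinetic_energy_aos(particles):
    """Hotspot written against an array-of-records layout: [{'m': ..., 'v': ...}, ...]."""
    return sum(0.5 * p["m"] * p["v"] ** 2 for p in particles)

def kinetic_energy_soa(masses, velocities):
    """Same computation against a structure-of-arrays layout."""
    return sum(0.5 * m * v ** 2 for m, v in zip(masses, velocities))

def to_soa(particles):
    """Convert a captured data instance from the original layout to the candidate layout."""
    return [p["m"] for p in particles], [p["v"] for p in particles]

# Hypothetical captured data instance for the hotspot.
captured = [{"m": 2.0, "v": 3.0}, {"m": 1.0, "v": 4.0}]

reference = kinetic_energy_aos(captured)
candidate = kinetic_energy_soa(*to_soa(captured))
```

Only the micro-routine and its captured data are touched; the layout used by the rest of the application is unchanged.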

Attempts have been made to perform hotspot isolation by extraction from source code. However, proposals to date in this field do not provide a complete solution, or a solution capable of flexible application.

Firstly, prior proposals do not consider scenarios where there is no access to the application’s source code. For example, in the case where only access to the executable is available, and not the source code, there is no mechanism for hotspot isolation. Furthermore, prior proposals generally target situations where the capturing of data dependency is only considered for static data dependency: in other words they cannot manage different datasets for the same hotspot, whereas embodiments of the present invention permit isolation of hotspots as well as all instances of data dependency for these hotspots.

In addition, embodiments of the present invention generate a profile of the captured data instances, in the sense of a (visual) representation of results for each of a set of instances of data that has been recorded and collected for a particular hotspot: this is a different sense of “profiling” from the instrumentation sense mentioned earlier. With this profile, the user can make a choice on the extracted data instance(s) for the isolated hotspots when used externally: in other words, to choose from among the set of data instances already captured during the hotspot encapsulation. By contrast, prior proposals lack both the isolation of hotspots where access to the source code is not available and the isolation of different datasets for the same hotspot.

According to a first aspect of the present invention, there is provided a method of analysing code of a large-scale application comprising: identifying hotspots in code of the application; extracting, from the code of the application, one or more of the identified hotspots along with data dependencies of the hotspots; encapsulating each extracted hotspot with its data dependencies into a standalone routine; and executing each standalone routine separately from the large-scale application; wherein the extracting includes: capturing data dependencies of each hotspot for a plurality of different data sets applied to the same hotspot.

Here, a “hotspot” may be, as already mentioned, a part of the application code that is executed many times, and/or is highly computationally intensive such that its performance matters to the overall application performance. The “extracting” of a hotspot means to cut out the relevant code leaving behind the remainder of the application (including other hotspots). The “encapsulating” refers to isolating the cut-out code and related data into a single, standalone component which is capable of being executed separately from the original application. This may involve virtualisation to allow the hotspot to exist independently. The “code of the application” can either be source code, compile and link-level code, or executable code (see below). The “data sets” applied to a hotspot are data values referred to by the hotspot code and used in calculations and so forth, and give rise to “data instances” of the hotspot. The “routine” referred to above means some individually-definable function of the application, and is also referred to elsewhere in this specification as “micro-routine” or “micro-benchmarking routine”. Capturing data dependencies may be followed by a further step of allowing a user to select data sets to be applied in the executing step.

Whilst it would be possible to employ the above method merely for analysis of existing code, it is advantageously applied to optimising and/or tuning the hotspots to improve performance of the application as a whole. Such optimising and/or tuning is not part of the invention, and can employ any known techniques. By applying, to the hotspot, at least one technology for tuning or optimisation of the hotspot, separately from any technology applied to another said hotspot, it becomes possible to apply several technologies to the same hotspot, or the same technology can be applied to several hotspots in isolation from each other. A first embodiment is applied at the source code level. In this case, the extracting can be performed on source code of the application by inserting hook functions into the source code at points where hotspots are identified, the hook functions capturing code statements and corresponding processing data of the hotspots.

In a second embodiment, the extracting is performed at compile and link level code of the application by inserting hooks into the compile and link level code at locations specified using configuration files, compiler flags and/or environment variables, the hooks capturing code statements and corresponding processing data of the hotspots.

In a third embodiment, the extracting is performed on an executable file of the application. Encapsulation of hotspots from the executable can be performed after the hotspots are identified and marked. Thereafter, hotspot encapsulation is performed for the marked instances of the executable that correspond to each marked hotspot via checkpointing and restarting technology, the checkpointing being applied either to the whole application or only to part of it.

In any of these embodiments, preferably, the capturing is applied to all available data sets, but only data sets giving significant differences (in execution time for example) when executing the standalone routine are employed in the encapsulating. Further, the capturing step may be used to generate a profile of data instances of the hotspot, each data instance corresponding to a different data set. The profile may indicate the time required for executing the hotspot code with each data set (set of values of the variables which constitute the data dependency), revealing which data sets are especially useful for assessing the effect of tuning/optimisation.

It is not necessary to perform the above steps on every hotspot identified within the application. Thus the extracting may further comprise selecting from among the hotspots identified, one or more hotspots for extraction.

According to a second aspect of the present invention, there is provided a computer system for analysing code of a large-scale application comprising: means for extracting, from code of the application, hotspots along with data dependencies of the hotspots; means for encapsulating each extracted hotspot with its data dependencies into a standalone routine; and means for analysing the standalone routines separately from the large-scale application; wherein the means for extracting includes: means for capturing data dependencies of each hotspot for a plurality of different data sets applied to the same hotspot.

According to a third aspect of the present invention, there is provided program code which, when executed by a computer system, performs any method referred to above.

According to a fourth aspect of the present invention, there is provided one or more non-transitory computer-readable recording media on which is stored the program code just mentioned.

Thus, the proposed invention features the encapsulation of code hotspots and corresponding data, thereby isolating them into standalone micro-routines. This technology is highly suitable for the complete isolation of pieces of code; for example, for extracting computationally intensive routines and their data dependencies, which reside in an otherwise large-scale application, into standalone codes that can be modified, built (compiled) and run independently from the original application.

Therefore, external technologies (e.g., optimising strategies) of choice can be applied to the extracted micro-routines in isolation without having to re-execute the entire application.

Different data sets or “data instances” are typically used in different executions (or iterations) of the hotspot code. During the capturing of data dependencies all data instances are scanned, whereby a collection for encapsulation purposes is only made with respect to data sets with significant differences.

As a result, a profile of the captured data dependencies is also generated, thereby providing the user with a choice of the extracted data instance(s) for the isolated hotspots when applying an external technology to the now fully extracted micro-routines.

The mechanism for encapsulating code hotspots into standalone micro-routines that also capture all instances of data dependencies for each hotspot/function is an important and distinctive technical feature of this invention. The complete isolation of each code hotspot (source code, compile and link level or executable) and all instances of possible data dependency into an individual micro-routine is achieved. Specifically, an embodiment of the mechanism for isolating and encapsulating a micro-routine entails the following steps:

Select hotspots identified in large-scale application for isolation and extraction

Isolate code hotspots based on source-code, compile and link, or executable levels

Extract hotspot and all possible instances of data dependency

Store and generate profile of captured data instances for the isolated hotspot

Encapsulate hotspot and data into a standalone micro-routine

Brief Description of the Drawings

Reference is made, by way of example only, to the accompanying drawings in which:

Figure 1 shows a conventional computer system suitable for performing the methods of the invention;

Figure 2 schematically illustrates stages in development of a large-scale application;

Figure 3 illustrates the concept of hotspot encapsulation;

Figure 4 is an overview of the method for encapsulating code hotspots into standalone micro-benchmark routines;

Figure 5 illustrates a hotspot isolation mechanism applied at source code or intermediate code level, providing insertion of hook functions to the selected hotspots for the purpose of capturing codes and data dependency into standalone micro-routines;

Figure 6 illustrates a hotspot isolation mechanism applied at object code level, employing checkpointing and restarting technology for isolating hotspots into micro-benchmark routines; and

Figure 7 shows a data profile that is generated from the captured data instances.

Detailed Description

Figure 3 illustrates the principle of encapsulation as employed in the present invention. Encapsulation, in the present context, refers to isolating, from original source code 100 for example, a section of code identified as a hotspot (HS in Figure 3), and wrapping the hotspot and related data dependencies into a single, standalone component (or micro-routine, MR in Figure 3) which is capable of being executed separately from the original application. In Figure 3, the code depicted is purely for illustrative purposes and does not necessarily represent an actual hotspot.

A flow chart that illustrates the mechanism proposed to extract code hotspots into standalone micro-routines is shown in the right-hand portion of Figure 4. By way of comparison, a known hotspot evaluation method is shown in the left-hand part of the Figure. The mechanism is a computer-implemented method, under user control.

The method of the invention starts at step S100 by executing the application to be analysed, whereby a profiling tool is used and profile data collected (S102) in order to identify code hotspots. Next (S104), the hotspots are ranked in some way, such as in order of importance (for example based on time spent by the execution in each hotspot). In S108, two options for applying the mechanism to isolate the micro-routines can be considered: i) after the hotspots are identified and selected; or ii) to all functions of the application. The difference here is that in i) only selected hotspots are identified and extracted, whereas in ii) the encapsulation mechanism is extended to all functions of the application regardless of whether or not they are hotspots.
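As one possible illustration of collecting profile data and ranking hotspots by time spent, the sketch below uses Python's standard `cProfile` and `pstats` modules. The functions `cheap`, `hot` and `application` are hypothetical stand-ins for an application, and ranking by a function's own time is an assumption; other ranking criteria are equally possible.

```python
import cProfile
import pstats

def cheap(n):
    return n + 1

def hot(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def application():
    for _ in range(50):
        cheap(1)
        hot(20000)

def rank_hotspots(entry_point, top=3):
    """Execute entry_point under the profiler and rank functions by their own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    entry_point()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, function name) -> (cc, nc, tottime, cumtime, callers)
    rows = [(func[2], values[2]) for func, values in stats.stats.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]

ranking = rank_hotspots(application)
```

The first entry of `ranking` names the function in which the execution spent the most time, which would be the first candidate for selection.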

After any desired selection is made, three approaches to hotspot isolation can also be considered: at source-code level (S108_I), compile and link level (S108_II), or executable level (S108_III) as explained in more detail later. For case i) above, isolating the micro-routines after the hotspots are identified and selected, a re-run of the application is performed in order to maintain the efficiency of capturing all data dependencies of the selected hotspots and their encapsulation into standalone micro-routines.

Once the hotspots have been isolated in S108, external technologies (e.g. tuning and optimisation) are applied to the isolated hotspots in S110. Depending upon the level of code being considered (source code, intermediate code or object code), this results in various modifications being applied to the hotspot code as shown in the boxes labelled S112_I, II and III. Several external technologies can be applied to the same hotspot or the same technology can be applied to several hotspots in isolation from each other and away from the full application. In S114 the isolated hotspots are each executed individually with the modifications applied in S112, to assess the results. In this way external technologies (e.g., optimising strategies) of choice can be applied to the extracted micro-routines in isolation without having to re-execute the entire application.

As indicated in the flowchart, the flow may return to the previous step to allow alternative modifications to be applied (for example if modifications applied to a hotspot do not have the expected effect). Once it appears that the individual hotspots have been satisfactorily optimised or tuned, the application including the modified hotspots is executed at S116 to verify (test) the effect upon the application as a whole. Note that this step may only have to be performed once.

Meanwhile, the known evaluation method shown in the left-hand portion of Figure 4 includes steps S10, S12, S14 and S16 corresponding to the above steps S100, S102, S104 and S106 respectively. However, the hotspots are not isolated so at S18, external technologies (optimisations) are applied to selected hotspots one hotspot at a time and the effect upon the whole application needs to be assessed after each modification at S20. Thus typically the whole application would need to be re-executed many times. For a large-scale application this is an extremely time-consuming process.
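The difference in turnaround can be made concrete with a rough cost model. The sketch assumes that the known method needs one profiling run plus one full execution per trial modification, and that the encapsulating method needs three full executions in total (the profiling run, a re-run to capture data dependencies, and a single final verification) plus one micro-routine execution per trial. These counts and the timings used are illustrative assumptions, not measurements.

```python
def known_method_cost(app_time, trials):
    """One profiling run, then a full re-execution after every trial modification."""
    return app_time + trials * app_time

def encapsulated_cost(app_time, routine_time, trials):
    """Profiling run + capture re-run + final verification, with trials run on the micro-routine."""
    return 3 * app_time + trials * routine_time

# Hypothetical timings in minutes: a 10-hour application, a 1-minute micro-routine, 20 trials.
known = known_method_cost(app_time=600, trials=20)
encapsulated = encapsulated_cost(app_time=600, routine_time=1, trials=20)
```

Under these assumed figures the known method costs 12600 minutes against 1820 minutes for the encapsulating method, and the gap widens as the number of trial modifications grows.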

The process of hotspot isolation will now be explained in more detail with reference to Figures 5 and 6. As already mentioned, the methodology proposed for hotspot isolation is categorised into three mechanisms that can be applied at: source-code level (S108_I in Figure 4), at compile and link level (S108_II), and/or at executable level (S108_III). Figure 5 shows stages in hotspot isolation in S108_I and II, and Figure 6 shows the case of S108_III.

S108_I: Source-code level

Figure 5 shows how the novel process fits into the conventional stages of preprocessing 110, compilation 120 and linking 130 mentioned with reference to Figure 2.

In this approach, hooks insertion tools 102 are employed to insert hook functions 104 to the identified hotspots within the source-code, for the purpose of capturing each hotspot’s statements and data dependency.
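A source-level hook of this kind might be sketched as a decorator that records the data dependency on entry to the hotspot. The names `capture_hook`, `replay` and `dot`, and the choice of `pickle` files as the storage format, are assumptions made for the example. Extraction of the hotspot's statements into a separate source file is not shown.

```python
import functools
import pickle
import tempfile
from pathlib import Path

def capture_hook(store_dir):
    """Return a decorator that stores each call's arguments before running the hotspot."""
    store = Path(store_dir)

    def decorator(func):
        counter = {"n": 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = store / f"{func.__name__}_{counter['n']:04d}.pkl"
            with path.open("wb") as fh:
                pickle.dump((args, kwargs), fh)   # data instance seen at hotspot entry
            counter["n"] += 1
            return func(*args, **kwargs)

        return wrapper
    return decorator

def replay(func, path):
    """Re-run a hotspot on one stored data instance, away from the application."""
    with Path(path).open("rb") as fh:
        args, kwargs = pickle.load(fh)
    return func(*args, **kwargs)

store_dir = tempfile.mkdtemp()

@capture_hook(store_dir)
def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

# Two iterations of the hotspot with different data instances.
dot([1, 2], [3, 4])
dot([1, 1, 1], [2, 2, 2])
saved = sorted(Path(store_dir).glob("dot_*.pkl"))
```

Each stored file can then be replayed against the unwrapped function, for example `replay(dot.__wrapped__, saved[0])`, without running whatever code originally produced the inputs.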

One such hook function 104 is shown in the example of a hotspot labelled HS in the Figure. As a result of inserting the hook functions to the selected hotspots, code statements and corresponding processing data required for an independent execution of the selected hotspots are extracted as a micro-routine MR. In other words, this process isolates hotspots and extracts them as separate source files. It also captures and stores all data dependencies in the original application to allow replaying the now fully extracted hotspots in isolation from the full application.

S108_II: Compile and link level

The right-hand portion of Figure 5, headed “II- Compile & Link Level”, shows the corresponding hook insertion mechanism to the selected hotspots for the purpose of capturing codes and data dependency into standalone micro-routines, in the case of the compile and link level.

At this level, the compiler 120 (or pre-processor 110) inserts the required functionality into the executable to isolate hotspots which have already been identified in a prior stage. In the Figure, this is indicated by “Hooks insertion” 108 as part of preprocessing. No direct changes are required to the source code.

The information to be supplied to the hooks insertion function 108, i.e. the locations where the hooks are to be inserted can be specified using configuration files 105, environment variables 107 or similar settings.
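The idea of specifying hook locations through settings rather than source edits can be illustrated with a loose analogy in Python, where functions named in an environment variable are wrapped at load time. The variable name `HOTSPOT_HOOKS`, the helper `insert_hooks` and the stand-in application namespace are hypothetical. A real compile and link level mechanism would operate inside the compiler or pre-processor rather than at run time.

```python
import functools
import os
import types

def make_wrapper(func, log):
    """Wrap func so that its inputs are recorded before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return wrapper

def insert_hooks(module, env_var="HOTSPOT_HOOKS"):
    """Hook the functions listed (comma-separated) in env_var; return the capture log."""
    log = []
    names = [n.strip() for n in os.environ.get(env_var, "").split(",") if n.strip()]
    for name in names:
        setattr(module, name, make_wrapper(getattr(module, name), log))
    return log

# Stand-in for an application whose source is left untouched.
def solve(x):
    return x * x

def report(x):
    return str(x)

app = types.SimpleNamespace(solve=solve, report=report)

os.environ["HOTSPOT_HOOKS"] = "solve"   # hook location supplied as a setting
captured = insert_hooks(app)
app.solve(3)
app.report(9)
```

Only `solve` is hooked, because it is the only name listed in the setting; `report` runs unmodified and leaves no record.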

In this way the identified hotspots can be isolated during the compilation process, i.e. without requiring direct insertion of the hook functions into the original source code of the application. As shown in Figure 5, a reference to the process of encapsulating the selected hotspots is invoked via: a) modification of configuration files 105, b) modification of environment variables 107, c) calls to compiler flags 106 that relate to these hotspots. These flags are placed at command line by the user in relation to the identified hotspot. The compiler isolates source code of hotspots as well as all data dependencies away from the full application. As is apparent from the Figure, the encapsulation is a process separate from, and subsequent to, the identification of hotspots, making use of compiler flags already set.

S108_III: Executable level

Figure 6 shows the corresponding process of hotspot isolation in the case of an executable file, i.e. object code 300. In this approach, an extraction of hotspots is implemented from executables via checkpointing and restarting technology applied to the application (and thus OS-independent), where multiple snapshots of the executable are generated, each representing an isolated micro-routine. In other words, successive checkpoints can be taken to “slice up” the application, for example at fixed time intervals, and where two adjacent checkpoints indicate little or no progress in terms of the program counter, they can be assumed to bound a hotspot. This is schematically illustrated in Figure 6 by a step S200 of checkpointing hotspots, step S202 of isolating hotspots from the object code 300, which can then be saved (checkpointed) individually in disk storage 30 of the computer system 10, and restarting the hotspots at step S204 allowing each to be considered in isolation from the executable.
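The rule that two adjacent checkpoints with little program-counter progress bound a hotspot can be sketched as below. The checkpoint records are modelled as hypothetical `(time, program_counter)` pairs and the `min_progress` threshold is an assumed tuning parameter. Taking and restoring real checkpoints is outside the scope of the sketch.

```python
def bound_hotspots(checkpoints, min_progress=16):
    """Return (t_start, t_end) intervals between adjacent checkpoints with little PC progress."""
    intervals = []
    for (t0, pc0), (t1, pc1) in zip(checkpoints, checkpoints[1:]):
        if abs(pc1 - pc0) <= min_progress:
            intervals.append((t0, t1))
    return intervals

# Hypothetical checkpoints taken at fixed intervals of 10 time units.
checkpoints = [(0, 0x1000), (10, 0x1800), (20, 0x1808), (30, 0x1804), (40, 0x2400)]
intervals = bound_hotspots(checkpoints)
```

Here the program counter stays near `0x1800` between times 10 and 30, so the two intervals covering that span are flagged; the checkpoint at the start of each flagged interval is a candidate restart point for the isolated hotspot.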

This mechanism is particularly suitable for use in cases where no access to the source code is available. It isolates hotspots from the executable during runtime, for each marked instance of the executable, via checkpointing into disk storage. The hotspots are isolated from each other upon the restarting of each checkpointed hotspot, together with each hotspot’s instances of data dependency.

The process of capturing data dependencies (data extraction) will now be considered in more detail, referring to Figure 7.

In general, a micro-routine or hotspot can be expected to have data dependencies, in that it will need access to data in some other part of the application, such as a result calculated by another routine. In all cases of hotspot isolations, an encapsulation mechanism should also capture the data dependencies of the isolated micro-routines.

However, there will in general be a plurality of instances of data representing both the data values and the data structure to be processed by the micro-routine for a given execution of the application, or iteration of the hotspot within the same execution. The design of code or executable isolation and their data encapsulation mechanism in the present invention is general enough so as to capture one of three data dependency scenarios: a) only one selected instance of data; b) all instances of data; c) a comprehensive range of instances of data.

In the latter case, a scan (of the results of execution) over all data instances is performed; however, storage is only invoked whenever a significant difference between a new set of data and those that are already captured is observed. As a result of this latter selective operation, a profile of the collected data is also generated as schematically shown in Figure 7.
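A minimal sketch of such selective storage follows. It assumes that "significant difference" is judged on a single measured value per data instance, using a relative tolerance `rel_tol`; both the criterion and the use of input length as a deterministic proxy for execution time are assumptions made for the example.

```python
def capture_significant(instances, measure, rel_tol=0.2):
    """Scan all data instances; store one only if it differs significantly from those already stored."""
    profile = []   # list of (instance, measured value) pairs that were stored
    for inst in instances:
        value = measure(inst)
        differs = all(
            abs(value - kept) > rel_tol * max(abs(kept), abs(value))
            for _, kept in profile
        )
        if differs:
            profile.append((inst, value))
    return profile

# Hypothetical data instances; len() stands in for a measured run time.
instances = [list(range(n)) for n in (100, 105, 400, 390, 10)]
profile = capture_significant(instances, measure=len)
stored_values = [value for _, value in profile]
```

Every instance is scanned, but the instances of size 105 and 390 are not stored because they are within the tolerance of instances already captured.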

In Fig. 7, each vertical bar represents a marker to one data instance of the collected data. The purpose of the generated data profile is to present to the user a comprehensive spectrum of data instances. For example, it may be more instructive to execute the hotspot with data instances showing extremes of behaviour (maximum and minimum run times) rather than with data instances all giving similar execution times. The user can therefore work on (optimise, tune, etc.) the now fully isolated, extracted and encapsulated micro-benchmark routines, together with the data of choice, away from the entire application.
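One possible reading of such a profile is a list of run times, one per captured data instance, from which the extremes can be picked. The sketch below assumes that reading; `build_profile` and `extremes` are hypothetical names, and the built-in `sorted` merely stands in for an isolated hotspot.

```python
import time


def build_profile(hotspot, instances, timer=time.perf_counter):
    """Run the isolated hotspot once per data instance and record its run time.

    Returns a list of (instance_index, elapsed_seconds) entries.
    """
    profile = []
    for index, instance in enumerate(instances):
        start = timer()
        hotspot(instance)
        profile.append((index, timer() - start))
    return profile


def extremes(profile):
    """Return the indices of the fastest and the slowest data instance."""
    fastest = min(profile, key=lambda entry: entry[1])[0]
    slowest = max(profile, key=lambda entry: entry[1])[0]
    return fastest, slowest


# Made-up data instances of very different sizes.
instances = [list(range(n, 0, -1)) for n in (10, 1_000, 100_000)]
profile = build_profile(sorted, instances)
fastest, slowest = extremes(profile)
```

Measured times vary from run to run, so a practical profile would probably repeat each measurement; that refinement is omitted here.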

To summarise the foregoing, embodiments of the present invention can provide a method of encapsulation of code hotspots (HS) and corresponding data from a large-scale application, isolating them into standalone micro-routines. The hotspots are identified (S102, S104) and selected (S106) for encapsulation. Encapsulation (S108) involves extracting hotspots and their data dependencies into standalone micro-routines that can be modified, built (compiled) and run independently from the original application. Optimising strategies of choice can be applied (S110) to the micro-routines in isolation and the results tested (S112, S114) without having to re-execute the entire application. The method can be applied at any of source code level (S108_I), compile and link level (S108_II) or executable level (S108_III). In addition, a profile of the captured data instances can be generated so that the user can make a choice on the extracted data instance(s) for the isolated hotspots when used externally.

Features of embodiments include the following:

Code and/or executable hotspots can be captured in an effective and efficient manner since the isolation mechanism can be applied to all or a selected number of compute functions.

Code and/or executable hotspots and their data dependencies are isolated away from the full application.

Isolating the hotspots allows them to be tested independently of the main application. Thus there is no need to re-start application execution when applying external technologies to code hotspots.

There is no need to re-start application execution when examining different datasets for the same hotspot.

An increase in the efficiency of hotspot treatment is provided, since several external technologies can be applied to the same hotspot, or the same technology can be applied to several hotspots, in isolation from each other. In other words, it becomes a more efficient process to consider various technologies (optimisations) while using different data sets for the same hotspot.

There is no loss of problem complexity, in contrast to the prior art, where programmers often opt to apply external technologies for hotspot treatment (for example, tuning and optimisation) to simplified case studies, since executing the entire application (particularly when simulating real-life problems) can be computationally very intensive.
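To illustrate how several candidate optimisations might be checked against several captured data sets without re-running the application, a minimal sketch follows. All names (`evaluate_variants` and the three sum-of-squares routines) and the data sets are invented for this example; the check shown is simple equality with a reference routine and is not taken from the described embodiment.

```python
def evaluate_variants(reference, variants, data_sets):
    """Check each candidate variant against the reference on every data set.

    Returns a dict mapping variant name to True if its results match the
    reference routine for all captured data sets.
    """
    report = {}
    for name, variant in variants.items():
        report[name] = all(variant(d) == reference(d) for d in data_sets)
    return report


def sum_of_squares_reference(xs):
    # Stand-in for the hotspot as extracted from the application.
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_of_squares_generator(xs):
    # Candidate rewrite of the same routine.
    return sum(x * x for x in xs)


def sum_of_squares_buggy(xs):
    # Deliberately wrong candidate: skips the first element.
    return sum(x * x for x in xs[1:])


captured_data_sets = [[1, 2, 3], [0, 5], [-2]]
report = evaluate_variants(
    sum_of_squares_reference,
    {"generator": sum_of_squares_generator, "buggy": sum_of_squares_buggy},
    captured_data_sets,
)
```

Each variant is exercised on every captured data set in a single pass, so a faulty candidate is detected without restarting the original application.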

Various modifications are possible within the scope of the invention.

Although C code has been included in some of the Figures, this is for illustrative purposes only. The present invention is not restricted to the C language. The source code and compilation case of the described method, as well as the disclosed method of isolating hotspots, can be implemented in any language. The executable-level approach is also of general applicability.

When applied at the compile and link level, the present invention can employ any intermediate representation generated during compilation. Known compilation processes create so-called intermediate representations, such as an abstract syntax tree, as intermediate coding forms prior to a final compilation stage.

When applied at the executable level, the code being processed can be any of assembly code, register transfer language (RTL), virtual machine-executable code such as bytecode, and/or object code.

The alternative methods of hotspot encapsulation described above can be combined; that is, the isolation of hotspots can be performed at different levels of code if desired.

Industrial Applicability

Isolation of hotspots, as well as all instances of data dependency for these hotspots, facilitates testing of a large-scale application because each hotspot can be tested separately from the rest of the application. In addition, a profile of captured data instances can be generated so that the user can make a choice from among the extracted data instance(s) for the isolated hotspots when used externally. Performance analysis, tuning and optimisation of large-scale applications can therefore be performed more efficiently.

Claims

1. A method of analysing code of a large-scale application comprising: identifying hotspots in the code of the application; extracting, from the code of the application, one or more of the identified hotspots along with data dependencies of the hotspots; encapsulating each extracted hotspot with its data dependencies into a standalone routine; and executing each standalone routine separately from the large-scale application; wherein the extracting includes: capturing data dependencies of each hotspot for a plurality of different data sets applied to the same hotspot.

2. The method according to claim 1 wherein the method further comprises applying, to the hotspot, at least one technology for tuning or optimisation of the hotspot, separately from any technology applied to another said hotspot.

3. The method according to claim 1 or 2 wherein the extracting is performed on source code of the application by inserting hook functions into the source code at points where hotspots are identified, the hook functions capturing code statements and corresponding processing data of the hotspots.

4. The method according to claim 1, 2 or 3 wherein the extracting is performed at a compile and link level code of the application.

5. The method according to claim 4 wherein the extracting is performed by inserting hooks into the compile and link level code at locations specified using configuration files, compiler flags and/or environment variables, the hooks capturing code statements and corresponding processing data of the hotspots.

6. The method according to any preceding claim wherein the extracting is performed on an executable file of the application.

7. The method according to claim 6 wherein the extracting employs checkpointing and restarting to generate multiple snapshots of the executable each representing an isolated hotspot.

8. The method according to any preceding claim wherein the capturing is applied to all available data sets, but only data sets giving significant differences when executing the standalone routine are employed in the encapsulating.

9. The method according to any preceding claim wherein the capturing generates a profile of data instances of the hotspot, each data instance corresponding to a different data set.

10. The method according to claim 9 further comprising a user selecting, on the basis of the profile, data instances for use in the executing.

11. The method according to any preceding claim wherein the extracting further comprises selecting from among the hotspots identified, the one or more hotspots for extraction.

12. A computer system for analysing code of a large-scale application comprising: means for extracting, from code of the application, hotspots along with data dependencies of the hotspots; means for encapsulating each extracted hotspot with its data dependencies into a standalone routine; and means for analysing the standalone routines separately from the large-scale application; wherein the means for extracting includes: means for capturing data dependencies of each hotspot for a plurality of different data sets applied to the same hotspot.

13. Program code which, when executed by a computer system, performs the method of any of claims 1 to 11.

14. Non-transitory computer-readable recording media on which is stored the program code according to claim 13.