RU2411569C2

RU2411569C2 - Method for automatic paralleling of programs

Info

Publication number: RU2411569C2
Application number: RU2009102804/08A
Authority: RU
Inventors: Александр Юльевич Дроздов (RU); Александр Юльевич Дроздов; Сергей Викторович Новиков (RU); Сергей Викторович Новиков
Original assignee: Закрытое акционерное общество "ОПТИМИЗИРУЮЩИЕ ТЕХНОЛОГИИ"; Александр Юльевич Дроздов; Сергей Викторович Новиков
Priority date: 2009-01-29
Filing date: 2009-01-29
Publication date: 2011-02-10
Also published as: RU2009102804A

Abstract

FIELD: information technologies.

SUBSTANCE: in method for automatic paralleling of cycle areas in algorithmic part of program, previously graph of control flow, dominator trees, cycle trees, graph of data flow are produced; intermediate representations of procedures are substituted into areas of calls; procedure-by-procedure analysis of data flow is carried out; to detect equivalent operations, analysis of data flow is carried out, preferably by method of values numbering. Analysis of cycle variables is carried out for invariance and inductance; analysis of operations of access to massifs is made, indices of access to massifs are built in the form of canonical forms of product sums. Cycles are merged; invariant conditions are taken out; procedure of iteration cycle space bypass is changed; analysis of parallel cycles is carried out.

EFFECT: reduced time of information processing by increasing number of parallel realised operations.

5 dwg

Description

Изобретение относится к области компьютерного программирования и, в частности, к области преобразования текста программы в машинные коды (компиляции).The invention relates to the field of computer programming and, in particular, to the field of converting program text into machine codes (compilation).

Заявляемый способ относится к области оптимизирующей компиляции. Компилятор преобразует программу, написанную на исходном языке (С, C++, Fortran и т.п.), в код целевой архитектуры (Intel x86, Sparc и т.д.). Оптимизирующий компилятор выполняет это преобразование так, чтобы достичь наилучшего показателя для получаемого кода по некоторому критерию. Чаще всего этим критерием является максимальная скорость исполнения программы на целевой архитектуре (платформе).The inventive method relates to the field of optimizing compilation. The compiler converts the program written in the source language (C, C ++, Fortran, etc.) into the target architecture code (Intel x86, Sparc, etc.). The optimizing compiler performs this conversion so as to achieve the best performance for the resulting code according to some criteria. Most often, this criterion is the maximum speed of program execution on the target architecture (platform).

ТерминыTerms

Промежуточное представление программы. Семантика программы может быть представлена в операционном виде. Элементом данного представления является операция. Операцию можно понимать как произвольное действие над окружающим контекстом. Все операции упорядочены в соответствии с семантическими зависимостями. Например, если в программе написано c=a+b, то в промежуточном представлении появляется объект 'операция', в котором указано, что выполняется сложение двух аргументов а и b и результат записывается в с. Interim presentation of the program . The semantics of the program can be presented in the operational form. The element of this representation is the operation. An operation can be understood as an arbitrary action on the surrounding context. All operations are ordered according to semantic dependencies. For example, if c = a + b is written in the program, then the 'operation' object appears in the intermediate representation, which indicates that the two arguments a and b are added and the result is written in c.

Линейный участок. Последовательность операций, без внутренних разветвлений и схождений потока управления. Программа вида: Linear section . The sequence of operations, without internal branching and convergence of the control flow. Program view:

Граф потока управления. Граф потока управления процедуры является аналитической структурой данных, отражением результатов анализа топологии и семантики программы. Каждый узел управляющего графа соответствует некоторому линейному участку. Граф потока управления является ориентированным. Каждая дуга такого графа соответствует возможности передачи управления в программе между линейными участками. В графе выделяют узел старт и узел стоп. От этих узлов достижимы все узлы управляющего графа. Control flow graph . The graph of the control flow of the procedure is an analytical data structure, a reflection of the results of the analysis of the topology and semantics of the program. Each node of the control graph corresponds to a certain linear section. The control flow graph is oriented. Each arc of such a graph corresponds to the possibility of transferring control in the program between linear sections. In the graph, the start node and the stop node are selected. From these nodes, all nodes of the control graph are reachable.

Доминирование. Для двух узлов вводится отношение доминирования. Первый узел доминирует второй, если все пути от стартового узла до второго узла проходят через первый узел. На основе этого отношения строится дерево доминаторов, узлами которого являются узлы управляющего графа, а дуги соответствуют отношению доминирования. Dominance . For two nodes, a dominance relation is introduced. The first node dominates the second if all the paths from the start node to the second node pass through the first node. Based on this relationship, a tree of dominators is constructed, the nodes of which are nodes of the control graph, and the arcs correspond to the dominance relation.

Дерево циклов. На основе анализа потока управления строится факторизация программы, которая называется деревом циклов. Эта факторизация нужна для выделения интересных с точки зрения оптимизации регионов программы, называемых циклами. Узлами дерева являются циклы. Дуги отображают вложенность циклов. Tree of cycles . Based on the analysis of the control flow, the factorization of the program, called the tree of cycles, is built. This factorization is needed to highlight areas of the program that are interesting in terms of optimization, called loops. Tree nodes are loops. Arcs represent nesting cycles.

Граф потока данных. Граф потока данных является аналитической структурой данных, в которой узлами являются операции, а дуги отображают передачу значений между операциями. Для фрагмента программы (1) c=a+b; (2) f=d-c два узла, соответствующих операциям 1 и 2, будут связаны дугой в графе потока данных, которая соответствует передаче данных через переменную с. Data flow graph . A data flow graph is an analytical data structure in which nodes are operations, and arcs represent the transfer of values between operations. For the program fragment (1) c = a + b; (2) f = dc, the two nodes corresponding to operations 1 and 2 will be connected by an arc in the graph of the data stream, which corresponds to the transfer of data through the variable c.

Конфликт. Для двух операций доступа в память есть понятие конфликта. Конфликт означает доступ в одну и ту же область памяти. The conflict . For two memory access operations, there is a concept of conflict. Conflict means access to the same memory area.

Зависимость. Для двух операций доступа в память вводится отношение зависимости. Зависимость означает наличие конфикта и достижимость по графу потока управления. Addiction For two access operations, a dependency relation is entered into the memory. Dependence means confict and reachability according to the control flow graph.

Параллельный цикл. Параллельным является цикл, у которого нет межитерационных зависимостей. Parallel loop . Parallel is a cycle that has no inter-iteration dependencies.

Инвариантная переменная. Инвариантной переменной цикла является переменная, которая не изменяется в цикле. Invariant variable . An invariant loop variable is a variable that does not change in the loop.

Индуктивная переменная. Индуктивной переменной является переменная цикла, которая инкрементируется или декрементируется на константное значение на каждой итерации цикла. Inductive variable . An inductive variable is a loop variable that is incremented or decremented by a constant value at each iteration of the loop.

Рекурентность. Рекурентностью называется последовательность операций, связанных межитерационно в графе потока данных. Recurrence . Recurrence is a sequence of operations inter-iteratively connected in a data flow graph.

Редукция. Редукцией называется переменная, которая коммутативно изменяется на каждой итерации цикла. Reduction . Reduction is a variable that commutatively changes at each iteration of the loop.

УРОВЕНЬ ТЕХНИКИBACKGROUND

Главную роль в эффективном использовании архитектур с явной параллельностью (многоядерных, кластерных и т.д.) играют системы программирования, предназначенные для создания параллельных программ. Такие системы включают в себя как языковые средства задания параллельности в программе (системы явного параллельного программирования - стандарт ОреnМР [31], платформа RapidMind [35], …), так и автоматические средства поиска параллельных вычислений (компиляторы фирм Intel [32], Portland Group [33], …) и последующего представления их с использованием существующих библиотек поддержки параллельности (MPI ([30]), libgomp ([34]), …).The main role in the effective use of architectures with explicit parallelism (multicore, cluster, etc.) is played by programming systems designed to create parallel programs. Such systems include both language means for setting parallelism in the program (explicit parallel programming systems - OrenMR standard [31], RapidMind platform [35], ...), and automatic parallel computing search tools (Intel compilers [32], Portland Group [33], ...) and their subsequent presentation using existing parallelism support libraries (MPI ([30]), libgomp ([34]), ...).

Явное параллельное программирование и автоматическое программирование, взятые независимо друг от друга, обладают серьезными недостатками. Компромиссное решение, описанное в работе [29], состоит в создании системы неявного параллельного программирования. С одной стороны программист на традиционных языках программирования (С, C++, Fortran) создает программу с параллельными вычислениями. Автоматическими средствами эти параллельные вычисления должны быть распознаны. В случае проблем с анализом, программист может добавить подсказки в программу, разрешив какой либо конфликт, мешающий определению параллельности. Затем, найденные параллельные вычисления автоматически представляются в параллельных терминах конкретной архитектуры. По сути, это некоторая интерактивная среда разработки параллельных программ, в которой сбалансированы усилия человека и автоматических средств. В такой системе основную работу по анализу, оптимизациям и инструментированию параллельного кода выполняет автомат, а человек лишь точечно прагмами разрешает конфликты, с которыми статический анализатор не справляется. Невозможность автоматически разрешить конфликт возникает по двум причинам: несовершенство статического анализа и динамическая природа конфликтов. Несмотря на то что первый фактор со временем будет нивелироваться, разрешение динамических конфликтов потребует подсказок.Explicit parallel programming and automatic programming, taken independently of each other, have serious drawbacks. The compromise solution described in [29] is to create an implicit parallel programming system. On the one hand, a programmer in traditional programming languages (C, C ++, Fortran) creates a program with parallel computations. By automatic means, these parallel computations must be recognized. In case of problems with the analysis, the programmer can add tips to the program, resolving any conflict that interferes with the determination of parallelism. Then, the found parallel computations are automatically presented in parallel terms of a particular architecture. In fact, this is some interactive environment for developing parallel programs, in which the efforts of man and automated tools are balanced. In such a system, the main work on the analysis, optimizations and instrumentation of parallel code is performed by the automaton, and a person only resolves conflicts with point-by-point pragmas that the static analyzer cannot handle. The inability to automatically resolve the conflict arises for two reasons: the imperfection of static analysis and the dynamic nature of conflicts. Despite the fact that the first factor will be leveled over time, the resolution of dynamic conflicts will require hints.

В работе Дроздова А.Ю. [1] был предложен компонентный подход к построению оптимизирующих компиляторов [2] и была разработана технология портировния компонент в контекст любой существующей технологии оптимизирующей компиляции. Этот способ выбран в качестве прототипа.In the work of Drozdov A.Yu. [1] a component approach to the construction of optimizing compilers was proposed [2] and a technology for porting components to the context of any existing optimizing compilation technology was developed. This method is selected as a prototype.

Технический результат, достигаемый при применении заявляемого решения, состоит в уменьшении времени обработки информации за счет увеличения числа параллельно выполняемых операций для большего числа случаев, встречающихся в практике.The technical result achieved by the application of the proposed solution is to reduce the processing time of information by increasing the number of parallel operations for more cases encountered in practice.

Известные способы не позволяют получить заявленный технический результат.Known methods do not allow to obtain the claimed technical result.

В данной работе описан автоматический распараллеливатель программ, созданный на базе компонентного подхода, описанного в работе [1], и встроенный в технологическую цепочку GCC [34].This work describes an automatic program parallelizer created on the basis of the component approach described in [1] and built into the GCC process chain [34].

ОПИСАНИЕ ГРАФИЧЕСКОГО МАТЕРИАЛАDESCRIPTION OF GRAPHIC MATERIAL

На фиг.1 показан алгоритм представления операции.Figure 1 shows the algorithm for representing the operation.

На фиг.2 показан граф потока данных в форме единственного присваивания.Figure 2 shows a graph of a data stream in the form of a single assignment.

На фиг.3 приведена структура способа реализации канонической формы.Figure 3 shows the structure of the method of implementation of the canonical form.

На фиг.4 приведен пример построения канонической формы для операций сложения и умножения.Figure 4 shows an example of the construction of the canonical form for the operations of addition and multiplication.

На фиг.5 показан алгоритм динамической проверки эффективности распараллеливания.Figure 5 shows the algorithm for dynamically checking the efficiency of parallelization.

ПОДРОБНОЕ ОПИСАНИЕDETAILED DESCRIPTION

Статический анализ программStatic analysis of programs

Ключевым элементом автоматического распараллеливателя (АР) является статический анализатор алгоритмической сущности (семантики) программ, реализованный на базе аналитических компонент универсальной технологии оптимизирующей компиляции [1]. Аналитическая часть содержит новые (см. далее) алгоритмы анализа потока управления, анализа потока данных, межпроцедурного анализа, анализа цикловых зависимостей и пр. Алгоритмы были опробованы в промышленных компиляторах и доказали свою высокую эффективность. В анализатор включены следующие аналитические компоненты:A key element of automatic parallelizer (AR) is a static analyzer of the algorithmic essence (semantics) of programs implemented on the basis of the analytical components of the universal technology of optimizing compilation [1]. The analytical part contains new (see below) algorithms for control flow analysis, data flow analysis, interprocedural analysis, cycle dependency analysis, etc. The algorithms have been tested in industrial compilers and have proven their high efficiency. The analyzer includes the following analytical components:

- анализ потока данных методом нумераций значений;- analysis of the data stream by the method of numbering values;

- анализ потока управления (построение графа потока управления, построение деревьев доминаторов/постдоминаторов, построение дерева циклов, поиск итерационного фронта доминирования, определение эквивалентности по управлению);- analysis of the control flow (construction of a graph of the control flow, construction of trees of dominators / postdominators, construction of a tree of cycles, search for the iterative front of dominance, determination of equivalence for control);

- анализ переменных в цикле (распознавание инвариантов, индуктивностей, рекурентностей, редукций);- analysis of variables in the cycle (recognition of invariants, inductances, recurrences, reductions);

- анализ цикловых зависимостей;- analysis of cyclic dependencies;

- анализ зависимостей на ациклических участках;- analysis of dependencies on acyclic sites;

- межпроцедурный анализ потока данных.- interprocedural data flow analysis.

Основная задача, которую решают методы статического анализа программ в оптимизирующих компиляторах, - это определение отношения зависимости по данным и управлению между различными группами вычислений программы [1-8]. Эффективное (точное) вычисление этих отношений имеет решающую роль при проведении оптимизирующих преобразований программы.The main task that the methods of static analysis of programs in optimizing compilers solve is the determination of the relationship of data dependencies and control between different groups of program calculations [1-8]. Efficient (accurate) calculation of these relations is crucial in carrying out optimizing transformations of the program.

Поэтому в современных оптимизирующих компиляторах важнейшую роль играют аналитические фазы компиляции. На фазах анализа определяют избыточные вычисления, зависимости между операциями доступа в память, зависимости по управлению, достижимость по управлению и т.п. В рамках этих фаз анализируют поток управления в программе, поток данных, решают системы линейных диофантовых уравнений и неравенств с целью определения зависимостей в цикловых регионах программы.Therefore, in modern optimizing compilers, the analytical phase of compilation plays a crucial role. In the analysis phases, redundant calculations, dependencies between memory access operations, control dependencies, control reachability, etc., are determined. Within the framework of these phases, the control flow in the program, the data flow are analyzed, systems of linear diophantine equations and inequalities are solved in order to determine the dependencies in the cyclic regions of the program.

После анализа потоков данных и управления компилятор может использовать результаты для получения ответов на вопросы о зависимостях между любыми парами вычислений программы [2, 5, 25-28]. Под вычислениями здесь понимают такие элементы факторизации любого уровня, как операции, линейные участки, циклы, процедуры. Любой запрос на отношение зависимости имеет два измерения сложности, которые определяют структуру построения ответа. Одно из них рассматривает сложность вычислений, для которых определяют отношение зависимости, с точки зрения факторизации по управлению, а второе определяет уровень сложности объектов программы, которые участвуют в анализируемых вычислениях.After analysis of data flows and control, the compiler can use the results to get answers to questions about dependencies between any pairs of program calculations [2, 5, 25-28]. By calculations here we mean such factors of factorization of any level, such as operations, linear sections, cycles, procedures. Any request for a dependency relationship has two dimensions of complexity that determine the structure of the response. One of them considers the complexity of the calculations, for which the relationship of dependence is determined from the point of view of factorization for management, and the second determines the level of complexity of the program objects that participate in the analyzed calculations.

Если объекты элементарные, то для ответа достаточно результатов внутрипроцедурного потокового анализа управления и данных. По мере увеличения сложности объектов увеличивается и сложность алгоритмов анализа, результаты которых необходимы для получения наименее консервативного ответа на вопрос о зависимости. Для объектов - массивов часто бывает необходимо применить анализ зависимостей в гнездах циклов, а для объектов, на которые брался указатель, будут востребованы результаты межпроцедурного анализа потока данных.If the objects are elementary, then the results of the intra-procedural flow analysis of control and data are enough to answer. As the complexity of the objects increases, so does the complexity of the analysis algorithms, the results of which are necessary to obtain the least conservative answer to the question of dependence. For objects - arrays it is often necessary to apply the analysis of dependencies in the nests of loops, and for the objects that the pointer was taken, the results of the interprocedural analysis of the data stream will be claimed.

Особенно важное значение методы статического анализа программ играют при построении анализатора и автоматического распараллеливателя, для обнаружения независимых участков программы (анализатор) и их дальнейшего оформления в виде параллельных блоков (распараллеливатель).The methods of static analysis of programs are especially important in the construction of the analyzer and automatic parallelizer, for the detection of independent sections of the program (analyzer) and their further design in the form of parallel blocks (parallelizer).

Подстановка промежуточного представления процедур в места их вызововSubstitution of the intermediate representation of procedures in the places of their calls

Классическое проеобразование (inline) состоит в том, что вместо операции вызова подставляют промежуточное представление вызываемой процедуры. Благодаря этому преобразованию упрощают анализ зависимостей, так как все операции подставляемой процедуры можно явно анализировать алгоритмами анализа. При подстановке вызовов важно определить критерий, по которому это преобразование нужно применять. В настоящей работе критерий выбран таким, чтобы максимизировать количество циклов, не содержащих вызовов. Если подстановка вызова не приводит к появлению новых циклов без вызовов, то подстановку не осуществляют.The classic inline is that instead of a call operation, an intermediate representation of the called procedure is substituted. Thanks to this transformation, dependency analysis is simplified, since all the operations of the substituted procedure can be explicitly analyzed by analysis algorithms. When substituting calls, it is important to determine the criteria by which this transformation should be applied. In this paper, the criterion is chosen so as to maximize the number of cycles that do not contain calls. If the substitution of the call does not lead to the appearance of new cycles without calls, then the substitution is not carried out.

В процессе компиляции программу преобразуют из исходного представления (для языков программирования это текст программы) во внутреннее промежуточное представление компилятора. Любое промежуточное представление, используемое в компиляторах, в первую очередь сохраняет семантику исходной программы. Под семантикой программы понимают ее алгоритмическую сущность. С точки зрения фрагментации программы, написанные на наиболее распространенных языках программирования (С, C++, Fortran и т.д.), организованы в модули и процедуры. Модуль соответствует файловой организации программ. В виде процедур оформляют фрагменты программ внутри модулей. Структуры данных программ представляют объектами с фиксированными или динамически задаваемыми размерами и описанием их внутренней структуры с помощью типов. Наиболее общим способом сохранения алгоритмической составляющей программы в компиляторах является операционное представление (фиг.1).During compilation, the program is converted from the original representation (for programming languages, this is the text of the program) into an internal intermediate representation of the compiler. Any intermediate representation used in compilers primarily preserves the semantics of the original program. Semantics of a program is understood to mean its algorithmic nature. In terms of fragmentation, programs written in the most common programming languages (C, C ++, Fortran, etc.) are organized into modules and procedures. The module corresponds to the file organization of the programs. In the form of procedures, fragments of programs are executed inside the modules. The structures of these programs are represented by objects with fixed or dynamically defined sizes and a description of their internal structure using types. The most common way to save the algorithmic component of the program in compilers is the operational representation (figure 1).

Операционное представление является списком операций. У операции может существовать входной контекст и выходной контекст. Входной и выходной контексты задают списками аргументов и результатов. Аргументы могут быть литералами, объектами или ссылками на другие операции. Отображение связи между результатом одной операции и аргументом другой через объекты является более общим, чем представление этой связи в виде ссылки на операцию. Сама операция представляет собой набор атрибутов, определяющих семантическое действие над входным контекстом для получения выходного контекста. К такому операционному представлению может быть сведено практически любое известное промежуточное представление, начиная с синтаксических деревьев фронтендов (gcc, edg), заканчивая представлениями, наиболее приближенными к ассемблерам целевой архитектуры.The operational view is a list of operations. An operation may have an input context and an output context. The input and output contexts are specified by lists of arguments and results. Arguments can be literals, objects, or references to other operations. The mapping of the relationship between the result of one operation and the argument of another through objects is more general than the representation of this relationship as a reference to the operation. The operation itself is a set of attributes that define the semantic action on the input context to obtain the output context. Almost any known intermediate representation can be reduced to such an operational representation, starting with the syntax trees of the frontends (gcc, edg) and ending with the representations closest to the assemblers of the target architecture.

Для выполнения оптимизаций в соответствии с выбранным критерием выполняют последовательность преобразований промежуточного представления.To perform optimizations in accordance with the selected criterion, a sequence of transformations of the intermediate representation is performed.

В приведенном алгоритме представлена последовательность шагов, которая позволяет эффективно распараллеливать циклы в программе. Алгоритм запускают на промежуточном представлении программы. В процессе работы выполняют преобразования промежуточного представления. На промежуточном представлении при применении преобразований строят (если уже построены, то используют) аналитические структуры данных: граф потока управления, дерево доминаторов, дерево циклов, граф потока данных.The above algorithm presents a sequence of steps that allows you to effectively parallelize the cycles in the program. The algorithm is run on an intermediate representation of the program. In the process of performing transformations of the intermediate representation. At the intermediate representation, when applying transformations, analytic data structures are built (if already built, then used): control flow graph, dominator tree, loop tree, data flow graph.

Межпроцедурный анализ указателей - использовано решение, описанное в работе [22, 23, 24]. Interprocedural analysis of pointers - the solution described in [22, 23, 24] was used.

Межпроцедурный анализ указателей необходим для того, чтобы уже на стадии компиляции программы определить возможные значения указателей, содержащиеся в переменных в различных местах программы. Наличие такой информации помогает в определении того, какие группы операций могут быть выполнены параллельно, какие значения уже были вычислены и их можно повторно использовать, оптимизировать обращения к объектам в памяти, сворачивать константные вычисления. Наличие результатов такого анализа, обладающих высокой степенью детализации, позволяет эффективно применять многие оптимизирующие преобразования программы как внутри, так и межпроцедурного уровня.Interprocedural analysis of pointers is necessary in order to determine at the stage of compilation of the program the possible values of pointers contained in variables in various places of the program. The presence of such information helps in determining which groups of operations can be performed in parallel, which values have already been calculated and can be reused, optimizing access to objects in memory, and folding constant calculations. The presence of the results of such an analysis with a high degree of detail allows us to effectively apply many optimizing transformations of the program both internally and interprocedurally.

На более формальном уровне задачей межпроцедурного анализа указателей является вычисление некоторой функции, которая в каждой точке программы для каждой переменной позволяет получить множество значений одного из перечисленных выше типов, которые в этой точке переменная может содержать. Задача межпроцедурного анализа может быть описана построением такого приближения семантики программы, которое отображало бы интересующее свойство поведения этой программы на стадии исполнения.On a more formal level, the task of interprocedural analysis of pointers is to calculate a function that at each point in the program for each variable allows you to get many values of one of the types listed above that a variable can contain at this point. The task of interprocedural analysis can be described by constructing such an approximation of the semantics of a program that would display an interesting property of the behavior of this program at the execution stage.

Предлагаемый здесь подход к проблеме межпроцедурного анализа потока данных описан такими характеристиками, как чувствительность к потоку управления (flow-sensitivity) и чувствительность к контексту вызова процедуры (context-sensitivity). Первое означает, что алгоритм учитывает управление внутри процедуры, что приводит к повышению его точности. Второе - что он стремится различать информацию, приходящую в процедуру по различным путям во время исполнения. Но поскольку число таких путей может быть значительным, необходимо объединять те из них, которые предположительно наиболее близки (и объединение которых внесет минимально возможный консерватизм в его результаты). Основным механизмом, который обычно используют для обеспечения свойства контекстной зависимости, является механизм частично трансферной функции (ЧТФ). Он позволяет весьма эффективно выбирать соотношение между скоростью проведения анализа и его точностью, ибо, в общем случае, межпроцедурный анализ потока данных является достаточно ресурсозатратным процессом как с точки зрения требуемой памяти, так и с точки зрения времени его проведения.The approach proposed here to the problem of interprocedural analysis of data flow is described by such characteristics as sensitivity to the control flow (flow-sensitivity) and sensitivity to the context of the procedure call (context-sensitivity). The first means that the algorithm takes into account the control inside the procedure, which leads to an increase in its accuracy. The second is that he seeks to distinguish between information coming into the procedure along various paths during execution. But since the number of such paths can be significant, it is necessary to combine those that are supposedly closest (and combining which will introduce the least possible conservatism in its results). The main mechanism that is usually used to provide contextual dependence properties is the partially transfer function (CTF) mechanism. It allows you to very effectively choose the ratio between the speed of the analysis and its accuracy, because, in the general case, the interprocedural analysis of the data stream is a rather resource-intensive process both from the point of view of the required memory and from the point of view of its time.

Анализ потока данных способом нумераций значенийData flow analysis by numbering method

Способ описан в диссертации [36]. Анализ потока данных в программе способом нумераций значений состоит в том, чтобы присвоить результатам операций одинаковые классы эквивалентности, если операции пишут в эти результаты эквивалентные значения. Две операции считают эквивалентными, если у них эквивалентны аргументы и операции выполняют одно и то же семантическое действие. При потоковом анализе строят граф потока данных, который связывает результаты и аргументы операций. Граф потока данных позволяет для аргументов получать операции, вырабатывающие значение аргумента. Кроме графа потока данных для анализа применяют способ, использующий форму единственного присваивания. Текст программы, переведенный в эту форму, содержит псевдооперации в точках схождения управления. В результате с учетом этих псевдоопераций, для каждого аргумента, для которого произведен перевод в форму единственного присваивания, существует единственная запись. В графе потока данных для таких аргументов будет всего одна входная дуга от единственной записи. На фиг.2 показан граф потока данных в форме единственного присваивания. В программе, приведенной для примера на фиг.2, есть две записи в переменную А и одно чтение. Запись обозначают как 'А=…', чтение обозначают как '…=А'. Прямоугольниками и связывающими их стрелками отображают поток управления в программе. Из блока BLOCK 1 возможна передача управления на блок BLOCK 2 или на блок BLOCK 3. Из блоков BLOCK 2 и BLOCK 3 управление передают в блок BLOCK 4. Черными кружками и стрелками, соединяющими их, образован граф потока данных для данного фрагмента программы. На графе потока данных каждый узел соответствует операции, и каждая дуга соответствует паре аргумент-результат. Псевдооперация, соответствующая схождению потока управления, обозначена как 'φ(А)'.The method is described in the thesis [36]. An analysis of the data flow in a program by way of numbering values is to assign the same equivalence classes to the results of operations if the operations write equivalent values to these results. Two operations are considered equivalent if they have equivalent arguments and the operations perform the same semantic action. In streaming analysis, a data flow graph is constructed that links the results and arguments of operations. The data flow graph allows for arguments to obtain operations that produce the value of the argument. In addition to the data flow graph for analysis, a method is used that uses the single assignment form. The program text translated into this form contains pseudo-operations at the points of control convergence. As a result, taking into account these pseudo-operations, for each argument for which a translation is made in the form of a single assignment, there is a single record. In the data flow graph for such arguments there will be only one input arc from a single record. Figure 2 shows a graph of a data stream in the form of a single assignment. In the program shown for the example in figure 2, there are two entries in the variable A and one read. A record is designated as 'A = ...', a reading is designated as '... = A'. Rectangles and arrows connecting them represent the control flow in the program. From BLOCK 1, it is possible to transfer control to BLOCK 2 or BLOCK 3. From BLOCK 2 and BLOCK 3, control is transferred to BLOCK 4. Black circles and arrows connecting them form a data flow graph for this program fragment. On a data flow graph, each node corresponds to an operation, and each arc corresponds to an argument-result pair. The pseudo-operation corresponding to the convergence of the control flow is denoted as 'φ (A)'.

Анализ переменных цикла на инвариантность, индуктивность и редукциюAnalysis of cycle variables for invariance, inductance and reduction

Поиск переменных цикла осуществлют по графу потока данных. Анализ состоит в поиске подграфов, удовлетворяющих свойствам инвариантности, индуктивности, редукции [2, 5, 7, 8].The search for loop variables is performed by the data flow graph. The analysis consists in finding subgraphs satisfying the properties of invariance, inductance, reduction [2, 5, 7, 8].

Анализ операций доступа в массивыAnalysis of array access operations

Существенное улучшение качества потокового анализа методом нумераций значений можно получить, если использовать каноническую форму представления выражений в виде сумм произведений. Эта форма является одним из способов представления линейного выражения в виде полиномаA significant improvement in the quality of stream analysis by the method of numbering values can be obtained by using the canonical form of representing expressions in the form of sums of products. This form is one way of representing a linear expression as a polynomial

с₀+с₁*x₁₁*x₁₂*…*x_1k1+c₂*x₂₁*x₂₂*…*x_2k2+…+c_n*x_n1*x_n2*…*x_nkn,s ₀ + s ₁ * x ₁₁ * x ₁₂ * ... * x _1k1 + c ₂ * x ₂₁ * x ₂₂ * ... * x _2k2 + ... + c _n * x _n1 * x _n2 * ... * x _nkn ,

где c₀, c₁, c_n - некоторые константы; X_ij - множители.where c ₀ , c ₁ , c _n are some constants; X _ij are factors.

Эффект от использования канонической формы состоит в приведении линейных выражений к единому (каноническому) виду, что позволяет определять эквивалентные вычисления, даже если исходный вид выражений различается. В рамках приведения арифметических выражений к каноническому виду выполняют свертку констант (исполнение арифметических операций над константами), упорядочивание коммутативных операций, приведение подобных слагаемых. Например, выражение (а+b)*с и выражение а*с+b*с будут приведены к единому виду а*с+b*с и определены как эквивалентные.The effect of using the canonical form is to reduce the linear expressions to a single (canonical) form, which allows you to define equivalent calculations, even if the original form of the expression is different. As part of the reduction of arithmetic expressions to the canonical form, convolutions of constants are performed (the execution of arithmetic operations on constants), the ordering of commutative operations, and the reduction of such terms. For example, the expression (a + b) * c and the expression a * c + b * c will be reduced to a single form a * c + b * c and defined as equivalent.

На фиг.3 приведена структура способа реализации указанной канонической формы. В прямоугольниках представлены элементы, соответствующие слагаемым. В овалах представлены элементы, соответствующие множителям.Figure 3 shows the structure of the method for implementing the specified canonical form. The rectangles represent the elements corresponding to the terms. The ovals show the elements corresponding to the factors.

Механизм использования канонической формы во время работы алгоритма нумераций значений состоит в том, чтобы для операций сложения, вычитания и умножения строить сумму произведений по аргументам этой операции. Множителями канонической формы будут являться классы эквивалентности (напр. см. ранее «Анализ потока данных способом нумерации значений»). Хеширование операций, для которых возможно построение канонических форм, выполняют на основе этих форм.The mechanism for using the canonical form during the operation of the numbering algorithm is to build the sum of products for the arguments of this operation for addition, subtraction and multiplication. The canonical form factors will be equivalence classes (for example, see “Analysis of the data stream by way of numbering values” earlier). Hashing operations for which the construction of canonical forms is possible is performed on the basis of these forms.

На фиг.4 приведен пример построения канонической формы для операций сложения и умножения. Символами V1, V2 и V3 обозначены классы эквивалентностей для операций чтения из переменных А, С и В соответственно.Figure 4 shows an example of the construction of the canonical form for the operations of addition and multiplication. The symbols V1, V2, and V3 denote equivalence classes for read operations from variables A, C, and B, respectively.

Способ представления адресов операций доступа в память - каноническая форма сумм произведений, множителями которой выбирают классы эквивалентности, вычисленные в результате потокового анализа методом нумераций значений. Анализ зависимостей в цикловых регионах в основном сводят к анализу конфликтов операций доступа в память. Наиболее известным способом работой с памятью является обращение к массивам по линейным индексам. Ниже приведен фрагмент программы, написанной на языке С. В строке 1 объявляют массив А размером в 10 элементов. К массиву А в строке 5 задают обращение по индексам 'i+1' и 'j-1'. Присваивают операции чтения из переменной i некоторый класс эквивалентности V. Для индекса 'i+1' строят каноническую форму 1+V. Чтению из переменной j устанавливают тот же класс эквивалентности V, так как значения переменных i и j содержат одни и те же значения. Для индекса доступа в массив 'j-1' строят каноническую форму -1+V. Для определения зависимостей решают уравнения 1+V=-1+V', где для переменных V и V' есть ограничения 0<V<10 и 0<V'<10. Переменные V и V' являются номерами итераций цикла. В результате решения системы уравнений и неравенств определяют итерации, на которых происходит обращение к одному и тому же элементу массива А.The way to represent the addresses of memory access operations is the canonical form of the sums of the products, the factors of which are the equivalence classes calculated as a result of stream analysis by the method of numbering the values. The analysis of dependencies in cyclic regions mainly reduces to the analysis of conflicts in memory access operations. The most famous way of working with memory is to access arrays by linear indexes. The following is a snippet of a program written in C. In line 1, an array A of 10 elements in size is declared. Array A in line 5 is specified by the indices 'i + 1' and 'j-1'. Assign read operations from variable i to some equivalence class V. For the index 'i + 1', the canonical form 1 + V is constructed. Reading from the variable j sets the same equivalence class V, since the values of the variables i and j contain the same values. For the index of access to the array 'j-1', the canonical form -1 + V is built. To determine the dependencies solve the equation 1 + V = -1 + V ', where for the variables V and V' there are restrictions 0 <V <10 and 0 <V '<10. Variables V and V 'are loop iteration numbers. As a result of solving the system of equations and inequalities, iterations are determined at which the access to the same element of array A occurs.

Слияние цикловMerge loops

Оптимизация по слиянию циклов (fusion) [5, 6] для увеличения количества вычислений на одной итерации цикла и уменьшения потребных ресурсов на распараллеливание. Ниже приведен пример слияния циклов. Потребные ресурсы на распараллеливание после применения преобразования сокращены вдвое, так как распараллеливают только один цикл вместо двух.Optimization by fusion of cycles (fusion) [5, 6] to increase the number of calculations per iteration of the cycle and reduce the required resources for parallelization. The following is an example of merging loops. The resources required for parallelization after applying the conversion are halved, as they only parallelize one cycle instead of two.

Вынос инвариантных условийRemoval of invariant conditions

Оптимизация по выносу инвариантных условий из цикла (unswitching) [5, 6] позволяет удалять из цикла операции, исполняющиеся при этих условиях. Среди удаленных операций могут оказаться операции, препятствующие распараллеливанию. К ним относятся, в частности, операции вызова. Поэтому это преобразование увеличивает количество параллельных циклов.Optimization of the removal of invariant conditions from the cycle (unswitching) [5, 6] allows you to remove operations that are performed under these conditions from the cycle. Remote operations may include operations that prevent parallelization. These include, in particular, call operations. Therefore, this conversion increases the number of parallel loops.

Изменение порядка обхода итерационного пространстваChange the iterative space traversal order

Эти оптимизации нацелены как на повышение количества параллельных циклов, так и на повышение эффективности распараллеливания. Другое название этих трансформаций - "унимодулярные преобразования". Они подробно освещены в работах [5, 7, 8].These optimizations are aimed both at increasing the number of parallel cycles, and at increasing the efficiency of parallelization. Another name for these transformations is “unimodular transformations”. They are described in detail in [5, 7, 8].

Анализ межитерационных зависимостейAnalysis of inter-iteration dependencies

Классический способ приведен в работах [5, 7, 8]The classical method is given in [5, 7, 8]

Задача по нахождению межитерационных зависимостей состоит в решении системы линейных диофантовых уравнений и неравенств. Уравнения составляют в результате сравнения индексов доступа в массив, а неравенства формируют из представления верхних и нижних границ переменных циклов в математической форме. Количество уравнений определяют по размерности массивов, операции обращения к которым рассматривают при постановке задачи.The task of finding the interiterational dependences consists in solving a system of linear Diophantine equations and inequalities. The equations are the result of comparing the access indices in the array, and the inequalities are formed from the representation of the upper and lower bounds of the variable cycles in mathematical form. The number of equations is determined by the dimension of the arrays, the operations of access to which are considered when setting the problem.

После того как задача поставлена, выполняют следующую последовательность шагов.After the task is set, perform the following sequence of steps.

Решают систему линейных диофантовых уравнений. Если система уравнений не имеет решения, значит зависимость не существует. Если система уравнений имеет решение, исследуют систему линейных диофантовых неравенств. Если система диофантовых неравенств имеет решение, это означает, что зависимость существует. Если не имеет решения, это означает, что зависимость не существует. Если зависимость существует, далее исследуют вектора направлений и вектора расстояний.Solve a system of linear diophantine equations. If the system of equations has no solution, then the dependence does not exist. If the system of equations has a solution, investigate the system of linear Diophantine inequalities. If the system of Diophantine inequalities has a solution, this means that the dependence exists. If it has no solution, this means that the dependency does not exist. If the dependence exists, then we study the direction vectors and the distance vectors.

Классический подход к решению систем линейных диофантовых неравенств состоит в применении метода Фурье.The classical approach to solving systems of linear Diophantine inequalities consists in applying the Fourier method.

Анализ массивов на локальностьLocality analysis of arrays

Если массив на каждой итерации цикла перед использованием перезаписывают новыми значениями и не используют после цикла, то для каждой итерации можно создать локальную копию таких массивов. Это преобразование называют локализацией и оно позволяет удалить межитерационные зависимости исходного массива. Таким образом, чтобы локализовать массивы нужно удостовериться, что по всем путям от начала цикла до операций чтения из массивов есть операции записи в те же элементы массива. Кроме этого, нужно проверить, что массив не используют после цикла.If the array at each iteration of the cycle is overwritten with new values before use and is not used after the cycle, then for each iteration you can create a local copy of such arrays. This transformation is called localization and it allows you to remove the inter-iteration dependencies of the original array. Thus, in order to localize arrays, you need to make sure that there are write operations to the same elements of the array along all paths from the beginning of the cycle to reads from arrays. In addition, you need to check that the array is not used after the loop.

Это действие позволяет устранить межитерационные зависимости в цикле, тем самым увеличивая возможности для распараллеливания.This action eliminates the inter-iteration dependences in the cycle, thereby increasing the possibilities for parallelization.

Дублирование циклаLoop duplication

Для проверки эффективности от применения распараллеливания используют способ дублирования циклов. Одну копию оставляют не распараллелиной, а другую распараллеливают. Проверка перед циклами определяет целесообразность перехода на распараллелиную копию или оставление исходной версии цикла. Выполняют проверку на время исполнения итераций и сравнение этого времени с затратами ресурсов на запуск параллельного цикла.To verify the effectiveness of the use of parallelization using the method of duplication of cycles. One copy is not left parallelized, and the other parallelized. Check before the cycles determines the feasibility of switching to a parallel copy or leaving the original version of the cycle. They check for the execution time of iterations and compare this time with the cost of resources for launching a parallel loop.

На фиг.5 показан пример такого преобразования.Figure 5 shows an example of such a conversion.

Заявляемый способ состоит в следующем.The inventive method consists in the following.

На этапе промежуточного представлении программы строят (если уже построены, то используют) по крайней мере следующие аналитические структуры данных: граф потока управления, дерево доминаторов, дерево циклов, граф потока данных любыми известными способами.At the stage of the intermediate presentation of the program, at least the following analytical data structures are built (if already built, then used): control flow graph, dominator tree, loop tree, data flow graph by any known means.

Выполняют подстановки тел процедур в места их вызовов. В случае распараллеливания это действие направлено на то, чтобы проанализировать циклы, в которых имеются вызовы других процедур. При наличии вызовов известные способы анализа зависимостей в циклах неприменимы.Substitute procedure bodies at their call points. In the case of parallelization, this action is aimed at analyzing cycles in which there are calls of other procedures. In the presence of calls, known methods for analyzing dependencies in cycles are not applicable.

Выполняют межпроцедурный анализ потока данных. Для анализа указателей преимущественно используют способ частичных трансферных функций [22, 23, 24]. Возможно также использование различных альтернативных способов анализа [9-21].Perform interprocedural data flow analysis. For the analysis of pointers, the method of partial transfer functions is mainly used [22, 23, 24]. It is also possible to use various alternative methods of analysis [9-21].

Выполняют внутрипроцедурный анализ потока данных. Преимущественно используют известный способ нумераций значений. Цель анализа - определить эквивалентные операции. Возможно также использование классических итерационных алгоритмов распространения потоковых свойств программы [5, 6].Intra-procedural analysis of the data stream is performed. Advantageously, a known method of numbering values is used. The purpose of the analysis is to determine equivalent operations. It is also possible to use classical iterative algorithms for distributing the streaming properties of a program [5, 6].

Выполняют анализ переменных цикла. Определяют инвариантные переменные, индуктивные переменные и редукции. Например, анализ графа потока данных с определением подграфов, соответствующих индуктивным и/или инвариантным переменным.Perform loop variable analysis. Define invariant variables, inductive variables, and reductions. For example, analyzing a data flow graph with the definition of subgraphs corresponding to inductive and / or invariant variables.

Выполняют анализ операций доступа в массивы. Строят индексы доступа в массивы в виде канонических форм сумм произведений.Perform an analysis of access operations to arrays. They build access indexes into arrays in the form of canonical forms of sums of products.

Выполняют преобразования циклов, увеличивающие эффект от последующего распараллеливания. В их число входят преобразования по слиянию циклов, по выносу инвариантных условий, по изменению порядка обхода итерационного пространства циклов.Perform loop transformations that increase the effect of subsequent parallelization. These include transformations by merging cycles, by removing invariant conditions, by changing the order of traversal of the iterative space of cycles.

Выполняют анализ параллельных циклов. Дерево циклов обходят рекурсивно от корневого узла с проверкой на возможность распараллелить текущий цикл. Выполняют по крайней мере следующие виды анализа.Perform parallel loop analysis. The loop tree is bypassed recursively from the root node with a check for the ability to parallelize the current loop. Perform at least the following types of analysis.

а. Анализ структуры цикла. Распараллеливать можно циклы с одним выходом, управление которым осуществляют индуктивной переменной с инвариантными верхней и нижней границами, а также инвариантным шагом.but. Analysis of the structure of the cycle. You can parallelize cycles with one output, which are controlled by an inductive variable with invariant upper and lower boundaries, as well as an invariant step.

b. Анализ редукций цикла. Распознанные редукции исключают при анализе межитерационных зависимостей.b. Analysis of cycle reductions. Recognized reductions are excluded in the analysis of inter-iteration dependencies.

с. Анализ межитерационных зависимостей в цикле. Межитерационные зависимости выявляют классическим методом с помощью решения диофантовых уравнений.from. Analysis of inter-iteration dependencies in a cycle. The interiterational dependences are revealed by the classical method by solving Diophantine equations.

d. Анализ массивов на локальность. В случае локализации массивов межитерационные зависимости могут быть удалены.d. Analysis of arrays for locality. In case of localization of arrays, inter-iteration dependencies can be removed.

Для параллельных циклов выполняют построение копии цикла, поскольку распараллеленный цикл не всегда работает быстрее, чем исходный цикл. При некоторых условиях потребность в дополнительных ресурсах на запуск цикла в параллель может привести к снижению производительности. Динамически осуществляют проверку на время исполнения итераций и сравнение этого времени с дополнительными ресурсами на запуск параллельного цикла.For parallel loops, a copy of the loop is constructed, since a parallelized loop does not always work faster than the original loop. Under some conditions, the need for additional resources to run the loop in parallel can lead to reduced performance. They dynamically check for the execution time of iterations and compare this time with additional resources for launching a parallel loop.

Для параллельных циклов выполняют построение операций, которые обеспечат параллельное исполнение циклов. При параллельном исполнении производят запуск нескольких потоков, каждый из которых выполняет часть итераций цикла. Тело цикла выделяют в новую процедуру, параметрами которой являются нелокальные данные итераций цикла. Локальными структурами данных созданной процедуры назначают локальные структуры данных итераций исходного цикла.For parallel loops, operations are constructed that provide parallel execution of loops. In parallel execution, several threads are launched, each of which performs part of the loop iterations. The body of the loop is allocated into a new procedure, the parameters of which are nonlocal data of loop iterations. Local data structures of the created procedure are assigned local data structures of iterations of the original cycle.

Описанный способ был реализован на базе КТОТ [1]. Результаты, публикуемые в указанной работе, были получены за счет распараллеливания циклов.The described method was implemented on the basis of CTOT [1]. The results published in this work were obtained by parallelizing the cycles.

Структура описания организована так, что сначала приводят описание некоторого уровня алгоритма на абстрактном языке, а затем приводят комментарии к каждому шагу алгоритма.The description structure is organized in such a way that first they describe a certain level of the algorithm in an abstract language, and then provide comments on each step of the algorithm.

Алгоритм 1. Автоматическое распараллеливание процедуры. Algorithm 1 . Automatic parallelization procedure.

На вход подают промежуточное представление программы, состоящее из набора промежуточных представлений процедур.At the input serves an intermediate representation of the program, consisting of a set of intermediate representations of the procedures.

На выходе получают модифицированное промежуточное представление программы, в котором могут присутствовать распараллеленные циклы.The output is a modified intermediate representation of the program, in which parallelized loops may be present.

Распараллеливание_программыParallelization_programs

1. Выполняют подстановки тел процедур в места их вызовов.1. Substitution of the bodies of the procedures in place of their calls.

2. Выполняют межпроцедурный анализ потока данных.2. Perform interprocedural analysis of the data stream.

3. Цикл по всем процедурам программы.3. The cycle for all program procedures.

4. Распараллеливание_процедуры. 4. Parallelization_procedures.

5. Конец цикла. 5. The end of the cycle.

Распараллеливание_процедурыParallelization_procedures

1. Выполняют внутрипроцедурный анализ потока данных.1. Perform intra-procedural analysis of the data stream.

2. Выполняют анализ переменных цикла.2. Perform loop variable analysis.

3. Выполняют анализ операций доступа в массивы.3. Perform an analysis of access operations to arrays.

4. Выполняют преобразования циклов, увеличивающие эффект от последующего распараллеливания.4. Perform cycle conversions that increase the effect of subsequent parallelization.

5. Распараллеливание_цикла. 5. Parallelization of a cycle.

6. Выполняют удаление вспомогательных структур данных.6. Remove auxiliary data structures.

Распараллеливание_циклаParallelization

1. Выполняют проверку свойств индуктивной переменной цикла и либо переход к шагу 5, либо переход к шагу 2.1. Perform verification of the properties of the inductive variable of the cycle and either go to step 5 or go to step 2.

2. Выполняют анализ редукций цикла.2. Perform a reduction analysis of the cycle.

3. Выполняют проверку наличия межитерационных зависимостей и в случае их отсутствия переход к шагу 4.3. Check for the presence of inter-iteration dependencies and, if they are absent, go to step 4.

4. В случае наличия межитерационных зависимостей выполняют попытку удалить зависимости способом локализации массивов. Если зависимости удалить не удалось, то переход на шаг 6.4. If there are inter-iteration dependencies, an attempt is made to remove the dependencies by the method of localizing arrays. If the dependencies could not be removed, go to step 6.

5. Применение_распараллеливания_цикла. Выход из процедуры.5. The use of parallelization_cycle . Exit the procedure.

6. Цикл по всем вложенным циклам в дереве циклов.6. The cycle for all nested cycles in the tree of cycles.

7. Распараллеливание_цикла. 7. Parallelization of a cycle.

8. Конец цикла.8. The end of the cycle .

Применение_распараллеливания_циклаLoop_ parallelization

1. Построение копии цикла и условия перехода на распараллеленный вариант.1. Creating a copy of the cycle and the conditions for the transition to a parallelized version.

2. Построение для распараллеливаемого цикла переменных, содержащих значение нижней границы, верхней границы и шага.2. Construction of a parallelized cycle of variables containing the value of the lower boundary, upper boundary and step.

3. Формирование дескриптора параллельного цикла. По дескриптору в зависимости от типа среды осуществляют построение операций, которые обеспечат параллельное исполнение циклов. При параллельном исполнении производят запуск нескольких потоков, каждый из которых выполняет часть итераций цикла.3. Formation of a parallel loop descriptor. According to the descriptor, depending on the type of medium, operations are constructed that will ensure parallel execution of loops. In parallel execution, several threads are launched, each of which performs part of the loop iterations.

Верхний уровень алгоритма автоматического автораспараллеливания реализован в процедуре Распараллеливание_программы.The upper level of the automatic auto-parallelization algorithm is implemented in the program_ Parallelization procedure.

На 1-м и 2-м шагах выполняют алгоритмы межпроцедурного анализа и оптимизаций. Сначала выполняют подстановку промежуточных представлений процедур в места их вызовов, а затем выполняют межпроцедурный анализ потока данных.At the 1st and 2nd steps, algorithms of interprocedural analysis and optimizations are performed. First, the intermediate representations of the procedures are substituted into the places of their calls, and then the interprocedural analysis of the data flow is performed.

На шагах 3-5 выполняют цикл по процедурам программы, в котором на шаге 4 запускают алгоритм распараллеливания процедуры Распараллеливание_процедуры.In steps 3-5 perform the cycle of program procedures, wherein in step 4 is started algorithm parallelization Rasparallelivanie_protsedury procedure.

На шагах 1-4 процедуры Распараллеливание_процедуры выполняют предварительные действия перед распараллеливанием, а именно: проводят анализ потока данных методом нумераций значений, осуществляют анализ переменных цикла, выполняют анализ операций доступа в массив, а также проводят ряд цикловых оптимизаций, направленных на увеличение количества параллельных циклов и/или усиление эффекта от распараллеливания.In steps 1-4 of the Parallelization_procedure procedure, preliminary steps are performed before parallelization, namely: they analyze the data stream by the method of numbering the values, analyze the loop variables, analyze the operations of accessing the array, and also conduct a series of loop optimizations aimed at increasing the number of parallel loops and / or enhancing the effect of parallelization.

На шаге 5 запускают рекурсивный алгоритм распараллеливания циклов Распараллеливание_цикла. Эта процедура рекурсивно обходит дерево циклов и пытается распараллелить текущий цикл. Стартуют процедуру от головы дерева циклов.In step 5, run a recursive algorithm parallelization Rasparallelivanie_tsikla cycles. This procedure recursively traverses the loop tree and tries to parallelize the current loop. The procedure starts from the head of the tree of cycles.

На шаге 6 выполняют удаление всех вспомогательных структур данных.In step 6, all auxiliary data structures are deleted.

Следующим уровнем реализации алгоритма распараллеливания является распараллеливание цикла. Процедура Распараллеливание_цикла осуществляет проверку свойств цикла на возможное распараллеливание и в случае наличия этих свойств у цикла распараллеливает его. На шаге 1 проверяют структуру цикла. Распараллеливать можно циклы с одним выходом, управление которым осуществляет индуктивная переменная с инвариантными верхней и нижней границами, а также инвариантным шагом.The next level of implementation of the parallelization algorithm is loop parallelization. The Cycle_ parallelization procedure checks the properties of the cycle for possible parallelization and, if these properties are available, the cycle parallelizes it. In step 1, the loop structure is checked. You can parallelize cycles with one output, which is controlled by an inductive variable with invariant upper and lower boundaries, as well as an invariant step.

Анализ редукций цикла выполняют на шаге 2. Распознанные редукции исключают при анализе межитерационных зависимостей (зависимости между итерациями цикла) на шаге 3.The analysis of cycle reductions is performed in step 2. Recognized reductions are excluded when analyzing inter-iteration dependencies (dependencies between iteration of the cycle) in step 3.

На шаге 3 проверяют наличие межитерационных зависимостей в цикле. Для этого используют анализатор. Если в цикле нет межитерационных зависимостей, то он может быть распараллелен.At step 3, the presence of inter-iteration dependencies in the cycle is checked. To do this, use the analyzer. If there are no inter-iteration dependencies in the cycle, then it can be parallelized.

В случае существования межитерационных зависимостей на шаге 4 осуществляют попытку удалить зависимости способом локализации массивов.If there are inter-iteration dependencies in step 4, an attempt is made to remove the dependencies by the method of localizing arrays.

В случае успешного выполнения проверок на шагах 1, 3 и 4 на шаге 5 запускают процедуру Применение_распараллеливания_цикла, которая производит необходимые действия по распараллеливанию цикла.If the checks were successfully completed in steps 1, 3, and 4 in step 5, the application Cycle_Parallelization_Application procedure is launched, which performs the necessary actions to parallelize the cycle.

Если решение о распараллеливании принято не было, то на шаге 7 выполняют рекурсивный вызов процедуры Распараллеливание_цикла для всех вложенных циклов текущего цикла.If the decision on parallelization was not made, then at step 7, a recursive call of the procedure Parallelization of a cycle for all nested cycles of the current cycle is performed.

Процедура Применение_распараллеливания_цикла выполняет действия по распараллеливанию цикла, для которого распараллеливание возможно. На шаге 1 выполняют построение копии цикла и условия, проверяющего эффективность распараллеливания. Дело в том, что потребные ресурсы на создание нового потока велики и если цикл недостаточно долгого исполнения, то его распараллеливание может привести к снижению производительности. Динамически осуществляют проверку на время исполнения итераций и сравнение этого времени с потребными ресурсами на запуск нового потока. Вычисление времени исполнения итераций цикла носит эвристический характер. Оценка этого времени основана на величинах счетчиков исполняемых циклов и эвристических временах исполнения отдельных команд в теле цикла. Счетчики вложенных циклов не всегда известны на уровне охватывающего цикла. В этом случае их значение выставляют эвристически. Считается, что в среднем цикл содержит 10 итераций.The Cycle_Parallelization_Application procedure performs actions to parallelize a cycle for which parallelization is possible. In step 1, a copy of the cycle and a condition that checks the effectiveness of parallelization are built. The fact is that the required resources for creating a new stream are large and if the cycle is not long enough, then its parallelization can lead to a decrease in productivity. They dynamically check for the execution time of iterations and compare this time with the required resources to start a new thread. The calculation of the execution time of loop iterations is heuristic. The estimate of this time is based on the values of the counters of the executed cycles and the heuristic times of execution of individual commands in the body of the cycle. Nested loop counters are not always known at the level of the enclosing loop. In this case, their value is set heuristically. It is estimated that the average cycle contains 10 iterations.

На шаге 2 осуществляют инициализацию переменных значениями нижней и верхней границ, а также шагом индуктивной переменной.At step 2, the variables are initialized with the values of the lower and upper bounds, as well as with the step of the inductive variable.

На шаге 3 промежуточное представление распараллеленного цикла модифицируют. В представление встраивают операции, обеспечивающие параллельный запуск тела цикла. Тело цикла выделяют в новую процедуру. В зависимости от локальности структур данных по отношению к итерациям циклов часть из них передают параметром в новую процедуру, а часть (которые локальные) становятся структурами данных новой процедуры.In step 3, the intermediate representation of the parallelized loop is modified. The view is embedded in operations that ensure the parallel launch of the loop body. The body of the cycle is isolated in a new procedure. Depending on the locality of the data structures in relation to the iterations of the cycles, some of them are passed as a parameter to the new procedure, and some (which are local) become data structures of the new procedure.

ЛитератураLiterature

1. Дроздов А.Ю., Особенности и перспективы Универсальной технологии оптимизирующей компиляции. // Сборник научных трудов ИТМиВТ им. С.А.Лебедева РАН, Выпуск №1, Материалы Всероссийской конференции "Перспективы развития высокопроизводительных архитектур. История, современность и будущее отечественного компьютеростроения", 2008, С.62-74.1. Drozdov A.Yu., Features and prospects of the Universal technology of optimizing compilation. // Collection of scientific papers of ITMiVT im. S.A. Lebedeva RAS, Issue No. 1, Materials of the All-Russian Conference "Prospects for the Development of High-Performance Architectures. History, Present and Future of Russian Computer Engineering", 2008, P.62-74.

2. Alfred V. Aho, Ravi Sethi, Jeffrey D.Ullman, "Compilers: Principles, Techniques, and Tools", Addison-Wesley, Reading, 1986.2. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools", Addison-Wesley, Reading, 1986.

3. John R. Ellis. Bulldog: A compiler for VLIW Architectures. MIT Press, 1985.3. John R. Ellis. Bulldog: A compiler for VLIW Architectures. MIT Press, 1985.

4. Dick Grune, Henri E. Bal, Ceriel J.H. Jacobs and Koen G. Langendoen, Modern Compiler Design, by John Wiley & Sons, Ltd, 2000.4. Dick Grune, Henri E. Bal, Ceriel J.H. Jacobs and Koen G. Langendoen, Modern Compiler Design, by John Wiley & Sons, Ltd, 2000.

5. Randy Alien, Ken Kennedy, Optimizing Compilers for Modern Architectures. 2002 by Academic Press.5. Randy Alien, Ken Kennedy, Optimizing Compilers for Modern Architectures. 2002 by Academic Press.

6. Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kauffman, San Francisco, 1997.6. Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kauffman, San Francisco, 1997.

7. Utpal Banerjee, Loop Transformations for Restructuring Compilers. Kluwer academic Publishers, 1993.7. Utpal Banerjee, Loop Transformations for Restructuring Compilers. Kluwer academic Publishers, 1993.

8. Utpal Banerjee. Dependence Analysis. Kluwer Acadmic Publishers. 1997.8. Utpal Banerjee. Dependence Analysis. Kluwer Acadmic Publishers. 1997.

9. [29] Kwangkeun Yi and Williams Ludwell Harrison III, Interprocedural Data Flow Analysis for Compile-time Memory Management. Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.9. [29] Kwangkeun Yi and Williams Ludwell Harrison III, Interprocedural Data Flow Analysis for Compile-time Memory Management. Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.

10. Evelyn Duesterwald, Rajiv Gupta, Mary Lou Soffa, Demand-driven Computation of Interprocedural Data Flow. Department of Computer Science, University of Pittsburg, POPL'95 1/95 San Francisco, CA USA.10. Evelyn Duesterwald, Rajiv Gupta, Mary Lou Soffa, Demand-driven Computation of Interprocedural Data Flow. Department of Computer Science, University of Pittsburg, POPL'95 1/95 San Francisco, CA USA.

11. Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa, A Practical Framework for Demand-Driven Interprocedural Data Flow Analysis. University of Pittsburg, ASM SISPLAN-SIGACT Symposium on Principles of Programming Languages, 1995.11. Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa, A Practical Framework for Demand-Driven Interprocedural Data Flow Analysis. University of Pittsburg, ASM SISPLAN-SIGACT Symposium on Principles of Programming Languages, 1995.

12. Susan Horwitz, Thomas Reps, and Mooly Sagiv, Demand Interprocedural Dataflow Analysis. University of Wisconsin, Proceedings of the Third ASM SIGSOFT Symposium on Foundations of Software Engineering, Washington DC, October 10-13, 1995.12. Susan Horwitz, Thomas Reps, and Mooly Sagiv, Demand Interprocedural Dataflow Analysis. University of Wisconsin, Proceedings of the Third ASM SIGSOFT Symposium on Foundations of Software Engineering, Washington DC, October 10-13, 1995.

13. В.В.Воеводин, Вл.В.Воеводин, Параллельные вычисления. Санкт-Петербург «БХВ-Петербург», 2002.13. V.V. Voevodin, Vl.V. Voevodin, Parallel computing. St. Petersburg "BHV-Petersburg", 2002.

14. David R.Chase, Mark Wegman, F.Kenneth Zadeck, Analysis of Pointers and Structures. ASM SIGPLAN'90 PLDI, June 20-22, 1990.14. David R. Chase, Mark Wegman, F. Kenneth Zadeck, Analysis of Pointers and Structures. ASM SIGPLAN'90 PLDI, June 20-22, 1990.

15. Yuan-Shin Hwang, Joel Saltz, Compile-Time Analysis on Programs with Dynamic Pointer-Linked Data Structures. Depatment of Computer Science, University of Maryland, Nobember 8, 1996.15. Yuan-Shin Hwang, Joel Saltz, Compile-Time Analysis on Programs with Dynamic Pointer-Linked Data Structures. Depatment of Computer Science, University of Maryland, Nobember 8, 1996.

16. Rakesh Ghiya and Laurie J.Hendren, Connection Analysis: A Practical Interprocedural Heap Analysis for C. School of Computer Science, McGill University, Proceedings of the Eigth Workshop on Languages and Compilers fo rParallel Computing, Columbus, Ohio, August 10-12, 1995.16. Rakesh Ghiya and Laurie J. Hendren, Connection Analysis: A Practical Interprocedural Heap Analysis for C. School of Computer Science, McGill University, Proceedings of the Eigth Workshop on Languages and Compilers fo Parallel Computing, Columbus, Ohio, August 10-12 1995.

17. Xinan Tang, Rakesh Ghiya, Laurie J. Hendren, Guang R.Gao, Heap Analysis and Optimizations for Threaded Programs. School of Computing Science, McGill University, 1997.17. Xinan Tang, Rakesh Ghiya, Laurie J. Hendren, Guang R. Gao, Heap Analysis and Optimizations for Threaded Programs. School of Computing Science, McGill University, 1997.

18. Rakesh Ghiya, Practical Techniques for Interprocedural Heap Analysis. School of Computing Science, McGill University, Montreal, January 1996.18. Rakesh Ghiya, Practical Techniques for Interprocedural Heap Analysis. School of Computing Science, McGill University, Montreal, January 1996.

19. Keith D.Cooper, Ken Kennedy, Fast Interprocedural Alias Analysis. Rice University, 1989.19. Keith D. Cooper, Ken Kennedy, Fast Interprocedural Alias Analysis. Rice University, 1989.

20. Bjarne Steensgaard, Points-to Analysis in Almost Linear Time, Microsoft Research, 1996.20. Bjarne Steensgaard, Points-to Analysis in Almost Linear Time, Microsoft Research, 1996.

21. Keith D.Cooper, Ken Kennedy, Interprocedural Side-Effect Analysis in Linear Time. Rice University, 1988.21. Keith D. Cooper, Ken Kennedy, Interprocedural Side-Effect Analysis in Linear Time. Rice University, 1988.

22. Robert P.Wilson, Monika S.Lam. Efficient context-sensitive pointer analysis for С programs. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language and Implementation, pages 1-12, June 1995.22. Robert P. Wilson, Monika S. Lam. Efficient context-sensitive pointer analysis for C programs. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language and Implementation, pages 1-12, June 1995.

23. Дроздов А.Ю, Владиславлев В.Е. Межпроцедурный анализ указателей // Информационные технологии. Приложение №2, Февраль 2005 г.23. Drozdov A.Yu., Vladislavlev V.E. Interprocedural analysis of pointers // Information Technologies. Appendix No. 2, February 2005

24. [44] Дроздов А.Ю., Сыркин А.Г. Методы контекстного межпроцедурного распространения свойств значений переменных программы // Информационные технологии. Приложение № 2, Февраль 2005 г.24. [44] Drozdov A.Yu., Syrkin A.G. Methods of contextual interprocedural distribution of properties of values of program variables // Information Technologies. Appendix No. 2, February 2005

25. [46] Дроздов А.Ю., Корнев P.M., Боханко А.С. Индексный анализ зависимостей по данным // Информационные технологии и вычислительные системы, №3, 2004 г.25. [46] Drozdov A.Yu., Kornev P.M., Bohanko A.S. Index analysis of dependencies according to data // Information Technologies and Computing Systems, No. 3, 2004

26. Kleanthis Psarris and Konstantinos Kyriakopoulos, Data Dependence Testing in Practice. Division of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249.26. Kleanthis Psarris and Konstantinos Kyriakopoulos, Data Dependence Testing in Practice. Division of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249.

27. [48] Paul M.Petersen and David A.Padua, Experimental Evaluation of Some Data Dependence Tests (Extended Abstract), Center for Super computing Research and Development, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801.27. [48] Paul M. Petersen and David A. Padua, Experimental Evaluation of Some Data Dependence Tests (Extended Abstract), Center for Super computing Research and Development, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801.

28. [49] Paul M.Petersen, David A.Padua, Static and Dynamic Evaluation of Data Dependence Analysis. Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.28. [49] Paul M. Petersen, David A. Padua, Static and Dynamic Evaluation of Data Dependence Analysis. Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.

29. [54] Wen-mei Hwu, Shane Ryoo, Sain-Zee Ueng, John H. Kelm, Isaac Gelado, Sam S. Stone, Robert E. Kidd, Sara S. Baghsorkhi, Aqeel A. Mahesri, Stephanie C. Tsao, Nacho Navarro, Steve S. Lumetta, Matthew I. Frank, and Sanjay J. Patel, Implicit Parallel Programming Models for Thousand-Core Microprocessors, Proceedings of the 44th Annual Design Automation Conference, June 2007.29. [54] Wen-mei Hwu, Shane Ryoo, Sain-Zee Ueng, John H. Kelm, Isaac Gelado, Sam S. Stone, Robert E. Kidd, Sara S. Baghsorkhi, Aqeel A. Mahesri, Stephanie C. Tsao , Nacho Navarro, Steve S. Lumetta, Matthew I. Frank, and Sanjay J. Patel, Implicit Parallel Programming Models for Thousand-Core Microprocessors, Proceedings of the 44th Annual Design Automation Conference, June 2007.

30. www.mpi-forum.org [56].30. www.mpi-forum.org [56].

31. www.openmp.org [57].31. www.openmp.org [57].

32. www.intel.com [58].32. www.intel.com [58].

33. www.pgroup.com [59].33. www.pgroup.com [59].

34. www.gnu.org [63].34. www.gnu.org [63].

35. www.rapidmind.com [64].35. www.rapidmind.com [64].

36. [72] Loren Taylor Simson, Value-Driven Redundancy Elimination, Thesis, Rice University, Houston, Texas, April, 1996.36. [72] Loren Taylor Simson, Value-Driven Redundancy Elimination, Thesis, Rice University, Houston, Texas, April 1996.

Claims

A method for automatic parallelization of cyclic areas in the algorithmic part of the program, which allows to maximize the number of parallel cycles by applying interprocedural and intraprocedural optimizations, characterized by embedding dynamic checks for parallelization efficiency in the program, including the following steps:
preliminary, at the stage of intermediate presentation of the program, when performing the transformations, they build using any known methods, and if they are already built, then at least the following analytical data structures are used: control flow graph, dominators tree, cycle tree, data flow graph;
perform substitution of the intermediate representation of procedures in the place of calls;
perform interprocedural analysis of the data stream,
to detect equivalent operations, an analysis of the data stream is performed, preferably by a method of numbering the values;
perform an analysis of cycle variables for invariance and inductance;
analyze access operations in arrays, build access indices in arrays in the form of canonical forms of sums of products;
merge loops;
carry out the removal of invariant conditions;
change the order of traversal of the iterative space of loops,
perform parallel loop analysis,
moreover, when performing interprocedural analysis of the data stream, a method is used that is sensitive to the context and sensitive to the control flow,
moreover, the substitution is carried out in cyclic areas to obtain the maximum number of cycles that do not contain procedure calls,
moreover, the method of numbering the values is additionally checked by analysis that clarifies the equivalence for memory access operations,
moreover, when constructing canonical forms, the results of the analysis of the data flow by the method of numbering values are used,
moreover, when analyzing parallel loops, the loop tree is bypassed recursively from the root node with a check for the possibility of parallelizing the current loop, and the following types of analysis are performed:
analysis of the structure of the cycle for the presence of a single output, since it is possible to parallelize only cycles with one output, which are controlled by an inductive variable with invariant upper and lower boundaries, as well as an invariant step;
analysis for the presence of cycle reductions; recognized reductions are excluded in the analysis of inter-iteration dependencies;
analysis of inter-iteration dependences in a cycle, mainly by solving diophantine equations;
analysis of arrays for locality,
moreover, for parallel loops, they build a copy of the loop and dynamically check for the execution time of iterations and compare this time with the cost of resources for starting a parallel loop,
moreover, for parallel loops, operations are constructed that will ensure parallel execution of loops; when executed in parallel, several threads are launched, each of which performs part of the loop iterations,
moreover, the body of parallel loops is allocated into a new procedure, the parameters of which are assigned non-local data of loop iterations, and the local data structures of loop iterations are assigned as local data structures of the created procedure.