RU205189U1

RU205189U1 - Control device for the selection of means for parallelizing cyclic sections of program code

Info

Publication number: RU205189U1
Application number: RU2021100901U
Authority: RU
Inventors: Михаил Александрович Аксенов; Дмитрий Сергеевич Викторов; Александр Александрович Аксенов; Алексей Сергеевич Костров
Original assignee: Михаил Александрович Аксенов
Priority date: 2021-01-19
Filing date: 2021-01-19
Publication date: 2021-06-30

Abstract

Полезная модель относится к вычислительной технике, в частности к области создания (программирования) параллельных программ, и может быть использована для автоматического выбора средства распараллеливания циклических участков разрабатываемых программ при разном числе итераций цикла.Технической задачей и результатом разработки полезной модели является сокращение временных затрат на выполнение сложных расчетов в циклических участках программ за счет автоматического выбора оптимального (с минимальным временем работы распараллеленного участка) средства распараллеливания под каждый цикл и текущего числа итераций (повторений) в цикле программы. Технический результат достигается использованием собранных при тестировании первичных статистических данных о времени выполнения каждого цикла программы при различном числе итераций этого цикла, специальной обработки полученных статистических данных с построением сглаженной эмпирической функции распределения времени, и проведением оценки времени выполнения цикла по верхней границе доверительного интервала для каждого средства распараллеливания и последовательного преобразования обработанной статистики в массив оптимальных средств распараллеливания, что позволяет выполнять быстрый поиск номера оптимального средства распараллеливания цикла и значительно сократить время работы целевой программы в целом.Устройство управления выбором средств распараллеливания циклических участков программного кода содержит блок базы знаний результирующей статистики, блок объяснения, пользовательский интерфейс, блок приобретения знаний, блок тестового прогона, блок хранения статистики, блок оценки времени, блок оценки номера средства.Новизной заявленной полезной модели является сбор статистических данных, характеризующих выполнение каждого цикла в виде четырех параметров при применении средств распараллеливания, построения по собранной статистике эмпирических сглаженных функций распределения времени, оценки времени выполнения цикла по верхней границе доверительного интервала функции распределения, создания массива оптимальных средств распараллеливания, ускоренного выбора средств распараллеливания из массива оптимальных по входным параметрам - номеру цикла и текущему числу итераций.The utility model relates to computer technology, in particular to the field of creation (programming) of parallel programs, and can be used to automatically select a means for parallelizing cyclic sections of programs under development with a different number of loop iterations. complex calculations in cyclic sections of programs due to automatic selection of the optimal (with a minimum running time of the parallelized section) parallelization tool for each cycle and the current number of iterations (repetitions) in the program cycle. The technical result is achieved by using the primary statistical data collected during testing on the execution time of each program cycle with a different number of iterations of this cycle, special processing of the obtained statistical data with the construction of a smoothed empirical time distribution function, and evaluating the cycle execution time along the upper limit of the confidence interval for each tool parallelization and sequential transformation of the processed statistics into an array of optimal parallelization tools, which allows you to quickly search for the number of the optimal loop parallelization tool and significantly reduce the operating time of the target program as a whole. , user interface, block of knowledge acquisition, block of test run, block of statistics storage, block of time estimation, b The novelty of the declared utility model is the collection of statistical data characterizing the execution of each cycle in the form of four parameters when using parallelization tools, building empirical smoothed time distribution functions from the collected statistics, estimating the cycle time at the upper boundary of the confidence interval of the distribution function, creating an array of optimal parallelization tools, accelerated selection of parallelization tools from an array of optimal ones in terms of input parameters - the cycle number and the current number of iterations.

Description

Полезная модель относится к вычислительной технике, в частности к области создания (программирования) параллельных программ. Технической задачей и результатом разработки полезной модели является сокращение временных затрат на выполнение сложных расчетов в циклических участках программ за счет автоматического выбора оптимального (с минимальным временем работы распараллеленного участка) средства распараллеливания под каждый цикл и текущего числа итераций (повторений) в цикле программы.The utility model relates to computer technology, in particular to the field of creation (programming) of parallel programs. The technical problem and the result of the development of a useful model is to reduce the time spent on performing complex calculations in cyclic sections of programs due to the automatic selection of the optimal (with a minimum running time of the parallelized section) parallelization tool for each cycle and the current number of iterations (repetitions) in the program cycle.

В последнее время в сфере инженерных и научно-технических расчетов, задач моделирования, задач реального времени наметилась тенденция в сторону отказа от последовательных решений и классических однопроцессорных компьютеров. Практически все современное создаваемое программное обеспечение в указанных выше областях ориентировано на параллельные вычислительные среды. Одновременно с возрастанием потребности в параллельных программах и с распространением многоядерных (многопроцессорных) вычислительных систем разработка параллельных программ становится все более актуальней, поскольку целесообразно максимально полно использовать все имеющиеся возможности современных ЭВМ. При этом многие вычислительные задачи наиболее эффективно решаются с помощью многоядерных вычислительных систем с общей памятью (SMP Symmetric Multiprocessing) [1].Recently, in the field of engineering and scientific and technical calculations, modeling tasks, real-time tasks, there has been a tendency towards rejection of sequential solutions and classical single-processor computers. Almost all modern software being created in the above areas is focused on parallel computing environments. Simultaneously with the growing demand for parallel programs and with the spread of multicore (multiprocessor) computing systems, the development of parallel programs is becoming more and more urgent, since it is advisable to make the most of all the available capabilities of modern computers. At the same time, many computational problems are most effectively solved using multicore computing systems with shared memory (SMP Symmetric Multiprocessing) [1].

В настоящее время ведутся интенсивные исследования в области применения всевозможных автоматизаций при создании параллельных программ, в частности в области создания инструментов разработки, отладки и исследования параллельных программ. Популярность систем с общей памятью подтверждается ростом и развитием количества всевозможных средств распараллеливания - стандартов и технологий таких как OpenCL, OpenMP, C++AMP, TBB, Cilk Plus, OpenACC и др., ориентированных на параллельное программирование, т.е. создание параллельных программ. Несмотря на то, что все средства распараллеливания предоставляют программисту возможности работы с параллельными потоками, их интерфейсы и внутренняя структура сильно отличаются. На данный момент не существует точного метода определения наиболее эффективного средства распараллеливания кроме сравнения их между собой применительно к конкретной решаемой задаче [2]. Таким образом в зависимости от текущего контекста программы для одного и того же участка программного кода эффективность различных средств распараллеливания будет отличаться. Критерием эффективности распараллеливания является время выполнения прикладной задачи при идентичных требованиях к аппаратному и программному обеспечению.At present, intensive research is underway in the field of using all kinds of automation in the creation of parallel programs, in particular in the field of creating development tools, debugging and researching parallel programs. The popularity of systems with shared memory is confirmed by the growth and development of the number of various parallelization tools - standards and technologies such as OpenCL, OpenMP, C ++ AMP, TBB, Cilk Plus, OpenACC, etc., focused on parallel programming, i.e. creation of parallel programs. Despite the fact that all parallelization tools provide the programmer with the ability to work with parallel threads, their interfaces and internal structure are very different. At the moment, there is no exact method for determining the most effective means of parallelization, except for comparing them with each other in relation to a specific problem being solved [2]. Thus, depending on the current context of the program for the same section of the program code, the effectiveness of various parallelization tools will differ. The criterion for the efficiency of parallelization is the execution time of an application with identical hardware and software requirements.

Непосредственным объектом распараллеливания в целевой программе целесообразно рассматривать циклические участки программы, т.к. около 80% времени работы программы занимает выполнение именно циклов [3]. Таким образом сокращение временных затрат на выполнение целевой программы предполагает распараллеливание циклических участков целевой программы многопоточное параллельное выполнение в многоядерных ПЭВМ.It is expedient to consider cyclic program sections as the direct object of parallelization in the target program, since about 80% of the program's time is spent on the execution of just cycles [3]. Thus, reducing the time spent on the execution of the target program involves the parallelization of cyclic sections of the target program, multithreaded parallel execution in multicore computers.

Средства распараллеливания для разработчика представляются «черным ящиком», они применяются с помощью внесения в программный код специальных дополнений (директив), представленных на фиг. 1.Parallelization tools for the developer are represented as a "black box"; they are applied by introducing special additions (directives) into the program code, shown in Fig. one.

В практической работе по применению средств распараллеливания было установлено, что существует зависимость времени выполнения цикла от применяемого средства распараллеливания, реализовавшего распараллеливание данного цикла и от количества итераций цикла.In practical work on the use of parallelization tools, it was found that there is a dependence of the cycle execution time on the parallelization tool used, which implements the parallelization of this cycle, and on the number of cycle iterations.

Разные средства распараллеливания имеют свои особенности, и в зависимости от входных параметров распараллеливаемого программного кода могут обладать разной эффективностью работы представленных на фиг. 2. Всего на фиг. 2 представлено 4 цикла с разной длинной (количеством итераций). Каждый циклический участок имеет характеристические параметры такие как номер цикла, количество повторений (итераций) в процессе работы программы и времени выполнения данного цикла. Прямоугольниками обозначены времена выполнения, соответствующих распаралливаемых циклов для указанных средств распараллеливания. Обозначениями T показаны суммарные времена работы целевой программы (состоящей из 4-х циклов), распараллеленной с помощью указанных технологий: Ts - Serial (последовательное выполнение), Tс - Cilk Plus, To - OpenMP, TBT - Boost Threads, TT - TBB. Максимальной эффективности (минимального времени) работы распараллеливаемой целевой программы можно достичь, если обеспечить возможность выбора технологии распараллеливания индивидуально для каждого цикла. Таким образом, оптимальное (Optimal) время выполнения программы будет составлять

, где

- минимальное среди всех средств распараллеливания время работы i-го цикла, N - число запусков распараллеливаемых циклов.Different means of parallelization have their own peculiarities, and depending on the input parameters of the program code being parallelized, they can have different efficiency of the operation shown in Fig. 2. In total, FIG. 2 shows 4 cycles with different lengths (number of iterations). Each cyclic section has characteristic parameters such as the cycle number, the number of repetitions (iterations) in the course of the program operation and the execution time of the given cycle. The rectangles indicate the execution times of the corresponding parallelized loops for the indicated parallelization means. The designations T show the total running times of the target program (consisting of 4 cycles) parallelized using the indicated technologies: Ts - Serial (sequential execution), Tc - Cilk Plus, To - OpenMP, TBT - Boost Threads, TT - TBB. The maximum efficiency (minimum time) of the parallelizable target program can be achieved if it is possible to select the parallelization technology individually for each cycle. Thus, the optimal (Optimal) program execution time will be

where

- the minimum among all parallelization tools, the running time of the i-th cycle, N - the number of starts of the parallelized cycles.

Заявленное устройство выполняет автоматический выбор оптимального средства распараллеливания циклического участка с минимальным временем выполнения цикла.The claimed device automatically selects the optimal means for parallelizing the cyclic section with a minimum cycle time.

Наиболее близким к заявленному устройству решением является система [4], в состав которой входит база знаний, машина логического вывода, рабочая память, блок объяснения, пользовательский интерфейс, блок приобретения знаний. А также близким решением является подход к использованию нескольких средств распараллеливания в одной разрабатываемой программе [5].The closest solution to the claimed device is the system [4], which includes a knowledge base, an inference engine, working memory, an explanation block, a user interface, and a knowledge acquisition block. And also a close solution is the approach to using several parallelization tools in one developed program [5].

Структурная схема заявленного устройства представлена на фиг. 5.The block diagram of the claimed device is shown in Fig. five.

БЗРС - Блок знаний результирующей статистики (1) предназначен для хранения данных результирующей статистики быстрого выбора номера средства распараллеливания полученной устройством в ходе работы и проведенным расчетам, правил и критериев выбора средства распараллеливания по входным параметрам. Вход 1 блока БЗРС соединен с выходом 7 блока МЛВ (2). Выход 3 блока БЗРС соединен с входом 1 блока МЛВ (2). Вход 2 блока БЗРС соединен с выходом 4 блока БОН (10).BZRS - The block of knowledge of the resulting statistics (1) is intended for storing the data of the resulting statistics for the quick selection of the number of the parallelization tool obtained by the device during operation and the calculations performed, the rules and criteria for selecting the parallelization tool by input parameters. Input 1 of the BZRS unit is connected to the output 7 of the MLV unit (2). Output 3 of the BZRS unit is connected to the input 1 of the MLV unit (2). Input 2 of the BZRS unit is connected to the output 4 of the BON unit (10).

МЛВ - Машина логического вывода (2) предназначена для обеспечения логических и арифметических действий с данными, хранящимися в блоке БРП (3), блоке БЗРС (1) и блоке БОН (10), а также для осуществления управляющих действий, запросов, передачи и приема команд и сигналов, необходимых при работе устройства. Входы 1, 2, 3 и 9 блока МЛВ соединены с выходами: 3 блока БЗРС (1), выходом 2 блока БРП (3), выходом 2 блока БПЗ (6) и выходом 3 блока БОН (10) соответственно. Выходы 4, 5, 6, 7 и 8 соединены с входами: 2 блока БОН (10), входом 2 блока БХС (8), входом 1 блока БО (4), входом 1 блока БЗРС (1) и входом 1 блока БРП (3) соответственно.MLV - Logic inference machine (2) is designed to provide logical and arithmetic operations with data stored in the PDU block (3), the BZRS block (1) and the BON block (10), as well as for the implementation of control actions, requests, transmission and reception commands and signals necessary for the operation of the device. Inputs 1, 2, 3 and 9 of the MLV unit are connected to the outputs: 3 units BZRS (1), output 2 of the BRP unit (3), output 2 of the BPZ unit (6) and output 3 of the BON unit (10), respectively. Outputs 4, 5, 6, 7 and 8 are connected to inputs: 2 BON blocks (10), input 2 of BHS block (8), input 1 of BO block (4), input 1 of BZRS block (1) and input 1 of BRP block ( 3) respectively.

БРП - Блок рабочей памяти (3) предназначен для хранения в памяти необходимых исходных данных (БИД 3.1) для выполнения тестового прогона в блоке БПК (7) и получения первичной статистики, хранения модифицированного программного кода целевой программы (БПК 3.2), хранения библиотек средств распараллеливания (БСР 3.3), необходимых для осуществления процесса распараллеливания. Входы 1 и 4 блока БРП (3) соединены с выходами: 8 блока МЛВ (2) и 3 блока БТП (7) соответственно, выходы 2 и 3 блока БРП (3) соединены с входами: 2 блока МЛВ (2), 1 блока БТП (7) соответственно.BRP - Working memory block (3) is intended for storing in memory the necessary initial data (BID 3.1) for performing a test run in the BOD block (7) and obtaining primary statistics, storing the modified program code of the target program (BOD 3.2), storing libraries of parallelization tools (BSR 3.3) required for the parallelization process. Inputs 1 and 4 of the PDU block (3) are connected to the outputs: 8 of the MLV unit (2) and 3 of the BPP unit (7), respectively, outputs 2 and 3 of the PDU unit (3) are connected to the inputs: 2 MLV units (2), 1 unit BTP (7), respectively.

БО - Блок объяснения (4) предназначен для передачи в блок ПИ (5) информационных сообщений при загрузке и подготовке исходных данных и другой информации в блоке БРП (3), а также передачи выходного параметра номера средства распараллеливания, выбранного устройством на совместно работающую целевую программу через блок ПИ (5). Входы 1 и 2 блока ПИ (5) соединены выходами: 3 блока БО (4) и 1 блока БПЗ (6) соответственно. Выходы 3 и 4 блока ПИ (5) соединены с входами: 2 блока БО (4) и 3 блока БПЗ (6) соответственно.BO - Explanation block (4) is intended for transferring information messages to the PI block (5) when loading and preparing the initial data and other information in the PDU block (3), as well as transmitting the output parameter of the number of the parallelization tool selected by the device to the jointly working target program through the PI block (5). Inputs 1 and 2 of PI block (5) are connected by outputs: 3 BO blocks (4) and 1 block BPZ (6), respectively. Outputs 3 and 4 of PI block (5) are connected to inputs: 2 BO blocks (4) and 3 BPZ blocks (6), respectively.

ПИ - Пользовательский интерфейс (5) предназначен для взаимодействия с пользователем (программистом-разработчиком) или с целевой программой, для оптимизации которой используется заявленное устройство. Входы 1, 2 блока ПИ (5) и выходы 3, 4 блока ПИ (5) соединены с выходами 3, 1 и входами 2, 3 блока БО (4), блока БПЗ (6) соответственно. Вход 6 и выход 5 блока ПИ (5) являются основными внешними входом и выходом заявленного устройства.PI - User interface (5) is intended for interaction with the user (programmer-developer) or with the target program, for the optimization of which the declared device is used. Inputs 1, 2 of PI block (5) and outputs 3, 4 of PI block (5) are connected to outputs 3, 1 and inputs 2, 3 of BO block (4), BPZ block (6), respectively. Input 6 and output 5 of the PI block (5) are the main external input and output of the declared device.

БПЗ - Блок приобретения знаний (6) предназначен для организации диалоговой процедуры, целью которой является ввод и корректировка сохраненных данных в блок БРП (3) через блок МЛВ (2). Вход 3 соединен с выходом 4 блока ПИ (5). Выход 2 и 1 блока БПЗ (6) соединены с входом 3 блока МЛВ (2) и входом 2 блока ПИ (5) соответственно.BPZ - The block of knowledge acquisition (6) is intended for organizing a dialogue procedure, the purpose of which is to enter and correct the stored data into the PDU block (3) through the MLV block (2). Input 3 is connected to output 4 of PI block (5). Output 2 and 1 of the BPZ unit (6) are connected to the input 3 of the MLV unit (2) and the input 2 of the PI unit (5), respectively.

БТП - Блок тестового прогона (7) предназначен для осуществления тестового запуска введенного программного кода БПК (3.2), запроса и применения при необходимости исходных данных БИД (3.1) и средств распараллеливания БСР (3.3), выходными параметрами блока БТП (7) является первичная статистика тестового прогона в виде табличной строки профиля конкретного цикла в составе: номера цикла, количества итераций, номера средства распараллеливания и времени выполнения. Вход 1 соединен с выходом 3 блока БРП (3). Выход 2 соединен с входом 1 блока БХС (8). Выход 3 соединен с входом 4 блока БРП (3).BTP - Test run block (7) is designed to carry out a test run of the entered BOD program code (3.2), request and use, if necessary, the initial BID data (3.1) and the BSR parallelization means (3.3), the output parameters of the BTP block (7) are primary statistics a test run in the form of a table row of a specific cycle profile, consisting of: cycle numbers, number of iterations, parallelizer number and execution time. Input 1 is connected to output 3 of the PDU (3). Output 2 is connected to input 1 of the BHS unit (8). Output 3 is connected to input 4 of the PDU block (3).

БХС - Блок хранения статистики (8) предназначен для хранения статистики, полученной из блока БТП (7). Статистические данные формируются в результате тестового прогона или добавляются вручную. Вход 1 соединен с выходами 2 блока БТП (7), вход 2 соединен с выходом 5 блока МЛВ (2). Выход 3 соединен с входом 1 блока БОВ (9).BCS - Statistics storage unit (8) is intended for storing statistics obtained from the BTP unit (7). Statistics are generated from a test run or added manually. Input 1 is connected to outputs 2 of the BTP unit (7), input 2 is connected to output 5 of the MLV unit (2). Output 3 is connected to input 1 of the BOV unit (9).

БОВ - Блок оценки времени выполнения цикла средствами распараллеливания (9) предназначен для обработки статистики, получаемой из блока БХС (8) и формирования временных отсчетов для получения выборок статистики каждого цикла при определенном числе итераций, а также специальной обработки полученных статистических данных с построением сглаженной эмпирической функции распределения времени, и проведения оценки времени выполнения по верхней границе доверительного интервала для каждого средства распараллеливания. Вход 1 соединен с выходом 3 блока БХС (8). Выход 2 соединен с входом 1 блока БОН (10).BOV - The block for estimating the cycle execution time by means of parallelization (9) is intended for processing statistics obtained from the BCS block (8) and forming time counts to obtain samples of statistics for each cycle with a certain number of iterations, as well as special processing of the obtained statistical data with the construction of a smoothed empirical time distribution functions, and estimating the execution time at the upper bound of the confidence interval for each parallelizer. Input 1 is connected to output 3 of the BHS unit (8). Output 2 is connected to input 1 of the BON unit (10).

БОН - Блок определения номера (10) предназначен для преобразования обработанной статистики, полученной из блока БОВ (9), в массив оптимальных средств (результирующей статистики) распараллеливания для каждого циклического участка программы и передачи в блок БЗРС (1), данный массив позволяет выполнять ускоренный поиск номера оптимального средства распараллеливания циклов и значительно сократить время программы в целом. Входы 1 и 2 блока БОН (10) соединены с выходами 2 блока БОВ (9) и 4 блока МЛВ (2) соответственно. Выходы 3 и 4 блока БОН (10) соединены с входами 9 блока МЛВ (2) и 2 блока БЗРС (1) соответственно.BON - The block for determining the number (10) is designed to convert the processed statistics obtained from the BOV block (9) into an array of optimal means (resulting statistics) for parallelization for each cyclic section of the program and transfer to the BZRS block (1), this array allows you to perform accelerated search for the number of the optimal means of parallelizing the loops and significantly reduce the program time as a whole. Inputs 1 and 2 of the BON unit (10) are connected to the outputs 2 of the BOV unit (9) and 4 of the MLV unit (2), respectively. Outputs 3 and 4 of the BON unit (10) are connected to the inputs 9 of the MLV unit (2) and 2 of the BZRS unit (1), respectively.

На предварительном этапе, представленном на фиг. 3 при подготовке разрабатываемой целевой программы для работы совместно с заявленным устройством, разработчик (программист) определяет в программном коде наиболее затратные циклы (циклические участки программы) номерует их, добавляет дополнительные метки начала и конца цикла, далее копирует их в код программы, создавая так называемые клоны циклов. Затем добавляет в каждый клон средство распараллеливания в виде специальных директив из доступного ему набора средств распараллеливания (см. фиг.1). Таким образом, каждый цикл получает одинаковое количество клонов, равное количеству доступных средств распараллеливания.In the preliminary step shown in FIG. 3 when preparing the target program being developed to work together with the declared device, the developer (programmer) determines in the program code the most expensive cycles (cyclic sections of the program), numbers them, adds additional marks of the beginning and end of the cycle, then copies them into the program code, creating the so-called clones of cycles. Then it adds a parallelization tool to each clone in the form of special directives from the set of parallelization tools available to it (see Fig. 1). This way, each loop gets the same number of clones, equal to the number of parallelizers available.

Под целевой программой понимается разрабатываемая программа, требующая максимально эффективного использования предоставленных аппаратно-технических возможностей, т.е. распараллеливания, такая как программа управления, моделирования, сложных математических расчетов в реальном масштабе времени.A target program is a program being developed that requires the most efficient use of the provided hardware and technical capabilities, i.e. parallelization, such as a control program, simulation, complex mathematical calculations in real time.

Тело программного кода увеличивается на определенное количество клонов циклов, равное количеству средств распараллеливания, но при этом продолжительность исполнения программы остается прежней, так как в процессе исполнения все клоны не выполняются, выполняется только один выбранный заявленным устройством это показано на фиг. 4. Процесс управления оптимальным выбором клона (с примененным средством распараллеливания) позволяет снизить временные затраты выполнения циклических участков программы и в целом всей программы. Это связано с тем, что временные затраты на выполнение одного и того же циклического участка с заданным количеством итераций при применении разных средств распараллеливания будут существенно отличаться. Поэтому оптимизация выбора ветки (клона) программы позволяет снизить временные затраты на выполнение цикла и программы в целом. Работа устройства автоматизирована, по входным параметрам циклического участка целевой программы (номера цикла и количества итераций в цикле). Заявленным устройством вырабатывается решение о применяемом средстве распараллеливания и формируется команда о подключении одной из веток клонированного данного циклического участка программы с минимальным временем распараллеливания цикла.The body of the program code is increased by a certain number of loop clones, equal to the number of parallelization means, but the duration of the program execution remains the same, since all the clones are not executed during execution, only one selected by the declared device is executed, this is shown in Fig. 4. The process of managing the optimal choice of the clone (with the parallelization tool applied) allows to reduce the time spent executing the cyclic sections of the program and the entire program as a whole. This is due to the fact that the time spent on executing the same cyclic section with a given number of iterations when using different parallelization tools will differ significantly. Therefore, optimization of the choice of a branch (clone) of the program allows you to reduce the time spent on executing the cycle and the program as a whole. The operation of the device is automated, according to the input parameters of the cyclic section of the target program (cycle number and the number of iterations in the cycle). The declared device generates a decision on the parallelization tool used and a command is formed to connect one of the branches of the cloned given cyclic program section with a minimum cycle parallelization time.

Работа устройства предполагает два этапа. Первый этап подготовительный, необходим для загрузки кода целевой программы, исходных данных и библиотек средств распараллеливания, а также выполнения тестового прогона целевой программы и наполнение начальной статистики. Второй этап - это непосредственно работа устройства с автоматическим выбором средства распараллеливания под каждый циклический участок целевой программы, взаимодействующей с заявленным устройством.The operation of the device involves two stages. The first preparatory stage is necessary for loading the target program code, source data and libraries of parallelization tools, as well as performing a test run of the target program and filling in the initial statistics. The second stage is directly the operation of the device with automatic selection of the parallelization tool for each cyclic section of the target program interacting with the declared device.

На первом этапе устройство подготавливается к работе, с этой целью на вход 6 блока ПИ (5) последовательно вводятся: код целевой программы, исходные данные и библиотеки средств распараллеливания, с выхода 4 которого пакеты поступают на вход 3 блока БПЗ (6), с выхода 2 которого поступают на вход 3 блока МЛВ (2) и с выхода 8 которого поступают на вход 2 блока БРП (3) (БИД (3.1), БПК (3.2), БСР (3.3)) соответственно на этом подготовка устройства к тестовому прогону заканчивается.At the first stage, the device is prepared for operation, for this purpose, at the input 6 of the PI block (5), the following are sequentially entered: the code of the target program, initial data and libraries of parallelization tools, from the output of which the packets enter the input 3 of the BPZ block (6), from the output 2 of which are fed to the input 3 of the MLV unit (2) and from the output 8 of which they enter the input 2 of the PDU unit (3) (BID (3.1), BOD (3.2), BSR (3.3)), respectively, this is where the preparation of the device for the test run ends ...

После этого по команде оператора с выхода 4 блока ПИ (5) формируется команда на включение устройства в режим тестового «прогона» целевой программы для сбора первичной статистики, сформированная управляющая команда в виде сигнала подается на вход 3 блока БПЗ (6), с выхода 2 которого подается на вход 3 блока МЛВ (2) с последующей передачей с выходов 4,5,6,7,8 на соответствующие входы: вход 2 блока БОН (10), вход 2 блока БХС (8), вход 1 блока БО (4), вход 1 блока БЗРС (1), вход 1 блока РП (3) для инициирования их работы в заданном режиме.After that, at the operator's command from the output 4 of the PI block (5), a command is generated to turn on the device into the test "run" mode of the target program to collect primary statistics, the generated control command in the form of a signal is fed to input 3 of the BPZ block (6), from output 2 which is fed to the input 3 of the MLV unit (2) with subsequent transmission from outputs 4,5,6,7,8 to the corresponding inputs: input 2 of the BON unit (10), input 2 of the BCS unit (8), input 1 of the BO unit (4 ), input 1 of the BZRS block (1), input 1 of the RP block (3) to initiate their operation in a given mode.

После получения управляющего сигнала о начале работы устройства, с выхода 3 блока БПК (3.2) на вход 1 блока БТП (7) подается модифицированный программный код с последующей загрузкой исходных данных из блока БИД (3.1) для тестового прогона. В блоке БТП (7) на основании модифицированной программы и исходных данных выполняется тестовый прогон всех клонов всех циклов с фиксацией номера цикла, времени его начала и конца, количества итераций в данном цикле, а также номера средства распараллеливания. Применение средств распараллеливания происходит по директивам, установленным перед каждым клоном циклического участка программы. При обнаружении директивы на распараллеливание очередного клона циклического участка с выхода 3 блока БТП (7) формируется команда запроса на подключение библиотеки средства распараллеливания к данному клону для управления распараллеливанием клона, которая подается на вход 4 блока БСР(3.3).After receiving the control signal about the start of the device operation, from the output 3 of the BOD block (3.2) to the input 1 of the BTP block (7), a modified program code is fed, followed by loading the initial data from the BID block (3.1) for a test run. In the FTP block (7), based on the modified program and initial data, a test run of all clones of all cycles is performed, with the cycle number, the time of its beginning and end, the number of iterations in this cycle, and the number of the parallelization tool fixed. Application of means of parallelization occurs according to the directives set before each clone of the cyclic section of the program. When a directive is found to parallelize the next clone of the cyclic section from the output 3 of the BTP block (7), a request command is generated to connect the library of the parallelization tool to this clone to control the parallelization of the clone, which is fed to input 4 of the BSR block (3.3).

По окончанию выполнения одного из клонов данного циклического участка фиксируется время выполнения клона. Последующие клоны циклических участков целевой программы выполняются аналогичным образом.At the end of the execution of one of the clones of this cyclic section, the time of the clone execution is recorded. Subsequent clones of the cyclic sections of the target program are executed in the same way.

На выходе 2 блока БТП формируются параметры время-итерационный профиля цикла для каждого клона в виде четырех параметров (номера цикла, количества итераций, номера средства распараллеливания, время выполнения клона данного цикла), которые подаются на вход 1 блока БХС (8) для наполнения таблицы первичной статистики (время-итерационного профиля). Модифицированный программный код целевой программы целиком «прогоняется» блоком БТП (7) с формированием время-итерационного профиля всех циклов и клонов циклов.At the output 2 of the BTP block, the parameters of the cycle time-iteration profile are formed for each clone in the form of four parameters (cycle number, number of iterations, the number of the parallelization tool, the execution time of the clone of this cycle), which are fed to the input 1 of the BCS block (8) to fill the table primary statistics (time-iteration profile). The modified program code of the target program is completely "run" by the BTP block (7) with the formation of the time-iteration profile of all cycles and clones of the cycles.

В связи с тем, что параметр времени выполнения цикла является случайной величиной, и функция распределения вероятности времени выполнения цикла неизвестна, поэтому возникает необходимость повторно 3-5 раз выполнять «тестовый прогон» для сбора достаточного количества статистических данных и последующего формирования временной выборки из профиля статистики с 3-5 отсчетами под каждый цикл и средство распараллеливания.Due to the fact that the parameter of the cycle execution time is a random variable, and the probability distribution function of the cycle execution time is unknown, therefore, it becomes necessary to repeat the "test run" 3-5 times to collect a sufficient amount of statistical data and then form a time sample from the statistics profile with 3-5 counts for each cycle and parallelization facility.

Накопленные в блоке БХС (8) данные первоначальной статистики с выхода 3 подаются на вход 1 блока БОВ (9), где происходит обработка первичных поступивших на вход статистических данных, математически строятся эмпирические функции распределения времени выполнения каждого цикла для определенного числа итераций [6], каждая эмпирическая функция распределения аппроксимируется гладкой кривой с применением полиномов Бернштейна [7], оцениваются значения верхней границы доверительного интервала времени выполнения каждого цикла для определенного числа итераций, формируется измененный время-итерационный профиль для каждого цикла в виде (номер цикла, количество итераций, номер средства распараллеливания, верхняя граница времени выполнения данного цикла). Полученный измененный профиль передается с выхода 2 на вход 1 блока БОН (10).The data of the initial statistics accumulated in the BCS block (8) from the output 3 is fed to the input 1 of the BOV block (9), where the primary statistical data received at the input is processed, the empirical distribution functions of the execution time of each cycle for a certain number of iterations are mathematically constructed [6], each empirical distribution function is approximated by a smooth curve using Bernstein polynomials [7], the values of the upper bound of the confidence interval of the execution time of each cycle for a certain number of iterations are estimated, a modified time-iteration profile for each cycle is formed in the form (cycle number, number of iterations, tool number parallelization, the upper bound on the execution time of this cycle). The resulting modified profile is transmitted from output 2 to input 1 of the BON block (10).

В блоке БОН (10) для всех поступивших и обработанных в блоке БОВ (10) время-итерационных профилей каждого цикла формируется массив результирующей статистики оптимальных средств распараллеливания цикла. Таким образом, результирующая статистика имеет вид массива с интервалами начального и конечного числа итераций цикла, номера оптимального средства распараллеливания для этого интервала итераций. С выхода 4 блока БОН (10) на вход 2 блока БЗРС(1) подается результирующая статистика для сохранения и последующего использования при выборе средства распараллеливания, что позволит осуществлять быстрый выбор оптимально средства.In the BON block (10), for all the time-iteration profiles of each cycle received and processed in the BOV block (10), an array of the resulting statistics of the optimal means for parallelizing the cycle is formed. Thus, the resulting statistics has the form of an array with intervals of the initial and final number of loop iterations, the number of the optimal parallelization tool for this iteration interval. From the output 4 of the BON block (10) to the input 2 of the BZRS block (1), the resulting statistics are fed for saving and subsequent use when choosing a parallelization tool, which will allow a quick selection of the optimal tool.

На втором этапе функционирования заявленного устройства, когда происходит совместная работа целевой программы и подключенного к ней устройства, работа происходит следующим образом, с целевой программы на вход 6 блока ПИ (5) подаются входные данные, полученные от целевой программы в виде номера цикла и количества итераций в цикле, с выхода 4 которого эти данные подаются на вход 3 блока БПЗ (6) и с выхода 2 которого подаются на вход 3 блока МЛВ (2). На другой вход 1 блока МЛВ (2) с выхода 3 блока БЗРС (1) подается массив результирующей статистики оптимальных средств распараллеливания цикла. На основе входных данных и имеющихся результирующих статистик, в блоке МЛВ (2) выполняется их сравнение и делается вывод о попадании входных данных в соответствующий итерационный интервал массива результирующей статистики и мгновенно происходит выбор оптимального средства распараллеливания. По результатам выбора номера оптимального средства распараллеливания формируется ответный сигнал с номером средства, который с выхода 6 блока МЛВ (2) подается на вход 1 блока БО (4), и далее с выхода 3 которого поступает на вход 1 блока ПИ (5) и через выход 5 блока ПИ (5) подается на управление подключением оптимального средства в целевой программе совместно работающей с заявленным устройством.At the second stage of the operation of the claimed device, when the target program and the device connected to it work together, the work proceeds as follows, the input data received from the target program in the form of a cycle number and the number of iterations is fed from the target program to the input 6 of the PI block (5) in a loop, from the output 4 of which these data are fed to the input 3 of the BPZ block (6) and from the output 2 of which they are fed to the input 3 of the MLV block (2). To the other input 1 of the MLV block (2) from the output 3 of the BZRS block (1), an array of the resulting statistics of the optimal means for parallelizing the cycle is fed. Based on the input data and the available resulting statistics, in the MLV block (2), they are compared and a conclusion is made that the input data has entered the corresponding iterative interval of the resulting statistics array, and the optimal parallelization tool is instantly selected. Based on the results of choosing the number of the optimal parallelization tool, a response signal is generated with the tool number, which from the output 6 of the MLV block (2) is fed to the input 1 of the BO block (4), and then from the output 3 of which it is fed to the input 1 of the PI block (5) and through the output 5 of the PI block (5) is fed to the control of the connection of the optimal means in the target program working together with the declared device.

При непопадании в сформированные интервалы результирующей статистики, поступившие входные данные с выхода 4 блока МЛВ (2) подаются на вход 2 блока БОН (10), где производится математический расчет времени выполнения цикла по графикам аппроксимированных трендов изменения времени от количества итераций для доступных средств распараллеливания. Далее производится выбор оптимального средства по минимальному времени выполнения цикла и с выхода 3 блока БОН (10) сигнал поступает на вход 9 блока МЛВ (2), а далее на выход 6 и дальнейшие этапы аналогичны при попадании входных данных цикла в итерационный интервал.If the resulting statistics do not fall into the generated intervals, the input data received from the output 4 of the MLV block (2) is fed to the input 2 of the BON block (10), where the cycle execution time is mathematically calculated according to the graphs of the approximated trends in the time change versus the number of iterations for the available parallelization tools. Next, the optimal tool is selected according to the minimum cycle execution time and from the output 3 of the BON block (10) the signal goes to the input 9 of the MLV block (2), and then to the output 6 and further stages are similar when the input data of the cycle enters the iteration interval.

В текущем режиме совместной работы целевой программы и заявленного устройства собирается статистика выполнения циклических участков целевой программы и происходит систематическое пополнение первичной статистики в блоке БХС (8) для корректировки массива результирующей статистики оптимальных средств распараллеливания.In the current mode of joint operation of the target program and the declared device, statistics of the execution of the cyclic sections of the target program are collected and the primary statistics are systematically replenished in the BCS block (8) to correct the array of the resulting statistics of the optimal parallelization tools.

Кроме того, данные находящиеся в блоке БХС (8) и данные находящиеся в блоке БЗРС (1), могут быть скорректированы через блок ПИ (5) в ручном режиме.In addition, the data located in the BCS block (8) and the data located in the BZRS block (1) can be corrected through the PI block (5) in manual mode.

Элементы заявленной полезной модели выполнены в виде программно-технических модулей на плате расширения ПЭВМ архитектуры типа х86.The elements of the claimed utility model are made in the form of software and hardware modules on a PC expansion board of x86 architecture.

Путем проведения экспериментов на исследовательском моделирующем комплексе военных действий в Военной академии воздушно-космической обороны имени Маршала Советского Союза Г.К. Жукова, было установлено, что применение заявленного устройства позволяет сократить время на получение результатов моделирования и оперативно-тактических расчетов в 1,3-1,4 раза.By conducting experiments on the research modeling complex of military operations at the Military Academy of Aerospace Defense named after Marshal of the Soviet Union G.K. Zhukov, it was found that the use of the claimed device can reduce the time needed to obtain simulation results and operational-tactical calculations by 1.3-1.4 times.

Представленный вариант заявленного устройства не исчерпывает возможные способы его практического исполнения.The presented version of the claimed device does not exhaust the possible ways of its practical implementation.

Источники, принятые во внимание:Sources taken into account:

1. Automatic Paralleli/ation. Carnegie Mellon University, HPCA-4, February 1-4, 1998. Sutter H. The Free Lunch Is Over A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, March 2005.1. Automatic Paralleli / ation. Carnegie Mellon University, HPCA-4, February 1-4, 1998. Sutter H. The Free Lunch Is Over A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, March 2005.

2. Аксенов М.А. Анализ технологий параллельного программирования для применения в моделирующих комплексах военного назначения в интересах повышения эффективности моделирования//Сборник научных трудов IV Всероссийской научно-практической конференции «Актуальные вопросы состояния и перспектив развития сложных технических систем военного назначения» ВУЦ МГТУ имени Н.Э. Баумана. 2020. С.77-86.2. Aksenov M.A. Analysis of Parallel Programming Technologies for Application in Modeling Complexes for Military Purposes in the Interest of Increasing the Efficiency of Modeling // Collection of Scientific Papers of the IV All-Russian Scientific and Practical Conference "Topical Issues of the State and Prospects for the Development of Complex Technical Systems for Military Purpose" Bauman. 2020.P.77-86.

3. Воеводин В.В., Воеводин Вл.В. Параллельные вычисления. Спб.:Санкт-Петербург, 2002. 608 с. 3. Voevodin VV, Voevodin Vl.V. Parallel computing. St. Petersburg: St. Petersburg, 2002. 608 p.

4. Джаратано Д., Райли Г. Экспертные системы. Принципы разработки и программирования. - М.: Издательский дом "Вильяме", 2007, рис. 1.6, с. 70;4. Jaratano D., Riley G. Expert systems. Design and programming principles. - M .: Publishing house "Williams", 2007, fig. 1.6, p. 70;

5. Андреев А.Е. Технологии программирования многопроцессорных систем: учеб. пособие / Андреев А.Е., Егунов В.А., Шаповалов О.В.; ВолгГТУ. Волгоград, 2015. 103 с. 5. Andreev A.E. Technologies for programming multiprocessor systems: textbook. allowance / Andreev A.E., Egunov V.A., Shapovalov O.V .; VolgSTU. Volgograd, 2015.103 p.

6. Красовский Ю.М. Математические основы военной кибернетики [Текст]: (Учеб. пособие) / Ю.М. Красовский, М.С. Любушкин; Отв. ред. М.С. Любушкин; Воен. командная акад. противовоздуш. обороны. Кафедра №12. - Калинин: [б. и.], 1971. - 446 с. 6. Krasovsky Yu.M. Mathematical foundations of military cybernetics [Text]: (Textbook) / Yu.M. Krasovsky, M.S. Lyubushkin; Resp. ed. M.S. Lyubushkin; Military. team acad. anti-aircraft. defense. Department number 12. - Kalinin: [b. and.], 1971. - 446 p.

7. Голик Ф.В. Аппроксимация эмпирических распределений вероятностей полиномами Бернштейна. Журнал радиоэлектроники [электронный журнал]. 2018. №7.7. Golik F.V. Approximation of empirical probability distributions by Bernstein polynomials. Radio Electronics Journal [electronic journal]. 2018. No. 7.

Claims

The control device for the selection of means for parallelization of cyclic sections of the program code is implemented by the block of the logical inference machine (MLV), which performs the operation of choosing the optimal parallelization tool based on the input data in the form of the cycle number and the number of iterations in the cycle, obtained from the knowledge acquisition unit (KLU) and the array of resulting statistics optimal means of parallelizing a cycle obtained from a block of knowledge of the resulting statistics (BZRS), containing a block of the knowledge base of the resulting statistics (BZRS), a block of an inference machine (MLV), a block of working memory (BW), an explanation block (BO), a user interface (PI ), a block of knowledge acquisition (BPZ), where 3 output and 1 input of the BZRS block are connected to 1 input and 7 output, respectively, of the MLV unit, 8 and 6 outputs of which are connected to 1 inputs, respectively, of the BRP and BO blocks, the 3 output and 2 inputs of which are connected with 1 input and 3 output, respectively, of the PI block, 6 input and 5 output of which are the main external and the input and output of the device, and output 4 and input 2 of the PI unit are connected to the 3 input and 1 output, respectively, of the BPZ unit, the 2 output of which is connected to the 3 input of the MLV unit, characterized in that, according to the utility model, it additionally contains a test run unit (BTP) , a statistics storage unit (BCS), a unit for evaluating the cycle execution time by means of parallelization (BOV), a unit for determining the number of a parallelization tool (BON), a working memory unit (BDU), which additionally includes a block of initial data (BID), a block of program code ( BPK), a block of parallelization means (BSR), while the 2 and 3 outputs of the PDU block are connected to 2 and 1 inputs of the MLV and BTP blocks, respectively, and the input 4 of the PDU block is connected to the 3rd output of the BTP unit, the 2nd output of which is connected to 1 input of the block BCS, the 3 output of which is connected to 1 input of the BOV unit, and the 2 input of the BCS unit is connected to the 5 output of the MLV unit, which performs the operation of choosing the optimal parallelization tool based on the input data at input 3 in the form of a qi number kla and the number of iterations in the cycle obtained from the output of the 2 block of knowledge acquisition (BK) and the array of the resulting statistics of the optimal means of parallelizing the cycle obtained at the input 1 from the output of the 3 block of knowledge of the resulting statistics (BZRS), 4 output and 9 input of the MLV block are connected to 2 input and 3 output, respectively, of the BON block, 4 output and 1 input of which are connected to the 2 input and 2 output of the BZRS and BOV blocks, respectively, and the control inputs and outputs of the BID, BOD, BSR blocks, which are part of the BRP block, have the same numbers for each block.