KR20250069168A

KR20250069168A - Apparatus for distributed scheduling of analytics workflow and method thereof

Info

Publication number: KR20250069168A
Application number: KR1020230155538A
Authority: KR
Inventors: 손시운; 원희선
Original assignee: 한국전자통신연구원
Priority date: 2023-11-10
Filing date: 2023-11-10
Publication date: 2025-05-19

Abstract

The present invention relates to a distributed scheduling device for an analysis workflow and a method thereof. The device comprises a communication module, a memory, and a processor connected to the communication module and the memory. The processor analyzes an analysis workflow received through the communication module to select at least one application task included in the analysis workflow, deploys each selected application task to a different cloud in consideration of the security information and movement cost of the target data that the task requires, and causes each application task to be executed in a distributed manner in the cloud to which it is deployed.

Description

{APPARATUS FOR DISTRIBUTED SCHEDULING OF ANALYTICS WORKFLOW AND METHOD THEREOF}

The present invention relates to a distributed scheduling device for an analysis workflow and a method thereof, and more specifically, to a distributed scheduling device and method that enable the application tasks of an analysis workflow to be processed in a distributed, parallel manner across a plurality of cloud environments.

Data generated by public bodies and individuals is vast and is managed by different organizations. When data management organizations make data available, a data user either downloads the data online from the organization or visits the organization in person to access it offline. In this process, moving large volumes of data may incur high network communication costs, and data security problems may arise.

An analysis workflow performs a series of operations on multiple data sets, such as refinement, processing, learning, analysis, visualization, and storage, and is composed of application tasks that access data and operate on it. Each application task must access data held by a different data management organization. If the analysis workflow is executed in a single location, however, network communication between the execution location and the data management organizations is unavoidable.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 10-2015-0017052.

An object of the present invention is to provide a distributed scheduling device for an analysis workflow, and a method thereof, that enable the application tasks of an analysis workflow to be processed in a distributed, parallel manner across a plurality of cloud environments.

A distributed scheduling device for an analysis workflow according to some embodiments of the present invention comprises a communication module, a memory, and a processor connected to the communication module and the memory, wherein the processor analyzes an analysis workflow received through the communication module to select at least one application task included in the analysis workflow, deploys each selected application task to a different cloud in consideration of the security information and movement cost of the target data required for that task, and causes each application task to be executed in a distributed manner in the cloud to which it is deployed.

In some embodiments of the present invention, the processor includes: a parser that analyzes at least one of the plurality of application tasks included in the analysis workflow, the input data information of each application task, and the computing resources required by each application task; a target data query unit that queries target data information, including security information, of the target data required for each application task; a cloud resource query unit that queries available clouds based on the computing resources required by each application task; a scheduler that selects, from among the queried clouds, an optimal cloud for each application task in consideration of the security information and movement cost of the target data; and an executor that requests execution of each application task on the selected optimal cloud.

In some embodiments of the present invention, the parser determines, through syntax analysis of the analysis workflow, whether the analysis workflow is valid. If the analysis workflow is valid, the parser identifies the application tasks to be executed in it; if the analysis workflow is invalid, the parser terminates it and waits for the next analysis workflow.

In some embodiments of the present invention, the target data query unit queries the data management organization of the target data, based on the input data information of the application task, to confirm whether the target data is accessible, and queries target data information including the security information and access information of the target data.

In some embodiments of the present invention, the scheduler extracts, from among the queried clouds, at least one cloud that is permitted to access the target data according to its security information, and selects, from among the extracted clouds, the cloud with the lowest movement cost for the target data as the optimal cloud.
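
The two-stage selection described above (filter by security information, then minimize movement cost) can be illustrated with a minimal Python sketch. The `Cloud` and `TargetData` types, the `operator`, `owner`, and `exportable` fields, and the rule that non-exportable data is accessible only from a cloud operated by its owner are all assumptions made for illustration; the source does not specify how security information is encoded.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Cloud:
    name: str
    operator: str  # hypothetical: organization that runs this cloud


@dataclass(frozen=True)
class TargetData:
    name: str
    owner: str        # data management organization holding the data
    exportable: bool  # hypothetical, simplified form of the security information


def select_optimal_cloud(
    data: TargetData,
    candidates: list[Cloud],
    movement_cost: Callable[[TargetData, Cloud], float],
) -> Optional[Cloud]:
    """Filter candidate clouds by security, then pick the cheapest one."""
    # Stage 1: keep only clouds that may access the data (assumed rule).
    accessible = [
        c for c in candidates
        if data.exportable or c.operator == data.owner
    ]
    if not accessible:
        return None  # no candidate satisfies the security constraint
    # Stage 2: choose the accessible cloud with the lowest movement cost.
    return min(accessible, key=lambda c: movement_cost(data, c))
```

Returning `None` when no cloud passes the security filter is a design choice of this sketch; the source does not say how that case is handled.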

In some embodiments of the present invention, the movement cost of the target data is a value calculated based on at least one of the size of the target data and the network distance.
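
The source states only that the movement cost depends on at least one of data size and network distance; it gives no formula. As one possible illustration, the sketch below assumes the cost is the product of the two, with `size_gb` and `network_distance` (for example a hop count or a latency figure) as hypothetical parameter names.

```python
def movement_cost(size_gb: float, network_distance: float = 1.0) -> float:
    """Hypothetical movement cost: data size scaled by network distance.

    A distance of 0 models data that already resides in the candidate cloud.
    The product form is an assumption, not the formula of the source.
    """
    if size_gb < 0 or network_distance < 0:
        raise ValueError("size and distance must be non-negative")
    return size_gb * network_distance
```

Under this assumption, a cloud co-located with the data always has zero cost, so it wins the minimum-cost selection whenever the security information allows it.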

In some embodiments of the present invention, the executor monitors the status of each application task once the task has started executing normally.

A distributed scheduling method for an analysis workflow according to some embodiments of the present invention includes: a step in which a processor analyzes an analysis workflow received from a client and selects at least one application task included in the analysis workflow; a step in which the processor places each selected application task in a different cloud in consideration of the security information and movement cost of the target data required for that task; and a step in which the processor controls each application task to be executed in a distributed manner in the cloud in which it is placed.

In some embodiments of the present invention, the step of selecting at least one application task includes: a step in which the processor identifies at least one of the plurality of application tasks included in the analysis workflow, the input data information of each application task, and the computing resources required by each application task; a step in which the processor verifies each application task to determine whether the analysis workflow is valid; and, if the analysis workflow is valid, a step in which the processor identifies the application tasks to be executed in the analysis workflow.

In some embodiments of the present invention, the step of placing each application task in a different cloud includes: a step in which the processor queries target data information, including security information, of the target data required for each application task; a step in which the processor queries available clouds based on the computing resources required by each application task; and a step in which the processor selects, from among the queried clouds, an optimal cloud for each application task in consideration of the security information and movement cost of the target data.

In some embodiments of the present invention, in the step of querying the target data information, the processor queries the data management organization of the target data, based on the input data information of the application task, to confirm whether the target data is accessible, and queries target data information including the security information and access information of the target data.

In some embodiments of the present invention, in the step of selecting the optimal cloud for each application task, the processor extracts, from among the queried clouds, at least one cloud that is permitted to access the target data according to its security information, and selects, from among the extracted clouds, the cloud with the lowest movement cost as the optimal cloud.

In some embodiments of the present invention, the movement cost of the target data is a value calculated based on at least one of the size of the target data and the network distance.

In some embodiments of the present invention, in the step of controlling the distributed execution, the processor monitors the status of each application task once the task has started executing normally.

A distributed scheduling method for an analysis workflow according to some embodiments of the present invention includes: a step in which a processor analyzes an analysis workflow received from a client and selects at least one application task included in the analysis workflow; a step in which the processor queries information on the target data required for each selected application task; a step in which the processor queries available clouds based on the computing resources required by each application task; a step in which the processor places each application task in a different cloud from among the queried clouds in consideration of the security information and movement cost of the target data; and a step in which the processor controls each application task to be executed in a distributed manner in the cloud in which it is placed.

In some embodiments of the present invention, the step of selecting at least one application task includes: a step in which the processor identifies at least one of the plurality of application tasks included in the analysis workflow, the input data information of each application task, and the computing resources required by each application task; a step in which the processor verifies each application task to determine whether the analysis workflow is valid; and, if the analysis workflow is valid, a step in which the processor identifies the application tasks to be executed in the analysis workflow.

In some embodiments of the present invention, in the step of querying information on the target data, the processor queries the data management organization of the target data, based on the input data information of the application task, to confirm whether the target data is accessible, and queries target data information including the security information and access information of the target data.

In some embodiments of the present invention, in the step of placing each application task in a different cloud, the processor extracts, from among the queried clouds, at least one cloud that is permitted to access the target data according to its security information, and selects, from among the extracted clouds, the cloud with the lowest movement cost as the optimal cloud.

In some embodiments of the present invention, the movement cost is a value calculated based on at least one of the size of the target data and the network distance.

The distributed scheduling device and method for an analysis workflow according to the present invention deploy the application tasks of an analysis workflow to different clouds and execute them there, taking into account the security information and movement cost of each task's target data, and thereby enable the application tasks to be processed in a distributed, parallel manner.

The distributed scheduling device and method for an analysis workflow according to the present invention also enable an analysis workflow to be executed efficiently in various cloud environments while jointly using data from different data management organizations.

FIG. 1 is a block diagram illustrating a system for providing a distributed scheduling service for an analysis workflow according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a distributed scheduling device for an analysis workflow according to an embodiment of the present invention.
FIG. 3 is a block diagram for explaining the functions of the processor illustrated in FIG. 2.
FIG. 4 is a flowchart illustrating a distributed scheduling method for an analysis workflow according to an embodiment of the present invention.

Hereinafter, embodiments of the distributed scheduling device and method for an analysis workflow according to the present invention are described with reference to the accompanying drawings. The thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation. In addition, the terms used below are defined in consideration of their functions in the present invention and may vary depending on the intention or practice of a user or operator. The definitions of these terms should therefore be based on the content of this specification as a whole.

Recently, with advances in cloud technology, individuals and companies can use the technology of cloud service providers instead of computing servers that they build themselves. Alternatively, an individual or company can build its own private cloud. A cloud environment allows virtual computing resources to be created and allocated to users.

Some data management organizations build cloud environments so that data-related work can be performed where the data is provided. Because the data then does not leave the data management organization, network communication costs are low and security problems are reduced. However, if data is supplied by only a single data management organization, data from other data management organizations cannot be used in the work.

Therefore, a distributed scheduling technology is needed that allows application tasks to be deployed and executed in cloud environments trusted by the organizations that manage the target data required for each application task of the analysis workflow.

Accordingly, the present invention proposes a distributed scheduling device and method for an analysis workflow in which data managed at geographically different locations is processed in a distributed, parallel manner by a plurality of application tasks.

The present invention relates to a distributed scheduling device and method for an analysis workflow that allow an analysis workflow, which supports the analysis of data managed at geographically different locations, to be executed in a distributed manner using the computing resources of a plurality of cloud environments.

Specifically, the present invention relates to a scheduling device and method that deploy the application tasks of an analysis workflow efficiently to cloud environments, in consideration of the security information and movement cost of the target data, so that the tasks can be processed in a distributed, parallel manner.

FIG. 1 is a block diagram illustrating a system for providing a distributed scheduling service for an analysis workflow according to an embodiment of the present invention.

Referring to FIG. 1, a system for providing a distributed scheduling service for an analysis workflow according to an embodiment of the present invention includes a client (100), a distributed scheduling device for an analysis workflow (200, hereinafter referred to as the 'distributed scheduling device'), a plurality of data management organizations (300), and a cloud service provider (400).

The client (100) can transmit an analysis workflow service request to the distributed scheduling device (200). The analysis workflow service request can include, among other things, the application tasks that make up the analysis workflow.

The client (100) may be a wired or wireless communication terminal, device, system, or the like used by a user who wishes to receive the distributed scheduling service for an analysis workflow.

The distributed scheduling device (200) analyzes the analysis workflow service request received from the client (100) to select at least one application task included in the analysis workflow, deploys each selected application task to a different cloud in consideration of the security information and movement cost of the target data required for that task, and causes each application task to be executed in a distributed manner in the cloud to which it is deployed. In doing so, the distributed scheduling device (200) can query the data management organizations (300) about the target data required for each application task to obtain target data information such as security information and access information. The distributed scheduling device (200) can also query the cloud resources of the cloud service provider (400).
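
The overall flow described above (analyze the workflow, query the target data, query the clouds, choose a cloud, request execution) might be orchestrated roughly as in the following Python sketch. The function name `schedule_workflow`, its parameters, and the dictionary-based task representation are all hypothetical; each stage is injected as a callable so the sketch makes no claim about how the device implements it.

```python
from typing import Callable, Optional


def schedule_workflow(
    workflow: dict,
    parse: Callable[[dict], list[dict]],
    query_data: Callable[[dict], list[dict]],
    query_clouds: Callable[[dict], list[str]],
    choose_cloud: Callable[[list[dict], list[str]], Optional[str]],
    execute: Callable[[dict, str], None],
) -> dict[str, str]:
    """Run each stage in order and return a task-name -> cloud mapping."""
    placement: dict[str, str] = {}
    for task in parse(workflow):                 # select application tasks
        data_info = query_data(task)             # security and access information
        clouds = query_clouds(task)              # clouds with enough resources
        cloud = choose_cloud(data_info, clouds)  # security filter + lowest cost
        if cloud is None:
            raise RuntimeError(f"no eligible cloud for task {task['name']!r}")
        execute(task, cloud)                     # request execution on that cloud
        placement[task["name"]] = cloud
    return placement
```

This sketch processes tasks one after another and does not itself enforce that tasks land on different clouds; placement is left entirely to the injected `choose_cloud` callable.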

The distributed scheduling device (200) is described in detail with reference to FIG. 2.

The plurality of data management organizations (300) are organizations that manage the data required by the application tasks of the analysis workflow, and each data management organization (300) can manage different data at a different location.

The cloud service provider (400) may refer to any entity that provides cloud resources, such as a cloud provider, a data management organization (300), or a private cloud.

The cloud service provider (400) may be connected to the data management organizations (300), and some of the data management organizations (300) may themselves provide cloud services.

Although FIG. 1 illustrates only one cloud service provider (400), a plurality of cloud service providers (400) may exist.

FIG. 2 is a block diagram illustrating a distributed scheduling device for an analysis workflow according to an embodiment of the present invention, and FIG. 3 is a block diagram for explaining the functions of the processor illustrated in FIG. 2.

Referring to FIG. 2, the distributed scheduling device (200) for an analysis workflow according to an embodiment of the present invention includes a communication module (210), a memory (220), and a processor (230).

The communication module (210) is a component for communicating with the client (100), the data management organizations (300), the cloud service provider (400), and the like over a communication network, and can transmit and receive various information such as analysis workflow service requests. The communication module (210) can be implemented in various forms, such as a short-range communication module, a wireless communication module, a mobile communication module, or a wired communication module.

The memory (220) is a component that stores data related to the operation of the distributed scheduling device (200). In particular, the memory (220) may store an application (a program or applet) that deploys the application tasks of an analysis workflow efficiently to cloud environments, in consideration of the security and movement cost of the target data, so that they can be processed in a distributed, parallel manner, and the stored information may be selected by the processor (230) as needed. That is, the memory (220) stores the various kinds of data generated while the operating system that drives the distributed scheduling device (200) or the application (program or applet) is running. Here, the memory (220) refers collectively to nonvolatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices, which require power to retain stored information.

The processor (230) may be implemented as at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a microcontroller, and/or a microprocessor.

The processor (230) analyzes the analysis workflow received through the communication module (210) to select at least one application task included in the analysis workflow, deploys each selected application task to a different cloud in consideration of the security information and movement cost of the target data required for that task, and causes each application task to be executed in a distributed manner in the cloud to which it is deployed.

As illustrated in FIG. 3, the processor (230) includes a parser (231), a target data query unit (232), a cloud resource query unit (233), a scheduler (234), and an executor (235).

The parser (231) can analyze and verify the analysis workflow received from the client (100).

An analysis workflow can be defined in various structured formats such as JSON or YAML, and it contains a number of ordered application tasks that receive and process data. Accordingly, the parser (231) can interpret the input data information of the application tasks included in the analysis workflow and verify the ordering and execution rules among the application tasks.
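
As an illustration of such a structured definition, the following Python sketch loads a small JSON workflow. The schema (the `tasks`, `inputs`, `resources`, and `depends_on` keys) and the server names are hypothetical; the source names JSON and YAML as possible formats but does not define a schema.

```python
import json

# Hypothetical workflow definition: two ordered application tasks, each with
# its input data reference and required computing resources.
WORKFLOW_JSON = """
{
  "name": "example-workflow",
  "tasks": [
    {"name": "preprocess",
     "inputs": [{"server": "data.org1.example", "path": "/records.csv"}],
     "resources": {"cpu": 2, "memory_gb": 4},
     "depends_on": []},
    {"name": "train",
     "inputs": [{"server": "data.org2.example", "path": "/labels.csv"}],
     "resources": {"cpu": 8, "memory_gb": 32},
     "depends_on": ["preprocess"]}
  ]
}
"""

workflow = json.loads(WORKFLOW_JSON)

# The task list preserves the order given in the definition.
task_names = [task["name"] for task in workflow["tasks"]]
```

Here each task reads from a different server, which mirrors the case where the application tasks need data held by different data management organizations.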

Specifically, the parser (231) can analyze the plurality of application tasks included in the analysis workflow, the input data information of each application task, the computing resources required by each application task, and so on. Here, the input data information includes input values and data. An input value is a value needed for processing, such as a number or a character, and the data may include access information (such as a server address, name, and path).

In addition, the parser (231) can determine whether the analysis workflow is valid through syntax analysis of the analysis workflow. To do so, the parser (231) checks whether the analysis workflow is defined in accordance with a predefined format and checks the ordering and execution rules among the application tasks. For example, the parser (231) can determine whether the analysis workflow is valid by checking whether the nodes in the analysis workflow and the relationships between them are properly connected.
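
The validity checks described above could take many forms. The sketch below assumes a hypothetical schema in which a workflow is a dictionary with a `tasks` list and each task names its predecessors in `depends_on`; it checks the format, checks that every dependency refers to an existing task, and rejects dependency cycles. None of these specific rules is stated in the source.

```python
def validate_workflow(workflow: dict) -> list[str]:
    """Return a list of problems; an empty list means the workflow is valid."""
    tasks = workflow.get("tasks")
    if (not isinstance(tasks, list) or not tasks
            or not all(isinstance(t, dict) for t in tasks)):
        return ["workflow must contain a non-empty 'tasks' list of objects"]

    # Format check: every task needs a unique string name.
    names = [t.get("name") for t in tasks]
    if not all(isinstance(n, str) for n in names) or len(set(names)) != len(names):
        return ["every task needs a unique string 'name'"]

    # Connectivity check: every dependency must point at a known task.
    deps = {t["name"]: set(t.get("depends_on", [])) for t in tasks}
    errors = [
        f"task {name!r} depends on unknown task {parent!r}"
        for name, parents in deps.items()
        for parent in sorted(parents)
        if parent not in deps
    ]
    if errors:
        return errors

    # Ordering check: repeatedly remove tasks with no unmet dependencies;
    # if none can be removed while tasks remain, there is a cycle.
    remaining = {name: set(parents) for name, parents in deps.items()}
    while remaining:
        ready = [name for name, parents in remaining.items() if not parents]
        if not ready:
            errors.append("dependency cycle among: " + ", ".join(sorted(remaining)))
            break
        for name in ready:
            del remaining[name]
        for parents in remaining.values():
            parents.difference_update(ready)
    return errors
```

Returning a list of messages rather than a boolean is a design choice of this sketch; it lets the caller report why a workflow was rejected before waiting for the next one.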

If the analysis workflow is invalid, the parser (231) can terminate the current analysis workflow and wait for the next one.

If the analysis workflow is valid, the parser (231) can select the application tasks to be executed in the analysis workflow.

The target data query unit (232) can query information on the target data required for each application task selected by the parser (231). That is, it can query target data information that includes the security information and access information of the target data. Here, the security information denotes a security level and can indicate whether or not the data may be exported. If the security information indicates that export is not permitted, the target data can be operated on (managed) only within the corresponding data management organization (300).

Because an application task may require zero or more items of target data, the target data query unit (232) can query the security information (security scope), access information (managed location), and the like of each item. Based on the input data information of the application task, the target data query unit (232) queries the data management organization (300) responsible for the target data to confirm that the data is accessible, and retrieves target data information including its security information and access information.
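The record returned by such a lookup could be modeled as below. This is only a sketch: the class and field names are hypothetical, the two security levels mirror the "export not permitted" and "export permitted" cases in the text, and the in-memory `catalog` stands in for a query to the data management organization (300).

```python
from dataclasses import dataclass
from enum import Enum

class Security(Enum):
    NON_EXPORTABLE = "non_exportable"  # data must stay at its managing organization
    EXPORTABLE = "exportable"          # data may be moved to another cloud

@dataclass(frozen=True)
class TargetDataInfo:
    name: str
    security: Security
    location: str   # managing organization, i.e. where the data resides
    size_gb: float

def lookup_target_data(catalog, requested_names):
    """Return info for each requested item; raise if an item is not accessible."""
    found = []
    for name in requested_names:
        if name not in catalog:
            raise LookupError(f"target data not accessible: {name}")
        found.append(catalog[name])
    return found

catalog = {
    "records": TargetDataInfo("records", Security.NON_EXPORTABLE, "org-a", 100.0),
}
print(lookup_target_data(catalog, ["records"]))
```

A task that needs no target data simply yields an empty list.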

The cloud resource query unit (233) can query the available clouds based on the required computing resources of each application task. In doing so, it can query accessible cloud resources such as those of a cloud service provider (400), a data management organization (300), and a private cloud.

Because an application task may require computing resources such as CPU and memory when executed, the cloud resource query unit (233) can query the clouds that are able to provide the required computing resources. Here, the required computing resources can include CPU processing capacity, memory capacity, and the like.
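The resource check can be pictured as a simple filter over the known clouds. The sketch below assumes each cloud reports its free CPU cores and memory; the dictionary keys and cloud names are hypothetical.

```python
def clouds_meeting_requirements(clouds, required):
    """Keep the clouds whose free CPU and memory cover the task's requirement."""
    return [
        c["name"] for c in clouds
        if c["free_cpu_cores"] >= required["cpu_cores"]
        and c["free_memory_gb"] >= required["memory_gb"]
    ]

clouds = [
    {"name": "provider-400", "free_cpu_cores": 32, "free_memory_gb": 128},
    {"name": "org-300", "free_cpu_cores": 4, "free_memory_gb": 8},
    {"name": "private", "free_cpu_cores": 8, "free_memory_gb": 32},
]
print(clouds_meeting_requirements(clouds, {"cpu_cores": 8, "memory_gb": 16}))
```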

From among the clouds returned by the cloud resource query unit (233), the scheduler (234) can place each application task in a different cloud, taking into account the security information and movement cost of the target data required for each application task. Here, the movement cost is the cost of moving the target data to a cloud. It can be calculated based on at least one of the size of the target data and the network distance, and the network distance can be calculated using, for example, the hop count.
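The specification names the inputs to the movement cost (data size and network distance) but not how they are combined. The sketch below assumes one possible model, size multiplied by hop count; the function name and the formula are illustrative assumptions only.

```python
def movement_cost(size_gb, hop_count):
    """Assumed cost model: data size weighted by network distance in hops.

    A hop count of 0 means the data already resides in the candidate cloud,
    so nothing has to be moved and the cost is zero.
    """
    return float(size_gb * hop_count)

print(movement_cost(100, 3))
```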

The scheduler (234) can extract, from among the queried clouds, at least one cloud that is accessible under the security information of the target data, and select from the extracted clouds the one with the lowest movement cost for the target data as the optimal cloud.

For example, if the security level is high (that is, the target data cannot be exported (moved)), the scheduler (234) can select the optimal cloud from among the clouds of the data management organization (300). If the target data cannot be moved and the cloud of the corresponding data management organization (300) cannot be used, the application task cannot be executed and may remain in an error state. If the security level is low (that is, the target data can be exported (moved)), the scheduler (234) can select the cloud with the lowest movement cost from among the queried clouds.
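The two-stage selection described above (filter by security, then minimize movement cost) can be sketched as follows. The data layout, the `cost_to` callback, and the use of `None` for the error state are assumptions made for illustration.

```python
def select_optimal_cloud(candidates, data_items, cost_to):
    """
    candidates: cloud names that already satisfy the task's resource requirement
    data_items: list of dicts with 'owner_cloud' and 'movable'
    cost_to:    function(cloud, item) -> movement cost (assumed to be given)
    """
    # Stage 1: a cloud is allowed only if every non-movable item already lives there.
    allowed = [
        c for c in candidates
        if all(item["movable"] or item["owner_cloud"] == c for item in data_items)
    ]
    if not allowed:
        return None  # the task cannot be placed and stays in an error state
    # Stage 2: among the allowed clouds, pick the lowest total movement cost.
    return min(allowed, key=lambda c: sum(cost_to(c, item) for item in data_items))

items = [{"owner_cloud": "org-a", "movable": False}]
cost = lambda cloud, item: 0 if cloud == item["owner_cloud"] else 10
print(select_optimal_cloud(["public", "org-a"], items, cost))
```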

As a further example, consider a case where first target data of 100 GB is located at data management organization 1, second target data of 200 GB is located at data management organization 2, both are movable, and both must be processed in one place. In this case, the scheduler (234) determines that the movement cost of the smaller first target data is lower than that of the second target data, and can therefore move the first target data.
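The arithmetic behind this example can be checked with a few lines, assuming the network distance between the two organizations is the same in both directions so that cost depends only on size. The site labels `org1` and `org2` are shorthand for data management organizations 1 and 2.

```python
sizes_gb = {"org1": 100, "org2": 200}  # where each data set resides and its size

def total_moved(run_at):
    # Everything not already at the execution site has to be moved there.
    return sum(size for site, size in sizes_gb.items() if site != run_at)

best_site = min(sizes_gb, key=total_moved)
print(best_site, total_moved(best_site))  # org2 100
```

Running at organization 2 moves only the 100 GB data set, whereas running at organization 1 would move 200 GB.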

The executor (235) can request execution of each application task on the optimal cloud selected by the scheduler (234) and monitor the status of the application task.

That is, once an application task is ready to run, the executor (235) can deploy it to the selected optimal cloud and execute it there. When the application tasks run normally, the executor (235) monitors their status to check whether each has finished, then finds and executes the application tasks that become executable next. When all application tasks have finished, the executor (235) can terminate the analysis workflow.
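The executor's behavior can be approximated by a loop that repeatedly runs every task whose predecessors have finished. The sketch below is synchronous for brevity, whereas an actual executor would monitor remotely running tasks; the `execute` callback is a hypothetical stand-in for the request sent to the selected cloud.

```python
def run_workflow(deps, execute):
    """
    deps:    task name -> set of task names that must finish first
    execute: callable that deploys and runs one task, returning when it finishes
    """
    finished, order = set(), []
    while len(finished) < len(deps):
        # Tasks are executable once all of their predecessors have finished.
        ready = [t for t in deps if t not in finished and deps[t] <= finished]
        if not ready:
            raise RuntimeError("no executable task left; workflow is stuck")
        for task in ready:
            execute(task)
            finished.add(task)
            order.append(task)
    return order  # all tasks finished, so the workflow ends

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(run_workflow(deps, lambda task: None))
```

In this example, `b` and `c` become executable together once `a` finishes, which is where distributed, parallel execution would apply.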

The distributed scheduling device (200) configured as described above deploys each application task of the analysis workflow efficiently to a cloud environment in consideration of the security information and movement cost of the target data, so that the tasks can be processed in a distributed, parallel manner.

FIG. 4 is a flowchart illustrating a distributed scheduling method for an analysis workflow according to an embodiment of the present invention.

Referring to FIG. 4, when an analysis workflow is received from a client (100) (S402), the distributed scheduling device (200) parses the analysis workflow (S404) and determines whether it is valid (S406). That is, the distributed scheduling device (200) can parse the analysis workflow to identify the plurality of application tasks it contains, the input data information of each application task, the required computing resources of each application task, and the like. The distributed scheduling device (200) can then verify the order and execution rules among the application tasks to determine whether the analysis workflow is valid. At this point, the distributed scheduling device (200) checks whether the analysis workflow is one that the scheduler (234) can interpret; if it is not valid, the device terminates the analysis workflow and waits for the next one.

After step S406 is performed, the distributed scheduling device (200) checks whether the analysis workflow contains executable application tasks (S408) and selects an executable application task (S410).

After step S410 is performed, the distributed scheduling device (200) queries information on the target data required for the selected application task (S412). Here, the distributed scheduling device (200) can query the data management organization (300) about the target data required for the selected application task to confirm that the data is accessible. The distributed scheduling device (200) can also query target data information including the security information (security scope), access information (managed location), and the like of each item of target data.

After step S412 is performed, the distributed scheduling device (200) queries the available clouds based on the required computing resources of each application task (S414). Because an application task may require computing resources such as CPU and memory when executed, the distributed scheduling device (200) can query the clouds that are able to provide the required computing resources.

After step S414 is performed, the distributed scheduling device (200) selects, from among the queried clouds, the optimal cloud for each application task in consideration of the security information and movement cost of the target data required for that task (S416). That is, the distributed scheduling device (200) can extract, from among the queried clouds, at least one cloud that is accessible under the security information of the target data, and select from the extracted clouds the one with the lowest movement cost for the target data as the optimal cloud.

After step S416 is performed, the distributed scheduling device (200) deploys the application task to the selected optimal cloud and executes it (S418). That is, once the application task is ready to run, the distributed scheduling device (200) can deploy it to the selected optimal cloud and execute it there.

After step S418 is performed, the distributed scheduling device (200) monitors the status of the application task and determines whether the application task has finished (S420).

If it is determined in step S420 that the application task has finished, the distributed scheduling device (200) returns to step S408.

In addition, the distributed scheduling device and method for an analysis workflow according to the present invention have the effect of allowing an analysis workflow to be executed efficiently in various cloud environments by using data from different data management organizations together.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the technical scope of protection of the present invention should be defined by the claims below.

100 : Client
200 : Distributed scheduling device for an analysis workflow
210 : Communication module
220 : Memory
230 : Processor
231 : Parser
232 : Target data query unit
233 : Cloud resource query unit
234 : Scheduler
235 : Executor
300 : Data management organization
400 : Cloud service provider

Claims

A distributed scheduling device for an analysis workflow, comprising:
a communication module;
a memory; and
a processor connected to the communication module and the memory,
wherein the processor analyzes an analysis workflow received through the communication module to select at least one application task included in the analysis workflow, deploys each application task to a different cloud in consideration of security information and a movement cost of target data required for each selected application task, and causes each application task to be executed in a distributed manner in the cloud to which it is deployed.

The distributed scheduling device of claim 1,
wherein the processor comprises:
a parser that analyzes at least one of a plurality of application tasks included in the analysis workflow, input data information of each application task, and required computing resources of each application task;
a target data query unit that queries target data information including the security information of the target data required for each application task;
a cloud resource query unit that queries available clouds based on the required computing resources of each application task;
a scheduler that selects, from among the queried clouds, an optimal cloud for each application task in consideration of the security information and the movement cost of the target data; and
an executor that requests execution of each application task on the selected optimal cloud.

The distributed scheduling device of claim 1,
wherein the parser
determines whether the analysis workflow is valid by parsing the analysis workflow, identifies the application tasks to be executed in the analysis workflow if it is valid, and terminates the analysis workflow and waits for a next analysis workflow if it is invalid.

The distributed scheduling device of claim 1,
wherein the target data query unit
queries, based on the input data information of the application task, the data management organization of the target data to confirm that the target data is accessible, and queries target data information including the security information and access information of the target data.

The distributed scheduling device of claim 1,
wherein the scheduler
extracts, from among the queried clouds, at least one cloud that is accessible under the security information of the target data, and selects, from among the at least one extracted cloud, the cloud with the lowest movement cost of the target data as the optimal cloud.

The distributed scheduling device of claim 5,
wherein the movement cost of the target data
is calculated based on at least one of a size of the target data and a network distance.

The distributed scheduling device of claim 1,
wherein the executor
monitors the status of each application task when each application task is executed normally.

A method for distributed scheduling of an analysis workflow, comprising:
analyzing, by a processor, an analysis workflow received from a client and selecting at least one application task included in the analysis workflow;
deploying, by the processor, each application task to a different cloud in consideration of security information and a movement cost of target data required for each selected application task; and
controlling, by the processor, each application task to be executed in a distributed manner in the cloud to which it is deployed.

The method of claim 8,
wherein selecting the at least one application task comprises:
identifying, by the processor, at least one of a plurality of application tasks included in the analysis workflow, input data information of each application task, and required computing resources of each application task;
verifying, by the processor, each application task to determine whether the analysis workflow is valid; and
identifying, by the processor, the application tasks to be executed in the analysis workflow if the analysis workflow is valid.

The method of claim 8,
wherein deploying each application task to a different cloud comprises:
querying, by the processor, target data information including the security information of the target data required for each application task;
querying, by the processor, available clouds based on the required computing resources of each application task; and
selecting, by the processor, from among the queried clouds, an optimal cloud for each application task in consideration of the security information and the movement cost of the target data.

The method of claim 10,
wherein, in querying the target data information,
the processor queries, based on the input data information of the application task, the data management organization of the target data to confirm that the target data is accessible, and queries target data information including the security information and access information of the target data.

The method of claim 11,
wherein, in selecting the optimal cloud for each application task,
the processor extracts, from among the queried clouds, at least one cloud that is accessible under the security information of the target data, and selects, from among the at least one extracted cloud, the cloud with the lowest movement cost as the optimal cloud.

The method of claim 12,
wherein the movement cost of the target data
is calculated based on at least one of a size of the target data and a network distance.

The method of claim 8,
wherein, in controlling the distributed execution,
the processor monitors the status of each application task when each application task is executed normally.

A method for distributed scheduling of an analysis workflow, comprising:
analyzing, by a processor, an analysis workflow received from a client and selecting at least one application task included in the analysis workflow;
querying, by the processor, information on target data required for each selected application task;
querying, by the processor, available clouds based on required computing resources of each application task;
deploying, by the processor, each application task to a different cloud from among the queried clouds in consideration of security information and a movement cost of the target data; and
controlling, by the processor, each application task to be executed in a distributed manner in the cloud to which it is deployed.

The method of claim 15,
wherein selecting the at least one application task comprises:
identifying, by the processor, at least one of a plurality of application tasks included in the analysis workflow, input data information of each application task, and required computing resources of each application task;
verifying, by the processor, each application task to determine whether the analysis workflow is valid; and
identifying, by the processor, the application tasks to be executed in the analysis workflow if the analysis workflow is valid.

The method of claim 15,
wherein, in querying the information on the target data,
the processor queries, based on the input data information of the application task, the data management organization of the target data to confirm that the target data is accessible, and queries target data information including the security information and access information of the target data.

The method of claim 15,
wherein, in deploying each application task to a different cloud,
the processor extracts, from among the queried clouds, at least one cloud that is accessible under the security information of the target data, and selects, from among the at least one extracted cloud, the cloud with the lowest movement cost as an optimal cloud.

The method of claim 18,
wherein the movement cost
is calculated based on at least one of a size of the target data and a network distance.

The method of claim 15,
wherein, in controlling the distributed execution,
the processor monitors the status of each application task when each application task is executed normally.