
WO2022109932A1 - Multi-task submission system based on slurm computing platform - Google Patents

Multi-task submission system based on slurm computing platform Download PDF

Info

Publication number
WO2022109932A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
computing
module
slurm
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/131819
Other languages
French (fr)
Chinese (zh)
Inventor
张楠
蒋瑞
康晓琦
马健
温书豪
赖力鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jingtai Technology Co Ltd
Original Assignee
Shenzhen Jingtai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jingtai Technology Co Ltd filed Critical Shenzhen Jingtai Technology Co Ltd
Priority to PCT/CN2020/131819 priority Critical patent/WO2022109932A1/en
Publication of WO2022109932A1 publication Critical patent/WO2022109932A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt


Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A multi-task submission system based on the Slurm computing platform comprises a running environment deployment module, a computing task scheduling plug-in module, a computing task running data storage management module, a unified API interface module, and a unified task data management module. The system offers a simple, convenient task submission and query API that can easily be plugged in to serve task submission and monitoring. A NAS is used to manage the task running environment and task running data, which simplifies users' data management operations and makes data preparation and inspection efficient; the system can also easily be switched to an SGE cluster or a cloud computing scheduler with an auxiliary storage service.

Description

Multi-task submission system based on the Slurm computing platform

Technical Field

The invention belongs to the technical field of computer servers, and in particular relates to a multi-task submission system based on the Slurm computing platform.

Background Art

Slurm is a task scheduling tool for Linux and Unix kernel systems. It provides three key functions: first, it allocates exclusive or non-exclusive resources (compute nodes) to users for a period of time so that they can perform work; second, it provides a framework for starting, executing, and monitoring tasks (usually parallel tasks, such as MPI jobs) running on the allocated nodes; third, it allocates resources to the task queues appropriately.

Python is an object-oriented, dynamic programming language with a concise and clear syntax, suitable for a wide range of high-level tasks. It can be used both for rapid script development and for building large-scale software. Using Python for computational scripting is convenient and efficient.

Conda is an open-source, cross-platform, language-agnostic package and environment management system, released by Continuum Analytics under the BSD license. Conda lets users easily install different versions of binary packages together with all the libraries the computing platform needs, switch between different versions of packages, and download and install packages from a software repository. Conda is developed in Python but can manage projects in other programming languages (such as R), including multilingual projects. For Python packages, Conda is similar to other Python-based cross-platform package managers (such as pip).

The running environment of a task submitted to Slurm is determined by the environment of the compute node itself, while the actual environment a user's task requires often includes special libraries and software. Users usually submit only a bash script to Slurm; inside that script they invoke the actual Python script (or a script in another language such as R) and specify its interpreter.

This approach is in itself sufficient for users to run massively parallel computations with Slurm, but it is not efficient, mainly in the following respects:

1. The computational logic that is actually valuable to users is usually defined in non-bash scripts (such as Python or R). To run large batches of computing tasks on a Slurm cluster, however, users must spend extra time writing and debugging the supporting bash scripts, which costs considerable effort and time in the long run.

2. When using Slurm directly, users must manage task data (input, output, and error files) themselves, which adds to the preparation work of using Slurm.

3. Slurm runs each task on an independent compute node. Users must manually prepare the computing environment for every task and explicitly specify it each time a task is submitted.

Summary of the Invention

In view of the above technical problems, the purpose of the present invention is to provide a multi-task submission system based on the Slurm computing platform. The system uses a server to submit and manage tasks and integrates, on the server, the data preparation and environment preparation required by different tasks, so that users can quickly submit computing tasks to the Slurm cluster through the same interface with different parameters.

To achieve the above object, the present invention provides the following technical solutions:

A multi-task submission system based on the Slurm computing platform includes a running environment deployment module, a computing task scheduling plug-in module, a computing task running data storage management module, a unified API interface module, and a unified task data management module.

Specifically, in the running environment deployment module, for computation scripts written in Python, the user deploys the environment with Conda: prepare a path on the NAS, install Conda there, create an env and install all dependency packages, and finally provide the Python interpreter under that env.

For scripts written in R or other programming languages, the corresponding package management tool is used to prepare the task running environment, and the interpreter path is provided.

In short, the environment must be deployed on the NAS, and the interpreter path must be accessible and callable from the Slurm cluster.

In the computing task scheduling plug-in module, the Slurm commands users commonly rely on include sbatch, squeue, sacct, scontrol, and scancel, used respectively to submit tasks, view the task queue, view task running status, view and rerun tasks, and cancel tasks.

The computing task scheduling plug-in module mainly encapsulates these commands into high-level interfaces: submit, get_job_status, batch_get_jobs_status, get_job_detail, rerun_job, kill_job, dump_job, and so on.

In the computing task running data storage management module, the user only needs to provide a task script (only a single file is supported) and input data defined in JSON format. The dump_file interface copies the script to the task run path, and dump_json writes the JSON data to the corresponding inputs.json file. The script must reload inputs.json at run time to obtain the task-specified input. The output and error information of the task run, including the script's own output and the output defined by Slurm, are placed in the task run path so they can be inspected during and after the run.
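The data flow above can be sketched in Python. The names dump_file and dump_json follow the interfaces named in the text; the function bodies and the run-path layout are illustrative assumptions:

```python
import json
import os
import shutil

def dump_file(script_path, run_dir):
    """Copy the single task script into the task run path."""
    os.makedirs(run_dir, exist_ok=True)
    shutil.copy(script_path, run_dir)

def dump_json(inputs, run_dir):
    """Write the JSON-format input data to the run path's inputs.json."""
    with open(os.path.join(run_dir, "inputs.json"), "w") as f:
        json.dump(inputs, f)

def load_inputs(run_dir):
    """What the task script does at run time: reload inputs.json."""
    with open(os.path.join(run_dir, "inputs.json")) as f:
        return json.load(f)
```

The task script then calls something like load_inputs at startup instead of parsing command-line arguments, which is what lets every task be submitted through the same interface with different parameters.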

The unified API interface module uniformly encapsulates the scheduling and storage management APIs for computing tasks, reducing the complexity users face and letting them submit tasks quickly through a single API.

In the unified task data management module, all inputs and outputs are defined under the Log directory allocated on the NAS disk, and an independent ID is created for each task. Based on this ID, users can conveniently access all of a task's data through the API.

Compared with the prior art, the beneficial effects of the present invention are:

1. A simple task submission and query API that can easily be plugged into task submission and monitoring services.

2. The NAS manages both the task running environment and the task running data, simplifying users' data management operations and making data preparation, collection, and inspection efficient.

3. NAS + Slurm is the plug-in's built-in usage scheme, but based on the implementation logic of the present invention, users can easily switch to an SGE cluster or a cloud computing scheduler together with a complementary storage service.

Description of Drawings

Fig. 1 is a structural diagram of the plug-in architecture design of the present invention;

Fig. 2 is a flow chart of task management in the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Embodiment 1

The plug-in architecture design of the present invention is shown in Fig. 1, and the specific task management process is shown in Fig. 2.

1. Initializing the Environment

1. First, create a directory with exclusive permissions on the NAS cluster, such as "/mydir/". All submitted task data, including inputs and outputs, algorithm data, the task running environment, and task run logs, will be placed under this directory.

(1) Create a conda path under the directory for installing the Conda environment;

(2) Create a scripts path under the directory for holding algorithm scripts;

(3) Create a log path under the directory, similar to a sandbox environment, for holding computation runtime data (including logs).

2. In the created directory, install Conda following the installation method in the official documentation.

3. Enter the Conda environment and, since different algorithms have different environment dependencies, evaluate whether separate environments are needed; use the "conda create" command to create the corresponding algorithm environment, referring to the documentation for the creation method.

Record the resulting running environment; it will later serve as task submission configuration information.
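The initialization steps above can be sketched as a dry-run shell script. The NAS root, env name ("myalgo"), and Python version are illustrative assumptions, and the real installation commands are echoed rather than executed:

```shell
#!/bin/sh
NAS_ROOT="/mydir"                       # exclusive-permission NAS directory
CONDA_DIR="$NAS_ROOT/conda"             # step (1): conda installation path
ENV_NAME="myalgo"                       # one env per algorithm, if needed

# Step 2: install Conda under the NAS path (command shown, not run here)
echo "bash Miniconda3-latest-Linux-x86_64.sh -b -p $CONDA_DIR"
# Step 3: create the algorithm env with its dependencies
echo "$CONDA_DIR/bin/conda create -y -n $ENV_NAME python=3.9"
# Record this interpreter path as task submission configuration:
echo "$CONDA_DIR/envs/$ENV_NAME/bin/python"
```

The last echoed path is the interpreter that later tasks must be able to reach from every Slurm compute node, which is why the whole tree lives on the NAS.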

2. Writing the Task Delivery Script

1. Write a shell script for task delivery that performs the sbatch operation. The fixed variables include:

(1) The name of the algorithm file to execute (a .py file), such as "loader.py";

(2) The name of the input file the algorithm reads (a JSON-format file), such as "inputs.json";

(3) Other required environment variable information.

Note: unified environment variables can be passed separately from the inputs, but they must be used with caution, because mishandled environment variables can affect the default environment.
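A minimal example of such a delivery script, assuming the environment from the initialization step and the file names above ("loader.py", "inputs.json"); the resource lines and paths are illustrative, not prescribed by the text:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=1024M
#SBATCH --requeue                 # needed later so rerun_job can requeue

# Fixed variables: interpreter, algorithm file, and its JSON input file
PYTHON=/mydir/conda/envs/myalgo/bin/python
ALGO=loader.py
INPUTS=inputs.json

cd "$SLURM_SUBMIT_DIR"            # run inside the task run path
exec "$PYTHON" "$ALGO" "$INPUTS"
```

Because the interpreter path points into the NAS-hosted Conda env, the same script works on any compute node without per-node environment setup.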

3. Encapsulating the Slurm Commands

1. Encapsulate the sbatch command as the submit interface: wrap sbatch's parameters, including CPU and memory information, as parameters of submit, and add a separate inputs={} parameter for passing the input information in JSON format (a dict in Python). The interface must do the following:

(1) Create a directory "/log/{id}" in the log path, named by the task ID;

(2) Copy the specified algorithm into "/log/{id}";

(3) Dump the inputs data into the input file.
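Steps (1) to (3) can be sketched as follows. The submit name and the /log/{id} layout come from the text; the ID scheme, the default resource values, and returning the sbatch command instead of executing it are assumptions of this sketch:

```python
import json
import os
import shutil
import uuid

def submit(script, inputs=None, cpus=1, mem_mb=1024, log_root="/mydir/log"):
    """Wrap sbatch: prepare /log/{id} and build the submission command."""
    task_id = uuid.uuid4().hex                      # independent per-task ID
    run_dir = os.path.join(log_root, task_id)
    os.makedirs(run_dir)                            # (1) create /log/{id}
    shutil.copy(script, run_dir)                    # (2) copy the algorithm
    with open(os.path.join(run_dir, "inputs.json"), "w") as f:
        json.dump(inputs or {}, f)                  # (3) dump the inputs
    cmd = ["sbatch", f"--cpus-per-task={cpus}", f"--mem={mem_mb}M",
           "--requeue", os.path.join(run_dir, os.path.basename(script))]
    return task_id, cmd   # a real implementation would run cmd via subprocess
```

Returning the task ID to the caller is what later lets all of a task's data be located under the Log directory through that single identifier.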

2. Encapsulate the squeue and sacct commands as the get_job_status and batch_get_jobs_status interfaces. squeue is used to view tasks waiting in the queue, while sacct shows the status of tasks that have started running, including whether they succeeded or failed after completion. batch_get_jobs_status views the status of multiple tasks in batches.
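A sketch of the status mapping behind get_job_status; the sacct invocation itself is not shown, only the parsing of its State column, and the coarse status names are assumptions of this sketch:

```python
def parse_sacct_states(sacct_output):
    """Reduce sacct's per-step State lines to one coarse task status."""
    states = [line.split()[0]
              for line in sacct_output.splitlines() if line.strip()]
    if not states:
        return "pending"            # sacct knows nothing yet: still queued
    if any(s.startswith(("FAILED", "CANCELLED", "TIMEOUT")) for s in states):
        return "failed"
    if all(s == "COMPLETED" for s in states):
        return "success"
    return "running"
```

batch_get_jobs_status would simply apply this mapping over one sacct call covering many job IDs rather than one call per task.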

3. Encapsulate "scontrol show" as the get_job_detail interface. For tasks that have started running or have finished, sacct can obtain the task run details; according to user needs, these details are organized and returned as get_job_detail's result.

4. Encapsulate the "scontrol requeue" command as the rerun_job interface. Note that tasks that support rerunning must be submitted with the "--requeue" parameter added to sbatch.

5. Encapsulate the scancel command as the kill_job interface, used to cancel tasks.

Separately encapsulate the dump_jobs interface, which makes the user-accessible output data under the "log/{id}" directory (such as result data and run logs) available for the user to download.
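One possible shape of the dump_jobs interface; packaging the run directory into a tar archive for download is an assumption of this sketch, not specified in the text:

```python
import os
import tarfile

def dump_jobs(task_id, log_root="/mydir/log"):
    """Bundle the user-accessible files under log/{id} for download."""
    run_dir = os.path.join(log_root, task_id)
    archive = os.path.join(run_dir, f"{task_id}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for name in sorted(os.listdir(run_dir)):
            if name != os.path.basename(archive):   # skip the archive itself
                tar.add(os.path.join(run_dir, name), arcname=name)
    return archive
```

A back-end HTTP service could then stream this archive to the client, so users never touch the NAS paths directly.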

Embodiment 2: Application Scenario and Effects

For engineers who use a Slurm cluster to deliver computing tasks, managing cluster data is tedious. In a real scenario, a complete back-end service application is built on this plug-in: the application uses the present invention for task delivery and task input/output management, while exposing an algorithm-business HTTP API to the outside. Business engineers can use a client application connected to the back-end service to submit algorithm tasks anytime and anywhere, without managing how the tasks run on the Slurm cluster or where the computation inputs and outputs are stored; they only need to wait for the task results. The client calls the back-end API to obtain the results of Slurm cluster tasks, so the success or failure of each task is known, the results of successful tasks can be downloaded locally, and the logs of failed tasks can be viewed conveniently. All of this is wrapped by the client into a higher-level usage mode closer to the algorithm business, while the back-end service performs the actual conversion of algorithm tasks into Slurm tasks and the actual access to NAS storage. The present invention provides this conversion service in the process.

Observation since the product launched shows that users can clearly manage their own specific computing tasks, allocating CPU and memory for them, without paying attention to how the Slurm cluster schedules work or how computation results are stored; some users can complete task delivery and result collection well without understanding Slurm clusters at all.

Claims (2)

1. A multi-task submission system based on the Slurm computing platform, characterized in that it comprises a running environment deployment module, a computing task scheduling plug-in module, a computing task running data storage management module, a unified API interface module, and a unified task data management module.

2. The multi-task submission system based on the Slurm computing platform according to claim 1, characterized in that:

in the running environment deployment module, for computation scripts written in Python, the user deploys the environment with Conda: a path is prepared on the NAS to install Conda, an env is created and all dependency packages are installed, and finally the Python interpreter under that env is provided; for scripts written in R or other programming languages, the corresponding package management tool is used to prepare the task running environment and the interpreter path is provided; in all cases, the environment is deployed on the NAS and the interpreter path is accessible and callable by the Slurm cluster;

in the computing task scheduling plug-in module, the commonly used Slurm commands sbatch, squeue, sacct, scontrol, and scancel, used respectively to submit tasks, view the task queue, view task running status, view and rerun tasks, and cancel tasks, are encapsulated to form the high-level interfaces submit, get_job_status, batch_get_jobs_status, get_job_detail, rerun_job, kill_job, and dump_job;

in the computing task running data storage management module, the user only needs to provide one task script and one set of input data defined in JSON format; the dump_file interface copies the script to the task run path, and dump_json writes the JSON-format data to the corresponding inputs.json file; the script reloads inputs.json at run time to obtain the task-specified input; the output and error information of the task run, including the script's output and the output defined by Slurm, are placed in the task run path so that they can be viewed during and after the run;

the unified API interface module uniformly encapsulates the scheduling and storage management APIs of computing tasks, reducing the complexity users must understand and allowing tasks to be submitted quickly through the API interface;

in the unified task data management module, all inputs and outputs are defined under the Log directory allocated on the NAS disk, and an independent ID is created for each task; based on this ID, users can conveniently access all data of each task through the API.
PCT/CN2020/131819 2020-11-26 2020-11-26 Multi-task submission system based on slurm computing platform Ceased WO2022109932A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/131819 WO2022109932A1 (en) 2020-11-26 2020-11-26 Multi-task submission system based on slurm computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/131819 WO2022109932A1 (en) 2020-11-26 2020-11-26 Multi-task submission system based on slurm computing platform

Publications (1)

Publication Number Publication Date
WO2022109932A1 true WO2022109932A1 (en) 2022-06-02

Family

ID=81755095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131819 Ceased WO2022109932A1 (en) 2020-11-26 2020-11-26 Multi-task submission system based on slurm computing platform

Country Status (1)

Country Link
WO (1) WO2022109932A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629382A (en) * 2023-05-29 2023-08-22 上海和今信息科技有限公司 Method for docking HPC cluster by machine learning platform based on Kubernetes, and corresponding device and system
CN117724796A (en) * 2024-02-07 2024-03-19 国家超级计算天津中心 Supercomputer control method, device, equipment and medium based on tool stack

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064742A (en) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 Automatic deployment system and method of hadoop cluster
CN103593192A (en) * 2013-11-19 2014-02-19 湖南大学 Algorithm integration and evaluation platform and method based on SLURM scheduling
CN104506620A (en) * 2014-12-23 2015-04-08 西安电子科技大学 Extensible automatic computing service platform and construction method for same
US20180285093A1 (en) * 2010-02-25 2018-10-04 Microsoft Technology Licensing, Llc Automated deployment and servicing of distributed applications
CN109491674A (en) * 2018-11-07 2019-03-19 李斌 A kind of method and system of the automatic deployment service in Kubernetes cluster
CN110177020A (en) * 2019-06-18 2019-08-27 北京计算机技术及应用研究所 A kind of High-Performance Computing Cluster management method based on Slurm
CN111212116A (en) * 2019-12-24 2020-05-29 湖南舜康信息技术有限公司 High-performance computing cluster creating method and system based on container cloud


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629382A (en) * 2023-05-29 2023-08-22 上海和今信息科技有限公司 Method for docking HPC cluster by machine learning platform based on Kubernetes, and corresponding device and system
CN116629382B (en) * 2023-05-29 2024-01-02 上海和今信息科技有限公司 Method, device and system for docking HPC cluster by machine learning platform based on Kubernetes
CN117724796A (en) * 2024-02-07 2024-03-19 国家超级计算天津中心 Supercomputer control method, device, equipment and medium based on tool stack
CN117724796B (en) * 2024-02-07 2024-07-16 国家超级计算天津中心 Super computer control method, device, equipment and medium based on tool stack

Similar Documents

Publication Publication Date Title
US8621419B2 (en) Automating the life cycle of a distributed computing application
US9830135B2 (en) Declarative and pluggable business logic for systems management
CN111625316A (en) Environment deployment method and device, electronic equipment and storage medium
CN103793259B (en) Virtual device generating and deploying method
US10656971B2 (en) Agile framework for vertical application development and delivery
US9524179B2 (en) Virtual-machine-deployment-action analysis
Lin et al. ABS-YARN: A formal framework for modeling Hadoop YARN clusters
US11650810B1 (en) Annotation based automated containerization
US10977007B2 (en) Apparatus and method for executing function
CN110494849B (en) System and method for determining success of cross-platform application migration
US9904574B2 (en) Parallel computing without requiring antecedent code deployment
Wozniak et al. MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows
US9626251B2 (en) Undo configuration transactional compensation
US20180276079A1 (en) System and method for determining the success of a cross-platform application migration
WO2022109932A1 (en) Multi-task submission system based on slurm computing platform
CN117519842A (en) A local deployment method for serverless function flow
Zhang et al. A low-code development framework for cloud-native edge systems
CN120066949B (en) Compatibility testing methods, related devices and media
US10990357B2 (en) Application build automation
CN112445595B (en) Multitask submission system based on slurm computing platform
CN119902857A (en) Cloud native distributed database upgrade method, device, equipment, medium and product
CN118092950A (en) Heterogeneous supercomputing-oriented application software packaging method and system
CN118605888A (en) MLOps extension method, device and medium in a multi-cloud native environment
US20230244533A1 (en) Methods and apparatus to asynchronously monitor provisioning tasks
Warrender et al. Job scheduling in a high performance computing environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962823

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.10.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20962823

Country of ref document: EP

Kind code of ref document: A1