[go: up one dir, main page]

CN114428710B - Queue status verification method, device and electronic equipment - Google Patents

Queue status verification method, device and electronic equipment Download PDF

Info

Publication number
CN114428710B
CN114428710B CN202210083037.XA CN202210083037A CN114428710B CN 114428710 B CN114428710 B CN 114428710B CN 202210083037 A CN202210083037 A CN 202210083037A CN 114428710 B CN114428710 B CN 114428710B
Authority
CN
China
Prior art keywords
queue
task
configuration
configuration file
status
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210083037.XA
Other languages
Chinese (zh)
Other versions
CN114428710A (en
Inventor
胡永泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210083037.XA priority Critical patent/CN114428710B/en
Publication of CN114428710A publication Critical patent/CN114428710A/en
Application granted granted Critical
Publication of CN114428710B publication Critical patent/CN114428710B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

本发明实施例涉及一种队列状态校验方法、装置及电子设备,该方法包括:获取待执行的任务;从任务中提取执行任务的队列对应的队列标识信息;根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项;根据配置文件和配置项,获取待执行任务的队列;根据配置文件,验证队列的状态,以便后续确定队列的状态正常时,利用队列执行任务。执行任务之前,首先验证队列状态是否正常。验证过程本身占用少量时间,避免在执行任务,并在执行任务失败后,才获知队列不正确的情况发生。队列任务的执行通常占用很多时间,通过该方式,可以大大节省在队列状态异常情况下执行任务所占用的时间,提升工作效率。

The embodiment of the present invention relates to a queue status verification method, device and electronic device, the method comprising: obtaining a task to be executed; extracting queue identification information corresponding to the queue for executing the task from the task; obtaining a configuration file and configuration items corresponding to the queue identification information from a pre-built database according to the queue identification information; obtaining the queue of the task to be executed according to the configuration file and the configuration items; verifying the status of the queue according to the configuration file, so that when the status of the queue is determined to be normal later, the queue is used to execute the task. Before executing the task, first verify whether the queue status is normal. The verification process itself takes a small amount of time, avoiding the situation where the queue is not known to be incorrect until the task is executed and the task fails. The execution of queue tasks usually takes a lot of time. In this way, the time taken to execute tasks under abnormal queue status can be greatly saved, thereby improving work efficiency.

Description

队列状态校验方法、装置及电子设备Queue status verification method, device and electronic equipment

技术领域Technical Field

本发明实施例涉及计算机技术领域,尤其涉及一种队列状态校验方法、装置及电子设备。The embodiments of the present invention relate to the field of computer technology, and in particular to a queue status verification method, device and electronic device.

背景技术Background Art

DolphinScheduler是一个分布式易扩展的可视化DAG工作流任务调度系统。为Apache的一个顶级项目。在现有的DolphinScheduler框架中,可以按照顺序调度Flink、MR、Spark任务,这些任务需要使用到Yarn队列。然而现有的DolphinScheduler框架中对Yarn队列的管理中只会保存一个Yarn队列名。执行任务时,根据yarn队列名执行将任务提交到队列时,或者提交任务到相应队列后,很可能发生队列不存在、Yarn执行不正常等情况的发生,而这些情况只能在提示任务执行失败后,才能意识到队列存在故障,白白浪费很多时间。DolphinScheduler is a distributed and scalable visual DAG workflow task scheduling system. It is a top-level project of Apache. In the existing DolphinScheduler framework, Flink, MR, and Spark tasks can be scheduled in sequence, and these tasks require the use of Yarn queues. However, in the existing DolphinScheduler framework, only one Yarn queue name is saved in the management of Yarn queues. When executing tasks, when submitting tasks to the queue according to the yarn queue name, or after submitting tasks to the corresponding queue, it is very likely that the queue does not exist or Yarn executes abnormally. In these cases, you can only realize that there is a fault in the queue after the task execution fails, which wastes a lot of time.

发明内容Summary of the invention

本申请提供了一种队列状态校验方法、装置及电子设备,以解决现有技术中DolphinScheduler框架中对Yarn队列的管理中只会保存一个Yarn队列名,进而引出上述所介绍的一系列技术问题。The present application provides a queue status verification method, device and electronic device to solve the problem that only one Yarn queue name is saved in the management of the Yarn queue in the DolphinScheduler framework in the prior art, which leads to a series of technical problems introduced above.

第一方面,本申请提供了一种队列状态校验方法,该方法包括:In a first aspect, the present application provides a queue status verification method, the method comprising:

获取待执行的任务;Get the tasks to be executed;

从任务中提取执行任务的队列对应的队列标识信息;Extract the queue identification information corresponding to the queue that executes the task from the task;

根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项;According to the queue identification information, obtain the configuration file and configuration items corresponding to the queue identification information from the pre-built database;

根据配置文件和配置项,获取待执行任务的队列;Get the queue of tasks to be executed according to the configuration files and configuration items;

根据配置文件,验证队列的状态,以便后续确定队列的状态正常时,利用队列执行任务。Verify the status of the queue according to the configuration file so that when the status of the queue is determined to be normal, the queue can be used to execute tasks.

第二方面,本申请提供了一种队列状态校验装置,该装置包括:In a second aspect, the present application provides a queue status verification device, the device comprising:

获取模块,用于获取待执行的任务;The acquisition module is used to obtain the tasks to be executed;

提取模块,用于从任务中提取执行任务的队列对应的队列标识信息;An extraction module, used to extract queue identification information corresponding to the queue that executes the task from the task;

处理模块,用于根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项;根据配置文件和配置项,获取待执行任务的队列;The processing module is used to obtain the configuration file and configuration items corresponding to the queue identification information from a pre-built database according to the queue identification information; and obtain the queue of the task to be executed according to the configuration file and configuration items;

验证模块,用于根据配置文件,验证队列的状态,以便后续确定队列的状态正常时,利用队列执行任务。The verification module is used to verify the state of the queue according to the configuration file, so as to use the queue to execute tasks when the state of the queue is determined to be normal.

第三方面,提供了一种电子设备,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;In a third aspect, an electronic device is provided, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

存储器,用于存放计算机程序;Memory, used to store computer programs;

处理器,用于执行存储器上所存放的程序时,实现第一方面任一项实施例的队列状态校验方法的步骤。The processor is used to implement the steps of the queue status verification method of any embodiment of the first aspect when executing the program stored in the memory.

第四方面,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现如第一方面任一项实施例的队列状态校验方法的步骤。In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the steps of the queue status verification method as in any embodiment of the first aspect are implemented.

本申请实施例提供的上述技术方案与现有技术相比具有如下优点:The above technical solution provided by the embodiment of the present application has the following advantages compared with the prior art:

本申请实施例提供的该方法,获取待执行的任务,从任务中提取与执行该任务的队列对应的队列标识信息。根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项。根据配置文件和配置项,从预构建的队列库中提取队列,然后根据配置文件,验证队列的状态。进而在确定队列状态正常时,再利用该队列执行任务。也即是,执行任务之前,首先验证队列状态是否正常。验证过程本身占用少量时间,避免在执行任务,并在执行任务失败后,才获知队列不正确的情况发生。队列任务的执行通常占用很多时间,通过该方式,可以大大节省在队列状态异常情况下执行任务所占用的时间,提升工作效率。The method provided in the embodiment of the present application obtains the task to be executed, and extracts the queue identification information corresponding to the queue executing the task from the task. According to the queue identification information, the configuration file and configuration items corresponding to the queue identification information are obtained from the pre-built database. According to the configuration file and the configuration item, the queue is extracted from the pre-built queue library, and then the state of the queue is verified according to the configuration file. Then, when it is determined that the queue state is normal, the queue is used to execute the task. That is, before executing the task, first verify whether the queue state is normal. The verification process itself takes a small amount of time, avoiding the situation where the queue is not correct after the task is executed and the task fails. The execution of queue tasks usually takes a lot of time. In this way, the time taken to execute tasks in the case of abnormal queue status can be greatly saved, thereby improving work efficiency.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明实施例提供的一种队列状态校验方法流程示意图;FIG1 is a schematic flow chart of a queue status verification method provided by an embodiment of the present invention;

图2为本发明提供的DolphinScheduler的系统架构图;FIG2 is a system architecture diagram of DolphinScheduler provided by the present invention;

图3为本发明提供的一种DAG图;FIG3 is a DAG diagram provided by the present invention;

图4为本发明提供的一种对队列的状态进行校验的具体执行方法流程示意图;FIG4 is a schematic flow chart of a specific execution method for checking the state of a queue provided by the present invention;

图5为本发明提供的另一种对队列的状态进行校验的具体执行方法流程示意图;FIG5 is a flow chart of another specific execution method for checking the state of a queue provided by the present invention;

图6为本发明提供的另一种对队列的状态进行校验的具体执行方法流程示意图;FIG6 is a flowchart of another specific execution method for checking the state of a queue provided by the present invention;

图7为本发明提供的另一种对队列的状态进行校验的具体执行方法流程示意图;FIG7 is a flowchart of another specific execution method for checking the state of a queue provided by the present invention;

图8为本发明提供的一种队列状态校验方法的整体流程示意图;FIG8 is a schematic diagram of the overall flow of a queue status verification method provided by the present invention;

图9为本发明提供的从资源中心上传yarn的配置文件到运行任务完成,结果上报的整个流程方法流程示意图;FIG9 is a flow chart of the entire process method provided by the present invention, from uploading the configuration file of YARN to completing the task and reporting the result;

图10为本发明实施例提供的一种队列状态校验装置的结构示意图;FIG10 is a schematic structural diagram of a queue status verification device provided by an embodiment of the present invention;

图11为本发明实施例提供一种电子设备的结构示意图。FIG. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to make the purpose, technical solution and advantages of the embodiments of the present invention clearer, the technical solution in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

为便于对本发明实施例的理解,下面将结合附图以具体实施例做进一步的解释说明,实施例并不构成对本发明实施例的限定。To facilitate understanding of the embodiments of the present invention, specific embodiments will be further explained below in conjunction with the accompanying drawings. The embodiments do not constitute a limitation on the embodiments of the present invention.

图1为本发明实施例提供的一种队列状态校验方法流程示意图,在介绍该方法步骤之前,首先介绍该方法适用的一种应用环境,以及对应的一种系统架构图。如图2所示,图2为本发明实施例对应的系统架构图。应用环境可以是DolphinScheduler中对工作流进行调度工作的情景。在工作流中的任务执行之前,可以执行本申请实施例中的方法,用以验证执行工作流中的任务队列的状态。FIG. 1 is a flow chart of a queue status verification method provided by an embodiment of the present invention. Before introducing the steps of the method, an application environment applicable to the method and a corresponding system architecture diagram are first introduced. As shown in FIG. 2 , FIG. 2 is a system architecture diagram corresponding to an embodiment of the present invention. The application environment can be a scenario in which a workflow is scheduled in DolphinScheduler. Before executing a task in a workflow, the method in the embodiment of the present application can be executed to verify the status of the task queue in the execution workflow.

DolphinScheduler是一个分布式易扩展的可视化DAG工作流任务调度系统。为Apache的一个顶级项目。图2示意出了DolphinScheduler的系统架构图。DolphinScheduler is a distributed and scalable visual DAG workflow task scheduling system. It is a top-level project of Apache. Figure 2 shows the system architecture of DolphinScheduler.

其中,UI和restful API Server的功能主要在界面体现,即在显示界面用户可以配置工作流,以及上传与工作流中的任务对应的资源文件到文件系统,文件系统例如可以是分布式文件系统(Hadoop Distributed File System,简称HDFS)或简单存储服务器(Simple Storage Service,简称S3)等。还包括将工作流元数据加入到database中,以及查询task的日志信息等。其中,task日志信息是通过ZK集群(ZK Cluster)监听worker集群中worker的工作状态获取。具体参见图2所示,ZK集群用于完成worker的服务注册、通过心跳或者其他监听手段监听worker的工作状态,包括查询task的日志信息等工作。同样的,ZK集群还用于完成master集群中master的服务注册、通过心跳或者其他监听手段监听master的服务状态等。Master集群中的master通过数据监听方式监听database中的数据变化,用以监听到有新的任务时,执行对有向无环图DAG工作流中的任务切分。然后将任务依次分发到worker集群中的某个worker上。Among them, the functions of UI and restful API Server are mainly reflected in the interface, that is, in the display interface, users can configure the workflow and upload the resource files corresponding to the tasks in the workflow to the file system. The file system can be, for example, a distributed file system (Hadoop Distributed File System, referred to as HDFS) or a simple storage server (Simple Storage Service, referred to as S3). It also includes adding workflow metadata to the database and querying the log information of the task. Among them, the task log information is obtained by monitoring the working status of the worker in the worker cluster through the ZK cluster (ZK Cluster). As shown in Figure 2, the ZK cluster is used to complete the service registration of the worker, monitor the working status of the worker through heartbeat or other monitoring means, including querying the log information of the task. Similarly, the ZK cluster is also used to complete the service registration of the master in the master cluster, monitor the service status of the master through heartbeat or other monitoring means, etc. The master in the Master cluster monitors the data changes in the database through data monitoring, and when it monitors that there is a new task, it executes the task segmentation in the directed acyclic graph DAG workflow. Then the tasks are distributed to a worker in the worker cluster in sequence.

被分配任务的worker则下载与任务对应的配置文件,并完成相应的任务执行工作。下载配置文件,则通过文件系统中下载(图中示意的是下载资源文件)。具体的,worker可执行的任务的类型包括但不限于图中所示的任务类型,例如sqoop、Flume、Shell、Python、MR、Spark、SQL、Flink等等。The worker assigned the task downloads the configuration file corresponding to the task and completes the corresponding task execution work. The configuration file is downloaded from the file system (the figure shows the download of resource files). Specifically, the types of tasks that can be executed by the worker include but are not limited to the task types shown in the figure, such as sqoop, Flume, Shell, Python, MR, Spark, SQL, Flink, etc.

图3中示意出了一种DAG图,master在拆分DAG工作流流中的任务时,可根据界面预先生成的任务执行配置文件(该文件中包括任务执行指向顺序,例如先执行A,然后执行B和C,最后执行D;或者,文件中包括个任务之间的依赖关系,例如执行任务B之前,首先要执行任务A,执行任务D之前,首先要执行任务B和C等),通过这样的执行配置文件,完成对任务的切割工作。然后,先下发A任务到worker集群中随机选中的worker,当worker完成工作,生成响应信息到master,master再将任务B和任务C随机下发到至少一个(可能是两个)worker中,完成任务后,最终再下发任务D到随机的worker中。FIG3 illustrates a DAG graph. When the master splits the tasks in the DAG workflow, it can perform the task splitting work according to the task execution configuration file pre-generated in the interface (the file includes the task execution order, such as executing A first, then executing B and C, and finally executing D; or, the file includes the dependency relationship between tasks, such as before executing task B, first executing task A, before executing task D, first executing tasks B and C, etc.). Then, first send task A to a randomly selected worker in the worker cluster. When the worker completes the work and generates a response message to the master, the master then randomly sends task B and task C to at least one (possibly two) workers. After completing the tasks, task D is finally sent to a random worker.

Worke从文件系统中下载执行任务需要的配置文件,执行具体的任务,监控任务的状态并向Master汇报。Worker downloads the configuration files required to execute tasks from the file system, executes specific tasks, monitors the status of tasks and reports to the Master.

DolphinScheduler支持在集群中执行Flink、MR、Spark等各种任务。通过上述系统架构,可以定时按照DAG的顺序调度监控管理Flink、MR、Spark等各种类型的任务,降低了大数据开发的运维成本。而且,还可以通过可视化拖拉拽的方式配置DAG图与其中的任务信息,方便易用。DolphinScheduler supports executing various tasks such as Flink, MR, Spark, etc. in the cluster. Through the above system architecture, various types of tasks such as Flink, MR, Spark, etc. can be scheduled, monitored and managed in the order of DAG, reducing the operation and maintenance costs of big data development. Moreover, the DAG graph and the task information therein can be configured by visual drag and drop, which is convenient and easy to use.

而本方法步骤,则是由上述系统架构中的worker执行,具体的方法步骤参见如下,包括:The steps of this method are executed by the worker in the above system architecture. The specific steps of the method are as follows, including:

步骤110,获取待执行的任务。Step 110, obtaining the task to be executed.

也即是接收到master配置的任务。That is, the task that receives the master configuration.

步骤120,从任务中提取执行任务的队列对应的队列标识信息。Step 120: extracting queue identification information corresponding to the queue that executes the task from the task.

具体的,配置的任务中携带有执行该任务的队列对应的队列标识信息。Specifically, the configured task carries queue identification information corresponding to the queue that executes the task.

在一个可选的例子中,队列标识信息例如是队列的ID信息。In an optional example, the queue identification information is, for example, ID information of the queue.

步骤130,根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项。Step 130: According to the queue identification information, a configuration file and configuration items corresponding to the queue identification information are obtained from a pre-built database.

具体的,在一个可选的例子中,如上所介绍的,队列标识信息例如可以包括队列的ID信息。Specifically, in an optional example, as introduced above, the queue identification information may include, for example, ID information of the queue.

根据队列的ID信息,可以从预构建的队列库中,获取与队列标识信息对应的队列信息。在一个具体的例子中,预构建的队列库可以理解为图1中的database数据库。According to the ID information of the queue, the queue information corresponding to the queue identification information can be obtained from the pre-built queue library. In a specific example, the pre-built queue library can be understood as the database in FIG. 1 .

在获取到队列信息后,根据队列信息,从预构建的数据库中下载与队列信息对应的配置文件和配置项。After obtaining the queue information, the configuration files and configuration items corresponding to the queue information are downloaded from the pre-built database according to the queue information.

在一个可选的例子中,预构建的数据库可以理解为图1中的文件系统。In an optional example, the pre-built database can be understood as the file system in FIG. 1 .

实际上,在执行任务之前,本发明实施例还包括:将队列信息加入到预构建的队列库中。In fact, before executing the task, the embodiment of the present invention further includes: adding the queue information to a pre-built queue library.

在执行队列入库之前,首先要对队列进行校验,包括校验队列是否存在,队列状态是否正常等。如果队列不存在,则无法将队列信息加入到队列库。具体的校验过程,同本发明实施例即将介绍的对于队列的状态的验证过程,因此这个过程将在下文中做详细描述。Before executing queue storage, the queue must first be verified, including whether the queue exists, whether the queue status is normal, etc. If the queue does not exist, the queue information cannot be added to the queue library. The specific verification process is the same as the verification process for the queue status to be introduced in the embodiment of the present invention, so this process will be described in detail below.

在一个具体的例子中,队列可以为yarn队列,队列集群为yarn队列集群,简称yarn集群。In a specific example, the queue may be a yarn queue, and the queue cluster may be a yarn queue cluster, referred to as a yarn cluster.

在对队列校验成功后,则将队列入库,否则,可以生成队列入库失败的信息,并在界面进行显示。方便工作人员更改配置文件信息,或者重新上传一份新的配置信息,再次进行校验,直至校验成功,完成队列入库。After the queue is successfully verified, the queue will be stored. Otherwise, a message indicating that the queue has failed to be stored can be generated and displayed on the interface. This allows the staff to change the configuration file information or upload a new configuration information and verify again until the verification is successful and the queue is stored.

而队列库中,所包括的队列信息可以包括与队列ID对应的队列名称以及与队列名称对应的配置文件和指示项等对应的指示信息,这里的指示信息,例如可以是配置文件和指示项的存储路径等。The queue information included in the queue library may include a queue name corresponding to the queue ID and indication information corresponding to the configuration file and indication items corresponding to the queue name. The indication information here may be, for example, a storage path of the configuration file and indication items.

那么通过队列信息,就可以在预构建的数据库中找到存储对应配置文件的存储路径,并在存储路径中下载相应的配置文件,以及配置项。Then, through the queue information, you can find the storage path of the corresponding configuration file in the pre-built database, and download the corresponding configuration file and configuration items in the storage path.

步骤140,根据配置文件和配置项,获取待执行任务的队列。Step 140: Obtain a queue of tasks to be executed according to the configuration file and configuration items.

在一个可选的例子中,配置文件用于指示执行任务的队列集群,配置项用于指示队列集群中的队列。In an optional example, the configuration file is used to indicate a queue cluster for executing a task, and the configuration item is used to indicate a queue in the queue cluster.

在一个具体的例子中,配置文件中例如包括Yarn的主机IP、集群URL等内容。根据这些内容,可以获知yarn集群。指示项,则指示执行任务为yarn集群中的哪个队列。In a specific example, the configuration file includes the host IP and cluster URL of Yarn. Based on these contents, the Yarn cluster can be known. The indicator item indicates which queue in the Yarn cluster the task is to be executed.

因此,可以通过配置文件和配置项,获取待执行任务的队列。在一个可选的例子中,配置文件例如可以包括但不限于如下文件:core-site.xml、hdfs-site.xml、yarn-site.xml三个文件。Therefore, the queue of tasks to be executed can be obtained through the configuration file and the configuration item. In an optional example, the configuration file may include, but is not limited to, the following files: core-site.xml, hdfs-site.xml, and yarn-site.xml.

步骤150,根据配置文件,验证队列的状态。Step 150: Verify the state of the queue according to the configuration file.

在获取到执行任务对应的队列后,并不能够立即将任务提交到该队列,以执行任务。而是要事先验证队列的状态,只有在确定队列的状态为正常的情况下,才能够利用该队列执行任务,否则直接提示队列异常,以便工作人员及时进行处理。避免因为任务运行结束后才告知队列异常的情况发生,白白浪费大量运行时间。After obtaining the queue corresponding to the task to be executed, the task cannot be submitted to the queue immediately for execution. Instead, the queue status must be verified in advance. Only when the queue status is confirmed to be normal can the queue be used to execute the task. Otherwise, the queue will be directly prompted to be abnormal so that the staff can handle it in time. Avoid the situation where the queue abnormality is notified after the task is completed, which wastes a lot of running time.

可选的,在一个可执行的例子中,具体执行对队列的状态校验过程时,可以通过如下方式实现,具体参见图4所示,该方法包括:Optionally, in an executable example, when the queue status check process is specifically performed, it can be implemented in the following manner, as shown in FIG. 4, the method includes:

步骤410,解析配置文件,获取配置信息。Step 410: parse the configuration file to obtain configuration information.

步骤420,从配置信息中提取预设的第一配置参数。Step 420: extract a preset first configuration parameter from the configuration information.

步骤430,根据第一配置参数,执行与第一配置参数对应的操作指令,获取操作结果。Step 430: Execute the operation instruction corresponding to the first configuration parameter according to the first configuration parameter to obtain the operation result.

步骤440,根据操作结果,验证队列的状态。Step 440: Verify the state of the queue according to the operation result.

在一个可选的例子中,第一配置参数可以包括但不限于如下中的一种或多种:Yarn Resource Manager主机或Yarn URL地址。In an optional example, the first configuration parameter may include, but is not limited to, one or more of the following: a Yarn Resource Manager host or a Yarn URL address.

当第一配置参数为Yarn Resource Manager主机时,执行的操作指令为通过网络命令连接改该Yarn Resource Manager主机,获取连接结果。如果连接失败,则队列的状态更新为主机异常。否则,暂时可以确定队列的状态为正常。When the first configuration parameter is the Yarn Resource Manager host, the executed operation instruction is to connect to the Yarn Resource Manager host through a network command to obtain the connection result. If the connection fails, the queue status is updated to host abnormality. Otherwise, the queue status can be temporarily determined to be normal.

和/或,当第一配置参数为Yarn URL地址时,执行的操作指令可以包括:尝试构建连接Yarn URL地址,并获取连接结果。如果连接失败,则说明队列状态为服务异常,否则,暂时可以确定队列的状态为正常。And/or, when the first configuration parameter is a Yarn URL address, the executed operation instruction may include: attempting to build a connection to the Yarn URL address and obtaining a connection result. If the connection fails, it means that the queue status is a service exception, otherwise, it can be temporarily determined that the queue status is normal.

当然,如果第一配置参数同时包括多种时,只要任一种参数对应的执行结果异常,则说明队列异常。Of course, if the first configuration parameter includes multiple types at the same time, as long as the execution result corresponding to any one of the parameters is abnormal, it means that the queue is abnormal.

在另一个可选的例子中,根据配置文件,验证队列的状态,还可以包括如下方法步骤,具体参见图5所示,包括:In another optional example, verifying the state of the queue according to the configuration file may also include the following method steps, as shown in FIG5 , including:

步骤510,根据配置文件,构造队列客户端。Step 510: construct a queue client according to the configuration file.

步骤520,利用队列客户端,获取队列集群中的所有队列信息,以验证队列是否存在于队列集群中。Step 520: Using the queue client, obtain all queue information in the queue cluster to verify whether the queue exists in the queue cluster.

步骤530,当确定队列并不存在于队列集群中时,确定队列状态异常。Step 530: When it is determined that the queue does not exist in the queue cluster, it is determined that the queue state is abnormal.

或者,步骤540,当确定队列存在于队列集群中时,确定队列状态正常。Alternatively, in step 540, when it is determined that the queue exists in the queue cluster, it is determined that the queue state is normal.

具体的,根据配置文件,创建客户端。然后利用客户端,获取配置文件所指向的队列集群中所有的队列信息,通常可以以列表的形式体现。然后,确定指示项所指示的队列是否存在与队列列表中。如果存在,则可以确定队列状态为正常,否则,确定队列状态异常,队列状态可以更新为队列错误。Specifically, a client is created according to the configuration file. Then, the client is used to obtain all queue information in the queue cluster pointed to by the configuration file, which can usually be reflected in the form of a list. Then, it is determined whether the queue indicated by the indicator item exists in the queue list. If it exists, it can be determined that the queue status is normal. Otherwise, it is determined that the queue status is abnormal, and the queue status can be updated to a queue error.

进一步可选的,在确定队列存在于队列集群中后,该方法还可以包括如下方法步骤,用以验证队列的状态,具体参见图6所示,包括:Further optionally, after determining that the queue exists in the queue cluster, the method may further include the following method steps to verify the state of the queue, as shown in FIG. 6 , including:

步骤610,验证队列的提交任务权限。Step 610: Verify the task submission authority of the queue.

步骤620,当确定队列的提交任务权限不包括执行任务时,确定队列状态异常。Step 620: When it is determined that the task submission authority of the queue does not include task execution, it is determined that the queue state is abnormal.

步骤630,当确定队列的提交任务权限包括执行任务时,确定队列状态正常。Step 630: When it is determined that the task submission authority of the queue includes executing tasks, it is determined that the queue state is normal.

也即是,进一步通过客户端查看该队列的任务提交权限,如果没有该任务的提交权限,则队列的状态可以更新为权限不足,用以指示队列状态异常。否则,队里的状态可一个更新为权限充足,用以指示队列状态暂时正常。That is, the task submission permission of the queue is further checked through the client. If there is no submission permission for the task, the queue status can be updated to insufficient permission to indicate that the queue status is abnormal. Otherwise, the queue status can be updated to sufficient permission to indicate that the queue status is temporarily normal.

除了执行上述操作外,该方法还可以包括如下操作方式,用以进一步验证队列状态,具体参见图7所示,包括:In addition to performing the above operations, the method may also include the following operation modes to further verify the queue status, as shown in FIG. 7 , including:

步骤710,提交一个测试任务到队列,获取执行结果。Step 710: Submit a test task to the queue and obtain the execution result.

步骤720,根据执行结果,进一步验证队列的状态。Step 720: further verify the state of the queue according to the execution result.

具体的,提交测试任务到队列,获取执行结果可以包括但不限于如下几种情况,例如包括:Specifically, submitting a test task to a queue and obtaining execution results may include but are not limited to the following situations, for example:

第一种,任务运行失败,那么队列的状态可以更新为队列异常。First, if the task fails to run, the queue status can be updated to queue exception.

第二种,任务运行成功,但时间超过预设时间阈值,那么队列的状态更新为资源紧张。The second is that the task runs successfully, but the time exceeds the preset time threshold, then the queue status is updated to resource shortage.

第三种,任务运行成功,且时间小于或者等于预设时间阈值,队列的状态可以更新为资源充足,队列状态暂时正常。The third type is that the task runs successfully and the time is less than or equal to the preset time threshold. The queue status can be updated to sufficient resources and the queue status is temporarily normal.

图8示意出了上述验证过程的整体流程示意图,具体参见图8所示,具体执行过程已在上文做了详细介绍,因此这里不再过多赘述。FIG8 shows a schematic diagram of the overall flow of the above verification process. For details, please refer to FIG8 . The specific execution process has been introduced in detail above, so it will not be described in detail here.

而在队列状态异常的情况下,还包括对队列异常情况进行提示,以引导用户解决问题。In the case of abnormal queue status, it also includes prompts of the abnormal queue situation to guide users to solve the problem.

针对每一种异常状态,提示信息与具体流程参见如下:For each abnormal state, the prompt information and specific process are as follows:

主机异常:Host abnormality:

提示“Yarn主机无法连接,请检查网络情况、Yarn主机状态、配置文件”。弹出Yarn队列编辑框,用户可以修改Yarn队列信息文件,点击保存后,将会再次对队列进行校验,更新队列状态。The message "Yarn host cannot be connected, please check the network, Yarn host status, and configuration files" is displayed. The Yarn queue edit box pops up, and the user can modify the Yarn queue information file. After clicking Save, the queue will be verified again and the queue status will be updated.

服务异常:Service exception:

提示“Yarn服务异常,请检查Yarn服务状态、配置文件”。弹出Yarn队列编辑框,用户可以修改Yarn队列信息文件,点击保存后,将会再次对队列进行校验,更新队列状态。The message "Yarn service is abnormal, please check the Yarn service status and configuration files" is displayed. The Yarn queue editing box pops up, and the user can modify the Yarn queue information file. After clicking Save, the queue will be verified again and the queue status will be updated.

队列错误:Queue Error:

提示“Yarn队列不存在,请检查Yarn服务中检查队列是否存在、配置文件”。弹出Yarn队列编辑框,用户可以修改Yarn队列信息文件,点击保存后,将会再次对队列进行校验,更新队列状态。The message "Yarn queue does not exist, please check the configuration file in the Yarn service to see if the queue exists" is displayed. The Yarn queue edit box pops up, and the user can modify the Yarn queue information file. After clicking Save, the queue will be verified again and the queue status will be updated.

权限不足:Insufficient permissions:

提示“Yarn队列权限不足,请检查Yarn服务中检查队列权限是否充足、配置文件”。弹出Yarn队列编辑框,用户可以修改Yarn队列信息文件,点击保存后,将会再次对队列进行校验,更新队列状态。The message "Insufficient Yarn queue permissions, please check whether the queue permissions are sufficient and the configuration file in the Yarn service" is displayed. The Yarn queue edit box pops up, and the user can modify the Yarn queue information file. After clicking Save, the queue will be verified again and the queue status will be updated.

队列异常:Queue exception:

提示“Yarn队列异常,请检查Yarn服务是否运行正常、配置文件”。弹出Yarn队列编辑框,用户可以修改Yarn队列信息文件,点击保存后,将会再次对队列进行校验,更新队列状态。The message "Yarn queue is abnormal, please check whether the Yarn service is running normally and the configuration file" is displayed. The Yarn queue editing box pops up, and the user can modify the Yarn queue information file. After clicking Save, the queue will be verified again and the queue status will be updated.

资源不足:Insufficient resources:

提示“Yarn资源不足,请确认是否继续运行”。弹出提示框,点击继续运行,运行Yarn任务。The message "Insufficient Yarn resources, please confirm whether to continue running" pops up. Click Continue to run the Yarn task.

队列正常:The queue is normal:

直接运行Yarn任务。Run the YARN task directly.

还需要说明的是,上文所提及的在执行本发明上述方法步骤操作之前,还需要对队列进行校验,然后完成队列信息入库的操作。具体的队列校验操作,完全可以参见上述操作过程。而因为配置文件是工作人员事先配置并存储在数据库中的,所以,即使不执行配置任务,也可以调用配置文件,用以验证队列是否存在。并在确定队列存在的情况下,才会将队列信息加入到队列库,后续才能够在队列库中根据队列信息调用队列执行任务。It should also be noted that before executing the above-mentioned method steps of the present invention, it is necessary to verify the queue and then complete the operation of warehousing the queue information. For the specific queue verification operation, please refer to the above operation process. Because the configuration file is pre-configured by the staff and stored in the database, even if the configuration task is not executed, the configuration file can be called to verify whether the queue exists. And only when it is determined that the queue exists, the queue information will be added to the queue library, and the queue execution task can be called in the queue library according to the queue information later.

因此,在本发明实施例之前执行上述对队列的校验工作与本发明后续执行的所有操作都不冲突。Therefore, the above-mentioned verification work on the queue performed before the embodiment of the present invention does not conflict with all operations subsequently performed by the present invention.

图9中示意出了从资源中心上传yarn的配置文件到运行任务完成,结果上报的整个流程,具体参见图9所示,包括:FIG9 illustrates the entire process from uploading the YARN configuration file to the resource center to completing the task and reporting the result. For details, see FIG9, including:

1、工作人员事先将yarn配置文件上传到文件系统。1. The staff uploads the yarn configuration file to the file system in advance.

这里所指的上传操作仅执行一次。就是将所有配置文件均上传到文件系统。后续,在根据yarn集群选择配置文件时,无需重新手动上传,而是可以通过DolphinScheduler系统中的界面进行灵活选择。The upload operation mentioned here is performed only once. That is, all configuration files are uploaded to the file system. Later, when selecting configuration files based on the yarn cluster, there is no need to manually upload them again, but you can flexibly select them through the interface in the DolphinScheduler system.

举一个例子,假设DolphinScheduler系统中包括3个worker。3个worker的上配置文件均指向yarn集群A,但是任务六中的A,B任务向集群A提交,C任务向集群B提交,因为worker都指向集群A,导致任务执行失败。现有技术中的手段是需要A执行完毕后,暂停任务,工作人员需手动将所有worker上配置文件指向集群B,然后再执行任务B和任务C,操作繁琐。而且,这种操作仅是理论上可实现,在实际执行过程中很难实现。而在本申请实施例中,通过在DolphinScheduler系统中直接添加配置项,通过该配置项可以选择队列,在执行任务时,可以直接根据选择的队列不同,从文件系统中下载与不同队列对应的配置文件即可,无需工作人员重复手动上传操作文件,扩充DolphinScheduler功能,提升易用性。For example, suppose that the DolphinScheduler system includes 3 workers. The configuration files of the 3 workers all point to yarn cluster A, but tasks A and B in task six are submitted to cluster A, and task C is submitted to cluster B. Because the workers all point to cluster A, the task execution fails. The means in the prior art require that task A be paused after execution, and the staff must manually point the configuration files on all workers to cluster B, and then execute tasks B and C, which is cumbersome. Moreover, this operation is only feasible in theory and is difficult to implement in actual execution. In the embodiment of the present application, by directly adding configuration items in the DolphinScheduler system, queues can be selected through the configuration items. When executing tasks, the configuration files corresponding to different queues can be directly downloaded from the file system according to the selected queues. There is no need for staff to manually upload operation files repeatedly, thereby expanding the functions of DolphinScheduler and improving ease of use.

2、在队列管理中配置yarn队列。2. Configure the yarn queue in queue management.

也即是将yarn队列信息入库,入库之前还需要执行上述所列举的验证工作。验证成功后,方能入库。That is, the yarn queue information is stored in the warehouse. Before storage, the verification work listed above needs to be performed. Only after the verification is successful can it be stored in the warehouse.

3、工作人员配置任务时,只可以选择已经创建入库,状态为正常的队列。3. When staff configure tasks, they can only select queues that have been created and are in normal status.

4、系统运行工作流,如果工作流中存在不可用的队列(此时,同样可以执行上述对队列进行验证的操作),该工作流无法运行;如果不存在,将工作流下发到Master。4. The system runs the workflow. If there is an unavailable queue in the workflow (at this time, the above-mentioned queue verification operation can also be performed), the workflow cannot run; if it does not exist, the workflow will be sent to the Master.

5、Master拆解工作流中的任务,封装如队列信息与对应的文件依赖信息,将任务下发到Worker中。5. The Master disassembles the tasks in the workflow, encapsulates queue information and corresponding file dependency information, and sends the tasks to the Worker.

6、Worker根据Master封装的队列信息,下载对应的配置文件。6. Worker downloads the corresponding configuration file based on the queue information encapsulated by Master.

7、根据配置文件,校验Yarn队列的运行状态,yarn队列正常执行8,不正常执行9。7. According to the configuration file, verify the running status of the Yarn queue. If the Yarn queue runs normally, execute 8; otherwise, execute 9.

8、提交任务,监控任务状态,上报Master,Api展示任务状态,结束。8. Submit the task, monitor the task status, report to the Master, and use the API to display the task status.

9、将Yarn队列不正常的状态上报到Master中,Master将该队列的状态标记为不可用,同时任务运行的状态为失败。9. Report the abnormal status of the Yarn queue to the Master, and the Master will mark the status of the queue as unavailable, and the status of the task operation as failed.

整体操作过程中的相应操作细节,均已在上文中做了详细介绍,因此之类不再过多赘述。The corresponding operation details in the overall operation process have been introduced in detail above, so I will not go into details about them.

本发明实施例提供的队列状态校验方法,获取待执行的任务,从任务中提取与执行该任务的队列对应的队列标识信息。根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项。根据配置文件和配置项,从预构建的队列库中提取队列,然后根据配置文件,验证队列的状态。进而在确定队列状态正常时,再利用该队列执行任务。也即是,执行任务之前,首先验证队列状态是否正常。验证过程本身占用少量时间,避免在执行任务,并在执行任务失败后,才获知队列不正确的情况发生。队列任务的执行通常占用很多时间,通过该方式,可以大大节省在队列状态异常情况下执行任务所占用的时间,提升工作效率。The queue status verification method provided by the embodiment of the present invention obtains the task to be executed, and extracts the queue identification information corresponding to the queue executing the task from the task. According to the queue identification information, the configuration file and configuration items corresponding to the queue identification information are obtained from the pre-built database. According to the configuration file and the configuration item, the queue is extracted from the pre-built queue library, and then the status of the queue is verified according to the configuration file. Then, when it is determined that the queue status is normal, the queue is used to execute the task. That is, before executing the task, first verify whether the queue status is normal. The verification process itself takes a small amount of time, avoiding the situation where the queue is incorrect only after the task is executed and the task fails. The execution of queue tasks usually takes a lot of time. In this way, the time taken to execute tasks under abnormal queue status can be greatly saved, thereby improving work efficiency.

而且,配置文件和配置项预先配置到预构建的数据库中,无需用户手动上传。即使当任务需要适应性的切换为其他队列集群中的队列执行时,worker中的配置文件也无需通过用户再次手动上传更换,尤其在worker数量巨大的情况下,该效果体现的尤为明显。再者,也正是因为配置文件已经预先配置在预构建的数据库中,所以当任务临时需要切换队列集群时,工作流也无需暂停,而是直接通过配置文件的调用完成后续的任务的执行工作,操作更加灵活方便,大大提升工作效率。Moreover, the configuration files and configuration items are pre-configured in the pre-built database, and users do not need to upload them manually. Even when the task needs to be adaptively switched to a queue in another queue cluster for execution, the configuration file in the worker does not need to be manually uploaded and replaced by the user again. This effect is particularly evident when there are a large number of workers. Furthermore, precisely because the configuration files have been pre-configured in the pre-built database, when a task temporarily needs to switch queue clusters, the workflow does not need to be paused, but the subsequent task execution work is completed directly by calling the configuration file, which makes the operation more flexible and convenient, greatly improving work efficiency.

图10为本发明实施例提供的一种队列状态校验装置,该装置包括:获取模块101、提取模块102、处理模块103,以及验证模块104。FIG. 10 is a queue status verification device provided by an embodiment of the present invention, the device comprising: an acquisition module 101 , an extraction module 102 , a processing module 103 , and a verification module 104 .

获取模块101,用于获取待执行的任务;An acquisition module 101 is used to acquire tasks to be executed;

提取模块102,用于从任务中提取执行任务的队列对应的队列标识信息;An extraction module 102 is used to extract queue identification information corresponding to a queue for executing a task from a task;

处理模块103,用于根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项;根据配置文件和配置项,获取待执行任务的队列;The processing module 103 is used to obtain a configuration file and a configuration item corresponding to the queue identification information from a pre-built database according to the queue identification information; and obtain a queue of tasks to be executed according to the configuration file and the configuration item;

验证模块104,用于根据配置文件,验证队列的状态,以便后续确定队列的状态正常时,利用队列执行任务。The verification module 104 is used to verify the state of the queue according to the configuration file, so as to use the queue to execute the task when it is subsequently determined that the state of the queue is normal.

可选的,处理模块103,具体用于根据队列标识信息,从预构建的队列库中,获取与队列标识信息对应的队列信息;Optionally, the processing module 103 is specifically configured to obtain queue information corresponding to the queue identification information from a pre-built queue library according to the queue identification information;

根据队列信息,从预构建的数据库中下载与队列信息对应的配置文件和配置项。According to the queue information, download the configuration files and configuration items corresponding to the queue information from the pre-built database.

可选的,配置文件用于指示执行任务的队列集群,配置项用于指示队列集群中的队列。Optionally, the configuration file is used to indicate the queue cluster that executes the task, and the configuration item is used to indicate the queue in the queue cluster.

可选的,验证模块104,具体用于解析配置文件,获取配置信息;Optionally, the verification module 104 is specifically used to parse the configuration file and obtain configuration information;

从配置信息中提取预设的第一配置参数;Extracting a preset first configuration parameter from the configuration information;

根据第一配置参数,执行与第一配置参数对应的操作指令,获取操作结果;According to the first configuration parameter, executing an operation instruction corresponding to the first configuration parameter to obtain an operation result;

根据操作结果,验证队列的状态。Based on the operation results, verify the status of the queue.

可选的,验证模块104,还用于根据配置文件,构造队列客户端;Optionally, the verification module 104 is further used to construct a queue client according to the configuration file;

利用队列客户端,获取队列集群中的所有队列信息,以验证队列是否存在于队列集群中;Use the queue client to obtain all queue information in the queue cluster to verify whether the queue exists in the queue cluster;

当确定队列并不存在于队列集群中时,确定队列状态异常;When it is determined that the queue does not exist in the queue cluster, it is determined that the queue state is abnormal;

否则,确定队列状态正常。Otherwise, the queue status is determined to be normal.

可选的,验证模块104,还用于当确定队列存在于队列集群中后,验证队列的提交任务权限;Optionally, the verification module 104 is further used to verify the task submission permission of the queue after determining that the queue exists in the queue cluster;

当确定队列的提交任务权限不包括执行任务时,确定队列状态异常;When it is determined that the task submission authority of the queue does not include the task execution, it is determined that the queue state is abnormal;

否则,确定队列状态正常。Otherwise, the queue status is determined to be normal.

可选的,验证模块104,还用于当确定队列的状态正常时,利用队列执行任务之前,提交一个测试任务到队列,获取执行结果;Optionally, the verification module 104 is further configured to submit a test task to the queue and obtain an execution result before executing a task using the queue when it is determined that the state of the queue is normal;

根据执行结果,进一步验证队列的状态。Based on the execution results, further verify the status of the queue.

本发明实施例提供的一种队列状态校验装置,获取待执行的任务,从任务中提取与执行该任务的队列对应的队列标识信息。根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项。根据配置文件和配置项,从预构建的队列库中提取队列,然后根据配置文件,验证队列的状态。进而在确定队列状态正常时,再利用该队列执行任务。也即是,执行任务之前,首先验证队列状态是否正常。验证过程本身占用少量时间,避免在执行任务,并在执行任务失败后,才获知队列不正确的情况发生。队列任务的执行通常占用很多时间,通过该方式,可以大大节省在队列状态异常情况下执行任务所占用的时间,提升工作效率。A queue status verification device provided by an embodiment of the present invention obtains a task to be executed, and extracts queue identification information corresponding to the queue that executes the task from the task. According to the queue identification information, a configuration file and a configuration item corresponding to the queue identification information are obtained from a pre-built database. According to the configuration file and the configuration item, a queue is extracted from a pre-built queue library, and then the status of the queue is verified according to the configuration file. Then, when it is determined that the queue status is normal, the queue is used to execute the task. That is, before executing the task, first verify whether the queue status is normal. The verification process itself takes a small amount of time, which avoids the situation where the queue is incorrect only after the task is executed and the execution of the task fails. The execution of queue tasks usually takes a lot of time. In this way, the time taken to execute tasks under abnormal queue status can be greatly saved, thereby improving work efficiency.

而且,配置文件和配置项预先配置到预构建的数据库中,无需用户手动上传。即使当任务需要适应性的切换为其他队列集群中的队列执行时,worker中的配置文件也无需通过用户再次手动上传更换,尤其在worker数量巨大的情况下,该效果体现的尤为明显。再者,也正是因为配置文件已经预先配置在预构建的数据库中,所以当任务临时需要切换队列集群时,工作流也无需暂停,而是直接通过配置文件的调用完成后续的任务的执行工作,操作更加灵活方便,大大提升工作效率。Moreover, the configuration files and configuration items are pre-configured in the pre-built database, and users do not need to upload them manually. Even when the task needs to be adaptively switched to a queue in another queue cluster for execution, the configuration file in the worker does not need to be manually uploaded and replaced by the user again. This effect is particularly evident when there are a large number of workers. Furthermore, precisely because the configuration files have been pre-configured in the pre-built database, when a task temporarily needs to switch queue clusters, the workflow does not need to be paused, but the subsequent task execution work is completed directly by calling the configuration file, which makes the operation more flexible and convenient, greatly improving work efficiency.

如图11所示,本申请实施例提供了一种电子设备,该电子设备中运行有DolphinScheduler系统,该电子设备包括处理器111、通信接口112、存储器113和通信总线114,其中,处理器111,通信接口112,存储器113通过通信总线114完成相互间的通信,As shown in FIG11 , an embodiment of the present application provides an electronic device in which a DolphinScheduler system is running. The electronic device includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114, wherein the processor 111, the communication interface 112, and the memory 113 communicate with each other through the communication bus 114.

存储器113,用于存放计算机程序;Memory 113, used for storing computer programs;

在本申请一个实施例中,处理器111,用于执行存储器113上所存放的程序时,实现前述任意一个方法实施例提供的队列状态校验方法,包括:In one embodiment of the present application, the processor 111 is used to implement the queue status verification method provided by any one of the aforementioned method embodiments when executing the program stored in the memory 113, including:

获取待执行的任务;Get the tasks to be executed;

从任务中提取执行任务的队列对应的队列标识信息;Extract the queue identification information corresponding to the queue that executes the task from the task;

根据队列标识信息,从预构建的数据库中获取与队列标识信息对应的配置文件以及配置项;According to the queue identification information, obtain the configuration file and configuration items corresponding to the queue identification information from the pre-built database;

根据配置文件和配置项,获取待执行任务的队列;Get the queue of tasks to be executed according to the configuration files and configuration items;

根据配置文件,验证队列的状态,以便后续确定队列的状态正常时,利用队列执行任务。Verify the status of the queue according to the configuration file so that when the status of the queue is determined to be normal, the queue can be used to execute tasks.

可选的,根据队列标识信息,从预构建的队列库中,获取与队列标识信息对应的队列信息;Optionally, according to the queue identification information, obtain queue information corresponding to the queue identification information from a pre-built queue library;

根据队列信息,从预构建的数据库中下载与队列信息对应的配置文件和配置项。According to the queue information, download the configuration files and configuration items corresponding to the queue information from the pre-built database.

可选的,配置文件用于指示执行任务的队列集群,配置项用于指示队列集群中的队列。Optionally, the configuration file is used to indicate the queue cluster that executes the task, and the configuration item is used to indicate the queue in the queue cluster.

可选的,解析配置文件,获取配置信息;Optionally, parse the configuration file to obtain configuration information;

从配置信息中提取预设的第一配置参数;Extracting a preset first configuration parameter from the configuration information;

根据第一配置参数,执行与第一配置参数对应的操作指令,获取操作结果;According to the first configuration parameter, executing an operation instruction corresponding to the first configuration parameter to obtain an operation result;

根据操作结果,验证队列的状态。Based on the operation results, verify the status of the queue.

可选的,根据配置文件,验证队列的状态,还包括:Optionally, verify the queue status according to the configuration file, including:

根据配置文件,构造队列客户端;Construct a queue client according to the configuration file;

利用队列客户端,获取队列集群中的所有队列信息,以验证队列是否存在于队列集群中;Use the queue client to obtain all queue information in the queue cluster to verify whether the queue exists in the queue cluster;

当确定队列并不存在于队列集群中时,确定队列状态异常;When it is determined that the queue does not exist in the queue cluster, it is determined that the queue state is abnormal;

否则,确定队列状态正常。Otherwise, the queue status is determined to be normal.

可选的,当确定队列存在于队列集群中后,还包括:验证队列的提交任务权限;Optionally, after confirming that the queue exists in the queue cluster, the process further includes: verifying the task submission permission of the queue;

当确定队列的提交任务权限不包括执行任务时,确定队列状态异常;When it is determined that the task submission authority of the queue does not include the task execution, it is determined that the queue state is abnormal;

否则,确定队列状态正常。Otherwise, the queue status is determined to be normal.

可选的,当确定队列的状态正常时,利用队列执行任务之前,还包括:Optionally, when it is determined that the queue is in a normal state, before using the queue to execute the task, the following is further included:

提交一个测试任务到队列,获取执行结果;Submit a test task to the queue and obtain the execution result;

根据执行结果,进一步验证队列的状态。Based on the execution results, further verify the status of the queue.

本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现如前述任意一个方法实施例提供的队列状态校验方法的步骤。An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the queue status verification method provided in any of the aforementioned method embodiments are implemented.

需要说明的是,在本文中,诸如“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this article, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further restrictions, the elements defined by the sentence "comprise a ..." do not exclude the existence of other identical elements in the process, method, article or device including the elements.

以上仅是本发明的具体实施方式,使本领域技术人员能够理解或实现本发明。对这些实施例的多种修改对本领域的技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下,在其它实施例中实现。因此,本发明将不会被限制于本文所示的这些实施例,而是要符合与本文所申请的原理和新颖特点相一致的最宽的范围。The above is only a specific embodiment of the present invention, so that those skilled in the art can understand or implement the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention will not be limited to the embodiments shown herein, but should conform to the widest scope consistent with the principles and novel features applied herein.

Claims (8)

1.一种队列状态校验方法,其特征在于,所述方法包括:1. A queue status verification method, characterized in that the method comprises: 获取待执行的任务;Get the tasks to be executed; 从所述任务中提取执行所述任务的队列对应的队列标识信息;Extracting queue identification information corresponding to the queue that executes the task from the task; 根据所述队列标识信息,从预构建的数据库中获取与所述队列标识信息对应的配置文件以及配置项;According to the queue identification information, obtaining a configuration file and a configuration item corresponding to the queue identification information from a pre-built database; 根据所述配置文件和所述配置项,获取待执行所述任务的队列;According to the configuration file and the configuration item, obtaining a queue for executing the task; 根据所述配置文件,验证所述队列的状态,以便后续确定所述队列的状态正常时,利用所述队列执行所述任务;Verify the state of the queue according to the configuration file, so that when it is subsequently determined that the state of the queue is normal, the queue is used to execute the task; 其中,所述根据所述配置文件,验证所述队列的状态,具体包括:The step of verifying the state of the queue according to the configuration file specifically includes: 解析所述配置文件,获取配置信息;Parsing the configuration file to obtain configuration information; 从所述配置信息中提取预设的第一配置参数;Extracting a preset first configuration parameter from the configuration information; 根据所述第一配置参数,执行与所述第一配置参数对应的操作指令,获取操作结果;According to the first configuration parameter, executing an operation instruction corresponding to the first configuration parameter to obtain an operation result; 根据所述操作结果,验证所述队列的状态;Verifying the state of the queue according to the operation result; 所述根据所述配置文件,验证所述队列的状态,还包括:The verifying the state of the queue according to the configuration file further includes: 根据所述配置文件,构造队列客户端;According to the configuration file, construct a queue client; 利用所述队列客户端,获取所述队列集群中的所有队列信息,以验证所述队列是否存在于所述队列集群中;Using the queue client, obtaining all queue information in the queue cluster to verify whether the queue exists in the queue cluster; 当确定所述队列并不存在于所述队列集群中时,确定所述队列状态异常;When it is determined that the queue does not exist in the queue cluster, determining that the queue state is abnormal; 否则,确定所述队列状态正常。Otherwise, it is determined that the queue status is normal. 2.根据权利要求1所述的方法,其特征在于,所述根据所述队列标识信息,从预构建的文件库中获取与所述队列标识信息对应的配置文件以及配置项,具体包括:2. The method according to claim 1, characterized in that the step of obtaining, according to the queue identification information, a configuration file and a configuration item corresponding to the queue identification information from a pre-built file library specifically comprises: 根据所述队列标识信息,从预构建的队列库中,获取与所述队列标识信息对应的队列信息;According to the queue identification information, obtaining queue information corresponding to the queue identification information from a pre-built queue library; 根据所述队列信息,从预构建的数据库中下载与所述队列信息对应的配置文件和配置项。According to the queue information, configuration files and configuration items corresponding to the queue information are downloaded from a pre-built database. 3.根据权利要求1或2所述的方法,其特征在于,所述配置文件用于指示执行所述任务的队列集群,所述配置项用于指示所述队列集群中的队列。3. The method according to claim 1 or 2 is characterized in that the configuration file is used to indicate a queue cluster that executes the task, and the configuration item is used to indicate a queue in the queue cluster. 4.根据权利要求3所述的方法,其特征在于,当确定所述队列存在于所述队列集群中后,所述方法还包括:4. The method according to claim 3, characterized in that after determining that the queue exists in the queue cluster, the method further comprises: 验证所述队列的提交任务权限;Verify the task submission permission of the queue; 当确定所述队列的提交任务权限不包括执行所述任务时,确定所述队列状态异常;When it is determined that the task submission authority of the queue does not include executing the task, determining that the queue state is abnormal; 否则,确定所述队列状态正常。Otherwise, it is determined that the queue status is normal. 5.根据权利要求1、2,或4中任一项所述的方法,其特征在于,当确定所述队列的状态正常时,利用所述队列执行所述任务之前,所述方法还包括:5. The method according to any one of claims 1, 2, or 4, characterized in that when it is determined that the state of the queue is normal, before using the queue to execute the task, the method further comprises: 提交一个测试任务到所述队列,获取执行结果;Submit a test task to the queue and obtain the execution result; 根据所述执行结果,进一步验证所述队列的状态。According to the execution result, the state of the queue is further verified. 6.一种队列状态校验装置,其特征在于,所述装置包括:6. A queue status verification device, characterized in that the device comprises: 获取模块,用于获取待执行的任务;The acquisition module is used to obtain the tasks to be executed; 提取模块,用于从所述任务中提取执行所述任务的队列对应的队列标识信息;An extraction module, used to extract queue identification information corresponding to the queue executing the task from the task; 处理模块,用于根据所述队列标识信息,从预构建的数据库中获取与所述队列标识信息对应的配置文件以及配置项;根据所述配置文件和所述配置项,获取待执行所述任务的队列;A processing module, configured to obtain, from a pre-built database, a configuration file and a configuration item corresponding to the queue identification information according to the queue identification information; and obtain, according to the configuration file and the configuration item, a queue for executing the task; 验证模块,用于根据所述配置文件,验证所述队列的状态,以便后续确定所述队列的状态正常时,利用所述队列执行所述任务;A verification module, used to verify the state of the queue according to the configuration file, so as to use the queue to execute the task when it is subsequently determined that the state of the queue is normal; 其中,所述验证模块,具体用于从所述配置信息中提取预设的第一配置参数;Wherein, the verification module is specifically used to extract a preset first configuration parameter from the configuration information; 根据所述第一配置参数,执行与所述第一配置参数对应的操作指令,获取操作结果;According to the first configuration parameter, executing an operation instruction corresponding to the first configuration parameter to obtain an operation result; 根据所述操作结果,验证所述队列的状态;Verifying the state of the queue according to the operation result; 根据所述配置文件,构造队列客户端;According to the configuration file, construct a queue client; 利用所述队列客户端,获取所述队列集群中的所有队列信息,以验证所述队列是否存在于所述队列集群中;Using the queue client, obtaining all queue information in the queue cluster to verify whether the queue exists in the queue cluster; 当确定所述队列并不存在于所述队列集群中时,确定所述队列状态异常;When it is determined that the queue does not exist in the queue cluster, determining that the queue state is abnormal; 否则,确定所述队列状态正常。Otherwise, it is determined that the queue status is normal. 7.一种电子设备,其特征在于,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;7. An electronic device, characterized in that it comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus; 存储器,用于存放计算机程序;Memory, used to store computer programs; 处理器,用于执行存储器上所存放的程序时,实现权利要求1-5任一项所述的队列状态校验方法的步骤。A processor, for implementing the steps of the queue status verification method described in any one of claims 1 to 5 when executing a program stored in a memory. 8.一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1-5任一项所述的队列状态校验方法的步骤。8. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the queue status verification method according to any one of claims 1 to 5 are implemented.
CN202210083037.XA 2022-01-24 2022-01-24 Queue status verification method, device and electronic equipment Active CN114428710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083037.XA CN114428710B (en) 2022-01-24 2022-01-24 Queue status verification method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083037.XA CN114428710B (en) 2022-01-24 2022-01-24 Queue status verification method, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114428710A CN114428710A (en) 2022-05-03
CN114428710B true CN114428710B (en) 2024-10-22

Family

ID=81314035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083037.XA Active CN114428710B (en) 2022-01-24 2022-01-24 Queue status verification method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114428710B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888721A (en) * 2019-10-15 2020-03-17 平安科技(深圳)有限公司 Task scheduling method and related device
CN113127185A (en) * 2019-12-31 2021-07-16 北京懿医云科技有限公司 Task execution queue processing method and device, storage medium and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111198767B (en) * 2020-01-07 2024-10-18 平安科技(深圳)有限公司 Big data resource processing method, device, terminal and storage medium
US11010719B1 (en) * 2020-10-16 2021-05-18 Coupang Corp. Systems and methods for detecting errors of asynchronously enqueued requests

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888721A (en) * 2019-10-15 2020-03-17 平安科技(深圳)有限公司 Task scheduling method and related device
CN113127185A (en) * 2019-12-31 2021-07-16 北京懿医云科技有限公司 Task execution queue processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114428710A (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN108600029B (en) A configuration file updating method, device, terminal device and storage medium
US10678601B2 (en) Orchestration service for multi-step recipe composition with flexible, topology-aware, and massive parallel execution
JP5684946B2 (en) Method and system for supporting analysis of root cause of event
CN106469068B (en) Application program deployment method and system
US20110296390A1 (en) Systems and methods for generating machine state verification using number of installed package objects
CN112463144A (en) Distributed storage command line service method, system, terminal and storage medium
WO2020063550A1 (en) Policy decision method, apparatus and system, and storage medium, policy decision unit and cluster
US10838712B1 (en) Lifecycle management for software-defined datacenters
CN114035925A (en) Workflow scheduling method, device and equipment and readable storage medium
WO2021169124A1 (en) Method and apparatus for installing software package to target host, and computer device
US20230239212A1 (en) Stable References for Network Function Life Cycle Management Automation
JP2011197785A (en) System and program for collecting log
US10649808B2 (en) Outcome-based job rescheduling in software configuration automation
CN118689529B (en) A method, system, device and medium for configuring parameters of application services
CN108399095A (en) Dynamic is supported to manage method, system, equipment and the storage medium of timed task
KR102194974B1 (en) System for monitoring and controling electric power system for process verification
CN114428710B (en) Queue status verification method, device and electronic equipment
CN118466932A (en) Template configuration method, system and electronic device for low-code platform
CN117493058A (en) A troubleshooting system and method
CN110365627B (en) Application program synchronization method and device, computing equipment and storage medium
US20220342779A1 (en) Self-healing for data protection systems using automatic macro recording and playback
CN111262934A (en) File analysis method and device
US10592227B2 (en) Versioned intelligent offline execution of software configuration automation
CN114911590A (en) Task scheduling method and device, computer equipment and readable storage medium
WO2022214200A1 (en) Method and network element for pre-upgrade use case validation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China