CN107204868B - Task operation monitoring information acquisition method and device - Google Patents
Task operation monitoring information acquisition method and device
- Publication number
- CN107204868B (application CN201610158804.3A)
- Authority
- CN
- China
- Prior art keywords
- task
- monitoring information
- platform
- information
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
Technical Field
The present invention relates to big data operation management technology, and in particular to a method and device for acquiring task operation monitoring information.
Background
Big data applications are developing rapidly and are widely used in many fields. Each technology platform has its own strengths in running tasks, but each is a self-contained system that operates independently. Because a big data platform must combine the strengths of different technology stacks to meet business needs at different levels, an operating model built on heterogeneous construction, mixed technologies, integrated deployment, and multi-service linkage has become a distinguishing feature of big data platforms compared with traditional platforms. Under this model, when a fault occurs, troubleshooting basically proceeds as follows:
1. Fault location: Tasks on a big data platform run scattered across individual platforms, and each platform's monitoring information management system is independent and works by its own standards and rules. When a fault occurs, operation and maintenance personnel must manually cross-compare on each platform's management console, check the fault information found by each platform, manually discard secondary and correlated alarms, and confirm the fault alarm and its cause.
2. Fault analysis: Because each system's data is independent, most fault analysis today is performed within a single system, and data from the different systems is aggregated and correlated by hand. When the platform runs many tasks with complex logical relationships, their cross-dependencies are practically impossible to determine, so analysis can only be refined step by step from the system as a whole. This process is time-consuming, and the severity and scope of a fault's impact cannot be confirmed promptly.
3. Fault resolution: After information from the various vendors has been gathered and the fault point determined manually, the vendors must be coordinated to resolve it together. Each vendor is responsible only for faults in its own system and does not consider how its fix affects other systems, so the fault cannot be handled as a whole at the system-architecture level. After the fix, the result must still be confirmed separately on each system's management and control platform, with manual judgment of whether the fault is resolved, whether new problems have arisen, and so on.
Under existing conditions, fault business-impact assessment, fault analysis efficiency, and alarm accuracy on big data platforms suffer from the following shortcomings. First, fault location is analyzed separately by each vendor. Especially on a big data platform that mixes multiple technologies, as more and more platform tasks go live it becomes impossible to truly understand the logical relationships among them and hard to sort out their runtime dependencies. The result is a lack of comprehensive fault analysis capability, so the impact of a fault on the business is difficult to assess accurately. Second, vendors' big data operation and management capabilities are uneven, and when many technical components such as Spark, Storm, Sqoop, HIVE, and HBASE are used together, the fault correlations between components cannot be sorted out; during fault location each component must be checked and located from scratch, which lengthens the time needed to resolve the fault. Third, existing fault monitoring is siloed: task faults, platform faults, and device faults each have their own interface and information, with no correlation or fusion. When a fault occurs, large numbers of alarms appear at different levels and cannot be effectively correlated and filtered, producing an alarm storm that leaves operation and maintenance personnel at a loss. Alarm information from each platform must be aggregated and analyzed manually, and secondary and correlated alarms discarded, to find the cause of the fault; this depends heavily on expert-level personnel, and fault handling is inefficient.
It can be seen that locating big data platform faults quickly, analyzing fault impact automatically, and resolving faults more promptly are problems in urgent need of a solution.
Summary of the Invention
In view of this, embodiments of the present invention aim to provide a method and device for acquiring task operation monitoring information that locate big data platform faults quickly, analyze fault impact automatically, and improve the timeliness of fault resolution.
To achieve the above object, the technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a method for acquiring task operation monitoring information. The method includes: setting a task identifier for each task node of a task according to the operation information of that task node. The method further includes:
according to the task identifier of the fault point, acquiring from the monitoring information the task monitoring information, and/or platform monitoring information, and/or device monitoring information of the task node corresponding to the content of the fault-point task identifier, and determining the fault information.
In the above solution, the operation information includes: the operating platform type and/or the operating platform component type;
the content of the task identifier includes: the operating platform type, and/or the operating platform component type, and/or a task sequence number;
the task sequence number includes: a unique identification number preset for the task, or a unique identification number preset for the task node.
In the above solution, acquiring from the monitoring information, according to the fault-point task identifier, the task monitoring information of the task node corresponding to the content of the fault-point task identifier, and determining the fault information, includes:
associating the task sequence number with the task monitoring information in advance;
determining, according to the task sequence number in the fault-point task identifier, the task monitoring information of the task node corresponding to the fault-point task identifier.
In the above solution, acquiring from the monitoring information, according to the fault-point task identifier, the platform monitoring information of the task node corresponding to the content of the fault-point task identifier, and determining the fault information, includes:
determining, according to the operating platform type and operating platform component type in the fault-point task identifier, the platform type and platform component type of the task node corresponding to the fault-point task identifier;
determining, according to that platform type and platform component type, the platform monitoring information that corresponds to them.
In the above solution, acquiring from the monitoring information, according to the fault-point task identifier, the device monitoring information of the task node corresponding to the content of the fault-point task identifier, and determining the fault information, includes:
retrieving, from the task execution log, the host name of the device on which the operating platform type and operating platform component type in the fault-point task identifier run;
determining, according to the device host name, the device monitoring information of the task node corresponding to the fault-point task identifier.
An embodiment of the present invention further provides a device for acquiring task operation monitoring information. The device includes a setting device and a determining device, wherein
the setting device is configured to set a task identifier for each task node of a task according to the operation information of that task node;
the determining device is configured to acquire from the monitoring information, according to the fault-point task identifier, the task monitoring information, and/or platform monitoring information, and/or device monitoring information of the task node corresponding to the content of the fault-point task identifier, and to determine the fault information.
In the above solution, the operation information includes: the operating platform type and/or the operating platform component type;
the content of the task identifier includes: the operating platform type, and/or the operating platform component type, and/or a task sequence number;
the task sequence number includes: a unique identification number preset for the task, or a unique identification number preset for the task node.
In the above solution, the determining device is specifically configured to:
associate the task sequence number with the task monitoring information in advance;
determining the fault information includes: determining, according to the task sequence number in the fault-point task identifier, the task monitoring information of the task node corresponding to the fault-point task identifier.
In the above solution, the determining device is specifically configured to:
determine, according to the operating platform type and operating platform component type in the fault-point task identifier, the platform type and platform component type of the task node corresponding to the fault-point task identifier;
determine, according to that platform type and platform component type, the platform monitoring information that corresponds to them.
In the above solution, the determining device is specifically configured to:
retrieve, from the task execution log, the host name of the device on which the operating platform type and operating platform component type in the fault-point task identifier run;
determine, according to the device host name, the device monitoring information of the task node corresponding to the fault-point task identifier.
With the method and device for acquiring task operation monitoring information provided by the embodiments of the present invention, a task identifier is set for each task node of a task according to the operation information of that node, and, according to the fault-point task identifier, the task monitoring information, and/or platform monitoring information, and/or device monitoring information of the corresponding task node is acquired from the monitoring information to determine the fault information. In this way, the task monitoring information, platform monitoring information, and device monitoring information of the faulty task node can be obtained accurately through that node's task identifier. When a fault occurs, the fault monitoring information is obtained from the faulty node's task identifier, so the fault is located quickly, fault impact is analyzed automatically, and faults are resolved more promptly.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for acquiring task operation monitoring information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the composition of a task identifier according to an embodiment of the present invention;
FIG. 3 is a sequence diagram of the fault location process implemented with task identifiers according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the task identifier association principle according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the business process of an application example of an embodiment of the present invention;
FIG. 6 is a schematic diagram of the business process run records of an application example of an embodiment of the present invention;
FIG. 7 is a schematic diagram of the task identifier run results of an application example of an embodiment of the present invention;
FIG. 8 is a schematic diagram of the composition and structure of a device for acquiring task operation monitoring information according to an embodiment of the present invention.
Detailed Description
In the embodiments of the present invention, a task identifier is set for each task node of a task according to the operation information of that node; then, according to the fault-point task identifier, the task monitoring information, and/or platform monitoring information, and/or device monitoring information of the task node corresponding to the content of the fault-point task identifier is acquired from the monitoring information, and the fault information is determined.
The present invention is described in further detail below with reference to the embodiments.
An embodiment of the present invention provides a method for acquiring task operation monitoring information. As shown in FIG. 1, the method includes:
Step 101: set a task identifier for each task node of a task according to the operation information of that task node;
Typically, a big data application contains multiple tasks; the relationship between an application and its tasks is one-to-many and is maintained by the system. A single task is also called a job flow, and a job flow consists of multiple task nodes. The prior art assigns each task a task identifier or task sequence number that is unrelated to the task's operation information and uses it to track execution. Its drawbacks are that the execution status of individual task nodes cannot be obtained precisely and that operation information cannot be read directly from the identifier;
In the technical solution of the present invention, a distinct task identifier is set at each task node during task execution. The task identifier contains the operation information of the task at that task node, which includes: the operating platform type, the operating platform component type, and/or the task sequence number. The operating platform type is the type of platform on which the task node runs, for example Java, Storm, Hadoop, or Spark. The operating platform component type is the type of component of that platform, for example the Java-Node component of the Java platform. The task sequence number is a unique sequence number assigned to the task before it runs.
In practice, the task identifier may take the form shown in FIG. 2. Here, the task identifier may additionally include a task type and a task name so that the task can be recognized more quickly and conveniently. Following the form of FIG. 2, the task identifier may be set as follows for the different task component types:
For a synchronous task component, which runs directly on the oozie server and reports only success or failure with no other distinguishing information, the task identifier may be set to: oozie:none;
For a single map/Reduce task component, a map-only mapreduce job is submitted to trigger execution. In this job the map wraps the user-defined action component, and the task identifier may be defined as: oozie:lancher:T={0}:W={1}:A={2}:ID={3}, where {0} is the component type, {1} is the platform type, {2} is the task name, and {3} is the task sequence number. In this way the task identifier reflects the relationship between the task and the platform;
For a dual map/Reduce task component, which builds on the single map/Reduce case, the user-defined action wrapped in the map is itself a job of a mapreduce nature, and the task identifier of such a mapreduce job may be defined as: oozie:action:T={0}:W={1}:A={2}:ID={3}, where {0} is the component type, {1} is the platform type, {2} is the task name, and {3} is the task sequence number. In this way the task identifier reflects the relationship between the task and the platform;
Here, the single map/Reduce and dual map/Reduce task components are collectively called asynchronous task components.
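The three identifier forms above can be illustrated with a minimal Python sketch. The template strings are taken from the text as written; the function names, the `kind` values, and the sample field values are hypothetical and are not part of the patent. The parser also assumes that no field value contains a colon, which the text does not state.

```python
# Templates as given in the description; {0}..{3} are component type,
# platform type, task name, and task sequence number.
SYNC_IDENTIFIER = "oozie:none"
SINGLE_TEMPLATE = "oozie:lancher:T={0}:W={1}:A={2}:ID={3}"
DUAL_TEMPLATE = "oozie:action:T={0}:W={1}:A={2}:ID={3}"


def build_task_identifier(kind, component_type=None, platform_type=None,
                          task_name=None, sequence_number=None):
    """Build an identifier for a 'sync', 'single', or 'dual' task component.

    The 'kind' argument is an illustrative convention, not from the source.
    """
    if kind == "sync":
        return SYNC_IDENTIFIER
    template = {"single": SINGLE_TEMPLATE, "dual": DUAL_TEMPLATE}[kind]
    return template.format(component_type, platform_type,
                           task_name, sequence_number)


def parse_task_identifier(identifier):
    """Split an identifier back into its fields (assumes no ':' in values)."""
    parts = identifier.split(":")
    fields = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
    return {
        "stage": parts[1],                      # 'none', 'lancher', or 'action'
        "component_type": fields.get("T"),
        "platform_type": fields.get("W"),
        "task_name": fields.get("A"),
        "sequence_number": fields.get("ID"),
    }


# Hypothetical sample values for illustration only.
example = build_task_identifier("single", "Java-Node", "Java",
                                "demo_task", "000123")
print(example)
print(parse_task_identifier(example))
```

Keeping the operation information inside the identifier is what lets later steps recover the platform type, component type, and sequence number from the identifier alone.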
Step 102: according to the fault-point task identifier, acquire from the monitoring information the task monitoring information, and/or platform monitoring information, and/or device monitoring information of the task node corresponding to the content of the fault-point task identifier, and determine the fault information;
Existing big data monitoring information includes application monitoring information, task monitoring information, platform monitoring information, and device monitoring information, each of which records operation information produced while tasks run. Task monitoring information, platform monitoring information, and device monitoring information are independent of one another and not correlated. The relationship between a big data application and its tasks is maintained by the system, so task monitoring information can be associated with application monitoring information through task ownership. Task monitoring information, platform monitoring information, and device monitoring information can be associated with one another through the task identifier of the present technical solution, which in turn associates all four kinds of monitoring information. The platform monitoring information includes: platform name, platform type, platform status, and the execution status and logs of tasks on the platform. The task monitoring information includes: task flow information, flow stages, current stage, time spent in each stage, node state changes, and task output logs. The device monitoring information includes: device platform, device host information, and device host running status. Once the task identifier has associated the task, platform, and device monitoring information, the associated information can be merged into the application monitoring information, so that the application monitoring information can report: the tasks the application contains, the execution status of each task, how each task is running on its platform, and the running status of the device host where each task is located. In this way, the information about every task of the whole application at every task node can be obtained;
Specifically, before a task runs, it may be assigned a unique task sequence number, and the task monitoring information may be keyed to that sequence number; for example, the task monitoring information may be named after the task sequence number, so that the corresponding task monitoring information can be obtained through the task sequence number in the task identifier. Alternatively, a task-node-specific marker may be appended to the sequence number and task monitoring information created separately under each task node's sequence number, so that the task monitoring information of the corresponding task node can be obtained through the task sequence number in the task identifier. If a fault occurs while a big data task is running, the task monitoring information of the faulty task node can be obtained through that node's task identifier.
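A minimal Python sketch of this lookup follows. It assumes the task monitoring information is held in a dictionary keyed by task sequence number, with a node-specific suffix appended for per-node records; the suffix format, the record fields, and all function names are hypothetical.

```python
def extract_sequence_number(identifier):
    """Return the ID=... field of a task identifier, or None if absent."""
    for part in identifier.split(":"):
        if part.startswith("ID="):
            return part[len("ID="):]
    return None


def find_task_monitoring(identifier, task_monitoring_store):
    """Look up task monitoring information by the identifier's sequence number."""
    sequence_number = extract_sequence_number(identifier)
    if sequence_number is None:
        return None
    return task_monitoring_store.get(sequence_number)


# Hypothetical store: one record per task node, keyed by the task sequence
# number with a node marker appended (the "-n1" suffix is an assumption).
task_monitoring_store = {
    "000123-n1": {"current_stage": "extract", "status": "SUCCEEDED"},
    "000123-n2": {"current_stage": "aggregate", "status": "FAILED"},
}

fault_identifier = "oozie:action:T=Java-Node:W=Java:A=demo_task:ID=000123-n2"
print(find_task_monitoring(fault_identifier, task_monitoring_store))
```

A synchronous identifier such as `oozie:none` carries no sequence number, so this sketch returns `None` for it rather than guessing.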
The operating platform type and operating platform component type in the task identifier determine the platform type and platform component type on which the current task node is running. Platform monitoring information is usually organized by platform type and platform component type, so the platform monitoring information of the task node can be found through those two values. If a fault occurs while a big data task is running, the platform monitoring information of the faulty task node can be found through the platform type and platform component type in that node's task identifier.
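As a sketch of this association in Python, the platform monitoring information can be indexed by the pair (platform type, platform component type) read from the `W=` and `T=` fields of the identifier. The store layout, record fields, and sample values below are assumptions made for illustration.

```python
def platform_key(identifier):
    """Return (platform type, component type) from the W= and T= fields."""
    fields = dict(p.split("=", 1) for p in identifier.split(":") if "=" in p)
    return fields.get("W"), fields.get("T")


def find_platform_monitoring(identifier, platform_monitoring_store):
    """Look up platform monitoring information for a task node's identifier."""
    return platform_monitoring_store.get(platform_key(identifier))


# Hypothetical platform monitoring records grouped by (platform, component).
platform_monitoring_store = {
    ("Java", "Java-Node"): {"platform_name": "java-cluster", "status": "DEGRADED"},
    ("Spark", "Spark-Job"): {"platform_name": "spark-cluster", "status": "OK"},
}

fault_identifier = "oozie:lancher:T=Java-Node:W=Java:A=demo_task:ID=000123"
print(platform_key(fault_identifier))
print(find_platform_monitoring(fault_identifier, platform_monitoring_store))
```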
Using the operating platform type and operating platform component type in the task identifier, the host name of the device running that platform type and component type can be retrieved from the task execution log in the platform monitoring information; the device monitoring information of the task node can then be retrieved from the device monitoring information by that host name. If a fault occurs while a big data task is running, the platform type and platform component type in the faulty task node's identifier are used to determine the device host name from the task execution log, and the device monitoring information of the task node is then obtained.
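The two-step retrieval can be sketched in Python as follows. The patent does not specify the task execution log format, so the `platform=... component=... host=...` line layout, the timestamps, and the host names here are purely hypothetical; a real log would need its own pattern.

```python
import re

# Assumed log line layout (not from the source).
LOG_LINE = re.compile(
    r"platform=(?P<platform>\S+)\s+component=(?P<component>\S+)\s+host=(?P<host>\S+)"
)


def find_hosts(log_text, platform_type, component_type):
    """Return host names, in first-seen order, that ran the given platform/component."""
    hosts = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if (match and match["platform"] == platform_type
                and match["component"] == component_type
                and match["host"] not in hosts):
            hosts.append(match["host"])
    return hosts


def find_device_monitoring(hosts, device_monitoring_store):
    """Select the device monitoring records for the given host names."""
    return {h: device_monitoring_store[h] for h in hosts if h in device_monitoring_store}


task_execution_log = """\
2016-03-18 10:02:11 platform=Java component=Java-Node host=worker-01 status=RUNNING
2016-03-18 10:02:15 platform=Spark component=Spark-Job host=worker-02 status=RUNNING
2016-03-18 10:03:40 platform=Java component=Java-Node host=worker-01 status=FAILED
"""

device_monitoring_store = {
    "worker-01": {"cpu_load": 0.97, "state": "overloaded"},
    "worker-02": {"cpu_load": 0.35, "state": "healthy"},
}

hosts = find_hosts(task_execution_log, "Java", "Java-Node")
print(find_device_monitoring(hosts, device_monitoring_store))
```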
In this way, the application, task, platform, and device monitoring information are associated through the task identifier, and the several kinds of monitoring information are fused seamlessly. In practice, this fusion allows a user interface to be built for routine maintenance that gathers the correlated monitoring information and presents the associated application, task, platform, and device monitoring information to maintenance personnel together, so that the monitoring information for each task of an application at each task node is available directly, which simplifies operational support and fault location. When a fault occurs, the fault-point task identifier identifies the platform type, platform component type, or faulty device involved, which greatly improves the efficiency of fault location.
下面结合实施例1对本发明作进一步详细的说明。The present invention will be described in further detail below in conjunction with Example 1.
As shown in Figure 3, (a) and (b) show the fault location sequence for a synchronous task component and an asynchronous task component, respectively, using the task identifier. The principle by which the task identifier associates tasks, platforms, devices, and applications is shown in Figure 4: an application contains multiple tasks, so the relationship between an application and its tasks is one-to-many, and this association is maintained by the system; a single task is also called a job flow, and a job flow consists of multiple task nodes; each time the flow runs, a unique task identifier is set for each task node; through the task identifier, the task node is associated with the job running on the platform; because a platform job executes on a specific device, the platform job log or job status can in turn be associated with the corresponding device information. In this way, the task identifier associates tasks, platforms, devices, and applications; further, when a fault occurs, fault location, fault analysis, fault monitoring, and similar processing can be carried out through the task log.
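The one-to-many relationships described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `status` field, and the `find_failed_nodes` helper are hypothetical and do not appear in the patent, and the sample task identifiers simply follow the format given later in the description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaskNode:
    name: str
    task_id: str          # a unique identifier set for this node on each run
    status: str = "OK"    # hypothetical status field used for illustration


@dataclass
class Task:
    """A single task, also called a job flow, made up of task nodes."""
    name: str
    nodes: List[TaskNode] = field(default_factory=list)


@dataclass
class Application:
    """An application owns many tasks (one-to-many)."""
    name: str
    tasks: List[Task] = field(default_factory=list)


def find_failed_nodes(app: Application) -> List[Tuple[str, str]]:
    """Return (task name, task identifier) for every failed task node."""
    return [
        (task.name, node.task_id)
        for task in app.tasks
        for node in task.nodes
        if node.status == "FAILED"
    ]


app = Application(
    name="contact-circle-analysis",
    tasks=[
        Task(
            name="daily-flow",
            nodes=[
                TaskNode("collect", "oozie:none"),
                TaskNode(
                    "clean",
                    "oozie:action:T=map-reduce:W=hadoop:A=clean:ID=0000001",
                    status="FAILED",
                ),
            ],
        )
    ],
)
failed = find_failed_nodes(app)
```

The identifier returned for a failed node is the starting point for the platform and device lookups; the model itself stores no platform or device data.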
The present invention is described in still further detail below with reference to Embodiment 2.
In this embodiment, task identifiers are set at the task nodes of each task of a specific business application, which provides effective monitoring of task operation in that application.
Here, the function implemented by the business application is to compute and analyze user social-circle behavior: users' voice call detail records are used to discover and analyze each user's social circle, and the user's calling behavior with other users, measured by indicators such as call frequency, call duration, and call time period, is used to analyze the user's social behavior and determine whether the user is the most influential user in the social circle. The specific business flow is shown in Figure 5 and includes:
Step 501: collect the call detail record files from the interface machine into the Hadoop Distributed File System (HDFS);
Step 502: use a Map/Reduce program to clean, filter, and sort the call detail records;
Step 503: load the result of step 502 into the Hive database, aggregate by calling number and called number, and compute indicators such as number of calls, call duration, and call time period;
Step 504: export the analysis result to a relational database through a sqoop script.
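The aggregation in step 503 can be illustrated in plain Python. This is a sketch of the grouping logic only, not the Hive job used in the embodiment; the record layout (calling number, called number, start hour, duration in seconds), the function name `aggregate_calls`, and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_calls(records):
    """Group call records by (calling number, called number).

    Each record is assumed to be a tuple:
    (calling_number, called_number, start_hour, duration_seconds).
    """
    stats = defaultdict(lambda: {"calls": 0, "total_seconds": 0, "hours": set()})
    for caller, callee, start_hour, seconds in records:
        entry = stats[(caller, callee)]
        entry["calls"] += 1                # number of calls
        entry["total_seconds"] += seconds  # total call duration
        entry["hours"].add(start_hour)     # hours of day in which calls started
    return dict(stats)


# Hypothetical cleaned records, as step 502 might produce them.
sample_records = [
    ("13800000001", "13800000002", 9, 120),
    ("13800000001", "13800000002", 21, 60),
    ("13800000003", "13800000001", 9, 30),
]
call_stats = aggregate_calls(sample_records)
```

In the embodiment the same grouping would be expressed as a Hive query over the loaded table; the Python version only makes the per-pair indicators concrete.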
The technical solution of the present invention is applied in this business flow, with a task identifier set at each node. The business flow run record is shown in Figure 6, the business flow node status in Figure 7(a), and the business flow run log in Figure 7(b); the name-based correspondence in the platform of the task identifiers of the business flow nodes is shown in Figure 7(c); the running status of the platform jobs corresponding to those task identifiers is shown in Figure 7(d); and the running state of those task identifiers in the platform, together with the corresponding devices, is shown in Figure 7(e).
Figure 7 shows that, throughout the business flow, the task monitoring information, platform monitoring information, and device monitoring information of the task nodes have been associated through the task identifier. Maintenance personnel can therefore readily obtain the information they need and, when a fault occurs, readily determine the platform, platform component, or device on which the fault occurred.
An embodiment of the present invention provides an apparatus for acquiring task operation monitoring information. As shown in Figure 8, the apparatus includes a setting module 81 and a determining module 82, wherein:
The setting module 81 is configured to set a task identifier for each task node of a task according to the running information of the task at that task node;
Typically, a big data application contains multiple tasks; the relationship between a big data application and its tasks is one-to-many, and this association is maintained by the system. A single task is also called a job flow, and a single job flow consists of multiple task nodes. The prior-art approach assigns each task a task identifier or task sequence number that is unrelated to the task's running information and uses it to track the task's execution; its drawbacks are that the execution status of individual task nodes cannot be obtained accurately and that running information cannot be read directly from the identifier;
The technical solution of the present invention sets a different task identifier at each task node while the task runs. The task identifier contains the running information of the task at that task node, and the running information includes: running platform type, running platform component type, and/or task sequence number. The running platform type is the type of the running platform of the task node where the task runs, such as Java, Storm, Hadoop, or Spark; the running platform component type is the type of component of that running platform, such as the Java-Node component of the Java platform; the task sequence number is a unique sequence number assigned to the task before it runs.
In practice, the task identifier may take the form shown in Figure 2; here, the task identifier may further include a task type and a task name so that the task can be identified more quickly and conveniently. Following the form of Figure 2, the task identifier can be set as follows for the different task component types:
For a synchronous task component, which runs directly on the oozie server and yields only success or failure information with no other distinguishing information, the task identifier can be set to: oozie:none;
For a single map/Reduce task component, a map-only mapreduce job is submitted to trigger execution; in this job, the map wraps the user-defined action component, and the task identifier can be defined as: oozie:lancher:T={0}:W={1}:A={2}:ID={3}, where 0 denotes the component type, 1 the platform type, 2 the task name, and 3 the task sequence number; in this way, the task identifier reflects the relationship between the task and the platform;
For a dual map/Reduce task component, which builds on the single map/Reduce case, the user-defined action wrapped in the map is itself a job of mapreduce nature, and the task identifier of such a mapreduce job can be defined as: oozie:action:T={0}:W={1}:A={2}:ID={3}, where 0 denotes the component type, 1 the platform type, 2 the task name, and 3 the task sequence number; in this way, the task identifier reflects the relationship between the task and the platform;
Here, the single map/Reduce task component and the dual map/Reduce task component are referred to as asynchronous task components.
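The three identifier forms above can be built and parsed with a few lines of Python. This is a minimal sketch: the template strings are copied from the description, while the function names, the `kind` keys, the field names in the parsed result, and the sample values are hypothetical. It also assumes that field values contain no colon, since the colon is the separator.

```python
# Templates reproduced from the description; {0}..{3} are the component type,
# platform type, task name, and task sequence number.
TEMPLATES = {
    "sync": "oozie:none",
    "single": "oozie:lancher:T={0}:W={1}:A={2}:ID={3}",
    "dual": "oozie:action:T={0}:W={1}:A={2}:ID={3}",
}

# Hypothetical readable names for the four keyed fields.
FIELD_NAMES = {
    "T": "component_type",
    "W": "platform_type",
    "A": "task_name",
    "ID": "sequence_number",
}


def build_task_id(kind, component_type="", platform_type="", task_name="",
                  sequence_number=""):
    """Fill the template for the given component kind.

    str.format ignores unused positional arguments, so the synchronous
    template is returned unchanged.
    """
    return TEMPLATES[kind].format(
        component_type, platform_type, task_name, sequence_number
    )


def parse_task_id(task_id):
    """Split an identifier into its prefix and keyed fields."""
    parts = task_id.split(":")
    fields = {"prefix": ":".join(parts[:2])}
    for part in parts[2:]:
        key, _, value = part.partition("=")
        fields[FIELD_NAMES[key]] = value
    return fields


task_id = build_task_id("single", "map-reduce", "hadoop", "clean-cdr", "0000001")
fields = parse_task_id(task_id)
```

Keeping the format in one template table means the code that sets the identifier and the code that later reads it cannot drift apart.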
The determining module 82 is configured to obtain, from the monitoring information and according to each task identifier, the monitoring information of the task's operation at each task node corresponding to the content of the task identifier;
Existing big data monitoring information includes application monitoring information, task monitoring information, platform monitoring information, and device monitoring information, each of which contains running information of the task during its operation. The task monitoring information, platform monitoring information, and device monitoring information are independent of one another and not associated. Because the association between big data applications and tasks is maintained by the system, the task monitoring information can be associated with the application monitoring information through the task ownership relationship. The task monitoring information, platform monitoring information, and device monitoring information can be associated through the task identifier of the present technical solution, so that all four kinds of monitoring information are associated. The platform monitoring information includes: platform name, platform type, platform status, and the execution status and logs of tasks on the platform. The task monitoring information includes: task flow information, flow stages, the current stage, the time spent in each stage, node state changes, and the task output log. The device monitoring information includes: device platform, device host information, and device host running status. Once the three are associated through the task identifier, the associated information can be merged into the application monitoring information, which can then provide: the tasks contained in the application, the execution status of each task, the running status of each task on the platform, and the running status of the device host on which each task runs. In this way, the running information of every task of the whole application at every task node can be obtained;
Specifically, before a task runs, a unique task sequence number can be assigned to it, and the task monitoring information can be made to correspond to that sequence number; for example, the task monitoring information can be named after the task sequence number, so that the corresponding task monitoring information can be obtained through the task sequence number in the task identifier. Alternatively, an identifier specific to the task node can be appended to the sequence number, and task monitoring information can be kept separately under each task node's sequence number, so that the task monitoring information of the corresponding task node can be obtained through the task sequence number in the task identifier. If a fault occurs while a big data task is running, the task monitoring information of the faulty task node can be obtained through that node's task identifier.
From the running platform type and running platform component type in the task identifier, the platform type and platform component type of the task node on which the task is currently running can be determined. Platform monitoring information is usually classified by platform type and platform component type, so the platform monitoring information of the task node can be associated through these two fields. If a fault occurs while a big data task is running, the platform type and platform component type in the faulty task node's task identifier can be used to associate the platform monitoring information of that node.
From the running platform type and running platform component type in the task identifier, the host name of the device running that platform type and component type can be retrieved from the task execution log in the platform monitoring information; with the host name, the device monitoring information of the task node can then be retrieved from the device monitoring information. If a fault occurs while a big data task is running, the platform type and platform component type in the faulty task node's task identifier can be used to determine the device host name from the task execution log, and the device monitoring information of that task node can then be obtained.
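The lookup chain from a task identifier to task, platform, and device monitoring information can be sketched as follows. The three monitoring stores are modeled as plain dictionaries, which is an assumption made purely for illustration: the patent does not specify their storage format, and the key layout, the `correlate` function, the sample host name, and the sample metrics are hypothetical. In particular, indexing the execution log by task sequence number is one possible way to find the host, not a detail stated in the description.

```python
def correlate(task_fields, task_monitoring, platform_monitoring, device_monitoring):
    """Follow the fields of a task identifier through the monitoring stores."""
    seq = task_fields["sequence_number"]
    # Platform monitoring is assumed to be classified by
    # (platform type, platform component type).
    platform_key = (task_fields["platform_type"], task_fields["component_type"])
    platform_info = platform_monitoring[platform_key]
    # Assumed log layout: task sequence number -> host name that ran the job.
    host = platform_info["execution_log"][seq]
    return {
        "task": task_monitoring[seq],
        "platform_status": platform_info["status"],
        "host": host,
        "device": device_monitoring[host],
    }


# Hypothetical monitoring data for one failed task node.
task_fields = {
    "component_type": "map-reduce",
    "platform_type": "hadoop",
    "task_name": "clean-cdr",
    "sequence_number": "0000001",
}
task_monitoring = {"0000001": {"current_stage": "clean-cdr", "state": "FAILED"}}
platform_monitoring = {
    ("hadoop", "map-reduce"): {
        "status": "RUNNING",
        "execution_log": {"0000001": "worker-03"},
    }
}
device_monitoring = {"worker-03": {"cpu_load": 0.97, "disk_free_gb": 1}}

report = correlate(task_fields, task_monitoring, platform_monitoring, device_monitoring)
```

A single call gathers what maintenance personnel would otherwise collect from three separate monitoring views, which is the association the task identifier is meant to provide.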
In this way, the application monitoring information, task monitoring information, platform monitoring information, and device monitoring information are associated through the task identifier, so that the several kinds of monitoring information are seamlessly fused. In practice, this fusion allows a user interface to be built for routine maintenance that collects the associated information and presents the associated application, task, platform, and device monitoring information to maintenance personnel together, so that the monitoring information of an application's tasks at each task node can be obtained directly, which facilitates operations support and fault location. When a fault occurs, the task identifier at the fault point can be used to determine the platform type, platform component type, or faulty device involved, which greatly improves the efficiency of fault location.
In practice, the setting module 81 and the determining module 82 can be implemented by a central processing unit (CPU), microprocessor (MPU), digital signal processor (DSP), or field-programmable gate array (FPGA) of the big data server system.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610158804.3A CN107204868B (en) | 2016-03-18 | 2016-03-18 | Task operation monitoring information acquisition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610158804.3A CN107204868B (en) | 2016-03-18 | 2016-03-18 | Task operation monitoring information acquisition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107204868A | 2017-09-26 |
CN107204868B | 2020-08-18 |
Family
ID=59904279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610158804.3A Active CN107204868B (en) | 2016-03-18 | 2016-03-18 | Task operation monitoring information acquisition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107204868B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108776579B (en) * | 2018-06-19 | 2021-10-15 | 郑州云海信息技术有限公司 | Distributed storage cluster expansion method, device, equipment and storage medium |
CN109471709B (en) * | 2018-10-16 | 2022-02-18 | 深圳中顺易金融服务有限公司 | Scheduling method for flow task processing big data based on Apache Oozie framework |
CN110209893A (en) * | 2019-04-23 | 2019-09-06 | 北京奇艺世纪科技有限公司 | Task creating method, system and storage medium |
CN110489261A (en) * | 2019-07-31 | 2019-11-22 | 上海艾融软件股份有限公司 | Task handles alarm method, device and electronic equipment, storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6952780B2 (en) * | 2000-01-28 | 2005-10-04 | Safecom A/S | System and method for ensuring secure transfer of a document from a client of a network to a printer |
GB2465860A (en) * | 2008-12-04 | 2010-06-09 | Ibm | A directed graph behaviour model for monitoring a computer system in which each node of the graph represents an event generated by an application |
CN102521099A (en) * | 2011-11-24 | 2012-06-27 | 深圳市同洲视讯传媒有限公司 | Process monitoring method and process monitoring system |
CN103902646A (en) * | 2013-12-27 | 2014-07-02 | 北京天融信软件有限公司 | Distributed task managing system and method |
Non-Patent Citations (2)
Title |
---|
How to schedule sqoop job command using oozie in azure,https://social.msdn.microsoft.com/Forums/en-US/a520f0df-ef13-48c0-b737-218625047284/how-to-schedule-sqoop-job-command-using-oozie-in-azure-hdinsightremote-machine?forum=hdinsight;MSDN;《MSDN》;20140701;全文 * |
Oozie 与 Yarn 协同工作,https://blog.csdn.net/samhacker/article/details/21413057;LiuQiYun;《CSDN》;20140317;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN107204868A (en) | 2017-09-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||