
CN114443422B - A distributed resource monitoring method and system - Google Patents

Publication number
CN114443422B
Authority
CN
China
Prior art keywords
node
monitoring
data
monitoring center
topic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111636767.XA
Other languages
Chinese (zh)
Other versions
CN114443422A (en
Inventor
王一凡
王中华
张洋
刘雨坤
吴娜
李亚晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202111636767.XA
Publication of CN114443422A
Application granted
Publication of CN114443422B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3013 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is an embedded system, i.e. a combination of hardware and software dedicated to perform a certain function in mobile devices, printers, automotive or aircraft systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/547 Messaging middleware

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present application provides a distributed resource monitoring method and system in the field of computer technology. The monitoring method comprises: establishing a software environment and deploying middleware on each node, where the middleware spans the monitoring center and every node and forms a global data space between them; each node publishes data to the global data space through the middleware, and the monitoring center subscribes through the middleware to the monitoring data it needs; the resource monitor of each node registers the node's information in the monitoring center's database by calling the monitoring center's topic binding interface, establishing a one-to-one mapping between topics and monitored nodes; and after a monitored node calls the topic binding interface, the monitoring center subscribes to that node's monitoring data using the Topic the node supplied and, at a pull rate set by the user, obtains the node's monitoring data from the global data space.

Description

Distributed resource monitoring method and system
Technical Field
The present application relates to the field of computer technology, and in particular, to a distributed resource monitoring method and system.
Background
In an airborne environment, each aircraft platform carries a variety of intelligent equipment and payload resources, which naturally form a distributed cluster. Using the resources already on the platform rationally and efficiently is key to prevailing in high-intensity air combat confrontation.
Disclosure of Invention
In view of the above, the application provides a distributed resource monitoring method and system that address the problems of the prior art. They monitor airborne resources effectively and, while meeting the real-time and safety requirements of the airborne embedded environment, ensure that the airborne resources being scheduled are available when the upper-layer decision module allocates and deploys tasks.
In one aspect, the distributed resource monitoring method provided by the application adopts the following technical scheme:
A distributed resource monitoring method, applied to a distributed system, the method comprising:
establishing a software environment and deploying middleware on each node, wherein the middleware spans the monitoring center and every node and forms a global data space between them, each node publishes data to the global data space through the middleware, and the monitoring center subscribes through the middleware to the monitoring data it needs from the global data space;
the resource monitor of each node registers the node's information in the monitoring center's database by calling the monitoring center's topic binding interface, establishing a one-to-one mapping between topics and monitored nodes;
after a monitored node calls the topic binding interface, the monitoring center subscribes to that node's monitoring data using the Topic the node supplied and, at a pull rate set by the user, obtains the node's monitoring data from the global data space.
Optionally, the steps performed by each node include:
Step 1, each computing node and terminal node in the cluster calls the topic binding interface of the monitoring center to register itself, the call parameters comprising the node name, node IP address, node MAC physical address, monitoring data Topic, and monitoring period;
Step 2, after receiving the binding-success flag, the node starts multiple threads to monitor its resources concurrently;
Step 3, the node stores the monitoring data on a local storage medium;
Step 4, the node packages the collected monitoring data in JSON format;
Step 5, the node publishes the packaged monitoring data to the global data space through the DDS middleware under the monitoring data Topic bound in step 1, and then returns to step 2.
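As a rough illustration of steps 1 and 4, the sketch below builds the binding-call parameters and packages one round of readings as JSON. The field names (`node_name`, `period_s`, `readings`) and the example addresses are assumptions made for illustration; the patent lists the parameters but does not define a schema.

```python
import json
import time


def build_bind_request(name, ip, mac, topic, period_s):
    """Collect the parameters that step 1 passes to the topic binding interface.

    The dictionary keys are hypothetical; the source only names the parameters.
    """
    return {
        "node_name": name,
        "node_ip": ip,
        "node_mac": mac,
        "topic": topic,
        "period_s": period_s,
    }


def package_sample(topic, readings, timestamp=None):
    """Step 4: wrap one round of readings in a JSON string."""
    payload = {
        "topic": topic,
        "timestamp": time.time() if timestamp is None else timestamp,
        "readings": readings,
    }
    return json.dumps(payload, sort_keys=True)


request = build_bind_request(
    "node-1", "192.0.2.10", "02:00:00:00:00:01", "monitor/node-1", 1.0
)
message = package_sample(request["topic"], {"cpu": 0.42, "mem": 0.63}, timestamp=0)
print(message)
```

Keeping the Topic inside the payload is a design choice of this sketch, not something the source requires; it simply makes each message self-describing.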
Optionally, the steps performed by the monitoring center include:
Step 1, the monitoring center starts the topic binding interface service;
Step 2, the monitoring center receives and parses a node's Topic binding request, subscribes to the node's monitoring data Topic according to the Topic field, starts a monitoring thread for obtaining the monitoring data, and returns a binding-success flag;
Step 3, the monitoring center obtains the monitoring data of the corresponding node from the global data space at the node monitoring data pull rate set by the user.
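The binding step on the monitoring-center side can be sketched as follows. This is a minimal sketch under stated assumptions: the `MonitoringCenter` class, its method names, and the in-memory dictionaries standing in for the center's database are all hypothetical, and the DDS subscription and monitoring thread of step 2 are omitted.

```python
import threading


class MonitoringCenter:
    """Hypothetical stand-in for the monitoring center's binding service."""

    def __init__(self, pull_interval_s=1.0):
        self.pull_interval_s = pull_interval_s  # pull rate set by the user
        self._topic_to_node = {}  # in-memory stand-in for the database
        self._node_to_topic = {}
        self._lock = threading.Lock()

    def bind(self, request):
        """Register a node; the return value plays the role of the success flag."""
        node, topic = request["node_name"], request["topic"]
        with self._lock:
            # Enforce the one-to-one mapping in both directions.
            if self._topic_to_node.get(topic, node) != node:
                return False  # Topic already bound to a different node
            if self._node_to_topic.get(node, topic) != topic:
                return False  # node already bound to a different Topic
            self._topic_to_node[topic] = node
            self._node_to_topic[node] = topic
        return True

    def node_for(self, topic):
        with self._lock:
            return self._topic_to_node.get(topic)


center = MonitoringCenter(pull_interval_s=0.5)
print(center.bind({"node_name": "node-1", "topic": "monitor/node-1"}))
```

Rejecting a second node that claims an existing Topic is one way to keep the mapping one-to-one; the source does not say how conflicts are handled.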
Optionally, the Topic is generated either as a random string or by concatenating the node's MAC physical address.
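Both ways of obtaining a Topic name can be sketched in a few lines. The `monitor/` prefix, the suffix length, and the function names are assumptions for illustration; the source does not specify a naming format.

```python
import secrets
import string


def random_topic(prefix="monitor", length=12):
    """Build a Topic name from a random lowercase alphanumeric suffix."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{prefix}/{suffix}"


def mac_topic(mac, prefix="monitor"):
    """Build a Topic name by appending the node's normalized MAC address."""
    normalized = mac.replace(":", "").replace("-", "").lower()
    return f"{prefix}/{normalized}"


print(random_topic())
print(mac_topic("AA:BB:CC:00:11:22"))
```

A MAC-based name is stable across restarts, while a random name needs no hardware identifier; either should make collisions in the global data space unlikely.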
Optionally, the monitoring items of a monitored node and the handling of its monitoring data include:
for a computing node, the monitoring items comprise CPU usage, memory usage, disk usage, network bandwidth usage, node liveness, node working state, task running state, and a monitoring timestamp;
for a terminal node, the monitoring items comprise those of a computing node plus the data collected by sensors and the terminal working state;
after receiving the binding-success flag returned by the monitoring center, the resource monitor of the monitored node monitors these items at the preset monitoring period, stores the results locally, and publishes them through the DDS middleware to the global data space under the bound monitoring data Topic;
the node packages the collected monitoring data in a recognizable data format.
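One possible in-memory shape for these monitoring items is sketched below with dataclasses. The class and field names, the use of fractions for usage values, and the string-valued states are assumptions; the source lists the items but not their types or encoding.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ComputeNodeSample:
    """Monitoring items listed for a computing node (field names are hypothetical)."""
    cpu_usage: float
    memory_usage: float
    disk_usage: float
    network_usage: float
    alive: bool
    working_state: str
    task_state: str
    timestamp: float


@dataclass
class TerminalNodeSample(ComputeNodeSample):
    """A terminal node reports the computing-node items plus two more."""
    sensor_data: dict = field(default_factory=dict)
    terminal_state: str = "unknown"


def to_json(sample):
    """Serialize a sample so that it can be published as one message."""
    return json.dumps(asdict(sample), sort_keys=True)


sample = TerminalNodeSample(
    0.1, 0.2, 0.3, 0.4, True, "idle", "running", 0.0, {"temp_c": 21.5}, "ok"
)
print(to_json(sample))
```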
In another aspect, the distributed resource monitoring system provided by the application adopts the following technical scheme:
A distributed resource monitoring system comprises a monitoring center and a plurality of nodes, wherein the monitoring center deploys a cluster monitor and DDS middleware, and every node deploys a resource monitor and DDS middleware, wherein:
the cluster monitor and DDS middleware of the monitoring center are used for monitoring all bound computing nodes and terminal nodes in the cluster;
the resource monitor is used for monitoring the resource state of its node;
the DDS middleware is used for publishing the monitored resource state data to the global data space;
and the monitoring center subscribes to and obtains, through the DDS middleware, the monitoring data that the computing nodes and terminal nodes publish in the global data space.
Optionally, the system includes N computing nodes, N terminal nodes, and a monitoring center, where:
the computing node provides computing resources for running computing tasks;
The terminal node provides sensing resources and functional resources for realizing calculation, storage and communication of the terminal equipment.
Optionally, the terminal node comprises a terminal device and a computing device. The terminal device comprises sensing devices and functional devices. The computing device is a general-purpose computing device with computing, storage, and communication capabilities, on which a resource agent is deployed for control interaction and data interaction with the terminal device. The resource agent runs in a container on the computing device; the driver of the terminal device is packaged in the container, and the container provides an interface method for controlling the terminal device.
Optionally, the interface method comprises one or more of a Socket call, a Web Service, JSON-RPC, and a RESTful API.
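To make the JSON-RPC option concrete, the sketch below shows a JSON-RPC 2.0 request and a tiny dispatcher of the kind a resource agent might expose. The method name `terminal.set_power` and the handler are invented for illustration; the source does not describe any specific control methods.

```python
import json


def make_request(method, params, request_id):
    """Encode a JSON-RPC 2.0 request with named parameters."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def handle_request(raw, handlers):
    """Dispatch one request to a handler and encode the JSON-RPC 2.0 response."""
    request = json.loads(raw)
    handler = handlers.get(request["method"])
    if handler is None:
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "error": error, "id": request["id"]})
    result = handler(**request["params"])
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request["id"]})


def set_power(on):
    """Hypothetical control method wrapping the terminal device driver."""
    return {"power": "on" if on else "off"}


handlers = {"terminal.set_power": set_power}
response = handle_request(make_request("terminal.set_power", {"on": True}, 1), handlers)
print(response)
```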
Optionally, all nodes in the system include a processor and a computer readable storage medium having stored therein a computer program for enabling distributed monitoring.
In summary, the application has the following beneficial technical effects:
1. According to the application, middleware deployed on each node monitors the resources and state of every distributed node in the cluster; by calling the topic binding interface of the monitoring center, each node agrees with the center on the Topic used to transmit monitoring data, and the monitoring center and the monitored node publish and obtain monitoring data through that same Topic;
2. The application implements distributed monitoring with the DDS publish/subscribe mechanism, so the monitoring center can dynamically adjust the pull rate of monitoring data according to its own load. Because the monitoring center and the monitored nodes exchange data through DDS Topics, cluster communication is more flexible: a new node joins the cluster simply by calling the topic binding interface of the monitoring center to register. The system is therefore easier to extend, more lightweight, and better suited to an embedded environment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a distributed resource monitoring flow according to the present application.
Fig. 2 is a diagram of a distributed resource monitoring system cluster architecture according to the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. The application may be practiced or carried out in other embodiments that depart from the specific details, and the details of the present description may be modified or varied from the spirit and scope of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the application provides a distributed resource monitoring method.
As shown in Fig. 1, a distributed resource monitoring method is applied to a distributed system, and the method comprises:
establishing a software environment and deploying middleware on each node, wherein the middleware spans the monitoring center and every node and forms a global data space between them, each node publishes data to the global data space through the middleware, and the monitoring center subscribes through the middleware to the monitoring data it needs from the global data space;
the resource monitor of each node registers the node's information in the monitoring center's database by calling the monitoring center's topic binding interface, establishing a one-to-one mapping between topics and monitored nodes;
after a monitored node calls the topic binding interface, the monitoring center subscribes to that node's monitoring data using the Topic the node supplied and, at a pull rate set by the user, obtains the node's monitoring data from the global data space.
The steps performed by each node include:
Step 1, each computing node and terminal node in the cluster calls the topic binding interface of the monitoring center to register itself, the call parameters comprising the node name, node IP address, node MAC physical address, monitoring data Topic, and monitoring period;
Step 2, after receiving the binding-success flag, the node starts multiple threads to monitor its resources concurrently;
Step 3, the node stores the monitoring data on a local storage medium;
Step 4, the node packages the collected monitoring data in JSON format;
Step 5, the node publishes the packaged monitoring data to the global data space through the DDS middleware under the monitoring data Topic bound in step 1, and then returns to step 2.
The steps performed by the monitoring center include:
Step 1, the monitoring center starts the topic binding interface service;
Step 2, the monitoring center receives and parses a node's Topic binding request, subscribes to the node's monitoring data Topic according to the Topic field, starts a monitoring thread for obtaining the monitoring data, and returns a binding-success flag;
Step 3, the monitoring center obtains the monitoring data of the corresponding node from the global data space at the node monitoring data pull rate set by the user.
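Step 3 of the monitoring center amounts to a timed pull loop. The sketch below assumes the samples published by one node are available in a `collections.deque`, which stands in for a DDS reader; `pull_loop`, its parameters, and the bounded `rounds` count are hypothetical and exist only to keep the example finite.

```python
import time
from collections import deque


def pull_loop(pending, pull_interval_s, rounds, handle, sleep=time.sleep):
    """Every pull_interval_s seconds, drain the pending samples and handle them.

    `pending` stands in for the samples the global data space holds for one Topic.
    """
    for _ in range(rounds):
        while pending:
            handle(pending.popleft())
        sleep(pull_interval_s)


pending = deque(['{"cpu": 0.4}', '{"cpu": 0.5}'])
received = []
pull_loop(pending, pull_interval_s=0.0, rounds=1, handle=received.append)
print(received)
```

Because the center decides when to read, a slow consumer can lengthen `pull_interval_s` instead of being flooded by publishers.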
To ensure that the monitoring data Topic is unique in the global data space, the Topic is generated either as a random string or by concatenating the node's MAC physical address. Placing the monitoring center and the monitored nodes in the same domain logically isolates the monitoring data exchange from the other application communication networks in the system, so communication between the monitoring center and the monitored nodes is not affected.
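The effect of sharing a domain can be illustrated with a toy data space keyed by domain and Topic. This is not a DDS implementation: the `GlobalDataSpace` class and its methods are hypothetical, and only the isolation property (data published in one domain is invisible in another) mirrors real DDS behavior.

```python
from collections import defaultdict


class GlobalDataSpace:
    """Toy in-memory data space; messages are keyed by (domain_id, topic)."""

    def __init__(self):
        self._messages = defaultdict(list)

    def publish(self, domain_id, topic, message):
        self._messages[(domain_id, topic)].append(message)

    def take(self, domain_id, topic):
        """Return and clear the pending messages for one Topic in one domain."""
        return self._messages.pop((domain_id, topic), [])


space = GlobalDataSpace()
space.publish(0, "monitor/node-1", "cpu=0.4")    # monitoring traffic, domain 0
space.publish(1, "monitor/node-1", "app-data")   # other application, domain 1
print(space.take(0, "monitor/node-1"))
```

Even though both publishers use the same Topic name, the monitoring center reading in domain 0 only sees the monitoring sample.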
The monitoring items of a monitored node and the handling of its monitoring data include:
for a computing node, the monitoring items comprise CPU usage, memory usage, disk usage, network bandwidth usage, node liveness, node working state, task running state, and a monitoring timestamp;
for a terminal node, the monitoring items comprise those of a computing node plus the data collected by sensors and the terminal working state;
after receiving the binding-success flag returned by the monitoring center, the resource monitor of the monitored node monitors these items at the preset monitoring period, stores the results locally, and publishes them through the DDS middleware to the global data space under the bound monitoring data Topic;
the node packages the collected monitoring data in a recognizable data format.
The application also discloses a distributed resource monitoring system.
As shown in Fig. 2, a distributed resource monitoring system comprises a monitoring center and a plurality of nodes, wherein the monitoring center deploys a cluster monitor and DDS middleware, and every node deploys a resource monitor and DDS middleware, wherein:
the cluster monitor and DDS middleware of the monitoring center are used for monitoring all bound computing nodes and terminal nodes in the cluster;
the resource monitor is used for monitoring the resource state of its node;
the DDS middleware is used for publishing the monitored resource state data to the global data space;
through the DDS middleware, the monitoring center subscribes to and obtains the monitoring data that the computing nodes and terminal nodes publish in the global data space.
The monitoring center can dynamically adjust the pull rate of the monitoring data according to its own load, including its data receiving and processing capacity and its communication bandwidth.
The application realizes the resource and state monitoring of each distributed node in the cluster through the DDS publishing/subscribing mechanism, the nodes agree on the Topic of monitoring data transmission communication by calling the Topic binding interface of the monitoring center, and the monitoring center and the monitored nodes publish and acquire monitoring data through the same Topic. The application realizes the active data acquisition of the monitoring center in a publish/subscribe mode, can dynamically adjust the quantity and the frequency of the monitoring data acquisition according to the load of the monitoring center, and effectively solves the problem of data congestion of a receiving end. Meanwhile, the flexibility and the expandability of the system are improved.
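The source does not say how the monitoring center maps its load to a pull rate. One simple policy, shown purely as an assumption, is to double the pull interval when load is high and halve it when load is low, clamped to fixed bounds; the function name and every threshold below are hypothetical.

```python
def adjust_pull_interval(interval_s, load, low=0.3, high=0.8,
                         min_interval_s=0.1, max_interval_s=10.0):
    """Return the next pull interval given the center's load in [0, 1].

    High load slows pulling down; low load speeds it up. All thresholds are
    illustrative defaults, not values taken from the source.
    """
    if load > high:
        interval_s *= 2.0
    elif load < low:
        interval_s /= 2.0
    return max(min_interval_s, min(max_interval_s, interval_s))


print(adjust_pull_interval(1.0, load=0.9))  # 2.0
print(adjust_pull_interval(1.0, load=0.1))  # 0.5
```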
The system comprises N computing nodes, N terminal nodes and a monitoring center, wherein:
The computing node provides computing resources for running computing tasks;
the terminal nodes provide sensing resources and functional resources for realizing computation, storage and communication of the terminal equipment.
The terminal node is a combination of a terminal device and a computing device. The terminal device comprises sensing devices and functional devices. The computing device is a general-purpose computing device with computing, storage, and communication capabilities, on which a resource agent is deployed for control interaction and data interaction with the terminal device. The resource agent runs in a container on the computing device; the driver of the terminal device is packaged in the container, and the container exposes an interface method externally for controlling the terminal device.
The interface method comprises one or more of a Socket call, a Web Service, JSON-RPC, and a RESTful API.
All nodes in the system include a processor and a computer readable storage medium having stored therein a computer program for enabling distributed monitoring.
As shown in Fig. 1, the distributed resource monitoring process comprises two flows, one for the monitored node and one for the monitoring center.
For the monitored node:
Step a1, each computing node and terminal node in the cluster calls the topic binding interface of the monitoring center to register itself, the call parameters comprising the node name, node IP address, node MAC physical address, monitoring data Topic, and monitoring period;
Step a2, after receiving the binding-success flag, the node starts multiple threads to monitor its resources concurrently;
Step a3, the node stores the monitoring data on a local storage medium;
Step a4, the node packages the collected monitoring data in JSON format;
Step a5, the node publishes the packaged monitoring data to the global data space through the DDS middleware under the monitoring data Topic bound in step a1, and then returns to step a2.
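The monitored-node cycle (collect concurrently, store locally, package as JSON, publish) can be sketched as below. The `InMemoryDataSpace` class stands in for DDS middleware, the collectors are fake callables, and the list used as the local log replaces a real storage medium; all of these names are assumptions for illustration.

```python
import json
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class InMemoryDataSpace:
    """Hypothetical stand-in for the DDS global data space."""

    def __init__(self):
        self.messages = defaultdict(list)

    def publish(self, topic, message):
        self.messages[topic].append(message)


def run_cycle(collectors, topic, data_space, local_log):
    # Poll every resource concurrently, one worker thread per collector.
    with ThreadPoolExecutor(max_workers=len(collectors)) as pool:
        futures = {name: pool.submit(fn) for name, fn in collectors.items()}
        readings = {name: future.result() for name, future in futures.items()}
    record = {"timestamp": time.time(), "readings": readings}
    local_log.append(record)                      # keep a local copy
    message = json.dumps(record, sort_keys=True)  # package as JSON
    data_space.publish(topic, message)            # publish under the bound Topic
    return message


def run_node(collectors, topic, data_space, local_log, period_s, cycles):
    """Repeat the cycle at the monitoring period (bounded here for the example)."""
    for _ in range(cycles):
        run_cycle(collectors, topic, data_space, local_log)
        time.sleep(period_s)


space, log = InMemoryDataSpace(), []
fake_collectors = {"cpu": lambda: 0.5, "mem": lambda: 0.25}
run_node(fake_collectors, "monitor/node-1", space, log, period_s=0.0, cycles=2)
print(len(space.messages["monitor/node-1"]))
```

A real node would loop indefinitely and read actual counters; the bounded `cycles` argument only keeps the sketch finite.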
For the monitoring center:
Step b1, the monitoring center starts the topic binding interface service;
Step b2, the monitoring center receives and parses a node's Topic binding request, subscribes to the node's monitoring data Topic according to the Topic field, starts a monitoring thread for obtaining the monitoring data, and returns a binding-success flag;
Step b3, the monitoring center obtains the monitoring data of the corresponding node from the global data space at the node monitoring data pull rate set by the user;
Step b4, if the most recently obtained monitoring data is abnormal and exceeds a set threshold, the monitoring center raises an event alarm.
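The threshold check in the last step could look like the sketch below. The function name, the per-item upper limits, and the alert message format are assumptions; the source only states that an alarm is raised when data exceeds a set threshold.

```python
def find_alerts(sample, thresholds):
    """Return one alert string per monitored item that exceeds its threshold.

    `sample` maps item names to numeric readings; `thresholds` maps item names
    to upper limits. Items missing from the sample are skipped.
    """
    alerts = []
    for item, limit in thresholds.items():
        value = sample.get(item)
        if value is not None and value > limit:
            alerts.append(f"{item}={value} exceeds threshold {limit}")
    return alerts


latest = {"cpu": 0.95, "mem": 0.4}
print(find_alerts(latest, {"cpu": 0.9, "mem": 0.8}))
```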
After the computing node and the terminal node successfully call the topic binding interface, the monitored node and the monitoring center have agreed on a Topic, and they communicate by publishing and subscribing to data on that Topic. The monitored node repeats the resource state monitoring and monitoring data publishing of steps a2 to a5, and the monitoring center responds to abnormal node resources or states in step b4.
After receiving a node's topic binding request, the monitoring center parses and processes the interface data, then reads the node's monitoring data in a loop at the preset pull rate in step b3.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present application should be included in the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (8)

1. A method for monitoring distributed resources, the method being applied to a distributed system, the method comprising:
establishing a software environment, and deploying middleware on each node, wherein all the middleware covers a monitoring center and each node, a global data space is formed between the monitoring center and each node, each node transmits data to the global data space through the middleware, and the monitoring center subscribes monitoring data needed in the global data space through the middleware;
The resource monitor node registers the related information of the node into a database of the monitoring center by calling a theme binding interface of the monitoring center, so as to realize the one-to-one mapping relation between the theme and the monitoring node;
After the monitoring center theme binding interface is called by the monitoring node, subscribing the monitoring data of the node according to the theme Topic provided by the monitoring node, setting the pulling rate of the monitoring data by a user, and acquiring the monitoring data of the corresponding node from the global data space;
the specific steps of each node comprise:
step 1, each computing node and terminal node in the cluster calls the topic binding interface of the monitoring center to register the node, wherein the parameters of the interface call comprise the node name, node IP address, node MAC physical address, monitoring data Topic and monitoring period;
step 2, after receiving the binding success flag, the node starts multiple threads to monitor its resources concurrently;
step 3, the node stores the monitoring data in a local storage medium;
step 4, the node encapsulates the collected monitoring data in JSON format;
step 5, the node publishes the encapsulated monitoring data to the global data space through the DDS middleware under the monitoring data Topic bound in step 1, and then returns to step 2;
the specific steps of the monitoring center comprise:
step 1, the monitoring center starts the topic binding interface service;
step 2, the monitoring center receives and parses the node's Topic binding request, subscribes to the node's monitoring data Topic according to the Topic field, starts a monitoring thread for acquiring the monitoring data, and returns a binding success flag;
step 3, the monitoring center acquires the monitoring data of the corresponding node from the global data space at the pull rate set by the user for that node's monitoring data.
2. The distributed resource monitoring method according to claim 1, wherein the Topic is generated by a random string method or by concatenating the MAC physical address of the node.
3. The distributed resource monitoring method according to claim 1, wherein the monitoring items of a monitored node and the processing of its monitoring data comprise:
for a computing node, the monitoring items comprise node CPU occupancy, memory occupancy, disk occupancy, network bandwidth occupancy, node survival state, node working state, task running state and monitoring timestamp;
for a terminal node, the monitoring items comprise the monitoring items of a computing node, the data collected by sensors and the terminal working state;
after receiving the binding success flag returned by the monitoring center, the resource monitor of the monitored node monitors the node's monitoring items at the preset monitoring period, stores the monitoring result data locally, and publishes the monitoring result data to the global data space through the DDS middleware under the bound monitoring data Topic;
the node encapsulates the collected monitoring data in a recognizable data format.
4. A distributed resource monitoring system, characterized by comprising a monitoring center and a plurality of nodes, wherein the monitoring center deploys a cluster monitor and DDS middleware, and every node deploys a resource monitor and DDS middleware, wherein:
the monitoring center is provided with the cluster monitor and the DDS middleware for monitoring all bound computing nodes and terminal nodes of the cluster;
the resource monitor is used for monitoring the resource state of the node;
the DDS middleware is used for publishing the monitored resource state data to the global data space;
the monitoring center subscribes to and acquires, through the DDS middleware, the monitoring data published in the global data space by the computing nodes and terminal nodes;
the specific steps of each node comprise:
step 1, each computing node and terminal node in the cluster calls the topic binding interface of the monitoring center to register the node, wherein the parameters of the interface call comprise the node name, node IP address, node MAC physical address, monitoring data Topic and monitoring period;
step 2, after receiving the binding success flag, the node starts multiple threads to monitor its resources concurrently;
step 3, the node stores the monitoring data in a local storage medium;
step 4, the node encapsulates the collected monitoring data in JSON format;
step 5, the node publishes the encapsulated monitoring data to the global data space through the DDS middleware under the monitoring data Topic bound in step 1, and then returns to step 2;
the specific steps of the monitoring center comprise:
step 1, the monitoring center starts the topic binding interface service;
step 2, the monitoring center receives and parses the node's Topic binding request, subscribes to the node's monitoring data Topic according to the Topic field, starts a monitoring thread for acquiring the monitoring data, and returns a binding success flag;
step 3, the monitoring center acquires the monitoring data of the corresponding node from the global data space at the pull rate set by the user for that node's monitoring data.
5. The distributed resource monitoring system of claim 4, wherein the system comprises N computing nodes, N terminal nodes, and a monitoring center, wherein:
the computing node provides computing resources for running computing tasks;
the terminal node provides sensing resources and functional resources and implements the computing, storage and communication of the terminal device.
6. The distributed resource monitoring system of claim 5 wherein the terminal nodes comprise terminal devices and computing devices, the terminal devices comprise sensing devices and functional devices, the computing devices comprise general purpose computing devices with computing, storage and communication capabilities, the computing devices are provided with resource agents for control interaction and data interaction with the terminal devices, the resource agents are operated in containers of the computing devices, the containers encapsulate drivers of the terminal devices, and the containers provide interface methods for controlling the terminal devices.
7. The distributed resource monitoring system of claim 6, wherein the interface method comprises one or more of Socket calls, Web Services, JSON-RPC, and RESTful APIs.
8. The distributed resource monitoring system of claim 4 wherein all nodes in the system include a processor and a computer readable storage medium having stored therein a computer program for effecting distributed monitoring.
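The binding and publish/subscribe flow recited in the claims can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `GlobalDataSpace` is a plain Python stand-in and not a DDS API, the class and function names (`MonitoringCenter`, `bind_topic`, `mac_topic`, `publish_sample`) and the `monitor_` prefix are hypothetical, and deriving the Topic from the MAC address is one reading of claim 2.

```python
import json
import string
from dataclasses import dataclass, field


def mac_topic(mac: str, prefix: str = "monitor_") -> str:
    """Build a Topic name by concatenating a prefix with the node's MAC address."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return prefix + digits


@dataclass
class GlobalDataSpace:
    """In-memory stand-in for the global data space (not a real DDS API)."""
    samples: dict = field(default_factory=dict)  # topic -> list of JSON strings

    def publish(self, topic: str, payload: str) -> None:
        self.samples.setdefault(topic, []).append(payload)

    def take(self, topic: str) -> list:
        """Return and clear every sample currently queued on a topic."""
        return self.samples.pop(topic, [])


@dataclass
class MonitoringCenter:
    space: GlobalDataSpace
    bindings: dict = field(default_factory=dict)  # topic -> node information

    def bind_topic(self, name, ip, mac, topic, period_s) -> bool:
        """Register a node; refuse the call if the topic or the node is already bound."""
        node_bound = any(info["mac"] == mac for info in self.bindings.values())
        if topic in self.bindings or node_bound:
            return False  # keeps the topic-to-node mapping one-to-one
        self.bindings[topic] = {"name": name, "ip": ip, "mac": mac, "period_s": period_s}
        return True  # binding success flag

    def pull(self, topic: str) -> list:
        """Fetch and decode the samples published on a bound topic."""
        if topic not in self.bindings:
            raise KeyError(f"topic {topic!r} is not bound")
        return [json.loads(s) for s in self.space.take(topic)]


def publish_sample(space: GlobalDataSpace, topic: str, sample: dict) -> None:
    """Node side: encapsulate a sample as JSON and publish it on its topic."""
    space.publish(topic, json.dumps(sample))
```

In this model a node would call `bind_topic` once, then call `publish_sample` every monitoring period, while the monitoring center calls `pull` at the user-set rate. A real deployment would replace `GlobalDataSpace` with DDS publishers and subscribers.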
CN202111636767.XA 2021-12-29 2021-12-29 A distributed resource monitoring method and system Active CN114443422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111636767.XA CN114443422B (en) 2021-12-29 2021-12-29 A distributed resource monitoring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111636767.XA CN114443422B (en) 2021-12-29 2021-12-29 A distributed resource monitoring method and system

Publications (2)

Publication Number Publication Date
CN114443422A CN114443422A (en) 2022-05-06
CN114443422B CN114443422B (en) 2025-06-27

Family

ID=81365200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111636767.XA Active CN114443422B (en) 2021-12-29 2021-12-29 A distributed resource monitoring method and system

Country Status (1)

Country Link
CN (1) CN114443422B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979158B (en) * 2022-05-23 2024-04-09 深信服科技股份有限公司 Resource monitoring method, system, equipment and computer readable storage medium
CN117319234B (en) * 2023-09-14 2025-01-21 西安探索者智能光电科技有限公司 Distributed communication system and communication method based on GRPC and OpenDDS

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943649A (en) * 2017-11-21 2018-04-20 郑州云海信息技术有限公司 A kind of distributed type assemblies performance monitoring system and method
CN108829509A (en) * 2018-05-03 2018-11-16 山东汇贸电子口岸有限公司 Distributed container cluster framework resources management method based on domestic CPU and operating system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996502B2 (en) * 2004-01-20 2006-02-07 International Business Machines Corporation Remote enterprise management of high availability systems
CN109547243B (en) * 2018-11-16 2021-12-03 南京华讯方舟通信设备有限公司 DDS-based cross-network-segment communication method

Also Published As

Publication number Publication date
CN114443422A (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant