CN105610648B - A method and server for collecting operation and maintenance monitoring data - Google Patents
- Publication number
- CN105610648B (CN201610014809.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Debugging And Monitoring (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a method and a server for collecting operation and maintenance monitoring data, and belongs to the field of information security. Based on the collection interval of each monitoring item in the server configuration file, the server identifies the monitoring items that meet the collection condition, sends the monitoring information corresponding to those items to the client, receives the collection result returned by the client, and processes the collection result. With this technical method, monitoring operates at a finer granularity: it moves from the machine as a whole down to an individual monitorable resource on the machine. When a machine needs a new monitored resource, the system program does not need to be modified; it is enough to write the corresponding monitoring data collection application and configure it into the system. This improves the stability of the system and increases its scalability.
Description
Technical field
The invention relates to the field of information security, and in particular to a method and a server for collecting operation and maintenance monitoring data.
Background
In existing disaster recovery systems, operation and maintenance monitoring can only be extended at the machine level, not at the level of the resources on a single machine. The resources monitored on each machine are designed in advance, so it is difficult to add a new monitored resource to a single machine.
Therefore, adding a monitored resource requires development support at the system level, which harms the stability and maintainability of the system.
Summary of the invention
The purpose of the invention is to solve the problems in the prior art by providing a method and a server for collecting operation and maintenance monitoring data.
The technical solution adopted by the invention is a method for collecting operation and maintenance monitoring data, comprising:
Step S1: the server determines, according to the collection interval of each monitoring item in the server configuration file, whether any monitoring item meets the collection condition; if so, step S2 is executed; otherwise step S1 continues;
Step S2: the server sends the monitoring information corresponding to the monitoring items that meet the collection condition to the client; the client is installed on the monitored machine;
Step S3: the server receives the collection result returned by the client; the collection result is the data that the client collected on the monitored machine according to the received monitoring information;
Step S4: the server processes the collection result and returns to step S1.
The method further comprises: the server obtains an updated server configuration file and determines, according to the collection interval of each monitoring item in the updated server configuration file, whether any monitoring item meets the collection condition; if so, step S2 is executed; otherwise step S1' continues.
Step S1 is specifically: the server registers a timed event for each monitoring item according to its collection interval in the server configuration file; when a timed event is detected to have fired, a monitoring item meets the collection condition and step S2 is executed; otherwise step S1 continues.
Registering a timed event for each monitoring item according to its collection interval is specifically: the server parses all monitoring items from the server configuration file, creates a worker process group for each monitoring item according to the number of monitoring items, and registers a timed event for each worker process group according to the collection interval of its monitoring item.
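The timed-event check in step S1 can be illustrated with a small scheduler. This is only a sketch under assumptions: the names `TimedEvent`, `register_events` and `pop_due` are hypothetical, time is passed in explicitly as seconds, and a single in-process heap stands in for the worker process groups, which the text does not describe in detail.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TimedEvent:
    due: float                              # time at which the event fires
    name: str = field(compare=False)        # monitoring item name
    interval: float = field(compare=False)  # collection interval in seconds


def register_events(intervals, now=0.0):
    """Register one timed event per monitoring item from {name: interval}."""
    for name, interval in intervals.items():
        if interval <= 0:
            raise ValueError(f"interval for {name!r} must be positive")
    heap = [TimedEvent(now + iv, name, iv) for name, iv in intervals.items()]
    heapq.heapify(heap)
    return heap


def pop_due(heap, now):
    """Return the items whose event has fired by `now` and re-arm each one."""
    due = []
    while heap and heap[0].due <= now:
        event = heapq.heappop(heap)
        due.append(event.name)
        # Re-arm relative to `now`, so an item fires at most once per call.
        heapq.heappush(heap, TimedEvent(now + event.interval, event.name, event.interval))
    return due
```

An empty list from `pop_due` corresponds to "continue step S1"; a non-empty list names the monitoring items to hand to step S2.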
If the client includes a data collection application, the following steps are further included between step S2 and step S3:
Step d1: the client sends the received monitoring information to the data collection application;
Step d2: the data collection application collects the corresponding data on the client according to the received monitoring information and sends the collection result to the server.
Step S2 is specifically: when the server configuration file is in text file format, the server calls the interface for processing the text file format and sends the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
Step S2 is specifically: when the server configuration file is in XML file format, the server calls the interface for processing the XML file format and sends the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
Step S4 specifically comprises:
Step e1: the server determines whether the collection result is abnormal; if so, step e2 is executed; otherwise the method returns to step S1;
Step e2: the server determines, according to the monitoring information of the monitoring item, whether the abnormal collection result requires an alarm; if so, step e3 is executed; otherwise the method returns to step S1;
Step e3: the server determines, according to the monitoring information in the monitoring item, whether sending an alarm notification is allowed; if so, step e4 is executed; otherwise the method returns to step S1;
Step e4: the server selects an alarm notification method, sends the alarm information to the administrator, and returns to step S1.
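Steps e1 to e4 form a chain of three checks followed by a notification. The sketch below shows one way to express that chain. The `MonitorInfo` fields, the threshold-based definition of "abnormal" and the `send` callback are assumptions made for illustration; the text does not say how abnormality is detected or which notification methods exist.

```python
from dataclasses import dataclass


@dataclass
class MonitorInfo:
    threshold: float              # assumed rule: values above this are abnormal
    alarm_required: bool          # consulted in step e2
    notify_allowed: bool          # consulted in step e3
    notify_method: str = "email"  # hypothetical notification channel for e4


def process_result(value, info, send):
    """Walk steps e1-e4 and return the step at which processing stopped."""
    if value <= info.threshold:        # e1: result is not abnormal
        return "e1"
    if not info.alarm_required:        # e2: abnormal, but no alarm needed
        return "e2"
    if not info.notify_allowed:        # e3: alarm needed, notification disabled
        return "e3"
    send(info.notify_method, f"abnormal value {value} (threshold {info.threshold})")
    return "e4"                        # e4: administrator notified
```

In every case the caller then returns to step S1, so the return value is useful mainly for logging and testing.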
Step S4 is specifically: the server determines, according to the server configuration file, whether the trend of the collection result needs to be analyzed; if so, it saves the collection result and returns to step S1; otherwise it returns directly to step S1.
Step S4 is specifically: the server determines, according to the collection result, whether log information needs to be recorded; if so, it records the log information and returns to step S1; otherwise it returns directly to step S1.
Before step S3, the method further comprises: the client loads the client configuration file, listens on the port for connections from the server according to the configuration parameters in the client configuration file, and waits to receive monitoring information from the server.
Before step S3, the method further comprises:
Step f1: after receiving the monitoring information sent by the server, the client determines, according to the client configuration file, whether the server is a device that is allowed to connect; if so, step f2 is executed; otherwise the client continues to wait for monitoring information;
Step f2: the client determines, according to the client configuration file, whether the monitoring information is monitoring information that is allowed to be executed; if so, step f3 is executed; otherwise the client continues to wait for monitoring information;
Step f3: the client collects the corresponding data according to the received monitoring information and sends the collection result to the server.
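The client-side checks in steps f1 to f3 amount to two allow-list lookups before any data is collected. A minimal sketch follows, assuming the client configuration is a dictionary with hypothetical keys `allowed_servers` and `allowed_items`, and that each collector is a plain callable; none of these names come from the text.

```python
def handle_request(server_ip, item, config, collectors):
    """Return the collection result, or None if the request is ignored."""
    if server_ip not in config["allowed_servers"]:  # f1: is the server allowed to connect?
        return None
    if item not in config["allowed_items"]:         # f2: may this monitoring info be executed?
        return None
    return collectors[item]()                       # f3: collect and return the result


# Example client configuration and collectors (illustrative values only).
CONFIG = {"allowed_servers": {"192.0.2.1"}, "allowed_items": {"cpu_usage"}}
COLLECTORS = {"cpu_usage": lambda: 12.5, "disk_usage": lambda: 71.0}
```

Returning `None` models "continue waiting for monitoring information": the client neither collects nor replies.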
Step S2 is specifically: when the server finds that the data collection mode in the server configuration file is the monitoring agent mode, it sends the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
Steps S2 and S3 are replaced by: when the server finds that the data collection mode in the server configuration file is the SNMP agent mode, it sends the monitoring information corresponding to the monitoring items that meet the collection condition to the SNMP agent, receives the collection result returned by the SNMP agent, and executes step S4.
If the server includes a data collector, step S2 is specifically:
Step b1: the server sends the monitoring information corresponding to the monitoring items that meet the collection condition to the data collector;
Step b2: the data collector sends the received monitoring information to the client.
Between step b1 and step b2, the method further comprises: when the data collector finds that the execution mode of data collection in the monitoring information is local execution, the data collector collects data on the server according to the monitoring information, obtains the collection result, and sends the collection result to the server; step S4 is then executed.
Step b2 is specifically: when the data collector finds that the execution mode of data collection in the monitoring information is client execution, it sends the received monitoring information to the client.
If the server includes a data collection application, the data collector collecting data on the server according to the monitoring information is specifically:
Step c1: the data collector sends the monitoring information to the data collection application;
Step c2: the data collection application collects the corresponding data on the server according to the received monitoring information;
Step c3: the data collection application sends the data collected on the server to the data collector.
Before step b2, the method further comprises: the data collector determines whether the received monitoring information contains auxiliary information; if so, it displays the auxiliary information and the procedure ends; otherwise step b2 is executed.
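The data collector's choices described above (display auxiliary information, collect locally, or forward to the client) can be sketched as a small routing function. The dictionary keys `auxiliary` and `execution`, the value `"local"`, and the order in which the two checks are made are assumptions; the text only states that both checks happen before step b2.

```python
def route(info, collect_locally, forward_to_client, display):
    """Decide what the data collector does with one piece of monitoring information."""
    if info.get("auxiliary"):
        display(info["auxiliary"])    # show auxiliary information and stop
        return None
    if info.get("execution") == "local":
        return collect_locally(info)  # steps c1-c3: collect on the server itself
    return forward_to_client(info)    # step b2: client execution
```

The three callbacks keep the sketch independent of any transport: in a real deployment `forward_to_client` would wrap the network call to the monitored machine.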
Between step S2 and step S3, the method further comprises:
Step g1: the client collects the corresponding data according to the received monitoring information and sends the collection result to the data collector;
Step g2: the data collector determines from the checksum in the collection result whether the collection result is correct; if so, step g3 is executed; otherwise an error is reported and the procedure ends;
Step g3: the data collector sends the collection result to the server.
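The checksum check in step g2 can be sketched as follows. The text does not name a checksum algorithm or a message format, so CRC32 (via `zlib.crc32`) and a JSON envelope are assumptions chosen for illustration, and `pack_result` and `verify_result` are hypothetical helper names.

```python
import json
import zlib


def pack_result(payload: dict) -> bytes:
    """Client side (g1): wrap the collection result with a CRC32 of its body."""
    body = json.dumps(payload, sort_keys=True)
    envelope = {"body": body, "crc32": zlib.crc32(body.encode("utf-8"))}
    return json.dumps(envelope).encode("utf-8")


def verify_result(message: bytes) -> dict:
    """Collector side (g2): recompute the checksum and reject a mismatch."""
    envelope = json.loads(message)
    body = envelope["body"]
    if zlib.crc32(body.encode("utf-8")) != envelope["crc32"]:
        raise ValueError("checksum mismatch in collection result")
    return json.loads(body)  # g3: this payload is forwarded to the server
```

CRC32 detects accidental corruption only; it offers no protection against deliberate tampering, which would call for a keyed MAC instead.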
Between step S1 and step S2, the method further comprises: the server determines, according to the operating system information in the server configuration file, whether the data on the client can be collected; if so, step S2 is executed; otherwise an error is reported and the method returns to step S1.
Determining whether the data on the client can be collected is specifically: the server sends the client a request for its operating system information, receives the client operating system information returned by the client, and determines whether the operating system information in the server configuration file matches the received client operating system information; if so, step S2 is executed; otherwise an error is reported and the method returns to step S1.
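The operating system check can be reduced to one request and one comparison. In this sketch `request_client_os` is a hypothetical callable standing in for the request/response exchange with the client, and the case-insensitive, whitespace-tolerant comparison is an assumption about what "match" means.

```python
def os_matches(configured_os: str, request_client_os) -> bool:
    """Ask the client for its OS and compare it with the configured value."""
    reported = request_client_os()  # stands in for the network round trip
    return configured_os.strip().lower() == reported.strip().lower()
```

A `False` result corresponds to the "report an error and return to step S1" branch.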
A server for collecting operation and maintenance monitoring data, comprising:
a first judging module, configured to determine, according to the collection interval of each monitoring item in the server configuration file, whether any monitoring item meets the collection condition;
a first sending module, configured to send the monitoring information corresponding to the monitoring items that meet the collection condition to the client when the first judging module determines that a monitoring item meets the collection condition;
a first receiving module, configured to receive the collection result returned by the client;
a processing module, configured to process the collection result received by the first receiving module and to trigger the first judging module.
The server further includes a second judging module;
The second judging module is configured to obtain an updated server configuration file and determine, according to the collection interval of each monitoring item in the updated server configuration file, whether any monitoring item meets the collection condition; if so, it triggers the first sending module; if not, it continues to trigger the second judging module.
The first judging module specifically includes a registration unit and a detection unit;
The registration unit is configured to register a timed event for each monitoring item according to its collection interval in the server configuration file;
The detection unit is configured to detect timed events and to trigger the first sending module when a timed event is detected to have fired.
The registration unit is specifically configured to parse all monitoring items from the server configuration file, create a worker process group for each monitoring item according to the number of monitoring items, and register a timed event for each worker process group according to the collection interval of its monitoring item.
The first sending module is specifically configured to, when the server configuration file is in text file format, call the interface for processing the text file format and send the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
The first sending module is specifically configured to, when the server configuration file is in XML file format, call the interface for processing the XML file format and send the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
The processing module specifically includes a first judging unit, a second judging unit, a third judging unit and an alarm unit;
The first judging unit is configured to determine whether the collection result received by the receiving module is abnormal; if so, it triggers the second judging unit; if not, it triggers the first judging module;
The second judging unit is configured to determine, according to the monitoring information of the monitoring item, whether the abnormal collection result requires an alarm; if so, it triggers the third judging unit; if not, it triggers the first judging module;
The third judging unit is configured to determine, according to the monitoring information in the monitoring item, whether sending an alarm notification is allowed; if so, it triggers the alarm unit; if not, it triggers the first judging module;
The alarm unit is configured to, when the third judging unit determines yes, select an alarm notification method, send the alarm information to the administrator, and trigger the first judging module.
The server further includes a storage module;
The processing module is specifically configured to determine, according to the server configuration file, whether the trend of the collected data needs to be analyzed; if so, it triggers the storage module; if not, it triggers the first judging module;
The storage module is configured to save the collection result and trigger the first judging module.
The server further includes a logging module;
The processing module is specifically configured to determine, according to the collected data, whether log information needs to be recorded; if so, it triggers the logging module; if not, it triggers the first judging module;
The logging module is configured to record log information and trigger the first judging module.
The first sending module is specifically configured to send the monitoring information corresponding to the monitoring items that meet the collection condition to the client when the data collection mode in the server configuration file is the monitoring agent mode.
The first sending module is further configured to send the monitoring information corresponding to the monitoring items that meet the collection condition to the SNMP agent when the data collection mode in the server configuration file is the SNMP mode;
The first receiving module is further configured to receive the collection result returned by the SNMP agent and trigger the processing module.
The server further includes a data collector module;
The data collector module includes a first receiving unit and a first sending unit;
The first sending module is further configured to send the monitoring information corresponding to the monitoring items that meet the collection condition to the first receiving unit when the first judging module determines that a monitoring item meets the collection condition;
The first receiving unit is configured to receive the monitoring information sent by the first sending module;
The first sending unit is configured to send the monitoring information received by the first receiving unit to the client.
The data collector module further includes a first collection unit;
The first collection unit is configured to, when the execution mode of data collection in the monitoring information is local execution, collect data on the server according to the monitoring information and trigger the first sending unit.
The first sending unit is specifically configured to send the monitoring information received by the first receiving unit to the client when the execution mode of data collection in the monitoring information is client execution.
The server includes a data collection application module;
The data collection application module includes a second receiving unit, a second collection unit and a second sending unit;
The second receiving unit is configured to receive the monitoring information sent by the data collector module;
The second collection unit is configured to collect the corresponding data on the server according to the monitoring information received by the second receiving unit;
The second sending unit is configured to send the data on the server collected by the second collection unit to the data collector module.
The data collector module further includes a fourth judging unit and a display unit;
The fourth judging unit is configured to determine whether the monitoring information received by the first receiving unit contains auxiliary information; if so, it triggers the display unit; if not, it triggers the first sending unit;
The display unit is configured to display the auxiliary information.
The server further includes a third judging module and an error reporting module;
The third judging module is configured to, when the first judging module determines yes, determine according to the operating system information in the server configuration file whether the data on the client can be collected; if so, it triggers the first sending module; if not, it triggers the error reporting module;
The error reporting module is configured to report an error and continue to trigger the first judging module.
The third judging module specifically includes a third sending unit, a third receiving unit and a fifth judging unit;
The third sending unit is configured to send the client a request for its operating system information;
The third receiving unit is configured to receive the client operating system information returned by the client;
The fifth judging unit is configured to determine whether the operating system information in the server configuration file matches the client operating system information received by the first receiving unit; if so, it triggers the first sending module; if not, it triggers the error reporting module.
The beneficial effects of the invention are as follows: with this technical method, monitoring operates at a finer granularity, moving from the machine as a whole down to an individual monitorable resource on the machine. When a machine needs a new monitored resource, the system program does not need to be modified; it is enough to write the corresponding monitoring data collection application and configure it into the system. This improves the stability of the system and increases its scalability.
Description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for collecting operation and maintenance monitoring data provided by Embodiment 1 of the invention;
FIG. 2 is a flowchart of a collection method of an operation and maintenance monitoring data system provided by Embodiment 2 of the invention;
FIG. 3 is a flowchart of the working method of the monitoring server in an operation and maintenance monitoring data system provided by Embodiment 3 of the invention;
FIG. 4 is a flowchart of the working method of the data collector in an operation and maintenance monitoring data system provided by Embodiment 4 of the invention;
FIG. 5 is a flowchart of the working method of the monitoring client in an operation and maintenance monitoring data system provided by Embodiment 5 of the invention;
FIG. 6 is a device diagram of a server for collecting operation and maintenance monitoring data provided by Embodiment 6 of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
Configuration files are registered in advance on both the server and the client. They include the configuration parameters for starting the service and the monitoring configuration files of the machines designated for monitoring. There may be several monitoring configuration files; each one represents one monitored machine, and the client is installed on the machine to be monitored. A monitoring configuration file contains the basic information of the machine and multiple monitoring items. Each monitoring item contains multiple pieces of monitoring information, including the type of data to collect and the attributes of the collected data (such as a description of the collected information, the IP address, and so on). Both the monitoring items and the monitoring information can be set by the user, or set according to the frequency at which the monitoring server or monitoring client collects data.
In the invention, any resource that can be expressed as a quantified number, and whose quantified data a program can collect in some way, can serve as a monitoring item. Monitoring items include, but are not limited to, CPU usage, CPU load, memory usage, disk usage, network interface traffic, and website response time.
Embodiment 1
Embodiment 1 of the present invention provides a method for collecting operation and maintenance monitoring data. In this embodiment, the server may be provided with a data collector and a data collection application, and the client may be provided with a data collection application. As shown in Fig. 1, the method includes:
Step S1: The server determines, according to the collection interval of each monitoring item in the server configuration file, whether any monitoring item satisfies the collection condition. If so, step S2 is executed; otherwise step S1 is repeated;
This step is specifically: the server registers a timed event for each monitoring item according to that item's collection interval in the server configuration file. When a timed event is detected to have fired, a monitoring item satisfies the collection condition and step S2 is executed; otherwise step S1 is repeated;
Registering timed events for the monitoring items according to their collection intervals is specifically: the server parses the server configuration file to obtain all monitoring items, creates worker process groups for the monitoring items according to their number, and registers a timed event for each worker process group according to the collection interval of each monitoring item.
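The timed-event registration in step S1 can be sketched as a small priority queue keyed by the next due time. This is only a minimal illustration under stated assumptions: the names `TimedEvent`, `register_events`, and `fire_due` are hypothetical, time is passed in as plain seconds rather than read from a clock, and worker process groups are not modeled.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TimedEvent:
    due: float                               # next time (seconds) the event fires
    item: str = field(compare=False)         # monitoring item name (hypothetical)
    interval: float = field(compare=False)   # collection interval in seconds


def register_events(intervals, start=0.0):
    """Register one timed event per monitoring item (step S1)."""
    heap = [TimedEvent(start + iv, name, iv) for name, iv in intervals.items()]
    heapq.heapify(heap)
    return heap


def fire_due(heap, now):
    """Return the items whose events have fired by `now` and re-arm them."""
    fired = []
    while heap and heap[0].due <= now:
        ev = heapq.heappop(heap)
        fired.append(ev.item)
        # Re-arm the event so the item is collected again one interval later.
        heapq.heappush(heap, TimedEvent(ev.due + ev.interval, ev.item, ev.interval))
    return fired


events = register_events({"swap_usage": 120, "cpu_load": 60})
print(fire_due(events, 60))           # only cpu_load has reached its interval
print(sorted(fire_due(events, 120)))  # both items are due at 120 seconds
```

An item that appears in the returned list corresponds to "a monitoring item satisfies the collection condition", at which point step S2 would run for it.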
Step S2: The server sends the monitoring information corresponding to the monitoring item that satisfies the collection condition to the client;
The client is installed on the machine to be monitored;
When the server configuration file is in text file format, the interface for processing the text file format is called, and the monitoring information corresponding to the monitoring item that satisfies the collection condition is sent to the client;
When the server configuration file is in XML file format, the interface for processing the XML file format is called, and the monitoring information corresponding to the monitoring item that satisfies the collection condition is sent to the client;
This step specifically includes:
When the data collection mode that the server obtains from the server configuration file is the monitoring agent mode, the monitoring information corresponding to the monitoring item that satisfies the collection condition is sent to the client;
When the data collection mode that the server obtains from the server configuration file is the SNMP agent mode, the monitoring information corresponding to the monitoring item that satisfies the collection condition is sent to the SNMP agent, the collected data returned by the SNMP agent is received, and step S4 is executed.
Step S3: The server receives the collection result returned by the client;
In this embodiment, the collection result is the data collected by the client on the monitored machine according to the received monitoring information;
Step S4: The server processes the collection result and returns to step S1.
In this embodiment, the server's processing of the collection result includes raising alarms (including restarting the client), saving the collected data to analyze trends, and recording log information.
In this embodiment, the following step is further included before step S2:
Step S1': The server obtains an updated server configuration file and determines, according to the collection interval of each monitoring item in the updated server configuration file, whether any monitoring item satisfies the collection condition. If so, step S2 is executed; otherwise step S1' is repeated;
The monitoring items in the updated server configuration file differ from those in the server configuration file before the update.
Embodiment 2
Embodiment 2 of the present invention provides a working method of an operation and maintenance monitoring data collection system, applied to a system including a server and a client. The server includes a monitoring server, a data collector, and a data collection application; the client includes a monitoring client and a data collection application. Specifically, the data collector is the application that the server uses to obtain collection results. The data collection application may be deployed on the server or on the client, and is the plug-in that the monitoring server or monitoring client uses for data collection. As shown in Fig. 2, the method includes:
Step 101: The monitoring server starts, registers a timed event for each monitoring item according to that item's collection interval in the server configuration file, and waits for a timed event to fire. When a timed event fires, step 102 is executed; otherwise the server continues to wait;
In this embodiment, a collection duration is preset in each monitoring item of the server configuration file. The collection duration may be set by the user, or set by the monitoring server according to the data collection rate;
The timed event of the monitoring server times the collection duration of each monitoring item in the server configuration file. When the collection interval of a monitoring item reaches that item's collection duration, the following steps are performed on that item;
In this embodiment, the format of the monitoring items may be set by the user, and includes text file format, XML file format, and the like;
For example, a server configuration file in text file format is as follows (note that a server configuration file may contain one or more monitoring items; this embodiment uses a configuration file containing one monitoring item as an example):
For example, a server configuration file in XML file format is:
<service>
  <use>generic-service</use>
  <host_name>192.168.88.179</host_name>
  <service_description>Swap Usage</service_description>
  <check_command>check_nrpe!check_swap!20!10</check_command>
  <active_checks_enabled>1</active_checks_enabled>
  <passive_checks_enabled>1</passive_checks_enabled>
  <parallelize_check>1</parallelize_check>
  <obsess_over_service>1</obsess_over_service>
  <check_freshness>0</check_freshness>
  <notifications_enabled>1</notifications_enabled>
  <event_handler_enabled>1</event_handler_enabled>
  <flap_detection_enabled>1</flap_detection_enabled>
  <process_perf_data>1</process_perf_data>
  <retain_status_information>1</retain_status_information>
  <retain_nonstatus_information>1</retain_nonstatus_information>
  <is_volatile>0</is_volatile>
  <check_period>24x7</check_period>
  <max_check_attempts>3</max_check_attempts>
  <normal_check_interval>2</normal_check_interval>
  <retry_check_interval>2</retry_check_interval>
  <contact_groups>admins</contact_groups>
  <notification_options>w,u,c,r</notification_options>
  <notification_interval>60</notification_interval>
  <notification_period>24x7</notification_period>
  <register>0</register>
</service>
From the monitoring information service_description Swap Usage or <service_description>Swap Usage</service_description> in the two types of server configuration file above, it can be seen that this configuration file is used to check, in the monitoring client, the usage of the swap partition on machine 192.168.89.179. From the monitoring information normal_check_interval 2 or <normal_check_interval>2</normal_check_interval>, it can be seen that the collection duration is 2 minutes;
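Reading the monitoring information out of an XML monitoring item can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustrative sketch, not the patent's XML-processing interface: the function name `load_monitoring_item` and the derived key `interval_seconds` are assumptions, and only a subset of the elements from the sample is used.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the XML monitoring item from the embodiment.
CONFIG_XML = """
<service>
  <host_name>192.168.88.179</host_name>
  <service_description>Swap Usage</service_description>
  <check_command>check_nrpe!check_swap!20!10</check_command>
  <normal_check_interval>2</normal_check_interval>
  <notifications_enabled>1</notifications_enabled>
  <notification_period>24x7</notification_period>
</service>
"""


def load_monitoring_item(xml_text):
    """Turn one <service> element into a dict of monitoring information."""
    root = ET.fromstring(xml_text)
    info = {child.tag: (child.text or "").strip() for child in root}
    # normal_check_interval is given in minutes in the embodiment.
    info["interval_seconds"] = int(info["normal_check_interval"]) * 60
    return info


item = load_monitoring_item(CONFIG_XML)
print(item["service_description"], item["interval_seconds"])
```

With the sample above, the item describes the "Swap Usage" check and yields a 120-second collection interval, matching the 2-minute collection duration stated in the text.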
In this embodiment, after registering the timed event, the monitoring server records a first time. When it detects that the interval between the current time and the first time (that is, the collection interval) reaches an integer multiple of the collection duration, step 102 is executed;
For example, if the first time recorded after the monitoring server registers the timed event is 2015.09.24 10:39:25, then when the interval between the current time and the first time reaches the shortest collection duration among all monitoring items (2 minutes), that is, when the current time is 2015.09.24 10:41:25, step 102 is executed;
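The "integer multiple of the collection duration" check can be worked through with the timestamps from the example. The sketch below uses only the standard `datetime` module; the function name `is_due` is hypothetical, and it compares exact times, whereas a real server would presumably allow some tolerance around each trigger point.

```python
from datetime import datetime, timedelta


def is_due(first_time, now, duration):
    """True when (now - first_time) is a positive integer multiple of duration."""
    elapsed = now - first_time
    return elapsed > timedelta(0) and elapsed % duration == timedelta(0)


first = datetime(2015, 9, 24, 10, 39, 25)   # first time recorded at registration
duration = timedelta(minutes=2)             # shortest collection duration

print(first + duration)                                             # 2015-09-24 10:41:25
print(is_due(first, datetime(2015, 9, 24, 10, 41, 25), duration))   # True
print(is_due(first, datetime(2015, 9, 24, 10, 42, 25), duration))   # False
```

The first trigger falls at 10:41:25, as in the text, and later triggers follow at 10:43:25, 10:45:25, and so on.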
Step 102: The monitoring server determines, according to the operating system specified in the server configuration file, whether data can be collected on the monitored machine corresponding to the monitoring client. If so, step 103 is executed; otherwise the server continues to wait for a timed event to fire;
In this embodiment, this step is specifically:
The monitoring server sends a request to the monitoring client for the client's operating system, receives the client operating system information returned by the monitoring client, and determines whether the operating system information in the server configuration file matches the received client operating system information. If so, step 103 is executed; otherwise an error is reported and the server continues to wait for a timed event to fire;
Further, in this embodiment, if no operating system is configured in the server configuration file, or the operating system is configured to be detected automatically, step 103 is executed directly after step 101;
Step 103: The monitoring server obtains the data collection mode from the server configuration file. If it is the monitoring agent mode, step 104 is executed; if it is the SNMP agent mode, step 115 is executed;
In this embodiment, SNMP is a network management standard based on the TCP/IP protocol suite and a standard protocol for managing network nodes in an IP network; it enables direct interaction between the monitoring server and the monitored machine.
Step 104: The monitoring server sends the monitoring information corresponding to the monitoring item whose timed event fired to the data collector;
Step 105: The data collector obtains the execution mode of the data collection from the received monitoring information. If the collection is executed locally, step 111 is executed; if it is executed on the monitoring client, step 106 is executed;
In this embodiment, it is agreed in advance that if check_agent does not appear in check_command (the execution mode of the data collection) in the monitoring information, the data collection is executed locally; if check_agent is used in check_command, the data collection is executed on the monitoring client;
For example, in the example of step 101, the monitoring information check_nrpe!check_swap!20!10 that the data collector obtains from the server configuration file does not contain check_agent, which indicates that the data collection is executed locally, and step 111 is executed;
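The convention in step 105 can be sketched as a check on the parts of `check_command`. The sketch assumes that `!` separates the command name from its arguments, as the sample value suggests; the function names and the second sample command are hypothetical.

```python
def parse_check_command(check_command):
    """Split 'name!arg1!arg2' into (name, [args])."""
    name, *args = check_command.split("!")
    return name, args


def execution_target(check_command):
    """Return 'client' if check_agent appears in the command, else 'local'."""
    name, args = parse_check_command(check_command)
    return "client" if "check_agent" in [name, *args] else "local"


print(execution_target("check_nrpe!check_swap!20!10"))    # local  -> step 111
print(execution_target("check_agent!check_swap!20!10"))   # client -> step 106
```

The first command is the one from the embodiment; the second is an invented variant that only illustrates the other branch.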
Step 106: The data collector sends the monitoring information to the monitoring client;
Step 107: The monitoring client sends the monitoring information to the data collection application;
Step 108: The data collection application runs the received monitoring information, collects data on the monitored machine accordingly, and obtains a collection result in a preset format;
In this embodiment, preferably, the collection result in the preset format is:
current status description of the monitoring item|description of the collected information=current collected value;alarm threshold;failure threshold;y-axis minimum in the output report;y-axis maximum in the output report;current status description of the monitoring item|description of the collected information=current collected value;alarm threshold;failure threshold;y-axis minimum in the output report;y-axis maximum in the output report... (multiple collected entries are separated by ";");
For example, when the monitoring information is used to collect memory usage, the collection result obtained is:
Memory Free:/246MB(16%)|percentage=24;20;10;0;100;
Here, the current status description of the monitoring item is Memory Free:/246MB(16%), the description of the collected information is percentage, the current collected value is 24, the alarm threshold is 20, the failure threshold is 10, the y-axis minimum in the output report is 0, and the y-axis maximum in the output report is 100.
Step 109: The data collection application returns the collection result to the monitoring client;
Step 110: The monitoring client returns the collection result to the data collector, and step 114 is executed;
Step 111: The data collector sends all monitoring information to the data collection application;
Step 112: The data collection application runs the received monitoring information, collects data accordingly, and obtains a collection result in the preset format;
For example, when the configuration information is used to collect the CPU load, the collection result obtained is:
OK-load average:0.16,0.12,0.09|load1=0.160;15.000;30.000;0;load5=0.120;10.000;25.000;0;load15=0.090;5.000;20.000;0;
Here, the current status description of the monitoring item is OK-load average:0.16,0.12,0.09, and the collection result contains three values:
First:
Description of the collected information: load1; current collected value: 0.160; alarm threshold: 15.000; failure threshold: 30.000; y-axis minimum in the output report: 0; y-axis maximum in the output report: none (unlimited)
Second:
Description of the collected information: load5; current collected value: 0.120; alarm threshold: 10.000; failure threshold: 25.000; y-axis minimum in the output report: 0; y-axis maximum in the output report: none (unlimited)
Third:
Description of the collected information: load15; current collected value: 0.090; alarm threshold: 5.000; failure threshold: 20.000; y-axis minimum in the output report: 0; y-axis maximum in the output report: none (unlimited);
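The preset result format used in steps 108 and 112 can be parsed with a short routine. The sketch below is one possible reading of the two sample results, not a definitive grammar: it assumes that a token containing "=" starts a new collected entry and that a missing trailing field (such as the y-axis maximum in the load example) means "unlimited". The function name `parse_result` and the dictionary keys are hypothetical.

```python
FIELDS = ("value", "alarm", "failure", "ymin", "ymax")


def parse_result(line):
    """Split 'status|label=value;alarm;failure;ymin;ymax;...' into parts."""
    status, _, data = line.partition("|")
    groups = []
    for token in data.split(";"):
        token = token.strip()
        if not token:
            continue                      # trailing ';' leaves an empty token
        if "=" in token:                  # 'label=value' opens a new entry
            label, value = token.split("=", 1)
            groups.append((label.strip(), [float(value)]))
        elif groups:                      # remaining numbers belong to the last entry
            groups[-1][1].append(float(token))
    entries = []
    for label, numbers in groups:
        entry = {"label": label}
        for i, key in enumerate(FIELDS):
            # A field that is absent from the result is reported as None.
            entry[key] = numbers[i] if i < len(numbers) else None
        entries.append(entry)
    return status.strip(), entries


status, entries = parse_result(
    "OK-load average:0.16,0.12,0.09|load1=0.160;15.000;30.000;0;"
    "load5=0.120;10.000;25.000;0;load15=0.090;5.000;20.000;0;"
)
print(status)
for e in entries:
    print(e["label"], e["value"], e["alarm"], e["failure"], e["ymin"], e["ymax"])
```

For the CPU load sample this yields three entries (load1, load5, load15), each with no y-axis maximum; the memory sample yields a single entry with all five fields set.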
Step 113: The data collection application returns the collection result to the data collector;
Specifically, the data collection application sends the collected data to the data collector through a pipe.
Step 114: The data collector returns the collection result to the monitoring server, and step 118 is executed;
Step 115: The monitoring server sends the monitoring information corresponding to the monitoring item whose timed event fired to the SNMP agent;
Step 116: The SNMP agent runs the received monitoring information, collects data accordingly, and obtains a collection result in the preset format;
Step 117: The SNMP agent sends the collection result to the monitoring server, and step 118 is executed;
Step 118: The monitoring server processes the received collection result and continues to wait for a timed event to fire;
The monitoring server's processing of the collection result is described in detail in Embodiment 3 and is not repeated here.
Embodiment 3
Embodiment 3 of the present invention provides a working method of a monitoring server in an operation and maintenance monitoring data system. As shown in Fig. 3, the method includes:
Step 201: The monitoring server starts, performs initialization, and reads the server configuration file from local storage on the server;
Initialization includes reading the server configuration file, parsing command-line parameters, and the like;
Step 202: The monitoring server sends a request to the monitoring client for the client's operating system, receives the client operating system information returned by the monitoring client, and determines whether the operating system information in the server configuration file matches the received client operating system information. If so, step 203 is executed; otherwise an error is reported and the process ends;
In this embodiment, if no operating system information is configured in the server configuration file, step 203 is executed directly after step 201. Alternatively, if operating system information is configured in the server configuration file but the determination in step 202 is negative, step 203 may still be executed.
Step 203: The monitoring server determines the format of the server configuration file. If it is text file format, the interface for processing the text file format is called and step 204 is executed; if it is XML file format, the interface for processing the XML file format is called and step 204 is executed;
Step 204: The monitoring server parses the server configuration file to obtain all monitoring items and reads them into memory;
Step 205: The monitoring server creates worker process groups for the monitoring items according to the total number of monitoring items, to process the monitoring items in the server configuration file;
In this embodiment, the server configuration file contains the number of all monitoring items, which indicates how many worker process groups need to be created for data monitoring; for example, the configuration parameter is check_workers=6;
If this parameter is not set, the number of process groups that the server creates by default is 1.5 times the number of CPUs, with a default minimum of 4.
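The rule for choosing the number of worker process groups in step 205 can be sketched as follows. The text does not say how a fractional result of 1.5 times the CPU count is rounded, so rounding down is an assumption here, as are the function name `worker_group_count` and the use of a plain dictionary for the parsed configuration.

```python
import os


def worker_group_count(config, cpu_count=None):
    """Number of worker process groups to create (step 205)."""
    configured = config.get("check_workers")
    if configured is not None:
        return int(configured)            # explicit setting, e.g. check_workers=6
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Default: 1.5 x CPU count, but never fewer than 4 (rounding down is assumed).
    return max(4, int(1.5 * cpus))


print(worker_group_count({"check_workers": "6"}))   # 6
print(worker_group_count({}, cpu_count=2))          # 4 (minimum applies)
print(worker_group_count({}, cpu_count=8))          # 12
```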
Step 206: The monitoring server registers a timed event for each worker process according to the collection interval of each monitoring item;
In this embodiment, the monitoring server registers a timed event for the corresponding worker process according to the value of the monitoring information normal_check_interval in the monitoring item (that is, the collection interval of that monitoring item);
For example, if normal_check_interval is 1, the timed event registered for the worker process fires once every minute; if normal_check_interval is 2, the timed event fires once every 2 minutes.
Step 207: Each worker process of the monitoring server monitors its corresponding monitoring item, and the server determines whether the timed event of any worker process has fired. If so, step 208 is executed; otherwise step 207 is repeated;
Specifically, it is determined whether the timed event of any worker process has reached an integer multiple of the collection interval of its monitoring item;
For example, if the collection interval of a monitoring item is 1 minute, data collection is performed when the worker process corresponding to that monitoring item detects that a timed event has fired.
Step 208: The monitoring server determines the data collection mode. If it is the monitoring agent mode, step 209 is executed; if it is the SNMP agent mode, step 210 is executed;
Step 209: The monitoring server sends the monitoring information of the monitoring item corresponding to the worker process to the data collector, receives the collection result returned by the data collector, and executes step 211;
Step 210: The monitoring server sends the monitoring information of the monitoring item corresponding to the worker process to the SNMP agent, receives the collection result returned by the SNMP agent, and executes step 211;
Step 211: The monitoring server determines whether the received collection result is abnormal. If so, step 212 is executed; otherwise step 215 is executed;
In this embodiment, the monitoring server determines whether the current collected value in the received collection result is less than the alarm threshold or less than the failure threshold. If so, the received collection result is abnormal; otherwise it is normal;
For example, if the collection result is:
Memory Free:/246MB(16%)|percentage=24;20;10;0;100; the current collected value is 24, which is greater than the failure threshold of 10 and greater than the alarm threshold of 20, so the collection result is normal;
If the received collection result is:
Memory Free:/246MB(16%)|percentage=18;20;10;0;100; the current collected value is 18, which is less than the alarm threshold of 20 and greater than the failure threshold of 10, so the collection result is abnormal;
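The abnormality test in step 211 reduces to two comparisons. The sketch below follows the rule exactly as stated (a value below either threshold is abnormal); the function name `is_abnormal` is hypothetical. For a metric whose thresholds are upper limits, the comparison would presumably have to be reversed, which the text does not spell out.

```python
def is_abnormal(value, alarm_threshold, failure_threshold):
    """Abnormal when the value is below the alarm or the failure threshold."""
    return value < alarm_threshold or value < failure_threshold


# Values taken from the two memory examples: 'percentage=24;20;10' and 'percentage=18;20;10'.
print(is_abnormal(24, 20, 10))   # False: the result is normal
print(is_abnormal(18, 20, 10))   # True: below the alarm threshold of 20
```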
Step 212: The monitoring server determines, according to the monitoring information of the monitoring item, whether the abnormal collection result requires an alarm. If so, step 213 is executed; otherwise step 215 is executed;
In this embodiment, the monitoring server determines whether the abnormal collection result requires an alarm according to the value of the monitoring information notifications_enabled in the monitoring item. If notifications_enabled is 1, an alarm is required; if notifications_enabled is 0, no alarm is required;
Step 213: The monitoring server determines, according to the monitoring information of the monitoring item in the server configuration file, whether sending an alarm notification is currently allowed. If so, step 214 is executed; otherwise step 215 is executed;
In this embodiment, the monitoring server determines whether sending an alarm notification is allowed according to the value of the monitoring information notification_period in the monitoring item;
For example, if the value of notification_period in the monitoring item is 24x7, alarm notifications may be sent 24 hours a day, 7 days a week. Whether and when an alarm is required can be set by the user through the value of this monitoring information.
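Steps 212 and 213 together act as a two-stage gate before a notification is sent. The sketch below is only illustrative: the text defines the period value 24x7 and nothing else, so the `workhours` entry, the `PERIODS` table, and the function name `may_notify` are hypothetical, and unknown period names are treated as "not allowed" by assumption.

```python
from datetime import datetime

# Period name -> predicate on the current time.
# Only "24x7" comes from the text; "workhours" is a made-up example period.
PERIODS = {
    "24x7": lambda now: True,
    "workhours": lambda now: now.weekday() < 5 and 9 <= now.hour < 18,
}


def may_notify(info, now):
    """Apply the notifications_enabled check, then the notification_period check."""
    if info.get("notifications_enabled") != "1":
        return False                       # step 212: no alarm required
    allowed = PERIODS.get(info.get("notification_period", ""))
    if allowed is None:
        return False                       # unknown period: do not notify (assumption)
    return allowed(now)                    # step 213: is sending allowed right now?


item = {"notifications_enabled": "1", "notification_period": "24x7"}
print(may_notify(item, datetime(2015, 9, 26, 3, 0)))   # True: 24x7 allows any time
```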
步骤214:监控服务器选择报警通知方式,将报警信息发送至管理员,执行步骤215;Step 214: the monitoring server selects an alarm notification method, sends the alarm information to the administrator, and executes step 215;
其中,报警通知方式可以为语音报警或者邮件通知报警等,由用户根据需求自行设定。Among them, the alarm notification method can be voice alarm or email notification alarm, etc., which can be set by the user according to the needs.
步骤215:监控服务器将对应的采集结果存储在服务器数据库中,返回步骤207。Step 215: The monitoring server stores the corresponding collection results in the server database, and returns to step 207.
本实施例还包括:用户界面定时(优选为1分钟)从监控服务器数据库中读取采集结果,在界面上显示读取到的采集结果;This embodiment also includes: the user interface regularly (preferably 1 minute) reads the collection result from the monitoring server database, and displays the read collection result on the interface;
本实施例中,步骤211至步骤215可以替换为:服务器根据服务器配置文件判断是否需要分析采集结果的变化趋势,如果是,则将采集结果保存,返回步骤207,否则直接返回步骤207;In this embodiment, step 211 to step 215 can be replaced by: the server judges according to the server configuration file whether it is necessary to analyze the change trend of the collection result, if yes, save the collection result and return to step 207, otherwise directly return to step 207;
例如,服务器配置文件中监控项的监控信息save_result的值为1时,表示需要分析采集结果的变化趋势,保存采集结果,如果save_result的值为0时,表示不需要分析采集结果的变化趋势,不保存采集结果。For example, when the value of save_result of the monitoring information of the monitoring item in the server configuration file is 1, it means that the change trend of the collection result needs to be analyzed and the collection result needs to be saved. If the value of save_result is 0, it means that the change trend of the collection result does not need to be analyzed. Save the collection result.
Alternatively, steps 211 to 215 may be replaced by: the server determines, according to the collection result, whether log information needs to be recorded; if so, it records the log information and returns to step 207; otherwise it returns directly to step 207;
Specifically, the server determines whether the collection result is abnormal; if so, it records log information; otherwise it does not;
For example, if the monitoring information is to collect the CPU usage, then when the CPU usage in the collection result exceeds the preset alarm threshold, the server records that the CPU usage of the monitored machine exceeded the threshold at the current time.
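A minimal sketch of this logging rule, assuming the CPU usage arrives as a percentage and that "abnormal" simply means "above the threshold". The logger name, the host name, and the message format are illustrative choices.

```python
import logging
from datetime import datetime
from typing import Optional

logger = logging.getLogger("monitor")  # hypothetical logger name


def log_if_abnormal(host: str, cpu_usage: float, threshold: float,
                    now: Optional[datetime] = None) -> Optional[str]:
    """Record a log entry only when the CPU usage exceeds the threshold."""
    if cpu_usage <= threshold:
        return None  # normal result: nothing is logged
    now = now or datetime.now()
    message = (f"{now.isoformat(timespec='seconds')} {host}: "
               f"CPU usage {cpu_usage:.1f}% exceeds threshold {threshold:.1f}%")
    logger.warning(message)
    return message


# Hypothetical host and values.
entry = log_if_abnormal("192.168.88.179", 95.0, 90.0)
```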
Embodiment 4
Embodiment 4 of the present invention provides a working method of the data collector in an operation and maintenance monitoring data system, as shown in FIG. 4, including:
In this embodiment, preferably, the monitoring server sends the monitoring information to the data collector in the form of a command line, and the data collector performs the following operations after receiving the monitoring information:
Step 301: The data collector parses the received monitoring information as a command line and determines whether the command-line parameters are invalid; if so, it sends error information to the monitoring server and ends; otherwise it executes step 302;
Step 302: The data collector determines whether the monitoring information contains auxiliary information; if so, it displays the auxiliary information, returns a "displayed" response to the monitoring server, and ends; otherwise it executes step 303;
In this embodiment, when the obtained command-line parameters contain auxiliary information such as the help information or version information of the data collector, the command is a command for displaying auxiliary information;
Step 303: The data collector assembles a data packet according to the monitoring information and obtains the monitoring client specified in the monitoring information;
In this embodiment, the assembled data packet contains a version number, a packet type, the data collection command information, and a checksum computed over the above data by a checksum algorithm (such as the crc32 algorithm), where the packet types include the request packet type and the response packet type;
Step 304: The data collector sends the data packet to the specified monitoring client;
Step 305: The data collector receives the collection result returned by the monitoring client;
In this embodiment, the collection result contains a version number, a packet type, the collected data, and a checksum computed over the above data by a checksum algorithm (such as the crc32 algorithm), where the packet types include the request packet type and the response packet type;
Step 306: The data collector determines whether the collection result is correct; if so, it executes step 307; otherwise it sends error information to the monitoring server and ends;
In this embodiment, the data collector obtains the checksum from the collection result and determines whether the checksum in the collection result is the same as the checksum of the data packet that was sent, whether the version number in the collection result is the same as the version number in the data packet, and whether the packet type in the collection result is the same as the packet type in the data packet; if all are the same, the collection result is correct; otherwise the collection result is wrong;
Step 307: The data collector sends the collection result to the monitoring server.
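The packet assembly of step 303 and the check of step 306 can be illustrated with a short sketch. The description does not fix a byte layout, so the header format, the numeric type codes, and the function names below are assumptions. The sketch also verifies a packet by recomputing CRC32 over the received fields, which is the usual way such a checksum is used and only one possible reading of the comparison described in step 306.

```python
import struct
import zlib

REQUEST, RESPONSE = 1, 2          # assumed numeric codes for the packet types
HEADER = struct.Struct("!HHI")    # assumed layout: version, type, payload length


def build_packet(version: int, ptype: int, payload: bytes) -> bytes:
    """Assemble version, type and payload, then append a CRC32 checksum."""
    body = HEADER.pack(version, ptype, len(payload)) + payload
    checksum = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack("!I", checksum)


def parse_packet(packet: bytes, expected_version: int, expected_type: int) -> bytes:
    """Validate checksum, version and type; return the payload if all match."""
    if len(packet) < HEADER.size + 4:
        raise ValueError("packet too short")
    body = packet[:-4]
    (checksum,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) & 0xFFFFFFFF != checksum:
        raise ValueError("checksum mismatch")
    version, ptype, length = HEADER.unpack(body[:HEADER.size])
    if version != expected_version or ptype != expected_type:
        raise ValueError("unexpected version or packet type")
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    return payload


# A response carrying a hypothetical collection result.
packet = build_packet(1, RESPONSE, b"cpu=55.2")
payload = parse_packet(packet, expected_version=1, expected_type=RESPONSE)
```

Any ValueError raised here corresponds to the branch in step 306 where error information is sent to the monitoring server.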
Embodiment 5
Embodiment 5 of the present invention provides a working method of the monitoring client in an operation and maintenance monitoring data system, as shown in FIG. 5, including:
Step 401: The monitoring client loads the client configuration file and, according to the configuration parameters in the client configuration file, listens on the port used for connections with the monitoring server;
For example, the configuration parameters in the client configuration file are:
log_facility=daemon
server_port=5666
allowed_hosts=127.0.0.1,192.168.88.179,192.168.88.189
dont_blame_nrpe=0
The monitoring items in the client configuration file are:
Here, server_port=5666 in the configuration parameters of the client configuration file is the port for connections between the monitoring client and the monitoring server.
Step 402: The monitoring client waits to receive a collection command sent by the monitoring server;
Step 403: When the monitoring client receives a collection command, it determines, according to the client configuration file, whether the monitoring server that sent the command is a device allowed to connect; if so, it executes step 404; otherwise it returns error information to the monitoring server and returns to step 402;
In this embodiment, the client determines whether the IP address of the monitoring server in the collection command is among the IP addresses given by the allowed_hosts configuration parameter in the client configuration file; if so, the monitoring server that sent the command is a device allowed to connect; otherwise it is not;
For example, the value of the allowed_hosts configuration parameter in the client configuration file is 127.0.0.1,192.168.88.179,192.168.88.189, and the IP address of the monitoring server obtained from the collection command is 192.168.88.179, so this server is a device allowed to connect to the client.
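The allowed_hosts check of step 403 amounts to a membership test on the configured address list. The following sketch uses the configuration values given above; the function names and the simple key=value parser are assumptions made for illustration.

```python
from typing import Dict

CONFIG_TEXT = """\
log_facility=daemon
server_port=5666
allowed_hosts=127.0.0.1,192.168.88.179,192.168.88.189
dont_blame_nrpe=0
"""


def parse_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and comments."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def is_allowed(config: Dict[str, str], server_ip: str) -> bool:
    """Return True when server_ip is listed in allowed_hosts."""
    hosts = config.get("allowed_hosts", "")
    allowed = {h.strip() for h in hosts.split(",") if h.strip()}
    return server_ip in allowed


config = parse_config(CONFIG_TEXT)
allowed = is_allowed(config, "192.168.88.179")   # the server from the example
rejected = is_allowed(config, "10.0.0.1")        # hypothetical unknown server
```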
Step 404: The monitoring client parses the received collection command to obtain the monitoring information to be collected and determines whether that monitoring information is allowed to be executed; if so, it executes step 405; otherwise it returns error information to the monitoring server and returns to step 402;
In this embodiment, the client determines whether the obtained monitoring information can be found in the client configuration file; if so, the monitoring information is allowed to be executed; otherwise it is not.
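Step 404 can be read as a lookup against the monitoring items defined in the client configuration file. In the sketch below the item names and commands are hypothetical, since the actual monitoring items of the configuration file are not reproduced in the text, and the returned tuple is just one way to signal the error branch.

```python
from typing import Dict, Tuple

# Hypothetical monitoring items; the real entries are not given in the text.
ALLOWED_ITEMS: Dict[str, str] = {
    "check_cpu": "top -bn1",
    "check_mem": "free -m",
}


def resolve_item(name: str, items: Dict[str, str]) -> Tuple[bool, str]:
    """Return (True, command) for a configured item, else (False, error text)."""
    command = items.get(name)
    if command is None:
        return False, f"monitoring item '{name}' is not allowed"
    return True, command


ok, command = resolve_item("check_cpu", ALLOWED_ITEMS)
denied, error = resolve_item("check_unknown", ALLOWED_ITEMS)
```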
Step 405: The monitoring client sends the monitoring information to be collected to the data collection application;
For example, if the monitoring information received by the client is used to collect the CPU usage, the data collection application obtains the CPU usage through the top command; for instance, the obtained CPU usage is:
Cpu(s): 2.9%us, 33.3%sy, 16.2%ni, 44.8%id, 2.9%wa, 0.0%hi, 0.0%si.
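As an illustration of how such a line could be turned into a single figure, the sketch below parses the fields of the example output and treats everything that is not idle time (id) as CPU usage. That definition of usage and the function name are assumptions; the text does not say how the collection application reduces the line to one value.

```python
import re
from typing import Dict


def parse_cpu_line(line: str) -> Dict[str, float]:
    """Extract the percentage fields (us, sy, ni, id, ...) from a top CPU line."""
    fields: Dict[str, float] = {}
    for value, name in re.findall(r"([\d.]+)%\s*([a-z]+)", line):
        fields[name] = float(value)
    return fields


line = "Cpu(s): 2.9%us, 33.3%sy, 16.2%ni, 44.8%id, 2.9%wa, 0.0%hi, 0.0%si"
fields = parse_cpu_line(line)
usage = round(100.0 - fields["id"], 1)  # non-idle share of CPU time
```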
For example, if the monitoring information received by the client is used to obtain the memory usage, the data collection application obtains the memory usage through the free command; for instance, the obtained memory usage is:
Step 406: The monitoring client determines whether the collection result returned by the data collection application is received within a preset time; if so, it executes step 407; otherwise it sets the collection result to "timeout" and executes step 407;
Preferably, the preset time is 10 s.
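The timeout handling of step 406 can be sketched with the standard subprocess module, assuming the data collection application is run as a child process. The function name, the "TIMEOUT" marker string, and the child commands used in the usage lines are illustrative.

```python
import subprocess
import sys
from typing import List


def run_collector(command: List[str], timeout_s: float = 10.0) -> str:
    """Run the collection command; return its output or "TIMEOUT"."""
    try:
        done = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # No result within the preset time: the result is set to timeout.
        return "TIMEOUT"
    return done.stdout.strip()


# A fast command returns its output; a slow one is reported as a timeout.
fast = run_collector([sys.executable, "-c", "print('cpu=55.2')"])
slow = run_collector([sys.executable, "-c", "import time; time.sleep(5)"],
                     timeout_s=0.5)
```

Either value would then be returned to the monitoring server in step 407.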
Step 407: The monitoring client returns the collection result to the monitoring server.
Embodiment 6
Embodiment 6 of the present invention provides a server for collecting operation and maintenance monitoring data, as shown in FIG. 6, including:
a first judging module 601, configured to determine, according to the collection interval of each monitoring item in the server configuration file, whether any monitoring item meets the collection condition;
a first sending module 602, configured to send the monitoring information corresponding to the monitoring items that meet the collection condition to the client when the first judging module 601 determines that there are such monitoring items;
a first receiving module 603, configured to receive the collection result returned by the client;
a processing module 604, configured to process the collection result received by the first receiving module 603 and trigger the first judging module 601.
In this embodiment, the server further includes a second judging module;
the second judging module is configured to obtain the updated server configuration file and determine, according to the collection interval of each monitoring item in the updated server configuration file, whether any monitoring item meets the collection condition; when the determination is yes, it triggers the first sending module 602; when the determination is no, it continues to trigger the second judging module.
Specifically, the first judging module 601 includes a registration unit and a detection unit;
the registration unit is configured to register a timed event for each monitoring item according to the collection interval of each monitoring item in the server configuration file;
the detection unit is configured to detect timed events and, when it detects that a timed event has fired, trigger the first sending module 602.
The registration unit is specifically configured to parse all monitoring items from the server configuration file, create a worker process group for each monitoring item according to the number of monitoring items, and register a timed event for each worker process group according to the collection interval of each monitoring item.
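The interplay of the registration unit and the detection unit can be sketched with a priority queue of due times. This is a simplified, single-process illustration: it does not create the worker process groups mentioned above, it uses integer timestamps supplied by the caller instead of a real clock, and the item names and intervals are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def register_timers(intervals: Dict[str, int], start: int = 0) -> List[Tuple[int, str]]:
    """Registration: schedule the first timed event of every monitoring item."""
    timers = [(start + interval, name) for name, interval in intervals.items()]
    heapq.heapify(timers)
    return timers


def due_items(timers: List[Tuple[int, str]], intervals: Dict[str, int],
              now: int) -> List[str]:
    """Detection: pop every event that has fired by `now` and re-arm it."""
    fired: List[str] = []
    while timers and timers[0][0] <= now:
        due, name = heapq.heappop(timers)
        fired.append(name)  # this item now meets the collection condition
        heapq.heappush(timers, (due + intervals[name], name))
    return fired


# Hypothetical items with collection intervals in seconds.
intervals = {"check_cpu": 60, "check_disk": 300}
timers = register_timers(intervals)
at_59 = due_items(timers, intervals, 59)
at_60 = due_items(timers, intervals, 60)
```

Each name returned by due_items corresponds to a monitoring item whose monitoring information the first sending module 602 would send.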
Further, the first sending module 602 is specifically configured to, when the server configuration file is in text file format, call the interface for processing the text file format and send the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
Still further, the first sending module 602 is specifically configured to, when the server configuration file is in XML file format, call the interface for processing the XML file format and send the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
The processing module 604 specifically includes a first judging unit, a second judging unit, a third judging unit, an alarm unit, and a storage unit;
the first judging unit is configured to determine whether the collection result received by the receiving module is abnormal; when the determination is yes, it triggers the second judging unit; when the determination is no, it triggers the first judging module 601;
the second judging unit is configured to determine, according to the monitoring information of the monitoring item, whether the abnormal collection result requires an alarm; when the determination is yes, it triggers the third judging unit; when the determination is no, it triggers the first judging module 601;
the third judging unit is configured to determine, according to the monitoring information in the monitoring item, whether sending an alarm notification is allowed; when the determination is yes, it triggers the alarm unit; when the determination is no, it triggers the first judging module 601;
the alarm unit is configured to, when the third judging unit determines yes, select an alarm notification method, send the alarm information to the administrator, and trigger the first judging module 601.
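The chain formed by the three judging units and the alarm unit can be summarized in a few lines. The field names, the threshold-based definition of "abnormal", and the notify callback are assumptions introduced for the sketch; the description only states that each decision is made from the monitoring information of the monitoring item.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MonitorItem:
    name: str
    threshold: float      # above this value a result counts as abnormal (assumed rule)
    alarm_enabled: bool   # whether an abnormal result requires an alarm
    notify_enabled: bool  # whether sending an alarm notification is allowed


def process_result(item: MonitorItem, value: float,
                   notify: Callable[[str], None]) -> str:
    """Apply the three checks in order; notify only if all of them pass."""
    if value <= item.threshold:      # first judging unit: is it abnormal?
        return "normal"
    if not item.alarm_enabled:       # second judging unit: alarm required?
        return "abnormal_no_alarm"
    if not item.notify_enabled:      # third judging unit: notification allowed?
        return "alarm_suppressed"
    notify(f"{item.name}: value {value} exceeds {item.threshold}")  # alarm unit
    return "alarm_sent"


sent: List[str] = []
item = MonitorItem("check_cpu", threshold=90.0, alarm_enabled=True,
                   notify_enabled=True)
status = process_result(item, 95.0, sent.append)
```

Whatever the outcome, control returns to the first judging module 601, which is why every branch simply returns.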
In this embodiment, the server further includes a storage module;
the processing module 604 is specifically configured to determine, according to the server configuration file, whether the trend of the collected data needs to be analyzed; when the determination is yes, it triggers the storage module; when the determination is no, it triggers the first judging module 601;
the storage module is configured to save the collection result and trigger the first judging module 601.
In this embodiment, the server further includes a logging module;
the processing module 604 is specifically configured to determine, according to the collected data, whether log information needs to be recorded; when the determination is yes, it triggers the logging module; when the determination is no, it triggers the first judging module 601;
the logging module is configured to record log information and trigger the first judging module 601.
The first sending module 602 is specifically configured to, when the data collection mode obtained from the server configuration file is the monitoring agent mode, send the monitoring information corresponding to the monitoring items that meet the collection condition to the client.
Further, the first sending module 602 is also configured to, when the data collection mode obtained from the server configuration file is the SNMP mode, send the monitoring information corresponding to the monitoring items that meet the collection condition to the SNMP agent;
the first receiving module 603 is also configured to receive the collected data returned by the SNMP agent and trigger the processing module 604.
In this embodiment, the server further includes a data collector module;
the data collector module includes a first receiving unit and a first sending unit;
the first sending module 602 is also configured to, when the first judging module 601 determines that there are monitoring items that meet the collection condition, send the monitoring information corresponding to those monitoring items to the first receiving unit;
the first receiving unit is configured to receive the monitoring information sent by the first sending module;
the first sending unit is configured to send the monitoring information received by the first receiving unit to the client.
The data collector module further includes a first collection unit;
the first collection unit is configured to, when the execution mode of data collection obtained from the monitoring information is local execution, collect data on the server according to the monitoring information and trigger the first sending unit.
Further, the first sending unit is specifically configured to, when the execution mode of data collection obtained from the monitoring information is client execution, send the monitoring information received by the first receiving unit to the client.
In this embodiment, the server includes a data collection application module;
the data collection application module includes a second receiving unit, a second collection unit, and a second sending unit;
the second receiving unit is configured to receive the monitoring information sent by the data collector;
the second collection unit is configured to collect the corresponding data on the server according to the monitoring information received by the second receiving unit;
the second sending unit is configured to send the data collected on the server by the second collection unit to the data collector module.
The data collector module further includes a fourth judging unit and a display unit;
the fourth judging unit is configured to determine whether the monitoring information received by the first receiving unit contains auxiliary information; when the determination is yes, it triggers the display unit; when the determination is no, it triggers the first sending unit;
the display unit is configured to display the auxiliary information.
In this embodiment, the server further includes a third judging module and an error reporting module;
the third judging module is configured to, when the first judging module 601 determines yes, determine according to the operating system information in the server configuration file whether data on the client can be collected; when the determination is yes, it triggers the first sending module 602; when the determination is no, it triggers the error reporting module;
the error reporting module is configured to report an error and continue to trigger the first judging module 601.
The third judging module specifically includes a third sending unit, a third receiving unit, and a fifth judging unit;
the third sending unit is configured to send to the client a request for the client's operating system information;
the third receiving unit is configured to receive the client operating system information returned by the client;
the fifth judging unit is configured to determine whether the operating system information in the server configuration file matches the client operating system information received by the third receiving unit; when the determination is yes, it triggers the first sending module 602; when the determination is no, it triggers the error reporting module.
The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
Claims (36)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610014809.9A CN105610648B (en) | 2016-01-11 | 2016-01-11 | A method and server for collecting operation and maintenance monitoring data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105610648A CN105610648A (en) | 2016-05-25 |
CN105610648B true CN105610648B (en) | 2019-08-09 |
Family
ID=55990193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610014809.9A Expired - Fee Related CN105610648B (en) | 2016-01-11 | 2016-01-11 | A method and server for collecting operation and maintenance monitoring data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105610648B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20190809 |