
CN119603130A - A network business application operation and maintenance alarm system - Google Patents


Info

Publication number
CN119603130A
Authority
CN
China
Prior art keywords
alarm
service
management module
monitoring
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411864459.6A
Other languages
Chinese (zh)
Inventor
王萍
缐月
潘腾飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tangshan Aikete Technology Co ltd
Original Assignee
Tangshan Aikete Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tangshan Aikete Technology Co ltd
Priority to CN202411864459.6A
Publication of CN119603130A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an operation and maintenance alarm system for network service applications, in the technical field of network operation and maintenance. The system comprises a comprehensive monitoring module, a service management module, an alarm analysis module, a service topology management module, a report management module and a user management module. It monitors the overall running state of a network service application in real time. When a fault or abnormality is detected, a fault analysis model locates it accurately and assesses the potential risk and the scope of impact. Based on the conclusion of the fault analysis model, an alarm notification is sent promptly to operation and maintenance personnel, helping them manage the service application and keep it running efficiently and stably.

Description

Network service application operation and maintenance alarm system
Technical Field
The invention relates to the technical field of network operation and maintenance, in particular to a network service application operation and maintenance alarm system.
Background
With the rapid development of networks in the internet era, the growing degree of informatization and the deepening of digital industrialization, each industry has gradually built its own service application platforms. A complete system runs on a series of software and hardware services, and enterprises generally set up monitoring systems to operate and maintain their service systems, networks and hardware. However, when faced with sudden faults or security threats, existing monitoring systems cannot find and locate the problem promptly and accurately or judge its scope of impact, and they lack an accurate, comprehensive alarm mechanism covering the whole service (software, network and hardware). Lacking an intelligent real-time analysis and early-warning mechanism, operation and maintenance personnel cannot anticipate potential problems.
Therefore, there is an urgent need for an operation and maintenance alarm system that provides real-time monitoring, intelligent analysis and automatic alarming for the service as a whole, in order to improve the reliability and stability of service applications and keep network service applications running efficiently.
Disclosure of Invention
The invention aims to provide a network service application operation and maintenance alarm system that monitors the overall running state of network service applications (software, network and hardware) in real time. When a fault or abnormality is detected, a fault analysis model locates it accurately and assesses the potential risk and the scope of impact. Based on the conclusion of the fault analysis model, an alarm notification is sent promptly to operation and maintenance personnel, helping them manage the service application and keep it running efficiently and stably.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A network service application operation and maintenance alarm system comprises a comprehensive monitoring module, a service management module, an alarm analysis module, a service topology management module, a report management module and a user management module.
The comprehensive monitoring module is used for collecting operation data of the network service application in real time, including key indexes such as CPU utilization, memory usage, disk, network state, network delay and request response time.
The service management module provides a human-computer interaction interface through which the user performs operation and maintenance management and control of the managed service applications, including creating and issuing operation and maintenance policies.
The alarm management module is used for monitoring the running state and data of the service application in real time, generating alarm information of different levels according to the monitoring result, and sending it to management personnel in real time.
The alarm analysis module is used for receiving the generated alarm data, comprehensively judging and analyzing the alarms based on a configured alarm analysis model, analyzing abnormal or potential faults, locating the fault and accurately judging its scope of impact.
The service topology management module is used for displaying details of the service system, maintaining its basic information, generating a comprehensive service topology graph from the service system information, and displaying the connection relations among the software and hardware equipment of the service application.
The report management module is used for compiling statistics on and analyzing the alarm information of the service application, and for recording and analyzing its abnormal conditions and maintenance conditions.
The user management module manages and maintains user permissions according to roles, and logs user login and operation behavior.
Further, the comprehensive monitoring module monitors the running state of the service application and of devices such as network devices (routers, switches, etc.), servers, databases and middleware, obtains the relevant performance index data, detects abnormal performance promptly according to the configured alarm rules, and pushes alarm information to the relevant personnel.
The comprehensive monitoring module further extracts accurate monitoring information from different types of service applications, combines the monitored index data with the alarm rules, intelligently analyzes the various performance parameters and alarm factors, and raises timely alarms for operation risks, faults and abnormal conditions.
The monitoring results of comprehensive monitoring are divided into normal, abnormal and fault. The abnormal and fault states indicate that the system may fail and needs attention. The administrator may set alarm conditions and a sending mode for the abnormal or fault state. When the monitoring result is abnormal or fault, the alarm module is started; it judges according to the alarm rules whether an alarm is required and, if so, sends the alarm information in the preset alarm mode.
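The decision flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation disclosed by the invention: the names `Status`, `AlarmRule` and `decide_alarm`, and the channel string "mail", are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    FAULT = "fault"


@dataclass(frozen=True)
class AlarmRule:
    # Hypothetical administrator setting: which states alarm, and how to send.
    alarm_on: frozenset
    channel: str


def decide_alarm(status: Status, rule: AlarmRule) -> Optional[str]:
    """Return the channel to notify through, or None if no alarm is sent."""
    if status is Status.NORMAL:
        return None  # the alarm module is not started for normal results
    # The alarm module consults the rule before sending anything.
    return rule.channel if status in rule.alarm_on else None


rule = AlarmRule(alarm_on=frozenset({Status.FAULT}), channel="mail")
print(decide_alarm(Status.FAULT, rule))     # mail
print(decide_alarm(Status.ABNORMAL, rule))  # None
```

With this rule, only the fault state produces a notification; an administrator who also wants abnormal states reported would add `Status.ABNORMAL` to `alarm_on`.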
Further, the service management module includes a service alarm combination template, in which the template name, alarm description, alarm level, trigger rules and so on are configured. Monitoring objects can be added to the template, importing the important alarm indexes of a service is supported, and a created template can be modified or deleted.
The service management module also includes service system configuration, in which the basic information of a service system is configured, such as the service name, owning department, alarm template and the service tree it belongs to, and monitoring objects are selected and added.
The service management module displays the service systems in a service tree structure. Service systems can be edited, added and deleted, and the profile of a service system can be viewed, including the service objects it contains and the records of alarm information sent by those objects.
The service management module provides an intuitive large-screen service panorama wall showing the overall health of the service systems, the alarm information list, the access volume trend, a comparative analysis of service object types, a comparative analysis of alarm levels, and so on, giving a macroscopic view of the overall operation of the service systems. The panorama wall supports custom configuration, and service systems can be added to it.
Furthermore, the alarm management module classifies real-time alarms by alarm level, distinguishes the levels by color, displays alarm information and details, and supports real-time alarm statistics by time dimension and alarm level.
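A minimal Python sketch of level coloring and of statistics by time and level might look as follows. The color scheme, the hourly bucket, the numeric levels 1 to 5 and the alarm record layout are all assumptions made for illustration; the source only states that levels are distinguished by color and that statistics are kept by time dimension and level.

```python
from collections import Counter
from datetime import datetime

# Assumed color scheme for levels 1 (information) to 5 (emergency).
LEVEL_COLORS = {1: "blue", 2: "yellow", 3: "orange", 4: "red", 5: "purple"}


def count_by_hour_and_level(alarms):
    """Count alarms per (hour bucket, level) pair."""
    return Counter(
        (a["time"].replace(minute=0, second=0, microsecond=0), a["level"])
        for a in alarms
    )


alarms = [
    {"time": datetime(2024, 12, 18, 9, 5), "level": 5},
    {"time": datetime(2024, 12, 18, 9, 40), "level": 5},
    {"time": datetime(2024, 12, 18, 10, 1), "level": 2},
]
stats = count_by_hour_and_level(alarms)
for (hour, level), n in sorted(stats.items()):
    print(hour, level, LEVEL_COLORS[level], n)
```

Truncating the timestamp to the hour is one possible time dimension; a day or minute bucket would follow the same pattern.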
The alarm management module supports classifying historical alarms by all alarms, by whether the alarm has been confirmed, and by device. It displays the basic information and notifications of each alarm and provides solutions from a common knowledge base.
The alarm management module can convert alarm information into a work order and assign it to maintenance personnel. It supports flexible notification configuration, notifying maintenance personnel by mail, short message (SMS) or WeChat, and alarm notification records can be queried by alarm title, receiving mode and recipient.
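Querying notification records by title, receiving mode and recipient could be sketched as below. The record type `NotificationRecord`, the function `query_records` and the sample data are hypothetical; a substring match on the title is an assumption, since the source does not say how titles are matched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NotificationRecord:
    title: str      # alarm title
    channel: str    # receiving mode: "mail", "sms" or "wechat"
    receiver: str   # maintenance person who was notified


def query_records(records: List[NotificationRecord],
                  title: Optional[str] = None,
                  channel: Optional[str] = None,
                  receiver: Optional[str] = None) -> List[NotificationRecord]:
    """Filter records; a criterion left as None matches everything."""
    return [
        r for r in records
        if (title is None or title in r.title)
        and (channel is None or r.channel == channel)
        and (receiver is None or r.receiver == receiver)
    ]


log = [
    NotificationRecord("CPU usage high on db-01", "mail", "alice"),
    NotificationRecord("CPU usage high on db-01", "wechat", "bob"),
    NotificationRecord("Disk full on web-02", "sms", "alice"),
]
print(query_records(log, title="CPU", channel="wechat"))
```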
Further, the alarm analysis module uses an alarm template to set alarm levels of different degrees (emergency, serious, secondary, warning, information) and the trigger rules of the alarm indexes, selects the required resource classifications (operating systems, middleware, network equipment, etc.), and imports the alarm indexes preset by the system. In this way it screens the alarm information finely, collects only the alarms the user cares about, and alarms accurately on faults occurring in the system.
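The screening step can be illustrated with a small Python sketch. It assumes that levels are numbered 1 (information) to 5 (emergency) and that a template keeps alarms at or above a minimum level; the `AlarmFilter` type, the `min_level` threshold and the index names are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmFilter:
    min_level: int             # assumed threshold, 1 (information) .. 5 (emergency)
    resource_types: frozenset  # selected resource classifications
    indexes: frozenset         # alarm indexes imported into the template

    def accepts(self, alarm: dict) -> bool:
        """Keep an alarm only if level, resource type and index all match."""
        return (
            alarm["level"] >= self.min_level
            and alarm["resource_type"] in self.resource_types
            and alarm["index"] in self.indexes
        )


f = AlarmFilter(4, frozenset({"middleware"}), frozenset({"jvm.heap.used"}))
raw = [
    {"level": 5, "resource_type": "middleware", "index": "jvm.heap.used"},
    {"level": 2, "resource_type": "middleware", "index": "jvm.heap.used"},
    {"level": 5, "resource_type": "network device", "index": "if.in.errors"},
]
kept = [a for a in raw if f.accepts(a)]
print(len(kept))  # 1
```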
Further, the service topology management module provides details of a service system, and basic information of the service system comprises a resource name, a resource type, a resource IP, a host state, an acquisition state and an alarm log.
The service topology management module can generate a comprehensive network topology graph according to service system information, display the connection relation of equipment, edit the network topology graph and dynamically display the alarm state of each node in the topology graph.
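One way a scope of impact could be derived from such a topology is a breadth-first walk over dependency edges. The sketch below is an assumption about how this might be done, not the fault analysis model of the invention; the edge direction (a resource maps to the resources that depend on it) and all node names are hypothetical.

```python
from collections import deque


def impact_scope(dependents, faulty):
    """Breadth-first walk from the faulty node over 'is depended on by' edges."""
    seen = {faulty}
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(faulty)  # report only the affected downstream nodes
    return seen


# Hypothetical topology: a switch serves two servers, which serve applications.
dependents = {
    "switch-1": ["db-server", "app-server"],
    "db-server": ["order-service"],
    "app-server": ["order-service", "portal"],
}
print(sorted(impact_scope(dependents, "switch-1")))
# ['app-server', 'db-server', 'order-service', 'portal']
```

The same traversal could drive the dynamic display: each node in the returned set would be marked as affected in the topology graph.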
Further, the report management module displays, classified by alarm level, the total number of alarms, the total number of closed alarms and the total number of unrecovered alarms. It supports generating daily, weekly and monthly reports, provides queries over historical daily, weekly and monthly reports and alarm statistics trends, and provides a report export function.
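The per-level totals can be computed with a single pass over the alarm records, as in the following Python sketch. The record fields `closed` and `recovered` are assumed to be independent flags; the source does not define how the two states relate.

```python
from collections import defaultdict


def summarize(alarms):
    """Per-level totals: all alarms, closed alarms and unrecovered alarms."""
    report = defaultdict(lambda: {"total": 0, "closed": 0, "unrecovered": 0})
    for a in alarms:
        row = report[a["level"]]
        row["total"] += 1
        if a["closed"]:
            row["closed"] += 1
        if not a["recovered"]:
            row["unrecovered"] += 1
    return dict(report)


sample = [
    {"level": 5, "closed": True, "recovered": True},
    {"level": 5, "closed": False, "recovered": False},
    {"level": 2, "closed": True, "recovered": False},
]
report = summarize(sample)
print(report[5])  # {'total': 2, 'closed': 1, 'unrecovered': 1}
```

A daily, weekly or monthly report would apply the same function to the alarms whose timestamps fall in the corresponding period.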
Further, the user management module includes management functions such as adding, modifying and deleting users and assigning roles, so that users and their permission configuration can be managed.
In order to optimize the system architecture, the invention adopts a B/S (browser/server) architecture: the front end uses the Vue framework, and the back end is built on the Spring Boot framework. The functional modules are divided in a modular structure, which gives the system good maintainability, extensibility and flexibility.
Preferably, to achieve efficient and comprehensive monitoring, the system employs a monitoring solution with multi-protocol support and flexible extension capability. The solution supports various data acquisition modes, including SNMP, Agent, IPMI, JMX and ODBC, and adapts to diverse environments. It can monitor resources such as servers, operating systems, network equipment, databases, middleware, storage equipment and virtualization platforms, and provides group management for multi-level, multi-dimensional monitoring.
Preferably, the alarm information interface provides an operation and maintenance knowledge base, including alarm cause analysis and solutions, and supports comments, which adds a more user-friendly mode of interaction.
Compared with the prior art, the network service application operation and maintenance alarm system is efficient, reasonable and easy to manage, and can effectively organize, synchronize and display various kinds of data. When an abnormal condition occurs, the system automatically locates it through the fault analysis model and issues an alarm together with the scope of impact, which greatly shortens the time needed to locate and solve problems, reduces the risk of service interruption, and improves the running stability and user experience of the service system.
Drawings
Fig. 1 is a flowchart of the service application alarm operation of the network service application operation and maintenance alarm system according to the present invention.
Fig. 2 is a schematic diagram of the service application alarm logic of the network service application operation and maintenance alarm system according to the present invention.
Detailed Description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It is to be understood that the preferred embodiments described herein serve only to illustrate and explain the present invention and are not intended to limit it.
Examples
It should be noted that the following assumes that the devices monitored by the system, such as operating systems, middleware, network devices, databases and servers, have already been added to the system as monitored objects. The specific process of the network service application operation and maintenance alarm system is as follows:
S1, creating a host and configuring host information, including SNMP configuration such as the monitoring address type (IP, DNS), port, SNMP version and community string; batch acquisition is supported.
S2, creating a service alarm configuration template: filling in the template name, selecting one alarm level (1 information, 2 warning, 3 secondary, 4 serious, 5 emergency), selecting one or more alarm indexes (index information under the corresponding template, chosen according to the monitoring type), and customizing the service alarm description.
S3, creating a service application system: specifying the service name, the owning department, the alarm template (the template created in S2) and the host objects it contains. Alarms from host objects under the service are matched against the host index items of the template; if the template contains the alarming index, an alarm with the level and description of the template is sent to the unified alarm management module.
S4, recovering the alarm: when no monitored object under the service has an alarm, the service returns to the normal state.
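Steps S3 and S4 can be sketched together in Python: a host alarm is forwarded only if the template lists its index, and the service returns to normal once nothing is alarming. The class names `AlarmTemplate` and `BusinessSystem`, their fields and the index names such as "cpu.util" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmTemplate:
    name: str
    level: int          # 1 (information) .. 5 (emergency)
    indexes: frozenset  # alarm indexes selected in S2
    description: str


@dataclass
class BusinessSystem:
    name: str
    department: str
    template: AlarmTemplate
    hosts: frozenset
    active: set = field(default_factory=set)  # (host, index) pairs alarming now

    def on_host_alarm(self, host, index):
        """S3: forward a host alarm only if the template lists its index."""
        if host not in self.hosts or index not in self.template.indexes:
            return None
        self.active.add((host, index))
        return {"service": self.name, "level": self.template.level,
                "description": self.template.description}

    def on_host_recovery(self, host, index):
        """S4: the service is normal again once nothing is alarming."""
        self.active.discard((host, index))
        return "normal" if not self.active else "alarm"


template = AlarmTemplate("db-template", 5,
                         frozenset({"cpu.util", "mem.util"}),
                         "Database under pressure")
system = BusinessSystem("order-db", "IT", template,
                        frozenset({"db-01", "db-02"}))
first = system.on_host_alarm("db-01", "cpu.util")
print(first["level"])  # 5
```

The dictionary returned by `on_host_alarm` stands in for the message sent to the unified alarm management module.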
In step S1, the acquired data is brought into the system. Acquisition means such as the SNMP protocol, the IPMI protocol, and XML active and passive reporting are supported, and this step is responsible for collecting system parameters such as CPU, disk, network card, server sensor and application-layer information.
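The SNMP host configuration and batch creation could be modeled as below. This is a sketch under assumptions: the `SnmpHost` type, its field names and the `create_hosts` helper are hypothetical, the default port 161 is the standard SNMP agent port, and the sketch does not perform any actual SNMP polling.

```python
from dataclasses import dataclass

SNMP_VERSIONS = {"1", "2c", "3"}


@dataclass(frozen=True)
class SnmpHost:
    address: str            # IP address or DNS name
    address_type: str       # "IP" or "DNS"
    community: str          # SNMP community string (used by v1 and v2c)
    version: str = "2c"
    port: int = 161         # standard SNMP agent port

    def __post_init__(self):
        # Reject configurations that could never be polled.
        if self.address_type not in {"IP", "DNS"}:
            raise ValueError(f"unknown address type: {self.address_type}")
        if self.version not in SNMP_VERSIONS:
            raise ValueError(f"unsupported SNMP version: {self.version}")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")


def create_hosts(addresses, community):
    """Batch-create IP hosts that share one community string."""
    return [SnmpHost(a, "IP", community) for a in addresses]


hosts = create_hosts(["10.0.0.1", "10.0.0.2"], "public")
print([h.address for h in hosts])  # ['10.0.0.1', '10.0.0.2']
```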
In step S2, configuring alarm rule expressions is supported, so that richer alarm rules can be configured dynamically; alarm expressions are built from the configured alarm indexes. For example:
Immediate alarm: the service alarms as soon as an index alarm is received.
Simultaneous alarm: a relation is established between indexes, and the service alarms when those indexes are alarming at the same time.
Duration alarm: the service alarms when a given alarm lasts longer than the specified duration.
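The three kinds of rule listed above can be expressed as small composable predicates. The following Python sketch is illustrative only; the function names `immediate`, `simultaneous` and `sustained`, the index names, and the representation of active alarms as a mapping from index to start time in seconds are all assumptions.

```python
def immediate(index):
    """Alarm as soon as the index is alarming."""
    return lambda active, now: index in active


def simultaneous(*indexes):
    """Alarm only while every listed index is alarming at once."""
    return lambda active, now: all(i in active for i in indexes)


def sustained(index, seconds):
    """Alarm once the index has been alarming for at least `seconds`."""
    return lambda active, now: index in active and now - active[index] >= seconds


# `active` maps each alarming index to the time (in seconds) its alarm began.
active = {"cpu.util": 100.0}
rules = {
    "immediate": immediate("cpu.util"),
    "simultaneous": simultaneous("cpu.util", "mem.util"),
    "sustained": sustained("cpu.util", 300),
}
fired = {name for name, rule in rules.items() if rule(active, now=160.0)}
print(fired)  # {'immediate'}
```

At time 160 only the immediate rule fires: memory is not alarming, and the CPU alarm has lasted 60 seconds rather than the required 300.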
Suppose an enterprise's database monitoring defines two indexes, CPU utilization and memory utilization, and the user creates a corresponding alarm template. When CPU utilization exceeds 80%, an emergency alarm is triggered; if memory utilization also exceeds 75% at the same time, the system issues a higher-priority joint alarm prompting the operation and maintenance team to intervene. Once CPU and memory usage return to normal levels, the system automatically records the recovery event, updates the service state and notifies the relevant personnel.
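The database example can be traced in a short Python sketch using the 80% and 75% thresholds from the text. The function names `evaluate` and `track` and the sample values are hypothetical, and the memory-only case returns no alarm here simply because the source does not specify it.

```python
CPU_LIMIT = 80.0   # percent, from the example
MEM_LIMIT = 75.0   # percent, from the example


def evaluate(cpu, mem):
    """Return the alarm raised for one sample, or None when nothing fires."""
    if cpu > CPU_LIMIT and mem > MEM_LIMIT:
        return "joint"       # higher-priority combined alarm
    if cpu > CPU_LIMIT:
        return "emergency"
    return None              # memory-only case is not specified in the source


def track(samples):
    """Yield alarm events, plus a recovery event when the alarm clears."""
    previous = None
    for cpu, mem in samples:
        current = evaluate(cpu, mem)
        if current != previous:
            yield current if current else "recovered"
        previous = current


events = list(track([(50, 40), (85, 60), (90, 80), (60, 50)]))
print(events)  # ['emergency', 'joint', 'recovered']
```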
It should be noted that the above embodiments are merely preferred embodiments of the present invention and do not limit it; those skilled in the art may modify the described technical solutions or make equivalent substitutions for some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. The network service application operation and maintenance alarm system is characterized by comprising a comprehensive monitoring module, a service management module, an alarm analysis module, a service topology management module, a report management module and a user management module;
the comprehensive monitoring module is used for collecting running data of the network service application in real time, including CPU utilization, memory usage, disk, network state, network delay, request response time and other key indexes;
the service management module is used for providing a human-computer interaction interface for a user, and performing operation and maintenance management and control, operation and maintenance policy creation and issuing operations on the managed service application;
the alarm management module is used for monitoring the running state and data of the business application in real time, generating alarm information of different levels according to the monitoring result and sending the alarm information to a manager in real time;
the alarm analysis module is used for receiving the generated alarm data, comprehensively judging and analyzing the alarm based on a set alarm analysis model, analyzing abnormal or potential faults, positioning the position of the fault and accurately judging the influence range of the fault;
the service topology management module is used for displaying details of the service system, maintaining basic information of the service system, generating a comprehensive service topology graph according to the service system information and displaying connection relations of software and hardware equipment of service application;
the report management module is used for counting and analyzing alarm information of the business application and recording and analyzing abnormal conditions and maintenance conditions of the business application;
and the user management module manages and maintains user permissions according to different roles and logs user login and operation behaviors.
2. The network service application operation and maintenance alarm system according to claim 1, wherein the comprehensive monitoring module is configured to monitor the operation state of the service application and the devices such as network devices, servers, databases, middleware, etc. in real time, obtain relevant performance index data, configure in combination with alarm rules to find out abnormal performance monitoring conditions in time, and push alarm information to relevant personnel.
3. The integrated monitoring module according to claim 2, wherein accurate monitoring information is extracted for different types of service applications, index data and alarm rules are monitored comprehensively, performance parameters and alarm factors of various types are analyzed intelligently, and timely alarms are given to operation risks, faults and abnormal conditions.
4. The network service application operation and maintenance alarm system according to claim 3, wherein the monitoring results of the comprehensive monitoring are classified into normal, abnormal and fault; the abnormal and fault states indicate that the system may fail and needs to be handled; the manager can set an alarm condition and a sending mode for the abnormal or fault state; and when the monitoring result is the abnormal or fault state, the alarm module is started, the alarm module judges whether an alarm is needed according to the alarm rule, and if the alarm is needed, the alarm information is sent according to the preset alarm mode.
5. The network service application operation and maintenance alarm system according to claim 1, wherein the service management module comprises a service alarm combination template, wherein the service alarm combination template is configured with a template name, an alarm description, an alarm level and a trigger rule, can be added with a monitoring object, supports important alarm indexes of an imported service, and can be modified and deleted.
6. The network service application operation and maintenance alarm system according to claim 5, wherein the service management module provides an intuitive large-screen service panorama wall for viewing the overall health of the service system, the alarm information list, the access trend, the service object type comparison analysis and the alarm information level comparison analysis, giving a macroscopic grasp of the overall operation condition of the service system.
7. The network service application operation and maintenance alarm system according to claim 1, wherein the alarm management module processes the alarm information to transfer the work order, distributes the work order to maintenance personnel, supports flexible notification configuration, notifies the maintenance personnel in a mail, short message and WeChat mode, and can inquire the alarm notification record according to the alarm title, the receiving mode and the receiving personnel.
8. The system of claim 1, wherein the alarm analysis module sets alarm levels of different degrees (emergency, serious, secondary, warning, information) and trigger rules of alarm indexes by setting an alarm template, selects required resource classifications (operating system, middleware, network equipment), imports the alarm indexes preset by the system, can finely screen the alarm information so as to acquire only the alarms concerned by users, and can accurately alarm on faults occurring in the system.
9. A network service application operation and maintenance alarm system according to any one of claims 1-8, wherein the operation flow of the system is as follows:
S1, creating a host and configuring host information and monitoring configuration, wherein the monitoring can be performed in several modes, namely through SNMP (IP, port, SNMP version, community string and the like, with batch acquisition supported) or through an Agent (IP, port, DNS);
S2, creating a service alarm configuration template, filling in a template name, selecting an alarm level, selecting alarm indexes and customizing service alarm description information;
S3, creating a service application system, including a service name, an owning department, an alarm template and the contained host objects, wherein alarms of the host objects under the service are matched against the host index items of the template, and if the template contains the alarming index, an alarm with the corresponding level and description of the template is sent to a unified alarm management module;
S4, recovering the alarm, wherein all monitoring objects under the service have no alarm, and the service is recovered to a normal state.
CN202411864459.6A 2024-12-18 2024-12-18 A network business application operation and maintenance alarm system Pending CN119603130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411864459.6A CN119603130A (en) 2024-12-18 2024-12-18 A network business application operation and maintenance alarm system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411864459.6A CN119603130A (en) 2024-12-18 2024-12-18 A network business application operation and maintenance alarm system

Publications (1)

Publication Number Publication Date
CN119603130A true CN119603130A (en) 2025-03-11

Family

ID=94830881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411864459.6A Pending CN119603130A (en) 2024-12-18 2024-12-18 A network business application operation and maintenance alarm system

Country Status (1)

Country Link
CN (1) CN119603130A (en)

Similar Documents

Publication Publication Date Title
US8370479B2 (en) System and method for dynamically grouping devices based on present device conditions
US7043659B1 (en) System and method for flexible processing of management policies for managing network elements
CN110278097B (en) Server operation and maintenance system and method based on Android system
US7197561B1 (en) Method and apparatus for maintaining the status of objects in computer networks using virtual state machines
US20030135382A1 (en) Self-monitoring service system for providing historical and current operating status
US7028228B1 (en) Method and apparatus for identifying problems in computer networks
WO2023142054A1 (en) Container microservice-oriented performance monitoring and alarm method and alarm system
US7685269B1 (en) Service-level monitoring for storage applications
CN114244676A (en) Intelligent IT integrated gateway system
CN105282772A (en) Wireless network data communication equipment monitoring system and equipment monitoring method
CN107294764A (en) Intelligent supervision method and intelligent monitoring system
EP1361761A1 (en) Telecommunications network management system and method for service monitoring
CN112688819A (en) Comprehensive management system for network operation and maintenance
CN111190794A (en) Operation and maintenance monitoring and management system
CN1984170B (en) Method for processing network alerting information
US20060112175A1 (en) Agile information technology infrastructure management system
CN106487574A (en) Automatic operating safeguards monitoring system
CN101095307A (en) Network management appliance
CN106789412A (en) Method, the apparatus and system of monitoring information collection main website performance
CN113765717B (en) An operation and maintenance management system based on a confidential special computing platform
CN116961241B (en) Unified application monitoring platform based on power grid business
CN116094905B (en) Full-link monitoring system
CN113608457A (en) Network operation and maintenance monitoring system
CN119603130A (en) A network business application operation and maintenance alarm system
CN119025373A (en) A centralized operation and maintenance method, system, device and storage medium for multiple front-end and back-end separated monomer projects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination