CN119603130A - A network business application operation and maintenance alarm system - Google Patents
A network business application operation and maintenance alarm system
- Publication number
- CN119603130A (application number CN202411864459.6A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- service
- management module
- monitoring
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a network service application operation and maintenance alarm system, which relates to the technical field of network operation and maintenance and comprises a comprehensive monitoring module, a service management module, an alarm management module, an alarm analysis module, a service topology management module, a report management module and a user management module. The invention monitors the overall running state of the network service application in real time and, when a fault or abnormality is detected, uses a fault analysis model to locate it accurately and to analyze the potential risk and the influence range. The conclusion reached by the fault analysis model is sent to the operation and maintenance personnel as a timely alarm notification, helping them manage the service application and keep it running efficiently and stably.
Description
Technical Field
The invention relates to the technical field of network operation and maintenance, in particular to a network service application operation and maintenance alarm system.
Background
With the arrival of the internet era, networks have developed rapidly, the degree of informatization has risen quickly, and digital industrialization has steadily deepened. Each industry now has its own service application platform, and a complete system runs on a series of software and hardware services. Enterprises generally build monitoring systems to operate and maintain these service systems, networks and hardware. However, when existing monitoring systems face sudden faults or security threats, they cannot find and locate the problem promptly and accurately or judge its scope of influence, and they lack an accurate, comprehensive alarm mechanism covering the whole service (software, network and hardware). Without an intelligent real-time analysis and early-warning mechanism, operation and maintenance personnel cannot anticipate potential problems.
Therefore, there is an urgent need for an operation and maintenance alarm system that provides real-time monitoring, intelligent analysis and automatic alarming for the whole service, in order to improve the reliability and stability of service applications and ensure that network service applications run efficiently.
Disclosure of Invention
The invention aims to provide a network service application operation and maintenance alarm system which can monitor the overall operation state of network service applications (software, network and hardware) in real time, accurately locate the position of a fault or abnormality through a fault analysis model when the fault or abnormality is monitored, and analyze and judge potential risks and influence ranges. The accurate conclusion is obtained through the fault analysis model, and the alarm notification is sent to the operation and maintenance personnel in time, so that the operation and maintenance personnel can be helped to manage better and ensure the efficient and stable operation of the business application.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A network service application operation and maintenance alarm system comprises a comprehensive monitoring module, a service management module, an alarm management module, an alarm analysis module, a service topology management module, a report management module and a user management module.
The comprehensive monitoring module is used for collecting operation data of the network service application in real time; the operation data include key indexes such as CPU utilization, memory usage, disk usage, network state, network delay and request response time.
The service management module is used for providing a man-machine interaction interface for a user, and performing operations such as operation and maintenance management and control, operation and maintenance policy creation, issuing and the like on the managed service application.
The alarm management module is used for monitoring the running state and data of the business application in real time, generating alarm information of different levels according to the monitoring result and sending the alarm information to management personnel in real time.
The alarm analysis module is used for receiving the generated alarm data, comprehensively judging and analyzing the alarm based on a set alarm analysis model, analyzing abnormal or potential faults, positioning the position of the fault and accurately judging the influence range of the fault.
The service topology management module is used for displaying details of the service system, maintaining basic information of the service system, generating a comprehensive service topology graph according to the service system information and displaying connection relations of software and hardware equipment of service application.
The report management module is used for counting and analyzing the alarm information of the business application and recording and analyzing the abnormal condition and maintenance condition of the business application.
The user management module manages and maintains user rights according to different roles, and logs user login and operation behaviors.
Further, the comprehensive monitoring module monitors in real time the running state of the service application and of devices such as network devices (routers, switches, etc.), servers, databases and middleware, obtains the relevant performance index data, combines it with the configured alarm rules to discover performance anomalies in time, and pushes alarm information to the relevant personnel.
The comprehensive monitoring module also extracts accurate monitoring information for different types of service applications, combines the monitored index data with the alarm rules, intelligently analyzes the various performance parameters and alarm factors, and raises timely alarms for operation risks, faults and abnormal conditions.
The monitoring results of the comprehensive monitoring are divided into normal, abnormal and fault. The abnormal and fault states indicate that the system may fail and needs handling. The administrator may set alarm conditions and a sending mode for the abnormal or fault state. When the monitoring result is abnormal or fault, the alarm module is started; it judges according to the alarm rules whether an alarm is required and, if so, sends the alarm information in the preset alarm mode.
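The three-state result and the rule-gated dispatch described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold parameters `abnormal_at` and `fault_at`, the `AlarmRule` structure and the channel names are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    FAULT = "fault"


@dataclass(frozen=True)
class AlarmRule:
    # Hypothetical rule: which states must alarm, and through which channels.
    alarm_on: frozenset
    channels: tuple


def classify(value: float, abnormal_at: float, fault_at: float) -> Status:
    """Map one metric reading to a monitoring result (thresholds are assumed)."""
    if value >= fault_at:
        return Status.FAULT
    if value >= abnormal_at:
        return Status.ABNORMAL
    return Status.NORMAL


def dispatch(status: Status, rule: AlarmRule) -> list:
    """Return the notifications to send; the list is empty when no alarm is required."""
    if status is Status.NORMAL or status not in rule.alarm_on:
        return []
    return [f"{channel}:{status.value}" for channel in rule.channels]


rule = AlarmRule(alarm_on=frozenset({Status.FAULT}), channels=("mail", "sms"))
print(dispatch(classify(97.0, abnormal_at=80.0, fault_at=95.0), rule))
```

Keeping classification separate from dispatch mirrors the text: the monitoring result only starts the alarm module, and the administrator's rule decides whether anything is actually sent.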
Further, the service management module comprises a service alarm combination template, in which a template name, alarm description, alarm level, trigger rules and the like are configured. Monitoring objects can be added to the template, importing the important alarm indexes of a service is supported, and a created service alarm template can be modified and deleted.
The service management module also comprises service system configuration, in which basic information of the service system is set, such as the service name, owning department, alarm template and parent service tree, and monitoring objects are selected and added.
The service management module displays various service systems in a service tree structure, can edit, add and delete the service systems, and checks the service system profile, including service objects contained in the service systems and alarm information records sent by the service objects.
The service management module provides an intuitive large-screen service panorama wall, on which the overall health of the service system, the alarm information list, the access volume trend, a comparison of service object types, a comparison of alarm levels and the like can be viewed, giving a macroscopic grasp of the overall operation of the service system. Custom configuration of the panorama wall and adding service systems to it are supported.
Furthermore, the alarm management module classifies real-time alarms by alarm level, distinguishes the levels by color, displays alarm information and details, and supports real-time alarm statistics by time dimension and alarm level.
The alarm management module supports classifying historical alarms into all alarms, confirmed or unconfirmed alarms, and alarms by device; it displays the basic information of each alarm and its notifications, and provides solutions from a common knowledge base.
The alarm management module can convert alarm information into a work order and assign it to maintenance personnel. It supports flexible notification configuration, notifies maintenance personnel by e-mail, SMS and WeChat, and allows alarm notification records to be queried by alarm title, receiving mode and recipient.
Further, the alarm analysis module uses alarm templates to set alarm levels of different severity (emergency, serious, secondary, warning, information) and the trigger rules of alarm indexes, selects the required resource classifications (operating systems, middleware, network equipment and the like), and imports the various alarm indexes preset by the system. In this way the alarm information is finely screened, so that only the alarms the user cares about are collected and faults occurring in the system are alarmed accurately.
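The screening performed by an alarm template can be sketched as a simple filter. The sketch below is illustrative only: the numeric level codes follow the 1 to 5 ordering given in step S2 of the embodiment, while the level name `warning`, the `min_level` field and the class and index names are assumptions.

```python
from dataclasses import dataclass

# Level codes follow the 1..5 ordering of step S2; the English names are assumed.
LEVELS = {"information": 1, "warning": 2, "secondary": 3, "serious": 4, "emergency": 5}


@dataclass(frozen=True)
class Alarm:
    resource_class: str  # e.g. "operating system", "middleware", "network equipment"
    index: str           # name of the alarm index, e.g. "cpu_utilization"
    level: str


@dataclass(frozen=True)
class AlarmTemplate:
    resource_classes: frozenset
    indexes: frozenset
    min_level: str  # hypothetical: lowest level the user wants to see

    def accepts(self, alarm: Alarm) -> bool:
        """An alarm passes only if its class, index and level all match the template."""
        return (
            alarm.resource_class in self.resource_classes
            and alarm.index in self.indexes
            and LEVELS[alarm.level] >= LEVELS[self.min_level]
        )


def screen(alarms, template):
    """Keep only the alarms the template's owner is concerned with."""
    return [alarm for alarm in alarms if template.accepts(alarm)]


template = AlarmTemplate(
    resource_classes=frozenset({"middleware"}),
    indexes=frozenset({"cpu_utilization"}),
    min_level="serious",
)
alarms = [
    Alarm("middleware", "cpu_utilization", "emergency"),
    Alarm("middleware", "cpu_utilization", "warning"),
    Alarm("network equipment", "cpu_utilization", "emergency"),
]
print(screen(alarms, template))
```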
Further, the service topology management module provides details of a service system, and basic information of the service system comprises a resource name, a resource type, a resource IP, a host state, an acquisition state and an alarm log.
The service topology management module can generate a comprehensive network topology graph according to service system information, display the connection relation of equipment, edit the network topology graph and dynamically display the alarm state of each node in the topology graph.
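One way a topology graph could support judging the influence range of a fault is a traversal over dependency edges. The text does not state how the influence range is computed, so the sketch below is an assumption: the `depends_on` mapping, the node names and the breadth-first traversal are all hypothetical.

```python
from collections import deque


def impacted_nodes(depends_on, failed):
    """Return every node that directly or indirectly depends on `failed`."""
    # Invert the edges: for each node, who depends on it?
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen


# Hypothetical service topology: each key depends on the nodes in its value set.
topology = {
    "web-app": {"app-server"},
    "app-server": {"database", "switch-1"},
    "database": {"switch-1"},
    "switch-1": set(),
}
print(sorted(impacted_nodes(topology, "database")))
```

The same traversal result could drive the per-node alarm state shown in the topology graph, for example by coloring every returned node as affected.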
Further, the report management module displays, classified by alarm level, the total number of alarms, the total number of closed alarms and the total number of unrecovered alarms; supports generating daily, weekly and monthly reports; provides queries over historical daily, weekly and monthly reports and over alarm statistics trends; and provides a report export function.
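The per-level counts in such a report can be sketched as a single aggregation pass. The record layout used here (dictionaries with `level`, `closed` and `recovered` keys) is an assumption made for the example and is not specified in the text.

```python
def summarize(alarms):
    """Count total, closed and unrecovered alarms for each alarm level."""
    report = {}
    for alarm in alarms:
        row = report.setdefault(
            alarm["level"], {"total": 0, "closed": 0, "unrecovered": 0}
        )
        row["total"] += 1
        if alarm["closed"]:
            row["closed"] += 1
        if not alarm["recovered"]:
            row["unrecovered"] += 1
    return report


# Hypothetical alarm records for one reporting period.
records = [
    {"level": "serious", "closed": True, "recovered": True},
    {"level": "serious", "closed": False, "recovered": False},
    {"level": "warning", "closed": False, "recovered": True},
]
print(summarize(records))
```

A daily, weekly or monthly report would then be the same aggregation applied to the records whose timestamps fall inside the period.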
Further, the user management module comprises management functions such as adding, modifying and deleting users and assigning roles, realizing user management and authority configuration.
To optimize the system architecture, the invention adopts a B/S architecture: the front end uses the Vue framework, and the back end is built on the Spring Boot framework. The system divides its functional modules in a modular structure, which gives it good maintainability, extensibility and flexibility.
Preferably, to achieve efficient and comprehensive monitoring, the system employs a monitoring solution with multi-protocol support and flexible extension capability. The solution supports multiple data acquisition modes, including SNMP, Agent, IPMI, JMX and ODBC, and can adapt to diverse environments. It can monitor resources such as servers, operating systems, network equipment, databases, middleware, storage equipment and virtualization platforms, and provides a grouping management function for multi-level, multi-dimensional monitoring.
Preferably, the alarm information interface provides an operation and maintenance knowledge base, including alarm cause analysis and solutions, and supports comments, which adds a more user-friendly mode of interaction.
Compared with the prior art, the network service application operation and maintenance alarm system is efficient, well structured and easy to manage, and can effectively organize, synchronize and display data of various kinds. When an abnormal condition occurs, the system automatically locates it through the fault analysis model and issues an alarm together with the influence range, which can greatly shorten the time needed to locate and solve problems, reduce the risk of service interruption, and improve the running stability and user experience of the service system.
Drawings
Fig. 1 is a flowchart of the service application alarm operation of the network service application operation and maintenance alarm system according to the present invention.
Fig. 2 is a schematic diagram of the service application alarm logic of the network service application operation and maintenance alarm system according to the present invention.
Detailed Description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the present invention and are not intended to limit it.
Examples
It should be noted that the following assumes that the various devices monitored by the system, such as operating systems, middleware, network devices, databases and servers, have already been added to the system as monitored objects. The specific process of the network service application operation and maintenance alarm system is as follows:
S1, creating a host and configuring the host information and SNMP settings, including the monitoring address type (IP or DNS), port, SNMP version, community string and the like; batch acquisition is supported.
S2, creating a service alarm configuration template: filling in the template name, selecting a single alarm level (1 information, 2 warning, 3 secondary, 4 serious, 5 emergency), selecting one or more alarm indexes (index information under the corresponding template, chosen according to the monitoring type), and customizing the service alarm description.
S3, creating a service application system, specifying the service name, owning department, alarm template (a template created in S2) and the host objects it contains. Alarms from the host objects under the service are matched against the index items of the template; if the template contains the alarming index, an alarm carrying the template's level and description is sent to the unified alarm management module.
S4, recovering the alarm: when none of the monitored objects under the service has an active alarm, the service is restored to the normal state.
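The matching in S3 and the recovery condition in S4 can be sketched as a small state holder per service. This is a hedged illustration under assumed names: the `Service` class, its fields, and the host and index names are hypothetical, and the returned dictionary merely stands in for whatever is sent to the unified alarm management module.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    template_level: str          # level configured in the alarm template (S2)
    template_indexes: frozenset  # alarm indexes selected in the template (S2)
    hosts: frozenset             # host objects contained in the service (S3)
    active: set = field(default_factory=set)  # (host, index) pairs currently in alarm

    def on_host_alarm(self, host, index):
        """S3: forward a host alarm at the template's level if the template covers it."""
        if host not in self.hosts or index not in self.template_indexes:
            return None
        self.active.add((host, index))
        return {"service": self.name, "level": self.template_level,
                "host": host, "index": index}

    def on_host_recovery(self, host, index):
        """S4: clear one alarm; return True once the whole service is back to normal."""
        self.active.discard((host, index))
        return not self.active

    @property
    def state(self):
        return "alarm" if self.active else "normal"


svc = Service("orders", "serious", frozenset({"cpu"}), frozenset({"db-1"}))
print(svc.on_host_alarm("db-1", "cpu"), svc.state)
print(svc.on_host_recovery("db-1", "cpu"), svc.state)
```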
In step S1, access to the collected data is completed. Acquisition means such as the SNMP protocol, the IPMI protocol, and active and passive reporting over XML are supported, and this step is responsible for collecting a series of system parameters such as CPU, disk, network card, server sensors and application-layer information.
In step S2, configuring alarm rule expressions is supported, so that richer alarm rules can be configured dynamically: an alarm expression can be built from the configured list of alarm indexes. For example:
Immediate alarm: the service alarms as soon as an alarm on a listed index is received.
Simultaneous alarm: a relation is established between indexes, and the service alarms when those indexes are in alarm at the same time.
Duration alarm: the service alarms when a given alarm lasts longer than a specified duration.
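The three kinds of rule expression listed above can be sketched as predicates over the set of currently active index alarms. The data layout (a mapping from index name to the time its alarm started, in seconds) and the function names are assumptions made for illustration; the text does not define an expression syntax.

```python
def immediate(active):
    """Immediate alarm: true as soon as any listed index is in alarm."""
    return bool(active)


def simultaneous(active, required):
    """Simultaneous alarm: true only when every related index is in alarm together."""
    return set(required) <= set(active)


def sustained(active, index, now, min_seconds):
    """Duration alarm: true when one index has been in alarm for at least min_seconds."""
    started = active.get(index)
    return started is not None and now - started >= min_seconds


# Hypothetical state: index name -> time (in seconds) at which its alarm started.
active = {"cpu": 100.0, "memory": 160.0}
print(immediate(active))
print(simultaneous(active, ["cpu", "memory"]))
print(sustained(active, "cpu", now=400.0, min_seconds=300.0))
```

Because each rule is a plain predicate, richer expressions could be composed with ordinary boolean operators, which is one possible reading of dynamic expression configuration.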
For example, suppose an enterprise's database monitoring sets two indexes, CPU utilization and memory utilization, and a user creates a corresponding alarm template. When CPU utilization exceeds 80%, an emergency alarm is triggered; if memory utilization also exceeds 75% at the same time, the system sends a joint alarm with higher priority to prompt the operation and maintenance team to intervene. Once CPU and memory usage return to normal levels, the system automatically records the recovery event, updates the service state and notifies the relevant personnel.
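The database example can be traced with a short sketch that uses the 80% and 75% thresholds from the text. The function names, the sample readings and the `memory_only` state are assumptions: the text does not say what happens when only memory is high, so that case is kept as a separate state so that it is not mistaken for a recovery.

```python
def evaluate(cpu, mem, cpu_limit=80.0, mem_limit=75.0):
    """Classify one (CPU %, memory %) sample using the thresholds from the example."""
    cpu_high = cpu > cpu_limit
    mem_high = mem > mem_limit
    if cpu_high and mem_high:
        return "joint"        # higher-priority joint alarm
    if cpu_high:
        return "emergency"    # CPU alone over its threshold
    if mem_high:
        return "memory_only"  # not specified in the text; assumed separate state
    return "normal"


def track(samples):
    """Return the state changes for a series of samples, recording recovery events."""
    events, previous = [], "normal"
    for cpu, mem in samples:
        current = evaluate(cpu, mem)
        if current != previous:
            events.append("recovered" if current == "normal" else current)
        previous = current
    return events


# Hypothetical readings: normal, CPU high, both high, back to normal.
print(track([(50, 40), (85, 60), (90, 80), (60, 50)]))
```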
It should be noted that the above embodiments are merely preferred embodiments of the present invention and do not limit it; those skilled in the art may still modify the described technical solutions or substitute equivalents for some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. The network service application operation and maintenance alarm system is characterized by comprising a comprehensive monitoring module, a service management module, an alarm management module, an alarm analysis module, a service topology management module, a report management module and a user management module;
the comprehensive monitoring module is used for collecting running data of the network service application in real time, including key indexes such as CPU utilization, memory usage, disk usage, network state, network delay and request response time;
the business management module is used for providing a man-machine interaction interface for a user, and performing operation and maintenance management and control, operation and maintenance strategy creation and issuing operations on the managed business application;
the alarm management module is used for monitoring the running state and data of the business application in real time, generating alarm information of different levels according to the monitoring result and sending the alarm information to a manager in real time;
the alarm analysis module is used for receiving the generated alarm data, comprehensively judging and analyzing the alarm based on a set alarm analysis model, analyzing abnormal or potential faults, positioning the position of the fault and accurately judging the influence range of the fault;
the service topology management module is used for displaying details of the service system, maintaining basic information of the service system, generating a comprehensive service topology graph according to the service system information and displaying connection relations of software and hardware equipment of service application;
the report management module is used for counting and analyzing alarm information of the business application and recording and analyzing abnormal conditions and maintenance conditions of the business application;
and the user management module manages and maintains user rights according to different roles and logs user login and operation behaviors.
2. The network service application operation and maintenance alarm system according to claim 1, wherein the comprehensive monitoring module monitors in real time the running state of the service application and of devices such as network devices, servers, databases and middleware, obtains the relevant performance index data, combines it with the configured alarm rules to discover performance anomalies in time, and pushes alarm information to the relevant personnel.
3. The network service application operation and maintenance alarm system according to claim 2, wherein the comprehensive monitoring module extracts accurate monitoring information for different types of service applications, combines the monitored index data with the alarm rules, intelligently analyzes the various performance parameters and alarm factors, and raises timely alarms for operation risks, faults and abnormal conditions.
4. The network service application operation and maintenance alarm system according to claim 3, wherein the monitoring results of the comprehensive monitoring are classified as normal, abnormal and fault; the abnormal and fault states indicate that the system may fail and needs handling; the manager can set alarm conditions and a sending mode for the abnormal or fault state; and when the monitoring result is abnormal or fault, the alarm module is started, judges according to the alarm rules whether an alarm is needed and, if so, sends the alarm information in the preset alarm mode.
5. The network service application operation and maintenance alarm system according to claim 1, wherein the service management module comprises a service alarm combination template configured with a template name, an alarm description, an alarm level and trigger rules; monitoring objects can be added to the template, importing the important alarm indexes of a service is supported, and the template can be modified and deleted.
6. The network service application operation and maintenance alarm system according to claim 5, wherein the service management module provides an intuitive large-screen service panorama wall for viewing the overall health of the service system, the alarm information list, the access volume trend, a comparison of service object types and a comparison of alarm levels, giving a macroscopic grasp of the overall operation of the service system.
7. The network service application operation and maintenance alarm system according to claim 1, wherein the alarm management module processes the alarm information to transfer the work order, distributes the work order to maintenance personnel, supports flexible notification configuration, notifies the maintenance personnel in a mail, short message and WeChat mode, and can inquire the alarm notification record according to the alarm title, the receiving mode and the receiving personnel.
8. The network service application operation and maintenance alarm system according to claim 1, wherein the alarm analysis module uses alarm templates to set alarm levels of different severity (emergency, serious, secondary, warning, information) and the trigger rules of alarm indexes, selects the required resource classifications (operating system, middleware, network equipment), and imports the alarm indexes preset by the system, so that the alarm information is finely screened, only the alarms the user cares about are collected, and faults occurring in the system are alarmed accurately.
9. A network service application operation and maintenance alarm system according to any one of claims 1-8, wherein the operation flow of the system is as follows:
S1, creating a host and configuring host information and monitoring settings, wherein monitoring can be performed in several modes, namely via SNMP (IP, port, SNMP version, community string and the like, with batch acquisition supported) or via Agent (IP, port, DNS);
S2, creating a service alarm configuration template, filling in a template name and an alarm level, selecting alarm indexes and customizing the service alarm description;
S3, creating a service application system, specifying a service name, an owning department, an alarm template and the host objects it contains, wherein alarms from the host objects under the service are matched against the index items of the template and, if the template contains the alarming index, an alarm carrying the template's level and description is sent to the unified alarm management module;
S4, recovering the alarm, wherein all monitoring objects under the service have no alarm, and the service is recovered to a normal state.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411864459.6A CN119603130A (en) | 2024-12-18 | 2024-12-18 | A network business application operation and maintenance alarm system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411864459.6A CN119603130A (en) | 2024-12-18 | 2024-12-18 | A network business application operation and maintenance alarm system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119603130A (en) | 2025-03-11 |
Family
ID=94830881
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411864459.6A Pending CN119603130A (en) | 2024-12-18 | 2024-12-18 | A network business application operation and maintenance alarm system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119603130A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |