CN114253806A - Access stratum log collection, analysis and early warning system - Google Patents
- Publication number
- CN114253806A (application CN202111550550.7A)
- Authority
- CN
- China
- Prior art keywords
- log
- data
- analysis
- collection
- alarm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3086—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves the use of self describing data formats, i.e. metadata, markup languages, human readable formats
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Library & Information Science (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an access stratum log collection, analysis and early warning system comprising a collection and statistics module, a storage and display module, and an analysis and alarm module. The collection and statistics module is deployed on the provincial access stratum; it monitors the access stratum configuration files for changes, parses their settings, aggregates the logs into statistics, and returns the statistics to the center. The storage and display module comprises a collection and storage program and related components, and stores, records, searches and displays the returned data. The analysis and alarm module comprises a trend alarm and a state alarm. With modules deployed at the provincial access stratum and at the center respectively, the system addresses two problems of the conventional approach, namely that transmitting raw data consumes substantial resources and that analysis and early warning cannot be performed in real time, and it also provides an early warning scheme that can be customized in real time.
Description
Technical Field
The invention relates to the field of computers, in particular to an access stratum log collection, analysis and early warning system.
Background
The most common log collection and analysis system scheme in the industry at present is the ELK scheme.
ELK is a combination of the open source software Elasticsearch, Logstash and Kibana, and serves as an open source log management solution. It can be used for log search, analysis and visual display. Logstash collects the logs, formats them as JSON and passes them to Elasticsearch for storage, and a browser accesses Kibana to query the log information.
The ELK scheme provides log collection and analysis, but because all collected logs are stored in Elasticsearch as raw data, a large log volume places high demands on storage capacity, and any remote transmission places high demands on network bandwidth as well. The ELK scheme therefore consumes substantial resources.
Because conventional analysis and early warning works on raw log data that is not received as a stream, changes in the log data over time cannot be confirmed in real time, and the analysis has to be delayed by a certain amount of time to ensure that the log data it uses is complete.
Therefore, existing collection and analysis schemes have two problems: transmitting raw data consumes substantial resources, and analysis and early warning cannot be performed in real time.
Disclosure of Invention
The invention aims to provide an access stratum log collection, analysis and early warning system that consumes relatively few resources and supports real-time analysis and early warning.
The technical scheme of the invention is as follows: an access stratum log collection, analysis and early warning system comprises a collection and statistics module, a storage and display module, and an analysis and alarm module, wherein the collection and statistics module is deployed on the provincial access stratum;
the collection and statistics module monitors the access stratum configuration files for changes and parses their settings, performs aggregate statistical calculation with the minute as the time dimension, and periodically returns the statistical data to the center for storage, calculation and display;
the storage and display module comprises a collection and storage program, Kafka, Elasticsearch, Kibana and Grafana, and stores, records, searches and displays the returned log statistical data;
the analysis and alarm module comprises a trend alarm and a state alarm;
the trend alarm uses real-time streaming calculation to detect and raise alarms on abnormal trend changes in service indicators;
and the state alarm uses scheduled retrieval to detect and raise alarms on abnormal absolute values of service indicators.
In the foregoing access stratum log collection, analysis and early warning system, the collection and statistics module monitors and parses the Nginx log configuration files to obtain the list of Nginx log files and their log formats, and generates the tasks that collect, parse and aggregate those log files. The Nginx log configuration files comprise nginx.conf and a plurality of vhost.conf files; nginx.conf determines the log format types and their detailed formats, and each vhost.conf holds the log generation configuration of a different service, including its log file name and the log format type it uses.
In the foregoing access stratum log collection, analysis and early warning system, the monitoring and parsing process of the collection and statistics module is as follows:
A1. Monitor the Nginx log configuration files;
A2. Determine whether an Nginx log configuration file has changed; if so, execute step A3; otherwise, continue monitoring;
A3. Determine the configuration type: if the change affects the log file configuration of a service, execute A4.1; if it affects log_format, execute A4.2;
A4.1. Update the list of monitored services;
A4.2. Update the log parsing format;
A5. Update the log collection and parsing tasks.
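Steps A3 to A5 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the format definitions come from Nginx's standard `log_format` directive and that the per-service file names come from the standard `access_log` directive. The function names (`parse_log_formats`, `parse_log_files`, `build_tasks`) are hypothetical, and the regular expressions are deliberately simplified (no semicolons inside quoted formats, no extra `access_log` parameters).

```python
import re

# Matches `log_format <name> <format...>;` where the format may be split
# over several quoted fragments and lines.
LOG_FORMAT_RE = re.compile(r"log_format\s+(\S+)\s+(.*?);", re.S)
# Matches `access_log <path> [<format_name>];`
ACCESS_LOG_RE = re.compile(r"access_log\s+(\S+?)(?:\s+(\S+?))?\s*;")


def parse_log_formats(conf_text):
    """Return {format_name: format_string} from nginx.conf-style text (A4.2)."""
    formats = {}
    for name, body in LOG_FORMAT_RE.findall(conf_text):
        # Join adjacent quoted fragments into one format string.
        parts = re.findall(r"'([^']*)'", body)
        formats[name] = "".join(parts) if parts else body.strip()
    return formats


def parse_log_files(conf_text):
    """Return {log_path: format_name} from vhost.conf-style text (A4.1)."""
    files = {}
    for path, fmt in ACCESS_LOG_RE.findall(conf_text):
        if path != "off":
            # "combined" is the format Nginx uses when none is named.
            files[path] = fmt or "combined"
    return files


def build_tasks(nginx_conf, vhost_confs):
    """Return {log_path: format_string}, one collection task per log file (A5)."""
    formats = parse_log_formats(nginx_conf)
    tasks = {}
    for text in vhost_confs:
        for path, fmt in parse_log_files(text).items():
            tasks[path] = formats.get(fmt)
    return tasks
```

A watcher would call `build_tasks` again whenever a configuration file changes and compare the result with the previous task set to decide which collection tasks to start, stop or reconfigure.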
In the foregoing access stratum log collection, analysis and early warning system, the collection and statistics module monitors and parses the log files, aggregates them into statistical data, and returns the data to the center. The specific process is as follows:
B1. Monitor and parse the log files in real time, and aggregate the indicator data by time period; the time period defaults to one minute and can be adjusted as needed;
B2. When the log timestamps move into a new statistical period, return the statistical data of the previous period and record the read position of the log file in the progress record file;
B3. If an error occurs while returning the statistical data to the center, execute step B4;
B4. Retry the return; if the accumulated retries exceed the limit, execute step B5;
B5. Suspend step B1 until the statistical data is returned successfully.
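The interplay of aggregation, return, retry and blocking in steps B1 to B5 can be sketched as below. The class `MinuteAggregator`, the `send` callable and the in-memory `committed_offset` are hypothetical stand-ins: a real collector would post to the center over the network and persist the offset in the progress record file.

```python
from collections import Counter


class MinuteAggregator:
    """Aggregates parsed log records per minute and returns closed minutes."""

    def __init__(self, send, max_retries=3):
        self.send = send              # hypothetical callable: send(stats) -> bool
        self.max_retries = max_retries
        self.current_minute = None
        self.counts = Counter()
        self.pending = None           # statistics of a closed minute, not yet returned
        self.read_offset = 0          # position after the last consumed record
        self.committed_offset = 0     # progress point recorded after a successful return

    def feed(self, minute, key, offset):
        """Consume one record; return False while collection is blocked (B5)."""
        if self.current_minute is not None and minute != self.current_minute:
            # B2: the statistical period changed, so the previous minute is closed.
            self.pending = {"minute": self.current_minute, "counts": dict(self.counts)}
            self.counts = Counter()
            self.current_minute = None
        if self.pending is not None and not self._flush():
            return False              # the caller must offer the same record again later
        self.current_minute = minute
        self.counts[key] += 1         # B1: aggregate by key within the minute
        self.read_offset = offset
        return True

    def _flush(self):
        """B3/B4: try to return the pending statistics, with a bounded retry count."""
        for _ in range(self.max_retries):
            if self.send(self.pending):
                self.committed_offset = self.read_offset
                self.pending = None
                return True
        return False
```

Because a record is only consumed after the previous minute has been returned, a failed return neither drops the closed minute nor counts the triggering record twice.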
In the foregoing access stratum log collection, analysis and early warning system, the collection and storage program works as follows:
C1. Receive the statistical data returned from each region through an HTTP interface service;
C2. Preprocess the statistical data;
C3. Write the preprocessed statistical data to Kafka for buffering, from which Logstash stores it in the Elasticsearch database;
C4. Determine whether Kafka is abnormal; if so, execute step C5;
C5. Suspend step C1 and refuse to receive new data.
In the foregoing access stratum log collection, analysis and early warning system, the preprocessing in step C2 includes filling in missing fields with default values (fields missing because the sender runs an old version) and discarding abnormal data.
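The preprocessing of step C2 might look like the following sketch. The field names (`service`, `minute`, `requests`, `version`, `region`) and the default values are invented for illustration; the source does not list the actual record schema.

```python
# Hypothetical defaults for fields that older collector versions do not send.
DEFAULTS = {"version": "1", "region": "unknown"}
# Hypothetical fields without which a record cannot be used.
REQUIRED = ("service", "minute", "requests")


def preprocess(record):
    """Return a completed copy of `record`, or None if the record is abnormal."""
    if not all(key in record for key in REQUIRED):
        return None                       # discard: a required field is missing
    requests = record["requests"]
    if not isinstance(requests, int) or requests < 0:
        return None                       # discard: the count is not a valid number
    # Fill in defaults first so that values present in the record win.
    return {**DEFAULTS, **record}
```

Keeping the defaults in one table means that compatibility with a new collector version only requires adding an entry, not changing the control flow.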
In the foregoing access stratum log collection, analysis and early warning system, the analysis and alarm module analyzes the received statistical data in real time and raises an alarm when the trend of a service indicator is persistently abnormal. The specific process is as follows:
D1. Use a configuration file to customize which log items are monitored for early warning and the alarm thresholds of the corresponding service indicators;
D2. Receive data from Kafka as a real-time stream, and pre-filter it to select the data of the services or interfaces whose alarms need to be monitored; aggregate the selected data in a cache indexed by time: if the cache holds no entry for the time index, store the data; if it does, calculate and update the cached data;
D3. When the time index of the received log statistical data changes, treat the data for the previous time point as completely received and proceed to the next calculation step;
D4. Take the historical data over a period of time and arrange the values needed for outlier calculation in time order; use the Kolmogorov-Smirnov test to judge whether the value at the latest time point conforms to the normal distribution of the values over the preceding period, and hence whether it is an outlier; if it is an outlier, generate the corresponding abnormality alarm information; if not, discard it;
D5. Filter the abnormality alarm information further, using the threshold of the service indicator to judge whether it meets the abnormal condition; if so, execute step D6; if not, discard it;
D6. Cache the information that needs to be alarmed, fetch the outliers within a preset time period from the cache, and judge their continuity and trend consistency; if both hold, raise an alarm; if not, discard the information;
the outliers must be continuous and consistent in trend (all rising or all falling) to be considered abnormal; otherwise they are treated as normal fluctuation.
In the foregoing access stratum log collection, analysis and early warning system, the preset time period in step D6 defaults to 10 minutes and can be adjusted as needed.
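Steps D4 and D6 can be illustrated with a small sketch. The text does not say how the Kolmogorov-Smirnov test is applied to a single time point, so the reading below is an assumption: fit a normal distribution to the history and compute the one-sample KS statistic of the latest value against it. For a sample of size one the statistic is D = max(F(x), 1 - F(x)), whose exact critical value at significance level alpha is 1 - alpha/2. The names `is_outlier`, `should_alarm`, `alpha` and `min_run` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, dist):
    """One-sample Kolmogorov-Smirnov statistic of `sample` against `dist`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # Largest gap between the empirical CDF steps and the model CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def is_outlier(history, latest, alpha=0.05):
    """D4 (assumed reading): test the latest value against the history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu           # a flat history: any deviation is an outlier
    d = ks_statistic([latest], NormalDist(mu, sigma))
    return d > 1 - alpha / 2


def should_alarm(anomalies, now, min_run=3):
    """D6: alarm only on a continuous run of outliers that share one direction.

    `anomalies` maps a minute index to +1 (rising) or -1 (falling).
    """
    run = [anomalies.get(now - i) for i in range(min_run)]
    return None not in run and len(set(run)) == 1
```

The threshold check of step D5 would sit between the two functions; it is omitted here because it is a plain comparison against the configured value.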
Compared with the prior art, the invention provides an access stratum log collection, analysis and early warning system comprising a collection and statistics module, a storage and display module, and an analysis and alarm module, deployed at the provincial access stratum and at the center respectively. Data parsing and statistics are completed in the log collection module; once the raw data has been computed into statistical data, the data volume is reduced, which lowers the resources consumed by network transmission, central storage and statistical calculation, so the overall resource usage is relatively small;
the trend alarm in the analysis and alarm module obtains log statistical data from Kafka as a stream, detects in real time when the log statistical data moves to a new time period, and immediately analyzes the data of the previous time period and raises alarms, so analysis and early warning happen in real time;
therefore, the invention has the advantages of relatively low resource usage and real-time analysis and early warning.
Furthermore, the collection and statistics module monitors the log format and log item generation configuration in real time, responds to configuration file changes as they happen, and automatically updates the log parsing rules and item contents after a log configuration change, without additional manual intervention, which reduces labor cost during large-scale deployment and change;
the collection and statistics module supports a custom log format based on the nginx error.log (the log_format of the nginx error.log is fixed; a custom log records custom information in the error message field that this format provides, and is recognized and parsed by its custom format, which is currently ' customization ' action: action_name '), so that statistics on behaviors under special requirements can be supported. For example, a Lua script can write a custom log entry to the nginx error log file after executing a custom action, and the collection and statistics module of this application can then collect, parse, aggregate and return it;
the collection and statistics module parses the log information in real time and then aggregates the results at minute granularity, so that a large amount of raw log information under the same monitoring dimension is condensed by time into a single effective running-state record. Part of the center's processing is thus completed in advance at the collection end, which greatly reduces the transmission, calculation and storage resources the center needs to obtain the log information;
the collection and statistics module maintains a separate collection progress for each log file and provides retransmission and suspend-and-block mechanisms for abnormal conditions, which avoids losing or duplicating returned data and ensures its validity;
the collection and storage program collects, processes, forwards and stores the statistical data returned by the collection and statistics modules, performs compatibility processing on data returned by different module versions to keep the data consistent and valid, and buffers the data by writing it to Kafka so that it is not lost and so that real-time analysis and alarming have a data source;
Logstash receives the statistical data from Kafka and writes it to Elasticsearch, and Kibana and Grafana provide intuitive pages for data retrieval and running-state charts;
the trend alarm in the analysis and alarm module uses configuration files to customize the range of monitored service items and the indicator alarm thresholds, performs streaming analysis on data read from Kafka, and monitors minute-level changes in service indicators in real time. Based on the Kolmogorov-Smirnov test, it improves the accuracy of abnormal-trend judgment through several stages, including outlier judgment, continuity judgment, consistency judgment and outlier threshold judgment, and provides alarms that monitor indicator trends in real time.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a flow chart of the monitoring and parsing process of the collection and statistics module of the present invention;
FIG. 3 is a flow chart of the storage and display module of the present invention;
FIG. 4 is a flow chart of the analysis and alarm module of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not to be construed as limiting the invention.
Embodiment. As shown in FIG. 1, an access stratum log collection, analysis and early warning system comprises a collection and statistics module, a storage and display module, and an analysis and alarm module, wherein the collection and statistics module is deployed on the provincial access stratum;
the collection and statistics module monitors the access stratum configuration files for changes and parses their settings, performs aggregate statistical calculation with the minute as the time dimension, and periodically returns the statistical data to the center for storage, calculation and display;
the storage and display module comprises a collection and storage program, Kafka, Elasticsearch, Kibana and Grafana, and stores, records, searches and displays the returned log statistical data;
the analysis and alarm module comprises a trend alarm and a state alarm;
the trend alarm uses real-time streaming calculation to detect and raise alarms on abnormal trend changes in service indicators;
and the state alarm uses scheduled retrieval to detect and raise alarms on abnormal absolute values of service indicators.
The collection and statistics module monitors and parses the Nginx log configuration files to obtain the list of Nginx log files and their log formats, and generates the tasks that collect, parse and aggregate those log files. The Nginx log configuration files comprise nginx.conf and a plurality of vhost.conf files; nginx.conf determines the log format types and their detailed formats, and each vhost.conf holds the log generation configuration of a different service, including its log file name and the log format type it uses.
As shown in FIG. 2, the monitoring and parsing process of the collection and statistics module is as follows:
A1. Monitor the Nginx log configuration files;
A2. Determine whether an Nginx log configuration file has changed; if so, execute step A3; otherwise, continue monitoring;
A3. Determine the configuration type: if the change affects the log file configuration of a service, execute A4.1; if it affects log_format, execute A4.2;
A4.1. Update the list of monitored services;
A4.2. Update the log parsing format;
A5. Update the log collection and parsing tasks.
As shown in FIG. 3, the collection and statistics module monitors and parses the log files, aggregates them into statistical data, and returns the data to the center. The specific process is as follows:
B1. Monitor and parse the log files in real time, and aggregate the indicator data by time period; the time period is one minute;
B2. When the log timestamps move into a new statistical period, return the statistical data of the previous period and record the read position of the log file in the progress record file;
B3. If an error occurs while returning the statistical data to the center, execute step B4;
B4. Retry the return; if the accumulated retries exceed the limit, execute step B5;
B5. Suspend step B1 until the statistical data is returned successfully.
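The progress record file of step B2 can be sketched as follows, assuming a JSON file that maps each log file path to a byte offset. The file layout and the function names are assumptions made for illustration; the source only states that the read position is recorded.

```python
import json
import os
import tempfile


def load_progress(path):
    """Return {log_file: byte_offset}, or an empty dict if no progress exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_progress(path, log_file, offset):
    """Record the read position of one log file without corrupting the others."""
    progress = load_progress(path)
    progress[log_file] = offset
    # Write to a temporary file and rename it, so a crash never leaves a
    # half-written progress file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(progress, f)
    os.replace(tmp, path)


def read_new_lines(log_file, offset):
    """Return (complete new lines, new offset), starting at `offset`."""
    with open(log_file, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop at the last newline so a line still being written is not consumed.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", "replace").splitlines()
    return lines, offset + end
```

Saving the offset only after the center has acknowledged the statistics lets a restarted collector resume from the last returned position, which is one way to avoid both loss and duplication.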
The collection and storage program works as follows:
C1. Receive the statistical data returned from each region through an HTTP interface service;
C2. Preprocess the statistical data;
C3. Write the preprocessed statistical data to Kafka for buffering, from which Logstash stores it in the Elasticsearch database;
C4. Determine whether Kafka is abnormal; if so, execute step C5;
C5. Suspend step C1 and refuse to receive new data.
The preprocessing in step C2 includes filling in missing fields with default values (fields missing because the sender runs an old version) and discarding abnormal data.
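The refusal behavior of steps C3 to C5 can be sketched with a small receiver. The `produce` callable stands in for a real Kafka producer and the HTTP status codes are an assumed convention: answering 503 makes the collection side see a failed return, which triggers the retry and blocking of steps B3 to B5.

```python
class Receiver:
    """Accepts returned statistics and forwards them to a message queue."""

    def __init__(self, produce):
        self.produce = produce      # hypothetical callable; raises on failure
        self.accepting = True

    def handle(self, record):
        """Return an HTTP-style status code for one returned record."""
        if not self.accepting:
            return 503              # C5: refuse new data
        try:
            self.produce(record)    # C3: write to the queue for buffering
        except Exception:
            self.accepting = False  # C4: the queue is abnormal
            return 503
        return 200

    def recover(self):
        """Resume receiving once the queue is healthy again."""
        self.accepting = True
```

Refusing at the interface, rather than buffering in the receiver, keeps the unsent data at the collection end, where the progress record already protects it.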
As shown in FIG. 4, the analysis and alarm module analyzes the received statistical data in real time and raises an alarm when the trend of a service indicator is persistently abnormal. The specific process is as follows:
D1. Use a configuration file to customize which log items are monitored for early warning and the alarm thresholds of the corresponding service indicators;
D2. Receive data from Kafka as a real-time stream, and pre-filter it to select the data of the services or interfaces whose alarms need to be monitored; aggregate the selected data in a cache indexed by time: if the cache holds no entry for the time index, store the data; if it does, calculate and update the cached data;
D3. When the time index of the received log statistical data changes, treat the data for the previous time point as completely received and proceed to the next calculation step;
D4. Take the historical data over a period of time and arrange the values needed for outlier calculation in time order; use the Kolmogorov-Smirnov test to judge whether the value at the latest time point conforms to the normal distribution of the values over the preceding period, and hence whether it is an outlier; if it is an outlier, generate the corresponding abnormality alarm information; if not, discard it;
D5. Filter the abnormality alarm information further, using the threshold of the service indicator to judge whether it meets the abnormal condition; if so, execute step D6; if not, discard it;
D6. Cache the information that needs to be alarmed, fetch the outliers within a preset time period from the cache, and judge their continuity and trend consistency; if both hold, raise an alarm; if not, discard the information.
The preset time period in step D6 is 10 minutes.
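The streaming side of steps D2 and D3 can be sketched as a cache keyed by time index that hands over a minute once a later minute arrives. The class name, the record fields (`service`, `minute`, `requests`) and the choice to sum values from several regions are assumptions for illustration.

```python
class TimeIndexedCache:
    """Aggregates streamed statistics per minute and emits completed minutes."""

    def __init__(self, watched):
        self.watched = set(watched)   # D1: services configured for alarming
        self.cache = {}               # minute -> {service: aggregated value}
        self.latest = None            # most recent minute seen so far

    def on_record(self, record):
        """Return (minute, data) for the minute just completed, else None."""
        if record["service"] not in self.watched:
            return None               # D2: pre-filter unmonitored services
        minute = record["minute"]
        bucket = self.cache.setdefault(minute, {})      # store if absent
        service = record["service"]
        bucket[service] = bucket.get(service, 0) + record["requests"]  # else update
        completed = None
        if self.latest is not None and minute > self.latest:
            # D3: the time index advanced, so the previous minute is complete.
            completed = (self.latest, self.cache[self.latest])
        if self.latest is None or minute > self.latest:
            self.latest = minute
        return completed
```

Each completed minute returned by `on_record` would then be passed to the outlier test of step D4, so analysis starts as soon as the stream moves on rather than after a fixed delay.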
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111550550.7A CN114253806A (en) | 2021-12-17 | 2021-12-17 | Access stratum log collection, analysis and early warning system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114253806A true CN114253806A (en) | 2022-03-29 |
Family
ID=80795592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111550550.7A Pending CN114253806A (en) | 2021-12-17 | 2021-12-17 | Access stratum log collection, analysis and early warning system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114253806A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114676784A (en) * | 2022-04-02 | 2022-06-28 | 杭州隆埠科技有限公司 | A method and system for determining the impact surface of a change |
CN114826943A (en) * | 2022-06-30 | 2022-07-29 | 山东捷瑞数字科技股份有限公司 | NGINX log analysis method and system |
CN115460072A (en) * | 2022-08-25 | 2022-12-09 | 浪潮云信息技术股份公司 | Log processing system integrating log collection, analysis, storage and service |
CN116303317A (en) * | 2023-02-16 | 2023-06-23 | 杭州安恒信息技术股份有限公司 | A log processing method, device and computer equipment for lua program interface |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104660427A (en) * | 2013-11-18 | 2015-05-27 | 深圳市腾讯计算机系统有限公司 | Method and device for real-time statistics of logs |
CN110990223A (en) * | 2019-11-27 | 2020-04-10 | 中诚信征信有限公司 | Monitoring alarm method and device based on system log |
CN112422344A (en) * | 2020-11-18 | 2021-02-26 | 青岛海尔科技有限公司 | Alarm method, device, storage medium and electronic device for log abnormality |
Non-Patent Citations (3)
Title |
---|
675EA0B3A47D: "openresty+filebeat发送nginx返回日志到kafka", pages 1 - 5, Retrieved from the Internet <URL:《https://www.jianshu.com/p/017e88524612》> * |
MNASD: "用Prometheus细化Nginx监控", pages 1 - 4, Retrieved from the Internet <URL:《https://blog.csdn.net/mnasd/article/details/86585377》> * |
春天的一只小鹿: "Prometheus监控Nginx", pages 1 - 8, Retrieved from the Internet <URL:《https://mp.weixin.qq.com/s/g0IXL2oryJvXWtFbDvbJvg》> * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114676784A (en) * | 2022-04-02 | 2022-06-28 | 杭州隆埠科技有限公司 | A method and system for determining the impact surface of a change |
CN114826943A (en) * | 2022-06-30 | 2022-07-29 | 山东捷瑞数字科技股份有限公司 | NGINX log analysis method and system |
CN114826943B (en) * | 2022-06-30 | 2022-10-28 | 山东捷瑞数字科技股份有限公司 | NGINX log analysis method and system |
CN115460072A (en) * | 2022-08-25 | 2022-12-09 | 浪潮云信息技术股份公司 | Log processing system integrating log collection, analysis, storage and service |
CN116303317A (en) * | 2023-02-16 | 2023-06-23 | 杭州安恒信息技术股份有限公司 | A log processing method, device and computer equipment for lua program interface |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||