CN116340274A - Nginx log compression analysis method, device and readable storage medium - Google Patents
- Publication number
- CN116340274A CN116340274A CN202310085553.0A CN202310085553A CN116340274A CN 116340274 A CN116340274 A CN 116340274A CN 202310085553 A CN202310085553 A CN 202310085553A CN 116340274 A CN116340274 A CN 116340274A
- Authority
- CN
- China
- Prior art keywords
- compression
- log
- nginx
- access
- record
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0246—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
- H04L41/0253—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using browsers or web-pages for accessing management information
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a method, a device and a readable storage medium for compression analysis of Nginx logs. The method comprises: reading access log files under a single Nginx instance; determining compression sections according to the access time and a preset compression duration, and compressing the access log files in a single compression section into a first compression record, wherein the first compression record comprises at least one first field and a corresponding first attribute value; compressing the access log files of the same access URL into a second compression record, wherein the second compression record comprises at least one second field and a corresponding second attribute value; and sending the first compression record and the second compression record to a database. To meet the need to analyze logs from the project and function (URL) dimensions, the method compresses the logs and stores them in ES indexes, and provides a series of indexes in the three directions of performance, errors and clients to reduce the impact of compression and highlight the value of the logs.
Description
Technical Field
The invention belongs to the field of data processing, and particularly relates to a method and equipment for compressing and analyzing Nginx logs and a readable storage medium.
Background
The Nginx log comprehensively reflects changes in important indexes of an internet service such as access volume, success rate and performance, and is helpful for guaranteeing and continuously improving the running quality of related services; at present the industry widely uses ELK to acquire, store and analyze this data.
Because the log volume is huge, the billions of logs generated by a business every day occupy tens of TB or more, and even after investing a large number of servers the business still needs unscheduled capacity expansion, otherwise the log servers easily respond slowly or fail. Common countermeasures include saving only a small amount of the latest data and using different clusters for different businesses, so the value of the full data cannot be fully grasped, comprehensive analysis and comparison across projects is hindered, and some problems are difficult to discover and respond to in time. How to efficiently process this data with limited IT resources in operation and maintenance work is one of the important challenges in this field.
Disclosure of Invention
Based on the above, the present invention aims to provide a method, a system, a device and a readable storage medium for compressing and analyzing Nginx logs, which compress the logs from different service dimensions and mine the value of the Nginx logs, so as to overcome the defects of the prior art.
In a first aspect, the present invention provides a method for compressing and analyzing an Nginx log, including:
reading an access log file under a single Nginx instance;
determining a compression section according to the access time and the preset compression time length, and compressing the access log file in the single compression section into a first compression record, wherein the first compression record comprises at least one first field and a corresponding first attribute value;
compressing the access log file of the same access URL into a second compressed record, wherein the second compressed record comprises at least one second field and a corresponding second attribute value;
the first compressed record and the second compressed record are sent to a database.
Preferably, the log compression analysis method further includes:
and compressing access log files corresponding to the same event into an event compression record according to preset event types, wherein the preset event types comprise at least one of: a log status code of 500 or more, abnormal content in the request or response of the log, and response time consumption exceeding a set threshold.
Preferably, determining the compression section according to the access time and the preset compression time length includes:
according to the access time of the log files parsed in the Nginx log format, let u = ⌊L/T⌋ (rounded down), where L is the access time and T is the preset compression duration; the interval [uT, (u+1)T) is a compression section, and the log files whose access time belongs to [uT, (u+1)T) are compressed into one first compression record.
Preferably, the generating process of the first compressed record includes:
counting and/or querying the access log files in the compression section according to the first field to obtain a corresponding first attribute value,
and/or
parsing the log file, and counting and/or querying the access log files in the compression section according to the first field and the log variables to obtain a corresponding first attribute value;
at least one first field and the first attribute value corresponding to it form a first compressed record.
Preferably, the generating of the second compressed record includes:
counting and/or querying the access log files of the same access URL according to the second field to obtain a corresponding second attribute value;
at least one second field and the second attribute value corresponding to the second field form a second compressed record.
Preferably, the first fields comprise at least the percentile performances TP90, TP95 and TP99; when the access amount p < 10000 in a single compression section, the calculation of TP90 comprises:
generating a first performance data list, wherein the list comprises performance data corresponding to each log file;
randomly selecting the time-consuming datum with sequence number i through the following sub-table recursive query process, such that it is larger than any time-consuming datum between sequence numbers [1, i−1] and smaller than any time-consuming datum between sequence numbers [i+1, p]:
if i < 0.9p, recursively search for the datum of rank 0.9p in the sub-table [i+1, p]; if i > 0.9p, recursively search for the datum of rank 0.9p in the sub-table [1, i−1]; when a sub-table is reduced to a single element, the recursion exits and returns that element; the element determined by this recursive sub-table query process is TP90.
Sorting the data in the first performance data list that are ranked after TP90 in ascending order to obtain a second performance data list, wherein the second performance data list comprises c elements;
determining TP95 and TP99 in the second performance data list according to the position of the sequence number of TP90, using the proportional relation of TP95, TP99 and TP90 on sequence numbers; specifically, TP95 is the (⌈0.95p⌉ − idx)-th datum in the second performance data list, and TP99 is the (⌈0.99p⌉ − idx)-th datum in the second performance data list, where idx represents the sequence number of TP90 in the first performance data list.
Preferably, when there are at least two Nginx instances for a single item, the method further comprises:
reading a first compression record and a second compression record of each Nginx instance;
compressing the first compressed record into a third compressed record;
compressing the second compressed record of the same access URL into a fourth compressed record;
and sending the third compressed record and the fourth compressed record to a database.
Preferably, the method further includes adaptively recommending a preset compression duration T, including:
S1, calculating the compression ratio r of a single Nginx instance according to the number ns of the Nginx instances and the overall service compression ratio c, wherein r=c/ns;
S2, determining the shortest time length M that contains at least r log files, in milliseconds;
step S3, calculating z = M·N/(24×3600×1000), wherein N represents the total number of periods of length M in the same day whose log record count is not less than r; when z exceeds a set value, take T = M and upload and store T, otherwise enter step S4;
step S4, let M = 1.1M, rounded up, and repeat step S3.
In a second aspect, the present invention provides an Nginx log compression analysis system, including a server and a compression end, where the Nginx instances to be compression-analyzed run on the server, and the compression end is configured to implement the relevant steps of the Nginx log compression analysis method provided in the first aspect.
In a third aspect, the present invention provides an Nginx log compression analysis apparatus, including a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement each step of the Nginx log compression analysis method provided in the first aspect.
In a fourth aspect, the present invention also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the Nginx log compression analysis method provided in the first aspect.
From the above technical solutions, the method for compressing and analyzing the Nginx log provided by the embodiments of the present invention has the following beneficial effects:
1. The method compresses and mines the value of the Nginx log from the three business dimensions of project, function and important event, and in the compression results provides a series of special indexes focused on clients, errors and performance in addition to conventional access volume, traffic, performance and error trend analysis, so that the value and detailed distribution of the Nginx log are intuitively reflected. Detached from the original logs, it can simply and rapidly analyze the Nginx logs of more projects on ES clusters of the same configuration, provides more analysis effects without missing important information, avoids data expansion, and helps standardize Nginx log analysis and fully discover more value;
2. The method supports a custom compression ratio, automatically recommends the compression duration of the log, and automatically discovers and manages Nginx instance information, which helps automatically find problems in the Nginx configuration and facilitates automated analysis of the Nginx log;
3. The architecture consisting of the custom compression end, ES and Kibana is more convenient to develop, operate, maintain and expand than a Hadoop-based big data architecture, and has higher analysis and query performance on clusters of the same configuration; the three parts can also be flexibly replaced with Logstash, ClickHouse or Grafana.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an Nginx log compression analysis method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an Nginx log compression analysis system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a business entity design for Nginx log compression analysis according to an embodiment of the present invention;
fig. 4 is a structural block diagram of an Nginx log compression analysis device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The reason that the Nginx access log wastes storage is that repeated data is plentiful; a compression basis can be determined according to the repetition of each field value of each log, which mainly includes: remote_addr and port, access time and time zone time_local, request URI (URL for the HTTP/HTTPS protocol), server_protocol, request_method, status (which can determine whether there is an error and the error type), upstream_addr, URL jump source, http_user_agent, request and response durations and byte counts, SSL protocol, request parameters, and server hostname server_name. The values of server_name, server_protocol and the SSL protocol are fixed; the values of URI, request_method, upstream_addr, URL jump source and status have a fixed set of options; http_user_agent has a large number of repetitions after removing the version number; remote_addr also has a series of identical values; the truly changing and meaningful fields are only time_local, the client port, the request and response durations, the byte counts and the request parameters.
The value of the Nginx log mainly lies in the trends of access volume (log volume), performance, errors, traffic and the like, and in some URLs; logs with low access volume, normal performance and no errors do not need to be examined in detail, and a user usually only examines logs in detail when a trend is about to worsen and a quick judgment of the problem is desired. The invention improves the existing log compression processing flow from the architecture design and compression algorithm, for the project as a whole and for its key functions, and compresses the original logs into trend data at a suitable time granularity in two stages: all log records in a certain period (an aggregation period, such as 1 second or 5 seconds) are compressed into one record, for example at least 100 logs into one result, and the logs of important events such as repeated errors and slow performance are also compressed, so that one ES cluster can store massive logs for analyzing multiple services, a user can analyze data over a larger time range, and the details of the main events and URLs before compression are kept, so that the original logs can be ignored.
Referring to fig. 1, the present embodiment provides a method for compressing and analyzing an nginnx log, including the following steps:
reading an access log file under a single Nginx instance;
determining a compression section according to the access time and the preset compression time length, and compressing the access log file in the single compression section into a first compression record, wherein the first compression record comprises at least one first field and a corresponding first attribute value;
compressing the access log file of the same access URL into a second compressed record, wherein the second compressed record comprises at least one second field and a corresponding second attribute value;
the first compressed record and the second compressed record are sent to a database.
The first compression record embodies compression processing in the project dimension and stores the compressed result of all access logs of the Nginx instance within a certain duration, for example: compression start time, compression duration, access volume, average and maximum performance, the time and URL at which the maximum performance occurred, request performance indexes including the percentile performances TP90, TP95 and TP99 (other percentiles, such as TP99.9, can be added as required), the access volume of each performance interval, input and output traffic, the back-end error distribution in the format "status code 1-error count|status code 2-error count……" listing the possible error types classified by common back-end error status codes and their error counts, the client error distribution in the format "agent1-[status code 1-error count; status code 2-error count……]|agent2-[status code 3-error count……]" listing the possible error types classified by common client error status codes and their error counts, and the distribution of access problems (errors in the Nginx configuration, requests with security risks, etc.) judged from the log content and request parameters, stored in the format "type 1 code-problem count|type 2 code-problem count……".
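The pipe- and bracket-delimited distribution strings above can be illustrated with a short sketch. This is a hypothetical rendering of the stated formats (the function names are ours), assuming 5XX codes count as back-end errors and 4XX codes as client errors:

```python
from collections import Counter, defaultdict

def backend_error_distribution(status_codes):
    """Format back-end errors as 'status code-error count' pairs joined by '|'."""
    counts = Counter(code for code in status_codes if code >= 500)
    return "|".join(f"{code}-{n}" for code, n in sorted(counts.items()))

def client_error_distribution(records):
    """records: (agent, status) pairs; 4XX errors grouped per client agent."""
    per_agent = defaultdict(Counter)
    for agent, status in records:
        if 400 <= status < 500:
            per_agent[agent][status] += 1
    parts = []
    for agent in sorted(per_agent):
        inner = "; ".join(f"{c}-{n}" for c, n in sorted(per_agent[agent].items()))
        parts.append(f"{agent}-[{inner}]")
    return "|".join(parts)
```

Keeping the distributions as compact delimited strings (rather than nested objects) matches the record-per-section storage model, since each compression record stays a flat document in ES.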
The second compression record embodies compression processing in the function dimension and stores the compressed result of the project functions specified by the user (for the HTTP/HTTPS protocol these are URLs, hereinafter referred to as URLs). Besides the same fields as the first compression record, the second compression record also includes the 4XX and 5XX error amounts, an error distribution in the format "status code 1-error count|status code 2-error count……", and input and output traffic. Since a function is more specific than a project, the technical scheme can also store the standard deviation of response performance to assist in judging the stability of the implementation effect, and more fields can be extended according to the needs of function log analysis.
Preferably, the log compression analysis method further includes:
and compressing access log files corresponding to the same event into an event compression record according to preset event types, wherein the preset event types comprise at least one of: a log status code of 500 or more, abnormal content in the request or response of the log, and response time consumption exceeding a set threshold.
The event compression record mainly stores the main contents of logs of types such as back-end errors, performance overruns and security risks, including the URL, project, occurrence time, event type, protocol, agent, the addresses of the user side and the back-end service involved in the problem, and the content of the original log.
In a further embodiment, determining the compression section based on the access time and the preset compression duration comprises:
according to the access time of the log files parsed in the Nginx log format, let u = ⌊L/T⌋ (rounded down), where L is the access time and T is the preset compression duration; the interval [uT, (u+1)T) is a compression section, and the log files whose access time belongs to [uT, (u+1)T) are compressed into one first compression record.
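As an illustrative sketch of this section division (not part of the claims; the helper names are hypothetical), with access times and T in milliseconds:

```python
def compression_section(access_time_ms, t_ms):
    """Map an access time L (ms) to its compression section [u*T, (u+1)*T), u = floor(L/T)."""
    u = access_time_ms // t_ms
    return u * t_ms, (u + 1) * t_ms

def group_by_section(access_times_ms, t_ms):
    """Group access times by section start; each group becomes one first compression record."""
    sections = {}
    for ts in access_times_ms:
        start, _end = compression_section(ts, t_ms)
        sections.setdefault(start, []).append(ts)
    return sections
```

With T = 5000 ms, a log at 4999 ms falls in [0, 5000) while a log at exactly 5000 ms starts the next section, reflecting the half-open interval in the text.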
In a further embodiment, the generating of the first compressed record includes:
counting and/or querying the access log files in the compression section according to the first field to obtain a corresponding first attribute value,
and/or
parsing the log file, and counting and/or querying the access log files in the compression section according to the first field and the log variables to obtain a corresponding first attribute value;
at least one first field and the first attribute value corresponding to it form a first compressed record.
The first fields to be counted mainly include the access volume, input traffic, output traffic, the request count accumulated in each interval according to the performance statistics interval configuration, the maximum performance with its occurrence time and corresponding URL, log count statistics for the different log status codes, and the like.
The first fields that require the log file to be parsed and then counted according to the first field and log variables mainly include the access volume, error log volume and error distribution corresponding to each access client type.
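A minimal sketch of per-client-type statistics, assuming a deliberately crude user-agent parser that keeps only the first product token and drops its version number (a real agent classification would be richer; all names here are ours):

```python
import re
from collections import defaultdict

def agent_family(user_agent):
    """Crude client-type extraction: first 'Product/version' token with the version dropped."""
    m = re.match(r"([A-Za-z][\w.-]*)/[\d.]+", user_agent)
    return m.group(1) if m else "other"

def per_client_stats(logs):
    """logs: (http_user_agent, status) pairs -> {agent: {'access': n, 'errors': n}}."""
    stats = defaultdict(lambda: {"access": 0, "errors": 0})
    for user_agent, status in logs:
        entry = stats[agent_family(user_agent)]
        entry["access"] += 1
        if status >= 400:
            entry["errors"] += 1
    return dict(stats)
```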
In a further embodiment, the generating of the second compressed record includes:
counting and/or querying the access log files of the same access URL according to the second field to obtain a corresponding second attribute value;
at least one second field and the second attribute value corresponding to the second field form a second compressed record.
In a further embodiment, the first fields comprise at least the percentile performances TP90, TP95 and TP99; when the access amount p < 10000 in a single compression section, the calculation of TP90 comprises:
generating a first performance data list, wherein the list comprises performance data corresponding to each log file;
randomly selecting the time-consuming datum with sequence number i through the following sub-table recursive query process, such that it is larger than any time-consuming datum between sequence numbers [1, i−1] and smaller than any time-consuming datum between sequence numbers [i+1, p]:
if i < 0.9p, recursively search for the datum of rank 0.9p in the sub-table [i+1, p]; if i > 0.9p, recursively search for the datum of rank 0.9p in the sub-table [1, i−1]; when a sub-table is reduced to a single element, the recursion exits and returns that element; the element determined by this recursive sub-table query process is TP90.
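The recursive sub-table query is, in essence, a quickselect. A hedged sketch (a standard list-partition quickselect rather than whatever exact in-place partition the specification intends):

```python
import random

def quickselect(data, k):
    """Return the k-th smallest element (1-based) by recursive partitioning."""
    if len(data) == 1:
        return data[0]
    pivot = random.choice(data)
    lower = [x for x in data if x < pivot]
    equal = [x for x in data if x == pivot]
    upper = [x for x in data if x > pivot]
    if k <= len(lower):
        return quickselect(lower, k)           # target rank lies left of the pivot
    if k <= len(lower) + len(equal):
        return pivot                           # pivot itself has the target rank
    return quickselect(upper, k - len(lower) - len(equal))

def tp90(latencies):
    """TP90 as the element of rank floor(0.9 * p), without fully sorting the list."""
    p = len(latencies)
    return quickselect(list(latencies), max(1, int(0.9 * p)))
```

Quickselect runs in expected O(p) time, which is why it beats a full sort here when only one percentile rank is needed.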
Sorting the data in the first performance data list that are ranked after TP90 in ascending order to obtain a second performance data list, wherein the second performance data list comprises c elements;
determining TP95 and TP99 in the second performance data list according to the position of the sequence number of TP90, using the proportional relation of TP95, TP99 and TP90 on sequence numbers; specifically, TP95 is the (⌈0.95p⌉ − idx)-th datum in the second performance data list, and TP99 is the (⌈0.99p⌉ − idx)-th datum in the second performance data list, where idx represents the sequence number of TP90 in the first performance data list.
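Assuming the index reconstruction above (overall rank k maps to position k − idx in the ascending second list, since the second list holds exactly the elements ranked after TP90), the lookup can be sketched as:

```python
import math

def tp95_tp99(second_list, p, idx):
    """second_list: the c elements ranked after TP90, sorted ascending.

    p is the total access amount and idx the 1-based rank of TP90 in the
    first performance data list; overall rank k lives at 1-based position
    k - idx in second_list.
    """
    tp95 = second_list[math.ceil(0.95 * p) - idx - 1]  # -1 for 0-based indexing
    tp99 = second_list[math.ceil(0.99 * p) - idx - 1]
    return tp95, tp99
```

For p = 100 and idx = 90, TP95 is the 5th and TP99 the 9th element of the roughly 10-element second list, so only the small tail beyond TP90 ever needs sorting.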
In a further embodiment, when there are at least two Nginx instances for a single item, the method further comprises:
reading a first compression record and a second compression record of each Nginx instance;
compressing the first compressed record into a third compressed record;
compressing the second compressed record of the same access URL into a fourth compressed record;
and sending the third compressed record and the fourth compressed record to a database.
In a further embodiment, the method further comprises the step of adaptively recommending a preset compression duration T, and the method comprises the following steps:
s1, calculating the compression ratio r of a single Nginx instance according to the number ns of the Nginx instances and the overall service compression ratio c, wherein r=c/ns;
S2, determining the shortest time length M that contains at least r log files, in milliseconds;
Step S3, calculating z = M·N/(24×3600×1000), wherein N represents the total number of periods of length M in the same day whose log record count is not less than r; when z exceeds a set value, take T = M and upload and store T, otherwise enter step S4;
step S4, let M = 1.1M, rounded up, and repeat step S3.
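Steps S1–S4 can be sketched as follows; the window layout (aligned buckets of length M starting from midnight) is our assumption, since the text does not specify how the periods of length M are placed within the day:

```python
import math

DAY_MS = 24 * 3600 * 1000  # milliseconds in one day

def periods_with_enough_logs(timestamps_ms, m_ms, r):
    """N in step S3: aligned periods of length M containing at least r log records."""
    buckets = {}
    for ts in timestamps_ms:
        buckets[ts // m_ms] = buckets.get(ts // m_ms, 0) + 1
    return sum(1 for count in buckets.values() if count >= r)

def recommend_t(timestamps_ms, r, z_threshold, m_start):
    """Grow M by 10%, rounded up (step S4), until z = M*N/DAY_MS exceeds the threshold."""
    m = m_start
    while m <= DAY_MS:
        n = periods_with_enough_logs(timestamps_ms, m, r)
        if m * n / DAY_MS > z_threshold:
            return m  # recommended preset compression duration T
        m = math.ceil(1.1 * m)
    return DAY_MS  # guard: fall back to a whole day if the threshold is never reached
```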
Referring to fig. 2 and 3, the following embodiment further describes the compression analysis method provided by the present invention through a specific entity design.
Fig. 2 shows an Nginx log compression analysis system, which consists of a client and a server. The server is responsible for the discovery and management of Nginx instance information on a per-project basis and for the analysis and display of compression results; the client is suitable for development in Go or Rust, and the compression results are stored in ES. The sharing functions of Kibana and Grafana can be used to make up for their deficiencies with respect to the compression results; some custom analysis functions are further developed based on the compression results and integrated with them on a UI, achieving linkage between related business entity data.
The automatic discovery of Nginx instance information helps a user quickly discover possible inconsistencies in the Nginx configuration of the same project, facilitates analyzing and processing the Nginx access logs, and allows discovered problems to be handled in time. It mainly comprises the following steps:
a) If a running Nginx process exists on the current server, execute the corresponding Nginx with the option -t to parse and output the configuration file actually used by the current server, and extract from it configuration such as log_format, back-end service instances and weights;
b) If no running Nginx process exists, first analyze the operating system history to obtain the command that ran Nginx; if there is none, exit. If the command uses the default location, use the configuration file at the Nginx default location; otherwise attempt to parse the most recently used configuration file from the Nginx start command or start script in the system history;
c) After the configuration file is found, parse the log_format, back-end service instances and weight configuration in the same way as the previous step, resolve the server IP, and report to the server side; if the configuration file cannot be found, exit and prompt the user to input the location of the Nginx configuration file;
d) After receiving the reported Nginx instance records, the server side parses out all back-end service instance addresses, the access weight likely distributed to each back-end instance, the log_format and the Nginx instance IP, and stores the records in an Nginx instance table;
e) The server side periodically checks whether the stored Nginx instance data contains different records whose back-end service instance sets are identical or have a non-empty intersection; if so, the corresponding Nginx instances are considered to belong to the same project, and a project number is automatically generated for them. The user may modify the number into an intuitive value as long as it remains unique within the table. Then a new project record is generated in the project table using the obtained project number and the record numbers as the project's Nginx instance numbers, and the user is allowed to supplement more information through the related management functions;
f) The server side may further check whether the log_format configuration of each Nginx instance of the same project differs, and notify the user to confirm any difference.
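Steps a) and c) hinge on pulling log_format out of the configuration text. A sketch under the assumption that the format value is written as one or more single-quoted fragments (the common nginx.conf style); the function name is ours:

```python
import re

def extract_log_formats(nginx_conf_text):
    """Pull each log_format name and its template out of an nginx configuration dump."""
    formats = {}
    for m in re.finditer(r"log_format\s+(\w+)\s+((?:'[^']*'\s*)+);", nginx_conf_text):
        # concatenate the single-quoted fragments into one template string
        template = "".join(re.findall(r"'([^']*)'", m.group(2)))
        formats[m.group(1)] = template
    return formats
```

The extracted template is what the client later uses to map each log line's positional fields back to variables such as $status and $request_time.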
Fig. 3 shows the entity design provided in this embodiment. Under this entity architecture, Nginx instance information can be saved in a relational database, and the client automatically discovers and manages Nginx instances, making it convenient to parse the Nginx access log. The design includes an Nginx instance table, which stores the IP of the Nginx instance, the Nginx log_format configuration, the back-end service address set and the probable load balancing proportion, and can store more Nginx instance information as needed. The system can further find all Nginx instance records belonging to the same project according to the instance information reported by the client, automatically generate and store the project number and the project's Nginx instance numbers in a project table, and the user can set more attributes through the management function, such as the product to which the project belongs, the project type used for analysis, and the project's slow-performance threshold. The project URL table stores which URLs (project functions) of the project the user needs to analyze together with their configuration, and can store bases for analyzing URL-aggregated result data as needed. The system provides related management functions that allow users to adjust the data of these tables as desired.
The compression processing of logs is mainly divided into the compression of the access log of a single Nginx instance and the compression across all Nginx instances of the project.
1. Compression of the access log of a single Nginx instance.
First, the IP of the server hosting the current Nginx instance is determined, and the following are queried from the server by this IP: the Nginx log format configuration log_format, the project number, the total number nc of Nginx instances in the project, the aggregation duration T, and the interval configuration for performance segmentation statistics, such as [0, 0.2], [0.2, 1], [1, 3]; defaults are used where no configuration exists. The file names of all files under the specified log directory are then read and sorted. If a previously stored processing history file exists under the designated directory, it is loaded, and its recorded file name and last processed line number are compared against the current files to determine whether the recorded file is the one currently being processed; if so, processing jumps to the line after the last processed one and continues line by line.
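The resume-from-history logic above can be sketched as follows. This is a minimal Python sketch; the history file layout {"file": ..., "line": ...} and the function name are assumptions for illustration, since the text only says the last processed file and line number are stored.

```python
import json
import os

def resume_point(history_path, log_files):
    """Decide where to resume processing from a saved history file.

    The history layout {"file": ..., "line": ...} is an assumed format.
    Returns (file_name, first_line_to_process), 0-based.
    """
    log_files = sorted(log_files)            # sorted log file names, as in the text
    if os.path.exists(history_path):
        with open(history_path) as f:
            hist = json.load(f)
        if hist.get("file") in log_files:
            # jump to the line after the last processed one
            return hist["file"], hist["line"] + 1
    return (log_files[0], 0) if log_files else (None, 0)
```

If the recorded file has been rotated away, the sketch simply restarts from the first file in sorted order.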
The specific compression process is as follows:
S101, log files are read concurrently with buffering, and each line is parsed into a record according to log_format. The field time_local is converted to milliseconds L, and u = ⌊L/T⌋ is computed; logs whose time_local falls within [u·T, (u+1)·T) are compressed into one record whose start time is u·T.
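The time bucketing of step S101 can be sketched as follows. This is a minimal Python sketch; the `time_ms` field name stands for time_local already converted to milliseconds and is illustrative, not from the source.

```python
from collections import defaultdict

def bucket_logs(records, T):
    """Group parsed log records into compression sections of length T ms.

    Each record is assumed to carry 'time_ms' (time_local in milliseconds).
    """
    sections = defaultdict(list)
    for rec in records:
        u = rec["time_ms"] // T            # u = floor(L / T)
        sections[u * T].append(rec)        # section [u*T, (u+1)*T) keyed by u*T
    return sections

logs = [{"time_ms": 1000}, {"time_ms": 59999}, {"time_ms": 60000}]
buckets = bucket_logs(logs, T=60000)       # one-minute compression sections
```

Records at 1000 ms and 59999 ms share the [0, 60000) section, while the record at 60000 ms starts the next section.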
S102, all log records whose status code is 500 or above, whose response time exceeds a set threshold, or whose request or response contains abnormal content are each assembled, one to one, into records according to the important-event index design and submitted to the server for storage. Detection and analysis of further event types can be added here. A smaller compression period may also be used for the compression sections containing these events, so that changes in the data can be presented at a finer time granularity.
S103, the following compression process is executed on each single compression section obtained in step S101, yielding the compression record of that section in the project dimension.
The access amount p and the input and output traffic are accumulated; the number of requests falling in each interval is accumulated according to the performance statistics interval configuration; and the maximum performance, its occurrence time, and the corresponding URL are recorded.
For logs with a status code of 400 or above, the error count corresponding to each status code is accumulated.
The access log is parsed, the client type (browser or access library) of each log entry is derived from the log field http_user_agent, and the access amount, 4XX error count, and error distribution per client type are accumulated.
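The client-type derivation from http_user_agent can be sketched as follows. The bucket names and the token lists are illustrative assumptions; the text only distinguishes browsers from programmatic access libraries.

```python
def classify_client(user_agent):
    """Rough client-type classification from the http_user_agent field.

    Library tokens are checked first so that, e.g., 'Java/1.8' is not
    mistaken for a browser; the token lists are illustrative only.
    """
    ua = (user_agent or "").lower()
    libs = ("curl", "python-requests", "okhttp", "java", "go-http-client")
    browsers = ("mozilla", "chrome", "safari", "firefox", "edge")
    if any(tok in ua for tok in libs):
        return "library"
    if any(tok in ua for tok in browsers):
        return "browser"
    return "other"
```

Per-type counters (access amount, 4XX errors) can then be keyed by the returned label.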
S104, all log files to be compressed are classified by project number, URL, and start time. For logs with the same URL, the access amount url_pv, the access amounts of logs with 4XX and 5XX status codes, the input and output traffic, and the error distribution are accumulated, and the full list of response performance data for the current period is recorded, yielding the compression record of the section in the function dimension.
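The per-(project, URL, start time) accumulation of step S104 can be sketched as follows. The field names (`request_time`, `bytes_sent`, and so on) mirror common Nginx log variables but are assumptions about the parsed record layout, not part of the source.

```python
from collections import defaultdict

def aggregate_by_url(parsed_logs):
    """Accumulate function-dimension counters per (project, URL, section start)."""
    recs = defaultdict(lambda: {"url_pv": 0, "err_4xx": 0, "err_5xx": 0,
                                "bytes_in": 0, "bytes_out": 0, "perf": []})
    for log in parsed_logs:
        key = (log["project"], log["url"], log["section_start"])
        r = recs[key]
        r["url_pv"] += 1                            # access amount of this URL
        if 400 <= log["status"] < 500:
            r["err_4xx"] += 1
        elif log["status"] >= 500:
            r["err_5xx"] += 1
        r["bytes_in"] += log.get("request_length", 0)
        r["bytes_out"] += log.get("bytes_sent", 0)
        r["perf"].append(log["request_time"])       # full performance list
    return recs
```

The `perf` list is what the percentile calculation below consumes when url_pv is small enough for exact computation.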
The performance fields TP90, TP95, and TP99 of the compressed record are calculated as follows.
When the access amount p <10000, the performance field is calculated as follows:
the smallest of the percentile performances to be calculated is identified; assuming TP90, TP95, and TP99 are to be calculated, the smallest is TP90.
All performance data to be processed are added to a list L. A pivot element is selected at random from the list and the list is partitioned around it, so that every element to its left is smaller and every element to its right is larger; let i be the pivot's resulting index.
If i < 0.9p, the 0.9p-th number is found recursively in the sub-list [i+1, p-1]; if i > 0.9p, the 0.9p-th number is found recursively in the sub-list [0, i-1]. When i-1 ≤ 0 or i+1 ≥ p-1, the recursion exits and returns the first element of the sub-list. The element determined by this recursive sub-list query is TP90.
The data positioned after TP90 in list L are sorted in ascending order to obtain a list L1, and TP95 and TP99 are located by their position relative to TP90: TP95 is the (0.95p - i_dx)-th datum in list L1 and TP99 is the (0.99p - i_dx)-th datum in list L1, where i_dx denotes the sequence number of TP90 in list L.
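The recursive sub-list query and the tail-based TP95/TP99 lookup can be sketched as follows. This is a sketch: the recursive query is realized as an iterative quickselect, and the tail offsets follow the reconstructed index formulas 0.95p - i_dx and 0.99p - i_dx, which are a reading of the source's proportional relation rather than a verbatim formula.

```python
import random

def quickselect(data, k):
    """Return the element of 0-based rank k, partially ordering data in place."""
    lo, hi = 0, len(data) - 1
    while True:
        if lo >= hi:
            return data[lo]
        pivot = data[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                       # Hoare-style partition around pivot
            while data[i] < pivot:
                i += 1
            while data[j] > pivot:
                j -= 1
            if i <= j:
                data[i], data[j] = data[j], data[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            return data[k]                  # k lies in the pivot-equal band

def percentiles(perf):
    """TP90 by sub-list (quickselect) query; TP95/TP99 from the sorted tail."""
    p = len(perf)
    data = list(perf)
    idx = 9 * p // 10                       # i_dx: rank of TP90 (0.9 * p)
    tp90 = quickselect(data, idx)
    tail = sorted(data[idx + 1:])           # list L1: data after TP90
    def at(rank):                           # offset rank - i_dx - 1 into L1
        j = min(max(rank - idx - 1, 0), len(tail) - 1)
        return tail[j] if tail else tp90
    return tp90, at(19 * p // 20), at(99 * p // 100)   # TP90, TP95, TP99
```

Only the tail above TP90 is fully sorted, which is the point of finding the smallest percentile first.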
When the access amount p ≥ 10000, many logs remain to be processed. In this case a performance data summary object is first created, with a default (and adjustable) compression ratio c = 100; the request performance data of all logs in the current period are then added to the summary object with a default weight of 1, after which the summary object can be queried to approximate the estimated performance percentile values.
Creating the summary object comprises the following steps:
When the performance datum d of a log entry is added to the summary object, the weight of d is first added to the summary object's total weight. The cluster whose mean is closest to d among the existing clusters is then sought. For each cluster, pt = (the total weight of all preceding clusters + half the weight of the current cluster) / the total weight, and the cluster's weight upper limit is wl = 4 × c × pt × (1 - pt), where c is the compression ratio. If no suitable cluster is found, a new cluster is generated from the datum to be added and the total cluster count increases; if a cluster is found and its weight < wl, the cluster has room and its index is returned; otherwise a new cluster is likewise generated.
If multiple nearest clusters are found, there are four cases: if the means of all of them are smaller than d, the highest-numbered cluster is taken and its index returned if it has room; if the means of all of them are larger than d, the lowest-numbered cluster is taken and its index returned if it has room; if the means of all of them are equal to d, a cluster with room is picked at random and its index returned; if only some of the means are larger than d, the two clusters whose means bracket d are found and the index of one of them that has room is returned at random. If none of them has room, a new cluster is generated. Each time a new cluster is generated, its position is determined by the mean of its elements, so that the clusters remain ordered by mean from smallest to largest. If no new cluster needs to be added, d is added to the found cluster: if the cluster's weight plus the weight of d does not exceed wl, the weight of d is simply accumulated into the cluster's weight, and the cluster mean becomes the original mean + weight of d × (d - original mean) / the new cluster weight; otherwise, the cluster is filled to capacity and its mean after adding the element is computed using the remaining capacity.
After the performance data summary object has been created by the above process, the percentile values of the performance are calculated from it. For example, to calculate TP90, let tp = 0.9 and first compute q = tp × total weight. Then find the i-th cluster, counting from the beginning, such that the weight sum of the first i-1 clusters plus half the weight of the i-th cluster > q, while the corresponding sum for the (i-1)-th cluster does not satisfy the condition. Next compute the slope s = (mean of the i-th cluster - mean of the previous cluster) / (half the sum of the weights of the two clusters), and finally take the mean of the i-th cluster + s × (q - (the weight sum of all preceding clusters + half the weight of the i-th cluster)) as the required percentile value.
When the URL logs are compressed, step S104 above is executed similarly. If nc = 1: when url_pv < 10000, each performance percentile value, the standard deviation, and the average and maximum performance are calculated directly from the response performance list; when url_pv ≥ 10000, each percentile performance is estimated with the summary object described above. If nc > 1: when url_pv ≥ 10000/nc, the summary object of each Nginx instance's response performance is computed as in step S104, and the fields accumulated in step S103 are reassembled and sent to Kafka; when url_pv < 10000/nc, the accumulated indexes are sent to Kafka directly. When the standard deviation is required, url_pv, the performance data sum of the current instance, and the IP of the Nginx instance are sent to Kafka; the data of all instances of the project are then accumulated, the average is calculated and returned to the clients through a queue together with the total amount of performance data, and each instance calculates its variance contribution, which is accumulated and square-rooted to obtain the standard deviation;
S105, the compression processing history is updated, recording the current file and the number of lines completed.
S106, all logs of the next compression section are processed according to steps S101-S105, until all logs have been processed.
Nginx access logs generated in real time can also be compressed directly in memory and then sent to the Kafka cluster; the main aggregation logic is the same as the compression process above. This requires secondary development of the Nginx module ngx_http_log_module, recompiling Nginx with the updated module, and updating the Nginx actually in use for the change to take effect; whether real-time log aggregation is enabled can be controlled through the Nginx configuration. This scheme avoids unnecessary I/O during log aggregation and offers better real-time performance.
2. Compression processing for projects with multiple Nginx instances.
When a project has multiple Nginx instances, the compression results of all of its instances are processed further to generate and save the final records, as described below. This part may run as part of the client in a different operating mode controlled by a configuration file, or the client may send the input data to the server through RPC or another protocol instead of Kafka, with the server performing the compression and storage; such variations remain within the scope of this patent. The process is as follows:
S201, the aggregation duration T of the current project, the list of URLs to analyze, and the total number nc of Nginx instances of the project are read from the service interface.
S202, the compression records of the individual Nginx instances are read from the Kafka queue. Grouped by project and compression time, the access amount, performance distribution, 5XX and 4XX error counts, error distribution, and input and output traffic of each URL are accumulated; the average and maximum performance of each URL are calculated, the standard deviation is calculated in the cooperative manner described above, and the percentile performances of each URL are calculated following the compression process for the function dimension of a single Nginx instance. The fields are then assembled into records according to the composition design of the function log compression history and submitted to the server for storage.
S203, data are read from the Kafka queue that temporarily stores the Nginx instance compression results. For the compression records of each Nginx instance with the same project number, the values of each index are accumulated, including the access amount, performance distribution, the error count and error distribution of each error type, the access problem distribution, and the input and output traffic; all records with the same start time are merged into one record. The maximum performance of each Nginx instance is compared, and the largest value, together with its related URL and occurrence time, is taken as the project's maximum performance, occurrence time, and related URL for the corresponding compression section;
The performance percentile values of projects with multiple Nginx instances are calculated as follows. When nc > 1, if the merged access amount p ≥ 10000, the Kafka queue carrying the data summary objects is read and all summary objects are merged: the cluster order of one summary object is first randomly shuffled, and its clusters are then added one by one to the other summary object. When a cluster (with weight taken as c) is added, the weight is increased stepwise up to an adjustable number of times to find the maximum weight a that can be added without exceeding the limit; if a > c, the cluster is finally added with weight c, otherwise the remaining weight c - a is added afterwards to prevent exceeding the limit. After the summary objects of the Nginx instances for a compression period have been merged, each performance percentile value is calculated from the merged summary object as described in the compression process of a single Nginx instance. If p < 10000, the Kafka queue used for calculating the overall performance and error counts is read and parsed, and the percentile performance values are calculated exactly.
S204, the fields are assembled according to the composition design of the project log compression history and submitted to the server for storage.
Further, this embodiment also supports adaptive recommendation of the compression duration for Nginx instances, specifically comprising the following steps:
S301, the compression ratio r of a single Nginx instance is calculated from the number ns of Nginx instances and the overall service compression ratio c: r = c/ns;
S302, the shortest duration M, in milliseconds, containing at least r log records is determined;
S303, z = M × N / (24 × 3600 × 1000) is calculated, where N is the total number of periods of length M in the same day whose log record count is not less than r; when z exceeds a set value, T = M is uploaded and stored, otherwise the process proceeds to step S304;
S304, let M = 1.1M, rounded up, and repeat step S303.
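Steps S301-S304 can be sketched as follows. The reading of the z formula as z = M·N / (24·3600·1000), i.e. the fraction of the day covered by length-M periods holding at least r logs, is an assumption where the source expression was garbled, as are the 0.5 threshold and the function name.

```python
import math

def recommend_T(timestamps_ms, ns, c=100, z_threshold=0.5):
    """Adaptive recommendation of the compression duration T (steps S301-S304).

    timestamps_ms: sorted log timestamps of one day, in milliseconds.
    """
    DAY_MS = 24 * 3600 * 1000
    r = c / ns                                   # S301: per-instance ratio
    need = math.ceil(r)
    if len(timestamps_ms) < need:                # too few logs: use the whole day
        return DAY_MS
    # S302: shortest window containing at least r logs (sliding scan)
    M = DAY_MS
    for i in range(len(timestamps_ms) - need + 1):
        M = min(M, timestamps_ms[i + need - 1] - timestamps_ms[i] + 1)
    while True:
        # S303: N = number of length-M periods of the day with >= r logs
        N = sum(
            1
            for start in range(0, DAY_MS, M)
            if sum(start <= t < start + M for t in timestamps_ms) >= r
        )
        z = M * N / DAY_MS
        if z > z_threshold:
            return M                             # take T = M
        M = math.ceil(1.1 * M)                   # S304: grow M and retry
```

The loop always terminates: once M reaches a full day, the single remaining period holds all of the day's logs.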
Because the number of result records after compression is greatly reduced compared with the original logs (by default 100 original logs are compressed into one, and this is adjustable), one ES cluster can store the compression results of the Nginx logs of many projects. Based on the compressed historical data, all indexes can be comprehensively analyzed and compared across three dimensions: the project and the product it belongs to, the project functions, and important events. Query and analysis performance is therefore better and storage occupation is reduced; with this purpose-built compression result design, the original logs no longer need to be consulted.
Trends of indexes such as the access amount, traffic, and average and maximum performance can be analyzed quickly from the compression history through Kibana/Grafana, meeting the various analysis requirements placed on raw Nginx logs in practice. The analysis provided further includes, but is not limited to: (1) organizing a series of specific analysis charts into a dashboard that asynchronously retrieves the trends over time of indexes such as the access amount, selected performance metrics, or/and errors of projects and functions, and displays the details of important events and of the maximum performance; (2) table-style rankings that compare the error count, error rate, slow-request count, and slow-request ratio of different functions across similar projects or technical schemes, to find which main projects and functions have room for optimization; (3) computing and combining, via Painless scripts during analysis, the trends of each error type and of the access amount and 4XX error count of each client type; (4) comparing the indexes of different projects, and the trends of the same project at different times, based on Kibana's time features; (5) using the Watcher mechanism of the ELK stack to trigger real-time alerts through mail and other channels when any index or trend deteriorates beyond a limit; (6) analyzing the detailed data of important events such as errors, performance, and security, and the specific changes of indexes such as the access amount, error count, traffic, and performance of related systems around the time an event occurred.
Access logs of K8s Ingress and Apache HTTP Server can also be compressed with this scheme. Compared with processing the raw logs directly with ELK, similar benefits are obtained: more logs are handled with less ES storage while more, and more intuitive, analyses are provided.
Through the three-dimensional design of projects, functions, and important events, this embodiment surfaces the value of every aspect of the Nginx logs, and differences between individual fields of the compression results do not cause record expansion. The three parts of data can conveniently be retrieved in parallel on a UI in the form of statistical charts, tables, Metrics, and the like, and assembled into a periodically refreshed display interface showing the changes of the main indexes, correlated effects, and key analysis results; viewing the related details is supported, and any granularity larger than the compression duration can be selected as needed during analysis.
An embodiment of the present invention also provides an Nginx log compression analysis system, comprising a server and a compression end. The server stores the Nginx instances to be compressed and analyzed, and the compression end implements the relevant steps of the Nginx log compression analysis method provided in the foregoing embodiments.
As shown in fig. 4, an embodiment of the present invention provides an Nginx log compression analysis device, comprising: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
In this embodiment of the application, there is at least one each of the processor 1, communication interface 2, memory 3, and communication bus 4, and the processor 1, communication interface 2, and memory 3 communicate with one another through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement embodiments of the present invention, or the like;
the memory 3 may comprise high-speed RAM and may further comprise non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program and the processor may invoke the program stored in the memory, the program being operable to implement the steps of the Nginx log compression analysis method described in the foregoing embodiments.
Embodiments of the present invention also provide a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the Nginx log compression analysis method provided in the foregoing embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (11)
1. An Nginx log compression analysis method, comprising:
reading an access log file under a single Nginx instance;
determining a compression section according to the access time and a preset compression duration, and compressing the access log files within a single compression section into a first compression record, wherein the first compression record comprises at least one first field and a corresponding first attribute value;
compressing the access log file of the same access URL into a second compressed record, wherein the second compressed record comprises at least one second field and a corresponding second attribute value;
the first compressed record and the second compressed record are sent to a database.
2. The Nginx log compression analysis method of claim 1, wherein the method further comprises:
And compressing access log files corresponding to the same event into event compression records according to a preset event type, wherein the preset event type comprises at least one of: a log status code of 500 or above, a response to the log's request whose time consumption exceeds a set threshold, and abnormal content in the request or response.
3. The Nginx log compression analysis method of claim 1, wherein determining the compression section according to the access time and the preset compression duration comprises:
according to the access time of the log file parsed in the Nginx log format, letting u = access time L / preset compression duration T, rounded down; the interval [u·T, (u+1)·T) is a compression section, and the log files whose access time falls within [u·T, (u+1)·T) are compressed into a first compression record.
4. The Nginx log compression analysis method of claim 1, wherein the generating of the first compressed record includes:
counting and/or querying the access log files in the compression section according to the first field to obtain a corresponding first attribute value,
and/or
parsing the log files, and counting and/or querying the access log files in the compression section according to the first field and the log variables to obtain a corresponding first attribute value;
wherein the at least one first field and the corresponding first attribute value form the first compression record.
5. The Nginx log compression analysis method of claim 1, wherein the generating of the second compressed record includes:
counting and/or inquiring the access log files of the same access URL according to the second field to obtain a corresponding second attribute value;
wherein the at least one second field and the corresponding second attribute value form the second compression record.
6. The Nginx log compression analysis method of claim 1 or 4, wherein the first field comprises at least the performance values TP90, TP95, and TP99, and when the access amount p in a single compression section is less than 10000, the calculation of TP90 comprises:
generating a first performance data list, wherein the list comprises performance data corresponding to each log file;
randomly selecting a pivot, whose resulting sequence number is i, such that it is larger than any time-consumption datum between sequence numbers [1, i-1] and smaller than any time-consumption datum between sequence numbers [i+1, p], through the following sub-list recursive query process:
if i < 0.9p, recursively finding the 0.9p-th number in the sub-list [i+1, p-1]; if i > 0.9p, recursively finding the 0.9p-th number in the sub-list [0, i-1]; when i-1 ≤ 0 or i+1 ≥ p-1, exiting the recursion and returning the first element of the recursion sub-list, the element determined by this recursive sub-list query process being TP90;
sorting the data arranged after TP90 in the first performance data list in ascending order to obtain a second performance data list;
determining TP95 and TP99 in the second performance data list from the position of TP90's sequence number, using the proportional relation of TP95 and TP99 to TP90 on sequence numbers; specifically, TP95 is the (0.95p - i_dx)-th datum in the second performance data list and TP99 is the (0.99p - i_dx)-th datum in the second performance data list, where i_dx represents the sequence number of TP90 in the first performance data list.
7. The Nginx log compression analysis method of claim 1, wherein, when a single project has at least two Nginx instances, the method further comprises:
reading a first compression record and a second compression record of each Nginx instance;
compressing the first compressed record into a third compressed record;
compressing the second compressed record of the same access URL into a fourth compressed record;
and sending the third compressed record and the fourth compressed record to a database.
8. The Nginx log compression analysis method of claim 1 or 7, further comprising adaptively recommending the preset compression duration T, specifically comprising:
s1, calculating the compression ratio r of a single Nginx instance according to the number ns of the Nginx instances and the overall service compression ratio c, wherein r=c/ns;
S2, determining the shortest duration M, in milliseconds, containing at least r log records;
step S3, calculating z = M × N / (24 × 3600 × 1000), wherein N represents the total number of periods of length M in the same day whose log record count is not less than r; when z exceeds a set value, taking T = M and uploading and storing it, otherwise entering step S4;
step S4, letting M = 1.1M, rounded up, and repeating step S3.
9. An Nginx log compression analysis system, comprising a server and a compression end, wherein the Nginx instances to be compression-analyzed are stored on the server, and the compression end is used to implement the Nginx log compression analysis method according to any one of claims 1-8.
10. An Nginx log compression analysis device, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the Nginx log compression analysis method according to any one of claims 1 to 8.
11. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the Nginx log compression analysis method of any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310085553.0A CN116340274A (en) | 2023-01-18 | 2023-01-18 | Nginx log compression analysis method, device and readable storage medium |
PCT/CN2023/134763 WO2024152746A1 (en) | 2023-01-18 | 2023-11-28 | Nginx log compression and analysis method and device, and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310085553.0A CN116340274A (en) | 2023-01-18 | 2023-01-18 | Nginx log compression analysis method, device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116340274A true CN116340274A (en) | 2023-06-27 |
Family
ID=86893751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310085553.0A Pending CN116340274A (en) | 2023-01-18 | 2023-01-18 | Nginx log compression analysis method, device and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116340274A (en) |
WO (1) | WO2024152746A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312258A (en) * | 2023-09-18 | 2023-12-29 | 赛力斯汽车有限公司 | Log monitoring and compressing method, device, computer equipment and storage medium |
WO2024152746A1 (en) * | 2023-01-18 | 2024-07-25 | 天翼数字生活科技有限公司 | Nginx log compression and analysis method and device, and readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106936439A (en) * | 2016-09-20 | 2017-07-07 | 南开大学 | It is a kind of general based on the compression preprocess method of block sorting thought and application |
US10545964B2 (en) * | 2017-01-30 | 2020-01-28 | Splunk Inc. | Multi-phased data execution in a data processing system |
CN111651417B (en) * | 2020-07-09 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Log processing method and device |
CN113312376B (en) * | 2021-05-21 | 2022-10-21 | 福建天泉教育科技有限公司 | Method and terminal for real-time processing and analysis of Nginx logs |
CN116340274A (en) * | 2023-01-18 | 2023-06-27 | 天翼数字生活科技有限公司 | Nginx log compression analysis method, device and readable storage medium |
2023
- 2023-01-18 CN CN202310085553.0A patent/CN116340274A/en active Pending
- 2023-11-28 WO PCT/CN2023/134763 patent/WO2024152746A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2024152746A1 (en) | 2024-07-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||