Disclosure of Invention
When the information systems of a large enterprise cooperate with one another, interfaces that are unreported or carry no service interaction occupy and waste resources and are difficult to monitor and manage intelligently. To solve these problems, an interface monitoring method based on Java bytecode embedding technology is provided.
To achieve this purpose, the invention adopts the following technical solution:
an interface monitoring method based on Java bytecode embedding technology, comprising the following steps:
step 1, collecting and analyzing an operation log and acquiring an access relation of an information system;
step 2, mirroring the network key nodes, analyzing and counting all communications in the data packet, and determining the data forwarding relation and the flow load of the key routing nodes;
step 3, constructing a machine learning data source, training a neural network model, determining parameters of the machine learning model, and performing business modeling according to data types;
step 4, intelligently monitoring the interface state between systems, and constructing a global topological structure of the data stream;
step 5, monitoring a data transmission link channel;
step 6, analyzing the original data of the service model, acquiring early warning threshold values of various services and classifying abnormal alarms;
step 7, collecting link layer data and diagnosing faults.
Further, in step 1, the operation logs are collected and analyzed and the access relationships of the information systems are obtained as follows: a collection program collects the log files of each interface, and the log files are classified and sorted by request type to obtain key information; the relationships among the items of key information in the log files are then summarized and analyzed, and the result is drawn and stored using a Neo4j database, which facilitates visualization and the construction of a multivariate information graph.
The key information in the log file comprises an information source, a message request object, request time and request parameters.
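As an illustration of step 1, the following is a minimal sketch of extracting the key information from log lines and turning the resulting access relationships into Cypher MERGE statements that could be loaded into Neo4j. The log line layout, the field names, the `System` label and the `CALLS` relationship type are all assumptions made for this example; the source does not specify them.

```python
import re
from collections import Counter

# Hypothetical log line layout (an assumption, not taken from the source):
#   <time> src=<system> dst=<system> type=<request type> params=<free text>
LOG_PATTERN = re.compile(
    r"^(?P<time>\S+) src=(?P<src>\S+) dst=(?P<dst>\S+) "
    r"type=(?P<type>\S+) params=(?P<params>.*)$"
)


def parse_line(line):
    """Extract the key information from one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def access_relations(lines):
    """Count how often each (source, target, request type) relation occurs."""
    relations = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            relations[(record["src"], record["dst"], record["type"])] += 1
    return relations


def to_cypher(relations):
    """Render each relation as a Cypher MERGE statement (label and type names are hypothetical)."""
    statements = []
    for (src, dst, req_type), count in sorted(relations.items()):
        statements.append(
            f"MERGE (a:System {{name: '{src}'}}) "
            f"MERGE (b:System {{name: '{dst}'}}) "
            f"MERGE (a)-[r:CALLS {{type: '{req_type}'}}]->(b) "
            f"SET r.count = {count}"
        )
    return statements


sample_log = [
    "2023-01-01T08:00:00 src=marketing dst=billing type=query params=id=1",
    "2023-01-01T08:00:05 src=marketing dst=billing type=query params=id=2",
    "2023-01-01T08:00:09 src=billing dst=archive type=write params=batch=7",
    "malformed line that is skipped",
]
relations = access_relations(sample_log)
cypher_statements = to_cypher(relations)
```

The statements are built by string interpolation only to keep the sketch short; a real loader would normally pass the values as query parameters.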
Further, in step 2, the key network nodes are mirrored and all communications in the data packets are analyzed and counted as follows: the routing nodes used for information transmission in the whole network are calculated using routing technology, the data flow and forwarding load of each node are analyzed, and the criticality of each node is judged; all communication pairs in the data packets are statistically analyzed by means of network packet capture to obtain a least-squares fitted traffic chart for each node. Through continuously repeated statistical analysis, the stable forwarding relationships between the key routing nodes and the other nodes, together with their traffic loads, are finally obtained.
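The statistics in step 2 can be sketched as follows, assuming that packet capture has already been reduced to tuples of (source, destination, second, bytes). The tuple layout, the node names and the use of total bytes as the measure of criticality are assumptions made for illustration only.

```python
from collections import defaultdict


def pair_traffic(packets):
    """Sum the bytes exchanged by each (source, destination) communication pair."""
    totals = defaultdict(int)
    for src, dst, _second, size in packets:
        totals[(src, dst)] += size
    return dict(totals)


def forwarding_load(packets):
    """Sum the bytes each node sends or receives, as a rough measure of criticality."""
    load = defaultdict(int)
    for src, dst, _second, size in packets:
        load[src] += size
        load[dst] += size
    return dict(load)


def least_squares_line(points):
    """Fit y = slope * x + intercept to (x, y) points by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical captured packets: (source, destination, second, bytes).
packets = [
    ("A", "R1", 0, 100), ("R1", "B", 0, 100),
    ("A", "R1", 1, 200), ("R1", "B", 1, 200),
    ("C", "R1", 2, 300), ("R1", "B", 2, 300),
]
pairs = pair_traffic(packets)
load = forwarding_load(packets)
key_node = max(load, key=load.get)

# Traffic through the key node per second, followed by the fitted trend line.
per_second = defaultdict(int)
for src, dst, second, size in packets:
    if key_node in (src, dst):
        per_second[second] += size
slope, intercept = least_squares_line(sorted(per_second.items()))
```

In this toy capture the node `R1` carries the most traffic, and the fitted line describes how its load grows over the three observed seconds.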
Further, in step 3, the data are counted and analyzed and business modeling is performed according to data type as follows: the data obtained in step 1 and step 2 are used as the data source and divided in the ratio 7:2:1 into data source A (7/10), data source B (2/10) and data source C (1/10); data source A is fed as training data into a machine learning framework for training, and the Huber loss function $L_\delta(y, f(x))$ is calculated during training,

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}\,(y - f(x))^2, & |y - f(x)| \le \delta \\ \delta\,|y - f(x)| - \frac{1}{2}\,\delta^2, & |y - f(x)| > \delta \end{cases}$$

where δ is the hyperparameter of the Huber loss, y is the true value, and f(x) is the value predicted by the model; after training is finished, parameters of the processing framework and of the machine learning network that essentially fit the data source are obtained; data source B is used for testing and data source C for validation, so as to ensure the correctness of the learned parameters; the data are then processed further using the Apriori algorithm from big data analysis to find the internal relationships among the data, and business modeling is performed according to the requested service type, so as to realize automatic discovery of the interfaces between information systems and automatic generation of the data transmission topology between systems.
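The 7:2:1 split and the loss calculation of step 3 can be sketched as follows. The sketch uses the standard definition of the Huber loss (quadratic for residuals up to δ, linear beyond it); the shuffling seed, the function names and the sample values are assumptions made for illustration, and no particular machine learning framework is implied.

```python
import random


def split_7_2_1(samples, seed=0):
    """Shuffle the samples and split them into parts A, B and C in the ratio 7:2:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    end_a = n * 7 // 10
    end_b = n * 9 // 10
    return shuffled[:end_a], shuffled[end_a:end_b], shuffled[end_b:]


def huber_loss(y, prediction, delta=1.0):
    """Huber loss for one sample: quadratic for small residuals, linear for large ones."""
    residual = abs(y - prediction)
    if residual <= delta:
        return 0.5 * residual ** 2
    return delta * residual - 0.5 * delta ** 2


def mean_huber_loss(targets, predictions, delta=1.0):
    """Average Huber loss over a batch of (true value, predicted value) pairs."""
    losses = [huber_loss(y, p, delta) for y, p in zip(targets, predictions)]
    return sum(losses) / len(losses)


source_a, source_b, source_c = split_7_2_1(range(100))
small = huber_loss(3.0, 2.5, delta=1.0)  # residual 0.5, quadratic branch
large = huber_loss(3.0, 0.0, delta=1.0)  # residual 3.0, linear branch
batch = mean_huber_loss([3.0, 3.0], [2.5, 0.0], delta=1.0)
```

Because the loss grows only linearly for large residuals, isolated outliers in the traffic data influence training less than they would under a squared-error loss.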
Further, in step 4, the interface state between systems is intelligently monitored and the global topology of the data flow is constructed as follows: according to the information acquired in steps 1 to 3, Java agents are deployed on the servers using bytecode embedding technology, and the JAR packages called by the interfaces between systems are introduced into a monitoring instance in the JVM (Java virtual machine); when class files are loaded into the JVM, bytecode manipulation is used to dynamically monitor, in real time, method execution, SQL/NoSQL access and API (application program interface) calls between applications; a global topology of the data flow direction is constructed, and accurate code-level problem location is realized on the basis of this topology; the location information is stored by hash and updated in real time, and the information in all interfaces is analyzed as a whole to obtain the detailed parameters of the interface calls between systems;
the information in all interfaces comprises URL paths, request modes, occurrence time, application versions, network connection states, interface request bodies, HTTP headers, IP addresses, error return bodies, and devices and systems.
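The hash storage, real-time update and topology construction of step 4 can be sketched as follows. The bytecode instrumentation itself is not shown: the events are hypothetical stand-ins for what a Java agent would report, and the event fields, the SHA-256 key and the class name are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def record_key(url_path, request_mode):
    """Stable hash key for one interface, derived from its request mode and URL path."""
    raw = f"{request_mode.upper()} {url_path}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class InterfaceStore:
    """Hash-keyed store holding the latest parameters per interface and the call topology."""

    def __init__(self):
        self.records = {}                 # hash key -> latest interface parameters
        self.topology = defaultdict(set)  # caller system -> set of callee systems

    def update(self, event):
        """Insert or overwrite the record for the interface named in the event."""
        key = record_key(event["url_path"], event["request_mode"])
        self.records[key] = event
        self.topology[event["caller"]].add(event["callee"])
        return key


# Hypothetical events, standing in for what a Java agent would report.
store = InterfaceStore()
first_key = store.update({
    "caller": "marketing", "callee": "billing",
    "url_path": "/api/invoice", "request_mode": "get",
    "occurred_at": "2023-01-01T08:00:00", "status": 200,
})
second_key = store.update({
    "caller": "marketing", "callee": "billing",
    "url_path": "/api/invoice", "request_mode": "GET",
    "occurred_at": "2023-01-01T08:00:30", "status": 500,
})
store.update({
    "caller": "billing", "callee": "archive",
    "url_path": "/api/archive", "request_mode": "POST",
    "occurred_at": "2023-01-01T08:01:00", "status": 200,
})
```

Two calls to the same interface map to the same key, so the later event replaces the earlier one, while the topology accumulates every caller-to-callee edge that has been seen.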
Further, in step 5, the data transmission link channel is monitored as follows: the whole data transmission link channel is monitored by capturing full data packets and decoding the seven-layer protocol, and the decoded content is counted and displayed; the topology and interface information obtained in steps 1 to 4 are combined to realize a visual display of the full-link relationships of the data.
The decoded content comprises the total number of OGGs, the number of OGGs corresponding to each service system, the task type, running state and elapsed time of the OGG channel corresponding to each service system, and the number of topics, running state, elapsed time and number of sessions corresponding to the message channel process.
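The counting of decoded content in step 5 can be sketched as a simple aggregation. The record layout, the task and state names and the millisecond unit are assumptions made for illustration; the source does not define the decoded format.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Aggregate decoded channel records into the counts a dashboard would display."""
    channels_by_system = defaultdict(set)
    elapsed_by_system = defaultdict(list)
    state_counts = Counter()
    for record in records:
        channels_by_system[record["system"]].add(record["channel"])
        elapsed_by_system[record["system"]].append(record["elapsed_ms"])
        state_counts[record["state"]] += 1  # counts records, one per channel here
    return {
        "total_channels": sum(len(c) for c in channels_by_system.values()),
        "channels_per_system": {s: len(c) for s, c in channels_by_system.items()},
        "state_counts": dict(state_counts),
        "mean_elapsed_ms": {s: sum(v) / len(v) for s, v in elapsed_by_system.items()},
    }


# Hypothetical decoded records, one per OGG channel.
decoded = [
    {"system": "marketing", "channel": "ogg-01", "task": "sync",
     "state": "RUNNING", "elapsed_ms": 120},
    {"system": "marketing", "channel": "ogg-02", "task": "load",
     "state": "RUNNING", "elapsed_ms": 80},
    {"system": "billing", "channel": "ogg-03", "task": "sync",
     "state": "STOPPED", "elapsed_ms": 0},
]
summary = summarize(decoded)
```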
Further, in step 6, the original data of the service model are analyzed, the early warning thresholds of the various services are obtained and abnormal alarms are classified as follows: the monitoring information is processed using mathematical statistics and probability analysis, the occurrence probability of each type of event is calculated, and the distribution of event occurrences is analyzed; the early warning threshold monitored for each interface is determined according to the distribution type and distribution characteristics, wherein the thresholds are generated from the specific timeliness and integrity requirements of each interface's data transmission, different requests of different information systems have different threshold settings, and the thresholds are iterated and changed continuously as different software is configured for its environment;
the early warning threshold is the average of the historical data generated by an application system at the same time of day on different dates; it serves as the standard for judging whether a request generated by the system in real time is abnormal, reflecting whether an index at the current time falls within the range of its historical average; if the fluctuation is large, an alarm is given to remind the user that the request under the current application is abnormal relative to its history; the various abnormal alarms are classified according to the early warning threshold;
the alarm trigger consists of two parts: the first part is the statistical result derived from the early warning thresholds, comprising a response time baseline, an access volume baseline and an error rate baseline; the second part is an isolation forest algorithm, which intelligently detects abnormal points; the alarms are classified at the same time, and the root-cause alarm result is output through a short message gateway or a mailbox interface; meanwhile, faults are diagnosed along the data link, at a granularity that can reach the communication-pair level and the code level.
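The baseline part of the alarm trigger in step 6 can be sketched as follows. The historical average per time slot follows the description above; the rule that a deviation of more than k standard deviations counts as a large fluctuation, the value k = 3 and all metric names and sample values are assumptions made for this example.

```python
from statistics import mean, pstdev


def build_baseline(history):
    """Per time slot, return (mean, standard deviation) of the values seen on earlier dates."""
    return {slot: (mean(values), pstdev(values)) for slot, values in history.items()}


def breaks_baseline(baseline, slot, value, k=3.0):
    """True when the value deviates from the slot's historical mean by more than k deviations."""
    average, spread = baseline[slot]
    return abs(value - average) > k * spread


def classify_alarms(baselines, slot, observation, k=3.0):
    """Return the names of the metrics whose current value breaks its baseline."""
    return sorted(
        metric for metric, value in observation.items()
        if breaks_baseline(baselines[metric], slot, value, k)
    )


# Hypothetical history: metric -> time slot -> values observed on different dates.
history = {
    "response_time_ms": {"08:00": [100, 110, 90, 100]},
    "access_count": {"08:00": [1000, 1020, 980, 1000]},
    "error_rate": {"08:00": [0.01, 0.02, 0.01, 0.02]},
}
baselines = {metric: build_baseline(slots) for metric, slots in history.items()}
alarms = classify_alarms(
    baselines, "08:00",
    {"response_time_ms": 400, "access_count": 1005, "error_rate": 0.015},
)
```

Here only the response time leaves its historical range, so the alarm is classified under the response time baseline while the access volume and error rate remain normal.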
Further, in step 7, link layer data are collected and faults are diagnosed as follows: the full data link is monitored, an alarm result is output according to the service thresholds of step 6, and the data are packaged and sent using byte encoding; the key nodes named in the alarm result are examined first; data flow information of the data link layer in the network is collected using the traffic bandwidth and flow rate of the data, and is classified and summarized; the correlation coefficients are calculated to obtain the degree of correlation, the variables are ordered by correlation, and the correlation matrix is formed; the fault position is then located by combining a decision tree model with a hash algorithm, thereby achieving fault diagnosis.
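The correlation analysis in step 7 can be sketched as follows, assuming that the Pearson coefficient is the correlation coefficient intended; the source does not name a specific coefficient. The variable names and sample values are hypothetical, and the decision tree and hash lookup that follow in the method are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Return the sorted variable names and the matrix of pairwise correlations."""
    names = sorted(series)
    matrix = [[pearson(series[a], series[b]) for b in names] for a in names]
    return names, matrix


def rank_by_correlation(series, target):
    """Order the other variables by absolute correlation with the target, strongest first."""
    others = [name for name in sorted(series) if name != target]
    return sorted(others, key=lambda name: -abs(pearson(series[target], series[name])))


# Hypothetical link layer measurements taken at five sampling points.
series = {
    "bandwidth": [10, 20, 30, 40, 50],
    "flow_rate": [1, 2, 3, 4, 5],
    "retransmits": [2, 1, 4, 3, 5],
}
names, matrix = correlation_matrix(series)
ranking = rank_by_correlation(series, "bandwidth")
```

The ranked variables and the matrix are the inputs that a subsequent classifier, such as the decision tree named in the method, could use to narrow down the fault position.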
Compared with the prior art, the invention has the following advantages:
existing intelligent supervision technology can monitor only the service layer of a system and rarely addresses accurate location at the interface layer. The present technology achieves code-level monitoring, improves monitoring quality, and makes it easier for operators to adjust and repair the system in real time. Intelligent scoring of applications and interfaces is also supported: the scoring mechanism can be defined by the user, who selects key indicators and sets a score and a weight for each; the key indicators must include the following eight: response time, interface slow rate, interface error rate, Apdex value, number of interface failures, number of interface program exceptions, execution time of interface program methods, and number of slow SQL calls of the interface. By summarizing the health state of the interfaces, operation and maintenance managers at every level can quickly perceive the operating health of each interface. The content of the interface health report can be customized with indicators such as service request volume, response time, error rate, slow rate, minute-level service data and a key transaction data list to form an automatic template, which is sent automatically as monthly, weekly and daily reports. The method is based on technologies such as machine learning, artificial intelligence and big data; through dynamic data sensing, intelligent monitoring and intelligent health analysis, it improves the quality and effect of data operation and maintenance, improves the quality of shared data, and promotes the transformation of the operation and maintenance mode.
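The user-defined scoring mechanism can be sketched as a weighted average over the eight indicators. The indicator scores, the weights and the 500 ms threshold are hypothetical values chosen for illustration, and the Apdex value is computed with the commonly used formula (satisfied + tolerating / 2) / total, with tolerating requests lying between T and 4T; the source does not state which formula it uses.

```python
def apdex(response_times_ms, threshold_ms):
    """Apdex value: satisfied requests count fully, tolerating requests count half."""
    satisfied = sum(1 for t in response_times_ms if t <= threshold_ms)
    tolerating = sum(
        1 for t in response_times_ms if threshold_ms < t <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(response_times_ms)


def health_score(indicator_scores, weights):
    """Weighted average of per-indicator scores (0 to 100); weights need not sum to 1."""
    total_weight = sum(weights[name] for name in indicator_scores)
    weighted = sum(indicator_scores[name] * weights[name] for name in indicator_scores)
    return weighted / total_weight


apdex_value = apdex([100, 200, 600, 2500], threshold_ms=500)

# Hypothetical per-indicator scores and user-chosen weights.
scores = {
    "response_time": 90, "slow_rate": 80, "error_rate": 100,
    "apdex": 100 * apdex_value, "failure_count": 100,
    "exception_count": 70, "method_execution_time": 85, "slow_sql_count": 60,
}
weights = {
    "response_time": 2, "slow_rate": 1, "error_rate": 2, "apdex": 1,
    "failure_count": 1, "exception_count": 1, "method_execution_time": 1,
    "slow_sql_count": 1,
}
interface_score = health_score(scores, weights)
```

Normalizing by the total weight lets users change individual weights without rescaling the others.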
Theoretical and technical support is provided for automatic combing and monitoring of data interfaces, automatic topology discovery, intelligent baseline alarming and transmission quality monitoring; a panoramic visual display of the data flow of the information systems is formed, the manual workload of combing data links is reduced, and timely and complete data flow is ensured; data transmission faults are located accurately, potential safety hazards are eliminated promptly and efficiently, and the robustness of the information systems is enhanced.
Technically, a Java bytecode class file (.class) is the "target file" generated when a Java compiler compiles a Java source file (.java). The class file is a binary stream of 8-bit bytes in which the data items are packed one after another in a fixed order with no gaps between adjacent items; the class file is therefore compact and small, can be loaded quickly into memory by the JVM, occupies little memory space, and is convenient to transmit over a network. After a Java source file is compiled, each class (or interface) occupies its own class file, and all the information in the class has a corresponding description in that class file.
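The tightly packed layout of a class file can be illustrated by reading its first ten bytes, which hold the magic number 0xCAFEBABE, the minor and major version numbers and the constant pool count, all stored big-endian. The following is a minimal sketch; the function name is hypothetical, and the header is built by hand rather than taken from a compiled class.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def read_class_header(data):
    """Read the fixed-size fields at the start of a .class file (big-endian, no padding)."""
    if len(data) < 10:
        raise ValueError("too short to be a class file")
    magic, minor, major, constant_pool_count = struct.unpack(">IHHH", data[:10])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return {
        "minor_version": minor,
        "major_version": major,
        "constant_pool_count": constant_pool_count,
    }


# Hand-built header: magic, minor version 0, major version 52 (Java 8), pool count 16.
header = struct.pack(">IHHH", CLASS_MAGIC, 0, 52, 16)
info = read_class_header(header)
```

A Java agent inspects and rewrites this same byte stream when a class is loaded, which is what makes code-level monitoring possible without changing the application source.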
Example 1
The scheme is applied to a full-service data center system for intelligent monitoring of the interface:
Step 1, the log files of each interface of the 22 systems integrated by the service center are collected by a collection agent installed on the application system servers, and are classified, sorted and normalized by request type. The key information in the log files is extracted, such as the message source, message request object, request time and request parameters. The relationships among the items of information are summarized and analyzed from this key information, and are drawn and stored using a Neo4j database, which facilitates visualization and the construction of a multivariate information graph. The interface operation log is shown in FIG. 1, and the relationships among the information systems are shown in FIG. 2.
Step 2, the mirrored traffic of the egress router of the data center is collected, the routing nodes used for information transmission in the whole network are calculated, the data flow and forwarding load of each node are analyzed, and the criticality of each node is judged. All communication pairs in the data packets are statistically analyzed by means of network packet capture to obtain a least-squares fitted traffic chart for each node. Through continuously repeated analysis, the stable key nodes and the communication relationships among the nodes are finally obtained.
Step 3, the data obtained in step 1 and step 2 are used as the data source and divided in the ratio 7:2:1 into data source A (7/10), data source B (2/10) and data source C (1/10); data source A is fed as training data into a machine learning framework for training, and the loss function is calculated during training; after training is finished, parameters of the processing framework and of the machine learning network that essentially fit the data source are obtained; data source B is used for testing and data source C for validation, so as to ensure the correctness of the learned parameters; the data are then processed further using a big data algorithm to find the internal relationships among the data, and business modeling is performed according to the requested service type so as to analyze the relationships among the services.
Step 4, according to the information acquired in steps 1 to 3, Java agents are deployed on the servers using bytecode embedding technology, and the JAR packages called by the interfaces between systems are introduced into a monitoring instance in the JVM; when class files are loaded into the JVM, bytecode manipulation is used to dynamically monitor, in real time, method execution, SQL/NoSQL access and calls between applications; a global topology of the data flow direction is constructed, accurate code-level monitoring of data transmission is realized on the basis of this topology, and the monitoring information is stored by hash and updated in real time. The information in all interfaces is analyzed as a whole, including URL paths, request modes, occurrence time, application versions, network connection states, interface request bodies, HTTP headers, IP addresses, error return bodies, and devices and systems. In this example, 22 systems and 47 interfaces are monitored.
Step 5, the whole data transmission link channel is monitored by capturing full data packets and decoding the seven-layer protocol; the decoded content includes the total number of OGGs and the number of OGGs corresponding to each service system. The task type, running state and elapsed time of the OGG channel corresponding to each service system, and the topics, running state, elapsed time and number of sessions corresponding to the message channel process, are counted and displayed. In this example, 6988 data tables are monitored, of which 6800 are transmitted in real time and 188 are transmitted weekly or monthly.
FIG. 5 is a marketing domain system association diagram.
Step 6, the monitoring information is processed using mathematical statistics and probability analysis, the occurrence probability of each type of event is calculated, and the distribution of event occurrences is analyzed. The early warning threshold monitored for each interface is determined according to the distribution type and distribution characteristics; the thresholds are generated from the specific timeliness and integrity requirements of each interface's data transmission, different requests of different information systems have different threshold settings, and the thresholds are iterated and changed continuously as different software is configured for its environment. The early warning threshold is the average of the historical data generated by an application system at the same time of day on different dates; it serves as the standard for judging whether a request generated by the system in real time is abnormal, reflecting whether an index at the current time falls within the range of its historical average; if the fluctuation is large, an alarm is given to remind the user that the request under the current application is abnormal relative to its history. The various abnormal alarms are classified according to the early warning threshold. The alarm trigger consists of two parts: the first part is the statistical result derived from the early warning thresholds, comprising a response time baseline, an access volume baseline and an error rate baseline; the second part is an isolation forest algorithm, which intelligently detects abnormal points; the alarms are classified, and the root-cause alarm result is output through a short message gateway or a mailbox interface. Meanwhile, faults are diagnosed along the data link, at a granularity that can reach the communication-pair level and the code level. FIG. 3 shows the alarm result display.
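The isolation forest part of the alarm trigger can be sketched in pure Python as follows. This is a simplified illustration of the general technique, not the implementation used in the example: the number of trees, the sample size, the seed and the two-dimensional sample data (response time, error rate) are all assumptions.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which the point lands in a leaf, adjusted for the leaf's remaining size."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point in (0, 1]; points that are isolated quickly score close to 1."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = average_path_length(sample_size)
    return [
        2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Hypothetical observations: (response time in ms, error rate); the last one is abnormal.
observations = [(100 + i % 10, 0.01) for i in range(30)] + [(1000, 0.5)]
scores = anomaly_scores(observations)
most_abnormal = max(range(len(scores)), key=scores.__getitem__)
```

The abnormal observation is separated from the rest after very few random splits, so its average path is short and its score is the highest in the set.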
Step 7, the full data link is monitored, an alarm result is output according to the service thresholds of step 6, and the data are packaged and sent using byte encoding. The key nodes named in the alarm information are examined first. Data flow information of the data link layer in the network is collected using the traffic bandwidth and flow rate of the data, and is classified and summarized; the correlation coefficients are calculated to obtain the degree of correlation, the variables are ordered by correlation, and the correlation matrix is formed. The fault position is then located by combining a decision tree model with a hash algorithm, thereby achieving fault diagnosis. The fault diagnosis is shown in FIG. 4.
FIG. 6 is an exception handling dataflow diagram.
TABLE 1 monitoring data links
TABLE 2 monitoring failure types
The above results show that the invention is of practical use. Regardless of how complex the systems are and how heavily they intersect, the monitoring efficiency exceeds 99 percent, data transmission faults between systems are found in time, and economic loss is avoided.
The method shortens the calculation time, which improves the efficiency of the algorithm and of the monitoring. Where multiple systems intersect, the method obtains better monitoring results.