
CN120123184A - Distributed server cluster log processing method and device - Google Patents


Info

Publication number
CN120123184A
Authority
CN
China
Prior art keywords
log
data
server cluster
server
management terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510601646.3A
Other languages
Chinese (zh)
Other versions
CN120123184B (en)
Inventor
樊晓峰
樊明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fullsee Technology Co ltd
Original Assignee
Fullsee Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fullsee Technology Co ltd
Priority to CN202510601646.3A
Publication of CN120123184A
Application granted
Publication of CN120123184B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3051Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the application provides a distributed server cluster log processing method and device. A distributed server cluster log acquisition network is constructed, and acquisition tasks are dynamically distributed through a load balancing dispatching center. An incremental data acquisition channel is designed: a websocket long-connection pool enables efficient transmission, and transmission efficiency is further optimized by combining real-time compression coding with a repeated-data detection mechanism. An intelligent log analysis model is constructed, comprising three sub-models for anomaly detection, pattern recognition and event association analysis; anomaly score calculation, log classification labeling and multidimensional association analysis are realized with deep learning, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of traditional approaches in distributed acquisition, data transmission, intelligent analysis and other aspects, and significantly improves the performance and practicability of the log processing system.

Description

Distributed server cluster log processing method and device
Technical Field
The application relates to the field of data processing, in particular to a distributed server cluster log processing method and device.
Background
Existing log processing methods have obvious defects. Traditional systems often adopt a single-server processing mode and lack a distributed load balancing mechanism, so they struggle to meet the log acquisition requirements of a large-scale server cluster.
Furthermore, the prior art has a bottleneck in data transmission efficiency. Most systems transmit full data sets and cannot perform incremental acquisition or real-time compression, resulting in high network bandwidth consumption and large transmission delays.
Existing systems also have technical weaknesses in log analysis. Lacking intelligent anomaly detection and correlation analysis capabilities, they find it difficult to quickly identify anomalous patterns and mine event correlations from massive logs. Solving these problems is important for improving the performance and usability of log processing systems.
Disclosure of Invention
Aiming at the problems in the prior art, the application provides a distributed server cluster log processing method and device, which effectively remedy the shortcomings of traditional approaches in distributed acquisition, data transmission, intelligent analysis and other aspects, and significantly improve the performance and practicality of the log processing system.
In order to solve at least one of the problems, the application provides the following technical scheme:
In a first aspect, the present application provides a method for processing a distributed server cluster log, including:
Establishing a distributed server cluster log acquisition network, and establishing a server cluster configuration table on a management terminal, wherein the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, and establishing a load balancing dispatching center based on the server cluster configuration table, wherein the load balancing dispatching center periodically acquires system load information of all servers to be monitored, dynamically distributes log acquisition tasks according to the system load information, and generates task dispatching queues according to preset task priorities;
Constructing an incremental log data acquisition channel, establishing a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition according to the latest reading position, carrying out real-time compression coding on the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression coded data at the management terminal;
An intelligent log analysis model is constructed, a deep learning method is adopted to train historical log data, an analysis engine comprising an abnormality detection model, a pattern recognition model and an event association analysis model is constructed, the restored log data is input into the analysis engine, the abnormality detection model calculates abnormal scores based on log feature vectors, the pattern recognition model classifies and marks log contents, the event association analysis model mines association relations among multidimensional logs to generate abnormal event early warning information, the abnormal event early warning information, log classification marking results and association analysis results are visually displayed in a management terminal interface, and when a user-triggered download operation is detected, analysis result data in a specified time range is packed, compressed and downloaded to local storage.
Further, the method further comprises the steps of inputting server cluster information in a configuration interface of a management terminal, storing the server cluster information in a database to generate a server cluster configuration table, wherein the server cluster configuration table comprises a server identifier, a network address, a port number, an access certificate, a load threshold and a priority field, reading the network address and the port number in the server cluster configuration table to establish TCP connection, verifying the validity of the access certificate through SSH protocol, and marking a server node as an available state after verification;
Constructing a load monitoring agent program, deploying the load monitoring agent program to each server to be monitored, collecting CPU utilization rate, memory occupancy rate, disk IO and network bandwidth data by the load monitoring agent program, calculating a comprehensive load score of the server, comparing the comprehensive load score with a load threshold value in a server cluster configuration table, sending an early warning signal to a server exceeding the load threshold value, and adjusting the collection task priority of the server according to the early warning signal.
Further, a load balancing dispatching center process is created, a load threshold value and a priority parameter in the server cluster configuration table are read, a system load information acquisition timing task is established, the system load information acquisition timing task acquires CPU utilization rate, memory occupancy rate, disk IO and network bandwidth data from a load monitoring agent program of each server to be monitored according to a preset time interval, and the system load information is stored in a load state cache table;
And calculating the available resource capacity of each server based on the system load information in the load state cache table, performing task quantity evaluation on the log acquisition task, matching the task quantity with the available resource capacity, generating a task allocation scheme, constructing a task scheduling queue according to the task allocation scheme by combining with the priority parameters in the server cluster configuration table, and distributing the task scheduling queue to the load monitoring agent programs of the servers to be monitored.
Further, the method further comprises the steps of obtaining a network address and a port number of a server to be monitored based on the task scheduling queue, establishing long connection between a management terminal and the server to be monitored through websocket protocol, storing the long connection into a connection pool according to a server identifier, creating a connection pool manager, monitoring a connection state by the connection pool manager, automatically reestablishing disconnected connection, and maintaining availability of the connection pool;
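The connection pool manager described above can be illustrated with a minimal sketch (in Python, which the application does not prescribe). Here `connect_fn` is a hypothetical factory standing in for the actual websocket connection setup, so that only the pooling and reconnection logic is shown.

```python
import threading

class ConnectionPool:
    """Minimal sketch of the long-connection pool keyed by server identifier.

    `connect_fn(address, port)` is an assumption: it stands in for the real
    websocket connect call and must return an object with an `is_open()` method.
    """

    def __init__(self, connect_fn):
        self._connect = connect_fn
        self._pool = {}               # server identifier -> connection
        self._lock = threading.Lock()

    def get(self, server_id, address, port):
        with self._lock:
            conn = self._pool.get(server_id)
            if conn is None or not conn.is_open():
                # Automatically re-establish a dropped connection,
                # as the connection pool manager described above does.
                conn = self._connect(address, port)
                self._pool[server_id] = conn
            return conn
```

In a real deployment the factory would open a websocket connection and the pool would additionally run a background health check; both are omitted here.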
Creating a log file reading position record table, wherein the log file reading position record table comprises a log file path, a file size, a last reading time stamp and a reading position offset field, the offset of the last reading position is obtained from the log file reading position record table before each reading of the log file, the offset is used as the starting position of a new reading round, and the time stamp and the offset in the log file reading position record table are updated after the reading is completed.
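The reading-position bookkeeping above can be sketched as follows. Persisting the offset table as a JSON file is an assumption made here for brevity; the text specifies only the path, size, timestamp and offset fields.

```python
import json
import os

def read_increment(log_path, state_path):
    """Read only the bytes appended to `log_path` since the last pass.

    The offset table lives in `state_path` as JSON (an illustrative choice);
    the recorded offset is used as the start position of the new reading round,
    and the timestamp and offset are updated after reading completes.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    offset = state.get(log_path, {}).get("offset", 0)
    with open(log_path, "rb") as f:
        f.seek(offset)                # resume from the recorded position
        data = f.read()
        state[log_path] = {"offset": f.tell(),
                           "mtime": os.path.getmtime(log_path)}
    with open(state_path, "w") as f:
        json.dump(state, f)
    return data
```

A production collector would also detect log rotation (file truncated or replaced), which this sketch omits.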
Further, the method further comprises the steps of obtaining offset from the log file reading position record table, reading newly added log data by taking the offset as a file pointer position, performing line segmentation processing on the read log data, calculating a hash value of each line of data, obtaining the hash value of the historical data from a log data cache according to a preset time window range, comparing the hash value of the current data with the hash value of the historical data, removing repeated data, and writing a de-duplication result into a compression buffer area;
And executing an LZ4 compression algorithm on the log data in the compression buffer zone to generate a compressed data block, adding a data identification header to the compressed data block, wherein the data identification header comprises the size of the data block, the type of the compression algorithm and checksum information, transmitting the compressed data block to a management terminal through websocket long connection, decompressing the compressed data by the management terminal according to the data identification header by selecting a corresponding decompression algorithm, and writing the decompressed data into a log storage area.
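The deduplication and compression pipeline of the two paragraphs above can be sketched as follows. zlib stands in for the LZ4 algorithm named in the text (an assumption, to keep the sketch dependency-free), and the header layout (4-byte size, 1-byte algorithm id, 4-byte CRC32 checksum) is likewise illustrative.

```python
import hashlib
import struct
import zlib

def dedup_and_pack(lines, seen_hashes):
    """Hash-based deduplication followed by compression with a data identification header.

    `seen_hashes` plays the role of the sliding-window cache of historical
    hash values; in the real system entries would expire with the window.
    """
    unique = []
    for line in lines:
        h = hashlib.sha256(line.encode()).hexdigest()
        if h not in seen_hashes:        # drop duplicates within the window
            seen_hashes.add(h)
            unique.append(line)
    payload = zlib.compress("\n".join(unique).encode())
    # Header: payload size, algorithm type id, checksum (layout illustrative).
    header = struct.pack(">IBI", len(payload), 1, zlib.crc32(payload))
    return header + payload

def unpack(blob):
    size, algo, crc = struct.unpack(">IBI", blob[:9])
    payload = blob[9:9 + size]
    assert zlib.crc32(payload) == crc   # verify checksum before decompressing
    return zlib.decompress(payload).decode().split("\n")
```

On the management terminal side, `unpack` selects the decompression routine from the algorithm id in the header, mirroring the decompress-and-restore step described above.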
Further, the method comprises the steps of reading historical log data from a log storage area, carrying out text preprocessing on the historical log data, extracting a feature set formed by a timestamp, an event type, an operation object, a state code, an execution result, an error code, a user identifier and an operation instruction in a log, converting the feature set into a vector representation, constructing an anomaly detection model and a pattern recognition model by adopting a long short-term memory (LSTM) neural network, constructing an event correlation analysis model by adopting a graph neural network, training the anomaly detection model, the pattern recognition model and the event correlation analysis model by using the feature vectors, and storing the trained models into a model library;
Loading an anomaly detection model, a pattern recognition model and an event association analysis model from the model library to construct a log analysis engine, converting the restored log data into feature vectors, inputting the feature vectors into the log analysis engine, calculating the anomaly scores of the feature vectors by the anomaly detection model, carrying out multi-classification prediction on the log content by the pattern recognition model to output category labels, mining time sequence association and causal relationship among log events by the event association analysis model based on a graph structure, and generating anomaly event early warning information according to the anomaly scores, the category labels and the association relationship.
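The conversion of the extracted feature set into a vector representation, mentioned in the two paragraphs above, can be illustrated with the hashing trick. The text does not specify an encoding, so this scheme, the dimensionality and the field names are all assumptions for illustration.

```python
import hashlib

# Field names correspond to the feature set listed in the text
# (timestamp, event type, operation object, state code, execution result,
# error code, user identifier, operation instruction).
FIELDS = ["timestamp", "event_type", "object", "status_code",
          "result", "error_code", "user_id", "command"]

def to_feature_vector(record, dim=16):
    """Hash each extracted field into a fixed-length vector (hashing trick).

    A minimal stand-in for the vectorisation step that feeds the LSTM and
    graph models; real systems typically use learned embeddings instead.
    """
    vec = [0.0] * dim
    for field in FIELDS:
        value = str(record.get(field, ""))
        idx = int(hashlib.md5(f"{field}={value}".encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec
```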
Creating a visual panel of the management terminal, dividing the abnormal event early warning information into three levels of high, medium and low according to the abnormal level, displaying the abnormal event early warning information in an early warning information area, displaying the log classification marking result in a pie chart, displaying the association strength among log event nodes in the association analysis result through a force-directed graph, creating a time range selector in the visual panel, and updating visual data in real time according to the start-stop time of the time range selector;
Monitoring a download button click event of a user in the visual panel, acquiring a start-stop time parameter in a time range selector, inquiring abnormal event early warning information, a log classification marking result and a correlation analysis result in the time range from a database, converting the inquiry result into a JSON format, executing a ZIP compression algorithm on the JSON format data to generate a compressed file, and storing the compressed file in a local storage path appointed by the user.
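The download flow above (query results in the selected time range, convert to JSON, ZIP-compress, save to a user-specified path) can be sketched as follows; the archive entry name is illustrative.

```python
import json
import zipfile

def export_results(results, start, end, out_path):
    """Serialise query results to JSON and pack them into a ZIP archive,
    mirroring the download flow described above.

    `results` is assumed to be a list of JSON-serialisable records
    (early warnings, classification labels, association analysis results).
    """
    payload = json.dumps({"range": [start, end], "results": results},
                         ensure_ascii=False, indent=2)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("analysis_results.json", payload)
    return out_path
```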
In a second aspect, the present application provides a distributed server cluster log processing apparatus, including:
The dynamic allocation module is used for establishing a distributed server cluster log acquisition network, creating a server cluster configuration table on a management terminal, wherein the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, establishing a load balancing dispatching center based on the server cluster configuration table, periodically acquiring system load information of all servers to be monitored, dynamically allocating log acquisition tasks according to the system load information, and generating a task dispatching queue according to preset task priorities;
The log processing module is used for constructing an incremental log data acquisition channel, establishing a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition according to the latest reading position, carrying out real-time compression coding on the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression coded data at the management terminal;
The log analysis module is used for constructing an intelligent log analysis model, training historical log data by adopting a deep learning method, constructing an analysis engine comprising an anomaly detection model, a pattern recognition model and an event association analysis model, inputting the restored log data into the analysis engine, calculating an anomaly score by the anomaly detection model based on a log feature vector, classifying and labeling log contents by the pattern recognition model, mining association relations among multidimensional logs by the event association analysis model, generating anomaly event early warning information, visually displaying the anomaly event early warning information, log classification labeling results and association analysis results in the management terminal interface, and, when a user-triggered download operation is detected, packing and compressing analysis result data in a specified time range and downloading it to local storage.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the distributed server cluster log processing method when the program is executed.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the distributed server cluster log processing method.
In a fifth aspect, the present application provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the distributed server cluster log processing method.
According to the technical scheme, the application provides a distributed server cluster log processing method and device. A distributed server cluster log acquisition network is constructed, and acquisition tasks are dynamically distributed through a load balancing dispatching center. An incremental data acquisition channel is designed: a websocket long-connection pool enables efficient transmission, and transmission efficiency is further optimized by combining real-time compression coding with a repeated-data detection mechanism. An intelligent log analysis model is constructed, comprising three sub-models for anomaly detection, pattern recognition and event association analysis; anomaly score calculation, log classification labeling and multidimensional association analysis are realized with deep learning, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of traditional approaches in distributed acquisition, data transmission, intelligent analysis and other aspects, and significantly improves the performance and practicability of the log processing system.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for processing logs of a distributed server cluster according to an embodiment of the present application;
FIG. 2 is a block diagram of a distributed server cluster log processing device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals:
An electronic device 9600, a central processor 9100, a memory 9140, a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, a power supply 9170, a buffer memory 9141, an application/function storage portion 9142, a data storage portion 9143, a driver storage portion 9144, an antenna 9111, a speaker 9131, and a microphone 9132.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The technical scheme of the application obtains, stores, uses, processes and the like the data, which all meet the relevant regulations of national laws and regulations.
In view of the problems existing in the prior art, the application provides a distributed server cluster log processing method and device. A distributed server cluster log acquisition network is constructed, and acquisition tasks are dynamically distributed through a load balancing dispatching center. An incremental data acquisition channel is designed: a websocket long-connection pool enables efficient transmission, and transmission efficiency is further optimized by combining real-time compression coding with a repeated-data detection mechanism. An intelligent log analysis model is constructed, comprising three sub-models for anomaly detection, pattern recognition and event association analysis; anomaly score calculation, log classification labeling and multidimensional association analysis are realized with deep learning, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of traditional approaches in distributed acquisition, data transmission, intelligent analysis and other aspects, and significantly improves the performance and practicability of the log processing system.
In order to effectively solve the defects of the traditional technology in the aspects of distributed acquisition, data transmission, intelligent analysis and the like, the application provides an embodiment of a distributed server cluster log processing method, which specifically comprises the following steps:
Step S101, a distributed server cluster log acquisition network is established, a server cluster configuration table is established on a management terminal, the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, a load balancing dispatching center is established based on the server cluster configuration table, the load balancing dispatching center periodically acquires system load information of all servers to be monitored, dynamic allocation is carried out on log acquisition tasks according to the system load information, and task dispatching queues are generated according to preset task priorities;
Optionally, the embodiment realizes efficient log data acquisition and processing by constructing a log acquisition network of the distributed server cluster. On a management terminal, a configuration interface with good expansibility is designed, and the interface adopts a modularized design and comprises three main functional modules of a server information input area, a connection state display area and a load monitoring area. The configuration interface realizes responsive layout through a compact framework, and ensures that good display effects can be obtained on different terminal devices.
The embodiment adopts a step-by-step verification mechanism when server information is entered. First, the format of the input network address is checked, supporting three formats: IPv4, IPv6 and domain name. Next, the validity of the port number is verified, using port 8080 by default and supporting custom port configuration. Finally, access credentials are stored encrypted: sensitive information such as passwords is encrypted with an asymmetric encryption algorithm, and the ciphertext is stored in the database to ensure security. This step-by-step verification mechanism effectively avoids connection failures caused by configuration errors.
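The address and port checks above can be sketched as follows; the domain-name regex is illustrative, while the port 8080 default follows the text.

```python
import ipaddress
import re

# Illustrative hostname pattern: dot-separated labels of letters, digits
# and hyphens, no leading/trailing hyphen per label.
_DOMAIN_RE = re.compile(
    r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+")

def validate_address(addr):
    """Classify a configured address as IPv4, IPv6 or domain name."""
    try:
        ip = ipaddress.ip_address(addr)
        return "ipv4" if ip.version == 4 else "ipv6"
    except ValueError:
        pass
    if _DOMAIN_RE.fullmatch(addr):
        return "domain"
    raise ValueError(f"unsupported address format: {addr}")

def validate_port(port=8080):
    """Port check with the default of 8080 mentioned in the text."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return port
```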
The server cluster configuration table created in this embodiment is stored by using a relational database, and the table structure includes fields such as a server ID, a network address, a port number, an access credential, a load threshold, and a priority. The server ID is used as a primary key, uniqueness is ensured by adopting a UUID generation algorithm, a load threshold field is used for storing threshold parameters of four dimensions of CPU utilization rate, memory occupancy rate, disk IO and network bandwidth, a priority field is expressed by an integer, and the smaller the numerical value, the higher the priority. The database adopts a master-slave replication architecture to provide data backup and failover capabilities.
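The configuration table above can be illustrated with SQLite standing in for the relational database (an assumption; the text specifies only the field set, the UUID primary key, the four-dimension load thresholds, and the smaller-value-means-higher-priority convention).

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS server_config (
    server_id   TEXT PRIMARY KEY,   -- UUID, as described above
    address     TEXT NOT NULL,
    port        INTEGER NOT NULL,
    credential  TEXT NOT NULL,      -- stored encrypted in the real system
    cpu_thr REAL, mem_thr REAL, io_thr REAL, net_thr REAL,
    priority    INTEGER NOT NULL    -- smaller value = higher priority
)
"""

def add_server(conn, address, port, credential, thresholds, priority):
    """Insert one server row; `thresholds` is (cpu, mem, disk_io, network)."""
    server_id = str(uuid.uuid4())    # UUID primary key ensures uniqueness
    conn.execute(
        "INSERT INTO server_config VALUES (?,?,?,?,?,?,?,?,?)",
        (server_id, address, port, credential, *thresholds, priority))
    return server_id
```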
The embodiment adopts innovative design on the realization of the load balancing dispatching center. The dispatching center operates as an independent micro-service, and uses a Spring Cloud framework to realize service registration and discovery functions. The dispatching center realizes distributed coordination through the ZooKeeper, and when the master node fails, the slave node can rapidly take over the dispatching task, so that the high availability of the service is ensured. Meanwhile, the dispatching center also realizes the dynamic refreshing function of the configuration, and when the configuration of the server cluster is changed, the server cluster can be effective without restarting.
The embodiment designs an accurate system load acquisition mechanism. And deploying a load acquisition agent on each server to be monitored, wherein the agent program is developed by adopting the Go language and has low resource occupation and high concurrency processing capacity. The agent program obtains CPU utilization rate, memory occupancy rate, disk IO and network bandwidth data through system call, and the sampling interval can be dynamically adjusted and defaults to 30 seconds. After the collected data is subjected to local aggregation processing, the collected data is sent to a dispatching center through gRPC protocols, so that the efficiency and reliability of data transmission are ensured.
The embodiment realizes an intelligent task allocation algorithm. After receiving the load data, the dispatching center first calculates the comprehensive load score of each server. The calculation formula is as follows:
Score = w1·CPU + w2·Memory + w3·DiskIO + w4·Network, where w1 to w4 are weight coefficients that can be adjusted according to the actual business scenario. When the score exceeds a preset threshold, a load alarm is triggered: the system automatically lowers the priority of that server's acquisition tasks and transfers part of the tasks to servers with lower load.
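The composite score can be written out directly; the weight values below are illustrative, since the text only states that the coefficients are tunable.

```python
def composite_score(cpu, memory, disk_io, network,
                    weights=(0.4, 0.3, 0.2, 0.1)):
    """Score = w1*CPU + w2*Memory + w3*DiskIO + w4*Network.

    All inputs are utilisation ratios in [0, 1]; the default weights are
    an assumption, not values given in the text.
    """
    w1, w2, w3, w4 = weights
    return w1 * cpu + w2 * memory + w3 * disk_io + w4 * network

def over_threshold(score, threshold=0.8):
    """Trigger a load alarm when the composite score exceeds the threshold."""
    return score > threshold
```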
This embodiment builds an efficient task scheduling queue. The queue is implemented with a priority queue data structure and supports dynamic insertion and adjustment of tasks. Each task contains information such as the server identifier, acquisition path, and acquisition interval. Queue processing uses a multithreaded model, with the thread pool size adjusted dynamically according to server performance. A batch processing mechanism is also implemented so that tasks of the same priority can be merged and executed together, improving processing efficiency.
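The priority queue with same-priority batching can be sketched with the standard heapq module (task fields and the batching rule are illustrative; a monotonic counter keeps FIFO order within a priority level):

```python
import heapq
import itertools

_counter = itertools.count()  # tie-breaker: FIFO within equal priority

def push_task(queue, priority, task):
    """Lower priority value = served first, matching the config table semantics."""
    heapq.heappush(queue, (priority, next(_counter), task))

def pop_batch(queue):
    """Pop all tasks sharing the current highest priority (batch execution)."""
    if not queue:
        return []
    top, batch = queue[0][0], []
    while queue and queue[0][0] == top:
        batch.append(heapq.heappop(queue)[2])
    return batch

q = []
push_task(q, 2, {"server": "s1", "path": "/var/log/app.log"})
push_task(q, 1, {"server": "s2", "path": "/var/log/sys.log"})
push_task(q, 1, {"server": "s3", "path": "/var/log/db.log"})
batch = pop_batch(q)  # both priority-1 tasks, in insertion order
```

A real scheduler would hand each batch to a worker thread pool; here the batch is simply returned.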
Through these technical innovations, the key problems of log acquisition in a distributed environment, such as complex configuration management, unreasonable load balancing, and low task scheduling efficiency, are effectively solved. In practical application, the scheme supports the log acquisition requirements of large-scale server clusters, and dynamic load balancing together with intelligent task scheduling ensures continuous and stable operation of acquisition tasks. The method is particularly suitable for microservice architectures and containerized deployment environments, markedly improving the reliability and efficiency of log acquisition. The systematic, innovative nature of the scheme allows it to adapt to enterprise applications of different scales, achieving a comprehensive improvement of the log acquisition system through flexible configuration management and intelligent task scheduling.
Step S102, constructing an incremental log data acquisition channel: establishing a WebSocket long-connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition from that position, applying real-time compression coding to the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression-coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression-coded data at the management terminal;
Optionally, this embodiment realizes efficient distributed log collection by establishing an incremental log data acquisition channel. Based on the event-driven model of Node.js, the management terminal creates a WebSocket server instance and listens for connection requests from each server to be monitored. When a connection is established, a handshake protocol performs identity verification; after verification passes, the connection object is stored in the connection pool and a heartbeat detection mechanism is started, sending ping frames periodically to ensure the liveness of the connection.
The present embodiment employs a hierarchical design in connection pool management. Connection pools are grouped by server cluster, with each group maintaining an independent connection counter and status flag. The core parameters of the connection pool include the maximum connection count, idle timeout, and reconnection interval, all dynamically adjustable through configuration files. When a disconnection is detected, the connection pool manager automatically triggers the reconnection mechanism, ensuring continued availability of connections.
This embodiment realizes an accurate file position tracking mechanism. A position record table is created in Redis, and the metadata of each log file is stored in a hash structure. The key is generated by combining the server ID and the file path, and the value contains fields such as the file size, the last-read timestamp, and the read position offset. Before each read, the acquisition program queries the position record table to obtain the last reading position, realizing incremental acquisition and preventing already-collected data from being processed repeatedly.
This embodiment designs an efficient data compression scheme. The LZ4 compression algorithm compresses log data in real time, offering extremely high compression speed together with a good compression ratio. Compression is performed in streaming mode with a compression block size of 64 KB; each block carries independent header information, supporting random access and parallel decompression. Dynamic adjustment of the compression level is also realized, automatically selecting the optimal compression parameters according to the CPU load.
The present embodiment builds a reliable duplicate data detection mechanism. A sliding time window method is adopted; the window size defaults to 5 minutes and can be adjusted according to actual requirements. Within the window, the hash values of processed log lines are recorded in a bloom filter. When a new log line arrives, its hash value is calculated and the bloom filter is queried; if a duplicate is found, the line is discarded directly. The bloom filter uses multiple hash functions to improve accuracy, while memory usage is controlled by periodically clearing expired data.
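The window-plus-bloom-filter idea can be sketched as follows. All parameters are illustrative, and this toy version simply resets the bit array when the window expires; a production filter would be sized from the expected line rate and rotated more gracefully:

```python
import hashlib

class WindowedBloomDeduplicator:
    def __init__(self, size=8192, hashes=3, window_sec=300):
        self.size, self.hashes, self.window = size, hashes, window_sec
        self.bits = bytearray(size)
        self.created = 0.0  # start time of the current window

    def _positions(self, line):
        # k independent positions derived from a salted SHA-256 digest
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{line}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def is_duplicate(self, line, now):
        if now - self.created >= self.window:  # window expired: reset filter
            self.bits = bytearray(self.size)
            self.created = now
        positions = list(self._positions(line))
        duplicate = all(self.bits[p] for p in positions)
        for p in positions:
            self.bits[p] = 1
        return duplicate

dedup = WindowedBloomDeduplicator()
first = dedup.is_duplicate("ERROR db timeout", now=0)    # unseen line
repeat = dedup.is_duplicate("ERROR db timeout", now=10)  # within the window
later = dedup.is_duplicate("ERROR db timeout", now=400)  # window rotated
```

Bloom filters can report false positives but never false negatives, which is the right trade-off here: a rare falsely dropped line costs one log entry, while a miss only means one duplicate slips through.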
This embodiment realizes an efficient data transmission protocol. A custom application-layer protocol is encapsulated within WebSocket frames, comprising a 4-byte magic number, a 1-byte version number, a 4-byte data length, a 2-byte checksum, and a variable-length data body. The transmitting end packages the compressed data according to this format, and the receiving end parses it with a protocol analyzer. Transmission supports slicing, with each slice limited to 1 MB, avoiding the network impact of large data blocks on the network.
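A sketch of such a header layout using Python's struct module. The text fixes only the field widths; the magic value and the use of a truncated CRC-32 as the 2-byte checksum are assumptions:

```python
import struct
import zlib

MAGIC = 0x4C4F4744  # hypothetical magic value; not specified in the text
VERSION = 1
# big-endian: magic(4) + version(1) + length(4) + checksum(2) = 11-byte header
HEADER = struct.Struct(">IBIH")

def pack_frame(payload: bytes) -> bytes:
    checksum = zlib.crc32(payload) & 0xFFFF  # truncated to the 2-byte field
    return HEADER.pack(MAGIC, VERSION, len(payload), checksum) + payload

def unpack_frame(frame: bytes) -> bytes:
    magic, version, length, checksum = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if magic != MAGIC or zlib.crc32(payload) & 0xFFFF != checksum:
        raise ValueError("corrupt frame")
    return payload

frame = pack_frame(b"compressed-log-chunk")
restored = unpack_frame(frame)
```

The checksum lets the receiver reject corrupted slices before decompression; a corrupted body raises ValueError.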
This embodiment designs a complete data restoration flow. After receiving the compressed data, the management terminal first performs protocol parsing and data integrity verification. After verification passes, the corresponding decompression algorithm is selected according to the compression block header for decompression. Decompression uses multithreaded parallel processing, with each thread responsible for one compression block. The decompressed data is written to a local cache to await subsequent analysis.
Through these technical innovations, the key problems in distributed log acquisition, such as low data transmission efficiency, interference from duplicate data, and high decompression latency, are effectively solved. In practical application, the scheme supports real-time log acquisition for large-scale server clusters, and incremental acquisition with efficient compressed transmission significantly reduces network bandwidth usage and storage consumption. The method is particularly suitable for microservice architecture environments, and the reliable connection pool management and data deduplication mechanisms ensure the integrity and accuracy of the log data. The systematic, innovative nature of the scheme allows it to adapt to enterprise applications of different scales, achieving a comprehensive improvement of log acquisition efficiency through technical innovation and optimized design.
Step S103, constructing an intelligent log analysis model: training on historical log data with a deep learning method; building an analysis engine comprising an anomaly detection model, a pattern recognition model, and an event association analysis model; inputting the restored log data into the analysis engine, where the anomaly detection model calculates anomaly scores based on log feature vectors, the pattern recognition model classifies and labels the log content, and the event association analysis model mines association relations among multidimensional logs; generating abnormal-event early-warning information; visually displaying the early-warning information, log classification results, and association analysis results in the management terminal interface; and, when a user-triggered download operation is detected, packaging and compressing the analysis result data within a specified time range for download to local storage.
Optionally, this embodiment constructs the intelligent log analysis engine using a deep learning method. Historical log data is first preprocessed, extracting key fields from the logs, including timestamps, event types, operation objects, and status codes, via regular expressions. A Word2Vec model converts the log text into dense vector representations; during training, the window size is set to 5 and the vector dimension to 128, fully capturing the semantic relations among words.
The present embodiment adopts a bidirectional LSTM network structure for the anomaly detection model. The input layer receives the sequence of log feature vectors, and the LSTM layer contains 128 hidden units, learning long-term dependencies in the log sequence through its gating mechanism. Training uses a contrastive learning strategy, taking normal log sequences as positive samples and artificially constructed abnormal sequences as negative samples. The loss function uses a contrastive loss that pulls normal samples closer together and pushes abnormal samples further away. The anomaly score is obtained by calculating the average distance between the sample under test and the set of normal samples.
The present embodiment designs an innovative pattern recognition model. An attention-enhanced CNN network is used: the convolution layer extracts features in parallel with multiple convolution kernels of different sizes, capturing text patterns at different scales. The attention mechanism computes weights over the feature map, highlighting important pattern information. The classification layer outputs a probability distribution over the categories using a softmax function. The log categories the model recognizes include system errors, security warnings, performance anomalies, and configuration changes, with support for dynamically adding new categories.
This embodiment realizes a complex event correlation analysis model. An event association graph is built on a graph neural network, where nodes represent log events and edges represent associations between events. Each node's features include the event type, occurrence time, and impact scope, and edge weights are computed from temporal correlation and causal relations. The model adopts a graph attention network (GAT) structure, learning association patterns among nodes through multi-head attention and enabling tracking analysis of complex event chains.
This embodiment builds an adaptive early warning mechanism. The anomaly detection score, the pattern recognition class probabilities, and the impact scope from event association are considered together to calculate the risk level of an event. Risk assessment uses a weighted scoring method, with the weight coefficients dynamically adjusted according to historical early-warning performance. When the risk level exceeds a threshold, early-warning information is generated, including a description of the anomaly, possible causes, and handling advice. The early-warning information is pushed to the management terminal through a message queue, ensuring a timely response.
This embodiment designs an intuitive visual interface. The front-end application is built with the React framework, and data visualization is implemented with the ECharts library. Abnormal early-warning information is displayed on a timeline, with different colors marking different risk levels. Log classification results show the distribution proportion of each category in a ring chart. Event association results are displayed as a force-directed graph, where node size represents event importance and edge thickness represents association strength. The visualization components support interactions such as time-range selection, node expansion, and path tracing.
This embodiment realizes a flexible data export function. The user selects a time range and the export content types on the interface, and the system backend queries the analysis results for the corresponding period and converts the data into JSON format. Large volumes of data are processed in chunks to avoid memory overflow. Data compression uses the GZIP algorithm, effectively reducing the size of the exported file. The export process supports resumable transfer, ensuring reliable download of large files.
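A minimal sketch of the chunked GZIP export: results are serialized as JSON lines and compressed in fixed-size chunks so memory use stays bounded (record shapes and the chunk size are illustrative):

```python
import gzip
import io
import json

def export_results(records, chunk_size=2):
    """Serialize records as gzipped JSON lines, chunk_size records at a time."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for i in range(0, len(records), chunk_size):
            for rec in records[i:i + chunk_size]:
                gz.write(json.dumps(rec).encode() + b"\n")
    return buf.getvalue()

data = export_results([
    {"event": "error", "score": 0.93},
    {"event": "warn",  "score": 0.41},
    {"event": "info",  "score": 0.05},
])
restored = [json.loads(line) for line in gzip.decompress(data).splitlines()]
```

In a real export the chunks would be streamed to the HTTP response (enabling range requests for resumable download) rather than accumulated in memory.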
Through these technical innovations, the key problems in log analysis, such as low anomaly detection accuracy, insufficient event association analysis, and delayed early-warning response, are effectively solved. In practical application, the scheme can accurately identify various abnormal conditions, and continuous optimization of the deep learning models yields increasingly accurate analysis results. The method is particularly suitable for operation and maintenance monitoring of large-scale distributed systems, and multidimensional intelligent analysis markedly improves the efficiency of problem localization and fault prevention. The systematic, innovative nature of the scheme allows it to adapt to different types of log analysis requirements, achieving a comprehensive improvement of log analysis capability through continuous model learning and optimization.
From the above description, the distributed server cluster log processing method provided by the embodiment of the application constructs a distributed log acquisition network and dynamically allocates acquisition tasks through the load balancing dispatching center. An incremental data acquisition channel is designed, using a WebSocket long-connection pool for efficient transmission and combining real-time compression coding with duplicate data detection to optimize transmission efficiency. An intelligent log analysis model is constructed, comprising three sub-models for anomaly detection, pattern recognition, and event association analysis; based on deep learning, it realizes anomaly score calculation, log classification labeling, and multidimensional association analysis, and provides visualization and data export functions. The method effectively remedies the shortcomings of conventional techniques in distributed acquisition, data transmission, and intelligent analysis, markedly improving the performance and practicality of the log processing system.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S201, server cluster information is entered through a configuration interface of the management terminal and stored in a database to generate a server cluster configuration table, the table comprising server identifier, network address, port number, access credential, load threshold, and priority fields; the network address and port number in the table are read to establish a TCP connection, the validity of the access credential is verified through the SSH protocol, and after verification passes, the server node is marked as available;
Step S202, constructing a load monitoring agent and deploying it to each server to be monitored, wherein the agent collects CPU utilization, memory occupancy, disk IO, and network bandwidth data, calculates a comprehensive load score for the server, compares the score with the load threshold in the server cluster configuration table, issues an early-warning signal for servers exceeding the threshold, and adjusts the acquisition task priority of the server according to the early-warning signal.
Optionally, this embodiment adopts a distributed architecture concept in the design of the management terminal's configuration interface. The interface is implemented with the Vue.js framework, and a componentized design decouples functions such as the server information entry form, the status monitoring panel, and the load threshold configuration into independent components. Form validation uses the async-validator library to verify fields such as the network address and port number in real time, preventing submission of invalid data.
This embodiment realizes a secure data storage mechanism. The server cluster information first undergoes data normalization, escaping special characters to prevent SQL injection attacks. PostgreSQL is used to create the server configuration table; the primary key uses an auto-increment sequence, and a unique index constraint is established on the network address and port number. Access credentials are one-way hashed with the bcrypt algorithm before storage, ensuring the security of sensitive information. The database connection pool uses HikariCP, with pool configuration tuned to improve access performance.
The present embodiment builds a reliable connection verification mechanism. TCP connections are established in non-blocking mode with a connection timeout of 5 seconds. After the connection succeeds, SSH verification is performed through the JSch library, supporting both password and key authentication. The verification process uses a retry mechanism with a maximum of 3 retries and an exponentially increasing retry interval. After verification passes, the server node's state is recorded in Redis using a hash structure, where the key is the server identifier and the value contains a status flag and the last update time.
This embodiment designs an efficient load monitoring agent. The agent is developed in the Go language and uses goroutines for concurrent acquisition. CPU utilization is obtained by parsing the /proc/stat file, calculating the CPU time shares of user mode and system mode. Memory occupancy is obtained from the /proc/meminfo file, covering physical memory and swap space usage. Disk IO read/write rates are obtained from /proc/diskstats, and network bandwidth is obtained from the send/receive packet statistics in /proc/net/dev.
The embodiment realizes an innovative load calculation model. The comprehensive load score adopts a weighted calculation method, and the calculation formula is as follows:
Score = w1·CPU + w2·Memory + w3·DiskIO + w4·Network, where w1 to w4 are weight coefficients with initial values of 0.4, 0.3, 0.2, and 0.1, respectively. The weight coefficients can be adjusted dynamically through the configuration file to suit the resource characteristics of different servers. Sampled data is smoothed over a sliding window of 5 minutes to reduce the influence of instantaneous fluctuations.
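The sliding-window smoothing can be sketched as a simple moving average. With 30-second sampling, a 5-minute window corresponds to 10 samples; the 3-sample window below is only for brevity:

```python
from collections import deque

class SmoothedMetric:
    """Moving average over the last `window` samples of one load metric."""
    def __init__(self, window=10):
        self.samples = deque(maxlen=window)  # old samples fall out automatically

    def add(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

cpu = SmoothedMetric(window=3)
readings = [0.2, 0.9, 0.1]          # a transient spike at 0.9
smoothed = [cpu.add(v) for v in readings]
```

The spike's contribution to the smoothed series is attenuated, which is exactly the fluctuation-damping the text describes.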
This embodiment builds an intelligent early warning mechanism. Load threshold detection adopts a multi-level threshold design, divided into warning and critical levels. When the load score exceeds the warning threshold, the system issues a warning-level alert; when it exceeds the critical threshold, a critical-level alert is issued. Alerts are pushed to the management terminal through a message queue (RabbitMQ), ensuring real-time delivery. The alert information is also persisted in a time-series database (InfluxDB) for subsequent trend analysis.
This embodiment realizes a dynamic task adjustment strategy. When an early-warning signal is received, the dispatching system first pauses the allocation of new acquisition tasks to the overloaded server. For tasks already executing, the system evaluates their priority and migrates low-priority tasks to servers with lower load. A smooth switchover strategy is adopted during migration, ensuring continuity of data acquisition. The server's available resource capacity is updated at the same time for subsequent task allocation decisions.
Through these technical innovations, the key problems in server cluster management, such as complex configuration management, inaccurate load monitoring, and unreasonable task scheduling, are effectively solved. In practical application, the scheme adapts to the dynamic management requirements of large-scale server clusters, and accurate load monitoring with intelligent task scheduling ensures stable operation of the log acquisition system. The method is particularly suitable for microservice architecture environments, and distributed monitoring with dynamic adjustment markedly improves system availability and reliability. The systematic, innovative nature of the scheme allows it to support enterprise applications of different scales, achieving a comprehensive improvement of server cluster management efficiency through continuous optimization.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S301, creating a load balancing dispatching center process, reading the load threshold and priority parameters in the server cluster configuration table, and creating a timed system load acquisition task, wherein the task obtains CPU utilization, memory occupancy, disk IO, and network bandwidth data from the load monitoring agent of each server to be monitored at a preset time interval and stores the system load information in a load state cache table;
Step S302, calculating the available resource capacity of each server based on the system load information in the load state cache table, evaluating the task volume of the log acquisition tasks, matching the task volume with the available resource capacity to generate a task allocation scheme, constructing a task scheduling queue from the allocation scheme combined with the priority parameters in the server cluster configuration table, and distributing the task scheduling queue to the load monitoring agents of the servers to be monitored.
Optionally, this embodiment adopts a microservice architecture in the design of the load balancing dispatching center. The dispatching center process is implemented with the Spring Cloud framework, and service discovery and registration are realized through a service registry (Eureka). At startup, the process first obtains the server cluster configuration from the configuration center; a configuration hot-update mechanism supports dynamic refresh, avoiding service interruption from restarts.
This embodiment realizes an accurate load information acquisition mechanism. Timed task scheduling uses the Quartz framework, with CronTrigger expressions configured for flexible scheduling strategies. The acquisition interval defaults to 30 seconds and can be adjusted dynamically according to actual requirements. Acquisition runs asynchronously, processing data collection requests to multiple servers in parallel with CompletableFuture, markedly improving efficiency.
This embodiment designs an efficient cache management strategy. The load state cache table is deployed on a Redis cluster, with server load information stored in hash structures. Keys follow a "serverId:timestamp" format, preserving the time order of the data. A multi-level caching mechanism is also implemented: hot data is kept in a local cache (Caffeine), reducing network request overhead. Cache eviction uses the LRU algorithm, and reasonable expiration times are set to prevent stale cache data.
The embodiment builds an innovative resource capacity calculation model. The available resource capacity is comprehensively evaluated through a plurality of dimension indexes, and the calculation formula is as follows:
Capacity = min(1 − CPURatio, 1 − MemRatio, 1 − IORatio, 1 − NetRatio) × BaseCapacity, where BaseCapacity is the server's base processing capacity. Each index is normalized to ensure dimensional consistency. A fluctuation coefficient is also introduced, and the short-term resource usage trend is predicted by exponential smoothing.
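The bottleneck-style capacity formula above, sketched directly (the sample ratios and base capacity are illustrative):

```python
def available_capacity(cpu_ratio, mem_ratio, io_ratio, net_ratio, base_capacity):
    """The scarcest resource dominates: capacity = min headroom * base capacity."""
    headroom = min(1 - cpu_ratio, 1 - mem_ratio, 1 - io_ratio, 1 - net_ratio)
    return headroom * base_capacity

# A server at 70% disk IO is IO-bound even though CPU and memory are fine
cap = available_capacity(0.5, 0.3, 0.7, 0.2, base_capacity=100)
```

Taking the minimum rather than a weighted sum reflects that one saturated resource throttles the whole server regardless of how idle the others are.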
This embodiment realizes an intelligent task volume evaluation mechanism. Evaluation of a log acquisition task considers multiple dimensions, including log file size, update frequency, and parsing complexity. The evaluation result is converted into a standardized resource demand, calculated as follows:
TaskLoad = FileSize × UpdateFreq × ComplexityFactor, where ComplexityFactor is determined dynamically according to the log format type. Task evaluation results are cached locally and refreshed periodically to avoid repeated calculation.
The present embodiment designs an accurate task matching algorithm. An improved best-fit algorithm arranges tasks in descending order of resource demand and assigns each task preferentially to the server whose remaining resources most closely match it. The matching process takes the geographical location of servers and network latency into account, preferring servers in the same region. When resource contention occurs, arbitration is performed by task priority, guaranteeing the resource needs of critical tasks.
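Combining the task-load estimate with best-fit matching might look like the following sketch. The loads, capacities, and tie-breaking rule are illustrative, and the full scheme additionally weighs locality and priority arbitration:

```python
def task_load(file_size_mb, update_freq, complexity_factor):
    # TaskLoad = FileSize * UpdateFreq * ComplexityFactor (units illustrative)
    return file_size_mb * update_freq * complexity_factor

def best_fit_assign(tasks, capacities):
    """Assign (name, load) tasks in descending load order to the server whose
    remaining capacity is smallest but still sufficient (best-fit)."""
    remaining = dict(capacities)
    assignment = {}
    for name, load in sorted(tasks, key=lambda t: -t[1]):
        candidates = [(cap, srv) for srv, cap in remaining.items() if cap >= load]
        if not candidates:
            continue  # would trigger priority arbitration in the full scheme
        _, srv = min(candidates)  # tightest fit wins
        assignment[name] = srv
        remaining[srv] -= load
    return assignment

tasks = [("app.log", task_load(10, 2, 1.0)),   # load 20
         ("db.log",  task_load(30, 1, 1.5)),   # load 45
         ("sys.log", task_load(5, 1, 1.0))]    # load 5
assignment = best_fit_assign(tasks, {"s1": 50, "s2": 25})
```

Sorting tasks largest-first before best-fit placement is the classic first-fit-decreasing refinement: large tasks get placed while capacity is still plentiful.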
This embodiment builds a reliable task scheduling queue. The queue is implemented as a priority queue and supports dynamic adjustment of task priorities. It is stored in Redis using a Sorted Set structure, with the score generated from a combination of the task priority and creation time, ensuring FIFO order among tasks of the same priority. The queue supports batch operations, improving task distribution efficiency.
This embodiment realizes an efficient task distribution mechanism. Distribution combines push and pull: the dispatching center actively pushes tasks to the agents, while the agents also pull task updates periodically. Task distribution uses a resumable-transfer mechanism, ensuring reliable delivery under network fluctuations. Real-time feedback of the task execution status is also realized, with automatic retry of failed tasks.
Through these technical innovations, the key problems in distributed log acquisition, such as inaccurate load balancing, unreasonable task allocation, and low scheduling efficiency, are effectively solved. In practical application, the scheme supports the dynamic scheduling requirements of large-scale server clusters, and accurate load assessment with intelligent task allocation markedly improves overall system performance. The method is particularly suitable for microservice architecture environments, and the flexible scheduling strategy with reliable task distribution ensures real-time, complete log acquisition. The systematic, innovative nature of the scheme allows it to adapt to enterprise applications of different scales, achieving a comprehensive improvement of scheduling efficiency through continuous optimization.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S401, based on the task scheduling queue, obtaining the network address and port number of a server to be monitored, establishing a long connection between the management terminal and the server to be monitored through the WebSocket protocol, storing the long connection in a connection pool indexed by the server identifier, and creating a connection pool manager, wherein the connection pool manager monitors connection states, automatically reestablishes disconnected connections, and maintains the availability of the connection pool;
Step S402, creating a log file reading position record table, wherein the log file reading position record table comprises a log file path, a file size, a last reading time stamp and a reading position offset field, the offset of the last reading position is obtained from the log file reading position record table before each reading of the log file, the offset is used as the starting position of a new reading round, and the time stamp and the offset in the log file reading position record table are updated after the reading is completed.
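An in-memory sketch of the position record table and the incremental read it enables. The text stores the table in Redis (and MongoDB in a later embodiment); a plain dict stands in here, and the field names are illustrative:

```python
import os
import tempfile
import time

position_table = {}  # (server_id, file_path) -> record; a Redis hash in the text

def read_increment(server_id, path):
    """Read only the bytes appended since the last recorded offset."""
    key = (server_id, path)
    offset = position_table.get(key, {"offset": 0})["offset"]
    with open(path, "rb") as f:
        f.seek(offset)           # resume from the previous position
        chunk = f.read()
    position_table[key] = {"offset": offset + len(chunk),
                           "file_size": os.path.getsize(path),
                           "last_read": time.time()}
    return chunk

# Simulate an append-only log file
path = tempfile.mkstemp()[1]
with open(path, "ab") as f:
    f.write(b"line1\n")
first = read_increment("server-1", path)   # entire file on first read
with open(path, "ab") as f:
    f.write(b"line2\n")
second = read_increment("server-1", path)  # only the newly appended bytes
```

Because the offset is persisted after each read, a restarted collector resumes exactly where it left off instead of re-reading the whole file.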
Optionally, this embodiment adopts a hierarchical handshake mechanism when establishing the WebSocket long connection. First, a WebSocket connection is established through an HTTP upgrade request whose header contains the Upgrade and Sec-WebSocket-Key fields. After verifying the request's validity, the server returns a response containing Sec-WebSocket-Accept to complete the protocol upgrade. Identity authentication is performed immediately after the connection is established, with a JWT (JSON Web Token) mechanism ensuring connection security.
This embodiment realizes an innovative connection pool management strategy. The connection pool uses a multi-level grouping structure, grouped by server cluster, geographic region, and service type. Each group maintains an independent connection counter and status monitor. Core parameters include the maximum connection count, minimum idle connection count, and connection time-to-live, all dynamically adjustable through the configuration center. A connection reuse mechanism is also realized: requests to the same target server preferentially reuse existing connections.
The present embodiment designs a reliable connection status monitoring mechanism. Status monitoring uses a heartbeat scheme: a ping frame is sent every 30 seconds, and if no pong response is received for 3 consecutive attempts, the connection is judged to be broken. The heartbeat interval can be adjusted dynamically according to network quality, shortening appropriately when conditions are poor. Connection quality evaluation is also realized: connections are scored by measuring round-trip time (RTT) and packet loss rate, and low-scoring connections are rebuilt preferentially.
The present embodiment builds an efficient reconnection mechanism. The reconnection strategy adopts an exponential backoff algorithm, the initial reconnection interval is 1 second, the interval time is doubled after each failure, and the maximum interval is limited to 60 seconds. The context information of the connection is saved in the reconnection process, including authentication state and session data, and the session state before the reconnection is successful can be quickly recovered. And meanwhile, concurrent reconnection control is realized, and the phenomenon that the server is too stressed due to the fact that excessive reconnection requests are initiated simultaneously is avoided.
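The exponential backoff described above can be sketched as follows (the optional jitter parameter is an addition not stated in the text, included only to illustrate how concurrent reconnects could be spread out):

```python
import random

def reconnect_intervals(max_attempts, base=1.0, cap=60.0, jitter=0.0):
    """Exponential backoff: start at 1 second, double after each failure,
    cap at 60 seconds. `jitter` adds a random extra delay (assumption)."""
    intervals = []
    delay = base
    for _ in range(max_attempts):
        intervals.append(min(delay, cap) + random.uniform(0, jitter))
        delay *= 2
    return intervals
```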
The embodiment realizes an accurate file position recording mechanism. The location record table adopts a distributed storage scheme using MongoDB; the document structure design contains fields such as _id (a unique identifier generated from the server ID and file path), filePath (log file absolute path), fileSize (current file size), lastReadTime (last read timestamp), and offset (read position offset). Meanwhile, a composite index is established to optimize query performance.
The present embodiment contemplates an innovative location update strategy. Concurrent updates are handled using an optimistic lock mechanism, with the atomicity of the update operation controlled by a version number field. The update operation uses atomic update commands to ensure data consistency in high-concurrency scenarios. Meanwhile, an incremental update mechanism is realized: only the changed fields are updated, reducing the data transmission volume. To prevent data loss, the update operation employs a write concern mechanism to ensure that the data is written to enough replicas.
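The optimistic-lock update can be illustrated with an in-memory stand-in for the MongoDB collection (with pymongo this would map to an atomic `find_one_and_update` filtered on the version field — a hedged sketch, not the embodiment's exact commands):

```python
class VersionConflict(Exception):
    """Raised when a concurrent writer has already bumped the version."""

def update_offset(store, key, new_offset, expected_version):
    """Compare-and-set update keyed on a version number (optimistic lock).

    `store` is a plain dict standing in for the MongoDB collection.
    """
    doc = store[key]
    if doc["version"] != expected_version:
        raise VersionConflict(
            f"expected v{expected_version}, got v{doc['version']}")
    doc["offset"] = new_offset
    doc["version"] += 1          # bump so stale writers are rejected
    return doc
```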
The present embodiment builds a reliable file reading mechanism. The reading process adopts a buffered mode with a default buffer size of 8KB, which can be dynamically adjusted according to the file size. File pointer positioning uses seek operations to support reading from a specified offset. Meanwhile, file state detection is realized: whether the file has been modified or deleted is verified before reading, preventing invalid data from being read. For log files written by appending, file changes are continuously monitored in tail -f mode.
The embodiment realizes a complete exception handling flow. When the file is read abnormally, detailed error information including file states, system error codes and the like is recorded. For temporary errors (e.g., file locking), a retry mechanism is employed, and for permanent errors (e.g., file deletion), upper layer applications are notified in time and the location record status is updated. Meanwhile, data consistency verification is realized, and the integrity of read data is verified through a checksum mechanism.
Through the technical innovation, the key problems in distributed log acquisition are effectively solved, such as complex connection management, inaccurate position record, unreliable file reading and the like. In practical application, the scheme can support the real-time log acquisition requirement of a large-scale server cluster, and the reliability and efficiency of data acquisition are remarkably improved through reliable connection management and accurate position recording. The method is particularly suitable for a micro-service architecture environment, and ensures the integrity and continuity of log data through flexible connection pool management and a reliable file reading mechanism. The system and the innovation of the scheme enable the system to adapt to enterprise application requirements of different scales, and comprehensive improvement of a log acquisition system is achieved through continuous optimization and improvement.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S501, obtaining offset from the log file reading position record table, reading newly added log data by taking the offset as a file pointer position, performing line segmentation processing on the read log data, calculating a hash value of each line of data, obtaining a hash value of historical data from a log data cache according to a preset time window range, comparing the hash value of current data with the hash value of the historical data, and writing a de-duplication result into a compression buffer area after eliminating duplicate data;
Step S502, executing an LZ4 compression algorithm on the log data in the compression buffer to generate a compressed data block, adding a data identification header to the compressed data block, wherein the data identification header comprises the size of the data block, the type of the compression algorithm and checksum information, transmitting the compressed data block to a management terminal through websocket long connection, decompressing the compressed data by the management terminal according to a decompression algorithm selected by the data identification header, and writing the decompressed data into a log storage area.
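The hash-based deduplication of step S501 can be sketched in Python (stdlib SHA-1 stands in for the MurmurHash named later in this embodiment; the dict-based cache and eviction policy are assumptions):

```python
import hashlib
import time

def dedup_lines(lines, seen, now=None, window=300.0):
    """Drop lines whose hash was already seen within `window` seconds.

    `seen` maps line-hash -> last-seen timestamp (the log data cache).
    """
    now = time.time() if now is None else now
    # evict hashes that have fallen out of the time window
    for h in [h for h, t in seen.items() if now - t > window]:
        del seen[h]
    unique = []
    for line in lines:
        h = hashlib.sha1(line.encode("utf-8")).hexdigest()
        if h not in seen:
            unique.append(line)     # first occurrence within the window
        seen[h] = now               # refresh last-seen timestamp
    return unique
```

The returned `unique` list is what would be written into the compression buffer.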
Optionally, the present embodiment implements an efficient incremental reading mechanism based on file pointer locations. Firstly, the mmap technology is utilized to map the log file to the memory, so that frequent disk IO operation is avoided. The reading process adopts a double-buffer area design, one buffer area is used for data reading, the other buffer area is used for data processing, and continuous processing of data is realized through buffer area exchange. The size of the buffer area is dynamically adjusted according to the file updating frequency, and the capacity of the buffer area is properly increased under the high-frequency writing scene.
The embodiment realizes an accurate line segmentation algorithm. Considering the line-ending differences between operating systems, three line-ending styles are supported simultaneously: LF (\n), CR (\r) and CRLF (\r\n). The segmentation process adopts a zero-copy technique, avoiding data copying through character pointer operations. Meanwhile, multi-character-set support is realized: log files in encoding formats such as UTF-8 and GBK can be processed correctly, ensuring the accuracy of line segmentation.
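Line splitting with the multi-charset fallback might look like this (the UTF-8-then-GBK fallback order is an assumption):

```python
def split_log_lines(buf: bytes) -> list:
    """Split raw log bytes on LF, CR, or CRLF line endings.

    Decoding tries UTF-8 first and falls back to GBK, mirroring the
    multi-charset support described above.
    """
    try:
        text = buf.decode("utf-8")
    except UnicodeDecodeError:
        text = buf.decode("gbk")
    # str.splitlines recognizes \n, \r and \r\n (among other separators)
    return text.splitlines()
```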
The embodiment designs an innovative hash calculation method. To improve calculation efficiency, the MurmurHash algorithm is adopted, which has the characteristics of high speed and low collision rate. Timestamp information is taken into account in the hash calculation: the timestamp participates as a seed value, improving the uniqueness of the hash value. Meanwhile, parallel hash calculation is realized, with multiple threads processing different data blocks simultaneously, remarkably improving processing speed.
The present embodiment builds a reliable duplicate data detection mechanism. The time window adopts a sliding window design, the window size is configurable, and the default is 5 minutes. The hash value storage in the window adopts a bloom filter, and the misjudgment rate is reduced through a plurality of hash functions. Parameters of the bloom filter are dynamically calculated according to expected data quantity, and the bitmap size and the number of hash functions keep the optimal proportion. And data exceeding the time window is automatically cleaned, so that the continuous increase of memory occupation is avoided.
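A minimal bloom filter matching the description above — bitmap size and hash-function count derived from the expected item count and target false-positive rate using the standard formulas (concrete parameters are not given in the text, so the defaults here are assumptions):

```python
import hashlib
import math

class BloomFilter:
    """Bitmap size m = -n*ln(p)/ln(2)^2, hash count k = (m/n)*ln(2)."""

    def __init__(self, n, p=0.01):
        self.m = max(1, int(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, int(round(self.m / n * math.log(2))))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # double hashing: derive k positions from two 64-bit halves of SHA-256
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))
```

Keeping m and k in this proportion gives the optimal bitmap-to-hash ratio mentioned above.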
The embodiment realizes an efficient compression strategy. The LZ4 compression algorithm is selected based on the rapid compression and decompression characteristics, and is particularly suitable for text data with more repeated modes, such as log data. The compression process uses a streaming mode, the data block size is set to 64KB, which balances compression ratio and processing delay. Meanwhile, dynamic adjustment of compression levels is realized, and different compression levels are selected according to the load condition of the CPU.
The present embodiment designs a complete data identification header structure. The identification header contains fields such as magic number (for fast identification of data blocks), version number, compression algorithm type, original data size, compressed size, time stamp, checksum, etc. And the checksum adopts a CRC32 algorithm to check the compressed data, so that the integrity of data transmission is ensured. The design of the identification header supports backward compatibility, facilitating future expansion of new compression algorithms or addition of new metadata fields.
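The identification header can be sketched with a fixed struct layout (the field order and magic value are assumptions; stdlib zlib stands in for LZ4 so the sketch stays self-contained — lz4 is a third-party package):

```python
import struct
import zlib

MAGIC = 0x4C4F4744  # assumed magic number ("LOGD"); not specified in the text
# magic, version, algo type, raw size, compressed size, timestamp, CRC32
HEADER = struct.Struct(">IBBIIQI")

def pack_block(raw: bytes, timestamp: int, algo: int = 1) -> bytes:
    """Compress a block and prepend the identification header."""
    comp = zlib.compress(raw)
    header = HEADER.pack(MAGIC, 1, algo, len(raw), len(comp),
                         timestamp, zlib.crc32(comp))
    return header + comp

def unpack_block(block: bytes) -> bytes:
    """Verify the header and checksum, then decompress."""
    magic, version, algo, raw_size, comp_size, ts, crc = HEADER.unpack_from(block)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    comp = block[HEADER.size:HEADER.size + comp_size]
    if zlib.crc32(comp) != crc:
        raise ValueError("checksum mismatch")
    raw = zlib.decompress(comp)
    if len(raw) != raw_size:
        raise ValueError("size mismatch after decompression")
    return raw
```

Unknown fields are simply ignored by older readers, which is what makes the header backward compatible.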
The present embodiment builds a reliable data transmission mechanism. The WebSocket data frame adopts a binary mode, so that the encoding overhead of a text mode is avoided. The transmission process realizes flow control, and the sending rate is controlled by a sliding window mechanism to prevent the buffer area of the receiving end from overflowing. Meanwhile, the slicing transmission of the data frame is realized, the single slicing size is limited within 1MB, and the transmission stability is ensured.
The embodiment realizes a high-efficiency decompression and reduction flow. After receiving the data, the management terminal firstly verifies the integrity and the correctness of the data identification head. And selecting a corresponding decompression method according to the type of the compression algorithm, wherein LZ4 decompression adopts stream processing and supports parallel decompression of a plurality of data blocks. And performing format verification before the decompressed data is written into the log storage area, so as to ensure the validity of the data. The storage area adopts a slicing storage strategy, and the data is sliced according to the time range, so that the subsequent inquiry and analysis are convenient.
By the technical innovation, the key problems in log acquisition, such as repeated data, low transmission efficiency, large decompression delay and the like, are effectively solved. In practical application, the scheme can support the real-time log acquisition requirement of a large-scale server cluster, and the occupation of network bandwidth and the consumption of storage space are obviously reduced through an efficient de-duplication compression and reliable transmission mechanism. The method is particularly suitable for a high-concurrency micro-service environment, and ensures the real-time performance and the integrity of log acquisition through stream processing and parallel optimization. The system and the innovation of the scheme enable the system to adapt to enterprise application requirements of different scales, and comprehensive improvement of a log acquisition system is achieved through continuous optimization and improvement.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S601, reading historical log data from a log storage area, carrying out text preprocessing on the historical log data, extracting a feature set formed by a timestamp, an event type, an operation object, a state code, an execution result, an error code, a user identifier and an operation instruction in a log, converting the feature set into vector representation, constructing an anomaly detection model and a pattern recognition model by adopting an LSTM (Long Short-Term Memory) neural network, constructing an event correlation analysis model by adopting a graph neural network, training the anomaly detection model, the pattern recognition model and the event correlation analysis model by using the feature vector, and storing the trained model into a model library;
Step S602, loading an anomaly detection model, a pattern recognition model and an event association analysis model from the model library to construct a log analysis engine, converting the restored log data into feature vectors, inputting the feature vectors into the log analysis engine, calculating the anomaly scores of the feature vectors by the anomaly detection model, carrying out multi-classification prediction on the log content by the pattern recognition model to output category labels, mining time sequence association and causal relation among log events by the event association analysis model based on a graph structure, and generating anomaly event early warning information according to the anomaly scores, the category labels and the association relation.
Optionally, the embodiment first implements an efficient log text preprocessing procedure. A regular expression engine is adopted for structural parsing, and a matching rule base is established for log templates of different formats. The parsing process considers various timestamp formats, supporting automatic identification and conversion of standard formats such as ISO 8601 and Unix timestamps. For unstructured log content, key information is extracted using sliding windows and word-frequency statistics, ensuring the completeness of feature extraction.
The present embodiment designs an innovative feature engineering scheme. The time features are analyzed through time stamps to obtain specific time granularity information, wherein the specific time granularity information comprises periodic features such as hours, weeks, months and the like. The event type and the operation object are converted into dense vectors by a Word embedding technology, and 300-dimensional vector representation is obtained by training the Word2Vec model. The state code and the error code are converted through One-Hot coding, and meanwhile original numerical information is reserved, so that the association relation between the error codes is conveniently captured.
The present embodiment constructs a deep anomaly detection model. The LSTM network adopts a bidirectional structure comprising two LSTM layers, with 128 hidden units in each layer. The network input is a sequence of time-ordered features, each time step containing a vector representation of all features. Training adopts a contrastive learning strategy, with normal sequences as positive samples and artificially constructed abnormal sequences as negative samples. The loss function combines contrastive loss and reconstruction loss, improving the model's sensitivity to abnormal patterns.
The embodiment realizes an accurate pattern recognition model. Based on the LSTM sequence classification network, an attention mechanism is added above the LSTM layer, adaptively focusing on important time steps and feature dimensions. The attention weight is calculated through a softmax function, so that the dynamic adjustment of the feature importance is realized. The model output layer uses a multi-label classification structure, supports the simultaneous identification of a plurality of class labels for a single log, and improves the classification flexibility.
The present embodiment designs an innovative graph neural network model. The nodes of the event association graph represent log events, and the edges represent association relations among the events. The node characteristics contain all attribute information of the event, and the weight of the edge is calculated through the time interval and the characteristic similarity. The graph neural network adopts a graph attention network (GAT) structure, and learns complex dependency relationships among nodes through a multi-head attention mechanism, so that causal relations in an event chain are effectively captured.
The embodiment realizes an efficient model training mechanism. Training data is organized in a sliding window mode, and the window size is adaptively adjusted according to time correlation. The batch size is set to 64, parameter updates are performed using the Adam optimizer, and the learning rate is gradually increased to 0.001 at the beginning of training using a warm-up strategy and then decayed using cosine annealing. Meanwhile, early stopping is implemented: training stops when validation-set performance fails to improve for 5 consecutive epochs.
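The warm-up-then-cosine-annealing schedule described above can be sketched as a pure function of the step index (the linear warm-up shape and the zero floor at the end of training are assumptions):

```python
import math

def lr_schedule(step, total_steps, warmup_steps, peak_lr=1e-3):
    """Linear warm-up to peak_lr over warmup_steps, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```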
The present embodiment builds a complete model deployment framework. The model library adopts a distributed storage architecture, and supports version management and quick rollback of the model. The model loading adopts a delay loading strategy, and the required model is dynamically loaded according to actual requirements. Model optimization is carried out in ONNX format in the reasoning process, and reasoning efficiency is improved through operator fusion and quantization optimization. Meanwhile, a hot updating mechanism of the model is realized, and online updating is supported without influencing the service.
The embodiment designs a reliable abnormality early warning mechanism. The anomaly score is mapped to the [0,1] interval through probability distribution transformation, and a dynamic threshold is set in combination with expert rules. And comparing the category prediction result with the historical statistical distribution, and finding abnormal category distribution change. The event correlation analysis results are used for constructing a fault propagation diagram and predicting potential fault diffusion paths. The early warning information comprises abnormal description, influence range and processing advice, and supports a multi-level warning strategy.
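The mapping of raw anomaly scores onto [0, 1] and the rule-adjusted threshold might be sketched as follows (the sigmoid-over-z-score transform and the additive rule adjustment are assumptions — the text specifies only a probability-style mapping combined with expert rules):

```python
import math

def normalize_score(raw_score, mean, std):
    """Map a raw anomaly score to [0, 1] via a sigmoid over its z-score."""
    z = (raw_score - mean) / max(std, 1e-9)
    return 1.0 / (1.0 + math.exp(-z))

def should_alert(raw_score, mean, std, base_threshold=0.9, rule_boost=0.0):
    """Dynamic threshold: expert rules can lower the bar via rule_boost."""
    return normalize_score(raw_score, mean, std) >= base_threshold - rule_boost
```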
Through the technical innovation, the key problems in the intelligent log analysis are effectively solved, such as incomplete feature extraction, inaccurate anomaly detection, insufficient correlation analysis and the like. In practical application, the scheme can accurately identify system anomalies, and provides more and more accurate analysis results through continuous optimization of the deep learning model. The method is particularly suitable for operation and maintenance monitoring of a large-scale distributed system, and the efficiency of problem positioning and fault prevention is remarkably improved through multi-dimensional intelligent analysis. The system and the innovation of the scheme enable the system to adapt to the log analysis requirements of different types, and the comprehensive improvement of the log analysis capability is realized through continuous learning and optimization of the model.
In an embodiment of the method for processing a log of a distributed server cluster of the present application, the method may further specifically include the following:
Step S701, creating a management terminal visual panel, dividing the abnormal event early warning information into three levels of high, medium and low according to the abnormal level and displaying it in an early warning information area, displaying the log classification marking result in the form of a pie chart, displaying the association strength among log event nodes in the association analysis result through a force-directed graph, creating a time range selector in the visual panel, and updating visual data in real time according to the start-stop time of the time range selector;
Step S702 is to monitor a download button click event of a user in the visual panel, acquire start-stop time parameters in a time range selector, query abnormal event early warning information, log classification labeling results and associated analysis results in the time range from a database, convert the query results into a JSON format, execute a ZIP compression algorithm on the JSON format data to generate a compression file, and store the compression file to a local storage path appointed by the user.
Optionally, the present embodiment employs a modular architecture in the design of the visualization panel. The front-end application is built on the React framework, with Redux for state management, realizing data flow and state synchronization among all visual components. The panel layout adopts a responsive design implemented through CSS Grid and Flexbox, so that good display effects are obtained on different screen sizes.
The embodiment realizes an accurate early warning information display mechanism. The classification of anomaly level is based on multidimensional scoring, including factors such as impact range, duration, business importance, etc. The high level alert uses a red highlighting and is provided with a flashing animation effect, the medium level uses orange and the low level uses yellow. The early warning information area adopts card type layout, and each warning card contains information such as time, type, description, processing advice and the like, so that the unfolding and viewing of details are supported.
The embodiment designs a dynamic classification result visualization. The pie chart is implemented with the ECharts library and supports multi-level category display, with main categories shown in the inner ring and subcategory distribution in the outer ring. Chart interaction supports click filtering and drill-down analysis of categories, displaying detailed statistics on hover. The color scheme is specially designed to ensure good readability and an attractive appearance. Meanwhile, an automatic chart zooming function is realized, adapting to display areas of different sizes.
The present embodiment builds an innovative association analysis visualization. The force-directed graph is implemented with the D3.js library; node size represents event importance and color represents event type. The thickness and color of the edges represent association strength, and arrows indicate the timing relationship of event occurrence. The graph layout adopts a force-directed algorithm that automatically adjusts node positions and avoids node overlap. Meanwhile, zooming, panning and node dragging are supported, improving the interaction experience.
The present embodiment realizes a reliable time range selector. The selector adopts a double-slider design, supporting time range selection accurate to the second. Time selection supports shortcuts such as last hour, today, and this week. A change of the selector triggers a real-time data update; the update process applies debounce handling to avoid frequent data requests. Meanwhile, automatic time zone identification and conversion are realized, ensuring that correct times are displayed in different regions.
The present embodiment designs an efficient data update mechanism. The server pushing is realized by adopting the WebSocket long connection, and when a new analysis result exists, the server actively pushes data to the front end. The data updating adopts an incremental updating strategy, only the changed part is transmitted, and the network transmission quantity is reduced. Meanwhile, a data caching mechanism is realized, data in a common time range is cached in the browser, and the query response speed is improved.
The present embodiment builds a complete data export function. And monitoring a clicking event of the downloading button, and adopting throttling processing to avoid repeated clicking. The data query adopts a paging mode, so that memory overflow caused by acquiring excessive data at one time is avoided. And (3) data cleaning is performed during JSON format conversion, unnecessary fields are removed, and the data structure is optimized. The compression process is realized by Web Workers, so that the blocking of a main thread is avoided, and the user experience is improved.
The embodiment realizes a reliable file storage mechanism. File saving is achieved using the native File System Access API, supporting user selection of the save path. Large files are saved with streaming processing, and segmented writing of Blob objects avoids excessive memory occupation. Meanwhile, a saving progress prompt and an error retry mechanism are realized, improving the reliability of file saving. For browsers that do not support the new API, a compatible download scheme is provided.
Through the technical innovation, the key problems in log analysis visualization, such as non-visual display effect, poor interaction experience, inconvenient data export and the like, are effectively solved. In practical application, the scheme can intuitively display the running state of the system, and the working efficiency of operation and maintenance personnel is remarkably improved through abundant visual components and smooth interaction experience. The method is particularly suitable for monitoring scenes of a large-scale distributed system, and meets analysis requirements of different layers through multi-dimensional data display and flexible data export. The systematic and innovative design of the scheme can adapt to different types of visual requirements, and the comprehensive improvement of the log analysis system is realized through continuous optimization and improvement.
In order to effectively solve the shortcomings of the conventional technology in terms of distributed acquisition, data transmission, intelligent analysis and the like, the application provides an embodiment of a distributed server cluster log processing device for realizing all or part of the content of the distributed server cluster log processing method, and referring to fig. 2, the distributed server cluster log processing device specifically comprises the following contents:
the dynamic allocation module 10 is configured to establish a distributed server cluster log acquisition network, create a server cluster configuration table on a management terminal, where the server cluster configuration table includes network addresses, port numbers, access credentials and server load thresholds of each server to be monitored, establish a load balancing scheduling center based on the server cluster configuration table, periodically acquire system load information of each server to be monitored, dynamically allocate log acquisition tasks according to the system load information, and generate task scheduling queues according to preset task priorities;
The log processing module 20 is configured to construct an incremental log data acquisition channel, establish a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, record a latest reading position of each log file, perform incremental data acquisition according to the latest reading position, perform real-time compression encoding on the acquired log data, detect repeated data based on a sliding time window, reject the repeated data, transmit the compression encoded data to the management terminal, and decompress and restore the received compression encoded data at the management terminal;
The log analysis module 30 is configured to construct an intelligent log analysis model, train historical log data by using a deep learning method, construct an analysis engine including an anomaly detection model, a pattern recognition model and an event association analysis model, input the restored log data into the analysis engine, calculate an anomaly score based on a log feature vector by the anomaly detection model, classify and label log content by the pattern recognition model, mine association relations among multidimensional logs by the event association analysis model, generate anomaly event early warning information, visually display the anomaly event early warning information, log classification and label results and association analysis results in the management terminal interface, and package and compress analysis result data in a specified time range and then download the analysis result data to local storage when a user trigger downloading operation is detected.
From the above description, the distributed server cluster log processing device provided by the embodiment of the application can dynamically allocate the acquisition tasks through the load balancing dispatching center by constructing the distributed server cluster log acquisition network. And designing an incremental data acquisition channel, realizing high-efficiency transmission by utilizing a websocket long connection pool, and optimizing transmission efficiency by combining a real-time compression coding and repeated data detection mechanism. An intelligent log analysis model is constructed, and comprises three sub-models of anomaly detection, pattern recognition and event association analysis, and anomaly score calculation, log classification marking and multidimensional association analysis are realized based on a deep learning method, and visual display and data export functions are provided. The method effectively solves the defects of the traditional technology in the aspects of distributed acquisition, data transmission, intelligent analysis and the like, and remarkably improves the performance and practicability of the log processing system.
In order to effectively solve the defects of the traditional technology in terms of distributed acquisition, data transmission, intelligent analysis and the like in terms of hardware level, the application provides an embodiment of an electronic device for realizing all or part of contents in a distributed server cluster log processing method, which specifically comprises the following contents:
The system comprises a processor (processor), a memory (memory), a communication interface (Communications Interface) and a bus. The processor, the memory and the communication interface communicate with one another through the bus. The communication interface is used for information transmission among the distributed server cluster log processing device, a core service system, a user terminal, and related equipment such as an associated database. The logic controller may be a desktop computer, a tablet computer, a mobile terminal, or the like. In this embodiment, the logic controller may refer to the embodiment of the distributed server cluster log processing method and the embodiment of the distributed server cluster log processing device; their contents are incorporated herein and are not repeated here.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, etc. Wherein, intelligent wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
In practical applications, part of the distributed server cluster log processing method may be executed on the electronic device side as described above, or all operations may be completed in the client device. Specifically, the selection may be made according to the processing capability of the client device and the restrictions of the user's usage scenario; the application is not limited in this regard. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Fig. 3 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 3, the electronic device 9600 can include a central processor 9100 and a memory 9140, the memory 9140 being coupled to the central processor 9100. It is noted that fig. 3 is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications functions or other functions.
In one embodiment, the distributed server cluster log processing method functionality may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step S101, a distributed server cluster log acquisition network is established, a server cluster configuration table is established on a management terminal, the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, a load balancing dispatching center is established based on the server cluster configuration table, the load balancing dispatching center periodically acquires system load information of all servers to be monitored, dynamic allocation is carried out on log acquisition tasks according to the system load information, and task dispatching queues are generated according to preset task priorities;
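The dynamic allocation in Step S101 can be illustrated with a minimal Python sketch. This is not the patented implementation: the dict-based records, the single scalar load score, and the fixed per-task cost increment are all illustrative assumptions standing in for the server cluster configuration table and the load balancing dispatching center described above.

```python
import heapq

def build_schedule(servers, tasks):
    """Assign log-collection tasks to the least-loaded eligible servers.

    servers: list of dicts with 'id', 'load' (0..1) and 'threshold' fields,
             a stand-in for the server cluster configuration table.
    tasks:   list of dicts with 'name' and 'priority' (lower = more urgent).
    Returns an ordered task-scheduling queue of (task_name, server_id) pairs.
    """
    # Min-heap keyed by current load; servers over their threshold are excluded.
    heap = [(s['load'], s['id']) for s in servers if s['load'] < s['threshold']]
    heapq.heapify(heap)
    queue = []
    # Dispatch urgent tasks first, always to the currently least-loaded server.
    for task in sorted(tasks, key=lambda t: t['priority']):
        if not heap:
            break  # every server is over its load threshold
        load, sid = heapq.heappop(heap)
        queue.append((task['name'], sid))
        # Hypothetical fixed per-task cost; a real scheduler would estimate it
        # from the periodically collected system load information.
        heapq.heappush(heap, (load + 0.1, sid))
    return queue
```

A real dispatching center would refresh the load figures on a timer from the monitoring agents; here they are passed in as a snapshot.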
Step S102, constructing an incremental log data acquisition channel, establishing a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition according to the latest reading position, carrying out real-time compression coding on the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression coded data at the management terminal;
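Step S102 combines three mechanisms: offset-based incremental reading, per-line hash comparison inside a sliding time window, and compression before transmission. The sketch below is a simplified illustration of those mechanisms, not the patented channel: it uses zlib from the Python standard library in place of the LZ4 algorithm named in a later embodiment, and in-memory tables stand in for the read-position record table and the log data cache.

```python
import hashlib
import time
import zlib

class IncrementalCollector:
    """Read only newly appended log lines, drop lines already seen inside a
    sliding time window, and compress the remainder for transmission."""

    def __init__(self, window_seconds=60):
        self.offsets = {}        # file path -> last byte offset read
        self.seen = {}           # line hash -> last-seen timestamp
        self.window = window_seconds

    def collect(self, path):
        pos = self.offsets.get(path, 0)
        with open(path, 'rb') as f:
            f.seek(pos)          # resume from the recorded read position
            chunk = f.read()
            self.offsets[path] = f.tell()
        now = time.time()
        # Evict hashes that fell out of the sliding time window.
        self.seen = {h: t for h, t in self.seen.items() if now - t < self.window}
        fresh = []
        for line in chunk.splitlines():
            h = hashlib.sha1(line).hexdigest()
            if h not in self.seen:   # duplicate detection by per-line hash
                self.seen[h] = now
                fresh.append(line)
        # zlib here is a standard-library stand-in for the LZ4 encoding.
        return zlib.compress(b'\n'.join(fresh))
```

On the management terminal side the symmetric step is simply `zlib.decompress(...)` followed by writing the restored lines to the log storage area.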
Step S103, an intelligent log analysis model is constructed, historical log data is trained by a deep learning method, and an analysis engine comprising an anomaly detection model, a pattern recognition model and an event association analysis model is constructed; the restored log data is input into the analysis engine, the anomaly detection model calculates anomaly scores based on log feature vectors, the pattern recognition model classifies and annotates the log content, and the event association analysis model mines the association relationships among multi-dimensional logs to generate abnormal event early warning information; the abnormal event early warning information, the log classification and annotation results and the association analysis results are visually displayed in the interface of the management terminal, and when a user-triggered download operation is detected, the analysis result data within a specified time range is packaged, compressed and downloaded to local storage.
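Step S103 relies on a trained LSTM anomaly detection model; the sketch below is only a hedged stand-in that scores feature vectors by their deviation from historical statistics. It conveys how one scalar anomaly score per log record could be derived from log feature vectors, without reproducing the deep learning pipeline described above.

```python
import math

def anomaly_scores(history, current):
    """Score each current feature vector by its largest per-dimension
    deviation from the historical mean, in units of historical standard
    deviation (a z-score stand-in for the trained LSTM detector)."""
    dims = len(history[0])
    means = [sum(v[i] for v in history) / len(history) for i in range(dims)]
    # Population standard deviation per dimension, floored to avoid
    # division by zero on constant features.
    stds = [max(1e-9, math.sqrt(sum((v[i] - means[i]) ** 2 for v in history)
                                / len(history))) for i in range(dims)]
    scores = []
    for vec in current:
        # Max deviation across dimensions -> one anomaly score per record.
        scores.append(max(abs(vec[i] - means[i]) / stds[i] for i in range(dims)))
    return scores
```

In the patented design these scores would feed the early warning step: records whose score exceeds a configured threshold become abnormal event warning information shown on the management terminal.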
From the above description, it can be seen that the electronic device provided by the embodiment of the present application constructs a distributed server cluster log collection network and dynamically distributes collection tasks through a load balancing dispatching center. An incremental data acquisition channel is designed, efficient transmission is achieved through a websocket long-connection pool, and transmission efficiency is further optimized by combining real-time compression coding with a duplicate-data detection mechanism. An intelligent log analysis model comprising three sub-models, namely anomaly detection, pattern recognition and event association analysis, is constructed; anomaly score calculation, log classification and annotation, and multi-dimensional association analysis are realized based on a deep learning method, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of the traditional technology in distributed acquisition, data transmission, intelligent analysis and the like, and remarkably improves the performance and practicability of the log processing system.
In another embodiment, the distributed server cluster log processing device may be configured separately from the central processor 9100; for example, the device may be configured as a chip connected to the central processor 9100, and the functions of the distributed server cluster log processing method are implemented under the control of the central processor.
As shown in fig. 3, the electronic device 9600 may further include a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all the components shown in fig. 3; furthermore, the electronic device 9600 may include components not shown in fig. 3, for which reference may be made to the prior art.
As shown in fig. 3, the central processor 9100, sometimes referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device. The central processor 9100 receives input and controls the operation of each component of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store failure-related information as well as the programs for processing such information. The central processor 9100 can execute the programs stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid-state memory, such as a read-only memory (ROM), a random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered down, and that can be selectively erased and provided with further data; an example of such a memory is sometimes referred to as an EPROM. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer) and an application/function storage portion 9142, the application/function storage portion 9142 storing application programs and function programs, or flows for executing the operations of the electronic device 9600 by the central processor 9100.
The memory 9140 may also include a data store 9143 for storing data such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. A driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., a messaging application, an address book application, etc.).
The communication module 9110 is a transmitter/receiver that transmits and receives signals via the antenna 9111. The communication module 9110 (transmitter/receiver) is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module 9110 (transmitter/receiver) is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.
The embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps of the distributed server cluster log processing method whose execution subject is a server or a client. A computer program is stored on the computer-readable storage medium, and when executed by a processor the computer program implements all the steps of that method; for example, the processor implements the following steps when executing the computer program:
Step S101, a distributed server cluster log acquisition network is established, a server cluster configuration table is established on a management terminal, the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, a load balancing dispatching center is established based on the server cluster configuration table, the load balancing dispatching center periodically acquires system load information of all servers to be monitored, dynamic allocation is carried out on log acquisition tasks according to the system load information, and task dispatching queues are generated according to preset task priorities;
Step S102, constructing an incremental log data acquisition channel, establishing a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition according to the latest reading position, carrying out real-time compression coding on the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression coded data at the management terminal;
Step S103, an intelligent log analysis model is constructed, historical log data is trained by a deep learning method, and an analysis engine comprising an anomaly detection model, a pattern recognition model and an event association analysis model is constructed; the restored log data is input into the analysis engine, the anomaly detection model calculates anomaly scores based on log feature vectors, the pattern recognition model classifies and annotates the log content, and the event association analysis model mines the association relationships among multi-dimensional logs to generate abnormal event early warning information; the abnormal event early warning information, the log classification and annotation results and the association analysis results are visually displayed in the interface of the management terminal, and when a user-triggered download operation is detected, the analysis result data within a specified time range is packaged, compressed and downloaded to local storage.
As can be seen from the above description, the computer-readable storage medium provided by the embodiments of the present application constructs a distributed server cluster log collection network and dynamically distributes collection tasks through a load balancing dispatching center. An incremental data acquisition channel is designed, efficient transmission is achieved through a websocket long-connection pool, and transmission efficiency is further optimized by combining real-time compression coding with a duplicate-data detection mechanism. An intelligent log analysis model comprising three sub-models, namely anomaly detection, pattern recognition and event association analysis, is constructed; anomaly score calculation, log classification and annotation, and multi-dimensional association analysis are realized based on a deep learning method, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of the traditional technology in distributed acquisition, data transmission, intelligent analysis and the like, and remarkably improves the performance and practicability of the log processing system.
The embodiment of the present application further provides a computer program product capable of implementing all the steps of the distributed server cluster log processing method whose execution subject in the above embodiment is a server or a client. When executed by a processor, the computer program/instructions implement the steps of the distributed server cluster log processing method; for example, the computer program/instructions implement the following steps:
Step S101, a distributed server cluster log acquisition network is established, a server cluster configuration table is established on a management terminal, the server cluster configuration table comprises network addresses, port numbers, access certificates and server load thresholds of all servers to be monitored, a load balancing dispatching center is established based on the server cluster configuration table, the load balancing dispatching center periodically acquires system load information of all servers to be monitored, dynamic allocation is carried out on log acquisition tasks according to the system load information, and task dispatching queues are generated according to preset task priorities;
Step S102, constructing an incremental log data acquisition channel, establishing a websocket long connection pool between the management terminal and each server to be monitored based on the task scheduling queue, recording the latest reading position of each log file, carrying out incremental data acquisition according to the latest reading position, carrying out real-time compression coding on the acquired log data, detecting repeated data based on a sliding time window, transmitting the compression coded data to the management terminal after eliminating the repeated data, and decompressing and restoring the received compression coded data at the management terminal;
Step S103, an intelligent log analysis model is constructed, historical log data is trained by a deep learning method, and an analysis engine comprising an anomaly detection model, a pattern recognition model and an event association analysis model is constructed; the restored log data is input into the analysis engine, the anomaly detection model calculates anomaly scores based on log feature vectors, the pattern recognition model classifies and annotates the log content, and the event association analysis model mines the association relationships among multi-dimensional logs to generate abnormal event early warning information; the abnormal event early warning information, the log classification and annotation results and the association analysis results are visually displayed in the interface of the management terminal, and when a user-triggered download operation is detected, the analysis result data within a specified time range is packaged, compressed and downloaded to local storage.
As can be seen from the above description, the computer program product provided by the embodiments of the present application constructs a distributed server cluster log collection network and dynamically distributes collection tasks through a load balancing dispatching center. An incremental data acquisition channel is designed, efficient transmission is achieved through a websocket long-connection pool, and transmission efficiency is further optimized by combining real-time compression coding with a duplicate-data detection mechanism. An intelligent log analysis model comprising three sub-models, namely anomaly detection, pattern recognition and event association analysis, is constructed; anomaly score calculation, log classification and annotation, and multi-dimensional association analysis are realized based on a deep learning method, and visual display and data export functions are provided. The method effectively overcomes the shortcomings of the traditional technology in distributed acquisition, data transmission, intelligent analysis and the like, and remarkably improves the performance and practicability of the log processing system.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the principles and embodiments of the present invention have been described in detail above with reference to specific examples, these examples are provided only to aid understanding of the principles and concepts of the invention. Those of ordinary skill in the art may make many variations in light of the teachings of the invention, and the above description should not be construed as limiting the invention.

Claims (10)

1.一种分布式服务器集群日志处理方法,其特征在于,所述方法包括:1. A distributed server cluster log processing method, characterized in that the method comprises: 建立分布式服务器集群日志采集网络,在管理终端上创建服务器集群配置表,基于所述服务器集群配置表建立负载均衡调度中心,所述负载均衡调度中心周期性采集各个待监控服务器的系统负载信息,根据所述系统负载信息对日志采集任务进行动态分配,按照预设的任务优先级生成任务调度队列;Establish a distributed server cluster log collection network, create a server cluster configuration table on the management terminal, establish a load balancing scheduling center based on the server cluster configuration table, the load balancing scheduling center periodically collects system load information of each server to be monitored, dynamically allocates log collection tasks according to the system load information, and generates a task scheduling queue according to a preset task priority; 构建增量式日志数据采集通道,基于所述任务调度队列在所述管理终端与各个待监控服务器之间建立websocket长连接池,记录每个日志文件的最新读取位置,按照所述最新读取位置进行增量数据采集,基于滑动时间窗口检测重复数据,剔除所述重复数据后将压缩编码数据传输至管理终端,在所述管理终端对接收的压缩编码数据进行解压与还原;Construct an incremental log data collection channel, establish a websocket long connection pool between the management terminal and each monitored server based on the task scheduling queue, record the latest read position of each log file, perform incremental data collection according to the latest read position, detect duplicate data based on a sliding time window, remove the duplicate data and transmit the compressed encoded data to the management terminal, and decompress and restore the received compressed encoded data at the management terminal; 构建智能日志分析模型,采用深度学习方法对历史日志数据进行训练,构建包含异常检测模型、模式识别模型及事件关联分析模型的分析引擎,将还原后的日志数据输入所述分析引擎,所述异常检测模型基于日志特征向量计算异常评分,所述模式识别模型对日志内容进行分类标注,所述事件关联分析模型挖掘多维度日志之间的关联关系,生成异常事件预警信息,将所述异常事件预警信息、日志分类标注结果及关联分析结果在所述管理终端界面中进行可视化展示,当检测到用户触发下载操作时,将指定时间范围内的分析结果数据打包压缩后下载至本地存储。An intelligent log analysis model is constructed, and deep learning methods are used to train historical log data. An analysis engine including an anomaly detection model, a pattern recognition model, and an event association analysis model is constructed. 
The restored log data is input into the analysis engine. The anomaly detection model calculates the anomaly score based on the log feature vector, and the pattern recognition model classifies and annotates the log content. The event association analysis model mines the correlation relationship between multi-dimensional logs and generates abnormal event warning information. The abnormal event warning information, log classification and annotation results, and correlation analysis results are visualized in the management terminal interface. When a user triggers a download operation, the analysis result data within the specified time range is packaged and compressed and downloaded to local storage. 2.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述建立分布式服务器集群日志采集网络,在管理终端上创建服务器集群配置表,包括:2. The distributed server cluster log processing method according to claim 1, wherein the step of establishing a distributed server cluster log collection network and creating a server cluster configuration table on a management terminal comprises: 在管理终端的配置界面中录入服务器集群信息,将所述服务器集群信息保存至数据库中生成服务器集群配置表,所述服务器集群配置表包含服务器标识、网络地址、端口号、访问凭证、负载阈值及优先级字段,读取所述服务器集群配置表中的网络地址与端口号建立TCP连接,通过SSH协议验证访问凭证的有效性,验证通过后将服务器节点标记为可用状态;Enter the server cluster information in the configuration interface of the management terminal, save the server cluster information into the database to generate a server cluster configuration table, the server cluster configuration table includes server identification, network address, port number, access credential, load threshold and priority field, read the network address and port number in the server cluster configuration table to establish a TCP connection, verify the validity of the access credential through the SSH protocol, and mark the server node as available after the verification is passed; 构建负载监控代理程序,将所述负载监控代理程序部署至各个待监控服务器,所述负载监控代理程序采集CPU使用率、内存占用率、磁盘IO及网络带宽数据,计算服务器综合负载得分,将所述综合负载得分与服务器集群配置表中的负载阈值进行比对,对超出负载阈值的服务器发出预警信号,根据所述预警信号调整该服务器的采集任务优先级。Build a load monitoring agent program and deploy 
it to each server to be monitored. The load monitoring agent program collects CPU usage, memory occupancy, disk IO and network bandwidth data, calculates the server's comprehensive load score, compares the comprehensive load score with the load threshold in the server cluster configuration table, and sends a warning signal to the server that exceeds the load threshold, and adjusts the collection task priority of the server according to the warning signal. 3.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述基于所述服务器集群配置表建立负载均衡调度中心,所述负载均衡调度中心周期性采集各个待监控服务器的系统负载信息,根据所述系统负载信息对日志采集任务进行动态分配,按照预设的任务优先级生成任务调度队列,包括:3. The distributed server cluster log processing method according to claim 1 is characterized in that a load balancing scheduling center is established based on the server cluster configuration table, the load balancing scheduling center periodically collects system load information of each server to be monitored, dynamically allocates log collection tasks according to the system load information, and generates a task scheduling queue according to a preset task priority, including: 创建负载均衡调度中心进程,读取所述服务器集群配置表中的负载阈值与优先级参数,建立系统负载信息采集定时任务,所述系统负载信息采集定时任务按照预设的时间间隔从各个待监控服务器的负载监控代理程序获取CPU使用率、内存占用率、磁盘IO及网络带宽数据,将所述系统负载信息存入负载状态缓存表中;Create a load balancing scheduling center process, read the load threshold and priority parameters in the server cluster configuration table, establish a system load information collection timing task, the system load information collection timing task obtains CPU usage, memory occupancy, disk IO and network bandwidth data from the load monitoring agent program of each monitored server at a preset time interval, and stores the system load information in the load status cache table; 基于所述负载状态缓存表中的系统负载信息计算每个服务器的可用资源容量,对日志采集任务进行任务量评估,将任务量大小与可用资源容量进行匹配,生成任务分配方案,根据所述任务分配方案结合服务器集群配置表中的优先级参数构建任务调度队列,将所述任务调度队列分发至各个待监控服务器的负载监控代理程序。Based on the system load information in the load status cache table, the available resource capacity of each server is calculated, the task volume 
of the log collection task is evaluated, the task volume size is matched with the available resource capacity, and a task allocation plan is generated. According to the task allocation plan and the priority parameters in the server cluster configuration table, a task scheduling queue is constructed, and the task scheduling queue is distributed to the load monitoring agent program of each server to be monitored. 4.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述构建增量式日志数据采集通道,基于所述任务调度队列在所述管理终端与各个待监控服务器之间建立websocket长连接池,记录每个日志文件的最新读取位置,包括:4. The distributed server cluster log processing method according to claim 1 is characterized in that the incremental log data collection channel is constructed, a websocket long connection pool is established between the management terminal and each monitored server based on the task scheduling queue, and the latest read position of each log file is recorded, including: 基于所述任务调度队列获取待监控服务器的网络地址与端口号,通过websocket协议建立管理终端与待监控服务器之间的长连接,将所述长连接根据服务器标识存入连接池,创建连接池管理器,所述连接池管理器监控连接状态,对断开的连接自动重新建立,维护连接池的可用性;Based on the task scheduling queue, the network address and port number of the server to be monitored are obtained, a long connection between the management terminal and the server to be monitored is established through the websocket protocol, the long connection is stored in the connection pool according to the server identifier, and a connection pool manager is created. The connection pool manager monitors the connection status, automatically reestablishes the disconnected connection, and maintains the availability of the connection pool; 创建日志文件读取位置记录表,所述日志文件读取位置记录表包含日志文件路径、文件大小、最后读取时间戳及读取位置偏移量字段,每次读取日志文件前从所述日志文件读取位置记录表获取上次读取位置的偏移量,将所述偏移量作为新一轮读取的起始位置,读取完成后更新所述日志文件读取位置记录表中的时间戳与偏移量。Create a log file reading position record table, which contains the log file path, file size, last read timestamp and read position offset fields. 
Before reading the log file each time, obtain the offset of the last read position from the log file reading position record table, and use the offset as the starting position of a new round of reading. After the reading is completed, update the timestamp and offset in the log file reading position record table. 5.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述按照所述最新读取位置进行增量数据采集,基于滑动时间窗口检测重复数据,剔除所述重复数据后将压缩编码数据传输至管理终端,在所述管理终端对接收的压缩编码数据进行解压与还原,包括:5. The distributed server cluster log processing method according to claim 1 is characterized in that the incremental data collection is performed according to the latest read position, duplicate data is detected based on a sliding time window, the compressed encoded data is transmitted to a management terminal after the duplicate data is removed, and the received compressed encoded data is decompressed and restored at the management terminal, comprising: 从所述日志文件读取位置记录表获取偏移量,将所述偏移量作为文件指针位置读取新增日志数据,对读取的日志数据进行行分割处理,计算每行数据的哈希值,根据预设的时间窗口范围从日志数据缓存中获取历史数据的哈希值,将当前数据的哈希值与历史数据的哈希值进行比对,剔除重复数据后将去重结果写入压缩缓冲区;Obtaining an offset from the log file reading position record table, using the offset as the file pointer position to read newly added log data, performing row segmentation processing on the read log data, calculating a hash value for each row of data, obtaining a hash value for historical data from a log data cache according to a preset time window range, comparing the hash value of current data with the hash value of historical data, and writing a deduplication result into a compression buffer after removing duplicate data; 对所述压缩缓冲区中的日志数据执行LZ4压缩算法生成压缩数据块,为所述压缩数据块添加数据标识头,所述数据标识头包含数据块大小、压缩算法类型及校验和信息,通过websocket长连接将压缩数据块发送至管理终端,在所述管理终端根据数据标识头选择对应的解压算法对压缩数据进行解压,将解压后的数据写入日志存储区。Execute the LZ4 compression algorithm on the log data in the compression buffer to generate a compressed data block, add a data identification header to the compressed data block, the data identification header includes the data block size, compression algorithm type and 
checksum information, send the compressed data block to the management terminal through the websocket long connection, select the corresponding decompression algorithm according to the data identification header at the management terminal to decompress the compressed data, and write the decompressed data into the log storage area. 6.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述构建智能日志分析模型,采用深度学习方法对历史日志数据进行训练,构建包含异常检测模型、模式识别模型及事件关联分析模型的分析引擎,将还原后的日志数据输入所述分析引擎,所述异常检测模型基于日志特征向量计算异常评分,所述模式识别模型对日志内容进行分类标注,所述事件关联分析模型挖掘多维度日志之间的关联关系,生成异常事件预警信息,包括:6. The distributed server cluster log processing method according to claim 1 is characterized in that the intelligent log analysis model is constructed, the historical log data is trained by a deep learning method, an analysis engine including an anomaly detection model, a pattern recognition model and an event association analysis model is constructed, the restored log data is input into the analysis engine, the anomaly detection model calculates an anomaly score based on a log feature vector, the pattern recognition model classifies and annotates the log content, and the event association analysis model mines the correlation between multi-dimensional logs to generate abnormal event warning information, including: 从日志存储区读取历史日志数据,对所述历史日志数据进行文本预处理,提取日志中的时间戳、事件类型、操作对象、状态码、执行结果、错误代码、用户标识、操作指令构成特征集,将所述特征集转换为向量表示,采用LSTM神经网络构建异常检测模型和模式识别模型,采用图神经网络构建事件关联分析模型,使用所述特征向量对异常检测模型、模式识别模型及事件关联分析模型进行训练,将训练完成的模型保存至模型库;Read historical log data from the log storage area, perform text preprocessing on the historical log data, extract timestamps, event types, operation objects, status codes, execution results, error codes, user identifiers, and operation instructions in the logs to form a feature set, convert the feature set into a vector representation, use an LSTM neural network to build an anomaly detection model and a pattern recognition model, use a graph neural network to build an event association analysis model, use the feature vectors to train the anomaly 
detection model, pattern recognition model, and event association analysis model, and save the trained model to a model library; 从所述模型库加载异常检测模型、模式识别模型及事件关联分析模型构建日志分析引擎,将还原后的日志数据转换为特征向量输入所述日志分析引擎,异常检测模型计算特征向量的异常得分,模式识别模型对日志内容进行多分类预测输出类别标签,事件关联分析模型基于图结构挖掘日志事件之间的时序关联与因果关系,根据异常得分、类别标签及关联关系生成异常事件预警信息。The anomaly detection model, pattern recognition model and event association analysis model are loaded from the model library to build a log analysis engine, the restored log data is converted into a feature vector and input into the log analysis engine, the anomaly detection model calculates the anomaly score of the feature vector, the pattern recognition model performs multi-classification prediction on the log content and outputs a category label, the event association analysis model mines the temporal association and causal relationship between log events based on the graph structure, and generates abnormal event warning information according to the anomaly score, category label and association relationship. 7.根据权利要求1所述的分布式服务器集群日志处理方法,其特征在于,所述将所述异常事件预警信息、日志分类标注结果及关联分析结果在所述管理终端界面中进行可视化展示,当检测到用户触发下载操作时,将指定时间范围内的分析结果数据打包压缩后下载至本地存储,包括:7. 
The distributed server cluster log processing method according to claim 1, wherein visually displaying the abnormal event warning information, the log classification and annotation results, and the correlation analysis results in the management terminal interface, and, when a user-triggered download operation is detected, packaging, compressing, and downloading the analysis result data within a specified time range to local storage, comprises:

creating a management terminal visualization panel, dividing the abnormal event warning information into high, medium, and low levels according to severity and displaying it in a warning information area, displaying the category distribution of the log classification and annotation results as a pie chart, displaying the correlation strength between log event nodes in the correlation analysis results as a force-directed graph, creating a time range selector in the visualization panel, and updating the visualized data in real time according to the start and end times of the time range selector;

listening for click events on a download button in the visualization panel, obtaining the start and end time parameters from the time range selector, querying the database for the abnormal event warning information, log classification and annotation results, and correlation analysis results within that time range, converting the query results into JSON format, applying a ZIP compression algorithm to the JSON data to generate a compressed file, and saving the compressed file to a local storage path specified by the user.

8.
A distributed server cluster log processing device, wherein the device comprises:

a dynamic allocation module, configured to establish a distributed server cluster log collection network and create a server cluster configuration table on a management terminal, the server cluster configuration table containing the network address, port number, access credentials, and server load threshold of each server to be monitored, and to establish a load balancing scheduling center based on the server cluster configuration table, the load balancing scheduling center periodically collecting system load information of each server to be monitored, dynamically allocating log collection tasks according to the system load information, and generating a task scheduling queue according to preset task priorities;

a log processing module, configured to build an incremental log data collection channel, establish a websocket long-connection pool between the management terminal and each server to be monitored based on the task scheduling queue, record the latest read position of each log file, collect incremental data from the latest read position, apply real-time compression encoding to the collected log data, detect duplicate data within a sliding time window, transmit the compressed encoded data to the management terminal after removing the duplicates, and decompress and restore the received compressed encoded data at the management terminal;
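For illustration only (not part of the claims): the log processing module's three core steps — incremental reads from a recorded file offset, duplicate detection within a sliding time window, and compression before transmission — can be sketched in Python as follows. File paths, the 60-second window, and the use of `zlib` are hypothetical choices; the claims do not name a specific compression algorithm or window size.

```python
import hashlib
import time
import zlib
from collections import OrderedDict

class IncrementalLogCollector:
    """Reads only bytes appended since the last poll, drops lines already
    seen inside a sliding time window, and compresses the remainder."""

    def __init__(self, window_seconds=60):
        self.read_positions = {}   # file path -> last byte offset read
        self.window_seconds = window_seconds
        self.seen = OrderedDict()  # line hash -> first-seen timestamp

    def collect(self, path):
        """Return lines appended to `path` since the previous call."""
        offset = self.read_positions.get(path, 0)
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
            self.read_positions[path] = f.tell()  # remember new position
        return chunk.decode("utf-8", errors="replace").splitlines()

    def deduplicate(self, lines):
        """Drop lines whose hash was already seen inside the window."""
        now = time.time()
        # Evict hashes that have fallen out of the sliding window.
        while self.seen and next(iter(self.seen.values())) < now - self.window_seconds:
            self.seen.popitem(last=False)
        fresh = []
        for line in lines:
            digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
            if digest not in self.seen:
                self.seen[digest] = now
                fresh.append(line)
        return fresh

    def compress(self, lines):
        """Compress the deduplicated batch for transmission."""
        return zlib.compress("\n".join(lines).encode("utf-8"))
```

In this sketch the transport (the websocket long connection of the claims) is left out; the compressed bytes returned by `compress` would be what the module sends to the management terminal.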
a log analysis module, configured to build an intelligent log analysis model, train on historical log data by a deep learning method, build an analysis engine comprising an anomaly detection model, a pattern recognition model, and an event association analysis model, and input the restored log data into the analysis engine, the anomaly detection model calculating an anomaly score based on log feature vectors, the pattern recognition model classifying and annotating the log content, and the event association analysis model mining correlations among multi-dimensional logs to generate abnormal event warning information; the module visually displays the abnormal event warning information, the log classification and annotation results, and the correlation analysis results in the management terminal interface, and, when a user-triggered download operation is detected, packages, compresses, and downloads the analysis result data within a specified time range to local storage.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the distributed server cluster log processing method according to any one of claims 1 to 7 when executing the program.

10. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the distributed server cluster log processing method according to any one of claims 1 to 7 are implemented.
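For illustration only (not part of the claims): the export path of claim 7 — filter analysis results to the selected time range, serialize to JSON, and pack the JSON into a ZIP file — can be sketched in Python. The record fields (`timestamp`, etc.) and the archive entry name `analysis_results.json` are hypothetical stand-ins; the claims specify only the JSON-then-ZIP sequence.

```python
import json
import zipfile
from datetime import datetime

def export_analysis_results(records, start, end, out_path):
    """Filter `records` to [start, end], serialize the selection as JSON,
    and write it into a ZIP archive at `out_path`. Returns the number of
    records exported."""
    selected = [
        r for r in records
        if start <= datetime.fromisoformat(r["timestamp"]) <= end
    ]
    payload = json.dumps(selected, ensure_ascii=False, indent=2)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("analysis_results.json", payload)
    return len(selected)
```

In a deployment matching the claims, `records` would be the database query result for warning information, classification labels, and correlation analysis within the selector's start and end times, and `out_path` would be the user-specified local storage path.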
CN202510601646.3A 2025-05-12 2025-05-12 Distributed server cluster log processing method and device Active CN120123184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510601646.3A CN120123184B (en) 2025-05-12 2025-05-12 Distributed server cluster log processing method and device


Publications (2)

Publication Number Publication Date
CN120123184A true CN120123184A (en) 2025-06-10
CN120123184B CN120123184B (en) 2025-09-02

Family

ID=95926591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510601646.3A Active CN120123184B (en) 2025-05-12 2025-05-12 Distributed server cluster log processing method and device

Country Status (1)

Country Link
CN (1) CN120123184B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824744A (en) * 2016-03-21 2016-08-03 焦点科技股份有限公司 Real-time log collection and analysis method on basis of B2B (Business to Business) platform
CN111124679A (en) * 2019-12-19 2020-05-08 南京莱斯信息技术股份有限公司 Time-limited automatic processing method for multi-source heterogeneous mass data
CN111752795A (en) * 2020-06-18 2020-10-09 多加网络科技(北京)有限公司 Full-process monitoring alarm platform and method thereof
WO2021139254A1 (en) * 2020-07-28 2021-07-15 平安科技(深圳)有限公司 Cluster operation and maintenance state diagnosis method, operation and maintenance monitoring system, terminal, and storage medium
CN119201620A (en) * 2024-09-20 2024-12-27 深圳市海豚互联网有限公司 Cloud computing analysis method, device, equipment and storage medium for SaaS system


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120342836A (en) * 2025-06-16 2025-07-18 济南浪潮数据技术有限公司 Alarm event processing method, transmission system, processing device, medium and product
CN120430283A (en) * 2025-07-08 2025-08-05 山东齐鲁壹点传媒有限公司 A method and device for local storage of Vuex state library data
CN120430283B (en) * 2025-07-08 2025-09-30 山东齐鲁壹点传媒有限公司 A method and device for local storage of Vuex state library data

Also Published As

Publication number Publication date
CN120123184B (en) 2025-09-02

Similar Documents

Publication Publication Date Title
US11615084B1 (en) Unified data processing across streaming and indexed data sets
US11113353B1 (en) Visual programming for iterative message processing system
CN120123184B (en) Distributed server cluster log processing method and device
US11474673B1 (en) Handling modifications in programming of an iterative message processing system
US11675816B1 (en) Grouping evens into episodes using a streaming data processor
US11238048B1 (en) Guided creation interface for streaming data processing pipelines
US11676072B1 (en) Interface for incorporating user feedback into training of clustering model
US20210342125A1 (en) Dual textual/graphical programming interfaces for streaming data processing pipelines
CN109034993A (en) Account checking method, equipment, system and computer readable storage medium
CN110020002A (en) Querying method, device, equipment and the computer storage medium of event handling scheme
CN111800292B (en) Early warning method and device based on historical flow, computer equipment and storage medium
CN109684364A (en) The problem of being drawn a portrait based on user processing method, device, equipment and storage medium
CN111666298A (en) Method and device for detecting user service class based on flink, and computer equipment
CN113190517A (en) Data integration method and device, electronic equipment and computer readable medium
CN111581258A (en) Safety data analysis method, device, system, equipment and storage medium
WO2022187005A1 (en) Replication of parent record having linked child records that were previously replicated asynchronously across data storage regions
EP4302203A1 (en) Asynchronous replication of linked parent and child records across data storage regions
US20170236132A1 (en) Automatically modeling or simulating indications of interest
CN120416093B (en) Charging service real-time monitoring method and device based on service probe
CN112764957B (en) Application fault demarcation method and device
CN119861882A (en) OpenHarmony-based data management method and OpenHarmony-based data management system
CN106464678A (en) Client device accessing data during a communication outage
EP4302199A2 (en) Media storage for online meetings in edge network storage
CN114579398A (en) Log storage method, device, equipment and storage medium
CN118694822B (en) Information flow processing method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant