IT operation and maintenance service management method based on big data
Technical Field
The invention relates to the field of IT operation and maintenance management, in particular to an IT operation and maintenance service management method based on big data.
Background
In today's digital and information-intensive environments, the IT infrastructure of an enterprise has become the basic support for its operations and strategies. IT operation and maintenance service management based on big data is becoming an important means for organizations to ensure system stability, security and efficient operation. With the rapid development of the internet of things (IoT), cloud computing, and big data technologies, the amount of data facing enterprises has increased explosively, and the operating environment has become more complex and dynamic. These changes present new challenges, including: how to collect and process massive data effectively, detect system abnormalities rapidly, predict system performance trends accurately, and make intelligent decisions in a changeable environment. Meanwhile, the security and reliability of enterprise systems directly affect their market competitiveness and customer satisfaction.
Disclosure of Invention
In order to solve the above problems, the invention aims to provide an IT operation and maintenance service management method based on big data, which not only effectively improves the operating efficiency and reliability of the system, but also enables effective management of the complexity of IT operation and maintenance in a dynamically changing environment.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An IT operation and maintenance service management method based on big data comprises the following steps:
S1, collecting data from logs, monitoring tools and user behaviors, and preprocessing by using a Pandas library of Python, wherein the preprocessing comprises data cleaning and missing value processing;
S2, carrying out anomaly detection on the preprocessed data set by combining K-Means and DBSCAN, carrying out secondary data cleaning based on the anomaly detection result, and optimizing the data cleaning strategy;
S3, predicting the trend of the system performance indexes with an ARIMA model based on the data after the secondary data cleaning;
S4, constructing a static user portrait according to the user behavior data by using a hierarchical clustering algorithm, and dynamically updating the user portrait to adapt to the latest behavior data based on path dynamic analysis;
S5, optimizing resource allocation and task scheduling by using a reinforcement learning algorithm based on the trend of the system performance indexes and the user portraits;
S6, integrating external security intelligence, applying it to security event analysis, and automatically modifying the protection strategy according to the security situation by utilizing an adaptive strategy adjustment mechanism.
Further, S1 is specifically:
using Logstash as a log collector, extracting log data from different servers and application programs, and sending the log data to Elasticsearch for storage and indexing;
collecting network and system performance indexes by utilizing Prometheus and Zabbix;
deriving user access and interaction data from a user behavior analysis tool;
using Apache Kafka as a data streaming platform, and transmitting the data of all data sources to a database in real time;
data is loaded from the database using the Pandas library; duplicate values are removed using the drop_duplicates() function of Pandas, unnecessary rows or columns are filtered out according to certain conditions, and missing values are processed with forward filling.
Further, S2 is specifically:
Carrying out standardization processing on the preprocessed data, carrying out cluster analysis on the standardized data by using the K-Means algorithm, selecting a cluster number, initializing cluster centers, and repeating the following two steps until convergence:

c_i = argmin_j ||x_i − μ_j||²;

μ_j = (1/|C_j|) Σ_{x_i ∈ C_j} x_i;

wherein C_j is the set of all member samples of cluster j, and μ_j is the center of the j-th cluster; x_i is the feature vector of the i-th data point, and c_i is the cluster to which the i-th data point is assigned;
calculating the Euclidean distance from each point to the center of the cluster to which it belongs:

d(x_i, μ_j) = √( Σ_{k=1}^{m} (x_{i,k} − μ_{j,k})² );

where m is the total dimension of the feature space; x_{i,k} represents the value of data point x_i in the k-th dimension; μ_{j,k} represents the value of cluster center μ_j in the k-th dimension;
finding outliers according to a distance threshold:

x_d is an outlier if d(x_d, μ_{c_d}) > d_u;

wherein c_d represents the cluster to which data point x_d belongs; x_d represents an outlier; d_u denotes the distance threshold;
performing density clustering on the data by DBSCAN: for each data point, counting the number of points in its ε (eps) neighborhood; if the number of points reaches min_samples, the point is a core point; a point which does not meet the core-point standard but belongs to the neighborhood of a certain core point is a boundary point; a point which is neither a core point nor a boundary point is a noise point, and the noise point is marked as abnormal;
in combination with the abnormal data sets obtained from K-Means and DBSCAN, intersection and independent portions are analyzed and secondary cleaning is performed to improve the quality of the data sets.
Further, in combination with the abnormal data sets obtained from K-Means and DBSCAN, the intersection and independent portions are analyzed and secondary cleaning is performed to improve the quality of the data sets, as follows: identifying data points marked as abnormal by both K-Means and DBSCAN, treating them as clear outliers, and correcting the numerical data by using a data correction strategy; for the data points identified as abnormal only by K-Means, checking whether they are edge values, and filtering them out if the analysis is not affected; for the data points identified as abnormal only by DBSCAN, confirming whether the data is normal and, if so, retaining it; for abnormal data that is uncorrectable or invalid for analysis, removing it from the data set;
further, S3 is specifically:
checking the stationarity of the data by the ADF (Augmented Dickey-Fuller) test; if the data is not stationary, applying differencing operations to remove trend and seasonal effects;
dividing data into a training set and a testing set, wherein the training set is used for model fitting, and the testing set is used for model verification;
Selecting appropriate parameters (p, d, q) according to the behavior characteristics of the autocorrelation function ACF and the partial autocorrelation function PACF; p is the autoregressive partial order; d is the degree of difference; q is the moving average partial order;
constructing an ARIMA model with the selected parameters and fitting it to the denoised training data to obtain parameter estimates and model fitting results:

y_t = c + Σ_{i=1}^{p} φ_i·y_{t−i} + Σ_{j=1}^{q} θ_j·ε_{t−j} + ε_t;

wherein y_t is the observation of the time series at time t; c is a constant term; φ_i are the coefficients of the autoregressive part and p is the order of the autoregressive part; y_{t−i} is the observation i steps in the past; θ_j are the coefficients of the moving average part and q is the order of the moving average part; ε_{t−j} is a past error term; ε_t is the error term at time t;
adopting the Ljung-Box test to check whether the residuals are white noise, so as to confirm that the model has captured the data structure:

Q = n(n + 2) · Σ_{k=1}^{h} ρ̂_k² / (n − k);

wherein Q is the Ljung-Box statistic; n is the sample size, i.e. the number of observations of the time series data; ρ̂_k is the autocorrelation coefficient of the residuals at lag k; h is the number of lags tested.
Further, S4 is specifically:
extracting key features from the user behavior log, and constructing the key features into feature vectors for cluster analysis;
Analyzing the user behavior data by using a Ward variance minimization method to obtain a clustering result;
classifying users into different groups according to the clustering result, calculating the characteristic mean value of each group, and forming a characteristic set representing the group portrait to obtain a static user portrait;
analyzing a user behavior path, and identifying an interaction mode of a user by modeling the user behavior as a state transition process, and adding the interaction mode as a dynamic characteristic into a static user portrait to obtain a dynamic user portrait;
The user portrait is adjusted by using a real-time data updating mechanism, the dynamic updating is carried out by a weighted average method, and new and old data are combined:
Updated_Profile=α×Old_Profile+(1−α)×New_Data;
wherein updated_profile represents the Updated user representation, old_profile represents the original user representation, new_data represents the New Data, and α is the weight.
Further, the Ward variance minimization method is adopted to analyze the user behavior data, and a clustering result is obtained, and the method is specifically as follows:
extracting key behavior characteristics, including session duration, page access number and click rate, to form a feature matrix X, each row of which corresponds to one user; using the Euclidean distance as the standard distance metric, calculating the distance between different users, and constructing a distance matrix;
clustering by the Ward algorithm according to the calculated distance matrix: computing the dispersion of each cluster and merging two clusters at a time such that the increment of the within-cluster sum of squared errors (WCSS) after merging is minimal, generating a cluster tree and constructing a dendrogram:

WCSS(C) = Σ_{x ∈ C} ||x − μ_C||²; ΔWCSS = WCSS(C_a ∪ C_b) − WCSS(C_a) − WCSS(C_b);

wherein ΔWCSS is the WCSS increment; x is a user in cluster C; μ_C is the mean of cluster C;
based on the dendrogram, the cluster number is determined by the CH (Calinski-Harabasz) index and the Dunn index:

CH = [tr(B_k)/(k − 1)] / [tr(W_k)/(N − k)];

wherein B_k is the between-cluster scatter matrix and W_k is the within-cluster scatter matrix; k is the cluster number; N is the total number of data points;

the Dunn index is the ratio of the minimum distance between two nearest-neighbor clusters to the maximum cluster diameter.
Further, the behavior path of the user is analyzed, and the interaction mode of the user in the application is identified by modeling the user behavior as a state transition process (such as a Markov chain); all possible user behaviors are determined as states in the Markov chain;
counting the frequency of each behavior transition, and calculating the transition probability between states:

P(s_i → s_j) = C(s_i, s_j) / Σ_k C(s_i, s_k);

wherein P(s_i → s_j) is the probability of transitioning from state s_i to state s_j; C(·, ·) is a transition counter; Σ_k C(s_i, s_k) represents the total number of transitions out of state s_i;
Based on the calculated transition probabilities between the states, a transition matrix is constructed, and a high-frequency transition path is identified by using the transition matrix, so that a user interaction mode is determined.
Further, S5 is specifically:
Defining a state space as a combination of a system load level and a user path position, and defining an action set as a resource adjustment and user guidance strategy; combining the user portrait information with the system state characteristics to form the state input of reinforcement learning, so that the scheduling decision is more personalized; the action space is adjusted according to different user groups;
The bonus function R (s, a) combines resource utilization and path completion rate:
R(s,a)=w1×Resource_Efficiency(s,a)+w2×Path_Completion_Success(s);
wherein Resource_Efficiency(s, a) is the resource utilization rate; Path_Completion_Success(s) is the path completion rate; w_1 and w_2 are the corresponding weights;
Q value update:

Q(s, a) ← Q(s, a) + α·[R(s, a) + γ·max_{a'} Q(s', a') − Q(s, a)];

wherein Q(s, a) is the value estimate of performing action a in state s; α is the learning rate; γ is the discount factor weighting the present value of future rewards, with 0 ≤ γ < 1; max_{a'} Q(s', a') is the maximum expected Q value over all possible actions a' in the next state s';
Collecting behavior data and performance indexes, obtaining an optimization strategy through Q-Learning adjustment strategy and training, and improving resource allocation efficiency and user experience based on the optimization strategy;
implementing an ε-greedy policy in the real-time streaming model to balance exploration and exploitation in action selection.
Further, S6 is specifically:
and setting performance and experience thresholds by using a CEP module of the Flink, analyzing the data stream in real time, and triggering an alarm or automatically adjusting rules when abnormality occurs.
packaging feedback information into the Flink pipeline, and automatically adjusting the Q-Learning learning rate and the update frequency of the state features according to the trigger rules.
The invention has the following beneficial effects:
1. The invention not only effectively improves the running efficiency and reliability of the system, but also enables effective management of the complexity of IT operation and maintenance in a dynamically changing environment.
2. According to the invention, abnormal data is identified and corrected through a multi-stage data optimization method and through secondary cleaning, and the overall quality of a data set is improved through deleting or expanding operation, so that the data is ensured to be more accurate and more efficiently added with an analysis flow for decision making, and the intelligent improvement of efficient and accurate abnormal detection and data cleaning is realized;
3. The invention can effectively identify different user behavior patterns by using the Ward method and construct user portraits on that basis; accurate feature extraction combined with the Euclidean distance realizes efficient data segmentation and group analysis; by introducing path analysis, the invention dynamically responds to changes in user state that a static portrait would miss, identifies the behavior patterns of users in the system, and embodies them in user portrait updating and resource optimization decisions; finally, by combining a Markov process with a reinforcement learning algorithm, the invention can cope with the problem of coordinating user experience and performance in complex application environments, realizes personalized resource management and user guidance mechanisms, and improves system efficiency and user satisfaction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific examples:
Referring to fig. 1, in this embodiment, an IT operation and maintenance service management method based on big data is provided, which includes the following steps:
S1, collecting data from various sources, including system logs, application program logs, network monitoring tools (such as Prometheus and Zabbix) and user behavior logs, and preprocessing the data by using the Pandas library of Python, including data cleaning and missing value processing;
S2, carrying out anomaly detection on the preprocessed data set by combining K-Means and DBSCAN, carrying out secondary data cleaning based on the anomaly detection result, and optimizing the data cleaning strategy;
S3, predicting the trend of the system performance indexes with an ARIMA model based on the data after the secondary data cleaning;
S4, constructing a static user portrait from the user behavior data by using a hierarchical cluster analysis algorithm, and dynamically updating the user portrait to adapt to the latest behavior data based on dynamic path analysis;
S5, optimizing resource allocation and task scheduling by using a reinforcement learning algorithm (such as Q-Learning) based on the trend of the system performance indexes and the user portraits;
S6, integrating external security intelligence (CTI), applying it to security event analysis, and automatically modifying the protection strategy according to the security situation by utilizing an adaptive strategy adjustment mechanism.
In this embodiment, S1 is specifically:
using Logstash as a log collector, extracting log data from different servers and application programs, and sending the log data to Elasticsearch for storage and indexing;
collecting network and system performance indexes by utilizing Prometheus and Zabbix;
deriving user access and interaction data from a user behavior analysis tool;
using Apache Kafka as a data streaming platform, and transmitting the data of all data sources to a database in real time;
data is loaded from the database using the Pandas library; duplicate values are removed using the drop_duplicates() function of Pandas, unnecessary rows or columns are filtered out according to certain conditions, and missing values are processed with forward filling.
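As an illustrative sketch (not part of the claimed method), the Pandas preprocessing described above can be written roughly as follows; the column names status and cpu and the filter condition are assumptions for illustration only:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Clean raw monitoring records as described in S1.
    df = df.drop_duplicates()         # remove duplicate records
    df = df[df["status"] != "DEBUG"]  # filter out unneeded rows by a condition
    df = df.ffill()                   # process missing values with forward filling
    return df
```

The order of the three operations follows the description above; in practice the filter condition depends on the data source.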
In this embodiment, S2 is specifically:
Carrying out standardization processing on the preprocessed data, carrying out cluster analysis on the standardized data by using the K-Means algorithm, selecting a cluster number, initializing cluster centers, and repeating the following two steps until convergence:

c_i = argmin_j ||x_i − μ_j||²;

μ_j = (1/|C_j|) Σ_{x_i ∈ C_j} x_i;

wherein C_j is the set of all member samples of cluster j, and μ_j is the center of the j-th cluster; x_i is the feature vector of the i-th data point, and c_i is the cluster to which the i-th data point is assigned;
calculating the Euclidean distance from each point to the center of the cluster to which it belongs:

d(x_i, μ_j) = √( Σ_{k=1}^{m} (x_{i,k} − μ_{j,k})² );

where m is the total dimension of the feature space; x_{i,k} represents the value of data point x_i in the k-th dimension; μ_{j,k} represents the value of cluster center μ_j in the k-th dimension;
finding outliers according to a distance threshold:

x_d is an outlier if d(x_d, μ_{c_d}) > d_u;

wherein c_d represents the cluster to which data point x_d belongs; x_d represents an outlier; d_u denotes the distance threshold;
performing density clustering on the data by DBSCAN: for each data point, counting the number of points in its ε (eps) neighborhood; if the number of points reaches min_samples, the point is a core point; a point which does not meet the core-point standard but belongs to the neighborhood of a certain core point is a boundary point; a point which is neither a core point nor a boundary point is a noise point, and the noise point is marked as abnormal;
in combination with the abnormal data sets obtained from K-Means and DBSCAN, intersection and independent portions are analyzed and secondary cleaning is performed to improve the quality of the data sets.
In this embodiment, the intersection and independent portions are analyzed in combination with the abnormal data sets obtained from K-Means and DBSCAN, and secondary cleaning is performed to improve the quality of the data sets, specifically as follows: identifying data points marked as abnormal by both K-Means and DBSCAN, treating them as clear outliers, and correcting the numerical data by using a data correction strategy (such as interpolation or median substitution); for the data points identified as abnormal only by K-Means, checking whether they are edge values, and filtering them out if the analysis is not affected; for the data points identified as abnormal only by DBSCAN, confirming whether the data is normal and, if so, retaining it; for abnormal data that is uncorrectable or invalid for analysis, removing it from the data set;
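A minimal sketch of the combined K-Means/DBSCAN anomaly detection using scikit-learn; the threshold rule d_u = mean + 2·std and all parameter values are assumptions for illustration, not prescribed by the method:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

def detect_anomalies(X, n_clusters=3, eps=0.5, min_samples=5):
    # K-Means: distance of each point to its own cluster center
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    d_u = dist.mean() + 2 * dist.std()  # assumed distance threshold d_u
    km_out = set(np.where(dist > d_u)[0])

    # DBSCAN: label -1 marks noise points
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    db_out = set(np.where(db.labels_ == -1)[0])

    both = km_out & db_out     # clear outliers: correct or remove
    km_only = km_out - db_out  # check for edge values, possibly filter
    db_only = db_out - km_out  # verify; usually retained
    return both, km_only, db_only
```

The three returned sets correspond to the three handling branches of the secondary cleaning described above.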
in this embodiment, S3 is specifically:
checking the stationarity of the data by the ADF (Augmented Dickey-Fuller) test; if the data is not stationary, applying differencing operations to remove trend and seasonal effects;
dividing data into a training set and a testing set, wherein the training set is used for model fitting, and the testing set is used for model verification;
Selecting appropriate parameters (p, d, q) according to the behavior characteristics of the autocorrelation function ACF and the partial autocorrelation function PACF; p is the autoregressive partial order; d is the degree of difference; q is the moving average partial order;
constructing an ARIMA model with the selected parameters and fitting it to the denoised training data to obtain parameter estimates and model fitting results:

y_t = c + Σ_{i=1}^{p} φ_i·y_{t−i} + Σ_{j=1}^{q} θ_j·ε_{t−j} + ε_t;

wherein y_t is the observation of the time series at time t; c is a constant term; φ_i are the coefficients of the autoregressive part and p is the order of the autoregressive part; y_{t−i} is the observation i steps in the past; θ_j are the coefficients of the moving average part and q is the order of the moving average part; ε_{t−j} is a past error term; ε_t is the error term at time t;
adopting the Ljung-Box test to check whether the residuals are white noise, so as to confirm that the model has captured the data structure:

Q = n(n + 2) · Σ_{k=1}^{h} ρ̂_k² / (n − k);

wherein Q is the Ljung-Box statistic; n is the sample size, i.e. the number of observations of the time series data; ρ̂_k is the autocorrelation coefficient of the residuals at lag k; h is the number of lags tested.
In this embodiment, S4 is specifically:
extracting key features, such as session duration, page dwell time and click count, from the user behavior log, and constructing them into feature vectors for cluster analysis;
Analyzing the user behavior data by using a Ward variance minimization method to obtain a clustering result;
Classifying users into different groups according to the clustering result, calculating the characteristic mean value of each group, and forming a characteristic set representing the group portrait to obtain a static user portrait; (user portraits include demographics, consumption habits, traffic usage patterns, personalized interest preferences, etc.);
Analyzing a user behavior path, and identifying an interaction mode of the user by modeling the user behavior as a state transition process (such as a Markov chain), and adding the interaction mode as a dynamic characteristic into a static user portrait to obtain a dynamic user portrait;
adjusting the user portrait by using a real-time data updating mechanism (updating the feature matrix and the state transition probabilities in real time), dynamically updating by a weighted average method, and combining new and old data:
Updated_Profile=α×Old_Profile+(1−α)×New_Data;
wherein updated_profile represents the Updated user representation, old_profile represents the original user representation, new_data represents the New Data, and α is the weight.
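The weighted-average update above is a one-line computation; a sketch with NumPy (the feature vectors and the value α = 0.8 are illustrative only):

```python
import numpy as np

def update_profile(old_profile, new_data, alpha=0.8):
    # Updated_Profile = alpha * Old_Profile + (1 - alpha) * New_Data
    return alpha * np.asarray(old_profile) + (1 - alpha) * np.asarray(new_data)
```

A larger α weights the historical portrait more heavily; in practice α would be tuned to how quickly user behavior drifts.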
In this embodiment, a Ward variance minimization method is adopted to analyze user behavior data, and a clustering result is obtained, which is specifically as follows:
extracting key behavior characteristics, including session duration, page access number and click rate, to form a feature matrix X, each row of which corresponds to one user; using the Euclidean distance as the standard distance metric, calculating the distance between different users, and constructing a distance matrix;
clustering by the Ward algorithm according to the calculated distance matrix: computing the dispersion of each cluster and merging two clusters at a time such that the increment of the within-cluster sum of squared errors (WCSS) after merging is minimal, generating a cluster tree and constructing a dendrogram:

WCSS(C) = Σ_{x ∈ C} ||x − μ_C||²; ΔWCSS = WCSS(C_a ∪ C_b) − WCSS(C_a) − WCSS(C_b);

wherein ΔWCSS is the WCSS increment; x is a user in cluster C; μ_C is the mean of cluster C;
based on the dendrogram, the cluster number is determined by the CH (Calinski-Harabasz) index and the Dunn index:

CH = [tr(B_k)/(k − 1)] / [tr(W_k)/(N − k)];

wherein B_k is the between-cluster scatter matrix and W_k is the within-cluster scatter matrix; k is the cluster number; N is the total number of data points;

the Dunn index is the ratio of the minimum distance between two nearest-neighbor clusters to the maximum cluster diameter.
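Ward clustering and cutting the dendrogram into a chosen number of clusters can be sketched with SciPy; the feature matrix and cluster count are assumed inputs, and the CH/Dunn index selection step is omitted here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_segments(features, n_clusters):
    # Build the Ward linkage: each merge minimizes the WCSS increment
    Z = linkage(features, method="ward")
    # Cut the dendrogram into n_clusters groups; labels are 1-based
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

The per-group feature means forming the static portrait would then be computed over rows sharing a label.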
In this embodiment, the behavior path of the user is analyzed, and the interaction mode of the user in the application is identified by modeling the user behavior as a state transition process (such as a Markov chain); all possible user behaviors are determined as states in the Markov chain;
counting the frequency of each behavior transition, and calculating the transition probability between states:

P(s_i → s_j) = C(s_i, s_j) / Σ_k C(s_i, s_k);

wherein P(s_i → s_j) is the probability of transitioning from state s_i to state s_j; C(·, ·) is a transition counter; Σ_k C(s_i, s_k) represents the total number of transitions out of state s_i;
Based on the calculated transition probabilities between the states, a transition matrix is constructed, and a high-frequency transition path is identified by using the transition matrix, so that a user interaction mode is determined.
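A sketch of estimating the transition probabilities P(s_i → s_j) from recorded behavior paths; the state names used in the example are illustrative:

```python
from collections import Counter

def transition_probabilities(paths):
    # C(s_i, s_j): count each observed transition along every path
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    # Total number of transitions leaving each state s_i
    totals = Counter()
    for (a, _), c in counts.items():
        totals[a] += c
    # P(s_i -> s_j) = C(s_i, s_j) / sum_k C(s_i, s_k)
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}
```

High-probability entries of the resulting matrix correspond to the high-frequency transition paths used to determine the interaction mode.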
In this embodiment, S5 is specifically:
Defining a state space as a combination of a system load level and a user path position, and defining an action set as a resource adjustment and user guidance strategy; combining the user portrait information with the system state characteristics to form the state input of reinforcement learning, so that the scheduling decision is more personalized; the action space is adjusted according to different user groups;
The bonus function R (s, a) combines resource utilization and path completion rate:
R(s,a)=w1×Resource_Efficiency(s,a)+w2×Path_Completion_Success(s);
wherein Resource_Efficiency(s, a) is the resource utilization rate; Path_Completion_Success(s) is the path completion rate; w_1 and w_2 are the corresponding weights;
Q value update:

Q(s, a) ← Q(s, a) + α·[R(s, a) + γ·max_{a'} Q(s', a') − Q(s, a)];

wherein Q(s, a) is the value estimate of performing action a in state s; α is the learning rate; γ is the discount factor weighting the present value of future rewards, with 0 ≤ γ < 1; max_{a'} Q(s', a') is the maximum expected Q value over all possible actions a' in the next state s';
Collecting behavior data and performance indexes, obtaining an optimization strategy through Q-Learning adjustment strategy and training, and improving resource allocation efficiency and user experience based on the optimization strategy;
implementing an ε-greedy policy in the real-time streaming model to balance exploration and exploitation in action selection.
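A minimal tabular sketch of the Q-value update and ε-greedy action selection of S5; the state/action names and the hyperparameter values are assumptions for illustration:

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    # With probability epsilon explore a random action, otherwise exploit the best
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

In the method, the state would encode the system load level and user path position, and the reward would be R(s, a) as defined above.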
In this embodiment, S6 is specifically:
And setting performance and experience thresholds by using a CEP (complex event processing) module of the Flink, analyzing the data stream in real time, and triggering an alarm or automatically adjusting rules when abnormality occurs.
packaging feedback information into the Flink pipeline, and automatically adjusting the Q-Learning learning rate and the update frequency of the state features according to the trigger rules.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the invention in any way, and any person skilled in the art may make modifications or alterations to the disclosed technical content to the equivalent embodiments. However, any simple modification, equivalent variation and variation of the above embodiments according to the technical substance of the present invention still fall within the protection scope of the technical solution of the present invention.