CN116701998A

CN116701998A - An Intelligent Fault Migration Diagnosis Method Based on a Heterogeneous Federated Domain Generalization Network

Info

Publication number: CN116701998A
Application number: CN202310628150.6A
Authority: CN
Inventors: 秦毅; 钱泉; 蒲华燕; 毛永芳; 周江洪
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2023-09-05
Anticipated expiration: 2043-05-30
Also published as: CN116701998B

Abstract

The invention relates to an intelligent fault migration diagnosis method based on a heterogeneous federated domain generalization network (HFDGN), belonging to the field of mechanical fault migration diagnosis. The method comprises the following steps: constructing a DDA model; building the HFDGN based on the MIWM mechanism; S4: inputting the training samples divided from all auxiliary domains of the source clients and from the source domain of the target client into the constructed DDA model, and training the corresponding local client model with the optimization objective function of the DDA model; S5: uploading the parameters of the trained DDA model to a central server, and then performing federated migration fault diagnosis with the constructed HFDGN; S6: after repeated iterative training, once the error curve stabilizes, HFDGN training is complete, and the trained HFDGN is used for heterogeneous multi-source federated migration diagnosis. The invention can meet the application requirements of high data utilization and real-time diagnosis in practical engineering.

Description

Intelligent fault migration diagnosis method based on heterogeneous federal domain generalization network

Technical Field

The invention belongs to the field of mechanical fault migration diagnosis, and relates to an intelligent fault migration diagnosis method based on heterogeneous federal domain generalized network.

Background

Existing mechanical fault migration diagnosis technology includes some federated transfer learning methods that address both the distribution difference between a source domain and a target domain and the protection of data privacy. However, their performance depends entirely on the mechanical devices of the source clients and the target client being homogeneous, i.e. the mechanical data of the source clients and the target client must come from the same mechanical part. These methods also require that the test target-domain samples in the target client be available during training. In practical engineering, the target-domain data are usually unseen and heterogeneous with respect to the source-client data, so existing federated migration diagnosis methods struggle to meet the application requirements of high data utilization and real-time diagnosis.

In order to solve the above problems, a novel heterogeneous federal domain generalization network (Heterogeneous federated domain generalization network, HFDGN) is needed to fill the gap of heterogeneous multi-source federal diagnostic methods.

Disclosure of Invention

In view of the above, the present invention aims to provide an intelligent fault migration diagnosis method based on a heterogeneous federated domain generalization network. The HFDGN adopts a heterogeneous federated transfer learning (HFTL) framework, which achieves generalized fault diagnosis for a target client through a common knowledge representation mapping of the heterogeneous source clients. In addition, a disentangled domain adaptation (DDA) base model is employed to remove the negative effects of noise while strengthening domain confusion and the extraction of inherent fault-related features.

In order to achieve the above purpose, the present invention provides the following technical solutions:

An intelligent fault migration diagnosis method based on a heterogeneous federated domain generalization network specifically comprises the following steps:

S1: acquiring original vibration signals from mechanical equipment through an acceleration sensor, so as to construct the subsequent migration diagnosis tasks; the collected original vibration signals are expanded into samples by sliding window sampling;

S2: constructing a decoupling domain adaptive base model, namely the DDA model;

S3: constructing a heterogeneous federated transfer learning network based on a mutual information weight matching mechanism, wherein the mutual information weight matching mechanism is abbreviated as MIWM; the heterogeneous federated transfer learning network is abbreviated as HFDGN and adopts a heterogeneous transfer learning framework;

S4: inputting the training samples divided from all auxiliary domains of the source clients and from the source domain of the target client into the constructed DDA model, and training the corresponding local client model with the optimization objective function of the DDA model;

S5: uploading the parameters of the trained DDA model to a central server, and then performing federated migration fault diagnosis with the constructed HFDGN;

S6: after repeated iterative training, once the error curve stabilizes, HFDGN training is complete, and the trained HFDGN is used for heterogeneous multi-source federated migration diagnosis.

Further, in step S2, the backbone network of the DDA model includes four parts: a feature extractor G_FE(θ_FE), a decoupler G_D(θ_D), a reconstructor G_R(θ_R) and a fault classifier G_FC(θ_FC), where θ_FE, θ_D, θ_R and θ_FC denote the trainable weights of the corresponding networks. The feature extractor mines knowledge of the distribution differences and reduces them, and then obtains the general features F_G; the fault classifier extracts the label features F_L to identify the fault type; the decoupler decouples and separates the fault-related features F_FR from the fault-uncorrelated features F_FI; the reconstructor reconstructs the general features from the fault-related and fault-uncorrelated features, so the decoupler and the reconstructor can be regarded as the encoder and decoder of an autoencoder.

The optimization objective of the DDA model mainly comprises the following three parts: 1) Decoupling characterization: learning a decoupling characterization to separate noise-induced fault uncorrelated features from the generic features; 2) Separation and reconstruction: maximizing a distribution distance between the fault-related features and the fault-unrelated features so as to ensure independence of the extracted fault-related features; 3) Distribution self-adaption: minimizing the distribution difference between any two fault-related features.

Further, in step S2, in optimization objective 1) of the DDA model, the decoupling characterization is specifically: noise-induced fault-uncorrelated features are removed through adversarial training between the decoupler and the fault classifier. First, the feature extractor, the decoupler and the fault classifier are trained with the cross-entropy loss L_C to identify fault types accurately;

where the expectation is taken over the sample domain; X^i and Y^i denote the i-th data sample and its label, respectively; C denotes the fault types; and I(·) denotes the indicator function, equal to 1 when argmax(Y^i) = c and 0 otherwise;

the weight parameters of the feature extractor and the fault classifier are then fixed, and the decoupler is trained by maximizing the information entropy loss so that it deceives the fault classifier and learns the fault-uncorrelated features. The information entropy loss measures the purity of a sample's predicted label: the smaller the information entropy loss, the purer the predicted label probabilities. The information entropy (IE) loss L_IE is expressed as follows:

where the expectation is taken over the sample domain, and the remaining symbols denote the predicted label vector of sample X^i and the c-th element of that vector. The predicted label vector is obtained by:

Through the adversarial training between formulas (1) and (2), the decoupler finally separates the fault-uncorrelated features from the fault-related features.
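The relationship between information entropy and label purity can be illustrated numerically. The following is a minimal sketch, assuming the predicted label vector is a softmax over classifier outputs (the formula for the predicted label vector is not reproduced in this text); the function names `softmax`, `information_entropy` and `ie_loss` are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw classifier outputs into a probability vector (assumed form)."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def information_entropy(probs: List[float], eps: float = 1e-12) -> float:
    """Shannon entropy of one predicted label vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def ie_loss(batch_logits: List[List[float]]) -> float:
    """Average entropy over a batch, standing in for the expectation."""
    return sum(information_entropy(softmax(z)) for z in batch_logits) / len(batch_logits)


# A confident (pure) prediction has low entropy; a uniform one has high entropy.
confident = ie_loss([[8.0, 0.0, 0.0, 0.0]])
uncertain = ie_loss([[0.0, 0.0, 0.0, 0.0]])
```

Under this sketch, a decoupler branch that maximizes the entropy pushes the classifier output toward the uniform case, which is one way to read "deceiving" the fault classifier.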

Further, in step S2, in optimization objective 2) of the DDA model, the separation and reconstruction are specifically: the decoupler is trained by maximizing the mean discrepancy loss L_D to widen the distribution difference between the fault-related and fault-uncorrelated features:

where the two expectations are taken over the fault-uncorrelated features and the fault-related features, respectively; φ(·) is a high-dimensional mapping function into the reproducing kernel Hilbert space (RKHS); and the norm is the 2-norm of that space;

the fault-related features F_FR and the fault-uncorrelated features F_FI are obtained by the formula:

{F_FR, F_FI} = G_D(G_FE(X)) (5)

at the same time, to prevent excessive training of formula (5) from destroying the intrinsic properties of the extracted fault-related features, the reconstructor is trained with the reconstruction loss L_R to reconstruct the general features from the fault-related features and the fault-uncorrelated features;

where the expectation is taken over the sample domain.

Further, in step S2, in optimization objective 3) of the DDA model, the distribution adaptation is specifically: the mean discrepancy index is minimized as the domain confusion loss L_DC to train the feature extractor and the decoupler, thereby achieving distribution adaptation;

where the expectation is taken over the fault-uncorrelated features; φ(·) is a high-dimensional mapping function into the reproducing kernel Hilbert space (RKHS); and the norm is the 2-norm of that space;

suppose the client contains Q_m auxiliary domains; there will then be K domain confusion losses. According to transfer learning theory, each domain confusion loss represents the distribution similarity of the corresponding two auxiliary domains; that is, a smaller loss means the feature extractor will focus too much on aligning the distributions of the corresponding two auxiliary domains. Therefore, so that the feature extractor learns more general domain-invariant features, a purity loss L_P is constructed using information entropy, and the feature extractor and the decoupler are trained to reduce the purity among all domain confusion losses, i.e. to maximize the purity loss L_P;

Further, in step S2, the data of the multiple auxiliary domains of the source client participate fully in all three optimization objectives of the DDA model; because the target-domain data of the target client are assumed to be inaccessible during training and can be used only in the testing stage, the target client participates in only two optimization objectives, namely "decoupling characterization" and "separation and reconstruction".

Further, in step S3, the MIWM mechanism is used to evaluate the contribution of each source client to the target client during the weight parameter allocation process;

the contribution MI_i of the i-th source client to the target client is expressed as:

where the two network symbols denote the fault classifier and the decoupler in the DDA model trained by the i-th source client; F_G and F_L denote the general features and the label features obtained by the DDA model trained on the source-domain data in the target client; and I_Θ denotes the mutual information between two data sample domains, expressed as follows:

where T(θ) denotes the family of functions formed by the network parameters; the probability terms denote, respectively, the marginal probability distributions of the two domains, the joint probability distribution between the two domains, and the product of the marginal probability distributions of the two domains; Θ denotes the set of weight parameters and θ a weight parameter in that set; and the two expectations are taken over the joint probability distribution of the two domains and over the product of their marginal probability distributions, respectively.
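The symbols described above (a parameterized function family, an expectation under the joint distribution, and an expectation under the product of the marginals) match the Donsker-Varadhan variational form of mutual information that the detailed description names. Assuming that form, and writing X and Z for the two data sample domains (names chosen here for illustration, not taken from the source), the expression would read:

```latex
I_{\Theta}(X; Z) \;=\; \sup_{\theta \in \Theta}\;
  \mathbb{E}_{P_{XZ}}\!\left[ T_{\theta} \right]
  \;-\; \log \mathbb{E}_{P_X \otimes P_Z}\!\left[ e^{T_{\theta}} \right]
```

Here P_{XZ} is the joint distribution, P_X ⊗ P_Z is the product of the marginals, and T_θ is a network from the family T(θ). The supremum is a lower bound on the true mutual information, so a better-trained T_θ gives a tighter estimate.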

Further, in step S5, the central server aggregates the weight parameters of the feature extractors from all source clients, and the calculation formula is:

where M denotes the number of source clients; the per-client term denotes the weight parameters of the feature extractor from the i-th source client; the aggregated term denotes the aggregated weight parameters; and w_i denotes the weight matching the corresponding source client to the target client, obtained through the MIWM mechanism;

wherein τ represents a temperature coefficient;

the central server then assigns the aggregated weight parameters to each of the source client and the target client.

Further, in step S5, the DDA model parameters uploaded to the central server include: for all source clients, the weight parameters of the feature extractor, the decoupler and the fault classifier in the DDA model; for the target client, the general features F_G and the label features F_L extracted from the source-domain data by the DDA model.
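The two upload payloads can be pictured as simple records. The sketch below is illustrative only: the class names `SourceClientUpload` and `TargetClientUpload`, the helper `extractor_weights_for_aggregation` and the flat-list representation of weights are all assumptions, not structures defined by the source.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]  # assumed flat representation of weights or features


@dataclass
class SourceClientUpload:
    """What a source client sends: weights of three DDA sub-networks."""
    feature_extractor: Vector
    decoupler: Vector
    fault_classifier: Vector


@dataclass
class TargetClientUpload:
    """What the target client sends: features only, no raw data."""
    general_features: List[Vector]  # F_G, extracted from its source-domain data
    label_features: List[Vector]    # F_L, extracted from its source-domain data


def extractor_weights_for_aggregation(uploads: List[SourceClientUpload]) -> List[Vector]:
    """Per the text, only the feature-extractor weights are aggregated."""
    return [u.feature_extractor for u in uploads]


uploads = [
    SourceClientUpload([0.1, 0.2], [0.3], [0.4]),
    SourceClientUpload([0.5, 0.6], [0.7], [0.8]),
]
to_aggregate = extractor_weights_for_aggregation(uploads)
target = TargetClientUpload(general_features=[[1.0, 2.0]], label_features=[[0.9, 0.1]])
```

Keeping the decoupler and classifier weights in the source payload, while aggregating only the extractor, reflects that the former are needed for the MIWM contribution estimate rather than for averaging.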

Beneficial effects of the invention: the invention meets the application requirements of high data utilization and real-time diagnosis in practical engineering. The heterogeneous federated domain generalization network (HFDGN) adopts a heterogeneous federated transfer learning (HFTL) framework, which achieves generalized fault diagnosis for the target client through a common knowledge representation mapping of the heterogeneous source clients. In addition, the invention adopts a disentangled domain adaptation (DDA) base model to remove the negative influence of noise; this base model also strengthens domain confusion and extracts the inherent fault-related features.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in the following preferred detail with reference to the accompanying drawings, in which:

FIG. 1 is a diagram of a decoupling domain adaptive base model (DDA);

FIG. 2 is a schematic diagram of a heterogeneous Federal transfer learning framework (HFTL);

FIG. 3 is a DDS test stand;

FIG. 4 is a CWRU test stand;

FIG. 5 is an RDS test stand;

FIG. 6 is a SWJTU bench;

FIG. 7 shows the results of comparative experiments.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments that depart from the specific details, and the details of the present description may be modified or varied from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention by way of illustration, and the following embodiments and features in the embodiments may be combined with each other without conflict.

Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.

Referring to fig. 1 to 7, an embodiment of the present invention provides a cross-bearing migration diagnosis method based on heterogeneous federal domain generalization network, which specifically includes the following steps:

Step 1: Original vibration signals are collected from the mechanical equipment through an acceleration sensor, so as to construct the subsequent migration diagnosis tasks. The collected original vibration signals are then expanded into samples by sliding window sampling.
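Sliding window sampling can be sketched as follows. The window length and stride are not stated in the source, so the values below are placeholders, and the function name `sliding_window_samples` is hypothetical.

```python
from typing import List, Sequence


def sliding_window_samples(signal: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Cut one long vibration record into overlapping fixed-length samples."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    samples = []
    # Stop while a full window still fits inside the record.
    for start in range(0, len(signal) - window + 1, stride):
        samples.append(list(signal[start:start + window]))
    return samples


# Toy record of 10 points, window of 4 points, step of 2 points (assumed values).
record = [float(i) for i in range(10)]
segments = sliding_window_samples(record, window=4, stride=2)
```

A stride smaller than the window makes neighbouring samples overlap, which is how a single record yields more training samples than non-overlapping slicing would.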

Step 2: a decoupling domain adaptive basis model (DDA) based on a one-dimensional convolutional neural network is constructed.

As shown in fig. 1, the backbone network of the decoupling domain adaptive base model (DDA) includes four parts: a feature extractor G_FE(θ_FE), a decoupler G_D(θ_D), a reconstructor G_R(θ_R) and a fault classifier G_FC(θ_FC), where θ_FE, θ_D, θ_R and θ_FC denote the trainable weights of the corresponding networks. The feature extractor mines knowledge of the distribution differences and reduces them, and then obtains the general features F_G; the fault classifier extracts the label features F_L to identify the fault type; the decoupler decouples and separates the fault-related features F_FR from the fault-uncorrelated features F_FI; the reconstructor reconstructs the general features from the fault-related and fault-uncorrelated features, so the decoupler and the reconstructor can be regarded as the encoder and decoder of an autoencoder.

The optimization objective of the decoupling domain adaptive basic model mainly comprises the following three parts: (1) decoupling characterization: learning a decoupling characterization to separate noise-induced fault uncorrelated features from the generic features; (2) separation and reconstruction: maximizing a distribution distance between the fault-related features and the fault-unrelated features so as to ensure independence of the extracted fault-related features; (3) distribution self-adaption: minimizing the distribution difference between any two fault-related features. These three parts will be described in detail below:

(1) Decoupling characterization: to achieve feature decoupling, noise-induced fault-uncorrelated features are removed through adversarial training between the decoupler and the fault classifier. First, the feature extractor, the decoupler and the fault classifier are trained with the cross-entropy loss L_C to identify fault categories accurately.

where X^i and Y^i denote the i-th data sample and its label, respectively; C denotes the fault types; and I(·) denotes the indicator function, equal to 1 when argmax(Y^i) = c and 0 otherwise.
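A cross-entropy loss consistent with the symbols defined above can be written out as below. This is offered as an assumption rather than as the patent's exact formula: the sample-domain symbol and the predicted label vector (written with a hat) are notation chosen here, and the indicator is taken to be 1 when its condition holds.

```latex
L_C \;=\; -\,\mathbb{E}_{(X^i,\,Y^i)\sim\mathcal{D}}
  \left[\; \sum_{c=1}^{C} I\!\left(\arg\max(Y^i) = c\right)\,
  \log \hat{Y}^i_c \;\right]
```

With a one-hot label, only the term of the true class survives the sum, so minimizing L_C raises the predicted probability of the correct fault type.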

The weight parameters of the feature extractor and the fault classifier are then fixed, and the decoupler is trained by maximizing the information entropy loss so that it deceives the fault classifier and learns the fault-uncorrelated features. The information entropy loss measures the purity of a sample's predicted label: the smaller the information entropy loss, the purer the predicted label probabilities. The information entropy (IE) loss L_IE is expressed as follows:

where the symbols denote the predicted label vector of sample X^i and the c-th element of that vector. The predicted label vector is obtained by:

(2) Separation and reconstruction: to enhance the independence of the extracted fault-related features, the decoupler is trained by maximizing the mean discrepancy loss L_D to widen the distribution difference between the fault-related and fault-uncorrelated features:

where φ(·) is a high-dimensional mapping function into the reproducing kernel Hilbert space (RKHS), and the fault-related features F_FR and the fault-uncorrelated features F_FI are obtained by the formula:

{F_FR, F_FI} = G_D(G_FE(X)) (5)
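The mean discrepancy between two feature sets can be illustrated with the simplest possible mapping. The sketch below assumes φ is the identity (a linear kernel), which is a simplification chosen here; the source only states that φ maps into a reproducing kernel Hilbert space. The names `mean_vector` and `linear_mean_discrepancy` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(features: List[Vector]) -> Vector:
    """Coordinate-wise mean of a set of feature vectors."""
    dim = len(features[0])
    return [sum(row[j] for row in features) / len(features) for j in range(dim)]


def linear_mean_discrepancy(a: List[Vector], b: List[Vector]) -> float:
    """Squared 2-norm of the difference between the two feature means."""
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))


# Toy fault-related and fault-uncorrelated features (made-up numbers).
f_fr = [[1.0, 0.0], [3.0, 0.0]]
f_fi = [[0.0, 1.0], [0.0, 3.0]]
separation = linear_mean_discrepancy(f_fr, f_fi)
```

The same quantity serves both roles described in the text: it is maximized between F_FR and F_FI to separate them, and minimized between the fault-related features of two domains to align them.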

(3) Distribution adaptation: after the two processes described above, to align the distributions of the extracted fault-related features across different domains, the mean discrepancy index is also minimized as the domain confusion loss L_DC to train the feature extractor and the decoupler, thereby achieving distribution adaptation.

Suppose the client contains Q_m auxiliary domains; there will then be K domain confusion losses. According to transfer learning theory, each domain confusion loss represents the distribution similarity of the corresponding two auxiliary domains; that is, a smaller loss means the feature extractor will focus too much on aligning the distributions of the corresponding two auxiliary domains. Therefore, so that the feature extractor learns more general domain-invariant features, a purity loss L_P is constructed using information entropy, and the feature extractor and the decoupler are trained to reduce the purity among all domain confusion losses, i.e. to maximize the purity loss L_P.
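One way to read the purity loss is as the entropy of the normalized domain confusion losses. The sketch below makes two assumptions that the surviving text does not confirm: that one loss exists per unordered pair of auxiliary domains, and that the losses are normalized by their sum before the entropy is taken. The function name `purity_loss` and the domain labels are hypothetical.

```python
import math
from itertools import combinations
from typing import List


def purity_loss(dc_losses: List[float], eps: float = 1e-12) -> float:
    """Entropy of the normalized domain confusion losses (assumed form of L_P)."""
    total = sum(dc_losses)
    shares = [loss / (total + eps) for loss in dc_losses]
    return -sum(s * math.log(s + eps) for s in shares)


# Three auxiliary domains give three unordered pairs under the pairwise assumption.
domains = ["A1", "A2", "A3"]
pairs = list(combinations(domains, 2))

balanced = purity_loss([0.2, 0.2, 0.2])    # no pair dominates: high entropy
skewed = purity_loss([0.58, 0.01, 0.01])   # one pair dominates: low entropy
```

Under this reading, maximizing L_P discourages the training from concentrating on any single pair of auxiliary domains.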

The data of the multiple auxiliary domains of the source client participate fully in all three optimization processes above. Because the target-domain data of the target client are assumed to be inaccessible during training and can be used only in the testing stage, only the source-domain data of the target client participate, and only in two optimization processes: decoupling characterization, and separation and reconstruction.

Step 3: A heterogeneous federated transfer learning network (HFDGN) based on a mutual information weight matching mechanism (MIWM) is constructed; as shown in fig. 2, it adopts a heterogeneous federated transfer learning framework (HFTL).

The heterogeneous migration learning framework proposed by the present invention is explained in detail from three aspects:

(1) heterogeneous migration learning frame structure

As shown in fig. 2, the heterogeneous federal migration learning framework is composed of a plurality of source clients, a target domain client, a central server, and a mutual information weight matching (Mutual information weight matching, MIWM) mechanism. Wherein the source client and the target client may be heterogeneous; the central server is used for realizing the distribution processing of decoupling domain self-adaptive basic model (DDA) weight parameters trained by all source clients; a mutual information weight matching mechanism (MIWM) is used to evaluate the contribution of each source client to the target client during the weight parameter assignment process.

(2) Federal communication paradigm

The federated communication paradigm shown in fig. 2 indicates that all source clients need to upload the weight parameters of the feature extractor, the decoupler and the fault classifier of the decoupling domain adaptive base model (DDA) to the central server; the target client needs to upload the general features F_G and the label features F_L that its DDA extracts from the source-domain data. Because the high-dimensional features extracted by the feature extractor are more general and transferable, the central server aggregates only the feature-extractor weight parameters from all source clients.

where M denotes the number of source clients; the per-client term denotes the weight parameters of the feature extractor from the i-th source client; and w_i denotes the weight matching the corresponding source client to the target client, obtained through the mutual information weight matching mechanism (MIWM). Finally, the central server assigns the aggregated weight parameters to each source client and to the target client.

(3) Mutual information weight matching mechanism (MIWM)

In the mutual information weight matching mechanism, mutual information is used to evaluate the contribution of each source client to the target client. Mutual information reflects the similarity between two data sample domains: the greater the mutual information, the greater the similarity. The mutual information between two data sample domains is defined as follows:

where the terms denote, respectively, the marginal probability distributions of the two domains, the joint probability distribution between the two domains, and the product of the marginal probability distributions of the two domains. The relationship between the joint probability distribution and the marginal probability distributions is as follows:

Because it is difficult to compute the mutual information between unknown continuous variables with equation (10), equation (10) is re-expressed through the Donsker-Varadhan variational representation as:

where T(θ) denotes the family of functions formed by the network parameters. With the mutual information estimate of equation (12), the contribution MI_i of the i-th source client to the target client is obtained by the following formula:

where the two network symbols denote the fault classifier and the decoupler in the decoupling domain adaptive base model trained by the i-th source client, and F_G and F_L denote the general features and the label features obtained by the decoupling domain adaptive base model trained on the source-domain data in the target client. Finally, the weight w_i in formula (9) can be expressed as:

where τ denotes the temperature coefficient; the smaller the temperature coefficient, the larger the differences among {MI_1, MI_2, …, MI_M} become.
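The effect of the temperature coefficient and the weighted aggregation can be sketched together. The weight formula itself is not reproduced in this text, so the sketch assumes a softmax over MI_i / τ, which is the usual form for a temperature-scaled weighting; the function names `miwm_weights` and `aggregate` and all numbers are hypothetical.

```python
import math
from typing import List


def miwm_weights(mi_scores: List[float], tau: float) -> List[float]:
    """Softmax of MI_i / tau (assumed form of the matching weights w_i)."""
    scaled = [m / tau for m in mi_scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_params: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of the feature-extractor weights from the M source clients."""
    size = len(client_params[0])
    return [sum(w * p[j] for w, p in zip(weights, client_params)) for j in range(size)]


mi = [0.9, 0.6, 0.3]                 # made-up contributions of three source clients
sharp = miwm_weights(mi, tau=0.1)    # small tau: the best client dominates
flat = miwm_weights(mi, tau=10.0)    # large tau: weights are nearly uniform
theta = aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], sharp)
```

With a small τ the aggregated extractor stays close to that of the most similar source client, while a large τ approaches plain averaging.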

In combination with the decoupling domain adaptive basic model and the heterogeneous federation migration learning framework, a final Heterogeneous Federation Domain Generalized Network (HFDGN) is constructed to implement data federation under privacy protection.

Step 4: The training samples divided from all auxiliary domains of the source clients and from the source domain of the target client are input into the constructed decoupling domain adaptive base model (DDA), and the corresponding local client model is trained with the optimization objective function of the DDA.

Step 5: The parameters of the trained decoupling domain adaptive base models of the target client and the source clients are uploaded to the central server, and federated migration fault diagnosis is then performed with the constructed heterogeneous federated transfer learning network (HFDGN).

Step 6: after repeated iterative training, the error curve tends to be stable, the model training is completed, and the trained heterogeneous federal domain generalization network is used for heterogeneous multi-source federal migration diagnosis.
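The stopping condition in Step 6, an error curve that tends to be stable, can be checked in many ways. The sketch below shows one simple possibility, comparing the mean error of the latest rounds with that of the rounds just before; the window size, the tolerance and the name `has_plateaued` are assumptions, since the source gives no numerical criterion.

```python
from typing import List


def has_plateaued(errors: List[float], window: int = 5, tol: float = 1e-3) -> bool:
    """True when the mean error of the last `window` rounds barely differs
    from the mean of the `window` rounds before them."""
    if len(errors) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(errors[-window:]) / window
    previous = sum(errors[-2 * window:-window]) / window
    return abs(previous - recent) < tol


# Toy error curve: decays for 10 rounds, then stays flat for 10 rounds.
curve = [1.0 / (r + 1) for r in range(10)] + [0.1] * 10
still_training = has_plateaued(curve[:10])
finished = has_plateaued(curve)
```

In a federated loop, such a check would run on the central server after each aggregation round to decide whether another round of local training is needed.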

The effectiveness of the above-described intelligent diagnosis method is described below by experimental results.

Verification experiment: the experiment collects vibration signals (CWRU, RDS, SWJTU) of three bearing fault simulation experiment tables and vibration signals (DDS) of one gear fault simulation experiment table.

(1) As shown in fig. 3, the DDS test stand mainly comprises five parts, namely a motor, a planetary gearbox, a parallel gearbox and a magnetic powder brake. Signals under different working conditions are obtained by simulating loads with the magnetic powder brake. The planetary gearbox has five health states: normal, surface wear, root breakage, tooth defect and tooth breakage. In addition, three working conditions were simulated: 0 N·m (G1), 1.4 N·m (G2) and 2.8 N·m (G3).

(2) The CWRU bearing data set is the standard bearing data set published by Case Western Reserve University (CWRU), USA. A schematic diagram of its test stand is shown in fig. 4; it consists of a motor, bearings at both ends of the motor, a torque sensor and a dynamometer. Four health states were simulated: normal, inner ring fault, rolling element fault and outer ring fault. The data set covers four working conditions: 0 hp (C1), 1 hp (C2), 2 hp (C3) and 3 hp (C4).

(3) A schematic diagram of the test stand used to collect the RDS bearing data set is shown in fig. 5; it consists of a servo motor, a coupling, a rotor, bearings at both ends of the rotor and bearing blocks. As with the CWRU bearing data set, four health states (normal, inner ring fault, rolling element fault and outer ring fault) were simulated. By fitting rotors of different weights, four load conditions were simulated: 0 N (R1), 14 N (R2), 28 N (R3) and 44 N (R4).

(4) The SWJTU bearing data set comes from Southwest Jiaotong University. A schematic diagram of its test stand is shown in fig. 6; it consists of a motor, support bearings, a faulty bearing under test and a loading system. Four health states (normal, inner ring fault, rolling element fault and outer ring fault) and four load conditions (0 kN (B1), 1 kN (B2), 2 kN (B3) and 3 kN (B4)) were simulated.

From the four data sets described above, a CWRU+RDS+SWJTU→DDS heterogeneous multi-source federated diagnosis case was constructed to evaluate the diagnostic performance of HFDGN.

Comparison experiment:

To demonstrate the superiority of the migration diagnosis method based on the heterogeneous federated domain generalization network (HFDGN), it is compared with typical current domain generalization diagnosis models (WDCNN, Whitening-Net) and classical domain adaptation diagnosis models (DANN, DDC); the experimental results on six generalization migration diagnosis tasks are shown in fig. 7. Fig. 7 shows that the proposed heterogeneous federated domain generalization network has higher migration diagnosis accuracy and stronger generalization capability. In addition, the average diagnostic accuracy of HFDGN over the six generalization migration diagnosis tasks exceeds 89.85%, approximately the same as that of the domain adaptation models used for comparison (DANN, DDC). In particular, on the G1-G3, G2-G3 and G3-G2 migration diagnosis tasks, HFDGN improves diagnostic accuracy more markedly than all the other comparison models.
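The six task names can be enumerated from the three DDS working conditions. The sketch below assumes the six tasks are the ordered pairs of G1, G2 and G3, which is consistent with the three task names quoted in the text but is not stated outright; the variable names are hypothetical.

```python
from itertools import permutations

# DDS load conditions as listed for the gear test stand.
dds_conditions = {"G1": "0 N·m", "G2": "1.4 N·m", "G3": "2.8 N·m"}

# Ordered pairs of distinct conditions (assumed task definition).
tasks = [f"{first}-{second}" for first, second in permutations(dds_conditions, 2)]
```

Three conditions yield exactly six ordered pairs, which matches the number of tasks reported.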

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A method for intelligent fault migration diagnosis based on heterogeneous federated domain generalized networks, characterized in that the method specifically includes the following steps:

S1: Collect raw vibration signals from mechanical equipment using sensors, and then expand the collected raw vibration signals into samples;

S2: Construct the Decoupled Domain Adaptive Base Model, abbreviated as DDA model;

S3: Construct a heterogeneous federated transfer learning network based on the mutual information weight matching mechanism (abbreviated as MIWM); the heterogeneous federated transfer learning network is abbreviated as HFDGN and adopts a heterogeneous transfer learning framework;

S4: Input the training samples from all auxiliary domains of the source client and the source domain of the target client into the constructed DDA model, and use the optimization objective function in the DDA model to train the corresponding local client model;

S5: Upload the parameters of the trained DDA model to the central server, and then use the constructed HFDGN network to perform federated migration fault diagnosis;

S6: After multiple iterations of training, the error curve tends to stabilize, and the HFDGN network training is completed. The trained HFDGN network is then used for heterogeneous multi-source federated migration diagnosis.
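As an illustration of step S1 only, the following minimal sketch shows one common way to expand a long raw vibration record into many training samples, namely overlapping sliding windows. The function name `segment_signal` and the `window` and `stride` parameters are assumptions introduced here for illustration; the claim does not specify how the samples are formed.

```python
def segment_signal(signal, window, stride):
    """Cut a 1-D signal into fixed-length samples with a sliding window.

    Hypothetical helper: a stride smaller than the window yields overlapping
    samples, which multiplies the number of samples obtained from one record.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(signal) - window
    return [signal[start:start + window]
            for start in range(0, last_start + 1, stride)]


# Toy record of 10 points, windows of 4 points shifted by 2 points.
raw = list(range(10))
samples = segment_signal(raw, window=4, stride=2)
```

With these toy values the record yields four overlapping samples, the first starting at index 0 and the last ending at the final point of the record.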

2. The intelligent fault migration diagnosis method according to claim 1, characterized in that, in step S2, the backbone network of the DDA model comprises four parts: a feature extractor $G_{FE}(\theta_{FE})$, a decoupler $G_D(\theta_D)$, a reconstructor $G_R(\theta_R)$, and a fault classifier $G_{FC}(\theta_{FC})$, wherein $\theta_{FE}$, $\theta_D$, $\theta_R$, and $\theta_{FC}$ represent the trainable weights of the corresponding network models; the feature extractor is used to mine knowledge of distribution differences and reduce those differences, thereby obtaining general features $F_G$; the fault classifier is dedicated to extracting label features $F_L$ to identify fault types; the decoupler is used to decouple and separate fault-related features $F_{FR}$ and fault-independent features $F_{FI}$; the reconstructor reconstructs general features from the fault-related features and fault-independent features;

The optimization objectives of the DDA model include the following three parts: 1) Decoupling representation: learn a decoupled representation to separate fault-independent features caused by noise from the general features; 2) Separation and reconstruction: maximize the distribution distance between fault-related features and fault-independent features; 3) Distribution adaptation: minimize the distribution difference between any two fault-related features.

3. The intelligent fault migration diagnosis method according to claim 2, characterized in that, in step S2, the decoupling representation in optimization objective 1) of the DDA model is specifically: using the decoupler and the fault classifier in an adversarial training manner to remove fault-independent features caused by noise; first, the cross-entropy loss $L_C$ is used to train the feature extractor, decoupler and fault classifier to accurately identify fault types;

where the expectation is taken over the sample domain; $X^i$ and $Y^i$ represent the i-th data sample and its corresponding label, respectively; $C$ represents the fault types; $I(\cdot)$ represents the indicator function, where $I = 0$ when $\arg\max(Y^i) = c$;

Then, with the weight parameters of the feature extractor and the fault classifier fixed, the decoupler is trained to deceive the fault classifier by maximizing the information entropy loss, thereby learning fault-independent features; the information entropy loss $L_{IE}$ is expressed as follows:

where the expectation is taken over the sample domain, and the loss is computed from the predicted label vector of sample $X^i$; the c-th element of the predicted label vector is obtained by the following formula:

Through adversarial training between formulas (1) and (2), the final fault-independent features and fault-related features are obtained by decoupling.
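The two losses of the adversarial scheme in claim 3 can be sketched as follows. This is a minimal illustration rather than the claimed formulas: it assumes the standard cross-entropy over integer class labels, the Shannon entropy of the predicted label vector, and a softmax to obtain the predicted label vector from classifier outputs. The names `softmax`, `cross_entropy` and `information_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Assumed mapping from classifier outputs to a predicted label vector."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch, labels):
    """Mean cross-entropy; labels are integer fault classes (assumed form)."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def information_entropy(logits_batch):
    """Mean Shannon entropy of the predicted label vectors (assumed form)."""
    total = 0.0
    for z in logits_batch:
        probs = softmax(z)
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(logits_batch)
```

Under these assumptions, minimizing `cross_entropy` sharpens the classifier's predictions, while maximizing `information_entropy` pushes the predicted label vector toward uniform, which is one way a decoupler could produce features the fault classifier cannot use.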

4. The intelligent fault migration diagnosis method according to claim 2, characterized in that, in step S2, the separation and reconstruction in optimization objective 2) of the DDA model is specifically: training the decoupler to amplify the distribution difference between fault-related features and fault-independent features by maximizing the average difference loss $L_D$;

where the two expectations are taken over the fault-independent features and the fault-related features, respectively; $\varphi(\cdot)$ is a high-dimensional mapping function into the reproducing kernel Hilbert space; and the norm used is the 2-norm of the reproducing kernel Hilbert space;

The fault-related features $F_{FR}$ and fault-independent features $F_{FI}$ are obtained by the following formula:

$\{F_{FR}, F_{FI}\} = G_D(G_{FE}(X)) \quad (5)$

Meanwhile, the reconstructor is trained with the reconstruction loss $L_R$ to reconstruct general features from the fault-related features and fault-independent features;

where the expectation is taken over the sample domain.
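The separation and reconstruction objective of claim 4 can be illustrated with the following minimal sketch. It makes two assumptions that are not stated in the claim: the high-dimensional mapping is replaced by the identity map, so the average difference loss reduces to the squared distance between feature means, and the reconstruction loss is taken to be a mean squared error. All function names are hypothetical.

```python
def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def average_difference_loss(f_fr, f_fi):
    """Squared distance between the mean fault-related and mean
    fault-independent features (identity feature map assumed)."""
    mu_fr = mean_vector(f_fr)
    mu_fi = mean_vector(f_fi)
    return sum((a - b) ** 2 for a, b in zip(mu_fr, mu_fi))


def reconstruction_loss(f_general, f_reconstructed):
    """Mean squared error between general features and their
    reconstruction (assumed form of the reconstruction loss)."""
    count = len(f_general)
    return sum(
        sum((g - r) ** 2 for g, r in zip(g_vec, r_vec))
        for g_vec, r_vec in zip(f_general, f_reconstructed)
    ) / count
```

In this sketch the decoupler would be updated to increase `average_difference_loss`, and the reconstructor to decrease `reconstruction_loss`; a kernel-based mapping could replace the identity map without changing the structure.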

5. The intelligent fault migration diagnosis method according to claim 2, characterized in that, in step S2, the distribution adaptation in optimization objective 3) of the DDA model is: using minimization of the average difference index as the domain confusion loss $L_{DC}$ to train the feature extractor and decoupler, thereby achieving distribution adaptation;

where the expectation is taken over the fault-independent features; $\varphi(\cdot)$ is a high-dimensional mapping function into the reproducing kernel Hilbert space; and the norm used is the 2-norm of the reproducing kernel Hilbert space;

Assuming the client has $Q_m$ auxiliary domains, there will be $K$ domain confusion losses. The purity loss $L_P$, constructed using information entropy, is used to train the feature extractor and decoupler to reduce the purity among all domain confusion losses, i.e., to maximize the purity loss $L_P$.
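One possible reading of the purity loss in claim 5 is sketched below. Both ingredients are assumptions: the number of domain confusion losses is taken to be one per unordered pair of auxiliary domains, and the purity loss is taken to be the Shannon entropy of the domain confusion losses after normalizing them to sum to one. The names `domain_pairs` and `purity_loss` are hypothetical.

```python
import math
from itertools import combinations


def domain_pairs(q_m):
    """All unordered pairs of auxiliary-domain indices (assumed pairing)."""
    return list(combinations(range(q_m), 2))


def purity_loss(confusion_losses):
    """Entropy of the normalized domain confusion losses (assumed form).

    The value is largest when all losses are equal, so maximizing it
    discourages any single domain pair from dominating.
    """
    total = sum(confusion_losses)
    if total <= 0.0:
        return 0.0  # degenerate case: nothing to balance
    probs = [loss / total for loss in confusion_losses]
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

With three auxiliary domains this pairing gives three domain confusion losses, and the entropy reaches its maximum of log 3 when the three losses are equal.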

6. The intelligent fault migration diagnosis method according to claim 2, characterized in that, in step S2, the data from multiple auxiliary domains of the source client will fully participate in the three optimization objectives of the DDA model; the source domain data of the target client will only participate in the two optimization objectives of "decoupling representation" and "separation and reconstruction".

7. The intelligent fault migration diagnosis method according to claim 1, wherein in step S3, the MIWM mechanism is used to evaluate the contribution of each source client to the target client during the weight parameter allocation process;

The expression for the contribution $M_I$ of the i-th source client to the target client is:

where the fault classifier and the decoupler are those of the DDA model trained on the i-th source client; $F_G$ and $F_L$ represent the general features and label features, respectively, obtained by training the DDA model on the source domain data in the target client; $I_\Theta$ represents the mutual information between two data sample domains, which is expressed as:

where $T(\theta)$ represents the set of functions formed by the network parameters; the expression involves the marginal probability distributions of the two domains, the joint probability distribution between the two domains, and the product of the marginal probability distributions between the two domains; $\Theta$ represents the set of weight parameters, and $\theta$ represents the weight parameters; one expectation is taken over the joint probability of the two domains, and the other over the product of the marginal probabilities of the two domains.
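The quantities listed in claim 7 (a parameterized function, an expectation over the joint distribution, and an expectation over the product of the marginals) resemble the Donsker-Varadhan lower bound used by neural mutual information estimators. The sketch below evaluates that bound for a fixed statistic function on paired samples. It is an assumption that the claimed expression has this form, and the name `dv_lower_bound` and its arguments are hypothetical.

```python
import math


def dv_lower_bound(statistic, xs, zs):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)].

    `xs[i]` and `zs[i]` are assumed to be paired draws from the joint
    distribution; pairing every x with every z approximates the product
    of the marginals.
    """
    count = len(xs)
    joint_term = sum(statistic(x, z) for x, z in zip(xs, zs)) / count
    marginal_term = sum(
        math.exp(statistic(x, z)) for x in xs for z in zs
    ) / (count * count)
    return joint_term - math.log(marginal_term)


# Perfectly dependent toy data: z always equals x.
xs = [-1.0, 1.0]
zs = [-1.0, 1.0]
bound = dv_lower_bound(lambda x, z: x * z, xs, zs)
```

In an actual estimator the statistic would be a trainable network and the bound would be maximized over its weights; here a fixed product statistic already gives a positive bound that stays below the true mutual information of log 2 for this toy data.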

8. The intelligent fault migration diagnosis method according to claim 7, characterized in that, in step S5, the central server aggregates the weight parameters of the feature extractors from all source clients, and the calculation formula is:

where $M$ represents the number of source clients; the weight parameters of the feature extractor from the i-th source client are combined into the aggregated weight parameters; $w_i$ is the weight obtained by the MIWM mechanism to match the i-th source client and the target client;

where $\tau$ represents the temperature coefficient;

Then, the central server assigns the aggregated weight parameters to each source client and target client.
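The aggregation step of claim 8 can be sketched as a weighted average of the source clients' feature extractor parameters. The sketch assumes that the matching weights are a temperature-scaled softmax of the per-client contributions, which is a guess prompted by the temperature coefficient in the claim, and it represents each client's parameters as a flat list of numbers. The names `matching_weights` and `aggregate` are hypothetical.

```python
import math


def matching_weights(contributions, tau=1.0):
    """Temperature-scaled softmax over client contributions (assumed form)."""
    scaled = [c / tau for c in contributions]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_params, weights):
    """Weighted sum of flat parameter lists, one list per source client."""
    size = len(client_params[0])
    return [
        sum(w * params[k] for w, params in zip(weights, client_params))
        for k in range(size)
    ]


# Two source clients with equal contribution receive equal weight.
weights = matching_weights([1.0, 1.0], tau=0.5)
merged = aggregate([[1.0, 2.0], [3.0, 4.0]], weights)
```

Under this assumption a smaller temperature concentrates the weight on the source client with the largest contribution, while a larger temperature moves the result toward a plain average.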

9. The intelligent fault migration diagnosis method according to claim 1 or 8, characterized in that, in step S5, the DDA model parameters uploaded to the central server include: for all source clients, the weight parameters of the feature extractor, decoupler and fault classifier in the DDA model need to be uploaded; for the target client, the general features $F_G$ and label features $F_L$ extracted from the source domain data in the DDA model need to be uploaded.