
CN118885738B - Comprehensive automated testing method and system for electric control cabinets - Google Patents

Info

Publication number
CN118885738B
Authority
CN
China
Prior art keywords
data
parameter
data set
noise
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411338200.8A
Other languages
Chinese (zh)
Other versions
CN118885738A (en)
Inventor
李艳春
华星
苏丽
濮伟新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI KANGBEI ELECTRONIC EQUIPMENT CO Ltd
Original Assignee
WUXI KANGBEI ELECTRONIC EQUIPMENT CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUXI KANGBEI ELECTRONIC EQUIPMENT CO Ltd
Priority to CN202411338200.8A
Publication of CN118885738A
Application granted
Publication of CN118885738B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/26 Discovering frequent patterns
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention relates to the technical field of data cleaning, and in particular to a comprehensive automatic test method and system for an electric control cabinet. When a simulation test model is used to test the electric control cabinet, noise data is screened effectively and denoised in a targeted manner, which improves efficiency and accuracy. The parameter time sequence data is clustered based on data similarity and a frequent pattern model is constructed, so that normal data and noise data are clearly distinguished. The relevance and the frequent patterns in the model are analyzed to quantify the noise existence probability, which provides a basis for evaluating the quality of each data set. The data quality is obtained by comprehensively considering the variation trend and fluctuation of the data in combination with the noise existence probability. Noise data sets are then identified accurately and denoised efficiently to obtain high-quality data. Finally, the performance test of the electric control cabinet is performed based on the high-quality data sets, which improves denoising efficiency and ensures the accuracy and reliability of the test result. The invention provides an efficient and accurate data processing scheme for testing electric control cabinets.

Description

Comprehensive automatic test method and system for electric control cabinet
Technical Field
The invention relates to the technical field of data cleaning, in particular to a comprehensive automatic test method and system for an electric control cabinet.
Background
An electric control cabinet (in full, an electrical control cabinet) integrates various electrical components to realize centralized control of equipment, that is, comprehensive automation of the electric control cabinet. To find potential fault points and improve the reliability and service life of the electric control cabinet, its performance needs to be tested; the conventional approach realizes comprehensive automatic testing of the electric control cabinet through software simulation detection.
Currently, when an electric control cabinet is tested, a large amount of its operation data is generally acquired first and then examined by software simulation, so the quality of the acquired data is crucial to the whole simulation detection process. Because data acquisition is disturbed by environmental and other factors, the acquired data contains noise and must be denoised. The prior art generally denoises all data indiscriminately; however, the simulation model requires a large amount of data, so indiscriminate denoising is inefficient. Accurately and efficiently screening out the noisy data from a large amount of data therefore becomes a key problem in the simulation detection process.
Disclosure of Invention
In order to solve the technical problems that denoising of a large amount of data by using an indiscriminate denoising method causes low efficiency and further influences the accuracy of a test result, the invention aims to provide a comprehensive automatic test method and system for an electric control cabinet, and the adopted technical scheme is as follows:
For any performance parameter of the electric control cabinet, acquiring a parameter data set of the performance parameter in each test period, wherein each parameter data set comprises parameter time sequence data of a plurality of data periods;
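clustering all the parameter time sequence data according to the similarity condition among the parameter time sequence data in each parameter data set to obtain a clustering result, and, according to the numerical characteristics of all parameter time sequence data in each parameter data set and a frequent pattern mining algorithm, carrying out fusion analysis on the parameter time sequence data in each cluster in the clustering result to obtain a frequent pattern model corresponding to each parameter data set;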
In the frequent pattern model, the noise existence probability of each parameter data set is determined based on the distribution condition of each item in all parameter time sequence data, the similarity condition among frequent item sets and the position of the frequent item set in the frequent pattern model;
And screening the noise data sets in all the parameter data sets according to the data quality corresponding to the parameter data sets of all the test periods, denoising the parameter time sequence data in the noise data sets to obtain high-quality data sets corresponding to all the test periods, and performing performance test of the electric control cabinet based on all the high-quality data sets corresponding to all the performance parameters.
Further, the method for acquiring the clustering result comprises the following steps:
In each parameter data set, for parameter time sequence data in any two different data periods, determining a difference characteristic value between the two parameter time sequence data based on the difference between the numerical values and the difference condition between the slope values of the numerical values;
And taking the difference characteristic value between the parameter time sequence data as distance measurement, and carrying out K-means cluster analysis on all the parameter time sequence data in the parameter data set based on a preset K value to obtain the cluster result.
Further, the method for acquiring the frequent pattern model comprises the following steps:
for any one parameter data set, determining a numerical range based on a numerical maximum value and a numerical minimum value in the parameter data set, uniformly dividing the numerical range to obtain a preset number of interval ranges, and marking different interval ranges with different labels;
replacing each numerical value in each parameter time sequence data by using the label to obtain a label sequence corresponding to each parameter time sequence data;
Obtaining the importance index of each cluster according to the occurrence frequency of all the labels and the number of label sequences in each cluster, wherein the number of label sequences in each cluster and the occurrence frequency of the labels are positively correlated with the importance index;
in the clustering result corresponding to the parameter data set, analyzing all label sequences in each cluster based on the FP-Growth algorithm to obtain a frequent pattern tree corresponding to each cluster, and arranging the frequent pattern trees of all clusters in descending order of importance index to obtain an arrangement sequence; in the arrangement sequence, taking the first frequent pattern tree as a target tree and the frequent pattern tree following the target tree as a tree to be analyzed, and adding the part of the tree to be analyzed that differs from the target tree into the target tree to obtain a new target tree; taking the frequent pattern tree following the tree to be analyzed as a new tree to be analyzed, and adding the part of the new tree to be analyzed that differs from the new target tree into the new target tree; continuing to determine new target trees in this way until all frequent pattern trees in the arrangement sequence have been traversed, and taking the final target tree as the frequent pattern model corresponding to the parameter data set.
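The labelling and tree-merging steps above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the original text: the function names are hypothetical, the preset number of intervals defaults to 10, labels are integer interval indices, and a plain nested-dictionary prefix tree stands in for a real FP-Growth frequent pattern tree (no item counts, header table, or frequency ordering). The importance index is not computed here; the trees are assumed to arrive already sorted by it.

```python
def label_sequences(dataset, n_intervals=10):
    """Map every value to the index of the equal-width interval that contains it."""
    lo = min(min(row) for row in dataset)
    hi = max(max(row) for row in dataset)
    width = (hi - lo) / n_intervals

    def label(value):
        if width == 0:                      # all values identical: a single label
            return 0
        # The maximum value would fall just past the last interval, so clamp it.
        return min(int((value - lo) / width), n_intervals - 1)

    return [[label(v) for v in row] for row in dataset]


def build_prefix_tree(sequences):
    """Nested-dict prefix tree: a simplified stand-in for a frequent pattern tree."""
    root = {}
    for seq in sequences:
        node = root
        for item in seq:
            node = node.setdefault(item, {})
    return root


def merge_into_target(target, other):
    """Add to `target` every branch of `other` that `target` does not yet contain."""
    stack = [(target, other)]               # iterative to avoid deep recursion
    while stack:
        t_node, o_node = stack.pop()
        for item, o_child in o_node.items():
            stack.append((t_node.setdefault(item, {}), o_child))
    return target


def frequent_pattern_model(trees_by_importance):
    """Fold trees, assumed sorted by descending importance, into the first one.

    Note: the first tree is modified in place and returned.
    """
    target = trees_by_importance[0]
    for tree in trees_by_importance[1:]:
        merge_into_target(target, tree)
    return target
```

In this sketch the merge is a union of branches, which matches the description of adding only the differing part of each subsequent tree to the target tree; a production implementation would presumably also carry the support counts of each node.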
Further, the method for acquiring the noise existence probability comprises the following steps:
For any parameter data set, acquiring all frequent item sets in the corresponding frequent pattern model, wherein the occurrence frequency of each item in the frequent item sets is the occurrence frequency of a label corresponding to the item in the parameter data set;
For any frequent item set, determining a first integral noise-containing factor of the frequent item set based on the occurrence frequency of each item in the frequent item set, wherein the first integral noise-containing factor is in negative correlation with the occurrence frequency;
Fusing and averaging the first integral noise-containing factors and the second integral noise-containing factors of all frequent item sets to obtain a first noise-containing index of the parameter data set, wherein the first integral noise-containing factors and the second integral noise-containing factors are positively correlated with the first noise-containing index;
determining a second noise-containing index of the parameter data set based on the similarity conditions among all the frequent item sets;
and obtaining the noise existence probability of the parameter data set according to the first noise-containing index and the second noise-containing index of the parameter data set, wherein the first noise-containing index and the second noise-containing index are positively correlated with the noise existence probability.
Further, the method for acquiring the second noisy indicator comprises the following steps:
in all frequent item sets, taking the frequent item sets with the same item number as a type of frequent item set;
For any one combination, determining the independence degree value of the frequent item sets in the combination based on the difference condition of items at the same position in the two frequent item sets;
and taking the average value of the independent degree values corresponding to all combinations of all types of frequent item sets as a second noisy index of the parameter data set.
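A minimal sketch of the second noisy index follows. The text states only that the independence degree value depends on the differences between items at the same position; the specific choice below (the fraction of positions whose items differ) is an assumption, as are the function names and the default of 0 when no pair can be formed.

```python
from collections import defaultdict
from itertools import combinations


def independence_value(itemset_a, itemset_b):
    """Assumed measure: fraction of positions at which the two item sets differ."""
    differing = sum(1 for a, b in zip(itemset_a, itemset_b) if a != b)
    return differing / len(itemset_a)


def second_noise_index(frequent_itemsets):
    """Mean independence value over all pairs of item sets with the same item count."""
    by_length = defaultdict(list)
    for itemset in frequent_itemsets:
        by_length[len(itemset)].append(tuple(itemset))
    values = [independence_value(a, b)
              for group in by_length.values()
              for a, b in combinations(group, 2)]
    # Assumption: with no pair to compare, the index defaults to 0.
    return sum(values) / len(values) if values else 0.0
```

For example, the item sets ("A", "B"), ("A", "C") and ("D", "E") form three pairs with independence values 0.5, 1 and 1, so the index is their mean; an item set whose item count is unique contributes no pair.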
Further, the method for acquiring the data quality comprises the following steps:
Taking a parameter data set with noise existence probability larger than or equal to a preset noise threshold value as a target data set, taking a parameter data set with noise existence probability smaller than the preset noise threshold value as a normal data set, and setting the data quality of the normal data set to be a fixed value not smaller than 1;
for any one target data set, determining a first quality parameter of the target data set based on how the change trend of each parameter time sequence data in the target data set varies;
In each parameter time sequence data corresponding to the target data set, based on the deviation condition between each numerical value and the average value of all numerical values, obtaining a deviation factor of each numerical value, and determining the overall quality factor of each parameter time sequence data based on the deviation factors of all numerical values, wherein the overall quality factor is in negative correlation with the deviation factor;
And obtaining the data quality of the target data set according to the noise existence probability, the first quality parameter and the second quality parameter of the target data set, wherein the noise existence probability, the first quality parameter and the second quality parameter are negatively related to the data quality, and the value of the data quality is a normalized value.
Further, the method for acquiring the first quality parameter includes:
And acquiring the number of extreme points in each parameter time sequence data in the target data set, and determining a first quality parameter of the target data set based on the ratio of the sum value of the number of extreme points of all parameter time sequence data to the total number of values of all parameter time sequence data.
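The ratio described above can be sketched as follows. The definition of an extreme point used here (an interior value strictly greater than, or strictly smaller than, both of its neighbours) is an assumption, since the text does not define it; the function names are hypothetical.

```python
def count_extreme_points(series):
    """Count interior values that are strict local maxima or strict local minima."""
    return sum(
        1
        for j in range(1, len(series) - 1)
        # Both differences share a sign exactly when series[j] is a strict extremum.
        if (series[j] - series[j - 1]) * (series[j] - series[j + 1]) > 0
    )


def first_quality_parameter(dataset):
    """Total number of extreme points divided by the total number of values."""
    total_extrema = sum(count_extreme_points(row) for row in dataset)
    total_values = sum(len(row) for row in dataset)
    return total_extrema / total_values
```

A series that changes direction often yields a ratio close to 1, while a monotonic series yields 0, which is consistent with the first quality parameter being negatively correlated with data quality.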
Further, the method for acquiring the high-quality data set comprises the following steps:
Taking the parameter data set with the data quality smaller than or equal to a preset quality threshold value as a noise data set in all the parameter data sets;
Denoising all parameter time sequence data in each noise data set based on a Kalman filtering method to obtain a denoising data set corresponding to each noise data set;
and taking the parameter data set with the data quality larger than the preset quality threshold value and all the denoising data sets as high-quality data sets corresponding to all the test periods.
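The screening and denoising steps above can be sketched as below. The text names Kalman filtering but gives no state model or noise settings, so the scalar random-walk model, the variances `process_var` and `measurement_var`, the initial estimate variance, and all function names are assumptions made for illustration only.

```python
def kalman_denoise(series, process_var=1e-3, measurement_var=1e-1):
    """Scalar Kalman filter with an assumed random-walk state model."""
    estimate = float(series[0])   # initial state estimate: the first measurement
    error_var = 1.0               # initial estimate variance (assumed)
    smoothed = [estimate]
    for measurement in series[1:]:
        error_var += process_var                          # predict step
        gain = error_var / (error_var + measurement_var)  # Kalman gain
        estimate += gain * (measurement - estimate)       # update with measurement
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def build_high_quality_sets(datasets, qualities, quality_threshold):
    """Denoise only the data sets whose quality is at or below the threshold."""
    result = []
    for data, quality in zip(datasets, qualities):
        if quality <= quality_threshold:      # noise data set
            data = [kalman_denoise(row) for row in data]
        result.append(data)
    return result
```

Data sets whose quality exceeds the threshold pass through unchanged, which is where the efficiency gain over indiscriminate denoising comes from.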
Further, the performance test of the electric control cabinet based on all high-quality data sets corresponding to all performance parameters includes:
And establishing a simulation test model corresponding to the electric control cabinet, taking all high-quality data sets corresponding to all performance parameters as a database of the simulation test model, and performing performance test of the electric control cabinet to obtain a test result.
The invention also provides a comprehensive automatic test system for the electric control cabinet, which comprises:
A memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any one of the methods when the computer program is executed.
The invention has the following beneficial effects:
According to the invention, when the simulation test model is used to test the electric control cabinet, noise data can be accurately screened out from a large amount of data and denoised in a targeted manner, which effectively improves efficiency and ensures the accuracy of the subsequent model test result. First, for any performance parameter, parameter data sets of the performance parameter in a plurality of test periods are acquired, each containing a plurality of parameter time sequence data. Because noise data is an occasional phenomenon compared with normal data, the parameter time sequence data in each parameter data set can be clustered based on the similarity among the data to obtain a clustering result, and the clusters in the clustering result distinguish the normal data from the noise data in the parameter data set to a certain extent. Further, based on the numerical characteristics of all parameter time sequence data in the parameter data set and a frequent pattern mining algorithm, fusion analysis is performed on each cluster in the clustering result to obtain a frequent pattern model, which represents the difference between normal data and noise data more clearly. Further, the frequent pattern model is analyzed in terms of the relevance among the data and the position distribution of the frequent patterns and frequent item sets within the model, so that the possibility that noise data exists is quantified as the noise existence probability of the parameter data set, which provides a reference for the subsequent evaluation of the quality of the parameter data set.
Further, since the noise data has characteristics on data fluctuation and trend change, the change trend and the data fluctuation condition of the parameter time sequence data are comprehensively considered, and the data quality of the parameter data set can be accurately estimated by combining the noise existence probability. This helps to accurately identify and screen out the noisy dataset, and then denoise it, resulting in a high quality dataset. And finally, performing performance test of the electric control cabinet based on the high-quality data sets corresponding to all the performance parameters to obtain a test result. The invention can accurately screen out the noise data set, so that only the noise data set is denoised in a large number of data sets, the data denoising processing efficiency can be improved, and the accuracy and the reliability of the test result are ensured.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a comprehensive automatic test method for an electric control cabinet according to one embodiment of the present invention;
FIG. 2 is a flow chart of a method for constructing a frequent pattern model according to an embodiment of the present invention;
FIG. 3 is an exemplary schematic diagram of a merging sub-step of a frequent pattern model creation process according to one embodiment of the present invention;
FIG. 4 is a flow chart of a method for obtaining a noise existence probability according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for obtaining data quality according to an embodiment of the present invention;
FIG. 6 is a system block diagram of a comprehensive automatic test system for an electric control cabinet according to one embodiment of the present invention;
FIG. 7 is a schematic diagram of the system architecture of a comprehensive automatic test system for an electric control cabinet according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following description refers to the specific implementation, structure, characteristics and effects of a comprehensive automatic testing method and system for an electric control cabinet according to the invention in combination with the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a comprehensive automatic test method and a system for an electric control cabinet.
Referring to FIG. 1, a flow chart of a comprehensive automatic test method for an electric control cabinet according to an embodiment of the present invention is shown, the method comprising the steps of:
Step S1, for any performance parameter of an electric control cabinet, acquiring a parameter data set of the performance parameter in each test period, wherein each parameter data set comprises parameter time sequence data of a plurality of data periods.
In the process of comprehensively and automatically testing an electric control cabinet by software simulation detection, a large amount of data must be acquired to ensure the accuracy of the simulation detection result. Various external factors introduce noise data into the data sets during acquisition, so the data sets must be denoised to ensure the accuracy of the simulation and thereby improve the reliability of the simulation result. In view of the large data volume, the embodiment of the invention analyzes the data in each data set so that only the data sets containing noise are screened out and denoised, which effectively improves denoising efficiency and ensures the credibility of the simulation result.
The electric control cabinet is widely applied in fields such as new energy, electric control, food machinery and machine tools. Because the data characteristics of the electric control cabinet differ under different process flows, the invention tests under one given process flow in order to obtain more accurate test results. For example, when a CNC machine tool of a machining center is working, its spindle motor and feed system generate variable loads, so that the current and power parameters in the electric control cabinet fluctuate. As another example, in an injection molding process, the power demand of the injection molding machine changes periodically through heating, injection, cooling, and mold opening and closing, so that the current and power parameters in the electric control cabinet fluctuate regularly.
For any performance parameter of the electric control cabinet, sensors installed in the electric control cabinet are used to collect parameter data sets for a plurality of test periods under the current process flow, and each parameter data set comprises parameter time sequence data of a plurality of data periods. It should be noted that the performance parameters include voltage, current, and the like, and that a test period is the period from the start of the process flow to its end. In this embodiment the data period is set to 10 minutes and the sampling time interval to 0.1 s; the specific values of both can be adjusted according to the implementation scenario and are not limited herein.
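With a 10-minute data period and a 0.1 s sampling interval, each parameter time sequence data holds 6000 values. A minimal sketch of how one test period's samples might be split into data periods is given below; the function name and the choice to drop a trailing partial data period are assumptions, not details from the embodiment.

```python
def split_into_data_periods(samples, sample_interval_s=0.1, data_period_s=600.0):
    """Split one test period's samples into consecutive full data periods."""
    # 600 s / 0.1 s = 6000 samples per data period with the default settings.
    per_period = int(round(data_period_s / sample_interval_s))
    n_periods = len(samples) // per_period
    # Assumption: samples left over after the last full data period are dropped.
    return [list(samples[p * per_period:(p + 1) * per_period])
            for p in range(n_periods)]
```

Each returned row corresponds to one parameter time sequence data, and the list of rows corresponds to one parameter data set for that test period.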
So far, for a given process flow, a plurality of parameter data sets corresponding to each performance parameter of the electric control cabinet can be obtained, and each parameter data set can be analyzed subsequently to determine whether it contains noise data.
Step S2, clustering all the parameter time sequence data according to the similarity condition among the parameter time sequence data in each parameter data set to obtain a clustering result, and carrying out fusion analysis on the parameter time sequence data in each cluster in the clustering result according to the numerical characteristics of all the parameter time sequence data in each parameter data set and a frequent pattern mining algorithm to obtain a frequent pattern model corresponding to each parameter data set.
Because noise data is an occasional phenomenon compared with normal data, the parameter time sequence data in each parameter data set can be subjected to cluster analysis based on the similarity between the data to obtain a cluster result, and the parameter time sequence data in each cluster result can distinguish the normal data and the noise data in the parameter data set to a certain extent. Because the relevance between the normal data is strong, based on the numerical characteristics of all parameter time sequence data in the parameter data set and a frequent pattern mining algorithm, fusion analysis is carried out on each cluster in the clustering result, so that a frequent pattern model is obtained, and the frequent pattern model at the moment can more remarkably represent the difference between the normal data and the noise data, so that preparation is made for the subsequent noise data screening process.
By clustering the parameter time sequence data in each parameter data set, the similar parameter time sequence data can be classified, so that the data distribution of the parameter data sets is simplified, meanwhile, the clustering result can reveal the internal structure and mode in the parameter data, and the understanding of the relation between the parameter time sequence data is facilitated. Therefore, according to the similarity among the numerical values in each parameter time sequence data in each parameter data set, clustering analysis is carried out on all the parameter time sequence data, so that a clustering result is obtained.
Preferably, in one embodiment of the present invention, the method for obtaining a clustering result includes:
The purpose of clustering is to make samples within the same cluster as similar as possible, while samples of different clusters are as dissimilar as possible. In this embodiment of the present invention, the similarity between the parameter time series data is determined mainly based on the numerical difference case between the parameter time series data.
Therefore, in each parameter data set, for parameter time sequence data in any two different data periods, a difference characteristic value between the two parameter time sequence data is determined based on the difference between numerical values and the difference condition between slope values of the numerical values, wherein a formula model of the difference characteristic value comprises:
$$D_{i} = \mathrm{Norm}\left(\sum_{j=1}^{n}\left(\left|k_{i,1}^{j}-k_{i,2}^{j}\right|+\left|x_{i,1}^{j}-x_{i,2}^{j}\right|\right)\right);$$
wherein $D_{i}$ represents the difference characteristic value between parameter time sequence data 1 and parameter time sequence data 2 in the $i$-th parameter data set; $k_{i,1}^{j}$ represents the slope value of the $j$-th value of parameter time sequence data 1 in the $i$-th parameter data set; $k_{i,2}^{j}$ represents the slope value of the $j$-th value of parameter time sequence data 2 in the $i$-th parameter data set; $x_{i,1}^{j}$ represents the $j$-th value of parameter time sequence data 1 in the $i$-th parameter data set; $x_{i,2}^{j}$ represents the $j$-th value of parameter time sequence data 2 in the $i$-th parameter data set; $\mathrm{Norm}$ represents a normalization function; $n$ represents the total number of values in the parameter time sequence data.
In the formula model of the difference characteristic value, for any two different parameter time sequence data in the parameter data set, the difference between the values at corresponding positions, $\left|x_{i,1}^{j}-x_{i,2}^{j}\right|$, is calculated; the smaller this difference, the more similar the values at the same position in the two parameter time sequence data. Similarly, after the slope value of each value is obtained, the difference between the slope values at the same position, $\left|k_{i,1}^{j}-k_{i,2}^{j}\right|$, is analyzed. Since the slope value represents the change trend of the data, a smaller difference indicates a more consistent change trend at the same position of the two parameter time sequence data. Therefore, at each position, $\left|k_{i,1}^{j}-k_{i,2}^{j}\right|$ and $\left|x_{i,1}^{j}-x_{i,2}^{j}\right|$ are added to represent the similarity of the two parameter time sequence data at that position: the smaller the sum, the higher the degree of similarity, and conversely, the larger the sum, the higher the degree of difference. Finally, the sums $\left|k_{i,1}^{j}-k_{i,2}^{j}\right|+\left|x_{i,1}^{j}-x_{i,2}^{j}\right|$ corresponding to all positions, that is, to all values, are accumulated, and the accumulated sum is normalized to obtain the difference characteristic value between the two parameter time sequence data.
And then taking the difference characteristic value between the parameter time sequence data as a distance measure, and carrying out K-means cluster analysis on all the parameter time sequence data in the parameter data set based on a preset K value to obtain a cluster result, wherein the parameter time sequence data in each cluster in the cluster result has a relatively consistent numerical characteristic.
The slope value of each value is calculated as the ratio of the difference between the following value and that value to the time interval between the two values; the last value in time sequence has no following value, so its slope value is set equal to the slope value of the value preceding it. The preset K value is set to 5; the specific value can be adjusted according to the implementation scenario and is not limited herein. The K-means clustering algorithm is a technical means well known to those skilled in the art and is not described herein. The same position in two parameter time sequence data is explained as follows: if the values in one parameter time sequence data are arranged in time sequence as (1, 2, 3, 4) and the values in the other are arranged in time sequence as (a, b, c, d), then 1 and a are the two values at the same position; similarly, 2 and b, 3 and c, and 4 and d are each two values at the same position.
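The difference characteristic value described above can be sketched as a pairwise matrix over one parameter data set. This is a minimal illustration under stated assumptions: differences are taken as absolute values, the unspecified normalization function is assumed to be min-max scaling over all pairs in the data set, every series is assumed to hold at least two values, and the function names are hypothetical.

```python
def slope_values(series, dt=0.1):
    """Forward-difference slope per value; the last value reuses the previous slope."""
    forward = [(series[j + 1] - series[j]) / dt for j in range(len(series) - 1)]
    return forward + [forward[-1]]


def raw_difference(series_a, series_b, dt=0.1):
    """Sum over positions of |slope difference| + |value difference|."""
    slopes_a, slopes_b = slope_values(series_a, dt), slope_values(series_b, dt)
    return sum(abs(ka - kb) + abs(xa - xb)
               for ka, kb, xa, xb in zip(slopes_a, slopes_b, series_a, series_b))


def difference_matrix(dataset, dt=0.1):
    """Pairwise difference characteristic values, min-max normalised (assumed)."""
    n = len(dataset)
    raw = [[raw_difference(dataset[p], dataset[q], dt) for q in range(n)]
           for p in range(n)]
    lo = min(min(row) for row in raw)
    hi = max(max(row) for row in raw)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in raw]
```

The resulting matrix can then serve as the distance measure for the clustering step; how a precomputed distance is combined with K-means centroids is not specified in the text and is left open here.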
In other embodiments of the present invention, the following method may be adopted to analyze the similarity of the parameter time sequence data in each parameter data set, so as to implement clustering to obtain a clustering result.
Since the dynamic time warping (DTW) algorithm can be used to measure the similarity of two time series, in each parameter data set the DTW value between the parameter time series data of any two different data periods is obtained, and the DTW value is normalized to obtain the difference characteristic value between the two parameter time series data. The formula model of the difference characteristic value comprises:
$$C_{i}^{1,2} = \mathrm{Norm}\left(\mathrm{DTW}_{i}^{1,2}\right);$$
wherein $C_{i}^{1,2}$ denotes the difference characteristic value between parameter time series data 1 and parameter time series data 2 in the $i$-th parameter data set; $\mathrm{DTW}_{i}^{1,2}$ denotes the DTW value between parameter time series data 1 and parameter time series data 2 in the $i$-th parameter data set; and $\mathrm{Norm}(\cdot)$ denotes the normalization function.
In this formula model, the smaller the DTW value between two parameter time series data, the more similar the two parameter time series data are; the DTW value is therefore normalized to obtain the difference characteristic value between the two parameter time series data.
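For reference, the DTW value between two series can be computed with the classic dynamic-programming recurrence. The sketch below is a minimal pure-Python version; the function name `dtw_distance` and the absolute difference as the local cost are assumptions, and the normalization step is again omitted because it is not specified.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again (step in a only)
                cost[i][j - 1],      # b[j-1] matched again (step in b only)
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]
```

Unlike the position-wise comparison, DTW tolerates series of different lengths and local time shifts, which is why a repeated sample does not increase the distance.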
And then taking the difference characteristic value between the parameter time sequence data as a distance measure, and carrying out K-means cluster analysis on all the parameter time sequence data in the parameter data set based on a preset K value to obtain a cluster result, wherein the parameter time sequence data in each cluster in the cluster result has a relatively consistent characteristic.
So far, in each parameter data set, all the clustering clusters can be obtained by carrying out cluster analysis on the parameter time sequence data in the parameter data set, different data characteristics are provided in different clustering clusters, normal data and noise data can be distinguished to a certain extent, and a reference is provided for the subsequent process.
The frequent pattern mining algorithm can be used for retrieving frequent item set information and can quickly find out frequent patterns and associated information in the parameter time sequence data, so that whether noise data exist in the parameter data set can be more effectively identified. The clustering results have obvious differences among different clustering clusters, so that the main components and the branch components can be more intuitively distinguished in a final frequent pattern model by carrying out frequent pattern mining and fusion analysis on the different clustering clusters, and the branch components are more likely to represent noise data because the noise data are less than normal data.
Preferably, in one embodiment of the present invention, the method for acquiring the frequent pattern model includes:
Referring to fig. 2, a method flowchart of a method for constructing a frequent pattern model according to an embodiment of the invention is shown, the method includes the following steps:
step S201, based on the numerical characteristics of all the parameter time sequence data in the parameter data set, carrying out label conversion on the numerical value in each parameter time sequence data, and determining a label sequence corresponding to each parameter time sequence data.
If each value were treated as a separate class, then for a large value range, for example 0-100, the frequent pattern model would become too large. Therefore, for any one parameter data set, the value range is determined from the maximum and minimum values in the parameter data set and divided uniformly into a preset number of interval ranges, and each interval range is marked with a different tag.
And then, replacing each numerical value in each parameter time sequence data by using a label to obtain a label sequence corresponding to each parameter time sequence data.
This procedure is illustrated with an example. If the value range is 0-100 and the preset number is set to 10, the interval ranges are 0-10, 11-20, 21-30, ..., 71-80, 81-90 and 91-100. Each interval range is marked with a different letter; in this embodiment of the invention the letters A-J are used, i.e. the tag corresponding to interval range 0-10 is A, the tag corresponding to 11-20 is B, and so on. If the values in one parameter time series data are (0, 15, 14, 35, 21, 50, 78), the corresponding tag sequence is (A, B, B, D, C, E, H).
It should be noted that the preset number of settings may be adjusted according to the implementation scenario, which is not limited herein.
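The tag conversion of step S201 can be sketched as follows. The function name `to_tags` is hypothetical, and the sketch assumes a 0-100 value range with right-closed intervals (so that 50 falls into the fifth interval), which matches the sample intervals given in the example.

```python
import math
import string
from typing import List, Sequence


def to_tags(values: Sequence[float], vmin: float, vmax: float, n_bins: int = 10) -> List[str]:
    """Map each value to the letter of its interval (A, B, C, ...).

    Assumption: intervals are right-closed, i.e. (vmin + k*w, vmin + (k+1)*w],
    with vmin itself assigned to the first interval.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    if not 1 <= n_bins <= 26:
        raise ValueError("n_bins must be between 1 and 26")
    width = (vmax - vmin) / n_bins
    tags = []
    for v in values:
        idx = math.ceil((v - vmin) / width) - 1
        idx = min(max(idx, 0), n_bins - 1)  # clamp vmin and out-of-range values
        tags.append(string.ascii_uppercase[idx])
    return tags


print(to_tags([0, 15, 14, 35, 21, 50, 78], 0, 100))
# prints ['A', 'B', 'B', 'D', 'C', 'E', 'H']
```

A larger preset number gives finer tags but a larger frequent pattern model, which is the trade-off the text mentions.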
Step S202, constructing frequent pattern trees corresponding to tag sequences in all clusters in a clustering result based on an FP-Growth algorithm, and merging all the frequent pattern trees to obtain a frequent pattern model corresponding to each parameter data set.
The FP-Growth algorithm utilizes an FP-Tree (frequent pattern Tree) data structure to retrieve frequent item set information, which enables the algorithm to more quickly discover frequent patterns and associated information in the parameter time sequence data, thereby more effectively identifying whether noise data exists in the parameter data set.
Through the processing in step S201, at this time, in the parameter data set, each parameter time sequence data corresponds to one tag sequence, and the occurrence frequency of each tag may be calculated in all tags, where the occurrence frequency may represent the importance degree of the value corresponding to the tag in all values.
Because the number of parameter time sequence data in the cluster can represent the importance degree of the cluster, in each cluster, according to the occurrence frequency of all labels and the number of label sequences in each cluster, an importance degree index of each cluster is obtained, and the number of label sequences in each cluster and the occurrence frequency of labels are positively correlated with the importance degree index. The formula model of the importance index may specifically be, for example:
$$I_{i,j} = \mathrm{Norm}\left(N_{i,j} \times \sum_{k=1}^{M_{i,j}} f_{i,j,k}\right);$$
wherein $I_{i,j}$ denotes the importance index of the $j$-th cluster in the $i$-th parameter data set; $N_{i,j}$ denotes the number of tag sequences in the $j$-th cluster of the $i$-th parameter data set; $M_{i,j}$ denotes the total number of tags in the $j$-th cluster of the $i$-th parameter data set; $f_{i,j,k}$ denotes the occurrence frequency of the $k$-th tag in the $j$-th cluster of the $i$-th parameter data set; and $\mathrm{Norm}(\cdot)$ denotes the normalization function.
In the formula model of the importance index, the more tag sequences a cluster contains, the more likely the data in the cluster are normal data that represent the data characteristics of the process flow, so the importance degree of the cluster is higher. In addition, the occurrence frequency of each tag among all tags in the parameter data set is calculated; the greater the occurrence frequency, the more likely the data corresponding to the tag are normal data that represent the data characteristics of the process flow. Therefore, the greater the occurrence frequencies of the tags in a cluster and the greater the number of tag sequences in the cluster, the higher the importance degree of the cluster. The formula model of the importance index is constructed on this logic, and the importance index of each cluster is obtained.
In other embodiments of the present invention, the value obtained by normalizing the sum of the occurrence frequencies of all the tags in the cluster may be used as the importance index of the cluster, and the specific formula model is as follows:
$$I_{i,j} = \mathrm{Norm}\left(\sum_{k=1}^{M_{i,j}} f_{i,j,k}\right);$$
wherein $I_{i,j}$ denotes the importance index of the $j$-th cluster in the $i$-th parameter data set; $M_{i,j}$ denotes the total number of tags in the $j$-th cluster of the $i$-th parameter data set; $f_{i,j,k}$ denotes the occurrence frequency of the $k$-th tag in the $j$-th cluster of the $i$-th parameter data set; and $\mathrm{Norm}(\cdot)$ denotes the normalization function.
In the formula model, the occurrence frequency of each label in all labels in the parameter data set is calculated, the larger the occurrence frequency is, the more times that data corresponding to the label appear in the parameter data set is, the more likely the data corresponding to the label is normal data, the more data characteristics of the process flow can be represented, so that the higher the occurrence frequency of the label in a certain cluster is, the higher the importance degree of the cluster can be indicated.
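The two importance-index variants can be sketched as follows. The function names are hypothetical, and several points are assumptions rather than statements of the method: tag frequencies are taken as relative frequencies over the whole parameter data set, the "tags in the cluster" are taken to be its distinct tags, the first variant combines cluster size and frequency sum by multiplication, and normalization across clusters is omitted.

```python
from collections import Counter
from typing import Dict, List


def tag_frequencies(all_sequences: List[List[str]]) -> Dict[str, float]:
    """Relative occurrence frequency of each tag over the whole data set."""
    counts = Counter(tag for seq in all_sequences for tag in seq)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


def raw_importance(cluster: List[List[str]], freq: Dict[str, float],
                   use_size: bool = True) -> float:
    """Un-normalized importance of one cluster.

    use_size=True  : (number of tag sequences) * (sum of tag frequencies)
    use_size=False : sum of tag frequencies only (the alternative embodiment)
    """
    distinct_tags = {tag for seq in cluster for tag in seq}
    freq_sum = sum(freq[tag] for tag in distinct_tags)
    return len(cluster) * freq_sum if use_size else freq_sum


clusters = [
    [["A", "B"], ["A", "B"], ["A", "C"]],  # large cluster of common tags
    [["D", "A"]],                          # small cluster with a rare tag
]
freq = tag_frequencies([seq for cluster in clusters for seq in cluster])
scores = [raw_importance(c, freq) for c in clusters]
```

In this toy input the larger cluster receives the higher score under both variants, which is the ordering the text argues for.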
And then, in a clustering result corresponding to the parameter data set, analyzing all tag sequences in each cluster based on an FP-Growth algorithm, so that a frequent pattern tree corresponding to each cluster can be obtained. It should be noted that, the method for constructing the frequent pattern tree is a technical means well known to those skilled in the art, and will not be described herein.
The frequent pattern trees of all clusters are arranged according to the importance indices of their corresponding clusters to obtain an arrangement sequence. The first frequent pattern tree in the sequence is taken as the target tree, and the next frequent pattern tree is taken as the tree to be analyzed; the part of the tree to be analyzed that differs from the target tree is added to the target tree to obtain a new target tree. The frequent pattern tree following the tree to be analyzed is then taken as the new tree to be analyzed, and the part of it that differs from the new target tree is added to the new target tree. New target trees are determined in this way until all frequent pattern trees in the sequence have been traversed, and the final target tree is taken as the frequent pattern model corresponding to the parameter data set.
The merging process is illustrated with an example. Suppose the arrangement sequence contains 4 frequent pattern trees, denoted 1, 2, 3 and 4. Frequent pattern tree 1 is taken as the target tree. Frequent pattern tree 2 is compared with frequent pattern tree 1, and the part that exists in frequent pattern tree 2 but not in frequent pattern tree 1 is added to frequent pattern tree 1 to obtain a first merged tree. Frequent pattern tree 3 is then compared with the first merged tree, and the part that exists in frequent pattern tree 3 but not in the first merged tree is added to it to obtain a second merged tree. Finally, frequent pattern tree 4 is compared with the second merged tree, and the part that exists in frequent pattern tree 4 but not in the second merged tree is added to it to obtain a third merged tree, which is the final frequent pattern model. Referring to FIG. 3, an exemplary schematic diagram of a merging sub-step in the frequent pattern model building process is shown.
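The sequential merge can be sketched with a simplified prefix tree. This is not a full FP-Growth implementation: there is no header table, no support threshold and no frequency-based item reordering, and the names `build_tree`, `merge_into` and `merge_all` are hypothetical. Keeping the target tree's counts for nodes that already exist is also an assumption, since the text only says that missing parts are added.

```python
import copy
from typing import Dict, List

Tree = Dict[str, dict]  # tag -> {"count": int, "children": Tree}


def build_tree(sequences: List[List[str]]) -> Tree:
    """Insert each tag sequence as a path; shared prefixes share nodes."""
    root: Tree = {}
    for seq in sequences:
        level = root
        for tag in seq:
            node = level.setdefault(tag, {"count": 0, "children": {}})
            node["count"] += 1
            level = node["children"]
    return root


def merge_into(target: Tree, other: Tree) -> None:
    """Add to `target` every branch of `other` that `target` lacks."""
    for tag, node in other.items():
        if tag not in target:
            target[tag] = copy.deepcopy(node)  # whole missing subtree
        else:
            merge_into(target[tag]["children"], node["children"])


def merge_all(trees: List[Tree]) -> Tree:
    """Merge trees in the given order; the first tree is the initial target."""
    if not trees:
        raise ValueError("at least one tree is required")
    model = copy.deepcopy(trees[0])
    for tree in trees[1:]:
        merge_into(model, tree)
    return model
```

Because only missing branches are added, the first tree in the sequence determines the trunk of the merged model, and later trees contribute side branches.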
The frequent pattern model is thus obtained. Because a frequent pattern tree is built for each cluster, each tree preserves the data characteristics and association patterns of its cluster. Meanwhile, because noise data tend to be assigned to smaller clusters during clustering, merging the frequent pattern trees according to the importance index of each cluster yields a frequent pattern model that better preserves the normal data, i.e. the characteristic data, of the current parameter data set. Normal data can therefore be distinguished from noise data: normal data are more prominent in the frequent pattern model because of their frequency and association, whereas noise data tend to appear as isolated or low-frequency items.
Step S3, determining the noise existence probability of each parameter data set based on the distribution of each item in all parameter time series data, the similarity among frequent item sets and the positions of the frequent item sets in the frequent pattern model; and obtaining the data quality of each parameter data set according to the noise existence probability of the parameter data set, the changes in the change trend of the values in each parameter time series data in the parameter data set, and the fluctuation of the values.
The frequent pattern model corresponding to the parameter data set is constructed through the above steps. Compared with noise data, normal data are high-frequency items and tend to form the trunk of the frequent pattern model, whereas noise data are low-frequency items that form atypical paths clearly distinct from the paths formed by normal data. The noise existence probability of the parameter data set can therefore be determined based on the distribution of each item, the similarity among frequent item sets and the positions of the frequent item sets in the frequent pattern model. Compared with normal data, noise data have no obvious trend distribution and their trend changes frequently, so the change trend of the values in the parameter time series data can be used as one reference for evaluating data quality. Noise data and abnormal data both appear as outlying data points and therefore have similar characteristics in the frequent pattern model. However, abnormal data can be used to simulate possible fault conditions of the electric control cabinet, whereas noise data do not help the simulation and testing of the electric control cabinet, so the two also need to be distinguished. Abnormal data are more strongly outlying than noise data and have a greater influence on the fluctuation of the data as a whole, so the fluctuation of the values in the parameter time series data is used as a further reference for evaluating data quality. Combined with the noise existence probability of the parameter data set, the data quality of the parameter data set can thus be analyzed more comprehensively.
Because noise data generally appears as items that are infrequent, dissimilar to other item sets, or are located abnormally in a tree, by comprehensively considering the frequency of occurrence of the items, the similarity of the item sets, and the location information, the characteristics and structure of the data can be more comprehensively known, so that the noise existence probability of the parameter data set can be calculated for identifying noise data.
Preferably, in one embodiment of the present invention, the method for acquiring the noise existence probability includes:
Referring to fig. 4, a method flowchart of a method for obtaining a noise existence probability according to an embodiment of the present invention is shown, where the method includes the following steps:
Step S401, determining a first noise-containing index of the parameter data set based on the distribution condition of each item in all parameter time sequence data and the position of the frequent item set in the frequent pattern model.
And for any parameter data set, acquiring all frequent item sets in the corresponding frequent pattern model.
The frequency of occurrence of each item in the frequent item set is the frequency of occurrence of the label corresponding to the item in each parameter data set, and by considering the frequency of occurrence of the item, the items with low frequency of occurrence can be more effectively identified, and the items are likely to correspond to noise data. For any one frequent item set, determining a first overall noise-containing factor of the frequent item set based on the occurrence frequency of each item in the frequent item set, namely the occurrence frequency of each label, wherein the first overall noise-containing factor is inversely related to the occurrence frequency.
The frequent pattern model reflects the association relation among the items through the tree structure, the items corresponding to the normal data tend to form the trunk part of the tree model, and the items corresponding to the noise data tend to form the branch part of the tree model, so that the intrinsic structural characteristics of the data can be further captured by utilizing the position information of the frequent item set in the frequent pattern model, particularly the number of bifurcation points. Consideration of the number of bifurcation points may help understand the complexity and diversity of the data to more accurately assess the presence of noise. And determining a second integral noise-containing factor of the frequent item set based on the number of bifurcation points of the corresponding path of the frequent item set in the frequent pattern model, wherein the second integral noise-containing factor is in negative correlation with the number of bifurcation points.
And finally, fusing and averaging the first integral noise-containing factors and the second integral noise-containing factors of all the frequent item sets to obtain a first noise-containing index of the parameter data set, wherein the first integral noise-containing factors and the second integral noise-containing factors are positively correlated with the first noise-containing index. The formula model of the first noisy indicator may specifically be, for example:
$$A_{i} = \frac{1}{S_{i}} \sum_{s=1}^{S_{i}} \left( \frac{1}{b_{i,s} + \varepsilon} \times \frac{1}{\sum_{k=1}^{n_{i,s}} p_{i,s,k}} \right);$$
wherein $A_{i}$ denotes the first noisy index of the $i$-th parameter data set; $S_{i}$ denotes the total number of frequent item sets in the frequent pattern model corresponding to the $i$-th parameter data set; $b_{i,s}$ denotes the number of bifurcation points of the path of the $s$-th frequent item set of the $i$-th parameter data set in the frequent pattern model; $n_{i,s}$ denotes the number of items of the $s$-th frequent item set of the $i$-th parameter data set; $p_{i,s,k}$ denotes the occurrence frequency of the $k$-th item of the $s$-th frequent item set of the $i$-th parameter data set; and $\varepsilon$ denotes the preset first parameter.
In the formula model of the first noisy index, items corresponding to normal data occur more frequently in the parameter data set. For each frequent item set, the greater the occurrence frequencies of its items, the lower the probability that the frequent item set contains noise data. The first overall noise-containing factor of the frequent item set is therefore determined from the occurrence frequency of each item in the frequent item set, and is negatively correlated with the occurrence frequency so that the logical relationship is corrected. Meanwhile, items corresponding to normal data are mostly distributed in the trunk of the frequent pattern model. If the path of a frequent item set has more bifurcation points in the frequent pattern model, the path lies in the trunk, i.e. in a core position, so the data corresponding to the items in the frequent item set are more likely to be normal data and the frequent item set is less likely to contain noise data. The number of bifurcation points is therefore mapped with negative correlation to correct the logical relationship, which gives the second overall noise-containing factor of the frequent item set. Finally, the first and second overall noise-containing factors of the frequent item set are multiplied to give its comprehensive noise-containing factor, and the comprehensive noise-containing factors of all frequent item sets are averaged to obtain the first noisy index of the parameter data set.
The preset first parameter serves to prevent the denominator from being 0. It may take the value 0.001; the specific value can be adjusted according to the implementation scenario and is not limited herein.
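A possible reading of the first noisy index is sketched below. The text fixes only the direction of each correlation, the multiplication of the two factors and the final averaging; the reciprocal mappings used here, the placement of the preset parameter `eps` and the function name `first_noisy_index` are assumptions.

```python
from typing import List, Sequence, Tuple


def first_noisy_index(itemsets: List[Tuple[int, Sequence[float]]],
                      eps: float = 0.001) -> float:
    """Average comprehensive noise-containing factor over all frequent item sets.

    Each entry is (number of bifurcation points on the item set's path,
                   occurrence frequencies of the item set's items).
    """
    if not itemsets:
        raise ValueError("at least one frequent item set is required")
    total = 0.0
    for forks, freqs in itemsets:
        first_factor = 1.0 / sum(freqs)       # lower frequency -> larger factor
        second_factor = 1.0 / (forks + eps)   # fewer bifurcations -> larger factor
        total += first_factor * second_factor
    return total / len(itemsets)


trunk_like = [(3, [0.4, 0.3])]   # frequent items on a well-branched path
branch_like = [(0, [0.05])]      # rare item on an isolated path
```

With these toy inputs the isolated, low-frequency item set yields a much larger index than the trunk-like one, which is the behaviour the paragraph describes.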
Step S402, determining a second noisy index of the parameter data set based on the similarity condition among the frequent item sets.
Among all the frequent item sets, frequent item sets with the same number of items are taken as one class of frequent item sets, and within each class any two different frequent item sets are taken as one combination.
For any one combination, the items at the same position in the two frequent item sets are compared: when the items at a position are the same, the difference judgment index at that position is set to 0; otherwise it is set to 1. The independence degree value of the frequent item sets in the combination is then determined based on the difference judgment indices at all positions and the sequence numbers of the positions. Since similar frequent item sets should have the same item at the same position, the independence degree value reflects the inherent relationship between the two frequent item sets. The formula model of the independence degree value comprises:
$$G_{c,r} = \sum_{k=1}^{n_{c}} \frac{1}{k} \times d_{c,r,k};$$
wherein $G_{c,r}$ denotes the independence degree value between the frequent item sets of the $r$-th combination in the $c$-th class of frequent item sets; $n_{c}$ denotes the number of items of each frequent item set in the $c$-th class of frequent item sets; $k$ denotes the item sequence number index within a frequent item set; and $d_{c,r,k}$ denotes the difference judgment index of the $k$-th item between the frequent item sets of the $r$-th combination in the $c$-th class of frequent item sets.
In the formula model of the independence degree value, normal data form the main body compared with noise data, so the frequent item sets corresponding to normal data should be highly similar: items at the same position should be the same, and any position at which the items differ should lie toward the back of the frequent item sets. The items at the same position are therefore compared first; if they differ, the difference judgment index at that position is set to 1, and if they are the same it is set to 0. The more positions at which the items of the two frequent item sets differ, the greater the difference between the two frequent item sets, the lower their similarity and the more likely they are to contain noise. In addition, the position of the item, i.e. its item sequence number index, is used as a weight during the comparison: the nearer the front the difference occurs, the more noise components are indicated. The item sequence number index is therefore mapped with negative correlation to correct the logical relationship and is used as an adjustment weight. Finally, the difference judgment index of each item is weighted by its adjustment weight and all items are analyzed together to obtain the independence degree value; the larger the independence degree value, the stronger the difference between the two frequent item sets in the combination and the higher the possibility of noise.
The determination of the adjustment weight and the difference judgment index is illustrated with an example. Suppose two frequent item sets of two items each are {A, F} and {C, F}. A and C are the first items, with item sequence number index 1, so the adjustment weight of the first item is 1 and the difference judgment index is 1. F and F are the second items, with item sequence number index 2, so the adjustment weight of the second item is 1/2 and the difference judgment index is 0.
At this point each combination in each class of frequent item sets corresponds to an independence degree value. The average of the independence degree values of all combinations is taken as the second noisy index of the parameter data set; the larger this value, the higher the likelihood that noise is present in the parameter data set.
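The pairwise comparison and averaging can be sketched as follows. The function names are hypothetical, frequent item sets are represented as ordered tuples so that "same position" is well defined, and the adjustment weight is assumed to be the reciprocal of the item sequence number index, which agrees with the weight of 1 given for the first item in the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Sequence


def independence(a: Sequence[str], b: Sequence[str]) -> float:
    """Weighted count of positions at which two item sets differ.

    Assumption: the weight of the k-th position (1-based) is 1 / k,
    so differences near the front count more.
    """
    if len(a) != len(b):
        raise ValueError("item sets in one combination must have equal length")
    return sum((1.0 / k) * (x != y) for k, (x, y) in enumerate(zip(a, b), start=1))


def second_noisy_index(itemsets: List[Sequence[str]]) -> float:
    """Mean independence value over all pairs of equal-length item sets."""
    classes = defaultdict(list)
    for items in itemsets:
        classes[len(items)].append(items)
    values = [independence(a, b)
              for group in classes.values()
              for a, b in combinations(group, 2)]
    return sum(values) / len(values) if values else 0.0


print(independence(("A", "F"), ("C", "F")))  # prints 1.0
```

For the pair {A, F} and {C, F} only the first position differs, so the result equals the weight of that position.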
Step S403, obtaining the noise existence probability of the parameter data set according to the first noise-containing index and the second noise-containing index of the parameter data set.
And taking the value obtained after normalization of the sum value of the first noisy index and the second noisy index of the parameter data set as the noise existence probability of the parameter data set. The formula model of the noise existence probability includes:
$$P_{i} = \mathrm{Norm}\left(A_{i} + B_{i}\right);$$
wherein $P_{i}$ denotes the noise existence probability of the $i$-th parameter data set; $A_{i}$ denotes the first noisy index of the $i$-th parameter data set; $B_{i}$ denotes the second noisy index of the $i$-th parameter data set; and $\mathrm{Norm}(\cdot)$ denotes the normalization function.
In the formula model of the noise existence probability, based on the analysis in step S401 and step S402, the larger the first noise-containing index is, the larger the possibility of existence of noise in the parameter data set is, the larger the second noise-containing index is, and the larger the possibility of existence of noise in the parameter data set is, so that the sum value of the first noise-containing index and the second noise-containing index is normalized, thereby obtaining the noise existence probability of the parameter data set.
In other embodiments of the present invention, the value obtained by normalizing the product of the first noisy index and the second noisy index may be used as the noise existence probability of the parameter data set.
So far, based on the method, the noise existence probability of the parameter data set of each test period can be obtained, and in the subsequent process, the data quality of each parameter data set can be continuously analyzed by combining the noise existence probability.
Because abnormal data can be used to simulate possible fault conditions of the electric control cabinet, faults can be found more effectively in the simulation process. Noise data, by contrast, do not help the fully automated simulation test of the electric control cabinet and can distort the final simulation result, so the more noise data there are, the worse the data quality. Meanwhile, since noise data and abnormal data both show outlier characteristics, and the outlier characteristics of abnormal data points are more pronounced, the normal data sets are screened out first and their data quality is set to a fixed value. The abnormal data sets and noise data sets are then further distinguished among the remaining data sets according to characteristics such as the fluctuation of the values, so that the data quality of each parameter data set is measured comprehensively and the noise data sets can be screened out accurately in the subsequent process.
Preferably, in one embodiment of the present invention, the method for acquiring data quality includes:
referring to fig. 5, a method flowchart of a data quality acquisition method according to an embodiment of the present invention is shown, the method includes the following steps:
Step S501, distinguishing a normal data set from a target data set based on the noise existence probability of the parameter data set, and acquiring the data set quality of the normal data set.
The parameter data set having a noise existence probability greater than or equal to a preset noise threshold is taken as a target data set, the parameter data set having a noise existence probability less than the preset noise threshold is taken as a normal data set, and the data quality of the normal data set is set to a fixed value not less than 1. It should be noted that, in the embodiment of the present invention, the preset noise threshold is set to 0.6, and the data quality of the normal data set is set to 1, and specific values can be adjusted according to the implementation scenario, which is not limited herein.
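The threshold split of step S501 is simple enough to show directly. The function name `split_by_noise` and the dictionary layout are hypothetical; the threshold 0.6 and the fixed quality 1 are the values given for this embodiment.

```python
from typing import Dict, List, Tuple


def split_by_noise(noise_prob: Dict[str, float],
                   threshold: float = 0.6,
                   normal_quality: float = 1.0) -> Tuple[Dict[str, float], List[str]]:
    """Return (quality of each normal data set, names of the target data sets).

    A data set is a target data set when its noise existence probability
    is greater than or equal to the threshold.
    """
    normal = {name: normal_quality for name, p in noise_prob.items() if p < threshold}
    targets = [name for name, p in noise_prob.items() if p >= threshold]
    return normal, targets


normal, targets = split_by_noise({"period_1": 0.2, "period_2": 0.6, "period_3": 0.9})
```

Only the target data sets need the more detailed quality analysis of the following steps.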
Step S502, for any one target data set, determining a first quality parameter of the target data set based on the change trend change condition of each parameter time sequence data in the target data set.
Since noise data appears as irregularly distributed data points in the dataset, i.e., there is no apparent trend distribution, it can be stated that noise components in the target dataset may be more if the data trend of the parametric temporal data changes more frequently.
The extreme points in a group of data comprise the maximum points and minimum points. They are the locally highest or lowest points in the data sequence and mark the nodes at which the data turn from one trend to another. Therefore, for any one target data set, the number of extreme points in each parameter time series data in the target data set is obtained, and the first quality parameter of the target data set is determined based on the ratio of the total number of extreme points of all parameter time series data to the total number of values of all parameter time series data. The first quality parameter quantifies how frequently the data trend changes in the time series data of the target data set; the larger the first quality parameter, the more noise data the target data set contains and the worse the data quality.
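The extreme-point ratio can be sketched as follows. The function names are hypothetical, and counting only strict interior maxima and minima (ignoring endpoints and plateaus) is an assumption, since the text does not say how such cases are handled.

```python
from typing import List, Sequence


def count_extrema(values: Sequence[float]) -> int:
    """Number of strict interior local maxima and minima."""
    return sum(
        1
        for k in range(1, len(values) - 1)
        if (values[k] > values[k - 1] and values[k] > values[k + 1])
        or (values[k] < values[k - 1] and values[k] < values[k + 1])
    )


def first_quality_parameter(series_list: List[Sequence[float]]) -> float:
    """Total number of extreme points divided by total number of values."""
    total_values = sum(len(s) for s in series_list)
    if total_values == 0:
        raise ValueError("the target data set contains no values")
    return sum(count_extrema(s) for s in series_list) / total_values


print(first_quality_parameter([[1, 3, 2, 4, 1], [1, 2, 3, 4, 5]]))  # prints 0.3
```

The zigzag series contributes three extreme points and the monotonic series none, giving 3 extreme points over 10 values.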
Step S503, obtaining a second quality parameter of the target data set according to the fluctuation condition of the numerical value in each parameter time sequence data in the target data set.
The target data set may contain both noise data and abnormal data. Since abnormal data are more valuable than noise data, the two should be distinguished in order to measure the data quality of the target data set more accurately. Although both noise data and abnormal data exhibit outlier characteristics, abnormal data generally deviate further from normal data, so the differences between the data can be analyzed to distinguish noise data from abnormal data.
And in each parameter time sequence data corresponding to the target data set, taking the deviation condition of each numerical value and the average value of all numerical values as a deviation factor of each numerical value, and taking the value obtained by carrying out negative correlation mapping on the sum value of the deviation factors of all numerical values as the integral quality factor of each parameter time sequence data, wherein the smaller the deviation factor is, the larger the integral quality factor is, which indicates that the parameter time sequence data contains more noise data.
And carrying out averaging treatment on the overall quality factors of all the parameter time sequence data to obtain a second quality parameter of the target data set, wherein the larger the second quality parameter is, the more noise data contained in the target data set is, and the worse the data quality is. The formula model of the second quality parameter includes:
$$V_{t} = \frac{1}{N_{t}} \sum_{j=1}^{N_{t}} \frac{1}{\sum_{k=1}^{n_{t,j}} \left| x_{t,j,k} - \bar{x}_{t,j} \right| + \beta};$$
wherein $V_{t}$ denotes the second quality parameter of the $t$-th target data set; $N_{t}$ denotes the number of parameter time series data in the $t$-th target data set; $n_{t,j}$ denotes the total number of values in the $j$-th parameter time series data of the $t$-th target data set; $x_{t,j,k}$ denotes the $k$-th value in the $j$-th parameter time series data of the $t$-th target data set; $\bar{x}_{t,j}$ denotes the mean of the values in the $j$-th parameter time series data of the $t$-th target data set; and $\beta$ denotes the preset second parameter.
In the formula model of the second quality parameter, for each parameter time sequence data in any one target data set, calculating the average value of all valuesThe average value characterizes the average level of the values in the time sequence data of a parameter, and then the difference between each value and the average value is calculated to obtain a deviation factorThe noise data has smaller deviation factor for the abnormal data, so the sum of the deviation factors of all values is subjected to negative correlation mapping to obtain the overall quality factor of each parameter time sequence dataThe larger the value is, the more noise data are contained in the parameter time sequence data, and finally, the whole quality factors of all the parameter time sequence data in the target data set are subjected to the averaging processing to obtain the second quality parameters of the target data set.
The preset second parameter serves to prevent the denominator from being 0. It is set to 0.001 in this embodiment; the specific value can be adjusted according to the implementation scenario and is not limited herein.
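The calculation of the second quality parameter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formula of the patent itself: the deviation factor is assumed to be the absolute difference between a value and the series mean, and the negative correlation mapping is assumed to be a reciprocal with the preset second parameter (0.001) added to the denominator. The function names are hypothetical.

```python
from statistics import fmean

# Preset second parameter from the text; it keeps the denominator non-zero.
EPSILON = 0.001


def overall_quality_factor(series, epsilon=EPSILON):
    """Overall quality factor of one parameter time sequence.

    Assumption: deviation factor = |value - mean|, and the negative
    correlation mapping is the reciprocal of the summed deviation factors.
    """
    mean = fmean(series)
    deviation_sum = sum(abs(value - mean) for value in series)
    return 1.0 / (deviation_sum + epsilon)


def second_quality_parameter(target_data_set, epsilon=EPSILON):
    """Average of the overall quality factors over all series in a data set."""
    return fmean(overall_quality_factor(s, epsilon) for s in target_data_set)


if __name__ == "__main__":
    # Two illustrative series: one spread out, one perfectly flat.
    example_set = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
    print(second_quality_parameter(example_set))
```

With this assumed mapping, a perfectly flat series yields the largest possible overall quality factor (the reciprocal of the preset second parameter), which is why the small constant in the denominator is needed.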
Step S504, the noise existence probability, the first quality parameter and the second quality parameter of the target data set are integrated to obtain the data quality of the target data set.
The product of the noise existence probability, the first quality parameter and the second quality parameter of the target data set is subjected to negative correlation mapping and normalization, and the resulting value is taken as the data quality of the target data set. The formula model of the data quality may specifically be, for example:
$$Z_i = \exp\left(-\left(P_i \times A_i \times B_i\right)\right);$$
wherein $Z_i$ represents the data quality of the $i$-th target data set; $P_i$ represents the noise existence probability of the $i$-th target data set; $A_i$ represents the first quality parameter of the $i$-th target data set; $B_i$ represents the second quality parameter of the $i$-th target data set; and $\exp(\cdot)$ represents the exponential function with the natural constant $e$ as its base.
In the formula model of the data quality, a higher noise existence probability means that the target data set is more likely to contain noise, so the data quality is worse. A higher first quality parameter means that the target data set contains more noise data, so the data quality is worse; the same holds for a higher second quality parameter. Therefore, the product of the noise existence probability, the first quality parameter and the second quality parameter is subjected to negative correlation mapping and normalization to obtain the data quality of the target data set.
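The combination step can be illustrated with a short Python sketch. It assumes that the negative correlation mapping and normalization are realized together by an exponential of the negated product, which maps any non-negative product into the interval (0, 1]; this is one possible form consistent with the description, and the function name is hypothetical.

```python
import math


def data_quality(noise_probability, first_quality, second_quality):
    """Data quality of a target data set.

    Assumption: exp(-x) is used as the negative correlation mapping, so the
    result is already normalized to (0, 1] for any non-negative product x.
    """
    product = noise_probability * first_quality * second_quality
    return math.exp(-product)


if __name__ == "__main__":
    # Illustrative inputs only; they are not values from the patent.
    print(data_quality(0.8, 0.3, 1.5))
```

Because each of the three inputs only enlarges the product, increasing any one of them lowers the resulting data quality, matching the negative correlation stated in the text.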
So far, the data quality of all parameter data sets can be obtained, and the noise data sets can be screened out and denoised based on the data quality in the subsequent process.
Step S4: according to the data quality corresponding to the parameter data sets of all the test periods, noise data sets are screened from all the parameter data sets and the parameter time sequence data in the noise data sets is denoised to obtain the high-quality data sets corresponding to all the test periods; the performance test of the electric control cabinet is then performed based on all the high-quality data sets corresponding to all the performance parameters.
Because worse data quality means that the data contains more noise, the noise data sets that need denoising should be screened out and denoised to ensure the accuracy of the subsequent simulation test results. The denoised data sets and the parameter data sets that do not need denoising together form the high-quality data sets.
Preferably, in one embodiment of the present invention, a method for acquiring a high quality data set includes:
First, among all the parameter data sets, each parameter data set whose data quality is less than or equal to a preset quality threshold is taken as a noise data set.
Then, all parameter time sequence data in each noise data set is denoised based on a Kalman filtering method to obtain the denoised data set corresponding to each noise data set.
Finally, the parameter data sets whose data quality is greater than the preset quality threshold, together with all the denoised data sets, are taken as the high-quality data sets corresponding to all the test periods.
It should be noted that the preset quality threshold is set to 0.6 in this embodiment; the specific value can be adjusted according to the implementation scenario and is not limited herein. The Kalman filtering method is a technical means well known to those skilled in the art, and its specific process is not described herein.
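The screening and denoising steps can be sketched as follows. The text does not specify the Kalman filter configuration, so this sketch assumes a scalar filter with a random-walk state model; the process and measurement variances and all function names are hypothetical choices made for illustration only.

```python
QUALITY_THRESHOLD = 0.6  # preset quality threshold from the text


def kalman_denoise(series, process_var=1e-4, measurement_var=1e-2):
    """Scalar Kalman filter with a random-walk state model (assumed)."""
    if not series:
        return []
    estimate = series[0]   # initialize the state with the first measurement
    error_var = 1.0        # initial estimate uncertainty (assumed)
    smoothed = [estimate]
    for measurement in series[1:]:
        # Predict: the state is unchanged, uncertainty grows by process noise.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (measurement - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def build_high_quality_sets(data_sets, qualities, threshold=QUALITY_THRESHOLD):
    """Denoise sets with quality <= threshold; keep the others unchanged."""
    high_quality = []
    for data_set, quality in zip(data_sets, qualities):
        if quality <= threshold:
            high_quality.append([kalman_denoise(s) for s in data_set])
        else:
            high_quality.append(data_set)
    return high_quality


if __name__ == "__main__":
    sets = [[[0.0, 10.0, 0.0]], [[1.0, 2.0]]]
    print(build_high_quality_sets(sets, [0.6, 0.9]))
```

Only the data sets at or below the threshold pass through the filter, which reflects the efficiency argument of the method: denoising effort is spent only where the data quality indicates it is needed.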
All performance parameters under a given process flow can be analyzed in the manner described above to obtain the high-quality data sets of all the performance parameters, and the performance test of the electric control cabinet can then be performed based on these high-quality data sets.
Preferably, in one embodiment of the present invention, the performance test of the electric control cabinet is performed based on all high quality data sets corresponding to all performance parameters, including:
According to the design drawings and specifications of the electric control cabinet, a simulation test model corresponding to the electric control cabinet is established, and all high-quality data sets corresponding to all performance parameters are input into the simulation test model as its database. The behavior and output of the system can then be simulated and observed in a software environment to carry out the performance test of the electric control cabinet; that is, various fault conditions such as component damage, power supply fluctuation and sudden load changes are simulated, and the test results are obtained.
And finally, adjusting the control strategy of the electric control cabinet based on the deviation of the test result and the expected result.
In summary, in the embodiment of the present invention, for any performance parameter, a parameter data set of the performance parameter is acquired for each of a plurality of test periods, and each parameter data set includes a plurality of parameter time sequence data. Because noise data occurs only occasionally compared with normal data, the parameter time sequence data in each parameter data set can be clustered based on the similarity between the data to obtain a clustering result; the clusters in the clustering result then distinguish, to a certain extent, the normal data from the noise data in the parameter data set. Further, based on the numerical characteristics of all parameter time sequence data in the parameter data set and the FP-Growth algorithm, fusion analysis is performed on the clusters in the clustering result to obtain a frequent pattern model, which represents the difference between normal data and noise data more distinctly. Further, the frequent pattern model is analyzed: the relevance among the data and the position distribution of the frequent patterns and frequent item sets in the frequent pattern model are examined to quantify the possibility that noise data exists, giving the noise existence probability of the parameter data set and providing a reference for the subsequent evaluation of its quality. Further, since noise data shows characteristic behavior in data fluctuation and trend change, the change trend and the data fluctuation of the parameter time sequence data are considered together and combined with the noise existence probability, so that the data quality of the parameter data set can be accurately evaluated. This helps to accurately identify and screen out the noise data sets, which are then denoised to obtain high-quality data sets.
And finally, performing performance test of the electric control cabinet based on the high-quality data sets corresponding to all the performance parameters to obtain a test result. According to the embodiment of the invention, the noise data set can be accurately screened, so that only the noise data set is denoised in a large number of data sets, the data denoising processing efficiency can be improved, and meanwhile, the accuracy and the reliability of a test result are ensured.
The embodiment also provides a comprehensive automatic test system for the electric control cabinet, which comprises a processor, a memory and a computer program, wherein the memory is used for storing the corresponding computer program, the processor is used for running the corresponding computer program, and the computer program can realize any one of the steps of the comprehensive automatic test method for the electric control cabinet when running on the processor.
Referring to fig. 6, a system block diagram of a comprehensive automated testing system for an electric control cabinet according to an embodiment of the present invention is shown. The system includes a data acquisition module 601, a frequent pattern model building module 602, a data quality analysis module 603, and a performance test module 604, wherein the modules are used for implementing step S1, step S2, step S3 and step S4, respectively.
Referring to fig. 7, a schematic diagram of the system architecture of a comprehensive automated testing system for an electric control cabinet according to an embodiment of the present invention is shown. The system includes a processor 700, a memory 701, a bus 702, and a communication interface 703, where the processor 700, the communication interface 703, and the memory 701 are connected by the bus 702. The memory 701 may include a high-speed random access memory, the bus 702 may be an ISA bus, a PCI bus, or an EISA bus, and the processor 700 may be an integrated circuit chip with signal processing capability.
The embodiment of the present invention further provides a computer readable storage medium corresponding to the method provided in the foregoing embodiment, referring to fig. 8, the storage medium is shown as an optical disc, and a computer program (i.e. a program product) is stored on the storage medium, where the computer program, when executed by a processor, performs the method provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), read-only memory (ROM), and other optical and magnetic storage media, which are not described herein in detail.
It should be noted that the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (7)

1. A comprehensive automated testing method for an electric control cabinet, characterized in that the method comprises:
for any performance parameter of the electric control cabinet, acquiring a parameter data set of the performance parameter in each test period, each parameter data set including parameter time series data of multiple data cycles;
clustering all the parameter time series data according to the similarity between the parameter time series data in each parameter data set to obtain a clustering result; performing fusion analysis on the parameter time series data in each cluster of the clustering result according to the numerical characteristics of all parameter time series data in each parameter data set and a frequent pattern mining algorithm, to obtain a frequent pattern model corresponding to each parameter data set;
in the frequent pattern model, determining the noise existence probability of each parameter data set based on the distribution of each item in all parameter time series data, the similarity between frequent item sets, and the positions of the frequent item sets in the frequent pattern model; obtaining the data quality of each parameter data set according to the noise existence probability of the parameter data set and the changes in value trend and the value fluctuations of each parameter time series data in the parameter data set;
screening noise data sets from all the parameter data sets according to the data quality corresponding to the parameter data sets of all test periods, and denoising the parameter time series data in the noise data sets to obtain high-quality data sets corresponding to all test periods; performing the performance test of the electric control cabinet based on all high-quality data sets corresponding to all performance parameters;
the method for acquiring the frequent pattern model includes:
for any parameter data set, determining a numerical range based on the maximum value and the minimum value in the parameter data set, dividing the numerical range evenly to obtain a preset number of interval ranges, and marking different interval ranges with different labels;
replacing each value in each parameter time series data with the corresponding label to obtain a label sequence corresponding to each parameter time series data;
obtaining the occurrence frequency of each label in the parameter data set; in each cluster, obtaining an importance index of the cluster according to the occurrence frequencies of all labels and the number of label sequences in the cluster, wherein the number of label sequences in each cluster and the occurrence frequencies of the labels are both positively correlated with the importance index;
in the clustering result corresponding to the parameter data set, analyzing all label sequences in each cluster based on the FP-Growth algorithm to obtain a frequent pattern tree corresponding to each cluster; arranging the frequent pattern trees of all clusters in descending order of the importance index to obtain an arrangement sequence; in the arrangement sequence, taking the first frequent pattern tree as a target tree and the frequent pattern tree following the target tree as a tree to be analyzed, and adding the parts of the tree to be analyzed that differ from the target tree to the target tree to obtain a new target tree; taking the frequent pattern tree following the tree to be analyzed as a new tree to be analyzed, and adding the parts of the new tree to be analyzed that differ from the new target tree to the new target tree; continuing to determine new target trees until all frequent pattern trees in the arrangement sequence have been traversed, and taking the new target tree at that point as the frequent pattern model corresponding to the parameter data set;
the method for obtaining the noise existence probability includes:
for any parameter data set, obtaining all frequent item sets in the corresponding frequent pattern model, wherein the occurrence frequency of each item in a frequent item set is the occurrence frequency, in the parameter data set, of the label corresponding to the item;
for any frequent item set, determining a first overall noise factor of the frequent item set based on the occurrence frequency of each item in the frequent item set, the first overall noise factor being negatively correlated with the occurrence frequency; determining a second overall noise factor of the frequent item set based on the number of bifurcation points of the path corresponding to the frequent item set in the frequent pattern model, the second overall noise factor being negatively correlated with the number of bifurcation points;
fusing and averaging the first overall noise factors and the second overall noise factors of all frequent item sets to obtain a first noise index of the parameter data set, the first overall noise factor and the second overall noise factor both being positively correlated with the first noise index;
determining a second noise index of the parameter data set based on the similarity between all frequent item sets;
obtaining the noise existence probability of the parameter data set according to the first noise index and the second noise index of the parameter data set, the first noise index and the second noise index both being positively correlated with the noise existence probability;
the method for obtaining the data quality includes:
taking each parameter data set whose noise existence probability is greater than or equal to a preset noise threshold as a target data set, and taking each parameter data set whose noise existence probability is less than the preset noise threshold as a normal data set, the data quality of a normal data set being set to a fixed value not less than 1;
for any target data set, determining a first quality parameter of the target data set based on the changes in the trend of each parameter time series data in the target data set;
in each parameter time series data of the target data set, obtaining a deviation factor of each value based on the deviation between the value and the mean of all values, and determining an overall quality factor of each parameter time series data based on the deviation factors of all values, the overall quality factor being negatively correlated with the deviation factor; averaging the overall quality factors of all parameter time series data to obtain a second quality parameter of the target data set;
obtaining the data quality of the target data set according to the noise existence probability, the first quality parameter and the second quality parameter of the target data set, wherein the noise existence probability, the first quality parameter and the second quality parameter are all negatively correlated with the data quality, and the value of the data quality is a normalized value.
2. The comprehensive automated testing method for an electric control cabinet according to claim 1, characterized in that the method for obtaining the clustering result comprises:
in each parameter data set, for any two parameter time series data in different data cycles, determining a difference characteristic value between the two parameter time series data based on the differences between their values and the differences between the slope values of their values;
taking the difference characteristic value between parameter time series data as the distance metric, and performing K-means clustering analysis on all the parameter time series data in the parameter data set based on a preset K value to obtain the clustering result.
3. The comprehensive automated testing method for an electric control cabinet according to claim 1, characterized in that the method for obtaining the second noise index comprises:
among all the frequent item sets, taking the frequent item sets with the same number of items as one class of frequent item sets;
in any class of frequent item sets, combining any two frequent item sets to obtain all non-repeating combinations; for any combination, determining an independence value of the frequent item sets in the combination based on the differences between the items at the same positions in the two frequent item sets;
taking the mean of the independence values corresponding to all combinations of all classes of frequent item sets as the second noise index of the parameter data set.
4. The comprehensive automated testing method for an electric control cabinet according to claim 1, characterized in that the method for obtaining the first quality parameter comprises:
obtaining the number of extreme points in each parameter time series data in the target data set, and determining the first quality parameter of the target data set based on the proportion of the sum of the numbers of extreme points of all parameter time series data in the total number of values of all parameter time series data.
5. The comprehensive automated testing method for an electric control cabinet according to claim 1, characterized in that the method for acquiring the high-quality data sets comprises:
among all the parameter data sets, taking each parameter data set whose data quality is less than or equal to a preset quality threshold as a noise data set;
denoising all parameter time series data in each noise data set based on a Kalman filtering method to obtain a denoised data set corresponding to each noise data set;
taking the parameter data sets whose data quality is greater than the preset quality threshold, together with all the denoised data sets, as the high-quality data sets corresponding to all test periods.
6. The comprehensive automated testing method for an electric control cabinet according to claim 1, characterized in that performing the performance test of the electric control cabinet based on all high-quality data sets corresponding to all performance parameters comprises:
establishing a simulation test model corresponding to the electric control cabinet, taking all high-quality data sets corresponding to all performance parameters as the database of the simulation test model, and performing the performance test of the electric control cabinet to obtain the test results.
7. A comprehensive automated testing system for an electric control cabinet, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
CN202411338200.8A 2024-09-25 2024-09-25 Comprehensive automated testing method and system for electric control cabinets Active CN118885738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411338200.8A CN118885738B (en) 2024-09-25 2024-09-25 Comprehensive automated testing method and system for electric control cabinets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411338200.8A CN118885738B (en) 2024-09-25 2024-09-25 Comprehensive automated testing method and system for electric control cabinets

Publications (2)

Publication Number Publication Date
CN118885738A CN118885738A (en) 2024-11-01
CN118885738B true CN118885738B (en) 2025-04-15

Family

ID=93221323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411338200.8A Active CN118885738B (en) 2024-09-25 2024-09-25 Comprehensive automated testing method and system for electric control cabinets

Country Status (1)

Country Link
CN (1) CN118885738B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119064707B (en) * 2024-11-07 2025-03-21 洛阳联江电力工程有限责任公司 A fault diagnosis method and system for an electrical control cabinet

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205111A (en) * 2015-09-01 2015-12-30 西安交通大学 System and method for mining failure modes of time series data
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8359329B2 (en) * 2007-02-13 2013-01-22 Future Route Limited Method, computer apparatus and computer program for identifying unusual combinations of values in data
AT520185B1 (en) * 2017-12-04 2019-02-15 Avl List Gmbh Test bench and method for carrying out a test
CN111881617B (en) * 2020-07-02 2024-03-26 上海电气风电集团股份有限公司 Data processing method, performance evaluation method and system of wind generating set
CN116361059B (en) * 2023-05-19 2023-08-08 湖南三湘银行股份有限公司 Diagnosis method and diagnosis system for abnormal root cause of banking business
CN118471320B (en) * 2024-07-12 2024-11-15 深圳市奥斯珂科技有限公司 Testing method and system for stacked package memory

Also Published As

Publication number Publication date
CN118885738A (en) 2024-11-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant