
CN110533071B - SMT production tracing method based on self-encoder and ensemble learning - Google Patents


Info

Publication number: CN110533071B (application CN201910688024.3A; earlier publication CN110533071A)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: value, data, autoencoder, attribute, sequence
Inventors: 常建涛, 张凯磊, 孔宪光, 王佩
Current and original assignee: Xidian University
Application filed by Xidian University
Legal status: Active (granted)


Classifications

    • G06F18/24323 — Pattern recognition; classification techniques; tree-organised classifiers
    • G06N20/20 — Machine learning; ensemble learning
    • G06Q30/0185 — Commerce; certifying business or products; product, service or business identity fraud
    • G06Q50/04 — ICT specially adapted for business processes of specific sectors; manufacturing
    • Y02P90/30 — Climate change mitigation technologies in the production or processing of goods; computing systems specially adapted for manufacturing


Abstract

The invention discloses an SMT production tracing method based on a self-encoder and ensemble learning, which comprises the following steps: (1) constructing the self-encoders; (2) acquiring an SPI defect tracing data set; (3) normalizing the SPI defect tracing data set; (4) training the self-encoders; (5) obtaining a set of classification trees by an ensemble learning method; (6) obtaining the SMT production tracing sequence. In the invention, the normalized SPI defect tracing data set is fed into the trained self-encoders to generate a classification data set, classification trees are trained by ensemble learning, and the trained trees are traversed to obtain the SMT production tracing sequence, thereby locating the key factors that cause product defects and improving the accuracy of SMT production tracing.

Description

SMT production tracing method based on self-encoder and ensemble learning
Technical Field
The invention belongs to the technical field of electronics, and more particularly relates to a Surface Mount Technology (SMT) production tracing method based on a self-encoder and ensemble learning, within the informatization of the electronics manufacturing industry. The invention can be applied to tracing defects of the Printed Circuit Board (PCB) of an electronic product during surface-mount production, quickly locating the key factors that cause product defects.
Background
SMT is an electronic assembly technique that mounts surface-mount components onto a printed board. ISO 9000:2000 defines "traceability" as the ability to trace the history, application or location of that which is under consideration. Tracing makes it possible to control and adjust the unstable technical, human or management factors that cause defect points, and to continuously improve product quality. Among the various tracing methods, those based on machine learning and deep learning effectively exploit the diverse data generated during SMT production, alleviating the problem of under-utilized data and realizing SMT production tracing.
A Shanghai technology company discloses an SMT production tracing method in the patent document "An SMT production intelligent error-proofing tracing method and technology" (application No. 201810719538.6, publication No. CN109911365A). In that method, an intelligent warehouse is established, its operation flow is standardized, and the warehouse is used jointly with an intelligent production module to achieve real-time feedback and optimization of the SMT production line, thereby realizing SMT production tracing. The drawback of this method is that it can only collect data related to the production process; it cannot deeply mine the relationship between SMT production information and production defects, and therefore cannot promptly and accurately locate the key factors causing product defects.
Ban X et al., in the paper "Quality tracking of converter steel based adaptive feature selection and multiple linear regression" (2018 IEEE International Conference on Big Data and Smart Computing (BigComp), IEEE, 2018: 462-468), disclose a method for tracing abnormal production data in the converter steel-making process: an adaptive feature selection method based on correlation and deviation matching is used for feature selection, multiple linear regression is used to analyze the causal relationships between parameters, and the feature with the largest coefficient in the regression equation is taken as the key factor causing the production abnormality. The drawback of this method is that the adaptive feature selection can only find features linearly related to the dependent variable, so too many features are discarded, and multiple linear regression cannot describe the nonlinear relationships between independent and dependent variables, so its ability to explain those relationships is weak.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an SMT production tracing method based on an auto-encoder and ensemble learning so as to locate key factors causing product defects.
The idea for realizing the purpose of the invention is as follows: construct 18 self-encoders with the same structure but different parameters, process the normalized SPI defect tracing data set with them to obtain a classification data set, then obtain a set of classification trees using an ensemble learning method, and finally traverse the classification tree set to obtain an SMT production tracing sequence, locating the production information that strongly influences SMT production defects.
The method comprises the following specific steps:
(1) constructing an auto encoder:
(1a) build 18 self-encoders with the same structure but different parameters, wherein each self-encoder has three layers, structured as: input layer → fully-connected layer → output layer;
(1b) set the number of nodes of the input layer and of the output layer to 76;
(1c) set the number of fully-connected-layer nodes of each self-encoder according to a formula (rendered as an image in the source) in which n_i denotes the number of fully-connected-layer nodes of the i-th self-encoder, i ∈ {1, 2, …, 18}, ⌊·⌋ denotes the rounding-down operation, and % denotes the remainder operation;
(1d) calculate the activation value of each node of the fully-connected layer in the 1st to 9th self-encoders according to the sigmoid formula

T_mj = 1 / (1 + e^(−x_mj))

where T_mj is the activation value of the j-th node in the fully-connected layer of the m-th self-encoder, m ∈ {1, 2, …, 9}, j ∈ {1, 2, …, N_m}, N_m is the total number of fully-connected-layer nodes of the m-th self-encoder, e^(·) denotes the exponential operation with base the natural constant e, x_mj is the input value of the j-th node in the fully-connected layer of the m-th self-encoder, x_mj = W_mj^T X_m, W_mj is the weight vector of the network between the input layer of the m-th self-encoder and the j-th node of the fully-connected layer (each element initialized from a standard normal distribution), T denotes the transposition operation, and X_m is the vector of the input values of the 76 input-layer nodes of the m-th self-encoder;
(1e) calculate the activation value of each node of the fully-connected layer in the 10th to 18th self-encoders according to the following formula:

R_ln = max(0, x_ln)

where R_ln is the activation value of the n-th node in the fully-connected layer of the l-th self-encoder, l ∈ {10, …, 18}, n ∈ {1, 2, …, N_l}, N_l is the total number of fully-connected-layer nodes of the l-th self-encoder, max(·) denotes the maximum operation, x_ln is the input value of the n-th node in the fully-connected layer of the l-th self-encoder, x_ln = W_ln^T X_l, W_ln is the weight vector of the network between the input layer of the l-th self-encoder and the n-th node of the fully-connected layer (each element initialized from a standard normal distribution), and X_l is the vector of the input values of the 76 input-layer nodes of the l-th self-encoder;
(1f) calculate the loss error value between the output values of each self-encoder's output layer and the input values of its input layer according to the mean-squared-error formula

L_i = (1 / N_i) · Σ_{k=1}^{N_i} (y_ik − ŷ_ik)²

where L_i is the loss error value between the output values of the i-th self-encoder's output layer and its input-layer input values, i ∈ {1, 2, …, 18}, N_i is the number of input-layer nodes (equal to the number of output-layer nodes) of the i-th self-encoder, Σ denotes the summation operation, y_ik is the input value of the k-th node of the i-th self-encoder's input layer, ŷ_ik is the output value of the k-th node of the i-th self-encoder's output layer, and k ∈ {1, 2, …, N_i};
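As a sketch of steps (1a)–(1f), the forward pass and reconstruction loss of one such three-layer self-encoder can be written in NumPy. The hidden size of 38 is a placeholder (the node-count formula is rendered as an image in the source); weights are drawn from a standard normal distribution as in steps (1d)–(1e), and the loss is taken to be the mean squared error of step (1f).

```python
import numpy as np

def sigmoid(x):
    # Step (1d): activation used by self-encoders 1-9.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Step (1e): activation used by self-encoders 10-18.
    return np.maximum(0.0, x)

def build_self_encoder(n_in=76, n_hidden=38, seed=0):
    # n_hidden = 38 is a hypothetical value; the patent sets it per self-encoder
    # by an image-rendered formula. Weights follow a standard normal distribution.
    rng = np.random.default_rng(seed)
    return {"W1": rng.standard_normal((n_in, n_hidden)),
            "W2": rng.standard_normal((n_hidden, n_in))}

def forward(params, x, activation=sigmoid):
    # x: the 76 input-layer values; hidden node j receives x_j = W_j^T x.
    hidden = activation(x @ params["W1"])
    return hidden, hidden @ params["W2"]

def loss(y, y_hat):
    # Step (1f): mean squared reconstruction error over the N_i nodes.
    return float(np.mean((y - y_hat) ** 2))
```

Autoencoders 1–9 would pass `activation=sigmoid`, and 10–18 `activation=relu`, matching the two activation formulas above.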
(2) acquiring an SPI defect tracing data set:
randomly extract at least 5,320,000 pieces of SPI tracing data from the database of a Manufacturing Execution System (MES) to form an M×N-dimensional SPI defect tracing data set, where M is at least 70,000 and N is at least 76; each row of data represents one SPI defect tracing record containing production information, each column represents the sequence of all values of one attribute in the SPI tracing data set, and at least 20,000 rows of the SPI defect tracing data set are defective detection data;
(3) according to the following formula, normalizing the data of each attribute in the SPI defect tracing data set to obtain a normalized SPI defect tracing data set:
x'_qp = (x_qp − min(x_q)) / (max(x_q) − min(x_q))

where x'_qp is the normalized value of the p-th data item of the q-th attribute in the SPI defect tracing data set, x_qp is the p-th data item of the q-th attribute of the SPI defect tracing data set, min(·) denotes the minimum operation, x_q is all data of the q-th attribute of the SPI defect tracing data set, and max(·) denotes the maximum operation;
(4) training the self-encoder:
input the normalized SPI defect tracing data set into the input layer of each of the 18 self-encoders, and train each self-encoder with the stochastic gradient descent method, obtaining 18 trained self-encoders in total;
(5) obtaining a set of classification trees using ensemble learning:
(5a) input all data of the normalized M×N-dimensional SPI defect tracing data set, row by row, to the fully-connected layer of each trained self-encoder, and form the output data of all fully-connected-layer nodes into an M×N′-dimensional classification data set, where N′ equals the number of fully-connected-layer nodes;
(5b) select A rows of data from the classification data set to form a training set, where A is computed from M by a rounding-down formula (rendered as an image in the source), ⌊·⌋ denotes the rounding-down operation, and M is the number of rows of the classification data set; the remaining data of the classification data set form a test set;
(5c) train a classification tree on the training set with the classification and regression tree (CART) training method;
(5d) classify the test set with the trained classification tree to obtain the classification accuracy of the tree;
(6) obtaining an SMT production tracing sequence:
(6a) for each trained classification tree, take the root node as the start node of each traversal and, in turn, take every leaf node as the target node of one traversal; the attribute names passed by each traversal form one tracing sequence of the classification tree;
(6b) take the classification accuracy of each classification tree as the credibility of all tracing sequences of that tree;
(6c) find the self-encoder corresponding to each tracing sequence; then, for each attribute name in the tracing sequence, find the corresponding node in the fully-connected layer of that self-encoder, and form the network weight vector of the attribute name from the network weights connecting all input-layer nodes of the self-encoder to that fully-connected-layer node; the number of elements of the network weight vector of each attribute name equals the number of input-layer nodes of the corresponding self-encoder;
(6d) arrange the network weight vectors of the attribute names of each tracing sequence by rows to form a C×D-dimensional risk matrix of the tracing sequence, where C is the total number of attribute names in the tracing sequence and D is the total number of attributes in the SPI defect tracing data set;
(6e) sum the risk matrix of each tracing sequence by columns to form the tracing vector of the tracing sequence, in which each entry represents the importance of the corresponding attribute of the SPI defect tracing data set;
(6f) sort all data of the tracing vector of each tracing sequence in descending order to form the SMT production tracing sequence; the higher the importance of an item of SMT production information, the stronger its influence on SMT production defects.
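Step (6a)'s traversal — collecting the attribute names on every root-to-leaf path — can be sketched with scikit-learn's tree internals (`tree_.feature`, `tree_.children_left`, `tree_.children_right`). The feature names and toy data below are hypothetical; the patent's trees are trained on the encoded SPI classification data set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tracing_sequences(tree, feature_names):
    """Collect, for every leaf, the attribute names on the root-to-leaf path (step (6a))."""
    t = tree.tree_
    sequences = []

    def walk(node, path):
        if t.children_left[node] == t.children_right[node]:  # leaf: both children are -1
            sequences.append(path)
            return
        name = feature_names[t.feature[node]]
        walk(t.children_left[node], path + [name])
        walk(t.children_right[node], path + [name])

    walk(0, [])
    return sequences

# Toy stand-in for the encoded classification data set (hypothetical values and names).
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
seqs = tracing_sequences(clf, ["blade_pressure", "pad_volume"])
```

Each returned list is one tracing sequence; per step (6b), the tree's test accuracy would then serve as the credibility of all of its sequences.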
Compared with the prior art, the invention has the following advantages:
First, by constructing and training self-encoders, the invention retains independent variables that are both linearly and nonlinearly related to the dependent variable, overcoming the prior-art limitation of selecting only linearly related independent variables, so the SMT production tracing sequence obtained by the invention covers the SMT production information more comprehensively.
Second, the invention obtains a classification tree set by ensemble learning and derives the SMT production tracing sequence from it, describing both the linear and the nonlinear relationships between independent and dependent variables; this overcomes the prior-art limitation of describing only linear relationships, so the invention identifies the key factors causing product defects more accurately.
Third, because the ensemble-learned classification tree set deeply mines the relationship between SMT production information and SMT product defects, the invention overcomes the prior-art limitation of merely collecting production-process data, and can uncover hidden factors that cause product defects.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the self-encoder structure of the present invention;
FIG. 3 is a schematic diagram of production information for the SPI production traceability dataset of the present invention;
FIG. 4 is a classification tree of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The specific steps of the present invention will be described in further detail with reference to fig. 1.
Step 1: construct the self-encoders.
Build 18 self-encoders with the same structure but different parameters, wherein each self-encoder has three layers, structured as: input layer → fully-connected layer → output layer.
The number of nodes of the input layer and the output layer is set to 76.
The number of fully-connected-layer nodes of each self-encoder is set according to a formula (rendered as an image in the source) in which n_i denotes the number of fully-connected-layer nodes of the i-th self-encoder, i ∈ {1, 2, …, 18}, ⌊·⌋ denotes the rounding-down operation, and % denotes the remainder operation.
The structure of the constructed self-encoder will be further described with reference to fig. 2.
In fig. 2, the circles marked X in the leftmost column represent input-layer nodes, the circles marked h in the middle column represent fully-connected-layer nodes, and the circles marked y in the right column represent output-layer nodes. Each arrowed line indicates that the value of the node at its left end, multiplied by the corresponding weight, contributes to the input value of the node at its right end; the arrows indicate the direction of data flow during prediction.
The activation value of each node of the fully-connected layer in the 1st to 9th self-encoders is calculated according to the sigmoid formula

T_mj = 1 / (1 + e^(−x_mj))

where T_mj is the activation value of the j-th node in the fully-connected layer of the m-th self-encoder, m ∈ {1, 2, …, 9}, j ∈ {1, 2, …, N_m}, N_m is the total number of fully-connected-layer nodes of the m-th self-encoder, e^(·) denotes the exponential operation with base the natural constant e, x_mj is the input value of the j-th node in the fully-connected layer of the m-th self-encoder, x_mj = W_mj^T X_m, W_mj is the weight vector of the network between the input layer of the m-th self-encoder and the j-th node of the fully-connected layer (each element initialized from a standard normal distribution), T denotes the transposition operation, and X_m is the vector of the input values of the 76 input-layer nodes of the m-th self-encoder.
The activation value of each node of the fully-connected layer in the 10th to 18th self-encoders is calculated according to the following formula:

R_ln = max(0, x_ln)

where R_ln is the activation value of the n-th node in the fully-connected layer of the l-th self-encoder, l ∈ {10, …, 18}, n ∈ {1, 2, …, N_l}, N_l is the total number of fully-connected-layer nodes of the l-th self-encoder, max(·) denotes the maximum operation, x_ln is the input value of the n-th node in the fully-connected layer of the l-th self-encoder, x_ln = W_ln^T X_l, W_ln is the weight vector of the network between the input layer of the l-th self-encoder and the n-th node of the fully-connected layer (each element initialized from a standard normal distribution), T denotes the transposition operation, and X_l is the vector of the input values of the 76 input-layer nodes of the l-th self-encoder.
In an embodiment of the present invention, the parameters of the 18 self-encoders are shown in Table 1, "Parameter table of the 18 self-encoders" (rendered as an image in the source).
The loss error value between the output values of each self-encoder's output layer and the input values of its input layer is calculated according to the mean-squared-error formula

L_i = (1 / N_i) · Σ_{k=1}^{N_i} (y_ik − ŷ_ik)²

where L_i is the loss error value between the output values of the i-th self-encoder's output layer and its input-layer input values, i ∈ {1, 2, …, 18}, N_i is the number of input-layer nodes (equal to the number of output-layer nodes) of the i-th self-encoder, Σ denotes the summation operation, y_ik is the input value of the k-th node of the i-th self-encoder's input layer, ŷ_ik is the output value of the k-th node of the i-th self-encoder's output layer, and k ∈ {1, 2, …, N_i}.
Step 2: acquire an SPI defect tracing data set.
At least 5,320,000 pieces of SPI tracing data are randomly extracted from the database of a Manufacturing Execution System (MES) to form an M×N-dimensional SPI defect tracing data set, where M is at least 70,000 and N is at least 76; each row of data represents one SPI defect tracing record containing production information, each column represents the sequence of all values of one attribute in the SPI tracing data set, and at least 20,000 rows of the SPI defect tracing data set are defective detection data.
The five types of production information included in the SPI defect tracing data are further described with reference to fig. 3. The box labeled "process parameters" in fig. 3 represents production information on process parameters, including blade classification speed, blade classification distance, platen print height compensation, platen separation speed, platen separation distance, blade pressure, and cleaning speed. The box labeled "printing process status parameters" represents production information on printing-process status parameters, including print time, work file, production count, squeegee count, MASK count, squeegee mean pressure, squeegee minimum pressure, squeegee maximum pressure, auto clean count, manual clean count, print direction, and platen separation delay. The box labeled "intermediate product inspection parameter" represents production information on intermediate-product inspection results, including pad volume, pad area, pad height, and inspection result. The box labeled "environmental parameter" represents production information on environmental parameters, namely humidity and temperature. The box labeled "raw material property parameter" represents production information on raw-material property parameters, including PCB bar code, PCB length, PCB width, PCB thickness, pad number, package type, doctor blade ID, and steel mesh ID. The box in the middle marked "MES system" represents the MES system, and the lines in the figure indicate that these five aspects of production information all come from the MES system.
Step 3: normalize the data of each attribute in the SPI defect tracing data set according to the following formula, obtaining a normalized SPI defect tracing data set:
x'_qp = (x_qp − min(x_q)) / (max(x_q) − min(x_q))

where x'_qp is the normalized value of the p-th data item of the q-th attribute in the SPI defect tracing data set, x_qp is the p-th data item of the q-th attribute of the SPI defect tracing data set, min(·) denotes the minimum operation, x_q is all data of the q-th attribute of the SPI defect tracing data set, and max(·) denotes the maximum operation.
In the embodiment of the present invention, the SPI defect tracing data set before normalization is shown in Table 2. Each row contains 7 types of production information: blade pressure, blade speed, separation speed, pad volume, pad area, pad height, and the SPI detection result. The serial number is the number of the row in which the data is located; blade pressure is in newtons per square centimetre, blade speed in millimetres per second, and separation speed in centimetres per second. Pad volume, pad area, and pad height are the relative values automatically calculated by the SPI detection device. The SPI detection result is the result reported by the SPI detection device: 0 means no defect, and 1 means a continuous-tin (solder bridging) defect.
Table 2, a partial data table of the SPI defect tracing data set, is rendered as an image in the source.
The normalization process in the embodiment of the present invention is illustrated with the blade-pressure value in the first row of Table 2. Among all blade-pressure values listed in Table 2, the maximum is 13 and the minimum is 8. The blade-pressure value in the first row is 11, which is normalized as follows:

x' = (11 − 8) / (13 − 8)

yielding a normalized blade-pressure value of 0.6 for the first row of the data table.
The results obtained after normalizing all the data of Table 2 are shown in Table 3, a partial normalized SPI defect tracing data table (rendered as an image in the source).
And 4, training the self-encoder.
The normalized SPI defect tracing data set is input into the input layer of each of the 18 self-encoders, and each self-encoder is trained with the stochastic gradient descent method, giving 18 trained self-encoders in total.
The steps of the stochastic gradient descent method are as follows:
Step 1: randomly select a not-yet-selected data item from the normalized SPI defect tracing data set;
Step 2: after the selected data item is input into the input layer of the self-encoder, calculate the loss error value between the output data of the self-encoder's output layer and the selected data item according to

L_i = (1 / N_i) · Σ_{k=1}^{N_i} (y_ik − ŷ_ik)²

where L_i is the loss error value between the output values of the i-th self-encoder's output layer and its input-layer input values, i ∈ {1, 2, …, 18}, N_i is the number of input-layer nodes (equal to the number of output-layer nodes) of the i-th self-encoder, Σ denotes the summation operation, y_ik is the input value of the k-th node of the i-th self-encoder's input layer, ŷ_ik is the output value of the k-th node of the i-th self-encoder's output layer, and k ∈ {1, 2, …, N_i};
Step 3: update each parameter of the self-encoder network according to

ω′_t = ω_t − l · ∂L_i/∂θ_t

where ω′_t is the t-th parameter of the self-encoder after updating, t ∈ {1, 2, …, 2 × N × (num + 1)}, num is the total number of fully-connected-layer nodes of the self-encoder, ω_t is the t-th parameter of the self-encoder before updating, l is the learning rate with value range [0, 1], ∂/∂θ_t denotes the partial-derivative operation, and θ_t is the t-th parameter of the self-encoder before the parameter update;
Step 4: input the data item selected in Step 1 into the input layer of the self-encoder after the parameter update, and calculate the loss error value between the updated self-encoder's output-layer output and the selected data item according to

L_i = (1 / N_i) · Σ_{k=1}^{N_i} (y_ik − ŷ_ik)²

where L_i is the loss error value between the output values of the i-th self-encoder's output layer and its input-layer input values, i ∈ {1, 2, …, 18}, N_i is the number of input-layer nodes (equal to the number of output-layer nodes) of the i-th self-encoder, Σ denotes the summation operation, y_ik is the input value of the k-th node of the i-th self-encoder's input layer, ŷ_ik is the output value of the k-th node of the i-th self-encoder's output layer, and k ∈ {1, 2, …, N_i};
in the embodiment of the present invention, the training error values of 18 autoencoders are shown in table 4:
table 418 loss error values table from encoder
Self encoder sequence number Error value of training Self encoder sequence number Error value of training
1 0.0093 10 0.0158
2 0.0159 11 0.0100
3 0.0095 12 0.0034
4 0.0061 13 0.0081
5 0.0036 14 0.0075
6 0.0067 15 0.0030
7 0.0119 16 0.0017
8 0.0195 17 0.0075
9 0.0194 18 0.0151
Step 5: judge whether the loss error value between the updated self-encoder's output-layer values and the selected data item is smaller than the current loss-error threshold; if so, the self-encoder is trained; otherwise, return to Step 1. The threshold is a value chosen from the range [0, 300] according to the required training precision of the self-encoder network: the larger the chosen value, the lower the training precision, and the smaller the chosen value, the higher the training precision.
In the embodiment of the present invention, the threshold value is set to 0.02.
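The stochastic gradient descent procedure of steps 1-5 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the dimensions (8 inputs, 4 hidden nodes), the random stand-in data, and the learning rate are all assumptions; the patent's self-encoders use 76 input/output nodes and the normalized SPI data set.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H = 8, 4                                  # toy input/output and fully-connected node counts
W1 = 0.1 * rng.standard_normal((H, N))       # input -> fully-connected weights
W2 = 0.1 * rng.standard_normal((N, H))       # fully-connected -> output weights
lr, threshold = 0.5, 0.02                    # learning rate in [0,1]; loss threshold as in step 5

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x)                      # hidden activation (self-encoders 1-9 use sigmoid)
    return h, W2 @ h                         # linear reconstruction at the output layer

def mse(y_hat, x):
    return np.mean((x - y_hat) ** 2)         # L = (1/N) * sum_k (y_k - y_hat_k)^2

data = rng.random((50, N))                   # stand-in for normalized SPI rows
init_loss = mse(forward(data[0])[1], data[0])

for _ in range(1000):
    x = data[rng.integers(len(data))]        # step 1: randomly pick a row
    h, y_hat = forward(x)                    # step 2: forward pass
    # step 3: gradients of the MSE loss (manual backpropagation)
    err = (y_hat - x) * (2.0 / N)
    gW2 = np.outer(err, h)
    gh = W2.T @ err
    W1 -= lr * np.outer(gh * h * (1.0 - h), x)
    W2 -= lr * gW2
    # steps 4-5: recompute the loss; stop once it drops below the threshold
    if mse(forward(x)[1], x) < threshold:
        break

final_loss = mse(forward(data[0])[1], data[0])
```

The stopping rule mirrors step 5: training ends as soon as a per-sample reconstruction loss falls below the chosen threshold, otherwise another random sample is drawn.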
Step 5, obtaining a classification tree set by using an ensemble learning method.
All data of the normalized M×N-dimensional SPI defect tracing data set are input row by row into the fully connected layer of each trained self-encoder, and the output data of all nodes of the fully connected layer form an M×N'-dimensional classification data set, where the value of N' equals the number of nodes of the fully connected layer.
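The construction of the classification data set in the paragraph above amounts to a single forward pass through the trained fully connected layer. A minimal sketch, assuming a sigmoid fully connected layer and toy dimensions in place of the 76-node input and actually trained weights:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, H = 6, 8, 4                  # toy sizes; the patent uses M x 76 inputs
W1 = rng.standard_normal((H, N))   # stand-in for trained input -> fully-connected weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((M, N))             # normalized M x N SPI rows
features = sigmoid(X @ W1.T)       # M x N' classification data set, N' = H
```

Each row of `features` is the fully-connected-layer output for one SPI record, i.e. one row of the classification data set fed to the CART trees.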
A rows of data are selected from the classification data set to form a training set, wherein ⌊·⌋ represents the rounding-down operation and M represents the number of rows of the classification data set; the remaining data in the classification data set form the test set.
The training set is trained using the classification and regression tree (CART) training method to obtain a trained classification tree.
The CART training method comprises the following steps:
step 1, taking the sequence number of each column in the training set as an attribute of the training set, and forming a value sequence of the attribute corresponding to the training set by all elements of each column in the training set;
step 2, deleting repeated numerical values in the value sequence of each attribute to obtain a numerical value set of each attribute;
step 3, for each value in the value set of each attribute, counting the number of times the value appears in the value sequence of the corresponding attribute of the training set, and taking this count as the frequency of the value;
and step 4, calculating, from the value sequence and the value set of each attribute, the Gini index value of the attribute according to the following formula:
g_b = 1 − Σ_{s=1}^{N_b} (n_bs / n_b)^2
wherein g_b represents the Gini index value of the b-th attribute, N_b represents the total number of values in the value set of the b-th attribute, Σ represents the summation operation, s represents the sequence number of a value in the value set, s ∈ [1, N_b], n_bs represents the frequency of the s-th value in the value set of the b-th attribute, and n_b represents the total number of values in the value sequence of the b-th attribute;
step 5, taking the attribute with the maximum Gini index value as the optimal attribute;
step 6, adding the attribute name of the optimal attribute into a base classifier;
step 7, arranging all numerical values of the value sequence of the optimal attribute from small to large as the optimal attribute sequence;
step 8, sequentially taking the average value of each pair of adjacent numerical values in the optimal attribute sequence from left to right as a segmentation point of the optimal attribute sequence, forming all the numerical values smaller than the segmentation point in the sequence into a left sequence of the segmentation point, and forming all the numerical values larger than the segmentation point in the sequence into a right sequence of the segmentation point;
and step 9, respectively calculating the Gini index value of each segmentation point according to the following formula:
g = 1 − (c/(c+d))^2 − (d/(c+d))^2
wherein g represents the importance score of the segmentation point, c represents the number of values in the left sequence of the segmentation point, and d represents the number of values in the right sequence of the segmentation point;
step 10, selecting the value of the segmentation point with the largest Gini index value as the segmentation threshold of the optimal attribute;
step 11, taking the row elements of each row in the training set as one piece of classification data;
step 12, forming a left sub-training set from the classification data whose values of the optimal attribute are less than or equal to the segmentation threshold, and forming a right sub-training set from the classification data whose values of the optimal attribute are greater than the segmentation threshold;
and step 13, respectively training the left sub-training set and the right sub-training set using the same CART training method as in step (5c), until the SPI detection results of all data in the left sub-training set and the right sub-training set are the same; all attribute names of the base classifier then form a classification tree.
The trained classification tree is further described with reference to fig. 4, where each box in fig. 4 represents a node of the classification tree. The box labeled "X[2]" indicates that the value of the node is the attribute name of the output data sequence of the 2nd node of the fully connected layer of the self-encoder corresponding to the classification tree; likewise, the boxes labeled "X[16]", "X[4]", "X[7]" and "X[5]" correspond to the 16th, 4th, 7th and 5th nodes of that fully connected layer, respectively. The box labeled "tin connection" indicates that the SPI detection result of the node is tin connection (solder bridging); the box labeled "non-defective" indicates that the SPI detection result of the node is non-defective. In fig. 4, the starting node of an arrowed line is the parent node, and the destination node of the arrowed line is the child node.
The test set is classified using the trained classification tree to obtain the classification accuracy of the classification tree.
Step 6, obtaining the SMT production tracing sequence.
And for each trained classification tree, taking a root node of the classification tree as a starting node of each traversal, sequentially taking all leaf nodes of the classification tree as destination nodes of each traversal, and taking all attribute names passed by each traversal as a tracing sequence of the classification tree.
And taking the classification accuracy of each classification tree as the credibility of all the tracing sequences of the classification tree.
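The root-to-leaf traversal described above can be sketched with a toy tree. The tuple encoding, the node labels, and the helper `tracing_sequences` are illustrative assumptions, not structures from the patent:

```python
# Toy classification tree: internal nodes are (attribute_name, left_child, right_child)
# tuples, leaves are SPI result labels -- an assumed encoding for illustration only.
tree = ("X[2]",
        ("X[16]", "non-defective", "tin connection"),
        ("X[4]", "tin connection", "non-defective"))

def tracing_sequences(node, path=()):
    """Collect the attribute names passed on every root-to-leaf traversal."""
    if isinstance(node, str):                       # leaf reached: one traversal ends here
        return [list(path)]
    name, left, right = node
    return (tracing_sequences(left, path + (name,)) +
            tracing_sequences(right, path + (name,)))

paths = tracing_sequences(tree)
```

Each of the four leaves yields one tracing sequence; the first is ['X[2]', 'X[16]'].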
Searching a self-encoder corresponding to each tracing sequence, then searching a node corresponding to each attribute name in the tracing sequence corresponding to the self-encoder in a full connection layer of the self-encoder, and forming a network weight vector of the attribute name by using network weight values from all nodes of an input layer of the self-encoder to the corresponding node in the full connection layer, wherein the total number of elements of the network weight vector corresponding to each attribute name is the same as the number of nodes of the input layer of the corresponding self-encoder.
The network weight vectors of the attribute names of each tracing sequence are arranged in rows to form a C×D-dimensional risk matrix of the tracing sequence, where C represents the total number of attribute names in the tracing sequence and D represents the total number of attributes in the SPI defect tracing data set.
All data obtained by summing the risk matrix of each tracing sequence by columns form the tracing vector of that tracing sequence, and each datum in the tracing vector represents the importance of the corresponding attribute in the SPI defect tracing data set.
All data of the tracing vector of each tracing sequence are sorted from large to small to form the SMT production tracing sequence.
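The weight-vector lookup, risk matrix, column sums, and sorting of the preceding paragraphs can be sketched as follows; the dimensions, the random stand-in weights, and the chosen fully-connected nodes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
D, H = 5, 3                              # D input attributes, H fully-connected nodes (toy sizes)
W1 = rng.standard_normal((H, D))         # stand-in for trained input -> fully-connected weights
sequence_nodes = [0, 2]                  # fully-connected nodes matched to the C attribute names

risk = W1[sequence_nodes, :]             # C x D risk matrix: one network weight vector per name
trace_vector = risk.sum(axis=0)          # column sums: importance of each of the D attributes
order = np.argsort(trace_vector)[::-1]   # attribute indices from most to least important
```

The descending `order` is the SMT production tracing sequence for this (toy) tracing sequence: attributes ranked by their summed connection weight into the selected fully-connected nodes.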
In the embodiment of the invention, the finally obtained SMT production trace sequence is shown in Table 5.
TABLE 5. SMT production tracing sequence list
No. | SMT production tracing sequence
1 | Distance of blade separation > Width of the board > Thickness of the board > Tin connection
2 | Distance of blade separation > Width of the board > Automatic cleaning and counting > Tin connection
3 | Speed of blade separation > Automatic cleaning and counting > Tin connection
4 | Speed of blade separation > Speed of blade separation > Tin connection
5 | Length of the board > Automatic cleaning > Separating speed of the working table > Tin connection
6 | Speed of blade separation > Separating speed of the working table > Pressure of the scraper > Tin connection
7 | Separating speed of the working table > Tin connection

Claims (4)

1. An SMT production tracing method based on a self-encoder (autoencoder) and ensemble learning, characterized in that autoencoders are constructed, an ensemble learning method is used to obtain a set of classification trees, and an SMT production tracing sequence is obtained, the method comprising the following specific steps:
(1) constructing the autoencoders:
(1a) building 18 autoencoders with the same structure but different parameters, each autoencoder having three layers arranged in order as: input layer → fully connected layer → output layer;
(1b) setting the number of nodes of the input layer and of the output layer to 76;
(1c) setting the number of fully connected layer nodes of each autoencoder according to the following formula:
Figure FDA0002146985580000011
wherein n_i represents the number of fully connected layer nodes of the i-th autoencoder, i ∈ {1,2,…,18}, ∈ represents set membership, ⌊·⌋ represents the rounding-down operation, and % represents the remainder operation;
(1d) calculating the activation value of each node of the fully connected layer in the 1st to 9th autoencoders according to the following formula:
T_mj = 1 / (1 + e^(−x_mj))
wherein T_mj represents the activation value of the j-th node in the fully connected layer of the m-th autoencoder, m ∈ {1,2,…,9}, j ∈ {1,2,…,N_m}, N_m represents the total number of fully connected layer nodes of the m-th autoencoder, e(·) represents the exponential operation with the natural constant e as base, x_mj represents the input value of the j-th node in the fully connected layer of the m-th autoencoder, x_mj = W_mj^T X_m, W_mj represents the weight matrix of the network between the input layer of the m-th autoencoder and the j-th node in its fully connected layer, the initial value of each element of this matrix obeying the standard normal distribution, T represents the transpose operation, and X_m represents the vector composed of the input values of the 76 nodes in the input layer of the m-th autoencoder;
(1e) calculating the activation value of each node of the fully connected layer in the 10th to 18th autoencoders according to the following formula:
R_ln = max(0, x_ln)
wherein R_ln represents the activation value of the n-th node in the fully connected layer of the l-th autoencoder, l ∈ {10,…,18}, n ∈ {1,2,…,N_l}, N_l represents the total number of nodes of the fully connected layer of the l-th autoencoder, max(·) represents the maximum-value operation, x_ln represents the input value of the n-th node in the fully connected layer of the l-th autoencoder, x_ln = W_ln^T X_l, W_ln represents the weight matrix of the network between the input layer of the l-th autoencoder and the n-th node of its fully connected layer, the initial value of each element of this matrix obeying the standard normal distribution, and X_l represents the vector composed of the input values of the 76 nodes in the input layer of the l-th autoencoder;
(1f) calculating the loss error value between the output value of the output layer and the input value of the input layer of each autoencoder according to the following formula:
L_i = (1/N_i) · Σ_{k=1}^{N_i} (y_ik − ŷ_ik)^2
wherein L_i represents the loss error value between the output value of the output layer of the i-th autoencoder and the input value of its input layer, i ∈ {1,2,…,18}, N_i represents the number of input layer nodes and of output layer nodes of the i-th autoencoder, Σ represents the summation operation, y_ik represents the input value of the k-th node of the input layer of the i-th autoencoder, ŷ_ik represents the output value of the k-th node of the output layer of the i-th autoencoder, and k ∈ {1,2,…,N_i};
(2) obtaining an SPI defect tracing data set:
randomly extracting at least 5,320,000 pieces of SPI tracing data from the database of the manufacturing execution system MES to form an SPI defect tracing data set of dimension M×N, M being at least 70,000 and N being at least 76, wherein each row of data represents one piece of SPI defect tracing data containing production information, each column of data represents the sequence composed of all values of one attribute in the SPI tracing data set, and at least 20,000 rows of SPI tracing data in the SPI defect tracing data set are defective detection data;
(3) normalizing the data of each attribute in the SPI defect tracing data set according to the following formula to obtain a normalized SPI defect tracing data set:
x'_qp = (x_qp − min(x_q)) / (max(x_q) − min(x_q))
wherein x'_qp represents the normalized value of the p-th datum of the q-th attribute in the SPI defect tracing data set, x_qp represents the p-th datum of the q-th attribute of the SPI defect tracing data set, min(·) represents the minimum-value operation, x_q represents all data of the q-th attribute of the SPI defect tracing data set, and max(·) represents the maximum-value operation;
(4) training the autoencoders:
inputting the normalized SPI defect tracing data set into the input layer of each of the 18 autoencoders, and training each autoencoder separately using the stochastic gradient descent method, to obtain 18 trained autoencoders;
(5) obtaining a set of classification trees using an ensemble learning method:
(5a) inputting all data of the normalized M×N-dimensional SPI defect tracing data set row by row into the fully connected layer of each trained autoencoder, and forming an M×N'-dimensional classification data set from the output data of all nodes of the fully connected layer, the value of N' being equal to the number of nodes of the fully connected layer;
(5b) selecting A rows of data from the classification data set to form a training set, wherein
A is given by the formula of Figure FDA0002146985580000031, ⌊·⌋ represents the rounding-down operation, and M represents the number of rows of the classification data set; the remaining data of the classification data set form the test set;
(5c) training the training set using the classification and regression tree CART training method to obtain a trained classification tree;
(5d) classifying the test set using the trained classification tree to obtain the classification accuracy of the classification tree;
(6) obtaining the SMT production tracing sequence:
(6a) for each trained classification tree, taking the root node of the classification tree as the starting node of each traversal, taking all leaf nodes of the classification tree in turn as the destination node of each traversal, and taking all attribute names passed by each traversal as one tracing sequence of the classification tree;
(6b) taking the classification accuracy of each classification tree as the credibility of all tracing sequences of the classification tree;
(6c) finding the autoencoder corresponding to each tracing sequence, then finding the node in the fully connected layer of this autoencoder corresponding to each attribute name in the tracing sequence, and forming the network weight vector of the attribute name from the network weight values from all nodes of the input layer of the autoencoder to the corresponding node in the fully connected layer, the total number of elements of the network weight vector corresponding to each attribute name being the same as the number of input layer nodes of the corresponding autoencoder;
(6d) arranging the network weight vectors of the attribute names of each tracing sequence in rows to form the C×D-dimensional risk matrix of the tracing sequence, wherein C represents the total number of attribute names in the tracing sequence and D represents the total number of attributes in the SPI defect tracing data set;
(6e) forming the tracing vector of each tracing sequence from all data obtained by summing the risk matrix of the tracing sequence by columns, each datum in the tracing vector representing the importance of the corresponding attribute in the SPI defect tracing data set;
(6f) sorting all data of the tracing vector of each tracing sequence from large to small to form the SMT production tracing sequence.
2. The SMT production tracing method based on a self-encoder and ensemble learning according to claim 1, characterized in that the stochastic gradient descent method in step (4) comprises the following steps:
first step: randomly selecting one previously unselected piece of data from the normalized SPI defect tracing data set;
second step: inputting the selected data into the input layer of the autoencoder, and calculating the loss error value between the output data of the output layer of the autoencoder and the selected data according to the following formula:
L = (1/N) · Σ_{k=1}^{N} (y_k − ŷ_k)^2
wherein L represents the loss error value between the output value of the output layer after the selected data is input into the autoencoder and the selected data, N represents the total number of input layer nodes of the autoencoder, the total number of output layer nodes of the autoencoder being equal to the total number of input layer nodes and the input layer nodes corresponding one to one with the output layer nodes in node order, Σ represents the summation operation, y_k represents the input value of the k-th node of the input layer of the autoencoder, ŷ_k represents the output value of the k-th node of the output layer of the autoencoder, k ∈ {1,2,…,N}, and ∈ represents set membership;
third step: updating each parameter of the autoencoder network according to the following formula:
ω'_t = ω_t − l · (∂L/∂θ_t)
wherein ω'_t represents the t-th parameter of the autoencoder after updating, t ∈ {1,2,…,2×N×(num+1)}, num represents the total number of fully connected layer nodes of the autoencoder, ω_t represents the t-th parameter of the autoencoder before updating, l represents the learning rate with value range [0,1], ∂ represents the partial-derivative operation, and θ_t represents the t-th parameter of the autoencoder before the parameter update;
fourth step: inputting the data selected in the first step into the input layer of the autoencoder with updated parameters, and calculating the loss error value between the output data of the output layer of the updated autoencoder and the selected data according to the following formula:
L' = (1/N) · Σ_{k=1}^{N} (y_k − ŷ'_k)^2
wherein L' represents the loss error value between the output data of the output layer of the autoencoder after the parameter update and the selected data, and ŷ'_k represents the output value of the k-th node of the output layer of the autoencoder after the parameter update;
fifth step: judging whether the loss error value between the output value of the output layer of the updated autoencoder and the selected data is smaller than the current loss-error threshold; if so, a trained autoencoder is obtained; otherwise, the first step is executed; the threshold is a value selected from the range [0,300] according to the required training precision of the autoencoder network: the larger the selected value, the lower the training precision of the network, and the smaller the selected value, the higher the training precision of the network.
3. The SMT production tracing method based on a self-encoder and ensemble learning according to claim 1, characterized in that the production information in step (2) covers five aspects: raw materials, process, printing process, environment, and detection results.
4. The SMT production tracing method based on a self-encoder and ensemble learning according to claim 1, characterized in that the CART training method in step (5c) comprises the following steps:
first step: taking the sequence number of each column of the training set as one attribute of the training set, and forming the value sequence of the corresponding attribute from all elements of that column;
second step: deleting the repeated values in the value sequence of each attribute to obtain the value set of each attribute;
third step: for each value in the value set of each attribute, counting the number of times it appears in the value sequence of the corresponding attribute of the training set as the frequency of that value;
fourth step: calculating, according to the following formula, the Gini index value of each attribute from its value sequence and value set:
g_b = 1 − Σ_{s=1}^{N_b} (n_bs / n_b)^2
wherein g_b represents the Gini index value of the b-th attribute, N_b represents the total number of values in the value set of the b-th attribute, Σ represents the summation operation, s represents the sequence number of a value in the value set, s ∈ [1, N_b], n_bs represents the frequency of the s-th value in the value set of the b-th attribute, and n_b represents the total number of values in the value sequence of the b-th attribute;
fifth step: taking the attribute with the largest Gini index value as the optimal attribute;
sixth step: adding the attribute name of the optimal attribute to the base classifier;
seventh step: arranging all values of the value sequence of the optimal attribute from small to large as the optimal attribute sequence;
eighth step: taking, from left to right, the average of each pair of adjacent values in the optimal attribute sequence as one segmentation point of the optimal attribute sequence, forming the left sequence of the segmentation point from all values of the sequence smaller than the segmentation point, and forming the right sequence of the segmentation point from all values of the sequence larger than the segmentation point;
ninth step: calculating the Gini index value of each segmentation point according to the following formula:
g = 1 − (c/(c+d))^2 − (d/(c+d))^2
wherein g represents the importance score of the segmentation point, c represents the number of values in the left sequence of the segmentation point, and d represents the number of values in the right sequence of the segmentation point;
tenth step: selecting the value of the segmentation point with the largest Gini index value as the segmentation threshold of the optimal attribute;
eleventh step: taking the row elements of each row of the training set as one piece of classification data;
twelfth step: forming a left sub-training set from all classification data whose value of the optimal attribute is smaller than or equal to the segmentation threshold, and forming a right sub-training set from all classification data whose value of the optimal attribute is larger than the segmentation threshold;
thirteenth step: training the left sub-training set and the right sub-training set respectively using the same CART training method as in step (5c), until the SPI detection results of all data in the left sub-training set and the right sub-training set are the same, and forming a classification tree from all attribute names of the base classifier.
CN201910688024.3A 2019-07-29 2019-07-29 SMT production tracing method based on self-encoder and ensemble learning Active CN110533071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910688024.3A CN110533071B (en) 2019-07-29 2019-07-29 SMT production tracing method based on self-encoder and ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910688024.3A CN110533071B (en) 2019-07-29 2019-07-29 SMT production tracing method based on self-encoder and ensemble learning

Publications (2)

Publication Number Publication Date
CN110533071A CN110533071A (en) 2019-12-03
CN110533071B true CN110533071B (en) 2022-03-22

Family

ID=68660567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910688024.3A Active CN110533071B (en) 2019-07-29 2019-07-29 SMT production tracing method based on self-encoder and ensemble learning

Country Status (1)

Country Link
CN (1) CN110533071B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140057753A (en) * 2012-11-04 2014-05-14 박호열 Production management system of surface mounted tfchnology
CN109597968A (en) * 2018-12-29 2019-04-09 西安电子科技大学 Paste solder printing Performance Influence Factor analysis method based on SMT big data
CN109657718A (en) * 2018-12-19 2019-04-19 广东省智能机器人研究院 SPI defect classification intelligent identification Method on a kind of SMT production line of data-driven
CN110021341A (en) * 2019-02-21 2019-07-16 华东师范大学 A kind of prediction technique of GPCR drug based on heterogeneous network and targeting access

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7366728B2 (en) * 2004-04-27 2008-04-29 International Business Machines Corporation System for compressing a search tree structure used in rule classification

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140057753A (en) * 2012-11-04 2014-05-14 박호열 Production management system of surface mounted tfchnology
CN109657718A (en) * 2018-12-19 2019-04-19 广东省智能机器人研究院 SPI defect classification intelligent identification Method on a kind of SMT production line of data-driven
CN109597968A (en) * 2018-12-29 2019-04-09 西安电子科技大学 Paste solder printing Performance Influence Factor analysis method based on SMT big data
CN110021341A (en) * 2019-02-21 2019-07-16 华东师范大学 A kind of prediction technique of GPCR drug based on heterogeneous network and targeting access

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Quality Traceability System for Electronic Component Products; Qu Zhenglong; China Masters' Theses Full-text Database, Information Science and Technology; 2016-03-15 (No. 03); I138-2892 *

Also Published As

Publication number Publication date
CN110533071A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN109597968B (en) SMT big data-based solder paste printing performance influence factor analysis method
CN110543616B (en) SMT solder paste printing volume prediction method based on industrial big data
Tsai Modeling and optimization of stencil printing operations: A comparison study
CN104407589B (en) Workshop manufacturing process-oriented active sensing and anomaly analysis method of real-time production performance
DE112020001874T5 (en) DATA EXTRACTION SYSTEM
CN111242363A (en) A method and system for predicting order combination and layout of PCB boards based on machine learning
CN113601261B (en) Monitoring method of online rapid optimization model for cutter
WO2022267509A1 (en) Method for training smt printing parameter optimization model, device, and storage medium
CN114375107B (en) Method, device and equipment for reconstructing unstructured influencing factors of solder paste printing on SMT production lines
CN111832432A (en) A real-time prediction method of tool wear based on wavelet packet decomposition and deep learning
CN114330549A (en) Chemical process fault diagnosis method based on depth map network
CN107728589B (en) A method for on-line monitoring of flexible IC substrate etching and development process
US20190095876A1 (en) Method and system for determining maintenance policy of complex forming device
CN106055579B (en) Vehicle performance data cleaning system and method based on artificial neural network
CN115099147A (en) A process analysis and intelligent decision-making method based on SMT production line
CN115587543A (en) Tool Remaining Life Prediction Method and System Based on Federated Learning and LSTM
CN113822499A (en) Train spare part loss prediction method based on model fusion
WO2020162884A1 (en) Parameter suggestion system
CN111177495A (en) Method for intelligently identifying data content and generating corresponding industry report
CN114820569A (en) PCB surface defect classification method based on improved ResNet34 network
CN115017671A (en) Industrial process soft sensing modeling method and system based on data flow online cluster analysis
Lawrence et al. On the distribution of performance from multiple neural-network trials
CN118037112A (en) Tread quality prediction model construction method based on data driving
CN110533071B (en) SMT production tracing method based on self-encoder and ensemble learning
Digiesi et al. A model to evaluate the human error probability in inspection tasks of a production system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant