Summary of the invention
The objective of the invention is to provide a fault location method that can locate the specific faulty device when a board fails and that can continuously refine its locating capability.
To achieve this objective, the invention provides a fault location method comprising the steps of:
(A) establishing, for the board, a correlation matrix between fault modes and test items;
(B) optimizing the correlation matrix by means of the fault detection rate (FDR) and the fault isolation rate (FIR);
(C) optimizing the correlation matrix by means of maintenance data;
(D) when the board fails, locating the faulty device according to the optimized correlation matrix.
The correlation matrix of step (A) is $D = [d_{ij}]$, whose rows correspond to fault modes and whose columns correspond to test items.
Row i of the matrix is $[d_{i1}\ d_{i2}\ \cdots\ d_{ij}]$. It gives the correlation between fault mode Fi and each test item, that is, which test items fail when the device fails in this mode;
Column j of the matrix is $[d_{1j}\ d_{2j}\ \cdots\ d_{ij}]^{T}$.
It gives the correlation between test item Tj and each fault mode, that is, which fault modes may be present when test item Tj fails;
Here, $d_{ij} = 1$ means correlated and $d_{ij} = 0$ means uncorrelated.
Establishing the fault modes Fi comprises the steps of:
(A1) obtaining a device fault library;
(A2) obtaining the device list of the board from the bill of materials (BOM) of the board;
(A3) obtaining the list of fault modes of all devices on the board from the device list and the device fault library.
The FDR of step (B) is the ratio of the number of detectable fault modes, $N_D$, to the total number of fault modes, $N$, that may occur in the devices on the board.
The FIR of step (B) is the ratio of the number of locatable fault modes, $N_{IL}$, to the number of detectable fault modes, $N_D$.
Step (B) further comprises: for the correlation matrix that has been established, determining its FDR and FIR; if the FDR and FIR do not reach the expected values FDR = 95% and FIR = 60%, optimizing the test scheme by adding test items and rebuilding the correlation matrix; if the FDR and FIR reach the expected values FDR = 95% and FIR = 60%, passing the correlation matrix on to the practical-application optimization stage.
In step (C), optimizing the correlation matrix by means of maintenance data means that, after a fault has been found and the repair completed, the maintenance record is entered into the corresponding correlation matrix.
Entering the maintenance record into the corresponding correlation matrix means: if the fault shows up as a failure of test item Tj, and the repair finds that the test failure was caused by fault mode Fi, then 1 is added to the cell corresponding to Tj and Fi.
In step (D), locating the faulty device according to the optimized correlation matrix means using Bayes' theorem, on the basis of the optimized correlation matrix, to obtain the probability that each fault mode caused the observed test item failures, and locating the faulty device according to these probabilities.
Step (D) further comprises the steps of:
(D1) finding all failed test items and, for each fault mode, summing its correlation coefficients over all failed test items to obtain the accumulated value of the failed items for that fault mode;
(D2) finding all passed test items and, for each fault mode, summing its correlation coefficients over all passed test items to obtain the accumulated value of the passed items for that fault mode;
(D3) for each fault mode on the board, subtracting the accumulated value of the passed items from the accumulated value of the failed items; if the result is positive, recording it directly; if the result is negative, recording 0 instead;
(D4) calculating, under the assumption that exactly one fault mode has failed, the probability that each fault mode is the one that failed;
(D5) deciding which device to replace first according to the magnitude of the failure probability of each fault mode.
Step (D4) satisfies the following relation:
$$P(A_j \mid B) = \frac{P(A_j)\,P(B \mid A_j)}{\sum_k P(A_k)\,P(B \mid A_k)}$$
where P(Aj|B) is the probability that the failed fault mode is Aj, given that one fault mode has been detected as failed; P(B) is the probability that a fault mode drawn for testing is found to have failed; P(Aj) is the probability that fault mode Aj occurs on the board; P(B|Aj) is the probability that fault mode Aj, when drawn for testing, is found to have failed, and it equals Dj × ∑j, where Dj is the probability that the fault mode fails across all boards and ∑j is the historical number of times the fault mode has occurred on the tested board.
By implementing the invention, when a board fails, Bayes' theorem can be applied to the established and optimized correlation matrix to calculate the failure probability of each device on the board, and the device to be replaced first is chosen according to these probabilities, thereby locating the fault.
The method of the invention establishes, for the board, a correlation matrix between fault modes and test items and optimizes it with maintenance data from practical use; as maintenance data accumulate, its locating capability is continuously refined.
By implementing the invention, the board's ability to locate faults at device level is considered already at the design stage, and the method turns maintenance experience into data, so that the experience can be passed on.
Embodiment
The invention provides a fault location method that can locate the specific faulty device when a board fails and that can continuously refine its locating capability. As shown in Fig. 4, the method is implemented as follows:
Step 1: model the board by establishing the correspondence between fault modes and test items. A fault mode is a way in which a device on the board may fail. The correspondence takes the form of a correlation matrix $D = [d_{ij}]$, whose rows correspond to fault modes and whose columns correspond to test items.
Row i of the matrix, for fault mode Fi, is:
$[d_{i1}\ d_{i2}\ \cdots\ d_{ij}]$
It gives the correlation between fault mode Fi and each test item, that is, which test items fail when the device fails in this mode;
The fault modes Fi are established as follows:
(1.1) obtain the device fault library, which contains all devices and all fault modes of each device;
(1.2) obtain the device list of the board from its bill of materials (BOM);
(1.3) obtain the list of fault modes of all devices on the board from the device list and the device fault library.
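Steps (1.1) to (1.3) amount to a lookup of each BOM entry in the fault library. The following is a minimal sketch under assumed data: the library contents, the BOM entries, the "stuck address line" fault mode and the function name `fault_mode_list` are all hypothetical and serve only as illustration.

```python
# Hypothetical device fault library: device type -> known fault modes (step 1.1).
FAULT_LIBRARY = {
    "CPU": ["overall failure"],
    "RAM": ["overall failure", "stuck address line"],
    "FUN": ["overall failure"],
}

# Hypothetical BOM of one board: (reference designator, device type) (step 1.2).
BOM = [("CPU", "CPU"), ("RAM1", "RAM"), ("RAM2", "RAM"), ("FUN1", "FUN")]


def fault_mode_list(bom, library):
    """Return one (designator, fault mode) pair per fault mode of each device (step 1.3)."""
    modes = []
    for designator, device_type in bom:
        # A device type missing from the library contributes no fault modes.
        for mode in library.get(device_type, []):
            modes.append((designator, mode))
    return modes


modes = fault_mode_list(BOM, FAULT_LIBRARY)
for designator, mode in modes:
    print(designator, "-", mode)
```

Each pair in the resulting list would become one row Fi of the correlation matrix.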
Column j of the matrix, for test item Tj, is:
$[d_{1j}\ d_{2j}\ \cdots\ d_{ij}]^{T}$
It gives the correlation between test item Tj and each fault mode, that is, which fault modes may be present when test item Tj fails.
If a fault of a device can be detected by a given test item, the corresponding cell of the matrix is set to 1, meaning correlated; if it cannot be detected, the cell is set to 0, meaning uncorrelated.
The construction of the correlation matrix is illustrated below with a specific embodiment:
Fig. 1 is a model of a typical circuit. As shown in Fig. 1, the devices on the board and their functions are: the CPU is the main control unit of the board; RAM1 and RAM2 hold the data used while the CPU runs, and both hang on the same CPU bus; the functional modules FUN1, FUN2 and FUN3 perform signal processing, with the input signal entering at FUN1 and finally leaving from FUN3.
For simplicity, in the embodiment of Fig. 1 each device is assumed to have only one fault mode, namely overall failure of the chip; in practice a device usually has several fault modes.
The test items shown in Fig. 1 are:
The test at point T1 covers only FUN1, so T1 is correlated only with FUN1;
The test at point T2 covers FUN1 and FUN2, so T2 is correlated with FUN1 and FUN2;
The test at point T3 covers FUN1, FUN2 and FUN3, so T3 is correlated with FUN1, FUN2 and FUN3;
T4 is the CPU self-test, so T4 is correlated only with the CPU;
T5 is the read/write test of RAM1 by the CPU, so T5 is correlated with the CPU and RAM1;
T6 is the read/write test of RAM2 by the CPU, so T6 is correlated with the CPU and RAM2.
From the correlations between each device on the board and each test item, the correlation matrix data shown in Table 1 are obtained:
| Correlation | T1 | T2 | T3 | T4 | T5 | T6 |
| CPU | 0 | 0 | 0 | 1 | 1 | 1 |
| RAM1 | 0 | 0 | 0 | 0 | 1 | 0 |
| RAM2 | 0 | 0 | 0 | 0 | 0 | 1 |
| FUN1 | 1 | 1 | 1 | 0 | 0 | 0 |
| FUN2 | 0 | 1 | 1 | 0 | 0 | 0 |
| FUN3 | 0 | 0 | 1 | 0 | 0 | 0 |
Table 1
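Its correlation matrix is:
$$D = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$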
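The matrix of Table 1 follows mechanically from the coverage of each test item. A minimal sketch, assuming the coverage listed for Fig. 1; the names `COVERAGE` and `build_matrix` are illustrative and not part of the method as described:

```python
DEVICES = ["CPU", "RAM1", "RAM2", "FUN1", "FUN2", "FUN3"]
TESTS = ["T1", "T2", "T3", "T4", "T5", "T6"]

# Devices covered by each test item, as listed for Fig. 1.
COVERAGE = {
    "T1": {"FUN1"},
    "T2": {"FUN1", "FUN2"},
    "T3": {"FUN1", "FUN2", "FUN3"},
    "T4": {"CPU"},
    "T5": {"CPU", "RAM1"},
    "T6": {"CPU", "RAM2"},
}


def build_matrix(devices, tests, coverage):
    """One row per device (fault mode), one column per test item; 1 = correlated."""
    return [[1 if dev in coverage[t] else 0 for t in tests] for dev in devices]


matrix = build_matrix(DEVICES, TESTS, COVERAGE)
for dev, row in zip(DEVICES, matrix):
    print(f"{dev:5s}", row)
```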
Step 2: optimize the correlation matrix by means of the fault detection rate and the fault isolation rate:
Once the correlation matrix between the fault modes and the test items of the board has been established, quantitative indicators can be derived from it, such as the fault detection rate (FDR) and the fault isolation rate (FIR).
FDR is the ratio of the number of detectable fault modes, $N_D$, to the total number of fault modes, $N$, that may occur in the devices on the board, expressed as a percentage:
$$FDR = \frac{N_D}{N} \times 100\%$$
FIR is the ratio of the number of locatable fault modes, $N_{IL}$, to the number of detectable fault modes, $N_D$, expressed as a percentage:
$$FIR = \frac{N_{IL}}{N_D} \times 100\%$$
Here, "locatable" means that once the test results T1, T2, ..., Tj are known (where Tj denotes the result of test item Tj, pass or fail), the specific fault mode that has occurred can be determined.
In practice there are also indicators such as FIR1 and FIR2, which express how precise the location is: FIR1 refers to locating down to one fault mode, and FIR2 to locating down to two fault modes.
In the calculation, real data such as the historical failure rate of each fault mode or the relative proportion of each fault mode can be used to correct the above formulas,
that is, $N_D$ and $N$ are weighted. For a fault mode Fi with row
$[d_{i1}\ d_{i2}\ \cdots\ d_{ij}]$
let $X_i = 1$ if at least one of $d_{i1}, \ldots, d_{ij}$ is nonzero, and $X_i = 0$ otherwise.
Without correction: $N_D = \sum_i X_i$;
With correction: $N_D = \sum_i X_i \times P_i$;
where $P_i$ is the historical failure rate of fault mode Fi; the other quantities are corrected in the same way.
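The difference between the uncorrected and the corrected count can be sketched as follows. The three-row matrix and the failure rates are made-up illustration values, and `detected_count` is a hypothetical helper name.

```python
def detected_count(matrix, rates=None):
    """N_D: sum of X_i, or of X_i * P_i when historical failure rates are given."""
    total = 0.0
    for i, row in enumerate(matrix):
        x_i = 1 if any(row) else 0            # X_i = 1 if any d_ij is nonzero
        weight = 1.0 if rates is None else rates[i]
        total += x_i * weight
    return total


# Hypothetical matrix: the second fault mode is not covered by any test item.
matrix = [
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
rates = [0.5, 0.25, 0.25]   # assumed historical failure rates P_i

unweighted = detected_count(matrix)
weighted = detected_count(matrix, rates)
print(unweighted, weighted)
```

With weighting, an undetectable fault mode that rarely occurs lowers the indicator less than one that occurs often.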
For a correlation matrix to be usable in practice, the expected values of the fault detection rate and the fault isolation rate are usually set to FDR = 95% and FIR = 60%.
Fig. 2 is a flowchart of optimizing the correlation matrix by FDR and FIR according to the invention. As shown in Fig. 2, the FDR and FIR of the established correlation matrix are determined; if they do not reach the expected values, the test scheme is optimized, for example by adding test items, and the correlation matrix is rebuilt. If they reach the expected values, the correlation matrix can enter the practical-application optimization stage.
The FDR and FIR of the correlation matrix in Table 1 are obtained as follows.
From the data in Table 1, every fault mode on the board can be detected, so $N_D = 6$; the board has 6 fault modes in total, one per device, so $N = 6$, and therefore
$$FDR = \frac{N_D}{N} \times 100\% = \frac{6}{6} \times 100\% = 100\%$$
From the data in Table 1, the row vectors $[d_{i1}\ d_{i2}\ \cdots\ d_{ij}]$ of all fault modes differ from one another, so all devices can be distinguished and $N_{IL} = 6$; since every fault mode on the board can be detected, $N_D = 6$, and therefore
$$FIR = \frac{N_{IL}}{N_D} \times 100\% = \frac{6}{6} \times 100\% = 100\%$$
If the test at point T2 is removed from the test items of Fig. 1, the correlation matrix data shown in Table 2 are obtained:
| Correlation | T1 | T3 | T4 | T5 | T6 |
| CPU | 0 | 0 | 1 | 1 | 1 |
| RAM1 | 0 | 0 | 0 | 1 | 0 |
| RAM2 | 0 | 0 | 0 | 0 | 1 |
| FUN1 | 1 | 1 | 0 | 0 | 0 |
| FUN2 | 0 | 1 | 0 | 0 | 0 |
| FUN3 | 0 | 1 | 0 | 0 | 0 |
Table 2
Its correlation matrix is:
$$D = \begin{bmatrix} 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
From the data in Table 2, the FDR and FIR of this correlation matrix are as follows. Every fault mode is still covered by at least one test item, so
$$FDR = \frac{6}{6} \times 100\% = 100\%$$
Because the rows of FUN2 and FUN3 in Table 2 are identical, the faults of FUN2 and FUN3 cannot be distinguished from each other, so $N_{IL} = 6 - 2 = 4$ for this correlation matrix, and
$$FIR = \frac{4}{6} \times 100\% \approx 66.7\%$$
These results show that, although FIR reaches the expected value, the faults of FUN2 and FUN3 cannot be told apart from the test results; the test at point T2 therefore needs to be added and the correlation matrix rebuilt as in Table 1.
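The two indicators for Tables 1 and 2 can be checked with a short script. This is a sketch under two assumptions that go beyond the text: a fault mode counts as locatable when it is detectable and its row differs from every other row (the FIR1 sense, single fault), and no failure-rate weighting is applied. The function name `fdr_fir` is illustrative.

```python
from collections import Counter

# Rows: CPU, RAM1, RAM2, FUN1, FUN2, FUN3.
TABLE1 = [  # columns T1..T6
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
TABLE2 = [  # columns T1, T3, T4, T5, T6 (T2 removed)
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]


def fdr_fir(matrix):
    """Return (FDR, FIR) as fractions for a binary correlation matrix."""
    rows = [tuple(row) for row in matrix]
    detected = [row for row in rows if any(row)]      # fault modes seen by some test
    signature_counts = Counter(detected)
    n = len(rows)
    n_d = len(detected)
    # Locatable: detectable and not sharing its row with another fault mode.
    n_il = sum(1 for row in detected if signature_counts[row] == 1)
    fdr = n_d / n
    fir = n_il / n_d if n_d else 0.0
    return fdr, fir


fdr1, fir1 = fdr_fir(TABLE1)
fdr2, fir2 = fdr_fir(TABLE2)
print(f"Table 1: FDR={fdr1:.1%} FIR={fir1:.1%}")
print(f"Table 2: FDR={fdr2:.1%} FIR={fir2:.1%}")
```

For Table 2 the identical rows of FUN2 and FUN3 are both excluded from the locatable count, which reproduces $N_{IL} = 6 - 2 = 4$.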
Step 3: enter the practical-application optimization stage and optimize the correlation matrix by means of maintenance data:
In the correlation matrix as initially designed, each cell holds 0 or 1, expressing the correspondence between the tested units and the test items. During the practical-application optimization stage, the correlation matrix is updated as follows:
Fig. 3 is a flowchart of optimizing the correlation matrix by maintenance data according to the invention. As shown in Fig. 3, after a fault has been found and the repair completed, the maintenance record is entered into the corresponding correlation matrix. If the fault shows up as failures of test items T1 and T3, and the repair finally finds that the test failures were caused by fault mode F2, then 1 is added to the cell corresponding to T1 and F2 and to the cell corresponding to T3 and F2.
Optimizing the correlation matrix by maintenance data is further described below with concrete repair examples:
Fault location for a repair is carried out on the basis of the correlation matrix data of Table 1. Suppose the test results of the repair are as shown in Table 3:
| | T1(OK) | T2(OK) | T3(Fail) | T4(OK) | T5(OK) | T6(OK) |
| CPU | 0 | 0 | 0 | 1 | 1 | 1 |
| RAM1 | 0 | 0 | 0 | 0 | 1 | 0 |
| RAM2 | 0 | 0 | 0 | 0 | 0 | 1 |
| FUN1 | 1 | 1 | 1 | 0 | 0 | 0 |
| FUN2 | 0 | 1 | 1 | 0 | 0 | 0 |
| FUN3 | 0 | 0 | 1→2 | 0 | 0 | 0 |
Table 3
As Table 3 shows, tests T1, T2, T4, T5 and T6 pass and T3 fails, so the set of possibly faulty devices given by the correlation matrix is (FUN1, FUN2, FUN3). Because T1 passes, FUN1 is normal; because T2 passes, FUN1 and FUN2 are normal; the faulty device is therefore FUN3.
Replace FUN3 and run the T3 test again. If it passes, the location was correct, and the correlation between T3 and FUN3 is then strengthened, for example by adding 1 to the cell corresponding to FUN3 and T3.
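The elimination reasoning applied to Table 3 can be sketched as a small function. It assumes a single fault and a purely binary matrix: a device remains a candidate only if it is correlated with every failed test item and with no passed one. The function name `locate` is illustrative.

```python
DEVICES = ["CPU", "RAM1", "RAM2", "FUN1", "FUN2", "FUN3"]
TESTS = ["T1", "T2", "T3", "T4", "T5", "T6"]
TABLE1 = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def locate(matrix, devices, tests, failed):
    """Return the devices consistent with the observed pass/fail pattern."""
    candidates = []
    for dev, row in zip(devices, matrix):
        corr = dict(zip(tests, row))
        explains_failures = all(corr[t] for t in failed)
        # A passed test item that covers the device clears it of suspicion.
        cleared = any(corr[t] for t in tests if t not in failed)
        if explains_failures and not cleared:
            candidates.append(dev)
    return candidates


print(locate(TABLE1, DEVICES, TESTS, {"T3"}))   # the Table 3 situation
```

The same function applied to a T5-only failure points at RAM1, which is the purely matrix-based answer before any maintenance experience is recorded.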
Suppose instead that the test results of the repair are as shown in Table 4:
| | T1(OK) | T2(OK) | T3(OK) | T4(OK) | T5(Fail) | T6(OK) |
| CPU | 0 | 0 | 0 | 1 | 1 | 1 |
| RAM1 | 0 | 0 | 0 | 0 | 1 | 0 |
| RAM2 | 0 | 0 | 0 | 0 | 0→1 | 1 |
| FUN1 | 1 | 1 | 1 | 0 | 0 | 0 |
| FUN2 | 0 | 1 | 1 | 0 | 0 | 0 |
| FUN3 | 0 | 0 | 1 | 0 | 0 | 0 |
Table 4
As Table 4 shows, tests T1, T2, T3, T4 and T6 pass and T5 fails, so according to the correlation matrix the faulty device should be RAM1. The repair, however, finds that the device actually damaged is RAM2, and the cause has to be analyzed. RAM1 and RAM2 both hang on the CPU bus; when damage to RAM2 disturbs the shared bus, the CPU's reads and writes to RAM1 fail as well. Experience of this kind needs to be stored in the correlation matrix, that is, the correlation between T5 and RAM2 is increased, for example by adding 1 to the corresponding cell.
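Both repair examples reduce to the same update: add 1 to the cell of the repaired device under each failed test item. A minimal sketch, with `record_repair` as a hypothetical helper name:

```python
DEVICES = ["CPU", "RAM1", "RAM2", "FUN1", "FUN2", "FUN3"]
TESTS = ["T1", "T2", "T3", "T4", "T5", "T6"]

# Initial design matrix (Table 1); it is updated in place as repairs are recorded.
matrix = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def record_repair(matrix, devices, tests, failed_tests, repaired_device):
    """Add 1 to the cell (repaired device, failed test item) for each failed test item."""
    row = matrix[devices.index(repaired_device)]
    for t in failed_tests:
        row[tests.index(t)] += 1


# Table 3: T3 failed and replacing FUN3 fixed the board (cell goes 1 -> 2).
record_repair(matrix, DEVICES, TESTS, ["T3"], "FUN3")
# Table 4: T5 failed but the damaged device turned out to be RAM2 (cell goes 0 -> 1).
record_repair(matrix, DEVICES, TESTS, ["T5"], "RAM2")

print(matrix[DEVICES.index("FUN3")], matrix[DEVICES.index("RAM2")])
```

After these updates the matrix is no longer binary; its cells act as counts of confirmed repairs.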
Step 4: locate the faulty device using the optimized correlation matrix:
As the accumulated data grow, when a board fails, Bayes' theorem can be used to obtain the probability of each fault mode, and the faulty device is located according to these probabilities.
Locating the faulty device with Bayes' theorem is further described below with a specific embodiment:
After data have been accumulated for some time, data such as those in Table 5 may be obtained:
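| Fault mode | Device | Dj | T1 | T2 | T3 | T4 | T5 | T6 |
| A0 | CPU | $D_0$ | $d_{1,0} = 0$ | $d_{2,0} = 0$ | $d_{3,0} = 0$ | $d_{4,0} = 3$ | $d_{5,0} = 4$ | $d_{6,0} = 1$ |
| A1 | RAM1 | $D_1$ | $d_{1,1} = 0$ | $d_{2,1} = 0$ | $d_{3,1} = 0$ | $d_{4,1} = 0$ | $d_{5,1} = 5$ | $d_{6,1} = 2$ |
| A2 | RAM2 | $D_2$ | $d_{1,2} = 0$ | $d_{2,2} = 0$ | $d_{3,2} = 0$ | $d_{4,2} = 0$ | $d_{5,2} = 9$ | $d_{6,2} = 1$ |
| A3 | FUN1 | $D_3$ | $d_{1,3} = 3$ | $d_{2,3} = 1$ | $d_{3,3} = 2$ | $d_{4,3} = 0$ | $d_{5,3} = 0$ | $d_{6,3} = 0$ |
| A4 | FUN2 | $D_4$ | $d_{1,4} = 0$ | $d_{2,4} = 6$ | $d_{3,4} = 1$ | $d_{4,4} = 0$ | $d_{5,4} = 0$ | $d_{6,4} = 0$ |
| A5 | FUN3 | $D_5$ | $d_{1,5} = 0$ | $d_{2,5} = 0$ | $d_{3,5} = 1$ | $d_{4,5} = 0$ | $d_{5,5} = 0$ | $d_{6,5} = 0$ |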
Table 5
Here, Dj is the probability that a given fault mode fails across all boards. For example, a CPU may be used on many other boards; if 10000 of these CPUs were used in one year and 35 damaged CPUs were found across all boards, this probability is 35/10000;
$d_{i,j}$ represents the correlation between test item Ti and device Aj.
When a board fails, the faulty device is located by the following steps. Using the data of Table 5, suppose test items T4 and T5 fail:
4.1 Find all failed test items and, for each fault mode, sum its correlation coefficients over all failed test items to obtain the accumulated value of the failed items for that fault mode. The accumulated values of the devices are shown in Table 6:
| Device | Accumulated correlation coefficients | Accumulated value |
| CPU | $\sum fail_0 = d_{4,0} + d_{5,0}$ | 7 |
| RAM1 | $\sum fail_1 = d_{4,1} + d_{5,1}$ | 5 |
| RAM2 | $\sum fail_2 = d_{4,2} + d_{5,2}$ | 9 |
| FUN1 | $\sum fail_3 = d_{4,3} + d_{5,3}$ | 0 |
| FUN2 | $\sum fail_4 = d_{4,4} + d_{5,4}$ | 0 |
| FUN3 | $\sum fail_5 = d_{4,5} + d_{5,5}$ | 0 |
Table 6
4.2 Find all passed test items and, for each fault mode, sum its correlation coefficients over all passed test items to obtain the accumulated value of the passed items for that fault mode. The accumulated values of the devices are shown in Table 7:
| Device | Accumulated correlation coefficients | Accumulated value |
| CPU | $\sum ok_0 = d_{1,0} + d_{2,0} + d_{3,0} + d_{6,0}$ | 1 |
| RAM1 | $\sum ok_1 = d_{1,1} + d_{2,1} + d_{3,1} + d_{6,1}$ | 2 |
| RAM2 | $\sum ok_2 = d_{1,2} + d_{2,2} + d_{3,2} + d_{6,2}$ | 1 |
| FUN1 | $\sum ok_3 = d_{1,3} + d_{2,3} + d_{3,3} + d_{6,3}$ | 6 |
| FUN2 | $\sum ok_4 = d_{1,4} + d_{2,4} + d_{3,4} + d_{6,4}$ | 7 |
| FUN3 | $\sum ok_5 = d_{1,5} + d_{2,5} + d_{3,5} + d_{6,5}$ | 1 |
Table 7
4.3 For each fault mode on the board, subtract the accumulated value of the passed items from the accumulated value of the failed items; if the result is positive, record it directly; if it is negative, record 0 instead, that is, $\sum_j = \max(\sum fail_j - \sum ok_j,\ 0)$. This $\sum_j$ represents the historical number of times the fault mode has occurred on the tested board. The resulting data are shown in Table 8:
| Device | Dj | ∑j |
| CPU | $D_0$ | $\sum_0 = \max(7 - 1,\ 0) = 6$ |
| RAM1 | $D_1$ | $\sum_1 = \max(5 - 2,\ 0) = 3$ |
| RAM2 | $D_2$ | $\sum_2 = \max(9 - 1,\ 0) = 8$ |
| FUN1 | $D_3$ | $\sum_3 = \max(0 - 6,\ 0) = 0$ |
| FUN2 | $D_4$ | $\sum_4 = \max(0 - 7,\ 0) = 0$ |
| FUN3 | $D_5$ | $\sum_5 = \max(0 - 1,\ 0) = 0$ |
Table 8
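Steps 4.1 to 4.3 can be reproduced from the Table 5 counts with a few lines of code. This is a sketch only; the names `COUNTS` and `evidence_scores` are illustrative.

```python
TESTS = ["T1", "T2", "T3", "T4", "T5", "T6"]

# Accumulated correlation counts from Table 5, one row per device.
COUNTS = {
    "CPU":  [0, 0, 0, 3, 4, 1],
    "RAM1": [0, 0, 0, 0, 5, 2],
    "RAM2": [0, 0, 0, 0, 9, 1],
    "FUN1": [3, 1, 2, 0, 0, 0],
    "FUN2": [0, 6, 1, 0, 0, 0],
    "FUN3": [0, 0, 1, 0, 0, 0],
}


def evidence_scores(counts, tests, failed):
    """For each device: max(sum over failed items - sum over passed items, 0)."""
    scores = {}
    for dev, row in counts.items():
        fail_sum = sum(v for t, v in zip(tests, row) if t in failed)      # step 4.1
        ok_sum = sum(v for t, v in zip(tests, row) if t not in failed)    # step 4.2
        scores[dev] = max(fail_sum - ok_sum, 0)                           # step 4.3
    return scores


scores = evidence_scores(COUNTS, TESTS, {"T4", "T5"})
print(scores)
```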
4.4 Calculate, under the assumption that exactly one fault mode has failed, the probability that each fault mode is the one that failed:
In practice a device has several fault modes. Under the assumption that exactly one fault mode has failed, the quantities used in the calculation are defined as follows:
P(B) is the probability that a fault mode drawn for testing is found to have failed;
P(Aj) is the probability that fault mode Aj occurs on the board; all fault modes are taken to be equally likely, i.e. P(Aj) = 1, a common constant that cancels in the ratio below;
P(B|Aj) is the probability that fault mode Aj, when drawn for testing, is found to have failed; it equals Dj × ∑j;
Then, when one fault mode is detected to have failed, the probability that this fault mode is Aj is
$$P(A_j \mid B) = \frac{P(A_j)\,P(B \mid A_j)}{\sum_k P(A_k)\,P(B \mid A_k)}$$
For the data shown in Table 8,
$$\sum_k P(A_k)\,P(B \mid A_k) = D_0 \times 6 + D_1 \times 3 + D_2 \times 8 + D_3 \times 0 + D_4 \times 0 + D_5 \times 0 = D_0 \times 6 + D_1 \times 3 + D_2 \times 8$$
The probabilities shown in Table 9 are then obtained:
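| Device | $P(B \mid A_j)$ | $P(A_j \mid B)$ |
| CPU | $P(B \mid A_0)$ | $(D_0 \times 6)/(D_0 \times 6 + D_1 \times 3 + D_2 \times 8)$ |
| RAM1 | $P(B \mid A_1)$ | $(D_1 \times 3)/(D_0 \times 6 + D_1 \times 3 + D_2 \times 8)$ |
| RAM2 | $P(B \mid A_2)$ | $(D_2 \times 8)/(D_0 \times 6 + D_1 \times 3 + D_2 \times 8)$ |
| FUN1 | $P(B \mid A_3)$ | 0 |
| FUN2 | $P(B \mid A_4)$ | 0 |
| FUN3 | $P(B \mid A_5)$ | 0 |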
Table 9
4.5 Decide which device to replace first according to the magnitude of the failure probability of each fault mode, that is, according to the magnitude of P(Aj|B).
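Steps 4.4 and 4.5 can be sketched numerically once values of Dj are available. In the sketch below only the CPU value 35/10000 comes from the example in the text; the other Dj values are assumed for illustration, and `posteriors` is a hypothetical function name. Since P(Aj) is the same for every fault mode, it is omitted from the ratio.

```python
def posteriors(d, sigma):
    """P(Aj|B) = Dj * sigma_j / sum_k(Dk * sigma_k), with equal P(Aj) cancelled out."""
    weights = {dev: d[dev] * sigma[dev] for dev in sigma}
    total = sum(weights.values())
    if total == 0:
        # No fault mode is supported by the evidence; report zeros rather than divide by 0.
        return {dev: 0.0 for dev in sigma}
    return {dev: w / total for dev, w in weights.items()}


# sigma_j from Table 8.
SIGMA = {"CPU": 6, "RAM1": 3, "RAM2": 8, "FUN1": 0, "FUN2": 0, "FUN3": 0}

# Dj per device; only the CPU value is taken from the text, the rest are assumed.
D = {"CPU": 35 / 10000, "RAM1": 0.002, "RAM2": 0.002,
     "FUN1": 0.001, "FUN2": 0.001, "FUN3": 0.001}

post = posteriors(D, SIGMA)
ranking = sorted(post, key=post.get, reverse=True)   # replacement priority, step 4.5
for dev in ranking:
    print(f"{dev:5s} {post[dev]:.3f}")
```

With these assumed Dj values the CPU ranks ahead of RAM2 even though RAM2 has the larger ∑j, which shows how the board-independent failure rate shifts the replacement order.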
With the above method, when a board fails, Bayes' theorem can be applied to the established and optimized correlation matrix to calculate the failure probability of each device on the board, and the device to be replaced first is chosen according to these probabilities, thereby locating the fault.
By implementing the invention, a correlation matrix between fault modes and test items is established for the board and optimized with maintenance data from practical use; as maintenance data accumulate, the method continuously refines its locating capability.
By implementing the invention, the board's ability to locate faults at device level is considered already at the design stage, and the method turns maintenance experience into data, so that the experience can be passed on.
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention.