
CN111857097A - Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency - Google Patents

Info

Publication number
CN111857097A
Authority
CN
China
Prior art keywords
frequency
inverse document
word frequency
word
industrial control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010733364.6A
Other languages
Chinese (zh)
Other versions
CN111857097B (en
Inventor
李少森
梁钰华
孙豪
黄剑湘
杨光
李�浩
张启浩
任君
杨铖
丁丙侯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Bureau of Extra High Voltage Power Transmission Co
Original Assignee
Kunming Bureau of Extra High Voltage Power Transmission Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming Bureau of Extra High Voltage Power Transmission Co filed Critical Kunming Bureau of Extra High Voltage Power Transmission Co
Priority to CN202010733364.6A priority Critical patent/CN111857097B/en
Publication of CN111857097A publication Critical patent/CN111857097A/en
Application granted granted Critical
Publication of CN111857097B publication Critical patent/CN111857097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00: Testing or monitoring of control systems or parts thereof
    • G05B23/02: Electric testing or monitoring
    • G05B23/0205: Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218: Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243: Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00: Program-control systems
    • G05B2219/20: Pc systems
    • G05B2219/24: Pc safety
    • G05B2219/24065: Real time diagnostics
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

The invention discloses a method for identifying abnormal diagnostic information in an industrial control system based on word frequency (term frequency, TF) and inverse document frequency (IDF). The method comprises the following steps: establishing a response corpus for a diagnostic command; sending the diagnostic command to the system under test again to obtain the (N+1)th echo message; filtering stop words from all echo messages and segmenting them into words; calculating the inverse document frequency IDF of each word in each text list of the echo messages using the TF-IDF algorithm; setting a minimum inverse document frequency threshold IDFmin and deleting words whose IDF is not greater than IDFmin; building a phrase table V from the filtered text lists of the N+1 echo messages and calculating word frequency values; and setting a word frequency threshold and comparing the calculated word frequency values against it to judge whether the message is abnormal. The algorithm learns the health of each diagnostic command's echo information by itself, which can greatly reduce the manual development cost of an industrial control monitoring system and improve the timeliness of event judgment.

Description

Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency
Technical Field
The invention relates to the technical field of industrial control system abnormity diagnosis, in particular to an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency.
Background
At present, some industrial control systems are operated and maintained through remote management. They provide no local operation interface, such as a screen or keys, for on-site operation and maintenance personnel; instead, a debugging computer must be connected, and personnel interact with the device through debugging software or a browser to check and analyze system problems. When a channel or device abnormality occurs, on-site personnel can act only after other service systems raise channel-interruption alarms or after operation and maintenance personnel at a remote monitoring center (such as a dispatching master station at any level) report the problem; only then do they connect a debugging computer to the industrial control system to check, analyze, and handle the cause. If remote monitoring does not notice the abnormality, it is discovered only during routine on-site maintenance and configuration backup, so fault handling is generally delayed. Because abnormalities in industrial control systems occur randomly, manual periodic inspection rarely captures detailed information from the moment of the abnormality, and the quality of abnormality analysis degrades as time passes.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, and solves the problem of low quality of abnormity analysis of the industrial control system in the prior art.
The invention discloses an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, which comprises the following steps:
Step 1: establish a response corpus for the diagnostic command: send the diagnostic command to the system under test N times, and arrange the N echo messages obtained in time order as the response corpus of the diagnostic command;
Step 2: send the diagnostic command to the system under test again to obtain the (N+1)th echo message, and append it to the end of the response corpus established in step 1;
Step 3: filter stop words from the N+1 echo messages and segment them into words;
Step 4: calculate the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm;
Step 5: set a minimum inverse document frequency threshold IDFmin, and delete every word in the text lists whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorize the text lists of the N+1 echo messages filtered in step 5: extract all phrases in the N+1 text lists and remove duplicates to obtain a phrase table V of length M, where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists; reorder the words of each filtered text list according to the order of the vocabulary in V; then convert the phrases into vectors, where each vector component is the number of times the word appears in the echo message containing it; and calculate the word frequency value $tf_{(N+1)j}$;
Step 7: set a word frequency threshold $tf_{max}$ and compare the word frequency value $tf_{(N+1)j}$ calculated in step 6 with it; if $tf_{(N+1)j} \ge tf_{max}$, the message is identified as an abnormal message and alarm information is output.
According to an embodiment of the present invention, the diagnostic command in step 1 is sent at a time interval T, whose range is determined by the time range over which the command's returned result may change: T ranges from 1 to 30 days when system resources do not change abruptly, and from 1 s to 24 h when the network channel may be interrupted at any time.
According to an embodiment of the invention, the stop word in step 3 comprises a date and a time.
According to one embodiment of the invention, the date format is yyyy-mm-dd and the time formats are hh:mm:ss and h:mm.
According to an embodiment of the present invention, the word segmentation in step 3 specifically includes: using whitespace as the separator, splitting the N+1 command echoes into phrases to form N+1 one-dimensional text lists.
According to an embodiment of the present invention, the IDF in step 4 is calculated by the following formula:
[Formula image BDA0002604047460000031: definition of IDF]
According to one embodiment of the present invention, IDFmin ≥ 1 in step 5.
According to an embodiment of the present invention, the word frequency value $tf_{(N+1)j}$ in step 6 is calculated as follows: extract all phrases in the N+1 text lists and remove duplicates to obtain a phrase table V of length M, where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists; reorder the words of each filtered text list according to the order of the vocabulary in V; then convert the phrases into vectors, where each vector component is the number of times the word appears in the echo message, obtaining an (N+1) × M matrix A. Let $a_{ij}$ be the element in row i and column j of A; the word frequency $tf_{(N+1)j}$ of each element $a_{(N+1)j}$ of the (N+1)th text list is then defined as:
[Formula image BDA0002604047460000032: definition of $tf_{(N+1)j}$]
According to one embodiment of the present invention, $tf_{max}$ in step 7 ranges from 0.2 to 0.5.
The beneficial effects that the invention can realize are as follows:
1. The method applies the word frequency and inverse document frequency algorithm to the diagnostic information of an industrial control system in order to mine key information automatically from the echo of each diagnostic command, such as abnormal value changes or suddenly appearing alarm content, without manually defining the key content and abnormality criteria for each command's echo. Word frequency calculation then measures how often variable information appears in the samples, and an alarm is raised for values that appear rarely (such as a sudden abnormally high CPU load or abnormal alarm information) to prompt operation and maintenance personnel to respond in time.
2. The algorithm defines the health of each diagnostic command's echo information by self-learning, which can greatly reduce the manual development cost of automated monitoring for industrial control systems. Because the analysis does not depend on the characteristics of the monitored system, it can be readily ported to operating-state monitoring of different business systems; it is highly adaptable, frees up manpower, improves the timeliness of event judgment, and improves operation and maintenance efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is an algorithm flow chart of an industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency according to the present invention;
FIG. 2 is a schematic diagram of N echoed messages in an embodiment of an identification method for abnormality diagnosis information of an industrial control system based on word frequency and inverse document frequency according to the present invention;
FIG. 3 is a schematic diagram of N+1 echo messages in an embodiment of the method for identifying abnormality diagnosis information of an industrial control system based on word frequency and inverse document frequency according to the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous implementation details are set forth in order to provide a thorough understanding of the various embodiments of the present invention. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, such implementation details are not necessary. In addition, some conventional structures and components are shown in simplified schematic form in the drawings.
In addition, the descriptions related to the first, the second, etc. in the present invention are only used for description purposes, do not particularly refer to an order or sequence, and do not limit the present invention, but only distinguish components or operations described in the same technical terms, and are not understood to indicate or imply relative importance or implicitly indicate the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The invention discloses an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, an algorithm flow is shown in figure 1, and the method comprises the following steps:
Step 1: establish a response corpus for the diagnostic command: send the diagnostic command to the system under test N times at a time interval T, and arrange the N echo messages obtained in time order into a corpus, as shown in FIG. 2, to serve as the response corpus of the diagnostic command;
Step 2: send the diagnostic command to the system under test again to obtain the (N+1)th echo message, and append it to the end of the response corpus established in step 1; the resulting arrangement is shown in FIG. 3;
Step 3: filter stop words from the N+1 echo messages and segment them into words, where the stop words include dates in the format yyyy-mm-dd and times in the formats hh:mm:ss and h:mm; for word segmentation, whitespace is used as the separator and the N+1 command echoes are split into phrases to form N+1 one-dimensional text lists;
Step 4: calculate the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm, where the IDF is calculated by the following formula:
[Formula image BDA0002604047460000051: definition of IDF]
Step 5: set a minimum inverse document frequency threshold IDFmin, and delete every word in the text lists whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorize the text lists of the N+1 echo messages filtered in step 5: extract all phrases in the N+1 text lists and remove duplicates to obtain a phrase table V of length M, where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists; reorder the words of each filtered text list according to the order of the vocabulary in V; then convert the phrases into vectors, where each vector component is the number of times the word appears in the echo message, obtaining an (N+1) × M matrix A. Let $a_{ij}$ be the element in row i and column j of A; the word frequency $tf_{(N+1)j}$ of each element $a_{(N+1)j}$ of the (N+1)th text list is then defined as:
[Formula image BDA0002604047460000061: definition of $tf_{(N+1)j}$]
Step 7: set a word frequency threshold $tf_{max}$ and compare the word frequency value $tf_{(N+1)j}$ calculated in step 6 with it; if $tf_{(N+1)j} \ge tf_{max}$, the message is identified as an abnormal message and alarm information is output.
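Steps 1 and 2 amount to polling the system under test and keeping the replies in time order. The following is a minimal sketch of that collection loop, not the patent's implementation: `send_command`, `collect_echoes`, and `append_latest` are hypothetical names, and the transport that actually reaches the device (debugging software, SSH, serial) is left to the caller.

```python
import time
from typing import Callable, List


def collect_echoes(
    send_command: Callable[[], str],
    n: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Step 1: send the diagnostic command n times, interval_s seconds apart.

    send_command is a caller-supplied (hypothetical) function that sends the
    command to the system under test and returns the echo message as text.
    The returned list is in time order and serves as the response corpus.
    """
    corpus: List[str] = []
    for i in range(n):
        corpus.append(send_command())
        if i < n - 1:  # no need to wait after the last sample
            sleep(interval_s)
    return corpus


def append_latest(corpus: List[str], send_command: Callable[[], str]) -> List[str]:
    """Step 2: query once more and append the (N+1)th echo to the corpus end."""
    return corpus + [send_command()]
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; the interval T itself would be chosen according to the ranges given in the description.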
Example one
The longitudinal encryption device of the ±800 kV Pu'er converter station is taken as an example:
Suppose that, at a certain time, the system under test returns the following echo message after a top diagnostic command is sent to the longitudinal encryption device:
top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06
Tasks: 0 total, 0 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.0 us, 0.0 sy, 0.0 ni, 80.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem: 987.4 total, 91.3 free, 642.2 used, 253.8 buff/cache
MiB Swap: 1022.0 total, 776.4 free, 245.6 used. 185.3 avail Mem
Step 1: send the diagnostic command to the system under test N times at a time interval T = 5 seconds, obtain N echo messages, and arrange them in time order as the corpus of the command. The echo messages contain both meaningless and meaningful information: meaningless information includes dates, times, and annotations (labels); meaningful information reflects the state of the system under test, such as CPU usage, memory usage, and alarm prompts.
Step 2: after the corpus of the diagnostic command is obtained, send the diagnostic command to the system under test again, obtain the (N+1)th echo message, and add it to the corpus.
Step 3: filter stop words from the N+1 texts: stop words include dates in the format yyyy-mm-dd and times in the formats hh:mm:ss and h:mm, for example 18:29:33 and 2:26, which are filtered out. Then segment the N+1 texts into words: using whitespace as the separator, split the N+1 command echoes into phrases to form N+1 one-dimensional text lists:
[top, up, 1, user, load, average, 0.00, 0.03, 0.06, Tasks, ……].
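Step 3 can be sketched as two regular-expression substitutions followed by a split. This is an illustrative sketch under stated assumptions: the function name `tokenize` is hypothetical, the date pattern assumes a four-digit year, and commas, colons, and the lone "-" after "top" are treated as separators in addition to whitespace so that the result matches the token list shown in the example. The patent text itself only names whitespace as the separator.

```python
import re
from typing import List

# Stop-word patterns (assumed shapes): dates like 2020-07-27,
# times like 18:29:33 (hh:mm:ss) or 2:26 (h:mm).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def tokenize(echo: str) -> List[str]:
    """Remove date/time stop words, then split one echo message into phrases."""
    text = DATE_RE.sub(" ", echo)
    text = TIME_RE.sub(" ", text)
    # Assumption: commas and colons count as separators along with whitespace.
    tokens = re.split(r"[\s,:]+", text)
    # Drop empty strings and the bare dash that follows "top".
    return [t for t in tokens if t and t != "-"]


first_line = "top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06"
print(tokenize(first_line))
# ['top', 'up', '1', 'user', 'load', 'average', '0.00', '0.03', '0.06']
```

Applying `tokenize` to each of the N+1 echo messages gives the N+1 one-dimensional text lists used by the later steps.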
Step 4: apply the word frequency and inverse document frequency algorithm to calculate the IDF of each word in each text list of the N+1 echo messages:
[Formula image BDA0002604047460000071: definition of IDF]
For example, the word "top" appears in all N+1 texts, so its IDF is:
[Formula image BDA0002604047460000072: IDF of the word "top"]
Step 5: set the minimum inverse document frequency threshold IDFmin = 1.0 and delete every word in the text lists whose IDF is less than or equal to the threshold. This filters meaningless information out of the command echoes: words such as "Tasks", "top", "user", "load", and "average" are labels that carry no state information and appear in all N+1 texts, so their IDF is below 1.0 and they are filtered out.
Step 6: vectorize the N+1 filtered text lists: extract all phrases in the N+1 text lists and remove duplicates to obtain a phrase table V of length M: ["0.00", "0.03", "0.06", ……], where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists. Reorder the words of each filtered text list according to the order of the vocabulary in V, then convert the phrases into vectors. For example, if a text list contains "0.00" once, "0.03" zero times, and "0.06" three times, it is vectorized as [1, 0, 3, ……], where the position of each component matches the position of the phrase it represents in V.
After this processing an (N+1) × M matrix A is obtained. Let $a_{ij}$ be the element in row i and column j of A; the word frequency $tf_{(N+1)j}$ of each element $a_{(N+1)j}$ of the (N+1)th text list is then defined as:
[Formula image BDA0002604047460000073: definition of $tf_{(N+1)j}$]
The vectorized matrix for the N+1 text lists is shown in Table 1 below:
Matrix A     Column 1   Column 2   ……   Column M
Row 1        1          0          0    1
Row 2        0          0          0    2
……           2          0          0    1
Row N+1      1          0          1    1
Therefore:
Sum of each column        4     0   1     5
tf of row N+1 elements    0.2   0   0.5   0.166667
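The numbers in Table 1 can be checked with a few lines of code. The formula image is not reproduced in this text, so the sketch below relies on an inference: the listed tf values (0.2, 0, 0.5, 0.166667) match the row N+1 count divided by the column sum plus one, for example 1/(4+1) = 0.2 and 1/(5+1) ≈ 0.166667. The `smoothing` parameter and the function name `latest_row_tf` are hypothetical, and the "……" row is treated as a single row.

```python
from typing import List


def latest_row_tf(matrix: List[List[int]], smoothing: int = 1) -> List[float]:
    """Word frequency of each element in the last (N+1)th row.

    tf_j = a[N+1][j] / (sum_i a[i][j] + smoothing)
    smoothing=1 is inferred from the values in Table 1, not taken from the
    (unreproduced) formula image; it also avoids dividing by zero for a
    column whose sum is 0.
    """
    col_sums = [sum(col) for col in zip(*matrix)]
    latest = matrix[-1]
    return [a / (s + smoothing) for a, s in zip(latest, col_sums)]


# Rows of Table 1 ("……" row included as one row).
table_1 = [
    [1, 0, 0, 1],
    [0, 0, 0, 2],
    [2, 0, 0, 1],
    [1, 0, 1, 1],  # row N+1
]
tf = latest_row_tf(table_1)
print([round(x, 6) for x in tf])  # [0.2, 0.0, 0.5, 0.166667]
```

A word that is new in the latest message (third column) gets the highest value, which is why a large tf, rather than a small one, signals an abnormality here.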
Step 7: set the word frequency threshold $tf_{max}$ = 0.5. When the word frequency of any vector element in the (N+1)th text list satisfies $tf_{(N+1)j} \ge tf_{max}$, an abnormal message is considered to have appeared, and the algorithm outputs alarm information to remind operation and maintenance personnel to pay attention.
Example two
The longitudinal encryption device of the ±800 kV Kunzei converter station is taken as an example:
Step 1: send a top diagnostic command to the longitudinal encryption authentication device at a period of T = 10 seconds to obtain 4 echo messages, as shown in Table 2:
[Table 2: images BDA0002604047460000081 and BDA0002604047460000091]
Step 2: send the diagnostic command to the longitudinal encryption authentication device again and obtain the 5th echo message, as shown in Table 3:
[Table 3: images BDA0002604047460000092 and BDA0002604047460000101]
Step 3: filter stop words from the 5 echo messages in the corpus using a uniform format; the processed corpus is shown in Table 4, with the time-related useless information deleted:
[Table 4: images BDA0002604047460000102 and BDA0002604047460000111]
Segment all echo messages in the corpus into words using a uniform format: with whitespace as the separator, convert the N+1 command echoes into N+1 one-dimensional text lists. The processed corpus is shown in Table 5:
[Table 5: images BDA0002604047460000112 and BDA0002604047460000121]
Step 4: apply the word frequency and inverse document frequency algorithm to calculate the IDF of each word in each text list of the N+1 echo messages:
[Formula image BDA0002604047460000131: definition of IDF]
The IDF is calculated on the corpus after stop-word filtering and word segmentation. Taking the 1st, 2nd, and 7th words of the 1st echo message as an example, the results are shown in Table 6:
[Table 6: image BDA0002604047460000132]
Step 5: set the minimum inverse document frequency threshold IDFmin = 0.1. If a word's IDF is below 0.1, it is regarded as over-frequent information. From Table 6, "top" and "up" are unimportant echo information and are filtered out, while "0.00" is important echo information and is retained. After the IDF calculation for all echo information is complete, the updated corpus is shown in Table 7:
1   0.00   0.03   0.06
2   0.01   0.05   0.07
3   0.00   0.07   0.06
4   0.01   0.03   0.06
5   0.02   0.04   0.18
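The filtering in steps 4 and 5 can be illustrated with a short sketch. Because the patent's IDF formula is an image that is not reproduced in this text, the code assumes the textbook form IDF(w) = ln(D / df(w)), where D is the number of echo messages and df(w) is the number of messages containing w; the patent's exact formula may differ. The corpus is also simplified: each message is reduced to the labels "top" and "up" plus the three values of Table 7. The function names are hypothetical.

```python
import math
from typing import Dict, List


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """Assumed textbook IDF: ln(number of docs / number of docs containing w)."""
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for tokens in docs:
        for word in set(tokens):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log(n_docs / c) for w, c in doc_freq.items()}


def drop_common_words(docs: List[List[str]], idf_min: float) -> List[List[str]]:
    """Step 5: delete every word whose IDF is less than or equal to idf_min."""
    idf = inverse_document_frequency(docs)
    return [[w for w in tokens if idf[w] > idf_min] for tokens in docs]


# Simplified corpus: two labels present in every message plus Table 7 values.
table_7 = [
    ["0.00", "0.03", "0.06"],
    ["0.01", "0.05", "0.07"],
    ["0.00", "0.07", "0.06"],
    ["0.01", "0.03", "0.06"],
    ["0.02", "0.04", "0.18"],
]
corpus = [["top", "up"] + row for row in table_7]
filtered = drop_common_words(corpus, idf_min=0.1)
print(filtered == table_7)  # True: only the labels are removed
```

Under this assumed formula, "top" and "up" appear in all five messages and get IDF = ln(5/5) = 0, so they fall below the 0.1 threshold, while every load value appears in at most three messages and is kept.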
Deduplicate the corpus to generate a list of important echo information. The result is shown in Table 8, which lists the set of all distinct important items in the corpus:
[Table 8: image BDA0002604047460000133]
Step 6: vectorize the text lists of the N+1 echo messages filtered in step 5: extract all phrases in the N+1 text lists and remove duplicates to obtain a phrase table V of length M, where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists; reorder the words of each filtered text list according to the order of the vocabulary in V; then convert the phrases into vectors, where each vector component is the number of times the word appears in the echo message containing it. The conversion result is shown in Table 9:
1 {1,1,1,0,0,0,0,0,0}
2 {0,0,0,1,1,1,0,0,0}
3 {1,0,1,0,0,1,0,0,0}
4 {0,1,1,1,0,0,0,0,0}
5 {0,0,0,0,0,0,1,1,1}
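The vectorization that produces Table 9 from Table 7 can be sketched as follows. Table 8 is an image in the source, so the ordering of the phrase table V is an assumption: phrases are listed in order of first appearance, which reproduces the vectors of Table 9. `build_vocabulary` and `count_vectors` are hypothetical names.

```python
from typing import List


def build_vocabulary(docs: List[List[str]]) -> List[str]:
    """Phrase table V: distinct phrases in order of first appearance (assumed)."""
    vocab: List[str] = []
    seen = set()
    for tokens in docs:
        for word in tokens:
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def count_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[int]]:
    """Row i, column j = number of times vocab[j] occurs in message i."""
    return [[tokens.count(word) for word in vocab] for tokens in docs]


table_7 = [
    ["0.00", "0.03", "0.06"],
    ["0.01", "0.05", "0.07"],
    ["0.00", "0.07", "0.06"],
    ["0.01", "0.03", "0.06"],
    ["0.02", "0.04", "0.18"],
]
V = build_vocabulary(table_7)
A = count_vectors(table_7, V)
for number, row in enumerate(A, start=1):
    print(number, row)
```

Here M = len(V) = 9 and A is the 5 × 9 matrix of Table 9; the last row is the vector of the newest (5th) echo message.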
Using the formula
[Formula image BDA0002604047460000141: definition of $tf_{(N+1)j}$]
TF word frequency values are calculated for the 5 echo messages; the results are shown in Table 10:
[Table 10: image BDA0002604047460000142]
Step 7: set a word frequency threshold $tf_{max}$. If the TF value of any item of echo information in a message is greater than or equal to this threshold, the echo message is judged to contain abnormal echo information; if it is below the threshold, the echo message is judged to be normal. The final result is shown in Table 10:
[Result table: image BDA0002604047460000151]
Therefore, items 7, 8, and 9 of message 5 are abnormal; the message is an abnormal message and an alarm is issued.
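The decision in step 7 can be reproduced on the Table 9 matrix with a short sketch. Two values are assumptions rather than facts from the source: the threshold is set to 0.5 (the upper end of the 0.2 to 0.5 range given in the description, since the value used in this example is not legible in the text), and the word frequency uses a denominator of column sum plus one, which is inferred from Table 1 of Example one. `abnormal_positions` is a hypothetical name.

```python
from typing import List


def abnormal_positions(
    matrix: List[List[int]], tf_max: float, smoothing: int = 1
) -> List[int]:
    """Return 1-based positions in the newest message whose tf >= tf_max.

    tf_j = a[last][j] / (column sum + smoothing); smoothing=1 is an inferred
    choice, not a value stated in the text.
    """
    col_sums = [sum(col) for col in zip(*matrix)]
    latest = matrix[-1]
    flagged = []
    for j, (count, total) in enumerate(zip(latest, col_sums), start=1):
        tf = count / (total + smoothing)
        if count > 0 and tf >= tf_max:
            flagged.append(j)
    return flagged


table_9 = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # message 5, the newest echo
]
flagged = abnormal_positions(table_9, tf_max=0.5)
if flagged:
    print("ALARM: abnormal items at positions", flagged)
# ALARM: abnormal items at positions [7, 8, 9]
```

The three values in message 5 occur nowhere else in the corpus, so each has tf = 1/(1+1) = 0.5, which meets the assumed threshold and matches the conclusion that items 7, 8, and 9 are abnormal.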
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1. An industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency is characterized by comprising the following steps:
Step 1: establishing a response corpus for the diagnostic command: sending the diagnostic command to the system under test N times, and arranging the N echo messages obtained in time order as the response corpus of the diagnostic command;
Step 2: sending the diagnostic command to the system under test again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1;
Step 3: filtering stop words from the N+1 echo messages and segmenting them into words;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm;
Step 5: setting a minimum inverse document frequency threshold IDFmin, and deleting every word in the text lists whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase table V of length M, where M equals the total number of phrases after filtering and deduplication and V contains every phrase appearing in the N+1 filtered text lists; reordering the words of each filtered text list according to the order of the vocabulary in V; then converting the phrases into vectors, where each vector component is the number of times the word appears in the echo message; and calculating the word frequency value $tf_{(N+1)j}$;
Step 7: setting a word frequency threshold $tf_{max}$ and comparing the word frequency value $tf_{(N+1)j}$ calculated in step 6 with it; if $tf_{(N+1)j} \ge tf_{max}$, the message is identified as an abnormal message and alarm information is output.
2. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency according to claim 1, wherein the diagnostic command in step 1 is sent at a time interval T, whose range is determined by the time range over which the returned result of the diagnostic command may change: T ranges from 1 to 30 days when system resources do not change abruptly, and from 1 s to 24 h when the network channel may be interrupted at any time.
3. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the stop word in the step 3 includes a date and a time.
4. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 3, wherein the date format is yyyy-mm-dd, and the time format is hh:mm:ss or hh:mm.
5. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the word processing in step 3 specifically comprises: using whitespace as the separator, dividing the N+1 groups of command echo messages into phrases to form N+1 groups of one-dimensional text lists.
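The whitespace splitting of claim 5, combined with the date and time stop words of claims 3 and 4, might look like the sketch below. The function name `tokenize` and both regular expressions are illustrative assumptions; in particular, the patterns assume that the formats in claim 4 are meant to read yyyy-mm-dd, hh:mm:ss, and hh:mm.

```python
import re
from typing import List

# Assumed stop-word patterns: a yyyy-mm-dd date, and an hh:mm or hh:mm:ss time.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
TIME_RE = re.compile(r"\d{1,2}:\d{2}(:\d{2})?")


def tokenize(echo_message: str) -> List[str]:
    """Split an echo message on whitespace and drop date and time tokens."""
    return [
        token
        for token in echo_message.split()  # split() with no argument uses any whitespace
        if not DATE_RE.fullmatch(token) and not TIME_RE.fullmatch(token)
    ]


if __name__ == "__main__":
    print(tokenize("2020-07-27 10:15:30 link A up"))
```

Dates and times differ between every pair of echo messages, so treating them as stop words keeps them from being flagged as rare words later in the method.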
6. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the calculation formula of the IDF in the step 4 is:
Figure FDA0002604047450000021
7. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein IDFmin is greater than or equal to 1 in step 5.
8. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 1, wherein in step 6 the word frequency value
Figure FDA0002604047450000022
is calculated as follows: extracting all phrases in the N+1 groups of text lists and removing duplicates to obtain a phrase table V of length M, where M is the total number of distinct phrases remaining after filtering and V contains every phrase that appears in the filtered N+1 groups of text lists; reordering the words in each filtered text list according to the order of the vocabulary in V; then converting the phrases into vectors whose entries are the number of times each word appears in the echo message, giving an (N+1) × M matrix A, where aij is the element in row i and column j of A; the word frequency
Figure FDA0002604047450000023
of each element a(N+1)j in the (N+1)-th text list is defined as:
Figure FDA0002604047450000031
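The construction of the phrase table V and the count matrix A described in this claim can be sketched as follows. The function name `build_count_matrix` is hypothetical, and ordering V by first appearance is an assumption, since the claim does not fix an order. The sketch stops at the counts because the word frequency definition itself is given only as a figure.

```python
from typing import List, Tuple


def build_count_matrix(text_lists: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return the phrase table V and the (N+1) x M count matrix A.

    Each row of A corresponds to one filtered text list; entry A[i][j] is the
    number of times phrase V[j] occurs in text list i.
    """
    # De-duplicate while keeping first-seen order (assumed ordering of V).
    vocab = list(dict.fromkeys(word for words in text_lists for word in words))
    column = {word: j for j, word in enumerate(vocab)}

    matrix = []
    for words in text_lists:
        row = [0] * len(vocab)
        for word in words:
            row[column[word]] += 1
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    v, a = build_count_matrix([["up", "up"], ["down"], ["up", "down", "err"]])
    print(v)
    print(a)
```

The last row of A corresponds to the (N+1)-th echo message, which is the one whose word frequencies are compared against tfmax in step 7.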
9. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 1, wherein tfmax in step 7 ranges from 0.2 to 0.5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010733364.6A CN111857097B (en) 2020-07-27 2020-07-27 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010733364.6A CN111857097B (en) 2020-07-27 2020-07-27 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Publications (2)

Publication Number Publication Date
CN111857097A true CN111857097A (en) 2020-10-30
CN111857097B CN111857097B (en) 2023-10-31

Family

ID=72947886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010733364.6A Active CN111857097B (en) 2020-07-27 2020-07-27 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Country Status (1)

Country Link
CN (1) CN111857097B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761133A (en) * 2021-09-10 2021-12-07 未鲲(上海)科技服务有限公司 System abnormity monitoring method and device based on artificial intelligence and related equipment
CN118378194A (en) * 2024-06-21 2024-07-23 国网江苏省电力有限公司电力科学研究院 A method and device for power equipment operation and maintenance based on knowledge graph

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015170145A (en) * 2014-03-07 2015-09-28 Kddi株式会社 Program, apparatus, and server for estimating simple sentence symbolizing target sentence
WO2016093837A1 (en) * 2014-12-11 2016-06-16 Hewlett Packard Enterprise Development Lp Determining term scores based on a modified inverse domain frequency
US20170102984A1 (en) * 2015-10-13 2017-04-13 Huawei Technologies Co., Ltd. Fault Diagnosis Method and Apparatus for Big-Data Network System
CN107992597A (en) * 2017-12-13 2018-05-04 国网山东省电力公司电力科学研究院 A kind of text structure method towards electric network fault case
CN108846142A (en) * 2018-07-12 2018-11-20 南方电网调峰调频发电有限公司 A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing
CN109495479A (en) * 2018-11-20 2019-03-19 华青融天(北京)软件股份有限公司 A kind of user's abnormal behaviour recognition methods and device
KR101964412B1 (en) * 2018-12-12 2019-04-01 주식회사 모비젠 Method for diagnosing anomaly log of mobile commmunication data processing system and system thereof
WO2019134334A1 (en) * 2018-01-04 2019-07-11 平安科技(深圳)有限公司 Network abnormal data detection method and apparatus, computer device and storage medium
CN110008311A (en) * 2019-04-04 2019-07-12 北京邮电大学 A product information security risk monitoring method based on semantic analysis
CN110321411A (en) * 2019-06-26 2019-10-11 国网江苏省电力有限公司 A kind of power system monitor warning information classification method, system and readable storage medium storing program for executing
WO2020124037A1 (en) * 2018-12-13 2020-06-18 DataRobot, Inc. Methods for detecting and interpreting data anomalies, and related systems and devices

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
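KE ZHANG et al.: "Automated IT system failure prediction: A deep learning approach", 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), pages 1291 - 1300 *
NOLLE, T: "BINet: Multivariate Business Process Anomaly Detection Using Deep Learning", INTERNATIONAL CONFERENCE ON BUSINESS PROCESS MANAGEMENT, vol. 11080, pages 271 - 287 *
刘浩: "Research on fault diagnosis methods for high-speed railway train control on-board equipment based on association rules", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 01, pages 033 - 582 *
吕旭明;雷振江;赵永彬;由广浩: "Research on text data mining technology for electric power enterprises", Electric Power Information and Communication Technology, no. 01, pages 7 - 10 *
吴刚勇;张千斌;吴恒超;顾冰: "Text mining analysis of electric power customer complaint work orders based on natural language processing technology", Power Systems and Big Data, no. 10 *
梅御东;陈旭;孙毓忠;牛逸翔;肖立;王海荣;冯百明: "A software system anomaly detection method based on log information and CNN-text", Chinese Journal of Computers, no. 02, pages 366 - 380 *
王海明: "Design and implementation of a real-time big data processing system based on an improved TF-IDF computation model", China Master's Theses Full-text Database, Information Science and Technology, no. 04, pages 140 - 886 *
范华;翁利国;周艳;姜川;孙涛: "Work order event extraction based on Bi-LSTM and TFIDF", Computer Knowledge and Technology, no. 04, pages 291 - 293 *
黄晓丹 et al.: "Research on an AAA service anomaly detection mechanism based on the TF-IDF algorithm", Mobile Communications, pages 83 - 87 *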

Also Published As

Publication number Publication date
CN111857097B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN115497272B (en) Construction period intelligent early warning system and method based on digital construction
CN106250934B (en) Defect data classification method and device
CN107423205B (en) System fault early warning method and system for data leakage prevention system
CN111435366A (en) Equipment fault diagnosis method and device and electronic equipment
CN116167370B (en) Anomaly detection method for distributed systems based on log spatiotemporal feature analysis
CN115373369B (en) Vehicle fault diagnosis system and method
DE102021130081A1 (en) AUTOMATIC ONTOLOGY EXTRACTION BASED ON DEEP LEARNING TO CAPTURE NEW AREAS OF KNOWLEDGE
CN111857097A (en) Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency
CN114357171A (en) Emergency event processing method and device, storage medium and electronic equipment
CN110781173A (en) Data identification method and device, computer equipment and storage medium
CN111736636A (en) Flooded waterwheel room early warning method and system based on knowledge graph
CN119441846A (en) A method and system for diagnosing important events in converter stations based on correlation causality
CN118035468A (en) Deep learning-based equal-protection evaluation result record knowledge graph extraction method
DE102013101871A1 (en) Word-based speech analysis and speech analysis facility
CN112910733A (en) Full link monitoring system and method based on big data
CN116185797A (en) Method, device and storage medium for predicting server resource saturation
CN111414943A (en) An Anomaly Detection Method Based on Hybrid Hidden Naive Bayesian Model
CN114757277A (en) Rail transit signal system alarm method, device, electronic equipment and storage medium
DE102020113193B4 (en) Method and system for processing sensor data for transmission to a central unit
CN110083611B (en) Random hybrid system security analysis method based on statistical model detection
CN120407774A (en) Vehicle functional fault intelligent troubleshooting method, device, equipment and storage medium
CN116383710B (en) Label determining method, device, electronic equipment and storage medium
DE102024002821A1 (en) Methods for improving the reliability of the output of a Large Language Model
CN114091447A (en) Text recognition method, device and equipment
CN112435151B (en) Government information data processing method and system based on association analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant