CN111857097A - Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency - Google Patents
- Publication number
- CN111857097A
- Authority
- CN
- China
- Prior art keywords
- frequency
- inverse document
- word frequency
- word
- industrial control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0243—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/20—Pc systems
- G05B2219/24—Pc safety
- G05B2219/24065—Real time diagnostics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Debugging And Monitoring (AREA)
- Input From Keyboards Or The Like (AREA)
Abstract
The invention discloses an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, comprising the following steps: establishing a response corpus of a diagnosis command; sending the diagnosis command to the tested system again to obtain the (N+1)th echo message; filtering stop words from all echo messages and segmenting them into words; calculating the inverse document frequency (IDF) of each word in the text list of each echo message using the TF-IDF word frequency and inverse document frequency algorithm; setting a lowest inverse document frequency threshold IDFmin and deleting the words whose IDF is not greater than IDFmin; establishing a phrase list V from the filtered text lists of the N+1 echo messages and calculating the word frequency values; and setting a word frequency threshold and comparing the calculated word frequency values with it to judge whether the message is abnormal. The algorithm learns the health state of each diagnosis command's echo information by itself, which can greatly reduce the manual development cost of an industrial control monitoring system and improve the timeliness of event judgment.
Description
Technical Field
The invention relates to the technical field of industrial control system abnormity diagnosis, in particular to an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency.
Background
At present, some industrial control systems are operated and maintained through remote management. They provide no local operation interface, such as a screen or keys, for field operation and maintenance personnel; instead, personnel must connect a debugging computer and interact with the device through debugging software or a browser to check and analyze system problems. When an abnormal event occurs on a channel or a device, field personnel learn of it only from channel-interruption alarms raised by other service systems or from feedback given by operation and maintenance personnel at a remote monitoring center (such as the dispatching master stations at each level), and only then access the industrial control system with a debugging computer to check, analyze and handle the cause of the abnormality. If remote monitoring does not notice the abnormality, it is discovered only during regular on-site maintenance and configuration backup, so fault handling is generally delayed. Because abnormalities in industrial control systems occur randomly, periodic manual checking rarely captures detailed information from the moment of the abnormality, and the quality of the abnormality analysis declines as time passes.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, and solves the problem of low quality of abnormity analysis of the industrial control system in the prior art.
The invention discloses an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, which comprises the following steps:
Step 1: establishing a response corpus of the diagnosis command: sending the diagnosis command to the tested system N times, and arranging the N echo messages obtained in time order as the response corpus of the diagnosis command;
Step 2: sending the diagnosis command to the tested system again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1;
Step 3: filtering stop words from the N+1 echo messages and segmenting them into words;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF word frequency and inverse document frequency algorithm;
Step 5: setting a lowest inverse document frequency threshold IDFmin, and deleting from each text list the words whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; reordering each filtered text list to follow the order of the phrases in V; converting each text list into a vector whose entries are the number of times each phrase appears in the corresponding echo message; and calculating the word frequency value $tf_{(N+1)j}$;
Step 7: setting a word frequency threshold $tf_{max}$ and comparing the word frequency value $tf_{(N+1)j}$ calculated in step 6 with it; if $tf_{(N+1)j} \ge tf_{max}$, the message is identified as an abnormal message and alarm information is output.
According to an embodiment of the present invention, the diagnosis command in step 1 is sent at a time interval T, whose value is chosen according to how quickly the result returned by the diagnosis command may change: T ranges from 1 to 30 days for conditions that do not change abruptly, such as system resources, and from 1 s to 24 h for conditions that may change at any moment, such as interruption of a network channel.
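Steps 1 and 2 amount to polling the device at interval T and keeping the echoes in time order. The following is a minimal Python sketch under stated assumptions: `send_command` is a hypothetical callable that returns one echo message as a string (how the command reaches the device is not described in the text), and the `sleep` parameter exists only so the waiting can be replaced in a demonstration.

```python
import time
from typing import Callable


def collect_corpus(
    send_command: Callable[[], str],
    n: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Send the diagnosis command n times, interval_s apart (step 1).

    Returns the echo messages in the order they were received.
    """
    corpus: list[str] = []
    for i in range(n):
        corpus.append(send_command())
        if i < n - 1:
            sleep(interval_s)  # wait T between consecutive commands
    return corpus


# Demonstration with a fake device and no real waiting (both hypothetical).
_counter = iter(range(1, 100))


def fake_send() -> str:
    return f"echo {next(_counter)}"


corpus = collect_corpus(fake_send, n=4, interval_s=5, sleep=lambda _: None)
corpus.append(fake_send())  # step 2: the (N+1)th echo goes to the end
print(corpus)
```

Injecting the sender and the sleep function keeps the sketch independent of any particular transport, which the text leaves open.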
According to an embodiment of the invention, the stop word in step 3 comprises a date and a time.
According to one embodiment of the invention, the date format is yyy-mm-dd and the time formats are hh:mm:ss and h:mm.
According to an embodiment of the present invention, the word segmentation in step 3 specifically comprises: taking a blank as the separator, dividing the N+1 command echoes into phrases to form N+1 one-dimensional text lists.
According to an embodiment of the present invention, the calculation formula of the IDF in step 4 is: [formula not reproduced in this text]
According to one embodiment of the present invention, IDFmin ≥ 1 in step 5.
According to an embodiment of the present invention, in step 6 the word frequency value $tf_{(N+1)j}$ is calculated as follows: all phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; each filtered text list is reordered to follow the order of the phrases in V and converted into a vector whose entries are the number of times each phrase appears in the corresponding echo message, giving an (N+1)×M matrix A. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, the word frequency $tf_{(N+1)j}$ is defined as:
$$tf_{(N+1)j} = \frac{a_{(N+1)j}}{\sum_{i=1}^{N+1} a_{ij} + 1}$$
according to one embodiment of the present invention, tf in step 7maxThe value range of (A) is 0.2-0.5.
The beneficial effects that the invention can realize are as follows:
1. The method uses a word frequency and inverse document frequency algorithm to identify diagnosis information of an industrial control system, automatically mining the key information in the echo of each diagnosis command, such as abnormal changes in values or the sudden appearance of alarm content, without manually defining the key content and abnormality criteria for each diagnosis command's echo. Word frequency calculation then determines how often each piece of variable information appears in the samples, and an alarm is raised for variables that appear rarely (such as a sudden abnormally high CPU load or abnormal alarm information) so that operation and maintenance personnel are prompted in time.
2. The algorithm learns the health state of each diagnosis command's echo information by itself, which can greatly reduce the manual development cost of an automatic monitoring system for industrial control. Because the analysis does not depend on the characteristics of the monitored system, it can easily be ported to running-state monitoring of different business systems; it is highly adaptable, frees up manpower, and improves both the timeliness of event judgment and operation and maintenance efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is an algorithm flow chart of an industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency according to the present invention;
FIG. 2 is a schematic diagram of N echoed messages in an embodiment of an identification method for abnormality diagnosis information of an industrial control system based on word frequency and inverse document frequency according to the present invention;
FIG. 3 is a schematic diagram of N+1 echo messages in an embodiment of the method for identifying information of abnormality diagnosis of an industrial control system based on word frequency and inverse document frequency according to the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous implementation details are set forth in order to provide a thorough understanding of the various embodiments of the present invention. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, such implementation details are not necessary. In addition, some conventional structures and components are shown in simplified schematic form in the drawings.
In addition, the descriptions related to the first, the second, etc. in the present invention are only used for description purposes, do not particularly refer to an order or sequence, and do not limit the present invention, but only distinguish components or operations described in the same technical terms, and are not understood to indicate or imply relative importance or implicitly indicate the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The invention discloses an industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency, an algorithm flow is shown in figure 1, and the method comprises the following steps:
Step 1: establishing a response corpus of the diagnosis command: sending the diagnosis command to the tested system N times at a time interval T, and arranging the N echo messages obtained in time order, as shown in FIG. 2, as the response corpus of the diagnosis command;
Step 2: sending the diagnosis command to the tested system again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1; the resulting arrangement is shown in FIG. 3;
Step 3: filtering stop words from the N+1 echo messages and segmenting them into words, wherein the stop words include the date format yyy-mm-dd and the time formats hh:mm:ss and h:mm, and the word segmentation takes a blank as the separator and divides the N+1 command echoes into phrases to form N+1 one-dimensional text lists;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF word frequency and inverse document frequency algorithm, wherein the calculation formula of the IDF is: [formula not reproduced in this text];
Step 5: setting a lowest inverse document frequency threshold IDFmin, and deleting from each text list the words whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; reordering each filtered text list to follow the order of the phrases in V; and converting each text list into a vector whose entries are the number of times each phrase appears in the corresponding echo message, giving an (N+1)×M matrix A. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, the word frequency $tf_{(N+1)j}$ is defined as:
$$tf_{(N+1)j} = \frac{a_{(N+1)j}}{\sum_{i=1}^{N+1} a_{ij} + 1}$$
Example one
The longitudinal encryption device of the ±800 kV Pu'er converter station is taken as an example:
Suppose that, at a certain time, the echo message returned by the tested system after a top diagnosis command is sent to the longitudinal encryption device is as follows:
top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06
Tasks: 0 total, 0 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.0 us, 0.0 sy, 0.0 ni, 80.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 987.4 total, 91.3 free, 642.2 used, 253.8 buff/cache
MiB Swap: 1022.0 total, 776.4 free, 245.6 used. 185.3 avail Mem
Step 1: the diagnosis command is sent to the tested system N times at a time interval T = 5 seconds, and the N echo messages obtained are arranged in time order as the corpus of the command. The echo messages contain both meaningless and meaningful information: meaningless information includes dates, times and annotations, while meaningful information reflects the state of the tested system, such as CPU occupancy, memory occupancy and alarm prompts.
Step 2: after the corpus of the diagnosis command is obtained, the diagnosis command is sent to the tested system again to obtain the (N+1)th echo message, which is appended to the corpus.
Step 3: stop words are filtered from the N+1 texts: the stop words include the date format yyy-mm-dd and the time formats hh:mm:ss and h:mm, so that, for example, 18:29:33 and 2:26 are filtered out. The N+1 texts are then segmented: taking a blank as the separator, the N+1 command echoes are divided into phrases to form N+1 one-dimensional text lists:
[top,up,1,user,load,average,0.00,0.03,0.06,Tasks……]。
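The stop-word filtering and blank-separated segmentation of step 3 can be sketched in Python as follows, applied to the first line of the example echo written with the usual spacing of top output. The regular expressions and the treatment of punctuation are assumptions made for illustration: the text names only the date and time formats and the blank separator, so commas and colons are treated here as extra separators and a lone "-" is dropped in order to arrive at the list shown above.

```python
import re

# Stop-word patterns named in the text: dates written as y...-mm-dd and
# times written as hh:mm:ss or h:mm. The exact regexes are assumptions.
DATE_RE = re.compile(r"\b\d{3,4}-\d{1,2}-\d{1,2}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def segment(echo: str) -> list[str]:
    """Remove date/time stop words, then split the echo on blanks."""
    text = DATE_RE.sub(" ", echo)
    text = TIME_RE.sub(" ", text)
    # Assumption: commas and colons behave like separators.
    text = re.sub(r"[,:]", " ", text)
    # Assumption: a standalone "-" carries no information and is dropped.
    return [token for token in text.split() if token != "-"]


line = "top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06"
print(segment(line))
# ['top', 'up', '1', 'user', 'load', 'average', '0.00', '0.03', '0.06']
```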
Step 4: the TF-IDF algorithm is applied to calculate the inverse document frequency IDF of each word in each text list of the N+1 echo messages: [formula not reproduced in this text]. For example, the word top is present in all N+1 texts [its resulting IDF value is not reproduced in this text].
Step 5: the lowest inverse document frequency threshold IDFmin is set to 1.0, and every word in each text list whose IDF is less than or equal to the threshold is deleted. This filters the meaningless information out of the command echo: words such as "Tasks", "top", "user", "load" and "average" are annotations with no meaning that appear in all N+1 texts, so their IDF is less than 1.0 and they are filtered out.
Step 6: the filtered N+1 text lists are vectorized. All phrases in the N+1 text lists are extracted and deduplicated to obtain a phrase table V of length M: ["0.00", "0.03", "0.06", ……], where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists. Each text list is then reordered to follow the order of the phrases in V and converted into a vector: if a text list contains "0.00" once, "0.03" zero times and "0.06" three times, it is vectorized as [1,0,3,……], where each position in the vector corresponds to the position in V of the phrase it represents.
After this processing, an (N+1)×M matrix A is obtained. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, the word frequency $tf_{(N+1)j}$ is defined as:
$$tf_{(N+1)j} = \frac{a_{(N+1)j}}{\sum_{i=1}^{N+1} a_{ij} + 1}$$
The matrix obtained from the N+1 text lists after vectorization is shown in Table 1 below:
| Matrix A | Column 1 | Column 2 | …… | Column M |
|---|---|---|---|---|
| Row 1 | 1 | 0 | 0 | 1 |
| Row 2 | 0 | 0 | 0 | 2 |
| …… | 2 | 0 | 0 | 1 |
| Row N+1 | 1 | 0 | 1 | 1 |
Therefore:
Sum of each column: 4, 0, 1, 5
TF of the elements of row N+1: 0.2, 0, 0.5, 0.166667
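The column sums and TF values above can be checked with a few lines of Python. This sketch assumes the word frequency of an element in the last row is its count divided by the column sum plus one, which is the form that matches the numbers listed for Table 1; the elided row and column ("……") are treated as a single literal row and column purely for illustration.

```python
def term_frequencies(matrix: list[list[int]]) -> list[float]:
    """TF of each column for the last row: a[N+1][j] / (column_sum_j + 1)."""
    latest = matrix[-1]
    column_sums = [sum(column) for column in zip(*matrix)]
    return [a / (s + 1) for a, s in zip(latest, column_sums)]


# The four rows shown in Table 1 (illustrative reading of the table).
A = [
    [1, 0, 0, 1],
    [0, 0, 0, 2],
    [2, 0, 0, 1],
    [1, 0, 1, 1],
]

print([sum(column) for column in zip(*A)])         # [4, 0, 1, 5]
print([round(tf, 6) for tf in term_frequencies(A)])  # [0.2, 0.0, 0.5, 0.166667]
```

A phrase that is new in the latest message (column sum equal to its own count) gets a high TF, while a phrase seen often in earlier messages gets a low one, which is what the threshold in step 7 relies on.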
Example two
The longitudinal encryption device of the ±800 kV Kunzei converter station is taken as an example:
Step 1: a top diagnosis command is sent to the longitudinal encryption authentication device with a period of T = 10 seconds to obtain 4 echo messages, as shown in Table 2:
Step 2: the diagnosis command is sent to the longitudinal encryption authentication device again to obtain the 5th echo message, as shown in Table 3:
Step 3: stop words are filtered from the 5 echo messages in the corpus in a uniform format, deleting the useless time-related information; the processed corpus is shown in Table 4:
Text word segmentation is then performed in a uniform format on all echo messages in the corpus: taking a blank as the separator, the N+1 command echoes are converted into N+1 one-dimensional text lists; the processed corpus is shown in Table 5:
Step 4: the TF-IDF algorithm is applied to calculate the inverse document frequency IDF of each word in each text list of the N+1 echo messages.
The IDF is calculated on the corpus after stop-word filtering and word segmentation; taking the 1st, 2nd and 7th words of the 1st echo message as an example, the results are shown in Table 6:
Step 5: the lowest inverse document frequency threshold IDFmin is set to 0.1; a word whose IDF is lower than 0.1 is regarded as appearing too frequently. From Table 6, top and up are unimportant echo information and are filtered out, while 0.00 is important echo information and is retained. After the IDF calculation is completed for all echo information, the corpus is updated as shown in Table 7:
| Message No. | Retained phrases |
|---|---|
| 1 | 0.00 0.03 0.06 |
| 2 | 0.01 0.05 0.07 |
| 3 | 0.00 0.07 0.06 |
| 4 | 0.01 0.03 0.06 |
| 5 | 0.02 0.04 0.18 |
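
The IDF filtering of steps 4 and 5 can be sketched in Python as below. Two things are assumptions rather than taken from the text: the IDF is computed with the textbook form ln(total documents / documents containing the word), because the exact formula is not reproduced here, and the input corpus is a hypothetical abbreviation of the real echoes that keeps only the words top and up plus the three load-average values of each message.

```python
import math


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Assumed textbook IDF: ln(number of docs / number of docs containing w)."""
    total = len(docs)
    doc_count: dict[str, int] = {}
    for doc in docs:
        for word in set(doc):
            doc_count[word] = doc_count.get(word, 0) + 1
    return {word: math.log(total / n) for word, n in doc_count.items()}


def filter_low_idf(docs: list[list[str]], idf_min: float) -> list[list[str]]:
    """Step 5: delete every word whose IDF is less than or equal to idf_min."""
    idf = idf_table(docs)
    return [[word for word in doc if idf[word] > idf_min] for doc in docs]


# Hypothetical, abbreviated corpus of 5 segmented echo messages.
docs = [
    ["top", "up", "0.00", "0.03", "0.06"],
    ["top", "up", "0.01", "0.05", "0.07"],
    ["top", "up", "0.00", "0.07", "0.06"],
    ["top", "up", "0.01", "0.03", "0.06"],
    ["top", "up", "0.02", "0.04", "0.18"],
]

for number, kept in enumerate(filter_low_idf(docs, idf_min=0.1), start=1):
    print(number, kept)
```

Under this assumed formula, words present in every message get an IDF of 0 and are removed, while the load-average values all stay above 0.1, so the filtered lists agree with Table 7.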
The corpus is then deduplicated to generate a list of important echo information, that is, the set of all distinct important items in the corpus; the result is shown in Table 8:
Step 6: the text lists of the N+1 echo messages filtered in step 5 are vectorized: all phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; each filtered text list is reordered to follow the order of the phrases in V and converted into a vector whose entries are the number of times each phrase appears in the corresponding echo message. The conversion result is shown in Table 9:
| Message No. | Vector |
|---|---|
| 1 | {1,1,1,0,0,0,0,0,0} |
| 2 | {0,0,0,1,1,1,0,0,0} |
| 3 | {1,0,1,0,0,1,0,0,0} |
| 4 | {0,1,1,1,0,0,0,0,0} |
| 5 | {0,0,0,0,0,0,1,1,1} |
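
The vectorization of step 6 can be reproduced from the filtered corpus of Table 7 with a short Python sketch. Ordering the phrase table V by first appearance is an assumption; it is the ordering under which the vectors come out as in Table 9.

```python
def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Phrase table V: distinct phrases in order of first appearance (assumed)."""
    vocabulary: list[str] = []
    seen: set[str] = set()
    for doc in docs:
        for phrase in doc:
            if phrase not in seen:
                seen.add(phrase)
                vocabulary.append(phrase)
    return vocabulary


def vectorize(docs: list[list[str]], vocabulary: list[str]) -> list[list[int]]:
    """One row per message; entry j counts occurrences of vocabulary[j]."""
    return [[doc.count(phrase) for phrase in vocabulary] for doc in docs]


# Filtered text lists as listed in Table 7.
filtered = [
    ["0.00", "0.03", "0.06"],
    ["0.01", "0.05", "0.07"],
    ["0.00", "0.07", "0.06"],
    ["0.01", "0.03", "0.06"],
    ["0.02", "0.04", "0.18"],
]

V = build_vocabulary(filtered)
print(len(V), V)
for row in vectorize(filtered, V):
    print(row)
```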
The TF word frequency of the 5 echo messages is calculated using the formula
$$tf_{(N+1)j} = \frac{a_{(N+1)j}}{\sum_{i=1}^{N+1} a_{ij} + 1}$$
and the calculation results are shown in Table 10:
Step 7: a word frequency threshold $tf_{max}$ is set; if the TF value of any echo information in a message is greater than or equal to this threshold, the echo message is judged to contain abnormal echo information, and if it is less than the threshold the message is judged to be a normal message. The final result is shown in Table 10:
Therefore, information items No. 7, No. 8 and No. 9 in message No. 5 are abnormal; the message is an abnormal message and an alarm is issued.
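The decision in step 7 can be sketched in Python using the vectors of Table 9. Two assumptions are made: the word frequency is the latest message's count divided by the column sum plus one, and the threshold is set to 0.5, a hypothetical value chosen from the 0.2 to 0.5 range stated for $tf_{max}$ because the value used in this example is not reproduced in the text.

```python
def abnormal_positions(matrix: list[list[int]], tf_max: float) -> list[int]:
    """Return 1-based phrase positions in the latest message with TF >= tf_max."""
    latest = matrix[-1]
    column_sums = [sum(column) for column in zip(*matrix)]
    tf = [a / (s + 1) for a, s in zip(latest, column_sums)]
    return [j + 1 for j, value in enumerate(tf) if value >= tf_max]


# Vectors of the 5 echo messages as listed in Table 9.
table_9 = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]

hits = abnormal_positions(table_9, tf_max=0.5)  # tf_max is an assumed value
print(hits)  # [7, 8, 9]
print("abnormal message" if hits else "normal message")
```

The three phrases of message 5 have never appeared before, so each has a TF of 1 / (1 + 1) = 0.5 and is flagged at any threshold in the stated range.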
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (9)
1. An industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency is characterized by comprising the following steps:
Step 1: establishing a response corpus of the diagnosis command: sending the diagnosis command to the tested system N times, and arranging the N echo messages obtained in time order as the response corpus of the diagnosis command;
Step 2: sending the diagnosis command to the tested system again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1;
Step 3: filtering stop words from the N+1 echo messages and segmenting them into words;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF word frequency and inverse document frequency algorithm;
Step 5: setting a lowest inverse document frequency threshold IDFmin, and deleting from each text list the words whose IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; reordering each filtered text list to follow the order of the phrases in V; converting each text list into a vector whose entries are the number of times each phrase appears in the corresponding echo message; and calculating the word frequency value $tf_{(N+1)j}$
2. The method for identifying the industrial control system abnormity diagnostic information based on the word frequency and the inverse document frequency according to claim 1, wherein the sending time interval of the diagnostic command in the step 1 is T, the value range of T is determined according to the time range of possible change of the return result of the diagnostic command, and the value range of T is 1-30 days under the condition that the system resource is not mutated; the value range of T is 1 s-24 h under the condition that the network channel is interrupted at any time.
3. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the stop word in the step 3 includes a date and a time.
4. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 3, wherein the date format is yyy-mm-dd, and the time formats are hh:mm:ss and h:mm.
5. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the word segmentation in the step 3 specifically comprises: taking a blank as the separator, dividing the N+1 command echoes into phrases to form N+1 one-dimensional text lists.
7. The method for identifying the industrial control system abnormality diagnosis information based on the word frequency and the inverse document frequency as claimed in claim 1, wherein IDFmin is greater than or equal to 1 in the step 5.
8. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 1, wherein in the step 6 the word frequency value $tf_{(N+1)j}$ is calculated as follows: all phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase table V of length M, where M equals the total number of distinct phrases after filtering and V contains every phrase that appears in the filtered N+1 text lists; each filtered text list is reordered to follow the order of the phrases in V and converted into a vector whose entries are the number of times each phrase appears in the corresponding echo message, giving an (N+1)×M matrix A; $a_{ij}$ is the element in row i and column j of A, and for each element $a_{(N+1)j}$ of the (N+1)th text list the word frequency $tf_{(N+1)j}$ is defined as:
$$tf_{(N+1)j} = \frac{a_{(N+1)j}}{\sum_{i=1}^{N+1} a_{ij} + 1}$$
9. The method for identifying the abnormality diagnosis information of the industrial control system based on the word frequency and the inverse document frequency as claimed in claim 1, wherein the value range of $tf_{max}$ in the step 7 is 0.2 to 0.5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010733364.6A CN111857097B (en) | 2020-07-27 | 2020-07-27 | Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010733364.6A CN111857097B (en) | 2020-07-27 | 2020-07-27 | Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111857097A (en) | 2020-10-30 |
| CN111857097B (en) | 2023-10-31 |
Family
ID=72947886
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010733364.6A Active CN111857097B (en) | 2020-07-27 | 2020-07-27 | Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111857097B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113761133A (en) * | 2021-09-10 | 2021-12-07 | 未鲲(上海)科技服务有限公司 | System abnormity monitoring method and device based on artificial intelligence and related equipment |
| CN118378194A (en) * | 2024-06-21 | 2024-07-23 | 国网江苏省电力有限公司电力科学研究院 | A method and device for power equipment operation and maintenance based on knowledge graph |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2015170145A (en) * | 2014-03-07 | 2015-09-28 | Kddi株式会社 | Program, apparatus, and server for estimating simple sentence symbolizing target sentence |
| WO2016093837A1 (en) * | 2014-12-11 | 2016-06-16 | Hewlett Packard Enterprise Development Lp | Determining term scores based on a modified inverse domain frequency |
| US20170102984A1 (en) * | 2015-10-13 | 2017-04-13 | Huawei Technologies Co., Ltd. | Fault Diagnosis Method and Apparatus for Big-Data Network System |
| CN107992597A (en) * | 2017-12-13 | 2018-05-04 | 国网山东省电力公司电力科学研究院 | A kind of text structure method towards electric network fault case |
| CN108846142A (en) * | 2018-07-12 | 2018-11-20 | 南方电网调峰调频发电有限公司 | A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing |
| CN109495479A (en) * | 2018-11-20 | 2019-03-19 | 华青融天(北京)软件股份有限公司 | A kind of user's abnormal behaviour recognition methods and device |
| KR101964412B1 (en) * | 2018-12-12 | 2019-04-01 | 주식회사 모비젠 | Method for diagnosing anomaly log of mobile commmunication data processing system and system thereof |
| WO2019134334A1 (en) * | 2018-01-04 | 2019-07-11 | 平安科技(深圳)有限公司 | Network abnormal data detection method and apparatus, computer device and storage medium |
| CN110008311A (en) * | 2019-04-04 | 2019-07-12 | 北京邮电大学 | A product information security risk monitoring method based on semantic analysis |
| CN110321411A (en) * | 2019-06-26 | 2019-10-11 | 国网江苏省电力有限公司 | A kind of power system monitor warning information classification method, system and readable storage medium storing program for executing |
| WO2020124037A1 (en) * | 2018-12-13 | 2020-06-18 | DataRobot, Inc. | Methods for detecting and interpreting data anomalies, and related systems and devices |
Non-Patent Citations (9)
| Title |
|---|
| KE ZHANG et al.: "Automated IT system failure prediction: A deep learning approach", 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), pages 1291 - 1300 * |
| NOLLE, T: "BINet: Multivariate Business Process Anomaly Detection Using Deep Learning", INTERNATIONAL CONFERENCE ON BUSINESS PROCESS MANAGEMENT, vol. 11080, pages 271 - 287 * |
| 刘浩: "基于关联规则的高铁列控车载设备故障诊断方法研究", 中国优秀硕士学位论文全文数据库工程科技Ⅱ辑, no. 01, pages 033 - 582 * |
| 吕旭明;雷振江;赵永彬;由广浩;: "电力企业文本数据挖掘技术研究", 电力信息与通信技术, no. 01, pages 7 - 10 * |
| 吴刚勇;张千斌;吴恒超;顾冰;: "基于自然语言处理技术的电力客户投诉工单文本挖掘分析", 电力大数据, no. 10 * |
| 梅御东;陈旭;孙毓忠;牛逸翔;肖立;王海荣;冯百明;: "一种基于日志信息和CNN-text的软件系统异常检测方法", 计算机学报, no. 02, pages 366 - 380 * |
| 王海明: "基于TF-IDF改进计算模型的实时大数据处理系统设计与实现", 中国优秀硕士学位论文全文数据库信息科技辑, no. 04, pages 140 - 886 * |
| 范华;翁利国;周艳;姜川;孙涛;: "基于Bi-LSTM和TFIDF的工单事件提取", 电脑知识与技术, no. 04, pages 291 - 293 * |
| 黄晓丹等: "基于TF-IDF算法的AAA服务异常检测机制研究", 移动通信, pages 83 - 87 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111857097B (en) | 2023-10-31 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN115497272B (en) | Construction period intelligent early warning system and method based on digital construction | |
| CN106250934B (en) | Defect data classification method and device | |
| CN107423205B (en) | System fault early warning method and system for data leakage prevention system | |
| CN111435366A (en) | Equipment fault diagnosis method and device and electronic equipment | |
| CN116167370B (en) | Anomaly detection method for distributed systems based on log spatiotemporal feature analysis | |
| CN115373369B (en) | Vehicle fault diagnosis system and method | |
| DE102021130081A1 (en) | AUTOMATIC ONTOLOGY EXTRACTION BASED ON DEEP LEARNING TO CAPTURE NEW AREAS OF KNOWLEDGE | |
| CN111857097A (en) | Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency | |
| CN114357171A (en) | Emergency event processing method and device, storage medium and electronic equipment | |
| CN110781173A (en) | Data identification method and device, computer equipment and storage medium | |
| CN111736636A (en) | Flooded waterwheel room early warning method and system based on knowledge graph | |
| CN119441846A (en) | A method and system for diagnosing important events in converter stations based on correlation causality | |
| CN118035468A (en) | Deep learning-based equal-protection evaluation result record knowledge graph extraction method | |
| DE102013101871A1 (en) | Word-based speech analysis and speech analysis facility | |
| CN112910733A (en) | Full link monitoring system and method based on big data | |
| CN116185797A (en) | Method, device and storage medium for predicting server resource saturation | |
| CN111414943A (en) | An Anomaly Detection Method Based on Hybrid Hidden Naive Bayesian Model | |
| CN114757277A (en) | Rail transit signal system alarm method, device, electronic equipment and storage medium | |
| DE102020113193B4 (en) | Method and system for processing sensor data for transmission to a central unit | |
| CN110083611B (en) | Random hybrid system security analysis method based on statistical model detection | |
| CN120407774A (en) | Vehicle functional fault intelligent troubleshooting method, device, equipment and storage medium | |
| CN116383710B (en) | Label determining method, device, electronic equipment and storage medium | |
| DE102024002821A1 (en) | Methods for improving the reliability of the output of a Large Language Model | |
| CN114091447A (en) | Text recognition method, device and equipment | |
| CN112435151B (en) | Government information data processing method and system based on association analysis |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |