CN108563629A - Automatic log analysis rule generation method and device
- Publication number
- CN108563629A (application CN201810205205.1A)
- Authority
- CN
- China
- Prior art keywords
- log
- parsing rules
- regular expression
- word
- generation method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an automatic log parsing rule generation method and device. The method comprises: a log segmentation step of receiving a log from a newly added device and automatically segmenting the log into words; a grammar analysis step of assigning syntactic definitions to the segmented words; a regular expression generation step of generating a parsing-rule regular expression according to the syntactic definitions; and a field mapping step of automatically mapping the parsing-rule regular expression to a server-side parsing engine. With the invention, device log onboarding can be completed automatically without the user writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.
Description
Technical field
The present invention relates to the technical field of security management, and in particular to an automatic log parsing rule generation method and device.
Background art
In the prior art, logs from devices newly added to a computer system are onboarded by writing code. Parsing the logs is therefore difficult and complex, and developing parsing rules for them is very inefficient.
Summary of the invention
The purpose of the present invention is to solve the technical problem that log parsing is difficult and complex, which makes developing parsing rules for logs very inefficient.
To achieve the above purpose, the present invention adopts the following technical solution:
The present invention provides an automatic log parsing rule generation method, comprising: a log segmentation step of receiving a log from a newly added device and automatically segmenting the log into words; a grammar analysis step of assigning syntactic definitions to the segmented words; a regular expression generation step of generating a parsing-rule regular expression according to the syntactic definitions; and a field mapping step of automatically mapping the parsing-rule regular expression to a server-side parsing engine.
Preferably, in the log segmentation step, a finite state automaton is built and used to analyze the characters of the newly added device log one by one. When a stop word in a stop-word dictionary is encountered, the finite state automaton is exited and a lexical token is output; control then returns to the finite state automaton to continue segmentation until all characters in the newly added device log have been analyzed, so that the newly added device log is split into a word list.
Preferably, grammar analysis rules are built into the computer system or defined by the user. In the grammar analysis step, the lexical tokens are received and the grammar analysis rules are matched against them. If a grammar analysis rule matches a lexical token, each word in the segmented word list is assigned the syntactic definition of the grammar analysis rule that matches its lexical token; if no grammar analysis rule matches the lexical token, the lexical token is assigned a default grammar analysis rule.
Preferably, in the grammar analysis step, the syntactic definitions include one or more of: timestamp, IP address, URL, user agent, integer, floating-point number, file, and user name.
Preferably, in the grammar analysis step, different lexical tokens are each matched against the grammar analysis rules. A given lexical token is matched against multiple grammar analysis rules, and the rule with the highest matching degree for that token is selected.
Preferably, in the regular expression generation step, the combination of syntactic definitions is converted into a parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
Preferably, in the field mapping step, the server-side parsing engine performs function operations on the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
Preferably, in the field mapping step, the parsing-rule regular expression is automatically uploaded to a server and presented to the user through a visualization interface. The user confirms the parsing-rule regular expression a second time and saves it through the visualization interface, after which it is re-delivered to the server-side parsing engine.
Preferably, in the field mapping step, the parsing-rule regular expression and the matching degree between the grammar analysis rules and the lexical tokens are automatically uploaded to the server and presented to the user through the visualization interface. The user can modify the parsing-rule regular expression through the visualization interface, after which it is re-delivered to the server-side parsing engine.
The present invention also provides an automatic log parsing rule generation device for executing the above automatic log parsing rule generation method. The device comprises: a log segmentation module, which receives a log from a newly added device and automatically segments it into words; a grammar analysis module, which assigns syntactic definitions to the segmented words; a regular expression generation module, which generates a parsing-rule regular expression according to the syntactic definitions; and a field mapping module, which automatically maps the generated parsing-rule regular expression to a server-side parsing engine.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
With the present invention, device log onboarding can be completed automatically without the user writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.
Brief description of the drawings
Fig. 1 is a flowchart of the log segmentation step;
Fig. 2 is a flowchart of the grammar analysis step;
Fig. 3 is a structural diagram of the automatic log parsing rule generation device.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings so that it is clearer and easier to understand. Those skilled in the art will recognize that the described embodiments can be modified in various ways, or in combinations thereof, without departing from the spirit and scope of the present invention. The drawings and description are therefore illustrative in nature and are not intended to limit the scope of the claims. In addition, in this specification the drawings are not drawn to scale, and identical reference numerals denote identical parts.
Embodiments of the present invention are described in detail below with reference to Figs. 1-3.
The automatic log parsing rule generation method of the present invention comprises a log segmentation step, a grammar analysis step, a regular expression generation step, and a field mapping step.
In the log segmentation step, a log from a newly added device is received and automatically segmented into words.
Preferably, in the log segmentation step, as shown in Fig. 1, a finite state automaton (FSM) is built and used to analyze the characters of the newly added device log one by one. When a stop word in the stop-word dictionary is encountered, the FSM is exited and a lexical token is output; control then returns to the FSM to continue segmentation until all characters in the newly added device log have been analyzed, so that the log is split into a word list. The stop-word dictionary can be updated dynamically, and different stop-word dictionaries can be configured for different device types as circumstances require.
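The segmentation loop described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `segment_log`, the two-state design, and the contents of `STOP_WORDS` (treated here as single-character delimiters) are all assumptions.

```python
from typing import Iterable, List

# Hypothetical stop-word dictionary. The description says it can be updated
# dynamically and configured per device type; these entries are assumptions.
STOP_WORDS = {" ", "\t", ",", ";", "|", "[", "]", '"'}


def segment_log(line: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Split one log line into a word list using a two-state automaton."""
    stops = set(stop_words)
    tokens: List[str] = []
    buffer: List[str] = []
    in_token = False  # state: False = between tokens, True = inside a token

    for ch in line:
        if ch in stops:
            if in_token:
                # Stop word reached: leave the token state and emit the token.
                tokens.append("".join(buffer))
                buffer.clear()
                in_token = False
        else:
            # Ordinary character: stay in (or enter) the token state.
            buffer.append(ch)
            in_token = True

    if in_token:  # flush the last token at end of input
        tokens.append("".join(buffer))
    return tokens
```

Passing a different `stop_words` set per device type would correspond to the per-device dictionaries mentioned above.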
In the grammar analysis step, syntactic definitions are assigned to the segmented words.
Preferably, grammar analysis rules are built into the computer system or defined by the user. In the grammar analysis step, as shown in Fig. 2, the lexical tokens are received and the grammar analysis rules are matched against them. If a grammar analysis rule matches a lexical token, each word in the segmented word list is assigned the syntactic definition of the grammar analysis rule that matches its lexical token. If no grammar analysis rule matches the lexical token, the lexical token is assigned a default grammar analysis rule.
Preferably, a grammar analysis rule consists of two parts. The first part is the syntactic definition, including but not limited to timestamp, IP address, URL, user agent (User-Agent), integer, floating-point number, file, and user name. The second part is the regular expression definition; a different regular expression is specified for each syntactic definition.
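A two-part grammar analysis rule with a default fallback might look like the following Python sketch. The rule names, the regular expressions, and the `"string"` default are illustrative assumptions; the patent does not specify concrete patterns.

```python
import re
from typing import List, NamedTuple


class GrammarRule(NamedTuple):
    syntax: str          # part 1: syntactic definition, e.g. "ip_address"
    pattern: re.Pattern  # part 2: regular expression definition


# Illustrative built-in rules (assumed names and patterns).
RULES: List[GrammarRule] = [
    GrammarRule("timestamp", re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")),
    GrammarRule("ip_address", re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    GrammarRule("float", re.compile(r"-?\d+\.\d+")),
    GrammarRule("integer", re.compile(r"-?\d+")),
]
DEFAULT_SYNTAX = "string"  # assumed default rule when nothing matches


def assign_syntax(token: str) -> str:
    """Return the syntactic definition of the first rule matching the whole token."""
    for rule in RULES:
        if rule.pattern.fullmatch(token):
            return rule.syntax
    return DEFAULT_SYNTAX
```

User-defined rules could be supported by appending further `GrammarRule` entries to `RULES`.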
Preferably, in the grammar analysis step, different lexical tokens are matched against the grammar analysis rules in multiple threads. A given lexical token is matched against multiple grammar analysis rules, and the rule with the highest matching degree for that token is selected. Matching results can therefore be output efficiently.
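The patent does not define how the matching degree is computed. The Python sketch below assumes one plausible metric, the fraction of the token covered by a rule's longest match, and distributes tokens over a thread pool. The rule set, `matching_degree`, `best_rule`, and `classify` are hypothetical names. In CPython, threads may not speed up CPU-bound regex matching, so the sketch only illustrates the structure.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Assumed rule set: syntactic definition -> compiled pattern.
RULES = {
    "ip_address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "float": re.compile(r"\d+\.\d+"),
    "integer": re.compile(r"\d+"),
}


def matching_degree(pattern: re.Pattern, token: str) -> float:
    """Assumed metric: share of the token covered by the longest match."""
    if not token:
        return 0.0
    longest = max((m.end() - m.start() for m in pattern.finditer(token)), default=0)
    return longest / len(token)


def best_rule(token: str) -> Tuple[str, float]:
    """Match one token against every rule and keep the highest degree."""
    scored = [(matching_degree(p, token), name) for name, p in RULES.items()]
    degree, name = max(scored, key=lambda item: item[0])
    # Fall back to an assumed default definition when no rule matches at all.
    return (name, degree) if degree > 0 else ("string", 0.0)


def classify(tokens: List[str]) -> List[Tuple[str, float]]:
    """Score different tokens in separate worker threads, preserving order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(best_rule, tokens))
```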
In the regular expression generation step, a parsing-rule regular expression is generated according to the syntactic definitions.
Preferably, in the regular expression generation step, the combination of syntactic definitions is converted into a parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
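One way to picture the splicing is shown below: recognized words become named capture groups, while everything else in the sample line is kept as an escaped literal. This is a sketch under assumptions. `FRAGMENTS`, `build_rule`, and the `<syntax>_<n>` group naming are hypothetical, and the patent does not state that named groups are used.

```python
import re
from typing import Dict, List, Tuple

# Assumed regex fragment for each syntactic definition.
FRAGMENTS = {
    "timestamp": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}",
    "ip_address": r"(?:\d{1,3}\.){3}\d{1,3}",
    "integer": r"-?\d+",
}


def build_rule(line: str, tagged: List[Tuple[str, str]]) -> str:
    """Build a parsing regex from (word, syntactic definition) pairs found in `line`."""
    parts: List[str] = []
    counts: Dict[str, int] = {}
    pos = 0
    for token, syntax in tagged:
        start = line.index(token, pos)
        # Text between words was never parsed: splice it in literally.
        parts.append(re.escape(line[pos:start]))
        if syntax in FRAGMENTS:
            counts[syntax] = counts.get(syntax, 0) + 1
            parts.append(f"(?P<{syntax}_{counts[syntax]}>{FRAGMENTS[syntax]})")
        else:
            # Word without a usable syntactic definition: keep it as a literal.
            parts.append(re.escape(token))
        pos = start + len(token)
    parts.append(re.escape(line[pos:]))
    return "^" + "".join(parts) + "$"
```

A rule built from one sample line can then be applied to other lines of the same shape with `re.match`.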
In the field mapping step, the parsing-rule regular expression is automatically mapped to the server-side parsing engine.
Preferably, in the field mapping step, the server-side parsing engine performs function operations on the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
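The function operations are not specified further in the patent. As a rough Python illustration, each captured field can be paired with a target field name and a conversion function. `FIELD_MAP`, `RULE`, the final field names, and the sample regular expression are all assumptions made for this sketch.

```python
import re
from datetime import datetime
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping: captured group -> (final field name, conversion function).
FIELD_MAP: Dict[str, Tuple[str, Callable[[str], Any]]] = {
    "ip_address_1": ("src_ip", str),
    "timestamp_1": ("event_time", datetime.fromisoformat),
    "integer_1": ("status_code", int),
}

# Example parsing-rule regular expression (assumed shape, for illustration only).
RULE = re.compile(
    r"^(?P<ip_address_1>(?:\d{1,3}\.){3}\d{1,3}) "
    r"\[(?P<timestamp_1>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\] "
    r"(?P<integer_1>\d+)$"
)


def map_fields(line: str) -> Dict[str, Any]:
    """Apply the rule, then convert each captured field into its final field."""
    match = RULE.match(line)
    if match is None:
        return {}  # line does not fit this parsing rule
    result: Dict[str, Any] = {}
    for group, (final_name, convert) in FIELD_MAP.items():
        result[final_name] = convert(match.group(group))
    return result
```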
Preferably, in the field mapping step, the parsing-rule regular expression is automatically uploaded to a server and presented to the user through a visualization interface. The user confirms the parsing-rule regular expression a second time and saves it through the visualization interface, after which it is re-delivered to the server-side parsing engine.
Preferably, in the field mapping step, the parsing-rule regular expression and the matching degree between the grammar analysis rules and the lexical tokens are automatically uploaded to the server and presented to the user through the visualization interface. The user modifies the parsing-rule regular expression through the visualization interface, after which it is re-delivered to the server-side parsing engine, so that the parsing-rule regular expression is optimized.
The present invention also includes an automatic log parsing rule generation device for executing the above automatic log parsing rule generation method. As shown in Fig. 3, the device comprises: a log segmentation module, which receives a log from a newly added device and automatically segments it into words; a grammar analysis module, which assigns syntactic definitions to the segmented words; a regular expression generation module, which generates a parsing-rule regular expression according to the syntactic definitions; and a field mapping module, which automatically maps the generated parsing-rule regular expression to the server-side parsing engine.
With the present invention, device log onboarding can be completed automatically without the user writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. An automatic log parsing rule generation method, comprising:
a log segmentation step of receiving a log from a newly added device and automatically segmenting the newly added device log into words;
a grammar analysis step of assigning syntactic definitions to the segmented words;
a regular expression generation step of generating a parsing-rule regular expression according to the syntactic definitions; and
a field mapping step of automatically mapping the parsing-rule regular expression to a server-side parsing engine.
2. The automatic log parsing rule generation method according to claim 1, wherein in the log segmentation step, a finite state automaton is built and used to analyze the characters of the newly added device log one by one; when a stop word in a stop-word dictionary is encountered, the finite state automaton is exited and a lexical token is output, and control then returns to the finite state automaton to continue segmentation until all characters in the newly added device log have been analyzed, so that the newly added device log is split into a word list.
3. The automatic log parsing rule generation method according to claim 2, wherein grammar analysis rules are built into the computer system or defined by the user, and in the grammar analysis step the lexical tokens are received and the grammar analysis rules are matched against the lexical tokens;
if a grammar analysis rule matches the lexical token, each word in the segmented word list is assigned the syntactic definition of the grammar analysis rule that matches the lexical token;
if no grammar analysis rule matches the lexical token, the lexical token is assigned a default grammar analysis rule.
4. The automatic log parsing rule generation method according to claim 3, wherein in the grammar analysis step, the syntactic definitions include one or more of: timestamp, IP address, URL, user agent, integer, floating-point number, file, and user name.
5. The automatic log parsing rule generation method according to claim 3, wherein in the grammar analysis step, different lexical tokens are matched against the grammar analysis rules in multiple threads; a given lexical token is matched against multiple grammar analysis rules, and the grammar analysis rule with the highest matching degree for that lexical token is selected.
6. The automatic log parsing rule generation method according to claim 3, wherein in the regular expression generation step, the combination of the syntactic definitions is converted into a parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
7. The automatic log parsing rule generation method according to claim 6, wherein in the field mapping step, the server-side parsing engine performs function operations on the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
8. The automatic log parsing rule generation method according to claim 7, wherein in the field mapping step, the parsing-rule regular expression is automatically uploaded to a server and presented to the user through a visualization interface; the user confirms the parsing-rule regular expression a second time and saves it through the visualization interface, after which it is re-delivered to the server-side parsing engine.
9. The automatic log parsing rule generation method according to claim 8, wherein in the field mapping step, the parsing-rule regular expression and the matching degree between the grammar analysis rules and the lexical tokens are automatically uploaded to the server and presented to the user through the visualization interface; the user modifies the parsing-rule regular expression through the visualization interface, after which it is re-delivered to the server-side parsing engine.
10. An automatic log parsing rule generation device for executing the automatic log parsing rule generation method according to any one of claims 1-9, the device comprising:
a log segmentation module, which receives a log from a newly added device and automatically segments the newly added device log into words;
a grammar analysis module, which assigns syntactic definitions to the segmented words;
a regular expression generation module, which generates a parsing-rule regular expression according to the syntactic definitions; and
a field mapping module, which automatically maps the generated parsing-rule regular expression to a server-side parsing engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810205205.1A CN108563629B (en) | 2018-03-13 | 2018-03-13 | Automatic log analysis rule generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810205205.1A CN108563629B (en) | 2018-03-13 | 2018-03-13 | Automatic log analysis rule generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108563629A (en) | 2018-09-21 |
CN108563629B (en) | 2022-04-19 |
Family
ID=63531515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810205205.1A Expired - Fee Related CN108563629B (en) | 2018-03-13 | 2018-03-13 | Automatic log analysis rule generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108563629B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134615A (en) * | 2019-04-10 | 2019-08-16 | 百度在线网络技术(北京)有限公司 | The method and device of application program acquisition daily record data |
CN110321457A (en) * | 2019-04-19 | 2019-10-11 | 杭州玳数科技有限公司 | Access log resolution rules generation method and device, log analytic method and system |
CN110968560A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Log collector configuration method, device and system |
CN111737950A (en) * | 2020-08-27 | 2020-10-02 | 北京安帝科技有限公司 | Log carrier format extraction method and device based on natural language |
CN112667672A (en) * | 2021-01-06 | 2021-04-16 | 北京启明星辰信息安全技术有限公司 | Log analysis method and analysis device |
CN114064390A (en) * | 2021-09-26 | 2022-02-18 | 杭州安恒信息技术股份有限公司 | Log collision rule conversion method, device, system and electronic device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1759354A (en) * | 2003-01-09 | 2006-04-12 | 思科系统公司 | Methods and apparatuses for evaluation of regular expressions of arbitrary size |
CN1975725A (en) * | 2006-12-12 | 2007-06-06 | 华为技术有限公司 | Method and system for managing journal |
US20080109905A1 (en) * | 2006-11-03 | 2008-05-08 | Grosse Eric H | Methods and apparatus for detecting unwanted traffic in one or more packet networks utilizing string analysis |
US20110083123A1 (en) * | 2009-10-05 | 2011-04-07 | Microsoft Corporation | Automatically localizing root error through log analysis |
CN102955914A (en) * | 2011-08-19 | 2013-03-06 | 百度在线网络技术(北京)有限公司 | Method and device for detecting security flaws of source files |
CN104144071A (en) * | 2013-05-10 | 2014-11-12 | 北京新媒传信科技有限公司 | System log processing method and platform |
CN104391881A (en) * | 2014-10-30 | 2015-03-04 | 杭州安恒信息技术有限公司 | Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system |
CN106294673A (en) * | 2016-08-08 | 2017-01-04 | 杭州玳数科技有限公司 | A kind of method and system of User Defined rule real time parsing daily record data |
CN106790109A (en) * | 2016-12-26 | 2017-05-31 | 东软集团股份有限公司 | Data matching method and device, protocol data analysis method, device and system |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1759354A (en) * | 2003-01-09 | 2006-04-12 | 思科系统公司 | Methods and apparatuses for evaluation of regular expressions of arbitrary size |
US20080109905A1 (en) * | 2006-11-03 | 2008-05-08 | Grosse Eric H | Methods and apparatus for detecting unwanted traffic in one or more packet networks utilizing string analysis |
CN1975725A (en) * | 2006-12-12 | 2007-06-06 | 华为技术有限公司 | Method and system for managing journal |
US20110083123A1 (en) * | 2009-10-05 | 2011-04-07 | Microsoft Corporation | Automatically localizing root error through log analysis |
CN102955914A (en) * | 2011-08-19 | 2013-03-06 | 百度在线网络技术(北京)有限公司 | Method and device for detecting security flaws of source files |
CN104144071A (en) * | 2013-05-10 | 2014-11-12 | 北京新媒传信科技有限公司 | System log processing method and platform |
CN104391881A (en) * | 2014-10-30 | 2015-03-04 | 杭州安恒信息技术有限公司 | Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system |
CN106294673A (en) * | 2016-08-08 | 2017-01-04 | 杭州玳数科技有限公司 | A kind of method and system of User Defined rule real time parsing daily record data |
CN106790109A (en) * | 2016-12-26 | 2017-05-31 | 东软集团股份有限公司 | Data matching method and device, protocol data analysis method, device and system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968560A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Log collector configuration method, device and system |
CN110968560B (en) * | 2018-09-29 | 2023-05-23 | 北京国双科技有限公司 | Configuration method, device and system of log collector |
CN110134615A (en) * | 2019-04-10 | 2019-08-16 | 百度在线网络技术(北京)有限公司 | The method and device of application program acquisition daily record data |
CN110321457A (en) * | 2019-04-19 | 2019-10-11 | 杭州玳数科技有限公司 | Access log resolution rules generation method and device, log analytic method and system |
CN111737950A (en) * | 2020-08-27 | 2020-10-02 | 北京安帝科技有限公司 | Log carrier format extraction method and device based on natural language |
CN112667672A (en) * | 2021-01-06 | 2021-04-16 | 北京启明星辰信息安全技术有限公司 | Log analysis method and analysis device |
CN112667672B (en) * | 2021-01-06 | 2024-05-10 | 北京启明星辰信息安全技术有限公司 | Log analysis method and analysis device |
CN114064390A (en) * | 2021-09-26 | 2022-02-18 | 杭州安恒信息技术股份有限公司 | Log collision rule conversion method, device, system and electronic device |
CN114064390B (en) * | 2021-09-26 | 2025-01-10 | 杭州安恒信息技术股份有限公司 | Log collision rule conversion method, device, system and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN108563629B (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108563629A (en) | Automatic log analysis rule generation method and device | |
CN102043808B (en) | Method and equipment for extracting bilingual terms using webpage structure | |
CN102098331B (en) | Method and system for reducing WEB type application contents | |
CN106970820A (en) | Code storage method and code storage | |
US7676358B2 (en) | System and method for the recognition of organic chemical names in text documents | |
US9110852B1 (en) | Methods and systems for extracting information from text | |
CN103778185A (en) | SQL statement parsing method and system used for database auditing system | |
CN113704575B (en) | SQL method, device, equipment and storage medium for analyzing XML and Java files | |
CN109241080A (en) | A kind of the building application method and its system of FQL query language | |
CN111950263A (en) | A log parsing method, system and electronic device | |
CN113806321A (en) | Log processing method and system | |
CN113886527A (en) | A natural language semantic extraction method and system | |
US9208134B2 (en) | Methods and systems for tokenizing multilingual textual documents | |
CN102270223B (en) | The generation method in source codec storehouse, device and source codec method, device | |
CN120045689A (en) | Data query method, system, terminal and medium based on large language model | |
CN109150962A (en) | A method of quickly identifying HTTP request head by keyword | |
CN110336798B (en) | Message matching filtering method and device based on DPI | |
CN101520778A (en) | Apparatus and method for determing parts-of-speech in chinese | |
CN111984883B (en) | Label mining method, device, equipment and storage medium | |
US20250199779A1 (en) | Method and Device for Parsing Programming Language, and Non-transitory Computer-readable Storage Medium | |
CN102521357A (en) | System and method for achieving accurate matching of texts by automaton | |
Kulkarni et al. | Statistical constituency parser for Sanskrit compounds | |
CN114547169B (en) | File transfer reading and writing method, device, equipment and storage medium | |
CN103246671A (en) | Processing method and device for abstract syntax notation files | |
CN103729379B (en) | Computational methods, method of adjustment and the server of SQL program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220419 |