CN108563629A - Method and device for automatically generating log parsing rules - Google Patents

Method and device for automatically generating log parsing rules

Info

Publication number
CN108563629A
Authority
CN
China
Prior art keywords
log
parsing rules
regular expression
word
generation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810205205.1A
Other languages
Chinese (zh)
Other versions
CN108563629B (en)
Inventor
邸壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Renhe Honesty And Technology Co Ltd
Original Assignee
Beijing Renhe Honesty And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Renhe Honesty And Technology Co Ltd
Priority to CN201810205205.1A
Publication of CN108563629A
Application granted
Publication of CN108563629B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for automatically generating log parsing rules. The method includes: a log word segmentation step of receiving a newly added device log and automatically segmenting it into words; a syntax analysis step of assigning a grammar definition to each segmented word; a regular expression generation step of generating a parsing-rule regular expression from the grammar definitions; and a field mapping step of automatically applying the parsing-rule regular expression to a server-side parsing engine. With the invention, a user can complete device log onboarding automatically without writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.

Description

Method and device for automatically generating log parsing rules
Technical field
The present invention relates to the field of security management technology, and in particular to a method and device for automatically generating log parsing rules.
Background art
In the prior art, logs from devices newly added to a computer system are onboarded by writing code. As a result, log parsing is difficult and complex, and developing parsing rules for logs is very inefficient.
Summary of the invention
The purpose of the present invention is to solve the technical problem that log parsing is difficult and complex, so that developing parsing rules for logs is very inefficient.
To achieve the above purpose, the present invention adopts the following technical solution:
The present invention provides a method for automatically generating log parsing rules, including: a log word segmentation step of receiving a newly added device log and automatically segmenting it into words; a syntax analysis step of assigning a grammar definition to each segmented word; a regular expression generation step of generating a parsing-rule regular expression from the grammar definitions; and a field mapping step of automatically applying the parsing-rule regular expression to a server-side parsing engine.
Preferably, in the log word segmentation step, a finite state automaton is built and analyzes the newly added device log character by character. When a stop word in the stop-word dictionary is encountered, the finite state automaton is exited and a lexical token is output; control then returns to the finite state automaton to continue segmentation until all characters in the newly added device log have been analyzed, so that the newly added device log is split into a word list.
Preferably, syntax analysis rules are built into the computer system or defined by the user. In the syntax analysis step, the lexical token is received and the syntax analysis rules are matched against it. If a syntax analysis rule matches the lexical token, each word in the segmented word list is assigned the grammar definition of the syntax analysis rule that matches its lexical token; if no syntax analysis rule matches the lexical token, a default syntax analysis rule is assigned to the lexical token.
Preferably, in the syntax analysis step, the grammar definition includes one or more of timestamp, IP address, URL, user agent, integer, floating-point number, file, and user name.
Preferably, in the syntax analysis step, different lexical tokens are each matched against the syntax analysis rules; for a given lexical token, the token is matched against multiple syntax analysis rules and the rule with the highest degree of match is selected.
Preferably, in the regular expression generation step, the combination of grammar definitions is converted into the parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
Preferably, in the field mapping step, the server-side parsing engine applies function operations to the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
Preferably, in the field mapping step, the parsing-rule regular expression is automatically uploaded to a server and displayed to the user through a visualization interface; the user confirms and saves the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine.
Preferably, in the field mapping step, the parsing-rule regular expression and the degree of match between the syntax analysis rules and the lexical tokens are automatically uploaded to the server and displayed to the user through the visualization interface; the user can modify the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine.
The present invention also provides a device for automatically generating log parsing rules, which executes the above method and includes: a log word segmentation module that receives a newly added device log and automatically segments it into words; a syntax analysis module that assigns a grammar definition to each segmented word; a regular expression generation module that generates a parsing-rule regular expression from the grammar definitions; and a field mapping module that automatically applies the generated parsing-rule regular expression to the server-side parsing engine.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
With the invention, a user can complete device log onboarding automatically without writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.
Brief description of the drawings
Fig. 1 is a flowchart of the log word segmentation step;
Fig. 2 is a flowchart of the syntax analysis step;
Fig. 3 is a structural diagram of the device for automatically generating log parsing rules.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings so that it is clearer and easier to understand. Those skilled in the art will recognize that the described embodiments can be modified in various ways, or in combinations thereof, without departing from the spirit and scope of the present invention. The drawings and description are therefore illustrative in nature and are not intended to limit the scope of the claims. In addition, in this specification the drawings are not drawn to scale, and identical reference numerals denote identical parts.
Embodiments of the present invention are described in detail below with reference to Figs. 1-3.
The method of the present invention for automatically generating log parsing rules includes a log word segmentation step, a syntax analysis step, a regular expression generation step, and a field mapping step.
In the log word segmentation step, a newly added device log is received and automatically segmented into words.
Preferably, in the log word segmentation step, as shown in Fig. 1, a finite state machine (FSM) is built and analyzes the newly added device log character by character. When a stop word in the stop-word dictionary is encountered, the FSM is exited and a lexical token is output; control then returns to the FSM to continue segmentation until all characters in the newly added device log have been analyzed, so that the log is split into a word list. The stop-word dictionary can be updated dynamically, and different stop-word dictionaries can be configured for different device types as needed.
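The word segmentation step can be illustrated with a minimal Python sketch. It assumes that stop words are single characters and models the FSM with two states (inside a token, between tokens). The function name `tokenize` and the contents of `DEFAULT_STOP_WORDS` are hypothetical; the patent does not specify the dictionary contents or the automaton's exact states.

```python
from typing import Iterable, List

# Hypothetical stop-word dictionary; the patent does not list its contents.
DEFAULT_STOP_WORDS = frozenset({" ", "\t", ",", ";", "|", "[", "]", '"'})


def tokenize(line: str, stop_words: Iterable[str] = DEFAULT_STOP_WORDS) -> List[str]:
    """Split one log line into lexical tokens using a two-state machine."""
    stops = frozenset(stop_words)
    tokens: List[str] = []
    buffer: List[str] = []   # characters of the token currently being built
    in_token = False         # state: False = between tokens, True = inside a token

    for ch in line:
        if ch in stops:
            # A stop word ends the current token: leave the token state and emit it.
            if in_token:
                tokens.append("".join(buffer))
                buffer.clear()
                in_token = False
        else:
            # Any other character extends (or starts) the current token.
            buffer.append(ch)
            in_token = True

    # Emit the last token if the line did not end with a stop word.
    if in_token:
        tokens.append("".join(buffer))
    return tokens
```

Passing a different `stop_words` set per device type corresponds to the per-device stop-word dictionaries mentioned above.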
In the syntax analysis step, a grammar definition is assigned to each segmented word.
Preferably, syntax analysis rules are built into the computer system or defined by the user. In the syntax analysis step, as shown in Fig. 2, the lexical token is received and the syntax analysis rules are matched against it. If a syntax analysis rule matches the lexical token, each word in the segmented word list is assigned the grammar definition of the syntax analysis rule that matches its lexical token. If no syntax analysis rule matches the lexical token, a default syntax analysis rule is assigned to the lexical token.
Preferably, a syntax analysis rule has two parts. The first part is the grammar definition, which includes but is not limited to timestamp, IP address, URL, user agent (User-Agent), integer, floating-point number, file, and user name. The second part is the regular expression definition; a different regular expression is formulated for each grammar definition.
Preferably, in the syntax analysis step, different lexical tokens are matched against the syntax analysis rules in multiple threads. For a given lexical token, the token is matched against multiple syntax analysis rules and the rule with the highest degree of match is selected. Matching results can therefore be output efficiently.
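A minimal Python sketch of this rule matching follows. The patent does not define how the degree of match is computed, so the sketch assumes it is the fraction of the token covered by a match anchored at the token's start. The rule set, the regular expressions, and the names `SyntaxRule`, `match_degree`, `classify`, and `classify_all` are all illustrative assumptions.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SyntaxRule:
    definition: str  # part 1: grammar definition, e.g. "ip_address"
    pattern: str     # part 2: regular expression for that definition


# Hypothetical built-in rules; a real deployment would also load user-defined ones.
RULES = [
    SyntaxRule("timestamp", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
    SyntaxRule("ip_address", r"\d{1,3}(?:\.\d{1,3}){3}"),
    SyntaxRule("float", r"-?\d+\.\d+"),
    SyntaxRule("integer", r"-?\d+"),
]
DEFAULT_RULE = SyntaxRule("text", r"\S+")  # assigned when nothing matches


def match_degree(rule: SyntaxRule, token: str) -> float:
    """Assumed metric: share of the token covered by a match from its start."""
    m = re.match(rule.pattern, token)
    return m.end() / len(token) if m and token else 0.0


def classify(token: str) -> Tuple[str, SyntaxRule, float]:
    """Return the token, the best-matching rule, and its degree of match."""
    best = max(RULES, key=lambda rule: match_degree(rule, token))
    degree = match_degree(best, token)
    if degree == 0.0:
        return token, DEFAULT_RULE, 0.0
    return token, best, degree


def classify_all(tokens: List[str]) -> List[Tuple[str, SyntaxRule, float]]:
    """Classify tokens in a thread pool; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(classify, tokens))
```

Under this metric the token `192.168.1.10` also partially matches the `float` and `integer` rules, but only `ip_address` covers the whole token, so it is selected. A stricter variant could require a minimum degree before accepting a rule.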
In the regular expression generation step, a parsing-rule regular expression is generated from the grammar definitions.
Preferably, in the regular expression generation step, the combination of grammar definitions is converted into the parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
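One way to picture this splicing is the following Python sketch. It assumes each parsed segment becomes a named capture group and each unparsed fragment is inserted as an escaped literal. The `Segment` layout, the group-naming scheme, and the function name `build_parsing_regex` are assumptions for illustration, not details from the patent.

```python
import re
from typing import Dict, List, Optional, Tuple

# (raw text, grammar definition or None, regex for that definition or None)
Segment = Tuple[str, Optional[str], Optional[str]]


def build_parsing_regex(segments: List[Segment]) -> str:
    """Join grammar-definition patterns and unparsed fragments into one regex."""
    parts: List[str] = []
    seen: Dict[str, int] = {}
    for text, definition, pattern in segments:
        if definition is None or pattern is None:
            # Unparsed fragment: keep it verbatim as an escaped literal.
            parts.append(re.escape(text))
            continue
        # Number repeated definitions so that group names stay unique.
        seen[definition] = seen.get(definition, 0) + 1
        group = f"{definition}_{seen[definition]}"
        parts.append(f"(?P<{group}>{pattern})")
    return "^" + "".join(parts) + "$"


# Example input: three parsed tokens separated by two unparsed fragments.
EXAMPLE_SEGMENTS: List[Segment] = [
    ("10.0.0.1", "ip_address", r"\d{1,3}(?:\.\d{1,3}){3}"),
    (" - ", None, None),
    ("200", "integer", r"-?\d+"),
    (" ", None, None),
    ("0.25", "float", r"-?\d+\.\d+"),
]
EXAMPLE_REGEX = build_parsing_regex(EXAMPLE_SEGMENTS)
```

The generated expression is derived from one sample line but matches other lines of the same shape, for example `192.168.0.7 - 404 1.5`.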
In the field mapping step, the parsing-rule regular expression is automatically applied to the server-side parsing engine.
Preferably, in the field mapping step, the server-side parsing engine applies function operations to the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
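The field mapping can be sketched in Python as follows, assuming that a "function operation" is a per-field conversion function and that the mapping is a lookup from regex group name to a final field name. The names `MAPPING`, `PATTERN`, and `map_fields`, and the final field names, are hypothetical.

```python
import re
from typing import Any, Callable, Dict, Optional, Tuple

# group name in the regex -> (final field name, conversion function)
FieldMapping = Dict[str, Tuple[str, Callable[[str], Any]]]

# Hypothetical mapping for an access-log-like line.
MAPPING: FieldMapping = {
    "ip_address_1": ("src_ip", str),
    "integer_1": ("status", int),
    "float_1": ("duration_s", float),
}

# Hypothetical parsing-rule regular expression with named groups.
PATTERN = (
    r"^(?P<ip_address_1>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<integer_1>\d+) (?P<float_1>\d+\.\d+)$"
)


def map_fields(pattern: str, line: str, mapping: FieldMapping) -> Optional[Dict[str, Any]]:
    """Parse a line and rename/convert captured fields; None if no match."""
    match = re.match(pattern, line)
    if match is None:
        return None
    event: Dict[str, Any] = {}
    for group, raw in match.groupdict().items():
        if raw is None:
            continue  # optional group that did not participate in the match
        # Unmapped groups keep their name and stay strings.
        final_name, convert = mapping.get(group, (group, str))
        event[final_name] = convert(raw)
    return event
```

For the line `10.0.0.1 200 0.25` this yields a dictionary with `src_ip`, `status` as an integer, and `duration_s` as a float.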
Preferably, in the field mapping step, the parsing-rule regular expression is automatically uploaded to a server and displayed to the user through a visualization interface; the user confirms and saves the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine.
Preferably, in the field mapping step, the parsing-rule regular expression and the degree of match between the syntax analysis rules and the lexical tokens are automatically uploaded to the server and displayed to the user through the visualization interface. The user modifies the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine, so that the parsing-rule regular expression is optimized.
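The review loop can be modeled with a small Python sketch. The patent does not describe the upload format or the server API, so everything here is an assumption: the draft is serialized as JSON together with the match degrees, a user revision is accepted only if the new expression compiles, and any revision clears the confirmation flag so the rule must be confirmed again before it is reissued. `RuleDraft` and its methods are hypothetical names.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class RuleDraft:
    regex: str                       # generated parsing-rule regular expression
    match_degrees: Dict[str, float]  # field name -> degree of match shown to the user
    confirmed: bool = False

    def to_payload(self) -> str:
        """Serialize the draft for upload (assumed JSON format)."""
        return json.dumps(asdict(self), sort_keys=True)

    def revise(self, new_regex: str) -> None:
        """Apply a user edit; re.compile raises re.error if it is invalid."""
        re.compile(new_regex)
        self.regex = new_regex
        self.confirmed = False  # an edited rule must be confirmed again

    def confirm(self) -> None:
        """Mark the draft as confirmed by the user."""
        self.confirmed = True
```

Showing the match degrees next to the expression lets the user focus on low-confidence fields when deciding what to edit.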
The present invention also includes a device for automatically generating log parsing rules, which executes the above method. As shown in Fig. 3, the device includes: a log word segmentation module that receives a newly added device log and automatically segments it into words; a syntax analysis module that assigns a grammar definition to each segmented word; a regular expression generation module that generates a parsing-rule regular expression from the grammar definitions; and a field mapping module that automatically applies the generated parsing-rule regular expression to the server-side parsing engine.
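The four-module structure can be sketched as a simple Python pipeline in which each module is an injected callable. This is only an illustration of how the modules hand data to one another; the class name `RuleGenerator`, the callable signatures, and the trivial stand-in modules in the example are assumptions, not the device's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, grammar definition)


@dataclass
class RuleGenerator:
    segment: Callable[[str], List[str]]     # log word segmentation module
    analyze: Callable[[List[str]], Tagged]  # syntax analysis module
    generate: Callable[[Tagged], str]       # regular expression generation module
    publish: Callable[[str], None]          # field mapping module (hands off to the engine)

    def run(self, log_line: str) -> str:
        """Run the four steps in order and return the generated regex."""
        words = self.segment(log_line)
        tagged = self.analyze(words)
        regex = self.generate(tagged)
        self.publish(regex)
        return regex


# Trivial stand-in modules, for demonstration only.
published: List[str] = []
demo = RuleGenerator(
    segment=str.split,
    analyze=lambda words: [(w, "integer" if w.isdigit() else "text") for w in words],
    generate=lambda tagged: r"\s+".join(
        r"(\d+)" if definition == "integer" else r"(\S+)" for _, definition in tagged
    ),
    publish=published.append,
)
```

Injecting the modules as callables keeps each step replaceable, for example swapping the whitespace splitter for an FSM-based segmenter.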
With the invention, a user can complete device log onboarding automatically without writing any code, which significantly reduces the difficulty and complexity of log parsing and improves the efficiency of developing parsing rules for logs.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for automatically generating log parsing rules, including:
a log word segmentation step of receiving a newly added device log and automatically segmenting the newly added device log into words;
a syntax analysis step of assigning a grammar definition to each segmented word;
a regular expression generation step of generating a parsing-rule regular expression from the grammar definitions; and
a field mapping step of automatically applying the parsing-rule regular expression to a server-side parsing engine.
2. The method for automatically generating log parsing rules according to claim 1, wherein in the log word segmentation step a finite state automaton is built and analyzes the newly added device log character by character; when a stop word in the stop-word dictionary is encountered, the finite state automaton is exited and a lexical token is output, and control then returns to the finite state automaton to continue segmentation until all characters in the newly added device log have been analyzed, so that the newly added device log is split into a word list.
3. The method for automatically generating log parsing rules according to claim 2, wherein syntax analysis rules are built into the computer system or defined by the user; in the syntax analysis step, the lexical token is received and the syntax analysis rules are matched against the lexical token;
if a syntax analysis rule matches the lexical token, each word in the segmented word list is assigned the grammar definition of the syntax analysis rule that matches its lexical token;
if no syntax analysis rule matches the lexical token, a default syntax analysis rule is assigned to the lexical token.
4. The method for automatically generating log parsing rules according to claim 3, wherein in the syntax analysis step the grammar definition includes one or more of timestamp, IP address, URL, user agent, integer, floating-point number, file, and user name.
5. The method for automatically generating log parsing rules according to claim 3, wherein in the syntax analysis step different lexical tokens are matched against the syntax analysis rules in multiple threads; for a given lexical token, the token is matched against multiple syntax analysis rules and the rule with the highest degree of match is selected.
6. The method for automatically generating log parsing rules according to claim 3, wherein in the regular expression generation step the combination of grammar definitions is converted into the parsing-rule regular expression and spliced with the log fragments that were not successfully parsed.
7. The method for automatically generating log parsing rules according to claim 6, wherein in the field mapping step the server-side parsing engine applies function operations to the fields in the parsing-rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
8. The method for automatically generating log parsing rules according to claim 7, wherein in the field mapping step the parsing-rule regular expression is automatically uploaded to a server and displayed to the user through a visualization interface; the user confirms and saves the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine.
9. The method for automatically generating log parsing rules according to claim 8, wherein in the field mapping step the parsing-rule regular expression and the degree of match between the syntax analysis rules and the lexical tokens are automatically uploaded to the server and displayed to the user through the visualization interface; the user modifies the parsing-rule regular expression through the visualization interface, and it is then reissued to the server-side parsing engine.
10. A device for automatically generating log parsing rules, for executing the method for automatically generating log parsing rules according to any one of claims 1-9, the device including:
a log word segmentation module that receives a newly added device log and automatically segments the newly added device log into words;
a syntax analysis module that assigns a grammar definition to each segmented word;
a regular expression generation module that generates a parsing-rule regular expression from the grammar definitions; and
a field mapping module that automatically applies the generated parsing-rule regular expression to the server-side parsing engine.
CN201810205205.1A 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device Expired - Fee Related CN108563629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810205205.1A CN108563629B (en) 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810205205.1A CN108563629B (en) 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device

Publications (2)

Publication Number Publication Date
CN108563629A true CN108563629A (en) 2018-09-21
CN108563629B CN108563629B (en) 2022-04-19

Family

ID=63531515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810205205.1A Expired - Fee Related CN108563629B (en) 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device

Country Status (1)

Country Link
CN (1) CN108563629B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1759354A (en) * 2003-01-09 2006-04-12 思科系统公司 Methods and apparatuses for evaluation of regular expressions of arbitrary size
CN1975725A (en) * 2006-12-12 2007-06-06 华为技术有限公司 Method and system for managing journal
US20080109905A1 (en) * 2006-11-03 2008-05-08 Grosse Eric H Methods and apparatus for detecting unwanted traffic in one or more packet networks utilizing string analysis
US20110083123A1 (en) * 2009-10-05 2011-04-07 Microsoft Corporation Automatically localizing root error through log analysis
CN102955914A (en) * 2011-08-19 2013-03-06 百度在线网络技术(北京)有限公司 Method and device for detecting security flaws of source files
CN104144071A (en) * 2013-05-10 2014-11-12 北京新媒传信科技有限公司 System log processing method and platform
CN104391881A (en) * 2014-10-30 2015-03-04 杭州安恒信息技术有限公司 Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system
CN106294673A (en) * 2016-08-08 2017-01-04 杭州玳数科技有限公司 A kind of method and system of User Defined rule real time parsing daily record data
CN106790109A (en) * 2016-12-26 2017-05-31 东软集团股份有限公司 Data matching method and device, protocol data analysis method, device and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968560A (en) * 2018-09-29 2020-04-07 北京国双科技有限公司 Log collector configuration method, device and system
CN110968560B (en) * 2018-09-29 2023-05-23 北京国双科技有限公司 Configuration method, device and system of log collector
CN110134615A (en) * 2019-04-10 2019-08-16 百度在线网络技术(北京)有限公司 The method and device of application program acquisition daily record data
CN110321457A (en) * 2019-04-19 2019-10-11 杭州玳数科技有限公司 Access log resolution rules generation method and device, log analytic method and system
CN111737950A (en) * 2020-08-27 2020-10-02 北京安帝科技有限公司 Log carrier format extraction method and device based on natural language
CN112667672A (en) * 2021-01-06 2021-04-16 北京启明星辰信息安全技术有限公司 Log analysis method and analysis device
CN112667672B (en) * 2021-01-06 2024-05-10 北京启明星辰信息安全技术有限公司 Log analysis method and analysis device
CN114064390A (en) * 2021-09-26 2022-02-18 杭州安恒信息技术股份有限公司 Log collision rule conversion method, device, system and electronic device
CN114064390B (en) * 2021-09-26 2025-01-10 杭州安恒信息技术股份有限公司 Log collision rule conversion method, device, system and electronic device

Also Published As

Publication number Publication date
CN108563629B (en) 2022-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220419