[go: up one dir, main page]

CN112559818A - Character string matching method, device, equipment and storage medium - Google Patents

Character string matching method, device, equipment and storage medium Download PDF

Info

Publication number
CN112559818A
CN112559818A CN202011482497.7A CN202011482497A CN112559818A CN 112559818 A CN112559818 A CN 112559818A CN 202011482497 A CN202011482497 A CN 202011482497A CN 112559818 A CN112559818 A CN 112559818A
Authority
CN
China
Prior art keywords
matching
node
current node
substrings
substring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011482497.7A
Other languages
Chinese (zh)
Other versions
CN112559818B (en
Inventor
余东祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202011482497.7A priority Critical patent/CN112559818B/en
Publication of CN112559818A publication Critical patent/CN112559818A/en
Application granted granted Critical
Publication of CN112559818B publication Critical patent/CN112559818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

本发明公开了一种字符串匹配方法、装置、设备及存储介质,方法包括:匹配两个字符串的第一匹配段记入根节点;从两个字符串中获取第一匹配段的左子串和右子串分别作为根节点的左节点和右节点;将左节点和右节点依次作为当前节点,匹配当前节点的两个子串的第二匹配段;若匹配到,将匹配的第二匹配段记入当前节点,从当前节点的两个子串中获取第二匹配段的左子串和右子串分别作为当前节点的左节点和右节点,继续进行匹配;若未匹配到,确定当前节点的两个子串的不匹配类型记入当前节点。通过在父串和子串中始终查找匹配段建立匹配树的节点,以保证原始字符串的顺序,由于每一节点记录中间匹配结果,因此可以灵活操作每个节点。

Figure 202011482497

The invention discloses a string matching method, device, equipment and storage medium. The method includes: recording a first matching segment matching two strings into a root node; obtaining the left child of the first matching segment from the two strings The string and the right substring are used as the left node and the right node of the root node respectively; the left node and the right node are taken as the current node in turn, and the second matching segment of the two substrings of the current node is matched; The segment is recorded in the current node, and the left substring and right substring of the second matching segment are obtained from the two substrings of the current node as the left and right nodes of the current node, respectively, and the matching continues; if no match is found, determine the current node. The mismatched types of the two substrings of are recorded in the current node. The nodes of the matching tree are established by always finding matching segments in the parent string and substring to ensure the order of the original string. Since each node records the intermediate matching result, each node can be flexibly operated.

Figure 202011482497

Description

Character string matching method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a character string matching method, a device, equipment and a storage medium, which are applied to the field of intelligent decision making.
Background
With the rise of artificial intelligence technology, character string matching is involved in many technical fields (such as natural language processing, text analysis, etc.), so that it is a problem that people always pay attention to how to develop a matching scheme with strong practicability.
At present, most of the existing character string matching tools in the industry are not visual enough, and some character string matching tools only show different positions and are not friendly to end users; some can only find out the same or different places, but the sequence is disordered, so that the practicability is poor; some of them will have matching position errors.
Disclosure of Invention
The present invention provides a method, an apparatus, a device and a storage medium for matching a character string, which are provided to overcome the above-mentioned shortcomings in the prior art.
The first aspect of the present invention provides a character string matching method, including:
s1, matching the first matching segments of the two character strings and recording the first matching segments into a root node;
s2, acquiring a left substring and a right substring of the first matching section from the two character strings to be respectively used as a left node and a right node of the root node;
s3, sequentially taking the left node and the right node as current nodes, and matching second matching sections of the two substrings of the current nodes;
s4, if the matching is achieved, recording the matched second matching segment into the current node, acquiring a left substring and a right substring of the second matching segment from the two substrings of the current node as a left node and a right node of the current node respectively, and continuing to execute the step S3;
and S5, if the substrings are not matched, determining the type of mismatch of the two substrings of the current node and recording the type of mismatch into the current node.
A second aspect of the present invention provides a character string matching apparatus, including:
the root node module is used for matching the first matching sections of the two character strings and recording the first matching sections into a root node;
the sub-node module is used for acquiring a left sub-string and a right sub-string of the first matching section from the two character strings to be respectively used as a left node and a right node of the root node;
the substring matching module is used for sequentially taking the left node and the right node as current nodes and matching second matching sections of the two substrings of the current nodes;
the first processing module is used for recording the matched second matching section into the current node when the matching is finished, acquiring a left substring and a right substring of the second matching section from the two substrings of the current node to be respectively used as the left node and the right node of the current node, and continuously executing the process of the substring matching module;
and the second processing module is used for determining the mismatch type of the two substrings of the current node and recording the mismatch type into the current node when the substrings are not matched.
A third aspect of the present invention provides a character string matching apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the character string matching method according to the first aspect when executing the program.
A fourth aspect of the present invention proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the string matching method according to the first aspect described above.
The character string matching method based on the first aspect has the following beneficial effects:
the matching segments are found out from the two character strings and used as root nodes of the matching tree, child nodes of the matching tree are established by searching the matching segments in subsequent left sub-string matching and right sub-string matching all the time, the sequence of the original character strings is ensured, and because the middle matching result is recorded by each node in the matching tree, each node can be flexibly operated, the practicability is good, the output result can be changed by modifying the output mode of each node in the subsequent traversal output of the matching tree, the purpose of separating the matching tree from the output matching result can be achieved, the code coupling degree is smaller, and the expansion is convenient. This scheme can be applied to in the wisdom education field to promote the construction in wisdom city.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow diagram illustrating an embodiment of a string matching method in accordance with an illustrative embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a matching tree building process according to the present invention;
FIG. 3 is a schematic diagram illustrating a matching result output according to the embodiment of FIG. 2;
fig. 4 is a hardware configuration diagram illustrating a character string matching apparatus according to an exemplary embodiment of the present invention;
fig. 5 is a flowchart illustrating an embodiment of a string matching apparatus according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
In order to solve the problems that most of character string matching tools in the industry are not flexible enough and poor in practicability, the application provides an improved character string matching method, namely matching sections are found out in two character strings to serve as root nodes of a matching tree, child nodes of the matching tree are established by finding out the matching sections in subsequent left substring matching and right substring matching all the time, so that the sequence of original character strings is guaranteed, each node in the matching tree records a middle matching result, therefore, the operation on each node can be flexible, the practicability is good, the output result can be changed by modifying the output mode of each node when the matching tree is traversed and output, the purpose of separating the matching tree from the output matching result can be achieved, the code coupling degree is smaller, and the expansion is facilitated.
The following describes the character string matching method proposed in the present application in detail with specific embodiments.
Fig. 1 is a flowchart illustrating an embodiment of a string matching method according to an exemplary embodiment of the present invention, where the string matching method may be applied to a string matching device (e.g., a terminal, a server, etc.). As shown in fig. 1, the character string matching method includes the following steps:
step 101: the first matching segments of the two strings are matched and logged into the root node.
In the construction of a smart city, taking a recording monitoring service scene as an example, the matched character string is a correct sentence string, and the matched character string is a sentence string input by a user and needs to be matched with the correct sentence string.
In one embodiment, step 101 is a first match of two strings, and a longest matching segment can be found from the two strings to avoid the problem of matching position error, and the found longest matching segment is used as a first matching segment.
When searching the longest matching segment, the matching character string is used to search the longest matching segment in the matched character string.
In one example, the specific search process may be: taking the first character in the matched character string as a current character, searching the character which is the same as the current character in the matched character string, matching the next character of the current character with the next character of the character searched in the matched character string if the character is searched, taking the next character of the current character as the current character if the character is successfully matched, matching the next character of the current character with the next character of the character successfully matched in the matched character string, continuing to execute the process of matching the next character of the current character with the next character of the character successfully matched in the matched character string if the character is successfully matched, and taking the character searched in the matched character string and the character successfully matched as a matching segment until the matching failure occurs.
Then, the second character in the matched character string is used as the current character, the searching process is executed until the last character in the matched character string is used as the current character to finish searching, so that a plurality of matched segments can be obtained, and finally, one matched segment with the longest character length is selected from the plurality of matched segments to be used as the longest matched segment.
In the searching process, the space between the characters is used as an interval for searching. For example, two strings are "a b c d" and "a l b c e d f", respectively, and the longest "b c" substring will eventually be taken as the longest matching segment.
It can be understood by those skilled in the art that the above-mentioned search process is only an example, and the search mode of the longest matching segment can also be implemented by using other related technologies, which is not specifically limited in this application.
It should be noted that, if the longest matching segment is not found in the first matching process of the two character strings, it indicates that there is no matching place between the two character strings.
Step 102: and acquiring a left substring and a right substring of the first matching section from the two character strings to be used as a left node and a right node of the root node respectively.
After the root node of the matching tree is obtained, the left substring is used as the left node of the root node, and the right substring is used as the right node of the root node, so that the sequence of the original character strings can be ensured.
Still taking the two strings "a b c d" and "a l b c e d f" as examples, and the root node is "b c", the "a" and "a l" in the two strings will be the left node of the root node, and "d" and "e d f" will be the right node of the root node.
Step 103: and taking the left node and the right node as the current nodes in sequence, and matching the second matching sections of the two substrings of the current nodes.
In an embodiment, for the process of matching the second matching segment of the two substrings of the current node, the longest matching segment may be found from the two substrings to avoid the problem of matching position error, and the found longest matching segment is used as the second matching segment. For the search principle of the second matching segment, reference may be made to the description of the search principle of the first matching segment in step 101, which is not described herein again.
Based on the matching process of the first matching segment and the second matching segment, the longest matching segment is always searched in the parent string and the substring, so that the problem of wrong matching positions of the two character strings can be avoided in the whole matching process.
Step 104: if the matching is achieved, the matched second matching section is recorded into the current node, a left substring and a right substring of the second matching section are obtained from the two substrings of the current node and are respectively used as a left node and a right node of the current node, and the process of the step 103 is continuously executed.
Step 105: if not, determining the mismatch type of the two substrings of the current node and recording the mismatch type into the current node.
Still taking two character strings of "a b c d" and "a l b c e d f" as examples, regarding "a" and "a l" on the left node of the root node, the matched second matching segment is "a" and is recorded into the left node, the "a" and "l" are taken out from "a" and "a l" as the right node of the node, the "a" and "l" are continuously matched, the longest matching segment is not matched, and the type of mismatch is determined and is recorded into the right node of the node; for the'd' and 'e d f' on the right node of the root node, the matched second matching section is'd' and is recorded in the right node, the 'e' and the 'e' are taken out from the'd' and the 'e d f' as the left node of the node, the 'f' and the 'e' are taken out as the right node of the node, the 'e' and the 'e' are continuously matched and are not matched in the longest matching section, the mismatch type is determined to enter the left node of the node, the 'e' and the 'f' are continuously matched and are not matched in the longest matching section, the mismatch type is determined to enter the right node of the node, and therefore the establishment of a matching tree is completed, and the matching tree is composed of 6 nodes.
In one embodiment, for the process of determining the mismatch type of the two substrings of the current node, the following three cases can be distinguished:
firstly, if one substring of two substrings of a current node belongs to a matched character string and the other substring is empty, determining that the mismatch type of the two substrings is redundant;
secondly, if one substring of the two substrings of the current node belongs to the matched character string and the other substring is empty, determining that the mismatch type of the two substrings is missing;
and thirdly, if the two substrings of the current node are not empty, determining that the mismatch type of the two substrings is an error.
Wherein, redundant means that the substring belonging to the matching character string is redundant in the matched character string; "missing" refers to the absence of a substring belonging to the matched string in the matching string; "error" means that there is no longest matching segment between the substring belonging to the matching string and the substring belonging to the matched string.
Based on the above example, if "a b c d" is a matched string, "a l b c e d f" is a matched string, the mismatch type of the two substrings "" and "l" is redundant, the mismatch type of "" and "e" is redundant, and the mismatch type of "" and "f" is redundant.
It should be noted that, in order to distinguish different matching result types of each node, in step 101, while the first matching segment is recorded in the root node, a first mark corresponding to a matching success type may be marked for the root node; in step 104, the matched second matching segment is recorded into the current node, and simultaneously, a first mark corresponding to the matching success type is marked for the current node; in step 105, while recording the mismatch type into the current node, marking a second mark corresponding to the mismatch type for the current node;
the second mark and the first mark are used for representing display colors when the matching results of the two substrings of the current node are output, and different mismatch types correspond to different second marks.
Illustratively, the first label corresponding to the matching success type is green, the second label corresponding to the missing type is gray, the second label corresponding to the redundant type is yellow, and the second label corresponding to the error type is red.
Further, for two sub-strings with mismatch types of errors which are not empty, while the mismatch types of errors are recorded in the current node, a third mark may be added to the sub-string belonging to the matched character string, and a fourth mark may be added to the sub-string belonging to the matched character string, so as to distinguish the belonging character strings in different display manners indicated by different marks.
The third mark and the fourth mark are used for distinguishing different display modes of the two substrings. For example, the third label is underlined, the fourth label is not underlined, and for example, the third label is with a strikethrough and the fourth label is not strikethrough.
In summary, referring to the schematic diagram of the matching tree establishment process shown in fig. 2, taking the recording monitoring in the smart city construction as an example, it is assumed that the matched character string is "hello I love you and this is what I needed", belongs to the correct sentence string, and the matched character string is "I ate you that is not what I needed", belongs to the sentence string entered by the user, and needs to be matched with the correct sentence string. Firstly, the longest matching segment of the two character strings is 'what I need', so that the root node 1 of the matching tree is a green mark corresponding to 'what I need';
the child node 2 (left node) of the root node is a green mark corresponding to the longest matching segment 'I' of two substrings 'hello I love you and this is' and 'I ate you this is not', and the right node of the root node does not exist;
a child node 3 (left node) of the child node 2 is two substrings of "hello" and "", the corresponding mismatch type is missing, the gray mark is marked, and a child node 4 (right node) of the child node 2 is the longest matching section of "you" corresponding to the two substrings of "love you and this is" and "just you this is not";
the child node 5 (left node) of the child node 4 is two substrings "love" and "ate", the corresponding mismatch type is error, the fourth label of "love" is not underlined, the third label of "ate" is underlined, the child node 6 (right node) of the child node 4 is the longest matching segment of the two substrings "and this is" and "that is not" and is corresponding to "is" and green label;
the child node 7 (left node) of the child node 6 is two substrings "and this" and "that", the corresponding mismatch type is error, red mark, the fourth mark of "and this" is not underlined, the third mark of "that" is underlined, the child node 8 (right node) of the child node 6 is two substrings "and" not ", the corresponding mismatch type is redundant, yellow mark.
Still further, after the matching process of the two character strings is completed, the matching result of the two character strings can be output according to the mark recorded on each node.
Because each node can be flexibly operated, the output mode of each node can be easily modified when the matching result is output.
Illustratively, the matching result can be output in an html tag manner, so that the browser can conveniently display the result, and the user can visually check the result.
Based on the matching tree shown in fig. 2, each node on the matching tree is traversed, and the matching result of two character strings according to the mark recorded on each node is output as shown in fig. 3.
So far, the matching process shown in fig. 1 is completed, the matching segments are found out from the two character strings and used as root nodes of the matching tree, child nodes of the matching tree are established by finding out the matching segments all the time in subsequent left sub-string matching and right sub-string matching, so as to ensure the sequence of the original character strings, and as each node in the matching tree records the middle matching result, each node can be flexibly operated, the practicability is good, and the output result can be changed by modifying the output mode of each node when the matching tree is traversed and output, so that the purpose of separating the matching tree from the output matching result can be achieved, the code coupling degree is smaller, and the expansion is convenient.
Fig. 4 is a schematic diagram of a hardware structure of a character string matching apparatus according to an exemplary embodiment of the present invention. As shown in fig. 4, the character string matching apparatus includes a processor, a nonvolatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the character string matching device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize the character string matching method described above when being executed by the processor. The processor of the string matching device is used for providing calculation and control capability and supporting the operation of the whole string matching device. The memory of the string matching device may have stored therein computer readable instructions that, when executed by the processor, may cause the processor to perform a string matching method. The network interface of the character string matching device is used for connecting and communicating with the terminal.
It will be understood by those skilled in the art that the structure shown in fig. 4 is a block diagram of only a part of the structure related to the scheme of the present invention, and does not constitute a limitation of the character string matching apparatus to which the scheme of the present invention is applied, and a specific character string matching apparatus may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.
Corresponding to the embodiment of the character string matching method, the invention also provides an embodiment of a character string matching device.
Fig. 5 is a flowchart illustrating an embodiment of a string matching apparatus according to an exemplary embodiment of the present invention, which may be applied to a string matching device. As shown in fig. 5, the character string matching apparatus includes:
a root node module 510, configured to match the first matching segments of the two character strings and record the first matching segments into a root node;
a sub-node module 520, configured to obtain, from the two character strings, a left sub-string and a right sub-string of the first matching section, which are respectively used as a left node and a right node of the root node;
the substring matching module 530 is used for sequentially taking the left node and the right node as the current node and matching the second matching sections of the two substrings of the current node;
the first processing module 540 is configured to, when matching is completed, record the matched second matching segment into the current node, obtain a left substring and a right substring of the second matching segment from the two substrings of the current node, respectively serve as the left node and the right node of the current node, and continue to execute the process of the substring matching module;
and a second processing module 550, configured to determine a mismatch type of the two substrings of the current node when the substrings are not matched, and record the mismatch type into the current node.
In an alternative implementation, the first matching segment is the longest matching segment in the two character strings; and the second matching section is the longest matching section in the two substrings of the current node.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
the first marking module is used for marking the first mark corresponding to the matching success type for the root node while recording the first matching section into the root node; recording the matched second matching section into the current node, and marking a first mark corresponding to the matching success type for the current node; when the unmatched type is recorded into the current node, marking a second mark corresponding to the unmatched type for the current node; and the different mismatch types correspond to different second marks, and the second marks and the first marks are used for representing display colors when the matching results of the two substrings of the current node are output.
In an optional implementation manner, the two character strings are a matched character string and a matching character string respectively; the second processing module 550 is specifically configured to, in the process of determining the mismatch type of the two substrings of the current node, determine that the mismatch type of the two substrings is redundant if one of the two substrings of the current node belongs to a matching character string and the other substring is empty; if one substring of the two substrings of the current node belongs to the matched character string and the other substring is empty, determining that the mismatch type of the two substrings is missing; and if the two substrings of the current node are not empty, determining that the mismatch type of the two substrings is an error.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
the second marking module is used for adding a third mark to the substring belonging to the matched character string and adding a fourth mark to the substring belonging to the matched character string while determining that the mismatch type of the two substrings is wrong and recording the mismatch type into the current node; wherein the third mark and the fourth mark are used for representing different display modes.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
and the output module is used for outputting the matching result of the two character strings according to the color mark and/or the display mark recorded on each node in an html tag mode after the matching process of the two character strings is completed.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The present invention also provides another embodiment, which is to provide a computer-readable storage medium having a computer program stored thereon, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of any one of the character string matching methods described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1.一种字符串匹配方法,其特征在于,所述方法包括:1. a string matching method, it is characterised in that the method comprises: S1、匹配两个字符串的第一匹配段并记入根节点;S1. Match the first matching segment of two strings and record it in the root node; S2、从所述两个字符串中获取第一匹配段的左子串和右子串分别作为所述根节点的左节点和右节点;S2, obtain the left substring and the right substring of the first matching segment from the two character strings as the left node and the right node of the root node respectively; S3、将左节点和右节点依次作为当前节点,并匹配当前节点的两个子串的第二匹配段;S3, take the left node and the right node as the current node in turn, and match the second matching segment of the two substrings of the current node; S4、若匹配到,则将匹配到的第二匹配段记入当前节点,从当前节点的两个子串中获取第二匹配段的左子串和右子串分别作为当前节点的左节点和右节点,并继续执行步骤S3;S4. If it matches, record the matched second matching segment into the current node, and obtain the left substring and right substring of the second matching segment from the two substrings of the current node as the left node and right substring of the current node, respectively. node, and continue to perform step S3; S5、若未匹配到,则确定当前节点的两个子串的不匹配类型并记入当前节点。S5. If no match is found, determine the mismatch type of the two substrings of the current node and record them in the current node. 2.根据权利要求1所述的方法,其特征在于,所述第一匹配段为所述两个字符串中的最长匹配段;2. The method according to claim 1, wherein the first matching segment is the longest matching segment in the two character strings; 所述第二匹配段为两个子串中的最长匹配段。The second matching segment is the longest matching segment in the two substrings. 3.根据权利要求1所述的方法,其特征在于,所述方法还包括:3. The method according to claim 1, wherein the method further comprises: 将第一匹配段记入根节点的同时,为根节点标记匹配成功类型对应的第一标记;While recording the first matching segment into the root node, mark the root node with the first mark corresponding to the successful matching type; 将匹配到的第二匹配段记入当前节点的同时,为当前节点标记匹配成功类型对应的第一标记;While recording the matched second matching segment into the current node, mark the first mark corresponding to the matching successful type for the current node; 将不匹配类型记入当前节点的同时,为当前节点标记所述不匹配类型对应的第二标记;While recording the mismatch type into the current node, mark the current node with a second marker corresponding to the mismatch type; 其中,不同的不匹配类型对应不同的第二标记,所述第二标记与第一标记均用于表征在输出当前节点的两个子串匹配结果时的显示颜色。Wherein, different mismatch types correspond to different second marks, and both the second mark and the first mark are used to represent the display color when the matching result of the two substrings of the current node is output. 4.根据权利要求3所述的方法,其特征在于,所述两个字符串分别为被匹配字符串和匹配字符串;所述确定当前节点的两个子串的不匹配类型,包括:4. The method according to claim 3, wherein the two character strings are respectively a matched character string and a matching character string; the described determination of the mismatch types of the two substrings of the current node comprises: 若当前节点的两个子串中的一个子串属于匹配字符串,另一个子串为空,则确定两个子串的不匹配类型为多余;If one of the two substrings of the current node belongs to the matching string and the other substring is empty, the mismatch type of the two substrings is determined to be redundant; 若当前节点的两个子串中的一个子串属于被匹配字符串,另一个子串为空,则确定两个子串的不匹配类型为缺失;If one of the two substrings of the current node belongs to the matched string and the other substring is empty, the mismatch type of the two substrings is determined to be missing; 若当前节点的两个子串均不为空,则确定两个子串的不匹配类型为错误。If neither of the two substrings of the current node is empty, it is determined that the mismatch type of the two substrings is an error. 5.根据权利要求4所述的方法,其特征在于,所述方法还包括:5. The method according to claim 4, wherein the method further comprises: 在确定两个子串的不匹配类型为错误并记入当前节点的同时,为属于匹配字符串的子串添加第三标记,为属于被匹配字符串的子串添加第四标记;When it is determined that the mismatch type of the two substrings is an error and recorded in the current node, a third mark is added to the substring belonging to the matched string, and a fourth mark is added to the substring belonging to the matched string; 所述第三标记和第四标记用于表征不同的显示方式。The third mark and the fourth mark are used to represent different display modes. 6.根据权利要求5所述的方法,其特征在于,所述方法还包括:6. The method according to claim 5, wherein the method further comprises: 在完成两个字符串的匹配之后,通过html标签方式,根据每一节点上记录标记输出两个字符串的匹配结果。After the matching of the two strings is completed, the matching results of the two strings are output according to the record tags on each node by means of html tags. 7.一种字符串匹配装置,其特征在于,所述装置包括:7. A string matching device, wherein the device comprises: 根节点模块,用于匹配两个字符串的第一匹配段并记入根节点;The root node module is used to match the first matching segment of the two strings and record it in the root node; 子节点模块,用于从所述两个字符串中获取第一匹配段的左子串和右子串分别作为所述根节点的左节点和右节点;a child node module, used to obtain the left substring and the right substring of the first matching segment from the two character strings as the left node and the right node of the root node, respectively; 子串匹配模块,用于将左节点和右节点依次作为当前节点,并匹配当前节点的两个子串的第二匹配段;The substring matching module is used to take the left node and the right node as the current node in turn, and match the second matching segment of the two substrings of the current node; 第一处理模块,用于在匹配到时,将匹配到的第二匹配段记入当前节点,从当前节点的两个子串中获取第二匹配段的左子串和右子串分别作为当前节点的左节点和右节点,并继续执行所述子串匹配模块的过程;The first processing module is used to record the matched second matching segment into the current node when it is matched, and obtain the left substring and right substring of the second matching segment from the two substrings of the current node as the current node respectively the left and right nodes of the substring, and continue to execute the process of the substring matching module; 第二处理模块,用于在未匹配到时,确定当前节点的两个子串的不匹配类型并记入当前节点。The second processing module is configured to determine the mismatch type of the two substrings of the current node and record them in the current node when no match is found. 8.根据权利要求7所述的装置,其特征在于,所述装置还包括:8. The apparatus according to claim 7, wherein the apparatus further comprises: 第一标记模块,用于将第一匹配段记入根节点的同时,为根节点标记匹配成功类型对应的第一标记;将匹配到的第二匹配段记入当前节点的同时,为当前节点标记匹配成功类型对应的第一标记;将不匹配类型记入当前节点的同时,为当前节点标记所述不匹配类型对应的第二标记;其中,不同的不匹配类型对应不同的第二标记,所述第二标记与第一标记均用于表征在输出当前节点的两个子串匹配结果时的显示颜色。The first marking module is used to mark the first mark corresponding to the successful type of the root node while recording the first matching segment in the root node; while recording the matched second matching segment in the current node, it is the current node Mark the first mark corresponding to the successful matching type; while recording the unmatched type into the current node, mark the second mark corresponding to the unmatched type for the current node; wherein, different unmatched types correspond to different second marks, Both the second mark and the first mark are used to represent the display color when the matching result of the two substrings of the current node is output. 9.一种字符串匹配设备,其特征在于,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现如权利要求1-6任一项所述方法的步骤。9. A string matching device, comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor implements the program as claimed in the right when executing the program The steps of any one of claims 1-6. 10.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1~6中任一项字符串匹配方法的步骤。10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the string matching according to any one of claims 1 to 6 is realized steps of the method.
CN202011482497.7A 2020-12-15 2020-12-15 A string matching method, device, equipment and storage medium Active CN112559818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011482497.7A CN112559818B (en) 2020-12-15 2020-12-15 A string matching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011482497.7A CN112559818B (en) 2020-12-15 2020-12-15 A string matching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112559818A true CN112559818A (en) 2021-03-26
CN112559818B CN112559818B (en) 2025-03-21

Family

ID=75063995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011482497.7A Active CN112559818B (en) 2020-12-15 2020-12-15 A string matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112559818B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879845A (en) * 2022-12-21 2023-03-31 中国电信股份有限公司 Method and device for determining cargo scheduling strategy and nonvolatile storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130983A1 (en) * 2010-11-24 2012-05-24 Microsoft Corporation Efficient string pattern matching for large pattern sets
CN106959962A (en) * 2016-01-12 2017-07-18 中国移动通信集团青海有限公司 A kind of multi-pattern match method and apparatus
CN108710671A (en) * 2018-05-16 2018-10-26 北京金堤科技有限公司 The extracting method and device of Business Name in text
CN111951894A (en) * 2019-05-14 2020-11-17 三星电子株式会社 Solid State Drives and Parallelizable Sequence Alignment Methods

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130983A1 (en) * 2010-11-24 2012-05-24 Microsoft Corporation Efficient string pattern matching for large pattern sets
CN106959962A (en) * 2016-01-12 2017-07-18 中国移动通信集团青海有限公司 A kind of multi-pattern match method and apparatus
CN108710671A (en) * 2018-05-16 2018-10-26 北京金堤科技有限公司 The extracting method and device of Business Name in text
CN111951894A (en) * 2019-05-14 2020-11-17 三星电子株式会社 Solid State Drives and Parallelizable Sequence Alignment Methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHSMY2018: "字符串匹配算法(单模式串)", Retrieved from the Internet <URL:https://blog.csdn.net/mingyunxiaohai/article/details/87563292> *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879845A (en) * 2022-12-21 2023-03-31 中国电信股份有限公司 Method and device for determining cargo scheduling strategy and nonvolatile storage medium

Also Published As

Publication number Publication date
CN112559818B (en) 2025-03-21

Similar Documents

Publication Publication Date Title
WO2023045417A1 (en) Fault knowledge graph construction method and apparatus
CN111079043B (en) Key content positioning method
CN110851491B (en) Network link prediction method based on multiple semantic influence of multiple neighbor nodes
CN110990590A (en) Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
JP6850806B2 (en) Annotation system for extracting attributes from electronic data structures
CN115599899B (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN114064913B (en) A document retrieval method and system based on knowledge graph
CN110287327B (en) Automatic path adaptive knowledge graph generation method based on teaching material catalog and directed hypergraph
CN110929933A (en) Rice disease prediction and diagnosis method based on knowledge graph
CN110909174B (en) Knowledge graph-based method for improving entity link in simple question answering
CN114238645A (en) A Relation Selection Method Based on BERT Siamese Attention Network and Fusion Graph Embedding Features
CN107832047A (en) A kind of non-api function argument based on LSTM recommends method
CN116244333B (en) Database query performance prediction method and system based on cost factor calibration
CN112364091A (en) Method and system for visually inquiring character relationship based on knowledge graph
JP2002099561A (en) Data conversion method, data conversion system, and storage medium
Li et al. Neural Chinese address parsing
CN119293195A (en) A RAG optimization method for large language models based on tree neighbor context
CN109582675A (en) Tag match method, apparatus, server and storage medium
CN113434627A (en) Work order processing method and device and computer readable storage medium
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN114168704A (en) Multi-information-source intelligent education question-answering method based on multi-relation graph convolutional neural network
CN109684438B (en) Method for retrieving data with parent-child hierarchical structure
CN112559818A (en) Character string matching method, device, equipment and storage medium
CN117009549B (en) Interactive thinking map knowledge base device
CN116383354B (en) A graph visualization automatic question answering method based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant