Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
In order to solve the problems that most of character string matching tools in the industry are not flexible enough and poor in practicability, the application provides an improved character string matching method, namely matching sections are found out in two character strings to serve as root nodes of a matching tree, child nodes of the matching tree are established by finding out the matching sections in subsequent left substring matching and right substring matching all the time, so that the sequence of original character strings is guaranteed, each node in the matching tree records a middle matching result, therefore, the operation on each node can be flexible, the practicability is good, the output result can be changed by modifying the output mode of each node when the matching tree is traversed and output, the purpose of separating the matching tree from the output matching result can be achieved, the code coupling degree is smaller, and the expansion is facilitated.
The following describes the character string matching method proposed in the present application in detail with specific embodiments.
Fig. 1 is a flowchart illustrating an embodiment of a string matching method according to an exemplary embodiment of the present invention, where the string matching method may be applied to a string matching device (e.g., a terminal, a server, etc.). As shown in fig. 1, the character string matching method includes the following steps:
step 101: the first matching segments of the two strings are matched and logged into the root node.
In the construction of a smart city, taking a recording monitoring service scene as an example, the matched character string is a correct sentence string, and the matched character string is a sentence string input by a user and needs to be matched with the correct sentence string.
In one embodiment, step 101 is a first match of two strings, and a longest matching segment can be found from the two strings to avoid the problem of matching position error, and the found longest matching segment is used as a first matching segment.
When searching the longest matching segment, the matching character string is used to search the longest matching segment in the matched character string.
In one example, the specific search process may be: taking the first character in the matched character string as a current character, searching the character which is the same as the current character in the matched character string, matching the next character of the current character with the next character of the character searched in the matched character string if the character is searched, taking the next character of the current character as the current character if the character is successfully matched, matching the next character of the current character with the next character of the character successfully matched in the matched character string, continuing to execute the process of matching the next character of the current character with the next character of the character successfully matched in the matched character string if the character is successfully matched, and taking the character searched in the matched character string and the character successfully matched as a matching segment until the matching failure occurs.
Then, the second character in the matched character string is used as the current character, the searching process is executed until the last character in the matched character string is used as the current character to finish searching, so that a plurality of matched segments can be obtained, and finally, one matched segment with the longest character length is selected from the plurality of matched segments to be used as the longest matched segment.
In the searching process, the space between the characters is used as an interval for searching. For example, two strings are "a b c d" and "a l b c e d f", respectively, and the longest "b c" substring will eventually be taken as the longest matching segment.
It can be understood by those skilled in the art that the above-mentioned search process is only an example, and the search mode of the longest matching segment can also be implemented by using other related technologies, which is not specifically limited in this application.
It should be noted that, if the longest matching segment is not found in the first matching process of the two character strings, it indicates that there is no matching place between the two character strings.
Step 102: and acquiring a left substring and a right substring of the first matching section from the two character strings to be used as a left node and a right node of the root node respectively.
After the root node of the matching tree is obtained, the left substring is used as the left node of the root node, and the right substring is used as the right node of the root node, so that the sequence of the original character strings can be ensured.
Still taking the two strings "a b c d" and "a l b c e d f" as examples, and the root node is "b c", the "a" and "a l" in the two strings will be the left node of the root node, and "d" and "e d f" will be the right node of the root node.
Step 103: and taking the left node and the right node as the current nodes in sequence, and matching the second matching sections of the two substrings of the current nodes.
In an embodiment, for the process of matching the second matching segment of the two substrings of the current node, the longest matching segment may be found from the two substrings to avoid the problem of matching position error, and the found longest matching segment is used as the second matching segment. For the search principle of the second matching segment, reference may be made to the description of the search principle of the first matching segment in step 101, which is not described herein again.
Based on the matching process of the first matching segment and the second matching segment, the longest matching segment is always searched in the parent string and the substring, so that the problem of wrong matching positions of the two character strings can be avoided in the whole matching process.
Step 104: if the matching is achieved, the matched second matching section is recorded into the current node, a left substring and a right substring of the second matching section are obtained from the two substrings of the current node and are respectively used as a left node and a right node of the current node, and the process of the step 103 is continuously executed.
Step 105: if not, determining the mismatch type of the two substrings of the current node and recording the mismatch type into the current node.
Still taking two character strings of "a b c d" and "a l b c e d f" as examples, regarding "a" and "a l" on the left node of the root node, the matched second matching segment is "a" and is recorded into the left node, the "a" and "l" are taken out from "a" and "a l" as the right node of the node, the "a" and "l" are continuously matched, the longest matching segment is not matched, and the type of mismatch is determined and is recorded into the right node of the node; for the'd' and 'e d f' on the right node of the root node, the matched second matching section is'd' and is recorded in the right node, the 'e' and the 'e' are taken out from the'd' and the 'e d f' as the left node of the node, the 'f' and the 'e' are taken out as the right node of the node, the 'e' and the 'e' are continuously matched and are not matched in the longest matching section, the mismatch type is determined to enter the left node of the node, the 'e' and the 'f' are continuously matched and are not matched in the longest matching section, the mismatch type is determined to enter the right node of the node, and therefore the establishment of a matching tree is completed, and the matching tree is composed of 6 nodes.
In one embodiment, for the process of determining the mismatch type of the two substrings of the current node, the following three cases can be distinguished:
firstly, if one substring of two substrings of a current node belongs to a matched character string and the other substring is empty, determining that the mismatch type of the two substrings is redundant;
secondly, if one substring of the two substrings of the current node belongs to the matched character string and the other substring is empty, determining that the mismatch type of the two substrings is missing;
and thirdly, if the two substrings of the current node are not empty, determining that the mismatch type of the two substrings is an error.
Wherein, redundant means that the substring belonging to the matching character string is redundant in the matched character string; "missing" refers to the absence of a substring belonging to the matched string in the matching string; "error" means that there is no longest matching segment between the substring belonging to the matching string and the substring belonging to the matched string.
Based on the above example, if "a b c d" is a matched string, "a l b c e d f" is a matched string, the mismatch type of the two substrings "" and "l" is redundant, the mismatch type of "" and "e" is redundant, and the mismatch type of "" and "f" is redundant.
It should be noted that, in order to distinguish different matching result types of each node, in step 101, while the first matching segment is recorded in the root node, a first mark corresponding to a matching success type may be marked for the root node; in step 104, the matched second matching segment is recorded into the current node, and simultaneously, a first mark corresponding to the matching success type is marked for the current node; in step 105, while recording the mismatch type into the current node, marking a second mark corresponding to the mismatch type for the current node;
the second mark and the first mark are used for representing display colors when the matching results of the two substrings of the current node are output, and different mismatch types correspond to different second marks.
Illustratively, the first label corresponding to the matching success type is green, the second label corresponding to the missing type is gray, the second label corresponding to the redundant type is yellow, and the second label corresponding to the error type is red.
Further, for two sub-strings with mismatch types of errors which are not empty, while the mismatch types of errors are recorded in the current node, a third mark may be added to the sub-string belonging to the matched character string, and a fourth mark may be added to the sub-string belonging to the matched character string, so as to distinguish the belonging character strings in different display manners indicated by different marks.
The third mark and the fourth mark are used for distinguishing different display modes of the two substrings. For example, the third label is underlined, the fourth label is not underlined, and for example, the third label is with a strikethrough and the fourth label is not strikethrough.
In summary, referring to the schematic diagram of the matching tree establishment process shown in fig. 2, taking the recording monitoring in the smart city construction as an example, it is assumed that the matched character string is "hello I love you and this is what I needed", belongs to the correct sentence string, and the matched character string is "I ate you that is not what I needed", belongs to the sentence string entered by the user, and needs to be matched with the correct sentence string. Firstly, the longest matching segment of the two character strings is 'what I need', so that the root node 1 of the matching tree is a green mark corresponding to 'what I need';
the child node 2 (left node) of the root node is a green mark corresponding to the longest matching segment 'I' of two substrings 'hello I love you and this is' and 'I ate you this is not', and the right node of the root node does not exist;
a child node 3 (left node) of the child node 2 is two substrings of "hello" and "", the corresponding mismatch type is missing, the gray mark is marked, and a child node 4 (right node) of the child node 2 is the longest matching section of "you" corresponding to the two substrings of "love you and this is" and "just you this is not";
the child node 5 (left node) of the child node 4 is two substrings "love" and "ate", the corresponding mismatch type is error, the fourth label of "love" is not underlined, the third label of "ate" is underlined, the child node 6 (right node) of the child node 4 is the longest matching segment of the two substrings "and this is" and "that is not" and is corresponding to "is" and green label;
the child node 7 (left node) of the child node 6 is two substrings "and this" and "that", the corresponding mismatch type is error, red mark, the fourth mark of "and this" is not underlined, the third mark of "that" is underlined, the child node 8 (right node) of the child node 6 is two substrings "and" not ", the corresponding mismatch type is redundant, yellow mark.
Still further, after the matching process of the two character strings is completed, the matching result of the two character strings can be output according to the mark recorded on each node.
Because each node can be flexibly operated, the output mode of each node can be easily modified when the matching result is output.
Illustratively, the matching result can be output in an html tag manner, so that the browser can conveniently display the result, and the user can visually check the result.
Based on the matching tree shown in fig. 2, each node on the matching tree is traversed, and the matching result of two character strings according to the mark recorded on each node is output as shown in fig. 3.
So far, the matching process shown in fig. 1 is completed, the matching segments are found out from the two character strings and used as root nodes of the matching tree, child nodes of the matching tree are established by finding out the matching segments all the time in subsequent left sub-string matching and right sub-string matching, so as to ensure the sequence of the original character strings, and as each node in the matching tree records the middle matching result, each node can be flexibly operated, the practicability is good, and the output result can be changed by modifying the output mode of each node when the matching tree is traversed and output, so that the purpose of separating the matching tree from the output matching result can be achieved, the code coupling degree is smaller, and the expansion is convenient.
Fig. 4 is a schematic diagram of a hardware structure of a character string matching apparatus according to an exemplary embodiment of the present invention. As shown in fig. 4, the character string matching apparatus includes a processor, a nonvolatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the character string matching device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize the character string matching method described above when being executed by the processor. The processor of the string matching device is used for providing calculation and control capability and supporting the operation of the whole string matching device. The memory of the string matching device may have stored therein computer readable instructions that, when executed by the processor, may cause the processor to perform a string matching method. The network interface of the character string matching device is used for connecting and communicating with the terminal.
It will be understood by those skilled in the art that the structure shown in fig. 4 is a block diagram of only a part of the structure related to the scheme of the present invention, and does not constitute a limitation of the character string matching apparatus to which the scheme of the present invention is applied, and a specific character string matching apparatus may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.
Corresponding to the embodiment of the character string matching method, the invention also provides an embodiment of a character string matching device.
Fig. 5 is a flowchart illustrating an embodiment of a string matching apparatus according to an exemplary embodiment of the present invention, which may be applied to a string matching device. As shown in fig. 5, the character string matching apparatus includes:
a root node module 510, configured to match the first matching segments of the two character strings and record the first matching segments into a root node;
a sub-node module 520, configured to obtain, from the two character strings, a left sub-string and a right sub-string of the first matching section, which are respectively used as a left node and a right node of the root node;
the substring matching module 530 is used for sequentially taking the left node and the right node as the current node and matching the second matching sections of the two substrings of the current node;
the first processing module 540 is configured to, when matching is completed, record the matched second matching segment into the current node, obtain a left substring and a right substring of the second matching segment from the two substrings of the current node, respectively serve as the left node and the right node of the current node, and continue to execute the process of the substring matching module;
and a second processing module 550, configured to determine a mismatch type of the two substrings of the current node when the substrings are not matched, and record the mismatch type into the current node.
In an alternative implementation, the first matching segment is the longest matching segment in the two character strings; and the second matching section is the longest matching section in the two substrings of the current node.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
the first marking module is used for marking the first mark corresponding to the matching success type for the root node while recording the first matching section into the root node; recording the matched second matching section into the current node, and marking a first mark corresponding to the matching success type for the current node; when the unmatched type is recorded into the current node, marking a second mark corresponding to the unmatched type for the current node; and the different mismatch types correspond to different second marks, and the second marks and the first marks are used for representing display colors when the matching results of the two substrings of the current node are output.
In an optional implementation manner, the two character strings are a matched character string and a matching character string respectively; the second processing module 550 is specifically configured to, in the process of determining the mismatch type of the two substrings of the current node, determine that the mismatch type of the two substrings is redundant if one of the two substrings of the current node belongs to a matching character string and the other substring is empty; if one substring of the two substrings of the current node belongs to the matched character string and the other substring is empty, determining that the mismatch type of the two substrings is missing; and if the two substrings of the current node are not empty, determining that the mismatch type of the two substrings is an error.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
the second marking module is used for adding a third mark to the substring belonging to the matched character string and adding a fourth mark to the substring belonging to the matched character string while determining that the mismatch type of the two substrings is wrong and recording the mismatch type into the current node; wherein the third mark and the fourth mark are used for representing different display modes.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
and the output module is used for outputting the matching result of the two character strings according to the color mark and/or the display mark recorded on each node in an html tag mode after the matching process of the two character strings is completed.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The present invention also provides another embodiment, which is to provide a computer-readable storage medium having a computer program stored thereon, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of any one of the character string matching methods described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.