
CN110008383B - Black and white list retrieval method and device based on multiple indexes - Google Patents


Info

Publication number
CN110008383B
Authority
CN
China
Prior art keywords
character
target
character string
retrieval
black
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910289693.3A
Other languages
Chinese (zh)
Other versions
CN110008383A (en)
Inventor
张子兴
刘霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Anhu Huanyu Technology Co ltd
Original Assignee
Beijing Anhu Huanyu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Anhu Huanyu Technology Co ltd
Priority to CN201910289693.3A
Publication of CN110008383A
Application granted
Publication of CN110008383B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-index black and white list retrieval method comprising the following steps: acquiring the length L1 of a character string to be matched and each of its characters S1[n], where 1 ≤ n ≤ L1; traversing a preset number table to determine how many times the character S1[n] occurs at each character position n; and selecting the target character with the fewest occurrences, together with its target character position, as the retrieval keyword for searching the target black and white list set, thereby determining the list type to which the character string to be matched belongs. Because retrieval needs only the target character and its target character position, the method addresses a weakness of first-character indexing: character strings carry semantic context, so character frequencies are not uniformly distributed, and when the first character is a frequent one the retriever must query many more character positions before the retrieval completes, which lowers retrieval efficiency.

Description

Black and white list retrieval method and device based on multiple indexes
Technical Field
The invention relates to the technical field of retrieval, in particular to a black and white list retrieval method and device based on multiple indexes.
Background
Dictionary indexing is a common approach to black and white list retrieval. Indexing the first character of each character string in the black and white list avoids the time cost of a sequential search.
The inventors studied existing dictionary indexing and found that, because character strings carry semantic context, character frequencies are not uniformly distributed. When the first character is a frequent one, the retriever must query many more character positions before the search completes, so retrieval efficiency is low.
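To make the bottleneck concrete, here is a minimal Python sketch of a first-character dictionary index of the kind the background describes. The function names and the word list are hypothetical, and the list is deliberately skewed so that most entries share one first character; a lookup then has to scan that whole bucket, while a rare first character is found almost immediately.

```python
from collections import defaultdict


def build_first_char_index(strings):
    """Prior-art style index: first character -> strings starting with it."""
    index = defaultdict(list)
    for s in strings:
        if s:  # an empty string has no first character to index
            index[s[0]].append(s)
    return dict(index)


def lookup(index, query):
    """Scan the bucket of the query's first character; return (found, comparisons)."""
    comparisons = 0
    for candidate in index.get(query[:1], []):
        comparisons += 1
        if candidate == query:
            return True, comparisons
    return False, comparisons


# Hypothetical list in which most entries share the first character "s".
entries = ["spam", "scam", "sale", "secure", "signup", "alert"]
idx = build_first_char_index(entries)
print(len(idx["s"]), len(idx["a"]))   # 5 1
print(lookup(idx, "signup"))          # (True, 5)
print(lookup(idx, "alert"))           # (True, 1)
```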
Disclosure of Invention
In view of this, the present invention provides a multi-index black and white list retrieval method and device to solve the prior-art problem that, because character strings carry semantic context, character frequencies are not uniformly distributed, so for a frequent first character the retriever must query many more character positions to complete the retrieval, which makes retrieval inefficient. The specific scheme is as follows:
A multi-index black and white list retrieval method comprises the following steps:
acquiring the length L1 of a character string to be matched and each character S1[n], where 1 ≤ n ≤ L1;
traversing a preset number table to determine the number of occurrences of the character S1[n] at each character position n in the preset number table;
selecting the target character with the fewest occurrences and its corresponding target character position as the retrieval keyword, searching the target black and white list set with it, and determining the list type to which the character string to be matched belongs.
Optionally, traversing the preset number table to determine the number of occurrences of the character S1[n] at each character position n includes:
acquiring the correspondence between each character in the character string to be matched and the character position it occupies;
traversing the preset number table to determine the number of occurrences of each correspondence in it.
Optionally, the method includes constructing the preset number table by:
counting the maximum length L2 of the character strings in the target black and white list set and the number L3 of those character strings;
counting, for each character position K, the number of occurrences of each character S2[j] of the set S2 formed by all characters, where 1 ≤ K ≤ L2 and 1 ≤ j ≤ L3;
forming the preset number table from the character positions K, the characters S2[j], and the corresponding occurrence counts.
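The construction steps above can be sketched in Python as follows. This is a minimal sketch, assuming the table is a dictionary keyed by (character position, character) with 1-based positions, and taking L3 as the number of character strings, as in the worked example of the detailed description; `build_number_table` and the three-string input are hypothetical.

```python
from collections import Counter


def build_number_table(strings):
    """Build the preset number table for a black/white list set.

    Returns (table, L2, L3):
      table: dict mapping (1-based position k, character) to its occurrence count
      L2:    maximum string length in the set
      L3:    number of strings in the set
    """
    L2 = max((len(s) for s in strings), default=0)
    L3 = len(strings)
    counts = Counter()
    for s in strings:
        for k, ch in enumerate(s, start=1):
            counts[(k, ch)] += 1
    return dict(counts), L2, L3


table, L2, L3 = build_number_table(["abc", "abd", "b"])
print(L2, L3)              # 3 3
print(table[(1, "a")])     # 2
print(table[(3, "d")])     # 1
```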
Optionally, selecting the target character with the fewest occurrences and its target character position as the retrieval keyword, searching the target black and white list set, and determining the list type to which the character string to be matched belongs includes:
judging whether the number of occurrences corresponding to the retrieval keyword is 1;
if so, taking the character string corresponding to the retrieval keyword as the target character string;
determining the target list to which the target character string belongs according to the identifier of the target character string.
Optionally, the method further includes:
if there is more than one target character string, inputting further retrieval keywords, searching among the target character strings with them to determine a first target character string, and determining the target list to which the first target character string belongs; or
if there are 0 target character strings, executing a corresponding policy according to the user's requirements.
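The branching above depends only on how many list strings match the retrieval keyword. Since a keyword's occurrence count in the preset number table is exactly that number, "the count is 1" and "there is one target string" are the same test. A minimal sketch with hypothetical names, representing list types as the strings "black" and "white":

```python
def decide(candidates, list_type_of):
    """Map the number of matched strings to the next action.

    candidates:   strings matched by the retrieval keyword
    list_type_of: mapping from a list string to "black" or "white"
                  (stands in for the identifier lookup; an assumption)
    """
    if len(candidates) == 1:
        return ("resolved", list_type_of[candidates[0]])
    if len(candidates) > 1:
        return ("refine", candidates)      # ask for further retrieval keywords
    return ("apply_policy", None)          # not listed: user-defined policy


list_type_of = {"university": "white", "read": "black"}
print(decide(["university"], list_type_of))   # ('resolved', 'white')
print(decide([], list_type_of))               # ('apply_policy', None)
```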
A multi-index black and white list retrieval device, comprising:
an acquisition module, configured to acquire the length L1 of the character string to be matched and each character S1[n], where 1 ≤ n ≤ L1;
a first determining module, configured to traverse the preset number table and determine the number of occurrences of the character S1[n] at each character position n in the preset number table;
a second determining module, configured to select the target character with the fewest occurrences and its corresponding target character position as the retrieval keyword, search the target black and white list set with it, and determine the list type to which the character string to be matched belongs.
Optionally, in the above device, the first determining module includes:
an acquisition unit, configured to acquire the correspondence between each character in the character string to be matched and the character position it occupies;
a first determining unit, configured to traverse the preset number table and determine the number of occurrences of each correspondence in it.
Optionally, in the above device, the first determining module constructs the preset number table using:
a first statistical unit, configured to count the maximum length L2 of the character strings in the target black and white list set and the number L3 of those character strings;
a second statistical unit, configured to count, for each character position K, the number of occurrences of each character S2[j] of the set S2 formed by all characters, where 1 ≤ K ≤ L2 and 1 ≤ j ≤ L3;
the character positions K, the characters S2[j], and the corresponding occurrence counts form the preset number table.
Optionally, in the above device, the second determining module includes:
a judging unit, configured to judge whether the number of occurrences corresponding to the retrieval keyword is 1;
a second determining unit, configured to take, if so, the character string corresponding to the retrieval keyword as the target character string;
a third determining unit, configured to determine the target list to which the target character string belongs according to the identifier of the target character string.
Optionally, the above device further comprises:
a fourth determining unit, configured to, if there is more than one target character string, receive further retrieval keywords, search among the target character strings with them to determine a first target character string, and determine the target list to which the first target character string belongs; or
an execution unit, configured to execute a corresponding policy according to the user's requirements if there are 0 target character strings.
Compared with the prior art, the invention has the following advantages:
the disclosed method retrieves from the black and white list using only the target character and its target character position, that is, the character position whose character occurs least often across the list set. Because character strings carry semantic context, character frequencies are not uniformly distributed, and first-character indexing forces the retriever to query many more character positions whenever the first character is a frequent one; starting from the least frequent character position avoids this and improves retrieval efficiency.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a multi-index black and white list retrieval method according to an embodiment of the present invention;
FIG. 2 is another flowchart of a multi-index black and white list retrieval method according to an embodiment of the present invention;
FIG. 3 is another flowchart of a multi-index black and white list retrieval method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a multi-index black and white list retrieval device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The invention discloses a multi-index black and white list retrieval method for black and white list retrieval. For a given retrieval target, instead of the conventional first-character retrieval, the method starts from the character position with the fewest occurrences, which reduces the time cost caused by the non-uniform character distribution and improves retrieval efficiency. As shown in FIG. 1, the method comprises the following steps:
S101: acquire the length L1 of the character string to be matched and each character S1[n] it contains, where 1 ≤ n ≤ L1;
In this embodiment, the character string to be matched is the character string currently being retrieved. It may consist of digits, letters, a combination of digits and letters, or other suitable identifiers; the present invention does not limit its form. As an example, take the character string to be matched to be "university": its length L1 is 10 and S1 = [u, n, i, v, e, r, s, i, t, y], with 1 ≤ n ≤ 10.
S102: traverse the preset number table and determine the number of occurrences of the character S1[n] at each character position n in the preset number table;
In this embodiment, the preset number table is built for the target black and white list set. Each entry has three elements, namely a character position, a character, and an occurrence count, and records how many times that character occurs at that position across the target black and white list set. Traversing the preset number table yields, for each character position of the character string to be matched, the occurrence count of the character at that position.
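Step S102 amounts to one table lookup per character position. A minimal Python sketch, assuming the table is a dictionary keyed by (position, character) with 1-based positions; `position_counts` and the small table are hypothetical. Under exact matching, a count of 0 at any position already shows that no list string can equal the query.

```python
def position_counts(query, table):
    """Return [(n, S1[n], count)] for every 1-based position n of the query.

    table maps (position, character) to an occurrence count; a pair that never
    occurs in the list set is reported with count 0.
    """
    return [(n, ch, table.get((n, ch), 0)) for n, ch in enumerate(query, start=1)]


# Small hypothetical number table.
table = {(1, "a"): 2, (2, "b"): 2, (3, "c"): 1, (3, "d"): 1, (1, "b"): 1}
print(position_counts("abc", table))  # [(1, 'a', 2), (2, 'b', 2), (3, 'c', 1)]
print(position_counts("abx", table))  # [(1, 'a', 2), (2, 'b', 2), (3, 'x', 0)]
```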
S103: select the target character with the fewest occurrences and its corresponding target character position as retrieval keywords, search the target black and white list set, and determine the list type to which the character string to be matched belongs.
In this embodiment, once the occurrence count of each character of the character string to be matched at its character position is known, the target character with the fewest occurrences and its target character position are selected and used as retrieval keywords, and the list type to which the character string to be matched belongs, either black list or white list, is determined from them.
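The selection in step S103 is a minimum over the per-position counts. A minimal sketch with a hypothetical `select_keyword`; ties are broken by the smaller character position, which is an assumption, since the description does not say how ties are resolved.

```python
def select_keyword(counts):
    """Choose the (position, character) pair with the fewest occurrences.

    counts is a list of (n, character, count) tuples. Ties go to the smaller
    position n (an assumption; the source does not specify a tie-break).
    """
    n, ch, count = min(counts, key=lambda t: (t[2], t[0]))
    return (n, ch), count


print(select_keyword([(1, "a", 2), (2, "b", 2), (3, "c", 1)]))  # ((3, 'c'), 1)
print(select_keyword([(1, "u", 1), (3, "i", 1)]))               # ((1, 'u'), 1)
```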
With this method, black and white list retrieval needs only the target character and its target character position. This avoids the inefficiency of first-character indexing, in which, because character strings carry semantic context and character frequencies are not uniformly distributed, a frequent first character forces the retriever to query many more character positions before the retrieval completes.
In this embodiment, the preset number table is constructed as shown in FIG. 2, with the following steps:
S201: count the maximum length L2 of the character strings in the target black and white list set and the number L3 of character strings;
In this embodiment, let the target black and white list set be A, with string elements ai. The maximum length of the string elements in A is L2, i.e. max(|ai|) = L2, and the number of string elements in A is L3 (i = 1, ..., L3). An example target black and white list set A is shown in Table 1:
Here L2 = 12 (the length of "introduction") and L3 = 10.
Table 1 example black and white list set a
Serial number Black list Serial number White list
a1 information a6 knowledge
a2 message a7 university
a3 introduction a8 eyesight
a4 copyright a9 million
a5 read a10 accompany
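Table 1 can be encoded directly as data to check the values of L2 and L3 stated above. A minimal sketch; the dictionary layout (identifier mapped to string and list type) is an assumption made for illustration.

```python
# Table 1 encoded as identifier -> (string, list type).
TABLE_1 = {
    "a1": ("information", "black"),
    "a2": ("message", "black"),
    "a3": ("introduction", "black"),
    "a4": ("copyright", "black"),
    "a5": ("read", "black"),
    "a6": ("knowledge", "white"),
    "a7": ("university", "white"),
    "a8": ("eyesight", "white"),
    "a9": ("million", "white"),
    "a10": ("accompany", "white"),
}

strings = [s for s, _ in TABLE_1.values()]
L2 = max(len(s) for s in strings)   # length of the longest string
L3 = len(strings)                   # number of strings in the set
longest = max(strings, key=len)
print(L2, L3, longest)              # 12 10 introduction
```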
S202: count, for each character position K, the number of occurrences of each character S2[j] of the set S2 formed by all characters, where 1 ≤ K ≤ L2 and 1 ≤ j ≤ L3;
In this embodiment, let p denote the number of distinct characters in set A. All characters in A form the set S2, whose elements are S2[j] (j = 1, 2, ..., p). The number of times IND[k, S2[j]] that each S2[j] occurs at the k-th character position (k = 1, 2, ..., L2) is counted, and all these counts form the preset number table F. For the example set A, S2 = {a, c, d, e, f, g, h, i, k, l, m, n, o, p, r, s, t, u, v, w, y} (p = 21), and the preset number table is shown in Table 2:
Table 2 Example number table F (each entry is S2[j]:IND, the number of occurrences of character S2[j] at character position k)
k   S2[j]:IND
1   I:2  M:2  C:1  R:1  K:1  U:1  E:1  A:1
2   N:4  E:2  O:1  Y:1  I:1  C:1
3   F:1  S:1  T:1  P:1  A:1  O:1  I:1  E:1  L:1  C:1
4   O:2  S:2  R:1  Y:1  D:1  W:1  V:1  L:1
5   R:2  A:1  O:1  L:1  E:1  I:2  M:1
6   M:1  G:2  D:1  I:1  E:1  R:1  O:1  P:1
7   A:2  E:1  U:1  G:1  D:1  S:1  H:1  N:1
8   T:2  C:1  H:1  G:1  I:1  N:1
9   I:1  E:1  T:3  Y:1
10  O:1  I:1  Y:1
11  N:1  O:1
12  N:1
In this embodiment, the correspondence between each character of the character string to be matched and its character position is obtained, and the preset number table is traversed to determine the occurrence count of each correspondence.
In this embodiment, the procedure for selecting the target character with the fewest occurrences and its target character position as retrieval keywords, searching the target black and white list set, and determining the list type to which the character string to be matched belongs is shown in FIG. 3 and comprises the following steps:
S301: judge whether the occurrence count corresponding to the retrieval keyword is 1;
In this embodiment, the character string to be matched is s = university, whose length is L1 = 10. Using the preset number table F, the number of occurrences IND[h, s[h]] of the character s[h] at the h-th character position (1 ≤ h ≤ 10) is counted; the results are shown in Table 3:
Table 3 Occurrence counts for the character string s = university
h 1 2 3 4 5 6 7 8 9 10
s[h] U N I V E R S I T Y
IND 1 4 1 1 1 1 1 1 3 1
From the counted values IND[h, s[h]], the minimum occurrence count in the character string to be matched is 1.
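The counts in Table 3, and the set of positions that attain the minimum, can be reproduced from the Table 1 strings. A minimal sketch; `Counter` returns 0 for a (position, character) pair that never occurs, which matches the meaning of an absent table entry.

```python
from collections import Counter

WORDS = ("information message introduction copyright read "
         "knowledge university eyesight million accompany").split()
F = Counter((k, ch) for w in WORDS for k, ch in enumerate(w, start=1))

s = "university"
IND = [F[(h, ch)] for h, ch in enumerate(s, start=1)]   # IND[h, s[h]] for h = 1..10
minimum = min(IND)
positions = [h for h, c in enumerate(IND, start=1) if c == minimum]
print(IND)        # [1, 4, 1, 1, 1, 1, 1, 1, 3, 1]
print(minimum)    # 1
print(positions)  # [1, 3, 4, 5, 6, 7, 8, 10]
```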
S302: if so, take the character string corresponding to the retrieval keyword as the target character string;
In this embodiment, retrieving any one of U at position 1, I at position 3, V at position 4, E at position 5, R at position 6, S at position 7, I at position 8, or Y at position 10 is enough to retrieve the target character string s = university.
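A positional inverted index, mapping each (position, character) pair to the strings that hold that character there, makes a keyword lookup a single dictionary access. The sketch below builds one for the Table 1 strings and confirms that every count-1 keyword of "university" returns that string alone. It also compares the candidate with the full query: a unique hit only shows that one stored string shares that character at that position, so if exact matching is required a final comparison is needed. The description does not spell out this comparison, so treat it as an assumption; `retrieve` is a hypothetical name.

```python
from collections import defaultdict

WORDS = ("information message introduction copyright read "
         "knowledge university eyesight million accompany").split()

# Positional inverted index: (1-based position k, character) -> strings
# that hold that character at position k.
index = defaultdict(list)
for w in WORDS:
    for k, ch in enumerate(w, start=1):
        index[(k, ch)].append(w)


def retrieve(keyword, query):
    """Fetch the strings under one keyword, then keep only exact matches."""
    return [w for w in index.get(keyword, []) if w == query]


# Keywords with occurrence count 1 for "university" (Table 3).
keywords = [(1, "u"), (3, "i"), (4, "v"), (5, "e"), (6, "r"),
            (7, "s"), (8, "i"), (10, "y")]
print(all(index[kw] == ["university"] for kw in keywords))  # True
print(retrieve((1, "u"), "university"))                        # ['university']
print(retrieve((1, "u"), "unanimous"))                          # []
```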
S303: determine the target list to which the target character string belongs according to the identifier of the target character string.
In this embodiment, the identifier contained in the target character string is obtained. The identifier distinguishes the list type of the target list to which the target character string belongs and may be a number, a letter, or another suitable identifier.
S304: if there is more than one target character string, input further retrieval keywords, search among the target character strings with them to determine a first target character string, and determine the target list to which it belongs;
In this embodiment, more than one target character string means the target character string is not unique, so further retrieval keywords, each consisting of a character position and the character at that position, must be input. Searching among the target character strings with these keywords determines a first target character string; its identifier is then obtained and used to determine the list type of its target list.
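Refinement with further retrieval keywords can be sketched as repeated filtering of the candidate set. A minimal sketch, assuming each further keyword is a (1-based position, character) pair and that filtering stops once at most one candidate remains; `match` and `refine` are hypothetical names.

```python
WORDS = ("information message introduction copyright read "
         "knowledge university eyesight million accompany").split()


def match(strings, keyword):
    """Keep the strings whose 1-based position k holds character ch."""
    k, ch = keyword
    return [s for s in strings if len(s) >= k and s[k - 1] == ch]


def refine(candidates, extra_keywords):
    """Apply further keywords until at most one candidate string remains."""
    for kw in extra_keywords:
        if len(candidates) <= 1:
            break
        candidates = match(candidates, kw)
    return candidates


candidates = match(WORDS, (2, "n"))
print(candidates)                                # ['information', 'introduction', 'knowledge', 'university']
print(refine(candidates, [(1, "i"), (3, "f")]))  # ['information']
```

An empty result after refinement corresponds to the zero-match case handled by the user-defined policy.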
S305: if there are 0 target character strings, execute a corresponding policy according to the user's requirements.
In this embodiment, 0 target character strings means the target black and white list set does not contain the character string to be matched, and a corresponding policy is executed according to the user's requirements.
Based on the above multi-index black and white list retrieval method, an embodiment of the present invention further provides a multi-index black and white list retrieval device. Its structural block diagram is shown in FIG. 4, and it includes:
an acquisition module 401, a first determining module 402, and a second determining module 403, where:
the acquisition module 401 is configured to acquire the length L1 of the character string to be matched and each character S1[n], where 1 ≤ n ≤ L1;
the first determining module 402 is configured to traverse the preset number table and determine the number of occurrences of the character S1[n] at each character position n in the preset number table;
the second determining module 403 is configured to select the target character with the fewest occurrences and its corresponding target character position as retrieval keywords, search the target black and white list set, and determine the list type to which the character string to be matched belongs.
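The three modules can be grouped into one class to show how data flows between them. This is a minimal sketch, not the patented device: the class and method names are hypothetical, and the second module here collapses steps S302 to S305 into a single exact comparison over the smallest keyword bucket, returning None when the string is not listed.

```python
from collections import Counter, defaultdict


class MultiIndexRetriever:
    """Hypothetical grouping of modules 401-403; names are illustrative only."""

    def __init__(self, lists):
        self.lists = lists                # string -> "black" or "white"
        self.table = Counter()            # (k, ch) -> occurrence count
        self.index = defaultdict(list)    # (k, ch) -> strings holding ch at k
        for s in lists:
            for k, ch in enumerate(s, start=1):
                self.table[(k, ch)] += 1
                self.index[(k, ch)].append(s)

    def acquire(self, query):             # acquisition module 401
        return len(query), list(query)    # L1 and S1[1..L1]

    def count_positions(self, chars):     # first determining module 402
        return [(n, ch, self.table[(n, ch)]) for n, ch in enumerate(chars, start=1)]

    def determine(self, counts, query):   # second determining module 403
        if not counts:
            return None                   # empty query
        n, ch, c = min(counts, key=lambda t: (t[2], t[0]))
        if c == 0:
            return None                   # some character never occurs there
        hits = [s for s in self.index[(n, ch)] if s == query]
        return self.lists[hits[0]] if hits else None

    def classify(self, query):
        _, chars = self.acquire(query)
        return self.determine(self.count_positions(chars), query)


r = MultiIndexRetriever({"information": "black", "read": "black", "university": "white"})
print(r.classify("university"))  # white
print(r.classify("unknown"))     # None
```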
With this device, black and white list retrieval likewise needs only the target character and its target character position, avoiding the extra character-position queries that first-character indexing requires when, because character strings carry semantic context and character frequencies are not uniformly distributed, the first character is a frequent one.
In this embodiment, the first determining module 402 includes:
an acquisition unit 404 and a first determining unit 405, where:
the acquisition unit 404 is configured to acquire the correspondence between each character in the character string to be matched and the character position it occupies;
the first determining unit 405 is configured to traverse the preset number table and determine the number of occurrences of each correspondence in it.
In this embodiment, the first determining module constructs the preset number table using:
a first statistical unit 406 and a second statistical unit 407, where:
the first statistical unit 406 is configured to count the maximum length L2 of the character strings in the target black and white list set and the number L3 of character strings;
the second statistical unit 407 is configured to count, for each character position K, the number of occurrences of each character S2[j] of the set S2 formed by all characters, where 1 ≤ K ≤ L2 and 1 ≤ j ≤ L3;
the character positions K, the characters S2[j], and the corresponding occurrence counts form the preset number table.
In this embodiment, the second determining module 403 includes:
a judging unit 408, a second determining unit 409, and a third determining unit 410, where:
the judging unit 408 is configured to judge whether the occurrence count corresponding to the retrieval keyword is 1;
the second determining unit 409 is configured to take, if so, the character string corresponding to the retrieval keyword as the target character string;
the third determining unit 410 is configured to determine the target list to which the target character string belongs according to the identifier of the target character string.
In this embodiment, the second determining module 403 further includes:
a fourth determining unit 411 and an execution unit 412, where:
the fourth determining unit 411 is configured to, if there is more than one target character string, receive further retrieval keywords, search among the target character strings with them to determine a first target character string, and determine the target list to which the first target character string belongs; or
the execution unit 412 is configured to execute a corresponding policy according to the user's requirements if there are 0 target character strings.
It should be noted that the embodiments in this specification are described progressively: each focuses on its differences from the others, and identical or similar parts can be cross-referenced. Because the device embodiment is essentially similar to the method embodiment, it is described briefly; for related details, refer to the description of the method embodiment.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The multi-index black and white list retrieval method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may vary the specific implementation and scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (8)

1. A multi-index black-and-white list retrieval method, characterized by comprising the following steps:
acquiring the length L1 of a character string to be matched and each of its characters S1[n], where 1 ≤ n ≤ L1;
traversing a preset number table and determining, for each character position n, the number of occurrences of the corresponding character S1[n] recorded in the preset number table, which comprises:
the preset number table is built for the target black-and-white list set; each entry of the preset number table consists of three elements, namely a character position, a character and a number of occurrences, and records how many times that character occurs at that character position in the target black-and-white list set;
acquiring the correspondence between each character of the character string to be matched and the character position to which it belongs;
traversing the preset number table and determining the number of occurrences of each such correspondence in the preset number table;
and selecting the target character with the fewest occurrences, together with its target character position, as the retrieval keyword, searching the target black-and-white list set with that keyword, and determining the list type to which the character string to be matched belongs.
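The keyword selection of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the target black-and-white list set is assumed to be an in-memory dict mapping each list string to its list type, positions are 1-based as in the claim, the preset number table is modeled as a `collections.Counter` keyed by `(position, character)`, and the names `build_count_table`, `select_keyword` and `search` are hypothetical.

```python
from collections import Counter


def build_count_table(list_set):
    """Count how often each character occurs at each 1-based position
    across all strings of the black-and-white list set."""
    table = Counter()
    for s in list_set:
        for pos, ch in enumerate(s, start=1):
            table[(pos, ch)] += 1
    return table


def select_keyword(query, table):
    """Return (position, character, count) for the query's rarest pair."""
    # Correspondence between each character S1[n] and its position n, 1 <= n <= L1.
    pairs = list(enumerate(query, start=1))
    # Counter yields 0 for pairs that never occur in the list set.
    count, (pos, ch) = min((table[p], p) for p in pairs)
    return pos, ch, count


def search(list_set, pos, ch):
    """Return the list strings that have character ch at 1-based position pos."""
    return [s for s in list_set if len(s) >= pos and s[pos - 1] == ch]


if __name__ == "__main__":
    lists = {"13800001111": "black", "13800002222": "white", "13900001111": "white"}
    table = build_count_table(lists)
    pos, ch, count = select_keyword("13900001111", table)
    print(pos, ch, count)  # 3 9 1
    print([(s, lists[s]) for s in search(lists, pos, ch)])  # [('13900001111', 'white')]
```

Because a pair that never occurs gets a count of 0, such a query cannot exactly match any string in either list. The linear scan in `search` is only for brevity; keeping a postings list per `(position, character)` pair would avoid it.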
2. The method of claim 1, wherein building the preset number table comprises:
counting the maximum length L2 of the character strings in the target black-and-white list set and the number L3 of all characters;
counting, for each character position k, the number of occurrences of each character S2[j] of the set S2 consisting of all characters, where 1 ≤ k ≤ L2 and 1 ≤ j ≤ L3;
and forming the preset number table from the character positions k, the characters S2[j] and the corresponding numbers of occurrences.
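The construction in claim 2 can be sketched as a dense L2 × L3 grid of counts. The sketch assumes that L3 is the size of the character set S2 (the distinct characters occurring in the list set), which fits the index range 1 ≤ j ≤ L3 but is an interpretation; `build_number_table` and `occurrences` are hypothetical names.

```python
def build_number_table(list_set):
    """Build the preset number table as an L2 x L3 grid of counts.

    Row k-1 corresponds to character position k (1 <= k <= L2) and
    column j-1 to character S2[j] (1 <= j <= L3).
    """
    strings = list(list_set)
    L2 = max((len(s) for s in strings), default=0)  # maximum string length
    S2 = sorted({ch for s in strings for ch in s})  # all characters, sorted
    L3 = len(S2)
    col = {ch: j for j, ch in enumerate(S2)}  # character -> 0-based column
    counts = [[0] * L3 for _ in range(L2)]
    for s in strings:
        for k, ch in enumerate(s):  # k is 0-based here
            counts[k][col[ch]] += 1
    return L2, S2, col, counts


def occurrences(table, k, ch):
    """Count of character ch at 1-based position k, or 0 if it never occurs."""
    L2, _, col, counts = table
    if not 1 <= k <= L2 or ch not in col:
        return 0
    return counts[k - 1][col[ch]]


if __name__ == "__main__":
    table = build_number_table(["abc", "abd", "xb"])
    print(table[0], table[1])  # 3 ['a', 'b', 'c', 'd', 'x']
    print(occurrences(table, 2, "b"))  # 3
```

A dense grid makes each lookup a constant-time index at the cost of L2 × L3 cells; a sparse mapping keyed by (position, character) would store only the pairs that actually occur.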
3. The method of claim 1, wherein selecting the target character with the fewest occurrences and its target character position as the retrieval keyword, searching the target black-and-white list set, and determining the list type to which the character string to be matched belongs comprises:
determining whether the number of occurrences corresponding to the retrieval keyword is 1;
if so, taking the character string corresponding to the retrieval keyword as the target character string;
and determining, according to the identifier of the target character string, the target list to which the target character string belongs.
4. The method of claim 3, further comprising:
if there is more than one target character string, continuing to input other retrieval keywords, searching within the target character strings according to the other retrieval keywords to determine a first target character string, and determining the target list to which the first target character string belongs; or
if there is no target character string, executing a corresponding strategy according to user requirements.
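The decision logic of claims 3 and 4 can be sketched as follows, again as a hypothetical illustration rather than the patented implementation. Assumptions: the list set is a dict from list string to list identifier; the "other retrieval keywords" of claim 4 are taken automatically from the query's remaining (position, character) pairs in ascending order of count; the user-defined strategy for the zero case is passed in as a hypothetical `on_miss` callback; and a final full-string comparison is added as a safeguard that the claims do not state.

```python
from collections import Counter


def classify(query, lists, on_miss=lambda q: None):
    """Return the list identifier of `query`, or on_miss(query) if it is not listed.

    `lists` maps each list string to its identifier, e.g. "black" or "white".
    """
    # Preset number table: occurrences of each (1-based position, character) pair.
    table = Counter((pos, ch) for s in lists for pos, ch in enumerate(s, start=1))
    # Retrieval keywords: the query's (position, character) pairs, rarest first.
    keywords = sorted(enumerate(query, start=1), key=lambda p: table[p])
    candidates = list(lists)
    for pos, ch in keywords:
        # Narrow the target character strings with the next keyword.
        candidates = [s for s in candidates if len(s) >= pos and s[pos - 1] == ch]
        if len(candidates) <= 1:
            break  # one target string (claim 3) or none (claim 4)
    # Safeguard not stated in the claims: the remaining string must equal the query.
    matches = [s for s in candidates if s == query]
    if len(matches) == 1:
        return lists[matches[0]]
    return on_miss(query)


if __name__ == "__main__":
    lists = {"13800001111": "black", "13800002222": "white", "13900001111": "white"}
    print(classify("13800001111", lists))  # black
    print(classify("00000000000", lists, on_miss=lambda q: "unknown"))  # unknown
```

Without the final comparison, the query "13800001112" would be assigned the list of "13800002222", because that is the only string with '2' at position 11, the query's rarest keyword.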
5. A multi-index black-and-white list retrieval device, characterized by comprising:
an acquiring module, configured to acquire the length L1 of a character string to be matched and each of its characters S1[n], where 1 ≤ n ≤ L1;
a first determining module, configured to traverse a preset number table and determine, for each character position n, the number of occurrences of the corresponding character S1[n] recorded in the preset number table, the first determining module comprising:
wherein the preset number table is built for the target black-and-white list set; each entry of the preset number table consists of three elements, namely a character position, a character and a number of occurrences, and records how many times that character occurs at that character position in the target black-and-white list set;
an acquiring unit, configured to acquire the correspondence between each character of the character string to be matched and the character position to which it belongs;
a first determining unit, configured to traverse the preset number table and determine the number of occurrences of each such correspondence in the preset number table;
and a second determining module, configured to select the target character with the fewest occurrences, together with its target character position, as the retrieval keyword, search the target black-and-white list set with that keyword, and determine the list type to which the character string to be matched belongs.
6. The device of claim 5, wherein the preset number table in the first determining module is built by:
a first statistics unit, configured to count the maximum length L2 of the character strings in the target black-and-white list set and the number L3 of all characters;
a second statistics unit, configured to count, for each character position k, the number of occurrences of each character S2[j] of the set S2 consisting of all characters, where 1 ≤ k ≤ L2 and 1 ≤ j ≤ L3;
and the character positions k, the characters S2[j] and the corresponding numbers of occurrences form the preset number table.
7. The device of claim 5, wherein the second determining module comprises:
a judging unit, configured to determine whether the number of occurrences corresponding to the retrieval keyword is 1;
a second determining unit, configured to, if so, take the character string corresponding to the retrieval keyword as the target character string;
and a third determining unit, configured to determine, according to the identifier of the target character string, the target list to which the target character string belongs.
8. The device of claim 7, further comprising:
a fourth determining unit, configured to, if there is more than one target character string, continue to input other retrieval keywords, search within the target character strings according to the other retrieval keywords to determine a first target character string, and determine the target list to which the first target character string belongs; or
an execution unit, configured to execute a corresponding strategy according to user requirements if there is no target character string.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910289693.3A CN110008383B (en) 2019-04-11 2019-04-11 Black and white list retrieval method and device based on multiple indexes

Publications (2)

Publication Number Publication Date
CN110008383A CN110008383A (en) 2019-07-12
CN110008383B true CN110008383B (en) 2021-07-27

Family

ID=67171169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910289693.3A Active CN110008383B (en) 2019-04-11 2019-04-11 Black and white list retrieval method and device based on multiple indexes

Country Status (1)

Country Link
CN (1) CN110008383B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650893A (en) * 2020-12-18 2021-04-13 Zhejiang Nuonuo Network Technology Co., Ltd. (浙江诺诺网络科技有限公司) Character string retrieval method, system, equipment and computer readable storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102063510A (en) * 2011-01-17 2011-05-18 Zhuhai Allwinner Technology Co., Ltd. (珠海全志科技有限公司) Method for searching matched character string
CN104298684A (en) * 2013-07-18 2015-01-21 Shenzhen Zhongxing Wangxin Technology Co., Ltd. (深圳中兴网信科技有限公司) Inquiry method, device and server
CN104408162A (en) * 2014-12-05 2015-03-11 State Grid Corporation of China (国家电网公司) Multimedia system for forming text indexing and multimedia processing method
CN105260034A (en) * 2015-10-21 2016-01-20 Meizu Technology (China) Co., Ltd. (魅族科技(中国)有限公司) Character outputting method and apparatus
CN105550298A (en) * 2015-12-11 2016-05-04 Beijing Sogou Technology Development Co., Ltd. (北京搜狗科技发展有限公司) Keyword fuzzy matching method and device
CN109040081A (en) * 2018-08-10 2018-12-18 Harbin Institute of Technology (Weihai) (哈尔滨工业大学(威海)) A kind of protocol fields conversed analysis system and method based on BWT

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN103365998B (en) * 2013-07-12 2016-08-24 East China Normal University (华东师范大学) A kind of similar character string search method
CN104765750B (en) * 2014-01-07 2020-12-25 Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司) Input language switching method and device in input method application
CN105450830A (en) * 2014-08-05 2016-03-30 Wuxi Maimaibao Information Technology Co., Ltd. (无锡买卖宝信息技术有限公司) Contact person searching method and device
US9560074B2 (en) * 2014-10-07 2017-01-31 Cloudmark, Inc. Systems and methods of identifying suspicious hostnames
US10803040B2 (en) * 2017-08-28 2020-10-13 International Business Machines Corporation Efficient and accurate lookups of data by a stream processor using a hash table
CN107704102B (en) * 2017-10-09 2021-08-03 Beijing Xinmei Hutong Technology Co., Ltd. (北京新美互通科技有限公司) Text input method and device
CN108984695B (en) * 2018-07-04 2021-04-06 iFLYTEK Co., Ltd. (科大讯飞股份有限公司) Character string matching method and device

Similar Documents

Publication Publication Date Title
US8266152B2 (en) Hashed indexing
JP4881322B2 (en) Information retrieval system based on multiple indexes
EP2727247B1 (en) Database compression system and method
CN101796480B (en) Integrating External Related Phrase Information into Phrase-Based Indexing Information Retrieval Systems
US8549000B2 (en) Methods and systems for compressing indices
US20080208833A1 (en) Context snippet generation for book search system
CN106202248A (en) Search based on phrase in information retrieval system
CN103136260A (en) Method and device for applying filtration factor assessment in optimization of access path in database
CN111737608B (en) Method and device for ordering enterprise information retrieval results
CN102073684A (en) Method and device for excavating search log and page search method and device
CN113553339A (en) Data query method, middleware, electronic device and storage medium
CN110580255A (en) method and system for storing and retrieving data
CN103377224B (en) Identify the method and device of problem types, set up the method and device identifying model
CN115809248B (en) Data query method and device and storage medium
CN110008383B (en) Black and white list retrieval method and device based on multiple indexes
CN110334191A (en) A Set Similarity Query Algorithm Based on Length Partition
CN103226550B (en) A kind of focus incident based on inquiry input determines method and system
CN111581327A (en) Administrative law enforcement assisting method and device
KR101557960B1 (en) Device for selecting core kyword, method for selecting core kyword, and method for providing search service using the same
Baziz et al. Evaluating a conceptual indexing method by utilizing wordnet
WO2018182058A1 (en) Join method for relational database
US7774347B2 (en) Vortex searching
CN108984582A (en) A kind of inquiry request processing method
KR101414999B1 (en) Search result providing system and method using tag based boolean query matching
KR101331946B1 (en) Search method using wildcard matching

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant