
CN110888894A - Patent search method, server and computer readable medium - Google Patents

Info

Publication number
CN110888894A
Authority
CN
China
Prior art keywords
symbol
text
keywords
symbols
patent text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811041125.3A
Other languages
Chinese (zh)
Inventor
Not disclosed (inventor not announced)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongli Hui Information Technology Co Ltd
Original Assignee
Shenzhen Zhongli Hui Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongli Hui Information Technology Co Ltd
Priority to CN201811041125.3A
Publication of CN110888894A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a patent search method, a server and a computer readable medium. The patent search method comprises: recognizing the full text or a designated part of a patent text in a patent database, and extracting from the patent text the keywords corresponding to the symbols of the patent text; storing the extracted keywords in a keyword database, together with the correspondence between the keywords and the corresponding patent text; receiving a patent search request that comprises non-bibliographic item keywords; acquiring, from the keyword database, the keywords that match the non-bibliographic item keywords; determining a patent search result for the patent search request according to the correspondence between the keywords and the corresponding patent texts; and returning the patent search result, so that the user can quickly find the high-value patents actually needed.

Description

Patent search method, server and computer readable medium
Technical Field
The invention relates to the field of computer communication, in particular to a patent searching method, a server and a computer readable medium.
Background
In the current era of rapid technological innovation, in which new inventions emerge continuously, a great number of inventions are filed every day as patent applications, and a great number of new patents are published and granted every week. For individuals and organizations engaged in inventive work, or interested in inventions and their results, it is highly valuable to fully mine the patent information that records those inventions, to study and analyze its content, and to put it to use. However, because patent applications often include many relatively worthless "junk" patents, it is very difficult to pick out the high-value patents actually needed from the large volume of retrieved patent documents. Traditional patent retrieval methods sort results only by the "relevancy" provided by a search engine, which depends too heavily on how the patent retrieval keywords are expressed, so a user cannot quickly find the high-value patents actually needed.
Therefore, a new patent searching method is needed to find out the patents with high value actually needed by the user.
Disclosure of Invention
The invention is based on the above problems, and provides a patent searching method, a server and a computer readable medium, so that a user can quickly find a patent with high value which is actually needed.
In view of this, a first aspect of the embodiments of the present invention provides a method for patent search, where the method includes:
recognizing the full text or the designated part of the patent text in a patent database, and extracting keywords corresponding to the symbols of the patent text from the patent text;
storing the extracted keywords in a keyword database, and storing the corresponding relation between the keywords and the corresponding patent text;
receiving a patent search request, wherein the patent search request comprises non-bibliographic item keywords;
acquiring keywords in the keyword database that match the non-bibliographic item keywords;
determining a patent search result of the patent search request according to the corresponding relation between the keywords and the corresponding patent texts;
and returning the patent search result.
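The storage and lookup steps listed above can be sketched as a small in-memory index. This is only an illustration under stated assumptions: the class name `KeywordIndex`, the patent identifiers, the exact-match lookup, and the choice to rank by the summed weight of matched keywords are all hypothetical and are not specified by the source.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy in-memory stand-in for the keyword database (hypothetical)."""

    def __init__(self):
        # keyword -> {patent_id: weight of that keyword in that patent text}
        self._postings = defaultdict(dict)

    def add(self, patent_id, keyword_weights):
        """Store the keywords of one patent text and their correspondence to it."""
        for keyword, weight in keyword_weights.items():
            self._postings[keyword][patent_id] = weight

    def search(self, query_keywords):
        """Return patent ids ranked by the summed weight of the matched keywords."""
        scores = defaultdict(float)
        for keyword in query_keywords:
            # Exact matching is an assumption; the source only says "matched".
            for patent_id, weight in self._postings.get(keyword, {}).items():
                scores[patent_id] += weight
        # Highest score first; ties are broken by patent id for determinism.
        return sorted(scores, key=lambda pid: (-scores[pid], pid))


index = KeywordIndex()
index.add("P1", {"light source": 0.6, "base": 0.4})
index.add("P2", {"light source": 0.2, "housing": 0.8})
results = index.search(["light source"])  # "P1" ranks ahead of "P2"
```

A production system would presumably keep the postings in a real database and use a looser matching rule; the dictionary here only makes the keyword-to-patent correspondence explicit.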
In some possible designs, extracting keywords corresponding to the symbols of the patent text from the patent text specifically includes:
identifying numbers or English letters or a combination of the numbers and the English letters in the patent text to obtain a first candidate symbol;
traversing the patent text to determine the number of occurrences of each first candidate symbol, and removing the candidate symbols that occur only once from the first candidate symbols to obtain second candidate symbols;
for each second candidate symbol, comparing the characters at positions adjacent to each of its occurrences, and determining the second candidate symbols that are adjacent to repeatedly appearing characters as the symbols of the patent text.
In some possible designs, after comparing the characters at positions adjacent to each occurrence of the same second candidate symbol and determining the second candidate symbols adjacent to repeatedly appearing characters as the symbols of the patent text, the method further includes:
determining characters repeatedly appearing at adjacent positions of the symbols as symbol names of the patent texts;
and storing the corresponding relation between the symbol name and the symbol.
In some possible designs, the weight of the symbol names is positively correlated with the frequency with which the symbol names appear in the patent text.
In some possible designs, the ranking of the patents corresponding to the patent texts in the patent search results is determined according to the weight of the symbol names in each patent text.
In some possible designs, after determining the character repeatedly appearing at the adjacent position of the symbol as the symbol name of the patent text, the method further includes:
classifying the symbol names to obtain symbol name categories and weights of the symbol name categories, wherein the weights of the symbol name categories are the sum of the weights of the symbol names;
and taking the weight of the symbol name category as the weight of each symbol name in the symbol name category.
In some possible designs, after identifying the full text or the designated part of the patent text in the patent database and extracting the keywords corresponding to the symbols of the patent text from the patent text, the method further includes:
and arranging corresponding symbol controls for linking the symbol names in the adjacent area of the drawings of the patent texts.
In some possible designs, corresponding symbol names are arranged in adjacent positions of symbols in the drawing area of the patent text.
A second aspect of embodiments of the present invention provides a server comprising a processor, an input device, an output device and a memory, the processor, the input device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions that, when executed by the processor, cause the processor to perform the method of the first aspect or any possible design of the first aspect.
A third aspect of embodiments of the present invention provides a computer readable medium having a computer program stored thereon, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect or any possible design of the first aspect.
The technical solution of the embodiments of the invention recognizes the full text or a designated part of a patent text in a patent database, and extracts from the patent text the keywords corresponding to the symbols of the patent text; stores the extracted keywords in a keyword database, together with the correspondence between the keywords and the corresponding patent text; receives a patent search request that comprises non-bibliographic item keywords; acquires, from the keyword database, the keywords that match the non-bibliographic item keywords; determines a patent search result for the patent search request according to the correspondence between the keywords and the corresponding patent texts; and returns the patent search result, so that the user can quickly find the high-value patents actually needed.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic system architecture diagram of a patent search system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method of patent searching provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The present invention will be described in detail with reference to FIG. 1 in conjunction with examples.
In some embodiments of the invention, the invention provides a patent search system 100 comprising a terminal 101 and a server 102.
It should be understood that, in the embodiment of the present invention, the terminal 101 may also be a client as known in the art, and there is at least one terminal 101. In particular implementations, the terminal 101 described in embodiments of the present invention includes, but is not limited to, portable devices such as a mobile phone, a laptop computer, or a tablet computer. It should also be understood that in some embodiments of the invention, the terminal 101 is not a portable communication device but, for example, a desktop personal computer or a large workstation.
In the following discussion, a terminal including a display and an input device is described. However, it should be understood that the terminal 101 may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
It should be understood that the terminal 101 and the server 102 may be physical devices, or may be apparatuses, units or modules disposed in the physical devices; in addition, the terminal 101 and the server 102 may be implemented on the same physical device, or may be implemented on different physical devices.
In some embodiments of the present invention, the server 102 may be a local area network server, or may be a local server, but is not limited thereto. It should be noted that there may be one or more servers 102, and those skilled in the art may select different numbers of servers according to actual needs in order to improve search efficiency. The server 102 is connected to a patent database, where the connection relationship specifically includes: the patent database is included in the server 102, or the patent database is independent of the server 102. It should be understood that these two embodiments are only some preferred embodiments of the present invention, and the invention is not limited to these two cases.
The method of patent search as shown in FIG. 2 includes:
Step S01: recognizing the full text or the designated part of the patent text in a patent database, and extracting keywords corresponding to the symbols of the patent text from the patent text.
In some embodiments of the invention, according to the kind of the patent text, the patent text includes a patent publication text, which is the original text filed by the applicant with the patent office, and a patent grant text, which is the final published text regarded as complying with the patent requirements after amendment by the applicant. Specifically, according to the object of protection, the patent text includes an invention patent text or a utility model patent text with drawings. More specifically, the invention patent text includes an abstract, an abstract drawing, a specification, claims, and specification drawings; the utility model patent text likewise includes an abstract, an abstract drawing, a specification, claims, and specification drawings. The patent text is not limited thereto, and may be a US patent text, a Japanese patent text, a European patent text, or a patent text of another country, subject to the statutes or agreements that apply in each country or region. The foregoing are merely some preferred embodiments of the present invention, and are not exhaustive.
It should be noted that the designated part of the patent text may be a part of the content of the patent text specified by the user. When the user does not specify any content, the server 102 may specify the full-text content or a part of the content of the patent text according to a preset rule. The preset rule is a rule set in advance by the user for specifying the content of the patent text; that is, the user can specify the full-text content or a part of the content of the patent text in advance. When the user gives no additional operation instruction, the server 102 by default specifies the content of the patent text according to the preset rule. For example, the user may specify the detailed description (specific embodiments) section of the specification directly, or may specify that section in advance, but the invention is not limited thereto. It should be understood that the designated part of the patent text is a textual part of the patent text, not a drawing of the patent text.
It should be understood that the server 102 may recognize the full text or the designated content of the patent text in the patent database in various ways. For example, when the patent text is in PDF format, TIF format, or another picture format, the patent text can be recognized by image recognition; when the patent text is in a text format, such as a WORD document or a TXT document, text content recognition may be used to analyze the text portion of the patent text. The full text or the designated content may also be recognized in other manners, without being limited thereto.
In some embodiments of the present invention, the symbols of the patent text may include various symbols, for example, the symbols of the patent text may include reference numerals of the patent text, and may also include symbols other than the reference numerals of the patent text, but not limited thereto.
In some embodiments of the invention, common symbols include the following categories:
(1) a symbol of the numeric type. For example, "100" in "body 100" is a numeric type symbol. The numeric symbol may be a single number or a numeric character string having a plurality of characters, and is not particularly limited.
(2) A symbol that combines a number and a letter. For example, "100a" in "case 100a" is a symbol that combines a number and a letter.
(3) A symbol in the form of a letter. For example, "a" in "the first axis a" is a symbol in the form of a letter.
(4) Other symbol forms set by the user. Other symbol types are included in addition to the types of symbols previously described. The symbol is not particularly limited, and may be in a conventional form or a form flexibly set by a user according to actual needs.
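As a minimal sketch of the four categories above, the function below sorts a symbol string into one of them with regular expressions. The function name `classify_symbol` and the category labels are illustrative assumptions; the source does not prescribe how the categories are detected.

```python
import re


def classify_symbol(symbol):
    """Assign a symbol to one of the categories listed above (labels are assumed)."""
    if re.fullmatch(r"[0-9]+", symbol):
        return "numeric"        # category (1), e.g. "100"
    if re.fullmatch(r"[A-Za-z]+", symbol):
        return "letter"         # category (3), e.g. "a"
    if re.fullmatch(r"[0-9A-Za-z]+", symbol):
        return "alphanumeric"   # category (2), e.g. "100a"
    return "other"              # category (4), user-defined forms


kinds = [classify_symbol(s) for s in ["100", "100a", "a", "α-1"]]
```

The order of the checks matters: the pure-number and pure-letter patterns are tested before the mixed pattern, which would otherwise match them as well.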
In some embodiments of the present invention, the keywords corresponding to the symbols of the patent text are specifically the phrases at positions adjacent to the symbols in the patent text. These phrases are mainly words or word groups, and may also be shortened forms such as English or Chinese abbreviations. For example, in "body 100", "body" is a keyword in the form of a phrase and "100" is its corresponding symbol; in "PC 01", "PC" is the English abbreviation of personal computer; in "dc motor 18", "dc motor" is a Chinese abbreviation of the full term for a direct-current motor. It should be noted that, in the patent text, a symbol and its corresponding keyword appear together, the keyword usually precedes the symbol, and no punctuation mark or character other than a space is allowed between the two. Of course, in some special cases the keyword may also be written after the corresponding symbol, again without any punctuation marks or characters other than spaces between them. For example, in the sentence "a base 4 is provided, on which base 4 a lighting device 5 is fixedly mounted" in the description part of the patent text, there are no punctuation marks or characters other than spaces between "base" and "4"; in the detailed description of that patent text the keyword "base" is located after the "4", but the invention is not limited thereto.
Further, extracting keywords corresponding to the symbols of the patent text from the patent text specifically includes: identifying numbers or English letters or a combination of the numbers and the English letters in the patent text to obtain a first candidate symbol; traversing the patent text to determine the occurrence number of the first candidate symbol, and removing the candidate symbol which only occurs once from the first candidate symbol to obtain a second candidate symbol; comparing characters at adjacent positions of each second candidate symbol in the same second candidate symbols, and determining the second candidate symbols adjacent to the characters which appear repeatedly as the symbols of the patent text.
In some embodiments of the present invention, the symbols of the patent text may also be determined in other ways, for example, firstly obtaining symbols in the patent text, wherein the symbols are numbers or english letters or a combination of the numbers and the english letters, and of course, the symbols may also be in other forms. Furthermore, the symbol acquisition in the patent text may be acquired not only in the drawings of the patent text, but also in the text of the patent text, and a person skilled in the art may select a desired symbol acquisition mode according to actual conditions. At this time, the server 102 takes the symbol obtained from the patent text as the first candidate symbol, which is obtained in order to obtain a more accurate symbol in a subsequent step.
For example, numbers, English letters, or combinations of both are recognized in a patent text, and "20", "21", "22", "23", "24", "25", "26", "27", "28", etc. are acquired as the first candidate symbols of the patent text. A first traversal determines the number of occurrences of each first candidate symbol in the patent text. If the candidate symbols "20", "23", "26" and "27" occur only once, they are only weakly related to the patent text and therefore of low importance to it, so symbols that appear only once should not contribute to the keywords of the patent text, and "20", "23", "26" and "27" can be removed from the first candidate symbols. After the first traversal, the remaining candidate symbols of the patent text are "21", "22", "24", "25" and "28". For convenience of description, the candidate symbols before this removal are called the first candidate symbols, and the remaining "21", "22", "24", "25" and "28" are called the second candidate symbols of the patent text.
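The first traversal described above amounts to counting candidate symbols and discarding those that occur once. The sketch below illustrates this on a short English sample. As an assumption made for the demo, the pattern only picks up numeric and number-plus-letter symbols, so that ordinary English words are not treated as candidates; the source also allows pure-letter symbols, which is less of a problem in Chinese text. The names `CANDIDATE` and `second_candidates` are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: digits optionally followed by letters, e.g. "28" or "100a".
CANDIDATE = re.compile(r"\b[0-9]+[A-Za-z]*\b")


def second_candidates(text):
    """Count first candidate symbols and keep those occurring more than once."""
    counts = Counter(CANDIDATE.findall(text))  # first candidate symbols
    return {symbol: n for symbol, n in counts.items() if n > 1}


sample = (
    "The light source 28 is fixed on the base 21. "
    "The light source 28 connects to the switch 20, and the base 21 is round."
)
kept = second_candidates(sample)  # "20" occurs once and is dropped
```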
More specifically, further traversals find the symbol name corresponding to each second candidate symbol, based on the positional relationship that, in the patent text, the symbol name is located adjacent to the second candidate symbol. The second candidate symbol "28", whose symbol name is "light source", is taken as an example, but the invention is not limited thereto. The traversal works character by character on the original (Chinese) text. A second traversal obtains the first character before every occurrence of the symbol "28" and determines whether that character is the same before all occurrences; if so, the first character is stored. A third traversal does the same for the second character before the symbol "28"; the two characters stored so far together form "light source". A fourth traversal does the same for the third character before the symbol "28", and a fifth traversal obtains the fourth character in the same way; the third and fourth characters together form the qualifier "said" (rendered "the" in English). A sixth traversal finds that the fifth character is not the same before all occurrences of "28", so the traversal stops, and a total of four characters, "the light source", is obtained.
More specifically, since "the" ("said") is a qualifier that is irrelevant to the content of the patent text, the third and fourth characters are removed, so that the symbol name corresponding to the second candidate symbol "28" is obtained as "light source". In this case, the second candidate symbol "28" is a symbol of the patent text, and "light source" is the keyword corresponding to the symbol "28".
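The backward traversal described above can be sketched as follows. Two assumptions are made so that the example runs on English text: the traversal steps word by word rather than character by character as in the source, and the list of qualifiers in `QUALIFIERS` is illustrative. The function name `symbol_name` is hypothetical.

```python
import re

QUALIFIERS = {"the", "said", "a", "an"}  # assumed list of limiting terms


def symbol_name(text, symbol):
    """Return the words shared immediately before every occurrence of symbol."""
    # Words and punctuation marks become separate tokens.
    tokens = re.findall(r"[0-9A-Za-z]+|[^\s0-9A-Za-z]", text)
    positions = [i for i, tok in enumerate(tokens) if tok == symbol]
    if len(positions) < 2:
        return ""  # symbols occurring only once are not considered
    name = []
    offset = 1
    while True:
        # Token located `offset` places before each occurrence of the symbol.
        before = {
            tokens[p - offset].lower() if p - offset >= 0 else None
            for p in positions
        }
        if len(before) != 1 or None in before:
            break  # the preceding tokens differ, so the traversal stops
        token = next(iter(before))
        if not token.isalnum():
            break  # punctuation may not sit between the name and the symbol
        name.insert(0, token)
        offset += 1
    # Remove leading qualifiers such as "the" or "said".
    while name and name[0] in QUALIFIERS:
        name.pop(0)
    return " ".join(name)


sample = (
    "A lamp has the light source 28. "
    "Power flows to the light source 28 through a cable."
)
name_28 = symbol_name(sample, "28")
unrelated = symbol_name("See fig 1 and then step 1.", "1")
```

In the sample, the shared words before "28" are "the light source", and stripping the qualifier leaves "light source". For "1", the preceding words differ ("fig" versus "step"), so no name is found and the candidate would not be kept as a symbol.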
The above examples are only some of the preferred embodiments of the present invention, but not limiting. It should be noted that the symbol name corresponding to the second candidate symbol may be obtained through traversal, and traversal may be performed not only from an adjacent position before the second candidate symbol, but also from an adjacent position after the second candidate symbol, and the traversal is stopped until no repeated text is formed. It should be understood that for symbols in numbers or letters or a combination of the two or other forms, considering that the symbols include symbols in a single character form and a character string form, the server 102 combines symbols in the single character form in adjacent positions into symbols in the character string form in the first pass, and uses the symbols in the character string form as the first candidate symbols of the patent text; and if the adjacent positions of the symbols in the single character form have no other symbols in the single character form, taking the symbols in the single character form as the first candidate symbols of the patent text.
In some embodiments of the present invention, the server 102 traverses the patent text to obtain the number of occurrences of each first candidate symbol in the patent text. More specifically, the server 102 removes from the first candidate symbols the candidate symbols that appear only once, obtaining the second candidate symbols, so that the server 102 can more easily obtain keywords that are more strongly related to the patent text.
In some embodiments of the present invention, the server 102 further compares the characters adjacent to each occurrence of the same second candidate symbol, and determines the second candidate symbols that are adjacent to repeatedly appearing characters as the symbols of the patent text. Symbols irrelevant to the content of the patent text, such as "fig. 1", "step S1" and "(1)", may appear among the second candidate symbols and need to be removed. However, if during traversal the characters at the adjacent positions before "fig. 1" also appear repeatedly, so that they form, together with "fig. 1", a phrase or noun such as "map 1" or "seascape 1", the "fig. 1" appearing in the patent text is retained and cannot be removed. In addition, since punctuation marks do not relate to the content of the patent text and no keyword can be obtained from them, the first candidate symbols, the second candidate symbols, and the symbols of the patent text do not include punctuation marks. Furthermore, in order to select symbols and symbol names that are highly related to the content of the patent text, limiting terms such as "first" or other ordinal words representing a logical order, "step" and "the" need to be removed if they appear in a symbol name or in the keyword corresponding to a symbol. The limiting terms are not limited to those listed above. Determining the second candidate symbols adjacent to repeatedly appearing characters as the symbols of the patent text therefore removes the symbols that are irrelevant to the patent text.
Further, after comparing the characters at the adjacent positions of each of the second candidate symbols which are the same, and determining the second candidate symbol adjacent to the repeatedly appearing characters as the symbol of the patent text, the method further includes: determining characters repeatedly appearing at adjacent positions of the symbols as symbol names of the patent texts; and storing the corresponding relation between the symbol name and the symbol.
In some embodiments of the present invention, the characters that repeatedly appear at positions adjacent to a symbol are a specific explanation of that symbol; that is, they are the name of the symbol. The server 102 determines the characters repeatedly appearing at the adjacent positions of the symbols as the symbol names of the patent text, and also stores the correspondence between the symbol names and the symbols.
It should be noted that there are two main forms of the symbol names, which are phrases and sentences respectively. The phrase mainly comprises words or phrases. For the drawings describing structural relationships, the symbol names of the phrase form are the main forms. For example, "body 100", wherein "body" is a symbol name in the form of a phrase, and "100" is a symbol corresponding thereto.
Further, the weight of the symbol name is positively correlated with the frequency of occurrence of the symbol name in the patent text.
In some embodiments of the present invention, the higher the frequency of the symbol name appearing in the patent text, the higher the correlation between the symbol name and the patent text, so that the server 102 can more easily find the actually needed patent text with high value by the symbol name. Therefore, the weight of the symbol name is positively correlated with the frequency of the symbol name appearing in the patent text, and when the frequency of the symbol name appearing in the patent text is higher, the weight of the symbol name in the patent text is higher, and the server 102 can find the actually needed patent text with high value more quickly and easily.
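The source only requires that the weight grow with the frequency of the symbol name; it does not give a formula. As one possible choice, assumed here purely for illustration, the sketch below uses the relative frequency of each symbol name among all symbol-name occurrences in one patent text. The function name `name_weights` and the sample counts are hypothetical.

```python
from collections import Counter


def name_weights(symbol_name_occurrences):
    """Map each symbol name to its share of all symbol-name occurrences."""
    counts = Counter(symbol_name_occurrences)
    total = sum(counts.values())
    # Relative frequency is an assumed formula; any increasing function would do.
    return {name: count / total for name, count in counts.items()}


occurrences = ["light source"] * 6 + ["base"] * 3 + ["switch"]
weights = name_weights(occurrences)
```

Any other monotonically increasing function of the count, such as a logarithm, would also satisfy the positive correlation stated above.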
Further, after determining characters repeatedly appearing at adjacent positions of the symbols as the symbol names of the patent texts, the method further comprises the following steps: classifying the symbol names to obtain symbol name categories and weights of the symbol name categories, wherein the weights of the symbol name categories are the sum of the weights of the symbol names; and taking the weight of the symbol name category as the weight of each symbol name in the symbol name category.
In some embodiments of the present invention, in order to better arrange the symbol names and reduce the computation load of the server 102, the server 102 further classifies the symbol names, and the symbol names of the same class are used as one class of symbol names and are distinguished from the symbol names of other classes, at which time the server 102 obtains the symbol name class and the weight of the symbol name class. It should be noted that the weight of the symbol name category is the sum of the weights of the symbol names in the category. More specifically, the weight of each symbol name in each symbol name category may also be the weight of the symbol name category, so as to associate different dispersed symbol names, thereby greatly improving the working efficiency of the server 102.
In an embodiment of the present invention, "first side 101", "second side 102", "third side 103", "fourth side 104", "first bottom 120", and "second bottom 130" are taken as examples for the purpose of specific description, but the invention is not limited thereto. These are all different symbols and symbol names, but they may be sorted into two categories: "first side 101", "second side 102", "third side 103", and "fourth side 104" may be classified as "sides", and "first bottom 120" and "second bottom 130" may be classified as "bottoms". The weight of "sides" in the patent text is the sum of the weights of "first side 101", "second side 102", "third side 103", and "fourth side 104" in the patent text, and the weight of each of "first side 101", "second side 102", "third side 103", and "fourth side 104" in the patent text may in turn be set to the weight of "sides", i.e., that sum. The same applies to "first bottom 120", "second bottom 130", and "bottoms".
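The category weighting in this example may be sketched as follows. This is a minimal illustration under stated assumptions: the mapping from symbol name to category is supplied by hand, the numeric weights are invented sample values, and the function name `apply_category_weights` is hypothetical.

```python
from collections import defaultdict


def apply_category_weights(
    weights: dict[str, float], categories: dict[str, str]
) -> tuple[dict[str, float], dict[str, float]]:
    """Return (category weight, per-name weight after categorization).

    The category weight is the sum of the weights of its symbol names, and
    each symbol name then takes the weight of its category.
    """
    totals: dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        totals[categories[name]] += weight
    per_name = {name: totals[categories[name]] for name in weights}
    return dict(totals), per_name


# Sample weights (assumed values, not taken from the patent).
name_weights = {
    "first side": 4, "second side": 3, "third side": 2, "fourth side": 1,
    "first bottom": 2, "second bottom": 1,
}
name_categories = {
    "first side": "sides", "second side": "sides",
    "third side": "sides", "fourth side": "sides",
    "first bottom": "bottoms", "second bottom": "bottoms",
}
category_weights, updated_weights = apply_category_weights(
    name_weights, name_categories
)
```

After this step every "side" name carries the same weight, which is what associates the otherwise dispersed symbol names with one another.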
Further, after recognizing the full text or the designated part of the patent text in the patent database and extracting the keywords corresponding to the symbols of the patent text from the patent text, the method further comprises: arranging, in an area adjacent to the drawings of the patent text, corresponding symbol controls for linking to the symbol names.
In some embodiments of the present invention, it should be noted that the symbol control may be a label control, a button control, a rotation control, an animation control, or a slider control; the symbol control may also be another type of control, and the possibilities are not exhaustively listed here. Preferably, the symbol control is a label control, and a user may directly send the server 102 a request for the symbol name corresponding to a symbol by clicking the symbol on the terminal 101. After receiving the request, the server 102 returns the correspondence between the symbol and its symbol name to the terminal 101. The user therefore does not need to match symbols to their symbol names one by one; the server 102 arranges them automatically, which is more convenient for the user and saves the user's time and effort. In other embodiments of the present invention, the symbol control may likewise be a label control, but the server 102 sends the symbols, the corresponding symbol names, and the correspondence between them to the terminal 101 in advance. When the user clicks a selected symbol on the terminal 101, the terminal 101 jumps directly to the position of the corresponding symbol name on the client and displays it, without any further operation by the server 102. This greatly reduces the operation load of the server 102, shortens the response time between the terminal 101 and the server 102, and greatly improves the user experience.
It should be noted that, the user may click the label control by touching and clicking on the touch display screen, or by clicking with a mouse.
Further, corresponding symbol names are arranged at adjacent positions of symbols in the drawing area of the patent text.
It should be noted that when a user obtains the patent text through the terminal 101, the terminal 101 usually displays both the full text of the patent text and its drawings, but the text content and the drawings are displayed separately. The user often has to spend considerable time and effort matching the text content to the drawings one by one in order to obtain the high-value information actually required. In some embodiments of the present invention, the symbols appearing in a drawing and the symbol names corresponding to those symbols may be arranged in a blank area beside the drawing of the patent text. Each arranged symbol and its corresponding symbol name are located at adjacent positions, so that the user can quickly find the actually required symbol and the corresponding symbol name.
Step S02: storing the extracted keywords in a keyword database, and storing the corresponding relation between the keywords and the corresponding patent text.
In some embodiments of the present invention, the server 102 first establishes a keyword database, which is used to store the extracted keywords and the correspondence between the keywords and the corresponding patent texts. Here, the symbol names are the keywords stored in the keyword database by the server 102, and the correspondence between a symbol name and its symbol is the correspondence between a keyword extracted from the patent text by the server 102 and the symbol. For example, the correspondence between keywords and patent texts may be such that when the server 102 retrieves a required keyword from the keyword database, it also retrieves the patent text corresponding to that keyword; other correspondences are also possible, and the invention is not limited thereto. Therefore, by extracting the keywords from the patent texts and storing the correspondences for those keywords, the server 102 can retrieve the patent texts more accurately and find the actually needed high-value patent texts more easily.
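One possible shape for such a keyword database is an in-memory inverted index, sketched below. This is an assumption for illustration: the embodiments do not prescribe a storage structure, and the class name `KeywordDatabase`, its methods, and the patent identifiers "CN-A" and "CN-B" are hypothetical.

```python
from collections import defaultdict


class KeywordDatabase:
    """Maps each keyword to the patent texts that contain it, with a weight."""

    def __init__(self) -> None:
        # keyword -> {patent_id: weight of the keyword in that patent text}
        self._index: dict[str, dict[str, float]] = defaultdict(dict)

    def store(self, patent_id: str, keyword_weights: dict[str, float]) -> None:
        """Store extracted keywords and their correspondence to one patent text."""
        for keyword, weight in keyword_weights.items():
            self._index[keyword][patent_id] = weight

    def patents_for(self, keyword: str) -> dict[str, float]:
        """Retrieving a keyword also retrieves its corresponding patent texts."""
        return dict(self._index.get(keyword, {}))

    def keywords(self) -> list[str]:
        return list(self._index)


db = KeywordDatabase()
db.store("CN-A", {"display panel": 0.6, "first side": 0.4})
db.store("CN-B", {"display panel": 0.2})
```

Keeping the weight alongside the correspondence means a later ranking step can read it without revisiting the patent text.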
Step S03: receiving a patent search request, the patent search request including non-bibliographic item keywords.
It should be understood that the patent search request may be a request submitted after a user inputs a patent search keyword through at least one terminal 101, and that the patent search request includes the patent search keyword and other related information. In some embodiments of the present invention, the patent search request includes non-bibliographic item keywords. Bibliographic item keywords include: application number, application date, invention-creation name, classification number, priority item (including prior application number, application date, and original agency name), applicant or patentee item (including applicant or patentee name, nationality or registered country, address, zip code, organization code, or resident identification document number), inventor name, patent agency item (including patent agency name, organization code, address, zip code, patent agent name, practice number, contact number), contact item (including name, address, zip code, contact number), representative, and the like. The bibliographic item keywords relating to persons include: applicant or patentee item, inventor name, patent agency item, contact item, and representative. For example, the non-bibliographic item keywords included in the patent search request may be keywords related to the subject matter of the patent text, or keywords corresponding to the content of the abstract, the claims, or any part of the specification of the patent text, such as "car" or "display panel"; they may also be other keywords, and the invention is not limited thereto.
In some embodiments of the present invention, the patent search request is a request sent by the at least one terminal 101 to the server 102 to acquire a target patent. Specifically, after a network connection is established between the at least one terminal 101 and the server 102, a user opens a client on the at least one terminal 101 and inputs the patent search request into the client; after acquiring the patent search request, the at least one terminal 101 sends it to the server 102. The client may be a browser or an application (APP), but is not limited thereto.
It should be noted that before the server 102 receives the patent search request sent by the at least one terminal 101, the at least one terminal 101 further sends address information of each of the at least one terminal 101 to the server 102, where the address information includes a domain name of each terminal. After the server 102 receives the address information sent by the at least one terminal 101, the server 102 returns patent search page information to the at least one terminal 101, so that after the at least one terminal 101 receives the patent search page information, a user can operate in a patent search page displayed on a client of the at least one terminal 101 to input a patent search request of a required target patent text.
Step S04: acquiring, in the keyword database, keywords that match the non-bibliographic item keywords.
In some embodiments of the present invention, the patent search request is generated according to a keyword input by a user on a patent search page displayed by the at least one terminal 101. Specifically, the patent search request is submitted after the user inputs a patent search keyword through the at least one terminal 101, and the at least one terminal 101 may classify and sort the patent search keywords input by the user. Preferably, the server 102 may also classify and sort the patent search keywords sent by the user, so as to generate more accurate keywords.
In some embodiments of the present invention, the server 102 obtains the keywords matching the non-bibliographic item keywords from the keyword database according to the keywords generated by the classification and sorting. The server 102 first analyzes the keywords generated by the classification and sorting and matches them against the keywords in the keyword database.
More specifically, the keywords may be matched by similarity. For example, the server 102 determines whether the keyword database contains a keyword whose similarity to a keyword generated by the classification and sorting is greater than or equal to 90%; if so, the server 102 regards that keyword as a keyword matching the non-bibliographic item keyword. The keywords may also be matched in other manners, and the invention is not limited thereto.
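The 90% similarity test may be sketched as follows. The embodiments do not name a similarity measure; this sketch assumes the ratio computed by `difflib.SequenceMatcher` from the Python standard library, and the function name `match_keywords` is hypothetical.

```python
from difflib import SequenceMatcher


def match_keywords(
    query: str, stored_keywords: list[str], threshold: float = 0.9
) -> list[tuple[str, float]]:
    """Return stored keywords whose similarity to the query is >= threshold.

    Similarity here is SequenceMatcher.ratio(), an assumed stand-in for the
    unspecified similarity measure. Results are sorted best match first.
    """
    matches = []
    for keyword in stored_keywords:
        score = SequenceMatcher(None, query.lower(), keyword.lower()).ratio()
        if score >= threshold:
            matches.append((keyword, score))
    return sorted(matches, key=lambda item: item[1], reverse=True)


matched = match_keywords(
    "display panel", ["display panels", "car", "display panel"]
)
matched_names = [keyword for keyword, _ in matched]
```

With this measure, "display panels" clears the threshold for the query "display panel", while "car" does not.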
Step S05: determining a patent search result of the patent search request according to the corresponding relation between the keywords and the corresponding patent texts, and returning the patent search result.
In some embodiments of the present invention, the patent search result may be the content of the patent text (including the drawings) corresponding to the patent search request, the bibliographic item information of that patent text, a ranked list of the patent texts corresponding to the patent search request, a list ranking the patent texts according to the weights of the keywords extracted from them, or any combination of the above, but is not limited thereto. The patent search result may be any information corresponding to the patent search request, and a person skilled in the art may determine the specific content of the output patent search result according to actual needs.
In some embodiments of the present invention, since the keyword database also stores the correspondence between the keywords and the patent texts, once the server 102 has obtained the keywords matching the non-bibliographic item keywords in step S04, it can obtain the patent search results matching the patent search request. The patent search results obtained in this way are more relevant to the patent search request.
In some embodiments of the present invention, after obtaining the patent search result matching the patent search request, the server 102 stores the patent search result in a temporary storage space and returns it to the terminal 101, thereby completing the whole patent search process. Because there is at least one terminal 101, the same patent search request may be sent more than once within the same time period (for example, 10 minutes), which would otherwise require the server 102 to repeat the same data processing. To avoid this, the server 102 is configured with a temporary storage space and a preset temporary storage time; once the preset temporary storage time is exceeded, the server 102 automatically clears the patent search result data in the temporary storage space. For example, the temporary storage time may be set to 24 hours, or to another duration. The server 102 may transmit the patent search result stored in the temporary storage space to different terminals 101.
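The temporary storage space may be sketched as a cache with a time-to-live. This is an illustration under stated assumptions: the class name `SearchResultCache` is hypothetical, and expired entries are cleared lazily on lookup, whereas the embodiments only state that the data is cleared automatically once the preset time is exceeded.

```python
import time
from typing import Callable, Optional


class SearchResultCache:
    """Holds search results per request for a preset temporary storage time."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 60 * 60,  # 24 hours, as in the example above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def put(self, request: str, results: list[str]) -> None:
        self._entries[request] = (self._clock(), results)

    def get(self, request: str) -> Optional[list[str]]:
        entry = self._entries.get(request)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            # Storage time exceeded: clear the entry and report a miss.
            del self._entries[request]
            return None
        return results


now = [0.0]  # fake clock for the usage example
cache = SearchResultCache(ttl_seconds=10, clock=lambda: now[0])
cache.put("display panel", ["CN-A", "CN-B"])
fresh = cache.get("display panel")
now[0] = 11.0
expired = cache.get("display panel")
```

A second terminal sending the same request within the storage time receives the stored result without the search being repeated.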
In some embodiments of the present invention, after receiving the patent search result matching the patent search request, the at least one terminal 101 displays the patent search result on its client, so that the user can view the patent search result directly and intuitively on the display of the at least one terminal 101. Optionally, the display may be a touch screen display and/or a touch pad, or a non-touch display and/or a touch pad, but is not limited thereto.
Further, the ranking of the patents corresponding to the patent texts in the patent search results is determined according to the weight of the symbol names in each patent text.
In some embodiments of the present invention, the server 102 determines the ranking of the patent corresponding to each patent text in the patent search result according to the weight of the symbol name in that patent text: the higher the weight of the symbol name in the patent text, the higher the patent text ranks in the patent search result. In this way, when the user operates the at least one terminal 101 to input a non-bibliographic item keyword and send the patent search request, a patent text matching the non-bibliographic item keyword can be found more quickly and easily in the returned patent search result.
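Ranking by weight may be sketched as follows. The sketch assumes that when several matched keywords occur in the same patent text their weights are summed, which the embodiments do not specify; the function name `rank_patents`, the index layout, and the patent identifiers are hypothetical.

```python
def rank_patents(
    matched_keywords: list[str], index: dict[str, dict[str, float]]
) -> list[tuple[str, float]]:
    """Rank patent texts by the total weight of the matched symbol names.

    `index` maps keyword -> {patent_id: weight}. Ties are broken by
    patent_id so that the ordering is deterministic.
    """
    scores: dict[str, float] = {}
    for keyword in matched_keywords:
        for patent_id, weight in index.get(keyword, {}).items():
            scores[patent_id] = scores.get(patent_id, 0.0) + weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sample_index = {
    "display panel": {"CN-A": 0.5, "CN-B": 0.25},
    "first side": {"CN-B": 0.5},
}
ranking = rank_patents(["display panel", "first side"], sample_index)
```

Here "CN-B" ranks first because both matched keywords occur in it and their combined weight exceeds that of "CN-A".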
Referring to fig. 3, fig. 3 is a schematic structural diagram of a search server according to an embodiment of the present invention. The search server of this embodiment includes components for performing the steps in the foregoing embodiments; for details, refer to the relevant description in the foregoing embodiments, which is not repeated herein. The search server of the present embodiment comprises a processor 302, an input device (not shown), an output device (not shown) and a memory 306, the processor 302, the input device, the output device and the memory 306 being connected to each other, wherein the memory 306 is configured to store a computer program 3061, and the computer program 3061 comprises program instructions which, when executed by the processor 302, cause the processor 302 to perform the respective methods of the embodiments of the present invention described above.
Optionally, the structure of the search server provided by the embodiment of the present invention includes at least one processor 302 (e.g., a CPU), at least one network interface 305 or other communication interface, a memory 306, and at least one communication bus 303; the communication bus 303 is used to enable connection and communication between these components. The processor 302 is used to execute executable modules, such as computer programs, stored in the memory 306. The memory 306 may include a Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection with at least one other network element is realized through the at least one network interface 305 (which may be wired or wireless).
In some implementations, the memory 306 stores programs 3061, and the processor 302 executes the programs 3061 for performing the various methods of the embodiments of the present invention described above.
The computer readable medium may be an internal storage unit of the server according to any of the foregoing embodiments, for example, a hard disk or a memory of the server. The computer readable medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the server. Further, the computer readable medium may also include both an internal storage unit of the server and an external storage device. The computer readable medium is used for storing the computer program and other programs and data required by the server. The computer readable medium may also be used for temporarily storing data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional generic sense in the foregoing description for the purpose of clearly illustrating the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the method, the apparatus and the server described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed method, apparatus and server may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the part of the technical solution of the present invention that essentially contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of patent searching, comprising:
recognizing the full text or the designated part of the patent text in a patent database, and extracting keywords corresponding to the symbols of the patent text from the patent text;
storing the extracted keywords in a keyword database, and storing the corresponding relation between the keywords and the corresponding patent text;
receiving a patent search request, wherein the patent search request comprises non-bibliographic item keywords;
acquiring, in the keyword database, keywords that match the non-bibliographic item keywords;
determining a patent search result of the patent search request according to the corresponding relation between the keywords and the corresponding patent texts;
and returning the patent search result.
2. The method of patent search according to claim 1, wherein extracting keywords corresponding to symbols of the patent text from the patent text specifically comprises:
identifying numbers or English letters or a combination of the numbers and the English letters in the patent text to obtain a first candidate symbol;
traversing the patent text to determine the occurrence number of the first candidate symbol, and removing the candidate symbol which only occurs once from the first candidate symbol to obtain a second candidate symbol;
comparing the characters at positions adjacent to each occurrence of the same second candidate symbol, and determining the second candidate symbols adjacent to characters that appear repeatedly as the symbols of the patent text.
3. The method of patent search according to claim 2, wherein after comparing the characters at positions adjacent to each occurrence of the same second candidate symbol and determining the second candidate symbols adjacent to characters that appear repeatedly as the symbols of the patent text, the method further comprises:
determining characters repeatedly appearing at adjacent positions of the symbols as symbol names of the patent texts;
and storing the corresponding relation between the symbol name and the symbol.
4. The method of patent search according to claim 3, wherein the weight of the symbol name is positively correlated with the frequency of occurrence of the symbol name in the patent text.
5. The method of patent search according to claim 4, wherein the ranking of the patents corresponding to the patent texts in the patent search results is determined according to the weight of the symbol names in each patent text.
6. The method of patent search according to claim 4 or 5, wherein after determining the characters repeatedly appearing at adjacent positions of the symbols as the symbol names of the patent text, the method further comprises:
classifying the symbol names to obtain symbol name categories and weights of the symbol name categories, wherein the weights of the symbol name categories are the sum of the weights of the symbol names;
and taking the weight of the symbol name category as the weight of each symbol name in the symbol name category.
7. The method of patent search according to claim 3, wherein after recognizing the full text or the designated part of the patent text in the patent database and extracting the keywords corresponding to the symbols of the patent text from the patent text, the method further comprises:
arranging, in an area adjacent to the drawings of the patent text, corresponding symbol controls for linking to the symbol names.
8. The method of patent search according to claim 3, wherein corresponding symbol names are arranged at adjacent positions of symbols in a drawing area of the patent text.
9. A server, characterized in that it comprises a processor, an input device, an output device and a memory, which are interconnected, wherein the memory is used to store a computer program comprising program instructions, which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-8.
10. A computer-readable medium, characterized in that the computer-readable medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-8.
CN201811041125.3A 2018-09-07 2018-09-07 Patent search method, server and computer readable medium Pending CN110888894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811041125.3A CN110888894A (en) 2018-09-07 2018-09-07 Patent search method, server and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811041125.3A CN110888894A (en) 2018-09-07 2018-09-07 Patent search method, server and computer readable medium

Publications (1)

Publication Number Publication Date
CN110888894A true CN110888894A (en) 2020-03-17

Family

ID=69744416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811041125.3A Pending CN110888894A (en) 2018-09-07 2018-09-07 Patent search method, server and computer readable medium

Country Status (1)

Country Link
CN (1) CN110888894A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784026A (en) * 2021-01-12 2021-05-11 国网江苏省电力有限公司电力科学研究院 Patent information analysis method and device based on big data and storage medium
CN113239194A (en) * 2021-04-30 2021-08-10 中国航空工业集团公司西安飞机设计研究所 Patent review method, system, storage medium and electronic device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784026A (en) * 2021-01-12 2021-05-11 国网江苏省电力有限公司电力科学研究院 Patent information analysis method and device based on big data and storage medium
CN112784026B (en) * 2021-01-12 2024-04-16 国网江苏省电力有限公司电力科学研究院 Patent information analysis method, equipment and storage medium based on big data
CN113239194A (en) * 2021-04-30 2021-08-10 中国航空工业集团公司西安飞机设计研究所 Patent review method, system, storage medium and electronic device
CN113239194B (en) * 2021-04-30 2023-05-05 中国航空工业集团公司西安飞机设计研究所 Patent evaluation method, system, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US10489439B2 (en) System and method for entity extraction from semi-structured text documents
JP4701292B2 (en) Computer system, method and computer program for creating term dictionary from specific expressions or technical terms contained in text data
US9483460B2 (en) Automated formation of specialized dictionaries
Borges et al. Discovering geographic locations in web pages using urban addresses
US9965459B2 (en) Providing contextual information associated with a source document using information from external reference documents
CN107085583B (en) Electronic document management method and device based on content
US20130124515A1 (en) Method for document search and analysis
CA3177671A1 (en) Enquiring method and device based on vertical search, computer equipment and storage medium
US20220222292A1 (en) Method and system for ideogram character analysis
CN109299235B (en) Knowledge base searching method, device and computer readable storage medium
WO2020056977A1 (en) Knowledge point pushing method and device, and computer readable storage medium
US20090112845A1 (en) System and method for language sensitive contextual searching
US11520835B2 (en) Learning system, learning method, and program
CN110741376A (en) Automatic document analysis for different natural languages
US12099551B2 (en) Information search system
Sutoyo et al. Detecting documents plagiarism using winnowing algorithm and k-gram method
Saini et al. Intrinsic plagiarism detection system using stylometric features and DBSCAN
CN110888894A (en) Patent search method, server and computer readable medium
CN119739838A (en) RAG intelligent question answering method, device, equipment and medium for multi-label generation and matching
Nanba et al. Bilingual PRESRI-Integration of Multiple Research Paper Databases.
CN112818005A (en) Structured data searching method, device, equipment and storage medium
JP7326637B2 (en) CHUNKING EXECUTION SYSTEM, CHUNKING EXECUTION METHOD, AND PROGRAM
JP4426893B2 (en) Document search method, document search program, and document search apparatus for executing the same
EP2026216A1 (en) Data processing method, computer program product and data processing system
JP2007128224A (en) Document indexing apparatus, document indexing method, and document indexing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity