CN112307302A - New technology query recommendation method based on keyword extraction - Google Patents

New technology query recommendation method based on keyword extraction

Publication number
CN112307302A
Authority
CN
China
Prior art keywords
new technology
word
keywords
keyword
technical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011048900.5A
Other languages
Chinese (zh)
Inventor
郑鑫
于德尚
张旭
侯永红
高经纬
江秀财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Mengdou Network Technology Co ltd
Original Assignee
Qingdao Mengdou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Mengdou Network Technology Co ltd
Priority to CN202011048900.5A
Publication of CN112307302A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The new technology query recommendation method based on keyword extraction comprises two stages: a preparation stage, in which a new technology data model is prepared, and an application stage, in which new technologies are matched and recommended according to a technical requirement. In the preparation stage, key vocabulary sentences are extracted from the title and achievement introduction of each new technology, keywords are extracted from those sentences, the keywords and their word frequencies are counted, and the keywords of the new technology and their word frequencies are selected in descending order of word frequency. In the application stage, the title and requirement introduction of the technical requirement are processed in the same way to obtain the technical requirement keywords and their word frequencies; the keyword-based matching degree between the technical requirement and each new technology is calculated; for every new technology whose matching degree with the technical requirement is not 0, the keyword-based matching rate is calculated; and new technologies are recommended to the user according to the user's technical requirement. The new technologies recommended by the method fit the technical requirement more closely, while calculation speed is improved and manual participation is reduced.

Description

New technology query recommendation method based on keyword extraction
Technical Field
The invention relates to the technical field of new technology query recommendation, in particular to a new technology query recommendation method based on keyword extraction.
Background
The application scenario of the invention is as follows: given a technical requirement, a suitable new technology is sought. A technical requirement is a technical problem that a customer currently needs to solve. The question is how to match a user's technical requirement with new technologies that can solve the corresponding problem. The invention performs query and recommendation by extracting keywords from both the technical requirement and the new technologies, so that a more appropriate new technology is matched for the user.
When a user submits an introduction to a new technology, keywords are usually not provided, and a technical auditor generally has to process the introduction manually, extract the keywords of the technology and confirm its technical field (a field classification specified by the platform). Manual keyword extraction is time-consuming and labor-intensive, and it is affected by subjective factors and by the reviewer's technical limitations, so the extracted keywords may be incorrect or incomplete, which adversely affects keyword-based applications.
The invention uses a keyword-based method to match technical requirements with new technologies, thereby improving the efficiency and accuracy of matching.
Keyword extraction selects, from a new technology introduction or a technical requirement, the words or terms that can represent that new technology or requirement. Automatic keyword extraction uses a computer to select the words that express the main content, which provides a short summary of the new technology or technical requirement and makes it possible to find, accurately and quickly, the related and highly relevant new technologies among a large number of new technologies.
The keywords of each new technology are extracted, stored and reused. After keywords are extracted from the user's technical requirement, the stored new technology keywords are queried for identical keywords, and the technical requirement is matched with new technologies according to a weight value (calculated in the step-by-step description). With the new technology query recommendation method based on keyword extraction, the recommended new technologies fit the technical requirement more closely, while calculation speed is effectively improved and manual participation is reduced.
Disclosure of Invention
The purpose of the invention is: aiming at the problems described in the background art, the invention provides a new technology query recommendation method based on keyword extraction, which carries out query recommendation by extracting technical requirements and keywords of the new technology so as to match a more appropriate new technology for a user.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the new technology query recommendation method based on keyword extraction is characterized by comprising the following steps:
(1) a preparation stage: preparing a new technical data model; the method comprises the following steps:
step 1.1: extracting key vocabulary sentences from the title and achievement introduction of each new technology, both those already in the database and those continuously entered by users;
step 1.2: extracting key words from key vocabulary sentences of new technical titles and achievement introduction respectively;
step 1.3: counting the keywords and the corresponding word frequency of the new technology;
step 1.4: determining the first KT new technical keywords and the corresponding word frequencies to be finally extracted according to the sequence of the word frequencies from high to low;
(2) an application stage: matching and recommending a new technology according to the technical requirement;
step 2.1: extracting key vocabulary sentences from the title and requirement introduction of the technical requirement currently entered by the user;
step 2.2: extracting key words from the titles of the technical requirements and the key vocabulary sentences of the requirement brief introduction respectively;
step 2.3: counting key words and corresponding word frequencies of technical requirements;
step 2.4: determining to finally extract the first KD technical requirement keywords and the corresponding word frequencies according to the sequence of the word frequencies from high to low;
step 2.5: calculating a keyword-based matching degree FW between the technical requirement and the new technology;
step 2.6: for each new technology whose matching degree with the technical requirement is not 0, calculating the keyword-based matching rate FR between that new technology and the technical requirement;
step 2.7: for the user's technical requirement, sorting the new technologies by FW from high to low and then by FR from high to low, and recommending new technologies to the user according to the sorted result.
Further, for (1) the preparation phase: for a new technology which is newly input, the key words need to be extracted through the steps of the preparation stage, and the key words and the new technology are simultaneously stored in the database to provide a basis for the following calculation; the new technology and the keyword information of the existing database do not need to be repeatedly calculated through the steps every time, and only need to be periodically updated when the word stock is changed.
Further, in step 1.1, the key vocabulary sentences are extracted as follows: using the part-of-speech segmentation function of the jieba word segmenter, the new technology text is segmented on the basis of the basic word bank and the stop word bank, and only words with certain parts of speech are retained as the description of the new technology for the next step; the retained parts of speech include nouns, verbal nouns, English words and morpheme words; when a removed part of speech contains words that are worth extracting, the word bank is modified and supplemented in two ways: (1) modifying the part of speech: the part of speech of the word is changed in the word bank to vnmd, a part of speech reserved for extracted words; (2) adding a part of speech: proper nouns from the various fields of new technology are added to a newly created word bank with part of speech mnmd, and their word frequency is set to the maximum word frequency in the current basic word bank plus 1; the method of extracting key vocabulary sentences in step 2.1 is the same as in step 1.1.
Further, in step 1.2, keywords are extracted from the key vocabulary sentences of the new technology title and achievement introduction as follows: using the segmentation function of the jieba word segmenter, with the exclusive noun library and the stop word bank added, the key vocabulary sentences extracted in step 1.1 are segmented; proper nouns and terms from each field of new technology are accumulated and added to the basic word bank or the exclusive noun library; stop words are added only to the stop word bank of step 1.2; words that step 1.1 failed to screen out are added to the stop word bank and are not counted among the successfully segmented words; the method of extracting keywords from the key vocabulary sentences of the technical requirement title and requirement introduction in step 2.2 is the same as in step 1.2.
Further, the step 1.3: the method for counting the keywords and the corresponding word frequency of the new technology comprises the following steps:
for the words successfully segmented in the step 1.2, counting the word frequency of the corresponding words;
in the extraction of the new technology keywords, extracting the keywords from the title and the content of the new technology respectively;
the final confirmation method of the keyword word frequency in the new technology comprises the following steps:
$$F_i = T \times F_{ti} + F_{ci}, \quad T \ge 1$$
where $F_i$ denotes the final word frequency of the ith keyword, $F_{ti}$ the number of times the ith keyword appears in the new technology title, $F_{ci}$ the number of times the ith keyword appears in the new technology content, and $T$ the weight of $F_{ti}$, that is, a keyword appearing in the title is at least as important as one appearing in the content; when there is only a new technology title, $F_{ci} = 0$; when there is only new technology content, $F_{ti} = 0$;
The method for counting the keywords and word frequencies of the technical requirement in step 2.3 is the same as in step 1.3; the only difference is that, by default, a technical requirement has no title and only the requirement introduction is available.
Further, in step 1.4, the first KT new technology keywords and their word frequencies are determined in descending order of word frequency as follows: the keywords extracted in step 1.3 are sorted from high to low by their final word frequency, and the first KT keywords are selected as the keywords of the new technology and stored for subsequent use; when fewer than KT keywords are available, the actual number is used; the method of determining the first KD technical requirement keywords and their word frequencies in step 2.4 is the same as in step 1.4.
Further, the step 2.5: the method comprises the following steps of calculating a keyword-based matching degree FW between technical requirements and a new technology, and specifically comprises the following steps:
for the technical requirement, setting the initial value $FW_i = 0$, $i = 0, 1, \dots, n$, for each new technology; where n is the number of new technologies, i denotes the ith new technology, and $FW_i$ denotes the keyword matching degree of the ith new technology;
counting the number m of keywords shared by the keywords of each new technology and the keywords of the technical requirement; the keyword matching degree between the technical requirement and the new technology is then: $FW_i = m$, $i = 0, 1, \dots, n$; $0 \le m \le KT$.
Further, the step 2.6: the matching rate of the keywords is calculated, and the specific method comprises the following steps:
calculating the keyword matching rate FR between the technical requirement and each new technology with FW ≠ 0;
[Equation image not recovered: definition of $FR_i$ in terms of $FD_j$, $FD$, $FT_{ij}$ and $FT_i$]
where $FR_i$ denotes the matching rate between the ith new technology and the technical requirement; $FD_j$ denotes the word frequency, in the technical requirement, of the jth keyword in the set of keywords shared by the technical requirement and the ith new technology; $FD$ denotes the sum of the word frequencies of all keywords of the technical requirement; $FT_{ij}$ denotes the word frequency, in the new technology, of the jth keyword in that shared set; and $FT_i$ denotes the sum of the word frequencies of the first K keywords of the ith new technology;
namely:
$$FD = \sum_{t=1}^{t_n} F_t, \quad t_n \le KD$$
$$FT_i = \sum_{t=1}^{t_{in}} F_{it}, \quad t_{in} \le KT$$
where $FD$ denotes the sum of the word frequencies of the keywords of the technical requirement, $F_t$ the word frequency of the t-th keyword of the technical requirement, $t_n$ the total number of keywords of the technical requirement, and KD the value actually set for the number of keywords K of a technical requirement; $F_{it}$ denotes the word frequency of the t-th keyword of the ith new technology, $t_{in}$ the total number of keywords of the ith new technology, and KT the value actually set for the number of keywords K of a new technology.
Further, in step 2.7, the recommended new technologies are ordered as follows:
the new technologies are sorted from high to low by the keyword matching degree FW between the technical requirement and each new technology;
a new technology with FW = 0 is not recommended;
on top of the FW ordering, new technologies with FW ≠ 0 that share the same FW value are sorted from high to low by their keyword matching rate FR with the technical requirement;
the order obtained after sorting by FW and then by FR is the final recommendation order.
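The two-level ordering described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `rank_recommendations`, the technology identifiers and the FW/FR values are all hypothetical, and the FR values are simply supplied as inputs.

```python
from typing import Dict, List, Tuple


def rank_recommendations(scores: Dict[str, Tuple[int, float]]) -> List[str]:
    """Order technology ids by FW (descending), breaking ties by FR (descending).

    `scores` maps a technology id to its (FW, FR) pair. Technologies with
    FW == 0 are dropped, since they are not recommended.
    """
    candidates = [(tech_id, fw, fr) for tech_id, (fw, fr) in scores.items() if fw != 0]
    # Negating both keys gives a descending sort on FW first, then on FR.
    candidates.sort(key=lambda c: (-c[1], -c[2]))
    return [tech_id for tech_id, _, _ in candidates]


# Hypothetical scores for four new technologies.
example_scores = {
    "T1": (2, 0.30),
    "T2": (0, 0.00),
    "T3": (2, 0.45),
    "T4": (1, 0.90),
}
ranking = rank_recommendations(example_scores)
print(ranking)  # ['T3', 'T1', 'T4']
```

Note that T4 has the highest FR but ranks last, because FR only decides the order among technologies with the same FW.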
Further, the method also comprises maintaining the basic word bank, the exclusive noun library and the stop word bank, as follows:
maintenance of the basic word bank: when, after segmentation, a word is partially lost or loses its front part, the word is added to the basic word bank, and its word frequency is set according to its word formation probability;
maintenance of the exclusive noun library: when, after segmentation, a word is split into two or more words, the word is added to the exclusive noun library;
maintenance of the stop word bank: words that appear in new technologies or technical requirements but, because they are generic descriptive words, have an excessively high word frequency and interfere with keyword extraction are added to the stop word bank.
The technical solution provided by the embodiments of the invention has at least the following beneficial effects:
(1) the method can objectively and effectively extract the keywords in the new technology and the technical requirement, and eliminate redundant words in the description information.
(2) The word stock has strong expansibility and maintainability.
(3) The technical requirement is matched with the new technology to a higher degree.
(4) The matching speed of the technical requirement and the new technology is improved, and the manual participation is reduced.
(5) The method adds a keyword dimension to the query and recommendation of new technologies, so that search no longer depends only on platform labels, and it addresses the search problem for new technologies that span several technical fields.
(6) The extracted keywords lay the foundation for other further applications, such as automatic division of the application fields of the new technology and the like.
The invention provides a more appropriate new technology query recommendation method, which simultaneously has a more perfect basic word bank, a stop word bank and an exclusive name word bank.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of a new technology query recommendation method based on keyword extraction disclosed in the embodiment of the present invention.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a new technology query recommendation method based on keyword extraction, comprising a preparation stage, in which a new technology data model is prepared, and an application stage, in which new technologies are matched and recommended according to the technical requirement. Each stage mainly comprises the following steps:
(1) a preparation stage: preparation of new technology data model
Step 1.1: extract key vocabulary sentences from the title and achievement introduction of each new technology, both those already in the database and those continuously entered by users.
Step 1.2: extract keywords from the key vocabulary sentences of the new technology title and of the achievement introduction, respectively.
Step 1.3: count the keywords of the new technology and their word frequencies.
Step 1.4: in descending order of word frequency, determine the first KT new technology keywords and their word frequencies as the final extraction result.
Note: a newly entered new technology must go through the steps above to have its keywords extracted, and the extracted keywords are stored in the database together with the new technology to provide a basis for subsequent calculation. The new technologies and keywords already in the database do not need to be recomputed each time; they only need to be updated periodically when the word banks change.
(2) An application stage: matching and recommending new technology according to technical requirements
Step 2.1: extract key vocabulary sentences from the title and requirement introduction of the technical requirement currently entered by the user.
Step 2.2: extract keywords from the key vocabulary sentences of the technical requirement title and of the requirement introduction, respectively.
Step 2.3: count the keywords of the technical requirement and their word frequencies.
Step 2.4: in descending order of word frequency, determine the first KD technical requirement keywords and their word frequencies as the final extraction result.
Step 2.5: calculate the keyword-based matching degree between the technical requirement and each new technology.
Step 2.6: for each new technology whose matching degree with the technical requirement is not 0, calculate the keyword-based matching rate between the technical requirement and that new technology.
Step 2.7: recommend new technologies to the user according to the user's technical requirement.
The following describes in detail a new technical query recommendation method based on keyword extraction according to an embodiment of the present invention:
Part 1: keywords are extracted from the title and achievement introduction of each new technology and from the title and requirement introduction of each technical requirement. The following takes keyword extraction for a new technology as the example; the method for technical requirements is the same.
Step one: extract key vocabulary sentences (corresponding to step 1.1; the method of step 2.1 is the same).
Using the part-of-speech segmentation function jieba.posseg.cut() of the jieba word segmenter, the new technology text is segmented on the basis of the basic word bank (which contains words, word frequencies and parts of speech) and the stop word bank (which screens out useless words, including some words whose part of speech would otherwise be retained), and only words with certain parts of speech are retained as the description of the new technology for the next step. The retained parts of speech include nouns, verbal nouns, English words, morpheme words and so on.
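The part-of-speech filter described above might look like the following Python sketch. It works on already tagged (word, flag) pairs so that it runs without jieba installed; with jieba available, the pairs could be produced by `[(p.word, p.flag) for p in jieba.posseg.cut(text)]`. The function name `filter_key_vocabulary`, the exact set of retained flags, and the sample tagger output are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Flags to keep. "n", "vn", "eng" and "ng" follow jieba's tag set; "vnmd" and
# "mnmd" are the platform-specific tags named in the text. The exact set of
# retained flags is an assumption.
KEEP_FLAGS: Set[str] = {"n", "vn", "eng", "ng", "vnmd", "mnmd"}


def filter_key_vocabulary(
    tagged: Iterable[Tuple[str, str]],
    stop_words: Set[str],
    keep_flags: Set[str] = KEEP_FLAGS,
) -> List[str]:
    """Keep words whose flag is retained and that are not stop words."""
    return [
        word
        for word, flag in tagged
        if flag in keep_flags and word not in stop_words
    ]


# Hypothetical tagger output for a short new technology title.
tagged_title = [
    ("鹿血肽", "mnmd"),
    ("的", "uj"),
    ("制造", "vnmd"),
    ("方法", "n"),
    ("研究", "vn"),
]
key_vocabulary = filter_key_vocabulary(tagged_title, stop_words={"方法"})
print(key_vocabulary)  # ['鹿血肽', '制造', '研究']
```

Here "方法" is a noun and would pass the part-of-speech test, but it is dropped because it is listed as a stop word, which mirrors the role the stop word bank plays in this step.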
When a removed part of speech contains a small number of words that are nevertheless worth extracting, the word bank is modified and supplemented in two ways.
(1) Modifying the part of speech: the part of speech of the word is changed in the word bank to vnmd (a verb tag exclusive to the Mengdou platform), a part of speech reserved for extracted words. Take "manufacturing" as an example: its original part of speech is verb, and most verbs cannot describe a new technology, but a small number of verbs, such as "manufacturing", do describe an industry (the manufacturing industry), so the part of speech of such words is changed to vnmd.
(2) Adding a part of speech: proper nouns from the various fields of new technology are split too finely when segmented with the basic word bank. For example, with the original part-of-speech word bank, "deer blood peptide" is segmented as: deer, noun n; blood, noun n; peptide, noun morpheme ng. "Deer blood peptide" is therefore added to a newly created word bank with part of speech mnmd (a medicine tag exclusive to the Mengdou platform), and its word frequency is set to the maximum word frequency in the current basic word bank plus 1, so that the newly added word has a higher word formation probability in the final segmentation calculation and ambiguity is resolved more reliably.
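The two word bank modifications can be sketched in Python as below. The word bank contents and frequencies are invented for illustration, the helper names are hypothetical, and for brevity the new proper noun is stored in the same dictionary rather than in a separate newly created word bank. The output line uses the "word frequency tag" layout of a jieba user dictionary, which could then be loaded with `jieba.load_userdict`.

```python
from typing import Dict, Tuple

# Hypothetical basic word bank: word -> (word frequency, part-of-speech tag).
base_word_bank: Dict[str, Tuple[int, str]] = {
    "鹿": (1205, "n"),
    "血": (3081, "n"),
    "肽": (94, "ng"),
    "制造": (2650, "v"),
}


def retag(word_bank: Dict[str, Tuple[int, str]], word: str, new_tag: str) -> None:
    """Way (1): change the tag of an existing word, keeping its frequency."""
    freq, _ = word_bank[word]
    word_bank[word] = (freq, new_tag)


def add_proper_noun(word_bank: Dict[str, Tuple[int, str]], word: str, tag: str) -> None:
    """Way (2): add a word with frequency = current maximum frequency + 1."""
    max_freq = max(freq for freq, _ in word_bank.values())
    word_bank[word] = (max_freq + 1, tag)


def to_dict_line(word_bank: Dict[str, Tuple[int, str]], word: str) -> str:
    """Format one entry as 'word frequency tag'."""
    freq, tag = word_bank[word]
    return f"{word} {freq} {tag}"


retag(base_word_bank, "制造", "vnmd")
add_proper_noun(base_word_bank, "鹿血肽", "mnmd")
print(to_dict_line(base_word_bank, "制造"))    # 制造 2650 vnmd
print(to_dict_line(base_word_bank, "鹿血肽"))  # 鹿血肽 3082 mnmd
```

Giving the new term a frequency above every existing entry is what makes the segmenter prefer the whole term over the split "鹿 / 血 / 肽".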
Step two: extract keywords from the key vocabulary sentences of the new technology title and achievement introduction, and of the technical requirement title and requirement introduction (corresponding to steps 1.2 and 2.2).
Using the segmentation function jieba.cut() of the jieba word segmenter, with the exclusive noun library and the stop word bank added, the key vocabulary sentences extracted in step one are segmented. In actual use, proper nouns and terms from each field of new technology are accumulated and added to the basic word bank or the exclusive noun library; stop words are added only to the stop word bank of step two. Words that step one failed to screen out are added to the stop word bank and are not counted among the successfully segmented words.
Step three: count the keywords and their word frequencies for the new technology or technical requirement; the new technology is taken as the example (corresponding to step 1.3; the method of step 2.3 is the same).
For the words successfully segmented in step two, the word frequency of each word is counted.
For a new technology, keywords are extracted separately from its title and from its content.
The final word frequency of a keyword in a new technology is determined as follows:
$$F_i = T \times F_{ti} + F_{ci}, \quad T \ge 1$$
where $F_i$ denotes the final word frequency of the ith keyword, $F_{ti}$ the number of times the ith keyword appears in the new technology title, $F_{ci}$ the number of times the ith keyword appears in the new technology content, and $T$ the weight of $F_{ti}$ (that is, a keyword appearing in the title is at least as important as one appearing in the content). T is tentatively set to 2 and is adjusted later according to usage. If a new technology has only a title, $F_{ci} = 0$; if it has only content, $F_{ti} = 0$. By default a technical requirement has no title, only the requirement description. (Note: the number of keywords extracted for new technologies and for technical requirements is set separately according to need. The criterion is: when experts, looking at the extracted keywords, judge that the first K keywords suffice to describe and summarize the new technology accurately, that value of K is used as the number of keywords extracted for new technologies; the same applies to technical requirements.)
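The weighted word frequency $F_i = T \times F_{ti} + F_{ci}$ can be illustrated with a short Python sketch. The function name `final_word_frequency` and the sample word lists are hypothetical; the default weight of 2 is the tentative value of T given in the text.

```python
from collections import Counter
from typing import Dict, Iterable


def final_word_frequency(
    title_words: Iterable[str],
    content_words: Iterable[str],
    title_weight: int = 2,
) -> Dict[str, int]:
    """Return F_i = T * F_ti + F_ci for every keyword i."""
    if title_weight < 1:
        raise ValueError("T must be at least 1")
    title_counts = Counter(title_words)      # F_ti
    content_counts = Counter(content_words)  # F_ci
    words = set(title_counts) | set(content_counts)
    # A Counter returns 0 for a missing word, which covers the
    # title-only (F_ci = 0) and content-only (F_ti = 0) cases.
    return {w: title_weight * title_counts[w] + content_counts[w] for w in words}


# Hypothetical segmented title and content of one new technology.
title = ["激光", "焊接"]
content = ["焊接", "焊接", "机器人", "激光"]
freqs = final_word_frequency(title, content)
print(freqs["焊接"], freqs["激光"], freqs["机器人"])  # 4 3 1
```

For a technical requirement, which by default has no title, the same function can be called with an empty title list.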
Step four: determine the final keywords of the new technology or technical requirement according to the word frequency ordering (corresponding to steps 1.4 and 2.4).
The keywords extracted in step three are sorted from high to low by their final word frequency. The first K keywords are selected as the keywords of the new technology and stored for subsequent use; when fewer than K keywords are available, the actual number is used. K is tentatively set to 10 and can be adjusted later according to usage. (Note: the number of keywords extracted for new technologies and for technical requirements is set separately according to need.)
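Selecting the first K keywords can be sketched as follows. The function name `top_k_keywords` and the sample frequencies are hypothetical, and the alphabetical tie-break between keywords with equal frequency is an assumption, since the text does not say how ties are resolved.

```python
from typing import Dict, List, Tuple


def top_k_keywords(freqs: Dict[str, int], k: int = 10) -> List[Tuple[str, int]]:
    """Return up to k (keyword, frequency) pairs, highest frequency first.

    If fewer than k keywords exist, all of them are returned.
    """
    # Sort by descending frequency; equal frequencies are ordered by keyword
    # (an assumed tie-break, used only to make the result deterministic).
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical final word frequencies of one new technology.
sample = {"焊接": 4, "激光": 3, "机器人": 1}
print(top_k_keywords(sample, k=2))   # [('焊接', 4), ('激光', 3)]
print(len(top_k_keywords(sample)))   # 3, fewer than the default K = 10
```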
Note: after the keywords of a new technology are extracted, the keywords and their word frequencies are stored. When the basic word bank, the exclusive noun library or the stop word bank changes, the keywords of the new technologies can be updated on a schedule or manually; they do not need to be re-extracted each time a new technology is queried. The keywords of a technical requirement are extracted at the time of use, which keeps them up to date.
Part 2: matching technical requirements with new technologies
When a user enters a technical requirement, its keywords are extracted, and the stored new technologies are then queried and matched against it.
Step one: calculate the keyword matching degree (corresponding to step 2.5).
FW denotes the keyword matching degree between the technical requirement and a new technology.
(1) For each new technology, initialize the matching degree with the technical requirement: $FW_i = 0$, $i = 0, 1, \dots, n$, where n is the number of new technologies, i denotes the i-th new technology, and $FW_i$ is the keyword matching degree of the i-th new technology.
(2) For each new technology, count the number m of its keywords that coincide with the keywords of the technical requirement. The keyword matching degree between the technical requirement and the new technology is then $FW_i = m$, $i = 0, 1, \dots, n$, with $0 \le m \le KT$ (KT is the value actually set for the number K of new technology keywords).
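The matching degree described above is a count of shared keywords, which can be sketched with a set intersection. The function name `matching_degree`, the technology labels, and the keyword lists below are hypothetical and serve only as an illustration.

```python
def matching_degree(requirement_keywords, technology_keywords):
    """FW: number of keywords shared by the requirement and one new technology."""
    return len(set(requirement_keywords) & set(technology_keywords))


# Hypothetical keyword sets, for illustration only.
requirement = ["graphene", "preparation", "technology"]
technologies = {
    "tech A": ["graphene", "preparation", "aerogel"],
    "tech B": ["graphene", "film"],
    "tech C": ["straw", "ethanol"],
}

# One FW value per new technology; unmatched technologies stay at 0.
fw = {name: matching_degree(requirement, kws) for name, kws in technologies.items()}
```

Because each new technology stores at most KT keywords, the count can never exceed KT, in line with the bound stated above.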
Step two: calculate the keyword matching rate (corresponding to step 2.6).
For each new technology with FW ≠ 0, calculate the keyword matching rate FR between that new technology and the technical requirement.
Figure BDA0002708906300000111
where $FR_i$ is the matching rate between the i-th new technology and the technical requirement; $FD_j$ is the word frequency, in the technical requirement, of the j-th keyword in the set of keywords shared by the technical requirement and the i-th new technology; $FD$ is the sum of the word frequencies of all keywords of the technical requirement; $FT_{ij}$ is the word frequency, in the i-th new technology, of the j-th keyword in that shared set; and $FT_i$ is the sum of the word frequencies of the first K keywords of the i-th new technology.
Namely:
$$FD = \sum_{t=1}^{t_n} F_t, \quad t_n \le KD$$
$$FT_i = \sum_{t=1}^{t_{in}} F_{it}, \quad t_{in} \le KT$$
where $FD$ is the sum of the word frequencies of the keywords of the technical requirement, $F_t$ is the word frequency of the t-th keyword of the technical requirement, $t_n$ is the total number of keywords of the technical requirement, and KD is the value actually set for the number K of technical requirement keywords. $F_{it}$ is the word frequency of the t-th keyword of the i-th new technology, $t_{in}$ is the total number of keywords of the i-th new technology, and KT is the value actually set for the number K of new technology keywords.
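The quantities defined above can be collected from two keyword-to-frequency mappings, as in the sketch below. The function name `frequency_totals`, the dictionary layout, and the sample frequencies are hypothetical. The formula that combines these quantities into FR appears only as a figure in this text, so the sketch stops short of computing FR.

```python
def frequency_totals(requirement_freqs, technology_freqs):
    """Collect FD, FT_i and the per-keyword values for the shared keyword set.

    Both arguments map a stored keyword to its final word frequency.
    """
    shared = sorted(set(requirement_freqs) & set(technology_freqs))
    return {
        "shared": shared,                                    # coinciding keywords
        "FD": sum(requirement_freqs.values()),               # requirement total
        "FT_i": sum(technology_freqs.values()),              # technology total
        "FD_j": [requirement_freqs[w] for w in shared],      # requirement side
        "FT_ij": [technology_freqs[w] for w in shared],      # technology side
    }


# Hypothetical frequencies, for illustration only.
requirement = {"graphene": 1, "preparation": 1, "technology": 1}
technology = {"graphene": 3, "coating": 2, "heating": 6}
totals = frequency_totals(requirement, technology)
```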
Step three: rank the new technologies for recommendation (corresponding to step 2.7).
(1) Sort the new technologies from high to low by their keyword matching degree FW with the technical requirement.
(2) A new technology with FW = 0 is not recommended.
(3) Among new technologies with FW ≠ 0 that have the same FW value, sort from high to low by their keyword matching rate FR with the technical requirement.
(4) The order obtained after sorting by FW and then by FR is the final recommendation order.
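The ranking rules above amount to a filter followed by a two-key sort. The sketch below assumes FW and FR have already been computed for each new technology; the function name `recommend`, the technology labels, and the scores are hypothetical.

```python
def recommend(scores):
    """Rank new technologies by FW (descending), then by FR (descending).

    `scores` maps a technology name to a (FW, FR) pair. Technologies with
    FW == 0 are not recommended and are dropped before sorting.
    """
    candidates = [(name, fw, fr) for name, (fw, fr) in scores.items() if fw != 0]
    candidates.sort(key=lambda c: (-c[1], -c[2]))
    return [name for name, _, _ in candidates]


# Hypothetical scores, for illustration only.
ranking = recommend({
    "tech A": (2, 0.10),
    "tech B": (1, 0.30),
    "tech C": (2, 0.25),
    "tech D": (0, 0.00),
})
```

Here "tech C" precedes "tech A" because both have FW = 2 and "tech C" has the higher FR; "tech B" follows despite its higher FR because its FW is lower, and "tech D" is left out.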
Part 3: maintenance of the basic word bank, the exclusive name word bank and the stop word bank
(1) Maintenance of the basic word bank: when part of a word, or its leading part, is lost after word segmentation, the word is added to the basic word bank with a word frequency set according to its word-formation probability. For example: (a) "deer blood peptide" is a proper noun in the biological field; it is added under a newly created part of speech, mnmd, and its word frequency is set to the current maximum word frequency in the basic word bank plus 1, so that the newly added word has a high word-formation probability during segmentation, which strengthens ambiguity resolution. (b) "manufacturing": if extraction fails because the word is tagged as a verb and is therefore filtered out, its part of speech is changed to vnmd and its word frequency is left unchanged. Adding or modifying a word, or adding or modifying a part of speech, requires expert review before the change is made to the basic word bank.
(2) Maintenance of the exclusive name word bank: when a word is split into two or more words after word segmentation, the word is added to the exclusive name word bank. For example: (a) "three-dimensional model" is segmented into "three-dimensional" and "model", but according to domain knowledge it is a proper noun, so it is added to the exclusive name word bank and is then segmented as the single word "three-dimensional model". (b) "varicella vaccine" is segmented into "varicella" and "vaccine". If more precise keywords and more precise query recommendations are wanted, "varicella vaccine" is added to the exclusive name word bank. If keyword extraction can be somewhat looser, it is not added: technologies may be transferable or serve as references for one another, so new technologies about varicella or about vaccines can also be recommended. Adding or deleting an exclusive name requires expert review before the change is made to the exclusive name word bank.
(3) Maintenance of the stop word bank: only the stop word bank used in step two of Part 1 needs to be maintained. Some words that may appear in new technologies or technical requirements are so generic that their word frequency is too high and they distort keyword extraction; such words are added to the stop word bank. For example, "subject" cannot summarize a new technology or technical requirement, so it is added to the stop word bank. Adding or deleting a stop word requires expert review before the change is made to the stop word bank.
Note: the basic word bank, exclusive name word bank, and stop word bank that the platform currently uses for keyword extraction have been refined over a period of time, through additions, modifications, and deletions of words, on 5000 new technologies and technical requirements. They therefore provide a certain word segmentation capability and a certain level of accuracy. They will continue to be maintained and updated according to actual usage.
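The effect of the exclusive name word bank and the stop word bank on a keyword list can be illustrated with a simplified stand-in. In the source, the word banks are supplied to the word segmenter itself; the sketch below instead post-processes a token list that has already been segmented, which is an assumption made to keep the example self-contained. The function name `apply_word_banks` and the sample tokens are hypothetical, and tokens are joined with a space only because the example uses English words.

```python
def apply_word_banks(tokens, exclusive_names, stop_words):
    """Merge adjacent tokens that form an exclusive name, then drop stop words."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest run of two or more tokens starting at position i.
        for j in range(len(tokens), i + 1, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in exclusive_names:
                merged.append(candidate)
                i = j
                break
        else:
            # No exclusive name starts here; keep the single token.
            merged.append(tokens[i])
            i += 1
    return [word for word in merged if word not in stop_words]


tokens = ["varicella", "vaccine", "subject"]
strict = apply_word_banks(tokens, {"varicella vaccine"}, {"subject"})
loose = apply_word_banks(tokens, set(), {"subject"})
```

The `strict` result keeps "varicella vaccine" as one keyword, which narrows matching; the `loose` result keeps "varicella" and "vaccine" separately, so technologies about either one can still match, as the varicella vaccine example describes.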
Part 4: word sources for the basic word bank, the exclusive name word bank and the stop word bank
(1) Technical auditors regularly check and analyze the keyword extraction results for new technologies and technical requirements, and determine whether the keyword extraction needs maintenance.
(2) Suggestions from users. The keyword extraction results can be shown automatically to the users who upload new technologies and technical requirements, together with a function for entering keywords manually, so that important keywords for those new technologies and technical requirements are collected from the users. The collected words are then evaluated by platform auditors before the word banks are updated.
A specific embodiment is as follows:
Example: 1. The following ten new technologies (entered by users or by staff) are used:
Figure BDA0002708906300000141
Figure BDA0002708906300000151
2. Extract keywords from the new technology titles and count word frequencies:
Note: the extracted keywords may change as the word banks are updated, supplemented, and iterated, so the keywords extracted below are not final and serve only as a case to show the corresponding calculation process.
Figure BDA0002708906300000152
3. Extract keywords from the new technology achievement introductions and count word frequencies:
Figure BDA0002708906300000161
Figure BDA0002708906300000171
4. Overall word frequency statistics for the new technologies:
Figure BDA0002708906300000172
Figure BDA0002708906300000181
5. Matching rate between a technical requirement and the new technologies.
Suppose the user enters technical requirement 1: "I need a preparation technology for graphene."
(1) Extracting technical requirement keywords:
Figure BDA0002708906300000182
(2) Calculating the matching rate:
Serial number | Name of new technology | Match rate
1 | Ultrathin graphene/metal nanowire flexible electronic material | 15.22%
2 | Preparation of super-elastic graphene aerogel and research on performance of super-elastic graphene aerogel | 22.50%
3 | Rapid graphene preparation technology | 28.57%
4 | Quartz glass graphene coating heating tube and electric heater | 9.09%
5 | Graphene heat-conducting film | 22.22%
6 | Thermal insulation material | 0.00%
7 | Project for refining ethanol from straw | 0.00%
8 | Wheel positioning instrument by three-dimensional imaging technology | 0.00%
9 |  | 0.00%
10 | Newest nano antibacterial fresh-keeping drawer (for household electrical appliance) | 0.00%
(3) Recommendation ranking:
Figure BDA0002708906300000183
Figure BDA0002708906300000191
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in either the specification or the claims is intended to mean a "non-exclusive or".

Claims (10)

1. A new technology query recommendation method based on keyword extraction, characterized by comprising the following steps:
(1) a preparation stage: preparing the new technology data model, comprising:
step 1.1: extracting key vocabulary sentences from the titles and achievement introductions of the new technologies already in the database and of the new technologies that users continue to enter;
step 1.2: extracting keywords from the key vocabulary sentences of the new technology titles and of the achievement introductions, respectively;
step 1.3: counting the keywords of each new technology and their word frequencies;
step 1.4: determining, in descending order of word frequency, the first KT new technology keywords and their word frequencies as the final extraction result;
(2) an application stage: matching and recommending new technologies according to a technical requirement, comprising:
step 2.1: extracting key vocabulary sentences from the title and requirement profile of the technical requirement currently entered by the user;
step 2.2: extracting keywords from the key vocabulary sentences of the technical requirement title and of the requirement profile, respectively;
step 2.3: counting the keywords of the technical requirement and their word frequencies;
step 2.4: determining, in descending order of word frequency, the first KD technical requirement keywords and their word frequencies as the final extraction result;
step 2.5: calculating the keyword-based matching degree FW between the technical requirement and each new technology;
step 2.6: for each new technology whose matching degree with the technical requirement is not 0, calculating the keyword-based matching rate FR between that new technology and the technical requirement;
step 2.7: for the user's technical requirement, sorting the new technologies from high to low by FW and then from high to low by FR, and recommending new technologies to the user according to the sorted result.
2. The new technology query recommendation method based on keyword extraction according to claim 1, characterized in that, in (1) the preparation stage: for a newly entered new technology, keywords are extracted through the steps of the preparation stage, and the keywords are stored in the database together with the new technology to provide the basis for subsequent calculation; the keyword information of new technologies already in the database does not need to be recalculated through these steps each time, and only needs to be updated periodically when a word bank changes.
3. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 1.1 the key vocabulary sentences are extracted as follows: using the word segmentation function of the jieba word segmenter, the new technology is segmented based on the basic word bank and the stop word bank, and only words with certain parts of speech are retained as the description of the new technology for the next step; the retained parts of speech comprise nouns, verbal nouns, English words and morpheme words; when a word whose part of speech is removed is nevertheless meaningful for extraction, the word bank is modified or supplemented in one of two ways: (1) modifying the part of speech: the part of speech of the word is changed in the word bank to vnmd, which marks words designated for extraction; (2) adding a part of speech: proper nouns from the various fields of the new technologies are added under a newly created part of speech, mnmd, and the corresponding word frequency is set to the maximum word frequency in the current basic word bank plus 1; the method of extracting the key vocabulary sentences in step 2.1 is the same as in step 1.1.
4. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 1.2 keywords are extracted from the key vocabulary sentences of the new technology titles and achievement introductions as follows: using the word segmentation function of the jieba word segmenter with the exclusive name word bank and the stop word bank loaded, the key vocabulary sentences extracted in step 1.1 are segmented; proper nouns and terms from the various fields of the new technologies are accumulated and added to the basic word bank or the exclusive name word bank; additions to the stop word bank are made only to the stop word bank used in step 1.2; words that were not successfully screened out in step 1.1 are added to the stop word bank and are excluded from the successfully segmented words; the method of extracting keywords from the key vocabulary sentences of the technical requirement titles and requirement profiles in step 2.2 is the same as in step 1.2.
5. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 1.3 the keywords of the new technology and their word frequencies are counted as follows:
for the words successfully segmented in step 1.2, counting the word frequency of each word;
when extracting new technology keywords, extracting keywords from the title and from the content of the new technology separately;
the final word frequency of a keyword in the new technology is determined as:
$$F_i = T \times F_{ti} + F_{ci}, \quad T \ge 1$$
wherein $F_i$ is the final word frequency of the i-th keyword, $F_{ti}$ is the number of times the i-th keyword appears in the new technology title, $F_{ci}$ is the number of times the i-th keyword appears in the new technology content, and T is the weight of $F_{ti}$, i.e. the importance of the i-th keyword appearing in the title is greater than or equal to its importance appearing in the content; when there is only a new technology title, $F_{ci} = 0$; when there is only new technology content, $F_{ti} = 0$;
the method of counting the keywords of the technical requirement and their word frequencies in step 2.3 is the same as in step 1.3, the only difference being that a technical requirement has no title by default and only has technical requirement profile information.
6. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 1.4 the first KT new technology keywords and their word frequencies are determined, in descending order of word frequency, as follows: the keywords extracted in step 1.3 are sorted from high to low by their calculated final word frequency, and the first KT keywords are selected as the keywords of the new technology and stored for subsequent use; when there are fewer than KT keywords, the actual number of keywords is used; the method of determining the first KD technical requirement keywords and their word frequencies in step 2.4 is the same as in step 1.4.
7. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 2.5 the keyword-based matching degree FW between the technical requirement and the new technologies is calculated as follows:
for each new technology, initializing the matching degree with the technical requirement as $FW_i = 0$, $i = 0, 1, \dots, n$, wherein n is the number of new technologies, i denotes the i-th new technology, and $FW_i$ is the keyword matching degree of the i-th new technology;
for each new technology, counting the number m of its keywords that coincide with the keywords of the technical requirement, the keyword matching degree between the technical requirement and the new technology being $FW_i = m$, $i = 0, 1, \dots, n$, with $0 \le m \le KT$.
8. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 2.6 the keyword matching rate is calculated as follows:
for each new technology with FW ≠ 0, calculating the keyword matching rate FR between that new technology and the technical requirement;
Figure FDA0002708906290000041
wherein $FR_i$ is the matching rate between the i-th new technology and the technical requirement; $FD_j$ is the word frequency, in the technical requirement, of the j-th keyword in the set of keywords shared by the technical requirement and the i-th new technology; $FD$ is the sum of the word frequencies of all keywords of the technical requirement; $FT_{ij}$ is the word frequency, in the i-th new technology, of the j-th keyword in that shared set; and $FT_i$ is the sum of the word frequencies of the first K keywords of the i-th new technology;
namely:
$$FD = \sum_{t=1}^{t_n} F_t, \quad t_n \le KD$$
$$FT_i = \sum_{t=1}^{t_{in}} F_{it}, \quad t_{in} \le KT$$
wherein $FD$ is the sum of the word frequencies of the keywords of the technical requirement, $F_t$ is the word frequency of the t-th keyword of the technical requirement, $t_n$ is the total number of keywords of the technical requirement, and KD is the value actually set for the number K of technical requirement keywords; $F_{it}$ is the word frequency of the t-th keyword of the i-th new technology, $t_{in}$ is the total number of keywords of the i-th new technology, and KT is the value actually set for the number K of new technology keywords.
9. The new technology query recommendation method based on keyword extraction according to claim 1, wherein in step 2.7 the new technologies are ranked for recommendation as follows:
sorting the new technologies from high to low by their keyword matching degree FW with the technical requirement;
not recommending any new technology with FW = 0;
among new technologies with FW ≠ 0 that have the same FW value, sorting from high to low by their keyword matching rate FR with the technical requirement;
taking the order obtained after sorting by FW and then by FR as the final recommendation order.
10. The new technology query recommendation method based on keyword extraction according to claim 1, further comprising a method for maintaining the basic word bank, the exclusive name word bank and the stop word bank:
maintenance of the basic word bank: when part of a word, or its leading part, is lost after word segmentation, the word is added to the basic word bank, and its word frequency is set according to its word-formation probability;
maintenance of the exclusive name word bank: when a word is split into two or more words after word segmentation, the word is added to the exclusive name word bank;
maintenance of the stop word bank: words appearing in new technologies or technical requirements that are so generic that their word frequency is too high and they distort keyword extraction are added to the stop word bank.
CN202011048900.5A 2020-09-29 2020-09-29 New technology query recommendation method based on keyword extraction Pending CN112307302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011048900.5A CN112307302A (en) 2020-09-29 2020-09-29 New technology query recommendation method based on keyword extraction


Publications (1)

Publication Number Publication Date
CN112307302A true CN112307302A (en) 2021-02-02

Family

ID=74489257


Country Status (1)

Country Link
CN (1) CN112307302A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120123855A1 (en) * 2010-11-11 2012-05-17 Nhn Business Platform Corporation System and method for suggesting recommended keyword
CN103605665A (en) * 2013-10-24 2014-02-26 杭州电子科技大学 Keyword based evaluation expert intelligent search and recommendation method
US8798995B1 (en) * 2011-09-23 2014-08-05 Amazon Technologies, Inc. Key word determinations from voice data
CN107992542A (en) * 2017-11-27 2018-05-04 中山大学 A kind of similar article based on topic model recommends method
CN110188344A (en) * 2019-04-23 2019-08-30 浙江工业大学 A Keyword Extraction Method Based on Multi-feature Fusion
CN110597949A (en) * 2019-08-01 2019-12-20 湖北工业大学 A recommendation model for court similar cases based on word vector and word frequency
CN110874530A (en) * 2019-10-30 2020-03-10 深圳价值在线信息科技股份有限公司 Keyword extraction method and device, terminal equipment and storage medium
CN111061957A (en) * 2019-12-26 2020-04-24 广东电网有限责任公司 Method and device for recommending article similarity

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420554A (en) * 2021-06-18 2021-09-21 枣庄科技职业学院 Ancient poetry word frequency analysis method and system
CN113420554B (en) * 2021-06-18 2023-10-27 枣庄科技职业学院 Ancient poetry word frequency analysis method and system
CN114328826A (en) * 2021-12-20 2022-04-12 青岛檬豆网络科技有限公司 Method for extracting key words and abstracts of technical achievements and technical requirements
CN114328826B (en) * 2021-12-20 2024-06-11 青岛檬豆网络科技有限公司 Method for extracting keywords and abstracts of technical achievements and technical demands

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210202)