CN108628832B - Method and device for acquiring information keywords - Google Patents

Info

Publication number
CN108628832B
Authority
CN
China
Prior art keywords
keyword
keywords
temporary
keyword set
coverage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810431832.7A
Other languages
Chinese (zh)
Other versions
CN108628832A (en)
Inventor
李素粉
孙兆欣
张云勇
滕佳佳
伍珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201810431832.7A
Publication of CN108628832A
Application granted
Publication of CN108628832B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and a device for acquiring information intelligence keywords. By calculating the union of the current hotspot information keyword set, the common keyword set of the tracked objects, and the individual keyword set of each tracked object, the information intelligence keywords can be determined quickly. The keywords determined in this way not only cover the current hotspot information but are also targeted: they can meet the individual needs of each user (that is, each tracked object), and they are multi-dimensional with wide coverage.

Description

Method and device for acquiring information keywords
Technical Field
The invention relates to the technical field of communication, in particular to a method and a device for acquiring information keywords.
Background
To adapt to changes in the market environment, mainstream operators, equipment providers and internet enterprises at home and abroad have long attached great importance to intelligence work, which provides decision support for company strategy.
According to a Gartner analysis report, the world's top 30 telecommunication operators all have dedicated competitive intelligence departments and attach great importance to intelligence work. AT&T designed a professional portal to manage intelligence work as early as 2007. Verizon established a self-service model as early as 2008 to improve the efficiency of its intelligence service, and in 2016 it posted 77 related job openings on LinkedIn, recruiting intelligence analysts throughout the year. The head of the market and competitive intelligence department at Deutsche Telekom is also an expert of the world competitive intelligence society and the European competitive intelligence society.
Keywords are the main search basis for obtaining intelligence materials, and how to obtain information keywords is a central problem in intelligence management.
Disclosure of Invention
The invention provides a method and a device for acquiring information keywords, which address the above deficiencies in the prior art and at least partially solve the problem of how to acquire information keywords automatically.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention provides a method for acquiring information keywords, which comprises the following steps:
determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information;
determining a third keyword set, wherein keywords in the third keyword set are common keywords of all tracked objects;
calculating a union of the second keyword set, the third keyword set and a preset first keyword set to determine an information keyword; and the keywords in the first keyword set are individual keywords of each tracked object.
Preferably, the determining the third keyword set specifically includes:
calculating the coverage of each keyword to be selected;
determining a first temporary set according to the coverage of each keyword to be selected, a preset threshold, the first keyword set and the second keyword set;
and determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, the calculating the coverage of each candidate keyword specifically includes:
acquiring the number of tracking objects related to each keyword to be selected and the total number of the tracking objects;
and respectively calculating the ratio of the number of the tracked objects related to each keyword to be selected to the total number of the tracked objects to obtain the coverage of each keyword to be selected.
Preferably, the determining a third keyword set according to the number of keywords in a preset third keyword set and the first temporary set specifically includes:
comparing the number of keywords in the first temporary set with the number of keywords in a preset third keyword set;
if the former is larger than or equal to the latter, sorting the keywords in the first temporary set from large to small according to coverage, and selecting a preset number of keywords in the sorting as elements of the third keyword set, wherein the preset number is the number of the keywords in the third keyword set;
if the former is smaller than the latter, the third set of keywords is the first temporary set.
Preferably, the determining a first temporary set according to the coverage of each keyword to be selected, a preset threshold, the first keyword set, and the second keyword set specifically includes:
comparing the coverage of each keyword to be selected with a preset threshold, and if the coverage of each keyword to be selected is greater than the preset threshold, taking the corresponding keyword to be selected as an element of a second temporary set;
and calculating the intersection of the second temporary set, the first keyword set and the second keyword set and negating to obtain the first temporary set.
The present invention also provides a keyword management apparatus, the apparatus comprising: the system comprises a first processing module, a second processing module and a third processing module;
the first processing module is used for determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information;
the second processing module is used for determining a third keyword set, wherein keywords in the third keyword set are common keywords of all tracked objects;
the third processing module is used for calculating a union of the second keyword set, the third keyword set and a preset first keyword set to determine an information intelligence keyword; and the keywords in the first keyword set are individual keywords of each tracked object.
Preferably, the second processing module is specifically configured to calculate a coverage of each keyword to be selected; determining a first temporary set according to the coverage of each keyword to be selected, a preset threshold, the first keyword set and the second keyword set; and determining a third key word set according to the number of key words in a preset third key word set and the first temporary set.
Preferably, the second processing module is specifically configured to obtain the number of the tracked objects related to each keyword to be selected and the total number of the tracked objects; and respectively calculating the ratio of the number of the tracked objects related to each keyword to be selected to the total number of the tracked objects to obtain the coverage of each keyword to be selected.
Preferably, the second processing module is configured to compare the number of keywords in the first temporary set with a preset number of keywords in a third keyword set; when the former is larger than or equal to the latter, sorting the keywords in the first temporary set from large to small according to coverage, and selecting a preset number of keywords in the sorting as elements of the third keyword set, wherein the preset number is the number of the keywords in the third keyword set; when the former is smaller than the latter, the third set of keywords is the first temporary set.
Preferably, the third processing module is configured to compare the coverage of each keyword to be selected with a preset threshold, and when the coverage of each keyword is greater than the preset threshold, use the corresponding keyword to be selected as an element of a second temporary set, and calculate and negate an intersection of the second temporary set, the first keyword set, and the second keyword set, so as to obtain the first temporary set.
According to the invention, the information keywords can be quickly determined by calculating the union of the current hotspot information keyword set, the common keyword set of each tracked object and the individual keyword set of each tracked object, and the determined information keywords not only cover the current hotspot information, but also have pertinence, can meet the individual requirements of each user (namely the tracked object), and have the characteristics of multiple dimensionality and wide coverage.
Drawings
Fig. 1 is a flowchart of information intelligence keyword acquisition according to an embodiment of the present invention;
Fig. 2 is a first flowchart of determining a third keyword set according to an embodiment of the present invention;
Fig. 3 is a second flowchart of determining a third keyword set according to an embodiment of the present invention;
Fig. 4 is a flowchart of determining a first temporary set according to an embodiment of the present invention;
Fig. 5 is a structural diagram of a keyword management apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention provides a method for acquiring information keywords, which is applied to an information resource system. The information resource system comprises a material collecting device, a resource pool establishing device, an information resource pool, a tracked object database, a tracked object management device, a keyword database and a keyword management device. The tracked object management device updates the tracked object database, and the keyword management device updates the keyword database. The material collecting device obtains tracked-object information materials and keyword information materials through the tracked object management device and the keyword management device, respectively. The resource pool establishing device encodes the tracked-object information materials and the keyword information materials to obtain information intelligence and stores it in the information resource pool.
In the embodiment of the invention, the tracked objects are mainly global mainstream operators and large Internet companies, and they are stored in the tracked object database.
The information intelligence keywords include individual keywords and common keywords. Common keywords are extracted from the company's strategy and key work; they are the keywords that must be searched when retrieving information about every tracked object, such as 5G, cloud computing, big data and the Internet of Things, and they are stored in the common keyword module of the keyword database. Individual keywords are sorted out and refined from the current hotspot information of a particular tracked object, and they are stored in the individual keyword module of the keyword database.
As shown in fig. 1, the method for acquiring information keywords comprises the following steps:
Step 101, determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information.
Specifically, the number of keywords in the second keyword set kw2 is n2, where n2 is a preset value and n2 ≥ 0, so that kw2 = {kw2_1, kw2_2, kw2_3, …, kw2_n2}. Each keyword in the second keyword set kw2 is a keyword of the current hotspot information and can be obtained with the most active selection algorithm, which is an existing algorithm and is not described here.
Step 102, determining a third keyword set, wherein the keywords in the third keyword set are common keywords of all tracked objects.
Specifically, the number of keywords in the third keyword set kw3 is n3, where n3 is a preset value and n3 ≥ 0, so that kw3 = {kw3_1, kw3_2, kw3_3, …, kw3_n3}. Each keyword in the third keyword set kw3 is a common keyword of the tracked objects, i.e. a hotspot term that the tracked objects all pay attention to, for example: the Internet of Things, blockchain, big data and the like.
A specific implementation of determining the third set of keywords kw3 is described in more detail later in connection with fig. 2.
Step 103, calculating a union of the second keyword set, the third keyword set and a preset first keyword set to determine the information intelligence keywords.
Specifically, the number of keywords in the first keyword set kw1 is n1, where n1 is a preset value and n1 ≥ 0. The total number of information intelligence keywords is n, where n = n1 + n2 + n3, 0 ≤ n1 ≤ n, 0 ≤ n2 ≤ n and 0 ≤ n3 ≤ n. kw1 = {kw1_1, kw1_2, kw1_3, …, kw1_n1}. The keywords in the first keyword set kw1 are the individual keywords of each tracked object, covering each tracked object's long-term fields of interest and hot vocabulary, and each keyword in kw1 can be set by the tracked object itself.
The finally determined set of information intelligence keywords is kw, where kw = kw1 ∪ kw2 ∪ kw3.
As can be seen from steps 101 to 103, by calculating the union of the current hotspot information keyword set, the common keyword set of the tracked objects, and the individual keyword set of each tracked object, the invention can quickly determine the information keywords. The determined keywords not only cover the current hotspot information but are also targeted: they can meet the individual requirements of each user (i.e. each tracked object) and are multi-dimensional with wide coverage.
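As an illustration of steps 101 to 103, the following minimal Python sketch computes the union of the three sets. The function name and the sample keywords are assumptions made for this example and do not come from the patent text. Because a set union removes duplicates, the size of the result equals n1 + n2 + n3 only when the three sets are pairwise disjoint.

```python
def information_keywords(kw1, kw2, kw3):
    """Return kw = kw1 ∪ kw2 ∪ kw3 (hypothetical helper name).

    kw1: individual keywords of the tracked objects (preset)
    kw2: keywords of the current hotspot information
    kw3: common keywords of the tracked objects
    """
    return set(kw1) | set(kw2) | set(kw3)


# Sample data, invented for illustration only.
kw1 = {"eSIM", "edge computing"}
kw2 = {"blockchain"}
kw3 = {"5G", "big data"}

kw = information_keywords(kw1, kw2, kw3)
# The three sample sets are disjoint, so len(kw) == 2 + 1 + 2 == 5.
```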
Further, as shown in fig. 2, the determining the third keyword set (i.e. step 102) specifically includes the following steps:
Step 201, calculating the coverage of each keyword to be selected.
Specifically, the number T1_i of tracked objects related to each keyword to be selected and the total number T of tracked objects are obtained, and the ratio of the two is calculated for each keyword to be selected to obtain its coverage q_i, i.e. q_i = T1_i / T, where i indexes the keywords to be selected.
The tracking object related to each candidate keyword refers to a tracking object which focuses on the candidate keyword, that is, a tracking object which selects the candidate keyword as a common keyword and/or an individual keyword.
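The coverage calculation of step 201 can be sketched as follows. The input layout (a dictionary mapping each tracked object to the set of keywords it has selected) and all names are assumptions for this example; the patent does not specify a data structure.

```python
def coverage(selections):
    """Compute q_i = T1_i / T for every keyword to be selected.

    selections: dict mapping a tracked object to the set of keywords it
    selected as common and/or individual keywords (assumed layout).
    Returns a dict mapping each keyword to its coverage in [0, 1].
    """
    total = len(selections)              # T, total number of tracked objects
    if total == 0:
        return {}
    counts = {}                          # T1_i per keyword
    for keywords in selections.values():
        for keyword in set(keywords):    # count each tracked object once
            counts[keyword] = counts.get(keyword, 0) + 1
    return {keyword: n / total for keyword, n in counts.items()}


# Sample data, invented for illustration only.
selections = {
    "operator_a": {"5G", "IoT"},
    "operator_b": {"5G"},
    "company_c": {"5G", "IoT", "blockchain"},
    "company_d": {"big data"},
}
q = coverage(selections)
# "5G" is selected by 3 of the 4 tracked objects, so q["5G"] == 0.75.
```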
Step 202, determining a first temporary set temp1kw according to the coverage q_i of each keyword to be selected, a preset threshold Q, the first keyword set kw1 and the second keyword set kw2.
Specifically, the process of determining the first temporary set temp1kw is described in detail later with reference to fig. 4.
Step 203, determining a third keyword set kw3 according to the preset number n3 of keywords in the third keyword set kw3 and the first temporary set temp1kw.
Specifically, all the keywords in the first temporary set temp1kw may become final information intelligence keywords, in which case the third keyword set kw3 is identical to the first temporary set temp1kw. Alternatively, only some of them may become final information intelligence keywords, in which case the first temporary set temp1kw is larger than the third keyword set kw3.
The scheme of how to determine the third set of keywords kw3 is described in detail later with reference to fig. 3.
As can be seen from steps 201 to 203, using the coverage q_i of each keyword to be selected as the criterion for choosing the keywords in the third keyword set kw3 allows keywords with high coverage to be selected, so that the different requirements of the tracked objects can be covered.
The process of determining the third set of keywords kw3 (i.e., step 203) is described in detail below with reference to fig. 3. As shown in fig. 3, the process of determining the third keyword set kw3 includes the following steps:
Step 301, comparing the number of keywords in the first temporary set with the number of keywords in a preset third keyword set, and if the former is greater than the latter, executing step 302; otherwise, executing step 304.
Specifically, let the number of keywords in the first temporary set temp1kw be n', and compare n' with the number n3 of keywords in the third keyword set kw3. If n' > n3, the number of keywords in the first temporary set temp1kw exceeds the number required by the third keyword set kw3, so the more suitable keywords need to be selected from temp1kw and placed in kw3 (i.e. steps 302 and 303 are executed). If n' ≤ n3, the number of keywords in temp1kw is less than or equal to the number required by kw3, so all the keywords in temp1kw are placed in kw3 (i.e. step 304 is executed).
Step 302, sorting the keywords in the first temporary set from large to small according to coverage.
Specifically, the n' keywords in the first temporary set temp1kw are sorted from large to small according to their coverage q_i, where the coverage q_i of each keyword was calculated in step 201.
Step 303, selecting a preset number of keywords in the sequence as elements of a third keyword set.
Specifically, the preset number is the number n3 of keywords in the third keyword set, that is, the first n3 keywords in the coverage ranking are selected to form the third keyword set kw3.
Step 304, the third set of keywords is the first temporary set.
Specifically, if the number n' of keywords in the first temporary set temp1kw does not exceed the number of keywords required by the third keyword set kw3, the entire first temporary set temp1kw is used as the third keyword set kw3.
As can be seen from steps 301 to 303, filtering the keywords in the first temporary set temp1kw in this way yields information intelligence keywords (i.e. the third keyword set kw3) with higher coverage.
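Steps 301 to 304 amount to a top-n3 selection by coverage. The sketch below is one possible rendering under stated assumptions: the function name is invented, and ties in coverage are broken alphabetically, which the patent does not specify.

```python
def select_third_set(temp1kw, q, n3):
    """Determine kw3 from the first temporary set (hypothetical helper).

    temp1kw: first temporary set of keywords
    q:       dict mapping each keyword to its coverage q_i
    n3:      preset number of keywords in the third keyword set
    """
    if len(temp1kw) <= n3:
        # Step 304: the whole temporary set becomes kw3.
        return set(temp1kw)
    # Step 302: sort by coverage, largest first (alphabetical tie-break
    # is an assumption added here to make the result deterministic).
    ranked = sorted(temp1kw, key=lambda keyword: (-q[keyword], keyword))
    # Step 303: keep the first n3 keywords of the ranking.
    return set(ranked[:n3])


# Sample data, invented for illustration only.
q = {"5G": 0.75, "IoT": 0.5, "blockchain": 0.25}
kw3 = select_third_set({"5G", "IoT", "blockchain"}, q, 2)
# With n3 = 2 the two highest-coverage keywords, "5G" and "IoT", are kept.
```

When the two counts are equal, both branches return the same set, so treating equality under either branch gives the same result.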
The process of determining the first temporary set temp1kw (i.e., step 202) is described in detail below with reference to fig. 4. As shown in fig. 4, the process of determining the first temporary set temp1kw includes the following steps:
Step 401, comparing the coverage of each keyword to be selected with a preset threshold; if the coverage is greater than the preset threshold, executing step 402; otherwise, discarding the keyword to be selected.
Specifically, a threshold Q is preset, and the coverage q_i of each keyword to be selected is compared with the threshold Q. If q_i > Q, the keyword to be selected is qualified and is put into the second temporary set temp2kw (i.e. step 402 is executed); if q_i ≤ Q, the keyword to be selected is not qualified and is discarded.
Step 402, using the corresponding candidate keyword as an element of the second temporary set.
Step 403, calculating and negating the intersection of the second temporary set, the first keyword set and the second keyword set to obtain the first temporary set.
In particular, the first temporary set temp1kw is obtained by calculating the intersection of the second temporary set temp2kw, the first keyword set kw1 and the second keyword set kw2 and negating it.
[Formula image BDA0001653615510000081]
In this way, the keywords that the second temporary set temp2kw has in common with the first keyword set kw1 and the second keyword set kw2 can be excluded. This avoids keyword duplication in the third keyword set kw3 when kw3 is subsequently determined from the first temporary set temp1kw.
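Steps 401 to 403 can be sketched as below. Since the formula image is not available, the set operation here is an interpretation rather than the patent's confirmed formula: based on the stated purpose of excluding keywords that already appear in kw1 or kw2, the sketch removes from temp2kw every keyword found in either set. The function name and sample data are likewise assumptions.

```python
def first_temporary_set(q, threshold, kw1, kw2):
    """Build temp1kw from coverage values (interpretive sketch).

    q:         dict mapping each keyword to be selected to its coverage q_i
    threshold: preset threshold Q
    kw1, kw2:  first (individual) and second (hotspot) keyword sets
    """
    # Steps 401 and 402: keep keywords whose coverage strictly exceeds Q.
    temp2kw = {keyword for keyword, q_i in q.items() if q_i > threshold}
    # Step 403 (assumed reading): drop keywords already in kw1 or kw2.
    return temp2kw - (set(kw1) | set(kw2))


# Sample data, invented for illustration only.
q = {"5G": 0.75, "IoT": 0.5, "blockchain": 0.25, "big data": 0.5}
temp1kw = first_temporary_set(q, 0.4, kw1={"IoT"}, kw2={"5G"})
# "blockchain" fails the threshold; "IoT" and "5G" are already in kw1 or
# kw2; only "big data" remains.
```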
Based on the same technical concept, an embodiment of the present invention further provides a keyword management apparatus, as shown in fig. 5, the keyword management apparatus includes: a first processing module 51, a second processing module 52 and a third processing module 53.
The first processing module 51 is configured to determine a second keyword set, where keywords in the second keyword set are keywords of current hotspot information.
The second processing module 52 is configured to determine a third keyword set, where the keywords in the third keyword set are common keywords of each tracked object.
The third processing module 53 is configured to calculate a union of the second keyword set, the third keyword set, and a preset first keyword set to determine an information intelligence keyword; and the keywords in the first keyword set are individual keywords of each tracked object.
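The division of work among the three processing modules might be organised as in the following sketch. The class name, the constructor parameters and the use of injected callables for obtaining the hotspot and common keyword sets are all assumptions for illustration; the patent describes the modules only functionally.

```python
from typing import Callable, Iterable, Set


class KeywordManager:
    """Hypothetical keyword management apparatus with three modules."""

    def __init__(
        self,
        hotspot_source: Callable[[], Iterable[str]],
        common_source: Callable[[], Iterable[str]],
        kw1: Iterable[str],
    ) -> None:
        self._hotspot_source = hotspot_source   # supplies kw2 (assumed)
        self._common_source = common_source     # supplies kw3 (assumed)
        self._kw1 = set(kw1)                    # preset individual keywords

    def first_processing_module(self) -> Set[str]:
        """Determine the second keyword set kw2 (hotspot keywords)."""
        return set(self._hotspot_source())

    def second_processing_module(self) -> Set[str]:
        """Determine the third keyword set kw3 (common keywords)."""
        return set(self._common_source())

    def third_processing_module(self) -> Set[str]:
        """Return the union kw1 ∪ kw2 ∪ kw3."""
        kw2 = self.first_processing_module()
        kw3 = self.second_processing_module()
        return self._kw1 | kw2 | kw3


# Sample wiring, invented for illustration only.
manager = KeywordManager(lambda: {"blockchain"}, lambda: {"5G"}, {"eSIM"})
keywords = manager.third_processing_module()
```

Passing the two sources in as callables keeps the sketch independent of how the hotspot and common keyword sets are actually computed.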
Preferably, the second processing module 52 is specifically configured to calculate a coverage of each keyword to be selected; determining a first temporary set according to the coverage of each keyword to be selected, a preset threshold, the first keyword set and the second keyword set; and determining a third key word set according to the number of key words in a preset third key word set and the first temporary set.
Preferably, the second processing module 52 is specifically configured to obtain the number of the tracked objects related to each keyword to be selected, and the total number of the tracked objects; and respectively calculating the ratio of the number of the tracked objects related to each keyword to be selected to the total number of the tracked objects to obtain the coverage of each keyword to be selected.
Preferably, the second processing module 52 is configured to compare the number of keywords in the first temporary set with the number of keywords in a preset third keyword set; when the former is larger than or equal to the latter, sorting the keywords in the first temporary set from large to small according to coverage, and selecting a preset number of keywords in the sorting as elements of the third keyword set, wherein the preset number is the number of the keywords in the third keyword set; when the former is smaller than the latter, the third set of keywords is the first temporary set.
Preferably, the third processing module 53 is configured to compare the coverage of each keyword to be selected with a preset threshold, and when the coverage of each keyword is greater than the preset threshold, use the corresponding keyword to be selected as an element of the second temporary set, and calculate and negate an intersection of the second temporary set, the first keyword set, and the second keyword set, so as to obtain the first temporary set.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (6)

1. A method for acquiring information intelligence keywords, characterized in that the method comprises:
determining a second keyword set, wherein the keywords in the second keyword set are keywords of current hotspot information;
determining a third keyword set, wherein the keywords in the third keyword set are common keywords of the tracked objects;
calculating the union of the second keyword set, the third keyword set and a preset first keyword set to determine the information intelligence keywords, wherein the keywords in the first keyword set are individual keywords of each tracked object; wherein
the determining of the third keyword set specifically comprises:
calculating the coverage of each candidate keyword;
determining a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set and the second keyword set;
determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set;
the determining of the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set specifically comprises:
comparing the number of keywords in the first temporary set with the preset number of keywords in the third keyword set;
if the former is greater than or equal to the latter, sorting the keywords in the first temporary set in descending order of coverage, and selecting the first preset number of keywords in the sorting as elements of the third keyword set, the preset number being the number of keywords in the third keyword set;
if the former is smaller than the latter, the third keyword set is the first temporary set.

2. The method according to claim 1, characterized in that the calculating of the coverage of each candidate keyword specifically comprises:
obtaining the number of tracked objects related to each candidate keyword, and the total number of tracked objects;
calculating, for each candidate keyword, the ratio of the number of tracked objects related to that keyword to the total number of tracked objects, so as to obtain the coverage of each candidate keyword.

3. The method according to claim 1, characterized in that the determining of the first temporary set according to the coverage of each candidate keyword, the preset threshold, the first keyword set and the second keyword set specifically comprises:
comparing the coverage of each candidate keyword with the preset threshold, and if the former is greater than the latter, taking the corresponding candidate keyword as an element of a second temporary set;
calculating the intersection of the second temporary set, the first keyword set and the second keyword set and negating it, so as to obtain the first temporary set.

4. A keyword management device, characterized by comprising: a first processing module, a second processing module and a third processing module;
the first processing module is configured to determine a second keyword set, wherein the keywords in the second keyword set are keywords of current hotspot information;
the second processing module is configured to determine a third keyword set, wherein the keywords in the third keyword set are common keywords of the tracked objects;
the third processing module is configured to calculate the union of the second keyword set, the third keyword set and a preset first keyword set to determine the information intelligence keywords, wherein the keywords in the first keyword set are individual keywords of each tracked object; wherein
the second processing module is specifically configured to calculate the coverage of each candidate keyword; determine a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set and the second keyword set; and determine the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set;
the second processing module is configured to compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; when the former is greater than or equal to the latter, sort the keywords in the first temporary set in descending order of coverage and select the first preset number of keywords in the sorting as elements of the third keyword set, the preset number being the number of keywords in the third keyword set; and when the former is smaller than the latter, the third keyword set is the first temporary set.

5. The keyword management device according to claim 4, characterized in that the second processing module is specifically configured to obtain the number of tracked objects related to each candidate keyword, and the total number of tracked objects; and to calculate, for each candidate keyword, the ratio of the number of tracked objects related to that keyword to the total number of tracked objects, so as to obtain the coverage of each candidate keyword.

6. The keyword management device according to claim 4, characterized in that the third processing module is configured to compare the coverage of each candidate keyword with a preset threshold; when the former is greater than the latter, take the corresponding candidate keyword as an element of a second temporary set; and calculate the intersection of the second temporary set, the first keyword set and the second keyword set and negate it, so as to obtain the first temporary set.
CN201810431832.7A 2018-05-08 2018-05-08 Method and device for acquiring information keywords Active CN108628832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810431832.7A CN108628832B (en) 2018-05-08 2018-05-08 Method and device for acquiring information keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810431832.7A CN108628832B (en) 2018-05-08 2018-05-08 Method and device for acquiring information keywords

Publications (2)

Publication Number Publication Date
CN108628832A CN108628832A (en) 2018-10-09
CN108628832B true CN108628832B (en) 2022-03-18

Family

ID=63695891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810431832.7A Active CN108628832B (en) 2018-05-08 2018-05-08 Method and device for acquiring information keywords

Country Status (1)

Country Link
CN (1) CN108628832B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651657B (en) * 2020-06-04 2024-05-24 深圳前海微众银行股份有限公司 Information monitoring method, device, equipment and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425677A (en) * 2012-05-18 2013-12-04 阿里巴巴集团控股有限公司 Method for determining classified models of keywords and method and device for classifying keywords
CN103530398A (en) * 2013-10-23 2014-01-22 合山市科学技术情报研究所 Information collecting, processing and retrieving system
CN103744873A (en) * 2013-12-18 2014-04-23 天脉聚源(北京)传媒科技有限公司 Method, device and browser for displaying hotspot keyword
CN104035997A (en) * 2014-06-13 2014-09-10 淮阴工学院 Scientific and technical information acquisition and pushing method based on text classification and image deep mining
CN104679787A (en) * 2013-11-27 2015-06-03 华为技术有限公司 Interest information statistical method and device
CN106126588A (en) * 2016-06-17 2016-11-16 广州视源电子科技股份有限公司 Method and device for providing related words
CN106227735A (en) * 2016-07-11 2016-12-14 苏州天梯卓越传媒有限公司 A kind of word cloud Topic Selection for Publishing Industry and system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181438B1 (en) * 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
CN100405371C (en) * 2006-07-25 2008-07-23 北京搜狗科技发展有限公司 Method and system for abstracting new word
CN101296128A (en) * 2007-04-24 2008-10-29 北京大学 A method for monitoring abnormal state of Internet information
CN101520878A (en) * 2009-04-03 2009-09-02 华为技术有限公司 Method, device and system for pushing advertisements to users
CN102110269A (en) * 2011-02-25 2011-06-29 中兴通讯股份有限公司 Advertisement releasing method and system
CN103714413A (en) * 2013-11-21 2014-04-09 清华大学 System and method for constructing quality model based on position information
CN104965893A (en) * 2015-06-18 2015-10-07 山东师范大学 Big data advertisement delivery method
CN107786595A (en) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 The processing method of keyword, apparatus and system in distributed memory system
CN106453423B (en) * 2016-12-08 2019-10-01 黑龙江大学 A kind of filtration system and method for the spam based on user individual setting
CN107341199B (en) * 2017-06-21 2020-05-22 北京林业大学 Recommendation method based on document information commonality mode

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep Web crawler crawling strategy based on keyword relevance; Tian Ye et al.; Computer Engineering; 2008-08-05; Vol. 34, No. 15; pp. 220-222 *

Also Published As

Publication number Publication date
CN108628832A (en) 2018-10-09

Similar Documents

Publication Publication Date Title
US10621493B2 (en) Multiple record linkage algorithm selector
US10185771B2 (en) Method and system for scheduling web crawlers according to keyword search
US20120215795A1 (en) System and Method For Intelligent Job Hunt
CN103838857B (en) Automatic service combination system and method based on semantics
CN109711708A (en) Whole process engineering consulting method of servicing and system
CN109726734B (en) An automatic target platform identification system based on radiation source reconnaissance information
CN119003840B (en) Government service recommendation method and system based on user behavior analysis
CN104615734B (en) A kind of community management service big data processing system and its processing method
CN110442614B (en) Metadata searching method and device, electronic equipment and storage medium
CN115858598A (en) Enterprise big data-based target information screening and matching method and related equipment
CN111177481A (en) User identifier mapping method and device
CN111159559A (en) Method for constructing recommendation engine according to user requirements and user behaviors
CN107067033A (en) The local route repair method of machine learning model
CN108628832B (en) Method and device for acquiring information keywords
CN103646035B (en) A kind of information search method based on heuristic
CN112734382A (en) Talent recruitment system capable of quickly matching related job seekers according to enterprise requirements
CN115687579A (en) Document tag generation and matching method and device and computer equipment
CN119357406A (en) A method for constructing a knowledge graph of government hotline demands based on agent technology
CN118536957A (en) Talent post matching method and device based on model screening, medium and equipment
CN110941765A (en) Search intention identification method, information search method and device and electronic equipment
CN114297341B (en) Public opinion popularity determination method, device, equipment and storage medium
JP2008187612A (en) Traffic analysis model construction method, apparatus, construction program, and storage medium thereof
CN107577690A (en) The recommendation method and recommendation apparatus of magnanimity information data
CN111858733A (en) A method and system for government affairs information comparison based on Internet multi-source heterogeneous data
US20120054117A1 (en) Identifying an individual in response to a query seeking to locate personnel with particular experience

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant