CN1211747C - System for registering key words of articles and its method - Google Patents
- Publication number
- CN1211747C
- Authority
- CN
- China
- Prior art keywords
- article
- synonym
- words
- keyword
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a system for registering article keywords, comprising a data storage device that holds a symbol library, a function-word dictionary and a keyword database, and a processor. The processor compares an article with the symbol library and deletes from the article every symbol identical to one recorded in the symbol library, then deletes from the article every function word identical to one recorded in the function-word dictionary. It then counts the occurrences of every remaining word in the article, obtaining a plurality of candidate words and their occurrence counts. Finally, it selects a plurality of keywords from the candidate words according to a preset condition and registers the selected keywords in the keyword database.
Description
Technical field
The present invention relates to a system and method for registering article keywords, and in particular to a system and method that automatically register the keywords that recur in an article.
Background art
In an age of information overload, ordinary readers do not have enough time to digest large numbers of articles. For this reason, if there were an effective way to determine the subject of an article, or the fields it touches on, users could read only the articles that, after screening, fall within the fields they care about, instead of spending a great deal of time reading every article.
The subject of an article, or its related fields, is usually judged from the keywords mentioned most often in it. Conventional methods of analyzing and registering article keywords rely mainly on manual screening. Fig. 1 is a schematic diagram of the conventional method of analyzing and registering article keywords. First, a large number of articles 10 are analyzed manually one by one (11) to obtain the relevant keywords 12 of each article 10. Then, analysts register the keywords in a keyword database 14 by manual entry (13).
Because conventional keyword analysis and registration is performed by people examining articles one at a time, it takes a great deal of time and labor. Moreover, for synonymous terms, analysts must rely on their own memory and experience to analyze synonymous keywords correctly.
Summary of the invention
In view of this, the main object of the present invention is to provide a system and method for registering article keywords that automatically register the keywords that recur in an article. In addition, the invention can automatically recognize synonymous terms in an article, improving the accuracy of keyword analysis.
These objects are achieved by the system and method for registering article keywords provided by the present invention.
A system for registering article keywords according to an embodiment of the invention comprises a data storage device, which holds a symbol library, a function-word dictionary and a keyword database, and a processor. The processor compares an article with the symbol library and deletes from the article every symbol identical to one recorded in the symbol library, then deletes from the article every function word identical to one recorded in the function-word dictionary. Next, it counts how many times each word occurs in the article, obtaining a plurality of candidate words and their occurrence counts. Finally, it selects a plurality of keywords from the candidate words according to a preset condition and registers the selected keywords in the keyword database.
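As a rough illustration of this flow, the Python sketch below deletes symbols, drops function words and counts what remains. It is a minimal sketch under stated assumptions, not the patented implementation: the symbol set and function-word list are assumptions modeled on the example tables later in this description, words are split on whitespace, function-word matching is case-insensitive, and the names `candidate_counts` and `select_keywords` are hypothetical.

```python
from collections import Counter

# Assumed symbol library and function-word dictionary, modeled on the
# example tables in this description (not an exhaustive list).
SYMBOLS = set(",.;[、!·@#$%'‘’“”")
FUNCTION_WORDS = {"a", "it", "this", "by", "is", "on", "are", "she",
                  "the", "he", "that", "i"}


def candidate_counts(article: str) -> Counter:
    """Delete symbols and function words, then count the remaining words."""
    # Remove every character recorded in the symbol library.
    cleaned = "".join(ch for ch in article if ch not in SYMBOLS)
    # Split on whitespace and drop function words (case-insensitive).
    words = [w for w in cleaned.split() if w.lower() not in FUNCTION_WORDS]
    return Counter(words)


def select_keywords(counts: Counter, min_count: int) -> list[str]:
    """Keep candidate words occurring at least min_count times."""
    return [word for word, n in counts.items() if n >= min_count]


if __name__ == "__main__":
    sentence = ("The VIA C3 1GHz processor is the coolest 1GHz processor "
                "on the market.")
    counts = candidate_counts(sentence)
    print(select_keywords(counts, 2))  # ['1GHz', 'processor']
```

On the first sentence of the example article, "1GHz" and "processor" each survive twice, so a minimum count of 2 selects just those two words.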
The data storage device can also hold a synonym dictionary. The processor then also compares the article with the synonym dictionary, deletes from the article every synonym identical to one recorded in the synonym dictionary, records how many times each such synonym occurs in the article, and stores the word synonymous with the synonym, together with the synonym's occurrence count, in a synonym buffer. The processor then adds the synonymous words and synonym occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
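The synonym step can be sketched in the same hedged way. The sketch assumes that the synonym dictionary maps each recorded synonym phrase to the word it stands for, that phrases are matched as plain case-sensitive substrings, and that longer phrases are tried first because "VIA Tech" is a prefix of "VIA Technologies, Inc."; the patent does not specify a matching order, and `extract_synonyms` and `merge_buffer` are hypothetical names.

```python
from collections import Counter

# Hypothetical synonym dictionary: synonym phrase -> the word it stands for
# (entries taken from the example in this description).
SYNONYMS = {
    "VIA Tech": "VIA",
    "VIA Technologies, Inc.": "VIA",
}


def extract_synonyms(text: str, synonyms: dict[str, str]) -> tuple[str, Counter]:
    """Delete recorded synonyms from the text and fill a synonym buffer.

    The buffer maps each word to the number of times its synonyms occurred.
    """
    buffer: Counter = Counter()
    # Longest phrases first, so "VIA Tech" cannot consume the start of
    # "VIA Technologies, Inc." (an assumed matching order).
    for phrase in sorted(synonyms, key=len, reverse=True):
        hits = text.count(phrase)
        if hits:
            buffer[synonyms[phrase]] += hits
            text = text.replace(phrase, " ")
    return text, buffer


def merge_buffer(counts: Counter, buffer: Counter) -> Counter:
    """Add buffered synonym counts to the matching candidate words."""
    merged = Counter(counts)
    merged.update(buffer)  # Counter.update adds counts rather than replacing
    return merged


if __name__ == "__main__":
    text, buffer = extract_synonyms(
        "VIA C3 ships now. VIA Technologies, Inc. makes chips.", SYNONYMS)
    print(buffer)  # Counter({'VIA': 1})
```

Because matching is by substring, a phrase such as "VIA Tech" would also match inside unrelated text like "VIA Technology"; a real implementation would likely need word-boundary checks.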
In the method for registering article keywords according to an embodiment of the invention, an article is first received. The article is then compared with the symbol library, and every symbol identical to one recorded in the symbol library is deleted from it. Next, every function word identical to one recorded in the function-word dictionary is deleted from the article.
The occurrences of every word in the article are then counted, giving a plurality of candidate words and their occurrence counts. Finally, a plurality of keywords are selected from the candidate words according to a preset condition and registered in the keyword database.
In addition, the article can be compared with the synonym dictionary: every synonym identical to one recorded in the synonym dictionary is deleted from the article, the number of times each such synonym occurs in the article is recorded, and the word synonymous with the synonym, together with the synonym's occurrence count, is stored in a synonym buffer. The synonymous words and synonym occurrence counts recorded in the synonym buffer are then added to the corresponding candidate words and their occurrence counts.
According to the embodiment of the invention, the preset condition can be a minimum occurrence count, in which case the candidate words whose occurrence counts are greater than the minimum are selected as keywords and registered in the keyword database. The processor can also sort the candidate words by their occurrence counts; the preset condition can then be a rank cutoff, in which case the candidate words ranked within the cutoff are selected as keywords and registered in the keyword database.
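The two preset conditions differ mainly in whether sorting is needed. In this sketch (the function names are hypothetical), the minimum-count condition is a plain filter, while the rank cutoff relies on `collections.Counter.most_common`, which orders words by count and keeps equal counts in first-seen order. The bound is treated as inclusive, which matches the worked example later in the description, where a word seen exactly 3 times passes a threshold of 3.

```python
from collections import Counter


def by_min_count(counts: Counter, min_count: int) -> list[str]:
    """Minimum-occurrence condition: a filter, so no sorting is required."""
    return [word for word, n in counts.items() if n >= min_count]


def by_rank(counts: Counter, top_n: int) -> list[str]:
    """Rank-cutoff condition: sort by count and keep the first top_n words."""
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Illustrative counts only.
    counts = Counter({"chipset": 5, "cache": 2, "socket": 7, "fan": 1})
    print(by_min_count(counts, 2))  # ['chipset', 'cache', 'socket']
    print(by_rank(counts, 2))       # ['socket', 'chipset']
```

This is also why the sorting step can be skipped when only a minimum count is used: the filter never looks at relative order.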
Description of the drawings
To make the above objects, features and advantages of the present invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of the conventional method of analyzing and registering article keywords.
Fig. 2 is a schematic diagram showing the system architecture of the system for registering article keywords according to an embodiment of the invention.
Fig. 3 is a flowchart of the method for registering article keywords according to an embodiment of the invention.
Embodiment
Fig. 2 is a schematic diagram showing the system architecture of the system for registering article keywords according to an embodiment of the invention.
The system for registering article keywords according to the embodiment comprises a data storage device 200 and a processor 210. The data storage device 200 holds a synonym dictionary 201, a symbol library 202, a function-word dictionary 203, a keyword database 204 and a synonym buffer 205.
The synonym dictionary 201 records correspondences between synonymous words; for example, the synonyms of "VIA" include "VIA Tech" and "VIA Technologies, Inc.". The symbol library 202 records special symbols such as punctuation marks. The function-word dictionary 203 records words that carry no substantive meaning in ordinary articles, such as verbs, adjectives, adverbs, auxiliary words and other words without meaning, for example "a", "is", "on" and "he". The keyword database 204 stores the keywords produced by the analysis.
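One way to picture data storage device 200 in code is a single container holding the five stores named above. This is a hypothetical in-memory stand-in, not the patent's storage design: the class and method names are illustrative, and skipping keywords that are already registered is an assumption the description does not state.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataStorage:
    """Hypothetical in-memory stand-in for data storage device 200."""
    synonym_dictionary: dict[str, str] = field(default_factory=dict)  # 201
    symbol_library: set[str] = field(default_factory=set)             # 202
    function_words: set[str] = field(default_factory=set)             # 203
    keyword_database: list[str] = field(default_factory=list)         # 204
    synonym_buffer: Counter = field(default_factory=Counter)          # 205

    def register(self, keywords: list[str]) -> None:
        """Register keywords in 204, skipping ones already stored (assumed)."""
        for keyword in keywords:
            if keyword not in self.keyword_database:
                self.keyword_database.append(keyword)


# Entries taken from the description of 201 to 203; the sets are not complete.
store = DataStorage(
    synonym_dictionary={"VIA Tech": "VIA", "VIA Technologies, Inc.": "VIA"},
    symbol_library=set(",.;"),
    function_words={"a", "is", "on", "he"},
)
store.register(["processor", "VIA", "processor"])
print(store.keyword_database)  # ['processor', 'VIA']
```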
The processor 210 compares the article with the synonym dictionary 201, deletes from the article the synonyms identical to those recorded in the synonym dictionary 201, records the number of times each such synonym occurs in the article, and stores the word synonymous with the synonym, together with the synonym's occurrence count, in the synonym buffer 205.
The processor 210 compares the article with the symbol library 202 and deletes from the article the symbols identical to those recorded in the symbol library 202. The processor 210 also compares the article with the function-word dictionary 203 and deletes the function words identical to those recorded in the function-word dictionary 203.
Next, the processor 210 counts how many times each remaining word occurs in the article, obtaining a plurality of candidate words and their occurrence counts. The processor 210 then adds the synonymous words and synonym occurrence counts recorded in the synonym buffer 205 to the corresponding candidate words and their occurrence counts.
Finally, the processor 210 sorts the candidate words by occurrence count, selects keywords from the candidate words according to a preset condition, such as a minimum occurrence count (for example, at least 10 occurrences) or a rank cutoff (for example, the top 5), and registers the selected keywords in the keyword database 204.
Fig. 3 is a flowchart of the method for registering article keywords according to an embodiment of the invention. The method is described below with reference to Figs. 2 and 3.
In the method for registering article keywords according to the embodiment, an article is first received (step S30). Then, in step S31, the article is compared with the synonym dictionary 201: the synonyms identical to those recorded in the synonym dictionary 201 are deleted from the article, the number of times each such synonym occurs in the article is recorded, and the word synonymous with the synonym, together with the synonym's occurrence count, is stored in the synonym buffer 205.
Then, in step S32, the article is compared with the symbol library 202, and the symbols identical to those recorded in the symbol library 202 are deleted. In step S33, the article is compared with the function-word dictionary 203, and the function words identical to those recorded in the function-word dictionary 203 are deleted.
Afterwards, in step S34, the occurrences of each remaining word in the article are counted, giving a plurality of candidate words and their occurrence counts. Then, in step S35, the synonymous words and synonym occurrence counts recorded in the synonym buffer 205 are added to the corresponding candidate words and their occurrence counts.
Finally, in step S36, the candidate words are sorted by occurrence count; in step S37, the keywords that satisfy the preset condition, such as a minimum occurrence count or a rank cutoff, are selected from the candidate words; and in step S38, the selected keywords are registered in the keyword database 204.
If the preset condition is a minimum occurrence count, the candidate words whose occurrence counts are greater than the minimum are selected as keywords and registered in the keyword database 204. If the preset condition is a rank cutoff, the candidate words ranked within the cutoff are selected as keywords and registered in the keyword database 204.
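Putting the steps together, the flowchart of Fig. 3 can be approximated by one function whose comments carry the step numbers S30 to S38. This sketch assumes whitespace tokenization, case-insensitive function-word matching, longest-first substring matching for synonyms and an inclusive minimum count; the libraries are small illustrative subsets, the keyword database is a plain list, and `register_keywords` is a hypothetical name.

```python
from collections import Counter

# Illustrative subsets of the libraries (assumptions based on the example).
SYNONYMS = {"VIA Tech": "VIA", "VIA Technologies, Inc.": "VIA"}
SYMBOLS = set(",.;[、!·@#$%‘“")
FUNCTION_WORDS = {"a", "it", "this", "by", "is", "on", "are", "she",
                  "the", "he", "that", "i"}


def register_keywords(article, keyword_db, min_count=1, top_n=None):
    """Run steps S30-S38 on one article and append keywords to keyword_db."""
    text = article                                   # S30: receive article
    buffer = Counter()                               # synonym buffer 205
    for phrase in sorted(SYNONYMS, key=len, reverse=True):  # S31: synonyms
        hits = text.count(phrase)
        if hits:
            buffer[SYNONYMS[phrase]] += hits
            text = text.replace(phrase, " ")
    text = "".join(ch for ch in text if ch not in SYMBOLS)  # S32: symbols
    words = [w for w in text.split()
             if w.lower() not in FUNCTION_WORDS]     # S33: function words
    counts = Counter(words)                          # S34: count words
    counts.update(buffer)                            # S35: add buffer counts
    ranked = counts.most_common()                    # S36: sort by count
    if top_n is not None:                            # S37: rank cutoff
        keywords = [w for w, _ in ranked[:top_n]]
    else:                                            # S37: minimum count
        keywords = [w for w, n in ranked if n >= min_count]
    keyword_db.extend(keywords)                      # S38: register
    return keywords


if __name__ == "__main__":
    db = []
    sample = ("The VIA C3 1GHz processor is the coolest 1GHz processor. "
              "VIA Technologies, Inc. makes the VIA C3.")
    print(register_keywords(sample, db, min_count=2))
    # ['VIA', 'C3', '1GHz', 'processor']
```

In the sample, "VIA" appears twice in the remaining text and once more through the buffered synonym "VIA Technologies, Inc.", so it ranks first with 3.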
Note that, in the embodiment, steps S31, S32 and S33 each delete a different kind of target from the article and are independent of one another, so their order can be interchanged. Also, if the preset condition is only a minimum occurrence count, step S36 (sorting the candidate words by occurrence count) can be omitted.
According to another form, because the symbol library 202 and the function-word dictionary 203 serve the same purpose, namely removing special symbols and function words from the article, they can also be merged into a single character/word library that records the symbols and words to be deleted from articles.
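Merging the two libraries raises one practical point: symbols are deleted wherever they appear, whereas function words should be deleted only as whole words, or the entry "a" would also strip the letter a out of "market". The sketch below keeps a single hypothetical library and tells the two kinds of entry apart by whether they are alphanumeric; that rule, the entries and the name `clean` are assumptions, not something the description specifies.

```python
# Hypothetical merged character/word library (entries are illustrative).
DELETION_LIBRARY = {",", ".", ";", "[", "、", "!", "·", "@", "#", "$", "%",
                    "‘", "“", "a", "it", "the", "is", "on", "he"}


def clean(text: str, library: set[str]) -> list[str]:
    """Delete symbols anywhere and function words only as whole words."""
    # Non-alphanumeric entries act as symbols; alphanumeric ones as words.
    symbols = {e for e in library if not e.isalnum()}
    words = {e.lower() for e in library if e.isalnum()}
    stripped = "".join(ch for ch in text if ch not in symbols)
    return [w for w in stripped.split() if w.lower() not in words]


if __name__ == "__main__":
    print(clean("The processor is on the market, he said.", DELETION_LIBRARY))
    # ['processor', 'market', 'said']
```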
An example is given next.
Suppose an article reads as follows:
Original article text
| The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world’s smallest x86 processor die size. VIA Technologies, Inc. is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips |
The synonym dictionary is as follows:
Synonym dictionary
| Word | Synonym |
|---|---|
| VIA | VIA Tech |
| VIA | VIA Technologies, Inc. |
First, after the article is compared with the synonym dictionary, the synonyms recorded in the synonym dictionary, such as "VIA Technologies, Inc.", are deleted from the article and their occurrences in the article are counted. The word "VIA", which is synonymous with this synonym, is then recorded in the synonym buffer together with the occurrence count, as follows:
Synonym buffer
| VIA(1) |
The article after synonym deletion reads:
Article
| The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world’s smallest x86 processor die size. is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips |
The symbol library and the function-word dictionary are as follows:
Symbol library
| , | . | ‘ | “ |
| ; | [ | 、 | !· |
| @ | # | $ | % |
Function-word dictionary
| A | It | this | by |
| Is | On | Are | she |
| The | He | that | I |
After the article is compared with the symbol library and the function-word dictionary and the matching symbols and function words are deleted, it reads:
Article
| VIA C3 1GHz processor coolest 1GHz processor market saving energy and maximizing total system savings allowing use of inexpensive off shelf components processor runs so cool can operate with standard small coolers and power supplies making ideal solution for ergonomic small footprint quiet PC designs first processor in world to be manufactured using leading edge 013 micron manufacturing process VIA C3 1GHz processor has worlds smallest x86 processor die size leading innovator and developer of PC core logic chipsets microprocessors and multimedia and communications chips |
Next, the occurrences of every remaining word in the article are counted. The candidate words and their occurrence counts (in parentheses) are as follows:
Candidate words
| VIA(3) | C3(2) | 1GHz(3) | processor(6) |
| coolest(1) | Viatech(1) | … |
Then the data in the synonym buffer are added:
Candidate words
| VIA(4) | C3(2) | 1GHz(3) | processor(6) |
| coolest(1) | Viatech(1) | … |
The candidate words are then sorted by occurrence count, giving:
Sorted result
| processor(6) VIA(4) 1GHz(3) C3(2) Coolest(1) Viatech(1) |
Finally, the keywords that satisfy the preset condition are selected from the candidate words and registered in the keyword database. If the preset condition is appearing at least 3 times in the article, "processor", "VIA" and "1GHz" are selected as keywords and registered in the keyword database. If the preset condition is a rank within the top 4, "processor", "VIA", "1GHz" and "C3" are selected as keywords and registered in the keyword database.
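The arithmetic of this example can be replayed directly from its tables. The sketch below copies the candidate counts and the synonym buffer as printed (using the spelling 1GHz from the sorted result), adds the buffer, sorts, and applies both conditions, treating both bounds as inclusive, which is what the listed results imply (1GHz, with exactly 3 occurrences, is selected).

```python
from collections import Counter

# Candidate counts and synonym buffer copied from the example tables.
candidates = Counter({"VIA": 3, "C3": 2, "1GHz": 3, "processor": 6,
                      "coolest": 1, "Viatech": 1})
synonym_buffer = Counter({"VIA": 1})

candidates.update(synonym_buffer)   # VIA: 3 + 1 = 4
ranking = candidates.most_common()  # sorted by occurrence count

# Condition 1: appears at least 3 times.
by_count = [w for w, n in ranking if n >= 3]
# Condition 2: ranked within the top 4.
by_rank = [w for w, _ in ranking[:4]]

print(by_count)  # ['processor', 'VIA', '1GHz']
print(by_rank)   # ['processor', 'VIA', '1GHz', 'C3']
```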
According to another variation of the invention, the article keyword registration described in the embodiment can also be carried out by a computer program encoded on a computer-readable medium, which causes a computer to perform the registration.
Thus, the system and method for registering article keywords provided by the present invention automatically register the keywords that recur in an article. In addition, the invention automatically recognizes synonymous terms in an article, improving the accuracy of keyword analysis.
Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.
Claims (6)
1. A system for registering article keywords, comprising:
A data storage device holding a symbol library, a function-word dictionary and a keyword database; and
A processor, which compares an article with the symbol library and deletes from the article the symbols identical to those recorded in the symbol library, compares the article with the function-word dictionary and deletes from the article the function words identical to those recorded in the function-word dictionary, then counts the number of times each word occurs in the article to obtain a plurality of candidate words and their corresponding occurrence counts, and finally selects a plurality of keywords from the candidate words according to a preset condition and registers the keywords in the keyword database,
Characterized in that the data storage device further holds a synonym dictionary, and the processor further compares the article with the synonym dictionary, deletes from the article the synonyms identical to those recorded in the synonym dictionary, records the number of times each such synonym occurs in the article, and stores the word synonymous with the synonym, together with the number of times the synonym occurs, in a synonym buffer, and
The processor further adds the synonymous words and synonym occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
2. The system for registering article keywords as claimed in claim 1, wherein the preset condition is a minimum occurrence count, and only the candidate words whose occurrence counts are greater than the minimum occurrence count are selected as the keywords and registered in the keyword database.
3. The system for registering article keywords as claimed in claim 1, wherein the preset condition is a rank cutoff, and the processor further sorts the candidate words by their corresponding occurrence counts, only the candidate words ranked within the rank cutoff being selected as the keywords and registered in the keyword database.
4. A method for registering article keywords, comprising the following steps:
Receiving an article;
Comparing the article with a symbol library and deleting from the article the symbols identical to those recorded in the symbol library;
Comparing the article with a function-word dictionary and deleting from the article the function words identical to those recorded in the function-word dictionary;
Counting the number of times each word occurs in the article to obtain a plurality of candidate words and their corresponding occurrence counts;
Selecting a plurality of keywords from the candidate words according to a preset condition; and
Registering the keywords in a keyword database,
Characterized by further comprising the following steps:
Comparing the article with a synonym dictionary and deleting from the article the synonyms identical to those recorded in the synonym dictionary;
Recording the number of times each such synonym occurs in the article;
Storing the word synonymous with the synonym, together with the number of times the synonym occurs, in a synonym buffer; and
Adding the synonymous words and synonym occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
5. The method for registering article keywords as claimed in claim 4, wherein the preset condition is a minimum occurrence count, and only the candidate words whose occurrence counts are greater than the minimum occurrence count are selected as the keywords and registered in the keyword database.
6. The method for registering article keywords as claimed in claim 4, wherein the preset condition is a rank cutoff, and the method further comprises sorting the candidate words by their corresponding occurrence counts, only the candidate words ranked within the rank cutoff being selected as the keywords and registered in the keyword database.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB02131859XA CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB02131859XA CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1480875A CN1480875A (en) | 2004-03-10 |
| CN1211747C true CN1211747C (en) | 2005-07-20 |
Family
ID=34145051
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB02131859XA Expired - Lifetime CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1211747C (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4439562B2 (en) | 2008-02-26 | 2010-03-24 | シャープ株式会社 | Electronic data retrieval device |
| CN101980196A (en) * | 2010-10-25 | 2011-02-23 | 中国农业大学 | Article comparison method and device |
| CN113096635B (en) * | 2021-03-31 | 2024-01-09 | 抖音视界有限公司 | Audio and text synchronization method, device, equipment and medium |
- 2002-09-06: CN application CNB02131859XA (patent CN1211747C), Expired - Lifetime
Also Published As
| Publication number | Publication date |
|---|---|
| CN1480875A (en) | 2004-03-10 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CX01 | Expiry of patent term | Granted publication date: 20050720 |