CN1328695C - Automatic searching and determining method for key words information in name card identification - Google Patents
- Publication number
- CN1328695C (application CNB2004101034834A, CN200410103483A)
- Authority
- CN
- China
- Prior art keywords
- word
- zone
- character
- literal information
- business card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
The present invention discloses a method for automatically locating and determining key text information in business card recognition. The acquired business card image is first segmented into text lines, yielding line-level character regions or connected regions. The regions are then ranked by their character parameters and connected-component counts, and the top-ranked regions are selected. Finally, the region in which the key text information is located is determined by checking whether the semantics of the regions adjacent to the selected regions correspond to a position or title. In this way the key text information is extracted accurately. Compared with the prior art, the invention is simple, convenient to use, fast at character recognition, efficient, and accurate.
Description
Technical field
The present invention relates to business card recognition technology, and in particular to a method for automatically locating and determining key text information in business card recognition.
Technical background
On a business card, the name, company, and position are all important information. Such information is generally marked in a relatively distinctive way: it is placed near the top, its text parameters such as character size, width, and spacing are larger, or its background or foreground color differs from the rest. For a given application or user, some of this important information is in turn the most critical, and how to extract this key text information accurately is a problem worth studying. In current business card recognition technology, the entire content of the card is scanned and recognized, and the user then selects from the recognized character strings. On the one hand, full scanning and especially full recognition are time-consuming, so recognition is slow, even though the user actually needs only one or a few of the items; full scanning and full recognition therefore waste a certain amount of processing. On the other hand, because the user must select the key text information manually, the approach is inconvenient to use.
Summary of the invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for automatically locating and determining key text information in business card recognition that is simple, has reasonable steps, and is more intelligent.
To solve the above technical problem, the present invention adopts the following technical solution: a method for automatically locating and determining key text information in business card recognition, wherein the key text information is a name, comprising the following steps:
Step 1: perform connected-component-based layout analysis and text segmentation on the input business card image, and compute the character attributes and the total number of special connected components;
Step 2: select the text regions that contain the key text information according to the character attributes and the total number of special connected components;
Step 3: search for the regions adjacent to the text regions that contain the key text information, and perform character recognition on the adjacent regions;
Step 4: search the character strings obtained by character recognition for keywords that denote a position and obtain their text content, so that the region in which the key text information is finally located is determined jointly by whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a position.
The total number of special connected components may be the number of connected components whose horizontal projections overlap only slightly.
In step 2, the top-ranked regions may be selected according to the character attributes, which yields the text regions containing the key text information.
In step 2, icon regions may further be rejected from the obtained regions containing the key text information according to the number of special connected components, the character attributes, and the color information.
The criteria for rejecting an icon region may be: condition one, the number of special connected components in the region is less than or equal to 1; condition two, the region contains a character whose width or height is far greater than the average character width or height; condition three, among the foreground objects of all regions segmented from the whole business card image, only the foreground of this region has a different color. If a region satisfies any one of the three conditions, it is an icon region rather than a text region.
Condition two may be that the region contains a character whose width or height is greater than 2.5 times the average character width or height.
The character attributes may include character height, character width, and horizontal spacing between characters.
In the above technical solution, the present invention first performs text-line segmentation on the acquired business card image to obtain line-level character regions or connected regions. In general, the key text information that users are interested in first, such as the name, company icon, and company name, lies within the three regions with the largest average character size. Because a position or title usually appears below or to the lower right of the name, whether the semantics of the regions adjacent to these three regions correspond to a position or title can be used to jointly determine the region in which the key text information, the name, is finally located: when an adjacent region is a position or title region, the region above it or to its upper left is the region containing the name. The key text information is thus extracted accurately. Moreover, because the key text information is located and determined automatically, the frequent user operations required in the prior art are avoided, so the method is more convenient to use and recognition is faster. In addition, the invention uses statistical features together with keyword lookup to find the region containing the key text information, which helps ensure accurate retrieval. Compared with the prior art, the invention is simple, convenient to use, fast at character recognition, efficient, and accurate.
Embodiment
The present invention is described in further detail below with reference to a specific embodiment.
In daily life, the great majority of business cards have the company name on the first line, the name centered on the second line, the position or title at the lower right on the third line, and then specific details such as address, telephone, mobile phone, and e-mail from the fourth line onward. The character attributes of key text information such as the company name and the name, including font, font size, and character spacing, are generally much larger than those of the other text. In view of this, the technical solution of the present invention is proposed to speed up business card recognition, and in particular to obtain the key text information, the name, quickly.
The present invention provides a method for automatically locating and determining key text information in business card recognition, wherein the key text information is a name. The steps are as follows:
Step 1: perform connected-component-based layout analysis and text segmentation on the input business card image, and compute the character attributes and the total number of special connected components;
Here, the character attributes include character height, character width, horizontal spacing between characters, and so on.
The total number of special connected components is the number of connected components whose horizontal projections overlap only slightly. For example, the character "j" counts as only one connected component, whereas "Rj" counts as two.
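One way to read the "j" versus "Rj" example is that components stacked above one another, such as the dot and the stem of "j", overlap heavily when projected onto the horizontal axis and are therefore counted once. The sketch below follows that reading. It is only an illustration under assumptions: the bounding-box format, the function name `count_special_components`, and the 0.5 overlap threshold are hypothetical and are not given in the patent.

```python
from typing import List, Tuple

# Hypothetical bounding-box format: (x0, y0, x1, y1), with y growing downward.
Box = Tuple[int, int, int, int]


def count_special_components(boxes: List[Box], max_overlap: float = 0.5) -> int:
    """Count connected components, merging those whose x-extents overlap heavily.

    max_overlap is an assumed threshold: two components are treated as one
    when their horizontal overlap exceeds this fraction of the narrower one.
    """
    if not boxes:
        return 0
    # Keep only the horizontal extent of each component, sorted left to right.
    spans = sorted((x0, x1) for x0, _, x1, _ in boxes)
    count = 1
    cur_x0, cur_x1 = spans[0]
    for x0, x1 in spans[1:]:
        overlap = min(cur_x1, x1) - max(cur_x0, x0)
        narrower = min(cur_x1 - cur_x0, x1 - x0)
        if narrower > 0 and overlap / narrower > max_overlap:
            # Heavy overlap: same character (e.g. the dot and stem of "j").
            cur_x0, cur_x1 = min(cur_x0, x0), max(cur_x1, x1)
        else:
            # Little or no overlap: a separate character.
            count += 1
            cur_x0, cur_x1 = x0, x1
    return count
```

With a dot box `(12, 0, 15, 3)` and a stem box `(10, 5, 16, 20)` for "j", the two spans merge into one count; adding a box for "R" to the left yields two.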
Step 2: select the text regions that contain the key text information according to the character attributes and the total number of special connected components. Specifically:
First, taking the average character size (character height, character width) within each region as the index, all regions on the business card are sorted in descending order of this index, and the top three regions are selected. Statistically, the top three regions usually contain important information such as the company name, the company icon, and the name.
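The ranking described above can be sketched as follows. The patent does not say how height and width are combined into a single index, so this sketch assumes the mean of the average height and the average width; the `Region` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    label: str                 # hypothetical identifier, for illustration only
    char_heights: List[float]  # height of each segmented character, in pixels
    char_widths: List[float]   # width of each segmented character, in pixels


def size_index(region: Region) -> float:
    """Average character size (assumed: mean of mean height and mean width)."""
    if not region.char_heights or not region.char_widths:
        return 0.0
    mean_h = sum(region.char_heights) / len(region.char_heights)
    mean_w = sum(region.char_widths) / len(region.char_widths)
    return (mean_h + mean_w) / 2.0


def top_regions(regions: List[Region], k: int = 3) -> List[Region]:
    """Sort regions by size index, largest first, and keep the first k."""
    return sorted(regions, key=size_index, reverse=True)[:k]
```

Any other monotone combination of height and width (for example, their product) would fit the description equally well; only the descending sort and the cut at three regions come from the text.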
Then, within these three regions, the connected-component counts obtained in step 1, the character width and height features, and the color information are used to eliminate icons (an icon is often among the top three candidates). The decision criteria are as follows:
If a region satisfies any one of the following rules,
Condition one: the number of special connected components in the region is less than or equal to 1;
Condition two: the region contains a character whose width or height is far greater than the average character width or height, typically greater than 2.5 times the average character width or height;
Condition three: among the foreground objects of all regions segmented from the whole business card image, only the foreground of this region has a different color;
then the region is judged to be an icon rather than a name, and it is excluded from the further screening below.
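The three conditions combine with a logical OR, which the following sketch makes explicit. It assumes that the average character width and height are computed over the whole card, which the text does not state, and that the color test has already been reduced to a boolean; the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def is_icon_region(special_count: int,
                   char_sizes: List[Tuple[float, float]],
                   avg_width: float,
                   avg_height: float,
                   only_region_with_distinct_color: bool,
                   ratio: float = 2.5) -> bool:
    """Return True if a candidate region should be rejected as an icon.

    char_sizes holds (width, height) pairs for the characters in the region.
    avg_width and avg_height are assumed to be card-wide averages.
    """
    # Condition one: at most one special connected component.
    few_components = special_count <= 1
    # Condition two: some character is more than `ratio` times the average size.
    oversized_char = any(
        w > ratio * avg_width or h > ratio * avg_height for w, h in char_sizes
    )
    # Condition three: this is the only region whose foreground color differs.
    return few_components or oversized_char or only_region_with_distinct_color
```

A candidate is kept for the next step only when the function returns False.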
In this way, the text regions containing the key text information can be obtained accurately. In practice, generally only 2 to 3 regions remain after this screening.
Step 3: search for the adjacent regions below or to the lower right of the text regions that contain the key text information, and perform character recognition on the adjacent regions;
Step 4: search the character strings obtained by character recognition for keywords that denote a position and obtain their text content; the region in which the key text information, the name, is finally located is then determined jointly by whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a position or title. That is, when an adjacent region is a position or title region, the region above it or to its upper left is the region containing the name.
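The geometric part of step 3, finding the region below or to the lower right of a candidate, might look like the sketch below. The patent does not define adjacency precisely, so this version assumes axis-aligned bounding boxes and picks the nearest region that starts at or below the candidate's bottom edge and is not entirely to its left. The box format and the name `neighbor_below` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical bounding-box format: (x0, y0, x1, y1), with y growing downward.
Box = Tuple[int, int, int, int]


def neighbor_below(candidate: Box, others: List[Box]) -> Optional[Box]:
    """Return the nearest region below or to the lower right of candidate."""
    cx0, _, _, cy1 = candidate
    best: Optional[Box] = None
    for box in others:
        _, y0, x1, _ = box
        if y0 < cy1:
            continue  # starts above the candidate's bottom edge
        if x1 <= cx0:
            continue  # entirely to the left: lower left is not searched
        if best is None or y0 < best[1]:
            best = box  # keep the vertically closest region
    return best
```

Only the region returned here needs to be passed to character recognition, which is where the method saves time compared with recognizing the whole card.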
In this embodiment, a small database or data store is used. Position-related words such as "Manager", "Sales", "Engineer", "Director", "Doctor", "manager", "chairman of the board", and "factory director" are read from the small database as keywords, and the recognized character strings are searched for matching words; if a match is found, the corresponding text content is displayed or stored. In rare cases no matching word is found; since the region containing the key text information cannot then be determined, keyword retrieval has to be carried out over all the remaining regions (generally only 1 to 2 regions at this point) to find the text content of the key text information.
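The keyword lookup can be illustrated with a small in-memory list standing in for the small database. This is a sketch under assumptions: the case-insensitive substring match, the dictionary layout, and the function names are hypothetical, and the recognized text is supplied directly rather than produced by a recognizer.

```python
from typing import Dict, Optional

# Stand-in for the small keyword database; entries are taken from the text.
TITLE_KEYWORDS = (
    "manager", "sales", "engineer", "director", "doctor",
    "chairman of the board", "factory director",
)


def contains_title_keyword(text: str) -> bool:
    """Case-insensitive substring match against the keyword list (assumed)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in TITLE_KEYWORDS)


def find_name_region(neighbor_text: Dict[str, str]) -> Optional[str]:
    """Pick the candidate whose adjacent region reads like a position or title.

    neighbor_text maps a candidate region id to the recognized text of the
    region below or to the lower right of it. Returns None when no adjacent
    region contains a keyword, which signals the fallback case.
    """
    for region_id, text in neighbor_text.items():
        if contains_title_keyword(text):
            return region_id
    return None
```

When `find_name_region` returns None, the caller would fall back to searching all the remaining candidate regions, as the text describes for the rare case where no keyword is found.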
Claims (7)
1. A method for automatically locating and determining key text information in business card recognition, wherein the key text information is a name, comprising the following steps:
Step 1: performing connected-component-based layout analysis and text segmentation on the input business card image, and computing the character attributes and the total number of special connected components;
Step 2: selecting the text regions that contain the key text information according to the character attributes and the total number of special connected components;
Step 3: searching for the regions adjacent to the text regions that contain the key text information, and performing character recognition on the adjacent regions;
Step 4: searching the character strings obtained by character recognition for keywords that denote a position and obtaining their text content, so that the region in which the key text information is finally located is determined jointly by whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a position.
2. The method for automatically locating and determining key text information in business card recognition according to claim 1, characterized in that the total number of special connected components is the total number of connected components whose horizontal projections overlap only slightly.
3. The method for automatically locating and determining key text information in business card recognition according to claim 1 or 2, characterized in that in step 2 the top-ranked regions are selected according to the character attributes, which yields the text regions containing the key text information.
4. The method for automatically locating and determining key text information in business card recognition according to claim 3, characterized in that in step 2 icon regions are further rejected from the obtained regions containing the key text information according to the number of connected components, the character attributes, and the color information.
5. The method for automatically locating and determining key text information in business card recognition according to claim 4, characterized in that the criteria for rejecting an icon region are: condition one, the number of special connected components in the region is less than or equal to 1; condition two, the region contains a character whose width or height is far greater than the average character width or height; condition three, among the foreground objects of all regions segmented from the whole business card image, only the foreground of this region has a different color; if a region satisfies any one of the three conditions, it is an icon region rather than a text region.
6. The method for automatically locating and determining key text information in business card recognition according to claim 5, characterized in that condition two is that the region contains a character whose width or height is greater than 2.5 times the average character width or height.
7. The method for automatically locating and determining key text information in business card recognition according to claim 6, characterized in that the character attributes include character height, character width, and horizontal spacing between characters.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004101034834A CN1328695C (en) | 2004-12-30 | 2004-12-30 | Automatic searching and determining method for key words information in name card identification |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2004101034834A CN1328695C (en) | 2004-12-30 | 2004-12-30 | Automatic searching and determining method for key words information in name card identification |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1632821A (en) | 2005-06-29 |
| CN1328695C (en) | 2007-07-25 |
Family
ID=34848182
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB2004101034834A Expired - Fee Related CN1328695C (en) | 2004-12-30 | 2004-12-30 | Automatic searching and determining method for key words information in name card identification |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1328695C (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101246475B (en) | 2007-02-14 | 2010-05-19 | 北京书生国际信息技术有限公司 | A Retrieval Method Based on Layout Information |
| CN102194118B (en) * | 2010-03-02 | 2013-04-10 | 方正国际软件(北京)有限公司 | Method and device for extracting information from image |
| CN103209241A (en) * | 2012-01-11 | 2013-07-17 | 联想(北京)有限公司 | Information sending method and electronic device |
| CN103093217A (en) * | 2013-01-06 | 2013-05-08 | 北京百度网讯科技有限公司 | Interactive image and character recognition method and device |
| CN106056114B (en) * | 2016-05-24 | 2019-07-05 | 腾讯科技(深圳)有限公司 | Contents of visiting cards recognition methods and device |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001052112A (en) * | 1999-08-11 | 2001-02-23 | Fujitsu Ltd | Recognition processing method, information processing device and recording medium |
| CN1339775A (en) * | 2000-08-22 | 2002-03-13 | 英业达集团(上海)电子技术有限公司 | Automatic identifying method and system for name card |
Also Published As
| Publication number | Publication date |
|---|---|
| CN1632821A (en) | 2005-06-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109657738A | | Character identifying method, device, equipment and storage medium |
| CN101211370B | | Content register device, content register method |
| JP4454789B2 | | Form classification method and apparatus |
| CN105320778B | | A method for product labeling of Chinese e-commerce websites |
| EP1265189A1 | | Pattern extraction apparatus and method |
| CN101645086B | | Retrieval method |
| CN101911069A | | Method and system for discovery and modification of data clusters and synonyms |
| CN101923643A | | General form recognizing method |
| CN109101561B | | Wine label identification method |
| CN104915664A | | Contact object identification acquisition method and device |
| CN1328695C | | Automatic searching and determining method for key words information in name card identification |
| CN106326454A | | Image identification method |
| CN107302757A | | The method of calling and device of emergency numbers |
| CN101087480A | | An intelligent legend dialing method of mobile terminal |
| CN115203474A | | Automatic database classification and extraction technology |
| CN108921016B | | Book score obtaining method based on image recognition, electronic equipment and storage medium |
| CN103020651B | | Method for detecting sensitive information of microblog pictures |
| CN114220112A | | A method and system for job relationship extraction for character business cards |
| CN111488327B | | Data standard management method and system |
| KR101692244B1 | | Method for spam classfication, recording medium and device for performing the method |
| CN1564569A | | Serching method of telephone number and its serching engine |
| CN105872232A | | Number on-line inquiry method and number on-line inquiry apparatus |
| CN103475764A | | Method and device for inputting contacts, and method and device for classifying contacts |
| CN107562944A | | A kind of scan image and the approaches to IM for extracting image |
| CN107609104A | | The method and system of associated video is searched according to video image material |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | C17 | Cessation of patent right | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20070725; Termination date: 20111230 |