CN1328695C - Automatic searching and determining method for key words information in name card identification - Google Patents

Automatic searching and determining method for key words information in name card identification

Info

Publication number
CN1328695C
Authority
CN
China
Prior art keywords
word
zone
character
literal information
business card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004101034834A
Other languages
Chinese (zh)
Other versions
CN1632821A (en)
Inventor
吴文钦
王浩
夏煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vimicro Corp
Original Assignee
Vimicro Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vimicro Corp
Priority to CNB2004101034834A
Publication of CN1632821A
Application granted
Publication of CN1328695C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The present invention discloses a method for automatically searching for and determining key text information in business card recognition. The acquired business card image is first segmented into text lines, yielding line-by-line character regions or connected regions. The regions are then ranked according to their character parameters and their number of connected components, and the top-ranked regions are selected. The region in which the key text information is finally located is then determined jointly, by checking whether the semantics of the regions adjacent to the selected regions correspond to a job position or title. In this way the key text information is extracted accurately. Compared with the prior art, the invention is characterized by a simple method, convenient use, fast character recognition, high efficiency and high accuracy.

Description

Method for automatically searching for and determining key text information in business card recognition
Technical field
The present invention relates to business card recognition technology, and in particular to a method for automatically searching for and determining key text information in business card recognition.
Technical background
On a business card, the name, company, job position and the like are all important information. In general, such important information is marked in a relatively distinctive form: for example, it is placed first in the layout, its text parameters such as character size, character width and character spacing are larger, or its background or foreground color is different. For different application scenarios or different users, some of this important information is in turn the most critical, and how to extract this key text information accurately is a problem worth studying. In current business card recognition technology, however, the entire content of the card is scanned, then recognized, and the user then selects from the recognized character strings. On the one hand, this approach requires a full scan and full recognition, and full recognition in particular is quite time-consuming, so business card recognition is slow; yet the user actually needs only one or a few of the items, so full scanning and full recognition are wasteful to some degree. On the other hand, because users must select the key text information themselves, the approach is inconvenient to use.
Summary of the invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for automatically searching for and determining key text information in business card recognition that is simple, has reasonable steps and is more intelligent.
To solve the above technical problem, the technical solution adopted by the invention is a method for automatically searching for and determining key text information in business card recognition, wherein the key text information is a name, comprising the following steps:
Step 1: perform connected-component-based layout analysis and text segmentation on the input business card image, and compute the character attributes and the total number of special connected components;
Step 2: select the text regions containing the key text information according to the character attributes and the total number of special connected components;
Step 3: search for the regions adjacent to the text regions containing the key text information, and perform character recognition on those adjacent regions;
Step 4: search the character strings obtained by character recognition for keywords denoting a job position and obtain their text content, and thereby jointly determine the region in which the key text information is finally located, according to whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a job position.
The total number of special connected components may be the number of connected components that overlap little in horizontal projection.
Step 2 may select, according to the character attributes, the regions ranked in the first few places, thereby obtaining the text regions containing the key text information.
Step 2 may further reject icon regions from the obtained regions containing key text information, according to the number of special connected components, the character attributes and color information.
The criteria for rejecting an icon region may be: condition one, the number of special connected components in the region is less than or equal to 1; condition two, the region contains a character whose width or height is far greater than the average character width or height; condition three, among the foreground objects of all regions segmented from the whole business card image, only this region's foreground is of a different color. If a region satisfies any one of these three conditions, it is an icon region rather than a text region.
Condition two may be that the region contains a character whose width or height is greater than 2.5 times the average character width or height.
The character attributes may include character height, character width and the horizontal spacing between characters.
In the above technical solution, the invention first performs text-line segmentation on the acquired business card image to obtain line-by-line character regions or connected regions. In general, the key text information that interests the user first, such as the name, the company icon and the company name, lies within the three regions with the largest average character size. Since the job position or title is usually located below or to the lower right of the name, whether the semantics of the regions adjacent to these three regions correspond to a job position or title can be used to jointly determine the region in which the key text information, the name, is finally located: when an adjacent region is a position or title region, the region above it or to its upper left is the region containing the name. Accurate extraction of the key text information is thus achieved. Moreover, because the invention locates and determines the key text information automatically, it eliminates the frequent user operations required in the prior art, so it is more convenient to use and character recognition is faster. In addition, the invention uses statistical features together with keyword lookup to find the region containing the key text information, which helps ensure that the key text information is retrieved accurately. Compared with the prior art, the invention is characterized by a simple method, convenient use, fast character recognition, high efficiency and high accuracy.
Embodiment
The present invention is described in further detail below with reference to a specific embodiment.
In daily life, the overwhelming majority of business cards follow the same layout: the first line is the company name; the second line is the name, centered; the third line, at the lower right, is the job title; and the fourth and subsequent lines give details such as address, telephone, mobile phone and e-mail. The character attributes of key text information such as the company name and the name, for example font, font size and character spacing, are generally far larger than those of the other text. In view of this, the technical solution of the invention is proposed in order to speed up business card recognition, and in particular to obtain the key text information, the name, quickly.
The invention provides a method for automatically searching for and determining key text information in business card recognition, wherein the key text information is a name. The steps are as follows:
Step 1: perform connected-component-based layout analysis and text segmentation on the input business card image, and compute the character attributes and the total number of special connected components;
Here, the character attributes include character height, character width, the horizontal spacing between characters, and the like.
The total number of special connected components is the number of connected components that overlap little in horizontal projection. For example, the character "j" counts as only one connected component, whereas "Rj" counts as two.
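The counting rule for "j" and "Rj" can be illustrated with a minimal Python sketch. It assumes that each raw connected component has already been reduced to its horizontal extent, and that two components are merged when their horizontal projections overlap by more than a threshold. The function name `count_special_components` and the `max_overlap` value of 0.5 are illustrative assumptions; the source does not specify a threshold.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (left, right) extent of one raw connected component on the x-axis


def count_special_components(spans: List[Span], max_overlap: float = 0.5) -> int:
    """Count components whose horizontal projections overlap only slightly.

    Components that overlap heavily (e.g. the dot and the stem of "j")
    are merged and counted once. The 0.5 threshold is an assumption.
    """
    groups: List[List[int]] = []  # merged [left, right] extents
    for left, right in sorted(spans):
        if groups:
            g_left, g_right = groups[-1]
            overlap = min(right, g_right) - max(left, g_left)
            narrower = max(1, min(right - left, g_right - g_left))
            if overlap / narrower > max_overlap:
                # Heavy overlap: same character, extend the current group.
                groups[-1][1] = max(g_right, right)
                continue
        groups.append([left, right])
    return len(groups)


# "j": stem spans x=0..4, dot spans x=2..4, so they merge into one.
j_spans = [(0, 4), (2, 4)]
# "Rj": the "R" spans x=0..10, followed by the two parts of "j".
rj_spans = [(0, 10), (11, 14), (12, 14)]
print(count_special_components(j_spans), count_special_components(rj_spans))
```

Merging on the horizontal projection keeps multi-part characters from inflating the count, which matters later when a region with a count of at most 1 is treated as a likely icon.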
Step 2: select the text regions containing the key text information according to the character attributes and the total number of special connected components. Specifically:
First, taking the average character size (character height, character width) within each region as the index, all regions on the business card are sorted by this index from high to low, and the top three regions are selected. Statistically, the top three regions usually contain important information such as the company name, the company icon and the name.
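The ranking step can be sketched as follows. This is only an illustration: the source names the index as the average of character height and width within a region but does not give a formula, so combining the two as a simple mean is an assumption, as are the names `TextRegion`, `size_index` and `top_regions`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRegion:
    label: str              # hypothetical identifier for the region
    avg_char_height: float  # mean character height in the region (pixels)
    avg_char_width: float   # mean character width in the region (pixels)


def size_index(region: TextRegion) -> float:
    # Assumed index: mean of the average character height and width.
    return (region.avg_char_height + region.avg_char_width) / 2.0


def top_regions(regions: List[TextRegion], k: int = 3) -> List[TextRegion]:
    """Sort regions by the size index, largest first, and keep the first k."""
    return sorted(regions, key=size_index, reverse=True)[:k]


card = [
    TextRegion("company", 30, 28),
    TextRegion("logo", 60, 60),
    TextRegion("name", 36, 34),
    TextRegion("title", 14, 12),
    TextRegion("address", 12, 10),
]
print([r.label for r in top_regions(card)])
```

Note that the logo ranks first in this made-up card, which is why the icon has to be eliminated from the candidates in a separate pass.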
Next, within these three regions, the number of connected components obtained in step 1, the character width and height features and the color information are used to eliminate the icon (an icon is often among the top three candidates). The decision criterion is as follows:
If the region satisfies any one of the following rules:
Condition one: the number of special connected components in the region is less than or equal to 1;
Condition two: the region contains a character whose width or height is far greater than the average character width or height, typically greater than 2.5 times the average character width or height;
Condition three: among the foreground objects of all regions segmented from the whole business card image, only this region's foreground is of a different color;
then the region is judged to be an icon rather than a name, and it is excluded from the subsequent screening.
In this way, the text regions containing the key text information can be obtained accurately. In practice, generally only 2 to 3 regions remain after this screening.
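The three rejection conditions can be written as a small predicate. The sketch below makes several assumptions that the source leaves open: the averages `avg_w` and `avg_h` are supplied by the caller, the foreground color is reduced to a single label per region, and condition three is read as "every other region shares one color and this region's color differs from it". The names `Candidate` and `is_icon` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    special_components: int                 # count of special connected components
    char_sizes: List[Tuple[float, float]]   # (width, height) of each character
    foreground: str                         # simplified foreground color label


def is_icon(region: Candidate, all_regions: List[Candidate],
            avg_w: float, avg_h: float, ratio: float = 2.5) -> bool:
    """Return True if the region satisfies any of the three icon conditions."""
    # Condition one: at most one special connected component.
    cond1 = region.special_components <= 1
    # Condition two: some character exceeds 2.5x the average width or height.
    cond2 = any(w > ratio * avg_w or h > ratio * avg_h
                for w, h in region.char_sizes)
    # Condition three: only this region's foreground color is different.
    others = [r.foreground for r in all_regions if r is not region]
    cond3 = bool(others) and len(set(others)) == 1 and region.foreground not in others
    return cond1 or cond2 or cond3


logo = Candidate(1, [(40, 40)], "red")
name = Candidate(3, [(20, 22)] * 3, "black")
company = Candidate(8, [(18, 20)] * 8, "black")
regions = [logo, name, company]
kept = [r for r in regions if not is_icon(r, regions, avg_w=20, avg_h=22)]
print(len(kept))
```

Because the conditions are combined with a logical OR, a single strong cue (for example a lone oversized glyph) is enough to drop a region from the candidate set.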
Step 3: search for the regions adjacent to the text regions containing the key text information, located below them or to their lower right, and perform character recognition on those adjacent regions;
Step 4: search the character strings obtained by character recognition for keywords denoting a job position and obtain their text content, and thereby jointly determine the region in which the key text information, the name, is finally located, according to whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a job position or title. That is, when an adjacent region is a position or title region, the region above it or to its upper left is the region in which the name is finally located.
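One way to find the region below or to the lower right of a candidate is a simple bounding-box test. The source does not define the geometry, so the rule used here (the neighbor starts at or below the candidate's bottom edge, its horizontal center is not left of the candidate's left edge, and the nearest such region wins) is an assumption, as are the names `Box` and `neighbor_below`. Image coordinates are assumed, with y increasing downward.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    label: str    # hypothetical identifier for the region
    left: float
    top: float
    right: float
    bottom: float


def neighbor_below(candidate: Box, regions: List[Box]) -> Optional[Box]:
    """Return the nearest region below or to the lower right of the candidate."""
    below = [
        r for r in regions
        if r is not candidate
        and r.top >= candidate.bottom                      # starts below the candidate
        and (r.left + r.right) / 2 >= candidate.left       # not off to the lower left
    ]
    # Choose the region with the smallest vertical gap, or None if there is none.
    return min(below, key=lambda r: r.top - candidate.bottom, default=None)


company = Box("company", 0, 0, 300, 30)
name = Box("name", 100, 50, 200, 90)
title = Box("title", 180, 100, 290, 115)
address = Box("address", 0, 130, 300, 145)
layout = [company, name, title, address]
print(neighbor_below(name, layout).label)
```

Only the neighbors found this way need to be passed to character recognition, which is where the method saves time compared with recognizing the whole card.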
Here, this embodiment uses a small database or data store. Position-related words held in the small database, such as "Manager", "Sales", "Engineer", "Director", "Doctor", "manager", "chairman of the board", "factory director" and similar job titles, are used as keywords, and the recognized character strings are searched for matching words; if a match is found, the corresponding text content is displayed or stored. In rare cases no matching word is found. Since the position of the key text information region then cannot be determined, keyword retrieval has to be carried out in all of the remaining regions (generally only 1 to 2 regions remain at this point) in order to find the text content of the key text information.
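The keyword lookup can be sketched as a case-insensitive substring search over the recognized text of each candidate's neighbor. The keyword tuple below stands in for the small database and only reuses words listed in the text; the function names and the dictionary-based input are illustrative assumptions. Returning `None` signals the rare case in which the caller must fall back to searching the remaining regions.

```python
from typing import Dict, Optional

# Stand-in for the small keyword database described in the text.
TITLE_KEYWORDS = (
    "manager", "sales", "engineer", "director", "doctor",
    "chairman of the board", "factory director",
)


def has_title_keyword(text: str) -> bool:
    """Case-insensitive check for any position keyword in the recognized text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in TITLE_KEYWORDS)


def locate_name(neighbor_text: Dict[str, str]) -> Optional[str]:
    """Map candidate region id -> recognized text of its lower neighbor.

    Returns the id of the first candidate whose neighbor reads like a job
    title, or None when no neighbor matches (fallback case).
    """
    for region_id, text in neighbor_text.items():
        if has_title_keyword(text):
            return region_id
    return None


recognized = {"region_a": "ACME Trading Co.", "region_b": "Sales Manager"}
print(locate_name(recognized))
```

Substring matching keeps the sketch short; a real keyword store would likely need to handle recognition errors and multiple languages, which the source does not describe.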

Claims (7)

1. A method for automatically searching for and determining key text information in business card recognition, wherein the key text information is a name, comprising the following steps:
Step 1: perform connected-component-based layout analysis and text segmentation on the input business card image, and compute the character attributes and the total number of special connected components;
Step 2: select the text regions containing the key text information according to the character attributes and the total number of special connected components;
Step 3: search for the regions adjacent to the text regions containing the key text information, and perform character recognition on those adjacent regions;
Step 4: search the character strings obtained by character recognition for keywords denoting a job position and obtain their text content, and thereby jointly determine the region in which the key text information is finally located, according to whether the semantics of the regions adjacent to the text regions containing the key text information correspond to a job position.
2. The method for automatically searching for and determining key text information in business card recognition according to claim 1, characterized in that the total number of special connected components is the total number of connected components that overlap little in horizontal projection.
3. The method for automatically searching for and determining key text information in business card recognition according to claim 1 or 2, characterized in that step 2 selects, according to the character attributes, the regions ranked in the first few places, thereby obtaining the text regions containing the key text information.
4. The method for automatically searching for and determining key text information in business card recognition according to claim 3, characterized in that step 2 further rejects icon regions from the obtained regions containing key text information, according to the number of said connected components, the character attributes and color information.
5. The method for automatically searching for and determining key text information in business card recognition according to claim 4, characterized in that the criteria for rejecting an icon region are: condition one, the number of special connected components in the region is less than or equal to 1; condition two, the region contains a character whose width or height is far greater than the average character width or height; condition three, among the foreground objects of all regions segmented from the whole business card image, only this region's foreground is of a different color; if a region satisfies any one of these three conditions, it is an icon region rather than a text region.
6. The method for automatically searching for and determining key text information in business card recognition according to claim 5, characterized in that condition two is that the region contains a character whose width or height is greater than 2.5 times the average character width or height.
7. The method for automatically searching for and determining key text information in business card recognition according to claim 6, characterized in that the character attributes include character height, character width and the horizontal spacing between characters.
CNB2004101034834A 2004-12-30 2004-12-30 Automatic searching and determining method for key words information in name card identification Expired - Fee Related CN1328695C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004101034834A CN1328695C (en) 2004-12-30 2004-12-30 Automatic searching and determining method for key words information in name card identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004101034834A CN1328695C (en) 2004-12-30 2004-12-30 Automatic searching and determining method for key words information in name card identification

Publications (2)

Publication Number Publication Date
CN1632821A CN1632821A (en) 2005-06-29
CN1328695C true CN1328695C (en) 2007-07-25

Family

ID=34848182

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004101034834A Expired - Fee Related CN1328695C (en) 2004-12-30 2004-12-30 Automatic searching and determining method for key words information in name card identification

Country Status (1)

Country Link
CN (1) CN1328695C (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246475B (en) 2007-02-14 2010-05-19 北京书生国际信息技术有限公司 A Retrieval Method Based on Layout Information
CN102194118B (en) * 2010-03-02 2013-04-10 方正国际软件(北京)有限公司 Method and device for extracting information from image
CN103209241A (en) * 2012-01-11 2013-07-17 联想(北京)有限公司 Information sending method and electronic device
CN103093217A (en) * 2013-01-06 2013-05-08 北京百度网讯科技有限公司 Interactive image and character recognition method and device
CN106056114B (en) * 2016-05-24 2019-07-05 腾讯科技(深圳)有限公司 Contents of visiting cards recognition methods and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001052112A (en) * 1999-08-11 2001-02-23 Fujitsu Ltd Recognition processing method, information processing device and recording medium
CN1339775A (en) * 2000-08-22 2002-03-13 英业达集团(上海)电子技术有限公司 Automatic identifying method and system for name card

Also Published As

Publication number Publication date
CN1632821A (en) 2005-06-29

Similar Documents

Publication Publication Date Title
CN109657738A (en) Character identifying method, device, equipment and storage medium
CN101211370B (en) Content register device, content register method
JP4454789B2 (en) Form classification method and apparatus
CN105320778B (en) A method for product labeling of Chinese e-commerce websites
EP1265189A1 (en) Pattern extraction apparatus and method
CN101645086B (en) Retrieval method
CN101911069A (en) Method and system for discovery and modification of data clusters and synonyms
CN101923643A (en) General form recognizing method
CN109101561B (en) Wine label identification method
CN104915664A (en) Contact object identification acquisition method and device
CN1328695C (en) Automatic searching and determining method for key words information in name card identification
CN106326454A (en) Image identification method
CN107302757A (en) The method of calling and device of emergency numbers
CN101087480A (en) An intelligent legend dialing method of mobile terminal
CN115203474A (en) Automatic database classification and extraction technology
CN108921016B (en) Book score obtaining method based on image recognition, electronic equipment and storage medium
CN103020651B (en) Method for detecting sensitive information of microblog pictures
CN114220112A (en) A method and system for job relationship extraction for character business cards
CN111488327B (en) Data standard management method and system
KR101692244B1 (en) Method for spam classfication, recording medium and device for performing the method
CN1564569A (en) Serching method of telephone number and its serching engine
CN105872232A (en) Number on-line inquiry method and number on-line inquiry apparatus
CN103475764A (en) Method and device for inputting contacts, and method and device for classifying contacts
CN107562944A (en) A kind of scan image and the approaches to IM for extracting image
CN107609104A (en) The method and system of associated video is searched according to video image material

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070725

Termination date: 20111230