
CN113821666B - Image and text recognition retrieval method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN113821666B
Authority
CN
China
Prior art keywords
text
image
score
database
grading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110126962.1A
Other languages
Chinese (zh)
Other versions
CN113821666A (en)
Inventor
余伟伟
闫创
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110126962.1A
Publication of CN113821666A
Application granted
Publication of CN113821666B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present disclosure provides a method and device for image and text recognition and retrieval, an electronic device, and a computer-readable storage medium, and relates to the field of image and text recognition technology. The image and text recognition and retrieval method comprises: receiving an image and storing it in a database, and identifying multiple texts in the image; obtaining multiple corresponding scoring values according to the text information of each of the texts, and obtaining a comprehensive score corresponding to each of the texts based on the multiple scoring values; storing each of the texts in the database based on the comprehensive score, so as to search the corresponding image in the database based on the scored text. After recognizing the text in the image, the present disclosure obtains multiple scoring values by combining the recognized text information, and obtains a comprehensive score based on the multiple scoring values, and stores the above-mentioned text recognition results based on the comprehensive score, so as to achieve the effect of optimizing the retrieval results.

Description

Image-text recognition retrieval method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image-text recognition technology, and in particular, to an image-text recognition and retrieval method, an image-text recognition and retrieval device, an electronic device, and a computer readable storage medium.
Background
Retrieval refers to the process of finding the information a user requires from information collections such as literature and network information. With the advent of the information age, the carriers of information have diversified; information is now carried in the form of images and the like. Because images can convey information more vividly and intuitively, they are widely used in many fields; in the advertising field, for example, images can deliver stronger visual impact and better publicity effect.
In order to better adapt to the progress of informatization, research on image-text recognition and retrieval technology is also becoming more important. In the related art, image-text recognition retrieval focuses on providing an image-text retrieval system or extending retrieval functions, but does not consider problems such as cleaning the recognized text result set, or how to use the diversified text information in an image to optimize the retrieval effect.
Therefore, in order to solve the above-mentioned problems, it is necessary to provide an image-text recognition and retrieval method that performs data cleaning of the recognition results during text recognition by integrating multiple aspects of information about the text in the image, so as to optimize the effect of retrieving images according to text.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of an embodiment of the present disclosure is to provide an image-text recognition and retrieval method, an image-text recognition and retrieval device, an electronic apparatus, and a computer readable storage medium, so as to implement data cleaning of recognition results by integrating multiple aspects of information about the text in an image during text recognition, thereby optimizing the effect of retrieving an image according to text.
According to a first aspect of the present disclosure, there is provided an image-text recognition and retrieval method, including:
receiving an image and storing the image into a database, and recognizing a plurality of texts in the image;
obtaining a plurality of corresponding scoring values according to the text information of each text, and obtaining a comprehensive score corresponding to each text based on the scoring values;
and storing each text into the database based on the comprehensive score, so as to search for the corresponding image in the database based on the scored texts.
In an exemplary embodiment of the disclosure, the recognizing the plurality of texts in the image includes:
extracting the plurality of texts in the image based on an optical character recognition algorithm, and obtaining the text information of each text.
In an exemplary embodiment of the disclosure, the obtaining a plurality of corresponding scoring values according to the text information of each text includes:
obtaining, based on the text information, an algorithm score of the algorithm that recognized the corresponding text in the image, a semantic integrity score of each text, and a saliency score of each text in the image;
wherein the semantic integrity score is determined by the proportion of the number of words obtained after word segmentation of each text to the total number of words, and the saliency score is determined by the proportion of the area of each text to the total area of all texts in the image.
In an exemplary embodiment of the disclosure, the obtaining, based on the plurality of scoring values, a comprehensive score corresponding to each text includes:
performing a weighted operation on the algorithm score, the semantic integrity score and the saliency score to obtain the comprehensive score corresponding to each text.
In an exemplary embodiment of the disclosure, the storing each text in the database based on the comprehensive score includes:
sorting the texts based on the comprehensive scores, classifying the texts into a plurality of grades according to the sorting result, filtering the texts according to the grades, and storing the filtered texts into the database according to a preset proportion.
In an exemplary embodiment of the disclosure, the searching for the corresponding image in the database based on the scored text includes:
receiving a search keyword, matching the scored texts based on the search keyword to obtain a plurality of secondary scoring results, and obtaining a secondary comprehensive score based on the plurality of secondary scoring results;
and sorting the secondary comprehensive scores, and returning a result set based on the sorting result, wherein the result set comprises a plurality of images.
In an exemplary embodiment of the disclosure, the searching for the corresponding image in the database based on the scored text includes:
receiving a search keyword, matching the search keyword respectively against the texts of different grades in the database to obtain a plurality of secondary scoring results, and obtaining a secondary comprehensive score based on the secondary scoring results;
and sorting the secondary comprehensive scores, and returning a result set based on the sorting result, wherein the result set comprises a plurality of images.
According to a second aspect of the present disclosure, there is provided an image-text recognition and retrieval device, including:
a text recognition module, configured to receive an image and store the image into a database, and to recognize a plurality of texts in the image;
a data cleaning module, configured to obtain a plurality of corresponding scoring values according to the text information of each text, and to obtain a comprehensive score corresponding to each text based on the scoring values;
and a recognition and retrieval module, configured to store each text into the database based on the comprehensive score, so as to search for the corresponding image in the database based on the scored text.
According to a third aspect of the present disclosure, there is provided an electronic device comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
The image-text recognition retrieval method provided by the example embodiment of the disclosure receives an image and stores it in a database, recognizes a plurality of texts in the image, obtains a plurality of corresponding scoring values according to the text information of each text, obtains a comprehensive score corresponding to each text based on the scoring values, and stores each text in the database based on the comprehensive score, so that the corresponding image can be searched in the database based on the scored texts. On the one hand, after recognizing a plurality of texts in an image, the method obtains a comprehensive score for each text from the text information of that text, making full use of every aspect of the text information, so that the text recognition results can be effectively cleaned. On the other hand, after being effectively cleaned, the recognition results are stored in the database, so that images can be retrieved from the database through the cleaned texts, which improves retrieval accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the methods and apparatus for image-text recognition retrieval of embodiments of the present disclosure may be applied;
FIG. 2 illustrates a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of an image-text recognition retrieval method according to one embodiment of the disclosure;
FIG. 4 schematically illustrates a system architecture diagram of an application scenario of the image-text recognition retrieval method according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a flow diagram of data cleansing for one particular application scenario according to the present disclosure;
FIG. 6 schematically illustrates a flow diagram of picture retrieval for one particular application scenario according to the present disclosure;
FIG. 7 schematically illustrates a block diagram of an image-text recognition retrieval device according to an embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram of a system architecture of an exemplary application environment to which a method and apparatus for image-text recognition retrieval according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be electronic devices with photographing or image transmission functions, including but not limited to desktop computers, portable computers, smart phones, tablet computers, etc. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The image-text recognition retrieval method provided by the embodiment of the disclosure can be executed by the terminal equipment 101, 102 and 103, and correspondingly, the image-text recognition retrieval device can also be arranged in the terminal equipment 101, 102 and 103. The image-text recognition retrieval method provided by the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, 103 and the server 105 together, and accordingly, the image-text recognition retrieval device may be disposed in the terminal devices 101, 102, 103 and the server 105. In addition, the image-text recognition search method provided in the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the image-text recognition search device may be disposed in the server 105, which is not particularly limited in the present exemplary embodiment.
For example, in the present exemplary embodiment, the above-described image-text recognition retrieval method may be performed by the terminal devices 101, 102, 103 in conjunction with the server 105. First, an image may be photographed or received by a terminal device, which then transmits the image to the server. The server stores the image in a database, invokes an image-text recognition algorithm to recognize a plurality of texts in the image, obtains a plurality of corresponding scoring values according to the text information of each text, obtains a comprehensive score based on the scoring values, and finally stores the plurality of texts into the database on the basis of the comprehensive score, so that the corresponding image can be searched in the database based on the scored texts.
Fig. 2 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data required for the system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
Connected to the I/O interface 205 are an input section 206 including a keyboard, a mouse, and the like, an output section 207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like, a storage section 208 including a hard disk, and the like, and a communication section 209 including a network interface card such as a LAN card, a modem, and the like. The communication section 209 performs communication processing via a network such as the internet. The drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 210 as needed, so that a computer program read out therefrom is installed into the storage section 208 as needed.
With the advent of the information age, the carrier of information has been diversified, and for example, images have been widely used in many fields because they can more vividly and intuitively convey information. In order to better adapt to the progress of informatization, the research on the image-text recognition and retrieval technology is also becoming more important.
The image-text recognition and retrieval technology mainly involves two aspects: data cleaning during text recognition, and searching images by text. In the related art, the technical means for data cleaning mainly comprise two approaches: optimizing the recognition algorithm to improve recognition accuracy, and scoring the recognition results with the algorithm and filtering them through a set score threshold. This achieves a degree of cleaning, but it ignores information such as the location and size of the text in the image.
In addition, for searching images by text, the sentence to be searched is matched against the data set in the database mainly through the database's full-text index function, and a result set is returned. Although this method can realize retrieval, the returned result set depends only on the database's full-text matching result; problems such as how to use the diversified text information in the image to optimize the retrieval effect are not considered, and accuracy is low when searching with short words.
In order to solve the problems in the above method, the present exemplary embodiment proposes a technical solution, which can implement data cleaning on the recognition result by integrating multiple aspects of information of the text in the image during text recognition, so as to optimize the effect of retrieving the image according to the text. The following describes the technical scheme of the embodiments of the present disclosure in detail:
The present exemplary embodiment first provides an image-text recognition and retrieval method. Referring to fig. 3, the method specifically includes the following steps:
step S310, receiving an image, storing the image in a database, and identifying a plurality of characters in the image;
Step S320, obtaining a plurality of corresponding grading values according to the text information of each text, and obtaining a comprehensive grading corresponding to each text based on the plurality of grading values;
and step S330, storing each text into the database based on the comprehensive scores so as to search the corresponding images in the database based on the scored text.
In the image-text recognition and retrieval method provided by this exemplary embodiment of the present disclosure, on the one hand, after a plurality of texts in an image are recognized, a comprehensive score corresponding to each text is further obtained from the text information of that text, making full use of every aspect of the text information, so that the text recognition results can be effectively cleaned. On the other hand, after being effectively cleaned, the recognition results are stored in a database, so that images can be retrieved from the database through the cleaned texts, which improves retrieval accuracy.
The above steps are described in more detail below.
In step S310, the image is received and stored in a database, and a plurality of characters in the image are recognized.
The image-text recognition retrieval method provided by the present exemplary embodiment is used for providing a function of recognizing characters in an image and retrieving the image through keywords. For example, the image-text recognition retrieval method can be executed together by the terminal device and the server. Specifically, a database may be established in the server for storing text, images, providing retrieval functions, and the like. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
In the present exemplary embodiment, the image is an arbitrary image including text content, for example, may be an advertisement image including a plurality of advertisements, or may be another type of image including text, and the present exemplary embodiment is not limited thereto. In addition, the above-mentioned image may be captured by the terminal device and transmitted to the server, or may be an image externally input to the terminal device and transmitted from the terminal device to the server, or directly transmitted from the outside to the server, which is not particularly limited in this exemplary embodiment.
In this exemplary embodiment, after the image is received, it is stored in a database, and the plurality of texts contained in the image are recognized by an image-text recognition algorithm. The image-text recognition algorithm analyzes and processes image files containing text data to obtain the text and layout information in the image. For example, the image-text recognition algorithm may be an OCR (Optical Character Recognition) algorithm. The texts in the image can be extracted through the OCR algorithm to facilitate further analysis and thereby exploit the value of the texts, for example, searching for an image according to its text content, or classifying images according to text content. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
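The disclosure does not name a specific OCR engine, but the per-text content, algorithm score, and area that the later scoring steps consume could be assembled from word-level OCR output roughly as follows. This is a sketch assuming pytesseract-style `image_to_data` output as a dict of parallel lists; the field names and the 0–100 confidence scale follow that assumption.

```python
from collections import defaultdict

def group_ocr_words_into_lines(ocr):
    """Group word-level OCR records into per-line text records carrying
    the text content, mean algorithm confidence (rescaled to 0..1), and
    bounding-box area used by the later scoring steps."""
    lines = defaultdict(lambda: {"words": [], "confs": [], "boxes": []})
    for i, word in enumerate(ocr["text"]):
        if not word.strip():
            continue  # skip empty cells emitted for structural rows
        key = (ocr["block_num"][i], ocr["line_num"][i])
        lines[key]["words"].append(word)
        lines[key]["confs"].append(float(ocr["conf"][i]))
        lines[key]["boxes"].append(
            (ocr["left"][i], ocr["top"][i], ocr["width"][i], ocr["height"][i])
        )
    result = []
    for key in sorted(lines):
        rec = lines[key]
        x1 = min(b[0] for b in rec["boxes"])
        y1 = min(b[1] for b in rec["boxes"])
        x2 = max(b[0] + b[2] for b in rec["boxes"])
        y2 = max(b[1] + b[3] for b in rec["boxes"])
        result.append({
            "text": " ".join(rec["words"]),
            # mean word confidence, rescaled from 0..100 to 0..1
            "algorithm_score": sum(rec["confs"]) / len(rec["confs"]) / 100.0,
            "area": (x2 - x1) * (y2 - y1),  # area of the line's bounding box
        })
    return result
```

Each resulting record then holds exactly the text information (content, algorithm score, area) that the scoring steps below operate on.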
In step S320, a plurality of corresponding score values are obtained according to the text information of each text, and a comprehensive score corresponding to each text is obtained based on the plurality of score values.
In this exemplary embodiment, after the plurality of characters in the image are identified by the image-text recognition algorithm, the composite score corresponding to each character may also be obtained according to the character information of each character. For example, the text information may include text content, text position information in an image, text area, and the like. It should be noted that the above scenario is only an exemplary illustration, and the present exemplary embodiment is not limited thereto. For example, the text information may include more or less information according to actual needs, which falls within the protection scope of the present exemplary embodiment.
In the present exemplary embodiment, the scoring values may be obtained based on the text information. For example, the scoring values may include an image-text recognition algorithm score, a semantic integrity score, and a saliency score; the process of obtaining the plurality of scoring values based on the text information may be implemented by obtaining, based on the text information, the algorithm score of the image-text recognition algorithm, the semantic integrity score of each text, and the saliency score of each text in the image. The semantic integrity score is determined by the proportion of the number of words obtained after word segmentation of each text to the total number of words, and the saliency score is determined by the proportion of the area of each text to the total area of all texts in the image.
Taking the image to be an advertisement picture comprising a plurality of advertisement words, and denoting any advertisement word by $C_i$, the semantic integrity score $S_{sem}^{i}$ can be calculated by the following formula:

$$S_{sem}^{i} = \frac{w_i}{W}$$

where $w_i$ is the number of words obtained after word segmentation of the line of text, and $W$ is the total number of words in the advertisement picture. The ratio $w_i/W$ represents the proportion of the total word count contributed by the line after word segmentation: the higher the value, the higher the semantic integrity of the sentence, and the easier the corresponding sentence is to understand.
In addition, the above-mentioned saliency score $S_{sal}^{i}$ can be calculated by the following formula:

$$S_{sal}^{i} = \frac{a_i}{A}$$

where $a_i$ is the area of the region occupied by the line of text, and $A$ is the total area occupied by all text regions in the whole picture. The ratio $a_i/A$ represents the proportion of the total text area occupied by the line, i.e. how salient the line of text is in the picture: the greater this value, the more noticeable the line is in the picture, and the higher its importance.
It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
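Both scores defined above are simple ratios, so a minimal sketch suffices; function and variable names here are illustrative, not from the disclosure.

```python
def semantic_integrity_score(line_word_count, total_word_count):
    # S_sem^i = w_i / W: share of the picture's segmented words in this line
    return line_word_count / total_word_count

def saliency_score(line_area, total_text_area):
    # S_sal^i = a_i / A: share of the total text area covered by this line
    return line_area / total_text_area

# Example: three lines of text with segmented word counts [4, 2, 2]
# and text-region areas [600, 300, 100]
word_counts, areas = [4, 2, 2], [600, 300, 100]
sem = [semantic_integrity_score(w, sum(word_counts)) for w in word_counts]
sal = [saliency_score(a, sum(areas)) for a in areas]
```

Note that each family of scores sums to 1 over all lines in the picture, so both behave as normalized importance distributions.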
In the present exemplary embodiment, after the plurality of score values of each text are obtained based on the recognized text information, a composite score corresponding to each text may be obtained based on the plurality of score values. The process can be realized by carrying out weighted operation on the algorithm score, the semantic integrity score and the saliency score to obtain the comprehensive score corresponding to each text.
Taking the advertisement picture as an example, the composite score can be obtained by the following formula:

$$S^{i} = \alpha S_{alg}^{i} + \beta S_{sem}^{i} + \lambda S_{sal}^{i}$$

where $S^{i}$ is the weighted composite score, $S_{alg}^{i}$ is the image-text recognition algorithm score, $S_{sem}^{i}$ is the semantic integrity score, and $S_{sal}^{i}$ is the saliency score; $\alpha$, $\beta$, and $\lambda$ are the respective weights of the three scores. The weights can be determined according to actual conditions and empirical values; a preferred combination is $\alpha = 0.3$, $\beta = 0.1$, and $\lambda = 0.6$. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
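The weighted combination can be sketched as follows, using the preferred weight combination from the text (α = 0.3, β = 0.1, λ = 0.6) as defaults; the function name is illustrative.

```python
def composite_score(algorithm_score, semantic_score, saliency_score,
                    alpha=0.3, beta=0.1, lam=0.6):
    # S^i = alpha * S_alg^i + beta * S_sem^i + lambda * S_sal^i
    return (alpha * algorithm_score
            + beta * semantic_score
            + lam * saliency_score)
```

For instance, a line recognized with confidence 0.85, semantic integrity 0.5, and saliency 0.6 would score 0.3·0.85 + 0.1·0.5 + 0.6·0.6 = 0.665. Because the weights sum to 1, the composite score stays in the same 0..1 range as its inputs.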
In step S330, each text is stored in the database based on the composite score, so that the corresponding image is searched in the database based on the scored text.
In this exemplary embodiment, in order to achieve more efficient data cleansing, after the composite score is obtained in step S320, the identified plurality of words may be correspondingly stored in the database based on the composite score. For example, the process may be implemented by sorting the words based on the composite score, classifying the words into a plurality of levels according to the sorting result, filtering the words according to the plurality of levels, and storing the filtered words in a database according to a predetermined ratio. In addition, after the characters are stored in the database, an index relationship between the characters and the images can be established, so that the corresponding images can be searched in the database based on the scored characters.
Specifically, taking the advertisement picture as an example, the above process may be as follows: the weighted composite score S of every advertisement word in the advertisement picture is calculated, and the identified advertisement words are then sorted by composite score from large to small and divided into three categories: critical (very important), normal (generally important) and dirty (filterable). The classification rule may be: after sorting by the weighted composite score S, all the advertisement words are first divided into two parts according to a certain proportion, and the words in the former part are assigned to the critical category; the proportion can be determined according to actual requirements and empirical values, and may preferably be 1:1. The words in the latter part are then divided according to a preset fixed weighted score S': words with a score of at least S' are assigned to the normal category, and words with a score smaller than S' are assigned to the dirty category. The value of S' can be determined according to actual demands and empirical values; preferably, the preset fixed weighted score may be 0.15. Finally, the advertisement words in the dirty category are filtered out, and the critical and normal categories are stored in the database by category. In addition, an index relationship between the advertisement words and the image can be established in the database. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
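The three-tier classification described above can be sketched as follows (function and variable names are illustrative; the 1:1 split and the S' = 0.15 threshold are the preferred values mentioned in the text, and the tie-handling is an assumption):

```python
def classify_words(words_with_scores, split_ratio=0.5, s_prime=0.15):
    """Split (word, composite_score) pairs into critical/normal/dirty tiers.

    The top split_ratio fraction by score becomes critical; of the rest,
    words scoring at least s_prime are normal, and the remainder dirty.
    """
    ranked = sorted(words_with_scores, key=lambda ws: ws[1], reverse=True)
    cut = int(len(ranked) * split_ratio)
    critical = ranked[:cut]
    normal = [ws for ws in ranked[cut:] if ws[1] >= s_prime]
    dirty = [ws for ws in ranked[cut:] if ws[1] < s_prime]
    return critical, normal, dirty
```

In the flow above, the dirty tier would then be discarded, while the critical and normal tiers are written to the database together with a word-to-image index.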
In this exemplary embodiment, the process of searching the database for a corresponding image based on the scored text may be implemented by receiving a search keyword, matching the scored text against the search keyword to obtain a plurality of secondary scoring results, obtaining a secondary composite score from those results, sorting by the secondary composite score, and returning a result set based on the sorted order, where the result set includes a plurality of pictures. Specifically, the plurality of secondary scoring results may be obtained by matching the search keyword against the text of different grades in the database.
Taking the advertisement picture as an example, the process can be implemented by performing a secondary scoring that combines the classified advertisement words of the critical and normal categories (the dirty category has already been filtered out during data cleaning) with the scores given by full-text retrieval of the database, obtaining a plurality of secondary scoring results, sorting those results, and returning a result set according to the sorted order. Specifically, assume that for a search keyword K, when the database performs full-text retrieval, the secondary scoring result of retrieving an image from the critical category is S_c and the secondary scoring result of retrieving the same image from the normal category is S_n. The secondary comprehensive score can then be calculated by the following formula:

S = α·S_c + β·S_n

where α + β = 1, and the values of α and β can be determined according to practical situations and experience. For example, the value of α may be 0.71 and the value of β may be 0.29. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
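A minimal sketch of the secondary scoring and re-ranking step (names are illustrative assumptions; α = 0.71 and β = 0.29 are the example values from the text):

```python
def secondary_score(s_critical, s_normal, alpha=0.71, beta=0.29):
    """Secondary comprehensive score S = alpha*S_c + beta*S_n, alpha + beta = 1."""
    return alpha * s_critical + beta * s_normal

def rank_images(results):
    """results: list of (image_id, S_c, S_n) triples from full-text retrieval.

    Returns image ids sorted by secondary comprehensive score, descending.
    """
    scored = [(img, secondary_score(sc, sn)) for img, sc, sn in results]
    return [img for img, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

An image matched strongly only in the normal tier can thus still be outranked by one matched in the critical tier, since the critical weight dominates.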
In the searching process, after the secondary comprehensive scores of all the results in the result set are calculated, the results are reordered according to those scores. Because the semantic integrity of the text in the image and the saliency of the text in the image are both taken into account, the results obtained by this searching method are more accurate.
In the following, taking a specific application scenario of image-text identification search of advertisement pictures as an example, the image-text identification search method is fully described with reference to fig. 4 to 6. Fig. 4 is a schematic diagram of an advertisement picture text recognition and retrieval system. As shown in fig. 4, the system architecture of the recognition retrieval system includes an external interaction layer 410, a recognition service module 420, a data cleansing module 430, a database 440, a result set optimization module 450, and a retrieval service module 460. Wherein:
The external interaction layer is used for submitting the picture to the system in a direct or indirect mode, receiving the result set retrieved by the system and sending the result set to the caller.
The recognition service module is used for recognizing the text (a plurality of advertisement words) in the picture obtained through the external interaction layer. Specifically, the advertisement words in the picture can be recognized through an optical character recognition (OCR) algorithm.
The data cleaning module is used for cleaning the data of the identified advertisement words. Specifically, the data cleaning process may be implemented by executing the flow shown in fig. 5. As shown in fig. 5, the process includes the following steps:
Step S510, obtaining the scores recognized by the OCR algorithm.
In this step, the score S_ocr given by the OCR algorithm for any one of the recognized advertisement words C_i is obtained. By repeating this step, the OCR recognition scores of all the recognized advertisement words can be obtained.
Step S520, performing word segmentation on the sentences (advertisement words), and calculating the proportion of the segmented words in each sentence.
In this step, the advertisement words are segmented into words, and the semantic integrity score is obtained by calculating the proportion of the segmented words in each sentence. The proportion S_sem of the segmented words in a sentence can be calculated by the following formula:

S_sem = W_i / W_total

where W_i is the number of words obtained after word segmentation of each line of text, and W_total is the total word number of the advertisement word. S_sem therefore represents the proportion of the number of segmented words in each line to the total word number; the higher this value, the higher the semantic integrity of the sentence, and the easier the corresponding sentence is to understand.
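The per-line ratio above can be sketched directly (the function name and arguments are illustrative assumptions; a real system for Chinese text would obtain the segmented words from a proper word segmenter):

```python
def semantic_integrity(segmented_words, total_word_count):
    """S_sem = W_i / W_total: the number of words produced by segmenting
    one line of text, over the total word count, per the formula above."""
    if total_word_count == 0:
        return 0.0
    return len(segmented_words) / total_word_count
```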
Step S530, calculating the proportion of the area of each advertisement word in the picture to the area of all text in the picture.
In this step, the ratio S_sal of the area occupied by each advertisement word in the picture to the area occupied by all text in the picture is calculated to obtain a saliency score for each advertisement word. S_sal can be calculated by the following formula:

S_sal = a_i / Σ_j a_j

where a_i is the area of the region in which any one line of text is located, and Σ_j a_j is the total area occupied by all text regions in the whole picture. The ratio of the area of the line's region to the total area of all text regions therefore represents the level of saliency of that line of text in the picture: the greater this value, the more noticeable the line is in the picture, and the higher its importance.
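Given the bounding box of each recognized line, the saliency ratio can be sketched as follows (names are illustrative; rectangular boxes from the OCR stage are an assumption):

```python
def saliency_scores(boxes):
    """boxes: list of (width, height) for each recognized text line.

    Returns S_sal = a_i / sum_j a_j for every line, i.e. each line's
    share of the total text area in the picture.
    """
    areas = [w * h for w, h in boxes]
    total = sum(areas)
    if total == 0:
        return [0.0] * len(boxes)
    return [a / total for a in areas]
```

By construction the scores of all lines sum to 1, so larger text blocks receive proportionally higher saliency.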
Step S540, calculating weighted scores.
In this step, a weighted score of the OCR algorithm recognition score, the semantic integrity score and the saliency score is calculated. The weighted score can be calculated by the following formula:

S = α·S_ocr + β·S_sem + λ·S_sal

where S is the weighted composite score, S_ocr is the image-text recognition algorithm score, S_sem is the semantic integrity score, and S_sal is the saliency score. α, β and λ are the weighted proportions of the three scores, respectively, and can be determined according to actual conditions and empirical values. For example, a preferable combination is α = 0.3, β = 0.1 and λ = 0.6.
Step S550, classifying all advertisement words into the three grades critical, normal and dirty according to the weighted scores.
In this step, the weighted composite scores are sorted from large to small, and all the identified advertisement words are classified into the three grades critical, normal and dirty (filterable) according to the sorting result. The classification rule may be: after sorting by the weighted composite score S, all the advertisement words are first divided into two parts according to a certain proportion, and the words in the former part are assigned to the critical grade; the proportion can be determined according to actual requirements and empirical values, and may preferably be 1:1. The words in the latter part are then divided according to a preset fixed weighted score S': words with a score of at least S' are assigned to the normal grade, and words with a score smaller than S' are assigned to the dirty grade. The value of S' may be determined according to the actual requirement and the empirical value; preferably, the preset fixed weighted score may be 0.15.
Step S560, storing the advertisement words after data cleaning into a database.
In this step, the advertisement words of the dirty category are filtered out, and the critical category and the normal category are stored in the database by category.
The database is used for storing the received pictures and the advertisement words processed by the data cleaning module. In addition, an index relationship between the advertisement and the image can be established in the database.
The result set optimization module is used for integrating the sentences of the critical and normal categories classified in the data cleaning flow with the scores given by full-text retrieval of the database, performing secondary scoring and sorting, and optimizing the retrieved result set based on the sorting result. Most sentences in the dirty category obtained in the data cleaning flow are advertisement words that were recognized incorrectly or whose characters are too small to be noticed, so they are not considered in the retrieval process. Specifically, the result set optimization module may implement the optimization process by executing the flow shown in fig. 6:
Step S610, receiving a search keyword.
Step S620, carrying out retrieval matching in the database against the advertisement words of the critical and normal categories.
Step S630, acquiring the critical matching score and the normal matching score.
In this step, the matching score S_c of retrieving a given image from the critical category and the matching score S_n of retrieving the same image from the normal category are acquired.
Step S640, calculating weighted scores.
In this step, a weighted score of the above critical matching score and normal matching score is calculated. The weighted score can be calculated by the following formula:

S = α·S_c + β·S_n

where α + β = 1, and the values of α and β can be determined according to practical situations and experience. For example, the value of α may be 0.71 and the value of β may be 0.29.
Step S650, sorting and returning the results according to the weighted scores.
In this step, the weighted scores are ranked and a retrieved result set is returned based on the ranked results.
The search service module is used for providing search service. For example, the result set may be retrieved by a keyword.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, in this exemplary embodiment, a text recognition and search device is further provided, and referring to fig. 7, the text recognition and search device 700 may include a text recognition module 710, a data cleaning module 720, and a recognition and search module 730. Wherein:
The text recognition module 710 may be configured to receive the image and store the image in a database, and recognize a plurality of text in the image;
The data cleaning module 720 may be configured to obtain a plurality of corresponding score values according to the text information of each text, and obtain a comprehensive score corresponding to each text based on the plurality of score values;
The recognition retrieval module 730 may be configured to store each text to the database based on the composite score to search the database for a corresponding image based on the scored text.
In this exemplary embodiment, the text recognition module includes a receiving unit and a text recognition unit. The receiving unit is used for receiving the image, for example, the image can be obtained in a direct or indirect mode through external interaction, and the character recognition unit is used for recognizing a plurality of characters in the image through a character recognition algorithm. For example, all text in an image may be recognized by an optical character recognition algorithm.
In this exemplary embodiment, the data cleaning module may clean the identified text by: acquiring, based on the text information, the score of the image-text recognition algorithm, the semantic integrity score of each text, and the saliency score of each text in the image, wherein the semantic integrity score is determined by the proportion of the number of words after word segmentation to the total number of words, and the saliency score is determined by the proportion of the area of each text to the total area of all text in the image; and performing a weighted operation on the algorithm score, the semantic integrity score and the saliency score to obtain the composite score corresponding to each text.
In this example embodiment, the identification and search module may include a warehouse entry unit and a search unit. The storage unit is used for sorting the characters based on the comprehensive scores, classifying the characters into a plurality of grades according to the sorting result, filtering the characters according to the grades, and storing the filtered characters into the database according to a preset proportion. The search unit is used for receiving the search keywords, obtaining a plurality of secondary scoring results based on the characters of which the search keywords are matched and scoring, obtaining secondary comprehensive scores based on the secondary scoring results, sorting the secondary comprehensive scores, and returning a result set based on the sorting results, wherein the result set comprises a plurality of searched pictures.
The specific details of each module or unit in the above-mentioned image-text recognition search device have been described in detail in the corresponding image-text recognition search method, so that the details are not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
As another aspect, the present application also provides a computer-readable medium that may be included in the electronic device described in the above embodiments, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by such an electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps shown in fig. 3, fig. 5 or fig. 6, and so on.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image-text recognition and retrieval method, characterized by comprising the following steps:
receiving an image and storing the image into a database, and identifying a plurality of characters in the image;
obtaining a plurality of corresponding score values according to the text information of each text, and obtaining a composite score corresponding to each text based on the plurality of score values, wherein the plurality of score values comprise an image-text recognition algorithm score, a text saliency score and a semantic integrity score;
and storing each text to the database based on the comprehensive scores so as to search the corresponding image in the database based on the scored text.
2. The method for recognizing and retrieving text according to claim 1, wherein the step of recognizing a plurality of characters in the image includes:
and extracting the plurality of characters in the image based on an optical character recognition algorithm, and obtaining the character information of each character.
3. The method of claim 1, wherein the obtaining a plurality of corresponding scoring values according to the text information of each text comprises:
Acquiring an algorithm score of an algorithm for identifying the corresponding text in the image, a semantic integrity score of each text, and a saliency score of each text in the image based on the text information;
The semantic integrity score is determined by the proportion of the number of words after word segmentation of each word to the total number of words, and the saliency score is determined by the proportion of each word to the total area of all words in the image.
4. A method for identifying and retrieving text according to claim 3, wherein the obtaining the composite score corresponding to each text based on the multiple score values includes:
and carrying out weighted operation on the algorithm score, the semantic integrity score and the saliency score to obtain the comprehensive score corresponding to each word.
5. The method of claim 1, wherein storing each of the words in the database based on the composite score comprises:
and sorting the words based on the comprehensive scores, classifying the words into a plurality of grades according to the sorting result, filtering the words according to the grades, and storing the filtered words into the database according to a preset proportion.
6. The method according to claim 1, wherein said searching the corresponding image in the database based on the scored text comprises:
Receiving a search keyword, matching the scored text based on the search keyword, obtaining a plurality of secondary scoring results, and obtaining a secondary comprehensive score based on the plurality of secondary scoring results;
And sorting the secondary comprehensive scores, and returning a result set based on the sorting result, wherein the result set comprises a plurality of images.
7. The method according to claim 5, wherein searching the corresponding image in the database based on the scored text comprises:
receiving a search keyword, respectively matching the search keyword with the words of different grades in the database, obtaining a plurality of secondary scoring results, and obtaining a secondary comprehensive score based on the secondary scoring results;
And sorting the secondary comprehensive scores, and returning a result set based on the sorting result, wherein the result set comprises a plurality of images.
8. An image-text recognition and search device is characterized by comprising:
the character recognition module is used for receiving the image and storing the image into the database, and recognizing a plurality of characters in the image;
The data cleaning module is used for obtaining a plurality of corresponding score values according to the text information of each text, and obtaining a composite score corresponding to each text based on the plurality of score values, wherein the plurality of score values comprise an image-text recognition algorithm score, a text saliency score and a semantic integrity score;
And the identification and retrieval module is used for storing each text into the database based on the comprehensive scores so as to search the corresponding image in the database based on the scored text.
9. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of identifying and retrieving a graphic as claimed in any one of claims 1 to 7.
10. An electronic device, comprising:
A processor;
A memory for storing executable instructions of the processor;
Wherein the processor is configured to perform the teletext identification retrieval method according to any one of claims 1-7 via execution of the executable instructions.
CN202110126962.1A 2021-01-29 2021-01-29 Image and text recognition retrieval method and device, storage medium and electronic equipment Active CN113821666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126962.1A CN113821666B (en) 2021-01-29 2021-01-29 Image and text recognition retrieval method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126962.1A CN113821666B (en) 2021-01-29 2021-01-29 Image and text recognition retrieval method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113821666A CN113821666A (en) 2021-12-21
CN113821666B true CN113821666B (en) 2025-07-15

Family

ID=78912382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126962.1A Active CN113821666B (en) 2021-01-29 2021-01-29 Image and text recognition retrieval method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113821666B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769200B1 (en) * 2015-07-01 2020-09-08 A9.Com, Inc. Result re-ranking for object recognition

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8098934B2 (en) * 2006-06-29 2012-01-17 Google Inc. Using extracted image text
US8811742B2 (en) * 2009-12-02 2014-08-19 Google Inc. Identifying matching canonical documents consistent with visual query structural information
CN108399161A (en) * 2018-03-06 2018-08-14 平安科技(深圳)有限公司 Advertising pictures identification method, electronic device and readable storage medium storing program for executing
CN108897862A (en) * 2018-07-02 2018-11-27 广东飞企互联科技股份有限公司 One kind being based on government document picture retrieval method and system
CN110008365B (en) * 2019-04-09 2023-02-07 广东工业大学 Image processing method, device and equipment and readable storage medium
CN110502650A (en) * 2019-08-12 2019-11-26 深圳智能思创科技有限公司 A kind of image indexing system and method based on natural language description
CN111274428B (en) * 2019-12-19 2023-06-30 北京创鑫旅程网络技术有限公司 Keyword extraction method and device, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769200B1 (en) * 2015-07-01 2020-09-08 A9.Com, Inc. Result re-ranking for object recognition

Also Published As

Publication number Publication date
CN113821666A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
EP3866026A1 (en) Theme classification method and apparatus based on multimodality, and storage medium
EP3872652B1 (en) Method and apparatus for processing video, electronic device, medium and product
CN108829893A (en) Determine method, apparatus, storage medium and the terminal device of video tab
CN110674312B (en) Method, device and medium for constructing knowledge graph and electronic equipment
CN106708940B (en) Method and device for processing pictures
CN114021577A (en) Content tag generation method and device, electronic equipment and storage medium
CN102043843A (en) Method and obtaining device for obtaining target entry based on target application
CN113239204B (en) Text classification method and device, electronic equipment and computer readable storage medium
CN103577452A (en) Website server and method and device for enriching content of website
CN110765973B (en) Account type identification method and device
CN107436916B (en) Intelligent answer prompting method and device
WO2021218027A1 (en) Method and apparatus for extracting terminology in intelligent interview, device, and medium
CN111369980A (en) Voice detection method and device, electronic equipment and storage medium
JP2023544925A (en) Data evaluation methods, training methods and devices, electronic equipment, storage media, computer programs
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium
CN116109732A (en) Image labeling method, device, processing equipment and storage medium
CN104881447A (en) Searching method and device
CN113688268A (en) Picture information extraction method and device, computer equipment and storage medium
CN118643342A (en) Sample pair generation, large model training, image retrieval method and device, equipment and medium
CN104881446A (en) Searching method and searching device
CN111368553A (en) Intelligent word cloud picture data processing method, device, equipment and storage medium
CN113821666B (en) Image and text recognition retrieval method and device, storage medium and electronic equipment
CN107590163B (en) The methods, devices and systems of text feature selection
CN112529627A (en) Method and device for extracting implicit attribute of commodity, computer equipment and storage medium
CN114742168B (en) Web page similarity model training method, device, electronic device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant