Disclosure of Invention
An object of embodiments of the present disclosure is to provide an image-text recognition and retrieval method, an image-text recognition and retrieval device, an electronic apparatus, and a computer-readable storage medium, so as to implement data cleaning of recognition results by integrating multiple aspects of information about the text in an image during text recognition, thereby optimizing the effect of retrieving an image according to text.
According to a first aspect of the present disclosure, there is provided an image-text recognition and retrieval method, including:
receiving an image, storing the image in a database, and recognizing a plurality of texts in the image;
obtaining a plurality of corresponding score values according to the text information of each text, and obtaining a composite score corresponding to each text based on the plurality of score values;
and storing each text in the database based on the composite score, so as to retrieve the corresponding image in the database based on the scored texts.
In an exemplary embodiment of the disclosure, the recognizing the plurality of texts in the image includes:
extracting the plurality of texts in the image based on an optical character recognition algorithm, and obtaining the text information of each text.
In an exemplary embodiment of the disclosure, the obtaining a plurality of corresponding score values according to the text information of each text includes:
acquiring, based on the text information, an algorithm score of the algorithm used to recognize the corresponding text in the image, a semantic integrity score of each text, and a saliency score of each text in the image;
wherein the semantic integrity score is determined by the proportion of the number of words obtained after word segmentation of each text to the total number of words, and the saliency score is determined by the proportion of the area of each text to the total area of all texts in the image.
In an exemplary embodiment of the disclosure, the obtaining, based on the plurality of score values, a composite score corresponding to each text includes:
performing a weighted operation on the algorithm score, the semantic integrity score, and the saliency score to obtain the composite score corresponding to each text.
In an exemplary embodiment of the disclosure, the storing each text in the database based on the composite score includes:
sorting the texts based on the composite scores, classifying the texts into a plurality of grades according to the sorting result, filtering the texts according to the grades, and storing the filtered texts in the database according to a preset proportion.
In an exemplary embodiment of the disclosure, the retrieving the corresponding image in the database based on the scored texts includes:
receiving a search keyword, matching the scored texts based on the search keyword to obtain a plurality of secondary score results, and obtaining a secondary composite score based on the plurality of secondary score results;
and sorting the secondary composite scores, and returning a result set based on the sorting result, wherein the result set includes a plurality of images.
In an exemplary embodiment of the disclosure, the retrieving the corresponding image in the database based on the scored texts includes:
receiving a search keyword, matching the search keyword respectively against the texts of different grades in the database to obtain a plurality of secondary score results, and obtaining a secondary composite score based on the secondary score results;
and sorting the secondary composite scores, and returning a result set based on the sorting result, wherein the result set includes a plurality of images.
According to a second aspect of the present disclosure, there is provided an image-text recognition and retrieval device, including:
a text recognition module, configured to receive an image, store the image in a database, and recognize a plurality of texts in the image;
a data cleaning module, configured to obtain a plurality of corresponding score values according to the text information of each text, and obtain a composite score corresponding to each text based on the score values;
and a recognition and retrieval module, configured to store each text in the database based on the composite score, so as to retrieve the corresponding image in the database based on the scored texts.
According to a third aspect of the present disclosure, there is provided an electronic device comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
The image-text recognition and retrieval method provided by the example embodiments of the disclosure receives an image, stores the image in a database, recognizes a plurality of texts in the image, obtains a plurality of corresponding score values according to the text information of each text, obtains a composite score corresponding to each text based on the plurality of score values, and stores each text in the database based on the composite score, so as to retrieve the corresponding image in the database based on the scored texts. On the one hand, after a plurality of texts in an image are recognized, the method obtains the composite score corresponding to each text from the text information of each text, making full use of all aspects of the text information, so that the text recognition results can be effectively cleaned. On the other hand, after the text recognition results are effectively cleaned, they are stored in the database, so that images can be retrieved in the database through the cleaned texts, improving retrieval accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram of a system architecture of an exemplary application environment to which an image-text recognition and retrieval method and device according to embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be electronic devices with photographing or image transmission functions, including but not limited to desktop computers, portable computers, smart phones, tablet computers, etc. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The image-text recognition and retrieval method provided by the embodiments of the disclosure may be executed by the terminal devices 101, 102, 103, and correspondingly, the image-text recognition and retrieval device may be arranged in the terminal devices 101, 102, 103. The method may also be executed jointly by the terminal devices 101, 102, 103 and the server 105, in which case the device may be disposed in both the terminal devices 101, 102, 103 and the server 105. In addition, the method may also be executed by the server 105 alone, with the device disposed in the server 105 accordingly, which is not particularly limited in this exemplary embodiment.
For example, in the present exemplary embodiment, the image-text recognition and retrieval method may be performed by the terminal devices 101, 102, 103 in conjunction with the server 105. First, an image may be captured or received by a terminal device, which transmits the image to the server. The server stores the image in a database, invokes an image-text recognition algorithm to recognize a plurality of texts in the image, obtains a plurality of corresponding score values according to the text information of each text, obtains a composite score based on the obtained score values, and finally stores the plurality of texts in the database on the basis of the composite score, so as to retrieve the corresponding image in the database based on the scored texts.
Fig. 2 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data required for the system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read therefrom is installed into the storage section 208 as needed.
With the advent of the information age, the carrier of information has been diversified, and for example, images have been widely used in many fields because they can more vividly and intuitively convey information. In order to better adapt to the progress of informatization, the research on the image-text recognition and retrieval technology is also becoming more important.
The image-text recognition and retrieval technology mainly involves two aspects: data cleaning during text recognition, and retrieving images by text. In the related art, the technical means for data cleaning mainly include two approaches: first, optimizing the recognition algorithm to improve recognition accuracy; second, scoring the recognition results through the algorithm and filtering them with a set score threshold. These approaches allow effective cleaning but ignore information such as the position and size of the text in the image.
In addition, for retrieving images by text, the sentence to be searched is mainly matched against the data set in the database through the database's full-text index function, and a result set is returned. Although this method can realize retrieval, the returned result set depends only on the database's full-text matching result; it does not consider how to use the diversified text information in the image to optimize the retrieval effect, and its accuracy is low when retrieving short words.
In order to solve the problems in the above methods, the present exemplary embodiment proposes a technical solution that implements data cleaning of the recognition results by integrating multiple aspects of information about the text in the image during text recognition, so as to optimize the effect of retrieving images according to text. The technical solution of the embodiments of the present disclosure is described in detail below:
The present exemplary embodiment first provides an image-text recognition and retrieval method. Referring to fig. 3, the method specifically includes the following steps:
step S310, receiving an image, storing the image in a database, and recognizing a plurality of texts in the image;
step S320, obtaining a plurality of corresponding score values according to the text information of each text, and obtaining a composite score corresponding to each text based on the plurality of score values;
and step S330, storing each text in the database based on the composite score, so as to retrieve the corresponding images in the database based on the scored texts.
In the image-text recognition and retrieval method provided by this exemplary embodiment, on the one hand, after a plurality of texts in an image are recognized, a composite score corresponding to each text is further obtained from the text information of each text, making full use of all aspects of the text information, so that the text recognition results can be effectively cleaned. On the other hand, after the text recognition results are effectively cleaned, they are stored in a database, so that images can be retrieved in the database through the cleaned texts, improving retrieval accuracy.
The above steps are described in more detail below.
In step S310, an image is received and stored in a database, and a plurality of texts in the image are recognized.
The image-text recognition and retrieval method provided by this exemplary embodiment provides the functions of recognizing the texts in an image and retrieving the image through keywords. For example, the method can be executed jointly by a terminal device and a server. Specifically, a database may be established in the server for storing texts and images, providing the retrieval function, and the like. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of this exemplary embodiment is not limited thereto.
In the present exemplary embodiment, the image is any image containing text content; for example, it may be an advertisement picture containing a plurality of advertisement phrases, or another type of image containing text, which is not limited in this exemplary embodiment. In addition, the image may be captured by the terminal device and transmitted to the server, may be input to the terminal device externally and then transmitted to the server, or may be transmitted directly to the server from outside, which is not particularly limited in this exemplary embodiment.
In this exemplary embodiment, after the image is received, it is stored in the database, and the plurality of texts contained in the image are recognized by an image-text recognition algorithm. The image-text recognition algorithm analyzes and recognizes an image file containing text data to obtain the text and layout information in the image. For example, the image-text recognition algorithm may be an optical character recognition (OCR) algorithm. Texts in the image can be extracted by the OCR algorithm for further analysis, so as to exploit the value of the texts, for example, retrieving images according to text content or classifying images according to text content. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of this exemplary embodiment is not limited thereto.
In step S320, a plurality of corresponding score values are obtained according to the text information of each text, and a composite score corresponding to each text is obtained based on the plurality of score values.
In this exemplary embodiment, after the plurality of texts in the image are recognized by the image-text recognition algorithm, the composite score corresponding to each text may be obtained according to the text information of each text. For example, the text information may include the text content, the position of the text in the image, the area of the text, and the like. It should be noted that the above scenario is only an exemplary illustration, and this exemplary embodiment is not limited thereto; the text information may include more or less information according to actual needs, all of which falls within the protection scope of this exemplary embodiment.
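As an illustrative sketch only (not part of the claimed method), the text information described above can be modeled as a simple record; the class and field names below are assumptions chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class RecognizedText:
    """One piece of text recognized in an image (hypothetical structure).

    The fields mirror the text information described above: the text
    content, the bounding box giving its position, and the recognition
    algorithm's score.
    """
    content: str        # the recognized text content
    x: int              # left edge of the bounding box, in pixels
    y: int              # top edge of the bounding box, in pixels
    width: int          # bounding-box width, in pixels
    height: int         # bounding-box height, in pixels
    ocr_score: float    # recognition confidence in [0, 1]

    @property
    def area(self) -> int:
        """Area of the region occupied by the text, used later for saliency."""
        return self.width * self.height

t = RecognizedText("50% off this weekend", x=10, y=20, width=200, height=40, ocr_score=0.92)
print(t.area)  # 8000
```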
In the present exemplary embodiment, the score values can be obtained based on the text information. For example, the score values may include an image-text recognition algorithm score, a text saliency score, and a semantic integrity score. Obtaining the plurality of score values based on the text information may be implemented by acquiring, based on the text information, the algorithm score of the image-text recognition algorithm, the semantic integrity score of each text, and the saliency score of each text in the image. The semantic integrity score is determined by the proportion of the number of words obtained after word segmentation of each text to the total number of words, and the saliency score is determined by the proportion of the area of each text to the total area of all texts in the image.
Taking an advertisement picture as an example, the picture includes a plurality of advertisement phrases. Assuming that any advertisement phrase is $C_i$, its semantic integrity score $S_i^{sem}$ can be calculated by the following formula:

$S_i^{sem} = w_i / W$

where $w_i$ is the number of words obtained after word segmentation of the line of text corresponding to $C_i$, and $W$ is the total number of words in all advertisement phrases. $S_i^{sem}$ thus represents the proportion of the words in each line to the total word count; the higher the value, the higher the semantic integrity of the sentence and the easier the corresponding sentence is to understand.
In addition, the above-mentioned saliency score $S_i^{sal}$ can be calculated by the following formula:

$S_i^{sal} = a_i / A$

where $a_i$ is the area of the region occupied by any line of text, and $A$ is the total area occupied by all text regions in the picture. $S_i^{sal}$ thus represents the proportion of the area of the line of text to the area of all text in the picture, i.e., the degree of saliency of the line in the picture; the greater this value, the more noticeable the line is in the picture and the higher its importance.
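The semantic integrity and saliency scores described above can be sketched in Python as follows. This is a minimal illustration under the assumption that word segmentation and text areas are already available; the whitespace split below stands in for a real word segmenter.

```python
def semantic_integrity_scores(phrases):
    """S_sem_i = w_i / W: words in each phrase over the total word count."""
    word_counts = [len(p.split()) for p in phrases]  # naive segmentation
    total = sum(word_counts)
    return [w / total for w in word_counts]

def saliency_scores(areas):
    """S_sal_i = a_i / A: area of each text region over the total text area."""
    total = sum(areas)
    return [a / total for a in areas]

phrases = ["big summer sale", "half price on all shoes this weekend", "visit us"]
areas = [1200, 600, 200]
print(semantic_integrity_scores(phrases))  # proportions sum to 1.0
print(saliency_scores(areas))              # [0.6, 0.3, 0.1]
```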
It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
In the present exemplary embodiment, after the plurality of score values of each text are obtained based on the recognized text information, a composite score corresponding to each text can be obtained based on the plurality of score values. This can be implemented by performing a weighted operation on the algorithm score, the semantic integrity score, and the saliency score to obtain the composite score corresponding to each text.
Taking the advertisement picture as an example, the composite score can be obtained by the following formula:

$S_i = \alpha \cdot S_i^{ocr} + \beta \cdot S_i^{sem} + \lambda \cdot S_i^{sal}$

where $S_i$ is the weighted composite score, $S_i^{ocr}$ is the image-text recognition algorithm score, $S_i^{sem}$ is the semantic integrity score, and $S_i^{sal}$ is the saliency score. $\alpha$, $\beta$, and $\lambda$ are the respective weights of the three scores and can be determined according to actual conditions and empirical values; for example, a preferred combination is $\alpha = 0.3$, $\beta = 0.1$, and $\lambda = 0.6$. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
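The weighted operation can be sketched as below, using the preferred weights from the text (α = 0.3, β = 0.1, λ = 0.6); the function name is an assumption for illustration.

```python
def composite_score(ocr_score, sem_score, sal_score,
                    alpha=0.3, beta=0.1, lam=0.6):
    """S_i = alpha*S_ocr + beta*S_sem + lambda*S_sal, weights summing to 1."""
    return alpha * ocr_score + beta * sem_score + lam * sal_score

# 0.3*0.9 + 0.1*0.25 + 0.6*0.6 ≈ 0.655
print(round(composite_score(0.9, 0.25, 0.6), 3))
```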
In step S330, each text is stored in the database based on the composite score, so that the corresponding image can be retrieved in the database based on the scored texts.
In this exemplary embodiment, in order to achieve more efficient data cleaning, after the composite score is obtained in step S320, the recognized texts may be stored in the database accordingly based on the composite score. For example, this may be implemented by sorting the texts based on the composite scores, classifying the texts into a plurality of grades according to the sorting result, filtering the texts according to the grades, and storing the filtered texts in the database according to a preset proportion. In addition, after the texts are stored in the database, an index relationship between the texts and the images can be established, so that the corresponding images can be retrieved in the database based on the scored texts.
Specifically, taking the advertisement picture as an example, the weighted composite score $S_i$ of all advertisement phrases in the picture is first calculated, and the recognized phrases are then sorted by composite score from high to low and classified into three categories: critical (very important), normal (generally important), and dirty (filterable). The classification rule may be as follows: after sorting by the weighted composite score $S_i$, all phrases are divided into two parts according to a certain proportion, and the phrases in the upper part are classified as critical; the proportion can be determined according to actual requirements and empirical values, and may preferably be 1:1. The phrases in the lower part are then divided according to a preset fixed weighted score $S'$: phrases with scores not less than $S'$ are classified as normal, and phrases with scores less than $S'$ are classified as dirty. The value of $S'$ can likewise be determined according to actual requirements and empirical values; preferably, it may be 0.15. Finally, the phrases in the dirty category are filtered out, and the critical and normal categories are stored in the database by category. In addition, an index relationship between the advertisement phrases and the images can be established in the database. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
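A minimal sketch of the classification rule described above, assuming a list of (phrase, composite score) pairs, a 1:1 split proportion, and S' = 0.15 as in the preferred values given in the text:

```python
def classify(scored_phrases, split_ratio=0.5, s_prime=0.15):
    """Sort by composite score, mark the top part critical, then split the
    remainder into normal (score >= s_prime) and dirty (score < s_prime)."""
    ranked = sorted(scored_phrases, key=lambda p: p[1], reverse=True)
    cut = round(len(ranked) * split_ratio)
    grades = {"critical": ranked[:cut], "normal": [], "dirty": []}
    for phrase, score in ranked[cut:]:
        grades["normal" if score >= s_prime else "dirty"].append((phrase, score))
    return grades

g = classify([("sale", 0.8), ("half price", 0.5), ("visit us", 0.2), ("blur", 0.05)])
# critical: sale, half price; normal: visit us; dirty: blur (filtered out)
```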
In this exemplary embodiment, the process of retrieving a corresponding image in the database based on the scored texts may be implemented as follows: receiving a search keyword, matching the scored texts based on the search keyword to obtain a plurality of secondary score results, obtaining a secondary composite score based on the plurality of secondary score results, sorting the secondary composite scores, and returning a result set based on the sorting result, where the result set includes a plurality of pictures. Specifically, the matching of the scored texts based on the search keyword may be performed by matching the search keyword against the texts of different grades in the database, thereby obtaining the plurality of secondary score results.
Taking the advertisement picture as an example, this can be implemented by performing secondary scoring based on the classified advertisement phrases of the critical and normal categories (the dirty category having been filtered out during data cleaning) and the scores of the database's full-text retrieval, obtaining a plurality of secondary score results, sorting them, and returning a result set according to the sorting result. Specifically, assuming that, for a search keyword K, the secondary score result of retrieving an image from the critical category during full-text retrieval is $S_c$ and the secondary score result of retrieving an image from the normal category is $S_n$, the secondary composite score $S$ can be calculated by the following formula:

$S = \alpha \cdot S_c + \beta \cdot S_n$
where $\alpha + \beta = 1$, and the values of α and β can be determined according to the actual situation and empirical values. For example, α may be 0.71 and β may be 0.29. It should be noted that the above scenario is only an exemplary illustration, and the protection scope of the present exemplary embodiment is not limited thereto.
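The secondary composite score can be sketched as follows, with the example weights α = 0.71 and β = 0.29; `s_critical` and `s_normal` stand for the full-text-retrieval scores from the critical and normal grades, and the candidate scores below are made-up illustration values.

```python
def secondary_composite_score(s_critical, s_normal, alpha=0.71, beta=0.29):
    """S = alpha*Sc + beta*Sn, with alpha + beta = 1."""
    assert abs(alpha + beta - 1.0) < 1e-9
    return alpha * s_critical + beta * s_normal

# Rank candidate images by their secondary composite scores (hypothetical data).
candidates = {"img_a": (0.9, 0.2), "img_b": (0.4, 0.8)}
ranked = sorted(candidates,
                key=lambda k: secondary_composite_score(*candidates[k]),
                reverse=True)
print(ranked)  # ['img_a', 'img_b']
```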
In the above retrieval process, after the secondary composite scores of all results in the result set are calculated, the results are re-sorted according to the secondary composite scores. Because the semantic integrity of the texts in the image and their saliency in the image are taken into account, the results obtained by this retrieval method are more accurate.
In the following, taking the specific application scenario of image-text recognition and retrieval for advertisement pictures as an example, the image-text recognition and retrieval method is fully described with reference to figs. 4 to 6. Fig. 4 is a schematic diagram of an advertisement-picture text recognition and retrieval system. As shown in fig. 4, the system architecture of the recognition and retrieval system includes an external interaction layer 410, a recognition service module 420, a data cleaning module 430, a database 440, a result set optimization module 450, and a retrieval service module 460. Wherein:
The external interaction layer is used to submit pictures to the system directly or indirectly, receive the result set retrieved by the system, and send the result set to the caller.
The recognition service module is used to recognize the texts (a plurality of advertisement phrases) in the picture obtained through the external interaction layer; specifically, the advertisement phrases in the picture can be recognized by an optical character recognition (OCR) algorithm.
The data cleaning module is used to perform data cleaning on the recognized advertisement phrases. Specifically, the data cleaning process can be implemented by executing the flow shown in fig. 5. As shown in fig. 5, the process includes the following steps:
Step S510, obtaining the scores given by the OCR algorithm.
In this step, the recognition score given by the OCR algorithm for any recognized advertisement phrase $C_i$ is obtained. By repeating this step, the OCR recognition scores of all recognized advertisement phrases can be obtained.
Step S520, segmenting the sentences (advertisement phrases) into words, and calculating the proportion of the segmented words to the sentences.
In this step, the advertisement phrases are segmented into words, and the semantic integrity score is obtained by calculating the proportion of the segmented words in the sentences. The proportion $S_i^{sem}$ can be calculated by the following formula:

$S_i^{sem} = w_i / W$

where $w_i$ is the number of words obtained after word segmentation of each line of text, and $W$ is the total number of words in all advertisement phrases. $S_i^{sem}$ represents the proportion of the words in each line to the total word count; the higher the value, the higher the semantic integrity of the sentence and the easier the corresponding sentence is to understand.
Step S530, calculating the proportion of the area of each advertisement phrase in the picture to the area of all texts in the picture.
In this step, the ratio $S_i^{sal}$ of the area of each advertisement phrase in the picture to the area of all texts in the picture is calculated to obtain the saliency score of each advertisement phrase:

$S_i^{sal} = a_i / A$

where $a_i$ is the area of the region occupied by any line of text, and $A$ is the total area occupied by all text regions in the picture. $S_i^{sal}$ represents the degree of saliency of the line in the picture; the greater this value, the more noticeable the line is in the picture and the higher its importance.
Step S540, calculating weighted scores.
In this step, a weighted score of the OCR recognition score, the semantic integrity score, and the saliency score is calculated. The weighted score can be calculated by the following formula:

$S_i = \alpha \cdot S_i^{ocr} + \beta \cdot S_i^{sem} + \lambda \cdot S_i^{sal}$

where $S_i$ is the weighted composite score, $S_i^{ocr}$ is the image-text recognition algorithm score, $S_i^{sem}$ is the semantic integrity score, and $S_i^{sal}$ is the saliency score. $\alpha$, $\beta$, and $\lambda$ are the respective weights of the three scores and can be determined according to actual conditions and empirical values; for example, a preferred combination is $\alpha = 0.3$, $\beta = 0.1$, and $\lambda = 0.6$.
Step S550, classifying all advertisements into three grades, critical, normal, and dirty, according to the weighted scores.
In this step, the weighted scores are sorted in descending order, and all the identified advertisement phrases are classified into three grades, critical, normal, and dirty (filterable), according to the sorting result. The classification rule may be as follows: after sorting by the weighted composite score S, all advertisements are divided into two parts according to a certain proportion, and the advertisements in the former part are classified as critical. This proportion may be determined according to actual requirements and empirical values, and may preferably be 1:1. The advertisements in the latter part are then divided according to a preset fixed weighted score S': those not smaller than S' are classified as normal, and those smaller than S' are classified as dirty. The value of S' may likewise be determined according to actual requirements and empirical values, and may preferably be 0.15.
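The two-stage classification rule can be sketched as follows. The 1:1 split ratio and the 0.15 threshold are the preferred values from the text, and the helper name and input shape (a list of text/score pairs) are assumptions.

```python
def classify_by_score(scored_lines, split_ratio=0.5, dirty_threshold=0.15):
    """Sort (text, weighted_score) pairs in descending order, label the
    top split_ratio fraction 'critical', then split the remainder into
    'normal' (score >= dirty_threshold) and 'dirty' (score below it)."""
    ranked = sorted(scored_lines, key=lambda pair: pair[1], reverse=True)
    cutoff = int(len(ranked) * split_ratio)
    labels = {}
    for rank, (text, score) in enumerate(ranked):
        if rank < cutoff:
            labels[text] = "critical"
        elif score >= dirty_threshold:
            labels[text] = "normal"
        else:
            labels[text] = "dirty"
    return labels
```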
Step S560, storing the advertisement phrases after data cleaning into a database.
In this step, the advertisements of the dirty category are filtered out, and those of the critical and normal categories are stored in the database by category.
The database is used for storing the received pictures and the advertisement phrases processed by the data cleaning module. In addition, an index relationship between the advertisement phrases and the images may be established in the database.
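One possible storage layout for the cleaned phrases and the text-to-image index relationship is shown below as a SQLite sketch. The disclosure does not prescribe a schema or storage engine, so all table and column names here are illustrative assumptions.

```python
import sqlite3

# In-memory schema sketch: one row per retained advertisement phrase,
# keyed back to its source image so a text match can resolve to pictures.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ad_phrase (
        id       INTEGER PRIMARY KEY,
        image_id INTEGER NOT NULL,
        category TEXT NOT NULL CHECK (category IN ('critical', 'normal')),
        content  TEXT NOT NULL
    )
""")
# Index supporting the phrase-to-image lookup performed at retrieval time.
conn.execute("CREATE INDEX idx_ad_phrase_image ON ad_phrase(image_id)")
```

Only critical and normal phrases are inserted, mirroring the filtering of the dirty category in the cleaning step above.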
The result set optimizing module is used for integrating the sentences of the critical and normal categories obtained in the data cleaning flow with the scores given by full-text retrieval of the database, performing secondary scoring and sorting, and optimizing the retrieved result set based on the sorting result. Most sentences of the dirty category obtained in the data cleaning flow are advertisement phrases that were recognized incorrectly or whose text is too small to be noticed, so these sentences are not considered in the retrieval process. Specifically, the result set optimizing module may implement the optimization process by executing the flow shown in fig. 6:
step S610, receiving a search keyword.
Step S620, performing search matching in the database against the advertisement phrases of the critical and normal categories.
Step S630, acquiring a critical matching score and a normal matching score.
In this step, a matching score S_c of retrieving a certain image from the critical category is acquired, and a matching score S_n of retrieving the same image from the normal category is acquired.
Step S640, calculating weighted scores.
In this step, a weighted score of the above critical matching score and normal matching score is calculated. The weighted score may be calculated by the following formula:

S = α·S_c + β·S_n

wherein α + β = 1, and the values of α and β can be determined according to actual conditions and empirical values. For example, the value of α may be 0.71 and the value of β may be 0.29.
Step S650, sorting the results according to the weighted scores and returning them.
In this step, the weighted scores are ranked and a retrieved result set is returned based on the ranked results.
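Steps S630 through S650 can be sketched together in Python. The 0.71/0.29 weights are the example values given above, and the input shape (an image identifier mapped to its critical and normal matching scores) is an assumption for illustration.

```python
def secondary_score(s_c: float, s_n: float,
                    alpha: float = 0.71, beta: float = 0.29) -> float:
    """S = alpha * S_c + beta * S_n, with alpha + beta = 1."""
    return alpha * s_c + beta * s_n

def rank_results(matches: dict) -> list:
    """matches maps image_id -> (critical score, normal score); returns
    (image_id, weighted score) pairs sorted best-first."""
    scored = [(img, secondary_score(s_c, s_n))
              for img, (s_c, s_n) in matches.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```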
The search service module is used for providing search service. For example, the result set may be retrieved by a keyword.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, in this exemplary embodiment, a text recognition and retrieval device is also provided. Referring to fig. 7, the text recognition and retrieval device 700 may include a text recognition module 710, a data cleaning module 720, and a recognition retrieval module 730. Wherein:
The text recognition module 710 may be configured to receive the image and store the image in a database, and recognize a plurality of text in the image;
The data cleaning module 720 may be configured to obtain a plurality of corresponding score values according to the text information of each text, and obtain a comprehensive score corresponding to each text based on the plurality of score values;
The recognition retrieval module 730 may be configured to store each text to the database based on the composite score to search the database for a corresponding image based on the scored text.
In this exemplary embodiment, the text recognition module includes a receiving unit and a text recognition unit. The receiving unit is used for receiving the image; for example, the image may be obtained directly or indirectly through external interaction. The text recognition unit is used for recognizing a plurality of pieces of text in the image through a text recognition algorithm; for example, all text in an image may be recognized by an optical character recognition algorithm.
In this exemplary embodiment, the data cleaning module may clean the recognized text by acquiring, based on the text information, an image-text recognition algorithm score, a semantic integrity score of the text, and a saliency score of the text in the image. The semantic integrity score is determined by the proportion of the number of words after word segmentation to the total number of words, and the saliency score is determined by the proportion of the area of the text to the total area of all text in the image. The algorithm score, the semantic integrity score, and the saliency score are weighted to obtain the composite score corresponding to each piece of text.
In this example embodiment, the recognition retrieval module may include a warehousing unit and a search unit. The warehousing unit is used for sorting the text based on the composite scores, classifying the text into a plurality of grades according to the sorting result, filtering the text according to the grades, and storing the filtered text into the database according to a preset proportion. The search unit is used for receiving a search keyword, obtaining a plurality of secondary scoring results based on the matched text and its scores, obtaining a secondary composite score based on the secondary scoring results, sorting by the secondary composite scores, and returning a result set based on the sorting result, wherein the result set includes a plurality of retrieved pictures.
The specific details of each module or unit in the above text recognition and retrieval device have already been described in detail in the corresponding text recognition and retrieval method, and are therefore not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
As another aspect, the present application also provides a computer-readable medium that may be included in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps shown in fig. 3, 5, or 6, and so on.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.