
CN113011409A - Image identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113011409A
CN113011409A (application CN202110359351.1A)
Authority
CN
China
Prior art keywords
target, image, target detection, image segmentation, detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110359351.1A
Other languages
Chinese (zh)
Inventor
单海蛟
何小坤
熊泽法
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202110359351.1A
Publication of CN113011409A
Legal status: Pending


Classifications

    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/24 — Classification techniques
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The present disclosure relates to an image recognition method and apparatus, an electronic device, and a storage medium. A target image is acquired, and a pre-trained image segmentation model is used to obtain a target detection frame and an image segmentation result map corresponding to the detection frame. The target image is cropped according to the detection frame and the segmentation result map to obtain the target area corresponding to the target object, and an optical character recognition algorithm determines the content of the target object in that area to produce the recognition result. By combining the target detection algorithm with the image segmentation algorithm, the target image can be cropped accurately into single target areas, effectively reducing interference from other text information, so that each target area is recognized precisely and the accuracy of image recognition is improved.

Description

Image identification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
Nowadays, with the rapid development of artificial intelligence, obtaining professional answers by searching for topics extracted from images that contain topic information has become a popular way of learning.
The mainstream method for searching topics from an image is based on target detection: each topic contained in the image is framed with a rectangular box, the image is cropped into regions each containing a single piece of topic information according to the framing result, character recognition is performed on each cropped topic region, and a search is carried out on the recognized content to obtain an accurate search result.
However, the topic content in captured images often appears tilted or distorted, and the topic regions obtained by the prior art can hardly distinguish the boundary of each topic accurately. In particular, when an image contains multiple topic regions, the framed regions easily overlap, so that a region cropped according to the framing result also contains other topic information. Interfering text then appears before, after, or in the middle of the recognition result of a single topic, an accurate recognition result cannot be obtained, and search accuracy is low.
Disclosure of Invention
To solve the technical problem or at least partially solve the technical problem, the present disclosure provides an image recognition method, an apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image recognition method, including:
acquiring a target image, wherein the target image comprises one or more target objects;
according to the target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model;
according to the target detection frame and the image segmentation result graph, cutting the target image to obtain a target area corresponding to the target object;
and determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
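The four claimed steps can be sketched end to end as follows. Everything here is an illustrative stand-in: `fake_segmentation_model` and `fake_ocr` are stubs invented for this sketch (the real system uses a trained Yolo-style segmentation network and an OCR engine), and the box/mask values are arbitrary.

```python
import numpy as np

def fake_segmentation_model(img):
    """Stub for step 2: return one detection box (x1, y1, x2, y2) and a
    same-sized binary segmentation map for it (values are arbitrary)."""
    box = (2, 2, 8, 6)
    seg = np.zeros(img.shape[:2], dtype=bool)
    seg[2:6, 2:8] = True
    return box, seg

def fake_ocr(region):
    """Stub for step 4: a real OCR engine would return the text content;
    here we just report the region size."""
    return f"region {region.shape[1]}x{region.shape[0]}"

img = np.zeros((10, 10, 3), dtype=np.uint8)   # step 1: acquire target image
box, seg = fake_segmentation_model(img)       # step 2: detection frame + segmentation map
x1, y1, x2, y2 = box
region = img[y1:y2, x1:x2]                    # step 3: crop the target area
result = fake_ocr(region)                     # step 4: recognise its content
```

The segmentation map `seg` is unused in this minimal sketch; the subsequent "optional" steps of the disclosure use it to refine the crop.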
Optionally, the cutting the target image according to the target detection frame and the image segmentation result map to obtain a target region corresponding to the target object includes:
cutting the target image according to the target detection frame to obtain a first target image;
determining, according to the image segmentation result map, the maximum connected region of the target object segmented in the image segmentation result map;
obtaining a minimum tilted rectangle of the outline according to the pixel points of the outline of the maximum connected region;
correcting the first target image according to the inclination angle of the minimum tilted rectangle;
and cutting the corrected first target image according to the width and the height of the minimum tilted rectangle to obtain a target area corresponding to the target object.
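A simplified sketch of the connected-region step above, using a plain BFS flood fill and an axis-aligned bounding box as a stand-in for the rotated minimum-tilt rectangle (extracting the rotated rectangle and its angle would typically use something like OpenCV's `cv2.minAreaRect`; this pure-NumPy version only finds the largest region and its upright box):

```python
import numpy as np
from collections import deque

def largest_component_bbox(mask):
    """Return (top, left, bottom, right) of the largest 4-connected
    foreground region in a binary mask."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best, best_size = None, 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                # BFS flood fill over this connected component
                q = deque([(y, x)])
                seen[y, x] = True
                pts = []
                while q:
                    cy, cx = q.popleft()
                    pts.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(pts) > best_size:
                    best_size = len(pts)
                    ys = [p[0] for p in pts]
                    xs = [p[1] for p in pts]
                    best = (min(ys), min(xs), max(ys) + 1, max(xs) + 1)
    return best
```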
Optionally, before obtaining, according to the target image, a target detection frame corresponding to the target object and an image segmentation result map corresponding to the target detection frame by using an image segmentation model trained in advance, the method further includes:
inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result;
and according to the rotated target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
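A minimal sketch of the rotation step above. The encoding of the angle classes is an assumption (the disclosure does not specify it); here class k means the image needs a counter-clockwise rotation of 90°·k to be upright.

```python
import numpy as np

def rotate_by_class(img, angle_class):
    """Rotate an image upright given a coarse angle class.

    angle_class is assumed to be the counter-clockwise multiple of 90
    degrees needed to undo the tilt (illustrative convention only)."""
    return np.rot90(img, k=angle_class, axes=(0, 1))
```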
Optionally, the image segmentation model includes a target detection layer and an image segmentation layer, the target detection layer is configured to perform feature extraction and target detection on the target image to obtain target feature information and a target detection frame, and the image segmentation layer is configured to obtain the image segmentation result map according to the target feature information and the target detection frame.
Optionally, the image segmentation layer is configured to obtain the image segmentation result graph according to the target feature information and the target detection frame, and includes:
the image segmentation layer is used for determining first target feature information corresponding to the target detection frame in the target feature information, calculating a probability value of each pixel point in the first target feature information, and obtaining the image segmentation result graph according to the probability value of each pixel point.
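The per-pixel probability step can be sketched as below. The sigmoid-plus-threshold form is an assumption — the disclosure only says a probability value is computed for each pixel and a result map derived from it.

```python
import numpy as np

def logits_to_mask(logits, thresh=0.5):
    """Turn per-pixel logits from a segmentation head into a binary
    foreground map: sigmoid to get a probability per pixel, then
    threshold (both choices are illustrative assumptions)."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs >= thresh
```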
Optionally, before the acquiring the target image, the method further includes generating an image segmentation model, including:
acquiring a first sample image and a first target detection frame containing a target object in the first sample image;
according to the first sample image and the first target detection frame, performing model training on a target detection layer in the image segmentation model to obtain a first target detection layer;
acquiring a second sample image and a second target segmentation map containing a target object in the second sample image;
and performing model training on a first target detection layer and an image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation image.
Optionally, the performing model training on the target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer includes:
inputting the first sample image into a target detection layer in the image segmentation model to obtain a first predicted target detection frame;
determining a first loss function according to the first predicted target detection box and the first target detection box;
and updating the parameters of the target detection layer according to the first loss function to obtain a first target detection layer.
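The predict / loss / update cycle above can be illustrated with a deliberately tiny numerical stand-in. The real forward pass is a full Yolo network and the patent does not fix the loss formula; the bias-vector "model", the L2 loss, and the learning rate below are all illustrative assumptions.

```python
import numpy as np

# Toy stand-in for "predict box -> loss vs. labelled box -> update":
# a single parameter vector b is nudged toward the labelled box by the
# gradient of a squared-error loss.
label = np.array([50.0, 60.0, 20.0, 30.0])  # (x, y, w, h) ground-truth frame
b = np.zeros(4)                             # "network parameters"
lr = 0.1
for _ in range(100):
    pred = b                                # forward pass (degenerate model)
    grad = 2.0 * (pred - label)             # gradient of ||pred - label||^2
    b -= lr * grad                          # parameter update
```

After enough steps the prediction converges to the label, which is the convergence behaviour the training step relies on.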
Optionally, the performing model training on the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map includes:
inputting the second sample image into the first target detection layer to obtain second feature information and a second prediction target detection frame corresponding to the second sample image;
inputting the second feature information and the second prediction target detection frame into the image segmentation layer in the image segmentation model to obtain a second prediction target segmentation map;
determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map and the second target segmentation map;
and updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
In a second aspect, an embodiment of the present disclosure provides an image recognition apparatus, including:
the device comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring a target image, and the target image comprises one or more target objects.
And the image segmentation module is used for obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by utilizing a pre-trained image segmentation model according to the target image.
And the image cutting module is used for cutting the target image according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to the target object.
And the image recognition module is used for determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, which includes a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method as described above.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described method.
The embodiments of the present disclosure provide an image recognition method and apparatus, an electronic device, and a storage medium. A target image is obtained, a pre-trained image segmentation model is used to obtain a target detection frame and an image segmentation result map corresponding to the detection frame, the target image is cropped according to the detection frame and the segmentation result map to obtain a target area corresponding to the target object, and the content of the target object in the target area is determined by an optical character recognition algorithm to obtain a recognition result. Because the target detection algorithm and the image segmentation algorithm are combined when cropping the target image, the boundary of each topic can be accurately distinguished and a target area containing a single topic obtained; overlap between the target areas of different topics is effectively reduced, each target area is recognized accurately, interfering text during single-topic recognition is avoided, and the accuracy of image recognition is effectively improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a network architecture diagram of a target detection algorithm provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a block selection result based on a target detection algorithm according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of an application scenario provided by the embodiment of the present disclosure;
fig. 4 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure;
FIG. 5 is a diagram of a network structure of an image segmentation model provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure;
fig. 7 is a flowchart of an image recognition method according to an embodiment of the present disclosure;
fig. 8 is a flowchart of an image recognition method provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram of an image recognition method according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram of a framing result of an image recognition method according to an embodiment of the disclosure;
fig. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Existing target detection algorithms are mainly single-stage methods, such as the real-time target detection algorithm Yolo (You Only Look Once), which can locate a target quickly but can hardly segment topic boundaries accurately, while the topic content in actually captured images is prone to tilt and distortion. As a result, existing target detection algorithms struggle to obtain single topic areas, and the accuracy of search results is low.
The current Yolo series is the family of target detection networks with the best balance of speed and precision, having evolved from Yolo v1 to Yolo v5, with speeds of up to 140 FPS. Fig. 1 is a network structure diagram of a target detection algorithm provided in an embodiment of the present disclosure. The Yolo network structure 100 shown in Fig. 1 includes an input layer 110, a trunk layer 120, a sampling layer 130, and an output layer 140. The input layer 110 performs data enhancement (Mosaic) and adaptive anchor-frame calculation on the data input to the network and feeds the processed feature map to the trunk layer 120, which applies slicing and convolutional processing (adopting the Focus and CSP structures); the sampling layer 130 up-samples and down-samples the feature map output by the trunk layer 120, and the output layer 140 computes the accuracy of the detection results and outputs the detection result frames.
Taking the network structure of Fig. 1 as an example, three features of different scales are obtained through the input layer 110, the trunk layer 120, and the sampling layer 130. Assuming the width and height of the input image are 512, the output feature sizes are 64, 32, and 16, and the channel counts of the three features are 256, 512, and 1024, respectively. The convolution layers in the output layer 140 then turn the three features into all candidate detection results at the three scales, where each output frame contains probability information and coordinate information; coordinates are represented as (x, y, w, h), i.e. the center-point coordinates, width, and height of the rectangular frame. All candidate results are finally passed through non-maximum suppression (NMS) in the output layer 140 to obtain the detection result frames with the highest accuracy.
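The NMS step mentioned above can be sketched as a greedy filter over scored boxes. This is a minimal generic implementation for corner-form (x1, y1, x2, y2) boxes, not the exact code of any Yolo release:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box overlapping it above iou_thresh, repeat.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```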
However, due to the limitations of the network design, the coordinates output by a Yolo detection model form an axis-aligned rectangle, while topics are easily tilted in captured images. If each detected piece of topic information is represented by a regular rectangular box, the boxes of multiple topics will overlap; as shown in Fig. 2, the framing of the topics overlaps, and a cropped topic area also contains other topic information (for example, the content of box B in Fig. 2 contains part of the information of C). The result of the optical character recognition (OCR) algorithm will then also contain text from other topics, and when that text is relatively extensive, the recognition result of the current topic is likely to be affected.
Specifically, an image recognition method may be performed by a terminal or a server. Specifically, the terminal or the server may perform target detection and image segmentation on the target object in the target image through the image segmentation model. The execution subject of the training method of the image segmentation model and the execution subject of the image recognition method may be the same or different.
For example, in one application scenario, as shown in FIG. 3, server 320 trains an image segmentation model. The terminal 310 obtains the trained image segmentation model from the server 320, and the terminal 310 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model. The target image may be captured by the terminal 310. Alternatively, the target image is obtained by the terminal 310 from another device. Still alternatively, the target image is an image obtained by image processing of a preset image by the terminal 310, where the preset image may be obtained by shooting by the terminal 310, or the preset image may be obtained by the terminal 310 from another device. Here, the other devices are not particularly limited.
In another application scenario, the server 320 trains the image segmentation model. Further, the server 320 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model. The manner in which the server 320 acquires the target image may be similar to the manner in which the terminal 310 acquires the target image as described above, and will not be described herein again.
In yet another application scenario, the terminal 310 trains an image segmentation model. Further, the terminal 310 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model.
It can be understood that the image segmentation model training method and the image recognition method provided by the embodiments of the present disclosure are not limited to the several possible scenarios described above. Since the trained image segmentation model can be applied to the image recognition method described below, the image segmentation model training method can be described below before the image recognition method is described.
Taking the example of training the image segmentation model by the server 320, a method for training the image segmentation model, that is, a training process of the image segmentation model, is described below. It is understood that the image segmentation model training method is equally applicable to the scenario in which the terminal 310 trains the image segmentation model.
Fig. 4 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure. The image segmentation model includes a target detection layer and an image segmentation layer. In the image segmentation model network structure 500 shown in Fig. 5, the target detection layer consists of the input layer 110, trunk layer 120, sampling layer 130, and output layer 140 of the Yolo target detection network described above, and is used to perform feature extraction and target detection on the target image to obtain target feature information and a target detection frame; the image segmentation layer comprises a segmentation layer 150 for obtaining the image segmentation result map according to the target feature information and the target detection frame. As shown in Fig. 4, the method includes the following steps:
s410, acquiring a first sample image and a first target detection frame containing a target object in the first sample image.
In this embodiment, the first sample image may specifically be an image containing one or more pieces of topic information; correspondingly, the target object may be each piece of topic information in the first sample image, and the first target detection frame may be the image in which each piece of topic information in the first sample image is framed, where the framing result of each topic on the first sample image is accurate.
Optionally, the first sample image may be an image containing one or more pieces of topic information captured by a terminal, or an image obtained through operations such as screenshots or downloads. The topic information may specifically be a mathematics topic, a language topic, or the like, and may also be the content of an article, newspaper, or web page containing segments of text, to which the image recognition method described in this embodiment can likewise apply text recognition; this is not limited here.
And S420, performing model training on a target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer.
Understandably, the target detection layer in the image segmentation model is trained with the first sample image obtained in S410 and the first target detection frame serving as the label, yielding the trained first target detection layer, where the target detection layer may be built on the target detection network (Yolo) described above.
Optionally, the specific implementation step of S420 includes: inputting the first sample image into a target detection layer in the image segmentation model to obtain a first predicted target detection frame; determining a first loss function according to the first predicted target detection box and the first target detection box; and updating the parameters of the target detection layer according to the first loss function to obtain a first target detection layer.
Understandably, the first sample image is input into the constructed target detection layer to obtain a first predicted target detection frame, which is the image obtained when the target detection layer, i.e. the Yolo network, frames the target object (the topic information). A first loss function of the target detection layer is then determined from the first predicted target detection frame and the first target detection frame serving as the label; the specific formula of the first loss function is not limited and can be chosen according to the input image. The parameters of the target detection layer are gradually updated according to the first loss function, so that the first target detection layer with updated network parameters is obtained and stored.
S430, acquiring a second sample image and a second target segmentation map containing a target object in the second sample image.
In this embodiment, the second sample image may specifically be an image containing one or more pieces of topic information, and the target object may be each piece of topic information it contains. The second target segmentation map may specifically be the result map obtained by segmenting the target object within a second target detection frame that frames a piece of topic information in the second sample image, i.e. separating the target object in the frame as foreground from the background. The number of second target segmentation maps obtained equals the number of pieces of topic information in the second sample image, i.e. each piece of topic information in a second detection frame is segmented, and the segmentation results of the second target segmentation maps are accurate.
S440, according to the second sample image and the second target segmentation image, performing model training on a first target detection layer and an image segmentation layer in the image segmentation model.
Optionally, the image segmentation layer is configured to determine first target feature information corresponding to the target detection box in the target feature information, calculate a probability value of each pixel in the first target feature information, and obtain the image segmentation result graph according to the probability value of each pixel.
Understandably, in the S440, the first target detection layer and the image segmentation layer obtained in the S420 in the image segmentation model are subjected to model training by using the second sample image and the second target segmentation map obtained in the S430, so as to generate the image segmentation model.
In the image segmentation model training method provided by the embodiment of the present disclosure, the target detection layer in the image segmentation model is first trained with a first sample image and its corresponding first target detection frame; the resulting first target detection layer and the image segmentation layer are then trained jointly with a second sample image and its corresponding second target segmentation map to obtain the image segmentation model. Using the pre-trained target detection layer together with new sample images for joint training of the detection and segmentation layers lets the network converge continuously, which further improves training precision, accelerates model convergence, and keeps training stable; it also ensures that, after the image segmentation layer is added, the accuracy of the model is not lower than that of the original target detection layer, so the accuracy of the image segmentation model is effectively guaranteed.
Fig. 6 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, performing model training on the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map specifically includes the steps shown in fig. 6:
S610, inputting the second sample image into the first target detection layer to obtain second feature information and a second predicted target detection frame corresponding to the second sample image.
It can be understood that the first target detection layer is the network layer that has already been trained and updated with the first sample image; it is used to extract the second feature information (the image features of the second sample image) and the second predicted target detection frame. Therefore, the second predicted target detection frame obtained by the trained first target detection layer frames the question information in the second sample image with relatively high accuracy, which facilitates the training of the image segmentation layer in the image segmentation model.
S620, inputting the second feature information and the second predicted target detection frame into the image segmentation layer in the image segmentation model to obtain a second predicted target segmentation map.
Understandably, the second feature information and the second predicted target detection frame obtained in S610 are input into the image segmentation layer in the image segmentation model. The image segmentation layer is used to determine the target feature information corresponding to the second predicted target detection frame in the second feature information, calculate a probability value for each pixel in the target feature information, and obtain the second predicted target segmentation map according to these probability values.
Optionally, as shown in fig. 5, the segmentation layer 150 may include a convolution layer, a regional feature aggregation layer, and an instance segmentation layer. The regional feature aggregation layer (ROI Align) may be used to determine the target feature information corresponding to the second predicted target detection frame in the second feature information and to scale that target feature information; preferably, the target region is scaled to a fixed size of 7 × 7. The instance segmentation layer (mask predictor) then calculates a probability value for each pixel in the 7 × 7 target region and performs image segmentation to obtain the second predicted target segmentation maps, whose number is the same as the number of target objects selected by the second predicted target detection frames.
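The ROI Align idea — resampling the region inside a detection frame to a fixed 7 × 7 grid with bilinear interpolation — can be sketched as below. This is a simplified single-channel illustration under stated assumptions, not the patent's network: real ROI Align averages several sample points per output cell and operates on multi-channel feature maps, while this sketch takes one bilinear sample at each cell centre; the box format `(y1, x1, y2, x2)` is an assumption.

```python
import numpy as np

def bilinear(fmap, y, x):
    """Bilinearly interpolate a (H, W) feature map at fractional (y, x),
    assuming the point lies inside the map."""
    h, w = fmap.shape
    y0, x0 = min(int(y), h - 2), min(int(x), w - 2)
    dy, dx = y - y0, x - x0
    return (fmap[y0, x0] * (1 - dy) * (1 - dx)
            + fmap[y0, x0 + 1] * (1 - dy) * dx
            + fmap[y0 + 1, x0] * dy * (1 - dx)
            + fmap[y0 + 1, x0 + 1] * dy * dx)

def roi_align(fmap, box, out_size=7):
    """Resample the region box = (y1, x1, y2, x2) of the feature map to a
    fixed out_size x out_size grid, one bilinear sample per cell centre."""
    y1, x1, y2, x2 = box
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear(fmap, y1 + (i + 0.5) * bh, x1 + (j + 0.5) * bw)
    return out

fmap = np.arange(196, dtype=float).reshape(14, 14)  # toy single-channel feature map
pooled = roi_align(fmap, (0.0, 0.0, 14.0, 14.0))    # whole map resampled to 7 x 7
```

Because every detection frame is mapped to the same 7 × 7 size, the mask predictor that follows can use a fixed-shape input regardless of the frame's original size.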
S630, determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map and the second target segmentation map.
Understandably, a second loss function is determined according to the second feature information and the second predicted target detection frame obtained in S610, the second predicted target segmentation map obtained in S620, and the second target segmentation map.
S640, updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
Understandably, according to the second loss function obtained in S630, the parameters of the first target detection layer and the parameters of the image segmentation layer are updated to obtain the image segmentation model, where the first target detection layer is the updated target detection layer.
According to the image segmentation model training method provided by the embodiment of the disclosure, training the image segmentation model with the second sample image allows the parameters of the first target detection layer and the parameters of the image segmentation layer to be updated. On the basis of the already trained target detection layer, repeated iterative training updates both sets of parameters simultaneously, so that the image segmentation model becomes increasingly accurate and converges faster and more stably, which improves the accuracy of the image segmentation model.
Fig. 7 is a flowchart of an image recognition method according to an embodiment of the disclosure. For example, the image recognition method may be performed by the terminal 310. Similarly, the image recognition method may also be performed by the server 320. Specifically, the terminal 310 may obtain a trained image segmentation model from the server 320, and further, the terminal 310 performs image recognition on a target object in the target image according to the trained image segmentation model. Specifically, the method illustrated in fig. 7 includes the following steps:
S710, acquiring a target image, wherein the target image comprises one or more target objects.
Optionally, the target image may specifically refer to an image shot, captured, or received by a user, and it contains one or more target objects. For example, a photographed image containing one or more pieces of question information may be used as the target image, and a target object may be the content of each topic included in the target image, for example, the content corresponding to topic A, topic B, or topic C in fig. 2.
Optionally, the size of the obtained target image is normalized: the height and width of the target image are compared against a preset maximum side length, and the target image is scaled proportionally according to the comparison, so that the longer side of the image is smaller than or equal to the preset maximum side length.
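The proportional-scaling rule can be expressed as a small helper. This is a sketch, assuming a hypothetical maximum side length of 1024 pixels; the patent does not specify a value, and the function name is illustrative.

```python
def scaled_size(width: int, height: int, max_side: int = 1024) -> tuple:
    """Proportionally shrink (width, height) so the longer side does not
    exceed max_side; images already small enough are left unchanged."""
    long_side = max(width, height)
    if long_side <= max_side:
        return width, height
    scale = max_side / long_side
    return round(width * scale), round(height * scale)
```

For example, a 2048 × 1024 image would be halved to 1024 × 512, while an 800 × 600 image would pass through unchanged.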
S720, according to the target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
It can be understood that the image segmentation model trained in the above embodiment performs target detection and image segmentation on the target image obtained in S710 to obtain target detection frames and the image segmentation result maps corresponding to the target detection frames. The target detection frames select all the topics contained in the target image, and the image segmentation result map performs image segmentation on the topic information framed by each target detection frame; that is, the target detection frame of each topic has the same size as its image segmentation result map.
S730, cutting the target image according to the target detection frame and the image segmentation result map to obtain a target area corresponding to the target object.
Understandably, the target image is cut according to the target detection frame and the image segmentation result map obtained in S720 to obtain the target area corresponding to the target object. Optionally, this includes: cutting the target image according to the target detection frame to obtain a first target image; determining, from the image segmentation result map, the maximum connected region of the segmented target object; obtaining the minimum tilted rectangle of the outline according to the pixel points of the outline of the maximum connected region; correcting the first target image according to the tilt angle of the minimum tilted rectangle; and cutting the corrected first target image according to the width and height of the minimum tilted rectangle to obtain the target area corresponding to the target object.
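The first sub-step — cutting the target image by a detection frame — amounts to simple array slicing when the frame is axis-aligned. A minimal sketch, assuming (hypothetically) that the frame is given as pixel coordinates `(x1, y1, x2, y2)` and the image as a NumPy array:

```python
import numpy as np

def crop_by_box(image: np.ndarray, box) -> np.ndarray:
    """Cut the region selected by an axis-aligned detection frame
    box = (x1, y1, x2, y2) out of an (H, W[, C]) image array."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]

img = np.arange(25).reshape(5, 5)        # toy 5 x 5 "image"
region = crop_by_box(img, (1, 1, 4, 3))  # columns 1..3, rows 1..2
```

The tilt correction and minimum-rectangle cropping described next operate on this first target image.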
Understandably, the target image is cut according to the coordinate information of the target detection frame to obtain the first target images corresponding to the target objects in the target image. Then, the foreground segmented out in each image segmentation result map, namely the maximum connected region of the target object, is determined; the number of image segmentation result maps obtained by the image segmentation model is the same as the number of first target images. The minimum tilted rectangle of the outline is obtained according to the pixel points of the outline of the maximum connected region. The minimum tilted rectangle can be represented by a center point coordinate (x, y), the width and height (width, height) of the tilted rectangle, and a tilt angle θ, where θ is the angle formed by rotating the horizontal axis (x axis) counterclockwise until it touches the first edge of the rectangle; the length of that edge is the width, and the length of the other edge is the height. Each first target image is rotated upright according to the tilt angle θ of its minimum tilted rectangle, and the minimum topic area is cut out of the first target image according to the center point and the width and height of the minimum tilted rectangle.
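The tilted-rectangle representation described above — center point, width/height, and tilt angle θ — determines the rectangle's four corners by plain trigonometry. A sketch of that geometry (the function name is illustrative, and the counter-clockwise angle convention is an assumption consistent with the description):

```python
import math

def rect_corners(cx, cy, w, h, theta_deg):
    """Corners of a tilted rectangle given its centre (cx, cy), width/height
    (w, h) and tilt angle in degrees (counter-clockwise), as produced by a
    minimum-area-rectangle fit around a contour."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # rotate the axis-aligned offset by theta, then translate to the centre
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners

axis_aligned = rect_corners(5.0, 5.0, 4.0, 2.0, 0.0)  # theta = 0: ordinary box
```

Rotating the first target image by −θ maps these corners back to an axis-aligned box, after which the minimum topic area can be cut by width and height as in the description.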
It can be understood that, in this embodiment, the finally calculated minimum tilted rectangle information may also be output to the user, and the user cuts out the desired topic area according to this information and returns the final topic area to the terminal or the server for recognition.
S740, determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
It can be understood that the content of the target object in the target region obtained in S730 is determined by using an optical character recognition algorithm to obtain a recognition result. The obtained target region may be an image containing only a single piece of topic information, that is, the content of only one target object.
According to the image recognition method provided by the embodiment of the disclosure, a target image is acquired; a target detection frame and an image segmentation result map corresponding to the target detection frame are obtained by using a pre-trained image segmentation model; the target image is cut according to the target detection frame and the image segmentation result map to obtain a target area corresponding to the target object; and the content of the target object in the target area is determined by using an optical character recognition algorithm to obtain a recognition result. By combining the target detection algorithm and the image segmentation algorithm to segment the target object within the target detection frame and then cut the target image, the boundary of each topic can be accurately distinguished and a target area containing a single topic is obtained. This effectively reduces overlap between the target areas of different topics, so that each target area can be recognized accurately, avoids interfering text when a single topic is recognized, and effectively improves the accuracy of image recognition.
Fig. 8 is a flowchart of an image recognition method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, before obtaining, according to the target image, the target detection frame corresponding to the target object and the image segmentation result map corresponding to the target detection frame by using the pre-trained image segmentation model, the method further includes:
S810, inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result.
Understandably, the acquired target image is input into a pre-trained angle classification model to judge the angle of the target object in the target image, and the target image is rotated according to the determined angle classification result. Preferably, the angle classes determined by the angle classification model may be the four directions of 0, 90, 180 and 270 degrees, and the target image can be corrected according to the angle classification result.
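Once the angle class is known, the correction itself is a fixed 90-degree-multiple rotation. A minimal NumPy sketch, assuming (hypothetically) that the class reports the image's clockwise skew in degrees, which `np.rot90` then undoes counter-clockwise; the patent does not fix this convention:

```python
import numpy as np

def correct_orientation(image: np.ndarray, angle_class: int) -> np.ndarray:
    """Rotate an (H, W[, C]) image back upright given the predicted
    orientation class in {0, 90, 180, 270} (degrees of clockwise skew)."""
    assert angle_class in (0, 90, 180, 270)
    # np.rot90 rotates counter-clockwise, cancelling the clockwise skew
    return np.rot90(image, k=angle_class // 90)

img = np.array([[1, 2], [3, 4]])
upright = correct_orientation(img, 90)
```

If the classifier instead reports counter-clockwise skew, the same helper applies with `k=(360 - angle_class) // 90`.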
Optionally, a convolutional neural network may be selected to construct an angle classification model, and the constructed network is trained to obtain the angle classification model.
S820, obtaining, according to the rotated target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using the pre-trained image segmentation model.
It can be understood that the pre-trained image segmentation model performs target detection and image segmentation on the rotated target image obtained in S810 to obtain a target detection frame framing one or more target objects in the rotated target image and an image segmentation result map corresponding to the target detection frame. The subsequent steps of cutting and recognizing the target objects according to the target detection frame and the image segmentation result map are the same as in the above embodiment and are not repeated here.
According to the image identification method provided by the embodiment of the disclosure, the angle classification is carried out on the target image, the rotated target image is obtained according to the classification result, and the operations such as target framing, image segmentation and image identification are carried out according to the corrected target image, so that the accuracy of topic framing can be effectively improved, and the correctness of topic content identification in the cut target area is ensured.
Fig. 9 is a schematic diagram of an image recognition method according to an embodiment of the present disclosure, and based on the above embodiment, the results obtained in each step of the image recognition method are described with reference to fig. 9 as an example.
Taking fig. 2 as the acquired target image, the topic information contained in topic A, topic B, and topic C is taken as the target objects; each step is described in detail below using topic B in fig. 2 as an example.
According to the target image, a pre-trained image segmentation model is used to obtain target detection frames and the image segmentation result maps corresponding to the target detection frames. As shown in fig. 2, all the topic information contained in the target image is framed to obtain the target detection frames, and an image segmentation result map is obtained for each piece of topic information framed by a target detection frame; for example, fig. 2 contains 3 topic detection frames, so 3 image segmentation result maps are obtained after the image segmentation processing.
Cutting the target image according to the target detection frame and the image segmentation result map to obtain the target area corresponding to the target object may specifically include:
in fig. 2, cutting is performed according to the framing result of each target object to obtain a first target image 910 containing a single target object, i.e. one piece of topic information;
determining, according to the image segmentation result map 920, the maximum connected region of the target object (namely, topic B) segmented out in the image segmentation result map 920;
obtaining the minimum tilted rectangle 940 of the outline according to the pixel points of the outline 930 of the maximum connected region, wherein the gray line in 930 is the outline of the maximum connected region; in the minimum tilted rectangle 940, θ is the tilt angle and the black dot represents the center point. Framing the topics in the target image according to the minimum tilted rectangle information yields the target framing result shown in fig. 10; compared with the result shown in fig. 2, this framing of the topic information is more accurate;
and correcting the corresponding first target image 910 according to the tilt angle of the minimum tilted rectangle to obtain image 950, then cutting out the minimum topic region according to the center point and the width and height of the minimum tilted rectangle, that is, cutting 950 according to the center point and the width and height of the minimum tilted rectangle to obtain the target region 960 corresponding to the target object.
The content of the target object in the target area 960 is determined by using an optical character recognition algorithm, and a recognition result 970, that is, the text content of topic B, is obtained.
According to the image recognition method provided by the embodiment of the disclosure, combining the target detection algorithm and the image segmentation algorithm to cut the target image makes it possible to accurately distinguish the boundary of each topic and obtain a target area containing a single topic. This effectively reduces overlap between the target areas of different topics, enables accurate recognition of each target area, avoids interfering text when a single topic is recognized, and effectively improves the accuracy of image recognition.
Fig. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure. The apparatus 1100 comprises an acquisition module 1101, an image segmentation module 1102, an image cropping module 1103, and an image recognition module 1104.
An obtaining module 1101 is configured to obtain a target image, where the target image includes one or more target objects.
An image segmentation module 1102 is configured to obtain, according to the target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model.
An image clipping module 1103, configured to clip the target image according to the target detection frame and the image segmentation result map, so as to obtain a target area corresponding to the target object.
The image recognition module 1104 is configured to determine the content of the target object in the target area by using an optical character recognition algorithm, so as to obtain a recognition result.
Optionally, the image recognition apparatus 1100 further includes an image rotation module, where the image rotation module is configured to input the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotate the target image according to the angle classification result; and according to the rotated target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
Optionally, the image cropping module 1103 is specifically configured to: cut the target image according to the target detection frame to obtain the first target image; determine, according to the image segmentation result map, the maximum connected region of the target object segmented out in the image segmentation result map; obtain the minimum tilted rectangle of the outline according to the pixel points of the outline of the maximum connected region; and correct the first target image according to the tilt angle of the minimum tilted rectangle and cut the corrected first target image according to the minimum tilted rectangle to obtain the target area corresponding to the target object.
Understandably, the image cropping module 1103 is connected to the image segmentation module 1102 and the acquisition module 1101. It cuts the target image obtained by the acquisition module 1101 according to the target detection frame obtained by the image segmentation module 1102 to obtain a first target image, determines the minimum tilted rectangle according to the image segmentation result map obtained by the image segmentation module 1102, and cuts the first target image according to the corrected minimum tilted rectangle to obtain the target area corresponding to the target object.
Fig. 11 shows an image recognition apparatus provided in an embodiment of the present disclosure, which can be used to implement the technical solutions of the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure; the electronic device 1200 may be the server or the terminal described above. The electronic device provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the image recognition method. As shown in fig. 12, the electronic device 1200 includes: a memory 1210, a processor 1220, and a communication interface 1230, wherein a computer program is stored in the memory 1210 and is configured to be executed by the processor 1220 to perform the image recognition method described above.
In addition, the embodiments of the present disclosure also provide a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the image recognition method as described above is implemented.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An image recognition method, comprising: acquiring a target image, wherein the target image contains one or more target objects; obtaining, according to the target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model; cropping the target image according to the target detection frame and the image segmentation result map to obtain a target area corresponding to the target object; and determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
2. The method according to claim 1, wherein cropping the target image according to the target detection frame and the image segmentation result map to obtain the target area corresponding to the target object comprises: cropping the target image according to the target detection frame to obtain a first target image; determining, according to the image segmentation result map, the maximum connected region of the target object segmented out in the image segmentation result map; obtaining the minimum tilted rectangle of the outline according to the pixel points of the outline of the maximum connected region; rotating the first target image upright according to the tilt angle of the minimum tilted rectangle; and cropping the rotated first target image according to the width and height of the minimum tilted rectangle to obtain the target area corresponding to the target object.
3. The method according to claim 1, wherein before obtaining, according to the target image, the target detection frame corresponding to the target object and the image segmentation result map corresponding to the target detection frame by using the pre-trained image segmentation model, the method further comprises: inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result; and obtaining, according to the rotated target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using the pre-trained image segmentation model.
4. The method according to claim 1, wherein the image segmentation model comprises a target detection layer and an image segmentation layer; the target detection layer is configured to perform feature extraction and target detection on the target image to obtain target feature information and the target detection frame, and the image segmentation layer is configured to obtain the image segmentation result map according to the target feature information and the target detection frame.
5. The method according to claim 4, wherein the image segmentation layer being configured to obtain the image segmentation result map according to the target feature information and the target detection frame comprises: the image segmentation layer being configured to determine first target feature information corresponding to the target detection frame in the target feature information, calculate a probability value of each pixel in the first target feature information, and obtain the image segmentation result map according to the probability value of each pixel.
6. The method according to claim 1, wherein before acquiring the target image, the method further comprises generating the image segmentation model, comprising: acquiring a first sample image and a first target detection frame containing a target object in the first sample image; performing model training on the target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer; acquiring a second sample image and a second target segmentation map containing the target object in the second sample image; and performing model training on the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map.
7. The method according to claim 6, wherein performing model training on the target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain the first target detection layer comprises: inputting the first sample image into the target detection layer in the image segmentation model to obtain a first predicted target detection frame; determining a first loss function according to the first predicted target detection frame and the first target detection frame; and updating the parameters of the target detection layer according to the first loss function to obtain the first target detection layer.
8. The method according to claim 6, wherein performing model training on the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map comprises: inputting the second sample image into the first target detection layer to obtain second feature information and a second predicted target detection frame corresponding to the second sample image; inputting the second feature information and the second predicted target detection frame into the image segmentation layer in the image segmentation model to obtain a second predicted target segmentation map; determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map, and the second target segmentation map; and updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
9. An image recognition apparatus, comprising: an acquisition module, configured to acquire a target image, wherein the target image contains one or more target objects; an image segmentation module, configured to obtain, according to the target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model; an image cropping module, configured to crop the target image according to the target detection frame and the image segmentation result map to obtain a target area corresponding to the target object; and an image recognition module, configured to determine the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
10. An electronic device, comprising: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the method according to any one of claims 1-8 is implemented.
CN202110359351.1A 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium Pending CN113011409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110359351.1A CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110359351.1A CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113011409A true CN113011409A (en) 2021-06-22

Family

ID=76387941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110359351.1A Pending CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113011409A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6301386B1 (en) * 1998-12-09 2001-10-09 Ncr Corporation Methods and apparatus for gray image based text identification
WO2017162069A1 (en) * 2016-03-25 2017-09-28 阿里巴巴集团控股有限公司 Image text identification method and apparatus
CN107609549A (en) * 2017-09-20 2018-01-19 北京工业大学 The Method for text detection of certificate image under a kind of natural scene
CN109697440A (en) * 2018-12-10 2019-04-30 浙江工业大学 A kind of ID card information extracting method
CN110969129A (en) * 2019-12-03 2020-04-07 山东浪潮人工智能研究院有限公司 End-to-end tax bill text detection and identification method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673576A (en) * 2021-07-26 2021-11-19 浙江大华技术股份有限公司 Image detection method, terminal and computer readable storage medium thereof
CN113963351A (en) * 2021-09-08 2022-01-21 厦门天锐科技股份有限公司 Method for improving OCR recognition precision and speed
CN114937047A (en) * 2022-05-19 2022-08-23 深圳市优必选科技股份有限公司 Method, device, device and storage medium for segmenting images
CN114937047B (en) * 2022-05-19 2025-07-08 深圳市优必选科技股份有限公司 Method, device, equipment and storage medium for dividing image
CN115116065A (en) * 2022-06-01 2022-09-27 腾讯科技(深圳)有限公司 Scanning method, apparatus, electronic device, storage medium and program product
WO2024066375A1 (en) * 2022-09-29 2024-04-04 青岛海尔空调器有限总公司 Method and apparatus used by air conditioner for monitoring, and air conditioner and storage medium
CN115546791A (en) * 2022-10-18 2022-12-30 读书郎教育科技有限公司 A method, storage medium and device for frame question recognition based on target detection
CN116664822A (en) * 2023-06-01 2023-08-29 广州阅数科技有限公司 Image target detection method based on automatic graph cutting algorithm

Similar Documents

Publication Publication Date Title
CN113011409A (en) Image identification method and device, electronic equipment and storage medium
US10936911B2 (en) Logo detection
CN109146892B (en) Image clipping method and device based on aesthetics
CN109993040B (en) Text recognition method and device
KR101479387B1 (en) Methods and apparatuses for face detection
CN112541395A (en) Target detection and tracking method and device, storage medium and electronic device
US9721387B2 (en) Systems and methods for implementing augmented reality
US20100194679A1 (en) Gesture recognition system and method thereof
CN110909724B (en) A thumbnail generation method for multi-target images
WO2018103608A1 (en) Text detection method, device and storage medium
CN113313083B (en) Text detection method and device
US11647294B2 (en) Panoramic video data process
CN110460838B (en) Lens switching detection method and device and computer equipment
JP2017211939A (en) Generating device, generating method, and generating program
CN113850238B (en) Document detection method, device, electronic device and storage medium
CN113033256B (en) A training method and device for fingertip detection model
CN112434696A (en) Text direction correction method, device, equipment and storage medium
CN113657369A (en) Character recognition method and related equipment thereof
CN113850805B (en) Multi-document detection method and device, electronic equipment and storage medium
US20170091760A1 (en) Device and method for currency conversion
WO2025261263A1 (en) Document image extraction method and apparatus, and storage medium and electronic device
CN113657370A (en) Character recognition method and related equipment thereof
CN113052162A (en) Text recognition method and device, readable storage medium and computing equipment
CN119048343B (en) A video image stitching method and system based on feature matching
CN114529905A (en) Pinyin identification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210622