
WO2019045101A1 - Image processing device and program - Google Patents

Image processing device and program

Info

Publication number
WO2019045101A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image data
character
detection means
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2018/032635
Other languages
English (en)
Japanese (ja)
Inventor
清晴 相澤
小川 徹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Tokyo NUC
Original Assignee
University of Tokyo NUC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Tokyo NUC filed Critical University of Tokyo NUC
Publication of WO2019045101A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation

Definitions

  • The present invention relates to an image processing apparatus and program.
  • To perform such processing, it has conventionally been considered to identify drawn person portions and dialogue portions using the results of color analysis, character recognition processing, and the like (Non-Patent Document 1).
  • The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an image processing apparatus and program capable of improving, as compared with conventional methods, the recognition accuracy of the frame, face, body, and character (text) portions when cartoon image data is processed using machine learning.
  • The present invention for solving the problems of the conventional example described above is an image processing apparatus comprising: accepting means for receiving cartoon image data; frame detection means in a machine-learned state for detecting, from the image data, a frame portion of the cartoon drawn in the image data;
  • face detection means in a machine-learned state for detecting, from the image data, a face portion drawn in the image data;
  • body detection means in a machine-learned state for detecting, from the image data, a body portion drawn in the image data;
  • character detection means in a machine-learned state for detecting, from the image data, a character (text) portion included in the image data;
  • and detection information generation means for generating information identifying the frame portion detected by the frame detection means, information identifying the face portion detected by the face detection means, information identifying the body portion detected by the body detection means, and information identifying the character portion detected by the character detection means;
  • wherein the information identifying the frame portion, the information identifying the face portion, the information identifying the body portion, and the information identifying the character portion are subjected to predetermined information processing.
  • According to the present invention, it is possible to improve, as compared with conventional methods, the recognition accuracy when recognizing frame, face, body, and character portions from cartoon image data using machine learning processing.
  • FIG. 7 is an explanatory diagram showing a schematic example of processing according to another example of the detection processing unit of the image processing apparatus according to an embodiment of the present invention; a functional block diagram showing a configuration example of the detection processing unit of the image processing apparatus according to the embodiment is also included among the drawings.
  • As illustrated in FIG. 1, the image processing apparatus 1 is configured to include a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and an input/output unit 15.
  • The control unit 11 is a program control device such as a CPU, and executes a program stored in the storage unit 12 to receive cartoon image data and to generate, based on the received cartoon image data, information identifying a frame portion, information identifying a face portion, information identifying a body portion, and information identifying a character (text) portion.
  • In the process of identifying each of these portions, the control unit 11 according to the present embodiment uses a frame detector in a machine-learned state for detecting, from the image data, a frame of the cartoon drawn in the image data;
  • a face detector in a machine-learned state for detecting, from the image data, a face portion drawn in the image data; a body detector in a machine-learned state for detecting, from the image data, a body portion drawn in the image data;
  • and a character detector in a machine-learned state for detecting, from the image data, character portions included in the image data.
  • The control unit 11 then executes predetermined information processing using the generated information identifying the frame portion, the information identifying the face portion, the information identifying the body portion, and the information identifying the character portion.
  • This information processing includes, for example, processing for outputting an image representing each portion, optical character recognition processing for the character string within the range specified by the information identifying a character portion, and division processing for dividing the image data for each frame portion. The operation of the control unit 11 will be described in detail later.
  • The storage unit 12 is a memory device or the like, and holds the program executed by the control unit 11. This program may be provided stored in a computer-readable, non-transitory recording medium and then copied into the storage unit 12.
  • The storage unit 12 also operates as a work memory of the control unit 11.
  • The operation unit 13 is a mouse, a keyboard, or the like; it receives the user's instruction operations and outputs them to the control unit 11.
  • The display unit 14 is, for example, a display, and displays and outputs information in accordance with instructions input from the control unit 11.
  • The input/output unit 15 is, for example, a network interface; it receives data (such as image data) from the outside and outputs the data to the control unit 11.
  • The input/output unit 15 also sends data to external devices and the like in accordance with instructions input from the control unit 11.
  • By executing the program stored in the storage unit 12, the control unit 11 functionally comprises, as illustrated in FIG. 2, a receiving unit 21, a detection processing unit 22, a detection information generation unit 23, and an information processing unit 24.
  • The detection processing unit 22 in turn includes a frame detection unit 31, a face detection unit 32, a body detection unit 33, and a character detection unit 34.
  • The receiving unit 21 receives cartoon image data and outputs the data to the detection processing unit 22.
  • Cartoon image data is generally image data in which face portions (F), body portions (B), and character portions (C) are drawn overlapping one another (FIG. 3), and in which at least one frame (M) is included.
  • The receiving unit 21 enlarges or reduces the cartoon image data to resize it to a size suitable for the input of the neural networks described below.
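  • As a concrete illustration of this resizing step, the following is a minimal sketch assuming the Pillow library and a hypothetical fixed input resolution of 512 x 512 pixels; the actual input size depends on the neural network that is used.

        from PIL import Image

        def resize_for_network(path, size=(512, 512)):
            # Load a cartoon page and resize it to the (assumed) fixed
            # resolution expected by the detector network's input layer.
            page = Image.open(path).convert("RGB")
            return page.resize(size)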
  • The frame detection unit 31 of the detection processing unit 22 has a frame detector in a machine-learned state for detecting, from the image data, a frame of the cartoon drawn in the image data.
  • The frame detector included in the frame detection unit 31 may employ a neural network such as R-CNN (Regions with CNN features) (Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014), Fast R-CNN (Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision, 2015), Faster R-CNN (Ren, Shaoqing, et al.), or an SSD.
  • A detector 40 employing such a neural network, for example an SSD, is configured to include a base network unit 41 and a classifier 42.
  • The base network unit 41 outputs the range of an image that is a candidate for the detection target and a feature amount of the image within that range.
  • Based on the output feature amount, the classifier 42 judges whether or not the detection target (in the case of the frame detection unit 31, a frame line dividing the frames of the cartoon image data) is included in the range of the output image.
  • The detector 40 employing such an SSD or the like performs machine learning using samples of image data in which the range to be detected (in the case of the frame detection unit 31, the range of a shape circumscribing the frame lines dividing the frames of the cartoon image data) has been manually specified.
  • Since specific methods of machine learning and of using the detector 40 are widely known, their detailed description is omitted here.
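  • As a rough illustration of the structure just described, the following PyTorch-style sketch pairs a base network that produces a feature map with a classifier head that predicts, for each anchor at each feature-map location, class confidences and a box offset. The layer sizes, anchor count, and class count are assumptions for illustration, not the configuration actually used by the detector 40.

        import torch
        import torch.nn as nn

        class BaseNetwork(nn.Module):
            # Produces a feature map from the input image (base network unit 41).
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                )

            def forward(self, x):
                return self.features(x)

        class Classifier(nn.Module):
            # For each anchor at each feature-map location, predicts class
            # confidences and a box offset (classifier 42).
            def __init__(self, channels=64, anchors_per_cell=4, num_classes=2):
                super().__init__()
                self.cls = nn.Conv2d(channels, anchors_per_cell * num_classes, 3, padding=1)
                self.loc = nn.Conv2d(channels, anchors_per_cell * 4, 3, padding=1)

            def forward(self, feat):
                return self.cls(feat), self.loc(feat)

        scores, offsets = Classifier()(BaseNetwork()(torch.randn(1, 3, 512, 512)))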
  • The face detection unit 32 has a face detector in a machine-learned state for detecting, from the image data, the face portion of a character drawn in the image data.
  • This face detector can also be realized by a neural network configured by various methods such as an SSD, like the frame detector included in the frame detection unit 31.
  • The face detector performs machine learning using samples of image data in which the range to be detected, a range of a predetermined shape circumscribing the face of a character included in the cartoon image data, has been manually specified.
  • The body detection unit 33 has a body detector in a machine-learned state for detecting, from the image data, the body portion of a character drawn in the image data.
  • This body detector can also be realized by a neural network configured by various methods such as an SSD, like the frame detector included in the frame detection unit 31.
  • The body detector performs machine learning using samples of image data in which the range to be detected, a range of a predetermined shape circumscribing the body of a character included in the cartoon image data, has been manually specified.
  • The character detection unit 34 has a character detector in a machine-learned state for detecting, from the image data, a character (text) portion drawn in the image data.
  • This character detector can also be realized by a neural network configured by various methods such as an SSD, like the frame detector included in the frame detection unit 31.
  • The character detector performs machine learning using samples of image data in which the range to be detected, a range of a predetermined shape circumscribing a character portion included in the cartoon image data, has been manually specified.
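  • The manually specified samples mentioned above can be pictured as a page image paired with circumscribing rectangles; the following sketch shows one possible annotation format, whose field names and coordinates are hypothetical rather than those of any particular dataset.

        # One training sample: a page image plus manually specified
        # circumscribing rectangles for each kind of detection target.
        sample = {
            "image": "page_001.png",
            "boxes": [
                {"label": "frame", "xmin": 12,  "ymin": 20, "xmax": 480, "ymax": 300},
                {"label": "face",  "xmin": 60,  "ymin": 90, "xmax": 140, "ymax": 180},
                {"label": "body",  "xmin": 40,  "ymin": 90, "xmax": 200, "ymax": 290},
                {"label": "text",  "xmin": 320, "ymin": 40, "xmax": 430, "ymax": 150},
            ],
        }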
  • For the cartoon image data received by the receiving unit 21, the detection information generation unit 23 generates information identifying the frame portion detected by the frame detection unit 31, information identifying the face portion detected by the face detection unit 32, information identifying the body portion detected by the body detection unit 33, and information identifying the character portion detected by the character detection unit 34.
  • The information processing unit 24 executes predetermined information processing using the generated information identifying the frame portion, the information identifying the face portion, the information identifying the body portion, and the information identifying the character portion.
  • This information processing includes, for example, processing that performs optical character recognition (OCR) on the image of an identified character portion and outputs the result.
  • The information processing unit 24 may also translate the character string obtained as a result of the optical character recognition into another language by machine translation processing and output the translation, as in the sketch below.
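  • As a sketch of this OCR-and-translate processing, the following assumes Tesseract (via pytesseract) for optical character recognition and takes the machine translation step as an abstract translate() function supplied by the caller; the embodiment does not prescribe particular OCR or translation engines.

        import pytesseract
        from PIL import Image

        def recognize_and_translate(page, text_boxes, translate):
            # Crop each detected character (text) region, run OCR on the crop,
            # and pass the recognized string to the supplied translation function.
            results = []
            for (xmin, ymin, xmax, ymax) in text_boxes:
                crop = page.crop((xmin, ymin, xmax, ymax))
                text = pytesseract.image_to_string(crop, lang="jpn")
                results.append(translate(text))
            return results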
  • An example of the present embodiment has the above configuration and operates as follows.
  • It is assumed here that the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 implemented by the control unit 11 each adopt an SSD and have been machine-learned in advance so as to detect, from image data, the frame, face, body, and character portions of the cartoon drawn in that image data.
  • Taking cartoon image data input from the user (and not included in the machine-learning samples) as the processing target, the image processing apparatus 1 applies the frame detector, the face detector, the body detector, and the character detector in parallel to the image data to be processed, detects the frame portions, the face portions of characters, the body portions, and the character portions, respectively, and obtains information identifying the range of each detected image.
  • The image processing apparatus 1 then performs predetermined information processing using the information identifying the frame portions, the information identifying the face portions, the information identifying the body portions, and the information identifying the character portions; for example, it performs optical character recognition (OCR) on the images of the identified character portions, translates the character strings obtained as a result into another language by machine translation processing, and outputs the translation.
  • In the example described so far, the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 each include a base network and a detector that are independent of one another, but the present embodiment is not limited to this example.
  • The frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 may share one base network.
  • That is, as illustrated in the figure, the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 may comprise a single base network unit 41', machine-learned to output, based on the image data to be processed, the range of an image that is a candidate for the detection target and the feature amount of the image within that range, together with classifiers 42a, 42b, 42c, and 42d provided independently for the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34, respectively.
  • The base network unit 41' and the classifiers 42a, 42b, 42c, and 42d may be neural networks based on the SSD, but the SSD is modified and used in the following respect. In the output stage of an ordinary SSD, a plurality of candidate areas (anchor boxes) for detecting an object are determined in advance (a set of such anchor boxes is called an anchor set; for example, 8732 anchor boxes are provided in each anchor set), and the area containing the object of interest is identified from among the plurality of candidate areas.
  • In this example, four copies of the anchor set are used, one for each of the classifiers 42a, 42b, 42c, and 42d corresponding to the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34.
  • The anchor boxes in the first anchor set A1 are machine-learned on the frame portions of the image data, the anchor boxes in the second anchor set A2 on the face portions, the anchor boxes in the third anchor set A3 on the body portions, and the anchor boxes in the fourth anchor set A4 on the character portions (FIG. 6).
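  • One way to picture this shared configuration, reusing the BaseNetwork and Classifier classes from the earlier sketch (both of which are assumptions), is a single backbone feeding four independent heads, one per anchor set:

        import torch.nn as nn

        class SharedDetector(nn.Module):
            # One base network unit (41') shared by four classifiers
            # (42a-42d): one each for frame, face, body, and text parts.
            def __init__(self, backbone, make_head):
                super().__init__()
                self.backbone = backbone
                self.heads = nn.ModuleDict(
                    {part: make_head() for part in ("frame", "face", "body", "text")}
                )

            def forward(self, x):
                feat = self.backbone(x)
                # Each head predicts over its own anchor set.
                return {part: head(feat) for part, head in self.heads.items()}

        # Example: model = SharedDetector(BaseNetwork(), Classifier)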
  • For the classifier 42a, the error is back-propagated from the output stage so that, when a learning sample is input, a rectangle circumscribing the frame lines of each frame is estimated, updating the parameters of the classifier 42a and the base network unit 41'.
  • For the classifier 42b, the error is back-propagated from the output stage so that, when a learning sample is input, a rectangle circumscribing the face portion of a character is estimated, updating the parameters of the classifier 42b and the base network unit 41'.
  • For the classifier 42c, the error is back-propagated from the output stage so that, when a learning sample is input, a rectangle circumscribing the body of a character is estimated, updating the parameters of the classifier 42c and the base network unit 41'.
  • For the classifier 42d, the error is back-propagated from the output stage so that, when a learning sample is input, a rectangle circumscribing a character (text) portion is estimated, updating the parameters of the classifier 42d and the base network unit 41'.
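  • A sketch of the per-part parameter update described above follows; only the loss of the classifier matching the sample's annotation is back-propagated, which updates that head together with the shared base network unit. The loss_fn argument stands in for the loss discussed next and is an assumption.

        def training_step(model, optimizer, image, part, targets, loss_fn):
            # 'part' names the annotation this sample carries ("frame", "face",
            # "body" or "text"); only the matching head's error is back-propagated,
            # updating that head and the shared base network.
            optimizer.zero_grad()
            outputs = model(image)                  # dict of per-part predictions
            loss = loss_fn(outputs[part], targets)  # loss for the annotated part only
            loss.backward()
            optimizer.step()
            return loss.item()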
  • Here, for the m-th learning sample, g is an integer from 1 to G(m), where G(m) is the number of correct answers (annotated objects) included in the m-th sample, and t(m, g) and B(m, g) denote the g-th correct class of the m-th sample (information indicating whether it is a frame, a face, a body, or a character portion) and its circumscribing rectangle, respectively.
  • The loss function L(z) is set as the sum of the position identification error Lloc(m, z) and the confidence error Lconf(m, z), where z denotes the output of the neural network and A(m, pos) is the index set of the anchor boxes to which objects are assigned for the m-th sample.
  • Lloc(m, z) and Lconf(m, z) are the localization and confidence losses; A(m, neg) is the set of hard negatives, that is, the top k anchor boxes among those not assigned to any object.
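  • For reference, the standard SSD multibox loss, which the notation above parallels, can be written as follows; this is the published SSD formulation and is given here only as an assumption, not necessarily the exact formulation of the embodiment:

        L(z) = \sum_{m} ( L_{loc}(m, z) + L_{conf}(m, z) )

        L_{loc}(m, z) = \sum_{a \in A(m, pos)} \mathrm{smooth}_{L_1}( z^{loc}_{a} - \hat{B}(m, g_a) )

        L_{conf}(m, z) = - \sum_{a \in A(m, pos)} \log z^{conf}_{a, t(m, g_a)} - \sum_{a \in A(m, neg)} \log z^{conf}_{a, 0}

    where g_a is the index of the correct answer assigned to anchor box a, \hat{B}(m, g_a) is the circumscribing rectangle B(m, g_a) encoded as an offset relative to anchor a, and class 0 denotes the background.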
  • With the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 configured as described above, the corresponding classifiers 42a, 42b, 42c, and 42d respectively estimate the information identifying the frame portions, the information identifying the face portions, the information identifying the body portions, and the information identifying the character portions, and predetermined information processing is executed using these.
  • As another example, the control unit 11 need not enlarge or reduce the entire input cartoon image data to resize it to a size suitable for the input of the neural network.
  • Instead, the received cartoon image data may be divided into a plurality of portions based on a predetermined condition, and each portion obtained by the division (partial cartoon image data, hereinafter referred to as partial image data) may be resized to a size suitable for the input of the neural network and output to the detection processing unit 22.
  • The predetermined condition may be, for example, a condition of dividing the original cartoon image data (width w, height h) into 2 × 2 pieces, that is, four areas each having a width w/2 and a height h/2 that do not overlap one another, as in the sketch below. The condition may also be, for example, a condition of dividing at portions where white (the background color) continues, based on the content of the cartoon image data. Furthermore, the predetermined condition may be a condition of dividing into frame units.
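  • A sketch of the 2 × 2 division mentioned above, assuming the Pillow library and non-overlapping quarters of equal size:

        from PIL import Image

        def split_2x2(page):
            # Divide the page into four non-overlapping quarters, each of width
            # w/2 and height h/2, returning each crop together with the offset
            # of its top-left corner in the original page.
            w, h = page.size
            tiles = []
            for ox in (0, w // 2):
                for oy in (0, h // 2):
                    tiles.append(((ox, oy), page.crop((ox, oy, ox + w // 2, oy + h // 2))))
            return tiles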
  • In this case, the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 of the detection processing unit 22 detect the frame portion, the face portion, the body portion, and the character portion from each piece of partial image data.
  • The machine learning processing may likewise be performed using partial image data obtained by such division.
  • For each piece of partial image data, the detection information generation unit 23 generates information identifying the frame portion detected by the frame detection unit 31, information identifying the face portion detected by the face detection unit 32, information identifying the body portion detected by the body detection unit 33, and information identifying the character portion detected by the character detection unit 34, and puts these together to generate information identifying each of the frame, face, body, and character portions in the original cartoon image data.
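  • Putting the per-partial-image results together requires mapping each detection back into the coordinate system of the original page; the following sketch does so for boxes in the hypothetical (xmin, ymin, xmax, ymax) form used in the earlier sketches.

        def merge_tile_detections(tile_results):
            # tile_results: list of ((ox, oy), boxes) pairs, where boxes were
            # detected inside a tile whose top-left corner sits at (ox, oy)
            # in the original page.
            merged = []
            for (ox, oy), boxes in tile_results:
                for (xmin, ymin, xmax, ymax) in boxes:
                    merged.append((xmin + ox, ymin + oy, xmax + ox, ymax + oy))
            return merged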
  • The information processing unit 24 executes the predetermined information processing using the information identifying the frame portions, the information identifying the face portions, the information identifying the body portions, and the information identifying the character portions generated by the detection information generation unit 23.
  • The predetermined condition for division into partial image data may also be a condition of division for each frame.
  • In this case, the control unit 11 may detect frame portions through the operation of the detection processing unit 22 as the frame detection unit 31, and generate partial image data by dividing the image at the detected frame portions.
  • Information specifying each frame portion (a frame line dividing frames, which may include at least one curve such as a circular arc circumscribing the frame) output by the frame detection unit 31 is output to the face detection unit 32, the body detection unit 33, and the character detection unit 34.
  • For each frame portion specified by the information output from the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34 each take the image of that frame portion as partial image data and detect the face portion, the body portion, and the character portion from it. In this example as well, the machine learning processing relating to the face detection unit 32, the body detection unit 33, and the character detection unit 34 may be performed using partial image data obtained by the division.
  • For each frame portion detected by the frame detection unit 31, the detection information generation unit 23 generates information identifying the face portion detected by the face detection unit 32, information identifying the body portion detected by the body detection unit 33, and information identifying the character portion detected by the character detection unit 34, and puts these together to generate information identifying each of the frame, face, body, and character portions in the original cartoon image data.
  • The information processing unit 24 executes the predetermined information processing using the information identifying the frame portions, the information identifying the face portions, the information identifying the body portions, and the information identifying the character portions generated by the detection information generation unit 23.
  • When dividing the image data to be processed, the control unit 11 may also resize the image data before division to a size suitable for the input of the neural network and perform the operation as the detection processing unit 22 on it. That is, as the operations of the frame detection unit 31, the face detection unit 32, the body detection unit 33, and the character detection unit 34, the control unit 11 detects the frame portions, face portions, body portions, and character portions from the image data before division.
  • The control unit 11 stores information identifying the frame portions, face portions, body portions, and character portions detected here, then further divides the image data to be processed into a plurality of portions based on the predetermined condition, resizes each piece of partial image data obtained by the division to a size suitable for the input of the neural network, and performs the operation as the detection processing unit 22 on it as well.
  • Using the information identifying the frame portions, face portions, body portions, and character portions detected from the image data before division together with the information identifying the face portions, body portions, and character portions detected from each piece of partial image data after division, the detection information generation unit 23 generates and outputs information identifying a face portion, a body portion, or a character portion whenever that portion is detected from at least one of the data before division and the data after division (that is, the detection results for each portion are integrated and output).
  • In other words, the so-called logical OR of the face, body, and character portions detected from the image data before division and those detected from the image data after division is taken as the face, body, and character portions detected from the image data to be processed, and information identifying them is output.
  • Since frame portions can generally be detected with higher accuracy than face, body, or character portions, it is considered sufficient for the frame portions to be detected from only one of the image data before division and the image data after division (either may be used); however, for the frame portions as well, the control unit 11 may output information identifying a frame portion when it is detected from at least one of the image data before division and the image data after division.
  • Although a single division mode has been described here, plural types of partial image data may be generated by division in plural types of division modes.
  • In that case, information identifying the frame portions, face portions, body portions, and character portions detected from any one or more of the partial image data obtained by division in the plural modes, such as partial image data obtained by division for each frame and partial image data obtained by division into 2 × 2 (the image data before division may also be added), may be output. In this case as well, when duplication occurs, the duplicates are excluded before output, for example as in the sketch below.
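  • One possible way to exclude such duplicates, given as an assumption rather than as the method prescribed by the embodiment, is to take the union of the detection sets and drop boxes that overlap an already kept box beyond an intersection-over-union threshold:

        def iou(a, b):
            # Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes.
            ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
            iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = ix * iy
            area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
            union = area(a) + area(b) - inter
            return inter / union if union else 0.0

        def union_without_duplicates(boxes_a, boxes_b, thresh=0.5):
            # Logical OR of two detection sets: keep every box from the first set
            # and add a box from the second only if it does not duplicate one
            # already kept (overlap below the threshold).
            kept = list(boxes_a)
            for b in boxes_b:
                if all(iou(b, k) < thresh for k in kept):
                    kept.append(b)
            return kept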
  • As described above, in the present embodiment, independent detectors are provided for the frame portions, the face portions of characters, the body portions, and the character portions, which overlap or are contained within one another in cartoon image data, so that the accuracy of detection can be improved as compared with conventional detection using machine learning.
  • Reference Signs List: 1 image processing apparatus, 11 control unit, 12 storage unit, 13 operation unit, 14 display unit, 15 input/output unit, 21 receiving unit, 22 detection processing unit, 23 detection information generation unit, 24 information processing unit, 31 frame detection unit, 32 face detection unit, 33 body detection unit, 34 character detection unit, 40 detector, 41, 41' base network unit, 42 classifier.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An image processing device that accepts cartoon image data and generates, on the basis of the cartoon image data, information specifying a frame portion, information specifying a face portion, information specifying a body portion, and information specifying a text portion, using results obtained by machine learning so as to specify each of the frame portion, the face portion, the body portion, and the text portion, the information being supplied to prescribed information processing.
PCT/JP2018/032635 2017-09-04 2018-09-03 Image processing device and program Ceased WO2019045101A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017-169632 2017-09-04
JP2017169632A JP2019046253A (ja) 2017-09-04 2017-09-04 Image processing device and program

Publications (1)

Publication Number Publication Date
WO2019045101A1 true WO2019045101A1 (fr) 2019-03-07

Family

ID=65527562

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/032635 Ceased WO2019045101A1 (fr) Image processing device and program

Country Status (2)

Country Link
JP (1) JP2019046253A (fr)
WO (1) WO2019045101A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7626384B2 (ja) * 2020-04-15 2025-02-04 ネットスター株式会社 Trained model, site determination program, and site determination system
JP7324475B1 (ja) 2022-10-20 2023-08-10 株式会社hotarubi Information processing device, information processing method, and information processing program
JP7802902B1 (ja) * 2024-12-12 2026-01-20 株式会社Nttドコモ Information processing device and translation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002229766A (ja) * 2000-11-29 2002-08-16 Eastman Kodak Co Method for sending an image to a terminal with limited display capability
JP2011238043A (ja) * 2010-05-11 2011-11-24 Kddi Corp Summary manga image generation device, program, and method for generating a summary of manga content

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FUJIMOTO, AZUMA ET AL.: "Creation and Analysis of Academic Manga Image Dataset with Annotations", IEICE TECHNICAL REPORT, vol. 116, no. 64, 15 March 2017 (2017-03-15), pages 35 - 40 *
FUKUI, HIROSHI ET AL.: "Research Trends in Pedestrian Detection Using Deep Learning", IEICE TECHNICAL REPORT, vol. 116, no. 366, 17 January 2017 (2017-01-17), pages 37 - 46 *
YANAGISAWA, HIDEAKI ET AL.: "Structural Analysis of Comic Images using Faster R-CNN", PCSJ/IMPS 2016, 30 November 2016 (2016-11-30), pages 80 - 81 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070082A (zh) * 2020-08-24 2020-12-11 西安理工大学 Curved text localization method based on an instance-aware component merging network
CN112070082B (zh) * 2020-08-24 2023-04-07 西安理工大学 Curved text localization method based on an instance-aware component merging network

Also Published As

Publication number Publication date
JP2019046253A (ja) 2019-03-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18852176

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18852176

Country of ref document: EP

Kind code of ref document: A1