
CN114926812B - An adaptive focusing positioning target detection method - Google Patents


Info

Publication number
CN114926812B
CN114926812B
Authority
CN
China
Prior art keywords
layer
target
mask
processing unit
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210501677.8A
Other languages
Chinese (zh)
Other versions
CN114926812A (en)
Inventor
施颖君
石文君
朱冬晨
李嘉茂
张晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202210501677.8A priority Critical patent/CN114926812B/en
Publication of CN114926812A publication Critical patent/CN114926812A/en
Application granted granted Critical
Publication of CN114926812B publication Critical patent/CN114926812B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract


The present invention relates to an adaptive focusing and positioning target detection method, comprising the following steps: receiving an image to be identified; inputting the image to be identified into a target detection model to obtain the position and category of the target in the image to be identified. The target detection model comprises: a feature extraction layer, used to extract the features of the image to be identified; a target category prediction layer, used to perform block operations on each layer of features extracted by the feature extraction layer, and perform category prediction and coefficient prediction on each block; a target positioning prediction layer, used to generate a mask tuple according to the features extracted by the feature extraction layer, and multiply the mask tuple by the coefficients obtained by the target category prediction layer and then sum them to obtain a target mask. The present invention can improve the accuracy of target detection.

Description

Self-adaptive focusing positioning target detection method
Technical Field
The invention relates to the technical field of target detection, in particular to a self-adaptive focusing positioning target detection method.
Background
Current traffic sign detection methods fall broadly into two categories: traditional methods, which identify signs from hand-crafted cues such as shape and color, and deep-learning methods. Deep-learning detectors can be further divided into single-stage and two-stage approaches. A two-stage method first uses a neural network to generate candidate boxes of various sizes and scales at each pixel point, and then judges whether each candidate box contains a target and, if so, which category it belongs to. A single-stage method directly obtains the parameters of the target's bounding box and the corresponding class label from the neural network. Two-stage detectors are generally more accurate than single-stage ones, but at the cost of greater computation.
Compared with other target detection tasks, traffic sign detection rarely needs to handle occlusion; its main difficulties are the visual similarity between different signs and the wide variation in sign size caused by differing viewing distances.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a self-adaptive focusing positioning target detection method that can improve target detection accuracy.
To solve this technical problem, the invention provides a self-adaptive focusing and positioning target detection method, which comprises the following steps:
Receiving an image to be identified;
Inputting the image to be identified into a target detection model to obtain the position and the category of a target in the image to be identified, wherein the target detection model comprises:
the feature extraction layer is used for extracting the features of the image to be identified;
The target category prediction layer is used for carrying out blocking operation on each layer of features extracted by the feature extraction layer, and carrying out category prediction and coefficient prediction on each block;
and the target positioning prediction layer is used for generating a mask tuple according to the characteristics extracted by the characteristic extraction layer, multiplying the mask tuple by the coefficients obtained by the target category prediction layer, and summing the multiplied mask tuple to obtain a target mask.
The feature extraction layer adds an aliased residual structure between adjacent residual structures.
The aliasing residual structure comprises a first processing unit, a second processing unit and a second ReLU activation function layer, wherein the input of the first processing unit is the current-layer feature information, the output of the first processing unit and the low-layer feature information are simultaneously used as the input of the second processing unit, and the output of the second processing unit and the current-layer feature information are used as the input of the second ReLU activation function layer; the first processing unit comprises a 3×3 convolution layer, a first normalization layer and a first ReLU activation function layer connected in sequence, and the second processing unit comprises a 1×1 convolution layer and a second normalization layer connected in sequence.
The number of channels of the mask tuple is the same as the vector dimension of the coefficients obtained by the target class prediction layer.
The total loss function of the target detection model comprises a category loss function, a segmentation loss function and a center point loss function, wherein the center point loss function is a mask-guided center point error function.
The center point loss function is:
$$L_{\text{center}} = \frac{1}{m}\sum_{k=0}^{S^2-1} \mathbb{I}(p_{i,j} > 0)\,\lVert \hat{c}_k - c_k \rVert$$
where m represents the number of positive samples; k represents the index of the S×S blocks from left to right and top to bottom, with i = ⌊k/S⌋ and j = k mod S; \mathbb{I} represents an indicator function that is 1 when p_{i,j} > 0 and 0 otherwise; \hat{c}_k represents the predicted center point position and c_k the true center point position.
Advantageous effects
Compared with the prior art, the invention introduces a residual aliasing module that retains low-level information, which helps to distinguish the similar targets encountered in traffic sign recognition tasks and thereby improves recognition accuracy. By obtaining each independent single-target positioning result as the coefficient-weighted sum of target bases, the invention improves the quality of the target bases and weakens the influence of low-quality bases, alleviating the inter-target aliasing present in current methods. By introducing a mask-guided center point error, the invention avoids the degradation of detection results caused by errors that the segmentation loss does not emphasize.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a network architecture diagram of a target detection model in an embodiment of the invention;
FIG. 3 is a schematic structural view of a feature extraction layer in an embodiment of the invention;
Fig. 4 is a schematic diagram of an aliasing residual structure in an embodiment of the invention.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
The embodiment of the invention relates to a self-adaptive focusing positioning target detection method. As shown in fig. 1, the method comprises the following steps: receiving an image to be identified, and inputting the image to be identified into a target detection model to obtain the position and category of the target in the image. The improvements to the target detection model are: (1) a residual aliasing module is introduced to retain low-level information and thereby improve recognition accuracy; (2) each independent single-target positioning result is obtained as the coefficient-weighted sum of target bases, which weakens the influence of low-quality target bases and alleviates the inter-target aliasing present in existing methods; (3) a mask-guided center point error is introduced to avoid the degradation of detection results caused by errors that the segmentation loss does not emphasize. The detection method of this embodiment can therefore be applied to traffic sign detection; the invention is further described below, taking SOLOv2 as an example.
The main idea of SOLOv2 is to divide the picture into S×S blocks and predict a class for each block (including a background class for blocks containing no object). Each block also predicts an N-dimensional coefficient vector. SOLOv2 refers to the coefficient vectors predicted by all blocks as kernels. A target mask is generated by convolving a kernel with the mask branch. For a large target spanning multiple blocks, SOLOv2 specifies that the position of the target center determines which block is responsible for predicting the target.
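The block-assignment rule above can be sketched as follows. This is an illustrative helper, not code from the patent; the function name and the coordinate normalization are assumptions:

```python
def responsible_cell(cx, cy, img_w, img_h, S):
    """Return the (row, col) of the S x S grid cell responsible for a
    target whose center lies at pixel (cx, cy).

    SOLOv2 assigns each ground-truth object to the grid cell that its
    center falls into; that cell then predicts the object's class and
    its coefficient vector (the 'kernel').
    """
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col
```

For example, with S = 4 a target centered in the middle of a 640×480 image is assigned to cell (2, 2), regardless of how many blocks its extent covers.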
The existing SOLOv2 network structure includes two parts: a feature extraction part and a decoding part.
The feature extraction part mainly consists of a ResNet backbone plus an FPN. ResNet is built from residual blocks and downsampling operations: the residual modules continuously extract features, the downsampling operations produce a new layer of features with a larger receptive field, and the FPN fuses the large-receptive-field features back into the lower-level features.
The decoding section can be divided into two branches. One branch performs a blocking operation on each layer of the FPN output, and each block performs class prediction and coefficient prediction; all the coefficients of a layer together form its kernels. The other branch generates a mask whose number of channels equals the coefficient vector dimension. The mask for each object is generated by convolving its kernel with the mask. One benefit of placing the kernel prediction and the class prediction in the same branch is that the class prediction result can be used to filter the predicted coefficients, discarding blocks that belong to the background class.
As shown in fig. 2, the object detection model in this embodiment includes a feature extraction layer, an object class prediction layer, and an object location prediction layer. The feature extraction layer extracts features of the image to be identified; the object class prediction layer performs a blocking operation on each layer of features extracted by the feature extraction layer and performs class prediction and coefficient prediction on each block; and the object location prediction layer generates a mask tuple from the features extracted by the feature extraction layer, multiplies the mask tuple by the coefficients obtained by the object class prediction layer, and sums the products to obtain an object mask. This embodiment improves on the feature extraction layer and the target location prediction layer of the existing SOLOv2 network architecture.
In this embodiment, the feature extraction layer improves on ResNet: as shown in fig. 3, an aliased residual structure is added between adjacent residual structures to strengthen the transmission of detail features. The introduced residual aliasing structure retains low-level information and thereby improves recognition accuracy.
As shown in FIG. 4, the aliasing residual structure comprises a first processing unit, a second processing unit and a second ReLU activation function layer. The input of the first processing unit is the current-layer feature information; the output of the first processing unit and the low-layer feature information are simultaneously used as the input of the second processing unit; and the output of the second processing unit and the current-layer feature information are used as the input of the second ReLU activation function layer. The first processing unit comprises a 3×3 convolution layer, a first normalization layer and a first ReLU activation function layer connected in sequence, and the second processing unit comprises a 1×1 convolution layer and a second normalization layer connected in sequence. The structure strengthens the detail content of the input features by introducing low-level features into the current module and aliasing them with the input features through the first processing unit; used between different layers of the encoder, it preserves and transfers detail features.
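The dataflow of the aliasing residual structure can be sketched as below. This is a single-channel NumPy illustration, not the patent's implementation: the fusion of the first unit's output with the low-layer features is done here by element-wise addition (the patent does not state the fusion operator), the convolution kernels are fixed placeholders rather than learned weights, and a simple standardization stands in for the normalization layers.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D convolution for a single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def norm(x):
    """Stand-in for a normalization layer: zero mean, unit variance."""
    return (x - x.mean()) / (x.std() + 1e-5)

def aliased_residual(x, low):
    """x: current-layer features, low: low-layer features (same H x W)."""
    k3 = np.full((3, 3), 1.0 / 9.0)          # placeholder 3x3 kernel
    u1 = relu(norm(conv2d_same(x, k3)))      # first processing unit
    fused = u1 + low                         # assumed fusion by addition
    k1 = np.ones((1, 1))                     # placeholder 1x1 kernel
    u2 = norm(conv2d_same(fused, k1))        # second processing unit
    return relu(u2 + x)                      # residual add + second ReLU
```

The key point the sketch shows is the wiring: the low-layer features enter the second processing unit together with the first unit's output, while the skip connection from the current-layer features is closed by the second ReLU.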
From the features extracted by the feature extraction layer, the target positioning prediction layer in this embodiment generates a mask tuple whose number of channels equals the vector dimension of the coefficients obtained by the target class prediction layer, multiplies the mask tuple by those coefficients, and sums the products to obtain a target mask. The mask tuple in this embodiment is similar to the concept of a basis in a vector space and can therefore be regarded as a set of target bases. In a vector space, once a basis is fixed, every vector has a unique representation; in a neural network, however, the number of channels is fixed while the number of targets in the input picture is not, so there is no guarantee that the channel masks are linearly independent. The inventors found that, for a given picture, most channels contribute little to the detection result, so the method obtains the target mask as the coefficient-weighted sum of the mask tuple. This improves the quality of the target bases and weakens the influence of low-quality target bases, alleviating the inter-target aliasing present in the SOLOv2 network.
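The coefficient-weighted combination of the mask tuple can be illustrated with NumPy as follows; the sigmoid-and-threshold post-processing is an assumption for the sketch, not something the patent specifies:

```python
import numpy as np

def combine_target_bases(mask_tuple, coeffs, thresh=0.5):
    """mask_tuple: (N, H, W) array of mask channels (the 'target bases');
    coeffs: (N,) coefficient vector predicted for one grid block.
    The single-target mask is the coefficient-weighted sum over channels,
    followed by an assumed sigmoid + threshold to binarize it.
    """
    logits = np.tensordot(coeffs, mask_tuple, axes=1)  # (H, W) weighted sum
    prob = 1.0 / (1.0 + np.exp(-logits))               # assumed sigmoid
    return (prob > thresh).astype(np.uint8)            # assumed threshold
```

Usage: with two 2×2 bases and coefficient vector (1, 0), only the first base contributes, so a low-quality second base has no influence on the result.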
In terms of the loss function, the target detection model in this embodiment introduces a mask-guided center point error, so its total loss function is composed of a class loss function, a segmentation loss function, and a center point loss function. The center point loss function is:
$$L_{\text{center}} = \frac{1}{m}\sum_{k=0}^{S^2-1} \mathbb{I}(p_{i,j} > 0)\,\lVert \hat{c}_k - c_k \rVert$$
where m denotes the number of positive samples; k indexes the S×S blocks from left to right and top to bottom, with i = ⌊k/S⌋ and j = k mod S; \mathbb{I} is an indicator function that equals 1 when p_{i,j} > 0 and 0 otherwise; \hat{c}_k denotes the predicted center point position and c_k the true center point position, with c = (u_c, v_c), where u and v denote pixel positions. The mask-guided center point error is introduced to avoid the degradation of detection results caused by errors that the segmentation loss does not emphasize.
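A plain-Python sketch of the mask-guided center point loss reads as follows; using the Euclidean distance between predicted and true centers is an assumption consistent with the definitions above:

```python
import math

def center_point_loss(pred_centers, true_centers, positive):
    """pred_centers, true_centers: lists of (u, v) pixel positions, one per
    grid cell k (row-major over the S x S blocks); positive[k] plays the
    role of the indicator I(p_{i,j} > 0). The loss averages the distance
    between predicted and true centers over the m positive cells only,
    so cells that cover no target contribute nothing.
    """
    m = sum(1 for p in positive if p)
    total = 0.0
    for k, pos in enumerate(positive):
        if pos:  # mask-guided: only cells covering a target contribute
            (pu, pv), (tu, tv) = pred_centers[k], true_centers[k]
            total += math.hypot(pu - tu, pv - tv)
    return total / max(m, 1)
```

The guard `max(m, 1)` only protects the sketch against an image with no positive samples; the patent's m is the count of positive samples.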
It is easy to see that the invention introduces a residual aliasing module to retain low-level information, coping with the similarity between targets in traffic sign recognition tasks and thereby improving recognition accuracy. By obtaining each independent single-target positioning result as the coefficient-weighted sum of target bases, the invention improves the quality of the target bases and weakens the influence of low-quality bases, alleviating the inter-target aliasing present in current methods. By introducing a mask-guided center point error, the invention avoids the degradation of detection results caused by errors that the segmentation loss does not emphasize.

Claims (6)

1. The self-adaptive focusing positioning target detection method is characterized by comprising the following steps of:
Receiving an image to be identified;
inputting the image to be identified into a target detection model to obtain the position and the category of a target in the image to be identified;
Wherein the object detection model comprises:
the feature extraction layer is used for extracting the features of the image to be identified;
The target category prediction layer is used for carrying out blocking operation on each layer of features extracted by the feature extraction layer, and carrying out category prediction and coefficient prediction on each block;
and the target positioning prediction layer is used for generating a mask tuple according to the characteristics extracted by the characteristic extraction layer, multiplying the mask tuple by the coefficients obtained by the target category prediction layer, and summing the multiplied mask tuple to obtain a target mask.
2. The adaptive focus positioning target detection method of claim 1, wherein the feature extraction layer adds an aliased residual structure between adjacent residual structures.
3. The adaptive focusing and positioning target detection method according to claim 2, wherein the aliasing residual structure comprises a first processing unit, a second processing unit and a second ReLU activation function layer, the input of the first processing unit is present layer feature information, the output of the first processing unit and low layer feature information are simultaneously used as the input of the second processing unit, the output of the second processing unit and the present layer feature information are used as the input of the second ReLU activation function layer, the first processing unit comprises a 3×3 convolution layer, a first normalization layer and a first ReLU activation function layer which are sequentially connected, and the second processing unit comprises a 1×1 convolution layer and a second normalization layer which are sequentially connected.
4. The adaptive focus location target detection method according to claim 1, wherein the number of channels of the mask tuple is the same as the vector dimension of the coefficients obtained by the target class prediction layer.
5. The adaptive focus positioning object detection method of claim 1, wherein the total loss function of the object detection model comprises a class loss function, a segmentation loss function, and a center point loss function, wherein the center point loss function is a mask-guided center point error function.
6. The adaptive focus positioning target detection method of claim 5, wherein the center point loss function is:
$$L_{\text{center}} = \frac{1}{m}\sum_{k=0}^{S^2-1} \mathbb{I}(p_{i,j} > 0)\,\lVert \hat{c}_k - c_k \rVert$$
wherein m represents the number of positive samples; k represents the index of the S×S blocks from left to right and top to bottom, with i = ⌊k/S⌋ and j = k mod S; \mathbb{I} represents an indicator function that is 1 when p_{i,j} > 0 and 0 otherwise; \hat{c}_k represents the predicted center point position and c_k the true center point position.
CN202210501677.8A 2022-05-09 2022-05-09 An adaptive focusing positioning target detection method Active CN114926812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210501677.8A CN114926812B (en) 2022-05-09 2022-05-09 An adaptive focusing positioning target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210501677.8A CN114926812B (en) 2022-05-09 2022-05-09 An adaptive focusing positioning target detection method

Publications (2)

Publication Number Publication Date
CN114926812A CN114926812A (en) 2022-08-19
CN114926812B true CN114926812B (en) 2025-06-27

Family

ID=82807989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210501677.8A Active CN114926812B (en) 2022-05-09 2022-05-09 An adaptive focusing positioning target detection method

Country Status (1)

Country Link
CN (1) CN114926812B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011334A (en) * 2023-07-10 2023-11-07 浙江工业大学 Pupil center tracking method for assisting cerebral apoplexy primary screening

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110414499A (en) * 2019-07-26 2019-11-05 第四范式(北京)技术有限公司 Text position positioning method and system and model training method and system
CN113936268A (en) * 2021-12-16 2022-01-14 比亚迪股份有限公司 Obstacle detection method for rail vehicle, computer device, and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
KR20090122143A (en) * 2008-05-23 2009-11-26 엘지전자 주식회사 Audio signal processing method and apparatus
JP5375716B2 (en) * 2010-03-31 2013-12-25 ソニー株式会社 Base station, communication system and communication method
CN107622239B (en) * 2017-09-15 2019-11-26 北方工业大学 A Method for Detection of Specified Building Areas in Remote Sensing Images Constrained by Hierarchical Local Structure
US10824862B2 (en) * 2017-11-14 2020-11-03 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN110503112B (en) * 2019-08-27 2023-02-03 电子科技大学 A Small Target Detection and Recognition Method Based on Enhanced Feature Learning
CN111460967B (en) * 2020-03-27 2024-03-22 北京百度网讯科技有限公司 Illegal building identification method, device, equipment and storage medium
CN112232240B (en) * 2020-10-21 2024-08-27 南京师范大学 Road casting object detection and identification method based on optimized cross-over ratio function

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN110414499A (en) * 2019-07-26 2019-11-05 第四范式(北京)技术有限公司 Text position positioning method and system and model training method and system
CN113936268A (en) * 2021-12-16 2022-01-14 比亚迪股份有限公司 Obstacle detection method for rail vehicle, computer device, and storage medium

Also Published As

Publication number Publication date
CN114926812A (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN106845487B (en) End-to-end license plate identification method
Yang et al. Prediction-guided distillation for dense object detection
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN111222513B (en) License plate number recognition method and device, electronic equipment and storage medium
CN119048730B (en) A target detection method for drone perspective
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN113313720B (en) Object segmentation method and device
CN117078656B (en) An unsupervised image quality assessment method based on multimodal cue learning
Zhou et al. Temporal keypoint matching and refinement network for pose estimation and tracking
CN113223011A (en) Small sample image segmentation method based on guide network and full-connection conditional random field
CN119863623A (en) SAM-based self-prompting semantic segmentation method, device and storage medium
CN111368850A (en) Image feature extraction method, image target detection method, image feature extraction device, image target detection device, convolution device, CNN network device and terminal
CN111126401A (en) License plate character recognition method based on context information
CN114926812B (en) An adaptive focusing positioning target detection method
CN116824330A (en) A small-sample cross-domain target detection method based on deep learning
CN116958873A (en) Pedestrian tracking method, device, electronic equipment and readable storage medium
CN118172540A (en) A camouflaged target detection method and system based on pyramid-type visual Transformer
CN114663839B (en) Method and system for re-identifying blocked pedestrians
CN115393868A (en) Text detection method and device, electronic equipment and storage medium
CN118657999B (en) Microalgae image classification method and related device based on feature calibration Transformer
CN113486718A (en) Fingertip detection method based on deep multitask learning
CN113468935A (en) Face recognition method
CN114764787B (en) An automatic focusing method for single-cell mass spectrometry system based on deep learning
CN116958545A (en) Composite material plate damage image segmentation method and device based on deep learning
CN116758552A (en) An end-to-end method for text detection and recognition based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant