
CN111178217A - Method and device for detecting face image - Google Patents

Method and device for detecting face image

Info

Publication number
CN111178217A
Authority
CN
China
Prior art keywords
image
feature
characteristic
sampling
characteristic image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911340156.3A
Other languages
Chinese (zh)
Inventor
周康明
曹磊磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN201911340156.3A priority Critical patent/CN111178217A/en
Publication of CN111178217A publication Critical patent/CN111178217A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a method and device for detecting a face image. A face detection model based on depthwise separable convolution is constructed and deployed to a mobile terminal, so that the mobile terminal can use the deployed face detection model to perform face detection on an acquired image to be detected and obtain the face image it contains. Compared with a traditional deep convolutional neural network, the number of model parameters and the amount of computation are reduced, which lowers the overall detection time of the model. This makes it possible to deploy the face detection model on mobile terminals with limited storage space and computing resources and meets the real-time requirements of face detection. In addition, compared with existing lightweight face detection models, the detection accuracy of the model is also improved.

Figure 201911340156

Description

Method and device for detecting a face image
Technical Field
The present application relates to the field of image recognition, and in particular, to a method and an apparatus for detecting a face image.
Background
Face detection is the basis of tasks such as face recognition, facial keypoint detection, face tracking and facial expression recognition, and has long received attention from both academia and industry. In recent years, face detection technologies have mainly been designed on top of general-purpose object detection frameworks; well-known face detection algorithms include S3FD (Single Shot Scale-invariant Face Detector), PyramidBox, SRN (Selective Refinement Network), DSFD (Dual Shot Face Detector), RetinaFace, and the like. These algorithms usually take a deep convolutional neural network such as VGGNet or ResNet as the backbone network to extract image features and then perform face detection on those features. A face detection model built this way performs well, but its detection time is long, and the real-time requirement is hard to meet even with GPU acceleration. The detection time is long because models like VGGNet or ResNet have a large number of parameters, hence a large amount of computation and large demands on storage space and computing resources. Since the storage space and computing resources of a mobile terminal are very limited, a face detection model built on such a deep convolutional neural network is difficult to deploy directly on a mobile terminal.
Disclosure of Invention
An object of the present application is to provide a method and an apparatus for detecting a face image, which are used to solve the problem that an existing face detection model based on a deep convolutional neural network is difficult to be directly deployed on a mobile terminal.
In order to achieve the above object, the present application provides a method for detecting a face image, wherein the method includes:
constructing a face detection model, wherein the feature extraction network corresponding to the face detection model comprises a plurality of sequentially connected image feature extraction sections, each image feature extraction section comprises a first processing block and a plurality of second processing blocks which are sequentially connected, the first processing block is used for downsampling the input feature image using depth separable convolution before outputting it, and the second processing block is used for performing channel separation on the input feature image and then performing feature extraction on the channel-separated feature image using depth separable convolution before outputting it;
and deploying the face detection model to a mobile terminal, so that the mobile terminal can acquire an image to be detected and perform face detection on it according to the face detection model to obtain the face image in the image to be detected.
Further, constructing a face detection model, comprising:
constructing a feature extraction network and a feature detection network corresponding to the face detection model, wherein the feature extraction network further comprises an image feature pyramid network; the image feature pyramid network performs up-sampling or down-sampling on the feature images output by the image feature extraction sections, and the feature images obtained through the up-sampling or down-sampling form a multi-level image feature pyramid; the feature detection network acquires the image feature pyramid output by the image feature pyramid network and outputs a classification feature image and a regression feature image according to it, wherein the classification feature image gives the probability that each pixel belongs to a face, and the regression feature image gives the information of the face bounding box;
inputting a training image into the feature extraction network to extract the face features, and acquiring a corresponding feature image;
inputting the feature image into the feature detection network for feature detection to obtain a corresponding classification feature image and a corresponding regression feature image;
comparing the classification feature image with the face classification labeled in advance on the training image to determine a classification error;
comparing the regression feature image with the face bounding box labeled in advance on the training image to determine a regression feature error;
adjusting model parameters of the face detection model based on the classification error and the regression feature error;
and when a preset model training stopping condition is met, determining the current model parameters of the face detection model as the trained model parameters of the face detection model.
Further, the image feature pyramid network performing up-sampling or down-sampling on the feature images output by the image feature extraction sections and forming a multi-level image feature pyramid from the resulting feature images includes:
acquiring the feature image output by the last of the plurality of sequentially connected image feature extraction sections, performing channel adjustment on it through a convolution kernel of size 1 × 1, and taking the channel-adjusted feature image as the reference feature image;
down-sampling the reference feature image through a depth separable convolution processing block to obtain a down-sampled feature image;
up-sampling the reference feature image and then performing feature extraction on the resulting feature image through a depth separable convolution processing block to obtain an up-sampled feature image;
and sorting the up-sampled feature image, the reference feature image and the down-sampled feature image by image size to obtain a multi-level image feature pyramid.
Further, down-sampling the reference feature image through a depth separable convolution processing block to obtain a down-sampled feature image further includes:
down-sampling the already down-sampled feature image again through a depth separable convolution processing block to obtain a further down-sampled feature image;
and up-sampling the reference feature image and then performing feature extraction on the resulting feature image through a depth separable convolution processing block to obtain an up-sampled feature image includes:
up-sampling the reference feature image through a bilinear interpolation algorithm to obtain a first feature image;
acquiring the feature image output by the image feature extraction section preceding and connected to the last one, performing channel adjustment on it through a convolution kernel of size 1 × 1, and adding the channel-adjusted feature image to the first feature image pixel by pixel to obtain a second feature image;
and performing feature extraction on the second feature image through a depth separable convolution processing block to obtain an up-sampled feature image.
Further, the depth separable convolution processing block includes:
the device comprises a depth separable convolution layer, a batch normalization layer, a convolution layer with convolution kernel size of 1 x 1, a batch normalization layer and an activation function layer which are connected in sequence, wherein the convolution kernel size of the depth separable convolution layer is 3 x 3.
Further, the first processing block is configured to:
inputting the input feature image into a first depth separable convolution processing block for down-sampling to obtain a first down-sampled feature image;
passing the input feature image through a convolution layer with a kernel size of 1 × 1 and inputting the result into a second depth separable convolution processing block for down-sampling to obtain a second down-sampled feature image;
passing the first down-sampled feature image and the second down-sampled feature image each through a convolution layer with a kernel size of 1 × 1 and concatenating the results to obtain a first concatenated feature image;
and performing random channel mixing on the first concatenated feature image, and outputting a first channel-mixed feature image.
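The "random channel mixing" step corresponds to the channel-shuffle operation popularized by ShuffleNet-style blocks: channels coming from the two branches are interleaved so that later layers see information from both. A minimal pure-Python sketch over a flat channel list (a real implementation would reshape and transpose a tensor; the group count and channel labels here are illustrative):

```python
def channel_shuffle(channels, groups):
    """Interleave a flat list of channels that is ordered group-by-group."""
    n = len(channels)
    assert n % groups == 0
    per_group = n // groups
    # read the list as a (groups x per_group) grid and emit it column by column
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# channels 0-3 from one branch, 4-7 from the other:
print(channel_shuffle(list(range(8)), groups=2))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

No channels are created or dropped; only their order changes, so the operation costs no parameters.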
Further, the second processing block is configured to:
performing channel separation on the input feature image to obtain a channel-separated feature image;
sequentially passing the channel-separated feature image through a convolution layer with a kernel size of 1 × 1, a depth separable convolution processing block and another convolution layer with a kernel size of 1 × 1 to obtain a feature-extracted feature image;
concatenating the channel-separated feature image with the feature-extracted feature image to obtain a second concatenated feature image;
and performing random channel mixing on the second concatenated feature image, and outputting a second channel-mixed feature image.
Further, the feature detection network acquiring the image feature pyramid output by the image feature pyramid network and outputting a classification feature image and a regression feature image according to it includes:
the feature detection network acquiring the feature image of each level of the image feature pyramid output by the image feature pyramid network;
inputting the feature image of each level into a convolution layer with a kernel size of 3 × 3 to obtain a classification feature image whose channel number is reduced to 2;
and inputting the feature image of each level into a convolution layer with a kernel size of 3 × 3 to obtain a regression feature image whose channel number is reduced to 4.
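Both heads keep the spatial size of the pyramid level and only change the channel count, so the output shapes can be summarized as below (channel-first shapes assumed; the 20 × 20 level size is an illustrative example, not a value from the patent):

```python
def detection_head_shapes(h, w):
    # classification: 2 channels per pixel (probability of face vs. background)
    # regression: 4 channels per pixel (face bounding-box information, e.g. offsets)
    return {"classification": (2, h, w), "regression": (4, h, w)}

print(detection_head_shapes(20, 20))
# {'classification': (2, 20, 20), 'regression': (4, 20, 20)}
```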
Based on another aspect of the present application, the present application further provides an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the aforementioned method for detecting a face image.
The present application also provides a computer readable medium on which computer readable instructions are stored, the computer readable instructions being executable by a processor to implement the aforementioned method for detecting a face image.
Compared with the prior art, the scheme provided by the present application constructs a face detection model based on depth separable convolution and deploys the obtained model to a mobile terminal, so that the mobile terminal can use the deployed face detection model to perform face detection on an acquired image to be detected and obtain the face image it contains. Compared with a traditional deep convolutional neural network, the number of model parameters and the amount of computation are reduced, which lowers the overall detection time of the model; the face detection model can therefore be deployed on mobile terminals with limited storage space and computing resources, meeting the real-time requirement of face detection. In addition, compared with existing lightweight face detection models, the detection accuracy of the model is also improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flowchart of a method for detecting a face image according to some embodiments of the present application.
Fig. 2 is a schematic network structure diagram corresponding to a face image detection method according to some preferred embodiments of the present application.
Fig. 3 is a schematic structural diagram of a plurality of image feature extraction segments in a feature extraction network according to some preferred embodiments of the present application.
Fig. 4 is a schematic flowchart of constructing an image feature pyramid according to some preferred embodiments of the present application.
Fig. 5 is a schematic structural diagram of a first processing block according to some embodiments of the present disclosure.
Fig. 6 is a schematic structural diagram of a second processing block according to some embodiments of the present application.
Fig. 7 is a schematic structural diagram of a detection network according to some preferred embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal and the network device each include one or more processors (CPUs), input/output interfaces, network interfaces, and memories.
The memory may include volatile memory, random access memory (RAM) and/or non-volatile memory in the form of a computer readable medium, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media does not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 illustrates a method for detecting a face image according to some embodiments of the present application, where the method specifically includes the following steps:
Step S101, constructing a face detection model, wherein the feature extraction network corresponding to the face detection model comprises a plurality of sequentially connected image feature extraction sections, each image feature extraction section comprises a first processing block and a plurality of second processing blocks which are sequentially connected, the first processing block is used for downsampling the input feature image using depth separable convolution before outputting it, and the second processing block is used for performing channel separation on the input feature image and then performing feature extraction on the channel-separated feature image using depth separable convolution before outputting it;
step S102, deploying the face detection model to a mobile terminal so that the mobile terminal can obtain an image to be detected, and carrying out face detection on the image to be detected according to the face detection model to obtain a face image in the image to be detected.
The scheme is particularly suitable for scenarios where a face image is to be detected on a mobile terminal: a feature extraction network corresponding to the face detection model is constructed using depth separable convolution, the trained face detection model is then deployed to the mobile terminal, and the mobile terminal performs face detection on the acquired image to be detected according to the deployed model to obtain the corresponding face image.
In step S101, a face detection model is first constructed. The network structure corresponding to the face detection model is built on depth separable convolution and may include a feature extraction network and a feature detection network: the feature extraction network extracts face-related features from the image, and the feature detection network performs face detection on the feature images obtained by the feature extraction network to determine the face image therein.
The feature extraction network comprises a plurality of image feature extraction sections (stages). Each image feature extraction section first receives an input feature image, performs feature extraction on it to obtain a new feature image, and then outputs the new feature image. The image feature extraction sections are connected in sequence, and the feature image output by the previous section serves as the input feature image of the next section.
Each image feature extraction section comprises a first processing block and a plurality of second processing blocks which are connected in sequence: the feature image output by the first processing block serves as the input of the first second processing block, and the output of each second processing block serves as the input of the next one. The first processing block down-samples the input feature image using depth separable convolution and outputs the down-sampled feature image. The second processing block first performs channel separation on the input feature image, then performs feature extraction on the channel-separated feature image using depth separable convolution, and outputs the result.
Down-sampling, also known as image reduction, is generally performed to generate a thumbnail of the corresponding image. A feature image of size M × N down-sampled by a factor of S yields an image of size (M/S) × (N/S), where S is typically a common divisor of M and N. Down-sampling a feature image yields high-level semantic features at a lower resolution; because the down-sampled feature image has a smaller width and height, fewer pixels need to be computed, which reduces computational complexity. Here, high-level semantic features refer to the texture structure, semantic information and the like of the image, whereas the initial convolutional layers of a neural network learn low-level shape features, color features and the like.
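As a quick illustration of the size arithmetic above, a minimal sketch (the sizes are made-up examples, not values from the patent):

```python
def downsampled_size(m, n, s):
    """Size of an M x N feature image after down-sampling by a factor S."""
    # S is typically a common divisor of M and N, so the result stays integral
    assert m % s == 0 and n % s == 0
    return m // s, n // s

# a 640 x 480 feature image down-sampled by a factor of 2
h, w = downsampled_size(640, 480, 2)
print(h, w)                    # 320 240
print((h * w) / (640 * 480))   # pixel count shrinks by S*S = 4x -> 0.25
```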
Depth separable convolution separates an ordinary convolution kernel into two separate kernels that perform a depthwise convolution and a pointwise convolution, respectively. Using depth separable convolution greatly reduces the number of multiplications in the convolution computation and therefore greatly reduces the amount of calculation.
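The reduction can be made concrete by counting multiplications. For a K × K convolution mapping C_in input channels to C_out output channels over an H × W feature map, a standard convolution needs H·W·C_out·K²·C_in multiplications, while a depthwise convolution followed by a pointwise 1 × 1 convolution needs H·W·C_in·(K² + C_out). A sketch with illustrative sizes (not taken from the patent):

```python
def standard_conv_mults(h, w, k, c_in, c_out):
    # every output pixel of every output channel needs k*k*c_in multiplies
    return h * w * c_out * k * k * c_in

def depthwise_separable_mults(h, w, k, c_in, c_out):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

std = standard_conv_mults(56, 56, 3, 116, 116)
sep = depthwise_separable_mults(56, 56, 3, 116, 116)
# for K=3 the ratio approaches K*K = 9 as c_out grows
print(round(std / sep, 2))
```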
In some embodiments of the present application, constructing the face detection model may specifically include the following steps:
1) constructing a feature extraction network and a feature detection network corresponding to the face detection model;
the feature extraction network comprises a plurality of image feature extraction sections and also comprises an image feature pyramid network, wherein the image feature pyramid network performs up-sampling or down-sampling according to feature images output by the image feature extraction sections, and the feature images obtained through the up-sampling or down-sampling form a multi-level image feature pyramid.
The feature detection network acquires an image feature pyramid output by the image feature pyramid network, and outputs a classification feature image and a regression feature image according to the image feature pyramid, wherein the classification feature image is used for explaining the probability that pixel points belong to a face, and the regression feature image is used for giving information of a face frame.
2) Inputting a training image into the feature extraction network to extract the face features, and acquiring a corresponding feature image;
3) inputting the feature image into the feature detection network for feature detection to obtain a corresponding classification feature image and a corresponding regression feature image;
4) comparing the classification feature image with the face classification labeled in advance on the training image to determine a classification error;
5) comparing the regression feature image with the face bounding box labeled in advance on the training image to determine a regression feature error;
6) adjusting model parameters of the face detection model based on the classification error and the regression feature error;
7) and when a preset model training stopping condition is met, determining the current model parameters of the face detection model as the trained model parameters of the face detection model.
Fig. 2 shows a schematic structural diagram of a preferred feature extraction network and feature detection network, in which the backbone network and the feature pyramid form the feature extraction network and the detection head is the feature detection network. The backbone network receives an externally input image containing a human face; the corresponding output feature images are obtained through 4 image feature extraction sections (stage1-stage4) and input into the feature pyramid network to generate a feature pyramid consisting of 6 levels of feature images; the feature image corresponding to each level of the feature pyramid is then input into the feature detection network.
In some embodiments of the present application, the feature extraction network further adds an initialization block before the plurality of image feature extraction segments, where the initialization block is configured to perform image initialization on an external input image, and then input an initialized feature image into the image feature extraction segment. Preferably, the initialization block may include a plurality of convolutional layers, a first processing block, and a plurality of second processing blocks, which are sequentially connected.
Fig. 3 shows a schematic structural diagram of the plurality of image feature extraction sections in a preferred feature extraction network. The original image is processed by an initialization block and then fed into the subsequent image feature extraction sections. The initialization block comprises 3 convolution layers with a kernel size of 3 × 3, one processing block B (a first processing block) and 3 processing blocks A (second processing blocks). After the original image passes through the 3 convolution layers of 3 × 3, the resulting feature image has 24 output channels; it is input into processing block B for down-sampling, which reduces the size of the feature image while keeping the number of channels unchanged; the output of processing block B is then passed sequentially through the 3 processing blocks A, which change neither the size nor the number of channels. Preferably, processing block B down-samples by a factor of 2, so the feature image output by the initialization block is 1/2 the size of the original image.
Each of the following 4 image feature extraction sections (stage1-stage4) consists of one processing block B and 3 processing blocks A. From the first image feature extraction section to the last, the number of channels of the output feature image is twice that of the previous section: for example, stage1 has 58 channels, stage2 has 116, stage3 has 232 and stage4 has 464. The size of the output feature image is 1/2 that of the previous section: for example, the output of stage1 is 1/4 the size of the original image, stage2 is 1/8, stage3 is 1/16 and stage4 is 1/32. The number of processing blocks A in an image feature extraction section may be set according to actual service requirements, for example 2 or 4: when a stage consists of 1 processing block B and 2 processing blocks A, the amount of network computation is reduced but the accuracy is slightly worse; when a stage consists of 1 processing block B and 4 processing blocks A, the amount of network computation increases but the accuracy is relatively higher.
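The channel and size progression above can be tabulated in a few lines. The channel counts (24 after the initialization block, then 58/116/232/464) and the cumulative down-sampling factors (1/2 down to 1/32) are the ones stated in the text, while the 640 × 640 input size is an assumed example:

```python
def stage_plan(input_size=640):
    """(channels, height, width) after the init block and each stage."""
    channels = {"init": 24, "stage1": 58, "stage2": 116, "stage3": 232, "stage4": 464}
    strides = {"init": 2, "stage1": 4, "stage2": 8, "stage3": 16, "stage4": 32}
    return {name: (channels[name],
                   input_size // strides[name],
                   input_size // strides[name])
            for name in channels}

for name, (c, h, w) in stage_plan().items():
    print(f"{name}: {c} channels, {h}x{w}")
```

Note that the channels exactly double only from stage1 onward; the jump from the 24-channel initialization block to the 58-channel stage1 is a one-off widening.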
In addition, in order to pre-train the backbone network of fig. 3 on the ImageNet data set, the backbone network is further connected with a convolution layer with a kernel size of 1 × 1, followed by a global average pooling operation and finally a fully connected layer that outputs confidences for 1000 classes.
In some embodiments of the present application, the image feature pyramid network performs up-sampling or down-sampling according to the feature image output by the image feature extraction section, and forms a multi-level image feature pyramid from the feature image obtained through the up-sampling or down-sampling, which may specifically include the following steps:
1) acquiring the feature image output by the last of the plurality of sequentially connected image feature extraction segments, performing channel adjustment on it through a convolution kernel of size 1 × 1, and taking the channel-adjusted feature image as the reference feature image;
2) down-sampling the reference feature image through a depth separable convolution processing block to obtain a down-sampled feature image;
3) up-sampling the reference feature image, then performing feature extraction on the resulting feature image through a depth separable convolution processing block to obtain an up-sampled feature image;
4) sorting the up-sampled feature images, the reference feature image, and the down-sampled feature images by image size to obtain a multi-level image feature pyramid.
Upsampling, also called image magnification, mainly aims to enlarge an image and usually adopts interpolation, that is, a suitable interpolation algorithm inserts new pixels between the existing pixels of the original image. Upsampling the feature image through the upsampling layer increases the resolution of the feature image and thereby improves the ability to predict the position of the face region.
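As a concrete illustration of bilinear interpolation, here is a minimal pure-Python 2x upsampler. This is our own sketch: the half-pixel coordinate mapping below is one common convention, not necessarily the exact one used in the patent.

```python
import math

def bilinear_upsample2x(img):
    """img: list of rows of floats; returns the 2x-enlarged image via bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for oy in range(2 * h):
        for ox in range(2 * w):
            # map the output pixel centre back to input coordinates (half-pixel convention)
            fy = (oy + 0.5) / 2 - 0.5
            fx = (ox + 0.5) / 2 - 0.5
            y0 = min(max(int(math.floor(fy)), 0), h - 1)
            x0 = min(max(int(math.floor(fx)), 0), w - 1)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy = min(max(fy - y0, 0.0), 1.0)  # clamp weights at the borders
            wx = min(max(fx - x0, 0.0), 1.0)
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            out[oy][ox] = top * (1 - wy) + bot * wy
    return out

print(bilinear_upsample2x([[0.0, 2.0], [4.0, 6.0]]))  # a 2x2 image becomes 4x4
```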
Fig. 4 shows a preferred procedure for constructing an image feature pyramid from the feature images output by the image feature extraction segments. The last image feature extraction segment in the figure is stage4, whose output feature image has 464 channels. This feature image passes through an ordinary convolutional layer with a 1 × 1 convolution kernel, which adjusts the number of channels of the output feature image to 116; the output is the reference feature image, P5 in the figure. P5 can be both down-sampled and up-sampled: down-sampling yields feature images of smaller size, and up-sampling yields feature images of larger size. Down-sampling P5 through a depth separable convolution processing block yields the feature image P6.
In some embodiments of the present application, the down-sampled feature image may itself be down-sampled again through a depth separable convolution processing block; the result is also a down-sampled feature image. For example, further down-sampling the feature image P6 through a depth separable convolution processing block yields the down-sampled feature image P7.
In some embodiments of the present application, the depth separable convolution processing block may include, connected in sequence: a depth separable convolutional layer, a batch normalization (BN) layer, a convolutional layer with a 1 × 1 convolution kernel, another batch normalization (BN) layer, and an activation function layer (e.g., using the ReLU activation function), with the output of each layer serving as the input to the next. Here, the convolution kernel size of the depth separable convolutional layer is 3 × 3. The numbers of input and output channels of the depth separable convolution processing block can be determined as needed; as shown in fig. 4, the block has 116 input channels and 116 output channels, and the default downsampling multiple is 1, i.e., no downsampling is performed. When downsampling is needed, setting the downsampling multiple stride to 2 performs 2-fold downsampling; for example, passing P5 through a depth separable convolution processing block with the downsampling multiple set to 2 yields the feature image P6, whose size is 1/2 that of P5.
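To make the efficiency of this factorization concrete, a rough parameter count can be sketched. The helper functions are ours; the formulas are the standard ones for a 3 × 3 depthwise layer plus a 1 × 1 pointwise layer (bias and BN parameters omitted), and the 116-channel figure comes from the text:

```python
# Parameter counts: standard 3x3 convolution vs. the depthwise-separable
# factorization (3x3 depthwise + 1x1 pointwise) described above.
def standard_conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out                # one k x k filter per (in, out) channel pair

def depthwise_separable_params(c_in, c_out, k=3):
    depthwise = k * k * c_in                   # one k x k filter per input channel
    pointwise = c_in * c_out                   # 1x1 convolution mixes channels
    return depthwise + pointwise

std = standard_conv_params(116, 116)           # 121104
sep = depthwise_separable_params(116, 116)     # 14500
print(std, sep, round(std / sep, 1))           # roughly an 8x parameter saving
```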
In some embodiments of the present application, up-sampling the reference feature image and then performing feature extraction on the resulting feature image through a depth separable convolution processing block to obtain the up-sampled feature image may include the following steps:
1) up-sampling the reference feature image through a bilinear interpolation algorithm to obtain a first feature image;
2) acquiring the feature image output by the image feature extraction segment preceding and connected to the last image feature extraction segment, performing channel adjustment on it through a convolution kernel of size 1 × 1, and adding the channel-adjusted feature image to the first feature image pixel by pixel to obtain a second feature image;
3) performing feature extraction on the second feature image through a depth separable convolution processing block to obtain the up-sampled feature image.
As shown in fig. 4, the reference feature image P5 is up-sampled by a factor of 2 through a bilinear interpolation algorithm to obtain a first feature image. The segment preceding the last image feature extraction segment stage4 is stage3; the feature image output by stage3 is input into a convolutional layer with a 1 × 1 convolution kernel. Because the feature image output by stage3 has 232 channels while the depth separable convolution processing block accepts feature images with 116 channels, the channel count of stage3's output must be reduced: the convolutional layer adjusts the input feature image from 232 channels to 116. The channel-adjusted feature image is added element by element to the first feature image to obtain a second feature image, and feature extraction is performed on the second feature image through a depth separable convolution processing block DWConv, without downsampling, to obtain the up-sampled feature image P4.
In some embodiments of the present application, the up-sampled feature image may itself be up-sampled again, with feature extraction then performed on the result through a depth separable convolution processing block; the result is likewise an up-sampled feature image. For example, if the feature image P4 is up-sampled to obtain a first feature image, that first feature image is added element by element to the feature image (not shown in the figure) obtained by passing stage2's output through a convolutional layer with a 1 × 1 convolution kernel, yielding a second feature image. Here, since the feature image output by stage2 already has 116 channels, the convolutional layer need not adjust the channel count, and passing the second feature image through a depth separable convolution processing block yields the up-sampled feature image P3. Similarly, when up-sampling P3 to obtain P2, the feature image output by stage1 has 58 channels, so the convolutional layer must adjust the channel count to 116.
The reference feature image P5 is thus down-sampled to obtain P6 and P7 and up-sampled to obtain P4, P3, and P2; arranging P2, P3, P4, P5, P6, and P7 in descending order of image size yields the image feature pyramid shown in fig. 4.
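The resulting pyramid geometry can be summarized by the spatial stride of each level relative to the original image. The helper below is our own illustration, following the construction above: P5 sits at stride 32 (from stage4), each down-sampling doubles the stride, and each 2x up-sampling halves it.

```python
# Illustrative helper (ours, not from the patent): spatial strides of P2-P7.
def pyramid_strides():
    strides = {"P5": 32}
    strides["P6"] = strides["P5"] * 2   # depthwise-separable downsample of P5
    strides["P7"] = strides["P6"] * 2   # second downsample
    strides["P4"] = strides["P5"] // 2  # 2x bilinear upsample, merged with stage3
    strides["P3"] = strides["P4"] // 2  # merged with stage2
    strides["P2"] = strides["P3"] // 2  # merged with stage1
    return strides

h, w = 640, 480  # the 480x640 input size mentioned later in the text
for name, s in sorted(pyramid_strides().items(), key=lambda kv: kv[1]):
    print(name, s, (h // s, w // s))    # e.g. P2 at stride 4 is 160x120, P7 at stride 128 is 5x3
```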
In some embodiments of the present application, the first processing block may be configured to implement the following steps, as shown in fig. 5:
1) inputting the input feature image into a first depth separable convolution processing block for down-sampling to obtain a first down-sampled feature image;
2) passing the input feature image through a convolutional layer with a 1 × 1 convolution kernel, and inputting the resulting feature image into a second depth separable convolution processing block for down-sampling to obtain a second down-sampled feature image;
3) passing the first down-sampled feature image and the second down-sampled feature image through respective convolutional layers with a 1 × 1 convolution kernel and then concatenating them to obtain a first connection feature image;
4) performing random channel mixing on the first connection feature image and outputting a first random-channel-mixed feature image.
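The "random channel mixing" step can be illustrated with a minimal pure-Python channel shuffle. This is an assumption on our part: the patent does not spell out the permutation, and the group-interleaving shuffle below is the common ShuffleNet-style choice.

```python
# Group-interleaving channel shuffle, operating on a list of channel indices.
def channel_shuffle(channels, groups):
    """Split `channels` into `groups` equal groups and interleave them."""
    n = len(channels) // groups
    # conceptually: reshape to (groups, n), transpose to (n, groups), flatten
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

print(channel_shuffle(list(range(8)), 2))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

After two branches are concatenated, this permutation lets information from both branches reach every group in the next layer.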
In some embodiments of the present application, the second processing block may be configured to implement the following steps, as shown in fig. 6:
1) performing channel separation on the input feature image to obtain channel-separated feature images;
2) sequentially passing a channel-separated feature image through a convolutional layer with a 1 × 1 convolution kernel, a depth separable convolution processing block, and another convolutional layer with a 1 × 1 convolution kernel to obtain a feature image after feature extraction;
3) concatenating the channel-separated feature image with the feature image after feature extraction to obtain a second connection feature image;
4) performing random channel mixing on the second connection feature image and outputting a second random-channel-mixed feature image.
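For illustration only, the channel flow of the steps above can be sketched in pure Python. The `f(...)` strings below stand in for the 1 × 1 / depthwise / 1 × 1 branch, and the two-group interleave is an assumed channel mix; none of these names come from the patent.

```python
# Toy channel-flow sketch of the second processing block: split in half,
# "process" one half, concatenate, then interleave the channels.
def second_block(channels):
    half = len(channels) // 2
    passthrough, branch = channels[:half], channels[half:]
    extracted = [f"f({c})" for c in branch]  # stand-in for 1x1 -> depthwise -> 1x1
    merged = passthrough + extracted         # concatenation restores the channel count
    n = len(merged) // 2
    # two-group interleave as the channel mix (an assumption)
    return [merged[g * n + i] for i in range(n) for g in range(2)]

print(second_block([0, 1, 2, 3]))  # [0, 'f(2)', 1, 'f(3)']
```

Note that the input and output channel counts match, which is why a stage can chain several of these blocks without size changes.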
In some embodiments of the present application, the feature detection network obtains an image feature pyramid output by the image feature pyramid network, and outputs a classification feature image and a regression feature image according to the image feature pyramid, which may specifically include the following steps:
1) the feature detection network acquires the feature image of each level in the image feature pyramid output by the image feature pyramid network;
2) the feature image of each level is input into a convolutional layer with a 3 × 3 convolution kernel to obtain a classification feature image whose channel count is reduced to 2;
3) the feature image of each level is input into another convolutional layer with a 3 × 3 convolution kernel to obtain a regression feature image whose channel count is reduced to 4.
Fig. 7 shows a schematic structural diagram of a preferred feature detection network, where Pi is the feature image of any level in the image feature pyramid. The feature image is fed into two ordinary convolutional layers, each with a 3 × 3 convolution kernel. One convolutional layer reduces the number of channels of the input feature image from 116 to 2; the output 2-channel feature image is the classification feature image, in which the value at each pixel position is the probability that the pixel belongs to a face or to the background. The other convolutional layer reduces the number of channels from 116 to 4; the output 4-channel feature image is the regression feature image, which contains the information of the face frame.
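The conversion from the 2-channel classification map to per-pixel probabilities can be illustrated with a softmax over the channel axis at a single pixel. This is our own sketch; the patent only states that the values are probabilities, so treating the two channels as face/background logits is an assumption.

```python
import math

# Per-pixel softmax over the two classification channels (face vs. background).
def pixel_softmax(face_logit, background_logit):
    m = max(face_logit, background_logit)   # subtract the max for numerical stability
    ef = math.exp(face_logit - m)
    eb = math.exp(background_logit - m)
    return ef / (ef + eb)                   # probability that the pixel belongs to a face

print(round(pixel_softmax(2.0, 0.0), 3))    # 0.881
```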
In some embodiments of the present application, the information of the face frame includes a center point, a width, and a height of the face frame.
In some embodiments of the present application, the classification feature error may be determined by a Softmax loss function, and the regression feature error may be determined by a SmoothL1 loss function.
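For reference, the SmoothL1 loss on a single regression value can be written as below. The definition follows the common Fast R-CNN form; whether the patent uses the beta = 1 variant is an assumption.

```python
# SmoothL1: quadratic near zero (|d| < beta), linear beyond, so large box-regression
# errors do not dominate the gradient the way plain L2 would.
def smooth_l1(pred, target, beta=1.0):
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

print(smooth_l1(1.5, 1.0))  # 0.125 (quadratic regime)
print(smooth_l1(3.0, 1.0))  # 1.5   (linear regime)
```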
In step S102, the face detection model is deployed to a mobile terminal, so that the mobile terminal acquires an image to be detected and performs face detection on it according to the face detection model, thereby obtaining the face image in the image to be detected. After the face detection model is trained, its model parameters are fixed and deployed to a mobile terminal, such as a mobile phone or an access control management system, for face detection. The image to be detected carries no pre-labeled pixel classification or face frame information; the face detection model performs face detection on the input image to obtain the corresponding pixel classification and face frame information, from which the face image is obtained.
Some embodiments of the present application also provide an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the aforementioned method of detecting a face image.
Some embodiments of the present application also provide a computer readable medium, on which computer readable instructions are stored, the computer readable instructions being executable by a processor to implement the aforementioned method for detecting a face image.
In summary, the scheme provided by the present application constructs a face detection model based on depth separable convolution and deploys it to a mobile terminal, so that the mobile terminal can use the deployed model to perform face detection on an acquired image and obtain the face image in it. Compared with a traditional deep convolutional neural network, the number of model parameters and the amount of model computation are reduced, the overall detection time is shortened, the model can be deployed on mobile terminals with limited storage space and computing resources, and the real-time requirement of face detection is met. In addition, compared with existing lightweight face detection models, the detection precision is also improved.
In addition, production practice shows that the parameter count of a face detection model constructed by the scheme of the present application can be reduced to 0.85M; when the input picture size is 480x640, the network's computation (FLOPs) is only 5.12G, whereas a face detection model obtained by existing schemes has more than 20M parameters and over 100G of computation. The face detection model constructed by the present method therefore runs faster, especially without GPU acceleration. Trained and tested on a face data set, its single-scale accuracy on the validation set is: easy 91.1%, medium 90.1%, hard 82.3%. In a single-scale test, each picture is input into the network at its original size, without resizing, and detected only once. Compared with the existing lightweight face detection model FaceBoxes, this is a very significant precision advantage: FaceBoxes' single-scale accuracy is easy 79.1%, medium 79.4%, hard 71.5%, with 1.01M parameters and 2.84G of computation. Compared with FaceBoxes, the face detection model constructed here has fewer parameters and higher precision, with only a slight increase in computation.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (10)

1. A method for detecting a face image comprises the following steps:
constructing a face detection model, wherein a feature extraction network corresponding to the face detection model comprises a plurality of image feature extraction sections which are sequentially connected, each image feature extraction section comprises a first processing block and a plurality of second processing blocks which are sequentially connected, the first processing block is used for outputting an input feature image after downsampling by using depth separable convolution, the second processing block is used for carrying out channel separation on the input feature image, and then the feature extraction is carried out on the feature image after channel separation by using the depth separable convolution and then the feature image is output;
and deploying the face detection model to a mobile terminal so that the mobile terminal can acquire an image to be detected, and performing face detection on the image to be detected according to the face detection model to acquire a face image in the image to be detected.
2. The method of claim 1, wherein constructing a face detection model comprises:
constructing a feature extraction network and a feature detection network corresponding to the face detection model, wherein the feature extraction network further comprises an image feature pyramid network, the image feature pyramid network performs up-sampling or down-sampling according to the feature images output by the image feature extraction section, and the feature images obtained through the up-sampling or down-sampling form a multi-level image feature pyramid; the feature detection network acquires an image feature pyramid output by the image feature pyramid network, and outputs a classification feature image and a regression feature image according to the image feature pyramid, wherein the classification feature image is used for explaining the probability that pixel points belong to a face, and the regression feature image is used for giving information of a face frame;
inputting a training image into the feature extraction network to extract the face features, and acquiring a corresponding feature image;
inputting the feature image into the feature detection network for feature detection to obtain a corresponding classification feature image and a corresponding regression feature image;
comparing the classification characteristic image with the face classification labeled in advance by the training image to determine a classification error;
comparing the regression feature image with a face image frame labeled in advance by the training image to determine a regression feature error;
adjusting model parameters of the face detection model based on the classification error and the regression feature error;
and when a preset model training stopping condition is met, determining the current model parameters of the face detection model as the trained model parameters of the face detection model.
3. The method of claim 2, wherein the image feature pyramid network performs up-sampling or down-sampling according to the feature images output by the image feature extraction section, and combines the feature images obtained through the up-sampling or down-sampling into a multi-level image feature pyramid, and the method includes:
acquiring a characteristic image output by the last image characteristic extraction section in the plurality of sequentially connected image characteristic extraction sections, performing channel adjustment on the characteristic image through a convolution kernel with the size of 1 multiplied by 1, and determining the characteristic image after the channel adjustment as a reference characteristic image;
performing down-sampling on the reference characteristic image through a depth separable convolution processing block to obtain a feature image after down-sampling;
the reference characteristic image is subjected to up-sampling, and the obtained characteristic image is subjected to characteristic extraction through a depth separable convolution processing block to obtain an up-sampled characteristic image;
and sorting the up-sampled characteristic image, the reference characteristic image and the down-sampled characteristic image according to the image size to obtain a multi-level image characteristic pyramid.
4. The method of claim 3, wherein downsampling the reference feature image through a depth separable convolution processing block to obtain a downsampled feature image, further comprising:
the feature image after down sampling is subjected to down sampling through a depth separable convolution processing block to obtain a feature image after down sampling;
the method comprises the following steps of up-sampling the reference characteristic image, then extracting the characteristics of the obtained characteristic image through a depth separable convolution processing block, and obtaining the characteristic image after up-sampling, wherein the method comprises the following steps:
the reference characteristic image is subjected to up-sampling through a bilinear interpolation algorithm, and a first characteristic image is obtained;
acquiring a characteristic image output by the previous image characteristic extraction section connected with the last image characteristic extraction section, performing channel adjustment on the characteristic image through a convolution kernel with the size of 1 multiplied by 1, and performing pixel-by-pixel addition on the characteristic image after the channel adjustment and the first characteristic image to acquire a second characteristic image;
and performing feature extraction on the second feature image through a depth separable convolution processing block to obtain an up-sampled feature image.
5. The method of claim 4, wherein the depth separable convolution processing block comprises:
the device comprises a depth separable convolution layer, a batch normalization layer, a convolution layer with convolution kernel size of 1 x 1, a batch normalization layer and an activation function layer which are connected in sequence, wherein the convolution kernel size of the depth separable convolution layer is 3 x 3.
6. The method of claim 5, wherein the first processing block is to:
inputting the input characteristic image into a first depth separable convolution processing block for downsampling to obtain a first downsampled characteristic image;
inputting the characteristic image obtained by the convolution layer with the convolution kernel size of 1 multiplied by 1 of the input characteristic image into a second depth separable convolution processing block for down-sampling to obtain a second down-sampled characteristic image;
respectively connecting the first downsampling characteristic image and the second downsampling characteristic image through convolution layers with convolution kernel size of 1 x 1 to obtain a first connection characteristic image;
and performing random channel mixing on the first connection characteristic image, and outputting a first random channel mixed characteristic image.
7. The method of claim 5, wherein the second processing block is to:
carrying out channel separation on the input characteristic image to obtain a characteristic image after channel separation;
sequentially inputting the characteristic image after channel separation into a convolutional layer with a convolutional kernel size of 1 multiplied by 1, a depth separable convolutional processing block and a convolutional layer with a convolutional kernel size of 1 multiplied by 1 to obtain a characteristic image after characteristic extraction;
connecting the characteristic image after the channel separation with the characteristic image after the characteristic extraction to obtain a second connection characteristic image;
and performing random channel mixing on the second connection characteristic image, and outputting a second random channel mixed characteristic image.
8. The method of claim 2, wherein the obtaining, by the feature detection network, the image feature pyramid output by the image feature pyramid network, and outputting a classification feature image and a regression feature image according to the image feature pyramid comprises:
the feature detection network acquires a feature image of each level in the image feature pyramid output by the image feature pyramid network;
inputting the feature image of each layer into a convolution layer with the convolution kernel size of 3 multiplied by 3 to obtain a classification feature image with the channel number reduced to 2;
inputting the feature image of each layer into a convolution layer with the convolution kernel size of 3 multiplied by 3, and obtaining a regression feature image with the number of channels reduced to 4.
9. An apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the method of any of claims 1 to 8.
10. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 8.
CN201911340156.3A 2019-12-23 2019-12-23 Method and device for detecting face image Pending CN111178217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911340156.3A CN111178217A (en) 2019-12-23 2019-12-23 Method and device for detecting face image


Publications (1)

Publication Number Publication Date
CN111178217A true CN111178217A (en) 2020-05-19

Family

ID=70652062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911340156.3A Pending CN111178217A (en) 2019-12-23 2019-12-23 Method and device for detecting face image

Country Status (1)

Country Link
CN (1) CN111178217A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898615A (en) * 2020-06-16 2020-11-06 济南浪潮高新科技投资发展有限公司 Feature extraction method, device, equipment and medium of object detection model
CN112347936A (en) * 2020-11-07 2021-02-09 南京天通新创科技有限公司 A fast object detection method based on depthwise separable convolution
CN112364831A (en) * 2020-11-30 2021-02-12 姜培生 Face recognition method and online education system
CN112580435A (en) * 2020-11-25 2021-03-30 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN113657245A (en) * 2021-08-13 2021-11-16 亮风台(上海)信息科技有限公司 Method, device, medium and program product for human face living body detection
CN114219810A (en) * 2020-09-04 2022-03-22 Tcl科技集团股份有限公司 Portrait segmentation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150312455A1 (en) * 2013-03-15 2015-10-29 Pelican Imaging Corporation Array Camera Architecture Implementing Quantum Dot Color Filters
CN109002766A (en) * 2018-06-22 2018-12-14 北京邮电大学 A kind of expression recognition method and device
CN109034268A (en) * 2018-08-20 2018-12-18 北京林业大学 A pheromone trap-oriented optimization method for the red beetle beetle detector
CN109685737A (en) * 2018-12-24 2019-04-26 华南农业大学 A kind of image defogging method
CN110009662A (en) * 2019-04-02 2019-07-12 北京迈格威科技有限公司 Method, apparatus, electronic device, and computer-readable storage medium for face tracking


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MA NN ET AL: "《ShuffleNet V2:Practical Guidelines for Efficient CNN Architecture Design》" *
XU CY ET AL: "《Semantic Scene Understanding on Mobile Device with Illumination Invariance for the Visually Impaired》" *
张伟: "《引入全局约束的精简人脸关键点检测网络》" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898615A (en) * 2020-06-16 2020-11-06 济南浪潮高新科技投资发展有限公司 Feature extraction method, device, equipment and medium of object detection model
CN114219810A (en) * 2020-09-04 2022-03-22 Tcl科技集团股份有限公司 Portrait segmentation method and device
CN112347936A (en) * 2020-11-07 2021-02-09 南京天通新创科技有限公司 A fast object detection method based on depthwise separable convolution
CN112347936B (en) * 2020-11-07 2025-11-04 的卢技术有限公司 A Fast Object Detection Method Based on Depthwise Separable Convolution
CN112580435A (en) * 2020-11-25 2021-03-30 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN112580435B (en) * 2020-11-25 2024-05-31 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN112364831A (en) * 2020-11-30 2021-02-12 姜培生 Face recognition method and online education system
CN113657245A (en) * 2021-08-13 2021-11-16 亮风台(上海)信息科技有限公司 Method, device, medium and program product for face liveness detection
CN113657245B (en) * 2021-08-13 2024-04-26 亮风台(上海)信息科技有限公司 Method, device, medium and program product for face liveness detection

Similar Documents

Publication Publication Date Title
CN115187820B (en) Lightweight target detection method, device, equipment, and storage medium
CN111178217A (en) Method and device for detecting face image
US11126862B2 (en) Dense crowd counting method and apparatus
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
JP7045490B2 (en) Image segmentation and segmentation-network training method and apparatus, device, medium, and product
US20200126209A1 (en) System and method for detecting image forgery through convolutional neural network and method for providing non-manipulation detection service using the same
US20210358082A1 (en) Computer-implemented method using convolutional neural network, apparatus for generating composite image, and computer-program product
CN111862127A (en) Image processing method, device, storage medium and electronic device
KR20190039459A (en) Learning method and learning device for improving performance of cnn by using feature upsampling networks, and testing method and testing device using the same
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN111476719A (en) Image processing method, image processing device, computer equipment and storage medium
US20230153965A1 (en) Image processing method and related device
CN112602088A (en) Method, system and computer readable medium for improving quality of low light image
CN113807361A (en) Neural network, target detection method, neural network training method and related products
CN116721334A (en) Training methods, devices, equipment and storage media for image generation models
KR20220143550A (en) Method and apparatus for generating point cloud encoder and method and apparatus for generating point cloud data, electronic device and computer storage medium
EP4222700A1 (en) Sparse optical flow estimation
CN116994000B (en) Methods and apparatus for extracting edge features of parts, electronic equipment and storage media
CN116363037A (en) A multi-modal image fusion method, device and equipment
TW202441464A (en) Generating semantically-labelled three-dimensional models
JP2021120814A (en) Learning program and learning method and information processor
US12079957B2 (en) Modeling continuous kernels to generate an enhanced digital image from a burst of digital images
KR102488858B1 (en) Method, apparatus and program for digital restoration of damaged object
CN116543246A (en) Training method of image denoising model, image denoising method, device and equipment
CN120182612A (en) Image semantic segmentation method, device and medium based on multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20241108