
CN112801027B - Vehicle target detection method based on event camera - Google Patents

Vehicle target detection method based on event camera

Info

Publication number
CN112801027B
CN112801027B CN202110182127.XA
Authority
CN
China
Prior art keywords
dvs
aps
image
event
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110182127.XA
Other languages
Chinese (zh)
Other versions
CN112801027A (en)
Inventor
孙艳丰
刘萌允
齐娜
施云惠
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110182127.XA priority Critical patent/CN112801027B/en
Publication of CN112801027A publication Critical patent/CN112801027A/en
Application granted granted Critical
Publication of CN112801027B publication Critical patent/CN112801027B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle target detection method based on an event camera, applying deep learning technology to detection in extreme scenes. The event camera generates frames and event data asynchronously, which helps overcome motion blur and extreme lighting conditions. First, events are converted into event images; then the frame image and the event image are fed simultaneously into a fusion convolutional neural network, with added convolutional layers that extract features from the event image. A fusion module merges the features of the two streams in the middle layers of the network, and a redesigned loss function improves the effectiveness of vehicle target detection. The method compensates for the shortcoming of using only frame images for target detection in extreme scenes: by fusing event images with frame images in the fusion convolutional neural network, the effect of vehicle target detection in extreme scenes is enhanced.

Description

Vehicle target detection method based on event camera
Technical Field
The invention discloses a vehicle target detection method for extreme scenes, based on an event camera and deep learning technology. It belongs to the field of computer vision and particularly relates to deep learning, target detection, and related technologies.
Background
With the rapid development of the automotive industry, autonomous-driving technology has received extensive attention in recent years from both academia and industry. Vehicle target detection is a challenging task in autonomous driving and an important application in autonomous-vehicle technology and intelligent traffic systems, where it plays a key role. The purpose of vehicle target detection is to accurately locate the other vehicles in the surrounding environment, so that collisions with them are avoided.
A great deal of current target detection research uses deep neural networks to strengthen target detection systems. These studies largely use frame-based cameras known as Active Pixel Sensors (APS). Consequently, most detected objects are stationary or slowly moving, and lighting conditions are suitable. In practice, vehicles encounter a variety of complex and extreme scenarios. Under extreme illumination and motion blur, the image produced by a conventional frame-based camera may be overexposed or blurred, which poses a significant challenge for object detection.
Dynamic Vision Sensors (DVS) have the key features of high dynamic range and low latency. These features enable them to capture environmental information and generate images faster than standard cameras. At the same time, they are unaffected by motion blur, which complements frame cameras in extreme cases. Furthermore, their low latency and short response time can make an autonomous vehicle more responsive. The Dynamic and Active Pixel Vision Sensor (DAVIS) can output regular grayscale frames and asynchronous events through its APS and DVS channels, respectively. Regular grayscale frames provide the primary information for object detection, while asynchronous events provide information about rapid motion and illumination changes. Detection performance can therefore be improved by combining the two kinds of data.
In recent years, deep learning algorithms have achieved great success and have been widely used in image classification and target detection. Deep neural networks have excellent feature extraction capability and strong learning capability, and can identify target categories and locate target positions in recognition tasks. Convolutional Neural Networks (CNNs) based on boundary regression can regress directly from the input image to obtain the location and class of the target without searching for candidate regions. However, this requires that the objects to be discriminated in the image fed into the CNN are sharp, whereas objects in images generated in extreme scenes may be blurred. Using a CNN alone on frame images generated in extreme scenes therefore cannot meet the demand.
A CNN-based vehicle detection method is presented herein that fuses the two kinds of data, frames and events, output by a DAVIS camera. First, the event data are reconstructed into an image; then the frame image and the event image are fed simultaneously into a convolutional neural network, and the features extracted from the event image are fused with those extracted from the frame image in the network's middle layers through a fusion module. At the final detection layer, the loss function of the network is redesigned, adding a loss term for the DVS features. The experiments use a self-built vehicle target detection dataset (Dataset of APS and DVS, DAD). Comparison of different input modes shows that vehicle detection results are significantly improved under different environmental conditions. The proposed method also performs remarkably well compared with other approaches, including networks that input a single image and networks that input both kinds of data simultaneously.
Disclosure of Invention
The invention provides a vehicle target detection method based on an event camera, utilizing deep learning technology. Since an ordinary camera produces motion blur, overexposure, or underexposure in fast-moving and extreme-brightness scenes, the event data generated by the event camera are used to enhance the detection effect. The event camera asynchronously outputs events in response to intensity changes; each event comprises pixel coordinates, an intensity polarity, and a timestamp, so the events are first converted into images. This is done because image-based object detection techniques are now well established, and converting events to images makes detection on events feasible. Then the frame image (APS) and the event image (DVS) are fed simultaneously into a fusion convolutional network framework (ADF) for convolution operations; feature extraction and feature fusion are performed inside this framework. In this way the features of each image are extracted, while the finally extracted features carry effective information from both. Finally, the loss function of the model is modified: a DVS loss term is added to the original APS-only loss. The overall framework of the method is shown in FIG. 1 and can be divided into the following four steps: converting event data into event images, extracting features through the overall framework of the fusion convolutional neural network, fusing features through the fusion module, and performing target detection on the extracted features through the detection layer.
(1) Converting event data into event images
Considering that target detection algorithms for images are relatively mature, the event data of the DVS channel are converted into an image and then fed into the network together with the APS image for target detection. Each event consists of the pixel abscissa x, the pixel ordinate y, the brightness polarity (+1 for an increase, -1 for a decrease), and a timestamp. According to the coordinates and polarity of the pixels, the event data accumulated over a time interval are converted into an event image of the same size as the frame image.
(2) Integral frame for feature extraction
The invention uses Darknet-53 as the basic framework and adds convolutional layers for extracting features from DVS images alongside the convolution operations on APS images. Because the data of the DVS channel are sparse, features are extracted with fewer convolutional layers at each resolution. Following Darknet-53, the DVS channel still employs successive 3×3 and 1×1 convolutional layers. The specific numbers of convolutional layers are shown in Table 1.
(3) Fusion module
In the network architecture, a fusion module is designed with reference to ResNet. After extracting the main features of the DVS at each resolution, the fusion module fuses them with the APS features of the same size, guiding the network to learn more detailed features of the APS and the DVS simultaneously. The fusion module is shown in FIG. 2.
(4) Performing object detection on the extracted features through a detection layer
The loss function of the network is modified at the detection layer. The loss function for the APS features is a cross-entropy loss, including losses on coordinates, class, and confidence; a cross-entropy loss is likewise computed for the DVS features. Finally, the detection results of the APS and the DVS are combined. A detection produced by APS or DVS alone may still be correct, so taking only the intersection of the two would discard many correct detections. Combining (taking the union of) the two results reduces errors and improves accuracy.
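The union-based combination of APS and DVS detections described above can be sketched as follows. This is an illustrative sketch only: the box format (x1, y1, x2, y2, score), the helper names, and the IoU deduplication threshold are assumptions, not details given in the patent.

```python
# Hedged sketch: union of APS and DVS detection results, keeping every APS
# box and adding only those DVS boxes that do not duplicate an APS box.
# Box format (x1, y1, x2, y2, score) and iou_thresh are assumptions.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def merge_detections(aps_boxes, dvs_boxes, iou_thresh=0.5):
    """Take the union: keep all APS boxes, add non-overlapping DVS boxes."""
    merged = list(aps_boxes)
    for d in dvs_boxes:
        if all(iou(d, a) < iou_thresh for a in merged):
            merged.append(d)
    return merged

aps = [(10, 10, 50, 50, 0.9)]   # stationary car, seen by the APS channel
dvs = [(12, 11, 52, 49, 0.8),   # same car, seen by DVS (deduplicated)
       (100, 100, 140, 140, 0.7)]  # fast-moving car, seen only by DVS
result = merge_detections(aps, dvs)
```

An intersection would have kept only the first car; the union keeps the fast mover as well, which is the behavior the text argues for.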
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
Based on the APS images and DVS data generated by the event camera, the invention adopts convolutional neural network technology to detect vehicle targets in extreme scenes. First, compared with using only conventional APS images, the event data are converted into event images so that mature deep learning techniques for images can be applied. A fusion module is then added to the convolutional neural network to fuse the two kinds of information at the feature level. Finally, by revising the loss function, the network's ability to identify targets is improved when the image suffers from problems such as target blur and unsuitable illumination, achieving good results in extreme scenes.
Drawings
FIG. 1 is a diagram of an overall network architecture;
FIG. 2 is a schematic diagram of a fusion module;
FIG. 3 is an experimental effect diagram;
Detailed Description
In light of the foregoing, the following is a specific implementation, but the scope of protection of this patent is not limited to this implementation.
Step 1: converting event data into event images
Based on the generation mechanism of events, there are three reconstruction methods to convert events into a frame: the fixed event number method, the leaky integrator method, and the fixed time interval method. In the present invention the goal is to detect fast-moving objects, so the fixed time interval method is used, setting event reconstruction to a fixed frame length of 10 ms. In each time interval, at the pixel position where an event occurs, an event with increased polarity is drawn as a white pixel and an event with decreased polarity as a black pixel, while the background of the image is gray. Finally, an event image of the same size as the APS image is generated.
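The fixed-time-interval reconstruction above can be sketched in a few lines. This is a minimal sketch under assumptions: the event tuple layout (x, y, polarity, timestamp in microseconds), the canvas values (255/0/128 for white/black/gray), and the function names are illustrative, not from the patent.

```python
# Hedged sketch: accumulate DVS events over a fixed 10 ms window into a
# grayscale event image, as described in Step 1. Event layout is assumed
# to be (x, y, polarity, timestamp_us); names are illustrative.
import numpy as np

def events_to_image(events, width, height, t_start_us, window_us=10_000):
    """ON events -> white (255), OFF events -> black (0), background gray (128)."""
    img = np.full((height, width), 128, dtype=np.uint8)
    t_end = t_start_us + window_us
    for x, y, polarity, t in events:
        if t_start_us <= t < t_end:
            img[y, x] = 255 if polarity > 0 else 0
    return img

# Two events inside the 10 ms window, one outside it:
events = [(10, 5, +1, 1_000), (20, 7, -1, 2_000), (3, 3, +1, 50_000)]
frame = events_to_image(events, width=64, height=48, t_start_us=0)
```

The resulting image has the same spatial size as the APS frame, so the two can be fed into the network together.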
Step 2: feature extraction via a network overall framework
The APS image and the DVS image are input into the network framework simultaneously, and features are extracted through their respective 3×3 and 1×1 convolutional layers; the difference is the number of convolutional layers used, with fewer for DVS than for APS. The network predicts on the input APS image and the DVS image at the same time. Both images are divided into S×S grids, each grid cell predicts B bounding boxes, and C classes are predicted. Each bounding box is modeled with a Gaussian, predicting 8 coordinate values μ_x, ε_x, μ_y, ε_y, μ_w, ε_w, μ_h, ε_h, together with a confidence score p. The tensor fed into the final detection layer therefore holds 2×S×S×B×(C+9) values. Tensors of three sizes from the APS channel and tensors of the same three sizes from the DVS channel are fed into the detection layer.
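The tensor bookkeeping in Step 2 can be checked with a short sketch: each box carries 8 Gaussian coordinate values, one confidence, and C class scores, giving S×S×B×(C+9) values per channel and twice that for the two channels. The example numbers (a 13×13 grid, 3 boxes per cell, one "vehicle" class) are illustrative assumptions, not values stated in the patent.

```python
# Hedged sketch of the detection-layer tensor size described above.
# S, B, C values below are illustrative examples.

def detection_tensor_size(S, B, C, channels=2):
    """Total prediction values for `channels` streams (APS + DVS)."""
    per_box = 8 + 1 + C          # 8 Gaussian coords + confidence + C classes
    return channels * S * S * B * per_box

# e.g. a 13x13 grid, 3 boxes per cell, 1 class ("vehicle"):
size = detection_tensor_size(S=13, B=3, C=1)
```

With C = 1 each box holds 10 values, matching the (C+9) factor in the text.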
Step 3: fusion module
After passing through their respective convolutional layers, the APS and DVS obtain features F_aps and F_dvs, which are sent into the fusion module. First, F_aps and F_dvs undergo a given transformation operation T_c: F → U, F ∈ R^{M′×N′×C′}, U ∈ R^{M×N×C}, U = [u_1, u_2, …, u_C], to obtain the transformed features U_aps and U_dvs, where u_c is the M×N feature matrix of the c-th of the C channels. For simplicity, the T_c operation is taken to be a convolution operation;
After obtaining the transformed feature U_dvs, we consider the global information of all channels in the feature and compress this global information per channel to obtain the aggregate information z_c. This is accomplished by a global average pooling operation T_sq(U_dvs), formally expressed as:
z_c = T_sq(u_c) = (1/(M×N)) Σ_{i=1}^{M} Σ_{j=1}^{N} u_c(i,j) #(1)
where u_c(i,j) is the (i,j)-th value in the feature matrix. To perform the excitation operation on the information z_c aggregated by the squeeze operation, the convolutional feature information of each channel is fused to obtain the channel dependency s, namely:
s = T_ex(z, E) = δ(E_2 σ(E_1 z)) #(2)
where σ represents the ReLU activation function, δ represents the sigmoid activation function, and E_1 and E_2 are two weight matrices; two fully connected layers are used to implement this.
The T_scale operation then rescales U_aps with the activation s to obtain the feature block U′:
U′ = T_scale(U_aps, s) = U_aps · s #(3)
Finally, the feature block is fused with the features of the APS to obtain the final fused feature F_aps′:
F_aps′ = U′ ⊕ U_aps #(4)
The concrete implementation adopts a splicing (concatenation) operation.
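The squeeze-excite-scale-concatenate pipeline of equations (1)-(4) can be sketched with NumPy. This is a minimal sketch under assumptions: the weight shapes, the reduction ratio r, and the random inputs are illustrative; a real implementation would use learned convolutional and fully connected layers.

```python
# Hedged NumPy sketch of the fusion module (equations (1)-(4)):
# squeeze the DVS feature, excite through two FC weights, rescale the
# APS feature, then concatenate. Shapes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(U_aps, U_dvs, E1, E2):
    # (1) squeeze: z_c = mean over the M x N spatial grid of channel c
    z = U_dvs.mean(axis=(0, 1))                  # shape (C,)
    # (2) excite: s = sigmoid(E2 @ relu(E1 @ z))
    s = sigmoid(E2 @ np.maximum(E1 @ z, 0.0))    # shape (C,)
    # (3) scale: U' = U_aps * s (per-channel reweighting)
    U_prime = U_aps * s                          # broadcast over M, N
    # (4) fuse: concatenate U' with U_aps along the channel axis
    return np.concatenate([U_prime, U_aps], axis=-1)

M, N, C, r = 4, 4, 8, 2
U_aps = rng.normal(size=(M, N, C))
U_dvs = rng.normal(size=(M, N, C))
E1 = rng.normal(size=(C // r, C))   # reduction weight
E2 = rng.normal(size=(C, C // r))   # expansion weight
F_fused = fuse(U_aps, U_dvs, E1, E2)
```

The fused feature doubles the channel count, consistent with the splicing (concatenation) operation the text names.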
Step 4: performing object detection on the extracted features through a detection layer
As in the APS part, a DVS detection result is added to the detection layer; binary cross-entropy losses are applied to the objects and classes detected by the DVS, and the negative log-likelihood (NLL) loss of the coordinate frame is as follows:
L_x = −Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} γ_ijk log( N(x_ijk^G | μ_x(b_ijk), ε_x(b_ijk)) + ζ ) #(5)
γ_ijk = ω_scale × δ_ijk^obj #(6)
where L_x is the NLL loss for the x-coordinate of the DVS. W and H are the numbers of grid cells along the width and height, and K is the number of prior frames. The outputs of the detection layer at the k-th prior frame of the (i, j) grid cell are μ_x(b_ijk), representing the x-coordinate, and ε_x(b_ijk), representing the uncertainty of the x-coordinate. x_ijk^G is the Ground Truth of the x-coordinate, calculated as in Gaussian YOLOv3 from the adjusted image width and height and the k-th prior frame. ζ is a fixed value of 10⁻⁹. The losses for the remaining coordinates y, w, and h take the same form as for x.
ω_scale = 2 − w_G × h_G #(7)
ω_scale provides different weights during training according to the object size (w_G, h_G). δ_ijk^obj in equation (6) is a parameter that is applied in the loss only when the prior frame contains the anchor most appropriate for the current object. Its value is 1 or 0, determined by the Intersection over Union (IOU) of the Ground Truth with the k-th prior frame in the (i, j) grid cell.
The value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1; otherwise C_ijk = 0. τ_noobj indicates that the k-th prior frame of the grid cell does not fit a target. A further term represents the correct category, and an indicator marks the k-th prior frame of the grid cell as not responsible for predicting the target.
The category losses are as follows:
p_ij represents the probability that the currently detected target is the correct target.
The loss function of the DVS part is:
where L_DVS represents the sum of the DVS channel's coordinate loss, class loss, and confidence loss.
L_APS and L_DVS are formally identical. The loss function of the whole network is:
L = L_APS + L_DVS #(11)
Adding the loss function of the DVS channel makes the model more robust to data from extreme environments and improves the accuracy of the algorithm.
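The per-coordinate NLL term and the total loss of equation (11) can be illustrated numerically. This is a hedged sketch of a Gaussian-YOLOv3-style coordinate loss, not the patent's exact implementation: the function names and example values are assumptions, and only a single coordinate is shown, without the ω_scale and δ_obj weighting.

```python
# Hedged sketch: negative log-likelihood of a predicted Gaussian coordinate
# (mean mu, variance sigma_sq) at the ground truth gt, stabilized by
# zeta = 1e-9 as in the text. Single-coordinate, unweighted illustration.
import math

def gaussian_nll(mu, sigma_sq, gt, zeta=1e-9):
    """-log( N(gt | mu, sigma_sq) + zeta ) for one predicted coordinate."""
    density = math.exp(-(gt - mu) ** 2 / (2.0 * sigma_sq)) / math.sqrt(
        2.0 * math.pi * sigma_sq
    )
    return -math.log(density + zeta)

# A confident, accurate prediction costs less than a confident, wrong one:
good = gaussian_nll(mu=0.50, sigma_sq=0.01, gt=0.50)
bad = gaussian_nll(mu=0.50, sigma_sq=0.01, gt=0.90)

# Total network loss, as in equation (11): L = L_APS + L_DVS
def total_loss(l_aps, l_dvs):
    return l_aps + l_dvs
```

Because the network also predicts the uncertainty, a wrong prediction with a small predicted variance is penalized heavily, which is the mechanism that makes the extra DVS loss term informative.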
To verify the effectiveness of the proposed solution, experiments were first performed on a custom dataset. A comparison experiment was run across different input modes: only the APS image, only the DVS image, a pixel-level superposition of APS and DVS, and both images input simultaneously; the results are shown in Table 2. The effects of the different input modes are shown in FIG. 3, where each column corresponds to an input mode and each method is shown in four scenes (fast motion, light too strong, light too dark, and normal). In a scene with a fast-moving object, the DVS-only input detects the fast-moving vehicle but may miss a relatively stationary one; conversely, the APS-only input detects the relatively stationary vehicle but misses the fast-moving one. The pixel-superposition input performs about as well as the APS-only input. Inputting both images simultaneously yields good detection for both fast-moving and stationary vehicles. When the illumination is too strong or too dark, neither the APS-only input nor the pixel superposition detects well; by contrast, inputting the APS and DVS images simultaneously fuses the two sets of features well, and the DVS compensates for the shortcomings of the APS. DVS-only detection is worst in the normal scene, because only brightness changes produce information, and regions without brightness change correspond to background that cannot be recognized. Overall, inputting the two images and fusing them in the ADF network is significantly superior to the other methods.
Several state-of-the-art single-input networks and methods were also selected for comparison, as shown in Table 3; all single-image-input comparisons were run on the custom dataset. The table shows that the model of the invention is less effective than the other networks when only a single image is input, because the network itself is designed for dual inputs. When the model inputs the frame and the events simultaneously, the experimental results improve, which also demonstrates that using event data can improve recognition.
In addition, the invention is compared on the PKU-DDD17-CAR dataset with the JDF network, which also inputs both kinds of data; the results are shown in Table 4. The event data in the dataset are converted into images and then sent to the ADF network, and the results of inputting only the frame image versus inputting the frame image and event data simultaneously are compared. Although the network is inferior to JDF when only a frame image is input, it outperforms JDF when both kinds of data are input simultaneously.
Table 1 number of convolutional layers in network frame
TABLE 2 results of experiments on custom datasets
Table 3 results of comparison with Single image input network
Table 4 results of comparison with two different networks of data inputs

Claims (3)

1. The vehicle target detection method based on the event camera is characterized by comprising the following steps of: based on the APS images and DVS data generated by an event camera, adopting convolutional neural network technology to detect vehicle targets in extreme scenes, converting the event data into event images; according to the coordinates and polarity of the pixels, converting the event data accumulated over a time interval into an event image of the same size as the frame image; using a mature convolutional neural network on the basis of the Darknet-53 framework, adding convolutional layers for extracting features from the DVS image alongside the convolution operations on the APS image, the DVS channel still adopting successive 3×3 and 1×1 convolutional layers; then adding a fusion module in the convolutional neural network, which, after extracting the DVS features at each resolution, weights the APS features of the same size so as to guide the network to learn more detailed features of the APS and the DVS simultaneously; modifying the loss function of the network at the detection layer, wherein the loss function of the APS features adopts a cross-entropy loss comprising losses on coordinates, class, and confidence, and a cross-entropy loss likewise performs the loss calculation on the DVS features;
the two sets of features are effectively fused in the fusion module; after passing through their respective convolutional layers, the APS and DVS obtain features F_aps and F_dvs, which are sent into the fusion module; first, F_aps and F_dvs undergo a given transformation operation T_c: F → U, F ∈ R^{M′×N′×C′}, U ∈ R^{M×N×C}, U = [u_1, u_2, …, u_C], to obtain the transformed features U_aps and U_dvs, where u_c is the M×N feature matrix of the c-th of the C channels; for simplicity, the T_c operation is taken to be a convolution operation;
after the transformed feature U_dvs is obtained, the global information of all channels in the feature is considered and compressed per channel to obtain the aggregate information z_c; this is accomplished by a global average pooling operation T_sq(U_dvs), formally expressed as:
z_c = T_sq(u_c) = (1/(M×N)) Σ_{i=1}^{M} Σ_{j=1}^{N} u_c(i,j) (1)
wherein u_c(i,j) is the (i,j)-th value in the feature matrix; to perform the excitation operation on the information z_c aggregated by the squeeze operation, the convolutional feature information of each channel is fused to obtain the channel dependency s, namely:
s = T_ex(z, E) = δ(E_2 σ(E_1 z)) (2)
wherein σ represents the ReLU activation function, δ represents the sigmoid activation function, and E_1 and E_2 are two weight matrices; two fully connected layers are used to implement this;
the T_scale operation then rescales U_aps with the activation s to obtain the feature block U′:
U′ = T_scale(U_aps, s) = U_aps · s (3)
finally, the feature block is fused with the features of the APS to obtain the final fused feature F_aps′:
F_aps′ = U′ ⊕ U_aps (4)
splicing operation is adopted in the concrete implementation;
adding a loss term for the DVS features at the detection layer; as in the APS part, a DVS detection result is added to the detection layer, binary cross-entropy losses are applied to the objects and classes detected by the DVS, and the negative log-likelihood (NLL) loss of the coordinate frame is as follows:
L_x = −Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} γ_ijk log( N(x_ijk^G | μ_x(b_ijk), ε_x(b_ijk)) + ζ ) (5)
γ_ijk = ω_scale × δ_ijk^obj (6)
wherein L_x is the NLL loss for the x-coordinate of the DVS; W and H are the numbers of grid cells along the width and height, and K is the number of prior frames; the outputs of the detection layer at the k-th prior frame of the (i, j) grid cell are μ_x(b_ijk), representing the x-coordinate, and ε_x(b_ijk), representing the uncertainty of the x-coordinate; x_ijk^G is the Ground Truth of the x-coordinate, calculated as in Gaussian YOLOv3 from the adjusted image width and height and the k-th prior frame; ζ is a fixed value of 10⁻⁹; the losses for the remaining coordinates y, w, and h take the same form as for x;
ω_scale = 2 − w_G × h_G (7)
ω_scale provides different weights during training according to the object size (w_G, h_G); δ_ijk^obj in equation (6) is a parameter that is applied in the loss only when the prior frame contains the anchor most appropriate for the current object; its value is 1 or 0, determined by the Intersection over Union (IOU) of the Ground Truth with the k-th prior frame in the (i, j) grid cell;
the value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1; otherwise C_ijk = 0; τ_noobj indicates that the k-th prior frame of the grid cell does not fit a target; a further term represents the correct category, and an indicator marks the k-th prior frame of the grid cell as not responsible for predicting the target;
the category losses are as follows:
p_ij represents the probability that the currently detected target is the correct target;
the loss function of the DVS part is:
wherein L_DVS represents the sum of the DVS channel's coordinate loss, class loss, and confidence loss;
L_APS and L_DVS are formally identical; the loss function of the whole network is:
L = L_APS + L_DVS (11).
2. The event camera-based vehicle target detection method according to claim 1, wherein converting events into an image employs the fixed time interval method; to achieve a detection speed of 100 frames per second (FPS), the frame reconstruction is set to a fixed frame length of 10 ms; in each time interval, at the pixel position where an event occurs, an event with increased polarity is drawn as a white pixel and an event with decreased polarity as a black pixel, while the background of the image is gray; finally, an event image of the same size as the APS image is generated.
3. The event camera-based vehicle object detection method of claim 1, wherein successive 3×3 and 1×1 convolutional layers are added to extract features from the DVS image; the APS image and the DVS image are input into the network framework simultaneously and features are extracted through their respective 3×3 and 1×1 convolutional layers, the difference being the number of convolutional layers used, with fewer for DVS than for APS; the network predicts on the input APS image and the DVS image at the same time; both the APS image and the DVS image are divided into S×S grids, each grid cell predicts B bounding boxes, and C classes are predicted; each bounding box is modeled with a Gaussian, predicting 8 coordinate values μ_x, ε_x, μ_y, ε_y, μ_w, ε_w, μ_h, ε_h; in addition, a confidence score p is predicted; the tensor fed into the final detection layer of the network therefore holds 2×S×S×B×(C+9) values; tensors of three sizes from the APS channel and tensors of the same three sizes from the DVS channel are fed into the detection layer.
CN202110182127.XA 2021-02-09 2021-02-09 Vehicle target detection method based on event camera Active CN112801027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182127.XA CN112801027B (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182127.XA CN112801027B (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Publications (2)

Publication Number Publication Date
CN112801027A CN112801027A (en) 2021-05-14
CN112801027B true CN112801027B (en) 2024-07-12

Family

ID=75815068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182127.XA Active CN112801027B (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Country Status (1)

Country Link
CN (1) CN112801027B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115526814A (en) * 2021-06-25 2022-12-27 华为技术有限公司 Image prediction method and device
US20240348934A1 (en) * 2021-08-24 2024-10-17 The University Of Hong Kong Event-based auto-exposure for digital photography
CN113762409B (en) * 2021-09-17 2024-06-28 北京航空航天大学 A UAV target detection method based on event camera
CN116205838A (en) * 2021-11-30 2023-06-02 鸿海精密工业股份有限公司 Abnormal image detection method, system, terminal equipment and storage medium
CN117372941A (en) * 2022-06-30 2024-01-09 清华大学 An event data processing method and related equipment
CN115497028B (en) * 2022-10-10 2023-11-07 中国电子科技集团公司信息科学研究院 Event-driven-based dynamic hidden target detection and recognition method and device
CN117893856A (en) * 2022-10-14 2024-04-16 华为技术有限公司 Signal processing method, device, equipment, storage medium and computer program
CN115631407B (en) * 2022-11-10 2023-10-20 中国石油大学(华东) Underwater transparent biological detection based on fusion of event camera and color frame image
CN116416602B (en) * 2023-04-17 2024-05-24 江南大学 Moving object detection method and system based on combination of event data and image data
CN116206196B (en) * 2023-04-27 2023-08-08 吉林大学 A multi-target detection method and detection system in marine low-light environment
CN116682000B (en) * 2023-07-28 2023-10-13 吉林大学 Underwater frogman target detection method based on event camera
CN117274321A (en) * 2023-09-26 2023-12-22 北京理工大学 A multi-modal optical flow estimation method based on event cameras
CN120635428B (en) * 2025-08-12 2025-10-24 中国空气动力研究与发展中心计算空气动力研究所 Target detection method based on event data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461083A (en) * 2020-05-26 2020-07-28 青岛大学 A fast vehicle detection method based on deep learning
CN112163602A (en) * 2020-09-14 2021-01-01 湖北工业大学 Target detection method based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685152B (en) * 2018-12-29 2020-11-20 北京化工大学 An Image Object Detection Method Based on DC-SPP-YOLO

Also Published As

Publication number Publication date
CN112801027A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112801027B (en) Vehicle target detection method based on event camera
CN113052210B (en) A fast low-light target detection method based on convolutional neural network
CN115376108B (en) Obstacle detection method and device in complex weather conditions
CN111582201B (en) A Lane Line Detection System Based on Geometric Attention Perception
CN113762409B (en) A UAV target detection method based on event camera
CN111832453B (en) Real-time semantic segmentation method of unmanned driving scenes based on dual-channel deep neural network
CN104517103A (en) Traffic sign classification method based on deep neural network
CN115035298B (en) Urban streetscape semantic segmentation enhancement method based on multidimensional attention mechanism
CN114494934B (en) Unsupervised moving object detection method based on information reduction rate
CN119314141B (en) Lightweight parking detection method based on multi-scale attention mechanism
CN116311154A (en) A Vehicle Detection and Recognition Method Based on YOLOv5 Model Optimization
CN114998879B (en) Fuzzy license plate recognition method based on event camera
CN118314606B (en) Pedestrian detection method based on global-local characteristics
CN116245860B (en) A small target detection method based on super-resolution-yolo network
CN117115616A (en) A real-time low-light image target detection method based on convolutional neural network
CN119723374A (en) A method, system, device and medium for detecting small targets in unmanned aerial image
CN115497059A (en) A Vehicle Behavior Recognition Method Based on Attention Network
CN119152453A (en) Infrared expressway foreign matter detection method based on Mamba framework
CN115063704B (en) A UAV monitoring target classification method based on 3D feature fusion and semantic segmentation
CN114648755B (en) Text detection method for industrial container in lightweight moving state
CN113920455B (en) Night video coloring method based on deep neural network
CN119723045A (en) A lightweight nighttime target detection method and system based on deep learning
CN119888308A (en) Lightweight semi-supervised target detection method based on joint estimation and scale fusion
CN119942588A (en) Fingertip detection method, device, equipment and storage medium based on event camera
CN119379988A (en) An image target detection method and system based on improved LW-DETR algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant