
CN117975383A - Vehicle positioning and identifying method based on multi-mode image fusion technology - Google Patents

Vehicle positioning and identifying method based on multi-mode image fusion technology

Info

Publication number
CN117975383A
Authority
CN
China
Prior art keywords
visible light
image
light image
model
vehicle
Prior art date
Legal status
Granted
Application number
CN202410387616.2A
Other languages
Chinese (zh)
Other versions
CN117975383B (en)
Inventor
邓乾
刘文平
李思涵
杨凌晨
刘行军
Current Assignee
HUBEI UNIVERSITY OF ECONOMICS
Original Assignee
HUBEI UNIVERSITY OF ECONOMICS
Priority date
Filing date
Publication date
Application filed by HUBEI UNIVERSITY OF ECONOMICS filed Critical HUBEI UNIVERSITY OF ECONOMICS
Priority to CN202410387616.2A priority Critical patent/CN117975383B/en
Publication of CN117975383A publication Critical patent/CN117975383A/en
Application granted granted Critical
Publication of CN117975383B publication Critical patent/CN117975383B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Generative networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V 10/811 Fusion of classification results, the classifiers operating on different input data, e.g. multi-modal recognition
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a vehicle positioning and identifying method based on a multi-mode image fusion technology, which comprises the following steps: acquiring an infrared image and a corresponding visible light image of a target vehicle in the current environment; judging whether the current environment is a dark light environment, and if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise taking the visible light image as the enhanced visible light image; and inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model. The vehicle detection model is trained on second sample infrared images and second sample visible light images together with the position labels and model labels of the corresponding vehicles. The application realizes passive positioning and vehicle-type recognition of the target vehicle and ensures that accurate vehicle positioning and recognition results can be obtained under different illumination conditions.

Description

Vehicle positioning and identifying method based on multi-mode image fusion technology
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a vehicle positioning and identifying method based on a multi-mode image fusion technology.
Background
The vehicle positioning and identifying technology mainly adopts a target detection technology to accurately position and identify a plurality of target vehicles of different categories in images or videos, and can be applied to the fields of intelligent transportation, automatic driving, security monitoring and the like.
Under night driving conditions, existing vehicle positioning devices, such as license plate positioners and vehicle GPS devices, have the problem of insufficient visibility, which directly affects the accuracy of positioning and identifying vehicles, thereby possibly threatening traffic safety, weakening traffic monitoring efficiency, and delaying vehicle tracking in emergency situations.
Disclosure of Invention
Aiming at the defects of the prior art, the application aims to provide a vehicle positioning and identifying method based on a multi-mode image fusion technology, so as to solve the problem that existing vehicle positioning and identifying techniques have poor accuracy in night environments.
In order to achieve the above object, in a first aspect, the present application provides a vehicle positioning and identifying method based on a multi-mode image fusion technology, including the steps of:
step S101, acquiring an infrared image and a corresponding visible light image of a target vehicle in a current environment;
Step S102, judging whether the current environment is a dark light environment, if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise, taking the visible light image as the enhanced visible light image;
The fusion generation model is obtained by combining a discrimination model to generate countermeasure training based on the first sample infrared image and the first sample visible light image, and the discrimination model is used for discriminating the authenticity of the sample enhanced visible light image generated by the fusion generation model;
step S103, inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model;
The vehicle detection model is obtained through training based on the second sample infrared image and the second sample visible light image and the position label and the model label of the corresponding vehicle.
In an optional example, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, specifically includes:
Inputting the infrared image and the visible light image into a fusion generation model, respectively carrying out convolution processing on the infrared image and the visible light image by the fusion generation model, carrying out splicing processing on characteristics obtained by the convolution processing on characteristic channels, and inputting the characteristics obtained by the splicing processing into a pix2pix generator in the fusion generation model to obtain the enhanced visible light image;
Or the fusion generation model carries out convolution processing on the infrared image and the visible light image respectively, the features obtained by the convolution processing are spliced on the feature channel, the features obtained by the splicing processing are input to the SE attention module in the fusion generation model, and the output result of the SE attention module is input to the pix2pix generator in the fusion generation model to obtain the enhanced visible light image.
In an alternative example, the fusion generation model is specifically trained with the constraint of consistency between the sample enhanced visible light image and the first sample visible light image; the sample enhanced visible light image is generated by fusing a fusion generation model in the training process based on the simulated visible light image and the first sample infrared image; the simulated visible light image is obtained by randomly shielding and darkening the first sample visible light image.
In an alternative example, inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain a position and a model of the target vehicle output by the vehicle detection model, specifically including:
Inputting the infrared image and the enhanced visible light image into a vehicle detection model, firstly adopting double branches to respectively extract the infrared image characteristic and the visible light image characteristic by the vehicle detection model, respectively extracting the multiscale characteristics of the infrared image characteristic and the visible light image characteristic, calculating the attention weight between the multiscale characteristics of the infrared image characteristic and the visible light image characteristic by using an SE attention mechanism so as to respectively generate an infrared enhanced characteristic and a visible light enhanced characteristic, then carrying out a shuffle operation on the infrared enhanced characteristic and the visible light enhanced characteristic to obtain a mixed characteristic, and finally carrying out vehicle positioning and model classification based on the mixed characteristic to obtain the position and the model of the target vehicle.
In an alternative example, the loss function of the vehicle detection model includes a cross entropy loss between infrared enhancement features and visible light enhancement features, a CIoU loss for a vehicle localization task, and a Focal loss for a vehicle model classification task.
In an alternative example, step S103 further includes:
converting the position of the target vehicle into the position of the target vehicle under a camera coordinate system based on an internal reference matrix of the camera corresponding to the infrared image;
based on the external reference matrix of the camera, the position of the target vehicle in the camera coordinate system is converted into the position of the target vehicle in the world coordinate system.
In a second aspect, the present application provides a vehicle positioning and identification system based on a multi-modal image fusion technique, comprising:
the image acquisition module is used for acquiring an infrared image and a corresponding visible light image of the target vehicle in the current environment;
The fusion generation module is used for judging whether the current environment is a dark light environment, if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise, taking the visible light image as the enhanced visible light image; the fusion generation model is obtained by combining a discrimination model to generate countermeasure training based on the first sample infrared image and the first sample visible light image, and the discrimination model is used for discriminating the authenticity of the sample enhanced visible light image generated by the fusion generation model;
the vehicle detection module is used for inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and the model of the target vehicle output by the vehicle detection model; the vehicle detection model is obtained through training based on the second sample infrared image and the second sample visible light image and the position label and the model label of the corresponding vehicle.
In a third aspect, the present application provides an electronic device comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, the program, when executed, being adapted to carry out the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the application provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
In general, the above technical solutions conceived by the present application have the following beneficial effects compared with the prior art:
The application provides a vehicle positioning and identifying method based on a multi-mode image fusion technology. An infrared image and a corresponding visible light image of a target vehicle in the current environment are acquired, and whether the current environment is a dark light environment is judged. If so, the fusion generation model fuses the input infrared image and the poorly-lit visible light image to generate a high-quality visible light image, and the vehicle detection model then combines the useful information of the infrared image and of the high-quality visible light image to perform joint target detection. The image information of the two modes is thus fully utilized, passive positioning and vehicle type recognition of the target vehicle are realized, and accurate vehicle positioning and recognition results can be obtained under different illumination conditions.
Drawings
FIG. 1 is a schematic flow chart of a vehicle locating and identifying method according to an embodiment of the present application;
FIG. 2 is a second flow chart of a method for locating and identifying a vehicle according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a network structure of a fusion generation model according to an embodiment of the present application;
FIG. 4 is a second diagram of a network structure of a fusion generation model according to an embodiment of the present application;
FIG. 5 is a detection flow chart of a vehicle detection model provided by an embodiment of the present application;
FIG. 6 is a block diagram of a vehicle locating and identification system provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. For example, the first sample infrared image and the second sample infrared image, etc., are sample infrared images for discriminative training of different models, and are not used to describe a particular order of sample infrared images.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
First, technical terms involved in the embodiments of the present application will be described.
Infrared imaging technology: infrared imaging techniques refer to capturing infrared radiation emitted by an object with an infrared sensor, thereby generating an infrared image. The infrared image can be used for analyzing the heat distribution of an object, and is suitable for monitoring and identification under night or low-visibility conditions.
Visible light imaging technology: the visible light imaging technology is to capture visual information such as appearance and color of an object by using a visible light sensor, and generate a visible light image. The visible light image is suitable for viewing details and features of the object.
The infrared dual-mode camera is a camera device capable of simultaneously acquiring and processing visible light images and infrared images. Such a camera typically comprises two independent sensors: one for capturing visible light images and the other for capturing infrared radiation images. In this way it can provide two different views of a scene, which is very valuable in certain applications. In the visible light mode, the infrared dual-mode camera records color images similar to those of conventional cameras, which are suitable for observing visual features such as the size, shape and color of objects. In the infrared mode, the camera captures the infrared radiation emitted by objects, which is related to their temperature. The camera thus combines visible light photography and infrared thermal imaging and can acquire visible light images and infrared images simultaneously.
Deep learning (Deep Learning) is a branch of machine learning that imitates the working mode of the human brain's neural networks and performs data processing and learning through multi-level neural networks. Its core idea is to learn and extract high-level abstract features from a large amount of data by constructing and training a multi-level neural network model, so as to realize effective classification and prediction of the data. Deep learning can realize accurate positioning of vehicles by constructing a multi-level neural network model and automatically learning complex features in images. By combining traditional geometric methods, such as plane geometry and space geometry, problems such as attitude estimation and scale transformation in the positioning process can be effectively solved.
Object detection (Object Detection) is an important task in the field of computer vision, aimed at accurately locating and identifying a number of objects of different classes in an image or video. Unlike image classification, which only requires determining whether an object is present in an image, object detection requires locating the position of each object in the image and classifying it.
The cross-modal re-identification of an infrared image to a visible image refers to the task of converting the infrared image into a corresponding visible image. Infrared images and visible light images are different physical modalities that differ greatly in image characteristics and content. Through cross-modal re-identification, the infrared image can be converted into a visible light image, so that the understanding and the visualization of the infrared image content are realized.
Next, the technical scheme provided in the embodiment of the present application is described.
The application provides a vehicle positioning and identifying method based on a multi-mode image fusion technology, and fig. 1 is one of flow diagrams of the vehicle positioning and identifying method provided by the embodiment of the application, as shown in fig. 1, and the method comprises the following steps:
step S101, acquiring an infrared image and a corresponding visible light image of a target vehicle in a current environment;
Step S102, judging whether the current environment is a dark light environment, if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise, taking the visible light image as the enhanced visible light image;
The fusion generation model is obtained by combining a discrimination model to generate countermeasure training based on the first sample infrared image and the first sample visible light image, and the discrimination model is used for discriminating the authenticity of the sample enhanced visible light image generated by the fusion generation model;
step S103, inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model;
The vehicle detection model is obtained through training based on the second sample infrared image and the second sample visible light image and the position label and the model label of the corresponding vehicle.
Here, the target vehicle, that is, the vehicle that needs to be positioned and identified, may be one or more vehicles, which is not particularly limited in the embodiment of the present application.
Specifically, an infrared dual-mode camera may be used to capture a visible light image and a corresponding infrared image of a target vehicle in a current environment. In consideration of poor visible light image quality in a dark light environment such as a night environment, the embodiment of the application judges whether the current environment is the dark light environment, if so, the fusion generation model is applied to combine the dominant parts of the infrared image and the visible light image, thereby generating a high-definition enhanced visible light image, and for a bright light environment, the step can be omitted, and the visible light image is directly used as the enhanced visible light image. On the basis, the vehicle detection model can carry out combined target detection on the infrared image and the enhanced visible light image to obtain the position and model of the target vehicle. Optionally, the determining whether the current environment is a dark light environment may specifically be performed according to whether the brightness of the collected visible light image is lower than a preset brightness threshold.
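As an illustration of this optional brightness-threshold check, the following is a minimal sketch in Python (using OpenCV and NumPy). The threshold value of 60 on a 0-255 scale and the use of the mean grayscale intensity are assumptions made only for illustration, since the embodiment merely states that the brightness of the collected visible light image is compared against a preset brightness threshold.

```python
import cv2
import numpy as np

def is_dark_environment(visible_bgr, brightness_threshold=60.0):
    """Judge whether the current environment is a dark light environment.

    The mean grayscale intensity of the visible light image is compared against a
    preset brightness threshold (the value 60 is an assumed example).
    """
    gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)   # convert to grayscale
    return float(np.mean(gray)) < brightness_threshold     # dark if below the threshold
```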
It can be understood that the first sample infrared image and the first sample visible light image, and the second sample infrared image and the second sample visible light image are respectively high-quality sample image pairs for training different models, and can be acquired by using an infrared dual-mode camera under the condition of more sufficient illumination. Further, after the model of the target vehicle is acquired, the parameter information of the target vehicle may be acquired according to a correspondence between the pre-stored vehicle model and the parameter information. The parameter information of brands, years, colors and the like corresponding to the vehicle model can be obtained through a web crawler.
According to the method provided by the embodiment of the application, the infrared image and the corresponding visible light image of the target vehicle in the current environment are obtained, and whether the current environment is a dark light environment is judged. If so, the fusion generation model fuses the input infrared image and the poorly-lit visible light image to generate a high-quality visible light image, and the vehicle detection model then combines the useful information of the infrared image and of the high-quality visible light image to perform joint target detection. The image information of the two modes is fully utilized, passive positioning and vehicle type recognition of the target vehicle are realized, and accurate vehicle positioning and recognition results can be obtained under different illumination conditions.
Based on the above embodiment, inputting the infrared image and the visible light image into a fusion generation model, and obtaining the enhanced visible light image output by the fusion generation model specifically includes:
Inputting the infrared image and the visible light image into a fusion generation model, respectively carrying out convolution processing on the infrared image and the visible light image by the fusion generation model, carrying out splicing processing on characteristics obtained by the convolution processing on characteristic channels, and inputting the characteristics obtained by the splicing processing into a pix2pix generator in the fusion generation model to obtain the enhanced visible light image;
Or the fusion generation model carries out convolution processing on the infrared image and the visible light image respectively, the features obtained by the convolution processing are spliced on the feature channel, the features obtained by the splicing processing are input to the SE attention module in the fusion generation model, and the output result of the SE attention module is input to the pix2pix generator in the fusion generation model to obtain the enhanced visible light image.
It should be noted that the fusion generation model adopts an improved pix2pix network structure, which further improves the image quality of the generated enhanced visible light image and hence the accuracy of vehicle positioning and recognition. The improved pix2pix network structure provides two structural alternatives. The SE attention module used in the second alternative is a channel attention module: it performs channel feature enhancement on the input feature map without changing its size, and can further improve the image generation effect.
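A minimal PyTorch sketch of the modified generator input described above (per-modality convolution, channel concatenation, optional SE attention, then the pix2pix generator) is given below. The channel counts, the kernel size and the assumption that the pix2pix generator body accepts the concatenated feature map as its input are illustrative choices, not details taken from the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """SE channel attention: re-weights channels without changing the feature map size."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class FusionGenerator(nn.Module):
    """Input stage of the improved pix2pix generator (illustrative sketch)."""
    def __init__(self, pix2pix_generator, use_se=True, feat_ch=32):
        super().__init__()
        self.conv_ir = nn.Conv2d(1, feat_ch, kernel_size=3, padding=1)   # infrared branch
        self.conv_vis = nn.Conv2d(3, feat_ch, kernel_size=3, padding=1)  # visible light branch
        self.se = SEBlock(2 * feat_ch) if use_se else nn.Identity()
        # The pix2pix generator body is assumed to be built to accept 2 * feat_ch input channels.
        self.generator = pix2pix_generator

    def forward(self, infrared, visible):
        fused = torch.cat([self.conv_ir(infrared), self.conv_vis(visible)], dim=1)
        return self.generator(self.se(fused))                # enhanced visible light image
```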
In addition, in the cross-modal task of converting the infrared image into a corresponding visible light image, the original visible light image is also incorporated, which further improves the image quality of the generated enhanced visible light image.
Based on any one of the above embodiments, the fusion generation model is specifically trained with a constraint of consistency between the sample enhanced visible light image and the first sample visible light image; the sample enhanced visible light image is generated by fusing a fusion generation model in the training process based on the simulated visible light image and the first sample infrared image; the simulated visible light image is obtained by randomly shielding and darkening the first sample visible light image.
It can be understood that if the first sample infrared image and the first sample visible light image were used directly as input samples of the fusion generation model, the training label, that is, the fused image label, would be difficult to obtain. For this reason, in the embodiment of the application the first sample visible light image is first randomly occluded and darkened to obtain a simulated visible light image, so that some regions of the simulated visible light image may be black, as in a dark environment. The simulated visible light image and the first sample infrared image are then used as input samples of the fusion generation model, and the first sample visible light image is used as the training label; in other words, the consistency between the sample enhanced visible light image generated by the fusion generation model and the first sample visible light image is used as a constraint to train the initial fusion generation model, and the trained fusion generation model is finally obtained.
The consistency between the sample enhanced visible light image and the first sample visible light image can be judged by the discrimination model, and the discrimination result is then used to constrain the training of the fusion generation model.
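One possible way to produce the simulated visible light image used as the training input is sketched below; the occlusion ratio and darkening factor are assumed values, since the patent only states that the first sample visible light image is randomly occluded and darkened.

```python
import numpy as np

def simulate_low_light(visible, occlusion_ratio=0.3, darken_factor=0.3, rng=None):
    """Randomly darken and occlude a well-lit visible light sample (illustrative parameters)."""
    rng = rng if rng is not None else np.random.default_rng()
    img = visible.astype(np.float32) * darken_factor                 # global darkening
    h, w = img.shape[:2]
    oh, ow = int(h * occlusion_ratio), int(w * occlusion_ratio)
    y0, x0 = rng.integers(0, h - oh), rng.integers(0, w - ow)        # random occlusion position
    img[y0:y0 + oh, x0:x0 + ow] = 0                                  # black out the region
    return img.astype(visible.dtype)
```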
Based on any one of the above embodiments, inputting the infrared image and the enhanced visible light image to a vehicle detection model, obtaining a position and a model of the target vehicle output by the vehicle detection model specifically includes:
Inputting the infrared image and the enhanced visible light image into a vehicle detection model, firstly adopting double branches to respectively extract the infrared image characteristic and the visible light image characteristic by the vehicle detection model, respectively extracting the multiscale characteristics of the infrared image characteristic and the visible light image characteristic, calculating the attention weight between the multiscale characteristics of the infrared image characteristic and the visible light image characteristic by using an SE attention mechanism so as to respectively generate an infrared enhanced characteristic and a visible light enhanced characteristic, then carrying out a shuffle operation on the infrared enhanced characteristic and the visible light enhanced characteristic to obtain a mixed characteristic, and finally carrying out vehicle positioning and model classification based on the mixed characteristic to obtain the position and the model of the target vehicle.
It should be noted that the vehicle detection model adopts an improved Dual-YOLO network structure and designs a Dual-Fusion (D-Fusion) module, which comprises an attention fusion stage composed of an Inception module and an Attention-Fusion module, followed by a Fusion-Shuffle module connected in series, so as to effectively fuse the features of the two different modes (a code sketch of these two fusion steps is given after the output description below). The Inception module extracts multi-scale features to reduce the computational cost. The Attention-Fusion module calculates the attention weights between the infrared and visible light image features using the SE attention mechanism and generates two enhanced features: the attention feature vector computed from the infrared image features is combined with the visible light image features to generate the enhanced visible light image features, and vice versa. The Fusion-Shuffle module further enhances and shuffles the enhanced features. The detection module performs vehicle positioning and model classification based on the mixed features and adopts four detection heads, each responsible for detecting target objects of a different scale (small, medium, large and ultra-large), so that targets of different sizes are covered and the comprehensiveness of detection is ensured.
By the design, the Dual-YOLO architecture not only reduces redundant information, but also effectively accelerates the convergence speed of the network. Experimental results show that the infrared target detection performance is remarkably improved by the framework, and an effective solution is provided for target detection at night or under low illumination conditions.
The input of the vehicle detection model is an infrared image and an enhanced visible light image, and the output comprises:
position coordinates of the detection frame: including x, y coordinates representing the center position of the frame, as well as the width and height;
Target class probability: each detection frame outputs the probability of each vehicle class, and the most probable class indicates the model of the vehicle in the detection frame;
Confidence level: indicating the likelihood that the detection frame contains a target vehicle.
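The following PyTorch sketch illustrates the Attention-Fusion and Fusion-Shuffle steps referred to above: the attention vector computed from one modality re-weights the channels of the other, and the two enhanced features are then concatenated and channel-shuffled into the mixed feature. The exact layer layout, the residual addition and the shuffle group number are assumptions; the patent only specifies the use of the SE attention mechanism and a shuffle operation.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Cross-modal SE attention between infrared and visible light features (illustrative)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_ir = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                                   nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.fc_vis = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                                    nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, f_ir, f_vis):
        b, c, _, _ = f_ir.shape
        w_from_ir = self.fc_ir(self.pool(f_ir).view(b, c)).view(b, c, 1, 1)     # attention from infrared
        w_from_vis = self.fc_vis(self.pool(f_vis).view(b, c)).view(b, c, 1, 1)  # attention from visible
        vis_enhanced = f_vis + f_vis * w_from_ir   # visible features enhanced by infrared attention
        ir_enhanced = f_ir + f_ir * w_from_vis     # infrared features enhanced by visible attention
        return ir_enhanced, vis_enhanced

def fusion_shuffle(ir_enhanced, vis_enhanced, groups=2):
    """Concatenate the two enhanced features and interleave their channels (channel shuffle)."""
    b, c, h, w = ir_enhanced.shape
    mixed = torch.cat([ir_enhanced, vis_enhanced], dim=1)                     # (b, 2c, h, w)
    mixed = mixed.view(b, groups, (2 * c) // groups, h, w).transpose(1, 2)    # split into groups and swap
    return mixed.reshape(b, 2 * c, h, w)                                      # mixed feature for detection
```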
Based on any of the above embodiments, the loss function of the vehicle detection model includes a cross entropy loss between the infrared enhancement features and the visible light enhancement features, a CIoU loss for the vehicle localization task, and a Focal loss for the vehicle model classification task.
It should be noted that, in the design of the loss function, the cross entropy loss between the infrared enhancement features and the visible light enhancement features, i.e. the feature entropy loss, is used to penalize redundant features in the attention fusion module and thereby improve the generalization capability of the vehicle detection model. The positioning loss and the classification loss adopt the CIoU loss and the Focal loss to improve the accuracy and stability of detection: the CIoU loss measures the position error between the prediction frame and the ground-truth frame so as to stabilize box regression, while the Focal loss measures the error between the predicted category and the true category.
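A hedged sketch of how such a combined loss could be assembled is shown below. The feature entropy term treats the channel-wise softmax of the globally pooled enhanced features as distributions, which is one possible reading of the cross entropy between the infrared and visible light enhancement features; the loss weights and the use of torchvision's CIoU and Focal loss helpers (with boxes in (x1, y1, x2, y2) format and one-hot classification targets) are assumptions, not details fixed by the patent.

```python
import torch
from torchvision.ops import complete_box_iou_loss, sigmoid_focal_loss

def feature_entropy_loss(ir_feat, vis_feat):
    """Cross entropy between channel distributions of the two enhanced features (one interpretation)."""
    p_ir = torch.softmax(ir_feat.mean(dim=(2, 3)), dim=1)
    p_vis = torch.softmax(vis_feat.mean(dim=(2, 3)), dim=1)
    return -(p_ir * torch.log(p_vis + 1e-8)).sum(dim=1).mean()

def detection_loss(pred_boxes, gt_boxes, cls_logits, cls_targets, ir_feat, vis_feat,
                   w_feat=0.1, w_box=1.0, w_cls=1.0):
    """Total loss = feature entropy loss + CIoU localization loss + Focal classification loss.

    The weights w_feat, w_box and w_cls are assumed values; boxes are expected in
    (x1, y1, x2, y2) format and cls_targets as one-hot float tensors.
    """
    loss_feat = feature_entropy_loss(ir_feat, vis_feat)
    loss_box = complete_box_iou_loss(pred_boxes, gt_boxes, reduction="mean")   # CIoU loss
    loss_cls = sigmoid_focal_loss(cls_logits, cls_targets, reduction="mean")   # Focal loss
    return w_feat * loss_feat + w_box * loss_box + w_cls * loss_cls
```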
Based on any of the above embodiments, step S103 further includes:
converting the position of the target vehicle into the position of the target vehicle under a camera coordinate system based on an internal reference matrix of the camera corresponding to the infrared image;
based on the external reference matrix of the camera, the position of the target vehicle in the camera coordinate system is converted into the position of the target vehicle in the world coordinate system.
It will be appreciated that the position output by the vehicle detection model is essentially the voxel information of the target vehicle's detection frame, and the camera's internal and external parameters are also required to calculate the actual position of the target vehicle. The camera corresponding to the infrared image is the camera used in step S101 to collect the infrared image and the visible light image, namely the infrared dual-mode camera.
Specifically, the embodiment of the application firstly acquires detection frame information of a vehicle, including a center position (X, Y) and a size (W, H), by using a vehicle detection model based on a double-YOLO architecture.
Since the detection frame directly gives the center of the vehicle in the image, the position (u, v) of the center point in the image coordinate system is taken as u = X, v = Y (scaled by the image width and height if the detection model outputs normalized coordinates).
Using the internal reference matrix K, and the external reference matrix consisting of the rotation matrix R and the translation matrix T, (u, v) is converted into a position in the world coordinate system. First, the image coordinates (u, v) are converted into a point Pc = (Xc, Yc, Zc) in the camera coordinate system: Pc = Zc · K^(-1) · [u, v, 1]^T, where Zc is the depth of the target along the camera's optical axis.
Then, the external parameter matrix is used to convert Pc into a point Pw in the world coordinate system: Pw = R^(-1) · (Pc - T).
Thus, the accurate position of each vehicle in the world coordinate system can be obtained, and high-precision vehicle positioning is realized.
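The back-projection described above can be written compactly with NumPy as follows. Note that a single image only fixes a viewing ray, so the depth Zc along the camera's optical axis must be supplied or estimated separately (for example from a known camera height and a ground-plane assumption), and the extrinsics are assumed to satisfy Pc = R · Pw + T.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, T):
    """Convert the detection frame center (u, v) to world coordinates.

    K: 3x3 internal reference matrix; R: 3x3 rotation matrix; T: translation 3-vector,
    with the convention Pc = R @ Pw + T. depth is the assumed distance Zc of the target
    along the optical axis.
    """
    pixel = np.array([u, v, 1.0])
    p_cam = depth * (np.linalg.inv(K) @ pixel)   # point in the camera coordinate system
    p_world = np.linalg.inv(R) @ (p_cam - T)     # point in the world coordinate system
    return p_world
```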
Based on any of the above embodiments, it is noted that current vehicle positioning technology does not perform well at night: the reflective effect of traditional reflective materials is limited under low-illumination conditions, lighting devices are insufficiently configured, and the design of the devices themselves fails to fully consider the night-time use environment. These factors act together and make it difficult for night-time vehicle positioning devices to reach daytime visibility levels.
Aiming at the defects of the prior art, the application aims to provide a vehicle positioning and identifying method based on a multi-mode image fusion technology, which realizes the passive positioning of a vehicle by using an infrared dual-mode camera to reconstruct the three-dimensional position and calculate the coordinates of the vehicle in all-weather (daytime and night) modes.
The vehicle positioning and identifying method based on the multi-mode image fusion technology can be applied to the fields of intelligent transportation, automatic driving, security monitoring and the like. Fig. 2 is a second flow chart of a vehicle positioning and identifying method according to an embodiment of the present application, as shown in fig. 2, the vehicle positioning method includes steps S10 to S40, and is described in detail as follows:
S10, acquiring a picture of a vehicle and corresponding public vehicle parameter information from the Internet; shooting an infrared image and a corresponding visible light image by using an infrared dual-mode camera to manufacture a corresponding data set;
S20, training the data set of the infrared image and the visible light image by using an improved pix2pix model, and predicting the input infrared image and the visible light image with insufficient light in a night environment to obtain a high-quality visible light image;
S30, carrying out joint target detection on the infrared image and the visible light image by adopting an improved vehicle detection model based on the double-YOLO architecture;
S40, according to the target detection result and in combination with the internal and external reference information of the camera, converting the vehicle position from the pixel coordinate system to the world coordinate system, thereby realizing positioning.
According to the application, a data set is acquired from the Internet and the dual-mode camera, a cross-mode conversion model and a vehicle detection model are trained, voxel information in a labeling frame, namely a detection frame, is acquired, and the actual position of the vehicle is calculated by utilizing the internal parameters and the external parameters of the camera, so that the three-dimensional coordinates of one or more vehicles under the infrared dual-mode camera can be calibrated, and the passive positioning of the vehicle is realized.
In this step S10, the specific steps of obtaining the vehicle picture and the corresponding model and vehicle parameter information may be:
(1) A crawler program is written in Python to acquire vehicle pictures and extract parameter information such as model, brand, year and color. Using an XML library of Python, an XML file can be created, and the picture path and the corresponding parameter information are stored in XML nodes according to a fixed structure and specification. The obtained vehicle pictures and models are cleaned and processed, stored as a data set in VOC format, and the parameter information corresponding to each model is established with XML and saved in an XML file.
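A minimal sketch of storing one crawled record with Python's standard xml library is given below; the tag names and field set are illustrative, since the patent only requires that the picture path and parameter information be written into XML nodes with a fixed structure.

```python
import xml.etree.ElementTree as ET

def save_vehicle_record(xml_path, picture_path, model, brand, year, color):
    """Write one vehicle picture path and its parameter information to an XML file.

    The node names ("vehicle", "picture_path", "parameters", ...) are assumed for illustration.
    """
    vehicle = ET.Element("vehicle")
    ET.SubElement(vehicle, "picture_path").text = picture_path
    params = ET.SubElement(vehicle, "parameters")
    ET.SubElement(params, "model").text = model
    ET.SubElement(params, "brand").text = brand
    ET.SubElement(params, "year").text = year
    ET.SubElement(params, "color").text = color
    ET.ElementTree(vehicle).write(xml_path, encoding="utf-8", xml_declaration=True)
```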
(2) An infrared dual-mode camera is used to shoot infrared images and corresponding visible light images in the daytime to produce the corresponding data set.
Wherein, in step S20:
In the night environment, the improved pix2pix network is used as the fusion generation model, and the model is trained with the data from sub-step (2) of step S10. The input of the model is a visible light image in a low-illumination environment and the corresponding infrared image, and the model outputs a high-definition visible light image, namely the enhanced visible light image. The daytime environment does not require this step and proceeds directly to step S30.
Fig. 3 is one of the schematic diagrams of the network structure of the fusion generation model provided in the embodiment of the present application. As shown in Fig. 3, compared with the standard pix2pix network structure, the improved pix2pix modifies the input part of the generator: the two inputs are convolved separately and the convolved features are concatenated along the feature channel, thereby obtaining the fusion generation model. Fig. 4 is the second schematic diagram of the network structure of the fusion generation model provided in the embodiment of the present application. As shown in Fig. 4, compared with the network structure of Fig. 3, an SE attention module is added; the SE attention module performs channel feature enhancement on the input feature map without changing its size.
After the trained model is obtained, a visible light image and an infrared image with insufficient light are input in the night environment, and a visible light image with higher definition is generated.
Wherein, in step S30:
First, the dataset prepared in sub-step (2) of step S10 is used to train a vehicle detection model under visible light with the improved YOLO model; the improved YOLO model of the present application will now be described in detail. The application carries out fusion target detection on the infrared image and the visible light image based on the double-YOLO architecture, acquires the detection frames of target detection, and divides the detection frames.
Fig. 5 is a detection flow chart of the vehicle detection model provided by an embodiment of the present application. As shown in Fig. 5, the main design based on the Double-YOLO architecture is as follows: based on the YOLO design, a hierarchical recognition structure comprising P1 to P6 is adopted, and feature extraction uses a double-branch backbone to extract the infrared and visible light image features respectively. A Dual-Fusion (D-Fusion) module is designed, including an attention fusion stage composed of an Inception module and an Attention-Fusion module, and a Fusion-Shuffle module connected in series, aiming to effectively fuse the features of the two different modalities. The Inception module extracts multi-scale features and reduces the computational cost. The Attention-Fusion module enhances the infrared and visible light features with each other through the SE attention mechanism. The Fusion-Shuffle module integrates the infrared and visible light features through a shuffle operation so that the network adapts to both modes.
In addition, the double YOLO architecture adopts four detection heads, so that targets with different sizes are covered, and the comprehensiveness of detection is ensured. In the design of the Loss function, the feature entropy Loss is used for punishing redundant features in the fusion module, and the positioning Loss and the classification Loss adopt CIoU and Focal Loss to improve the accuracy and the stability of detection.
By the design, the Dual-YOLO architecture not only reduces redundant information, but also effectively accelerates the convergence speed of the network. Experimental results show that the infrared target detection performance is remarkably improved by the framework, and an effective solution is provided for target detection at night or under low illumination conditions.
Then, the relevant parameters are configured according to the above method, and the data set is divided into a training set and a validation set at a ratio of 8:2. Training is performed using the training module of the improved YOLO model according to the embodiment of the application, and the vehicle detection model is finally obtained.
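The 8:2 split can be done, for example, as in the following sketch; the fixed random seed and the representation of each sample as a tuple are assumptions made only for illustration.

```python
import random

def split_dataset(sample_pairs, train_ratio=0.8, seed=42):
    """Shuffle paired samples and split them into a training set and a validation set (8:2)."""
    pairs = list(sample_pairs)
    random.Random(seed).shuffle(pairs)              # reproducible shuffle with an assumed seed
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]                 # training set, validation set
```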
Next, the infrared image and the corresponding visible light image are collected as training data. The images should be paired, i.e. the infrared image and the visible image correspond to each other. The collected images are preprocessed, including adjusting the image size, normalizing, removing noise, etc. The infrared image and the visible image are paired and a data set is created such that each sample contains a pair of infrared image and visible image.
The model is trained using the preprocessed training data set. In each training iteration, the model accepts a pair of infrared and visible light images as input, calculates the loss function, and updates the model parameters through the back propagation algorithm. The training process continues until the model performance reaches a predetermined threshold. The input of the model is an infrared image and an enhanced visible light image, and the output is the target detection frame and category information.
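A schematic training iteration consistent with the description above might look as follows; the batch structure returned by the data loader, the model's output tuple and the loss_fn signature (matching the combined loss sketched earlier) are assumptions, not details fixed by the patent.

```python
import torch

def train_one_epoch(model, loader, optimizer, loss_fn, device="cuda"):
    """One epoch over paired infrared / enhanced visible light samples with box and class labels."""
    model.train()
    for infrared, visible, gt_boxes, gt_classes in loader:
        infrared, visible = infrared.to(device), visible.to(device)
        gt_boxes, gt_classes = gt_boxes.to(device), gt_classes.to(device)
        pred_boxes, cls_logits, ir_feat, vis_feat = model(infrared, visible)   # assumed output tuple
        loss = loss_fn(pred_boxes, gt_boxes, cls_logits, gt_classes, ir_feat, vis_feat)
        optimizer.zero_grad()
        loss.backward()          # back propagation
        optimizer.step()         # update the model parameters
```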
Wherein, in step S40:
the embodiment of the application firstly obtains the detection frame information of the vehicle by utilizing a vehicle detection model based on an improved double YOLO architecture, wherein the detection frame information comprises the center position (X, Y) and the size (namely the width and the height) (W, H) of the detection frame.
Since the detection frame directly gives the center of the vehicle in the image, the position (u, v) of the center point in the image coordinate system is taken as u = X, v = Y (scaled by the image width and height if the detection model outputs normalized coordinates).
Using the internal reference matrix K, and the external reference matrix consisting of the rotation matrix R and the translation matrix T, (u, v) is converted into a position in the world coordinate system. First, the image coordinates (u, v) are converted into a point Pc = (Xc, Yc, Zc) in the camera coordinate system: Pc = Zc · K^(-1) · [u, v, 1]^T, where Zc is the depth of the target along the camera's optical axis.
Then, the external parameter matrix is used to convert Pc into a point Pw in the world coordinate system: Pw = R^(-1) · (Pc - T).
Thus, the accurate position of each vehicle in the world coordinate system can be obtained, and high-precision vehicle positioning is realized. Through this process, not only the position of the vehicle but also its orientation and posture can be determined.
Based on any one of the embodiments, the embodiment of the application provides a vehicle positioning and identifying system based on a multi-mode image fusion technology. FIG. 6 is a block diagram of a vehicle locating and recognition system according to an embodiment of the present application, as shown in FIG. 6, the system includes:
The image acquisition module 610 is configured to acquire an infrared image and a corresponding visible light image of a target vehicle in a current environment;
The fusion generation module 620 is configured to determine whether the current environment is a dark light environment, if so, input the infrared image and the visible light image to a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise, take the visible light image as the enhanced visible light image; the fusion generation model is obtained by combining a discrimination model to generate countermeasure training based on the first sample infrared image and the first sample visible light image, and the discrimination model is used for discriminating the authenticity of the sample enhanced visible light image generated by the fusion generation model;
A vehicle detection module 630, configured to input the infrared image and the enhanced visible light image into a vehicle detection model, and obtain a position and a model of the target vehicle output by the vehicle detection model; the vehicle detection model is obtained through training based on the second sample infrared image and the second sample visible light image and the position label and the model label of the corresponding vehicle.
It can be understood that the detailed functional implementation of each module may be referred to the description in the foregoing method embodiment, and will not be repeated herein.
Based on the method in the above embodiment, the embodiment of the application provides an electronic device. Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 7, the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform the methods of the embodiments described above.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application.
Based on the method in the above embodiment, the embodiment of the present application provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present application provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the present application may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The steps of the method in the embodiment of the present application may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may be composed of corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Drive (SSD)), etc.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the application and is not intended to limit the application, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (10)

1. The vehicle positioning and identifying method based on the multi-mode image fusion technology is characterized by comprising the following steps of:
Step S101, acquiring an infrared image and a corresponding visible light image of a target vehicle in a current environment;
Step S102, judging whether the current environment is a dark light environment, if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model, otherwise, taking the visible light image as the enhanced visible light image;
wherein the fusion generation model is obtained through generative adversarial training together with a discrimination model, based on a first sample infrared image and a first sample visible light image, and the discrimination model is used for discriminating the authenticity of a sample enhanced visible light image generated by the fusion generation model;
Step S103, inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model;
wherein the vehicle detection model is obtained through training based on a second sample infrared image and a second sample visible light image, together with the position label and the model label of the corresponding vehicle.
2. The method according to claim 1, wherein inputting the infrared image and the visible light image into the fusion generation model to obtain the enhanced visible light image output by the fusion generation model specifically comprises:
inputting the infrared image and the visible light image into the fusion generation model, wherein the fusion generation model performs convolution processing on the infrared image and the visible light image respectively, concatenates the features obtained by the convolution processing along the feature channels, and inputs the concatenated features into a pix2pix generator in the fusion generation model to obtain the enhanced visible light image;
or the fusion generation model performs convolution processing on the infrared image and the visible light image respectively, concatenates the features obtained by the convolution processing along the feature channels, inputs the concatenated features into an SE attention module in the fusion generation model, and inputs the output of the SE attention module into the pix2pix generator in the fusion generation model to obtain the enhanced visible light image (an illustrative sketch of this fusion generation is provided after the claims).
3. The method according to claim 1, wherein the fusion generation model is trained under a constraint of consistency between the sample enhanced visible light image and the first sample visible light image; the sample enhanced visible light image is generated by the fusion generation model during training by fusing a simulated visible light image and the first sample infrared image; and the simulated visible light image is obtained by randomly occluding and darkening the first sample visible light image (see the augmentation sketch after the claims).
4. The method according to claim 1, wherein inputting the infrared image and the enhanced visible light image into the vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model specifically comprises:
inputting the infrared image and the enhanced visible light image into the vehicle detection model, wherein the vehicle detection model first extracts infrared image features and visible light image features with two separate branches and extracts multi-scale features from each; an SE attention mechanism then computes attention weights between the multi-scale infrared and visible light features so as to generate an infrared enhanced feature and a visible light enhanced feature respectively; a shuffle operation is then performed on the infrared enhanced feature and the visible light enhanced feature to obtain a mixed feature; and finally vehicle positioning and model classification are performed on the mixed feature to obtain the position and model of the target vehicle (a simplified sketch of this dual-branch fusion follows the claims).
5. The method of claim 4, wherein the loss function of the vehicle detection model comprises a cross-entropy loss between the infrared enhanced features and the visible light enhanced features, a CIoU loss for the vehicle positioning task, and a Focal loss for the vehicle model classification task (a sketch of one possible combination is given after the claims).
6. The method according to claim 1, further comprising, after step S103:
converting the position of the target vehicle into a position in the camera coordinate system based on the intrinsic parameter matrix of the camera corresponding to the infrared image;
converting the position of the target vehicle in the camera coordinate system into a position in the world coordinate system based on the extrinsic parameter matrix of the camera (see the coordinate-conversion sketch after the claims).
7. A vehicle positioning and identifying system based on a multi-mode image fusion technology, characterized by comprising:
the image acquisition module is used for acquiring an infrared image and a corresponding visible light image of the target vehicle in the current environment;
the fusion generation module is used for judging whether the current environment is a dark light environment; if so, inputting the infrared image and the visible light image into a fusion generation model to obtain an enhanced visible light image output by the fusion generation model; otherwise, taking the visible light image as the enhanced visible light image; wherein the fusion generation model is obtained through generative adversarial training together with a discrimination model, based on a first sample infrared image and a first sample visible light image, and the discrimination model is used for discriminating the authenticity of a sample enhanced visible light image generated by the fusion generation model;
the vehicle detection module is used for inputting the infrared image and the enhanced visible light image into a vehicle detection model to obtain the position and model of the target vehicle output by the vehicle detection model; wherein the vehicle detection model is obtained through training based on a second sample infrared image and a second sample visible light image, together with the position label and the model label of the corresponding vehicle.
8. An electronic device, comprising:
At least one memory for storing a computer program;
At least one processor for executing the program stored in the memory, wherein the processor is configured to perform the method according to any one of claims 1-6 when the program stored in the memory is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program, when run on a processor, causes the processor to perform the method according to any one of claims 1-6.
10. A computer program product, characterized in that the computer program product, when run on a processor, causes the processor to perform the method according to any of claims 1-6.
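By way of illustration only and not as part of the claims, the fusion generation described in claim 2 can be sketched in PyTorch roughly as follows; the module names, channel widths, and the simplified generator head are assumptions, and a full pix2pix encoder-decoder would replace the stand-in generator:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation channel attention.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class FusionGenerator(nn.Module):
    # Convolve the infrared and visible images separately, concatenate the
    # resulting features on the channel dimension, reweight them with SE
    # attention, and decode to an enhanced visible light image.
    def __init__(self, base=32):
        super().__init__()
        self.ir_conv = nn.Sequential(nn.Conv2d(1, base, 3, padding=1), nn.ReLU(inplace=True))
        self.vis_conv = nn.Sequential(nn.Conv2d(3, base, 3, padding=1), nn.ReLU(inplace=True))
        self.se = SEBlock(2 * base)
        self.generator = nn.Sequential(          # stand-in for a pix2pix generator
            nn.Conv2d(2 * base, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, ir, vis):
        feats = torch.cat([self.ir_conv(ir), self.vis_conv(vis)], dim=1)
        return self.generator(self.se(feats))

if __name__ == "__main__":
    ir = torch.randn(1, 1, 256, 256)    # single-channel infrared image
    vis = torch.randn(1, 3, 256, 256)   # RGB visible light image
    print(FusionGenerator()(ir, vis).shape)   # torch.Size([1, 3, 256, 256])

During adversarial training, a discriminator judges the enhanced output against real visible light images, as in a standard pix2pix setup.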
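The simulated visible light image of claim 3 can be produced by randomly occluding and darkening a normally lit visible light image; a minimal NumPy sketch, with all parameter ranges being assumptions, is:

import random
import numpy as np

def simulate_dark_visible(image, max_boxes=3, darken_range=(0.2, 0.6)):
    # image: H x W x 3 float array in [0, 1]; returns a globally darkened copy
    # with a few random rectangular regions blacked out.
    sim = image.copy() * random.uniform(*darken_range)   # global darkening
    h, w = sim.shape[:2]
    for _ in range(random.randint(1, max_boxes)):
        bh = random.randint(h // 10, h // 4)
        bw = random.randint(w // 10, w // 4)
        y = random.randint(0, h - bh)
        x = random.randint(0, w - bw)
        sim[y:y + bh, x:x + bw] = 0.0                     # random occlusion
    return sim

if __name__ == "__main__":
    vis = np.random.rand(256, 256, 3).astype(np.float32)
    dark = simulate_dark_visible(vis)
    print(dark.mean(), vis.mean())    # simulated image is darker on average

The consistency constraint of claim 3 then compares the enhanced image generated from the simulated visible light image and the infrared image against the original, unoccluded visible light image.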
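The dual-branch fusion of claim 4 is sketched below at a single scale; the multi-scale feature pyramid and the positioning/classification heads are omitted, and all layer sizes are assumptions:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        return x * self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)

def channel_shuffle(x, groups=2):
    # Interleave channels from the two modalities (the shuffle operation).
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class DualBranchFusion(nn.Module):
    # Separate branches extract infrared and visible features, SE attention
    # reweights each branch, and channel shuffle mixes the two enhanced features.
    def __init__(self, channels=64):
        super().__init__()
        self.ir_branch = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.vis_branch = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.ir_se = SEBlock(channels)
        self.vis_se = SEBlock(channels)

    def forward(self, ir, vis):
        ir_enh = self.ir_se(self.ir_branch(ir))       # infrared enhanced feature
        vis_enh = self.vis_se(self.vis_branch(vis))   # visible light enhanced feature
        mixed = channel_shuffle(torch.cat([ir_enh, vis_enh], dim=1), groups=2)
        return mixed    # the mixed feature feeds the positioning and classification heads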
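One possible combination of the three loss terms named in claim 5 is sketched below; the feature-level cross entropy is one plausible reading of the consistency term, and the term weights and the (x1, y1, x2, y2) box format are assumptions:

import math
import torch
import torch.nn.functional as F

def feature_consistency_loss(ir_feat, vis_feat):
    # Cross entropy between the flattened feature distributions of the two modalities.
    p = F.log_softmax(ir_feat.flatten(1), dim=1)
    q = F.softmax(vis_feat.flatten(1), dim=1)
    return -(q * p).sum(dim=1).mean()

def ciou_loss(pred, target, eps=1e-7):
    # CIoU loss for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    cpx, cpy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    ctx, cty = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    ex1, ey1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    rho2 = (cpx - ctx) ** 2 + (cpy - cty) ** 2            # squared center distance
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps        # squared enclosing-box diagonal
    v = (4 / math.pi ** 2) * (torch.atan((target[:, 2] - target[:, 0]) / (target[:, 3] - target[:, 1] + eps))
                              - torch.atan((pred[:, 2] - pred[:, 0]) / (pred[:, 3] - pred[:, 1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()

def focal_loss(logits, labels, gamma=2.0):
    # Focal loss for the vehicle model classification task.
    ce = F.cross_entropy(logits, labels, reduction="none")
    pt = torch.exp(-ce)
    return ((1 - pt) ** gamma * ce).mean()

def detection_loss(ir_feat, vis_feat, boxes_pred, boxes_gt, logits, labels,
                   w_feat=0.1, w_box=1.0, w_cls=1.0):
    return (w_feat * feature_consistency_loss(ir_feat, vis_feat)
            + w_box * ciou_loss(boxes_pred, boxes_gt)
            + w_cls * focal_loss(logits, labels))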
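The coordinate conversions of claim 6 can be illustrated as follows; the sketch assumes a pinhole model with a known depth for the back-projection from pixel to camera coordinates (which the claim does not specify) and an extrinsic convention in which a world point Xw maps to the camera frame as Xc = R @ Xw + t:

import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    # Back-project pixel (u, v) at the given depth to camera coordinates using
    # the intrinsic matrix K, then map to world coordinates using the extrinsics.
    pixel = np.array([u, v, 1.0])
    cam = depth * (np.linalg.inv(K) @ pixel)   # position in the camera coordinate system
    world = np.linalg.inv(R) @ (cam - t)       # position in the world coordinate system
    return cam, world

if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])            # intrinsic parameter matrix
    R, t = np.eye(3), np.zeros(3)              # extrinsic parameters (demo values)
    cam, world = pixel_to_world(400, 260, depth=12.0, K=K, R=R, t=t)
    print(cam, world)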
CN202410387616.2A 2024-04-01 2024-04-01 Vehicle positioning and identifying method based on multi-mode image fusion technology Active CN117975383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410387616.2A CN117975383B (en) 2024-04-01 2024-04-01 Vehicle positioning and identifying method based on multi-mode image fusion technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410387616.2A CN117975383B (en) 2024-04-01 2024-04-01 Vehicle positioning and identifying method based on multi-mode image fusion technology

Publications (2)

Publication Number Publication Date
CN117975383A true CN117975383A (en) 2024-05-03
CN117975383B CN117975383B (en) 2024-06-21

Family

ID=90865069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410387616.2A Active CN117975383B (en) 2024-04-01 2024-04-01 Vehicle positioning and identifying method based on multi-mode image fusion technology

Country Status (1)

Country Link
CN (1) CN117975383B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020116106A1 (en) * 1995-06-07 2002-08-22 Breed David S. Vehicular monitoring systems using image processing
CN108169765A (en) * 2016-12-07 2018-06-15 法乐第(北京)网络科技有限公司 Improve the method and electronic equipment of automatic Pilot reliability
US20220335715A1 (en) * 2019-08-13 2022-10-20 University Of Hertfordshire Higher Education Corporation Predicting visible/infrared band images using radar reflectance/backscatter images of a terrestrial region
GB201911577D0 (en) * 2019-08-13 2019-09-25 Univ Of Hertfordshire Higher Education Corporation Method and apparatus
CN111327800A (en) * 2020-01-08 2020-06-23 深圳深知未来智能有限公司 All-weather vehicle-mounted vision system and method suitable for complex illumination environment
US20210224993A1 (en) * 2020-01-20 2021-07-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for training generative network, method for generating near-infrared image and device
CN111198371A (en) * 2020-03-03 2020-05-26 杭州中车数字科技有限公司 Forward-looking obstacle detection system
US20220094847A1 (en) * 2020-09-21 2022-03-24 Ambarella International Lp Smart ip camera with color night mode
CN113935935A (en) * 2021-10-19 2022-01-14 天翼数字生活科技有限公司 Dark light image enhancement method based on fusion of visible light and near infrared light
CN114332655A (en) * 2021-12-30 2022-04-12 西安建筑科技大学 A vehicle adaptive fusion detection method and system
CN115170430A (en) * 2022-07-21 2022-10-11 西北工业大学 Two-stage conditional generative adversarial network method for near-infrared image colorization
CN115457456A (en) * 2022-08-22 2022-12-09 武汉理工大学 Multispectral pedestrian detection method and system based on intelligent vehicle
CN115641514A (en) * 2022-09-30 2023-01-24 宁波大学 A Pseudo-Visible Light Cloud Image Generation Method for Nighttime Sea Fog Monitoring
CN116309228A (en) * 2023-03-27 2023-06-23 西安交通大学 Visible light image conversion infrared image method based on generative adversarial network
CN116704450A (en) * 2023-05-29 2023-09-05 招商局公路网络科技控股股份有限公司 Vehicle identity recognition method and device based on deep learning
CN117115630A (en) * 2023-08-30 2023-11-24 安徽大学 A multispectral vehicle re-identification method under strong light based on cycle consistency
CN117152093A (en) * 2023-09-04 2023-12-01 山东奇妙智能科技有限公司 Tire defect detection system and method based on data fusion and deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUDONG KANG et al.: "Global-Local Feature Fusion Network for Visible-Infrared Vehicle Detection", IEEE Geoscience and Remote Sensing Letters, 19 March 2024 (2024-03-19), pages 1-5 *
CAI Binbin: "Deep Learning-Based Ground Target Detection Technology for Unmanned Aerial Vehicles", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 06, 15 June 2022 (2022-06-15), pages 034-366 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119600393A (en) * 2024-09-04 2025-03-11 中国电子科技集团公司第二十八研究所 Infrared and visible light image gating fusion method based on illumination sensing network
CN119600393B (en) * 2024-09-04 2025-12-05 中国电子科技集团公司第二十八研究所 A method for gating and fusing infrared and visible light images based on illumination sensing networks

Also Published As

Publication number Publication date
CN117975383B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
Cui et al. Deep learning for image and point cloud fusion in autonomous driving: A review
Li et al. Traffic light recognition for complex scene with fusion detections
CN113281780B (en) Method and device for marking image data and electronic equipment
Wang et al. V2I-CARLA: A novel dataset and a method for vehicle reidentification-based V2I environment
Mijić et al. Traffic sign detection using YOLOv3
CN115620090A (en) Model training method, low-illumination target re-identification method and device, and terminal equipment
CN114596548B (en) Target detection method, device, computer equipment and computer readable storage medium
CN119625279A (en) Multimodal target detection method, device and multimodal recognition system
CN118840646A (en) Image processing analysis system based on deep learning
Wen et al. YOFIR: High precise infrared object detection algorithm based on YOLO and FasterNet
CN117789144A (en) A cross network lane line detection method and device based on weight fusion
CN117975383B (en) Vehicle positioning and identifying method based on multi-mode image fusion technology
Wang et al. KCDNet: Multimodal object detection in modal information imbalance scenes
Zhu et al. Enhanced detection of small and occluded road vehicle targets using improved YOLOv5
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
Liu et al. Mastering adverse weather: a two-stage approach for robust semantic segmentation in autonomous driving
CN117994625B (en) Feature fusion visibility evaluation method and system based on millimeter wave radar
CN118038409B (en) Vehicle drivable region detection method, device, electronic equipment and storage medium
CN119763056A (en) Target identification method, device, nonvolatile storage medium and computer equipment
Liu et al. A review of image and point cloud fusion-based 3D object detection for autonomous driving
CN119295877A (en) Adaptive perception and positioning method based on unsupervised fusion BEV
CN118629007A (en) Traffic sign recognition method and system
CN117830769A (en) Automatic driving target detection system safety test method based on semantic perception
CN119723579B (en) A monocular vision 3D object labeling method based on multimodal data
CN120656043B (en) Pedestrian clothing recognition method, device, equipment and storage medium based on improved YOLOv8 model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant